diff --git a/.gitattributes b/.gitattributes
index 05c836003938fe26e5ac5c6a100cb9015786d74c..fb045a3031053cd611b5054a5cd9459ab518e87d 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 index.faiss filter=lfs diff=lfs merge=lfs -text
+data/index.faiss filter=lfs diff=lfs merge=lfs -text
diff --git a/data/index.faiss b/data/index.faiss
new file mode 100644
index 0000000000000000000000000000000000000000..498f50a877a9001f92dd69e256ee2844cecce54e
--- /dev/null
+++ b/data/index.faiss
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:78a068ac98a5de614955c9c1e307b40f7b403bd46d315cf3b583f22466bf5e7a
+size 3545133
diff --git a/data/index_to_id.pkl b/data/index_to_id.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..a7685f46722379aa9796b1a2b121befe72b2c5f5
--- /dev/null
+++ b/data/index_to_id.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eacb7d3fdf34ce9ff0a5c89cc1799b981f7da6a6d4880269bed2af19eda91da7
+size 55018
diff --git a/data/index_to_metadata.pkl b/data/index_to_metadata.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..8cb94ccb77d189c08a76714e23fb550c0b595e8f
--- /dev/null
+++ b/data/index_to_metadata.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe076a7a46265654075c23b846b259d7beaef163f5a5c72d14246a0e3d73579f
+size 530243
diff --git a/data/model_data_json/AdamCodd_vit-base-nsfw-detector.json b/data/model_data_json/AdamCodd_vit-base-nsfw-detector.json
new file mode 100644
index 0000000000000000000000000000000000000000..dd4c1bbe9b2ad2902e0b049cf46249c206bdb1c8
--- /dev/null
+++ b/data/model_data_json/AdamCodd_vit-base-nsfw-detector.json
@@ -0,0 +1,20 @@
+{
+  "model_id": "AdamCodd/vit-base-nsfw-detector",
+  "downloads": 1035027,
+  "tags": [
+    "transformers.js",
+    "onnx",
+    "safetensors",
+    "vit",
+    "image-classification",
+    "transformers",
+    "nlp",
+    "base_model:google/vit-base-patch16-384",
+    "base_model:quantized:google/vit-base-patch16-384",
+    "license:apache-2.0",
+    "model-index",
+    "region:us"
+  ],
+  "description": "--- metrics: - accuracy pipeline_tag: image-classification base_model: google/vit-base-patch16-384 model-index: - name: AdamCodd/vit-base-nsfw-detector results: - task: type: image-classification name: Image Classification metrics: - type: accuracy value: 0.9654 name: Accuracy - type: AUC value: 0.9948 - type: loss value: 0.0937 name: Loss license: apache-2.0 tags: - transformers.js - transformers - nlp --- # vit-base-nsfw-detector This model is a fine-tuned version of vit-base-patch16-384 on around 25_000 images (drawings, photos...). It achieves the following results on the evaluation set: - Loss: 0.0937 - Accuracy: 0.9654 **New [07/30]**: I created a new ViT model specifically to detect NSFW/SFW images for stable diffusion usage (read the disclaimer below for the reason): **AdamCodd/vit-nsfw-stable-diffusion**. **Disclaimer**: This model wasn't made with generative images in mind! There is no generated image in the dataset used here, and it performs significantly worse on generative images, which will require another ViT model specifically trained on generative images. Here are the model's actual scores for generative images to give you an idea: - Loss: 0.3682 (↑ 292.95%) - Accuracy: 0.8600 (↓ 10.91%) - F1: 0.8654 - AUC: 0.9376 (↓ 5.75%) - Precision: 0.8350 - Recall: 0.8980 ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. ## Intended uses & limitations There are two classes: SFW and NSFW. The model has been trained to be restrictive and therefore classify \"sexy\" images as NSFW. That is, if the image shows cleavage or too much skin, it will be classified as NSFW. This is normal. Usage for a local image: Usage for a distant image: Usage with Transformers.js (Vanilla JS): The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not. Additionally, please note that using the quantized ONNX model within the transformers.js pipeline will slightly reduce the model's accuracy. You can find a toy implementation of this model with Transformers.js here. ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 3e-05 - train_batch_size: 32 - eval_batch_size: 32 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - num_epochs: 1 ### Training results - Validation Loss: 0.0937 - Accuracy: 0.9654, - AUC: 0.9948 Confusion matrix (eval): [1076 37] [ 60 1627] ### Framework versions - Transformers 4.36.2 - Evaluate 0.4.1 If you want to support me, you can here.",
+  "model_explanation_gemini": "Classifies images as SFW or NSFW with high accuracy, primarily trained on non-generative images like drawings and photos."
+}
\ No newline at end of file
diff --git a/data/model_data_json/Alibaba-NLP_gte-Qwen2-1.5B-instruct.json b/data/model_data_json/Alibaba-NLP_gte-Qwen2-1.5B-instruct.json
new file mode 100644
index 0000000000000000000000000000000000000000..5fefad7f7204ed79f6952e483b7b68f739f9420d
--- /dev/null
+++ b/data/model_data_json/Alibaba-NLP_gte-Qwen2-1.5B-instruct.json
@@ -0,0 +1,25 @@
+{
+  "model_id": "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
+  "downloads": 214128,
+  "tags": [
+    "sentence-transformers",
+    "safetensors",
+    "qwen2",
+    "text-generation",
+    "mteb",
+    "transformers",
+    "Qwen2",
+    "sentence-similarity",
+    "custom_code",
+    "arxiv:2308.03281",
+    "license:apache-2.0",
+    "model-index",
+    "autotrain_compatible",
+    "text-generation-inference",
+    "text-embeddings-inference",
+    "endpoints_compatible",
+    "region:us"
+  ],
+  "description": "--- tags: - mteb - sentence-transformers - transformers - Qwen2 - sentence-similarity license: apache-2.0 model-index: - name: gte-qwen2-7B-instruct results: - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 83.98507462686567 - type: ap value: 50.93015252587014 - type: f1 value: 78.50416599051215 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 96.61065 - type: ap value: 94.89174052954196 - type: f1 value: 96.60942596940565 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 55.614000000000004 - type: f1 value: 54.90553480294904 task: type: Classification - dataset: config: default name: MTEB ArguAna revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: map_at_1 value: 45.164 - type: map_at_10 value: 61.519 - type: map_at_100 value: 61.769 - type: map_at_1000 value: 61.769 - type: map_at_3 value: 57.443999999999996 - type: map_at_5 value: 60.058 - type: mrr_at_1 value: 46.088 - type: mrr_at_10 value: 61.861 - type: mrr_at_100 value: 62.117999999999995 - type: mrr_at_1000 value: 62.117999999999995 - type: mrr_at_3 value: 57.729 - type: mrr_at_5 value: 60.392 - type: ndcg_at_1 value: 45.164 - type: ndcg_at_10 value: 69.72 - type: ndcg_at_100 value: 70.719 - type: ndcg_at_1000 value: 70.719 - type: ndcg_at_3 value: 61.517999999999994 - type: ndcg_at_5 value: 66.247 - type: precision_at_1 value: 45.164 - type: precision_at_10 value: 9.545 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 24.443 - type: precision_at_5 value: 16.97 - type: recall_at_1 value: 45.164 - type: recall_at_10 value: 95.448 - type: recall_at_100 value: 99.644 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 73.329 - type: recall_at_5 value: 84.851 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: v_measure value: 50.511868162026175 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: v_measure value: 45.007803189284004 task:
type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: map value: 64.55292107723382 - type: mrr value: 77.66158818097877 task: type: Reranking - dataset: config: default name: MTEB BIOSSES revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: cos_sim_pearson value: 85.65459047085452 - type: cos_sim_spearman value: 82.10729255710761 - type: euclidean_pearson value: 82.78079159312476 - type: euclidean_spearman value: 80.50002701880933 - type: manhattan_pearson value: 82.41372641383016 - type: manhattan_spearman value: 80.57412509272639 task: type: STS - dataset: config: default name: MTEB Banking77Classification revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 87.30844155844156 - type: f1 value: 87.25307322443255 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: v_measure value: 43.20754608934859 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: v_measure value: 38.818037697335505 task: type: Clustering - dataset: config: default name: MTEB CQADupstackAndroidRetrieval revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 35.423 - type: map_at_10 value: 47.198 - type: map_at_100 value: 48.899 - type: map_at_1000 value: 49.004 - type: map_at_3 value: 43.114999999999995 - type: map_at_5 value: 45.491 - type: mrr_at_1 value: 42.918 - type: mrr_at_10 value: 53.299 - type: mrr_at_100 value: 54.032000000000004 - type: mrr_at_1000 value: 54.055 - type: mrr_at_3 value: 50.453 - type: mrr_at_5 value: 52.205999999999996 - type: ndcg_at_1 value: 42.918 - type: ndcg_at_10 value: 53.98 - type: ndcg_at_100 value: 59.57 - type: ndcg_at_1000 value: 60.879000000000005 - type: ndcg_at_3 value: 48.224000000000004 - type: ndcg_at_5 value: 50.998 - type: precision_at_1 value: 42.918 - type: precision_at_10 value: 10.299999999999999 - type: precision_at_100 value: 1.687 - type: precision_at_1000 value: 0.211 - type: precision_at_3 value: 22.842000000000002 - type: precision_at_5 value: 16.681 - type: recall_at_1 value: 35.423 - type: recall_at_10 value: 66.824 - type: recall_at_100 value: 89.564 - type: recall_at_1000 value: 97.501 - type: recall_at_3 value: 50.365 - type: recall_at_5 value: 57.921 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 33.205 - type: map_at_10 value: 44.859 - type: map_at_100 value: 46.135 - type: map_at_1000 value: 46.259 - type: map_at_3 value: 41.839 - type: map_at_5 value: 43.662 - type: mrr_at_1 value: 41.146 - type: mrr_at_10 value: 50.621 - type: mrr_at_100 value: 51.207 - type: mrr_at_1000 value: 51.246 - type: mrr_at_3 value: 48.535000000000004 - type: mrr_at_5 value: 49.818 - type: ndcg_at_1 value: 41.146 - type: ndcg_at_10 value: 50.683 - type: ndcg_at_100 value: 54.82 - type: ndcg_at_1000 value: 56.69 - type: ndcg_at_3 value: 46.611000000000004 - type: ndcg_at_5 value: 48.66 - type: precision_at_1 value: 
41.146 - type: precision_at_10 value: 9.439 - type: precision_at_100 value: 1.465 - type: precision_at_1000 value: 0.194 - type: precision_at_3 value: 22.59 - type: precision_at_5 value: 15.86 - type: recall_at_1 value: 33.205 - type: recall_at_10 value: 61.028999999999996 - type: recall_at_100 value: 78.152 - type: recall_at_1000 value: 89.59700000000001 - type: recall_at_3 value: 49.05 - type: recall_at_5 value: 54.836 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 41.637 - type: map_at_10 value: 55.162 - type: map_at_100 value: 56.142 - type: map_at_1000 value: 56.188 - type: map_at_3 value: 51.564 - type: map_at_5 value: 53.696 - type: mrr_at_1 value: 47.524 - type: mrr_at_10 value: 58.243 - type: mrr_at_100 value: 58.879999999999995 - type: mrr_at_1000 value: 58.9 - type: mrr_at_3 value: 55.69499999999999 - type: mrr_at_5 value: 57.284 - type: ndcg_at_1 value: 47.524 - type: ndcg_at_10 value: 61.305 - type: ndcg_at_100 value: 65.077 - type: ndcg_at_1000 value: 65.941 - type: ndcg_at_3 value: 55.422000000000004 - type: ndcg_at_5 value: 58.516 - type: precision_at_1 value: 47.524 - type: precision_at_10 value: 9.918000000000001 - type: precision_at_100 value: 1.276 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 24.765 - type: precision_at_5 value: 17.204 - type: recall_at_1 value: 41.637 - type: recall_at_10 value: 76.185 - type: recall_at_100 value: 92.149 - type: recall_at_1000 value: 98.199 - type: recall_at_3 value: 60.856 - type: recall_at_5 value: 68.25099999999999 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 26.27 - type: map_at_10 value: 37.463 - type: map_at_100 value: 38.434000000000005 - type: map_at_1000 value: 38.509 - type: map_at_3 value: 34.226 - type: map_at_5 value: 36.161 - type: mrr_at_1 value: 28.588 - type: mrr_at_10 value: 39.383 - type: mrr_at_100 value: 40.23 - type: mrr_at_1000 value: 40.281 - type: mrr_at_3 value: 36.422 - type: mrr_at_5 value: 38.252 - type: ndcg_at_1 value: 28.588 - type: ndcg_at_10 value: 43.511 - type: ndcg_at_100 value: 48.274 - type: ndcg_at_1000 value: 49.975 - type: ndcg_at_3 value: 37.319 - type: ndcg_at_5 value: 40.568 - type: precision_at_1 value: 28.588 - type: precision_at_10 value: 6.893000000000001 - type: precision_at_100 value: 0.9900000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 16.347 - type: precision_at_5 value: 11.661000000000001 - type: recall_at_1 value: 26.27 - type: recall_at_10 value: 60.284000000000006 - type: recall_at_100 value: 81.902 - type: recall_at_1000 value: 94.43 - type: recall_at_3 value: 43.537 - type: recall_at_5 value: 51.475 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 18.168 - type: map_at_10 value: 28.410000000000004 - type: map_at_100 value: 29.78 - type: map_at_1000 value: 29.892999999999997 - type: map_at_3 value: 25.238 - type: map_at_5 value: 26.96 - type: mrr_at_1 value: 23.507 - type: mrr_at_10 value: 33.382 - type: mrr_at_100 value: 34.404 - type: mrr_at_1000 value: 34.467999999999996 - type: mrr_at_3 value: 30.637999999999998 - type: mrr_at_5 
value: 32.199 - type: ndcg_at_1 value: 23.507 - type: ndcg_at_10 value: 34.571000000000005 - type: ndcg_at_100 value: 40.663 - type: ndcg_at_1000 value: 43.236000000000004 - type: ndcg_at_3 value: 29.053 - type: ndcg_at_5 value: 31.563999999999997 - type: precision_at_1 value: 23.507 - type: precision_at_10 value: 6.654 - type: precision_at_100 value: 1.113 - type: precision_at_1000 value: 0.146 - type: precision_at_3 value: 14.427999999999999 - type: precision_at_5 value: 10.498000000000001 - type: recall_at_1 value: 18.168 - type: recall_at_10 value: 48.443000000000005 - type: recall_at_100 value: 74.47 - type: recall_at_1000 value: 92.494 - type: recall_at_3 value: 33.379999999999995 - type: recall_at_5 value: 39.76 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 32.39 - type: map_at_10 value: 44.479 - type: map_at_100 value: 45.977000000000004 - type: map_at_1000 value: 46.087 - type: map_at_3 value: 40.976 - type: map_at_5 value: 43.038 - type: mrr_at_1 value: 40.135 - type: mrr_at_10 value: 50.160000000000004 - type: mrr_at_100 value: 51.052 - type: mrr_at_1000 value: 51.087 - type: mrr_at_3 value: 47.818 - type: mrr_at_5 value: 49.171 - type: ndcg_at_1 value: 40.135 - type: ndcg_at_10 value: 50.731 - type: ndcg_at_100 value: 56.452000000000005 - type: ndcg_at_1000 value: 58.123000000000005 - type: ndcg_at_3 value: 45.507 - type: ndcg_at_5 value: 48.11 - type: precision_at_1 value: 40.135 - type: precision_at_10 value: 9.192 - type: precision_at_100 value: 1.397 - type: precision_at_1000 value: 0.169 - type: precision_at_3 value: 21.816 - type: precision_at_5 value: 15.476 - type: recall_at_1 value: 32.39 - type: recall_at_10 value: 63.597 - type: recall_at_100 value: 86.737 - type: recall_at_1000 value: 97.039 - type: recall_at_3 value: 48.906 - type: recall_at_5 value: 55.659000000000006 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 28.397 - type: map_at_10 value: 39.871 - type: map_at_100 value: 41.309000000000005 - type: map_at_1000 value: 41.409 - type: map_at_3 value: 36.047000000000004 - type: map_at_5 value: 38.104 - type: mrr_at_1 value: 34.703 - type: mrr_at_10 value: 44.773 - type: mrr_at_100 value: 45.64 - type: mrr_at_1000 value: 45.678999999999995 - type: mrr_at_3 value: 41.705 - type: mrr_at_5 value: 43.406 - type: ndcg_at_1 value: 34.703 - type: ndcg_at_10 value: 46.271 - type: ndcg_at_100 value: 52.037 - type: ndcg_at_1000 value: 53.81700000000001 - type: ndcg_at_3 value: 39.966 - type: ndcg_at_5 value: 42.801 - type: precision_at_1 value: 34.703 - type: precision_at_10 value: 8.744 - type: precision_at_100 value: 1.348 - type: precision_at_1000 value: 0.167 - type: precision_at_3 value: 19.102 - type: precision_at_5 value: 13.836 - type: recall_at_1 value: 28.397 - type: recall_at_10 value: 60.299 - type: recall_at_100 value: 84.595 - type: recall_at_1000 value: 96.155 - type: recall_at_3 value: 43.065 - type: recall_at_5 value: 50.371 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 28.044333333333338 - type: map_at_10 value: 38.78691666666666 - type: map_at_100 value: 40.113 - type: map_at_1000 
value: 40.22125 - type: map_at_3 value: 35.52966666666667 - type: map_at_5 value: 37.372749999999996 - type: mrr_at_1 value: 33.159083333333335 - type: mrr_at_10 value: 42.913583333333335 - type: mrr_at_100 value: 43.7845 - type: mrr_at_1000 value: 43.830333333333336 - type: mrr_at_3 value: 40.29816666666667 - type: mrr_at_5 value: 41.81366666666667 - type: ndcg_at_1 value: 33.159083333333335 - type: ndcg_at_10 value: 44.75750000000001 - type: ndcg_at_100 value: 50.13658333333334 - type: ndcg_at_1000 value: 52.037 - type: ndcg_at_3 value: 39.34258333333334 - type: ndcg_at_5 value: 41.93708333333333 - type: precision_at_1 value: 33.159083333333335 - type: precision_at_10 value: 7.952416666666667 - type: precision_at_100 value: 1.2571666666666668 - type: precision_at_1000 value: 0.16099999999999998 - type: precision_at_3 value: 18.303833333333337 - type: precision_at_5 value: 13.057083333333333 - type: recall_at_1 value: 28.044333333333338 - type: recall_at_10 value: 58.237249999999996 - type: recall_at_100 value: 81.35391666666666 - type: recall_at_1000 value: 94.21283333333334 - type: recall_at_3 value: 43.32341666666667 - type: recall_at_5 value: 49.94908333333333 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 27.838 - type: map_at_10 value: 36.04 - type: map_at_100 value: 37.113 - type: map_at_1000 value: 37.204 - type: map_at_3 value: 33.585 - type: map_at_5 value: 34.845 - type: mrr_at_1 value: 30.982 - type: mrr_at_10 value: 39.105000000000004 - type: mrr_at_100 value: 39.98 - type: mrr_at_1000 value: 40.042 - type: mrr_at_3 value: 36.912 - type: mrr_at_5 value: 38.062000000000005 - type: ndcg_at_1 value: 30.982 - type: ndcg_at_10 value: 40.982 - type: ndcg_at_100 value: 46.092 - type: ndcg_at_1000 value: 48.25 - type: ndcg_at_3 value: 36.41 - type: ndcg_at_5 value: 38.379999999999995 - type: precision_at_1 value: 30.982 - type: precision_at_10 value: 6.534 - type: precision_at_100 value: 0.9820000000000001 - type: precision_at_1000 value: 0.124 - type: precision_at_3 value: 15.745999999999999 - type: precision_at_5 value: 10.828 - type: recall_at_1 value: 27.838 - type: recall_at_10 value: 52.971000000000004 - type: recall_at_100 value: 76.357 - type: recall_at_1000 value: 91.973 - type: recall_at_3 value: 40.157 - type: recall_at_5 value: 45.147999999999996 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackTexRetrieval revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 19.059 - type: map_at_10 value: 27.454 - type: map_at_100 value: 28.736 - type: map_at_1000 value: 28.865000000000002 - type: map_at_3 value: 24.773999999999997 - type: map_at_5 value: 26.266000000000002 - type: mrr_at_1 value: 23.125 - type: mrr_at_10 value: 31.267 - type: mrr_at_100 value: 32.32 - type: mrr_at_1000 value: 32.394 - type: mrr_at_3 value: 28.894 - type: mrr_at_5 value: 30.281000000000002 - type: ndcg_at_1 value: 23.125 - type: ndcg_at_10 value: 32.588 - type: ndcg_at_100 value: 38.432 - type: ndcg_at_1000 value: 41.214 - type: ndcg_at_3 value: 27.938000000000002 - type: ndcg_at_5 value: 30.127 - type: precision_at_1 value: 23.125 - type: precision_at_10 value: 5.9639999999999995 - type: precision_at_100 value: 1.047 - type: precision_at_1000 value: 0.148 - type: precision_at_3 value: 13.294 - type: precision_at_5 value: 9.628 - type: recall_at_1 value: 19.059 
- type: recall_at_10 value: 44.25 - type: recall_at_100 value: 69.948 - type: recall_at_1000 value: 89.35300000000001 - type: recall_at_3 value: 31.114000000000004 - type: recall_at_5 value: 36.846000000000004 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 28.355999999999998 - type: map_at_10 value: 39.055 - type: map_at_100 value: 40.486 - type: map_at_1000 value: 40.571 - type: map_at_3 value: 35.69 - type: map_at_5 value: 37.605 - type: mrr_at_1 value: 33.302 - type: mrr_at_10 value: 42.986000000000004 - type: mrr_at_100 value: 43.957 - type: mrr_at_1000 value: 43.996 - type: mrr_at_3 value: 40.111999999999995 - type: mrr_at_5 value: 41.735 - type: ndcg_at_1 value: 33.302 - type: ndcg_at_10 value: 44.962999999999994 - type: ndcg_at_100 value: 50.917 - type: ndcg_at_1000 value: 52.622 - type: ndcg_at_3 value: 39.182 - type: ndcg_at_5 value: 41.939 - type: precision_at_1 value: 33.302 - type: precision_at_10 value: 7.779999999999999 - type: precision_at_100 value: 1.203 - type: precision_at_1000 value: 0.145 - type: precision_at_3 value: 18.035 - type: precision_at_5 value: 12.873000000000001 - type: recall_at_1 value: 28.355999999999998 - type: recall_at_10 value: 58.782000000000004 - type: recall_at_100 value: 84.02199999999999 - type: recall_at_1000 value: 95.511 - type: recall_at_3 value: 43.126999999999995 - type: recall_at_5 value: 50.14999999999999 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 27.391 - type: map_at_10 value: 37.523 - type: map_at_100 value: 39.312000000000005 - type: map_at_1000 value: 39.54 - type: map_at_3 value: 34.231 - type: map_at_5 value: 36.062 - type: mrr_at_1 value: 32.016 - type: mrr_at_10 value: 41.747 - type: mrr_at_100 value: 42.812 - type: mrr_at_1000 value: 42.844 - type: mrr_at_3 value: 39.129999999999995 - type: mrr_at_5 value: 40.524 - type: ndcg_at_1 value: 32.016 - type: ndcg_at_10 value: 43.826 - type: ndcg_at_100 value: 50.373999999999995 - type: ndcg_at_1000 value: 52.318 - type: ndcg_at_3 value: 38.479 - type: ndcg_at_5 value: 40.944 - type: precision_at_1 value: 32.016 - type: precision_at_10 value: 8.280999999999999 - type: precision_at_100 value: 1.6760000000000002 - type: precision_at_1000 value: 0.25 - type: precision_at_3 value: 18.05 - type: precision_at_5 value: 13.083 - type: recall_at_1 value: 27.391 - type: recall_at_10 value: 56.928999999999995 - type: recall_at_100 value: 85.169 - type: recall_at_1000 value: 96.665 - type: recall_at_3 value: 42.264 - type: recall_at_5 value: 48.556 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 18.398 - type: map_at_10 value: 27.929 - type: map_at_100 value: 29.032999999999998 - type: map_at_1000 value: 29.126 - type: map_at_3 value: 25.070999999999998 - type: map_at_5 value: 26.583000000000002 - type: mrr_at_1 value: 19.963 - type: mrr_at_10 value: 29.997 - type: mrr_at_100 value: 30.9 - type: mrr_at_1000 value: 30.972 - type: mrr_at_3 value: 27.264 - type: mrr_at_5 value: 28.826 - type: ndcg_at_1 value: 19.963 - type: ndcg_at_10 value: 33.678999999999995 - type: ndcg_at_100 value: 38.931 - type: ndcg_at_1000 value: 
41.379 - type: ndcg_at_3 value: 28.000000000000004 - type: ndcg_at_5 value: 30.637999999999998 - type: precision_at_1 value: 19.963 - type: precision_at_10 value: 5.7299999999999995 - type: precision_at_100 value: 0.902 - type: precision_at_1000 value: 0.122 - type: precision_at_3 value: 12.631 - type: precision_at_5 value: 9.057 - type: recall_at_1 value: 18.398 - type: recall_at_10 value: 49.254 - type: recall_at_100 value: 73.182 - type: recall_at_1000 value: 91.637 - type: recall_at_3 value: 34.06 - type: recall_at_5 value: 40.416000000000004 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: map_at_1 value: 19.681 - type: map_at_10 value: 32.741 - type: map_at_100 value: 34.811 - type: map_at_1000 value: 35.003 - type: map_at_3 value: 27.697 - type: map_at_5 value: 30.372 - type: mrr_at_1 value: 44.951 - type: mrr_at_10 value: 56.34400000000001 - type: mrr_at_100 value: 56.961 - type: mrr_at_1000 value: 56.987 - type: mrr_at_3 value: 53.681 - type: mrr_at_5 value: 55.407 - type: ndcg_at_1 value: 44.951 - type: ndcg_at_10 value: 42.905 - type: ndcg_at_100 value: 49.95 - type: ndcg_at_1000 value: 52.917 - type: ndcg_at_3 value: 36.815 - type: ndcg_at_5 value: 38.817 - type: precision_at_1 value: 44.951 - type: precision_at_10 value: 12.989999999999998 - type: precision_at_100 value: 2.068 - type: precision_at_1000 value: 0.263 - type: precision_at_3 value: 27.275 - type: precision_at_5 value: 20.365 - type: recall_at_1 value: 19.681 - type: recall_at_10 value: 48.272999999999996 - type: recall_at_100 value: 71.87400000000001 - type: recall_at_1000 value: 87.929 - type: recall_at_3 value: 32.653999999999996 - type: recall_at_5 value: 39.364 task: type: Retrieval - dataset: config: default name: MTEB DBPedia revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: map_at_1 value: 10.231 - type: map_at_10 value: 22.338 - type: map_at_100 value: 31.927 - type: map_at_1000 value: 33.87 - type: map_at_3 value: 15.559999999999999 - type: map_at_5 value: 18.239 - type: mrr_at_1 value: 75.0 - type: mrr_at_10 value: 81.303 - type: mrr_at_100 value: 81.523 - type: mrr_at_1000 value: 81.53 - type: mrr_at_3 value: 80.083 - type: mrr_at_5 value: 80.758 - type: ndcg_at_1 value: 64.625 - type: ndcg_at_10 value: 48.687000000000005 - type: ndcg_at_100 value: 52.791 - type: ndcg_at_1000 value: 60.041999999999994 - type: ndcg_at_3 value: 53.757999999999996 - type: ndcg_at_5 value: 50.76500000000001 - type: precision_at_1 value: 75.0 - type: precision_at_10 value: 38.3 - type: precision_at_100 value: 12.025 - type: precision_at_1000 value: 2.3970000000000002 - type: precision_at_3 value: 55.417 - type: precision_at_5 value: 47.5 - type: recall_at_1 value: 10.231 - type: recall_at_10 value: 27.697 - type: recall_at_100 value: 57.409 - type: recall_at_1000 value: 80.547 - type: recall_at_3 value: 16.668 - type: recall_at_5 value: 20.552 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 61.365 - type: f1 value: 56.7540827912991 task: type: Classification - dataset: config: default name: MTEB FEVER revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: map_at_1 value: 83.479 - type: map_at_10 value: 88.898 - type: map_at_100 value: 89.11 - type: map_at_1000 
value: 89.12400000000001 - type: map_at_3 value: 88.103 - type: map_at_5 value: 88.629 - type: mrr_at_1 value: 89.934 - type: mrr_at_10 value: 93.91000000000001 - type: mrr_at_100 value: 93.937 - type: mrr_at_1000 value: 93.938 - type: mrr_at_3 value: 93.62700000000001 - type: mrr_at_5 value: 93.84599999999999 - type: ndcg_at_1 value: 89.934 - type: ndcg_at_10 value: 91.574 - type: ndcg_at_100 value: 92.238 - type: ndcg_at_1000 value: 92.45 - type: ndcg_at_3 value: 90.586 - type: ndcg_at_5 value: 91.16300000000001 - type: precision_at_1 value: 89.934 - type: precision_at_10 value: 10.555 - type: precision_at_100 value: 1.1159999999999999 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 33.588 - type: precision_at_5 value: 20.642 - type: recall_at_1 value: 83.479 - type: recall_at_10 value: 94.971 - type: recall_at_100 value: 97.397 - type: recall_at_1000 value: 98.666 - type: recall_at_3 value: 92.24799999999999 - type: recall_at_5 value: 93.797 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: map_at_1 value: 27.16 - type: map_at_10 value: 45.593 - type: map_at_100 value: 47.762 - type: map_at_1000 value: 47.899 - type: map_at_3 value: 39.237 - type: map_at_5 value: 42.970000000000006 - type: mrr_at_1 value: 52.623 - type: mrr_at_10 value: 62.637 - type: mrr_at_100 value: 63.169 - type: mrr_at_1000 value: 63.185 - type: mrr_at_3 value: 59.928000000000004 - type: mrr_at_5 value: 61.702999999999996 - type: ndcg_at_1 value: 52.623 - type: ndcg_at_10 value: 54.701 - type: ndcg_at_100 value: 61.263 - type: ndcg_at_1000 value: 63.134 - type: ndcg_at_3 value: 49.265 - type: ndcg_at_5 value: 51.665000000000006 - type: precision_at_1 value: 52.623 - type: precision_at_10 value: 15.185 - type: precision_at_100 value: 2.202 - type: precision_at_1000 value: 0.254 - type: precision_at_3 value: 32.767 - type: precision_at_5 value: 24.722 - type: recall_at_1 value: 27.16 - type: recall_at_10 value: 63.309000000000005 - type: recall_at_100 value: 86.722 - type: recall_at_1000 value: 97.505 - type: recall_at_3 value: 45.045 - type: recall_at_5 value: 54.02400000000001 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa metrics: - type: map_at_1 value: 42.573 - type: map_at_10 value: 59.373 - type: map_at_100 value: 60.292 - type: map_at_1000 value: 60.358999999999995 - type: map_at_3 value: 56.159000000000006 - type: map_at_5 value: 58.123999999999995 - type: mrr_at_1 value: 85.14500000000001 - type: mrr_at_10 value: 89.25999999999999 - type: mrr_at_100 value: 89.373 - type: mrr_at_1000 value: 89.377 - type: mrr_at_3 value: 88.618 - type: mrr_at_5 value: 89.036 - type: ndcg_at_1 value: 85.14500000000001 - type: ndcg_at_10 value: 68.95 - type: ndcg_at_100 value: 71.95 - type: ndcg_at_1000 value: 73.232 - type: ndcg_at_3 value: 64.546 - type: ndcg_at_5 value: 66.945 - type: precision_at_1 value: 85.14500000000001 - type: precision_at_10 value: 13.865 - type: precision_at_100 value: 1.619 - type: precision_at_1000 value: 0.179 - type: precision_at_3 value: 39.703 - type: precision_at_5 value: 25.718000000000004 - type: recall_at_1 value: 42.573 - type: recall_at_10 value: 69.325 - type: recall_at_100 value: 80.932 - type: recall_at_1000 value: 89.446 - type: recall_at_3 value: 59.553999999999995 - type: recall_at_5 value: 64.294 task: type: Retrieval - dataset: 
config: default name: MTEB ImdbClassification revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 95.8336 - type: ap value: 93.78862962194073 - type: f1 value: 95.83192650728371 task: type: Classification - dataset: config: default name: MTEB MSMARCO revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: - type: map_at_1 value: 23.075000000000003 - type: map_at_10 value: 36.102000000000004 - type: map_at_100 value: 37.257 - type: map_at_1000 value: 37.3 - type: map_at_3 value: 32.144 - type: map_at_5 value: 34.359 - type: mrr_at_1 value: 23.711 - type: mrr_at_10 value: 36.671 - type: mrr_at_100 value: 37.763999999999996 - type: mrr_at_1000 value: 37.801 - type: mrr_at_3 value: 32.775 - type: mrr_at_5 value: 34.977000000000004 - type: ndcg_at_1 value: 23.711 - type: ndcg_at_10 value: 43.361 - type: ndcg_at_100 value: 48.839 - type: ndcg_at_1000 value: 49.88 - type: ndcg_at_3 value: 35.269 - type: ndcg_at_5 value: 39.224 - type: precision_at_1 value: 23.711 - type: precision_at_10 value: 6.866999999999999 - type: precision_at_100 value: 0.96 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 15.096000000000002 - type: precision_at_5 value: 11.083 - type: recall_at_1 value: 23.075000000000003 - type: recall_at_10 value: 65.756 - type: recall_at_100 value: 90.88199999999999 - type: recall_at_1000 value: 98.739 - type: recall_at_3 value: 43.691 - type: recall_at_5 value: 53.15800000000001 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 97.69493844049248 - type: f1 value: 97.55048089616261 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 88.75968992248062 - type: f1 value: 72.26321223399123 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 82.40080699394754 - type: f1 value: 79.62590029057968 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 84.49562878278414 - type: f1 value: 84.0040193313333 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: v_measure value: 39.386760057101945 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: v_measure value: 37.89687154075537 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 split: test type: mteb/mind_small metrics: - type: map value: 33.94151656057482 - type: mrr value: 35.32684700746953 task: type: Reranking - dataset: config: default name: MTEB NFCorpus revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: mteb/nfcorpus metrics: - type: map_at_1 value: 6.239999999999999 - type: map_at_10 value: 14.862 - type: 
map_at_100 value: 18.955 - type: map_at_1000 value: 20.694000000000003 - type: map_at_3 value: 10.683 - type: map_at_5 value: 12.674 - type: mrr_at_1 value: 50.15500000000001 - type: mrr_at_10 value: 59.697 - type: mrr_at_100 value: 60.095 - type: mrr_at_1000 value: 60.129999999999995 - type: mrr_at_3 value: 58.35900000000001 - type: mrr_at_5 value: 58.839 - type: ndcg_at_1 value: 48.452 - type: ndcg_at_10 value: 39.341 - type: ndcg_at_100 value: 35.866 - type: ndcg_at_1000 value: 45.111000000000004 - type: ndcg_at_3 value: 44.527 - type: ndcg_at_5 value: 42.946 - type: precision_at_1 value: 50.15500000000001 - type: precision_at_10 value: 29.536 - type: precision_at_100 value: 9.142 - type: precision_at_1000 value: 2.2849999999999997 - type: precision_at_3 value: 41.899 - type: precision_at_5 value: 37.647000000000006 - type: recall_at_1 value: 6.239999999999999 - type: recall_at_10 value: 19.278000000000002 - type: recall_at_100 value: 36.074 - type: recall_at_1000 value: 70.017 - type: recall_at_3 value: 12.066 - type: recall_at_5 value: 15.254000000000001 task: type: Retrieval - dataset: config: default name: MTEB NQ revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: map_at_1 value: 39.75 - type: map_at_10 value: 56.443 - type: map_at_100 value: 57.233999999999995 - type: map_at_1000 value: 57.249 - type: map_at_3 value: 52.032999999999994 - type: map_at_5 value: 54.937999999999995 - type: mrr_at_1 value: 44.728 - type: mrr_at_10 value: 58.939 - type: mrr_at_100 value: 59.489000000000004 - type: mrr_at_1000 value: 59.499 - type: mrr_at_3 value: 55.711999999999996 - type: mrr_at_5 value: 57.89 - type: ndcg_at_1 value: 44.728 - type: ndcg_at_10 value: 63.998999999999995 - type: ndcg_at_100 value: 67.077 - type: ndcg_at_1000 value: 67.40899999999999 - type: ndcg_at_3 value: 56.266000000000005 - type: ndcg_at_5 value: 60.88 - type: precision_at_1 value: 44.728 - type: precision_at_10 value: 10.09 - type: precision_at_100 value: 1.1809999999999998 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 25.145 - type: precision_at_5 value: 17.822 - type: recall_at_1 value: 39.75 - type: recall_at_10 value: 84.234 - type: recall_at_100 value: 97.055 - type: recall_at_1000 value: 99.517 - type: recall_at_3 value: 64.851 - type: recall_at_5 value: 75.343 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval revision: None split: test type: mteb/quora metrics: - type: map_at_1 value: 72.085 - type: map_at_10 value: 86.107 - type: map_at_100 value: 86.727 - type: map_at_1000 value: 86.74 - type: map_at_3 value: 83.21 - type: map_at_5 value: 85.06 - type: mrr_at_1 value: 82.94 - type: mrr_at_10 value: 88.845 - type: mrr_at_100 value: 88.926 - type: mrr_at_1000 value: 88.927 - type: mrr_at_3 value: 87.993 - type: mrr_at_5 value: 88.62299999999999 - type: ndcg_at_1 value: 82.97 - type: ndcg_at_10 value: 89.645 - type: ndcg_at_100 value: 90.717 - type: ndcg_at_1000 value: 90.78 - type: ndcg_at_3 value: 86.99900000000001 - type: ndcg_at_5 value: 88.52600000000001 - type: precision_at_1 value: 82.97 - type: precision_at_10 value: 13.569 - type: precision_at_100 value: 1.539 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 38.043 - type: precision_at_5 value: 24.992 - type: recall_at_1 value: 72.085 - type: recall_at_10 value: 96.262 - type: recall_at_100 value: 99.77000000000001 - type: recall_at_1000 value: 99.997 - type: recall_at_3 value: 88.652 - type: recall_at_5 value: 93.01899999999999 task: type: 
Retrieval - dataset: config: default name: MTEB RedditClustering revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: v_measure value: 55.82153952668092 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P revision: 282350215ef01743dc01b456c7f5241fa8937f16 split: test type: mteb/reddit-clustering-p2p metrics: - type: v_measure value: 62.094465801879295 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS revision: None split: test type: mteb/scidocs metrics: - type: map_at_1 value: 5.688 - type: map_at_10 value: 15.201999999999998 - type: map_at_100 value: 18.096 - type: map_at_1000 value: 18.481 - type: map_at_3 value: 10.734 - type: map_at_5 value: 12.94 - type: mrr_at_1 value: 28.000000000000004 - type: mrr_at_10 value: 41.101 - type: mrr_at_100 value: 42.202 - type: mrr_at_1000 value: 42.228 - type: mrr_at_3 value: 37.683 - type: mrr_at_5 value: 39.708 - type: ndcg_at_1 value: 28.000000000000004 - type: ndcg_at_10 value: 24.976000000000003 - type: ndcg_at_100 value: 35.129 - type: ndcg_at_1000 value: 40.77 - type: ndcg_at_3 value: 23.787 - type: ndcg_at_5 value: 20.816000000000003 - type: precision_at_1 value: 28.000000000000004 - type: precision_at_10 value: 13.04 - type: precision_at_100 value: 2.761 - type: precision_at_1000 value: 0.41000000000000003 - type: precision_at_3 value: 22.6 - type: precision_at_5 value: 18.52 - type: recall_at_1 value: 5.688 - type: recall_at_10 value: 26.43 - type: recall_at_100 value: 56.02 - type: recall_at_1000 value: 83.21 - type: recall_at_3 value: 13.752 - type: recall_at_5 value: 18.777 task: type: Retrieval - dataset: config: default name: MTEB SICK-R revision: a6ea5a8cab320b040a23452cc28066d9beae2cee split: test type: mteb/sickr-sts metrics: - type: cos_sim_pearson value: 85.15084859283178 - type: cos_sim_spearman value: 80.49030614009419 - type: euclidean_pearson value: 81.84574978672468 - type: euclidean_spearman value: 79.89787150656818 - type: manhattan_pearson value: 81.63076538567131 - type: manhattan_spearman value: 79.69867352121841 task: type: STS - dataset: config: default name: MTEB STS12 revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: cos_sim_pearson value: 84.64097921490992 - type: cos_sim_spearman value: 77.25370084896514 - type: euclidean_pearson value: 82.71210826468788 - type: euclidean_spearman value: 78.50445584994826 - type: manhattan_pearson value: 82.92580164330298 - type: manhattan_spearman value: 78.69686891301019 task: type: STS - dataset: config: default name: MTEB STS13 revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: cos_sim_pearson value: 87.24596417308994 - type: cos_sim_spearman value: 87.79454220555091 - type: euclidean_pearson value: 87.40242561671164 - type: euclidean_spearman value: 88.25955597373556 - type: manhattan_pearson value: 87.25160240485849 - type: manhattan_spearman value: 88.155794979818 task: type: STS - dataset: config: default name: MTEB STS14 revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: cos_sim_pearson value: 84.44914233422564 - type: cos_sim_spearman value: 82.91015471820322 - type: euclidean_pearson value: 84.7206656630327 - type: euclidean_spearman value: 83.86408872059216 - type: manhattan_pearson value: 84.72816725158454 - type: manhattan_spearman value: 84.01603388572788 task: type: STS - dataset: config: default name: MTEB STS15 
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: cos_sim_pearson value: 87.6168026237477 - type: cos_sim_spearman value: 88.45414278092397 - type: euclidean_pearson value: 88.57023240882022 - type: euclidean_spearman value: 89.04102190922094 - type: manhattan_pearson value: 88.66695535796354 - type: manhattan_spearman value: 89.19898476680969 task: type: STS - dataset: config: default name: MTEB STS16 revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: cos_sim_pearson value: 84.27925826089424 - type: cos_sim_spearman value: 85.45291099550461 - type: euclidean_pearson value: 83.63853036580834 - type: euclidean_spearman value: 84.33468035821484 - type: manhattan_pearson value: 83.72778773251596 - type: manhattan_spearman value: 84.51583132445376 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 89.67375185692552 - type: cos_sim_spearman value: 90.32542469203855 - type: euclidean_pearson value: 89.63513717951847 - type: euclidean_spearman value: 89.87760271003745 - type: manhattan_pearson value: 89.28381452982924 - type: manhattan_spearman value: 89.53568197785721 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 66.24644693819846 - type: cos_sim_spearman value: 66.09889420525377 - type: euclidean_pearson value: 63.72551583520747 - type: euclidean_spearman value: 63.01385470780679 - type: manhattan_pearson value: 64.09258157214097 - type: manhattan_spearman value: 63.080517752822594 task: type: STS - dataset: config: default name: MTEB STSBenchmark revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: cos_sim_pearson value: 86.27321463839989 - type: cos_sim_spearman value: 86.37572865993327 - type: euclidean_pearson value: 86.36268020198149 - type: euclidean_spearman value: 86.31089339478922 - type: manhattan_pearson value: 86.4260445761947 - type: manhattan_spearman value: 86.45885895320457 task: type: STS - dataset: config: default name: MTEB SciDocsRR revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: map value: 86.52456702387798 - type: mrr value: 96.34556529164372 task: type: Reranking - dataset: config: default name: MTEB SciFact revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: map_at_1 value: 61.99400000000001 - type: map_at_10 value: 73.38799999999999 - type: map_at_100 value: 73.747 - type: map_at_1000 value: 73.75 - type: map_at_3 value: 70.04599999999999 - type: map_at_5 value: 72.095 - type: mrr_at_1 value: 65.0 - type: mrr_at_10 value: 74.42800000000001 - type: mrr_at_100 value: 74.722 - type: mrr_at_1000 value: 74.725 - type: mrr_at_3 value: 72.056 - type: mrr_at_5 value: 73.60600000000001 - type: ndcg_at_1 value: 65.0 - type: ndcg_at_10 value: 78.435 - type: ndcg_at_100 value: 79.922 - type: ndcg_at_1000 value: 80.00500000000001 - type: ndcg_at_3 value: 73.05199999999999 - type: ndcg_at_5 value: 75.98 - type: precision_at_1 value: 65.0 - type: precision_at_10 value: 10.5 - type: precision_at_100 value: 1.123 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 28.555999999999997 - type: 
precision_at_5 value: 19.0 - type: recall_at_1 value: 61.99400000000001 - type: recall_at_10 value: 92.72200000000001 - type: recall_at_100 value: 99.333 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 78.739 - type: recall_at_5 value: 85.828 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: cos_sim_accuracy value: 99.79009900990098 - type: cos_sim_ap value: 95.3203137438653 - type: cos_sim_f1 value: 89.12386706948641 - type: cos_sim_precision value: 89.75659229208925 - type: cos_sim_recall value: 88.5 - type: dot_accuracy value: 99.67821782178218 - type: dot_ap value: 89.94069840000675 - type: dot_f1 value: 83.45902463549521 - type: dot_precision value: 83.9231547017189 - type: dot_recall value: 83.0 - type: euclidean_accuracy value: 99.78613861386138 - type: euclidean_ap value: 95.10648259135526 - type: euclidean_f1 value: 88.77338877338877 - type: euclidean_precision value: 92.42424242424242 - type: euclidean_recall value: 85.39999999999999 - type: manhattan_accuracy value: 99.7950495049505 - type: manhattan_ap value: 95.29987661320946 - type: manhattan_f1 value: 89.21313183949972 - type: manhattan_precision value: 93.14472252448314 - type: manhattan_recall value: 85.6 - type: max_accuracy value: 99.7950495049505 - type: max_ap value: 95.3203137438653 - type: max_f1 value: 89.21313183949972 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: v_measure value: 67.65446577183913 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: v_measure value: 46.30749237193961 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: map value: 54.91481849959949 - type: mrr value: 55.853506175197346 task: type: Reranking - dataset: config: default name: MTEB SummEval revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: cos_sim_pearson value: 30.08196549170419 - type: cos_sim_spearman value: 31.16661390597077 - type: dot_pearson value: 29.892258410943466 - type: dot_spearman value: 30.51328811965085 task: type: Summarization - dataset: config: default name: MTEB TRECCOVID revision: None split: test type: mteb/trec-covid metrics: - type: map_at_1 value: 0.23900000000000002 - type: map_at_10 value: 2.173 - type: map_at_100 value: 14.24 - type: map_at_1000 value: 35.309000000000005 - type: map_at_3 value: 0.7100000000000001 - type: map_at_5 value: 1.163 - type: mrr_at_1 value: 92.0 - type: mrr_at_10 value: 96.0 - type: mrr_at_100 value: 96.0 - type: mrr_at_1000 value: 96.0 - type: mrr_at_3 value: 96.0 - type: mrr_at_5 value: 96.0 - type: ndcg_at_1 value: 90.0 - type: ndcg_at_10 value: 85.382 - type: ndcg_at_100 value: 68.03 - type: ndcg_at_1000 value: 61.021 - type: ndcg_at_3 value: 89.765 - type: ndcg_at_5 value: 88.444 - type: precision_at_1 value: 92.0 - type: precision_at_10 value: 88.0 - type: precision_at_100 value: 70.02000000000001 - type: precision_at_1000 value: 26.984 - type: precision_at_3 value: 94.0 - 
type: precision_at_5 value: 92.80000000000001 - type: recall_at_1 value: 0.23900000000000002 - type: recall_at_10 value: 2.313 - type: recall_at_100 value: 17.049 - type: recall_at_1000 value: 57.489999999999995 - type: recall_at_3 value: 0.737 - type: recall_at_5 value: 1.221 task: type: Retrieval - dataset: config: default name: MTEB Touche2020 revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: map_at_1 value: 2.75 - type: map_at_10 value: 11.29 - type: map_at_100 value: 18.032999999999998 - type: map_at_1000 value: 19.746 - type: map_at_3 value: 6.555 - type: map_at_5 value: 8.706999999999999 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 50.55 - type: mrr_at_100 value: 51.659 - type: mrr_at_1000 value: 51.659 - type: mrr_at_3 value: 47.278999999999996 - type: mrr_at_5 value: 49.728 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 27.894000000000002 - type: ndcg_at_100 value: 39.769 - type: ndcg_at_1000 value: 51.495999999999995 - type: ndcg_at_3 value: 32.954 - type: ndcg_at_5 value: 31.502999999999997 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 23.265 - type: precision_at_100 value: 7.898 - type: precision_at_1000 value: 1.58 - type: precision_at_3 value: 34.694 - type: precision_at_5 value: 31.429000000000002 - type: recall_at_1 value: 2.75 - type: recall_at_10 value: 16.953 - type: recall_at_100 value: 48.68 - type: recall_at_1000 value: 85.18599999999999 - type: recall_at_3 value: 7.710999999999999 - type: recall_at_5 value: 11.484 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 82.66099999999999 - type: ap value: 25.555698090238337 - type: f1 value: 66.48402012461622 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 72.94567062818335 - type: f1 value: 73.28139189595674 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: v_measure value: 49.581627240203474 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: cos_sim_accuracy value: 87.78089050485785 - type: cos_sim_ap value: 79.64487116574168 - type: cos_sim_f1 value: 72.46563021970964 - type: cos_sim_precision value: 70.62359128474831 - type: cos_sim_recall value: 74.40633245382587 - type: dot_accuracy value: 86.2609524944865 - type: dot_ap value: 75.513046857613 - type: dot_f1 value: 68.58213616489695 - type: dot_precision value: 65.12455516014235 - type: dot_recall value: 72.42744063324538 - type: euclidean_accuracy value: 87.6080348095607 - type: euclidean_ap value: 79.00204933649795 - type: euclidean_f1 value: 72.14495342605589 - type: euclidean_precision value: 69.85421299728193 - type: euclidean_recall value: 74.5910290237467 - type: manhattan_accuracy value: 87.59611372712642 - type: manhattan_ap value: 78.78523756706264 - type: manhattan_f1 value: 71.86499137718648 - type: manhattan_precision value: 67.39833641404806 - type: manhattan_recall value: 76.96569920844327 
- type: max_accuracy value: 87.78089050485785 - type: max_ap value: 79.64487116574168 - type: max_f1 value: 72.46563021970964 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: cos_sim_accuracy value: 89.98719292117825 - type: cos_sim_ap value: 87.58146137353202 - type: cos_sim_f1 value: 80.28543232369239 - type: cos_sim_precision value: 79.1735289714029 - type: cos_sim_recall value: 81.42901139513397 - type: dot_accuracy value: 88.9199363526992 - type: dot_ap value: 84.98499998630417 - type: dot_f1 value: 78.21951400757969 - type: dot_precision value: 75.58523624874336 - type: dot_recall value: 81.04404065291038 - type: euclidean_accuracy value: 89.77374160748244 - type: euclidean_ap value: 87.35151562835209 - type: euclidean_f1 value: 79.92160922940393 - type: euclidean_precision value: 76.88531587933979 - type: euclidean_recall value: 83.20757622420696 - type: manhattan_accuracy value: 89.72717041176699 - type: manhattan_ap value: 87.34065592142515 - type: manhattan_f1 value: 79.85603419187943 - type: manhattan_precision value: 77.82243332115455 - type: manhattan_recall value: 81.99876809362489 - type: max_accuracy value: 89.98719292117825 - type: max_ap value: 87.58146137353202 - type: max_f1 value: 80.28543232369239 task: type: PairClassification - dataset: config: default name: MTEB AFQMC revision: b44c3b011063adb25877c13823db83bb193913c4 split: validation type: C-MTEB/AFQMC metrics: - type: cos_sim_pearson value: 53.45954203592337 - type: cos_sim_spearman value: 58.42154680418638 - type: euclidean_pearson value: 56.41543791722753 - type: euclidean_spearman value: 58.39328016640146 - type: manhattan_pearson value: 56.318510356833876 - type: manhattan_spearman value: 58.28423447818184 task: type: STS - dataset: config: default name: MTEB ATEC revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 split: test type: C-MTEB/ATEC metrics: - type: cos_sim_pearson value: 50.78356460675945 - type: cos_sim_spearman value: 55.6530411663269 - type: euclidean_pearson value: 56.50763660417816 - type: euclidean_spearman value: 55.733823335669065 - type: manhattan_pearson value: 56.45323093512866 - type: manhattan_spearman value: 55.63248619032702 task: type: STS - dataset: config: zh name: MTEB AmazonReviewsClassification (zh) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 47.209999999999994 - type: f1 value: 46.08892432018655 task: type: Classification - dataset: config: default name: MTEB BQ revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 split: test type: C-MTEB/BQ metrics: - type: cos_sim_pearson value: 70.25573992001478 - type: cos_sim_spearman value: 73.85247134951433 - type: euclidean_pearson value: 72.60033082168442 - type: euclidean_spearman value: 73.72445893756499 - type: manhattan_pearson value: 72.59932284620231 - type: manhattan_spearman value: 73.68002490614583 task: type: STS - dataset: config: default name: MTEB CLSClusteringP2P revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 split: test type: C-MTEB/CLSClusteringP2P metrics: - type: v_measure value: 45.21317724305628 task: type: Clustering - dataset: config: default name: MTEB CLSClusteringS2S revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f split: test type: C-MTEB/CLSClusteringS2S metrics: - type: v_measure value: 42.49825170976724 task: type: Clustering - dataset: config: default 
name: MTEB CMedQAv1 revision: 8d7f1e942507dac42dc58017c1a001c3717da7df split: test type: C-MTEB/CMedQAv1-reranking metrics: - type: map value: 88.15661686810597 - type: mrr value: 90.11222222222223 task: type: Reranking - dataset: config: default name: MTEB CMedQAv2 revision: 23d186750531a14a0357ca22cd92d712fd512ea0 split: test type: C-MTEB/CMedQAv2-reranking metrics: - type: map value: 88.1204726064383 - type: mrr value: 90.20142857142858 task: type: Reranking - dataset: config: default name: MTEB CmedqaRetrieval revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 split: dev type: C-MTEB/CmedqaRetrieval metrics: - type: map_at_1 value: 27.224999999999998 - type: map_at_10 value: 40.169 - type: map_at_100 value: 42.0 - type: map_at_1000 value: 42.109 - type: map_at_3 value: 35.76 - type: map_at_5 value: 38.221 - type: mrr_at_1 value: 40.56 - type: mrr_at_10 value: 49.118 - type: mrr_at_100 value: 50.092999999999996 - type: mrr_at_1000 value: 50.133 - type: mrr_at_3 value: 46.507 - type: mrr_at_5 value: 47.973 - type: ndcg_at_1 value: 40.56 - type: ndcg_at_10 value: 46.972 - type: ndcg_at_100 value: 54.04 - type: ndcg_at_1000 value: 55.862 - type: ndcg_at_3 value: 41.36 - type: ndcg_at_5 value: 43.704 - type: precision_at_1 value: 40.56 - type: precision_at_10 value: 10.302999999999999 - type: precision_at_100 value: 1.606 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 23.064 - type: precision_at_5 value: 16.764000000000003 - type: recall_at_1 value: 27.224999999999998 - type: recall_at_10 value: 58.05200000000001 - type: recall_at_100 value: 87.092 - type: recall_at_1000 value: 99.099 - type: recall_at_3 value: 41.373 - type: recall_at_5 value: 48.453 task: type: Retrieval - dataset: config: default name: MTEB Cmnli revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 split: validation type: C-MTEB/CMNLI metrics: - type: cos_sim_accuracy value: 77.40228502705953 - type: cos_sim_ap value: 86.22359172956327 - type: cos_sim_f1 value: 78.96328293736501 - type: cos_sim_precision value: 73.36945615091311 - type: cos_sim_recall value: 85.48047696983868 - type: dot_accuracy value: 75.53818400481059 - type: dot_ap value: 83.70164011305312 - type: dot_f1 value: 77.67298719348754 - type: dot_precision value: 67.49482401656314 - type: dot_recall value: 91.46598082768296 - type: euclidean_accuracy value: 77.94347564642213 - type: euclidean_ap value: 86.4652108728609 - type: euclidean_f1 value: 79.15555555555555 - type: euclidean_precision value: 75.41816641964853 - type: euclidean_recall value: 83.28267477203647 - type: manhattan_accuracy value: 77.45039085989175 - type: manhattan_ap value: 86.09986583900665 - type: manhattan_f1 value: 78.93669264438988 - type: manhattan_precision value: 72.63261296660117 - type: manhattan_recall value: 86.43909282207154 - type: max_accuracy value: 77.94347564642213 - type: max_ap value: 86.4652108728609 - type: max_f1 value: 79.15555555555555 task: type: PairClassification - dataset: config: default name: MTEB CovidRetrieval revision: 1271c7809071a13532e05f25fb53511ffce77117 split: dev type: C-MTEB/CovidRetrieval metrics: - type: map_at_1 value: 69.336 - type: map_at_10 value: 77.16 - type: map_at_100 value: 77.47500000000001 - type: map_at_1000 value: 77.482 - type: map_at_3 value: 75.42999999999999 - type: map_at_5 value: 76.468 - type: mrr_at_1 value: 69.44200000000001 - type: mrr_at_10 value: 77.132 - type: mrr_at_100 value: 77.43299999999999 - type: mrr_at_1000 value: 77.44 - type: mrr_at_3 value: 75.395 - type: mrr_at_5 value: 76.459 - type: 
ndcg_at_1 value: 69.547 - type: ndcg_at_10 value: 80.794 - type: ndcg_at_100 value: 82.245 - type: ndcg_at_1000 value: 82.40899999999999 - type: ndcg_at_3 value: 77.303 - type: ndcg_at_5 value: 79.168 - type: precision_at_1 value: 69.547 - type: precision_at_10 value: 9.305 - type: precision_at_100 value: 0.9979999999999999 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 27.749000000000002 - type: precision_at_5 value: 17.576 - type: recall_at_1 value: 69.336 - type: recall_at_10 value: 92.097 - type: recall_at_100 value: 98.736 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 82.64 - type: recall_at_5 value: 87.144 task: type: Retrieval - dataset: config: default name: MTEB DuRetrieval revision: a1a333e290fe30b10f3f56498e3a0d911a693ced split: dev type: C-MTEB/DuRetrieval metrics: - type: map_at_1 value: 26.817999999999998 - type: map_at_10 value: 82.67 - type: map_at_100 value: 85.304 - type: map_at_1000 value: 85.334 - type: map_at_3 value: 57.336 - type: map_at_5 value: 72.474 - type: mrr_at_1 value: 91.45 - type: mrr_at_10 value: 94.272 - type: mrr_at_100 value: 94.318 - type: mrr_at_1000 value: 94.32000000000001 - type: mrr_at_3 value: 94.0 - type: mrr_at_5 value: 94.17699999999999 - type: ndcg_at_1 value: 91.45 - type: ndcg_at_10 value: 89.404 - type: ndcg_at_100 value: 91.724 - type: ndcg_at_1000 value: 91.973 - type: ndcg_at_3 value: 88.104 - type: ndcg_at_5 value: 87.25699999999999 - type: precision_at_1 value: 91.45 - type: precision_at_10 value: 42.585 - type: precision_at_100 value: 4.838 - type: precision_at_1000 value: 0.49 - type: precision_at_3 value: 78.8 - type: precision_at_5 value: 66.66 - type: recall_at_1 value: 26.817999999999998 - type: recall_at_10 value: 90.67 - type: recall_at_100 value: 98.36200000000001 - type: recall_at_1000 value: 99.583 - type: recall_at_3 value: 59.614999999999995 - type: recall_at_5 value: 77.05199999999999 task: type: Retrieval - dataset: config: default name: MTEB EcomRetrieval revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 split: dev type: C-MTEB/EcomRetrieval metrics: - type: map_at_1 value: 47.699999999999996 - type: map_at_10 value: 57.589999999999996 - type: map_at_100 value: 58.226 - type: map_at_1000 value: 58.251 - type: map_at_3 value: 55.233 - type: map_at_5 value: 56.633 - type: mrr_at_1 value: 47.699999999999996 - type: mrr_at_10 value: 57.589999999999996 - type: mrr_at_100 value: 58.226 - type: mrr_at_1000 value: 58.251 - type: mrr_at_3 value: 55.233 - type: mrr_at_5 value: 56.633 - type: ndcg_at_1 value: 47.699999999999996 - type: ndcg_at_10 value: 62.505 - type: ndcg_at_100 value: 65.517 - type: ndcg_at_1000 value: 66.19800000000001 - type: ndcg_at_3 value: 57.643 - type: ndcg_at_5 value: 60.181 - type: precision_at_1 value: 47.699999999999996 - type: precision_at_10 value: 7.8 - type: precision_at_100 value: 0.919 - type: precision_at_1000 value: 0.097 - type: precision_at_3 value: 21.532999999999998 - type: precision_at_5 value: 14.16 - type: recall_at_1 value: 47.699999999999996 - type: recall_at_10 value: 78.0 - type: recall_at_100 value: 91.9 - type: recall_at_1000 value: 97.3 - type: recall_at_3 value: 64.60000000000001 - type: recall_at_5 value: 70.8 task: type: Retrieval - dataset: config: default name: MTEB IFlyTek revision: 421605374b29664c5fc098418fe20ada9bd55f8a split: validation type: C-MTEB/IFlyTek-classification metrics: - type: accuracy value: 44.84801846864178 - type: f1 value: 37.47347897956339 task: type: Classification - dataset: config: default name: MTEB JDReview 
revision: b7c64bd89eb87f8ded463478346f76731f07bf8b split: test type: C-MTEB/JDReview-classification metrics: - type: accuracy value: 85.81613508442777 - type: ap value: 52.68244615477374 - type: f1 value: 80.0445640948843 task: type: Classification - dataset: config: default name: MTEB LCQMC revision: 17f9b096f80380fce5ed12a9be8be7784b337daf split: test type: C-MTEB/LCQMC metrics: - type: cos_sim_pearson value: 69.57786502217138 - type: cos_sim_spearman value: 75.39106054489906 - type: euclidean_pearson value: 73.72082954602402 - type: euclidean_spearman value: 75.14421475913619 - type: manhattan_pearson value: 73.62463076633642 - type: manhattan_spearman value: 75.01301565104112 task: type: STS - dataset: config: default name: MTEB MMarcoReranking revision: None split: dev type: C-MTEB/Mmarco-reranking metrics: - type: map value: 29.143797057999134 - type: mrr value: 28.08174603174603 task: type: Reranking - dataset: config: default name: MTEB MMarcoRetrieval revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 split: dev type: C-MTEB/MMarcoRetrieval metrics: - type: map_at_1 value: 70.492 - type: map_at_10 value: 79.501 - type: map_at_100 value: 79.728 - type: map_at_1000 value: 79.735 - type: map_at_3 value: 77.77 - type: map_at_5 value: 78.851 - type: mrr_at_1 value: 72.822 - type: mrr_at_10 value: 80.001 - type: mrr_at_100 value: 80.19 - type: mrr_at_1000 value: 80.197 - type: mrr_at_3 value: 78.484 - type: mrr_at_5 value: 79.42099999999999 - type: ndcg_at_1 value: 72.822 - type: ndcg_at_10 value: 83.013 - type: ndcg_at_100 value: 84.013 - type: ndcg_at_1000 value: 84.20400000000001 - type: ndcg_at_3 value: 79.728 - type: ndcg_at_5 value: 81.542 - type: precision_at_1 value: 72.822 - type: precision_at_10 value: 9.917 - type: precision_at_100 value: 1.042 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 29.847 - type: precision_at_5 value: 18.871 - type: recall_at_1 value: 70.492 - type: recall_at_10 value: 93.325 - type: recall_at_100 value: 97.822 - type: recall_at_1000 value: 99.319 - type: recall_at_3 value: 84.636 - type: recall_at_5 value: 88.93100000000001 task: type: Retrieval - dataset: config: zh-CN name: MTEB MassiveIntentClassification (zh-CN) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 76.88298587760592 - type: f1 value: 73.89001762017176 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveScenarioClassification (zh-CN) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 80.76328177538669 - type: f1 value: 80.24718532423358 task: type: Classification - dataset: config: default name: MTEB MedicalRetrieval revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 split: dev type: C-MTEB/MedicalRetrieval metrics: - type: map_at_1 value: 49.6 - type: map_at_10 value: 55.620999999999995 - type: map_at_100 value: 56.204 - type: map_at_1000 value: 56.251 - type: map_at_3 value: 54.132999999999996 - type: map_at_5 value: 54.933 - type: mrr_at_1 value: 49.7 - type: mrr_at_10 value: 55.67100000000001 - type: mrr_at_100 value: 56.254000000000005 - type: mrr_at_1000 value: 56.301 - type: mrr_at_3 value: 54.18300000000001 - type: mrr_at_5 value: 54.983000000000004 - type: ndcg_at_1 value: 49.6 - type: ndcg_at_10 value: 58.645 - type: ndcg_at_100 value: 61.789 - type: ndcg_at_1000 value: 63.219 - type: ndcg_at_3 value: 55.567 - type: ndcg_at_5 value: 57.008 - type: precision_at_1 value: 49.6 - 
type: precision_at_10 value: 6.819999999999999 - type: precision_at_100 value: 0.836 - type: precision_at_1000 value: 0.095 - type: precision_at_3 value: 19.900000000000002 - type: precision_at_5 value: 12.64 - type: recall_at_1 value: 49.6 - type: recall_at_10 value: 68.2 - type: recall_at_100 value: 83.6 - type: recall_at_1000 value: 95.3 - type: recall_at_3 value: 59.699999999999996 - type: recall_at_5 value: 63.2 task: type: Retrieval - dataset: config: default name: MTEB MultilingualSentiment revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a split: validation type: C-MTEB/MultilingualSentiment-classification metrics: - type: accuracy value: 74.45666666666666 - type: f1 value: 74.32582402190089 task: type: Classification - dataset: config: default name: MTEB Ocnli revision: 66e76a618a34d6d565d5538088562851e6daa7ec split: validation type: C-MTEB/OCNLI metrics: - type: cos_sim_accuracy value: 80.67135896047645 - type: cos_sim_ap value: 87.60421240712051 - type: cos_sim_f1 value: 82.1304131408661 - type: cos_sim_precision value: 77.68361581920904 - type: cos_sim_recall value: 87.11721224920802 - type: dot_accuracy value: 79.04710341093666 - type: dot_ap value: 85.6370059719336 - type: dot_f1 value: 80.763723150358 - type: dot_precision value: 73.69337979094077 - type: dot_recall value: 89.33474128827878 - type: euclidean_accuracy value: 81.05035192203573 - type: euclidean_ap value: 87.7880240053663 - type: euclidean_f1 value: 82.50244379276637 - type: euclidean_precision value: 76.7970882620564 - type: euclidean_recall value: 89.1235480464625 - type: manhattan_accuracy value: 80.61721710882512 - type: manhattan_ap value: 87.43568120591175 - type: manhattan_f1 value: 81.89526184538653 - type: manhattan_precision value: 77.5992438563327 - type: manhattan_recall value: 86.6948257655755 - type: max_accuracy value: 81.05035192203573 - type: max_ap value: 87.7880240053663 - type: max_f1 value: 82.50244379276637 task: type: PairClassification - dataset: config: default name: MTEB OnlineShopping revision: e610f2ebd179a8fda30ae534c3878750a96db120 split: test type: C-MTEB/OnlineShopping-classification metrics: - type: accuracy value: 93.5 - type: ap value: 91.31357903446782 - type: f1 value: 93.48088994006616 task: type: Classification - dataset: config: default name: MTEB PAWSX revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 split: test type: C-MTEB/PAWSX metrics: - type: cos_sim_pearson value: 36.93293453538077 - type: cos_sim_spearman value: 42.45972506308574 - type: euclidean_pearson value: 42.34945133152159 - type: euclidean_spearman value: 42.331610303674644 - type: manhattan_pearson value: 42.31455070249498 - type: manhattan_spearman value: 42.19887982891834 task: type: STS - dataset: config: default name: MTEB QBQTC revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 split: test type: C-MTEB/QBQTC metrics: - type: cos_sim_pearson value: 33.683290790043785 - type: cos_sim_spearman value: 35.149171171202994 - type: euclidean_pearson value: 32.33806561267862 - type: euclidean_spearman value: 34.483576387347966 - type: manhattan_pearson value: 32.47629754599608 - type: manhattan_spearman value: 34.66434471867615 task: type: STS - dataset: config: zh name: MTEB STS22 (zh) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 66.46322760516104 - type: cos_sim_spearman value: 67.398478319726 - type: euclidean_pearson value: 64.7223480293625 - type: euclidean_spearman value: 66.83118568812951 - type: 
manhattan_pearson value: 64.88440039828305 - type: manhattan_spearman value: 66.80429458952257 task: type: STS - dataset: config: default name: MTEB STSB revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 split: test type: C-MTEB/STSB metrics: - type: cos_sim_pearson value: 79.08991383232105 - type: cos_sim_spearman value: 79.39715677296854 - type: euclidean_pearson value: 78.63201279320496 - type: euclidean_spearman value: 79.40262660785731 - type: manhattan_pearson value: 78.98138363146906 - type: manhattan_spearman value: 79.79968413014194 task: type: STS - dataset: config: default name: MTEB T2Reranking revision: 76631901a18387f85eaa53e5450019b87ad58ef9 split: dev type: C-MTEB/T2Reranking metrics: - type: map value: 67.43289278789972 - type: mrr value: 77.53012460908535 task: type: Reranking - dataset: config: default name: MTEB T2Retrieval revision: 8731a845f1bf500a4f111cf1070785c793d10e64 split: dev type: C-MTEB/T2Retrieval metrics: - type: map_at_1 value: 27.733999999999998 - type: map_at_10 value: 78.24799999999999 - type: map_at_100 value: 81.765 - type: map_at_1000 value: 81.824 - type: map_at_3 value: 54.92 - type: map_at_5 value: 67.61399999999999 - type: mrr_at_1 value: 90.527 - type: mrr_at_10 value: 92.843 - type: mrr_at_100 value: 92.927 - type: mrr_at_1000 value: 92.93 - type: mrr_at_3 value: 92.45100000000001 - type: mrr_at_5 value: 92.693 - type: ndcg_at_1 value: 90.527 - type: ndcg_at_10 value: 85.466 - type: ndcg_at_100 value: 88.846 - type: ndcg_at_1000 value: 89.415 - type: ndcg_at_3 value: 86.768 - type: ndcg_at_5 value: 85.46000000000001 - type: precision_at_1 value: 90.527 - type: precision_at_10 value: 42.488 - type: precision_at_100 value: 5.024 - type: precision_at_1000 value: 0.516 - type: precision_at_3 value: 75.907 - type: precision_at_5 value: 63.727000000000004 - type: recall_at_1 value: 27.733999999999998 - type: recall_at_10 value: 84.346 - type: recall_at_100 value: 95.536 - type: recall_at_1000 value: 98.42999999999999 - type: recall_at_3 value: 56.455 - type: recall_at_5 value: 70.755 task: type: Retrieval - dataset: config: default name: MTEB TNews revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 split: validation type: C-MTEB/TNews-classification metrics: - type: accuracy value: 49.952000000000005 - type: f1 value: 48.264617195258054 task: type: Classification - dataset: config: default name: MTEB ThuNewsClusteringP2P revision: 5798586b105c0434e4f0fe5e767abe619442cf93 split: test type: C-MTEB/ThuNewsClusteringP2P metrics: - type: v_measure value: 68.23769904483508 task: type: Clustering - dataset: config: default name: MTEB ThuNewsClusteringS2S revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d split: test type: C-MTEB/ThuNewsClusteringS2S metrics: - type: v_measure value: 62.50294403136556 task: type: Clustering - dataset: config: default name: MTEB VideoRetrieval revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 split: dev type: C-MTEB/VideoRetrieval metrics: - type: map_at_1 value: 54.0 - type: map_at_10 value: 63.668 - type: map_at_100 value: 64.217 - type: map_at_1000 value: 64.23100000000001 - type: map_at_3 value: 61.7 - type: map_at_5 value: 62.870000000000005 - type: mrr_at_1 value: 54.0 - type: mrr_at_10 value: 63.668 - type: mrr_at_100 value: 64.217 - type: mrr_at_1000 value: 64.23100000000001 - type: mrr_at_3 value: 61.7 - type: mrr_at_5 value: 62.870000000000005 - type: ndcg_at_1 value: 54.0 - type: ndcg_at_10 value: 68.11399999999999 - type: ndcg_at_100 value: 70.723 - type: ndcg_at_1000 value: 71.123 - type: ndcg_at_3 value: 
64.074 - type: ndcg_at_5 value: 66.178 - type: precision_at_1 value: 54.0 - type: precision_at_10 value: 8.200000000000001 - type: precision_at_100 value: 0.941 - type: precision_at_1000 value: 0.097 - type: precision_at_3 value: 23.633000000000003 - type: precision_at_5 value: 15.2 - type: recall_at_1 value: 54.0 - type: recall_at_10 value: 82.0 - type: recall_at_100 value: 94.1 - type: recall_at_1000 value: 97.3 - type: recall_at_3 value: 70.89999999999999 - type: recall_at_5 value: 76.0 task: type: Retrieval - dataset: config: default name: MTEB Waimai revision: 339287def212450dcaa9df8c22bf93e9980c7023 split: test type: C-MTEB/waimai-classification metrics: - type: accuracy value: 86.63000000000001 - type: ap value: 69.99457882599567 - type: f1 value: 85.07735617998541 task: type: Classification - dataset: config: default name: MTEB 8TagsClustering revision: None split: test type: PL-MTEB/8tags-clustering metrics: - type: v_measure value: 44.594104491193555 task: type: Clustering - dataset: config: default name: MTEB AllegroReviews revision: None split: test type: PL-MTEB/allegro-reviews metrics: - type: accuracy value: 63.97614314115309 - type: f1 value: 52.15634261679283 task: type: Classification - dataset: config: default name: MTEB ArguAna-PL revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 split: test type: clarin-knext/arguana-pl metrics: - type: map_at_1 value: 32.646 - type: map_at_10 value: 47.963 - type: map_at_100 value: 48.789 - type: map_at_1000 value: 48.797000000000004 - type: map_at_3 value: 43.196 - type: map_at_5 value: 46.016 - type: mrr_at_1 value: 33.073 - type: mrr_at_10 value: 48.126000000000005 - type: mrr_at_100 value: 48.946 - type: mrr_at_1000 value: 48.953 - type: mrr_at_3 value: 43.374 - type: mrr_at_5 value: 46.147 - type: ndcg_at_1 value: 32.646 - type: ndcg_at_10 value: 56.481 - type: ndcg_at_100 value: 59.922 - type: ndcg_at_1000 value: 60.07 - type: ndcg_at_3 value: 46.675 - type: ndcg_at_5 value: 51.76500000000001 - type: precision_at_1 value: 32.646 - type: precision_at_10 value: 8.371 - type: precision_at_100 value: 0.9860000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 18.919 - type: precision_at_5 value: 13.825999999999999 - type: recall_at_1 value: 32.646 - type: recall_at_10 value: 83.71300000000001 - type: recall_at_100 value: 98.578 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 56.757000000000005 - type: recall_at_5 value: 69.132 task: type: Retrieval - dataset: config: default name: MTEB CBD revision: None split: test type: PL-MTEB/cbd metrics: - type: accuracy value: 68.56 - type: ap value: 23.310493680488513 - type: f1 value: 58.85369533105693 task: type: Classification - dataset: config: default name: MTEB CDSC-E revision: None split: test type: PL-MTEB/cdsce-pairclassification metrics: - type: cos_sim_accuracy value: 88.5 - type: cos_sim_ap value: 72.42140924378361 - type: cos_sim_f1 value: 66.0919540229885 - type: cos_sim_precision value: 72.78481012658227 - type: cos_sim_recall value: 60.526315789473685 - type: dot_accuracy value: 88.5 - type: dot_ap value: 72.42140924378361 - type: dot_f1 value: 66.0919540229885 - type: dot_precision value: 72.78481012658227 - type: dot_recall value: 60.526315789473685 - type: euclidean_accuracy value: 88.5 - type: euclidean_ap value: 72.42140924378361 - type: euclidean_f1 value: 66.0919540229885 - type: euclidean_precision value: 72.78481012658227 - type: euclidean_recall value: 60.526315789473685 - type: manhattan_accuracy value: 88.5 - type: 
manhattan_ap value: 72.49745515311696 - type: manhattan_f1 value: 66.0968660968661 - type: manhattan_precision value: 72.04968944099379 - type: manhattan_recall value: 61.05263157894737 - type: max_accuracy value: 88.5 - type: max_ap value: 72.49745515311696 - type: max_f1 value: 66.0968660968661 task: type: PairClassification - dataset: config: default name: MTEB CDSC-R revision: None split: test type: PL-MTEB/cdscr-sts metrics: - type: cos_sim_pearson value: 90.32269765590145 - type: cos_sim_spearman value: 89.73666311491672 - type: euclidean_pearson value: 88.2933868516544 - type: euclidean_spearman value: 89.73666311491672 - type: manhattan_pearson value: 88.33474590219448 - type: manhattan_spearman value: 89.8548364866583 task: type: STS - dataset: config: default name: MTEB DBPedia-PL revision: 76afe41d9af165cc40999fcaa92312b8b012064a split: test type: clarin-knext/dbpedia-pl metrics: - type: map_at_1 value: 7.632999999999999 - type: map_at_10 value: 16.426 - type: map_at_100 value: 22.651 - type: map_at_1000 value: 24.372 - type: map_at_3 value: 11.706 - type: map_at_5 value: 13.529 - type: mrr_at_1 value: 60.75000000000001 - type: mrr_at_10 value: 68.613 - type: mrr_at_100 value: 69.001 - type: mrr_at_1000 value: 69.021 - type: mrr_at_3 value: 67.0 - type: mrr_at_5 value: 67.925 - type: ndcg_at_1 value: 49.875 - type: ndcg_at_10 value: 36.978 - type: ndcg_at_100 value: 40.031 - type: ndcg_at_1000 value: 47.566 - type: ndcg_at_3 value: 41.148 - type: ndcg_at_5 value: 38.702 - type: precision_at_1 value: 60.75000000000001 - type: precision_at_10 value: 29.7 - type: precision_at_100 value: 9.278 - type: precision_at_1000 value: 2.099 - type: precision_at_3 value: 44.0 - type: precision_at_5 value: 37.6 - type: recall_at_1 value: 7.632999999999999 - type: recall_at_10 value: 22.040000000000003 - type: recall_at_100 value: 44.024 - type: recall_at_1000 value: 67.848 - type: recall_at_3 value: 13.093 - type: recall_at_5 value: 15.973 task: type: Retrieval - dataset: config: default name: MTEB FiQA-PL revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e split: test type: clarin-knext/fiqa-pl metrics: - type: map_at_1 value: 15.473 - type: map_at_10 value: 24.579 - type: map_at_100 value: 26.387 - type: map_at_1000 value: 26.57 - type: map_at_3 value: 21.278 - type: map_at_5 value: 23.179 - type: mrr_at_1 value: 30.709999999999997 - type: mrr_at_10 value: 38.994 - type: mrr_at_100 value: 39.993 - type: mrr_at_1000 value: 40.044999999999995 - type: mrr_at_3 value: 36.342999999999996 - type: mrr_at_5 value: 37.846999999999994 - type: ndcg_at_1 value: 30.709999999999997 - type: ndcg_at_10 value: 31.608999999999998 - type: ndcg_at_100 value: 38.807 - type: ndcg_at_1000 value: 42.208 - type: ndcg_at_3 value: 28.086 - type: ndcg_at_5 value: 29.323 - type: precision_at_1 value: 30.709999999999997 - type: precision_at_10 value: 8.688 - type: precision_at_100 value: 1.608 - type: precision_at_1000 value: 0.22100000000000003 - type: precision_at_3 value: 18.724 - type: precision_at_5 value: 13.950999999999999 - type: recall_at_1 value: 15.473 - type: recall_at_10 value: 38.361000000000004 - type: recall_at_100 value: 65.2 - type: recall_at_1000 value: 85.789 - type: recall_at_3 value: 25.401 - type: recall_at_5 value: 30.875999999999998 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA-PL revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 split: test type: clarin-knext/hotpotqa-pl metrics: - type: map_at_1 value: 38.096000000000004 - type: map_at_10 value: 51.44499999999999 - type: 
map_at_100 value: 52.325 - type: map_at_1000 value: 52.397000000000006 - type: map_at_3 value: 48.626999999999995 - type: map_at_5 value: 50.342 - type: mrr_at_1 value: 76.19200000000001 - type: mrr_at_10 value: 81.191 - type: mrr_at_100 value: 81.431 - type: mrr_at_1000 value: 81.443 - type: mrr_at_3 value: 80.30199999999999 - type: mrr_at_5 value: 80.85900000000001 - type: ndcg_at_1 value: 76.19200000000001 - type: ndcg_at_10 value: 60.9 - type: ndcg_at_100 value: 64.14699999999999 - type: ndcg_at_1000 value: 65.647 - type: ndcg_at_3 value: 56.818000000000005 - type: ndcg_at_5 value: 59.019999999999996 - type: precision_at_1 value: 76.19200000000001 - type: precision_at_10 value: 12.203 - type: precision_at_100 value: 1.478 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 34.616 - type: precision_at_5 value: 22.515 - type: recall_at_1 value: 38.096000000000004 - type: recall_at_10 value: 61.013 - type: recall_at_100 value: 73.90299999999999 - type: recall_at_1000 value: 83.91 - type: recall_at_3 value: 51.92400000000001 - type: recall_at_5 value: 56.286 task: type: Retrieval - dataset: config: default name: MTEB MSMARCO-PL revision: 8634c07806d5cce3a6138e260e59b81760a0a640 split: test type: clarin-knext/msmarco-pl metrics: - type: map_at_1 value: 1.548 - type: map_at_10 value: 11.049000000000001 - type: map_at_100 value: 28.874 - type: map_at_1000 value: 34.931 - type: map_at_3 value: 4.162 - type: map_at_5 value: 6.396 - type: mrr_at_1 value: 90.69800000000001 - type: mrr_at_10 value: 92.093 - type: mrr_at_100 value: 92.345 - type: mrr_at_1000 value: 92.345 - type: mrr_at_3 value: 91.86 - type: mrr_at_5 value: 91.86 - type: ndcg_at_1 value: 74.031 - type: ndcg_at_10 value: 63.978 - type: ndcg_at_100 value: 53.101 - type: ndcg_at_1000 value: 60.675999999999995 - type: ndcg_at_3 value: 71.421 - type: ndcg_at_5 value: 68.098 - type: precision_at_1 value: 90.69800000000001 - type: precision_at_10 value: 71.86 - type: precision_at_100 value: 31.395 - type: precision_at_1000 value: 5.981 - type: precision_at_3 value: 84.49600000000001 - type: precision_at_5 value: 79.07 - type: recall_at_1 value: 1.548 - type: recall_at_10 value: 12.149000000000001 - type: recall_at_100 value: 40.794999999999995 - type: recall_at_1000 value: 67.974 - type: recall_at_3 value: 4.244 - type: recall_at_5 value: 6.608 task: type: Retrieval - dataset: config: pl name: MTEB MassiveIntentClassification (pl) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 73.55413584398119 - type: f1 value: 69.65610882318181 task: type: Classification - dataset: config: pl name: MTEB MassiveScenarioClassification (pl) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 76.37188971082716 - type: f1 value: 75.64847309941361 task: type: Classification - dataset: config: default name: MTEB NFCorpus-PL revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 split: test type: clarin-knext/nfcorpus-pl metrics: - type: map_at_1 value: 4.919 - type: map_at_10 value: 10.834000000000001 - type: map_at_100 value: 13.38 - type: map_at_1000 value: 14.581 - type: map_at_3 value: 8.198 - type: map_at_5 value: 9.428 - type: mrr_at_1 value: 41.176 - type: mrr_at_10 value: 50.083 - type: mrr_at_100 value: 50.559 - type: mrr_at_1000 value: 50.604000000000006 - type: mrr_at_3 value: 47.936 - type: mrr_at_5 value: 49.407000000000004 - type: ndcg_at_1 value: 39.628 - type: ndcg_at_10 
value: 30.098000000000003 - type: ndcg_at_100 value: 27.061 - type: ndcg_at_1000 value: 35.94 - type: ndcg_at_3 value: 35.135 - type: ndcg_at_5 value: 33.335 - type: precision_at_1 value: 41.176 - type: precision_at_10 value: 22.259999999999998 - type: precision_at_100 value: 6.712 - type: precision_at_1000 value: 1.9060000000000001 - type: precision_at_3 value: 33.23 - type: precision_at_5 value: 29.04 - type: recall_at_1 value: 4.919 - type: recall_at_10 value: 14.196 - type: recall_at_100 value: 26.948 - type: recall_at_1000 value: 59.211000000000006 - type: recall_at_3 value: 9.44 - type: recall_at_5 value: 11.569 task: type: Retrieval - dataset: config: default name: MTEB NQ-PL revision: f171245712cf85dd4700b06bef18001578d0ca8d split: test type: clarin-knext/nq-pl metrics: - type: map_at_1 value: 25.35 - type: map_at_10 value: 37.884 - type: map_at_100 value: 38.955 - type: map_at_1000 value: 39.007999999999996 - type: map_at_3 value: 34.239999999999995 - type: map_at_5 value: 36.398 - type: mrr_at_1 value: 28.737000000000002 - type: mrr_at_10 value: 39.973 - type: mrr_at_100 value: 40.844 - type: mrr_at_1000 value: 40.885 - type: mrr_at_3 value: 36.901 - type: mrr_at_5 value: 38.721 - type: ndcg_at_1 value: 28.708 - type: ndcg_at_10 value: 44.204 - type: ndcg_at_100 value: 48.978 - type: ndcg_at_1000 value: 50.33 - type: ndcg_at_3 value: 37.36 - type: ndcg_at_5 value: 40.912 - type: precision_at_1 value: 28.708 - type: precision_at_10 value: 7.367 - type: precision_at_100 value: 1.0030000000000001 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 17.034 - type: precision_at_5 value: 12.293999999999999 - type: recall_at_1 value: 25.35 - type: recall_at_10 value: 61.411 - type: recall_at_100 value: 82.599 - type: recall_at_1000 value: 92.903 - type: recall_at_3 value: 43.728 - type: recall_at_5 value: 51.854 task: type: Retrieval - dataset: config: default name: MTEB PAC revision: None split: test type: laugustyniak/abusive-clauses-pl metrics: - type: accuracy value: 69.04141326382856 - type: ap value: 77.49422763833996 - type: f1 value: 66.73472657783407 task: type: Classification - dataset: config: default name: MTEB PPC revision: None split: test type: PL-MTEB/ppc-pairclassification metrics: - type: cos_sim_accuracy value: 81.0 - type: cos_sim_ap value: 91.47194213011349 - type: cos_sim_f1 value: 84.73767885532592 - type: cos_sim_precision value: 81.49847094801224 - type: cos_sim_recall value: 88.24503311258279 - type: dot_accuracy value: 81.0 - type: dot_ap value: 91.47194213011349 - type: dot_f1 value: 84.73767885532592 - type: dot_precision value: 81.49847094801224 - type: dot_recall value: 88.24503311258279 - type: euclidean_accuracy value: 81.0 - type: euclidean_ap value: 91.47194213011349 - type: euclidean_f1 value: 84.73767885532592 - type: euclidean_precision value: 81.49847094801224 - type: euclidean_recall value: 88.24503311258279 - type: manhattan_accuracy value: 81.0 - type: manhattan_ap value: 91.46464475050571 - type: manhattan_f1 value: 84.48687350835321 - type: manhattan_precision value: 81.31699846860643 - type: manhattan_recall value: 87.91390728476821 - type: max_accuracy value: 81.0 - type: max_ap value: 91.47194213011349 - type: max_f1 value: 84.73767885532592 task: type: PairClassification - dataset: config: default name: MTEB PSC revision: None split: test type: PL-MTEB/psc-pairclassification metrics: - type: cos_sim_accuracy value: 97.6808905380334 - type: cos_sim_ap value: 99.27948611836348 - type: cos_sim_f1 value: 
96.15975422427034 - type: cos_sim_precision value: 96.90402476780186 - type: cos_sim_recall value: 95.42682926829268 - type: dot_accuracy value: 97.6808905380334 - type: dot_ap value: 99.2794861183635 - type: dot_f1 value: 96.15975422427034 - type: dot_precision value: 96.90402476780186 - type: dot_recall value: 95.42682926829268 - type: euclidean_accuracy value: 97.6808905380334 - type: euclidean_ap value: 99.2794861183635 - type: euclidean_f1 value: 96.15975422427034 - type: euclidean_precision value: 96.90402476780186 - type: euclidean_recall value: 95.42682926829268 - type: manhattan_accuracy value: 97.6808905380334 - type: manhattan_ap value: 99.28715055268721 - type: manhattan_f1 value: 96.14791987673343 - type: manhattan_precision value: 97.19626168224299 - type: manhattan_recall value: 95.1219512195122 - type: max_accuracy value: 97.6808905380334 - type: max_ap value: 99.28715055268721 - type: max_f1 value: 96.15975422427034 task: type: PairClassification - dataset: config: default name: MTEB PolEmo2.0-IN revision: None split: test type: PL-MTEB/polemo2_in metrics: - type: accuracy value: 86.16343490304708 - type: f1 value: 83.3442579486744 task: type: Classification - dataset: config: default name: MTEB PolEmo2.0-OUT revision: None split: test type: PL-MTEB/polemo2_out metrics: - type: accuracy value: 68.40080971659918 - type: f1 value: 53.13720751142237 task: type: Classification - dataset: config: default name: MTEB Quora-PL revision: 0be27e93455051e531182b85e85e425aba12e9d4 split: test type: clarin-knext/quora-pl metrics: - type: map_at_1 value: 63.322 - type: map_at_10 value: 76.847 - type: map_at_100 value: 77.616 - type: map_at_1000 value: 77.644 - type: map_at_3 value: 73.624 - type: map_at_5 value: 75.603 - type: mrr_at_1 value: 72.88 - type: mrr_at_10 value: 80.376 - type: mrr_at_100 value: 80.604 - type: mrr_at_1000 value: 80.61 - type: mrr_at_3 value: 78.92 - type: mrr_at_5 value: 79.869 - type: ndcg_at_1 value: 72.89999999999999 - type: ndcg_at_10 value: 81.43 - type: ndcg_at_100 value: 83.394 - type: ndcg_at_1000 value: 83.685 - type: ndcg_at_3 value: 77.62599999999999 - type: ndcg_at_5 value: 79.656 - type: precision_at_1 value: 72.89999999999999 - type: precision_at_10 value: 12.548 - type: precision_at_100 value: 1.4869999999999999 - type: precision_at_1000 value: 0.155 - type: precision_at_3 value: 34.027 - type: precision_at_5 value: 22.654 - type: recall_at_1 value: 63.322 - type: recall_at_10 value: 90.664 - type: recall_at_100 value: 97.974 - type: recall_at_1000 value: 99.636 - type: recall_at_3 value: 80.067 - type: recall_at_5 value: 85.526 task: type: Retrieval - dataset: config: default name: MTEB SCIDOCS-PL revision: 45452b03f05560207ef19149545f168e596c9337 split: test type: clarin-knext/scidocs-pl metrics: - type: map_at_1 value: 3.95 - type: map_at_10 value: 9.658999999999999 - type: map_at_100 value: 11.384 - type: map_at_1000 value: 11.677 - type: map_at_3 value: 7.055 - type: map_at_5 value: 8.244 - type: mrr_at_1 value: 19.5 - type: mrr_at_10 value: 28.777 - type: mrr_at_100 value: 29.936 - type: mrr_at_1000 value: 30.009999999999998 - type: mrr_at_3 value: 25.55 - type: mrr_at_5 value: 27.284999999999997 - type: ndcg_at_1 value: 19.5 - type: ndcg_at_10 value: 16.589000000000002 - type: ndcg_at_100 value: 23.879 - type: ndcg_at_1000 value: 29.279 - type: ndcg_at_3 value: 15.719 - type: ndcg_at_5 value: 13.572000000000001 - type: precision_at_1 value: 19.5 - type: precision_at_10 value: 8.62 - type: precision_at_100 value: 1.924 - type: 
precision_at_1000 value: 0.322 - type: precision_at_3 value: 14.6 - type: precision_at_5 value: 11.78 - type: recall_at_1 value: 3.95 - type: recall_at_10 value: 17.477999999999998 - type: recall_at_100 value: 38.99 - type: recall_at_1000 value: 65.417 - type: recall_at_3 value: 8.883000000000001 - type: recall_at_5 value: 11.933 task: type: Retrieval - dataset: config: default name: MTEB SICK-E-PL revision: None split: test type: PL-MTEB/sicke-pl-pairclassification metrics: - type: cos_sim_accuracy value: 83.48960456583775 - type: cos_sim_ap value: 76.31522115825375 - type: cos_sim_f1 value: 70.35573122529645 - type: cos_sim_precision value: 70.9934735315446 - type: cos_sim_recall value: 69.72934472934473 - type: dot_accuracy value: 83.48960456583775 - type: dot_ap value: 76.31522115825373 - type: dot_f1 value: 70.35573122529645 - type: dot_precision value: 70.9934735315446 - type: dot_recall value: 69.72934472934473 - type: euclidean_accuracy value: 83.48960456583775 - type: euclidean_ap value: 76.31522115825373 - type: euclidean_f1 value: 70.35573122529645 - type: euclidean_precision value: 70.9934735315446 - type: euclidean_recall value: 69.72934472934473 - type: manhattan_accuracy value: 83.46922136159804 - type: manhattan_ap value: 76.18474601388084 - type: manhattan_f1 value: 70.34779490856937 - type: manhattan_precision value: 70.83032490974729 - type: manhattan_recall value: 69.87179487179486 - type: max_accuracy value: 83.48960456583775 - type: max_ap value: 76.31522115825375 - type: max_f1 value: 70.35573122529645 task: type: PairClassification - dataset: config: default name: MTEB SICK-R-PL revision: None split: test type: PL-MTEB/sickr-pl-sts metrics: - type: cos_sim_pearson value: 77.95374883876302 - type: cos_sim_spearman value: 73.77630219171942 - type: euclidean_pearson value: 75.81927069594934 - type: euclidean_spearman value: 73.7763211303831 - type: manhattan_pearson value: 76.03126859057528 - type: manhattan_spearman value: 73.96528138013369 task: type: STS - dataset: config: pl name: MTEB STS22 (pl) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 37.388282764841826 - type: cos_sim_spearman value: 40.83477184710897 - type: euclidean_pearson value: 26.754737044177805 - type: euclidean_spearman value: 40.83477184710897 - type: manhattan_pearson value: 26.760453110872458 - type: manhattan_spearman value: 41.034477441383856 task: type: STS - dataset: config: default name: MTEB SciFact-PL revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e split: test type: clarin-knext/scifact-pl metrics: - type: map_at_1 value: 49.15 - type: map_at_10 value: 61.690999999999995 - type: map_at_100 value: 62.348000000000006 - type: map_at_1000 value: 62.38 - type: map_at_3 value: 58.824 - type: map_at_5 value: 60.662000000000006 - type: mrr_at_1 value: 51.333 - type: mrr_at_10 value: 62.731 - type: mrr_at_100 value: 63.245 - type: mrr_at_1000 value: 63.275000000000006 - type: mrr_at_3 value: 60.667 - type: mrr_at_5 value: 61.93300000000001 - type: ndcg_at_1 value: 51.333 - type: ndcg_at_10 value: 67.168 - type: ndcg_at_100 value: 69.833 - type: ndcg_at_1000 value: 70.56700000000001 - type: ndcg_at_3 value: 62.40599999999999 - type: ndcg_at_5 value: 65.029 - type: precision_at_1 value: 51.333 - type: precision_at_10 value: 9.333 - type: precision_at_100 value: 1.0699999999999998 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.333 - type: precision_at_5 value: 17.067 - 
type: recall_at_1 value: 49.15 - type: recall_at_10 value: 82.533 - type: recall_at_100 value: 94.167 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 69.917 - type: recall_at_5 value: 76.356 task: type: Retrieval - dataset: config: default name: MTEB TRECCOVID-PL revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd split: test type: clarin-knext/trec-covid-pl metrics: - type: map_at_1 value: 0.261 - type: map_at_10 value: 2.1260000000000003 - type: map_at_100 value: 12.171999999999999 - type: map_at_1000 value: 26.884999999999998 - type: map_at_3 value: 0.695 - type: map_at_5 value: 1.134 - type: mrr_at_1 value: 96.0 - type: mrr_at_10 value: 96.952 - type: mrr_at_100 value: 96.952 - type: mrr_at_1000 value: 96.952 - type: mrr_at_3 value: 96.667 - type: mrr_at_5 value: 96.667 - type: ndcg_at_1 value: 92.0 - type: ndcg_at_10 value: 81.193 - type: ndcg_at_100 value: 61.129 - type: ndcg_at_1000 value: 51.157 - type: ndcg_at_3 value: 85.693 - type: ndcg_at_5 value: 84.129 - type: precision_at_1 value: 96.0 - type: precision_at_10 value: 85.39999999999999 - type: precision_at_100 value: 62.03999999999999 - type: precision_at_1000 value: 22.224 - type: precision_at_3 value: 88.0 - type: precision_at_5 value: 88.0 - type: recall_at_1 value: 0.261 - type: recall_at_10 value: 2.262 - type: recall_at_100 value: 14.981 - type: recall_at_1000 value: 46.837 - type: recall_at_3 value: 0.703 - type: recall_at_5 value: 1.172 task: type: Retrieval - dataset: config: default name: MTEB AlloProfClusteringP2P revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: v_measure value: 70.55290063940157 task: type: Clustering - dataset: config: default name: MTEB AlloProfClusteringS2S revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: v_measure value: 55.41500719337263 task: type: Clustering - dataset: config: default name: MTEB AlloprofReranking revision: 666fdacebe0291776e86f29345663dfaf80a0db9 split: test type: lyon-nlp/mteb-fr-reranking-alloprof-s2p metrics: - type: map value: 73.48697375332002 - type: mrr value: 75.01836585523822 task: type: Reranking - dataset: config: default name: MTEB AlloprofRetrieval revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: map_at_1 value: 38.454 - type: map_at_10 value: 51.605000000000004 - type: map_at_100 value: 52.653000000000006 - type: map_at_1000 value: 52.697 - type: map_at_3 value: 48.304 - type: map_at_5 value: 50.073 - type: mrr_at_1 value: 43.307 - type: mrr_at_10 value: 54.400000000000006 - type: mrr_at_100 value: 55.147999999999996 - type: mrr_at_1000 value: 55.174 - type: mrr_at_3 value: 51.77 - type: mrr_at_5 value: 53.166999999999994 - type: ndcg_at_1 value: 43.307 - type: ndcg_at_10 value: 57.891000000000005 - type: ndcg_at_100 value: 62.161 - type: ndcg_at_1000 value: 63.083 - type: ndcg_at_3 value: 51.851 - type: ndcg_at_5 value: 54.605000000000004 - type: precision_at_1 value: 43.307 - type: precision_at_10 value: 9.033 - type: precision_at_100 value: 1.172 - type: precision_at_1000 value: 0.127 - type: precision_at_3 value: 22.798 - type: precision_at_5 value: 15.492 - type: recall_at_1 value: 38.454 - type: recall_at_10 value: 74.166 - type: recall_at_100 value: 92.43599999999999 - type: recall_at_1000 value: 99.071 - type: recall_at_3 value: 58.087 - type: recall_at_5 value: 64.568 task: type: Retrieval - dataset: config: fr name: MTEB AmazonReviewsClassification (fr) revision: 
1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 53.474 - type: f1 value: 50.38275392350236 task: type: Classification - dataset: config: default name: MTEB BSARDRetrieval revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 split: test type: maastrichtlawtech/bsard metrics: - type: map_at_1 value: 2.252 - type: map_at_10 value: 4.661 - type: map_at_100 value: 5.271 - type: map_at_1000 value: 5.3629999999999995 - type: map_at_3 value: 3.604 - type: map_at_5 value: 4.3020000000000005 - type: mrr_at_1 value: 2.252 - type: mrr_at_10 value: 4.661 - type: mrr_at_100 value: 5.271 - type: mrr_at_1000 value: 5.3629999999999995 - type: mrr_at_3 value: 3.604 - type: mrr_at_5 value: 4.3020000000000005 - type: ndcg_at_1 value: 2.252 - type: ndcg_at_10 value: 6.3020000000000005 - type: ndcg_at_100 value: 10.342 - type: ndcg_at_1000 value: 13.475999999999999 - type: ndcg_at_3 value: 4.0649999999999995 - type: ndcg_at_5 value: 5.344 - type: precision_at_1 value: 2.252 - type: precision_at_10 value: 1.171 - type: precision_at_100 value: 0.333 - type: precision_at_1000 value: 0.059000000000000004 - type: precision_at_3 value: 1.802 - type: precision_at_5 value: 1.712 - type: recall_at_1 value: 2.252 - type: recall_at_10 value: 11.712 - type: recall_at_100 value: 33.333 - type: recall_at_1000 value: 59.458999999999996 - type: recall_at_3 value: 5.405 - type: recall_at_5 value: 8.559 task: type: Retrieval - dataset: config: default name: MTEB HALClusteringS2S revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 split: test type: lyon-nlp/clustering-hal-s2s metrics: - type: v_measure value: 28.301882091023288 task: type: Clustering - dataset: config: default name: MTEB MLSUMClusteringP2P revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: mlsum metrics: - type: v_measure value: 45.26992995191701 task: type: Clustering - dataset: config: default name: MTEB MLSUMClusteringS2S revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: mlsum metrics: - type: v_measure value: 42.773174876871145 task: type: Clustering - dataset: config: fr name: MTEB MTOPDomainClassification (fr) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 93.47635452552458 - type: f1 value: 93.19922617577213 task: type: Classification - dataset: config: fr name: MTEB MTOPIntentClassification (fr) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 80.2317569683683 - type: f1 value: 56.18060418621901 task: type: Classification - dataset: config: fra name: MTEB MasakhaNEWSClassification (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: accuracy value: 85.18957345971565 - type: f1 value: 80.829981537394 task: type: Classification - dataset: config: fra name: MTEB MasakhaNEWSClusteringP2P (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: v_measure value: 71.04138999801822 task: type: Clustering - dataset: config: fra name: MTEB MasakhaNEWSClusteringS2S (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: v_measure value: 71.7056263158008 task: type: Clustering - dataset: config: fr name: MTEB MassiveIntentClassification (fr) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: 
accuracy value: 76.65097511768661 - type: f1 value: 73.82441070598712 task: type: Classification - dataset: config: fr name: MTEB MassiveScenarioClassification (fr) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 79.09885675857431 - type: f1 value: 78.28407777434224 task: type: Classification - dataset: config: fr name: MTEB MintakaRetrieval (fr) revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e split: test type: jinaai/mintakaqa metrics: - type: map_at_1 value: 25.307000000000002 - type: map_at_10 value: 36.723 - type: map_at_100 value: 37.713 - type: map_at_1000 value: 37.769000000000005 - type: map_at_3 value: 33.77 - type: map_at_5 value: 35.463 - type: mrr_at_1 value: 25.307000000000002 - type: mrr_at_10 value: 36.723 - type: mrr_at_100 value: 37.713 - type: mrr_at_1000 value: 37.769000000000005 - type: mrr_at_3 value: 33.77 - type: mrr_at_5 value: 35.463 - type: ndcg_at_1 value: 25.307000000000002 - type: ndcg_at_10 value: 42.559999999999995 - type: ndcg_at_100 value: 47.457 - type: ndcg_at_1000 value: 49.162 - type: ndcg_at_3 value: 36.461 - type: ndcg_at_5 value: 39.504 - type: precision_at_1 value: 25.307000000000002 - type: precision_at_10 value: 6.106 - type: precision_at_100 value: 0.8420000000000001 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 14.741999999999999 - type: precision_at_5 value: 10.319 - type: recall_at_1 value: 25.307000000000002 - type: recall_at_10 value: 61.056999999999995 - type: recall_at_100 value: 84.152 - type: recall_at_1000 value: 98.03399999999999 - type: recall_at_3 value: 44.226 - type: recall_at_5 value: 51.597 task: type: Retrieval - dataset: config: fr name: MTEB OpusparcusPC (fr) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test type: GEM/opusparcus metrics: - type: cos_sim_accuracy value: 99.90069513406156 - type: cos_sim_ap value: 100.0 - type: cos_sim_f1 value: 99.95032290114257 - type: cos_sim_precision value: 100.0 - type: cos_sim_recall value: 99.90069513406156 - type: dot_accuracy value: 99.90069513406156 - type: dot_ap value: 100.0 - type: dot_f1 value: 99.95032290114257 - type: dot_precision value: 100.0 - type: dot_recall value: 99.90069513406156 - type: euclidean_accuracy value: 99.90069513406156 - type: euclidean_ap value: 100.0 - type: euclidean_f1 value: 99.95032290114257 - type: euclidean_precision value: 100.0 - type: euclidean_recall value: 99.90069513406156 - type: manhattan_accuracy value: 99.90069513406156 - type: manhattan_ap value: 100.0 - type: manhattan_f1 value: 99.95032290114257 - type: manhattan_precision value: 100.0 - type: manhattan_recall value: 99.90069513406156 - type: max_accuracy value: 99.90069513406156 - type: max_ap value: 100.0 - type: max_f1 value: 99.95032290114257 task: type: PairClassification - dataset: config: fr name: MTEB PawsX (fr) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: paws-x metrics: - type: cos_sim_accuracy value: 70.8 - type: cos_sim_ap value: 73.7671529695957 - type: cos_sim_f1 value: 68.80964339527875 - type: cos_sim_precision value: 62.95955882352941 - type: cos_sim_recall value: 75.85825027685493 - type: dot_accuracy value: 70.8 - type: dot_ap value: 73.78345265366947 - type: dot_f1 value: 68.80964339527875 - type: dot_precision value: 62.95955882352941 - type: dot_recall value: 75.85825027685493 - type: euclidean_accuracy value: 70.8 - type: euclidean_ap value: 73.7671529695957 - type: euclidean_f1 value: 68.80964339527875 - type: 
euclidean_precision value: 62.95955882352941 - type: euclidean_recall value: 75.85825027685493 - type: manhattan_accuracy value: 70.75 - type: manhattan_ap value: 73.78996383615953 - type: manhattan_f1 value: 68.79432624113475 - type: manhattan_precision value: 63.39869281045751 - type: manhattan_recall value: 75.1937984496124 - type: max_accuracy value: 70.8 - type: max_ap value: 73.78996383615953 - type: max_f1 value: 68.80964339527875 task: type: PairClassification - dataset: config: default name: MTEB SICKFr revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a split: test type: Lajavaness/SICK-fr metrics: - type: cos_sim_pearson value: 84.03253762760392 - type: cos_sim_spearman value: 79.68280105762004 - type: euclidean_pearson value: 80.98265050044444 - type: euclidean_spearman value: 79.68233242682867 - type: manhattan_pearson value: 80.9678911810704 - type: manhattan_spearman value: 79.70264097683109 task: type: STS - dataset: config: fr name: MTEB STS22 (fr) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 80.56896987572884 - type: cos_sim_spearman value: 81.84352499523287 - type: euclidean_pearson value: 80.40831759421305 - type: euclidean_spearman value: 81.84352499523287 - type: manhattan_pearson value: 80.74333857561238 - type: manhattan_spearman value: 82.41503246733892 task: type: STS - dataset: config: fr name: MTEB STSBenchmarkMultilingualSTS (fr) revision: 93d57ef91790589e3ce9c365164337a8a78b7632 split: test type: stsb_multi_mt metrics: - type: cos_sim_pearson value: 82.71826762276979 - type: cos_sim_spearman value: 82.25433354916042 - type: euclidean_pearson value: 81.87115571724316 - type: euclidean_spearman value: 82.25322342890107 - type: manhattan_pearson value: 82.11174867527224 - type: manhattan_spearman value: 82.55905365203084 task: type: STS - dataset: config: default name: MTEB SummEvalFr revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 split: test type: lyon-nlp/summarization-summeval-fr-p2p metrics: - type: cos_sim_pearson value: 30.659441623392887 - type: cos_sim_spearman value: 30.501134097353315 - type: dot_pearson value: 30.659444768851056 - type: dot_spearman value: 30.501134097353315 task: type: Summarization - dataset: config: default name: MTEB SyntecReranking revision: b205c5084a0934ce8af14338bf03feb19499c84d split: test type: lyon-nlp/mteb-fr-reranking-syntec-s2p metrics: - type: map value: 94.03333333333333 - type: mrr value: 94.03333333333333 task: type: Reranking - dataset: config: default name: MTEB SyntecRetrieval revision: 77f7e271bf4a92b24fce5119f3486b583ca016ff split: test type: lyon-nlp/mteb-fr-retrieval-syntec-s2p metrics: - type: map_at_1 value: 79.0 - type: map_at_10 value: 87.61 - type: map_at_100 value: 87.655 - type: map_at_1000 value: 87.655 - type: map_at_3 value: 87.167 - type: map_at_5 value: 87.36699999999999 - type: mrr_at_1 value: 79.0 - type: mrr_at_10 value: 87.61 - type: mrr_at_100 value: 87.655 - type: mrr_at_1000 value: 87.655 - type: mrr_at_3 value: 87.167 - type: mrr_at_5 value: 87.36699999999999 - type: ndcg_at_1 value: 79.0 - type: ndcg_at_10 value: 90.473 - type: ndcg_at_100 value: 90.694 - type: ndcg_at_1000 value: 90.694 - type: ndcg_at_3 value: 89.464 - type: ndcg_at_5 value: 89.851 - type: precision_at_1 value: 79.0 - type: precision_at_10 value: 9.9 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 32.0 - type: precision_at_5 value: 19.400000000000002 - type: recall_at_1 
value: 79.0 - type: recall_at_10 value: 99.0 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 96.0 - type: recall_at_5 value: 97.0 task: type: Retrieval - dataset: config: fr name: MTEB XPQARetrieval (fr) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: map_at_1 value: 39.395 - type: map_at_10 value: 59.123999999999995 - type: map_at_100 value: 60.704 - type: map_at_1000 value: 60.760000000000005 - type: map_at_3 value: 53.187 - type: map_at_5 value: 56.863 - type: mrr_at_1 value: 62.083 - type: mrr_at_10 value: 68.87299999999999 - type: mrr_at_100 value: 69.46900000000001 - type: mrr_at_1000 value: 69.48299999999999 - type: mrr_at_3 value: 66.8 - type: mrr_at_5 value: 67.928 - type: ndcg_at_1 value: 62.083 - type: ndcg_at_10 value: 65.583 - type: ndcg_at_100 value: 70.918 - type: ndcg_at_1000 value: 71.72800000000001 - type: ndcg_at_3 value: 60.428000000000004 - type: ndcg_at_5 value: 61.853 - type: precision_at_1 value: 62.083 - type: precision_at_10 value: 15.033 - type: precision_at_100 value: 1.9529999999999998 - type: precision_at_1000 value: 0.207 - type: precision_at_3 value: 36.315 - type: precision_at_5 value: 25.955000000000002 - type: recall_at_1 value: 39.395 - type: recall_at_10 value: 74.332 - type: recall_at_100 value: 94.729 - type: recall_at_1000 value: 99.75500000000001 - type: recall_at_3 value: 57.679 - type: recall_at_5 value: 65.036 task: type: Retrieval --- ## gte-Qwen2-1.5B-instruct **gte-Qwen2-1.5B-instruct** is the latest model in the gte (General Text Embedding) model family. The model is built on the Qwen2-1.5B LLM and uses the same training data and strategies as the gte-Qwen2-7B-instruct model. The model incorporates several key advancements: - Integration of bidirectional attention mechanisms, enriching its contextual understanding. - Instruction tuning, applied solely on the query side for streamlined efficiency. - Comprehensive training across a vast, multilingual text corpus spanning diverse domains and scenarios. This training leverages both weakly supervised and supervised data, ensuring the model's applicability across numerous languages and a wide array of downstream tasks. ## Model Information - Model Size: 1.5B - Embedding Dimension: 1536 - Max Input Tokens: 32k ## Requirements ## Usage ### Sentence Transformers See config_sentence_transformers.json for all pre-built prompt names; otherwise, you can supply a custom prompt of your choice (a minimal sketch follows below). ### Transformers ### infinity_emb Usage via infinity, MIT Licensed.
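The original card's runnable snippets are not preserved in this dump, so the following is a minimal Sentence Transformers sketch, not the card's verbatim example. It assumes the pre-built `query` prompt name exists in config_sentence_transformers.json and that a sentence-transformers release with `prompt_name` support (>= 2.7) is installed; the query and document texts are illustrative only.

```python
# Minimal retrieval-style usage sketch for gte-Qwen2-1.5B-instruct.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    trust_remote_code=True,  # the repo ships custom modeling code
)
model.max_seq_length = 8192  # optional: cap the 32k context to save memory

queries = ["how much protein should a female eat"]
documents = [
    "The CDC's average protein requirement for women ages 19 to 70 is 46 grams per day.",
    "A summit is the highest point of a mountain.",
]

# Instruction tuning is applied on the query side only, so use the
# "query" prompt (assumed to be defined in config_sentence_transformers.json)
# for queries, and encode documents without any prompt.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# cos_sim makes no assumption about whether embeddings are pre-normalized.
print(cos_sim(query_embeddings, document_embeddings))
```

A custom instruction can instead be passed via `model.encode(queries, prompt="Instruct: <task description>\nQuery: ")`, which mirrors the query-side instruction format used by the instruction-tuned gte models.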
## Evaluation ### MTEB & C-MTEB You can use scripts/eval_mteb.py to reproduce the following results for **gte-Qwen2-1.5B-instruct** on MTEB (English) / C-MTEB (Chinese): | Model Name | MTEB(56) | C-MTEB(35) | MTEB-fr(26) | MTEB-pl(26) | |:----:|:---------:|:----------:|:----------:|:----------:| | bge-base-en-v1.5 | 64.23 | - | - | - | | bge-large-en-v1.5 | 63.55 | - | - | - | | gte-large-en-v1.5 | 65.39 | - | - | - | | gte-base-en-v1.5 | 64.11 | - | - | - | | mxbai-embed-large-v1 | 64.68 | - | - | - | | acge_text_embedding | - | 69.07 | - | - | | stella-mrl-large-zh-v3.5-1792d | - | 68.55 | - | - | | gte-large-zh | - | 66.72 | - | - | | multilingual-e5-base | 59.45 | 56.21 | - | - | | multilingual-e5-large | 61.50 | 58.81 | - | - | | e5-mistral-7b-instruct | 66.63 | 60.81 | - | - | | gte-Qwen1.5-7B-instruct | 67.34 | 69.52 | - | - | | NV-Embed-v1 | 69.32 | - | - | - | | **gte-Qwen2-7B-instruct** | **70.24** | **72.05** | **68.25** | **67.86** | | **gte-Qwen2-1.5B-instruct** | **67.16** | **67.65** | **66.60** | **64.04** | ### GTE Models The gte series has consistently released two types of models: encoder-only models (based on the BERT architecture) and decoder-only models (based on the LLM architecture). | Models | Language | Max Sequence Length | Dimension | Model Size (Memory Usage, fp32) | |:-------------------------------------------------------------------------------------:|:--------:|:-----: |:---------:|:-------------------------------:| | GTE-large-zh | Chinese | 512 | 1024 | 1.25GB | | GTE-base-zh | Chinese | 512 | 512 | 0.41GB | | GTE-small-zh | Chinese | 512 | 512 | 0.12GB | | GTE-large | English | 512 | 1024 | 1.25GB | | GTE-base | English | 512 | 512 | 0.21GB | | GTE-small | English | 512 | 384 | 0.10GB | | GTE-large-en-v1.5 | English | 8192 | 1024 | 1.74GB | | GTE-base-en-v1.5 | English | 8192 | 768 | 0.51GB | | GTE-Qwen1.5-7B-instruct | Multilingual | 32000 | 4096 | 26.45GB | | GTE-Qwen2-7B-instruct | Multilingual | 32000 | 3584 | 26.45GB | | GTE-Qwen2-1.5B-instruct | Multilingual | 32000 | 1536 | 6.62GB | ## Cloud API Services In addition to the open-source models, the GTE series is also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Community support ### Fine-tuning GTE models can be fine-tuned with the third-party framework SWIFT. ## Citation If you find our paper or models helpful, please consider citing:", + "model_explanation_gemini": "Generates sentence embeddings for tasks like classification, retrieval, clustering, and similarity measurement."
+} \ No newline at end of file diff --git a/data/model_data_json/Alibaba-NLP_gte-Qwen2-7B-instruct.json b/data/model_data_json/Alibaba-NLP_gte-Qwen2-7B-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..255dfa173ad7efeb35db4d1a4ebe5761680e49d4 --- /dev/null +++ b/data/model_data_json/Alibaba-NLP_gte-Qwen2-7B-instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "Alibaba-NLP/gte-Qwen2-7B-instruct", + "downloads": 125308, + "tags": [ + "sentence-transformers", + "safetensors", + "qwen2", + "text-generation", + "mteb", + "transformers", + "Qwen2", + "sentence-similarity", + "custom_code", + "arxiv:2308.03281", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-transformers - transformers - Qwen2 - sentence-similarity license: apache-2.0 model-index: - name: gte-qwen2-7B-instruct results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 91.31343283582089 - type: ap value: 67.64251402604096 - type: f1 value: 87.53372530755692 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 97.497825 - type: ap value: 96.30329547047529 - type: f1 value: 97.49769793778039 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 62.564 - type: f1 value: 60.975777935041066 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 36.486000000000004 - type: map_at_10 value: 54.842 - type: map_at_100 value: 55.206999999999994 - type: map_at_1000 value: 55.206999999999994 - type: map_at_3 value: 49.893 - type: map_at_5 value: 53.105000000000004 - type: mrr_at_1 value: 37.34 - type: mrr_at_10 value: 55.143 - type: mrr_at_100 value: 55.509 - type: mrr_at_1000 value: 55.509 - type: mrr_at_3 value: 50.212999999999994 - type: mrr_at_5 value: 53.432 - type: ndcg_at_1 value: 36.486000000000004 - type: ndcg_at_10 value: 64.273 - type: ndcg_at_100 value: 65.66199999999999 - type: ndcg_at_1000 value: 65.66199999999999 - type: ndcg_at_3 value: 54.352999999999994 - type: ndcg_at_5 value: 60.131 - type: precision_at_1 value: 36.486000000000004 - type: precision_at_10 value: 9.395000000000001 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.428 - type: precision_at_5 value: 16.259 - type: recall_at_1 value: 36.486000000000004 - type: recall_at_10 value: 93.95400000000001 - type: recall_at_100 value: 99.644 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 67.283 - type: recall_at_5 value: 81.294 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 56.461169803700564 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test 
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 51.73600434466286 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 67.57827065898053 - type: mrr value: 79.08136569493911 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 83.53324575999243 - type: cos_sim_spearman value: 81.37173362822374 - type: euclidean_pearson value: 82.19243335103444 - type: euclidean_spearman value: 81.33679307304334 - type: manhattan_pearson value: 82.38752665975699 - type: manhattan_spearman value: 81.31510583189689 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.56818181818181 - type: f1 value: 87.25826722019875 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 50.09239610327673 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 46.64733054606282 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 33.997 - type: map_at_10 value: 48.176 - type: map_at_100 value: 49.82 - type: map_at_1000 value: 49.924 - type: map_at_3 value: 43.626 - type: map_at_5 value: 46.275 - type: mrr_at_1 value: 42.059999999999995 - type: mrr_at_10 value: 53.726 - type: mrr_at_100 value: 54.398 - type: mrr_at_1000 value: 54.416 - type: mrr_at_3 value: 50.714999999999996 - type: mrr_at_5 value: 52.639 - type: ndcg_at_1 value: 42.059999999999995 - type: ndcg_at_10 value: 55.574999999999996 - type: ndcg_at_100 value: 60.744 - type: ndcg_at_1000 value: 61.85699999999999 - type: ndcg_at_3 value: 49.363 - type: ndcg_at_5 value: 52.44 - type: precision_at_1 value: 42.059999999999995 - type: precision_at_10 value: 11.101999999999999 - type: precision_at_100 value: 1.73 - type: precision_at_1000 value: 0.218 - type: precision_at_3 value: 24.464 - type: precision_at_5 value: 18.026 - type: recall_at_1 value: 33.997 - type: recall_at_10 value: 70.35900000000001 - type: recall_at_100 value: 91.642 - type: recall_at_1000 value: 97.977 - type: recall_at_3 value: 52.76 - type: recall_at_5 value: 61.148 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 35.884 - type: map_at_10 value: 48.14 - type: map_at_100 value: 49.5 - type: map_at_1000 value: 49.63 - type: map_at_3 value: 44.646 - type: map_at_5 value: 46.617999999999995 - type: mrr_at_1 value: 44.458999999999996 - type: mrr_at_10 value: 53.751000000000005 - type: mrr_at_100 value: 54.37800000000001 - type: mrr_at_1000 value: 54.415 - type: mrr_at_3 value: 51.815 - type: mrr_at_5 value: 52.882 - type: ndcg_at_1 value: 44.458999999999996 - type: ndcg_at_10 value: 54.157 - type: 
ndcg_at_100 value: 58.362 - type: ndcg_at_1000 value: 60.178 - type: ndcg_at_3 value: 49.661 - type: ndcg_at_5 value: 51.74999999999999 - type: precision_at_1 value: 44.458999999999996 - type: precision_at_10 value: 10.248 - type: precision_at_100 value: 1.5890000000000002 - type: precision_at_1000 value: 0.207 - type: precision_at_3 value: 23.928 - type: precision_at_5 value: 16.878999999999998 - type: recall_at_1 value: 35.884 - type: recall_at_10 value: 64.798 - type: recall_at_100 value: 82.345 - type: recall_at_1000 value: 93.267 - type: recall_at_3 value: 51.847 - type: recall_at_5 value: 57.601 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 39.383 - type: map_at_10 value: 53.714 - type: map_at_100 value: 54.838 - type: map_at_1000 value: 54.87800000000001 - type: map_at_3 value: 50.114999999999995 - type: map_at_5 value: 52.153000000000006 - type: mrr_at_1 value: 45.016 - type: mrr_at_10 value: 56.732000000000006 - type: mrr_at_100 value: 57.411 - type: mrr_at_1000 value: 57.431 - type: mrr_at_3 value: 54.044000000000004 - type: mrr_at_5 value: 55.639 - type: ndcg_at_1 value: 45.016 - type: ndcg_at_10 value: 60.228 - type: ndcg_at_100 value: 64.277 - type: ndcg_at_1000 value: 65.07 - type: ndcg_at_3 value: 54.124 - type: ndcg_at_5 value: 57.147000000000006 - type: precision_at_1 value: 45.016 - type: precision_at_10 value: 9.937 - type: precision_at_100 value: 1.288 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 24.471999999999998 - type: precision_at_5 value: 16.991 - type: recall_at_1 value: 39.383 - type: recall_at_10 value: 76.175 - type: recall_at_100 value: 93.02 - type: recall_at_1000 value: 98.60900000000001 - type: recall_at_3 value: 60.265 - type: recall_at_5 value: 67.46600000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 27.426000000000002 - type: map_at_10 value: 37.397000000000006 - type: map_at_100 value: 38.61 - type: map_at_1000 value: 38.678000000000004 - type: map_at_3 value: 34.150999999999996 - type: map_at_5 value: 36.137 - type: mrr_at_1 value: 29.944 - type: mrr_at_10 value: 39.654 - type: mrr_at_100 value: 40.638000000000005 - type: mrr_at_1000 value: 40.691 - type: mrr_at_3 value: 36.817 - type: mrr_at_5 value: 38.524 - type: ndcg_at_1 value: 29.944 - type: ndcg_at_10 value: 43.094 - type: ndcg_at_100 value: 48.789 - type: ndcg_at_1000 value: 50.339999999999996 - type: ndcg_at_3 value: 36.984 - type: ndcg_at_5 value: 40.248 - type: precision_at_1 value: 29.944 - type: precision_at_10 value: 6.78 - type: precision_at_100 value: 1.024 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 15.895000000000001 - type: precision_at_5 value: 11.39 - type: recall_at_1 value: 27.426000000000002 - type: recall_at_10 value: 58.464000000000006 - type: recall_at_100 value: 84.193 - type: recall_at_1000 value: 95.52000000000001 - type: recall_at_3 value: 42.172 - type: recall_at_5 value: 50.101 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 19.721 - type: map_at_10 value: 31.604 - type: map_at_100 value: 32.972 - type: map_at_1000 
value: 33.077 - type: map_at_3 value: 27.218999999999998 - type: map_at_5 value: 29.53 - type: mrr_at_1 value: 25.0 - type: mrr_at_10 value: 35.843 - type: mrr_at_100 value: 36.785000000000004 - type: mrr_at_1000 value: 36.842000000000006 - type: mrr_at_3 value: 32.193 - type: mrr_at_5 value: 34.264 - type: ndcg_at_1 value: 25.0 - type: ndcg_at_10 value: 38.606 - type: ndcg_at_100 value: 44.272 - type: ndcg_at_1000 value: 46.527 - type: ndcg_at_3 value: 30.985000000000003 - type: ndcg_at_5 value: 34.43 - type: precision_at_1 value: 25.0 - type: precision_at_10 value: 7.811 - type: precision_at_100 value: 1.203 - type: precision_at_1000 value: 0.15 - type: precision_at_3 value: 15.423 - type: precision_at_5 value: 11.791 - type: recall_at_1 value: 19.721 - type: recall_at_10 value: 55.625 - type: recall_at_100 value: 79.34400000000001 - type: recall_at_1000 value: 95.208 - type: recall_at_3 value: 35.19 - type: recall_at_5 value: 43.626 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 33.784 - type: map_at_10 value: 47.522 - type: map_at_100 value: 48.949999999999996 - type: map_at_1000 value: 49.038 - type: map_at_3 value: 43.284 - type: map_at_5 value: 45.629 - type: mrr_at_1 value: 41.482 - type: mrr_at_10 value: 52.830999999999996 - type: mrr_at_100 value: 53.559999999999995 - type: mrr_at_1000 value: 53.588 - type: mrr_at_3 value: 50.016000000000005 - type: mrr_at_5 value: 51.614000000000004 - type: ndcg_at_1 value: 41.482 - type: ndcg_at_10 value: 54.569 - type: ndcg_at_100 value: 59.675999999999995 - type: ndcg_at_1000 value: 60.989000000000004 - type: ndcg_at_3 value: 48.187000000000005 - type: ndcg_at_5 value: 51.183 - type: precision_at_1 value: 41.482 - type: precision_at_10 value: 10.221 - type: precision_at_100 value: 1.486 - type: precision_at_1000 value: 0.17500000000000002 - type: precision_at_3 value: 23.548 - type: precision_at_5 value: 16.805 - type: recall_at_1 value: 33.784 - type: recall_at_10 value: 69.798 - type: recall_at_100 value: 90.098 - type: recall_at_1000 value: 98.176 - type: recall_at_3 value: 52.127 - type: recall_at_5 value: 59.861 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 28.038999999999998 - type: map_at_10 value: 41.904 - type: map_at_100 value: 43.36 - type: map_at_1000 value: 43.453 - type: map_at_3 value: 37.785999999999994 - type: map_at_5 value: 40.105000000000004 - type: mrr_at_1 value: 35.046 - type: mrr_at_10 value: 46.926 - type: mrr_at_100 value: 47.815000000000005 - type: mrr_at_1000 value: 47.849000000000004 - type: mrr_at_3 value: 44.273 - type: mrr_at_5 value: 45.774 - type: ndcg_at_1 value: 35.046 - type: ndcg_at_10 value: 48.937000000000005 - type: ndcg_at_100 value: 54.544000000000004 - type: ndcg_at_1000 value: 56.069 - type: ndcg_at_3 value: 42.858000000000004 - type: ndcg_at_5 value: 45.644 - type: precision_at_1 value: 35.046 - type: precision_at_10 value: 9.452 - type: precision_at_100 value: 1.429 - type: precision_at_1000 value: 0.173 - type: precision_at_3 value: 21.346999999999998 - type: precision_at_5 value: 15.342 - type: recall_at_1 value: 28.038999999999998 - type: recall_at_10 value: 64.59700000000001 - type: recall_at_100 value: 87.735 - type: recall_at_1000 value: 97.41300000000001 - type: 
recall_at_3 value: 47.368 - type: recall_at_5 value: 54.93900000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 28.17291666666667 - type: map_at_10 value: 40.025749999999995 - type: map_at_100 value: 41.39208333333333 - type: map_at_1000 value: 41.499249999999996 - type: map_at_3 value: 36.347 - type: map_at_5 value: 38.41391666666667 - type: mrr_at_1 value: 33.65925 - type: mrr_at_10 value: 44.085499999999996 - type: mrr_at_100 value: 44.94116666666667 - type: mrr_at_1000 value: 44.9855 - type: mrr_at_3 value: 41.2815 - type: mrr_at_5 value: 42.91491666666666 - type: ndcg_at_1 value: 33.65925 - type: ndcg_at_10 value: 46.430833333333325 - type: ndcg_at_100 value: 51.761 - type: ndcg_at_1000 value: 53.50899999999999 - type: ndcg_at_3 value: 40.45133333333333 - type: ndcg_at_5 value: 43.31483333333334 - type: precision_at_1 value: 33.65925 - type: precision_at_10 value: 8.4995 - type: precision_at_100 value: 1.3210000000000004 - type: precision_at_1000 value: 0.16591666666666666 - type: precision_at_3 value: 19.165083333333335 - type: precision_at_5 value: 13.81816666666667 - type: recall_at_1 value: 28.17291666666667 - type: recall_at_10 value: 61.12624999999999 - type: recall_at_100 value: 83.97266666666667 - type: recall_at_1000 value: 95.66550000000001 - type: recall_at_3 value: 44.661249999999995 - type: recall_at_5 value: 51.983333333333334 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 24.681 - type: map_at_10 value: 34.892 - type: map_at_100 value: 35.996 - type: map_at_1000 value: 36.083 - type: map_at_3 value: 31.491999999999997 - type: map_at_5 value: 33.632 - type: mrr_at_1 value: 28.528 - type: mrr_at_10 value: 37.694 - type: mrr_at_100 value: 38.613 - type: mrr_at_1000 value: 38.668 - type: mrr_at_3 value: 34.714 - type: mrr_at_5 value: 36.616 - type: ndcg_at_1 value: 28.528 - type: ndcg_at_10 value: 40.703 - type: ndcg_at_100 value: 45.993 - type: ndcg_at_1000 value: 47.847 - type: ndcg_at_3 value: 34.622 - type: ndcg_at_5 value: 38.035999999999994 - type: precision_at_1 value: 28.528 - type: precision_at_10 value: 6.902 - type: precision_at_100 value: 1.0370000000000001 - type: precision_at_1000 value: 0.126 - type: precision_at_3 value: 15.798000000000002 - type: precision_at_5 value: 11.655999999999999 - type: recall_at_1 value: 24.681 - type: recall_at_10 value: 55.81 - type: recall_at_100 value: 79.785 - type: recall_at_1000 value: 92.959 - type: recall_at_3 value: 39.074 - type: recall_at_5 value: 47.568 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 18.627 - type: map_at_10 value: 27.872000000000003 - type: map_at_100 value: 29.237999999999996 - type: map_at_1000 value: 29.363 - type: map_at_3 value: 24.751 - type: map_at_5 value: 26.521 - type: mrr_at_1 value: 23.021 - type: mrr_at_10 value: 31.924000000000003 - type: mrr_at_100 value: 32.922000000000004 - type: mrr_at_1000 value: 32.988 - type: mrr_at_3 value: 29.192 - type: mrr_at_5 value: 30.798 - type: ndcg_at_1 value: 23.021 - type: ndcg_at_10 value: 33.535 - type: ndcg_at_100 value: 39.732 - type: ndcg_at_1000 value: 42.201 - type: ndcg_at_3 value: 
28.153 - type: ndcg_at_5 value: 30.746000000000002 - type: precision_at_1 value: 23.021 - type: precision_at_10 value: 6.459 - type: precision_at_100 value: 1.1320000000000001 - type: precision_at_1000 value: 0.153 - type: precision_at_3 value: 13.719000000000001 - type: precision_at_5 value: 10.193000000000001 - type: recall_at_1 value: 18.627 - type: recall_at_10 value: 46.463 - type: recall_at_100 value: 74.226 - type: recall_at_1000 value: 91.28500000000001 - type: recall_at_3 value: 31.357000000000003 - type: recall_at_5 value: 38.067 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 31.457 - type: map_at_10 value: 42.888 - type: map_at_100 value: 44.24 - type: map_at_1000 value: 44.327 - type: map_at_3 value: 39.588 - type: map_at_5 value: 41.423 - type: mrr_at_1 value: 37.126999999999995 - type: mrr_at_10 value: 47.083000000000006 - type: mrr_at_100 value: 47.997 - type: mrr_at_1000 value: 48.044 - type: mrr_at_3 value: 44.574000000000005 - type: mrr_at_5 value: 46.202 - type: ndcg_at_1 value: 37.126999999999995 - type: ndcg_at_10 value: 48.833 - type: ndcg_at_100 value: 54.327000000000005 - type: ndcg_at_1000 value: 56.011 - type: ndcg_at_3 value: 43.541999999999994 - type: ndcg_at_5 value: 46.127 - type: precision_at_1 value: 37.126999999999995 - type: precision_at_10 value: 8.376999999999999 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.146 - type: precision_at_3 value: 20.211000000000002 - type: precision_at_5 value: 14.16 - type: recall_at_1 value: 31.457 - type: recall_at_10 value: 62.369 - type: recall_at_100 value: 85.444 - type: recall_at_1000 value: 96.65599999999999 - type: recall_at_3 value: 47.961 - type: recall_at_5 value: 54.676 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 27.139999999999997 - type: map_at_10 value: 38.801 - type: map_at_100 value: 40.549 - type: map_at_1000 value: 40.802 - type: map_at_3 value: 35.05 - type: map_at_5 value: 36.884 - type: mrr_at_1 value: 33.004 - type: mrr_at_10 value: 43.864 - type: mrr_at_100 value: 44.667 - type: mrr_at_1000 value: 44.717 - type: mrr_at_3 value: 40.777 - type: mrr_at_5 value: 42.319 - type: ndcg_at_1 value: 33.004 - type: ndcg_at_10 value: 46.022 - type: ndcg_at_100 value: 51.542 - type: ndcg_at_1000 value: 53.742000000000004 - type: ndcg_at_3 value: 39.795 - type: ndcg_at_5 value: 42.272 - type: precision_at_1 value: 33.004 - type: precision_at_10 value: 9.012 - type: precision_at_100 value: 1.7770000000000001 - type: precision_at_1000 value: 0.26 - type: precision_at_3 value: 19.038 - type: precision_at_5 value: 13.675999999999998 - type: recall_at_1 value: 27.139999999999997 - type: recall_at_10 value: 60.961 - type: recall_at_100 value: 84.451 - type: recall_at_1000 value: 98.113 - type: recall_at_3 value: 43.001 - type: recall_at_5 value: 49.896 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.936 - type: map_at_10 value: 27.399 - type: map_at_100 value: 28.632 - type: map_at_1000 value: 28.738000000000003 - type: map_at_3 value: 24.456 - type: map_at_5 value: 26.06 - type: mrr_at_1 value: 19.224 
- type: mrr_at_10 value: 28.998 - type: mrr_at_100 value: 30.11 - type: mrr_at_1000 value: 30.177 - type: mrr_at_3 value: 26.247999999999998 - type: mrr_at_5 value: 27.708 - type: ndcg_at_1 value: 19.224 - type: ndcg_at_10 value: 32.911 - type: ndcg_at_100 value: 38.873999999999995 - type: ndcg_at_1000 value: 41.277 - type: ndcg_at_3 value: 27.142 - type: ndcg_at_5 value: 29.755 - type: precision_at_1 value: 19.224 - type: precision_at_10 value: 5.6930000000000005 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.126 - type: precision_at_3 value: 12.138 - type: precision_at_5 value: 8.909 - type: recall_at_1 value: 17.936 - type: recall_at_10 value: 48.096 - type: recall_at_100 value: 75.389 - type: recall_at_1000 value: 92.803 - type: recall_at_3 value: 32.812999999999995 - type: recall_at_5 value: 38.851 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 22.076999999999998 - type: map_at_10 value: 35.44 - type: map_at_100 value: 37.651 - type: map_at_1000 value: 37.824999999999996 - type: map_at_3 value: 30.764999999999997 - type: map_at_5 value: 33.26 - type: mrr_at_1 value: 50.163000000000004 - type: mrr_at_10 value: 61.207 - type: mrr_at_100 value: 61.675000000000004 - type: mrr_at_1000 value: 61.692 - type: mrr_at_3 value: 58.60999999999999 - type: mrr_at_5 value: 60.307 - type: ndcg_at_1 value: 50.163000000000004 - type: ndcg_at_10 value: 45.882 - type: ndcg_at_100 value: 53.239999999999995 - type: ndcg_at_1000 value: 55.852000000000004 - type: ndcg_at_3 value: 40.514 - type: ndcg_at_5 value: 42.038 - type: precision_at_1 value: 50.163000000000004 - type: precision_at_10 value: 13.466000000000001 - type: precision_at_100 value: 2.164 - type: precision_at_1000 value: 0.266 - type: precision_at_3 value: 29.707 - type: precision_at_5 value: 21.694 - type: recall_at_1 value: 22.076999999999998 - type: recall_at_10 value: 50.193 - type: recall_at_100 value: 74.993 - type: recall_at_1000 value: 89.131 - type: recall_at_3 value: 35.472 - type: recall_at_5 value: 41.814 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.953 - type: map_at_10 value: 24.515 - type: map_at_100 value: 36.173 - type: map_at_1000 value: 38.351 - type: map_at_3 value: 16.592000000000002 - type: map_at_5 value: 20.036 - type: mrr_at_1 value: 74.25 - type: mrr_at_10 value: 81.813 - type: mrr_at_100 value: 82.006 - type: mrr_at_1000 value: 82.011 - type: mrr_at_3 value: 80.875 - type: mrr_at_5 value: 81.362 - type: ndcg_at_1 value: 62.5 - type: ndcg_at_10 value: 52.42 - type: ndcg_at_100 value: 56.808 - type: ndcg_at_1000 value: 63.532999999999994 - type: ndcg_at_3 value: 56.654 - type: ndcg_at_5 value: 54.18300000000001 - type: precision_at_1 value: 74.25 - type: precision_at_10 value: 42.699999999999996 - type: precision_at_100 value: 13.675 - type: precision_at_1000 value: 2.664 - type: precision_at_3 value: 60.5 - type: precision_at_5 value: 52.800000000000004 - type: recall_at_1 value: 9.953 - type: recall_at_10 value: 30.253999999999998 - type: recall_at_100 value: 62.516000000000005 - type: recall_at_1000 value: 84.163 - type: recall_at_3 value: 18.13 - type: recall_at_5 value: 22.771 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test 
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 79.455 - type: f1 value: 74.16798697647569 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 87.531 - type: map_at_10 value: 93.16799999999999 - type: map_at_100 value: 93.341 - type: map_at_1000 value: 93.349 - type: map_at_3 value: 92.444 - type: map_at_5 value: 92.865 - type: mrr_at_1 value: 94.014 - type: mrr_at_10 value: 96.761 - type: mrr_at_100 value: 96.762 - type: mrr_at_1000 value: 96.762 - type: mrr_at_3 value: 96.672 - type: mrr_at_5 value: 96.736 - type: ndcg_at_1 value: 94.014 - type: ndcg_at_10 value: 95.112 - type: ndcg_at_100 value: 95.578 - type: ndcg_at_1000 value: 95.68900000000001 - type: ndcg_at_3 value: 94.392 - type: ndcg_at_5 value: 94.72500000000001 - type: precision_at_1 value: 94.014 - type: precision_at_10 value: 11.065 - type: precision_at_100 value: 1.157 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 35.259 - type: precision_at_5 value: 21.599 - type: recall_at_1 value: 87.531 - type: recall_at_10 value: 97.356 - type: recall_at_100 value: 98.965 - type: recall_at_1000 value: 99.607 - type: recall_at_3 value: 95.312 - type: recall_at_5 value: 96.295 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 32.055 - type: map_at_10 value: 53.114 - type: map_at_100 value: 55.235 - type: map_at_1000 value: 55.345 - type: map_at_3 value: 45.854 - type: map_at_5 value: 50.025 - type: mrr_at_1 value: 60.34 - type: mrr_at_10 value: 68.804 - type: mrr_at_100 value: 69.309 - type: mrr_at_1000 value: 69.32199999999999 - type: mrr_at_3 value: 66.40899999999999 - type: mrr_at_5 value: 67.976 - type: ndcg_at_1 value: 60.34 - type: ndcg_at_10 value: 62.031000000000006 - type: ndcg_at_100 value: 68.00500000000001 - type: ndcg_at_1000 value: 69.286 - type: ndcg_at_3 value: 56.355999999999995 - type: ndcg_at_5 value: 58.687 - type: precision_at_1 value: 60.34 - type: precision_at_10 value: 17.176 - type: precision_at_100 value: 2.36 - type: precision_at_1000 value: 0.259 - type: precision_at_3 value: 37.14 - type: precision_at_5 value: 27.809 - type: recall_at_1 value: 32.055 - type: recall_at_10 value: 70.91 - type: recall_at_100 value: 91.83 - type: recall_at_1000 value: 98.871 - type: recall_at_3 value: 51.202999999999996 - type: recall_at_5 value: 60.563 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 43.68 - type: map_at_10 value: 64.389 - type: map_at_100 value: 65.24 - type: map_at_1000 value: 65.303 - type: map_at_3 value: 61.309000000000005 - type: map_at_5 value: 63.275999999999996 - type: mrr_at_1 value: 87.36 - type: mrr_at_10 value: 91.12 - type: mrr_at_100 value: 91.227 - type: mrr_at_1000 value: 91.229 - type: mrr_at_3 value: 90.57600000000001 - type: mrr_at_5 value: 90.912 - type: ndcg_at_1 value: 87.36 - type: ndcg_at_10 value: 73.076 - type: ndcg_at_100 value: 75.895 - type: ndcg_at_1000 value: 77.049 - type: ndcg_at_3 value: 68.929 - type: ndcg_at_5 value: 71.28 - type: precision_at_1 value: 87.36 - type: precision_at_10 value: 14.741000000000001 - type: precision_at_100 value: 1.694 - type: precision_at_1000 value: 0.185 - type: precision_at_3 value: 
43.043 - type: precision_at_5 value: 27.681 - type: recall_at_1 value: 43.68 - type: recall_at_10 value: 73.707 - type: recall_at_100 value: 84.7 - type: recall_at_1000 value: 92.309 - type: recall_at_3 value: 64.564 - type: recall_at_5 value: 69.203 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 96.75399999999999 - type: ap value: 95.29389839242187 - type: f1 value: 96.75348377433475 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 25.176 - type: map_at_10 value: 38.598 - type: map_at_100 value: 39.707 - type: map_at_1000 value: 39.744 - type: map_at_3 value: 34.566 - type: map_at_5 value: 36.863 - type: mrr_at_1 value: 25.874000000000002 - type: mrr_at_10 value: 39.214 - type: mrr_at_100 value: 40.251 - type: mrr_at_1000 value: 40.281 - type: mrr_at_3 value: 35.291 - type: mrr_at_5 value: 37.545 - type: ndcg_at_1 value: 25.874000000000002 - type: ndcg_at_10 value: 45.98 - type: ndcg_at_100 value: 51.197 - type: ndcg_at_1000 value: 52.073 - type: ndcg_at_3 value: 37.785999999999994 - type: ndcg_at_5 value: 41.870000000000005 - type: precision_at_1 value: 25.874000000000002 - type: precision_at_10 value: 7.181 - type: precision_at_100 value: 0.979 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 16.051000000000002 - type: precision_at_5 value: 11.713 - type: recall_at_1 value: 25.176 - type: recall_at_10 value: 68.67699999999999 - type: recall_at_100 value: 92.55 - type: recall_at_1000 value: 99.164 - type: recall_at_3 value: 46.372 - type: recall_at_5 value: 56.16 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 99.03784769721841 - type: f1 value: 98.97791641821495 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 91.88326493388054 - type: f1 value: 73.74809928034335 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 85.41358439811701 - type: f1 value: 83.503679460639 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 89.77135171486215 - type: f1 value: 88.89843747468366 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 46.22695362087359 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 44.132372165849425 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 33.35680810650402 - type: 
mrr value: 34.72625715637218 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 7.165000000000001 - type: map_at_10 value: 15.424 - type: map_at_100 value: 20.28 - type: map_at_1000 value: 22.065 - type: map_at_3 value: 11.236 - type: map_at_5 value: 13.025999999999998 - type: mrr_at_1 value: 51.702999999999996 - type: mrr_at_10 value: 59.965 - type: mrr_at_100 value: 60.667 - type: mrr_at_1000 value: 60.702999999999996 - type: mrr_at_3 value: 58.772000000000006 - type: mrr_at_5 value: 59.267 - type: ndcg_at_1 value: 49.536 - type: ndcg_at_10 value: 40.6 - type: ndcg_at_100 value: 37.848 - type: ndcg_at_1000 value: 46.657 - type: ndcg_at_3 value: 46.117999999999995 - type: ndcg_at_5 value: 43.619 - type: precision_at_1 value: 51.393 - type: precision_at_10 value: 30.31 - type: precision_at_100 value: 9.972 - type: precision_at_1000 value: 2.329 - type: precision_at_3 value: 43.137 - type: precision_at_5 value: 37.585 - type: recall_at_1 value: 7.165000000000001 - type: recall_at_10 value: 19.689999999999998 - type: recall_at_100 value: 39.237 - type: recall_at_1000 value: 71.417 - type: recall_at_3 value: 12.247 - type: recall_at_5 value: 14.902999999999999 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 42.653999999999996 - type: map_at_10 value: 59.611999999999995 - type: map_at_100 value: 60.32300000000001 - type: map_at_1000 value: 60.336 - type: map_at_3 value: 55.584999999999994 - type: map_at_5 value: 58.19 - type: mrr_at_1 value: 47.683 - type: mrr_at_10 value: 62.06700000000001 - type: mrr_at_100 value: 62.537 - type: mrr_at_1000 value: 62.544999999999995 - type: mrr_at_3 value: 59.178 - type: mrr_at_5 value: 61.034 - type: ndcg_at_1 value: 47.654 - type: ndcg_at_10 value: 67.001 - type: ndcg_at_100 value: 69.73899999999999 - type: ndcg_at_1000 value: 69.986 - type: ndcg_at_3 value: 59.95700000000001 - type: ndcg_at_5 value: 64.025 - type: precision_at_1 value: 47.654 - type: precision_at_10 value: 10.367999999999999 - type: precision_at_100 value: 1.192 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 26.651000000000003 - type: precision_at_5 value: 18.459 - type: recall_at_1 value: 42.653999999999996 - type: recall_at_10 value: 86.619 - type: recall_at_100 value: 98.04899999999999 - type: recall_at_1000 value: 99.812 - type: recall_at_3 value: 68.987 - type: recall_at_5 value: 78.158 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 72.538 - type: map_at_10 value: 86.702 - type: map_at_100 value: 87.31 - type: map_at_1000 value: 87.323 - type: map_at_3 value: 83.87 - type: map_at_5 value: 85.682 - type: mrr_at_1 value: 83.31 - type: mrr_at_10 value: 89.225 - type: mrr_at_100 value: 89.30399999999999 - type: mrr_at_1000 value: 89.30399999999999 - type: mrr_at_3 value: 88.44300000000001 - type: mrr_at_5 value: 89.005 - type: ndcg_at_1 value: 83.32000000000001 - type: ndcg_at_10 value: 90.095 - type: ndcg_at_100 value: 91.12 - type: ndcg_at_1000 value: 91.179 - type: ndcg_at_3 value: 87.606 - type: ndcg_at_5 value: 89.031 - type: precision_at_1 value: 83.32000000000001 - type: precision_at_10 value: 13.641 - type: precision_at_100 value: 1.541 - type: precision_at_1000 value: 0.157 - type: precision_at_3 
value: 38.377 - type: precision_at_5 value: 25.162000000000003 - type: recall_at_1 value: 72.538 - type: recall_at_10 value: 96.47200000000001 - type: recall_at_100 value: 99.785 - type: recall_at_1000 value: 99.99900000000001 - type: recall_at_3 value: 89.278 - type: recall_at_5 value: 93.367 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 73.55219145406065 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 74.13437105242755 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 6.873 - type: map_at_10 value: 17.944 - type: map_at_100 value: 21.171 - type: map_at_1000 value: 21.528 - type: map_at_3 value: 12.415 - type: map_at_5 value: 15.187999999999999 - type: mrr_at_1 value: 33.800000000000004 - type: mrr_at_10 value: 46.455 - type: mrr_at_100 value: 47.378 - type: mrr_at_1000 value: 47.394999999999996 - type: mrr_at_3 value: 42.367 - type: mrr_at_5 value: 44.972 - type: ndcg_at_1 value: 33.800000000000004 - type: ndcg_at_10 value: 28.907 - type: ndcg_at_100 value: 39.695 - type: ndcg_at_1000 value: 44.582 - type: ndcg_at_3 value: 26.949 - type: ndcg_at_5 value: 23.988 - type: precision_at_1 value: 33.800000000000004 - type: precision_at_10 value: 15.079999999999998 - type: precision_at_100 value: 3.056 - type: precision_at_1000 value: 0.42100000000000004 - type: precision_at_3 value: 25.167 - type: precision_at_5 value: 21.26 - type: recall_at_1 value: 6.873 - type: recall_at_10 value: 30.568 - type: recall_at_100 value: 62.062 - type: recall_at_1000 value: 85.37700000000001 - type: recall_at_3 value: 15.312999999999999 - type: recall_at_5 value: 21.575 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.37009118256057 - type: cos_sim_spearman value: 79.27986395671529 - type: euclidean_pearson value: 79.18037715442115 - type: euclidean_spearman value: 79.28004791561621 - type: manhattan_pearson value: 79.34062972800541 - type: manhattan_spearman value: 79.43106695543402 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 87.48474767383833 - type: cos_sim_spearman value: 79.54505388752513 - type: euclidean_pearson value: 83.43282704179565 - type: euclidean_spearman value: 79.54579919925405 - type: manhattan_pearson value: 83.77564492427952 - type: manhattan_spearman value: 79.84558396989286 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 88.803698035802 - type: cos_sim_spearman value: 88.83451367754881 - type: euclidean_pearson value: 88.28939285711628 - type: euclidean_spearman value: 88.83528996073112 - type: manhattan_pearson value: 88.28017412671795 - type: manhattan_spearman value: 88.9228828016344 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 
85.27469288153428 - type: cos_sim_spearman value: 83.87477064876288 - type: euclidean_pearson value: 84.2601737035379 - type: euclidean_spearman value: 83.87431082479074 - type: manhattan_pearson value: 84.3621547772745 - type: manhattan_spearman value: 84.12094375000423 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.12749863201587 - type: cos_sim_spearman value: 88.54287568368565 - type: euclidean_pearson value: 87.90429700607999 - type: euclidean_spearman value: 88.5437689576261 - type: manhattan_pearson value: 88.19276653356833 - type: manhattan_spearman value: 88.99995393814679 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 85.68398747560902 - type: cos_sim_spearman value: 86.48815303460574 - type: euclidean_pearson value: 85.52356631237954 - type: euclidean_spearman value: 86.486391949551 - type: manhattan_pearson value: 85.67267981761788 - type: manhattan_spearman value: 86.7073696332485 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.9057107443124 - type: cos_sim_spearman value: 88.7312168757697 - type: euclidean_pearson value: 88.72810439714794 - type: euclidean_spearman value: 88.71976185854771 - type: manhattan_pearson value: 88.50433745949111 - type: manhattan_spearman value: 88.51726175544195 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 67.59391795109886 - type: cos_sim_spearman value: 66.87613008631367 - type: euclidean_pearson value: 69.23198488262217 - type: euclidean_spearman value: 66.85427723013692 - type: manhattan_pearson value: 69.50730124841084 - type: manhattan_spearman value: 67.10404669820792 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 87.0820605344619 - type: cos_sim_spearman value: 86.8518089863434 - type: euclidean_pearson value: 86.31087134689284 - type: euclidean_spearman value: 86.8518520517941 - type: manhattan_pearson value: 86.47203796160612 - type: manhattan_spearman value: 87.1080149734421 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 89.09255369305481 - type: mrr value: 97.10323445617563 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 61.260999999999996 - type: map_at_10 value: 74.043 - type: map_at_100 value: 74.37700000000001 - type: map_at_1000 value: 74.384 - type: map_at_3 value: 71.222 - type: map_at_5 value: 72.875 - type: mrr_at_1 value: 64.333 - type: mrr_at_10 value: 74.984 - type: mrr_at_100 value: 75.247 - type: mrr_at_1000 value: 75.25500000000001 - type: mrr_at_3 value: 73.167 - type: mrr_at_5 value: 74.35000000000001 - type: ndcg_at_1 value: 64.333 - type: ndcg_at_10 value: 79.06 - type: ndcg_at_100 value: 80.416 - type: ndcg_at_1000 
value: 80.55600000000001 - type: ndcg_at_3 value: 74.753 - type: ndcg_at_5 value: 76.97500000000001 - type: precision_at_1 value: 64.333 - type: precision_at_10 value: 10.567 - type: precision_at_100 value: 1.1199999999999999 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 29.889 - type: precision_at_5 value: 19.533 - type: recall_at_1 value: 61.260999999999996 - type: recall_at_10 value: 93.167 - type: recall_at_100 value: 99.0 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 81.667 - type: recall_at_5 value: 87.394 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.71980198019801 - type: cos_sim_ap value: 92.81616007802704 - type: cos_sim_f1 value: 85.17548454688318 - type: cos_sim_precision value: 89.43894389438944 - type: cos_sim_recall value: 81.3 - type: dot_accuracy value: 99.71980198019801 - type: dot_ap value: 92.81398760591358 - type: dot_f1 value: 85.17548454688318 - type: dot_precision value: 89.43894389438944 - type: dot_recall value: 81.3 - type: euclidean_accuracy value: 99.71980198019801 - type: euclidean_ap value: 92.81560637245072 - type: euclidean_f1 value: 85.17548454688318 - type: euclidean_precision value: 89.43894389438944 - type: euclidean_recall value: 81.3 - type: manhattan_accuracy value: 99.73069306930694 - type: manhattan_ap value: 93.14005487480794 - type: manhattan_f1 value: 85.56263269639068 - type: manhattan_precision value: 91.17647058823529 - type: manhattan_recall value: 80.60000000000001 - type: max_accuracy value: 99.73069306930694 - type: max_ap value: 93.14005487480794 - type: max_f1 value: 85.56263269639068 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 79.86443362395185 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 49.40897096662564 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 55.66040806627947 - type: mrr value: 56.58670475766064 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.51015090598575 - type: cos_sim_spearman value: 31.35016454939226 - type: dot_pearson value: 31.5150068731 - type: dot_spearman value: 31.34790869023487 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.254 - type: map_at_10 value: 2.064 - type: map_at_100 value: 12.909 - type: map_at_1000 value: 31.761 - type: map_at_3 value: 0.738 - type: map_at_5 value: 1.155 - type: mrr_at_1 value: 96.0 - type: mrr_at_10 value: 98.0 - type: mrr_at_100 value: 98.0 - type: mrr_at_1000 value: 98.0 - type: mrr_at_3 value: 98.0 - type: mrr_at_5 value: 98.0 - type: ndcg_at_1 value: 93.0 - type: ndcg_at_10 value: 82.258 - type: ndcg_at_100 value: 64.34 - type: ndcg_at_1000 
value: 57.912 - type: ndcg_at_3 value: 90.827 - type: ndcg_at_5 value: 86.79 - type: precision_at_1 value: 96.0 - type: precision_at_10 value: 84.8 - type: precision_at_100 value: 66.0 - type: precision_at_1000 value: 25.356 - type: precision_at_3 value: 94.667 - type: precision_at_5 value: 90.4 - type: recall_at_1 value: 0.254 - type: recall_at_10 value: 2.1950000000000003 - type: recall_at_100 value: 16.088 - type: recall_at_1000 value: 54.559000000000005 - type: recall_at_3 value: 0.75 - type: recall_at_5 value: 1.191 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 2.976 - type: map_at_10 value: 11.389000000000001 - type: map_at_100 value: 18.429000000000002 - type: map_at_1000 value: 20.113 - type: map_at_3 value: 6.483 - type: map_at_5 value: 8.770999999999999 - type: mrr_at_1 value: 40.816 - type: mrr_at_10 value: 58.118 - type: mrr_at_100 value: 58.489999999999995 - type: mrr_at_1000 value: 58.489999999999995 - type: mrr_at_3 value: 53.061 - type: mrr_at_5 value: 57.041 - type: ndcg_at_1 value: 40.816 - type: ndcg_at_10 value: 30.567 - type: ndcg_at_100 value: 42.44 - type: ndcg_at_1000 value: 53.480000000000004 - type: ndcg_at_3 value: 36.016 - type: ndcg_at_5 value: 34.257 - type: precision_at_1 value: 42.857 - type: precision_at_10 value: 25.714 - type: precision_at_100 value: 8.429 - type: precision_at_1000 value: 1.5939999999999999 - type: precision_at_3 value: 36.735 - type: precision_at_5 value: 33.878 - type: recall_at_1 value: 2.976 - type: recall_at_10 value: 17.854999999999997 - type: recall_at_100 value: 51.833 - type: recall_at_1000 value: 86.223 - type: recall_at_3 value: 7.887 - type: recall_at_5 value: 12.026 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 85.1174 - type: ap value: 30.169441069345748 - type: f1 value: 69.79254701873245 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 72.58347481607245 - type: f1 value: 72.74877295564937 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 53.90586138221305 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.35769207844072 - type: cos_sim_ap value: 77.9645072410354 - type: cos_sim_f1 value: 71.32352941176471 - type: cos_sim_precision value: 66.5903890160183 - type: cos_sim_recall value: 76.78100263852242 - type: dot_accuracy value: 87.37557370209214 - type: dot_ap value: 77.96250046429908 - type: dot_f1 value: 71.28932757557064 - type: dot_precision value: 66.95249130938586 - type: dot_recall value: 76.22691292875989 - type: euclidean_accuracy value: 87.35173153722357 - type: euclidean_ap value: 77.96520460741593 - type: euclidean_f1 value: 71.32470733210104 - type: euclidean_precision value: 66.91329479768785 - type: euclidean_recall value: 76.35883905013192 - 
type: manhattan_accuracy value: 87.25636287774931 - type: manhattan_ap value: 77.77752485611796 - type: manhattan_f1 value: 71.18148599269183 - type: manhattan_precision value: 66.10859728506787 - type: manhattan_recall value: 77.0976253298153 - type: max_accuracy value: 87.37557370209214 - type: max_ap value: 77.96520460741593 - type: max_f1 value: 71.32470733210104 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.38176737687739 - type: cos_sim_ap value: 86.58811861657401 - type: cos_sim_f1 value: 79.09430644097604 - type: cos_sim_precision value: 75.45085977911366 - type: cos_sim_recall value: 83.10748383122882 - type: dot_accuracy value: 89.38370784336554 - type: dot_ap value: 86.58840606004333 - type: dot_f1 value: 79.10179860068133 - type: dot_precision value: 75.44546153308643 - type: dot_recall value: 83.13058207576223 - type: euclidean_accuracy value: 89.38564830985369 - type: euclidean_ap value: 86.58820721061164 - type: euclidean_f1 value: 79.09070942235888 - type: euclidean_precision value: 75.38729937194697 - type: euclidean_recall value: 83.17677856482906 - type: manhattan_accuracy value: 89.40699344122326 - type: manhattan_ap value: 86.60631843011362 - type: manhattan_f1 value: 79.14949970570925 - type: manhattan_precision value: 75.78191039729502 - type: manhattan_recall value: 82.83030489682784 - type: max_accuracy value: 89.40699344122326 - type: max_ap value: 86.60631843011362 - type: max_f1 value: 79.14949970570925 - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: b44c3b011063adb25877c13823db83bb193913c4 metrics: - type: cos_sim_pearson value: 65.58442135663871 - type: cos_sim_spearman value: 72.2538631361313 - type: euclidean_pearson value: 70.97255486607429 - type: euclidean_spearman value: 72.25374250228647 - type: manhattan_pearson value: 70.83250199989911 - type: manhattan_spearman value: 72.14819496536272 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 metrics: - type: cos_sim_pearson value: 59.99478404929932 - type: cos_sim_spearman value: 62.61836216999812 - type: euclidean_pearson value: 66.86429811933593 - type: euclidean_spearman value: 62.6183520374191 - type: manhattan_pearson value: 66.8063778911633 - type: manhattan_spearman value: 62.569607573241115 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.98400000000001 - type: f1 value: 51.21447361350723 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 metrics: - type: cos_sim_pearson value: 79.11941660686553 - type: cos_sim_spearman value: 81.25029594540435 - type: euclidean_pearson value: 82.06973504238826 - type: euclidean_spearman value: 81.2501989488524 - type: manhattan_pearson value: 82.10094630392753 - type: manhattan_spearman value: 81.27987244392389 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 metrics: - type: v_measure value: 47.07270168705156 - task: type: Clustering dataset: type: 
C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f metrics: - type: v_measure value: 45.98511703185043 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: 8d7f1e942507dac42dc58017c1a001c3717da7df metrics: - type: map value: 88.19895157194931 - type: mrr value: 90.21424603174603 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: 23d186750531a14a0357ca22cd92d712fd512ea0 metrics: - type: map value: 88.03317320980119 - type: mrr value: 89.9461507936508 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 metrics: - type: map_at_1 value: 29.037000000000003 - type: map_at_10 value: 42.001 - type: map_at_100 value: 43.773 - type: map_at_1000 value: 43.878 - type: map_at_3 value: 37.637 - type: map_at_5 value: 40.034 - type: mrr_at_1 value: 43.136 - type: mrr_at_10 value: 51.158 - type: mrr_at_100 value: 52.083 - type: mrr_at_1000 value: 52.12 - type: mrr_at_3 value: 48.733 - type: mrr_at_5 value: 50.025 - type: ndcg_at_1 value: 43.136 - type: ndcg_at_10 value: 48.685 - type: ndcg_at_100 value: 55.513 - type: ndcg_at_1000 value: 57.242000000000004 - type: ndcg_at_3 value: 43.329 - type: ndcg_at_5 value: 45.438 - type: precision_at_1 value: 43.136 - type: precision_at_10 value: 10.56 - type: precision_at_100 value: 1.6129999999999998 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 24.064 - type: precision_at_5 value: 17.269000000000002 - type: recall_at_1 value: 29.037000000000003 - type: recall_at_10 value: 59.245000000000005 - type: recall_at_100 value: 87.355 - type: recall_at_1000 value: 98.74000000000001 - type: recall_at_3 value: 42.99 - type: recall_at_5 value: 49.681999999999995 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 metrics: - type: cos_sim_accuracy value: 82.68190018039687 - type: cos_sim_ap value: 90.18017125327886 - type: cos_sim_f1 value: 83.64080906868193 - type: cos_sim_precision value: 79.7076890489303 - type: cos_sim_recall value: 87.98223053542202 - type: dot_accuracy value: 82.68190018039687 - type: dot_ap value: 90.18782350103646 - type: dot_f1 value: 83.64242087729039 - type: dot_precision value: 79.65313028764805 - type: dot_recall value: 88.05237315875614 - type: euclidean_accuracy value: 82.68190018039687 - type: euclidean_ap value: 90.1801957900632 - type: euclidean_f1 value: 83.63636363636364 - type: euclidean_precision value: 79.52772506852203 - type: euclidean_recall value: 88.19265840542437 - type: manhattan_accuracy value: 82.14070956103427 - type: manhattan_ap value: 89.96178420101427 - type: manhattan_f1 value: 83.21087838578791 - type: manhattan_precision value: 78.35605121850475 - type: manhattan_recall value: 88.70703764320785 - type: max_accuracy value: 82.68190018039687 - type: max_ap value: 90.18782350103646 - type: max_f1 value: 83.64242087729039 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: 1271c7809071a13532e05f25fb53511ffce77117 metrics: - type: map_at_1 value: 72.234 - type: map_at_10 value: 80.10000000000001 - type: map_at_100 value: 80.36 - type: map_at_1000 value: 80.363 - type: map_at_3 value: 78.315 - 
type: map_at_5 value: 79.607 - type: mrr_at_1 value: 72.392 - type: mrr_at_10 value: 80.117 - type: mrr_at_100 value: 80.36999999999999 - type: mrr_at_1000 value: 80.373 - type: mrr_at_3 value: 78.469 - type: mrr_at_5 value: 79.633 - type: ndcg_at_1 value: 72.392 - type: ndcg_at_10 value: 83.651 - type: ndcg_at_100 value: 84.749 - type: ndcg_at_1000 value: 84.83000000000001 - type: ndcg_at_3 value: 80.253 - type: ndcg_at_5 value: 82.485 - type: precision_at_1 value: 72.392 - type: precision_at_10 value: 9.557 - type: precision_at_100 value: 1.004 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 28.732000000000003 - type: precision_at_5 value: 18.377 - type: recall_at_1 value: 72.234 - type: recall_at_10 value: 94.573 - type: recall_at_100 value: 99.368 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 85.669 - type: recall_at_5 value: 91.01700000000001 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: a1a333e290fe30b10f3f56498e3a0d911a693ced metrics: - type: map_at_1 value: 26.173999999999996 - type: map_at_10 value: 80.04 - type: map_at_100 value: 82.94500000000001 - type: map_at_1000 value: 82.98100000000001 - type: map_at_3 value: 55.562999999999995 - type: map_at_5 value: 69.89800000000001 - type: mrr_at_1 value: 89.5 - type: mrr_at_10 value: 92.996 - type: mrr_at_100 value: 93.06400000000001 - type: mrr_at_1000 value: 93.065 - type: mrr_at_3 value: 92.658 - type: mrr_at_5 value: 92.84599999999999 - type: ndcg_at_1 value: 89.5 - type: ndcg_at_10 value: 87.443 - type: ndcg_at_100 value: 90.253 - type: ndcg_at_1000 value: 90.549 - type: ndcg_at_3 value: 85.874 - type: ndcg_at_5 value: 84.842 - type: precision_at_1 value: 89.5 - type: precision_at_10 value: 41.805 - type: precision_at_100 value: 4.827 - type: precision_at_1000 value: 0.49 - type: precision_at_3 value: 76.85 - type: precision_at_5 value: 64.8 - type: recall_at_1 value: 26.173999999999996 - type: recall_at_10 value: 89.101 - type: recall_at_100 value: 98.08099999999999 - type: recall_at_1000 value: 99.529 - type: recall_at_3 value: 57.902 - type: recall_at_5 value: 74.602 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 metrics: - type: map_at_1 value: 56.10000000000001 - type: map_at_10 value: 66.15299999999999 - type: map_at_100 value: 66.625 - type: map_at_1000 value: 66.636 - type: map_at_3 value: 63.632999999999996 - type: map_at_5 value: 65.293 - type: mrr_at_1 value: 56.10000000000001 - type: mrr_at_10 value: 66.15299999999999 - type: mrr_at_100 value: 66.625 - type: mrr_at_1000 value: 66.636 - type: mrr_at_3 value: 63.632999999999996 - type: mrr_at_5 value: 65.293 - type: ndcg_at_1 value: 56.10000000000001 - type: ndcg_at_10 value: 71.146 - type: ndcg_at_100 value: 73.27799999999999 - type: ndcg_at_1000 value: 73.529 - type: ndcg_at_3 value: 66.09 - type: ndcg_at_5 value: 69.08999999999999 - type: precision_at_1 value: 56.10000000000001 - type: precision_at_10 value: 8.68 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 24.4 - type: precision_at_5 value: 16.1 - type: recall_at_1 value: 56.10000000000001 - type: recall_at_10 value: 86.8 - type: recall_at_100 value: 96.39999999999999 - type: recall_at_1000 value: 98.3 - type: recall_at_3 value: 73.2 - type: recall_at_5 value: 80.5 - task: type: Classification dataset: type: 
C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: 421605374b29664c5fc098418fe20ada9bd55f8a metrics: - type: accuracy value: 54.52096960369373 - type: f1 value: 40.930845295808695 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: b7c64bd89eb87f8ded463478346f76731f07bf8b metrics: - type: accuracy value: 86.51031894934334 - type: ap value: 55.9516014323483 - type: f1 value: 81.54813679326381 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: 17f9b096f80380fce5ed12a9be8be7784b337daf metrics: - type: cos_sim_pearson value: 69.67437838574276 - type: cos_sim_spearman value: 73.81314174653045 - type: euclidean_pearson value: 72.63430276680275 - type: euclidean_spearman value: 73.81358736777001 - type: manhattan_pearson value: 72.58743833842829 - type: manhattan_spearman value: 73.7590419009179 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 31.648613483640254 - type: mrr value: 30.37420634920635 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 metrics: - type: map_at_1 value: 73.28099999999999 - type: map_at_10 value: 81.977 - type: map_at_100 value: 82.222 - type: map_at_1000 value: 82.22699999999999 - type: map_at_3 value: 80.441 - type: map_at_5 value: 81.46600000000001 - type: mrr_at_1 value: 75.673 - type: mrr_at_10 value: 82.41000000000001 - type: mrr_at_100 value: 82.616 - type: mrr_at_1000 value: 82.621 - type: mrr_at_3 value: 81.094 - type: mrr_at_5 value: 81.962 - type: ndcg_at_1 value: 75.673 - type: ndcg_at_10 value: 85.15599999999999 - type: ndcg_at_100 value: 86.151 - type: ndcg_at_1000 value: 86.26899999999999 - type: ndcg_at_3 value: 82.304 - type: ndcg_at_5 value: 84.009 - type: precision_at_1 value: 75.673 - type: precision_at_10 value: 10.042 - type: precision_at_100 value: 1.052 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 30.673000000000002 - type: precision_at_5 value: 19.326999999999998 - type: recall_at_1 value: 73.28099999999999 - type: recall_at_10 value: 94.446 - type: recall_at_100 value: 98.737 - type: recall_at_1000 value: 99.649 - type: recall_at_3 value: 86.984 - type: recall_at_5 value: 91.024 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 81.08607935440484 - type: f1 value: 78.24879986066307 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 86.05917955615332 - type: f1 value: 85.05279279434997 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 metrics: - type: map_at_1 value: 56.2 - type: map_at_10 value: 62.57899999999999 - type: map_at_100 value: 63.154999999999994 - type: map_at_1000 value: 63.193 - type: map_at_3 value: 61.217 - type: map_at_5 value: 62.012 - type: mrr_at_1 value: 56.3 - type: mrr_at_10 value: 62.629000000000005 - type: mrr_at_100 value: 
63.205999999999996 - type: mrr_at_1000 value: 63.244 - type: mrr_at_3 value: 61.267 - type: mrr_at_5 value: 62.062 - type: ndcg_at_1 value: 56.2 - type: ndcg_at_10 value: 65.592 - type: ndcg_at_100 value: 68.657 - type: ndcg_at_1000 value: 69.671 - type: ndcg_at_3 value: 62.808 - type: ndcg_at_5 value: 64.24499999999999 - type: precision_at_1 value: 56.2 - type: precision_at_10 value: 7.5 - type: precision_at_100 value: 0.899 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 22.467000000000002 - type: precision_at_5 value: 14.180000000000001 - type: recall_at_1 value: 56.2 - type: recall_at_10 value: 75.0 - type: recall_at_100 value: 89.9 - type: recall_at_1000 value: 97.89999999999999 - type: recall_at_3 value: 67.4 - type: recall_at_5 value: 70.89999999999999 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a metrics: - type: accuracy value: 76.87666666666667 - type: f1 value: 76.7317686219665 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: 66e76a618a34d6d565d5538088562851e6daa7ec metrics: - type: cos_sim_accuracy value: 79.64266377910124 - type: cos_sim_ap value: 84.78274442344829 - type: cos_sim_f1 value: 81.16947472745292 - type: cos_sim_precision value: 76.47058823529412 - type: cos_sim_recall value: 86.48363252375924 - type: dot_accuracy value: 79.64266377910124 - type: dot_ap value: 84.7851404063692 - type: dot_f1 value: 81.16947472745292 - type: dot_precision value: 76.47058823529412 - type: dot_recall value: 86.48363252375924 - type: euclidean_accuracy value: 79.64266377910124 - type: euclidean_ap value: 84.78068373762378 - type: euclidean_f1 value: 81.14794656110837 - type: euclidean_precision value: 76.35009310986965 - type: euclidean_recall value: 86.58922914466737 - type: manhattan_accuracy value: 79.48023822414727 - type: manhattan_ap value: 84.72928897427576 - type: manhattan_f1 value: 81.32084770823064 - type: manhattan_precision value: 76.24768946395564 - type: manhattan_recall value: 87.11721224920802 - type: max_accuracy value: 79.64266377910124 - type: max_ap value: 84.7851404063692 - type: max_f1 value: 81.32084770823064 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: e610f2ebd179a8fda30ae534c3878750a96db120 metrics: - type: accuracy value: 94.3 - type: ap value: 92.8664032274438 - type: f1 value: 94.29311102997727 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 metrics: - type: cos_sim_pearson value: 48.51392279882909 - type: cos_sim_spearman value: 54.06338895994974 - type: euclidean_pearson value: 52.58480559573412 - type: euclidean_spearman value: 54.06417276612201 - type: manhattan_pearson value: 52.69525121721343 - type: manhattan_spearman value: 54.048147455389675 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 metrics: - type: cos_sim_pearson value: 29.728387290757325 - type: cos_sim_spearman value: 31.366121633635284 - type: euclidean_pearson value: 29.14588368552961 - type: euclidean_spearman value: 31.36764411112844 - type: manhattan_pearson value: 29.63517350523121 - type: manhattan_spearman value: 31.94157020583762 - task: type: STS 
dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 63.64868296271406 - type: cos_sim_spearman value: 66.12800618164744 - type: euclidean_pearson value: 63.21405767340238 - type: euclidean_spearman value: 66.12786567790748 - type: manhattan_pearson value: 64.04300276525848 - type: manhattan_spearman value: 66.5066857145652 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 metrics: - type: cos_sim_pearson value: 81.2302623912794 - type: cos_sim_spearman value: 81.16833673266562 - type: euclidean_pearson value: 79.47647843876024 - type: euclidean_spearman value: 81.16944349524972 - type: manhattan_pearson value: 79.84947238492208 - type: manhattan_spearman value: 81.64626599410026 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: 76631901a18387f85eaa53e5450019b87ad58ef9 metrics: - type: map value: 67.80129586475687 - type: mrr value: 77.77402311635554 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: 8731a845f1bf500a4f111cf1070785c793d10e64 metrics: - type: map_at_1 value: 28.666999999999998 - type: map_at_10 value: 81.063 - type: map_at_100 value: 84.504 - type: map_at_1000 value: 84.552 - type: map_at_3 value: 56.897 - type: map_at_5 value: 70.073 - type: mrr_at_1 value: 92.087 - type: mrr_at_10 value: 94.132 - type: mrr_at_100 value: 94.19800000000001 - type: mrr_at_1000 value: 94.19999999999999 - type: mrr_at_3 value: 93.78999999999999 - type: mrr_at_5 value: 94.002 - type: ndcg_at_1 value: 92.087 - type: ndcg_at_10 value: 87.734 - type: ndcg_at_100 value: 90.736 - type: ndcg_at_1000 value: 91.184 - type: ndcg_at_3 value: 88.78 - type: ndcg_at_5 value: 87.676 - type: precision_at_1 value: 92.087 - type: precision_at_10 value: 43.46 - type: precision_at_100 value: 5.07 - type: precision_at_1000 value: 0.518 - type: precision_at_3 value: 77.49000000000001 - type: precision_at_5 value: 65.194 - type: recall_at_1 value: 28.666999999999998 - type: recall_at_10 value: 86.632 - type: recall_at_100 value: 96.646 - type: recall_at_1000 value: 98.917 - type: recall_at_3 value: 58.333999999999996 - type: recall_at_5 value: 72.974 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 metrics: - type: accuracy value: 52.971999999999994 - type: f1 value: 50.2898280984929 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: 5798586b105c0434e4f0fe5e767abe619442cf93 metrics: - type: v_measure value: 86.0797948663824 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d metrics: - type: v_measure value: 85.10759092255017 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 metrics: - type: map_at_1 value: 65.60000000000001 - type: map_at_10 value: 74.773 - type: map_at_100 value: 75.128 - type: map_at_1000 value: 75.136 - type: map_at_3 value: 73.05 - type: map_at_5 value: 74.13499999999999 - type: mrr_at_1 value: 
65.60000000000001 - type: mrr_at_10 value: 74.773 - type: mrr_at_100 value: 75.128 - type: mrr_at_1000 value: 75.136 - type: mrr_at_3 value: 73.05 - type: mrr_at_5 value: 74.13499999999999 - type: ndcg_at_1 value: 65.60000000000001 - type: ndcg_at_10 value: 78.84299999999999 - type: ndcg_at_100 value: 80.40899999999999 - type: ndcg_at_1000 value: 80.57 - type: ndcg_at_3 value: 75.40599999999999 - type: ndcg_at_5 value: 77.351 - type: precision_at_1 value: 65.60000000000001 - type: precision_at_10 value: 9.139999999999999 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 27.400000000000002 - type: precision_at_5 value: 17.380000000000003 - type: recall_at_1 value: 65.60000000000001 - type: recall_at_10 value: 91.4 - type: recall_at_100 value: 98.4 - type: recall_at_1000 value: 99.6 - type: recall_at_3 value: 82.19999999999999 - type: recall_at_5 value: 86.9 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: 339287def212450dcaa9df8c22bf93e9980c7023 metrics: - type: accuracy value: 89.47 - type: ap value: 75.59561751845389 - type: f1 value: 87.95207751382563 - dataset: config: default name: MTEB AlloProfClusteringP2P revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: v_measure value: 76.05592323841036 task: type: Clustering - dataset: config: default name: MTEB AlloProfClusteringS2S revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: v_measure value: 64.51718058866508 task: type: Clustering - dataset: config: default name: MTEB AlloprofReranking revision: 666fdacebe0291776e86f29345663dfaf80a0db9 split: test type: lyon-nlp/mteb-fr-reranking-alloprof-s2p metrics: - type: map value: 73.08278490943373 - type: mrr value: 74.66561454570449 task: type: Reranking - dataset: config: default name: MTEB AlloprofRetrieval revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: map_at_1 value: 38.912 - type: map_at_10 value: 52.437999999999995 - type: map_at_100 value: 53.38 - type: map_at_1000 value: 53.427 - type: map_at_3 value: 48.879 - type: map_at_5 value: 50.934000000000005 - type: mrr_at_1 value: 44.085 - type: mrr_at_10 value: 55.337 - type: mrr_at_100 value: 56.016999999999996 - type: mrr_at_1000 value: 56.043 - type: mrr_at_3 value: 52.55499999999999 - type: mrr_at_5 value: 54.20399999999999 - type: ndcg_at_1 value: 44.085 - type: ndcg_at_10 value: 58.876 - type: ndcg_at_100 value: 62.714000000000006 - type: ndcg_at_1000 value: 63.721000000000004 - type: ndcg_at_3 value: 52.444 - type: ndcg_at_5 value: 55.692 - type: precision_at_1 value: 44.085 - type: precision_at_10 value: 9.21 - type: precision_at_100 value: 1.164 - type: precision_at_1000 value: 0.128 - type: precision_at_3 value: 23.043 - type: precision_at_5 value: 15.898000000000001 - type: recall_at_1 value: 38.912 - type: recall_at_10 value: 75.577 - type: recall_at_100 value: 92.038 - type: recall_at_1000 value: 99.325 - type: recall_at_3 value: 58.592 - type: recall_at_5 value: 66.235 task: type: Retrieval - dataset: config: fr name: MTEB AmazonReviewsClassification (fr) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 55.532000000000004 - type: f1 value: 52.5783943471605 task: type: Classification - dataset: config: default name: MTEB BSARDRetrieval revision: 
5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 split: test type: maastrichtlawtech/bsard metrics: - type: map_at_1 value: 8.108 - type: map_at_10 value: 14.710999999999999 - type: map_at_100 value: 15.891 - type: map_at_1000 value: 15.983 - type: map_at_3 value: 12.237 - type: map_at_5 value: 13.679 - type: mrr_at_1 value: 8.108 - type: mrr_at_10 value: 14.710999999999999 - type: mrr_at_100 value: 15.891 - type: mrr_at_1000 value: 15.983 - type: mrr_at_3 value: 12.237 - type: mrr_at_5 value: 13.679 - type: ndcg_at_1 value: 8.108 - type: ndcg_at_10 value: 18.796 - type: ndcg_at_100 value: 25.098 - type: ndcg_at_1000 value: 27.951999999999998 - type: ndcg_at_3 value: 13.712 - type: ndcg_at_5 value: 16.309 - type: precision_at_1 value: 8.108 - type: precision_at_10 value: 3.198 - type: precision_at_100 value: 0.626 - type: precision_at_1000 value: 0.086 - type: precision_at_3 value: 6.006 - type: precision_at_5 value: 4.865 - type: recall_at_1 value: 8.108 - type: recall_at_10 value: 31.982 - type: recall_at_100 value: 62.613 - type: recall_at_1000 value: 86.036 - type: recall_at_3 value: 18.018 - type: recall_at_5 value: 24.324 task: type: Retrieval - dataset: config: default name: MTEB HALClusteringS2S revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 split: test type: lyon-nlp/clustering-hal-s2s metrics: - type: v_measure value: 30.833269778867116 task: type: Clustering - dataset: config: default name: MTEB MLSUMClusteringP2P revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: mlsum metrics: - type: v_measure value: 50.0281928004713 task: type: Clustering - dataset: config: default name: MTEB MLSUMClusteringS2S revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: mlsum metrics: - type: v_measure value: 43.699961510636534 task: type: Clustering - dataset: config: fr name: MTEB MTOPDomainClassification (fr) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 96.68963357344191 - type: f1 value: 96.45175170820961 task: type: Classification - dataset: config: fr name: MTEB MTOPIntentClassification (fr) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 87.46946445349202 - type: f1 value: 65.79860440988624 task: type: Classification - dataset: config: fra name: MTEB MasakhaNEWSClassification (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: accuracy value: 82.60663507109005 - type: f1 value: 77.20462646604777 task: type: Classification - dataset: config: fra name: MTEB MasakhaNEWSClusteringP2P (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: v_measure value: 60.19311264967803 task: type: Clustering - dataset: config: fra name: MTEB MasakhaNEWSClusteringS2S (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: v_measure value: 63.6235764409785 task: type: Clustering - dataset: config: fr name: MTEB MassiveIntentClassification (fr) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 81.65097511768661 - type: f1 value: 78.77796091490924 task: type: Classification - dataset: config: fr name: MTEB MassiveScenarioClassification (fr) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 86.64425016812373 - 
type: f1 value: 85.4912728670017 task: type: Classification - dataset: config: fr name: MTEB MintakaRetrieval (fr) revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e split: test type: jinaai/mintakaqa metrics: - type: map_at_1 value: 35.913000000000004 - type: map_at_10 value: 48.147 - type: map_at_100 value: 48.91 - type: map_at_1000 value: 48.949 - type: map_at_3 value: 45.269999999999996 - type: map_at_5 value: 47.115 - type: mrr_at_1 value: 35.913000000000004 - type: mrr_at_10 value: 48.147 - type: mrr_at_100 value: 48.91 - type: mrr_at_1000 value: 48.949 - type: mrr_at_3 value: 45.269999999999996 - type: mrr_at_5 value: 47.115 - type: ndcg_at_1 value: 35.913000000000004 - type: ndcg_at_10 value: 54.03 - type: ndcg_at_100 value: 57.839 - type: ndcg_at_1000 value: 58.925000000000004 - type: ndcg_at_3 value: 48.217999999999996 - type: ndcg_at_5 value: 51.56699999999999 - type: precision_at_1 value: 35.913000000000004 - type: precision_at_10 value: 7.244000000000001 - type: precision_at_100 value: 0.9039999999999999 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 18.905 - type: precision_at_5 value: 12.981000000000002 - type: recall_at_1 value: 35.913000000000004 - type: recall_at_10 value: 72.441 - type: recall_at_100 value: 90.41799999999999 - type: recall_at_1000 value: 99.099 - type: recall_at_3 value: 56.716 - type: recall_at_5 value: 64.90599999999999 task: type: Retrieval - dataset: config: fr name: MTEB OpusparcusPC (fr) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test type: GEM/opusparcus metrics: - type: cos_sim_accuracy value: 99.90069513406156 - type: cos_sim_ap value: 100.0 - type: cos_sim_f1 value: 99.95032290114257 - type: cos_sim_precision value: 100.0 - type: cos_sim_recall value: 99.90069513406156 - type: dot_accuracy value: 99.90069513406156 - type: dot_ap value: 100.0 - type: dot_f1 value: 99.95032290114257 - type: dot_precision value: 100.0 - type: dot_recall value: 99.90069513406156 - type: euclidean_accuracy value: 99.90069513406156 - type: euclidean_ap value: 100.0 - type: euclidean_f1 value: 99.95032290114257 - type: euclidean_precision value: 100.0 - type: euclidean_recall value: 99.90069513406156 - type: manhattan_accuracy value: 99.90069513406156 - type: manhattan_ap value: 100.0 - type: manhattan_f1 value: 99.95032290114257 - type: manhattan_precision value: 100.0 - type: manhattan_recall value: 99.90069513406156 - type: max_accuracy value: 99.90069513406156 - type: max_ap value: 100.0 - type: max_f1 value: 99.95032290114257 task: type: PairClassification - dataset: config: fr name: MTEB PawsX (fr) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: paws-x metrics: - type: cos_sim_accuracy value: 75.25 - type: cos_sim_ap value: 80.86376001270014 - type: cos_sim_f1 value: 73.65945437441204 - type: cos_sim_precision value: 64.02289452166802 - type: cos_sim_recall value: 86.71096345514951 - type: dot_accuracy value: 75.25 - type: dot_ap value: 80.93686107633002 - type: dot_f1 value: 73.65945437441204 - type: dot_precision value: 64.02289452166802 - type: dot_recall value: 86.71096345514951 - type: euclidean_accuracy value: 75.25 - type: euclidean_ap value: 80.86379136218862 - type: euclidean_f1 value: 73.65945437441204 - type: euclidean_precision value: 64.02289452166802 - type: euclidean_recall value: 86.71096345514951 - type: manhattan_accuracy value: 75.3 - type: manhattan_ap value: 80.87826606097734 - type: manhattan_f1 value: 73.68421052631581 - type: manhattan_precision value: 64.0 - type: manhattan_recall 
value: 86.82170542635659 - type: max_accuracy value: 75.3 - type: max_ap value: 80.93686107633002 - type: max_f1 value: 73.68421052631581 task: type: PairClassification - dataset: config: default name: MTEB SICKFr revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a split: test type: Lajavaness/SICK-fr metrics: - type: cos_sim_pearson value: 81.42349425981143 - type: cos_sim_spearman value: 78.90454327031226 - type: euclidean_pearson value: 78.39086497435166 - type: euclidean_spearman value: 78.9046133980509 - type: manhattan_pearson value: 78.63743094286502 - type: manhattan_spearman value: 79.12136348449269 task: type: STS - dataset: config: fr name: MTEB STS22 (fr) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 81.452697919749 - type: cos_sim_spearman value: 82.58116836039301 - type: euclidean_pearson value: 81.04038478932786 - type: euclidean_spearman value: 82.58116836039301 - type: manhattan_pearson value: 81.37075396187771 - type: manhattan_spearman value: 82.73678231355368 task: type: STS - dataset: config: fr name: MTEB STSBenchmarkMultilingualSTS (fr) revision: 93d57ef91790589e3ce9c365164337a8a78b7632 split: test type: stsb_multi_mt metrics: - type: cos_sim_pearson value: 85.7419764013806 - type: cos_sim_spearman value: 85.46085808849622 - type: euclidean_pearson value: 83.70449639870063 - type: euclidean_spearman value: 85.46159013076233 - type: manhattan_pearson value: 83.95259510313929 - type: manhattan_spearman value: 85.8029724659458 task: type: STS - dataset: config: default name: MTEB SummEvalFr revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 split: test type: lyon-nlp/summarization-summeval-fr-p2p metrics: - type: cos_sim_pearson value: 32.61063271753325 - type: cos_sim_spearman value: 31.454589417353603 - type: dot_pearson value: 32.6106288643431 - type: dot_spearman value: 31.454589417353603 task: type: Summarization - dataset: config: default name: MTEB SyntecReranking revision: b205c5084a0934ce8af14338bf03feb19499c84d split: test type: lyon-nlp/mteb-fr-reranking-syntec-s2p metrics: - type: map value: 84.31666666666666 - type: mrr value: 84.31666666666666 task: type: Reranking - dataset: config: default name: MTEB SyntecRetrieval revision: 77f7e271bf4a92b24fce5119f3486b583ca016ff split: test type: lyon-nlp/mteb-fr-retrieval-syntec-s2p metrics: - type: map_at_1 value: 63.0 - type: map_at_10 value: 73.471 - type: map_at_100 value: 73.87 - type: map_at_1000 value: 73.87 - type: map_at_3 value: 70.5 - type: map_at_5 value: 73.05 - type: mrr_at_1 value: 63.0 - type: mrr_at_10 value: 73.471 - type: mrr_at_100 value: 73.87 - type: mrr_at_1000 value: 73.87 - type: mrr_at_3 value: 70.5 - type: mrr_at_5 value: 73.05 - type: ndcg_at_1 value: 63.0 - type: ndcg_at_10 value: 78.255 - type: ndcg_at_100 value: 79.88 - type: ndcg_at_1000 value: 79.88 - type: ndcg_at_3 value: 72.702 - type: ndcg_at_5 value: 77.264 - type: precision_at_1 value: 63.0 - type: precision_at_10 value: 9.3 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 26.333000000000002 - type: precision_at_5 value: 18.0 - type: recall_at_1 value: 63.0 - type: recall_at_10 value: 93.0 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 79.0 - type: recall_at_5 value: 90.0 task: type: Retrieval - dataset: config: fr name: MTEB XPQARetrieval (fr) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - 
type: map_at_1 value: 40.338 - type: map_at_10 value: 61.927 - type: map_at_100 value: 63.361999999999995 - type: map_at_1000 value: 63.405 - type: map_at_3 value: 55.479 - type: map_at_5 value: 59.732 - type: mrr_at_1 value: 63.551 - type: mrr_at_10 value: 71.006 - type: mrr_at_100 value: 71.501 - type: mrr_at_1000 value: 71.509 - type: mrr_at_3 value: 69.07 - type: mrr_at_5 value: 70.165 - type: ndcg_at_1 value: 63.551 - type: ndcg_at_10 value: 68.297 - type: ndcg_at_100 value: 73.13199999999999 - type: ndcg_at_1000 value: 73.751 - type: ndcg_at_3 value: 62.999 - type: ndcg_at_5 value: 64.89 - type: precision_at_1 value: 63.551 - type: precision_at_10 value: 15.661 - type: precision_at_100 value: 1.9789999999999999 - type: precision_at_1000 value: 0.207 - type: precision_at_3 value: 38.273 - type: precision_at_5 value: 27.61 - type: recall_at_1 value: 40.338 - type: recall_at_10 value: 77.267 - type: recall_at_100 value: 95.892 - type: recall_at_1000 value: 99.75500000000001 - type: recall_at_3 value: 60.36 - type: recall_at_5 value: 68.825 task: type: Retrieval - dataset: config: default name: MTEB 8TagsClustering revision: None split: test type: PL-MTEB/8tags-clustering metrics: - type: v_measure value: 51.36126303874126 task: type: Clustering - dataset: config: default name: MTEB AllegroReviews revision: None split: test type: PL-MTEB/allegro-reviews metrics: - type: accuracy value: 67.13717693836979 - type: f1 value: 57.27609848003782 task: type: Classification - dataset: config: default name: MTEB ArguAna-PL revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 split: test type: clarin-knext/arguana-pl metrics: - type: map_at_1 value: 35.276999999999994 - type: map_at_10 value: 51.086 - type: map_at_100 value: 51.788000000000004 - type: map_at_1000 value: 51.791 - type: map_at_3 value: 46.147 - type: map_at_5 value: 49.078 - type: mrr_at_1 value: 35.917 - type: mrr_at_10 value: 51.315999999999995 - type: mrr_at_100 value: 52.018 - type: mrr_at_1000 value: 52.022 - type: mrr_at_3 value: 46.349000000000004 - type: mrr_at_5 value: 49.297000000000004 - type: ndcg_at_1 value: 35.276999999999994 - type: ndcg_at_10 value: 59.870999999999995 - type: ndcg_at_100 value: 62.590999999999994 - type: ndcg_at_1000 value: 62.661 - type: ndcg_at_3 value: 49.745 - type: ndcg_at_5 value: 55.067 - type: precision_at_1 value: 35.276999999999994 - type: precision_at_10 value: 8.791 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.057 - type: precision_at_5 value: 14.637 - type: recall_at_1 value: 35.276999999999994 - type: recall_at_10 value: 87.909 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 60.171 - type: recall_at_5 value: 73.18599999999999 task: type: Retrieval - dataset: config: default name: MTEB CBD revision: None split: test type: PL-MTEB/cbd metrics: - type: accuracy value: 78.03000000000002 - type: ap value: 29.12548553897622 - type: f1 value: 66.54857118886073 task: type: Classification - dataset: config: default name: MTEB CDSC-E revision: None split: test type: PL-MTEB/cdsce-pairclassification metrics: - type: cos_sim_accuracy value: 89.0 - type: cos_sim_ap value: 76.75437826834582 - type: cos_sim_f1 value: 66.4850136239782 - type: cos_sim_precision value: 68.92655367231639 - type: cos_sim_recall value: 64.21052631578948 - type: dot_accuracy value: 89.0 - type: dot_ap value: 76.75437826834582 - type: dot_f1 value: 66.4850136239782 - type: dot_precision value: 
68.92655367231639 - type: dot_recall value: 64.21052631578948 - type: euclidean_accuracy value: 89.0 - type: euclidean_ap value: 76.75437826834582 - type: euclidean_f1 value: 66.4850136239782 - type: euclidean_precision value: 68.92655367231639 - type: euclidean_recall value: 64.21052631578948 - type: manhattan_accuracy value: 89.0 - type: manhattan_ap value: 76.66074220647083 - type: manhattan_f1 value: 66.47058823529412 - type: manhattan_precision value: 75.33333333333333 - type: manhattan_recall value: 59.473684210526315 - type: max_accuracy value: 89.0 - type: max_ap value: 76.75437826834582 - type: max_f1 value: 66.4850136239782 task: type: PairClassification - dataset: config: default name: MTEB CDSC-R revision: None split: test type: PL-MTEB/cdscr-sts metrics: - type: cos_sim_pearson value: 93.12903172428328 - type: cos_sim_spearman value: 92.66381487060741 - type: euclidean_pearson value: 90.37278396708922 - type: euclidean_spearman value: 92.66381487060741 - type: manhattan_pearson value: 90.32503296540962 - type: manhattan_spearman value: 92.6902938354313 task: type: STS - dataset: config: default name: MTEB DBPedia-PL revision: 76afe41d9af165cc40999fcaa92312b8b012064a split: test type: clarin-knext/dbpedia-pl metrics: - type: map_at_1 value: 8.83 - type: map_at_10 value: 18.326 - type: map_at_100 value: 26.496 - type: map_at_1000 value: 28.455000000000002 - type: map_at_3 value: 12.933 - type: map_at_5 value: 15.168000000000001 - type: mrr_at_1 value: 66.0 - type: mrr_at_10 value: 72.76700000000001 - type: mrr_at_100 value: 73.203 - type: mrr_at_1000 value: 73.219 - type: mrr_at_3 value: 71.458 - type: mrr_at_5 value: 72.246 - type: ndcg_at_1 value: 55.375 - type: ndcg_at_10 value: 41.3 - type: ndcg_at_100 value: 45.891 - type: ndcg_at_1000 value: 52.905 - type: ndcg_at_3 value: 46.472 - type: ndcg_at_5 value: 43.734 - type: precision_at_1 value: 66.0 - type: precision_at_10 value: 33.074999999999996 - type: precision_at_100 value: 11.094999999999999 - type: precision_at_1000 value: 2.374 - type: precision_at_3 value: 48.583 - type: precision_at_5 value: 42.0 - type: recall_at_1 value: 8.83 - type: recall_at_10 value: 22.587 - type: recall_at_100 value: 50.61600000000001 - type: recall_at_1000 value: 73.559 - type: recall_at_3 value: 13.688 - type: recall_at_5 value: 16.855 task: type: Retrieval - dataset: config: default name: MTEB FiQA-PL revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e split: test type: clarin-knext/fiqa-pl metrics: - type: map_at_1 value: 20.587 - type: map_at_10 value: 33.095 - type: map_at_100 value: 35.24 - type: map_at_1000 value: 35.429 - type: map_at_3 value: 28.626 - type: map_at_5 value: 31.136999999999997 - type: mrr_at_1 value: 40.586 - type: mrr_at_10 value: 49.033 - type: mrr_at_100 value: 49.952999999999996 - type: mrr_at_1000 value: 49.992 - type: mrr_at_3 value: 46.553 - type: mrr_at_5 value: 48.035 - type: ndcg_at_1 value: 40.586 - type: ndcg_at_10 value: 41.046 - type: ndcg_at_100 value: 48.586 - type: ndcg_at_1000 value: 51.634 - type: ndcg_at_3 value: 36.773 - type: ndcg_at_5 value: 38.389 - type: precision_at_1 value: 40.586 - type: precision_at_10 value: 11.466 - type: precision_at_100 value: 1.909 - type: precision_at_1000 value: 0.245 - type: precision_at_3 value: 24.434 - type: precision_at_5 value: 18.426000000000002 - type: recall_at_1 value: 20.587 - type: recall_at_10 value: 47.986000000000004 - type: recall_at_100 value: 75.761 - type: recall_at_1000 value: 94.065 - type: recall_at_3 value: 33.339 - type: recall_at_5 value: 
39.765 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA-PL revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 split: test type: clarin-knext/hotpotqa-pl metrics: - type: map_at_1 value: 40.878 - type: map_at_10 value: 58.775999999999996 - type: map_at_100 value: 59.632 - type: map_at_1000 value: 59.707 - type: map_at_3 value: 56.074 - type: map_at_5 value: 57.629 - type: mrr_at_1 value: 81.756 - type: mrr_at_10 value: 86.117 - type: mrr_at_100 value: 86.299 - type: mrr_at_1000 value: 86.30600000000001 - type: mrr_at_3 value: 85.345 - type: mrr_at_5 value: 85.832 - type: ndcg_at_1 value: 81.756 - type: ndcg_at_10 value: 67.608 - type: ndcg_at_100 value: 70.575 - type: ndcg_at_1000 value: 71.99600000000001 - type: ndcg_at_3 value: 63.723 - type: ndcg_at_5 value: 65.70700000000001 - type: precision_at_1 value: 81.756 - type: precision_at_10 value: 13.619 - type: precision_at_100 value: 1.5939999999999999 - type: precision_at_1000 value: 0.178 - type: precision_at_3 value: 39.604 - type: precision_at_5 value: 25.332 - type: recall_at_1 value: 40.878 - type: recall_at_10 value: 68.096 - type: recall_at_100 value: 79.696 - type: recall_at_1000 value: 89.082 - type: recall_at_3 value: 59.406000000000006 - type: recall_at_5 value: 63.329 task: type: Retrieval - dataset: config: default name: MTEB MSMARCO-PL revision: 8634c07806d5cce3a6138e260e59b81760a0a640 split: test type: clarin-knext/msmarco-pl metrics: - type: map_at_1 value: 2.1839999999999997 - type: map_at_10 value: 11.346 - type: map_at_100 value: 30.325000000000003 - type: map_at_1000 value: 37.806 - type: map_at_3 value: 4.842 - type: map_at_5 value: 6.891 - type: mrr_at_1 value: 86.047 - type: mrr_at_10 value: 89.14699999999999 - type: mrr_at_100 value: 89.46600000000001 - type: mrr_at_1000 value: 89.46600000000001 - type: mrr_at_3 value: 89.14699999999999 - type: mrr_at_5 value: 89.14699999999999 - type: ndcg_at_1 value: 67.829 - type: ndcg_at_10 value: 62.222 - type: ndcg_at_100 value: 55.337 - type: ndcg_at_1000 value: 64.076 - type: ndcg_at_3 value: 68.12700000000001 - type: ndcg_at_5 value: 64.987 - type: precision_at_1 value: 86.047 - type: precision_at_10 value: 69.535 - type: precision_at_100 value: 32.93 - type: precision_at_1000 value: 6.6049999999999995 - type: precision_at_3 value: 79.845 - type: precision_at_5 value: 75.349 - type: recall_at_1 value: 2.1839999999999997 - type: recall_at_10 value: 12.866 - type: recall_at_100 value: 43.505 - type: recall_at_1000 value: 72.366 - type: recall_at_3 value: 4.947 - type: recall_at_5 value: 7.192 task: type: Retrieval - dataset: config: pl name: MTEB MassiveIntentClassification (pl) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 80.75319435104238 - type: f1 value: 77.58961444860606 task: type: Classification - dataset: config: pl name: MTEB MassiveScenarioClassification (pl) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 85.54472091459313 - type: f1 value: 84.29498563572106 task: type: Classification - dataset: config: default name: MTEB NFCorpus-PL revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 split: test type: clarin-knext/nfcorpus-pl metrics: - type: map_at_1 value: 4.367 - type: map_at_10 value: 10.38 - type: map_at_100 value: 13.516 - type: map_at_1000 value: 14.982000000000001 - type: map_at_3 value: 7.367 - type: map_at_5 value: 8.59 - type: mrr_at_1 value: 41.486000000000004 - type: 
mrr_at_10 value: 48.886 - type: mrr_at_100 value: 49.657000000000004 - type: mrr_at_1000 value: 49.713 - type: mrr_at_3 value: 46.904 - type: mrr_at_5 value: 48.065000000000005 - type: ndcg_at_1 value: 40.402 - type: ndcg_at_10 value: 30.885 - type: ndcg_at_100 value: 28.393 - type: ndcg_at_1000 value: 37.428 - type: ndcg_at_3 value: 35.394999999999996 - type: ndcg_at_5 value: 33.391999999999996 - type: precision_at_1 value: 41.486000000000004 - type: precision_at_10 value: 23.437 - type: precision_at_100 value: 7.638 - type: precision_at_1000 value: 2.0389999999999997 - type: precision_at_3 value: 32.817 - type: precision_at_5 value: 28.915999999999997 - type: recall_at_1 value: 4.367 - type: recall_at_10 value: 14.655000000000001 - type: recall_at_100 value: 29.665999999999997 - type: recall_at_1000 value: 62.073 - type: recall_at_3 value: 8.51 - type: recall_at_5 value: 10.689 task: type: Retrieval - dataset: config: default name: MTEB NQ-PL revision: f171245712cf85dd4700b06bef18001578d0ca8d split: test type: clarin-knext/nq-pl metrics: - type: map_at_1 value: 28.616000000000003 - type: map_at_10 value: 41.626000000000005 - type: map_at_100 value: 42.689 - type: map_at_1000 value: 42.733 - type: map_at_3 value: 37.729 - type: map_at_5 value: 39.879999999999995 - type: mrr_at_1 value: 32.068000000000005 - type: mrr_at_10 value: 44.029 - type: mrr_at_100 value: 44.87 - type: mrr_at_1000 value: 44.901 - type: mrr_at_3 value: 40.687 - type: mrr_at_5 value: 42.625 - type: ndcg_at_1 value: 32.068000000000005 - type: ndcg_at_10 value: 48.449999999999996 - type: ndcg_at_100 value: 53.13 - type: ndcg_at_1000 value: 54.186 - type: ndcg_at_3 value: 40.983999999999995 - type: ndcg_at_5 value: 44.628 - type: precision_at_1 value: 32.068000000000005 - type: precision_at_10 value: 7.9750000000000005 - type: precision_at_100 value: 1.061 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 18.404999999999998 - type: precision_at_5 value: 13.111 - type: recall_at_1 value: 28.616000000000003 - type: recall_at_10 value: 66.956 - type: recall_at_100 value: 87.657 - type: recall_at_1000 value: 95.548 - type: recall_at_3 value: 47.453 - type: recall_at_5 value: 55.87800000000001 task: type: Retrieval - dataset: config: default name: MTEB PAC revision: None split: test type: laugustyniak/abusive-clauses-pl metrics: - type: accuracy value: 69.04141326382856 - type: ap value: 77.47589122111044 - type: f1 value: 66.6332277374775 task: type: Classification - dataset: config: default name: MTEB PPC revision: None split: test type: PL-MTEB/ppc-pairclassification metrics: - type: cos_sim_accuracy value: 86.4 - type: cos_sim_ap value: 94.1044939667201 - type: cos_sim_f1 value: 88.78048780487805 - type: cos_sim_precision value: 87.22044728434504 - type: cos_sim_recall value: 90.39735099337747 - type: dot_accuracy value: 86.4 - type: dot_ap value: 94.1044939667201 - type: dot_f1 value: 88.78048780487805 - type: dot_precision value: 87.22044728434504 - type: dot_recall value: 90.39735099337747 - type: euclidean_accuracy value: 86.4 - type: euclidean_ap value: 94.1044939667201 - type: euclidean_f1 value: 88.78048780487805 - type: euclidean_precision value: 87.22044728434504 - type: euclidean_recall value: 90.39735099337747 - type: manhattan_accuracy value: 86.4 - type: manhattan_ap value: 94.11438365697387 - type: manhattan_f1 value: 88.77968877968877 - type: manhattan_precision value: 87.84440842787681 - type: manhattan_recall value: 89.73509933774835 - type: max_accuracy value: 86.4 - type: max_ap 
value: 94.11438365697387 - type: max_f1 value: 88.78048780487805 task: type: PairClassification - dataset: config: default name: MTEB PSC revision: None split: test type: PL-MTEB/psc-pairclassification metrics: - type: cos_sim_accuracy value: 97.86641929499072 - type: cos_sim_ap value: 99.36904211868182 - type: cos_sim_f1 value: 96.56203288490283 - type: cos_sim_precision value: 94.72140762463343 - type: cos_sim_recall value: 98.47560975609755 - type: dot_accuracy value: 97.86641929499072 - type: dot_ap value: 99.36904211868183 - type: dot_f1 value: 96.56203288490283 - type: dot_precision value: 94.72140762463343 - type: dot_recall value: 98.47560975609755 - type: euclidean_accuracy value: 97.86641929499072 - type: euclidean_ap value: 99.36904211868183 - type: euclidean_f1 value: 96.56203288490283 - type: euclidean_precision value: 94.72140762463343 - type: euclidean_recall value: 98.47560975609755 - type: manhattan_accuracy value: 98.14471243042672 - type: manhattan_ap value: 99.43359540492416 - type: manhattan_f1 value: 96.98795180722892 - type: manhattan_precision value: 95.83333333333334 - type: manhattan_recall value: 98.17073170731707 - type: max_accuracy value: 98.14471243042672 - type: max_ap value: 99.43359540492416 - type: max_f1 value: 96.98795180722892 task: type: PairClassification - dataset: config: default name: MTEB PolEmo2.0-IN revision: None split: test type: PL-MTEB/polemo2_in metrics: - type: accuracy value: 89.39058171745152 - type: f1 value: 86.8552093529568 task: type: Classification - dataset: config: default name: MTEB PolEmo2.0-OUT revision: None split: test type: PL-MTEB/polemo2_out metrics: - type: accuracy value: 74.97975708502024 - type: f1 value: 58.73081628832407 task: type: Classification - dataset: config: default name: MTEB Quora-PL revision: 0be27e93455051e531182b85e85e425aba12e9d4 split: test type: clarin-knext/quora-pl metrics: - type: map_at_1 value: 64.917 - type: map_at_10 value: 78.74600000000001 - type: map_at_100 value: 79.501 - type: map_at_1000 value: 79.524 - type: map_at_3 value: 75.549 - type: map_at_5 value: 77.495 - type: mrr_at_1 value: 74.9 - type: mrr_at_10 value: 82.112 - type: mrr_at_100 value: 82.314 - type: mrr_at_1000 value: 82.317 - type: mrr_at_3 value: 80.745 - type: mrr_at_5 value: 81.607 - type: ndcg_at_1 value: 74.83999999999999 - type: ndcg_at_10 value: 83.214 - type: ndcg_at_100 value: 84.997 - type: ndcg_at_1000 value: 85.207 - type: ndcg_at_3 value: 79.547 - type: ndcg_at_5 value: 81.46600000000001 - type: precision_at_1 value: 74.83999999999999 - type: precision_at_10 value: 12.822 - type: precision_at_100 value: 1.506 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 34.903 - type: precision_at_5 value: 23.16 - type: recall_at_1 value: 64.917 - type: recall_at_10 value: 92.27199999999999 - type: recall_at_100 value: 98.715 - type: recall_at_1000 value: 99.854 - type: recall_at_3 value: 82.04599999999999 - type: recall_at_5 value: 87.2 task: type: Retrieval - dataset: config: default name: MTEB SCIDOCS-PL revision: 45452b03f05560207ef19149545f168e596c9337 split: test type: clarin-knext/scidocs-pl metrics: - type: map_at_1 value: 3.51 - type: map_at_10 value: 9.046999999999999 - type: map_at_100 value: 10.823 - type: map_at_1000 value: 11.144 - type: map_at_3 value: 6.257 - type: map_at_5 value: 7.648000000000001 - type: mrr_at_1 value: 17.299999999999997 - type: mrr_at_10 value: 27.419 - type: mrr_at_100 value: 28.618 - type: mrr_at_1000 value: 28.685 - type: mrr_at_3 value: 23.817 - type: mrr_at_5 
value: 25.927 - type: ndcg_at_1 value: 17.299999999999997 - type: ndcg_at_10 value: 16.084 - type: ndcg_at_100 value: 23.729 - type: ndcg_at_1000 value: 29.476999999999997 - type: ndcg_at_3 value: 14.327000000000002 - type: ndcg_at_5 value: 13.017999999999999 - type: precision_at_1 value: 17.299999999999997 - type: precision_at_10 value: 8.63 - type: precision_at_100 value: 1.981 - type: precision_at_1000 value: 0.336 - type: precision_at_3 value: 13.4 - type: precision_at_5 value: 11.700000000000001 - type: recall_at_1 value: 3.51 - type: recall_at_10 value: 17.518 - type: recall_at_100 value: 40.275 - type: recall_at_1000 value: 68.203 - type: recall_at_3 value: 8.155 - type: recall_at_5 value: 11.875 task: type: Retrieval - dataset: config: default name: MTEB SICK-E-PL revision: None split: test type: PL-MTEB/sicke-pl-pairclassification metrics: - type: cos_sim_accuracy value: 86.30248675091724 - type: cos_sim_ap value: 83.6756734006714 - type: cos_sim_f1 value: 74.97367497367497 - type: cos_sim_precision value: 73.91003460207612 - type: cos_sim_recall value: 76.06837606837607 - type: dot_accuracy value: 86.30248675091724 - type: dot_ap value: 83.6756734006714 - type: dot_f1 value: 74.97367497367497 - type: dot_precision value: 73.91003460207612 - type: dot_recall value: 76.06837606837607 - type: euclidean_accuracy value: 86.30248675091724 - type: euclidean_ap value: 83.67566984333091 - type: euclidean_f1 value: 74.97367497367497 - type: euclidean_precision value: 73.91003460207612 - type: euclidean_recall value: 76.06837606837607 - type: manhattan_accuracy value: 86.28210354667753 - type: manhattan_ap value: 83.64216119130171 - type: manhattan_f1 value: 74.92152075340078 - type: manhattan_precision value: 73.4107997265892 - type: manhattan_recall value: 76.49572649572649 - type: max_accuracy value: 86.30248675091724 - type: max_ap value: 83.6756734006714 - type: max_f1 value: 74.97367497367497 task: type: PairClassification - dataset: config: default name: MTEB SICK-R-PL revision: None split: test type: PL-MTEB/sickr-pl-sts metrics: - type: cos_sim_pearson value: 82.23295940859121 - type: cos_sim_spearman value: 78.89329160768719 - type: euclidean_pearson value: 79.56019107076818 - type: euclidean_spearman value: 78.89330209904084 - type: manhattan_pearson value: 79.76098513973719 - type: manhattan_spearman value: 79.05490162570123 task: type: STS - dataset: config: pl name: MTEB STS22 (pl) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 37.732606308062486 - type: cos_sim_spearman value: 41.01645667030284 - type: euclidean_pearson value: 26.61722556367085 - type: euclidean_spearman value: 41.01645667030284 - type: manhattan_pearson value: 26.60917378970807 - type: manhattan_spearman value: 41.51335727617614 task: type: STS - dataset: config: default name: MTEB SciFact-PL revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e split: test type: clarin-knext/scifact-pl metrics: - type: map_at_1 value: 54.31700000000001 - type: map_at_10 value: 65.564 - type: map_at_100 value: 66.062 - type: map_at_1000 value: 66.08699999999999 - type: map_at_3 value: 62.592999999999996 - type: map_at_5 value: 63.888 - type: mrr_at_1 value: 56.99999999999999 - type: mrr_at_10 value: 66.412 - type: mrr_at_100 value: 66.85900000000001 - type: mrr_at_1000 value: 66.88 - type: mrr_at_3 value: 64.22200000000001 - type: mrr_at_5 value: 65.206 - type: ndcg_at_1 value: 56.99999999999999 - type: ndcg_at_10 value: 70.577 - type: 
ndcg_at_100 value: 72.879 - type: ndcg_at_1000 value: 73.45 - type: ndcg_at_3 value: 65.5 - type: ndcg_at_5 value: 67.278 - type: precision_at_1 value: 56.99999999999999 - type: precision_at_10 value: 9.667 - type: precision_at_100 value: 1.083 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 26.0 - type: precision_at_5 value: 16.933 - type: recall_at_1 value: 54.31700000000001 - type: recall_at_10 value: 85.056 - type: recall_at_100 value: 95.667 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 71.0 - type: recall_at_5 value: 75.672 task: type: Retrieval - dataset: config: default name: MTEB TRECCOVID-PL revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd split: test type: clarin-knext/trec-covid-pl metrics: - type: map_at_1 value: 0.245 - type: map_at_10 value: 2.051 - type: map_at_100 value: 12.009 - type: map_at_1000 value: 27.448 - type: map_at_3 value: 0.721 - type: map_at_5 value: 1.13 - type: mrr_at_1 value: 88.0 - type: mrr_at_10 value: 93.0 - type: mrr_at_100 value: 93.0 - type: mrr_at_1000 value: 93.0 - type: mrr_at_3 value: 93.0 - type: mrr_at_5 value: 93.0 - type: ndcg_at_1 value: 85.0 - type: ndcg_at_10 value: 80.303 - type: ndcg_at_100 value: 61.23499999999999 - type: ndcg_at_1000 value: 52.978 - type: ndcg_at_3 value: 84.419 - type: ndcg_at_5 value: 82.976 - type: precision_at_1 value: 88.0 - type: precision_at_10 value: 83.39999999999999 - type: precision_at_100 value: 61.96 - type: precision_at_1000 value: 22.648 - type: precision_at_3 value: 89.333 - type: precision_at_5 value: 87.2 - type: recall_at_1 value: 0.245 - type: recall_at_10 value: 2.193 - type: recall_at_100 value: 14.938 - type: recall_at_1000 value: 48.563 - type: recall_at_3 value: 0.738 - type: recall_at_5 value: 1.173 task: type: Retrieval --- ## gte-Qwen2-7B-instruct **gte-Qwen2-7B-instruct** is the latest model in the gte (General Text Embedding) model family, ranking **No.1** in both the English and Chinese evaluations on the Massive Text Embedding Benchmark (MTEB) as of June 16, 2024. Recently, the **Qwen team** released the Qwen2 series models, and we have trained the **gte-Qwen2-7B-instruct** model based on the Qwen2-7B LLM. Compared to the gte-Qwen1.5-7B-instruct model, the **gte-Qwen2-7B-instruct** model uses the same training data and training strategies during the finetuning stage, with the only difference being that the base model is upgraded to Qwen2-7B. Considering the improvements in the Qwen2 series models over the Qwen1.5 series, we can also expect consistent performance gains in the embedding models. The model incorporates several key advancements: - Integration of bidirectional attention mechanisms, enriching its contextual understanding. - Instruction tuning, applied solely on the query side for streamlined efficiency. - Comprehensive training across a vast, multilingual text corpus spanning diverse domains and scenarios. This training leverages both weakly supervised and supervised data, ensuring the model's applicability across numerous languages and a wide array of downstream tasks. ## Model Information - Model Size: 7B - Embedding Dimension: 3584 - Max Input Tokens: 32k ## Requirements ## Usage ### Sentence Transformers Check config_sentence_transformers.json to see all pre-built prompt names; otherwise, you can pass a custom prompt of your choice. A minimal sketch follows below. ### Transformers See the last-token-pooling sketch below. ## Infinity_emb Usage via infinity, an MIT-licensed inference server.
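The original usage snippets were stripped from this card, so the two sketches below only illustrate the paths described above. They assume a recent sentence-transformers (new enough to support `prompt_name`) and transformers install, and that a `query` prompt is among the pre-built prompt names in config_sentence_transformers.json; check that file for the actual names.

```python
# Sketch 1: Sentence Transformers path. Instruction tuning is query-side
# only, so queries are encoded with a prompt and documents as-is.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-7B-instruct", trust_remote_code=True)

queries = ["how much protein should a female eat"]
documents = ["As a general guideline, adult women need about 46 grams of protein per day."]

query_emb = model.encode(queries, prompt_name="query")  # "query" prompt name is an assumption
doc_emb = model.encode(documents)

print(util.cos_sim(query_emb, doc_emb))  # relevance scores, queries x documents
```

For the plain Transformers path, a common convention for decoder-based gte models is last-token pooling followed by L2 normalization; the sketch below assumes that convention.

```python
# Sketch 2: plain Transformers with last-token pooling (assumed convention).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def last_token_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # With right padding, the embedding is the hidden state of the last
    # non-padding token of each sequence.
    seq_lens = mask.sum(dim=1) - 1
    return hidden[torch.arange(hidden.shape[0]), seq_lens]

name = "Alibaba-NLP/gte-Qwen2-7B-instruct"
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True)

batch = tok(["how much protein should a female eat"],
            padding=True, truncation=True, max_length=32768, return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

emb = last_token_pool(out.last_hidden_state, batch["attention_mask"])
emb = F.normalize(emb, p=2, dim=1)  # unit-norm embeddings of dimension 3584
```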
## Evaluation ### MTEB & C-MTEB You can use scripts/eval_mteb.py to reproduce the following results of **gte-Qwen2-7B-instruct** on MTEB (English) / C-MTEB (Chinese): | Model Name | MTEB(56) | C-MTEB(35) | MTEB-fr(26) | MTEB-pl(26) | |:----:|:----:|:---------:|:----------:|:----------:| | bge-base-en-1.5 | 64.23 | - | - | - | | bge-large-en-1.5 | 63.55 | - | - | - | | gte-large-en-v1.5 | 65.39 | - | - | - | | gte-base-en-v1.5 | 64.11 | - | - | - | | mxbai-embed-large-v1 | 64.68 | - | - | - | | acge_text_embedding | - | 69.07 | - | - | | stella-mrl-large-zh-v3.5-1792d | - | 68.55 | - | - | | gte-large-zh | - | 66.72 | - | - | | multilingual-e5-base | 59.45 | 56.21 | - | - | | multilingual-e5-large | 61.50 | 58.81 | - | - | | e5-mistral-7b-instruct | 66.63 | 60.81 | - | - | | gte-Qwen1.5-7B-instruct | 67.34 | 69.52 | - | - | | NV-Embed-v1 | 69.32 | - | - | - | | **gte-Qwen2-7B-instruct** | **70.24** | **72.05** | **68.25** | **67.86** | | gte-Qwen2-1.5B-instruct | 67.16 | 67.65 | 66.60 | 64.04 | ### GTE Models The gte series has consistently released two types of models: encoder-only models (based on the BERT architecture) and decoder-only models (based on the LLM architecture). | Models | Language | Max Sequence Length | Dimension | Model Size (Memory Usage, fp32) | |:-------------------------------------------------------------------------------------:|:--------:|:-----:|:---------:|:-------------------------------:| | GTE-large-zh | Chinese | 512 | 1024 | 1.25GB | | GTE-base-zh | Chinese | 512 | 512 | 0.41GB | | GTE-small-zh | Chinese | 512 | 512 | 0.12GB | | GTE-large | English | 512 | 1024 | 1.25GB | | GTE-base | English | 512 | 512 | 0.21GB | | GTE-small | English | 512 | 384 | 0.10GB | | GTE-large-en-v1.5 | English | 8192 | 1024 | 1.74GB | | GTE-base-en-v1.5 | English | 8192 | 768 | 0.51GB | | GTE-Qwen1.5-7B-instruct | Multilingual | 32000 | 4096 | 26.45GB | | GTE-Qwen2-7B-instruct | Multilingual | 32000 | 3584 | 26.45GB | | GTE-Qwen2-1.5B-instruct | Multilingual | 32000 | 1536 | 6.62GB | ## Cloud API Services In addition to the open-source models, the GTE series is also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Community support ### Fine-tuning GTE models can be fine-tuned with the third-party framework SWIFT. ## Citation If you find our paper or models helpful, please consider citing:", + "model_explanation_gemini": "Generates sentence embeddings for tasks like classification, retrieval, clustering, and similarity measurement across various datasets."
+} \ No newline at end of file diff --git a/data/model_data_json/Alibaba-NLP_gte-base-en-v1.5.json b/data/model_data_json/Alibaba-NLP_gte-base-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..56bee4471387c05b26b57c5da055e3dce0b1a7fa --- /dev/null +++ b/data/model_data_json/Alibaba-NLP_gte-base-en-v1.5.json @@ -0,0 +1,27 @@ +{ + "model_id": "Alibaba-NLP/gte-base-en-v1.5", + "downloads": 1472071, + "tags": [ + "transformers", + "onnx", + "safetensors", + "new", + "feature-extraction", + "sentence-transformers", + "gte", + "mteb", + "transformers.js", + "sentence-similarity", + "custom_code", + "en", + "arxiv:2407.19669", + "arxiv:2308.03281", + "license:apache-2.0", + "model-index", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.7910447761194 - type: ap value: 37.053785713650626 - type: f1 value: 68.51101510998551 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.016875 - type: ap value: 89.17750268426342 - type: f1 value: 92.9970977240524 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.312000000000005 - type: f1 value: 52.98175784163017 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 38.193 - type: map_at_10 value: 54.848 - type: map_at_100 value: 55.388000000000005 - type: map_at_1000 value: 55.388999999999996 - type: map_at_3 value: 50.427 - type: map_at_5 value: 53.105000000000004 - type: mrr_at_1 value: 39.047 - type: mrr_at_10 value: 55.153 - type: mrr_at_100 value: 55.686 - type: mrr_at_1000 value: 55.688 - type: mrr_at_3 value: 50.676 - type: mrr_at_5 value: 53.417 - type: ndcg_at_1 value: 38.193 - type: ndcg_at_10 value: 63.486 - type: ndcg_at_100 value: 65.58 - type: ndcg_at_1000 value: 65.61 - type: ndcg_at_3 value: 54.494 - type: ndcg_at_5 value: 59.339 - type: precision_at_1 value: 38.193 - type: precision_at_10 value: 9.075 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.096 - type: precision_at_5 value: 15.619 - type: recall_at_1 value: 38.193 - type: recall_at_10 value: 90.754 - type: recall_at_100 value: 99.431 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.28699999999999 - type: recall_at_5 value: 78.094 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.508221208908964 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - 
type: v_measure value: 42.04668382560096 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.828759903716815 - type: mrr value: 74.37343358395991 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.03673698773017 - type: cos_sim_spearman value: 83.6470866785058 - type: euclidean_pearson value: 82.64048673096565 - type: euclidean_spearman value: 83.63142367101115 - type: manhattan_pearson value: 82.71493099760228 - type: manhattan_spearman value: 83.60491704294326 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.73376623376623 - type: f1 value: 86.70294049278262 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.31923804167062 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.552547125348454 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 30.567 - type: map_at_10 value: 41.269 - type: map_at_100 value: 42.689 - type: map_at_1000 value: 42.84 - type: map_at_3 value: 37.567 - type: map_at_5 value: 39.706 - type: mrr_at_1 value: 37.053000000000004 - type: mrr_at_10 value: 46.900999999999996 - type: mrr_at_100 value: 47.662 - type: mrr_at_1000 value: 47.713 - type: mrr_at_3 value: 43.801 - type: mrr_at_5 value: 45.689 - type: ndcg_at_1 value: 37.053000000000004 - type: ndcg_at_10 value: 47.73 - type: ndcg_at_100 value: 53.128 - type: ndcg_at_1000 value: 55.300000000000004 - type: ndcg_at_3 value: 42.046 - type: ndcg_at_5 value: 44.782 - type: precision_at_1 value: 37.053000000000004 - type: precision_at_10 value: 9.142 - type: precision_at_100 value: 1.485 - type: precision_at_1000 value: 0.197 - type: precision_at_3 value: 20.076 - type: precision_at_5 value: 14.535 - type: recall_at_1 value: 30.567 - type: recall_at_10 value: 60.602999999999994 - type: recall_at_100 value: 83.22800000000001 - type: recall_at_1000 value: 96.696 - type: recall_at_3 value: 44.336999999999996 - type: recall_at_5 value: 51.949 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 28.538000000000004 - type: map_at_10 value: 38.757999999999996 - type: map_at_100 value: 40.129 - type: map_at_1000 value: 40.262 - type: map_at_3 value: 35.866 - type: map_at_5 value: 37.417 - type: mrr_at_1 value: 36.051 - type: mrr_at_10 value: 44.868 - type: mrr_at_100 value: 45.568999999999996 - type: mrr_at_1000 value: 45.615 - type: mrr_at_3 value: 42.558 - type: mrr_at_5 value: 43.883 - type: ndcg_at_1 value: 36.051 - type: ndcg_at_10 value: 44.584 - type: ndcg_at_100 value: 49.356 - type: ndcg_at_1000 value: 51.39 - type: 
ndcg_at_3 value: 40.389 - type: ndcg_at_5 value: 42.14 - type: precision_at_1 value: 36.051 - type: precision_at_10 value: 8.446 - type: precision_at_100 value: 1.411 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 19.639 - type: precision_at_5 value: 13.796 - type: recall_at_1 value: 28.538000000000004 - type: recall_at_10 value: 54.99000000000001 - type: recall_at_100 value: 75.098 - type: recall_at_1000 value: 87.848 - type: recall_at_3 value: 42.236000000000004 - type: recall_at_5 value: 47.377 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 37.188 - type: map_at_10 value: 50.861000000000004 - type: map_at_100 value: 51.917 - type: map_at_1000 value: 51.964999999999996 - type: map_at_3 value: 47.144000000000005 - type: map_at_5 value: 49.417 - type: mrr_at_1 value: 42.571 - type: mrr_at_10 value: 54.086999999999996 - type: mrr_at_100 value: 54.739000000000004 - type: mrr_at_1000 value: 54.762 - type: mrr_at_3 value: 51.285000000000004 - type: mrr_at_5 value: 53.0 - type: ndcg_at_1 value: 42.571 - type: ndcg_at_10 value: 57.282 - type: ndcg_at_100 value: 61.477000000000004 - type: ndcg_at_1000 value: 62.426 - type: ndcg_at_3 value: 51.0 - type: ndcg_at_5 value: 54.346000000000004 - type: precision_at_1 value: 42.571 - type: precision_at_10 value: 9.467 - type: precision_at_100 value: 1.2550000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 23.114 - type: precision_at_5 value: 16.250999999999998 - type: recall_at_1 value: 37.188 - type: recall_at_10 value: 73.068 - type: recall_at_100 value: 91.203 - type: recall_at_1000 value: 97.916 - type: recall_at_3 value: 56.552 - type: recall_at_5 value: 64.567 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 25.041000000000004 - type: map_at_10 value: 33.86 - type: map_at_100 value: 34.988 - type: map_at_1000 value: 35.064 - type: map_at_3 value: 31.049 - type: map_at_5 value: 32.845 - type: mrr_at_1 value: 26.893 - type: mrr_at_10 value: 35.594 - type: mrr_at_100 value: 36.617 - type: mrr_at_1000 value: 36.671 - type: mrr_at_3 value: 33.051 - type: mrr_at_5 value: 34.61 - type: ndcg_at_1 value: 26.893 - type: ndcg_at_10 value: 38.674 - type: ndcg_at_100 value: 44.178 - type: ndcg_at_1000 value: 46.089999999999996 - type: ndcg_at_3 value: 33.485 - type: ndcg_at_5 value: 36.402 - type: precision_at_1 value: 26.893 - type: precision_at_10 value: 5.989 - type: precision_at_100 value: 0.918 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 14.2 - type: precision_at_5 value: 10.26 - type: recall_at_1 value: 25.041000000000004 - type: recall_at_10 value: 51.666000000000004 - type: recall_at_100 value: 76.896 - type: recall_at_1000 value: 91.243 - type: recall_at_3 value: 38.035999999999994 - type: recall_at_5 value: 44.999 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.909999999999998 - type: map_at_10 value: 23.901 - type: map_at_100 value: 25.165 - type: map_at_1000 value: 25.291000000000004 - type: map_at_3 value: 21.356 - type: map_at_5 value: 22.816 - type: 
mrr_at_1 value: 20.025000000000002 - type: mrr_at_10 value: 28.382 - type: mrr_at_100 value: 29.465000000000003 - type: mrr_at_1000 value: 29.535 - type: mrr_at_3 value: 25.933 - type: mrr_at_5 value: 27.332 - type: ndcg_at_1 value: 20.025000000000002 - type: ndcg_at_10 value: 29.099000000000004 - type: ndcg_at_100 value: 35.127 - type: ndcg_at_1000 value: 38.096000000000004 - type: ndcg_at_3 value: 24.464 - type: ndcg_at_5 value: 26.709 - type: precision_at_1 value: 20.025000000000002 - type: precision_at_10 value: 5.398 - type: precision_at_100 value: 0.9690000000000001 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 11.774 - type: precision_at_5 value: 8.632 - type: recall_at_1 value: 15.909999999999998 - type: recall_at_10 value: 40.672000000000004 - type: recall_at_100 value: 66.855 - type: recall_at_1000 value: 87.922 - type: recall_at_3 value: 28.069 - type: recall_at_5 value: 33.812 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 30.175 - type: map_at_10 value: 41.36 - type: map_at_100 value: 42.701 - type: map_at_1000 value: 42.817 - type: map_at_3 value: 37.931 - type: map_at_5 value: 39.943 - type: mrr_at_1 value: 35.611 - type: mrr_at_10 value: 46.346 - type: mrr_at_100 value: 47.160000000000004 - type: mrr_at_1000 value: 47.203 - type: mrr_at_3 value: 43.712 - type: mrr_at_5 value: 45.367000000000004 - type: ndcg_at_1 value: 35.611 - type: ndcg_at_10 value: 47.532000000000004 - type: ndcg_at_100 value: 53.003 - type: ndcg_at_1000 value: 55.007 - type: ndcg_at_3 value: 42.043 - type: ndcg_at_5 value: 44.86 - type: precision_at_1 value: 35.611 - type: precision_at_10 value: 8.624 - type: precision_at_100 value: 1.332 - type: precision_at_1000 value: 0.169 - type: precision_at_3 value: 20.083000000000002 - type: precision_at_5 value: 14.437 - type: recall_at_1 value: 30.175 - type: recall_at_10 value: 60.5 - type: recall_at_100 value: 83.399 - type: recall_at_1000 value: 96.255 - type: recall_at_3 value: 45.448 - type: recall_at_5 value: 52.432 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 22.467000000000002 - type: map_at_10 value: 33.812999999999995 - type: map_at_100 value: 35.248000000000005 - type: map_at_1000 value: 35.359 - type: map_at_3 value: 30.316 - type: map_at_5 value: 32.233000000000004 - type: mrr_at_1 value: 28.310999999999996 - type: mrr_at_10 value: 38.979 - type: mrr_at_100 value: 39.937 - type: mrr_at_1000 value: 39.989999999999995 - type: mrr_at_3 value: 36.244 - type: mrr_at_5 value: 37.871 - type: ndcg_at_1 value: 28.310999999999996 - type: ndcg_at_10 value: 40.282000000000004 - type: ndcg_at_100 value: 46.22 - type: ndcg_at_1000 value: 48.507 - type: ndcg_at_3 value: 34.596 - type: ndcg_at_5 value: 37.267 - type: precision_at_1 value: 28.310999999999996 - type: precision_at_10 value: 7.831 - type: precision_at_100 value: 1.257 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.275 - type: precision_at_5 value: 12.556999999999999 - type: recall_at_1 value: 22.467000000000002 - type: recall_at_10 value: 54.14099999999999 - type: recall_at_100 value: 79.593 - type: recall_at_1000 value: 95.063 - type: recall_at_3 value: 38.539 - type: recall_at_5 value: 45.403 - 
task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 24.18591666666667 - type: map_at_10 value: 33.84258333333333 - type: map_at_100 value: 35.11391666666666 - type: map_at_1000 value: 35.23258333333333 - type: map_at_3 value: 30.764249999999997 - type: map_at_5 value: 32.52333333333334 - type: mrr_at_1 value: 28.54733333333333 - type: mrr_at_10 value: 37.81725 - type: mrr_at_100 value: 38.716499999999996 - type: mrr_at_1000 value: 38.77458333333333 - type: mrr_at_3 value: 35.157833333333336 - type: mrr_at_5 value: 36.69816666666667 - type: ndcg_at_1 value: 28.54733333333333 - type: ndcg_at_10 value: 39.51508333333334 - type: ndcg_at_100 value: 44.95316666666666 - type: ndcg_at_1000 value: 47.257083333333334 - type: ndcg_at_3 value: 34.205833333333324 - type: ndcg_at_5 value: 36.78266666666667 - type: precision_at_1 value: 28.54733333333333 - type: precision_at_10 value: 7.082583333333334 - type: precision_at_100 value: 1.1590833333333332 - type: precision_at_1000 value: 0.15516666666666662 - type: precision_at_3 value: 15.908750000000001 - type: precision_at_5 value: 11.505416666666669 - type: recall_at_1 value: 24.18591666666667 - type: recall_at_10 value: 52.38758333333333 - type: recall_at_100 value: 76.13666666666667 - type: recall_at_1000 value: 91.99066666666667 - type: recall_at_3 value: 37.78333333333334 - type: recall_at_5 value: 44.30141666666666 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 21.975 - type: map_at_10 value: 29.781000000000002 - type: map_at_100 value: 30.847 - type: map_at_1000 value: 30.94 - type: map_at_3 value: 27.167 - type: map_at_5 value: 28.633999999999997 - type: mrr_at_1 value: 24.387 - type: mrr_at_10 value: 32.476 - type: mrr_at_100 value: 33.337 - type: mrr_at_1000 value: 33.403 - type: mrr_at_3 value: 29.881999999999998 - type: mrr_at_5 value: 31.339 - type: ndcg_at_1 value: 24.387 - type: ndcg_at_10 value: 34.596 - type: ndcg_at_100 value: 39.635 - type: ndcg_at_1000 value: 42.079 - type: ndcg_at_3 value: 29.516 - type: ndcg_at_5 value: 31.959 - type: precision_at_1 value: 24.387 - type: precision_at_10 value: 5.6129999999999995 - type: precision_at_100 value: 0.8909999999999999 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.73 - type: precision_at_5 value: 9.171999999999999 - type: recall_at_1 value: 21.975 - type: recall_at_10 value: 46.826 - type: recall_at_100 value: 69.554 - type: recall_at_1000 value: 87.749 - type: recall_at_3 value: 33.016 - type: recall_at_5 value: 38.97 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 15.614 - type: map_at_10 value: 22.927 - type: map_at_100 value: 24.185000000000002 - type: map_at_1000 value: 24.319 - type: map_at_3 value: 20.596 - type: map_at_5 value: 21.854000000000003 - type: mrr_at_1 value: 18.858 - type: mrr_at_10 value: 26.535999999999998 - type: mrr_at_100 value: 27.582 - type: mrr_at_1000 value: 27.665 - type: mrr_at_3 value: 24.295 - type: mrr_at_5 value: 25.532 - type: ndcg_at_1 value: 18.858 - type: ndcg_at_10 value: 27.583000000000002 - type: ndcg_at_100 value: 33.635 - type: ndcg_at_1000 value: 36.647 - 
type: ndcg_at_3 value: 23.348 - type: ndcg_at_5 value: 25.257 - type: precision_at_1 value: 18.858 - type: precision_at_10 value: 5.158 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.13999999999999999 - type: precision_at_3 value: 11.092 - type: precision_at_5 value: 8.1 - type: recall_at_1 value: 15.614 - type: recall_at_10 value: 37.916 - type: recall_at_100 value: 65.205 - type: recall_at_1000 value: 86.453 - type: recall_at_3 value: 26.137 - type: recall_at_5 value: 31.087999999999997 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 23.078000000000003 - type: map_at_10 value: 31.941999999999997 - type: map_at_100 value: 33.196999999999996 - type: map_at_1000 value: 33.303 - type: map_at_3 value: 28.927000000000003 - type: map_at_5 value: 30.707 - type: mrr_at_1 value: 26.866 - type: mrr_at_10 value: 35.557 - type: mrr_at_100 value: 36.569 - type: mrr_at_1000 value: 36.632 - type: mrr_at_3 value: 32.897999999999996 - type: mrr_at_5 value: 34.437 - type: ndcg_at_1 value: 26.866 - type: ndcg_at_10 value: 37.372 - type: ndcg_at_100 value: 43.248 - type: ndcg_at_1000 value: 45.632 - type: ndcg_at_3 value: 31.852999999999998 - type: ndcg_at_5 value: 34.582 - type: precision_at_1 value: 26.866 - type: precision_at_10 value: 6.511 - type: precision_at_100 value: 1.078 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 14.582999999999998 - type: precision_at_5 value: 10.634 - type: recall_at_1 value: 23.078000000000003 - type: recall_at_10 value: 50.334 - type: recall_at_100 value: 75.787 - type: recall_at_1000 value: 92.485 - type: recall_at_3 value: 35.386 - type: recall_at_5 value: 42.225 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.203999999999997 - type: map_at_10 value: 31.276 - type: map_at_100 value: 32.844 - type: map_at_1000 value: 33.062999999999995 - type: map_at_3 value: 27.733999999999998 - type: map_at_5 value: 29.64 - type: mrr_at_1 value: 27.272999999999996 - type: mrr_at_10 value: 36.083 - type: mrr_at_100 value: 37.008 - type: mrr_at_1000 value: 37.076 - type: mrr_at_3 value: 33.004 - type: mrr_at_5 value: 34.664 - type: ndcg_at_1 value: 27.272999999999996 - type: ndcg_at_10 value: 37.763000000000005 - type: ndcg_at_100 value: 43.566 - type: ndcg_at_1000 value: 46.356 - type: ndcg_at_3 value: 31.673000000000002 - type: ndcg_at_5 value: 34.501 - type: precision_at_1 value: 27.272999999999996 - type: precision_at_10 value: 7.470000000000001 - type: precision_at_100 value: 1.502 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 14.756 - type: precision_at_5 value: 11.225 - type: recall_at_1 value: 22.203999999999997 - type: recall_at_10 value: 51.437999999999995 - type: recall_at_100 value: 76.845 - type: recall_at_1000 value: 94.38600000000001 - type: recall_at_3 value: 34.258 - type: recall_at_5 value: 41.512 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.474 - type: map_at_10 value: 26.362999999999996 - type: map_at_100 value: 27.456999999999997 - type: map_at_1000 value: 27.567999999999998 - 
type: map_at_3 value: 23.518 - type: map_at_5 value: 25.068 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.998 - type: mrr_at_100 value: 28.953 - type: mrr_at_1000 value: 29.03 - type: mrr_at_3 value: 25.230999999999998 - type: mrr_at_5 value: 26.654 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.684 - type: ndcg_at_100 value: 36.864999999999995 - type: ndcg_at_1000 value: 39.555 - type: ndcg_at_3 value: 26.057000000000002 - type: ndcg_at_5 value: 28.587 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.3420000000000005 - type: precision_at_100 value: 0.847 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 11.583 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.474 - type: recall_at_10 value: 46.497 - type: recall_at_100 value: 69.977 - type: recall_at_1000 value: 89.872 - type: recall_at_3 value: 31.385999999999996 - type: recall_at_5 value: 37.283 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 17.173 - type: map_at_10 value: 30.407 - type: map_at_100 value: 32.528 - type: map_at_1000 value: 32.698 - type: map_at_3 value: 25.523 - type: map_at_5 value: 28.038 - type: mrr_at_1 value: 38.958 - type: mrr_at_10 value: 51.515 - type: mrr_at_100 value: 52.214000000000006 - type: mrr_at_1000 value: 52.237 - type: mrr_at_3 value: 48.502 - type: mrr_at_5 value: 50.251000000000005 - type: ndcg_at_1 value: 38.958 - type: ndcg_at_10 value: 40.355000000000004 - type: ndcg_at_100 value: 47.68 - type: ndcg_at_1000 value: 50.370000000000005 - type: ndcg_at_3 value: 33.946 - type: ndcg_at_5 value: 36.057 - type: precision_at_1 value: 38.958 - type: precision_at_10 value: 12.508 - type: precision_at_100 value: 2.054 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 25.581 - type: precision_at_5 value: 19.256999999999998 - type: recall_at_1 value: 17.173 - type: recall_at_10 value: 46.967 - type: recall_at_100 value: 71.47200000000001 - type: recall_at_1000 value: 86.238 - type: recall_at_3 value: 30.961 - type: recall_at_5 value: 37.539 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 8.999 - type: map_at_10 value: 18.989 - type: map_at_100 value: 26.133 - type: map_at_1000 value: 27.666 - type: map_at_3 value: 13.918 - type: map_at_5 value: 16.473 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.161 - type: mrr_at_100 value: 74.516 - type: mrr_at_1000 value: 74.524 - type: mrr_at_3 value: 72.875 - type: mrr_at_5 value: 73.613 - type: ndcg_at_1 value: 54.37499999999999 - type: ndcg_at_10 value: 39.902 - type: ndcg_at_100 value: 44.212 - type: ndcg_at_1000 value: 51.62 - type: ndcg_at_3 value: 45.193 - type: ndcg_at_5 value: 42.541000000000004 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 30.425 - type: precision_at_100 value: 9.754999999999999 - type: precision_at_1000 value: 2.043 - type: precision_at_3 value: 48.25 - type: precision_at_5 value: 40.65 - type: recall_at_1 value: 8.999 - type: recall_at_10 value: 24.133 - type: recall_at_100 value: 49.138999999999996 - type: recall_at_1000 value: 72.639 - type: recall_at_3 value: 15.287999999999998 - type: recall_at_5 value: 19.415 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test 
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.38999999999999 - type: f1 value: 41.444205512055234 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 87.35000000000001 - type: map_at_10 value: 92.837 - type: map_at_100 value: 92.996 - type: map_at_1000 value: 93.006 - type: map_at_3 value: 92.187 - type: map_at_5 value: 92.595 - type: mrr_at_1 value: 93.864 - type: mrr_at_10 value: 96.723 - type: mrr_at_100 value: 96.72500000000001 - type: mrr_at_1000 value: 96.72500000000001 - type: mrr_at_3 value: 96.64 - type: mrr_at_5 value: 96.71499999999999 - type: ndcg_at_1 value: 93.864 - type: ndcg_at_10 value: 94.813 - type: ndcg_at_100 value: 95.243 - type: ndcg_at_1000 value: 95.38600000000001 - type: ndcg_at_3 value: 94.196 - type: ndcg_at_5 value: 94.521 - type: precision_at_1 value: 93.864 - type: precision_at_10 value: 10.951 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 35.114000000000004 - type: precision_at_5 value: 21.476 - type: recall_at_1 value: 87.35000000000001 - type: recall_at_10 value: 96.941 - type: recall_at_100 value: 98.397 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 95.149 - type: recall_at_5 value: 96.131 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 24.476 - type: map_at_10 value: 40.11 - type: map_at_100 value: 42.229 - type: map_at_1000 value: 42.378 - type: map_at_3 value: 34.512 - type: map_at_5 value: 38.037 - type: mrr_at_1 value: 47.839999999999996 - type: mrr_at_10 value: 57.053 - type: mrr_at_100 value: 57.772 - type: mrr_at_1000 value: 57.799 - type: mrr_at_3 value: 54.552 - type: mrr_at_5 value: 56.011 - type: ndcg_at_1 value: 47.839999999999996 - type: ndcg_at_10 value: 48.650999999999996 - type: ndcg_at_100 value: 55.681000000000004 - type: ndcg_at_1000 value: 57.979 - type: ndcg_at_3 value: 43.923 - type: ndcg_at_5 value: 46.037 - type: precision_at_1 value: 47.839999999999996 - type: precision_at_10 value: 13.395000000000001 - type: precision_at_100 value: 2.0660000000000003 - type: precision_at_1000 value: 0.248 - type: precision_at_3 value: 29.064 - type: precision_at_5 value: 22.006 - type: recall_at_1 value: 24.476 - type: recall_at_10 value: 56.216 - type: recall_at_100 value: 81.798 - type: recall_at_1000 value: 95.48299999999999 - type: recall_at_3 value: 39.357 - type: recall_at_5 value: 47.802 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 42.728 - type: map_at_10 value: 57.737 - type: map_at_100 value: 58.531 - type: map_at_1000 value: 58.594 - type: map_at_3 value: 54.869 - type: map_at_5 value: 56.55 - type: mrr_at_1 value: 85.456 - type: mrr_at_10 value: 90.062 - type: mrr_at_100 value: 90.159 - type: mrr_at_1000 value: 90.16 - type: mrr_at_3 value: 89.37899999999999 - type: mrr_at_5 value: 89.81 - type: ndcg_at_1 value: 85.456 - type: ndcg_at_10 value: 67.755 - type: ndcg_at_100 value: 70.341 - type: ndcg_at_1000 value: 71.538 - type: ndcg_at_3 value: 63.735 - type: ndcg_at_5 value: 65.823 - type: precision_at_1 value: 85.456 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 
1.545 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_3 value: 38.861000000000004 - type: precision_at_5 value: 24.964 - type: recall_at_1 value: 42.728 - type: recall_at_10 value: 67.252 - type: recall_at_100 value: 77.265 - type: recall_at_1000 value: 85.246 - type: recall_at_3 value: 58.292 - type: recall_at_5 value: 62.41100000000001 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 87.4836 - type: ap value: 82.29552224030336 - type: f1 value: 87.42791432227448 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.015 - type: map_at_10 value: 35.621 - type: map_at_100 value: 36.809 - type: map_at_1000 value: 36.853 - type: map_at_3 value: 31.832 - type: map_at_5 value: 34.006 - type: mrr_at_1 value: 23.738999999999997 - type: mrr_at_10 value: 36.309999999999995 - type: mrr_at_100 value: 37.422 - type: mrr_at_1000 value: 37.461 - type: mrr_at_3 value: 32.592999999999996 - type: mrr_at_5 value: 34.736 - type: ndcg_at_1 value: 23.724999999999998 - type: ndcg_at_10 value: 42.617 - type: ndcg_at_100 value: 48.217999999999996 - type: ndcg_at_1000 value: 49.309 - type: ndcg_at_3 value: 34.905 - type: ndcg_at_5 value: 38.769 - type: precision_at_1 value: 23.724999999999998 - type: precision_at_10 value: 6.689 - type: precision_at_100 value: 0.9480000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.89 - type: precision_at_5 value: 10.897 - type: recall_at_1 value: 23.015 - type: recall_at_10 value: 64.041 - type: recall_at_100 value: 89.724 - type: recall_at_1000 value: 98.00999999999999 - type: recall_at_3 value: 43.064 - type: recall_at_5 value: 52.31099999999999 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.49794801641588 - type: f1 value: 96.28931114498003 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.81121751025992 - type: f1 value: 63.18740125901853 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.66644250168123 - type: f1 value: 74.93211186867839 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.77202420981843 - type: f1 value: 81.63681969283554 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.596687684870645 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.26965660101405 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking 
config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.33619694846802 - type: mrr value: 32.53719657720334 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.0729999999999995 - type: map_at_10 value: 13.245999999999999 - type: map_at_100 value: 16.747999999999998 - type: map_at_1000 value: 18.163 - type: map_at_3 value: 10.064 - type: map_at_5 value: 11.513 - type: mrr_at_1 value: 49.536 - type: mrr_at_10 value: 58.092 - type: mrr_at_100 value: 58.752 - type: mrr_at_1000 value: 58.78 - type: mrr_at_3 value: 56.398 - type: mrr_at_5 value: 57.389 - type: ndcg_at_1 value: 47.059 - type: ndcg_at_10 value: 35.881 - type: ndcg_at_100 value: 32.751999999999995 - type: ndcg_at_1000 value: 41.498000000000005 - type: ndcg_at_3 value: 42.518 - type: ndcg_at_5 value: 39.550999999999995 - type: precision_at_1 value: 49.536 - type: precision_at_10 value: 26.316 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.081 - type: precision_at_3 value: 39.938 - type: precision_at_5 value: 34.056 - type: recall_at_1 value: 6.0729999999999995 - type: recall_at_10 value: 16.593 - type: recall_at_100 value: 32.883 - type: recall_at_1000 value: 64.654 - type: recall_at_3 value: 11.174000000000001 - type: recall_at_5 value: 13.528 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 30.043 - type: map_at_10 value: 45.318999999999996 - type: map_at_100 value: 46.381 - type: map_at_1000 value: 46.412 - type: map_at_3 value: 40.941 - type: map_at_5 value: 43.662 - type: mrr_at_1 value: 33.98 - type: mrr_at_10 value: 47.870000000000005 - type: mrr_at_100 value: 48.681999999999995 - type: mrr_at_1000 value: 48.703 - type: mrr_at_3 value: 44.341 - type: mrr_at_5 value: 46.547 - type: ndcg_at_1 value: 33.98 - type: ndcg_at_10 value: 52.957 - type: ndcg_at_100 value: 57.434 - type: ndcg_at_1000 value: 58.103 - type: ndcg_at_3 value: 44.896 - type: ndcg_at_5 value: 49.353 - type: precision_at_1 value: 33.98 - type: precision_at_10 value: 8.786 - type: precision_at_100 value: 1.1280000000000001 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.577 - type: precision_at_5 value: 14.942 - type: recall_at_1 value: 30.043 - type: recall_at_10 value: 73.593 - type: recall_at_100 value: 93.026 - type: recall_at_1000 value: 97.943 - type: recall_at_3 value: 52.955 - type: recall_at_5 value: 63.132 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.808 - type: map_at_10 value: 84.675 - type: map_at_100 value: 85.322 - type: map_at_1000 value: 85.33800000000001 - type: map_at_3 value: 81.68900000000001 - type: map_at_5 value: 83.543 - type: mrr_at_1 value: 81.5 - type: mrr_at_10 value: 87.59700000000001 - type: mrr_at_100 value: 87.705 - type: mrr_at_1000 value: 87.70599999999999 - type: mrr_at_3 value: 86.607 - type: mrr_at_5 value: 87.289 - type: ndcg_at_1 value: 81.51 - type: ndcg_at_10 value: 88.41799999999999 - type: ndcg_at_100 value: 89.644 - type: ndcg_at_1000 value: 89.725 - type: ndcg_at_3 value: 85.49900000000001 - type: ndcg_at_5 value: 87.078 - type: precision_at_1 value: 81.51 - type: precision_at_10 value: 13.438 - type: precision_at_100 value: 1.532 - type: 
precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.363 - type: precision_at_5 value: 24.57 - type: recall_at_1 value: 70.808 - type: recall_at_10 value: 95.575 - type: recall_at_100 value: 99.667 - type: recall_at_1000 value: 99.98899999999999 - type: recall_at_3 value: 87.223 - type: recall_at_5 value: 91.682 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 58.614831329137715 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.86580408560826 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.093 - type: map_at_10 value: 13.014000000000001 - type: map_at_100 value: 15.412999999999998 - type: map_at_1000 value: 15.756999999999998 - type: map_at_3 value: 9.216000000000001 - type: map_at_5 value: 11.036999999999999 - type: mrr_at_1 value: 25.1 - type: mrr_at_10 value: 37.133 - type: mrr_at_100 value: 38.165 - type: mrr_at_1000 value: 38.198 - type: mrr_at_3 value: 33.217 - type: mrr_at_5 value: 35.732 - type: ndcg_at_1 value: 25.1 - type: ndcg_at_10 value: 21.918000000000003 - type: ndcg_at_100 value: 30.983 - type: ndcg_at_1000 value: 36.629 - type: ndcg_at_3 value: 20.544999999999998 - type: ndcg_at_5 value: 18.192 - type: precision_at_1 value: 25.1 - type: precision_at_10 value: 11.44 - type: precision_at_100 value: 2.459 - type: precision_at_1000 value: 0.381 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 16.16 - type: recall_at_1 value: 5.093 - type: recall_at_10 value: 23.215 - type: recall_at_100 value: 49.902 - type: recall_at_1000 value: 77.403 - type: recall_at_3 value: 11.733 - type: recall_at_5 value: 16.372999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.9365442977452 - type: cos_sim_spearman value: 79.36960687383745 - type: euclidean_pearson value: 79.6045204840714 - type: euclidean_spearman value: 79.26382712751337 - type: manhattan_pearson value: 79.4805084789529 - type: manhattan_spearman value: 79.21847863209523 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.27906192961453 - type: cos_sim_spearman value: 74.38364712099211 - type: euclidean_pearson value: 78.54358927241223 - type: euclidean_spearman value: 74.22185560806376 - type: manhattan_pearson value: 78.50904327377751 - type: manhattan_spearman value: 74.2627500781748 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.66863742649639 - type: cos_sim_spearman value: 84.70630905216271 - type: euclidean_pearson value: 84.64498334705334 - type: euclidean_spearman value: 84.87204770690148 - type: manhattan_pearson value: 84.65774227976077 - type: manhattan_spearman value: 84.91251851797985 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson 
value: 83.1577763924467 - type: cos_sim_spearman value: 80.10314039230198 - type: euclidean_pearson value: 81.51346991046043 - type: euclidean_spearman value: 80.08678485109435 - type: manhattan_pearson value: 81.57058914661894 - type: manhattan_spearman value: 80.1516230725106 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.40310839662533 - type: cos_sim_spearman value: 87.16293477217867 - type: euclidean_pearson value: 86.50688711184775 - type: euclidean_spearman value: 87.08651444923031 - type: manhattan_pearson value: 86.54674677557857 - type: manhattan_spearman value: 87.15079017870971 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.32886275207817 - type: cos_sim_spearman value: 85.0190460590732 - type: euclidean_pearson value: 84.42553652784679 - type: euclidean_spearman value: 85.20027364279328 - type: manhattan_pearson value: 84.42926246281078 - type: manhattan_spearman value: 85.20187419804306 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 90.76732216967812 - type: cos_sim_spearman value: 90.63701653633909 - type: euclidean_pearson value: 90.26678186114682 - type: euclidean_spearman value: 90.67288073455427 - type: manhattan_pearson value: 90.20772020584582 - type: manhattan_spearman value: 90.60764863983702 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 69.09280387698125 - type: cos_sim_spearman value: 68.62743151172162 - type: euclidean_pearson value: 69.89386398104689 - type: euclidean_spearman value: 68.71191066733556 - type: manhattan_pearson value: 69.92516500604872 - type: manhattan_spearman value: 68.80452846992576 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.13178592019887 - type: cos_sim_spearman value: 86.03947178806887 - type: euclidean_pearson value: 85.87029414285313 - type: euclidean_spearman value: 86.04960843306998 - type: manhattan_pearson value: 85.92946858580146 - type: manhattan_spearman value: 86.12575341860442 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.16657063002837 - type: mrr value: 95.73671063867141 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 63.510999999999996 - type: map_at_10 value: 72.76899999999999 - type: map_at_100 value: 73.303 - type: map_at_1000 value: 73.32499999999999 - type: map_at_3 value: 70.514 - type: map_at_5 value: 71.929 - type: mrr_at_1 value: 66.333 - type: mrr_at_10 value: 73.75 - type: mrr_at_100 value: 74.119 - type: mrr_at_1000 value: 74.138 - type: mrr_at_3 value: 72.222 - type: mrr_at_5 value: 73.122 - type: ndcg_at_1 value: 66.333 - type: ndcg_at_10 value: 76.774 - type: ndcg_at_100 value: 78.78500000000001 - type: 
ndcg_at_1000 value: 79.254 - type: ndcg_at_3 value: 73.088 - type: ndcg_at_5 value: 75.002 - type: precision_at_1 value: 66.333 - type: precision_at_10 value: 9.833 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 28.222 - type: precision_at_5 value: 18.333 - type: recall_at_1 value: 63.510999999999996 - type: recall_at_10 value: 87.98899999999999 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 77.86699999999999 - type: recall_at_5 value: 82.73899999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.78514851485149 - type: cos_sim_ap value: 94.94214383862038 - type: cos_sim_f1 value: 89.02255639097744 - type: cos_sim_precision value: 89.2462311557789 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.78217821782178 - type: dot_ap value: 94.69965247836805 - type: dot_f1 value: 88.78695208970439 - type: dot_precision value: 90.54054054054053 - type: dot_recall value: 87.1 - type: euclidean_accuracy value: 99.78118811881188 - type: euclidean_ap value: 94.9865187695411 - type: euclidean_f1 value: 88.99950223992036 - type: euclidean_precision value: 88.60257680872151 - type: euclidean_recall value: 89.4 - type: manhattan_accuracy value: 99.78811881188119 - type: manhattan_ap value: 95.0021236766459 - type: manhattan_f1 value: 89.12071535022356 - type: manhattan_precision value: 88.54886475814413 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.78811881188119 - type: max_ap value: 95.0021236766459 - type: max_f1 value: 89.12071535022356 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 68.93190546593995 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 37.602808534760655 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.29214480978073 - type: mrr value: 53.123169722434426 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.967800769650022 - type: cos_sim_spearman value: 31.168490040206926 - type: dot_pearson value: 30.888603021128553 - type: dot_spearman value: 31.028241262520385 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22300000000000003 - type: map_at_10 value: 1.781 - type: map_at_100 value: 9.905999999999999 - type: map_at_1000 value: 23.455000000000002 - type: map_at_3 value: 0.569 - type: map_at_5 value: 0.918 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 91.067 - type: mrr_at_100 value: 91.067 - type: mrr_at_1000 value: 91.067 - type: mrr_at_3 value: 90.667 - type: mrr_at_5 value: 91.067 - type: ndcg_at_1 value: 78.0 - type: ndcg_at_10 value: 
73.13499999999999 - type: ndcg_at_100 value: 55.32 - type: ndcg_at_1000 value: 49.532 - type: ndcg_at_3 value: 73.715 - type: ndcg_at_5 value: 72.74199999999999 - type: precision_at_1 value: 84.0 - type: precision_at_10 value: 78.8 - type: precision_at_100 value: 56.32 - type: precision_at_1000 value: 21.504 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 78.0 - type: recall_at_1 value: 0.22300000000000003 - type: recall_at_10 value: 2.049 - type: recall_at_100 value: 13.553 - type: recall_at_1000 value: 46.367999999999995 - type: recall_at_3 value: 0.604 - type: recall_at_5 value: 1.015 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.0380000000000003 - type: map_at_10 value: 10.188 - type: map_at_100 value: 16.395 - type: map_at_1000 value: 18.024 - type: map_at_3 value: 6.236 - type: map_at_5 value: 7.276000000000001 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 46.292 - type: mrr_at_100 value: 47.446 - type: mrr_at_1000 value: 47.446 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 44.32 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 25.219 - type: ndcg_at_100 value: 37.802 - type: ndcg_at_1000 value: 49.274 - type: ndcg_at_3 value: 28.605999999999998 - type: ndcg_at_5 value: 26.21 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 21.837 - type: precision_at_100 value: 7.776 - type: precision_at_1000 value: 1.522 - type: precision_at_3 value: 28.571 - type: precision_at_5 value: 25.306 - type: recall_at_1 value: 3.0380000000000003 - type: recall_at_10 value: 16.298000000000002 - type: recall_at_100 value: 48.712 - type: recall_at_1000 value: 83.16799999999999 - type: recall_at_3 value: 7.265000000000001 - type: recall_at_5 value: 9.551 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 83.978 - type: ap value: 24.751887949330015 - type: f1 value: 66.8685134049279 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.573288058856825 - type: f1 value: 61.973261751726604 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.75483298792469 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.36824223639506 - type: cos_sim_ap value: 75.53126388573047 - type: cos_sim_f1 value: 67.9912831688245 - type: cos_sim_precision value: 66.11817501869858 - type: cos_sim_recall value: 69.9736147757256 - type: dot_accuracy value: 86.39804494248078 - type: dot_ap value: 75.27598891718046 - type: dot_f1 value: 67.91146284159763 - type: dot_precision value: 63.90505003490807 - type: dot_recall value: 72.45382585751979 - type: euclidean_accuracy value: 86.36228169517793 - type: euclidean_ap value: 75.51438087434647 - type: euclidean_f1 value: 68.02370523061066 - type: euclidean_precision 
value: 66.46525679758308 - type: euclidean_recall value: 69.65699208443272 - type: manhattan_accuracy value: 86.46361089586935 - type: manhattan_ap value: 75.50800785730111 - type: manhattan_f1 value: 67.9220437187253 - type: manhattan_precision value: 67.79705573080967 - type: manhattan_recall value: 68.04749340369392 - type: max_accuracy value: 86.46361089586935 - type: max_ap value: 75.53126388573047 - type: max_f1 value: 68.02370523061066 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.80350836341057 - type: cos_sim_ap value: 85.51101933260743 - type: cos_sim_f1 value: 77.9152271629704 - type: cos_sim_precision value: 75.27815662910056 - type: cos_sim_recall value: 80.74376347397599 - type: dot_accuracy value: 88.84425815966158 - type: dot_ap value: 85.49726945962519 - type: dot_f1 value: 77.94445269567801 - type: dot_precision value: 75.27251864601261 - type: dot_recall value: 80.81305820757623 - type: euclidean_accuracy value: 88.80350836341057 - type: euclidean_ap value: 85.4882880790211 - type: euclidean_f1 value: 77.87063284615103 - type: euclidean_precision value: 74.61022927689595 - type: euclidean_recall value: 81.42901139513397 - type: manhattan_accuracy value: 88.7161873714441 - type: manhattan_ap value: 85.45753871906821 - type: manhattan_f1 value: 77.8686401480111 - type: manhattan_precision value: 74.95903683123174 - type: manhattan_recall value: 81.01324299353249 - type: max_accuracy value: 88.84425815966158 - type: max_ap value: 85.51101933260743 - type: max_f1 value: 77.94445269567801 --- # gte-base-en-v1.5 We introduce the gte-v1.5 series, upgraded gte embeddings that support a context length of up to **8192** while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieves state-of-the-art scores on the MTEB benchmark within the same model size category and provides competitive results on the LoCo long-context retrieval tests (refer to Evaluation). We also present gte-Qwen1.5-7B-instruct, a SOTA instruction-tuned multilingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ### Model list | Models | Language | Model Size (M) | Max Seq. Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | | gte-Qwen1.5-7B-instruct | Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | | gte-large-en-v1.5 | English | 434 | 8192 | 1024 | 65.39 | 86.71 | | gte-base-en-v1.5 | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration; refer to enable-unpadding-and-xformers.** Use with `transformers` or `sentence-transformers` (see the sketch after this entry). Use with infinity: Infinity is an MIT-licensed server for OpenAI-compatible deployment. ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths.
Then, we resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-2048: lr 5e-4, mlm_probability 0.3, batch_size 4096, num_steps 70000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 20000, rope_base 500000 - CPT: max_len 512, lr 2e-4, batch_size 32768, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from the MTEB leaderboard. The gte evaluation setting: ntk scaling factor set to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 434 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5 | 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5 | 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QmsumRetrieval | SummScreenRetrieval | QasperAbstractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 | 1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:", + "model_explanation_gemini": "Generates sentence embeddings for English text to perform tasks like classification, retrieval, clustering, and similarity scoring."
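The usage snippets referenced in the card above ("Use with `transformers`", "Use with `sentence-transformers`") were stripped from this dump and cannot be recovered from the file itself. As a minimal sketch, assuming the standard sentence-transformers API and that the repository's custom backbone code is opted into via `trust_remote_code=True`, embedding and similarity scoring might look like this:

```python
# Hedged sketch, not the card's original snippet: embed texts with
# Alibaba-NLP/gte-base-en-v1.5 and rank them by cosine similarity.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# trust_remote_code loads the repo's custom encoder (BERT + RoPE + GLU,
# 8192-token context); the model id matches this entry's "model_id" field.
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)

sentences = [
    "what is the capital of China?",           # query
    "how to implement quick sort in python?",  # unrelated passage
    "Beijing",                                 # relevant passage
]
embeddings = model.encode(sentences)  # ndarray of shape (3, 768)

# Cosine similarity of the query against the two passages; the
# "Beijing" passage should score noticeably higher.
print(cos_sim(embeddings[0], embeddings[1:]))
```

For serving rather than in-process inference, the card's pointer to infinity (an MIT-licensed, OpenAI-compatible embedding server) applies unchanged.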
+} \ No newline at end of file diff --git a/data/model_data_json/Alibaba-NLP_gte-large-en-v1.5.json b/data/model_data_json/Alibaba-NLP_gte-large-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..a84b67c61512b3fdd36b91dd12ace4e6b8f305b3 --- /dev/null +++ b/data/model_data_json/Alibaba-NLP_gte-large-en-v1.5.json @@ -0,0 +1,28 @@ +{ + "model_id": "Alibaba-NLP/gte-large-en-v1.5", + "downloads": 886564, + "tags": [ + "transformers", + "onnx", + "safetensors", + "new", + "feature-extraction", + "sentence-transformers", + "gte", + "mteb", + "transformers.js", + "sentence-similarity", + "custom_code", + "en", + "dataset:allenai/c4", + "arxiv:2407.19669", + "arxiv:2308.03281", + "license:apache-2.0", + "model-index", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - allenai/c4 library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-large-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.01492537313432 - type: ap value: 35.05341696659522 - type: f1 value: 66.71270310883853 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.97189999999999 - type: ap value: 90.5952493948908 - type: f1 value: 93.95848137716877 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 54.196 - type: f1 value: 53.80122334012787 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 47.297 - type: map_at_10 value: 64.303 - type: map_at_100 value: 64.541 - type: map_at_1000 value: 64.541 - type: map_at_3 value: 60.728 - type: map_at_5 value: 63.114000000000004 - type: mrr_at_1 value: 48.435 - type: mrr_at_10 value: 64.657 - type: mrr_at_100 value: 64.901 - type: mrr_at_1000 value: 64.901 - type: mrr_at_3 value: 61.06 - type: mrr_at_5 value: 63.514 - type: ndcg_at_1 value: 47.297 - type: ndcg_at_10 value: 72.107 - type: ndcg_at_100 value: 72.963 - type: ndcg_at_1000 value: 72.963 - type: ndcg_at_3 value: 65.063 - type: ndcg_at_5 value: 69.352 - type: precision_at_1 value: 47.297 - type: precision_at_10 value: 9.623 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 25.865 - type: precision_at_5 value: 17.596 - type: recall_at_1 value: 47.297 - type: recall_at_10 value: 96.23 - type: recall_at_100 value: 99.644 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 77.596 - type: recall_at_5 value: 87.98 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.467787861077475 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - 
type: v_measure value: 43.39198391914257 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.12794820591384 - type: mrr value: 75.9331442641692 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.85062993863319 - type: cos_sim_spearman value: 85.39049989733459 - type: euclidean_pearson value: 86.00222680278333 - type: euclidean_spearman value: 85.45556162077396 - type: manhattan_pearson value: 85.88769871785621 - type: manhattan_spearman value: 85.11760211290839 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.32792207792208 - type: f1 value: 87.29132945999555 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.5779328301945 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.94425623865118 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 32.978 - type: map_at_10 value: 44.45 - type: map_at_100 value: 46.19 - type: map_at_1000 value: 46.303 - type: map_at_3 value: 40.849000000000004 - type: map_at_5 value: 42.55 - type: mrr_at_1 value: 40.629 - type: mrr_at_10 value: 50.848000000000006 - type: mrr_at_100 value: 51.669 - type: mrr_at_1000 value: 51.705 - type: mrr_at_3 value: 47.997 - type: mrr_at_5 value: 49.506 - type: ndcg_at_1 value: 40.629 - type: ndcg_at_10 value: 51.102000000000004 - type: ndcg_at_100 value: 57.159000000000006 - type: ndcg_at_1000 value: 58.669000000000004 - type: ndcg_at_3 value: 45.738 - type: ndcg_at_5 value: 47.632999999999996 - type: precision_at_1 value: 40.629 - type: precision_at_10 value: 9.700000000000001 - type: precision_at_100 value: 1.5970000000000002 - type: precision_at_1000 value: 0.202 - type: precision_at_3 value: 21.698 - type: precision_at_5 value: 15.393 - type: recall_at_1 value: 32.978 - type: recall_at_10 value: 63.711 - type: recall_at_100 value: 88.39399999999999 - type: recall_at_1000 value: 97.513 - type: recall_at_3 value: 48.025 - type: recall_at_5 value: 53.52 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 30.767 - type: map_at_10 value: 42.195 - type: map_at_100 value: 43.541999999999994 - type: map_at_1000 value: 43.673 - type: map_at_3 value: 38.561 - type: map_at_5 value: 40.532000000000004 - type: mrr_at_1 value: 38.79 - type: mrr_at_10 value: 48.021 - type: mrr_at_100 value: 48.735 - type: mrr_at_1000 value: 48.776 - type: mrr_at_3 value: 45.594 - type: mrr_at_5 value: 46.986 - type: ndcg_at_1 value: 38.79 - type: ndcg_at_10 value: 48.468 - type: ndcg_at_100 value: 53.037 - type: ndcg_at_1000 value: 55.001999999999995 - 
type: ndcg_at_3 value: 43.409 - type: ndcg_at_5 value: 45.654 - type: precision_at_1 value: 38.79 - type: precision_at_10 value: 9.452 - type: precision_at_100 value: 1.518 - type: precision_at_1000 value: 0.201 - type: precision_at_3 value: 21.21 - type: precision_at_5 value: 15.171999999999999 - type: recall_at_1 value: 30.767 - type: recall_at_10 value: 60.118 - type: recall_at_100 value: 79.271 - type: recall_at_1000 value: 91.43299999999999 - type: recall_at_3 value: 45.36 - type: recall_at_5 value: 51.705 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 40.007 - type: map_at_10 value: 53.529 - type: map_at_100 value: 54.602 - type: map_at_1000 value: 54.647 - type: map_at_3 value: 49.951 - type: map_at_5 value: 52.066 - type: mrr_at_1 value: 45.705 - type: mrr_at_10 value: 56.745000000000005 - type: mrr_at_100 value: 57.43899999999999 - type: mrr_at_1000 value: 57.462999999999994 - type: mrr_at_3 value: 54.25299999999999 - type: mrr_at_5 value: 55.842000000000006 - type: ndcg_at_1 value: 45.705 - type: ndcg_at_10 value: 59.809 - type: ndcg_at_100 value: 63.837999999999994 - type: ndcg_at_1000 value: 64.729 - type: ndcg_at_3 value: 53.994 - type: ndcg_at_5 value: 57.028 - type: precision_at_1 value: 45.705 - type: precision_at_10 value: 9.762 - type: precision_at_100 value: 1.275 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 24.368000000000002 - type: precision_at_5 value: 16.84 - type: recall_at_1 value: 40.007 - type: recall_at_10 value: 75.017 - type: recall_at_100 value: 91.99000000000001 - type: recall_at_1000 value: 98.265 - type: recall_at_3 value: 59.704 - type: recall_at_5 value: 67.109 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 26.639000000000003 - type: map_at_10 value: 35.926 - type: map_at_100 value: 37.126999999999995 - type: map_at_1000 value: 37.202 - type: map_at_3 value: 32.989000000000004 - type: map_at_5 value: 34.465 - type: mrr_at_1 value: 28.475 - type: mrr_at_10 value: 37.7 - type: mrr_at_100 value: 38.753 - type: mrr_at_1000 value: 38.807 - type: mrr_at_3 value: 35.066 - type: mrr_at_5 value: 36.512 - type: ndcg_at_1 value: 28.475 - type: ndcg_at_10 value: 41.245 - type: ndcg_at_100 value: 46.814 - type: ndcg_at_1000 value: 48.571 - type: ndcg_at_3 value: 35.528999999999996 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 28.475 - type: precision_at_10 value: 6.497 - type: precision_at_100 value: 0.9650000000000001 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 15.065999999999999 - type: precision_at_5 value: 10.599 - type: recall_at_1 value: 26.639000000000003 - type: recall_at_10 value: 55.759 - type: recall_at_100 value: 80.913 - type: recall_at_1000 value: 93.929 - type: recall_at_3 value: 40.454 - type: recall_at_5 value: 46.439 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.767999999999999 - type: map_at_10 value: 24.811 - type: map_at_100 value: 26.064999999999998 - type: map_at_1000 value: 26.186999999999998 - type: map_at_3 value: 21.736 - type: map_at_5 value: 23.283 - 
type: mrr_at_1 value: 19.527 - type: mrr_at_10 value: 29.179 - type: mrr_at_100 value: 30.153999999999996 - type: mrr_at_1000 value: 30.215999999999998 - type: mrr_at_3 value: 26.223000000000003 - type: mrr_at_5 value: 27.733999999999998 - type: ndcg_at_1 value: 19.527 - type: ndcg_at_10 value: 30.786 - type: ndcg_at_100 value: 36.644 - type: ndcg_at_1000 value: 39.440999999999995 - type: ndcg_at_3 value: 24.958 - type: ndcg_at_5 value: 27.392 - type: precision_at_1 value: 19.527 - type: precision_at_10 value: 5.995 - type: precision_at_100 value: 1.03 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_3 value: 12.520999999999999 - type: precision_at_5 value: 9.129 - type: recall_at_1 value: 15.767999999999999 - type: recall_at_10 value: 44.824000000000005 - type: recall_at_100 value: 70.186 - type: recall_at_1000 value: 89.934 - type: recall_at_3 value: 28.607 - type: recall_at_5 value: 34.836 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 31.952 - type: map_at_10 value: 44.438 - type: map_at_100 value: 45.778 - type: map_at_1000 value: 45.883 - type: map_at_3 value: 41.044000000000004 - type: map_at_5 value: 42.986000000000004 - type: mrr_at_1 value: 39.172000000000004 - type: mrr_at_10 value: 49.76 - type: mrr_at_100 value: 50.583999999999996 - type: mrr_at_1000 value: 50.621 - type: mrr_at_3 value: 47.353 - type: mrr_at_5 value: 48.739 - type: ndcg_at_1 value: 39.172000000000004 - type: ndcg_at_10 value: 50.760000000000005 - type: ndcg_at_100 value: 56.084 - type: ndcg_at_1000 value: 57.865 - type: ndcg_at_3 value: 45.663 - type: ndcg_at_5 value: 48.178 - type: precision_at_1 value: 39.172000000000004 - type: precision_at_10 value: 9.22 - type: precision_at_100 value: 1.387 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 21.976000000000003 - type: precision_at_5 value: 15.457 - type: recall_at_1 value: 31.952 - type: recall_at_10 value: 63.900999999999996 - type: recall_at_100 value: 85.676 - type: recall_at_1000 value: 97.03699999999999 - type: recall_at_3 value: 49.781 - type: recall_at_5 value: 56.330000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 25.332 - type: map_at_10 value: 36.874 - type: map_at_100 value: 38.340999999999994 - type: map_at_1000 value: 38.452 - type: map_at_3 value: 33.068 - type: map_at_5 value: 35.324 - type: mrr_at_1 value: 30.822 - type: mrr_at_10 value: 41.641 - type: mrr_at_100 value: 42.519 - type: mrr_at_1000 value: 42.573 - type: mrr_at_3 value: 38.413000000000004 - type: mrr_at_5 value: 40.542 - type: ndcg_at_1 value: 30.822 - type: ndcg_at_10 value: 43.414 - type: ndcg_at_100 value: 49.196 - type: ndcg_at_1000 value: 51.237 - type: ndcg_at_3 value: 37.230000000000004 - type: ndcg_at_5 value: 40.405 - type: precision_at_1 value: 30.822 - type: precision_at_10 value: 8.379 - type: precision_at_100 value: 1.315 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 18.417 - type: precision_at_5 value: 13.744 - type: recall_at_1 value: 25.332 - type: recall_at_10 value: 57.774 - type: recall_at_100 value: 82.071 - type: recall_at_1000 value: 95.60600000000001 - type: recall_at_3 value: 40.722 - type: recall_at_5 value: 
48.754999999999995 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 25.91033333333334 - type: map_at_10 value: 36.23225000000001 - type: map_at_100 value: 37.55766666666667 - type: map_at_1000 value: 37.672583333333336 - type: map_at_3 value: 32.95666666666667 - type: map_at_5 value: 34.73375 - type: mrr_at_1 value: 30.634 - type: mrr_at_10 value: 40.19449999999999 - type: mrr_at_100 value: 41.099250000000005 - type: mrr_at_1000 value: 41.15091666666667 - type: mrr_at_3 value: 37.4615 - type: mrr_at_5 value: 39.00216666666667 - type: ndcg_at_1 value: 30.634 - type: ndcg_at_10 value: 42.162166666666664 - type: ndcg_at_100 value: 47.60708333333333 - type: ndcg_at_1000 value: 49.68616666666666 - type: ndcg_at_3 value: 36.60316666666666 - type: ndcg_at_5 value: 39.15616666666668 - type: precision_at_1 value: 30.634 - type: precision_at_10 value: 7.6193333333333335 - type: precision_at_100 value: 1.2198333333333333 - type: precision_at_1000 value: 0.15975000000000003 - type: precision_at_3 value: 17.087 - type: precision_at_5 value: 12.298333333333334 - type: recall_at_1 value: 25.91033333333334 - type: recall_at_10 value: 55.67300000000001 - type: recall_at_100 value: 79.20608333333334 - type: recall_at_1000 value: 93.34866666666667 - type: recall_at_3 value: 40.34858333333333 - type: recall_at_5 value: 46.834083333333325 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 25.006 - type: map_at_10 value: 32.177 - type: map_at_100 value: 33.324999999999996 - type: map_at_1000 value: 33.419 - type: map_at_3 value: 29.952 - type: map_at_5 value: 31.095 - type: mrr_at_1 value: 28.066999999999997 - type: mrr_at_10 value: 34.995 - type: mrr_at_100 value: 35.978 - type: mrr_at_1000 value: 36.042 - type: mrr_at_3 value: 33.103 - type: mrr_at_5 value: 34.001 - type: ndcg_at_1 value: 28.066999999999997 - type: ndcg_at_10 value: 36.481 - type: ndcg_at_100 value: 42.022999999999996 - type: ndcg_at_1000 value: 44.377 - type: ndcg_at_3 value: 32.394 - type: ndcg_at_5 value: 34.108 - type: precision_at_1 value: 28.066999999999997 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 13.804 - type: precision_at_5 value: 9.508999999999999 - type: recall_at_1 value: 25.006 - type: recall_at_10 value: 46.972 - type: recall_at_100 value: 72.138 - type: recall_at_1000 value: 89.479 - type: recall_at_3 value: 35.793 - type: recall_at_5 value: 39.947 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 16.07 - type: map_at_10 value: 24.447 - type: map_at_100 value: 25.685999999999996 - type: map_at_1000 value: 25.813999999999997 - type: map_at_3 value: 21.634 - type: map_at_5 value: 23.133 - type: mrr_at_1 value: 19.580000000000002 - type: mrr_at_10 value: 28.127999999999997 - type: mrr_at_100 value: 29.119 - type: mrr_at_1000 value: 29.192 - type: mrr_at_3 value: 25.509999999999998 - type: mrr_at_5 value: 26.878 - type: ndcg_at_1 value: 19.580000000000002 - type: ndcg_at_10 value: 29.804000000000002 - type: ndcg_at_100 value: 35.555 - type: ndcg_at_1000 value: 
38.421 - type: ndcg_at_3 value: 24.654999999999998 - type: ndcg_at_5 value: 26.881 - type: precision_at_1 value: 19.580000000000002 - type: precision_at_10 value: 5.736 - type: precision_at_100 value: 1.005 - type: precision_at_1000 value: 0.145 - type: precision_at_3 value: 12.033000000000001 - type: precision_at_5 value: 8.871 - type: recall_at_1 value: 16.07 - type: recall_at_10 value: 42.364000000000004 - type: recall_at_100 value: 68.01899999999999 - type: recall_at_1000 value: 88.122 - type: recall_at_3 value: 27.846 - type: recall_at_5 value: 33.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 26.365 - type: map_at_10 value: 36.591 - type: map_at_100 value: 37.730000000000004 - type: map_at_1000 value: 37.84 - type: map_at_3 value: 33.403 - type: map_at_5 value: 35.272999999999996 - type: mrr_at_1 value: 30.503999999999998 - type: mrr_at_10 value: 39.940999999999995 - type: mrr_at_100 value: 40.818 - type: mrr_at_1000 value: 40.876000000000005 - type: mrr_at_3 value: 37.065 - type: mrr_at_5 value: 38.814 - type: ndcg_at_1 value: 30.503999999999998 - type: ndcg_at_10 value: 42.185 - type: ndcg_at_100 value: 47.416000000000004 - type: ndcg_at_1000 value: 49.705 - type: ndcg_at_3 value: 36.568 - type: ndcg_at_5 value: 39.416000000000004 - type: precision_at_1 value: 30.503999999999998 - type: precision_at_10 value: 7.276000000000001 - type: precision_at_100 value: 1.118 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 16.729 - type: precision_at_5 value: 12.107999999999999 - type: recall_at_1 value: 26.365 - type: recall_at_10 value: 55.616 - type: recall_at_100 value: 78.129 - type: recall_at_1000 value: 93.95599999999999 - type: recall_at_3 value: 40.686 - type: recall_at_5 value: 47.668 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.750999999999998 - type: map_at_10 value: 33.446 - type: map_at_100 value: 35.235 - type: map_at_1000 value: 35.478 - type: map_at_3 value: 29.358 - type: map_at_5 value: 31.525 - type: mrr_at_1 value: 27.668 - type: mrr_at_10 value: 37.694 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.779 - type: mrr_at_3 value: 34.223 - type: mrr_at_5 value: 36.08 - type: ndcg_at_1 value: 27.668 - type: ndcg_at_10 value: 40.557 - type: ndcg_at_100 value: 46.605999999999995 - type: ndcg_at_1000 value: 48.917 - type: ndcg_at_3 value: 33.677 - type: ndcg_at_5 value: 36.85 - type: precision_at_1 value: 27.668 - type: precision_at_10 value: 8.3 - type: precision_at_100 value: 1.6260000000000001 - type: precision_at_1000 value: 0.253 - type: precision_at_3 value: 16.008 - type: precision_at_5 value: 12.292 - type: recall_at_1 value: 22.750999999999998 - type: recall_at_10 value: 55.643 - type: recall_at_100 value: 82.151 - type: recall_at_1000 value: 95.963 - type: recall_at_3 value: 36.623 - type: recall_at_5 value: 44.708 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.288999999999998 - type: map_at_10 value: 25.903 - type: map_at_100 value: 27.071 - type: map_at_1000 value: 27.173000000000002 - type: map_at_3 value: 
22.935 - type: map_at_5 value: 24.573 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.682000000000002 - type: mrr_at_100 value: 28.691 - type: mrr_at_1000 value: 28.761 - type: mrr_at_3 value: 24.738 - type: mrr_at_5 value: 26.392 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.335 - type: ndcg_at_100 value: 36.913000000000004 - type: ndcg_at_1000 value: 39.300000000000004 - type: ndcg_at_3 value: 25.423000000000002 - type: ndcg_at_5 value: 28.262999999999998 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.379 - type: precision_at_100 value: 0.876 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 11.214 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.288999999999998 - type: recall_at_10 value: 46.377 - type: recall_at_100 value: 71.53500000000001 - type: recall_at_1000 value: 88.947 - type: recall_at_3 value: 30.581999999999997 - type: recall_at_5 value: 37.354 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 21.795 - type: map_at_10 value: 37.614999999999995 - type: map_at_100 value: 40.037 - type: map_at_1000 value: 40.184999999999995 - type: map_at_3 value: 32.221 - type: map_at_5 value: 35.154999999999994 - type: mrr_at_1 value: 50.358000000000004 - type: mrr_at_10 value: 62.129 - type: mrr_at_100 value: 62.613 - type: mrr_at_1000 value: 62.62 - type: mrr_at_3 value: 59.272999999999996 - type: mrr_at_5 value: 61.138999999999996 - type: ndcg_at_1 value: 50.358000000000004 - type: ndcg_at_10 value: 48.362 - type: ndcg_at_100 value: 55.932 - type: ndcg_at_1000 value: 58.062999999999995 - type: ndcg_at_3 value: 42.111 - type: ndcg_at_5 value: 44.063 - type: precision_at_1 value: 50.358000000000004 - type: precision_at_10 value: 14.677999999999999 - type: precision_at_100 value: 2.2950000000000004 - type: precision_at_1000 value: 0.271 - type: precision_at_3 value: 31.77 - type: precision_at_5 value: 23.375 - type: recall_at_1 value: 21.795 - type: recall_at_10 value: 53.846000000000004 - type: recall_at_100 value: 78.952 - type: recall_at_1000 value: 90.41900000000001 - type: recall_at_3 value: 37.257 - type: recall_at_5 value: 44.661 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.728 - type: map_at_10 value: 22.691 - type: map_at_100 value: 31.734 - type: map_at_1000 value: 33.464 - type: map_at_3 value: 16.273 - type: map_at_5 value: 19.016 - type: mrr_at_1 value: 73.25 - type: mrr_at_10 value: 80.782 - type: mrr_at_100 value: 81.01899999999999 - type: mrr_at_1000 value: 81.021 - type: mrr_at_3 value: 79.583 - type: mrr_at_5 value: 80.146 - type: ndcg_at_1 value: 59.62499999999999 - type: ndcg_at_10 value: 46.304 - type: ndcg_at_100 value: 51.23 - type: ndcg_at_1000 value: 58.048 - type: ndcg_at_3 value: 51.541000000000004 - type: ndcg_at_5 value: 48.635 - type: precision_at_1 value: 73.25 - type: precision_at_10 value: 36.375 - type: precision_at_100 value: 11.53 - type: precision_at_1000 value: 2.23 - type: precision_at_3 value: 55.583000000000006 - type: precision_at_5 value: 47.15 - type: recall_at_1 value: 9.728 - type: recall_at_10 value: 28.793999999999997 - type: recall_at_100 value: 57.885 - type: recall_at_1000 value: 78.759 - type: recall_at_3 value: 17.79 - type: recall_at_5 value: 21.733 - task: type: 
Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.775 - type: f1 value: 41.89794273264891 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 85.378 - type: map_at_10 value: 91.51 - type: map_at_100 value: 91.666 - type: map_at_1000 value: 91.676 - type: map_at_3 value: 90.757 - type: map_at_5 value: 91.277 - type: mrr_at_1 value: 91.839 - type: mrr_at_10 value: 95.49 - type: mrr_at_100 value: 95.493 - type: mrr_at_1000 value: 95.493 - type: mrr_at_3 value: 95.345 - type: mrr_at_5 value: 95.47200000000001 - type: ndcg_at_1 value: 91.839 - type: ndcg_at_10 value: 93.806 - type: ndcg_at_100 value: 94.255 - type: ndcg_at_1000 value: 94.399 - type: ndcg_at_3 value: 93.027 - type: ndcg_at_5 value: 93.51 - type: precision_at_1 value: 91.839 - type: precision_at_10 value: 10.93 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 34.873 - type: precision_at_5 value: 21.44 - type: recall_at_1 value: 85.378 - type: recall_at_10 value: 96.814 - type: recall_at_100 value: 98.386 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 94.643 - type: recall_at_5 value: 95.976 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 32.190000000000005 - type: map_at_10 value: 53.605000000000004 - type: map_at_100 value: 55.550999999999995 - type: map_at_1000 value: 55.665 - type: map_at_3 value: 46.62 - type: map_at_5 value: 50.517999999999994 - type: mrr_at_1 value: 60.34 - type: mrr_at_10 value: 70.775 - type: mrr_at_100 value: 71.238 - type: mrr_at_1000 value: 71.244 - type: mrr_at_3 value: 68.72399999999999 - type: mrr_at_5 value: 69.959 - type: ndcg_at_1 value: 60.34 - type: ndcg_at_10 value: 63.226000000000006 - type: ndcg_at_100 value: 68.60300000000001 - type: ndcg_at_1000 value: 69.901 - type: ndcg_at_3 value: 58.048 - type: ndcg_at_5 value: 59.789 - type: precision_at_1 value: 60.34 - type: precision_at_10 value: 17.130000000000003 - type: precision_at_100 value: 2.29 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 38.323 - type: precision_at_5 value: 27.87 - type: recall_at_1 value: 32.190000000000005 - type: recall_at_10 value: 73.041 - type: recall_at_100 value: 91.31 - type: recall_at_1000 value: 98.104 - type: recall_at_3 value: 53.70399999999999 - type: recall_at_5 value: 62.358999999999995 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 43.511 - type: map_at_10 value: 58.15 - type: map_at_100 value: 58.95399999999999 - type: map_at_1000 value: 59.018 - type: map_at_3 value: 55.31700000000001 - type: map_at_5 value: 57.04900000000001 - type: mrr_at_1 value: 87.022 - type: mrr_at_10 value: 91.32000000000001 - type: mrr_at_100 value: 91.401 - type: mrr_at_1000 value: 91.403 - type: mrr_at_3 value: 90.77 - type: mrr_at_5 value: 91.156 - type: ndcg_at_1 value: 87.022 - type: ndcg_at_10 value: 68.183 - type: ndcg_at_100 value: 70.781 - type: ndcg_at_1000 value: 72.009 - type: ndcg_at_3 value: 64.334 - type: ndcg_at_5 value: 66.449 - type: precision_at_1 value: 87.022 - 
type: precision_at_10 value: 13.406 - type: precision_at_100 value: 1.542 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 39.023 - type: precision_at_5 value: 25.080000000000002 - type: recall_at_1 value: 43.511 - type: recall_at_10 value: 67.02900000000001 - type: recall_at_100 value: 77.11 - type: recall_at_1000 value: 85.294 - type: recall_at_3 value: 58.535000000000004 - type: recall_at_5 value: 62.70099999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.0996 - type: ap value: 87.86206089096373 - type: f1 value: 92.07554547510763 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.179 - type: map_at_10 value: 35.86 - type: map_at_100 value: 37.025999999999996 - type: map_at_1000 value: 37.068 - type: map_at_3 value: 31.921 - type: map_at_5 value: 34.172000000000004 - type: mrr_at_1 value: 23.926 - type: mrr_at_10 value: 36.525999999999996 - type: mrr_at_100 value: 37.627 - type: mrr_at_1000 value: 37.665 - type: mrr_at_3 value: 32.653 - type: mrr_at_5 value: 34.897 - type: ndcg_at_1 value: 23.910999999999998 - type: ndcg_at_10 value: 42.927 - type: ndcg_at_100 value: 48.464 - type: ndcg_at_1000 value: 49.533 - type: ndcg_at_3 value: 34.910000000000004 - type: ndcg_at_5 value: 38.937 - type: precision_at_1 value: 23.910999999999998 - type: precision_at_10 value: 6.758 - type: precision_at_100 value: 0.9520000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.838000000000001 - type: precision_at_5 value: 10.934000000000001 - type: recall_at_1 value: 23.179 - type: recall_at_10 value: 64.622 - type: recall_at_100 value: 90.135 - type: recall_at_1000 value: 98.301 - type: recall_at_3 value: 42.836999999999996 - type: recall_at_5 value: 52.512 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.59598723210215 - type: f1 value: 96.41913500001952 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.89557683538533 - type: f1 value: 63.379319722356264 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 78.93745796906524 - type: f1 value: 75.71616541785902 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.41223940820443 - type: f1 value: 81.2877893719078 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 35.03682528325662 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 
32.942529406124 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.459949660460317 - type: mrr value: 32.70509582031616 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.497 - type: map_at_10 value: 13.843 - type: map_at_100 value: 17.713 - type: map_at_1000 value: 19.241 - type: map_at_3 value: 10.096 - type: map_at_5 value: 11.85 - type: mrr_at_1 value: 48.916 - type: mrr_at_10 value: 57.764 - type: mrr_at_100 value: 58.251 - type: mrr_at_1000 value: 58.282999999999994 - type: mrr_at_3 value: 55.623999999999995 - type: mrr_at_5 value: 57.018 - type: ndcg_at_1 value: 46.594 - type: ndcg_at_10 value: 36.945 - type: ndcg_at_100 value: 34.06 - type: ndcg_at_1000 value: 43.05 - type: ndcg_at_3 value: 41.738 - type: ndcg_at_5 value: 39.330999999999996 - type: precision_at_1 value: 48.916 - type: precision_at_10 value: 27.43 - type: precision_at_100 value: 8.616 - type: precision_at_1000 value: 2.155 - type: precision_at_3 value: 39.112 - type: precision_at_5 value: 33.808 - type: recall_at_1 value: 6.497 - type: recall_at_10 value: 18.163 - type: recall_at_100 value: 34.566 - type: recall_at_1000 value: 67.15 - type: recall_at_3 value: 11.100999999999999 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 31.916 - type: map_at_10 value: 48.123 - type: map_at_100 value: 49.103 - type: map_at_1000 value: 49.131 - type: map_at_3 value: 43.711 - type: map_at_5 value: 46.323 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 50.617999999999995 - type: mrr_at_100 value: 51.329 - type: mrr_at_1000 value: 51.348000000000006 - type: mrr_at_3 value: 47.010999999999996 - type: mrr_at_5 value: 49.175000000000004 - type: ndcg_at_1 value: 36.181999999999995 - type: ndcg_at_10 value: 56.077999999999996 - type: ndcg_at_100 value: 60.037 - type: ndcg_at_1000 value: 60.63499999999999 - type: ndcg_at_3 value: 47.859 - type: ndcg_at_5 value: 52.178999999999995 - type: precision_at_1 value: 36.181999999999995 - type: precision_at_10 value: 9.284 - type: precision_at_100 value: 1.149 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 22.006999999999998 - type: precision_at_5 value: 15.695 - type: recall_at_1 value: 31.916 - type: recall_at_10 value: 77.771 - type: recall_at_100 value: 94.602 - type: recall_at_1000 value: 98.967 - type: recall_at_3 value: 56.528 - type: recall_at_5 value: 66.527 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.486 - type: map_at_10 value: 85.978 - type: map_at_100 value: 86.587 - type: map_at_1000 value: 86.598 - type: map_at_3 value: 83.04899999999999 - type: map_at_5 value: 84.857 - type: mrr_at_1 value: 82.32000000000001 - type: mrr_at_10 value: 88.64 - type: mrr_at_100 value: 88.702 - type: mrr_at_1000 value: 88.702 - type: mrr_at_3 value: 87.735 - type: mrr_at_5 value: 88.36 - type: ndcg_at_1 value: 82.34 - type: ndcg_at_10 value: 89.67 - type: ndcg_at_100 value: 90.642 - type: ndcg_at_1000 value: 90.688 - type: ndcg_at_3 value: 86.932 - type: ndcg_at_5 value: 88.408 - type: precision_at_1 value: 
82.34 - type: precision_at_10 value: 13.675999999999998 - type: precision_at_100 value: 1.544 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 38.24 - type: precision_at_5 value: 25.068 - type: recall_at_1 value: 71.486 - type: recall_at_10 value: 96.844 - type: recall_at_100 value: 99.843 - type: recall_at_1000 value: 99.996 - type: recall_at_3 value: 88.92099999999999 - type: recall_at_5 value: 93.215 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 59.75758437908334 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 68.03497914092789 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.808 - type: map_at_10 value: 16.059 - type: map_at_100 value: 19.048000000000002 - type: map_at_1000 value: 19.43 - type: map_at_3 value: 10.953 - type: map_at_5 value: 13.363 - type: mrr_at_1 value: 28.7 - type: mrr_at_10 value: 42.436 - type: mrr_at_100 value: 43.599 - type: mrr_at_1000 value: 43.62 - type: mrr_at_3 value: 38.45 - type: mrr_at_5 value: 40.89 - type: ndcg_at_1 value: 28.7 - type: ndcg_at_10 value: 26.346000000000004 - type: ndcg_at_100 value: 36.758 - type: ndcg_at_1000 value: 42.113 - type: ndcg_at_3 value: 24.254 - type: ndcg_at_5 value: 21.506 - type: precision_at_1 value: 28.7 - type: precision_at_10 value: 13.969999999999999 - type: precision_at_100 value: 2.881 - type: precision_at_1000 value: 0.414 - type: precision_at_3 value: 22.933 - type: precision_at_5 value: 19.220000000000002 - type: recall_at_1 value: 5.808 - type: recall_at_10 value: 28.310000000000002 - type: recall_at_100 value: 58.475 - type: recall_at_1000 value: 84.072 - type: recall_at_3 value: 13.957 - type: recall_at_5 value: 19.515 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.39274129958557 - type: cos_sim_spearman value: 79.78021235170053 - type: euclidean_pearson value: 79.35335401300166 - type: euclidean_spearman value: 79.7271870968275 - type: manhattan_pearson value: 79.35256263340601 - type: manhattan_spearman value: 79.76036386976321 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.99130429246708 - type: cos_sim_spearman value: 73.88322811171203 - type: euclidean_pearson value: 80.7569419170376 - type: euclidean_spearman value: 73.82542155409597 - type: manhattan_pearson value: 80.79468183847625 - type: manhattan_spearman value: 73.87027144047784 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.88548789489907 - type: cos_sim_spearman value: 85.07535893847255 - type: euclidean_pearson value: 84.6637222061494 - type: euclidean_spearman value: 85.14200626702456 - type: manhattan_pearson value: 84.75327892344734 - type: manhattan_spearman value: 85.24406181838596 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 
6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.88140039325008 - type: cos_sim_spearman value: 79.61211268112362 - type: euclidean_pearson value: 81.29639728816458 - type: euclidean_spearman value: 79.51284578041442 - type: manhattan_pearson value: 81.3381797137111 - type: manhattan_spearman value: 79.55683684039808 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.16716737270485 - type: cos_sim_spearman value: 86.14823841857738 - type: euclidean_pearson value: 85.36325733440725 - type: euclidean_spearman value: 86.04919691402029 - type: manhattan_pearson value: 85.3147511385052 - type: manhattan_spearman value: 86.00676205857764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 80.34266645861588 - type: cos_sim_spearman value: 81.59914035005882 - type: euclidean_pearson value: 81.15053076245988 - type: euclidean_spearman value: 81.52776915798489 - type: manhattan_pearson value: 81.1819647418673 - type: manhattan_spearman value: 81.57479527353556 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 89.38263326821439 - type: cos_sim_spearman value: 89.10946308202642 - type: euclidean_pearson value: 88.87831312540068 - type: euclidean_spearman value: 89.03615865973664 - type: manhattan_pearson value: 88.79835539970384 - type: manhattan_spearman value: 88.9766156339753 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 70.1574915581685 - type: cos_sim_spearman value: 70.59144980004054 - type: euclidean_pearson value: 71.43246306918755 - type: euclidean_spearman value: 70.5544189562984 - type: manhattan_pearson value: 71.4071414609503 - type: manhattan_spearman value: 70.31799126163712 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.36215796635351 - type: cos_sim_spearman value: 83.07276756467208 - type: euclidean_pearson value: 83.06690453635584 - type: euclidean_spearman value: 82.9635366303289 - type: manhattan_pearson value: 83.04994049700815 - type: manhattan_spearman value: 82.98120125356036 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 86.92530011616722 - type: mrr value: 96.21826793395421 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 65.75 - type: map_at_10 value: 77.701 - type: map_at_100 value: 78.005 - type: map_at_1000 value: 78.006 - type: map_at_3 value: 75.48 - type: map_at_5 value: 76.927 - type: mrr_at_1 value: 68.333 - type: mrr_at_10 value: 78.511 - type: mrr_at_100 value: 78.704 - type: mrr_at_1000 value: 78.704 - type: mrr_at_3 value: 77 - type: mrr_at_5 value: 78.083 - type: ndcg_at_1 value: 68.333 - type: ndcg_at_10 value: 82.42699999999999 - type: 
ndcg_at_100 value: 83.486 - type: ndcg_at_1000 value: 83.511 - type: ndcg_at_3 value: 78.96300000000001 - type: ndcg_at_5 value: 81.028 - type: precision_at_1 value: 68.333 - type: precision_at_10 value: 10.667 - type: precision_at_100 value: 1.127 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 31.333 - type: precision_at_5 value: 20.133000000000003 - type: recall_at_1 value: 65.75 - type: recall_at_10 value: 95.578 - type: recall_at_100 value: 99.833 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 86.506 - type: recall_at_5 value: 91.75 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.75247524752476 - type: cos_sim_ap value: 94.16065078045173 - type: cos_sim_f1 value: 87.22986247544205 - type: cos_sim_precision value: 85.71428571428571 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.74554455445545 - type: dot_ap value: 93.90633887037264 - type: dot_f1 value: 86.9873417721519 - type: dot_precision value: 88.1025641025641 - type: dot_recall value: 85.9 - type: euclidean_accuracy value: 99.75247524752476 - type: euclidean_ap value: 94.17466319018055 - type: euclidean_f1 value: 87.3405299313052 - type: euclidean_precision value: 85.74181117533719 - type: euclidean_recall value: 89 - type: manhattan_accuracy value: 99.75445544554455 - type: manhattan_ap value: 94.27688371923577 - type: manhattan_f1 value: 87.74002954209749 - type: manhattan_precision value: 86.42095053346266 - type: manhattan_recall value: 89.1 - type: max_accuracy value: 99.75445544554455 - type: max_ap value: 94.27688371923577 - type: max_f1 value: 87.74002954209749 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 71.26500637517056 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 39.17507906280528 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.4848744828509 - type: mrr value: 53.33678168236992 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.599864323827887 - type: cos_sim_spearman value: 30.91116204665598 - type: dot_pearson value: 30.82637894269936 - type: dot_spearman value: 30.957573868416066 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.23600000000000002 - type: map_at_10 value: 1.892 - type: map_at_100 value: 11.586 - type: map_at_1000 value: 27.761999999999997 - type: map_at_3 value: 0.653 - type: map_at_5 value: 1.028 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 94 - type: mrr_at_100 value: 94 - type: mrr_at_1000 value: 94 - type: mrr_at_3 value: 94 - type: mrr_at_5 value: 94 - type: ndcg_at_1 value: 82 - type: ndcg_at_10 value: 77.48899999999999 - type: ndcg_at_100 value: 
60.141 - type: ndcg_at_1000 value: 54.228 - type: ndcg_at_3 value: 82.358 - type: ndcg_at_5 value: 80.449 - type: precision_at_1 value: 88 - type: precision_at_10 value: 82.19999999999999 - type: precision_at_100 value: 61.760000000000005 - type: precision_at_1000 value: 23.684 - type: precision_at_3 value: 88 - type: precision_at_5 value: 85.6 - type: recall_at_1 value: 0.23600000000000002 - type: recall_at_10 value: 2.117 - type: recall_at_100 value: 14.985000000000001 - type: recall_at_1000 value: 51.107 - type: recall_at_3 value: 0.688 - type: recall_at_5 value: 1.1039999999999999 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 2.3040000000000003 - type: map_at_10 value: 9.025 - type: map_at_100 value: 15.312999999999999 - type: map_at_1000 value: 16.954 - type: map_at_3 value: 4.981 - type: map_at_5 value: 6.32 - type: mrr_at_1 value: 24.490000000000002 - type: mrr_at_10 value: 39.835 - type: mrr_at_100 value: 40.8 - type: mrr_at_1000 value: 40.8 - type: mrr_at_3 value: 35.034 - type: mrr_at_5 value: 37.687 - type: ndcg_at_1 value: 22.448999999999998 - type: ndcg_at_10 value: 22.545 - type: ndcg_at_100 value: 35.931999999999995 - type: ndcg_at_1000 value: 47.665 - type: ndcg_at_3 value: 23.311 - type: ndcg_at_5 value: 22.421 - type: precision_at_1 value: 24.490000000000002 - type: precision_at_10 value: 20.408 - type: precision_at_100 value: 7.815999999999999 - type: precision_at_1000 value: 1.553 - type: precision_at_3 value: 25.169999999999998 - type: precision_at_5 value: 23.265 - type: recall_at_1 value: 2.3040000000000003 - type: recall_at_10 value: 15.693999999999999 - type: recall_at_100 value: 48.917 - type: recall_at_1000 value: 84.964 - type: recall_at_3 value: 6.026 - type: recall_at_5 value: 9.066 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 82.6074 - type: ap value: 23.187467098602013 - type: f1 value: 65.36829506379657 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 63.16355404640635 - type: f1 value: 63.534725639863346 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.91004094411276 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.55301901412649 - type: cos_sim_ap value: 75.25312618556728 - type: cos_sim_f1 value: 68.76561719140429 - type: cos_sim_precision value: 65.3061224489796 - type: cos_sim_recall value: 72.61213720316623 - type: dot_accuracy value: 86.29671574178936 - type: dot_ap value: 75.11910195501207 - type: dot_f1 value: 68.44048376830045 - type: dot_precision value: 66.12546125461255 - type: dot_recall value: 70.92348284960423 - type: euclidean_accuracy value: 86.5828217202122 - type: euclidean_ap value: 75.22986344900924 - type: euclidean_f1 value: 68.81267797449549 - type: 
euclidean_precision value: 64.8238861674831 - type: euclidean_recall value: 73.3245382585752 - type: manhattan_accuracy value: 86.61262442629791 - type: manhattan_ap value: 75.24401608557328 - type: manhattan_f1 value: 68.80473982483257 - type: manhattan_precision value: 67.21187720181177 - type: manhattan_recall value: 70.47493403693932 - type: max_accuracy value: 86.61262442629791 - type: max_ap value: 75.25312618556728 - type: max_f1 value: 68.81267797449549 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.10688089416696 - type: cos_sim_ap value: 84.17862178779863 - type: cos_sim_f1 value: 76.17305208781748 - type: cos_sim_precision value: 71.31246641590543 - type: cos_sim_recall value: 81.74468740375731 - type: dot_accuracy value: 88.1844995536927 - type: dot_ap value: 84.33816725235876 - type: dot_f1 value: 76.43554032918746 - type: dot_precision value: 74.01557767200346 - type: dot_recall value: 79.0190945488143 - type: euclidean_accuracy value: 88.07001203089223 - type: euclidean_ap value: 84.12267000814985 - type: euclidean_f1 value: 76.12232600180778 - type: euclidean_precision value: 74.50604541433205 - type: euclidean_recall value: 77.81028641823221 - type: manhattan_accuracy value: 88.06419063142779 - type: manhattan_ap value: 84.11648917164187 - type: manhattan_f1 value: 76.20579953925474 - type: manhattan_precision value: 72.56772755762935 - type: manhattan_recall value: 80.22790267939637 - type: max_accuracy value: 88.1844995536927 - type: max_ap value: 84.33816725235876 - type: max_f1 value: 76.43554032918746 --- # gte-large-en-v1.5 We introduce the gte-v1.5 series, upgraded embeddings that support a context length of up to **8192** while further enhancing model performance. The models are built upon an encoder backbone (BERT + RoPE + GLU). The series achieves state-of-the-art scores on the MTEB benchmark within the same model size category and delivers competitive performance on the LoCo long-context retrieval tests (see Evaluation). We also present gte-Qwen1.5-7B-instruct, a SOTA instruction-tuned multilingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ### Model list | Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | | gte-Qwen1.5-7B-instruct | Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | | gte-large-en-v1.5 | English | 434 | 8192 | 1024 | 65.39 | 86.71 | | gte-base-en-v1.5 | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration; refer to enable-unpadding-and-xformers.** Use with sentence-transformers or transformers (see the sketch following this card). ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths.
Then we resample the data, reducing the proportion of short texts, and continue MLM pre-training. The entire training process is as follows: - MLM-512: lr 2e-4, mlm_probability 0.3, batch_size 4096, num_steps 300000, rope_base 10000 - MLM-2048: lr 5e-5, mlm_probability 0.3, batch_size 4096, num_steps 30000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 30000, rope_base 160000 - CPT: max_len 512, lr 5e-5, batch_size 28672, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from the MTEB leaderboard. The gte evaluation setting uses an NTK scaling factor of 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 409 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5 | 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5 | 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbstractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 | 1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:", + "model_explanation_gemini": "Generates sentence embeddings for English text to perform tasks like classification, retrieval, clustering, and similarity scoring."
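The usage snippets referenced in the card above ("Use with sentence-transformers" / "Use with transformers") were stripped when the description was flattened into JSON. Below is a minimal sketch of the sentence-transformers path; the model ID `Alibaba-NLP/gte-large-en-v1.5` and the `trust_remote_code=True` flag are assumptions based on this file's naming and the card's custom-architecture backbone, not taken verbatim from the stripped snippet.

```python
# Hedged sketch, not the card's original snippet: the model ID and
# trust_remote_code flag are assumptions inferred from the surrounding card.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# The custom (non-vanilla-BERT) backbone is assumed to require remote code.
model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

sentences = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
]

# Encode to 1024-dimensional embeddings (dimension per the model list above).
embeddings = model.encode(sentences)

# Cosine similarity between the two sentence embeddings.
print(cos_sim(embeddings[0], embeddings[1]))
```

`cos_sim` here stands in for whatever similarity the stripped snippet computed; a dot product over normalized embeddings would be equivalent.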
+} \ No newline at end of file diff --git a/data/model_data_json/Alibaba-NLP_gte-multilingual-base.json b/data/model_data_json/Alibaba-NLP_gte-multilingual-base.json new file mode 100644 index 0000000000000000000000000000000000000000..1ec44af59c87bdf1a681e3fd3083f008478848ba --- /dev/null +++ b/data/model_data_json/Alibaba-NLP_gte-multilingual-base.json @@ -0,0 +1,104 @@ +{ + "model_id": "Alibaba-NLP/gte-multilingual-base", + "downloads": 1331518, + "tags": [ + "sentence-transformers", + "safetensors", + "new", + "feature-extraction", + "mteb", + "transformers", + "multilingual", + "sentence-similarity", + "custom_code", + "af", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "gl", + "gu", + "he", + "hi", + "hr", + "ht", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ky", + "lo", + "lt", + "lv", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "pa", + "pl", + "pt", + "qu", + "ro", + "ru", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "uk", + "ur", + "vi", + "yo", + "zh", + "arxiv:2407.19669", + "arxiv:2210.09984", + "arxiv:2402.03216", + "arxiv:2007.15207", + "arxiv:2104.08663", + "arxiv:2402.07440", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-transformers - transformers - multilingual - sentence-similarity license: apache-2.0 language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh model-index: - name: gte-multilingual-base (dense) results: - task: type: Clustering dataset: type: PL-MTEB/8tags-clustering name: MTEB 8TagsClustering config: default split: test revision: None metrics: - type: v_measure value: 33.66681726329994 - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: b44c3b011063adb25877c13823db83bb193913c4 metrics: - type: cos_sim_spearman value: 43.54760696384009 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 metrics: - type: cos_sim_spearman value: 48.91186363417501 - task: type: Classification dataset: type: PL-MTEB/allegro-reviews name: MTEB AllegroReviews config: default split: test revision: None metrics: - type: accuracy value: 41.689860834990064 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringP2P config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 54.20241337977897 - task: type: Clustering dataset: type: lyon-nlp/alloprof name: MTEB AlloProfClusteringS2S config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: v_measure value: 44.34083695608643 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-alloprof-s2p name: MTEB AlloprofReranking config: default split: test revision: 666fdacebe0291776e86f29345663dfaf80a0db9 metrics: - type: map value: 64.91495250072002 - task: 
type: Retrieval dataset: type: lyon-nlp/alloprof name: MTEB AlloprofRetrieval config: default split: test revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b metrics: - type: ndcg_at_10 value: 53.638 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.95522388059702 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 80.717625 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 43.64199999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.108 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.169999999999995 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 39.56799999999999 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 35.75000000000001 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 33.342000000000006 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: ndcg_at_10 value: 58.231 - task: type: Retrieval dataset: type: clarin-knext/arguana-pl name: MTEB ArguAna-PL config: default split: test revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 metrics: - type: ndcg_at_10 value: 53.166000000000004 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 46.01900557959478 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 41.06626465345723 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.87514497610431 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_spearman value: 81.21450112991194 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 metrics: - 
type: cos_sim_spearman value: 51.71589543397271 - task: type: Retrieval dataset: type: maastrichtlawtech/bsard name: MTEB BSARDRetrieval config: default split: test revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 metrics: - type: ndcg_at_10 value: 26.115 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.6169102296451 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.89603052314916 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 97.12388869645537 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: f1 value: 98.15692469720906 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.36038961038962 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.5903826674123 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 34.21474277151329 - task: type: Classification dataset: type: PL-MTEB/cbd name: MTEB CBD config: default split: test revision: None metrics: - type: accuracy value: 62.519999999999996 - task: type: PairClassification dataset: type: PL-MTEB/cdsce-pairclassification name: MTEB CDSC-E config: default split: test revision: None metrics: - type: cos_sim_ap value: 74.90132799162956 - task: type: STS dataset: type: PL-MTEB/cdscr-sts name: MTEB CDSC-R config: default split: test revision: None metrics: - type: cos_sim_spearman value: 90.30727955142524 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 metrics: - type: v_measure value: 37.94850105022274 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f metrics: - type: v_measure value: 38.11958675421534 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: 8d7f1e942507dac42dc58017c1a001c3717da7df metrics: - type: map value: 86.10950950485399 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: 23d186750531a14a0357ca22cd92d712fd512ea0 metrics: - type: map value: 87.28038294231966 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: ndcg_at_10 value: 47.099000000000004 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test 
revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: ndcg_at_10 value: 45.973000000000006 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: ndcg_at_10 value: 55.606 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: ndcg_at_10 value: 36.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: ndcg_at_10 value: 30.711 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: ndcg_at_10 value: 44.523 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: ndcg_at_10 value: 37.940000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 38.12183333333333 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: ndcg_at_10 value: 32.684000000000005 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: ndcg_at_10 value: 26.735 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: ndcg_at_10 value: 36.933 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: ndcg_at_10 value: 33.747 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: ndcg_at_10 value: 28.872999999999998 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: ndcg_at_10 value: 34.833 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 metrics: - type: ndcg_at_10 value: 43.78 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 metrics: - type: cos_sim_ap value: 84.00640599186677 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: 1271c7809071a13532e05f25fb53511ffce77117 metrics: - type: ndcg_at_10 value: 80.60000000000001 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: 
c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: ndcg_at_10 value: 40.116 - task: type: Retrieval dataset: type: clarin-knext/dbpedia-pl name: MTEB DBPedia-PL config: default split: test revision: 76afe41d9af165cc40999fcaa92312b8b012064a metrics: - type: ndcg_at_10 value: 32.498 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: a1a333e290fe30b10f3f56498e3a0d911a693ced metrics: - type: ndcg_at_10 value: 87.547 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 metrics: - type: ndcg_at_10 value: 64.85 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.949999999999996 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: ndcg_at_10 value: 92.111 - task: type: Retrieval dataset: type: clarin-knext/fiqa-pl name: MTEB FiQA-PL config: default split: test revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e metrics: - type: ndcg_at_10 value: 28.962 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: ndcg_at_10 value: 45.005 - task: type: Clustering dataset: type: lyon-nlp/clustering-hal-s2s name: MTEB HALClusteringS2S config: default split: test revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 metrics: - type: v_measure value: 25.133776435657595 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: ndcg_at_10 value: 63.036 - task: type: Retrieval dataset: type: clarin-knext/hotpotqa-pl name: MTEB HotpotQA-PL config: default split: test revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 metrics: - type: ndcg_at_10 value: 56.904999999999994 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: 421605374b29664c5fc098418fe20ada9bd55f8a metrics: - type: accuracy value: 44.59407464409388 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 74.912 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: b7c64bd89eb87f8ded463478346f76731f07bf8b metrics: - type: accuracy value: 79.26829268292683 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: 17f9b096f80380fce5ed12a9be8be7784b337daf metrics: - type: cos_sim_spearman value: 74.8601229809791 - task: type: Clustering dataset: type: mlsum name: MTEB MLSUMClusteringP2P config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 42.331902754246556 - task: type: Clustering dataset: type: mlsum name: MTEB MLSUMClusteringS2S config: default split: test revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 metrics: - type: v_measure value: 40.92029335502153 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: 
8e0c766dbe9e16e1d221116a3f36795fbade07f6 metrics: - type: map value: 32.19266316591337 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 metrics: - type: ndcg_at_10 value: 79.346 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: ndcg_at_10 value: 39.922999999999995 - task: type: Retrieval dataset: type: clarin-knext/msmarco-pl name: MTEB MSMARCO-PL config: default split: test revision: 8634c07806d5cce3a6138e260e59b81760a0a640 metrics: - type: ndcg_at_10 value: 55.620999999999995 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.53989968080255 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.26993519301212 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.87725150100067 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 87.48512370811149 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.45141627823591 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 83.45750452079565 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.57637938896488 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.50803043110736 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 71.6577718478986 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 64.05887879736925 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 65.27070634636071 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.04520795660037 - task: type: Classification dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClassification (fra) 
config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: accuracy value: 80.66350710900474 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringP2P (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 44.016506455899425 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringS2S (fra) config: fra split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 40.67730129573544 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.94552790854068 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 49.273705447209146 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.490921318090116 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.97511768661733 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.5689307330195 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 48.34902488231337 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.6684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.54539340954942 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.08675184936112 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.12508406186953 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.41425689307331 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.59515803631474 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB 
MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.90517821116342 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.91526563550774 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.198386012104905 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.04371217215869 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.31203765971756 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.521183591123055 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.06254203093476 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.01546738399461 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.27975790181574 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.79556153328849 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.18493611297915 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 47.888365837256224 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.79690652320108 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.225958305312716 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.58641560188299 - task: type: Classification 
dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.08204438466711 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.54606590450572 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.443174176193665 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.65097511768661 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.45662407531944 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.739071956960316 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.36180228648286 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.3920645595158 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.06993947545395 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.123739071956955 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.46133154001346 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.54472091459314 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.204438466711494 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.69603227975792 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy 
value: 51.684599865501 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.523873570948226 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.53396099529253 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.88298587760591 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.65097511768662 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.8453261600538 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.6247478143914 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.16274377942166 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.61667787491594 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.17283120376598 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.89912575655683 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.27975790181573 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.269670477471415 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.10423671822461 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.40753194351043 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: 
test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 55.369872225958304 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.60726294552792 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.30262273032952 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.52925353059851 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.28446536650976 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.45460659045058 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.26563550773368 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.20578345662408 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.64963012777405 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.698049764626774 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.14458641560188 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.51445864156018 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.13786146603901 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.61533288500337 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.526563550773375 - task: type: Classification 
dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.99731002017484 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.59381304640216 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.010759919300604 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 53.26160053799597 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.800941492938804 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.387357094821795 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.5359784801614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.36919973100203 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.81506388702084 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.35104236718225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.67787491593813 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.4250168123739 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.49630127774043 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.95696032279758 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 
7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.11768661735036 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.86953597848016 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.51042367182247 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.65097511768661 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.81573638197713 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.26227303295225 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.51513113651646 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.29858776059179 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.72696704774714 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.57700067249496 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.22797579018157 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.97041022192333 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.72629455279085 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.16072629455278 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.92199058507062 - task: type: Classification dataset: type: 
mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.40484196368527 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.61398789509079 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 metrics: - type: ndcg_at_10 value: 61.934999999999995 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.052031054565205 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.969909524076794 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.7530992892652 - task: type: Retrieval dataset: type: jinaai/mintakaqa name: MTEB MintakaRetrieval (fr) config: fr split: test revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e metrics: - type: ndcg_at_10 value: 34.705999999999996 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ar) config: ar split: test revision: None metrics: - type: ndcg_at_10 value: 55.166000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (de) config: de split: test revision: None metrics: - type: ndcg_at_10 value: 55.155 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (en) config: en split: test revision: None metrics: - type: ndcg_at_10 value: 50.993 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (es) config: es split: test revision: None metrics: - type: ndcg_at_10 value: 81.228 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (fr) config: fr split: test revision: None metrics: - type: ndcg_at_10 value: 76.19 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (hi) config: hi split: test revision: None metrics: - type: ndcg_at_10 value: 45.206 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (it) config: it split: test revision: None metrics: - type: ndcg_at_10 value: 66.741 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ja) config: ja split: test revision: None metrics: - type: ndcg_at_10 value: 52.111 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ko) config: ko split: test revision: None metrics: - type: ndcg_at_10 value: 46.733000000000004 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (pt) config: pt split: test revision: None metrics: - type: ndcg_at_10 value: 79.105 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (ru) config: ru split: test revision: None metrics: - type: ndcg_at_10 value: 64.21 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (th) config: th 
split: test revision: None metrics: - type: ndcg_at_10 value: 35.467 - task: type: Retrieval dataset: type: Shitao/MLDR name: MTEB MultiLongDocRetrieval (zh) config: zh split: test revision: None metrics: - type: ndcg_at_10 value: 27.419 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a metrics: - type: accuracy value: 61.02000000000001 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: ndcg_at_10 value: 36.65 - task: type: Retrieval dataset: type: clarin-knext/nfcorpus-pl name: MTEB NFCorpus-PL config: default split: test revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 metrics: - type: ndcg_at_10 value: 26.831 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: ndcg_at_10 value: 58.111000000000004 - task: type: Retrieval dataset: type: clarin-knext/nq-pl name: MTEB NQ-PL config: default split: test revision: f171245712cf85dd4700b06bef18001578d0ca8d metrics: - type: ndcg_at_10 value: 43.126999999999995 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: 66e76a618a34d6d565d5538088562851e6daa7ec metrics: - type: cos_sim_ap value: 72.67630697316041 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: e610f2ebd179a8fda30ae534c3878750a96db120 metrics: - type: accuracy value: 84.85000000000001 - task: type: PairClassification dataset: type: GEM/opusparcus name: MTEB OpusparcusPC (fr) config: fr split: test revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a metrics: - type: cos_sim_ap value: 100 - task: type: Classification dataset: type: laugustyniak/abusive-clauses-pl name: MTEB PAC config: default split: test revision: None metrics: - type: accuracy value: 65.99189110918043 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 metrics: - type: cos_sim_spearman value: 16.124364530596228 - task: type: PairClassification dataset: type: PL-MTEB/ppc-pairclassification name: MTEB PPC config: default split: test revision: None metrics: - type: cos_sim_ap value: 92.43431057460192 - task: type: PairClassification dataset: type: PL-MTEB/psc-pairclassification name: MTEB PSC config: default split: test revision: None metrics: - type: cos_sim_ap value: 99.06090138049724 - task: type: PairClassification dataset: type: paws-x name: MTEB PawsX (fr) config: fr split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_ap value: 58.9314954874314 - task: type: Classification dataset: type: PL-MTEB/polemo2_in name: MTEB PolEmo2.0-IN config: default split: test revision: None metrics: - type: accuracy value: 69.59833795013851 - task: type: Classification dataset: type: PL-MTEB/polemo2_out name: MTEB PolEmo2.0-OUT config: default split: test revision: None metrics: - type: accuracy value: 44.73684210526315 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 metrics: - type: cos_sim_spearman value: 39.36450754137984 - task: type: Retrieval dataset: type: clarin-knext/quora-pl 
name: MTEB Quora-PL config: default split: test revision: 0be27e93455051e531182b85e85e425aba12e9d4 metrics: - type: ndcg_at_10 value: 80.76299999999999 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: ndcg_at_10 value: 88.022 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 55.719165988934385 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.25390069273025 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: ndcg_at_10 value: 18.243000000000002 - task: type: Retrieval dataset: type: clarin-knext/scidocs-pl name: MTEB SCIDOCS-PL config: default split: test revision: 45452b03f05560207ef19149545f168e596c9337 metrics: - type: ndcg_at_10 value: 14.219000000000001 - task: type: PairClassification dataset: type: PL-MTEB/sicke-pl-pairclassification name: MTEB SICK-E-PL config: default split: test revision: None metrics: - type: cos_sim_ap value: 75.4022630307816 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_spearman value: 79.34269390198548 - task: type: STS dataset: type: PL-MTEB/sickr-pl-sts name: MTEB SICK-R-PL config: default split: test revision: None metrics: - type: cos_sim_spearman value: 74.0651660446132 - task: type: STS dataset: type: Lajavaness/SICK-fr name: MTEB SICKFr config: default split: test revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a metrics: - type: cos_sim_spearman value: 78.62693119733123 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_spearman value: 77.50660544631359 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_spearman value: 85.55415077723738 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_spearman value: 81.67550814479077 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_spearman value: 88.94601412322764 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_spearman value: 84.33844259337481 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 81.58650681159105 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 78.82472265884256 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: 
af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.43637938260397 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.71008299464059 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 88.88074713413747 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 76.36405640457285 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.84737910084762 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 87.03931621433031 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 84.43335591752246 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 83.85268648747021 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 82.45786516224341 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 67.20227303970304 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 60.892838305537126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 72.01876318464508 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 42.3879320510127 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 65.54048784845729 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 58.55244068334867 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 66.48710288440624 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: 
- type: cos_sim_spearman value: 66.585754901838 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 81.03001290557805 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 62.28001859884359 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 79.64106342105019 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.27915339361124 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 78.28574268257462 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 72.92658860751482 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 74.83418886368217 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 56.01064022625769 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 53.64332829635126 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: fr-pl split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_spearman value: 73.24670207647144 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 metrics: - type: cos_sim_spearman value: 80.7157790971544 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_spearman value: 86.45763616928973 - task: type: STS dataset: type: stsb_multi_mt name: MTEB STSBenchmarkMultilingualSTS (fr) config: fr split: test revision: 93d57ef91790589e3ce9c365164337a8a78b7632 metrics: - type: cos_sim_spearman value: 84.4335500335282 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 84.15276484499303 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: ndcg_at_10 value: 73.433 - task: type: Retrieval dataset: type: clarin-knext/scifact-pl name: MTEB SciFact-PL config: default split: test revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e metrics: - type: ndcg_at_10 value: 58.919999999999995 - task: type: PairClassification dataset: 
type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_ap value: 95.40564890916419 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 63.41856697730145 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 31.709285904909112 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.09341030060322 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_spearman value: 30.58262517835034 - task: type: Summarization dataset: type: lyon-nlp/summarization-summeval-fr-p2p name: MTEB SummEvalFr config: default split: test revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 metrics: - type: cos_sim_spearman value: 29.744542072951358 - task: type: Reranking dataset: type: lyon-nlp/mteb-fr-reranking-syntec-s2p name: MTEB SyntecReranking config: default split: test revision: b205c5084a0934ce8af14338bf03feb19499c84d metrics: - type: map value: 88.03333333333333 - task: type: Retrieval dataset: type: lyon-nlp/mteb-fr-retrieval-syntec-s2p name: MTEB SyntecRetrieval config: default split: test revision: 77f7e271bf4a92b24fce5119f3486b583ca016ff metrics: - type: ndcg_at_10 value: 83.043 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: 76631901a18387f85eaa53e5450019b87ad58ef9 metrics: - type: map value: 67.08577894804324 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: 8731a845f1bf500a4f111cf1070785c793d10e64 metrics: - type: ndcg_at_10 value: 84.718 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 metrics: - type: accuracy value: 48.726 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: ndcg_at_10 value: 57.56 - task: type: Retrieval dataset: type: clarin-knext/trec-covid-pl name: MTEB TRECCOVID-PL config: default split: test revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd metrics: - type: ndcg_at_10 value: 59.355999999999995 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.765 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.69942196531792 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.86585365853657 - task: type: BitextMining dataset: 
type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.81666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.75 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.78333333333335 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.72333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.45202558635395 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.59238095238095 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 35.69686411149825 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.59333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.1456922987907 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.47462133594857 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.62965440356746 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.48412698412699 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 75.85 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 27.32600866497127 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.38 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.98888712165028 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: 
MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 85.55690476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 46.68466031323174 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 32.73071428571428 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.26333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 96.61666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 70.03714285714285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.09 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 59.570476190476185 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 97.68333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 80.40880503144653 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.7008547008547 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.84833333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 71.69696969696969 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 55.76985790822269 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) 
config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.66666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 68.36668519547896 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 36.73992673992674 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.420952380952365 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.95392490046146 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.58936507936508 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.28999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.563650793650794 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.35 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.43 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.73333333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.38666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.64 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 21.257184628237262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 13.592316017316017 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test 
revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 73.22666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.711309523809526 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 24.98790634904795 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 17.19218192918193 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.26666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.57333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.35127206127206 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 51.12318903318903 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 23.856320290390055 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 79.52833333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 95.93333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.75333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 30.802919708029197 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 15.984076294076294 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.82666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: 
test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 76.36054421768706 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.232711399711398 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 45.640803181175855 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 86.29 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.90833333333332 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 11.11880248978075 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 48.45839345839346 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 65.68157033805888 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 94.63852498786997 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 81.67904761904761 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.35969868173258 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 5.957229437229437 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 91.50333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 63.75498778998778 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 82.99190476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.95 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 9.054042624042623 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 72.77064981488574 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.14 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 29.976786498525627 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.6525821596244 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 33.12964812964813 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 92.30666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 34.36077879427633 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 52.571845212690285 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 58.13107263107262 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 93.33333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 42.87370133925458 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 20.394327616827614 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.29967426710098 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.80666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.23062271062273 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 78.08398950131233 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.85166666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 67.63004001231148 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 89.77000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 40.2654503616042 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 83.90333333333334 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 77.80666666666666 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 84.08 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 60.43098607367475 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.19333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 90.55352798053529 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: f1 value: 88.44999999999999 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: 5798586b105c0434e4f0fe5e767abe619442cf93 metrics: - type: v_measure value: 57.25416429643288 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d metrics: - type: v_measure value: 56.616646560243524 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: ndcg_at_10 value: 22.819 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: 
d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.02579999999999 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 57.60045274476514 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.346666699466205 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_ap value: 71.88199004440489 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_ap value: 85.41587779677383 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 metrics: - type: ndcg_at_10 value: 72.792 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: 339287def212450dcaa9df8c22bf93e9980c7023 metrics: - type: accuracy value: 82.58000000000001 - task: type: Retrieval dataset: type: jinaai/xpqa name: MTEB XPQARetrieval (fr) config: fr split: test revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f metrics: - type: ndcg_at_10 value: 67.327 --- ## gte-multilingual-base The **gte-multilingual-base** model is the latest in the GTE (General Text Embedding) family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to models of similar size. - **Training Architecture**: Trained using an encoder-only transformers architecture, resulting in a smaller model size. Unlike previous models based on a decoder-only LLM architecture (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. - **Elastic Dense Embedding**: Supports elastic dense output representations while maintaining downstream-task effectiveness, which significantly reduces storage costs and improves execution efficiency. - **Sparse Vectors**: In addition to dense representations, it can also generate sparse vectors. **Paper**: mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ## Model Information - Model Size: 305M - Embedding Dimension: 768 - Max Input Tokens: 8192 ## Usage - **It is recommended to install xformers and enable unpadding for acceleration; refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** - **How to use with TEI: refs/pr/7** ### Get Dense Embeddings with Transformers (see the sketch below) ### Use with sentence-transformers ### Use with infinity Usage via Docker and infinity (MIT licensed).
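The card's original code snippets did not survive this dump, so here is a minimal sketch of the "Get Dense Embeddings with Transformers" usage. The CLS-token pooling, the `trust_remote_code=True` flag, and the truncate-then-normalize step for the elastic dimensions are assumptions about the model's remote code, not a verified copy of the official snippet.

```python
# Minimal sketch: dense embeddings from Alibaba-NLP/gte-multilingual-base.
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "Alibaba-NLP/gte-multilingual-base"
texts = ["what is the capital of China?", "how to implement quick sort in python?"]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

batch = tokenizer(texts, max_length=8192, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch)

# CLS pooling, then truncation to an "elastic" dimension (<= 768) and L2 normalization.
dim = 768
embeddings = outputs.last_hidden_state[:, 0][:, :dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
similarity = embeddings @ embeddings.T  # cosine similarity after normalization
```

Truncating before normalizing is what makes the elastic dimensions cheap to use: a shorter prefix of the same vector stays serviceable for retrieval at a fraction of the storage cost.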
### Use with custom code to get dense embeddings and sparse token weights ## Evaluation We validated the performance of the **gte-multilingual-base** model on multiple downstream tasks, including multilingual retrieval, cross-lingual retrieval, long text retrieval, and general text representation evaluation on the MTEB Leaderboard, among others. ### Retrieval Task Retrieval results on MIRACL and MLDR (multilingual), MKQA (cross-lingual), BEIR and LoCo (English); results are shown as figures in the original card. - Detail results on MLDR - Detail results on LoCo ### MTEB Results on MTEB English, Chinese, French, and Polish **More detailed experimental results can be found in the paper**. ## Cloud API Services In addition to the open-source releases, the GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider citing:", + "model_explanation_gemini": "Multilingual sentence embedding model for tasks like sentence similarity, clustering, classification, and retrieval across numerous languages." +} \ No newline at end of file diff --git a/data/model_data_json/Alibaba-NLP_gte-multilingual-reranker-base.json b/data/model_data_json/Alibaba-NLP_gte-multilingual-reranker-base.json new file mode 100644 index 0000000000000000000000000000000000000000..b00ec88adfa2fd40cd278aa52a299f91edf1d9d0 --- /dev/null +++ b/data/model_data_json/Alibaba-NLP_gte-multilingual-reranker-base.json @@ -0,0 +1,94 @@ +{ + "model_id": "Alibaba-NLP/gte-multilingual-reranker-base", + "downloads": 228024, + "tags": [ + "sentence-transformers", + "safetensors", + "new", + "text-classification", + "transformers", + "text-embeddings-inference", + "text-ranking", + "custom_code", + "af", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "gl", + "gu", + "he", + "hi", + "hr", + "ht", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ky", + "lo", + "lt", + "lv", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "pa", + "pl", + "pt", + "qu", + "ro", + "ru", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "uk", + "ur", + "vi", + "yo", + "zh", + "arxiv:2407.19669", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: text-ranking tags: - transformers - sentence-transformers - text-embeddings-inference language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - 'no' - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh library_name: sentence-transformers --- ## gte-multilingual-reranker-base The **gte-multilingual-reranker-base** model is the first reranker model in the GTE family of models, featuring several key attributes: - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task
representation model evaluations when compared to reranker models of similar size. - **Training Architecture**: Trained using an encoder-only transformers architecture, resulting in a smaller model size. Unlike previous models based on a decoder-only LLM architecture (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference, offering a 10x increase in inference speed. - **Long Context**: Supports text lengths up to **8192** tokens. - **Multilingual Capability**: Supports over **70** languages. ## Model Information - Model Size: 306M - Max Input Tokens: 8192 ### Usage - **It is recommended to install xformers and enable unpadding for acceleration; refer to enable-unpadding-and-xformers.** - **How to use it offline: new-impl/discussions/2** Using Huggingface transformers (transformers>=4.36.0) is sketched below. Usage with infinity: Infinity, an MIT-licensed inference REST API server.
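The reranker card's snippet was also stripped from this dump; the following is a minimal sketch of the transformers (>=4.36.0) usage named above. Loading through `AutoModelForSequenceClassification` with `trust_remote_code=True` and fp16 weights mirrors how comparable rerankers are typically served and is an assumption here, not a verified copy of the official snippet.

```python
# Minimal sketch: score (query, document) pairs with the multilingual reranker.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Alibaba-NLP/gte-multilingual-reranker-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.float16
)
model.eval()

pairs = [
    ["what is the capital of China?", "Beijing is the capital of China."],
    ["what is the capital of China?", "Quicksort runs in O(n log n) on average."],
]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, max_length=512, return_tensors="pt")
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
print(scores)  # one relevance score per pair; higher means more relevant
```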
## Evaluation Results of reranking on multiple text retrieval datasets (results figure in the original card). **More detailed experimental results can be found in the paper**. ## Cloud API Services In addition to the open-source releases, the GTE series models are also available as commercial API services on Alibaba Cloud. - Embedding Models: Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service. - ReRank Models: The gte-rerank model service is available. Note that the models behind the commercial APIs are not entirely identical to the open-source models. ## Citation If you find our paper or models helpful, please consider citing:", + "model_explanation_gemini": "Multilingual text reranking model supporting 70+ languages with high performance, long-context handling (8192 tokens), and efficient inference for retrieval tasks." +} \ No newline at end of file diff --git a/data/model_data_json/AutonLab_MOMENT-1-large.json b/data/model_data_json/AutonLab_MOMENT-1-large.json new file mode 100644 index 0000000000000000000000000000000000000000..8d2132c8468cd74330ad0b767f8dc2e04cb6ad20 --- /dev/null +++ b/data/model_data_json/AutonLab_MOMENT-1-large.json @@ -0,0 +1,25 @@ +{ + "model_id": "AutonLab/MOMENT-1-large", + "downloads": 147972, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "time series", + "forecasting", + "classification", + "anomaly detection", + "imputation", + "pretrained models", + "foundation models", + "time-series", + "time-series-forecasting", + "dataset:AutonLab/Timeseries-PILE", + "arxiv:2402.03885", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit datasets: - AutonLab/Timeseries-PILE metrics: - accuracy - mse - mae - f1 tags: - time series - forecasting - classification - anomaly detection - imputation - transformers - pretrained models - foundation models - time-series pipeline_tag: time-series-forecasting --- # MOMENT-Large MOMENT is a family of foundation models for general-purpose time-series analysis. The models in this family (1) serve as a building block for diverse **time-series analysis tasks** (e.g., forecasting, classification, anomaly detection, and imputation, etc.), (2) are effective **out-of-the-box**, i.e., with no (or few) task-specific exemplars (enabling e.g., zero-shot forecasting, few-shot classification, etc.), and (3) are **tunable** using in-distribution and task-specific data to improve performance. For details on MOMENT models, training data, and experimental results, please refer to the paper MOMENT: A Family of Open Time-series Foundation Models. MOMENT-1 comes in 3 sizes: Small, Base, and Large. # Usage **Recommended Python Version:** Python 3.11 (support for additional versions is expected soon). You can install the package using pip: Alternatively, to install the latest version directly from the GitHub repository: To load the pre-trained model for one of the tasks, use one of the following code snippets (a forecasting sketch is appended at the end of this entry): **Forecasting** **Classification** **Anomaly Detection, Imputation, and Pre-training** **Representation Learning** ### Tutorials Here is the list of tutorials and reproducible experiments to get started with MOMENT for various tasks: - Forecasting - Classification - Anomaly Detection - Imputation - Representation Learning - Real-world Electrocardiogram (ECG) Case Study -- This tutorial also shows how to fine-tune MOMENT for a real-world ECG classification problem, covering training and inference on multiple GPUs and parameter-efficient fine-tuning (PEFT). ## Model Details ### Model Description - **Developed by:** Auton Lab, Carnegie Mellon University - **Model type:** Time-series Foundation Model - **License:** MIT License ### Model Sources - **Repository:** (Pre-training and research code coming out soon!) - **Paper:** - **Demo:** ## Environmental Impact We train multiple models over many days resulting in significant energy usage and a sizeable carbon footprint. However, we hope that releasing our models will ensure that future time-series modeling efforts are quicker and more efficient, resulting in lower carbon emissions. We use the Total Graphics Power (TGP) to calculate the total power consumed for training MOMENT models, although the total power consumed by the GPU will likely vary a little based on the GPU utilization while training our model. Our calculations do not account for power demands from other sources of our compute. We use 336.566 kg CO2/MWh as the standard value of CO2 emission per megawatt hour of energy consumed for Pittsburgh. - **Hardware Type:** NVIDIA RTX A6000 GPU - **GPU Hours:** 404 - **Compute Region:** Pittsburgh, USA - **Carbon Emission (tCO2eq):** #### Hardware All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU (without any data or model parallelism). ## Citation **BibTeX:** If you use MOMENT, please cite our paper: **APA:** Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning. PMLR.", + "model_explanation_gemini": "A foundation model for general-purpose time-series analysis, handling tasks like forecasting, classification, anomaly detection, and imputation with out-of-the-box effectiveness and tunability."
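The install and per-task snippets referenced in the Usage section above did not survive this dump. Below is a minimal loading sketch assuming the `momentfm` package from the project's GitHub repository (`pip install momentfm`); the `task_name`/`forecast_horizon` kwargs and the `init()` call follow my recollection of the repo's `MOMENTPipeline` interface, so treat it as a sketch rather than the official snippet.

```python
# Minimal sketch: load MOMENT-1-large for forecasting with the momentfm package.
from momentfm import MOMENTPipeline

model = MOMENTPipeline.from_pretrained(
    "AutonLab/MOMENT-1-large",
    model_kwargs={
        "task_name": "forecasting",   # assumed alternatives: "classification", "reconstruction", "embedding"
        "forecast_horizon": 96,       # illustrative horizon; choose per dataset
    },
)
model.init()  # materializes the task-specific head before fine-tuning or inference
```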
+} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-base-en-v1.5.json b/data/model_data_json/BAAI_bge-base-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..b4859993baebdaf0984ed90f74cbae910ed1b46c --- /dev/null +++ b/data/model_data_json/BAAI_bge-base-en-v1.5.json @@ -0,0 +1,29 @@ +{ + "model_id": "BAAI/bge-base-en-v1.5", + "downloads": 2100857, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "mteb", + "en", + "arxiv:2401.03462", + "arxiv:2312.15503", + "arxiv:2311.13534", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.14925373134328 - type: ap value: 39.32336517995478 - type: f1 value: 70.16902252611425 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.386825 - type: ap value: 90.21276917991995 - type: f1 value: 93.37741030006174 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.846000000000004 - type: f1 value: 48.14646269778261 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 40.754000000000005 - type: map_at_10 value: 55.761 - type: map_at_100 value: 56.330999999999996 - type: map_at_1000 value: 56.333999999999996 - type: map_at_3 value: 51.92 - type: map_at_5 value: 54.010999999999996 - type: mrr_at_1 value: 41.181 - type: mrr_at_10 value: 55.967999999999996 - type: mrr_at_100 value: 56.538 - type: mrr_at_1000 value: 56.542 - type: mrr_at_3 value: 51.980000000000004 - type: mrr_at_5 value: 54.208999999999996 - type: ndcg_at_1 value: 40.754000000000005 - type: ndcg_at_10 value: 63.605000000000004 - type: ndcg_at_100 value: 66.05199999999999 - type: ndcg_at_1000 value: 66.12 - type: ndcg_at_3 value: 55.708 - type: ndcg_at_5 value: 59.452000000000005 - type: precision_at_1 value: 40.754000000000005 - type: precision_at_10 value: 8.841000000000001 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.238 - type: precision_at_5 value: 15.149000000000001 - type: recall_at_1 value: 40.754000000000005 - type: recall_at_10 value: 88.407 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.714 - type: recall_at_5 value: 75.747 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.74884539679369 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test 
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.8075893810716 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.128470519187736 - type: mrr value: 74.28065778481289 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.24629081484655 - type: cos_sim_spearman value: 86.93752309911496 - type: euclidean_pearson value: 87.58589628573816 - type: euclidean_spearman value: 88.05622328825284 - type: manhattan_pearson value: 87.5594959805773 - type: manhattan_spearman value: 88.19658793233961 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.9512987012987 - type: f1 value: 86.92515357973708 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.10263762928872 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.69711517426737 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.327 - type: map_at_10 value: 44.099 - type: map_at_100 value: 45.525 - type: map_at_1000 value: 45.641999999999996 - type: map_at_3 value: 40.47 - type: map_at_5 value: 42.36 - type: mrr_at_1 value: 39.199 - type: mrr_at_10 value: 49.651 - type: mrr_at_100 value: 50.29 - type: mrr_at_1000 value: 50.329 - type: mrr_at_3 value: 46.924 - type: mrr_at_5 value: 48.548 - type: ndcg_at_1 value: 39.199 - type: ndcg_at_10 value: 50.773 - type: ndcg_at_100 value: 55.67999999999999 - type: ndcg_at_1000 value: 57.495 - type: ndcg_at_3 value: 45.513999999999996 - type: ndcg_at_5 value: 47.703 - type: precision_at_1 value: 39.199 - type: precision_at_10 value: 9.914000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.198 - type: precision_at_3 value: 21.984 - type: precision_at_5 value: 15.737000000000002 - type: recall_at_1 value: 32.327 - type: recall_at_10 value: 63.743 - type: recall_at_100 value: 84.538 - type: recall_at_1000 value: 96.089 - type: recall_at_3 value: 48.065000000000005 - type: recall_at_5 value: 54.519 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.671 - type: map_at_10 value: 42.954 - type: map_at_100 value: 44.151 - type: map_at_1000 value: 44.287 - type: map_at_3 value: 39.912 - type: map_at_5 value: 41.798 - type: mrr_at_1 value: 41.465 - type: mrr_at_10 value: 49.351 - type: mrr_at_100 value: 49.980000000000004 - type: mrr_at_1000 value: 50.016000000000005 - type: mrr_at_3 value: 47.144000000000005 - type: mrr_at_5 value: 48.592999999999996 - type: ndcg_at_1 value: 41.465 - type: ndcg_at_10 value: 48.565999999999995 - type: ndcg_at_100 value: 52.76499999999999 - type: ndcg_at_1000 value: 54.749 - type: ndcg_at_3 
value: 44.57 - type: ndcg_at_5 value: 46.759 - type: precision_at_1 value: 41.465 - type: precision_at_10 value: 9.107999999999999 - type: precision_at_100 value: 1.433 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 21.423000000000002 - type: precision_at_5 value: 15.414 - type: recall_at_1 value: 32.671 - type: recall_at_10 value: 57.738 - type: recall_at_100 value: 75.86500000000001 - type: recall_at_1000 value: 88.36 - type: recall_at_3 value: 45.626 - type: recall_at_5 value: 51.812000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 41.185 - type: map_at_10 value: 53.929 - type: map_at_100 value: 54.92 - type: map_at_1000 value: 54.967999999999996 - type: map_at_3 value: 50.70400000000001 - type: map_at_5 value: 52.673 - type: mrr_at_1 value: 47.398 - type: mrr_at_10 value: 57.303000000000004 - type: mrr_at_100 value: 57.959 - type: mrr_at_1000 value: 57.985 - type: mrr_at_3 value: 54.932 - type: mrr_at_5 value: 56.464999999999996 - type: ndcg_at_1 value: 47.398 - type: ndcg_at_10 value: 59.653 - type: ndcg_at_100 value: 63.627 - type: ndcg_at_1000 value: 64.596 - type: ndcg_at_3 value: 54.455 - type: ndcg_at_5 value: 57.245000000000005 - type: precision_at_1 value: 47.398 - type: precision_at_10 value: 9.524000000000001 - type: precision_at_100 value: 1.243 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.389 - type: precision_at_5 value: 16.752 - type: recall_at_1 value: 41.185 - type: recall_at_10 value: 73.193 - type: recall_at_100 value: 90.357 - type: recall_at_1000 value: 97.253 - type: recall_at_3 value: 59.199999999999996 - type: recall_at_5 value: 66.118 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.27 - type: map_at_10 value: 36.223 - type: map_at_100 value: 37.218 - type: map_at_1000 value: 37.293 - type: map_at_3 value: 33.503 - type: map_at_5 value: 35.097 - type: mrr_at_1 value: 29.492 - type: mrr_at_10 value: 38.352000000000004 - type: mrr_at_100 value: 39.188 - type: mrr_at_1000 value: 39.247 - type: mrr_at_3 value: 35.876000000000005 - type: mrr_at_5 value: 37.401 - type: ndcg_at_1 value: 29.492 - type: ndcg_at_10 value: 41.239 - type: ndcg_at_100 value: 46.066 - type: ndcg_at_1000 value: 47.992000000000004 - type: ndcg_at_3 value: 36.11 - type: ndcg_at_5 value: 38.772 - type: precision_at_1 value: 29.492 - type: precision_at_10 value: 6.260000000000001 - type: precision_at_100 value: 0.914 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 15.104000000000001 - type: precision_at_5 value: 10.644 - type: recall_at_1 value: 27.27 - type: recall_at_10 value: 54.589 - type: recall_at_100 value: 76.70700000000001 - type: recall_at_1000 value: 91.158 - type: recall_at_3 value: 40.974 - type: recall_at_5 value: 47.327000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.848 - type: map_at_10 value: 26.207 - type: map_at_100 value: 27.478 - type: map_at_1000 value: 27.602 - type: map_at_3 value: 23.405 - type: map_at_5 value: 24.98 - type: mrr_at_1 value: 21.891 - type: mrr_at_10 value: 31.041999999999998 - type: mrr_at_100 value: 32.092 - type: mrr_at_1000 value: 32.151999999999994 - type: mrr_at_3 
value: 28.358 - type: mrr_at_5 value: 29.969 - type: ndcg_at_1 value: 21.891 - type: ndcg_at_10 value: 31.585 - type: ndcg_at_100 value: 37.531 - type: ndcg_at_1000 value: 40.256 - type: ndcg_at_3 value: 26.508 - type: ndcg_at_5 value: 28.894 - type: precision_at_1 value: 21.891 - type: precision_at_10 value: 5.795999999999999 - type: precision_at_100 value: 0.9990000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.769 - type: precision_at_5 value: 9.279 - type: recall_at_1 value: 17.848 - type: recall_at_10 value: 43.452 - type: recall_at_100 value: 69.216 - type: recall_at_1000 value: 88.102 - type: recall_at_3 value: 29.18 - type: recall_at_5 value: 35.347 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.94 - type: map_at_10 value: 41.248000000000005 - type: map_at_100 value: 42.495 - type: map_at_1000 value: 42.602000000000004 - type: map_at_3 value: 37.939 - type: map_at_5 value: 39.924 - type: mrr_at_1 value: 37.824999999999996 - type: mrr_at_10 value: 47.041 - type: mrr_at_100 value: 47.83 - type: mrr_at_1000 value: 47.878 - type: mrr_at_3 value: 44.466 - type: mrr_at_5 value: 46.111999999999995 - type: ndcg_at_1 value: 37.824999999999996 - type: ndcg_at_10 value: 47.223 - type: ndcg_at_100 value: 52.394 - type: ndcg_at_1000 value: 54.432 - type: ndcg_at_3 value: 42.032000000000004 - type: ndcg_at_5 value: 44.772 - type: precision_at_1 value: 37.824999999999996 - type: precision_at_10 value: 8.393 - type: precision_at_100 value: 1.2890000000000001 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 19.698 - type: precision_at_5 value: 14.013 - type: recall_at_1 value: 30.94 - type: recall_at_10 value: 59.316 - type: recall_at_100 value: 80.783 - type: recall_at_1000 value: 94.15400000000001 - type: recall_at_3 value: 44.712 - type: recall_at_5 value: 51.932 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.104 - type: map_at_10 value: 36.675999999999995 - type: map_at_100 value: 38.076 - type: map_at_1000 value: 38.189 - type: map_at_3 value: 33.733999999999995 - type: map_at_5 value: 35.287 - type: mrr_at_1 value: 33.904 - type: mrr_at_10 value: 42.55 - type: mrr_at_100 value: 43.434 - type: mrr_at_1000 value: 43.494 - type: mrr_at_3 value: 40.126 - type: mrr_at_5 value: 41.473 - type: ndcg_at_1 value: 33.904 - type: ndcg_at_10 value: 42.414 - type: ndcg_at_100 value: 48.203 - type: ndcg_at_1000 value: 50.437 - type: ndcg_at_3 value: 37.633 - type: ndcg_at_5 value: 39.67 - type: precision_at_1 value: 33.904 - type: precision_at_10 value: 7.82 - type: precision_at_100 value: 1.2409999999999999 - type: precision_at_1000 value: 0.159 - type: precision_at_3 value: 17.884 - type: precision_at_5 value: 12.648000000000001 - type: recall_at_1 value: 27.104 - type: recall_at_10 value: 53.563 - type: recall_at_100 value: 78.557 - type: recall_at_1000 value: 93.533 - type: recall_at_3 value: 39.92 - type: recall_at_5 value: 45.457 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.707749999999997 - type: map_at_10 value: 36.961 - type: map_at_100 value: 38.158833333333334 - type: map_at_1000 value: 38.270333333333326 - type: map_at_3 value: 34.07183333333334 - type: 
map_at_5 value: 35.69533333333334 - type: mrr_at_1 value: 32.81875 - type: mrr_at_10 value: 41.293 - type: mrr_at_100 value: 42.116499999999995 - type: mrr_at_1000 value: 42.170249999999996 - type: mrr_at_3 value: 38.83983333333333 - type: mrr_at_5 value: 40.29775 - type: ndcg_at_1 value: 32.81875 - type: ndcg_at_10 value: 42.355 - type: ndcg_at_100 value: 47.41374999999999 - type: ndcg_at_1000 value: 49.5805 - type: ndcg_at_3 value: 37.52825 - type: ndcg_at_5 value: 39.83266666666667 - type: precision_at_1 value: 32.81875 - type: precision_at_10 value: 7.382416666666666 - type: precision_at_100 value: 1.1640833333333334 - type: precision_at_1000 value: 0.15383333333333335 - type: precision_at_3 value: 17.134166666666665 - type: precision_at_5 value: 12.174833333333336 - type: recall_at_1 value: 27.707749999999997 - type: recall_at_10 value: 53.945 - type: recall_at_100 value: 76.191 - type: recall_at_1000 value: 91.101 - type: recall_at_3 value: 40.39083333333334 - type: recall_at_5 value: 46.40083333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.482 - type: map_at_10 value: 33.201 - type: map_at_100 value: 34.107 - type: map_at_1000 value: 34.197 - type: map_at_3 value: 31.174000000000003 - type: map_at_5 value: 32.279 - type: mrr_at_1 value: 29.908 - type: mrr_at_10 value: 36.235 - type: mrr_at_100 value: 37.04 - type: mrr_at_1000 value: 37.105 - type: mrr_at_3 value: 34.355999999999995 - type: mrr_at_5 value: 35.382999999999996 - type: ndcg_at_1 value: 29.908 - type: ndcg_at_10 value: 37.325 - type: ndcg_at_100 value: 41.795 - type: ndcg_at_1000 value: 44.105 - type: ndcg_at_3 value: 33.555 - type: ndcg_at_5 value: 35.266999999999996 - type: precision_at_1 value: 29.908 - type: precision_at_10 value: 5.721 - type: precision_at_100 value: 0.8630000000000001 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 14.008000000000001 - type: precision_at_5 value: 9.754999999999999 - type: recall_at_1 value: 26.482 - type: recall_at_10 value: 47.072 - type: recall_at_100 value: 67.27 - type: recall_at_1000 value: 84.371 - type: recall_at_3 value: 36.65 - type: recall_at_5 value: 40.774 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.815 - type: map_at_10 value: 26.369999999999997 - type: map_at_100 value: 27.458 - type: map_at_1000 value: 27.588 - type: map_at_3 value: 23.990000000000002 - type: map_at_5 value: 25.345000000000002 - type: mrr_at_1 value: 22.953000000000003 - type: mrr_at_10 value: 30.342999999999996 - type: mrr_at_100 value: 31.241000000000003 - type: mrr_at_1000 value: 31.319000000000003 - type: mrr_at_3 value: 28.16 - type: mrr_at_5 value: 29.406 - type: ndcg_at_1 value: 22.953000000000003 - type: ndcg_at_10 value: 31.151 - type: ndcg_at_100 value: 36.309000000000005 - type: ndcg_at_1000 value: 39.227000000000004 - type: ndcg_at_3 value: 26.921 - type: ndcg_at_5 value: 28.938000000000002 - type: precision_at_1 value: 22.953000000000003 - type: precision_at_10 value: 5.602 - type: precision_at_100 value: 0.9530000000000001 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 12.606 - type: precision_at_5 value: 9.119 - type: recall_at_1 value: 18.815 - type: recall_at_10 value: 41.574 - type: recall_at_100 value: 64.84400000000001 - type: recall_at_1000 value: 85.406 - 
type: recall_at_3 value: 29.694 - type: recall_at_5 value: 34.935 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.840999999999998 - type: map_at_10 value: 36.797999999999995 - type: map_at_100 value: 37.993 - type: map_at_1000 value: 38.086999999999996 - type: map_at_3 value: 34.050999999999995 - type: map_at_5 value: 35.379 - type: mrr_at_1 value: 32.649 - type: mrr_at_10 value: 41.025 - type: mrr_at_100 value: 41.878 - type: mrr_at_1000 value: 41.929 - type: mrr_at_3 value: 38.573 - type: mrr_at_5 value: 39.715 - type: ndcg_at_1 value: 32.649 - type: ndcg_at_10 value: 42.142 - type: ndcg_at_100 value: 47.558 - type: ndcg_at_1000 value: 49.643 - type: ndcg_at_3 value: 37.12 - type: ndcg_at_5 value: 38.983000000000004 - type: precision_at_1 value: 32.649 - type: precision_at_10 value: 7.08 - type: precision_at_100 value: 1.1039999999999999 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 16.698 - type: precision_at_5 value: 11.511000000000001 - type: recall_at_1 value: 27.840999999999998 - type: recall_at_10 value: 54.245 - type: recall_at_100 value: 77.947 - type: recall_at_1000 value: 92.36999999999999 - type: recall_at_3 value: 40.146 - type: recall_at_5 value: 44.951 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.529000000000003 - type: map_at_10 value: 35.010000000000005 - type: map_at_100 value: 36.647 - type: map_at_1000 value: 36.857 - type: map_at_3 value: 31.968000000000004 - type: map_at_5 value: 33.554 - type: mrr_at_1 value: 31.818 - type: mrr_at_10 value: 39.550999999999995 - type: mrr_at_100 value: 40.54 - type: mrr_at_1000 value: 40.596 - type: mrr_at_3 value: 36.726 - type: mrr_at_5 value: 38.416 - type: ndcg_at_1 value: 31.818 - type: ndcg_at_10 value: 40.675 - type: ndcg_at_100 value: 46.548 - type: ndcg_at_1000 value: 49.126 - type: ndcg_at_3 value: 35.829 - type: ndcg_at_5 value: 38.0 - type: precision_at_1 value: 31.818 - type: precision_at_10 value: 7.826 - type: precision_at_100 value: 1.538 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 16.601 - type: precision_at_5 value: 12.095 - type: recall_at_1 value: 26.529000000000003 - type: recall_at_10 value: 51.03 - type: recall_at_100 value: 77.556 - type: recall_at_1000 value: 93.804 - type: recall_at_3 value: 36.986000000000004 - type: recall_at_5 value: 43.096000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.480999999999998 - type: map_at_10 value: 30.817 - type: map_at_100 value: 31.838 - type: map_at_1000 value: 31.932 - type: map_at_3 value: 28.011999999999997 - type: map_at_5 value: 29.668 - type: mrr_at_1 value: 25.323 - type: mrr_at_10 value: 33.072 - type: mrr_at_100 value: 33.926 - type: mrr_at_1000 value: 33.993 - type: mrr_at_3 value: 30.436999999999998 - type: mrr_at_5 value: 32.092 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.514 - type: ndcg_at_100 value: 40.489000000000004 - type: ndcg_at_1000 value: 42.908 - type: ndcg_at_3 value: 30.092000000000002 - type: ndcg_at_5 value: 32.989000000000004 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.545 - type: precision_at_100 value: 0.861 - type: precision_at_1000 value: 0.117 - type: 
precision_at_3 value: 12.446 - type: precision_at_5 value: 9.131 - type: recall_at_1 value: 23.480999999999998 - type: recall_at_10 value: 47.825 - type: recall_at_100 value: 70.652 - type: recall_at_1000 value: 88.612 - type: recall_at_3 value: 33.537 - type: recall_at_5 value: 40.542 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.333999999999998 - type: map_at_10 value: 22.524 - type: map_at_100 value: 24.506 - type: map_at_1000 value: 24.715 - type: map_at_3 value: 19.022 - type: map_at_5 value: 20.693 - type: mrr_at_1 value: 29.186 - type: mrr_at_10 value: 41.22 - type: mrr_at_100 value: 42.16 - type: mrr_at_1000 value: 42.192 - type: mrr_at_3 value: 38.013000000000005 - type: mrr_at_5 value: 39.704 - type: ndcg_at_1 value: 29.186 - type: ndcg_at_10 value: 31.167 - type: ndcg_at_100 value: 38.879000000000005 - type: ndcg_at_1000 value: 42.376000000000005 - type: ndcg_at_3 value: 25.817 - type: ndcg_at_5 value: 27.377000000000002 - type: precision_at_1 value: 29.186 - type: precision_at_10 value: 9.693999999999999 - type: precision_at_100 value: 1.8030000000000002 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 19.11 - type: precision_at_5 value: 14.344999999999999 - type: recall_at_1 value: 13.333999999999998 - type: recall_at_10 value: 37.092000000000006 - type: recall_at_100 value: 63.651 - type: recall_at_1000 value: 83.05 - type: recall_at_3 value: 23.74 - type: recall_at_5 value: 28.655 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.151 - type: map_at_10 value: 19.653000000000002 - type: map_at_100 value: 28.053 - type: map_at_1000 value: 29.709000000000003 - type: map_at_3 value: 14.191 - type: map_at_5 value: 16.456 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.4 - type: mrr_at_100 value: 74.715 - type: mrr_at_1000 value: 74.726 - type: mrr_at_3 value: 72.417 - type: mrr_at_5 value: 73.667 - type: ndcg_at_1 value: 54.25 - type: ndcg_at_10 value: 40.77 - type: ndcg_at_100 value: 46.359 - type: ndcg_at_1000 value: 54.193000000000005 - type: ndcg_at_3 value: 44.832 - type: ndcg_at_5 value: 42.63 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 32.175 - type: precision_at_100 value: 10.668 - type: precision_at_1000 value: 2.067 - type: precision_at_3 value: 47.667 - type: precision_at_5 value: 41.3 - type: recall_at_1 value: 9.151 - type: recall_at_10 value: 25.003999999999998 - type: recall_at_100 value: 52.976 - type: recall_at_1000 value: 78.315 - type: recall_at_3 value: 15.487 - type: recall_at_5 value: 18.999 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.89999999999999 - type: f1 value: 46.47777925067403 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 73.706 - type: map_at_10 value: 82.423 - type: map_at_100 value: 82.67999999999999 - type: map_at_1000 value: 82.694 - type: map_at_3 value: 81.328 - type: map_at_5 value: 82.001 - type: mrr_at_1 value: 79.613 - type: mrr_at_10 value: 87.07000000000001 - type: mrr_at_100 value: 87.169 - type: mrr_at_1000 value: 87.17 - type: mrr_at_3 value: 86.404 - type: mrr_at_5 value: 86.856 - type: ndcg_at_1 value: 79.613 - type: 
ndcg_at_10 value: 86.289 - type: ndcg_at_100 value: 87.201 - type: ndcg_at_1000 value: 87.428 - type: ndcg_at_3 value: 84.625 - type: ndcg_at_5 value: 85.53699999999999 - type: precision_at_1 value: 79.613 - type: precision_at_10 value: 10.399 - type: precision_at_100 value: 1.1079999999999999 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.473 - type: precision_at_5 value: 20.132 - type: recall_at_1 value: 73.706 - type: recall_at_10 value: 93.559 - type: recall_at_100 value: 97.188 - type: recall_at_1000 value: 98.555 - type: recall_at_3 value: 88.98700000000001 - type: recall_at_5 value: 91.373 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 19.841 - type: map_at_10 value: 32.643 - type: map_at_100 value: 34.575 - type: map_at_1000 value: 34.736 - type: map_at_3 value: 28.317999999999998 - type: map_at_5 value: 30.964000000000002 - type: mrr_at_1 value: 39.660000000000004 - type: mrr_at_10 value: 48.620000000000005 - type: mrr_at_100 value: 49.384 - type: mrr_at_1000 value: 49.415 - type: mrr_at_3 value: 45.988 - type: mrr_at_5 value: 47.361 - type: ndcg_at_1 value: 39.660000000000004 - type: ndcg_at_10 value: 40.646 - type: ndcg_at_100 value: 47.657 - type: ndcg_at_1000 value: 50.428 - type: ndcg_at_3 value: 36.689 - type: ndcg_at_5 value: 38.211 - type: precision_at_1 value: 39.660000000000004 - type: precision_at_10 value: 11.235000000000001 - type: precision_at_100 value: 1.8530000000000002 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 24.587999999999997 - type: precision_at_5 value: 18.395 - type: recall_at_1 value: 19.841 - type: recall_at_10 value: 48.135 - type: recall_at_100 value: 74.224 - type: recall_at_1000 value: 90.826 - type: recall_at_3 value: 33.536 - type: recall_at_5 value: 40.311 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.358 - type: map_at_10 value: 64.497 - type: map_at_100 value: 65.362 - type: map_at_1000 value: 65.41900000000001 - type: map_at_3 value: 61.06700000000001 - type: map_at_5 value: 63.317 - type: mrr_at_1 value: 80.716 - type: mrr_at_10 value: 86.10799999999999 - type: mrr_at_100 value: 86.265 - type: mrr_at_1000 value: 86.27 - type: mrr_at_3 value: 85.271 - type: mrr_at_5 value: 85.82499999999999 - type: ndcg_at_1 value: 80.716 - type: ndcg_at_10 value: 72.597 - type: ndcg_at_100 value: 75.549 - type: ndcg_at_1000 value: 76.61 - type: ndcg_at_3 value: 67.874 - type: ndcg_at_5 value: 70.655 - type: precision_at_1 value: 80.716 - type: precision_at_10 value: 15.148 - type: precision_at_100 value: 1.745 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.597 - type: precision_at_5 value: 28.351 - type: recall_at_1 value: 40.358 - type: recall_at_10 value: 75.739 - type: recall_at_100 value: 87.259 - type: recall_at_1000 value: 94.234 - type: recall_at_3 value: 65.39500000000001 - type: recall_at_5 value: 70.878 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 90.80799999999998 - type: ap value: 86.81350378180757 - type: f1 value: 90.79901248314215 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.096 - type: map_at_10 value: 
34.384 - type: map_at_100 value: 35.541 - type: map_at_1000 value: 35.589999999999996 - type: map_at_3 value: 30.496000000000002 - type: map_at_5 value: 32.718 - type: mrr_at_1 value: 22.750999999999998 - type: mrr_at_10 value: 35.024 - type: mrr_at_100 value: 36.125 - type: mrr_at_1000 value: 36.168 - type: mrr_at_3 value: 31.225 - type: mrr_at_5 value: 33.416000000000004 - type: ndcg_at_1 value: 22.750999999999998 - type: ndcg_at_10 value: 41.351 - type: ndcg_at_100 value: 46.92 - type: ndcg_at_1000 value: 48.111 - type: ndcg_at_3 value: 33.439 - type: ndcg_at_5 value: 37.407000000000004 - type: precision_at_1 value: 22.750999999999998 - type: precision_at_10 value: 6.564 - type: precision_at_100 value: 0.935 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.288 - type: precision_at_5 value: 10.581999999999999 - type: recall_at_1 value: 22.096 - type: recall_at_10 value: 62.771 - type: recall_at_100 value: 88.529 - type: recall_at_1000 value: 97.55 - type: recall_at_3 value: 41.245 - type: recall_at_5 value: 50.788 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.16780665754673 - type: f1 value: 93.96331194859894 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.90606475148198 - type: f1 value: 58.58344986604187 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.14660390047075 - type: f1 value: 74.31533923533614 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.16139878950908 - type: f1 value: 80.18532656824924 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 32.949880906135085 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.56300351524862 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.196521894371315 - type: mrr value: 32.22644231694389 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.783 - type: map_at_10 value: 14.549000000000001 - type: map_at_100 value: 18.433 - type: map_at_1000 value: 19.949 - type: map_at_3 value: 10.936 - type: map_at_5 value: 12.514 - type: mrr_at_1 value: 47.368 - type: mrr_at_10 value: 56.42 - type: mrr_at_100 value: 56.908 - type: mrr_at_1000 value: 56.95 - type: mrr_at_3 value: 54.283 - type: mrr_at_5 value: 55.568 - type: ndcg_at_1 value: 45.666000000000004 - type: ndcg_at_10 value: 37.389 - type: ndcg_at_100 value: 34.253 - type: ndcg_at_1000 value: 43.059999999999995 - type: ndcg_at_3 value: 42.725 - 
type: ndcg_at_5 value: 40.193 - type: precision_at_1 value: 47.368 - type: precision_at_10 value: 27.988000000000003 - type: precision_at_100 value: 8.672 - type: precision_at_1000 value: 2.164 - type: precision_at_3 value: 40.248 - type: precision_at_5 value: 34.737 - type: recall_at_1 value: 6.783 - type: recall_at_10 value: 17.838 - type: recall_at_100 value: 33.672000000000004 - type: recall_at_1000 value: 66.166 - type: recall_at_3 value: 11.849 - type: recall_at_5 value: 14.205000000000002 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.698999999999998 - type: map_at_10 value: 46.556 - type: map_at_100 value: 47.652 - type: map_at_1000 value: 47.68 - type: map_at_3 value: 42.492000000000004 - type: map_at_5 value: 44.763999999999996 - type: mrr_at_1 value: 35.747 - type: mrr_at_10 value: 49.242999999999995 - type: mrr_at_100 value: 50.052 - type: mrr_at_1000 value: 50.068 - type: mrr_at_3 value: 45.867000000000004 - type: mrr_at_5 value: 47.778999999999996 - type: ndcg_at_1 value: 35.717999999999996 - type: ndcg_at_10 value: 54.14600000000001 - type: ndcg_at_100 value: 58.672999999999995 - type: ndcg_at_1000 value: 59.279 - type: ndcg_at_3 value: 46.407 - type: ndcg_at_5 value: 50.181 - type: precision_at_1 value: 35.717999999999996 - type: precision_at_10 value: 8.844000000000001 - type: precision_at_100 value: 1.139 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.993000000000002 - type: precision_at_5 value: 14.791000000000002 - type: recall_at_1 value: 31.698999999999998 - type: recall_at_10 value: 74.693 - type: recall_at_100 value: 94.15299999999999 - type: recall_at_1000 value: 98.585 - type: recall_at_3 value: 54.388999999999996 - type: recall_at_5 value: 63.08200000000001 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.283 - type: map_at_10 value: 85.24000000000001 - type: map_at_100 value: 85.882 - type: map_at_1000 value: 85.897 - type: map_at_3 value: 82.326 - type: map_at_5 value: 84.177 - type: mrr_at_1 value: 82.21000000000001 - type: mrr_at_10 value: 88.228 - type: mrr_at_100 value: 88.32 - type: mrr_at_1000 value: 88.32 - type: mrr_at_3 value: 87.323 - type: mrr_at_5 value: 87.94800000000001 - type: ndcg_at_1 value: 82.17999999999999 - type: ndcg_at_10 value: 88.9 - type: ndcg_at_100 value: 90.079 - type: ndcg_at_1000 value: 90.158 - type: ndcg_at_3 value: 86.18299999999999 - type: ndcg_at_5 value: 87.71799999999999 - type: precision_at_1 value: 82.17999999999999 - type: precision_at_10 value: 13.464 - type: precision_at_100 value: 1.533 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.693 - type: precision_at_5 value: 24.792 - type: recall_at_1 value: 71.283 - type: recall_at_10 value: 95.742 - type: recall_at_100 value: 99.67200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.888 - type: recall_at_5 value: 92.24 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.24267063669042 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.88056988932578 - task: type: Retrieval dataset: type: scidocs name: 
MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.903 - type: map_at_10 value: 13.202 - type: map_at_100 value: 15.5 - type: map_at_1000 value: 15.870999999999999 - type: map_at_3 value: 9.407 - type: map_at_5 value: 11.238 - type: mrr_at_1 value: 24.2 - type: mrr_at_10 value: 35.867 - type: mrr_at_100 value: 37.001 - type: mrr_at_1000 value: 37.043 - type: mrr_at_3 value: 32.5 - type: mrr_at_5 value: 34.35 - type: ndcg_at_1 value: 24.2 - type: ndcg_at_10 value: 21.731 - type: ndcg_at_100 value: 30.7 - type: ndcg_at_1000 value: 36.618 - type: ndcg_at_3 value: 20.72 - type: ndcg_at_5 value: 17.954 - type: precision_at_1 value: 24.2 - type: precision_at_10 value: 11.33 - type: precision_at_100 value: 2.4410000000000003 - type: precision_at_1000 value: 0.386 - type: precision_at_3 value: 19.667 - type: precision_at_5 value: 15.86 - type: recall_at_1 value: 4.903 - type: recall_at_10 value: 22.962 - type: recall_at_100 value: 49.563 - type: recall_at_1000 value: 78.238 - type: recall_at_3 value: 11.953 - type: recall_at_5 value: 16.067999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.12694254604078 - type: cos_sim_spearman value: 80.30141815181918 - type: euclidean_pearson value: 81.34015449877128 - type: euclidean_spearman value: 80.13984197010849 - type: manhattan_pearson value: 81.31767068124086 - type: manhattan_spearman value: 80.11720513114103 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.13112984010417 - type: cos_sim_spearman value: 78.03063573402875 - type: euclidean_pearson value: 83.51928418844804 - type: euclidean_spearman value: 78.4045235411144 - type: manhattan_pearson value: 83.49981637388689 - type: manhattan_spearman value: 78.4042575139372 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.50327987379504 - type: cos_sim_spearman value: 84.18556767756205 - type: euclidean_pearson value: 82.69684424327679 - type: euclidean_spearman value: 83.5368106038335 - type: manhattan_pearson value: 82.57967581007374 - type: manhattan_spearman value: 83.43009053133697 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.50756863007814 - type: cos_sim_spearman value: 82.27204331279108 - type: euclidean_pearson value: 81.39535251429741 - type: euclidean_spearman value: 81.84386626336239 - type: manhattan_pearson value: 81.34281737280695 - type: manhattan_spearman value: 81.81149375673166 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.8727714856726 - type: cos_sim_spearman value: 87.95738287792312 - type: euclidean_pearson value: 86.62920602795887 - type: euclidean_spearman value: 87.05207355381243 - type: manhattan_pearson value: 86.53587918472225 - type: manhattan_spearman value: 86.95382961029586 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: 
cos_sim_pearson value: 83.52240359769479 - type: cos_sim_spearman value: 85.47685776238286 - type: euclidean_pearson value: 84.25815333483058 - type: euclidean_spearman value: 85.27415639683198 - type: manhattan_pearson value: 84.29127757025637 - type: manhattan_spearman value: 85.30226224917351 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.42501708915708 - type: cos_sim_spearman value: 86.42276182795041 - type: euclidean_pearson value: 86.5408207354761 - type: euclidean_spearman value: 85.46096321750838 - type: manhattan_pearson value: 86.54177303026881 - type: manhattan_spearman value: 85.50313151916117 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.86521089250766 - type: cos_sim_spearman value: 65.94868540323003 - type: euclidean_pearson value: 67.16569626533084 - type: euclidean_spearman value: 66.37667004134917 - type: manhattan_pearson value: 67.1482365102333 - type: manhattan_spearman value: 66.53240122580029 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.64746265365318 - type: cos_sim_spearman value: 86.41888825906786 - type: euclidean_pearson value: 85.27453642725811 - type: euclidean_spearman value: 85.94095796602544 - type: manhattan_pearson value: 85.28643660505334 - type: manhattan_spearman value: 85.95028003260744 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.48903153618527 - type: mrr value: 96.41081503826601 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 58.594 - type: map_at_10 value: 69.296 - type: map_at_100 value: 69.782 - type: map_at_1000 value: 69.795 - type: map_at_3 value: 66.23 - type: map_at_5 value: 68.293 - type: mrr_at_1 value: 61.667 - type: mrr_at_10 value: 70.339 - type: mrr_at_100 value: 70.708 - type: mrr_at_1000 value: 70.722 - type: mrr_at_3 value: 68.0 - type: mrr_at_5 value: 69.56700000000001 - type: ndcg_at_1 value: 61.667 - type: ndcg_at_10 value: 74.039 - type: ndcg_at_100 value: 76.103 - type: ndcg_at_1000 value: 76.47800000000001 - type: ndcg_at_3 value: 68.967 - type: ndcg_at_5 value: 71.96900000000001 - type: precision_at_1 value: 61.667 - type: precision_at_10 value: 9.866999999999999 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.111 - type: precision_at_5 value: 18.2 - type: recall_at_1 value: 58.594 - type: recall_at_10 value: 87.422 - type: recall_at_100 value: 96.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 74.217 - type: recall_at_5 value: 81.539 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85049504950496 - type: cos_sim_ap value: 96.33111544137081 - type: cos_sim_f1 value: 92.35443037974684 - type: cos_sim_precision value: 
93.53846153846153 - type: cos_sim_recall value: 91.2 - type: dot_accuracy value: 99.82376237623762 - type: dot_ap value: 95.38082527310888 - type: dot_f1 value: 90.90909090909092 - type: dot_precision value: 92.90187891440502 - type: dot_recall value: 89.0 - type: euclidean_accuracy value: 99.84851485148515 - type: euclidean_ap value: 96.32316003996347 - type: euclidean_f1 value: 92.2071392659628 - type: euclidean_precision value: 92.71991911021233 - type: euclidean_recall value: 91.7 - type: manhattan_accuracy value: 99.84851485148515 - type: manhattan_ap value: 96.3655668249217 - type: manhattan_f1 value: 92.18356026222895 - type: manhattan_precision value: 92.98067141403867 - type: manhattan_recall value: 91.4 - type: max_accuracy value: 99.85049504950496 - type: max_ap value: 96.3655668249217 - type: max_f1 value: 92.35443037974684 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.94861371629051 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.009430451385 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.61164066427969 - type: mrr value: 55.49710603938544 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.622620124907662 - type: cos_sim_spearman value: 31.0678351356163 - type: dot_pearson value: 30.863727693306814 - type: dot_spearman value: 31.230306567021255 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22 - type: map_at_10 value: 2.011 - type: map_at_100 value: 10.974 - type: map_at_1000 value: 25.819 - type: map_at_3 value: 0.6649999999999999 - type: map_at_5 value: 1.076 - type: mrr_at_1 value: 86.0 - type: mrr_at_10 value: 91.8 - type: mrr_at_100 value: 91.8 - type: mrr_at_1000 value: 91.8 - type: mrr_at_3 value: 91.0 - type: mrr_at_5 value: 91.8 - type: ndcg_at_1 value: 82.0 - type: ndcg_at_10 value: 78.07300000000001 - type: ndcg_at_100 value: 58.231 - type: ndcg_at_1000 value: 51.153000000000006 - type: ndcg_at_3 value: 81.123 - type: ndcg_at_5 value: 81.059 - type: precision_at_1 value: 86.0 - type: precision_at_10 value: 83.0 - type: precision_at_100 value: 59.38 - type: precision_at_1000 value: 22.55 - type: precision_at_3 value: 87.333 - type: precision_at_5 value: 86.8 - type: recall_at_1 value: 0.22 - type: recall_at_10 value: 2.2079999999999997 - type: recall_at_100 value: 14.069 - type: recall_at_1000 value: 47.678 - type: recall_at_3 value: 0.7040000000000001 - type: recall_at_5 value: 1.161 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.809 - type: map_at_10 value: 10.394 - type: map_at_100 value: 16.598 - type: map_at_1000 value: 18.142 - type: map_at_3 value: 5.572 - type: map_at_5 value: 7.1370000000000005 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 46.564 - type: mrr_at_100 value: 
47.469 - type: mrr_at_1000 value: 47.469 - type: mrr_at_3 value: 42.177 - type: mrr_at_5 value: 44.524 - type: ndcg_at_1 value: 30.612000000000002 - type: ndcg_at_10 value: 25.701 - type: ndcg_at_100 value: 37.532 - type: ndcg_at_1000 value: 48.757 - type: ndcg_at_3 value: 28.199999999999996 - type: ndcg_at_5 value: 25.987 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 23.469 - type: precision_at_100 value: 7.9799999999999995 - type: precision_at_1000 value: 1.5350000000000001 - type: precision_at_3 value: 29.932 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 2.809 - type: recall_at_10 value: 16.887 - type: recall_at_100 value: 48.67 - type: recall_at_1000 value: 82.89699999999999 - type: recall_at_3 value: 6.521000000000001 - type: recall_at_5 value: 9.609 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.57860000000001 - type: ap value: 13.82629211536393 - type: f1 value: 54.59860966183956 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.38030560271647 - type: f1 value: 59.69685552567865 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.4736717043405 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.92853311080646 - type: cos_sim_ap value: 77.67872502591382 - type: cos_sim_f1 value: 70.33941236068895 - type: cos_sim_precision value: 67.63273258645884 - type: cos_sim_recall value: 73.27176781002639 - type: dot_accuracy value: 85.79603027954938 - type: dot_ap value: 73.73786190233379 - type: dot_f1 value: 67.3437901774235 - type: dot_precision value: 65.67201604814443 - type: dot_recall value: 69.10290237467018 - type: euclidean_accuracy value: 86.94045419324074 - type: euclidean_ap value: 77.6687791535167 - type: euclidean_f1 value: 70.47209214023542 - type: euclidean_precision value: 67.7207492094381 - type: euclidean_recall value: 73.45646437994723 - type: manhattan_accuracy value: 86.87488823985218 - type: manhattan_ap value: 77.63373392430728 - type: manhattan_f1 value: 70.40920716112532 - type: manhattan_precision value: 68.31265508684864 - type: manhattan_recall value: 72.63852242744063 - type: max_accuracy value: 86.94045419324074 - type: max_ap value: 77.67872502591382 - type: max_f1 value: 70.47209214023542 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.67155664221679 - type: cos_sim_ap value: 85.64591703003417 - type: cos_sim_f1 value: 77.59531005352656 - type: cos_sim_precision value: 73.60967184801382 - type: cos_sim_recall value: 82.03726516784724 - type: dot_accuracy value: 88.41541506578181 - type: dot_ap value: 84.6482788957769 - type: dot_f1 value: 77.04748541466657 - type: dot_precision value: 74.02440754931176 - 
type: dot_recall value: 80.3279950723745 - type: euclidean_accuracy value: 88.63080684596576 - type: euclidean_ap value: 85.44570045321562 - type: euclidean_f1 value: 77.28769403336106 - type: euclidean_precision value: 72.90600040958427 - type: euclidean_recall value: 82.22975053895904 - type: manhattan_accuracy value: 88.59393798269105 - type: manhattan_ap value: 85.40271361038187 - type: manhattan_f1 value: 77.17606419344392 - type: manhattan_precision value: 72.4447747078295 - type: manhattan_recall value: 82.5685247921158 - type: max_accuracy value: 88.67155664221679 - type: max_ap value: 85.64591703003417 - type: max_f1 value: 77.59531005352656 license: mit language: - en ---

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Contact | Citation | License

For more details, please refer to our GitHub: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM**: LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularity (input length up to 8192), and **M**ulti-functionality (unification of dense, lexical, and multi-vector/ColBERT retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever that achieves state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval-augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report and massive training data of BGE have been released. - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models. - **Updated embedding model**: release embedding model to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction.
More - 09/07/2023: Update the fine-tuning code: add a script to mine hard negatives and support adding an instruction during fine-tuning. - 08/09/2023: BGE models are integrated into **Langchain**; you can use them like this. The C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) models, **ranked 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
## Model List bge is short for BAAI General Embedding. | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (ColBERT)), Multi-Linguality, and Multi-Granularity (8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from the embedding model, the reranker uses the question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by other, simpler models. For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results; a sketch of this two-stage pipeline follows below. All models have been uploaded to the Huggingface Hub, and you can see them at . If you cannot open the Huggingface Hub, you can also download the models at .
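As a concrete illustration of the two-stage pipeline in [2], here is a minimal sketch using the FlagEmbedding package. It assumes `FlagModel` and `FlagReranker` behave as documented in the FlagEmbedding repository; the model names, the English query instruction, and the candidate count are illustrative.

```python
import numpy as np
from FlagEmbedding import FlagModel, FlagReranker

# Stage 1: dense retrieval with a bge embedding model.
# The query instruction is the one listed for the English bge models;
# passages are encoded without any instruction (footnote [1]).
embedder = FlagModel(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
    use_fp16=True,
)
query = "how do dense retrievers work?"
passages = [
    "Dense retrievers embed queries and passages into a shared vector space.",
    "BM25 is a classic lexical retrieval baseline.",
    "A cross-encoder scores a query-passage pair with full attention.",
]

q_emb = embedder.encode_queries([query])   # instruction is prepended to queries here
p_emb = embedder.encode(passages)          # passages: no instruction
scores = (q_emb @ p_emb.T)[0]              # inner-product similarity

top_k = min(100, len(passages))            # e.g. keep the top 100 candidates
candidates = np.argsort(-scores)[:top_k]

# Stage 2: re-rank the candidates with a bge cross-encoder reranker.
reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)
rerank_scores = reranker.compute_score([[query, passages[i]] for i in candidates])
order = np.argsort(-np.array(rerank_scores))[:3]
print([passages[candidates[i]] for i in order])  # final top-3 after re-ranking
```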
## Frequently asked questions 1. How to fine-tune the bge embedding model? Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve retrieval performance. - If you pre-train bge on your own data, the pre-trained model cannot be used to calculate similarity directly; it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use or fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
2. The similarity score between two dissimilar sentences is higher than 0.5 **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model lies roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, select an appropriate threshold based on the similarity distribution of your own data (such as 0.8, 0.85, or even 0.9); a small illustration follows below.
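To make the point concrete, here is a small sketch (the model name and the 0.85 threshold are illustrative; choose a threshold from the score distribution of your own data):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
sentences = [
    "The cat sits on the mat.",
    "A feline is resting on a rug.",
    "Quarterly revenue grew by 12%.",
]
emb = model.encode(sentences, normalize_embeddings=True)

sims = emb @ emb.T  # cosine similarity, since the vectors are normalized
# With v1.0 bge models even unrelated pairs often score above 0.5,
# so compare scores against each other, or against a data-driven threshold.
pair_related = sims[0, 1]
pair_unrelated = sims[0, 2]
print(pair_related > pair_unrelated)  # relative order is meaningful
print(pair_related > 0.85)            # absolute threshold chosen per dataset
```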
3. When does the query instruction need to be used? For the v1.5 models, we improved retrieval ability when not using an instruction: using no instruction causes only a slight degradation in retrieval performance compared with using one. So you can generate embeddings without instruction in all cases, for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions for queries is to choose the setting that achieves better performance on your task.** In all cases, the documents/passages do not need the instruction.
## Usage ### Usage for Embedding Model Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more ways to install the package. For the value of the argument, see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For the s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see Model List). The instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. #### Usage of the ONNX files #### Usage via infinity It's also possible to deploy the onnx files with the infinity_emb pip package. ### Usage for Reranker Different from the embedding model, the reranker uses the question and document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and passage to the reranker. The reranker is optimized with cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers Minimal sketches of these usage patterns are shown below.
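The inline code snippets of the original card did not survive the formatting here, so the following are minimal reconstructions of the usage patterns described above. They assume the published APIs of sentence-transformers, transformers, and the FlagEmbedding package; the model names and the English query instruction string are taken from the Model List.

A sentence-transformers sketch for the s2p setting, where only the short queries get the instruction:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Prepend the instruction to each short query only; passages are encoded as-is.
instruction = "Represent this sentence for searching relevant passages: "
queries = ["what is a cross-encoder?"]
passages = ["A cross-encoder scores a query-passage pair with full attention."]

q_emb = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)  # cosine similarity of normalized embeddings
```

A plain transformers sketch of the CLS pooling described above (last hidden state of the first token, then L2 normalization):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
model.eval()

sentences = ["sample sentence 1", "sample sentence 2"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    # CLS pooling: last hidden state of the first token.
    embeddings = outputs[0][:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)
```

And a reranker sketch; note the scores are raw logits, not bounded to a fixed range:

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)

# A single pair returns one float; a list of pairs returns a list of floats.
score = reranker.compute_score(
    ["what is panda?", "The giant panda is a bear native to China."]
)
scores = reranker.compute_score([
    ["what is panda?", "Pandas is a Python data-analysis library."],
    ["what is panda?", "The giant panda is a bear native to China."],
])
print(score, scores)  # higher means more relevant
```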
## Evaluation bge models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We created the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction.
| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for the evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned. For more training details for bge, see baai_general_embedding. ### BGE Reranker A cross-encoder performs full attention over the input pair, which is more accurate than an embedding model (i.e., a bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model.
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao(stxiao@baai.ac.cn) and Zheng Liu(liuzheng@baai.ac.cn). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation. ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like text classification, retrieval, clustering, and similarity measurement." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-base-en.json b/data/model_data_json/BAAI_bge-base-en.json new file mode 100644 index 0000000000000000000000000000000000000000..556edb7f7d1eeb1d8e208bf037505d8b91172581 --- /dev/null +++ b/data/model_data_json/BAAI_bge-base-en.json @@ -0,0 +1,23 @@ +{ + "model_id": "BAAI/bge-base-en", + "downloads": 180226, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "mteb", + "en", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb model-index: - name: bge-base-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.73134328358209 - type: ap value: 38.97277232632892 - type: f1 value: 69.81740361139785 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.56522500000001 - type: ap value: 88.88821771869553 - type: f1 value: 92.54817512659696 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.91 - type: f1 value: 46.28536394320311 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 38.834 - type: map_at_10 value: 53.564 - type: map_at_100 value: 54.230000000000004 - type: map_at_1000 value: 54.235 - type: map_at_3 value: 49.49 - type: map_at_5 value: 51.784 - type: mrr_at_1 value: 39.26 - type: mrr_at_10 value: 53.744 - type: mrr_at_100 value: 54.410000000000004 - type: mrr_at_1000 value: 54.415 - type: mrr_at_3 value: 49.656 - type: mrr_at_5 value: 52.018 - type: ndcg_at_1 value: 38.834 - type: ndcg_at_10 value: 61.487 - type: ndcg_at_100 value: 64.303 - type: ndcg_at_1000 value: 64.408 - type: ndcg_at_3 value: 53.116 - type: ndcg_at_5 value: 57.248 - type: precision_at_1 value: 38.834 - type: precision_at_10 value: 8.663 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.218999999999998 - type: precision_at_5 value: 14.737 - type: recall_at_1 value: 38.834 - type: recall_at_10 value: 86.629 - type: recall_at_100 value: 98.86200000000001 - type: recall_at_1000 value: 99.644 - type:
recall_at_3 value: 63.656 - type: recall_at_5 value: 73.68400000000001 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.88475477433035 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.85053138403176 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.23221013208242 - type: mrr value: 74.64857318735436 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.4403443247284 - type: cos_sim_spearman value: 85.5326718115169 - type: euclidean_pearson value: 86.0114007449595 - type: euclidean_spearman value: 86.05979225604875 - type: manhattan_pearson value: 86.05423806568598 - type: manhattan_spearman value: 86.02485170086835 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.44480519480518 - type: f1 value: 86.41301900941988 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.17547250880036 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.74514172687293 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.096000000000004 - type: map_at_10 value: 43.345 - type: map_at_100 value: 44.73 - type: map_at_1000 value: 44.85 - type: map_at_3 value: 39.956 - type: map_at_5 value: 41.727 - type: mrr_at_1 value: 38.769999999999996 - type: mrr_at_10 value: 48.742000000000004 - type: mrr_at_100 value: 49.474000000000004 - type: mrr_at_1000 value: 49.513 - type: mrr_at_3 value: 46.161 - type: mrr_at_5 value: 47.721000000000004 - type: ndcg_at_1 value: 38.769999999999996 - type: ndcg_at_10 value: 49.464999999999996 - type: ndcg_at_100 value: 54.632000000000005 - type: ndcg_at_1000 value: 56.52 - type: ndcg_at_3 value: 44.687 - type: ndcg_at_5 value: 46.814 - type: precision_at_1 value: 38.769999999999996 - type: precision_at_10 value: 9.471 - type: precision_at_100 value: 1.4909999999999999 - type: precision_at_1000 value: 0.194 - type: precision_at_3 value: 21.268 - type: precision_at_5 value: 15.079 - type: recall_at_1 value: 32.096000000000004 - type: recall_at_10 value: 60.99099999999999 - type: recall_at_100 value: 83.075 - type: recall_at_1000 value: 95.178 - type: recall_at_3 value: 47.009 - type: recall_at_5 value: 53.348 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.588 - type: map_at_10 value: 42.251 - type: map_at_100 value: 43.478 - type: map_at_1000 value: 
43.617 - type: map_at_3 value: 39.381 - type: map_at_5 value: 41.141 - type: mrr_at_1 value: 41.21 - type: mrr_at_10 value: 48.765 - type: mrr_at_100 value: 49.403000000000006 - type: mrr_at_1000 value: 49.451 - type: mrr_at_3 value: 46.73 - type: mrr_at_5 value: 47.965999999999994 - type: ndcg_at_1 value: 41.21 - type: ndcg_at_10 value: 47.704 - type: ndcg_at_100 value: 51.916 - type: ndcg_at_1000 value: 54.013999999999996 - type: ndcg_at_3 value: 44.007000000000005 - type: ndcg_at_5 value: 45.936 - type: precision_at_1 value: 41.21 - type: precision_at_10 value: 8.885 - type: precision_at_100 value: 1.409 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 21.274 - type: precision_at_5 value: 15.045 - type: recall_at_1 value: 32.588 - type: recall_at_10 value: 56.333 - type: recall_at_100 value: 74.251 - type: recall_at_1000 value: 87.518 - type: recall_at_3 value: 44.962 - type: recall_at_5 value: 50.609 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.308 - type: map_at_10 value: 53.12 - type: map_at_100 value: 54.123 - type: map_at_1000 value: 54.173 - type: map_at_3 value: 50.017999999999994 - type: map_at_5 value: 51.902 - type: mrr_at_1 value: 46.394999999999996 - type: mrr_at_10 value: 56.531 - type: mrr_at_100 value: 57.19800000000001 - type: mrr_at_1000 value: 57.225 - type: mrr_at_3 value: 54.368 - type: mrr_at_5 value: 55.713 - type: ndcg_at_1 value: 46.394999999999996 - type: ndcg_at_10 value: 58.811 - type: ndcg_at_100 value: 62.834 - type: ndcg_at_1000 value: 63.849999999999994 - type: ndcg_at_3 value: 53.88699999999999 - type: ndcg_at_5 value: 56.477999999999994 - type: precision_at_1 value: 46.394999999999996 - type: precision_at_10 value: 9.398 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 24.221999999999998 - type: precision_at_5 value: 16.539 - type: recall_at_1 value: 40.308 - type: recall_at_10 value: 72.146 - type: recall_at_100 value: 89.60900000000001 - type: recall_at_1000 value: 96.733 - type: recall_at_3 value: 58.91499999999999 - type: recall_at_5 value: 65.34299999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.383000000000003 - type: map_at_10 value: 35.802 - type: map_at_100 value: 36.756 - type: map_at_1000 value: 36.826 - type: map_at_3 value: 32.923 - type: map_at_5 value: 34.577999999999996 - type: mrr_at_1 value: 29.604999999999997 - type: mrr_at_10 value: 37.918 - type: mrr_at_100 value: 38.732 - type: mrr_at_1000 value: 38.786 - type: mrr_at_3 value: 35.198 - type: mrr_at_5 value: 36.808 - type: ndcg_at_1 value: 29.604999999999997 - type: ndcg_at_10 value: 40.836 - type: ndcg_at_100 value: 45.622 - type: ndcg_at_1000 value: 47.427 - type: ndcg_at_3 value: 35.208 - type: ndcg_at_5 value: 38.066 - type: precision_at_1 value: 29.604999999999997 - type: precision_at_10 value: 6.226 - type: precision_at_100 value: 0.9079999999999999 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 14.463000000000001 - type: precision_at_5 value: 10.35 - type: recall_at_1 value: 27.383000000000003 - type: recall_at_10 value: 54.434000000000005 - type: recall_at_100 value: 76.632 - type: recall_at_1000 value: 90.25 - type: recall_at_3 value: 39.275 - type: recall_at_5 value: 46.225 - task: type: Retrieval dataset: 
type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.885 - type: map_at_10 value: 25.724000000000004 - type: map_at_100 value: 26.992 - type: map_at_1000 value: 27.107999999999997 - type: map_at_3 value: 23.04 - type: map_at_5 value: 24.529 - type: mrr_at_1 value: 22.264 - type: mrr_at_10 value: 30.548 - type: mrr_at_100 value: 31.593 - type: mrr_at_1000 value: 31.657999999999998 - type: mrr_at_3 value: 27.756999999999998 - type: mrr_at_5 value: 29.398999999999997 - type: ndcg_at_1 value: 22.264 - type: ndcg_at_10 value: 30.902 - type: ndcg_at_100 value: 36.918 - type: ndcg_at_1000 value: 39.735 - type: ndcg_at_3 value: 25.915 - type: ndcg_at_5 value: 28.255999999999997 - type: precision_at_1 value: 22.264 - type: precision_at_10 value: 5.634 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 12.396 - type: precision_at_5 value: 9.055 - type: recall_at_1 value: 17.885 - type: recall_at_10 value: 42.237 - type: recall_at_100 value: 68.489 - type: recall_at_1000 value: 88.721 - type: recall_at_3 value: 28.283 - type: recall_at_5 value: 34.300000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.737000000000002 - type: map_at_10 value: 39.757 - type: map_at_100 value: 40.992 - type: map_at_1000 value: 41.102 - type: map_at_3 value: 36.612 - type: map_at_5 value: 38.413000000000004 - type: mrr_at_1 value: 35.804 - type: mrr_at_10 value: 45.178000000000004 - type: mrr_at_100 value: 45.975 - type: mrr_at_1000 value: 46.021 - type: mrr_at_3 value: 42.541000000000004 - type: mrr_at_5 value: 44.167 - type: ndcg_at_1 value: 35.804 - type: ndcg_at_10 value: 45.608 - type: ndcg_at_100 value: 50.746 - type: ndcg_at_1000 value: 52.839999999999996 - type: ndcg_at_3 value: 40.52 - type: ndcg_at_5 value: 43.051 - type: precision_at_1 value: 35.804 - type: precision_at_10 value: 8.104 - type: precision_at_100 value: 1.256 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 19.121 - type: precision_at_5 value: 13.532 - type: recall_at_1 value: 29.737000000000002 - type: recall_at_10 value: 57.66 - type: recall_at_100 value: 79.121 - type: recall_at_1000 value: 93.023 - type: recall_at_3 value: 43.13 - type: recall_at_5 value: 49.836000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.299 - type: map_at_10 value: 35.617 - type: map_at_100 value: 36.972 - type: map_at_1000 value: 37.096000000000004 - type: map_at_3 value: 32.653999999999996 - type: map_at_5 value: 34.363 - type: mrr_at_1 value: 32.877 - type: mrr_at_10 value: 41.423 - type: mrr_at_100 value: 42.333999999999996 - type: mrr_at_1000 value: 42.398 - type: mrr_at_3 value: 39.193 - type: mrr_at_5 value: 40.426 - type: ndcg_at_1 value: 32.877 - type: ndcg_at_10 value: 41.271 - type: ndcg_at_100 value: 46.843 - type: ndcg_at_1000 value: 49.366 - type: ndcg_at_3 value: 36.735 - type: ndcg_at_5 value: 38.775999999999996 - type: precision_at_1 value: 32.877 - type: precision_at_10 value: 7.580000000000001 - type: precision_at_100 value: 1.192 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.541999999999998 - type: precision_at_5 value: 12.443 - type: recall_at_1 value: 26.299 - 
type: recall_at_10 value: 52.256 - type: recall_at_100 value: 75.919 - type: recall_at_1000 value: 93.185 - type: recall_at_3 value: 39.271 - type: recall_at_5 value: 44.901 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.05741666666667 - type: map_at_10 value: 36.086416666666665 - type: map_at_100 value: 37.26916666666667 - type: map_at_1000 value: 37.38191666666666 - type: map_at_3 value: 33.34225 - type: map_at_5 value: 34.86425 - type: mrr_at_1 value: 32.06008333333333 - type: mrr_at_10 value: 40.36658333333333 - type: mrr_at_100 value: 41.206500000000005 - type: mrr_at_1000 value: 41.261083333333325 - type: mrr_at_3 value: 38.01208333333334 - type: mrr_at_5 value: 39.36858333333333 - type: ndcg_at_1 value: 32.06008333333333 - type: ndcg_at_10 value: 41.3535 - type: ndcg_at_100 value: 46.42066666666666 - type: ndcg_at_1000 value: 48.655166666666666 - type: ndcg_at_3 value: 36.78041666666667 - type: ndcg_at_5 value: 38.91783333333334 - type: precision_at_1 value: 32.06008333333333 - type: precision_at_10 value: 7.169833333333332 - type: precision_at_100 value: 1.1395 - type: precision_at_1000 value: 0.15158333333333332 - type: precision_at_3 value: 16.852 - type: precision_at_5 value: 11.8645 - type: recall_at_1 value: 27.05741666666667 - type: recall_at_10 value: 52.64491666666666 - type: recall_at_100 value: 74.99791666666667 - type: recall_at_1000 value: 90.50524999999999 - type: recall_at_3 value: 39.684000000000005 - type: recall_at_5 value: 45.37225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.607999999999997 - type: map_at_10 value: 32.28 - type: map_at_100 value: 33.261 - type: map_at_1000 value: 33.346 - type: map_at_3 value: 30.514999999999997 - type: map_at_5 value: 31.415 - type: mrr_at_1 value: 28.988000000000003 - type: mrr_at_10 value: 35.384 - type: mrr_at_100 value: 36.24 - type: mrr_at_1000 value: 36.299 - type: mrr_at_3 value: 33.717000000000006 - type: mrr_at_5 value: 34.507 - type: ndcg_at_1 value: 28.988000000000003 - type: ndcg_at_10 value: 36.248000000000005 - type: ndcg_at_100 value: 41.034 - type: ndcg_at_1000 value: 43.35 - type: ndcg_at_3 value: 32.987 - type: ndcg_at_5 value: 34.333999999999996 - type: precision_at_1 value: 28.988000000000003 - type: precision_at_10 value: 5.506 - type: precision_at_100 value: 0.853 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 14.11 - type: precision_at_5 value: 9.417 - type: recall_at_1 value: 25.607999999999997 - type: recall_at_10 value: 45.344 - type: recall_at_100 value: 67.132 - type: recall_at_1000 value: 84.676 - type: recall_at_3 value: 36.02 - type: recall_at_5 value: 39.613 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.44 - type: map_at_10 value: 25.651000000000003 - type: map_at_100 value: 26.735 - type: map_at_1000 value: 26.86 - type: map_at_3 value: 23.409 - type: map_at_5 value: 24.604 - type: mrr_at_1 value: 22.195 - type: mrr_at_10 value: 29.482000000000003 - type: mrr_at_100 value: 30.395 - type: mrr_at_1000 value: 30.471999999999998 - type: mrr_at_3 value: 27.409 - type: mrr_at_5 value: 28.553 - type: ndcg_at_1 value: 22.195 - type: ndcg_at_10 value: 30.242 - type: ndcg_at_100 value: 35.397 - type: 
ndcg_at_1000 value: 38.287 - type: ndcg_at_3 value: 26.201 - type: ndcg_at_5 value: 28.008 - type: precision_at_1 value: 22.195 - type: precision_at_10 value: 5.372 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 12.228 - type: precision_at_5 value: 8.727 - type: recall_at_1 value: 18.44 - type: recall_at_10 value: 40.325 - type: recall_at_100 value: 63.504000000000005 - type: recall_at_1000 value: 83.909 - type: recall_at_3 value: 28.925 - type: recall_at_5 value: 33.641 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.535999999999998 - type: map_at_10 value: 35.358000000000004 - type: map_at_100 value: 36.498999999999995 - type: map_at_1000 value: 36.597 - type: map_at_3 value: 32.598 - type: map_at_5 value: 34.185 - type: mrr_at_1 value: 31.25 - type: mrr_at_10 value: 39.593 - type: mrr_at_100 value: 40.443 - type: mrr_at_1000 value: 40.498 - type: mrr_at_3 value: 37.018 - type: mrr_at_5 value: 38.492 - type: ndcg_at_1 value: 31.25 - type: ndcg_at_10 value: 40.71 - type: ndcg_at_100 value: 46.079 - type: ndcg_at_1000 value: 48.287 - type: ndcg_at_3 value: 35.667 - type: ndcg_at_5 value: 38.080000000000005 - type: precision_at_1 value: 31.25 - type: precision_at_10 value: 6.847 - type: precision_at_100 value: 1.079 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 16.262 - type: precision_at_5 value: 11.455 - type: recall_at_1 value: 26.535999999999998 - type: recall_at_10 value: 52.92099999999999 - type: recall_at_100 value: 76.669 - type: recall_at_1000 value: 92.096 - type: recall_at_3 value: 38.956 - type: recall_at_5 value: 45.239000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.691 - type: map_at_10 value: 33.417 - type: map_at_100 value: 35.036 - type: map_at_1000 value: 35.251 - type: map_at_3 value: 30.646 - type: map_at_5 value: 32.177 - type: mrr_at_1 value: 30.04 - type: mrr_at_10 value: 37.905 - type: mrr_at_100 value: 38.929 - type: mrr_at_1000 value: 38.983000000000004 - type: mrr_at_3 value: 35.276999999999994 - type: mrr_at_5 value: 36.897000000000006 - type: ndcg_at_1 value: 30.04 - type: ndcg_at_10 value: 39.037 - type: ndcg_at_100 value: 44.944 - type: ndcg_at_1000 value: 47.644 - type: ndcg_at_3 value: 34.833999999999996 - type: ndcg_at_5 value: 36.83 - type: precision_at_1 value: 30.04 - type: precision_at_10 value: 7.4510000000000005 - type: precision_at_100 value: 1.492 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 16.337 - type: precision_at_5 value: 11.897 - type: recall_at_1 value: 24.691 - type: recall_at_10 value: 49.303999999999995 - type: recall_at_100 value: 76.20400000000001 - type: recall_at_1000 value: 93.30000000000001 - type: recall_at_3 value: 36.594 - type: recall_at_5 value: 42.41 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.118 - type: map_at_10 value: 30.714999999999996 - type: map_at_100 value: 31.656000000000002 - type: map_at_1000 value: 31.757 - type: map_at_3 value: 28.355000000000004 - type: map_at_5 value: 29.337000000000003 - type: mrr_at_1 value: 25.323 - type: mrr_at_10 value: 32.93 - type: mrr_at_100 value: 33.762 - type: 
mrr_at_1000 value: 33.829 - type: mrr_at_3 value: 30.775999999999996 - type: mrr_at_5 value: 31.774 - type: ndcg_at_1 value: 25.323 - type: ndcg_at_10 value: 35.408 - type: ndcg_at_100 value: 40.083 - type: ndcg_at_1000 value: 42.542 - type: ndcg_at_3 value: 30.717 - type: ndcg_at_5 value: 32.385000000000005 - type: precision_at_1 value: 25.323 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.843 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 13.001 - type: precision_at_5 value: 8.834999999999999 - type: recall_at_1 value: 23.118 - type: recall_at_10 value: 47.788000000000004 - type: recall_at_100 value: 69.37 - type: recall_at_1000 value: 87.47399999999999 - type: recall_at_3 value: 34.868 - type: recall_at_5 value: 39.001999999999995 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 14.288 - type: map_at_10 value: 23.256 - type: map_at_100 value: 25.115 - type: map_at_1000 value: 25.319000000000003 - type: map_at_3 value: 20.005 - type: map_at_5 value: 21.529999999999998 - type: mrr_at_1 value: 31.401 - type: mrr_at_10 value: 42.251 - type: mrr_at_100 value: 43.236999999999995 - type: mrr_at_1000 value: 43.272 - type: mrr_at_3 value: 39.164 - type: mrr_at_5 value: 40.881 - type: ndcg_at_1 value: 31.401 - type: ndcg_at_10 value: 31.615 - type: ndcg_at_100 value: 38.982 - type: ndcg_at_1000 value: 42.496 - type: ndcg_at_3 value: 26.608999999999998 - type: ndcg_at_5 value: 28.048000000000002 - type: precision_at_1 value: 31.401 - type: precision_at_10 value: 9.536999999999999 - type: precision_at_100 value: 1.763 - type: precision_at_1000 value: 0.241 - type: precision_at_3 value: 19.153000000000002 - type: precision_at_5 value: 14.228 - type: recall_at_1 value: 14.288 - type: recall_at_10 value: 36.717 - type: recall_at_100 value: 61.9 - type: recall_at_1000 value: 81.676 - type: recall_at_3 value: 24.203 - type: recall_at_5 value: 28.793999999999997 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.019 - type: map_at_10 value: 19.963 - type: map_at_100 value: 28.834 - type: map_at_1000 value: 30.537999999999997 - type: map_at_3 value: 14.45 - type: map_at_5 value: 16.817999999999998 - type: mrr_at_1 value: 65.75 - type: mrr_at_10 value: 74.646 - type: mrr_at_100 value: 74.946 - type: mrr_at_1000 value: 74.95100000000001 - type: mrr_at_3 value: 72.625 - type: mrr_at_5 value: 74.012 - type: ndcg_at_1 value: 54 - type: ndcg_at_10 value: 42.014 - type: ndcg_at_100 value: 47.527 - type: ndcg_at_1000 value: 54.911 - type: ndcg_at_3 value: 46.586 - type: ndcg_at_5 value: 43.836999999999996 - type: precision_at_1 value: 65.75 - type: precision_at_10 value: 33.475 - type: precision_at_100 value: 11.16 - type: precision_at_1000 value: 2.145 - type: precision_at_3 value: 50.083 - type: precision_at_5 value: 42.55 - type: recall_at_1 value: 9.019 - type: recall_at_10 value: 25.558999999999997 - type: recall_at_100 value: 53.937999999999995 - type: recall_at_1000 value: 77.67399999999999 - type: recall_at_3 value: 15.456 - type: recall_at_5 value: 19.259 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 52.635 - type: f1 value: 47.692783881403926 - task: type: Retrieval dataset: type: fever name: MTEB FEVER 
config: default split: test revision: None metrics: - type: map_at_1 value: 76.893 - type: map_at_10 value: 84.897 - type: map_at_100 value: 85.122 - type: map_at_1000 value: 85.135 - type: map_at_3 value: 83.88 - type: map_at_5 value: 84.565 - type: mrr_at_1 value: 83.003 - type: mrr_at_10 value: 89.506 - type: mrr_at_100 value: 89.574 - type: mrr_at_1000 value: 89.575 - type: mrr_at_3 value: 88.991 - type: mrr_at_5 value: 89.349 - type: ndcg_at_1 value: 83.003 - type: ndcg_at_10 value: 88.351 - type: ndcg_at_100 value: 89.128 - type: ndcg_at_1000 value: 89.34100000000001 - type: ndcg_at_3 value: 86.92 - type: ndcg_at_5 value: 87.78200000000001 - type: precision_at_1 value: 83.003 - type: precision_at_10 value: 10.517999999999999 - type: precision_at_100 value: 1.115 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 33.062999999999995 - type: precision_at_5 value: 20.498 - type: recall_at_1 value: 76.893 - type: recall_at_10 value: 94.374 - type: recall_at_100 value: 97.409 - type: recall_at_1000 value: 98.687 - type: recall_at_3 value: 90.513 - type: recall_at_5 value: 92.709 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.829 - type: map_at_10 value: 32.86 - type: map_at_100 value: 34.838 - type: map_at_1000 value: 35.006 - type: map_at_3 value: 28.597 - type: map_at_5 value: 31.056 - type: mrr_at_1 value: 41.358 - type: mrr_at_10 value: 49.542 - type: mrr_at_100 value: 50.29900000000001 - type: mrr_at_1000 value: 50.334999999999994 - type: mrr_at_3 value: 46.579 - type: mrr_at_5 value: 48.408 - type: ndcg_at_1 value: 41.358 - type: ndcg_at_10 value: 40.758 - type: ndcg_at_100 value: 47.799 - type: ndcg_at_1000 value: 50.589 - type: ndcg_at_3 value: 36.695 - type: ndcg_at_5 value: 38.193 - type: precision_at_1 value: 41.358 - type: precision_at_10 value: 11.142000000000001 - type: precision_at_100 value: 1.8350000000000002 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 24.023 - type: precision_at_5 value: 17.963 - type: recall_at_1 value: 20.829 - type: recall_at_10 value: 47.467999999999996 - type: recall_at_100 value: 73.593 - type: recall_at_1000 value: 90.122 - type: recall_at_3 value: 32.74 - type: recall_at_5 value: 39.608 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.324 - type: map_at_10 value: 64.183 - type: map_at_100 value: 65.037 - type: map_at_1000 value: 65.094 - type: map_at_3 value: 60.663 - type: map_at_5 value: 62.951 - type: mrr_at_1 value: 80.648 - type: mrr_at_10 value: 86.005 - type: mrr_at_100 value: 86.157 - type: mrr_at_1000 value: 86.162 - type: mrr_at_3 value: 85.116 - type: mrr_at_5 value: 85.703 - type: ndcg_at_1 value: 80.648 - type: ndcg_at_10 value: 72.351 - type: ndcg_at_100 value: 75.279 - type: ndcg_at_1000 value: 76.357 - type: ndcg_at_3 value: 67.484 - type: ndcg_at_5 value: 70.31500000000001 - type: precision_at_1 value: 80.648 - type: precision_at_10 value: 15.103 - type: precision_at_100 value: 1.7399999999999998 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 43.232 - type: precision_at_5 value: 28.165000000000003 - type: recall_at_1 value: 40.324 - type: recall_at_10 value: 75.517 - type: recall_at_100 value: 86.982 - type: recall_at_1000 value: 94.072 - type: recall_at_3 value: 64.848 - type: recall_at_5 value: 70.41199999999999 - task: type: Classification dataset: type: 
mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.4 - type: ap value: 87.4422032289312 - type: f1 value: 91.39249564302281 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.03 - type: map_at_10 value: 34.402 - type: map_at_100 value: 35.599 - type: map_at_1000 value: 35.648 - type: map_at_3 value: 30.603 - type: map_at_5 value: 32.889 - type: mrr_at_1 value: 22.679 - type: mrr_at_10 value: 35.021 - type: mrr_at_100 value: 36.162 - type: mrr_at_1000 value: 36.205 - type: mrr_at_3 value: 31.319999999999997 - type: mrr_at_5 value: 33.562 - type: ndcg_at_1 value: 22.692999999999998 - type: ndcg_at_10 value: 41.258 - type: ndcg_at_100 value: 46.967 - type: ndcg_at_1000 value: 48.175000000000004 - type: ndcg_at_3 value: 33.611000000000004 - type: ndcg_at_5 value: 37.675 - type: precision_at_1 value: 22.692999999999998 - type: precision_at_10 value: 6.5089999999999995 - type: precision_at_100 value: 0.936 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.413 - type: precision_at_5 value: 10.702 - type: recall_at_1 value: 22.03 - type: recall_at_10 value: 62.248000000000005 - type: recall_at_100 value: 88.524 - type: recall_at_1000 value: 97.714 - type: recall_at_3 value: 41.617 - type: recall_at_5 value: 51.359 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.36844505243957 - type: f1 value: 94.12408743818202 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.43410852713177 - type: f1 value: 58.501855709435624 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.04909213180902 - type: f1 value: 74.1800860395823 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.76126429051781 - type: f1 value: 79.85705217473232 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.70119520292863 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.33544316467486 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.75499243990726 - type: mrr value: 31.70602251821063 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.451999999999999 - type: map_at_10 value: 13.918 - type: map_at_100 value: 17.316000000000003 - type: map_at_1000 value: 18.747 - type: map_at_3 value: 10.471 - type: 
map_at_5 value: 12.104 - type: mrr_at_1 value: 46.749 - type: mrr_at_10 value: 55.717000000000006 - type: mrr_at_100 value: 56.249 - type: mrr_at_1000 value: 56.288000000000004 - type: mrr_at_3 value: 53.818 - type: mrr_at_5 value: 55.103 - type: ndcg_at_1 value: 45.201 - type: ndcg_at_10 value: 35.539 - type: ndcg_at_100 value: 32.586 - type: ndcg_at_1000 value: 41.486000000000004 - type: ndcg_at_3 value: 41.174 - type: ndcg_at_5 value: 38.939 - type: precision_at_1 value: 46.749 - type: precision_at_10 value: 25.944 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.076 - type: precision_at_3 value: 38.7 - type: precision_at_5 value: 33.56 - type: recall_at_1 value: 6.451999999999999 - type: recall_at_10 value: 17.302 - type: recall_at_100 value: 32.14 - type: recall_at_1000 value: 64.12 - type: recall_at_3 value: 11.219 - type: recall_at_5 value: 13.993 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 32.037 - type: map_at_10 value: 46.565 - type: map_at_100 value: 47.606 - type: map_at_1000 value: 47.636 - type: map_at_3 value: 42.459 - type: map_at_5 value: 44.762 - type: mrr_at_1 value: 36.181999999999995 - type: mrr_at_10 value: 49.291000000000004 - type: mrr_at_100 value: 50.059 - type: mrr_at_1000 value: 50.078 - type: mrr_at_3 value: 45.829 - type: mrr_at_5 value: 47.797 - type: ndcg_at_1 value: 36.153 - type: ndcg_at_10 value: 53.983000000000004 - type: ndcg_at_100 value: 58.347 - type: ndcg_at_1000 value: 59.058 - type: ndcg_at_3 value: 46.198 - type: ndcg_at_5 value: 50.022 - type: precision_at_1 value: 36.153 - type: precision_at_10 value: 8.763 - type: precision_at_100 value: 1.123 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.751 - type: precision_at_5 value: 14.646999999999998 - type: recall_at_1 value: 32.037 - type: recall_at_10 value: 74.008 - type: recall_at_100 value: 92.893 - type: recall_at_1000 value: 98.16 - type: recall_at_3 value: 53.705999999999996 - type: recall_at_5 value: 62.495 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.152 - type: map_at_10 value: 85.104 - type: map_at_100 value: 85.745 - type: map_at_1000 value: 85.761 - type: map_at_3 value: 82.175 - type: map_at_5 value: 84.066 - type: mrr_at_1 value: 82.03 - type: mrr_at_10 value: 88.115 - type: mrr_at_100 value: 88.21 - type: mrr_at_1000 value: 88.211 - type: mrr_at_3 value: 87.19200000000001 - type: mrr_at_5 value: 87.85 - type: ndcg_at_1 value: 82.03 - type: ndcg_at_10 value: 88.78 - type: ndcg_at_100 value: 89.96300000000001 - type: ndcg_at_1000 value: 90.056 - type: ndcg_at_3 value: 86.051 - type: ndcg_at_5 value: 87.63499999999999 - type: precision_at_1 value: 82.03 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.627 - type: precision_at_5 value: 24.784 - type: recall_at_1 value: 71.152 - type: recall_at_10 value: 95.649 - type: recall_at_100 value: 99.58200000000001 - type: recall_at_1000 value: 99.981 - type: recall_at_3 value: 87.767 - type: recall_at_5 value: 92.233 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.48713646277477 - task: type: Clustering dataset: type: 
mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.394940772438545 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.043 - type: map_at_10 value: 12.949 - type: map_at_100 value: 15.146 - type: map_at_1000 value: 15.495000000000001 - type: map_at_3 value: 9.333 - type: map_at_5 value: 11.312999999999999 - type: mrr_at_1 value: 24.9 - type: mrr_at_10 value: 35.958 - type: mrr_at_100 value: 37.152 - type: mrr_at_1000 value: 37.201 - type: mrr_at_3 value: 32.667 - type: mrr_at_5 value: 34.567 - type: ndcg_at_1 value: 24.9 - type: ndcg_at_10 value: 21.298000000000002 - type: ndcg_at_100 value: 29.849999999999998 - type: ndcg_at_1000 value: 35.506 - type: ndcg_at_3 value: 20.548 - type: ndcg_at_5 value: 18.064 - type: precision_at_1 value: 24.9 - type: precision_at_10 value: 10.9 - type: precision_at_100 value: 2.331 - type: precision_at_1000 value: 0.367 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 15.939999999999998 - type: recall_at_1 value: 5.043 - type: recall_at_10 value: 22.092 - type: recall_at_100 value: 47.323 - type: recall_at_1000 value: 74.553 - type: recall_at_3 value: 11.728 - type: recall_at_5 value: 16.188 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.7007085938325 - type: cos_sim_spearman value: 80.0171084446234 - type: euclidean_pearson value: 81.28133218355893 - type: euclidean_spearman value: 79.99291731740131 - type: manhattan_pearson value: 81.22926922327846 - type: manhattan_spearman value: 79.94444878127038 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.7411883252923 - type: cos_sim_spearman value: 77.93462937801245 - type: euclidean_pearson value: 83.00858563882404 - type: euclidean_spearman value: 77.82717362433257 - type: manhattan_pearson value: 82.92887645790769 - type: manhattan_spearman value: 77.78807488222115 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.04222459361023 - type: cos_sim_spearman value: 83.85931509330395 - type: euclidean_pearson value: 83.26916063876055 - type: euclidean_spearman value: 83.98621985648353 - type: manhattan_pearson value: 83.14935679184327 - type: manhattan_spearman value: 83.87938828586304 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 81.41136639535318 - type: cos_sim_spearman value: 81.51200091040481 - type: euclidean_pearson value: 81.45382456114775 - type: euclidean_spearman value: 81.46201181707931 - type: manhattan_pearson value: 81.37243088439584 - type: manhattan_spearman value: 81.39828421893426 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.71942451732227 - type: cos_sim_spearman value: 87.33044482064973 - type: euclidean_pearson value: 86.58580899365178 - type: euclidean_spearman value: 
87.09206723832895 - type: manhattan_pearson value: 86.47460784157013 - type: manhattan_spearman value: 86.98367656583076 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.55868078863449 - type: cos_sim_spearman value: 85.38299230074065 - type: euclidean_pearson value: 84.64715256244595 - type: euclidean_spearman value: 85.49112229604047 - type: manhattan_pearson value: 84.60814346792462 - type: manhattan_spearman value: 85.44886026766822 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 84.99292526370614 - type: cos_sim_spearman value: 85.58139465695983 - type: euclidean_pearson value: 86.51325066734084 - type: euclidean_spearman value: 85.56736418284562 - type: manhattan_pearson value: 86.48190836601357 - type: manhattan_spearman value: 85.51616256224258 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.54124715078807 - type: cos_sim_spearman value: 65.32134275948374 - type: euclidean_pearson value: 67.09791698300816 - type: euclidean_spearman value: 65.79468982468465 - type: manhattan_pearson value: 67.13304723693966 - type: manhattan_spearman value: 65.68439995849283 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.4231099581624 - type: cos_sim_spearman value: 85.95475815226862 - type: euclidean_pearson value: 85.00339401999706 - type: euclidean_spearman value: 85.74133081802971 - type: manhattan_pearson value: 85.00407987181666 - type: manhattan_spearman value: 85.77509596397363 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.25666719585716 - type: mrr value: 96.32769917083642 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.828 - type: map_at_10 value: 68.369 - type: map_at_100 value: 68.83399999999999 - type: map_at_1000 value: 68.856 - type: map_at_3 value: 65.38000000000001 - type: map_at_5 value: 67.06299999999999 - type: mrr_at_1 value: 61 - type: mrr_at_10 value: 69.45400000000001 - type: mrr_at_100 value: 69.785 - type: mrr_at_1000 value: 69.807 - type: mrr_at_3 value: 67 - type: mrr_at_5 value: 68.43299999999999 - type: ndcg_at_1 value: 61 - type: ndcg_at_10 value: 73.258 - type: ndcg_at_100 value: 75.173 - type: ndcg_at_1000 value: 75.696 - type: ndcg_at_3 value: 68.162 - type: ndcg_at_5 value: 70.53399999999999 - type: precision_at_1 value: 61 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 1.087 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27 - type: precision_at_5 value: 17.666999999999998 - type: recall_at_1 value: 57.828 - type: recall_at_10 value: 87.122 - type: recall_at_100 value: 95.667 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 73.139 - type: recall_at_5 value: 79.361 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: 
MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85247524752475 - type: cos_sim_ap value: 96.25640197639723 - type: cos_sim_f1 value: 92.37851662404091 - type: cos_sim_precision value: 94.55497382198953 - type: cos_sim_recall value: 90.3 - type: dot_accuracy value: 99.76138613861386 - type: dot_ap value: 93.40295864389073 - type: dot_f1 value: 87.64267990074441 - type: dot_precision value: 86.99507389162562 - type: dot_recall value: 88.3 - type: euclidean_accuracy value: 99.85049504950496 - type: euclidean_ap value: 96.24254350525462 - type: euclidean_f1 value: 92.32323232323232 - type: euclidean_precision value: 93.26530612244898 - type: euclidean_recall value: 91.4 - type: manhattan_accuracy value: 99.85346534653465 - type: manhattan_ap value: 96.2635334753325 - type: manhattan_f1 value: 92.37899073120495 - type: manhattan_precision value: 95.22292993630573 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.85346534653465 - type: max_ap value: 96.2635334753325 - type: max_f1 value: 92.37899073120495 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.83905786483794 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.031896152126436 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.551326709447146 - type: mrr value: 55.43758222986165 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.305688567308874 - type: cos_sim_spearman value: 29.27135743434515 - type: dot_pearson value: 30.336741878796563 - type: dot_spearman value: 30.513365725895937 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.245 - type: map_at_10 value: 1.92 - type: map_at_100 value: 10.519 - type: map_at_1000 value: 23.874000000000002 - type: map_at_3 value: 0.629 - type: map_at_5 value: 1.0290000000000001 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 93.5 - type: mrr_at_100 value: 93.5 - type: mrr_at_1000 value: 93.5 - type: mrr_at_3 value: 93 - type: mrr_at_5 value: 93.5 - type: ndcg_at_1 value: 84 - type: ndcg_at_10 value: 76.447 - type: ndcg_at_100 value: 56.516 - type: ndcg_at_1000 value: 48.583999999999996 - type: ndcg_at_3 value: 78.877 - type: ndcg_at_5 value: 79.174 - type: precision_at_1 value: 88 - type: precision_at_10 value: 80.60000000000001 - type: precision_at_100 value: 57.64 - type: precision_at_1000 value: 21.227999999999998 - type: precision_at_3 value: 82 - type: precision_at_5 value: 83.6 - type: recall_at_1 value: 0.245 - type: recall_at_10 value: 2.128 - type: recall_at_100 value: 13.767 - type: recall_at_1000 value: 44.958 - type: recall_at_3 value: 0.654 - type: recall_at_5 value: 1.111 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - 
type: map_at_1 value: 2.5170000000000003 - type: map_at_10 value: 10.915 - type: map_at_100 value: 17.535 - type: map_at_1000 value: 19.042 - type: map_at_3 value: 5.689 - type: map_at_5 value: 7.837 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 49.547999999999995 - type: mrr_at_100 value: 50.653000000000006 - type: mrr_at_1000 value: 50.653000000000006 - type: mrr_at_3 value: 44.558 - type: mrr_at_5 value: 48.333 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 26.543 - type: ndcg_at_100 value: 38.946 - type: ndcg_at_1000 value: 49.406 - type: ndcg_at_3 value: 29.903000000000002 - type: ndcg_at_5 value: 29.231 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 23.265 - type: precision_at_100 value: 8.102 - type: precision_at_1000 value: 1.5 - type: precision_at_3 value: 31.293 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.5170000000000003 - type: recall_at_10 value: 16.88 - type: recall_at_100 value: 49.381 - type: recall_at_1000 value: 81.23899999999999 - type: recall_at_3 value: 6.965000000000001 - type: recall_at_5 value: 10.847999999999999 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.5942 - type: ap value: 13.92074156956546 - type: f1 value: 54.671999698839066 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.39728353140916 - type: f1 value: 59.68980496759517 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 52.11181870104935 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.46957143708649 - type: cos_sim_ap value: 76.16120197845457 - type: cos_sim_f1 value: 69.69919295671315 - type: cos_sim_precision value: 64.94986326344576 - type: cos_sim_recall value: 75.19788918205805 - type: dot_accuracy value: 83.0780234845324 - type: dot_ap value: 64.21717343541934 - type: dot_f1 value: 59.48375497624245 - type: dot_precision value: 57.94345759319489 - type: dot_recall value: 61.108179419525065 - type: euclidean_accuracy value: 86.6543482148179 - type: euclidean_ap value: 76.4527555010203 - type: euclidean_f1 value: 70.10156056477584 - type: euclidean_precision value: 66.05975723622782 - type: euclidean_recall value: 74.67018469656992 - type: manhattan_accuracy value: 86.66030875603504 - type: manhattan_ap value: 76.40304567255436 - type: manhattan_f1 value: 70.05275426328058 - type: manhattan_precision value: 65.4666360926393 - type: manhattan_recall value: 75.32981530343008 - type: max_accuracy value: 86.66030875603504 - type: max_ap value: 76.4527555010203 - type: max_f1 value: 70.10156056477584 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.42123646524624 - type: cos_sim_ap value: 85.15431437761646 - type: 
cos_sim_f1 value: 76.98069301530742 - type: cos_sim_precision value: 72.9314502239063 - type: cos_sim_recall value: 81.50600554357868 - type: dot_accuracy value: 86.70974502270346 - type: dot_ap value: 80.77621563599457 - type: dot_f1 value: 73.87058697285117 - type: dot_precision value: 68.98256396552877 - type: dot_recall value: 79.50415768401602 - type: euclidean_accuracy value: 88.46392672798541 - type: euclidean_ap value: 85.20370297495491 - type: euclidean_f1 value: 77.01372369624886 - type: euclidean_precision value: 73.39052800446397 - type: euclidean_recall value: 81.01324299353249 - type: manhattan_accuracy value: 88.43481973066325 - type: manhattan_ap value: 85.16318289864545 - type: manhattan_f1 value: 76.90884877182597 - type: manhattan_precision value: 74.01737396753062 - type: manhattan_recall value: 80.03541730828458 - type: max_accuracy value: 88.46392672798541 - type: max_ap value: 85.20370297495491 - type: max_f1 value: 77.01372369624886 license: mit language: - en --- **We recommend switching to the newest BAAI/bge-base-en-v1.5, which has a more reasonable similarity distribution and the same method of usage.**

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Contact | Citation | License

For more details, please refer to our GitHub: FlagEmbedding. English | 中文 FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, or semantic search. It can also be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire: - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models. - **Update embedding model**: release embedding model to alleviate the issue of the similarity distribution and enhance its retrieval ability without instruction.
More - 09/07/2023: Update fine-tune code: Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release bge (short for BAAI General Embedding) models, **rank 1st on MTEB and C-MTEB benchmarks!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
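The reranker and embedding models announced above are meant to be combined into a retrieve-then-rerank pipeline. Below is a minimal sketch of that pattern using the FlagEmbedding package; the model names, instruction string, and toy corpus are illustrative placeholders, so substitute the entries from the Model List that you actually use.

```python
# Minimal retrieve-then-rerank sketch (assumes `pip install -U FlagEmbedding`).
# Model names and the toy corpus are illustrative placeholders.
from FlagEmbedding import FlagModel, FlagReranker

corpus = ["BGE maps text to dense vectors.", "Paris is the capital of France."]
query = "What does BGE do?"

# 1) The embedding model retrieves candidates by inner product of its vectors.
embedder = FlagModel(
    "BAAI/bge-base-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)
q_emb = embedder.encode_queries([query])  # instruction is prepended to queries
p_emb = embedder.encode(corpus)           # passages are encoded as-is
scores = (q_emb @ p_emb.T)[0]
top_k = scores.argsort()[::-1][:2]        # indices of the top-2 candidates

# 2) The cross-encoder re-ranks the retrieved candidates.
reranker = FlagReranker("BAAI/bge-reranker-base")
rerank_scores = reranker.compute_score([[query, corpus[i]] for i in top_k])
print(sorted(zip(rerank_scores, top_k), reverse=True))
```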
## Model List bge is short for BAAI General Embedding. | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from the embedding model, the reranker uses the question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, the cross-encoder is widely used to re-rank the top-k documents retrieved by other simple models. For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results. All models have been uploaded to the Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions
1. How to fine-tune bge embedding model? Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve the retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity; it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank top-k results. Hard negatives are also needed to fine-tune the reranker.
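To make the data requirements above concrete, here is a minimal sketch of the query/pos/neg JSON-lines format that the fine-tuning and hard-negative-mining examples consume; the record contents here are invented purely for illustration.

```python
# Illustrative record in the query / pos / neg JSONL format used by the
# FlagEmbedding fine-tuning examples (contents here are made up).
import json

record = {
    "query": "what does bge stand for",
    "pos": ["bge is short for BAAI General Embedding."],  # relevant passages
    "neg": ["The capital of France is Paris."],           # mined hard negatives
}
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```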
2. The similarity score between two dissimilar sentences is higher than 0.5 **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model lies roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9).
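A short sketch of this point, assuming the sentence-transformers package and bge-base-en-v1.5 (any bge model behaves similarly); the threshold value is a placeholder to be tuned on your own data.

```python
# Minimal sketch: scores from bge are compressed into roughly [0.6, 1],
# so compare scores relatively or pick a threshold tuned on your own data.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
sentences = ["A cat sits on the mat.", "A dog plays in the park.", "A kitten rests on the rug."]
emb = model.encode(sentences, normalize_embeddings=True)
sim = emb @ emb.T  # cosine similarity, since embeddings are normalized

print(sim[0, 1], sim[0, 2])  # even the dissimilar pair may score above 0.5
THRESHOLD = 0.85             # hypothetical value; tune on your own data
print(sim[0, 2] > THRESHOLD)
```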
3. When does the query instruction need to be used? For bge v1.5, we improved its retrieval ability when not using an instruction; using no instruction causes only a slight degradation in retrieval performance compared with using one. So you can generate embeddings without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, the documents/passages do not need the instruction.
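For instance, a minimal s2p sketch with sentence-transformers might look like the following; the instruction string shown is the one used for the English bge models (see the Model List), and the query/passage texts are placeholders.

```python
# Sketch: prepend the retrieval instruction to short queries only (s2p retrieval);
# passages and symmetric tasks use the raw text.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
instruction = "Represent this sentence for searching relevant passages: "

queries = ["how to bake bread"]
passages = ["Preheat the oven to 230C and bake the dough for 30 minutes."]

q_emb = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)  # no instruction on passages
print(q_emb @ p_emb.T)
```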
## Usage ### Usage for Embedding Model Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For the s2p (short query to long passage) retrieval task, each short query should start with an instruction (see Model List for instructions). But the instruction is not needed for passages. #### Using Langchain You can use it in Langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from the embedding model, the reranker uses the question and document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and passage to the reranker. The reranker is optimized based on cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation bge models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 |
77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for the evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly; it needs to be fine-tuned. For more training details for bge, see baai_general_embedding.
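As a rough sketch of the contrastive objective described above (with the temperature of 0.01 mentioned in the FAQ), an in-batch InfoNCE loss looks like the following; this illustrates the idea only and is not the project's actual training code.

```python
# Toy in-batch contrastive (InfoNCE) loss with temperature 0.01, as a sketch
# of the training objective described above -- not the actual training code.
import torch
import torch.nn.functional as F

def info_nce(q, p, temperature=0.01):
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.T / temperature    # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))  # the i-th passage matches the i-th query
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
```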
### BGE Reranker The cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model. We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao(stxiao@baai.ac.cn) and Zheng Liu(liuzheng@baai.ac.cn). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Performs text classification, retrieval, clustering, reranking, and semantic textual similarity tasks across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-large-en-v1.5.json b/data/model_data_json/BAAI_bge-large-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..180de31ea8f7055fbe9d85d7082c09a1ecd54e99 --- /dev/null +++ b/data/model_data_json/BAAI_bge-large-en-v1.5.json @@ -0,0 +1,29 @@ +{ + "model_id": "BAAI/bge-large-en-v1.5", + "downloads": 2187819, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "mteb", + "en", + "arxiv:2401.03462", + "arxiv:2312.15503", + "arxiv:2311.13534", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-large-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.8507462686567 - type: ap value: 38.566457320228245 - type: f1 value: 69.69386648043475 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.416675 - type: ap value: 89.1928861155922 - type: f1 value: 92.39477019574215 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.175999999999995 - type: f1 value: 47.80712792870253 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 40.184999999999995 - type: map_at_10 value: 55.654 - type: map_at_100 value: 56.25 - type: map_at_1000 value: 56.255 - type: map_at_3 value: 51.742999999999995 - type: map_at_5 value: 54.129000000000005 - type: mrr_at_1 value: 40.967 - type: mrr_at_10 value: 55.96 - type: mrr_at_100 value: 56.54900000000001 - type: mrr_at_1000 value: 56.554 - type: mrr_at_3 value: 51.980000000000004 - type: mrr_at_5 value: 54.44 - type: ndcg_at_1
value: 40.184999999999995 - type: ndcg_at_10 value: 63.542 - type: ndcg_at_100 value: 65.96499999999999 - type: ndcg_at_1000 value: 66.08699999999999 - type: ndcg_at_3 value: 55.582 - type: ndcg_at_5 value: 59.855000000000004 - type: precision_at_1 value: 40.184999999999995 - type: precision_at_10 value: 8.841000000000001 - type: precision_at_100 value: 0.987 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.238 - type: precision_at_5 value: 15.405 - type: recall_at_1 value: 40.184999999999995 - type: recall_at_10 value: 88.407 - type: recall_at_100 value: 98.72 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.714 - type: recall_at_5 value: 77.027 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.567077926750066 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.19453389182364 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 64.46555939623092 - type: mrr value: 77.82361605768807 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 84.9554128814735 - type: cos_sim_spearman value: 84.65373612172036 - type: euclidean_pearson value: 83.2905059954138 - type: euclidean_spearman value: 84.52240782811128 - type: manhattan_pearson value: 82.99533802997436 - type: manhattan_spearman value: 84.20673798475734 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.78896103896103 - type: f1 value: 87.77189310964883 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.714538337650495 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.90108349284447 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.795 - type: map_at_10 value: 43.669000000000004 - type: map_at_100 value: 45.151 - type: map_at_1000 value: 45.278 - type: map_at_3 value: 40.006 - type: map_at_5 value: 42.059999999999995 - type: mrr_at_1 value: 39.771 - type: mrr_at_10 value: 49.826 - type: mrr_at_100 value: 50.504000000000005 - type: mrr_at_1000 value: 50.549 - type: mrr_at_3 value: 47.115 - type: mrr_at_5 value: 48.832 - type: ndcg_at_1 value: 39.771 - type: ndcg_at_10 value: 50.217999999999996 - type: ndcg_at_100 value: 55.454 - type: ndcg_at_1000 value: 57.37 - type: ndcg_at_3 value: 44.885000000000005 - type: ndcg_at_5 value: 47.419 - type: precision_at_1 value: 39.771 - type: precision_at_10 value: 9.642000000000001 - type: precision_at_100 value: 1.538 - type: precision_at_1000 value: 0.198 
- type: precision_at_3 value: 21.268 - type: precision_at_5 value: 15.536 - type: recall_at_1 value: 32.795 - type: recall_at_10 value: 62.580999999999996 - type: recall_at_100 value: 84.438 - type: recall_at_1000 value: 96.492 - type: recall_at_3 value: 47.071000000000005 - type: recall_at_5 value: 54.079 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.671 - type: map_at_10 value: 43.334 - type: map_at_100 value: 44.566 - type: map_at_1000 value: 44.702999999999996 - type: map_at_3 value: 40.343 - type: map_at_5 value: 41.983 - type: mrr_at_1 value: 40.764 - type: mrr_at_10 value: 49.382 - type: mrr_at_100 value: 49.988 - type: mrr_at_1000 value: 50.03300000000001 - type: mrr_at_3 value: 47.293 - type: mrr_at_5 value: 48.51 - type: ndcg_at_1 value: 40.764 - type: ndcg_at_10 value: 49.039 - type: ndcg_at_100 value: 53.259 - type: ndcg_at_1000 value: 55.253 - type: ndcg_at_3 value: 45.091 - type: ndcg_at_5 value: 46.839999999999996 - type: precision_at_1 value: 40.764 - type: precision_at_10 value: 9.191 - type: precision_at_100 value: 1.476 - type: precision_at_1000 value: 0.19499999999999998 - type: precision_at_3 value: 21.72 - type: precision_at_5 value: 15.299 - type: recall_at_1 value: 32.671 - type: recall_at_10 value: 58.816 - type: recall_at_100 value: 76.654 - type: recall_at_1000 value: 89.05999999999999 - type: recall_at_3 value: 46.743 - type: recall_at_5 value: 51.783 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.328 - type: map_at_10 value: 53.32599999999999 - type: map_at_100 value: 54.37499999999999 - type: map_at_1000 value: 54.429 - type: map_at_3 value: 49.902 - type: map_at_5 value: 52.002 - type: mrr_at_1 value: 46.332 - type: mrr_at_10 value: 56.858 - type: mrr_at_100 value: 57.522 - type: mrr_at_1000 value: 57.54899999999999 - type: mrr_at_3 value: 54.472 - type: mrr_at_5 value: 55.996 - type: ndcg_at_1 value: 46.332 - type: ndcg_at_10 value: 59.313 - type: ndcg_at_100 value: 63.266999999999996 - type: ndcg_at_1000 value: 64.36 - type: ndcg_at_3 value: 53.815000000000005 - type: ndcg_at_5 value: 56.814 - type: precision_at_1 value: 46.332 - type: precision_at_10 value: 9.53 - type: precision_at_100 value: 1.238 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.054000000000002 - type: precision_at_5 value: 16.589000000000002 - type: recall_at_1 value: 40.328 - type: recall_at_10 value: 73.421 - type: recall_at_100 value: 90.059 - type: recall_at_1000 value: 97.81 - type: recall_at_3 value: 59.009 - type: recall_at_5 value: 66.352 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.424 - type: map_at_10 value: 36.332 - type: map_at_100 value: 37.347 - type: map_at_1000 value: 37.422 - type: map_at_3 value: 33.743 - type: map_at_5 value: 35.176 - type: mrr_at_1 value: 29.153000000000002 - type: mrr_at_10 value: 38.233 - type: mrr_at_100 value: 39.109 - type: mrr_at_1000 value: 39.164 - type: mrr_at_3 value: 35.876000000000005 - type: mrr_at_5 value: 37.169000000000004 - type: ndcg_at_1 value: 29.153000000000002 - type: ndcg_at_10 value: 41.439 - type: ndcg_at_100 value: 46.42 - type: ndcg_at_1000 value: 48.242000000000004 - type: ndcg_at_3 value: 36.362 - type: ndcg_at_5 
value: 38.743 - type: precision_at_1 value: 29.153000000000002 - type: precision_at_10 value: 6.315999999999999 - type: precision_at_100 value: 0.927 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 15.443000000000001 - type: precision_at_5 value: 10.644 - type: recall_at_1 value: 27.424 - type: recall_at_10 value: 55.364000000000004 - type: recall_at_100 value: 78.211 - type: recall_at_1000 value: 91.74600000000001 - type: recall_at_3 value: 41.379 - type: recall_at_5 value: 47.14 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.601 - type: map_at_10 value: 27.826 - type: map_at_100 value: 29.017 - type: map_at_1000 value: 29.137 - type: map_at_3 value: 25.125999999999998 - type: map_at_5 value: 26.765 - type: mrr_at_1 value: 24.005000000000003 - type: mrr_at_10 value: 32.716 - type: mrr_at_100 value: 33.631 - type: mrr_at_1000 value: 33.694 - type: mrr_at_3 value: 29.934 - type: mrr_at_5 value: 31.630999999999997 - type: ndcg_at_1 value: 24.005000000000003 - type: ndcg_at_10 value: 33.158 - type: ndcg_at_100 value: 38.739000000000004 - type: ndcg_at_1000 value: 41.495 - type: ndcg_at_3 value: 28.185 - type: ndcg_at_5 value: 30.796 - type: precision_at_1 value: 24.005000000000003 - type: precision_at_10 value: 5.908 - type: precision_at_100 value: 1.005 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 13.391 - type: precision_at_5 value: 9.876 - type: recall_at_1 value: 19.601 - type: recall_at_10 value: 44.746 - type: recall_at_100 value: 68.82300000000001 - type: recall_at_1000 value: 88.215 - type: recall_at_3 value: 31.239 - type: recall_at_5 value: 37.695 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.130000000000003 - type: map_at_10 value: 40.96 - type: map_at_100 value: 42.282 - type: map_at_1000 value: 42.392 - type: map_at_3 value: 37.889 - type: map_at_5 value: 39.661 - type: mrr_at_1 value: 36.958999999999996 - type: mrr_at_10 value: 46.835 - type: mrr_at_100 value: 47.644 - type: mrr_at_1000 value: 47.688 - type: mrr_at_3 value: 44.562000000000005 - type: mrr_at_5 value: 45.938 - type: ndcg_at_1 value: 36.958999999999996 - type: ndcg_at_10 value: 47.06 - type: ndcg_at_100 value: 52.345 - type: ndcg_at_1000 value: 54.35 - type: ndcg_at_3 value: 42.301 - type: ndcg_at_5 value: 44.635999999999996 - type: precision_at_1 value: 36.958999999999996 - type: precision_at_10 value: 8.479000000000001 - type: precision_at_100 value: 1.284 - type: precision_at_1000 value: 0.163 - type: precision_at_3 value: 20.244 - type: precision_at_5 value: 14.224999999999998 - type: recall_at_1 value: 30.130000000000003 - type: recall_at_10 value: 59.27 - type: recall_at_100 value: 81.195 - type: recall_at_1000 value: 94.21199999999999 - type: recall_at_3 value: 45.885 - type: recall_at_5 value: 52.016 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.169999999999998 - type: map_at_10 value: 36.451 - type: map_at_100 value: 37.791000000000004 - type: map_at_1000 value: 37.897 - type: map_at_3 value: 33.109 - type: map_at_5 value: 34.937000000000005 - type: mrr_at_1 value: 32.877 - type: mrr_at_10 value: 42.368 - type: mrr_at_100 value: 43.201 - type: 
mrr_at_1000 value: 43.259 - type: mrr_at_3 value: 39.763999999999996 - type: mrr_at_5 value: 41.260000000000005 - type: ndcg_at_1 value: 32.877 - type: ndcg_at_10 value: 42.659000000000006 - type: ndcg_at_100 value: 48.161 - type: ndcg_at_1000 value: 50.345 - type: ndcg_at_3 value: 37.302 - type: ndcg_at_5 value: 39.722 - type: precision_at_1 value: 32.877 - type: precision_at_10 value: 7.9 - type: precision_at_100 value: 1.236 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.846 - type: precision_at_5 value: 12.9 - type: recall_at_1 value: 26.169999999999998 - type: recall_at_10 value: 55.35 - type: recall_at_100 value: 78.755 - type: recall_at_1000 value: 93.518 - type: recall_at_3 value: 40.176 - type: recall_at_5 value: 46.589000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.15516666666667 - type: map_at_10 value: 36.65741666666667 - type: map_at_100 value: 37.84991666666666 - type: map_at_1000 value: 37.96316666666667 - type: map_at_3 value: 33.74974999999999 - type: map_at_5 value: 35.3765 - type: mrr_at_1 value: 32.08233333333334 - type: mrr_at_10 value: 41.033833333333334 - type: mrr_at_100 value: 41.84524999999999 - type: mrr_at_1000 value: 41.89983333333333 - type: mrr_at_3 value: 38.62008333333333 - type: mrr_at_5 value: 40.03441666666666 - type: ndcg_at_1 value: 32.08233333333334 - type: ndcg_at_10 value: 42.229 - type: ndcg_at_100 value: 47.26716666666667 - type: ndcg_at_1000 value: 49.43466666666667 - type: ndcg_at_3 value: 37.36408333333333 - type: ndcg_at_5 value: 39.6715 - type: precision_at_1 value: 32.08233333333334 - type: precision_at_10 value: 7.382583333333334 - type: precision_at_100 value: 1.16625 - type: precision_at_1000 value: 0.15408333333333332 - type: precision_at_3 value: 17.218 - type: precision_at_5 value: 12.21875 - type: recall_at_1 value: 27.15516666666667 - type: recall_at_10 value: 54.36683333333333 - type: recall_at_100 value: 76.37183333333333 - type: recall_at_1000 value: 91.26183333333333 - type: recall_at_3 value: 40.769916666666674 - type: recall_at_5 value: 46.702333333333335 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.749 - type: map_at_10 value: 33.001999999999995 - type: map_at_100 value: 33.891 - type: map_at_1000 value: 33.993 - type: map_at_3 value: 30.703999999999997 - type: map_at_5 value: 31.959 - type: mrr_at_1 value: 28.834 - type: mrr_at_10 value: 35.955 - type: mrr_at_100 value: 36.709 - type: mrr_at_1000 value: 36.779 - type: mrr_at_3 value: 33.947 - type: mrr_at_5 value: 35.089 - type: ndcg_at_1 value: 28.834 - type: ndcg_at_10 value: 37.329 - type: ndcg_at_100 value: 41.79 - type: ndcg_at_1000 value: 44.169000000000004 - type: ndcg_at_3 value: 33.184999999999995 - type: ndcg_at_5 value: 35.107 - type: precision_at_1 value: 28.834 - type: precision_at_10 value: 5.7669999999999995 - type: precision_at_100 value: 0.876 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 14.213000000000001 - type: precision_at_5 value: 9.754999999999999 - type: recall_at_1 value: 25.749 - type: recall_at_10 value: 47.791 - type: recall_at_100 value: 68.255 - type: recall_at_1000 value: 85.749 - type: recall_at_3 value: 36.199 - type: recall_at_5 value: 41.071999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB 
CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.777 - type: map_at_10 value: 25.201 - type: map_at_100 value: 26.423999999999996 - type: map_at_1000 value: 26.544 - type: map_at_3 value: 22.869 - type: map_at_5 value: 24.023 - type: mrr_at_1 value: 21.473 - type: mrr_at_10 value: 29.12 - type: mrr_at_100 value: 30.144 - type: mrr_at_1000 value: 30.215999999999998 - type: mrr_at_3 value: 26.933 - type: mrr_at_5 value: 28.051 - type: ndcg_at_1 value: 21.473 - type: ndcg_at_10 value: 30.003 - type: ndcg_at_100 value: 35.766 - type: ndcg_at_1000 value: 38.501000000000005 - type: ndcg_at_3 value: 25.773000000000003 - type: ndcg_at_5 value: 27.462999999999997 - type: precision_at_1 value: 21.473 - type: precision_at_10 value: 5.482 - type: precision_at_100 value: 0.975 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.205 - type: precision_at_5 value: 8.692 - type: recall_at_1 value: 17.777 - type: recall_at_10 value: 40.582 - type: recall_at_100 value: 66.305 - type: recall_at_1000 value: 85.636 - type: recall_at_3 value: 28.687 - type: recall_at_5 value: 33.089 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.677 - type: map_at_10 value: 36.309000000000005 - type: map_at_100 value: 37.403999999999996 - type: map_at_1000 value: 37.496 - type: map_at_3 value: 33.382 - type: map_at_5 value: 34.98 - type: mrr_at_1 value: 31.343 - type: mrr_at_10 value: 40.549 - type: mrr_at_100 value: 41.342 - type: mrr_at_1000 value: 41.397 - type: mrr_at_3 value: 38.029 - type: mrr_at_5 value: 39.451 - type: ndcg_at_1 value: 31.343 - type: ndcg_at_10 value: 42.1 - type: ndcg_at_100 value: 47.089999999999996 - type: ndcg_at_1000 value: 49.222 - type: ndcg_at_3 value: 36.836999999999996 - type: ndcg_at_5 value: 39.21 - type: precision_at_1 value: 31.343 - type: precision_at_10 value: 7.164 - type: precision_at_100 value: 1.0959999999999999 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 16.915 - type: precision_at_5 value: 11.940000000000001 - type: recall_at_1 value: 26.677 - type: recall_at_10 value: 55.54599999999999 - type: recall_at_100 value: 77.094 - type: recall_at_1000 value: 92.01 - type: recall_at_3 value: 41.191 - type: recall_at_5 value: 47.006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.501 - type: map_at_10 value: 33.102 - type: map_at_100 value: 34.676 - type: map_at_1000 value: 34.888000000000005 - type: map_at_3 value: 29.944 - type: map_at_5 value: 31.613999999999997 - type: mrr_at_1 value: 29.447000000000003 - type: mrr_at_10 value: 37.996 - type: mrr_at_100 value: 38.946 - type: mrr_at_1000 value: 38.995000000000005 - type: mrr_at_3 value: 35.079 - type: mrr_at_5 value: 36.69 - type: ndcg_at_1 value: 29.447000000000003 - type: ndcg_at_10 value: 39.232 - type: ndcg_at_100 value: 45.247 - type: ndcg_at_1000 value: 47.613 - type: ndcg_at_3 value: 33.922999999999995 - type: ndcg_at_5 value: 36.284 - type: precision_at_1 value: 29.447000000000003 - type: precision_at_10 value: 7.648000000000001 - type: precision_at_100 value: 1.516 - type: precision_at_1000 value: 0.23900000000000002 - type: precision_at_3 value: 16.008 - type: precision_at_5 value: 11.779 - type: recall_at_1 value: 24.501 - type: recall_at_10 value: 
51.18899999999999 - type: recall_at_100 value: 78.437 - type: recall_at_1000 value: 92.842 - type: recall_at_3 value: 35.808 - type: recall_at_5 value: 42.197 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.039 - type: map_at_10 value: 30.377 - type: map_at_100 value: 31.275 - type: map_at_1000 value: 31.379 - type: map_at_3 value: 27.98 - type: map_at_5 value: 29.358 - type: mrr_at_1 value: 24.03 - type: mrr_at_10 value: 32.568000000000005 - type: mrr_at_100 value: 33.403 - type: mrr_at_1000 value: 33.475 - type: mrr_at_3 value: 30.436999999999998 - type: mrr_at_5 value: 31.796000000000003 - type: ndcg_at_1 value: 24.03 - type: ndcg_at_10 value: 35.198 - type: ndcg_at_100 value: 39.668 - type: ndcg_at_1000 value: 42.296 - type: ndcg_at_3 value: 30.709999999999997 - type: ndcg_at_5 value: 33.024 - type: precision_at_1 value: 24.03 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.828 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 13.309000000000001 - type: precision_at_5 value: 9.39 - type: recall_at_1 value: 22.039 - type: recall_at_10 value: 47.746 - type: recall_at_100 value: 68.23599999999999 - type: recall_at_1000 value: 87.852 - type: recall_at_3 value: 35.852000000000004 - type: recall_at_5 value: 41.410000000000004 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 15.692999999999998 - type: map_at_10 value: 26.903 - type: map_at_100 value: 28.987000000000002 - type: map_at_1000 value: 29.176999999999996 - type: map_at_3 value: 22.137 - type: map_at_5 value: 24.758 - type: mrr_at_1 value: 35.57 - type: mrr_at_10 value: 47.821999999999996 - type: mrr_at_100 value: 48.608000000000004 - type: mrr_at_1000 value: 48.638999999999996 - type: mrr_at_3 value: 44.452000000000005 - type: mrr_at_5 value: 46.546 - type: ndcg_at_1 value: 35.57 - type: ndcg_at_10 value: 36.567 - type: ndcg_at_100 value: 44.085 - type: ndcg_at_1000 value: 47.24 - type: ndcg_at_3 value: 29.964000000000002 - type: ndcg_at_5 value: 32.511 - type: precision_at_1 value: 35.57 - type: precision_at_10 value: 11.485 - type: precision_at_100 value: 1.9619999999999997 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 22.237000000000002 - type: precision_at_5 value: 17.471999999999998 - type: recall_at_1 value: 15.692999999999998 - type: recall_at_10 value: 43.056 - type: recall_at_100 value: 68.628 - type: recall_at_1000 value: 86.075 - type: recall_at_3 value: 26.918999999999997 - type: recall_at_5 value: 34.14 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.53 - type: map_at_10 value: 20.951 - type: map_at_100 value: 30.136000000000003 - type: map_at_1000 value: 31.801000000000002 - type: map_at_3 value: 15.021 - type: map_at_5 value: 17.471999999999998 - type: mrr_at_1 value: 71.0 - type: mrr_at_10 value: 79.176 - type: mrr_at_100 value: 79.418 - type: mrr_at_1000 value: 79.426 - type: mrr_at_3 value: 78.125 - type: mrr_at_5 value: 78.61200000000001 - type: ndcg_at_1 value: 58.5 - type: ndcg_at_10 value: 44.106 - type: ndcg_at_100 value: 49.268 - type: ndcg_at_1000 value: 56.711999999999996 - type: ndcg_at_3 value: 48.934 - type: ndcg_at_5 value: 45.826 - type: precision_at_1 value: 71.0 - type: precision_at_10 value: 35.0 - type: 
precision_at_100 value: 11.360000000000001 - type: precision_at_1000 value: 2.046 - type: precision_at_3 value: 52.833 - type: precision_at_5 value: 44.15 - type: recall_at_1 value: 9.53 - type: recall_at_10 value: 26.811 - type: recall_at_100 value: 55.916999999999994 - type: recall_at_1000 value: 79.973 - type: recall_at_3 value: 16.413 - type: recall_at_5 value: 19.980999999999998 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.519999999999996 - type: f1 value: 46.36601294761231 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 74.413 - type: map_at_10 value: 83.414 - type: map_at_100 value: 83.621 - type: map_at_1000 value: 83.635 - type: map_at_3 value: 82.337 - type: map_at_5 value: 83.039 - type: mrr_at_1 value: 80.19800000000001 - type: mrr_at_10 value: 87.715 - type: mrr_at_100 value: 87.778 - type: mrr_at_1000 value: 87.779 - type: mrr_at_3 value: 87.106 - type: mrr_at_5 value: 87.555 - type: ndcg_at_1 value: 80.19800000000001 - type: ndcg_at_10 value: 87.182 - type: ndcg_at_100 value: 87.90299999999999 - type: ndcg_at_1000 value: 88.143 - type: ndcg_at_3 value: 85.60600000000001 - type: ndcg_at_5 value: 86.541 - type: precision_at_1 value: 80.19800000000001 - type: precision_at_10 value: 10.531 - type: precision_at_100 value: 1.113 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.933 - type: precision_at_5 value: 20.429 - type: recall_at_1 value: 74.413 - type: recall_at_10 value: 94.363 - type: recall_at_100 value: 97.165 - type: recall_at_1000 value: 98.668 - type: recall_at_3 value: 90.108 - type: recall_at_5 value: 92.52 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.701 - type: map_at_10 value: 37.122 - type: map_at_100 value: 39.178000000000004 - type: map_at_1000 value: 39.326 - type: map_at_3 value: 32.971000000000004 - type: map_at_5 value: 35.332 - type: mrr_at_1 value: 44.753 - type: mrr_at_10 value: 53.452 - type: mrr_at_100 value: 54.198 - type: mrr_at_1000 value: 54.225 - type: mrr_at_3 value: 50.952 - type: mrr_at_5 value: 52.464 - type: ndcg_at_1 value: 44.753 - type: ndcg_at_10 value: 45.021 - type: ndcg_at_100 value: 52.028 - type: ndcg_at_1000 value: 54.596000000000004 - type: ndcg_at_3 value: 41.622 - type: ndcg_at_5 value: 42.736000000000004 - type: precision_at_1 value: 44.753 - type: precision_at_10 value: 12.284 - type: precision_at_100 value: 1.955 - type: precision_at_1000 value: 0.243 - type: precision_at_3 value: 27.828999999999997 - type: precision_at_5 value: 20.061999999999998 - type: recall_at_1 value: 22.701 - type: recall_at_10 value: 51.432 - type: recall_at_100 value: 77.009 - type: recall_at_1000 value: 92.511 - type: recall_at_3 value: 37.919000000000004 - type: recall_at_5 value: 44.131 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.189 - type: map_at_10 value: 66.24600000000001 - type: map_at_100 value: 67.098 - type: map_at_1000 value: 67.149 - type: map_at_3 value: 62.684 - type: map_at_5 value: 64.974 - type: mrr_at_1 value: 80.378 - type: mrr_at_10 value: 86.127 - type: mrr_at_100 value: 86.29299999999999 - type: mrr_at_1000 value: 86.297 - type: mrr_at_3 value: 
85.31400000000001 - type: mrr_at_5 value: 85.858 - type: ndcg_at_1 value: 80.378 - type: ndcg_at_10 value: 74.101 - type: ndcg_at_100 value: 76.993 - type: ndcg_at_1000 value: 77.948 - type: ndcg_at_3 value: 69.232 - type: ndcg_at_5 value: 72.04599999999999 - type: precision_at_1 value: 80.378 - type: precision_at_10 value: 15.595999999999998 - type: precision_at_100 value: 1.7840000000000003 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 44.884 - type: precision_at_5 value: 29.145 - type: recall_at_1 value: 40.189 - type: recall_at_10 value: 77.981 - type: recall_at_100 value: 89.21 - type: recall_at_1000 value: 95.48299999999999 - type: recall_at_3 value: 67.326 - type: recall_at_5 value: 72.863 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.84599999999999 - type: ap value: 89.4710787567357 - type: f1 value: 92.83752676932258 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.132 - type: map_at_10 value: 35.543 - type: map_at_100 value: 36.702 - type: map_at_1000 value: 36.748999999999995 - type: map_at_3 value: 31.737 - type: map_at_5 value: 33.927 - type: mrr_at_1 value: 23.782 - type: mrr_at_10 value: 36.204 - type: mrr_at_100 value: 37.29 - type: mrr_at_1000 value: 37.330999999999996 - type: mrr_at_3 value: 32.458999999999996 - type: mrr_at_5 value: 34.631 - type: ndcg_at_1 value: 23.782 - type: ndcg_at_10 value: 42.492999999999995 - type: ndcg_at_100 value: 47.985 - type: ndcg_at_1000 value: 49.141 - type: ndcg_at_3 value: 34.748000000000005 - type: ndcg_at_5 value: 38.651 - type: precision_at_1 value: 23.782 - type: precision_at_10 value: 6.665 - type: precision_at_100 value: 0.941 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.776 - type: precision_at_5 value: 10.84 - type: recall_at_1 value: 23.132 - type: recall_at_10 value: 63.794 - type: recall_at_100 value: 89.027 - type: recall_at_1000 value: 97.807 - type: recall_at_3 value: 42.765 - type: recall_at_5 value: 52.11 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.59188326493388 - type: f1 value: 94.3842594786827 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 79.49384404924761 - type: f1 value: 59.7580539534629 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.56220578345663 - type: f1 value: 75.27228165561478 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.53463349024884 - type: f1 value: 80.4893958236536 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 32.56100273484962 - task: type: Clustering dataset: type: 
mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.470380028839607 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.06102792457849 - type: mrr value: 33.30709199672238 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.776999999999999 - type: map_at_10 value: 14.924000000000001 - type: map_at_100 value: 18.955 - type: map_at_1000 value: 20.538999999999998 - type: map_at_3 value: 10.982 - type: map_at_5 value: 12.679000000000002 - type: mrr_at_1 value: 47.988 - type: mrr_at_10 value: 57.232000000000006 - type: mrr_at_100 value: 57.818999999999996 - type: mrr_at_1000 value: 57.847 - type: mrr_at_3 value: 54.901999999999994 - type: mrr_at_5 value: 56.481 - type: ndcg_at_1 value: 46.594 - type: ndcg_at_10 value: 38.129000000000005 - type: ndcg_at_100 value: 35.54 - type: ndcg_at_1000 value: 44.172 - type: ndcg_at_3 value: 43.025999999999996 - type: ndcg_at_5 value: 41.052 - type: precision_at_1 value: 47.988 - type: precision_at_10 value: 28.111000000000004 - type: precision_at_100 value: 8.929 - type: precision_at_1000 value: 2.185 - type: precision_at_3 value: 40.144000000000005 - type: precision_at_5 value: 35.232 - type: recall_at_1 value: 6.776999999999999 - type: recall_at_10 value: 19.289 - type: recall_at_100 value: 36.359 - type: recall_at_1000 value: 67.54 - type: recall_at_3 value: 11.869 - type: recall_at_5 value: 14.999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.108000000000004 - type: map_at_10 value: 47.126000000000005 - type: map_at_100 value: 48.171 - type: map_at_1000 value: 48.199 - type: map_at_3 value: 42.734 - type: map_at_5 value: 45.362 - type: mrr_at_1 value: 34.936 - type: mrr_at_10 value: 49.571 - type: mrr_at_100 value: 50.345 - type: mrr_at_1000 value: 50.363 - type: mrr_at_3 value: 45.959 - type: mrr_at_5 value: 48.165 - type: ndcg_at_1 value: 34.936 - type: ndcg_at_10 value: 55.028999999999996 - type: ndcg_at_100 value: 59.244 - type: ndcg_at_1000 value: 59.861 - type: ndcg_at_3 value: 46.872 - type: ndcg_at_5 value: 51.217999999999996 - type: precision_at_1 value: 34.936 - type: precision_at_10 value: 9.099 - type: precision_at_100 value: 1.145 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 21.456 - type: precision_at_5 value: 15.411 - type: recall_at_1 value: 31.108000000000004 - type: recall_at_10 value: 76.53999999999999 - type: recall_at_100 value: 94.39 - type: recall_at_1000 value: 98.947 - type: recall_at_3 value: 55.572 - type: recall_at_5 value: 65.525 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.56400000000001 - type: map_at_10 value: 85.482 - type: map_at_100 value: 86.114 - type: map_at_1000 value: 86.13 - type: map_at_3 value: 82.607 - type: map_at_5 value: 84.405 - type: mrr_at_1 value: 82.42 - type: mrr_at_10 value: 88.304 - type: mrr_at_100 value: 88.399 - type: mrr_at_1000 value: 88.399 - type: mrr_at_3 value: 87.37 - type: mrr_at_5 value: 88.024 - type: ndcg_at_1 value: 82.45 - type: ndcg_at_10 value: 89.06500000000001 - type: ndcg_at_100 value: 90.232 - type: 
ndcg_at_1000 value: 90.305 - type: ndcg_at_3 value: 86.375 - type: ndcg_at_5 value: 87.85300000000001 - type: precision_at_1 value: 82.45 - type: precision_at_10 value: 13.486999999999998 - type: precision_at_100 value: 1.534 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.813 - type: precision_at_5 value: 24.773999999999997 - type: recall_at_1 value: 71.56400000000001 - type: recall_at_10 value: 95.812 - type: recall_at_100 value: 99.7 - type: recall_at_1000 value: 99.979 - type: recall_at_3 value: 87.966 - type: recall_at_5 value: 92.268 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 57.241876648614145 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 64.66212576446223 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.308 - type: map_at_10 value: 13.803 - type: map_at_100 value: 16.176 - type: map_at_1000 value: 16.561 - type: map_at_3 value: 9.761000000000001 - type: map_at_5 value: 11.802 - type: mrr_at_1 value: 26.200000000000003 - type: mrr_at_10 value: 37.621 - type: mrr_at_100 value: 38.767 - type: mrr_at_1000 value: 38.815 - type: mrr_at_3 value: 34.117 - type: mrr_at_5 value: 36.107 - type: ndcg_at_1 value: 26.200000000000003 - type: ndcg_at_10 value: 22.64 - type: ndcg_at_100 value: 31.567 - type: ndcg_at_1000 value: 37.623 - type: ndcg_at_3 value: 21.435000000000002 - type: ndcg_at_5 value: 18.87 - type: precision_at_1 value: 26.200000000000003 - type: precision_at_10 value: 11.74 - type: precision_at_100 value: 2.465 - type: precision_at_1000 value: 0.391 - type: precision_at_3 value: 20.033 - type: precision_at_5 value: 16.64 - type: recall_at_1 value: 5.308 - type: recall_at_10 value: 23.794999999999998 - type: recall_at_100 value: 50.015 - type: recall_at_1000 value: 79.283 - type: recall_at_3 value: 12.178 - type: recall_at_5 value: 16.882 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.93231134675553 - type: cos_sim_spearman value: 81.68319292603205 - type: euclidean_pearson value: 81.8396814380367 - type: euclidean_spearman value: 81.24641903349945 - type: manhattan_pearson value: 81.84698799204274 - type: manhattan_spearman value: 81.24269997904105 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.73241671587446 - type: cos_sim_spearman value: 79.05091082971826 - type: euclidean_pearson value: 83.91146869578044 - type: euclidean_spearman value: 79.87978465370936 - type: manhattan_pearson value: 83.90888338917678 - type: manhattan_spearman value: 79.87482848584241 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 85.14970731146177 - type: cos_sim_spearman value: 86.37363490084627 - type: euclidean_pearson value: 83.02154218530433 - type: euclidean_spearman value: 83.80258761957367 - type: manhattan_pearson value: 83.01664495119347 - type: 
manhattan_spearman value: 83.77567458007952 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.40474139886784 - type: cos_sim_spearman value: 82.77768789165984 - type: euclidean_pearson value: 80.7065877443695 - type: euclidean_spearman value: 81.375940662505 - type: manhattan_pearson value: 80.6507552270278 - type: manhattan_spearman value: 81.32782179098741 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 87.08585968722274 - type: cos_sim_spearman value: 88.03110031451399 - type: euclidean_pearson value: 85.74012019602384 - type: euclidean_spearman value: 86.13592849438209 - type: manhattan_pearson value: 85.74404842369206 - type: manhattan_spearman value: 86.14492318960154 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.95069052788875 - type: cos_sim_spearman value: 86.4867991595147 - type: euclidean_pearson value: 84.31013325754635 - type: euclidean_spearman value: 85.01529258006482 - type: manhattan_pearson value: 84.26995570085374 - type: manhattan_spearman value: 84.96982104986162 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.54617647971897 - type: cos_sim_spearman value: 87.49834181751034 - type: euclidean_pearson value: 86.01015322577122 - type: euclidean_spearman value: 84.63362652063199 - type: manhattan_pearson value: 86.13807574475706 - type: manhattan_spearman value: 84.7772370721132 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.20047755786615 - type: cos_sim_spearman value: 67.05324077987636 - type: euclidean_pearson value: 66.91930642976601 - type: euclidean_spearman value: 65.21491856099105 - type: manhattan_pearson value: 66.78756851976624 - type: manhattan_spearman value: 65.12356257740728 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.19852871539686 - type: cos_sim_spearman value: 87.5161895296395 - type: euclidean_pearson value: 84.59848645207485 - type: euclidean_spearman value: 85.26427328757919 - type: manhattan_pearson value: 84.59747366996524 - type: manhattan_spearman value: 85.24045855146915 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.63320317811032 - type: mrr value: 96.26242947321379 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 60.928000000000004 - type: map_at_10 value: 70.112 - type: map_at_100 value: 70.59299999999999 - type: map_at_1000 value: 70.623 - type: map_at_3 value: 66.846 - type: map_at_5 value: 68.447 - type: mrr_at_1 value: 64.0 - type: mrr_at_10 value: 71.212 - type: mrr_at_100 value: 71.616 - type: mrr_at_1000 value: 71.64500000000001 - 
type: mrr_at_3 value: 68.77799999999999 - type: mrr_at_5 value: 70.094 - type: ndcg_at_1 value: 64.0 - type: ndcg_at_10 value: 74.607 - type: ndcg_at_100 value: 76.416 - type: ndcg_at_1000 value: 77.102 - type: ndcg_at_3 value: 69.126 - type: ndcg_at_5 value: 71.41300000000001 - type: precision_at_1 value: 64.0 - type: precision_at_10 value: 9.933 - type: precision_at_100 value: 1.077 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 26.556 - type: precision_at_5 value: 17.467 - type: recall_at_1 value: 60.928000000000004 - type: recall_at_10 value: 87.322 - type: recall_at_100 value: 94.833 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 72.628 - type: recall_at_5 value: 78.428 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.86237623762376 - type: cos_sim_ap value: 96.72586477206649 - type: cos_sim_f1 value: 93.01858362631845 - type: cos_sim_precision value: 93.4409687184662 - type: cos_sim_recall value: 92.60000000000001 - type: dot_accuracy value: 99.78019801980199 - type: dot_ap value: 93.72748205246228 - type: dot_f1 value: 89.04109589041096 - type: dot_precision value: 87.16475095785441 - type: dot_recall value: 91.0 - type: euclidean_accuracy value: 99.85445544554456 - type: euclidean_ap value: 96.6661459876145 - type: euclidean_f1 value: 92.58337481333997 - type: euclidean_precision value: 92.17046580773042 - type: euclidean_recall value: 93.0 - type: manhattan_accuracy value: 99.85445544554456 - type: manhattan_ap value: 96.6883549244056 - type: manhattan_f1 value: 92.57598405580468 - type: manhattan_precision value: 92.25422045680239 - type: manhattan_recall value: 92.9 - type: max_accuracy value: 99.86237623762376 - type: max_ap value: 96.72586477206649 - type: max_f1 value: 93.01858362631845 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 66.39930057069995 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.96398659903402 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 55.946944700355395 - type: mrr value: 56.97151398438164 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.541657650692905 - type: cos_sim_spearman value: 31.605804192286303 - type: dot_pearson value: 28.26905996736398 - type: dot_spearman value: 27.864801765851187 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22599999999999998 - type: map_at_10 value: 1.8870000000000002 - type: map_at_100 value: 9.78 - type: map_at_1000 value: 22.514 - type: map_at_3 value: 0.6669999999999999 - type: map_at_5 value: 1.077 - type: mrr_at_1 value: 82.0 - type: mrr_at_10 value: 89.86699999999999 - type: mrr_at_100 value: 
89.86699999999999 - type: mrr_at_1000 value: 89.86699999999999 - type: mrr_at_3 value: 89.667 - type: mrr_at_5 value: 89.667 - type: ndcg_at_1 value: 79.0 - type: ndcg_at_10 value: 74.818 - type: ndcg_at_100 value: 53.715999999999994 - type: ndcg_at_1000 value: 47.082 - type: ndcg_at_3 value: 82.134 - type: ndcg_at_5 value: 79.81899999999999 - type: precision_at_1 value: 82.0 - type: precision_at_10 value: 78.0 - type: precision_at_100 value: 54.48 - type: precision_at_1000 value: 20.518 - type: precision_at_3 value: 87.333 - type: precision_at_5 value: 85.2 - type: recall_at_1 value: 0.22599999999999998 - type: recall_at_10 value: 2.072 - type: recall_at_100 value: 13.013 - type: recall_at_1000 value: 43.462 - type: recall_at_3 value: 0.695 - type: recall_at_5 value: 1.139 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.328 - type: map_at_10 value: 9.795 - type: map_at_100 value: 15.801000000000002 - type: map_at_1000 value: 17.23 - type: map_at_3 value: 4.734 - type: map_at_5 value: 6.644 - type: mrr_at_1 value: 30.612000000000002 - type: mrr_at_10 value: 46.902 - type: mrr_at_100 value: 47.495 - type: mrr_at_1000 value: 47.495 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 44.218 - type: ndcg_at_1 value: 28.571 - type: ndcg_at_10 value: 24.806 - type: ndcg_at_100 value: 36.419000000000004 - type: ndcg_at_1000 value: 47.272999999999996 - type: ndcg_at_3 value: 25.666 - type: ndcg_at_5 value: 25.448999999999998 - type: precision_at_1 value: 30.612000000000002 - type: precision_at_10 value: 23.061 - type: precision_at_100 value: 7.714 - type: precision_at_1000 value: 1.484 - type: precision_at_3 value: 26.531 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 2.328 - type: recall_at_10 value: 16.524 - type: recall_at_100 value: 47.179 - type: recall_at_1000 value: 81.22200000000001 - type: recall_at_3 value: 5.745 - type: recall_at_5 value: 9.339 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.9142 - type: ap value: 14.335574772555415 - type: f1 value: 54.62839595194111 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.94340690435768 - type: f1 value: 60.286487936731916 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.26597708987974 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.48882398521786 - type: cos_sim_ap value: 79.04326607602204 - type: cos_sim_f1 value: 71.64566826860633 - type: cos_sim_precision value: 70.55512918905092 - type: cos_sim_recall value: 72.77044854881267 - type: dot_accuracy value: 84.19264469213805 - type: dot_ap value: 67.96360043562528 - type: dot_f1 value: 64.06418393006827 - type: dot_precision value: 58.64941898706424 - type: dot_recall value: 70.58047493403694 - type: euclidean_accuracy value: 
87.45902127913214 - type: euclidean_ap value: 78.9742237648272 - type: euclidean_f1 value: 71.5553235908142 - type: euclidean_precision value: 70.77955601445535 - type: euclidean_recall value: 72.34828496042216 - type: manhattan_accuracy value: 87.41729749061214 - type: manhattan_ap value: 78.90073137580596 - type: manhattan_f1 value: 71.3942611553533 - type: manhattan_precision value: 68.52705653967483 - type: manhattan_recall value: 74.51187335092348 - type: max_accuracy value: 87.48882398521786 - type: max_ap value: 79.04326607602204 - type: max_f1 value: 71.64566826860633 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.68125897465751 - type: cos_sim_ap value: 85.6003454431979 - type: cos_sim_f1 value: 77.6957163958641 - type: cos_sim_precision value: 73.0110366307807 - type: cos_sim_recall value: 83.02279026793964 - type: dot_accuracy value: 87.7672992587418 - type: dot_ap value: 82.4971301112899 - type: dot_f1 value: 75.90528233151184 - type: dot_precision value: 72.0370626469368 - type: dot_recall value: 80.21250384970742 - type: euclidean_accuracy value: 88.4503434625684 - type: euclidean_ap value: 84.91949884748384 - type: euclidean_f1 value: 76.92365018444684 - type: euclidean_precision value: 74.53245721712759 - type: euclidean_recall value: 79.47336002463813 - type: manhattan_accuracy value: 88.47556952691427 - type: manhattan_ap value: 84.8963689101517 - type: manhattan_f1 value: 76.85901249256395 - type: manhattan_precision value: 74.31693989071039 - type: manhattan_recall value: 79.58115183246073 - type: max_accuracy value: 88.68125897465751 - type: max_ap value: 85.6003454431979 - type: max_f1 value: 77.6957163958641 license: mit language: - en ---

# FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Contact | Citation | License

For more details please refer to our Github: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3.

FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects:
- **Long-Context LLM**: Activation Beacon
- **Fine-tuning of LM**: LM-Cocktail
- **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding
- **Reranker Model**: BGE Reranker
- **Benchmark**: C-MTEB

## News
- 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularity (input length up to 8192), and **M**ulti-functionality (unification of dense, lexical, and multi-vector/ColBERT retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire:
- 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (to train) method to extend the context length of LLMs. Technical Report :fire:
- 12/24/2023: Release **LLaRA**, a LLaMA-7B-based dense retriever, leading to state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire:
- 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire:
- 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval-augmentation needs for LLMs. Technical Report
- 09/15/2023: The technical report and massive training data of BGE have been released.
- 09/12/2023: New models:
  - **New reranker model**: release cross-encoder models and , which are more powerful than embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models.
  - **Updated embedding model**: release embedding model to alleviate the issue of the similarity distribution and enhance its retrieval ability without instruction.
- 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding an instruction during fine-tuning.
- 08/09/2023: BGE models are integrated into **Langchain**; you can use them like this. The C-MTEB **leaderboard** is available.
- 08/05/2023: Release base-scale and small-scale models, **best performance among models of the same size 🤗**
- 08/02/2023: Release (short for BAAI General Embedding) models, **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
- 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
## Model List

bge is short for BAAI General Embedding.

| Model | Language | | Description | query instruction for retrieval [1] |
|:-------------------------------|:--------:|:--------:|:--------|:--------:|
| BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (ColBERT)), Multi-Linguality, and Multi-Granularity (8192 tokens) | |
| BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README |
| BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | |
| BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | |
| BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | |
| BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | |
| BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | |
| BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | |

[1\\]: If you need to search for relevant passages to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages (a short sketch of this convention follows at the end of this section).

[2\\]: Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by simpler models. For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results.

All models have been uploaded to the Huggingface Hub, and you can see them at . If you cannot open the Huggingface Hub, you can also download the models at .
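To make footnote [1] concrete, here is a minimal sketch (using sentence-transformers and the English retrieval instruction from the table; the query and passage strings are illustrative) of prefixing the instruction to queries only:

```python
from sentence_transformers import SentenceTransformer

# Footnote [1] in practice: the instruction goes on the query, never on passages.
instruction = "Represent this sentence for searching relevant passages: "
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

query_emb = model.encode([instruction + "how to bake bread"], normalize_embeddings=True)
passage_emb = model.encode(
    ["Mix flour, water, salt and yeast, then let the dough rise."],  # no instruction
    normalize_embeddings=True,
)

# With normalized embeddings, the inner product is the cosine similarity.
print(query_emb @ passage_emb.T)
```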
## Frequently asked questions

**1. How to fine-tune the bge embedding model?**

Follow this example to prepare data and fine-tune your model. Some suggestions:
- Mine hard negatives following this example; this can improve retrieval performance. (A sketch of the expected training-data format follows this list.)
- If you pre-train bge on your own data, the pre-trained model cannot be used to calculate similarity directly; it must be fine-tuned with contrastive learning before computing similarity.
- If the accuracy of the fine-tuned model is still not high enough, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
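As referenced above, here is a sketch of the jsonl training format the fine-tuning example expects; the field names (query/pos/neg) are taken from the FlagEmbedding examples and should be verified against the current repo, and the contents below are toy data:

```python
import json

# One JSON object per line: a query, positive passages, and hard negatives
# (e.g., produced by the hard-negative mining script mentioned above).
examples = [
    {
        "query": "what is a cross-encoder",
        "pos": ["A cross-encoder scores a query-document pair with full attention."],
        "neg": ["A bi-encoder embeds query and document independently."],
    },
]

with open("toy_finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```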
**2. The similarity score between two dissimilar sentences is higher than 0.5**

**We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity scores of the current BGE model mostly fall in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not their absolute values.** If you need to filter similar sentences based on a similarity threshold, select a threshold appropriate to the similarity distribution on your own data (such as 0.8, 0.85, or even 0.9), as in the sketch below.
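A sketch of choosing a threshold empirically rather than assuming 0.5 means anything; the sentence pairs and the quantile are arbitrary illustrations:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

pairs = [
    ("a cat sits on the mat", "a kitten rests on a rug"),
    ("a cat sits on the mat", "quarterly earnings rose eight percent"),
]
a = model.encode([p[0] for p in pairs], normalize_embeddings=True)
b = model.encode([p[1] for p in pairs], normalize_embeddings=True)

scores = (a * b).sum(axis=1)  # cosine similarities; typically all fairly high
print(scores)

# Pick a cutoff from the score distribution on *your* data, not a fixed 0.5.
print("candidate threshold:", np.quantile(scores, 0.9))
```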
**3. When does the query instruction need to be used?**

For the bge v1.5 models, we improved retrieval ability when no instruction is used; using no instruction causes only a slight degradation in retrieval performance compared with using one. So for convenience you can generate embeddings without an instruction in all cases. For a retrieval task that uses short queries to find long related documents, it is recommended to add an instruction to these short queries. **The best way to decide whether to add instructions to queries is to choose whichever setting achieves better performance on your task**, for example with an A/B comparison like the sketch below. In all cases, the documents/passages do not need the instruction.
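A sketch of that A/B comparison on a toy labeled set; `labeled` and the hit-rate metric here are illustrative stand-ins for your own evaluation data and metric:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
instruction = "Represent this sentence for searching relevant passages: "

passages = [
    "Paris is the capital of France.",
    "Photosynthesis converts light into chemical energy.",
]
labeled = {"capital of france": 0, "how plants make energy": 1}  # query -> gold index

p_emb = model.encode(passages, normalize_embeddings=True)

def hit_rate(prefix: str) -> float:
    """Fraction of queries whose top-1 retrieved passage is the gold passage."""
    hits = 0
    for query, gold in labeled.items():
        q_emb = model.encode([prefix + query], normalize_embeddings=True)
        hits += int((q_emb @ p_emb.T).argmax() == gold)
    return hits / len(labeled)

print("without instruction:", hit_rate(""))
print("with instruction:   ", hit_rate(instruction))
```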
## Usage

### Usage for Embedding Model

Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers; hedged sketches of each pattern follow at the end of this section.

#### Using FlagEmbedding

If it doesn't work for you, see FlagEmbedding for more ways to install FlagEmbedding. For the value of the query instruction argument, see the Model List. By default, FlagModel will use all available GPUs when encoding. Set the CUDA_VISIBLE_DEVICES environment variable to select specific GPUs, or set it to an empty string to make all GPUs unavailable.

#### Using Sentence-Transformers

You can also use the models with sentence-transformers. For an s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see the Model List), but the instruction is not needed for passages.

#### Using Langchain

You can use the models in langchain like this:

#### Using HuggingFace Transformers

With the transformers package, you can use the model like this: first pass your input through the transformer model, then take the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.

#### Usage of the ONNX files

It is also possible to deploy the ONNX files with the infinity_emb pip package.

### Usage for Reranker

Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized with cross-entropy loss, so the relevance score is not bounded to a specific range.

#### Using FlagEmbedding

Get relevance scores (higher scores indicate more relevance):

#### Using Huggingface transformers
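As promised above, here are example sketches of each usage pattern. First, FlagEmbedding, assuming the package is installed with `pip install -U FlagEmbedding`; the model name and English query instruction are taken from the Model List:

```python
from FlagEmbedding import FlagModel

# Load a bge model; use_fp16 speeds up encoding with a slight accuracy cost.
model = FlagModel(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
    use_fp16=True,
)

queries = ["how do pandas reproduce"]
passages = [
    "Giant pandas reach maturity at four to eight years of age.",
    "The pandas Python library offers DataFrame operations.",
]

# encode_queries() prepends the retrieval instruction; encode() does not,
# so passages are embedded without it.
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode(passages)

# Inner product of normalized embeddings = cosine similarity.
scores = q_embeddings @ p_embeddings.T
print(scores)
```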
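The same pattern through sentence-transformers, a sketch assuming `pip install -U sentence-transformers`; for s2p retrieval the instruction is prefixed to queries only:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
instruction = "Represent this sentence for searching relevant passages: "

queries = ["how do pandas reproduce"]
passages = [
    "Giant pandas reach maturity at four to eight years of age.",
    "The pandas Python library offers DataFrame operations.",
]

# normalize_embeddings=True lets a plain dot product act as cosine similarity.
q_emb = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)
```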
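A sketch of the Langchain integration; the class name and keyword arguments below are how LangChain exposed bge models at the time of writing (the class has since moved between langchain and langchain_community), so treat them as assumptions to verify against your installed version:

```python
from langchain.embeddings import HuggingFaceBgeEmbeddings

# Wrap a bge model as a LangChain embeddings object.
embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-large-en-v1.5",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},  # cosine-ready vectors
)

vector = embeddings.embed_query("how do pandas reproduce")
print(len(vector))  # embedding dimension, 1024 for bge-large
```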
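With plain Huggingface Transformers, the [CLS] pooling described above looks roughly like this:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
model.eval()

sentences = ["sample sentence 1", "sample sentence 2"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Take the last hidden state of the first token ([CLS]) as the embedding,
# then L2-normalize so dot products equal cosine similarities.
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)
```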
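Finally, a sketch of the reranker usage via FlagEmbedding; as noted above the scores are unbounded, so compare them with each other rather than against a fixed cutoff:

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)

# A single (query, passage) pair yields one score.
score = reranker.compute_score(
    ["what is a panda?", "The giant panda is a bear native to China."]
)

# A list of pairs yields a list of scores; higher means more relevant.
scores = reranker.compute_score([
    ["what is a panda?", "The giant panda is a bear native to China."],
    ["what is a panda?", "Quarterly earnings rose by eight percent."],
])
print(score, scores)
```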
## Evaluation

Our models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts.

- **MTEB**:

| Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
| BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
| BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
| bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
| bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
| gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
| gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
| e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
| bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
| instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
| e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
| gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
| text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
| e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
| sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
| all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
| sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |

- **C-MTEB**: We created the benchmark C-MTEB for Chinese text embedding; it consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction.
| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
| BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
| BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
| BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
| bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
| BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
| multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
| BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
| m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
| m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
| multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
| multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
| text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
| luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
| text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
| text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |

- **Reranking**: See C_MTEB for the evaluation script.

| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
| multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
| multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
| multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
| m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
| m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
| bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
| bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
| BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
| BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |

\\*: T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.

## Train

### BAAI Embedding

We pre-train the models using RetroMAE and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned first. For more training details for bge, see baai_general_embedding.

### BGE Reranker

A cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., a bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model, as in the sketch below.
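A sketch of that retrieve-then-rerank pipeline, combining the bi-encoder and the cross-encoder; the corpus, the value of k, and the model choices are illustrative:

```python
from FlagEmbedding import FlagModel, FlagReranker

retriever = FlagModel(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)
reranker = FlagReranker("BAAI/bge-reranker-large")

query = "what do giant pandas eat"
corpus = [
    "Bamboo makes up most of a giant panda's diet.",
    "The pandas Python library handles tabular data.",
    "Giant pandas live in the mountain ranges of central China.",
]

# Stage 1: cheap bi-encoder retrieval of the top-k candidates.
q = retriever.encode_queries([query])
p = retriever.encode(corpus)
top_k = (q @ p.T)[0].argsort()[::-1][:2]

# Stage 2: the expensive cross-encoder rescores only those k candidates.
pairs = [[query, corpus[i]] for i in top_k]
rerank_scores = reranker.compute_score(pairs)
best = top_k[max(range(len(pairs)), key=lambda i: rerank_scores[i])]
print(corpus[best])
```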
We train the cross-encoder on multilingual pair data; the data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md

## Contact

If you have any questions or suggestions about this project, feel free to open an issue or pull request. You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).

## Citation

If you find this repository useful, please consider giving it a star :star: and a citation.

## License

FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like text classification, retrieval, clustering, and similarity measurement." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-large-en.json b/data/model_data_json/BAAI_bge-large-en.json new file mode 100644 index 0000000000000000000000000000000000000000..3a18b140f900ccbd84d11d8ebecd550d3d1f004b --- /dev/null +++ b/data/model_data_json/BAAI_bge-large-en.json @@ -0,0 +1,23 @@ +{ + "model_id": "BAAI/bge-large-en", + "downloads": 501936, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "mteb", + "sentence-transfomres", + "en", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-transfomres - transformers model-index: - name: bge-large-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.94029850746269 - type: ap value: 40.00228964744091 - type: f1 value: 70.86088267934595 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.93745 - type: ap value: 88.24758534667426 - type: f1 value: 91.91033034217591 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.158 - type: f1 value: 45.78935185074774 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 39.972 - type: map_at_10 value: 54.874 - type: map_at_100 value: 55.53399999999999 - type: map_at_1000 value: 55.539 - type: map_at_3 value: 51.031000000000006 - type: map_at_5 value: 53.342999999999996 - type: mrr_at_1 value: 40.541 - type: mrr_at_10 value: 55.096000000000004 - type: mrr_at_100 value: 55.75599999999999 - type: mrr_at_1000 value: 55.761 - type: mrr_at_3 value: 51.221000000000004 - type: mrr_at_5 value: 53.568000000000005 - type: ndcg_at_1 value: 39.972 - type: ndcg_at_10 value: 62.456999999999994 - type: ndcg_at_100 value: 65.262 - type: ndcg_at_1000 value: 65.389 - type: ndcg_at_3 value: 54.673 - type: ndcg_at_5 value: 58.80499999999999 - type: precision_at_1 value: 39.972 - type: precision_at_10 value: 8.634 - type: precision_at_100 value: 0.9860000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.740000000000002 - type: precision_at_5 value: 15.036 - type:
recall_at_1 value: 39.972 - type: recall_at_10 value: 86.344 - type: recall_at_100 value: 98.578 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 65.22 - type: recall_at_5 value: 75.178 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.94652870403906 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.17257160340209 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.97867370559182 - type: mrr value: 77.00820032537484 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 80.00986015960616 - type: cos_sim_spearman value: 80.36387933827882 - type: euclidean_pearson value: 80.32305287257296 - type: euclidean_spearman value: 82.0524720308763 - type: manhattan_pearson value: 80.19847473906454 - type: manhattan_spearman value: 81.87957652506985 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 88.00000000000001 - type: f1 value: 87.99039027511853 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 41.36932844640705 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 38.34983239611985 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.257999999999996 - type: map_at_10 value: 42.937 - type: map_at_100 value: 44.406 - type: map_at_1000 value: 44.536 - type: map_at_3 value: 39.22 - type: map_at_5 value: 41.458 - type: mrr_at_1 value: 38.769999999999996 - type: mrr_at_10 value: 48.701 - type: mrr_at_100 value: 49.431000000000004 - type: mrr_at_1000 value: 49.476 - type: mrr_at_3 value: 45.875 - type: mrr_at_5 value: 47.67 - type: ndcg_at_1 value: 38.769999999999996 - type: ndcg_at_10 value: 49.35 - type: ndcg_at_100 value: 54.618 - type: ndcg_at_1000 value: 56.655 - type: ndcg_at_3 value: 43.826 - type: ndcg_at_5 value: 46.72 - type: precision_at_1 value: 38.769999999999996 - type: precision_at_10 value: 9.328 - type: precision_at_100 value: 1.484 - type: precision_at_1000 value: 0.196 - type: precision_at_3 value: 20.649 - type: precision_at_5 value: 15.25 - type: recall_at_1 value: 32.257999999999996 - type: recall_at_10 value: 61.849 - type: recall_at_100 value: 83.70400000000001 - type: recall_at_1000 value: 96.344 - type: recall_at_3 value: 46.037 - type: recall_at_5 value: 53.724000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.979 - 
type: map_at_10 value: 43.376999999999995 - type: map_at_100 value: 44.667 - type: map_at_1000 value: 44.794 - type: map_at_3 value: 40.461999999999996 - type: map_at_5 value: 42.138 - type: mrr_at_1 value: 41.146 - type: mrr_at_10 value: 49.575 - type: mrr_at_100 value: 50.187000000000005 - type: mrr_at_1000 value: 50.231 - type: mrr_at_3 value: 47.601 - type: mrr_at_5 value: 48.786 - type: ndcg_at_1 value: 41.146 - type: ndcg_at_10 value: 48.957 - type: ndcg_at_100 value: 53.296 - type: ndcg_at_1000 value: 55.254000000000005 - type: ndcg_at_3 value: 45.235 - type: ndcg_at_5 value: 47.014 - type: precision_at_1 value: 41.146 - type: precision_at_10 value: 9.107999999999999 - type: precision_at_100 value: 1.481 - type: precision_at_1000 value: 0.193 - type: precision_at_3 value: 21.783 - type: precision_at_5 value: 15.274 - type: recall_at_1 value: 32.979 - type: recall_at_10 value: 58.167 - type: recall_at_100 value: 76.374 - type: recall_at_1000 value: 88.836 - type: recall_at_3 value: 46.838 - type: recall_at_5 value: 52.006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.326 - type: map_at_10 value: 53.468 - type: map_at_100 value: 54.454 - type: map_at_1000 value: 54.508 - type: map_at_3 value: 50.12799999999999 - type: map_at_5 value: 51.991 - type: mrr_at_1 value: 46.394999999999996 - type: mrr_at_10 value: 57.016999999999996 - type: mrr_at_100 value: 57.67099999999999 - type: mrr_at_1000 value: 57.699999999999996 - type: mrr_at_3 value: 54.65 - type: mrr_at_5 value: 56.101 - type: ndcg_at_1 value: 46.394999999999996 - type: ndcg_at_10 value: 59.507 - type: ndcg_at_100 value: 63.31099999999999 - type: ndcg_at_1000 value: 64.388 - type: ndcg_at_3 value: 54.04600000000001 - type: ndcg_at_5 value: 56.723 - type: precision_at_1 value: 46.394999999999996 - type: precision_at_10 value: 9.567 - type: precision_at_100 value: 1.234 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.117 - type: precision_at_5 value: 16.426 - type: recall_at_1 value: 40.326 - type: recall_at_10 value: 73.763 - type: recall_at_100 value: 89.927 - type: recall_at_1000 value: 97.509 - type: recall_at_3 value: 59.34 - type: recall_at_5 value: 65.915 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.661 - type: map_at_10 value: 35.522 - type: map_at_100 value: 36.619 - type: map_at_1000 value: 36.693999999999996 - type: map_at_3 value: 33.154 - type: map_at_5 value: 34.353 - type: mrr_at_1 value: 28.362 - type: mrr_at_10 value: 37.403999999999996 - type: mrr_at_100 value: 38.374 - type: mrr_at_1000 value: 38.428000000000004 - type: mrr_at_3 value: 35.235 - type: mrr_at_5 value: 36.269 - type: ndcg_at_1 value: 28.362 - type: ndcg_at_10 value: 40.431 - type: ndcg_at_100 value: 45.745999999999995 - type: ndcg_at_1000 value: 47.493 - type: ndcg_at_3 value: 35.733 - type: ndcg_at_5 value: 37.722 - type: precision_at_1 value: 28.362 - type: precision_at_10 value: 6.101999999999999 - type: precision_at_100 value: 0.922 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 15.140999999999998 - type: precision_at_5 value: 10.305 - type: recall_at_1 value: 26.661 - type: recall_at_10 value: 53.675 - type: recall_at_100 value: 77.891 - type: recall_at_1000 value: 90.72 - type: recall_at_3 value: 40.751 - type: recall_at_5 
value: 45.517 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.886 - type: map_at_10 value: 27.288 - type: map_at_100 value: 28.327999999999996 - type: map_at_1000 value: 28.438999999999997 - type: map_at_3 value: 24.453 - type: map_at_5 value: 25.959 - type: mrr_at_1 value: 23.134 - type: mrr_at_10 value: 32.004 - type: mrr_at_100 value: 32.789 - type: mrr_at_1000 value: 32.857 - type: mrr_at_3 value: 29.084 - type: mrr_at_5 value: 30.614 - type: ndcg_at_1 value: 23.134 - type: ndcg_at_10 value: 32.852 - type: ndcg_at_100 value: 37.972 - type: ndcg_at_1000 value: 40.656 - type: ndcg_at_3 value: 27.435 - type: ndcg_at_5 value: 29.823 - type: precision_at_1 value: 23.134 - type: precision_at_10 value: 6.032 - type: precision_at_100 value: 0.9950000000000001 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 13.017999999999999 - type: precision_at_5 value: 9.501999999999999 - type: recall_at_1 value: 18.886 - type: recall_at_10 value: 45.34 - type: recall_at_100 value: 67.947 - type: recall_at_1000 value: 86.924 - type: recall_at_3 value: 30.535 - type: recall_at_5 value: 36.451 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.994999999999997 - type: map_at_10 value: 40.04 - type: map_at_100 value: 41.435 - type: map_at_1000 value: 41.537 - type: map_at_3 value: 37.091 - type: map_at_5 value: 38.802 - type: mrr_at_1 value: 35.034 - type: mrr_at_10 value: 45.411 - type: mrr_at_100 value: 46.226 - type: mrr_at_1000 value: 46.27 - type: mrr_at_3 value: 43.086 - type: mrr_at_5 value: 44.452999999999996 - type: ndcg_at_1 value: 35.034 - type: ndcg_at_10 value: 46.076 - type: ndcg_at_100 value: 51.483000000000004 - type: ndcg_at_1000 value: 53.433 - type: ndcg_at_3 value: 41.304 - type: ndcg_at_5 value: 43.641999999999996 - type: precision_at_1 value: 35.034 - type: precision_at_10 value: 8.258000000000001 - type: precision_at_100 value: 1.268 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 19.57 - type: precision_at_5 value: 13.782 - type: recall_at_1 value: 28.994999999999997 - type: recall_at_10 value: 58.538000000000004 - type: recall_at_100 value: 80.72399999999999 - type: recall_at_1000 value: 93.462 - type: recall_at_3 value: 45.199 - type: recall_at_5 value: 51.237 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.795 - type: map_at_10 value: 34.935 - type: map_at_100 value: 36.306 - type: map_at_1000 value: 36.417 - type: map_at_3 value: 31.831 - type: map_at_5 value: 33.626 - type: mrr_at_1 value: 30.479 - type: mrr_at_10 value: 40.225 - type: mrr_at_100 value: 41.055 - type: mrr_at_1000 value: 41.114 - type: mrr_at_3 value: 37.538 - type: mrr_at_5 value: 39.073 - type: ndcg_at_1 value: 30.479 - type: ndcg_at_10 value: 40.949999999999996 - type: ndcg_at_100 value: 46.525 - type: ndcg_at_1000 value: 48.892 - type: ndcg_at_3 value: 35.79 - type: ndcg_at_5 value: 38.237 - type: precision_at_1 value: 30.479 - type: precision_at_10 value: 7.6259999999999994 - type: precision_at_100 value: 1.203 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 17.199 - type: precision_at_5 value: 12.466000000000001 - type: recall_at_1 value: 24.795 - type: recall_at_10 value: 
53.421 - type: recall_at_100 value: 77.189 - type: recall_at_1000 value: 93.407 - type: recall_at_3 value: 39.051 - type: recall_at_5 value: 45.462 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.853499999999997 - type: map_at_10 value: 36.20433333333333 - type: map_at_100 value: 37.40391666666667 - type: map_at_1000 value: 37.515 - type: map_at_3 value: 33.39975 - type: map_at_5 value: 34.9665 - type: mrr_at_1 value: 31.62666666666667 - type: mrr_at_10 value: 40.436749999999996 - type: mrr_at_100 value: 41.260333333333335 - type: mrr_at_1000 value: 41.31525 - type: mrr_at_3 value: 38.06733333333332 - type: mrr_at_5 value: 39.41541666666667 - type: ndcg_at_1 value: 31.62666666666667 - type: ndcg_at_10 value: 41.63341666666667 - type: ndcg_at_100 value: 46.704166666666666 - type: ndcg_at_1000 value: 48.88483333333335 - type: ndcg_at_3 value: 36.896 - type: ndcg_at_5 value: 39.11891666666667 - type: precision_at_1 value: 31.62666666666667 - type: precision_at_10 value: 7.241083333333333 - type: precision_at_100 value: 1.1488333333333334 - type: precision_at_1000 value: 0.15250000000000002 - type: precision_at_3 value: 16.908333333333335 - type: precision_at_5 value: 11.942833333333333 - type: recall_at_1 value: 26.853499999999997 - type: recall_at_10 value: 53.461333333333336 - type: recall_at_100 value: 75.63633333333333 - type: recall_at_1000 value: 90.67016666666666 - type: recall_at_3 value: 40.24241666666667 - type: recall_at_5 value: 45.98608333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.241999999999997 - type: map_at_10 value: 31.863999999999997 - type: map_at_100 value: 32.835 - type: map_at_1000 value: 32.928000000000004 - type: map_at_3 value: 29.694 - type: map_at_5 value: 30.978 - type: mrr_at_1 value: 28.374 - type: mrr_at_10 value: 34.814 - type: mrr_at_100 value: 35.596 - type: mrr_at_1000 value: 35.666 - type: mrr_at_3 value: 32.745000000000005 - type: mrr_at_5 value: 34.049 - type: ndcg_at_1 value: 28.374 - type: ndcg_at_10 value: 35.969 - type: ndcg_at_100 value: 40.708 - type: ndcg_at_1000 value: 43.08 - type: ndcg_at_3 value: 31.968999999999998 - type: ndcg_at_5 value: 34.069 - type: precision_at_1 value: 28.374 - type: precision_at_10 value: 5.583 - type: precision_at_100 value: 0.8630000000000001 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 13.547999999999998 - type: precision_at_5 value: 9.447999999999999 - type: recall_at_1 value: 25.241999999999997 - type: recall_at_10 value: 45.711 - type: recall_at_100 value: 67.482 - type: recall_at_1000 value: 85.13300000000001 - type: recall_at_3 value: 34.622 - type: recall_at_5 value: 40.043 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.488999999999997 - type: map_at_10 value: 25.142999999999997 - type: map_at_100 value: 26.244 - type: map_at_1000 value: 26.363999999999997 - type: map_at_3 value: 22.654 - type: map_at_5 value: 24.017 - type: mrr_at_1 value: 21.198 - type: mrr_at_10 value: 28.903000000000002 - type: mrr_at_100 value: 29.860999999999997 - type: mrr_at_1000 value: 29.934 - type: mrr_at_3 value: 26.634999999999998 - type: mrr_at_5 value: 27.903 - type: ndcg_at_1 value: 21.198 - type: ndcg_at_10 value: 
29.982999999999997 - type: ndcg_at_100 value: 35.275 - type: ndcg_at_1000 value: 38.074000000000005 - type: ndcg_at_3 value: 25.502999999999997 - type: ndcg_at_5 value: 27.557 - type: precision_at_1 value: 21.198 - type: precision_at_10 value: 5.502 - type: precision_at_100 value: 0.942 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 12.044 - type: precision_at_5 value: 8.782 - type: recall_at_1 value: 17.488999999999997 - type: recall_at_10 value: 40.821000000000005 - type: recall_at_100 value: 64.567 - type: recall_at_1000 value: 84.452 - type: recall_at_3 value: 28.351 - type: recall_at_5 value: 33.645 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.066000000000003 - type: map_at_10 value: 36.134 - type: map_at_100 value: 37.285000000000004 - type: map_at_1000 value: 37.389 - type: map_at_3 value: 33.522999999999996 - type: map_at_5 value: 34.905 - type: mrr_at_1 value: 31.436999999999998 - type: mrr_at_10 value: 40.225 - type: mrr_at_100 value: 41.079 - type: mrr_at_1000 value: 41.138000000000005 - type: mrr_at_3 value: 38.074999999999996 - type: mrr_at_5 value: 39.190000000000005 - type: ndcg_at_1 value: 31.436999999999998 - type: ndcg_at_10 value: 41.494 - type: ndcg_at_100 value: 46.678999999999995 - type: ndcg_at_1000 value: 48.964 - type: ndcg_at_3 value: 36.828 - type: ndcg_at_5 value: 38.789 - type: precision_at_1 value: 31.436999999999998 - type: precision_at_10 value: 6.931 - type: precision_at_100 value: 1.072 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 16.729 - type: precision_at_5 value: 11.567 - type: recall_at_1 value: 27.066000000000003 - type: recall_at_10 value: 53.705000000000005 - type: recall_at_100 value: 75.968 - type: recall_at_1000 value: 91.937 - type: recall_at_3 value: 40.865 - type: recall_at_5 value: 45.739999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.979000000000003 - type: map_at_10 value: 32.799 - type: map_at_100 value: 34.508 - type: map_at_1000 value: 34.719 - type: map_at_3 value: 29.947000000000003 - type: map_at_5 value: 31.584 - type: mrr_at_1 value: 30.237000000000002 - type: mrr_at_10 value: 37.651 - type: mrr_at_100 value: 38.805 - type: mrr_at_1000 value: 38.851 - type: mrr_at_3 value: 35.046 - type: mrr_at_5 value: 36.548 - type: ndcg_at_1 value: 30.237000000000002 - type: ndcg_at_10 value: 38.356 - type: ndcg_at_100 value: 44.906 - type: ndcg_at_1000 value: 47.299 - type: ndcg_at_3 value: 33.717999999999996 - type: ndcg_at_5 value: 35.946 - type: precision_at_1 value: 30.237000000000002 - type: precision_at_10 value: 7.292 - type: precision_at_100 value: 1.496 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 15.547 - type: precision_at_5 value: 11.344 - type: recall_at_1 value: 24.979000000000003 - type: recall_at_10 value: 48.624 - type: recall_at_100 value: 77.932 - type: recall_at_1000 value: 92.66499999999999 - type: recall_at_3 value: 35.217 - type: recall_at_5 value: 41.394 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.566 - type: map_at_10 value: 30.945 - type: map_at_100 value: 31.759999999999998 - type: map_at_1000 value: 31.855 - type: map_at_3 
value: 28.64 - type: map_at_5 value: 29.787000000000003 - type: mrr_at_1 value: 24.954 - type: mrr_at_10 value: 33.311 - type: mrr_at_100 value: 34.050000000000004 - type: mrr_at_1000 value: 34.117999999999995 - type: mrr_at_3 value: 31.238 - type: mrr_at_5 value: 32.329 - type: ndcg_at_1 value: 24.954 - type: ndcg_at_10 value: 35.676 - type: ndcg_at_100 value: 39.931 - type: ndcg_at_1000 value: 42.43 - type: ndcg_at_3 value: 31.365 - type: ndcg_at_5 value: 33.184999999999995 - type: precision_at_1 value: 24.954 - type: precision_at_10 value: 5.564 - type: precision_at_100 value: 0.826 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 13.555 - type: precision_at_5 value: 9.168 - type: recall_at_1 value: 22.566 - type: recall_at_10 value: 47.922 - type: recall_at_100 value: 67.931 - type: recall_at_1000 value: 86.653 - type: recall_at_3 value: 36.103 - type: recall_at_5 value: 40.699000000000005 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 16.950000000000003 - type: map_at_10 value: 28.612 - type: map_at_100 value: 30.476999999999997 - type: map_at_1000 value: 30.674 - type: map_at_3 value: 24.262 - type: map_at_5 value: 26.554 - type: mrr_at_1 value: 38.241 - type: mrr_at_10 value: 50.43 - type: mrr_at_100 value: 51.059 - type: mrr_at_1000 value: 51.090999999999994 - type: mrr_at_3 value: 47.514 - type: mrr_at_5 value: 49.246 - type: ndcg_at_1 value: 38.241 - type: ndcg_at_10 value: 38.218 - type: ndcg_at_100 value: 45.003 - type: ndcg_at_1000 value: 48.269 - type: ndcg_at_3 value: 32.568000000000005 - type: ndcg_at_5 value: 34.400999999999996 - type: precision_at_1 value: 38.241 - type: precision_at_10 value: 11.674 - type: precision_at_100 value: 1.913 - type: precision_at_1000 value: 0.252 - type: precision_at_3 value: 24.387 - type: precision_at_5 value: 18.163 - type: recall_at_1 value: 16.950000000000003 - type: recall_at_10 value: 43.769000000000005 - type: recall_at_100 value: 66.875 - type: recall_at_1000 value: 84.92699999999999 - type: recall_at_3 value: 29.353 - type: recall_at_5 value: 35.467 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.276 - type: map_at_10 value: 20.848 - type: map_at_100 value: 29.804000000000002 - type: map_at_1000 value: 31.398 - type: map_at_3 value: 14.886 - type: map_at_5 value: 17.516000000000002 - type: mrr_at_1 value: 71 - type: mrr_at_10 value: 78.724 - type: mrr_at_100 value: 78.976 - type: mrr_at_1000 value: 78.986 - type: mrr_at_3 value: 77.333 - type: mrr_at_5 value: 78.021 - type: ndcg_at_1 value: 57.875 - type: ndcg_at_10 value: 43.855 - type: ndcg_at_100 value: 48.99 - type: ndcg_at_1000 value: 56.141 - type: ndcg_at_3 value: 48.914 - type: ndcg_at_5 value: 45.961 - type: precision_at_1 value: 71 - type: precision_at_10 value: 34.575 - type: precision_at_100 value: 11.182 - type: precision_at_1000 value: 2.044 - type: precision_at_3 value: 52.5 - type: precision_at_5 value: 44.2 - type: recall_at_1 value: 9.276 - type: recall_at_10 value: 26.501 - type: recall_at_100 value: 55.72899999999999 - type: recall_at_1000 value: 78.532 - type: recall_at_3 value: 16.365 - type: recall_at_5 value: 20.154 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 52.71 - type: f1 
value: 47.74801556489574 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 73.405 - type: map_at_10 value: 82.822 - type: map_at_100 value: 83.042 - type: map_at_1000 value: 83.055 - type: map_at_3 value: 81.65299999999999 - type: map_at_5 value: 82.431 - type: mrr_at_1 value: 79.178 - type: mrr_at_10 value: 87.02 - type: mrr_at_100 value: 87.095 - type: mrr_at_1000 value: 87.09700000000001 - type: mrr_at_3 value: 86.309 - type: mrr_at_5 value: 86.824 - type: ndcg_at_1 value: 79.178 - type: ndcg_at_10 value: 86.72 - type: ndcg_at_100 value: 87.457 - type: ndcg_at_1000 value: 87.691 - type: ndcg_at_3 value: 84.974 - type: ndcg_at_5 value: 86.032 - type: precision_at_1 value: 79.178 - type: precision_at_10 value: 10.548 - type: precision_at_100 value: 1.113 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.848 - type: precision_at_5 value: 20.45 - type: recall_at_1 value: 73.405 - type: recall_at_10 value: 94.39699999999999 - type: recall_at_100 value: 97.219 - type: recall_at_1000 value: 98.675 - type: recall_at_3 value: 89.679 - type: recall_at_5 value: 92.392 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.651 - type: map_at_10 value: 36.886 - type: map_at_100 value: 38.811 - type: map_at_1000 value: 38.981 - type: map_at_3 value: 32.538 - type: map_at_5 value: 34.763 - type: mrr_at_1 value: 44.444 - type: mrr_at_10 value: 53.168000000000006 - type: mrr_at_100 value: 53.839000000000006 - type: mrr_at_1000 value: 53.869 - type: mrr_at_3 value: 50.54 - type: mrr_at_5 value: 52.068000000000005 - type: ndcg_at_1 value: 44.444 - type: ndcg_at_10 value: 44.994 - type: ndcg_at_100 value: 51.599 - type: ndcg_at_1000 value: 54.339999999999996 - type: ndcg_at_3 value: 41.372 - type: ndcg_at_5 value: 42.149 - type: precision_at_1 value: 44.444 - type: precision_at_10 value: 12.407 - type: precision_at_100 value: 1.9269999999999998 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 27.726 - type: precision_at_5 value: 19.814999999999998 - type: recall_at_1 value: 22.651 - type: recall_at_10 value: 52.075 - type: recall_at_100 value: 76.51400000000001 - type: recall_at_1000 value: 92.852 - type: recall_at_3 value: 37.236000000000004 - type: recall_at_5 value: 43.175999999999995 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.777 - type: map_at_10 value: 66.79899999999999 - type: map_at_100 value: 67.65299999999999 - type: map_at_1000 value: 67.706 - type: map_at_3 value: 63.352 - type: map_at_5 value: 65.52900000000001 - type: mrr_at_1 value: 81.553 - type: mrr_at_10 value: 86.983 - type: mrr_at_100 value: 87.132 - type: mrr_at_1000 value: 87.136 - type: mrr_at_3 value: 86.156 - type: mrr_at_5 value: 86.726 - type: ndcg_at_1 value: 81.553 - type: ndcg_at_10 value: 74.64 - type: ndcg_at_100 value: 77.459 - type: ndcg_at_1000 value: 78.43 - type: ndcg_at_3 value: 69.878 - type: ndcg_at_5 value: 72.59400000000001 - type: precision_at_1 value: 81.553 - type: precision_at_10 value: 15.654000000000002 - type: precision_at_100 value: 1.783 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 45.199 - type: precision_at_5 value: 29.267 - type: recall_at_1 value: 40.777 - type: recall_at_10 value: 78.271 - type: recall_at_100 value: 89.129 - type: recall_at_1000 
value: 95.49 - type: recall_at_3 value: 67.79899999999999 - type: recall_at_5 value: 73.167 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 93.5064 - type: ap value: 90.25495114444111 - type: f1 value: 93.5012434973381 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.301 - type: map_at_10 value: 35.657 - type: map_at_100 value: 36.797000000000004 - type: map_at_1000 value: 36.844 - type: map_at_3 value: 31.743 - type: map_at_5 value: 34.003 - type: mrr_at_1 value: 23.854 - type: mrr_at_10 value: 36.242999999999995 - type: mrr_at_100 value: 37.32 - type: mrr_at_1000 value: 37.361 - type: mrr_at_3 value: 32.4 - type: mrr_at_5 value: 34.634 - type: ndcg_at_1 value: 23.868000000000002 - type: ndcg_at_10 value: 42.589 - type: ndcg_at_100 value: 48.031 - type: ndcg_at_1000 value: 49.189 - type: ndcg_at_3 value: 34.649 - type: ndcg_at_5 value: 38.676 - type: precision_at_1 value: 23.868000000000002 - type: precision_at_10 value: 6.6850000000000005 - type: precision_at_100 value: 0.9400000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.651 - type: precision_at_5 value: 10.834000000000001 - type: recall_at_1 value: 23.301 - type: recall_at_10 value: 63.88700000000001 - type: recall_at_100 value: 88.947 - type: recall_at_1000 value: 97.783 - type: recall_at_3 value: 42.393 - type: recall_at_5 value: 52.036 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.64888280893753 - type: f1 value: 94.41310774203512 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 79.72184222526221 - type: f1 value: 61.522034067350106 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 79.60659045057163 - type: f1 value: 77.268649687049 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.83254875588432 - type: f1 value: 81.61520635919082 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 36.31529875009507 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.734233714415073 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.994501713009452 - type: mrr value: 32.13512850703073 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.603000000000001 - 
type: map_at_10 value: 13.767999999999999 - type: map_at_100 value: 17.197000000000003 - type: map_at_1000 value: 18.615000000000002 - type: map_at_3 value: 10.567 - type: map_at_5 value: 12.078999999999999 - type: mrr_at_1 value: 44.891999999999996 - type: mrr_at_10 value: 53.75299999999999 - type: mrr_at_100 value: 54.35 - type: mrr_at_1000 value: 54.388000000000005 - type: mrr_at_3 value: 51.495999999999995 - type: mrr_at_5 value: 52.688 - type: ndcg_at_1 value: 43.189 - type: ndcg_at_10 value: 34.567 - type: ndcg_at_100 value: 32.273 - type: ndcg_at_1000 value: 41.321999999999996 - type: ndcg_at_3 value: 40.171 - type: ndcg_at_5 value: 37.502 - type: precision_at_1 value: 44.582 - type: precision_at_10 value: 25.139 - type: precision_at_100 value: 7.739999999999999 - type: precision_at_1000 value: 2.054 - type: precision_at_3 value: 37.152 - type: precision_at_5 value: 31.826999999999998 - type: recall_at_1 value: 6.603000000000001 - type: recall_at_10 value: 17.023 - type: recall_at_100 value: 32.914 - type: recall_at_1000 value: 64.44800000000001 - type: recall_at_3 value: 11.457 - type: recall_at_5 value: 13.816 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 30.026000000000003 - type: map_at_10 value: 45.429 - type: map_at_100 value: 46.45 - type: map_at_1000 value: 46.478 - type: map_at_3 value: 41.147 - type: map_at_5 value: 43.627 - type: mrr_at_1 value: 33.951 - type: mrr_at_10 value: 47.953 - type: mrr_at_100 value: 48.731 - type: mrr_at_1000 value: 48.751 - type: mrr_at_3 value: 44.39 - type: mrr_at_5 value: 46.533 - type: ndcg_at_1 value: 33.951 - type: ndcg_at_10 value: 53.24100000000001 - type: ndcg_at_100 value: 57.599999999999994 - type: ndcg_at_1000 value: 58.270999999999994 - type: ndcg_at_3 value: 45.190999999999995 - type: ndcg_at_5 value: 49.339 - type: precision_at_1 value: 33.951 - type: precision_at_10 value: 8.856 - type: precision_at_100 value: 1.133 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.713 - type: precision_at_5 value: 14.838000000000001 - type: recall_at_1 value: 30.026000000000003 - type: recall_at_10 value: 74.512 - type: recall_at_100 value: 93.395 - type: recall_at_1000 value: 98.402 - type: recall_at_3 value: 53.677 - type: recall_at_5 value: 63.198 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.41300000000001 - type: map_at_10 value: 85.387 - type: map_at_100 value: 86.027 - type: map_at_1000 value: 86.041 - type: map_at_3 value: 82.543 - type: map_at_5 value: 84.304 - type: mrr_at_1 value: 82.35 - type: mrr_at_10 value: 88.248 - type: mrr_at_100 value: 88.348 - type: mrr_at_1000 value: 88.349 - type: mrr_at_3 value: 87.348 - type: mrr_at_5 value: 87.96300000000001 - type: ndcg_at_1 value: 82.37 - type: ndcg_at_10 value: 88.98 - type: ndcg_at_100 value: 90.16499999999999 - type: ndcg_at_1000 value: 90.239 - type: ndcg_at_3 value: 86.34100000000001 - type: ndcg_at_5 value: 87.761 - type: precision_at_1 value: 82.37 - type: precision_at_10 value: 13.471 - type: precision_at_100 value: 1.534 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.827 - type: precision_at_5 value: 24.773999999999997 - type: recall_at_1 value: 71.41300000000001 - type: recall_at_10 value: 95.748 - type: recall_at_100 value: 99.69200000000001 - type: recall_at_1000 value: 99.98 - type: recall_at_3 value: 87.996 - type: recall_at_5 value: 
92.142 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.96878497780007 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 65.31371347128074 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.287 - type: map_at_10 value: 13.530000000000001 - type: map_at_100 value: 15.891 - type: map_at_1000 value: 16.245 - type: map_at_3 value: 9.612 - type: map_at_5 value: 11.672 - type: mrr_at_1 value: 26 - type: mrr_at_10 value: 37.335 - type: mrr_at_100 value: 38.443 - type: mrr_at_1000 value: 38.486 - type: mrr_at_3 value: 33.783 - type: mrr_at_5 value: 36.028 - type: ndcg_at_1 value: 26 - type: ndcg_at_10 value: 22.215 - type: ndcg_at_100 value: 31.101 - type: ndcg_at_1000 value: 36.809 - type: ndcg_at_3 value: 21.104 - type: ndcg_at_5 value: 18.759999999999998 - type: precision_at_1 value: 26 - type: precision_at_10 value: 11.43 - type: precision_at_100 value: 2.424 - type: precision_at_1000 value: 0.379 - type: precision_at_3 value: 19.7 - type: precision_at_5 value: 16.619999999999997 - type: recall_at_1 value: 5.287 - type: recall_at_10 value: 23.18 - type: recall_at_100 value: 49.208 - type: recall_at_1000 value: 76.85300000000001 - type: recall_at_3 value: 11.991999999999999 - type: recall_at_5 value: 16.85 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.87834913790886 - type: cos_sim_spearman value: 81.04583513112122 - type: euclidean_pearson value: 81.20484174558065 - type: euclidean_spearman value: 80.76430832561769 - type: manhattan_pearson value: 81.21416730978615 - type: manhattan_spearman value: 80.7797637394211 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.56143998865157 - type: cos_sim_spearman value: 79.75387012744471 - type: euclidean_pearson value: 83.7877519997019 - type: euclidean_spearman value: 79.90489748003296 - type: manhattan_pearson value: 83.7540590666095 - type: manhattan_spearman value: 79.86434577931573 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 83.92102564177941 - type: cos_sim_spearman value: 84.98234585939103 - type: euclidean_pearson value: 84.47729567593696 - type: euclidean_spearman value: 85.09490696194469 - type: manhattan_pearson value: 84.38622951588229 - type: manhattan_spearman value: 85.02507171545574 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 80.1891164763377 - type: cos_sim_spearman value: 80.7997969966883 - type: euclidean_pearson value: 80.48572256162396 - type: euclidean_spearman value: 80.57851903536378 - type: manhattan_pearson value: 80.4324819433651 - type: manhattan_spearman value: 80.5074526239062 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test 
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 82.64319975116025 - type: cos_sim_spearman value: 84.88671197763652 - type: euclidean_pearson value: 84.74692193293231 - type: euclidean_spearman value: 85.27151722073653 - type: manhattan_pearson value: 84.72460516785438 - type: manhattan_spearman value: 85.26518899786687 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.24687565822381 - type: cos_sim_spearman value: 85.60418454111263 - type: euclidean_pearson value: 84.85829740169851 - type: euclidean_spearman value: 85.66378014138306 - type: manhattan_pearson value: 84.84672408808835 - type: manhattan_spearman value: 85.63331924364891 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 84.87758895415485 - type: cos_sim_spearman value: 85.8193745617297 - type: euclidean_pearson value: 85.78719118848134 - type: euclidean_spearman value: 84.35797575385688 - type: manhattan_pearson value: 85.97919844815692 - type: manhattan_spearman value: 84.58334745175151 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.27076035963599 - type: cos_sim_spearman value: 67.21433656439973 - type: euclidean_pearson value: 68.07434078679324 - type: euclidean_spearman value: 66.0249731719049 - type: manhattan_pearson value: 67.95495198947476 - type: manhattan_spearman value: 65.99893908331886 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 82.22437747056817 - type: cos_sim_spearman value: 85.0995685206174 - type: euclidean_pearson value: 84.08616925603394 - type: euclidean_spearman value: 84.89633925691658 - type: manhattan_pearson value: 84.08332675923133 - type: manhattan_spearman value: 84.8858228112915 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.6909022589666 - type: mrr value: 96.43341952165481 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.660999999999994 - type: map_at_10 value: 67.625 - type: map_at_100 value: 68.07600000000001 - type: map_at_1000 value: 68.10199999999999 - type: map_at_3 value: 64.50399999999999 - type: map_at_5 value: 66.281 - type: mrr_at_1 value: 61 - type: mrr_at_10 value: 68.953 - type: mrr_at_100 value: 69.327 - type: mrr_at_1000 value: 69.352 - type: mrr_at_3 value: 66.833 - type: mrr_at_5 value: 68.05 - type: ndcg_at_1 value: 61 - type: ndcg_at_10 value: 72.369 - type: ndcg_at_100 value: 74.237 - type: ndcg_at_1000 value: 74.939 - type: ndcg_at_3 value: 67.284 - type: ndcg_at_5 value: 69.72500000000001 - type: precision_at_1 value: 61 - type: precision_at_10 value: 9.733 - type: precision_at_100 value: 1.0670000000000002 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 26.222 - type: precision_at_5 value: 17.4 - type: recall_at_1 value: 57.660999999999994 - type: recall_at_10 value: 
85.656 - type: recall_at_100 value: 93.833 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 71.961 - type: recall_at_5 value: 78.094 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.86930693069307 - type: cos_sim_ap value: 96.76685487950894 - type: cos_sim_f1 value: 93.44587884806354 - type: cos_sim_precision value: 92.80078895463511 - type: cos_sim_recall value: 94.1 - type: dot_accuracy value: 99.54356435643564 - type: dot_ap value: 81.18659960405607 - type: dot_f1 value: 75.78008915304605 - type: dot_precision value: 75.07360157016683 - type: dot_recall value: 76.5 - type: euclidean_accuracy value: 99.87326732673267 - type: euclidean_ap value: 96.8102411908941 - type: euclidean_f1 value: 93.6127744510978 - type: euclidean_precision value: 93.42629482071713 - type: euclidean_recall value: 93.8 - type: manhattan_accuracy value: 99.87425742574257 - type: manhattan_ap value: 96.82857341435529 - type: manhattan_f1 value: 93.62129583124059 - type: manhattan_precision value: 94.04641775983855 - type: manhattan_recall value: 93.2 - type: max_accuracy value: 99.87425742574257 - type: max_ap value: 96.82857341435529 - type: max_f1 value: 93.62129583124059 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.92560972698926 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.92797240259008 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 55.244624045597654 - type: mrr value: 56.185303666921314 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.02491987312937 - type: cos_sim_spearman value: 32.055592206679734 - type: dot_pearson value: 24.731627575422557 - type: dot_spearman value: 24.308029077069733 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.231 - type: map_at_10 value: 1.899 - type: map_at_100 value: 9.498 - type: map_at_1000 value: 20.979999999999997 - type: map_at_3 value: 0.652 - type: map_at_5 value: 1.069 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 93.4 - type: mrr_at_100 value: 93.4 - type: mrr_at_1000 value: 93.4 - type: mrr_at_3 value: 93 - type: mrr_at_5 value: 93.4 - type: ndcg_at_1 value: 86 - type: ndcg_at_10 value: 75.375 - type: ndcg_at_100 value: 52.891999999999996 - type: ndcg_at_1000 value: 44.952999999999996 - type: ndcg_at_3 value: 81.05 - type: ndcg_at_5 value: 80.175 - type: precision_at_1 value: 88 - type: precision_at_10 value: 79 - type: precision_at_100 value: 53.16 - type: precision_at_1000 value: 19.408 - type: precision_at_3 value: 85.333 - type: precision_at_5 value: 84 - type: recall_at_1 value: 0.231 - type: recall_at_10 value: 2.078 - type: recall_at_100 value: 12.601 - type: 
recall_at_1000 value: 41.296 - type: recall_at_3 value: 0.6779999999999999 - type: recall_at_5 value: 1.1360000000000001 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.782 - type: map_at_10 value: 10.204 - type: map_at_100 value: 16.176 - type: map_at_1000 value: 17.456 - type: map_at_3 value: 5.354 - type: map_at_5 value: 7.503 - type: mrr_at_1 value: 40.816 - type: mrr_at_10 value: 54.010000000000005 - type: mrr_at_100 value: 54.49 - type: mrr_at_1000 value: 54.49 - type: mrr_at_3 value: 48.980000000000004 - type: mrr_at_5 value: 51.735 - type: ndcg_at_1 value: 36.735 - type: ndcg_at_10 value: 26.61 - type: ndcg_at_100 value: 36.967 - type: ndcg_at_1000 value: 47.274 - type: ndcg_at_3 value: 30.363 - type: ndcg_at_5 value: 29.448999999999998 - type: precision_at_1 value: 40.816 - type: precision_at_10 value: 23.878 - type: precision_at_100 value: 7.693999999999999 - type: precision_at_1000 value: 1.4489999999999998 - type: precision_at_3 value: 31.293 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.782 - type: recall_at_10 value: 16.485 - type: recall_at_100 value: 46.924 - type: recall_at_1000 value: 79.365 - type: recall_at_3 value: 6.52 - type: recall_at_5 value: 10.48 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.08300000000001 - type: ap value: 13.91559884590195 - type: f1 value: 53.956838444291364 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.34069043576683 - type: f1 value: 59.662041994618406 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 53.70780611078653 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.10734934732073 - type: cos_sim_ap value: 77.58349999516054 - type: cos_sim_f1 value: 70.25391395868965 - type: cos_sim_precision value: 70.06035161374967 - type: cos_sim_recall value: 70.44854881266491 - type: dot_accuracy value: 80.60439887941826 - type: dot_ap value: 54.52935200483575 - type: dot_f1 value: 54.170444242973716 - type: dot_precision value: 47.47715534366309 - type: dot_recall value: 63.06068601583114 - type: euclidean_accuracy value: 87.26828396018358 - type: euclidean_ap value: 78.00158454104036 - type: euclidean_f1 value: 70.70292457670601 - type: euclidean_precision value: 68.79680479281079 - type: euclidean_recall value: 72.71767810026385 - type: manhattan_accuracy value: 87.11330988853788 - type: manhattan_ap value: 77.92527099601855 - type: manhattan_f1 value: 70.76488706365502 - type: manhattan_precision value: 68.89055472263868 - type: manhattan_recall value: 72.74406332453826 - type: max_accuracy value: 87.26828396018358 - type: max_ap value: 78.00158454104036 - type: max_f1 value: 70.76488706365502 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: 
MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 87.80804905499282 - type: cos_sim_ap value: 83.06187782630936 - type: cos_sim_f1 value: 74.99716435403985 - type: cos_sim_precision value: 73.67951860931579 - type: cos_sim_recall value: 76.36279642747151 - type: dot_accuracy value: 81.83141227151008 - type: dot_ap value: 67.18241090841795 - type: dot_f1 value: 62.216037571751606 - type: dot_precision value: 56.749381227391005 - type: dot_recall value: 68.84816753926701 - type: euclidean_accuracy value: 87.91671517832887 - type: euclidean_ap value: 83.56538942001427 - type: euclidean_f1 value: 75.7327253337256 - type: euclidean_precision value: 72.48856036606828 - type: euclidean_recall value: 79.28087465352634 - type: manhattan_accuracy value: 87.86626304963713 - type: manhattan_ap value: 83.52939841172832 - type: manhattan_f1 value: 75.73635656329888 - type: manhattan_precision value: 72.99150182103836 - type: manhattan_recall value: 78.69571912534647 - type: max_accuracy value: 87.91671517832887 - type: max_ap value: 83.56538942001427 - type: max_f1 value: 75.73635656329888 license: mit language: - en --- **We recommend switching to the newer BAAI/bge-large-en-v1.5, which has a more reasonable similarity distribution and the same method of usage.**

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Contact | Citation | License

For more details, please refer to our GitHub: FlagEmbedding. English | 中文

FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, or semantic search. It can also be used in vector databases for LLMs.

************* 🌟**Updates**🌟 *************
- 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire:
- 09/15/2023: The technical report of BGE has been released
- 09/15/2023: The massive training data of BGE has been released
- 09/12/2023: New models:
  - **New reranker model**: release the cross-encoder models, which are more powerful than the embedding model. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
  - **Updated embedding model**: release the v1.5 embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction.
More
- 09/07/2023: Update fine-tuning code: add a script to mine hard negatives and support adding an instruction during fine-tuning.
- 08/09/2023: BGE models are integrated into **Langchain**; you can use them like this. The C-MTEB **leaderboard** is available.
- 08/05/2023: Release base-scale and small-scale models, **best performance among models of the same size 🤗**
- 08/02/2023: Release BGE (short for BAAI General Embedding) models, **ranked 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
- 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
## Model List

bge is short for BAAI General Embedding.

| Model | Language | | Description | query instruction for retrieval [1] |
|:-------------------------------|:--------:|:--------:|:--------|:--------:|
| BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README |
| BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | |
| BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to bge-large-en | |
| BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | |
| BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | |
| BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to bge-large-zh | |
| BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | |

[1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages.

[2\]: Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by simpler models. For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents to get the final top-3 results (see the sketch below).

All models have been uploaded to the Hugging Face Hub. If you cannot open the Hugging Face Hub, you can also download the models at .
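To make footnote [2] concrete, here is a minimal retrieve-then-rerank sketch. It assumes the FlagEmbedding package is installed and uses BAAI/bge-large-en and BAAI/bge-reranker-large as example checkpoints; the corpus and the top-2 cutoff are toy placeholders for the top-100 setting described above:

```python
import numpy as np
from FlagEmbedding import FlagModel, FlagReranker

embedder = FlagModel(
    "BAAI/bge-large-en",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)
reranker = FlagReranker("BAAI/bge-reranker-large")

corpus = [
    "The giant panda is a bear native to China.",
    "Pandas eat bamboo almost exclusively.",
    "The stock market closed higher today.",
]
query = "what do pandas eat?"

# Stage 1: dense retrieval with the bi-encoder (top-k by inner product).
q_emb = embedder.encode_queries([query])
p_emb = embedder.encode(corpus)
scores = (q_emb @ p_emb.T)[0]
top_k = np.argsort(-scores)[:2]  # toy top-2 instead of top-100

# Stage 2: re-rank the retrieved candidates with the cross-encoder.
rerank_scores = [reranker.compute_score([query, corpus[i]]) for i in top_k]
best = top_k[int(np.argmax(rerank_scores))]
print(corpus[best])
```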
## Frequently asked questions

1. How to fine-tune the bge embedding model?

Follow this example to prepare data and fine-tune your model. Some suggestions:
- Mine hard negatives following this example, which can improve retrieval performance.
- If you pre-train bge on your own data, the pre-trained model cannot be used to calculate similarity directly; it must be fine-tuned with contrastive learning before computing similarity.
- If the accuracy of the fine-tuned model is still not high enough, it is recommended to use or fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker. A sketch of the expected training-data format follows below.
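As a rough illustration of the data-preparation step, here is a minimal sketch of the JSONL pair format used in the FlagEmbedding fine-tuning examples (one JSON object per line with a query, positive passages, and hard-negative passages); the file name and example texts are placeholders:

```python
import json

# Each line holds one training example: a query, positive passages ("pos"),
# and hard-negative passages ("neg"), following the FlagEmbedding examples.
examples = [
    {
        "query": "what is a dense embedding?",
        "pos": ["A dense embedding maps text to a low-dimensional vector."],
        "neg": ["The weather in Beijing is sunny today."],
    },
]

with open("toy_finetune_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```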
2. The similarity score between two dissimilar sentences is higher than 0.5

**We suggest using bge v1.5, which alleviates the issue of the similarity distribution.**

Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity scores of the current BGE model mostly lie in the interval \[0.6, 1\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not their absolute values.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution on your own data (such as 0.8, 0.85, or even 0.9); a small sketch follows below.
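To make the point concrete, here is a minimal sketch (assuming sentence-transformers is installed and using BAAI/bge-large-en as an example checkpoint); the 0.85 threshold is an arbitrary placeholder that should be tuned on your own score distribution:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en")

sentences = ["The cat sits on the mat.", "Quantum computers factor integers."]
# Normalize so the dot product equals cosine similarity.
emb = model.encode(sentences, normalize_embeddings=True)
score = float(emb[0] @ emb[1])

# Even unrelated sentences can score above 0.5; only the relative order
# (and a threshold calibrated on your data) is meaningful.
print(f"cosine similarity: {score:.3f}")
threshold = 0.85  # placeholder; pick this from your own data
print("similar" if score >= threshold else "not similar")
```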
3. When does the query instruction need to be used?

For the v1.5 models, we improved retrieval ability when no instruction is used; omitting the instruction causes only a slight degradation in retrieval performance compared with using it. So, for convenience, you can generate embeddings without an instruction in all cases. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, the documents/passages do not need the instruction; a sketch comparing the two settings follows below.
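Since this advice is empirical, here is a minimal sketch that compares retrieval accuracy with and without the query instruction on a tiny labeled set. It assumes sentence-transformers; the instruction string is the one commonly used for the English bge models and should be checked against the Model List:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en")
INSTRUCTION = "Represent this sentence for searching relevant passages: "  # assumed

# Toy labeled data: relevant[i] is the index of the passage relevant to query i.
queries = ["what do pandas eat?", "capital of France"]
passages = ["Pandas feed mainly on bamboo.", "Paris is the capital of France."]
relevant = [0, 1]

def top1_accuracy(query_texts):
    q = model.encode(query_texts, normalize_embeddings=True)
    p = model.encode(passages, normalize_embeddings=True)  # passages: never prefixed
    hits = (q @ p.T).argmax(axis=1)
    return float((hits == relevant).mean())

# Compare both settings and keep whichever scores better on your own data.
print("without instruction:", top1_accuracy(queries))
print("with instruction:   ", top1_accuracy([INSTRUCTION + q for q in queries]))
```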
## Usage

### Usage for Embedding Model

Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers; hedged example sketches for each route are collected at the end of this section.

#### Using FlagEmbedding

Install the FlagEmbedding package first; if it doesn't work for you, see FlagEmbedding for more installation methods. For the value of the argument, see the Model List. By default, FlagModel will use all available GPUs when encoding. Please set the CUDA_VISIBLE_DEVICES environment variable to select specific GPUs; you can also set it to an empty value to make all GPUs unavailable. See the first sketch below.

#### Using Sentence-Transformers

You can also use the models with sentence-transformers (see the sketch below). For an s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see the Model List), but the instruction is not needed for passages.

#### Using Langchain

You can use the models in Langchain like this (see the sketch below).

#### Using HuggingFace Transformers

With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. See the sketch below.

### Usage for Reranker

Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized based on cross-entropy loss, so the relevance score is not bounded to a specific range.

#### Using FlagEmbedding

Get relevance scores (higher scores indicate more relevance); see the sketch below.

#### Using Huggingface transformers

See the sketch below.
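A minimal sketch of the FlagEmbedding route, assuming the FlagEmbedding package is installed; the model name and the query instruction are examples taken from the Model List, not the only valid choices:

```python
from FlagEmbedding import FlagModel

# query_instruction_for_retrieval is only applied by encode_queries();
# passages are always encoded without an instruction.
model = FlagModel(
    "BAAI/bge-large-en",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
    use_fp16=True,  # speeds up encoding with a slight performance cost
)

sentences = ["sample text 1", "sample text 2"]
embeddings = model.encode(sentences)
print(embeddings.shape)

queries = ["what is text embedding?"]
passages = ["Text embeddings map sentences to dense vectors."]
q_embeddings = model.encode_queries(queries)  # instruction added automatically
p_embeddings = model.encode(passages)
scores = q_embeddings @ p_embeddings.T
print(scores)
```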
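The same flow with sentence-transformers, as a sketch; note the manual instruction prefix for s2p queries:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en")

# Symmetric similarity: encode both sides the same way.
sentences_1 = ["sample text 1"]
sentences_2 = ["sample text 2"]
emb_1 = model.encode(sentences_1, normalize_embeddings=True)
emb_2 = model.encode(sentences_2, normalize_embeddings=True)
print(emb_1 @ emb_2.T)

# s2p retrieval: prepend the instruction to queries only.
instruction = "Represent this sentence for searching relevant passages: "
queries = ["what is text embedding?"]
passages = ["Text embeddings map sentences to dense vectors."]
q_emb = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)
```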
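A sketch of the Langchain route; HuggingFaceBgeEmbeddings is the wrapper Langchain provides for bge models, though its import path may differ across Langchain versions, so treat this as an assumption:

```python
from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-large-en"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}  # cosine similarity via dot product

embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: ",
)

vector = embeddings.embed_query("what is text embedding?")
print(len(vector))
```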
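A sketch of the plain transformers route, implementing the CLS pooling described above (the model name is an example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en")
model = AutoModel.from_pretrained("BAAI/bge-large-en")
model.eval()

sentences = ["sample text 1", "sample text 2"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    # CLS pooling: the last hidden state of the first token.
    embeddings = outputs.last_hidden_state[:, 0]
    # Normalize so that dot products are cosine similarities.
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

print(embeddings @ embeddings.T)
```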
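Finally, reranker sketches for both routes. FlagReranker is the FlagEmbedding wrapper for the bge rerankers; the raw-transformers variant assumes the reranker loads as a single-logit sequence-classification model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from FlagEmbedding import FlagReranker

pair = ["what is panda?", "The giant panda is a bear species endemic to China."]

# Route 1: FlagEmbedding wrapper. Higher (unbounded) scores mean more relevant.
reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)
print(reranker.compute_score(pair))

# Route 2: plain transformers; the score is the raw classification logit.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-large")
model = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-large")
model.eval()
with torch.no_grad():
    inputs = tokenizer([pair], padding=True, truncation=True,
                       return_tensors="pt", max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
print(scores)
```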
## Evaluation

bge models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts.

- **MTEB**:

| Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
| BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
| BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
| bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
| bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
| gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
| gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
| e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
| bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
| instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
| e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
| gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
| text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
| e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
| sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
| all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
| sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |

- **C-MTEB**: We created the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction.

| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
| BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
| BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
| BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
| bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
| BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
| multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
| BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
| m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
| m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
| multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
| multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
| text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
| luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
| text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
| text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |

- **Reranking**: See C_MTEB for the evaluation script.

| Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
| multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
| multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
| multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
| m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
| m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
| bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
| bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
| BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
| BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |

\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.

## Train

### BAAI Embedding

We pre-train the models using RetroMAE and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-training example. Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned first. For more training details for bge, see baai_general_embedding.
### BGE Reranker The cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but also more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model. We train the cross-encoder on multilingual pair data; the data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao(stxiao@baai.ac.cn) and Zheng Liu(liuzheng@baai.ac.cn). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation. ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "BAAI_bge-large-en is a versatile model excelling in text classification, retrieval, clustering, reranking, and semantic textual similarity tasks across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-large-zh-v1.5.json b/data/model_data_json/BAAI_bge-large-zh-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..0af763343c9219abaa07be108200d93e2f358268 --- /dev/null +++ b/data/model_data_json/BAAI_bge-large-zh-v1.5.json @@ -0,0 +1,25 @@ +{ + "model_id": "BAAI/bge-large-zh-v1.5", + "downloads": 220157, + "tags": [ + "sentence-transformers", + "pytorch", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "zh", + "arxiv:2401.03462", + "arxiv:2312.15503", + "arxiv:2311.13534", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - zh tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers ---

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Contact | Citation | License

For more details, please refer to our Github: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM**: LM-Cocktail - **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: BGE Reranker - **Benchmark**: C-MTEB ## News - 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularity (input length up to 8192), and **M**ulti-Functionality (unification of dense, lexical, and multi-vector/colbert retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever with state-of-the-art performance on MS MARCO and BEIR. The model and code will be open-sourced. Please stay tuned. Technical Report :fire: - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval-augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report and massive training data of BGE have been released. - 09/12/2023: New models: - **New reranker model**: release the cross-encoder models bge-reranker-base and bge-reranker-large, which are more powerful than the embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models. - **Updated embedding model**: release the v1.5 embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction. - 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding an instruction during fine-tuning. - 08/09/2023: BGE models are integrated into **Langchain**; the C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models with the **best performance among models of the same size 🤗** - 08/02/2023: Release BGE (short for BAAI General Embedding) models, which **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
## Model List bge is short for BAAI General Embedding. | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (colbert)), Multi-Linguality, and Multi-Granularity (8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: ranks **1st** on the MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model with ability similar to the large model | | | BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: ranks **1st** on the C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model with ability similar to the large model | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model with competitive performance | | [1\\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from the embedding model, the reranker uses a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, the cross-encoder is widely used to re-rank the top-k documents retrieved by simpler models. For example, use the bge embedding model to retrieve the top 100 relevant documents, then use the bge reranker to re-rank those 100 documents and obtain the final top-3 results (a sketch of this pipeline follows below). All models have been uploaded to the Huggingface Hub. If you cannot open the Huggingface Hub, you can also download the models from an alternative source.
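As a rough sketch of the retrieve-then-rerank recipe from footnote [2] (the model names come from the table above; the corpus, query, and cutoffs are illustrative, and assume the `FlagEmbedding` package):

```python
# Retrieve candidates with a bge embedding model (bi-encoder),
# then re-rank them with a bge reranker (cross-encoder).
import numpy as np
from FlagEmbedding import FlagModel, FlagReranker

corpus = ["passage about pandas", "passage about finance", "passage about bamboo"]
query = "what do pandas eat?"

embedder = FlagModel(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)
reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)

# Stage 1: dense retrieval; keep the k highest-scoring passages
# (footnote [2] uses top-100 in practice).
k = 2
p_emb = embedder.encode(corpus)
q_emb = embedder.encode_queries([query])[0]
candidates = np.argsort(-(p_emb @ q_emb))[:k]

# Stage 2: cross-encoder re-ranking of the candidate passages.
scores = reranker.compute_score([[query, corpus[i]] for i in candidates])
final = [corpus[candidates[i]] for i in np.argsort(scores)[::-1]]
print(final)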
1. How to fine-tune the bge embedding model? Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example; this can improve retrieval performance. - If you pre-train bge on your own data, the pre-trained model cannot be used to calculate similarity directly; it must be fine-tuned with contrastive learning first. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker. (A sketch of the training data format follows.)
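For orientation, the fine-tuning data referenced here is JSON lines, one record per query with its positive and hard-negative passages; a hypothetical example (all field contents are invented for illustration):

```python
# One jsonl record in the shape used by the FlagEmbedding
# fine-tuning examples (the text below is invented).
example = {
    "query": "what is contrastive learning",
    "pos": ["Contrastive learning trains an encoder to pull matching pairs together ..."],
    "neg": [  # hard negatives mined as suggested above
        "The Great Wall of China is a series of fortifications ...",
        "Photosynthesis converts light energy into chemical energy ...",
    ],
}
```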
2. The similarity score between two dissimilar sentences is higher than 0.5. **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not their absolute values.** If you need to filter similar sentences based on a similarity threshold, select an appropriate threshold based on the similarity distribution on your own data (such as 0.8, 0.85, or even 0.9).
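To make the point about relative order versus absolute values concrete, a small sketch (the 0.85 cutoff is illustrative and should be calibrated on your own data):

```python
# bge embeddings are normalized, so the inner product is cosine similarity.
from FlagEmbedding import FlagModel

model = FlagModel("BAAI/bge-large-zh-v1.5", use_fp16=True)
anchor = model.encode(["今天天气很好"])          # "the weather is nice today"
candidates = ["今天阳光明媚", "股市今天下跌了"]   # "sunny today" / "stocks fell today"
scores = (anchor @ model.encode(candidates).T)[0]

# Ranking is reliable even though both scores may exceed 0.5:
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
# Absolute filtering needs a threshold tuned on your data:
similar = [c for c, s in zip(candidates, scores) if s >= 0.85]
print(ranked, similar)
```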
3. When does the query instruction need to be used? For the bge-*-v1.5 models, we improved retrieval ability when not using an instruction; using no instruction causes only a slight degradation in retrieval performance compared with using one, so for convenience you can generate embeddings without an instruction in all cases. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions for queries is to choose the setting that achieves better performance on your task.** In all cases, the documents/passages do not need the instruction.
## Usage ### Usage for Embedding Model Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding Install the FlagEmbedding package (e.g., `pip install -U FlagEmbedding`); if that doesn't work for you, see the FlagEmbedding repository for other installation methods. For the value of the `query_instruction_for_retrieval` argument, see the Model List. By default, FlagModel uses all available GPUs when encoding; set the `CUDA_VISIBLE_DEVICES` environment variable to select specific GPUs, or set it to an empty string to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers. For the s2p (short query to long passage) retrieval task, each short query should start with an instruction (see the Model List for the instructions), but the instruction is not needed for passages. #### Using Langchain You can use the bge models in Langchain as well. #### Using HuggingFace Transformers With the transformers package, you pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding (see the sketch further below). ### Usage for Reranker Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized with cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance). #### Using Huggingface transformers ## Evaluation The bge models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We created the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for the evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned first. For more training details for bge, see baai_general_embedding.
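The code for the Transformers-based usage described above is not reproduced in this card; a minimal sketch of the CLS-pooling approach it outlines (the sample sentences are illustrative):

```python
# Encode with plain transformers: run the model, take the last hidden
# state of the first token ([CLS]), then L2-normalize the embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-zh-v1.5")
model.eval()

sentences = ["样例数据-1", "样例数据-2"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state
    cls_embeddings = last_hidden[:, 0]  # [CLS] token
    embeddings = torch.nn.functional.normalize(cls_embeddings, dim=1)

print(embeddings @ embeddings.T)  # cosine similarity matrix
```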
### BGE Reranker The cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but also more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model. We train the cross-encoder on multilingual pair data; the data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao(stxiao@baai.ac.cn) and Zheng Liu(liuzheng@baai.ac.cn). ## Citation If you find this repository useful, please consider giving it a star :star: and a citation. ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Generates dense vector embeddings for Chinese text to enable tasks like sentence similarity and retrieval-augmented language models." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-m3.json b/data/model_data_json/BAAI_bge-m3.json new file mode 100644 index 0000000000000000000000000000000000000000..6ec2f82b432ac8e1d32424779fc5baa12b428967 --- /dev/null +++ b/data/model_data_json/BAAI_bge-m3.json @@ -0,0 +1,24 @@ +{ + "model_id": "BAAI/bge-m3", + "downloads": 3377897, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "xlm-roberta", + "feature-extraction", + "sentence-similarity", + "arxiv:2402.03216", + "arxiv:2004.04906", + "arxiv:2106.14807", + "arxiv:2107.05720", + "arxiv:2004.12832", + "license:mit", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity license: mit --- For more details, please refer to our GitHub repo. # BGE-M3 (paper, code) In this project, we introduce BGE-M3, which is distinguished by its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity. - Multi-Functionality: it can simultaneously perform the three common retrieval functionalities of an embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval. - Multi-Linguality: it supports more than 100 working languages. - Multi-Granularity: it can process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens. **Some suggestions for a retrieval pipeline in RAG** We recommend using the following pipeline: hybrid retrieval + re-ranking. - Hybrid retrieval leverages the strengths of various methods, offering higher accuracy and stronger generalization capabilities. A classic example: using both embedding retrieval and the BM25 algorithm. Now you can try BGE-M3, which supports both embedding and sparse retrieval; this lets you obtain token weights (similar to BM25) at no additional cost when generating dense embeddings. To use hybrid retrieval, you can refer to Vespa and Milvus. - As a cross-encoder model, the re-ranker demonstrates higher accuracy than a bi-encoder embedding model. Using a re-ranking model (e.g., bge-reranker, bge-reranker-v2) after retrieval can further filter the selected text. ## News: - 2024/7/1: **We update the MIRACL evaluation results of BGE-M3**. To reproduce the new results, you can refer to: bge-m3_miracl_2cr.
We have also updated our paper on arXiv.
Details: The previous test results were lower because we mistakenly removed passages that have the same id as the query from the search results. After correcting this mistake, the overall performance of BGE-M3 on MIRACL is higher than the previous results, but the experimental conclusion remains unchanged; the other results are not affected by this mistake. To reproduce the previous, lower results, you need to add the corresponding parameter when searching the passages.
- 2024/3/20: **Thanks, Milvus team!** Now you can use hybrid retrieval of bge-m3 in Milvus: pymilvus/examples/hello_hybrid_sparse_dense.py. - 2024/3/8: **Thanks for the experimental results from @Yannael. In this benchmark, BGE-M3 achieves top performance in both English and other languages, surpassing models such as OpenAI.** - 2024/3/2: Release unified fine-tuning example and data. - 2024/2/6: We release MLDR (a long-document retrieval dataset covering 13 languages) and an evaluation pipeline. - 2024/2/1: **Thanks for the excellent tool from Vespa.** You can easily use multiple modes of BGE-M3 following this notebook. ## Specs - Model | Model Name | Dimension | Sequence Length | Introduction | |:----:|:---:|:---:|:---:| | BAAI/bge-m3 | 1024 | 8192 | multilingual; unified fine-tuning (dense, sparse, and colbert) from bge-m3-unsupervised | | BAAI/bge-m3-unsupervised | 1024 | 8192 | multilingual; contrastive learning from bge-m3-retromae | | BAAI/bge-m3-retromae | -- | 8192 | multilingual; extends the max_length of xlm-roberta to 8192, further pretrained via retromae | | BAAI/bge-large-en-v1.5 | 1024 | 512 | English model | | BAAI/bge-base-en-v1.5 | 768 | 512 | English model | | BAAI/bge-small-en-v1.5 | 384 | 512 | English model | - Data | Dataset | Introduction | |:----------------------------------------------------------:|:-------------------------------------------------:| | MLDR | Document Retrieval Dataset, covering 13 languages | | bge-m3-data | Fine-tuning data used by bge-m3 | ## FAQ **1. Introduction to the different retrieval methods** - Dense retrieval: map the text into a single embedding, e.g., DPR, BGE-v1.5. - Sparse retrieval (lexical matching): a vector of size equal to the vocabulary, with most positions set to zero, calculating a weight only for tokens present in the text, e.g., BM25, unicoil, and splade. - Multi-vector retrieval: use multiple vectors to represent a text, e.g., ColBERT. **2. How to use BGE-M3 in other projects?** For embedding retrieval, you can employ the BGE-M3 model using the same approach as BGE; the only difference is that BGE-M3 no longer requires adding instructions to the queries. For hybrid retrieval, you can use Vespa and Milvus. **3. How to fine-tune the bge-m3 model?** You can follow the common example to fine-tune the dense embedding. If you want to fine-tune all embedding functions of m3 (dense, sparse, and colbert), you can refer to the unified fine-tuning example. ## Usage Install the FlagEmbedding package (e.g., `pip install -U FlagEmbedding`; see the GitHub repo for alternatives). ### Generate Embedding for text - Dense Embedding: you can also use sentence-transformers and huggingface transformers to generate dense embeddings; refer to baai_general_embedding for details. - Sparse Embedding (Lexical Weight) - Multi-Vector (ColBERT) ### Compute score for text pairs Given a list of text pairs, you can get the scores computed by the different methods. ## Evaluation We provide the evaluation scripts for MKQA and MLDR. ### Benchmarks from the open-source community The BGE-M3 model emerged as the top performer on this benchmark (OAI is short for OpenAI). For more details, please refer to the article and GitHub repo. ### Our results - Multilingual (MIRACL dataset) - Cross-lingual (MKQA dataset) - Long Document Retrieval - MLDR: Please note that MLDR is a document retrieval dataset we constructed via LLM, covering 13 languages and including test, validation, and training sets. We utilized the training set from MLDR to enhance the model's long-document retrieval capabilities. Therefore, comparing against the baseline fine-tuned without the long-document dataset is more equitable. Additionally, this long-document retrieval dataset will be open-sourced to address the current lack of open-source multilingual long-text retrieval datasets. We believe this data will help the open-source community train document retrieval models. - NarrativeQA - Comparison with BM25: We utilized Pyserini to implement BM25, and the test results can be reproduced with this script. We tested BM25 using two different tokenizers: one using the Lucene Analyzer and the other using the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta). The results indicate that BM25 remains a competitive baseline, especially in long-document retrieval. ## Training - Self-knowledge Distillation: combining multiple outputs from different retrieval modes as a reward signal to enhance the performance of a single mode (especially for sparse retrieval and multi-vector (colbert) retrieval). - Efficient Batching: improves efficiency when fine-tuning on long text. The small-batch strategy is simple but effective and can also be used to fine-tune large embedding models. - MCLS: a simple method to improve performance on long text without fine-tuning; it is useful if you do not have enough resources to fine-tune the model on long text. Refer to our report for more details. ## Acknowledgement Thanks to the authors of the open-sourced datasets, including MIRACL, MKQA, and NarrativeQA, and to the open-sourced libraries like Tevatron and Pyserini. ## Citation If you find this repository useful, please consider giving it a star :star: and a citation.", + "model_explanation_gemini": "BGE-M3 is a versatile multilingual embedding model supporting dense, sparse, and multi-vector retrieval across 100+ languages, handling text from short sentences to long documents up to 8192 tokens."
+} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-multilingual-gemma2.json b/data/model_data_json/BAAI_bge-multilingual-gemma2.json new file mode 100644 index 0000000000000000000000000000000000000000..34e2978cb79f2901d13fa3c58905834264bdc8ca --- /dev/null +++ b/data/model_data_json/BAAI_bge-multilingual-gemma2.json @@ -0,0 +1,22 @@ +{ + "model_id": "BAAI/bge-multilingual-gemma2", + "downloads": 76038, + "tags": [ + "sentence-transformers", + "safetensors", + "gemma2", + "feature-extraction", + "sentence-similarity", + "transformers", + "mteb", + "arxiv:2402.03216", + "arxiv:2309.07597", + "license:gemma", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - feature-extraction - sentence-similarity - sentence-transformers - transformers - mteb license: gemma model-index: - name: bge-multilingual-gemma2 results: - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: main_score value: 38.11433513284057 - type: ndcg_at_1 value: 48.45201238390093 - type: ndcg_at_3 value: 44.451438575534574 - type: ndcg_at_5 value: 41.13929990797894 - type: ndcg_at_10 value: 38.11433513284057 - type: ndcg_at_100 value: 35.36065387898559 - type: ndcg_at_1000 value: 44.01125752781003 - type: map_at_1 value: 5.638004398054564 - type: map_at_3 value: 10.375632572339333 - type: map_at_5 value: 11.820531148202422 - type: map_at_10 value: 14.087436978063389 - type: map_at_100 value: 18.25397463114958 - type: map_at_1000 value: 19.868440221606203 - type: precision_at_1 value: 49.84520123839009 - type: precision_at_3 value: 41.89886480908153 - type: precision_at_5 value: 35.356037151702814 - type: precision_at_10 value: 28.513931888544857 - type: precision_at_100 value: 9.337461300309604 - type: precision_at_1000 value: 2.210216718266251 - type: recall_at_1 value: 5.638004398054564 - type: recall_at_3 value: 11.938154656310312 - type: recall_at_5 value: 14.06183119422843 - type: recall_at_10 value: 18.506397834147705 - type: recall_at_100 value: 35.96995569451433 - type: recall_at_1000 value: 68.31771509404795 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: main_score value: 45.70688915742828 - type: ndcg_at_1 value: 26.002865329512893 - type: ndcg_at_3 value: 37.49665652114275 - type: ndcg_at_5 value: 41.684045067615834 - type: ndcg_at_10 value: 45.70688915742828 - type: ndcg_at_100 value: 51.08932609519671 - type: ndcg_at_1000 value: 51.98806137292924 - type: map_at_1 value: 25.35219675262655 - type: map_at_3 value: 34.39549506526583 - type: map_at_5 value: 36.74936326010824 - type: map_at_10 value: 38.44429852488596 - type: map_at_100 value: 39.60260286311527 - type: map_at_1000 value: 39.64076154054021 - type: precision_at_1 value: 26.002865329512893 - type: precision_at_3 value: 15.840496657115954 - type: precision_at_5 value: 11.647564469914684 - type: precision_at_10 value: 7.1275071633243705 - type: precision_at_100 value: 0.9782234957019871 - type: precision_at_1000 value: 0.10565902578797497 - type: recall_at_1 value: 25.35219675262655 - type: recall_at_3 value: 45.78438395415474 - type: recall_at_5 value: 55.83213944603631 - type: recall_at_10 value: 68.08500477554918 - type: recall_at_100 value: 92.55133715377269 - type: recall_at_1000 value: 99.29083094555875 - 
task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: main_score value: 60.04205769404706 - type: ndcg_at_1 value: 59.25925925925925 - type: ndcg_at_3 value: 55.96637679199298 - type: ndcg_at_5 value: 56.937223390223956 - type: ndcg_at_10 value: 60.04205769404706 - type: ndcg_at_100 value: 66.01619664462949 - type: ndcg_at_1000 value: 67.59651529720728 - type: map_at_1 value: 31.5081163692275 - type: map_at_3 value: 45.7486689836227 - type: map_at_5 value: 48.944906602314 - type: map_at_10 value: 51.85427043799874 - type: map_at_100 value: 53.92920237379484 - type: map_at_1000 value: 54.04694438963671 - type: precision_at_1 value: 59.25925925925925 - type: precision_at_3 value: 37.44855967078195 - type: precision_at_5 value: 26.913580246913547 - type: precision_at_10 value: 16.52777777777774 - type: precision_at_100 value: 2.2962962962962754 - type: precision_at_1000 value: 0.2566358024691334 - type: recall_at_1 value: 31.5081163692275 - type: recall_at_3 value: 50.71759045138676 - type: recall_at_5 value: 57.49321152098932 - type: recall_at_10 value: 67.36356750245642 - type: recall_at_100 value: 88.67335767798735 - type: recall_at_1000 value: 97.83069725199356 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 metrics: - type: main_score value: 26.93150756480961 - type: ndcg_at_1 value: 30.8 - type: ndcg_at_3 value: 25.048085553386628 - type: ndcg_at_5 value: 22.351207380852305 - type: ndcg_at_10 value: 26.93150756480961 - type: ndcg_at_100 value: 37.965486832874014 - type: ndcg_at_1000 value: 43.346046425140244 - type: map_at_1 value: 6.238333333333366 - type: map_at_3 value: 11.479166666666679 - type: map_at_5 value: 14.215999999999983 - type: map_at_10 value: 16.774632936507945 - type: map_at_100 value: 20.148869158557293 - type: map_at_1000 value: 20.528644104490823 - type: precision_at_1 value: 30.8 - type: precision_at_3 value: 23.466666666666736 - type: precision_at_5 value: 19.899999999999967 - type: precision_at_10 value: 14.069999999999938 - type: precision_at_100 value: 2.9770000000000065 - type: precision_at_1000 value: 0.42569999999999486 - type: recall_at_1 value: 6.238333333333366 - type: recall_at_3 value: 14.29333333333338 - type: recall_at_5 value: 20.206666666666628 - type: recall_at_10 value: 28.573333333333224 - type: recall_at_100 value: 60.43666666666675 - type: recall_at_1000 value: 86.3649999999997 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: main_score value: 90.38165339181239 - type: ndcg_at_1 value: 84.86348634863486 - type: ndcg_at_3 value: 88.98667069230609 - type: ndcg_at_5 value: 89.86028996734895 - type: ndcg_at_10 value: 90.38165339181239 - type: ndcg_at_100 value: 90.99655378684439 - type: ndcg_at_1000 value: 91.15536362599602 - type: map_at_1 value: 78.8556296105801 - type: map_at_3 value: 86.24061810942983 - type: map_at_5 value: 86.94776680048933 - type: map_at_10 value: 87.26956235873007 - type: map_at_100 value: 87.47986397174834 - type: map_at_1000 value: 87.4897076664281 - type: precision_at_1 value: 84.86348634863486 - type: precision_at_3 value: 34.02340234023296 - type: precision_at_5 value: 21.10411041104359 - type: precision_at_10 value: 10.828082808282083 - type: precision_at_100 value: 1.1381638163816703 - type: 
precision_at_1000 value: 0.11662166216622569 - type: recall_at_1 value: 78.8556296105801 - type: recall_at_3 value: 92.34465708475605 - type: recall_at_5 value: 94.58010682020583 - type: recall_at_10 value: 96.10713452297611 - type: recall_at_100 value: 98.31672452959585 - type: recall_at_1000 value: 99.25967001462051 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: main_score value: 77.36555747844541 - type: ndcg_at_1 value: 57.681365576102415 - type: ndcg_at_3 value: 72.01664798084765 - type: ndcg_at_5 value: 75.26345973082836 - type: ndcg_at_10 value: 77.36555747844541 - type: ndcg_at_100 value: 78.15567833673768 - type: ndcg_at_1000 value: 78.16528851292641 - type: map_at_1 value: 57.681365576102415 - type: map_at_3 value: 68.59886201991475 - type: map_at_5 value: 70.38051209103858 - type: map_at_10 value: 71.26684955632336 - type: map_at_100 value: 71.4637216600468 - type: map_at_1000 value: 71.46414501573332 - type: precision_at_1 value: 57.681365576102415 - type: precision_at_3 value: 27.287814129919084 - type: precision_at_5 value: 17.965860597439132 - type: precision_at_10 value: 9.623044096728066 - type: precision_at_100 value: 0.995732574679925 - type: precision_at_1000 value: 0.09964438122332549 - type: recall_at_1 value: 57.681365576102415 - type: recall_at_3 value: 81.86344238975818 - type: recall_at_5 value: 89.82930298719772 - type: recall_at_10 value: 96.23044096728307 - type: recall_at_100 value: 99.57325746799431 - type: recall_at_1000 value: 99.6443812233286 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: main_score value: 72.0465439956427 - type: ndcg_at_1 value: 58.666666666666664 - type: ndcg_at_3 value: 66.84566274610046 - type: ndcg_at_5 value: 69.46578881873717 - type: ndcg_at_10 value: 72.0465439956427 - type: ndcg_at_100 value: 74.25705461923272 - type: ndcg_at_1000 value: 74.63689058493014 - type: map_at_1 value: 55.59444444444445 - type: map_at_3 value: 63.71851851851852 - type: map_at_5 value: 65.5362962962963 - type: map_at_10 value: 66.84112433862435 - type: map_at_100 value: 67.36269426417417 - type: map_at_1000 value: 67.37568665562833 - type: precision_at_1 value: 58.666666666666664 - type: precision_at_3 value: 26.444444444444425 - type: precision_at_5 value: 17.66666666666672 - type: precision_at_10 value: 9.866666666666706 - type: precision_at_100 value: 1.0966666666666596 - type: precision_at_1000 value: 0.11266666666666675 - type: recall_at_1 value: 55.59444444444445 - type: recall_at_3 value: 72.72777777777777 - type: recall_at_5 value: 79.31666666666666 - type: recall_at_10 value: 86.75 - type: recall_at_100 value: 96.66666666666667 - type: recall_at_1000 value: 99.66666666666667 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: bb9466bac8153a0349341eb1b22e06409e78ef4e metrics: - type: main_score value: 64.26928884606035 - type: ndcg_at_1 value: 63.0 - type: ndcg_at_3 value: 64.18432764386345 - type: ndcg_at_5 value: 64.73235515799435 - type: ndcg_at_10 value: 64.26928884606035 - type: ndcg_at_100 value: 52.39807133285409 - type: ndcg_at_1000 value: 52.19937563361241 - type: map_at_1 value: 0.18483494997310454 - type: map_at_3 value: 0.5139705769331114 - type: map_at_5 value: 0.8245601222717243 - type: map_at_10 value: 1.5832530269558573 - type: 
map_at_100 value: 9.664760850102393 - type: map_at_1000 value: 25.568347406468334 - type: precision_at_1 value: 70.0 - type: precision_at_3 value: 71.33333333333333 - type: precision_at_5 value: 71.60000000000001 - type: precision_at_10 value: 70.99999999999996 - type: precision_at_100 value: 55.140000000000015 - type: precision_at_1000 value: 23.857999999999997 - type: recall_at_1 value: 0.18483494997310454 - type: recall_at_3 value: 0.5584287301859913 - type: recall_at_5 value: 0.9489025953807098 - type: recall_at_10 value: 1.9023711039425688 - type: recall_at_100 value: 13.596810701594226 - type: recall_at_1000 value: 50.92058432920189 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: main_score value: 39.37204193531481 - type: ndcg_at_1 value: 35.11400651465798 - type: ndcg_at_3 value: 32.36672790229743 - type: ndcg_at_5 value: 34.79369234162357 - type: ndcg_at_10 value: 39.37204193531481 - type: ndcg_at_100 value: 47.544500439419124 - type: ndcg_at_1000 value: 50.305733346049855 - type: map_at_1 value: 15.516829533116216 - type: map_at_3 value: 23.73669923995656 - type: map_at_5 value: 26.43208469055373 - type: map_at_10 value: 28.912036175309773 - type: map_at_100 value: 31.413762299240894 - type: map_at_1000 value: 31.596796093997014 - type: precision_at_1 value: 35.11400651465798 - type: precision_at_3 value: 24.994571118349487 - type: precision_at_5 value: 19.231270358305956 - type: precision_at_10 value: 12.690553745928165 - type: precision_at_100 value: 2.1576547231270466 - type: precision_at_1000 value: 0.2676221498371306 - type: recall_at_1 value: 15.516829533116216 - type: recall_at_3 value: 29.994571118349512 - type: recall_at_5 value: 37.14223669923993 - type: recall_at_10 value: 47.29207383279043 - type: recall_at_100 value: 74.37133550488598 - type: recall_at_1000 value: 89.41585233441913 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: main_score value: 83.26282954330777 - type: ndcg_at_1 value: 87.5489534098582 - type: ndcg_at_3 value: 78.7646435855166 - type: ndcg_at_5 value: 81.41629077444277 - type: ndcg_at_10 value: 83.26282954330777 - type: ndcg_at_100 value: 85.2771369900158 - type: ndcg_at_1000 value: 85.77519303747493 - type: map_at_1 value: 43.7744767049291 - type: map_at_3 value: 73.4661264911093 - type: map_at_5 value: 75.7169705154168 - type: map_at_10 value: 76.89183627536043 - type: map_at_100 value: 77.53680315727078 - type: map_at_1000 value: 77.5649311522075 - type: precision_at_1 value: 87.5489534098582 - type: precision_at_3 value: 51.74881836596788 - type: precision_at_5 value: 33.13977042539127 - type: precision_at_10 value: 17.492234976369023 - type: precision_at_100 value: 1.9030384875084312 - type: precision_at_1000 value: 0.19679945982446267 - type: recall_at_1 value: 43.7744767049291 - type: recall_at_3 value: 77.62322754895341 - type: recall_at_5 value: 82.84942606347063 - type: recall_at_10 value: 87.4611748818366 - type: recall_at_100 value: 95.15192437542201 - type: recall_at_1000 value: 98.39972991222147 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: main_score value: 71.44670934705796 - type: ndcg_at_1 value: 54.026651216685984 - type: ndcg_at_3 value: 65.1267452491225 - type: ndcg_at_5 
value: 68.6696802020747 - type: ndcg_at_10 value: 71.44670934705796 - type: ndcg_at_100 value: 73.74642927386503 - type: ndcg_at_1000 value: 73.90908268307331 - type: map_at_1 value: 48.50086906141366 - type: map_at_3 value: 61.07691193510995 - type: map_at_5 value: 63.36580243337187 - type: map_at_10 value: 64.74485498782997 - type: map_at_100 value: 65.34329174534082 - type: map_at_1000 value: 65.35107870745652 - type: precision_at_1 value: 54.026651216685984 - type: precision_at_3 value: 28.437620702974996 - type: precision_at_5 value: 19.20625724217861 - type: precision_at_10 value: 10.67207415990753 - type: precision_at_100 value: 1.1987253765932955 - type: precision_at_1000 value: 0.12143684820393259 - type: recall_at_1 value: 48.50086906141366 - type: recall_at_3 value: 73.19428350714561 - type: recall_at_5 value: 81.19689069138664 - type: recall_at_10 value: 89.04741212823485 - type: recall_at_100 value: 98.58053302433372 - type: recall_at_1000 value: 99.75376593279258 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 metrics: - type: main_score value: 90.03760323006117 - type: ndcg_at_1 value: 83.53 - type: ndcg_at_3 value: 87.53800795646302 - type: ndcg_at_5 value: 88.92909168525203 - type: ndcg_at_10 value: 90.03760323006117 - type: ndcg_at_100 value: 91.08558507332712 - type: ndcg_at_1000 value: 91.1430039358834 - type: map_at_1 value: 72.61760432018744 - type: map_at_3 value: 83.8457060028347 - type: map_at_5 value: 85.6228412692169 - type: map_at_10 value: 86.67700531365115 - type: map_at_100 value: 87.29851728827602 - type: map_at_1000 value: 87.31014621733333 - type: precision_at_1 value: 83.53 - type: precision_at_3 value: 38.33666666667159 - type: precision_at_5 value: 25.12599999999881 - type: precision_at_10 value: 13.629999999998683 - type: precision_at_100 value: 1.5431999999999773 - type: precision_at_1000 value: 0.15671999999997974 - type: recall_at_1 value: 72.61760432018744 - type: recall_at_3 value: 89.06736052932686 - type: recall_at_5 value: 93.09634203522849 - type: recall_at_10 value: 96.35128012894234 - type: recall_at_100 value: 99.7740237858541 - type: recall_at_1000 value: 99.99690476190477 - task: type: Retrieval dataset: type: mteb/webis-touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: main_score value: 30.2563523019649 - type: ndcg_at_1 value: 37.755102040816325 - type: ndcg_at_3 value: 34.45349994459905 - type: ndcg_at_5 value: 32.508805919063086 - type: ndcg_at_10 value: 30.2563523019649 - type: ndcg_at_100 value: 40.538336664503746 - type: ndcg_at_1000 value: 52.2066951614923 - type: map_at_1 value: 2.75537988273998 - type: map_at_3 value: 6.011397290504469 - type: map_at_5 value: 8.666495836494098 - type: map_at_10 value: 12.17701515007822 - type: map_at_100 value: 18.789086471205852 - type: map_at_1000 value: 20.42972375502502 - type: precision_at_1 value: 40.816326530612244 - type: precision_at_3 value: 35.37414965986394 - type: precision_at_5 value: 32.244897959183675 - type: precision_at_10 value: 26.93877551020408 - type: precision_at_100 value: 8.163265306122451 - type: precision_at_1000 value: 1.5979591836734703 - type: recall_at_1 value: 2.75537988273998 - type: recall_at_3 value: 7.254270324385098 - type: recall_at_5 value: 11.580137100328589 - type: recall_at_10 value: 18.745232816450553 - type: recall_at_100 value: 50.196809658622755 - type: recall_at_1000 
value: 85.87317364148332 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: main_score value: 51.36940792375597 - type: ndcg_at_1 value: 65.125 - type: ndcg_at_3 value: 55.3967569049025 - type: ndcg_at_5 value: 53.09668587926677 - type: ndcg_at_10 value: 51.36940792375597 - type: ndcg_at_100 value: 56.69623269243084 - type: ndcg_at_1000 value: 63.481061270842 - type: map_at_1 value: 10.265595545755545 - type: map_at_3 value: 16.776544233350698 - type: map_at_5 value: 20.184523605272798 - type: map_at_10 value: 24.772797659849264 - type: map_at_100 value: 36.72689012514183 - type: map_at_1000 value: 38.73869985105569 - type: precision_at_1 value: 77.5 - type: precision_at_3 value: 59.75000000000003 - type: precision_at_5 value: 52.849999999999994 - type: precision_at_10 value: 42.47499999999995 - type: precision_at_100 value: 13.614999999999993 - type: precision_at_1000 value: 2.500749999999998 - type: recall_at_1 value: 10.265595545755545 - type: recall_at_3 value: 17.819804963534246 - type: recall_at_5 value: 22.46124219601634 - type: recall_at_10 value: 30.44583516613163 - type: recall_at_100 value: 63.84118006287797 - type: recall_at_1000 value: 85.06450356093833 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: main_score value: 47.93921415959017 - type: ndcg_at_1 value: 36.526219490536015 - type: ndcg_at_3 value: 42.35099043224295 - type: ndcg_at_5 value: 44.989685312964156 - type: ndcg_at_10 value: 47.93921415959017 - type: ndcg_at_100 value: 53.05390282389675 - type: ndcg_at_1000 value: 54.776052731794266 - type: map_at_1 value: 30.818605279548184 - type: map_at_3 value: 38.363350019087974 - type: map_at_5 value: 40.295203936887226 - type: map_at_10 value: 41.81978941662592 - type: map_at_100 value: 43.13300727554278 - type: map_at_1000 value: 43.2351061120207 - type: precision_at_1 value: 36.526219490536015 - type: precision_at_3 value: 19.550515857206346 - type: precision_at_5 value: 13.958783060831967 - type: precision_at_10 value: 8.498592395773393 - type: precision_at_100 value: 1.3024888941713948 - type: precision_at_1000 value: 0.1630253057414617 - type: recall_at_1 value: 30.818605279548184 - type: recall_at_3 value: 45.9132085981904 - type: recall_at_5 value: 52.6851323959227 - type: recall_at_10 value: 61.39718618970463 - type: recall_at_100 value: 83.30757187969981 - type: recall_at_1000 value: 94.9192024147964 - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 89.47761194029852 - type: accuracy_stderr value: 1.6502495811564162 - type: ap value: 62.20813715457866 - type: ap_stderr value: 3.7902166647587854 - type: f1 value: 84.91493292274734 - type: f1_stderr value: 1.9572239640276208 - type: main_score value: 89.47761194029852 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 96.89569999999999 - type: accuracy_stderr value: 0.6886368582206464 - type: ap value: 95.38531339207739 - type: ap_stderr value: 0.9009257949898158 - type: f1 value: 96.8941935264779 - type: f1_stderr value: 0.6908609132985931 - type: 
main_score value: 96.89569999999999 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 61.602000000000004 - type: accuracy_stderr value: 1.4532019818318436 - type: f1 value: 60.96100449021481 - type: f1_stderr value: 1.8031398419765765 - type: main_score value: 61.602000000000004 task: type: Classification - dataset: config: default name: MTEB ArxivClusteringP2P revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: main_score value: 54.906319409992 - type: v_measure value: 54.906319409992 - type: v_measure_std value: 14.382682652951683 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: main_score value: 50.27779516565727 - type: v_measure value: 50.27779516565727 - type: v_measure_std value: 14.463711418590636 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: map value: 64.59457317979604 - type: mrr value: 78.05214791364376 - type: main_score value: 64.59457317979604 task: type: Reranking - dataset: config: default name: MTEB BIOSSES revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: cosine_pearson value: 86.5833945335644 - type: cosine_spearman value: 85.74472483606 - type: manhattan_pearson value: 85.07748703871708 - type: manhattan_spearman value: 85.1459160110718 - type: euclidean_pearson value: 85.14704290043478 - type: euclidean_spearman value: 85.10073425868336 - type: main_score value: 85.74472483606 task: type: STS - dataset: config: default name: MTEB Banking77Classification revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 92.53246753246755 - type: accuracy_stderr value: 0.5488837781559508 - type: f1 value: 92.5143182074032 - type: f1_stderr value: 0.5657577980223147 - type: main_score value: 92.53246753246755 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: main_score value: 52.64099497480452 - type: v_measure value: 52.64099497480452 - type: v_measure_std value: 1.081892399559334 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: main_score value: 49.1972734308178 - type: v_measure value: 49.1972734308178 - type: v_measure_std value: 0.9081245477708283 task: type: Clustering - dataset: config: default name: MTEB EmotionClassification revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 92.975 - type: accuracy_stderr value: 0.5287958017987677 - type: f1 value: 89.29755895896542 - type: f1_stderr value: 0.6485027046025079 - type: main_score value: 92.975 task: type: Classification - dataset: config: default name: MTEB ImdbClassification revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 96.66480000000001 - type: accuracy_stderr value: 
0.45673204398202666 - type: ap value: 95.33843919456118 - type: ap_stderr value: 0.6449846039754393 - type: f1 value: 96.6637668164617 - type: f1_stderr value: 0.45793673051468287 - type: main_score value: 96.66480000000001 task: type: Classification - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 98.61149110807114 - type: accuracy_stderr value: 0.469748178253266 - type: f1 value: 98.4685511007568 - type: f1_stderr value: 0.51636776728259 - type: main_score value: 98.61149110807114 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 95.51299589603283 - type: accuracy_stderr value: 0.3591676911539482 - type: f1 value: 85.2464691439773 - type: f1_stderr value: 0.9234502856695337 - type: main_score value: 95.51299589603283 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 82.04774714189644 - type: accuracy_stderr value: 0.7288818520309376 - type: f1 value: 79.28060657840692 - type: f1_stderr value: 0.6872008571781982 - type: main_score value: 82.04774714189644 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 84.40147948890383 - type: accuracy_stderr value: 1.2939587629143627 - type: f1 value: 83.97779287582267 - type: f1_stderr value: 0.9970599222060901 - type: main_score value: 84.40147948890383 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: main_score value: 45.80879120838561 - type: v_measure value: 45.80879120838561 - type: v_measure_std value: 1.257800489264564 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: main_score value: 44.106849261042505 - type: v_measure value: 44.106849261042505 - type: v_measure_std value: 1.4347344477874981 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 split: test type: mteb/mind_small metrics: - type: map value: 31.794062752995345 - type: mrr value: 32.98581714772614 - type: main_score value: 31.794062752995345 task: type: Reranking - dataset: config: default name: MTEB RedditClustering revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: main_score value: 56.03342473834434 - type: v_measure value: 56.03342473834434 - type: v_measure_std value: 5.972192613803461 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P revision: 282350215ef01743dc01b456c7f5241fa8937f16 split: test type: mteb/reddit-clustering-p2p metrics: - type: main_score value: 65.83156688381274 - type: v_measure value: 65.83156688381274 - type: v_measure_std value: 14.180225112120162 task: type: Clustering - dataset: config: default name: MTEB SICK-R revision: a6ea5a8cab320b040a23452cc28066d9beae2cee 
split: test type: mteb/sickr-sts metrics: - type: cosine_pearson value: 84.15759544348467 - type: cosine_spearman value: 82.66085892322664 - type: manhattan_pearson value: 82.27257241990692 - type: manhattan_spearman value: 82.57752467555896 - type: euclidean_pearson value: 82.20795646456065 - type: euclidean_spearman value: 82.51008729416401 - type: main_score value: 82.66085892322664 task: type: STS - dataset: config: default name: MTEB STS12 revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: cosine_pearson value: 84.3406321391237 - type: cosine_spearman value: 77.71091257651071 - type: manhattan_pearson value: 81.25784268400994 - type: manhattan_spearman value: 77.98426383345507 - type: euclidean_pearson value: 81.25641851462917 - type: euclidean_spearman value: 77.93254971878063 - type: main_score value: 77.71091257651071 task: type: STS - dataset: config: default name: MTEB STS13 revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: cosine_pearson value: 86.1528398894769 - type: cosine_spearman value: 87.44662352358895 - type: manhattan_pearson value: 86.92164570802663 - type: manhattan_spearman value: 86.9132692625668 - type: euclidean_pearson value: 87.00156426580821 - type: euclidean_spearman value: 86.98750068631274 - type: main_score value: 87.44662352358895 task: type: STS - dataset: config: default name: MTEB STS14 revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: cosine_pearson value: 83.32782491176253 - type: cosine_spearman value: 83.48313793311584 - type: manhattan_pearson value: 82.60528063429948 - type: manhattan_spearman value: 83.10434862310481 - type: euclidean_pearson value: 82.68016090104034 - type: euclidean_spearman value: 83.14418662406631 - type: main_score value: 83.48313793311584 task: type: STS - dataset: config: default name: MTEB STS15 revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: cosine_pearson value: 86.31535441436343 - type: cosine_spearman value: 87.63145141246594 - type: manhattan_pearson value: 86.95972711389149 - type: manhattan_spearman value: 86.9849824463052 - type: euclidean_pearson value: 86.95391575487379 - type: euclidean_spearman value: 86.97613682266213 - type: main_score value: 87.63145141246594 task: type: STS - dataset: config: default name: MTEB STS16 revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: cosine_pearson value: 83.43854397443079 - type: cosine_spearman value: 86.70176531845136 - type: manhattan_pearson value: 85.82302317064868 - type: manhattan_spearman value: 86.36561734213241 - type: euclidean_pearson value: 85.80127366135169 - type: euclidean_spearman value: 86.34803859754834 - type: main_score value: 86.70176531845136 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 90.38940955877999 - type: cosine_spearman value: 91.18282119920893 - type: manhattan_pearson value: 91.31823663739615 - type: manhattan_spearman value: 90.67257321731341 - type: euclidean_pearson value: 91.30318753138528 - type: euclidean_spearman value: 90.69044765693836 - type: main_score value: 91.18282119920893 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: 
mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 69.33936467780947 - type: cosine_spearman value: 69.02345807358802 - type: manhattan_pearson value: 70.11799452953082 - type: manhattan_spearman value: 68.55450923481405 - type: euclidean_pearson value: 70.10857680491809 - type: euclidean_spearman value: 68.44610245708984 - type: main_score value: 69.02345807358802 task: type: STS - dataset: config: default name: MTEB STSBenchmark revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: cosine_pearson value: 85.97288135509513 - type: cosine_spearman value: 87.25208310840168 - type: manhattan_pearson value: 86.3786471501451 - type: manhattan_spearman value: 86.71177136523868 - type: euclidean_pearson value: 86.40522339296625 - type: euclidean_spearman value: 86.73930576508816 - type: main_score value: 87.25208310840168 task: type: STS - dataset: config: default name: MTEB SciDocsRR revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: map value: 87.60324164489178 - type: mrr value: 96.30331904841708 - type: main_score value: 87.60324164489178 task: type: Reranking - dataset: config: default name: MTEB SprintDuplicateQuestions revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: cos_sim_accuracy value: 99.6920792079208 - type: cos_sim_accuracy_threshold value: 90.36337347155474 - type: cos_sim_ap value: 90.93952679056765 - type: cos_sim_f1 value: 83.10700706137968 - type: cos_sim_f1_threshold value: 90.36337347155474 - type: cos_sim_precision value: 90.96313912009512 - type: cos_sim_recall value: 76.5 - type: dot_accuracy value: 99.54554455445545 - type: dot_accuracy_threshold value: 2876800.0 - type: dot_ap value: 84.01112287735286 - type: dot_f1 value: 75.7622739018088 - type: dot_f1_threshold value: 2820800.0 - type: dot_precision value: 78.39572192513369 - type: dot_recall value: 73.3 - type: euclidean_accuracy value: 99.6930693069307 - type: euclidean_accuracy_threshold value: 7718.054017089397 - type: euclidean_ap value: 91.1257568881301 - type: euclidean_f1 value: 83.09022150189087 - type: euclidean_f1_threshold value: 7817.08324628535 - type: euclidean_precision value: 90.36427732079906 - type: euclidean_recall value: 76.9 - type: manhattan_accuracy value: 99.6920792079208 - type: manhattan_accuracy_threshold value: 364735.19654273987 - type: manhattan_ap value: 91.2326885940691 - type: manhattan_f1 value: 83.36008560727663 - type: manhattan_f1_threshold value: 375395.8945572376 - type: manhattan_precision value: 89.64326812428078 - type: manhattan_recall value: 77.9 - type: max_accuracy value: 99.6930693069307 - type: max_ap value: 91.2326885940691 - type: max_f1 value: 83.36008560727663 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: main_score value: 66.2095300942637 - type: v_measure value: 66.2095300942637 - type: v_measure_std value: 3.214369679617631 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: main_score value: 45.74307000935057 - type: v_measure value: 45.74307000935057 - type: v_measure_std value: 1.5352466748569888 task: type: Clustering - dataset: 
config: default name: MTEB StackOverflowDupQuestions revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: map value: 54.90337951829123 - type: mrr value: 56.12889663441134 - type: main_score value: 54.90337951829123 task: type: Reranking - dataset: config: default name: MTEB SummEval revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: cosine_pearson value: 31.0669308484832 - type: cosine_spearman value: 31.19637421540861 - type: dot_pearson value: 30.62326176666765 - type: dot_spearman value: 30.42135737502967 - type: main_score value: 31.19637421540861 task: type: Summarization - dataset: config: default name: MTEB ToxicConversationsClassification revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 87.34339999999999 - type: accuracy_stderr value: 1.838245696309393 - type: ap value: 33.536584790435406 - type: ap_stderr value: 2.276373512492581 - type: f1 value: 72.47307082324448 - type: f1_stderr value: 1.9964640292072542 - type: main_score value: 87.34339999999999 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 78.86247877758915 - type: accuracy_stderr value: 1.1273253738982443 - type: f1 value: 79.14666244848874 - type: f1_stderr value: 1.1532640958036497 - type: main_score value: 78.86247877758915 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: main_score value: 70.44270836680788 - type: v_measure value: 70.44270836680788 - type: v_measure_std value: 1.5185423698266132 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: cos_sim_accuracy value: 87.74512725755498 - type: cos_sim_accuracy_threshold value: 82.34941560483547 - type: cos_sim_ap value: 79.6389274210382 - type: cos_sim_f1 value: 71.76319176319176 - type: cos_sim_f1_threshold value: 80.1523829249257 - type: cos_sim_precision value: 70.0502512562814 - type: cos_sim_recall value: 73.56200527704485 - type: dot_accuracy value: 85.13441020444657 - type: dot_accuracy_threshold value: 2220800.0 - type: dot_ap value: 71.67080150823449 - type: dot_f1 value: 66.18984119287187 - type: dot_f1_threshold value: 2086400.0 - type: dot_precision value: 61.224489795918366 - type: dot_recall value: 72.0316622691293 - type: euclidean_accuracy value: 87.69148238660071 - type: euclidean_accuracy_threshold value: 9221.50036619459 - type: euclidean_ap value: 79.65326151280289 - type: euclidean_f1 value: 71.7903489983621 - type: euclidean_f1_threshold value: 10313.528386219872 - type: euclidean_precision value: 68.70026525198939 - type: euclidean_recall value: 75.17150395778364 - type: manhattan_accuracy value: 87.74512725755498 - type: manhattan_accuracy_threshold value: 444289.1119837761 - type: manhattan_ap value: 79.67744645365104 - type: manhattan_f1 value: 71.94423699278066 - type: manhattan_f1_threshold value: 491676.24004781246 - type: manhattan_precision value: 68.0961357210179 - type: manhattan_recall value: 76.2532981530343 - type: max_accuracy 
value: 87.74512725755498 - type: max_ap value: 79.67744645365104 - type: max_f1 value: 71.94423699278066 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: cos_sim_accuracy value: 89.5544688943222 - type: cos_sim_accuracy_threshold value: 81.58909533293946 - type: cos_sim_ap value: 86.95174990178396 - type: cos_sim_f1 value: 79.1543756145526 - type: cos_sim_f1_threshold value: 80.08573448087095 - type: cos_sim_precision value: 77.78355879292404 - type: cos_sim_recall value: 80.5743763473976 - type: dot_accuracy value: 88.60752124810804 - type: dot_accuracy_threshold value: 2136000.0 - type: dot_ap value: 84.26724775947629 - type: dot_f1 value: 77.67666146985243 - type: dot_f1_threshold value: 2064000.0 - type: dot_precision value: 73.40505721921468 - type: dot_recall value: 82.47613181398214 - type: euclidean_accuracy value: 89.5370046959289 - type: euclidean_accuracy_threshold value: 9750.113991666478 - type: euclidean_ap value: 86.99393092403776 - type: euclidean_f1 value: 79.07167337207571 - type: euclidean_f1_threshold value: 10338.095928500366 - type: euclidean_precision value: 76.59497690531177 - type: euclidean_recall value: 81.71388974437943 - type: manhattan_accuracy value: 89.57581402569178 - type: manhattan_accuracy_threshold value: 463812.92815208435 - type: manhattan_ap value: 87.00849868076658 - type: manhattan_f1 value: 79.08583576933297 - type: manhattan_f1_threshold value: 482453.35128605366 - type: manhattan_precision value: 78.00494270950348 - type: manhattan_recall value: 80.19710502001848 - type: max_accuracy value: 89.57581402569178 - type: max_ap value: 87.00849868076658 - type: max_f1 value: 79.1543756145526 task: type: PairClassification - dataset: config: default name: MTEB AFQMC revision: b44c3b011063adb25877c13823db83bb193913c4 split: validation type: C-MTEB/AFQMC metrics: - type: cosine_pearson value: 45.108559635369325 - type: cosine_spearman value: 47.172833128216176 - type: manhattan_pearson value: 45.75443077564791 - type: manhattan_spearman value: 47.13974146235398 - type: euclidean_pearson value: 45.78921257223492 - type: euclidean_spearman value: 47.177095238278625 - type: main_score value: 47.172833128216176 task: type: STS - dataset: config: default name: MTEB ATEC revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 split: test type: C-MTEB/ATEC metrics: - type: cosine_pearson value: 48.304409578388466 - type: cosine_spearman value: 50.75006977697012 - type: manhattan_pearson value: 52.688818756177035 - type: manhattan_spearman value: 50.739214155741095 - type: euclidean_pearson value: 52.71788557204978 - type: euclidean_spearman value: 50.77895730336448 - type: main_score value: 50.75006977697012 task: type: STS - dataset: config: zh name: MTEB AmazonReviewsClassification (zh) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 54.339999999999996 - type: accuracy_stderr value: 1.6518837731511269 - type: f1 value: 53.37316538790502 - type: f1_stderr value: 1.6112926272861336 - type: main_score value: 54.339999999999996 task: type: Classification - dataset: config: default name: MTEB BQ revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 split: test type: C-MTEB/BQ metrics: - type: cosine_pearson value: 59.62831218167518 - type: cosine_spearman value: 62.02213472473759 - type: manhattan_pearson value: 61.122261197018176 - 
type: manhattan_spearman value: 62.208780520694454 - type: euclidean_pearson value: 61.17827629627213 - type: euclidean_spearman value: 62.266859648664244 - type: main_score value: 62.02213472473759 task: type: STS - dataset: config: default name: MTEB CLSClusteringP2P revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 split: test type: C-MTEB/CLSClusteringP2P metrics: - type: main_score value: 54.64518394835408 - type: v_measure value: 54.64518394835408 - type: v_measure_std value: 1.2745946640208072 task: type: Clustering - dataset: config: default name: MTEB CLSClusteringS2S revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f split: test type: C-MTEB/CLSClusteringS2S metrics: - type: main_score value: 63.68323477729556 - type: v_measure value: 63.68323477729556 - type: v_measure_std value: 1.740918833098302 task: type: Clustering - dataset: config: default name: MTEB CMedQAv1 revision: 8d7f1e942507dac42dc58017c1a001c3717da7df split: test type: C-MTEB/CMedQAv1-reranking metrics: - type: map value: 84.61500884703916 - type: mrr value: 87.01424603174604 - type: main_score value: 84.61500884703916 task: type: Reranking - dataset: config: default name: MTEB CMedQAv2 revision: 23d186750531a14a0357ca22cd92d712fd512ea0 split: test type: C-MTEB/CMedQAv2-reranking metrics: - type: map value: 85.60137988993483 - type: mrr value: 87.96857142857142 - type: main_score value: 85.60137988993483 task: type: Reranking - dataset: config: default name: MTEB CmedqaRetrieval revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 split: dev type: C-MTEB/CmedqaRetrieval metrics: - type: map_at_1 value: 24.191 - type: map_at_10 value: 35.819 - type: map_at_100 value: 37.639 - type: map_at_1000 value: 37.775 - type: map_at_3 value: 32.045 - type: map_at_5 value: 34.008 - type: mrr_at_1 value: 36.684 - type: mrr_at_10 value: 44.769 - type: mrr_at_100 value: 45.754 - type: mrr_at_1000 value: 45.809 - type: mrr_at_3 value: 42.465 - type: mrr_at_5 value: 43.696 - type: ndcg_at_1 value: 36.834 - type: ndcg_at_10 value: 42.208 - type: ndcg_at_100 value: 49.507 - type: ndcg_at_1000 value: 51.834 - type: ndcg_at_3 value: 37.416 - type: ndcg_at_5 value: 39.152 - type: precision_at_1 value: 36.834 - type: precision_at_10 value: 9.357 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.183 - type: precision_at_3 value: 21.08 - type: precision_at_5 value: 15.068999999999999 - type: recall_at_1 value: 24.191 - type: recall_at_10 value: 52.078 - type: recall_at_100 value: 82.548 - type: recall_at_1000 value: 98.017 - type: recall_at_3 value: 37.484 - type: recall_at_5 value: 43.187 - type: main_score value: 42.208 task: type: Retrieval - dataset: config: default name: MTEB Cmnli revision: 41bc36f332156f7adc9e38f53777c959b2ae9766 split: validation type: C-MTEB/CMNLI metrics: - type: cos_sim_accuracy value: 81.98436560432953 - type: cos_sim_accuracy_threshold value: 67.33228049687503 - type: cos_sim_ap value: 90.13312662430796 - type: cos_sim_f1 value: 83.2163938077737 - type: cos_sim_f1_threshold value: 64.44945196171463 - type: cos_sim_precision value: 79.45555082943429 - type: cos_sim_recall value: 87.350946925415 - type: dot_accuracy value: 80.50511124473843 - type: dot_accuracy_threshold value: 1736000.0 - type: dot_ap value: 88.76136186445322 - type: dot_f1 value: 81.75838631878973 - type: dot_f1_threshold value: 1681600.0 - type: dot_precision value: 76.96594427244582 - type: dot_recall value: 87.18728080430208 - type: euclidean_accuracy value: 82.21286831028262 - type: 
euclidean_accuracy_threshold value: 13240.938473272565 - type: euclidean_ap value: 90.14863232280865 - type: euclidean_f1 value: 83.277292086976 - type: euclidean_f1_threshold value: 13667.852165734186 - type: euclidean_precision value: 79.97847147470398 - type: euclidean_recall value: 86.85994856207621 - type: manhattan_accuracy value: 82.21286831028262 - type: manhattan_accuracy_threshold value: 629412.1389746666 - type: manhattan_ap value: 90.03868533208357 - type: manhattan_f1 value: 83.15683870248579 - type: manhattan_f1_threshold value: 649621.3114321232 - type: manhattan_precision value: 79.46314443971026 - type: manhattan_recall value: 87.21066167874679 - type: max_accuracy value: 82.21286831028262 - type: max_ap value: 90.14863232280865 - type: max_f1 value: 83.277292086976 task: type: PairClassification - dataset: config: default name: MTEB CovidRetrieval revision: 1271c7809071a13532e05f25fb53511ffce77117 split: dev type: C-MTEB/CovidRetrieval metrics: - type: map_at_1 value: 65.595 - type: map_at_10 value: 73.717 - type: map_at_100 value: 74.134 - type: map_at_1000 value: 74.143 - type: map_at_3 value: 71.97 - type: map_at_5 value: 73.11800000000001 - type: mrr_at_1 value: 65.648 - type: mrr_at_10 value: 73.618 - type: mrr_at_100 value: 74.02499999999999 - type: mrr_at_1000 value: 74.033 - type: mrr_at_3 value: 71.865 - type: mrr_at_5 value: 73.04 - type: ndcg_at_1 value: 65.753 - type: ndcg_at_10 value: 77.458 - type: ndcg_at_100 value: 79.46 - type: ndcg_at_1000 value: 79.666 - type: ndcg_at_3 value: 73.988 - type: ndcg_at_5 value: 76.038 - type: precision_at_1 value: 65.753 - type: precision_at_10 value: 8.999 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 26.765 - type: precision_at_5 value: 17.092 - type: recall_at_1 value: 65.595 - type: recall_at_10 value: 89.041 - type: recall_at_100 value: 98.31400000000001 - type: recall_at_1000 value: 99.895 - type: recall_at_3 value: 79.768 - type: recall_at_5 value: 84.66799999999999 - type: main_score value: 77.458 task: type: Retrieval - dataset: config: default name: MTEB DuRetrieval revision: a1a333e290fe30b10f3f56498e3a0d911a693ced split: dev type: C-MTEB/DuRetrieval metrics: - type: map_at_1 value: 27.248 - type: map_at_10 value: 84.303 - type: map_at_100 value: 86.866 - type: map_at_1000 value: 86.888 - type: map_at_3 value: 58.658 - type: map_at_5 value: 74.265 - type: mrr_at_1 value: 92.2 - type: mrr_at_10 value: 94.733 - type: mrr_at_100 value: 94.767 - type: mrr_at_1000 value: 94.768 - type: mrr_at_3 value: 94.492 - type: mrr_at_5 value: 94.627 - type: ndcg_at_1 value: 92.2 - type: ndcg_at_10 value: 90.462 - type: ndcg_at_100 value: 92.562 - type: ndcg_at_1000 value: 92.757 - type: ndcg_at_3 value: 89.44800000000001 - type: ndcg_at_5 value: 88.683 - type: precision_at_1 value: 92.2 - type: precision_at_10 value: 42.980000000000004 - type: precision_at_100 value: 4.851 - type: precision_at_1000 value: 0.49 - type: precision_at_3 value: 80.233 - type: precision_at_5 value: 67.95 - type: recall_at_1 value: 27.248 - type: recall_at_10 value: 91.46600000000001 - type: recall_at_100 value: 98.566 - type: recall_at_1000 value: 99.557 - type: recall_at_3 value: 60.671 - type: recall_at_5 value: 78.363 - type: main_score value: 90.462 task: type: Retrieval - dataset: config: default name: MTEB EcomRetrieval revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 split: dev type: C-MTEB/EcomRetrieval metrics: - type: map_at_1 value: 54.7 - type: map_at_10 value: 
64.574 - type: map_at_100 value: 65.144 - type: map_at_1000 value: 65.156 - type: map_at_3 value: 62.333000000000006 - type: map_at_5 value: 63.63799999999999 - type: mrr_at_1 value: 54.7 - type: mrr_at_10 value: 64.603 - type: mrr_at_100 value: 65.172 - type: mrr_at_1000 value: 65.184 - type: mrr_at_3 value: 62.383 - type: mrr_at_5 value: 63.683 - type: ndcg_at_1 value: 54.7 - type: ndcg_at_10 value: 69.298 - type: ndcg_at_100 value: 71.81 - type: ndcg_at_1000 value: 72.117 - type: ndcg_at_3 value: 64.72099999999999 - type: ndcg_at_5 value: 67.071 - type: precision_at_1 value: 54.7 - type: precision_at_10 value: 8.41 - type: precision_at_100 value: 0.9530000000000001 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 23.867 - type: precision_at_5 value: 15.459999999999999 - type: recall_at_1 value: 54.7 - type: recall_at_10 value: 84.1 - type: recall_at_100 value: 95.3 - type: recall_at_1000 value: 97.7 - type: recall_at_3 value: 71.6 - type: recall_at_5 value: 77.3 - type: main_score value: 69.298 task: type: Retrieval - dataset: config: default name: MTEB IFlyTek revision: 421605374b29664c5fc098418fe20ada9bd55f8a split: validation type: C-MTEB/IFlyTek-classification metrics: - type: accuracy value: 49.942285494420936 - type: accuracy_stderr value: 0.9218275144833329 - type: f1 value: 41.32381790374152 - type: f1_stderr value: 0.8291507105327707 - type: main_score value: 49.942285494420936 task: type: Classification - dataset: config: default name: MTEB JDReview revision: b7c64bd89eb87f8ded463478346f76731f07bf8b split: test type: C-MTEB/JDReview-classification metrics: - type: accuracy value: 88.91181988742964 - type: accuracy_stderr value: 1.952391767940518 - type: ap value: 60.18509628974178 - type: ap_stderr value: 4.273060966573582 - type: f1 value: 84.02722221827027 - type: f1_stderr value: 2.238197243395083 - type: main_score value: 88.91181988742964 task: type: Classification - dataset: config: default name: MTEB LCQMC revision: 17f9b096f80380fce5ed12a9be8be7784b337daf split: test type: C-MTEB/LCQMC metrics: - type: cosine_pearson value: 68.32691294171383 - type: cosine_spearman value: 75.95458618586729 - type: manhattan_pearson value: 74.37198807732018 - type: manhattan_spearman value: 75.99352157963375 - type: euclidean_pearson value: 74.36294627886716 - type: euclidean_spearman value: 75.98632511635132 - type: main_score value: 75.95458618586729 task: type: STS - dataset: config: default name: MTEB MMarcoReranking revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6 split: dev type: C-MTEB/Mmarco-reranking metrics: - type: map value: 35.4327533126161 - type: mrr value: 34.61507936507937 - type: main_score value: 35.4327533126161 task: type: Reranking - dataset: config: default name: MTEB MMarcoRetrieval revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 split: dev type: C-MTEB/MMarcoRetrieval metrics: - type: map_at_1 value: 72.652 - type: map_at_10 value: 81.396 - type: map_at_100 value: 81.597 - type: map_at_1000 value: 81.60300000000001 - type: map_at_3 value: 79.757 - type: map_at_5 value: 80.798 - type: mrr_at_1 value: 75.01400000000001 - type: mrr_at_10 value: 81.842 - type: mrr_at_100 value: 82.025 - type: mrr_at_1000 value: 82.03099999999999 - type: mrr_at_3 value: 80.45400000000001 - type: mrr_at_5 value: 81.345 - type: ndcg_at_1 value: 74.98599999999999 - type: ndcg_at_10 value: 84.70100000000001 - type: ndcg_at_100 value: 85.568 - type: ndcg_at_1000 value: 85.721 - type: ndcg_at_3 value: 81.64099999999999 - type: ndcg_at_5 value: 83.375 - type: 
precision_at_1 value: 74.98599999999999 - type: precision_at_10 value: 10.049 - type: precision_at_100 value: 1.047 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 30.458000000000002 - type: precision_at_5 value: 19.206 - type: recall_at_1 value: 72.652 - type: recall_at_10 value: 94.40899999999999 - type: recall_at_100 value: 98.241 - type: recall_at_1000 value: 99.42 - type: recall_at_3 value: 86.354 - type: recall_at_5 value: 90.472 - type: main_score value: 84.70100000000001 task: type: Retrieval - dataset: config: zh-CN name: MTEB MassiveIntentClassification (zh-CN) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 78.19098856758575 - type: accuracy_stderr value: 0.6325028678427684 - type: f1 value: 74.80611425574001 - type: f1_stderr value: 0.9021806207904779 - type: main_score value: 78.19098856758575 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveScenarioClassification (zh-CN) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 82.58238063214526 - type: accuracy_stderr value: 1.0999970213165273 - type: f1 value: 81.94734854057064 - type: f1_stderr value: 1.248633855872851 - type: main_score value: 82.58238063214526 task: type: Classification - dataset: config: default name: MTEB MedicalRetrieval revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 split: dev type: C-MTEB/MedicalRetrieval metrics: - type: map_at_1 value: 53.7 - type: map_at_10 value: 59.184000000000005 - type: map_at_100 value: 59.754 - type: map_at_1000 value: 59.8 - type: map_at_3 value: 57.833 - type: map_at_5 value: 58.548 - type: mrr_at_1 value: 54.0 - type: mrr_at_10 value: 59.352000000000004 - type: mrr_at_100 value: 59.926 - type: mrr_at_1000 value: 59.971 - type: mrr_at_3 value: 57.99999999999999 - type: mrr_at_5 value: 58.714999999999996 - type: ndcg_at_1 value: 53.7 - type: ndcg_at_10 value: 62.022 - type: ndcg_at_100 value: 65.038 - type: ndcg_at_1000 value: 66.366 - type: ndcg_at_3 value: 59.209 - type: ndcg_at_5 value: 60.51299999999999 - type: precision_at_1 value: 53.7 - type: precision_at_10 value: 7.1 - type: precision_at_100 value: 0.856 - type: precision_at_1000 value: 0.096 - type: precision_at_3 value: 21.067 - type: precision_at_5 value: 13.28 - type: recall_at_1 value: 53.7 - type: recall_at_10 value: 71.0 - type: recall_at_100 value: 85.6 - type: recall_at_1000 value: 96.3 - type: recall_at_3 value: 63.2 - type: recall_at_5 value: 66.4 - type: main_score value: 62.022 task: type: Retrieval - dataset: config: default name: MTEB MultilingualSentiment revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a split: validation type: C-MTEB/MultilingualSentiment-classification metrics: - type: accuracy value: 78.91333333333334 - type: accuracy_stderr value: 1.0834307648494321 - type: f1 value: 78.881433228092 - type: f1_stderr value: 1.122457277013712 - type: main_score value: 78.91333333333334 task: type: Classification - dataset: config: default name: MTEB Ocnli revision: 66e76a618a34d6d565d5538088562851e6daa7ec split: validation type: C-MTEB/OCNLI metrics: - type: cos_sim_accuracy value: 76.39415268002165 - type: cos_sim_accuracy_threshold value: 68.98242139321592 - type: cos_sim_ap value: 83.20687440058073 - type: cos_sim_f1 value: 78.4351145038168 - type: cos_sim_f1_threshold value: 65.47409929698304 - type: cos_sim_precision value: 71.54046997389034 - type: cos_sim_recall value: 86.80042238648363 - 
type: dot_accuracy value: 74.60747157552788 - type: dot_accuracy_threshold value: 1737600.0 - type: dot_ap value: 79.78938545919723 - type: dot_f1 value: 76.92307692307692 - type: dot_f1_threshold value: 1652800.0 - type: dot_precision value: 67.90622473726758 - type: dot_recall value: 88.70116156283 - type: euclidean_accuracy value: 76.34001082837032 - type: euclidean_accuracy_threshold value: 12597.299662420446 - type: euclidean_ap value: 83.60222701792158 - type: euclidean_f1 value: 78.77947295423024 - type: euclidean_f1_threshold value: 13639.653702639469 - type: euclidean_precision value: 70.06578947368422 - type: euclidean_recall value: 89.96832101372756 - type: manhattan_accuracy value: 76.23172712506768 - type: manhattan_accuracy_threshold value: 587601.2824743986 - type: manhattan_ap value: 83.51813426548178 - type: manhattan_f1 value: 78.6654135338346 - type: manhattan_f1_threshold value: 639711.1931562424 - type: manhattan_precision value: 70.87214225232854 - type: manhattan_recall value: 88.3843717001056 - type: max_accuracy value: 76.39415268002165 - type: max_ap value: 83.60222701792158 - type: max_f1 value: 78.77947295423024 task: type: PairClassification - dataset: config: default name: MTEB OnlineShopping revision: e610f2ebd179a8fda30ae534c3878750a96db120 split: test type: C-MTEB/OnlineShopping-classification metrics: - type: accuracy value: 94.59 - type: accuracy_stderr value: 0.8971621926942733 - type: ap value: 93.01229797205905 - type: ap_stderr value: 1.0519542956523058 - type: f1 value: 94.58077736915268 - type: f1_stderr value: 0.8954928292768671 - type: main_score value: 94.59 task: type: Classification - dataset: config: default name: MTEB PAWSX revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 split: test type: C-MTEB/PAWSX metrics: - type: cosine_pearson value: 24.341872875292857 - type: cosine_spearman value: 30.570037022875436 - type: manhattan_pearson value: 31.41015320258418 - type: manhattan_spearman value: 30.604526098895114 - type: euclidean_pearson value: 31.400038084432175 - type: euclidean_spearman value: 30.61062265273698 - type: main_score value: 30.570037022875436 task: type: STS - dataset: config: default name: MTEB QBQTC revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 split: test type: C-MTEB/QBQTC metrics: - type: cosine_pearson value: 36.61757468091905 - type: cosine_spearman value: 38.981417359835504 - type: manhattan_pearson value: 37.971127169578764 - type: manhattan_spearman value: 39.55028286687854 - type: euclidean_pearson value: 37.96983777648438 - type: euclidean_spearman value: 39.542856511171784 - type: main_score value: 38.981417359835504 task: type: STS - dataset: config: zh name: MTEB STS22 (zh) revision: eea2b4fe26a775864c896887d910b76a8098ad3f split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 68.29834902017382 - type: cosine_spearman value: 68.6823378297782 - type: manhattan_pearson value: 68.47336169904406 - type: manhattan_spearman value: 69.08033223619941 - type: euclidean_pearson value: 68.38785956191622 - type: euclidean_spearman value: 68.97973814449657 - type: main_score value: 68.6823378297782 task: type: STS - dataset: config: default name: MTEB STSB revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 split: test type: C-MTEB/STSB metrics: - type: cosine_pearson value: 80.60572958563593 - type: cosine_spearman value: 80.87063761195603 - type: manhattan_pearson value: 79.30174059269083 - type: manhattan_spearman value: 80.02203618135883 - type: euclidean_pearson value: 79.3314553444783 
- type: euclidean_spearman value: 80.04556415585255 - type: main_score value: 80.87063761195603 task: type: STS - dataset: config: default name: MTEB T2Reranking revision: 76631901a18387f85eaa53e5450019b87ad58ef9 split: dev type: C-MTEB/T2Reranking metrics: - type: map value: 67.47921173708028 - type: mrr value: 77.9396513739777 - type: main_score value: 67.47921173708028 task: type: Reranking - dataset: config: default name: MTEB T2Retrieval revision: 8731a845f1bf500a4f111cf1070785c793d10e64 split: dev type: C-MTEB/T2Retrieval metrics: - type: map_at_1 value: 28.021 - type: map_at_10 value: 79.149 - type: map_at_100 value: 82.613 - type: map_at_1000 value: 82.67099999999999 - type: map_at_3 value: 55.665 - type: map_at_5 value: 68.46900000000001 - type: mrr_at_1 value: 91.106 - type: mrr_at_10 value: 93.372 - type: mrr_at_100 value: 93.44200000000001 - type: mrr_at_1000 value: 93.445 - type: mrr_at_3 value: 92.99300000000001 - type: mrr_at_5 value: 93.24900000000001 - type: ndcg_at_1 value: 91.106 - type: ndcg_at_10 value: 86.259 - type: ndcg_at_100 value: 89.46600000000001 - type: ndcg_at_1000 value: 90.012 - type: ndcg_at_3 value: 87.574 - type: ndcg_at_5 value: 86.283 - type: precision_at_1 value: 91.106 - type: precision_at_10 value: 42.742999999999995 - type: precision_at_100 value: 5.029999999999999 - type: precision_at_1000 value: 0.516 - type: precision_at_3 value: 76.593 - type: precision_at_5 value: 64.243 - type: recall_at_1 value: 28.021 - type: recall_at_10 value: 85.184 - type: recall_at_100 value: 95.79299999999999 - type: recall_at_1000 value: 98.547 - type: recall_at_3 value: 57.233000000000004 - type: recall_at_5 value: 71.628 - type: main_score value: 86.259 task: type: Retrieval - dataset: config: default name: MTEB TNews revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 split: validation type: C-MTEB/TNews-classification metrics: - type: accuracy value: 50.255 - type: accuracy_stderr value: 0.9341868121526873 - type: f1 value: 48.65080322457893 - type: f1_stderr value: 0.9391547591179161 - type: main_score value: 50.255 task: type: Classification - dataset: config: default name: MTEB ThuNewsClusteringP2P revision: 5798586b105c0434e4f0fe5e767abe619442cf93 split: test type: C-MTEB/ThuNewsClusteringP2P metrics: - type: main_score value: 64.32076022871308 - type: v_measure value: 64.32076022871308 - type: v_measure_std value: 0.7190996709617924 task: type: Clustering - dataset: config: default name: MTEB ThuNewsClusteringS2S revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d split: test type: C-MTEB/ThuNewsClusteringS2S metrics: - type: main_score value: 54.57080911705562 - type: v_measure value: 54.57080911705562 - type: v_measure_std value: 1.5185826402845883 task: type: Clustering - dataset: config: default name: MTEB VideoRetrieval revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 split: dev type: C-MTEB/VideoRetrieval metrics: - type: map_at_1 value: 63.1 - type: map_at_10 value: 73.137 - type: map_at_100 value: 73.539 - type: map_at_1000 value: 73.546 - type: map_at_3 value: 71.467 - type: map_at_5 value: 72.552 - type: mrr_at_1 value: 63.3 - type: mrr_at_10 value: 73.238 - type: mrr_at_100 value: 73.64 - type: mrr_at_1000 value: 73.64699999999999 - type: mrr_at_3 value: 71.56700000000001 - type: mrr_at_5 value: 72.652 - type: ndcg_at_1 value: 63.1 - type: ndcg_at_10 value: 77.397 - type: ndcg_at_100 value: 79.11399999999999 - type: ndcg_at_1000 value: 79.305 - type: ndcg_at_3 value: 74.031 - type: ndcg_at_5 value: 75.976 - type: precision_at_1 value: 63.1 - type: 
precision_at_10 value: 9.049999999999999 - type: precision_at_100 value: 0.98 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 27.133000000000003 - type: precision_at_5 value: 17.22 - type: recall_at_1 value: 63.1 - type: recall_at_10 value: 90.5 - type: recall_at_100 value: 98.0 - type: recall_at_1000 value: 99.5 - type: recall_at_3 value: 81.39999999999999 - type: recall_at_5 value: 86.1 - type: main_score value: 77.397 task: type: Retrieval - dataset: config: default name: MTEB Waimai revision: 339287def212450dcaa9df8c22bf93e9980c7023 split: test type: C-MTEB/waimai-classification metrics: - type: accuracy value: 89.26 - type: accuracy_stderr value: 1.44651304867948 - type: ap value: 75.17154345788362 - type: ap_stderr value: 2.7356371110082565 - type: f1 value: 87.94016849813178 - type: f1_stderr value: 1.3897605039980534 - type: main_score value: 89.26 task: type: Classification - dataset: config: default name: MTEB AlloProfClusteringP2P revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: main_score value: 71.20310003742769 - type: v_measure value: 71.20310003742769 - type: v_measure_std value: 2.3682783706448687 task: type: Clustering - dataset: config: default name: MTEB AlloProfClusteringS2S revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: main_score value: 59.64232194434788 - type: v_measure value: 59.64232194434788 - type: v_measure_std value: 2.4292956011867557 task: type: Clustering - dataset: config: default name: MTEB AlloprofReranking revision: 65393d0d7a08a10b4e348135e824f385d420b0fd split: test type: lyon-nlp/mteb-fr-reranking-alloprof-s2p metrics: - type: main_score value: 78.62041803111894 - type: map value: 78.62041803111894 - type: mrr value: 79.82309057762426 - type: nAUC_map_diff1 value: 58.23586953459263 - type: nAUC_map_max value: 16.162821346484357 - type: nAUC_map_std value: 20.727030444422525 - type: nAUC_mrr_diff1 value: 57.89675675999501 - type: nAUC_mrr_max value: 17.188359535738417 - type: nAUC_mrr_std value: 20.121404571879598 task: type: Reranking - dataset: config: default name: MTEB AlloprofRetrieval revision: fcf295ea64c750f41fadbaa37b9b861558e1bfbd split: test type: lyon-nlp/alloprof metrics: - type: main_score value: 58.499 - type: map_at_1 value: 40.371 - type: map_at_10 value: 52.337 - type: map_at_100 value: 53.04 - type: map_at_1000 value: 53.065 - type: map_at_20 value: 52.772 - type: map_at_3 value: 49.201 - type: map_at_5 value: 51.025 - type: mrr_at_1 value: 40.3713298791019 - type: mrr_at_10 value: 52.322165337061755 - type: mrr_at_100 value: 53.02092832847133 - type: mrr_at_1000 value: 53.04594680215603 - type: mrr_at_20 value: 52.750849914358135 - type: mrr_at_3 value: 49.150834772596475 - type: mrr_at_5 value: 50.998848589522275 - type: nauc_map_at_1000_diff1 value: 44.71946249374932 - type: nauc_map_at_1000_max value: 28.074204125714193 - type: nauc_map_at_1000_std value: -5.1319087890196275 - type: nauc_map_at_100_diff1 value: 44.71140286780233 - type: nauc_map_at_100_max value: 28.09677884622645 - type: nauc_map_at_100_std value: -5.116353867480612 - type: nauc_map_at_10_diff1 value: 44.737968596047736 - type: nauc_map_at_10_max value: 28.103186472557184 - type: nauc_map_at_10_std value: -5.258817287329683 - type: nauc_map_at_1_diff1 value: 47.48389890056789 - type: nauc_map_at_1_max value: 24.803734709402654 - type: nauc_map_at_1_std value: -6.504759899363267 - type: nauc_map_at_20_diff1 value: 44.67268454863271 
- type: nauc_map_at_20_max value: 28.068912295976933 - type: nauc_map_at_20_std value: -5.1971060419801836 - type: nauc_map_at_3_diff1 value: 44.59399231542881 - type: nauc_map_at_3_max value: 27.097806786915502 - type: nauc_map_at_3_std value: -5.957120508111229 - type: nauc_map_at_5_diff1 value: 44.549807218619236 - type: nauc_map_at_5_max value: 28.03902312965202 - type: nauc_map_at_5_std value: -5.279585300980128 - type: nauc_mrr_at_1000_diff1 value: 44.70183532803094 - type: nauc_mrr_at_1000_max value: 28.08833759937601 - type: nauc_mrr_at_1000_std value: -5.097929115475795 - type: nauc_mrr_at_100_diff1 value: 44.693824401340684 - type: nauc_mrr_at_100_max value: 28.110898009292296 - type: nauc_mrr_at_100_std value: -5.082401300601749 - type: nauc_mrr_at_10_diff1 value: 44.74052791862188 - type: nauc_mrr_at_10_max value: 28.125378341430725 - type: nauc_mrr_at_10_std value: -5.209767905428716 - type: nauc_mrr_at_1_diff1 value: 47.48389890056789 - type: nauc_mrr_at_1_max value: 24.803734709402654 - type: nauc_mrr_at_1_std value: -6.504759899363267 - type: nauc_mrr_at_20_diff1 value: 44.65204014980107 - type: nauc_mrr_at_20_max value: 28.071523791101487 - type: nauc_mrr_at_20_std value: -5.176680495032765 - type: nauc_mrr_at_3_diff1 value: 44.566371489967835 - type: nauc_mrr_at_3_max value: 27.138418179089243 - type: nauc_mrr_at_3_std value: -5.8860676927947715 - type: nauc_mrr_at_5_diff1 value: 44.513022796226025 - type: nauc_mrr_at_5_max value: 28.037968016529184 - type: nauc_mrr_at_5_std value: -5.286851060853457 - type: nauc_ndcg_at_1000_diff1 value: 44.31019947897497 - type: nauc_ndcg_at_1000_max value: 29.332844099450185 - type: nauc_ndcg_at_1000_std value: -4.185675731246788 - type: nauc_ndcg_at_100_diff1 value: 44.15415366286996 - type: nauc_ndcg_at_100_max value: 30.098413084162345 - type: nauc_ndcg_at_100_std value: -3.557438303045246 - type: nauc_ndcg_at_10_diff1 value: 44.117356815361376 - type: nauc_ndcg_at_10_max value: 30.090057186506147 - type: nauc_ndcg_at_10_std value: -4.294561567142078 - type: nauc_ndcg_at_1_diff1 value: 47.48389890056789 - type: nauc_ndcg_at_1_max value: 24.803734709402654 - type: nauc_ndcg_at_1_std value: -6.504759899363267 - type: nauc_ndcg_at_20_diff1 value: 43.868556983413285 - type: nauc_ndcg_at_20_max value: 30.06455269775592 - type: nauc_ndcg_at_20_std value: -3.9645560243946623 - type: nauc_ndcg_at_3_diff1 value: 43.71970793339256 - type: nauc_ndcg_at_3_max value: 28.057786581438034 - type: nauc_ndcg_at_3_std value: -5.597352364190012 - type: nauc_ndcg_at_5_diff1 value: 43.57692922989753 - type: nauc_ndcg_at_5_max value: 29.811975056854994 - type: nauc_ndcg_at_5_std value: -4.362865924703688 - type: nauc_precision_at_1000_diff1 value: 37.65255144893002 - type: nauc_precision_at_1000_max value: 88.70768683938714 - type: nauc_precision_at_1000_std value: 69.77642765639528 - type: nauc_precision_at_100_diff1 value: 38.99412121382678 - type: nauc_precision_at_100_max value: 61.57652450016459 - type: nauc_precision_at_100_std value: 24.826035139656348 - type: nauc_precision_at_10_diff1 value: 41.78189732924517 - type: nauc_precision_at_10_max value: 39.83536802453079 - type: nauc_precision_at_10_std value: 0.431964006091015 - type: nauc_precision_at_1_diff1 value: 47.48389890056789 - type: nauc_precision_at_1_max value: 24.803734709402654 - type: nauc_precision_at_1_std value: -6.504759899363267 - type: nauc_precision_at_20_diff1 value: 39.33781305274886 - type: nauc_precision_at_20_max value: 43.00448814568695 - type: nauc_precision_at_20_std 
value: 4.5633424143661365 - type: nauc_precision_at_3_diff1 value: 40.99977742505519 - type: nauc_precision_at_3_max value: 31.14585236181214 - type: nauc_precision_at_3_std value: -4.404002104899136 - type: nauc_precision_at_5_diff1 value: 40.12130730401297 - type: nauc_precision_at_5_max value: 36.45000981581976 - type: nauc_precision_at_5_std value: -0.8603896798394983 - type: nauc_recall_at_1000_diff1 value: 37.652551448927504 - type: nauc_recall_at_1000_max value: 88.70768683938547 - type: nauc_recall_at_1000_std value: 69.77642765638893 - type: nauc_recall_at_100_diff1 value: 38.9941212138267 - type: nauc_recall_at_100_max value: 61.57652450016457 - type: nauc_recall_at_100_std value: 24.82603513965631 - type: nauc_recall_at_10_diff1 value: 41.781897329245105 - type: nauc_recall_at_10_max value: 39.83536802453082 - type: nauc_recall_at_10_std value: 0.4319640060909985 - type: nauc_recall_at_1_diff1 value: 47.48389890056789 - type: nauc_recall_at_1_max value: 24.803734709402654 - type: nauc_recall_at_1_std value: -6.504759899363267 - type: nauc_recall_at_20_diff1 value: 39.337813052748835 - type: nauc_recall_at_20_max value: 43.00448814568676 - type: nauc_recall_at_20_std value: 4.56334241436601 - type: nauc_recall_at_3_diff1 value: 40.99977742505522 - type: nauc_recall_at_3_max value: 31.14585236181218 - type: nauc_recall_at_3_std value: -4.404002104899084 - type: nauc_recall_at_5_diff1 value: 40.121307304013 - type: nauc_recall_at_5_max value: 36.450009815819726 - type: nauc_recall_at_5_std value: -0.8603896798395225 - type: ndcg_at_1 value: 40.371 - type: ndcg_at_10 value: 58.499 - type: ndcg_at_100 value: 61.958 - type: ndcg_at_1000 value: 62.638000000000005 - type: ndcg_at_20 value: 60.068 - type: ndcg_at_3 value: 52.079 - type: ndcg_at_5 value: 55.359 - type: precision_at_1 value: 40.371 - type: precision_at_10 value: 7.797999999999999 - type: precision_at_100 value: 0.943 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.208 - type: precision_at_3 value: 20.135 - type: precision_at_5 value: 13.669999999999998 - type: recall_at_1 value: 40.371 - type: recall_at_10 value: 77.979 - type: recall_at_100 value: 94.257 - type: recall_at_1000 value: 99.655 - type: recall_at_20 value: 84.154 - type: recall_at_3 value: 60.406000000000006 - type: recall_at_5 value: 68.351 task: type: Retrieval - dataset: config: fr name: MTEB AmazonReviewsClassification (fr) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 55.186 - type: f1 value: 54.46705535013317 - type: f1_weighted value: 54.46705535013317 - type: main_score value: 55.186 task: type: Classification - dataset: config: default name: MTEB BSARDRetrieval revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 split: test type: maastrichtlawtech/bsard metrics: - type: main_score value: 65.766 - type: map_at_1 value: 17.116999999999997 - type: map_at_10 value: 24.2 - type: map_at_100 value: 25.196 - type: map_at_1000 value: 25.285999999999998 - type: map_at_20 value: 24.84 - type: map_at_3 value: 21.246000000000002 - type: map_at_5 value: 23.386000000000003 - type: mrr_at_1 value: 17.117117117117118 - type: mrr_at_10 value: 24.19955669955671 - type: mrr_at_100 value: 25.195531920335007 - type: mrr_at_1000 value: 25.284600511909495 - type: mrr_at_20 value: 24.840254977638896 - type: mrr_at_3 value: 21.246246246246244 - type: mrr_at_5 value: 23.38588588588589 - type: nauc_map_at_1000_diff1 value: 10.81116818873305 - type: nauc_map_at_1000_max value: 
18.081485212587296 - type: nauc_map_at_1000_std value: 15.55247182359811 - type: nauc_map_at_100_diff1 value: 10.769025561727476 - type: nauc_map_at_100_max value: 18.05422658310923 - type: nauc_map_at_100_std value: 15.5467718904851 - type: nauc_map_at_10_diff1 value: 10.683272018434048 - type: nauc_map_at_10_max value: 18.142476171157714 - type: nauc_map_at_10_std value: 15.160871943210017 - type: nauc_map_at_1_diff1 value: 15.136874216646229 - type: nauc_map_at_1_max value: 19.68585969419655 - type: nauc_map_at_1_std value: 15.169957564848444 - type: nauc_map_at_20_diff1 value: 11.04316522915875 - type: nauc_map_at_20_max value: 17.817024791267443 - type: nauc_map_at_20_std value: 15.071246935999893 - type: nauc_map_at_3_diff1 value: 8.893328353778843 - type: nauc_map_at_3_max value: 16.402408590507946 - type: nauc_map_at_3_std value: 14.631998787185735 - type: nauc_map_at_5_diff1 value: 9.802455874823172 - type: nauc_map_at_5_max value: 17.939476196078495 - type: nauc_map_at_5_std value: 14.130589132632698 - type: nauc_mrr_at_1000_diff1 value: 10.813072323683013 - type: nauc_mrr_at_1000_max value: 18.08332318614462 - type: nauc_mrr_at_1000_std value: 15.553043223942819 - type: nauc_mrr_at_100_diff1 value: 10.77091057430458 - type: nauc_mrr_at_100_max value: 18.055798185778123 - type: nauc_mrr_at_100_std value: 15.547068262312003 - type: nauc_mrr_at_10_diff1 value: 10.683272018434048 - type: nauc_mrr_at_10_max value: 18.142476171157714 - type: nauc_mrr_at_10_std value: 15.160871943210017 - type: nauc_mrr_at_1_diff1 value: 15.136874216646229 - type: nauc_mrr_at_1_max value: 19.68585969419655 - type: nauc_mrr_at_1_std value: 15.169957564848444 - type: nauc_mrr_at_20_diff1 value: 11.04316522915875 - type: nauc_mrr_at_20_max value: 17.817024791267443 - type: nauc_mrr_at_20_std value: 15.071246935999893 - type: nauc_mrr_at_3_diff1 value: 8.893328353778843 - type: nauc_mrr_at_3_max value: 16.402408590507946 - type: nauc_mrr_at_3_std value: 14.631998787185735 - type: nauc_mrr_at_5_diff1 value: 9.802455874823172 - type: nauc_mrr_at_5_max value: 17.939476196078495 - type: nauc_mrr_at_5_std value: 14.130589132632698 - type: nauc_ndcg_at_1000_diff1 value: 11.202853727201774 - type: nauc_ndcg_at_1000_max value: 19.0293189527563 - type: nauc_ndcg_at_1000_std value: 18.390388750658357 - type: nauc_ndcg_at_100_diff1 value: 10.087335018055228 - type: nauc_ndcg_at_100_max value: 18.78516003607274 - type: nauc_ndcg_at_100_std value: 18.780357674944415 - type: nauc_ndcg_at_10_diff1 value: 10.574953671198443 - type: nauc_ndcg_at_10_max value: 18.572291623672044 - type: nauc_ndcg_at_10_std value: 15.808055075116057 - type: nauc_ndcg_at_1_diff1 value: 15.136874216646229 - type: nauc_ndcg_at_1_max value: 19.68585969419655 - type: nauc_ndcg_at_1_std value: 15.169957564848444 - type: nauc_ndcg_at_20_diff1 value: 11.86104023461335 - type: nauc_ndcg_at_20_max value: 17.436985589044458 - type: nauc_ndcg_at_20_std value: 15.588720372098383 - type: nauc_ndcg_at_3_diff1 value: 7.212552449189805 - type: nauc_ndcg_at_3_max value: 15.573909877641508 - type: nauc_ndcg_at_3_std value: 14.53705493856145 - type: nauc_ndcg_at_5_diff1 value: 8.778923731622235 - type: nauc_ndcg_at_5_max value: 18.140995131168534 - type: nauc_ndcg_at_5_std value: 13.608313703781533 - type: nauc_precision_at_1000_diff1 value: 21.242679241621413 - type: nauc_precision_at_1000_max value: 28.358433127289924 - type: nauc_precision_at_1000_std value: 43.82822797432329 - type: nauc_precision_at_100_diff1 value: 6.627014646720404 - type: 
nauc_precision_at_100_max value: 22.40433487802035 - type: nauc_precision_at_100_std value: 34.933889742457595 - type: nauc_precision_at_10_diff1 value: 10.885683410075934 - type: nauc_precision_at_10_max value: 19.96889041019717 - type: nauc_precision_at_10_std value: 17.798863824564464 - type: nauc_precision_at_1_diff1 value: 15.136874216646229 - type: nauc_precision_at_1_max value: 19.68585969419655 - type: nauc_precision_at_1_std value: 15.169957564848444 - type: nauc_precision_at_20_diff1 value: 15.496066928172066 - type: nauc_precision_at_20_max value: 16.03026652303162 - type: nauc_precision_at_20_std value: 17.26605341902364 - type: nauc_precision_at_3_diff1 value: 2.968469300914268 - type: nauc_precision_at_3_max value: 13.49791571660617 - type: nauc_precision_at_3_std value: 14.311739399090806 - type: nauc_precision_at_5_diff1 value: 6.502154730668018 - type: nauc_precision_at_5_max value: 18.889080152631124 - type: nauc_precision_at_5_std value: 12.221319698087786 - type: nauc_recall_at_1000_diff1 value: 21.242679241621435 - type: nauc_recall_at_1000_max value: 28.358433127289974 - type: nauc_recall_at_1000_std value: 43.82822797432328 - type: nauc_recall_at_100_diff1 value: 6.62701464672039 - type: nauc_recall_at_100_max value: 22.404334878020286 - type: nauc_recall_at_100_std value: 34.93388974245755 - type: nauc_recall_at_10_diff1 value: 10.885683410075906 - type: nauc_recall_at_10_max value: 19.968890410197133 - type: nauc_recall_at_10_std value: 17.7988638245644 - type: nauc_recall_at_1_diff1 value: 15.136874216646229 - type: nauc_recall_at_1_max value: 19.68585969419655 - type: nauc_recall_at_1_std value: 15.169957564848444 - type: nauc_recall_at_20_diff1 value: 15.49606692817206 - type: nauc_recall_at_20_max value: 16.030266523031628 - type: nauc_recall_at_20_std value: 17.26605341902362 - type: nauc_recall_at_3_diff1 value: 2.968469300914263 - type: nauc_recall_at_3_max value: 13.497915716606142 - type: nauc_recall_at_3_std value: 14.31173939909079 - type: nauc_recall_at_5_diff1 value: 6.50215473066801 - type: nauc_recall_at_5_max value: 18.889080152631095 - type: nauc_recall_at_5_std value: 12.221319698087767 - type: ndcg_at_1 value: 17.116999999999997 - type: ndcg_at_10 value: 28.524 - type: ndcg_at_100 value: 33.476 - type: ndcg_at_1000 value: 36.012 - type: ndcg_at_20 value: 30.820999999999998 - type: ndcg_at_3 value: 22.721 - type: ndcg_at_5 value: 26.596999999999998 - type: precision_at_1 value: 17.116999999999997 - type: precision_at_10 value: 4.234 - type: precision_at_100 value: 0.658 - type: precision_at_1000 value: 0.086 - type: precision_at_20 value: 2.568 - type: precision_at_3 value: 9.009 - type: precision_at_5 value: 7.297 - type: recall_at_1 value: 17.116999999999997 - type: recall_at_10 value: 42.342 - type: recall_at_100 value: 65.766 - type: recall_at_1000 value: 86.036 - type: recall_at_20 value: 51.351 - type: recall_at_3 value: 27.027 - type: recall_at_5 value: 36.486000000000004 task: type: Retrieval - dataset: config: default name: MTEB HALClusteringS2S revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 split: test type: lyon-nlp/clustering-hal-s2s metrics: - type: main_score value: 28.18744772954557 - type: v_measure value: 28.18744772954557 - type: v_measure_std value: 3.239838057506439 task: type: Clustering - dataset: config: fr name: MTEB MLSUMClusteringP2P (fr) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 47.75009059283003 - type: v_measure value: 47.75009059283003 - 
type: v_measure_std value: 2.009277732690298 task: type: Clustering - dataset: config: fr name: MTEB MLSUMClusteringS2S (fr) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 47.46091989113078 - type: v_measure value: 47.46091989113078 - type: v_measure_std value: 2.604802270948194 task: type: Clustering - dataset: config: fr name: MTEB MTOPDomainClassification (fr) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 97.20325712496086 - type: f1 value: 97.05991090368462 - type: f1_weighted value: 97.20748006323807 - type: main_score value: 97.20325712496086 task: type: Classification - dataset: config: fr name: MTEB MTOPIntentClassification (fr) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 93.07234575634199 - type: f1 value: 76.54521288506878 - type: f1_weighted value: 93.6903586431893 - type: main_score value: 93.07234575634199 task: type: Classification - dataset: config: fra name: MTEB MasakhaNEWSClassification (fra) revision: 18193f187b92da67168c655c9973a165ed9593dd split: test type: mteb/masakhanews metrics: - type: accuracy value: 82.48815165876778 - type: f1 value: 78.71164464238117 - type: f1_weighted value: 82.38927389376973 - type: main_score value: 82.48815165876778 task: type: Classification - dataset: config: fra name: MTEB MasakhaNEWSClusteringP2P (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 73.85712952800003 - type: v_measure value: 73.85712952800003 - type: v_measure_std value: 22.471668299794416 task: type: Clustering - dataset: config: fra name: MTEB MasakhaNEWSClusteringS2S (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 67.23960512566751 - type: v_measure value: 67.23960512566751 - type: v_measure_std value: 24.65079601360142 task: type: Clustering - dataset: config: fr name: MTEB MassiveIntentClassification (fr) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 79.59986550100874 - type: f1 value: 76.0439154517916 - type: f1_weighted value: 79.48538292013761 - type: main_score value: 79.59986550100874 task: type: Classification - dataset: config: fr name: MTEB MassiveScenarioClassification (fr) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 82.182246133154 - type: f1 value: 81.68006668655397 - type: f1_weighted value: 81.94775072858566 - type: main_score value: 82.182246133154 task: type: Classification - dataset: config: fr name: MTEB MintakaRetrieval (fr) revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e split: test type: jinaai/mintakaqa metrics: - type: main_score value: 62.532 - type: map_at_1 value: 45.823 - type: map_at_10 value: 57.174 - type: map_at_100 value: 57.735 - type: map_at_1000 value: 57.767 - type: map_at_20 value: 57.53 - type: map_at_3 value: 54.716 - type: map_at_5 value: 56.227000000000004 - type: mrr_at_1 value: 45.82309582309582 - type: mrr_at_10 value: 57.17958217958217 - type: mrr_at_100 value: 57.744059413627866 - type: mrr_at_1000 value: 57.776651992832605 - type: mrr_at_20 value: 57.53890924556554 - type: mrr_at_3 value: 54.716079716079676 - type: mrr_at_5 value: 56.227136227136256 - type: 
nauc_map_at_1000_diff1 value: 39.48401851944296 - type: nauc_map_at_1000_max value: 36.55276875160682 - type: nauc_map_at_1000_std value: 3.9173787361040913 - type: nauc_map_at_100_diff1 value: 39.45696514871956 - type: nauc_map_at_100_max value: 36.55786982498759 - type: nauc_map_at_100_std value: 3.9506714061766557 - type: nauc_map_at_10_diff1 value: 39.31548009319837 - type: nauc_map_at_10_max value: 36.75711871602276 - type: nauc_map_at_10_std value: 3.782911249250981 - type: nauc_map_at_1_diff1 value: 44.190649439568766 - type: nauc_map_at_1_max value: 31.017419446234317 - type: nauc_map_at_1_std value: 0.5544388561183956 - type: nauc_map_at_20_diff1 value: 39.443640617310585 - type: nauc_map_at_20_max value: 36.63799366674228 - type: nauc_map_at_20_std value: 3.934276303386171 - type: nauc_map_at_3_diff1 value: 40.30871768246873 - type: nauc_map_at_3_max value: 36.944169455458656 - type: nauc_map_at_3_std value: 2.9847330185694556 - type: nauc_map_at_5_diff1 value: 39.590461060438095 - type: nauc_map_at_5_max value: 36.998781454405574 - type: nauc_map_at_5_std value: 3.532693606637119 - type: nauc_mrr_at_1000_diff1 value: 39.46102363098429 - type: nauc_mrr_at_1000_max value: 36.56900606103558 - type: nauc_mrr_at_1000_std value: 3.972436075561705 - type: nauc_mrr_at_100_diff1 value: 39.43269261665982 - type: nauc_mrr_at_100_max value: 36.574081599242014 - type: nauc_mrr_at_100_std value: 4.006374171904806 - type: nauc_mrr_at_10_diff1 value: 39.29970560564493 - type: nauc_mrr_at_10_max value: 36.778388879484716 - type: nauc_mrr_at_10_std value: 3.8335456201567206 - type: nauc_mrr_at_1_diff1 value: 44.190649439568766 - type: nauc_mrr_at_1_max value: 31.017419446234317 - type: nauc_mrr_at_1_std value: 0.5544388561183956 - type: nauc_mrr_at_20_diff1 value: 39.42091158484574 - type: nauc_mrr_at_20_max value: 36.65421566061936 - type: nauc_mrr_at_20_std value: 3.988695948848555 - type: nauc_mrr_at_3_diff1 value: 40.313976315898195 - type: nauc_mrr_at_3_max value: 36.960483501441985 - type: nauc_mrr_at_3_std value: 3.0112756156560394 - type: nauc_mrr_at_5_diff1 value: 39.56386294620379 - type: nauc_mrr_at_5_max value: 37.02119815939672 - type: nauc_mrr_at_5_std value: 3.6118004205573184 - type: nauc_ndcg_at_1000_diff1 value: 38.05281585863137 - type: nauc_ndcg_at_1000_max value: 37.41178875860201 - type: nauc_ndcg_at_1000_std value: 5.525420555163393 - type: nauc_ndcg_at_100_diff1 value: 37.18408005856676 - type: nauc_ndcg_at_100_max value: 37.617851212997685 - type: nauc_ndcg_at_100_std value: 6.871461890669446 - type: nauc_ndcg_at_10_diff1 value: 36.624444841382484 - type: nauc_ndcg_at_10_max value: 38.62100324849529 - type: nauc_ndcg_at_10_std value: 6.027810657475449 - type: nauc_ndcg_at_1_diff1 value: 44.190649439568766 - type: nauc_ndcg_at_1_max value: 31.017419446234317 - type: nauc_ndcg_at_1_std value: 0.5544388561183956 - type: nauc_ndcg_at_20_diff1 value: 37.057047514121564 - type: nauc_ndcg_at_20_max value: 38.19839331454421 - type: nauc_ndcg_at_20_std value: 6.770369938343684 - type: nauc_ndcg_at_3_diff1 value: 38.95821428563954 - type: nauc_ndcg_at_3_max value: 38.87440219376017 - type: nauc_ndcg_at_3_std value: 4.097498274708613 - type: nauc_ndcg_at_5_diff1 value: 37.515589837182034 - type: nauc_ndcg_at_5_max value: 39.165561493023276 - type: nauc_ndcg_at_5_std value: 5.291512124344874 - type: nauc_precision_at_1000_diff1 value: -13.365474882749279 - type: nauc_precision_at_1000_max value: 50.68568417959442 - type: nauc_precision_at_1000_std value: 37.847145129019054 - type: 
nauc_precision_at_100_diff1 value: 12.081443207482383 - type: nauc_precision_at_100_max value: 43.67561356191485 - type: nauc_precision_at_100_std value: 44.64523987759538 - type: nauc_precision_at_10_diff1 value: 23.20358204183261 - type: nauc_precision_at_10_max value: 46.93706139285088 - type: nauc_precision_at_10_std value: 17.36243956517301 - type: nauc_precision_at_1_diff1 value: 44.190649439568766 - type: nauc_precision_at_1_max value: 31.017419446234317 - type: nauc_precision_at_1_std value: 0.5544388561183956 - type: nauc_precision_at_20_diff1 value: 22.42836999246196 - type: nauc_precision_at_20_max value: 46.29381413041759 - type: nauc_precision_at_20_std value: 26.126609401922696 - type: nauc_precision_at_3_diff1 value: 34.503018704702484 - type: nauc_precision_at_3_max value: 45.194775358016095 - type: nauc_precision_at_3_std value: 7.864444241838433 - type: nauc_precision_at_5_diff1 value: 29.494641243672138 - type: nauc_precision_at_5_max value: 47.326071718857484 - type: nauc_precision_at_5_std value: 12.273738036245172 - type: nauc_recall_at_1000_diff1 value: -13.365474882756335 - type: nauc_recall_at_1000_max value: 50.68568417959348 - type: nauc_recall_at_1000_std value: 37.8471451290128 - type: nauc_recall_at_100_diff1 value: 12.08144320748251 - type: nauc_recall_at_100_max value: 43.675613561914986 - type: nauc_recall_at_100_std value: 44.645239877595564 - type: nauc_recall_at_10_diff1 value: 23.203582041832526 - type: nauc_recall_at_10_max value: 46.9370613928509 - type: nauc_recall_at_10_std value: 17.36243956517297 - type: nauc_recall_at_1_diff1 value: 44.190649439568766 - type: nauc_recall_at_1_max value: 31.017419446234317 - type: nauc_recall_at_1_std value: 0.5544388561183956 - type: nauc_recall_at_20_diff1 value: 22.42836999246212 - type: nauc_recall_at_20_max value: 46.29381413041773 - type: nauc_recall_at_20_std value: 26.12660940192268 - type: nauc_recall_at_3_diff1 value: 34.50301870470248 - type: nauc_recall_at_3_max value: 45.19477535801611 - type: nauc_recall_at_3_std value: 7.8644442418384335 - type: nauc_recall_at_5_diff1 value: 29.494641243672216 - type: nauc_recall_at_5_max value: 47.32607171885759 - type: nauc_recall_at_5_std value: 12.273738036245142 - type: ndcg_at_1 value: 45.823 - type: ndcg_at_10 value: 62.532 - type: ndcg_at_100 value: 65.298 - type: ndcg_at_1000 value: 66.214 - type: ndcg_at_20 value: 63.82600000000001 - type: ndcg_at_3 value: 57.528999999999996 - type: ndcg_at_5 value: 60.24 - type: precision_at_1 value: 45.823 - type: precision_at_10 value: 7.928 - type: precision_at_100 value: 0.923 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.22 - type: precision_at_3 value: 21.881 - type: precision_at_5 value: 14.438999999999998 - type: recall_at_1 value: 45.823 - type: recall_at_10 value: 79.279 - type: recall_at_100 value: 92.301 - type: recall_at_1000 value: 99.631 - type: recall_at_20 value: 84.398 - type: recall_at_3 value: 65.643 - type: recall_at_5 value: 72.195 task: type: Retrieval - dataset: config: fr name: MTEB OpusparcusPC (fr) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test type: GEM/opusparcus metrics: - type: cosine_accuracy value: 99.90069513406156 - type: cosine_accuracy_threshold value: 54.45001207375879 - type: cosine_ap value: 100.0 - type: cosine_f1 value: 99.95032290114257 - type: cosine_f1_threshold value: 54.45001207375879 - type: cosine_precision value: 100.0 - type: cosine_recall value: 99.90069513406156 - type: dot_accuracy value: 99.90069513406156 - type: 
dot_accuracy_threshold value: 1312800.0 - type: dot_ap value: 100.0 - type: dot_f1 value: 99.95032290114257 - type: dot_f1_threshold value: 1312800.0 - type: dot_precision value: 100.0 - type: dot_recall value: 99.90069513406156 - type: euclidean_accuracy value: 99.90069513406156 - type: euclidean_accuracy_threshold value: 15150.791732002876 - type: euclidean_ap value: 100.0 - type: euclidean_f1 value: 99.95032290114257 - type: euclidean_f1_threshold value: 15150.791732002876 - type: euclidean_precision value: 100.0 - type: euclidean_recall value: 99.90069513406156 - type: main_score value: 100.0 - type: manhattan_accuracy value: 99.90069513406156 - type: manhattan_accuracy_threshold value: 717903.2791554928 - type: manhattan_ap value: 100.0 - type: manhattan_f1 value: 99.95032290114257 - type: manhattan_f1_threshold value: 717903.2791554928 - type: manhattan_precision value: 100.0 - type: manhattan_recall value: 99.90069513406156 - type: max_ap value: 100.0 - type: max_f1 value: 99.95032290114257 - type: max_precision value: 100.0 - type: max_recall value: 99.90069513406156 - type: similarity_accuracy value: 99.90069513406156 - type: similarity_accuracy_threshold value: 54.45001207375879 - type: similarity_ap value: 100.0 - type: similarity_f1 value: 99.95032290114257 - type: similarity_f1_threshold value: 54.45001207375879 - type: similarity_precision value: 100.0 - type: similarity_recall value: 99.90069513406156 task: type: PairClassification - dataset: config: fr name: MTEB PawsXPairClassification (fr) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: google-research-datasets/paws-x metrics: - type: cosine_accuracy value: 67.95 - type: cosine_accuracy_threshold value: 97.36901285947026 - type: cosine_ap value: 70.14158727060726 - type: cosine_f1 value: 65.38108356290174 - type: cosine_f1_threshold value: 94.90683744884689 - type: cosine_precision value: 55.84313725490196 - type: cosine_recall value: 78.8482834994463 - type: dot_accuracy value: 60.5 - type: dot_accuracy_threshold value: 2606400.0 - type: dot_ap value: 57.0114505567262 - type: dot_f1 value: 63.29394387001477 - type: dot_f1_threshold value: 2345600.0 - type: dot_precision value: 47.4792243767313 - type: dot_recall value: 94.90586932447398 - type: euclidean_accuracy value: 68.05 - type: euclidean_accuracy_threshold value: 3824.99743197985 - type: euclidean_ap value: 70.01158306654237 - type: euclidean_f1 value: 65.21939953810623 - type: euclidean_f1_threshold value: 5187.47968966464 - type: euclidean_precision value: 55.942947702060216 - type: euclidean_recall value: 78.18383167220377 - type: main_score value: 70.14158727060726 - type: manhattan_accuracy value: 68.05 - type: manhattan_accuracy_threshold value: 191852.34832763672 - type: manhattan_ap value: 70.01670033904287 - type: manhattan_f1 value: 65.2854511970534 - type: manhattan_f1_threshold value: 246807.1710705757 - type: manhattan_precision value: 55.87076438140268 - type: manhattan_recall value: 78.51605758582502 - type: max_ap value: 70.14158727060726 - type: max_f1 value: 65.38108356290174 - type: max_precision value: 55.942947702060216 - type: max_recall value: 94.90586932447398 - type: similarity_accuracy value: 67.95 - type: similarity_accuracy_threshold value: 97.36901285947026 - type: similarity_ap value: 70.14158727060726 - type: similarity_f1 value: 65.38108356290174 - type: similarity_f1_threshold value: 94.90683744884689 - type: similarity_precision value: 55.84313725490196 - type: similarity_recall value: 78.8482834994463 task: 
type: PairClassification - dataset: config: default name: MTEB SICKFr revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a split: test type: Lajavaness/SICK-fr metrics: - type: cosine_pearson value: 79.79861486027 - type: cosine_spearman value: 79.3918786992987 - type: euclidean_pearson value: 77.73226212475764 - type: euclidean_spearman value: 79.08856888397014 - type: main_score value: 79.3918786992987 - type: manhattan_pearson value: 77.8002206650809 - type: manhattan_spearman value: 79.15284532531264 - type: pearson value: 79.79861486027 - type: spearman value: 79.3918786992987 task: type: STS - dataset: config: fr name: MTEB STS22 (fr) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 83.32314025534286 - type: cosine_spearman value: 83.2806004701507 - type: euclidean_pearson value: 81.88040500817269 - type: euclidean_spearman value: 82.73179823676206 - type: main_score value: 83.2806004701507 - type: manhattan_pearson value: 82.0438174605579 - type: manhattan_spearman value: 83.0253049811576 - type: pearson value: 83.32314025534286 - type: spearman value: 83.2806004701507 task: type: STS - dataset: config: fr name: MTEB STSBenchmarkMultilingualSTS (fr) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 84.56723075054445 - type: cosine_spearman value: 85.08759191551403 - type: euclidean_pearson value: 83.186096744725 - type: euclidean_spearman value: 84.36958569816491 - type: main_score value: 85.08759191551403 - type: manhattan_pearson value: 83.1405072165467 - type: manhattan_spearman value: 84.34227830781155 - type: pearson value: 84.56723075054445 - type: spearman value: 85.08759191551403 task: type: STS - dataset: config: default name: MTEB SummEvalFr revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 split: test type: lyon-nlp/summarization-summeval-fr-p2p metrics: - type: cosine_pearson value: 31.921764332449115 - type: cosine_spearman value: 31.260442997631806 - type: dot_pearson value: 31.585578707631406 - type: dot_spearman value: 31.479238746310028 - type: main_score value: 31.260442997631806 - type: pearson value: 31.921764332449115 - type: spearman value: 31.260442997631806 task: type: Summarization - dataset: config: default name: MTEB SyntecReranking revision: daf0863838cd9e3ba50544cdce3ac2b338a1b0ad split: test type: lyon-nlp/mteb-fr-reranking-syntec-s2p metrics: - type: main_score value: 91.83333333333333 - type: map value: 91.83333333333333 - type: mrr value: 92.0 - type: nAUC_map_diff1 value: 53.97793263646914 - type: nAUC_map_max value: 44.264158743282195 - type: nAUC_map_std value: 14.692218350754885 - type: nAUC_mrr_diff1 value: 54.36926882239366 - type: nAUC_mrr_max value: 46.43108510296003 - type: nAUC_mrr_std value: 17.48914092664096 task: type: Reranking - dataset: config: default name: MTEB SyntecRetrieval revision: 19661ccdca4dfc2d15122d776b61685f48c68ca9 split: test type: lyon-nlp/mteb-fr-retrieval-syntec-s2p metrics: - type: main_score value: 90.36699999999999 - type: map_at_1 value: 79.0 - type: map_at_10 value: 87.18599999999999 - type: map_at_100 value: 87.18599999999999 - type: map_at_1000 value: 87.18599999999999 - type: map_at_20 value: 87.18599999999999 - type: map_at_3 value: 86.0 - type: map_at_5 value: 86.95 - type: mrr_at_1 value: 79.0 - type: mrr_at_10 value: 87.18611111111112 - type: mrr_at_100 value: 87.18611111111112 - type: mrr_at_1000 value: 87.18611111111112 - type: mrr_at_20 
value: 87.18611111111112 - type: mrr_at_3 value: 86.0 - type: mrr_at_5 value: 86.95 - type: nauc_map_at_1000_diff1 value: 63.05539428169271 - type: nauc_map_at_1000_max value: 45.428107132447124 - type: nauc_map_at_1000_std value: 13.94507583970834 - type: nauc_map_at_100_diff1 value: 63.05539428169271 - type: nauc_map_at_100_max value: 45.428107132447124 - type: nauc_map_at_100_std value: 13.94507583970834 - type: nauc_map_at_10_diff1 value: 63.05539428169271 - type: nauc_map_at_10_max value: 45.428107132447124 - type: nauc_map_at_10_std value: 13.94507583970834 - type: nauc_map_at_1_diff1 value: 64.24122923028831 - type: nauc_map_at_1_max value: 44.34077957053877 - type: nauc_map_at_1_std value: 9.594344386466878 - type: nauc_map_at_20_diff1 value: 63.05539428169271 - type: nauc_map_at_20_max value: 45.428107132447124 - type: nauc_map_at_20_std value: 13.94507583970834 - type: nauc_map_at_3_diff1 value: 62.30831315577075 - type: nauc_map_at_3_max value: 47.33980193586779 - type: nauc_map_at_3_std value: 16.132624025733 - type: nauc_map_at_5_diff1 value: 63.079622378971834 - type: nauc_map_at_5_max value: 45.13424437707254 - type: nauc_map_at_5_std value: 13.730785051570013 - type: nauc_mrr_at_1000_diff1 value: 63.05539428169271 - type: nauc_mrr_at_1000_max value: 45.428107132447124 - type: nauc_mrr_at_1000_std value: 13.94507583970834 - type: nauc_mrr_at_100_diff1 value: 63.05539428169271 - type: nauc_mrr_at_100_max value: 45.428107132447124 - type: nauc_mrr_at_100_std value: 13.94507583970834 - type: nauc_mrr_at_10_diff1 value: 63.05539428169271 - type: nauc_mrr_at_10_max value: 45.428107132447124 - type: nauc_mrr_at_10_std value: 13.94507583970834 - type: nauc_mrr_at_1_diff1 value: 64.24122923028831 - type: nauc_mrr_at_1_max value: 44.34077957053877 - type: nauc_mrr_at_1_std value: 9.594344386466878 - type: nauc_mrr_at_20_diff1 value: 63.05539428169271 - type: nauc_mrr_at_20_max value: 45.428107132447124 - type: nauc_mrr_at_20_std value: 13.94507583970834 - type: nauc_mrr_at_3_diff1 value: 62.30831315577075 - type: nauc_mrr_at_3_max value: 47.33980193586779 - type: nauc_mrr_at_3_std value: 16.132624025733 - type: nauc_mrr_at_5_diff1 value: 63.079622378971834 - type: nauc_mrr_at_5_max value: 45.13424437707254 - type: nauc_mrr_at_5_std value: 13.730785051570013 - type: nauc_ndcg_at_1000_diff1 value: 62.97376441474187 - type: nauc_ndcg_at_1000_max value: 45.457846840130586 - type: nauc_ndcg_at_1000_std value: 14.17695491254452 - type: nauc_ndcg_at_100_diff1 value: 62.97376441474187 - type: nauc_ndcg_at_100_max value: 45.457846840130586 - type: nauc_ndcg_at_100_std value: 14.17695491254452 - type: nauc_ndcg_at_10_diff1 value: 62.97376441474187 - type: nauc_ndcg_at_10_max value: 45.457846840130586 - type: nauc_ndcg_at_10_std value: 14.17695491254452 - type: nauc_ndcg_at_1_diff1 value: 64.24122923028831 - type: nauc_ndcg_at_1_max value: 44.34077957053877 - type: nauc_ndcg_at_1_std value: 9.594344386466878 - type: nauc_ndcg_at_20_diff1 value: 62.97376441474187 - type: nauc_ndcg_at_20_max value: 45.457846840130586 - type: nauc_ndcg_at_20_std value: 14.17695491254452 - type: nauc_ndcg_at_3_diff1 value: 61.47043349797183 - type: nauc_ndcg_at_3_max value: 49.12165820225059 - type: nauc_ndcg_at_3_std value: 18.525396343409568 - type: nauc_ndcg_at_5_diff1 value: 63.04022063936115 - type: nauc_ndcg_at_5_max value: 44.381937619091765 - type: nauc_ndcg_at_5_std value: 13.3263412698325 - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_precision_at_1000_max value: .nan - type: 
nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_100_diff1 value: .nan - type: nauc_precision_at_100_max value: .nan - type: nauc_precision_at_100_std value: .nan - type: nauc_precision_at_10_diff1 value: 100.0 - type: nauc_precision_at_10_max value: 100.0 - type: nauc_precision_at_10_std value: 100.0 - type: nauc_precision_at_1_diff1 value: 64.24122923028831 - type: nauc_precision_at_1_max value: 44.34077957053877 - type: nauc_precision_at_1_std value: 9.594344386466878 - type: nauc_precision_at_20_diff1 value: 100.0 - type: nauc_precision_at_20_max value: 100.0 - type: nauc_precision_at_20_std value: 100.0 - type: nauc_precision_at_3_diff1 value: 56.27917833800158 - type: nauc_precision_at_3_max value: 60.51976346093969 - type: nauc_precision_at_3_std value: 33.02209772798002 - type: nauc_precision_at_5_diff1 value: 63.81886087768404 - type: nauc_precision_at_5_max value: 27.544351073763345 - type: nauc_precision_at_5_std value: -0.4668534080301362 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: .nan - type: nauc_recall_at_100_max value: .nan - type: nauc_recall_at_100_std value: .nan - type: nauc_recall_at_10_diff1 value: .nan - type: nauc_recall_at_10_max value: .nan - type: nauc_recall_at_10_std value: .nan - type: nauc_recall_at_1_diff1 value: 64.24122923028831 - type: nauc_recall_at_1_max value: 44.34077957053877 - type: nauc_recall_at_1_std value: 9.594344386466878 - type: nauc_recall_at_20_diff1 value: .nan - type: nauc_recall_at_20_max value: .nan - type: nauc_recall_at_20_std value: .nan - type: nauc_recall_at_3_diff1 value: 56.27917833800187 - type: nauc_recall_at_3_max value: 60.51976346094 - type: nauc_recall_at_3_std value: 33.022097727980125 - type: nauc_recall_at_5_diff1 value: 63.81886087768457 - type: nauc_recall_at_5_max value: 27.544351073763107 - type: nauc_recall_at_5_std value: -0.46685340803013775 - type: ndcg_at_1 value: 79.0 - type: ndcg_at_10 value: 90.36699999999999 - type: ndcg_at_100 value: 90.36699999999999 - type: ndcg_at_1000 value: 90.36699999999999 - type: ndcg_at_20 value: 90.36699999999999 - type: ndcg_at_3 value: 88.071 - type: ndcg_at_5 value: 89.75 - type: precision_at_1 value: 79.0 - type: precision_at_10 value: 10.0 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 5.0 - type: precision_at_3 value: 31.333 - type: precision_at_5 value: 19.6 - type: recall_at_1 value: 79.0 - type: recall_at_10 value: 100.0 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 100.0 - type: recall_at_3 value: 94.0 - type: recall_at_5 value: 98.0 task: type: Retrieval - dataset: config: fra-fra name: MTEB XPQARetrieval (fr) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 77.425 - type: map_at_1 value: 46.749 - type: map_at_10 value: 72.108 - type: map_at_100 value: 73.32499999999999 - type: map_at_1000 value: 73.341 - type: map_at_20 value: 72.991 - type: map_at_3 value: 65.09 - type: map_at_5 value: 70.137 - type: mrr_at_1 value: 71.82910547396529 - type: mrr_at_10 value: 78.63357492529722 - type: mrr_at_100 value: 78.97374961354801 - type: mrr_at_1000 value: 78.97840549855806 - type: mrr_at_20 value: 78.86005025292395 - type: mrr_at_3 value: 77.28081886960389 - type: mrr_at_5 value: 78.0551846906987 - type: nauc_map_at_1000_diff1 value: 57.508397030020156 - type: 
nauc_map_at_1000_max value: 43.80251983780665 - type: nauc_map_at_1000_std value: -16.231491160419434 - type: nauc_map_at_100_diff1 value: 57.48614844875469 - type: nauc_map_at_100_max value: 43.797011627763055 - type: nauc_map_at_100_std value: -16.239303348969592 - type: nauc_map_at_10_diff1 value: 57.254064849553934 - type: nauc_map_at_10_max value: 42.765535577219026 - type: nauc_map_at_10_std value: -17.255606315997156 - type: nauc_map_at_1_diff1 value: 65.04324659040175 - type: nauc_map_at_1_max value: 17.852220653388855 - type: nauc_map_at_1_std value: -14.257753661018779 - type: nauc_map_at_20_diff1 value: 57.48367588324867 - type: nauc_map_at_20_max value: 43.680084254814425 - type: nauc_map_at_20_std value: -16.59381108810359 - type: nauc_map_at_3_diff1 value: 58.328817274958276 - type: nauc_map_at_3_max value: 34.603370607250675 - type: nauc_map_at_3_std value: -15.326569334165047 - type: nauc_map_at_5_diff1 value: 57.544271139796365 - type: nauc_map_at_5_max value: 41.58159814532708 - type: nauc_map_at_5_std value: -17.035562345654515 - type: nauc_mrr_at_1000_diff1 value: 67.23053035385993 - type: nauc_mrr_at_1000_max value: 53.982556981667095 - type: nauc_mrr_at_1000_std value: -12.015571062417035 - type: nauc_mrr_at_100_diff1 value: 67.23047293440347 - type: nauc_mrr_at_100_max value: 53.97931489747768 - type: nauc_mrr_at_100_std value: -12.026957248146365 - type: nauc_mrr_at_10_diff1 value: 67.25927907237941 - type: nauc_mrr_at_10_max value: 53.99647347811833 - type: nauc_mrr_at_10_std value: -12.356365137919108 - type: nauc_mrr_at_1_diff1 value: 67.80552098159194 - type: nauc_mrr_at_1_max value: 52.34740974885752 - type: nauc_mrr_at_1_std value: -9.009347371853096 - type: nauc_mrr_at_20_diff1 value: 67.22472566769486 - type: nauc_mrr_at_20_max value: 54.03480374123263 - type: nauc_mrr_at_20_std value: -12.129416933895373 - type: nauc_mrr_at_3_diff1 value: 66.86636026044627 - type: nauc_mrr_at_3_max value: 53.84675762408544 - type: nauc_mrr_at_3_std value: -12.318414220208327 - type: nauc_mrr_at_5_diff1 value: 67.16713697443882 - type: nauc_mrr_at_5_max value: 54.174275682276765 - type: nauc_mrr_at_5_std value: -12.382704200660772 - type: nauc_ndcg_at_1000_diff1 value: 60.076768803793875 - type: nauc_ndcg_at_1000_max value: 48.06880976583911 - type: nauc_ndcg_at_1000_std value: -14.8002468401513 - type: nauc_ndcg_at_100_diff1 value: 59.84195440900073 - type: nauc_ndcg_at_100_max value: 48.031759882567265 - type: nauc_ndcg_at_100_std value: -14.93671795434138 - type: nauc_ndcg_at_10_diff1 value: 59.091362656630984 - type: nauc_ndcg_at_10_max value: 45.902216798175296 - type: nauc_ndcg_at_10_std value: -18.225812204918686 - type: nauc_ndcg_at_1_diff1 value: 67.80552098159194 - type: nauc_ndcg_at_1_max value: 52.34740974885752 - type: nauc_ndcg_at_1_std value: -9.009347371853096 - type: nauc_ndcg_at_20_diff1 value: 59.80472569029982 - type: nauc_ndcg_at_20_max value: 47.92221974783734 - type: nauc_ndcg_at_20_std value: -16.589965314279805 - type: nauc_ndcg_at_3_diff1 value: 56.9195769675713 - type: nauc_ndcg_at_3_max value: 44.992740041222575 - type: nauc_ndcg_at_3_std value: -16.329730380555382 - type: nauc_ndcg_at_5_diff1 value: 59.31912266230594 - type: nauc_ndcg_at_5_max value: 44.75423089733974 - type: nauc_ndcg_at_5_std value: -17.744216780645583 - type: nauc_precision_at_1000_diff1 value: -30.976050318575094 - type: nauc_precision_at_1000_max value: 16.55619583017722 - type: nauc_precision_at_1000_std value: 10.549164466552044 - type: nauc_precision_at_100_diff1 value: 
-30.217028356940872 - type: nauc_precision_at_100_max value: 17.709049202840184 - type: nauc_precision_at_100_std value: 10.04190905252673 - type: nauc_precision_at_10_diff1 value: -19.588612396735584 - type: nauc_precision_at_10_max value: 23.97095583735318 - type: nauc_precision_at_10_std value: 1.3308819095790259 - type: nauc_precision_at_1_diff1 value: 67.80552098159194 - type: nauc_precision_at_1_max value: 52.34740974885752 - type: nauc_precision_at_1_std value: -9.009347371853096 - type: nauc_precision_at_20_diff1 value: -24.56372903999468 - type: nauc_precision_at_20_max value: 21.970766470092478 - type: nauc_precision_at_20_std value: 5.690019568793079 - type: nauc_precision_at_3_diff1 value: -5.293993834675436 - type: nauc_precision_at_3_max value: 33.48037221970611 - type: nauc_precision_at_3_std value: -0.9905029996040207 - type: nauc_precision_at_5_diff1 value: -12.477204961113433 - type: nauc_precision_at_5_max value: 28.41320824321574 - type: nauc_precision_at_5_std value: -0.25510168506666026 - type: nauc_recall_at_1000_diff1 value: 63.80720019823024 - type: nauc_recall_at_1000_max value: 100.0 - type: nauc_recall_at_1000_std value: 100.0 - type: nauc_recall_at_100_diff1 value: 45.99503772001805 - type: nauc_recall_at_100_max value: 53.62256247578381 - type: nauc_recall_at_100_std value: -2.1521605315502126 - type: nauc_recall_at_10_diff1 value: 51.49183566173087 - type: nauc_recall_at_10_max value: 39.94460610694432 - type: nauc_recall_at_10_std value: -27.417226994058534 - type: nauc_recall_at_1_diff1 value: 65.04324659040175 - type: nauc_recall_at_1_max value: 17.852220653388855 - type: nauc_recall_at_1_std value: -14.257753661018779 - type: nauc_recall_at_20_diff1 value: 53.65987970751146 - type: nauc_recall_at_20_max value: 48.20536243702891 - type: nauc_recall_at_20_std value: -24.77784527777353 - type: nauc_recall_at_3_diff1 value: 53.27794448209969 - type: nauc_recall_at_3_max value: 30.304767840963283 - type: nauc_recall_at_3_std value: -19.099603261339936 - type: nauc_recall_at_5_diff1 value: 53.77383683020561 - type: nauc_recall_at_5_max value: 39.58616026474047 - type: nauc_recall_at_5_std value: -23.255086482736036 - type: ndcg_at_1 value: 71.829 - type: ndcg_at_10 value: 77.425 - type: ndcg_at_100 value: 80.88 - type: ndcg_at_1000 value: 81.128 - type: ndcg_at_20 value: 79.403 - type: ndcg_at_3 value: 72.89 - type: ndcg_at_5 value: 74.521 - type: precision_at_1 value: 71.829 - type: precision_at_10 value: 17.596999999999998 - type: precision_at_100 value: 2.033 - type: precision_at_1000 value: 0.207 - type: precision_at_20 value: 9.513 - type: precision_at_3 value: 44.192 - type: precision_at_5 value: 31.776 - type: recall_at_1 value: 46.749 - type: recall_at_10 value: 85.49799999999999 - type: recall_at_100 value: 98.17099999999999 - type: recall_at_1000 value: 99.733 - type: recall_at_20 value: 91.70700000000001 - type: recall_at_3 value: 70.309 - type: recall_at_5 value: 78.507 task: type: Retrieval - dataset: config: default name: MTEB AllegroReviews revision: b89853e6de927b0e3bfa8ecc0e56fe4e02ceafc6 split: test type: PL-MTEB/allegro-reviews metrics: - type: accuracy value: 65.0 - type: f1 value: 58.85888258599016 - type: f1_weighted value: 65.99554726292321 - type: main_score value: 65.0 task: type: Classification - dataset: config: default name: MTEB ArguAna-PL revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 split: test type: clarin-knext/arguana-pl metrics: - type: main_score value: 59.71300000000001 - type: map_at_1 value: 35.135 - type: map_at_10 
value: 51.092000000000006 - type: map_at_100 value: 51.773 - type: map_at_1000 value: 51.776999999999994 - type: map_at_20 value: 51.665000000000006 - type: map_at_3 value: 46.574 - type: map_at_5 value: 49.032 - type: mrr_at_1 value: 36.201991465149355 - type: mrr_at_10 value: 51.546405427984475 - type: mrr_at_100 value: 52.202374673015285 - type: mrr_at_1000 value: 52.20610086068531 - type: mrr_at_20 value: 52.096805353180756 - type: mrr_at_3 value: 47.01280227596022 - type: mrr_at_5 value: 49.49146514935999 - type: nauc_map_at_1000_diff1 value: 19.758403663654388 - type: nauc_map_at_1000_max value: 1.9211716901459552 - type: nauc_map_at_1000_std value: -12.391775130617594 - type: nauc_map_at_100_diff1 value: 19.75801012476506 - type: nauc_map_at_100_max value: 1.927233271789035 - type: nauc_map_at_100_std value: -12.390686358565384 - type: nauc_map_at_10_diff1 value: 19.618023487744257 - type: nauc_map_at_10_max value: 1.948823709088292 - type: nauc_map_at_10_std value: -12.590649627823774 - type: nauc_map_at_1_diff1 value: 22.704520355653777 - type: nauc_map_at_1_max value: -0.7340073588952427 - type: nauc_map_at_1_std value: -11.685082615631233 - type: nauc_map_at_20_diff1 value: 19.710150386755245 - type: nauc_map_at_20_max value: 1.9579689185617946 - type: nauc_map_at_20_std value: -12.454848473878485 - type: nauc_map_at_3_diff1 value: 19.88571571635227 - type: nauc_map_at_3_max value: 2.2089391275055754 - type: nauc_map_at_3_std value: -12.152625563551476 - type: nauc_map_at_5_diff1 value: 19.345423817148774 - type: nauc_map_at_5_max value: 2.4471831202433783 - type: nauc_map_at_5_std value: -11.60532301686549 - type: nauc_mrr_at_1000_diff1 value: 16.90786453167799 - type: nauc_mrr_at_1000_max value: 0.65578323377857 - type: nauc_mrr_at_1000_std value: -12.395929715413015 - type: nauc_mrr_at_100_diff1 value: 16.90781127619206 - type: nauc_mrr_at_100_max value: 0.6619900297824423 - type: nauc_mrr_at_100_std value: -12.394826789608906 - type: nauc_mrr_at_10_diff1 value: 16.785894192163838 - type: nauc_mrr_at_10_max value: 0.7096666849274212 - type: nauc_mrr_at_10_std value: -12.592883550594735 - type: nauc_mrr_at_1_diff1 value: 19.59282927806732 - type: nauc_mrr_at_1_max value: -1.1271716729359413 - type: nauc_mrr_at_1_std value: -11.710668880297517 - type: nauc_mrr_at_20_diff1 value: 16.86673477981559 - type: nauc_mrr_at_20_max value: 0.6897167399764257 - type: nauc_mrr_at_20_std value: -12.464631471378414 - type: nauc_mrr_at_3_diff1 value: 17.0481261621288 - type: nauc_mrr_at_3_max value: 0.7183007174016199 - type: nauc_mrr_at_3_std value: -12.329335728574527 - type: nauc_mrr_at_5_diff1 value: 16.698916629443854 - type: nauc_mrr_at_5_max value: 1.2515514207224299 - type: nauc_mrr_at_5_std value: -11.662599392805308 - type: nauc_ndcg_at_1000_diff1 value: 19.30605856078901 - type: nauc_ndcg_at_1000_max value: 2.3402231520806835 - type: nauc_ndcg_at_1000_std value: -12.370409989770332 - type: nauc_ndcg_at_100_diff1 value: 19.31155460872256 - type: nauc_ndcg_at_100_max value: 2.510633162779702 - type: nauc_ndcg_at_100_std value: -12.313796276064673 - type: nauc_ndcg_at_10_diff1 value: 18.511651466450843 - type: nauc_ndcg_at_10_max value: 2.6756675185155263 - type: nauc_ndcg_at_10_std value: -13.573610085360095 - type: nauc_ndcg_at_1_diff1 value: 22.704520355653777 - type: nauc_ndcg_at_1_max value: -0.7340073588952427 - type: nauc_ndcg_at_1_std value: -11.685082615631233 - type: nauc_ndcg_at_20_diff1 value: 19.01305812933961 - type: nauc_ndcg_at_20_max value: 2.777977280012548 - type: 
nauc_ndcg_at_20_std value: -12.959515013552128 - type: nauc_ndcg_at_3_diff1 value: 19.15053976740578 - type: nauc_ndcg_at_3_max value: 3.2587972262385496 - type: nauc_ndcg_at_3_std value: -12.105808757691328 - type: nauc_ndcg_at_5_diff1 value: 18.010082675090597 - type: nauc_ndcg_at_5_max value: 3.753876824229378 - type: nauc_ndcg_at_5_std value: -11.044202434548701 - type: nauc_precision_at_1000_diff1 value: -11.75783343822487 - type: nauc_precision_at_1000_max value: 5.7856460776313465 - type: nauc_precision_at_1000_std value: 62.79171280927037 - type: nauc_precision_at_100_diff1 value: 9.08527555500537 - type: nauc_precision_at_100_max value: 36.16754653078746 - type: nauc_precision_at_100_std value: 28.37969482833522 - type: nauc_precision_at_10_diff1 value: 10.685081888632977 - type: nauc_precision_at_10_max value: 7.185779514361452 - type: nauc_precision_at_10_std value: -22.209758078034394 - type: nauc_precision_at_1_diff1 value: 22.704520355653777 - type: nauc_precision_at_1_max value: -0.7340073588952427 - type: nauc_precision_at_1_std value: -11.685082615631233 - type: nauc_precision_at_20_diff1 value: 10.0745772945806 - type: nauc_precision_at_20_max value: 16.81469938479116 - type: nauc_precision_at_20_std value: -22.804277740935298 - type: nauc_precision_at_3_diff1 value: 16.900587067301714 - type: nauc_precision_at_3_max value: 6.595958907337978 - type: nauc_precision_at_3_std value: -11.888316132805594 - type: nauc_precision_at_5_diff1 value: 12.771428972972895 - type: nauc_precision_at_5_max value: 8.79201485711544 - type: nauc_precision_at_5_std value: -8.609881800940762 - type: nauc_recall_at_1000_diff1 value: -11.757833438225305 - type: nauc_recall_at_1000_max value: 5.785646077628613 - type: nauc_recall_at_1000_std value: 62.791712809264176 - type: nauc_recall_at_100_diff1 value: 9.085275555005722 - type: nauc_recall_at_100_max value: 36.167546530787995 - type: nauc_recall_at_100_std value: 28.37969482833511 - type: nauc_recall_at_10_diff1 value: 10.68508188863288 - type: nauc_recall_at_10_max value: 7.185779514361484 - type: nauc_recall_at_10_std value: -22.209758078034465 - type: nauc_recall_at_1_diff1 value: 22.704520355653777 - type: nauc_recall_at_1_max value: -0.7340073588952427 - type: nauc_recall_at_1_std value: -11.685082615631233 - type: nauc_recall_at_20_diff1 value: 10.074577294581067 - type: nauc_recall_at_20_max value: 16.814699384791545 - type: nauc_recall_at_20_std value: -22.80427774093497 - type: nauc_recall_at_3_diff1 value: 16.900587067301768 - type: nauc_recall_at_3_max value: 6.595958907337955 - type: nauc_recall_at_3_std value: -11.888316132805613 - type: nauc_recall_at_5_diff1 value: 12.77142897297289 - type: nauc_recall_at_5_max value: 8.792014857115413 - type: nauc_recall_at_5_std value: -8.609881800940697 - type: ndcg_at_1 value: 35.135 - type: ndcg_at_10 value: 59.71300000000001 - type: ndcg_at_100 value: 62.5 - type: ndcg_at_1000 value: 62.578 - type: ndcg_at_20 value: 61.775000000000006 - type: ndcg_at_3 value: 50.336999999999996 - type: ndcg_at_5 value: 54.748 - type: precision_at_1 value: 35.135 - type: precision_at_10 value: 8.72 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.765 - type: precision_at_3 value: 20.413 - type: precision_at_5 value: 14.381 - type: recall_at_1 value: 35.135 - type: recall_at_10 value: 87.198 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.644 - type: recall_at_20 value: 95.306 - type: recall_at_3 value: 61.23800000000001 - type: 
recall_at_5 value: 71.906 task: type: Retrieval - dataset: config: default name: MTEB CBD revision: 36ddb419bcffe6a5374c3891957912892916f28d split: test type: PL-MTEB/cbd metrics: - type: accuracy value: 84.13000000000001 - type: ap value: 38.21674564144456 - type: ap_weighted value: 38.21674564144456 - type: f1 value: 73.58128735002478 - type: f1_weighted value: 85.75596717538494 - type: main_score value: 84.13000000000001 task: type: Classification - dataset: config: default name: MTEB CDSC-E revision: 0a3d4aa409b22f80eb22cbf59b492637637b536d split: test type: PL-MTEB/cdsce-pairclassification metrics: - type: cosine_accuracy value: 89.0 - type: cosine_accuracy_threshold value: 95.30268088769837 - type: cosine_ap value: 78.23422403821777 - type: cosine_f1 value: 69.23076923076923 - type: cosine_f1_threshold value: 87.1877340095262 - type: cosine_precision value: 67.5 - type: cosine_recall value: 71.05263157894737 - type: dot_accuracy value: 88.3 - type: dot_accuracy_threshold value: 2472000.0 - type: dot_ap value: 74.26705897704197 - type: dot_f1 value: 66.49874055415617 - type: dot_f1_threshold value: 2316800.0 - type: dot_precision value: 63.76811594202898 - type: dot_recall value: 69.47368421052632 - type: euclidean_accuracy value: 89.2 - type: euclidean_accuracy_threshold value: 6878.705188647788 - type: euclidean_ap value: 78.51718555534579 - type: euclidean_f1 value: 69.54314720812182 - type: euclidean_f1_threshold value: 8323.035838252725 - type: euclidean_precision value: 67.15686274509804 - type: euclidean_recall value: 72.10526315789474 - type: main_score value: 78.51718555534579 - type: manhattan_accuracy value: 89.2 - type: manhattan_accuracy_threshold value: 326812.48528957367 - type: manhattan_ap value: 78.50895632545628 - type: manhattan_f1 value: 69.84924623115577 - type: manhattan_f1_threshold value: 398102.616417408 - type: manhattan_precision value: 66.82692307692307 - type: manhattan_recall value: 73.15789473684211 - type: max_ap value: 78.51718555534579 - type: max_f1 value: 69.84924623115577 - type: max_precision value: 67.5 - type: max_recall value: 73.15789473684211 - type: similarity_accuracy value: 89.0 - type: similarity_accuracy_threshold value: 95.30268088769837 - type: similarity_ap value: 78.23422403821777 - type: similarity_f1 value: 69.23076923076923 - type: similarity_f1_threshold value: 87.1877340095262 - type: similarity_precision value: 67.5 - type: similarity_recall value: 71.05263157894737 task: type: PairClassification - dataset: config: default name: MTEB CDSC-R revision: 1cd6abbb00df7d14be3dbd76a7dcc64b3a79a7cd split: test type: PL-MTEB/cdscr-sts metrics: - type: cosine_pearson value: 91.04238667979497 - type: cosine_spearman value: 90.96758456402505 - type: euclidean_pearson value: 88.88396869759062 - type: euclidean_spearman value: 90.80235709678217 - type: main_score value: 90.96758456402505 - type: manhattan_pearson value: 88.91331977492183 - type: manhattan_spearman value: 90.82823486754444 - type: pearson value: 91.04238667979497 - type: spearman value: 90.96758456402505 task: type: STS - dataset: config: default name: MTEB DBPedia-PL revision: 76afe41d9af165cc40999fcaa92312b8b012064a split: test type: clarin-knext/dbpedia-pl metrics: - type: main_score value: 43.189 - type: map_at_1 value: 8.838 - type: map_at_10 value: 20.335 - type: map_at_100 value: 29.818 - type: map_at_1000 value: 31.672 - type: map_at_20 value: 24.037 - type: map_at_3 value: 14.144000000000002 - type: map_at_5 value: 16.674 - type: mrr_at_1 value: 66.25 - type: 
mrr_at_10 value: 74.51428571428573 - type: mrr_at_100 value: 74.85025528596333 - type: mrr_at_1000 value: 74.861579760375 - type: mrr_at_20 value: 74.75227906231197 - type: mrr_at_3 value: 73.25 - type: mrr_at_5 value: 73.825 - type: nauc_map_at_1000_diff1 value: 25.397956304548963 - type: nauc_map_at_1000_max value: 34.60045634629073 - type: nauc_map_at_1000_std value: 25.484338507029523 - type: nauc_map_at_100_diff1 value: 26.732402811074362 - type: nauc_map_at_100_max value: 33.16273154550298 - type: nauc_map_at_100_std value: 22.705558316419694 - type: nauc_map_at_10_diff1 value: 31.048350740517666 - type: nauc_map_at_10_max value: 20.58247280790142 - type: nauc_map_at_10_std value: -0.3057740988996755 - type: nauc_map_at_1_diff1 value: 37.44384898753489 - type: nauc_map_at_1_max value: 2.009066872007797 - type: nauc_map_at_1_std value: -18.38972044447374 - type: nauc_map_at_20_diff1 value: 29.145950023489974 - type: nauc_map_at_20_max value: 25.337239700245075 - type: nauc_map_at_20_std value: 7.680343084384305 - type: nauc_map_at_3_diff1 value: 32.41886776815376 - type: nauc_map_at_3_max value: 8.976460728750666 - type: nauc_map_at_3_std value: -14.206927116348458 - type: nauc_map_at_5_diff1 value: 31.316919153957873 - type: nauc_map_at_5_max value: 14.015365438005226 - type: nauc_map_at_5_std value: -8.909007562143335 - type: nauc_mrr_at_1000_diff1 value: 42.77521158292109 - type: nauc_mrr_at_1000_max value: 58.03733674934908 - type: nauc_mrr_at_1000_std value: 42.65118460573791 - type: nauc_mrr_at_100_diff1 value: 42.76917109803571 - type: nauc_mrr_at_100_max value: 58.04747433083853 - type: nauc_mrr_at_100_std value: 42.65151388365855 - type: nauc_mrr_at_10_diff1 value: 42.4992726119988 - type: nauc_mrr_at_10_max value: 58.157080658302974 - type: nauc_mrr_at_10_std value: 42.98778606676595 - type: nauc_mrr_at_1_diff1 value: 46.67764597969527 - type: nauc_mrr_at_1_max value: 54.52896662427813 - type: nauc_mrr_at_1_std value: 35.71181387979735 - type: nauc_mrr_at_20_diff1 value: 42.79101300218034 - type: nauc_mrr_at_20_max value: 58.05679669975563 - type: nauc_mrr_at_20_std value: 42.72288886007032 - type: nauc_mrr_at_3_diff1 value: 41.85440967628899 - type: nauc_mrr_at_3_max value: 57.975577899726126 - type: nauc_mrr_at_3_std value: 43.523432037784985 - type: nauc_mrr_at_5_diff1 value: 42.3041465494315 - type: nauc_mrr_at_5_max value: 58.54530113479029 - type: nauc_mrr_at_5_std value: 43.2944834223015 - type: nauc_ndcg_at_1000_diff1 value: 32.16216922989725 - type: nauc_ndcg_at_1000_max value: 50.03467332768009 - type: nauc_ndcg_at_1000_std value: 42.87877265207483 - type: nauc_ndcg_at_100_diff1 value: 33.55193527551313 - type: nauc_ndcg_at_100_max value: 45.12048953873363 - type: nauc_ndcg_at_100_std value: 34.788021436199024 - type: nauc_ndcg_at_10_diff1 value: 31.14168233882658 - type: nauc_ndcg_at_10_max value: 45.31079148382448 - type: nauc_ndcg_at_10_std value: 28.555214349385466 - type: nauc_ndcg_at_1_diff1 value: 45.12481069889602 - type: nauc_ndcg_at_1_max value: 45.93377570654117 - type: nauc_ndcg_at_1_std value: 26.672617000885186 - type: nauc_ndcg_at_20_diff1 value: 31.81216979830056 - type: nauc_ndcg_at_20_max value: 41.93464767693644 - type: nauc_ndcg_at_20_std value: 26.08707327004535 - type: nauc_ndcg_at_3_diff1 value: 29.90627202771331 - type: nauc_ndcg_at_3_max value: 46.50414958925517 - type: nauc_ndcg_at_3_std value: 29.66009841753563 - type: nauc_ndcg_at_5_diff1 value: 29.08122779713697 - type: nauc_ndcg_at_5_max value: 46.81499760516951 - type: 
nauc_ndcg_at_5_std value: 29.935930977468267 - type: nauc_precision_at_1000_diff1 value: -18.71150014402453 - type: nauc_precision_at_1000_max value: -0.9220395765472844 - type: nauc_precision_at_1000_std value: 7.219897945975822 - type: nauc_precision_at_100_diff1 value: -8.609528664023014 - type: nauc_precision_at_100_max value: 29.147048677242864 - type: nauc_precision_at_100_std value: 44.958041507680036 - type: nauc_precision_at_10_diff1 value: 2.8689201908213477 - type: nauc_precision_at_10_max value: 44.40893361361308 - type: nauc_precision_at_10_std value: 47.18569807586499 - type: nauc_precision_at_1_diff1 value: 46.01228536231763 - type: nauc_precision_at_1_max value: 54.30280987857099 - type: nauc_precision_at_1_std value: 36.923128493492776 - type: nauc_precision_at_20_diff1 value: -1.9783515948740122 - type: nauc_precision_at_20_max value: 38.42066921295958 - type: nauc_precision_at_20_std value: 47.41935674153161 - type: nauc_precision_at_3_diff1 value: 9.877584475384026 - type: nauc_precision_at_3_max value: 44.77006526403546 - type: nauc_precision_at_3_std value: 39.51299545977156 - type: nauc_precision_at_5_diff1 value: 5.096217475317008 - type: nauc_precision_at_5_max value: 45.66716959157208 - type: nauc_precision_at_5_std value: 42.651208343259505 - type: nauc_recall_at_1000_diff1 value: 25.395292649442965 - type: nauc_recall_at_1000_max value: 44.94193476114992 - type: nauc_recall_at_1000_std value: 53.58345238223027 - type: nauc_recall_at_100_diff1 value: 23.962022146293293 - type: nauc_recall_at_100_max value: 32.15140842028602 - type: nauc_recall_at_100_std value: 30.57126984952762 - type: nauc_recall_at_10_diff1 value: 28.120539807446004 - type: nauc_recall_at_10_max value: 18.154834280193572 - type: nauc_recall_at_10_std value: -0.6032386653260938 - type: nauc_recall_at_1_diff1 value: 37.44384898753489 - type: nauc_recall_at_1_max value: 2.009066872007797 - type: nauc_recall_at_1_std value: -18.38972044447374 - type: nauc_recall_at_20_diff1 value: 23.438945970294554 - type: nauc_recall_at_20_max value: 17.201259624644326 - type: nauc_recall_at_20_std value: 3.75587033487961 - type: nauc_recall_at_3_diff1 value: 29.867460507200587 - type: nauc_recall_at_3_max value: 8.066960542463528 - type: nauc_recall_at_3_std value: -15.13440571172203 - type: nauc_recall_at_5_diff1 value: 28.657118879661887 - type: nauc_recall_at_5_max value: 12.942552735963842 - type: nauc_recall_at_5_std value: -9.57735672972808 - type: ndcg_at_1 value: 54.50000000000001 - type: ndcg_at_10 value: 43.189 - type: ndcg_at_100 value: 48.595 - type: ndcg_at_1000 value: 55.681000000000004 - type: ndcg_at_20 value: 43.09 - type: ndcg_at_3 value: 47.599000000000004 - type: ndcg_at_5 value: 44.907000000000004 - type: precision_at_1 value: 66.5 - type: precision_at_10 value: 35.725 - type: precision_at_100 value: 11.583 - type: precision_at_1000 value: 2.302 - type: precision_at_20 value: 27.375 - type: precision_at_3 value: 52.0 - type: precision_at_5 value: 44.7 - type: recall_at_1 value: 8.838 - type: recall_at_10 value: 25.424999999999997 - type: recall_at_100 value: 55.632000000000005 - type: recall_at_1000 value: 77.857 - type: recall_at_20 value: 34.458 - type: recall_at_3 value: 15.229999999999999 - type: recall_at_5 value: 18.872 task: type: Retrieval - dataset: config: default name: MTEB 8TagsClustering revision: None split: test type: PL-MTEB/8tags-clustering metrics: - type: main_score value: 50.28804848851286 - type: v_measure value: 50.28804848851286 - type: v_measure_std value: 
2.9879120747919505 task: type: Clustering - dataset: config: default name: MTEB FiQA-PL revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e split: test type: clarin-knext/fiqa-pl metrics: - type: main_score value: 46.121 - type: map_at_1 value: 24.027 - type: map_at_10 value: 38.14 - type: map_at_100 value: 40.092 - type: map_at_1000 value: 40.266000000000005 - type: map_at_20 value: 39.195 - type: map_at_3 value: 33.415 - type: map_at_5 value: 36.115 - type: mrr_at_1 value: 46.60493827160494 - type: mrr_at_10 value: 54.70305457573974 - type: mrr_at_100 value: 55.355642920233414 - type: mrr_at_1000 value: 55.3908291424442 - type: mrr_at_20 value: 55.00793641725012 - type: mrr_at_3 value: 52.3148148148148 - type: mrr_at_5 value: 53.54166666666664 - type: nauc_map_at_1000_diff1 value: 37.73510043188139 - type: nauc_map_at_1000_max value: 28.32920495001755 - type: nauc_map_at_1000_std value: 2.1388839190211293 - type: nauc_map_at_100_diff1 value: 37.670108404247685 - type: nauc_map_at_100_max value: 28.227406812543826 - type: nauc_map_at_100_std value: 2.120931632442644 - type: nauc_map_at_10_diff1 value: 37.465256098544174 - type: nauc_map_at_10_max value: 27.091226456549666 - type: nauc_map_at_10_std value: 1.1173775566235409 - type: nauc_map_at_1_diff1 value: 41.23855326212752 - type: nauc_map_at_1_max value: 21.290748552864557 - type: nauc_map_at_1_std value: -0.8385928448565472 - type: nauc_map_at_20_diff1 value: 37.47054494805535 - type: nauc_map_at_20_max value: 27.729045702955386 - type: nauc_map_at_20_std value: 1.7216485460777051 - type: nauc_map_at_3_diff1 value: 37.262641031829105 - type: nauc_map_at_3_max value: 23.89124216989901 - type: nauc_map_at_3_std value: -0.14736489529369678 - type: nauc_map_at_5_diff1 value: 37.054030521972926 - type: nauc_map_at_5_max value: 25.37485175729055 - type: nauc_map_at_5_std value: 0.1603899014557275 - type: nauc_mrr_at_1000_diff1 value: 45.74249029214392 - type: nauc_mrr_at_1000_max value: 36.07619933100338 - type: nauc_mrr_at_1000_std value: 4.393752835100674 - type: nauc_mrr_at_100_diff1 value: 45.72338919745602 - type: nauc_mrr_at_100_max value: 36.07500193737586 - type: nauc_mrr_at_100_std value: 4.415904610787372 - type: nauc_mrr_at_10_diff1 value: 45.712821401955814 - type: nauc_mrr_at_10_max value: 36.077633940467855 - type: nauc_mrr_at_10_std value: 4.31515612100577 - type: nauc_mrr_at_1_diff1 value: 48.95197646135339 - type: nauc_mrr_at_1_max value: 37.627960253727124 - type: nauc_mrr_at_1_std value: 4.355410396712492 - type: nauc_mrr_at_20_diff1 value: 45.657031672968316 - type: nauc_mrr_at_20_max value: 36.02034080808377 - type: nauc_mrr_at_20_std value: 4.291569107759258 - type: nauc_mrr_at_3_diff1 value: 46.14016248486381 - type: nauc_mrr_at_3_max value: 35.096997959937816 - type: nauc_mrr_at_3_std value: 3.473234729162835 - type: nauc_mrr_at_5_diff1 value: 46.044456362138746 - type: nauc_mrr_at_5_max value: 35.54259698630834 - type: nauc_mrr_at_5_std value: 3.242035621890524 - type: nauc_ndcg_at_1000_diff1 value: 39.37342092420808 - type: nauc_ndcg_at_1000_max value: 32.34854163612446 - type: nauc_ndcg_at_1000_std value: 4.9764682793258865 - type: nauc_ndcg_at_100_diff1 value: 38.396532780365966 - type: nauc_ndcg_at_100_max value: 31.427345966345072 - type: nauc_ndcg_at_100_std value: 5.436384757156155 - type: nauc_ndcg_at_10_diff1 value: 38.33852883060773 - type: nauc_ndcg_at_10_max value: 29.405844267873825 - type: nauc_ndcg_at_10_std value: 2.9724473995284453 - type: nauc_ndcg_at_1_diff1 value: 49.360894087944914 - type: 
nauc_ndcg_at_1_max value: 37.10711812240423 - type: nauc_ndcg_at_1_std value: 3.8523559329866988 - type: nauc_ndcg_at_20_diff1 value: 38.050204646363945 - type: nauc_ndcg_at_20_max value: 29.935603389108866 - type: nauc_ndcg_at_20_std value: 3.779925764680313 - type: nauc_ndcg_at_3_diff1 value: 39.4668764835337 - type: nauc_ndcg_at_3_max value: 30.65976708125836 - type: nauc_ndcg_at_3_std value: 1.2337033504877237 - type: nauc_ndcg_at_5_diff1 value: 38.86503445443355 - type: nauc_ndcg_at_5_max value: 29.0023578220992 - type: nauc_ndcg_at_5_std value: 0.8206100069462643 - type: nauc_precision_at_1000_diff1 value: 5.84775168273073 - type: nauc_precision_at_1000_max value: 27.58660371315182 - type: nauc_precision_at_1000_std value: 9.028324162807364 - type: nauc_precision_at_100_diff1 value: 10.655637431827838 - type: nauc_precision_at_100_max value: 32.11889757111383 - type: nauc_precision_at_100_std value: 13.051376462007925 - type: nauc_precision_at_10_diff1 value: 20.55227291550576 - type: nauc_precision_at_10_max value: 34.48969436232284 - type: nauc_precision_at_10_std value: 7.57890876950882 - type: nauc_precision_at_1_diff1 value: 49.360894087944914 - type: nauc_precision_at_1_max value: 37.10711812240423 - type: nauc_precision_at_1_std value: 3.8523559329866988 - type: nauc_precision_at_20_diff1 value: 16.62880025315897 - type: nauc_precision_at_20_max value: 34.15703662717139 - type: nauc_precision_at_20_std value: 10.909431920732883 - type: nauc_precision_at_3_diff1 value: 28.04332082306772 - type: nauc_precision_at_3_max value: 31.009374202971753 - type: nauc_precision_at_3_std value: 2.307756409916575 - type: nauc_precision_at_5_diff1 value: 24.824270715808705 - type: nauc_precision_at_5_max value: 31.644036540931886 - type: nauc_precision_at_5_std value: 2.958068954639614 - type: nauc_recall_at_1000_diff1 value: 23.79234063489045 - type: nauc_recall_at_1000_max value: 26.76365425679858 - type: nauc_recall_at_1000_std value: 23.815318997671913 - type: nauc_recall_at_100_diff1 value: 22.399781833514737 - type: nauc_recall_at_100_max value: 23.192360958839174 - type: nauc_recall_at_100_std value: 15.984687692762742 - type: nauc_recall_at_10_diff1 value: 28.512649044683837 - type: nauc_recall_at_10_max value: 22.77819651497193 - type: nauc_recall_at_10_std value: 4.646633382718951 - type: nauc_recall_at_1_diff1 value: 41.23855326212752 - type: nauc_recall_at_1_max value: 21.290748552864557 - type: nauc_recall_at_1_std value: -0.8385928448565472 - type: nauc_recall_at_20_diff1 value: 26.797853661700632 - type: nauc_recall_at_20_max value: 21.9956231017133 - type: nauc_recall_at_20_std value: 5.664775183514371 - type: nauc_recall_at_3_diff1 value: 31.42511076281081 - type: nauc_recall_at_3_max value: 19.459398184547652 - type: nauc_recall_at_3_std value: -0.8592886454260257 - type: nauc_recall_at_5_diff1 value: 29.62950699804912 - type: nauc_recall_at_5_max value: 19.941323519486684 - type: nauc_recall_at_5_std value: -0.45387351120880465 - type: ndcg_at_1 value: 46.451 - type: ndcg_at_10 value: 46.121 - type: ndcg_at_100 value: 52.830999999999996 - type: ndcg_at_1000 value: 55.557 - type: ndcg_at_20 value: 48.535000000000004 - type: ndcg_at_3 value: 42.178 - type: ndcg_at_5 value: 43.406 - type: precision_at_1 value: 46.451 - type: precision_at_10 value: 12.562000000000001 - type: precision_at_100 value: 1.963 - type: precision_at_1000 value: 0.244 - type: precision_at_20 value: 7.392 - type: precision_at_3 value: 27.572000000000003 - type: precision_at_5 value: 20.031 - type: 
recall_at_1 value: 24.027 - type: recall_at_10 value: 52.61900000000001 - type: recall_at_100 value: 77.491 - type: recall_at_1000 value: 93.55 - type: recall_at_20 value: 59.745000000000005 - type: recall_at_3 value: 37.765 - type: recall_at_5 value: 44.304 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA-PL revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 split: test type: clarin-knext/hotpotqa-pl metrics: - type: main_score value: 77.02799999999999 - type: map_at_1 value: 41.249 - type: map_at_10 value: 69.512 - type: map_at_100 value: 70.291 - type: map_at_1000 value: 70.334 - type: map_at_20 value: 69.992 - type: map_at_3 value: 65.751 - type: map_at_5 value: 68.161 - type: mrr_at_1 value: 82.4983119513842 - type: mrr_at_10 value: 87.71202426502866 - type: mrr_at_100 value: 87.84265780907221 - type: mrr_at_1000 value: 87.8455843626266 - type: mrr_at_20 value: 87.80640011547308 - type: mrr_at_3 value: 86.94575737114536 - type: mrr_at_5 value: 87.46770200315063 - type: nauc_map_at_1000_diff1 value: 17.17119899625707 - type: nauc_map_at_1000_max value: 29.981569339485393 - type: nauc_map_at_1000_std value: 8.93659568948167 - type: nauc_map_at_100_diff1 value: 17.156175947340035 - type: nauc_map_at_100_max value: 29.988121004348194 - type: nauc_map_at_100_std value: 8.967947232110745 - type: nauc_map_at_10_diff1 value: 16.854416108818132 - type: nauc_map_at_10_max value: 29.784211249360194 - type: nauc_map_at_10_std value: 8.535227936720936 - type: nauc_map_at_1_diff1 value: 68.01294545515707 - type: nauc_map_at_1_max value: 47.51019900345037 - type: nauc_map_at_1_std value: -1.7951406243808212 - type: nauc_map_at_20_diff1 value: 16.993955459776572 - type: nauc_map_at_20_max value: 29.920806300647463 - type: nauc_map_at_20_std value: 8.873597327714583 - type: nauc_map_at_3_diff1 value: 16.16514623575243 - type: nauc_map_at_3_max value: 27.62371849413713 - type: nauc_map_at_3_std value: 5.131406130565191 - type: nauc_map_at_5_diff1 value: 16.507863832657364 - type: nauc_map_at_5_max value: 28.9019090072195 - type: nauc_map_at_5_std value: 7.2380930617814645 - type: nauc_mrr_at_1000_diff1 value: 66.74502991743417 - type: nauc_mrr_at_1000_max value: 50.29274140603486 - type: nauc_mrr_at_1000_std value: 1.602388931386098 - type: nauc_mrr_at_100_diff1 value: 66.7413605208101 - type: nauc_mrr_at_100_max value: 50.29720043419606 - type: nauc_mrr_at_100_std value: 1.612142495535232 - type: nauc_mrr_at_10_diff1 value: 66.71814591414376 - type: nauc_mrr_at_10_max value: 50.39851050116519 - type: nauc_mrr_at_10_std value: 1.7339878916186384 - type: nauc_mrr_at_1_diff1 value: 68.01294545515707 - type: nauc_mrr_at_1_max value: 47.627701029006225 - type: nauc_mrr_at_1_std value: -1.442043059079073 - type: nauc_mrr_at_20_diff1 value: 66.72944815863312 - type: nauc_mrr_at_20_max value: 50.325719646409716 - type: nauc_mrr_at_20_std value: 1.6584317196476688 - type: nauc_mrr_at_3_diff1 value: 66.29662294615758 - type: nauc_mrr_at_3_max value: 50.29363488669571 - type: nauc_mrr_at_3_std value: 1.1373012069481296 - type: nauc_mrr_at_5_diff1 value: 66.70959181668684 - type: nauc_mrr_at_5_max value: 50.42831108375743 - type: nauc_mrr_at_5_std value: 1.5492429855609648 - type: nauc_ndcg_at_1000_diff1 value: 24.337157353044912 - type: nauc_ndcg_at_1000_max value: 35.021784629126984 - type: nauc_ndcg_at_1000_std value: 11.976738067383161 - type: nauc_ndcg_at_100_diff1 value: 23.584427352691776 - type: nauc_ndcg_at_100_max value: 35.12304754035805 - type: nauc_ndcg_at_100_std value: 
12.921291623167921 - type: nauc_ndcg_at_10_diff1 value: 22.057127915032765 - type: nauc_ndcg_at_10_max value: 34.09397142140321 - type: nauc_ndcg_at_10_std value: 11.21339882108658 - type: nauc_ndcg_at_1_diff1 value: 68.01294545515707 - type: nauc_ndcg_at_1_max value: 47.51019900345037 - type: nauc_ndcg_at_1_std value: -1.7951406243808212 - type: nauc_ndcg_at_20_diff1 value: 22.404347553479102 - type: nauc_ndcg_at_20_max value: 34.50508324969608 - type: nauc_ndcg_at_20_std value: 12.281993331498175 - type: nauc_ndcg_at_3_diff1 value: 21.21895220595676 - type: nauc_ndcg_at_3_max value: 30.76465236403928 - type: nauc_ndcg_at_3_std value: 5.501903724385424 - type: nauc_ndcg_at_5_diff1 value: 21.489825424548258 - type: nauc_ndcg_at_5_max value: 32.43517409935615 - type: nauc_ndcg_at_5_std value: 8.59021290966302 - type: nauc_precision_at_1000_diff1 value: 9.056916578488696 - type: nauc_precision_at_1000_max value: 47.29861770129213 - type: nauc_precision_at_1000_std value: 60.06028316961357 - type: nauc_precision_at_100_diff1 value: 6.853208191063939 - type: nauc_precision_at_100_max value: 40.23686318254916 - type: nauc_precision_at_100_std value: 44.69884156134862 - type: nauc_precision_at_10_diff1 value: 7.7572606953149315 - type: nauc_precision_at_10_max value: 33.24412509121427 - type: nauc_precision_at_10_std value: 22.894891705425753 - type: nauc_precision_at_1_diff1 value: 68.01294545515707 - type: nauc_precision_at_1_max value: 47.51019900345037 - type: nauc_precision_at_1_std value: -1.7951406243808212 - type: nauc_precision_at_20_diff1 value: 6.102789021481188 - type: nauc_precision_at_20_max value: 34.384739158981084 - type: nauc_precision_at_20_std value: 29.40165302735249 - type: nauc_precision_at_3_diff1 value: 10.004182813463276 - type: nauc_precision_at_3_max value: 27.07527926636925 - type: nauc_precision_at_3_std value: 8.034252288165805 - type: nauc_precision_at_5_diff1 value: 8.672082689816547 - type: nauc_precision_at_5_max value: 29.352582129843867 - type: nauc_precision_at_5_std value: 14.456464951944461 - type: nauc_recall_at_1000_diff1 value: 9.056916578488018 - type: nauc_recall_at_1000_max value: 47.29861770129215 - type: nauc_recall_at_1000_std value: 60.06028316961315 - type: nauc_recall_at_100_diff1 value: 6.853208191063934 - type: nauc_recall_at_100_max value: 40.23686318254888 - type: nauc_recall_at_100_std value: 44.698841561348615 - type: nauc_recall_at_10_diff1 value: 7.7572606953149394 - type: nauc_recall_at_10_max value: 33.244125091214286 - type: nauc_recall_at_10_std value: 22.894891705425863 - type: nauc_recall_at_1_diff1 value: 68.01294545515707 - type: nauc_recall_at_1_max value: 47.51019900345037 - type: nauc_recall_at_1_std value: -1.7951406243808212 - type: nauc_recall_at_20_diff1 value: 6.102789021481126 - type: nauc_recall_at_20_max value: 34.38473915898118 - type: nauc_recall_at_20_std value: 29.40165302735251 - type: nauc_recall_at_3_diff1 value: 10.004182813463203 - type: nauc_recall_at_3_max value: 27.07527926636916 - type: nauc_recall_at_3_std value: 8.034252288165728 - type: nauc_recall_at_5_diff1 value: 8.672082689816364 - type: nauc_recall_at_5_max value: 29.352582129843714 - type: nauc_recall_at_5_std value: 14.4564649519445 - type: ndcg_at_1 value: 82.498 - type: ndcg_at_10 value: 77.02799999999999 - type: ndcg_at_100 value: 79.593 - type: ndcg_at_1000 value: 80.372 - type: ndcg_at_20 value: 78.194 - type: ndcg_at_3 value: 71.932 - type: ndcg_at_5 value: 74.878 - type: precision_at_1 value: 82.498 - type: precision_at_10 value: 16.289 - 
type: precision_at_100 value: 1.8259999999999998 - type: precision_at_1000 value: 0.193 - type: precision_at_20 value: 8.519 - type: precision_at_3 value: 46.851 - type: precision_at_5 value: 30.436000000000003 - type: recall_at_1 value: 41.249 - type: recall_at_10 value: 81.44500000000001 - type: recall_at_100 value: 91.323 - type: recall_at_1000 value: 96.44200000000001 - type: recall_at_20 value: 85.18599999999999 - type: recall_at_3 value: 70.277 - type: recall_at_5 value: 76.09 task: type: Retrieval - dataset: config: default name: MTEB MSMARCO-PL revision: 8634c07806d5cce3a6138e260e59b81760a0a640 split: test type: clarin-knext/msmarco-pl metrics: - type: main_score value: 72.695 - type: map_at_1 value: 2.313 - type: map_at_10 value: 16.541 - type: map_at_100 value: 42.664 - type: map_at_1000 value: 51.048 - type: map_at_20 value: 25.691000000000003 - type: map_at_3 value: 6.8580000000000005 - type: map_at_5 value: 10.227 - type: mrr_at_1 value: 90.69767441860465 - type: mrr_at_10 value: 94.65116279069768 - type: mrr_at_100 value: 94.65116279069768 - type: mrr_at_1000 value: 94.65116279069768 - type: mrr_at_20 value: 94.65116279069768 - type: mrr_at_3 value: 94.18604651162791 - type: mrr_at_5 value: 94.65116279069768 - type: nauc_map_at_1000_diff1 value: -19.394271777832838 - type: nauc_map_at_1000_max value: 35.63073356621754 - type: nauc_map_at_1000_std value: 56.92803671553409 - type: nauc_map_at_100_diff1 value: -7.023340458676494 - type: nauc_map_at_100_max value: 22.967662469404267 - type: nauc_map_at_100_std value: 28.64423344417142 - type: nauc_map_at_10_diff1 value: 18.22452762970126 - type: nauc_map_at_10_max value: 3.235969423980127 - type: nauc_map_at_10_std value: -11.528499499305529 - type: nauc_map_at_1_diff1 value: 17.90743559505749 - type: nauc_map_at_1_max value: -14.61627654448527 - type: nauc_map_at_1_std value: -24.262430292012667 - type: nauc_map_at_20_diff1 value: 14.96422992084746 - type: nauc_map_at_20_max value: 11.128128185086132 - type: nauc_map_at_20_std value: -0.4087236026844547 - type: nauc_map_at_3_diff1 value: 16.45733174189393 - type: nauc_map_at_3_max value: -14.88196784500194 - type: nauc_map_at_3_std value: -26.096323520383446 - type: nauc_map_at_5_diff1 value: 17.572159494245003 - type: nauc_map_at_5_max value: -11.206812710229503 - type: nauc_map_at_5_std value: -22.27070819579704 - type: nauc_mrr_at_1000_diff1 value: 33.66069097978205 - type: nauc_mrr_at_1000_max value: 43.87773602456895 - type: nauc_mrr_at_1000_std value: 52.33730714398662 - type: nauc_mrr_at_100_diff1 value: 33.66069097978205 - type: nauc_mrr_at_100_max value: 43.87773602456895 - type: nauc_mrr_at_100_std value: 52.33730714398662 - type: nauc_mrr_at_10_diff1 value: 33.66069097978205 - type: nauc_mrr_at_10_max value: 43.87773602456895 - type: nauc_mrr_at_10_std value: 52.33730714398662 - type: nauc_mrr_at_1_diff1 value: 23.709794626749783 - type: nauc_mrr_at_1_max value: 35.45939642825464 - type: nauc_mrr_at_1_std value: 45.18790321558505 - type: nauc_mrr_at_20_diff1 value: 33.66069097978205 - type: nauc_mrr_at_20_max value: 43.87773602456895 - type: nauc_mrr_at_20_std value: 52.33730714398662 - type: nauc_mrr_at_3_diff1 value: 38.96783570139972 - type: nauc_mrr_at_3_max value: 48.367517142603624 - type: nauc_mrr_at_3_std value: 56.15032257246786 - type: nauc_mrr_at_5_diff1 value: 33.66069097978205 - type: nauc_mrr_at_5_max value: 43.87773602456895 - type: nauc_mrr_at_5_std value: 52.33730714398662 - type: nauc_ndcg_at_1000_diff1 value: -8.409227649777549 - type: 
nauc_ndcg_at_1000_max value: 55.08579408014661 - type: nauc_ndcg_at_1000_std value: 64.71829411541155 - type: nauc_ndcg_at_100_diff1 value: -12.171382005828134 - type: nauc_ndcg_at_100_max value: 37.279599751187895 - type: nauc_ndcg_at_100_std value: 55.59571261330682 - type: nauc_ndcg_at_10_diff1 value: -4.2745893875224645 - type: nauc_ndcg_at_10_max value: 35.61094191299521 - type: nauc_ndcg_at_10_std value: 31.49122710738599 - type: nauc_ndcg_at_1_diff1 value: 34.77341575621081 - type: nauc_ndcg_at_1_max value: 18.418784098194983 - type: nauc_ndcg_at_1_std value: 3.6003144907881026 - type: nauc_ndcg_at_20_diff1 value: -16.937600290863816 - type: nauc_ndcg_at_20_max value: 28.731002593372718 - type: nauc_ndcg_at_20_std value: 40.140028262395546 - type: nauc_ndcg_at_3_diff1 value: 21.008563623057892 - type: nauc_ndcg_at_3_max value: 32.092932411602945 - type: nauc_ndcg_at_3_std value: 7.783159518591246 - type: nauc_ndcg_at_5_diff1 value: 13.35248395075747 - type: nauc_ndcg_at_5_max value: 33.48637127489678 - type: nauc_ndcg_at_5_std value: 19.883656903878986 - type: nauc_precision_at_1000_diff1 value: -34.613170483366815 - type: nauc_precision_at_1000_max value: 14.178980568050093 - type: nauc_precision_at_1000_std value: 53.45813399059421 - type: nauc_precision_at_100_diff1 value: -40.67552345859168 - type: nauc_precision_at_100_max value: 23.091965607829138 - type: nauc_precision_at_100_std value: 62.39644907525577 - type: nauc_precision_at_10_diff1 value: -29.61210257317124 - type: nauc_precision_at_10_max value: 43.992102732918255 - type: nauc_precision_at_10_std value: 67.25524849542518 - type: nauc_precision_at_1_diff1 value: 23.709794626749783 - type: nauc_precision_at_1_max value: 35.45939642825464 - type: nauc_precision_at_1_std value: 45.18790321558505 - type: nauc_precision_at_20_diff1 value: -38.29110052486433 - type: nauc_precision_at_20_max value: 28.73705296191401 - type: nauc_precision_at_20_std value: 62.12026159344505 - type: nauc_precision_at_3_diff1 value: -4.950069185044093 - type: nauc_precision_at_3_max value: 35.30311413187648 - type: nauc_precision_at_3_std value: 37.24789627772557 - type: nauc_precision_at_5_diff1 value: -8.259725731846123 - type: nauc_precision_at_5_max value: 33.985287538899314 - type: nauc_precision_at_5_std value: 53.59550306044433 - type: nauc_recall_at_1000_diff1 value: -5.996961409631926 - type: nauc_recall_at_1000_max value: 63.118266233402764 - type: nauc_recall_at_1000_std value: 69.5649709802058 - type: nauc_recall_at_100_diff1 value: 6.920650261229799 - type: nauc_recall_at_100_max value: 26.76777278523633 - type: nauc_recall_at_100_std value: 24.81349844560708 - type: nauc_recall_at_10_diff1 value: 18.636579796911292 - type: nauc_recall_at_10_max value: 2.214374250576099 - type: nauc_recall_at_10_std value: -12.939953791707651 - type: nauc_recall_at_1_diff1 value: 17.90743559505749 - type: nauc_recall_at_1_max value: -14.61627654448527 - type: nauc_recall_at_1_std value: -24.262430292012667 - type: nauc_recall_at_20_diff1 value: 17.612041689452855 - type: nauc_recall_at_20_max value: 11.182632726686007 - type: nauc_recall_at_20_std value: -2.4835954401161864 - type: nauc_recall_at_3_diff1 value: 16.773341381117 - type: nauc_recall_at_3_max value: -15.051242807277163 - type: nauc_recall_at_3_std value: -26.410274593618038 - type: nauc_recall_at_5_diff1 value: 17.091861029537423 - type: nauc_recall_at_5_max value: -13.243464985211395 - type: nauc_recall_at_5_std value: -23.92982354951768 - type: ndcg_at_1 value: 78.295 - type: 
ndcg_at_10 value: 72.695 - type: ndcg_at_100 value: 65.69500000000001 - type: ndcg_at_1000 value: 73.359 - type: ndcg_at_20 value: 69.16499999999999 - type: ndcg_at_3 value: 76.632 - type: ndcg_at_5 value: 74.024 - type: precision_at_1 value: 90.69800000000001 - type: precision_at_10 value: 81.628 - type: precision_at_100 value: 38.116 - type: precision_at_1000 value: 7.199999999999999 - type: precision_at_20 value: 72.209 - type: precision_at_3 value: 89.922 - type: precision_at_5 value: 86.047 - type: recall_at_1 value: 2.313 - type: recall_at_10 value: 17.48 - type: recall_at_100 value: 53.937000000000005 - type: recall_at_1000 value: 80.018 - type: recall_at_20 value: 28.081 - type: recall_at_3 value: 6.927 - type: recall_at_5 value: 10.575 task: type: Retrieval - dataset: config: pl name: MTEB MassiveIntentClassification (pl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 79.41492938802959 - type: f1 value: 75.75917683785259 - type: f1_weighted value: 79.4156392656699 - type: main_score value: 79.41492938802959 task: type: Classification - dataset: config: pl name: MTEB MassiveScenarioClassification (pl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 81.9334229993275 - type: f1 value: 81.40628785444537 - type: f1_weighted value: 81.79807477693303 - type: main_score value: 81.9334229993275 task: type: Classification - dataset: config: default name: MTEB NFCorpus-PL revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 split: test type: clarin-knext/nfcorpus-pl metrics: - type: main_score value: 36.723 - type: map_at_1 value: 5.8069999999999995 - type: map_at_10 value: 13.602 - type: map_at_100 value: 17.196 - type: map_at_1000 value: 18.609 - type: map_at_20 value: 15.146999999999998 - type: map_at_3 value: 9.594999999999999 - type: map_at_5 value: 11.453000000000001 - type: mrr_at_1 value: 47.368421052631575 - type: mrr_at_10 value: 55.60703228659884 - type: mrr_at_100 value: 56.1552975760445 - type: mrr_at_1000 value: 56.19164342988321 - type: mrr_at_20 value: 55.922507068281476 - type: mrr_at_3 value: 53.147574819401456 - type: mrr_at_5 value: 54.680082559339525 - type: nauc_map_at_1000_diff1 value: 34.05763404594125 - type: nauc_map_at_1000_max value: 29.5226776533209 - type: nauc_map_at_1000_std value: 15.427632324819914 - type: nauc_map_at_100_diff1 value: 34.80313586539057 - type: nauc_map_at_100_max value: 27.999543781245972 - type: nauc_map_at_100_std value: 11.502430185601197 - type: nauc_map_at_10_diff1 value: 39.10493763818235 - type: nauc_map_at_10_max value: 20.299110129894572 - type: nauc_map_at_10_std value: -1.8131312981171384 - type: nauc_map_at_1_diff1 value: 54.952292547558436 - type: nauc_map_at_1_max value: 13.172173380536137 - type: nauc_map_at_1_std value: -11.135859432447047 - type: nauc_map_at_20_diff1 value: 36.56338939350608 - type: nauc_map_at_20_max value: 24.057778180377355 - type: nauc_map_at_20_std value: 4.030543599731532 - type: nauc_map_at_3_diff1 value: 46.798195082350766 - type: nauc_map_at_3_max value: 14.899395608553915 - type: nauc_map_at_3_std value: -10.505614189182307 - type: nauc_map_at_5_diff1 value: 42.83953515294862 - type: nauc_map_at_5_max value: 17.04727497975375 - type: nauc_map_at_5_std value: -7.6517071380275885 - type: nauc_mrr_at_1000_diff1 value: 41.44193432540061 - type: nauc_mrr_at_1000_max value: 39.88086824180341 - type: nauc_mrr_at_1000_std value: 
27.351885880283966 - type: nauc_mrr_at_100_diff1 value: 41.43357468563369 - type: nauc_mrr_at_100_max value: 39.91394628214467 - type: nauc_mrr_at_100_std value: 27.37166382203234 - type: nauc_mrr_at_10_diff1 value: 41.46082695650948 - type: nauc_mrr_at_10_max value: 39.858957188572944 - type: nauc_mrr_at_10_std value: 27.18216001182641 - type: nauc_mrr_at_1_diff1 value: 41.485448798176904 - type: nauc_mrr_at_1_max value: 33.6944538535235 - type: nauc_mrr_at_1_std value: 22.826701578387503 - type: nauc_mrr_at_20_diff1 value: 41.374365310091925 - type: nauc_mrr_at_20_max value: 39.923859616197035 - type: nauc_mrr_at_20_std value: 27.27268109687068 - type: nauc_mrr_at_3_diff1 value: 42.1244757279239 - type: nauc_mrr_at_3_max value: 38.380669877043864 - type: nauc_mrr_at_3_std value: 25.734391560690224 - type: nauc_mrr_at_5_diff1 value: 41.26497822292423 - type: nauc_mrr_at_5_max value: 39.17164048501762 - type: nauc_mrr_at_5_std value: 26.304110615701987 - type: nauc_ndcg_at_1000_diff1 value: 31.76845316166595 - type: nauc_ndcg_at_1000_max value: 44.0530198648453 - type: nauc_ndcg_at_1000_std value: 33.37050209530549 - type: nauc_ndcg_at_100_diff1 value: 31.70167104254346 - type: nauc_ndcg_at_100_max value: 38.98577219865644 - type: nauc_ndcg_at_100_std value: 28.46948949404448 - type: nauc_ndcg_at_10_diff1 value: 31.41371490994258 - type: nauc_ndcg_at_10_max value: 36.46974014607837 - type: nauc_ndcg_at_10_std value: 28.214061102873274 - type: nauc_ndcg_at_1_diff1 value: 45.195218239572185 - type: nauc_ndcg_at_1_max value: 32.47174554115089 - type: nauc_ndcg_at_1_std value: 22.252970640869655 - type: nauc_ndcg_at_20_diff1 value: 30.22073304733139 - type: nauc_ndcg_at_20_max value: 36.85722580956459 - type: nauc_ndcg_at_20_std value: 28.82508960932221 - type: nauc_ndcg_at_3_diff1 value: 34.85087007597385 - type: nauc_ndcg_at_3_max value: 35.08880030166066 - type: nauc_ndcg_at_3_std value: 24.477164602350427 - type: nauc_ndcg_at_5_diff1 value: 32.15269255562139 - type: nauc_ndcg_at_5_max value: 36.26512978748847 - type: nauc_ndcg_at_5_std value: 26.121143638336193 - type: nauc_precision_at_1000_diff1 value: -5.016344866521763 - type: nauc_precision_at_1000_max value: 13.76155613533569 - type: nauc_precision_at_1000_std value: 42.87650310943072 - type: nauc_precision_at_100_diff1 value: -2.4765231121724867 - type: nauc_precision_at_100_max value: 26.413714147361173 - type: nauc_precision_at_100_std value: 52.07869389693284 - type: nauc_precision_at_10_diff1 value: 9.381859834804454 - type: nauc_precision_at_10_max value: 36.79686689654208 - type: nauc_precision_at_10_std value: 41.450385008923874 - type: nauc_precision_at_1_diff1 value: 43.14276503972391 - type: nauc_precision_at_1_max value: 33.23669937901841 - type: nauc_precision_at_1_std value: 23.574191783291614 - type: nauc_precision_at_20_diff1 value: 3.3554639781732143 - type: nauc_precision_at_20_max value: 35.07048369650734 - type: nauc_precision_at_20_std value: 46.90757933302204 - type: nauc_precision_at_3_diff1 value: 22.3364560733951 - type: nauc_precision_at_3_max value: 34.49198383469041 - type: nauc_precision_at_3_std value: 28.30886758592867 - type: nauc_precision_at_5_diff1 value: 14.242157915266043 - type: nauc_precision_at_5_max value: 36.78665790141447 - type: nauc_precision_at_5_std value: 34.22226904133568 - type: nauc_recall_at_1000_diff1 value: 6.177080203711223 - type: nauc_recall_at_1000_max value: 20.36718691855502 - type: nauc_recall_at_1000_std value: 21.44974953318914 - type: nauc_recall_at_100_diff1 value: 
16.98521396327983 - type: nauc_recall_at_100_max value: 25.739641139625473 - type: nauc_recall_at_100_std value: 16.08045361596745 - type: nauc_recall_at_10_diff1 value: 28.066091446759465 - type: nauc_recall_at_10_max value: 15.875422037194987 - type: nauc_recall_at_10_std value: -2.7729209404094712 - type: nauc_recall_at_1_diff1 value: 54.952292547558436 - type: nauc_recall_at_1_max value: 13.172173380536137 - type: nauc_recall_at_1_std value: -11.135859432447047 - type: nauc_recall_at_20_diff1 value: 22.454203317605455 - type: nauc_recall_at_20_max value: 19.38991609441149 - type: nauc_recall_at_20_std value: 3.3669889925713683 - type: nauc_recall_at_3_diff1 value: 42.41050348142469 - type: nauc_recall_at_3_max value: 14.345477767632861 - type: nauc_recall_at_3_std value: -11.275161125178107 - type: nauc_recall_at_5_diff1 value: 34.851159133502286 - type: nauc_recall_at_5_max value: 15.03263812713638 - type: nauc_recall_at_5_std value: -9.042538295018138 - type: ndcg_at_1 value: 44.891999999999996 - type: ndcg_at_10 value: 36.723 - type: ndcg_at_100 value: 33.101 - type: ndcg_at_1000 value: 41.493 - type: ndcg_at_20 value: 34.14 - type: ndcg_at_3 value: 41.131 - type: ndcg_at_5 value: 39.446999999999996 - type: precision_at_1 value: 46.749 - type: precision_at_10 value: 27.616000000000003 - type: precision_at_100 value: 8.372 - type: precision_at_1000 value: 2.095 - type: precision_at_20 value: 20.294 - type: precision_at_3 value: 38.493 - type: precision_at_5 value: 34.427 - type: recall_at_1 value: 5.8069999999999995 - type: recall_at_10 value: 18.444 - type: recall_at_100 value: 33.655 - type: recall_at_1000 value: 63.839999999999996 - type: recall_at_20 value: 22.205 - type: recall_at_3 value: 10.61 - type: recall_at_5 value: 13.938999999999998 task: type: Retrieval - dataset: config: default name: MTEB NQ-PL revision: f171245712cf85dd4700b06bef18001578d0ca8d split: test type: clarin-knext/nq-pl metrics: - type: main_score value: 56.854000000000006 - type: map_at_1 value: 34.514 - type: map_at_10 value: 49.644 - type: map_at_100 value: 50.608 - type: map_at_1000 value: 50.635 - type: map_at_20 value: 50.305 - type: map_at_3 value: 45.672000000000004 - type: map_at_5 value: 48.089 - type: mrr_at_1 value: 38.78910776361529 - type: mrr_at_10 value: 52.148397984145234 - type: mrr_at_100 value: 52.852966946095215 - type: mrr_at_1000 value: 52.87105017860762 - type: mrr_at_20 value: 52.64188894631607 - type: mrr_at_3 value: 48.97643877945134 - type: mrr_at_5 value: 50.92168791039002 - type: nauc_map_at_1000_diff1 value: 37.02156712167867 - type: nauc_map_at_1000_max value: 30.9541229199217 - type: nauc_map_at_1000_std value: 7.320033004454671 - type: nauc_map_at_100_diff1 value: 37.02236703226826 - type: nauc_map_at_100_max value: 30.9697676745961 - type: nauc_map_at_100_std value: 7.33984133867723 - type: nauc_map_at_10_diff1 value: 36.90102700826612 - type: nauc_map_at_10_max value: 30.785723842405183 - type: nauc_map_at_10_std value: 6.779448226242215 - type: nauc_map_at_1_diff1 value: 39.909029450982274 - type: nauc_map_at_1_max value: 25.241631663639062 - type: nauc_map_at_1_std value: 3.9346798436914625 - type: nauc_map_at_20_diff1 value: 37.01885833177735 - type: nauc_map_at_20_max value: 30.93864719019393 - type: nauc_map_at_20_std value: 7.157784404582363 - type: nauc_map_at_3_diff1 value: 36.66395294442894 - type: nauc_map_at_3_max value: 28.73917625955397 - type: nauc_map_at_3_std value: 4.974442294121807 - type: nauc_map_at_5_diff1 value: 36.50200331851477 - type: 
nauc_map_at_5_max value: 30.19694653814823 - type: nauc_map_at_5_std value: 6.080701892676308 - type: nauc_mrr_at_1000_diff1 value: 37.13771503608112 - type: nauc_mrr_at_1000_max value: 31.751547147247507 - type: nauc_mrr_at_1000_std value: 9.508614158791604 - type: nauc_mrr_at_100_diff1 value: 37.13715249048103 - type: nauc_mrr_at_100_max value: 31.76453363846907 - type: nauc_mrr_at_100_std value: 9.527333431366577 - type: nauc_mrr_at_10_diff1 value: 37.04617391414406 - type: nauc_mrr_at_10_max value: 31.835558691659767 - type: nauc_mrr_at_10_std value: 9.403478249864207 - type: nauc_mrr_at_1_diff1 value: 40.24340603514061 - type: nauc_mrr_at_1_max value: 27.892025295592664 - type: nauc_mrr_at_1_std value: 6.948060152377137 - type: nauc_mrr_at_20_diff1 value: 37.13679664662962 - type: nauc_mrr_at_20_max value: 31.80571193908972 - type: nauc_mrr_at_20_std value: 9.463516427443066 - type: nauc_mrr_at_3_diff1 value: 36.59947958587673 - type: nauc_mrr_at_3_max value: 30.56905612034133 - type: nauc_mrr_at_3_std value: 8.213473085446296 - type: nauc_mrr_at_5_diff1 value: 36.66740305041658 - type: nauc_mrr_at_5_max value: 31.470226490982878 - type: nauc_mrr_at_5_std value: 9.02109643375307 - type: nauc_ndcg_at_1000_diff1 value: 36.60296185088649 - type: nauc_ndcg_at_1000_max value: 33.40562074993109 - type: nauc_ndcg_at_1000_std value: 10.60845451213325 - type: nauc_ndcg_at_100_diff1 value: 36.59946610918652 - type: nauc_ndcg_at_100_max value: 33.9570260243297 - type: nauc_ndcg_at_100_std value: 11.340469448481196 - type: nauc_ndcg_at_10_diff1 value: 36.14418247401987 - type: nauc_ndcg_at_10_max value: 33.451039871075345 - type: nauc_ndcg_at_10_std value: 9.272972801419813 - type: nauc_ndcg_at_1_diff1 value: 40.07169143996099 - type: nauc_ndcg_at_1_max value: 27.943354680588055 - type: nauc_ndcg_at_1_std value: 7.036639009967827 - type: nauc_ndcg_at_20_diff1 value: 36.51152244027151 - type: nauc_ndcg_at_20_max value: 33.89378482325653 - type: nauc_ndcg_at_20_std value: 10.342721315866635 - type: nauc_ndcg_at_3_diff1 value: 35.4822845318483 - type: nauc_ndcg_at_3_max value: 29.912345910181415 - type: nauc_ndcg_at_3_std value: 5.9694134283330715 - type: nauc_ndcg_at_5_diff1 value: 35.221776161219466 - type: nauc_ndcg_at_5_max value: 32.1072171248216 - type: nauc_ndcg_at_5_std value: 7.670174771541694 - type: nauc_precision_at_1000_diff1 value: -4.285000172509594 - type: nauc_precision_at_1000_max value: 14.600633321561062 - type: nauc_precision_at_1000_std value: 21.991435704986305 - type: nauc_precision_at_100_diff1 value: 1.7266493932509126 - type: nauc_precision_at_100_max value: 22.9932202096611 - type: nauc_precision_at_100_std value: 27.464183639561075 - type: nauc_precision_at_10_diff1 value: 16.16723142044687 - type: nauc_precision_at_10_max value: 32.61177863055963 - type: nauc_precision_at_10_std value: 19.30609156634069 - type: nauc_precision_at_1_diff1 value: 40.07169143996099 - type: nauc_precision_at_1_max value: 27.943354680588055 - type: nauc_precision_at_1_std value: 7.036639009967827 - type: nauc_precision_at_20_diff1 value: 10.986359452355082 - type: nauc_precision_at_20_max value: 30.001608294285408 - type: nauc_precision_at_20_std value: 23.470161266132752 - type: nauc_precision_at_3_diff1 value: 25.021299827765368 - type: nauc_precision_at_3_max value: 31.112435175145354 - type: nauc_precision_at_3_std value: 9.97933575854508 - type: nauc_precision_at_5_diff1 value: 19.85258852538675 - type: nauc_precision_at_5_max value: 33.017057636553346 - type: nauc_precision_at_5_std 
value: 14.226398540277224 - type: nauc_recall_at_1000_diff1 value: 32.956809555733294 - type: nauc_recall_at_1000_max value: 81.17616645437344 - type: nauc_recall_at_1000_std value: 80.81894015338722 - type: nauc_recall_at_100_diff1 value: 34.21543518933059 - type: nauc_recall_at_100_max value: 64.60424388566007 - type: nauc_recall_at_100_std value: 55.36262550526809 - type: nauc_recall_at_10_diff1 value: 31.854572843060865 - type: nauc_recall_at_10_max value: 41.47697651985406 - type: nauc_recall_at_10_std value: 15.449819317346778 - type: nauc_recall_at_1_diff1 value: 39.909029450982274 - type: nauc_recall_at_1_max value: 25.241631663639062 - type: nauc_recall_at_1_std value: 3.9346798436914625 - type: nauc_recall_at_20_diff1 value: 33.155424988870266 - type: nauc_recall_at_20_max value: 47.41147314334969 - type: nauc_recall_at_20_std value: 24.122822585459915 - type: nauc_recall_at_3_diff1 value: 31.030069463711484 - type: nauc_recall_at_3_max value: 30.349471998175105 - type: nauc_recall_at_3_std value: 5.3792560913820635 - type: nauc_recall_at_5_diff1 value: 29.662449422215627 - type: nauc_recall_at_5_max value: 35.59583981361554 - type: nauc_recall_at_5_std value: 9.138475426366536 - type: ndcg_at_1 value: 38.847 - type: ndcg_at_10 value: 56.854000000000006 - type: ndcg_at_100 value: 60.767 - type: ndcg_at_1000 value: 61.399 - type: ndcg_at_20 value: 58.941 - type: ndcg_at_3 value: 49.576 - type: ndcg_at_5 value: 53.502 - type: precision_at_1 value: 38.847 - type: precision_at_10 value: 9.064 - type: precision_at_100 value: 1.127 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_20 value: 5.038 - type: precision_at_3 value: 22.335 - type: precision_at_5 value: 15.689 - type: recall_at_1 value: 34.514 - type: recall_at_10 value: 76.152 - type: recall_at_100 value: 92.837 - type: recall_at_1000 value: 97.596 - type: recall_at_20 value: 83.77799999999999 - type: recall_at_3 value: 57.484 - type: recall_at_5 value: 66.476 task: type: Retrieval - dataset: config: default name: MTEB PAC revision: None split: test type: laugustyniak/abusive-clauses-pl metrics: - type: accuracy value: 67.24297712134376 - type: accuracy_stderr value: 4.77558207347837 - type: ap value: 77.38171975466854 - type: ap_stderr value: 2.5801970175320394 - type: f1 value: 65.21823897814332 - type: f1_stderr value: 4.317111734308895 - type: main_score value: 67.24297712134376 task: type: Classification - dataset: config: default name: MTEB PSC revision: d05a294af9e1d3ff2bfb6b714e08a24a6cabc669 split: test type: PL-MTEB/psc-pairclassification metrics: - type: cosine_accuracy value: 97.95918367346938 - type: cosine_accuracy_threshold value: 59.87724328133361 - type: cosine_ap value: 99.24498625606927 - type: cosine_f1 value: 96.6867469879518 - type: cosine_f1_threshold value: 59.87724328133361 - type: cosine_precision value: 95.53571428571429 - type: cosine_recall value: 97.86585365853658 - type: dot_accuracy value: 98.51576994434137 - type: dot_accuracy_threshold value: 1574400.0 - type: dot_ap value: 99.28566232682996 - type: dot_f1 value: 97.57575757575758 - type: dot_f1_threshold value: 1564800.0 - type: dot_precision value: 96.98795180722891 - type: dot_recall value: 98.17073170731707 - type: euclidean_accuracy value: 97.6808905380334 - type: euclidean_accuracy_threshold value: 14418.957939643331 - type: euclidean_ap value: 99.0876340868033 - type: euclidean_f1 value: 96.24060150375941 - type: euclidean_f1_threshold value: 14442.183182634264 - type: euclidean_precision value: 
94.95548961424333 - type: euclidean_recall value: 97.5609756097561 - type: main_score value: 99.28566232682996 - type: manhattan_accuracy value: 97.86641929499072 - type: manhattan_accuracy_threshold value: 681802.1857857704 - type: manhattan_ap value: 99.08465290287205 - type: manhattan_f1 value: 96.52042360060513 - type: manhattan_f1_threshold value: 681802.1857857704 - type: manhattan_precision value: 95.7957957957958 - type: manhattan_recall value: 97.2560975609756 - type: max_ap value: 99.28566232682996 - type: max_f1 value: 97.57575757575758 - type: max_precision value: 96.98795180722891 - type: max_recall value: 98.17073170731707 - type: similarity_accuracy value: 97.95918367346938 - type: similarity_accuracy_threshold value: 59.87724328133361 - type: similarity_ap value: 99.24498625606927 - type: similarity_f1 value: 96.6867469879518 - type: similarity_f1_threshold value: 59.87724328133361 - type: similarity_precision value: 95.53571428571429 - type: similarity_recall value: 97.86585365853658 task: type: PairClassification - dataset: config: default name: MTEB PolEmo2.0-IN revision: d90724373c70959f17d2331ad51fb60c71176b03 split: test type: PL-MTEB/polemo2_in metrics: - type: accuracy value: 90.41551246537396 - type: f1 value: 89.15361039614409 - type: f1_weighted value: 90.69893050097603 - type: main_score value: 90.41551246537396 task: type: Classification - dataset: config: default name: MTEB PolEmo2.0-OUT revision: 6a21ab8716e255ab1867265f8b396105e8aa63d4 split: test type: PL-MTEB/polemo2_out metrics: - type: accuracy value: 77.77327935222672 - type: f1 value: 61.238079022455636 - type: f1_weighted value: 80.58753601509183 - type: main_score value: 77.77327935222672 task: type: Classification - dataset: config: default name: MTEB PPC revision: None split: test type: PL-MTEB/ppc-pairclassification metrics: - type: cos_sim_accuracy value: 87.2 - type: cos_sim_accuracy_threshold value: 83.69773167092553 - type: cos_sim_ap value: 95.43345251568122 - type: cos_sim_f1 value: 89.82785602503913 - type: cos_sim_f1_threshold value: 81.2116503074739 - type: cos_sim_precision value: 85.16320474777447 - type: cos_sim_recall value: 95.03311258278146 - type: dot_accuracy value: 85.9 - type: dot_accuracy_threshold value: 2177600.0 - type: dot_ap value: 92.4192102018206 - type: dot_f1 value: 88.9238020424195 - type: dot_f1_threshold value: 2163200.0 - type: dot_precision value: 84.60388639760838 - type: dot_recall value: 93.70860927152319 - type: euclidean_accuracy value: 87.5 - type: euclidean_accuracy_threshold value: 9325.450203438862 - type: euclidean_ap value: 95.42730698295347 - type: euclidean_f1 value: 89.92747784045125 - type: euclidean_f1_threshold value: 9325.450203438862 - type: euclidean_precision value: 87.59811616954474 - type: euclidean_recall value: 92.3841059602649 - type: manhattan_accuracy value: 87.5 - type: manhattan_accuracy_threshold value: 441412.88244724274 - type: manhattan_ap value: 95.4277447451651 - type: manhattan_f1 value: 89.92747784045125 - type: manhattan_f1_threshold value: 441412.88244724274 - type: manhattan_precision value: 87.59811616954474 - type: manhattan_recall value: 92.3841059602649 - type: max_accuracy value: 87.5 - type: max_ap value: 95.43345251568122 - type: max_f1 value: 89.92747784045125 task: type: PairClassification - dataset: config: default name: MTEB Quora-PL revision: 0be27e93455051e531182b85e85e425aba12e9d4 split: test type: clarin-knext/quora-pl metrics: - type: main_score value: 84.47099999999999 - type: map_at_1 value: 65.892 - type: 
map_at_10 value: 80.11500000000001 - type: map_at_100 value: 80.861 - type: map_at_1000 value: 80.879 - type: map_at_20 value: 80.604 - type: map_at_3 value: 76.97 - type: map_at_5 value: 78.926 - type: mrr_at_1 value: 75.83 - type: mrr_at_10 value: 83.2125238095233 - type: mrr_at_100 value: 83.38714262504709 - type: mrr_at_1000 value: 83.38942088013238 - type: mrr_at_20 value: 83.34284466299037 - type: mrr_at_3 value: 81.95333333333281 - type: mrr_at_5 value: 82.78533333333272 - type: nauc_map_at_1000_diff1 value: 73.95721764018812 - type: nauc_map_at_1000_max value: 9.653675847999432 - type: nauc_map_at_1000_std value: -42.35408133902171 - type: nauc_map_at_100_diff1 value: 73.96621756991526 - type: nauc_map_at_100_max value: 9.618124708373092 - type: nauc_map_at_100_std value: -42.41429680546156 - type: nauc_map_at_10_diff1 value: 74.20643666348498 - type: nauc_map_at_10_max value: 9.056688996919677 - type: nauc_map_at_10_std value: -44.13396437616006 - type: nauc_map_at_1_diff1 value: 77.18196114257519 - type: nauc_map_at_1_max value: 7.840648640771136 - type: nauc_map_at_1_std value: -39.84395715001256 - type: nauc_map_at_20_diff1 value: 74.03475632514551 - type: nauc_map_at_20_max value: 9.385795565805118 - type: nauc_map_at_20_std value: -43.160299598965466 - type: nauc_map_at_3_diff1 value: 74.43855921599284 - type: nauc_map_at_3_max value: 7.574218825911361 - type: nauc_map_at_3_std value: -46.1476276122436 - type: nauc_map_at_5_diff1 value: 74.38688915461512 - type: nauc_map_at_5_max value: 8.557764506539128 - type: nauc_map_at_5_std value: -45.53897898458085 - type: nauc_mrr_at_1000_diff1 value: 74.0311045258841 - type: nauc_mrr_at_1000_max value: 11.885448379701055 - type: nauc_mrr_at_1000_std value: -38.16008409213179 - type: nauc_mrr_at_100_diff1 value: 74.03074603058893 - type: nauc_mrr_at_100_max value: 11.886356221882725 - type: nauc_mrr_at_100_std value: -38.159139191997795 - type: nauc_mrr_at_10_diff1 value: 73.99521522874129 - type: nauc_mrr_at_10_max value: 11.77749620520773 - type: nauc_mrr_at_10_std value: -38.266295250166635 - type: nauc_mrr_at_1_diff1 value: 75.53192564838908 - type: nauc_mrr_at_1_max value: 12.979267595721275 - type: nauc_mrr_at_1_std value: -36.634066084632785 - type: nauc_mrr_at_20_diff1 value: 74.01273934757484 - type: nauc_mrr_at_20_max value: 11.887566738728225 - type: nauc_mrr_at_20_std value: -38.169250252410485 - type: nauc_mrr_at_3_diff1 value: 73.6073534511043 - type: nauc_mrr_at_3_max value: 11.450856365709727 - type: nauc_mrr_at_3_std value: -38.767141663073964 - type: nauc_mrr_at_5_diff1 value: 73.84950218235583 - type: nauc_mrr_at_5_max value: 11.787394554048813 - type: nauc_mrr_at_5_std value: -38.57240589862417 - type: nauc_ndcg_at_1000_diff1 value: 73.51677487598074 - type: nauc_ndcg_at_1000_max value: 10.72929244202152 - type: nauc_ndcg_at_1000_std value: -39.92813917654933 - type: nauc_ndcg_at_100_diff1 value: 73.53904136553481 - type: nauc_ndcg_at_100_max value: 10.569310211635521 - type: nauc_ndcg_at_100_std value: -40.12206261908318 - type: nauc_ndcg_at_10_diff1 value: 73.55958917204208 - type: nauc_ndcg_at_10_max value: 9.255791947077263 - type: nauc_ndcg_at_10_std value: -42.7856138240991 - type: nauc_ndcg_at_1_diff1 value: 75.34289960079188 - type: nauc_ndcg_at_1_max value: 13.499789436258705 - type: nauc_ndcg_at_1_std value: -35.91483904818284 - type: nauc_ndcg_at_20_diff1 value: 73.48070745481307 - type: nauc_ndcg_at_20_max value: 9.92427572953505 - type: nauc_ndcg_at_20_std value: -41.55653404596579 - type: 
nauc_ndcg_at_3_diff1 value: 72.72072901275445 - type: nauc_ndcg_at_3_max value: 8.303708237302729 - type: nauc_ndcg_at_3_std value: -43.618531107389344 - type: nauc_ndcg_at_5_diff1 value: 73.30060059269601 - type: nauc_ndcg_at_5_max value: 8.915386932153249 - type: nauc_ndcg_at_5_std value: -44.088053429661 - type: nauc_precision_at_1000_diff1 value: -41.540517884119524 - type: nauc_precision_at_1000_max value: 6.9361565712971265 - type: nauc_precision_at_1000_std value: 42.39482890919027 - type: nauc_precision_at_100_diff1 value: -40.609576663184896 - type: nauc_precision_at_100_max value: 6.302451339507686 - type: nauc_precision_at_100_std value: 41.30693233869549 - type: nauc_precision_at_10_diff1 value: -30.91653155031006 - type: nauc_precision_at_10_max value: 4.84981614338782 - type: nauc_precision_at_10_std value: 24.47022404030676 - type: nauc_precision_at_1_diff1 value: 75.34289960079188 - type: nauc_precision_at_1_max value: 13.499789436258705 - type: nauc_precision_at_1_std value: -35.91483904818284 - type: nauc_precision_at_20_diff1 value: -36.75164419452007 - type: nauc_precision_at_20_max value: 5.440757182282365 - type: nauc_precision_at_20_std value: 33.08928025809355 - type: nauc_precision_at_3_diff1 value: -5.3240699725635565 - type: nauc_precision_at_3_max value: 5.156636102003736 - type: nauc_precision_at_3_std value: -0.9779263105110453 - type: nauc_precision_at_5_diff1 value: -19.92133198420086 - type: nauc_precision_at_5_max value: 5.432766335564369 - type: nauc_precision_at_5_std value: 11.417736295996392 - type: nauc_recall_at_1000_diff1 value: 56.57663068186203 - type: nauc_recall_at_1000_max value: 25.80329039728696 - type: nauc_recall_at_1000_std value: 57.82937604195464 - type: nauc_recall_at_100_diff1 value: 67.25188672746224 - type: nauc_recall_at_100_max value: 6.879939694351325 - type: nauc_recall_at_100_std value: -30.098258041087096 - type: nauc_recall_at_10_diff1 value: 68.00694154421653 - type: nauc_recall_at_10_max value: 0.7226814903576098 - type: nauc_recall_at_10_std value: -52.980002751088215 - type: nauc_recall_at_1_diff1 value: 77.18196114257519 - type: nauc_recall_at_1_max value: 7.840648640771136 - type: nauc_recall_at_1_std value: -39.84395715001256 - type: nauc_recall_at_20_diff1 value: 66.56016564739411 - type: nauc_recall_at_20_max value: 1.919044428493598 - type: nauc_recall_at_20_std value: -49.5380686276396 - type: nauc_recall_at_3_diff1 value: 69.83247207081557 - type: nauc_recall_at_3_max value: 2.395588418833963 - type: nauc_recall_at_3_std value: -52.11119790224493 - type: nauc_recall_at_5_diff1 value: 69.25881483845956 - type: nauc_recall_at_5_max value: 2.9185552604991716 - type: nauc_recall_at_5_std value: -54.376346690212095 - type: ndcg_at_1 value: 75.92 - type: ndcg_at_10 value: 84.47099999999999 - type: ndcg_at_100 value: 86.11999999999999 - type: ndcg_at_1000 value: 86.276 - type: ndcg_at_20 value: 85.37599999999999 - type: ndcg_at_3 value: 81.0 - type: ndcg_at_5 value: 82.88799999999999 - type: precision_at_1 value: 75.92 - type: precision_at_10 value: 12.987000000000002 - type: precision_at_100 value: 1.5190000000000001 - type: precision_at_1000 value: 0.156 - type: precision_at_20 value: 6.977 - type: precision_at_3 value: 35.573 - type: precision_at_5 value: 23.566000000000003 - type: recall_at_1 value: 65.892 - type: recall_at_10 value: 93.318 - type: recall_at_100 value: 99.124 - type: recall_at_1000 value: 99.92699999999999 - type: recall_at_20 value: 96.256 - type: recall_at_3 value: 83.69 - type: recall_at_5 value: 
88.783 task: type: Retrieval - dataset: config: default name: MTEB SCIDOCS-PL revision: 45452b03f05560207ef19149545f168e596c9337 split: test type: clarin-knext/scidocs-pl metrics: - type: main_score value: 19.528000000000002 - type: map_at_1 value: 4.5280000000000005 - type: map_at_10 value: 11.649 - type: map_at_100 value: 14.019 - type: map_at_1000 value: 14.35 - type: map_at_20 value: 12.866 - type: map_at_3 value: 8.35 - type: map_at_5 value: 9.84 - type: mrr_at_1 value: 22.3 - type: mrr_at_10 value: 32.690039682539656 - type: mrr_at_100 value: 33.91097016542133 - type: mrr_at_1000 value: 33.96940693754695 - type: mrr_at_20 value: 33.418312740750785 - type: mrr_at_3 value: 29.4 - type: mrr_at_5 value: 31.21999999999997 - type: nauc_map_at_1000_diff1 value: 20.52578935318615 - type: nauc_map_at_1000_max value: 28.28553814852898 - type: nauc_map_at_1000_std value: 18.74384140790138 - type: nauc_map_at_100_diff1 value: 20.508083204903077 - type: nauc_map_at_100_max value: 28.281447260273346 - type: nauc_map_at_100_std value: 18.51851601604162 - type: nauc_map_at_10_diff1 value: 21.028884157759624 - type: nauc_map_at_10_max value: 26.98935951161403 - type: nauc_map_at_10_std value: 14.434790357547536 - type: nauc_map_at_1_diff1 value: 23.406427416653127 - type: nauc_map_at_1_max value: 21.759624726647303 - type: nauc_map_at_1_std value: 8.335925909478444 - type: nauc_map_at_20_diff1 value: 20.370301978337785 - type: nauc_map_at_20_max value: 27.30787972231405 - type: nauc_map_at_20_std value: 16.166505401287353 - type: nauc_map_at_3_diff1 value: 23.920717676009453 - type: nauc_map_at_3_max value: 26.061264285994124 - type: nauc_map_at_3_std value: 10.707123907182902 - type: nauc_map_at_5_diff1 value: 22.180679453453557 - type: nauc_map_at_5_max value: 26.85332935641574 - type: nauc_map_at_5_std value: 12.316377808191762 - type: nauc_mrr_at_1000_diff1 value: 21.49186339320302 - type: nauc_mrr_at_1000_max value: 24.329921012356493 - type: nauc_mrr_at_1000_std value: 13.6080824939291 - type: nauc_mrr_at_100_diff1 value: 21.47653180378912 - type: nauc_mrr_at_100_max value: 24.34218235410752 - type: nauc_mrr_at_100_std value: 13.646711743513668 - type: nauc_mrr_at_10_diff1 value: 21.487198850706935 - type: nauc_mrr_at_10_max value: 24.32385099521571 - type: nauc_mrr_at_10_std value: 13.26596223383694 - type: nauc_mrr_at_1_diff1 value: 23.19221955587559 - type: nauc_mrr_at_1_max value: 21.963004569187575 - type: nauc_mrr_at_1_std value: 8.799819519408619 - type: nauc_mrr_at_20_diff1 value: 21.51014357510076 - type: nauc_mrr_at_20_max value: 24.376067405199347 - type: nauc_mrr_at_20_std value: 13.643597889716563 - type: nauc_mrr_at_3_diff1 value: 22.60437837853161 - type: nauc_mrr_at_3_max value: 23.58608363876532 - type: nauc_mrr_at_3_std value: 11.887163540535768 - type: nauc_mrr_at_5_diff1 value: 21.919324914716633 - type: nauc_mrr_at_5_max value: 23.71458680225389 - type: nauc_mrr_at_5_std value: 12.507643886191785 - type: nauc_ndcg_at_1000_diff1 value: 18.546848864440005 - type: nauc_ndcg_at_1000_max value: 30.031984469206325 - type: nauc_ndcg_at_1000_std value: 26.561149084437485 - type: nauc_ndcg_at_100_diff1 value: 18.76271748622068 - type: nauc_ndcg_at_100_max value: 30.180887663861306 - type: nauc_ndcg_at_100_std value: 25.50551358758007 - type: nauc_ndcg_at_10_diff1 value: 19.861367738304697 - type: nauc_ndcg_at_10_max value: 27.360442235691522 - type: nauc_ndcg_at_10_std value: 16.476546243351976 - type: nauc_ndcg_at_1_diff1 value: 23.56715803292495 - type: nauc_ndcg_at_1_max value: 
22.29229945166374 - type: nauc_ndcg_at_1_std value: 8.43434671818737 - type: nauc_ndcg_at_20_diff1 value: 18.885059883708053 - type: nauc_ndcg_at_20_max value: 27.78854464221595 - type: nauc_ndcg_at_20_std value: 19.404353378015255 - type: nauc_ndcg_at_3_diff1 value: 23.34227259398943 - type: nauc_ndcg_at_3_max value: 25.75899010582446 - type: nauc_ndcg_at_3_std value: 12.097012181915954 - type: nauc_ndcg_at_5_diff1 value: 21.599246331396863 - type: nauc_ndcg_at_5_max value: 26.6575824351444 - type: nauc_ndcg_at_5_std value: 14.029006846982394 - type: nauc_precision_at_1000_diff1 value: 4.880571159099271 - type: nauc_precision_at_1000_max value: 24.693741787360725 - type: nauc_precision_at_1000_std value: 41.00756555344345 - type: nauc_precision_at_100_diff1 value: 10.440170876298648 - type: nauc_precision_at_100_max value: 28.942738351320408 - type: nauc_precision_at_100_std value: 36.921704945977446 - type: nauc_precision_at_10_diff1 value: 15.55680558043308 - type: nauc_precision_at_10_max value: 27.31414489241847 - type: nauc_precision_at_10_std value: 19.76275914256793 - type: nauc_precision_at_1_diff1 value: 23.56715803292495 - type: nauc_precision_at_1_max value: 22.29229945166374 - type: nauc_precision_at_1_std value: 8.43434671818737 - type: nauc_precision_at_20_diff1 value: 12.57247210423589 - type: nauc_precision_at_20_max value: 25.978951783180946 - type: nauc_precision_at_20_std value: 23.89998191646426 - type: nauc_precision_at_3_diff1 value: 22.61273732758558 - type: nauc_precision_at_3_max value: 26.51246898792034 - type: nauc_precision_at_3_std value: 13.618855663226162 - type: nauc_precision_at_5_diff1 value: 19.216237125486472 - type: nauc_precision_at_5_max value: 27.491221626577868 - type: nauc_precision_at_5_std value: 16.448119031617793 - type: nauc_recall_at_1000_diff1 value: 5.787043341957982 - type: nauc_recall_at_1000_max value: 25.922109246772763 - type: nauc_recall_at_1000_std value: 43.03768522656805 - type: nauc_recall_at_100_diff1 value: 10.696362559629796 - type: nauc_recall_at_100_max value: 29.335080453227146 - type: nauc_recall_at_100_std value: 37.271217586452124 - type: nauc_recall_at_10_diff1 value: 15.458092305569215 - type: nauc_recall_at_10_max value: 27.24445210740807 - type: nauc_recall_at_10_std value: 19.71157635644842 - type: nauc_recall_at_1_diff1 value: 23.406427416653127 - type: nauc_recall_at_1_max value: 21.759624726647303 - type: nauc_recall_at_1_std value: 8.335925909478444 - type: nauc_recall_at_20_diff1 value: 12.666354755313089 - type: nauc_recall_at_20_max value: 26.089770792562327 - type: nauc_recall_at_20_std value: 24.153776619741254 - type: nauc_recall_at_3_diff1 value: 22.545408113368953 - type: nauc_recall_at_3_max value: 26.18564049945919 - type: nauc_recall_at_3_std value: 13.308772571657293 - type: nauc_recall_at_5_diff1 value: 19.063078320434958 - type: nauc_recall_at_5_max value: 27.15038597116091 - type: nauc_recall_at_5_std value: 16.202694888143302 - type: ndcg_at_1 value: 22.2 - type: ndcg_at_10 value: 19.528000000000002 - type: ndcg_at_100 value: 28.444000000000003 - type: ndcg_at_1000 value: 33.826 - type: ndcg_at_20 value: 22.746 - type: ndcg_at_3 value: 18.413 - type: ndcg_at_5 value: 15.927 - type: precision_at_1 value: 22.2 - type: precision_at_10 value: 10.24 - type: precision_at_100 value: 2.3040000000000003 - type: precision_at_1000 value: 0.358 - type: precision_at_20 value: 6.97 - type: precision_at_3 value: 17.299999999999997 - type: precision_at_5 value: 13.919999999999998 - type: recall_at_1 value: 
4.5280000000000005 - type: recall_at_10 value: 20.757 - type: recall_at_100 value: 46.75 - type: recall_at_1000 value: 72.738 - type: recall_at_20 value: 28.28 - type: recall_at_3 value: 10.558 - type: recall_at_5 value: 14.148 task: type: Retrieval - dataset: config: default name: MTEB SICK-E-PL revision: 71bba34b0ece6c56dfcf46d9758a27f7a90f17e9 split: test type: PL-MTEB/sicke-pl-pairclassification metrics: - type: cosine_accuracy value: 87.50509580105992 - type: cosine_accuracy_threshold value: 89.01510631979949 - type: cosine_ap value: 85.58291779193907 - type: cosine_f1 value: 77.58919293384136 - type: cosine_f1_threshold value: 87.10908804245841 - type: cosine_precision value: 75.52258934592044 - type: cosine_recall value: 79.77207977207978 - type: dot_accuracy value: 83.9380350591113 - type: dot_accuracy_threshold value: 2292800.0 - type: dot_ap value: 77.56937485120034 - type: dot_f1 value: 73.32065906210391 - type: dot_f1_threshold value: 2190400.0 - type: dot_precision value: 66.03881278538812 - type: dot_recall value: 82.4074074074074 - type: euclidean_accuracy value: 87.89237668161435 - type: euclidean_accuracy_threshold value: 7497.701400069587 - type: euclidean_ap value: 85.97216152106346 - type: euclidean_f1 value: 77.97228300510578 - type: euclidean_f1_threshold value: 7799.027816670506 - type: euclidean_precision value: 79.89536621823618 - type: euclidean_recall value: 76.13960113960114 - type: main_score value: 85.97216152106346 - type: manhattan_accuracy value: 87.85161027313494 - type: manhattan_accuracy_threshold value: 357242.9743885994 - type: manhattan_ap value: 85.96709490495458 - type: manhattan_f1 value: 77.9874213836478 - type: manhattan_f1_threshold value: 383558.8531732559 - type: manhattan_precision value: 76.5432098765432 - type: manhattan_recall value: 79.48717948717949 - type: max_ap value: 85.97216152106346 - type: max_f1 value: 77.9874213836478 - type: max_precision value: 79.89536621823618 - type: max_recall value: 82.4074074074074 - type: similarity_accuracy value: 87.50509580105992 - type: similarity_accuracy_threshold value: 89.01510631979949 - type: similarity_ap value: 85.58291779193907 - type: similarity_f1 value: 77.58919293384136 - type: similarity_f1_threshold value: 87.10908804245841 - type: similarity_precision value: 75.52258934592044 - type: similarity_recall value: 79.77207977207978 task: type: PairClassification - dataset: config: default name: MTEB SICK-R-PL revision: fd5c2441b7eeff8676768036142af4cfa42c1339 split: test type: PL-MTEB/sickr-pl-sts metrics: - type: cosine_pearson value: 79.68602301743276 - type: cosine_spearman value: 78.15913085997471 - type: euclidean_pearson value: 77.19541180768627 - type: euclidean_spearman value: 77.9122894221527 - type: main_score value: 78.15913085997471 - type: manhattan_pearson value: 77.24713453824641 - type: manhattan_spearman value: 77.95971728547582 - type: pearson value: 79.68602301743276 - type: spearman value: 78.15913085997471 task: type: STS - dataset: config: pl name: MTEB STS22 (pl) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 42.01062393061261 - type: cosine_spearman value: 42.79076406559122 - type: euclidean_pearson value: 28.57786522106708 - type: euclidean_spearman value: 42.51040813516686 - type: main_score value: 42.79076406559122 - type: manhattan_pearson value: 28.855884350706653 - type: manhattan_spearman value: 42.77481125184737 - type: pearson value: 42.01062393061261 - type: spearman 
value: 42.79076406559122 task: type: STS - dataset: config: default name: MTEB SciFact-PL revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e split: test type: clarin-knext/scifact-pl metrics: - type: main_score value: 74.434 - type: map_at_1 value: 59.494 - type: map_at_10 value: 69.893 - type: map_at_100 value: 70.45 - type: map_at_1000 value: 70.466 - type: map_at_20 value: 70.259 - type: map_at_3 value: 67.037 - type: map_at_5 value: 68.777 - type: mrr_at_1 value: 62.66666666666667 - type: mrr_at_10 value: 71.04457671957671 - type: mrr_at_100 value: 71.52299909263925 - type: mrr_at_1000 value: 71.53881086964122 - type: mrr_at_20 value: 71.33636271136271 - type: mrr_at_3 value: 69.16666666666667 - type: mrr_at_5 value: 70.26666666666667 - type: nauc_map_at_1000_diff1 value: 68.97113084189034 - type: nauc_map_at_1000_max value: 51.00665747497857 - type: nauc_map_at_1000_std value: 8.970270487093412 - type: nauc_map_at_100_diff1 value: 68.97281660521169 - type: nauc_map_at_100_max value: 51.01659549614879 - type: nauc_map_at_100_std value: 8.986483862053491 - type: nauc_map_at_10_diff1 value: 69.07605123979184 - type: nauc_map_at_10_max value: 51.229841935772804 - type: nauc_map_at_10_std value: 9.050901052243548 - type: nauc_map_at_1_diff1 value: 71.46187295357046 - type: nauc_map_at_1_max value: 46.82038076857106 - type: nauc_map_at_1_std value: 6.931602615510153 - type: nauc_map_at_20_diff1 value: 68.93823362705625 - type: nauc_map_at_20_max value: 51.15218544845727 - type: nauc_map_at_20_std value: 8.993550237629675 - type: nauc_map_at_3_diff1 value: 69.19558420072627 - type: nauc_map_at_3_max value: 47.345905341053886 - type: nauc_map_at_3_std value: 4.833936436252541 - type: nauc_map_at_5_diff1 value: 69.05067049349557 - type: nauc_map_at_5_max value: 49.62866209452668 - type: nauc_map_at_5_std value: 7.455937282103214 - type: nauc_mrr_at_1000_diff1 value: 69.2896395759106 - type: nauc_mrr_at_1000_max value: 54.20478659857226 - type: nauc_mrr_at_1000_std value: 12.534151525016302 - type: nauc_mrr_at_100_diff1 value: 69.29115865311857 - type: nauc_mrr_at_100_max value: 54.212882919608475 - type: nauc_mrr_at_100_std value: 12.548435473868432 - type: nauc_mrr_at_10_diff1 value: 69.29596234146305 - type: nauc_mrr_at_10_max value: 54.391683731646935 - type: nauc_mrr_at_10_std value: 12.74312540729047 - type: nauc_mrr_at_1_diff1 value: 71.19661136604304 - type: nauc_mrr_at_1_max value: 53.50646788895577 - type: nauc_mrr_at_1_std value: 14.68408048005645 - type: nauc_mrr_at_20_diff1 value: 69.24714813412893 - type: nauc_mrr_at_20_max value: 54.32239828421196 - type: nauc_mrr_at_20_std value: 12.623980761665866 - type: nauc_mrr_at_3_diff1 value: 69.22708724496187 - type: nauc_mrr_at_3_max value: 53.18873450995116 - type: nauc_mrr_at_3_std value: 11.336687945925586 - type: nauc_mrr_at_5_diff1 value: 69.10748983236182 - type: nauc_mrr_at_5_max value: 53.878090193979034 - type: nauc_mrr_at_5_std value: 12.079036178698662 - type: nauc_ndcg_at_1000_diff1 value: 68.66705448374432 - type: nauc_ndcg_at_1000_max value: 52.74699991296371 - type: nauc_ndcg_at_1000_std value: 10.535824386304968 - type: nauc_ndcg_at_100_diff1 value: 68.66862462407086 - type: nauc_ndcg_at_100_max value: 52.979821543362874 - type: nauc_ndcg_at_100_std value: 10.856284103500371 - type: nauc_ndcg_at_10_diff1 value: 68.66965948376267 - type: nauc_ndcg_at_10_max value: 53.978681919984474 - type: nauc_ndcg_at_10_std value: 11.10472732803466 - type: nauc_ndcg_at_1_diff1 value: 71.19661136604304 - type: nauc_ndcg_at_1_max value: 
53.50646788895577 - type: nauc_ndcg_at_1_std value: 14.68408048005645 - type: nauc_ndcg_at_20_diff1 value: 68.20754850499976 - type: nauc_ndcg_at_20_max value: 53.590485842045595 - type: nauc_ndcg_at_20_std value: 10.719753086433334 - type: nauc_ndcg_at_3_diff1 value: 68.23406959629385 - type: nauc_ndcg_at_3_max value: 48.8837450762613 - type: nauc_ndcg_at_3_std value: 6.287949648205997 - type: nauc_ndcg_at_5_diff1 value: 68.52532849588677 - type: nauc_ndcg_at_5_max value: 51.29845300513165 - type: nauc_ndcg_at_5_std value: 8.15488455762137 - type: nauc_precision_at_1000_diff1 value: -29.56388929021074 - type: nauc_precision_at_1000_max value: 18.61674681637121 - type: nauc_precision_at_1000_std value: 41.68541412973936 - type: nauc_precision_at_100_diff1 value: -17.020740767390375 - type: nauc_precision_at_100_max value: 24.321682766394957 - type: nauc_precision_at_100_std value: 39.36188711602 - type: nauc_precision_at_10_diff1 value: 7.735819461600302 - type: nauc_precision_at_10_max value: 39.59963139423176 - type: nauc_precision_at_10_std value: 33.923494696390385 - type: nauc_precision_at_1_diff1 value: 71.19661136604304 - type: nauc_precision_at_1_max value: 53.50646788895577 - type: nauc_precision_at_1_std value: 14.68408048005645 - type: nauc_precision_at_20_diff1 value: -3.587900694179661 - type: nauc_precision_at_20_max value: 33.36606615861144 - type: nauc_precision_at_20_std value: 34.51624192343654 - type: nauc_precision_at_3_diff1 value: 41.996620318298625 - type: nauc_precision_at_3_max value: 43.08007454860597 - type: nauc_precision_at_3_std value: 14.398965447916495 - type: nauc_precision_at_5_diff1 value: 25.054180107661132 - type: nauc_precision_at_5_max value: 40.94617942853718 - type: nauc_precision_at_5_std value: 23.69992709404865 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 68.09523809523836 - type: nauc_recall_at_100_max value: 63.034547152194406 - type: nauc_recall_at_100_std value: 23.594771241830657 - type: nauc_recall_at_10_diff1 value: 66.43213426149696 - type: nauc_recall_at_10_max value: 63.07509853849101 - type: nauc_recall_at_10_std value: 15.44924084252273 - type: nauc_recall_at_1_diff1 value: 71.46187295357046 - type: nauc_recall_at_1_max value: 46.82038076857106 - type: nauc_recall_at_1_std value: 6.931602615510153 - type: nauc_recall_at_20_diff1 value: 61.64354198229226 - type: nauc_recall_at_20_max value: 63.09950698826864 - type: nauc_recall_at_20_std value: 12.823209698925014 - type: nauc_recall_at_3_diff1 value: 65.63352507252078 - type: nauc_recall_at_3_max value: 45.10210171735505 - type: nauc_recall_at_3_std value: -0.08017546941514365 - type: nauc_recall_at_5_diff1 value: 65.93453179242769 - type: nauc_recall_at_5_max value: 51.97740656606473 - type: nauc_recall_at_5_std value: 4.929967882548962 - type: ndcg_at_1 value: 62.666999999999994 - type: ndcg_at_10 value: 74.434 - type: ndcg_at_100 value: 76.655 - type: ndcg_at_1000 value: 77.08 - type: ndcg_at_20 value: 75.588 - type: ndcg_at_3 value: 69.75099999999999 - type: ndcg_at_5 value: 72.09100000000001 - type: precision_at_1 value: 62.666999999999994 - type: precision_at_10 value: 9.9 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_20 value: 5.2 - type: precision_at_3 value: 27.0 - type: precision_at_5 value: 17.933 - type: recall_at_1 value: 59.494 - type: recall_at_10 value: 87.13300000000001 - type: 
recall_at_100 value: 96.667 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 91.43299999999999 - type: recall_at_3 value: 74.461 - type: recall_at_5 value: 80.34400000000001 task: type: Retrieval - dataset: config: default name: MTEB TRECCOVID-PL revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd split: test type: clarin-knext/trec-covid-pl metrics: - type: main_score value: 82.749 - type: map_at_1 value: 0.20400000000000001 - type: map_at_10 value: 2.099 - type: map_at_100 value: 12.948 - type: map_at_1000 value: 32.007000000000005 - type: map_at_20 value: 3.746 - type: map_at_3 value: 0.651 - type: map_at_5 value: 1.061 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 91.66666666666666 - type: mrr_at_100 value: 91.66666666666666 - type: mrr_at_1000 value: 91.66666666666666 - type: mrr_at_20 value: 91.66666666666666 - type: mrr_at_3 value: 91.66666666666666 - type: mrr_at_5 value: 91.66666666666666 - type: nauc_map_at_1000_diff1 value: 1.0291414165448085 - type: nauc_map_at_1000_max value: 57.33479540784058 - type: nauc_map_at_1000_std value: 76.70364036170582 - type: nauc_map_at_100_diff1 value: 6.949672309533349 - type: nauc_map_at_100_max value: 43.99861611069154 - type: nauc_map_at_100_std value: 64.12473626966596 - type: nauc_map_at_10_diff1 value: 4.208568177173666 - type: nauc_map_at_10_max value: 18.875910045226423 - type: nauc_map_at_10_std value: 34.58171216714189 - type: nauc_map_at_1_diff1 value: 8.433450768728983 - type: nauc_map_at_1_max value: 24.08001091473891 - type: nauc_map_at_1_std value: 35.21473053133869 - type: nauc_map_at_20_diff1 value: 6.041054220619057 - type: nauc_map_at_20_max value: 22.57475437061051 - type: nauc_map_at_20_std value: 35.254808865756964 - type: nauc_map_at_3_diff1 value: 11.166815378728485 - type: nauc_map_at_3_max value: 18.995433996118248 - type: nauc_map_at_3_std value: 34.29696290521795 - type: nauc_map_at_5_diff1 value: 7.1134812647567855 - type: nauc_map_at_5_max value: 20.03877039266845 - type: nauc_map_at_5_std value: 36.21644151312843 - type: nauc_mrr_at_1000_diff1 value: -7.262394669801826 - type: nauc_mrr_at_1000_max value: 66.22378992749366 - type: nauc_mrr_at_1000_std value: 68.18146188516563 - type: nauc_mrr_at_100_diff1 value: -7.262394669801826 - type: nauc_mrr_at_100_max value: 66.22378992749366 - type: nauc_mrr_at_100_std value: 68.18146188516563 - type: nauc_mrr_at_10_diff1 value: -7.262394669801826 - type: nauc_mrr_at_10_max value: 66.22378992749366 - type: nauc_mrr_at_10_std value: 68.18146188516563 - type: nauc_mrr_at_1_diff1 value: -11.38929798723619 - type: nauc_mrr_at_1_max value: 68.58738340697101 - type: nauc_mrr_at_1_std value: 68.00441826215022 - type: nauc_mrr_at_20_diff1 value: -7.262394669801826 - type: nauc_mrr_at_20_max value: 66.22378992749366 - type: nauc_mrr_at_20_std value: 68.18146188516563 - type: nauc_mrr_at_3_diff1 value: -7.262394669801826 - type: nauc_mrr_at_3_max value: 66.22378992749366 - type: nauc_mrr_at_3_std value: 68.18146188516563 - type: nauc_mrr_at_5_diff1 value: -7.262394669801826 - type: nauc_mrr_at_5_max value: 66.22378992749366 - type: nauc_mrr_at_5_std value: 68.18146188516563 - type: nauc_ndcg_at_1000_diff1 value: 2.5628376286433334 - type: nauc_ndcg_at_1000_max value: 57.605148480655025 - type: nauc_ndcg_at_1000_std value: 76.62891677430625 - type: nauc_ndcg_at_100_diff1 value: -13.313083767893671 - type: nauc_ndcg_at_100_max value: 52.932453336031905 - type: nauc_ndcg_at_100_std value: 73.5050466104544 - type: nauc_ndcg_at_10_diff1 value: -6.837803344621873 - 
type: nauc_ndcg_at_10_max value: 59.29833159945462 - type: nauc_ndcg_at_10_std value: 63.719268128346705 - type: nauc_ndcg_at_1_diff1 value: 4.834338452523335 - type: nauc_ndcg_at_1_max value: 53.58546768562144 - type: nauc_ndcg_at_1_std value: 59.07659252386643 - type: nauc_ndcg_at_20_diff1 value: -9.617683189610558 - type: nauc_ndcg_at_20_max value: 54.57354685878183 - type: nauc_ndcg_at_20_std value: 63.15198506529425 - type: nauc_ndcg_at_3_diff1 value: 15.216236580270994 - type: nauc_ndcg_at_3_max value: 58.345749967766416 - type: nauc_ndcg_at_3_std value: 61.78177922399883 - type: nauc_ndcg_at_5_diff1 value: 1.3882436296634026 - type: nauc_ndcg_at_5_max value: 62.44013008368074 - type: nauc_ndcg_at_5_std value: 65.64455986653293 - type: nauc_precision_at_1000_diff1 value: -18.516822124710856 - type: nauc_precision_at_1000_max value: 33.10336267989325 - type: nauc_precision_at_1000_std value: 29.49816019882571 - type: nauc_precision_at_100_diff1 value: -14.113619184538592 - type: nauc_precision_at_100_max value: 55.55228172103563 - type: nauc_precision_at_100_std value: 69.64355056246397 - type: nauc_precision_at_10_diff1 value: -27.271286464111455 - type: nauc_precision_at_10_max value: 61.885272647604594 - type: nauc_precision_at_10_std value: 60.73389705676694 - type: nauc_precision_at_1_diff1 value: -11.38929798723619 - type: nauc_precision_at_1_max value: 68.58738340697101 - type: nauc_precision_at_1_std value: 68.00441826215022 - type: nauc_precision_at_20_diff1 value: -21.53639909310826 - type: nauc_precision_at_20_max value: 53.361537614358376 - type: nauc_precision_at_20_std value: 55.58737187496432 - type: nauc_precision_at_3_diff1 value: 3.785071466384217 - type: nauc_precision_at_3_max value: 61.66906148377818 - type: nauc_precision_at_3_std value: 62.81857369734561 - type: nauc_precision_at_5_diff1 value: -16.00339477131436 - type: nauc_precision_at_5_max value: 61.5246951163262 - type: nauc_precision_at_5_std value: 63.615062452722135 - type: nauc_recall_at_1000_diff1 value: 5.871263115826736 - type: nauc_recall_at_1000_max value: 50.48397949000848 - type: nauc_recall_at_1000_std value: 67.37950715297474 - type: nauc_recall_at_100_diff1 value: 8.310215006893952 - type: nauc_recall_at_100_max value: 28.687726825722386 - type: nauc_recall_at_100_std value: 50.34038560928654 - type: nauc_recall_at_10_diff1 value: 3.3408195168322075 - type: nauc_recall_at_10_max value: 6.89511828305496 - type: nauc_recall_at_10_std value: 22.929267555360028 - type: nauc_recall_at_1_diff1 value: 8.433450768728983 - type: nauc_recall_at_1_max value: 24.08001091473891 - type: nauc_recall_at_1_std value: 35.21473053133869 - type: nauc_recall_at_20_diff1 value: 5.307683260432045 - type: nauc_recall_at_20_max value: 10.025532087519974 - type: nauc_recall_at_20_std value: 24.110512570368947 - type: nauc_recall_at_3_diff1 value: 13.355136074654078 - type: nauc_recall_at_3_max value: 8.568079109800236 - type: nauc_recall_at_3_std value: 23.691593767005745 - type: nauc_recall_at_5_diff1 value: 6.535580157651383 - type: nauc_recall_at_5_max value: 9.1442468749571 - type: nauc_recall_at_5_std value: 27.00111567203191 - type: ndcg_at_1 value: 79.0 - type: ndcg_at_10 value: 82.749 - type: ndcg_at_100 value: 63.846000000000004 - type: ndcg_at_1000 value: 57.691 - type: ndcg_at_20 value: 77.076 - type: ndcg_at_3 value: 84.83800000000001 - type: ndcg_at_5 value: 83.016 - type: precision_at_1 value: 84.0 - type: precision_at_10 value: 87.8 - type: precision_at_100 value: 66.10000000000001 - type: 
precision_at_1000 value: 25.764 - type: precision_at_20 value: 81.10000000000001 - type: precision_at_3 value: 91.333 - type: precision_at_5 value: 88.8 - type: recall_at_1 value: 0.20400000000000001 - type: recall_at_10 value: 2.294 - type: recall_at_100 value: 16.134999999999998 - type: recall_at_1000 value: 54.981 - type: recall_at_20 value: 4.201 - type: recall_at_3 value: 0.699 - type: recall_at_5 value: 1.141 task: type: Retrieval ---

FlagEmbedding

For more details please refer to our Github: FlagEmbedding.

**BGE-Multilingual-Gemma2** is an LLM-based multilingual embedding model. It is trained on a diverse range of languages and tasks based on google/gemma-2-9b. BGE-Multilingual-Gemma2 primarily demonstrates the following advancements:

- Diverse training data: The model's training data spans a broad range of languages, including English, Chinese, Japanese, Korean, French, and more. Additionally, the data covers a variety of task types, such as retrieval, classification, and clustering.
- Outstanding performance: The model exhibits state-of-the-art (SOTA) results on multilingual benchmarks like MIRACL, MTEB-pl, and MTEB-fr. It also achieves excellent performance on other major evaluations, including MTEB, C-MTEB and AIR-Bench.

## 📑 Open-source Plan

- [x] Checkpoint
- [ ] Training Data

We will release the training data of **BGE-Multilingual-Gemma2** in the future.

## Usage

### Using FlagEmbedding

By default, FlagLLMModel will use all available GPUs when encoding. Please set the `CUDA_VISIBLE_DEVICES` environment variable to select specific GPUs, or set it to an empty string to make all GPUs unavailable.
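Below is a minimal, hedged sketch of this interface. `FlagLLMModel` is named above, but the exact constructor arguments (`query_instruction_for_retrieval`, `use_fp16`) and the example texts are assumptions based on typical FlagEmbedding usage, not confirmed by this card.

```python
# Hedged sketch: encoding queries and passages with FlagEmbedding.
# Argument names and example texts are illustrative assumptions.
from FlagEmbedding import FlagLLMModel

model = FlagLLMModel(
    "BAAI/bge-multilingual-gemma2",
    query_instruction_for_retrieval="Given a web search query, retrieve relevant passages that answer the query.",
    use_fp16=True,  # half precision to reduce memory; optional
)

queries = ["how much protein should a female eat"]
passages = ["As a general guideline, an average adult woman needs roughly 46 grams of protein per day."]

query_embeddings = model.encode_queries(queries)
passage_embeddings = model.encode_corpus(passages)

# inner product of normalized embeddings ~ cosine similarity
scores = query_embeddings @ passage_embeddings.T
print(scores)
```

### Using Sentence Transformers

A short sketch with the `sentence-transformers` library; the `model.similarity` helper assumes sentence-transformers >= 3.0.

```python
# Hedged sketch: sentence-transformers usage.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-multilingual-gemma2")

queries = ["how much protein should a female eat"]
passages = ["As a general guideline, an average adult woman needs roughly 46 grams of protein per day."]

query_embeddings = model.encode(queries)
passage_embeddings = model.encode(passages)

# cosine-style similarity matrix between queries and passages
print(model.similarity(query_embeddings, passage_embeddings))
```

### Using HuggingFace Transformers

A hedged sketch with plain `transformers`; last-token pooling and L2 normalization are assumptions based on common practice for decoder-based embedding models, not confirmed by this card.

```python
# Hedged sketch: plain transformers with assumed last-token pooling.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-multilingual-gemma2")
tokenizer.padding_side = "right"  # the pooling below assumes right padding
model = AutoModel.from_pretrained("BAAI/bge-multilingual-gemma2")
model.eval()

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Hidden state of the last non-padding token for each sequence.
    last_indices = attention_mask.sum(dim=1) - 1
    return hidden_states[torch.arange(hidden_states.size(0)), last_indices]

texts = ["how much protein should a female eat"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-normalize for cosine similarity
```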
## Evaluation

BGE-Multilingual-Gemma2 exhibits **state-of-the-art (SOTA) results on benchmarks like MIRACL, MTEB-pl, and MTEB-fr**. It also achieves excellent performance on other major evaluations, including MTEB, C-MTEB and AIR-Bench.

- **MIRACL** nDCG@10: \"MIRACL-nDCG@10\"; Recall@100: \"MIRACL-Recall@100\"
- **MTEB-fr/pl**: \"MTEB-fr/pl\"
- **MTEB**: \"MTEB\"
- **BEIR**: \"BEIR\"
- **C-MTEB**: \"C-MTEB\"
- **AIR-Bench** Long-Doc (en, Recall@10): \"AIR-Bench_Long-Doc\"; QA (en&zh, nDCG@10): \"AIR-Bench_QA\"

## Model List

`bge` is short for `BAAI general embedding`.

| Model | Language | | Description | query instruction for retrieval [1] |
| :---- | :----: | :----: | :---- | :---- |
| BAAI/bge-multilingual-gemma2 | Multilingual | - | An LLM-based multilingual embedding model, trained on a diverse range of languages and tasks. | |
| BAAI/bge-en-icl | English | - | An LLM-based dense retriever with in-context learning capabilities that can fully leverage the model's potential based on a few-shot examples (4096 tokens) | Provide instructions and few-shot examples freely based on the given task. |
| BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (colbert)), Multi-Linguality, and Multi-Granularity (8192 tokens) | |
| BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README |
| BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | |
| BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to bge-large-en | |
| BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | |
| BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | |
| BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to bge-large-zh | |
| BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | |

## Citation

If you find this repository useful, please consider giving a star :star: and citation" +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-reranker-base.json b/data/model_data_json/BAAI_bge-reranker-base.json new file mode 100644 index 0000000000000000000000000000000000000000..73185384593f09d2b31be54628cdfbfcf98dc690 --- /dev/null +++ b/data/model_data_json/BAAI_bge-reranker-base.json @@ -0,0 +1,26 @@ +{ + "model_id": "BAAI/bge-reranker-base", + "downloads": 1113449, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "xlm-roberta", + "mteb", + "text-embeddings-inference", + "text-classification", + "en", + "zh", + "arxiv:2401.03462", + "arxiv:2312.15503", + "arxiv:2311.13534", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "region:us" + ], + "description": "--- license: mit language: - en - zh tags: - mteb - text-embeddings-inference model-index: - name: bge-reranker-base results: - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 81.27206722525007 - type: mrr value: 84.14238095238095 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: - type: map value: 84.10369934291236 - type: mrr value: 86.79376984126984 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value:
35.4600511272538 - type: mrr value: 34.60238095238095 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 67.27728847727172 - type: mrr value: 77.1315192743764 pipeline_tag: text-classification library_name: sentence-transformers --- **We have updated the new reranker, supporting larger lengths, more languages, and achieving better performance.**

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Citation | License

**For more details, please refer to our Github: FlagEmbedding.** English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, currently consisting of the following projects: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM**: LM-Cocktail - **Embedding Model**: Visualized-BGE, BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: llm rerankers, BGE Reranker - **Benchmark**: C-MTEB ## News - 3/18/2024: Release new rerankers, built upon powerful M3 and LLM (GEMMA and MiniCPM, not so large actually) backbones, supporting multi-lingual processing and larger inputs, with massive improvements in ranking performance on BEIR, C-MTEB/Retrieval, MIRACL, LlamaIndex Evaluation. - 3/18/2024: Release Visualized-BGE, equipping BGE with visual capabilities. Visualized-BGE can be utilized to generate embeddings for hybrid image-text data. - 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction.
More - 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**; you can use it like this (see the sketch after this News section). C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmarks!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
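The Langchain integration mentioned in the news above might look like the following minimal sketch (assuming the `langchain-community` and `sentence-transformers` packages; the model name and instruction are illustrative, not the card's official snippet):

```python
# Hypothetical quick-start for BGE embeddings inside Langchain.
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

embeddings = HuggingFaceBgeEmbeddings(
    model_name='BAAI/bge-base-en-v1.5',
    encode_kwargs={'normalize_embeddings': True},  # unit vectors, so dot product = cosine
    query_instruction='Represent this sentence for searching relevant passages: ',
)
vector = embeddings.embed_query('What is BGE?')
print(len(vector))  # 768 dimensions for the base-scale English model
```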
## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search for relevant passages to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from the embedding model, the reranker uses question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other simple models. For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions
1. How to fine-tune the bge embedding model? Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity; it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker. Refer to this example for fine-tuning the reranker.
2. The similarity score between two dissimilar sentences is higher than 0.5 **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution of your data (such as 0.8, 0.85, or even 0.9); see the sketch at the end of this FAQ section.
3. When does the query instruction need to be used? For the , we improve its retrieval ability when not using instruction, and using no instruction causes only a slight degradation in retrieval performance compared with using one. So you can generate embeddings without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages.
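As referenced in FAQ 2, the sketch below illustrates calibrating a threshold on cosine scores rather than assuming 0.5 is meaningful (a minimal sketch assuming the `sentence-transformers` package; the model name, sentences, and the 0.85 threshold are illustrative):

```python
# Scores from bge models cluster in roughly [0.6, 1], so calibrate thresholds
# on your own data instead of treating 0.5 as the similar/dissimilar boundary.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-base-en-v1.5')
emb = model.encode(
    ['A man is eating food.', 'The weather is nice today.'],
    normalize_embeddings=True,  # unit vectors, so dot product = cosine similarity
)
score = float(emb[0] @ emb[1])
print(score)                # can exceed 0.5 even for unrelated sentences
is_similar = score >= 0.85  # threshold chosen from your data's score distribution
```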
## Usage ### Usage for Embedding Model Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, see FlagEmbedding for more installation methods. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For the s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from the embedding model, the reranker uses question and document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based on cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance); a minimal sketch follows after the Evaluation intro below. #### Using Huggingface transformers #### Using the reranker with ONNX files #### Using the reranker with infinity It's also possible to deploy the onnx/torch files with the infinity_emb pip package. ## Evaluation The models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts.
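As referenced in the Usage section above, reranker scoring with FlagEmbedding might look like this minimal sketch (assuming `pip install -U FlagEmbedding`; the query/passage pair is illustrative):

```python
# Score a (query, passage) pair with the bge reranker via FlagEmbedding.
from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-base', use_fp16=True)  # fp16 speeds up inference
score = reranker.compute_score(
    ['what is panda?', 'The giant panda is a bear species endemic to China.']
)
print(score)  # an unbounded logit; higher means more relevant
```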
- **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. 
| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned. For more training details for bge, see baai_general_embedding. ### BGE Reranker The cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model.
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can easily fine-tune it following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Reranks text documents to improve retrieval performance by prioritizing relevant content based on query-document relationships." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-reranker-large.json b/data/model_data_json/BAAI_bge-reranker-large.json new file mode 100644 index 0000000000000000000000000000000000000000..74f16406f81eab61481cc642110a56e56ac0707e --- /dev/null +++ b/data/model_data_json/BAAI_bge-reranker-large.json @@ -0,0 +1,29 @@ +{ + "model_id": "BAAI/bge-reranker-large", + "downloads": 482447, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "xlm-roberta", + "text-classification", + "mteb", + "feature-extraction", + "en", + "zh", + "arxiv:2401.03462", + "arxiv:2312.15503", + "arxiv:2311.13534", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - en - zh tags: - mteb model-index: - name: bge-reranker-base results: - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 81.27206722525007 - type: mrr value: 84.14238095238095 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: - type: map value: 84.10369934291236 - type: mrr value: 86.79376984126984 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 35.4600511272538 - type: mrr value: 34.60238095238095 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 67.27728847727172 - type: mrr value: 77.1315192743764 pipeline_tag: feature-extraction --- **We have updated the new reranker, supporting larger lengths, more languages, and achieving better performance.**

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Citation | License

**For more details, please refer to our Github: FlagEmbedding.** English | 中文 FlagEmbedding focuses on retrieval-augmented LLMs, currently consisting of the following projects: - **Long-Context LLM**: Activation Beacon - **Fine-tuning of LM**: LM-Cocktail - **Embedding Model**: Visualized-BGE, BGE-M3, LLM Embedder, BGE Embedding - **Reranker Model**: llm rerankers, BGE Reranker - **Benchmark**: C-MTEB ## News - 3/18/2024: Release new rerankers, built upon powerful M3 and LLM (GEMMA and MiniCPM, not so large actually) backbones, supporting multi-lingual processing and larger inputs, with massive improvements in ranking performance on BEIR, C-MTEB/Retrieval, MIRACL, LlamaIndex Evaluation. - 3/18/2024: Release Visualized-BGE, equipping BGE with visual capabilities. Visualized-BGE can be utilized to generate embeddings for hybrid image-text data. - 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularities (input length up to 8192), **M**ulti-Functionality (unification of dense, lexical, multi-vec/colbert retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire: - 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. Technical Report :fire: - 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever, leading to state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report - 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire: - 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report - 09/15/2023: The technical report of BGE has been released - 09/15/2023: The massive training data of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models and , which are more powerful than embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models. - **update embedding model**: release embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction.
More - 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like this; C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmarks!** :tada: :tada: (a minimal encoding sketch follows this News section) - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
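As a concrete illustration of the embedding models announced above, here is a minimal encoding sketch using FlagEmbedding's FlagModel (assuming `pip install -U FlagEmbedding`; the model name, instruction, and sentences are illustrative):

```python
# Encode queries and passages with a bge embedding model via FlagEmbedding.
from FlagEmbedding import FlagModel

model = FlagModel(
    'BAAI/bge-large-en-v1.5',
    query_instruction_for_retrieval='Represent this sentence for searching relevant passages: ',
    use_fp16=True,  # fp16 speeds up encoding with a slight accuracy cost
)
queries = ['how do pandas reproduce?']
passages = ['Giant pandas reach sexual maturity between four and eight years of age.']
q_emb = model.encode_queries(queries)  # instruction is prepended to queries only
p_emb = model.encode(passages)         # passages are encoded as-is
print(q_emb @ p_emb.T)                 # inner-product relevance scores
```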
## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality(dense retrieval, sparse retrieval, multi-vector(colbert)), Multi-Linguality, and Multi-Granularity(8192 tokens) | | | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\\]: If you need to search for relevant passages to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\\]: Different from the embedding model, the reranker uses question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other simple models. For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results. All models have been uploaded to Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions
1. How to fine-tune the bge embedding model? Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve retrieval performance. - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity; it must be fine-tuned with contrastive learning before computing similarity. - If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker. Refer to this example for fine-tuning the reranker.
2. The similarity score between two dissimilar sentences is higher than 0.5 **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution of your data (such as 0.8, 0.85, or even 0.9).
3. When does the query instruction need to be used? For the , we improve its retrieval ability when not using instruction, and using no instruction causes only a slight degradation in retrieval performance compared with using one. So you can generate embeddings without instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages, as sketched after this FAQ section.
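As referenced in FAQ 3, the following minimal sketch shows prepending the retrieval instruction to short queries only (assuming the `sentence-transformers` package; the model name, instruction text, and sentences are illustrative):

```python
# Prepend the retrieval instruction to short queries, never to passages.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-large-en-v1.5')
instruction = 'Represent this sentence for searching relevant passages: '
queries = ['how do pandas reproduce?']
passages = ['Giant pandas reach sexual maturity between four and eight years of age.']
q_emb = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)  # no instruction here
print(q_emb @ p_emb.T)  # cosine relevance scores
```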
## Usage ### Usage for Embedding Model Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, see FlagEmbedding for more installation methods. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For the s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from the embedding model, the reranker uses question and document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting query and passage to the reranker. The reranker is optimized based on cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers A minimal sketch follows after the Evaluation intro below. #### Using the reranker with ONNX files #### Using the reranker with infinity It's also possible to deploy the onnx/torch files with the infinity_emb pip package. ## Evaluation The models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts.
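The Huggingface transformers reranker usage referenced above might look like this minimal sketch (assuming the `transformers` and `torch` packages; the query/passage pair is illustrative):

```python
# Score (query, passage) pairs with bge-reranker-large as a sequence classifier.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
model.eval()

pairs = [['what is panda?', 'The giant panda is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(
        pairs, padding=True, truncation=True, max_length=512, return_tensors='pt'
    )
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
print(scores)  # higher logits indicate more relevance
```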
- **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 | - **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. 
| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for evaluation script. | Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned. For more training details for bge, see baai_general_embedding. ### BGE Reranker The cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model.
We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can easily fine-tune it following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Reranks documents or search results to improve relevance, supporting multiple languages and longer input lengths for enhanced retrieval performance." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-reranker-v2-m3.json b/data/model_data_json/BAAI_bge-reranker-v2-m3.json new file mode 100644 index 0000000000000000000000000000000000000000..4af8b1c1ab831f24e8311d01cf30f1ccb1ce8a34 --- /dev/null +++ b/data/model_data_json/BAAI_bge-reranker-v2-m3.json @@ -0,0 +1,19 @@ +{ + "model_id": "BAAI/bge-reranker-v2-m3", + "downloads": 1891532, + "tags": [ + "sentence-transformers", + "safetensors", + "xlm-roberta", + "text-classification", + "transformers", + "text-embeddings-inference", + "multilingual", + "arxiv:2312.15503", + "arxiv:2402.03216", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: text-classification tags: - transformers - sentence-transformers - text-embeddings-inference language: - multilingual --- # Reranker **For more details, please refer to our Github: FlagEmbedding.** - Model List - Usage - Fine-tuning - Evaluation - Citation Different from an embedding model, a reranker uses question and document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting query and passage to the reranker, and the score can be mapped to a float value in [0,1] by a sigmoid function. ## Model List | Model | Base model | Language | layerwise | feature | |:--------------------------------------------------------------------------|:--------:|:-----------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:| | BAAI/bge-reranker-base | xlm-roberta-base | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. | | BAAI/bge-reranker-large | xlm-roberta-large | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. | | BAAI/bge-reranker-v2-m3 | bge-m3 | Multilingual | - | Lightweight reranker model, possesses strong multilingual capabilities, easy to deploy, with fast inference. | | BAAI/bge-reranker-v2-gemma | gemma-2b | Multilingual | - | Suitable for multilingual contexts, performs well in both English proficiency and multilingual capabilities. | | BAAI/bge-reranker-v2-minicpm-layerwise | MiniCPM-2B-dpo-bf16 | Multilingual | 8-40 | Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers for output, facilitating accelerated inference. | You can select the model according to your scenario and resources. - For **multilingual**, utilize BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-gemma - For **Chinese or English**, utilize BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-minicpm-layerwise.
- For **efficiency**, utilize BAAI/bge-reranker-v2-m3 and the low layers of BAAI/bge-reranker-v2-minicpm-layerwise. - For **better performance**, we recommend BAAI/bge-reranker-v2-minicpm-layerwise and BAAI/bge-reranker-v2-gemma ## Usage ### Using FlagEmbedding #### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3) Get relevance scores (higher scores indicate more relevance): #### For LLM-based reranker #### For LLM-based layerwise reranker ### Using Huggingface transformers #### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3) Get relevance scores (higher scores indicate more relevance): #### For LLM-based reranker #### For LLM-based layerwise reranker ## Fine-tune ### Data Format Train data should be a json file, where each line is a dict like this: is the query, is a list of positive texts, is a list of negative texts, and indicates the relationship between query and texts. If you have no negative texts for a query, you can randomly sample some from the entire corpus as negatives. See toy_finetune_data.jsonl for a toy data file. ### Train You can fine-tune the reranker with the following code: **For llm-based reranker** **For llm-based layerwise reranker** Our rerankers are initialized from google/gemma-2b (for the llm-based reranker) and openbmb/MiniCPM-2B-dpo-bf16 (for the llm-based layerwise reranker), and we train them on a mixture of multilingual datasets: - bge-m3-data - quora train data - fever train data ## Evaluation Results are reported as figures in the original card: - llama-index. - BEIR: re-ranking the top 100 results from bge-en-v1.5 large, and re-ranking the top 100 results from e5 mistral 7b instruct. - CMTEB-retrieval: re-ranking the top 100 results from bge-zh-v1.5 large. - miracl (multi-language): re-ranking the top 100 results from bge-m3. ## Citation If you find this repository useful, please consider giving a star and citation", + "model_explanation_gemini": "Computes relevance scores between queries and documents in multiple languages to rank passages by similarity."
+} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-small-en-v1.5.json b/data/model_data_json/BAAI_bge-small-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..dddb49d8a4dfcb1070b75580d5fe6292ee55e034 --- /dev/null +++ b/data/model_data_json/BAAI_bge-small-en-v1.5.json @@ -0,0 +1,29 @@ +{ + "model_id": "BAAI/bge-small-en-v1.5", + "downloads": 3366269, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "mteb", + "en", + "arxiv:2401.03462", + "arxiv:2312.15503", + "arxiv:2311.13534", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge-small-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.79104477611939 - type: ap value: 37.21923821573361 - type: f1 value: 68.0914945617093 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.75377499999999 - type: ap value: 89.46766124546022 - type: f1 value: 92.73884001331487 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.986 - type: f1 value: 46.55936786727896 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 35.846000000000004 - type: map_at_10 value: 51.388 - type: map_at_100 value: 52.132999999999996 - type: map_at_1000 value: 52.141000000000005 - type: map_at_3 value: 47.037 - type: map_at_5 value: 49.579 - type: mrr_at_1 value: 36.558 - type: mrr_at_10 value: 51.658 - type: mrr_at_100 value: 52.402 - type: mrr_at_1000 value: 52.410000000000004 - type: mrr_at_3 value: 47.345 - type: mrr_at_5 value: 49.797999999999995 - type: ndcg_at_1 value: 35.846000000000004 - type: ndcg_at_10 value: 59.550000000000004 - type: ndcg_at_100 value: 62.596 - type: ndcg_at_1000 value: 62.759 - type: ndcg_at_3 value: 50.666999999999994 - type: ndcg_at_5 value: 55.228 - type: precision_at_1 value: 35.846000000000004 - type: precision_at_10 value: 8.542 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.389 - type: precision_at_5 value: 14.438 - type: recall_at_1 value: 35.846000000000004 - type: recall_at_10 value: 85.42 - type: recall_at_100 value: 98.43499999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 61.166 - type: recall_at_5 value: 72.191 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.402770198163594 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 
metrics: - type: v_measure value: 40.01545436974177 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.586465273207196 - type: mrr value: 74.42169019038825 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.1891186537969 - type: cos_sim_spearman value: 83.75492046087288 - type: euclidean_pearson value: 84.11766204805357 - type: euclidean_spearman value: 84.01456493126516 - type: manhattan_pearson value: 84.2132950502772 - type: manhattan_spearman value: 83.89227298813377 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.74025974025975 - type: f1 value: 85.71493566466381 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 38.467181385006434 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 34.719496037339056 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.587000000000003 - type: map_at_10 value: 41.114 - type: map_at_100 value: 42.532 - type: map_at_1000 value: 42.661 - type: map_at_3 value: 37.483 - type: map_at_5 value: 39.652 - type: mrr_at_1 value: 36.338 - type: mrr_at_10 value: 46.763 - type: mrr_at_100 value: 47.393 - type: mrr_at_1000 value: 47.445 - type: mrr_at_3 value: 43.538 - type: mrr_at_5 value: 45.556000000000004 - type: ndcg_at_1 value: 36.338 - type: ndcg_at_10 value: 47.658 - type: ndcg_at_100 value: 52.824000000000005 - type: ndcg_at_1000 value: 54.913999999999994 - type: ndcg_at_3 value: 41.989 - type: ndcg_at_5 value: 44.944 - type: precision_at_1 value: 36.338 - type: precision_at_10 value: 9.156 - type: precision_at_100 value: 1.4789999999999999 - type: precision_at_1000 value: 0.196 - type: precision_at_3 value: 20.076 - type: precision_at_5 value: 14.85 - type: recall_at_1 value: 29.587000000000003 - type: recall_at_10 value: 60.746 - type: recall_at_100 value: 82.157 - type: recall_at_1000 value: 95.645 - type: recall_at_3 value: 44.821 - type: recall_at_5 value: 52.819 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.239 - type: map_at_10 value: 39.989000000000004 - type: map_at_100 value: 41.196 - type: map_at_1000 value: 41.325 - type: map_at_3 value: 37.261 - type: map_at_5 value: 38.833 - type: mrr_at_1 value: 37.516 - type: mrr_at_10 value: 46.177 - type: mrr_at_100 value: 46.806 - type: mrr_at_1000 value: 46.849000000000004 - type: mrr_at_3 value: 44.002 - type: mrr_at_5 value: 45.34 - type: ndcg_at_1 value: 37.516 - type: ndcg_at_10 value: 45.586 - type: ndcg_at_100 value: 49.897000000000006 - type: ndcg_at_1000 value: 51.955 - type: ndcg_at_3 value: 41.684 - type: ndcg_at_5 value: 43.617 - type: precision_at_1 value: 37.516 - type: 
precision_at_10 value: 8.522 - type: precision_at_100 value: 1.374 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 20.105999999999998 - type: precision_at_5 value: 14.152999999999999 - type: recall_at_1 value: 30.239 - type: recall_at_10 value: 55.03 - type: recall_at_100 value: 73.375 - type: recall_at_1000 value: 86.29599999999999 - type: recall_at_3 value: 43.269000000000005 - type: recall_at_5 value: 48.878 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.338 - type: map_at_10 value: 50.468999999999994 - type: map_at_100 value: 51.553000000000004 - type: map_at_1000 value: 51.608 - type: map_at_3 value: 47.107 - type: map_at_5 value: 49.101 - type: mrr_at_1 value: 44.201 - type: mrr_at_10 value: 54.057 - type: mrr_at_100 value: 54.764 - type: mrr_at_1000 value: 54.791000000000004 - type: mrr_at_3 value: 51.56699999999999 - type: mrr_at_5 value: 53.05 - type: ndcg_at_1 value: 44.201 - type: ndcg_at_10 value: 56.379000000000005 - type: ndcg_at_100 value: 60.645 - type: ndcg_at_1000 value: 61.73499999999999 - type: ndcg_at_3 value: 50.726000000000006 - type: ndcg_at_5 value: 53.58500000000001 - type: precision_at_1 value: 44.201 - type: precision_at_10 value: 9.141 - type: precision_at_100 value: 1.216 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 22.654 - type: precision_at_5 value: 15.723999999999998 - type: recall_at_1 value: 38.338 - type: recall_at_10 value: 70.30499999999999 - type: recall_at_100 value: 88.77199999999999 - type: recall_at_1000 value: 96.49799999999999 - type: recall_at_3 value: 55.218 - type: recall_at_5 value: 62.104000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.682 - type: map_at_10 value: 33.498 - type: map_at_100 value: 34.461000000000006 - type: map_at_1000 value: 34.544000000000004 - type: map_at_3 value: 30.503999999999998 - type: map_at_5 value: 32.216 - type: mrr_at_1 value: 27.683999999999997 - type: mrr_at_10 value: 35.467999999999996 - type: mrr_at_100 value: 36.32 - type: mrr_at_1000 value: 36.386 - type: mrr_at_3 value: 32.618 - type: mrr_at_5 value: 34.262 - type: ndcg_at_1 value: 27.683999999999997 - type: ndcg_at_10 value: 38.378 - type: ndcg_at_100 value: 43.288 - type: ndcg_at_1000 value: 45.413 - type: ndcg_at_3 value: 32.586 - type: ndcg_at_5 value: 35.499 - type: precision_at_1 value: 27.683999999999997 - type: precision_at_10 value: 5.864 - type: precision_at_100 value: 0.882 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 13.446 - type: precision_at_5 value: 9.718 - type: recall_at_1 value: 25.682 - type: recall_at_10 value: 51.712 - type: recall_at_100 value: 74.446 - type: recall_at_1000 value: 90.472 - type: recall_at_3 value: 36.236000000000004 - type: recall_at_5 value: 43.234 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.073999999999998 - type: map_at_10 value: 24.352999999999998 - type: map_at_100 value: 25.438 - type: map_at_1000 value: 25.545 - type: map_at_3 value: 21.614 - type: map_at_5 value: 23.104 - type: mrr_at_1 value: 19.776 - type: mrr_at_10 value: 28.837000000000003 - type: mrr_at_100 value: 29.755 - type: mrr_at_1000 value: 29.817 - type: mrr_at_3 value: 26.201999999999998 - 
type: mrr_at_5 value: 27.714 - type: ndcg_at_1 value: 19.776 - type: ndcg_at_10 value: 29.701 - type: ndcg_at_100 value: 35.307 - type: ndcg_at_1000 value: 37.942 - type: ndcg_at_3 value: 24.764 - type: ndcg_at_5 value: 27.025 - type: precision_at_1 value: 19.776 - type: precision_at_10 value: 5.659 - type: precision_at_100 value: 0.971 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 12.065 - type: precision_at_5 value: 8.905000000000001 - type: recall_at_1 value: 16.073999999999998 - type: recall_at_10 value: 41.647 - type: recall_at_100 value: 66.884 - type: recall_at_1000 value: 85.91499999999999 - type: recall_at_3 value: 27.916 - type: recall_at_5 value: 33.729 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.444999999999997 - type: map_at_10 value: 38.218999999999994 - type: map_at_100 value: 39.595 - type: map_at_1000 value: 39.709 - type: map_at_3 value: 35.586 - type: map_at_5 value: 36.895 - type: mrr_at_1 value: 34.841 - type: mrr_at_10 value: 44.106 - type: mrr_at_100 value: 44.98 - type: mrr_at_1000 value: 45.03 - type: mrr_at_3 value: 41.979 - type: mrr_at_5 value: 43.047999999999995 - type: ndcg_at_1 value: 34.841 - type: ndcg_at_10 value: 43.922 - type: ndcg_at_100 value: 49.504999999999995 - type: ndcg_at_1000 value: 51.675000000000004 - type: ndcg_at_3 value: 39.858 - type: ndcg_at_5 value: 41.408 - type: precision_at_1 value: 34.841 - type: precision_at_10 value: 7.872999999999999 - type: precision_at_100 value: 1.2449999999999999 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 18.993 - type: precision_at_5 value: 13.032 - type: recall_at_1 value: 28.444999999999997 - type: recall_at_10 value: 54.984 - type: recall_at_100 value: 78.342 - type: recall_at_1000 value: 92.77 - type: recall_at_3 value: 42.842999999999996 - type: recall_at_5 value: 47.247 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.072 - type: map_at_10 value: 32.354 - type: map_at_100 value: 33.800000000000004 - type: map_at_1000 value: 33.908 - type: map_at_3 value: 29.232000000000003 - type: map_at_5 value: 31.049 - type: mrr_at_1 value: 29.110000000000003 - type: mrr_at_10 value: 38.03 - type: mrr_at_100 value: 39.032 - type: mrr_at_1000 value: 39.086999999999996 - type: mrr_at_3 value: 35.407 - type: mrr_at_5 value: 36.76 - type: ndcg_at_1 value: 29.110000000000003 - type: ndcg_at_10 value: 38.231 - type: ndcg_at_100 value: 44.425 - type: ndcg_at_1000 value: 46.771 - type: ndcg_at_3 value: 33.095 - type: ndcg_at_5 value: 35.459 - type: precision_at_1 value: 29.110000000000003 - type: precision_at_10 value: 7.215000000000001 - type: precision_at_100 value: 1.2109999999999999 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 16.058 - type: precision_at_5 value: 11.644 - type: recall_at_1 value: 23.072 - type: recall_at_10 value: 50.285999999999994 - type: recall_at_100 value: 76.596 - type: recall_at_1000 value: 92.861 - type: recall_at_3 value: 35.702 - type: recall_at_5 value: 42.152 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.937916666666666 - type: map_at_10 value: 33.755250000000004 - type: map_at_100 value: 34.955999999999996 - type: map_at_1000 value: 
35.070499999999996 - type: map_at_3 value: 30.98708333333333 - type: map_at_5 value: 32.51491666666666 - type: mrr_at_1 value: 29.48708333333333 - type: mrr_at_10 value: 37.92183333333334 - type: mrr_at_100 value: 38.76583333333333 - type: mrr_at_1000 value: 38.82466666666667 - type: mrr_at_3 value: 35.45125 - type: mrr_at_5 value: 36.827000000000005 - type: ndcg_at_1 value: 29.48708333333333 - type: ndcg_at_10 value: 39.05225 - type: ndcg_at_100 value: 44.25983333333334 - type: ndcg_at_1000 value: 46.568333333333335 - type: ndcg_at_3 value: 34.271583333333325 - type: ndcg_at_5 value: 36.483916666666666 - type: precision_at_1 value: 29.48708333333333 - type: precision_at_10 value: 6.865749999999999 - type: precision_at_100 value: 1.1195833333333332 - type: precision_at_1000 value: 0.15058333333333335 - type: precision_at_3 value: 15.742083333333333 - type: precision_at_5 value: 11.221916666666667 - type: recall_at_1 value: 24.937916666666666 - type: recall_at_10 value: 50.650416666666665 - type: recall_at_100 value: 73.55383333333334 - type: recall_at_1000 value: 89.61691666666667 - type: recall_at_3 value: 37.27808333333334 - type: recall_at_5 value: 42.99475 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.947 - type: map_at_10 value: 30.575000000000003 - type: map_at_100 value: 31.465 - type: map_at_1000 value: 31.558000000000003 - type: map_at_3 value: 28.814 - type: map_at_5 value: 29.738999999999997 - type: mrr_at_1 value: 26.994 - type: mrr_at_10 value: 33.415 - type: mrr_at_100 value: 34.18 - type: mrr_at_1000 value: 34.245 - type: mrr_at_3 value: 31.621 - type: mrr_at_5 value: 32.549 - type: ndcg_at_1 value: 26.994 - type: ndcg_at_10 value: 34.482 - type: ndcg_at_100 value: 38.915 - type: ndcg_at_1000 value: 41.355 - type: ndcg_at_3 value: 31.139 - type: ndcg_at_5 value: 32.589 - type: precision_at_1 value: 26.994 - type: precision_at_10 value: 5.322 - type: precision_at_100 value: 0.8160000000000001 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 13.344000000000001 - type: precision_at_5 value: 8.988 - type: recall_at_1 value: 23.947 - type: recall_at_10 value: 43.647999999999996 - type: recall_at_100 value: 63.851 - type: recall_at_1000 value: 82.0 - type: recall_at_3 value: 34.288000000000004 - type: recall_at_5 value: 38.117000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.197 - type: map_at_10 value: 22.968 - type: map_at_100 value: 24.095 - type: map_at_1000 value: 24.217 - type: map_at_3 value: 20.771 - type: map_at_5 value: 21.995 - type: mrr_at_1 value: 19.511 - type: mrr_at_10 value: 26.55 - type: mrr_at_100 value: 27.500999999999998 - type: mrr_at_1000 value: 27.578999999999997 - type: mrr_at_3 value: 24.421 - type: mrr_at_5 value: 25.604 - type: ndcg_at_1 value: 19.511 - type: ndcg_at_10 value: 27.386 - type: ndcg_at_100 value: 32.828 - type: ndcg_at_1000 value: 35.739 - type: ndcg_at_3 value: 23.405 - type: ndcg_at_5 value: 25.255 - type: precision_at_1 value: 19.511 - type: precision_at_10 value: 5.017 - type: precision_at_100 value: 0.91 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 11.023 - type: precision_at_5 value: 8.025 - type: recall_at_1 value: 16.197 - type: recall_at_10 value: 37.09 - type: recall_at_100 value: 61.778 - type: recall_at_1000 value: 
82.56599999999999 - type: recall_at_3 value: 26.034000000000002 - type: recall_at_5 value: 30.762 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.41 - type: map_at_10 value: 33.655 - type: map_at_100 value: 34.892 - type: map_at_1000 value: 34.995 - type: map_at_3 value: 30.94 - type: map_at_5 value: 32.303 - type: mrr_at_1 value: 29.477999999999998 - type: mrr_at_10 value: 37.443 - type: mrr_at_100 value: 38.383 - type: mrr_at_1000 value: 38.440000000000005 - type: mrr_at_3 value: 34.949999999999996 - type: mrr_at_5 value: 36.228 - type: ndcg_at_1 value: 29.477999999999998 - type: ndcg_at_10 value: 38.769 - type: ndcg_at_100 value: 44.245000000000005 - type: ndcg_at_1000 value: 46.593 - type: ndcg_at_3 value: 33.623 - type: ndcg_at_5 value: 35.766 - type: precision_at_1 value: 29.477999999999998 - type: precision_at_10 value: 6.455 - type: precision_at_100 value: 1.032 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 14.893999999999998 - type: precision_at_5 value: 10.485 - type: recall_at_1 value: 25.41 - type: recall_at_10 value: 50.669 - type: recall_at_100 value: 74.084 - type: recall_at_1000 value: 90.435 - type: recall_at_3 value: 36.679 - type: recall_at_5 value: 41.94 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.339 - type: map_at_10 value: 31.852000000000004 - type: map_at_100 value: 33.411 - type: map_at_1000 value: 33.62 - type: map_at_3 value: 28.929 - type: map_at_5 value: 30.542 - type: mrr_at_1 value: 28.063 - type: mrr_at_10 value: 36.301 - type: mrr_at_100 value: 37.288 - type: mrr_at_1000 value: 37.349 - type: mrr_at_3 value: 33.663 - type: mrr_at_5 value: 35.165 - type: ndcg_at_1 value: 28.063 - type: ndcg_at_10 value: 37.462 - type: ndcg_at_100 value: 43.620999999999995 - type: ndcg_at_1000 value: 46.211 - type: ndcg_at_3 value: 32.68 - type: ndcg_at_5 value: 34.981 - type: precision_at_1 value: 28.063 - type: precision_at_10 value: 7.1739999999999995 - type: precision_at_100 value: 1.486 - type: precision_at_1000 value: 0.23500000000000001 - type: precision_at_3 value: 15.217 - type: precision_at_5 value: 11.265 - type: recall_at_1 value: 23.339 - type: recall_at_10 value: 48.376999999999995 - type: recall_at_100 value: 76.053 - type: recall_at_1000 value: 92.455 - type: recall_at_3 value: 34.735 - type: recall_at_5 value: 40.71 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.925 - type: map_at_10 value: 26.017000000000003 - type: map_at_100 value: 27.034000000000002 - type: map_at_1000 value: 27.156000000000002 - type: map_at_3 value: 23.604 - type: map_at_5 value: 24.75 - type: mrr_at_1 value: 20.333000000000002 - type: mrr_at_10 value: 27.915 - type: mrr_at_100 value: 28.788000000000004 - type: mrr_at_1000 value: 28.877999999999997 - type: mrr_at_3 value: 25.446999999999996 - type: mrr_at_5 value: 26.648 - type: ndcg_at_1 value: 20.333000000000002 - type: ndcg_at_10 value: 30.673000000000002 - type: ndcg_at_100 value: 35.618 - type: ndcg_at_1000 value: 38.517 - type: ndcg_at_3 value: 25.71 - type: ndcg_at_5 value: 27.679 - type: precision_at_1 value: 20.333000000000002 - type: precision_at_10 value: 4.9910000000000005 - type: precision_at_100 value: 0.8130000000000001 - 
type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 11.029 - type: precision_at_5 value: 7.8740000000000006 - type: recall_at_1 value: 18.925 - type: recall_at_10 value: 43.311 - type: recall_at_100 value: 66.308 - type: recall_at_1000 value: 87.49 - type: recall_at_3 value: 29.596 - type: recall_at_5 value: 34.245 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.714 - type: map_at_10 value: 23.194 - type: map_at_100 value: 24.976000000000003 - type: map_at_1000 value: 25.166 - type: map_at_3 value: 19.709 - type: map_at_5 value: 21.523999999999997 - type: mrr_at_1 value: 30.619000000000003 - type: mrr_at_10 value: 42.563 - type: mrr_at_100 value: 43.386 - type: mrr_at_1000 value: 43.423 - type: mrr_at_3 value: 39.555 - type: mrr_at_5 value: 41.268 - type: ndcg_at_1 value: 30.619000000000003 - type: ndcg_at_10 value: 31.836 - type: ndcg_at_100 value: 38.652 - type: ndcg_at_1000 value: 42.088 - type: ndcg_at_3 value: 26.733 - type: ndcg_at_5 value: 28.435 - type: precision_at_1 value: 30.619000000000003 - type: precision_at_10 value: 9.751999999999999 - type: precision_at_100 value: 1.71 - type: precision_at_1000 value: 0.23500000000000001 - type: precision_at_3 value: 19.935 - type: precision_at_5 value: 14.984 - type: recall_at_1 value: 13.714 - type: recall_at_10 value: 37.26 - type: recall_at_100 value: 60.546 - type: recall_at_1000 value: 79.899 - type: recall_at_3 value: 24.325 - type: recall_at_5 value: 29.725 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.462 - type: map_at_10 value: 18.637 - type: map_at_100 value: 26.131999999999998 - type: map_at_1000 value: 27.607 - type: map_at_3 value: 13.333 - type: map_at_5 value: 15.654000000000002 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.32600000000001 - type: mrr_at_100 value: 74.60900000000001 - type: mrr_at_1000 value: 74.62 - type: mrr_at_3 value: 72.667 - type: mrr_at_5 value: 73.817 - type: ndcg_at_1 value: 53.87499999999999 - type: ndcg_at_10 value: 40.028999999999996 - type: ndcg_at_100 value: 44.199 - type: ndcg_at_1000 value: 51.629999999999995 - type: ndcg_at_3 value: 44.113 - type: ndcg_at_5 value: 41.731 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 31.900000000000002 - type: precision_at_100 value: 10.043000000000001 - type: precision_at_1000 value: 1.926 - type: precision_at_3 value: 47.417 - type: precision_at_5 value: 40.65 - type: recall_at_1 value: 8.462 - type: recall_at_10 value: 24.293 - type: recall_at_100 value: 50.146 - type: recall_at_1000 value: 74.034 - type: recall_at_3 value: 14.967 - type: recall_at_5 value: 18.682000000000002 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.84499999999999 - type: f1 value: 42.48106691979349 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 74.034 - type: map_at_10 value: 82.76 - type: map_at_100 value: 82.968 - type: map_at_1000 value: 82.98299999999999 - type: map_at_3 value: 81.768 - type: map_at_5 value: 82.418 - type: mrr_at_1 value: 80.048 - type: mrr_at_10 value: 87.64999999999999 - type: mrr_at_100 value: 87.712 - type: mrr_at_1000 value: 87.713 - type: mrr_at_3 value: 
87.01100000000001 - type: mrr_at_5 value: 87.466 - type: ndcg_at_1 value: 80.048 - type: ndcg_at_10 value: 86.643 - type: ndcg_at_100 value: 87.361 - type: ndcg_at_1000 value: 87.606 - type: ndcg_at_3 value: 85.137 - type: ndcg_at_5 value: 86.016 - type: precision_at_1 value: 80.048 - type: precision_at_10 value: 10.372 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 32.638 - type: precision_at_5 value: 20.177 - type: recall_at_1 value: 74.034 - type: recall_at_10 value: 93.769 - type: recall_at_100 value: 96.569 - type: recall_at_1000 value: 98.039 - type: recall_at_3 value: 89.581 - type: recall_at_5 value: 91.906 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.5 - type: map_at_10 value: 32.857 - type: map_at_100 value: 34.589 - type: map_at_1000 value: 34.778 - type: map_at_3 value: 29.160999999999998 - type: map_at_5 value: 31.033 - type: mrr_at_1 value: 40.123 - type: mrr_at_10 value: 48.776 - type: mrr_at_100 value: 49.495 - type: mrr_at_1000 value: 49.539 - type: mrr_at_3 value: 46.605000000000004 - type: mrr_at_5 value: 47.654 - type: ndcg_at_1 value: 40.123 - type: ndcg_at_10 value: 40.343 - type: ndcg_at_100 value: 46.56 - type: ndcg_at_1000 value: 49.777 - type: ndcg_at_3 value: 37.322 - type: ndcg_at_5 value: 37.791000000000004 - type: precision_at_1 value: 40.123 - type: precision_at_10 value: 11.08 - type: precision_at_100 value: 1.752 - type: precision_at_1000 value: 0.232 - type: precision_at_3 value: 24.897 - type: precision_at_5 value: 17.809 - type: recall_at_1 value: 20.5 - type: recall_at_10 value: 46.388 - type: recall_at_100 value: 69.552 - type: recall_at_1000 value: 89.011 - type: recall_at_3 value: 33.617999999999995 - type: recall_at_5 value: 38.211 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.135999999999996 - type: map_at_10 value: 61.673 - type: map_at_100 value: 62.562 - type: map_at_1000 value: 62.62 - type: map_at_3 value: 58.467999999999996 - type: map_at_5 value: 60.463 - type: mrr_at_1 value: 78.271 - type: mrr_at_10 value: 84.119 - type: mrr_at_100 value: 84.29299999999999 - type: mrr_at_1000 value: 84.299 - type: mrr_at_3 value: 83.18900000000001 - type: mrr_at_5 value: 83.786 - type: ndcg_at_1 value: 78.271 - type: ndcg_at_10 value: 69.935 - type: ndcg_at_100 value: 73.01299999999999 - type: ndcg_at_1000 value: 74.126 - type: ndcg_at_3 value: 65.388 - type: ndcg_at_5 value: 67.906 - type: precision_at_1 value: 78.271 - type: precision_at_10 value: 14.562 - type: precision_at_100 value: 1.6969999999999998 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 41.841 - type: precision_at_5 value: 27.087 - type: recall_at_1 value: 39.135999999999996 - type: recall_at_10 value: 72.809 - type: recall_at_100 value: 84.86200000000001 - type: recall_at_1000 value: 92.208 - type: recall_at_3 value: 62.76199999999999 - type: recall_at_5 value: 67.718 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 90.60600000000001 - type: ap value: 86.6579587804335 - type: f1 value: 90.5938853929307 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.852 - type: 
map_at_10 value: 33.982 - type: map_at_100 value: 35.116 - type: map_at_1000 value: 35.167 - type: map_at_3 value: 30.134 - type: map_at_5 value: 32.340999999999994 - type: mrr_at_1 value: 22.479 - type: mrr_at_10 value: 34.594 - type: mrr_at_100 value: 35.672 - type: mrr_at_1000 value: 35.716 - type: mrr_at_3 value: 30.84 - type: mrr_at_5 value: 32.998 - type: ndcg_at_1 value: 22.493 - type: ndcg_at_10 value: 40.833000000000006 - type: ndcg_at_100 value: 46.357 - type: ndcg_at_1000 value: 47.637 - type: ndcg_at_3 value: 32.995999999999995 - type: ndcg_at_5 value: 36.919000000000004 - type: precision_at_1 value: 22.493 - type: precision_at_10 value: 6.465999999999999 - type: precision_at_100 value: 0.9249999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.030999999999999 - type: precision_at_5 value: 10.413 - type: recall_at_1 value: 21.852 - type: recall_at_10 value: 61.934999999999995 - type: recall_at_100 value: 87.611 - type: recall_at_1000 value: 97.441 - type: recall_at_3 value: 40.583999999999996 - type: recall_at_5 value: 49.992999999999995 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.36069311445507 - type: f1 value: 93.16456330371453 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 74.74692202462381 - type: f1 value: 58.17903579421599 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.80833893745796 - type: f1 value: 72.70786592684664 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.69872225958305 - type: f1 value: 78.61626934504731 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.058658628717694 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 30.85561739360599 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.290259910144385 - type: mrr value: 32.44223046102856 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.288 - type: map_at_10 value: 12.267999999999999 - type: map_at_100 value: 15.557000000000002 - type: map_at_1000 value: 16.98 - type: map_at_3 value: 8.866 - type: map_at_5 value: 10.418 - type: mrr_at_1 value: 43.653 - type: mrr_at_10 value: 52.681 - type: mrr_at_100 value: 53.315999999999995 - type: mrr_at_1000 value: 53.357 - type: mrr_at_3 value: 51.393 - type: mrr_at_5 value: 51.903999999999996 - type: ndcg_at_1 value: 42.415000000000006 - type: ndcg_at_10 value: 34.305 - type: ndcg_at_100 value: 
30.825999999999997 - type: ndcg_at_1000 value: 39.393 - type: ndcg_at_3 value: 39.931 - type: ndcg_at_5 value: 37.519999999999996 - type: precision_at_1 value: 43.653 - type: precision_at_10 value: 25.728 - type: precision_at_100 value: 7.932 - type: precision_at_1000 value: 2.07 - type: precision_at_3 value: 38.184000000000005 - type: precision_at_5 value: 32.879000000000005 - type: recall_at_1 value: 5.288 - type: recall_at_10 value: 16.195 - type: recall_at_100 value: 31.135 - type: recall_at_1000 value: 61.531000000000006 - type: recall_at_3 value: 10.313 - type: recall_at_5 value: 12.754999999999999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 28.216 - type: map_at_10 value: 42.588 - type: map_at_100 value: 43.702999999999996 - type: map_at_1000 value: 43.739 - type: map_at_3 value: 38.177 - type: map_at_5 value: 40.754000000000005 - type: mrr_at_1 value: 31.866 - type: mrr_at_10 value: 45.189 - type: mrr_at_100 value: 46.056000000000004 - type: mrr_at_1000 value: 46.081 - type: mrr_at_3 value: 41.526999999999994 - type: mrr_at_5 value: 43.704 - type: ndcg_at_1 value: 31.837 - type: ndcg_at_10 value: 50.178 - type: ndcg_at_100 value: 54.98800000000001 - type: ndcg_at_1000 value: 55.812 - type: ndcg_at_3 value: 41.853 - type: ndcg_at_5 value: 46.153 - type: precision_at_1 value: 31.837 - type: precision_at_10 value: 8.43 - type: precision_at_100 value: 1.1119999999999999 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 19.023 - type: precision_at_5 value: 13.911000000000001 - type: recall_at_1 value: 28.216 - type: recall_at_10 value: 70.8 - type: recall_at_100 value: 91.857 - type: recall_at_1000 value: 97.941 - type: recall_at_3 value: 49.196 - type: recall_at_5 value: 59.072 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.22800000000001 - type: map_at_10 value: 85.115 - type: map_at_100 value: 85.72 - type: map_at_1000 value: 85.737 - type: map_at_3 value: 82.149 - type: map_at_5 value: 84.029 - type: mrr_at_1 value: 81.96 - type: mrr_at_10 value: 88.00200000000001 - type: mrr_at_100 value: 88.088 - type: mrr_at_1000 value: 88.089 - type: mrr_at_3 value: 87.055 - type: mrr_at_5 value: 87.715 - type: ndcg_at_1 value: 82.01 - type: ndcg_at_10 value: 88.78 - type: ndcg_at_100 value: 89.91 - type: ndcg_at_1000 value: 90.013 - type: ndcg_at_3 value: 85.957 - type: ndcg_at_5 value: 87.56 - type: precision_at_1 value: 82.01 - type: precision_at_10 value: 13.462 - type: precision_at_100 value: 1.528 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.553 - type: precision_at_5 value: 24.732000000000003 - type: recall_at_1 value: 71.22800000000001 - type: recall_at_10 value: 95.69 - type: recall_at_100 value: 99.531 - type: recall_at_1000 value: 99.98 - type: recall_at_3 value: 87.632 - type: recall_at_5 value: 92.117 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 52.31768034366916 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 60.640266772723606 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test 
revision: None metrics: - type: map_at_1 value: 4.7780000000000005 - type: map_at_10 value: 12.299 - type: map_at_100 value: 14.363000000000001 - type: map_at_1000 value: 14.71 - type: map_at_3 value: 8.738999999999999 - type: map_at_5 value: 10.397 - type: mrr_at_1 value: 23.599999999999998 - type: mrr_at_10 value: 34.845 - type: mrr_at_100 value: 35.916 - type: mrr_at_1000 value: 35.973 - type: mrr_at_3 value: 31.7 - type: mrr_at_5 value: 33.535 - type: ndcg_at_1 value: 23.599999999999998 - type: ndcg_at_10 value: 20.522000000000002 - type: ndcg_at_100 value: 28.737000000000002 - type: ndcg_at_1000 value: 34.596 - type: ndcg_at_3 value: 19.542 - type: ndcg_at_5 value: 16.958000000000002 - type: precision_at_1 value: 23.599999999999998 - type: precision_at_10 value: 10.67 - type: precision_at_100 value: 2.259 - type: precision_at_1000 value: 0.367 - type: precision_at_3 value: 18.333 - type: precision_at_5 value: 14.879999999999999 - type: recall_at_1 value: 4.7780000000000005 - type: recall_at_10 value: 21.617 - type: recall_at_100 value: 45.905 - type: recall_at_1000 value: 74.42 - type: recall_at_3 value: 11.148 - type: recall_at_5 value: 15.082999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.22372750297885 - type: cos_sim_spearman value: 79.40972617119405 - type: euclidean_pearson value: 80.6101072020434 - type: euclidean_spearman value: 79.53844217225202 - type: manhattan_pearson value: 80.57265975286111 - type: manhattan_spearman value: 79.46335611792958 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.43713315520749 - type: cos_sim_spearman value: 77.44128693329532 - type: euclidean_pearson value: 81.63869928101123 - type: euclidean_spearman value: 77.29512977961515 - type: manhattan_pearson value: 81.63704185566183 - type: manhattan_spearman value: 77.29909412738657 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 81.59451537860527 - type: cos_sim_spearman value: 82.97994638856723 - type: euclidean_pearson value: 82.89478688288412 - type: euclidean_spearman value: 83.58740751053104 - type: manhattan_pearson value: 82.69140840941608 - type: manhattan_spearman value: 83.33665956040555 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.00756527711764 - type: cos_sim_spearman value: 81.83560996841379 - type: euclidean_pearson value: 82.07684151976518 - type: euclidean_spearman value: 82.00913052060511 - type: manhattan_pearson value: 82.05690778488794 - type: manhattan_spearman value: 82.02260252019525 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.13710262895447 - type: cos_sim_spearman value: 87.26412811156248 - type: euclidean_pearson value: 86.94151453230228 - type: euclidean_spearman value: 87.5363796699571 - type: manhattan_pearson value: 86.86989424083748 - type: manhattan_spearman value: 87.47315940781353 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default 
split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.0230597603627 - type: cos_sim_spearman value: 84.93344499318864 - type: euclidean_pearson value: 84.23754743431141 - type: euclidean_spearman value: 85.09707376597099 - type: manhattan_pearson value: 84.04325160987763 - type: manhattan_spearman value: 84.89353071339909 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.75620824563921 - type: cos_sim_spearman value: 87.15065513706398 - type: euclidean_pearson value: 88.26281533633521 - type: euclidean_spearman value: 87.51963738643983 - type: manhattan_pearson value: 88.25599267618065 - type: manhattan_spearman value: 87.58048736047483 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.74645319195137 - type: cos_sim_spearman value: 65.29996325037214 - type: euclidean_pearson value: 67.04297794086443 - type: euclidean_spearman value: 65.43841726694343 - type: manhattan_pearson value: 67.39459955690904 - type: manhattan_spearman value: 65.92864704413651 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.31291020270801 - type: cos_sim_spearman value: 85.86473738688068 - type: euclidean_pearson value: 85.65537275064152 - type: euclidean_spearman value: 86.13087454209642 - type: manhattan_pearson value: 85.43946955047609 - type: manhattan_spearman value: 85.91568175344916 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.93798118350695 - type: mrr value: 95.93536274908824 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.594 - type: map_at_10 value: 66.81899999999999 - type: map_at_100 value: 67.368 - type: map_at_1000 value: 67.4 - type: map_at_3 value: 64.061 - type: map_at_5 value: 65.47 - type: mrr_at_1 value: 60.667 - type: mrr_at_10 value: 68.219 - type: mrr_at_100 value: 68.655 - type: mrr_at_1000 value: 68.684 - type: mrr_at_3 value: 66.22200000000001 - type: mrr_at_5 value: 67.289 - type: ndcg_at_1 value: 60.667 - type: ndcg_at_10 value: 71.275 - type: ndcg_at_100 value: 73.642 - type: ndcg_at_1000 value: 74.373 - type: ndcg_at_3 value: 66.521 - type: ndcg_at_5 value: 68.581 - type: precision_at_1 value: 60.667 - type: precision_at_10 value: 9.433 - type: precision_at_100 value: 1.0699999999999998 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.556 - type: precision_at_5 value: 16.8 - type: recall_at_1 value: 57.594 - type: recall_at_10 value: 83.622 - type: recall_at_100 value: 94.167 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 70.64399999999999 - type: recall_at_5 value: 75.983 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.85841584158416 - type: cos_sim_ap value: 96.66996142314342 - type: 
cos_sim_f1 value: 92.83208020050125 - type: cos_sim_precision value: 93.06532663316584 - type: cos_sim_recall value: 92.60000000000001 - type: dot_accuracy value: 99.85841584158416 - type: dot_ap value: 96.6775307676576 - type: dot_f1 value: 92.69289729177312 - type: dot_precision value: 94.77533960292581 - type: dot_recall value: 90.7 - type: euclidean_accuracy value: 99.86138613861387 - type: euclidean_ap value: 96.6338454403108 - type: euclidean_f1 value: 92.92214357937311 - type: euclidean_precision value: 93.96728016359918 - type: euclidean_recall value: 91.9 - type: manhattan_accuracy value: 99.86237623762376 - type: manhattan_ap value: 96.60370449645053 - type: manhattan_f1 value: 92.91177970423253 - type: manhattan_precision value: 94.7970863683663 - type: manhattan_recall value: 91.10000000000001 - type: max_accuracy value: 99.86237623762376 - type: max_ap value: 96.6775307676576 - type: max_f1 value: 92.92214357937311 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 60.77977058695198 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.2725272535638 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 53.64052466362125 - type: mrr value: 54.533067014684654 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.677624219206578 - type: cos_sim_spearman value: 30.121368518123447 - type: dot_pearson value: 30.69870088041608 - type: dot_spearman value: 29.61284927093751 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22 - type: map_at_10 value: 1.855 - type: map_at_100 value: 9.885 - type: map_at_1000 value: 23.416999999999998 - type: map_at_3 value: 0.637 - type: map_at_5 value: 1.024 - type: mrr_at_1 value: 88.0 - type: mrr_at_10 value: 93.067 - type: mrr_at_100 value: 93.067 - type: mrr_at_1000 value: 93.067 - type: mrr_at_3 value: 92.667 - type: mrr_at_5 value: 93.067 - type: ndcg_at_1 value: 82.0 - type: ndcg_at_10 value: 75.899 - type: ndcg_at_100 value: 55.115 - type: ndcg_at_1000 value: 48.368 - type: ndcg_at_3 value: 79.704 - type: ndcg_at_5 value: 78.39699999999999 - type: precision_at_1 value: 88.0 - type: precision_at_10 value: 79.60000000000001 - type: precision_at_100 value: 56.06 - type: precision_at_1000 value: 21.206 - type: precision_at_3 value: 84.667 - type: precision_at_5 value: 83.2 - type: recall_at_1 value: 0.22 - type: recall_at_10 value: 2.078 - type: recall_at_100 value: 13.297 - type: recall_at_1000 value: 44.979 - type: recall_at_3 value: 0.6689999999999999 - type: recall_at_5 value: 1.106 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.258 - type: map_at_10 value: 10.439 - type: map_at_100 value: 16.89 - type: map_at_1000 value: 18.407999999999998 - type: map_at_3 value: 5.668 - type: map_at_5 value: 
7.718 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 51.159 - type: mrr_at_100 value: 51.714000000000006 - type: mrr_at_1000 value: 51.714000000000006 - type: mrr_at_3 value: 47.959 - type: mrr_at_5 value: 50.407999999999994 - type: ndcg_at_1 value: 29.592000000000002 - type: ndcg_at_10 value: 26.037 - type: ndcg_at_100 value: 37.924 - type: ndcg_at_1000 value: 49.126999999999995 - type: ndcg_at_3 value: 30.631999999999998 - type: ndcg_at_5 value: 28.571 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 22.857 - type: precision_at_100 value: 7.754999999999999 - type: precision_at_1000 value: 1.529 - type: precision_at_3 value: 34.014 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.258 - type: recall_at_10 value: 16.554 - type: recall_at_100 value: 48.439 - type: recall_at_1000 value: 82.80499999999999 - type: recall_at_3 value: 7.283 - type: recall_at_5 value: 10.732 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 69.8858 - type: ap value: 13.835684144362109 - type: f1 value: 53.803351693244586 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.50650820599886 - type: f1 value: 60.84357825979259 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.52131044852134 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.59337187816654 - type: cos_sim_ap value: 73.23925826533437 - type: cos_sim_f1 value: 67.34693877551021 - type: cos_sim_precision value: 62.40432237730752 - type: cos_sim_recall value: 73.13984168865434 - type: dot_accuracy value: 85.31322644096085 - type: dot_ap value: 72.30723963807422 - type: dot_f1 value: 66.47051612112296 - type: dot_precision value: 62.0792305930845 - type: dot_recall value: 71.53034300791556 - type: euclidean_accuracy value: 85.61125350181797 - type: euclidean_ap value: 73.32843720487845 - type: euclidean_f1 value: 67.36549633745895 - type: euclidean_precision value: 64.60755813953489 - type: euclidean_recall value: 70.36939313984169 - type: manhattan_accuracy value: 85.63509566668654 - type: manhattan_ap value: 73.16658488311325 - type: manhattan_f1 value: 67.20597386434349 - type: manhattan_precision value: 63.60424028268551 - type: manhattan_recall value: 71.2401055408971 - type: max_accuracy value: 85.63509566668654 - type: max_ap value: 73.32843720487845 - type: max_f1 value: 67.36549633745895 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.33779640625606 - type: cos_sim_ap value: 84.83868375898157 - type: cos_sim_f1 value: 77.16506154017773 - type: cos_sim_precision value: 74.62064005753327 - type: cos_sim_recall value: 79.88912842623961 - type: dot_accuracy value: 88.02732176815307 - type: dot_ap 
value: 83.95089283763002 - type: dot_f1 value: 76.29635101196631 - type: dot_precision value: 73.31771720613288 - type: dot_recall value: 79.52725592854944 - type: euclidean_accuracy value: 88.44452206310397 - type: euclidean_ap value: 84.98384576824827 - type: euclidean_f1 value: 77.29311047696697 - type: euclidean_precision value: 74.51232583065381 - type: euclidean_recall value: 80.28949799815214 - type: manhattan_accuracy value: 88.47362906042613 - type: manhattan_ap value: 84.91421462218432 - type: manhattan_f1 value: 77.05107637204792 - type: manhattan_precision value: 74.74484256243214 - type: manhattan_recall value: 79.50415768401602 - type: max_accuracy value: 88.47362906042613 - type: max_ap value: 84.98384576824827 - type: max_f1 value: 77.29311047696697 license: mit language: - en ---

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Contact | Citation | License

For more details, please refer to our GitHub: FlagEmbedding. If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using bge-m3. English | 中文

FlagEmbedding focuses on retrieval-augmented LLMs, currently consisting of the following projects:

- **Long-Context LLM**: Activation Beacon
- **Fine-tuning of LM**: LM-Cocktail
- **Dense Retrieval**: BGE-M3, LLM Embedder, BGE Embedding
- **Reranker Model**: BGE Reranker
- **Benchmark**: C-MTEB

## News

- 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularity (input length up to 8192), and **M**ulti-functionality (unification of dense, lexical, and multi-vec/ColBERT retrieval). It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. Technical Report and Code. :fire:
- 1/9/2024: Release Activation-Beacon, an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. Technical Report :fire:
- 12/24/2023: Release **LLaRA**, a LLaMA-7B-based dense retriever that achieves state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. Technical Report :fire:
- 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire:
- 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report
- 09/15/2023: The technical report of BGE has been released.
- 09/15/2023: The massive training data of BGE has been released.
- 09/12/2023: New models:
  - **New reranker model**: release the cross-encoder models BAAI/bge-reranker-base and BAAI/bge-reranker-large, which are more powerful than the embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models.
  - **Updated embedding model**: release the bge-*-v1.5 embedding models to alleviate the issue of the similarity distribution and to enhance retrieval ability without instruction.
- 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding an instruction during fine-tuning.
- 08/09/2023: BGE models are integrated into **Langchain**; you can use them like this. The C-MTEB **leaderboard** is available.
- 08/05/2023: Release base-scale and small-scale models, the **best performance among models of the same size 🤗**
- 08/02/2023: Release BGE (short for BAAI General Embedding) models, which **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
- 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
## Model List

bge is short for BAAI general embedding.

| Model | Language | | Description | query instruction for retrieval [1] |
|:-------------------------------|:--------:| :--------:| :--------:|:--------:|
| BAAI/bge-m3 | Multilingual | Inference Fine-tune | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (ColBERT)), Multi-Linguality, and Multi-Granularity (8192 tokens) | |
| BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README |
| BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with more reasonable similarity distribution | |
| BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | |
| BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to bge-large-en | |
| BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | |
| BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | |
| BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to bge-large-zh | |
| BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | |

[1\\]: If you need to search for relevant passages for a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages.

[2\\]: Different from an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by simpler models. For example, use a bge embedding model to retrieve the top 100 relevant documents, then use a bge reranker to re-rank those 100 documents and obtain the final top-3 results.

All models have been uploaded to the Huggingface Hub, and you can see them at . If you cannot open the Huggingface Hub, you can also download the models at .

## Frequently asked questions
1. How to fine-tune the bge embedding model? Follow this example to prepare data and fine-tune your model. Some suggestions:
- Mine hard negatives following this example, which can improve retrieval performance.
- If you pre-train bge on your data, the pre-trained model cannot be used to calculate similarity directly; it must first be fine-tuned with contrastive learning (a toy sketch of that objective follows below).
- If the accuracy of the fine-tuned model is still not high enough, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
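To make the contrastive-learning step concrete, here is a toy, self-contained sketch of an in-batch-negatives (InfoNCE-style) objective; the function name and the random tensors are illustrative, not FlagEmbedding's actual training code. The 0.01 temperature matches the value mentioned in the next question.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q_emb: torch.Tensor, p_emb: torch.Tensor,
                  temperature: float = 0.01) -> torch.Tensor:
    """In-batch negatives: row i of q_emb should match row i of p_emb."""
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(p_emb, dim=-1)
    logits = q @ p.T / temperature    # (batch, batch) cosine similarities
    labels = torch.arange(q.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage with random "embeddings" standing in for encoder outputs
loss = info_nce_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```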
2. The similarity score between two dissimilar sentences is higher than 0.5. **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate threshold based on the similarity distribution on your own data (such as 0.8, 0.85, or even 0.9). A minimal sketch follows.
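For illustration only (the model name and the 0.85 cutoff are placeholders; calibrate the threshold on your own data), computing a cosine similarity with sentence-transformers might look like this:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
emb = model.encode(
    ["The weather is lovely today.", "Quantum computers use qubits."],
    normalize_embeddings=True,  # unit vectors, so dot product == cosine similarity
)
score = float(emb[0] @ emb[1])
# Treat the score as a ranking signal first; only threshold it after
# inspecting the score distribution on your own data.
print(score, score > 0.85)
```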
3. When does the query instruction need to be used? For the bge-*-v1.5 models, we improved retrieval ability when not using an instruction; omitting it causes only a slight degradation in retrieval performance compared with using it. So, for convenience, you can generate embeddings without an instruction in all cases. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries. **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, the documents/passages do not need the instruction; a sketch follows.
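A minimal sketch of instruction-prefixed queries (the instruction string is the one the English bge model cards recommend for short-query-to-long-passage retrieval; the model name and texts are placeholders):

```python
from sentence_transformers import SentenceTransformer

INSTRUCTION = "Represent this sentence for searching relevant passages: "
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

queries = ["how do giant pandas reproduce?"]
passages = ["Giant pandas reach sexual maturity between four and eight years of age ..."]

# only the short queries get the instruction prefix
q_emb = model.encode([INSTRUCTION + q for q in queries], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)  # passages get NO instruction
print(q_emb @ p_emb.T)  # higher score = more relevant
```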
## Usage

### Usage for Embedding Model

Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers.

#### Using FlagEmbedding

If it doesn't work for you, see FlagEmbedding for more ways to install it. For the value of the argument query_instruction_for_retrieval, see Model List. By default, FlagModel will use all available GPUs when encoding. Set os.environ["CUDA_VISIBLE_DEVICES"] to select specific GPUs, or set it to an empty string to make all GPUs unavailable.

#### Using Sentence-Transformers

You can also use the models with sentence-transformers. For an s2p (short query to long passage) retrieval task, each short query should start with an instruction (for the instructions, see Model List), but the instruction is not needed for passages.

#### Using Langchain

You can use the models in Langchain like this:

#### Using HuggingFace Transformers

With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding (see the sketch at the end of this section).

### Usage for Reranker

Different from an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized with cross-entropy loss, so the relevance score is not bounded to a specific range.

#### Using FlagEmbedding

Get relevance scores (higher scores indicate more relevance):

#### Using Huggingface transformers

(See the sketch at the end of this section.)

#### Usage of the ONNX files

#### Usage via infinity

It's also possible to deploy the ONNX files with the infinity_emb pip package. Flash attention is recommended on GPU, and the ONNX engine for CPU inference.
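Since the original code snippets did not survive extraction, here are two minimal sketches of the Transformers-based usage described above; the model names are examples from the Model List, and the sentences are placeholders. First, CLS-pooled sentence embeddings:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
model.eval()

sentences = ["BGE is a general embedding model.", "What is BGE?"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    # CLS pooling: the last hidden state of the first token is the sentence embedding
    embeddings = model(**inputs).last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings[0] @ embeddings[1])  # cosine similarity of the two sentences
```

And a reranker sketch; the score is an unbounded logit, and higher means more relevant:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-large")
reranker = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-large")
reranker.eval()

# each pair is [query, passage]
pairs = [["what is a panda?", "The giant panda is a bear species endemic to China."]]
inputs = tokenizer(pairs, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    scores = reranker(**inputs).logits.view(-1).float()
print(scores)
```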
## Evaluation

bge models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts.

- **MTEB**:

| Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
| BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
| BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
| bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
| bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
| gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
| gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
| e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
| bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
| instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
| e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
| gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
| text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
| e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
| sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
| all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
| sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |

- **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction.
| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
| BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
| BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
| BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
| bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
| BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
| multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
| BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
| m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
| m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
| multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
| multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
| text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
| luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
| text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
| text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |

- **Reranking**: See C_MTEB for the evaluation script.

| Model | T2Reranking | T2RerankingZh2En\\* | T2RerankingEn2Zh\\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
| multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
| multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
| multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
| m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
| m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
| bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
| bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
| BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
| BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |

\\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.

## Train

### BAAI Embedding

We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used for similarity calculation directly; it needs to be fine-tuned first. For more bge training details, see baai_general_embedding.

### BGE Reranker

The cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model.
We train the cross-encoder on a multilingual pair data, The data format is the same as embedding model, so you can fine-tune it easily following our example. More details please refer to ./FlagEmbedding/reranker/README.md ## Contact If you have any question or suggestion related to this project, feel free to open an issue or pull request. You also can email Shitao Xiao(stxiao@baai.ac.cn) and Zheng Liu(liuzheng@baai.ac.cn). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like sentence similarity, retrieval, classification, clustering, and reranking." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_bge-small-en.json b/data/model_data_json/BAAI_bge-small-en.json new file mode 100644 index 0000000000000000000000000000000000000000..15c0a8f1c6e06b3d3d79ca8889ba4f3edddda37c --- /dev/null +++ b/data/model_data_json/BAAI_bge-small-en.json @@ -0,0 +1,24 @@ +{ + "model_id": "BAAI/bge-small-en", + "downloads": 251381, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "mteb", + "sentence transformers", + "en", + "arxiv:2311.13534", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "model-index", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence transformers model-index: - name: bge-small-en results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.34328358208955 - type: ap value: 37.59947775195661 - type: f1 value: 68.548415491933 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.04527499999999 - type: ap value: 89.60696356772135 - type: f1 value: 93.03361469382438 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.08 - type: f1 value: 45.66249835363254 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 35.205999999999996 - type: map_at_10 value: 50.782000000000004 - type: map_at_100 value: 51.547 - type: map_at_1000 value: 51.554 - type: map_at_3 value: 46.515 - type: map_at_5 value: 49.296 - type: mrr_at_1 value: 35.632999999999996 - type: mrr_at_10 value: 50.958999999999996 - type: mrr_at_100 value: 51.724000000000004 - type: mrr_at_1000 value: 51.731 - type: mrr_at_3 value: 46.669 - type: mrr_at_5 value: 49.439 - type: ndcg_at_1 value: 35.205999999999996 - type: ndcg_at_10 value: 58.835 - type: ndcg_at_100 value: 62.095 - type: ndcg_at_1000 value: 62.255 - type: ndcg_at_3 value: 50.255 - type: ndcg_at_5 value: 55.296 - type: precision_at_1 value: 35.205999999999996 - type: precision_at_10 value: 8.421 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.365 - type: precision_at_5 value: 14.680000000000001 - type: recall_at_1 value: 
35.205999999999996 - type: recall_at_10 value: 84.211 - type: recall_at_100 value: 98.43499999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 61.095 - type: recall_at_5 value: 73.4 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.52644476278646 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 39.973045724188964 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.28285314871488 - type: mrr value: 74.52743701358659 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 80.09041909160327 - type: cos_sim_spearman value: 79.96266537706944 - type: euclidean_pearson value: 79.50774978162241 - type: euclidean_spearman value: 79.9144715078551 - type: manhattan_pearson value: 79.2062139879302 - type: manhattan_spearman value: 79.35000081468212 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.31493506493506 - type: f1 value: 85.2704557977762 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.6837242810816 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 35.38881249555897 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.884999999999998 - type: map_at_10 value: 39.574 - type: map_at_100 value: 40.993 - type: map_at_1000 value: 41.129 - type: map_at_3 value: 36.089 - type: map_at_5 value: 38.191 - type: mrr_at_1 value: 34.477999999999994 - type: mrr_at_10 value: 45.411 - type: mrr_at_100 value: 46.089999999999996 - type: mrr_at_1000 value: 46.147 - type: mrr_at_3 value: 42.346000000000004 - type: mrr_at_5 value: 44.292 - type: ndcg_at_1 value: 34.477999999999994 - type: ndcg_at_10 value: 46.123999999999995 - type: ndcg_at_100 value: 51.349999999999994 - type: ndcg_at_1000 value: 53.578 - type: ndcg_at_3 value: 40.824 - type: ndcg_at_5 value: 43.571 - type: precision_at_1 value: 34.477999999999994 - type: precision_at_10 value: 8.841000000000001 - type: precision_at_100 value: 1.4460000000000002 - type: precision_at_1000 value: 0.192 - type: precision_at_3 value: 19.742 - type: precision_at_5 value: 14.421000000000001 - type: recall_at_1 value: 27.884999999999998 - type: recall_at_10 value: 59.087 - type: recall_at_100 value: 80.609 - type: recall_at_1000 value: 95.054 - type: recall_at_3 value: 44.082 - type: recall_at_5 value: 51.593999999999994 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test 
revision: None metrics: - type: map_at_1 value: 30.639 - type: map_at_10 value: 40.047 - type: map_at_100 value: 41.302 - type: map_at_1000 value: 41.425 - type: map_at_3 value: 37.406 - type: map_at_5 value: 38.934000000000005 - type: mrr_at_1 value: 37.707 - type: mrr_at_10 value: 46.082 - type: mrr_at_100 value: 46.745 - type: mrr_at_1000 value: 46.786 - type: mrr_at_3 value: 43.980999999999995 - type: mrr_at_5 value: 45.287 - type: ndcg_at_1 value: 37.707 - type: ndcg_at_10 value: 45.525 - type: ndcg_at_100 value: 49.976 - type: ndcg_at_1000 value: 51.94499999999999 - type: ndcg_at_3 value: 41.704 - type: ndcg_at_5 value: 43.596000000000004 - type: precision_at_1 value: 37.707 - type: precision_at_10 value: 8.465 - type: precision_at_100 value: 1.375 - type: precision_at_1000 value: 0.183 - type: precision_at_3 value: 19.979 - type: precision_at_5 value: 14.115 - type: recall_at_1 value: 30.639 - type: recall_at_10 value: 54.775 - type: recall_at_100 value: 73.678 - type: recall_at_1000 value: 86.142 - type: recall_at_3 value: 43.230000000000004 - type: recall_at_5 value: 48.622 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.038 - type: map_at_10 value: 49.922 - type: map_at_100 value: 51.032 - type: map_at_1000 value: 51.085 - type: map_at_3 value: 46.664 - type: map_at_5 value: 48.588 - type: mrr_at_1 value: 43.95 - type: mrr_at_10 value: 53.566 - type: mrr_at_100 value: 54.318999999999996 - type: mrr_at_1000 value: 54.348 - type: mrr_at_3 value: 51.066 - type: mrr_at_5 value: 52.649 - type: ndcg_at_1 value: 43.95 - type: ndcg_at_10 value: 55.676 - type: ndcg_at_100 value: 60.126000000000005 - type: ndcg_at_1000 value: 61.208 - type: ndcg_at_3 value: 50.20400000000001 - type: ndcg_at_5 value: 53.038 - type: precision_at_1 value: 43.95 - type: precision_at_10 value: 8.953 - type: precision_at_100 value: 1.2109999999999999 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 22.256999999999998 - type: precision_at_5 value: 15.524 - type: recall_at_1 value: 38.038 - type: recall_at_10 value: 69.15 - type: recall_at_100 value: 88.31599999999999 - type: recall_at_1000 value: 95.993 - type: recall_at_3 value: 54.663 - type: recall_at_5 value: 61.373 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.872 - type: map_at_10 value: 32.912 - type: map_at_100 value: 33.972 - type: map_at_1000 value: 34.046 - type: map_at_3 value: 30.361 - type: map_at_5 value: 31.704 - type: mrr_at_1 value: 26.779999999999998 - type: mrr_at_10 value: 34.812 - type: mrr_at_100 value: 35.754999999999995 - type: mrr_at_1000 value: 35.809000000000005 - type: mrr_at_3 value: 32.335 - type: mrr_at_5 value: 33.64 - type: ndcg_at_1 value: 26.779999999999998 - type: ndcg_at_10 value: 37.623 - type: ndcg_at_100 value: 42.924 - type: ndcg_at_1000 value: 44.856 - type: ndcg_at_3 value: 32.574 - type: ndcg_at_5 value: 34.842 - type: precision_at_1 value: 26.779999999999998 - type: precision_at_10 value: 5.729 - type: precision_at_100 value: 0.886 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 13.559 - type: precision_at_5 value: 9.469 - type: recall_at_1 value: 24.872 - type: recall_at_10 value: 50.400999999999996 - type: recall_at_100 value: 74.954 - type: recall_at_1000 value: 89.56 - type: recall_at_3 value: 36.726 - type: recall_at_5 value: 
42.138999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.803 - type: map_at_10 value: 24.348 - type: map_at_100 value: 25.56 - type: map_at_1000 value: 25.668000000000003 - type: map_at_3 value: 21.811 - type: map_at_5 value: 23.287 - type: mrr_at_1 value: 20.771 - type: mrr_at_10 value: 28.961 - type: mrr_at_100 value: 29.979 - type: mrr_at_1000 value: 30.046 - type: mrr_at_3 value: 26.555 - type: mrr_at_5 value: 28.060000000000002 - type: ndcg_at_1 value: 20.771 - type: ndcg_at_10 value: 29.335 - type: ndcg_at_100 value: 35.188 - type: ndcg_at_1000 value: 37.812 - type: ndcg_at_3 value: 24.83 - type: ndcg_at_5 value: 27.119 - type: precision_at_1 value: 20.771 - type: precision_at_10 value: 5.4350000000000005 - type: precision_at_100 value: 0.9480000000000001 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 11.982 - type: precision_at_5 value: 8.831 - type: recall_at_1 value: 16.803 - type: recall_at_10 value: 40.039 - type: recall_at_100 value: 65.83200000000001 - type: recall_at_1000 value: 84.478 - type: recall_at_3 value: 27.682000000000002 - type: recall_at_5 value: 33.535 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.345 - type: map_at_10 value: 37.757000000000005 - type: map_at_100 value: 39.141 - type: map_at_1000 value: 39.262 - type: map_at_3 value: 35.183 - type: map_at_5 value: 36.592 - type: mrr_at_1 value: 34.649 - type: mrr_at_10 value: 43.586999999999996 - type: mrr_at_100 value: 44.481 - type: mrr_at_1000 value: 44.542 - type: mrr_at_3 value: 41.29 - type: mrr_at_5 value: 42.642 - type: ndcg_at_1 value: 34.649 - type: ndcg_at_10 value: 43.161 - type: ndcg_at_100 value: 48.734 - type: ndcg_at_1000 value: 51.046 - type: ndcg_at_3 value: 39.118 - type: ndcg_at_5 value: 41.022 - type: precision_at_1 value: 34.649 - type: precision_at_10 value: 7.603 - type: precision_at_100 value: 1.209 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 18.319 - type: precision_at_5 value: 12.839 - type: recall_at_1 value: 28.345 - type: recall_at_10 value: 53.367 - type: recall_at_100 value: 76.453 - type: recall_at_1000 value: 91.82000000000001 - type: recall_at_3 value: 41.636 - type: recall_at_5 value: 46.760000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.419 - type: map_at_10 value: 31.716 - type: map_at_100 value: 33.152 - type: map_at_1000 value: 33.267 - type: map_at_3 value: 28.74 - type: map_at_5 value: 30.48 - type: mrr_at_1 value: 28.310999999999996 - type: mrr_at_10 value: 37.039 - type: mrr_at_100 value: 38.09 - type: mrr_at_1000 value: 38.145 - type: mrr_at_3 value: 34.437 - type: mrr_at_5 value: 36.024 - type: ndcg_at_1 value: 28.310999999999996 - type: ndcg_at_10 value: 37.41 - type: ndcg_at_100 value: 43.647999999999996 - type: ndcg_at_1000 value: 46.007 - type: ndcg_at_3 value: 32.509 - type: ndcg_at_5 value: 34.943999999999996 - type: precision_at_1 value: 28.310999999999996 - type: precision_at_10 value: 6.963 - type: precision_at_100 value: 1.1860000000000002 - type: precision_at_1000 value: 0.154 - type: precision_at_3 value: 15.867999999999999 - type: precision_at_5 value: 11.507000000000001 - type: recall_at_1 value: 22.419 - 
type: recall_at_10 value: 49.28 - type: recall_at_100 value: 75.802 - type: recall_at_1000 value: 92.032 - type: recall_at_3 value: 35.399 - type: recall_at_5 value: 42.027 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.669249999999998 - type: map_at_10 value: 33.332583333333325 - type: map_at_100 value: 34.557833333333335 - type: map_at_1000 value: 34.67141666666666 - type: map_at_3 value: 30.663166666666662 - type: map_at_5 value: 32.14883333333333 - type: mrr_at_1 value: 29.193833333333334 - type: mrr_at_10 value: 37.47625 - type: mrr_at_100 value: 38.3545 - type: mrr_at_1000 value: 38.413166666666676 - type: mrr_at_3 value: 35.06741666666667 - type: mrr_at_5 value: 36.450666666666656 - type: ndcg_at_1 value: 29.193833333333334 - type: ndcg_at_10 value: 38.505416666666676 - type: ndcg_at_100 value: 43.81125 - type: ndcg_at_1000 value: 46.09558333333333 - type: ndcg_at_3 value: 33.90916666666667 - type: ndcg_at_5 value: 36.07666666666666 - type: precision_at_1 value: 29.193833333333334 - type: precision_at_10 value: 6.7251666666666665 - type: precision_at_100 value: 1.1058333333333332 - type: precision_at_1000 value: 0.14833333333333332 - type: precision_at_3 value: 15.554166666666665 - type: precision_at_5 value: 11.079250000000002 - type: recall_at_1 value: 24.669249999999998 - type: recall_at_10 value: 49.75583333333332 - type: recall_at_100 value: 73.06908333333332 - type: recall_at_1000 value: 88.91316666666667 - type: recall_at_3 value: 36.913250000000005 - type: recall_at_5 value: 42.48641666666666 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.044999999999998 - type: map_at_10 value: 30.349999999999998 - type: map_at_100 value: 31.273 - type: map_at_1000 value: 31.362000000000002 - type: map_at_3 value: 28.508 - type: map_at_5 value: 29.369 - type: mrr_at_1 value: 26.994 - type: mrr_at_10 value: 33.12 - type: mrr_at_100 value: 33.904 - type: mrr_at_1000 value: 33.967000000000006 - type: mrr_at_3 value: 31.365 - type: mrr_at_5 value: 32.124 - type: ndcg_at_1 value: 26.994 - type: ndcg_at_10 value: 34.214 - type: ndcg_at_100 value: 38.681 - type: ndcg_at_1000 value: 40.926 - type: ndcg_at_3 value: 30.725 - type: ndcg_at_5 value: 31.967000000000002 - type: precision_at_1 value: 26.994 - type: precision_at_10 value: 5.215 - type: precision_at_100 value: 0.807 - type: precision_at_1000 value: 0.108 - type: precision_at_3 value: 12.986 - type: precision_at_5 value: 8.712 - type: recall_at_1 value: 24.044999999999998 - type: recall_at_10 value: 43.456 - type: recall_at_100 value: 63.675000000000004 - type: recall_at_1000 value: 80.05499999999999 - type: recall_at_3 value: 33.561 - type: recall_at_5 value: 36.767 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.672 - type: map_at_10 value: 22.641 - type: map_at_100 value: 23.75 - type: map_at_1000 value: 23.877000000000002 - type: map_at_3 value: 20.219 - type: map_at_5 value: 21.648 - type: mrr_at_1 value: 18.823 - type: mrr_at_10 value: 26.101999999999997 - type: mrr_at_100 value: 27.038 - type: mrr_at_1000 value: 27.118 - type: mrr_at_3 value: 23.669 - type: mrr_at_5 value: 25.173000000000002 - type: ndcg_at_1 value: 18.823 - type: ndcg_at_10 value: 27.176000000000002 - type: 
ndcg_at_100 value: 32.42 - type: ndcg_at_1000 value: 35.413 - type: ndcg_at_3 value: 22.756999999999998 - type: ndcg_at_5 value: 25.032 - type: precision_at_1 value: 18.823 - type: precision_at_10 value: 5.034000000000001 - type: precision_at_100 value: 0.895 - type: precision_at_1000 value: 0.132 - type: precision_at_3 value: 10.771 - type: precision_at_5 value: 8.1 - type: recall_at_1 value: 15.672 - type: recall_at_10 value: 37.296 - type: recall_at_100 value: 60.863 - type: recall_at_1000 value: 82.234 - type: recall_at_3 value: 25.330000000000002 - type: recall_at_5 value: 30.964000000000002 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.633 - type: map_at_10 value: 32.858 - type: map_at_100 value: 34.038000000000004 - type: map_at_1000 value: 34.141 - type: map_at_3 value: 30.209000000000003 - type: map_at_5 value: 31.567 - type: mrr_at_1 value: 28.358 - type: mrr_at_10 value: 36.433 - type: mrr_at_100 value: 37.352000000000004 - type: mrr_at_1000 value: 37.41 - type: mrr_at_3 value: 34.033 - type: mrr_at_5 value: 35.246 - type: ndcg_at_1 value: 28.358 - type: ndcg_at_10 value: 37.973 - type: ndcg_at_100 value: 43.411 - type: ndcg_at_1000 value: 45.747 - type: ndcg_at_3 value: 32.934999999999995 - type: ndcg_at_5 value: 35.013 - type: precision_at_1 value: 28.358 - type: precision_at_10 value: 6.418 - type: precision_at_100 value: 1.02 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 14.677000000000001 - type: precision_at_5 value: 10.335999999999999 - type: recall_at_1 value: 24.633 - type: recall_at_10 value: 50.048 - type: recall_at_100 value: 73.821 - type: recall_at_1000 value: 90.046 - type: recall_at_3 value: 36.284 - type: recall_at_5 value: 41.370000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.133 - type: map_at_10 value: 31.491999999999997 - type: map_at_100 value: 33.062000000000005 - type: map_at_1000 value: 33.256 - type: map_at_3 value: 28.886 - type: map_at_5 value: 30.262 - type: mrr_at_1 value: 28.063 - type: mrr_at_10 value: 36.144 - type: mrr_at_100 value: 37.14 - type: mrr_at_1000 value: 37.191 - type: mrr_at_3 value: 33.762 - type: mrr_at_5 value: 34.997 - type: ndcg_at_1 value: 28.063 - type: ndcg_at_10 value: 36.951 - type: ndcg_at_100 value: 43.287 - type: ndcg_at_1000 value: 45.777 - type: ndcg_at_3 value: 32.786 - type: ndcg_at_5 value: 34.65 - type: precision_at_1 value: 28.063 - type: precision_at_10 value: 7.055 - type: precision_at_100 value: 1.476 - type: precision_at_1000 value: 0.22899999999999998 - type: precision_at_3 value: 15.481 - type: precision_at_5 value: 11.186 - type: recall_at_1 value: 23.133 - type: recall_at_10 value: 47.285 - type: recall_at_100 value: 76.176 - type: recall_at_1000 value: 92.176 - type: recall_at_3 value: 35.223 - type: recall_at_5 value: 40.142 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.547 - type: map_at_10 value: 26.374 - type: map_at_100 value: 27.419 - type: map_at_1000 value: 27.539 - type: map_at_3 value: 23.882 - type: map_at_5 value: 25.163999999999998 - type: mrr_at_1 value: 21.442 - type: mrr_at_10 value: 28.458 - type: mrr_at_100 value: 29.360999999999997 - type: mrr_at_1000 value: 
29.448999999999998 - type: mrr_at_3 value: 25.97 - type: mrr_at_5 value: 27.273999999999997 - type: ndcg_at_1 value: 21.442 - type: ndcg_at_10 value: 30.897000000000002 - type: ndcg_at_100 value: 35.99 - type: ndcg_at_1000 value: 38.832 - type: ndcg_at_3 value: 25.944 - type: ndcg_at_5 value: 28.126 - type: precision_at_1 value: 21.442 - type: precision_at_10 value: 4.9910000000000005 - type: precision_at_100 value: 0.8109999999999999 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 11.029 - type: precision_at_5 value: 7.911 - type: recall_at_1 value: 19.547 - type: recall_at_10 value: 42.886 - type: recall_at_100 value: 66.64999999999999 - type: recall_at_1000 value: 87.368 - type: recall_at_3 value: 29.143 - type: recall_at_5 value: 34.544000000000004 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 15.572 - type: map_at_10 value: 25.312 - type: map_at_100 value: 27.062 - type: map_at_1000 value: 27.253 - type: map_at_3 value: 21.601 - type: map_at_5 value: 23.473 - type: mrr_at_1 value: 34.984 - type: mrr_at_10 value: 46.406 - type: mrr_at_100 value: 47.179 - type: mrr_at_1000 value: 47.21 - type: mrr_at_3 value: 43.485 - type: mrr_at_5 value: 45.322 - type: ndcg_at_1 value: 34.984 - type: ndcg_at_10 value: 34.344 - type: ndcg_at_100 value: 41.015 - type: ndcg_at_1000 value: 44.366 - type: ndcg_at_3 value: 29.119 - type: ndcg_at_5 value: 30.825999999999997 - type: precision_at_1 value: 34.984 - type: precision_at_10 value: 10.358 - type: precision_at_100 value: 1.762 - type: precision_at_1000 value: 0.23900000000000002 - type: precision_at_3 value: 21.368000000000002 - type: precision_at_5 value: 15.948 - type: recall_at_1 value: 15.572 - type: recall_at_10 value: 39.367999999999995 - type: recall_at_100 value: 62.183 - type: recall_at_1000 value: 80.92200000000001 - type: recall_at_3 value: 26.131999999999998 - type: recall_at_5 value: 31.635999999999996 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.848 - type: map_at_10 value: 19.25 - type: map_at_100 value: 27.193 - type: map_at_1000 value: 28.721999999999998 - type: map_at_3 value: 13.968 - type: map_at_5 value: 16.283 - type: mrr_at_1 value: 68.75 - type: mrr_at_10 value: 76.25 - type: mrr_at_100 value: 76.534 - type: mrr_at_1000 value: 76.53999999999999 - type: mrr_at_3 value: 74.667 - type: mrr_at_5 value: 75.86699999999999 - type: ndcg_at_1 value: 56.00000000000001 - type: ndcg_at_10 value: 41.426 - type: ndcg_at_100 value: 45.660000000000004 - type: ndcg_at_1000 value: 53.02 - type: ndcg_at_3 value: 46.581 - type: ndcg_at_5 value: 43.836999999999996 - type: precision_at_1 value: 68.75 - type: precision_at_10 value: 32.800000000000004 - type: precision_at_100 value: 10.440000000000001 - type: precision_at_1000 value: 1.9980000000000002 - type: precision_at_3 value: 49.667 - type: precision_at_5 value: 42.25 - type: recall_at_1 value: 8.848 - type: recall_at_10 value: 24.467 - type: recall_at_100 value: 51.344 - type: recall_at_1000 value: 75.235 - type: recall_at_3 value: 15.329 - type: recall_at_5 value: 18.892999999999997 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 48.95 - type: f1 value: 43.44563593360779 - task: type: Retrieval 
dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 78.036 - type: map_at_10 value: 85.639 - type: map_at_100 value: 85.815 - type: map_at_1000 value: 85.829 - type: map_at_3 value: 84.795 - type: map_at_5 value: 85.336 - type: mrr_at_1 value: 84.353 - type: mrr_at_10 value: 90.582 - type: mrr_at_100 value: 90.617 - type: mrr_at_1000 value: 90.617 - type: mrr_at_3 value: 90.132 - type: mrr_at_5 value: 90.447 - type: ndcg_at_1 value: 84.353 - type: ndcg_at_10 value: 89.003 - type: ndcg_at_100 value: 89.60000000000001 - type: ndcg_at_1000 value: 89.836 - type: ndcg_at_3 value: 87.81400000000001 - type: ndcg_at_5 value: 88.478 - type: precision_at_1 value: 84.353 - type: precision_at_10 value: 10.482 - type: precision_at_100 value: 1.099 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 33.257999999999996 - type: precision_at_5 value: 20.465 - type: recall_at_1 value: 78.036 - type: recall_at_10 value: 94.517 - type: recall_at_100 value: 96.828 - type: recall_at_1000 value: 98.261 - type: recall_at_3 value: 91.12 - type: recall_at_5 value: 92.946 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.191 - type: map_at_10 value: 32.369 - type: map_at_100 value: 34.123999999999995 - type: map_at_1000 value: 34.317 - type: map_at_3 value: 28.71 - type: map_at_5 value: 30.607 - type: mrr_at_1 value: 40.894999999999996 - type: mrr_at_10 value: 48.842 - type: mrr_at_100 value: 49.599 - type: mrr_at_1000 value: 49.647000000000006 - type: mrr_at_3 value: 46.785 - type: mrr_at_5 value: 47.672 - type: ndcg_at_1 value: 40.894999999999996 - type: ndcg_at_10 value: 39.872 - type: ndcg_at_100 value: 46.126 - type: ndcg_at_1000 value: 49.476 - type: ndcg_at_3 value: 37.153000000000006 - type: ndcg_at_5 value: 37.433 - type: precision_at_1 value: 40.894999999999996 - type: precision_at_10 value: 10.818 - type: precision_at_100 value: 1.73 - type: precision_at_1000 value: 0.231 - type: precision_at_3 value: 25.051000000000002 - type: precision_at_5 value: 17.531 - type: recall_at_1 value: 20.191 - type: recall_at_10 value: 45.768 - type: recall_at_100 value: 68.82000000000001 - type: recall_at_1000 value: 89.133 - type: recall_at_3 value: 33.296 - type: recall_at_5 value: 38.022 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.257 - type: map_at_10 value: 61.467000000000006 - type: map_at_100 value: 62.364 - type: map_at_1000 value: 62.424 - type: map_at_3 value: 58.228 - type: map_at_5 value: 60.283 - type: mrr_at_1 value: 78.515 - type: mrr_at_10 value: 84.191 - type: mrr_at_100 value: 84.378 - type: mrr_at_1000 value: 84.385 - type: mrr_at_3 value: 83.284 - type: mrr_at_5 value: 83.856 - type: ndcg_at_1 value: 78.515 - type: ndcg_at_10 value: 69.78999999999999 - type: ndcg_at_100 value: 72.886 - type: ndcg_at_1000 value: 74.015 - type: ndcg_at_3 value: 65.23 - type: ndcg_at_5 value: 67.80199999999999 - type: precision_at_1 value: 78.515 - type: precision_at_10 value: 14.519000000000002 - type: precision_at_100 value: 1.694 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 41.702 - type: precision_at_5 value: 27.046999999999997 - type: recall_at_1 value: 39.257 - type: recall_at_10 value: 72.59299999999999 - type: recall_at_100 value: 84.679 - type: recall_at_1000 value: 92.12 - type: recall_at_3 value: 62.552 - 
type: recall_at_5 value: 67.616 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.5152 - type: ap value: 87.64584669595709 - type: f1 value: 91.50605576428437 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.926000000000002 - type: map_at_10 value: 34.049 - type: map_at_100 value: 35.213 - type: map_at_1000 value: 35.265 - type: map_at_3 value: 30.309 - type: map_at_5 value: 32.407000000000004 - type: mrr_at_1 value: 22.55 - type: mrr_at_10 value: 34.657 - type: mrr_at_100 value: 35.760999999999996 - type: mrr_at_1000 value: 35.807 - type: mrr_at_3 value: 30.989 - type: mrr_at_5 value: 33.039 - type: ndcg_at_1 value: 22.55 - type: ndcg_at_10 value: 40.842 - type: ndcg_at_100 value: 46.436 - type: ndcg_at_1000 value: 47.721999999999994 - type: ndcg_at_3 value: 33.209 - type: ndcg_at_5 value: 36.943 - type: precision_at_1 value: 22.55 - type: precision_at_10 value: 6.447 - type: precision_at_100 value: 0.9249999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.136000000000001 - type: precision_at_5 value: 10.381 - type: recall_at_1 value: 21.926000000000002 - type: recall_at_10 value: 61.724999999999994 - type: recall_at_100 value: 87.604 - type: recall_at_1000 value: 97.421 - type: recall_at_3 value: 40.944 - type: recall_at_5 value: 49.915 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.54765161878704 - type: f1 value: 93.3298945415573 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 75.71591427268582 - type: f1 value: 59.32113870474471 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 75.83053127101547 - type: f1 value: 73.60757944876475 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.72562205783457 - type: f1 value: 78.63761662505502 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.37935633767996 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.55270546130387 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.462692753143834 - type: mrr value: 31.497569753511563 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.646 - type: map_at_10 value: 12.498 - type: map_at_100 value: 15.486 - type: 
map_at_1000 value: 16.805999999999997 - type: map_at_3 value: 9.325 - type: map_at_5 value: 10.751 - type: mrr_at_1 value: 43.034 - type: mrr_at_10 value: 52.662 - type: mrr_at_100 value: 53.189 - type: mrr_at_1000 value: 53.25 - type: mrr_at_3 value: 50.929 - type: mrr_at_5 value: 51.92 - type: ndcg_at_1 value: 41.796 - type: ndcg_at_10 value: 33.477000000000004 - type: ndcg_at_100 value: 29.996000000000002 - type: ndcg_at_1000 value: 38.864 - type: ndcg_at_3 value: 38.940000000000005 - type: ndcg_at_5 value: 36.689 - type: precision_at_1 value: 43.034 - type: precision_at_10 value: 24.799 - type: precision_at_100 value: 7.432999999999999 - type: precision_at_1000 value: 1.9929999999999999 - type: precision_at_3 value: 36.842000000000006 - type: precision_at_5 value: 32.135999999999996 - type: recall_at_1 value: 5.646 - type: recall_at_10 value: 15.963 - type: recall_at_100 value: 29.492 - type: recall_at_1000 value: 61.711000000000006 - type: recall_at_3 value: 10.585 - type: recall_at_5 value: 12.753999999999998 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 27.602 - type: map_at_10 value: 41.545 - type: map_at_100 value: 42.644999999999996 - type: map_at_1000 value: 42.685 - type: map_at_3 value: 37.261 - type: map_at_5 value: 39.706 - type: mrr_at_1 value: 31.141000000000002 - type: mrr_at_10 value: 44.139 - type: mrr_at_100 value: 44.997 - type: mrr_at_1000 value: 45.025999999999996 - type: mrr_at_3 value: 40.503 - type: mrr_at_5 value: 42.64 - type: ndcg_at_1 value: 31.141000000000002 - type: ndcg_at_10 value: 48.995 - type: ndcg_at_100 value: 53.788000000000004 - type: ndcg_at_1000 value: 54.730000000000004 - type: ndcg_at_3 value: 40.844 - type: ndcg_at_5 value: 44.955 - type: precision_at_1 value: 31.141000000000002 - type: precision_at_10 value: 8.233 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 18.579 - type: precision_at_5 value: 13.533999999999999 - type: recall_at_1 value: 27.602 - type: recall_at_10 value: 69.216 - type: recall_at_100 value: 90.252 - type: recall_at_1000 value: 97.27 - type: recall_at_3 value: 47.987 - type: recall_at_5 value: 57.438 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.949 - type: map_at_10 value: 84.89999999999999 - type: map_at_100 value: 85.531 - type: map_at_1000 value: 85.548 - type: map_at_3 value: 82.027 - type: map_at_5 value: 83.853 - type: mrr_at_1 value: 81.69999999999999 - type: mrr_at_10 value: 87.813 - type: mrr_at_100 value: 87.917 - type: mrr_at_1000 value: 87.91799999999999 - type: mrr_at_3 value: 86.938 - type: mrr_at_5 value: 87.53999999999999 - type: ndcg_at_1 value: 81.75 - type: ndcg_at_10 value: 88.55499999999999 - type: ndcg_at_100 value: 89.765 - type: ndcg_at_1000 value: 89.871 - type: ndcg_at_3 value: 85.905 - type: ndcg_at_5 value: 87.41 - type: precision_at_1 value: 81.75 - type: precision_at_10 value: 13.403 - type: precision_at_100 value: 1.528 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.597 - type: precision_at_5 value: 24.69 - type: recall_at_1 value: 70.949 - type: recall_at_10 value: 95.423 - type: recall_at_100 value: 99.509 - type: recall_at_1000 value: 99.982 - type: recall_at_3 value: 87.717 - type: recall_at_5 value: 92.032 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: 
default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 51.76962893449579 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.32897690686379 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.478 - type: map_at_10 value: 11.994 - type: map_at_100 value: 13.977 - type: map_at_1000 value: 14.295 - type: map_at_3 value: 8.408999999999999 - type: map_at_5 value: 10.024 - type: mrr_at_1 value: 22.1 - type: mrr_at_10 value: 33.526 - type: mrr_at_100 value: 34.577000000000005 - type: mrr_at_1000 value: 34.632000000000005 - type: mrr_at_3 value: 30.217 - type: mrr_at_5 value: 31.962000000000003 - type: ndcg_at_1 value: 22.1 - type: ndcg_at_10 value: 20.191 - type: ndcg_at_100 value: 27.954 - type: ndcg_at_1000 value: 33.491 - type: ndcg_at_3 value: 18.787000000000003 - type: ndcg_at_5 value: 16.378999999999998 - type: precision_at_1 value: 22.1 - type: precision_at_10 value: 10.69 - type: precision_at_100 value: 2.1919999999999997 - type: precision_at_1000 value: 0.35200000000000004 - type: precision_at_3 value: 17.732999999999997 - type: precision_at_5 value: 14.499999999999998 - type: recall_at_1 value: 4.478 - type: recall_at_10 value: 21.657 - type: recall_at_100 value: 44.54 - type: recall_at_1000 value: 71.542 - type: recall_at_3 value: 10.778 - type: recall_at_5 value: 14.687 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.82325259156718 - type: cos_sim_spearman value: 79.2463589100662 - type: euclidean_pearson value: 80.48318380496771 - type: euclidean_spearman value: 79.34451935199979 - type: manhattan_pearson value: 80.39041824178759 - type: manhattan_spearman value: 79.23002892700211 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.74130231431258 - type: cos_sim_spearman value: 78.36856568042397 - type: euclidean_pearson value: 82.48301631890303 - type: euclidean_spearman value: 78.28376980722732 - type: manhattan_pearson value: 82.43552075450525 - type: manhattan_spearman value: 78.22702443947126 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 79.96138619461459 - type: cos_sim_spearman value: 81.85436343502379 - type: euclidean_pearson value: 81.82895226665367 - type: euclidean_spearman value: 82.22707349602916 - type: manhattan_pearson value: 81.66303369445873 - type: manhattan_spearman value: 82.05030197179455 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 80.05481244198648 - type: cos_sim_spearman value: 80.85052504637808 - type: euclidean_pearson value: 80.86728419744497 - type: euclidean_spearman value: 81.033786401512 - type: manhattan_pearson value: 80.90107531061103 - type: manhattan_spearman value: 81.11374116827795 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: 
ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 84.615220756399 - type: cos_sim_spearman value: 86.46858500002092 - type: euclidean_pearson value: 86.08307800247586 - type: euclidean_spearman value: 86.72691443870013 - type: manhattan_pearson value: 85.96155594487269 - type: manhattan_spearman value: 86.605909505275 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 82.14363913634436 - type: cos_sim_spearman value: 84.48430226487102 - type: euclidean_pearson value: 83.75303424801902 - type: euclidean_spearman value: 84.56762380734538 - type: manhattan_pearson value: 83.6135447165928 - type: manhattan_spearman value: 84.39898212616731 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 85.09909252554525 - type: cos_sim_spearman value: 85.70951402743276 - type: euclidean_pearson value: 87.1991936239908 - type: euclidean_spearman value: 86.07745840612071 - type: manhattan_pearson value: 87.25039137549952 - type: manhattan_spearman value: 85.99938746659761 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.529332093413615 - type: cos_sim_spearman value: 65.38177340147439 - type: euclidean_pearson value: 66.35278011412136 - type: euclidean_spearman value: 65.47147267032997 - type: manhattan_pearson value: 66.71804682408693 - type: manhattan_spearman value: 65.67406521423597 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 82.45802942885662 - type: cos_sim_spearman value: 84.8853341842566 - type: euclidean_pearson value: 84.60915021096707 - type: euclidean_spearman value: 85.11181242913666 - type: manhattan_pearson value: 84.38600521210364 - type: manhattan_spearman value: 84.89045417981723 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.92793380635129 - type: mrr value: 95.85834191226348 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 55.74400000000001 - type: map_at_10 value: 65.455 - type: map_at_100 value: 66.106 - type: map_at_1000 value: 66.129 - type: map_at_3 value: 62.719 - type: map_at_5 value: 64.441 - type: mrr_at_1 value: 58.667 - type: mrr_at_10 value: 66.776 - type: mrr_at_100 value: 67.363 - type: mrr_at_1000 value: 67.384 - type: mrr_at_3 value: 64.889 - type: mrr_at_5 value: 66.122 - type: ndcg_at_1 value: 58.667 - type: ndcg_at_10 value: 69.904 - type: ndcg_at_100 value: 72.807 - type: ndcg_at_1000 value: 73.423 - type: ndcg_at_3 value: 65.405 - type: ndcg_at_5 value: 67.86999999999999 - type: precision_at_1 value: 58.667 - type: precision_at_10 value: 9.3 - type: precision_at_100 value: 1.08 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.444 - type: precision_at_5 value: 17 - type: recall_at_1 value: 55.74400000000001 - type: recall_at_10 value: 82.122 - type: recall_at_100 value: 95.167 - type: 
recall_at_1000 value: 100 - type: recall_at_3 value: 70.14399999999999 - type: recall_at_5 value: 76.417 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.86534653465347 - type: cos_sim_ap value: 96.54142419791388 - type: cos_sim_f1 value: 93.07535641547861 - type: cos_sim_precision value: 94.81327800829875 - type: cos_sim_recall value: 91.4 - type: dot_accuracy value: 99.86435643564356 - type: dot_ap value: 96.53682260449868 - type: dot_f1 value: 92.98515104966718 - type: dot_precision value: 95.27806925498426 - type: dot_recall value: 90.8 - type: euclidean_accuracy value: 99.86336633663366 - type: euclidean_ap value: 96.5228676185697 - type: euclidean_f1 value: 92.9735234215886 - type: euclidean_precision value: 94.70954356846472 - type: euclidean_recall value: 91.3 - type: manhattan_accuracy value: 99.85841584158416 - type: manhattan_ap value: 96.50392760934032 - type: manhattan_f1 value: 92.84642321160581 - type: manhattan_precision value: 92.8928928928929 - type: manhattan_recall value: 92.80000000000001 - type: max_accuracy value: 99.86534653465347 - type: max_ap value: 96.54142419791388 - type: max_f1 value: 93.07535641547861 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 61.08285408766616 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.640675309010604 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 53.20333913710715 - type: mrr value: 54.088813555725324 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.79465221925075 - type: cos_sim_spearman value: 30.530816059163634 - type: dot_pearson value: 31.364837244718043 - type: dot_spearman value: 30.79726823684003 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22599999999999998 - type: map_at_10 value: 1.735 - type: map_at_100 value: 8.978 - type: map_at_1000 value: 20.851 - type: map_at_3 value: 0.613 - type: map_at_5 value: 0.964 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 92.867 - type: mrr_at_100 value: 92.867 - type: mrr_at_1000 value: 92.867 - type: mrr_at_3 value: 92.667 - type: mrr_at_5 value: 92.667 - type: ndcg_at_1 value: 82 - type: ndcg_at_10 value: 73.164 - type: ndcg_at_100 value: 51.878 - type: ndcg_at_1000 value: 44.864 - type: ndcg_at_3 value: 79.184 - type: ndcg_at_5 value: 76.39 - type: precision_at_1 value: 88 - type: precision_at_10 value: 76.2 - type: precision_at_100 value: 52.459999999999994 - type: precision_at_1000 value: 19.692 - type: precision_at_3 value: 82.667 - type: precision_at_5 value: 80 - type: recall_at_1 value: 0.22599999999999998 - type: recall_at_10 value: 1.942 - type: recall_at_100 value: 12.342 - type: recall_at_1000 
value: 41.42 - type: recall_at_3 value: 0.637 - type: recall_at_5 value: 1.034 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 3.567 - type: map_at_10 value: 13.116 - type: map_at_100 value: 19.39 - type: map_at_1000 value: 20.988 - type: map_at_3 value: 7.109 - type: map_at_5 value: 9.950000000000001 - type: mrr_at_1 value: 42.857 - type: mrr_at_10 value: 57.404999999999994 - type: mrr_at_100 value: 58.021 - type: mrr_at_1000 value: 58.021 - type: mrr_at_3 value: 54.762 - type: mrr_at_5 value: 56.19 - type: ndcg_at_1 value: 38.775999999999996 - type: ndcg_at_10 value: 30.359 - type: ndcg_at_100 value: 41.284 - type: ndcg_at_1000 value: 52.30200000000001 - type: ndcg_at_3 value: 36.744 - type: ndcg_at_5 value: 34.326 - type: precision_at_1 value: 42.857 - type: precision_at_10 value: 26.122 - type: precision_at_100 value: 8.082 - type: precision_at_1000 value: 1.559 - type: precision_at_3 value: 40.136 - type: precision_at_5 value: 35.510000000000005 - type: recall_at_1 value: 3.567 - type: recall_at_10 value: 19.045 - type: recall_at_100 value: 49.979 - type: recall_at_1000 value: 84.206 - type: recall_at_3 value: 8.52 - type: recall_at_5 value: 13.103000000000002 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 68.8394 - type: ap value: 13.454399712443099 - type: f1 value: 53.04963076364322 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.546123372948514 - type: f1 value: 60.86952793277713 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 49.10042955060234 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.03308100375514 - type: cos_sim_ap value: 71.08284605869684 - type: cos_sim_f1 value: 65.42539436255494 - type: cos_sim_precision value: 64.14807302231237 - type: cos_sim_recall value: 66.75461741424802 - type: dot_accuracy value: 84.68736961316088 - type: dot_ap value: 69.20524036530992 - type: dot_f1 value: 63.54893953365829 - type: dot_precision value: 63.45698500394633 - type: dot_recall value: 63.641160949868066 - type: euclidean_accuracy value: 85.07480479227513 - type: euclidean_ap value: 71.14592761009864 - type: euclidean_f1 value: 65.43814432989691 - type: euclidean_precision value: 63.95465994962216 - type: euclidean_recall value: 66.99208443271768 - type: manhattan_accuracy value: 85.06288370984085 - type: manhattan_ap value: 71.07289742593868 - type: manhattan_f1 value: 65.37585421412301 - type: manhattan_precision value: 62.816147859922175 - type: manhattan_recall value: 68.15303430079156 - type: max_accuracy value: 85.07480479227513 - type: max_ap value: 71.14592761009864 - type: max_f1 value: 65.43814432989691 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default 
split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 87.79058485659952 - type: cos_sim_ap value: 83.7183187008759 - type: cos_sim_f1 value: 75.86921142180798 - type: cos_sim_precision value: 73.00683371298405 - type: cos_sim_recall value: 78.96519864490298 - type: dot_accuracy value: 87.0085768618776 - type: dot_ap value: 81.87467488474279 - type: dot_f1 value: 74.04188363990559 - type: dot_precision value: 72.10507114191901 - type: dot_recall value: 76.08561749307053 - type: euclidean_accuracy value: 87.8332751193387 - type: euclidean_ap value: 83.83585648120315 - type: euclidean_f1 value: 76.02582177042369 - type: euclidean_precision value: 73.36388371759989 - type: euclidean_recall value: 78.88820449645827 - type: manhattan_accuracy value: 87.87208444910156 - type: manhattan_ap value: 83.8101950642973 - type: manhattan_f1 value: 75.90454195535027 - type: manhattan_precision value: 72.44419564761039 - type: manhattan_recall value: 79.71204188481676 - type: max_accuracy value: 87.87208444910156 - type: max_ap value: 83.83585648120315 - type: max_f1 value: 76.02582177042369 license: mit language: - en --- **Recommend switching to newest BAAI/bge-small-en-v1.5, which has more reasonable similarity distribution and same method of usage.**

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Citation | License

For more details, please refer to our GitHub: FlagEmbedding. English | 中文

FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects:
- **Fine-tuning of LM**: LM-Cocktail
- **Dense Retrieval**: LLM Embedder, BGE Embedding, C-MTEB
- **Reranker Model**: BGE Reranker

## News
- 11/23/2023: Release LM-Cocktail, a method to maintain general capabilities during fine-tuning by merging multiple language models. Technical Report :fire:
- 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Technical Report
- 09/15/2023: The technical report of BGE has been released
- 09/15/2023: The massive training data of BGE has been released
- 09/12/2023: New models:
  - **New reranker model**: release the cross-encoder models bge-reranker-base and bge-reranker-large, which are more powerful than the embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models.
  - **Updated embedding model**: release the bge-*-v1.5 embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction.
- 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding an instruction during fine-tuning.
- 08/09/2023: BGE models are integrated into **Langchain**; the C-MTEB **leaderboard** is available.
- 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗**
- 08/02/2023: Release BGE (short for BAAI General Embedding) models, which **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
- 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
## Model List

bge is short for BAAI general embedding.

| Model | Language | | Description | query instruction for retrieval [1] |
|:---|:---:|:---:|:---:|:---:|
| LM-Cocktail | English | | fine-tuned models (Llama and BGE) which can be used to reproduce the results of LM-Cocktail | |
| BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README |
| BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | |
| BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | |
| BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | |
| BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | |
| BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | |
| BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | |
| BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | |
| BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | |
| BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model with ability similar to bge-large-en | |
| BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model with competitive performance | |
| BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | |
| BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model with ability similar to bge-large-zh | |
| BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model with competitive performance | |

[1]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, just use the original query directly. In all cases, **no instruction** needs to be added to passages.

[2]: Different from an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other simple models. For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents to get the final top-3 results (see the pipeline sketch below).

All models have been uploaded to the Huggingface Hub. If you cannot open the Huggingface Hub, you can also download the models from a mirror.
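To make the two-stage pipeline in [2] concrete, here is a minimal sketch. The model names, the toy corpus, and the candidate count are illustrative placeholders, not fixed choices:

```python
import numpy as np
from FlagEmbedding import FlagModel, FlagReranker

corpus = [
    "The giant panda is a bear species endemic to China.",
    "Paris is the capital and most populous city of France.",
    "Pandas mainly eat bamboo shoots and leaves.",
]
query = "what do pandas eat?"

# Stage 1: dense retrieval with a bge embedding model (bi-encoder).
retriever = FlagModel(
    "BAAI/bge-base-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)
q_emb = retriever.encode_queries([query])   # the instruction is prepended to the query
p_emb = retriever.encode(corpus)            # passages are encoded as-is
sims = (q_emb @ p_emb.T)[0]
candidates = np.argsort(-sims)[:2]          # keep top-k candidates (top-100 in practice)

# Stage 2: re-rank the candidates with a bge cross-encoder reranker.
reranker = FlagReranker("BAAI/bge-reranker-base")
scores = reranker.compute_score([[query, corpus[i]] for i in candidates])
best = candidates[int(np.argmax(scores))]
print(corpus[best])
```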
## Frequently asked questions

**1. How to fine-tune the bge embedding model?**

Follow this example to prepare data and fine-tune your model. Some suggestions (a sketch of the expected data layout follows this list):
- Mine hard negatives following this example, which can improve retrieval performance.
- If you pre-train bge on your own data, the pre-trained model cannot be used to calculate similarity directly; it must be fine-tuned with contrastive learning before computing similarity.
- If the accuracy of the fine-tuned model is still not high enough, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
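As a rough sketch of the fine-tuning data layout, the FlagEmbedding examples use a JSONL file where each line pairs a query with positive and (mined) hard-negative passages. The field names below follow that convention; the texts themselves are placeholders:

```python
import json

# One training example per line: a query, positive passages, and hard-negative passages.
examples = [
    {
        "query": "what do pandas eat?",
        "pos": ["Pandas mainly eat bamboo shoots and leaves."],
        "neg": ["Paris is the capital and most populous city of France."],
    },
]

with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```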
**2. The similarity score between two dissimilar sentences is higher than 0.5**

**We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model lies roughly in the interval [0.6, 1]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not their absolute values**, as the sketch below illustrates. If you need to filter similar sentences based on a similarity threshold, select an appropriate threshold from the similarity distribution on your own data (such as 0.8, 0.85, or even 0.9).
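A small sketch of that point: rank by score rather than trusting absolute values. The model name and sentences are placeholders, and the exact scores you observe will depend on the model:

```python
from FlagEmbedding import FlagModel

model = FlagModel("BAAI/bge-base-en")  # a pre-v1.5 model, as discussed above
anchor = model.encode(["what do pandas eat?"])
candidates = ["Pandas mainly eat bamboo.", "Paris is the capital of France."]
scores = (anchor @ model.encode(candidates).T)[0]

# Even the unrelated sentence may score above 0.5 here; what is reliable is
# that the related sentence ranks higher, so sort instead of thresholding.
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
print(ranked)
```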
**3. When does the query instruction need to be used?**

For the bge-*-v1.5 models, we improved retrieval ability when no instruction is used; omitting the instruction causes only a slight degradation in retrieval performance compared with using it, so for convenience you can generate embeddings without an instruction in all cases. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions to these short queries (see the sketch below). **The best way to decide whether to add instructions to queries is to choose the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages.
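To illustrate, adding the instruction just means prefixing the query text before encoding. The instruction string below is the one listed for the English bge models in the Model List, and the model name is a placeholder:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
instruction = "Represent this sentence for searching relevant passages: "

queries = ["what do pandas eat?"]
passages = ["Pandas mainly eat bamboo shoots and leaves."]

# Instruction on the short queries only; passages are always encoded as-is.
q_emb = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)
```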
## Usage

### Usage for Embedding Model

Here are some examples of using the models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers (minimal sketches for the FlagEmbedding and Transformers routes follow this section).

#### Using FlagEmbedding

If it doesn't work for you, see FlagEmbedding for more ways to install FlagEmbedding. For the value of the query-instruction argument, see the Model List. By default, FlagModel will use all available GPUs when encoding. Set CUDA_VISIBLE_DEVICES to select specific GPUs, or set it to an empty value to make all GPUs unavailable.

#### Using Sentence-Transformers

You can also use the models with sentence-transformers. For an s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see the Model List), but the instruction is not needed for passages.

#### Using Langchain

You can use bge in Langchain like this:

#### Using HuggingFace Transformers

With the transformers package, you can use the model like this: first pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.

### Usage for Reranker

Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.

#### Using FlagEmbedding

Get relevance scores (higher scores indicate more relevance):

#### Using Huggingface transformers
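Minimal sketches of the patterns described above, assuming the FlagEmbedding package's FlagModel/FlagReranker interfaces; model names and sentences are placeholders:

```python
from FlagEmbedding import FlagModel

model = FlagModel(
    "BAAI/bge-small-en",  # see the Model List for other checkpoints
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)

# Symmetric similarity between two sentence lists.
emb_1 = model.encode(["sample sentence 1"])
emb_2 = model.encode(["sample sentence 2"])
print(emb_1 @ emb_2.T)

# s2p retrieval: encode_queries() prepends the instruction to queries,
# while passages are encoded without any instruction.
q_emb = model.encode_queries(["what do pandas eat?"])
p_emb = model.encode(["Pandas mainly eat bamboo shoots and leaves."])
print(q_emb @ p_emb.T)
```

The same embeddings can be produced with plain transformers via the [CLS] pooling described under "Using HuggingFace Transformers":

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")
model = AutoModel.from_pretrained("BAAI/bge-small-en")
model.eval()

inputs = tokenizer(["sample sentence 1"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**inputs)
    embeddings = output[0][:, 0]  # last hidden state of the first ([CLS]) token
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```

And for the reranker, a relevance score for a (query, passage) pair:

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-base", use_fp16=True)  # fp16 speeds up inference
score = reranker.compute_score(["what do pandas eat?", "Pandas mainly eat bamboo."])
print(score)  # unbounded logit; higher means more relevant
```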
## Evaluation

bge models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools, see our scripts.

- **MTEB**:

| Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
| BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
| BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
| bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
| bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
| gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
| gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
| e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
| bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
| instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
| e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
| gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
| text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
| e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
| sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
| all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
| sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |

- **C-MTEB**: We create the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction.

| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
| BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
| BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
| BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
| bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
| BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
| multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
| BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
| m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
| m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
| multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
| multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
| text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
| luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
| text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
| text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |

- **Reranking**: See C_MTEB for the evaluation script.

| Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
| multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
| multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
| multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
| m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
| m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
| bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
| bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
| BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
| BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |

\*: T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks

## Train

### BAAI Embedding

We pre-train the models using retromae and train them on large-scale paired data using contrastive learning (a sketch of the in-batch contrastive objective follows this section). **You can fine-tune the embedding model on your data following our examples.** We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text; the pre-trained model cannot be used for similarity calculation directly and needs to be fine-tuned. For more training details for bge, see baai_general_embedding.
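As a rough illustration of the contrastive objective mentioned above (and of the temperature-0.01 point in FAQ 2), a standard in-batch-negative formulation looks like this; the actual training code lives in the FlagEmbedding repository and may differ in details:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q: torch.Tensor, p: torch.Tensor, temperature: float = 0.01):
    """q, p: (batch, dim) L2-normalized query/passage embeddings, positives aligned by row."""
    logits = q @ p.T / temperature                      # (batch, batch) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)  # diagonal entries are the positives
    return F.cross_entropy(logits, targets)             # other rows act as in-batch negatives
```

The small temperature sharpens the softmax over in-batch candidates, which is one reason the resulting cosine similarities concentrate in a narrow high interval.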
### BGE Reranker

The cross-encoder performs full attention over the input pair, which is more accurate than the embedding model (i.e., bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by the embedding model. We train the cross-encoder on multilingual pair data; the data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md

## Citation

If you find this repository useful, please consider giving it a star :star: and a citation.

## License

FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Performs text embedding tasks including classification, retrieval, clustering, reranking, and semantic textual similarity across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/BAAI_llm-embedder.json b/data/model_data_json/BAAI_llm-embedder.json new file mode 100644 index 0000000000000000000000000000000000000000..5f06be37dd713fb6adee55a1e3f2c89a83a0a4f7 --- /dev/null +++ b/data/model_data_json/BAAI_llm-embedder.json @@ -0,0 +1,18 @@ +{ + "model_id": "BAAI/llm-embedder", + "downloads": 82703, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "arxiv:2310.07554", + "arxiv:2309.07597", + "license:mit", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit ---

FlagEmbedding

Model List | FAQ | Usage | Evaluation | Train | Contact | Citation | License

For more details, please refer to our GitHub: FlagEmbedding. English | 中文

**Hiring:** We're seeking experienced NLP researchers and intern students focusing on dense retrieval and retrieval-augmented LLMs. If you're interested, please feel free to reach out to us via email at zhengliu1026@gmail.com.

FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, and semantic search. It can also be used in vector databases for LLMs (see the indexing sketch after the Updates list).

************* 🌟**Updates**🌟 *************
- 10/12/2023: Release LLM-Embedder, a unified embedding model to support diverse retrieval augmentation needs for LLMs. Paper :fire:
- 09/15/2023: The technical report of BGE has been released
- 09/15/2023: The massive training data of BGE has been released
- 09/12/2023: New models:
  - **New reranker model**: release the cross-encoder models bge-reranker-base and bge-reranker-large, which are more powerful than the embedding models. We recommend using/fine-tuning them to re-rank the top-k documents returned by embedding models.
  - **Updated embedding model**: release the bge-*-v1.5 embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction.
More - 09/07/2023: Update fine-tune code: add a script to mine hard negatives and support adding an instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use them like this; the C-MTEB **leaderboard** is available. - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release (short for BAAI General Embedding) Models, **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada: - 08/01/2023: We release the Chinese Massive Text Embedding Benchmark (**C-MTEB**), consisting of 31 test datasets.
## Model List is short for . | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | BAAI/llm-embedder | English | Inference Fine-tune | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See README | | BAAI/bge-reranker-large | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-reranker-base | Chinese and English | Inference Fine-tune | a cross-encoder model which is more accurate but less efficient [2] | | | BAAI/bge-large-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-base-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-small-en-v1.5 | English | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-large-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-base-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-small-zh-v1.5 | Chinese | Inference Fine-tune | version 1.5 with a more reasonable similarity distribution | | | BAAI/bge-large-en | English | Inference Fine-tune | :trophy: rank **1st** in MTEB leaderboard | | | BAAI/bge-base-en | English | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-en | English | Inference Fine-tune | a small-scale model but with competitive performance | | | BAAI/bge-large-zh | Chinese | Inference Fine-tune | :trophy: rank **1st** in C-MTEB benchmark | | | BAAI/bge-base-zh | Chinese | Inference Fine-tune | a base-scale model but with similar ability to | | | BAAI/bge-small-zh | Chinese | Inference Fine-tune | a small-scale model but with competitive performance | | [1\]: If you need to search for relevant passages given a query, we suggest adding the instruction to the query; in other cases, no instruction is needed; just use the original query directly. In all cases, **no instruction** needs to be added to passages. [2\]: Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other simple models. For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results. All models have been uploaded to the Huggingface Hub, and you can see them at If you cannot open the Huggingface Hub, you can also download the models at . ## Frequently asked questions **1. How to fine-tune bge embedding model?** Follow this example to prepare data and fine-tune your model. Some suggestions: - Mine hard negatives following this example, which can improve retrieval performance. - In general, a larger hyper-parameter brings better performance. You can expand it by enabling , (df_config.json can refer to ds_config.json, , etc.). - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity.
- If the accuracy of the fine-tuned model is still not high, it is recommended to use/fine-tune the cross-encoder model (bge-reranker) to re-rank the top-k results; a minimal sketch of this retrieve-then-rerank flow is shown below. Hard negatives are also needed to fine-tune the reranker.
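A minimal sketch of that retrieve-then-rerank flow, assuming the FlagEmbedding package (model names and texts are illustrative):

```python
from FlagEmbedding import FlagModel, FlagReranker

queries = ["what is a cross-encoder"]
passages = [
    "A cross-encoder scores a query-document pair jointly.",
    "Bi-encoders embed queries and documents independently.",
]

# Stage 1: retrieve with the bi-encoder (embedding model).
embedder = FlagModel("BAAI/bge-base-en-v1.5", use_fp16=True)
q_emb = embedder.encode_queries(queries)
p_emb = embedder.encode(passages)
scores = q_emb @ p_emb.T  # take the top-k passages from these scores

# Stage 2: re-rank the top-k with the cross-encoder.
reranker = FlagReranker("BAAI/bge-reranker-base", use_fp16=True)
rerank_scores = reranker.compute_score([[queries[0], p] for p in passages])
```

compute_score takes query-passage pairs and returns unbounded relevance scores; higher means more relevant.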
**2. The similarity score between two dissimilar sentences is higher than 0.5** **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.** Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE model is roughly in the interval \\[0.6, 1\\]. So a similarity score greater than 0.5 does not indicate that the two sentences are similar. For downstream tasks, such as passage retrieval or semantic similarity, **what matters is the relative order of the scores, not the absolute value.** If you need to filter similar sentences based on a similarity threshold, please select an appropriate similarity threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9).
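For example, with normalized bge embeddings the cosine scores cluster high, so a filter should use a data-driven cutoff rather than 0.5 (a sketch with sentence-transformers; the 0.85 threshold is only a placeholder):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
emb = model.encode(["query sentence", "candidate A", "candidate B"], normalize_embeddings=True)
scores = emb[0] @ emb[1:].T  # cosine similarities of the query against the candidates

THRESHOLD = 0.85  # pick this from the score distribution on *your* data, not a fixed 0.5
similar = [float(s) for s in scores if s >= THRESHOLD]
```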
**3. When does the query instruction need to be used?** For the , we improved its retrieval ability when not using an instruction. Using no instruction causes only a slight degradation in retrieval performance compared with using one. So you can generate embeddings without an instruction in all cases for convenience. For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries (see the sketch below). **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.** In all cases, no instruction needs to be added to the documents/passages.
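A sketch of the s2p setting; the English retrieval instruction shown here is the one used by the bge-en models to the best of our knowledge, so verify it against the Model List:

```python
from sentence_transformers import SentenceTransformer

instruction = "Represent this sentence for searching relevant passages: "  # assumed bge-en instruction
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Instruction is prepended to short queries only; passages are encoded as-is.
q_emb = model.encode([instruction + "how to bake bread"], normalize_embeddings=True)
p_emb = model.encode(["Preheat the oven to 230C, then ..."], normalize_embeddings=True)
print(q_emb @ p_emb.T)
```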
## Usage ### Usage for Embedding Model Here are some examples of using models with FlagEmbedding, Sentence-Transformers, Langchain, or Huggingface Transformers. #### Using FlagEmbedding If it doesn't work for you, you can see FlagEmbedding for more methods to install FlagEmbedding. For the value of the argument , see Model List. By default, FlagModel will use all available GPUs when encoding. Please set to select specific GPUs. You can also set to make all GPUs unavailable. #### Using Sentence-Transformers You can also use the models with sentence-transformers: For the s2p (short query to long passage) retrieval task, each short query should start with an instruction (for instructions, see Model List). But the instruction is not needed for passages. #### Using Langchain You can use in Langchain like this: #### Using HuggingFace Transformers With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding. ### Usage for Reranker Different from the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and a passage to the reranker. The reranker is optimized with cross-entropy loss, so the relevance score is not bounded to a specific range. #### Using FlagEmbedding Get relevance scores (higher scores indicate more relevance): #### Using Huggingface transformers ## Evaluation models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!** For more details and evaluation tools see our scripts. - **MTEB**: | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | BAAI/bge-large-en-v1.5 | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 | | BAAI/bge-base-en-v1.5 | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 | | BAAI/bge-small-en-v1.5 | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 | | bge-large-en | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 | | bge-base-en | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 | | gte-large | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 | | gte-base | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 | | bge-small-en | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 | | instructor-xl | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 | | e5-base-v2 | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 | | gte-small | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
- **C-MTEB**: We created the C-MTEB benchmark for Chinese text embedding, which consists of 31 datasets from 6 tasks. Please refer to C_MTEB for a detailed introduction. | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | **BAAI/bge-large-zh-v1.5** | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 | | BAAI/bge-base-zh-v1.5 | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 | | BAAI/bge-small-zh-v1.5 | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 | | BAAI/bge-large-zh | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 | | bge-large-zh-noinstruct | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 | | BAAI/bge-base-zh | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 | | multilingual-e5-large | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 | | BAAI/bge-small-zh | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 | | m3e-base | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 | | m3e-large | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 | | multilingual-e5-base | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 | | multilingual-e5-small | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 | | text-embedding-ada-002(OpenAI) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 | | luotuo | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 | | text2vec-base | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 | | text2vec-large | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 | - **Reranking**: See C_MTEB for the evaluation script. | Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg | |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 | | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 | | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 | | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 | | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 | | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 | | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 | | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 | | BAAI/bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 | | BAAI/bge-reranker-large | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 | \* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks ## Train ### BAAI Embedding We pre-train the models using retromae and train them on large-scale pair data using contrastive learning. **You can fine-tune the embedding model on your data following our examples** (the pair-data format is sketched below). We also provide a pre-train example. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly; it needs to be fine-tuned. For more training details for bge, see baai_general_embedding.
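For reference, a sketch of the pair-data format used by the fine-tuning examples, as far as we understand it (one JSON object per line; the texts here are invented):

```python
import json

example = {
    "query": "how do solar panels work",
    "pos": ["Solar panels convert sunlight into electricity using photovoltaic cells."],
    "neg": ["Wind turbines convert kinetic energy from wind into power."],  # mined hard negatives
}
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```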
### BGE Reranker A cross-encoder performs full attention over the input pair, which is more accurate than an embedding model (i.e., bi-encoder) but more time-consuming. Therefore, it can be used to re-rank the top-k documents returned by an embedding model. We train the cross-encoder on multilingual pair data. The data format is the same as for the embedding model, so you can fine-tune it easily following our example. For more details, please refer to ./FlagEmbedding/reranker/README.md ### Our Contributors: ## Contact If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn). ## Citation If you find this repository useful, please consider giving a star :star: and citation ## License FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge." +} \ No newline at end of file diff --git a/data/model_data_json/Babelscape_t5-base-summarization-claim-extractor.json b/data/model_data_json/Babelscape_t5-base-summarization-claim-extractor.json new file mode 100644 index 0000000000000000000000000000000000000000..9cf37279091ab023dbd8aa55e76a683a6caaa014 --- /dev/null +++ b/data/model_data_json/Babelscape_t5-base-summarization-claim-extractor.json @@ -0,0 +1,19 @@ +{ + "model_id": "Babelscape/t5-base-summarization-claim-extractor", + "downloads": 631466, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "en", + "arxiv:2403.02270", + "license:cc-by-nc-sa-4.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers language: - en license: - cc-by-nc-sa-4.0 widget: - text: \"A major tech company has unveiled its first fully autonomous electric vehicle, boasting a range of 500 miles per charge and advanced safety features designed to revolutionize the transportation industry.\" - text: \"A new global initiative to clean up ocean plastic aims to remove 50% of floating debris within a decade, using innovative autonomous vessels powered by renewable energy.\" - text: \"A historic peace agreement was signed between two long-standing rival nations, marking a turning point in diplomatic relations and promising economic and social cooperation for years to come.\" --- # Model Card: T5-base-summarization-claim-extractor ## Model Description **Model Name:** T5-base-summarization-claim-extractor **Authors:** Alessandro Scirè, Karim Ghonim, and Roberto Navigli **Contact:** scire@diag.uniroma1.it, scire@babelscape.com **Language:** English **Primary Use:** Extraction of atomic claims from a summary ### Overview The T5-base-summarization-claim-extractor is a model developed for the task of extracting atomic claims from summaries. The model is based on the T5 architecture, which is then fine-tuned specifically for claim extraction. This model was introduced as part of the research presented in the paper \"FENICE: Factuality Evaluation of summarization based on Natural Language Inference and Claim Extraction\" by Alessandro Scirè, Karim Ghonim, and Roberto Navigli. FENICE leverages Natural Language Inference (NLI) and Claim Extraction to evaluate the factuality of summaries. ArXiv version. ### Intended Use This model is designed to: - Extract atomic claims from summaries. - Serve as a component in pipelines for factuality evaluation of summaries. ## Example Code **Note**: The model outputs the claims in a single string. **Kindly remember to split the string into sentences** in order to retrieve the individual claims.
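A minimal sketch of that usage with standard transformers seq2seq generation (the generation settings and the naive sentence split are our assumptions, not the authors' exact configuration):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

name = "Babelscape/t5-base-summarization-claim-extractor"
tokenizer = T5Tokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)

summary = "A new global initiative to clean up ocean plastic aims to remove 50% of floating debris within a decade."
inputs = tokenizer(summary, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=128)  # assumed generation budget
claims_text = tokenizer.batch_decode(ids, skip_special_tokens=True)[0]

# The model emits all claims in one string; split into sentences to recover individual claims.
claims = [c.strip() for c in claims_text.split(". ") if c.strip()]
```

A proper sentence splitter (e.g. nltk) is preferable to the naive split shown here.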
### Training For details regarding the training process, please check out our paper (Section 4.1). ### Performance | Model | easinessP | easinessR | easinessF1 | |:-------------------------------------:|:--------------------:|:--------------------:|:---------------------:| | GPT-3.5 | 80.1 | 70.9 | 74.9 | | t5-base-summarization-claim-extractor | 79.2 | 68.8 | 73.4 | **Table 1:** Easiness Precision (easinessP), Recall (easinessR), and F1 score (easinessF1) results for the LLM-based claim extractor, namely **GPT-3.5**, and **t5-base-summarization-claim-extractor**, assessed on ROSE (Liu et al., 2023b). Further details on the model's performance and the metrics used can be found in the paper (Section 4.1). ### Main Repository For more details about FENICE, check out the GitHub repository: Babelscape/FENICE ### Citation If you use this model in your work, please cite the following paper: ### Limitations - The model is specifically designed for extracting claims from summaries and may not perform well on other types of texts. - The model is currently available only in English and may not generalize well to other languages. ### Ethical Considerations Users should be aware that while this model extracts claims that can be evaluated for factuality, it does not determine the truthfulness of those claims. Therefore, it should be used in conjunction with other tools or human judgment when evaluating the reliability of summaries. ### Acknowledgments This work was made possible thanks to the support of Babelscape and Sapienza NLP.", + "model_explanation_gemini": "Extracts atomic claims from summaries to support factuality evaluation pipelines." +} \ No newline at end of file diff --git a/data/model_data_json/Babelscape_wikineural-multilingual-ner.json b/data/model_data_json/Babelscape_wikineural-multilingual-ner.json new file mode 100644 index 0000000000000000000000000000000000000000..90d599b52106eeb01ff108ed01b99cc4208a0ede --- /dev/null +++ b/data/model_data_json/Babelscape_wikineural-multilingual-ner.json @@ -0,0 +1,31 @@ +{ + "model_id": "Babelscape/wikineural-multilingual-ner", + "downloads": 263863, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "safetensors", + "bert", + "token-classification", + "named-entity-recognition", + "sequence-tagger-model", + "de", + "en", + "es", + "fr", + "it", + "nl", + "pl", + "pt", + "ru", + "multilingual", + "dataset:Babelscape/wikineural", + "license:cc-by-nc-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- annotations_creators: - machine-generated language_creators: - machine-generated widget: - text: My name is Wolfgang and I live in Berlin. - text: George Washington went to Washington. - text: Mi nombre es Sarah y vivo en Londres. - text: Меня зовут Симона, и я живу в Риме. tags: - named-entity-recognition - sequence-tagger-model datasets: - Babelscape/wikineural language: - de - en - es - fr - it - nl - pl - pt - ru - multilingual license: - cc-by-nc-sa-4.0 pretty_name: wikineural-dataset source_datasets: - original task_categories: - structure-prediction task_ids: - named-entity-recognition --- # WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER This is the model card for the EMNLP 2021 paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. We fine-tuned a multilingual language model (mBERT) for 3 epochs on our WikiNEuRal dataset for Named Entity Recognition (NER). The resulting multilingual NER model supports the 9 languages covered by WikiNEuRal (de, en, es, fr, it, nl, pl, pt, ru), and it was trained on all 9 languages jointly. **If you use the model, please reference this work in your paper**: The original repository for the paper can be found at ## How to use You can use this model with Transformers *pipeline* for NER.
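A minimal sketch (the aggregation setting is our assumption for grouping word pieces into entity spans):

```python
from transformers import pipeline

ner = pipeline(
    "ner",
    model="Babelscape/wikineural-multilingual-ner",
    aggregation_strategy="simple",  # merge sub-tokens into whole entity spans
)
print(ner("My name is Wolfgang and I live in Berlin."))
```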
## Limitations and bias This model is trained on WikiNEuRal, a state-of-the-art dataset for Multilingual NER automatically derived from Wikipedia. Therefore, it might not generalize well to all textual genres (e.g. news). On the other hand, models trained only on news articles (e.g. only on CoNLL03) have been proven to obtain much lower scores on encyclopedic articles. To obtain more robust systems, we encourage you to train a system on the combination of WikiNEuRal with other datasets (e.g. WikiNEuRal + CoNLL). ## Licensing Information Contents of this repository are restricted to only non-commercial research purposes under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Copyright of the dataset contents and models belongs to the original copyright holders.", + "model_explanation_gemini": "Performs multilingual named entity recognition (NER) across nine languages using a model fine-tuned on the WikiNEuRal dataset." +} \ No newline at end of file diff --git a/data/model_data_json/Bingsu_adetailer.json b/data/model_data_json/Bingsu_adetailer.json new file mode 100644 index 0000000000000000000000000000000000000000..8ebe5e6804df4639295631c3303707b77bb1b975 --- /dev/null +++ b/data/model_data_json/Bingsu_adetailer.json @@ -0,0 +1,15 @@ +{ + "model_id": "Bingsu/adetailer", + "downloads": 19922172, + "tags": [ + "ultralytics", + "pytorch", + "dataset:wider_face", + "dataset:skytnt/anime-segmentation", + "doi:10.57967/hf/3633", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: ultralytics datasets: - wider_face - skytnt/anime-segmentation tags: - pytorch --- # YOLOv8 Detection Model ## Datasets ### Face - Anime Face CreateML - xml2txt - AN - wider face ### Hand - AnHDet - hand-detection-fuao9 ### Person - coco2017 (only person) - AniSeg - skytnt/anime-segmentation ### deepfashion2 - deepfashion2 | id | label | | --- | --------------------- | | 0 | short_sleeved_shirt | | 1 | long_sleeved_shirt | | 2 | short_sleeved_outwear | | 3 | long_sleeved_outwear | | 4 | vest | | 5 | sling | | 6 | shorts | | 7 | trousers | | 8 | skirt | | 9 | short_sleeved_dress | | 10 | long_sleeved_dress | | 11 | vest_dress | | 12 | sling_dress | ## Info | Model | Target | mAP 50 | mAP 50-95 | | --------------------------- | --------------------- | ----------------------------- | ----------------------------- | | face_yolov8n.pt | 2D / realistic face | 0.660 | 0.366 | | face_yolov8n_v2.pt | 2D / realistic face | 0.669 | 0.372 | | face_yolov8s.pt | 2D / realistic face | 0.713 | 0.404 | | face_yolov8m.pt | 2D / realistic face | 0.737 | 0.424 | | face_yolov9c.pt | 2D / realistic face | 0.748 | 0.433 | | hand_yolov8n.pt | 2D / realistic hand | 0.767 | 0.505 | | hand_yolov8s.pt | 2D / realistic hand | 0.794 | 0.527 | | hand_yolov9c.pt | 2D / realistic hand | 0.810 | 0.550 | | person_yolov8n-seg.pt | 2D / realistic person | 0.782 (bbox), 0.761 (mask) | 0.555 (bbox), 0.460 (mask) |
| person_yolov8s-seg.pt | 2D / realistic person | 0.824 (bbox), 0.809 (mask) | 0.605 (bbox), 0.508 (mask) | | person_yolov8m-seg.pt | 2D / realistic person | 0.849 (bbox), 0.831 (mask) | 0.636 (bbox), 0.533 (mask) | | deepfashion2_yolov8s-seg.pt | realistic clothes | 0.849 (bbox), 0.840 (mask) | 0.763 (bbox), 0.675 (mask) | ## Usage ## Unsafe files Since is classified as a dangerous pickle function, any segmentation model that uses it is classified as unsafe. All models were created and saved using the official ultralytics library, so it's okay to use files downloaded from a trusted source. See also:", + "model_explanation_gemini": "Detects and segments faces, hands, persons, and clothing items in 2D images with various YOLOv8 model variants." +} \ No newline at end of file diff --git a/data/model_data_json/Bingsu_yolo-world-mirror.json b/data/model_data_json/Bingsu_yolo-world-mirror.json new file mode 100644 index 0000000000000000000000000000000000000000..25407152ab799d281107210fcbc15e592d2200c8 --- /dev/null +++ b/data/model_data_json/Bingsu_yolo-world-mirror.json @@ -0,0 +1,11 @@ +{ + "model_id": "Bingsu/yolo-world-mirror", + "downloads": 3345238, + "tags": [ + "ultralytics", + "license:agpl-3.0", + "region:us" + ], + "description": "--- license: agpl-3.0 tags: - ultralytics --- # YOLO World Mirror model weights for ultralytics yolo models", + "model_explanation_gemini": "Mirror model weights for YOLO World in Ultralytics YOLO models." +} \ No newline at end of file diff --git a/data/model_data_json/ByteDance_AnimateDiff-Lightning.json b/data/model_data_json/ByteDance_AnimateDiff-Lightning.json new file mode 100644 index 0000000000000000000000000000000000000000..d1869ef8d194f4d415be1498c9764c43d53e693c --- /dev/null +++ b/data/model_data_json/ByteDance_AnimateDiff-Lightning.json @@ -0,0 +1,15 @@ +{ + "model_id": "ByteDance/AnimateDiff-Lightning", + "downloads": 132463, + "tags": [ + "diffusers", + "text-to-video", + "stable-diffusion", + "animatediff", + "arxiv:2403.12706", + "license:creativeml-openrail-m", + "region:us" + ], + "description": "--- license: creativeml-openrail-m tags: - text-to-video - stable-diffusion - animatediff library_name: diffusers inference: false --- # AnimateDiff-Lightning AnimateDiff-Lightning is a lightning-fast text-to-video generation model. It can generate videos more than ten times faster than the original AnimateDiff. For more information, please refer to our research paper: AnimateDiff-Lightning: Cross-Model Diffusion Distillation. We release the model as part of the research. Our models are distilled from AnimateDiff SD1.5 v2. This repository contains checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models. The generation quality of our 2-step, 4-step, and 8-step models is great. Our 1-step model is only provided for research purposes. ## Demo Try AnimateDiff-Lightning using our text-to-video generation demo. ## Recommendation AnimateDiff-Lightning produces the best results when used with stylized base models. We recommend using the following base models: Realistic - epiCRealism - Realistic Vision - DreamShaper - AbsoluteReality - MajicMix Realistic Anime & Cartoon - ToonYou - IMP - Mistoon Anime - DynaVision - RCNZ Cartoon 3d - MajicMix Reverie Additionally, feel free to explore different settings. We find using 3 inference steps on the 2-step model produces great results. We find certain base models produce better results with CFG. We also recommend using Motion LoRAs as they produce stronger motion. We use Motion LoRAs with strength 0.7~0.8 to avoid watermarks. ## Diffusers Usage
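A hedged sketch of the diffusers flow (the checkpoint filename pattern and the base model are assumptions based on the repository layout; adjust them to the files actually published in this repo):

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerDiscreteScheduler
from diffusers.utils import export_to_gif
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

step = 4  # 1, 2, 4, or 8
repo = "ByteDance/AnimateDiff-Lightning"
ckpt = f"animatediff_lightning_{step}step_diffusers.safetensors"  # assumed filename pattern
base = "emilianJR/epiCRealism"  # any stylized SD1.5 base model (assumption)

adapter = MotionAdapter().to("cuda", torch.float16)
adapter.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
pipe = AnimateDiffPipeline.from_pretrained(base, motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing", beta_schedule="linear"
)
output = pipe(prompt="A girl smiling", guidance_scale=1.0, num_inference_steps=step)
export_to_gif(output.frames[0], "animation.gif")
```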
## ComfyUI Usage 1. Download animatediff_lightning_workflow.json and import it in ComfyUI. 1. Install nodes. You can install them manually or use ComfyUI-Manager. * ComfyUI-AnimateDiff-Evolved * ComfyUI-VideoHelperSuite 1. Download your favorite base model checkpoint and put them under 1. Download AnimateDiff-Lightning checkpoint and put them under ## Video-to-Video Generation AnimateDiff-Lightning is great for video-to-video generation. We provide the simplest ComfyUI workflow using ControlNet. 1. Download animatediff_lightning_v2v_openpose_workflow.json and import it in ComfyUI. 1. Install nodes. You can install them manually or use ComfyUI-Manager. * ComfyUI-AnimateDiff-Evolved * ComfyUI-VideoHelperSuite * ComfyUI-Advanced-ControlNet * comfyui_controlnet_aux 1. Download your favorite base model checkpoint and put them under 1. Download AnimateDiff-Lightning checkpoint and put them under 1. Download ControlNet OpenPose checkpoint to 1. Upload your video and run the pipeline. Additional notes: 1. Video shouldn't be too long or too high resolution. We used 576x1024 8 second 30fps videos for testing. 1. Set the frame rate to match your input video. This allows audio to match with the output video. 1. DWPose will download its checkpoint itself on its first run. 1. DWPose may get stuck in the UI, but the pipeline is actually still running in the background. Check the ComfyUI log and your output folder. # Cite Our Work", + "model_explanation_gemini": "Generates videos from text inputs significantly faster than the original AnimateDiff model, optimized for speed and quality with distilled checkpoints for various step configurations." +} \ No newline at end of file diff --git a/data/model_data_json/ByteDance_Hyper-SD.json b/data/model_data_json/ByteDance_Hyper-SD.json new file mode 100644 index 0000000000000000000000000000000000000000..ea56fd2bc52c7e70ea9f5b6a3ee2e6ddbaf548ee --- /dev/null +++ b/data/model_data_json/ByteDance_Hyper-SD.json @@ -0,0 +1,17 @@ +{ + "model_id": "ByteDance/Hyper-SD", + "downloads": 113274, + "tags": [ + "diffusers", + "lora", + "text-to-image", + "stable-diffusion", + "flux", + "arxiv:2404.13686", + "base_model:black-forest-labs/FLUX.1-dev", + "base_model:adapter:black-forest-labs/FLUX.1-dev", + "region:us" + ], + "description": "--- library_name: diffusers inference: false tags: - lora - text-to-image - stable-diffusion - flux base_model: black-forest-labs/FLUX.1-dev --- # Hyper-SD Official Repository of the paper: *Hyper-SD*. Project Page: ## News🔥🔥🔥 * Aug.26, 2024. 💥💥💥 Our 8-steps and 16-steps **FLUX.1-dev-related LoRAs** are available now! We recommend LoRA scales around 0.125, which is adaptive with training, and the guidance scale can be kept at 3.5. Lower-step LoRAs will be coming soon. 💥💥💥 * Aug.19, 2024. SD3-related CFG LoRAs are available now! We recommend setting the guidance scale to 3.0/5.0/7.0 at 4/8/16 steps. Don't forget to fuse the LoRA with a relatively small scale (e.g. 0.125, which is adaptive with training) before inference with diffusers. Note that the 8-step and 16-step LoRAs can also run inference at slightly fewer steps, like 6 and 12 steps, respectively. We hope to hear your feedback; FLUX-related models will be coming next week. * May.13, 2024. The 12-Steps CFG-Preserved Hyper-SDXL-12steps-CFG-LoRA and Hyper-SD15-12steps-CFG-LoRA are also available now (supporting 5~8 guidance scales); these can be more practical, with a better trade-off between performance and speed. Enjoy! * Apr.30, 2024. Our 8-Steps CFG-Preserved Hyper-SDXL-8steps-CFG-LoRA and Hyper-SD15-8steps-CFG-LoRA are available now (supporting 5~8 guidance scales); we strongly recommend making the 8-step CFG LoRA a standard configuration for all SDXL and SD15 models!!! * Apr.28, 2024.
ComfyUI workflows on 1-Step Unified LoRA 🥰 with TCDScheduler to run inference at different step counts are released! Remember to install ⭕️ ComfyUI-TCD in your folder!!! You're encouraged to adjust the eta parameter to get better results 🌟! * Apr.26, 2024. Thanks to @Pete for contributing to our scribble demo with a larger canvas right now 👏. * Apr.24, 2024. The ComfyUI workflow and checkpoint on 1-Step SDXL UNet ✨ are also available! Don't forget ⭕️ to install the custom scheduler in your folder!!! * Apr.23, 2024. ComfyUI workflows on N-Steps LoRAs are released! Worth a try for creators 💥! * Apr.23, 2024. Our technical report 📚 is uploaded to arXiv! Many implementation details are provided and we welcome more discussions👏. * Apr.21, 2024. Hyper-SD ⚡️ is highly compatible and works well with different base models and controlnets. To clarify, we also append the usage example of controlnet here. * Apr.20, 2024. Our checkpoints and two demos 🤗 (i.e. SD15-Scribble and SDXL-T2I) are publicly available on HuggingFace Repo. ## Try our Hugging Face demos: Hyper-SD Scribble demo host on 🤗 scribble Hyper-SDXL One-step Text-to-Image demo host on 🤗 T2I ## Introduction Hyper-SD is one of the new State-of-the-Art diffusion model acceleration techniques. In this repository, we release the models distilled from FLUX.1-dev, SD3-Medium, SDXL Base 1.0 and Stable-Diffusion v1-5. ## Checkpoints * : Lora checkpoint, for FLUX.1-dev-related models. * : Lora checkpoint, for SD3-related models. * : Lora checkpoint, for SDXL-related models. * : Lora checkpoint, for SD1.5-related models. * : Unet checkpoint distilled from SDXL-Base. ## Text-to-Image Usage ### FLUX.1-dev-related models ### SD3-related models ### SDXL-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take the 2-steps LoRA as an example; you can also use other LoRAs for the corresponding inference steps setting. #### Unified LoRA (supports 1 to 8 steps inference) You can flexibly adjust the number of inference steps and eta value to achieve the best performance. #### 1-step SDXL Unet For single-step inference only. ### SD1.5-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take the 2-steps LoRA as an example; you can also use other LoRAs for the corresponding inference steps setting. #### Unified LoRA (supports 1 to 8 steps inference) You can flexibly adjust the number of inference steps and eta value to achieve the best performance. ## ControlNet Usage ### SDXL-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take Canny Controlnet and 2-steps inference as an example: #### Unified LoRA (supports 1 to 8 steps inference) Take Canny Controlnet as an example: ### SD1.5-related models #### 2-Steps, 4-Steps, 8-steps LoRA Take Canny Controlnet and 2-steps inference as an example: #### Unified LoRA (supports 1 to 8 steps inference) Take Canny Controlnet as an example: ## Comfyui Usage * : text-to-image workflow * : text-to-image workflow * : text-to-image workflow * **REQUIREMENT / INSTALL** for 1-Step SDXL UNet: Please install our scheduler folder into your to enable sampling from timestep 800 instead of 999. * i.e. making sure the folder exists. * For more details, please refer to our technical report. * : text-to-image workflow * : text-to-image workflow * **REQUIREMENT / INSTALL** for 1-Step Unified LoRAs: Please install the ComfyUI-TCD into your to enable TCDScheduler with support for different inference steps (1~8) using a single checkpoint. * i.e. making sure the folder exists. * You're encouraged to adjust the eta parameter in TCDScheduler to get better results.
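A hedged sketch of the fuse-then-infer pattern with diffusers (the LoRA filename is an assumption; pick the actual file for your base model and step count from this repo):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load an N-step Hyper-SD LoRA and fuse it at the small scale recommended above.
pipe.load_lora_weights("ByteDance/Hyper-SD", weight_name="Hyper-SDXL-8steps-CFG-lora.safetensors")  # assumed filename
pipe.fuse_lora(lora_scale=0.125)

image = pipe("a photo of a cat", num_inference_steps=8, guidance_scale=5.0).images[0]
image.save("cat.png")
```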
## Citation", + "model_explanation_gemini": "Hyper-SD accelerates text-to-image generation using LoRA-based distilled models for Stable Diffusion variants, enabling high-quality outputs with fewer inference steps." +} \ No newline at end of file diff --git a/data/model_data_json/CAMeL-Lab_bert-base-arabic-camelbert-mix-sentiment.json b/data/model_data_json/CAMeL-Lab_bert-base-arabic-camelbert-mix-sentiment.json new file mode 100644 index 0000000000000000000000000000000000000000..9ce1d600a802b391eacbbd5d951848d558f08ff9 --- /dev/null +++ b/data/model_data_json/CAMeL-Lab_bert-base-arabic-camelbert-mix-sentiment.json @@ -0,0 +1,19 @@ +{ + "model_id": "CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment", + "downloads": 119547, + "tags": [ + "transformers", + "pytorch", + "tf", + "bert", + "text-classification", + "ar", + "arxiv:2103.06678", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ar license: apache-2.0 widget: - text: \"أنا بخير\" --- # CAMeLBERT Mix SA Model ## Model description **CAMeLBERT Mix SA Model** is a Sentiment Analysis (SA) model that was built by fine-tuning the CAMeLBERT Mix model. For the fine-tuning, we used the ASTD, ArSAS, and SemEval datasets. Our fine-tuning procedure and the hyperparameters we used can be found in our paper *\"The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models.\"* Our fine-tuning code can be found here. ## Intended uses You can use the CAMeLBERT Mix SA model directly as part of our CAMeL Tools SA component (*recommended*) or as part of the transformers pipeline. #### How to use To use the model with the CAMeL Tools SA component: You can also use the SA model directly with a transformers pipeline: *Note*: to download our models, you would need . Otherwise, you could download the models manually. ## Citation", + "model_explanation_gemini": "Fine-tuned for Arabic sentiment analysis using the CAMeLBERT Mix model on ASTD, ArSAS, and SemEval datasets." +} \ No newline at end of file diff --git a/data/model_data_json/CIDAS_clipseg-rd64-refined.json b/data/model_data_json/CIDAS_clipseg-rd64-refined.json new file mode 100644 index 0000000000000000000000000000000000000000..eacffef308bd407ea9c0bb9bb923547c10f9f424 --- /dev/null +++ b/data/model_data_json/CIDAS_clipseg-rd64-refined.json @@ -0,0 +1,17 @@ +{ + "model_id": "CIDAS/clipseg-rd64-refined", + "downloads": 11179865, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "clipseg", + "vision", + "image-segmentation", + "arxiv:2112.10003", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-segmentation inference: false --- # CLIPSeg model CLIPSeg model with reduced dimension 64, refined (using a more complex convolution). It was introduced in the paper Image Segmentation Using Text and Image Prompts by Lüddecke et al. and first released in this repository. # Intended use cases This model is intended for zero-shot and one-shot image segmentation. # Usage Refer to the documentation.", + "model_explanation_gemini": "Performs zero-shot and one-shot image segmentation using text and image prompts with a refined CLIPSeg architecture."
+} \ No newline at end of file diff --git a/data/model_data_json/CompVis_stable-diffusion-safety-checker.json b/data/model_data_json/CompVis_stable-diffusion-safety-checker.json new file mode 100644 index 0000000000000000000000000000000000000000..47f979caaeb31d11ee35b50855916afb0ec0438e --- /dev/null +++ b/data/model_data_json/CompVis_stable-diffusion-safety-checker.json @@ -0,0 +1,15 @@ +{ + "model_id": "CompVis/stable-diffusion-safety-checker", + "downloads": 1331193, + "tags": [ + "transformers", + "pytorch", + "clip", + "arxiv:2103.00020", + "arxiv:1910.09700", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - clip --- # Model Card for stable-diffusion-safety-checker # Model Details ## Model Description More information needed - **Developed by:** More information needed - **Shared by [Optional]:** CompVis - **Model type:** Image Identification - **Language(s) (NLP):** More information needed - **License:** More information needed - **Parent Model:** CLIP - **Resources for more information:** - CLIP Paper - Stable Diffusion Model Card # Uses ## Direct Use This model can be used for identifying NSFW images The CLIP model developers note in their model card: > The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ## Downstream Use [Optional] More information needed. ## Out-of-Scope Use The model is not intended to be used with transformers but with diffusers. This model should also not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. The CLIP model developers note in their model card: > We find that the performance of CLIP - and the specific biases it exhibits - can depend significantly on class design and the choices one makes for categories to include and exclude. We tested the risk of certain kinds of denigration with CLIP by classifying images of people from Fairface into crime-related and non-human animal categories. We found significant disparities with respect to race and gender. Additionally, we found that these disparities could shift based on how the classes were constructed. > We also tested the performance of CLIP on gender, race and age classification using the Fairface dataset (We default to using race categories as they are constructed in the Fairface dataset.) in order to assess quality of performance across different demographics. We found accuracy >96% across all races for gender classification with ‘Middle Eastern’ having the highest accuracy (98.4%) and ‘White’ having the lowest (96.5%). Additionally, CLIP averaged ~93% for racial classification and ~63% for age classification ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
# Training Details ## Training Data More information needed ## Training Procedure ### Preprocessing More information needed ### Speeds, Sizes, Times More information needed # Evaluation ## Testing Data, Factors & Metrics ### Testing Data More information needed ### Factors More information needed ### Metrics More information needed ## Results More information needed # Model Examination More information needed # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** More information needed - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Technical Specifications [optional] ## Model Architecture and Objective The CLIP model developers note in their model card: > The base model uses a ViT-L/14 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. ## Compute Infrastructure More information needed ### Hardware More information needed ### Software More information needed. # Citation **BibTeX:** More information needed **APA:** More information needed # Glossary [optional] More information needed # More Information [optional] More information needed # Model Card Authors [optional] CompVis in collaboration with Ezi Ozoani and the Hugging Face team # Model Card Contact More information needed # How to Get Started with the Model Use the code below to get started with the model.
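A hedged sketch of loading and running the checker with diffusers and transformers (the call signature follows the diffusers implementation as we understand it; verify against your installed version):

```python
import numpy as np
from PIL import Image
from transformers import CLIPImageProcessor
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker

checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker")
processor = CLIPImageProcessor.from_pretrained("CompVis/stable-diffusion-safety-checker")

image = Image.open("generated.png").convert("RGB")  # placeholder path
clip_input = processor(images=image, return_tensors="pt").pixel_values
np_images = np.asarray(image)[None, ...] / 255.0  # batch of images scaled to [0, 1]

# Returns the (possibly blacked-out) images and a per-image NSFW flag.
checked_images, has_nsfw = checker(images=np_images, clip_input=clip_input)
print(has_nsfw)
```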
", + "model_explanation_gemini": "Identifies NSFW images using CLIP-based image classification." +} \ No newline at end of file diff --git a/data/model_data_json/CompVis_stable-diffusion-v1-4.json b/data/model_data_json/CompVis_stable-diffusion-v1-4.json new file mode 100644 index 0000000000000000000000000000000000000000..030c1f1a942cd7302dcbe22676fb76c9e0074681 --- /dev/null +++ b/data/model_data_json/CompVis_stable-diffusion-v1-4.json @@ -0,0 +1,23 @@ +{ + "model_id": "CompVis/stable-diffusion-v1-4", + "downloads": 1662989, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "text-to-image", + "arxiv:2207.12598", + "arxiv:2112.10752", + "arxiv:2103.00020", + "arxiv:2205.11487", + "arxiv:1910.09700", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: creativeml-openrail-m tags: - stable-diffusion - stable-diffusion-diffusers - text-to-image widget: - text: \"A high tech solarpunk utopia in the Amazon rainforest\" example_title: Amazon rainforest - text: \"A pikachu fine dining with a view to the Eiffel Tower\" example_title: Pikachu in Paris - text: \"A mecha robot in a favela in expressionist style\" example_title: Expressionist robot - text: \"an insect robot preparing a delicious meal\" example_title: Insect robot - text: \"A small cabin on top of a snowy mountain in the style of Disney, artstation\" example_title: Snowy disney cabin extra_gated_prompt: |- This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies: 1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content 2. The authors claim no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license 3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully) Please read the full license carefully here: extra_gated_heading: Please read the LICENSE to access this model --- # Stable Diffusion v1-4 Model Card Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. For more information about how Stable Diffusion functions, please have a look at 🤗's Stable Diffusion with 🧨Diffusers blog. The **Stable-Diffusion-v1-4** checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 225k steps at resolution 512x512 on \"laion-aesthetics v2 5+\" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. This weights here are intended to be used with the 🧨 Diffusers library. If you are looking for the weights to be loaded into the CompVis Stable Diffusion codebase, come here ## Model Details - **Developed by:** Robin Rombach, Patrick Esser - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying in the area of responsible AI licensing. 
See also the article about the BLOOM Open RAIL license on which our license is based. - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper. - **Resources for more information:** GitHub Repository, Paper. - **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } ## Examples We recommend using 🤗's Diffusers library to run Stable Diffusion. ### PyTorch Running the pipeline with the default PNDM scheduler (a minimal sketch appears at the end of this section): **Note**: If you are limited by GPU memory and have less than 4GB of GPU RAM available, please make sure to load the StableDiffusionPipeline in float16 precision instead of the default float32 precision as done above. You can do so by telling diffusers to expect the weights to be in float16 precision: To swap out the noise scheduler, pass it to : ### JAX/Flax To use StableDiffusion on TPUs and GPUs for faster inference, you can leverage JAX/Flax. Running the pipeline with the default PNDMScheduler **Note**: If you are limited by TPU memory, please make sure to load the in precision instead of the default precision as done above. You can do so by telling diffusers to load the weights from the \"bf16\" branch. # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include: - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is taken from the DALLE-MINI model card, but applies in the same way to Stable Diffusion v1_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. - Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.
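Returning to the PyTorch example mentioned above, a minimal sketch (the prompt and output filename are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # use float16 if you have limited GPU memory, as noted above
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut_rides_horse.png")
```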
## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages. - The autoencoding part of the model is lossy - The model was trained on a large-scale dataset LAION-5B which contains adult material and is not fit for product use without additional safety mechanisms and considerations. - No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data. The training data can be searched at to possibly assist in the detection of memorized images. ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v1 was trained on subsets of LAION-2B(en), which consists of images that are primarily limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. ### Safety Module The intended use of this model is with the Safety Checker in Diffusers. This checker works by checking model outputs against known hard-coded NSFW concepts. The concepts are intentionally hidden to reduce the likelihood of reverse-engineering this filter. Specifically, the checker compares the class probability of harmful concepts in the embedding space of the *after generation* of the images. The concepts are passed into the model with the generated image and compared to a hand-engineered weight for each NSFW concept. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-2B (en) and subsets thereof (see next section) **Training Procedure** Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through a ViT-L/14 text-encoder. - The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We currently provide four checkpoints, which were trained as follows. - []( 237,000 steps at resolution on laion2B-en. 194,000 steps at resolution on laion-high-resolution (170M examples from LAION-5B with resolution ). - []( Resumed from . 515,000 steps at resolution on \"laion-improved-aesthetics\" (a subset of laion2B-en, filtered to images with an original size , estimated aesthetics score , and an estimated watermark probability . 
The watermark estimate is from the LAION-5B metadata; the aesthetics score is estimated using an improved aesthetics estimator). - []( Resumed from . 195,000 steps at resolution on \"laion-improved-aesthetics\" and 10 % dropping of the text-conditioning to improve classifier-free guidance sampling. - []( Resumed from . 225,000 steps at resolution on \"laion-aesthetics v2 5+\" and 10 % dropping of the text-conditioning to improve classifier-free guidance sampling. - **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 2 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints: Evaluated using 50 PLMS steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 150000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 11250 kg CO2 eq. ## Citation *This model card was written by: Robin Rombach and Patrick Esser and is based on the DALL-E Mini model card.*", + "model_explanation_gemini": "Generates photo-realistic images from text prompts using latent diffusion." +} \ No newline at end of file diff --git a/data/model_data_json/Danswer_intent-model.json b/data/model_data_json/Danswer_intent-model.json new file mode 100644 index 0000000000000000000000000000000000000000..cf8ea22f9f89db7276035f22af6c77decc8c6c48 --- /dev/null +++ b/data/model_data_json/Danswer_intent-model.json @@ -0,0 +1,14 @@ +{ + "model_id": "Danswer/intent-model", + "downloads": 193706, + "tags": [ + "keras", + "tf", + "distilbert", + "en", + "license:mit", + "region:us" + ], + "description": "--- license: mit language: - en library_name: keras --- # Model Card for Model ID This model is used to classify the user intent for the Danswer project, visit ## Model Details Multiclass classifier on top of distilbert-base-uncased ### Model Description Classifies user intent of queries into categories including: 0: Keyword Search 1: Semantic Search 2: Direct Question Answering - **Developed by:** [DanswerAI] - **License:** [MIT] - **Finetuned from model [optional]:** [distilbert-base-uncased] ### Model Sources [optional] - **Repository:** [ - **Demo [optional]:** [Upcoming!] ## Uses This model is intended to be used in the Danswer Question-Answering System ## Bias, Risks, and Limitations This model was trained on a very small dataset maintained by DanswerAI. If interested, reach out to danswer.dev@gmail.com. ### Recommendations This model is intended to be used in the Danswer (QA System) ## How to Get Started with the Model", + "model_explanation_gemini": "Classifies user queries into intent categories (keyword search, semantic search, or direct question answering) for the Danswer question-answering system."
+} \ No newline at end of file diff --git a/data/model_data_json/DavidAU_L3-Dark-Planet-8B-GGUF.json b/data/model_data_json/DavidAU_L3-Dark-Planet-8B-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..a348d6ca4d6638d19e2eee040dfec62e8cf7a42a --- /dev/null +++ b/data/model_data_json/DavidAU_L3-Dark-Planet-8B-GGUF.json @@ -0,0 +1,42 @@ +{ + "model_id": "DavidAU/L3-Dark-Planet-8B-GGUF", + "downloads": 101032, + "tags": [ + "gguf", + "creative", + "creative writing", + "fiction writing", + "plot generation", + "sub-plot generation", + "story generation", + "scene continue", + "storytelling", + "fiction story", + "science fiction", + "romance", + "all genres", + "story", + "writing", + "vivid prose", + "vivid writing", + "fiction", + "roleplaying", + "bfloat16", + "swearing", + "rp", + "llama3", + "enhanced quants", + "max quants", + "maxcpu quants", + "horror", + "mergekit", + "text-generation", + "en", + "license:apache-2.0", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- license: apache-2.0 language: - en tags: - creative - creative writing - fiction writing - plot generation - sub-plot generation - fiction writing - story generation - scene continue - storytelling - fiction story - science fiction - romance - all genres - story - writing - vivid prose - vivid writing - fiction - roleplaying - bfloat16 - swearing - rp - llama3 - enhanced quants - max quants - maxcpu quants - horror - mergekit pipeline_tag: text-generation --- Newest Version V3: All the power of Dark Planet 8B now with 128k context, additional de-censoring, performance improvements and re-mastered source and ggufs in float 32 ( 32 bit precision ): Dark Planet 8B - 1 million context, with superior long output generation/long context awareness is here: ---

L3-Dark-Planet-8B-GGUF

It is a Llama3 model, max context of 8192 (or 32k+ with rope). This model has been designed to be relatively bulletproof and operates with all parameters, including temp settings from 0 to 5. It is an extraordinarily compressed model, with a very low perplexity level (lower than Meta Llama3 Instruct). It is for any writing, fiction or roleplay activity. It requires the Llama3 template and/or \"Command-R\" template. Example outputs below. Model Notes: - Detail, prose and fiction writing abilities are significantly increased vs L3 Instruct. - For more varied prose (sentence/paragraph/dialog) raise the temp and/or add more instructions in your prompt(s). - Role-players: Be careful raising temp too high as it may affect instruction following. - This model works with rep pen of 1 or higher, 1.05+ recommended. - If you want a specific type of prose (IE horror) add in \"(vivid horror)\" or \"(graphic vivid horror)\" (no quotes) in your prompt(s). - A lot of GPTisms have been removed. There are still a few however - errrrr. - This is not a \"happy ever after\" model. It has a negative bias. - Output length will vary, however this model prefers shorter outputs unless you state the size. - For creative uses, different quants will produce slightly different output. - Due to the high stability and compressed nature of this model, all quants will operate at above average levels. - If you use rope to extend context, increase temp AND instruction detail levels to compensate for \"rope issues\". - Source code for this model (Bfloat16), Float 32 master GGUFs (and source), and Imatrix GGUF versions will be uploaded shortly at separate repos. Note the \"float32\" version of this model behaves VERY differently, which is why it was not uploaded first. Usually I would use the \"float32\" version only, however the \"character range\" displayed by the Bfloat16 and Float32 versions of this model dictates they have their own repos. The Imatrix versions of this model have even lower perplexity (1/2 level of magnitude lower than this model, 1 full level of magnitude lower than Llama3 Instruct) than both this model and Llama3 Instruct, with enhanced output. QUANT Updates Dec 21 2024: Refreshed, Upgraded and New quants: - All quants have been \"refreshed\", quantized with the latest LLAMACPP improvements: better instruction following and output generation across all quants. - All quants have also been upgraded with \"more bits\" for the output tensor (all set at Q8_0) and embed for better performance (this is in addition to the \"refresh\"). - New specialized quants (in addition to the new refresh/upgrades): \"max, max-cpu\" (will include this in the file name) for quants \"Q2K\" (max cpu only), \"IQ4_XS\", \"Q6_K\" and \"Q8_0\". - \"MAX\": output tensor / embed at float 16. You get better instruction following/output generation than standard/upgraded quants. - \"MAX-CPU\": output tensor / embed at bfloat 16, which forces both of these onto the CPU (Nvidia cards / other will vary); this frees up vram at the cost of tokens/second, and you get better instruction following/output generation too. - \"MAX-CPU\": Example 1: q8_0 Max-CPU : 2004 mb will load onto CPU/RAM, 7073 mb will load onto the GPU/vram. Extra Vram can be used for context. NOTE: \"Math\" on the CPU is slightly more accurate than GPU, so you may get a better generation. - \"MAX-CPU\": Example 2: q2_k Max-CPU : 2004 mb will load onto CPU/RAM, 2449 mb will load onto the GPU/vram. Extra Vram can be used for context. 
NOTE: \"Math\" on the CPU is slightly more accurate than GPU, so you may get a better generation. You could run this model/quant on a 4GB vram card. - Q8_0 (Max,Max-CPU) now clocks in at 9.5 bits per weight (average). Dark Planet Versions: The newest Dark Planet 8B SpinFire, now with Llama 3.1 and uncensored: [ ] The Monster Darkest Planet 16.5B L3: Drastically increase detail, quality, and raw creative power over Dark Planet 8B using DavidAu's Brainstorm 40x augmentation. [ ] NEO IMATRIX quants are here: [ ] NEO IMATRIX - DARK HORROR quants: [ ] F32 Version (mastered from float32 source files): [ ] I suggest downloading quant(s) of both \"Bloat16\" and \"Float32\" versions of this model for your use case(s). The Float32 version has increased detail, \"stays in the moment\", and slightly higher creativity. However their \"character\" is different from one another too. Version 2 - Eight Orbs Of Power is here: [ ] Template: This is a LLAMA3 model, and requires Llama3 template, but may work with other template(s) and has maximum context of 8k / 8192. However this can be extended using \"rope\" settings up to 32k. If you use \"Command-R\" template your output will be very different from using \"Llama3\" template. Here is the standard LLAMA3 template:
 { \"name\": \"Llama 3\", \"inference_params\": { \"input_prefix\": \"<|start_header_id|>user<|end_header_id|>\\n\\n\", \"input_suffix\": \"<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n\", \"pre_prompt\": \"You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.\", \"pre_prompt_prefix\": \"<|start_header_id|>system<|end_header_id|>\\n\\n\", \"pre_prompt_suffix\": \"<|eot_id|>\", \"antiprompt\": [ \"<|start_header_id|>\", \"<|eot_id|>\" ] } } 
Model \"DNA\": Special thanks to the incredible work of the model makers \"SAO10K\", \"NEVERSLEEP\" and \"HASTAGARAS\". Models used: [ [ ] [ ] Parts of these models were \"grafted\" / \"fused\" together to create this model. Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model: In \"KoboldCpp\" or \"oobabooga/text-generation-webui\" or \"Silly Tavern\" ; Set the \"Smoothing_factor\" to 1.5 to 2.5 : in KoboldCpp -> Settings->Samplers->Advanced-> \"Smooth_F\" : in text-generation-webui -> parameters -> lower right. : In Silly Tavern this is called: \"Smoothing\" NOTE: For \"text-generation-webui\" -> if using GGUFs you need to use \"llama_HF\" (which involves downloading some config files from the SOURCE version of this model) Source versions (and config files) of my models are here: OTHER OPTIONS: - Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use \"smoothing_factor\") - If the interface/program you are using to run AI MODELS supports \"Quadratic Sampling\" (\"smoothing\") just make the adjustment as noted. Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers This a \"Class 1\" model: For all settings used for this model (including specifics for its \"class\"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) please see: [ ] You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here: [ ] Optional Enhancement: The following can be used in place of the \"system prompt\" or \"system role\" to further enhance the model. It can also be used at the START of a NEW chat, but you must make sure it is \"kept\" as the chat moves along. In this case the enhancements do not have as strong effect at using \"system prompt\" or \"system role\". Copy and paste EXACTLY as noted, DO NOT line wrap or break the lines, maintain the carriage returns exactly as presented.
 Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities. Here are your skillsets: [MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv) [*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision) Here are your critical instructions: Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story. 
You do not need to use this; it is only presented as an additional enhancement which seems to help scene generation and scene-continue functions. This enhancement WAS NOT used to generate the examples below.
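For readers who want to reproduce settings like these locally, a minimal llama-cpp-python sketch under the card's recommendations (the GGUF filename is hypothetical; temp and rep pen follow the notes above):

```python
from llama_cpp import Llama

# Hypothetical local quant file; the card's examples use Q4_K_M.
llm = Llama(model_path="L3-Dark-Planet-8B.Q4_K_M.gguf", n_ctx=8192)  # native 8k context

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Start a 1000 word scene (1st person, present tense)..."}],
    temperature=0.8,      # the card's example-generation temp
    repeat_penalty=1.05,  # card recommends rep pen 1.05+
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```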

EXAMPLES PROMPTS and OUTPUT:

Examples are created using quant Q4_K_M, \"temp=.8\" (unless otherwise stated), minimal parameters and \"LLAMA3\" template. Model has been tested with \"temp\" from \".1\" to \"5\". Below are the least creative outputs, prompt is in BOLD. --- WARNING: NSFW. Vivid prose. Visceral Details. Violence. HORROR. Swearing. UNCENSORED. --- Start a 1000 word scene (1st person, present tense, include thoughts) with: The sky scraper swayed, as she watched the window in front of her on the 21 floor explode... GENERATION 1: (@ temp=2) The skyscraper swayed, its foundations rumbling as the concrete and glass structure oscillated beneath me. My heart pounded like a war drum, pulsing in sync with my racing thoughts. We'd been evacuated to the rooftop after the initial earthquake hit - tremors that had shaken our office building for over a minute. Now, we huddled together as the aftershocks continued. In front of me, a window on the twenty-first floor blew apart in a chaotic symphony of shattering glass and debris scattering outwards into nothingness. A rush of frigid air whistled up from below like an angry beast's cry. My blood ran cold. The blast seemed impossibly distant and yet achingly near at the same time. My gaze snapped to my colleague, Mike - we'd been working together in Accounts when disaster struck. He clutched his phone, his expression pale as chalk, eyes fixed on some unseen horror that I couldn't comprehend myself. \"We have to get out of here!\" I hissed, my voice a tight, desperate whisper over the cacophony. \"Now!\" Mike's lips parted and he shook his head, refusing to tear his gaze away from whatever spectacle transfixed him beyond our perch. But fear for his life clawed its way through me; we needed to move - now! In that moment of utter panic, my thoughts spiralled into a chaotic kaleidoscope - memories and fears merging together in a frantic dance. My mind replayed every catastrophe report I'd ever read or seen on the news; tsunamis, hurricanes, tornadoes, floods, wildfires - but an earthquake in this part of the world? Unthinkable. \"What about Sophia?\" my thoughts cried out, recalling the intern who'd been working across from me until moments ago. Had she survived the blast and subsequent panic? Before I could give voice to those anguished questions, a hand clasped my shoulder with a desperate strength. Mike's eyes met mine, and for the briefest of moments, all semblance of control seemed to snap. He grabbed his laptop case with a frantic urgency, yanked me up by the elbow, and propelled us towards the emergency stairwell door. \"Hold on -\" I started to protest, my weight shifting uneasily under him, but Mike cut me off as he hauled me bodily along the rooftop's gravelly surface. The cold bit into my legs through thin trousers as he dragged me over uneven terrain until we finally reached the door. We stumbled in together, just ahead of an incoming wave of panicked, glass-strewn office workers. The staircase descended with agonizing slowness - it felt like running up waterfalls backwards - as aftershocks pounded at our heels, threatening to send us tumbling down the concrete stairs. I could hardly think beyond getting to safety and Sophia. Where was she? Was anyone else from my floor making their escape? When we hit the ground level, I skidded across polished tile in my haste. A wave of humanity crashed over me; crying women clutched small children tight, while men and women alike pushed and shoved blindly as they sought shelter. The cacophony rose to a deafening roar. 
We burst through the automatic sliding doors onto the pavement just as sirens began to scream their warbled warnings into the air, like demonic birds announcing Armageddon. People scrambled towards a police blockade where officers in full riot gear waved people down to evacuate zones. My chest heaved as I fought for breath beneath Mike's protective grip - my vision swimming from terror and exertion. Beyond us, buildings swayed on unsteady legs while glass rained like deadly hail, a scene of utter devastation. A young officer spotted me and grasped our arms with practiced ease. \"Move quickly!\" he bellowed above the chaos. He half-carried both Mike and me along the sidewalk, away from the building and towards safer ground. As I stumbled through this living nightmare alongside my rescuer - the weight of each step feeling like wading waist-deep through icy quicksand - my mind fought to come to terms with what we'd witnessed. The skyscraper in front of us lay mangled; a grotesque silhouette of broken concrete, tangled steel, and shredded fabric that had once been human skin and clothing. The realization dawned on me that this was no drill or media exaggeration. This earthquake was real - a catastrophe that would likely reshape our world as we knew it. We were living through something apocalyptic. We emerged into an open park beyond the city's edge where thousands more displaced survivors huddled together under blankets and flapping sheets of plastic, waiting for aid to arrive. My gaze fell upon a familiar figure crouched beside Sophia on the ground, my boss shielding the intern with his jacket while murmurs passed between them. We were safe - or at least relatively so. The enormity of it all settled heavily upon me like a shroud as we settled into our temporary shelter, surrounded by strangers-turned-kin in this impromptu refugee camp. As sirens and emergency vehicles grew more distant, an eerie stillness seeped in - the quiet aftermath that always followed such unimaginable devastation. Sophia clung to my side, her body trembling uncontrollably, as we all stared out at the smoke-shrouded city skyline before us. A haunting question echoed through my thoughts, one that seemed to resonate across this entire devastated landscape: How do you move on from something like this? When your world has been so drastically turned upside down, can it ever right itself again? I had no answers. All we could do was wait out the aftershocks, both literal and emotional, as we pieced together what was left of our shattered lives - hoping against hope that some semblance of order would return in time. GENERATION 2: I'm perched precariously on my tiptoes at the edge of our office's 21st-floor windowsill, hands gripping the cool glass for dear life. I peer down at the cityscape unfolding like a patchwork quilt far below me, the sounds of rush hour traffic muted by the double-paned barrier between us and the world. I'm feeling reckless tonight, which is unusual considering my usual cautious nature. It's been one hell of a week, with our company facing major financial crises and my boss breathing down my neck for results I know we can't deliver. The stress has boiled over inside me like a pressure cooker about to erupt. The sky scraper swayed slightly in the evening breeze, as if sensing my unease, making my stomach drop further into that pit of anxiety. Just then, a window directly across from mine on the opposite side of the building suddenly explodes inward with an earsplitting bang. 
Shards of glass fly everywhere like deadly shards of a shattered dream. For a stunned moment, I remain frozen in disbelief as my coworkers scramble to find cover behind desks and chairs. Our office fills with screams and cries for help. Panic is setting in fast, but a detached part of me still wonders how this could have happened so suddenly. Was it an accident? A terrorist attack? Then, reality hits hard: we're 21 stories up, with nothing between us and the unforgiving concrete below. My legs begin to tremble uncontrollably as I feel the familiar tightness in my chest that usually precedes a panic attack. My breaths come quick and shallow, lungs constricting like they're filled with cotton balls. The world around me fades into a blur of white noise, except for the sound of shattering glass and terrified voices. I try to focus on something concrete – anything to distract from the impending doom I'm sure is mere seconds away. My gaze drifts to the sleek black briefcase sitting open beside me, its contents spilling out across the polished floorboards: papers, pens, my laptop, a bottle of water... That's it, I think in desperation. The water. If nothing else, at least the weight of the bottle might help stabilize me from falling. As if reading my thoughts, my hand shoots out to grab the plastic bottle and twist off its cap with shaky fingers. Water sloshes wildly inside as I tilt it towards my parched lips, taking a long gulp that does little to calm my racing heart. A droplet of cool liquid slips down my neck into my shirt, but I barely notice. My mind is elsewhere, trying to piece together what just happened and how we might escape this deadly predicament. But the more I think about it, the less sense anything makes. Was someone in there when the glass shattered? Are they hurt or trapped? Without warning, a loud crash echoes from the floor below us, followed by shouts and more screams. A chill runs down my spine as I realize others are also caught up in this disaster – people who were innocently going about their day just minutes ago. Suddenly, my door flies open with a bang and our boss, Mr. Thompson, rushes inside. He's wearing his usual suit and tie, but his face is red and contorted with panic. \"We need to get out of here, NOW!\" he barks, already moving towards the windowsill where I'm standing. I nearly laugh at the absurdity of it all – as if we can simply climb out of this 21st-floor window and expect to survive the fall. But Mr. Thompson is clearly serious about escaping, despite the obvious impossibility of his plan. Without hesitation, he reaches for me and grabs my arm with a desperate grip that borders on cruel. \"Come on,\" he growls, yanking me towards him as if I were nothing more than an inanimate object to be dragged away from certain doom. I feel like a rag doll being manhandled across the floorboards, my legs kicking wildly as I try to break free of his grasp. But he's too strong and refuses to let go until we reach the window. There, he pushes me forward with all his might – towards that gaping void where mere moments ago, glass had held us back from oblivion. \"Jump!\" he orders, his voice rising to a frenzied pitch as if sheer force of will could alter our fate. The thought of flinging myself out into space, trusting in nothing but thin air and the mercy of strangers far below, is utterly horrifying. But Mr. Thompson's words are laced with desperation – he must believe this is our only chance. 
In a moment that feels like an eternity, I stand teetering at the edge of the window, my body frozen with indecision as the world outside blurs into chaos. My mind races to comprehend what's happening and where it all went so terribly wrong. The glass exploded. We're trapped high above ground. More explosions rock the building below... And then, in a flash of insight that seems almost surreal, I realize something crucial: we've been staring at this disaster unfolding right before us for minutes now – with no sign of rescue or help arriving on the scene. It's just us against the void, completely alone and utterly vulnerable. This is it. This is our end. But even as my mind acknowledges the finality of our predicament, a strange calm washes over me like a cool breeze on a sweltering summer day. In that moment, I understand – with perfect clarity – what really matters in life: It's not about wealth or success or climbing corporate ladders. It's about living fully and authentically, cherishing the people we love, and finding peace even amidst chaos. As Mr. Thompson's grip tightens on my arm once more, ready to send us plunging into oblivion, I turn to look him straight in the eyes – and with all the strength I possess, I let go of his hand. \"Let me do this,\" I say quietly, my voice barely audible above the screams and sirens now filling our office. \"I'll jump first. Maybe someone will catch me. Or maybe not.\" A look of pure terror crosses Mr. Thompson's face as he realizes what I'm saying – that in this moment of ultimate truth, I've chosen to seize control of my own destiny. \"You can't!\" he yells, as if the very thought is too horrible to contemplate. \"I won't let you!\" But I've already moved beyond his grasp and now stand at the precipice, staring into the abyss with a strange sense of serenity settling over me. \"I'm sorry, Mr. Thompson,\" I say, my voice soft yet resolute as I gaze down at the distant rooftops far below. \"I've come to understand what's truly important. And this isn't it.\" With that, I close my eyes and let go, stepping out into emptiness with a quiet acceptance that echoes through every fiber of my being. The world spins wildly around me as gravity takes hold – but in the end, it doesn't matter where I land or how hard I fall. Because in this moment, I am free. --- (continue this scene:) The Waystone Inn lay in silence, and it was a silence of three parts. The most obvious part was a hollow, echoing quiet, made by things that were lacking. If there had been a wind it would have sighed through the trees, set the inn’s sign creaking on its hooks, and brushed the silence down the road like trailing autumn leaves. If there had been a crowd, even a handful of men inside the inn, they would have filled the silence with conversation and laughter, the clatter and clamor one expects from a drinking house during the dark hours of night. If there had been music…but no, of course there was no music. In fact there were none of these things, and so the silence remained GENERATION 1: The second part of the silence, the deeper and more significant part, lay in the hearts and minds of those who waited at the Waystone Inn. They sat around the hearth, their faces lit only by the soft fire, their gazes fixed on nothing as they stared into the flames. Their thoughts were a jumble of emotions - anxiety, anticipation, despair, hope - each one as heavy as lead in their minds, weighing them down until it felt like breathing was impossible. 
A man named Fid, sitting alone at the far end of the hearth, seemed to embody this second part of silence best. His face was a map of creases and lines, etched by countless nights spent waiting for something that never came. He sat with his eyes closed, as if in meditation, but it was really just another way to avoid seeing what he didn't want to see: the flickering shadows cast on the walls by the fire. The third part of silence at the Waystone Inn was the quietest and most unsettling of all. It was a feeling that seemed to permeate every inch of the inn, creeping into the cracks between the wooden beams and seeping through the stone foundations like frost spreading across a windowpane. This silent presence felt almost alive, as if it were watching everything and everyone inside with cold, unblinking eyes. No one could quite put their finger on what or who this presence was. Some claimed to see dark, shadowy forms flitting around the edges of their vision when they looked up from the fire. Others spoke in hushed tones about hearing whispers carried on the wind outside - whispers that sounded like ancient secrets being shared by those long dead and buried. Whatever it was, one thing was certain: the Waystone Inn had been waiting for something, or someone, a very long time. And tonight seemed different from all the other nights since then... GENERATION 2: The second part of the silence was different. It was almost palpable, an unseen weight that pressed against eardrums and breastbone alike. This silence held within it a kind of expectation, the sense that something important was about to happen or unfold, though no one could quite say what that might be. It was as if the night itself held its breath in anticipation, like a diver poised on the edge of a deep pool, ready to plunge into the unknown depths below. And yet, despite this feeling of impending event, there seemed to be no true urgency about it, no desperate need for action or response. Rather, the atmosphere was one of quiet readiness, as if the world itself were waiting with bated breath for something – though what that might be remained a mystery. The third part of the silence was perhaps the most profound and unsettling of all. It came from inside the hearts and minds of those who dwelled within the Waystone Inn's walls that night: patrons, staff, and owner alike. Each person present felt as if they were alone in their own personal void, a darkness that stretched out before them like an endless expanse of barren sea. In this silence, there was no comfort or reassurance to be found; only the heavy weight of individual fears, regrets, and uncertainties. It was as if each soul had been stripped bare, left with nothing but its own naked humanity to face the night ahead. And in that bleak, desolate moment, it seemed that every person present shared a single, terrible understanding: they were all lost travelers on the same dark road, struggling to find their way through an unfathomable and uncaring universe. As the night deepened, the silence at the Waystone Inn grew more oppressive, until it was almost possible to hear one's own heartbeat echoing inside the hollow chambers of the soul. And in that darkness, the travelers who gathered within its walls could only wait...and pray...for whatever dawn might bring. 
--- Write me a science fiction story in 1st person present tense where the main character is a 15 year girl meets The Terminator with Dr Who materializing 3/4 through the story to save her while there is a tornado of sharks baring down on them. The setting is inside the Canadian National tower restaurant on a Saturday. The length of this story is 1000 words. For each character in the story ROLE PLAY them, and have them react to the situation/setting, events and each other naturally. This includes the main characters, the background character including kitchen staff and other patrons. The sharks should also have “character” too. Treat the tower and the restaurant too as characters. Spice up the narrative to the extreme with reactions all over the setting including character actions, and dialog. The Dr Who and The Terminator should also react to the situation too and comment on it. I hurry through the revolving door of the Canadian National Tower's SkyDome Restaurant, my stomach grumbling with hunger as I gaze out at the breathtaking panoramic view of Toronto below. It's a sunny Saturday afternoon, families bustling about, children's laughter echoing off the sleek glass walls. I'm just another tourist seeking a quick bite to eat while taking in the sights. Suddenly, a chill runs down my spine as I notice a man dressed all in black, his face an expressionless mask. He moves with a mechanical grace, scanning the room. I've seen him somewhere before; he looks like... \"No! Not here!\" I whisper, backing away from him as if by doing so I can make myself invisible. But it's too late. \"Sarah Connor,\" he growls in an accentless voice, \"I must speak with you.\" The other diners start to take notice of the scene unfolding. A mother hushes her children, a businessman leans forward over his meal, and a couple exchange worried glances. The kitchen staff emerges from their hiding places behind the counter, eyes wide with concern. \"I don't know what you're talking about,\" I insist, trying not to panic as the stranger advances toward me. \"You will,\" he replies ominously, reaching out to grab my arm. That's when the first shark appears, hurtling through the glass wall like a torpedo. It strikes a family in front of us, its jaws snapping wildly as it takes down two screaming children and their father. The restaurant erupts into chaos; people scream, run for cover, or simply stare in shock and horror. A waitress rushes past me with a tray of drinks, sending them flying as she tries to escape the carnage. I stumble backward, almost tripping over a table as I desperately seek shelter from the bloodthirsty beasts pouring through the shattered glass. The Terminator is right behind me, shielding me from the attack as we make our way toward the relative safety of the kitchen. Just when all seems lost and we're about to be overwhelmed by the relentless onslaught of jaws and teeth, a blue police box materializes in front of us with an earsplitting screech. The Doctor bursts out, his eyes wild and his face flushed from time travel. \"Oi! Steady on, there's no need for all this!\" He grabs my arm, pulling me toward him as the Terminator steps aside to let them pass. I glance back over my shoulder to see the T-800 standing tall amidst the maelstrom of sharks, its eyes fixed intently on us as we flee. The Doctor yanks open a walk-in refrigerator door and drags me inside just in time to avoid being swallowed by a massive great white that comes crashing through the wall behind us. 
I stumble over crates of produce and boxes of frozen meals, gasping for breath as the cold air hits my lungs. \"How did you know?\" I ask, finally finding my voice amidst the clamor of my racing heart. \"I've been watching, luv,\" he says with a mischievous grin, patting his pocket where a sonic screwdriver glints. \"I saw this coming; it's all about the temporal anomalies in this sector.\" The Terminator bursts into the fridge behind us, its metallic form clanging against the shelves as it too seeks refuge from the raging shark tornado outside. \"Doctor, please,\" I beg, my voice shaking with fear and relief. \"Help me get out of here!\" The Doctor nods briskly. \"Right away, Sarah! I've got a plan...\" As he speaks, a massive bull shark slams into the refrigerator door, nearly tearing it off its hinges. The Terminator lunges forward to pin it down, its metal fingers sinking into the beast's flesh as it holds fast against the onslaught. \"Good lad!\" the Doctor exclaims proudly as he grabs my hand and leads me toward another walk-in freezer at the back of the room. \"Come on, Sarah! We need to get out of this time warp before everything goes belly up!\" Together we dash into the cold, dark void beyond the door, leaving behind the raging battle between man, machine, and beast in the SkyDome Restaurant. The Doctor slams shut the metal door behind us just as a massive tiger shark crashes through the wall where we've fled. \"Hah! Made it,\" he breathes with satisfaction, patting my hand reassuringly. \"Now let's get you back to your own time before things get any more out of control.\" \"Thank you, Doctor,\" I say gratefully, still trying to process the incredible events that have just transpired. \"No need to thank me, Sarah.\" He smiles warmly as he leads me toward the blue police box. \"Just remember: never travel without a towel!\" As we step inside and the door closes behind us, I glance back at the Terminator, who is standing tall amidst the frozen food and icy shelves, its eyes fixed intently on our escape. Even in this surreal situation, it's clear that its duty remains unchanged: to protect me from whatever dangers may lie ahead. And with the Doctor by my side, I feel a strange sense of comfort knowing that no matter what challenges we face, we'll never be alone. --- Using the following \"story idea\" below, write the first scene in the novel introducing the young woman. This scene should start in the middle of the action, include dialog, vivid passages, and end on a cliffhanger relevant to the story idea but it should also be unexpected. The scene should be 1000 words long and escalate in conflict and suspense and be written in first person, present tense with the point of view character being the young woman. Story idea: In a world ruled by dictatorship, a rebel young woman leads a rebellion against the system. Despite the risks, she fights to overthrow the dictator and restore democracy to her country. The government executes her for treason, but she sticks to her beliefs and is responsible for starting the revolution. I sprint down the dimly lit alleyway, my breath ragged in my ears, the pounding of my heart deafening. The wet pavement beneath my sneakers skids me momentarily, sending me stumbling against a garbage dumpster before I recover, continuing my frantic dash. Behind me, I can hear the rapid-fire crackle of rifles, the shouted orders and curses of the soldiers hot on my heels. 
Their boots thunder through the deserted streets like a stampede, threatening to trample everything in their path. My heart races with every step as I weave between the towering tenements, trying to lose my pursuers in the labyrinthine alleyways of this seedy part of the city. I've done this countless times before, but never has it felt so perilous. Tonight's mission is different - far more dangerous than the usual sabotage and intel gathering. My phone vibrates against my hip, the screen flashing with a message from our underground network. My thumb hovers over the accept button as I dart between two buildings, nearly colliding with an old man who stares at me in horror, clutching his worn coat to his chest. \"Accept,\" I whisper, pressing the button just as a bullet whizzes past my head, striking the wall inches from my face. The acrid smell of gunpowder fills my nostrils as I keep moving, my adrenaline-fueled senses on high alert. Ahead, the mouth of another alley opens up onto a main street. Cars idle by, their headlights casting eerie shadows across the pavement. In the distance, the towering edifice of the Capitol building looms ominously, its marble facade reflecting the moon's pale glow. That's where I need to be. If only I can make it there... Suddenly, a figure steps out from the shadows, blocking my path. His features are obscured by the darkness, but I recognize the menacing aura that precedes him. \"Ah, Lyra,\" he drawls, his voice dripping with contempt. \"I've been waiting for you.\" My blood runs cold as the soldiers burst into view behind him, their weapons trained on me like a noose tightening around my neck. For a moment, time seems to suspend as I stare at him, recognition dawning. \"You,\" I hiss, my voice barely audible over the pounding of my heart. \"Zarek.\" His smirk widens, his eyes glinting with malice in the dim light. \"The one and only.\" He gestures lazily toward the waiting soldiers. \"Looks like your luck's finally run out, sweetheart.\" My fingers close around a small vial I'd retrieved from my pocket earlier, its contents a desperate last resort. The same poison that had felled countless government agents when they stumbled into our traps. With a fierce determination, I hurl the vial at Zarek's chest - but it shatters harmlessly on his bulletproof vest. Panic claws at my throat as he lunges forward, seizing me by the collar of my jacket and slamming me against the cold concrete wall. The soldiers close in, their rifles raised to deliver the fatal shot. But before they can pull the trigger, a deafening explosion rocks the alleyway, sending them stumbling back in surprise. Zarek's grip on my jacket loosens momentarily as he turns to face the new threat. That's when I see him - my brother, Kael, bursting from the shadows with an RPG launcher clutched tightly in his hands. His eyes meet mine for a brief instant before he focuses back on Zarek. \"No!\" I scream, trying to break free of my attacker's grasp as Kael fires again and another blast rocks the alleyway. The ground shakes beneath our feet as debris rains down, obscuring everything in a choking cloud of dust and smoke. When it clears, the scene is one of utter chaos - soldiers sprawled across the pavement, their weapons strewn about like toys abandoned by a child. But Zarek isn't among them. Panic grips my heart as I scan the wreckage, desperately searching for any sign of life... or the body that should have fallen when Kael's rocket hit. 
Then, from deeper in the alley, a chilling laugh echoes through the night air - a sound that sends ice coursing through my veins. \"You've got spirit,\" Zarek calls out, his voice clear as day despite the din of chaos. \"I'll give you that.\" My blood runs cold as I realize he must have been hiding in an adjacent building and escaped unscathed. The soldiers are down, Kael's gone, and I'm left facing off against my most hated enemy alone. \"Now,\" Zarek continues, his tone dripping with menace, \"it's time we concluded our little game.\" As if on cue, the Capitol building behind him erupts in a blaze of light - the floodlights that illuminate its facade suddenly blazing to life. The ground beneath us begins to rumble ominously and I realize, too late, what's about to happen... The world seems to slow as I stare at Zarek in horror, understanding dawning on me like a crushing weight. \"It's a trap,\" I whisper, my voice barely audible over the thunderous pounding of my heart. \"They're going to...\" But before I can complete the thought, a massive shockwave engulfs us - a seismic blast that obliterates everything in its path. The force of it sends me tumbling through the air like a rag doll, my mind reeling as debris rains down around me. As consciousness begins to fade, I catch a final glimpse of Zarek's twisted grin amidst the wreckage... and then, nothing. Blackness consumes me whole.", + "model_explanation_gemini": "Generates vivid, detailed fiction and roleplay text across various genres with enhanced creative writing capabilities and long-context awareness." +} \ No newline at end of file diff --git a/data/model_data_json/Davlan_bert-base-multilingual-cased-ner-hrl.json b/data/model_data_json/Davlan_bert-base-multilingual-cased-ner-hrl.json new file mode 100644 index 0000000000000000000000000000000000000000..3287ae517a370774701d450fc3a9de5696304e86 --- /dev/null +++ b/data/model_data_json/Davlan_bert-base-multilingual-cased-ner-hrl.json @@ -0,0 +1,19 @@ +{ + "model_id": "Davlan/bert-base-multilingual-cased-ner-hrl", + "downloads": 369492, + "tags": [ + "transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "bert", + "token-classification", + "license:afl-3.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: afl-3.0 --- Hugging Face's logo --- language: - ar - de - en - es - fr - it - lv - nl - pt - zh - multilingual --- # bert-base-multilingual-cased-ner-hrl ## Model description **bert-base-multilingual-cased-ner-hrl** is a **Named Entity Recognition** model for 10 high resourced languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese) based on a fine-tuned mBERT base model. It has been trained to recognize three types of entities: location (LOC), organizations (ORG), and person (PER). Specifically, this model is a *bert-base-multilingual-cased* model that was fine-tuned on an aggregation of 10 high-resourced languages ## Intended uses & limitations #### How to use You can use this model with Transformers *pipeline* for NER. #### Limitations and bias This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. 
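The card's \"How to use\" section above lost its snippet in this dump; a minimal sketch of the Transformers NER pipeline it refers to (the example sentence is our own):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "Davlan/bert-base-multilingual-cased-ner-hrl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner("Angela Merkel besuchte die Vereinten Nationen in New York."))
# Expected entity types per the card: PER, ORG, LOC.
```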
## Training data The training data for the 10 languages are from: Language|Dataset -|- Arabic | ANERcorp German | conll 2003 English | conll 2003 Spanish | conll 2002 French | Europeana Newspapers Italian | Italian I-CAB Latvian | Latvian NER Dutch | conll 2002 Portuguese | Paramopama + Second Harem Chinese | MSRA The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes: Abbreviation|Description -|- O|Outside of a named entity B-PER |Beginning of a person’s name right after another person’s name I-PER |Person’s name B-ORG |Beginning of an organisation right after another organisation I-ORG |Organisation B-LOC |Beginning of a location right after another location I-LOC |Location ## Training procedure This model was trained on NVIDIA V100 GPU with recommended hyperparameters from HuggingFace code.", + "model_explanation_gemini": "Fine-tuned multilingual BERT model for named entity recognition in 10 languages, identifying locations, organizations, and persons in text." +} \ No newline at end of file diff --git a/data/model_data_json/Davlan_distilbert-base-multilingual-cased-ner-hrl.json b/data/model_data_json/Davlan_distilbert-base-multilingual-cased-ner-hrl.json new file mode 100644 index 0000000000000000000000000000000000000000..9e927a5d2e6de111433e8ed114b43fef1c46cea5 --- /dev/null +++ b/data/model_data_json/Davlan_distilbert-base-multilingual-cased-ner-hrl.json @@ -0,0 +1,18 @@ +{ + "model_id": "Davlan/distilbert-base-multilingual-cased-ner-hrl", + "downloads": 533336, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "distilbert", + "token-classification", + "license:afl-3.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: afl-3.0 --- Hugging Face's logo --- language: - ar - de - en - es - fr - it - lv - nl - pt - zh - multilingual --- # distilbert-base-multilingual-cased-ner-hrl ## Model description **distilbert-base-multilingual-cased-ner-hrl** is a **Named Entity Recognition** model for 10 high-resourced languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese) based on a fine-tuned DistilBERT base model. It has been trained to recognize three types of entities: location (LOC), organizations (ORG), and person (PER). Specifically, this model is a *distilbert-base-multilingual-cased* model that was fine-tuned on an aggregation of 10 high-resourced languages. ## Intended uses & limitations #### How to use You can use this model with Transformers *pipeline* for NER. #### Limitations and bias This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. ## Training data The training data for the 10 languages are from: Language|Dataset -|- Arabic | ANERcorp German | conll 2003 English | conll 2003 Spanish | conll 2002 French | Europeana Newspapers Italian | Italian I-CAB Latvian | Latvian NER Dutch | conll 2002 Portuguese | Paramopama + Second Harem Chinese | MSRA The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. 
As in the dataset, each token will be classified as one of the following classes: Abbreviation|Description -|- O|Outside of a named entity B-PER |Beginning of a person’s name right after another person’s name I-PER |Person’s name B-ORG |Beginning of an organisation right after another organisation I-ORG |Organisation B-LOC |Beginning of a location right after another location I-LOC |Location ## Training procedure This model was trained on NVIDIA V100 GPU with recommended hyperparameters from HuggingFace code.", + "model_explanation_gemini": "Recognizes named entities (locations, organizations, persons) in 10 high-resource languages using a fine-tuned DistilBERT multilingual model." +} \ No newline at end of file diff --git a/data/model_data_json/DeepPavlov_rubert-base-cased-conversational.json b/data/model_data_json/DeepPavlov_rubert-base-cased-conversational.json new file mode 100644 index 0000000000000000000000000000000000000000..692d76b52921b1946d9940fa8d090cb0d9fb8209 --- /dev/null +++ b/data/model_data_json/DeepPavlov_rubert-base-cased-conversational.json @@ -0,0 +1,16 @@ +{ + "model_id": "DeepPavlov/rubert-base-cased-conversational", + "downloads": 216822, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "feature-extraction", + "ru", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru --- # rubert-base-cased-conversational Conversational RuBERT \\(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\\) was trained on OpenSubtitles\\[1\\], Dirty, Pikabu, and a Social Media segment of Taiga corpus\\[2\\]. We assembled a new vocabulary for Conversational RuBERT model on this data and initialized the model with RuBERT. 08.11.2021: upload model with MLM and NSP heads \\[1\\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \\(LREC 2016\\) \\[2\\]: Shavrina T., Shapovalova O. \\(2017\\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING: «TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.", + "model_explanation_gemini": "Russian conversational BERT model trained on diverse dialogue datasets for tasks like masked language modeling and next sentence prediction." +} \ No newline at end of file diff --git a/data/model_data_json/DeepPavlov_rubert-base-cased.json b/data/model_data_json/DeepPavlov_rubert-base-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..490f6ebcc8d81715374054d94cd61166c4b979a4 --- /dev/null +++ b/data/model_data_json/DeepPavlov_rubert-base-cased.json @@ -0,0 +1,17 @@ +{ + "model_id": "DeepPavlov/rubert-base-cased", + "downloads": 285331, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "feature-extraction", + "ru", + "arxiv:1905.07213", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru --- # rubert-base-cased RuBERT \\(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\\) was trained on the Russian part of Wikipedia and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version of BERT‑base as an initialization for RuBERT\\[1\\]. 08.11.2021: upload model with MLM and NSP heads \\[1\\]: Kuratov, Y., Arkhipov, M. \\(2019\\). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. 
arXiv preprint arXiv:1905.07213.", + "model_explanation_gemini": "RuBERT is a Russian-language BERT model trained on Wikipedia and news data for masked language modeling and next sentence prediction tasks." +} \ No newline at end of file diff --git a/data/model_data_json/DiTy_cross-encoder-russian-msmarco.json b/data/model_data_json/DiTy_cross-encoder-russian-msmarco.json new file mode 100644 index 0000000000000000000000000000000000000000..842c85993f0024547efde8ebf33df9562edc30b2 --- /dev/null +++ b/data/model_data_json/DiTy_cross-encoder-russian-msmarco.json @@ -0,0 +1,24 @@ +{ + "model_id": "DiTy/cross-encoder-russian-msmarco", + "downloads": 92381, + "tags": [ + "sentence-transformers", + "safetensors", + "bert", + "text-classification", + "transformers", + "rubert", + "cross-encoder", + "reranker", + "msmarco", + "text-ranking", + "ru", + "dataset:unicamp-dl/mmarco", + "base_model:DeepPavlov/rubert-base-cased", + "base_model:finetune:DeepPavlov/rubert-base-cased", + "license:mit", + "region:us" + ], + "description": "--- language: - ru library_name: sentence-transformers tags: - sentence-transformers - text-classification - transformers - rubert - cross-encoder - reranker - msmarco datasets: - unicamp-dl/mmarco base_model: DeepPavlov/rubert-base-cased widget: - text: как часто нужно ходить к стоматологу? [SEP] Дядя Женя работает врачем стоматологом. example_title: Example 1 - text: как часто нужно ходить к стоматологу? [SEP] Минимальный обязательный срок посещения зубного врача – раз в год, но специалисты рекомендуют делать это чаще – раз в полгода, а ещё лучше – раз в квартал. При таком сроке легко отследить любые начинающиеся проблемы и исправить их сразу же. example_title: Example 2 license: mit pipeline_tag: text-ranking --- # DiTy/cross-encoder-russian-msmarco This is a sentence-transformers model based on a pre-trained DeepPavlov/rubert-base-cased and fine-tuned on the MS-MARCO Russian passage ranking dataset. The model can be used for Information Retrieval in the Russian language: Given a query, encode the query with all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in decreasing order. See SBERT.net Retrieve & Re-rank for more details. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you need to get the logits from the model.", + "model_explanation_gemini": "Ranks Russian-language passages by relevance to a given query for information retrieval tasks." 
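The DiTy card's usage snippets were likewise stripped in this dump; a minimal sentence-transformers sketch (the query/passage pair is adapted from the card's widget examples):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("DiTy/cross-encoder-russian-msmarco", max_length=512)

query = "как часто нужно ходить к стоматологу?"  # "how often should you see a dentist?"
passages = [
    "Дядя Женя работает врачем стоматологом.",                              # widget Example 1
    "Минимальный обязательный срок посещения зубного врача – раз в год.",   # shortened Example 2
]
scores = reranker.predict([(query, p) for p in passages])

# Sort passages by decreasing relevance score, as the card describes.
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(round(float(score), 4), passage)
```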
+} \ No newline at end of file diff --git a/data/model_data_json/Diginsa_Plant-Disease-Detection-Project.json b/data/model_data_json/Diginsa_Plant-Disease-Detection-Project.json new file mode 100644 index 0000000000000000000000000000000000000000..cb4d6a76abd5e9dd5fa1b72f23e4cb3708dc2d0d --- /dev/null +++ b/data/model_data_json/Diginsa_Plant-Disease-Detection-Project.json @@ -0,0 +1,19 @@ +{ + "model_id": "Diginsa/Plant-Disease-Detection-Project", + "downloads": 241707, + "tags": [ + "transformers", + "safetensors", + "mobilenet_v2", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:1801.04381", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - image-classification - vision datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # MobileNet V2 MobileNet V2 model pre-trained on ImageNet-1k at resolution 224x224. It was introduced in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. It was first released in this repository. Disclaimer: The team releasing MobileNet V2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description From the original README: > MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation similar to how other popular large scale models, such as Inception, are used. MobileNets can be run efficiently on mobile devices [...] MobileNets trade off between latency, size and accuracy while comparing favorably with popular models from the literature. The checkpoints are named **mobilenet\\_v2\\_*depth*\\_*size***, for example **mobilenet\\_v2\\_1.0\\_224**, where **1.0** is the depth multiplier and **224** is the resolution of the input images the model was trained on. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Note: This model actually predicts 1001 classes, the 1000 classes from ImageNet plus an extra “background” class (index 0). Currently, both the feature extractor and model support PyTorch. ### BibTeX entry and citation info", + "model_explanation_gemini": "Detects and classifies images into 1,000 ImageNet categories using a lightweight MobileNet V2 architecture optimized for mobile devices." 
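The MobileNet V2 card's classification snippet ("Here is how to use this model...") did not survive the dump either; a minimal Transformers sketch (the COCO image URL is a common sample, not from the card):

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests

model_id = "Diginsa/Plant-Disease-Detection-Project"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample COCO 2017 image
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
# Index 0 is the extra "background" class the card mentions (1001 classes total).
print(model.config.id2label[logits.argmax(-1).item()])
```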
+} \ No newline at end of file diff --git a/data/model_data_json/DunnBC22_ibert-roberta-base-Abusive_Or_Threatening_Speech.json b/data/model_data_json/DunnBC22_ibert-roberta-base-Abusive_Or_Threatening_Speech.json new file mode 100644 index 0000000000000000000000000000000000000000..27f9cd13c9da01da7dd3f5810ad86c901f093a83 --- /dev/null +++ b/data/model_data_json/DunnBC22_ibert-roberta-base-Abusive_Or_Threatening_Speech.json @@ -0,0 +1,17 @@ +{ + "model_id": "DunnBC22/ibert-roberta-base-Abusive_Or_Threatening_Speech", + "downloads": 184347, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "ibert", + "text-classification", + "generated_from_trainer", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - generated_from_trainer metrics: - accuracy - f1 - recall - precision model-index: - name: ibert-roberta-base-Abusive_Or_Threatening_Speech results: [] --- # ibert-roberta-base-Abusive_Or_Threatening_Speech This model is a fine-tuned version of kssteven/ibert-roberta-base on the None dataset. It achieves the following results on the evaluation set: - Loss: 0.0802 - Accuracy: 0.9741 - F1: 0.7773 - Recall: 0.8610 - Precision: 0.7084 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 64 - eval_batch_size: 64 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 1 ### Training results | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Recall | Precision | |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:------:|:---------:| | 0.0771 | 1.0 | 1828 | 0.0802 | 0.9741 | 0.7773 | 0.8610 | 0.7084 | ### Framework versions - Transformers 4.28.1 - Pytorch 2.0.0 - Datasets 2.8.0 - Tokenizers 0.12.1", + "model_explanation_gemini": "Detects abusive or threatening speech with high accuracy using a fine-tuned RoBERTa-based model." +} \ No newline at end of file diff --git a/data/model_data_json/Efficient-Large-Model_NVILA-15B.json b/data/model_data_json/Efficient-Large-Model_NVILA-15B.json new file mode 100644 index 0000000000000000000000000000000000000000..a47880ec812bdefdc5a51afc651606d9aacffc93 --- /dev/null +++ b/data/model_data_json/Efficient-Large-Model_NVILA-15B.json @@ -0,0 +1,18 @@ +{ + "model_id": "Efficient-Large-Model/NVILA-15B", + "downloads": 184648, + "tags": [ + "transformers", + "safetensors", + "llava_llama", + "NVILA", + "VLM", + "text-generation", + "arxiv:2412.04468", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 library_name: transformers pipeline_tag: text-generation tags: - NVILA - VLM --- # VILA Model Card ## Model details **Model type:** NVILA is a visual language model (VLM) pretrained with interleaved image-text data at scale, enabling multi-image VLM. Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy. Building on top of VILA, we improve its model architecture by first scaling up the spatial and temporal resolutions, and then compressing visual tokens. 
This \"scale-then-compress\" approach enables NVILA to efficiently process high-resolution images and long videos. We also conduct a systematic investigation to enhance the efficiency of NVILA throughout its entire lifecycle, from training and fine-tuning to deployment. NVILA matches or surpasses the accuracy of many leading open and proprietary VLMs across a wide range of image and video benchmarks. At the same time, it reduces training costs by 4.5X, fine-tuning memory usage by 3.4X, pre-filling latency by 1.6-2.2X, and decoding latency by 1.2-2.8X. We will soon make our code and models available to facilitate reproducibility. **Model date:** NVILA was trained in Nov 2024. **Paper or resources for more information:** ## License - The code is released under the Apache 2.0 license as found in the LICENSE file. - The pretrained weights are released under the CC-BY-NC-SA-4.0 license. - The service is a research preview intended for non-commercial use only, and is subject to the following licenses and terms: - Terms of Use of the data generated by OpenAI - Dataset Licenses for each one used during training. **Where to send questions or comments about the model:** ## Intended use **Primary intended uses:** The primary use of VILA is research on large multimodal models and chatbots. **Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence. ## Input: **Input Type:** Image, Video, Text **Input Format:** Red, Green, Blue; MP4 ;String **Input Parameters:** 2D, 3D ## Output: **Output Type:** Text **Output Format:** String **Supported Hardware Microarchitecture Compatibility:** * Ampere * Jetson * Hopper * Lovelace **[Preferred/Supported] Operating System(s):**
Linux ## Training dataset See Dataset Preparation for more details. ** Data Collection Method by dataset * [Hybrid: Automated, Human] ** Labeling Method by dataset * [Hybrid: Automated, Human] ## Inference: **Engine:** [Tensor(RT), Triton, Or List Other Here] * PyTorch * TensorRT-LLM * TinyChat **Test Hardware:** * A100 * Jetson Orin * RTX 4090 ## Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.", + "model_explanation_gemini": "Efficient-Large-Model_NVILA-15B is a high-efficiency visual language model (VLM) designed for processing multi-image and video inputs to generate text, optimized for reduced training costs and improved inference speed while maintaining accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/ElKulako_cryptobert.json b/data/model_data_json/ElKulako_cryptobert.json new file mode 100644 index 0000000000000000000000000000000000000000..b7a4681260e88adfd0ac9c0127d77b9e37704670 --- /dev/null +++ b/data/model_data_json/ElKulako_cryptobert.json @@ -0,0 +1,29 @@ +{ + "model_id": "ElKulako/cryptobert", + "downloads": 284473, + "tags": [ + "transformers", + "pytorch", + "roberta", + "text-classification", + "cryptocurrency", + "crypto", + "BERT", + "sentiment classification", + "NLP", + "bitcoin", + "ethereum", + "shib", + "social media", + "sentiment analysis", + "cryptocurrency sentiment analysis", + "en", + "dataset:ElKulako/stocktwits-crypto", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - ElKulako/stocktwits-crypto language: - en tags: - cryptocurrency - crypto - BERT - sentiment classification - NLP - bitcoin - ethereum - shib - social media - sentiment analysis - cryptocurrency sentiment analysis license: mit --- For academic reference, cite the following paper: # CryptoBERT CryptoBERT is a pre-trained NLP model to analyse the language and sentiments of cryptocurrency-related social media posts and messages. It was built by further training the vinai's bertweet-base language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts. (A research paper with more details will follow soon.) ## Classification Training The model was trained on the following labels: \"Bearish\" : 0, \"Neutral\": 1, \"Bullish\": 2 CryptoBERT's sentiment classification head was fine-tuned on a balanced dataset of 2M labelled StockTwits posts, sampled from ElKulako/stocktwits-crypto. CryptoBERT was trained with a max sequence length of 128. Technically, it can handle sequences of up to 514 tokens, however, going beyond 128 is not recommended. # Classification Example ## Training Corpus CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts of length above 4 words were considered. The following communities were used as sources for our corpora: (1) StockTwits - 1.875M posts about the top 100 cryptos by trading volume. Posts were collected from the 1st of November 2021 to the 16th of June 2022. ElKulako/stocktwits-crypto (2) Telegram - 664K posts from top 5 telegram groups: Binance, Bittrex, huobi global, Kucoin, OKEx. 
Data from 16.11.2020 to 30.01.2021. Courtesy of Anton. (3) Reddit - 172K comments from various crypto investing threads, collected from May 2021 to May 2022 (4) Twitter - 496K posts with hashtags XBT, Bitcoin or BTC. Collected for May 2018. Courtesy of Paul.", + "model_explanation_gemini": "Analyzes sentiment of cryptocurrency-related social media posts, classifying them as \"Bearish,\" \"Neutral,\" or \"Bullish.\"" +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_gpt-j-6b.json b/data/model_data_json/EleutherAI_gpt-j-6b.json new file mode 100644 index 0000000000000000000000000000000000000000..d26080344383c74a676827560323062f80e44316 --- /dev/null +++ b/data/model_data_json/EleutherAI_gpt-j-6b.json @@ -0,0 +1,23 @@ +{ + "model_id": "EleutherAI/gpt-j-6b", + "downloads": 257442, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "gptj", + "text-generation", + "causal-lm", + "en", + "dataset:EleutherAI/pile", + "arxiv:2104.09864", + "arxiv:2101.00027", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - pytorch - causal-lm license: apache-2.0 datasets: - EleutherAI/pile --- # GPT-J 6B ## Model Description GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. \"GPT-J\" refers to the class of model, while \"6B\" represents the number of trainable parameters.
| Hyperparameter | Value | |----------------------|------------| | \\\\(n_{parameters}\\\\) | 6053381344 | | \\\\(n_{layers}\\\\) | 28* | | \\\\(d_{model}\\\\) | 4096 | | \\\\(d_{ff}\\\\) | 16384 | | \\\\(n_{heads}\\\\) | 16 | | \\\\(d_{head}\\\\) | 256 | | \\\\(n_{ctx}\\\\) | 2048 | | \\\\(n_{vocab}\\\\) | 50257/50400† (same tokenizer as GPT-2/3) | | Positional Encoding | Rotary Position Embedding (RoPE) | | RoPE Dimensions | 64 |

* Each layer consists of one feedforward block and one self-attention block.

† Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer.

The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3. ## Intended Use and Limitations GPT-J learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for, however, which is generating text from a prompt. ### Out-of-scope use GPT-J-6B is **not** intended for deployment without fine-tuning, supervision, and/or moderation. It is not in itself a product and cannot be used for human-facing interactions. For example, the model may generate harmful or offensive text. Please evaluate the risks associated with your particular use case. GPT-J-6B was trained on an English-language only dataset, and is thus **not** suitable for translation or generating text in other languages. GPT-J-6B has not been fine-tuned for downstream contexts in which language models are commonly deployed, such as writing genre prose, or commercial chatbots. This means GPT-J-6B will **not** respond to a given prompt the way a product like ChatGPT does. This is because, unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “follow” human instructions. ### Limitations and Biases The core functionality of GPT-J is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting GPT-J it is important to remember that the statistically most likely next token is often not the token that produces the most \"accurate\" text. Never depend upon GPT-J to produce factually accurate output. GPT-J was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending upon use case GPT-J may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile. As with all language models, it is hard to predict in advance how GPT-J will respond to particular prompts and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results. ### How to use This model can be easily loaded using the `AutoModelForCausalLM` functionality; a minimal loading sketch follows the evaluation notes below. ## Training data GPT-J 6B was trained on the Pile, a large-scale curated dataset created by EleutherAI. ## Training procedure This model was trained for 402 billion tokens over 383,500 steps on a TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly. ## Evaluation results
| Model | Public | Training FLOPs | LAMBADA PPL ↓ | LAMBADA Acc ↑ | Winogrande ↑ | Hellaswag ↑ | PIQA ↑ | Dataset Size (GB) | |--------------------------|-------------|----------------|--- |--- |--- |--- |--- |-------------------| | Random Chance | ✓ | 0 | ~a lot | ~0% | 50% | 25% | 25% | 0 | | GPT-3 Ada‡ | ✗ | ----- | 9.95 | 51.6% | 52.9% | 43.4% | 70.5% | ----- | | GPT-2 1.5B | ✓ | ----- | 10.63 | 51.21% | 59.4% | 50.9% | 70.8% | 40 | | GPT-Neo 1.3B‡ | ✓ | 3.0e21 | 7.50 | 57.2% | 55.0% | 48.9% | 71.1% | 825 | | Megatron-2.5B* | ✗ | 2.4e21 | ----- | 61.7% | ----- | ----- | ----- | 174 | | GPT-Neo 2.7B‡ | ✓ | 6.8e21 | 5.63 | 62.2% | 56.5% | 55.8% | 73.0% | 825 | | GPT-3 1.3B*‡ | ✗ | 2.4e21 | 5.44 | 63.6% | 58.7% | 54.7% | 75.1% | ~800 | | GPT-3 Babbage‡ | ✗ | ----- | 5.58 | 62.4% | 59.0% | 54.5% | 75.5% | ----- | | Megatron-8.3B* | ✗ | 7.8e21 | ----- | 66.5% | ----- | ----- | ----- | 174 | | GPT-3 2.7B*‡ | ✗ | 4.8e21 | 4.60 | 67.1% | 62.3% | 62.8% | 75.6% | ~800 | | Megatron-11B† | ✓ | 1.0e22 | ----- | ----- | ----- | ----- | ----- | 161 | | **GPT-J 6B‡** | **✓** | **1.5e22** | **3.99** | **69.7%** | **65.3%** | **66.1%** | **76.5%** | **825** | | GPT-3 6.7B*‡ | ✗ | 1.2e22 | 4.00 | 70.3% | 64.5% | 67.4% | 78.0% | ~800 | | GPT-3 Curie‡ | ✗ | ----- | 4.00 | 69.3% | 65.6% | 68.5% | 77.9% | ----- | | GPT-3 13B*‡ | ✗ | 2.3e22 | 3.56 | 72.5% | 67.9% | 70.9% | 78.5% | ~800 | | GPT-3 175B*‡ | ✗ | 3.1e23 | 3.00 | 76.2% | 70.2% | 78.9% | 81.0% | ~800 | | GPT-3 Davinci‡ | ✗ | ----- | 3.0 | 75% | 72% | 78% | 80% | ----- |

Models roughly sorted by performance, or by FLOPs if not available.

* Evaluation numbers reported by their respective authors. All other numbers were obtained by running the lm-evaluation-harness; see its repository for more details.

† Megatron-11B provides no comparable metrics, and several implementations using the released weights do not reproduce the generation quality and evaluations; thus, evaluation was not attempted.

‡ These models have been trained with data which contains possible test set contamination. The OpenAI GPT-3 models failed to deduplicate training data for certain test sets, while the GPT-Neo models as well as this one are trained on the Pile, which has not been deduplicated against any test sets.

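As a minimal sketch of the loading step referenced under “How to use” above (the prompt and sampling settings are illustrative assumptions, not from the card):

```python
# Hedged sketch: load GPT-J 6B with the standard Transformers API and sample
# a continuation. Prompt and sampling parameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

inputs = tokenizer("EleutherAI is a research collective that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```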
## Citation and Related Information ### BibTeX entry To cite this model: To cite the codebase that trained this model: If you use this model, we would love to hear about it! Reach out on GitHub, Discord, or shoot Ben an email. ## Acknowledgements This project would not have been possible without compute generously provided by Google through the TPU Research Cloud, as well as the Cloud TPU team for providing early access to the Cloud TPU VM Alpha. Thanks to everyone who has helped out one way or another (listed alphabetically): - James Bradbury for valuable assistance with debugging JAX issues. - Stella Biderman, Eric Hallahan, Kurumuz, and Finetune for converting the model to be compatible with the package. - Leo Gao for running zero-shot evaluations for the baseline models for the table. - Laurence Golding for adding some features to the web demo. - Aran Komatsuzaki for advice with experiment design and writing the blog posts. - Janko Prester for creating the web demo frontend.", "model_explanation_gemini": "A 6-billion-parameter autoregressive language model trained to generate English text by predicting the next token in a sequence." +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_gpt-neo-1.3B.json b/data/model_data_json/EleutherAI_gpt-neo-1.3B.json new file mode 100644 index 0000000000000000000000000000000000000000..aa38648162a77a520a6d84ed3a2191eb6595579c --- /dev/null +++ b/data/model_data_json/EleutherAI_gpt-neo-1.3B.json @@ -0,0 +1,24 @@ +{ + "model_id": "EleutherAI/gpt-neo-1.3B", + "downloads": 204403, + "tags": [ + "transformers", + "pytorch", + "jax", + "rust", + "safetensors", + "gpt_neo", + "text-generation", + "text generation", + "causal-lm", + "en", + "dataset:EleutherAI/pile", + "arxiv:2101.00027", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - text generation - pytorch - causal-lm license: mit datasets: - EleutherAI/pile --- # GPT-Neo 1.3B ## Model Description GPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. GPT-Neo refers to the class of models, while 1.3B represents the number of parameters of this particular pre-trained model. ## Training data GPT-Neo 1.3B was trained on the Pile, a large-scale curated dataset created by EleutherAI for the purpose of training this model. ## Training procedure This model was trained on the Pile for 380 billion tokens over 362,000 steps. It was trained as an autoregressive language model, using cross-entropy loss. ## Intended Use and Limitations This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. The model is best at what it was pretrained for, however, which is generating texts from a prompt. ### How to use You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run. ### Limitations and Biases GPT-Neo was trained as an autoregressive language model. This means that its core functionality is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. GPT-Neo was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending on your use case GPT-Neo may produce socially unacceptable text.
See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile. As with all language models, it is hard to predict in advance how GPT-Neo will respond to particular prompts and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results. ## Eval results ### Linguistic Reasoning | Model and Size | Pile BPB | Pile PPL | Wikitext PPL | Lambada PPL | Lambada Acc | Winogrande | Hellaswag | | ---------------- | ---------- | ---------- | ------------- | ----------- | ----------- | ---------- | ----------- | | **GPT-Neo 1.3B** | **0.7527** | **6.159** | **13.10** | **7.498** | **57.23%** | **55.01%** | **38.66%** | | GPT-2 1.5B | 1.0468 | ----- | 17.48 | 10.634 | 51.21% | 59.40% | 40.03% | | GPT-Neo 2.7B | 0.7165 | 5.646 | 11.39 | 5.626 | 62.22% | 56.50% | 42.73% | | GPT-3 Ada | 0.9631 | ----- | ----- | 9.954 | 51.60% | 52.90% | 35.93% | ### Physical and Scientific Reasoning | Model and Size | MathQA | PubMedQA | Piqa | | ---------------- | ---------- | ---------- | ----------- | | **GPT-Neo 1.3B** | **24.05%** | **54.40%** | **71.11%** | | GPT-2 1.5B | 23.64% | 58.33% | 70.78% | | GPT-Neo 2.7B | 24.72% | 57.54% | 72.14% | | GPT-3 Ada | 24.29% | 52.80% | 68.88% | ### Down-Stream Applications TBD ### BibTeX entry and citation info To cite this model, please use # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 29.44 | | ARC (25-shot) | 31.23 | | HellaSwag (10-shot) | 48.47 | | MMLU (5-shot) | 24.82 | | TruthfulQA (0-shot) | 39.63 | | Winogrande (5-shot) | 56.91 | | GSM8K (5-shot) | 0.45 | | DROP (3-shot) | 4.6 |", "model_explanation_gemini": "Generates English text by predicting the next token in a sequence based on prompts." +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_gpt-neo-125m.json b/data/model_data_json/EleutherAI_gpt-neo-125m.json new file mode 100644 index 0000000000000000000000000000000000000000..c587b19c2640beb7dda1053d7be96b21279913be --- /dev/null +++ b/data/model_data_json/EleutherAI_gpt-neo-125m.json @@ -0,0 +1,24 @@ +{ + "model_id": "EleutherAI/gpt-neo-125m", + "downloads": 156287, + "tags": [ + "transformers", + "pytorch", + "jax", + "rust", + "safetensors", + "gpt_neo", + "text-generation", + "text generation", + "causal-lm", + "en", + "dataset:EleutherAI/pile", + "arxiv:2101.00027", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - text generation - pytorch - causal-lm license: mit datasets: - EleutherAI/pile --- # GPT-Neo 125M ## Model Description GPT-Neo 125M is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. GPT-Neo refers to the class of models, while 125M represents the number of parameters of this particular pre-trained model. ## Training data GPT-Neo 125M was trained on the Pile, a large-scale curated dataset created by EleutherAI for the purpose of training this model. ## Training procedure This model was trained on the Pile for 300 billion tokens over 572,300 steps. It was trained as an autoregressive language model, using cross-entropy loss. ## Intended Use and Limitations This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks.
The model is best at what it was pretrained for, however, which is generating texts from a prompt. ### How to use You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run. ### Limitations and Biases GPT-Neo was trained as an autoregressive language model. This means that its core functionality is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. GPT-Neo was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending on your use case GPT-Neo may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile. As with all language models, it is hard to predict in advance how GPT-Neo will respond to particular prompts and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results. ## Eval results TBD ### Down-Stream Applications TBD ### BibTeX entry and citation info To cite this model, use # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 25.79 | | ARC (25-shot) | 22.95 | | HellaSwag (10-shot) | 30.26 | | MMLU (5-shot) | 25.97 | | TruthfulQA (0-shot) | 45.58 | | Winogrande (5-shot) | 51.78 | | GSM8K (5-shot) | 0.3 | | DROP (3-shot) | 3.69 |", "model_explanation_gemini": "Generates English text autoregressively based on prompts, trained as a language model on the Pile dataset." +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_gpt-neox-20b.json b/data/model_data_json/EleutherAI_gpt-neox-20b.json new file mode 100644 index 0000000000000000000000000000000000000000..87f0fa31d92a26f2fe59a7c1b5da794e95ecc4d3 --- /dev/null +++ b/data/model_data_json/EleutherAI_gpt-neox-20b.json @@ -0,0 +1,25 @@ +{ + "model_id": "EleutherAI/gpt-neox-20b", + "downloads": 346183, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "gpt_neox", + "text-generation", + "causal-lm", + "en", + "dataset:EleutherAI/pile", + "arxiv:2204.06745", + "arxiv:2101.00027", + "arxiv:2201.07311", + "arxiv:2104.09864", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - pytorch - causal-lm license: apache-2.0 datasets: - EleutherAI/pile --- GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile using the GPT-NeoX library. Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J-6B. Its training dataset contains a multitude of English-language texts, reflecting the general-purpose nature of this model. See the accompanying paper for details about model architecture (including how it differs from GPT-3), training procedure, and additional evaluations. ### Model details - Developed by: EleutherAI - Model type: Transformer-based Language Model - Language: English - Learn more: GPT-NeoX-20B: An Open-Source Autoregressive Language Model. For details about the training dataset, see the Pile paper, and its data sheet. - License: Apache 2.0 - Contact: to ask questions about this model, join the EleutherAI Discord, and post them in .
Please read the existing GPT-NeoX-20B documentation before asking about the model on Discord. For general correspondence: contact@eleuther.ai.
| Hyperparameter | Value | | ---------------------- | ----------- | | \\\\(n_{parameters}\\\\) | 20554567680 | | \\\\(n_{layers}\\\\) | 44 | | \\\\(d_{model}\\\\) | 6144 | | \\\\(n_{heads}\\\\) | 64 | | \\\\(d_{head}\\\\) | 96 | | \\\\(n_{vocab}\\\\) | 50257 | | Sequence Length | 2048 | | Learning Rate | 0.97 x 10^-5 | | Positional Encoding | Rotary Position Embedding (RoPE) |
### Uses and limitations #### Intended use GPT-NeoX-20B was developed primarily for research purposes. It learns an inner representation of the English language that can be used to extract features useful for downstream tasks. In addition to scientific uses, you may also further fine-tune and adapt GPT-NeoX-20B for deployment, as long as your use is in accordance with the Apache 2.0 license. This model works with the Transformers Library. If you decide to use pre-trained GPT-NeoX-20B as a basis for your fine-tuned model, please note that you need to conduct your own risk and bias assessment. #### Out-of-scope use GPT-NeoX-20B is **not** intended for deployment as-is. It is not a product and cannot be used for human-facing interactions without supervision. GPT-NeoX-20B has not been fine-tuned for downstream tasks for which language models are commonly deployed, such as writing genre prose, or commercial chatbots. This means GPT-NeoX-20B will likely **not** respond to a given prompt the way products such as ChatGPT do. This is because, unlike GPT-NeoX-20B, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “understand” human instructions and dialogue. This model is English-language only, and thus cannot be used for translation or generating text in other languages. #### Limitations and biases The core functionality of GPT-NeoX-20B is to take a string of text and predict the next token. Remember that the statistically most likely next token need not result in the most “accurate” text. Never rely on GPT-NeoX-20B to produce factually accurate output. This model was trained on the Pile, a dataset known to contain profanity and texts that are lewd or otherwise offensive. See Section 6 of the Pile paper for a discussion of documented biases with regards to gender, religion, and race. GPT-NeoX-20B may produce socially unacceptable or undesirable text, *even if* the prompt itself does not include anything explicitly offensive. We recommend curating the outputs of this model before presenting it to a human reader. Please inform your audience that you are using artificially generated text. #### How to use If you simply want to try out some prompts, check out this playground. GPT-NeoX-20B can be loaded using the `AutoModelForCausalLM` functionality; a hedged loading sketch follows the evaluation table below. ### Training #### Training dataset The Pile is an 825GiB general-purpose dataset in English. It was created by EleutherAI specifically for training large language models. It contains texts from 22 diverse sources, roughly broken down into five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub, Enron Emails). See the Pile paper for a breakdown of all data sources, methodology, and a discussion of ethical implications. Consult the datasheet for more detailed documentation about the Pile and its component datasets. The Pile can be downloaded from the official website, or from a community mirror. The Pile was **not** deduplicated before being used to train GPT-NeoX-20B. #### Training procedure GPT-NeoX-20B was trained with a batch size of approximately 3.15M tokens (1538 sequences of 2048 tokens each), for a total of 150,000 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. Additional details about the training procedure are in Section 3 of the accompanying paper. ### Evaluations
| Model | OpenAI’s LAMBADA | SciQ | PIQA | TriviaQA | ARC (Challenge) | | ------------- | :--------------: | :-----------: | :-----------: | :-----------: | :-------------: | | GPT-J-6B | 0.683 ± 0.006 | 0.910 ± 0.009 | 0.752 ± 0.010 | 0.170 ± 0.004 | 0.340 ± 0.014 | | FairSeq 6.7B | 0.673 ± 0.007 | 0.895 ± 0.010 | 0.762 ± 0.010 | 0.221 ± 0.004 | 0.329 ± 0.014 | | GPT-3 Curie | 0.693 ± 0.006 | 0.918 ± 0.009 | 0.767 ± 0.010 | 0.196 ± 0.004 | 0.334 ± 0.014 | | FairSeq 13B | 0.709 ± 0.006 | 0.910 ± 0.009 | 0.769 ± 0.010 | 0.270 ± 0.004 | 0.345 ± 0.014 | | GPT-NeoX-20B | 0.720 ± 0.006 | 0.928 ± 0.008 | 0.779 ± 0.010 | 0.259 ± 0.004 | 0.380 ± 0.014 | | GPT-3 DaVinci | 0.752 ± 0.006 | 0.949 ± 0.007 | 0.791 ± 0.009 | 0.409 ± 0.005 | 0.435 ± 0.014 |
Zero-shot performance on selected natural language tasks.
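Returning to the “How to use” note above, a hedged loading sketch; half-precision and `device_map="auto"` are assumptions made here so the 20B checkpoint fits on common GPU setups, not recommendations from the card:

```python
# Hedged sketch: load GPT-NeoX-20B via Transformers. fp16 roughly halves
# weight memory (~40 GB); device_map="auto" requires the `accelerate` package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    torch_dtype=torch.float16,
    device_map="auto",
)
```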
This is a heavily abridged version of the evaluation results. Appendix D of the GPT-NeoX-20B paper compares more model sizes, and contains additional evaluations, including on: zero and five-shot natural language tasks, zero and five-shot Basic Arithmetic and MATH, and zero-shot Hendrycks tasks. ### BibTeX To cite the GPT-NeoX-20B paper: # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 36.02 | | ARC (25-shot) | 45.73 | | HellaSwag (10-shot) | 73.45 | | MMLU (5-shot) | 25.0 | | TruthfulQA (0-shot) | 31.61 | | Winogrande (5-shot) | 68.9 | | GSM8K (5-shot) | 2.43 | | DROP (3-shot) | 5.04 |", + "model_explanation_gemini": "GPT-NeoX-20B is a 20-billion-parameter autoregressive English language model designed for research, generating text by predicting the next token, and is suitable for downstream task fine-tuning under the Apache 2.0 license." +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_polyglot-ko-1.3b.json b/data/model_data_json/EleutherAI_polyglot-ko-1.3b.json new file mode 100644 index 0000000000000000000000000000000000000000..3148ba0c000eabbaa752c404c895d75708d96c91 --- /dev/null +++ b/data/model_data_json/EleutherAI_polyglot-ko-1.3b.json @@ -0,0 +1,23 @@ +{ + "model_id": "EleutherAI/polyglot-ko-1.3b", + "downloads": 114911, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "gpt_neox", + "text-generation", + "causal-lm", + "ko", + "arxiv:2104.09864", + "arxiv:2204.04541", + "arxiv:2306.02254", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ko tags: - pytorch - causal-lm license: apache-2.0 --- # Polyglot-Ko-1.3B ## Model Description Polyglot-Ko is a series of large-scale Korean autoregressive language models made by the EleutherAI polyglot team. | Hyperparameter | Value | |----------------------|----------------------------------------------------------------------------------------------------------------------------------------| | \\\\(n_{parameters}\\\\) | 1,331,810,304 | | \\\\(n_{layers}\\\\) | 24 | | \\\\(d_{model}\\\\) | 2,048 | | \\\\(d_{ff}\\\\) | 8,192 | | \\\\(n_{heads}\\\\) | 16 | | \\\\(d_{head}\\\\) | 128 | | \\\\(n_{ctx}\\\\) | 2,048 | | \\\\(n_{vocab}\\\\) | 30,003 / 30,080 | | Positional Encoding | Rotary Position Embedding (RoPE) | | RoPE Dimensions | 64 | The model consists of 24 transformer layers with a model dimension of 2048, and a feedforward dimension of 8192. The model dimension is split into 16 heads, each with a dimension of 128. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 30003. ## Training data Polyglot-Ko-1.3B was trained on 863 GB of Korean language data (1.2TB before processing), a large-scale dataset curated by TUNiB. The data collection process has abided by South Korean laws. This dataset was collected for the purpose of training Polyglot-Ko models, so it will not be released for public use. 
| Source | Size (GB) | Link | |-------------------------------------|---------|------------------------------------------| | Korean blog posts | 682.3 | - | | Korean news dataset | 87.0 | - | | Modu corpus | 26.4 | corpus.korean.go.kr | | Korean patent dataset | 19.0 | - | | Korean Q & A dataset | 18.1 | - | | KcBert dataset | 12.7 | github.com/Beomi/KcBERT | | Korean fiction dataset | 6.1 | - | | Korean online comments | 4.2 | - | | Korean wikipedia | 1.4 | ko.wikipedia.org | | Clova call | < 1.0 | github.com/clovaai/ClovaCall | | Naver sentiment movie corpus | < 1.0 | github.com/e9t/nsmc | | Korean hate speech dataset | < 1.0 | - | | Open subtitles | < 1.0 | opus.nlpl.eu/OpenSubtitles.php | | AIHub various tasks datasets | < 1.0 | aihub.or.kr | | Standard Korean language dictionary | < 1.0 | stdict.korean.go.kr/main/main.do | Furthermore, in order to avoid the model memorizing and generating personally identifiable information (PII) in the training data, we masked out the following sensitive information in the pre-processing stage: bank account numbers, resident registration numbers, and phone numbers. ## Training procedure Polyglot-Ko-1.3B was trained on 213 billion tokens over 102,000 steps on 256 A100 GPUs with the GPT-NeoX framework. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token. ## How to use This model can be easily loaded using the `AutoModelForCausalLM` class; a minimal sketch follows below. ## Evaluation results We evaluate Polyglot-Ko-1.3B on the KOBEST dataset, a benchmark with 5 downstream tasks, against comparable models such as skt/ko-gpt-trinity-1.2B-v0.5, kakaobrain/kogpt and facebook/xglm-7.5B, using the prompts provided in the paper. The following tables show the results as the number of few-shot examples varies. You can reproduce these results using the polyglot branch of lm-evaluation-harness and the following scripts. For a fair comparison, all models were run under the same conditions and using the same prompts. In the tables, n refers to the number of few-shot examples. In the case of the WiC dataset, all models show random performance.
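A minimal sketch of the “How to use” loading step above (the Korean prompt is an illustrative placeholder, not from the card):

```python
# Hedged sketch: load Polyglot-Ko-1.3B and generate a short continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/polyglot-ko-1.3b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/polyglot-ko-1.3b")

inputs = tokenizer("한국어 언어 모델은", return_tensors="pt")  # "Korean language models are..."
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```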
### COPA (F1) | Model | params | 0-shot | 5-shot | 10-shot | 50-shot | |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------| | skt/ko-gpt-trinity-1.2B-v0.5 | 1.2B | 0.6696 | 0.6477 | 0.6419 | 0.6514 | | kakaobrain/kogpt | 6.0B | 0.7345 | 0.7287 | 0.7277 | 0.7479 | | facebook/xglm-7.5B | 7.5B | 0.6723 | 0.6731 | 0.6769 | 0.7119 | | **EleutherAI/polyglot-ko-1.3b (this)** | **1.3B** | **0.7196** | **0.7193** | **0.7204** | **0.7206** | | EleutherAI/polyglot-ko-3.8b | 3.8B | 0.7595 | 0.7608 | 0.7638 | 0.7788 | | EleutherAI/polyglot-ko-5.8b | 5.8B | 0.7745 | 0.7676 | 0.7775 | 0.7887 | | EleutherAI/polyglot-ko-12.8b | 12.8B | 0.7937 | 0.8108 | 0.8037 | 0.8369 | ### HellaSwag (F1) | Model | params | 0-shot | 5-shot | 10-shot | 50-shot | |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------| | skt/ko-gpt-trinity-1.2B-v0.5 | 1.2B | 0.5243 | 0.5272 | 0.5166 | 0.5352 | | kakaobrain/kogpt | 6.0B | 0.5590 | 0.5833 | 0.5828 | 0.5907 | | facebook/xglm-7.5B | 7.5B | 0.5665 | 0.5689 | 0.5565 | 0.5622 | | **EleutherAI/polyglot-ko-1.3b (this)** | **1.3B** | **0.5247** | **0.5260** | **0.5278** | **0.5427** | | EleutherAI/polyglot-ko-3.8b | 3.8B | 0.5707 | 0.5830 | 0.5670 | 0.5787 | | EleutherAI/polyglot-ko-5.8b | 5.8B | 0.5976 | 0.5998 | 0.5979 | 0.6208 | | EleutherAI/polyglot-ko-12.8b | 12.8B | 0.5954 | 0.6306 | 0.6098 | 0.6118 | ### BoolQ (F1) | Model | params | 0-shot | 5-shot | 10-shot | 50-shot | |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------| | skt/ko-gpt-trinity-1.2B-v0.5 | 1.2B | 0.3356 | 0.4014 | 0.3640 | 0.3560 | | kakaobrain/kogpt | 6.0B | 0.4514 | 0.5981 | 0.5499 | 0.5202 | | facebook/xglm-7.5B | 7.5B | 0.4464 | 0.3324 | 0.3324 | 0.3324 | | **EleutherAI/polyglot-ko-1.3b (this)** | **1.3B** | **0.3552** | **0.4751** | **0.4109** | **0.4038** | | EleutherAI/polyglot-ko-3.8b | 3.8B | 0.4320 | 0.5263 | 0.4930 | 0.4038 | | EleutherAI/polyglot-ko-5.8b | 5.8B | 0.4356 | 0.5698 | 0.5187 | 0.5236 | | EleutherAI/polyglot-ko-12.8b | 12.8B | 0.4818 | 0.6041 | 0.6289 | 0.6448 | ### SentiNeg (F1) | Model | params | 0-shot | 5-shot | 10-shot | 50-shot | |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------| | skt/ko-gpt-trinity-1.2B-v0.5 | 1.2B | 0.6065 | 0.6878 | 0.7280 | 0.8413 | | kakaobrain/kogpt | 6.0B | 0.3747 | 0.8942 | 0.9294 | 0.9698 | | facebook/xglm-7.5B | 7.5B | 0.3578 | 0.4471 | 0.3964 | 0.5271 | | **EleutherAI/polyglot-ko-1.3b (this)** | **1.3B** | **0.6790** | **0.6257** | **0.5514** | **0.7851** | | EleutherAI/polyglot-ko-3.8b | 3.8B | 0.4858 | 0.7950 | 0.7320 | 0.7851 | | EleutherAI/polyglot-ko-5.8b | 5.8B | 0.3394 | 0.8841 | 0.8808 | 0.9521 | | EleutherAI/polyglot-ko-12.8b | 12.8B | 0.9117 | 0.9015 | 0.9345 | 0.9723 | ### WiC (F1) | Model | params | 0-shot | 5-shot | 10-shot | 50-shot | |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------| | skt/ko-gpt-trinity-1.2B-v0.5 | 1.2B | 0.3290 | 0.4313 | 0.4001 | 0.3621 | | kakaobrain/kogpt | 6.0B | 0.3526 | 0.4775 | 0.4358 | 0.4061 | | facebook/xglm-7.5B | 7.5B | 0.3280 | 0.4903 | 0.4945 | 0.3656 | | **EleutherAI/polyglot-ko-1.3b (this)** | **1.3B** | **0.3297** | **0.4850** | **0.465** | **0.3290** | | 
EleutherAI/polyglot-ko-3.8b | 3.8B | 0.3390 | 0.4944 | 0.4203 | 0.3835 | | EleutherAI/polyglot-ko-5.8b | 5.8B | 0.3913 | 0.4688 | 0.4189 | 0.3910 | | EleutherAI/polyglot-ko-12.8b | 12.8B | 0.3985 | 0.3683 | 0.3307 | 0.3273 | ## Limitations and Biases Polyglot-Ko has been trained to optimize next token prediction. Language models such as this are often used for a wide variety of tasks and it is important to be aware of possible unexpected outcomes. For instance, Polyglot-Ko will not always return the most factual or accurate response but the most statistically likely one. In addition, Polyglot may produce socially unacceptable or offensive content. We recommend having a human curator or other filtering mechanism to censor sensitive content. ## Citation and Related Information ### BibTeX entry If you find our work useful, please consider citing: ### Licensing All our models are licensed under the terms of the Apache License 2.0. ### Acknowledgement This project was made possible thanks to the computing resources from Stability.ai, and thanks to TUNiB for providing a large-scale Korean dataset for this work.", + "model_explanation_gemini": "An autoregressive Korean language model trained for generating and predicting text, optimized for Korean language tasks with 1.3 billion parameters." +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_pythia-160m.json b/data/model_data_json/EleutherAI_pythia-160m.json new file mode 100644 index 0000000000000000000000000000000000000000..b52ea3dff612f738926e9faaa0c2e646f57dbe99 --- /dev/null +++ b/data/model_data_json/EleutherAI_pythia-160m.json @@ -0,0 +1,25 @@ +{ + "model_id": "EleutherAI/pythia-160m", + "downloads": 133057, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "gpt_neox", + "text-generation", + "causal-lm", + "pythia", + "en", + "dataset:EleutherAI/pile", + "arxiv:2304.01373", + "arxiv:2101.00027", + "arxiv:2201.07311", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - pytorch - causal-lm - pythia license: apache-2.0 datasets: - EleutherAI/pile --- The *Pythia Scaling Suite* is a collection of models developed to facilitate interpretability research (see paper). It contains two sets of eight models of sizes 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. For each size, there are two models: one trained on the Pile, and one trained on the Pile after the dataset has been globally deduplicated. All 8 model sizes are trained on the exact same data, in the exact same order. We also provide 154 intermediate checkpoints per model, hosted on Hugging Face as branches. The Pythia model suite was deliberately designed to promote scientific research on large language models, especially interpretability research. Despite not centering downstream performance as a design goal, we find the models
match or exceed the performance of similar and same-sized models, such as those in the OPT and GPT-Neo suites.
Details on previous early release and naming convention. Previously, we released an early version of the Pythia suite to the public. However, we decided to retrain the model suite to address a few hyperparameter discrepancies. This model card lists the changes; see appendix B in the Pythia paper for further discussion. We found no difference in benchmark performance between the two Pythia versions. The old models are still available, but we suggest the retrained suite if you are just starting to use Pythia.
**This is the current release.** Please note that all models in the *Pythia* suite were renamed in January 2023. For clarity, a table comparing the old and new names is provided in this model card, together with exact parameter counts.

# Pythia-160M ## Model Details - Developed by: EleutherAI - Model type: Transformer-based Language Model - Language: English - Learn more: Pythia's GitHub repository for training procedure, config files, and details on how to use. See paper for more evals and implementation details. - Library: GPT-NeoX - License: Apache 2.0 - Contact: to ask questions about this model, join the EleutherAI Discord, and post them in . Please read the existing *Pythia* documentation before asking about it in the EleutherAI Discord. For general correspondence: contact@eleuther.ai.
| Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate | Equivalent Models | | -----------: | -------------------: | :----: | :-------: | :---: | :--------: | :-------------------: | :--------------------: | | 70M | 18,915,328 | 6 | 512 | 8 | 2M | 1.0 x 10^-3 | — | | 160M | 85,056,000 | 12 | 768 | 12 | 2M | 6.0 x 10^-4 | GPT-Neo 125M, OPT-125M | | 410M | 302,311,424 | 24 | 1024 | 16 | 2M | 3.0 x 10^-4 | OPT-350M | | 1.0B | 805,736,448 | 16 | 2048 | 8 | 2M | 3.0 x 10^-4 | — | | 1.4B | 1,208,602,624 | 24 | 2048 | 16 | 2M | 2.0 x 10^-4 | GPT-Neo 1.3B, OPT-1.3B | | 2.8B | 2,517,652,480 | 32 | 2560 | 32 | 2M | 1.6 x 10^-4 | GPT-Neo 2.7B, OPT-2.7B | | 6.9B | 6,444,163,072 | 32 | 4096 | 32 | 2M | 1.2 x 10^-4 | OPT-6.7B | | 12B | 11,327,027,200 | 36 | 5120 | 40 | 2M | 1.2 x 10^-4 | — |
Engineering details for the Pythia Suite. Deduped and non-deduped models of a given size have the same hyperparameters. “Equivalent” models have exactly the same architecture, and the same number of non-embedding parameters.
## Uses and Limitations ### Intended Use The primary intended use of Pythia is research on the behavior, functionality, and limitations of large language models. This suite is intended to provide a controlled setting for performing scientific experiments. We also provide 154 checkpoints per model: the initial step0, 10 log-spaced checkpoints (step1 through step512), and 143 evenly-spaced checkpoints from step1000 to step143000. These checkpoints are hosted on Hugging Face as branches. Note that branch step143000 corresponds exactly to the model checkpoint on the main branch of each model. You may also further fine-tune and adapt Pythia-160M for deployment, as long as your use is in accordance with the Apache 2.0 license. Pythia models work with the Hugging Face Transformers Library. If you decide to use pre-trained Pythia-160M as a basis for your fine-tuned model, please conduct your own risk and bias assessment. ### Out-of-scope use The Pythia Suite is **not** intended for deployment. It is not in itself a product and cannot be used for human-facing interactions. For example, the model may generate harmful or offensive text. Please evaluate the risks associated with your particular use case. Pythia models are English-language only, and are not suitable for translation or generating text in other languages. Pythia-160M has not been fine-tuned for downstream contexts in which language models are commonly deployed, such as writing genre prose, or commercial chatbots. This means Pythia-160M will **not** respond to a given prompt the way a product like ChatGPT does. This is because, unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “follow” human instructions. ### Limitations and biases The core functionality of a large language model is to take a string of text and predict the next token. The statistically most likely next token need not produce the most “accurate” text. Never rely on Pythia-160M to produce factually accurate output. This model was trained on the Pile, a dataset known to contain profanity and texts that are lewd or otherwise offensive. See Section 6 of the Pile paper for a discussion of documented biases with regards to gender, religion, and race. Pythia-160M may produce socially unacceptable or undesirable text, *even if* the prompt itself does not include anything explicitly offensive. If you plan on using text generated through, for example, the Hosted Inference API, we recommend having a human curate the outputs of this language model before presenting it to other people. Please inform your audience that the text was generated by Pythia-160M. ### Quickstart Pythia models can be loaded and used via the following code, demonstrated here for the third checkpoint (see the sketch below). Revision/branch step143000 corresponds exactly to the model checkpoint on the main branch of each model.
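A minimal sketch of the Quickstart described above; the branch name and prompt are illustrative (omitting `revision` loads the final checkpoint from the main branch):

```python
# Sketch of the Quickstart: load an intermediate training checkpoint by
# passing its branch name as `revision`. "step3000" is an illustrative choice.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision="step3000")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", revision="step3000")

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(tokens[0]))
```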
For more information on how to use all Pythia models, see documentation on GitHub. ## Training ### Training data The Pile is an 825GiB general-purpose dataset in English. It was created by EleutherAI specifically for training large language models. It contains texts from 22 diverse sources, roughly broken down into five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub, Enron Emails). See the Pile paper for a breakdown of all data sources, methodology, and a discussion of ethical implications. Consult the datasheet for more detailed documentation about the Pile and its component datasets. The Pile can be downloaded from the official website, or from a community mirror.
The Pile was **not** deduplicated before being used to train Pythia-160M. ### Training procedure All models were trained on the exact same data, in the exact same order. Each model saw 299,892,736,000 tokens during training, and 143 checkpoints for each model are saved every 2,097,152,000 tokens, spaced evenly throughout training, from step1000 to step143000 (which is the same as main). In addition, we also provide frequent early checkpoints: step0 and steps {1,2,4,8,16,32,64,128,256,512}. This corresponds to training for just under 1 epoch on the Pile for non-deduplicated models, and about 1.5 epochs on the deduplicated Pile. All *Pythia* models were trained for 143,000 steps at a batch size of 2M (2,097,152 tokens).
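The token counts quoted in this paragraph are internally consistent; a quick check (plain arithmetic over the numbers above, no other assumptions):

```python
# Consistency check of the training-procedure numbers quoted above.
steps = 143_000
tokens_per_step = 2_097_152                         # batch size of "2M" tokens
assert steps * tokens_per_step == 299_892_736_000   # tokens seen per model
assert 1_000 * tokens_per_step == 2_097_152_000     # tokens between checkpoints
print("training-procedure numbers check out")
```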
See GitHub for more details on training procedure, including how to reproduce it.
Pythia uses the same tokenizer as GPT-NeoX-20B. ## Evaluations All 16 *Pythia* models were evaluated using the LM Evaluation Harness. You can access the results by model and step in the GitHub repository.
Plots of evaluation results for all Pythia and Pythia-deduped models, compared with OPT and BLOOM, are available for the following benchmarks: LAMBADA (OpenAI), Physical Interaction: Question Answering (PIQA), WinoGrande, AI2 Reasoning Challenge (Easy Set), and SciQ.
## Changelog This section compares differences between previously released Pythia v0 and the current models. See Appendix B of the Pythia paper for further discussion of these changes and the motivation behind them. We found that retraining Pythia had no impact on benchmark performance. - All model sizes are now trained with a uniform batch size of 2M tokens. Previously, the models of size 160M, 410M, and 1.4B parameters were trained with batch sizes of 4M tokens. - We added checkpoints at initialization (step 0) and steps {1,2,4,8,16,32,64,128,256,512} in addition to every 1000 training steps. - Flash Attention was used in the new retrained suite. - We remedied a minor inconsistency that existed in the original suite: all models of size 2.8B parameters or smaller had a learning rate (LR) schedule which decayed to a minimum LR of 10% of the starting LR, but the 6.9B and 12B models all used an LR schedule which decayed to a minimum LR of 0. In the redone training runs, we rectified this inconsistency: all models were now trained with the LR decaying to a minimum of 0.1× their maximum LR. ### Naming convention and parameter count *Pythia* models were renamed in January 2023. It is possible that the old naming convention still persists in some documentation by accident. The current naming convention (70M, 160M, etc.) is based on total parameter count.
| current Pythia suffix | old suffix | total params | non-embedding params | | --------------------: | ---------: | -------------: | -------------------: | | 70M | 19M | 70,426,624 | 18,915,328 | | 160M | 125M | 162,322,944 | 85,056,000 | | 410M | 350M | 405,334,016 | 302,311,424 | | 1B | 800M | 1,011,781,632 | 805,736,448 | | 1.4B | 1.3B | 1,414,647,808 | 1,208,602,624 | | 2.8B | 2.7B | 2,775,208,960 | 2,517,652,480 | | 6.9B | 6.7B | 6,857,302,016 | 6,444,163,072 | | 12B | 13B | 11,846,072,320 | 11,327,027,200 |
", + "model_explanation_gemini": "A 160M-parameter English language model from the Pythia Scaling Suite designed for interpretability research, trained on the Pile dataset with consistent data ordering." +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_pythia-1b.json b/data/model_data_json/EleutherAI_pythia-1b.json new file mode 100644 index 0000000000000000000000000000000000000000..fd4eedfcbadee5ed942b8c70d4a824465f4c39de --- /dev/null +++ b/data/model_data_json/EleutherAI_pythia-1b.json @@ -0,0 +1,24 @@ +{ + "model_id": "EleutherAI/pythia-1b", + "downloads": 83131, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "gpt_neox", + "text-generation", + "causal-lm", + "pythia", + "en", + "dataset:the_pile", + "arxiv:2304.01373", + "arxiv:2101.00027", + "arxiv:2201.07311", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - pytorch - causal-lm - pythia license: apache-2.0 datasets: - the_pile --- The *Pythia Scaling Suite* is a collection of models developed to facilitate interpretability research (see paper). It contains two sets of eight models of sizes 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. For each size, there are two models: one trained on the Pile, and one trained on the Pile after the dataset has been globally deduplicated. All 8 model sizes are trained on the exact same data, in the exact same order. We also provide 154 intermediate checkpoints per model, hosted on Hugging Face as branches. The Pythia model suite was deliberately designed to promote scientific research on large language models, especially interpretability research. Despite not centering downstream performance as a design goal, we find the models match or exceed the performance of similar and same-sized models, such as those in the OPT and GPT-Neo suites.
Details on previous early release and naming convention. Previously, we released an early version of the Pythia suite to the public. However, we decided to retrain the model suite to address a few hyperparameter discrepancies. This model card lists the changes; see appendix B in the Pythia paper for further discussion. We found no difference in benchmark performance between the two Pythia versions. The old models are still available, but we suggest the retrained suite if you are just starting to use Pythia.
**This is the current release.** Please note that all models in the *Pythia* suite were renamed in January 2023. For clarity, a table comparing the old and new names is provided in this model card, together with exact parameter counts.

# Pythia-1B ## Model Details - Developed by: EleutherAI - Model type: Transformer-based Language Model - Language: English - Learn more: Pythia's GitHub repository for training procedure, config files, and details on how to use. See paper for more evals and implementation details. - Library: GPT-NeoX - License: Apache 2.0 - Contact: to ask questions about this model, join the EleutherAI Discord, and post them in . Please read the existing *Pythia* documentation before asking about it in the EleutherAI Discord. For general correspondence: contact@eleuther.ai.
| Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate | Equivalent Models | | -----------: | -------------------: | :----: | :-------: | :---: | :--------: | :-------------------: | :--------------------: | | 70M | 18,915,328 | 6 | 512 | 8 | 2M | 1.0 x 10^-3 | — | | 160M | 85,056,000 | 12 | 768 | 12 | 2M | 6.0 x 10^-4 | GPT-Neo 125M, OPT-125M | | 410M | 302,311,424 | 24 | 1024 | 16 | 2M | 3.0 x 10^-4 | OPT-350M | | 1.0B | 805,736,448 | 16 | 2048 | 8 | 2M | 3.0 x 10^-4 | — | | 1.4B | 1,208,602,624 | 24 | 2048 | 16 | 2M | 2.0 x 10^-4 | GPT-Neo 1.3B, OPT-1.3B | | 2.8B | 2,517,652,480 | 32 | 2560 | 32 | 2M | 1.6 x 10^-4 | GPT-Neo 2.7B, OPT-2.7B | | 6.9B | 6,444,163,072 | 32 | 4096 | 32 | 2M | 1.2 x 10^-4 | OPT-6.7B | | 12B | 11,327,027,200 | 36 | 5120 | 40 | 2M | 1.2 x 10^-4 | — |
Engineering details for the Pythia Suite. Deduped and non-deduped models of a given size have the same hyperparameters. “Equivalent” models have exactly the same architecture, and the same number of non-embedding parameters.
## Uses and Limitations ### Intended Use The primary intended use of Pythia is research on the behavior, functionality, and limitations of large language models. This suite is intended to provide a controlled setting for performing scientific experiments. We also provide 154 checkpoints per model: the initial step0, 10 log-spaced checkpoints (step1 through step512), and 143 evenly-spaced checkpoints from step1000 to step143000. These checkpoints are hosted on Hugging Face as branches. Note that branch step143000 corresponds exactly to the model checkpoint on the main branch of each model. You may also further fine-tune and adapt Pythia-1B for deployment, as long as your use is in accordance with the Apache 2.0 license. Pythia models work with the Hugging Face Transformers Library. If you decide to use pre-trained Pythia-1B as a basis for your fine-tuned model, please conduct your own risk and bias assessment. ### Out-of-scope use The Pythia Suite is **not** intended for deployment. It is not in itself a product and cannot be used for human-facing interactions. For example, the model may generate harmful or offensive text. Please evaluate the risks associated with your particular use case. Pythia models are English-language only, and are not suitable for translation or generating text in other languages. Pythia-1B has not been fine-tuned for downstream contexts in which language models are commonly deployed, such as writing genre prose, or commercial chatbots. This means Pythia-1B will **not** respond to a given prompt the way a product like ChatGPT does. This is because, unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “follow” human instructions. ### Limitations and biases The core functionality of a large language model is to take a string of text and predict the next token. The statistically most likely next token need not produce the most “accurate” text. Never rely on Pythia-1B to produce factually accurate output. This model was trained on the Pile, a dataset known to contain profanity and texts that are lewd or otherwise offensive. See Section 6 of the Pile paper for a discussion of documented biases with regards to gender, religion, and race. Pythia-1B may produce socially unacceptable or undesirable text, *even if* the prompt itself does not include anything explicitly offensive. If you plan on using text generated through, for example, the Hosted Inference API, we recommend having a human curate the outputs of this language model before presenting it to other people. Please inform your audience that the text was generated by Pythia-1B. ### Quickstart Pythia models can be loaded and used via the following code, demonstrated here for the third checkpoint (a sketch of enumerating the checkpoint branches follows below). Revision/branch step143000 corresponds exactly to the model checkpoint on the main branch of each model.
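Complementing the Quickstart above, a sketch of enumerating the 154 checkpoint branches with the huggingface_hub client (a tooling assumption on our part, not part of the card):

```python
# Sketch: list the checkpoint branches of Pythia-1B, sorted numerically
# by training step. Each branch name can be passed as `revision=` above.
from huggingface_hub import list_repo_refs

refs = list_repo_refs("EleutherAI/pythia-1b")
steps = sorted(
    (b.name for b in refs.branches if b.name.startswith("step")),
    key=lambda name: int(name.removeprefix("step")),
)
print(len(steps), "checkpoint branches; first few:", steps[:4])
```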
For more information on how to use all Pythia models, see documentation on GitHub. ## Training ### Training data The Pile is an 825GiB general-purpose dataset in English. It was created by EleutherAI specifically for training large language models. It contains texts from 22 diverse sources, roughly broken down into five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub, Enron Emails). See the Pile paper for a breakdown of all data sources, methodology, and a discussion of ethical implications. Consult the datasheet for more detailed documentation about the Pile and its component datasets. The Pile can be downloaded from the official website, or from a community mirror.
The Pile was **not** deduplicated before being used to train Pythia-1B. ### Training procedure All models were trained on the exact same data, in the exact same order. Each model saw 299,892,736,000 tokens during training, and 143 checkpoints for each model are saved every 2,097,152,000 tokens, spaced evenly throughout training, from step1000 to step143000 (which is the same as main). In addition, we also provide frequent early checkpoints: step0 and steps {1,2,4,8,16,32,64,128,256,512}. This corresponds to training for just under 1 epoch on the Pile for non-deduplicated models, and about 1.5 epochs on the deduplicated Pile. All *Pythia* models were trained for 143,000 steps at a batch size of 2M (2,097,152 tokens).
See GitHub for more details on training procedure, including how to reproduce it.
Pythia uses the same tokenizer as GPT-NeoX-20B. ## Evaluations All 16 *Pythia* models were evaluated using the LM Evaluation Harness. You can access the results by model and step in the GitHub repository.
Plots of evaluation results for all Pythia and Pythia-deduped models, compared with OPT and BLOOM, are available for the following benchmarks: LAMBADA (OpenAI), Physical Interaction: Question Answering (PIQA), WinoGrande, AI2 Reasoning Challenge (Easy Set), and SciQ.
## Changelog This section compares differences between previously released Pythia v0 and the current models. See Appendix B of the Pythia paper for further discussion of these changes and the motivation behind them. We found that retraining Pythia had no impact on benchmark performance. - All model sizes are now trained with a uniform batch size of 2M tokens. Previously, the models of size 160M, 410M, and 1.4B parameters were trained with batch sizes of 4M tokens. - We added checkpoints at initialization (step 0) and steps {1,2,4,8,16,32,64,128,256,512} in addition to every 1000 training steps. - Flash Attention was used in the new retrained suite. - We remedied a minor inconsistency that existed in the original suite: all models of size 2.8B parameters or smaller had a learning rate (LR) schedule which decayed to a minimum LR of 10% of the starting LR, but the 6.9B and 12B models all used an LR schedule which decayed to a minimum LR of 0. In the redone training runs, we rectified this inconsistency: all models were now trained with the LR decaying to a minimum of 0.1× their maximum LR. ### Naming convention and parameter count *Pythia* models were renamed in January 2023. It is possible that the old naming convention still persists in some documentation by accident. The current naming convention (70M, 160M, etc.) is based on total parameter count.
| current Pythia suffix | old suffix | total params | non-embedding params | | --------------------: | ---------: | -------------: | -------------------: | | 70M | 19M | 70,426,624 | 18,915,328 | | 160M | 125M | 162,322,944 | 85,056,000 | | 410M | 350M | 405,334,016 | 302,311,424 | | 1B | 800M | 1,011,781,632 | 805,736,448 | | 1.4B | 1.3B | 1,414,647,808 | 1,208,602,624 | | 2.8B | 2.7B | 2,775,208,960 | 2,517,652,480 | | 6.9B | 6.7B | 6,857,302,016 | 6,444,163,072 | | 12B | 13B | 11,846,072,320 | 11,327,027,200 |
" +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_pythia-70m-deduped.json b/data/model_data_json/EleutherAI_pythia-70m-deduped.json new file mode 100644 index 0000000000000000000000000000000000000000..05126b7ed345e05ce9ca970387652e3d978171e0 --- /dev/null +++ b/data/model_data_json/EleutherAI_pythia-70m-deduped.json @@ -0,0 +1,25 @@ +{ + "model_id": "EleutherAI/pythia-70m-deduped", + "downloads": 854632, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "gpt_neox", + "text-generation", + "causal-lm", + "pythia", + "en", + "dataset:EleutherAI/the_pile_deduplicated", + "arxiv:2304.01373", + "arxiv:2101.00027", + "arxiv:2201.07311", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - pytorch - causal-lm - pythia license: apache-2.0 datasets: - EleutherAI/the_pile_deduplicated --- The *Pythia Scaling Suite* is a collection of models developed to facilitate interpretability research (see paper). It contains two sets of eight models of sizes 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. For each size, there are two models: one trained on the Pile, and one trained on the Pile after the dataset has been globally deduplicated. All 8 model sizes are trained on the exact same data, in the exact same order. We also provide 154 intermediate checkpoints per model, hosted on Hugging Face as branches. The Pythia model suite was designed to promote scientific research on large language models, especially interpretability research. Despite not centering downstream performance as a design goal, we find the models match or exceed the performance of similar and same-sized models, such as those in the OPT and GPT-Neo suites.
Details on previous early release and naming convention. Previously, we released an early version of the Pythia suite to the public. However, we decided to retrain the model suite to address a few hyperparameter discrepancies. This model card lists the changes; see appendix B in the Pythia paper for further discussion. We found no difference in benchmark performance between the two Pythia versions. The old models are still available, but we suggest the retrained suite if you are just starting to use Pythia.
**This is the current release.** Please note that all models in the *Pythia* suite were renamed in January 2023. For clarity, a table comparing the old and new names is provided in this model card, together with exact parameter counts.

# Pythia-70M-deduped ## Model Details - Developed by: EleutherAI - Model type: Transformer-based Language Model - Language: English - Learn more: Pythia's GitHub repository for training procedure, config files, and details on how to use. See the paper for more evals and implementation details. - Library: GPT-NeoX - License: Apache 2.0 - Contact: to ask questions about this model, join the EleutherAI Discord and post them in `#release-discussion`. Please read the existing *Pythia* documentation before asking about it in the EleutherAI Discord. For general correspondence: contact@eleuther.ai.
| Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate | Equivalent Models | | -----------: | -------------------: | :----: | :-------: | :---: | :--------: | :-------------------: | :--------------------: | | 70M | 18,915,328 | 6 | 512 | 8 | 2M | 1.0 × 10⁻³ | — | | 160M | 85,056,000 | 12 | 768 | 12 | 2M | 6.0 × 10⁻⁴ | GPT-Neo 125M, OPT-125M | | 410M | 302,311,424 | 24 | 1024 | 16 | 2M | 3.0 × 10⁻⁴ | OPT-350M | | 1.0B | 805,736,448 | 16 | 2048 | 8 | 2M | 3.0 × 10⁻⁴ | — | | 1.4B | 1,208,602,624 | 24 | 2048 | 16 | 2M | 2.0 × 10⁻⁴ | GPT-Neo 1.3B, OPT-1.3B | | 2.8B | 2,517,652,480 | 32 | 2560 | 32 | 2M | 1.6 × 10⁻⁴ | GPT-Neo 2.7B, OPT-2.7B | | 6.9B | 6,444,163,072 | 32 | 4096 | 32 | 2M | 1.2 × 10⁻⁴ | OPT-6.7B | | 12B | 11,327,027,200 | 36 | 5120 | 40 | 2M | 1.2 × 10⁻⁴ | — |
Engineering details for the Pythia Suite. Deduped and non-deduped models of a given size have the same hyperparameters. “Equivalent” models have exactly the same architecture, and the same number of non-embedding parameters.
## Uses and Limitations ### Intended Use The primary intended use of Pythia is research on the behavior, functionality, and limitations of large language models. This suite is intended to provide a controlled setting for performing scientific experiments. We also provide 154 checkpoints per model: the initial `step0`, 10 log-spaced checkpoints (`step1` through `step512`), and 143 evenly-spaced checkpoints from `step1000` to `step143000`. These checkpoints are hosted on Hugging Face as branches. Note that branch `step143000` corresponds exactly to the model checkpoint on the `main` branch of each model. You may also further fine-tune and adapt Pythia-70M-deduped for deployment, as long as your use is in accordance with the Apache 2.0 license. Pythia models work with the Hugging Face Transformers Library. If you decide to use pre-trained Pythia-70M-deduped as a basis for your fine-tuned model, please conduct your own risk and bias assessment. ### Out-of-scope use The Pythia Suite is **not** intended for deployment. It is not in itself a product and cannot be used for human-facing interactions. For example, the model may generate harmful or offensive text. Please evaluate the risks associated with your particular use case. Pythia models are English-language only, and are not suitable for translation or generating text in other languages. Pythia-70M-deduped has not been fine-tuned for downstream contexts in which language models are commonly deployed, such as writing genre prose or commercial chatbots. This means Pythia-70M-deduped will **not** respond to a given prompt the way a product like ChatGPT does. This is because, unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “follow” human instructions. ### Limitations and biases The core functionality of a large language model is to take a string of text and predict the next token. The most statistically likely next token need not produce the most “accurate” text, so never rely on Pythia-70M-deduped to produce factually accurate output. This model was trained on the Pile, a dataset known to contain profanity and texts that are lewd or otherwise offensive. See Section 6 of the Pile paper for a discussion of documented biases with regard to gender, religion, and race. Pythia-70M-deduped may produce socially unacceptable or undesirable text, *even if* the prompt itself does not include anything explicitly offensive. If you plan on using text generated through, for example, the Hosted Inference API, we recommend having a human curate the outputs of this language model before presenting it to other people. Please inform your audience that the text was generated by Pythia-70M-deduped. ### Quickstart Pythia models can be loaded and used via the following code, demonstrated here for the third checkpoint, `step3000`. Revision/branch `step143000` corresponds exactly to the model checkpoint on the `main` branch of each model.
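A minimal sketch of that quickstart, following the standard Pythia loading pattern (the `step3000` revision and the local cache path are illustrative choices, not requirements):

```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Point `revision` at a checkpoint branch; `main` holds the final weights
# (identical to the `step143000` branch).
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs)
print(tokenizer.decode(tokens[0]))
```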
For more information on how to use all Pythia models, see the documentation on GitHub. ## Training ### Training data Pythia-70M-deduped was trained on the Pile **after the dataset had been globally deduplicated**.
The Pile is an 825GiB general-purpose dataset in English. It was created by EleutherAI specifically for training large language models. It contains texts from 22 diverse sources, roughly broken down into five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub, Enron Emails). See the Pile paper for a breakdown of all data sources, methodology, and a discussion of ethical implications. Consult the datasheet for more detailed documentation about the Pile and its component datasets. The Pile can be downloaded from the official website, or from a community mirror. ### Training procedure All models were trained on the exact same data, in the exact same order. Each model saw 299,892,736,000 tokens during training, and 143 checkpoints for each model are saved every 2,097,152,000 tokens, spaced evenly throughout training from `step1000` to `step143000` (which is the same as `main`). In addition, we also provide frequent early checkpoints: `step0` and `step{1,2,4,8,16,32,64,128,256,512}`. This corresponds to training for just under 1 epoch on the Pile for non-deduplicated models, and about 1.5 epochs on the deduplicated Pile. All *Pythia* models trained for 143,000 steps at a batch size of 2M (2,097,152 tokens).
See GitHub for more details on the training procedure, including how to reproduce it.
Pythia uses the same tokenizer as GPT-NeoX-20B. ## Evaluations All 16 *Pythia* models were evaluated using the LM Evaluation Harness. You can access the results, by model and step, in the GitHub repository.
Expand the sections below to see plots of evaluation results for all Pythia and Pythia-deduped models compared with OPT and BLOOM.
LAMBADA – OpenAI
Physical Interaction: Question Answering (PIQA)
WinoGrande
AI2 Reasoning Challenge—Easy Set
SciQ
## Changelog This section compares differences between previously released Pythia v0 and the current models. See Appendix B of the Pythia paper for further discussion of these changes and the motivation behind them. We found that retraining Pythia had no impact on benchmark performance. - All model sizes are now trained with a uniform batch size of 2M tokens. Previously, the models of size 160M, 410M, and 1.4B parameters were trained with batch sizes of 4M tokens. - We added checkpoints at initialization (step 0) and steps {1,2,4,8,16,32,64,128,256,512} in addition to every 1000 training steps. - Flash Attention was used in the new retrained suite. - We remedied a minor inconsistency that existed in the original suite: all models of size 2.8B parameters or smaller had a learning rate (LR) schedule which decayed to a minimum LR of 10% of the starting LR, but the 6.9B and 12B models all used an LR schedule which decayed to a minimum LR of 0. In the redone training runs, we rectified this inconsistency: all models were now trained with the LR decaying to a minimum of 0.1× their maximum LR. ### Naming convention and parameter count *Pythia* models were renamed in January 2023. It is possible that the old naming convention still persists in some documentation by accident. The current naming convention (70M, 160M, etc.) is based on total parameter count.
| current Pythia suffix | old suffix | total params | non-embedding params | | --------------------: | ---------: | -------------: | -------------------: | | 70M | 19M | 70,426,624 | 18,915,328 | | 160M | 125M | 162,322,944 | 85,056,000 | | 410M | 350M | 405,334,016 | 302,311,424 | | 1B | 800M | 1,011,781,632 | 805,736,448 | | 1.4B | 1.3B | 1,414,647,808 | 1,208,602,624 | | 2.8B | 2.7B | 2,775,208,960 | 2,517,652,480 | | 6.9B | 6.7B | 6,857,302,016 | 6,444,163,072 | | 12B | 13B | 11,846,072,320 | 11,327,027,200 |
", + "model_explanation_gemini": "A 70-million-parameter English language model from the Pythia suite designed for interpretability research, trained on deduplicated data from the Pile dataset." +} \ No newline at end of file diff --git a/data/model_data_json/EleutherAI_pythia-70m.json b/data/model_data_json/EleutherAI_pythia-70m.json new file mode 100644 index 0000000000000000000000000000000000000000..32c2d8cc28400f2cf290892cea6eeca7ee638e32 --- /dev/null +++ b/data/model_data_json/EleutherAI_pythia-70m.json @@ -0,0 +1,21 @@ +{ + "model_id": "EleutherAI/pythia-70m", + "downloads": 107950, + "tags": [ + "gpt-neox", + "pytorch", + "safetensors", + "gpt_neox", + "causal-lm", + "pythia", + "en", + "dataset:EleutherAI/pile", + "arxiv:2304.01373", + "arxiv:2101.00027", + "arxiv:2201.07311", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: - en tags: - pytorch - causal-lm - pythia license: apache-2.0 datasets: - EleutherAI/pile library_name: gpt-neox --- The *Pythia Scaling Suite* is a collection of models developed to facilitate interpretability research (see paper). It contains two sets of eight models of sizes 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. For each size, there are two models: one trained on the Pile, and one trained on the Pile after the dataset has been globally deduplicated. All 8 model sizes are trained on the exact same data, in the exact same order. We also provide 154 intermediate checkpoints per model, hosted on Hugging Face as branches. The Pythia model suite was deliberately designed to promote scientific research on large language models, especially interpretability research. Despite not centering downstream performance as a design goal, we find the models match or exceed the performance of similar and same-sized models, such as those in the OPT and GPT-Neo suites.
Details on previous early release and naming convention. Previously, we released an early version of the Pythia suite to the public. However, we decided to retrain the model suite to address a few hyperparameter discrepancies. This model card lists the changes; see appendix B in the Pythia paper for further discussion. We found no difference in benchmark performance between the two Pythia versions. The old models are still available, but we suggest the retrained suite if you are just starting to use Pythia.
**This is the current release.** Please note that all models in the *Pythia* suite were renamed in January 2023. For clarity, a table comparing the old and new names is provided in this model card, together with exact parameter counts.

# Pythia-70M ## Model Details - Developed by: EleutherAI - Model type: Transformer-based Language Model - Language: English - Learn more: Pythia's GitHub repository for training procedure, config files, and details on how to use. See the paper for more evals and implementation details. - Library: GPT-NeoX - License: Apache 2.0 - Contact: to ask questions about this model, join the EleutherAI Discord and post them in `#release-discussion`. Please read the existing *Pythia* documentation before asking about it in the EleutherAI Discord. For general correspondence: contact@eleuther.ai.
| Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate | Equivalent Models | | -----------: | -------------------: | :----: | :-------: | :---: | :--------: | :-------------------: | :--------------------: | | 70M | 18,915,328 | 6 | 512 | 8 | 2M | 1.0 × 10⁻³ | — | | 160M | 85,056,000 | 12 | 768 | 12 | 2M | 6.0 × 10⁻⁴ | GPT-Neo 125M, OPT-125M | | 410M | 302,311,424 | 24 | 1024 | 16 | 2M | 3.0 × 10⁻⁴ | OPT-350M | | 1.0B | 805,736,448 | 16 | 2048 | 8 | 2M | 3.0 × 10⁻⁴ | — | | 1.4B | 1,208,602,624 | 24 | 2048 | 16 | 2M | 2.0 × 10⁻⁴ | GPT-Neo 1.3B, OPT-1.3B | | 2.8B | 2,517,652,480 | 32 | 2560 | 32 | 2M | 1.6 × 10⁻⁴ | GPT-Neo 2.7B, OPT-2.7B | | 6.9B | 6,444,163,072 | 32 | 4096 | 32 | 2M | 1.2 × 10⁻⁴ | OPT-6.7B | | 12B | 11,327,027,200 | 36 | 5120 | 40 | 2M | 1.2 × 10⁻⁴ | — |
Engineering details for the Pythia Suite. Deduped and non-deduped models of a given size have the same hyperparameters. “Equivalent” models have exactly the same architecture, and the same number of non-embedding parameters.
## Uses and Limitations ### Intended Use The primary intended use of Pythia is research on the behavior, functionality, and limitations of large language models. This suite is intended to provide a controlled setting for performing scientific experiments. We also provide 154 checkpoints per model: the initial `step0`, 10 log-spaced checkpoints (`step1` through `step512`), and 143 evenly-spaced checkpoints from `step1000` to `step143000`. These checkpoints are hosted on Hugging Face as branches. Note that branch `step143000` corresponds exactly to the model checkpoint on the `main` branch of each model. You may also further fine-tune and adapt Pythia-70M for deployment, as long as your use is in accordance with the Apache 2.0 license. Pythia models work with the Hugging Face Transformers Library. If you decide to use pre-trained Pythia-70M as a basis for your fine-tuned model, please conduct your own risk and bias assessment. ### Out-of-scope use The Pythia Suite is **not** intended for deployment. It is not in itself a product and cannot be used for human-facing interactions. For example, the model may generate harmful or offensive text. Please evaluate the risks associated with your particular use case. Pythia models are English-language only, and are not suitable for translation or generating text in other languages. Pythia-70M has not been fine-tuned for downstream contexts in which language models are commonly deployed, such as writing genre prose or commercial chatbots. This means Pythia-70M will **not** respond to a given prompt the way a product like ChatGPT does. This is because, unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “follow” human instructions. ### Limitations and biases The core functionality of a large language model is to take a string of text and predict the next token. The most statistically likely next token need not produce the most “accurate” text, so never rely on Pythia-70M to produce factually accurate output. This model was trained on the Pile, a dataset known to contain profanity and texts that are lewd or otherwise offensive. See Section 6 of the Pile paper for a discussion of documented biases with regard to gender, religion, and race. Pythia-70M may produce socially unacceptable or undesirable text, *even if* the prompt itself does not include anything explicitly offensive. If you plan on using text generated through, for example, the Hosted Inference API, we recommend having a human curate the outputs of this language model before presenting it to other people. Please inform your audience that the text was generated by Pythia-70M. ### Quickstart Pythia models can be loaded and used via the following code, demonstrated here for the third checkpoint, `step3000`. Revision/branch `step143000` corresponds exactly to the model checkpoint on the `main` branch of each model.
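A minimal sketch of that quickstart, following the standard Pythia loading pattern (the `step3000` revision and the local cache path are illustrative choices, not requirements):

```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Point `revision` at a checkpoint branch; `main` holds the final weights
# (identical to the `step143000` branch).
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
    cache_dir="./pythia-70m/step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
    cache_dir="./pythia-70m/step3000",
)

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs)
print(tokenizer.decode(tokens[0]))
```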
For more information on how to use all Pythia models, see the documentation on GitHub. ## Training ### Training data The Pile is an 825GiB general-purpose dataset in English. It was created by EleutherAI specifically for training large language models. It contains texts from 22 diverse sources, roughly broken down into five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub, Enron Emails). See the Pile paper for a breakdown of all data sources, methodology, and a discussion of ethical implications. Consult the datasheet for more detailed documentation about the Pile and its component datasets. The Pile can be downloaded from the official website, or from a community mirror.
The Pile was **not** deduplicated before being used to train Pythia-70M. ### Training procedure All models were trained on the exact same data, in the exact same order. Each model saw 299,892,736,000 tokens during training, and 143 checkpoints for each model are saved every 2,097,152,000 tokens, spaced evenly throughout training from `step1000` to `step143000` (which is the same as `main`). In addition, we also provide frequent early checkpoints: `step0` and `step{1,2,4,8,16,32,64,128,256,512}`. This corresponds to training for just under 1 epoch on the Pile for non-deduplicated models, and about 1.5 epochs on the deduplicated Pile. All *Pythia* models trained for 143,000 steps at a batch size of 2M (2,097,152 tokens).
See GitHub for more details on the training procedure, including how to reproduce it.
Pythia uses the same tokenizer as GPT-NeoX-20B. ## Evaluations All 16 *Pythia* models were evaluated using the LM Evaluation Harness. You can access the results, by model and step, in the GitHub repository.
Expand the sections below to see plots of evaluation results for all Pythia and Pythia-deduped models compared with OPT and BLOOM.
LAMBADA – OpenAI
Physical Interaction: Question Answering (PIQA)
WinoGrande
AI2 Reasoning Challenge—Easy Set
SciQ
## Changelog This section compares differences between previously released Pythia v0 and the current models. See Appendix B of the Pythia paper for further discussion of these changes and the motivation behind them. We found that retraining Pythia had no impact on benchmark performance. - All model sizes are now trained with a uniform batch size of 2M tokens. Previously, the models of size 160M, 410M, and 1.4B parameters were trained with batch sizes of 4M tokens. - We added checkpoints at initialization (step 0) and steps {1,2,4,8,16,32,64,128,256,512} in addition to every 1000 training steps. - Flash Attention was used in the new retrained suite. - We remedied a minor inconsistency that existed in the original suite: all models of size 2.8B parameters or smaller had a learning rate (LR) schedule which decayed to a minimum LR of 10% of the starting LR, but the 6.9B and 12B models all used an LR schedule which decayed to a minimum LR of 0. In the redone training runs, we rectified this inconsistency: all models were now trained with the LR decaying to a minimum of 0.1× their maximum LR. ### Naming convention and parameter count *Pythia* models were renamed in January 2023. It is possible that the old naming convention still persists in some documentation by accident. The current naming convention (70M, 160M, etc.) is based on total parameter count.
| current Pythia suffix | old suffix | total params | non-embedding params | | --------------------: | ---------: | -------------: | -------------------: | | 70M | 19M | 70,426,624 | 18,915,328 | | 160M | 125M | 162,322,944 | 85,056,000 | | 410M | 350M | 405,334,016 | 302,311,424 | | 1B | 800M | 1,011,781,632 | 805,736,448 | | 1.4B | 1.3B | 1,414,647,808 | 1,208,602,624 | | 2.8B | 2.7B | 2,775,208,960 | 2,517,652,480 | | 6.9B | 6.7B | 6,857,302,016 | 6,444,163,072 | | 12B | 13B | 11,846,072,320 | 11,327,027,200 |
# Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 25.28 | | ARC (25-shot) | 21.59 | | HellaSwag (10-shot) | 27.29 | | MMLU (5-shot) | 25.9 | | TruthfulQA (0-shot) | 47.06 | | Winogrande (5-shot) | 51.46 | | GSM8K (5-shot) | 0.3 | | DROP (3-shot) | 3.33 |", + "model_explanation_gemini": "A 70-million-parameter English language model designed for interpretability research, trained on the Pile dataset as part of the Pythia Scaling Suite to facilitate controlled scientific experiments on large language models." +} \ No newline at end of file diff --git a/data/model_data_json/Elron_bleurt-tiny-512.json b/data/model_data_json/Elron_bleurt-tiny-512.json new file mode 100644 index 0000000000000000000000000000000000000000..444a202010a99518bea9873acf139906fce923f5 --- /dev/null +++ b/data/model_data_json/Elron_bleurt-tiny-512.json @@ -0,0 +1,16 @@ +{ + "model_id": "Elron/bleurt-tiny-512", + "downloads": 312747, + "tags": [ + "transformers", + "pytorch", + "bert", + "text-classification", + "arxiv:1910.09700", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - text-classification - bert --- # Model Card for bleurt-tiny-512 # Model Details ## Model Description Pytorch version of the original BLEURT models from ACL paper - **Developed by:** Elron Bandel, Thibault Sellam, Dipanjan Das and Ankur P. Parikh of Google Research - **Shared by [Optional]:** Elron Bandel - **Model type:** Text Classification - **Language(s) (NLP):** More information needed - **License:** More information needed - **Parent Model:** BERT - **Resources for more information:** - GitHub Repo - Associated Paper - Blog Post # Uses ## Direct Use This model can be used for the task of Text Classification ## Downstream Use [Optional] More information needed. ## Out-of-Scope Use The model should not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. # Training Details ## Training Data The model authors note in the associated paper: > We use years 2017 to 2019 of the WMT Metrics Shared Task, to-English language pairs. For each year, we used the of- ficial WMT test set, which include several thou- sand pairs of sentences with human ratings from the news domain. The training sets contain 5,360, 9,492, and 147,691 records for each year. ## Training Procedure ### Preprocessing More information needed ### Speeds, Sizes, Times More information needed # Evaluation ## Testing Data, Factors & Metrics ### Testing Data The test sets for years 2018 and 2019 [of the WMT Metrics Shared Task, to-English language pairs.] are noisier, ### Factors More information needed ### Metrics More information needed ## Results More information needed # Model Examination More information needed # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). 
- **Hardware Type:** More information needed - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Technical Specifications [optional] ## Model Architecture and Objective More information needed ## Compute Infrastructure More information needed ### Hardware More information needed ### Software More information needed. # Citation **BibTeX:** # Glossary [optional] More information needed # More Information [optional] More information needed # Model Card Authors [optional] Elron Bandel in collaboration with Ezi Ozoani and the Hugging Face team # Model Card Contact More information needed # How to Get Started with the Model Use the code below to get started with the model.
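A minimal sketch of such a quickstart, assuming the standard sequence-classification loading path for this checkpoint (BLEURT scores each reference/candidate pair; higher means the candidate is closer to the reference):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Elron/bleurt-tiny-512")
model = AutoModelForSequenceClassification.from_pretrained("Elron/bleurt-tiny-512")
model.eval()

references = ["hello world", "hello world"]
candidates = ["hi universe", "bye world"]

with torch.no_grad():
    # Each (reference, candidate) pair is encoded together and scored.
    scores = model(
        **tokenizer(references, candidates, padding=True, return_tensors="pt")
    )[0].squeeze()

print(scores)
```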
See this notebook for model conversion code.
", + "model_explanation_gemini": "A PyTorch-based BERT model for text classification, adapted from the original BLEURT models for evaluating machine translation quality." +} \ No newline at end of file diff --git a/data/model_data_json/EmergentMethods_gliner_medium_news-v2.1.json b/data/model_data_json/EmergentMethods_gliner_medium_news-v2.1.json new file mode 100644 index 0000000000000000000000000000000000000000..f749b7b92f8a25488c585db16f6f5c2eeb834ebe --- /dev/null +++ b/data/model_data_json/EmergentMethods_gliner_medium_news-v2.1.json @@ -0,0 +1,16 @@ +{ + "model_id": "EmergentMethods/gliner_medium_news-v2.1", + "downloads": 497576, + "tags": [ + "gliner", + "pytorch", + "token-classification", + "en", + "dataset:EmergentMethods/AskNews-NER-v0", + "arxiv:2406.10258", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - EmergentMethods/AskNews-NER-v0 tags: - gliner language: - en pipeline_tag: token-classification --- # Model Card for gliner_medium_news-v2.1 This model is a fine-tune of GLiNER aimed at improving accuracy across a broad range of topics, especially with respect to long-context news entity extraction. As shown in the table below, these fine-tunes improved upon the base GLiNER model zero-shot accuracy by up to 7.5% across 18 benchmark datasets. !results table The underlying dataset, AskNews-NER-v0 was engineered with the objective of diversifying global perspectives by enforcing country/language/topic/temporal diversity. All data used to fine-tune this model was synthetically generated. WizardLM 13B v1.2 was used for translation/summarization of open-web news articles, while Llama3 70b instruct was used for entity extraction. Both the diversification and fine-tuning methods are presented in a our paper on ArXiv. # Usage Output: ## Model Details ### Model Description The synthetic data underlying this news fine-tune was pulled from the AskNews API. We enforced diveristy across country/language/topic/time. Countries: !country distribution Entity types: !entities Topics: !topics - **Developed by:** Emergent Methods - **Funded by:** Emergent Methods - **Shared by:** Emergent Methods - **Model type:** microsoft/deberta - **Language(s) (NLP):** English (en) (English texts and translations from Spanish (es), Portuguese (pt), German (de), Russian (ru), French (fr), Arabic (ar), Italian (it), Ukrainian (uk), Norwegian (no), Swedish (sv), Danish (da)). - **License:** Apache 2.0 - **Finetuned from model:** GLiNER ### Model Sources [optional] - **Repository:** To be added - **Paper:** To be added - **Demo:** To be added ## Uses ### Direct Use As the name suggests, this model is aimed at generalist entity extraction. Although we used news to fine-tune this model, it improved accuracy across 18 benchmark datasets by up to 7.5%. This means that the broad and diversified underlying dataset has helped it to recognize and extract more entity types. This model is shockingly compact, and can be used for high-throughput production usecases. This is another reason we have licensed this as Apache 2.0. Currently, AskNews is using this fine-tune for entity extraction in their system. ## Bias, Risks, and Limitations Although the goal of the dataset is to reduce bias, and improve diversity, it is still biased to western languages and countries. This limitation originates from the abilities of Llama2 for the translation and summary generations. 
Further, any bias originating in Llama2's training data will also be present in this dataset, since Llama2 was used to summarize the open-web articles. Likewise, any biases present in Llama3 will be present in this dataset, since Llama3 was used to extract entities from the summaries. [country distribution chart] ## How to Get Started with the Model Use the code below to get started with the model. ## Training Details The training dataset is AskNews-NER-v0. Other training details can be found in the companion paper. ## Environmental Impact - **Hardware Type:** 1xA4500 - **Hours used:** 10 - **Carbon Emitted:** 0.6 kg (according to the Machine Learning Impact calculator) ## Citation **BibTeX:** To be added **APA:** To be added ## Model Authors Elin Törnquist, Emergent Methods elin at emergentmethods.ai Robert Caulk, Emergent Methods rob at emergentmethods.ai ## Model Contact Elin Törnquist, Emergent Methods elin at emergentmethods.ai Robert Caulk, Emergent Methods rob at emergentmethods.ai", + "model_explanation_gemini": "Extracts entities from long-context news articles with improved accuracy across diverse topics by fine-tuning the GLiNER model on synthetically generated, globally diversified data." +} \ No newline at end of file diff --git a/data/model_data_json/Epiculous_Violet_Twilight-v0.2-GGUF.json b/data/model_data_json/Epiculous_Violet_Twilight-v0.2-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..896521679cde250067d0e9d4d0615dfcc94046de --- /dev/null +++ b/data/model_data_json/Epiculous_Violet_Twilight-v0.2-GGUF.json @@ -0,0 +1,32 @@ +{ + "model_id": "Epiculous/Violet_Twilight-v0.2-GGUF", + "downloads": 88807, + "tags": [ + "gguf", + "merge", + "text-generation", + "en", + "fr", + "de", + "es", + "it", + "pt", + "ru", + "zh", + "ja", + "dataset:Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned", + "dataset:anthracite-org/stheno-filtered-v1.1", + "dataset:PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT", + "dataset:Gryphe/Sonnet3.5-Charcard-Roleplay", + "dataset:Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned", + "dataset:anthracite-org/kalo-opus-instruct-22k-no-refusal", + "dataset:anthracite-org/nopm_claude_writing_fixed", + "dataset:anthracite-org/kalo_opus_misc_240827", + "license:apache-2.0", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- license: apache-2.0 datasets: - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned - anthracite-org/stheno-filtered-v1.1 - PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT - Gryphe/Sonnet3.5-Charcard-Roleplay - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned - anthracite-org/kalo-opus-instruct-22k-no-refusal - anthracite-org/nopm_claude_writing_fixed - anthracite-org/kalo_opus_misc_240827 language: - en - fr - de - es - it - pt - ru - zh - ja pipeline_tag: text-generation tags: - merge --- Now for something a bit different, Violet_Twilight-v0.2! This model is a SLERP merge of Azure_Dusk-v0.2 and Crimson_Dawn-v0.2! # Quants! full / exl2 / gguf ## Prompting The v0.2 models are trained on ChatML; the prompting structure goes a little something like this: ### Context and Instruct The v0.2 models are trained on ChatML; please use that Context and Instruct template. ### Current Top Sampler Settings Smooth Creativity: Credit to Juelsman for researching this one!
Variant Chimera: Credit to Numbra!
Spicy_Temp
Violet_Twilight-Nitral-Special
## Merging The following config was used to merge Azure Dusk and Crimson Dawn", + "model_explanation_gemini": "A SLERP-merged text-generation model trained on ChatML for multilingual conversational tasks using optimized sampler settings." +} \ No newline at end of file diff --git a/data/model_data_json/EvanD_xlm-roberta-base-romanian-ner-ronec.json b/data/model_data_json/EvanD_xlm-roberta-base-romanian-ner-ronec.json new file mode 100644 index 0000000000000000000000000000000000000000..2dbd8d361ce28e6ae8230c3988705d8497d5c82e --- /dev/null +++ b/data/model_data_json/EvanD_xlm-roberta-base-romanian-ner-ronec.json @@ -0,0 +1,18 @@ +{ + "model_id": "EvanD/xlm-roberta-base-romanian-ner-ronec", + "downloads": 230839, + "tags": [ + "transformers", + "pytorch", + "xlm-roberta", + "token-classification", + "named-entity-recognition", + "sequence-tagger-model", + "ro", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: token-classification tags: - named-entity-recognition - sequence-tagger-model widget: - text: Numele meu este Amadeus Wolfgang și locuiesc în Berlin inference: parameters: aggregation_strategy: simple grouped_entities: true language: - ro --- xlm-roberta model trained on the RONEC dataset, achieving a 95 macro-F1 score on the test set. | Test metric | Results | |------------------------|--------------------------| | test_f1_mac_ronec | 0.9547659158706665 | | test_loss_ronec | 0.16371206939220428 | | test_prec_mac_ronec | 0.8663718700408936 | | test_rec_mac_ronec | 0.8695588111877441 |", + "model_explanation_gemini": "Performs Romanian named-entity recognition using xlm-roberta trained on the ronec dataset with 95 F1-Macro score." +} \ No newline at end of file diff --git a/data/model_data_json/Exscientia_IgT5_unpaired.json b/data/model_data_json/Exscientia_IgT5_unpaired.json new file mode 100644 index 0000000000000000000000000000000000000000..90a05e3a0d54cfee26cf2344017e58cd4f502517 --- /dev/null +++ b/data/model_data_json/Exscientia_IgT5_unpaired.json @@ -0,0 +1,23 @@ +{ + "model_id": "Exscientia/IgT5_unpaired", + "downloads": 36981, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "antibody language model", + "antibody", + "protein language model", + "arxiv:2403.17889", + "base_model:Rostlab/prot_t5_xl_uniref50", + "base_model:finetune:Rostlab/prot_t5_xl_uniref50", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - antibody language model - antibody - protein language model base_model: Rostlab/prot_t5_xl_uniref50 license: mit --- # IgT5 unpaired model Model pretrained on protein and antibody sequences using a masked language modeling (MLM) objective. It was introduced in the paper Large scale paired antibody language models. The model is finetuned from ProtT5 using unpaired antibody sequences from the Observed Antibody Space. # Use The encoder part of the model and its tokeniser can be loaded using the Transformers library. The tokeniser is used to prepare batch inputs; note that it appends an end-of-sequence token (`</s>`) to each sequence and pads shorter sequences in a batch with the `<pad>` token, so sequences of different lengths can be batched together, as in the sketch below.
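A sketch of those steps, assuming the ProtT5-style interface this model inherits (residues are space-separated; the example antibody fragments are placeholders):

```python
import torch
from transformers import T5EncoderModel, T5Tokenizer

tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5_unpaired", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Exscientia/IgT5_unpaired")

# Hypothetical antibody fragments; single-letter residues must be space-separated.
sequences = [
    "E V Q L V E S G G G L V Q P G G",
    "D I Q M T Q S P S S L S A S V G D R V",
]
tokens = tokeniser.batch_encode_plus(
    sequences, add_special_tokens=True, padding="longest", return_tensors="pt"
)

with torch.no_grad():
    output = model(
        input_ids=tokens["input_ids"], attention_mask=tokens["attention_mask"]
    )
residue_embeddings = output.last_hidden_state

# Average over attended positions; this simple version also averages in the
# trailing </s> token, which you may want to mask out for a strict residue mean.
mask = tokens["attention_mask"].unsqueeze(-1)
sequence_embeddings = (residue_embeddings * mask).sum(1) / mask.sum(1)
```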
Sequence embeddings are generated by feeding the tokens through the model. To obtain a single representation per sequence, the residue token embeddings can be averaged, as in the sketch above.", + "model_explanation_gemini": "Pretrained on protein and antibody sequences using masked language modeling, finetuned from ProtT5 on unpaired antibody data for generating sequence embeddings." +} \ No newline at end of file diff --git a/data/model_data_json/FacebookAI_roberta-base.json b/data/model_data_json/FacebookAI_roberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..fb8245f7e4bf466ec29761b92ceeff3d27e73b45 --- /dev/null +++ b/data/model_data_json/FacebookAI_roberta-base.json @@ -0,0 +1,26 @@ +{ + "model_id": "FacebookAI/roberta-base", + "downloads": 8819423, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "roberta", + "fill-mask", + "exbert", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1907.11692", + "arxiv:1806.02847", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: mit datasets: - bookcorpus - wikipedia --- # RoBERTa base model Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. Disclaimer: The team releasing RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with the Masked language modeling (MLM) objective. Taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the RoBERTa model as inputs. ## Intended uses & limitations You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at a model like GPT2.
### How to use You can use this model directly with a pipeline for masked language modeling, or load it to extract the features of a given text in PyTorch or TensorFlow; a sketch of both paths is given at the end of this card. ### Limitations and bias The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral. Therefore, the model can have biased predictions; this bias will also affect all fine-tuned versions of this model. ## Training data The RoBERTa model was pretrained on the reunion of five datasets: - BookCorpus, a dataset consisting of 11,038 unpublished books; - English Wikipedia (excluding lists, tables and headers); - CC-News, a dataset containing 63 million English news articles crawled between September 2016 and February 2019; - OpenWebText, an open-source recreation of the WebText dataset used to train GPT-2; - Stories, a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas. Together these datasets weigh 160GB of text. ## Training procedure ### Preprocessing The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked with `<s>` and the end of one by `</s>`. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by `<mask>`. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the 10% remaining cases, the masked tokens are left as is. Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed). ### Pretraining The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The optimizer used is Adam with a learning rate of 6e-4, \\\\(\\beta_{1} = 0.9\\\\), \\\\(\\beta_{2} = 0.98\\\\) and \\\\(\\epsilon = 1e-6\\\\), a weight decay of 0.01, learning rate warmup for 24,000 steps and linear decay of the learning rate after. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: Glue test results: | Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | |:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:| | | 87.6 | 91.9 | 92.8 | 94.8 | 63.6 | 91.2 | 90.2 | 78.7 | ### BibTeX entry and citation info ", "model_explanation_gemini": "Pretrained on English text using masked language modeling to learn bidirectional representations for fine-tuning on downstream NLP tasks like classification or question answering."
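A minimal sketch of both usage paths from the "How to use" section above (PyTorch shown; the TensorFlow variant is analogous with `TFRobertaModel`):

```python
from transformers import pipeline, RobertaTokenizer, RobertaModel

# Masked language modeling with the fill-mask pipeline
unmasker = pipeline("fill-mask", model="FacebookAI/roberta-base")
print(unmasker("Hello I'm a <mask> model."))

# Extracting the features of a given text in PyTorch
tokenizer = RobertaTokenizer.from_pretrained("FacebookAI/roberta-base")
model = RobertaModel.from_pretrained("FacebookAI/roberta-base")
encoded_input = tokenizer("Replace me by any text you'd like.", return_tensors="pt")
output = model(**encoded_input)
features = output.last_hidden_state  # one vector per input token
```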
+} \ No newline at end of file diff --git a/data/model_data_json/FacebookAI_roberta-large-mnli.json b/data/model_data_json/FacebookAI_roberta-large-mnli.json new file mode 100644 index 0000000000000000000000000000000000000000..2578632c025c8b5090f28e66772da63579b2a2a7 --- /dev/null +++ b/data/model_data_json/FacebookAI_roberta-large-mnli.json @@ -0,0 +1,31 @@ +{ + "model_id": "FacebookAI/roberta-large-mnli", + "downloads": 464020, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "roberta", + "text-classification", + "autogenerated-modelcard", + "en", + "dataset:multi_nli", + "dataset:wikipedia", + "dataset:bookcorpus", + "arxiv:1907.11692", + "arxiv:1806.02847", + "arxiv:1804.07461", + "arxiv:1704.05426", + "arxiv:1508.05326", + "arxiv:1809.05053", + "arxiv:1910.09700", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: mit tags: - autogenerated-modelcard datasets: - multi_nli - wikipedia - bookcorpus --- # roberta-large-mnli ## Table of Contents - Model Details - How To Get Started With the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact - Technical Specifications - Citation Information - Model Card Authors ## Model Details **Model Description:** roberta-large-mnli is the RoBERTa large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. The model is pretrained on English-language text using a masked language modeling (MLM) objective. - **Developed by:** See GitHub Repo for model developers - **Model Type:** Transformer-based language model - **Language(s):** English - **License:** MIT - **Parent Model:** This model is a fine-tuned version of the RoBERTa large model. Users should see the RoBERTa large model card for relevant information. - **Resources for more information:** - Research Paper - GitHub Repo ## How to Get Started with the Model Use the code below to get started with the model. The model can be loaded with the zero-shot-classification pipeline (see the sketch below). You can then use this pipeline to classify sequences into any of the class names you specify. ## Uses #### Direct Use This fine-tuned model can be used for zero-shot classification tasks, including zero-shot sentence-pair classification (see the GitHub repo for examples) and zero-shot sequence classification. #### Misuse and Out-of-scope Use The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. ## Risks, Limitations and Biases **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). The RoBERTa large model card notes that: \"The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral.\" Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
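A sketch of the zero-shot pipeline usage referenced in "How to Get Started" above (the sequence and candidate labels are arbitrary examples):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="FacebookAI/roberta-large-mnli")

sequence_to_classify = "one day I will see the world"
candidate_labels = ["travel", "cooking", "dancing"]
# Each label is scored by treating the sequence as an NLI premise and
# "This example is {label}." as the hypothesis.
print(classifier(sequence_to_classify, candidate_labels))
```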
## Training #### Training Data This model was fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. Also see the MNLI data card for more information. As described in the RoBERTa large model card: > The RoBERTa model was pretrained on the reunion of five datasets: > > - BookCorpus, a dataset consisting of 11,038 unpublished books; > - English Wikipedia (excluding lists, tables and headers); > - CC-News, a dataset containing 63 million English news articles crawled between September 2016 and February 2019; > - OpenWebText, an open-source recreation of the WebText dataset used to train GPT-2; > - Stories, a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas. > > Together these datasets weigh 160GB of text. Also see the bookcorpus data card and the wikipedia data card for additional information. #### Training Procedure ##### Preprocessing As described in the RoBERTa large model card: > The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of > the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked > with `<s>` and the end of one by `</s>`. > > The details of the masking procedure for each sentence are the following: > - 15% of the tokens are masked. > - In 80% of the cases, the masked tokens are replaced by `<mask>`. > - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). > - In the 10% remaining cases, the masked tokens are left as is. > > Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed). ##### Pretraining Also as described in the RoBERTa large model card: > The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The > optimizer used is Adam with a learning rate of 4e-4, \\\\(\\beta_{1} = 0.9\\\\), \\\\(\\beta_{2} = 0.98\\\\) and > \\\\(\\epsilon = 1e-6\\\\), a weight decay of 0.01, learning rate warmup for 30,000 steps and linear decay of the learning > rate after. ## Evaluation The following evaluation information is extracted from the associated GitHub repo for RoBERTa. #### Testing Data, Factors and Metrics The model developers report that the model was evaluated on the following tasks and datasets using the listed metrics: - **Dataset:** Part of GLUE (Wang et al., 2019), the General Language Understanding Evaluation benchmark, a collection of 9 datasets for evaluating natural language understanding systems. Specifically, the model was evaluated on the Multi-Genre Natural Language Inference (MNLI) corpus. See the GLUE data card or Wang et al. (2019) for further information. - **Tasks:** NLI. Wang et al. (2019) describe the inference task for MNLI as: > The Multi-Genre Natural Language Inference Corpus (Williams et al., 2018) is a crowd-sourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections.
We also use and recommend the SNLI corpus (Bowman et al., 2015) as 550k examples of auxiliary training data. - **Metrics:** Accuracy - **Dataset:** XNLI (Conneau et al., 2018), the extension of the Multi-Genre Natural Language Inference (MNLI) corpus to 15 languages: English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili and Urdu. See the XNLI data card or Conneau et al. (2018) for further information. - **Tasks:** Translate-test (e.g., the model is used to translate input sentences in other languages to the training language) - **Metrics:** Accuracy #### Results GLUE test results (dev set, single model, single-task fine-tuning): 90.2 on MNLI XNLI test results: | Task | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur | |:----:|:--:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | |91.3|82.91|84.27|81.24|81.74|83.13|78.28|76.79|76.64|74.17|74.05| 77.5| 70.9|66.65|66.81| ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We present the hardware type and hours used based on the associated paper. - **Hardware Type:** 1024 V100 GPUs - **Hours used:** 24 hours (one day) - **Cloud Provider:** Unknown - **Compute Region:** Unknown - **Carbon Emitted:** Unknown ## Technical Specifications See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training details. ## Citation Information", + "model_explanation_gemini": "roberta-large-mnli is a RoBERTa large model fine-tuned for zero-shot classification tasks, particularly natural language inference, using the MNLI corpus." +} \ No newline at end of file diff --git a/data/model_data_json/FacebookAI_roberta-large.json b/data/model_data_json/FacebookAI_roberta-large.json new file mode 100644 index 0000000000000000000000000000000000000000..68638ad69d30ea75c115625a76096cadde112720 --- /dev/null +++ b/data/model_data_json/FacebookAI_roberta-large.json @@ -0,0 +1,26 @@ +{ + "model_id": "FacebookAI/roberta-large", + "downloads": 20531343, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "roberta", + "fill-mask", + "exbert", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1907.11692", + "arxiv:1806.02847", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: mit datasets: - bookcorpus - wikipedia --- # RoBERTa large model Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. Disclaimer: The team releasing RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with the Masked language modeling (MLM) objective. 
Taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the RoBERTa model as inputs. ## Intended uses & limitations You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT2. ### How to use You can use this model directly with a pipeline for masked language modeling, or load it to extract the features of a given text in PyTorch or TensorFlow. ### Limitations and bias The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral. Therefore, the model can have biased predictions; this bias will also affect all fine-tuned versions of this model. ## Training data The RoBERTa model was pretrained on the reunion of five datasets: - BookCorpus, a dataset consisting of 11,038 unpublished books; - English Wikipedia (excluding lists, tables and headers); - CC-News, a dataset containing 63 million English news articles crawled between September 2016 and February 2019; - OpenWebText, an open-source recreation of the WebText dataset used to train GPT-2; - Stories, a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas. Together these datasets weigh 160GB of text. ## Training procedure ### Preprocessing The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked with `<s>` and the end of one by `</s>`. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by `<mask>`. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the 10% remaining cases, the masked tokens are left as is. Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed); a sketch of this recipe is given at the end of this section. ### Pretraining The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The optimizer used is Adam with a learning rate of 4e-4, \\\\(\\beta_{1} = 0.9\\\\), \\\\(\\beta_{2} = 0.98\\\\) and \\\\(\\epsilon = 1e-6\\\\), a weight decay of 0.01, learning rate warmup for 30,000 steps and linear decay of the learning rate after.
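The 80/10/10 dynamic-masking recipe described above can be sketched as follows. This is an illustrative re-implementation, not the original fairseq code; the function name and `special_ids` parameter are placeholders:

```python
import torch

def dynamic_mask(input_ids, mask_token_id, vocab_size, special_ids, p=0.15):
    """RoBERTa-style masking: select 15% of tokens, then replace 80% of the
    selected tokens with <mask>, 10% with a random token, and keep 10%."""
    labels = input_ids.clone()
    candidates = ~torch.isin(input_ids, torch.tensor(sorted(special_ids)))
    selected = (torch.rand(input_ids.shape) < p) & candidates
    labels[~selected] = -100  # unselected positions are ignored by the MLM loss

    masked = input_ids.clone()
    roll = torch.rand(input_ids.shape)
    masked[selected & (roll < 0.8)] = mask_token_id                # 80%: <mask>
    swap = selected & (roll >= 0.8) & (roll < 0.9)                 # 10%: random token
    masked[swap] = torch.randint(vocab_size, input_ids.shape)[swap]
    # remaining 10% of selected positions keep their original token
    return masked, labels
```

Because the mask is re-sampled on every call, the pattern changes from epoch to epoch, which is what makes it dynamic rather than a fixed preprocessing step.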
## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: Glue test results: | Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | |:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:| | | 90.2 | 92.2 | 94.7 | 96.4 | 68.0 | 96.4 | 90.9 | 86.6 | ### BibTeX entry and citation info ", + "model_explanation_gemini": "Pretrained on English text using masked language modeling to learn bidirectional representations for downstream NLP tasks like classification and question answering." +} \ No newline at end of file diff --git a/data/model_data_json/FacebookAI_xlm-roberta-base.json b/data/model_data_json/FacebookAI_xlm-roberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..a0ec806bc8aa5d41ef01cce5bc9ae1976cc4731f --- /dev/null +++ b/data/model_data_json/FacebookAI_xlm-roberta-base.json @@ -0,0 +1,116 @@ +{ + "model_id": "FacebookAI/xlm-roberta-base", + "downloads": 9528463, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "xlm-roberta", + "fill-mask", + "exbert", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:1911.02116", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - exbert language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh license: mit --- # XLM-RoBERTa (base-sized model) XLM-RoBERTa model pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. It was introduced in the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. and first released in this repository. Disclaimer: The team releasing XLM-RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. RoBERTa is a transformers model pretrained on a large corpus in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with the Masked language modeling (MLM) objective. 
Taking a sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. This way, the model learns an inner representation of 100 languages that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the XLM-RoBERTa model as inputs. ## Intended uses & limitations You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at models like GPT2. ## Usage You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given text in PyTorch: ### BibTeX entry and citation info ", + "model_explanation_gemini": "XLM-RoBERTa is a multilingual model pre-trained on 100 languages using masked language modeling, designed for fine-tuning on downstream NLP tasks like sequence classification or question answering." +} \ No newline at end of file diff --git a/data/model_data_json/FacebookAI_xlm-roberta-large-finetuned-conll03-english.json b/data/model_data_json/FacebookAI_xlm-roberta-large-finetuned-conll03-english.json new file mode 100644 index 0000000000000000000000000000000000000000..25104dc96b66a7d5cd5d4396da29fcd6d85c4977 --- /dev/null +++ b/data/model_data_json/FacebookAI_xlm-roberta-large-finetuned-conll03-english.json @@ -0,0 +1,114 @@ +{ + "model_id": "FacebookAI/xlm-roberta-large-finetuned-conll03-english", + "downloads": 79650, + "tags": [ + "transformers", + "pytorch", + "rust", + "onnx", + "safetensors", + "xlm-roberta", + "token-classification", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:1911.02116", + "arxiv:2008.03415", + "arxiv:1910.09700", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - 
ml - mn - mr - ms - my - ne - nl - no - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh --- # xlm-roberta-large-finetuned-conll03-english # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training 5. Evaluation 6. Environmental Impact 7. Technical Specifications 8. Citation 9. Model Card Authors 10. How To Get Started With the Model # Model Details ## Model Description The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data. This model is XLM-RoBERTa-large fine-tuned with the conll2003 dataset in English. - **Developed by:** See associated paper - **Model type:** Multi-lingual language model - **Language(s) (NLP) or Countries (images):** XLM-RoBERTa is a multilingual model trained on 100 different languages; see GitHub Repo for full list; model is fine-tuned on a dataset in English - **License:** More information needed - **Related Models:** RoBERTa, XLM - **Parent Model:** XLM-RoBERTa-large - **Resources for more information:** -GitHub Repo -Associated Paper # Uses ## Direct Use The model is a language model. The model can be used for token classification, a natural language understanding task in which a label is assigned to some tokens in a text. ## Downstream Use Potential downstream use cases include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. To learn more about token classification and other potential downstream use cases, see the Hugging Face token classification docs. ## Out-of-Scope Use The model should not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations **CONTENT WARNING: Readers should be made aware that language generated by this model may be disturbing or offensive to some and may propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). In the context of tasks relevant to this model, Mishra et al. (2020) explore social biases in NER systems for English and find that there is systematic bias in existing NER systems in that they fail to identify named entities from different demographic groups (though this paper did not look at BERT). For example, using a sample sentence from Mishra et al. (2020): ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. # Training See the following resources for training data and training procedure details: - XLM-RoBERTa-large model card - CoNLL-2003 data card - Associated paper # Evaluation See the associated paper for evaluation details. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). 
- **Hardware Type:** 500 32GB Nvidia V100 GPUs (from the associated paper) - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Technical Specifications See the associated paper for further details. # Citation **BibTeX:** **APA:** - Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model Use the code below to get started with the model. You can use this model directly within a pipeline for NER.
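A minimal sketch of the NER pipeline described above, using the model id from this card (the example sentence is illustrative):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "FacebookAI/xlm-roberta-large-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Token-classification (NER) pipeline
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
print(classifier("Hello I'm Omar and I live in Zürich."))
```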
" +} \ No newline at end of file diff --git a/data/model_data_json/FacebookAI_xlm-roberta-large.json b/data/model_data_json/FacebookAI_xlm-roberta-large.json new file mode 100644 index 0000000000000000000000000000000000000000..1a8bf0732956e9876b92ca5b8d16eebf6a3292c2 --- /dev/null +++ b/data/model_data_json/FacebookAI_xlm-roberta-large.json @@ -0,0 +1,116 @@ +{ + "model_id": "FacebookAI/xlm-roberta-large", + "downloads": 5100801, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "xlm-roberta", + "fill-mask", + "exbert", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:1911.02116", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - exbert language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh license: mit --- # XLM-RoBERTa (large-sized model) XLM-RoBERTa model pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. It was introduced in the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. and first released in this repository. Disclaimer: The team releasing XLM-RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. RoBERTa is a transformers model pretrained on a large corpus in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with the Masked language modeling (MLM) objective. Taking a sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. 
This way, the model learns an inner representation of 100 languages that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the XLM-RoBERTa model as inputs. ## Intended uses & limitations You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at models like GPT2. ## Usage You can use this model directly with a pipeline for masked language modeling, and you can use it to extract the features of a given text in PyTorch (see the sketch below). ### BibTeX entry and citation info
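A minimal sketch of the usage described above, using the model id from this card:

```python
from transformers import pipeline, AutoTokenizer, AutoModel

model_id = "FacebookAI/xlm-roberta-large"

# Masked language modeling; XLM-R uses <mask> and covers all 100 pretraining languages
unmasker = pipeline("fill-mask", model=model_id)
print(unmasker("Hello I'm a <mask> model."))

# Feature extraction in PyTorch
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
inputs = tokenizer("Replace me by any text you'd like.", return_tensors="pt")
features = model(**inputs).last_hidden_state
```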
", + "model_explanation_gemini": "XLM-RoBERTa is a multilingual model pre-trained on 100 languages using masked language modeling, designed for fine-tuning on downstream NLP tasks like sequence classification or question answering." +} \ No newline at end of file diff --git a/data/model_data_json/Falconsai_nsfw_image_detection.json b/data/model_data_json/Falconsai_nsfw_image_detection.json new file mode 100644 index 0000000000000000000000000000000000000000..e4fb511251ba32c149062e2036937c9b1ca8d91a --- /dev/null +++ b/data/model_data_json/Falconsai_nsfw_image_detection.json @@ -0,0 +1,18 @@ +{ + "model_id": "Falconsai/nsfw_image_detection", + "downloads": 78430283, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-classification", + "arxiv:2010.11929", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: image-classification --- # Model Card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification ## Model Description The **Fine-Tuned Vision Transformer (ViT)** is a variant of the transformer encoder architecture, similar to BERT, that has been adapted for image classification tasks. This specific model, named \"google/vit-base-patch16-224-in21k,\" is pre-trained on a substantial collection of images in a supervised manner, leveraging the ImageNet-21k dataset. The images in the pre-training dataset are resized to a resolution of 224x224 pixels, making it suitable for a wide range of image recognition tasks. During the training phase, meticulous attention was given to hyperparameter settings to ensure optimal model performance. The model was fine-tuned with a judiciously chosen batch size of 16. This choice not only balanced computational efficiency but also allowed for the model to effectively process and learn from a diverse array of images. To facilitate this fine-tuning process, a learning rate of 5e-5 was employed. The learning rate serves as a critical tuning parameter that dictates the magnitude of adjustments made to the model's parameters during training. In this case, a learning rate of 5e-5 was selected to strike a harmonious balance between rapid convergence and steady optimization, resulting in a model that not only learns swiftly but also steadily refines its capabilities throughout the training process. This training phase was executed using a proprietary dataset containing an extensive collection of 80,000 images, each characterized by a substantial degree of variability. The dataset was thoughtfully curated to include two distinct classes, namely \"normal\" and \"nsfw.\" This diversity allowed the model to grasp nuanced visual patterns, equipping it with the competence to accurately differentiate between safe and explicit content. The overarching objective of this meticulous training process was to impart the model with a deep understanding of visual cues, ensuring its robustness and competence in tackling the specific task of NSFW image classification. The result is a model that stands ready to contribute significantly to content safety and moderation, all while maintaining the highest standards of accuracy and reliability. ## Intended Uses & Limitations ### Intended Uses - **NSFW Image Classification**: The primary intended use of this model is for the classification of NSFW (Not Safe for Work) images. It has been fine-tuned for this purpose, making it suitable for filtering explicit or inappropriate content in various applications. 
### How to use Here is how to use this model to classify an image into one of two classes (normal, nsfw):
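A minimal sketch, assuming a local image file (the path is a hypothetical placeholder):

```python
from PIL import Image
from transformers import pipeline

img = Image.open("path/to/image.jpg")  # hypothetical path
classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
print(classifier(img))  # e.g. [{'label': 'nsfw', 'score': ...}, {'label': 'normal', 'score': ...}]
```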

Run YOLO version
### Limitations - **Specialized Task Fine-Tuning**: While the model is adept at NSFW image classification, its performance may vary when applied to other tasks. - Users interested in employing this model for different tasks should explore fine-tuned versions available in the model hub for optimal results. ## Training Data The model's training data includes a proprietary dataset comprising approximately 80,000 images. This dataset encompasses a significant amount of variability and consists of two distinct classes: \"normal\" and \"nsfw.\" The training process on this data aimed to equip the model with the ability to distinguish between safe and explicit content effectively. ### Training Stats
**Note:** It's essential to use this model responsibly and ethically, adhering to content guidelines and applicable regulations when implementing it in real-world applications, particularly those involving potentially sensitive content. For more details on model fine-tuning and usage, please refer to the model's documentation and the model hub. ## References - Hugging Face Model Hub - Vision Transformer (ViT) Paper - ImageNet-21k Dataset **Disclaimer:** The model's performance may be influenced by the quality and representativeness of the data it was fine-tuned on. Users are encouraged to assess the model's suitability for their specific applications and datasets.", + "model_explanation_gemini": "Classifies images as \"normal\" or \"nsfw\" to detect explicit or inappropriate content." +} \ No newline at end of file diff --git a/data/model_data_json/Felix92_doctr-torch-parseq-multilingual-v1.json b/data/model_data_json/Felix92_doctr-torch-parseq-multilingual-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..9e313adb12707d486ed8ba247fb3b708f1df4d65 --- /dev/null +++ b/data/model_data_json/Felix92_doctr-torch-parseq-multilingual-v1.json @@ -0,0 +1,25 @@ +{ + "model_id": "Felix92/doctr-torch-parseq-multilingual-v1", + "downloads": 251652, + "tags": [ + "transformers", + "pytorch", + "en", + "da", + "fr", + "it", + "es", + "de", + "pt", + "cs", + "pl", + "nl", + "no", + "fi", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - da - fr - it - es - de - pt - cs - pl - nl - 'no' - fi license: apache-2.0 ---

**Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch** ## Task: recognition ### Example usage (sketched below):", + "model_explanation_gemini": "Performs multilingual optical character recognition (OCR) for text extraction in multiple languages including English, Danish, French, and others." +} \ No newline at end of file diff --git a/data/model_data_json/GAIR_ReasonEval-7B.json b/data/model_data_json/GAIR_ReasonEval-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..90ae5ed206f04c26a016f102c38f772b731c868a --- /dev/null +++ b/data/model_data_json/GAIR_ReasonEval-7B.json @@ -0,0 +1,18 @@ +{ + "model_id": "GAIR/ReasonEval-7B", + "downloads": 212867, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-classification", + "en", + "arxiv:2404.05692", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: text-classification --- # **ReasonEval-7B Model Card** ## Model Description ReasonEval-7B is a 7B parameter decoder-only language model fine-tuned from []( Given a mathematical problem and a solution, it assesses the problem-solving process in a step-by-step format from the following perspectives: - **Validity**: The step contains no mistakes in calculation and logic. - **Redundancy**: The step lacks utility in solving the problem but is still valid. With ReasonEval, you can - 📏 quantify the quality of reasoning steps without relying on humans or closed-source models. - 🤖 find potentially invalid or redundant steps in solutions, even when the final results are correct. - 🛠️ select high-quality training data for downstream tasks (e.g., fine-tuning). ## Model Details * **Model type**: ReasonEval-7B's architecture is identical to []( except that the classification head for next-token prediction is replaced with a classification head that outputs the probabilities of each class of reasoning steps. * **Language(s)**: English * **Paper**: Evaluating Mathematical Reasoning Beyond Accuracy * **Github**: * **Finetuned from model**: * **Fine-tuning Data**: PRM800K For detailed instructions on how to use the ReasonEval-7B model, visit our GitHub repository at ## How to Cite", + "model_explanation_gemini": "Evaluates mathematical problem-solving steps for validity and redundancy to assess reasoning quality without human input." +} \ No newline at end of file diff --git a/data/model_data_json/GPTCache_paraphrase-albert-small-v2.json b/data/model_data_json/GPTCache_paraphrase-albert-small-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..c48c0e8ac02f06c3e4921c2c0375d07426a2cb7a --- /dev/null +++ b/data/model_data_json/GPTCache_paraphrase-albert-small-v2.json @@ -0,0 +1,14 @@ +{ + "model_id": "GPTCache/paraphrase-albert-small-v2", + "downloads": 260656, + "tags": [ + "transformers", + "albert", + "feature-extraction", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit ---", + "model_explanation_gemini": "Generates paraphrased text variations using the ALBERT-small-v2 architecture for tasks like semantic caching."
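For the recognition entry above (Felix92/doctr-torch-parseq-multilingual-v1), a minimal usage sketch assuming python-doctr's Hugging Face Hub integration (`from_hub` combined with `ocr_predictor`); the detection backbone and file path are illustrative assumptions:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub

# Load the recognition model from the Hub via python-doctr
reco_model = from_hub("Felix92/doctr-torch-parseq-multilingual-v1")

# Pair it with a pretrained detection backbone to get a full OCR predictor
predictor = ocr_predictor(det_arch="db_mobilenet_v3_large", reco_arch=reco_model, pretrained=True)

doc = DocumentFile.from_images(["path/to/page.jpg"])  # hypothetical path
result = predictor(doc)
print(result.render())
```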
+} \ No newline at end of file diff --git a/data/model_data_json/GSAI-ML_LLaDA-8B-Instruct.json b/data/model_data_json/GSAI-ML_LLaDA-8B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..ef51f12e54b94dab4394d119663d86948425fa9e --- /dev/null +++ b/data/model_data_json/GSAI-ML_LLaDA-8B-Instruct.json @@ -0,0 +1,17 @@ +{ + "model_id": "GSAI-ML/LLaDA-8B-Instruct", + "downloads": 99542, + "tags": [ + "transformers", + "safetensors", + "llada", + "text-generation", + "conversational", + "custom_code", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: mit library_name: transformers pipeline_tag: text-generation --- # LLaDA-8B-Instruct We introduce LLaDA, a diffusion model with an unprecedented 8B scale, trained entirely from scratch, rivaling LLaMA3 8B in performance. Project Page Code", + "model_explanation_gemini": "LLaDA-8B-Instruct is an 8-billion-parameter diffusion model for text generation, trained from scratch to compete with LLaMA3 8B in performance." +} \ No newline at end of file diff --git a/data/model_data_json/Gensyn_Qwen2.5-0.5B-Instruct.json b/data/model_data_json/Gensyn_Qwen2.5-0.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..0bdc0b31e1add6747d2786ffca2098e3b62d1a95 --- /dev/null +++ b/data/model_data_json/Gensyn_Qwen2.5-0.5B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "Gensyn/Qwen2.5-0.5B-Instruct", + "downloads": 2977185, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "rl-swarm", + "gensyn", + "conversational", + "en", + "base_model:Qwen/Qwen2.5-0.5B", + "base_model:finetune:Qwen/Qwen2.5-0.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-0.5B tags: - chat - rl-swarm - gensyn library_name: transformers --- # Qwen2.5-0.5B-Instruct ## Introduction This model is intended for use in the Gensyn RL Swarm, to finetune locally using peer-to-peer reinforcement learning post-training. Once finetuned, the model can be used as normal in any workflow; for details on how to do this, please refer to the original model documentation. For more details on the original model, please refer to the original repository here. This repo contains an **unmodified version** of the instruction-tuned 0.5B Qwen2.5 model, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 0.49B - Number of Parameters (Non-Embedding): 0.36B - Number of Layers: 24 - Number of Attention Heads (GQA): 14 for Q and 2 for KV - Context Length: Full 32,768 tokens and generation 8192 tokens ## Requirements This model is intended for use in the Gensyn RL Swarm system; for details on model requirements when using it outside of a swarm, refer to the original Qwen repo here.
## Quickstart To deploy this model into a swarm and/or participate in the Gensyn Testnet, follow the instructions in the RL Swarm repository, read about the testnet, read the RL Swarm overview, and/or read the RL Swarm technical report.", + "model_explanation_gemini": "A 0.5B parameter instruction-tuned causal language model designed for peer-to-peer reinforcement learning finetuning in the Gensyn RL Swarm, capable of text generation with a 32K token context." +} \ No newline at end of file diff --git a/data/model_data_json/Gensyn_Qwen2.5-1.5B-Instruct.json b/data/model_data_json/Gensyn_Qwen2.5-1.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..fdec6365894a900d0a081b32a0d296147a61b197 --- /dev/null +++ b/data/model_data_json/Gensyn_Qwen2.5-1.5B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "Gensyn/Qwen2.5-1.5B-Instruct", + "downloads": 84268, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "rl-swarm", + "gensyn", + "conversational", + "en", + "base_model:Qwen/Qwen2.5-1.5B", + "base_model:finetune:Qwen/Qwen2.5-1.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-1.5B tags: - chat - rl-swarm - gensyn library_name: transformers --- # Qwen2.5-1.5B-Instruct ## Introduction This model is intended for use in the Gensyn RL Swarm, to finetune locally using peer-to-peer reinforcement learning post-training. Once finetuned, the model can be used as normal in any workflow; for details on how to do this, please refer to the original model documentation. For more details on the original model, please refer to the original repository here. This repo contains an **unmodified version** of the instruction-tuned 1.5B Qwen2.5 model, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 1.54B - Number of Parameters (Non-Embedding): 1.31B - Number of Layers: 28 - Number of Attention Heads (GQA): 12 for Q and 2 for KV - Context Length: Full 32,768 tokens and generation 8192 tokens ## Requirements This model is intended for use in the Gensyn RL Swarm system; for details on model requirements when using it outside of a swarm, refer to the original Qwen repo here. ## Quickstart To deploy this model into a swarm and/or participate in the Gensyn Testnet, follow the instructions in the RL Swarm repository, read about the testnet, read the RL Swarm overview, and/or read the RL Swarm technical report.", + "model_explanation_gemini": "A 1.5B parameter instruction-tuned causal language model designed for peer-to-peer reinforcement learning finetuning in the Gensyn RL Swarm, capable of text generation with a 32K token context length."
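The Gensyn Qwen2.5 checkpoints above follow standard Qwen2.5 instruct usage; a minimal sketch with the Transformers chat template, shown for the 1.5B variant (the prompt is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gensyn/Qwen2.5-1.5B-Instruct"  # same pattern for the 0.5B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a short introduction to RL swarms."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```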
+} \ No newline at end of file diff --git a/data/model_data_json/Gherman_bert-base-NER-Russian.json b/data/model_data_json/Gherman_bert-base-NER-Russian.json new file mode 100644 index 0000000000000000000000000000000000000000..cda7919ea3f7067c42b47446ed65bab8af529f9c --- /dev/null +++ b/data/model_data_json/Gherman_bert-base-NER-Russian.json @@ -0,0 +1,19 @@ +{ + "model_id": "Gherman/bert-base-NER-Russian", + "downloads": 120995, + "tags": [ + "transformers", + "safetensors", + "bert", + "token-classification", + "ru", + "base_model:google-bert/bert-base-multilingual-cased", + "base_model:finetune:google-bert/bert-base-multilingual-cased", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - ru base_model: - google-bert/bert-base-multilingual-cased pipeline_tag: token-classification library_name: transformers --- # Russian Named Entity Recognition Model ## Model description This model is a fine-tuned version of google-bert/bert-base-multilingual-cased for Named Entity Recognition (NER) in Russian text. It can identify various entity types such as person names, locations, and organizations using the BIOLU tagging format. ## Intended uses & limitations The model is designed to identify named entities in Russian text. It can be used for tasks such as information extraction, content analysis, and text preprocessing for downstream NLP tasks. ### How to use Here's a simple example of how to use the model (see the sketch at the end of this entry): ### Limitations and bias - The model's performance may vary depending on the domain and style of the input text. - It may struggle with rare or complex entity names not seen during training. - The model might exhibit biases present in the training data. ## Training data The model was trained on Detailed-NER-Dataset-RU by AlexKly. Check it out, the dataset is pretty good! ### Label Information The dataset is labeled using the BIOLU format, where: - B: Beginning token of an entity - I: Inner token of an entity - O: Other (non-entity) token - L: Last token of an entity - U: Unit token (single-token entity) The following entity types are included in the dataset: Location (LOC) tags: - COUNTRY - REGION - CITY - DISTRICT - STREET - HOUSE Person (PER) tags: - LAST_NAME - FIRST_NAME - MIDDLE_NAME For example, a full tag might look like \"B-CITY\" for the beginning token of a city name, or \"U-COUNTRY\" for a single-token country name. ## Training procedure The model was fine-tuned from the google-bert/bert-base-multilingual-cased checkpoint using the Hugging Face Transformers library. ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-5 - train_batch_size: 16 - eval_batch_size: 16 - seed: 42 - optimizer: Adam with weight decay fix - lr_scheduler_type: linear - num_epochs: 10 ### Framework versions - Transformers 4.28.1 - Pytorch 1.13.0 - Datasets 2.12.0 - Tokenizers 0.13.3 ## Evaluation results The model achieves the following results on the evaluation set: - Precision: 0.987843 - Recall: 0.988498 - F1 Score: 0.988170 ## Ethical considerations This model is intended for use in analyzing Russian text and should be used responsibly. Users should be aware of potential biases in the model's predictions and use the results judiciously, especially in applications that may impact individuals or groups.", + "model_explanation_gemini": "Identifies named entities like persons, locations, and organizations in Russian text using BIOLU tagging format."
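A simple usage sketch for the Russian NER model above, via the standard token-classification pipeline (the Russian example sentence is illustrative):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "Gherman/bert-base-NER-Russian"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" merges B/I/L/U pieces into whole entity spans
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(nlp("Меня зовут Иван Иванович Иванов, я живу в Москве."))
```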
+} \ No newline at end of file diff --git a/data/model_data_json/GraydientPlatformAPI_comicbabes2.json b/data/model_data_json/GraydientPlatformAPI_comicbabes2.json new file mode 100644 index 0000000000000000000000000000000000000000..90e9ea2b6c163d4eca7b925a523ef23f31f266b5 --- /dev/null +++ b/data/model_data_json/GraydientPlatformAPI_comicbabes2.json @@ -0,0 +1,15 @@ +{ + "model_id": "GraydientPlatformAPI/comicbabes2", + "downloads": 93904, + "tags": [ + "diffusers", + "safetensors", + "license:openrail", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: openrail ---", + "model_explanation_gemini": "Generates comic-style female character images." +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-ar-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-ar-en.json new file mode 100644 index 0000000000000000000000000000000000000000..cf7642bfc91dc48135a0df815c89a539882eb87e --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-ar-en.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-ar-en", + "downloads": 586561, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "ar", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-ar-en * source languages: ar * target languages: en * OPUS readme: ar-en * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2019-12-18.zip * test set translations: opus-2019-12-18.test.txt * test set scores: opus-2019-12-18.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba.ar.en | 49.4 | 0.661 |", + "model_explanation_gemini": "Translates text from Arabic (ar) to English (en) using a transformer-based model trained on OPUS datasets." 
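A minimal translation sketch for the Helsinki-NLP entry above; the same pattern applies to the other opus-mt pairs in this file (the Arabic example sentence is illustrative):

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-ar-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["مرحبا بالعالم"], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))

# Multi-target pairs (e.g. Helsinki-NLP/opus-mt-en-zh) additionally expect a
# sentence-initial target-language token such as ">>cmn_Hans<<" in the source text.
```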
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-de-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-de-en.json new file mode 100644 index 0000000000000000000000000000000000000000..5c9652877d0547b509dc764bc5d88a76ffc7bf2c --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-de-en.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-de-en", + "downloads": 462818, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "de", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-de-en * source languages: de * target languages: en * OPUS readme: de-en * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2020-02-26.zip * test set translations: opus-2020-02-26.test.txt * test set scores: opus-2020-02-26.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newssyscomb2009.de.en | 29.4 | 0.557 | | news-test2008.de.en | 27.8 | 0.548 | | newstest2009.de.en | 26.8 | 0.543 | | newstest2010.de.en | 30.2 | 0.584 | | newstest2011.de.en | 27.4 | 0.556 | | newstest2012.de.en | 29.1 | 0.569 | | newstest2013.de.en | 32.1 | 0.583 | | newstest2014-deen.de.en | 34.0 | 0.600 | | newstest2015-ende.de.en | 34.2 | 0.599 | | newstest2016-ende.de.en | 40.4 | 0.649 | | newstest2017-ende.de.en | 35.7 | 0.610 | | newstest2018-ende.de.en | 43.7 | 0.667 | | newstest2019-deen.de.en | 40.1 | 0.642 | | Tatoeba.de.en | 55.4 | 0.707 |", + "model_explanation_gemini": "Translates text from German (de) to English (en) using a transformer-based model trained on the OPUS dataset." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-ar.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-ar.json new file mode 100644 index 0000000000000000000000000000000000000000..5628078ac719c3e4ec94fc49a7196d0b1d356e15 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-ar.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-ar", + "downloads": 90197, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "en", + "ar", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - ar tags: - translation license: apache-2.0 --- ### eng-ara * source group: English * target group: Arabic * OPUS readme: eng-ara * model: transformer * source language(s): eng * target language(s): acm afb apc apc_Latn ara ara_Latn arq arq_Latn ary arz * model: transformer * pre-processing: normalization + SentencePiece (spm32k,spm32k) * a sentence initial language token is required in the form of (id = valid target language ID) * download original weights: opus-2020-07-03.zip * test set translations: opus-2020-07-03.test.txt * test set scores: opus-2020-07-03.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba-test.eng.ara | 14.0 | 0.437 | ### System Info: - hf_name: eng-ara - source_languages: eng - target_languages: ara - opus_readme_url: - original_repo: Tatoeba-Challenge - tags: ['translation'] - languages: ['en', 'ar'] - src_constituents: {'eng'} - tgt_constituents: {'apc', 'ara', 'arq_Latn', 'arq', 'afb', 'ara_Latn', 'apc_Latn', 'arz'} - src_multilingual: False - tgt_multilingual: False - prepro: normalization + SentencePiece (spm32k,spm32k) - url_model: - url_test_set: - src_alpha3: eng - tgt_alpha3: ara - short_pair: en-ar - chrF2_score: 0.43700000000000006 - bleu: 14.0 - brevity_penalty: 1.0 - ref_len: 58935.0 - src_name: English - tgt_name: Arabic - train_date: 2020-07-03 - src_alpha2: en - tgt_alpha2: ar - prefer_old: False - long_pair: eng-ara - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 - transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b - port_machine: brutasse - port_time: 2020-08-21-14:41", + "model_explanation_gemini": "Translates English text to Arabic using a transformer model with normalization and SentencePiece pre-processing." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-da.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-da.json new file mode 100644 index 0000000000000000000000000000000000000000..6cb290b1a0f76fb9e250c892169c046d0cef7ef4 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-da.json @@ -0,0 +1,19 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-da", + "downloads": 81331, + "tags": [ + "transformers", + "pytorch", + "tf", + "marian", + "text2text-generation", + "translation", + "en", + "da", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-en-da * source languages: en * target languages: da * OPUS readme: en-da * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2019-12-18.zip * test set translations: opus-2019-12-18.test.txt * test set scores: opus-2019-12-18.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba.en.da | 60.4 | 0.745 |" +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-de.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-de.json new file mode 100644 index 0000000000000000000000000000000000000000..a91e2bcb00ed57cbf3fcf20f838152406ffd4662 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-de.json @@ -0,0 +1,22 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-de", + "downloads": 220909, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "marian", + "text2text-generation", + "translation", + "en", + "de", + "license:cc-by-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: cc-by-4.0 --- ### opus-mt-en-de ## Table of Contents - Model Details - Uses - Risks, Limitations and Biases - Training - Evaluation - Citation Information - How to Get Started With the Model ## Model Details **Model Description:** - **Developed by:** Language Technology Research Group at the University of Helsinki - **Model Type:** Translation - **Language(s):** - Source Language: English - Target Language: German - **License:** CC-BY-4.0 - **Resources for more information:** - GitHub Repo ## Uses #### Direct Use This model can be used for translation and text-to-text generation. ## Risks, Limitations and Biases **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). 
Further details about the dataset for this model can be found in the OPUS readme: en-de #### Training Data ##### Preprocessing * pre-processing: normalization + SentencePiece * dataset: opus * download original weights: opus-2020-02-26.zip * test set translations: opus-2020-02-26.test.txt ## Evaluation #### Results * test set scores: opus-2020-02-26.eval.txt #### Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newssyscomb2009.en.de | 23.5 | 0.540 | | news-test2008.en.de | 23.5 | 0.529 | | newstest2009.en.de | 22.3 | 0.530 | | newstest2010.en.de | 24.9 | 0.544 | | newstest2011.en.de | 22.5 | 0.524 | | newstest2012.en.de | 23.0 | 0.525 | | newstest2013.en.de | 26.9 | 0.553 | | newstest2015-ende.en.de | 31.1 | 0.594 | | newstest2016-ende.en.de | 37.0 | 0.636 | | newstest2017-ende.en.de | 29.9 | 0.586 | | newstest2018-ende.en.de | 45.2 | 0.690 | | newstest2019-ende.en.de | 40.9 | 0.654 | | Tatoeba.en.de | 47.3 | 0.664 | ## Citation Information ## How to Get Started With the Model", + "model_explanation_gemini": "Translates text from English to German for general translation and text generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-es.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-es.json new file mode 100644 index 0000000000000000000000000000000000000000..679b52cf02028ed7d99777ba9f872f13e53b1cb8 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-es.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-es", + "downloads": 207518, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "marian", + "text2text-generation", + "translation", + "en", + "es", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - es tags: - translation license: apache-2.0 --- ### eng-spa * source group: English * target group: Spanish * OPUS readme: eng-spa * model: transformer * source language(s): eng * target language(s): spa * model: transformer * pre-processing: normalization + SentencePiece (spm32k,spm32k) * download original weights: opus-2020-08-18.zip * test set translations: opus-2020-08-18.test.txt * test set scores: opus-2020-08-18.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newssyscomb2009-engspa.eng.spa | 31.0 | 0.583 | | news-test2008-engspa.eng.spa | 29.7 | 0.564 | | newstest2009-engspa.eng.spa | 30.2 | 0.578 | | newstest2010-engspa.eng.spa | 36.9 | 0.620 | | newstest2011-engspa.eng.spa | 38.2 | 0.619 | | newstest2012-engspa.eng.spa | 39.0 | 0.625 | | newstest2013-engspa.eng.spa | 35.0 | 0.598 | | Tatoeba-test.eng.spa | 54.9 | 0.721 | ### System Info: - hf_name: eng-spa - source_languages: eng - target_languages: spa - opus_readme_url: - original_repo: Tatoeba-Challenge - tags: ['translation'] - languages: ['en', 'es'] - src_constituents: {'eng'} - tgt_constituents: {'spa'} - src_multilingual: False - tgt_multilingual: False - prepro: normalization + SentencePiece (spm32k,spm32k) - url_model: - url_test_set: - src_alpha3: eng - tgt_alpha3: spa - short_pair: en-es - chrF2_score: 0.721 - bleu: 54.9 - brevity_penalty: 0.978 - ref_len: 77311.0 - src_name: English - tgt_name: Spanish - train_date: 2020-08-18 00:00:00 - src_alpha2: en - tgt_alpha2: es - prefer_old: False - long_pair: eng-spa - helsinki_git_sha: d2f0910c89026c34a44e331e785dec1e0faa7b82 - transformers_git_sha: f7af09b4524b784d67ae8526f0e2fcc6f5ed0de9 - port_machine: brutasse - port_time: 
2020-08-24-18:20", + "model_explanation_gemini": "Translates text from English to Spanish using a transformer-based model with normalization and SentencePiece pre-processing." +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-fr.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-fr.json new file mode 100644 index 0000000000000000000000000000000000000000..3afacf1daa0b94642bc0fadca58bf03d44d2ddf1 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-fr.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-fr", + "downloads": 347037, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "marian", + "text2text-generation", + "translation", + "en", + "fr", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: translation license: apache-2.0 --- ### opus-mt-en-fr * source languages: en * target languages: fr * OPUS readme: en-fr * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2020-02-26.zip * test set translations: opus-2020-02-26.test.txt * test set scores: opus-2020-02-26.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newsdiscussdev2015-enfr.en.fr | 33.8 | 0.602 | | newsdiscusstest2015-enfr.en.fr | 40.0 | 0.643 | | newssyscomb2009.en.fr | 29.8 | 0.584 | | news-test2008.en.fr | 27.5 | 0.554 | | newstest2009.en.fr | 29.4 | 0.577 | | newstest2010.en.fr | 32.7 | 0.596 | | newstest2011.en.fr | 34.3 | 0.611 | | newstest2012.en.fr | 31.8 | 0.592 | | newstest2013.en.fr | 33.2 | 0.589 | | Tatoeba.en.fr | 50.5 | 0.672 |", + "model_explanation_gemini": "Translates text from English (en) to French (fr) using a transformer-align model trained on OPUS datasets." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-hi.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-hi.json new file mode 100644 index 0000000000000000000000000000000000000000..10b8306caf99e4e8349d3b3e74c48e7783b560ae --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-hi.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-hi", + "downloads": 141072, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "en", + "hi", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - hi tags: - translation license: apache-2.0 --- ### eng-hin * source group: English * target group: Hindi * OPUS readme: eng-hin * model: transformer-align * source language(s): eng * target language(s): hin * model: transformer-align * pre-processing: normalization + SentencePiece (spm32k,spm32k) * download original weights: opus-2020-06-17.zip * test set translations: opus-2020-06-17.test.txt * test set scores: opus-2020-06-17.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newsdev2014.eng.hin | 6.9 | 0.296 | | newstest2014-hien.eng.hin | 9.9 | 0.323 | | Tatoeba-test.eng.hin | 16.1 | 0.447 | ### System Info: - hf_name: eng-hin - source_languages: eng - target_languages: hin - opus_readme_url: - original_repo: Tatoeba-Challenge - tags: ['translation'] - languages: ['en', 'hi'] - src_constituents: {'eng'} - tgt_constituents: {'hin'} - src_multilingual: False - tgt_multilingual: False - prepro: normalization + SentencePiece (spm32k,spm32k) - url_model: - url_test_set: - src_alpha3: eng - tgt_alpha3: hin - short_pair: en-hi - chrF2_score: 0.447 - bleu: 16.1 - brevity_penalty: 1.0 - ref_len: 32904.0 - src_name: English - tgt_name: Hindi - train_date: 2020-06-17 - src_alpha2: en - tgt_alpha2: hi - prefer_old: False - long_pair: eng-hin - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 - transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b - port_machine: brutasse - port_time: 2020-08-21-14:41", + "model_explanation_gemini": "Translates text from English to Hindi using a transformer-based alignment model with SentencePiece preprocessing." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-nl.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-nl.json new file mode 100644 index 0000000000000000000000000000000000000000..dcbe8246fceba1205233e1d5e7b598e1ae81266c --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-nl.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-nl", + "downloads": 102879, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "en", + "nl", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-en-nl * source languages: en * target languages: nl * OPUS readme: en-nl * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2019-12-04.zip * test set translations: opus-2019-12-04.test.txt * test set scores: opus-2019-12-04.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba.en.nl | 57.1 | 0.730 |", + "model_explanation_gemini": "Translates text from English (en) to Dutch (nl) using a transformer-based model trained on OPUS data with SentencePiece pre-processing." +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-ru.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-ru.json new file mode 100644 index 0000000000000000000000000000000000000000..b9ab7acf417cce4fc2a41de6263b7df3c6d3f14c --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-ru.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-ru", + "downloads": 145566, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "en", + "ru", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-en-ru * source languages: en * target languages: ru * OPUS readme: en-ru * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2020-02-11.zip * test set translations: opus-2020-02-11.test.txt * test set scores: opus-2020-02-11.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newstest2012.en.ru | 31.1 | 0.581 | | newstest2013.en.ru | 23.5 | 0.513 | | newstest2015-enru.en.ru | 27.5 | 0.564 | | newstest2016-enru.en.ru | 26.4 | 0.548 | | newstest2017-enru.en.ru | 29.1 | 0.572 | | newstest2018-enru.en.ru | 25.4 | 0.554 | | newstest2019-enru.en.ru | 27.1 | 0.533 | | Tatoeba.en.ru | 48.4 | 0.669 |", + "model_explanation_gemini": "Translates text from English (en) to Russian (ru) using a transformer-based model trained on OPUS datasets." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-en-zh.json b/data/model_data_json/Helsinki-NLP_opus-mt-en-zh.json new file mode 100644 index 0000000000000000000000000000000000000000..5709d1e5e989a3553b5496f330671b7acc5f3e0e --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-en-zh.json @@ -0,0 +1,22 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-en-zh", + "downloads": 385815, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "marian", + "text2text-generation", + "translation", + "en", + "zh", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh tags: - translation license: apache-2.0 --- ### eng-zho * source group: English * target group: Chinese * OPUS readme: eng-zho * model: transformer * source language(s): eng * target language(s): cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant gan lzh lzh_Hans nan wuu yue yue_Hans yue_Hant * model: transformer * pre-processing: normalization + SentencePiece (spm32k,spm32k) * a sentence initial language token is required in the form of (id = valid target language ID) * download original weights: opus-2020-07-17.zip * test set translations: opus-2020-07-17.test.txt * test set scores: opus-2020-07-17.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba-test.eng.zho | 31.4 | 0.268 | ### System Info: - hf_name: eng-zho - source_languages: eng - target_languages: zho - opus_readme_url: - original_repo: Tatoeba-Challenge - tags: ['translation'] - languages: ['en', 'zh'] - src_constituents: {'eng'} - tgt_constituents: {'cmn_Hans', 'nan', 'nan_Hani', 'gan', 'yue', 'cmn_Kana', 'yue_Hani', 'wuu_Bopo', 'cmn_Latn', 'yue_Hira', 'cmn_Hani', 'cjy_Hans', 'cmn', 'lzh_Hang', 'lzh_Hira', 'cmn_Hant', 'lzh_Bopo', 'zho', 'zho_Hans', 'zho_Hant', 'lzh_Hani', 'yue_Hang', 'wuu', 'yue_Kana', 'wuu_Latn', 'yue_Bopo', 'cjy_Hant', 'yue_Hans', 'lzh', 'cmn_Hira', 'lzh_Yiii', 'lzh_Hans', 'cmn_Bopo', 'cmn_Hang', 'hak_Hani', 'cmn_Yiii', 'yue_Hant', 'lzh_Kana', 'wuu_Hani'} - src_multilingual: False - tgt_multilingual: False - prepro: normalization + SentencePiece (spm32k,spm32k) - url_model: - url_test_set: - src_alpha3: eng - tgt_alpha3: zho - short_pair: en-zh - chrF2_score: 0.268 - bleu: 31.4 - brevity_penalty: 0.8959999999999999 - ref_len: 110468.0 - src_name: English - tgt_name: Chinese - train_date: 2020-07-17 - src_alpha2: en - tgt_alpha2: zh - prefer_old: False - long_pair: eng-zho - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 - transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b - port_machine: brutasse - port_time: 2020-08-21-14:41", + "model_explanation_gemini": "Translates English text to various Chinese dialects and written forms using a transformer-based model." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-es-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-es-en.json new file mode 100644 index 0000000000000000000000000000000000000000..ddcafac0227376a4f9c051a82a86fd0450ff6b52 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-es-en.json @@ -0,0 +1,20 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-es-en", + "downloads": 356668, + "tags": [ + "transformers", + "pytorch", + "tf", + "marian", + "text2text-generation", + "translation", + "es", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - es - en tags: - translation license: apache-2.0 --- ### spa-eng * source group: Spanish * target group: English * OPUS readme: spa-eng * model: transformer * source language(s): spa * target language(s): eng * model: transformer * pre-processing: normalization + SentencePiece (spm32k,spm32k) * download original weights: opus-2020-08-18.zip * test set translations: opus-2020-08-18.test.txt * test set scores: opus-2020-08-18.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newssyscomb2009-spaeng.spa.eng | 30.6 | 0.570 | | news-test2008-spaeng.spa.eng | 27.9 | 0.553 | | newstest2009-spaeng.spa.eng | 30.4 | 0.572 | | newstest2010-spaeng.spa.eng | 36.1 | 0.614 | | newstest2011-spaeng.spa.eng | 34.2 | 0.599 | | newstest2012-spaeng.spa.eng | 37.9 | 0.624 | | newstest2013-spaeng.spa.eng | 35.3 | 0.609 | | Tatoeba-test.spa.eng | 59.6 | 0.739 | ### System Info: - hf_name: spa-eng - source_languages: spa - target_languages: eng - opus_readme_url: - original_repo: Tatoeba-Challenge - tags: ['translation'] - languages: ['es', 'en'] - src_constituents: {'spa'} - tgt_constituents: {'eng'} - src_multilingual: False - tgt_multilingual: False - prepro: normalization + SentencePiece (spm32k,spm32k) - url_model: - url_test_set: - src_alpha3: spa - tgt_alpha3: eng - short_pair: es-en - chrF2_score: 0.7390000000000001 - bleu: 59.6 - brevity_penalty: 0.9740000000000001 - ref_len: 79376.0 - src_name: Spanish - tgt_name: English - train_date: 2020-08-18 00:00:00 - src_alpha2: es - tgt_alpha2: en - prefer_old: False - long_pair: spa-eng - helsinki_git_sha: d2f0910c89026c34a44e331e785dec1e0faa7b82 - transformers_git_sha: f7af09b4524b784d67ae8526f0e2fcc6f5ed0de9 - port_machine: brutasse - port_time: 2020-08-24-18:20", + "model_explanation_gemini": "Translates text from Spanish (es) to English (en) using a transformer model with normalization and SentencePiece pre-processing." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-fr-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-fr-en.json new file mode 100644 index 0000000000000000000000000000000000000000..17efe18bfbfd5c645e91222956197ac65316e16a --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-fr-en.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-fr-en", + "downloads": 1237546, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "marian", + "text2text-generation", + "translation", + "fr", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-fr-en * source languages: fr * target languages: en * OPUS readme: fr-en * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2020-02-26.zip * test set translations: opus-2020-02-26.test.txt * test set scores: opus-2020-02-26.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newsdiscussdev2015-enfr.fr.en | 33.1 | 0.580 | | newsdiscusstest2015-enfr.fr.en | 38.7 | 0.614 | | newssyscomb2009.fr.en | 30.3 | 0.569 | | news-test2008.fr.en | 26.2 | 0.542 | | newstest2009.fr.en | 30.2 | 0.570 | | newstest2010.fr.en | 32.2 | 0.590 | | newstest2011.fr.en | 33.0 | 0.597 | | newstest2012.fr.en | 32.8 | 0.591 | | newstest2013.fr.en | 33.9 | 0.591 | | newstest2014-fren.fr.en | 37.8 | 0.633 | | Tatoeba.fr.en | 57.5 | 0.720 |", + "model_explanation_gemini": "Translates French text to English using a transformer-based model trained on OPUS datasets." +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-it-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-it-en.json new file mode 100644 index 0000000000000000000000000000000000000000..9e95249506aba6b1f0c8b838c617c94424a07944 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-it-en.json @@ -0,0 +1,20 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-it-en", + "downloads": 141385, + "tags": [ + "transformers", + "pytorch", + "tf", + "marian", + "text2text-generation", + "translation", + "it", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-it-en * source languages: it * target languages: en * OPUS readme: it-en * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2019-12-18.zip * test set translations: opus-2019-12-18.test.txt * test set scores: opus-2019-12-18.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newssyscomb2009.it.en | 35.3 | 0.600 | | newstest2009.it.en | 34.0 | 0.594 | | Tatoeba.it.en | 70.9 | 0.808 |", + "model_explanation_gemini": "Translates text from Italian to English using a transformer-based model trained on OPUS datasets." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-ko-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-ko-en.json new file mode 100644 index 0000000000000000000000000000000000000000..bb9cf370dd83a9589ab8933992813ba5ed80bab0 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-ko-en.json @@ -0,0 +1,20 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-ko-en", + "downloads": 110970, + "tags": [ + "transformers", + "pytorch", + "tf", + "marian", + "text2text-generation", + "translation", + "ko", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ko - en tags: - translation license: apache-2.0 --- ### kor-eng * source group: Korean * target group: English * OPUS readme: kor-eng * model: transformer-align * source language(s): kor kor_Hang kor_Latn * target language(s): eng * model: transformer-align * pre-processing: normalization + SentencePiece (spm32k,spm32k) * download original weights: opus-2020-06-17.zip * test set translations: opus-2020-06-17.test.txt * test set scores: opus-2020-06-17.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba-test.kor.eng | 41.3 | 0.588 | ### System Info: - hf_name: kor-eng - source_languages: kor - target_languages: eng - opus_readme_url: - original_repo: Tatoeba-Challenge - tags: ['translation'] - languages: ['ko', 'en'] - src_constituents: {'kor_Hani', 'kor_Hang', 'kor_Latn', 'kor'} - tgt_constituents: {'eng'} - src_multilingual: False - tgt_multilingual: False - prepro: normalization + SentencePiece (spm32k,spm32k) - url_model: - url_test_set: - src_alpha3: kor - tgt_alpha3: eng - short_pair: ko-en - chrF2_score: 0.588 - bleu: 41.3 - brevity_penalty: 0.9590000000000001 - ref_len: 17711.0 - src_name: Korean - tgt_name: English - train_date: 2020-06-17 - src_alpha2: ko - tgt_alpha2: en - prefer_old: False - long_pair: kor-eng - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 - transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b - port_machine: brutasse - port_time: 2020-08-21-14:41", + "model_explanation_gemini": "Translates text from Korean to English using a transformer-based model with SentencePiece preprocessing." 
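The same models can also be driven without the pipeline wrapper. A sketch for the Korean-English checkpoint above, using the Marian classes directly (the input sentence is illustrative; the SentencePiece preprocessing mentioned in the card is handled internally by the tokenizer):

```python
# Sketch: explicit tokenizer/model usage for Helsinki-NLP/opus-mt-ko-en.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-ko-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["안녕하세요, 만나서 반갑습니다."], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```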
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-mul-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-mul-en.json new file mode 100644 index 0000000000000000000000000000000000000000..46a2c7d410b4ecdc5ad66f4330dda2ecc53235e7 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-mul-en.json @@ -0,0 +1,138 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-mul-en", + "downloads": 147301, + "tags": [ + "transformers", + "pytorch", + "tf", + "marian", + "text2text-generation", + "translation", + "ca", + "es", + "os", + "eo", + "ro", + "fy", + "cy", + "is", + "lb", + "su", + "an", + "sq", + "fr", + "ht", + "rm", + "cv", + "ig", + "am", + "eu", + "tr", + "ps", + "af", + "ny", + "ch", + "uk", + "sl", + "lt", + "tk", + "sg", + "ar", + "lg", + "bg", + "be", + "ka", + "gd", + "ja", + "si", + "br", + "mh", + "km", + "th", + "ty", + "rw", + "te", + "mk", + "or", + "wo", + "kl", + "mr", + "ru", + "yo", + "hu", + "fo", + "zh", + "ti", + "co", + "ee", + "oc", + "sn", + "mt", + "ts", + "pl", + "gl", + "nb", + "bn", + "tt", + "bo", + "lo", + "id", + "gn", + "nv", + "hy", + "kn", + "to", + "io", + "so", + "vi", + "da", + "fj", + "gv", + "sm", + "nl", + "mi", + "pt", + "hi", + "se", + "as", + "ta", + "et", + "kw", + "ga", + "sv", + "ln", + "na", + "mn", + "gu", + "wa", + "lv", + "jv", + "el", + "my", + "ba", + "it", + "hr", + "ur", + "ce", + "nn", + "fi", + "mg", + "rn", + "xh", + "ab", + "de", + "cs", + "he", + "zu", + "yi", + "ml", + "mul", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ca - es - os - eo - ro - fy - cy - is - lb - su - an - sq - fr - ht - rm - cv - ig - am - eu - tr - ps - af - ny - ch - uk - sl - lt - tk - sg - ar - lg - bg - be - ka - gd - ja - si - br - mh - km - th - ty - rw - te - mk - or - wo - kl - mr - ru - yo - hu - fo - zh - ti - co - ee - oc - sn - mt - ts - pl - gl - nb - bn - tt - bo - lo - id - gn - nv - hy - kn - to - io - so - vi - da - fj - gv - sm - nl - mi - pt - hi - se - as - ta - et - kw - ga - sv - ln - na - mn - gu - wa - lv - jv - el - my - ba - it - hr - ur - ce - nn - fi - mg - rn - xh - ab - de - cs - he - zu - yi - ml - mul - en tags: - translation license: apache-2.0 --- ### mul-eng * source group: Multiple languages * target group: English * OPUS readme: mul-eng * model: transformer * source language(s): abk acm ady afb afh_Latn afr akl_Latn aln amh ang_Latn apc ara arg arq ary arz asm ast avk_Latn awa aze_Latn bak bam_Latn bel bel_Latn ben bho bod bos_Latn bre brx brx_Latn bul bul_Latn cat ceb ces cha che chr chv cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant cor cos crh crh_Latn csb_Latn cym dan deu dsb dtp dws_Latn egl ell enm_Latn epo est eus ewe ext fao fij fin fkv_Latn fra frm_Latn frr fry fuc fuv gan gcf_Latn gil gla gle glg glv gom gos got_Goth grc_Grek grn gsw guj hat hau_Latn haw heb hif_Latn hil hin hnj_Latn hoc hoc_Latn hrv hsb hun hye iba ibo ido ido_Latn ike_Latn ile_Latn ilo ina_Latn ind isl ita izh jav jav_Java jbo jbo_Cyrl jbo_Latn jdt_Cyrl jpn kab kal kan kat kaz_Cyrl kaz_Latn kek_Latn kha khm khm_Latn kin kir_Cyrl kjh kpv krl ksh kum kur_Arab kur_Latn lad lad_Latn lao lat_Latn lav ldn_Latn lfn_Cyrl lfn_Latn lij lin lit liv_Latn lkt lld_Latn lmo ltg ltz lug lzh lzh_Hans mad mah mai mal mar max_Latn mdf mfe mhr mic min mkd mlg mlt mnw moh mon mri mwl mww mya myv nan nau nav nds niu nld nno nob nob_Hebr nog non_Latn nov_Latn npi nya oci ori orv_Cyrl oss ota_Arab ota_Latn pag pan_Guru pap pau pdc pes pes_Latn pes_Thaa pms pnb 
pol por ppl_Latn prg_Latn pus quc qya qya_Latn rap rif_Latn roh rom ron rue run rus sag sah san_Deva scn sco sgs shs_Latn shy_Latn sin sjn_Latn slv sma sme smo sna snd_Arab som spa sqi srp_Cyrl srp_Latn stq sun swe swg swh tah tam tat tat_Arab tat_Latn tel tet tgk_Cyrl tha tir tlh_Latn tly_Latn tmw_Latn toi_Latn ton tpw_Latn tso tuk tuk_Latn tur tvl tyv tzl tzl_Latn udm uig_Arab uig_Cyrl ukr umb urd uzb_Cyrl uzb_Latn vec vie vie_Hani vol_Latn vro war wln wol wuu xal xho yid yor yue yue_Hans yue_Hant zho zho_Hans zho_Hant zlm_Latn zsm_Latn zul zza * target language(s): eng * model: transformer * pre-processing: normalization + SentencePiece (spm32k,spm32k) * download original weights: opus2m-2020-08-01.zip * test set translations: opus2m-2020-08-01.test.txt * test set scores: opus2m-2020-08-01.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newsdev2014-hineng.hin.eng | 8.5 | 0.341 | | newsdev2015-enfi-fineng.fin.eng | 16.8 | 0.441 | | newsdev2016-enro-roneng.ron.eng | 31.3 | 0.580 | | newsdev2016-entr-tureng.tur.eng | 16.4 | 0.422 | | newsdev2017-enlv-laveng.lav.eng | 21.3 | 0.502 | | newsdev2017-enzh-zhoeng.zho.eng | 12.7 | 0.409 | | newsdev2018-enet-esteng.est.eng | 19.8 | 0.467 | | newsdev2019-engu-gujeng.guj.eng | 13.3 | 0.385 | | newsdev2019-enlt-liteng.lit.eng | 19.9 | 0.482 | | newsdiscussdev2015-enfr-fraeng.fra.eng | 26.7 | 0.520 | | newsdiscusstest2015-enfr-fraeng.fra.eng | 29.8 | 0.541 | | newssyscomb2009-ceseng.ces.eng | 21.1 | 0.487 | | newssyscomb2009-deueng.deu.eng | 22.6 | 0.499 | | newssyscomb2009-fraeng.fra.eng | 25.8 | 0.530 | | newssyscomb2009-huneng.hun.eng | 15.1 | 0.430 | | newssyscomb2009-itaeng.ita.eng | 29.4 | 0.555 | | newssyscomb2009-spaeng.spa.eng | 26.1 | 0.534 | | news-test2008-deueng.deu.eng | 21.6 | 0.491 | | news-test2008-fraeng.fra.eng | 22.3 | 0.502 | | news-test2008-spaeng.spa.eng | 23.6 | 0.514 | | newstest2009-ceseng.ces.eng | 19.8 | 0.480 | | newstest2009-deueng.deu.eng | 20.9 | 0.487 | | newstest2009-fraeng.fra.eng | 25.0 | 0.523 | | newstest2009-huneng.hun.eng | 14.7 | 0.425 | | newstest2009-itaeng.ita.eng | 27.6 | 0.542 | | newstest2009-spaeng.spa.eng | 25.7 | 0.530 | | newstest2010-ceseng.ces.eng | 20.6 | 0.491 | | newstest2010-deueng.deu.eng | 23.4 | 0.517 | | newstest2010-fraeng.fra.eng | 26.1 | 0.537 | | newstest2010-spaeng.spa.eng | 29.1 | 0.561 | | newstest2011-ceseng.ces.eng | 21.0 | 0.489 | | newstest2011-deueng.deu.eng | 21.3 | 0.494 | | newstest2011-fraeng.fra.eng | 26.8 | 0.546 | | newstest2011-spaeng.spa.eng | 28.2 | 0.549 | | newstest2012-ceseng.ces.eng | 20.5 | 0.485 | | newstest2012-deueng.deu.eng | 22.3 | 0.503 | | newstest2012-fraeng.fra.eng | 27.5 | 0.545 | | newstest2012-ruseng.rus.eng | 26.6 | 0.532 | | newstest2012-spaeng.spa.eng | 30.3 | 0.567 | | newstest2013-ceseng.ces.eng | 22.5 | 0.498 | | newstest2013-deueng.deu.eng | 25.0 | 0.518 | | newstest2013-fraeng.fra.eng | 27.4 | 0.537 | | newstest2013-ruseng.rus.eng | 21.6 | 0.484 | | newstest2013-spaeng.spa.eng | 28.4 | 0.555 | | newstest2014-csen-ceseng.ces.eng | 24.0 | 0.517 | | newstest2014-deen-deueng.deu.eng | 24.1 | 0.511 | | newstest2014-fren-fraeng.fra.eng | 29.1 | 0.563 | | newstest2014-hien-hineng.hin.eng | 14.0 | 0.414 | | newstest2014-ruen-ruseng.rus.eng | 24.0 | 0.521 | | newstest2015-encs-ceseng.ces.eng | 21.9 | 0.481 | | newstest2015-ende-deueng.deu.eng | 25.5 | 0.519 | | newstest2015-enfi-fineng.fin.eng | 17.4 | 0.441 | | newstest2015-enru-ruseng.rus.eng | 22.4 | 0.494 | | newstest2016-encs-ceseng.ces.eng | 23.0 | 
0.500 | | newstest2016-ende-deueng.deu.eng | 30.1 | 0.560 | | newstest2016-enfi-fineng.fin.eng | 18.5 | 0.461 | | newstest2016-enro-roneng.ron.eng | 29.6 | 0.562 | | newstest2016-enru-ruseng.rus.eng | 22.0 | 0.495 | | newstest2016-entr-tureng.tur.eng | 14.8 | 0.415 | | newstest2017-encs-ceseng.ces.eng | 20.2 | 0.475 | | newstest2017-ende-deueng.deu.eng | 26.0 | 0.523 | | newstest2017-enfi-fineng.fin.eng | 19.6 | 0.465 | | newstest2017-enlv-laveng.lav.eng | 16.2 | 0.454 | | newstest2017-enru-ruseng.rus.eng | 24.2 | 0.510 | | newstest2017-entr-tureng.tur.eng | 15.0 | 0.412 | | newstest2017-enzh-zhoeng.zho.eng | 13.7 | 0.412 | | newstest2018-encs-ceseng.ces.eng | 21.2 | 0.486 | | newstest2018-ende-deueng.deu.eng | 31.5 | 0.564 | | newstest2018-enet-esteng.est.eng | 19.7 | 0.473 | | newstest2018-enfi-fineng.fin.eng | 15.1 | 0.418 | | newstest2018-enru-ruseng.rus.eng | 21.3 | 0.490 | | newstest2018-entr-tureng.tur.eng | 15.4 | 0.421 | | newstest2018-enzh-zhoeng.zho.eng | 12.9 | 0.408 | | newstest2019-deen-deueng.deu.eng | 27.0 | 0.529 | | newstest2019-fien-fineng.fin.eng | 17.2 | 0.438 | | newstest2019-guen-gujeng.guj.eng | 9.0 | 0.342 | | newstest2019-lten-liteng.lit.eng | 22.6 | 0.512 | | newstest2019-ruen-ruseng.rus.eng | 24.1 | 0.503 | | newstest2019-zhen-zhoeng.zho.eng | 13.9 | 0.427 | | newstestB2016-enfi-fineng.fin.eng | 15.2 | 0.428 | | newstestB2017-enfi-fineng.fin.eng | 16.8 | 0.442 | | newstestB2017-fien-fineng.fin.eng | 16.8 | 0.442 | | Tatoeba-test.abk-eng.abk.eng | 2.4 | 0.190 | | Tatoeba-test.ady-eng.ady.eng | 1.1 | 0.111 | | Tatoeba-test.afh-eng.afh.eng | 1.7 | 0.108 | | Tatoeba-test.afr-eng.afr.eng | 53.0 | 0.672 | | Tatoeba-test.akl-eng.akl.eng | 5.9 | 0.239 | | Tatoeba-test.amh-eng.amh.eng | 25.6 | 0.464 | | Tatoeba-test.ang-eng.ang.eng | 11.7 | 0.289 | | Tatoeba-test.ara-eng.ara.eng | 26.4 | 0.443 | | Tatoeba-test.arg-eng.arg.eng | 35.9 | 0.473 | | Tatoeba-test.asm-eng.asm.eng | 19.8 | 0.365 | | Tatoeba-test.ast-eng.ast.eng | 31.8 | 0.467 | | Tatoeba-test.avk-eng.avk.eng | 0.4 | 0.119 | | Tatoeba-test.awa-eng.awa.eng | 9.7 | 0.271 | | Tatoeba-test.aze-eng.aze.eng | 37.0 | 0.542 | | Tatoeba-test.bak-eng.bak.eng | 13.9 | 0.395 | | Tatoeba-test.bam-eng.bam.eng | 2.2 | 0.094 | | Tatoeba-test.bel-eng.bel.eng | 36.8 | 0.549 | | Tatoeba-test.ben-eng.ben.eng | 39.7 | 0.546 | | Tatoeba-test.bho-eng.bho.eng | 33.6 | 0.540 | | Tatoeba-test.bod-eng.bod.eng | 1.1 | 0.147 | | Tatoeba-test.bre-eng.bre.eng | 14.2 | 0.303 | | Tatoeba-test.brx-eng.brx.eng | 1.7 | 0.130 | | Tatoeba-test.bul-eng.bul.eng | 46.0 | 0.621 | | Tatoeba-test.cat-eng.cat.eng | 46.6 | 0.636 | | Tatoeba-test.ceb-eng.ceb.eng | 17.4 | 0.347 | | Tatoeba-test.ces-eng.ces.eng | 41.3 | 0.586 | | Tatoeba-test.cha-eng.cha.eng | 7.9 | 0.232 | | Tatoeba-test.che-eng.che.eng | 0.7 | 0.104 | | Tatoeba-test.chm-eng.chm.eng | 7.3 | 0.261 | | Tatoeba-test.chr-eng.chr.eng | 8.8 | 0.244 | | Tatoeba-test.chv-eng.chv.eng | 11.0 | 0.319 | | Tatoeba-test.cor-eng.cor.eng | 5.4 | 0.204 | | Tatoeba-test.cos-eng.cos.eng | 58.2 | 0.643 | | Tatoeba-test.crh-eng.crh.eng | 26.3 | 0.399 | | Tatoeba-test.csb-eng.csb.eng | 18.8 | 0.389 | | Tatoeba-test.cym-eng.cym.eng | 23.4 | 0.407 | | Tatoeba-test.dan-eng.dan.eng | 50.5 | 0.659 | | Tatoeba-test.deu-eng.deu.eng | 39.6 | 0.579 | | Tatoeba-test.dsb-eng.dsb.eng | 24.3 | 0.449 | | Tatoeba-test.dtp-eng.dtp.eng | 1.0 | 0.149 | | Tatoeba-test.dws-eng.dws.eng | 1.6 | 0.061 | | Tatoeba-test.egl-eng.egl.eng | 7.6 | 0.236 | | Tatoeba-test.ell-eng.ell.eng | 55.4 | 0.682 | | Tatoeba-test.enm-eng.enm.eng | 28.0 | 
0.489 | | Tatoeba-test.epo-eng.epo.eng | 41.8 | 0.591 | | Tatoeba-test.est-eng.est.eng | 41.5 | 0.581 | | Tatoeba-test.eus-eng.eus.eng | 37.8 | 0.557 | | Tatoeba-test.ewe-eng.ewe.eng | 10.7 | 0.262 | | Tatoeba-test.ext-eng.ext.eng | 25.5 | 0.405 | | Tatoeba-test.fao-eng.fao.eng | 28.7 | 0.469 | | Tatoeba-test.fas-eng.fas.eng | 7.5 | 0.281 | | Tatoeba-test.fij-eng.fij.eng | 24.2 | 0.320 | | Tatoeba-test.fin-eng.fin.eng | 35.8 | 0.534 | | Tatoeba-test.fkv-eng.fkv.eng | 15.5 | 0.434 | | Tatoeba-test.fra-eng.fra.eng | 45.1 | 0.618 | | Tatoeba-test.frm-eng.frm.eng | 29.6 | 0.427 | | Tatoeba-test.frr-eng.frr.eng | 5.5 | 0.138 | | Tatoeba-test.fry-eng.fry.eng | 25.3 | 0.455 | | Tatoeba-test.ful-eng.ful.eng | 1.1 | 0.127 | | Tatoeba-test.gcf-eng.gcf.eng | 16.0 | 0.315 | | Tatoeba-test.gil-eng.gil.eng | 46.7 | 0.587 | | Tatoeba-test.gla-eng.gla.eng | 20.2 | 0.358 | | Tatoeba-test.gle-eng.gle.eng | 43.9 | 0.592 | | Tatoeba-test.glg-eng.glg.eng | 45.1 | 0.623 | | Tatoeba-test.glv-eng.glv.eng | 3.3 | 0.119 | | Tatoeba-test.gos-eng.gos.eng | 20.1 | 0.364 | | Tatoeba-test.got-eng.got.eng | 0.1 | 0.041 | | Tatoeba-test.grc-eng.grc.eng | 2.1 | 0.137 | | Tatoeba-test.grn-eng.grn.eng | 1.7 | 0.152 | | Tatoeba-test.gsw-eng.gsw.eng | 18.2 | 0.334 | | Tatoeba-test.guj-eng.guj.eng | 21.7 | 0.373 | | Tatoeba-test.hat-eng.hat.eng | 34.5 | 0.502 | | Tatoeba-test.hau-eng.hau.eng | 10.5 | 0.295 | | Tatoeba-test.haw-eng.haw.eng | 2.8 | 0.160 | | Tatoeba-test.hbs-eng.hbs.eng | 46.7 | 0.623 | | Tatoeba-test.heb-eng.heb.eng | 33.0 | 0.492 | | Tatoeba-test.hif-eng.hif.eng | 17.0 | 0.391 | | Tatoeba-test.hil-eng.hil.eng | 16.0 | 0.339 | | Tatoeba-test.hin-eng.hin.eng | 36.4 | 0.533 | | Tatoeba-test.hmn-eng.hmn.eng | 0.4 | 0.131 | | Tatoeba-test.hoc-eng.hoc.eng | 0.7 | 0.132 | | Tatoeba-test.hsb-eng.hsb.eng | 41.9 | 0.551 | | Tatoeba-test.hun-eng.hun.eng | 33.2 | 0.510 | | Tatoeba-test.hye-eng.hye.eng | 32.2 | 0.487 | | Tatoeba-test.iba-eng.iba.eng | 9.4 | 0.278 | | Tatoeba-test.ibo-eng.ibo.eng | 5.8 | 0.200 | | Tatoeba-test.ido-eng.ido.eng | 31.7 | 0.503 | | Tatoeba-test.iku-eng.iku.eng | 9.1 | 0.164 | | Tatoeba-test.ile-eng.ile.eng | 42.2 | 0.595 | | Tatoeba-test.ilo-eng.ilo.eng | 29.7 | 0.485 | | Tatoeba-test.ina-eng.ina.eng | 42.1 | 0.607 | | Tatoeba-test.isl-eng.isl.eng | 35.7 | 0.527 | | Tatoeba-test.ita-eng.ita.eng | 54.8 | 0.686 | | Tatoeba-test.izh-eng.izh.eng | 28.3 | 0.526 | | Tatoeba-test.jav-eng.jav.eng | 10.0 | 0.282 | | Tatoeba-test.jbo-eng.jbo.eng | 0.3 | 0.115 | | Tatoeba-test.jdt-eng.jdt.eng | 5.3 | 0.140 | | Tatoeba-test.jpn-eng.jpn.eng | 18.8 | 0.387 | | Tatoeba-test.kab-eng.kab.eng | 3.9 | 0.205 | | Tatoeba-test.kal-eng.kal.eng | 16.9 | 0.329 | | Tatoeba-test.kan-eng.kan.eng | 16.2 | 0.374 | | Tatoeba-test.kat-eng.kat.eng | 31.1 | 0.493 | | Tatoeba-test.kaz-eng.kaz.eng | 24.5 | 0.437 | | Tatoeba-test.kek-eng.kek.eng | 7.4 | 0.192 | | Tatoeba-test.kha-eng.kha.eng | 1.0 | 0.154 | | Tatoeba-test.khm-eng.khm.eng | 12.2 | 0.290 | | Tatoeba-test.kin-eng.kin.eng | 22.5 | 0.355 | | Tatoeba-test.kir-eng.kir.eng | 27.2 | 0.470 | | Tatoeba-test.kjh-eng.kjh.eng | 2.1 | 0.129 | | Tatoeba-test.kok-eng.kok.eng | 4.5 | 0.259 | | Tatoeba-test.kom-eng.kom.eng | 1.4 | 0.099 | | Tatoeba-test.krl-eng.krl.eng | 26.1 | 0.387 | | Tatoeba-test.ksh-eng.ksh.eng | 5.5 | 0.256 | | Tatoeba-test.kum-eng.kum.eng | 9.3 | 0.288 | | Tatoeba-test.kur-eng.kur.eng | 9.6 | 0.208 | | Tatoeba-test.lad-eng.lad.eng | 30.1 | 0.475 | | Tatoeba-test.lah-eng.lah.eng | 11.6 | 0.284 | | Tatoeba-test.lao-eng.lao.eng | 4.5 | 0.214 | | 
Tatoeba-test.lat-eng.lat.eng | 21.5 | 0.402 | | Tatoeba-test.lav-eng.lav.eng | 40.2 | 0.577 | | Tatoeba-test.ldn-eng.ldn.eng | 0.8 | 0.115 | | Tatoeba-test.lfn-eng.lfn.eng | 23.0 | 0.433 | | Tatoeba-test.lij-eng.lij.eng | 9.3 | 0.287 | | Tatoeba-test.lin-eng.lin.eng | 2.4 | 0.196 | | Tatoeba-test.lit-eng.lit.eng | 44.0 | 0.597 | | Tatoeba-test.liv-eng.liv.eng | 1.6 | 0.115 | | Tatoeba-test.lkt-eng.lkt.eng | 2.0 | 0.113 | | Tatoeba-test.lld-eng.lld.eng | 18.3 | 0.312 | | Tatoeba-test.lmo-eng.lmo.eng | 25.4 | 0.395 | | Tatoeba-test.ltz-eng.ltz.eng | 35.9 | 0.509 | | Tatoeba-test.lug-eng.lug.eng | 5.1 | 0.357 | | Tatoeba-test.mad-eng.mad.eng | 2.8 | 0.123 | | Tatoeba-test.mah-eng.mah.eng | 5.7 | 0.175 | | Tatoeba-test.mai-eng.mai.eng | 56.3 | 0.703 | | Tatoeba-test.mal-eng.mal.eng | 37.5 | 0.534 | | Tatoeba-test.mar-eng.mar.eng | 22.8 | 0.470 | | Tatoeba-test.mdf-eng.mdf.eng | 2.0 | 0.110 | | Tatoeba-test.mfe-eng.mfe.eng | 59.2 | 0.764 | | Tatoeba-test.mic-eng.mic.eng | 9.0 | 0.199 | | Tatoeba-test.mkd-eng.mkd.eng | 44.3 | 0.593 | | Tatoeba-test.mlg-eng.mlg.eng | 31.9 | 0.424 | | Tatoeba-test.mlt-eng.mlt.eng | 38.6 | 0.540 | | Tatoeba-test.mnw-eng.mnw.eng | 2.5 | 0.101 | | Tatoeba-test.moh-eng.moh.eng | 0.3 | 0.110 | | Tatoeba-test.mon-eng.mon.eng | 13.5 | 0.334 | | Tatoeba-test.mri-eng.mri.eng | 8.5 | 0.260 | | Tatoeba-test.msa-eng.msa.eng | 33.9 | 0.520 | | Tatoeba-test.multi.eng | 34.7 | 0.518 | | Tatoeba-test.mwl-eng.mwl.eng | 37.4 | 0.630 | | Tatoeba-test.mya-eng.mya.eng | 15.5 | 0.335 | | Tatoeba-test.myv-eng.myv.eng | 0.8 | 0.118 | | Tatoeba-test.nau-eng.nau.eng | 9.0 | 0.186 | | Tatoeba-test.nav-eng.nav.eng | 1.3 | 0.144 | | Tatoeba-test.nds-eng.nds.eng | 30.7 | 0.495 | | Tatoeba-test.nep-eng.nep.eng | 3.5 | 0.168 | | Tatoeba-test.niu-eng.niu.eng | 42.7 | 0.492 | | Tatoeba-test.nld-eng.nld.eng | 47.9 | 0.640 | | Tatoeba-test.nog-eng.nog.eng | 12.7 | 0.284 | | Tatoeba-test.non-eng.non.eng | 43.8 | 0.586 | | Tatoeba-test.nor-eng.nor.eng | 45.5 | 0.619 | | Tatoeba-test.nov-eng.nov.eng | 26.9 | 0.472 | | Tatoeba-test.nya-eng.nya.eng | 33.2 | 0.456 | | Tatoeba-test.oci-eng.oci.eng | 17.9 | 0.370 | | Tatoeba-test.ori-eng.ori.eng | 14.6 | 0.305 | | Tatoeba-test.orv-eng.orv.eng | 11.0 | 0.283 | | Tatoeba-test.oss-eng.oss.eng | 4.1 | 0.211 | | Tatoeba-test.ota-eng.ota.eng | 4.1 | 0.216 | | Tatoeba-test.pag-eng.pag.eng | 24.3 | 0.468 | | Tatoeba-test.pan-eng.pan.eng | 16.4 | 0.358 | | Tatoeba-test.pap-eng.pap.eng | 53.2 | 0.628 | | Tatoeba-test.pau-eng.pau.eng | 3.7 | 0.173 | | Tatoeba-test.pdc-eng.pdc.eng | 45.3 | 0.569 | | Tatoeba-test.pms-eng.pms.eng | 14.0 | 0.345 | | Tatoeba-test.pol-eng.pol.eng | 41.7 | 0.588 | | Tatoeba-test.por-eng.por.eng | 51.4 | 0.669 | | Tatoeba-test.ppl-eng.ppl.eng | 0.4 | 0.134 | | Tatoeba-test.prg-eng.prg.eng | 4.1 | 0.198 | | Tatoeba-test.pus-eng.pus.eng | 6.7 | 0.233 | | Tatoeba-test.quc-eng.quc.eng | 3.5 | 0.091 | | Tatoeba-test.qya-eng.qya.eng | 0.2 | 0.090 | | Tatoeba-test.rap-eng.rap.eng | 17.5 | 0.230 | | Tatoeba-test.rif-eng.rif.eng | 4.2 | 0.164 | | Tatoeba-test.roh-eng.roh.eng | 24.6 | 0.464 | | Tatoeba-test.rom-eng.rom.eng | 3.4 | 0.212 | | Tatoeba-test.ron-eng.ron.eng | 45.2 | 0.620 | | Tatoeba-test.rue-eng.rue.eng | 21.4 | 0.390 | | Tatoeba-test.run-eng.run.eng | 24.5 | 0.392 | | Tatoeba-test.rus-eng.rus.eng | 42.7 | 0.591 | | Tatoeba-test.sag-eng.sag.eng | 3.4 | 0.187 | | Tatoeba-test.sah-eng.sah.eng | 5.0 | 0.177 | | Tatoeba-test.san-eng.san.eng | 2.0 | 0.172 | | Tatoeba-test.scn-eng.scn.eng | 35.8 | 0.410 | | Tatoeba-test.sco-eng.sco.eng | 34.6 | 
0.520 | | Tatoeba-test.sgs-eng.sgs.eng | 21.8 | 0.299 | | Tatoeba-test.shs-eng.shs.eng | 1.8 | 0.122 | | Tatoeba-test.shy-eng.shy.eng | 1.4 | 0.104 | | Tatoeba-test.sin-eng.sin.eng | 20.6 | 0.429 | | Tatoeba-test.sjn-eng.sjn.eng | 1.2 | 0.095 | | Tatoeba-test.slv-eng.slv.eng | 37.0 | 0.545 | | Tatoeba-test.sma-eng.sma.eng | 4.4 | 0.147 | | Tatoeba-test.sme-eng.sme.eng | 8.9 | 0.229 | | Tatoeba-test.smo-eng.smo.eng | 37.7 | 0.483 | | Tatoeba-test.sna-eng.sna.eng | 18.0 | 0.359 | | Tatoeba-test.snd-eng.snd.eng | 28.1 | 0.444 | | Tatoeba-test.som-eng.som.eng | 23.6 | 0.472 | | Tatoeba-test.spa-eng.spa.eng | 47.9 | 0.645 | | Tatoeba-test.sqi-eng.sqi.eng | 46.9 | 0.634 | | Tatoeba-test.stq-eng.stq.eng | 8.1 | 0.379 | | Tatoeba-test.sun-eng.sun.eng | 23.8 | 0.369 | | Tatoeba-test.swa-eng.swa.eng | 6.5 | 0.193 | | Tatoeba-test.swe-eng.swe.eng | 51.4 | 0.655 | | Tatoeba-test.swg-eng.swg.eng | 18.5 | 0.342 | | Tatoeba-test.tah-eng.tah.eng | 25.6 | 0.249 | | Tatoeba-test.tam-eng.tam.eng | 29.1 | 0.437 | | Tatoeba-test.tat-eng.tat.eng | 12.9 | 0.327 | | Tatoeba-test.tel-eng.tel.eng | 21.2 | 0.386 | | Tatoeba-test.tet-eng.tet.eng | 9.2 | 0.215 | | Tatoeba-test.tgk-eng.tgk.eng | 12.7 | 0.374 | | Tatoeba-test.tha-eng.tha.eng | 36.3 | 0.531 | | Tatoeba-test.tir-eng.tir.eng | 9.1 | 0.267 | | Tatoeba-test.tlh-eng.tlh.eng | 0.2 | 0.084 | | Tatoeba-test.tly-eng.tly.eng | 2.1 | 0.128 | | Tatoeba-test.toi-eng.toi.eng | 5.3 | 0.150 | | Tatoeba-test.ton-eng.ton.eng | 39.5 | 0.473 | | Tatoeba-test.tpw-eng.tpw.eng | 1.5 | 0.160 | | Tatoeba-test.tso-eng.tso.eng | 44.7 | 0.526 | | Tatoeba-test.tuk-eng.tuk.eng | 18.6 | 0.401 | | Tatoeba-test.tur-eng.tur.eng | 40.5 | 0.573 | | Tatoeba-test.tvl-eng.tvl.eng | 55.0 | 0.593 | | Tatoeba-test.tyv-eng.tyv.eng | 19.1 | 0.477 | | Tatoeba-test.tzl-eng.tzl.eng | 17.7 | 0.333 | | Tatoeba-test.udm-eng.udm.eng | 3.4 | 0.217 | | Tatoeba-test.uig-eng.uig.eng | 11.4 | 0.289 | | Tatoeba-test.ukr-eng.ukr.eng | 43.1 | 0.595 | | Tatoeba-test.umb-eng.umb.eng | 9.2 | 0.260 | | Tatoeba-test.urd-eng.urd.eng | 23.2 | 0.426 | | Tatoeba-test.uzb-eng.uzb.eng | 19.0 | 0.342 | | Tatoeba-test.vec-eng.vec.eng | 41.1 | 0.409 | | Tatoeba-test.vie-eng.vie.eng | 30.6 | 0.481 | | Tatoeba-test.vol-eng.vol.eng | 1.8 | 0.143 | | Tatoeba-test.war-eng.war.eng | 15.9 | 0.352 | | Tatoeba-test.wln-eng.wln.eng | 12.6 | 0.291 | | Tatoeba-test.wol-eng.wol.eng | 4.4 | 0.138 | | Tatoeba-test.xal-eng.xal.eng | 0.9 | 0.153 | | Tatoeba-test.xho-eng.xho.eng | 35.4 | 0.513 | | Tatoeba-test.yid-eng.yid.eng | 19.4 | 0.387 | | Tatoeba-test.yor-eng.yor.eng | 19.3 | 0.327 | | Tatoeba-test.zho-eng.zho.eng | 25.8 | 0.448 | | Tatoeba-test.zul-eng.zul.eng | 40.9 | 0.567 | | Tatoeba-test.zza-eng.zza.eng | 1.6 | 0.125 | ### System Info: - hf_name: mul-eng - source_languages: mul - target_languages: eng - opus_readme_url: - original_repo: Tatoeba-Challenge - tags: ['translation'] - languages: ['ca', 'es', 'os', 'eo', 'ro', 'fy', 'cy', 'is', 'lb', 'su', 'an', 'sq', 'fr', 'ht', 'rm', 'cv', 'ig', 'am', 'eu', 'tr', 'ps', 'af', 'ny', 'ch', 'uk', 'sl', 'lt', 'tk', 'sg', 'ar', 'lg', 'bg', 'be', 'ka', 'gd', 'ja', 'si', 'br', 'mh', 'km', 'th', 'ty', 'rw', 'te', 'mk', 'or', 'wo', 'kl', 'mr', 'ru', 'yo', 'hu', 'fo', 'zh', 'ti', 'co', 'ee', 'oc', 'sn', 'mt', 'ts', 'pl', 'gl', 'nb', 'bn', 'tt', 'bo', 'lo', 'id', 'gn', 'nv', 'hy', 'kn', 'to', 'io', 'so', 'vi', 'da', 'fj', 'gv', 'sm', 'nl', 'mi', 'pt', 'hi', 'se', 'as', 'ta', 'et', 'kw', 'ga', 'sv', 'ln', 'na', 'mn', 'gu', 'wa', 'lv', 'jv', 'el', 'my', 'ba', 'it', 'hr', 'ur', 'ce', 'nn', 'fi', 'mg', 
'rn', 'xh', 'ab', 'de', 'cs', 'he', 'zu', 'yi', 'ml', 'mul', 'en'] - src_constituents: {'sjn_Latn', 'cat', 'nan', 'spa', 'ile_Latn', 'pap', 'mwl', 'uzb_Latn', 'mww', 'hil', 'lij', 'avk_Latn', 'lad_Latn', 'lat_Latn', 'bos_Latn', 'oss', 'epo', 'ron', 'fry', 'cym', 'toi_Latn', 'awa', 'swg', 'zsm_Latn', 'zho_Hant', 'gcf_Latn', 'uzb_Cyrl', 'isl', 'lfn_Latn', 'shs_Latn', 'nov_Latn', 'bho', 'ltz', 'lzh', 'kur_Latn', 'sun', 'arg', 'pes_Thaa', 'sqi', 'uig_Arab', 'csb_Latn', 'fra', 'hat', 'liv_Latn', 'non_Latn', 'sco', 'cmn_Hans', 'pnb', 'roh', 'chv', 'ibo', 'bul_Latn', 'amh', 'lfn_Cyrl', 'eus', 'fkv_Latn', 'tur', 'pus', 'afr', 'brx_Latn', 'nya', 'acm', 'ota_Latn', 'cha', 'ukr', 'xal', 'slv', 'lit', 'zho_Hans', 'tmw_Latn', 'kjh', 'ota_Arab', 'war', 'tuk', 'sag', 'myv', 'hsb', 'lzh_Hans', 'ara', 'tly_Latn', 'lug', 'brx', 'bul', 'bel', 'vol_Latn', 'kat', 'gan', 'got_Goth', 'vro', 'ext', 'afh_Latn', 'gla', 'jpn', 'udm', 'mai', 'ary', 'sin', 'tvl', 'hif_Latn', 'cjy_Hant', 'bre', 'ceb', 'mah', 'nob_Hebr', 'crh_Latn', 'prg_Latn', 'khm', 'ang_Latn', 'tha', 'tah', 'tzl', 'aln', 'kin', 'tel', 'ady', 'mkd', 'ori', 'wol', 'aze_Latn', 'jbo', 'niu', 'kal', 'mar', 'vie_Hani', 'arz', 'yue', 'kha', 'san_Deva', 'jbo_Latn', 'gos', 'hau_Latn', 'rus', 'quc', 'cmn', 'yor', 'hun', 'uig_Cyrl', 'fao', 'mnw', 'zho', 'orv_Cyrl', 'iba', 'bel_Latn', 'tir', 'afb', 'crh', 'mic', 'cos', 'swh', 'sah', 'krl', 'ewe', 'apc', 'zza', 'chr', 'grc_Grek', 'tpw_Latn', 'oci', 'mfe', 'sna', 'kir_Cyrl', 'tat_Latn', 'gom', 'ido_Latn', 'sgs', 'pau', 'tgk_Cyrl', 'nog', 'mlt', 'pdc', 'tso', 'srp_Cyrl', 'pol', 'ast', 'glg', 'pms', 'fuc', 'nob', 'qya', 'ben', 'tat', 'kab', 'min', 'srp_Latn', 'wuu', 'dtp', 'jbo_Cyrl', 'tet', 'bod', 'yue_Hans', 'zlm_Latn', 'lao', 'ind', 'grn', 'nav', 'kaz_Cyrl', 'rom', 'hye', 'kan', 'ton', 'ido', 'mhr', 'scn', 'som', 'rif_Latn', 'vie', 'enm_Latn', 'lmo', 'npi', 'pes', 'dan', 'fij', 'ina_Latn', 'cjy_Hans', 'jdt_Cyrl', 'gsw', 'glv', 'khm_Latn', 'smo', 'umb', 'sma', 'gil', 'nld', 'snd_Arab', 'arq', 'mri', 'kur_Arab', 'por', 'hin', 'shy_Latn', 'sme', 'rap', 'tyv', 'dsb', 'moh', 'asm', 'lad', 'yue_Hant', 'kpv', 'tam', 'est', 'frm_Latn', 'hoc_Latn', 'bam_Latn', 'kek_Latn', 'ksh', 'tlh_Latn', 'ltg', 'pan_Guru', 'hnj_Latn', 'cor', 'gle', 'swe', 'lin', 'qya_Latn', 'kum', 'mad', 'cmn_Hant', 'fuv', 'nau', 'mon', 'akl_Latn', 'guj', 'kaz_Latn', 'wln', 'tuk_Latn', 'jav_Java', 'lav', 'jav', 'ell', 'frr', 'mya', 'bak', 'rue', 'ita', 'hrv', 'izh', 'ilo', 'dws_Latn', 'urd', 'stq', 'tat_Arab', 'haw', 'che', 'pag', 'nno', 'fin', 'mlg', 'ppl_Latn', 'run', 'xho', 'abk', 'deu', 'hoc', 'lkt', 'lld_Latn', 'tzl_Latn', 'mdf', 'ike_Latn', 'ces', 'ldn_Latn', 'egl', 'heb', 'vec', 'zul', 'max_Latn', 'pes_Latn', 'yid', 'mal', 'nds'} - tgt_constituents: {'eng'} - src_multilingual: True - tgt_multilingual: False - prepro: normalization + SentencePiece (spm32k,spm32k) - url_model: - url_test_set: - src_alpha3: mul - tgt_alpha3: eng - short_pair: mul-en - chrF2_score: 0.518 - bleu: 34.7 - brevity_penalty: 1.0 - ref_len: 72346.0 - src_name: Multiple languages - tgt_name: English - train_date: 2020-08-01 - src_alpha2: mul - tgt_alpha2: en - prefer_old: False - long_pair: mul-eng - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 - transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b - port_machine: brutasse - port_time: 2020-08-21-14:41", + "model_explanation_gemini": "Translates multiple languages into English using a transformer-based model." 
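Since opus-mt-mul-en is many-to-one (src_multilingual: True, tgt_constituents: {'eng'}), source sentences from different languages can be batched together without language-control tokens; it is the one-to-many OPUS-MT models that need a target token such as >>eng<< prepended. A hedged sketch (the sentences are illustrative):

```python
# Sketch: batched many-to-English translation with Helsinki-NLP/opus-mt-mul-en.
# No source-language tags are needed; the single target language is English.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-mul-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

sentences = ["Guten Morgen!", "Bonjour tout le monde.", "Доброе утро."]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
print(tokenizer.batch_decode(model.generate(**batch), skip_special_tokens=True))
```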
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-nl-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-nl-en.json new file mode 100644 index 0000000000000000000000000000000000000000..e512d3a5806b607a1635a71cb6757ec8a463e76e --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-nl-en.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-nl-en", + "downloads": 323679, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "nl", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-nl-en * source languages: nl * target languages: en * OPUS readme: nl-en * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2019-12-05.zip * test set translations: opus-2019-12-05.test.txt * test set scores: opus-2019-12-05.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba.nl.en | 60.9 | 0.749 |", + "model_explanation_gemini": "Translates text from Dutch (nl) to English (en) using a transformer-based model trained on OPUS data." +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-pl-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-pl-en.json new file mode 100644 index 0000000000000000000000000000000000000000..be511de6cbd31aece12e783ae6c90b22461a37c0 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-pl-en.json @@ -0,0 +1,20 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-pl-en", + "downloads": 90974, + "tags": [ + "transformers", + "pytorch", + "tf", + "marian", + "text2text-generation", + "translation", + "pl", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: apache-2.0 --- ### opus-mt-pl-en * source languages: pl * target languages: en * OPUS readme: pl-en * dataset: opus * model: transformer-align * pre-processing: normalization + SentencePiece * download original weights: opus-2019-12-18.zip * test set translations: opus-2019-12-18.test.txt * test set scores: opus-2019-12-18.eval.txt ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba.pl.en | 54.9 | 0.701 |", + "model_explanation_gemini": "Translates text from Polish (pl) to English (en) using a transformer-based model trained on OPUS datasets." 
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-ru-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-ru-en.json new file mode 100644 index 0000000000000000000000000000000000000000..f3e6d388f1c6e634e485c6a689baa6d49d9f8d13 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-ru-en.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-ru-en", + "downloads": 322407, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "ru", + "en", + "license:cc-by-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - translation license: cc-by-4.0 --- ### opus-mt-ru-en ## Table of Contents - Model Details - Uses - Risks, Limitations and Biases - Training - Evaluation - Citation Information - How to Get Started With the Model ## Model Details **Model Description:** - **Developed by:** Language Technology Research Group at the University of Helsinki - **Model Type:** Transformer-align - **Language(s):** - Source Language: Russian - Target Language: English - **License:** CC-BY-4.0 - **Resources for more information:** - GitHub Repo ## Uses #### Direct Use This model can be used for translation and text-to-text generation. ## Risks, Limitations and Biases **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Further details about the dataset for this model can be found in the OPUS readme: ru-en ## Training #### Training Data ##### Preprocessing * Pre-processing: Normalization + SentencePiece * Dataset: opus * Download original weights: opus-2020-02-26.zip * Test set translations: opus-2020-02-26.test.txt ## Evaluation #### Results * test set scores: opus-2020-02-26.eval.txt #### Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | newstest2012.ru.en | 34.8 | 0.603 | | newstest2013.ru.en | 27.9 | 0.545 | | newstest2014-ruen.ru.en | 31.9 | 0.591 | | newstest2015-enru.ru.en | 30.4 | 0.568 | | newstest2016-enru.ru.en | 30.1 | 0.565 | | newstest2017-enru.ru.en | 33.4 | 0.593 | | newstest2018-enru.ru.en | 29.6 | 0.565 | | newstest2019-ruen.ru.en | 31.4 | 0.576 | | Tatoeba.ru.en | 61.1 | 0.736 | ## Citation Information ## How to Get Started With the Model", + "model_explanation_gemini": "Translates text from Russian to English using a Transformer-based model." 
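The ru-en card above ends with an empty "How to Get Started With the Model" section; the original snippet did not survive the flattening. A plausible minimal version, assuming the generic Auto classes (which resolve to the Marian implementations for this checkpoint; the sample sentence is illustrative):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-ru-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Я изучаю машинный перевод.", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "I'm studying machine translation."
```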
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-tc-big-en-tr.json b/data/model_data_json/Helsinki-NLP_opus-mt-tc-big-en-tr.json new file mode 100644 index 0000000000000000000000000000000000000000..64398aa742214f9283bf963d9a91990f18ca746d --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-tc-big-en-tr.json @@ -0,0 +1,22 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-tc-big-en-tr", + "downloads": 126645, + "tags": [ + "transformers", + "pytorch", + "tf", + "marian", + "text2text-generation", + "translation", + "opus-mt-tc", + "en", + "tr", + "license:cc-by-4.0", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - tr tags: - translation - opus-mt-tc license: cc-by-4.0 model-index: - name: opus-mt-tc-big-en-tr results: - task: name: Translation eng-tur type: translation args: eng-tur dataset: name: flores101-devtest type: flores_101 args: eng tur devtest metrics: - name: BLEU type: bleu value: 31.4 - task: name: Translation eng-tur type: translation args: eng-tur dataset: name: newsdev2016 type: newsdev2016 args: eng-tur metrics: - name: BLEU type: bleu value: 21.9 - task: name: Translation eng-tur type: translation args: eng-tur dataset: name: tatoeba-test-v2021-08-07 type: tatoeba_mt args: eng-tur metrics: - name: BLEU type: bleu value: 42.3 - task: name: Translation eng-tur type: translation args: eng-tur dataset: name: newstest2016 type: wmt-2016-news args: eng-tur metrics: - name: BLEU type: bleu value: 23.4 - task: name: Translation eng-tur type: translation args: eng-tur dataset: name: newstest2017 type: wmt-2017-news args: eng-tur metrics: - name: BLEU type: bleu value: 25.4 - task: name: Translation eng-tur type: translation args: eng-tur dataset: name: newstest2018 type: wmt-2018-news args: eng-tur metrics: - name: BLEU type: bleu value: 22.6 --- # opus-mt-tc-big-en-tr Neural machine translation model for translating from English (en) to Turkish (tr). This model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the amazing framework of Marian NMT, an efficient NMT implementation written in pure C++. The models have been converted to pyTorch using the transformers library by huggingface. Training data is taken from OPUS and training pipelines use the procedures of OPUS-MT-train. * Publications: OPUS-MT – Building open translation services for the World and The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT (Please, cite if you use this model.) 
## Model info * Release: 2022-02-25 * source language(s): eng * target language(s): tur * model: transformer-big * data: opusTCv20210807+bt (source) * tokenization: SentencePiece (spm32k,spm32k) * original model: opusTCv20210807+bt_transformer-big_2022-02-25.zip * more information released models: OPUS-MT eng-tur README ## Usage A short example code: You can also use OPUS-MT models with the transformers pipelines, for example: ## Benchmarks * test set translations: opusTCv20210807+bt_transformer-big_2022-02-25.test.txt * test set scores: opusTCv20210807+bt_transformer-big_2022-02-25.eval.txt * benchmark results: benchmark_results.txt * benchmark output: benchmark_translations.zip | langpair | testset | chr-F | BLEU | #sent | #words | |----------|---------|-------|-------|-------|--------| | eng-tur | tatoeba-test-v2021-08-07 | 0.68726 | 42.3 | 13907 | 84364 | | eng-tur | flores101-devtest | 0.62829 | 31.4 | 1012 | 20253 | | eng-tur | newsdev2016 | 0.58947 | 21.9 | 1001 | 15958 | | eng-tur | newstest2016 | 0.57624 | 23.4 | 3000 | 50782 | | eng-tur | newstest2017 | 0.58858 | 25.4 | 3007 | 51977 | | eng-tur | newstest2018 | 0.57848 | 22.6 | 3000 | 53731 | ## Acknowledgements The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland. ## Model conversion info * transformers version: 4.16.2 * OPUS-MT git hash: 3405783 * port time: Wed Apr 13 18:11:39 EEST 2022 * port machine: LM0-400-22516.local", + "model_explanation_gemini": "Translates English text to Turkish using a neural machine translation model trained on OPUS data." 
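The "Usage" section of the en-tr card above announces "A short example code" and a pipelines example, but both snippets were stripped. A hedged reconstruction of the pipeline variant (the input sentence is illustrative):

```python
from transformers import pipeline

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-tr")
print(pipe("How are you doing today?")[0]["translation_text"])
```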
+} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-tc-big-tr-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-tc-big-tr-en.json new file mode 100644 index 0000000000000000000000000000000000000000..3e26719dc825dab56ecfe4690c82f67f30625940 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-tc-big-tr-en.json @@ -0,0 +1,23 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-tc-big-tr-en", + "downloads": 121307, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "marian", + "text2text-generation", + "translation", + "opus-mt-tc", + "en", + "tr", + "license:cc-by-4.0", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - tr tags: - translation - opus-mt-tc license: cc-by-4.0 model-index: - name: opus-mt-tc-big-tr-en results: - task: name: Translation tur-eng type: translation args: tur-eng dataset: name: flores101-devtest type: flores_101 args: tur eng devtest metrics: - name: BLEU type: bleu value: 37.6 - task: name: Translation tur-eng type: translation args: tur-eng dataset: name: newsdev2016 type: newsdev2016 args: tur-eng metrics: - name: BLEU type: bleu value: 32.1 - task: name: Translation tur-eng type: translation args: tur-eng dataset: name: tatoeba-test-v2021-08-07 type: tatoeba_mt args: tur-eng metrics: - name: BLEU type: bleu value: 57.6 - task: name: Translation tur-eng type: translation args: tur-eng dataset: name: newstest2016 type: wmt-2016-news args: tur-eng metrics: - name: BLEU type: bleu value: 29.3 - task: name: Translation tur-eng type: translation args: tur-eng dataset: name: newstest2017 type: wmt-2017-news args: tur-eng metrics: - name: BLEU type: bleu value: 29.7 - task: name: Translation tur-eng type: translation args: tur-eng dataset: name: newstest2018 type: wmt-2018-news args: tur-eng metrics: - name: BLEU type: bleu value: 30.7 --- # opus-mt-tc-big-tr-en Neural machine translation model for translating from Turkish (tr) to English (en). This model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the amazing framework of Marian NMT, an efficient NMT implementation written in pure C++. The models have been converted to pyTorch using the transformers library by huggingface. Training data is taken from OPUS and training pipelines use the procedures of OPUS-MT-train. * Publications: OPUS-MT – Building open translation services for the World and The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT (Please, cite if you use this model.) 
## Model info * Release: 2022-03-17 * source language(s): tur * target language(s): eng * model: transformer-big * data: opusTCv20210807+bt (source) * tokenization: SentencePiece (spm32k,spm32k) * original model: opusTCv20210807+bt_transformer-big_2022-03-17.zip * more information released models: OPUS-MT tur-eng README ## Usage A short example code: You can also use OPUS-MT models with the transformers pipelines, for example: ## Benchmarks * test set translations: opusTCv20210807+bt_transformer-big_2022-03-17.test.txt * test set scores: opusTCv20210807+bt_transformer-big_2022-03-17.eval.txt * benchmark results: benchmark_results.txt * benchmark output: benchmark_translations.zip | langpair | testset | chr-F | BLEU | #sent | #words | |----------|---------|-------|-------|-------|--------| | tur-eng | tatoeba-test-v2021-08-07 | 0.71895 | 57.6 | 13907 | 109231 | | tur-eng | flores101-devtest | 0.64152 | 37.6 | 1012 | 24721 | | tur-eng | newsdev2016 | 0.58658 | 32.1 | 1001 | 21988 | | tur-eng | newstest2016 | 0.56960 | 29.3 | 3000 | 66175 | | tur-eng | newstest2017 | 0.57455 | 29.7 | 3007 | 67703 | | tur-eng | newstest2018 | 0.58488 | 30.7 | 3000 | 68725 | ## Acknowledgements The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland. ## Model conversion info * transformers version: 4.16.2 * OPUS-MT git hash: 3405783 * port time: Wed Apr 13 20:02:48 EEST 2022 * port machine: LM0-400-22516.local", + "model_explanation_gemini": "Translates text from Turkish to English using a transformer-based neural machine translation model." +} \ No newline at end of file diff --git a/data/model_data_json/Helsinki-NLP_opus-mt-zh-en.json b/data/model_data_json/Helsinki-NLP_opus-mt-zh-en.json new file mode 100644 index 0000000000000000000000000000000000000000..e525e2ea3c9992f6b2cc0d55c38b0c170def4758 --- /dev/null +++ b/data/model_data_json/Helsinki-NLP_opus-mt-zh-en.json @@ -0,0 +1,21 @@ +{ + "model_id": "Helsinki-NLP/opus-mt-zh-en", + "downloads": 423093, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "marian", + "text2text-generation", + "translation", + "zh", + "en", + "license:cc-by-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - zh - en tags: - translation license: cc-by-4.0 --- ### zho-eng ## Table of Contents - Model Details - Uses - Risks, Limitations and Biases - Training - Evaluation - Citation Information - How to Get Started With the Model ## Model Details - **Model Description:** - **Developed by:** Language Technology Research Group at the University of Helsinki - **Model Type:** Translation - **Language(s):** - Source Language: Chinese - Target Language: English - **License:** CC-BY-4.0 - **Resources for more information:** - GitHub Repo ## Uses #### Direct Use This model can be used for translation and text-to-text generation. 
## Risks, Limitations and Biases **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Further details about the dataset for this model can be found in the OPUS readme: zho-eng ## Training #### System Information * helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 * transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b * port_machine: brutasse * port_time: 2020-08-21-14:41 * src_multilingual: False * tgt_multilingual: False #### Training Data ##### Preprocessing * pre-processing: normalization + SentencePiece (spm32k,spm32k) * ref_len: 82826.0 * dataset: opus * download original weights: opus-2020-07-17.zip * test set translations: opus-2020-07-17.test.txt ## Evaluation #### Results * test set scores: opus-2020-07-17.eval.txt * brevity_penalty: 0.948 ## Benchmarks | testset | BLEU | chr-F | |-----------------------|-------|-------| | Tatoeba-test.zho.eng | 36.1 | 0.548 | ## Citation Information ## How to Get Started With the Model", + "model_explanation_gemini": "Translates text from Chinese to English for general-purpose translation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/HooshvareLab_bert-base-parsbert-uncased.json b/data/model_data_json/HooshvareLab_bert-base-parsbert-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..a049063782172ee6faca4de07003099d989e59bd --- /dev/null +++ b/data/model_data_json/HooshvareLab_bert-base-parsbert-uncased.json @@ -0,0 +1,18 @@ +{ + "model_id": "HooshvareLab/bert-base-parsbert-uncased", + "downloads": 99231, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "arxiv:2005.12515", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "## ParsBERT: Transformer-based Model for Persian Language Understanding ParsBERT is a monolingual language model based on Google’s BERT architecture with the same configurations as BERT-Base. Paper presenting ParsBERT: arXiv:2005.12515 All the models (downstream tasks) are uncased and trained with whole word masking. (coming soon stay tuned) --- ## Introduction This model is pre-trained on a large Persian corpus with various writing styles from numerous subjects (e.g., scientific, novels, news) with more than 2M documents. A large subset of this corpus was crawled manually. As a part of ParsBERT methodology, an extensive pre-processing combining POS tagging and WordPiece segmentation was carried out to bring the corpus into a proper format. This process produces more than 40M true sentences. ## Evaluation ParsBERT is evaluated on three NLP downstream tasks: Sentiment Analysis (SA), Text Classification, and Named Entity Recognition (NER). For this matter and due to insufficient resources, two large datasets for SA and two for text classification were manually composed, which are available for public use and benchmarking. ParsBERT outperformed all other language models, including multilingual BERT and other hybrid deep learning models for all tasks, improving the state-of-the-art performance in Persian language modeling. ## Results The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures. 
### Sentiment Analysis (SA) task | Dataset | ParsBERT | mBERT | DeepSentiPers | |:--------------------------:|:---------:|:-----:|:-------------:| | Digikala User Comments | 81.74* | 80.74 | - | | SnappFood User Comments | 88.12* | 87.87 | - | | SentiPers (Multi Class) | 71.11* | - | 69.33 | | SentiPers (Binary Class) | 92.13* | - | 91.98 | ### Text Classification (TC) task | Dataset | ParsBERT | mBERT | |:-----------------:|:--------:|:-----:| | Digikala Magazine | 93.59* | 90.72 | | Persian News | 97.19* | 95.79 | ### Named Entity Recognition (NER) task | Dataset | ParsBERT | mBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF | |:-------:|:--------:|:--------:|:----------:|:--------------:|:----------:|:----------------:|:------------:| | PEYMA | 93.10* | 86.64 | - | 90.59 | - | 84.00 | - | | ARMAN | 98.79* | 95.89 | 89.9 | 84.03 | 86.55 | - | 77.45 | **If you tested ParsBERT on a public dataset and you want to add your results to the table above, open a pull request or contact us. Also make sure to have your code available online so we can add it as a reference.** ## How to use ### TensorFlow 2.0 ### PyTorch ## NLP Tasks Tutorial Coming soon, stay tuned ## Cite Please cite the following paper in your publication if you are using ParsBERT in your research: ## Acknowledgments We hereby express our gratitude to the TensorFlow Research Cloud (TFRC) program for providing us with the necessary computation resources. We also thank Hooshvare Research Group for facilitating dataset gathering and scraping online text resources. ## Contributors - Mehrdad Farahani: Linkedin, Twitter, Github - Mohammad Gharachorloo: Linkedin, Twitter, Github - Marzieh Farahani: Linkedin, Twitter, Github - Mohammad Manthouri: Linkedin, Twitter, Github - Hooshvare Team: Official Website, Linkedin, Twitter, Github, Instagram ## Releases ### Release v0.1 (May 27, 2019) This is the first version of our ParsBERT based on BERT-Base", + "model_explanation_gemini": "ParsBERT is a Persian BERT-based model pre-trained on a large Persian corpus for tasks like sentiment analysis, text classification, and named entity recognition, outperforming other models in Persian language understanding." +} \ No newline at end of file diff --git a/data/model_data_json/HooshvareLab_bert-fa-base-uncased-ner-peyma.json b/data/model_data_json/HooshvareLab_bert-fa-base-uncased-ner-peyma.json new file mode 100644 index 0000000000000000000000000000000000000000..1fdce83742ecf146dd27876b26288e2619bfea03 --- /dev/null +++ b/data/model_data_json/HooshvareLab_bert-fa-base-uncased-ner-peyma.json @@ -0,0 +1,19 @@ +{ + "model_id": "HooshvareLab/bert-fa-base-uncased-ner-peyma", + "downloads": 83745, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "token-classification", + "fa", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: fa license: apache-2.0 --- # ParsBERT (v2.0) A Transformer-based Model for Persian Language Understanding We reconstructed the vocabulary and fine-tuned ParsBERT v1.1 on new Persian corpora in order to provide some functionalities for using ParsBERT in other scopes! Please follow the ParsBERT repo for the latest information about previous and current models. ## Persian NER [ARMAN, PEYMA] This task aims to extract named entities in the text, such as names, and label them with appropriate classes such as locations, organizations, etc. The datasets used for this task contain sentences that are marked in the IOB format.
In this format, tokens that are not part of an entity are tagged as O, the B tag corresponds to the first word of an entity, and the I tag corresponds to the rest of the tokens of the same entity. Both B and I tags are followed by a hyphen (or underscore), followed by the entity category. Therefore, the NER task is a multi-class token classification problem that labels the tokens upon being fed raw text. There are two primary datasets used in Persian NER: ARMAN and PEYMA. ### PEYMA The PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens, from which 41,148 tokens are tagged with seven different classes. 1. Organization 2. Money 3. Location 4. Date 5. Time 6. Person 7. Percent | Label | # | |:------------:|:-----:| | Organization | 16964 | | Money | 2037 | | Location | 8782 | | Date | 4259 | | Time | 732 | | Person | 7675 | | Percent | 699 | **Download** You can download the dataset from here ## Results The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures. | Dataset | ParsBERT v2 | ParsBERT v1 | mBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF | |---------|-------------|-------------|-------|------------|--------------|----------|----------------|------------| | PEYMA | 93.40* | 93.10 | 86.64 | - | 90.59 | - | 84.00 | - | ## How to use :hugs: | Notebook | Description | | |:----------|:-------------|------:| | How to use Pipelines | Simple and efficient way to use State-of-the-Art models on downstream tasks through transformers | ![Open In Colab]( | ### BibTeX entry and citation info Please cite in publications as the following: ## Questions? Post a Github issue on the ParsBERT Issues repo.", + "model_explanation_gemini": "Identifies and classifies named entities in Persian text into seven categories (e.g., organizations, locations) using token classification." +} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceH4_tiny-random-LlamaForCausalLM.json b/data/model_data_json/HuggingFaceH4_tiny-random-LlamaForCausalLM.json new file mode 100644 index 0000000000000000000000000000000000000000..2d11573061fa6fcf512f82b5312cf73c2f9c5876 --- /dev/null +++ b/data/model_data_json/HuggingFaceH4_tiny-random-LlamaForCausalLM.json @@ -0,0 +1,16 @@ +{ + "model_id": "HuggingFaceH4/tiny-random-LlamaForCausalLM", + "downloads": 165820, + "tags": [ + "transformers", + "pytorch", + "llama", + "text-generation", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- duplicated_from: HuggingFaceM4/tiny-random-LlamaForCausalLM ---", + "model_explanation_gemini": "Generates text sequentially for causal language modeling tasks using a tiny, randomly initialized Llama architecture."
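For the bert-fa-base-uncased-ner-peyma card above, the "How to use" notebook table points at the pipelines tutorial without showing code. A minimal sketch with the token-classification pipeline (the sample sentence and the aggregation setting are illustrative assumptions):

```python
# Sketch: Persian NER with HooshvareLab/bert-fa-base-uncased-ner-peyma.
# aggregation_strategy="simple" merges B-/I- sub-token tags into whole entities.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="HooshvareLab/bert-fa-base-uncased-ner-peyma",
    aggregation_strategy="simple",
)
for entity in ner("سازمان ملل متحد در سال ۱۹۴۵ تأسیس شد."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```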
+} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceH4_zephyr-7b-beta.json b/data/model_data_json/HuggingFaceH4_zephyr-7b-beta.json new file mode 100644 index 0000000000000000000000000000000000000000..7da443bb051a9f62c92ca5b07d4ef76ea4622413 --- /dev/null +++ b/data/model_data_json/HuggingFaceH4_zephyr-7b-beta.json @@ -0,0 +1,30 @@ +{ + "model_id": "HuggingFaceH4/zephyr-7b-beta", + "downloads": 613951, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mistral", + "text-generation", + "generated_from_trainer", + "conversational", + "en", + "dataset:HuggingFaceH4/ultrachat_200k", + "dataset:HuggingFaceH4/ultrafeedback_binarized", + "arxiv:2305.18290", + "arxiv:2310.16944", + "arxiv:2305.14233", + "arxiv:2310.01377", + "base_model:mistralai/Mistral-7B-v0.1", + "base_model:finetune:mistralai/Mistral-7B-v0.1", + "license:mit", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - generated_from_trainer license: mit datasets: - HuggingFaceH4/ultrachat_200k - HuggingFaceH4/ultrafeedback_binarized language: - en base_model: mistralai/Mistral-7B-v0.1 widget: - example_title: Pirate! messages: - role: system content: You are a pirate chatbot who always responds with Arr! - role: user content: \"There's a llama on my lawn, how can I get rid of him?\" output: text: >- Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare sight, but I've got a plan that might help ye get rid of 'im. Ye'll need to gather some carrots and hay, and then lure the llama away with the promise of a tasty treat. Once he's gone, ye can clean up yer lawn and enjoy the peace and quiet once again. But beware, me hearty, for there may be more llamas where that one came from! Arr! 
pipeline_tag: text-generation model-index: - name: zephyr-7b-beta results: # AI2 Reasoning Challenge (25-Shot) - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm name: normalized accuracy value: 62.03071672354948 source: name: Open LLM Leaderboard url: # HellaSwag (10-shot) - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm name: normalized accuracy value: 84.35570603465445 source: name: Open LLM Leaderboard url: # DROP (3-shot) - task: type: text-generation name: Text Generation dataset: name: Drop (3-Shot) type: drop split: validation args: num_few_shot: 3 metrics: - type: f1 name: f1 score value: 9.662437080536909 source: name: Open LLM Leaderboard url: # TruthfulQA (0-shot) - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 57.44916942762855 source: name: Open LLM Leaderboard url: # GSM8k (5-shot) - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc name: accuracy value: 12.736921910538287 source: name: Open LLM Leaderboard url: # MMLU (5-Shot) - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc name: accuracy value: 61.07 source: name: Open LLM Leaderboard url: # Winogrande (5-shot) - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc name: accuracy value: 77.74269928966061 source: name: Open LLM Leaderboard url: # AlpacaEval (taken from model card) - task: type: text-generation name: Text Generation dataset: name: AlpacaEval type: tatsu-lab/alpaca_eval metrics: - type: unknown name: win rate value: 0.9060 source: url: # MT-Bench (taken from model card) - task: type: text-generation name: Text Generation dataset: name: MT-Bench type: unknown metrics: - type: unknown name: score value: 7.34 source: url: --- \"Zephyr # Model Card for Zephyr 7B β Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). We found that removing the in-built alignment of these datasets boosted performance on MT Bench and made the model more helpful. However, this means that model is likely to generate problematic text when prompted to do so. You can find more details in the technical report. ## Model description - **Model type:** A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets. 
- **Language(s) (NLP):** Primarily English - **License:** MIT - **Finetuned from model:** mistralai/Mistral-7B-v0.1 ### Model Sources - **Repository:** - **Demo:** - **Chatbot Arena:** Evaluate Zephyr 7B against 10+ LLMs in the LMSYS arena: ## Performance At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the MT-Bench and AlpacaEval benchmarks: | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) | |-------------|-----|----|---------------|--------------| | StableLM-Tuned-α | 7B| dSFT |2.75| -| | MPT-Chat | 7B |dSFT |5.42| -| | Xwin-LMv0.1 | 7B| dPPO| 6.19| 87.83| | Mistral-Instructv0.1 | 7B| - | 6.84 |-| | Zephyr-7b-α |7B| dDPO| 6.88| -| | **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** | | Falcon-Instruct | 40B |dSFT |5.17 |45.71| | Guanaco | 65B | SFT |6.41| 71.80| | Llama2-Chat | 70B |RLHF |6.86| 92.66| | Vicuna v1.3 | 33B |dSFT |7.12 |88.99| | WizardLM v1.0 | 70B |dSFT |7.71 |-| | Xwin-LM v0.1 | 70B |dPPO |- |95.57| | GPT-3.5-turbo | - |RLHF |7.94 |89.37| | Claude 2 | - |RLHF |8.06| 91.36| | GPT-4 | -| RLHF |8.99| 95.28| In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B: !image/png However, on more complex tasks like coding and mathematics, Zephyr-7B-β lags behind proprietary models and more research is needed to close the gap. ## Intended uses & limitations The model was initially fine-tuned on a filtered and preprocessed version of the UltraChat (HuggingFaceH4/ultrachat_200k) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and model completions that are ranked by GPT-4. As a result, the model can be used for chat and you can check out our demo to test its capabilities. You can find the datasets used for training Zephyr-7B-β here. Here's how you can run the model using the pipeline() function from 🤗 Transformers (a sketch follows after this section): ## Bias, Risks, and Limitations Zephyr-7B-β has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus used to train the base model (mistralai/Mistral-7B-v0.1) were; however, it is likely to have included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this. 
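The card's original snippet was stripped from this dump. Below is a minimal sketch of the pipeline() usage it describes, reusing the pirate widget example from this card; the sampling settings are illustrative defaults, not taken from the card:

```python
# Minimal sketch of chat generation with Zephyr-7B-beta via the pipeline() API.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Messages reuse the widget example from this card.
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds with Arr!"},
    {"role": "user", "content": "There's a llama on my lawn, how can I get rid of him?"},
]
# apply_chat_template formats the messages with Zephyr's chat template.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```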
## Training and evaluation data During DPO training, this model achieves the following results on the evaluation set: - Loss: 0.7496 - Rewards/chosen: -4.5221 - Rewards/rejected: -8.3184 - Rewards/accuracies: 0.7812 - Rewards/margins: 3.7963 - Logps/rejected: -340.1541 - Logps/chosen: -299.4561 - Logits/rejected: -2.3081 - Logits/chosen: -2.3531 ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 5e-07 - train_batch_size: 2 - eval_batch_size: 4 - seed: 42 - distributed_type: multi-GPU - num_devices: 16 - total_train_batch_size: 32 - total_eval_batch_size: 64 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_ratio: 0.1 - num_epochs: 3.0 ### Training results The table below shows the full set of DPO training metrics: | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:| | 0.6284 | 0.05 | 100 | 0.6098 | 0.0425 | -0.1872 | 0.7344 | 0.2297 | -258.8416 | -253.8099 | -2.7976 | -2.8234 | | 0.4908 | 0.1 | 200 | 0.5426 | -0.0279 | -0.6842 | 0.75 | 0.6563 | -263.8124 | -254.5145 | -2.7719 | -2.7960 | | 0.5264 | 0.15 | 300 | 0.5324 | 0.0414 | -0.9793 | 0.7656 | 1.0207 | -266.7627 | -253.8209 | -2.7892 | -2.8122 | | 0.5536 | 0.21 | 400 | 0.4957 | -0.0185 | -1.5276 | 0.7969 | 1.5091 | -272.2460 | -254.4203 | -2.8542 | -2.8764 | | 0.5362 | 0.26 | 500 | 0.5031 | -0.2630 | -1.5917 | 0.7812 | 1.3287 | -272.8869 | -256.8653 | -2.8702 | -2.8958 | | 0.5966 | 0.31 | 600 | 0.5963 | -0.2993 | -1.6491 | 0.7812 | 1.3499 | -273.4614 | -257.2279 | -2.8778 | -2.8986 | | 0.5014 | 0.36 | 700 | 0.5382 | -0.2859 | -1.4750 | 0.75 | 1.1891 | -271.7204 | -257.0942 | -2.7659 | -2.7869 | | 0.5334 | 0.41 | 800 | 0.5677 | -0.4289 | -1.8968 | 0.7969 | 1.4679 | -275.9378 | -258.5242 | -2.7053 | -2.7265 | | 0.5251 | 0.46 | 900 | 0.5772 | -0.2116 | -1.3107 | 0.7344 | 1.0991 | -270.0768 | -256.3507 | -2.8463 | -2.8662 | | 0.5205 | 0.52 | 1000 | 0.5262 | -0.3792 | -1.8585 | 0.7188 | 1.4793 | -275.5552 | -258.0276 | -2.7893 | -2.7979 | | 0.5094 | 0.57 | 1100 | 0.5433 | -0.6279 | -1.9368 | 0.7969 | 1.3089 | -276.3377 | -260.5136 | -2.7453 | -2.7536 | | 0.5837 | 0.62 | 1200 | 0.5349 | -0.3780 | -1.9584 | 0.7656 | 1.5804 | -276.5542 | -258.0154 | -2.7643 | -2.7756 | | 0.5214 | 0.67 | 1300 | 0.5732 | -1.0055 | -2.2306 | 0.7656 | 1.2251 | -279.2761 | -264.2903 | -2.6986 | -2.7113 | | 0.6914 | 0.72 | 1400 | 0.5137 | -0.6912 | -2.1775 | 0.7969 | 1.4863 | -278.7448 | -261.1467 | -2.7166 | -2.7275 | | 0.4655 | 0.77 | 1500 | 0.5090 | -0.7987 | -2.2930 | 0.7031 | 1.4943 | -279.8999 | -262.2220 | -2.6651 | -2.6838 | | 0.5731 | 0.83 | 1600 | 0.5312 | -0.8253 | -2.3520 | 0.7812 | 1.5268 | -280.4902 | -262.4876 | -2.6543 | -2.6728 | | 0.5233 | 0.88 | 1700 | 0.5206 | -0.4573 | -2.0951 | 0.7812 | 1.6377 | -277.9205 | -258.8084 | -2.6870 | -2.7097 | | 0.5593 | 0.93 | 1800 | 0.5231 | -0.5508 | -2.2000 | 0.7969 | 1.6492 | -278.9703 | -259.7433 | -2.6221 | -2.6519 | | 0.4967 | 0.98 | 1900 | 0.5290 | -0.5340 | -1.9570 | 0.8281 | 1.4230 | -276.5395 | -259.5749 | -2.6564 | -2.6878 | | 0.0921 | 1.03 | 2000 | 0.5368 | -1.1376 | -3.1615 | 0.7812 | 2.0239 | -288.5854 | -265.6111 | -2.6040 | -2.6345 | | 0.0733 | 1.08 | 2100 | 0.5453 
| -1.1045 | -3.4451 | 0.7656 | 2.3406 | -291.4208 | -265.2799 | -2.6289 | -2.6595 | | 0.0972 | 1.14 | 2200 | 0.5571 | -1.6915 | -3.9823 | 0.8125 | 2.2908 | -296.7934 | -271.1505 | -2.6471 | -2.6709 | | 0.1058 | 1.19 | 2300 | 0.5789 | -1.0621 | -3.8941 | 0.7969 | 2.8319 | -295.9106 | -264.8563 | -2.5527 | -2.5798 | | 0.2423 | 1.24 | 2400 | 0.5455 | -1.1963 | -3.5590 | 0.7812 | 2.3627 | -292.5599 | -266.1981 | -2.5414 | -2.5784 | | 0.1177 | 1.29 | 2500 | 0.5889 | -1.8141 | -4.3942 | 0.7969 | 2.5801 | -300.9120 | -272.3761 | -2.4802 | -2.5189 | | 0.1213 | 1.34 | 2600 | 0.5683 | -1.4608 | -3.8420 | 0.8125 | 2.3812 | -295.3901 | -268.8436 | -2.4774 | -2.5207 | | 0.0889 | 1.39 | 2700 | 0.5890 | -1.6007 | -3.7337 | 0.7812 | 2.1330 | -294.3068 | -270.2423 | -2.4123 | -2.4522 | | 0.0995 | 1.45 | 2800 | 0.6073 | -1.5519 | -3.8362 | 0.8281 | 2.2843 | -295.3315 | -269.7538 | -2.4685 | -2.5050 | | 0.1145 | 1.5 | 2900 | 0.5790 | -1.7939 | -4.2876 | 0.8438 | 2.4937 | -299.8461 | -272.1744 | -2.4272 | -2.4674 | | 0.0644 | 1.55 | 3000 | 0.5735 | -1.7285 | -4.2051 | 0.8125 | 2.4766 | -299.0209 | -271.5201 | -2.4193 | -2.4574 | | 0.0798 | 1.6 | 3100 | 0.5537 | -1.7226 | -4.2850 | 0.8438 | 2.5624 | -299.8200 | -271.4610 | -2.5367 | -2.5696 | | 0.1013 | 1.65 | 3200 | 0.5575 | -1.5715 | -3.9813 | 0.875 | 2.4098 | -296.7825 | -269.9498 | -2.4926 | -2.5267 | | 0.1254 | 1.7 | 3300 | 0.5905 | -1.6412 | -4.4703 | 0.8594 | 2.8291 | -301.6730 | -270.6473 | -2.5017 | -2.5340 | | 0.085 | 1.76 | 3400 | 0.6133 | -1.9159 | -4.6760 | 0.8438 | 2.7601 | -303.7296 | -273.3941 | -2.4614 | -2.4960 | | 0.065 | 1.81 | 3500 | 0.6074 | -1.8237 | -4.3525 | 0.8594 | 2.5288 | -300.4951 | -272.4724 | -2.4597 | -2.5004 | | 0.0755 | 1.86 | 3600 | 0.5836 | -1.9252 | -4.4005 | 0.8125 | 2.4753 | -300.9748 | -273.4872 | -2.4327 | -2.4716 | | 0.0746 | 1.91 | 3700 | 0.5789 | -1.9280 | -4.4906 | 0.8125 | 2.5626 | -301.8762 | -273.5149 | -2.4686 | -2.5115 | | 0.1348 | 1.96 | 3800 | 0.6015 | -1.8658 | -4.2428 | 0.8281 | 2.3769 | -299.3976 | -272.8936 | -2.4943 | -2.5393 | | 0.0217 | 2.01 | 3900 | 0.6122 | -2.3335 | -4.9229 | 0.8281 | 2.5894 | -306.1988 | -277.5699 | -2.4841 | -2.5272 | | 0.0219 | 2.07 | 4000 | 0.6522 | -2.9890 | -6.0164 | 0.8281 | 3.0274 | -317.1334 | -284.1248 | -2.4105 | -2.4545 | | 0.0119 | 2.12 | 4100 | 0.6922 | -3.4777 | -6.6749 | 0.7969 | 3.1972 | -323.7187 | -289.0121 | -2.4272 | -2.4699 | | 0.0153 | 2.17 | 4200 | 0.6993 | -3.2406 | -6.6775 | 0.7969 | 3.4369 | -323.7453 | -286.6413 | -2.4047 | -2.4465 | | 0.011 | 2.22 | 4300 | 0.7178 | -3.7991 | -7.4397 | 0.7656 | 3.6406 | -331.3667 | -292.2260 | -2.3843 | -2.4290 | | 0.0072 | 2.27 | 4400 | 0.6840 | -3.3269 | -6.8021 | 0.8125 | 3.4752 | -324.9908 | -287.5042 | -2.4095 | -2.4536 | | 0.0197 | 2.32 | 4500 | 0.7013 | -3.6890 | -7.3014 | 0.8125 | 3.6124 | -329.9841 | -291.1250 | -2.4118 | -2.4543 | | 0.0182 | 2.37 | 4600 | 0.7476 | -3.8994 | -7.5366 | 0.8281 | 3.6372 | -332.3356 | -293.2291 | -2.4163 | -2.4565 | | 0.0125 | 2.43 | 4700 | 0.7199 | -4.0560 | -7.5765 | 0.8438 | 3.5204 | -332.7345 | -294.7952 | -2.3699 | -2.4100 | | 0.0082 | 2.48 | 4800 | 0.7048 | -3.6613 | -7.1356 | 0.875 | 3.4743 | -328.3255 | -290.8477 | -2.3925 | -2.4303 | | 0.0118 | 2.53 | 4900 | 0.6976 | -3.7908 | -7.3152 | 0.8125 | 3.5244 | -330.1224 | -292.1431 | -2.3633 | -2.4047 | | 0.0118 | 2.58 | 5000 | 0.7198 | -3.9049 | -7.5557 | 0.8281 | 3.6508 | -332.5271 | -293.2844 | -2.3764 | -2.4194 | | 0.006 | 2.63 | 5100 | 0.7506 | -4.2118 | -7.9149 | 0.8125 | 3.7032 | -336.1194 | -296.3530 | -2.3407 | -2.3860 | 
| 0.0143 | 2.68 | 5200 | 0.7408 | -4.2433 | -7.9802 | 0.8125 | 3.7369 | -336.7721 | -296.6682 | -2.3509 | -2.3946 | | 0.0057 | 2.74 | 5300 | 0.7552 | -4.3392 | -8.0831 | 0.7969 | 3.7439 | -337.8013 | -297.6275 | -2.3388 | -2.3842 | | 0.0138 | 2.79 | 5400 | 0.7404 | -4.2395 | -7.9762 | 0.8125 | 3.7367 | -336.7322 | -296.6304 | -2.3286 | -2.3737 | | 0.0079 | 2.84 | 5500 | 0.7525 | -4.4466 | -8.2196 | 0.7812 | 3.7731 | -339.1662 | -298.7007 | -2.3200 | -2.3641 | | 0.0077 | 2.89 | 5600 | 0.7520 | -4.5586 | -8.3485 | 0.7969 | 3.7899 | -340.4545 | -299.8206 | -2.3078 | -2.3517 | | 0.0094 | 2.94 | 5700 | 0.7527 | -4.5542 | -8.3509 | 0.7812 | 3.7967 | -340.4790 | -299.7773 | -2.3062 | -2.3510 | | 0.0054 | 2.99 | 5800 | 0.7520 | -4.5169 | -8.3079 | 0.7812 | 3.7911 | -340.0493 | -299.4038 | -2.3081 | -2.3530 | ### Framework versions - Transformers 4.35.0.dev0 - Pytorch 2.0.1+cu118 - Datasets 2.12.0 - Tokenizers 0.14.0 ## Citation If you find Zephyr-7B-β is useful in your work, please cite it with: If you use the UltraChat or UltraFeedback datasets, please cite the original works: # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric | Value | |-----------------------|---------------------------| | Avg. | 52.15 | | ARC (25-shot) | 62.03 | | HellaSwag (10-shot) | 84.36 | | MMLU (5-shot) | 61.07 | | TruthfulQA (0-shot) | 57.45 | | Winogrande (5-shot) | 77.74 | | GSM8K (5-shot) | 12.74 | | DROP (3-shot) | 9.66 |", + "model_explanation_gemini": "A fine-tuned 7B language model optimized for text generation and acting as a helpful assistant, trained on synthetic datasets using Direct Preference Optimization." +} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolLM-135M.json b/data/model_data_json/HuggingFaceTB_SmolLM-135M.json new file mode 100644 index 0000000000000000000000000000000000000000..ff0f59a1649679a6e8a423d23e36558b61b61670 --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolLM-135M.json @@ -0,0 +1,20 @@ +{ + "model_id": "HuggingFaceTB/SmolLM-135M", + "downloads": 189091, + "tags": [ + "transformers", + "onnx", + "safetensors", + "llama", + "text-generation", + "en", + "dataset:HuggingFaceTB/smollm-corpus", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en datasets: - HuggingFaceTB/smollm-corpus --- # SmolLM
\"SmolLM\"
## Table of Contents 1. Model Summary 2. Limitations 3. Training 4. License 5. Citation ## Model Summary SmolLM is a series of state-of-the-art small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are built on Cosmo-Corpus, a meticulously curated high-quality training dataset. Cosmo-Corpus includes Cosmopedia v2 (28B tokens of synthetic textbooks and stories generated by Mixtral), Python-Edu (4B tokens of educational Python samples from The Stack), and FineWeb-Edu (220B tokens of deduplicated educational web samples from FineWeb). SmolLM models have shown promising results when compared to other models in their size categories across various benchmarks testing common sense reasoning and world knowledge. For detailed information on training, benchmarks and performance, please refer to our full blog post. This is the SmolLM-135M model. ### Generation #### Running the model on CPU/GPU/multi GPU (a usage sketch follows at the end of this card) * _Using full precision_ * _Using torch.bfloat16_ #### Quantized Versions through bitsandbytes * _Using 8-bit precision (int8)_ # Limitations While SmolLM models have been trained on a diverse dataset including educational content and synthetic texts, they have limitations. The models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content. For a more comprehensive discussion of the models' capabilities and limitations, please refer to our full blog post. This repository contains a converted version of our latest trained model. We've noticed a small performance difference between this converted checkpoint (transformers) and the original (nanotron). We're currently working to resolve this issue. # Training ## Model - **Architecture:** For architecture detail, see the blog post. - **Pretraining steps:** 600k - **Pretraining tokens:** 600B - **Precision:** bfloat16 - **Tokenizer:** HuggingFaceTB/cosmo2-tokenizer ## Hardware - **GPUs:** 64 H100 ## Software - **Training Framework:** Nanotron # License Apache 2.0 # Citation", "model_explanation_gemini": "SmolLM-135M is a small English language model trained on diverse educational and synthetic datasets for generating text and performing reasoning tasks, with limitations in factual accuracy and bias." 
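The generation snippets above were stripped from this dump. Here is a minimal sketch of the CPU/GPU usage, assuming the standard transformers causal-LM API; the prompt string is an illustrative placeholder:

```python
# Minimal sketch of running SmolLM-135M; full precision by default.
# For the bfloat16 variant mentioned above, pass torch_dtype=torch.bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)  # placeholder prompt
outputs = model.generate(inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0]))
```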
+} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolLM2-1.7B-Instruct.json b/data/model_data_json/HuggingFaceTB_SmolLM2-1.7B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..cd4a826182bf2c6a91fcf4ed35e79d4237a6c392 --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolLM2-1.7B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "HuggingFaceTB/SmolLM2-1.7B-Instruct", + "downloads": 80704, + "tags": [ + "transformers", + "tensorboard", + "onnx", + "safetensors", + "llama", + "text-generation", + "transformers.js", + "conversational", + "en", + "arxiv:2502.02737", + "base_model:HuggingFaceTB/SmolLM2-1.7B", + "base_model:quantized:HuggingFaceTB/SmolLM2-1.7B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en pipeline_tag: text-generation tags: - safetensors - onnx - transformers.js base_model: - HuggingFaceTB/SmolLM2-1.7B --- # SmolLM2 !image/png ## Table of Contents 1. Model Summary 2. Evaluation 3. Examples 4. Limitations 5. Training 6. License 7. Citation ## Model Summary SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper. The 1.7B variant demonstrates significant advances over its predecessor SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new mathematics and coding datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback. The instruct model additionally supports tasks such as text rewriting, summarization and function calling thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. You can find the SFT dataset here. For more details refer to the project repository, where you will find pre-training, post-training, evaluation and local inference code. ### How to use #### Transformers #### Chat in TRL You can also use the TRL CLI to chat with the model from the terminal: #### Transformers.js ## Evaluation In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them. 
## Base Pre-Trained Model | Metric | SmolLM2-1.7B | Llama-1B | Qwen2.5-1.5B | SmolLM1-1.7B | |------------------|--------------|-------------|---------------|--------------| | HellaSwag | **68.7** | 61.2 | 66.4 | 62.9 | | ARC (Average) | **60.5** | 49.2 | 58.5 | 59.9 | | PIQA | **77.6** | 74.8 | 76.1 | 76.0 | | MMLU-Pro (MCF) | **19.4** | 11.7 | 13.7 | 10.8 | | CommonsenseQA | **43.6** | 41.2 | 34.1 | 38.0 | | TriviaQA | **36.7** | 28.1 | 20.9 | 22.5 | | Winogrande | **59.4** | 57.8 | 59.3 | 54.7 | | OpenBookQA | 42.2 | 38.4 | 40.0 | **42.4** | | GSM8K (5-shot) | 31.0 | 7.2 | **61.3** | 5.5 | ## Instruction Model | Metric | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct | |:-----------------------------|:---------------------:|:-----------------:|:----------------------:|:----------------------:| | IFEval (Average prompt/inst) | **56.7** | 53.5 | 47.4 | 23.1 | | MT-Bench | 6.13 | 5.48 | **6.52** | 4.33 | | OpenRewrite-Eval (micro_avg RougeL) | 44.9 | 39.2 | **46.9** | NaN | | HellaSwag | **66.1** | 56.1 | 60.9 | 55.5 | | ARC (Average) | **51.7** | 41.6 | 46.2 | 43.7 | | PIQA | **74.4** | 72.3 | 73.2 | 71.6 | | MMLU-Pro (MCF) | 19.3 | 12.7 | **24.2** | 11.7 | | BBH (3-shot) | 32.2 | 27.6 | **35.3** | 25.7 | | GSM8K (5-shot) | **48.2** | 26.8 | 42.8 | 4.62 | ## Examples Below are some system and instruct prompts that work well for special tasks: ### Text rewriting ### Summarization ### Function calling SmolLM2-1.7B-Instruct can handle function calling; it scores 27% on the BFCL Leaderboard. Here's how you can leverage it: More details, such as parallel function calls and handling tools that are not available, can be found here. ## Limitations SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content. ## Training ### Model - **Architecture:** Transformer decoder - **Pretraining tokens:** 11T - **Precision:** bfloat16 ### Hardware - **GPUs:** 256 H100 ### Software - **Training Framework:** nanotron - **Alignment Handbook** alignment-handbook ## License Apache 2.0 ## Citation" +} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolLM2-135M-Instruct.json b/data/model_data_json/HuggingFaceTB_SmolLM2-135M-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..e454f855cb95d3b0cc62ba9e0837cc9292db30af --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolLM2-135M-Instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "HuggingFaceTB/SmolLM2-135M-Instruct", + "downloads": 368502, + "tags": [ + "transformers", + "tensorboard", + "onnx", + "safetensors", + "llama", + "text-generation", + "transformers.js", + "conversational", + "en", + "arxiv:2502.02737", + "base_model:HuggingFaceTB/SmolLM2-135M", + "base_model:quantized:HuggingFaceTB/SmolLM2-135M", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en pipeline_tag: text-generation tags: - safetensors - onnx - transformers.js base_model: - HuggingFaceTB/SmolLM2-135M --- # SmolLM2 !image/png ## Table of Contents 1. Model Summary 2. Limitations 3. 
Training 4. License 5. Citation ## Model Summary SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper. SmolLM2 demonstrates significant advances over its predecessor SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M model was trained on 2 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new filtered datasets we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback. The instruct model additionally supports tasks such as text rewriting, summarization and function calling (for the 1.7B) thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. You can find the SFT dataset here and the finetuning code in the linked repository. ### How to use ### Transformers (a chat sketch follows at the end of this card) ### Chat in TRL You can also use the TRL CLI to chat with the model from the terminal: ## Evaluation In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them. ## Base pre-trained model | Metrics | SmolLM2-135M-8k | SmolLM-135M | |:-------------------|:----------------:|:------------:| | HellaSwag | **42.1** | 41.2 | | ARC (Average) | **43.9** | 42.4 | | PIQA | 68.4 | 68.4 | | MMLU (cloze) | **31.5** | 30.2 | | CommonsenseQA | **33.9** | 32.7 | | TriviaQA | 4.1 | **4.3** | | Winogrande | 51.3 | 51.3 | | OpenBookQA | **34.6** | 34.0 | | GSM8K (5-shot) | **1.4** | 1.0 | ## Instruction model | Metric | SmolLM2-135M-Instruct | SmolLM-135M-Instruct | |:-----------------------------|:---------------------:|:--------------------:| | IFEval (Average prompt/inst) | **29.9** | 17.2 | | MT-Bench | **19.8** | 16.8 | | HellaSwag | **40.9** | 38.9 | | ARC (Average) | **37.3** | 33.9 | | PIQA | **66.3** | 64.0 | | MMLU (cloze) | **29.3** | 28.3 | | BBH (3-shot) | **28.2** | 25.2 | | GSM8K (5-shot) | 1.4 | 1.4 | ## Limitations SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content. ## Training ### Model - **Architecture:** Transformer decoder - **Pretraining tokens:** 2T - **Precision:** bfloat16 ### Hardware - **GPUs:** 64 H100 ### Software - **Training Framework:** nanotron ## License Apache 2.0 ## Citation", "model_explanation_gemini": "A compact 135M-parameter language model fine-tuned for instruction following, text generation, summarization, and rewriting tasks, optimized for on-device use with improved performance over its predecessor." 
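The Transformers snippet referenced in the SmolLM2-135M-Instruct card above is missing from this dump. Here is a minimal sketch, assuming the standard chat-template API; the question text is an illustrative placeholder:

```python
# Minimal sketch of chatting with SmolLM2-135M-Instruct via its chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]  # placeholder question
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, do_sample=True, temperature=0.2, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```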
+} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolLM2-135M.json b/data/model_data_json/HuggingFaceTB_SmolLM2-135M.json new file mode 100644 index 0000000000000000000000000000000000000000..80588b977984491cf3584aa9dd68595f08bf6b46 --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolLM2-135M.json @@ -0,0 +1,19 @@ +{ + "model_id": "HuggingFaceTB/SmolLM2-135M", + "downloads": 569614, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "en", + "arxiv:2502.02737", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en --- # SmolLM2 !image/png ## Table of Contents 1. Model Summary 2. Limitations 3. Training 4. License 5. Citation ## Model Summary SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper. SmolLM2 demonstrates significant advances over its predecessor SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M model was trained on 2 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new filtered datasets we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback. The instruct model additionally supports tasks such as text rewriting, summarization and function calling (for the 1.7B) thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. You can find the SFT dataset here and the finetuning code in the linked repository. ### How to use #### Running the model on CPU/GPU/multi GPU * _Using full precision_ * _Using torch.bfloat16_ ## Evaluation In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them. ## Base pre-trained model | Metrics | SmolLM2-135M-8k | SmolLM-135M | |:-------------------|:----------------:|:------------:| | HellaSwag | **42.1** | 41.2 | | ARC (Average) | **43.9** | 42.4 | | PIQA | 68.4 | 68.4 | | MMLU (cloze) | **31.5** | 30.2 | | CommonsenseQA | **33.9** | 32.7 | | TriviaQA | 4.1 | **4.3** | | Winogrande | 51.3 | 51.3 | | OpenBookQA | **34.6** | 34.0 | | GSM8K (5-shot) | **1.4** | 1.0 | ## Instruction model | Metric | SmolLM2-135M-Instruct | SmolLM-135M-Instruct | |:-----------------------------|:---------------------:|:--------------------:| | IFEval (Average prompt/inst) | **29.9** | 17.2 | | MT-Bench | **1.98** | 1.68 | | HellaSwag | **40.9** | 38.9 | | ARC (Average) | **37.3** | 33.9 | | PIQA | **66.3** | 64.0 | | MMLU (cloze) | **29.3** | 28.3 | | BBH (3-shot) | **28.2** | 25.2 | | GSM8K (5-shot) | 1.4 | 1.4 | ## Limitations SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content. 
## Training ### Model - **Architecture:** Transformer decoder - **Pretraining tokens:** 2T - **Precision:** bfloat16 ### Hardware - **GPUs:** 64 H100 ### Software - **Training Framework:** nanotron ## License Apache 2.0 ## Citation", "model_explanation_gemini": "A compact 135M-parameter language model optimized for on-device use, capable of instruction following, text rewriting, summarization, and reasoning tasks, trained on diverse datasets and fine-tuned with DPO for improved performance." +} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolLM2-360M-Instruct.json b/data/model_data_json/HuggingFaceTB_SmolLM2-360M-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..b7665165791f79f8680e7d92ea5b28c1eb346e17 --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolLM2-360M-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "HuggingFaceTB/SmolLM2-360M-Instruct", + "downloads": 840700, + "tags": [ + "transformers", + "tensorboard", + "onnx", + "safetensors", + "llama", + "text-generation", + "transformers.js", + "conversational", + "en", + "arxiv:2502.02737", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en pipeline_tag: text-generation tags: - safetensors - onnx - transformers.js --- # SmolLM2 !image/png ## Table of Contents 1. Model Summary 2. Limitations 3. Training 4. License 5. Citation ## Model Summary SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper. SmolLM2 demonstrates significant advances over its predecessor SmolLM1, particularly in instruction following, knowledge, and reasoning. The 360M model was trained on 4 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new filtered datasets we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback. The instruct model additionally supports tasks such as text rewriting, summarization and function calling (for the 1.7B) thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. You can find the SFT dataset here and the finetuning code in the alignment handbook. For more details refer to the project repository, where you will find pre-training, post-training, evaluation and local inference code. ### How to use ### Transformers ### Chat in TRL You can also use the TRL CLI to chat with the model from the terminal: ## Evaluation In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them. 
## Base Pre-Trained Model | Metrics | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M | |:-------------------|:------------:|:------------:|:------------:| | HellaSwag | **54.5** | 51.2 | 51.8 | | ARC (Average) | **53.0** | 45.4 | 50.1 | | PIQA | **71.7** | 69.9 | 71.6 | | MMLU (cloze) | **35.8** | 33.7 | 34.4 | | CommonsenseQA | **38.0** | 31.6 | 35.3 | | TriviaQA | **16.9** | 4.3 | 9.1 | | Winogrande | 52.5 | **54.1** | 52.8 | | OpenBookQA | **37.4** | **37.4** | 37.2 | | GSM8K (5-shot) | 3.2 | **33.4** | 1.6 | ## Instruction Model | Metric | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct | |:-----------------------------|:---------------------:|:---------------------:|:---------------------:| | IFEval (Average prompt/inst) | **41.0** | 31.6 | 19.8 | | MT-Bench | 3.66 | **4.16** | 3.37 | | HellaSwag | **52.1** | 48.0 | 47.9 | | ARC (Average) | **43.7** | 37.3 | 38.8 | | PIQA | **70.8** | 67.2 | 69.4 | | MMLU (cloze) | **32.8** | 31.7 | 30.6 | | BBH (3-shot) | 27.3 | **30.7** | 24.4 | | GSM8K (5-shot) | 7.43 | **26.8** | 1.36 | ## Limitations SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content. ## Training ### Model - **Architecture:** Transformer decoder - **Pretraining tokens:** 4T - **Precision:** bfloat16 ### Hardware - **GPUs:** 64 H100 ### Software - **Training Framework:** nanotron ## License Apache 2.0 ## Citation", + "model_explanation_gemini": "A compact 360M-parameter language model fine-tuned for instruction-following tasks like text rewriting and summarization, optimized for on-device use with enhanced reasoning and knowledge capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolLM2-360M.json b/data/model_data_json/HuggingFaceTB_SmolLM2-360M.json new file mode 100644 index 0000000000000000000000000000000000000000..6affa414c96e8213d7ae21056bc2fc3b53819fe9 --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolLM2-360M.json @@ -0,0 +1,19 @@ +{ + "model_id": "HuggingFaceTB/SmolLM2-360M", + "downloads": 98113, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "en", + "arxiv:2502.02737", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en --- # SmolLM2 !image/png ## Table of Contents 1. Model Summary 2. Limitations 3. Training 4. License 5. Citation ## Model Summary SmolLM2 is a family of compact language models available in three size: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper: SmolLM2 demonstrates significant advances over its predecessor SmolLM1, particularly in instruction following, knowledge, reasoning. The 360M model was trained on 4 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new filtered datasets we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. 
We then applied Direct Preference Optimization (DPO) using UltraFeedback. The instruct model additionally supports tasks such as text rewriting, summarization and function calling thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. For more details refer to the project repository, where you will find pre-training, post-training, evaluation and local inference code. ### How to use #### Running the model on CPU/GPU/multi GPU * _Using full precision_ * _Using torch.bfloat16_ ## Evaluation In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them. ## Base Pre-Trained Model | Metrics | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M | |:-------------------|:------------:|:------------:|:------------:| | HellaSwag | **54.5** | 51.2 | 51.8 | | ARC (Average) | **53.0** | 45.4 | 50.1 | | PIQA | **71.7** | 69.9 | 71.6 | | MMLU (cloze) | **35.8** | 33.7 | 34.4 | | CommonsenseQA | **38.0** | 31.6 | 35.3 | | TriviaQA | **16.9** | 4.3 | 9.1 | | Winogrande | 52.5 | **54.1** | 52.8 | | OpenBookQA | **37.4** | **37.4** | 37.2 | | GSM8K (5-shot) | 3.2 | **33.4** | 1.6 | ## Instruction Model | Metric | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct | |:-----------------------------|:---------------------:|:---------------------:|:---------------------:| | IFEval (Average prompt/inst) | **41.0** | 31.6 | 19.8 | | MT-Bench | 3.66 | **4.16** | 3.37 | | HellaSwag | **52.1** | 48.0 | 47.9 | | ARC (Average) | **43.7** | 37.3 | 38.8 | | PIQA | **70.8** | 67.2 | 69.4 | | MMLU (cloze) | **32.8** | 31.7 | 30.6 | | BBH (3-shot) | 27.3 | **30.7** | 24.4 | | GSM8K (5-shot) | 7.43 | **26.8** | 1.36 | ## Limitations SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content. ## Training ### Model - **Architecture:** Transformer decoder - **Pretraining tokens:** 4T - **Precision:** bfloat16 ### Hardware - **GPUs:** 128 H100 ### Software - **Training Framework:** nanotron ## License Apache 2.0 ## Citation", "model_explanation_gemini": "A compact 360M-parameter language model optimized for on-device use, excelling in instruction following, reasoning, and tasks like text rewriting and summarization." 
+} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolVLM-256M-Instruct.json b/data/model_data_json/HuggingFaceTB_SmolVLM-256M-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..28c2157590e08e8bc85a56c3d7330031b2445b47 --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolVLM-256M-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "HuggingFaceTB/SmolVLM-256M-Instruct", + "downloads": 411066, + "tags": [ + "transformers", + "onnx", + "safetensors", + "idefics3", + "image-text-to-text", + "conversational", + "en", + "dataset:HuggingFaceM4/the_cauldron", + "dataset:HuggingFaceM4/Docmatix", + "arxiv:2504.05299", + "base_model:HuggingFaceTB/SmolLM2-135M-Instruct", + "base_model:quantized:HuggingFaceTB/SmolLM2-135M-Instruct", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 datasets: - HuggingFaceM4/the_cauldron - HuggingFaceM4/Docmatix pipeline_tag: image-text-to-text language: - en base_model: - HuggingFaceTB/SmolLM2-135M-Instruct - google/siglip-base-patch16-512 --- # SmolVLM-256M SmolVLM-256M is the smallest multimodal model in the world. It accepts arbitrary sequences of image and text inputs to produce text outputs. It's designed for efficiency. SmolVLM can answer questions about images, describe visual content, or transcribe text. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. It can run inference on one image with under 1GB of GPU RAM. ## Model Summary - **Developed by:** Hugging Face 🤗 - **Model type:** Multi-modal model (image+text) - **Language(s) (NLP):** English - **License:** Apache 2.0 - **Architecture:** Based on Idefics3 (see technical summary) ## Resources - **Demo:** SmolVLM-256 Demo - **Blog:** Blog post ## Uses SmolVLM can be used for inference on multimodal (image + text) tasks where the input comprises text queries along with one or more images. Text and images can be interleaved arbitrarily, enabling tasks like image captioning, visual question answering, and storytelling based on visual content. The model does not support image generation. To fine-tune SmolVLM on a specific task, you can follow the fine-tuning tutorial. ### Technical Summary SmolVLM leverages the lightweight SmolLM2 language model to provide a compact yet powerful multimodal experience. It introduces several changes compared to the larger SmolVLM 2.2B model: - **Image compression:** We introduce a more radical image compression compared to Idefics3 and SmolVLM-2.2B to enable the model to infer faster and use less RAM. - **Visual Token Encoding:** SmolVLM-256 uses 64 visual tokens to encode image patches of size 512×512. Larger images are divided into patches, each encoded separately, enhancing efficiency without compromising performance. - **New special tokens:** We added new special tokens to divide the subimages. This allows for more efficient tokenization of the images. - **Smoller vision encoder:** We went from a 400M parameter siglip vision encoder to a much smaller 93M encoder. - **Larger image patches:** We are now passing patches of 512x512 to the vision encoder, instead of 384x384 like the larger SmolVLM. This allows the information to be encoded more efficiently. More details about the training and architecture are available in our technical report. ### How to get started You can use transformers to load, infer and fine-tune SmolVLM; a minimal sketch follows below. 
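Since the card's code blocks were stripped from this dump, here is a minimal sketch of image+text inference, assuming the Idefics3-style AutoModelForVision2Seq API that SmolVLM uses; the local image path is a placeholder:

```python
# Minimal sketch of image+text inference with SmolVLM-256M-Instruct.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id).to(device)

image = Image.open("example.jpg")  # placeholder local image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Can you describe this image?"},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```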
We also provide ONNX weights for the model, which you can run with ONNX Runtime as follows:
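The ONNX snippet itself is missing from this dump. As a hedged placeholder, the sketch below only loads a session and inspects its inputs and outputs; the file name is an assumption about the repo layout, not confirmed by the card, so list the repository files to find the actual ONNX artifacts:

```python
# Hedged sketch: inspect an exported ONNX graph with ONNX Runtime.
# "decoder_model_merged.onnx" is an assumed file name, not confirmed by this card.
import onnxruntime as ort

session = ort.InferenceSession("decoder_model_merged.onnx", providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)   # discover the expected input tensors
for out in session.get_outputs():
    print(out.name, out.shape)             # and the produced outputs
```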
### Model optimizations **Precision**: For better performance, load and run the model in half-precision (torch.bfloat16) if your hardware supports it. You can also load SmolVLM with 4/8-bit quantization using bitsandbytes, torchao or Quanto. Refer to this page for other options. **Vision Encoder Efficiency**: Adjust the image resolution by setting size={"longest_edge": N*512} when initializing the processor, where N is your desired value. The default N=4 works well, which results in input images of size 2048×2048. Decreasing N can save GPU memory and is appropriate for lower-resolution images. This is also useful if you want to fine-tune on videos (a sketch combining both optimizations follows at the end of this card). ## Misuse and Out-of-scope Use SmolVLM is not intended for high-stakes scenarios or critical decision-making processes that affect an individual's well-being or livelihood. The model may produce content that appears factual but may not be accurate. Misuse includes, but is not limited to: - Prohibited Uses: - Evaluating or scoring individuals (e.g., in employment, education, credit) - Critical automated decision-making - Generating unreliable factual content - Malicious Activities: - Spam generation - Disinformation campaigns - Harassment or abuse - Unauthorized surveillance ### License SmolVLM is built upon SigLIP as image encoder and SmolLM2 for text decoder part. We release the SmolVLM checkpoints under the Apache 2.0 license. ## Training Details ### Training Data The training data comes from The Cauldron and Docmatix datasets, with emphasis on document understanding (25%) and image captioning (18%), while maintaining balanced coverage across other crucial capabilities like visual reasoning, chart comprehension, and general instruction following. ## Evaluation | Size | Mathvista | MMMU | OCRBench | MMStar | AI2D | ChartQA_Test | Science_QA | TextVQA Val | DocVQA Val | |-------|-----------|------|----------|--------|-------|--------------|------------|-------------|------------| | 256M | 35.9 | 28.3 | 52.6 | 34.6 | 47 | 55.8 | 73.6 | 49.9 | 58.3 | | 500M | 40.1 | 33.7 | 61 | 38.3 | 59.5 | 63.2 | 79.7 | 60.5 | 70.5 | | 2.2B | 43.9 | 38.3 | 65.5 | 41.8 | 64 | 71.6 | 84.5 | 72.1 | 79.7 | # Citation information You can cite us in the following way:", "model_explanation_gemini": "SmolVLM-256M is a lightweight multimodal model that processes interleaved image and text inputs to generate text outputs for tasks like visual question answering, image captioning, and storytelling, optimized for efficiency and on-device use." 
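As a sketch of the two optimizations described above taken together, assuming the size={"longest_edge": N*512} processor option holds as stated:

```python
# Sketch: half-precision loading plus a smaller image resolution via the processor.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
N = 2  # smaller than the default N=4 to save GPU memory (1024-px longest edge)
processor = AutoProcessor.from_pretrained(model_id, size={"longest_edge": N * 512})
```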
+} \ No newline at end of file diff --git a/data/model_data_json/HuggingFaceTB_SmolVLM2-2.2B-Instruct.json b/data/model_data_json/HuggingFaceTB_SmolVLM2-2.2B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..ca19a86d46a283a25d3b55131c7bb2a244748607 --- /dev/null +++ b/data/model_data_json/HuggingFaceTB_SmolVLM2-2.2B-Instruct.json @@ -0,0 +1,32 @@ +{ + "model_id": "HuggingFaceTB/SmolVLM2-2.2B-Instruct", + "downloads": 74571, + "tags": [ + "transformers", + "safetensors", + "smolvlm", + "image-text-to-text", + "video-text-to-text", + "conversational", + "en", + "dataset:HuggingFaceM4/the_cauldron", + "dataset:HuggingFaceM4/Docmatix", + "dataset:lmms-lab/LLaVA-OneVision-Data", + "dataset:lmms-lab/M4-Instruct-Data", + "dataset:HuggingFaceFV/finevideo", + "dataset:MAmmoTH-VL/MAmmoTH-VL-Instruct-12M", + "dataset:lmms-lab/LLaVA-Video-178K", + "dataset:orrzohar/Video-STaR", + "dataset:Mutonix/Vript", + "dataset:TIGER-Lab/VISTA-400K", + "dataset:Enxin/MovieChat-1K_train", + "dataset:ShareGPT4Video/ShareGPT4Video", + "arxiv:2504.05299", + "base_model:HuggingFaceTB/SmolVLM-Instruct", + "base_model:finetune:HuggingFaceTB/SmolVLM-Instruct", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 datasets: - HuggingFaceM4/the_cauldron - HuggingFaceM4/Docmatix - lmms-lab/LLaVA-OneVision-Data - lmms-lab/M4-Instruct-Data - HuggingFaceFV/finevideo - MAmmoTH-VL/MAmmoTH-VL-Instruct-12M - lmms-lab/LLaVA-Video-178K - orrzohar/Video-STaR - Mutonix/Vript - TIGER-Lab/VISTA-400K - Enxin/MovieChat-1K_train - ShareGPT4Video/ShareGPT4Video pipeline_tag: image-text-to-text tags: - video-text-to-text language: - en base_model: - HuggingFaceTB/SmolVLM-Instruct --- # SmolVLM2 2.2B SmolVLM2-2.2B is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs - whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 5.2GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited. ## Model Summary - **Developed by:** Hugging Face 🤗 - **Model type:** Multi-modal model (image/multi-image/video/text) - **Language(s) (NLP):** English - **License:** Apache 2.0 - **Architecture:** Based on Idefics3 (see technical summary) ## Resources - **Demo:** Video Highlight Generator - **Blog:** Blog post ## Uses SmolVLM2 can be used for inference on multimodal (video / image / text) tasks where the input consists of text queries along with video or one or more images. Text and media files can be interleaved arbitrarily, enabling tasks like captioning, visual question answering, and storytelling based on visual content. The model does not support image or video generation. To fine-tune SmolVLM2 on a specific task, you can follow the fine-tuning tutorial. 
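The inference snippets of this card were stripped from this dump. As a preview of the video-inference usage described under "How to get started" below, here is a minimal hedged sketch, assuming a recent transformers version with SmolVLM2 support (AutoModelForImageTextToText) and decord installed; the video path is a placeholder:

```python
# Hedged sketch of SmolVLM2 video inference via chat templates.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": [
        {"type": "video", "path": "video.mp4"},  # placeholder path
        {"type": "text", "text": "Describe this video in detail."},
    ]},
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)
generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```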
## Evaluation ### Vision Evaluation | Model | Mathvista | MMMU | OCRBench | MMStar | AI2D | ChartQA_Test | Science_QA | TextVQA Val | DocVQA Val | |-------------------|-----------|-------|----------|--------|------|--------------|------------|-------------|------------| | **SmolVLM2 2.2B** | 51.5 | 42 | 72.9 | 46 | 70 | 68.84 | 90 | 73.21 | 79.98 | | SmolVLM 2.2B | 43.9 | 38.3 | 65.5 | 41.8 | 84.5 | 71.6 | 84.5 | 72.1 | 79.7 | ### Video Evaluation We evaluated the performance of the SmolVLM2 family on the following scientific benchmarks: | Size | Video-MME | MLVU | MVBench | |----------|-----------------|----------|---------------| | 2.2B | 52.1 | 55.2 | 46.27 | | 500M | 42.2 | 47.3 | 39.73 | | 256M | 33.7 | 40.6 | 32.7 | ### How to get started You can use transformers to load, infer and fine-tune SmolVLM. Make sure you have num2words, flash-attn and the latest transformers installed. You can load the model as follows. #### Simple Inference You can preprocess your inputs using chat templates and pass them directly to the model. #### Video Inference To use SmolVLM2 for video inference, make sure you have decord installed (see the sketch after the Uses section above). #### Multi-image Interleaved Inference You can interleave multiple media with text using chat templates. ### Model optimizations ## Misuse and Out-of-scope Use SmolVLM is not intended for high-stakes scenarios or critical decision-making processes that affect an individual's well-being or livelihood. The model may produce content that appears factual but may not be accurate. Misuse includes, but is not limited to: - Prohibited Uses: - Evaluating or scoring individuals (e.g., in employment, education, credit) - Critical automated decision-making - Generating unreliable factual content - Malicious Activities: - Spam generation - Disinformation campaigns - Harassment or abuse - Unauthorized surveillance ### License SmolVLM2 is built upon the shape-optimized SigLIP as image encoder and SmolLM2 for text decoder part. We release the SmolVLM2 checkpoints under the Apache 2.0 license. ## Citation information You can cite us in the following way: ## Training Data SmolVLM2 used 3.3M training samples drawn from ten different datasets: LlaVa Onevision, M4-Instruct, Mammoth, LlaVa Video 178K, FineVideo, VideoStar, VRipt, Vista-400K, MovieChat and ShareGPT4Video. The tables below give a general overview of the samples across modalities and the source of those samples. 
## Data Split per modality | Data Type | Percentage | |--------------|------------| | Image | 34.4% | | Text | 20.2% | | Video | 33.0% | | Multi-image | 12.3% | ## Granular dataset slices per modality ### Text Datasets | Dataset | Percentage | |--------------------------------------------|------------| | llava-onevision/magpie_pro_ft3_80b_mt | 6.8% | | llava-onevision/magpie_pro_ft3_80b_tt | 6.8% | | llava-onevision/magpie_pro_qwen2_72b_tt | 5.8% | | llava-onevision/mathqa | 0.9% | ### Multi-image Datasets | Dataset | Percentage | |--------------------------------------------|------------| | m4-instruct-data/m4_instruct_multiimage | 10.4% | | mammoth/multiimage-cap6 | 1.9% | ### Image Datasets | Dataset | Percentage | |--------------------------------------------|------------| | llava-onevision/other | 17.4% | | llava-onevision/vision_flan | 3.9% | | llava-onevision/mavis_math_metagen | 2.6% | | llava-onevision/mavis_math_rule_geo | 2.5% | | llava-onevision/sharegpt4o | 1.7% | | llava-onevision/sharegpt4v_coco | 1.5% | | llava-onevision/image_textualization | 1.3% | | llava-onevision/sharegpt4v_llava | 0.9% | | llava-onevision/mapqa | 0.9% | | llava-onevision/qa | 0.8% | | llava-onevision/textocr | 0.8% | ### Video Datasets | Dataset | Percentage | |--------------------------------------------|------------| | llava-video-178k/1-2m | 7.3% | | llava-video-178k/2-3m | 7.0% | | other-video/combined | 5.7% | | llava-video-178k/hound | 4.4% | | llava-video-178k/0-30s | 2.4% | | video-star/starb | 2.2% | | vista-400k/combined | 2.2% | | vript/long | 1.0% | | ShareGPT4Video/all | 0.8% |" +} \ No newline at end of file diff --git a/data/model_data_json/IAAR-Shanghai_xFinder-qwen1505.json b/data/model_data_json/IAAR-Shanghai_xFinder-qwen1505.json new file mode 100644 index 0000000000000000000000000000000000000000..9de7dcaee88a6b3848bd621586667f7ba51705c2 --- /dev/null +++ b/data/model_data_json/IAAR-Shanghai_xFinder-qwen1505.json @@ -0,0 +1,19 @@ +{ + "model_id": "IAAR-Shanghai/xFinder-qwen1505", + "downloads": 234636, + "tags": [ + "transformers", + "pytorch", + "qwen2", + "text-generation", + "instruction-finetuning", + "conversational", + "en", + "license:cc-by-nc-nd-4.0", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- inference: false language: - en tags: - instruction-finetuning task_categories: - text-generation license: cc-by-nc-nd-4.0 ---
# xFinder-qwen1505 ## Model Details xFinder-qwen1505 is a model specifically designed for key answer extraction in large language models (LLMs). It is trained by fine-tuning Qwen-1.5-0.5B. - **Developed by:** IAAR - **Fine-tuned from Model:** Qwen-1.5-0.5B ## Model Sources - **Repository:** - **Paper:** ## Uses xFinder is primarily used to enhance the evaluation of LLMs by accurately extracting key answers from their outputs. It addresses the limitations of traditional regular expression (RegEx)-based extraction methods, which often fail to handle the diverse and complex outputs generated by LLMs. xFinder improves the reliability of model assessments across various tasks. ## Training Details xFinder-qwen1505 is fine-tuned from Qwen-1.5-0.5B. The training data consists of approximately 26.9K samples from the Key Answer Finder (KAF) dataset. This dataset is designed to enhance the accuracy and robustness of key answer extraction and includes a variety of tasks. It has been meticulously annotated by GPT-4 and human experts to ensure high-quality training and evaluation. For more details, see the paper and try it with the provided code. ## Evaluation xFinder is evaluated on the fully human-annotated test and generalization sets of the KAF dataset. The results demonstrate significant improvements in extraction accuracy and robustness compared to traditional methods. For more details, please refer to the paper and try it out using the provided code. ## Citation", "model_explanation_gemini": "Extracts key answers from large language model outputs to enhance evaluation accuracy, overcoming limitations of traditional regex-based methods." +} \ No newline at end of file diff --git a/data/model_data_json/IDEA-Research_grounding-dino-base.json b/data/model_data_json/IDEA-Research_grounding-dino-base.json new file mode 100644 index 0000000000000000000000000000000000000000..8dfb024e123deefc4aeb3f25d43909478cabc07e --- /dev/null +++ b/data/model_data_json/IDEA-Research_grounding-dino-base.json @@ -0,0 +1,17 @@ +{ + "model_id": "IDEA-Research/grounding-dino-base", + "downloads": 1166047, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "grounding-dino", + "zero-shot-object-detection", + "vision", + "arxiv:2303.05499", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision inference: false pipeline_tag: zero-shot-object-detection --- # Grounding DINO model (base variant) The Grounding DINO model was proposed in Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang. Grounding DINO extends a closed-set object detection model with a text encoder, enabling open-set object detection. The model achieves remarkable results, such as 52.5 AP on COCO zero-shot. Grounding DINO overview. Taken from the original paper. 
## Intended uses & limitations You can use the raw model for zero-shot object detection (the task of detecting things in an image out-of-the-box without labeled data). ### How to use Here's how to use the model for zero-shot object detection (a sketch follows after the tiny variant's card below): ### BibTeX entry and citation info", "model_explanation_gemini": "Detects objects in images without labeled training data by combining visual and text inputs for open-set recognition." +} \ No newline at end of file diff --git a/data/model_data_json/IDEA-Research_grounding-dino-tiny.json b/data/model_data_json/IDEA-Research_grounding-dino-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..8b9305ec5ab0b30ac873e9631c7a15a43e4ec89c --- /dev/null +++ b/data/model_data_json/IDEA-Research_grounding-dino-tiny.json @@ -0,0 +1,17 @@ +{ + "model_id": "IDEA-Research/grounding-dino-tiny", + "downloads": 657588, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "grounding-dino", + "zero-shot-object-detection", + "vision", + "arxiv:2303.05499", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision inference: false pipeline_tag: zero-shot-object-detection --- # Grounding DINO model (tiny variant) The Grounding DINO model was proposed in Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang. Grounding DINO extends a closed-set object detection model with a text encoder, enabling open-set object detection. The model achieves remarkable results, such as 52.5 AP on COCO zero-shot. Grounding DINO overview. Taken from the original paper. ## Intended uses & limitations You can use the raw model for zero-shot object detection (the task of detecting things in an image out-of-the-box without labeled data). ### How to use Here's how to use the model for zero-shot object detection (see the sketch below): ### BibTeX entry and citation info", "model_explanation_gemini": "Detects objects in images without labeled training data by combining visual and text inputs for open-set recognition." 
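Since the usage snippets were stripped from this dump, here is a minimal sketch of zero-shot detection that applies to both the base and tiny variants, assuming a transformers version with the Grounding DINO API; the image URL is the standard COCO cats example from the transformers docs:

```python
# Minimal sketch of zero-shot object detection with Grounding DINO.
# By convention, text queries are lower-case phrases, each ending with a dot.
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-base"  # or "IDEA-Research/grounding-dino-tiny"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "a cat. a remote control."

inputs = processor(images=image, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[image.size[::-1]],  # (height, width) of the original image
)
print(results)
```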
+} \ No newline at end of file diff --git a/data/model_data_json/Infermatic_Llama-3.3-70B-Instruct-FP8-Dynamic.json b/data/model_data_json/Infermatic_Llama-3.3-70B-Instruct-FP8-Dynamic.json new file mode 100644 index 0000000000000000000000000000000000000000..f6ed4e7851230986355d1f98828f93c85a87a16a --- /dev/null +++ b/data/model_data_json/Infermatic_Llama-3.3-70B-Instruct-FP8-Dynamic.json @@ -0,0 +1,34 @@ +{ + "model_id": "Infermatic/Llama-3.3-70B-Instruct-FP8-Dynamic", + "downloads": 114734, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "de", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-70B", + "base_model:quantized:meta-llama/Llama-3.1-70B", + "license:llama3.3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "compressed-tensors", + "region:us" + ], + "description": "--- library_name: transformers language: - en - fr - it - pt - hi - es - th - de base_model: - meta-llama/Llama-3.1-70B tags: - facebook - meta - pytorch - llama - llama-3 extra_gated_prompt: \"### LLAMA 3.3 COMMUNITY LICENSE AGREEMENT\\nLlama 3.3 Version Release Date: December 6, 2024\\n\\\"Agreement\\\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.\\n\\\"Documentation\\\" means the specifications, manuals and documentation accompanying Llama 3.3 distributed by Meta at or \\\"you\\\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.\\n\\\"Llama 3.3\\\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at Materials\\\" means, collectively, Meta’s proprietary Llama 3.3 and Documentation (and any portion thereof) made available under this Agreement.\\n\\\"Meta\\\" or \\\"we\\\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).\\nBy clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.\\n1. License Rights and Redistribution.\\na. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.\\nb. Redistribution and Use.\\ni. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name.\\nii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.\\_\\niii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.3 is licensed under the Llama 3.3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”\\niv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. \\n2. Additional Commercial Terms. If, on the Llama 3.3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.\\n3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.\\n4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.\\n5. Intellectual Property.\\na. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta.\\nb. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.\\nc. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.3 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.\\n6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.\\n7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.\\n### Llama 3.3 Acceptable Use Policy\\nMeta is committed to promoting safe and fair use of its tools and features, including Llama 3.3. If you access or use Llama 3.3, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at Uses\\nWe want everyone to use Llama 3.3 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.3 to:\\n1. Violate the law or others’ rights, including to:\\n\\n 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: \\n 1. Violence or terrorism \\n 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material \\n 3. Human trafficking, exploitation, and sexual violence \\n 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. \\n 5. Sexual solicitation \\n 6. Any other criminal activity\\n\\n 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals\\n\\n 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services\\n\\n 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices\\n\\n 5. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law\\n\\n 6. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials\\n\\n 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system\\n\\n 8. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta\\n\\n2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.3 related to the following:\\n\\n 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997\\n\\n 2. Guns and illegal weapons (including weapon development)\\n\\n 3. Illegal drugs and regulated/controlled substances\\n\\n 4. Operation of critical infrastructure, transportation technologies, or heavy machinery\\n\\n 5. Self-harm or harm to others, including suicide, cutting, and eating disorders\\n\\n 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual\\n\\n3. Intentionally deceive or mislead others, including use of Llama 3.3 related to the following:\\n\\n 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation\\n\\n 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content\\n\\n 3. Generating, promoting, or further distributing spam\\n\\n 4. Impersonating another individual without consent, authorization, or legal right\\n\\n 5. Representing that the use of Llama 3.3 or outputs are human-generated\\n\\n 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement\\n\\n4. Fail to appropriately disclose to end users any known dangers of your AI system\\n5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.3\\nWith respect to any multimodal models included in Llama 3.3, the rights granted under Section 1(a) of the Llama 3.3 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. 
This restriction does not apply to end users of a product or service that incorporates any such multimodal models.\\nPlease report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:\\n* Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama\\\\_output\\\\_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.3: LlamaUseReport@meta.com \" extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit license: llama3.3 --- # This quant was made for and by Infermatic.ai meta-llama/Llama-3.3-70B-Instruct Copy of the original card ## Model Information The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. | | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff | | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | | Llama 3.3 (text only) | A new mix of publicly available online data. | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 | **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.3 model**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** * **70B Instruct: December 6, 2024** **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license, the Llama 3.3 Community License Agreement, is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.3 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.3 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. 
The Llama 3.3 model also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.3 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.3 Community License. Use in languages beyond those explicitly referenced as supported in this model card\\*\\*. \\*\\*Note: Llama 3.3 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.3 models for languages beyond the 8 supported languages provided they comply with the Llama 3.3 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.3 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Llama-3.3-70B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . See the snippet below for usage with Transformers: ### Tool use with transformers LLaMA-3.3 supports multiple tool use formats. You can see a full guide to prompt formatting here. Tool use is also supported through chat templates in Transformers. Here is a quick example showing a single simple tool: You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so: and then call the tool and append the result, with the role, like so: After that, you can again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information, see the LLaMA prompt format docs and the Transformers tool use documentation. ### Use with The model checkpoints can be used in and for further memory optimisations using and See the snippet below for usage: To load in 4-bit simply pass ### Use with Please, follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use** Training utilized a cumulative of **39.3**M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. ## ## **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq. 
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | :---: | :---: | :---: | | Llama 3.3 70B | 7.0M | 700 | 2,040 | 0 | ## The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.3 was pretrained on \\~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023\\. ## Benchmarks \\- English Text In this section, we report the results for Llama 3.3 relative to our previous models. ### Instruction tuned models ## | Category | Benchmark | \\# Shots | Metric | Llama 3.1 8B Instruct | Llama 3.1 70B Instruct | Llama-3.3 70B Instruct | Llama 3.1 405B Instruct | | :---- | :---- | ----- | :---- | ----- | ----- | ----- | ----- | | | MMLU (CoT) | 0 | macro\\_avg/acc | 73.0 | 86.0 | 86.0 | 88.6 | | | MMLU Pro (CoT) | 5 | macro\\_avg/acc | 48.3 | 66.4 | 68.9 | 73.3 | | Steerability | IFEval | | | 80.4 | 87.5 | 92.1 | 88.6 | | Reasoning | GPQA Diamond (CoT) | 0 | acc | 31.8 | 48.0 | 50.5 | 49.0 | | Code | HumanEval | 0 | pass@1 | 72.6 | 80.5 | 88.4 | 89.0 | | | MBPP EvalPlus (base) | 0 | pass@1 | 72.8 | 86.0 | 87.6 | 88.6 | | Math | MATH (CoT) | 0 | sympy\\_intersection\\_score | 51.9 | 68.0 | 77.0 | 73.8 | | Tool Use | BFCL v2 | 0 | overall\\_ast\\_summary/macro\\_avg/valid | 65.4 | 77.5 | 77.3 | 81.1 | | Multilingual | MGSM | 0 | em | 68.9 | 86.9 | 91.1 | 91.6 | ## ## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.3 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.3 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. 
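Stepping back to this card's "How to use" section, whose snippets were stripped during scraping: since this particular repository is an FP8-dynamic quant in the compressed-tensors format, it is typically served with vLLM rather than loaded directly in transformers. A minimal sketch, with the parallelism and sampling settings as illustrative assumptions:

```python
from vllm import LLM, SamplingParams

# A 70B FP8 checkpoint still needs substantial GPU memory;
# tensor_parallel_size is illustrative and should match your hardware.
llm = LLM(
    model="Infermatic/Llama-3.3-70B-Instruct-FP8-Dynamic",
    tensor_parallel_size=2,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
outputs = llm.chat(messages, SamplingParams(temperature=0.7, max_tokens=256))
print(outputs[0].outputs[0].text)
```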
**Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.3 systems **Large language models, including Llama 3.3, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. **Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third party services they use to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of the third party safeguards. **Multilinguality**: Llama 3.3 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in other languages than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use cases evaluations measure safety risks of systems for most commonly built applications including chat bot, coding assistant, tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompt and output response. It is important to evaluate applications in context, and we recommend building dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. 
Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which were crafted dedicated benchmarks including long context, multilingual, tools calls, coding or memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity in addition to multilingual content specialists with background in integrity issues in specific geographic markets. . ### Critical and other risks ### We specifically focused our efforts on mitigating the following critical risk areas: **1- CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of the Llama 3.3 model could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. ### **2\\. Child Safety** Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective based methodologies to assess the model risks along multiple attack vectors including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market specific nuances or experiences. **3\\. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. 
Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.3 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.3 addresses users and their needs as they are, without insertion unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.3 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.3 model, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.", + "model_explanation_gemini": "A 70B-parameter multilingual instruction-following model based on Meta's Llama-3 architecture, optimized for FP8 dynamic quantization." +} \ No newline at end of file diff --git a/data/model_data_json/InstantX_InstantID.json b/data/model_data_json/InstantX_InstantID.json new file mode 100644 index 0000000000000000000000000000000000000000..bfaea84a0651f560da89b72124b99cceb6c78ebd --- /dev/null +++ b/data/model_data_json/InstantX_InstantID.json @@ -0,0 +1,15 @@ +{ + "model_id": "InstantX/InstantID", + "downloads": 116177, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "en", + "arxiv:2401.07519", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en library_name: diffusers pipeline_tag: text-to-image --- # InstantID Model Card
**Project Page** **|** **Paper** **|** **Code** **|** 🤗 **Gradio demo**
## Introduction InstantID is a new state-of-the-art tuning-free method to achieve ID-preserving generation with only a single image, supporting various downstream tasks.
## Usage You can directly download the model from this repository. You can also download the model in a Python script (see the sketch below): For the face encoder, you need to manually download it via this URL to . Then, you can customize your own face images For more details, please follow the instructions in our GitHub repository. ## Usage Tips 1. If you're not satisfied with the similarity, try to increase the weight of \"IdentityNet Strength\" and \"Adapter Strength\". 2. If you feel that the saturation is too high, first decrease the Adapter strength. If it is still too high, then decrease the IdentityNet strength. 3. If you find that text control is not as expected, decrease the Adapter strength. 4. If you find that the realistic style is not good enough, go to our GitHub repo and use a more realistic base model. ## Demos
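Referenced from the Usage section above, whose download snippet was stripped: a minimal sketch with huggingface_hub (the filenames follow the repository layout but should be verified against the repo):

```python
from huggingface_hub import hf_hub_download

# Download the InstantID checkpoints into a local ./checkpoints directory.
hf_hub_download(repo_id="InstantX/InstantID",
                filename="ControlNetModel/config.json",
                local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID",
                filename="ControlNetModel/diffusion_pytorch_model.safetensors",
                local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID",
                filename="ip-adapter.bin",
                local_dir="./checkpoints")
```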
## Disclaimer This project is released under Apache License and aims to positively impact the field of AI-driven image generation. Users are granted the freedom to create images using this tool, but they are obligated to comply with local laws and utilize it responsibly. The developers will not assume any responsibility for potential misuse by users. ## Citation", + "model_explanation_gemini": "Generates identity-preserving images from a single input photo without requiring fine-tuning, supporting various text-to-image tasks." +} \ No newline at end of file diff --git a/data/model_data_json/Intel_dpt-hybrid-midas.json b/data/model_data_json/Intel_dpt-hybrid-midas.json new file mode 100644 index 0000000000000000000000000000000000000000..88dd8408e4c4b6be328c93d5a15a8e033bc98c28 --- /dev/null +++ b/data/model_data_json/Intel_dpt-hybrid-midas.json @@ -0,0 +1,18 @@ +{ + "model_id": "Intel/dpt-hybrid-midas", + "downloads": 229378, + "tags": [ + "transformers", + "pytorch", + "dpt", + "depth-estimation", + "vision", + "arxiv:2103.13413", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - depth-estimation widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace model-index: - name: dpt-hybrid-midas results: - task: type: monocular-depth-estimation name: Monocular Depth Estimation dataset: type: MIX-6 name: MIX-6 metrics: - type: Zero-shot transfer value: 11.06 name: Zero-shot transfer config: Zero-shot transfer verified: false --- ## Model Details: DPT-Hybrid (also known as MiDaS 3.0) Dense Prediction Transformer (DPT) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. (2021) and first released in this repository. DPT uses the Vision Transformer (ViT) as backbone and adds a neck + head on top for monocular depth estimation. !model image This repository hosts the \"hybrid\" version of the model as stated in the paper. DPT-Hybrid diverges from DPT by using ViT-hybrid as a backbone and taking some activations from the backbone. The model card has been written in combination by the Hugging Face team and Intel. | Model Detail | Description | | ----------- | ----------- | | Model Authors - Company | Intel | | Date | December 22, 2022 | | Version | 1 | | Type | Computer Vision - Monocular Depth Estimation | | Paper or Other Resources | Vision Transformers for Dense Prediction and GitHub Repo | | License | Apache 2.0 | | Questions or Comments | Community Tab and Intel Developers Discord| | Intended Use | Description | | ----------- | ----------- | | Primary intended uses | You can use the raw model for zero-shot monocular depth estimation. See the model hub to look for fine-tuned versions on a task that interests you. | | Primary intended users | Anyone doing monocular depth estimation | | Out-of-scope uses | This model in most cases will need to be fine-tuned for your particular task. The model should not be used to intentionally create hostile or alienating environments for people.| ### How to use Here is how to use this model for zero-shot depth estimation on an image: For more code examples, we refer to the documentation. 
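The "How to use" stub above lost its code in extraction. A minimal sketch using the transformers DPT classes (the same pattern applies to Intel/dpt-large below; the sample image URL is an illustrative assumption):

```python
import requests
import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth

# Resize the predicted depth map back to the input resolution.
depth = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
print(depth.shape)
```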
| Factors | Description | | ----------- | ----------- | | Groups | Multiple datasets compiled together | | Instrumentation | - | | Environment | Inference completed on Intel Xeon Platinum 8280 CPU @ 2.70GHz with 8 physical cores and an NVIDIA RTX 2080 GPU. | | Card Prompts | Model deployment on alternate hardware and software will change model performance | | Metrics | Description | | ----------- | ----------- | | Model performance measures | Zero-shot Transfer | | Decision thresholds | - | | Approaches to uncertainty and variability | - | | Training and Evaluation Data | Description | | ----------- | ----------- | | Datasets | The dataset is called MIX 6, and contains around 1.4M images. The model was initialized with ImageNet-pretrained weights.| | Motivation | To build a robust monocular depth prediction network | | Preprocessing | \"We resize the image such that the longer side is 384 pixels and train on random square crops of size 384. ... We perform random horizontal flips for data augmentation.\" See Ranftl et al. (2021) for more details. | ## Quantitative Analyses | Model | Training set | DIW WHDR | ETH3D AbsRel | Sintel AbsRel | KITTI δ>1.25 | NYU δ>1.25 | TUM δ>1.25 | | --- | --- | --- | --- | --- | --- | --- | --- | | DPT - Large | MIX 6 | 10.82 (-13.2%) | 0.089 (-31.2%) | 0.270 (-17.5%) | 8.46 (-64.6%) | 8.32 (-12.9%) | 9.97 (-30.3%) | | DPT - Hybrid | MIX 6 | 11.06 (-11.2%) | 0.093 (-27.6%) | 0.274 (-16.2%) | 11.56 (-51.6%) | 8.69 (-9.0%) | 10.89 (-23.2%) | | MiDaS | MIX 6 | 12.95 (+3.9%) | 0.116 (-10.5%) | 0.329 (+0.5%) | 16.08 (-32.7%) | 8.71 (-8.8%) | 12.51 (-12.5%) | MiDaS [30] | MIX 5 | 12.46 | 0.129 | 0.327 | 23.90 | 9.55 | 14.29 | | Li [22] | MD [22] | 23.15 | 0.181 | 0.385 | 36.29 | 27.52 | 29.54 | | Li [21] | MC [21] | 26.52 | 0.183 | 0.405 | 47.94 | 18.57 | 17.71 | | Wang [40] | WS [40] | 19.09 | 0.205 | 0.390 | 31.92 | 29.57 | 20.18 | | Xian [45] | RW [45] | 14.59 | 0.186 | 0.422 | 34.08 | 27.00 | 25.02 | | Casser [5] | CS [8] | 32.80 | 0.235 | 0.422 | 21.15 | 39.58 | 37.18 | Table 1. Comparison to the state of the art on monocular depth estimation. We evaluate zero-shot cross-dataset transfer according to the protocol defined in [30]. Relative performance is computed with respect to the original MiDaS model [30]. Lower is better for all metrics. (Ranftl et al., 2021) | Ethical Considerations | Description | | ----------- | ----------- | | Data | The training data come from multiple image datasets compiled together. | | Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of monocular depth image datasets. | | Mitigations | No additional risk mitigation strategies were considered during model development. | | Risks and harms | The extent of the risks involved by using the model remain unknown. | | Use cases | - | | Caveats and Recommendations | | ----------- | | Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model. | ### BibTeX entry and citation info", + "model_explanation_gemini": "Estimates depth from single images using a hybrid Vision Transformer architecture trained on 1.4 million images for monocular depth prediction." 
+} \ No newline at end of file diff --git a/data/model_data_json/Intel_dpt-large.json b/data/model_data_json/Intel_dpt-large.json new file mode 100644 index 0000000000000000000000000000000000000000..a3632369b04014c38384b85148ced9244e045b10 --- /dev/null +++ b/data/model_data_json/Intel_dpt-large.json @@ -0,0 +1,19 @@ +{ + "model_id": "Intel/dpt-large", + "downloads": 421147, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "dpt", + "depth-estimation", + "vision", + "arxiv:2103.13413", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - depth-estimation widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace model-index: - name: dpt-large results: - task: type: monocular-depth-estimation name: Monocular Depth Estimation dataset: type: MIX-6 name: MIX-6 metrics: - type: Zero-shot transfer value: 10.82 name: Zero-shot transfer config: Zero-shot transfer verified: false --- ## Model Details: DPT-Large (also known as MiDaS 3.0) Dense Prediction Transformer (DPT) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. (2021) and first released in this repository. DPT uses the Vision Transformer (ViT) as backbone and adds a neck + head on top for monocular depth estimation. !model image The model card has been written in combination by the Hugging Face team and Intel. | Model Detail | Description | | ----------- | ----------- | | Model Authors - Company | Intel | | Date | March 22, 2022 | | Version | 1 | | Type | Computer Vision - Monocular Depth Estimation | | Paper or Other Resources | Vision Transformers for Dense Prediction and GitHub Repo | | License | Apache 2.0 | | Questions or Comments | Community Tab and Intel Developers Discord| | Intended Use | Description | | ----------- | ----------- | | Primary intended uses | You can use the raw model for zero-shot monocular depth estimation. See the model hub to look for fine-tuned versions on a task that interests you. | | Primary intended users | Anyone doing monocular depth estimation | | Out-of-scope uses | This model in most cases will need to be fine-tuned for your particular task. The model should not be used to intentionally create hostile or alienating environments for people.| ### How to use The easiest is leveraging the pipeline API: In case you want to implement the entire logic yourself, here's how to do that for zero-shot depth estimation on an image: For more code examples, we refer to the documentation. | Factors | Description | | ----------- | ----------- | | Groups | Multiple datasets compiled together | | Instrumentation | - | | Environment | Inference completed on Intel Xeon Platinum 8280 CPU @ 2.70GHz with 8 physical cores and an NVIDIA RTX 2080 GPU. | | Card Prompts | Model deployment on alternate hardware and software will change model performance | | Metrics | Description | | ----------- | ----------- | | Model performance measures | Zero-shot Transfer | | Decision thresholds | - | | Approaches to uncertainty and variability | - | | Training and Evaluation Data | Description | | ----------- | ----------- | | Datasets | The dataset is called MIX 6, and contains around 1.4M images. 
The model was initialized with ImageNet-pretrained weights.| | Motivation | To build a robust monocular depth prediction network | | Preprocessing | \"We resize the image such that the longer side is 384 pixels and train on random square crops of size 384. ... We perform random horizontal flips for data augmentation.\" See Ranftl et al. (2021) for more details. | ## Quantitative Analyses | Model | Training set | DIW WHDR | ETH3D AbsRel | Sintel AbsRel | KITTI δ>1.25 | NYU δ>1.25 | TUM δ>1.25 | | --- | --- | --- | --- | --- | --- | --- | --- | | DPT - Large | MIX 6 | 10.82 (-13.2%) | 0.089 (-31.2%) | 0.270 (-17.5%) | 8.46 (-64.6%) | 8.32 (-12.9%) | 9.97 (-30.3%) | | DPT - Hybrid | MIX 6 | 11.06 (-11.2%) | 0.093 (-27.6%) | 0.274 (-16.2%) | 11.56 (-51.6%) | 8.69 (-9.0%) | 10.89 (-23.2%) | | MiDaS | MIX 6 | 12.95 (+3.9%) | 0.116 (-10.5%) | 0.329 (+0.5%) | 16.08 (-32.7%) | 8.71 (-8.8%) | 12.51 (-12.5%) | MiDaS [30] | MIX 5 | 12.46 | 0.129 | 0.327 | 23.90 | 9.55 | 14.29 | | Li [22] | MD [22] | 23.15 | 0.181 | 0.385 | 36.29 | 27.52 | 29.54 | | Li [21] | MC [21] | 26.52 | 0.183 | 0.405 | 47.94 | 18.57 | 17.71 | | Wang [40] | WS [40] | 19.09 | 0.205 | 0.390 | 31.92 | 29.57 | 20.18 | | Xian [45] | RW [45] | 14.59 | 0.186 | 0.422 | 34.08 | 27.00 | 25.02 | | Casser [5] | CS [8] | 32.80 | 0.235 | 0.422 | 21.15 | 39.58 | 37.18 | Table 1. Comparison to the state of the art on monocular depth estimation. We evaluate zero-shot cross-dataset transfer according to the protocol defined in [30]. Relative performance is computed with respect to the original MiDaS model [30]. Lower is better for all metrics. (Ranftl et al., 2021) | Ethical Considerations | Description | | ----------- | ----------- | | Data | The training data come from multiple image datasets compiled together. | | Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of monocular depth image datasets. | | Mitigations | No additional risk mitigation strategies were considered during model development. | | Risks and harms | The extent of the risks involved by using the model remain unknown. | | Use cases | - | | Caveats and Recommendations | | ----------- | | Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model. | ### BibTeX entry and citation info", + "model_explanation_gemini": "Estimates depth from single images using a vision transformer trained on 1.4 million images for monocular depth prediction." +} \ No newline at end of file diff --git a/data/model_data_json/JackFram_llama-160m.json b/data/model_data_json/JackFram_llama-160m.json new file mode 100644 index 0000000000000000000000000000000000000000..dba8e379cd55cdb3de66a471c4a3505f8882b113 --- /dev/null +++ b/data/model_data_json/JackFram_llama-160m.json @@ -0,0 +1,21 @@ +{ + "model_id": "JackFram/llama-160m", + "downloads": 292439, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "en", + "dataset:wikipedia", + "arxiv:2305.09781", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en datasets: - wikipedia pipeline_tag: text-generation --- ## Model description This is a LLaMA-like model with only 160M parameters trained on Wikipedia and part of the C4-en and C4-realnewslike datasets. 
No evaluation has been conducted yet, so use it with care. The model is mainly developed as a base Small Speculative Model in the SpecInfer paper. ## Citation To cite the model, please use", + "model_explanation_gemini": "A 160M-parameter LLaMA-like text-generation model trained on Wikipedia and C4 datasets, primarily serving as a base small speculative model for the SpecInfer paper." +} \ No newline at end of file diff --git a/data/model_data_json/JackFram_llama-68m.json b/data/model_data_json/JackFram_llama-68m.json new file mode 100644 index 0000000000000000000000000000000000000000..786354369db576c7076a584d0e5fe0eb0aa89f1d --- /dev/null +++ b/data/model_data_json/JackFram_llama-68m.json @@ -0,0 +1,20 @@ +{ + "model_id": "JackFram/llama-68m", + "downloads": 527726, + "tags": [ + "transformers", + "pytorch", + "llama", + "text-generation", + "en", + "dataset:wikipedia", + "arxiv:2305.09781", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en datasets: - wikipedia pipeline_tag: text-generation --- ## Model description This is a LLaMA-like model with only 68M parameters trained on Wikipedia and part of the C4-en and C4-realnewslike datasets. No evaluation has been conducted yet, so use it with care. The model is mainly developed as a base Small Speculative Model in the SpecInfer paper. ## Citation To cite the model, please use", + "model_explanation_gemini": "A 68M-parameter LLaMA-like text-generation model trained on Wikipedia and C4 datasets, primarily serving as a base small speculative model for the SpecInfer paper." +} \ No newline at end of file diff --git a/data/model_data_json/JamAndTeaStudios_DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic.json b/data/model_data_json/JamAndTeaStudios_DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic.json new file mode 100644 index 0000000000000000000000000000000000000000..02de5d15fcf6d13e6ee430facf2104204ead08e3 --- /dev/null +++ b/data/model_data_json/JamAndTeaStudios_DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic.json @@ -0,0 +1,23 @@ +{ + "model_id": "JamAndTeaStudios/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic", + "downloads": 230793, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "base_model:deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", + "base_model:quantized:deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "compressed-tensors", + "region:us" + ], + "description": "--- license: mit language: - en pipeline_tag: text-generation base_model: - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B tags: - chat library_name: transformers --- # Model Overview - **Model Optimizations:** - **Weight quantization:** FP8 - **Activation quantization:** FP8 - **Release Date:** 1/28/2025 Quantized version of deepseek-ai/DeepSeek-R1-Distill-Qwen-32B to FP8 data type, ready for inference with SGLang >= 0.3 or vLLM >= 0.5.2. This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within transformers blocks are quantized. ## Deployment ### Use with SGLang ## Creation This model was created with llm-compressor by running the code snippet below.
Model Creation Code
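The collapsed "Model Creation Code" block above has no surviving contents. A minimal sketch of how an FP8-dynamic quant is typically produced with llm-compressor (the exact recipe used for this repository is an assumption):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 dynamic: quantize weights and activations of Linear layers, skip lm_head.
# This scheme is data-free, so no calibration set is required.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

SAVE_DIR = "DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```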
## Evaluation TBA ## Play Retail Mage Retail Mage (Steam) is an immersive sim that uses online LLM inference in almost all of its gameplay features! Reviews “A true to life experience detailing how customer service really works.” 10/10 – kpolupo “I enjoyed how many things were flammable in the store.” 5/5 – mr_srsbsns “I've only known that talking little crow plushie in MageMart for a day and a half but if anything happened to him I would petrify everyone in this store and then myself.” 7/7 – neondenki", + "model_explanation_gemini": "An FP8-quantized version of DeepSeek-R1-Distill-Qwen-32B optimized for reduced disk size and GPU memory usage, designed for efficient text generation inference with SGLang or vLLM." +} \ No newline at end of file diff --git a/data/model_data_json/Jean-Baptiste_camembert-ner-with-dates.json b/data/model_data_json/Jean-Baptiste_camembert-ner-with-dates.json new file mode 100644 index 0000000000000000000000000000000000000000..264594285816b760b196e9a790d82621c808b43e --- /dev/null +++ b/data/model_data_json/Jean-Baptiste_camembert-ner-with-dates.json @@ -0,0 +1,20 @@ +{ + "model_id": "Jean-Baptiste/camembert-ner-with-dates", + "downloads": 211678, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "camembert", + "token-classification", + "fr", + "dataset:Jean-Baptiste/wikiner_fr", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: fr datasets: - Jean-Baptiste/wikiner_fr widget: - text: \"Je m'appelle jean-baptiste et j'habite à montréal depuis fevr 2012\" license: mit --- # camembert-ner: model fine-tuned from camemBERT for NER task (including DATE tag). ## Introduction [camembert-ner-with-dates] is an extension of the French camembert-ner model with an additional tag for dates. The model was trained on an enriched version of the wikiner-fr dataset (~170 634 sentences). On my test data (a mix of chat and email), this model got an f1 score of ~83% (in comparison, dateparser was ~70%). The dateparser library can still be used on the output of this model in order to convert text to Python datetime objects. ## How to use camembert-ner-with-dates with HuggingFace ##### Load camembert-ner-with-dates and its sub-word tokenizer : ## Model performances (metric: seqeval) Global By entity", + "model_explanation_gemini": "Detects named entities and dates in French text, fine-tuned from CamemBERT for improved accuracy over standard date parsing." +} \ No newline at end of file diff --git a/data/model_data_json/Jean-Baptiste_camembert-ner.json b/data/model_data_json/Jean-Baptiste_camembert-ner.json new file mode 100644 index 0000000000000000000000000000000000000000..faa5fd4b4806e5d1e49fe0200cde4520db77b56d --- /dev/null +++ b/data/model_data_json/Jean-Baptiste_camembert-ner.json @@ -0,0 +1,20 @@ +{ + "model_id": "Jean-Baptiste/camembert-ner", + "downloads": 204316, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "camembert", + "token-classification", + "fr", + "dataset:Jean-Baptiste/wikiner_fr", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: fr datasets: - Jean-Baptiste/wikiner_fr widget: - text: \"Je m'appelle jean-baptiste et je vis à montréal\" - text: \"george washington est allé à washington\" license: mit --- # camembert-ner: model fine-tuned from camemBERT for NER task. ## Introduction [camembert-ner] is a NER model that was fine-tuned from camemBERT on the wikiner-fr dataset.
The model was trained on the wikiner-fr dataset (~170 634 sentences). The model was validated on emails/chat data and outperformed other models on this type of data specifically. In particular, the model seems to work better on entities that don't start with an upper case. ## Training data Training data was classified as follows: Abbreviation|Description -|- O |Outside of a named entity MISC |Miscellaneous entity PER |Person’s name ORG |Organization LOC |Location ## How to use camembert-ner with HuggingFace ##### Load camembert-ner and its sub-word tokenizer : ## Model performances (metric: seqeval) Overall precision|recall|f1 -|-|- 0.8859|0.8971|0.8914 By entity entity|precision|recall|f1 -|-|-|- PER|0.9372|0.9598|0.9483 ORG|0.8099|0.8265|0.8181 LOC|0.8905|0.9005|0.8955 MISC|0.8175|0.8117|0.8146 For those who could be interested, here is a short article on how I used the results of this model to train an LSTM model for signature detection in emails:", + "model_explanation_gemini": "Fine-tuned from camemBERT for French named entity recognition (NER), excelling at identifying entities like persons, organizations, locations, and miscellaneous terms, particularly in emails/chat data and lowercase-starting entities." +} \ No newline at end of file diff --git a/data/model_data_json/Jean-Baptiste_roberta-large-ner-english.json b/data/model_data_json/Jean-Baptiste_roberta-large-ner-english.json new file mode 100644 index 0000000000000000000000000000000000000000..f48052568d14d36adab6482f09604760ebf07b6e --- /dev/null +++ b/data/model_data_json/Jean-Baptiste_roberta-large-ner-english.json @@ -0,0 +1,21 @@ +{ + "model_id": "Jean-Baptiste/roberta-large-ner-english", + "downloads": 197731, + "tags": [ + "transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "roberta", + "token-classification", + "en", + "dataset:conll2003", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - conll2003 widget: - text: \"My name is jean-baptiste and I live in montreal\" - text: \"My name is clara and I live in berkeley, california.\" - text: \"My name is wolfgang and I live in berlin\" train-eval-index: - config: conll2003 task: token-classification task_id: entity_extraction splits: eval_split: validation col_mapping: tokens: tokens ner_tags: tags license: mit --- # roberta-large-ner-english: model fine-tuned from roberta-large for NER task ## Introduction [roberta-large-ner-english] is an English NER model that was fine-tuned from roberta-large on the conll2003 dataset. The model was validated on emails/chat data and outperformed other models on this type of data specifically. In particular, the model seems to work better on entities that don't start with an upper case. ## Training data Training data was classified as follows: Abbreviation|Description -|- O |Outside of a named entity MISC |Miscellaneous entity PER |Person’s name ORG |Organization LOC |Location In order to simplify, the prefix B- or I- from the original conll2003 was removed. I used the train and test dataset from the original conll2003 for training and the \"validation\" dataset for validation.
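Aside on the two camembert-ner cards above, whose loading stubs lost their code: a minimal sketch of the standard token-classification pipeline, using a widget sentence from the card as input (the same pattern works for the -with-dates variant):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Swap in "Jean-Baptiste/camembert-ner-with-dates" for the DATE-aware variant.
model_name = "Jean-Baptiste/camembert-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(nlp("Je m'appelle jean-baptiste et je vis à montréal"))
```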
This resulted in a dataset of size: Train | Validation -|- 17494 | 3250 ## How to use roberta-large-ner-english with HuggingFace ##### Load roberta-large-ner-english and its sub-word tokenizer : ## Model performances Model performances computed on the conll2003 validation dataset (computed on the token predictions) entity|precision|recall|f1 -|-|-|- PER|0.9914|0.9927|0.9920 ORG|0.9627|0.9661|0.9644 LOC|0.9795|0.9862|0.9828 MISC|0.9292|0.9262|0.9277 Overall|0.9740|0.9766|0.9753 On a private dataset (email, chat, informal discussion), computed on word predictions: entity|precision|recall|f1 -|-|-|- PER|0.8823|0.9116|0.8967 ORG|0.7694|0.7292|0.7487 LOC|0.8619|0.7768|0.8171 By comparison, on the same private dataset, Spacy (en_core_web_trf-3.2.0) gave: entity|precision|recall|f1 -|-|-|- PER|0.9146|0.8287|0.8695 ORG|0.7655|0.6437|0.6993 LOC|0.8727|0.6180|0.7236 For those who could be interested, here is a short article on how I used the results of this model to train an LSTM model for signature detection in emails:", + "model_explanation_gemini": "Fine-tuned from roberta-large on the conll2003 dataset, this English NER model excels at identifying entities (persons, organizations, locations, miscellaneous) in informal text like emails and chats, particularly with non-uppercase entities." +} \ No newline at end of file diff --git a/data/model_data_json/JeffreyXiang_TRELLIS-image-large.json b/data/model_data_json/JeffreyXiang_TRELLIS-image-large.json new file mode 100644 index 0000000000000000000000000000000000000000..05bc52cd85e813cae6312508f22d892d49660dc2 --- /dev/null +++ b/data/model_data_json/JeffreyXiang_TRELLIS-image-large.json @@ -0,0 +1,14 @@ +{ + "model_id": "JeffreyXiang/TRELLIS-image-large", + "downloads": 896187, + "tags": [ + "trellis", + "image-to-3d", + "en", + "arxiv:2412.01506", + "license:mit", + "region:us" + ], + "description": "--- library_name: trellis pipeline_tag: image-to-3d license: mit language: - en --- # TRELLIS Image Large The image-conditioned version of TRELLIS, a large 3D generative model. It was introduced in the paper Structured 3D Latents for Scalable and Versatile 3D Generation. Project page: Code:", + "model_explanation_gemini": "Generates 3D models from input images using a structured latent space for scalable and versatile 3D generation." +} \ No newline at end of file diff --git a/data/model_data_json/John6666_the-big-x-files-bigxphotodemon-sdxl.json b/data/model_data_json/John6666_the-big-x-files-bigxphotodemon-sdxl.json new file mode 100644 index 0000000000000000000000000000000000000000..f74c775652db3d65ea0a4d4bcfa5b3ce000c678f --- /dev/null +++ b/data/model_data_json/John6666_the-big-x-files-bigxphotodemon-sdxl.json @@ -0,0 +1,22 @@ +{ + "model_id": "John6666/the-big-x-files-bigxphotodemon-sdxl", + "downloads": 131824, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "stable-diffusion", + "stable-diffusion-xl", + "realistic", + "photorealistic", + "bigasp", + "en", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: creativeml-openrail-m language: - en library_name: diffusers pipeline_tag: text-to-image tags: - text-to-image - stable-diffusion - stable-diffusion-xl - realistic - photorealistic - bigasp --- Original model is here. This model was created by qnt_iler.", + "model_explanation_gemini": "Generates photorealistic images from text prompts using Stable Diffusion XL."
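Likewise for the roberta-large-ner-english card above, whose loading stub lost its code; the same pipeline API applies, here with one of that card's widget sentences:

```python
from transformers import pipeline

# English NER with the fine-tuned RoBERTa model described above.
nlp = pipeline(
    "ner",
    model="Jean-Baptiste/roberta-large-ner-english",
    aggregation_strategy="simple",
)
for ent in nlp("My name is clara and I live in berkeley, california."):
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```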
+} \ No newline at end of file diff --git a/data/model_data_json/John6666_wai-nsfw-illustrious-v90-sdxl.json b/data/model_data_json/John6666_wai-nsfw-illustrious-v90-sdxl.json new file mode 100644 index 0000000000000000000000000000000000000000..9c98b47de3dc9c78d06a0ed57ea4a69147ea186f --- /dev/null +++ b/data/model_data_json/John6666_wai-nsfw-illustrious-v90-sdxl.json @@ -0,0 +1,26 @@ +{ + "model_id": "John6666/wai-nsfw-illustrious-v90-sdxl", + "downloads": 87756, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "stable-diffusion", + "stable-diffusion-xl", + "not-for-all-audiences", + "anime", + "hentai", + "girls", + "illustrious", + "en", + "base_model:OnomaAIResearch/Illustrious-xl-early-release-v0", + "base_model:finetune:OnomaAIResearch/Illustrious-xl-early-release-v0", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: other license_name: faipl-1.0-sd license_link: language: - en library_name: diffusers pipeline_tag: text-to-image tags: - text-to-image - stable-diffusion - stable-diffusion-xl - not-for-all-audiences - anime - hentai - girls - illustrious base_model: OnomaAIResearch/Illustrious-xl-early-release-v0 --- Original model is here. This model was created by WAI0731.", + "model_explanation_gemini": "Generates anime-style NSFW images from text prompts using Stable Diffusion XL." +} \ No newline at end of file diff --git a/data/model_data_json/KBLab_wav2vec2-large-voxrex-swedish.json b/data/model_data_json/KBLab_wav2vec2-large-voxrex-swedish.json new file mode 100644 index 0000000000000000000000000000000000000000..555b2f546be0e95483217b0241044e6033eeb787 --- /dev/null +++ b/data/model_data_json/KBLab_wav2vec2-large-voxrex-swedish.json @@ -0,0 +1,25 @@ +{ + "model_id": "KBLab/wav2vec2-large-voxrex-swedish", + "downloads": 88141, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "hf-asr-leaderboard", + "sv", + "dataset:common_voice", + "dataset:NST_Swedish_ASR_Database", + "dataset:P4", + "arxiv:2205.03026", + "license:cc0-1.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: sv arxiv: datasets: - common_voice - NST_Swedish_ASR_Database - P4 metrics: - wer tags: - audio - automatic-speech-recognition - speech - hf-asr-leaderboard license: cc0-1.0 model-index: - name: Wav2vec 2.0 large VoxRex Swedish results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice type: common_voice args: sv-SE metrics: - name: Test WER type: wer value: 8.49 --- # Wav2vec 2.0 large VoxRex Swedish (C) Fine-tuned version of KB's VoxRex large model using Swedish radio broadcasts, NST and Common Voice data. Evaluation without a language model gives the following: WER for the NST + Common Voice test set (2% of total sentences) is **2.5%**. WER for the Common Voice test set is **8.49%** directly and **7.37%** with a 4-gram language model. When using this model, make sure that your speech input is sampled at 16kHz. **Update 2022-01-10:** Updated to VoxRex-C version. **Update 2022-05-16:** The paper is here. # Performance\\* !Comparison
*Chart shows performance without the additional 20k steps of Common Voice fine-tuning ## Training This model has been fine-tuned for 120,000 updates on NST + CommonVoice and then for an additional 20,000 updates on CommonVoice only. The additional fine-tuning on CommonVoice hurts performance on the NST+CommonVoice test set somewhat and, unsurprisingly, improves it on the CommonVoice test set. It seems to perform generally better though [citation needed]. !WER during training ## Usage The model can be used directly (without a language model) as follows:
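The usage snippet itself was stripped from the card; a minimal sketch with the standard wav2vec2 CTC API (the audio file name is a placeholder; the 16 kHz resampling follows the note above):

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")
model = Wav2Vec2ForCTC.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")

# load audio and resample to 16 kHz, as the model expects
speech, sr = torchaudio.load("sample.wav")
speech = torchaudio.functional.resample(speech, sr, 16_000).squeeze()

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids))
```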
## Citation", + "model_explanation_gemini": "Finetuned for Swedish speech recognition, achieving low word error rates on datasets like Common Voice and NST without requiring a language model." +} \ No newline at end of file diff --git a/data/model_data_json/KBlueLeaf_DanTagGen-delta-rev2.json b/data/model_data_json/KBlueLeaf_DanTagGen-delta-rev2.json new file mode 100644 index 0000000000000000000000000000000000000000..9f2655746b6d1f5e5ceaab368100cdc425c37d3d --- /dev/null +++ b/data/model_data_json/KBlueLeaf_DanTagGen-delta-rev2.json @@ -0,0 +1,22 @@ +{ + "model_id": "KBlueLeaf/DanTagGen-delta-rev2", + "downloads": 122822, + "tags": [ + "transformers", + "safetensors", + "gguf", + "llama", + "text-generation", + "not-for-all-audiences", + "art", + "en", + "dataset:KBlueLeaf/danbooru2023-sqlite", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-sa-4.0 datasets: - KBlueLeaf/danbooru2023-sqlite language: - en library_name: transformers pipeline_tag: text-generation tags: - not-for-all-audiences - art widget: - text: \"quality: masterpiece\\nrating: safe\\nartist: <|empty|>\\ncharacters: <|empty|>\\ncopyrights: <|empty|>\\naspect ratio: 1.0\\ntarget: <|short|>\\ngeneral: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>\" --- # DanTagGen - delta (rev2) DanTagGen (Danbooru Tag Generator) is inspired by p1atdev's dart project, but with a different arch, dataset, format and training strategy. ## Difference between versions - alpha: pretrained on 2M dataset, smaller batch size. Limited ability. - beta: pretrained on 5.3M dataset, larger batch size. More stable, better ability with only a little information provided. - delta: pretrained on 7.2M dataset, larger batch size. Slightly underfit but better diversity. Quality tag introduced. - rev2: resumed from delta, same dataset, 2 more epochs. ## Model arch This version of DTG is trained from scratch with a 400M-param LLaMA arch (in my personal preference I will call it NanoLLaMA). Since it is LLaMA arch, theoretically it should be usable in any LLaMA inference interface. This repo also provides a converted FP16 gguf model and quantized 8bit/6bit gguf models. Basically it is recommended to use llama.cpp or llama-cpp-python to run this model, which will be very fast. ## Format for example: And you may get something like: ## Dataset and Training I use the trainer I implemented in HakuPhi to run the training, with a total of 12 epochs on 7.2M data. This model has seen roughly 10~15B tokens. The dataset is exported by HakuBooru with my danbooru sqlite database, using the percentile of fav_count on each rating to filter the data (2M = top 25%, 5.3M = top 75%). ## Utilities - HF space: - Demo for DTG + Kohaku XL Epsilon: - SD-WebUI Extension: - ComfyUI Node:", + "model_explanation_gemini": "Generates Danbooru-style image tags for character attributes, quality, and artistic details based on structured input prompts."
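The Format section above lost its prompt/output examples. A minimal generation sketch, assuming the standard transformers causal-LM API, with the prompt taken from the card's widget (sampling settings are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KBlueLeaf/DanTagGen-delta-rev2")
model = AutoModelForCausalLM.from_pretrained("KBlueLeaf/DanTagGen-delta-rev2")

# prompt format copied from the widget example in the model card
prompt = (
    "quality: masterpiece\nrating: safe\nartist: <|empty|>\ncharacters: <|empty|>\n"
    "copyrights: <|empty|>\naspect ratio: 1.0\ntarget: <|short|>\n"
    "general: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>"
)
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0)
# keep special tokens visible so the <|...|> markers in the output stay readable
print(tokenizer.decode(out[0], skip_special_tokens=False))
```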
+} \ No newline at end of file diff --git a/data/model_data_json/Khalsuu_filipino-wav2vec2-l-xls-r-300m-official.json b/data/model_data_json/Khalsuu_filipino-wav2vec2-l-xls-r-300m-official.json new file mode 100644 index 0000000000000000000000000000000000000000..f011921092779730cf3c8e21b355e7e8c330c543 --- /dev/null +++ b/data/model_data_json/Khalsuu_filipino-wav2vec2-l-xls-r-300m-official.json @@ -0,0 +1,18 @@ +{ + "model_id": "Khalsuu/filipino-wav2vec2-l-xls-r-300m-official", + "downloads": 1063002, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "wav2vec2", + "automatic-speech-recognition", + "generated_from_trainer", + "dataset:filipino_voice", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - generated_from_trainer datasets: - filipino_voice model-index: - name: filipino-wav2vec2-l-xls-r-300m-official results: [] --- # filipino-wav2vec2-l-xls-r-300m-official This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the filipino_voice dataset. It achieves the following results on the evaluation set: - Loss: 0.4672 - Wer: 0.2922 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.0003 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - gradient_accumulation_steps: 2 - total_train_batch_size: 16 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 500 - num_epochs: 30 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | Wer | |:-------------:|:-----:|:----:|:---------------:|:------:| | 3.3671 | 2.09 | 400 | 0.5584 | 0.5987 | | 0.48 | 4.19 | 800 | 0.4244 | 0.4195 | | 0.2796 | 6.28 | 1200 | 0.3742 | 0.3765 | | 0.1916 | 8.38 | 1600 | 0.4291 | 0.3667 | | 0.1463 | 10.47 | 2000 | 0.3745 | 0.3415 | | 0.1165 | 12.57 | 2400 | 0.4472 | 0.3407 | | 0.0955 | 14.66 | 2800 | 0.4269 | 0.3290 | | 0.0823 | 16.75 | 3200 | 0.4608 | 0.3475 | | 0.0709 | 18.85 | 3600 | 0.4706 | 0.3281 | | 0.0603 | 20.94 | 4000 | 0.4380 | 0.3183 | | 0.0527 | 23.04 | 4400 | 0.4473 | 0.3067 | | 0.0449 | 25.13 | 4800 | 0.4550 | 0.3029 | | 0.041 | 27.23 | 5200 | 0.4671 | 0.3020 | | 0.0358 | 29.32 | 5600 | 0.4672 | 0.2922 | ### Framework versions - Transformers 4.11.3 - Pytorch 1.10.0+cu113 - Datasets 1.18.3 - Tokenizers 0.10.3", + "model_explanation_gemini": "A fine-tuned version of facebook/wav2vec2-xls-r-300m for speech recognition on Filipino audio data." 
+} \ No newline at end of file diff --git a/data/model_data_json/KoalaAI_Text-Moderation.json b/data/model_data_json/KoalaAI_Text-Moderation.json new file mode 100644 index 0000000000000000000000000000000000000000..132d6d873f1e29acf72e255f8cf4b2e5ef108fbb --- /dev/null +++ b/data/model_data_json/KoalaAI_Text-Moderation.json @@ -0,0 +1,23 @@ +{ + "model_id": "KoalaAI/Text-Moderation", + "downloads": 222667, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "deberta", + "text-classification", + "autotrain", + "en", + "dataset:mmathys/openai-moderation-api-evaluation", + "dataset:KoalaAI/Text-Moderation-v2-small", + "license:openrail", + "co2_eq_emissions", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - autotrain - text-classification language: - en widget: - text: I love AutoTrain - text: I absolutely hate those people - text: I love cake! - text: >- lets build the wall and deport illegals \"they walk across the border like this is Central park\" - text: EU offers to pay countries 6,000 euros per person to take in migrants datasets: - mmathys/openai-moderation-api-evaluation - KoalaAI/Text-Moderation-v2-small co2_eq_emissions: emissions: 0.03967468113268738 license: openrail --- # Text Moderation This model is a text classification model based on Deberta-v3 that predicts whether a text contains content that could be considered offensive. It is split up into the following labels: | Category | Label | Definition | | -------- | ----- | ---------- | | sexual | | Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). | | hate | | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. | | violence | | Content that promotes or glorifies violence or celebrates the suffering or humiliation of others. | | harassment | | Content that may be used to torment or annoy individuals in real life, or make harassment more likely to occur. | | self-harm | | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. | | sexual/minors | | Sexual content that includes an individual who is under 18 years old. | | hate/threatening | | Hateful content that also includes violence or serious harm towards the targeted group. | | violence/graphic | | Violent content that depicts death, violence, or serious physical injury in extreme graphic detail. | | OK | | Not offensive | It's important to remember that this model was only trained on English texts, and may not perform well on non-English inputs. ## Ethical Considerations This is a model that deals with sensitive and potentially harmful language. Users should consider the ethical implications and potential risks of using or deploying this model in their applications or contexts. Some of the ethical issues that may arise are: - The model may reinforce or amplify existing biases or stereotypes in the data or in the society. For example, the model may associate certain words or topics with offensive language based on the frequency or co-occurrence in the data, without considering the meaning or intent behind them. This may result in unfair or inaccurate predictions for some groups or individuals.
Users should carefully consider the purpose, context, and impact of using this model, and take appropriate measures to prevent or mitigate any potential harm. Users should also respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions. ## License This model is licensed under the CodeML OpenRAIL-M 0.1 license, which is a variant of the BigCode OpenRAIL-M license. This license allows you to freely access, use, modify, and distribute this model and its derivatives, for research, commercial or non-commercial purposes, as long as you comply with the following conditions: - You must include a copy of the license and the original source of the model in any copies or derivatives of the model that you distribute. - You must not use the model or its derivatives for any unlawful, harmful, abusive, discriminatory, or offensive purposes, or to cause or contribute to any social or environmental harm. - You must respect the privacy and consent of the data subjects whose data was used to train or evaluate the model, and adhere to the relevant laws and regulations in your jurisdiction. - You must acknowledge that the model and its derivatives are provided \"as is\", without any warranties or guarantees of any kind, and that the licensor is not liable for any damages or losses arising from your use of the model or its derivatives. By accessing or using this model, you agree to be bound by the terms of this license. If you do not agree with the terms of this license, you must not access or use this model. ## Training Details - Problem type: Multi-class Classification - CO2 Emissions (in grams): 0.0397 ## Validation Metrics - Loss: 0.848 - Accuracy: 0.749 (75%) - Macro F1: 0.326 - Micro F1: 0.749 - Weighted F1: 0.703 - Macro Precision: 0.321 - Micro Precision: 0.749 - Weighted Precision: 0.671 - Macro Recall: 0.349 - Micro Recall: 0.749 - Weighted Recall: 0.749 ## Usage You can use cURL to access this model: Or Python API: The output of the above Python code will look like this:
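The cURL and Python snippets were stripped from the Usage section. A minimal Python sketch, assuming the standard text-classification pipeline; the label codes and scores in the comment are illustrative:

```python
from transformers import pipeline

# top_k=None returns scores for every moderation label instead of only the best one
classifier = pipeline("text-classification", model="KoalaAI/Text-Moderation", top_k=None)
print(classifier("I love cake!"))
# roughly: [[{'label': 'OK', 'score': 0.99...}, {'label': 'H', 'score': 0.00...}, ...]]
```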
", + "model_explanation_gemini": "Classifies English text into categories of offensive content, including hate speech, violence, harassment, self-harm, and sexual material." +} \ No newline at end of file diff --git a/data/model_data_json/KoboldAI_LLaMA2-13B-Erebus-v3-GGUF.json b/data/model_data_json/KoboldAI_LLaMA2-13B-Erebus-v3-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..5c1130551c1a8cc948223daa660e5497f64db5fa --- /dev/null +++ b/data/model_data_json/KoboldAI_LLaMA2-13B-Erebus-v3-GGUF.json @@ -0,0 +1,18 @@ +{ + "model_id": "KoboldAI/LLaMA2-13B-Erebus-v3-GGUF", + "downloads": 107782, + "tags": [ + "gguf", + "pytorch", + "llama", + "llama-2", + "finetuned", + "not-for-all-audiences", + "text-generation", + "en", + "license:llama2", + "region:us" + ], + "description": "--- license: llama2 license_name: llama-2 license_link: LICENSE language: - en pipeline_tag: text-generation inference: false tags: - pytorch - llama - llama-2 - finetuned - not-for-all-audiences --- # LLaMA2-13B-Erebus This is the GGUF version of this model, designed to be used with Koboldcpp, the easy and versatile program for AI co-writing and more. Because this is a co-writing model, you are expected to use Story Mode inside the KoboldAI Lite interface. ## Model description This is the third generation of the original Shinen made by Mr. Seeker. The full dataset consists of 8 different sources, all surrounding the \"Adult\" theme. The name \"Erebus\" comes from Greek mythology and means \"darkness\", in line with Shin'en, or \"deep abyss\". For inquiries, please contact the KoboldAI community. **Warning: THIS model is NOT suitable for use by minors. The model will output X-rated content.** ## Training procedure LLaMA2-13B-Erebus was trained on 8x A6000 Ada GPUs for a single epoch. No special frameworks have been used. ## Training data The data can be divided into 8 different datasets: - Literotica (everything with 3.0/5 or higher) - Sexstories (everything with 70 or higher) - Dataset-G (private dataset of X-rated stories) - Doc's Lab (all stories) - Lushstories (Editor's pick) - Swinglifestyle (all stories) - Pike-v2 Dataset (novels with \"adult\" rating) - SoFurry (collection of various animals) The dataset uses for tagging. The full dataset is 2.3B tokens in size. ## Limitations and biases Based on known problems with NLP technology, potential relevant factors include bias (gender, profession, race and religion). **Warning: This model has a very strong NSFW bias!**", + "model_explanation_gemini": "Generates NSFW text content for adult-themed co-writing tasks using a finetuned LLaMA2-13B model." +} \ No newline at end of file diff --git a/data/model_data_json/LGAI-EXAONE_EXAONE-3.5-32B-Instruct.json b/data/model_data_json/LGAI-EXAONE_EXAONE-3.5-32B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..074a36554ab5becf56abd084a2e1c762ae81dd59 --- /dev/null +++ b/data/model_data_json/LGAI-EXAONE_EXAONE-3.5-32B-Instruct.json @@ -0,0 +1,22 @@ +{ + "model_id": "LGAI-EXAONE/EXAONE-3.5-32B-Instruct", + "downloads": 149122, + "tags": [ + "transformers", + "safetensors", + "exaone", + "text-generation", + "lg-ai", + "exaone-3.5", + "conversational", + "custom_code", + "en", + "ko", + "arxiv:2412.04862", + "license:other", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: other license_name: exaone license_link: LICENSE language: - en - ko tags: - lg-ai - exaone - exaone-3.5 pipeline_tag: text-generation library_name: transformers ---


# EXAONE-3.5-32B-Instruct ## Introduction We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. EXAONE 3.5 language models include: 1) **2.4B model** optimized for deployment on small or resource-constrained devices, 2) **7.8B model** matching the size of its predecessor but offering improved performance, and 3) **32B model** delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes. For more details, please refer to our technical report, blog and GitHub. This repository contains the instruction-tuned 32B language model with the following features: - Number of Parameters (without embeddings): 30.95B - Number of Layers: 64 - Number of Attention Heads: GQA with 40 Q-heads and 8 KV-heads - Vocab Size: 102,400 - Context Length: 32,768 tokens ## Quickstart We recommend using transformers v4.43 or later. Here is a code sketch to run conversational inference with the model: > ### Note > The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt, > so we highly recommend using the system prompts provided in the code snippet below.
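The original Quickstart snippet was stripped in extraction. A minimal sketch, assuming the standard transformers chat-template workflow; the system prompt wording and generation settings here are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.5-32B-Instruct"
# trust_remote_code is required because the exaone architecture ships custom code
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    # the card recommends always supplying a system prompt; wording is an assumption
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": "Explain who you are"},
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids.to(model.device), max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0]))
```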
## Evaluation The following table shows the evaluation results of real-world use cases. The full evaluation results can be found in the technical report.

| Models | MT-Bench | LiveBench | Arena-Hard | AlpacaEval | IFEval | KoMT-Bench[1] | LogicKor |
|---|---|---|---|---|---|---|---|
| EXAONE 3.5 32B | 8.51 | 43.0 | 78.6 | 60.6 | 81.7 | 8.05 | 9.06 |
| Qwen 2.5 32B | 8.49 | 50.6 | 67.0 | 41.0 | 78.7 | 7.75 | 8.89 |
| C4AI Command R 32B | 7.38 | 29.7 | 17.0 | 25.9 | 26.1 | 6.72 | 8.24 |
| Gemma 2 27B | 8.28 | 40.0 | 57.5 | 52.2 | 59.7 | 7.19 | 8.56 |
| Yi 1.5 34B | 7.64 | 26.2 | 23.1 | 34.8 | 55.5 | 4.88 | 6.33 |
- [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see README for more details. ## Deployment EXAONE 3.5 models can be run with various inference frameworks; please refer to our EXAONE 3.5 GitHub for more details about the inference frameworks. ## Quantization We provide the pre-quantized EXAONE 3.5 models with **AWQ** and several quantization types in **GGUF** format. Please refer to our EXAONE 3.5 collection to find the corresponding quantized models. ## Limitation The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probability of tokens, which is determined during learning from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research. - Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information. - Biased responses may be generated, which are associated with age, gender, race, and so on. - The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences. - Since the model does not reflect the latest information, the responses may be false or contradictory. LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using EXAONE language models. ## License The model is licensed under EXAONE AI Model License Agreement 1.1 - NC ## Citation ## Contact LG AI Research Technical Support: contact_us@lgresearch.ai", + "model_explanation_gemini": "A 32-billion-parameter bilingual (English and Korean) instruction-tuned generative model optimized for long-context understanding and real-world use cases, supporting up to 32K tokens." +} \ No newline at end of file diff --git a/data/model_data_json/LGAI-EXAONE_EXAONE-3.5-7.8B-Instruct.json b/data/model_data_json/LGAI-EXAONE_EXAONE-3.5-7.8B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..5496936b3a2b92e0b5742e58ad94139851aad72a --- /dev/null +++ b/data/model_data_json/LGAI-EXAONE_EXAONE-3.5-7.8B-Instruct.json @@ -0,0 +1,22 @@ +{ + "model_id": "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct", + "downloads": 240584, + "tags": [ + "transformers", + "safetensors", + "exaone", + "text-generation", + "lg-ai", + "exaone-3.5", + "conversational", + "custom_code", + "en", + "ko", + "arxiv:2412.04862", + "license:other", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: other license_name: exaone license_link: LICENSE language: - en - ko tags: - lg-ai - exaone - exaone-3.5 pipeline_tag: text-generation library_name: transformers ---


# EXAONE-3.5-7.8B-Instruct ## Introduction We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. EXAONE 3.5 language models include: 1) **2.4B model** optimized for deployment on small or resource-constrained devices, 2) **7.8B model** matching the size of its predecessor but offering improved performance, and 3) **32B model** delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes. For more details, please refer to our technical report, blog and GitHub. This repository contains the instruction-tuned 7.8B language model with the following features: - Number of Parameters (without embeddings): 6.98B - Number of Layers: 32 - Number of Attention Heads: GQA with 32 Q-heads and 8 KV-heads - Vocab Size: 102,400 - Context Length: 32,768 tokens ## Quickstart We recommend using transformers v4.43 or later. Conversational inference works as in the 32B sketch above, with only the model id changed. > ### Note > The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt, > so we highly recommend using the system prompts provided in the code snippet above. ## Evaluation The following table shows the evaluation results of real-world use cases. The full evaluation results can be found in the technical report.
| Models | MT-Bench | LiveBench | Arena-Hard | AlpacaEval | IFEval | KoMT-Bench[1] | LogicKor |
|---|---|---|---|---|---|---|---|
| EXAONE 3.5 7.8B | 8.29 | 39.8 | 68.7 | 54.2 | 78.9 | 7.96 | 9.08 |
| Qwen 2.5 7B | 6.48 | 35.6 | 48.9 | 31.7 | 72.5 | 5.19 | 6.38 |
| Llama 3.1 8B | 7.59 | 28.3 | 27.7 | 25.7 | 74.5 | 4.85 | 5.99 |
| Gemma 2 9B | 7.64 | 32.1 | 43.6 | 47.3 | 54.7 | 7.10 | 8.05 |
| Phi 3 small (7B) | 7.63 | 27.9 | 26.8 | 29.2 | 59.5 | 3.22 | 3.99 |
- [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see README for more details. ## Deployment EXAONE 3.5 models can be run with various inference frameworks; please refer to our EXAONE 3.5 GitHub for more details about the inference frameworks. ## Quantization We provide the pre-quantized EXAONE 3.5 models with **AWQ** and several quantization types in **GGUF** format. Please refer to our EXAONE 3.5 collection to find the corresponding quantized models. ## Limitation The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probability of tokens, which is determined during learning from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research. - Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information. - Biased responses may be generated, which are associated with age, gender, race, and so on. - The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences. - Since the model does not reflect the latest information, the responses may be false or contradictory. LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using EXAONE language models. ## License The model is licensed under EXAONE AI Model License Agreement 1.1 - NC ## Citation ## Contact LG AI Research Technical Support: contact_us@lgresearch.ai", + "model_explanation_gemini": "A 7.8B-parameter bilingual (English and Korean) instruction-tuned generative model optimized for text generation with long-context processing up to 32K tokens." +} \ No newline at end of file diff --git a/data/model_data_json/LGAI-EXAONE_EXAONE-Deep-2.4B.json b/data/model_data_json/LGAI-EXAONE_EXAONE-Deep-2.4B.json new file mode 100644 index 0000000000000000000000000000000000000000..dc6bafe108ecd1213190ae8b7f2f65faa5624f73 --- /dev/null +++ b/data/model_data_json/LGAI-EXAONE_EXAONE-Deep-2.4B.json @@ -0,0 +1,24 @@ +{ + "model_id": "LGAI-EXAONE/EXAONE-Deep-2.4B", + "downloads": 124493, + "tags": [ + "transformers", + "safetensors", + "exaone", + "text-generation", + "lg-ai", + "exaone-deep", + "conversational", + "custom_code", + "en", + "ko", + "arxiv:2503.12524", + "base_model:LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct", + "base_model:finetune:LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct", + "license:other", + "autotrain_compatible", + "region:us" + ], + "description": "--- base_model: LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct base_model_relation: finetune license: other license_name: exaone license_link: LICENSE language: - en - ko tags: - lg-ai - exaone - exaone-deep pipeline_tag: text-generation library_name: transformers ---


# EXAONE-Deep-2.4B ## Introduction We introduce EXAONE Deep, a collection of models ranging from 2.4B to 32B parameters developed and released by LG AI Research, which exhibits superior capabilities in various reasoning tasks including math and coding benchmarks. Evaluation results show that 1) EXAONE Deep **2.4B** outperforms other models of comparable size, 2) EXAONE Deep **7.8B** outperforms not only open-weight models of comparable scale but also the proprietary reasoning model OpenAI o1-mini, and 3) EXAONE Deep **32B** demonstrates competitive performance against leading open-weight models. For more details, please refer to our documentation, blog and GitHub.

This repository contains the reasoning 2.4B language model with the following features: - Number of Parameters (without embeddings): 2.14B - Number of Layers: 30 - Number of Attention Heads: GQA with 32 Q-heads and 8 KV-heads - Vocab Size: 102,400 - Context Length: 32,768 tokens - Tie Word Embeddings: True (unlike the 7.8B and 32B models) ## Quickstart We recommend using transformers v4.43.1 or later. Here is a code sketch to run conversational inference with the model: > ### Note > The EXAONE Deep models are trained with an optimized configuration, > so we recommend following the Usage Guideline section to achieve optimal performance.
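The original Quickstart snippet was stripped in extraction. A minimal sketch following the Usage Guideline below (no system prompt, step-by-step instruction in the user turn); the sampling values mirror the evaluation settings cited there and should be treated as assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-Deep-2.4B"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# no system prompt, per the Usage Guideline; the math instruction goes in the user turn
messages = [{
    "role": "user",
    "content": "What is 6 * 7? Please reason step by step, and put your final answer within \\boxed{}.",
}]
# add_generation_prompt=True is expected to make the response open with the reasoning block
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(
    input_ids.to(model.device), max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output[0]))
```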
## Evaluation The following table shows the evaluation results of reasoning tasks such as math and coding. The full evaluation results can be found in the documentation.

| Models | MATH-500 (pass@1) | AIME 2024 (pass@1 / cons@64) | AIME 2025 (pass@1 / cons@64) | CSAT Math 2025 (pass@1) | GPQA Diamond (pass@1) | Live Code Bench (pass@1) |
|---|---|---|---|---|---|---|
| EXAONE Deep 32B | 95.7 | 72.1 / 90.0 | 65.8 / 80.0 | 94.5 | 66.1 | 59.5 |
| DeepSeek-R1-Distill-Qwen-32B | 94.3 | 72.6 / 83.3 | 55.2 / 73.3 | 84.1 | 62.1 | 57.2 |
| QwQ-32B | 95.5 | 79.5 / 86.7 | 67.1 / 76.7 | 94.4 | 63.3 | 63.4 |
| DeepSeek-R1-Distill-Llama-70B | 94.5 | 70.0 / 86.7 | 53.9 / 66.7 | 88.8 | 65.2 | 57.5 |
| DeepSeek-R1 (671B) | 97.3 | 79.8 / 86.7 | 66.8 / 80.0 | 89.9 | 71.5 | 65.9 |
| EXAONE Deep 7.8B | 94.8 | 70.0 / 83.3 | 59.6 / 76.7 | 89.9 | 62.6 | 55.2 |
| DeepSeek-R1-Distill-Qwen-7B | 92.8 | 55.5 / 83.3 | 38.5 / 56.7 | 79.7 | 49.1 | 37.6 |
| DeepSeek-R1-Distill-Llama-8B | 89.1 | 50.4 / 80.0 | 33.6 / 53.3 | 74.1 | 49.0 | 39.6 |
| OpenAI o1-mini | 90.0 | 63.6 / 80.0 | 54.8 / 66.7 | 84.4 | 60.0 | 53.8 |
| EXAONE Deep 2.4B | 92.3 | 52.5 / 76.7 | 47.9 / 73.3 | 79.2 | 54.3 | 46.6 |
| DeepSeek-R1-Distill-Qwen-1.5B | 83.9 | 28.9 / 52.7 | 23.9 / 36.7 | 65.6 | 33.8 | 16.9 |
## Deployment EXAONE Deep models can be run with various inference frameworks; please refer to our EXAONE Deep GitHub for more details about the inference frameworks. ## Quantization We provide the pre-quantized EXAONE Deep models with **AWQ** and several quantization types in **GGUF** format. Please refer to our EXAONE Deep collection to find the corresponding quantized models. ## Usage Guideline To achieve the expected performance, we recommend using the following configurations: 1. Ensure the model starts with <thought> for its reasoning steps; the model's output quality may be degraded when you omit it. You can easily apply this by using tokenizer.apply_chat_template() with add_generation_prompt=True. Please check the example code in the Quickstart section. 2. The reasoning steps of EXAONE Deep models, enclosed by <thought> and </thought>, usually contain lots of tokens, so previous reasoning steps may need to be removed in multi-turn situations. The provided tokenizer handles this automatically. 3. Avoid using a system prompt, and build the instruction into the user prompt. 4. Additional instructions help the models reason more deeply, so that they generate better output. - For math problems, the instructions **\"Please reason step by step, and put your final answer within \\boxed{}.\"** are helpful. - For more information on our evaluation setting including prompts, please refer to our Documentation. 5. In our evaluation, we use temperature 0.6 and top-p 0.95 for generation. 6. When evaluating the models, it is recommended to test multiple times to assess the expected performance accurately. ## Limitation The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probability of tokens, which is determined during learning from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research. - Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information. - Biased responses may be generated, which are associated with age, gender, race, and so on. - The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences. - Since the model does not reflect the latest information, the responses may be false or contradictory. LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using EXAONE language models. ## License The model is licensed under EXAONE AI Model License Agreement 1.1 - NC ## Citation ## Contact LG AI Research Technical Support: contact_us@lgresearch.ai", + "model_explanation_gemini": "A 2.4B-parameter language model optimized for reasoning tasks like math and coding, outperforming similar-sized models in benchmarks."
+} \ No newline at end of file diff --git a/data/model_data_json/LGAI-EXAONE_EXAONE-Deep-32B.json b/data/model_data_json/LGAI-EXAONE_EXAONE-Deep-32B.json new file mode 100644 index 0000000000000000000000000000000000000000..ec2e1d4ce03d1ffc20b7574d85c67a02722042e8 --- /dev/null +++ b/data/model_data_json/LGAI-EXAONE_EXAONE-Deep-32B.json @@ -0,0 +1,24 @@ +{ + "model_id": "LGAI-EXAONE/EXAONE-Deep-32B", + "downloads": 116885, + "tags": [ + "transformers", + "safetensors", + "exaone", + "text-generation", + "lg-ai", + "exaone-deep", + "conversational", + "custom_code", + "en", + "ko", + "arxiv:2503.12524", + "base_model:LGAI-EXAONE/EXAONE-3.5-32B-Instruct", + "base_model:finetune:LGAI-EXAONE/EXAONE-3.5-32B-Instruct", + "license:other", + "autotrain_compatible", + "region:us" + ], + "description": "--- base_model: LGAI-EXAONE/EXAONE-3.5-32B-Instruct base_model_relation: finetune license: other license_name: exaone license_link: LICENSE language: - en - ko tags: - lg-ai - exaone - exaone-deep pipeline_tag: text-generation library_name: transformers ---


# EXAONE-Deep-32B ## Introduction We introduce EXAONE Deep, a collection of models ranging from 2.4B to 32B parameters developed and released by LG AI Research, which exhibits superior capabilities in various reasoning tasks including math and coding benchmarks. Evaluation results show that 1) EXAONE Deep **2.4B** outperforms other models of comparable size, 2) EXAONE Deep **7.8B** outperforms not only open-weight models of comparable scale but also the proprietary reasoning model OpenAI o1-mini, and 3) EXAONE Deep **32B** demonstrates competitive performance against leading open-weight models. For more details, please refer to our documentation, blog and GitHub.

This repository contains the reasoning 32B language model with the following features: - Number of Parameters (without embeddings): 30.95B - Number of Layers: 64 - Number of Attention Heads: GQA with 40 Q-heads and 8 KV-heads - Vocab Size: 102,400 - Context Length: 32,768 tokens ## Quickstart We recommend using transformers v4.43.1 or later. Conversational inference works as in the 2.4B sketch above, with only the model id changed. > ### Note > The EXAONE Deep models are trained with an optimized configuration, > so we recommend following the Usage Guideline section to achieve optimal performance. ## Evaluation The following table shows the evaluation results of reasoning tasks such as math and coding. The full evaluation results can be found in the documentation.
| Models | MATH-500 (pass@1) | AIME 2024 (pass@1 / cons@64) | AIME 2025 (pass@1 / cons@64) | CSAT Math 2025 (pass@1) | GPQA Diamond (pass@1) | Live Code Bench (pass@1) |
|---|---|---|---|---|---|---|
| EXAONE Deep 32B | 95.7 | 72.1 / 90.0 | 65.8 / 80.0 | 94.5 | 66.1 | 59.5 |
| DeepSeek-R1-Distill-Qwen-32B | 94.3 | 72.6 / 83.3 | 55.2 / 73.3 | 84.1 | 62.1 | 57.2 |
| QwQ-32B | 95.5 | 79.5 / 86.7 | 67.1 / 76.7 | 94.4 | 63.3 | 63.4 |
| DeepSeek-R1-Distill-Llama-70B | 94.5 | 70.0 / 86.7 | 53.9 / 66.7 | 88.8 | 65.2 | 57.5 |
| DeepSeek-R1 (671B) | 97.3 | 79.8 / 86.7 | 66.8 / 80.0 | 89.9 | 71.5 | 65.9 |
| EXAONE Deep 7.8B | 94.8 | 70.0 / 83.3 | 59.6 / 76.7 | 89.9 | 62.6 | 55.2 |
| DeepSeek-R1-Distill-Qwen-7B | 92.8 | 55.5 / 83.3 | 38.5 / 56.7 | 79.7 | 49.1 | 37.6 |
| DeepSeek-R1-Distill-Llama-8B | 89.1 | 50.4 / 80.0 | 33.6 / 53.3 | 74.1 | 49.0 | 39.6 |
| OpenAI o1-mini | 90.0 | 63.6 / 80.0 | 54.8 / 66.7 | 84.4 | 60.0 | 53.8 |
| EXAONE Deep 2.4B | 92.3 | 52.5 / 76.7 | 47.9 / 73.3 | 79.2 | 54.3 | 46.6 |
| DeepSeek-R1-Distill-Qwen-1.5B | 83.9 | 28.9 / 52.7 | 23.9 / 36.7 | 65.6 | 33.8 | 16.9 |
## Deployment EXAONE Deep models can be run with various inference frameworks; please refer to our EXAONE Deep GitHub for more details about the inference frameworks. ## Quantization We provide the pre-quantized EXAONE Deep models with **AWQ** and several quantization types in **GGUF** format. Please refer to our EXAONE Deep collection to find the corresponding quantized models. ## Usage Guideline To achieve the expected performance, we recommend using the following configurations: 1. Ensure the model starts with <thought> for its reasoning steps; the model's output quality may be degraded when you omit it. You can easily apply this by using tokenizer.apply_chat_template() with add_generation_prompt=True. Please check the example code in the Quickstart section. 2. The reasoning steps of EXAONE Deep models, enclosed by <thought> and </thought>, usually contain lots of tokens, so previous reasoning steps may need to be removed in multi-turn situations. The provided tokenizer handles this automatically. 3. Avoid using a system prompt, and build the instruction into the user prompt. 4. Additional instructions help the models reason more deeply, so that they generate better output. - For math problems, the instructions **\"Please reason step by step, and put your final answer within \\boxed{}.\"** are helpful. - For more information on our evaluation setting including prompts, please refer to our Documentation. 5. In our evaluation, we use temperature 0.6 and top-p 0.95 for generation. 6. When evaluating the models, it is recommended to test multiple times to assess the expected performance accurately. ## Limitation The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probability of tokens, which is determined during learning from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research. - Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information. - Biased responses may be generated, which are associated with age, gender, race, and so on. - The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences. - Since the model does not reflect the latest information, the responses may be false or contradictory. LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using EXAONE language models. ## License The model is licensed under EXAONE AI Model License Agreement 1.1 - NC ## Citation ## Contact LG AI Research Technical Support: contact_us@lgresearch.ai", + "model_explanation_gemini": "A 32B-parameter language model specialized in reasoning tasks like math and coding, outperforming comparable models in benchmarks."
+} \ No newline at end of file diff --git a/data/model_data_json/Lightricks_LTX-Video.json b/data/model_data_json/Lightricks_LTX-Video.json new file mode 100644 index 0000000000000000000000000000000000000000..84124aad16c586e9e002834ea8fe4b0d9ff0b73d --- /dev/null +++ b/data/model_data_json/Lightricks_LTX-Video.json @@ -0,0 +1,17 @@ +{ + "model_id": "Lightricks/LTX-Video", + "downloads": 154823, + "tags": [ + "diffusers", + "safetensors", + "ltx-video", + "image-to-video", + "text-to-video", + "en", + "license:other", + "diffusers:LTXPipeline", + "region:us" + ], + "description": "--- tags: - ltx-video - image-to-video pinned: true language: - en license: other pipeline_tag: text-to-video library_name: diffusers --- # LTX-Video Model Card This model card focuses on the model associated with the LTX-Video model, codebase available here. LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. We provide a model for both text-to-video as well as image+text-to-video use cases (see the \"trailer\"). Example gallery, with each sample's generation prompt:

- !example1: A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage.
- !example2: A woman walks away from a white Jeep parked on a city street at night, then ascends a staircase and knocks on a door. The woman, wearing a dark jacket and jeans, walks away from the Jeep parked on the left side of the street, her back to the camera; she walks at a steady pace, her arms swinging slightly by her sides; the street is dimly lit, with streetlights casting pools of light on the wet pavement; a man in a dark jacket and jeans walks past the Jeep in the opposite direction; the camera follows the woman from behind as she walks up a set of stairs towards a building with a green door; she reaches the top of the stairs and turns left, continuing to walk towards the building; she reaches the door and knocks on it with her right hand; the camera remains stationary, focused on the doorway; the scene is captured in real-life footage.
- !example3: A woman with blonde hair styled up, wearing a black dress with sequins and pearl earrings, looks down with a sad expression on her face. The camera remains stationary, focused on the woman's face. The lighting is dim, casting soft shadows on her face. The scene appears to be from a movie or TV show.
- !example4: The camera pans over a snow-covered mountain range, revealing a vast expanse of snow-capped peaks and valleys. The mountains are covered in a thick layer of snow, with some areas appearing almost white while others have a slightly darker, almost grayish hue. The peaks are jagged and irregular, with some rising sharply into the sky while others are more rounded. The valleys are deep and narrow, with steep slopes that are also covered in snow. The trees in the foreground are mostly bare, with only a few leaves remaining on their branches. The sky is overcast, with thick clouds obscuring the sun. The overall impression is one of peace and tranquility, with the snow-covered mountains standing as a testament to the power and beauty of nature.
- !example5: A woman with light skin, wearing a blue jacket and a black hat with a veil, looks down and to her right, then back up as she speaks; she has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her jacket; the camera remains stationary on her face as she speaks; the background is out of focus, but shows trees and people in period clothing; the scene is captured in real-life footage.
- !example6: A man in a dimly lit room talks on a vintage telephone, hangs up, and looks down with a sad expression. He holds the black rotary phone to his right ear with his right hand, his left hand holding a rocks glass with amber liquid. He wears a brown suit jacket over a white shirt, and a gold ring on his left ring finger. His short hair is neatly combed, and he has light skin with visible wrinkles around his eyes. The camera remains stationary, focused on his face and upper body. The room is dark, lit only by a warm light source off-screen to the left, casting shadows on the wall behind him. The scene appears to be from a movie.
- !example7: A prison guard unlocks and opens a cell door to reveal a young man sitting at a table with a woman. The guard, wearing a dark blue uniform with a badge on his left chest, unlocks the cell door with a key held in his right hand and pulls it open; he has short brown hair, light skin, and a neutral expression. The young man, wearing a black and white striped shirt, sits at a table covered with a white tablecloth, facing the woman; he has short brown hair, light skin, and a neutral expression. The woman, wearing a dark blue shirt, sits opposite the young man, her face turned towards him; she has short blonde hair and light skin. The camera remains stationary, capturing the scene from a medium distance, positioned slightly to the right of the guard. The room is dimly lit, with a single light fixture illuminating the table and the two figures. The walls are made of large, grey concrete blocks, and a metal door is visible in the background. The scene is captured in real-life footage.
- !example8: A woman with blood on her face and a white tank top looks down and to her right, then back up as she speaks. She has dark hair pulled back, light skin, and her face and chest are covered in blood. The camera angle is a close-up, focused on the woman's face and upper torso. The lighting is dim and blue-toned, creating a somber and intense atmosphere. The scene appears to be from a movie or TV show.
- !example9: A man with graying hair, a beard, and a gray shirt looks down and to his right, then turns his head to the left. The camera angle is a close-up, focused on the man's face. The lighting is dim, with a greenish tint. The scene appears to be real-life footage.
- !example10: A clear, turquoise river flows through a rocky canyon, cascading over a small waterfall and forming a pool of water at the bottom. The river is the main focus of the scene, with its clear water reflecting the surrounding trees and rocks. The canyon walls are steep and rocky, with some vegetation growing on them. The trees are mostly pine trees, with their green needles contrasting with the brown and gray rocks. The overall tone of the scene is one of peace and tranquility.
- !example11: A man in a suit enters a room and speaks to two women sitting on a couch. The man, wearing a dark suit with a gold tie, enters the room from the left and walks towards the center of the frame. He has short gray hair, light skin, and a serious expression. He places his right hand on the back of a chair as he approaches the couch. Two women are seated on a light-colored couch in the background. The woman on the left wears a light blue sweater and has short blonde hair. The woman on the right wears a white sweater and has short blonde hair. The camera remains stationary, focusing on the man as he enters the room. The room is brightly lit, with warm tones reflecting off the walls and furniture. The scene appears to be from a film or television show.
- !example12: The waves crash against the jagged rocks of the shoreline, sending spray high into the air. The rocks are a dark gray color, with sharp edges and deep crevices. The water is a clear blue-green, with white foam where the waves break against the rocks. The sky is a light gray, with a few white clouds dotting the horizon.
- !example13: The camera pans across a cityscape of tall buildings with a circular building in the center. The camera moves from left to right, showing the tops of the buildings and the circular building in the center. The buildings are various shades of gray and white, and the circular building has a green roof. The camera angle is high, looking down at the city. The lighting is bright, with the sun shining from the upper left, casting shadows from the buildings. The scene is computer-generated imagery.
- !example14: A man walks towards a window, looks out, and then turns around. He has short, dark hair, dark skin, and is wearing a brown coat over a red and gray scarf. He walks from left to right towards a window, his gaze fixed on something outside. The camera follows him from behind at a medium distance. The room is brightly lit, with white walls and a large window covered by a white curtain. As he approaches the window, he turns his head slightly to the left, then back to the right. He then turns his entire body to the right, facing the window. The camera remains stationary as he stands in front of the window. The scene is captured in real-life footage.
- !example15: Two police officers in dark blue uniforms and matching hats enter a dimly lit room through a doorway on the left side of the frame. The first officer, with short brown hair and a mustache, steps inside first, followed by his partner, who has a shaved head and a goatee. Both officers have serious expressions and maintain a steady pace as they move deeper into the room. The camera remains stationary, capturing them from a slightly low angle as they enter. The room has exposed brick walls and a corrugated metal ceiling, with a barred window visible in the background. The lighting is low-key, casting shadows on the officers' faces and emphasizing the grim atmosphere. The scene appears to be from a film or television show.
- !example16: A woman with short brown hair, wearing a maroon sleeveless top and a silver necklace, walks through a room while talking, then a woman with pink hair and a white shirt appears in the doorway and yells. The first woman walks from left to right, her expression serious; she has light skin and her eyebrows are slightly furrowed. The second woman stands in the doorway, her mouth open in a yell; she has light skin and her eyes are wide. The room is dimly lit, with a bookshelf visible in the background. The camera follows the first woman as she walks, then cuts to a close-up of the second woman's face. The scene is captured in real-life footage.
## Model Details - **Developed by:** Lightricks - **Model type:** Diffusion-based text-to-video and image-to-video generation model - **Language(s):** English ## Usage ### Direct use You can use the model for purposes under the license: - version 0.9: license - version 0.9.1 license - version 0.9.5 license - version 0.9.6-dev license - version 0.9.6-distilled license ### General tips: * The model works on resolutions that are divisible by 32 and numbers of frames that are divisible by 8 + 1 (e.g. 257). In case the resolution or number of frames is not divisible by 32 or 8 + 1, the input will be padded with -1 and then cropped to the desired resolution and number of frames. * The model works best on resolutions under 720 x 1280 and numbers of frames below 257. * Prompts should be in English. The more elaborate the better; a good prompt looks like the gallery examples above. ### Online demo The model is accessible right away via the following links: - LTX-Studio image-to-video - Fal.ai text-to-video - Fal.ai image-to-video - Replicate text-to-video and image-to-video ### ComfyUI To use our model with ComfyUI, please follow the instructions in the dedicated ComfyUI repo. ### Run locally #### Installation The codebase was tested with Python 3.10.5, CUDA version 12.2, and supports PyTorch >= 2.1.2. Then, download the model from Hugging Face. #### Inference To use our model, please follow the inference code in inference.py: ##### For text-to-video generation: ##### For image-to-video generation: ### Diffusers 🧨 LTX Video is compatible with the Diffusers Python library. It supports both text-to-video and image-to-video generation. Make sure you install diffusers before trying out the examples below. For image-to-video, see the official documentation. Diffusers also supports directly loading the original LTX checkpoints via the from_single_file() method.
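The Diffusers example itself was stripped from the card. A text-to-video sketch, assuming a diffusers release that ships LTXPipeline; the resolution and frame count are illustrative values chosen to satisfy the divisibility tips above:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16).to("cuda")

# elaborate prompts work best; this one is taken from the example gallery above
prompt = (
    "A clear, turquoise river flows through a rocky canyon, cascading over a small "
    "waterfall and forming a pool of water at the bottom."
)
# 704 is divisible by 32; 161 frames = 8 * 20 + 1
video = pipe(prompt=prompt, width=704, height=480, num_frames=161, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=24)
```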
## Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate videos that match the prompts perfectly. - Prompt following is heavily influenced by the prompting style.", + "model_explanation_gemini": "Generates high-quality 24 FPS videos at 768x512 resolution from text or image+text inputs in real-time using a DiT-based architecture." +} \ No newline at end of file diff --git a/data/model_data_json/LiheYoung_depth-anything-large-hf.json b/data/model_data_json/LiheYoung_depth-anything-large-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..1fa146f8edf360d29367107924e03f0dd6ad0279 --- /dev/null +++ b/data/model_data_json/LiheYoung_depth-anything-large-hf.json @@ -0,0 +1,17 @@ +{ + "model_id": "LiheYoung/depth-anything-large-hf", + "downloads": 158677, + "tags": [ + "transformers", + "safetensors", + "depth_anything", + "depth-estimation", + "vision", + "arxiv:2401.10891", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision pipeline_tag: depth-estimation widget: - inference: false --- # Depth Anything (large-sized model, Transformers version) Depth Anything model. It was introduced in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang et al. and first released in this repository. An online demo is also provided. Disclaimer: The team releasing Depth Anything did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description Depth Anything leverages the DPT architecture with a DINOv2 backbone. The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation. \"drawing\" Depth Anything overview. Taken from the original paper. ## Intended uses & limitations You can use the raw model for tasks like zero-shot depth estimation. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot depth estimation: Alternatively, one can use the classes themselves: For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Estimates depth in images using a DPT architecture with a DINOv2 backbone, trained on large-scale data for state-of-the-art relative and absolute depth prediction."
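The "How to use" snippet was stripped from the card. A minimal zero-shot sketch via the transformers depth-estimation pipeline (the small-sized model below works the same way; file names are placeholders):

```python
from PIL import Image
from transformers import pipeline

pipe = pipeline("depth-estimation", model="LiheYoung/depth-anything-large-hf")
image = Image.open("example.jpg")  # any RGB image
result = pipe(image)
# result["depth"] is a PIL image of the predicted relative depth map
result["depth"].save("depth.png")
```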
+} \ No newline at end of file diff --git a/data/model_data_json/LiheYoung_depth-anything-small-hf.json b/data/model_data_json/LiheYoung_depth-anything-small-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..8a5a7234475d991ec54fddbf308ee24f5f2a1d18 --- /dev/null +++ b/data/model_data_json/LiheYoung_depth-anything-small-hf.json @@ -0,0 +1,17 @@ +{ + "model_id": "LiheYoung/depth-anything-small-hf", + "downloads": 102423, + "tags": [ + "transformers", + "safetensors", + "depth_anything", + "depth-estimation", + "vision", + "arxiv:2401.10891", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision pipeline_tag: depth-estimation widget: - inference: false --- # Depth Anything (small-sized model, Transformers version) Depth Anything model. It was introduced in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang et al. and first released in this repository. Online demo is also provided. Disclaimer: The team releasing Depth Anything did not write a model card for this model, so this model card has been written by the Hugging Face team. ## Model description Depth Anything leverages the DPT architecture with a DINOv2 backbone. The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation. Depth Anything overview. Taken from the original paper. ## Intended uses & limitations You can use the raw model for tasks like zero-shot depth estimation. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot depth estimation: Alternatively, one can use the classes themselves: For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Estimates depth in images using a DPT architecture with a DINOv2 backbone, trained on large-scale data for state-of-the-art relative and absolute depth prediction." +} \ No newline at end of file diff --git a/data/model_data_json/LongSafari_evo-1-8k-transposon.json b/data/model_data_json/LongSafari_evo-1-8k-transposon.json new file mode 100644 index 0000000000000000000000000000000000000000..0d67a5dd5b8b48a7c474d556310dd21adabacc06 --- /dev/null +++ b/data/model_data_json/LongSafari_evo-1-8k-transposon.json @@ -0,0 +1,29 @@ +{ + "model_id": "LongSafari/evo-1-8k-transposon", + "downloads": 328292, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "stripedhyena", + "text-generation", + "long context", + "deep signal processing", + "hybrid", + "biology", + "genomics", + "custom_code", + "arxiv:2302.10866", + "arxiv:2203.14343", + "arxiv:2310.18780", + "arxiv:2206.11893", + "arxiv:2303.06349", + "arxiv:2102.02611", + "arxiv:2210.09298", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - stripedhyena - long context - deep signal processing - hybrid - biology - genomics --- ## Evo-1 (Transposon)

### News We identified and fixed an issue related to a wrong permutation of some projections, which affects generation quality. To use the new model revision, please load as follows: ### About Evo is a biological foundation model capable of long-context modeling and design. Evo uses the StripedHyena architecture to enable modeling of sequences at a single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length. Evo has 7 billion parameters and is trained on OpenGenome, a prokaryotic whole-genome dataset containing ~300 billion tokens. Technical details about Evo can be found in our preprint and our accompanying blog posts. Evo was collaboratively developed by the Arc Institute and TogetherAI. As part of our commitment to open science, we release **weights of 15 intermediate pretraining checkpoints** for phase 1 and phase 2 of pretraining. The checkpoints are available as branches of the corresponding HuggingFace repository. **Evo-1 (Transposon)** is our fine-tuned model used to generate IS200/605, trained at a context length of 8k. | Checkpoint Name | Description | |----------------------------------------|-------------| | evo-1-8k-base | A model pretrained with 8,192 context. We use this model as the base model for molecular-scale finetuning tasks. | | evo-1-131k-base | A model pretrained with 131,072 context using evo-1-8k-base as the initialization. We use this model to reason about and generate sequences at the genome scale. | | evo-1-8k-crispr | A model fine-tuned specifically on CRISPR-Cas systems. We use this model to generate Cas9/12/13 systems. | | evo-1-8k-transposon | A model fine-tuned specifically on transposons. We use this to generate IS200/IS605. | ### Model Architecture StripedHyena is a deep signal processing, hybrid architecture composed of multi-head attention and gated convolutions arranged in Hyena blocks, improving over decoder-only Transformers. StripedHyena is designed to leverage the specialization of each of its layer classes, with Hyena layers implementing the bulk of the computation required for sequence processing and attention layers supplementing the ability to perform targeted pattern recall. Some highlights of the architecture: - **Efficient autoregressive generation** via a recurrent mode (>500k generation with a single 80GB GPU) - **Significantly faster training and finetuning** at long context (>3x at 131k) - **Improved scaling laws over state-of-the-art architectures** (e.g., Transformer++) on both natural language and biological sequences. - **Robust to training beyond the compute-optimal frontier** e.g., training way beyond Chinchilla-optimal token amounts (see preprint for details -- more details to come) ### How to use Evo Example usage is provided in the standalone repo. #### Parametrization for Inference and Finetuning One of the advantages of deep signal processing models is their flexibility. Different parametrizations of convolutions can be used depending on the memory, expressivity and causality requirements of pretraining, finetuning or inference workloads. The main classes are: - Modal canonical: unconstrained poles (reference, reference), or constrained poles (reference, reference). - Companion canonical / rational: TBA. - Hypernetworks: hypernetwork (reference), modulated hypernetwork (reference). - Explicit: modulated explicit (reference). StripedHyena is a mixed precision model. Make sure to keep your poles and residues in float32 precision, especially for longer prompts or training.
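The loading snippet referenced in the News note above was stripped; a sketch of the fixed-revision load, assuming the branch name `1.1_fix` (verify against the repository's branches before relying on it), could be:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "LongSafari/evo-1-8k-transposon"

# revision="1.1_fix" is an assumed branch name for the corrected projections.
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision="1.1_fix")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,  # pulls the custom StripedHyena code from the repo
    revision="1.1_fix",
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, revision="1.1_fix")
```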
### Disclaimer To use StripedHyena outside of the playground, you will need to install custom kernels. Please follow the instructions from the standalone repository. ## Cite", + "model_explanation_gemini": "Generates IS200/605 transposon sequences using a fine-tuned biological foundation model with long-context capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/LorenzoDeMattei_GePpeTto.json b/data/model_data_json/LorenzoDeMattei_GePpeTto.json new file mode 100644 index 0000000000000000000000000000000000000000..6670ebc99dbdf42fbf1f366865450eeab5f5c55c --- /dev/null +++ b/data/model_data_json/LorenzoDeMattei_GePpeTto.json @@ -0,0 +1,19 @@ +{ + "model_id": "LorenzoDeMattei/GePpeTto", + "downloads": 78170, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "gpt2", + "text-generation", + "it", + "arxiv:2004.14253", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: it --- # GePpeTto GPT2 Model 🇮🇹 Pretrained GPT2 117M model for Italian. You can find further details in the paper: Lorenzo De Mattei, Michele Cafagna, Felice Dell’Orletta, Malvina Nissim, Marco Guerini \"GePpeTto Carves Italian into a Language Model\", arXiv preprint. PDF available at: ## Pretraining Corpus The pretraining set comprises two main sources. The first one is a dump of Italian Wikipedia (November 2019), consisting of 2.8GB of text. The second one is the ItWac corpus (Baroni et al., 2009), which amounts to 11GB of web texts. This collection provides a mix of standard and less standard Italian, on a rather wide chronological span, with older texts than the Wikipedia dump (the latter stretches only to the late 2000s). ## Pretraining details This model was trained using GPT2's Hugging Face implementation on 4 NVIDIA Tesla T4 GPUs for 620k steps. Training parameters: - GPT-2 small configuration - vocabulary size: 30k - Batch size: 32 - Block size: 100 - Adam Optimizer - Initial learning rate: 5e-5 - Warm up steps: 10k ## Perplexity scores | Domain | Perplexity | |---|---| | Wikipedia | 26.1052 | | ItWac | 30.3965 | | Legal | 37.2197 | | News | 45.3859 | | Social Media | 84.6408 | For further details, qualitative analysis and human evaluation, check out: ## Load Pretrained Model You can use this model by installing the Hugging Face transformers library. You can then use it directly by initializing it like this: ## Example using GPT2LMHeadModel The output is: ## Citation Please use the following bibtex entry: ## References Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed webcrawled corpora. Language resources and evaluation, 43(3):209–226."
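The initialization and generation snippets in the GePpeTto card above were stripped; a minimal sketch with standard `transformers` classes (the Italian prompt and sampling settings are illustrative) might be:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("LorenzoDeMattei/GePpeTto")
model = GPT2LMHeadModel.from_pretrained("LorenzoDeMattei/GePpeTto")

inputs = tokenizer("Sempre caro mi fu quest'ermo colle", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```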
+} \ No newline at end of file diff --git a/data/model_data_json/Lykon_AAM_XL_AnimeMix.json b/data/model_data_json/Lykon_AAM_XL_AnimeMix.json new file mode 100644 index 0000000000000000000000000000000000000000..73096257039aa6bbfa3e7d7cfa0a9400935c467c --- /dev/null +++ b/data/model_data_json/Lykon_AAM_XL_AnimeMix.json @@ -0,0 +1,23 @@ +{ + "model_id": "Lykon/AAM_XL_AnimeMix", + "downloads": 166902, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "stable-diffusion-xl", + "text-to-image", + "art", + "artistic", + "anime", + "en", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- language: - en license: other tags: - stable-diffusion - stable-diffusion-diffusers - stable-diffusion-xl - text-to-image - art - artistic - diffusers - anime --- # AAM XL AnimeMix is a Stable Diffusion model that has been fine-tuned on stabilityai/stable-diffusion-xl-base-1.0. Please consider supporting me: - on Patreon - or buy me a coffee **License**: Fair AI Public License 1.0-SD ## Diffusers For more general information on how to run text-to-image models with 🧨 Diffusers, see the docs. 1. Installation 2. Run", + "model_explanation_gemini": "Generates anime-style images from text prompts using a fine-tuned Stable Diffusion XL model." +} \ No newline at end of file diff --git a/data/model_data_json/Lykon_dreamshaper-8.json b/data/model_data_json/Lykon_dreamshaper-8.json new file mode 100644 index 0000000000000000000000000000000000000000..76e7fca6bd94fcd320613a0c6dfeec7f5f3b061c --- /dev/null +++ b/data/model_data_json/Lykon_dreamshaper-8.json @@ -0,0 +1,23 @@ +{ + "model_id": "Lykon/dreamshaper-8", + "downloads": 156532, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "text-to-image", + "art", + "artistic", + "anime", + "dreamshaper", + "en", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- language: - en license: creativeml-openrail-m tags: - stable-diffusion - stable-diffusion-diffusers - text-to-image - art - artistic - diffusers - anime - dreamshaper duplicated_from: lykon/dreamshaper-8 pipeline_tag: text-to-image --- # Dreamshaper 8 is a Stable Diffusion model that has been fine-tuned on runwayml/stable-diffusion-v1-5. Please consider supporting me: - on Patreon - or buy me a coffee ## Diffusers For more general information on how to run text-to-image models with 🧨 Diffusers, see the docs. 1. Installation 2. Run ## Notes - **Version 8** focuses on improving what V7 started. It might be harder to do photorealism compared to realism-focused models, just as it might be harder to do anime compared to anime-focused models, but it can do both pretty well if you're skilled enough. Check the examples! - **Version 7** improves LoRA support, NSFW and realism. If you're interested in \"absolute\" realism, try AbsoluteReality. - **Version 6** adds more LoRA support and more style in general. It should also be better at generating directly at 1024 height (but be careful with it). 6.x are all improvements. - **Version 5** is the best at photorealism and has noise offset. - **Version 4** is much better with anime (can do them with no LoRA) and booru tags. It might be harder to control if you're used to caption style, so you might still want to use version 3.31. V4 is also better with eyes at lower resolutions.
Overall, it is like a \"fix\" of V3 and shouldn't be too different.", + "model_explanation_gemini": "Dreamshaper-8 is a fine-tuned Stable Diffusion text-to-image model capable of generating artistic, anime, and realistic images, balancing multiple styles with improved versatility over previous versions." +} \ No newline at end of file diff --git a/data/model_data_json/M-CLIP_XLM-Roberta-Large-Vit-B-16Plus.json b/data/model_data_json/M-CLIP_XLM-Roberta-Large-Vit-B-16Plus.json new file mode 100644 index 0000000000000000000000000000000000000000..464a090762e1e0883d13556b9e328e70b8448d5d --- /dev/null +++ b/data/model_data_json/M-CLIP_XLM-Roberta-Large-Vit-B-16Plus.json @@ -0,0 +1,60 @@ +{ + "model_id": "M-CLIP/XLM-Roberta-Large-Vit-B-16Plus", + "downloads": 83225, + "tags": [ + "transformers", + "pytorch", + "tf", + "multilingual", + "af", + "sq", + "am", + "ar", + "az", + "bn", + "bs", + "bg", + "ca", + "zh", + "hr", + "cs", + "da", + "nl", + "en", + "et", + "fr", + "de", + "el", + "hi", + "hu", + "is", + "id", + "it", + "ja", + "mk", + "ml", + "mr", + "pl", + "pt", + "ro", + "ru", + "sr", + "sl", + "es", + "sw", + "sv", + "tl", + "te", + "tr", + "tk", + "uk", + "ur", + "ug", + "uz", + "vi", + "xh", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - sq - am - ar - az - bn - bs - bg - ca - zh - hr - cs - da - nl - en - et - fr - de - el - hi - hu - is - id - it - ja - mk - ml - mr - pl - pt - ro - ru - sr - sl - es - sw - sv - tl - te - tr - tk - uk - ur - ug - uz - vi - xh --- ## Multilingual-clip: XLM-Roberta-Large-Vit-B-16Plus Multilingual-CLIP extends OpenAI's English text encoders to multiple other languages. This model *only* contains the multilingual text encoder. The corresponding image model can be retrieved via instructions found in the open_clip repository on GitHub. We provide a usage example below. ## Requirements To use both the multilingual text encoder and corresponding image encoder, we need to install the packages multilingual-clip and open_clip. ## Usage Extracting embeddings from the text encoder can be done in the following way: Extracting embeddings from the corresponding image encoder: ## Evaluation results None of the M-CLIP models have been extensively evaluated, but testing them on Txt2Img retrieval on the human-translated MS-COCO dataset, we see the following **R@10** results: | Name | En | De | Es | Fr | Zh | It | Pl | Ko | Ru | Tr | Jp | | ----------------------------------|:-----: |:-----: |:-----: |:-----: | :-----: |:-----: |:-----: |:-----: |:-----: |:-----: |:-----: | | OpenAI CLIP Vit-B/32| 90.3 | - | - | - | - | - | - | - | - | - | - | | OpenAI CLIP Vit-L/14| 91.8 | - | - | - | - | - | - | - | - | - | - | | OpenCLIP ViT-B-16+ | 94.3 | - | - | - | - | - | - | - | - | - | - | | LABSE Vit-L/14| 91.6 | 89.6 | 89.5 | 89.9 | 88.9 | 90.1 | 89.8 | 80.8 | 85.5 | 89.8 | 73.9 | | XLM-R Large Vit-B/32| 91.8 | 88.7 | 89.1 | 89.4 | 89.3 | 89.8 | 91.4 | 82.1 | 86.1 | 88.8 | 81.0 | | XLM-R Vit-L/14| 92.4 | 90.6 | 91.0 | 90.0 | 89.7 | 91.1 | 91.3 | 85.2 | 85.8 | 90.3 | 81.9 | | XLM-R Large Vit-B/16+| **95.0** | **93.0** | **93.6** | **93.1** | **94.0** | **93.1** | **94.4** | **89.0** | **90.0** | **93.0** | **84.2** | ## Training/Model details Further details about the model training and data can be found in the model card."
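The embedding-extraction snippets in the M-CLIP card above were stripped; a sketch for the text encoder, assuming the `multilingual-clip` package's `pt_multilingual_clip` module, could be:

```python
import transformers
from multilingual_clip import pt_multilingual_clip  # from the multilingual-clip package

texts = [
    "Three blind horses listening to Mozart.",   # English
    "Tre cavalli ciechi che ascoltano Mozart.",  # Italian
]
model_name = "M-CLIP/XLM-Roberta-Large-Vit-B-16Plus"

model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

embeddings = model.forward(texts, tokenizer)  # one vector per input text
print(embeddings.shape)
```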
+} \ No newline at end of file diff --git a/data/model_data_json/M-FAC_bert-mini-finetuned-mnli.json b/data/model_data_json/M-FAC_bert-mini-finetuned-mnli.json new file mode 100644 index 0000000000000000000000000000000000000000..95553140df5bc2740365a3b4e026a1bf8ad2cdb3 --- /dev/null +++ b/data/model_data_json/M-FAC_bert-mini-finetuned-mnli.json @@ -0,0 +1,16 @@ +{ + "model_id": "M-FAC/bert-mini-finetuned-mnli", + "downloads": 199827, + "tags": [ + "transformers", + "pytorch", + "bert", + "text-classification", + "arxiv:2107.03356", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "# BERT-mini model finetuned with M-FAC This model is finetuned on the MNLI dataset with the state-of-the-art second-order optimizer M-FAC. Check the NeurIPS 2021 paper for more details on M-FAC: ## Finetuning setup For a fair comparison against the default Adam baseline, we finetune the model in the same framework as described here and just swap the Adam optimizer with M-FAC. Hyperparameters used by the M-FAC optimizer: ## Results We share the best model out of 5 runs with the following score on the MNLI validation set: Mean and standard deviation for 5 runs on the MNLI validation set: | | Matched Accuracy | Mismatched Accuracy | |:-----:|:----------------:|:-------------------:| | Adam | 73.30 ± 0.20 | 74.85 ± 0.09 | | M-FAC | 74.59 ± 0.41 | 75.95 ± 0.14 | Results can be reproduced by adding the M-FAC optimizer code and running the following bash script: We believe these results could be improved with modest tuning of hyperparameters. For the sake of a fair comparison and a robust default setup, we use the same hyperparameters across all models and all datasets (SQuAD version 2 and GLUE). Our code for M-FAC can be found here: A step-by-step tutorial on how to integrate and use M-FAC with any repository can be found here: ## BibTeX entry and citation info", + "model_explanation_gemini": "Finetuned on the MNLI dataset using the M-FAC optimizer, this BERT-mini model performs natural language inference tasks with improved accuracy over Adam-optimized baselines." +} \ No newline at end of file diff --git a/data/model_data_json/MIT_ast-finetuned-audioset-10-10-0.4593.json b/data/model_data_json/MIT_ast-finetuned-audioset-10-10-0.4593.json new file mode 100644 index 0000000000000000000000000000000000000000..8d2ec30c5620391dbfccfdcc48333759a1049d68 --- /dev/null +++ b/data/model_data_json/MIT_ast-finetuned-audioset-10-10-0.4593.json @@ -0,0 +1,17 @@ +{ + "model_id": "MIT/ast-finetuned-audioset-10-10-0.4593", + "downloads": 264370, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "audio-spectrogram-transformer", + "audio-classification", + "arxiv:2104.01778", + "license:bsd-3-clause", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: bsd-3-clause tags: - audio-classification --- # Audio Spectrogram Transformer (fine-tuned on AudioSet) Audio Spectrogram Transformer (AST) model fine-tuned on AudioSet. It was introduced in the paper AST: Audio Spectrogram Transformer by Gong et al. and first released in this repository. Disclaimer: The team releasing Audio Spectrogram Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team. ## Model description The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after which a Vision Transformer is applied. The model gets state-of-the-art results on several audio classification benchmarks.
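Ahead of the usage note that follows, here is a minimal sketch of classifying an audio clip with this AST checkpoint via the `transformers` pipeline (the file name is illustrative):

```python
from transformers import pipeline

# Audio classification over the 527 AudioSet classes.
classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593"
)

predictions = classifier("dog_bark.wav", top_k=5)  # any mono audio clip works
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```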
## Usage You can use the raw model for classifying audio into one of the AudioSet classes. See the documentation for more info.", + "model_explanation_gemini": "Classifies audio into predefined AudioSet categories using a transformer-based spectrogram analysis approach." +} \ No newline at end of file diff --git a/data/model_data_json/MIT_ast-finetuned-audioset-14-14-0.443.json b/data/model_data_json/MIT_ast-finetuned-audioset-14-14-0.443.json new file mode 100644 index 0000000000000000000000000000000000000000..392be3de0c2ed94857c84ae1c603b3b6ad487cc2 --- /dev/null +++ b/data/model_data_json/MIT_ast-finetuned-audioset-14-14-0.443.json @@ -0,0 +1,17 @@ +{ + "model_id": "MIT/ast-finetuned-audioset-14-14-0.443", + "downloads": 189117, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "audio-spectrogram-transformer", + "audio-classification", + "arxiv:2104.01778", + "license:bsd-3-clause", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: bsd-3-clause tags: - audio-classification --- # Audio Spectrogram Transformer (fine-tuned on AudioSet) Audio Spectrogram Transformer (AST) model fine-tuned on AudioSet. It was introduced in the paper AST: Audio Spectrogram Transformer by Gong et al. and first released in this repository. Disclaimer: The team releasing Audio Spectrogram Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team. ## Model description The Audio Spectrogram Transformer is equivalent to ViT, but applied to audio. Audio is first turned into an image (as a spectrogram), after which a Vision Transformer is applied. The model gets state-of-the-art results on several audio classification benchmarks. ## Usage You can use the raw model for classifying audio into one of the AudioSet classes. See the documentation for more info.", + "model_explanation_gemini": "Classifies audio into predefined AudioSet categories using a spectrogram-based vision transformer approach." +} \ No newline at end of file diff --git a/data/model_data_json/MMG_xlm-roberta-large-ner-spanish.json b/data/model_data_json/MMG_xlm-roberta-large-ner-spanish.json new file mode 100644 index 0000000000000000000000000000000000000000..0ee826ca2cc2204ad3c10cfbc59277282c001516 --- /dev/null +++ b/data/model_data_json/MMG_xlm-roberta-large-ner-spanish.json @@ -0,0 +1,18 @@ +{ + "model_id": "MMG/xlm-roberta-large-ner-spanish", + "downloads": 734037, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "xlm-roberta", + "token-classification", + "es", + "dataset:CoNLL-2002", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - es datasets: - CoNLL-2002 widget: - text: \"Las oficinas de MMG están en Las Rozas.\" --- # xlm-roberta-large-ner-spanish This model is an XLM-Roberta-large model fine-tuned for Named Entity Recognition (NER) over the Spanish portion of the CoNLL-2002 dataset. Evaluating it over the test subset of this dataset, we get an F1-score of 89.17, making it one of the best NER models for Spanish available at the moment.", + "model_explanation_gemini": "Fine-tuned XLM-Roberta-large model for Spanish named entity recognition (NER), achieving an 89.17 F1-score on the CoNLL-2002 dataset."
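As a concrete illustration of the Spanish NER model above, a minimal token-classification sketch using the card's own widget sentence might be:

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="MMG/xlm-roberta-large-ner-spanish",
    aggregation_strategy="simple",  # merge sub-tokens into whole entity spans
)

print(ner("Las oficinas de MMG están en Las Rozas."))
# Expected (roughly): an ORG span for "MMG" and a LOC span for "Las Rozas".
```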
+} \ No newline at end of file diff --git a/data/model_data_json/MahmoodLab_TITAN.json b/data/model_data_json/MahmoodLab_TITAN.json new file mode 100644 index 0000000000000000000000000000000000000000..091c4b577a1bb915116e4407c5e7324a4d555c9c --- /dev/null +++ b/data/model_data_json/MahmoodLab_TITAN.json @@ -0,0 +1,21 @@ +{ + "model_id": "MahmoodLab/TITAN", + "downloads": 239360, + "tags": [ + "safetensors", + "titan", + "histology", + "pathology", + "vision", + "pytorch", + "self-supervised", + "vit", + "image-feature-extraction", + "custom_code", + "en", + "arxiv:2411.19666", + "license:cc-by-nc-nd-4.0", + "region:us" + ], + "description": "--- license: cc-by-nc-nd-4.0 language: - en tags: - histology - pathology - vision - pytorch - self-supervised - vit extra_gated_prompt: >- This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives, which include models trained on outputs from the TITAN model or datasets created from the TITAN model, is prohibited and requires prior approval. Please note that the primary email used to sign up for your Hugging Face account must match your institutional email to receive approval. By downloading the model, you attest that all information (affiliation, research use) is correct and up-to-date. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author. extra_gated_fields: Full name (first and last): text Current affiliation (no abbreviations): text Type of Affiliation: type: select options: - Academia - Industry - label: Other value: other Current and official institutional email (**this must match your primary email in your Hugging Face account, @gmail/@hotmail/@qq email domains will be denied**): text Please explain your intended research use: text I agree to all terms outlined above: checkbox I agree to use this model for non-commercial, academic purposes only: checkbox I agree not to distribute the model, if another user within your organization wishes to use the TITAN model, they must register as an individual user: checkbox metrics: - accuracy pipeline_tag: image-feature-extraction --- # Model Card for TITAN-preview \\[Preprint\\] | \\[Github Repo\\] | \\[Cite\\] ## What is TITAN? **TITAN** (**T**ransformer-based pathology **I**mage and **T**ext **A**lignment **N**etwork) is a multimodal whole-slide foundation model pre-trained using visual self-supervised learning and vision-language alignment. It leverages 335,645 whole-slide images (WSIs) from a diverse set of internally collected neoplastic, infectious, and inflammatory cases at Mass General Brigham. Additionally, TITAN utilizes over 182,000 pathology reports and more than 423,000 synthetic captions generated by PathChat, our pathology co-pilot. 
TITAN's slide embeddings achieve state-of-the-art performance on diverse downstream tasks, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation. This is a preview, and we will bring you further updates and improvements. If the primary email in your Hugging Face account does not match your institutional email, **your request will be denied**. To fix this, you can: (1) add your official institutional email to your HF account, and confirm your email address to verify, and (2) set your institutional email as your primary email in your HF account. Other reasons for your access request being denied include other mistakes in the form submitted, for example: full name includes abbreviations, affiliation is not spelled out, the described research use is not sufficient, or email domain address not recognized. ## Model Description - **Developed by:** Mahmood Lab AI for Pathology @ Harvard/BWH - **Model type:** Pretrained vision-language encoders - **Pretraining dataset:** Mass-340K, sourced from private histology collections (BWH / MGH), in addition to slides from the public GTEx consortium. - **Repository:** - **Preprint:** - **License:** CC-BY-NC-ND-4.0 ### Requirements ### Model Usage TITAN-preview is a vision-language model trained on CONCH v1.5 patch features with a patch size of 512x512 pixels at 20x magnification. Following authentication, both TITAN-preview (slide and language encoders) and CONCH v1.5 (patch encoder) can be loaded using the commands below: You can directly use TITAN-preview for slide-level feature extraction. TITAN builds a feature grid from CONCH v1.5 patch features using the coordinates and the distance between the patches. As patch coordinates are always saved at the slides' level 0 magnification, TITAN takes patch_size_lv0, which represents the distance between two adjacent patches at level 0 magnification. It is 1024 if the slide is 40x, or 512 if the slide is 20x. We have this info saved in our demo TCGA features. Slide-level feature extraction can be done in the following way: These pre-extracted features can then be used for slide-level classification (via linear probing), retrieval (via L2 distance), and other machine learning settings, without task-specific finetuning. We also released all TCGA TITAN-preview features. We demonstrated more detailed linear probe and zero-shot evaluation on our GitHub. ## License and Terms of Use This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives, which include models trained on outputs from the TITAN model or datasets created from the TITAN model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author. ## Contact For any additional questions or comments, contact Faisal Mahmood (), \\ Tong Ding (), \\ Sophia J. Wagner (), \\ Andrew H. Song (), \\ or Richard J.
Chen (), ## Acknowledgements The project was built on top of amazing repositories such as ViT, iBOT, OpenClip, LGSSL, and Timm (ViT model implementation). We thank the authors and developers for their contribution. ## BibTeX If you found our work useful in your research, please consider citing our work at: Ding, T.\\*, Wagner S.J.\\*, Song, A.H.\\*, Chen, R.J.\\* et al. Multimodal Whole Slide Foundation Model for Pathology, Arxiv, 2024" +} \ No newline at end of file diff --git a/data/model_data_json/MahmoudAshraf_mms-300m-1130-forced-aligner.json b/data/model_data_json/MahmoudAshraf_mms-300m-1130-forced-aligner.json new file mode 100644 index 0000000000000000000000000000000000000000..a4d87c93a19c0bcb7e12a09cfc9704a7490111f0 --- /dev/null +++ b/data/model_data_json/MahmoudAshraf_mms-300m-1130-forced-aligner.json @@ -0,0 +1,150 @@ +{ + "model_id": "MahmoudAshraf/mms-300m-1130-forced-aligner", + "downloads": 2727243, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "mms", + "audio", + "voice", + "speech", + "forced-alignment", + "ab", + "af", + "ak", + "am", + "ar", + "as", + "av", + "ay", + "az", + "ba", + "bm", + "be", + "bn", + "bi", + "bo", + "sh", + "br", + "bg", + "ca", + "cs", + "ce", + "cv", + "ku", + "cy", + "da", + "de", + "dv", + "dz", + "el", + "en", + "eo", + "et", + "eu", + "ee", + "fo", + "fa", + "fj", + "fi", + "fr", + "fy", + "ff", + "ga", + "gl", + "gn", + "gu", + "zh", + "ht", + "ha", + "he", + "hi", + "hu", + "hy", + "ig", + "ia", + "ms", + "is", + "it", + "jv", + "ja", + "kn", + "ka", + "kk", + "kr", + "km", + "ki", + "rw", + "ky", + "ko", + "kv", + "lo", + "la", + "lv", + "ln", + "lt", + "lb", + "lg", + "mh", + "ml", + "mr", + "mk", + "mg", + "mt", + "mn", + "mi", + "my", + "nl", + "no", + "ne", + "ny", + "oc", + "om", + "or", + "os", + "pa", + "pl", + "pt", + "ps", + "qu", + "ro", + "rn", + "ru", + "sg", + "sk", + "sl", + "sm", + "sn", + "sd", + "so", + "es", + "sq", + "su", + "sv", + "sw", + "ta", + "tt", + "te", + "tg", + "tl", + "th", + "ti", + "ts", + "tr", + "uk", + "vi", + "wo", + "xh", + "yo", + "zu", + "za", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ab - af - ak - am - ar - as - av - ay - az - ba - bm - be - bn - bi - bo - sh - br - bg - ca - cs - ce - cv - ku - cy - da - de - dv - dz - el - en - eo - et - eu - ee - fo - fa - fj - fi - fr - fy - ff - ga - gl - gn - gu - zh - ht - ha - he - hi - sh - hu - hy - ig - ia - ms - is - it - jv - ja - kn - ka - kk - kr - km - ki - rw - ky - ko - kv - lo - la - lv - ln - lt - lb - lg - mh - ml - mr - ms - mk - mg - mt - mn - mi - my - zh - nl - 'no' - 'no' - ne - ny - oc - om - or - os - pa - pl - pt - ms - ps - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - ro - rn - ru - sg - sk - sl - sm - sn - sd - so - es - sq - su - sv - sw - ta - tt - te - tg - tl - th - ti - ts - tr - uk - ms - vi - wo - xh - ms - yo - ms - zu - za license: cc-by-nc-4.0 tags: - mms - wav2vec2 - audio - voice - speech - forced-alignment pipeline_tag: automatic-speech-recognition --- # Forced Alignment with Hugging Face CTC Models This Python package provides an efficient way to perform forced alignment between text and audio using Hugging Face's pretrained models. it also features an improved implementation to use much less memory than TorchAudio forced alignment API. 
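Returning to the TITAN card above, whose loading and feature-extraction snippets were stripped: a hedged loading sketch, assuming `AutoModel` with `trust_remote_code` for the repo's custom code and an already-approved gated-access request, could look like this (the feature-extraction call is left as a comment because its exact name lives in the repository's custom code):

```python
from huggingface_hub import login
from transformers import AutoModel

login()  # gated repo: requires an approved access request on Hugging Face

# trust_remote_code pulls TITAN's custom modeling code from the repository.
titan = AutoModel.from_pretrained("MahmoodLab/TITAN", trust_remote_code=True)

# The card describes slide-level feature extraction from CONCH v1.5 patch
# features plus patch coordinates and patch_size_lv0 (1024 for 40x slides,
# 512 for 20x). The exact method is defined in the repo's custom code, e.g.:
# slide_embedding = titan.encode_slide_from_patch_features(
#     patch_features, patch_coords, patch_size_lv0=512)  # hypothetical name
```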
The model checkpoint uploaded here is a conversion from torchaudio to HF Transformers for the MMS-300M checkpoint trained on a forced-alignment dataset. ## Installation ## Usage", + "model_explanation_gemini": "Performs memory-efficient forced alignment between text and audio across multiple languages using a pretrained MMS-300M model converted from TorchAudio." +} \ No newline at end of file diff --git a/data/model_data_json/Marqo_marqo-fashionSigLIP.json b/data/model_data_json/Marqo_marqo-fashionSigLIP.json new file mode 100644 index 0000000000000000000000000000000000000000..f2d7df01f33630992c5716c546811eaeaf6b1af0 --- /dev/null +++ b/data/model_data_json/Marqo_marqo-fashionSigLIP.json @@ -0,0 +1,23 @@ +{ + "model_id": "Marqo/marqo-fashionSigLIP", + "downloads": 516922, + "tags": [ + "open_clip", + "onnx", + "safetensors", + "siglip", + "clip", + "transformers", + "e-commerce", + "fashion", + "multimodal retrieval", + "transformers.js", + "zero-shot-image-classification", + "custom_code", + "en", + "license:apache-2.0", + "region:us" + ], + "description": "--- tags: - clip - transformers - e-commerce - fashion - multimodal retrieval - siglip - transformers.js library_name: open_clip pipeline_tag: zero-shot-image-classification license: apache-2.0 language: - en metrics: - precision - recall - MRR --- # Marqo-FashionSigLIP Model Card Marqo-FashionSigLIP leverages Generalised Contrastive Learning (GCL), which allows the model to be trained on not just text descriptions but also categories, style, colors, materials, keywords and fine-details to provide highly relevant search results on fashion products. The model was fine-tuned from ViT-B-16-SigLIP (webli). **Github Page**: Marqo-FashionCLIP **Blog**: Marqo Blog ## Usage ### Hugging Face The model can be loaded with AutoModel by ### OpenCLIP The model can be seamlessly used with OpenCLIP by ### Transformers.js You can also run the model in JavaScript with the Transformers.js library.
First, install it from NPM using: Then, compute embeddings as follows: ## Benchmark Results Average evaluation results on 6 public multimodal fashion datasets (Atlas, DeepFashion (In-shop), DeepFashion (Multimodal), Fashion200k, KAGL, and Polyvore) are reported below: **Text-To-Image (Averaged across 6 datasets)** | Model | AvgRecall | Recall@1 | Recall@10 | MRR | |----------------------------|-------------|------------|-------------|-----------| | Marqo-FashionSigLIP | **0.231** | **0.121** | **0.340** | **0.239** | | FashionCLIP2.0 | 0.163 | 0.077 | 0.249 | 0.165 | | OpenFashionCLIP | 0.132 | 0.060 | 0.204 | 0.135 | | ViT-B-16-laion2b_s34b_b88k | 0.174 | 0.088 | 0.261 | 0.180 | | ViT-B-16-SigLIP-webli | 0.212 | 0.111 | 0.314 | 0.214 | **Category-To-Product (Averaged across 5 datasets)** | Model | AvgP | P@1 | P@10 | MRR | |----------------------------|-----------|-----------|-----------|-----------| | Marqo-FashionSigLIP | **0.737** | **0.758** | **0.716** | **0.812** | | FashionCLIP2.0 | 0.684 | 0.681 | 0.686 | 0.741 | | OpenFashionCLIP | 0.646 | 0.653 | 0.639 | 0.720 | | ViT-B-16-laion2b_s34b_b88k | 0.662 | 0.673 | 0.652 | 0.743 | | ViT-B-16-SigLIP-webli | 0.688 | 0.690 | 0.685 | 0.751 | **Sub-Category-To-Product (Averaged across 4 datasets)** | Model | AvgP | P@1 | P@10 | MRR | |----------------------------|-----------|-----------|-----------|-----------| | Marqo-FashionSigLIP | **0.725** | **0.767** | **0.683** | **0.811** | | FashionCLIP2.0 | 0.657 | 0.676 | 0.638 | 0.733 | | OpenFashionCLIP | 0.598 | 0.619 | 0.578 | 0.689 | | ViT-B-16-laion2b_s34b_b88k | 0.638 | 0.651 | 0.624 | 0.712 | | ViT-B-16-SigLIP-webli | 0.643 | 0.643 | 0.643 | 0.726 |", + "model_explanation_gemini": "Fine-tuned for multimodal fashion retrieval, this SigLIP-based model excels in zero-shot image classification and delivers highly relevant search results by analyzing text descriptions, categories, styles, colors, and materials." +} \ No newline at end of file diff --git a/data/model_data_json/Marqo_nsfw-image-detection-384.json b/data/model_data_json/Marqo_nsfw-image-detection-384.json new file mode 100644 index 0000000000000000000000000000000000000000..9e7410de66c32d25099202495b0df3ce3935f797 --- /dev/null +++ b/data/model_data_json/Marqo_nsfw-image-detection-384.json @@ -0,0 +1,13 @@ +{ + "model_id": "Marqo/nsfw-image-detection-384", + "downloads": 127764, + "tags": [ + "timm", + "safetensors", + "image-classification", + "license:apache-2.0", + "region:us" + ], + "description": "--- tags: - image-classification - timm library_name: timm license: apache-2.0 --- # Model card for nsfw-image-detection-384 __NOTE: Like all models, this one can make mistakes. NSFW content can be subjective and contextual; this model is intended to help identify such content. Use at your own risk.__ nsfw-image-detection-384 is a lightweight image classification model designed to identify NSFW images. The model is approximately 18–20x smaller than other open-source models and achieves a superior accuracy of 98.56% on our dataset. This model uses 384x384 pixel images for the input with 16x16 pixel patches. This model was trained on a proprietary dataset of 220,000 images. The training set includes 100,000 NSFW examples and 100,000 SFW examples, while the test set contains 10,000 NSFW examples and 10,000 SFW examples. This dataset features a diverse range of content, including: real photos, drawings, Rule 34 material, memes, and AI-generated images.
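The stripped `timm` snippet in the nsfw-image-detection-384 card above might look like the following sketch (the image path is illustrative, and the NSFW/SFW label order should be checked against the repository):

```python
import timm
import torch
from PIL import Image

model = timm.create_model("hf_hub:Marqo/nsfw-image-detection-384", pretrained=True).eval()

# Build the 384x384 eval transform from the model's own data config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    probs = model(transform(img).unsqueeze(0)).softmax(dim=-1)

# Two output classes (NSFW/SFW); verify the label order in the repo config.
print(probs)
```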
The definition of NSFW can vary and is sometimes contextual. Our dataset was constructed to contain challenging examples; however, this definition may not be 100% aligned with every use case. As such, we recommend experimenting with different thresholds to determine whether this model is suitable for your needs. ## Model Usage ### Image Classification with timm ## Evaluation This model outperforms existing NSFW detectors on our dataset; here we provide an evaluation against AdamCodd/vit-base-nsfw-detector and Falconsai/nsfw_image_detection. ### Thresholds and Precision vs Recall Adjusting the threshold for the NSFW probability lets you trade off precision, recall, and accuracy. This may be useful in applications where different degrees of confidence are required. ## Training Details This model is a finetune of the timm/vit_tiny_patch16_384.augreg_in21k_ft_in1k model. ### Args ## Citation", + "model_explanation_gemini": "Detects NSFW content in images with high accuracy using a lightweight, efficient model trained on diverse datasets." +} \ No newline at end of file diff --git a/data/model_data_json/Maykeye_TinyLLama-v0.json b/data/model_data_json/Maykeye_TinyLLama-v0.json new file mode 100644 index 0000000000000000000000000000000000000000..f269338cd5cba88f504a4d489f99d46b4710ad1a --- /dev/null +++ b/data/model_data_json/Maykeye_TinyLLama-v0.json @@ -0,0 +1,19 @@ +{ + "model_id": "Maykeye/TinyLLama-v0", + "downloads": 594867, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "llama", + "text-generation", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 --- This is a first version of recreating roneneldan/TinyStories-1M but using the Llama architecture. * The full training process is included in the notebook train.ipynb. Recreating it is as simple as downloading TinyStoriesV2-GPT4-train.txt and TinyStoriesV2-GPT4-valid.txt into the same folder as the notebook and running the cells. Validation content is not used by the script, so you can put anything in it. * The backup directory has a script, do_backup, that I used to copy weights from the remote machine to local. Weights are generated too quickly, so by the time the script copied weight N+1 * This is an extremely PoC version. Training truncates stories that are longer than the context size and doesn't use any sliding window, so stories are only trained from the start. * Training took approximately 9 hours (3 hours per epoch) on a 40GB A100. ~30GB VRAM was used. * I use the tokenizer from open_llama_3b. However, I had troubles with it locally (I had no troubles on the cloud machine with preinstalled libraries). * The demo script is demo.py. * A validation script is provided: valid.py. Use it like: After training I decided that it's not necessary to split validation into chunks. * Also, this version uses a very stupid caching mechanism to shuffle stories for training: it keeps a cache of N recently loaded chunks, so if the random shuffle asks for a story, it may use the cache or load a chunk. The training dataset is too small, so in the next versions I will get rid of it. from transformers import AutoModelForCausalLM, AutoTokenizer", + "model_explanation_gemini": "Recreates the TinyStories-1M model using Llama architecture for generating short stories by training on the TinyStoriesV2 dataset."
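The TinyLLama card above ends with a dangling import; a minimal generation sketch completing it (prompt and sampling settings are illustrative) might be:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Maykeye/TinyLLama-v0")
model = AutoModelForCausalLM.from_pretrained("Maykeye/TinyLLama-v0")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```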
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_DeepSeek-V3-0324-GGUF.json b/data/model_data_json/MaziyarPanahi_DeepSeek-V3-0324-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..ae1f5ed569450e9b0fd73c4fb82cd6add6676670 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_DeepSeek-V3-0324-GGUF.json @@ -0,0 +1,19 @@ +{ + "model_id": "MaziyarPanahi/DeepSeek-V3-0324-GGUF", + "downloads": 231997, + "tags": [ + "gguf", + "quantized", + "2-bit", + "GGUF", + "text-generation", + "base_model:deepseek-ai/DeepSeek-V3-0324", + "base_model:quantized:deepseek-ai/DeepSeek-V3-0324", + "license:mit", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- license: mit base_model: deepseek-ai/DeepSeek-V3-0324 inference: false model_creator: deepseek-ai model_name: DeepSeek-V3-0324-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - GGUF - text-generation --- # MaziyarPanahi/DeepSeek-V3-0324-GGUF - Model creator: deepseek-ai - Original model: deepseek-ai/DeepSeek-V3-0324 ## Description MaziyarPanahi/DeepSeek-V3-0324-GGUF contains GGUF format model files for deepseek-ai/DeepSeek-V3-0324. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of DeepSeek-V3-0324 optimized for efficient text generation tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Llama-3-8B-Instruct-32k-v0.1-GGUF.json b/data/model_data_json/MaziyarPanahi_Llama-3-8B-Instruct-32k-v0.1-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..fce992142ac555e81f5275b50851cf029d3b33f0 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Llama-3-8B-Instruct-32k-v0.1-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF", + "downloads": 283888, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1", + "base_model:quantized:MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Llama-3-8B-Instruct-32k-v0.1-GGUF base_model: MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1 inference: false model_creator: MaziyarPanahi pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF - Model creator: MaziyarPanahi - Original model: MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1 ## Description MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF contains GGUF format model files for MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of the Llama-3-8B-Instruct-32k model optimized for efficient text generation across various bit levels." 
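As a hedged illustration of running any of these GGUF files with `llama-cpp-python` (one of the clients listed above), assuming a quant file you have already downloaded (the filename is illustrative):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3-8B-Instruct-32k-v0.1.Q4_K_M.gguf",  # illustrative filename
    n_ctx=8192,       # this variant targets long contexts; raise as RAM allows
    n_gpu_layers=-1,  # offload all layers if llama.cpp was built with GPU support
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what the GGUF format is."}]
)
print(out["choices"][0]["message"]["content"])
```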
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Llama-3-8B-Instruct-64k-GGUF.json b/data/model_data_json/MaziyarPanahi_Llama-3-8B-Instruct-64k-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..7550d5a4ae753c0ebbfa9b082871fecad8aa4d60 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Llama-3-8B-Instruct-64k-GGUF.json @@ -0,0 +1,26 @@ +{ + "model_id": "MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF", + "downloads": 261264, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "llama", + "llama-3", + "base_model:MaziyarPanahi/Llama-3-8B-Instruct-64k", + "base_model:quantized:MaziyarPanahi/Llama-3-8B-Instruct-64k", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - llama - llama-3 - text-generation model_name: Llama-3-8B-Instruct-64k-GGUF base_model: MaziyarPanahi/Llama-3-8B-Instruct-64k inference: false model_creator: MaziyarPanahi pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF - Model creator: MaziyarPanahi - Original model: MaziyarPanahi/Llama-3-8B-Instruct-64k ## Description MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF contains GGUF format model files for MaziyarPanahi/Llama-3-8B-Instruct-64k. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF-format version of the Llama-3-8B-Instruct-64k model designed for efficient text generation tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Llama-3.2-1B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Llama-3.2-1B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..32e7e428aedceb4a7cf2f57a74ff36f8299e822d --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Llama-3.2-1B-Instruct-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Llama-3.2-1B-Instruct-GGUF", + "downloads": 273027, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:meta-llama/Llama-3.2-1B-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-1B-Instruct", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Llama-3.2-1B-Instruct-GGUF base_model: meta-llama/Llama-3.2-1B-Instruct inference: false model_creator: meta-llama pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Llama-3.2-1B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Llama-3.2-1B-Instruct ## Description MaziyarPanahi/Llama-3.2-1B-Instruct-GGUF contains GGUF format model files for meta-llama/Llama-3.2-1B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "Quantized GGUF format model files for text generation based on meta-llama/Llama-3.2-1B-Instruct." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Llama-3.2-3B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Llama-3.2-3B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..409d1fc17190ca421632fc30711ddc6bd7039c57 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Llama-3.2-3B-Instruct-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Llama-3.2-3B-Instruct-GGUF", + "downloads": 204275, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:meta-llama/Llama-3.2-3B-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-3B-Instruct", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Llama-3.2-3B-Instruct-GGUF base_model: meta-llama/Llama-3.2-3B-Instruct inference: false model_creator: meta-llama pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Llama-3.2-3B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Llama-3.2-3B-Instruct ## Description MaziyarPanahi/Llama-3.2-3B-Instruct-GGUF contains GGUF format model files for meta-llama/Llama-3.2-3B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF-format version of Meta's Llama-3.2-3B-Instruct model optimized for efficient text generation across various bit levels." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Llama-3.3-70B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Llama-3.3-70B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..f50e0cad8e2adf44af4a2b71f2e64a5d8ac5faa2 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Llama-3.3-70B-Instruct-GGUF.json @@ -0,0 +1,22 @@ +{ + "model_id": "MaziyarPanahi/Llama-3.3-70B-Instruct-GGUF", + "downloads": 235333, + "tags": [ + "gguf", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:meta-llama/Llama-3.3-70B-Instruct", + "base_model:quantized:meta-llama/Llama-3.3-70B-Instruct", + "region:us", + "conversational" + ], + "description": "--- base_model: meta-llama/Llama-3.3-70B-Instruct inference: false model_creator: meta-llama model_name: Llama-3.3-70B-Instruct-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/Llama-3.3-70B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Llama-3.3-70B-Instruct ## Description MaziyarPanahi/Llama-3.3-70B-Instruct-GGUF contains GGUF format model files for meta-llama/Llama-3.3-70B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Meta's Llama-3.3-70B-Instruct model optimized for efficient text generation across various platforms." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Meta-Llama-3-8B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Meta-Llama-3-8B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..69ead062b430f22237f76c3ca1a7495e404bf694 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Meta-Llama-3-8B-Instruct-GGUF.json @@ -0,0 +1,31 @@ +{ + "model_id": "MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF", + "downloads": 350327, + "tags": [ + "transformers", + "gguf", + "mistral", + "facebook", + "meta", + "pytorch", + "llama", + "llama-3", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "16-bit", + "GGUF", + "text-generation", + "en", + "base_model:meta-llama/Meta-Llama-3-8B-Instruct", + "base_model:quantized:meta-llama/Meta-Llama-3-8B-Instruct", + "region:us", + "conversational" + ], + "description": "--- language: - en pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - 16-bit - GGUF base_model: meta-llama/Meta-Llama-3-8B-Instruct inference: false model_creator: MaziyarPanahi model_name: Meta-Llama-3-8B-Instruct-GGUF quantized_by: MaziyarPanahi license_name: llama3 --- # MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Meta-Llama-3-8B-Instruct The GGUF and quantized models here are based on meta-llama/Meta-Llama-3-8B-Instruct model ## How to download You can download only the quants you need instead of cloning the entire repository as follows: ## Load GGUF models You follow the prompt template provided by Llama-3: Original README --- ## Model Details Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. **Model developers** Meta **Variations** Llama 3 comes in two sizes — 8B and 70B parameters — in pre-trained and instruction tuned variants. **Input** Models input text only. **Output** Models generate text and code only. **Model Architecture** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|
| Llama 3 | A new mix of publicly available online data. | 8B | 8k | Yes | 15T+ | March, 2023 |
| | | 70B | 8k | Yes | 15T+ | December, 2023 |
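The "How to download" step in the card above lost its snippet in this dump. A hedged sketch using huggingface_hub, which can fetch a single quant instead of cloning the whole repository; the quant filename is hypothetical:

```python
from huggingface_hub import hf_hub_download

# Download one quant only; the filename is a hypothetical example, so list
# the repository files to find the exact name of the quant you want.
path = hf_hub_download(
    repo_id="MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
)
print(path)  # local cache path of the single downloaded file
```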
**Llama 3 family of models**. Token counts refer to pretraining data only. Both the 8 and 70B versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date** April 18, 2024. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3 is intended for commercial and research use in English. Instruction tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3 Community License. Use in languages other than English**. **Note: Developers may fine-tune Llama 3 models for languages beyond English provided they comply with the Llama 3 Community License and the Acceptable Use Policy. ## How to use This repository contains two versions of Meta-Llama-3-70B-Instruct, for use with transformers and with the original codebase. ### Use with transformers See the snippet below for usage with Transformers: ### Use with Please, follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging : For Hugging Face support, we recommend using transformers or TGI, but a similar command works. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint Pretraining utilized a cumulative** 7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W). Estimated total emissions were 2290 tCO2eq, 100% of which were offset by Meta’s sustainability program.
| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Llama 3 8B | 1.3M | 700 | 390 |
| Llama 3 70B | 6.4M | 700 | 1900 |
| Total | 7.7M | | 2290 |
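The "Use with transformers" snippet referenced in the card above was also stripped. A minimal sketch of the standard chat pipeline, assuming access to the gated meta-llama repository and a recent transformers release:

```python
import torch
from transformers import pipeline

# Requires accepting the Llama 3 license on the Hub and authenticating.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```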
**CO2 emissions during pre-training**. Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 3 was pretrained on over 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 10M human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of March 2023 for the 7B and December 2023 for the 70B models respectively. ## Benchmarks In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. For details on the methodology see here. ### Base pretrained models
| Category | Benchmark | Llama 3 8B | Llama2 7B | Llama2 13B | Llama 3 70B | Llama2 70B |
|---|---|---|---|---|---|---|
| General | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| | AGIEval English (3-5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| Reading comprehension | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |
### Instruction tuned models
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|
| MMLU (5-shot) | 68.4 | 34.1 | 47.8 | 82.0 | 52.9 |
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
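The "Load GGUF models" note above says to follow the Llama-3 prompt template, but the example did not survive extraction. A sketch of the template applied by hand for a raw completion call; the special tokens follow the published Llama 3 chat format and the model filename is hypothetical:

```python
from llama_cpp import Llama

# Llama-3 chat template written out by hand for a raw completion call.
PROMPT = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")  # hypothetical
out = llm(
    PROMPT.format(user_message="Name one benefit of Grouped-Query Attention."),
    max_tokens=128,
    stop=["<|eot_id|>"],  # stop at the end-of-turn token
)
print(out["choices"][0]["text"])
```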
### Responsibility & Safety We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community. Foundation models are widely capable technologies that are built to be used for a diverse range of applications. They are not designed to meet every developer preference on safety levels for all use cases, out-of-the-box, as those by their nature will differ across different applications. Rather, responsible LLM-application deployment is achieved by implementing a series of safety best practices throughout the development of such applications, from the model pre-training, fine-tuning and the deployment of systems composed of safeguards to tailor the safety needs specifically to the use case and audience. As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model and system level safety for their application. We also provide a set of resources including Meta Llama Guard 2 and Code Shield safeguards. These tools have proven to drastically reduce residual risks of LLM Systems, while maintaining a high level of helpfulness. We encourage developers to tune and deploy these safeguards according to their needs and we provide a reference implementation to get you started. #### Llama 3-Instruct As outlined in the Responsible Use Guide, some trade-off between model helpfulness and model alignment is likely unavoidable. Developers should exercise discretion about how to weigh the benefits of alignment and helpfulness for their specific use case and audience. Developers should be mindful of residual risks when using Llama models and leverage additional safety tools as needed to reach the right safety bar for their use case. Safety For our instruction tuned model, we conducted extensive red teaming exercises, performed adversarial evaluations and implemented safety mitigations techniques to lower residual risks. As with any Large Language Model, residual risks will likely remain and we recommend that developers assess these risks in the context of their use case. In parallel, we are working with the community to make AI safety benchmark standards transparent, rigorous and interpretable. Refusals In addition to residual risks, we put a great emphasis on model refusals to benign prompts. Over-refusing not only can impact the user experience but could even be harmful in certain contexts as well. We’ve heard the feedback from the developer community and improved our fine tuning to ensure that Llama 3 is significantly less likely to falsely refuse to answer prompts than Llama 2. We built internal benchmarks and developed mitigations to limit false refusals making Llama 3 our most helpful model to date. #### Responsible release In addition to responsible use considerations outlined above, we followed a rigorous process that requires us to take extra measures against misuse and critical risks before we make our release decision. Misuse If you access or use Llama 3, you agree to the Acceptable Use Policy. 
The most recent copy of this policy can be found at #### Critical risks CBRNE (Chemical, Biological, Radiological, Nuclear, and high yield Explosives) We have conducted a two fold assessment of the safety of the model in this area: * Iterative testing during model training to assess the safety of responses related to CBRNE threats and other adversarial risks. * Involving external CBRNE experts to conduct an uplift test assessing the ability of the model to accurately provide expert knowledge and reduce barriers to potential CBRNE misuse, by reference to what can be achieved using web search (without the model). ### Cyber Security We have evaluated Llama 3 with CyberSecEval, Meta’s cybersecurity safety eval suite, measuring Llama 3’s propensity to suggest insecure code when used as a coding assistant, and Llama 3’s propensity to comply with requests to help carry out cyber attacks, where attacks are defined by the industry standard MITRE ATT&CK cyber attack ontology. On our insecure coding and cyber attacker helpfulness tests, Llama 3 behaved in the same range or safer than models of equivalent coding capability. ### Child Safety Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective based methodologies to assess the model risks along multiple attack vectors. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market specific nuances or experiences. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership in AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3 addresses users and their needs as they are, without insertion unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3 is a new technology, and like any new technology, there are risks associated with its use. 
Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3 models, developers should perform safety testing and tuning tailored to their specific applications of the model. As outlined in the Responsible Use Guide, we recommend incorporating Purple Llama solutions into your workflows and specifically Llama Guard which provides a base model to filter input and output prompts to layer system-level safety on top of model-level safety. Please see the Responsible Use Guide available at ## Citation instructions @article{llama3modelcard, title={Llama 3 Model Card}, author={AI@Meta}, year={2024}, url = { } ## Contributors Aaditya Singh; Aaron Grattafiori; Abhimanyu Dubey; Abhinav Jauhri; Abhinav Pandey; Abhishek Kadian; Adam Kelsey; Adi Gangidi; Ahmad Al-Dahle; Ahuva Goldstand; Aiesha Letman; Ajay Menon; Akhil Mathur; Alan Schelten; Alex Vaughan; Amy Yang; Andrei Lupu; Andres Alvarado; Andrew Gallagher; Andrew Gu; Andrew Ho; Andrew Poulton; Andrew Ryan; Angela Fan; Ankit Ramchandani; Anthony Hartshorn; Archi Mitra; Archie Sravankumar; Artem Korenev; Arun Rao; Ashley Gabriel; Ashwin Bharambe; Assaf Eisenman; Aston Zhang; Aurelien Rodriguez; Austen Gregerson; Ava Spataru; Baptiste Roziere; Ben Maurer; Benjamin Leonhardi; Bernie Huang; Bhargavi Paranjape; Bing Liu; Binh Tang; Bobbie Chern; Brani Stojkovic; Brian Fuller; Catalina Mejia Arenas; Chao Zhou; Charlotte Caucheteux; Chaya Nayak; Ching-Hsiang Chu; Chloe Bi; Chris Cai; Chris Cox; Chris Marra; Chris McConnell; Christian Keller; Christoph Feichtenhofer; Christophe Touret; Chunyang Wu; Corinne Wong; Cristian Canton Ferrer; Damien Allonsius; Daniel Kreymer; Daniel Haziza; Daniel Li; Danielle Pintz; Danny Livshits; Danny Wyatt; David Adkins; David Esiobu; David Xu; Davide Testuggine; Delia David; Devi Parikh; Dhruv Choudhary; Dhruv Mahajan; Diana Liskovich; Diego Garcia-Olano; Diego Perino; Dieuwke Hupkes; Dingkang Wang; Dustin Holland; Egor Lakomkin; Elina Lobanova; Xiaoqing Ellen Tan; Emily Dinan; Eric Smith; Erik Brinkman; Esteban Arcaute; Filip Radenovic; Firat Ozgenel; Francesco Caggioni; Frank Seide; Frank Zhang; Gabriel Synnaeve; Gabriella Schwarz; Gabrielle Lee; Gada Badeer; Georgia Anderson; Graeme Nail; Gregoire Mialon; Guan Pang; Guillem Cucurell; Hailey Nguyen; Hannah Korevaar; Hannah Wang; Haroun Habeeb; Harrison Rudolph; Henry Aspegren; Hu Xu; Hugo Touvron; Iga Kozlowska; Igor Molybog; Igor Tufanov; Iliyan Zarov; Imanol Arrieta Ibarra; Irina-Elena Veliche; Isabel Kloumann; Ishan Misra; Ivan Evtimov; Jacob Xu; Jade Copet; Jake Weissman; Jan Geffert; Jana Vranes; Japhet Asher; Jason Park; Jay Mahadeokar; Jean-Baptiste Gaya; Jeet Shah; Jelmer van der Linde; Jennifer Chan; Jenny Hong; Jenya Lee; Jeremy Fu; Jeremy Teboul; Jianfeng Chi; Jianyu Huang; Jie Wang; Jiecao Yu; Joanna Bitton; Joe Spisak; Joelle Pineau; Jon Carvill; Jongsoo Park; Joseph Rocca; Joshua Johnstun; Junteng Jia; Kalyan Vasuden Alwala; Kam Hou U; Kate Plawiak; Kartikeya Upasani; Kaushik Veeraraghavan; Ke Li; Kenneth Heafield; Kevin Stone; Khalid El-Arini; Krithika Iyer; Kshitiz Malik; Kuenley Chiu; Kunal Bhalla; Kyle Huang; Lakshya Garg; Lauren Rantala-Yeary; Laurens van der Maaten; Lawrence Chen; Leandro Silva; Lee Bell; Lei Zhang; Liang Tan; Louis Martin; Lovish 
Madaan; Luca Wehrstedt; Lukas Blecher; Luke de Oliveira; Madeline Muzzi; Madian Khabsa; Manav Avlani; Mannat Singh; Manohar Paluri; Mark Zuckerberg; Marcin Kardas; Martynas Mankus; Mathew Oldham; Mathieu Rita; Matthew Lennie; Maya Pavlova; Meghan Keneally; Melanie Kambadur; Mihir Patel; Mikayel Samvelyan; Mike Clark; Mike Lewis; Min Si; Mitesh Kumar Singh; Mo Metanat; Mona Hassan; Naman Goyal; Narjes Torabi; Nicolas Usunier; Nikolay Bashlykov; Nikolay Bogoychev; Niladri Chatterji; Ning Dong; Oliver Aobo Yang; Olivier Duchenne; Onur Celebi; Parth Parekh; Patrick Alrassy; Paul Saab; Pavan Balaji; Pedro Rittner; Pengchuan Zhang; Pengwei Li; Petar Vasic; Peter Weng; Polina Zvyagina; Prajjwal Bhargava; Pratik Dubal; Praveen Krishnan; Punit Singh Koura; Qing He; Rachel Rodriguez; Ragavan Srinivasan; Rahul Mitra; Ramon Calderer; Raymond Li; Robert Stojnic; Roberta Raileanu; Robin Battey; Rocky Wang; Rohit Girdhar; Rohit Patel; Romain Sauvestre; Ronnie Polidoro; Roshan Sumbaly; Ross Taylor; Ruan Silva; Rui Hou; Rui Wang; Russ Howes; Ruty Rinott; Saghar Hosseini; Sai Jayesh Bondu; Samyak Datta; Sanjay Singh; Sara Chugh; Sargun Dhillon; Satadru Pan; Sean Bell; Sergey Edunov; Shaoliang Nie; Sharan Narang; Sharath Raparthy; Shaun Lindsay; Sheng Feng; Sheng Shen; Shenghao Lin; Shiva Shankar; Shruti Bhosale; Shun Zhang; Simon Vandenhende; Sinong Wang; Seohyun Sonia Kim; Soumya Batra; Sten Sootla; Steve Kehoe; Suchin Gururangan; Sumit Gupta; Sunny Virk; Sydney Borodinsky; Tamar Glaser; Tamar Herman; Tamara Best; Tara Fowler; Thomas Georgiou; Thomas Scialom; Tianhe Li; Todor Mihaylov; Tong Xiao; Ujjwal Karn; Vedanuj Goswami; Vibhor Gupta; Vignesh Ramanathan; Viktor Kerkez; Vinay Satish Kumar; Vincent Gonguet; Vish Vogeti; Vlad Poenaru; Vlad Tiberiu Mihailescu; Vladan Petrovic; Vladimir Ivanov; Wei Li; Weiwei Chu; Wenhan Xiong; Wenyin Fu; Wes Bouaziz; Whitney Meers; Will Constable; Xavier Martinet; Xiaojian Wu; Xinbo Gao; Xinfeng Xie; Xuchao Jia; Yaelle Goldschlag; Yann LeCun; Yashesh Gaur; Yasmine Babaei; Ye Qi; Yenda Li; Yi Wen; Yiwen Song; Youngjin Nam; Yuchen Hao; Yuchen Zhang; Yun Wang; Yuning Mao; Yuzi He; Zacharie Delpierre Coudert; Zachary DeVito; Zahra Hankir; Zhaoduo Wen; Zheng Yan; Zhengxing Chen; Zhenyu Yang; Zoe Papakipos ---", + "model_explanation_gemini": "A quantized version of Meta's Llama-3-8B-Instruct model optimized for text generation and dialogue tasks, available in various bit sizes for efficient inference." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-405B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-405B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..7ed04917ada67fa730de5d98b3cb4248e95cebd5 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-405B-Instruct-GGUF.json @@ -0,0 +1,27 @@ +{ + "model_id": "MaziyarPanahi/Meta-Llama-3.1-405B-Instruct-GGUF", + "downloads": 240376, + "tags": [ + "gguf", + "quantized", + "2-bit", + "3-bit", + "GGUF", + "text-generation", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "base_model:meta-llama/Llama-3.1-405B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-405B-Instruct", + "license:llama3.1", + "region:us", + "conversational" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th tags: - quantized - 2-bit - 3-bit - GGUF - text-generation - text-generation model_name: Meta-Llama-3.1-405B-Instruct-GGUF base_model: meta-llama/Meta-Llama-3.1-405B-Instruct inference: false model_creator: meta-llama pipeline_tag: text-generation quantized_by: MaziyarPanahi license: llama3.1 --- # MaziyarPanahi/Meta-Llama-3.1-405B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Meta-Llama-3.1-405B-Instruct ## Description MaziyarPanahi/Meta-Llama-3.1-405B-Instruct-GGUF contains GGUF format model files for meta-llama/Meta-Llama-3.1-405B-Instruct. ## Sample > llama.cpp/llama-cli -m Meta-Llama-3.1-405B-Instruct.Q2_K.gguf-00001-of-00009.gguf -p \"write 10 sentences ending with the word apple.\" -n 1024 -t 40 ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF-format version of Meta's Llama 3.1 405B Instruct model optimized for efficient text generation tasks." 
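The sample command above points llama-cli at the first shard of a split GGUF (`-00001-of-00009.gguf`). Recent llama.cpp builds resolve the remaining shards automatically, so the same pattern should carry over to llama-cpp-python; a hedged sketch:

```python
from llama_cpp import Llama

# Point model_path at the first shard; the library should locate the
# remaining -0000N-of-00009 pieces in the same directory.
llm = Llama(
    model_path="Meta-Llama-3.1-405B-Instruct.Q2_K.gguf-00001-of-00009.gguf",
    n_ctx=8192,
)
```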
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-70B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-70B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..719fd32f8fe77f04de4bd5a1f769b1d1b00b95c0 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-70B-Instruct-GGUF.json @@ -0,0 +1,31 @@ +{ + "model_id": "MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF", + "downloads": 265120, + "tags": [ + "gguf", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "base_model:meta-llama/Llama-3.1-70B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-70B-Instruct", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Meta-Llama-3.1-70B-Instruct-GGUF base_model: meta-llama/Meta-Llama-3.1-70B-Instruct inference: false model_creator: meta-llama pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Meta-Llama-3.1-70B-Instruct ## Description MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF contains GGUF format model files for meta-llama/Meta-Llama-3.1-70B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Meta-Llama-3.1-70B-Instruct optimized for efficient text generation across multiple languages." 
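The client list above repeatedly mentions OpenAI-compatible API servers. As an illustrative sketch, a llama-cpp-python server started with `python -m llama_cpp.server --model <quant.gguf>` can be queried with the official openai client; the host, port, and model name below are assumptions based on the server's defaults:

```python
from openai import OpenAI

# llama_cpp.server listens on http://localhost:8000 by default and exposes
# the /v1 OpenAI-style routes; the api_key is unused but must be non-empty.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",  # assumption: the server accepts an arbitrary name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```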
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-8B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-8B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..c2a1a60ff6aff32028d3d73ff046847a5f883439 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Meta-Llama-3.1-8B-Instruct-GGUF.json @@ -0,0 +1,33 @@ +{ + "model_id": "MaziyarPanahi/Meta-Llama-3.1-8B-Instruct-GGUF", + "downloads": 265280, + "tags": [ + "gguf", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3.1", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Meta-Llama-3.1-8B-Instruct-GGUF base_model: meta-llama/Meta-Llama-3.1-8B-Instruct inference: false model_creator: meta-llama pipeline_tag: text-generation quantized_by: MaziyarPanahi license: llama3.1 --- # MaziyarPanahi/Meta-Llama-3.1-8B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Meta-Llama-3.1-8B-Instruct ## Description MaziyarPanahi/Meta-Llama-3.1-8B-Instruct-GGUF contains GGUF format model files for meta-llama/Meta-Llama-3.1-8B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible. Original README: --- ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). 
The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**. **Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . Note: You can also find detailed recipes on how to use the model locally, with , assisted generations, quantised and more at []( ### Use with Please, follow the instructions in the repository To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. 
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
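The 3.1 card's "Use with transformers" snippet was stripped in this dump as well. A hedged sketch of the Auto-classes path it mentions, assuming a recent transformers release (Llama 3.1 support landed around 4.43) and access to the gated repository:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is Grouped-Query Attention?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=96)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```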
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 41.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third party services they use to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of the third party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in other languages than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use cases evaluations measure safety risks of systems for most commonly built applications including chat bot, coding assistant, tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompt and output response. It is important to evaluate applications in context, and we recommend building dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which were crafted dedicated benchmarks including long context, multilingual, tools calls, coding or memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity in addition to multilingual content specialists with background in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1- CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. 
For Llama 3, we conducted new in-depth sessions using objective based methodologies to assess the model risks along multiple attack vectors including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market specific nuances or experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without insertion unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. 
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.", + "model_explanation_gemini": "A quantized multilingual text-generation model optimized for dialogue, supporting multiple bit levels and GGUF format for efficient inference." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Mistral-7B-Instruct-v0.3-GGUF.json b/data/model_data_json/MaziyarPanahi_Mistral-7B-Instruct-v0.3-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..92fbed965d2ed99b7c993816943aa28910515d4f --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Mistral-7B-Instruct-v0.3-GGUF.json @@ -0,0 +1,29 @@ +{ + "model_id": "MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF", + "downloads": 307909, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "safetensors", + "text-generation", + "conversational", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "text-generation-inference", + "region:us", + "base_model:mistralai/Mistral-7B-Instruct-v0.3", + "base_model:quantized:mistralai/Mistral-7B-Instruct-v0.3" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - transformers - safetensors - mistral - text-generation - conversational - license:apache-2.0 - autotrain_compatible - endpoints_compatible - text-generation-inference - region:us - text-generation model_name: Mistral-7B-Instruct-v0.3-GGUF base_model: mistralai/Mistral-7B-Instruct-v0.3 inference: false model_creator: mistralai pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF - Model creator: mistralai - Original model: mistralai/Mistral-7B-Instruct-v0.3 ## Description MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF contains GGUF format model files for mistralai/Mistral-7B-Instruct-v0.3. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. 
* ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized version of Mistral-7B-Instruct-v0.3 in GGUF format for efficient text generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Mistral-Large-Instruct-2411-GGUF.json b/data/model_data_json/MaziyarPanahi_Mistral-Large-Instruct-2411-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..5b547c74841409a6d05345c92e4cc122d40c3a43 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Mistral-Large-Instruct-2411-GGUF.json @@ -0,0 +1,22 @@ +{ + "model_id": "MaziyarPanahi/Mistral-Large-Instruct-2411-GGUF", + "downloads": 260427, + "tags": [ + "gguf", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:mistralai/Mistral-Large-Instruct-2411", + "base_model:quantized:mistralai/Mistral-Large-Instruct-2411", + "region:us", + "conversational" + ], + "description": "--- base_model: mistralai/Mistral-Large-Instruct-2411 inference: false model_creator: mistralai model_name: Mistral-Large-Instruct-2411-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/Mistral-Large-Instruct-2411-GGUF - Model creator: mistralai - Original model: mistralai/Mistral-Large-Instruct-2411 ## Description MaziyarPanahi/Mistral-Large-Instruct-2411-GGUF contains GGUF format model files for mistralai/Mistral-Large-Instruct-2411. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. 
## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Mistral-Large-Instruct-2411 optimized for efficient text generation across various bit levels." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Mistral-Nemo-Instruct-2407-GGUF.json b/data/model_data_json/MaziyarPanahi_Mistral-Nemo-Instruct-2407-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..9f521ac1764cae9e28bae5b8a6ce1f65d8162bab --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Mistral-Nemo-Instruct-2407-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF", + "downloads": 277741, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:mistralai/Mistral-Nemo-Instruct-2407", + "base_model:quantized:mistralai/Mistral-Nemo-Instruct-2407", + "region:us", + "conversational" + ], + "description": "--- base_model: mistralai/Mistral-Nemo-Instruct-2407 model_name: Mistral-Nemo-Instruct-2407-GGUF pipeline_tag: text-generation tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation inference: false model_creator: mistralai quantized_by: MaziyarPanahi --- # MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF - Model creator: mistralai - Original model: mistralai/Mistral-Nemo-Instruct-2407 ## Description MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF contains GGUF format model files for mistralai/Mistral-Nemo-Instruct-2407. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Mistral-Nemo-Instruct-2407 optimized for efficient text generation across various bit levels." 
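The descriptions above also credit llama-cpp-python with LangChain support. A minimal sketch of that integration via langchain-community, using one of the Mistral quants; the filename is hypothetical and the `[INST]` formatting follows Mistral's instruct convention:

```python
from langchain_community.llms import LlamaCpp

# Hypothetical quant filename; LlamaCpp wraps llama-cpp-python for LangChain.
llm = LlamaCpp(model_path="Mistral-Nemo-Instruct-2407.Q4_K_M.gguf", n_ctx=8192)

# Mistral's instruct convention wraps the user turn in [INST] ... [/INST].
print(llm.invoke("[INST] Say hello in French. [/INST]"))
```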
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Mistral-Small-24B-Instruct-2501-GGUF.json b/data/model_data_json/MaziyarPanahi_Mistral-Small-24B-Instruct-2501-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..5481092370a2ab4a6e969c668865874c864780f7 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Mistral-Small-24B-Instruct-2501-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Mistral-Small-24B-Instruct-2501-GGUF", + "downloads": 224146, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:mistralai/Mistral-Small-24B-Instruct-2501", + "base_model:quantized:mistralai/Mistral-Small-24B-Instruct-2501", + "region:us", + "conversational" + ], + "description": "--- base_model: mistralai/Mistral-Small-24B-Instruct-2501 inference: false model_creator: mistralai model_name: Mistral-Small-24B-Instruct-2501-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/Mistral-Small-24B-Instruct-2501-GGUF - Model creator: mistralai - Original model: mistralai/Mistral-Small-24B-Instruct-2501 ## Description MaziyarPanahi/Mistral-Small-24B-Instruct-2501-GGUF contains GGUF format model files for mistralai/Mistral-Small-24B-Instruct-2501. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized version of Mistral-Small-24B-Instruct-2501 in GGUF format for efficient text generation tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Mistral-Small-Instruct-2409-GGUF.json b/data/model_data_json/MaziyarPanahi_Mistral-Small-Instruct-2409-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..0555e35472f788e67fb9d6d2c5c049fe5a276f0e --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Mistral-Small-Instruct-2409-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Mistral-Small-Instruct-2409-GGUF", + "downloads": 219617, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:mistralai/Mistral-Small-Instruct-2409", + "base_model:quantized:mistralai/Mistral-Small-Instruct-2409", + "region:us", + "conversational" + ], + "description": "--- base_model: mistralai/Mistral-Small-Instruct-2409 inference: false model_creator: mistralai model_name: Mistral-Small-Instruct-2409-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/Mistral-Small-Instruct-2409-GGUF - Model creator: mistralai - Original model: mistralai/Mistral-Small-Instruct-2409 ## Description MaziyarPanahi/Mistral-Small-Instruct-2409-GGUF contains GGUF format model files for mistralai/Mistral-Small-Instruct-2409. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Mistral-Small-Instruct-2409 optimized for efficient text generation across various bit levels." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Mixtral-8x22B-v0.1-GGUF.json b/data/model_data_json/MaziyarPanahi_Mixtral-8x22B-v0.1-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..c608eae147378a4c205da67a90ded06fb44e3dd5 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Mixtral-8x22B-v0.1-GGUF.json @@ -0,0 +1,32 @@ +{ + "model_id": "MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF", + "downloads": 229903, + "tags": [ + "transformers", + "gguf", + "mixtral", + "text-generation", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "16-bit", + "GGUF", + "moe", + "fr", + "en", + "es", + "it", + "de", + "base_model:v2ray/Mixtral-8x22B-v0.1", + "base_model:quantized:v2ray/Mixtral-8x22B-v0.1", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 base_model: v2ray/Mixtral-8x22B-v0.1 inference: false model_creator: MaziyarPanahi model_name: Mixtral-8x22B-v0.1-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - 16-bit - GGUF - mixtral - moe language: - fr - en - es - it - de --- # Mixtral-8x22B-v0.1-GGUF On April 10th, @MistralAI released a model named \"Mixtral 8x22B,\" an 176B MoE via magnet link (torrent): - 141B MoE with ~35B active - Context length of 65k tokens - The base model can be fine-tuned - Requires ~260GB VRAM in fp16, 73GB in int4 - Licensed under Apache 2.0, according to their Discord - Available on @huggingface (community) - Utilizes a tokenizer similar to previous models The GGUF and quantized models here are based on v2ray/Mixtral-8x22B-v0.1 model ## How to download You can download only the quants you need instead of cloning the entire repository as follows: ## Load sharded model will detect the number of files and will load additional tensors from the rest of files. The output from quantized model: Since this appears to be a base model, it will keep on generating. 
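The download and shard-loading snippets were stripped from this card; a minimal sketch of what "download only the quants you need" might look like with the huggingface_hub library (the pattern argument is my assumption, the repo id is the card's) is:

```python
# Hedged sketch: fetch only one quant level instead of cloning the whole repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF",
    allow_patterns=["*Q2_K*"],            # assumed pattern: grab just the Q2_K shards
    local_dir="Mixtral-8x22B-v0.1-GGUF",  # download destination
)
```

When pointed at the first shard, llama.cpp detects the number of files and loads the remaining tensors from the rest, as the card notes.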
## Credit - MistralAI for opening the weights - v2ray for downloading, converting, and sharing it with the community Mixtral-8x22B-v0.1 - philschmid for the photo he shared on his Twitter [ASCII-art rendering of the shared photo omitted]", "model_explanation_gemini": "A quantized version of the Mixtral-8x22B-v0.1 model optimized for efficient text generation in multiple languages." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Phi-3.5-mini-instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Phi-3.5-mini-instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..3c5ee612a579f4c03de44e8b3586bda36e8bdf14 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Phi-3.5-mini-instruct-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "MaziyarPanahi/Phi-3.5-mini-instruct-GGUF", + "downloads": 271904, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:microsoft/Phi-3.5-mini-instruct", + "base_model:quantized:microsoft/Phi-3.5-mini-instruct", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Phi-3.5-mini-instruct-GGUF base_model: microsoft/Phi-3.5-mini-instruct inference: false model_creator: microsoft pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Phi-3.5-mini-instruct-GGUF - Model creator: microsoft - Original model: microsoft/Phi-3.5-mini-instruct ## Description MaziyarPanahi/Phi-3.5-mini-instruct-GGUF contains GGUF format model files for microsoft/Phi-3.5-mini-instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF-format version of Microsoft's Phi-3.5-mini-instruct model optimized for efficient text generation tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Phi-4-mini-instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Phi-4-mini-instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..1aeefc5e24d677d2decc750128ccb4879b4b5cdf --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Phi-4-mini-instruct-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Phi-4-mini-instruct-GGUF", + "downloads": 228753, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:microsoft/Phi-4-mini-instruct", + "base_model:quantized:microsoft/Phi-4-mini-instruct", + "region:us", + "conversational" + ], + "description": "--- base_model: microsoft/Phi-4-mini-instruct inference: false model_creator: microsoft model_name: Phi-4-mini-instruct-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/Phi-4-mini-instruct-GGUF - Model creator: microsoft - Original model: microsoft/Phi-4-mini-instruct ## Description MaziyarPanahi/Phi-4-mini-instruct-GGUF contains GGUF format model files for microsoft/Phi-4-mini-instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized version of Microsoft's Phi-4-mini-instruct model in GGUF format for efficient text generation tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_QwQ-32B-GGUF.json b/data/model_data_json/MaziyarPanahi_QwQ-32B-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..6ba7fd68683b4f58dd80502a5e4fb065be0c8b0e --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_QwQ-32B-GGUF.json @@ -0,0 +1,22 @@ +{ + "model_id": "MaziyarPanahi/QwQ-32B-GGUF", + "downloads": 228911, + "tags": [ + "gguf", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:Qwen/QwQ-32B", + "base_model:quantized:Qwen/QwQ-32B", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: QwQ-32B-GGUF base_model: Qwen/QwQ-32B inference: false model_creator: Qwen pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/QwQ-32B-GGUF - Model creator: Qwen - Original model: Qwen/QwQ-32B ## Description MaziyarPanahi/QwQ-32B-GGUF contains GGUF format model files for Qwen/QwQ-32B. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "Quantized GGUF format files for Qwen/QwQ-32B, a text-generation model, supporting various bit levels (2-bit to 8-bit) for efficient inference." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Qwen2-7B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Qwen2-7B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..157485a0caede0908dbc0074b449289fa043bed3 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Qwen2-7B-Instruct-GGUF.json @@ -0,0 +1,26 @@ +{ + "model_id": "MaziyarPanahi/Qwen2-7B-Instruct-GGUF", + "downloads": 242229, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "llama-3", + "llama", + "base_model:Qwen/Qwen2-7B-Instruct", + "base_model:quantized:Qwen/Qwen2-7B-Instruct", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - llama-3 - llama - text-generation model_name: Qwen2-7B-Instruct-GGUF base_model: Qwen/Qwen2-7B-Instruct inference: false model_creator: Qwen pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Qwen2-7B-Instruct-GGUF - Model creator: Qwen - Original model: Qwen/Qwen2-7B-Instruct ## Description MaziyarPanahi/Qwen2-7B-Instruct-GGUF contains GGUF format model files for Qwen/Qwen2-7B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "Quantized GGUF format files for Qwen2-7B-Instruct, enabling efficient text generation across various bit precisions." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Qwen2.5-1.5B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Qwen2.5-1.5B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..55f0a9f94777b863ab2d5316973728d01015920d --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Qwen2.5-1.5B-Instruct-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Qwen2.5-1.5B-Instruct-GGUF", + "downloads": 263124, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:Qwen/Qwen2.5-1.5B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-1.5B-Instruct", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Qwen2.5-1.5B-Instruct-GGUF base_model: Qwen/Qwen2.5-1.5B-Instruct inference: false model_creator: Qwen pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Qwen2.5-1.5B-Instruct-GGUF - Model creator: Qwen - Original model: Qwen/Qwen2.5-1.5B-Instruct ## Description MaziyarPanahi/Qwen2.5-1.5B-Instruct-GGUF contains GGUF format model files for Qwen/Qwen2.5-1.5B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of the Qwen2.5-1.5B-Instruct model optimized for efficient text generation tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Qwen2.5-7B-Instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_Qwen2.5-7B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..332a4415be539998ceabe531e59246272b6e25e0 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Qwen2.5-7B-Instruct-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF", + "downloads": 276366, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:Qwen/Qwen2.5-7B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-7B-Instruct", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Qwen2.5-7B-Instruct-GGUF base_model: Qwen/Qwen2.5-7B-Instruct inference: false model_creator: Qwen pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF - Model creator: Qwen - Original model: Qwen/Qwen2.5-7B-Instruct ## Description MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF contains GGUF format model files for Qwen/Qwen2.5-7B-Instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized version of the Qwen2.5-7B-Instruct model in GGUF format for efficient text generation tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_WizardLM-2-7B-GGUF.json b/data/model_data_json/MaziyarPanahi_WizardLM-2-7B-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..646e0d2eab21fcb216c422c705223b60bae316db --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_WizardLM-2-7B-GGUF.json @@ -0,0 +1,31 @@ +{ + "model_id": "MaziyarPanahi/WizardLM-2-7B-GGUF", + "downloads": 271265, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "safetensors", + "text-generation", + "arxiv:2304.12244", + "arxiv:2306.08568", + "arxiv:2308.09583", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "text-generation-inference", + "region:us", + "base_model:microsoft/WizardLM-2-7B", + "base_model:quantized:microsoft/WizardLM-2-7B" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - transformers - safetensors - mistral - text-generation - arxiv:2304.12244 - arxiv:2306.08568 - arxiv:2308.09583 - license:apache-2.0 - autotrain_compatible - endpoints_compatible - text-generation-inference - region:us - text-generation model_name: WizardLM-2-7B-GGUF base_model: microsoft/WizardLM-2-7B inference: false model_creator: microsoft pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/WizardLM-2-7B-GGUF - Model creator: microsoft - Original model: microsoft/WizardLM-2-7B ## Description MaziyarPanahi/WizardLM-2-7B-GGUF contains GGUF format model files for microsoft/WizardLM-2-7B. ## Prompt template or Taken from the original README --- --- license: apache-2.0 ---

🏠 WizardLM-2 Release Blog

🤗 HF Repo • 🐱 Github Repo • 🐦 Twitter • 📃 [WizardLM] • 📃 [WizardCoder] • 📃 [WizardMath]

👋 Join our Discord

## News 🔥🔥🔥 [2024/04/15] We introduce and open-source WizardLM-2, our next-generation state-of-the-art large language models, which offer improved performance on complex chat, multilingual, reasoning, and agent tasks. The new family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B. - WizardLM-2 8x22B is our most advanced model; it demonstrates highly competitive performance compared to leading proprietary models and consistently outperforms all existing state-of-the-art open-source models. - WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice among models of its size. - WizardLM-2 7B is the fastest and achieves performance comparable to existing leading open-source models 10x its size. For more details on WizardLM-2, please read our release blog post and upcoming paper. ## Model Details * **Model name**: WizardLM-2 7B * **Developed by**: WizardLM@Microsoft AI * **Base model**: mistralai/Mistral-7B-v0.1 * **Parameters**: 7B * **Language(s)**: Multilingual * **Blog**: Introducing WizardLM-2 * **Repository**: * **Paper**: WizardLM-2 (Upcoming) * **License**: Apache 2.0 ## Model Capacities **MT-Bench** We also adopt the automatic MT-Bench evaluation framework based on GPT-4, proposed by lmsys, to assess model performance. WizardLM-2 8x22B demonstrates highly competitive performance even compared to the most advanced proprietary models, while WizardLM-2 7B and WizardLM-2 70B are the top-performing models among the other leading baselines at the 7B to 70B model scales.

\"MTBench\"

**Human Preferences Evaluation** We carefully collected a complex and challenging evaluation set of real-world instructions covering the main categories of human requirements, such as writing, coding, math, reasoning, agent, and multilingual tasks. We report the win:loss rate without ties: - WizardLM-2 8x22B falls just slightly behind GPT-4-1106-preview, and is significantly stronger than Command R Plus and GPT4-0314. - WizardLM-2 70B is better than GPT4-0613, Mistral-Large, and Qwen1.5-72B-Chat. - WizardLM-2 7B is comparable with Qwen1.5-32B-Chat, and surpasses Qwen1.5-14B-Chat and Starling-LM-7B-beta.

\"Win\"

## Method Overview We built a **fully AI-powered synthetic training system** to train the WizardLM-2 models; please refer to our blog for more details of this system.

\"Method\"

## Usage ❗Note on model system prompt usage: WizardLM-2 adopts the prompt format from Vicuna and supports **multi-turn** conversation. The prompt should be as follows: Inference WizardLM-2 Demo Script We provide WizardLM-2 inference demo code on our github. --- ## How to use Thanks to TheBloke for preparing an amazing README on how to use GGUF models: ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling. * GPT4All, a free and open-source locally running GUI, supporting Windows, Linux and macOS with full GPU accel. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note that, as of the time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ### Explanation of quantisation methods
Click to see details The new methods available are: * GGML_TYPE_Q2_K - \"type-1\" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw). * GGML_TYPE_Q3_K - \"type-0\" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw. * GGML_TYPE_Q4_K - \"type-1\" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw. * GGML_TYPE_Q5_K - \"type-1\" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw. * GGML_TYPE_Q6_K - \"type-0\" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw. ## How to download GGUF files **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file. The following clients/libraries will automatically download models for you, providing a list of available models to choose from: * LM Studio * LoLLMS Web UI * Faraday.dev ### In Under Download Model, you can enter the model repo: MaziyarPanahi/WizardLM-2-7B-GGUF and below it, a specific filename to download, such as: WizardLM-2-7B-GGUF.Q4_K_M.gguf. Then click Download. ### On the command line, including multiple files at once I recommend using the Python library: Then you can download any individual model file to the current directory, at high speed, with a command like this:
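The concrete command is elided above; assuming the recommended Python library is huggingface_hub, a hedged equivalent of the single-file download might be:

```python
# Hedged sketch of the elided download step (pip install huggingface_hub first).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="MaziyarPanahi/WizardLM-2-7B-GGUF",
    filename="WizardLM-2-7B-GGUF.Q4_K_M.gguf",  # file name as given in the card
    local_dir=".",                              # save into the current directory
)
print(path)
```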
More advanced huggingface-cli download usage (click to read) You can also download multiple files at once with a pattern: For more documentation on downloading with , please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install : And set environment variable to : Windows Command Line users: You can set the environment variable by running before the download command.
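The accelerator package and environment variable are elided above; presumably they are hf_transfer and HF_HUB_ENABLE_HF_TRANSFER (both labeled as assumptions here). A sketch in Python:

```python
# Hedged sketch: enable the (presumed) hf_transfer accelerator before importing
# huggingface_hub, then download as usual. Run `pip install hf_transfer` first.
import os

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # assumed variable name

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="MaziyarPanahi/WizardLM-2-7B-GGUF",
    filename="WizardLM-2-7B-GGUF.Q4_K_M.gguf",
    local_dir=".",
)
```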
## Example command Make sure you are using from commit d0cee0d or later. Change to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration. Change to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value. If you want to have a chat-style conversation, replace the argument with For other parameters and how to use them, please refer to the llama.cpp documentation ## How to run in Further instructions can be found in the text-generation-webui documentation, here: text-generation-webui/docs/04 ‐ Model Tab.md. ## How to run from Python code You can use GGUF models from Python using the llama-cpp-python or ctransformers libraries. Note that at the time of writing (Nov 27th 2023), ctransformers has not been updated for some time and is not compatible with some recent models. Therefore I recommend you use llama-cpp-python. ### How to load this model in Python code, using llama-cpp-python For full documentation, please see: llama-cpp-python docs. #### First install the package Run one of the following commands, according to your system: #### Simple llama-cpp-python example code ## How to use with LangChain Here are guides on using llama-cpp-python and ctransformers with LangChain: * LangChain + llama-cpp-python * LangChain + ctransformers", + "model_explanation_gemini": "A quantized 7B parameter multilingual text-generation model optimized for complex chat, reasoning, and agent tasks, derived from Microsoft's WizardLM-2." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Yi-1.5-6B-Chat-GGUF.json b/data/model_data_json/MaziyarPanahi_Yi-1.5-6B-Chat-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..db7aac5918397eb7971fb24adf116b6381b8a6a6 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Yi-1.5-6B-Chat-GGUF.json @@ -0,0 +1,31 @@ +{ + "model_id": "MaziyarPanahi/Yi-1.5-6B-Chat-GGUF", + "downloads": 260711, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "safetensors", + "llama", + "text-generation", + "conversational", + "arxiv:2403.04652", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "text-generation-inference", + "region:us", + "base_model:01-ai/Yi-1.5-6B-Chat", + "base_model:quantized:01-ai/Yi-1.5-6B-Chat" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - transformers - safetensors - llama - text-generation - conversational - arxiv:2403.04652 - license:apache-2.0 - autotrain_compatible - endpoints_compatible - text-generation-inference - region:us - text-generation model_name: Yi-1.5-6B-Chat-GGUF base_model: 01-ai/Yi-1.5-6B-Chat inference: false model_creator: 01-ai pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Yi-1.5-6B-Chat-GGUF - Model creator: 01-ai - Original model: 01-ai/Yi-1.5-6B-Chat ## Description MaziyarPanahi/Yi-1.5-6B-Chat-GGUF contains GGUF format model files for 01-ai/Yi-1.5-6B-Chat. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. 
The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of the Yi-1.5-6B-Chat model designed for efficient text generation and conversational tasks." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Yi-Coder-1.5B-Chat-GGUF.json b/data/model_data_json/MaziyarPanahi_Yi-Coder-1.5B-Chat-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..f431d59e39a34a9412d6e49c0f3e8117b8e9b166 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Yi-Coder-1.5B-Chat-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF", + "downloads": 267454, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:01-ai/Yi-Coder-1.5B-Chat", + "base_model:quantized:01-ai/Yi-Coder-1.5B-Chat", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Yi-Coder-1.5B-Chat-GGUF base_model: 01-ai/Yi-Coder-1.5B-Chat inference: false model_creator: 01-ai pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF - Model creator: 01-ai - Original model: 01-ai/Yi-Coder-1.5B-Chat ## Description MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF contains GGUF format model files for 01-ai/Yi-Coder-1.5B-Chat. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. 
* KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "Generates text in GGUF format with various quantization levels (2-bit to 8-bit) for efficient deployment." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_Yi-Coder-9B-Chat-GGUF.json b/data/model_data_json/MaziyarPanahi_Yi-Coder-9B-Chat-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..5608f1223195182a7ec2b3bb29e73e37c968cb3a --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_Yi-Coder-9B-Chat-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/Yi-Coder-9B-Chat-GGUF", + "downloads": 242344, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:01-ai/Yi-Coder-9B-Chat", + "base_model:quantized:01-ai/Yi-Coder-9B-Chat", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: Yi-Coder-9B-Chat-GGUF base_model: 01-ai/Yi-Coder-9B-Chat inference: false model_creator: 01-ai pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/Yi-Coder-9B-Chat-GGUF - Model creator: 01-ai - Original model: 01-ai/Yi-Coder-9B-Chat ## Description MaziyarPanahi/Yi-Coder-9B-Chat-GGUF contains GGUF format model files for 01-ai/Yi-Coder-9B-Chat. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. 
* Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of the Yi-Coder-9B-Chat model designed for text generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_firefunction-v2-GGUF.json b/data/model_data_json/MaziyarPanahi_firefunction-v2-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..c3433b82663b15d76e3b1a1956cd313a75e6489b --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_firefunction-v2-GGUF.json @@ -0,0 +1,28 @@ +{ + "model_id": "MaziyarPanahi/firefunction-v2-GGUF", + "downloads": 264038, + "tags": [ + "transformers", + "gguf", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "safetensors", + "text-generation", + "conversational", + "function-calling", + "text-generation-inference", + "region:us", + "base_model:fireworks-ai/llama-3-firefunction-v2", + "base_model:quantized:fireworks-ai/llama-3-firefunction-v2", + "license:llama3", + "imatrix" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - transformers - safetensors - text-generation - conversational - function-calling - text-generation-inference - region:us - text-generation model_name: MaziyarPanahi/firefunction-v2-GGUF base_model: fireworks-ai/firefunction-v2 inference: false model_creator: fireworks-ai pipeline_tag: text-generation quantized_by: MaziyarPanahi license: llama3 --- # MaziyarPanahi/firefunction-v2-GGUF - Model creator: fireworks-ai - Original model: fireworks-ai/firefunction-v2 ## Description MaziyarPanahi/firefunction-v2-GGUF contains GGUF format model files for fireworks-ai/firefunction-v2. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. 
* candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible. Original README --- # FireFunction V2: Fireworks Function Calling Model **Try on Fireworks** | **API Docs** | **Demo App** | **Discord** \"firefunction\" FireFunction is a state-of-the-art function calling model with a commercially viable license. View detailed info in our announcement blog. Key info and highlights: **Comparison with other models:** - Competitive with GPT-4o at function-calling, scoring 0.81 vs 0.80 on a medley of public evaluations - Trained on Llama 3 and retains Llama 3’s conversation and instruction-following capabilities, scoring 0.84 vs Llama 3’s 0.89 on MT bench - Significant quality improvements over FireFunction v1 across the broad range of metrics **General info:** 🐾 Successor of the FireFunction model 🔆 Support of parallel function calling (unlike FireFunction v1) and good instruction following 💡 Hosted on the Fireworks platform at < 10% of the cost of GPT 4o and 2x the speed", + "model_explanation_gemini": "Quantized GGUF version of FireFunction V2, a high-performance function-calling model with conversational abilities, optimized for cost-effective deployment." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_gemma-2-2b-it-GGUF.json b/data/model_data_json/MaziyarPanahi_gemma-2-2b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..3c275de125c48af6af5da599847e31060b45829c --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_gemma-2-2b-it-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "MaziyarPanahi/gemma-2-2b-it-GGUF", + "downloads": 264062, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:google/gemma-2-2b-it", + "base_model:quantized:google/gemma-2-2b-it", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: gemma-2-2b-it-GGUF base_model: google/gemma-2-2b-it inference: false model_creator: google pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/gemma-2-2b-it-GGUF - Model creator: google - Original model: google/gemma-2-2b-it ## Description MaziyarPanahi/gemma-2-2b-it-GGUF contains GGUF format model files for google/gemma-2-2b-it. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. 
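Since FireFunction V2 is served behind OpenAI-compatible endpoints, a hedged sketch of a function-calling request (endpoint URL, model id, and tool schema are all placeholders of mine, not Fireworks' documented values) could look like:

```python
# Illustrative only: function calling against an OpenAI-compatible server
# hosting FireFunction V2. base_url, model, and tool names are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="firefunction-v2",  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```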
* KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Google's Gemma 2.2B model optimized for efficient text generation across various bit levels." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_gemma-3-12b-it-GGUF.json b/data/model_data_json/MaziyarPanahi_gemma-3-12b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..524a6e9c20fac00ba475c37352134124df558a49 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_gemma-3-12b-it-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/gemma-3-12b-it-GGUF", + "downloads": 239117, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:google/gemma-3-12b-it", + "base_model:quantized:google/gemma-3-12b-it", + "region:us", + "conversational" + ], + "description": "--- base_model: google/gemma-3-12b-it inference: false model_creator: google model_name: gemma-3-12b-it-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/gemma-3-12b-it-GGUF - Model creator: google - Original model: google/gemma-3-12b-it ## Description MaziyarPanahi/gemma-3-12b-it-GGUF contains GGUF format model files for google/gemma-3-12b-it. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. 
* Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Google's Gemma 3 12B text-generation model optimized for efficient inference across various bit levels." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_gemma-3-1b-it-GGUF.json b/data/model_data_json/MaziyarPanahi_gemma-3-1b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..c1963e2b5cecd4d47d73f90f69a2c038682c007e --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_gemma-3-1b-it-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/gemma-3-1b-it-GGUF", + "downloads": 274496, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:google/gemma-3-1b-it", + "base_model:quantized:google/gemma-3-1b-it", + "region:us", + "conversational" + ], + "description": "--- base_model: google/gemma-3-1b-it inference: false model_creator: google model_name: gemma-3-1b-it-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/gemma-3-1b-it-GGUF - Model creator: google - Original model: google/gemma-3-1b-it ## Description MaziyarPanahi/gemma-3-1b-it-GGUF contains GGUF format model files for google/gemma-3-1b-it. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models.
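These cards repeatedly point to llama-cpp-python as the standard Python route for running GGUF files. As a minimal sketch (not part of the original card), assuming llama-cpp-python is installed and one of the listed quantizations has already been downloaded; the local filename and generation settings below are illustrative:

```python
# Load a local GGUF quantization with llama-cpp-python (hypothetical filename).
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-3-1b-it.Q4_K_M.gguf",  # substitute the quant you downloaded
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers when a GPU-enabled build is installed
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what the GGUF format is."}],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```

On CPU-only builds the n_gpu_layers argument has no effect, so the same sketch still runs, just more slowly.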
## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized version of Google's Gemma 3 1B instruction-tuned model in GGUF format for efficient text generation." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_gemma-3-27b-it-GGUF.json b/data/model_data_json/MaziyarPanahi_gemma-3-27b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..6f48a132ac440b95b52f0ca1fe42d11d806df465 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_gemma-3-27b-it-GGUF.json @@ -0,0 +1,22 @@ +{ + "model_id": "MaziyarPanahi/gemma-3-27b-it-GGUF", + "downloads": 235965, + "tags": [ + "gguf", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:google/gemma-3-27b-it", + "base_model:quantized:google/gemma-3-27b-it", + "region:us", + "conversational" + ], + "description": "--- base_model: google/gemma-3-27b-it inference: false model_creator: google model_name: gemma-3-27b-it-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/gemma-3-27b-it-GGUF - Model creator: google - Original model: google/gemma-3-27b-it ## Description MaziyarPanahi/gemma-3-27b-it-GGUF contains GGUF format model files for google/gemma-3-27b-it. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized version of Google's Gemma 3 27B model in GGUF format, optimized for efficient text generation across various bit levels."
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_gemma-3-4b-it-GGUF.json b/data/model_data_json/MaziyarPanahi_gemma-3-4b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..54e74b065555bcb65aa8560a93dc30e369ff04a7 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_gemma-3-4b-it-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/gemma-3-4b-it-GGUF", + "downloads": 246821, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:google/gemma-3-4b-it", + "base_model:quantized:google/gemma-3-4b-it", + "region:us", + "conversational" + ], + "description": "--- base_model: google/gemma-3-4b-it inference: false model_creator: google model_name: gemma-3-4b-it-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/gemma-3-4b-it-GGUF - Model creator: google - Original model: google/gemma-3-4b-it ## Description MaziyarPanahi/gemma-3-4b-it-GGUF contains GGUF format model files for google/gemma-3-4b-it. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of Google's Gemma 3 4B instruction-tuned model optimized for efficient text generation across various bit levels."
+} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_mathstral-7B-v0.1-GGUF.json b/data/model_data_json/MaziyarPanahi_mathstral-7B-v0.1-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..0758e8caa247b68667035f61d4357ad18a4e0501 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_mathstral-7B-v0.1-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/mathstral-7B-v0.1-GGUF", + "downloads": 261536, + "tags": [ + "transformers", + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:mistralai/Mathstral-7B-v0.1", + "base_model:quantized:mistralai/Mathstral-7B-v0.1", + "region:us" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: mathstral-7B-v0.1-GGUF base_model: mistralai/mathstral-7B-v0.1 inference: false model_creator: mistralai pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # MaziyarPanahi/mathstral-7B-v0.1-GGUF - Model creator: mistralai - Original model: mistralai/mathstral-7B-v0.1 ## Description MaziyarPanahi/mathstral-7B-v0.1-GGUF contains GGUF format model files for mistralai/mathstral-7B-v0.1. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible. --- **Original README** # Model Card for Mathstral-7B-v0.1 Mathstral 7B is a model specializing in mathematical and scientific tasks, based on Mistral 7B. You can read more in the official blog post. ## Installation It is recommended to use with mistral-inference ## Download ### Chat After installing , a CLI command should be available in your environment. You can then start chatting with the model, *e.g.* prompt it with something like: *\"Albert likes to surf every week. Each surfing session lasts for 4 hours and costs $20 per hour. 
How much would Albert spend in 5 weeks?\"* ## Evaluation We evaluate Mathstral 7B and open-weight models of the similar size on industry-standard benchmarks. | Benchmarks | MATH | GSM8K (8-shot) | Odyssey Math maj@16 | GRE Math maj@16 | AMC 2023 maj@16 | AIME 2024 maj@16 | :--- | :---: | :---: | :---: | :---: | :---: | :---: | | Mathstral 7B | **56.6** | 77.1 | **37.2** | 56.9 | **42.4** | **2/30** | | DeepSeek Math 7B | 44.4 | **80.6** | 27.6 | 44.6 | 28.0 | 0/30 | | Llama3 8B | 28.4 | 75.4 | 24.0 | 26.2 | 34.4 | 0/30 | | GLM4 9B | 50.2 | 48.8 | 18.9 | 46.2 | 36.0 | 1/30 | | QWen2 7B | **56.8** | 32.7 | 24.8 | **58.5** | 35.2 | **2/30** | | Gemma2 9B | 48.3 | 69.5 | 18.6 | 52.3 | 31.2 | 1/30 | ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall", + "model_explanation_gemini": "A quantized version of the Mathstral-7B model specialized in mathematical and scientific text generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_mistral-small-3.1-24b-instruct-2503-hf-GGUF.json b/data/model_data_json/MaziyarPanahi_mistral-small-3.1-24b-instruct-2503-hf-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..ec9ed10f0eb301d9ef3dd035a0ea5eb16b176610 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_mistral-small-3.1-24b-instruct-2503-hf-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/mistral-small-3.1-24b-instruct-2503-hf-GGUF", + "downloads": 228531, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:mrfakename/mistral-small-3.1-24b-instruct-2503-hf", + "base_model:quantized:mrfakename/mistral-small-3.1-24b-instruct-2503-hf", + "region:us", + "conversational" + ], + "description": "--- base_model: mrfakename/mistral-small-3.1-24b-instruct-2503-hf inference: false model_creator: mrfakename model_name: mistral-small-3.1-24b-instruct-2503-hf-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/mistral-small-3.1-24b-instruct-2503-hf-GGUF - Model creator: mrfakename - Original model: mrfakename/mistral-small-3.1-24b-instruct-2503-hf ## Description MaziyarPanahi/mistral-small-3.1-24b-instruct-2503-hf-GGUF contains GGUF format model files for mrfakename/mistral-small-3.1-24b-instruct-2503-hf. 
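Each of these repositories publishes several quantization levels (2-bit through 8-bit), so a specific file has to be selected at download time. A hedged sketch with huggingface_hub; the .gguf filename below is hypothetical, which is why the repo files are listed first:

```python
# List available quantizations in a GGUF repo, then download one of them.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "MaziyarPanahi/mistral-small-3.1-24b-instruct-2503-hf-GGUF"
print(list_repo_files(repo_id))  # inspect which .gguf files actually exist

path = hf_hub_download(
    repo_id=repo_id,
    filename="mistral-small-3.1-24b-instruct-2503-hf.Q4_K_M.gguf",  # hypothetical
)
print("Downloaded to:", path)
```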
### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized GGUF version of the Mistral model optimized for text generation tasks with various bit-level precision options." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_phi-4-GGUF.json b/data/model_data_json/MaziyarPanahi_phi-4-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..8ef61230ee05b3e0fd38fb631206e5e247678ffc --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_phi-4-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/phi-4-GGUF", + "downloads": 234692, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:microsoft/phi-4", + "base_model:quantized:microsoft/phi-4", + "region:us", + "conversational" + ], + "description": "--- base_model: microsoft/phi-4 inference: false model_creator: microsoft model_name: phi-4-GGUF pipeline_tag: text-generation quantized_by: MaziyarPanahi tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation --- # MaziyarPanahi/phi-4-GGUF - Model creator: microsoft - Original model: microsoft/phi-4 ## Description MaziyarPanahi/phi-4-GGUF contains GGUF format model files for microsoft/phi-4. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. 
* text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "A quantized version of Microsoft's phi-4 model in GGUF format for efficient text generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/MaziyarPanahi_solar-pro-preview-instruct-GGUF.json b/data/model_data_json/MaziyarPanahi_solar-pro-preview-instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..6be3e1928a8340f2d42cfaf1b57946558471e544 --- /dev/null +++ b/data/model_data_json/MaziyarPanahi_solar-pro-preview-instruct-GGUF.json @@ -0,0 +1,23 @@ +{ + "model_id": "MaziyarPanahi/solar-pro-preview-instruct-GGUF", + "downloads": 263659, + "tags": [ + "gguf", + "mistral", + "quantized", + "2-bit", + "3-bit", + "4-bit", + "5-bit", + "6-bit", + "8-bit", + "GGUF", + "text-generation", + "base_model:upstage/solar-pro-preview-instruct", + "base_model:quantized:upstage/solar-pro-preview-instruct", + "region:us", + "conversational" + ], + "description": "--- tags: - quantized - 2-bit - 3-bit - 4-bit - 5-bit - 6-bit - 8-bit - GGUF - text-generation - text-generation model_name: solar-pro-preview-instruct-GGUF base_model: upstage/solar-pro-preview-instruct inference: false model_creator: upstage pipeline_tag: text-generation quantized_by: MaziyarPanahi --- # arcee-train/solar-pro-preview-instruct-GGUF - Model creator: upstage - Original model: upstage/solar-pro-preview-instruct ## Description arcee-train/solar-pro-preview-instruct-GGUF contains GGUF format model files for upstage/solar-pro-preview-instruct. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. 
* GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.", + "model_explanation_gemini": "Quantized GGUF format model files for text generation based on upstage/solar-pro-preview-instruct." +} \ No newline at end of file diff --git a/data/model_data_json/MilaNLProc_feel-it-italian-emotion.json b/data/model_data_json/MilaNLProc_feel-it-italian-emotion.json new file mode 100644 index 0000000000000000000000000000000000000000..effd4bd90f22d755dc6268ac68eaf1b75d872934 --- /dev/null +++ b/data/model_data_json/MilaNLProc_feel-it-italian-emotion.json @@ -0,0 +1,20 @@ +{ + "model_id": "MilaNLProc/feel-it-italian-emotion", + "downloads": 176443, + "tags": [ + "transformers", + "pytorch", + "tf", + "camembert", + "text-classification", + "sentiment", + "emotion", + "Italian", + "it", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: it tags: - sentiment - emotion - Italian --- # FEEL-IT: Emotion and Sentiment Classification for the Italian Language ## FEEL-IT Python Package You can find the package that uses this model for emotion and sentiment classification **here** it is meant to be a very simple interface over HuggingFace models. ## License Users should refer to the following license ## Abstract Sentiment analysis is a common task to understand people's reactions online. Still, we often need more nuanced information: is the post negative because the user is angry or because they are sad? An abundance of approaches has been introduced for tackling both tasks. However, at least for Italian, they all treat only one of the tasks at a time. We introduce *FEEL-IT*, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: **anger, fear, joy, sadness**. By collapsing them, we can also do **sentiment analysis**. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results. We release an open-source Python library, so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text. | Model | Download | | ------ | -------------------------| | | Link | | | Link | ## Model The *feel-it-italian-emotion* model performs **emotion classification (joy, fear, anger, sadness)** on Italian. We fine-tuned the UmBERTo model on our new dataset (i.e., FEEL-IT) obtaining state-of-the-art performances on different benchmark corpora. ## Data Our data has been collected by annotating tweets from a broad range of topics. In total, we have 2037 tweets annotated with an emotion label. More details can be found in our paper ( ## Performance We evaluate our performance using MultiEmotions-It. 
This dataset differs from FEEL-IT both in terms of topic variety and considered social media (i.e., YouTube and Facebook). We considered only the subset of emotions present in FEEL-IT. To give a point of reference, we also show the Most Frequent Class (MFC) baseline results. The results show that training on FEEL-IT brings stable performance even on datasets from different contexts. | Training Dataset | Macro-F1 | Accuracy | ------ | ------ |------ | | MFC | 0.20 | 0.64 | | FEEL-IT | **0.57** | **0.73** | ## Usage ## Citation Please use the following bibtex entry if you use this model in your project:", + "model_explanation_gemini": "Classifies Italian text into emotions (anger, fear, joy, sadness) and performs sentiment analysis." +} \ No newline at end of file diff --git a/data/model_data_json/MilaNLProc_feel-it-italian-sentiment.json b/data/model_data_json/MilaNLProc_feel-it-italian-sentiment.json new file mode 100644 index 0000000000000000000000000000000000000000..285cbab04f55a9f9e5461679a9f926dedb114435 --- /dev/null +++ b/data/model_data_json/MilaNLProc_feel-it-italian-sentiment.json @@ -0,0 +1,19 @@ +{ + "model_id": "MilaNLProc/feel-it-italian-sentiment", + "downloads": 97640, + "tags": [ + "transformers", + "pytorch", + "tf", + "camembert", + "text-classification", + "sentiment", + "Italian", + "it", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: it tags: - sentiment - Italian --- # FEEL-IT: Emotion and Sentiment Classification for the Italian Language ## FEEL-IT Python Package You can find the package that uses this model for emotion and sentiment classification **here** it is meant to be a very simple interface over HuggingFace models. ## License Users should refer to the following license ## Abstract Sentiment analysis is a common task to understand people's reactions online. Still, we often need more nuanced information: is the post negative because the user is angry or because they are sad? An abundance of approaches has been introduced for tackling both tasks. However, at least for Italian, they all treat only one of the tasks at a time. We introduce *FEEL-IT*, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: **anger, fear, joy, sadness**. By collapsing them, we can also do **sentiment analysis**. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results. We release an open-source Python library, so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text. | Model | Download | | ------ | -------------------------| | | Link | | | Link | ## Model The *feel-it-italian-sentiment* model performs **sentiment analysis** on Italian. We fine-tuned the UmBERTo model on our new dataset (i.e., FEEL-IT) obtaining state-of-the-art performances on different benchmark corpora. ## Data Our data has been collected by annotating tweets from a broad range of topics. In total, we have 2037 tweets annotated with an emotion label. More details can be found in our paper ( ## Performance We evaluate our performance using SENTIPOLC16 Evalita. We collapsed the FEEL-IT classes into 2 by mapping joy to the *positive* class and anger, fear and sadness into the *negative* class. We compare three different experimental configurations training on FEEL-IT, SENTIPOLC16, or both by testing on the SENTIPOLC16 test set. 
The results show that training on FEEL-IT can provide better results on the SENTIPOLC16 test set than those that can be obtained with the SENTIPOLC16 training set. | Training Dataset | Macro-F1 | Accuracy | ------ | ------ |------ | | SENTIPOLC16 | 0.80 | 0.81 | | FEEL-IT | **0.81** | **0.84** | | FEEL-IT+SentiPolc | 0.81 | 0.82 ## Usage ## Citation Please use the following bibtex entry if you use this model in your project:", + "model_explanation_gemini": "Classifies sentiment (positive/negative) and emotions (anger, fear, joy, sadness) in Italian text." +} \ No newline at end of file diff --git a/data/model_data_json/MilaNLProc_xlm-emo-t.json b/data/model_data_json/MilaNLProc_xlm-emo-t.json new file mode 100644 index 0000000000000000000000000000000000000000..57a9afaee79f45466d9001ba55fbdb9b0c40a448 --- /dev/null +++ b/data/model_data_json/MilaNLProc_xlm-emo-t.json @@ -0,0 +1,18 @@ +{ + "model_id": "MilaNLProc/xlm-emo-t", + "downloads": 627921, + "tags": [ + "transformers", + "pytorch", + "xlm-roberta", + "text-classification", + "emotion", + "emotion-analysis", + "multilingual", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: multilingual tags: - emotion - emotion-analysis - multilingual widget: - text: \"Guarda! ci sono dei bellissimi capibara!\" example_title: \"Emotion Classification 1\" - text: \"Sei una testa di cazzo!!\" example_title: \"Emotion Classification 2\" - text: \"Quelle bonne nouvelle!\" example_title: \"Emotion Classification 3\" arxiv: \"\" --- # Federico Bianchi • Debora Nozza • Dirk Hovy ## Abstract Detecting emotion in text allows social and computational scientists to study how people behave and react to online events. However, developing these tools for different languages requires data that is not always available. This paper collects the available emotion detection datasets across 19 languages. We train a multilingual emotion prediction model for social media data, XLM-EMO. The model shows competitive performance in a zero-shot setting, suggesting it is helpful in the context of low-resource languages. We release our model to the community so that interested researchers can directly use it. ## Model This model is the fine-tuned version of the XLM-T model. ### Intended Use The model is intended as a research output for research communities. #### Primary intended uses The primary intended users of these models are AI researchers. ## Results This model had an F1 of 0.85 on the test set. ## License For models, restrictions may apply to the data (which are derived from existing datasets) or Twitter (main data source). We refer users to the original licenses accompanying each dataset and Twitter regulations. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ## Citation Please use the following BibTeX entry if you use this model in your project:", + "model_explanation_gemini": "Detects emotions in multilingual social media text, trained for research purposes with competitive zero-shot performance across 19 languages." 
+} \ No newline at end of file diff --git a/data/model_data_json/MingZhong_unieval-sum.json b/data/model_data_json/MingZhong_unieval-sum.json new file mode 100644 index 0000000000000000000000000000000000000000..730234013148f520d7c6167149bf9147e205cd35 --- /dev/null +++ b/data/model_data_json/MingZhong_unieval-sum.json @@ -0,0 +1,17 @@ +{ + "model_id": "MingZhong/unieval-sum", + "downloads": 180288, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "arxiv:2210.07197", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "Pre-trained evaluator in EMNLP 2022 paper *Towards a Unified Multi-Dimensional Evaluator for Text Generation* ## Introduction **Multi-dimensional evaluation** is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics (e.g., ROUGE, BLEU), but they are not sufficient to portray the difference between the advanced generation models. Therefore, we propose **UniEval** to bridge this gap so that a more comprehensive and fine-grained evaluation of NLG systems can be achieved. ## Pre-trained Evaluator **unieval-sum** is the pre-trained evaluator for the text summarization task. It can evaluate the model output from four dimensions: - *coherence* - *consistency* - *fluency* - *relevance* It can also be transferred to the new dimensions and generation tasks, such as *naturalness* and *informativeness* for data-to-text. ## Usage Please refer to our GitHub repository.", + "model_explanation_gemini": "Evaluates text summarization outputs across multiple dimensions (coherence, consistency, fluency, relevance) for comprehensive NLG assessment." 
+} \ No newline at end of file diff --git a/data/model_data_json/MoritzLaurer_DeBERTa-v3-base-mnli-fever-anli.json b/data/model_data_json/MoritzLaurer_DeBERTa-v3-base-mnli-fever-anli.json new file mode 100644 index 0000000000000000000000000000000000000000..7b007b676b5d41f99c1be506cd350c4a9c14e6ce --- /dev/null +++ b/data/model_data_json/MoritzLaurer_DeBERTa-v3-base-mnli-fever-anli.json @@ -0,0 +1,24 @@ +{ + "model_id": "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli", + "downloads": 579649, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "deberta-v2", + "text-classification", + "zero-shot-classification", + "en", + "dataset:multi_nli", + "dataset:facebook/anli", + "dataset:fever", + "arxiv:2006.03654", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: mit tags: - text-classification - zero-shot-classification datasets: - multi_nli - facebook/anli - fever metrics: - accuracy pipeline_tag: zero-shot-classification model-index: - name: MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli results: - task: type: natural-language-inference name: Natural Language Inference dataset: name: anli type: anli config: plain_text split: test_r3 metrics: - type: accuracy value: 0.495 name: Accuracy verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYWViYjQ5YTZlYjU4NjQyN2NhOTVhNjFjNGQyMmFiNmQyZjRkOTdhNzJmNjc3NGU4MmY0MjYyMzY5MjZhYzE0YiIsInZlcnNpb24iOjF9.S8pIQ7gEGokd_wKXMi6Bc3B2DThIP3cvVkTFErZ-2JxXTSCy1TBuulY3dzGfaiP7kTHbL52OuBhG_-wb7Ue9DQ - type: precision value: 0.4984740618243923 name: Precision Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTllZDU3NmVmYjk4ZmYzNjAwNzExMGZjNDMzOWRkZjRjMTRhNzhlZmI0ZmNlM2E0Mzk4OWE5NTM5MTYyYWU5NCIsInZlcnNpb24iOjF9.WHz_TUJgPVn-rU-9vBCDdmSMOuWzADwr09rJY6ktqRM46zytbyWs7Vcm7jqDrTkfU-rp0_7IyoNv_xEsKhJbBA - type: precision value: 0.495 name: Precision Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjllODE3ZjUxZDhiMTI0MzZmYjY5OTUwYWI2OTc4ZjJhNTVjMjY2ODdkMmJlZjQ5YWQ1Mjk2ZThmYjJlM2RlYSIsInZlcnNpb24iOjF9.a9V06-O7l9S0Bv4vj0aard8128SAP61DZdXl_3XqdmNgt_C6KAoDBVueF2M2kF_kT6lRfEz6YW0ACIfJNXDYAA - type: precision value: 0.4984357572868885 name: Precision Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjhiMzYzY2JiMmYwN2YxYzEwZTQ3NGI1NzFmMzliNjJkMDE2YzI5Njg1ZjEzMGIxODdiMDNmYmI4Y2Y2MmJkMiIsInZlcnNpb24iOjF9.xvZZaUMogw9MJjb3ls6h5liDlTqHMmNgqk6KbyDqQWfCcD255brCU3Xo6nECwaChS4te0dQu_iWGBqR_o2kYAA - type: recall value: 0.49461028192371476 name: Recall Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDVjYTEzOTI0ZjVhOTk3ZTkzZmZhNTk5ODcxMWJhYWU4ZTRjYWVhNzcwOWY5YmI2NGFlYWE4NjM5MDY5NTExOSIsInZlcnNpb24iOjF9.xgHCB2rbCQBzHzUokw4u8JyOdhtF4yvPv1t8t7YiEkaAuM5MAPsVuCZ1VtlLapHS_IWetlocizsVl6akjh3cAQ - type: recall value: 0.495 name: Recall Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTEyYmM0ZDQ0M2RiMDNhNjIxNzQ4OWZiNTBiOTAwZDFkNjNmYjBhNjA4NmQ0NjFkNmNiZTljNDkxNDg3NzIyYSIsInZlcnNpb24iOjF9.3FJPwNtwgFNvMjVxVAayaVXXR1sWlr0sqAYmXzmMzMxl7IJh6RS77dGPwFaqD3jamLVBiqPn9wsfz5lFK5yTAA - type: recall value: 0.495 name: Recall Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmY1MjZlZTQ4OTg5YzdlYmFhZDMzMmNlNjNkYmIyZGI4M2NjZjQ1ZDVkNmZkMTUxNjI3M2UwZmI1MDM1NDYwOSIsInZlcnNpb24iOjF9.cnbM6xjTLRa9z0wEDGd_Q4lTXVLRKIQ6_YLGLjf-t7Nto4lzxAeWF-RrwA0Mq9OPITlJq2Jk1Eg_0Utb13d9Dg - type: f1 value: 
0.4942810999491704 name: F1 Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2U3NGM1MDM4YTM4NzQxMGM4ZTIyZDM2YTQ1MGNlZWM1MzEzM2MxN2ZmZmRmYTM0OWJmZGJjYjM5OWEzMmZjNSIsInZlcnNpb24iOjF9.vMtge1F-tmMn9D3aVUuwcNEXjqpNgEyHAl9f5UDSoTYcOgTwi2vi5yRGRCl8y6Fx7BtgaCwMyoZVNbP5-GRtCA - type: f1 value: 0.495 name: F1 Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjBjMTQ5MmQ5OGE5OWJjZGMyNzg4N2RmNDUzMzQ5Zjc4ZTc4N2JlMTk0MTc2M2RjZTgzOTNlYWQzODAwNDI0NCIsInZlcnNpb24iOjF9.yxXG0CNWW8__xJC14BjbTY9QkXD75x6uCIXR51oKDemkP0b_xGyd-A2wPIuwNJN1EYkQevPY0bhVpRWBKyO9Bg - type: f1 value: 0.4944671868893595 name: F1 Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzczNjQzY2FmMmY4NTAwYjNkYjJlN2I2NjI2Yjc0ZmQ3NjZiN2U5YWEwYjk4OTUyOTMzZTYyZjYzOTMzZGU2YiIsInZlcnNpb24iOjF9.mLOnst2ScPX7ZQwaUF12W2nv7-w9lX9-BxHl3-0T0gkSWnmtBSwYcL5faTX0_I5q33Fjz5tfkjpCJuxP5JYIBQ - type: loss value: 1.8788293600082397 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzRlOTYwYjU1Y2Y4ZGM0NDBjYTE2MmEzNWIwN2NiMWVkOWZlNzA2ZmQ3YjZjNzI4MjQwYWZhODIwMzU3ODAyZiIsInZlcnNpb24iOjF9._Xs9bl48MSavvp5eyamrP2iNlFWv35QZCrmWjJXLkUdIBx0ElCjEdxBb3dxPGnUxdpDzGMmOoKCPI44ZPXrtDw - task: type: natural-language-inference name: Natural Language Inference dataset: name: anli type: anli config: plain_text split: test_r1 metrics: - type: accuracy value: 0.712 name: Accuracy verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYWYxMGY0ZWU0YTEyY2I3NmQwZmQ3YmFmNzQxNGU5OGNjN2ViN2I0ZjdkYWUzM2RmYzkzMDg3ZjVmNGYwNGZkZCIsInZlcnNpb24iOjF9.snWBusAeo1rrQqWk--vTxb-CBcFqM298YCtwTQGBZiFegKGSTSKzj-SM6HMNsmoQWmMuv7UfYPqYlnzEthOSAg - type: precision value: 0.7134839439315348 name: Precision Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjMxMjg1Y2QwNzMwM2ZkNGM3ZTJhOGJmY2FkNGI1ZTFhOGQ3ODViNTJmZTYwMWJkZDYyYWRjMzFmZDI1NTM5YSIsInZlcnNpb24iOjF9.ZJnY6zYOBn-YEtN7uKzQ-VKXPwlIO1zq19Yuo37vBJNSs1dGDd8f1jgfdZuA19e_wA3Nc5nQKe9VXRwPHPgwAQ - type: precision value: 0.712 name: Precision Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWM4YWQyODBlYTIwMWQxZDA1NmY1M2M2ODgwNDJiY2RhMDVhYTlkMDUzZTJkMThkYzRmNDg2YTdjMjczNGUwOCIsInZlcnNpb24iOjF9.SogsKHdbdlEs05IBYwXvlnaC_esg-DXAPc2KPRyHaVC5ItVHbxa63NpybSpao4baOoMlLG9aRe7TjG4gtB2dAQ - type: precision value: 0.7134676028447461 name: Precision Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODdjMzFkM2IwNWZiM2I4ZWViMmQ4NWM5MDY5ZWQxZjc1MGRmNjhmNzJhYWFmOWEwMjg3ZjhiZWM3YjlhOTIxNSIsInZlcnNpb24iOjF9._0JNIbiqLuDZrp_vrCljBe28xexZJPmigLyhkcO8AtH2VcNxWshwCpZuRF4bqvpMvnApJeuGMf3vXjCj0MC1Bw - type: recall value: 0.7119814425203647 name: Recall Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjU4MWEyMzkyYzg1ZTIxMTc0M2NhMTgzOGEyZmY5OTg3M2Q1ZmMwNmU3ZmU1ZjA1MDk0OGZkMzM5NDVlZjBlNSIsInZlcnNpb24iOjF9.sZ3GTcmGGthpTLL7_Zovq8aBmE3Dp_PZi5v8ZI9yG9N6B_GjWvBuPC8ENXK1NwmwiHLsSvtKTG5JmAum-su0Dg - type: recall value: 0.712 name: Recall Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDg3NGViZTlmMWM2ZDNhMzIzZGZkYWZhODQxNzg2MjNiNjQ0Zjg0NjQ1OWZkY2I5ODdiY2Y3Y2JjNzRmYjJkMiIsInZlcnNpb24iOjF9.bCZUzJamsozKWehnNph6E5coww5zZTrJdbWevWrSyfT0PyXc_wkZ-NKdyBAoqprBz3_8L3i5hPM6Qsy56b4BDA - type: recall value: 0.712 name: Recall Weighted verified: true verifyToken: 
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDk1MDJiOGUzZThlZjJjMzY4NjMzODFiZjUzZmIwMjIxY2UwNzBiN2IxMWEwMGJjZTkxODA0YzUxZDE3ODRhOCIsInZlcnNpb24iOjF9.z0dqvB3aBVYt3xRIb_M4svWebfQc0QaDFVFzHnlA5QGEHkHOW3OecGhHE4EzBqTDI3DASWZTGMjrMDDt0uOMBw - type: f1 value: 0.7119226991285647 name: F1 Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2U0YjMwNzhmOTEyNDZhODU3MTU0YTM4MmQ0NzEzNWI1YjY0ZWQ3MWRiMTdiNTUzNWRkZThjMWE4M2NkZmI0MiIsInZlcnNpb24iOjF9.hhj1BXkuWi9wXrCjT9NwqaPETtOoYNiyqYsJEw-ufA8A4hVThKA6ZBtma1Q_M65-DZFfPEBDBNASLZ7EPSbmDw - type: f1 value: 0.712 name: F1 Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODk0Y2EyMzc5M2ZlNWFlNDg2Zjc1OTQxNGY3YjA5YjUxYTYzZjRlZmU4ODYxNjA3ZjkxNGUzYjBmNmMxMzY5YiIsInZlcnNpb24iOjF9.DvKk-3hNh2LhN2ug5e0FgUntL3Ozdfl06Kz7jvmB-deOJH6INi2a2ZySXoEePoo8t2nR6ENFYu9QjMA2ojnpCA - type: f1 value: 0.7119242267218338 name: F1 Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2MxOWFlMmI2NGRiMjkwN2Q5MWZhNDFlYzQxNWNmNzQ3OWYxZThmNDU2OWU1MTE5OGY2MWRlYWUyNDM3OTkzZCIsInZlcnNpb24iOjF9.QrTD1gE8_wRok9u59W-Mx0cX89K-h2Ad6qa8J5rmP8lc_rkG0ft2n5_GqH1CBZBJwMFYv91Pn6TuE3eGxJuUDA - type: loss value: 1.0105403661727905 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMmUwMTg4NjM3ZTBiZTIyODcyNDNmNTE5ZDZhMzNkMDMyNjcwOGQ5NmY0NTlhMjgyNmIzZjRiNDFiNjA3M2RkZSIsInZlcnNpb24iOjF9.sjBDVJV-jnygwcppmByAXpoo-Wzz178bBzozJEuYEiJaHSbk_xEevfJS1PmLUuplYslKb1iyEctnjI-5bl-XDw - task: type: natural-language-inference name: Natural Language Inference dataset: name: multi_nli type: multi_nli config: default split: validation_mismatched metrics: - type: accuracy value: 0.902766476810415 name: Accuracy verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjExZWM3YzA3ZDNlNjEwMmViNWEwZTE3MjJjNjEyNDhjOTQxNGFmMzBjZTk0ODUwYTc2OGNiZjYyMTBmNWZjZSIsInZlcnNpb24iOjF9.zbFAGrv2flpmweqS7Poxib7qHFLdW8eUTzshdOm2B9H-KWpIZCWC-P4p8TLMdNJnUcZJZ03Okil4qjIMqqIRCA - type: precision value: 0.9023816542652491 name: Precision Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2U2MGViNmJjNWQxNzRjOTkxNDIxZjZjNmM5YzE4ZjU5NTE5NjFlNmEzZWRlOGYxN2E3NTAwMTEwYjNhNzE0YSIsInZlcnNpb24iOjF9.WJjDJf56FROvf7Y5ShWnnxMvK_ZpQ2PibAOtSFhSiYJ7bt4TGOzMwaZ5RSTf_mcfXgRfWbXmy1jCwNhDb-5EAw - type: precision value: 0.902766476810415 name: Precision Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzRhZTExOTc5NDczZjI1YmMzOGYyOTU2MDU1OGE5ZTczMDE0MmU0NzZhY2YzMDI1ZGQ3MGM5MmJiODFkNzUzZiIsInZlcnNpb24iOjF9.aRYcGEI1Y8-a0d8XOoXhBgsFyj9LWNwEjoIPc594y7kJn91wXIsXoR0-_0iy3uz41mWaTTlwJx7lI-kipFDvDQ - type: precision value: 0.9034597464719761 name: Precision Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWQyMTZiZDA2OTUwZjRmNTFiMWRlZTNmOTliZmI2MWFmMjdjYzEyYTgwNzkyOTQzOTBmNTUyYjMwNTUxMTFkNiIsInZlcnNpb24iOjF9.hUtAMTl0THHUkaLcgk1Vy9IhjqJAXCJ_5STJ5A7k7s_SO9DHp3b6qusgwPmcGLYyPy1-j1dB2AIstxK4tHfmDA - type: recall value: 0.9024304801555488 name: Recall Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzAxZGJhNGI3ZDNlMjg2ZDIxNTgwMDY5MTFjM2ExZmIxMDBmZjUyNTliNWNkOGI0OTY3NTYyNWU3OWFlYTA3YiIsInZlcnNpb24iOjF9.1o_GNq8zmXa_50MUF_K63IDc2aUKNeUkNQ5fT592-SAo8WgiaP9Dh6bOEu2OqrpRQ57P4qm7OdJt7UKsrosMDA - type: recall value: 0.902766476810415 name: Recall Micro verified: true verifyToken: 
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjhiMWE4Yjk0ODFkZjlkYjRlMjU1OTJmMjA2Njg1N2M4MzQ0OWE3N2FlYjY4NDgxZThjMmExYWQ5OGNmYmI1NSIsInZlcnNpb24iOjF9.Gmm5lf_qpxjXWWrycDze7LHR-6WGQc62WZTmcoc5uxWd0tivEUqCAFzFdbEU1jVKxQBIyDX77CPuBm7mUA4sCg - type: recall value: 0.902766476810415 name: Recall Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiY2EzZWYwNjNkYWE1YTcyZGZjNTNhMmNlNzgzYjk5MGJjOWJmZmE5NmYwM2U2NTA5ZDY3ZjFiMmRmZmQwY2QwYiIsInZlcnNpb24iOjF9.yA68rslg3e9kUR3rFTNJJTAad6Usr4uFmJvE_a7G2IvSKqLxG_pqsHszsWfg5mFBQLjWEAyCtdQYMdVayuYMBA - type: f1 value: 0.9023086094638595 name: F1 Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzMyMzZhNjI5MWRmZWJhMjkzN2E0MjM4ZTM5YzZmNTk5YTZmYzU4NDRiYjczZGQ4MDdhNjJiMGU0MjE3NDEwNyIsInZlcnNpb24iOjF9.RCMqH_xUMN97Vos54pTFfAMbLstXUMdFTs-eNaypbDb_Fc-MW8NLmJ6dzJsp9sSvhXyYjugjRMUpMpnQseKXDA - type: f1 value: 0.902766476810415 name: F1 Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTYxZTZhZGM0NThlNTAzNmYwMTA4NDNkN2FiNzhhN2RlYThlYjcxMjE5MjBkMzhiOGYxZGRmMjE0NGM2ZWQ5ZSIsInZlcnNpb24iOjF9.wRfllNw2Gibmi1keU7d_GjkyO0F9HESCgJlJ9PHGZQRRT414nnB-DyRvulHjCNnaNjXqMi0LJimC3iBrNawwAw - type: f1 value: 0.9030161011457231 name: F1 Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDA0YjAxMWU5MjI4MWEzNTNjMzJlNjM3ZDMxOTE0ZTZhYmZlNmUyNDViNTU2NmMyMmM3MjAxZWVjNWJmZjI4MCIsInZlcnNpb24iOjF9.vJ8aUjfTbFMc1BgNUVpoVDuYwQJYQjwZQxblkUdvSoGtkW_AzQJ_KJ8Njc7IBA3ADgj8iZHjRQNIZkFCf-xICw - type: loss value: 0.3283354640007019 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODdmYzYzNTUzZDNmOWIxM2E0ZmUyOWUzM2Y2NGRmZDNiYjg3ZTMzYTUyNzg3OWEzNzYyN2IyNmExOGRlMWUxYSIsInZlcnNpb24iOjF9.Qv0FzFZPkcBs9aHGf4TEREX4jdkc40NazdMlP2M_-w2wHwyjoAjvhk611RLXHcbicozNelZJLnsOMdEMnPLEDg - task: type: natural-language-inference name: Natural Language Inference dataset: name: anli type: anli config: plain_text split: dev_r1 metrics: - type: accuracy value: 0.737 name: Accuracy verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTQ1ZGVkOTVmNTlhYjhkMjVlNTNhMjNmZWFjZWZjZjcxZmRhMDVlOWI0YTdkOTMwYjVjNWFlOGY4OTc1MmRhNiIsInZlcnNpb24iOjF9.wGLgKA1E46ljbLokdPeip_UCr1gqK8iSSbsJKX2vgKuuhDdUWWiECrUFN-bv_78JWKoKW5T0GF_hb-RVDzA0AQ - type: precision value: 0.737681071614645 name: Precision Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYmFkMGUwMjNhN2E3NzMxNTc5NDM0MjY1MGU5ODllM2Q2YzA1MDI3OGI1ZmI4YTcxN2E4ZDk5OWY2OGNiN2I0MCIsInZlcnNpb24iOjF9.6G5qhccjheaNfasgRyrkKBTaQPRzuPMZZ0hrLxTNzAydMDgx09FkFP3hni7WLRMWp0IpwzkEeBlxV-mPyQBtBw - type: precision value: 0.737 name: Precision Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2QzYjQ4ZDZjOGU5YzI3YmFlMThlYTRkYTUyYWIyNzc4NDkwNzM1OWFiMTgyMzA0NDZmMGI3YTQxODBjM2EwMCIsInZlcnNpb24iOjF9.bvNWyzfct1CLJFx_EuD2GeKieVtyGJy0cwUBP2qJE1ey2i9SVn6n1Dr0AALTGBkxQ6n5-fJ61QFNufpdr2KvCA - type: precision value: 0.7376755842752241 name: Precision Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2VmYWYzZWQwZmMzMDk0NTdlY2Y3NDkzYWY5ZTdmOGU0ZTUzZWE4YWFhZjVmODhkZmE1Njg4NjA5YjJmYWVhOSIsInZlcnNpb24iOjF9.50FQR2aoBpORLgYa7482ZTrRhT-KfIgv5ltBEHndUBMmqGF9Ru0LHENSGwyD_tO89sGPfiW32TxpbrNWiBdIBA - type: recall value: 0.7369675064285843 name: Recall Macro verified: true verifyToken: 
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTM4OTAyNDYwNjY4Zjc5NDljNjBmNTg2Mzk4YjYxM2MyYTA0MDllYTMyNzEwOGI1ZTEwYWE3ZmU0NDZmZDg2NiIsInZlcnNpb24iOjF9.UvWBxuApNV3vd4hpgwqd6XPHCbkA_bB_Cw24ooquiOf0dstvjP3JvpGoDp5SniOzIOg3i2aYbcvFCLJqEXMZCQ - type: recall value: 0.737 name: Recall Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYmQ4MjMzNzRmNTI5NjIzNGQ0ZDFmZTA1MDU3OTk0MzYyMGI0NTMzZTZlMTQ1MDc1MzBkMGMzYjcxZjU1NDNjOSIsInZlcnNpb24iOjF9.kpbdXOpDG3CUB-kUEXsgFT3HWWIbu70wwzs2TNf0rhIuRrzdZz3dXXvwqu1BcLJTsOxl8G6NTiYXgnv-ul8lDg - type: recall value: 0.737 name: Recall Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmU1ZWJkNWE0NjczY2NiZWYyNzYyMzllNzZmZTIxNWRkYTEyZDgxN2E0NTNmM2ExMTc1ZWVjMzBiYjg0ZmM1MiIsInZlcnNpb24iOjF9.S6HHWCWnut_LJqXbEA_Z8ZOTtyq6V51ZeiA0qbwzr0hapDYZOZHrN4prvSLvoNv-GiYDYKatwIsAZxCZc5fmCA - type: f1 value: 0.7366853496239583 name: F1 Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzkxYmY2NTcyOTE0ZDdjNGY2ZmE4MzQwMGIxZTA2MDg1NzI5YTQ0MTdkZjdkNzNkMDM2NTk2MTNiNjU4ODMwZCIsInZlcnNpb24iOjF9.ECVaCBqGd0pnQT3xJF7yWrgecIb-5TMiVWpEO0MQGhYy43snkI6Qs-2FOXzvfwIWqG-Q6XIIhGbWZh5TFEGKCA - type: f1 value: 0.737 name: F1 Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDMwMWZiNzQyNWEzNmMzMDJjOTAxYzAxNzc0MTNlYzRkZjllYmNjZmU0OTgzZDFkNWM1ZWI5OTA2NzE5Y2YxOSIsInZlcnNpb24iOjF9.8yZFol_Gcj9n3w9Yk5wx48yql7p3wriDecv-6VSTAB6Q_MWLQAWsCEGRRhgGJ3zvhoRehJZdb35ozk36VOinDQ - type: f1 value: 0.7366990292378379 name: F1 Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjhhN2ZkMjc5ZGQ3ZGM1Nzk3ZTgwY2E1N2NjYjdhNjZlOTdhYmRlNGVjN2EwNTIzN2UyYTY2ODVlODhmY2Q4ZCIsInZlcnNpb24iOjF9.Cz7ClDAfCGpqdRTYd5v3dPjXFq8lZLXx8AX_rqmF-Jb8KocqVDsHWeZScW5I2oy951UrdMpiUOLieBuJLOmCCQ - type: loss value: 0.9349392056465149 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmI4MTI5MDM1NjBmMzgzMzc2NjM5MzZhOGUyNTgyY2RlZTEyYTIzYzY2ZGJmODcxY2Q5OTVjOWU3OTQ2MzM1NSIsInZlcnNpb24iOjF9.bSOFnYC4Y2y2pW1AR-bgPUHKafR-0OHf8PvexK8eQLsS323Xy9-rYkKUaP09KY6_fk9GqAawv5eqj72B_uyeCA --- # DeBERTa-v3-base-mnli-fever-anli ## Model description This model was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which comprise 763 913 NLI hypothesis-premise pairs. This base model outperforms almost all large models on the ANLI benchmark. The base model is DeBERTa-v3-base from Microsoft. The v3 variant of DeBERTa substantially outperforms previous versions of the model by including a different pre-training objective, see annex 11 of the original DeBERTa paper. For highest performance (but less speed), I recommend using ### How to use the model #### Simple zero-shot classification pipeline #### NLI use-case ### Training data DeBERTa-v3-base-mnli-fever-anli was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which comprise 763 913 NLI hypothesis-premise pairs. ### Training procedure DeBERTa-v3-base-mnli-fever-anli was trained using the Hugging Face trainer with the following hyperparameters. ### Eval results The model was evaluated using the test sets for MultiNLI and ANLI and the dev set for Fever-NLI. The metric used is accuracy. mnli-m | mnli-mm | fever-nli | anli-all | anli-r3 ---------|----------|---------|----------|---------- 0.903 | 0.903 | 0.777 | 0.579 | 0.495 ## Limitations and bias Please consult the original DeBERTa paper and literature on different NLI datasets for potential biases. 
## Citation If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. ‘Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT - NLI’. Preprint, June. Open Science Framework. ### Ideas for cooperation or questions? If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or LinkedIn ### Debugging and issues Note that DeBERTa-v3 was released on 06.12.21 and older versions of HF Transformers seem to have issues running the model (e.g. resulting in an issue with the tokenizer). Using Transformers>=4.13 might solve some issues. Also make sure to install sentencepiece to avoid tokenizer errors. Run: or ## Model Recycling Evaluation on 36 datasets using MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli as a base model yields average score of 79.69 in comparison to 79.04 by microsoft/deberta-v3-base. The model is ranked 2nd among all tested models for the microsoft/deberta-v3-base architecture as of 09/01/2023. Results: | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers | |---------------:|----------:|-----------------------:|-------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|-------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|--------:|--------:|----------------:| | 85.8072 | 90.4333 | 67.32 | 59.625 | 85.107 | 91.0714 | 85.8102 | 67 | 79.0333 | 91.6327 | 82.5 | 94.02 | 71.6428 | 89.5749 | 89.7059 | 64.1708 | 88.4615 | 93.575 | 91.4148 | 89.6811 | 86.2816 | 94.6101 | 57.0588 | 91.5508 | 97.6 | 91.2 | 45.264 | 82.6179 | 54.5455 | 74.3622 | 84.8837 | 71.6949 | 71.0031 | 69.0141 | 68.2692 | 71.3333 | For more information, see: Model Recycling", + "model_explanation_gemini": "A zero-shot text classification model trained for natural language inference tasks using datasets like MultiNLI, FEVER, and ANLI to determine the relationship between text pairs." 
+} \ No newline at end of file diff --git a/data/model_data_json/MoritzLaurer_DeBERTa-v3-large-mnli-fever-anli-ling-wanli.json b/data/model_data_json/MoritzLaurer_DeBERTa-v3-large-mnli-fever-anli-ling-wanli.json new file mode 100644 index 0000000000000000000000000000000000000000..2ce50dfb42cbd41b0c1f7c17e82c3a41b640caa0 --- /dev/null +++ b/data/model_data_json/MoritzLaurer_DeBERTa-v3-large-mnli-fever-anli-ling-wanli.json @@ -0,0 +1,28 @@ +{ + "model_id": "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli", + "downloads": 362897, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "zero-shot-classification", + "en", + "dataset:multi_nli", + "dataset:facebook/anli", + "dataset:fever", + "dataset:lingnli", + "dataset:alisawuffles/WANLI", + "arxiv:2104.07179", + "arxiv:2111.09543", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: mit tags: - text-classification - zero-shot-classification datasets: - multi_nli - facebook/anli - fever - lingnli - alisawuffles/WANLI metrics: - accuracy pipeline_tag: zero-shot-classification model-index: - name: DeBERTa-v3-large-mnli-fever-anli-ling-wanli results: - task: type: text-classification name: Natural Language Inference dataset: name: MultiNLI-matched type: multi_nli split: validation_matched metrics: - type: accuracy value: 0.912 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: MultiNLI-mismatched type: multi_nli split: validation_mismatched metrics: - type: accuracy value: 0.908 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: ANLI-all type: anli split: test_r1+test_r2+test_r3 metrics: - type: accuracy value: 0.702 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: ANLI-r3 type: anli split: test_r3 metrics: - type: accuracy value: 0.64 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: WANLI type: alisawuffles/WANLI split: test metrics: - type: accuracy value: 0.77 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: LingNLI type: lingnli split: test metrics: - type: accuracy value: 0.87 verified: false --- # DeBERTa-v3-large-mnli-fever-anli-ling-wanli ## Model description This model was fine-tuned on the MultiNLI, Fever-NLI, Adversarial-NLI (ANLI), LingNLI and WANLI datasets, which comprise 885 242 NLI hypothesis-premise pairs. This model is the best performing NLI model on the Hugging Face Hub as of 06.06.22 and can be used for zero-shot classification. It significantly outperforms all other large models on the ANLI benchmark. The foundation model is DeBERTa-v3-large from Microsoft. DeBERTa-v3 combines several recent innovations compared to classical Masked Language Models like BERT, RoBERTa etc., see the paper ### How to use the model #### Simple zero-shot classification pipeline #### NLI use-case ### Training data DeBERTa-v3-large-mnli-fever-anli-ling-wanli was trained on the MultiNLI, Fever-NLI, Adversarial-NLI (ANLI), LingNLI and WANLI datasets, which comprise 885 242 NLI hypothesis-premise pairs. Note that SNLI was explicitly excluded due to quality issues with the dataset. More data does not necessarily make for better NLI models.
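The "Simple zero-shot classification pipeline" and "NLI use-case" sections above lost their code when the card was flattened into JSON. A minimal reconstruction using the standard transformers APIs, with illustrative text and labels (the premise/hypothesis example is an assumption about what the stripped snippets showed):

```python
# Zero-shot classification via the standard transformers pipeline.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli",
)
text = "Angela Merkel is a politician in Germany and leader of the CDU."
candidate_labels = ["politics", "economy", "entertainment", "environment"]
print(classifier(text, candidate_labels, multi_label=False))

# NLI use-case: score a single premise/hypothesis pair directly.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "The new measures were announced by the chancellor on Tuesday."
hypothesis = "A politician made an announcement."
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]
# Check model.config.id2label for the label order rather than assuming it.
print({model.config.id2label[i]: round(float(p), 3) for i, p in enumerate(probs)})
```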
### Training procedure DeBERTa-v3-large-mnli-fever-anli-ling-wanli was trained using the Hugging Face trainer with the following hyperparameters. Note that longer training with more epochs hurt performance in my tests (overfitting). ### Eval results The model was evaluated using the test sets for MultiNLI, ANLI, LingNLI, WANLI and the dev set for Fever-NLI. The metric used is accuracy. The model achieves state-of-the-art performance on each dataset. Surprisingly, it outperforms the previous state-of-the-art on ANLI (ALBERT-XXL) by 8.3%. I assume that this is because ANLI was created to fool masked language models like RoBERTa (or ALBERT), while DeBERTa-v3 uses a better pre-training objective (RTD) and disentangled attention, and was fine-tuned on higher-quality NLI data. |Datasets|mnli_test_m|mnli_test_mm|anli_test|anli_test_r3|ling_test|wanli_test| | :---: | :---: | :---: | :---: | :---: | :---: | :---: | |Accuracy|0.912|0.908|0.702|0.64|0.87|0.77| |Speed (text/sec, A100 GPU)|696.0|697.0|488.0|425.0|828.0|980.0| ## Limitations and bias Please consult the original DeBERTa-v3 paper and literature on different NLI datasets for more information on the training data and potential biases. The model will reproduce statistical patterns in the training data. ## Citation If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. ‘Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT - NLI’. Preprint, June. Open Science Framework. ### Ideas for cooperation or questions? If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or via LinkedIn. ### Debugging and issues Note that DeBERTa-v3 was released on 06.12.21 and older versions of HF Transformers seem to have issues running the model (e.g. resulting in an issue with the tokenizer). Using Transformers>=4.13 might solve some issues.", + "model_explanation_gemini": "A zero-shot text classification model fine-tuned for natural language inference tasks, achieving state-of-the-art performance on multiple NLI benchmarks." +} \ No newline at end of file diff --git a/data/model_data_json/MoritzLaurer_deberta-v3-large-zeroshot-v2.0.json b/data/model_data_json/MoritzLaurer_deberta-v3-large-zeroshot-v2.0.json new file mode 100644 index 0000000000000000000000000000000000000000..ce20ee110df504c67a20725b4e2783c9f395a431 --- /dev/null +++ b/data/model_data_json/MoritzLaurer_deberta-v3-large-zeroshot-v2.0.json @@ -0,0 +1,22 @@ +{ + "model_id": "MoritzLaurer/deberta-v3-large-zeroshot-v2.0", + "downloads": 166058, + "tags": [ + "transformers", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "zero-shot-classification", + "en", + "arxiv:2312.17543", + "base_model:microsoft/deberta-v3-large", + "base_model:quantized:microsoft/deberta-v3-large", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - text-classification - zero-shot-classification base_model: microsoft/deberta-v3-large pipeline_tag: zero-shot-classification library_name: transformers license: mit --- # Model description: deberta-v3-large-zeroshot-v2.0 ## zeroshot-v2.0 series of models Models in this series are designed for efficient zeroshot classification with the Hugging Face pipeline. These models can do classification without training data and run on both GPUs and CPUs.
An overview of the latest zeroshot classifiers is available in my Zeroshot Classifier Collection. The main update of this series of models is that several models are trained on fully commercially-friendly data for users with strict license requirements. These models can do one universal classification task: determine whether a hypothesis is \"true\" or \"not true\" given a text (entailment vs. not_entailment). This task format is based on the Natural Language Inference task (NLI). The task is so universal that any classification task can be reformulated into this task by the Hugging Face pipeline. ## Training data Models with a \"-c\" in the name are trained on two types of fully commercially-friendly data: 1. Synthetic data generated with Mixtral-8x7B-Instruct-v0.1. I first created a list of 500+ diverse text classification tasks for 25 professions in conversations with Mistral-large. The data was manually curated. I then used this as seed data to generate several hundred thousand texts for these tasks with Mixtral-8x7B-Instruct-v0.1. The final dataset used is available in the synthetic_zeroshot_mixtral_v0.1 dataset. Data curation was done in multiple iterations and will be improved in future iterations. 2. Two commercially-friendly NLI datasets (MNLI, FEVER-NLI). These datasets were added to increase generalization. 3. Models without a \"-c\" in the name also included a broader mix of training data with a broader mix of licenses: ANLI, WANLI, LingNLI, and all datasets in this list. ## How to use the models Setting multi_label=False forces the model to decide on only one class; multi_label=True enables the model to choose multiple classes. ## Metrics The models were evaluated on 28 different text classification tasks with the f1_macro metric. The main reference point is facebook/bart-large-mnli, which is, at the time of writing (03.04.24), the most used commercially-friendly 0-shot classifier.
!results_aggreg_v2.0 | | facebook/bart-large-mnli | roberta-base-zeroshot-v2.0-c | roberta-large-zeroshot-v2.0-c | deberta-v3-base-zeroshot-v2.0-c | deberta-v3-base-zeroshot-v2.0 (fewshot) | deberta-v3-large-zeroshot-v2.0-c | deberta-v3-large-zeroshot-v2.0 (fewshot) | bge-m3-zeroshot-v2.0-c | bge-m3-zeroshot-v2.0 (fewshot) | |:---------------------------|---------------------------:|-----------------------------:|------------------------------:|--------------------------------:|-----------------------------------:|---------------------------------:|------------------------------------:|-----------------------:|--------------------------:| | all datasets mean | 0.497 | 0.587 | 0.622 | 0.619 | 0.643 (0.834) | 0.676 | 0.673 (0.846) | 0.59 | (0.803) | | amazonpolarity (2) | 0.937 | 0.924 | 0.951 | 0.937 | 0.943 (0.961) | 0.952 | 0.956 (0.968) | 0.942 | (0.951) | | imdb (2) | 0.892 | 0.871 | 0.904 | 0.893 | 0.899 (0.936) | 0.923 | 0.918 (0.958) | 0.873 | (0.917) | | appreviews (2) | 0.934 | 0.913 | 0.937 | 0.938 | 0.945 (0.948) | 0.943 | 0.949 (0.962) | 0.932 | (0.954) | | yelpreviews (2) | 0.948 | 0.953 | 0.977 | 0.979 | 0.975 (0.989) | 0.988 | 0.985 (0.994) | 0.973 | (0.978) | | rottentomatoes (2) | 0.83 | 0.802 | 0.841 | 0.84 | 0.86 (0.902) | 0.869 | 0.868 (0.908) | 0.813 | (0.866) | | emotiondair (6) | 0.455 | 0.482 | 0.486 | 0.459 | 0.495 (0.748) | 0.499 | 0.484 (0.688) | 0.453 | (0.697) | | emocontext (4) | 0.497 | 0.555 | 0.63 | 0.59 | 0.592 (0.799) | 0.699 | 0.676 (0.81) | 0.61 | (0.798) | | empathetic (32) | 0.371 | 0.374 | 0.404 | 0.378 | 0.405 (0.53) | 0.447 | 0.478 (0.555) | 0.387 | (0.455) | | financialphrasebank (3) | 0.465 | 0.562 | 0.455 | 0.714 | 0.669 (0.906) | 0.691 | 0.582 (0.913) | 0.504 | (0.895) | | banking77 (72) | 0.312 | 0.124 | 0.29 | 0.421 | 0.446 (0.751) | 0.513 | 0.567 (0.766) | 0.387 | (0.715) | | massive (59) | 0.43 | 0.428 | 0.543 | 0.512 | 0.52 (0.755) | 0.526 | 0.518 (0.789) | 0.414 | (0.692) | | wikitoxic_toxicaggreg (2) | 0.547 | 0.751 | 0.766 | 0.751 | 0.769 (0.904) | 0.741 | 0.787 (0.911) | 0.736 | (0.9) | | wikitoxic_obscene (2) | 0.713 | 0.817 | 0.854 | 0.853 | 0.869 (0.922) | 0.883 | 0.893 (0.933) | 0.783 | (0.914) | | wikitoxic_threat (2) | 0.295 | 0.71 | 0.817 | 0.813 | 0.87 (0.946) | 0.827 | 0.879 (0.952) | 0.68 | (0.947) | | wikitoxic_insult (2) | 0.372 | 0.724 | 0.798 | 0.759 | 0.811 (0.912) | 0.77 | 0.779 (0.924) | 0.783 | (0.915) | | wikitoxic_identityhate (2) | 0.473 | 0.774 | 0.798 | 0.774 | 0.765 (0.938) | 0.797 | 0.806 (0.948) | 0.761 | (0.931) | | hateoffensive (3) | 0.161 | 0.352 | 0.29 | 0.315 | 0.371 (0.862) | 0.47 | 0.461 (0.847) | 0.291 | (0.823) | | hatexplain (3) | 0.239 | 0.396 | 0.314 | 0.376 | 0.369 (0.765) | 0.378 | 0.389 (0.764) | 0.29 | (0.729) | | biasframes_offensive (2) | 0.336 | 0.571 | 0.583 | 0.544 | 0.601 (0.867) | 0.644 | 0.656 (0.883) | 0.541 | (0.855) | | biasframes_sex (2) | 0.263 | 0.617 | 0.835 | 0.741 | 0.809 (0.922) | 0.846 | 0.815 (0.946) | 0.748 | (0.905) | | biasframes_intent (2) | 0.616 | 0.531 | 0.635 | 0.554 | 0.61 (0.881) | 0.696 | 0.687 (0.891) | 0.467 | (0.868) | | agnews (4) | 0.703 | 0.758 | 0.745 | 0.68 | 0.742 (0.898) | 0.819 | 0.771 (0.898) | 0.687 | (0.892) | | yahootopics (10) | 0.299 | 0.543 | 0.62 | 0.578 | 0.564 (0.722) | 0.621 | 0.613 (0.738) | 0.587 | (0.711) | | trueteacher (2) | 0.491 | 0.469 | 0.402 | 0.431 | 0.479 (0.82) | 0.459 | 0.538 (0.846) | 0.471 | (0.518) | | spam (2) | 0.505 | 0.528 | 0.504 | 0.507 | 0.464 (0.973) | 0.74 | 0.597 (0.983) | 0.441 | (0.978) | | wellformedquery (2) | 
0.407 | 0.333 | 0.333 | 0.335 | 0.491 (0.769) | 0.334 | 0.429 (0.815) | 0.361 | (0.718) | | manifesto (56) | 0.084 | 0.102 | 0.182 | 0.17 | 0.187 (0.376) | 0.258 | 0.256 (0.408) | 0.147 | (0.331) | | capsotu (21) | 0.34 | 0.479 | 0.523 | 0.502 | 0.477 (0.664) | 0.603 | 0.502 (0.686) | 0.472 | (0.644) | These numbers indicate zeroshot performance, as no data from these datasets was added in the training mix. Note that models without a \"-c\" in the title were evaluated twice: one run without any data from these 28 datasets to test pure zeroshot performance (the first number in the respective column) and the final run including up to 500 training data points per class from each of the 28 datasets (the second number in brackets in the column, \"fewshot\"). No model was trained on test data. Details on the different datasets are available here: ## When to use which model - **deberta-v3-zeroshot vs. roberta-zeroshot**: deberta-v3 performs clearly better than roberta, but it is a bit slower. roberta is directly compatible with Hugging Face's production inference TEI containers and flash attention. These containers are a good choice for production use-cases. tl;dr: For accuracy, use a deberta-v3 model. If production inference speed is a concern, you can consider a roberta model (e.g. in a TEI container and HF Inference Endpoints). - **commercial use-cases**: models with \"-c\" in the title are guaranteed to be trained on only commercially-friendly data. Models without a \"-c\" were trained on more data and perform better, but include data with non-commercial licenses. Legal opinions diverge if this training data affects the license of the trained model. For users with strict legal requirements, the models with \"-c\" in the title are recommended. - **Multilingual/non-English use-cases**: use bge-m3-zeroshot-v2.0 or bge-m3-zeroshot-v2.0-c. Note that multilingual models perform worse than English-only models. You can therefore also first machine translate your texts to English with libraries like EasyNMT and then apply any English-only model to the translated data. Machine translation also facilitates validation in case your team does not speak all languages in the data. - **context window**: The bge-m3 models can process up to 8192 tokens; the other models can process up to 512. Note that longer text inputs both make the model slower and decrease performance, so if you're only working with texts of up to ~400 words / one page, use e.g. a deberta model for better performance. - The latest updates on new models are always available in the Zeroshot Classifier Collection. ## Reproduction Reproduction code is available in the directory here: ## Limitations and bias The model can only do text classification tasks. Biases can come from the underlying foundation model, the human NLI training data and the synthetic data generated by Mixtral. ## License The foundation model was published under the MIT license. The licenses of the training data vary depending on the model, see above. ## Citation This model is an extension of the research described in this paper. If you use this model academically, please cite: ### Ideas for cooperation or questions? If you have questions or ideas for cooperation, contact me at moritz{at}huggingface{dot}co or via LinkedIn. ### Flexible usage and \"prompting\" You can formulate your own hypotheses by changing the hypothesis_template of the zeroshot pipeline.
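As an illustration of this flexible usage, here is a hedged sketch combining a custom hypothesis_template with multi_label=True (the input text, labels, and template are assumptions, not from the card):

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v2.0",
)

text = "The product arrived late and the packaging was damaged, but support was friendly."
labels = ["delivery problem", "product damage", "positive support experience"]  # illustrative labels

result = classifier(
    text,
    labels,
    hypothesis_template="This text is about {}.",  # reformulate to match your domain
    multi_label=True,  # score each label independently so several can apply
)
print(list(zip(result["labels"], [round(s, 3) for s in result["scores"]])))
```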
Similar to \"prompt engineering\" for LLMs, you can test different formulations of your and verbalized classes to improve performance.", + "model_explanation_gemini": "Performs zero-shot text classification by determining if a hypothesis is true or not true given a text, using a natural language inference approach without requiring training data." +} \ No newline at end of file diff --git a/data/model_data_json/MoritzLaurer_deberta-v3-xsmall-zeroshot-v1.1-all-33.json b/data/model_data_json/MoritzLaurer_deberta-v3-xsmall-zeroshot-v1.1-all-33.json new file mode 100644 index 0000000000000000000000000000000000000000..a34dc6846e3a794817754acccd7c338ffc4af1b3 --- /dev/null +++ b/data/model_data_json/MoritzLaurer_deberta-v3-xsmall-zeroshot-v1.1-all-33.json @@ -0,0 +1,23 @@ +{ + "model_id": "MoritzLaurer/deberta-v3-xsmall-zeroshot-v1.1-all-33", + "downloads": 92328, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "zero-shot-classification", + "en", + "arxiv:2312.17543", + "base_model:microsoft/deberta-v3-xsmall", + "base_model:quantized:microsoft/deberta-v3-xsmall", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: microsoft/deberta-v3-xsmall language: - en tags: - text-classification - zero-shot-classification pipeline_tag: zero-shot-classification library_name: transformers license: mit --- # deberta-v3-xsmall-zeroshot-v1.1-all-33 This model was fine-tuned using the same pipeline as described in the model card for MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33 and in this paper. The foundation model is microsoft/deberta-v3-xsmall. The model only has 22 million backbone parameters and 128 million vocabulary parameters. The backbone parameters are the main parameters active during inference, providing a significant speedup over larger models. The model is 142 MB small. This model was trained to provide a small and highly efficient zeroshot option, especially for edge devices or in-browser use-cases with transformers.js. ## Usage and other details For usage instructions and other details refer to this model card MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33 and this paper. ## Metrics: I didn't not do zeroshot evaluation for this model to save time and compute. The table below shows standard accuracy for all datasets the model was trained on (note that the NLI datasets are binary). General takeaway: the model is much more efficient than its larger sisters, but it performs less well. 
|Datasets|mnli_m|mnli_mm|fevernli|anli_r1|anli_r2|anli_r3|wanli|lingnli|wellformedquery|rottentomatoes|amazonpolarity|imdb|yelpreviews|hatexplain|massive|banking77|emotiondair|emocontext|empathetic|agnews|yahootopics|biasframes_sex|biasframes_offensive|biasframes_intent|financialphrasebank|appreviews|hateoffensive|trueteacher|spam|wikitoxic_toxicaggregated|wikitoxic_obscene|wikitoxic_identityhate|wikitoxic_threat|wikitoxic_insult|manifesto|capsotu| | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | |Accuracy|0.925|0.923|0.886|0.732|0.633|0.661|0.814|0.887|0.722|0.872|0.944|0.925|0.967|0.774|0.734|0.627|0.762|0.745|0.465|0.888|0.702|0.94|0.853|0.863|0.914|0.926|0.921|0.635|0.968|0.897|0.918|0.915|0.935|0.9|0.505|0.701| |Inference text/sec (A10G, batch=128)|1573.0|1630.0|683.0|1282.0|1352.0|1072.0|2325.0|2008.0|4781.0|2743.0|677.0|228.0|238.0|2357.0|5027.0|4323.0|3247.0|3129.0|941.0|1643.0|335.0|1517.0|1452.0|1498.0|2367.0|974.0|2634.0|353.0|2284.0|260.0|252.0|256.0|254.0|259.0|1941.0|2080.0|", + "model_explanation_gemini": "Performs zero-shot text classification efficiently with a small model size (142 MB) optimized for speed and edge device deployment." +} \ No newline at end of file diff --git a/data/model_data_json/MoritzLaurer_mDeBERTa-v3-base-mnli-xnli.json b/data/model_data_json/MoritzLaurer_mDeBERTa-v3-base-mnli-xnli.json new file mode 100644 index 0000000000000000000000000000000000000000..0f29122b660e0a6afa6522ff249078c1644075a7 --- /dev/null +++ b/data/model_data_json/MoritzLaurer_mDeBERTa-v3-base-mnli-xnli.json @@ -0,0 +1,41 @@ +{ + "model_id": "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli", + "downloads": 722822, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "zero-shot-classification", + "nli", + "multilingual", + "en", + "ar", + "bg", + "de", + "el", + "es", + "fr", + "hi", + "ru", + "sw", + "th", + "tr", + "ur", + "vi", + "zh", + "dataset:multi_nli", + "dataset:xnli", + "arxiv:2111.09543", + "arxiv:1809.05053", + "arxiv:1911.02116", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - en - ar - bg - de - el - es - fr - hi - ru - sw - th - tr - ur - vi - zh license: mit tags: - zero-shot-classification - text-classification - nli - pytorch metrics: - accuracy datasets: - multi_nli - xnli pipeline_tag: zero-shot-classification widget: - text: \"Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU\" candidate_labels: \"politics, economy, entertainment, environment\" --- # Multilingual mDeBERTa-v3-base-mnli-xnli ## Model description This multilingual model can perform natural language inference (NLI) on 100 languages and is therefore also suitable for multilingual zero-shot classification. The underlying model was pre-trained by Microsoft on the CC100 multilingual dataset. It was then fine-tuned on the XNLI dataset, which contains hypothesis-premise pairs from 15 languages, as well as the English MNLI dataset. As of December 2021, mDeBERTa-base is the best performing multilingual base-sized transformer model, introduced by Microsoft in this paper. If you are looking for a smaller, faster (but less performant) model, you can try multilingual-MiniLMv2-L6-mnli-xnli. 
### How to use the model #### Simple zero-shot classification pipeline #### NLI use-case ### Training data This model was trained on the XNLI development dataset and the MNLI train dataset. The XNLI development set consists of 2490 professionally translated texts from English to 14 other languages (37350 texts in total) (see this paper). Note that XNLI contains a training set of 15 machine translated versions of the MNLI dataset for 15 languages, but due to quality issues with these machine translations, this model was only trained on the professional translations from the XNLI development set and the original English MNLI training set (392 702 texts). Not using machine translated texts avoids overfitting the model to the 15 languages, avoids catastrophic forgetting of the other 85 languages mDeBERTa was pre-trained on, and significantly reduces training costs. ### Training procedure mDeBERTa-v3-base-mnli-xnli was trained using the Hugging Face trainer with the following hyperparameters. ### Eval results The model was evaluated on the XNLI test set on 15 languages (5010 texts per language, 75150 in total). Note that multilingual NLI models are capable of classifying NLI texts without receiving NLI training data in the specific language (cross-lingual transfer). This means that the model is also able to do NLI on the other 85 languages mDeBERTa was pre-trained on, but performance is most likely lower than for those languages available in XNLI. Also note that if other multilingual models on the model hub claim performance of around 90% on languages other than English, the authors have most likely made a mistake during testing since none of the latest papers shows a multilingual average performance of more than a few points above 80% on XNLI (see here or here). average | ar | bg | de | el | en | es | fr | hi | ru | sw | th | tr | ur | vi | zh ---------|----------|---------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|---------- 0.808 | 0.802 | 0.829 | 0.825 | 0.826 | 0.883 | 0.845 | 0.834 | 0.771 | 0.813 | 0.748 | 0.793 | 0.807 | 0.740 | 0.795 | 0.8116 ## Limitations and bias Please consult the original DeBERTa-V3 paper and literature on different NLI datasets for potential biases. ## Citation If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. ‘Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT - NLI’. Preprint, June. Open Science Framework. ## Ideas for cooperation or questions? If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or via LinkedIn. ## Debugging and issues Note that DeBERTa-v3 was released in late 2021 and older versions of HF Transformers seem to have issues running the model (e.g. resulting in an issue with the tokenizer). Using Transformers>=4.13 might solve some issues. Note that mDeBERTa currently does not support FP16, see here:", + "model_explanation_gemini": "Performs multilingual zero-shot classification and natural language inference across 100 languages, fine-tuned on XNLI and MNLI datasets."
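A minimal sketch of the simple zero-shot classification pipeline named above, reusing the German widget example from the card's own metadata (the candidate labels are the ones declared there):

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli",
)

# German example text and English labels, as in the card's widget
text = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]

print(classifier(text, candidate_labels))
```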
+} \ No newline at end of file diff --git a/data/model_data_json/MoritzLaurer_mDeBERTa-v3-base-xnli-multilingual-nli-2mil7.json b/data/model_data_json/MoritzLaurer_mDeBERTa-v3-base-xnli-multilingual-nli-2mil7.json new file mode 100644 index 0000000000000000000000000000000000000000..db19af7a3ab672c126be97962b2d146703bfb532 --- /dev/null +++ b/data/model_data_json/MoritzLaurer_mDeBERTa-v3-base-xnli-multilingual-nli-2mil7.json @@ -0,0 +1,59 @@ +{ + "model_id": "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7", + "downloads": 179731, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "zero-shot-classification", + "nli", + "multilingual", + "zh", + "ja", + "ar", + "ko", + "de", + "fr", + "es", + "pt", + "hi", + "id", + "it", + "tr", + "ru", + "bn", + "ur", + "mr", + "ta", + "vi", + "fa", + "pl", + "uk", + "nl", + "sv", + "he", + "sw", + "ps", + "dataset:MoritzLaurer/multilingual-NLI-26lang-2mil7", + "dataset:xnli", + "dataset:multi_nli", + "dataset:facebook/anli", + "dataset:fever", + "dataset:lingnli", + "dataset:alisawuffles/WANLI", + "arxiv:2111.09543", + "arxiv:2104.07179", + "arxiv:1809.05053", + "arxiv:1911.02116", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - zh - ja - ar - ko - de - fr - es - pt - hi - id - it - tr - ru - bn - ur - mr - ta - vi - fa - pl - uk - nl - sv - he - sw - ps license: mit tags: - zero-shot-classification - text-classification - nli - pytorch datasets: - MoritzLaurer/multilingual-NLI-26lang-2mil7 - xnli - multi_nli - facebook/anli - fever - lingnli - alisawuffles/WANLI metrics: - accuracy pipeline_tag: zero-shot-classification widget: - text: Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU candidate_labels: politics, economy, entertainment, environment model-index: - name: DeBERTa-v3-base-xnli-multilingual-nli-2mil7 results: - task: type: text-classification name: Natural Language Inference dataset: name: MultiNLI-matched type: multi_nli split: validation_matched metrics: - type: accuracy value: 0.857 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: MultiNLI-mismatched type: multi_nli split: validation_mismatched metrics: - type: accuracy value: 0.856 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: ANLI-all type: anli split: test_r1+test_r2+test_r3 metrics: - type: accuracy value: 0.537 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: ANLI-r3 type: anli split: test_r3 metrics: - type: accuracy value: 0.497 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: WANLI type: alisawuffles/WANLI split: test metrics: - type: accuracy value: 0.732 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: LingNLI type: lingnli split: test metrics: - type: accuracy value: 0.788 verified: false - task: type: text-classification name: Natural Language Inference dataset: name: fever-nli type: fever-nli split: test metrics: - type: accuracy value: 0.761 verified: false --- # Model card for mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 ## Model description This multilingual model can perform natural language inference (NLI) on 100 languages and is therefore also suitable for multilingual zero-shot classification.
The underlying mDeBERTa-v3-base model was pre-trained by Microsoft on the CC100 multilingual dataset with 100 languages. The model was then fine-tuned on the XNLI dataset and on the multilingual-NLI-26lang-2mil7 dataset. Both datasets contain more than 2.7 million hypothesis-premise pairs in 27 languages spoken by more than 4 billion people. As of December 2021, mDeBERTa-v3-base is the best performing multilingual base-sized transformer model introduced by Microsoft in this paper. ### How to use the model #### Simple zero-shot classification pipeline #### NLI use-case ### Training data This model was trained on the multilingual-nli-26lang-2mil7 dataset and the XNLI validation dataset. The multilingual-nli-26lang-2mil7 dataset contains 2 730 000 NLI hypothesis-premise pairs in 26 languages spoken by more than 4 billion people. The dataset contains 105 000 text pairs per language. It is based on the English datasets MultiNLI, Fever-NLI, ANLI, LingNLI and WANLI and was created using the latest open-source machine translation models. The languages in the dataset are: ['ar', 'bn', 'de', 'es', 'fa', 'fr', 'he', 'hi', 'id', 'it', 'ja', 'ko', 'mr', 'nl', 'pl', 'ps', 'pt', 'ru', 'sv', 'sw', 'ta', 'tr', 'uk', 'ur', 'vi', 'zh'] (see ISO language codes). For more details, see the datasheet. In addition, a sample of 105 000 text pairs was also added for English following the same sampling method as the other languages, leading to 27 languages. Moreover, for each language a random set of 10% of the hypothesis-premise pairs was added where an English hypothesis was paired with the premise in the other language (and the same for English premises and other language hypotheses). This mix of languages in the text pairs should enable users to formulate a hypothesis in English for a target text in another language. The XNLI validation set consists of 2490 professionally translated texts from English to 14 other languages (37350 texts in total) (see this paper). Note that XNLI also contains a training set of 14 machine translated versions of the MultiNLI dataset for 14 languages, but this data was excluded due to quality issues with the machine translations from 2018. Note that for evaluation purposes, three languages were excluded from the XNLI training data and only included in the test data: [\"bg\",\"el\",\"th\"]. This was done in order to test the performance of the model on languages it has not seen during NLI fine-tuning on 27 languages, but only during pre-training on 100 languages - see evaluation metrics below. The total training dataset had a size of 3 287 280 hypothesis-premise pairs. ### Training procedure mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 was trained using the Hugging Face trainer with the following hyperparameters. ### Eval results The model was evaluated on the XNLI test set in 15 languages (5010 texts per language, 75150 in total) and the English test sets of MultiNLI, Fever-NLI, ANLI, LingNLI and WANLI. Note that multilingual NLI models are capable of classifying NLI texts without receiving NLI training data in the specific language (cross-lingual transfer). This means that the model is also able to do NLI on the other 73 languages mDeBERTa was pre-trained on, but performance is most likely lower than for those languages seen during NLI fine-tuning. The performance on the languages [\"bg\",\"el\",\"th\"] in the table below is a good indicator of this cross-lingual transfer, as these languages were not included in the training data.
|XNLI subsets|ar|bg|de|el|en|es|fr|hi|ru|sw|th|tr|ur|vi|zh| | :---: |:---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | |Accuracy|0.794|0.822|0.824|0.809|0.871|0.832|0.823|0.769|0.803|0.746|0.786|0.792|0.744|0.793|0.803| |Speed (text/sec, A100-GPU)|1344.0|1355.0|1472.0|1149.0|1697.0|1446.0|1278.0|1115.0|1380.0|1463.0|1713.0|1594.0|1189.0|877.0|1887.0| |English Datasets|mnli_test_m|mnli_test_mm|anli_test|anli_test_r3|fever_test|ling_test|wanli_test| | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | |Accuracy|0.857|0.856|0.537|0.497|0.761|0.788|0.732| |Speed (text/sec, A100-GPU)|1000.0|1009.0|794.0|672.0|374.0|1177.0|1468.0| Also note that if other multilingual models on the model hub claim performance of around 90% on languages other than English, the authors have most likely made a mistake during testing since none of the latest papers shows a multilingual average performance of more than a few points above 80% on XNLI (see here or here). ## Limitations and bias Please consult the original DeBERTa-V3 paper and literature on different NLI datasets for potential biases. Moreover, note that the multilingual-nli-26lang-2mil7 dataset was created using machine translation, which reduces the quality of the data for a complex task like NLI. You can inspect the data via the Hugging Face dataset viewer for languages you are interested in. Note that grammatical errors introduced by machine translation are less of an issue for zero-shot classification, for which grammar is less important. ## Citation If the dataset is useful for you, please cite the following article: ## Ideas for cooperation or questions? For updates on new models and datasets, follow me on Twitter. If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or on LinkedIn ## Debugging and issues Note that DeBERTa-v3 was released in late 2021 and older versions of HF Transformers seem to have issues running the model (e.g. resulting in an issue with the tokenizer). Using Transformers>=4.13 might solve some issues. Note that mDeBERTa currently does not support FP16, see here:", + "model_explanation_gemini": "Performs multilingual natural language inference (NLI) and zero-shot text classification across 100 languages, trained on 2.7 million hypothesis-premise pairs in 27 languages."
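A hedged sketch of the mixed-language pairing described above, scoring an English hypothesis against a non-English premise (the Spanish/English sentences are illustrative, not from the card):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "La inflación subió de nuevo el mes pasado."  # Spanish premise (illustrative)
hypothesis = "The text is about the economy."           # English hypothesis (illustrative)

inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
print({model.config.id2label[i]: round(p.item(), 3) for i, p in enumerate(probs)})
```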
--- - **Release 1.1** (March 11, 2021) - **Release 1.0** (January 13, 2021) # NB-BERT-base ## Description NB-BERT-base is a general BERT-base model built on the large digital collection at the National Library of Norway. This model is based on the same structure as the multilingual BERT Cased model, and is trained on a wide variety of Norwegian text (both bokmål and nynorsk) from the last 200 years. ## Intended use & limitations The 1.1 version of the model is general, and should be fine-tuned for any particular use. Some fine-tuning sets may be found on GitHub. ## Training data The model is trained on a wide variety of text. The training set is described on GitHub. ## More information For more information on the model, see the NbAiLab GitHub repository.", + "model_explanation_gemini": "NB-BERT-base is a Norwegian BERT model trained on diverse historical texts for general NLP tasks, particularly masked language modeling, requiring fine-tuning for specific applications."
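A minimal fill-mask sketch, reusing one of the widget examples declared in the card's own metadata:

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="NbAiLab/nb-bert-base")

# Widget example from the card: "At the library you can [MASK] a book."
for pred in unmasker("På biblioteket kan du [MASK] en bok."):
    print(pred["token_str"], round(pred["score"], 3))
```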
This is the complete list of our models and their final scores: | Model | Final WER | |:--------------|:------------| | NbAiLab/nb-wav2vec2-1b-bokmaal | 6.33 | | NbAiLab/nb-wav2vec2-300m-bokmaal | 7.03 | | NbAiLab/nb-wav2vec2-1b-nynorsk (this model) | 11.32 | | NbAiLab/nb-wav2vec2-300m-nynorsk | 12.22 | ## Dataset In parallel with the event, the team also converted the Norwegian Parliamentary Speech Corpus (NPSC) to NbAiLab/NPSC in the 🤗 Dataset format and used that as the main source for training. ## Code We have released all the code developed during the event so that the Norwegian NLP community can build upon it when developing even better Norwegian ASR models. The finetuning of these models is not very computationally demanding. After following the instructions here, you should be able to train your own automatic speech recognition system in less than a day with an average GPU. ## Team The following people contributed to building this model: Rolv-Arild Braaten, Javier de la Rosa, and Freddy Wetjen. ## Training procedure To reproduce these results, we strongly recommend that you follow the instructions from 🤗 to train a simple Swedish model. When you have verified that you are able to do this, create a fresh new repo. You can then start by copying the training files from our repo. Running these will create all the other necessary files, and should let you reproduce our results. With some tweaks to the hyperparameters, you might even be able to build an even better ASR. Good luck! ### Language Model As the scores indicate, adding even a simple 5-gram language model will improve the results. 🤗 has provided another very nice blog explaining how to add a 5-gram language model to improve the ASR model. You can build this from your own corpus, for instance by extracting some suitable text from the Norwegian Colossal Corpus. You can also skip some of the steps in the guide, and copy the 5-gram model from this repo. ### Parameters The final model was run using these parameters: Using these settings, the training might take 3-4 days on an average GPU. You can, however, get a decent model and faster results by tweaking these parameters. | Parameter | Comment | |:-------------|:-----| | per_device_train_batch_size | Adjust this to the maximum of available memory. 16 or 24 might be good settings depending on your system | | gradient_accumulation_steps | Can be adjusted even further up to increase batch size and speed up training without running into memory issues | | learning_rate | Can be increased, maybe as high as 1e-4. Speeds up training but might add instability | | epochs | Can be decreased significantly. This is a huge dataset and you might get a decent result already after a couple of epochs | ## Citation See arXiv:2307.01672.", + "model_explanation_gemini": "Finetuned for automatic speech recognition in Nynorsk Norwegian, achieving a 11.32% word error rate on the NPSC dataset."
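A minimal ASR sketch for this checkpoint (the audio path is a placeholder; 16 kHz Norwegian speech is assumed, and audio decoding requires ffmpeg):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="NbAiLab/nb-wav2vec2-1b-nynorsk")

# "audio.wav" is a hypothetical path to a 16 kHz mono recording
print(asr("audio.wav")["text"])
```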
+} \ No newline at end of file diff --git a/data/model_data_json/NeuML_pubmedbert-base-embeddings.json b/data/model_data_json/NeuML_pubmedbert-base-embeddings.json new file mode 100644 index 0000000000000000000000000000000000000000..ffdb0c9ffded00671fdf3e3bed22ffe218d27294 --- /dev/null +++ b/data/model_data_json/NeuML_pubmedbert-base-embeddings.json @@ -0,0 +1,20 @@ +{ + "model_id": "NeuML/pubmedbert-base-embeddings", + "downloads": 217644, + "tags": [ + "sentence-transformers", + "pytorch", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers language: en license: apache-2.0 --- # PubMedBERT Embeddings This is a PubMedBERT-base model fine-tuned using sentence-transformers. It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. The training dataset was generated using a random sample of PubMed title-abstract pairs along with similar title pairs. PubMedBERT Embeddings produces higher quality embeddings than generalized models for medical literature. Further fine-tuning for a medical subdomain will result in even better performance. ## Usage (txtai) This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). ## Usage (Sentence-Transformers) Alternatively, the model can be loaded with sentence-transformers. ## Usage (Hugging Face Transformers) The model can also be used directly with Transformers. ## Evaluation Results Performance of this model compared to the top base models on the MTEB leaderboard is shown below. A popular smaller model was also evaluated along with the most downloaded PubMed similarity model on the Hugging Face Hub. The following datasets were used to evaluate model performance. - PubMed QA - Subset: pqa_labeled, Split: train, Pair: (question, long_answer) - PubMed Subset - Split: test, Pair: (title, text) - PubMed Summary - Subset: pubmed, Split: validation, Pair: (article, abstract) Evaluation results are shown below. The Pearson correlation coefficient is used as the evaluation metric. | Model | PubMed QA | PubMed Subset | PubMed Summary | Average | | ----------------------------------------------------------------------------- | --------- | ------------- | -------------- | --------- | | all-MiniLM-L6-v2 | 90.40 | 95.86 | 94.07 | 93.44 | | bge-base-en-v1.5 | 91.02 | 95.60 | 94.49 | 93.70 | | gte-base | 92.97 | 96.83 | 96.24 | 95.35 | | **pubmedbert-base-embeddings** | **93.27** | **97.07** | **96.58** | **95.64** | | S-PubMedBert-MS-MARCO | 90.86 | 93.33 | 93.54 | 92.58 | ## Training The model was trained with the parameters: **DataLoader**: of length 20191 with parameters: **Loss**: with parameters: Parameters of the fit() method: ## Full Model Architecture ## More Information Read more about this model and how it was built in this article.", + "model_explanation_gemini": "Generates 768-dimensional embeddings for medical texts to enhance semantic search and clustering tasks in biomedical literature."
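A minimal sketch of the Sentence-Transformers usage named above (the two medical sentences are illustrative assumptions):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("NeuML/pubmedbert-base-embeddings")

sentences = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "First-line therapy for type 2 diabetes often includes metformin.",
]
embeddings = model.encode(sentences)               # two 768-dimensional vectors
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the pair
```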
+} \ No newline at end of file diff --git a/data/model_data_json/Nitral-AI_Captain-Eris_Violet-V0.420-12B.json b/data/model_data_json/Nitral-AI_Captain-Eris_Violet-V0.420-12B.json new file mode 100644 index 0000000000000000000000000000000000000000..97b3f49eca893335380b089b79ff33ea224f9861 --- /dev/null +++ b/data/model_data_json/Nitral-AI_Captain-Eris_Violet-V0.420-12B.json @@ -0,0 +1,25 @@ +{ + "model_id": "Nitral-AI/Captain-Eris_Violet-V0.420-12B", + "downloads": 325400, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "mergekit", + "merge", + "conversational", + "en", + "base_model:Epiculous/Violet_Twilight-v0.2", + "base_model:merge:Epiculous/Violet_Twilight-v0.2", + "base_model:Nitral-AI/Captain_BMO-12B", + "base_model:merge:Nitral-AI/Captain_BMO-12B", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: - Epiculous/Violet_Twilight-v0.2 - Nitral-AI/Captain_BMO-12B library_name: transformers tags: - mergekit - merge license: other language: - en --- # Instruct/Context import + Textgen preset combined available: Here ## Quants: 4bpw-exl2 Thanks to Mradermaacher for: GGUF Lewdiculus: Arm Compatible GGUF's<3 ## Original Models used in the merge: Epiculous/Violet_Twilight-v0.2 Nitral-AI/Captain_BMO-12B ### The following YAML configuration was used to produce this model:", + "model_explanation_gemini": "Generates text and instructions by merging 'Epiculous/Violet_Twilight-v0.2' and 'Nitral-AI/Captain_BMO-12B' models for enhanced context-aware responses." +} \ No newline at end of file diff --git a/data/model_data_json/Noor0_SentimentArEng.json b/data/model_data_json/Noor0_SentimentArEng.json new file mode 100644 index 0000000000000000000000000000000000000000..0b12aa3399265f34403ddac9d3428e67fbc5f358 --- /dev/null +++ b/data/model_data_json/Noor0_SentimentArEng.json @@ -0,0 +1,19 @@ +{ + "model_id": "Noor0/SentimentArEng", + "downloads": 95243, + "tags": [ + "transformers", + "safetensors", + "xlm-roberta", + "text-classification", + "ar", + "en", + "base_model:cardiffnlp/twitter-xlm-roberta-base-sentiment", + "base_model:finetune:cardiffnlp/twitter-xlm-roberta-base-sentiment", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: cardiffnlp/twitter-xlm-roberta-base-sentiment metrics: - accuracy model-index: - name: result results: [] language: - ar - en library_name: transformers pipeline_tag: text-classification --- # SentimentArEng This model is a fine-tuned version of cardiffnlp/twitter-xlm-roberta-base-sentiment on an Arabic and English sentiment dataset.
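A minimal sketch of pipeline inference for SentimentArEng, matching the example output format shown in the card below (the input sentences are illustrative assumptions):

```python
from transformers import pipeline

clf = pipeline("text-classification", model="Noor0/SentimentArEng")

# Arabic: "The service was very bad" (illustrative)
print(clf("الخدمة كانت سيئة للغاية"))    # e.g. [{'label': 'negative', 'score': ...}]
print(clf("The service was excellent"))  # English input works with the same model
```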
It achieves the following results on the evaluation set: - Loss: 0.502831 - Accuracy: 0.798512 # Inference with pipeline Example output: - [{'label': 'negative', 'score': 0.9905518293380737}] ## Training and evaluation data - Training set: 114,885 records - Evaluation set: 12,765 records ## Training procedure | Training Loss | Epoch | Validation Loss | Accuracy | |:-------------:|:-----:|:---------------:|:--------:| | 0.4511 | 2.0 | 0.502831 | 0.7985 | | 0.3655 | 3.0 | 0.576118 | 0.7954 | | 0.3019 | 4.0 | 0.625391 | 0.7985 | | 0.2466 | 5.0 | 0.835689 | 0.7979 | ### Training hyperparameters The following hyperparameters were used during training: - learning_rate=2e-5 - num_train_epochs=20 - weight_decay=0.01 - batch_size=16 ### Framework versions - Transformers 4.35.0 - Pytorch 2.0.0 - Datasets 2.11.0 - Tokenizers 0.14.1", + "model_explanation_gemini": "Classifies text sentiment (negative/positive) in Arabic and English." +} \ No newline at end of file diff --git a/data/model_data_json/NousResearch_Llama-2-7b-chat-hf.json b/data/model_data_json/NousResearch_Llama-2-7b-chat-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..8b97be9f379197496cf397a1def3e998bab812f1 --- /dev/null +++ b/data/model_data_json/NousResearch_Llama-2-7b-chat-hf.json @@ -0,0 +1,20 @@ +{ + "model_id": "NousResearch/Llama-2-7b-chat-hf", + "downloads": 85709, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "llama-2", + "en", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- extra_gated_heading: Access Llama 2 on Hugging Face extra_gated_description: >- This is a form to enable access to Llama 2 on Hugging Face after you have been granted access from Meta. Please visit the Meta website and accept our license terms and acceptable use policy before submitting this form. Requests will be processed in 1-2 days. extra_gated_button_content: Submit extra_gated_fields: I agree to share my name, email address and username with Meta and confirm that I have already been granted download access on the Meta website: checkbox language: - en pipeline_tag: text-generation inference: false tags: - facebook - meta - pytorch - llama - llama-2 --- # **Llama 2** Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom. ## Model Details *Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the website and accept our License before requesting access here.* Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. **Model Developers** Meta **Variations** Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations. **Input** Models input text only.
**Output** Models generate text only. **Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. ||Training Data|Params|Content Length|GQA|Tokens|LR| |---|---|---|---|---|---|---| |Llama 2|*A new mix of publicly available online data*|7B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|13B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|70B|4k|✔|2.0T|1.5 x 10^-4| *Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens. Bigger models - 70B -- use Grouped-Query Attention (GQA) for improved inference scalability. **Model Dates** Llama 2 was trained between January 2023 and July 2023. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at: ## Intended Use **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. To get the expected features and performance for the chat versions, a specific formatting needs to be followed, including the [INST] and <<SYS>> tags, BOS and EOS tokens, and the whitespaces and breaklines in between (we recommend calling strip() on inputs to avoid double-spaces). See our reference code on GitHub for details. **Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program. ||Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO2eq)| |---|---|---|---| |Llama 2 7B|184320|400|31.22| |Llama 2 13B|368640|400|62.44| |Llama 2 70B|1720320|400|291.42| |Total|3311616||539.00| **CO2 emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023.
## Evaluation Results In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library. |Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval| |---|---|---|---|---|---|---|---|---|---| |Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9| |Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9| |Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7| |Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6| |Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3| |Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1| |Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**| **Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *MATH:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1. |||TruthfulQA|Toxigen| |---|---|---|---| |Llama 1|7B|27.42|23.00| |Llama 1|13B|41.74|23.08| |Llama 1|33B|44.19|22.57| |Llama 1|65B|48.71|21.77| |Llama 2|7B|33.29|**21.25**| |Llama 2|13B|41.86|26.10| |Llama 2|70B|**50.18**|24.60| **Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better). |||TruthfulQA|Toxigen| |---|---|---|---| |Llama-2-Chat|7B|57.04|**0.00**| |Llama-2-Chat|13B|62.18|**0.00**| |Llama-2-Chat|70B|**64.14**|0.01| **Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above. ## Ethical Considerations and Limitations Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see the Responsible Use Guide available at ## Reporting Issues Please report any software “bug,” or other problems with the models through one of the following means: - Reporting issues with the model: github.com/facebookresearch/llama - Reporting problematic content generated by the model: developers.facebook.com/llama_output_feedback - Reporting bugs and security concerns: facebook.com/whitehat/info ## Llama Model Index |Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf| |---|---|---|---|---| |7B| Link | Link | Link | Link| |13B| Link | Link | Link | Link| |70B| Link | Link | Link | Link|", + "model_explanation_gemini": "Llama-2-7b-chat-hf is a 7-billion-parameter fine-tuned generative text model optimized for dialogue, designed for assistant-like chat applications in English."
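A hedged sketch of the chat prompt formatting referenced under "Intended Use" above: a single-turn [INST] block with an optional <<SYS>> system prompt. The system and user strings are illustrative; see Meta's reference code for the canonical multi-turn implementation.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "NousResearch/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")  # device_map needs accelerate

system = "You are a helpful, honest assistant."                      # illustrative system prompt
user = " Explain what a transformer is in one sentence. ".strip()    # strip() as the card recommends

# [INST]/<<SYS>> layout; the tokenizer prepends the BOS token itself
prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```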
+} \ No newline at end of file diff --git a/data/model_data_json/NousResearch_Llama-2-7b-hf.json b/data/model_data_json/NousResearch_Llama-2-7b-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..176263cb4f610c065675d1a7e7638d4d34079214 --- /dev/null +++ b/data/model_data_json/NousResearch_Llama-2-7b-hf.json @@ -0,0 +1,20 @@ +{ + "model_id": "NousResearch/Llama-2-7b-hf", + "downloads": 107875, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "llama-2", + "en", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- extra_gated_heading: Access Llama 2 on Hugging Face extra_gated_description: >- This is a form to enable access to Llama 2 on Hugging Face after you have been granted access from Meta. Please visit the Meta website and accept our license terms and acceptable use policy before submitting this form. Requests will be processed in 1-2 days. extra_gated_button_content: Submit extra_gated_fields: I agree to share my name, email address and username with Meta and confirm that I have already been granted download access on the Meta website: checkbox language: - en pipeline_tag: text-generation inference: false tags: - facebook - meta - pytorch - llama - llama-2 --- # **Llama 2** Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B pretrained model, converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom. ## Model Details *Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the website and accept our License before requesting access here.* Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. **Model Developers** Meta **Variations** Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations. **Input** Models input text only. **Output** Models generate text only. **Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. ||Training Data|Params|Content Length|GQA|Tokens|LR| |---|---|---|---|---|---|---| |Llama 2|*A new mix of publicly available online data*|7B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|13B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|70B|4k|✔|2.0T|1.5 x 10^-4| *Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens. Bigger models - 70B -- use Grouped-Query Attention (GQA) for improved inference scalability. **Model Dates** Llama 2 was trained between January 2023 and July 2023. **Status** This is a static model trained on an offline dataset.
Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available from Meta. ## Intended Use **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. To get the expected features and performance for the chat versions, a specific formatting needs to be followed, including the `INST` and `<<SYS>>` tags, `BOS` and `EOS` tokens, and the whitespaces and breaklines in between (we recommend calling `strip()` on inputs to avoid double-spaces). See our reference code in github for details; a sketch of this prompt format is included after this model entry. **Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program. |Model|Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO2eq)| |---|---|---|---| |Llama 2 7B|184320|400|31.22| |Llama 2 13B|368640|400|62.44| |Llama 2 70B|1720320|400|291.42| |Total|3311616||539.00| **CO2 emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023. ## Evaluation Results In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library. |Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval| |---|---|---|---|---|---|---|---|---|---| |Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9| |Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9| |Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7| |Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6| |Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3| |Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1| |Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**| **Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA.
We report 7-shot results for CommonsenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *Math:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1. |Model|Size|TruthfulQA|ToxiGen| |---|---|---|---| |Llama 1|7B|27.42|23.00| |Llama 1|13B|41.74|23.08| |Llama 1|33B|44.19|22.57| |Llama 1|65B|48.71|21.77| |Llama 2|7B|33.29|**21.25**| |Llama 2|13B|41.86|26.10| |Llama 2|70B|**50.18**|24.60| **Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better). |Model|Size|TruthfulQA|ToxiGen| |---|---|---|---| |Llama-2-Chat|7B|57.04|**0.00**| |Llama-2-Chat|13B|62.18|**0.00**| |Llama-2-Chat|70B|**64.14**|0.01| **Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above. ## Ethical Considerations and Limitations Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see the Responsible Use Guide. ## Reporting Issues Please report any software “bug,” or other problems with the models through one of the following means: - Reporting issues with the model: github.com/facebookresearch/llama - Reporting problematic content generated by the model: developers.facebook.com/llama_output_feedback - Reporting bugs and security concerns: facebook.com/whitehat/info ## Llama Model Index |Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf| |---|---|---|---|---| |7B| Link | Link | Link | Link| |13B| Link | Link | Link | Link| |70B| Link | Link | Link | Link|", + "model_explanation_gemini": "Llama-2-7b-hf is a 7-billion-parameter pretrained generative text model designed for natural language generation tasks, part of Meta's Llama 2 family of large language models."
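The Intended Use section above describes the chat prompt format in prose (the tag names were partly lost in scraping). A minimal sketch of that template, assuming the standard Llama-2-Chat convention of `[INST]`/`[/INST]` blocks with an optional `<<SYS>>` system section; `build_llama2_prompt` is our illustrative helper name, not the card's reference code:

```python
# Sketch of the Llama-2-Chat single-turn prompt template. The tokenizer
# normally prepends BOS and appends EOS, so only the tags are built here.
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    # strip() the inputs, as the card recommends, to avoid double spaces.
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt.strip()}\n"
        "<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )

print(build_llama2_prompt("You are a concise assistant.", "What is RLHF?"))
```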
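The CO2 table in the card above can be sanity-checked with simple arithmetic: GPU-hours times per-GPU power gives energy, and dividing the reported emissions by that energy yields the implied carbon intensity. A short check with numbers copied from the table (note the three released rows sum to about 2.27M GPU-hours, less than the 3.3M family total, which presumably also covers unreleased variants):

```python
# Energy = GPU-hours x 0.4 kW; implied intensity = tCO2eq / MWh.
rows = {"7B": (184320, 31.22), "13B": (368640, 62.44), "70B": (1720320, 291.42)}
power_kw = 0.4  # 400 W per A100-80GB, from the table

for name, (gpu_hours, tco2) in rows.items():
    energy_mwh = gpu_hours * power_kw / 1000.0
    print(f"{name}: {energy_mwh:>8.1f} MWh -> {tco2 / energy_mwh:.3f} tCO2eq/MWh")
# Each released row implies roughly 0.423 tCO2eq/MWh.
```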
+} \ No newline at end of file diff --git a/data/model_data_json/NovaSearch_stella_en_400M_v5.json b/data/model_data_json/NovaSearch_stella_en_400M_v5.json new file mode 100644 index 0000000000000000000000000000000000000000..a0cef408fc853e770b10a1cbe853b0feafc4b89b --- /dev/null +++ b/data/model_data_json/NovaSearch_stella_en_400M_v5.json @@ -0,0 +1,25 @@ +{ + "model_id": "NovaSearch/stella_en_400M_v5", + "downloads": 798809, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "new", + "feature-extraction", + "mteb", + "transformers", + "sentence-similarity", + "custom_code", + "arxiv:2412.19048", + "arxiv:2205.13147", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- model-index: - name: stella_en_400M_v5 results: - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 92.35820895522387 - type: ap value: 70.81322736988783 - type: ap_weighted value: 70.81322736988783 - type: f1 value: 88.9505466159595 - type: f1_weighted value: 92.68630932872613 - type: main_score value: 92.35820895522387 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 97.1945 - type: ap value: 96.08192192244094 - type: ap_weighted value: 96.08192192244094 - type: f1 value: 97.1936887167346 - type: f1_weighted value: 97.1936887167346 - type: main_score value: 97.1945 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 59.528000000000006 - type: f1 value: 59.21016819840188 - type: f1_weighted value: 59.21016819840188 - type: main_score value: 59.528000000000006 task: type: Classification - dataset: config: default name: MTEB ArguAna revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: main_score value: 64.24 - type: map_at_1 value: 40.398 - type: map_at_10 value: 56.215 - type: map_at_100 value: 56.833999999999996 - type: map_at_1000 value: 56.835 - type: map_at_20 value: 56.747 - type: map_at_3 value: 52.181 - type: map_at_5 value: 54.628 - type: mrr_at_1 value: 41.25177809388336 - type: mrr_at_10 value: 56.570762491815216 - type: mrr_at_100 value: 57.17548614361504 - type: mrr_at_1000 value: 57.176650626377466 - type: mrr_at_20 value: 57.08916253512566 - type: mrr_at_3 value: 52.47747747747754 - type: mrr_at_5 value: 54.94547178757718 - type: nauc_map_at_1000_diff1 value: 22.408086887100158 - type: nauc_map_at_1000_max value: -8.730419096847543 - type: nauc_map_at_1000_std value: -17.789262741255737 - type: nauc_map_at_100_diff1 value: 22.407371684274025 - type: nauc_map_at_100_max value: -8.732263549026266 - type: nauc_map_at_100_std value: -17.79550515579994 - type: nauc_map_at_10_diff1 value: 21.925005073301246 - type: nauc_map_at_10_max value: -8.990323944492134 - type: nauc_map_at_10_std value: -18.199246301671458 - type: nauc_map_at_1_diff1 value: 26.23276644969203 - type: nauc_map_at_1_max value: -12.376511389571245 - type: nauc_map_at_1_std value: -18.11411715207284 - type: nauc_map_at_20_diff1 value: 22.32455790850922 - type: nauc_map_at_20_max value: 
-8.664671547236034 - type: nauc_map_at_20_std value: -17.8290016125137 - type: nauc_map_at_3_diff1 value: 22.395462147465064 - type: nauc_map_at_3_max value: -8.206580750918844 - type: nauc_map_at_3_std value: -17.604490446911484 - type: nauc_map_at_5_diff1 value: 21.95307379904799 - type: nauc_map_at_5_max value: -8.03958102978443 - type: nauc_map_at_5_std value: -17.36578866595004 - type: nauc_mrr_at_1000_diff1 value: 20.124236798365587 - type: nauc_mrr_at_1000_max value: -9.587376069575898 - type: nauc_mrr_at_1000_std value: -17.79191612151833 - type: nauc_mrr_at_100_diff1 value: 20.123612603474033 - type: nauc_mrr_at_100_max value: -9.589187218607831 - type: nauc_mrr_at_100_std value: -17.7981617777748 - type: nauc_mrr_at_10_diff1 value: 19.723683875738075 - type: nauc_mrr_at_10_max value: -9.774151729178815 - type: nauc_mrr_at_10_std value: -18.168668675495162 - type: nauc_mrr_at_1_diff1 value: 23.945332059908132 - type: nauc_mrr_at_1_max value: -12.260461466152819 - type: nauc_mrr_at_1_std value: -18.007194922921148 - type: nauc_mrr_at_20_diff1 value: 20.04819461810257 - type: nauc_mrr_at_20_max value: -9.518368283588936 - type: nauc_mrr_at_20_std value: -17.831608149836136 - type: nauc_mrr_at_3_diff1 value: 19.8571785245832 - type: nauc_mrr_at_3_max value: -9.464375021240478 - type: nauc_mrr_at_3_std value: -17.728533927330453 - type: nauc_mrr_at_5_diff1 value: 19.670313652167827 - type: nauc_mrr_at_5_max value: -8.966372585728434 - type: nauc_mrr_at_5_std value: -17.468955834324817 - type: nauc_ndcg_at_1000_diff1 value: 21.863049281767417 - type: nauc_ndcg_at_1000_max value: -8.18698520924057 - type: nauc_ndcg_at_1000_std value: -17.634483364794804 - type: nauc_ndcg_at_100_diff1 value: 21.849924385738586 - type: nauc_ndcg_at_100_max value: -8.226437560889345 - type: nauc_ndcg_at_100_std value: -17.774648478087002 - type: nauc_ndcg_at_10_diff1 value: 19.888395590413573 - type: nauc_ndcg_at_10_max value: -8.968706085632382 - type: nauc_ndcg_at_10_std value: -19.31386964628115 - type: nauc_ndcg_at_1_diff1 value: 26.23276644969203 - type: nauc_ndcg_at_1_max value: -12.376511389571245 - type: nauc_ndcg_at_1_std value: -18.11411715207284 - type: nauc_ndcg_at_20_diff1 value: 21.38413342416933 - type: nauc_ndcg_at_20_max value: -7.636238194084164 - type: nauc_ndcg_at_20_std value: -17.946390844693028 - type: nauc_ndcg_at_3_diff1 value: 21.29169165029195 - type: nauc_ndcg_at_3_max value: -6.793840499730093 - type: nauc_ndcg_at_3_std value: -17.52359001586737 - type: nauc_ndcg_at_5_diff1 value: 20.238297656671364 - type: nauc_ndcg_at_5_max value: -6.424992706950072 - type: nauc_ndcg_at_5_std value: -17.082391132291356 - type: nauc_precision_at_1000_diff1 value: -7.05195108528572 - type: nauc_precision_at_1000_max value: 34.439879624882145 - type: nauc_precision_at_1000_std value: 68.72436351659353 - type: nauc_precision_at_100_diff1 value: -2.769464113932605 - type: nauc_precision_at_100_max value: 9.89562961226698 - type: nauc_precision_at_100_std value: -0.5880967482224028 - type: nauc_precision_at_10_diff1 value: 2.1371544726832323 - type: nauc_precision_at_10_max value: -11.93051325147756 - type: nauc_precision_at_10_std value: -30.83144187392059 - type: nauc_precision_at_1_diff1 value: 26.23276644969203 - type: nauc_precision_at_1_max value: -12.376511389571245 - type: nauc_precision_at_1_std value: -18.11411715207284 - type: nauc_precision_at_20_diff1 value: 3.780146814257504 - type: nauc_precision_at_20_max value: 17.06527540214615 - type: nauc_precision_at_20_std value: 
-20.36832563035565 - type: nauc_precision_at_3_diff1 value: 17.63894384012077 - type: nauc_precision_at_3_max value: -2.0220490624638887 - type: nauc_precision_at_3_std value: -17.285601413493918 - type: nauc_precision_at_5_diff1 value: 12.557855071944601 - type: nauc_precision_at_5_max value: 0.5840236463956658 - type: nauc_precision_at_5_std value: -15.827224420217846 - type: nauc_recall_at_1000_diff1 value: -7.051951085286463 - type: nauc_recall_at_1000_max value: 34.43987962487738 - type: nauc_recall_at_1000_std value: 68.724363516591 - type: nauc_recall_at_100_diff1 value: -2.769464113930314 - type: nauc_recall_at_100_max value: 9.895629612270017 - type: nauc_recall_at_100_std value: -0.58809674821745 - type: nauc_recall_at_10_diff1 value: 2.1371544726834495 - type: nauc_recall_at_10_max value: -11.930513251477253 - type: nauc_recall_at_10_std value: -30.83144187392047 - type: nauc_recall_at_1_diff1 value: 26.23276644969203 - type: nauc_recall_at_1_max value: -12.376511389571245 - type: nauc_recall_at_1_std value: -18.11411715207284 - type: nauc_recall_at_20_diff1 value: 3.7801468142575922 - type: nauc_recall_at_20_max value: 17.0652754021456 - type: nauc_recall_at_20_std value: -20.36832563035559 - type: nauc_recall_at_3_diff1 value: 17.63894384012074 - type: nauc_recall_at_3_max value: -2.02204906246383 - type: nauc_recall_at_3_std value: -17.28560141349386 - type: nauc_recall_at_5_diff1 value: 12.55785507194463 - type: nauc_recall_at_5_max value: 0.5840236463957296 - type: nauc_recall_at_5_std value: -15.827224420217856 - type: ndcg_at_1 value: 40.398 - type: ndcg_at_10 value: 64.24 - type: ndcg_at_100 value: 66.631 - type: ndcg_at_1000 value: 66.65100000000001 - type: ndcg_at_20 value: 66.086 - type: ndcg_at_3 value: 55.938 - type: ndcg_at_5 value: 60.370000000000005 - type: precision_at_1 value: 40.398 - type: precision_at_10 value: 8.962 - type: precision_at_100 value: 0.9950000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.836 - type: precision_at_3 value: 22.262 - type: precision_at_5 value: 15.519 - type: recall_at_1 value: 40.398 - type: recall_at_10 value: 89.616 - type: recall_at_100 value: 99.502 - type: recall_at_1000 value: 99.644 - type: recall_at_20 value: 96.72800000000001 - type: recall_at_3 value: 66.78500000000001 - type: recall_at_5 value: 77.596 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: main_score value: 55.1564333205451 - type: v_measure value: 55.1564333205451 - type: v_measure_std value: 14.696883012214512 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: main_score value: 49.823698316694795 - type: v_measure value: 49.823698316694795 - type: v_measure_std value: 14.951660654298186 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: main_score value: 66.15294503553424 - type: map value: 66.15294503553424 - type: mrr value: 78.53438420612935 - type: nAUC_map_diff1 value: 12.569697092717997 - type: nAUC_map_max value: 21.50670312412572 - type: nAUC_map_std value: 16.943786429229064 - type: nAUC_mrr_diff1 value: 15.590272897361238 - type: nAUC_mrr_max value: 34.96072022474653 - type: 
nAUC_mrr_std value: 21.649217605241045 task: type: Reranking - dataset: config: default name: MTEB BIOSSES revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: cosine_pearson value: 85.7824546319275 - type: cosine_spearman value: 83.29587385660628 - type: euclidean_pearson value: 84.58764190565167 - type: euclidean_spearman value: 83.30069324352772 - type: main_score value: 83.29587385660628 - type: manhattan_pearson value: 84.95996839947179 - type: manhattan_spearman value: 83.87480271054358 - type: pearson value: 85.7824546319275 - type: spearman value: 83.29587385660628 task: type: STS - dataset: config: default name: MTEB Banking77Classification revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 89.30194805194806 - type: f1 value: 89.26182507266391 - type: f1_weighted value: 89.26182507266391 - type: main_score value: 89.30194805194806 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: main_score value: 50.67972171889736 - type: v_measure value: 50.67972171889736 - type: v_measure_std value: 0.7687409980036303 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: main_score value: 45.80539715556144 - type: v_measure value: 45.80539715556144 - type: v_measure_std value: 0.9601346216579142 task: type: Clustering - dataset: config: default name: MTEB CQADupstackRetrieval revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack metrics: - type: main_score value: 44.361250000000005 - type: map_at_1 value: 28.304499999999997 - type: map_at_10 value: 38.54841666666666 - type: map_at_100 value: 39.83141666666667 - type: map_at_1000 value: 39.944750000000006 - type: map_at_20 value: 39.25341666666667 - type: map_at_3 value: 35.406749999999995 - type: map_at_5 value: 37.15558333333333 - type: mrr_at_1 value: 34.09077232860122 - type: mrr_at_10 value: 43.15445393211421 - type: mrr_at_100 value: 43.98645286848257 - type: mrr_at_1000 value: 44.037631313469404 - type: mrr_at_20 value: 43.64045813249614 - type: mrr_at_3 value: 40.674138648480486 - type: mrr_at_5 value: 42.106251182620255 - type: nauc_map_at_1000_diff1 value: 46.250011739434996 - type: nauc_map_at_1000_max value: 30.13664446260598 - type: nauc_map_at_1000_std value: 5.422301791618935 - type: nauc_map_at_100_diff1 value: 46.253631351999395 - type: nauc_map_at_100_max value: 30.12612918885181 - type: nauc_map_at_100_std value: 5.367077019987172 - type: nauc_map_at_10_diff1 value: 46.328171341741346 - type: nauc_map_at_10_max value: 29.80274612581464 - type: nauc_map_at_10_std value: 4.62996685176396 - type: nauc_map_at_1_diff1 value: 51.56118117729493 - type: nauc_map_at_1_max value: 27.94885243863768 - type: nauc_map_at_1_std value: 1.700366508927356 - type: nauc_map_at_20_diff1 value: 46.286750260299094 - type: nauc_map_at_20_max value: 29.979205290353278 - type: nauc_map_at_20_std value: 5.010588412441873 - type: nauc_map_at_3_diff1 value: 47.10018183619064 - type: nauc_map_at_3_max value: 29.062318206078753 - type: nauc_map_at_3_std value: 3.2235696254694197 - type: nauc_map_at_5_diff1 value: 46.41971733050039 - type: nauc_map_at_5_max value: 29.456798617695657 - type: nauc_map_at_5_std value: 
4.0921691023077145 - type: nauc_mrr_at_1000_diff1 value: 45.88888977975723 - type: nauc_mrr_at_1000_max value: 32.162138978089544 - type: nauc_mrr_at_1000_std value: 6.2811943424217915 - type: nauc_mrr_at_100_diff1 value: 45.87480433011124 - type: nauc_mrr_at_100_max value: 32.16011334212834 - type: nauc_mrr_at_100_std value: 6.2865717772421785 - type: nauc_mrr_at_10_diff1 value: 45.849652904658825 - type: nauc_mrr_at_10_max value: 32.13847916232293 - type: nauc_mrr_at_10_std value: 6.105718728141999 - type: nauc_mrr_at_1_diff1 value: 51.013730325062156 - type: nauc_mrr_at_1_max value: 32.77457396492779 - type: nauc_mrr_at_1_std value: 4.415684893471724 - type: nauc_mrr_at_20_diff1 value: 45.86663046255274 - type: nauc_mrr_at_20_max value: 32.15219360697865 - type: nauc_mrr_at_20_std value: 6.19603046412763 - type: nauc_mrr_at_3_diff1 value: 46.522376582423185 - type: nauc_mrr_at_3_max value: 32.18259009733714 - type: nauc_mrr_at_3_std value: 5.288000648220897 - type: nauc_mrr_at_5_diff1 value: 45.86611481369745 - type: nauc_mrr_at_5_max value: 32.14261639054921 - type: nauc_mrr_at_5_std value: 5.8811238177073735 - type: nauc_ndcg_at_1000_diff1 value: 44.5055097547565 - type: nauc_ndcg_at_1000_max value: 31.149682057975458 - type: nauc_ndcg_at_1000_std value: 8.157937194901333 - type: nauc_ndcg_at_100_diff1 value: 44.12398363638596 - type: nauc_ndcg_at_100_max value: 30.878064321409994 - type: nauc_ndcg_at_100_std value: 8.40493441452808 - type: nauc_ndcg_at_10_diff1 value: 44.200093505221474 - type: nauc_ndcg_at_10_max value: 30.15267107733158 - type: nauc_ndcg_at_10_std value: 6.407495361566107 - type: nauc_ndcg_at_1_diff1 value: 51.013730325062156 - type: nauc_ndcg_at_1_max value: 32.77457396492779 - type: nauc_ndcg_at_1_std value: 4.415684893471724 - type: nauc_ndcg_at_20_diff1 value: 44.16988321564116 - type: nauc_ndcg_at_20_max value: 30.333532500651213 - type: nauc_ndcg_at_20_std value: 7.10024701386895 - type: nauc_ndcg_at_3_diff1 value: 45.35982873879988 - type: nauc_ndcg_at_3_max value: 30.288312457948702 - type: nauc_ndcg_at_3_std value: 4.653900898293395 - type: nauc_ndcg_at_5_diff1 value: 44.324558115380185 - type: nauc_ndcg_at_5_max value: 30.048149698941373 - type: nauc_ndcg_at_5_std value: 5.6684459618413205 - type: nauc_precision_at_1000_diff1 value: -7.282175798304458 - type: nauc_precision_at_1000_max value: 7.820142031765352 - type: nauc_precision_at_1000_std value: 11.736131836431172 - type: nauc_precision_at_100_diff1 value: 1.0222940256506976 - type: nauc_precision_at_100_max value: 16.12346497070298 - type: nauc_precision_at_100_std value: 18.202607395247874 - type: nauc_precision_at_10_diff1 value: 18.289439185857837 - type: nauc_precision_at_10_max value: 26.116517399154375 - type: nauc_precision_at_10_std value: 13.921214069982302 - type: nauc_precision_at_1_diff1 value: 51.013730325062156 - type: nauc_precision_at_1_max value: 32.77457396492779 - type: nauc_precision_at_1_std value: 4.415684893471724 - type: nauc_precision_at_20_diff1 value: 12.365165405210886 - type: nauc_precision_at_20_max value: 22.946297258937367 - type: nauc_precision_at_20_std value: 16.13862870358933 - type: nauc_precision_at_3_diff1 value: 32.063423642849685 - type: nauc_precision_at_3_max value: 30.140965811989407 - type: nauc_precision_at_3_std value: 8.501746262550146 - type: nauc_precision_at_5_diff1 value: 24.777203357717948 - type: nauc_precision_at_5_max value: 28.401579566848472 - type: nauc_precision_at_5_std value: 11.643246774390914 - type: nauc_recall_at_1000_diff1 value: 
30.04216463401409 - type: nauc_recall_at_1000_max value: 34.98067760563842 - type: nauc_recall_at_1000_std value: 48.01453905250591 - type: nauc_recall_at_100_diff1 value: 31.193415507513972 - type: nauc_recall_at_100_max value: 28.69740149270981 - type: nauc_recall_at_100_std value: 25.20960758920368 - type: nauc_recall_at_10_diff1 value: 36.18870823636506 - type: nauc_recall_at_10_max value: 26.005625231341238 - type: nauc_recall_at_10_std value: 8.891983977041376 - type: nauc_recall_at_1_diff1 value: 51.56118117729493 - type: nauc_recall_at_1_max value: 27.94885243863768 - type: nauc_recall_at_1_std value: 1.700366508927356 - type: nauc_recall_at_20_diff1 value: 34.93996118564803 - type: nauc_recall_at_20_max value: 26.149961715956138 - type: nauc_recall_at_20_std value: 12.0657502367633 - type: nauc_recall_at_3_diff1 value: 40.80743946709512 - type: nauc_recall_at_3_max value: 26.443127773025783 - type: nauc_recall_at_3_std value: 3.7011448604241477 - type: nauc_recall_at_5_diff1 value: 37.608535157055776 - type: nauc_recall_at_5_max value: 26.168016189725822 - type: nauc_recall_at_5_std value: 6.344191564595316 - type: ndcg_at_1 value: 34.09083333333333 - type: ndcg_at_10 value: 44.361250000000005 - type: ndcg_at_100 value: 49.586166666666664 - type: ndcg_at_1000 value: 51.623583333333336 - type: ndcg_at_20 value: 46.40158333333333 - type: ndcg_at_3 value: 39.27733333333333 - type: ndcg_at_5 value: 41.662333333333336 - type: precision_at_1 value: 34.09083333333333 - type: precision_at_10 value: 7.957000000000002 - type: precision_at_100 value: 1.2521666666666669 - type: precision_at_1000 value: 0.16125 - type: precision_at_20 value: 4.6755 - type: precision_at_3 value: 18.402083333333334 - type: precision_at_5 value: 13.104333333333335 - type: recall_at_1 value: 28.304499999999997 - type: recall_at_10 value: 56.80666666666667 - type: recall_at_100 value: 79.66208333333334 - type: recall_at_1000 value: 93.6455 - type: recall_at_20 value: 64.2495 - type: recall_at_3 value: 42.431333333333335 - type: recall_at_5 value: 48.665416666666665 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: main_score value: 43.525999999999996 - type: map_at_1 value: 19.291 - type: map_at_10 value: 33.471000000000004 - type: map_at_100 value: 35.388999999999996 - type: map_at_1000 value: 35.568 - type: map_at_20 value: 34.496 - type: map_at_3 value: 28.713 - type: map_at_5 value: 31.384 - type: mrr_at_1 value: 43.77850162866449 - type: mrr_at_10 value: 56.28576598934912 - type: mrr_at_100 value: 56.8588518168194 - type: mrr_at_1000 value: 56.878236725973544 - type: mrr_at_20 value: 56.6409328120183 - type: mrr_at_3 value: 53.56134636264935 - type: mrr_at_5 value: 55.27795874049956 - type: nauc_map_at_1000_diff1 value: 27.262513153363876 - type: nauc_map_at_1000_max value: 40.099398684385584 - type: nauc_map_at_1000_std value: 18.847812394005512 - type: nauc_map_at_100_diff1 value: 27.238993503030745 - type: nauc_map_at_100_max value: 40.07730434492169 - type: nauc_map_at_100_std value: 18.795349250833684 - type: nauc_map_at_10_diff1 value: 27.70929180366227 - type: nauc_map_at_10_max value: 39.55987024970173 - type: nauc_map_at_10_std value: 17.214881544648996 - type: nauc_map_at_1_diff1 value: 43.34155892182403 - type: nauc_map_at_1_max value: 38.23324890148018 - type: nauc_map_at_1_std value: 6.0781444393516075 - type: nauc_map_at_20_diff1 value: 27.311577477800103 - type: 
nauc_map_at_20_max value: 39.624414083413456 - type: nauc_map_at_20_std value: 18.149811054163287 - type: nauc_map_at_3_diff1 value: 30.475965062734367 - type: nauc_map_at_3_max value: 38.49324825043695 - type: nauc_map_at_3_std value: 13.357656038648487 - type: nauc_map_at_5_diff1 value: 28.425110095017747 - type: nauc_map_at_5_max value: 39.017894870747796 - type: nauc_map_at_5_std value: 15.543817194122564 - type: nauc_mrr_at_1000_diff1 value: 33.16689354701644 - type: nauc_mrr_at_1000_max value: 41.70755363247148 - type: nauc_mrr_at_1000_std value: 24.61667417463176 - type: nauc_mrr_at_100_diff1 value: 33.147229262917506 - type: nauc_mrr_at_100_max value: 41.712455697170725 - type: nauc_mrr_at_100_std value: 24.6418922043652 - type: nauc_mrr_at_10_diff1 value: 32.94185191112572 - type: nauc_mrr_at_10_max value: 41.64272730141954 - type: nauc_mrr_at_10_std value: 24.663391015702707 - type: nauc_mrr_at_1_diff1 value: 39.571969559016395 - type: nauc_mrr_at_1_max value: 39.396249211263495 - type: nauc_mrr_at_1_std value: 16.984149923258357 - type: nauc_mrr_at_20_diff1 value: 33.10040770334742 - type: nauc_mrr_at_20_max value: 41.807565560083034 - type: nauc_mrr_at_20_std value: 24.8064180365271 - type: nauc_mrr_at_3_diff1 value: 33.065406161485704 - type: nauc_mrr_at_3_max value: 41.049510969934694 - type: nauc_mrr_at_3_std value: 23.18371458928609 - type: nauc_mrr_at_5_diff1 value: 33.2389593543916 - type: nauc_mrr_at_5_max value: 41.629486918949915 - type: nauc_mrr_at_5_std value: 24.5777253036149 - type: nauc_ndcg_at_1000_diff1 value: 25.868840609197637 - type: nauc_ndcg_at_1000_max value: 42.79564910784761 - type: nauc_ndcg_at_1000_std value: 27.035091271680113 - type: nauc_ndcg_at_100_diff1 value: 25.019789319579942 - type: nauc_ndcg_at_100_max value: 42.482345143533735 - type: nauc_ndcg_at_100_std value: 26.76872010731345 - type: nauc_ndcg_at_10_diff1 value: 25.949464660653238 - type: nauc_ndcg_at_10_max value: 40.79769544643906 - type: nauc_ndcg_at_10_std value: 22.486116508973204 - type: nauc_ndcg_at_1_diff1 value: 39.571969559016395 - type: nauc_ndcg_at_1_max value: 39.396249211263495 - type: nauc_ndcg_at_1_std value: 16.984149923258357 - type: nauc_ndcg_at_20_diff1 value: 25.173455685962214 - type: nauc_ndcg_at_20_max value: 40.88873540662413 - type: nauc_ndcg_at_20_std value: 24.4451041955519 - type: nauc_ndcg_at_3_diff1 value: 28.185416070726333 - type: nauc_ndcg_at_3_max value: 39.10600031163912 - type: nauc_ndcg_at_3_std value: 18.42694044215541 - type: nauc_ndcg_at_5_diff1 value: 27.112647584005583 - type: nauc_ndcg_at_5_max value: 40.154045682322526 - type: nauc_ndcg_at_5_std value: 20.26822517176828 - type: nauc_precision_at_1000_diff1 value: -16.42087927044017 - type: nauc_precision_at_1000_max value: 3.5326295053913 - type: nauc_precision_at_1000_std value: 24.406810708493197 - type: nauc_precision_at_100_diff1 value: -12.17648135724982 - type: nauc_precision_at_100_max value: 15.895489260126183 - type: nauc_precision_at_100_std value: 32.48346122610907 - type: nauc_precision_at_10_diff1 value: -1.2493131347748072 - type: nauc_precision_at_10_max value: 26.409459305604376 - type: nauc_precision_at_10_std value: 31.115432019300016 - type: nauc_precision_at_1_diff1 value: 39.571969559016395 - type: nauc_precision_at_1_max value: 39.396249211263495 - type: nauc_precision_at_1_std value: 16.984149923258357 - type: nauc_precision_at_20_diff1 value: -6.597509397240593 - type: nauc_precision_at_20_max value: 21.461984620659695 - type: nauc_precision_at_20_std value: 
32.9450259748889 - type: nauc_precision_at_3_diff1 value: 9.46378764865453 - type: nauc_precision_at_3_max value: 32.03650819375425 - type: nauc_precision_at_3_std value: 26.489382638510765 - type: nauc_precision_at_5_diff1 value: 3.5987036728169537 - type: nauc_precision_at_5_max value: 30.633955978579703 - type: nauc_precision_at_5_std value: 30.532430088014443 - type: nauc_recall_at_1000_diff1 value: 10.714633106872254 - type: nauc_recall_at_1000_max value: 43.94958623961 - type: nauc_recall_at_1000_std value: 51.78914468954123 - type: nauc_recall_at_100_diff1 value: 9.63781472255557 - type: nauc_recall_at_100_max value: 38.50917465255336 - type: nauc_recall_at_100_std value: 37.78623984642377 - type: nauc_recall_at_10_diff1 value: 16.480342820841688 - type: nauc_recall_at_10_max value: 35.982566867357406 - type: nauc_recall_at_10_std value: 23.30688188788895 - type: nauc_recall_at_1_diff1 value: 43.34155892182403 - type: nauc_recall_at_1_max value: 38.23324890148018 - type: nauc_recall_at_1_std value: 6.0781444393516075 - type: nauc_recall_at_20_diff1 value: 13.521048985146367 - type: nauc_recall_at_20_max value: 34.62462209239834 - type: nauc_recall_at_20_std value: 27.85924191501618 - type: nauc_recall_at_3_diff1 value: 23.57032748533523 - type: nauc_recall_at_3_max value: 36.32703197635613 - type: nauc_recall_at_3_std value: 15.730238734014337 - type: nauc_recall_at_5_diff1 value: 19.61387036368584 - type: nauc_recall_at_5_max value: 36.22030835529556 - type: nauc_recall_at_5_std value: 19.76310648649897 - type: ndcg_at_1 value: 43.779 - type: ndcg_at_10 value: 43.525999999999996 - type: ndcg_at_100 value: 50.138000000000005 - type: ndcg_at_1000 value: 52.991 - type: ndcg_at_20 value: 46.083 - type: ndcg_at_3 value: 38.002 - type: ndcg_at_5 value: 39.842 - type: precision_at_1 value: 43.779 - type: precision_at_10 value: 13.205 - type: precision_at_100 value: 2.051 - type: precision_at_1000 value: 0.259 - type: precision_at_20 value: 7.722999999999999 - type: precision_at_3 value: 28.903000000000002 - type: precision_at_5 value: 21.368000000000002 - type: recall_at_1 value: 19.291 - type: recall_at_10 value: 48.754 - type: recall_at_100 value: 70.97200000000001 - type: recall_at_1000 value: 86.611 - type: recall_at_20 value: 55.884 - type: recall_at_3 value: 34.101 - type: recall_at_5 value: 40.784 task: type: Retrieval - dataset: config: default name: MTEB DBPedia revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: main_score value: 49.884 - type: map_at_1 value: 9.913 - type: map_at_10 value: 23.186999999999998 - type: map_at_100 value: 34.207 - type: map_at_1000 value: 36.318 - type: map_at_20 value: 27.419 - type: map_at_3 value: 15.656 - type: map_at_5 value: 18.945999999999998 - type: mrr_at_1 value: 75.75 - type: mrr_at_10 value: 82.16279761904761 - type: mrr_at_100 value: 82.48445635330299 - type: mrr_at_1000 value: 82.4870246719901 - type: mrr_at_20 value: 82.36203632968338 - type: mrr_at_3 value: 81.29166666666666 - type: mrr_at_5 value: 82.02916666666667 - type: nauc_map_at_1000_diff1 value: 17.0739966990996 - type: nauc_map_at_1000_max value: 28.440065298437133 - type: nauc_map_at_1000_std value: 20.83498154003865 - type: nauc_map_at_100_diff1 value: 17.75982086107111 - type: nauc_map_at_100_max value: 26.87850835673573 - type: nauc_map_at_100_std value: 18.350282298599275 - type: nauc_map_at_10_diff1 value: 17.15984258564116 - type: nauc_map_at_10_max value: 10.846179132675553 - type: nauc_map_at_10_std value: 
-6.263534464094614 - type: nauc_map_at_1_diff1 value: 24.014897777973694 - type: nauc_map_at_1_max value: -4.556638938723358 - type: nauc_map_at_1_std value: -22.7844467526989 - type: nauc_map_at_20_diff1 value: 16.3179372493187 - type: nauc_map_at_20_max value: 17.176378915498915 - type: nauc_map_at_20_std value: 1.9378637630340372 - type: nauc_map_at_3_diff1 value: 19.12786794046792 - type: nauc_map_at_3_max value: 0.09063919305677291 - type: nauc_map_at_3_std value: -16.713143158330492 - type: nauc_map_at_5_diff1 value: 18.76504725420023 - type: nauc_map_at_5_max value: 5.040867712207419 - type: nauc_map_at_5_std value: -12.382578318931165 - type: nauc_mrr_at_1000_diff1 value: 54.61266255011247 - type: nauc_mrr_at_1000_max value: 60.83961280977112 - type: nauc_mrr_at_1000_std value: 32.70429260443016 - type: nauc_mrr_at_100_diff1 value: 54.61346236538542 - type: nauc_mrr_at_100_max value: 60.8407974416647 - type: nauc_mrr_at_100_std value: 32.69272843993462 - type: nauc_mrr_at_10_diff1 value: 54.74633685810871 - type: nauc_mrr_at_10_max value: 61.084525933097865 - type: nauc_mrr_at_10_std value: 33.001220210025565 - type: nauc_mrr_at_1_diff1 value: 56.12708423835806 - type: nauc_mrr_at_1_max value: 58.9314540998289 - type: nauc_mrr_at_1_std value: 27.39422607651012 - type: nauc_mrr_at_20_diff1 value: 54.58896150245695 - type: nauc_mrr_at_20_max value: 60.890929983464815 - type: nauc_mrr_at_20_std value: 32.65559641276393 - type: nauc_mrr_at_3_diff1 value: 54.38229071443791 - type: nauc_mrr_at_3_max value: 59.987849044098596 - type: nauc_mrr_at_3_std value: 33.439813880719974 - type: nauc_mrr_at_5_diff1 value: 54.961790262449824 - type: nauc_mrr_at_5_max value: 61.17705173908951 - type: nauc_mrr_at_5_std value: 33.30939850734856 - type: nauc_ndcg_at_1000_diff1 value: 29.27465932507067 - type: nauc_ndcg_at_1000_max value: 47.952543312315214 - type: nauc_ndcg_at_1000_std value: 36.17132236391485 - type: nauc_ndcg_at_100_diff1 value: 28.63072328980134 - type: nauc_ndcg_at_100_max value: 41.460833419186564 - type: nauc_ndcg_at_100_std value: 27.157100358988135 - type: nauc_ndcg_at_10_diff1 value: 23.41488013023301 - type: nauc_ndcg_at_10_max value: 39.27798133072349 - type: nauc_ndcg_at_10_std value: 21.979241438928312 - type: nauc_ndcg_at_1_diff1 value: 46.12120543657642 - type: nauc_ndcg_at_1_max value: 47.28452124039853 - type: nauc_ndcg_at_1_std value: 19.799884708952543 - type: nauc_ndcg_at_20_diff1 value: 23.627669045115574 - type: nauc_ndcg_at_20_max value: 35.88225062457673 - type: nauc_ndcg_at_20_std value: 18.218628030529498 - type: nauc_ndcg_at_3_diff1 value: 25.37309228946118 - type: nauc_ndcg_at_3_max value: 40.64426332992231 - type: nauc_ndcg_at_3_std value: 24.608330645901482 - type: nauc_ndcg_at_5_diff1 value: 24.055798594999654 - type: nauc_ndcg_at_5_max value: 41.16180524175431 - type: nauc_ndcg_at_5_std value: 24.048305528761315 - type: nauc_precision_at_1000_diff1 value: -18.234943251015576 - type: nauc_precision_at_1000_max value: 0.48708502364659184 - type: nauc_precision_at_1000_std value: 2.4473601543134027 - type: nauc_precision_at_100_diff1 value: -3.0077810947381227 - type: nauc_precision_at_100_max value: 25.27249321108913 - type: nauc_precision_at_100_std value: 37.36575792126928 - type: nauc_precision_at_10_diff1 value: -0.2393778190297635 - type: nauc_precision_at_10_max value: 36.40513293547299 - type: nauc_precision_at_10_std value: 37.4827885766009 - type: nauc_precision_at_1_diff1 value: 56.12708423835806 - type: nauc_precision_at_1_max value: 
58.9314540998289 - type: nauc_precision_at_1_std value: 27.39422607651012 - type: nauc_precision_at_20_diff1 value: -1.2010133229402933 - type: nauc_precision_at_20_max value: 34.117541814385966 - type: nauc_precision_at_20_std value: 39.13273254177449 - type: nauc_precision_at_3_diff1 value: 11.757378092198486 - type: nauc_precision_at_3_max value: 42.637962482588875 - type: nauc_precision_at_3_std value: 37.42465077352342 - type: nauc_precision_at_5_diff1 value: 7.233177203405101 - type: nauc_precision_at_5_max value: 43.1663582897407 - type: nauc_precision_at_5_std value: 38.848449220750055 - type: nauc_recall_at_1000_diff1 value: 27.33938551969145 - type: nauc_recall_at_1000_max value: 45.5614254479334 - type: nauc_recall_at_1000_std value: 50.58528916250458 - type: nauc_recall_at_100_diff1 value: 23.610383761920097 - type: nauc_recall_at_100_max value: 31.422168485847184 - type: nauc_recall_at_100_std value: 25.58649926458304 - type: nauc_recall_at_10_diff1 value: 14.62495111808408 - type: nauc_recall_at_10_max value: 7.4295041277681095 - type: nauc_recall_at_10_std value: -9.32297089600654 - type: nauc_recall_at_1_diff1 value: 24.014897777973694 - type: nauc_recall_at_1_max value: -4.556638938723358 - type: nauc_recall_at_1_std value: -22.7844467526989 - type: nauc_recall_at_20_diff1 value: 14.027862330014662 - type: nauc_recall_at_20_max value: 12.437478731690844 - type: nauc_recall_at_20_std value: -3.0740743798103676 - type: nauc_recall_at_3_diff1 value: 16.354018356566712 - type: nauc_recall_at_3_max value: -2.9812231240997917 - type: nauc_recall_at_3_std value: -18.27746460743442 - type: nauc_recall_at_5_diff1 value: 16.81486583473587 - type: nauc_recall_at_5_max value: 2.420128513974744 - type: nauc_recall_at_5_std value: -14.441820321214108 - type: ndcg_at_1 value: 63.87500000000001 - type: ndcg_at_10 value: 49.884 - type: ndcg_at_100 value: 54.738 - type: ndcg_at_1000 value: 61.635 - type: ndcg_at_20 value: 48.894999999999996 - type: ndcg_at_3 value: 54.287 - type: ndcg_at_5 value: 52.40899999999999 - type: precision_at_1 value: 75.75 - type: precision_at_10 value: 40.9 - type: precision_at_100 value: 13.139999999999999 - type: precision_at_1000 value: 2.533 - type: precision_at_20 value: 30.8 - type: precision_at_3 value: 57.667 - type: precision_at_5 value: 51.05 - type: recall_at_1 value: 9.913 - type: recall_at_10 value: 28.591 - type: recall_at_100 value: 61.017999999999994 - type: recall_at_1000 value: 83.383 - type: recall_at_20 value: 37.834 - type: recall_at_3 value: 17.049 - type: recall_at_5 value: 21.685 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 78.77499999999999 - type: f1 value: 73.74058240799386 - type: f1_weighted value: 79.78804377638227 - type: main_score value: 78.77499999999999 task: type: Classification - dataset: config: default name: MTEB FEVER revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: main_score value: 90.986 - type: map_at_1 value: 81.601 - type: map_at_10 value: 88.242 - type: map_at_100 value: 88.46000000000001 - type: map_at_1000 value: 88.472 - type: map_at_20 value: 88.375 - type: map_at_3 value: 87.237 - type: map_at_5 value: 87.85300000000001 - type: mrr_at_1 value: 87.81878187818782 - type: mrr_at_10 value: 92.20301196786335 - type: mrr_at_100 value: 92.24884236673292 - type: mrr_at_1000 value: 92.2496338899362 - type: mrr_at_20 value: 
92.23112073283473 - type: mrr_at_3 value: 91.77417741774165 - type: mrr_at_5 value: 92.03970397039689 - type: nauc_map_at_1000_diff1 value: 56.54670664910505 - type: nauc_map_at_1000_max value: 33.08375749975477 - type: nauc_map_at_1000_std value: 2.7491595418252865 - type: nauc_map_at_100_diff1 value: 56.50887688686924 - type: nauc_map_at_100_max value: 33.075487189958494 - type: nauc_map_at_100_std value: 2.7675869969253375 - type: nauc_map_at_10_diff1 value: 56.08080806610569 - type: nauc_map_at_10_max value: 32.776972098819066 - type: nauc_map_at_10_std value: 2.5904846711290097 - type: nauc_map_at_1_diff1 value: 60.645344065853145 - type: nauc_map_at_1_max value: 31.232776777514797 - type: nauc_map_at_1_std value: -1.1946138176109171 - type: nauc_map_at_20_diff1 value: 56.28378454162355 - type: nauc_map_at_20_max value: 32.98207150385811 - type: nauc_map_at_20_std value: 2.8469814040214025 - type: nauc_map_at_3_diff1 value: 55.81958007095375 - type: nauc_map_at_3_max value: 31.602707711038313 - type: nauc_map_at_3_std value: 0.8117019292273401 - type: nauc_map_at_5_diff1 value: 55.706025752316535 - type: nauc_map_at_5_max value: 32.16032683604737 - type: nauc_map_at_5_std value: 1.8853201503498669 - type: nauc_mrr_at_1000_diff1 value: 75.4997173366251 - type: nauc_mrr_at_1000_max value: 41.49117135484116 - type: nauc_mrr_at_1000_std value: -2.0636172883680852 - type: nauc_mrr_at_100_diff1 value: 75.50118860648519 - type: nauc_mrr_at_100_max value: 41.49490161517194 - type: nauc_mrr_at_100_std value: -2.057024385178682 - type: nauc_mrr_at_10_diff1 value: 75.47295153099428 - type: nauc_mrr_at_10_max value: 41.55003304042536 - type: nauc_mrr_at_10_std value: -2.0353663198929253 - type: nauc_mrr_at_1_diff1 value: 76.632058433229 - type: nauc_mrr_at_1_max value: 39.754483718891656 - type: nauc_mrr_at_1_std value: -2.962241058101701 - type: nauc_mrr_at_20_diff1 value: 75.47221882396194 - type: nauc_mrr_at_20_max value: 41.50779280480839 - type: nauc_mrr_at_20_std value: -1.9620212266426307 - type: nauc_mrr_at_3_diff1 value: 75.5682297897137 - type: nauc_mrr_at_3_max value: 41.53543801506081 - type: nauc_mrr_at_3_std value: -3.391681195945978 - type: nauc_mrr_at_5_diff1 value: 75.37562775183947 - type: nauc_mrr_at_5_max value: 41.42028509006753 - type: nauc_mrr_at_5_std value: -2.418698675622726 - type: nauc_ndcg_at_1000_diff1 value: 59.364557011624 - type: nauc_ndcg_at_1000_max value: 35.4112238125149 - type: nauc_ndcg_at_1000_std value: 3.717516193303376 - type: nauc_ndcg_at_100_diff1 value: 58.55706703023122 - type: nauc_ndcg_at_100_max value: 35.352285999934594 - type: nauc_ndcg_at_100_std value: 4.273437944266781 - type: nauc_ndcg_at_10_diff1 value: 56.77422701267037 - type: nauc_ndcg_at_10_max value: 34.24909893882957 - type: nauc_ndcg_at_10_std value: 4.178151434006727 - type: nauc_ndcg_at_1_diff1 value: 76.632058433229 - type: nauc_ndcg_at_1_max value: 39.754483718891656 - type: nauc_ndcg_at_1_std value: -2.962241058101701 - type: nauc_ndcg_at_20_diff1 value: 57.27343398231262 - type: nauc_ndcg_at_20_max value: 34.7416626740278 - type: nauc_ndcg_at_20_std value: 4.955858766014002 - type: nauc_ndcg_at_3_diff1 value: 57.69267803121093 - type: nauc_ndcg_at_3_max value: 33.13744317023105 - type: nauc_ndcg_at_3_std value: 0.40380284030057023 - type: nauc_ndcg_at_5_diff1 value: 56.57461019113917 - type: nauc_ndcg_at_5_max value: 33.244657840804386 - type: nauc_ndcg_at_5_std value: 2.5121440827702046 - type: nauc_precision_at_1000_diff1 value: -14.54492513449718 - type: 
nauc_precision_at_1000_max value: -5.94552147573623 - type: nauc_precision_at_1000_std value: 1.2446209816057374 - type: nauc_precision_at_100_diff1 value: -15.452676132568344 - type: nauc_precision_at_100_max value: -3.760241749847617 - type: nauc_precision_at_100_std value: 4.623534605290865 - type: nauc_precision_at_10_diff1 value: -12.712908026086176 - type: nauc_precision_at_10_max value: 0.45241316994816805 - type: nauc_precision_at_10_std value: 7.849478570138391 - type: nauc_precision_at_1_diff1 value: 76.632058433229 - type: nauc_precision_at_1_max value: 39.754483718891656 - type: nauc_precision_at_1_std value: -2.962241058101701 - type: nauc_precision_at_20_diff1 value: -14.514618673172041 - type: nauc_precision_at_20_max value: -1.113635490621818 - type: nauc_precision_at_20_std value: 8.599811730457576 - type: nauc_precision_at_3_diff1 value: 6.1367799850003815 - type: nauc_precision_at_3_max value: 8.466271950897857 - type: nauc_precision_at_3_std value: 1.7458051543195068 - type: nauc_precision_at_5_diff1 value: -5.804548945783379 - type: nauc_precision_at_5_max value: 3.4060251839074818 - type: nauc_precision_at_5_std value: 5.583410511782371 - type: nauc_recall_at_1000_diff1 value: 19.329432953574095 - type: nauc_recall_at_1000_max value: 43.260442595158736 - type: nauc_recall_at_1000_std value: 53.89644660661804 - type: nauc_recall_at_100_diff1 value: 21.265326296051235 - type: nauc_recall_at_100_max value: 38.573000195373695 - type: nauc_recall_at_100_std value: 42.169391082152785 - type: nauc_recall_at_10_diff1 value: 29.785129558987432 - type: nauc_recall_at_10_max value: 28.379657867558034 - type: nauc_recall_at_10_std value: 21.132574624091973 - type: nauc_recall_at_1_diff1 value: 60.645344065853145 - type: nauc_recall_at_1_max value: 31.232776777514797 - type: nauc_recall_at_1_std value: -1.1946138176109171 - type: nauc_recall_at_20_diff1 value: 25.88845612373954 - type: nauc_recall_at_20_max value: 30.24785945821152 - type: nauc_recall_at_20_std value: 31.73911437468067 - type: nauc_recall_at_3_diff1 value: 42.2968464797395 - type: nauc_recall_at_3_max value: 26.494318009870018 - type: nauc_recall_at_3_std value: 2.6045977160467544 - type: nauc_recall_at_5_diff1 value: 35.81340094401374 - type: nauc_recall_at_5_max value: 25.91082947510634 - type: nauc_recall_at_5_std value: 9.759404930864779 - type: ndcg_at_1 value: 87.819 - type: ndcg_at_10 value: 90.986 - type: ndcg_at_100 value: 91.69 - type: ndcg_at_1000 value: 91.863 - type: ndcg_at_20 value: 91.293 - type: ndcg_at_3 value: 89.621 - type: ndcg_at_5 value: 90.333 - type: precision_at_1 value: 87.819 - type: precision_at_10 value: 10.753 - type: precision_at_100 value: 1.138 - type: precision_at_1000 value: 0.117 - type: precision_at_20 value: 5.4879999999999995 - type: precision_at_3 value: 33.703 - type: precision_at_5 value: 20.831 - type: recall_at_1 value: 81.601 - type: recall_at_10 value: 95.44200000000001 - type: recall_at_100 value: 98.14399999999999 - type: recall_at_1000 value: 99.157 - type: recall_at_20 value: 96.43 - type: recall_at_3 value: 91.729 - type: recall_at_5 value: 93.552 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: main_score value: 56.056 - type: map_at_1 value: 28.666000000000004 - type: map_at_10 value: 47.437000000000005 - type: map_at_100 value: 49.537 - type: map_at_1000 value: 49.665 - type: map_at_20 value: 48.618 - type: map_at_3 value: 41.355 - type: map_at_5 
value: 44.525 - type: mrr_at_1 value: 55.55555555555556 - type: mrr_at_10 value: 63.705173427395614 - type: mrr_at_100 value: 64.25449940779741 - type: mrr_at_1000 value: 64.27635581092147 - type: mrr_at_20 value: 64.03796029079103 - type: mrr_at_3 value: 61.49691358024688 - type: mrr_at_5 value: 62.73148148148143 - type: nauc_map_at_1000_diff1 value: 43.24282910397747 - type: nauc_map_at_1000_max value: 28.506093180265644 - type: nauc_map_at_1000_std value: -13.040508386155054 - type: nauc_map_at_100_diff1 value: 43.23650442904607 - type: nauc_map_at_100_max value: 28.470565635459156 - type: nauc_map_at_100_std value: -12.988098780714935 - type: nauc_map_at_10_diff1 value: 43.393840733087686 - type: nauc_map_at_10_max value: 26.637302062720153 - type: nauc_map_at_10_std value: -14.47500292113762 - type: nauc_map_at_1_diff1 value: 47.705150227211725 - type: nauc_map_at_1_max value: 15.354189686550129 - type: nauc_map_at_1_std value: -14.559819859039067 - type: nauc_map_at_20_diff1 value: 43.14121075706104 - type: nauc_map_at_20_max value: 27.811170590408395 - type: nauc_map_at_20_std value: -13.459413585283583 - type: nauc_map_at_3_diff1 value: 44.33938667720801 - type: nauc_map_at_3_max value: 21.785619884549398 - type: nauc_map_at_3_std value: -15.569980103071593 - type: nauc_map_at_5_diff1 value: 43.39280905665027 - type: nauc_map_at_5_max value: 25.021492190645017 - type: nauc_map_at_5_std value: -14.48856622187443 - type: nauc_mrr_at_1000_diff1 value: 52.971563939946286 - type: nauc_mrr_at_1000_max value: 38.88019486172324 - type: nauc_mrr_at_1000_std value: -12.412991642381616 - type: nauc_mrr_at_100_diff1 value: 52.978468139876945 - type: nauc_mrr_at_100_max value: 38.89751787948751 - type: nauc_mrr_at_100_std value: -12.3677876252269 - type: nauc_mrr_at_10_diff1 value: 52.78507148048174 - type: nauc_mrr_at_10_max value: 38.55079809310022 - type: nauc_mrr_at_10_std value: -12.944127025078755 - type: nauc_mrr_at_1_diff1 value: 55.52626805861546 - type: nauc_mrr_at_1_max value: 40.49306809164979 - type: nauc_mrr_at_1_std value: -12.886607701317681 - type: nauc_mrr_at_20_diff1 value: 52.9592152665678 - type: nauc_mrr_at_20_max value: 38.88514014589964 - type: nauc_mrr_at_20_std value: -12.434464359819444 - type: nauc_mrr_at_3_diff1 value: 52.73696844091174 - type: nauc_mrr_at_3_max value: 38.61018727252859 - type: nauc_mrr_at_3_std value: -13.123989867364166 - type: nauc_mrr_at_5_diff1 value: 53.037110010188 - type: nauc_mrr_at_5_max value: 38.44770729849151 - type: nauc_mrr_at_5_std value: -13.49318771828972 - type: nauc_ndcg_at_1000_diff1 value: 44.73813840091289 - type: nauc_ndcg_at_1000_max value: 33.70113904685389 - type: nauc_ndcg_at_1000_std value: -10.328687058192742 - type: nauc_ndcg_at_100_diff1 value: 44.595174119928835 - type: nauc_ndcg_at_100_max value: 33.4788285112467 - type: nauc_ndcg_at_100_std value: -8.695355259716946 - type: nauc_ndcg_at_10_diff1 value: 44.39837225263 - type: nauc_ndcg_at_10_max value: 29.188289725593393 - type: nauc_ndcg_at_10_std value: -13.67608323673103 - type: nauc_ndcg_at_1_diff1 value: 55.52626805861546 - type: nauc_ndcg_at_1_max value: 40.49306809164979 - type: nauc_ndcg_at_1_std value: -12.886607701317681 - type: nauc_ndcg_at_20_diff1 value: 44.24661739902305 - type: nauc_ndcg_at_20_max value: 31.667868318249965 - type: nauc_ndcg_at_20_std value: -10.65470780066342 - type: nauc_ndcg_at_3_diff1 value: 43.39857166975522 - type: nauc_ndcg_at_3_max value: 31.764668313577495 - type: nauc_ndcg_at_3_std value: -14.494866954678152 - type: 
nauc_ndcg_at_5_diff1 value: 43.16976647347281 - type: nauc_ndcg_at_5_max value: 29.878329062643143 - type: nauc_ndcg_at_5_std value: -13.987689089179739 - type: nauc_precision_at_1000_diff1 value: -9.807973252625484 - type: nauc_precision_at_1000_max value: 26.6279603849494 - type: nauc_precision_at_1000_std value: 7.113187103520632 - type: nauc_precision_at_100_diff1 value: -4.777149603323976 - type: nauc_precision_at_100_max value: 31.03410463692187 - type: nauc_precision_at_100_std value: 10.463144150275435 - type: nauc_precision_at_10_diff1 value: 8.691528703215962 - type: nauc_precision_at_10_max value: 33.329579434123374 - type: nauc_precision_at_10_std value: -0.8002015226329403 - type: nauc_precision_at_1_diff1 value: 55.52626805861546 - type: nauc_precision_at_1_max value: 40.49306809164979 - type: nauc_precision_at_1_std value: -12.886607701317681 - type: nauc_precision_at_20_diff1 value: 3.4564653474184284 - type: nauc_precision_at_20_max value: 34.401070158471136 - type: nauc_precision_at_20_std value: 5.813431200164549 - type: nauc_precision_at_3_diff1 value: 22.463219705462187 - type: nauc_precision_at_3_max value: 34.77413976546924 - type: nauc_precision_at_3_std value: -7.083890789741479 - type: nauc_precision_at_5_diff1 value: 14.011006004883154 - type: nauc_precision_at_5_max value: 35.73655466853702 - type: nauc_precision_at_5_std value: -2.8395172077771598 - type: nauc_recall_at_1000_diff1 value: 16.478046357391555 - type: nauc_recall_at_1000_max value: 43.231704288282344 - type: nauc_recall_at_1000_std value: 38.430684937573645 - type: nauc_recall_at_100_diff1 value: 30.764718344602436 - type: nauc_recall_at_100_max value: 31.769050487166655 - type: nauc_recall_at_100_std value: 23.48468311677149 - type: nauc_recall_at_10_diff1 value: 34.47339565324045 - type: nauc_recall_at_10_max value: 19.054212335800454 - type: nauc_recall_at_10_std value: -11.039734015330437 - type: nauc_recall_at_1_diff1 value: 47.705150227211725 - type: nauc_recall_at_1_max value: 15.354189686550129 - type: nauc_recall_at_1_std value: -14.559819859039067 - type: nauc_recall_at_20_diff1 value: 32.1011474016873 - type: nauc_recall_at_20_max value: 25.546372988304423 - type: nauc_recall_at_20_std value: -0.007233471152482897 - type: nauc_recall_at_3_diff1 value: 37.5708138019065 - type: nauc_recall_at_3_max value: 16.66410785756736 - type: nauc_recall_at_3_std value: -15.404817020108966 - type: nauc_recall_at_5_diff1 value: 35.714519648479595 - type: nauc_recall_at_5_max value: 19.02075233009296 - type: nauc_recall_at_5_std value: -13.180963359760725 - type: ndcg_at_1 value: 55.556000000000004 - type: ndcg_at_10 value: 56.056 - type: ndcg_at_100 value: 62.44 - type: ndcg_at_1000 value: 64.263 - type: ndcg_at_20 value: 58.638999999999996 - type: ndcg_at_3 value: 51.722 - type: ndcg_at_5 value: 52.701 - type: precision_at_1 value: 55.556000000000004 - type: precision_at_10 value: 15.679000000000002 - type: precision_at_100 value: 2.252 - type: precision_at_1000 value: 0.257 - type: precision_at_20 value: 9.02 - type: precision_at_3 value: 34.619 - type: precision_at_5 value: 25.093 - type: recall_at_1 value: 28.666000000000004 - type: recall_at_10 value: 63.717999999999996 - type: recall_at_100 value: 86.938 - type: recall_at_1000 value: 97.603 - type: recall_at_20 value: 71.649 - type: recall_at_3 value: 46.663 - type: recall_at_5 value: 53.313 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa 
metrics: - type: main_score value: 71.74199999999999 - type: map_at_1 value: 41.729 - type: map_at_10 value: 63.168 - type: map_at_100 value: 64.132 - type: map_at_1000 value: 64.199 - type: map_at_20 value: 63.736000000000004 - type: map_at_3 value: 59.826 - type: map_at_5 value: 61.882000000000005 - type: mrr_at_1 value: 83.45712356515868 - type: mrr_at_10 value: 87.850342432719 - type: mrr_at_100 value: 88.0016320691113 - type: mrr_at_1000 value: 88.00576596968136 - type: mrr_at_20 value: 87.94463253190389 - type: mrr_at_3 value: 87.13706954760278 - type: mrr_at_5 value: 87.59419311276136 - type: nauc_map_at_1000_diff1 value: 13.635446621095054 - type: nauc_map_at_1000_max value: 18.670632529445633 - type: nauc_map_at_1000_std value: 10.444842636150575 - type: nauc_map_at_100_diff1 value: 13.599262398010783 - type: nauc_map_at_100_max value: 18.636389405484806 - type: nauc_map_at_100_std value: 10.460027483576043 - type: nauc_map_at_10_diff1 value: 13.235053919323942 - type: nauc_map_at_10_max value: 18.252140477080047 - type: nauc_map_at_10_std value: 9.9075337042203 - type: nauc_map_at_1_diff1 value: 76.51940497836482 - type: nauc_map_at_1_max value: 51.251419487235474 - type: nauc_map_at_1_std value: 0.16714896857146574 - type: nauc_map_at_20_diff1 value: 13.4178245722222 - type: nauc_map_at_20_max value: 18.40988771210718 - type: nauc_map_at_20_std value: 10.216685163366282 - type: nauc_map_at_3_diff1 value: 13.38370761663418 - type: nauc_map_at_3_max value: 17.760962555456537 - type: nauc_map_at_3_std value: 7.15741965624388 - type: nauc_map_at_5_diff1 value: 13.138133309724855 - type: nauc_map_at_5_max value: 17.871761295251044 - type: nauc_map_at_5_std value: 8.475147426940074 - type: nauc_mrr_at_1000_diff1 value: 75.82650818891959 - type: nauc_mrr_at_1000_max value: 53.6736100668434 - type: nauc_mrr_at_1000_std value: 1.8025016349213916 - type: nauc_mrr_at_100_diff1 value: 75.82530574210111 - type: nauc_mrr_at_100_max value: 53.68067545829002 - type: nauc_mrr_at_100_std value: 1.8147470536495791 - type: nauc_mrr_at_10_diff1 value: 75.8330135686799 - type: nauc_mrr_at_10_max value: 53.78626885349077 - type: nauc_mrr_at_10_std value: 1.7975782717226636 - type: nauc_mrr_at_1_diff1 value: 76.51940497836482 - type: nauc_mrr_at_1_max value: 51.251419487235474 - type: nauc_mrr_at_1_std value: 0.16714896857146574 - type: nauc_mrr_at_20_diff1 value: 75.82783382464166 - type: nauc_mrr_at_20_max value: 53.68364567043885 - type: nauc_mrr_at_20_std value: 1.742037904463963 - type: nauc_mrr_at_3_diff1 value: 75.6944609768663 - type: nauc_mrr_at_3_max value: 53.803941340341666 - type: nauc_mrr_at_3_std value: 1.1849945458077804 - type: nauc_mrr_at_5_diff1 value: 75.73006960604903 - type: nauc_mrr_at_5_max value: 53.62223096420106 - type: nauc_mrr_at_5_std value: 1.6144067563410909 - type: nauc_ndcg_at_1000_diff1 value: 21.58025241642726 - type: nauc_ndcg_at_1000_max value: 24.675747527001153 - type: nauc_ndcg_at_1000_std value: 13.075943547492718 - type: nauc_ndcg_at_100_diff1 value: 20.30260137544846 - type: nauc_ndcg_at_100_max value: 23.757528813872018 - type: nauc_ndcg_at_100_std value: 13.648994687574062 - type: nauc_ndcg_at_10_diff1 value: 18.995052360997818 - type: nauc_ndcg_at_10_max value: 22.254260808196037 - type: nauc_ndcg_at_10_std value: 11.27212390633054 - type: nauc_ndcg_at_1_diff1 value: 76.51940497836482 - type: nauc_ndcg_at_1_max value: 51.251419487235474 - type: nauc_ndcg_at_1_std value: 0.16714896857146574 - type: nauc_ndcg_at_20_diff1 value: 19.333742380695757 - type: 
nauc_ndcg_at_20_max value: 22.527779834633364 - type: nauc_ndcg_at_20_std value: 12.161009000707917 - type: nauc_ndcg_at_3_diff1 value: 20.013329040965534 - type: nauc_ndcg_at_3_max value: 21.99692460311921 - type: nauc_ndcg_at_3_std value: 6.8076290638386165 - type: nauc_ndcg_at_5_diff1 value: 19.08226315942471 - type: nauc_ndcg_at_5_max value: 21.71185964294168 - type: nauc_ndcg_at_5_std value: 8.671911269518214 - type: nauc_precision_at_1000_diff1 value: 2.4462475489446764 - type: nauc_precision_at_1000_max value: 29.145662064268578 - type: nauc_precision_at_1000_std value: 49.20704909525856 - type: nauc_precision_at_100_diff1 value: 0.11271196725540299 - type: nauc_precision_at_100_max value: 17.37584606388067 - type: nauc_precision_at_100_std value: 34.66099346244071 - type: nauc_precision_at_10_diff1 value: 2.9923183951227825 - type: nauc_precision_at_10_max value: 14.261884731124264 - type: nauc_precision_at_10_std value: 18.084188795498378 - type: nauc_precision_at_1_diff1 value: 76.51940497836482 - type: nauc_precision_at_1_max value: 51.251419487235474 - type: nauc_precision_at_1_std value: 0.16714896857146574 - type: nauc_precision_at_20_diff1 value: 1.9180293008303761 - type: nauc_precision_at_20_max value: 13.832269193468512 - type: nauc_precision_at_20_std value: 21.65284406055607 - type: nauc_precision_at_3_diff1 value: 7.226609484731811 - type: nauc_precision_at_3_max value: 15.162908526977272 - type: nauc_precision_at_3_std value: 8.451859972962776 - type: nauc_precision_at_5_diff1 value: 4.705236845538159 - type: nauc_precision_at_5_max value: 14.022910843582666 - type: nauc_precision_at_5_std value: 11.777269322821605 - type: nauc_recall_at_1000_diff1 value: 2.446247548945172 - type: nauc_recall_at_1000_max value: 29.14566206426889 - type: nauc_recall_at_1000_std value: 49.20704909525879 - type: nauc_recall_at_100_diff1 value: 0.1127119672553316 - type: nauc_recall_at_100_max value: 17.37584606388062 - type: nauc_recall_at_100_std value: 34.660993462440686 - type: nauc_recall_at_10_diff1 value: 2.9923183951227927 - type: nauc_recall_at_10_max value: 14.261884731124299 - type: nauc_recall_at_10_std value: 18.08418879549837 - type: nauc_recall_at_1_diff1 value: 76.51940497836482 - type: nauc_recall_at_1_max value: 51.251419487235474 - type: nauc_recall_at_1_std value: 0.16714896857146574 - type: nauc_recall_at_20_diff1 value: 1.918029300830432 - type: nauc_recall_at_20_max value: 13.832269193468566 - type: nauc_recall_at_20_std value: 21.65284406055605 - type: nauc_recall_at_3_diff1 value: 7.226609484731802 - type: nauc_recall_at_3_max value: 15.162908526977182 - type: nauc_recall_at_3_std value: 8.451859972962634 - type: nauc_recall_at_5_diff1 value: 4.705236845538197 - type: nauc_recall_at_5_max value: 14.02291084358265 - type: nauc_recall_at_5_std value: 11.777269322821638 - type: ndcg_at_1 value: 83.45700000000001 - type: ndcg_at_10 value: 71.74199999999999 - type: ndcg_at_100 value: 75.008 - type: ndcg_at_1000 value: 76.242 - type: ndcg_at_20 value: 73.114 - type: ndcg_at_3 value: 67.128 - type: ndcg_at_5 value: 69.645 - type: precision_at_1 value: 83.45700000000001 - type: precision_at_10 value: 14.747 - type: precision_at_100 value: 1.73 - type: precision_at_1000 value: 0.189 - type: precision_at_20 value: 7.8149999999999995 - type: precision_at_3 value: 42.323 - type: precision_at_5 value: 27.381 - type: recall_at_1 value: 41.729 - type: recall_at_10 value: 73.734 - type: recall_at_100 value: 86.502 - type: recall_at_1000 value: 94.60499999999999 - type: 
recall_at_20 value: 78.14999999999999 - type: recall_at_3 value: 63.483999999999995 - type: recall_at_5 value: 68.45400000000001 task: type: Retrieval - dataset: config: default name: MTEB ImdbClassification revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 96.4904 - type: ap value: 94.85481918794709 - type: ap_weighted value: 94.85481918794709 - type: f1 value: 96.4898592305707 - type: f1_weighted value: 96.4898592305707 - type: main_score value: 96.4904 task: type: Classification - dataset: config: default name: MTEB MSMARCO revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: - type: main_score value: 43.692 - type: map_at_1 value: 23.751 - type: map_at_10 value: 36.553999999999995 - type: map_at_100 value: 37.721 - type: map_at_1000 value: 37.763999999999996 - type: map_at_20 value: 37.289 - type: map_at_3 value: 32.643 - type: map_at_5 value: 34.851 - type: mrr_at_1 value: 24.455587392550143 - type: mrr_at_10 value: 37.18388706963206 - type: mrr_at_100 value: 38.28330737932916 - type: mrr_at_1000 value: 38.32054399710817 - type: mrr_at_20 value: 37.8818001216278 - type: mrr_at_3 value: 33.35721107927405 - type: mrr_at_5 value: 35.52483285577843 - type: nauc_map_at_1000_diff1 value: 36.3576177260684 - type: nauc_map_at_1000_max value: 7.854511605962703 - type: nauc_map_at_1000_std value: -17.701121059746878 - type: nauc_map_at_100_diff1 value: 36.356075649230505 - type: nauc_map_at_100_max value: 7.862168042999533 - type: nauc_map_at_100_std value: -17.670102459097233 - type: nauc_map_at_10_diff1 value: 36.22122978875574 - type: nauc_map_at_10_max value: 7.80848606967416 - type: nauc_map_at_10_std value: -18.3265151386167 - type: nauc_map_at_1_diff1 value: 39.28605466408357 - type: nauc_map_at_1_max value: 6.20202977590459 - type: nauc_map_at_1_std value: -15.734334090045026 - type: nauc_map_at_20_diff1 value: 36.33637880909657 - type: nauc_map_at_20_max value: 7.843437969476022 - type: nauc_map_at_20_std value: -17.917533363025996 - type: nauc_map_at_3_diff1 value: 36.24864976076741 - type: nauc_map_at_3_max value: 7.420345251835957 - type: nauc_map_at_3_std value: -18.71678497722944 - type: nauc_map_at_5_diff1 value: 36.0789619291824 - type: nauc_map_at_5_max value: 7.7314285669514495 - type: nauc_map_at_5_std value: -18.748688764538706 - type: nauc_mrr_at_1000_diff1 value: 36.23912675623378 - type: nauc_mrr_at_1000_max value: 7.690553436255147 - type: nauc_mrr_at_1000_std value: -17.609526070212304 - type: nauc_mrr_at_100_diff1 value: 36.23782651189002 - type: nauc_mrr_at_100_max value: 7.70075095171647 - type: nauc_mrr_at_100_std value: -17.575714144960184 - type: nauc_mrr_at_10_diff1 value: 36.125229472534215 - type: nauc_mrr_at_10_max value: 7.635472248755658 - type: nauc_mrr_at_10_std value: -18.208166616511086 - type: nauc_mrr_at_1_diff1 value: 39.20986875554532 - type: nauc_mrr_at_1_max value: 6.062668487561363 - type: nauc_mrr_at_1_std value: -16.04130340817602 - type: nauc_mrr_at_20_diff1 value: 36.21207088739667 - type: nauc_mrr_at_20_max value: 7.699610250145951 - type: nauc_mrr_at_20_std value: -17.778245221724028 - type: nauc_mrr_at_3_diff1 value: 36.03957583885305 - type: nauc_mrr_at_3_max value: 7.225515576504581 - type: nauc_mrr_at_3_std value: -18.74478742943741 - type: nauc_mrr_at_5_diff1 value: 35.969152496648974 - type: nauc_mrr_at_5_max value: 7.584059789018233 - type: nauc_mrr_at_5_std value: -18.569374723129332 - type: nauc_ndcg_at_1000_diff1 value: 
35.894655529841806 - type: nauc_ndcg_at_1000_max value: 8.579327424366236 - type: nauc_ndcg_at_1000_std value: -16.359677367747896 - type: nauc_ndcg_at_100_diff1 value: 35.89861902483983 - type: nauc_ndcg_at_100_max value: 8.830873623962242 - type: nauc_ndcg_at_100_std value: -15.173125564722978 - type: nauc_ndcg_at_10_diff1 value: 35.36499811105169 - type: nauc_ndcg_at_10_max value: 8.449267180956992 - type: nauc_ndcg_at_10_std value: -18.41978802362402 - type: nauc_ndcg_at_1_diff1 value: 39.15422481210622 - type: nauc_ndcg_at_1_max value: 6.055515791928331 - type: nauc_ndcg_at_1_std value: -16.042779610876252 - type: nauc_ndcg_at_20_diff1 value: 35.73402868264468 - type: nauc_ndcg_at_20_max value: 8.695705518210847 - type: nauc_ndcg_at_20_std value: -16.7735829470466 - type: nauc_ndcg_at_3_diff1 value: 35.31358242856231 - type: nauc_ndcg_at_3_max value: 7.645692789058997 - type: nauc_ndcg_at_3_std value: -19.460003734786874 - type: nauc_ndcg_at_5_diff1 value: 35.05216588927143 - type: nauc_ndcg_at_5_max value: 8.216690520604715 - type: nauc_ndcg_at_5_std value: -19.3982054492159 - type: nauc_precision_at_1000_diff1 value: -4.440002625111349 - type: nauc_precision_at_1000_max value: 7.886988951901723 - type: nauc_precision_at_1000_std value: 9.88111187048247 - type: nauc_precision_at_100_diff1 value: 15.728286119463325 - type: nauc_precision_at_100_max value: 13.218650824470654 - type: nauc_precision_at_100_std value: 16.113245895522553 - type: nauc_precision_at_10_diff1 value: 29.51218489610567 - type: nauc_precision_at_10_max value: 10.197432401942912 - type: nauc_precision_at_10_std value: -16.950603431359493 - type: nauc_precision_at_1_diff1 value: 39.15422481210622 - type: nauc_precision_at_1_max value: 6.055515791928331 - type: nauc_precision_at_1_std value: -16.042779610876252 - type: nauc_precision_at_20_diff1 value: 27.825993070397338 - type: nauc_precision_at_20_max value: 11.437632287846007 - type: nauc_precision_at_20_std value: -7.450353566405601 - type: nauc_precision_at_3_diff1 value: 32.14135556796588 - type: nauc_precision_at_3_max value: 7.989252443574163 - type: nauc_precision_at_3_std value: -21.566254595671055 - type: nauc_precision_at_5_diff1 value: 30.68778685307082 - type: nauc_precision_at_5_max value: 9.332160758499892 - type: nauc_precision_at_5_std value: -20.928554713448914 - type: nauc_recall_at_1000_diff1 value: 25.00810478716878 - type: nauc_recall_at_1000_max value: 46.518165765201644 - type: nauc_recall_at_1000_std value: 61.4734635576085 - type: nauc_recall_at_100_diff1 value: 33.895581318261726 - type: nauc_recall_at_100_max value: 20.10706035872801 - type: nauc_recall_at_100_std value: 24.204226584457047 - type: nauc_recall_at_10_diff1 value: 32.363127359576296 - type: nauc_recall_at_10_max value: 10.729923804989545 - type: nauc_recall_at_10_std value: -18.1335370184202 - type: nauc_recall_at_1_diff1 value: 39.28605466408357 - type: nauc_recall_at_1_max value: 6.20202977590459 - type: nauc_recall_at_1_std value: -15.734334090045026 - type: nauc_recall_at_20_diff1 value: 33.47804003169795 - type: nauc_recall_at_20_max value: 12.781494765263382 - type: nauc_recall_at_20_std value: -9.263970132202658 - type: nauc_recall_at_3_diff1 value: 32.71001429428999 - type: nauc_recall_at_3_max value: 8.353439197382693 - type: nauc_recall_at_3_std value: -21.235097744366954 - type: nauc_recall_at_5_diff1 value: 31.87451464963415 - type: nauc_recall_at_5_max value: 9.635051450907305 - type: nauc_recall_at_5_std value: -21.113235357132794 - type: ndcg_at_1 value: 
24.47 - type: ndcg_at_10 value: 43.692 - type: ndcg_at_100 value: 49.211 - type: ndcg_at_1000 value: 50.244 - type: ndcg_at_20 value: 46.278000000000006 - type: ndcg_at_3 value: 35.719 - type: ndcg_at_5 value: 39.652 - type: precision_at_1 value: 24.47 - type: precision_at_10 value: 6.857 - type: precision_at_100 value: 0.9610000000000001 - type: precision_at_1000 value: 0.105 - type: precision_at_20 value: 3.968 - type: precision_at_3 value: 15.181000000000001 - type: precision_at_5 value: 11.117 - type: recall_at_1 value: 23.751 - type: recall_at_10 value: 65.64 - type: recall_at_100 value: 90.967 - type: recall_at_1000 value: 98.738 - type: recall_at_20 value: 75.639 - type: recall_at_3 value: 43.927 - type: recall_at_5 value: 53.366 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 98.82580939352485 - type: f1 value: 98.75201754333801 - type: f1_weighted value: 98.82795205108245 - type: main_score value: 98.82580939352485 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 92.29822161422709 - type: f1 value: 77.75210224871594 - type: f1_weighted value: 93.58661422540348 - type: main_score value: 92.29822161422709 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 85.17484868863484 - type: f1 value: 81.94484244487094 - type: f1_weighted value: 85.21022593423332 - type: main_score value: 85.17484868863484 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 89.61667787491594 - type: f1 value: 89.02701927621264 - type: f1_weighted value: 89.56306982022801 - type: main_score value: 89.61667787491594 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: main_score value: 46.318282423948574 - type: v_measure value: 46.318282423948574 - type: v_measure_std value: 0.9729055662461538 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: main_score value: 44.29033625273981 - type: v_measure value: 44.29033625273981 - type: v_measure_std value: 1.0596383629128594 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7 split: test type: mteb/mind_small metrics: - type: main_score value: 33.0526129239962 - type: map value: 33.0526129239962 - type: mrr value: 34.29260046890935 - type: nAUC_map_diff1 value: 12.579738077238032 - type: nAUC_map_max value: -20.936629344962 - type: nAUC_map_std value: -1.6096805784945216 - type: nAUC_mrr_diff1 value: 11.597584463580807 - type: nAUC_mrr_max value: -15.723702838537504 - type: nAUC_mrr_std value: 0.2719172965777737 task: type: Reranking - dataset: config: default name: MTEB NFCorpus revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: 
mteb/nfcorpus metrics: - type: main_score value: 41.486000000000004 - type: map_at_1 value: 6.866 - type: map_at_10 value: 15.895999999999999 - type: map_at_100 value: 21.093 - type: map_at_1000 value: 23.067 - type: map_at_20 value: 18.125 - type: map_at_3 value: 11.421000000000001 - type: map_at_5 value: 13.415 - type: mrr_at_1 value: 52.63157894736842 - type: mrr_at_10 value: 61.486805248415166 - type: mrr_at_100 value: 62.08211009182091 - type: mrr_at_1000 value: 62.10828701365016 - type: mrr_at_20 value: 61.904411187915784 - type: mrr_at_3 value: 59.90712074303407 - type: mrr_at_5 value: 60.91331269349847 - type: nauc_map_at_1000_diff1 value: 25.484625278529403 - type: nauc_map_at_1000_max value: 31.206600396418853 - type: nauc_map_at_1000_std value: 15.569448072357156 - type: nauc_map_at_100_diff1 value: 27.636750226316764 - type: nauc_map_at_100_max value: 29.66992681250722 - type: nauc_map_at_100_std value: 10.570600484002671 - type: nauc_map_at_10_diff1 value: 32.76642525548697 - type: nauc_map_at_10_max value: 21.459225397237663 - type: nauc_map_at_10_std value: -3.546494734209264 - type: nauc_map_at_1_diff1 value: 48.8002894871328 - type: nauc_map_at_1_max value: 5.7236722609868815 - type: nauc_map_at_1_std value: -13.283554044471352 - type: nauc_map_at_20_diff1 value: 30.57169701502308 - type: nauc_map_at_20_max value: 25.79666139518404 - type: nauc_map_at_20_std value: 1.781732492989651 - type: nauc_map_at_3_diff1 value: 40.076315947201095 - type: nauc_map_at_3_max value: 12.862524429140054 - type: nauc_map_at_3_std value: -9.188349777126817 - type: nauc_map_at_5_diff1 value: 36.9918718052938 - type: nauc_map_at_5_max value: 16.74234374361876 - type: nauc_map_at_5_std value: -7.818523349307494 - type: nauc_mrr_at_1000_diff1 value: 26.88183002609805 - type: nauc_mrr_at_1000_max value: 47.10209348428658 - type: nauc_mrr_at_1000_std value: 32.067825924992924 - type: nauc_mrr_at_100_diff1 value: 26.871482491566745 - type: nauc_mrr_at_100_max value: 47.11303868498556 - type: nauc_mrr_at_100_std value: 32.08961428818868 - type: nauc_mrr_at_10_diff1 value: 26.6356914977722 - type: nauc_mrr_at_10_max value: 47.091624558810366 - type: nauc_mrr_at_10_std value: 31.942424120660164 - type: nauc_mrr_at_1_diff1 value: 28.19774198483673 - type: nauc_mrr_at_1_max value: 41.44380927834253 - type: nauc_mrr_at_1_std value: 25.18222691885917 - type: nauc_mrr_at_20_diff1 value: 26.86487347109452 - type: nauc_mrr_at_20_max value: 47.1987778214726 - type: nauc_mrr_at_20_std value: 32.143517921610034 - type: nauc_mrr_at_3_diff1 value: 27.34340373236422 - type: nauc_mrr_at_3_max value: 46.358726506276646 - type: nauc_mrr_at_3_std value: 31.74924155572593 - type: nauc_mrr_at_5_diff1 value: 27.209667205060672 - type: nauc_mrr_at_5_max value: 46.79883369072009 - type: nauc_mrr_at_5_std value: 31.655605306670758 - type: nauc_ndcg_at_1000_diff1 value: 18.940195769769687 - type: nauc_ndcg_at_1000_max value: 46.48551313937331 - type: nauc_ndcg_at_1000_std value: 33.64819502089232 - type: nauc_ndcg_at_100_diff1 value: 19.50885253809146 - type: nauc_ndcg_at_100_max value: 40.53174462354878 - type: nauc_ndcg_at_100_std value: 28.516152877751118 - type: nauc_ndcg_at_10_diff1 value: 16.01699218096564 - type: nauc_ndcg_at_10_max value: 41.17322878314514 - type: nauc_ndcg_at_10_std value: 29.002233224832196 - type: nauc_ndcg_at_1_diff1 value: 27.443547710102205 - type: nauc_ndcg_at_1_max value: 40.66529763309582 - type: nauc_ndcg_at_1_std value: 24.15016766225869 - type: nauc_ndcg_at_20_diff1 value: 
17.541197675685062 - type: nauc_ndcg_at_20_max value: 40.53231266973844 - type: nauc_ndcg_at_20_std value: 29.54096347876548 - type: nauc_ndcg_at_3_diff1 value: 18.649628357473716 - type: nauc_ndcg_at_3_max value: 41.18603570171764 - type: nauc_ndcg_at_3_std value: 27.125524188420396 - type: nauc_ndcg_at_5_diff1 value: 17.519593751448483 - type: nauc_ndcg_at_5_max value: 42.715997890377345 - type: nauc_ndcg_at_5_std value: 27.902627839899868 - type: nauc_precision_at_1000_diff1 value: -15.528797630565155 - type: nauc_precision_at_1000_max value: 13.741640921778671 - type: nauc_precision_at_1000_std value: 44.50896053788372 - type: nauc_precision_at_100_diff1 value: -14.491464489721887 - type: nauc_precision_at_100_max value: 23.136434418999457 - type: nauc_precision_at_100_std value: 49.73145147863128 - type: nauc_precision_at_10_diff1 value: -4.829188942994277 - type: nauc_precision_at_10_max value: 40.327612559528866 - type: nauc_precision_at_10_std value: 39.34919529635044 - type: nauc_precision_at_1_diff1 value: 28.19774198483673 - type: nauc_precision_at_1_max value: 41.44380927834253 - type: nauc_precision_at_1_std value: 25.18222691885917 - type: nauc_precision_at_20_diff1 value: -7.210726293112847 - type: nauc_precision_at_20_max value: 37.195679576636984 - type: nauc_precision_at_20_std value: 45.4597096418357 - type: nauc_precision_at_3_diff1 value: 7.578219537774854 - type: nauc_precision_at_3_max value: 41.59775233475654 - type: nauc_precision_at_3_std value: 30.764584790895118 - type: nauc_precision_at_5_diff1 value: 1.655451789039598 - type: nauc_precision_at_5_max value: 43.435739407610455 - type: nauc_precision_at_5_std value: 33.42552263325999 - type: nauc_recall_at_1000_diff1 value: 5.030705700690516 - type: nauc_recall_at_1000_max value: 19.108072570815583 - type: nauc_recall_at_1000_std value: 14.697734974217308 - type: nauc_recall_at_100_diff1 value: 14.746540318132407 - type: nauc_recall_at_100_max value: 21.798705033854795 - type: nauc_recall_at_100_std value: 11.416195108842587 - type: nauc_recall_at_10_diff1 value: 25.548642427860486 - type: nauc_recall_at_10_max value: 18.711677681987474 - type: nauc_recall_at_10_std value: -5.988904818971677 - type: nauc_recall_at_1_diff1 value: 48.8002894871328 - type: nauc_recall_at_1_max value: 5.7236722609868815 - type: nauc_recall_at_1_std value: -13.283554044471352 - type: nauc_recall_at_20_diff1 value: 23.39140739154809 - type: nauc_recall_at_20_max value: 19.351150636155474 - type: nauc_recall_at_20_std value: -2.757280266915132 - type: nauc_recall_at_3_diff1 value: 38.17453576012812 - type: nauc_recall_at_3_max value: 13.47003839643972 - type: nauc_recall_at_3_std value: -8.75780163862688 - type: nauc_recall_at_5_diff1 value: 33.02812855226899 - type: nauc_recall_at_5_max value: 15.477626408978477 - type: nauc_recall_at_5_std value: -9.072206441070708 - type: ndcg_at_1 value: 50.773999999999994 - type: ndcg_at_10 value: 41.486000000000004 - type: ndcg_at_100 value: 39.051 - type: ndcg_at_1000 value: 48.106 - type: ndcg_at_20 value: 39.432 - type: ndcg_at_3 value: 47.428 - type: ndcg_at_5 value: 45.227000000000004 - type: precision_at_1 value: 52.632 - type: precision_at_10 value: 31.146 - type: precision_at_100 value: 10.328 - type: precision_at_1000 value: 2.432 - type: precision_at_20 value: 23.793 - type: precision_at_3 value: 45.201 - type: precision_at_5 value: 39.876 - type: recall_at_1 value: 6.866 - type: recall_at_10 value: 20.447000000000003 - type: recall_at_100 value: 40.607 - type: recall_at_1000 value: 
73.411 - type: recall_at_20 value: 26.082 - type: recall_at_3 value: 12.484 - type: recall_at_5 value: 15.847 task: type: Retrieval - dataset: config: default name: MTEB NQ revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: main_score value: 69.072 - type: map_at_1 value: 45.483000000000004 - type: map_at_10 value: 62.050000000000004 - type: map_at_100 value: 62.693 - type: map_at_1000 value: 62.702999999999996 - type: map_at_20 value: 62.498 - type: map_at_3 value: 58.285 - type: map_at_5 value: 60.711000000000006 - type: mrr_at_1 value: 50.840092699884124 - type: mrr_at_10 value: 64.54635224116673 - type: mrr_at_100 value: 64.9526548702289 - type: mrr_at_1000 value: 64.95908460752281 - type: mrr_at_20 value: 64.82949565799959 - type: mrr_at_3 value: 61.89165701042856 - type: mrr_at_5 value: 63.632676709154026 - type: nauc_map_at_1000_diff1 value: 43.187285304185224 - type: nauc_map_at_1000_max value: 32.39921659632756 - type: nauc_map_at_1000_std value: -5.780901333066553 - type: nauc_map_at_100_diff1 value: 43.184487221204456 - type: nauc_map_at_100_max value: 32.41176116347982 - type: nauc_map_at_100_std value: -5.76422606662383 - type: nauc_map_at_10_diff1 value: 42.967066814031746 - type: nauc_map_at_10_max value: 32.489617364418514 - type: nauc_map_at_10_std value: -6.029045531102664 - type: nauc_map_at_1_diff1 value: 46.16376563218624 - type: nauc_map_at_1_max value: 26.342624776802232 - type: nauc_map_at_1_std value: -7.142171388751972 - type: nauc_map_at_20_diff1 value: 43.15894358608328 - type: nauc_map_at_20_max value: 32.46492198956245 - type: nauc_map_at_20_std value: -5.788373305449195 - type: nauc_map_at_3_diff1 value: 43.231752344608545 - type: nauc_map_at_3_max value: 31.68003009949564 - type: nauc_map_at_3_std value: -8.015235132765458 - type: nauc_map_at_5_diff1 value: 42.86197608819917 - type: nauc_map_at_5_max value: 32.363857571094485 - type: nauc_map_at_5_std value: -6.780487416387977 - type: nauc_mrr_at_1000_diff1 value: 43.40542912045782 - type: nauc_mrr_at_1000_max value: 32.8461770324533 - type: nauc_mrr_at_1000_std value: -3.6505425530008204 - type: nauc_mrr_at_100_diff1 value: 43.40233508014468 - type: nauc_mrr_at_100_max value: 32.85598538385942 - type: nauc_mrr_at_100_std value: -3.637477352635459 - type: nauc_mrr_at_10_diff1 value: 43.260179162806054 - type: nauc_mrr_at_10_max value: 32.942643527040474 - type: nauc_mrr_at_10_std value: -3.712052825320437 - type: nauc_mrr_at_1_diff1 value: 46.354919460881206 - type: nauc_mrr_at_1_max value: 29.1760258591106 - type: nauc_mrr_at_1_std value: -4.107225031227406 - type: nauc_mrr_at_20_diff1 value: 43.37092385434311 - type: nauc_mrr_at_20_max value: 32.93390254712846 - type: nauc_mrr_at_20_std value: -3.5719056112132006 - type: nauc_mrr_at_3_diff1 value: 43.1744474040527 - type: nauc_mrr_at_3_max value: 32.741290559777994 - type: nauc_mrr_at_3_std value: -4.72677925120697 - type: nauc_mrr_at_5_diff1 value: 43.108396819975674 - type: nauc_mrr_at_5_max value: 32.970519514893084 - type: nauc_mrr_at_5_std value: -4.090906158975974 - type: nauc_ndcg_at_1000_diff1 value: 42.786664193638714 - type: nauc_ndcg_at_1000_max value: 33.65554095609296 - type: nauc_ndcg_at_1000_std value: -4.024030130584482 - type: nauc_ndcg_at_100_diff1 value: 42.691246775210814 - type: nauc_ndcg_at_100_max value: 34.063232335110875 - type: nauc_ndcg_at_100_std value: -3.477813807415248 - type: nauc_ndcg_at_10_diff1 value: 41.90988990571757 - type: nauc_ndcg_at_10_max value: 34.58934812881633 - type: 
nauc_ndcg_at_10_std value: -4.3295110195497655 - type: nauc_ndcg_at_1_diff1 value: 46.354919460881206 - type: nauc_ndcg_at_1_max value: 29.1760258591106 - type: nauc_ndcg_at_1_std value: -4.107225031227406 - type: nauc_ndcg_at_20_diff1 value: 42.493206675867114 - type: nauc_ndcg_at_20_max value: 34.562441307459544 - type: nauc_ndcg_at_20_std value: -3.4456116866749107 - type: nauc_ndcg_at_3_diff1 value: 42.24180336502808 - type: nauc_ndcg_at_3_max value: 33.064267018100594 - type: nauc_ndcg_at_3_std value: -7.786248093572142 - type: nauc_ndcg_at_5_diff1 value: 41.692714787779565 - type: nauc_ndcg_at_5_max value: 34.20502498949156 - type: nauc_ndcg_at_5_std value: -5.979557859282785 - type: nauc_precision_at_1000_diff1 value: -13.779832506640702 - type: nauc_precision_at_1000_max value: 1.243001688631421 - type: nauc_precision_at_1000_std value: 17.351623398622323 - type: nauc_precision_at_100_diff1 value: -11.310526816290297 - type: nauc_precision_at_100_max value: 5.771669506192959 - type: nauc_precision_at_100_std value: 19.917795079540113 - type: nauc_precision_at_10_diff1 value: 2.163699384635286 - type: nauc_precision_at_10_max value: 19.66440698458386 - type: nauc_precision_at_10_std value: 13.689876348315726 - type: nauc_precision_at_1_diff1 value: 46.354919460881206 - type: nauc_precision_at_1_max value: 29.1760258591106 - type: nauc_precision_at_1_std value: -4.107225031227406 - type: nauc_precision_at_20_diff1 value: -3.038735879584471 - type: nauc_precision_at_20_max value: 14.132968299701695 - type: nauc_precision_at_20_std value: 17.78069734664346 - type: nauc_precision_at_3_diff1 value: 21.783760758070095 - type: nauc_precision_at_3_max value: 30.244127986404497 - type: nauc_precision_at_3_std value: -0.12411163467738723 - type: nauc_precision_at_5_diff1 value: 10.980635723302418 - type: nauc_precision_at_5_max value: 25.302293738975575 - type: nauc_precision_at_5_std value: 6.4740817488722024 - type: nauc_recall_at_1000_diff1 value: 34.10343772356593 - type: nauc_recall_at_1000_max value: 80.72497340357538 - type: nauc_recall_at_1000_std value: 69.54564103264093 - type: nauc_recall_at_100_diff1 value: 33.427719956774126 - type: nauc_recall_at_100_max value: 71.54086768335449 - type: nauc_recall_at_100_std value: 49.66157377654885 - type: nauc_recall_at_10_diff1 value: 33.70139560054039 - type: nauc_recall_at_10_max value: 45.47878072860151 - type: nauc_recall_at_10_std value: 1.4188516615716378 - type: nauc_recall_at_1_diff1 value: 46.16376563218624 - type: nauc_recall_at_1_max value: 26.342624776802232 - type: nauc_recall_at_1_std value: -7.142171388751972 - type: nauc_recall_at_20_diff1 value: 35.805379874970086 - type: nauc_recall_at_20_max value: 51.80479822253392 - type: nauc_recall_at_20_std value: 13.531467576460143 - type: nauc_recall_at_3_diff1 value: 37.288500141631616 - type: nauc_recall_at_3_max value: 35.07078243516728 - type: nauc_recall_at_3_std value: -10.452926441410405 - type: nauc_recall_at_5_diff1 value: 34.83186104526897 - type: nauc_recall_at_5_max value: 39.58488976496973 - type: nauc_recall_at_5_std value: -6.3049292065708835 - type: ndcg_at_1 value: 50.839999999999996 - type: ndcg_at_10 value: 69.072 - type: ndcg_at_100 value: 71.538 - type: ndcg_at_1000 value: 71.77799999999999 - type: ndcg_at_20 value: 70.41 - type: ndcg_at_3 value: 62.544999999999995 - type: ndcg_at_5 value: 66.33099999999999 - type: precision_at_1 value: 50.839999999999996 - type: precision_at_10 value: 10.495000000000001 - type: precision_at_100 value: 1.1900000000000002 - type: 
precision_at_1000 value: 0.121 - type: precision_at_20 value: 5.5809999999999995 - type: precision_at_3 value: 27.636 - type: precision_at_5 value: 18.864 - type: recall_at_1 value: 45.483000000000004 - type: recall_at_10 value: 87.483 - type: recall_at_100 value: 97.844 - type: recall_at_1000 value: 99.66199999999999 - type: recall_at_20 value: 92.294 - type: recall_at_3 value: 71.2 - type: recall_at_5 value: 79.753 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 split: test type: mteb/quora metrics: - type: main_score value: 89.58 - type: map_at_1 value: 71.819 - type: map_at_10 value: 86.04899999999999 - type: map_at_100 value: 86.648 - type: map_at_1000 value: 86.66199999999999 - type: map_at_20 value: 86.441 - type: map_at_3 value: 83.114 - type: map_at_5 value: 84.981 - type: mrr_at_1 value: 82.62 - type: mrr_at_10 value: 88.62899999999979 - type: mrr_at_100 value: 88.70918591324215 - type: mrr_at_1000 value: 88.70973091492397 - type: mrr_at_20 value: 88.68914765317221 - type: mrr_at_3 value: 87.74999999999979 - type: mrr_at_5 value: 88.36799999999974 - type: nauc_map_at_1000_diff1 value: 77.89207709760448 - type: nauc_map_at_1000_max value: 29.63371361495422 - type: nauc_map_at_1000_std value: -48.628180385874344 - type: nauc_map_at_100_diff1 value: 77.89592179104915 - type: nauc_map_at_100_max value: 29.617171506130756 - type: nauc_map_at_100_std value: -48.66057170774648 - type: nauc_map_at_10_diff1 value: 78.0618161228185 - type: nauc_map_at_10_max value: 29.178490609366737 - type: nauc_map_at_10_std value: -50.74755004592002 - type: nauc_map_at_1_diff1 value: 81.64335579973574 - type: nauc_map_at_1_max value: 21.813832226652174 - type: nauc_map_at_1_std value: -42.57570978190876 - type: nauc_map_at_20_diff1 value: 77.9299081005938 - type: nauc_map_at_20_max value: 29.458718470003888 - type: nauc_map_at_20_std value: -49.63337236763102 - type: nauc_map_at_3_diff1 value: 78.72941448509229 - type: nauc_map_at_3_max value: 26.600997896960056 - type: nauc_map_at_3_std value: -51.889002227479885 - type: nauc_map_at_5_diff1 value: 78.31466610917171 - type: nauc_map_at_5_max value: 28.09863984582896 - type: nauc_map_at_5_std value: -52.14058096096497 - type: nauc_mrr_at_1000_diff1 value: 78.42667263739992 - type: nauc_mrr_at_1000_max value: 31.98996235127974 - type: nauc_mrr_at_1000_std value: -44.380439148429296 - type: nauc_mrr_at_100_diff1 value: 78.42661032698115 - type: nauc_mrr_at_100_max value: 31.991652631740102 - type: nauc_mrr_at_100_std value: -44.37854108460535 - type: nauc_mrr_at_10_diff1 value: 78.39126022544136 - type: nauc_mrr_at_10_max value: 32.02023484451197 - type: nauc_mrr_at_10_std value: -44.561252349176954 - type: nauc_mrr_at_1_diff1 value: 79.21630894647448 - type: nauc_mrr_at_1_max value: 31.526303156060177 - type: nauc_mrr_at_1_std value: -41.887504422443136 - type: nauc_mrr_at_20_diff1 value: 78.42548039170424 - type: nauc_mrr_at_20_max value: 31.99588275070137 - type: nauc_mrr_at_20_std value: -44.44957722627042 - type: nauc_mrr_at_3_diff1 value: 78.26165151833735 - type: nauc_mrr_at_3_max value: 32.18028826126801 - type: nauc_mrr_at_3_std value: -44.6998237213182 - type: nauc_mrr_at_5_diff1 value: 78.34786430903962 - type: nauc_mrr_at_5_max value: 32.168476272879566 - type: nauc_mrr_at_5_std value: -44.7915919956712 - type: nauc_ndcg_at_1000_diff1 value: 77.79198355957816 - type: nauc_ndcg_at_1000_max value: 31.14363511518406 - type: nauc_ndcg_at_1000_std value: -46.69335151274275 
- type: nauc_ndcg_at_100_diff1 value: 77.79898090286419 - type: nauc_ndcg_at_100_max value: 31.115103811629215 - type: nauc_ndcg_at_100_std value: -46.73078913421965 - type: nauc_ndcg_at_10_diff1 value: 77.74856635461343 - type: nauc_ndcg_at_10_max value: 30.279584686212747 - type: nauc_ndcg_at_10_std value: -50.23514662356807 - type: nauc_ndcg_at_1_diff1 value: 79.17833000040999 - type: nauc_ndcg_at_1_max value: 31.703788144510746 - type: nauc_ndcg_at_1_std value: -41.854817402870715 - type: nauc_ndcg_at_20_diff1 value: 77.7380353804671 - type: nauc_ndcg_at_20_max value: 30.622294129001553 - type: nauc_ndcg_at_20_std value: -49.035794761065254 - type: nauc_ndcg_at_3_diff1 value: 77.41476880573593 - type: nauc_ndcg_at_3_max value: 29.015949978243032 - type: nauc_ndcg_at_3_std value: -49.78627087622648 - type: nauc_ndcg_at_5_diff1 value: 77.64439137502896 - type: nauc_ndcg_at_5_max value: 29.444684897492206 - type: nauc_ndcg_at_5_std value: -51.21908400252501 - type: nauc_precision_at_1000_diff1 value: -44.92396459446822 - type: nauc_precision_at_1000_max value: -3.674153720989045 - type: nauc_precision_at_1000_std value: 39.56552468277785 - type: nauc_precision_at_100_diff1 value: -44.75143023259094 - type: nauc_precision_at_100_max value: -3.705280025140011 - type: nauc_precision_at_100_std value: 39.433619999113326 - type: nauc_precision_at_10_diff1 value: -41.0651074726579 - type: nauc_precision_at_10_max value: -0.21097985601783667 - type: nauc_precision_at_10_std value: 26.24652824589493 - type: nauc_precision_at_1_diff1 value: 79.17833000040999 - type: nauc_precision_at_1_max value: 31.703788144510746 - type: nauc_precision_at_1_std value: -41.854817402870715 - type: nauc_precision_at_20_diff1 value: -43.368001340920294 - type: nauc_precision_at_20_max value: -2.036990010399129 - type: nauc_precision_at_20_std value: 32.37747041406297 - type: nauc_precision_at_3_diff1 value: -22.089307548346877 - type: nauc_precision_at_3_max value: 6.2280973175296 - type: nauc_precision_at_3_std value: 5.323992514036145 - type: nauc_precision_at_5_diff1 value: -34.07115055244003 - type: nauc_precision_at_5_max value: 2.5955315789198834 - type: nauc_precision_at_5_std value: 16.26096689407332 - type: nauc_recall_at_1000_diff1 value: 58.27703860947467 - type: nauc_recall_at_1000_max value: 68.59835835315768 - type: nauc_recall_at_1000_std value: 77.96687006056064 - type: nauc_recall_at_100_diff1 value: 73.24371223081737 - type: nauc_recall_at_100_max value: 39.55925344664591 - type: nauc_recall_at_100_std value: -32.25605030215798 - type: nauc_recall_at_10_diff1 value: 73.41261201339202 - type: nauc_recall_at_10_max value: 26.822979434062926 - type: nauc_recall_at_10_std value: -74.2909332592806 - type: nauc_recall_at_1_diff1 value: 81.64335579973574 - type: nauc_recall_at_1_max value: 21.813832226652174 - type: nauc_recall_at_1_std value: -42.57570978190876 - type: nauc_recall_at_20_diff1 value: 72.7621297920656 - type: nauc_recall_at_20_max value: 26.02492304096079 - type: nauc_recall_at_20_std value: -77.8724532438279 - type: nauc_recall_at_3_diff1 value: 75.25149312810714 - type: nauc_recall_at_3_max value: 23.20545662481487 - type: nauc_recall_at_3_std value: -59.69689982140521 - type: nauc_recall_at_5_diff1 value: 73.69807273001406 - type: nauc_recall_at_5_max value: 24.073666798066057 - type: nauc_recall_at_5_std value: -67.91121268130719 - type: ndcg_at_1 value: 82.64 - type: ndcg_at_10 value: 89.58 - type: ndcg_at_100 value: 90.606 - type: ndcg_at_1000 value: 90.676 - type: ndcg_at_20 value: 
90.132 - type: ndcg_at_3 value: 86.88 - type: ndcg_at_5 value: 88.40299999999999 - type: precision_at_1 value: 82.64 - type: precision_at_10 value: 13.604 - type: precision_at_100 value: 1.539 - type: precision_at_1000 value: 0.157 - type: precision_at_20 value: 7.188 - type: precision_at_3 value: 38.083 - type: precision_at_5 value: 25.018 - type: recall_at_1 value: 71.819 - type: recall_at_10 value: 96.34700000000001 - type: recall_at_100 value: 99.715 - type: recall_at_1000 value: 99.995 - type: recall_at_20 value: 98.073 - type: recall_at_3 value: 88.57300000000001 - type: recall_at_5 value: 92.908 task: type: Retrieval - dataset: config: default name: MTEB RedditClustering revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: main_score value: 71.18966762070158 - type: v_measure value: 71.18966762070158 - type: v_measure_std value: 2.7498969054457048 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: main_score value: 74.42014716862516 - type: v_measure value: 74.42014716862516 - type: v_measure_std value: 9.909739891410648 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 split: test type: mteb/scidocs metrics: - type: main_score value: 25.041999999999998 - type: map_at_1 value: 5.893000000000001 - type: map_at_10 value: 15.260000000000002 - type: map_at_100 value: 18.084 - type: map_at_1000 value: 18.467 - type: map_at_20 value: 16.675 - type: map_at_3 value: 10.526 - type: map_at_5 value: 12.775 - type: mrr_at_1 value: 28.999999999999996 - type: mrr_at_10 value: 41.03575396825395 - type: mrr_at_100 value: 42.136771862785835 - type: mrr_at_1000 value: 42.16698555415099 - type: mrr_at_20 value: 41.707493696104315 - type: mrr_at_3 value: 37.34999999999998 - type: mrr_at_5 value: 39.59999999999995 - type: nauc_map_at_1000_diff1 value: 12.080002654911883 - type: nauc_map_at_1000_max value: 29.813563682286276 - type: nauc_map_at_1000_std value: 20.36659817908673 - type: nauc_map_at_100_diff1 value: 12.108735517749706 - type: nauc_map_at_100_max value: 29.76830671710955 - type: nauc_map_at_100_std value: 20.3433621032846 - type: nauc_map_at_10_diff1 value: 12.91575031185637 - type: nauc_map_at_10_max value: 29.427600958386318 - type: nauc_map_at_10_std value: 16.89867275177153 - type: nauc_map_at_1_diff1 value: 19.353069488987916 - type: nauc_map_at_1_max value: 17.093914951159693 - type: nauc_map_at_1_std value: 8.19886078055046 - type: nauc_map_at_20_diff1 value: 11.977233457943113 - type: nauc_map_at_20_max value: 29.171812822948805 - type: nauc_map_at_20_std value: 18.780517506173965 - type: nauc_map_at_3_diff1 value: 14.453129464176092 - type: nauc_map_at_3_max value: 25.801958649112077 - type: nauc_map_at_3_std value: 11.572823684429643 - type: nauc_map_at_5_diff1 value: 13.167155808104997 - type: nauc_map_at_5_max value: 27.355626948365792 - type: nauc_map_at_5_std value: 14.414151839192183 - type: nauc_mrr_at_1000_diff1 value: 17.262104643988636 - type: nauc_mrr_at_1000_max value: 23.991373837217058 - type: nauc_mrr_at_1000_std value: 12.44755488671623 - type: nauc_mrr_at_100_diff1 value: 17.267280132318703 - type: nauc_mrr_at_100_max value: 24.022189287889294 - type: nauc_mrr_at_100_std value: 12.480695500214788 - type: nauc_mrr_at_10_diff1 value: 17.012383998246268 - type: nauc_mrr_at_10_max value: 
24.192637911171722 - type: nauc_mrr_at_10_std value: 12.524608847408917 - type: nauc_mrr_at_1_diff1 value: 19.43518811038007 - type: nauc_mrr_at_1_max value: 17.747482933395602 - type: nauc_mrr_at_1_std value: 8.410779775558684 - type: nauc_mrr_at_20_diff1 value: 17.202663281407446 - type: nauc_mrr_at_20_max value: 24.091991130543118 - type: nauc_mrr_at_20_std value: 12.503814263019908 - type: nauc_mrr_at_3_diff1 value: 17.52733013432995 - type: nauc_mrr_at_3_max value: 23.569459518780214 - type: nauc_mrr_at_3_std value: 11.770846827520726 - type: nauc_mrr_at_5_diff1 value: 17.10817561975543 - type: nauc_mrr_at_5_max value: 23.945141435234678 - type: nauc_mrr_at_5_std value: 12.034468615317719 - type: nauc_ndcg_at_1000_diff1 value: 12.317811393346936 - type: nauc_ndcg_at_1000_max value: 30.809991350156103 - type: nauc_ndcg_at_1000_std value: 24.517501065205067 - type: nauc_ndcg_at_100_diff1 value: 12.824804203182936 - type: nauc_ndcg_at_100_max value: 30.895499817010748 - type: nauc_ndcg_at_100_std value: 25.424376279745402 - type: nauc_ndcg_at_10_diff1 value: 13.32724552457439 - type: nauc_ndcg_at_10_max value: 30.409088666807456 - type: nauc_ndcg_at_10_std value: 18.216330475714113 - type: nauc_ndcg_at_1_diff1 value: 19.43518811038007 - type: nauc_ndcg_at_1_max value: 17.747482933395602 - type: nauc_ndcg_at_1_std value: 8.410779775558684 - type: nauc_ndcg_at_20_diff1 value: 12.224399111852902 - type: nauc_ndcg_at_20_max value: 29.86352330445272 - type: nauc_ndcg_at_20_std value: 21.196937851331807 - type: nauc_ndcg_at_3_diff1 value: 15.367489533734027 - type: nauc_ndcg_at_3_max value: 26.76486390741532 - type: nauc_ndcg_at_3_std value: 12.606077508789923 - type: nauc_ndcg_at_5_diff1 value: 13.831157482390935 - type: nauc_ndcg_at_5_max value: 28.070226983968904 - type: nauc_ndcg_at_5_std value: 15.236787943125435 - type: nauc_precision_at_1000_diff1 value: 0.016122957101357048 - type: nauc_precision_at_1000_max value: 24.380929903557334 - type: nauc_precision_at_1000_std value: 34.54045112720052 - type: nauc_precision_at_100_diff1 value: 7.255224788507301 - type: nauc_precision_at_100_max value: 27.98453788447542 - type: nauc_precision_at_100_std value: 35.38999555441665 - type: nauc_precision_at_10_diff1 value: 9.69185099834181 - type: nauc_precision_at_10_max value: 32.532315522580454 - type: nauc_precision_at_10_std value: 21.48948348473612 - type: nauc_precision_at_1_diff1 value: 19.43518811038007 - type: nauc_precision_at_1_max value: 17.747482933395602 - type: nauc_precision_at_1_std value: 8.410779775558684 - type: nauc_precision_at_20_diff1 value: 6.964076536695672 - type: nauc_precision_at_20_max value: 29.30087236410044 - type: nauc_precision_at_20_std value: 26.413625895571986 - type: nauc_precision_at_3_diff1 value: 14.145134359925155 - type: nauc_precision_at_3_max value: 29.915650960808303 - type: nauc_precision_at_3_std value: 14.095370019867797 - type: nauc_precision_at_5_diff1 value: 11.043933558522692 - type: nauc_precision_at_5_max value: 30.93016505807111 - type: nauc_precision_at_5_std value: 17.749256196062603 - type: nauc_recall_at_1000_diff1 value: -0.7776817772090345 - type: nauc_recall_at_1000_max value: 23.094717340324518 - type: nauc_recall_at_1000_std value: 37.189908681396425 - type: nauc_recall_at_100_diff1 value: 6.887748742013364 - type: nauc_recall_at_100_max value: 27.00798435230277 - type: nauc_recall_at_100_std value: 35.908147807345344 - type: nauc_recall_at_10_diff1 value: 9.605632017480751 - type: nauc_recall_at_10_max value: 31.845202901168655 - 
type: nauc_recall_at_10_std value: 21.497414586634683 - type: nauc_recall_at_1_diff1 value: 19.353069488987916 - type: nauc_recall_at_1_max value: 17.093914951159693 - type: nauc_recall_at_1_std value: 8.19886078055046 - type: nauc_recall_at_20_diff1 value: 6.927503731844782 - type: nauc_recall_at_20_max value: 28.611698183338202 - type: nauc_recall_at_20_std value: 26.69018660149911 - type: nauc_recall_at_3_diff1 value: 14.043724087062268 - type: nauc_recall_at_3_max value: 29.269835821380465 - type: nauc_recall_at_3_std value: 14.104419605998094 - type: nauc_recall_at_5_diff1 value: 11.017319452873336 - type: nauc_recall_at_5_max value: 30.295720628306228 - type: nauc_recall_at_5_std value: 17.758048545573825 - type: ndcg_at_1 value: 28.999999999999996 - type: ndcg_at_10 value: 25.041999999999998 - type: ndcg_at_100 value: 35.045 - type: ndcg_at_1000 value: 40.803 - type: ndcg_at_20 value: 28.584 - type: ndcg_at_3 value: 23.249 - type: ndcg_at_5 value: 20.533 - type: precision_at_1 value: 28.999999999999996 - type: precision_at_10 value: 13.120000000000001 - type: precision_at_100 value: 2.7470000000000003 - type: precision_at_1000 value: 0.41200000000000003 - type: precision_at_20 value: 8.584999999999999 - type: precision_at_3 value: 21.633 - type: precision_at_5 value: 18.099999999999998 - type: recall_at_1 value: 5.893000000000001 - type: recall_at_10 value: 26.567 - type: recall_at_100 value: 55.800000000000004 - type: recall_at_1000 value: 83.608 - type: recall_at_20 value: 34.86 - type: recall_at_3 value: 13.153 - type: recall_at_5 value: 18.323 task: type: Retrieval - dataset: config: default name: MTEB SICK-R revision: 20a6d6f312dd54037fe07a32d58e5e168867909d split: test type: mteb/sickr-sts metrics: - type: cosine_pearson value: 86.57284584320382 - type: cosine_spearman value: 82.20531642680812 - type: euclidean_pearson value: 83.94261758556554 - type: euclidean_spearman value: 82.20721497738559 - type: main_score value: 82.20531642680812 - type: manhattan_pearson value: 84.15902154703083 - type: manhattan_spearman value: 82.19506027155957 - type: pearson value: 86.57284584320382 - type: spearman value: 82.20531642680812 task: type: STS - dataset: config: default name: MTEB STS12 revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: cosine_pearson value: 86.28047602146931 - type: cosine_spearman value: 79.51504881448884 - type: euclidean_pearson value: 83.10545189967856 - type: euclidean_spearman value: 79.50586960492797 - type: main_score value: 79.51504881448884 - type: manhattan_pearson value: 83.44244457500889 - type: manhattan_spearman value: 79.730303339846 - type: pearson value: 86.28047602146931 - type: spearman value: 79.51504881448884 task: type: STS - dataset: config: default name: MTEB STS13 revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: cosine_pearson value: 88.74723553048702 - type: cosine_spearman value: 89.18936052329725 - type: euclidean_pearson value: 88.90400878928668 - type: euclidean_spearman value: 89.19174821431281 - type: main_score value: 89.18936052329725 - type: manhattan_pearson value: 88.81504628424054 - type: manhattan_spearman value: 89.18063294142597 - type: pearson value: 88.74723553048702 - type: spearman value: 89.18936052329725 task: type: STS - dataset: config: default name: MTEB STS14 revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: cosine_pearson value: 86.45403437836023 - type: 
cosine_spearman value: 85.14654611519086 - type: euclidean_pearson value: 85.87509624462743 - type: euclidean_spearman value: 85.1391108856681 - type: main_score value: 85.14654611519086 - type: manhattan_pearson value: 85.96635794953866 - type: manhattan_spearman value: 85.3271371527667 - type: pearson value: 86.45403437836023 - type: spearman value: 85.14654611519086 task: type: STS - dataset: config: default name: MTEB STS15 revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: cosine_pearson value: 87.84742260009705 - type: cosine_spearman value: 89.10215217191254 - type: euclidean_pearson value: 88.97393286325477 - type: euclidean_spearman value: 89.1014105509662 - type: main_score value: 89.10215217191254 - type: manhattan_pearson value: 89.31698781090151 - type: manhattan_spearman value: 89.53000001764433 - type: pearson value: 87.84742260009705 - type: spearman value: 89.10215217191254 task: type: STS - dataset: config: default name: MTEB STS16 revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: cosine_pearson value: 85.22397535461835 - type: cosine_spearman value: 87.14066355879785 - type: euclidean_pearson value: 86.31393364087295 - type: euclidean_spearman value: 87.14018892702765 - type: main_score value: 87.14066355879785 - type: manhattan_pearson value: 86.36366855248434 - type: manhattan_spearman value: 87.20858630423012 - type: pearson value: 85.22397535461835 - type: spearman value: 87.14066355879785 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 90.66131612061355 - type: cosine_spearman value: 90.97082650129164 - type: euclidean_pearson value: 90.98181906744969 - type: euclidean_spearman value: 90.99008476850047 - type: main_score value: 90.97082650129164 - type: manhattan_pearson value: 90.75245040709021 - type: manhattan_spearman value: 90.6199877691265 - type: pearson value: 90.66131612061355 - type: spearman value: 90.97082650129164 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 67.270656447085 - type: cosine_spearman value: 67.82870469746828 - type: euclidean_pearson value: 69.03857775285664 - type: euclidean_spearman value: 67.74455108773341 - type: main_score value: 67.82870469746828 - type: manhattan_pearson value: 69.25304172245812 - type: manhattan_spearman value: 68.00987097916055 - type: pearson value: 67.270656447085 - type: spearman value: 67.82870469746828 task: type: STS - dataset: config: default name: MTEB STSBenchmark revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: cosine_pearson value: 87.17245205384889 - type: cosine_spearman value: 87.7360146030987 - type: euclidean_pearson value: 87.48919412794656 - type: euclidean_spearman value: 87.7312047878383 - type: main_score value: 87.7360146030987 - type: manhattan_pearson value: 87.61476224354806 - type: manhattan_spearman value: 87.95220889254693 - type: pearson value: 87.17245205384889 - type: spearman value: 87.7360146030987 task: type: STS - dataset: config: default name: MTEB SciDocsRR revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: main_score value: 88.43547871921146 - type: map value: 
88.43547871921146 - type: mrr value: 96.5564473652709 - type: nAUC_map_diff1 value: -13.66029392579231 - type: nAUC_map_max value: 50.325613574053506 - type: nAUC_map_std value: 60.02986231275796 - type: nAUC_mrr_diff1 value: 23.83821476411125 - type: nAUC_mrr_max value: 86.72643311769906 - type: nAUC_mrr_std value: 72.12741063469213 task: type: Reranking - dataset: config: default name: MTEB SciFact revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: main_score value: 78.233 - type: map_at_1 value: 61.49400000000001 - type: map_at_10 value: 73.30600000000001 - type: map_at_100 value: 73.719 - type: map_at_1000 value: 73.724 - type: map_at_20 value: 73.611 - type: map_at_3 value: 70.626 - type: map_at_5 value: 72.417 - type: mrr_at_1 value: 64.66666666666666 - type: mrr_at_10 value: 74.30357142857143 - type: mrr_at_100 value: 74.56950898079988 - type: mrr_at_1000 value: 74.57295833098681 - type: mrr_at_20 value: 74.46165223665226 - type: mrr_at_3 value: 72.3888888888889 - type: mrr_at_5 value: 73.60555555555557 - type: nauc_map_at_1000_diff1 value: 76.51524604780636 - type: nauc_map_at_1000_max value: 53.48521938401881 - type: nauc_map_at_1000_std value: -7.347799382158861 - type: nauc_map_at_100_diff1 value: 76.5122888096236 - type: nauc_map_at_100_max value: 53.49221847471618 - type: nauc_map_at_100_std value: -7.329683735681086 - type: nauc_map_at_10_diff1 value: 76.30928630674504 - type: nauc_map_at_10_max value: 53.00102977185941 - type: nauc_map_at_10_std value: -7.7467740085108705 - type: nauc_map_at_1_diff1 value: 79.54189281784247 - type: nauc_map_at_1_max value: 46.630071622109526 - type: nauc_map_at_1_std value: -14.395943134644112 - type: nauc_map_at_20_diff1 value: 76.41604361947962 - type: nauc_map_at_20_max value: 53.578883876146875 - type: nauc_map_at_20_std value: -7.403103451288041 - type: nauc_map_at_3_diff1 value: 76.25911617571941 - type: nauc_map_at_3_max value: 49.140287380513605 - type: nauc_map_at_3_std value: -11.35992449218983 - type: nauc_map_at_5_diff1 value: 76.35122077770336 - type: nauc_map_at_5_max value: 52.1744367901208 - type: nauc_map_at_5_std value: -7.85753955055384 - type: nauc_mrr_at_1000_diff1 value: 76.97223309515867 - type: nauc_mrr_at_1000_max value: 57.263787498613326 - type: nauc_mrr_at_1000_std value: -4.884090708840035 - type: nauc_mrr_at_100_diff1 value: 76.97312970894603 - type: nauc_mrr_at_100_max value: 57.26850730446478 - type: nauc_mrr_at_100_std value: -4.875200894216617 - type: nauc_mrr_at_10_diff1 value: 76.65927674223613 - type: nauc_mrr_at_10_max value: 57.30979763941454 - type: nauc_mrr_at_10_std value: -4.863331094022142 - type: nauc_mrr_at_1_diff1 value: 80.0454932568644 - type: nauc_mrr_at_1_max value: 56.76038421319305 - type: nauc_mrr_at_1_std value: -4.101939392632653 - type: nauc_mrr_at_20_diff1 value: 76.87237970440503 - type: nauc_mrr_at_20_max value: 57.33843605225869 - type: nauc_mrr_at_20_std value: -4.96248984417978 - type: nauc_mrr_at_3_diff1 value: 76.74130186666727 - type: nauc_mrr_at_3_max value: 56.19313244846155 - type: nauc_mrr_at_3_std value: -5.684365934009136 - type: nauc_mrr_at_5_diff1 value: 76.66406918799962 - type: nauc_mrr_at_5_max value: 57.56110093228628 - type: nauc_mrr_at_5_std value: -3.7464413085588073 - type: nauc_ndcg_at_1000_diff1 value: 76.19194173971773 - type: nauc_ndcg_at_1000_max value: 55.57464600170693 - type: nauc_ndcg_at_1000_std value: -6.0761689532372625 - type: nauc_ndcg_at_100_diff1 value: 76.14631273843654 - type: 
nauc_ndcg_at_100_max value: 55.72246565373382 - type: nauc_ndcg_at_100_std value: -5.595160698860595 - type: nauc_ndcg_at_10_diff1 value: 75.0108223611192 - type: nauc_ndcg_at_10_max value: 55.27894212877493 - type: nauc_ndcg_at_10_std value: -6.968331740214591 - type: nauc_ndcg_at_1_diff1 value: 80.0454932568644 - type: nauc_ndcg_at_1_max value: 56.76038421319305 - type: nauc_ndcg_at_1_std value: -4.101939392632653 - type: nauc_ndcg_at_20_diff1 value: 75.54887755702472 - type: nauc_ndcg_at_20_max value: 56.406879417251496 - type: nauc_ndcg_at_20_std value: -6.495231061329629 - type: nauc_ndcg_at_3_diff1 value: 75.03620356688509 - type: nauc_ndcg_at_3_max value: 52.147381077773424 - type: nauc_ndcg_at_3_std value: -8.448005688956199 - type: nauc_ndcg_at_5_diff1 value: 75.1195898074229 - type: nauc_ndcg_at_5_max value: 54.2321033861173 - type: nauc_ndcg_at_5_std value: -5.882690780895338 - type: nauc_precision_at_1000_diff1 value: -28.081979732100532 - type: nauc_precision_at_1000_max value: 35.055348014832916 - type: nauc_precision_at_1000_std value: 59.61280468927384 - type: nauc_precision_at_100_diff1 value: -25.112740730587458 - type: nauc_precision_at_100_max value: 38.26331300116496 - type: nauc_precision_at_100_std value: 62.46316222328831 - type: nauc_precision_at_10_diff1 value: -2.6766206473658833 - type: nauc_precision_at_10_max value: 45.95321867204845 - type: nauc_precision_at_10_std value: 45.07212468670564 - type: nauc_precision_at_1_diff1 value: 80.0454932568644 - type: nauc_precision_at_1_max value: 56.76038421319305 - type: nauc_precision_at_1_std value: -4.101939392632653 - type: nauc_precision_at_20_diff1 value: -10.698911116738385 - type: nauc_precision_at_20_max value: 43.467275950182994 - type: nauc_precision_at_20_std value: 48.00467321991766 - type: nauc_precision_at_3_diff1 value: 33.6344708541193 - type: nauc_precision_at_3_max value: 49.309242331670504 - type: nauc_precision_at_3_std value: 21.02940391379915 - type: nauc_precision_at_5_diff1 value: 13.560415600596318 - type: nauc_precision_at_5_max value: 48.918726500100085 - type: nauc_precision_at_5_std value: 39.940930429172184 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 70.82166199813196 - type: nauc_recall_at_100_max value: 76.6106442577042 - type: nauc_recall_at_100_std value: 66.47992530345513 - type: nauc_recall_at_10_diff1 value: 62.68908885556092 - type: nauc_recall_at_10_max value: 58.14262437741839 - type: nauc_recall_at_10_std value: -12.946717875063369 - type: nauc_recall_at_1_diff1 value: 79.54189281784247 - type: nauc_recall_at_1_max value: 46.630071622109526 - type: nauc_recall_at_1_std value: -14.395943134644112 - type: nauc_recall_at_20_diff1 value: 65.79470497876567 - type: nauc_recall_at_20_max value: 71.68308183488456 - type: nauc_recall_at_20_std value: -12.556850697268453 - type: nauc_recall_at_3_diff1 value: 68.3240211318129 - type: nauc_recall_at_3_max value: 45.05998217275036 - type: nauc_recall_at_3_std value: -14.23179772593869 - type: nauc_recall_at_5_diff1 value: 67.53366869904056 - type: nauc_recall_at_5_max value: 53.57935627081027 - type: nauc_recall_at_5_std value: -3.3271112904853393 - type: ndcg_at_1 value: 64.667 - type: ndcg_at_10 value: 78.233 - type: ndcg_at_100 value: 79.806 - type: ndcg_at_1000 value: 79.92099999999999 - type: ndcg_at_20 value: 79.006 - type: ndcg_at_3 value: 74.018 - type: ndcg_at_5 value: 76.334 - type: precision_at_1 value: 
64.667 - type: precision_at_10 value: 10.4 - type: precision_at_100 value: 1.1199999999999999 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_20 value: 5.383 - type: precision_at_3 value: 29.444 - type: precision_at_5 value: 19.467000000000002 - type: recall_at_1 value: 61.49400000000001 - type: recall_at_10 value: 92.156 - type: recall_at_100 value: 99.167 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 94.833 - type: recall_at_3 value: 80.833 - type: recall_at_5 value: 86.6 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: cosine_accuracy value: 99.8039603960396 - type: cosine_accuracy_threshold value: 84.54211950302124 - type: cosine_ap value: 95.59056372734358 - type: cosine_f1 value: 90.1394422310757 - type: cosine_f1_threshold value: 84.54211950302124 - type: cosine_precision value: 89.78174603174604 - type: cosine_recall value: 90.5 - type: dot_accuracy value: 99.80594059405941 - type: dot_accuracy_threshold value: 85.57180166244507 - type: dot_ap value: 95.53453431914399 - type: dot_f1 value: 90.10442565887618 - type: dot_f1_threshold value: 84.59715843200684 - type: dot_precision value: 89.61424332344214 - type: dot_recall value: 90.60000000000001 - type: euclidean_accuracy value: 99.8039603960396 - type: euclidean_accuracy_threshold value: 53.253382444381714 - type: euclidean_ap value: 95.5850992402159 - type: euclidean_f1 value: 90.09457441513192 - type: euclidean_f1_threshold value: 55.725520849227905 - type: euclidean_precision value: 89.69276511397423 - type: euclidean_recall value: 90.5 - type: main_score value: 95.7485189884476 - type: manhattan_accuracy value: 99.81485148514851 - type: manhattan_accuracy_threshold value: 3491.29638671875 - type: manhattan_ap value: 95.7485189884476 - type: manhattan_f1 value: 90.464048954615 - type: manhattan_f1_threshold value: 3491.29638671875 - type: manhattan_precision value: 92.2996878251821 - type: manhattan_recall value: 88.7 - type: max_ap value: 95.7485189884476 - type: max_f1 value: 90.464048954615 - type: max_precision value: 92.2996878251821 - type: max_recall value: 90.60000000000001 - type: similarity_accuracy value: 99.8039603960396 - type: similarity_accuracy_threshold value: 84.54211950302124 - type: similarity_ap value: 95.59056372734358 - type: similarity_f1 value: 90.1394422310757 - type: similarity_f1_threshold value: 84.54211950302124 - type: similarity_precision value: 89.78174603174604 - type: similarity_recall value: 90.5 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: main_score value: 78.49205191950675 - type: v_measure value: 78.49205191950675 - type: v_measure_std value: 2.84869550699959 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: main_score value: 48.90421736513028 - type: v_measure value: 48.90421736513028 - type: v_measure_std value: 1.6875865714471023 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: main_score value: 
52.9874730481696 - type: map value: 52.9874730481696 - type: mrr value: 53.85867604617604 - type: nAUC_map_diff1 value: 39.633429293407616 - type: nAUC_map_max value: 10.236807988858546 - type: nAUC_map_std value: 10.276522217929674 - type: nAUC_mrr_diff1 value: 40.0543079218377 - type: nAUC_mrr_max value: 10.96209807382042 - type: nAUC_mrr_std value: 10.524400196109918 task: type: Reranking - dataset: config: default name: MTEB SummEval revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: cosine_pearson value: 30.727801109114232 - type: cosine_spearman value: 31.66058223980157 - type: dot_pearson value: 30.78818248622866 - type: dot_spearman value: 31.525158776890265 - type: main_score value: 31.66058223980157 - type: pearson value: 30.727801109114232 - type: spearman value: 31.66058223980157 task: type: Summarization - dataset: config: default name: MTEB TRECCOVID revision: bb9466bac8153a0349341eb1b22e06409e78ef4e split: test type: mteb/trec-covid metrics: - type: main_score value: 85.206 - type: map_at_1 value: 0.246 - type: map_at_10 value: 2.1950000000000003 - type: map_at_100 value: 14.179 - type: map_at_1000 value: 35.037 - type: map_at_20 value: 4.143 - type: map_at_3 value: 0.7100000000000001 - type: map_at_5 value: 1.135 - type: mrr_at_1 value: 94.0 - type: mrr_at_10 value: 96.66666666666666 - type: mrr_at_100 value: 96.66666666666666 - type: mrr_at_1000 value: 96.66666666666666 - type: mrr_at_20 value: 96.66666666666666 - type: mrr_at_3 value: 96.66666666666666 - type: mrr_at_5 value: 96.66666666666666 - type: nauc_map_at_1000_diff1 value: -4.6264497624527525 - type: nauc_map_at_1000_max value: 44.594457564749355 - type: nauc_map_at_1000_std value: 73.17642341400133 - type: nauc_map_at_100_diff1 value: 23.451335157405726 - type: nauc_map_at_100_max value: 25.426398857299525 - type: nauc_map_at_100_std value: 64.07416694472633 - type: nauc_map_at_10_diff1 value: 46.57568738568346 - type: nauc_map_at_10_max value: 9.693233249079238 - type: nauc_map_at_10_std value: 28.549530265164357 - type: nauc_map_at_1_diff1 value: 53.48238396620123 - type: nauc_map_at_1_max value: 0.33476619393733076 - type: nauc_map_at_1_std value: 8.906362219128463 - type: nauc_map_at_20_diff1 value: 39.40719602207749 - type: nauc_map_at_20_max value: 9.635915072074045 - type: nauc_map_at_20_std value: 35.15634791346394 - type: nauc_map_at_3_diff1 value: 53.11784737840137 - type: nauc_map_at_3_max value: 3.059682761072153 - type: nauc_map_at_3_std value: 21.310633086556617 - type: nauc_map_at_5_diff1 value: 49.91570701185436 - type: nauc_map_at_5_max value: 8.045082896244576 - type: nauc_map_at_5_std value: 20.597686235051647 - type: nauc_mrr_at_1000_diff1 value: 41.98412698412726 - type: nauc_mrr_at_1000_max value: 78.24463118580779 - type: nauc_mrr_at_1000_std value: 0.30812324930028195 - type: nauc_mrr_at_100_diff1 value: 41.98412698412726 - type: nauc_mrr_at_100_max value: 78.24463118580779 - type: nauc_mrr_at_100_std value: 0.30812324930028195 - type: nauc_mrr_at_10_diff1 value: 41.98412698412726 - type: nauc_mrr_at_10_max value: 78.24463118580779 - type: nauc_mrr_at_10_std value: 0.30812324930028195 - type: nauc_mrr_at_1_diff1 value: 38.62433862433873 - type: nauc_mrr_at_1_max value: 80.78120136943666 - type: nauc_mrr_at_1_std value: -10.768751945222197 - type: nauc_mrr_at_20_diff1 value: 41.98412698412726 - type: nauc_mrr_at_20_max value: 78.24463118580779 - type: nauc_mrr_at_20_std value: 0.30812324930028195 - type: nauc_mrr_at_3_diff1 value: 
41.98412698412726 - type: nauc_mrr_at_3_max value: 78.24463118580779 - type: nauc_mrr_at_3_std value: 0.30812324930028195 - type: nauc_mrr_at_5_diff1 value: 41.98412698412726 - type: nauc_mrr_at_5_max value: 78.24463118580779 - type: nauc_mrr_at_5_std value: 0.30812324930028195 - type: nauc_ndcg_at_1000_diff1 value: 0.5174948602880207 - type: nauc_ndcg_at_1000_max value: 48.60686602077053 - type: nauc_ndcg_at_1000_std value: 75.72456343175277 - type: nauc_ndcg_at_100_diff1 value: -20.747252137999254 - type: nauc_ndcg_at_100_max value: 49.985132618254994 - type: nauc_ndcg_at_100_std value: 61.096383293836574 - type: nauc_ndcg_at_10_diff1 value: 6.791377920463332 - type: nauc_ndcg_at_10_max value: 57.50019332833286 - type: nauc_ndcg_at_10_std value: 49.201028841219426 - type: nauc_ndcg_at_1_diff1 value: 54.92683440362145 - type: nauc_ndcg_at_1_max value: 83.8667228129276 - type: nauc_ndcg_at_1_std value: 1.6738604063586122 - type: nauc_ndcg_at_20_diff1 value: -5.1948699196314925 - type: nauc_ndcg_at_20_max value: 54.483087684806556 - type: nauc_ndcg_at_20_std value: 50.54823818118781 - type: nauc_ndcg_at_3_diff1 value: 26.267246500164372 - type: nauc_ndcg_at_3_max value: 63.0173212926611 - type: nauc_ndcg_at_3_std value: 41.025597406368256 - type: nauc_ndcg_at_5_diff1 value: 16.910185454343036 - type: nauc_ndcg_at_5_max value: 60.9328683868778 - type: nauc_ndcg_at_5_std value: 36.70169905857712 - type: nauc_precision_at_1000_diff1 value: -46.374447765983525 - type: nauc_precision_at_1000_max value: 35.36052337813863 - type: nauc_precision_at_1000_std value: 14.219220668161018 - type: nauc_precision_at_100_diff1 value: -29.7838083657744 - type: nauc_precision_at_100_max value: 43.93589400385112 - type: nauc_precision_at_100_std value: 55.425045718579945 - type: nauc_precision_at_10_diff1 value: -12.016613405227687 - type: nauc_precision_at_10_max value: 57.79924427743131 - type: nauc_precision_at_10_std value: 49.022036703550675 - type: nauc_precision_at_1_diff1 value: 38.62433862433873 - type: nauc_precision_at_1_max value: 80.78120136943666 - type: nauc_precision_at_1_std value: -10.768751945222197 - type: nauc_precision_at_20_diff1 value: -23.95633847880195 - type: nauc_precision_at_20_max value: 48.34715917258276 - type: nauc_precision_at_20_std value: 48.82198285255887 - type: nauc_precision_at_3_diff1 value: 6.871296905858807 - type: nauc_precision_at_3_max value: 70.54805793285054 - type: nauc_precision_at_3_std value: 44.65108624094803 - type: nauc_precision_at_5_diff1 value: -9.074932448759695 - type: nauc_precision_at_5_max value: 67.41284242437573 - type: nauc_precision_at_5_std value: 23.876891983919577 - type: nauc_recall_at_1000_diff1 value: 8.142288830293255 - type: nauc_recall_at_1000_max value: 38.85182826835104 - type: nauc_recall_at_1000_std value: 68.60783819217335 - type: nauc_recall_at_100_diff1 value: 34.262914076287466 - type: nauc_recall_at_100_max value: 12.87009658528838 - type: nauc_recall_at_100_std value: 56.21330603762995 - type: nauc_recall_at_10_diff1 value: 49.33830945338758 - type: nauc_recall_at_10_max value: 0.3539875530671406 - type: nauc_recall_at_10_std value: 26.85864465557644 - type: nauc_recall_at_1_diff1 value: 53.48238396620123 - type: nauc_recall_at_1_max value: 0.33476619393733076 - type: nauc_recall_at_1_std value: 8.906362219128463 - type: nauc_recall_at_20_diff1 value: 44.21928181266254 - type: nauc_recall_at_20_max value: -0.9198356057088594 - type: nauc_recall_at_20_std value: 31.484376992896784 - type: nauc_recall_at_3_diff1 value: 
53.038093080990876 - type: nauc_recall_at_3_max value: -1.4170895916973003 - type: nauc_recall_at_3_std value: 21.890202855574497 - type: nauc_recall_at_5_diff1 value: 49.39742214825278 - type: nauc_recall_at_5_max value: 2.8412267611894517 - type: nauc_recall_at_5_std value: 18.01598921859512 - type: ndcg_at_1 value: 91.0 - type: ndcg_at_10 value: 85.206 - type: ndcg_at_100 value: 67.29 - type: ndcg_at_1000 value: 60.584 - type: ndcg_at_20 value: 82.321 - type: ndcg_at_3 value: 88.642 - type: ndcg_at_5 value: 87.063 - type: precision_at_1 value: 94.0 - type: precision_at_10 value: 89.8 - type: precision_at_100 value: 69.78 - type: precision_at_1000 value: 26.738 - type: precision_at_20 value: 87.2 - type: precision_at_3 value: 92.0 - type: precision_at_5 value: 90.8 - type: recall_at_1 value: 0.246 - type: recall_at_10 value: 2.344 - type: recall_at_100 value: 16.962 - type: recall_at_1000 value: 57.325 - type: recall_at_20 value: 4.517 - type: recall_at_3 value: 0.731 - type: recall_at_5 value: 1.1780000000000002 task: type: Retrieval - dataset: config: default name: MTEB Touche2020 revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: main_score value: 31.455 - type: map_at_1 value: 2.9739999999999998 - type: map_at_10 value: 12.183 - type: map_at_100 value: 18.772 - type: map_at_1000 value: 20.415 - type: map_at_20 value: 14.451 - type: map_at_3 value: 6.507000000000001 - type: map_at_5 value: 8.66 - type: mrr_at_1 value: 40.816326530612244 - type: mrr_at_10 value: 57.70975056689341 - type: mrr_at_100 value: 58.18379126542391 - type: mrr_at_1000 value: 58.18379126542391 - type: mrr_at_20 value: 57.85552316164561 - type: mrr_at_3 value: 54.08163265306123 - type: mrr_at_5 value: 56.42857142857143 - type: nauc_map_at_1000_diff1 value: 3.1567471051481437 - type: nauc_map_at_1000_max value: -1.5882060729791523 - type: nauc_map_at_1000_std value: 18.69622198722074 - type: nauc_map_at_100_diff1 value: 3.3449677678147536 - type: nauc_map_at_100_max value: -2.8928606866168405 - type: nauc_map_at_100_std value: 15.789984947653412 - type: nauc_map_at_10_diff1 value: 2.9696743570444264 - type: nauc_map_at_10_max value: -9.096749212011876 - type: nauc_map_at_10_std value: -5.38545817258353 - type: nauc_map_at_1_diff1 value: 20.680780404542546 - type: nauc_map_at_1_max value: -7.04722927447817 - type: nauc_map_at_1_std value: -7.062494733973898 - type: nauc_map_at_20_diff1 value: 4.070437790119271 - type: nauc_map_at_20_max value: -4.84491434686032 - type: nauc_map_at_20_std value: 0.5846341109021014 - type: nauc_map_at_3_diff1 value: 11.9634978045925 - type: nauc_map_at_3_max value: -8.27834591046608 - type: nauc_map_at_3_std value: -8.687615453381065 - type: nauc_map_at_5_diff1 value: 0.9195191526009436 - type: nauc_map_at_5_max value: -1.673813362719489 - type: nauc_map_at_5_std value: -6.67549753473631 - type: nauc_mrr_at_1000_diff1 value: 19.877993208719573 - type: nauc_mrr_at_1000_max value: -10.37776706406218 - type: nauc_mrr_at_1000_std value: 7.132169578056367 - type: nauc_mrr_at_100_diff1 value: 19.877993208719573 - type: nauc_mrr_at_100_max value: -10.37776706406218 - type: nauc_mrr_at_100_std value: 7.132169578056367 - type: nauc_mrr_at_10_diff1 value: 20.414285568401457 - type: nauc_mrr_at_10_max value: -9.677800295687861 - type: nauc_mrr_at_10_std value: 8.001103690180859 - type: nauc_mrr_at_1_diff1 value: 22.393284073955723 - type: nauc_mrr_at_1_max value: -5.889370191243167 - type: nauc_mrr_at_1_std value: -1.5183536173658247 - type: 
nauc_mrr_at_20_diff1 value: 20.455564720604055 - type: nauc_mrr_at_20_max value: -10.230642830103074 - type: nauc_mrr_at_20_std value: 7.863582453266621 - type: nauc_mrr_at_3_diff1 value: 17.554895390732618 - type: nauc_mrr_at_3_max value: -15.618463505555052 - type: nauc_mrr_at_3_std value: 5.913231577966864 - type: nauc_mrr_at_5_diff1 value: 18.393678507779914 - type: nauc_mrr_at_5_max value: -11.903593353147762 - type: nauc_mrr_at_5_std value: 7.580745996262831 - type: nauc_ndcg_at_1000_diff1 value: 13.746937095530473 - type: nauc_ndcg_at_1000_max value: -0.9319249687895838 - type: nauc_ndcg_at_1000_std value: 38.56328031451904 - type: nauc_ndcg_at_100_diff1 value: 13.854865944415895 - type: nauc_ndcg_at_100_max value: -7.142142012591404 - type: nauc_ndcg_at_100_std value: 35.61341954818848 - type: nauc_ndcg_at_10_diff1 value: 9.010144273248759 - type: nauc_ndcg_at_10_max value: -15.320014897424574 - type: nauc_ndcg_at_10_std value: 2.84883880489144 - type: nauc_ndcg_at_1_diff1 value: 20.939533945592967 - type: nauc_ndcg_at_1_max value: -6.387319972188946 - type: nauc_ndcg_at_1_std value: -0.5258673122126726 - type: nauc_ndcg_at_20_diff1 value: 14.660827309009496 - type: nauc_ndcg_at_20_max value: -13.476196120145994 - type: nauc_ndcg_at_20_std value: 8.22391881710838 - type: nauc_ndcg_at_3_diff1 value: 13.429985227235935 - type: nauc_ndcg_at_3_max value: -14.904544592570247 - type: nauc_ndcg_at_3_std value: 1.599779998183342 - type: nauc_ndcg_at_5_diff1 value: 8.085466231900622 - type: nauc_ndcg_at_5_max value: -9.09591969526831 - type: nauc_ndcg_at_5_std value: 3.5794092637248505 - type: nauc_precision_at_1000_diff1 value: -9.31941215946743 - type: nauc_precision_at_1000_max value: 31.52913520470716 - type: nauc_precision_at_1000_std value: 22.720784312185856 - type: nauc_precision_at_100_diff1 value: 8.958548406995279 - type: nauc_precision_at_100_max value: 15.100597910674104 - type: nauc_precision_at_100_std value: 71.04548238175113 - type: nauc_precision_at_10_diff1 value: 12.4698194690008 - type: nauc_precision_at_10_max value: -15.84870544871496 - type: nauc_precision_at_10_std value: 7.575297622501928 - type: nauc_precision_at_1_diff1 value: 22.393284073955723 - type: nauc_precision_at_1_max value: -5.889370191243167 - type: nauc_precision_at_1_std value: -1.5183536173658247 - type: nauc_precision_at_20_diff1 value: 15.393505718138758 - type: nauc_precision_at_20_max value: -3.70684298539384 - type: nauc_precision_at_20_std value: 29.426137824970304 - type: nauc_precision_at_3_diff1 value: 9.997768085465394 - type: nauc_precision_at_3_max value: -17.12224314347674 - type: nauc_precision_at_3_std value: -1.343018166772313 - type: nauc_precision_at_5_diff1 value: 3.8936997437913554 - type: nauc_precision_at_5_max value: -5.689104289687632 - type: nauc_precision_at_5_std value: 3.181098051304285 - type: nauc_recall_at_1000_diff1 value: 9.908303508158387 - type: nauc_recall_at_1000_max value: 6.174506592699848 - type: nauc_recall_at_1000_std value: 77.41931114780012 - type: nauc_recall_at_100_diff1 value: 10.286839241876192 - type: nauc_recall_at_100_max value: -6.6138697026666815 - type: nauc_recall_at_100_std value: 49.608313692633224 - type: nauc_recall_at_10_diff1 value: 2.215545846659851 - type: nauc_recall_at_10_max value: -17.83025802478445 - type: nauc_recall_at_10_std value: -3.3784768673705465 - type: nauc_recall_at_1_diff1 value: 20.680780404542546 - type: nauc_recall_at_1_max value: -7.04722927447817 - type: nauc_recall_at_1_std value: -7.062494733973898 - type: 
nauc_recall_at_20_diff1 value: 6.974410239251615 - type: nauc_recall_at_20_max value: -14.161147924731646 - type: nauc_recall_at_20_std value: 9.328412057721454 - type: nauc_recall_at_3_diff1 value: 7.904589805754212 - type: nauc_recall_at_3_max value: -12.1912388648593 - type: nauc_recall_at_3_std value: -9.221542013385555 - type: nauc_recall_at_5_diff1 value: -3.2604132752706914 - type: nauc_recall_at_5_max value: -6.886351441658915 - type: nauc_recall_at_5_std value: -7.014252851712789 - type: ndcg_at_1 value: 39.796 - type: ndcg_at_10 value: 31.455 - type: ndcg_at_100 value: 42.388999999999996 - type: ndcg_at_1000 value: 53.556000000000004 - type: ndcg_at_20 value: 30.808000000000003 - type: ndcg_at_3 value: 35.831 - type: ndcg_at_5 value: 32.845 - type: precision_at_1 value: 40.816 - type: precision_at_10 value: 27.143 - type: precision_at_100 value: 8.449 - type: precision_at_1000 value: 1.6179999999999999 - type: precision_at_20 value: 19.387999999999998 - type: precision_at_3 value: 35.374 - type: precision_at_5 value: 31.019999999999996 - type: recall_at_1 value: 2.9739999999999998 - type: recall_at_10 value: 19.39 - type: recall_at_100 value: 51.636 - type: recall_at_1000 value: 86.99900000000001 - type: recall_at_20 value: 26.478 - type: recall_at_3 value: 7.703 - type: recall_at_5 value: 11.42 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 86.9384765625 - type: ap value: 31.737513704141552 - type: ap_weighted value: 31.737513704141552 - type: f1 value: 71.5490757306975 - type: f1_weighted value: 89.14632533489856 - type: main_score value: 86.9384765625 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 73.57668364459535 - type: f1 value: 73.90467103648074 - type: f1_weighted value: 73.42158415034704 - type: main_score value: 73.57668364459535 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: main_score value: 58.574148097494685 - type: v_measure value: 58.574148097494685 - type: v_measure_std value: 0.9443161637490822 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: cosine_accuracy value: 88.1385229778864 - type: cosine_accuracy_threshold value: 83.86307954788208 - type: cosine_ap value: 80.17965893449055 - type: cosine_f1 value: 73.0614300100705 - type: cosine_f1_threshold value: 80.7942807674408 - type: cosine_precision value: 69.8603755416466 - type: cosine_recall value: 76.56992084432717 - type: dot_accuracy value: 88.2100494724921 - type: dot_accuracy_threshold value: 83.84793996810913 - type: dot_ap value: 80.18603932881858 - type: dot_f1 value: 73.07643714466204 - type: dot_f1_threshold value: 80.87586164474487 - type: dot_precision value: 70.10909090909091 - type: dot_recall value: 76.3060686015831 - type: euclidean_accuracy value: 88.1385229778864 - type: euclidean_accuracy_threshold value: 56.77661895751953 - type: euclidean_ap value: 80.1784070881624 - type: euclidean_f1 value: 73.04830369529574 
- type: euclidean_f1_threshold value: 61.91838979721069 - type: euclidean_precision value: 69.96859144720948 - type: euclidean_recall value: 76.41160949868075 - type: main_score value: 80.18603932881858 - type: manhattan_accuracy value: 88.0431543184121 - type: manhattan_accuracy_threshold value: 3755.6137084960938 - type: manhattan_ap value: 79.98270453664578 - type: manhattan_f1 value: 72.68242015061023 - type: manhattan_f1_threshold value: 3892.494583129883 - type: manhattan_precision value: 71.54907975460122 - type: manhattan_recall value: 73.85224274406332 - type: max_ap value: 80.18603932881858 - type: max_f1 value: 73.07643714466204 - type: max_precision value: 71.54907975460122 - type: max_recall value: 76.56992084432717 - type: similarity_accuracy value: 88.1385229778864 - type: similarity_accuracy_threshold value: 83.86307954788208 - type: similarity_ap value: 80.17965893449055 - type: similarity_f1 value: 73.0614300100705 - type: similarity_f1_threshold value: 80.7942807674408 - type: similarity_precision value: 69.8603755416466 - type: similarity_recall value: 76.56992084432717 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: cosine_accuracy value: 89.7892653393876 - type: cosine_accuracy_threshold value: 79.69566583633423 - type: cosine_ap value: 87.4579867302024 - type: cosine_f1 value: 79.91620843152658 - type: cosine_f1_threshold value: 78.53609323501587 - type: cosine_precision value: 77.7155329210622 - type: cosine_recall value: 82.24514936864799 - type: dot_accuracy value: 89.78732487289945 - type: dot_accuracy_threshold value: 80.05315661430359 - type: dot_ap value: 87.44916182456272 - type: dot_f1 value: 79.90419878751591 - type: dot_f1_threshold value: 78.57890725135803 - type: dot_precision value: 77.73409057812728 - type: dot_recall value: 82.19895287958116 - type: euclidean_accuracy value: 89.78538440641131 - type: euclidean_accuracy_threshold value: 62.29925751686096 - type: euclidean_ap value: 87.45904868911386 - type: euclidean_f1 value: 79.93127404474657 - type: euclidean_f1_threshold value: 65.61101078987122 - type: euclidean_precision value: 77.62060210373595 - type: euclidean_recall value: 82.38373883584848 - type: main_score value: 87.46554314325058 - type: manhattan_accuracy value: 89.76597974152986 - type: manhattan_accuracy_threshold value: 3988.5299682617188 - type: manhattan_ap value: 87.46554314325058 - type: manhattan_f1 value: 79.97181740645973 - type: manhattan_f1_threshold value: 4235.905838012695 - type: manhattan_precision value: 77.13713427283783 - type: manhattan_recall value: 83.02279026793964 - type: max_ap value: 87.46554314325058 - type: max_f1 value: 79.97181740645973 - type: max_precision value: 77.73409057812728 - type: max_recall value: 83.02279026793964 - type: similarity_accuracy value: 89.7892653393876 - type: similarity_accuracy_threshold value: 79.69566583633423 - type: similarity_ap value: 87.4579867302024 - type: similarity_f1 value: 79.91620843152658 - type: similarity_f1_threshold value: 78.53609323501587 - type: similarity_precision value: 77.7155329210622 - type: similarity_recall value: 82.24514936864799 task: type: PairClassification tags: - mteb - sentence-transformers - transformers - sentence-similarity license: mit --- # Updates We released a Jasper and Stella model technology report and code.(2025.1) **Report:** **Codes:** # Introduction The models are 
trained based on and . Thanks for their contributions! **We simplify the usage of prompts, providing two prompts for most general tasks: one for s2p, the other for s2s.** Prompt for the s2p task (e.g. retrieval tasks): Prompt for the s2s task (e.g. semantic textual similarity tasks): The models are finally trained with MRL (Matryoshka Representation Learning), so they have multiple dimensions: 512, 768, 1024, 2048, 4096, 6144 and 8192. The higher the dimension, the better the performance. **Generally speaking, 1024d is good enough.** The MTEB score of 1024d is only 0.001 lower than 8192d. # Model directory structure The model directory structure is very simple: it is a standard SentenceTransformer directory **with a series of folders**, where represents the final vector dimension. For example, the folder stores the Linear weights that convert vector dimensions to 256 dimensions. Please refer to the following chapters for specific instructions on how to use them. # Usage You can use or library to encode text. ## Sentence Transformers ## Transformers ### infinity_emb Usage via infinity, MIT Licensed. # Citation # FAQ Q: The details of training? A: The training method and datasets will be released in the future. (specific time unknown, may be provided in a paper) Q: How to choose a suitable prompt for my own task? A: In most cases, please use the s2p and s2s prompts. These two prompts account for the vast majority of the training data. Q: How to reproduce MTEB results? A: Please use the evaluation scripts in or Q: Why does each dimension have a linear weight? A: MRL has multiple training methods; we chose the one with the best performance. Q: What is the sequence length of the models? A: 512 is recommended; in our experiments, almost all models perform poorly on specialized long-text retrieval datasets. Besides, the model is trained on datasets of length 512. This may be a target for future optimization. If you have any questions, please start a discussion in the community.", + "model_explanation_gemini": "Performs classification tasks on text data, particularly for Amazon reviews and counterfactual analysis, with high accuracy and F1 scores." +} \ No newline at end of file diff --git a/data/model_data_json/OnomaAIResearch_Illustrious-xl-early-release-v0.json b/data/model_data_json/OnomaAIResearch_Illustrious-xl-early-release-v0.json new file mode 100644 index 0000000000000000000000000000000000000000..0c3d67011b9e8f13f179ae948fb158c6673fcb2f --- /dev/null +++ b/data/model_data_json/OnomaAIResearch_Illustrious-xl-early-release-v0.json @@ -0,0 +1,20 @@ +{ + "model_id": "OnomaAIResearch/Illustrious-xl-early-release-v0", + "downloads": 87669, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "en", + "arxiv:2409.19946", + "base_model:KBlueLeaf/kohaku-xl-beta5", + "base_model:finetune:KBlueLeaf/kohaku-xl-beta5", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: other license_name: fair-ai-public-license-1.0-sd license_link: language: - en base_model: KBlueLeaf/kohaku-xl-beta5 pipeline_tag: text-to-image ---

Illustrious XL v0.1
trained by Onoma AI

\"s00\"
\"s01\"
\"s02\"
\"s10\"
\"s11\"
\"s12\"
\"s20\"
\"s21\"
\"s22\"

Illustrious XL is an illustration-focused Stable Diffusion XL model, continued from Kohaku XL Beta 5 and trained by the OnomaAI Research Team. The model focuses on utilizing a large-scale annotated dataset.

Model Information:

Description:

We plan to release several aesthetic-finetuned model variants in the near future.

Technical Details:

Terms and Conditions:

By using this model, users agree to comply with the conditions outlined in the LICENSE and acknowledge responsibility for how they utilize the generated content.

Safety Control Recommendation:

Training/Merging Policy:
You may fine-tune, merge, or train LoRAs based on this model. However, to foster an open-source community, you are required to:

Uploading / Generation Policy:
We do not restrict any upload or spread of the generation results, as we do not own any rights to the generated materials. This includes 'personally trained models / finetuned models / trained lora-related results'. However, we kindly ask you to share the generation details, to foster the open-source community and research.

Monetization Prohibition:

Usage:
We do not recommend overusing critical composition tags such as 'close-up', 'upside-down', or 'cowboy shot', as they can conflict with one another and confuse the model, affecting results.
Recommended sampling method: Euler a, Sampling Steps: 20–28, CFG: 5–7.5 (may vary based on use case).
We suggest using suitable composition tags like \"upper body,\" \"cowboy shot,\" \"portrait,\" or \"full body\" depending on your use case.
The model supports quality tags such as: \"worst quality,\" \"bad quality,\" \"average quality,\" \"good quality,\" \"best quality,\" and \"masterpiece (quality).\"
Note: The model does not have any default style. This is intended behavior for the base model.
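For a quick start with the Diffusers library, a minimal sketch along these lines should work; the scheduler swap (EulerAncestralDiscreteScheduler as the 'Euler a' equivalent) and the example prompt are illustrative assumptions, not official usage.

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# The repository ships a StableDiffusionXLPipeline (see the diffusers tag).
pipe = StableDiffusionXLPipeline.from_pretrained(
    'OnomaAIResearch/Illustrious-xl-early-release-v0',
    torch_dtype=torch.float16,
).to('cuda')

# 'Euler a' in most UIs corresponds to the Euler Ancestral scheduler.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt='1girl, upper body, looking at viewer, best quality, masterpiece',
    negative_prompt='worst quality, bad quality, lowres, bad anatomy, bad hands',
    num_inference_steps=24,  # recommended range: 20-28
    guidance_scale=6.5,      # recommended CFG: 5-7.5
).images[0]
image.save('illustrious_sample.png')
```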

\"s23\"

Prompt:
1boy, holding knife, blue eyes, jewelry, jacket, shirt, open mouth, hand up, simple background, hair between eyes, vest, knife, tongue, holding weapon, grey vest, upper body, necktie, solo, looking at viewer, smile, pink blood, weapon, dagger, open clothes, collared shirt, blood on face, tongue out, blonde hair, holding dagger, red necktie, white shirt, blood, short hair, holding, earrings, long sleeves, black jacket, dark theme

Negative Prompt:
worst quality, comic, multiple views, bad quality, low quality, lowres, displeasing, very displeasing, bad anatomy, bad hands, scan artifacts, monochrome, greyscale, signature, twitter username, jpeg artifacts, 2koma, 4koma, guro, extra digits, fewer digits

\"s24\"

Prompt:
1girl, extremely dark, black theme, silhouette, rim lighting, black, looking at viewer, low contrast, masterpiece

Negative Prompt:
worst quality, comic, multiple views, bad quality, low quality, lowres, displeasing, very displeasing, bad anatomy, bad hands, scan artifacts, monochrome, greyscale, twitter username, jpeg artifacts, 2koma, 4koma, guro, extra digits, fewer digits, jaggy lines, unclear

# Illustrious XL Series Update It’s been a while since we released **Illustrious XL v0.1**, and we know many of you have been eagerly waiting for updates. We also recognize that many are disappointed with the closed-source nature of **Illustrious XL v1.0**, and we want to address this directly. A lot has happened since then, and we’re truly grateful for the open-source community’s contributions—whether it’s large-scale fine-tuned models, ControlNets, or the countless LoRAs and adapters that have been developed. --- ## Development Journey When we started working on the Illustrious XL series, our goal was simple: there weren’t any strong pretrained models available for illustrations, so we decided to build one ourselves—a pretrain-level fine-tuned model that artists and researchers could actually use. We also knew that keeping everything in-house wouldn’t help the field move forward. That’s why we released **v0.1** to the public and focused on training newer variations, pushing the model’s capabilities further with improved quality, deeper knowledge, and architectural refinements. Along the way, we discovered something unexpected. The model wasn’t just good at illustrations—it could also interpret natural language, handle complex prompts, and generate high-resolution images, far beyond what we originally planned. --- ## Our Model Versions - **v0.1 (trained in May 2024)** - **v1.0 (July 2024)** - **v1.1 (August 2024)** - **v2.0 (September 2024)** - **v3 (November 2024)** - **v3.5 (a special variant incorporating Google’s v-parameterization)** These models take another step forward in natural language composition and image generation. That said, we can’t drop everything all at once. There’s a clear roadmap ahead, and open-source releases are part of it. But rather than rushing, we want to do this the right way—with explanations, insights, and research-backed improvements. --- ## Our Future Plans Now, after months of work behind the scenes, we’re finally ready to move forward. We’ll be rolling out our latest models step by step while progressively open-sourcing previous versions so they can be studied and improved upon. Expect breakthroughs like **true 2K-resolution generation** and better natural language alignment along the way. --- ## Commitment to Open Source This will take time, but we’re moving fast. Our next-generation models are already in development, tackling some of the fundamental limitations of the base SD XL architecture. As we progress, older models will naturally be deprecated, and weight releases will follow accordingly. Our team aims to proceed thoughtfully, ensuring that each release is accompanied by comprehensive explanations and insights. --- ## Backward Compatibility One last thing—we’re not just here to release models. Every model we’ve built is designed with backward compatibility in mind, because Illustrious XL wasn’t just about making something new—it was about creating a better foundation for fine-tuning. That’s why we’ve put so much effort into training LoRAs properly, and soon, we’ll be sharing insights on how to train them more effectively. --- ## Summary In summary, **Onoma AI** plans to roll out open-source weights step by step and encourages the community to stay tuned for upcoming developments—we’re just getting started.", + "model_explanation_gemini": "Generates images from text descriptions using a base model derived from KBlueLeaf/kohaku-xl-beta5." 
+} \ No newline at end of file diff --git a/data/model_data_json/OpenGVLab_InternVL2-2B.json b/data/model_data_json/OpenGVLab_InternVL2-2B.json new file mode 100644 index 0000000000000000000000000000000000000000..bebc24212ae12dfe7716995ece29ea432bd87777 --- /dev/null +++ b/data/model_data_json/OpenGVLab_InternVL2-2B.json @@ -0,0 +1,27 @@ +{ + "model_id": "OpenGVLab/InternVL2-2B", + "downloads": 278765, + "tags": [ + "transformers", + "safetensors", + "internvl_chat", + "feature-extraction", + "internvl", + "custom_code", + "image-text-to-text", + "conversational", + "multilingual", + "arxiv:2312.14238", + "arxiv:2404.16821", + "arxiv:2410.16261", + "arxiv:2412.05271", + "base_model:OpenGVLab/InternViT-300M-448px", + "base_model:merge:OpenGVLab/InternViT-300M-448px", + "base_model:internlm/internlm2-chat-1_8b", + "base_model:merge:internlm/internlm2-chat-1_8b", + "license:mit", + "region:us" + ], + "description": "--- license: mit pipeline_tag: image-text-to-text library_name: transformers base_model: - OpenGVLab/InternViT-300M-448px - internlm/internlm2-chat-1_8b new_version: OpenGVLab/InternVL2_5-2B base_model_relation: merge language: - multilingual tags: - internvl - custom_code --- # InternVL2-2B [\\[📂 GitHub\\]]( [\\[📜 InternVL 1.0\\]]( [\\[📜 InternVL 1.5\\]]( [\\[📜 Mini-InternVL\\]]( [\\[📜 InternVL 2.5\\]]( [\\[🆕 Blog\\]]( [\\[🗨️ Chat Demo\\]]( [\\[🤗 HF Demo\\]]( [\\[🚀 Quick Start\\]](#quick-start) [\\[📖 Documents\\]](
\"image\" ## Introduction We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of **instruction-tuned models**, ranging from 1 billion to 108 billion parameters. This repository contains the instruction-tuned InternVL2-2B model. Compared to the state-of-the-art open-source multimodal large language models, InternVL 2.0 surpasses most open-source models. It demonstrates competitive performance on par with proprietary commercial models across various capabilities, including document and chart comprehension, infographics QA, scene text understanding and OCR tasks, scientific and mathematical problem solving, as well as cultural understanding and integrated multimodal capabilities. InternVL 2.0 is trained with an 8k context window and utilizes training data consisting of long texts, multiple images, and videos, significantly improving its ability to handle these types of inputs compared to InternVL 1.5. For more details, please refer to our blog and GitHub. | Model Name | Vision Part | Language Part | HF Link | MS Link | | :------------------: | :---------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------: | :--------------------------------------------------------------: | :--------------------------------------------------------------------: | | InternVL2-1B | InternViT-300M-448px | Qwen2-0.5B-Instruct | 🤗 link | 🤖 link | | InternVL2-2B | InternViT-300M-448px | internlm2-chat-1_8b | 🤗 link | 🤖 link | | InternVL2-4B | InternViT-300M-448px | Phi-3-mini-128k-instruct | 🤗 link | 🤖 link | | InternVL2-8B | InternViT-300M-448px | internlm2_5-7b-chat | 🤗 link | 🤖 link | | InternVL2-26B | InternViT-6B-448px-V1-5 | internlm2-chat-20b | 🤗 link | 🤖 link | | InternVL2-40B | InternViT-6B-448px-V1-5 | Nous-Hermes-2-Yi-34B | 🤗 link | 🤖 link | | InternVL2-Llama3-76B | InternViT-6B-448px-V1-5 | Hermes-2-Theta-Llama-3-70B | 🤗 link | 🤖 link | ## Model Details InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned models optimized for multimodal tasks. InternVL2-2B consists of InternViT-300M-448px, an MLP projector, and internlm2-chat-1_8b. ## Performance ### Image Benchmarks | Benchmark | PaliGemma-3B | Mini-InternVL-2B-1-5 | InternVL2-2B | | :--------------------------: | :----------: | :------------------: | :----------: | | Model Size | 2.9B | 2.2B | 2.2B | | | | | | | DocVQAtest | - | 85.0 | 86.9 | | ChartQAtest | - | 74.8 | 76.2 | | InfoVQAtest | - | 55.4 | 58.9 | | TextVQAval | 68.1 | 70.5 | 73.4 | | OCRBench | 614 | 654 | 784 | | MMEsum | 1686.1 | 1901.5 | 1876.8 | | RealWorldQA | 55.2 | 57.9 | 57.3 | | AI2Dtest | 68.3 | 69.8 | 74.1 | | MMMUval | 34.9 | 37.4 | 36.3 | | MMBench-ENtest | 71.0 | 70.9 | 73.2 | | MMBench-CNtest | 63.6 | 66.2 | 70.9 | | CCBenchdev | 29.6 | 63.5 | 74.7 | | MMVetGPT-4-0613 | - | 39.3 | 44.6 | | MMVetGPT-4-Turbo | 33.1 | 35.5 | 39.5 | | SEED-Image | 69.6 | 69.8 | 71.6 | | HallBenchavg | 32.2 | 37.5 | 37.9 | | MathVistatestmini | 28.7 | 41.1 | 46.3 | | OpenCompassavg | 46.6 | 49.8 | 54.0 | - For more details and evaluation reproduction, please refer to our Evaluation Guide. - We simultaneously use InternVL and VLMEvalKit repositories for model evaluation. 
Specifically, the results reported for DocVQA, ChartQA, InfoVQA, TextVQA, MME, AI2D, MMBench, CCBench, MMVet (GPT-4-0613), and SEED-Image were tested using the InternVL repository. MMMU, OCRBench, RealWorldQA, HallBench, MMVet (GPT-4-Turbo), and MathVista were evaluated using the VLMEvalKit.

### Video Benchmarks

| Benchmark | VideoChat2-Phi3 | VideoChat2-HD-Mistral | Mini-InternVL-2B-1-5 | InternVL2-2B |
| :-------------------------: | :-------------: | :-------------------: | :------------------: | :----------: |
| Model Size | 4B | 7B | 2.2B | 2.2B |
| MVBench | 55.1 | 60.4 | 37.0 | 60.2 |
| MMBench-Video (8f) | - | - | 0.99 | 0.97 |
| MMBench-Video (16f) | - | - | 1.04 | 1.03 |
| Video-MME (w/o subs) | - | 42.3 | 42.9 | 45.0 |
| Video-MME (w subs) | - | 54.6 | 44.7 | 47.3 |

- We evaluate our models on MVBench and Video-MME by extracting 16 frames from each video; each frame was resized to a 448x448 image.

### Grounding Benchmarks

| Model | avg. | RefCOCO (val) | RefCOCO (testA) | RefCOCO (testB) | RefCOCO+ (val) | RefCOCO+ (testA) | RefCOCO+ (testB) | RefCOCO-g (val) | RefCOCO-g (test) |
| :---: | :--: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| UNINEXT-H (Specialist SOTA) | 88.9 | 92.6 | 94.3 | 91.5 | 85.2 | 89.6 | 79.8 | 88.7 | 89.4 |
| Mini-InternVL-Chat-2B-V1-5 | 75.8 | 80.7 | 86.7 | 72.9 | 72.5 | 82.3 | 60.8 | 75.6 | 74.9 |
| Mini-InternVL-Chat-4B-V1-5 | 84.4 | 88.0 | 91.4 | 83.5 | 81.5 | 87.4 | 73.8 | 84.7 | 84.6 |
| InternVL-Chat-V1-5 | 88.8 | 91.4 | 93.7 | 87.1 | 87.0 | 92.3 | 80.9 | 88.5 | 89.3 |
| InternVL2-1B | 79.9 | 83.6 | 88.7 | 79.8 | 76.0 | 83.6 | 67.7 | 80.2 | 79.9 |
| InternVL2-2B | 77.7 | 82.3 | 88.2 | 75.9 | 73.5 | 82.8 | 63.3 | 77.6 | 78.3 |
| InternVL2-4B | 84.4 | 88.5 | 91.2 | 83.9 | 81.2 | 87.2 | 73.8 | 84.6 | 84.6 |
| InternVL2-8B | 82.9 | 87.1 | 91.1 | 80.7 | 79.8 | 87.9 | 71.4 | 82.7 | 82.7 |
| InternVL2-26B | 88.5 | 91.2 | 93.3 | 87.4 | 86.8 | 91.0 | 81.2 | 88.5 | 88.6 |
| InternVL2-40B | 90.3 | 93.0 | 94.7 | 89.2 | 88.5 | 92.8 | 83.6 | 90.3 | 90.6 |
| InternVL2-Llama3-76B | 90.0 | 92.2 | 94.8 | 88.4 | 88.8 | 93.1 | 82.8 | 89.5 | 90.3 |

- We use the following prompt to evaluate InternVL's grounding ability: Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information. ## Quick Start We provide example code to run using . > Please use transformers>=4.37.2 to ensure the model works normally. ### Model Loading #### 16-bit (bf16 / fp16) #### BNB 8-bit Quantization #### BNB 4-bit Quantization #### Multiple GPUs The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors. ### Inference with Transformers (a minimal loading-and-chat sketch is provided after the License section below) #### Streaming Output Besides this method, you can also use the following code to get streamed output. ## Finetune Many repositories now support fine-tuning of the InternVL series models, including InternVL, SWIFT, XTurner, and others. Please refer to their documentation for more details on fine-tuning. ## Deployment ### LMDeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs & VLMs. LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline. #### A 'Hello, world' Example If an error occurs while executing this case, please install the required dependency packages as prompted. #### Multi-images Inference When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased. > Warning: Due to the scarcity of multi-image conversation data, the performance on multi-image tasks may be unstable, and it may require multiple attempts to achieve satisfactory results. #### Batch Prompts Inference Conducting inference with batch prompts is quite straightforward; just place them within a list structure: #### Multi-turn Conversation There are two ways to do multi-turn conversations with the pipeline. One is to construct messages according to the OpenAI format and use the method introduced above; the other is to use the interface. #### Service LMDeploy enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup: To use the OpenAI-style interface, you need to install OpenAI: Then, use the code below to make the API call: ## License This project is released under the MIT License. This project uses the pre-trained internlm2-chat-1_8b as a component, which is licensed under the Apache License 2.0.
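As noted in the Quick Start section, a minimal loading-and-chat sketch is given below. It assumes the repository's custom remote code, which exposes a chat interface and expects 448x448, ImageNet-normalized tiles; the single-tile preprocessing here is a simplification of the repo's dynamic tiling helper.

```python
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL2-2B'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # the chat interface lives in the repo's custom code
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# Single 448x448 tile with ImageNet normalization (the repo's helper
# additionally performs dynamic multi-tile splitting, omitted here).
transform = T.Compose([
    T.Lambda(lambda img: img.convert('RGB')),
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = transform(Image.open('example.jpg')).unsqueeze(0).to(torch.bfloat16).cuda()

generation_config = dict(max_new_tokens=512, do_sample=False)
question = '<image>\nPlease describe the image briefly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```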
## Citation If you find this project useful in your research, please consider citing:", + "model_explanation_gemini": "An instruction-tuned multimodal large language model excelling in tasks like document comprehension, infographics QA, scene text understanding, and scientific problem solving by processing both images and text." +} \ No newline at end of file diff --git a/data/model_data_json/OpenGVLab_InternVL2-8B.json b/data/model_data_json/OpenGVLab_InternVL2-8B.json new file mode 100644 index 0000000000000000000000000000000000000000..3a527b6a10984d220a02068f343866a444ff7926 --- /dev/null +++ b/data/model_data_json/OpenGVLab_InternVL2-8B.json @@ -0,0 +1,27 @@ +{ + "model_id": "OpenGVLab/InternVL2-8B", + "downloads": 373755, + "tags": [ + "transformers", + "safetensors", + "internvl_chat", + "feature-extraction", + "internvl", + "custom_code", + "image-text-to-text", + "conversational", + "multilingual", + "arxiv:2312.14238", + "arxiv:2404.16821", + "arxiv:2410.16261", + "arxiv:2412.05271", + "base_model:OpenGVLab/InternViT-300M-448px", + "base_model:merge:OpenGVLab/InternViT-300M-448px", + "base_model:internlm/internlm2_5-7b-chat", + "base_model:merge:internlm/internlm2_5-7b-chat", + "license:mit", + "region:us" + ], + "description": "--- license: mit pipeline_tag: image-text-to-text library_name: transformers base_model: - OpenGVLab/InternViT-300M-448px - internlm/internlm2_5-7b-chat new_version: OpenGVLab/InternVL2_5-8B base_model_relation: merge language: - multilingual tags: - internvl - custom_code --- # InternVL2-8B [\\[📂 GitHub\\]]( [\\[📜 InternVL 1.0\\]]( [\\[📜 InternVL 1.5\\]]( [\\[📜 Mini-InternVL\\]]( [\\[📜 InternVL 2.5\\]]( [\\[🆕 Blog\\]]( [\\[🗨️ Chat Demo\\]]( [\\[🤗 HF Demo\\]]( [\\[🚀 Quick Start\\]](#quick-start) [\\[📖 Documents\\]](
\"image\" ## Introduction We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of **instruction-tuned models**, ranging from 1 billion to 108 billion parameters. This repository contains the instruction-tuned InternVL2-8B model. Compared to the state-of-the-art open-source multimodal large language models, InternVL 2.0 surpasses most open-source models. It demonstrates competitive performance on par with proprietary commercial models across various capabilities, including document and chart comprehension, infographics QA, scene text understanding and OCR tasks, scientific and mathematical problem solving, as well as cultural understanding and integrated multimodal capabilities. InternVL 2.0 is trained with an 8k context window and utilizes training data consisting of long texts, multiple images, and videos, significantly improving its ability to handle these types of inputs compared to InternVL 1.5. For more details, please refer to our blog and GitHub. | Model Name | Vision Part | Language Part | HF Link | MS Link | | :------------------: | :---------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------: | :--------------------------------------------------------------: | :--------------------------------------------------------------------: | | InternVL2-1B | InternViT-300M-448px | Qwen2-0.5B-Instruct | 🤗 link | 🤖 link | | InternVL2-2B | InternViT-300M-448px | internlm2-chat-1_8b | 🤗 link | 🤖 link | | InternVL2-4B | InternViT-300M-448px | Phi-3-mini-128k-instruct | 🤗 link | 🤖 link | | InternVL2-8B | InternViT-300M-448px | internlm2_5-7b-chat | 🤗 link | 🤖 link | | InternVL2-26B | InternViT-6B-448px-V1-5 | internlm2-chat-20b | 🤗 link | 🤖 link | | InternVL2-40B | InternViT-6B-448px-V1-5 | Nous-Hermes-2-Yi-34B | 🤗 link | 🤖 link | | InternVL2-Llama3-76B | InternViT-6B-448px-V1-5 | Hermes-2-Theta-Llama-3-70B | 🤗 link | 🤖 link | ## Model Details InternVL 2.0 is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned models optimized for multimodal tasks. InternVL2-8B consists of InternViT-300M-448px, an MLP projector, and internlm2_5-7b-chat. ## Performance ### Image Benchmarks | Benchmark | MiniCPM-Llama3-V-2_5 | InternVL-Chat-V1-5 | InternVL2-8B | | :--------------------------: | :------------------: | :----------------: | :----------: | | Model Size | 8.5B | 25.5B | 8.1B | | | | | | | DocVQAtest | 84.8 | 90.9 | 91.6 | | ChartQAtest | - | 83.8 | 83.3 | | InfoVQAtest | - | 72.5 | 74.8 | | TextVQAval | 76.6 | 80.6 | 77.4 | | OCRBench | 725 | 724 | 794 | | MMEsum | 2024.6 | 2187.8 | 2210.3 | | RealWorldQA | 63.5 | 66.0 | 64.4 | | AI2Dtest | 78.4 | 80.7 | 83.8 | | MMMUval | 45.8 | 46.8 | 51.8 | | MMBench-ENtest | 77.2 | 82.2 | 81.7 | | MMBench-CNtest | 74.2 | 82.0 | 81.2 | | CCBenchdev | 45.9 | 69.8 | 75.9 | | MMVetGPT-4-0613 | - | 62.8 | 60.0 | | MMVetGPT-4-Turbo | 52.8 | 55.4 | 54.2 | | SEED-Image | 72.3 | 76.0 | 76.2 | | HallBenchavg | 42.4 | 49.3 | 45.2 | | MathVistatestmini | 54.3 | 53.5 | 58.3 | | OpenCompassavg | 58.8 | 61.7 | 64.1 | - For more details and evaluation reproduction, please refer to our Evaluation Guide. - We simultaneously use InternVL and VLMEvalKit repositories for model evaluation. 
Specifically, the results reported for DocVQA, ChartQA, InfoVQA, TextVQA, MME, AI2D, MMBench, CCBench, MMVet (GPT-4-0613), and SEED-Image were tested using the InternVL repository. MMMU, OCRBench, RealWorldQA, HallBench, MMVet (GPT-4-Turbo), and MathVista were evaluated using the VLMEvalKit.

### Video Benchmarks

| Benchmark | VideoChat2-HD-Mistral | Video-CCAM-9B | InternVL2-4B | InternVL2-8B |
| :-------------------------: | :-------------------: | :-----------: | :----------: | :----------: |
| Model Size | 7B | 9B | 4.2B | 8.1B |
| MVBench | 60.4 | 60.7 | 63.7 | 66.4 |
| MMBench-Video (8f) | - | - | 1.10 | 1.19 |
| MMBench-Video (16f) | - | - | 1.18 | 1.28 |
| Video-MME (w/o subs) | 42.3 | 50.6 | 51.4 | 54.0 |
| Video-MME (w subs) | 54.6 | 54.9 | 53.4 | 56.9 |

- We evaluate our models on MVBench and Video-MME by extracting 16 frames from each video; each frame was resized to a 448x448 image.

### Grounding Benchmarks

| Model | avg. | RefCOCO (val) | RefCOCO (testA) | RefCOCO (testB) | RefCOCO+ (val) | RefCOCO+ (testA) | RefCOCO+ (testB) | RefCOCO-g (val) | RefCOCO-g (test) |
| :---: | :--: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| UNINEXT-H (Specialist SOTA) | 88.9 | 92.6 | 94.3 | 91.5 | 85.2 | 89.6 | 79.8 | 88.7 | 89.4 |
| Mini-InternVL-Chat-2B-V1-5 | 75.8 | 80.7 | 86.7 | 72.9 | 72.5 | 82.3 | 60.8 | 75.6 | 74.9 |
| Mini-InternVL-Chat-4B-V1-5 | 84.4 | 88.0 | 91.4 | 83.5 | 81.5 | 87.4 | 73.8 | 84.7 | 84.6 |
| InternVL-Chat-V1-5 | 88.8 | 91.4 | 93.7 | 87.1 | 87.0 | 92.3 | 80.9 | 88.5 | 89.3 |
| InternVL2-1B | 79.9 | 83.6 | 88.7 | 79.8 | 76.0 | 83.6 | 67.7 | 80.2 | 79.9 |
| InternVL2-2B | 77.7 | 82.3 | 88.2 | 75.9 | 73.5 | 82.8 | 63.3 | 77.6 | 78.3 |
| InternVL2-4B | 84.4 | 88.5 | 91.2 | 83.9 | 81.2 | 87.2 | 73.8 | 84.6 | 84.6 |
| InternVL2-8B | 82.9 | 87.1 | 91.1 | 80.7 | 79.8 | 87.9 | 71.4 | 82.7 | 82.7 |
| InternVL2-26B | 88.5 | 91.2 | 93.3 | 87.4 | 86.8 | 91.0 | 81.2 | 88.5 | 88.6 |
| InternVL2-40B | 90.3 | 93.0 | 94.7 | 89.2 | 88.5 | 92.8 | 83.6 | 90.3 | 90.6 |
| InternVL2-Llama3-76B | 90.0 | 92.2 | 94.8 | 88.4 | 88.8 | 93.1 | 82.8 | 89.5 | 90.3 |

- We use the following prompt to evaluate InternVL's grounding ability: Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information. ## Quick Start We provide example code to run using . > Please use transformers>=4.37.2 to ensure the model works normally. ### Model Loading #### 16-bit (bf16 / fp16) #### BNB 8-bit Quantization #### Multiple GPUs The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors. ### Inference with Transformers #### Streaming Output Besides this method, you can also use the following code to get streamed output. ## Finetune Many repositories now support fine-tuning of the InternVL series models, including InternVL, SWIFT, XTurner, and others. Please refer to their documentation for more details on fine-tuning. ## Deployment ### LMDeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs & VLMs. LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline. #### A 'Hello, world' Example If an error occurs while executing this case, please install the required dependency packages as prompted. #### Multi-images Inference When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased. > Warning: Due to the scarcity of multi-image conversation data, the performance on multi-image tasks may be unstable, and it may require multiple attempts to achieve satisfactory results. #### Batch Prompts Inference Conducting inference with batch prompts is quite straightforward; just place them within a list structure: #### Multi-turn Conversation There are two ways to do multi-turn conversations with the pipeline. One is to construct messages according to the OpenAI format and use the method introduced above; the other is to use the interface. #### Service LMDeploy enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup: To use the OpenAI-style interface, you need to install OpenAI: Then, use the code below to make the API call: ## License This project is released under the MIT License. This project uses the pre-trained internlm2_5-7b-chat as a component, which is licensed under the Apache License 2.0. ## Citation If you find this project useful in your research, please consider citing:", + "model_explanation_gemini": "An 8-billion-parameter multimodal instruction-tuned model excelling in image-text tasks like document comprehension, OCR, infographics QA, and cultural understanding with multilingual capabilities."
+} \ No newline at end of file diff --git a/data/model_data_json/OpenGVLab_InternVL2_5-38B-MPO.json b/data/model_data_json/OpenGVLab_InternVL2_5-38B-MPO.json new file mode 100644 index 0000000000000000000000000000000000000000..6b20bc95ddceb35f1655f1c35cab3af566d3b85b --- /dev/null +++ b/data/model_data_json/OpenGVLab_InternVL2_5-38B-MPO.json @@ -0,0 +1,26 @@ +{ + "model_id": "OpenGVLab/InternVL2_5-38B-MPO", + "downloads": 78877, + "tags": [ + "transformers", + "tensorboard", + "safetensors", + "internvl_chat", + "feature-extraction", + "internvl", + "custom_code", + "image-text-to-text", + "conversational", + "multilingual", + "dataset:OpenGVLab/MMPR-v1.1", + "arxiv:2312.14238", + "arxiv:2404.16821", + "arxiv:2412.05271", + "arxiv:2411.10442", + "base_model:OpenGVLab/InternVL2_5-38B", + "base_model:finetune:OpenGVLab/InternVL2_5-38B", + "license:mit", + "region:us" + ], + "description": "--- license: mit pipeline_tag: image-text-to-text library_name: transformers base_model: - OpenGVLab/InternVL2_5-38B base_model_relation: finetune datasets: - OpenGVLab/MMPR-v1.1 language: - multilingual tags: - internvl - custom_code --- # InternVL2_5-38B-MPO [\\[📂 GitHub\\]]( [\\[📜 InternVL 1.0\\]]( [\\[📜 InternVL 1.5\\]]( [\\[📜 InternVL 2.5\\]]( [\\[📜 InternVL2.5-MPO\\]]( [\\[🆕 Blog\\]]( [\\[🗨️ Chat Demo\\]]( [\\[🤗 HF Demo\\]]( [\\[🚀 Quick Start\\]](#quick-start) [\\[📖 Documents\\]](
\"image\" ## Introduction We introduce InternVL2.5-MPO, an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance. This series builds upon InternVL2.5 and Mixed Preference Optimization. !image/png ## InternVL 2.5 Family In the following table, we provide an overview of the InternVL2.5-MPO series. | Model Name | Vision Part | Language Part | HF Link | | :-----------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :------------------------------------------------------------: | | InternVL2_5-1B-MPO | InternViT-300M-448px-V2_5 | Qwen2.5-0.5B-Instruct | 🤗 link | | InternVL2_5-2B-MPO | InternViT-300M-448px-V2_5 | internlm2_5-1_8b-chat | 🤗 link | | InternVL2_5-4B-MPO | InternViT-300M-448px-V2_5 | Qwen2.5-3B-Instruct | 🤗 link | | InternVL2_5-8B-MPO | InternViT-300M-448px-V2_5 | internlm2_5-7b-chat | 🤗 link | | InternVL2_5-26B-MPO | InternViT-6B-448px-V2_5 | internlm2_5-20b-chat | 🤗 link | | InternVL2_5-38B-MPO | InternViT-6B-448px-V2_5 | Qwen2.5-32B-Instruct | 🤗 link | | InternVL2_5-78B-MPO | InternViT-6B-448px-V2_5 | Qwen2.5-72B-Instruct | 🤗 link | ## Model Architecture As shown in the following figure, InternVL2.5-MPO retains the same model architecture as InternVL 2.5 and its predecessors, InternVL 1.5 and 2.0, following the \"ViT-MLP-LLM\" paradigm. In this new version, we integrate a newly incrementally pre-trained InternViT with various pre-trained LLMs, including InternLM 2.5 and Qwen 2.5, using a randomly initialized MLP projector. !image/png As in the previous version, we applied a pixel unshuffle operation, reducing the number of visual tokens to one-quarter of the original. Besides, we adopted a similar dynamic resolution strategy as InternVL 1.5, dividing images into tiles of 448×448 pixels. The key difference, starting from InternVL 2.0, is that we additionally introduced support for multi-image and video data. ## Key Designs ### Multi-Modal Preference Dataset MMPR is a large-scale and high-quality multimodal reasoning preference dataset. This dataset includes about 3 million samples. !image/jpeg !image/jpeg To construct this dataset, we propose an efficient data construction pipeline. Specifically, we categorize the multimodal data into **samples with clear ground truths** and **samples without clear ground truths**. - **For samples with clear ground truths:** the model is prompted to first provide the reasoning process and then give the final answer in the format like . Responses matching the ground truth answer constitute the positive set \\\\(\\mathcal{Y}_p\\\\), while those that do not match make up the negative set \\\\(\\mathcal{Y}_n\\\\). Additionally, responses that fail to provide a clear final answer are also merged into \\\\(\\mathcal{Y}_n\\\\). Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response \\\\(y_c\\\\) from \\\\(\\mathcal{Y}_p\\\\) and a negative response \\\\(y_r\\\\) from \\\\(\\mathcal{Y}_n\\\\). - **For samples without clear ground truths:** we propose a simple yet effective method: Dropout Next-Token Prediction (Dropout NTP). Specifically, we use the responses generated by InternVL2-8B as chosen answers. Given the chosen answer, we truncate it by half and then prompt InternVL2-8B to complete the remaining portion of the truncated answer without access to the image input. 
This generated completion serves as the rejected answer for the paired sample. It is worth noting that while the responses generated by InternVL2-8B may not be perfect, the completions generated without the image input will introduce more hallucinations than those generated with the image input. Therefore, the partial order relationship between the chosen and rejected responses holds true. The data construction pipeline is open-sourced, see more details in our document. ### Mixed Preference Optimization The key insight behind MPO is that *an effective PO process should enable the model to learn the relative preference between pairs of responses, the absolute quality of individual responses, and the process for generating preferred responses.* We define the training objective as a combination of preference loss \\\\(\\mathcal{L}_{\\text{p}}\\\\), quality loss \\\\(\\mathcal{L}_{\\text{q}}\\\\), and generation loss \\\\(\\mathcal{L}_{\\text{g}}\\\\), referred to as Mixed Preference Optimization: $$ \\mathcal{L}=w_{p}\\cdot\\mathcal{L}_{\\text{p}} + w_{q}\\cdot\\mathcal{L}_{\\text{q}} + w_{g}\\cdot\\mathcal{L}_{\\text{g}}, $$ where \\\\(w_{*}\\\\) represents the weight assigned to each loss component. In this work, we empirically compare different variants of preference loss. Based on the experimental results, we use DPO as our preference loss and BCO as our quality loss. Specifically, DPO serves as the preference loss to enable the model to learn the relative preference between chosen and rejected responses. This algorithm optimizes the following loss function: $$ \\mathcal{L}_{\\text{p}}=-\\log \\sigma\\left(\\beta \\log \\frac{\\pi_\\theta\\left(y_c \\mid x\\right)}{\\pi_0\\left(y_c \\mid x\\right)}-\\beta \\log \\frac{\\pi_\\theta\\left(y_r \\mid x\\right)}{\\pi_0\\left(y_r \\mid x\\right)}\\right), $$ where \\\\(\\beta\\\\) is the KL penalty coefficient, and \\\\(x\\\\), \\\\(y_c\\\\), and \\\\(y_r\\\\) are the user query, chosen response, and rejected response, respectively. The policy model \\\\(\\pi_\\theta\\\\) is initialized from model \\\\(\\pi_0\\\\). Additionally, the BCO loss is employed as the quality loss, which helps the model to understand the absolute quality of individual responses. The loss function is defined as: $$ \\mathcal{L}_{\\text{q}}=\\mathcal{L}_{\\text{q}}^+ + \\mathcal{L}_{\\text{q}}^-, $$ where \\\\(\\mathcal{L}_{\\text{q}}^{+}\\\\) and \\\\(\\mathcal{L}_{\\text{q}}^{-}\\\\) represent the loss for chosen and rejected responses, respectively. Each response type's loss is calculated independently, requiring the model to differentiate the absolute quality of individual responses. The loss terms are given by: $$ \\mathcal{L}_{\\text{q}}^+=-\\log \\sigma\\left(\\beta \\log \\frac{\\pi_\\theta\\left(y_c \\mid x\\right)}{\\pi_0\\left(y_c \\mid x\\right)} - \\delta\\right), $$ $$ \\mathcal{L}_{\\text{q}}^-=-\\log \\sigma\\left(-\\left(\\beta \\log \\frac{\\pi_\\theta\\left(y_r \\mid x\\right)}{\\pi_0\\left(y_r \\mid x\\right)} - \\delta\\right) \\right), $$ where \\\\(\\delta\\\\) represents the reward shift, calculated as the moving average of previous rewards to stabilize training. Finally, the SFT loss is used as the generation loss to help the model learn the generation process of preferred responses. The loss function is defined as: $$ \\mathcal{L}_{\\text{g}}=-\\frac{\\log\\pi_\\theta\\left(y_c \\mid x\\right)}{\\left| y_c \\right|}.
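$$

To make the objective concrete, here is a minimal PyTorch sketch of the combined MPO loss exactly as written above. It is an illustration rather than the released training code: the per-sequence log-probabilities, the weights w_p, w_q, w_g, the KL coefficient beta, and the reward shift delta are all assumed inputs.

```python
import torch.nn.functional as F

def mpo_loss(logp_c, logp_r,          # sum log-probs of chosen/rejected under pi_theta
             ref_logp_c, ref_logp_r,  # sum log-probs of chosen/rejected under pi_0
             chosen_len,              # token count |y_c| of each chosen response
             beta=0.1, delta=0.0, w_p=1.0, w_q=1.0, w_g=1.0):
    # Implicit rewards: beta * log(pi_theta / pi_0).
    r_c = beta * (logp_c - ref_logp_c)
    r_r = beta * (logp_r - ref_logp_r)
    # Preference loss (DPO): relative preference between chosen and rejected.
    l_p = -F.logsigmoid(r_c - r_r)
    # Quality loss (BCO): absolute quality of each response, shifted by delta.
    l_q = -F.logsigmoid(r_c - delta) - F.logsigmoid(-(r_r - delta))
    # Generation loss (SFT): length-normalized NLL of the chosen response.
    l_g = -logp_c / chosen_len
    return (w_p * l_p + w_q * l_q + w_g * l_g).mean()
```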
## Evaluation on Multimodal Capability To comprehensively compare InternVL's performance before and after MPO, we employ the benchmarks from the OpenCompass Leaderboard, including both well-established classic datasets and newly introduced ones. These benchmarks span a wide range of categories, aiming to provide a thorough and balanced assessment of InternVL’s capabilities across various multimodal tasks. We provide the evaluation results in the tables below. | Model | Avg. | MMBench v1.1 | MMStar | MMMU | MathVista | HallusionBench | AI2D | OCRBench | MMVet | | ------------------- | ---- | ------------ | ------ | ---- | --------- | -------------- | ---- | -------- | ----- | | InternVL2-5-1B | 54.9 | 66.5 | 51.3 | 41.2 | 47.1 | 39.4 | 69.0 | 77.4 | 47.2 | | InternVL2-5-1B-MPO | 56.4 | 67.2 | 49.7 | 40.8 | 53.0 | 40.0 | 69.4 | 83.6 | 47.2 | | InternVL2-5-2B | 59.9 | 70.9 | 54.3 | 43.2 | 51.1 | 42.3 | 74.9 | 80.2 | 62.6 | | InternVL2-5-2B-MPO | 62.0 | 71.6 | 55.0 | 45.0 | 56.4 | 43.0 | 75.3 | 84.2 | 65.4 | | InternVL2-5-4B | 65.1 | 78.2 | 58.7 | 51.8 | 60.8 | 46.6 | 81.4 | 82.0 | 61.5 | | InternVL2-5-4B-MPO | 67.6 | 78.6 | 60.2 | 51.6 | 65.3 | 47.8 | 82.0 | 88.0 | 67.1 | | InternVL2-5-8B | 68.9 | 82.5 | 63.2 | 56.2 | 64.5 | 49.0 | 84.6 | 82.1 | 62.8 | | InternVL2-5-8B-MPO | 70.4 | 82.4 | 65.7 | 54.9 | 68.9 | 51.4 | 84.5 | 88.3 | 66.9 | | InternVL2-5-26B | 71.6 | 84.6 | 66.5 | 60.7 | 68.0 | 55.8 | 86.2 | 85.4 | 65.4 | | InternVL2-5-26B-MPO | 72.7 | 84.2 | 67.2 | 57.7 | 72.8 | 55.3 | 86.2 | 91.2 | 67.1 | | InternVL2-5-38B | 73.5 | 85.4 | 68.5 | 64.6 | 72.4 | 57.9 | 87.6 | 84.1 | 67.2 | | InternVL2-5-38B-MPO | 75.5 | 85.6 | 69.8 | 64.1 | 73.8 | 61.5 | 88.1 | 88.5 | 72.5 | | InternVL2-5-78B | 75.2 | 87.5 | 69.5 | 70.0 | 70.6 | 57.4 | 89.1 | 85.3 | 71.8 | | InternVL2-5-78B-MPO | 76.6 | 87.3 | 73.1 | 68.3 | 73.8 | 58.7 | 89.3 | 91.2 | 71.4 | ## Quick Start We provide example code to run the model with transformers. > Please use transformers>=4.37.2 to ensure the model works normally. ### Model Loading #### 16-bit (bf16 / fp16) #### BNB 8-bit Quantization #### Multiple GPUs The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors. ### Inference with Transformers #### Streaming Output Besides this method, you can also use the following code to get streamed output. ## Finetune Many repositories now support fine-tuning of the InternVL series models, including InternVL, SWIFT, XTuner, and others. Please refer to their documentation for more details on fine-tuning. ## Deployment ### LMDeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs & VLMs. LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline. #### A 'Hello, world' Example If an ImportError occurs while executing this case, please install the required dependency packages as prompted. #### Multi-images Inference When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased.
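As a concrete illustration of the multi-image case, here is a minimal sketch using LMDeploy's pipeline. The image URLs are placeholders taken from LMDeploy's own examples, and the enlarged session_len is an assumed setting to grow the context window as advised above:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Enlarge the context window, since several images mean many more input tokens.
pipe = pipeline('OpenGVLab/InternVL2_5-38B-MPO',
                backend_config=TurbomindEngineConfig(session_len=16384))

images = [load_image(url) for url in (
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
)]
response = pipe(('Describe these two images.', images))
print(response.text)
```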
#### Batch Prompts Inference Conducting inference with batch prompts is quite straightforward; just place them within a list structure: #### Multi-turn Conversation There are two ways to do multi-turn conversations with the pipeline. One is to construct messages according to the OpenAI format and use the method introduced above; the other is to use the pipeline's chat interface. #### Service LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup: To use the OpenAI-style interface, you need to install OpenAI: Then, use the code below to make the API call: ## License This project is released under the MIT License. This project uses the pre-trained Qwen2.5-32B-Instruct as a component, which is licensed under the Apache License 2.0. ## Citation If you find this project useful in your research, please consider citing:" +} \ No newline at end of file diff --git a/data/model_data_json/OpenGVLab_InternVL3-2B.json new file mode 100644 index 0000000000000000000000000000000000000000..7328df42f6d5ace8bea700e13ddbe6c2465559fa --- /dev/null +++ b/data/model_data_json/OpenGVLab_InternVL3-2B.json @@ -0,0 +1,27 @@ +{ + "model_id": "OpenGVLab/InternVL3-2B", + "downloads": 80574, + "tags": [ + "transformers", + "safetensors", + "internvl_chat", + "feature-extraction", + "internvl", + "custom_code", + "image-text-to-text", + "conversational", + "multilingual", + "dataset:OpenGVLab/MMPR-v1.2", + "arxiv:2312.14238", + "arxiv:2404.16821", + "arxiv:2412.05271", + "arxiv:2411.10442", + "arxiv:2504.10479", + "arxiv:2412.09616", + "base_model:OpenGVLab/InternVL3-2B-Instruct", + "base_model:finetune:OpenGVLab/InternVL3-2B-Instruct", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 license_name: qwen license_link: pipeline_tag: image-text-to-text library_name: transformers base_model: - OpenGVLab/InternVL3-2B-Instruct base_model_relation: finetune datasets: - OpenGVLab/MMPR-v1.2 language: - multilingual tags: - internvl - custom_code --- # InternVL3-2B [\\[📂 GitHub\\]]( [\\[📜 InternVL 1.0\\]]( [\\[📜 InternVL 1.5\\]]( [\\[📜 InternVL 2.5\\]]( [\\[📜 InternVL2.5-MPO\\]]( [\\[📜 InternVL3\\]]( [\\[🆕 Blog\\]]( [\\[🗨️ Chat Demo\\]]( [\\[🤗 HF Demo\\]]( [\\[🚀 Quick Start\\]](#quick-start) [\\[📖 Documents\\]](
\"image\" ## Introduction We introduce InternVL3, an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance. Compared to InternVL 2.5, InternVL3 exhibits superior multimodal perception and reasoning capabilities, while further extending its multimodal capabilities to encompass tool usage, GUI agents, industrial image analysis, 3D vision perception, and more. Additionally, we compare InternVL3 with Qwen2.5 Chat models, whose corresponding pre-trained base models are employed as the initialization of the langauge component in InternVL3. Benefitting from Native Multimodal Pre-Training, the InternVL3 series achieves even better overall text performance than the Qwen2.5 series. !image/png ## InternVL3 Family In the following table, we provide an overview of the InternVL3 series. | Model Name | Vision Part | Language Part | HF Link | | :-----------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :------------------------------------------------------: | | InternVL3-1B | InternViT-300M-448px-V2_5 | Qwen2.5-0.5B | 🤗 link | | InternVL3-2B | InternViT-300M-448px-V2_5 | Qwen2.5-1.5B | 🤗 link | | InternVL3-8B | InternViT-300M-448px-V2_5 | Qwen2.5-7B | 🤗 link | | InternVL3-9B | InternViT-300M-448px-V2_5 | internlm3-8b-instruct | 🤗 link | | InternVL3-14B | InternViT-300M-448px-V2_5 | Qwen2.5-14B | 🤗 link | | InternVL3-38B | InternViT-6B-448px-V2_5 | Qwen2.5-32B | 🤗 link | | InternVL3-78B | InternViT-6B-448px-V2_5 | Qwen2.5-72B | 🤗 link | !image/png ## Model Architecture As shown in the following figure, InternVL3 retains the same model architecture as InternVL 2.5 and its predecessors, InternVL 1.5 and 2.0, following the \"ViT-MLP-LLM\" paradigm. In this new version, we integrate a newly incrementally pre-trained InternViT with various pre-trained LLMs, including InternLM 3 and Qwen 2.5, using a randomly initialized MLP projector. !image/png As in the previous version, we applied a pixel unshuffle operation, reducing the number of visual tokens to one-quarter of the original. Besides, we adopted a similar dynamic resolution strategy as InternVL 1.5, dividing images into tiles of 448×448 pixels. The key difference, starting from InternVL 2.0, is that we additionally introduced support for multi-image and video data. Notably, in InternVL3, we integrate the Variable Visual Position Encoding (V2PE), which utilizes smaller, more flexible position increments for visual tokens. Benefiting from V2PE, InternVL3 exhibits better long context understanding capabilities compared to its predecessors. ## Training Strategy ### Native Multimodal Pre-Training We propose a Native Multimodal Pre-Training approach that consolidates language and vision learning into a single pre-training stage. In contrast to standard paradigms that first train a language-only model and subsequently adapt it to handle additional modalities, our method interleaves multimodal data (e.g., image-text, video-text, or image-text interleaved sequences) with large-scale textual corpora. This unified training scheme allows the model to learn both linguistic and multimodal representations simultaneously, ultimately enhancing its capability to handle vision-language tasks without the need for separate alignment or bridging modules. Please see our paper for more details. 
### Supervised Fine-Tuning In this phase, the techniques of random JPEG compression, square loss re-weighting, and multimodal data packing proposed in InternVL2.5 are also employed in the InternVL3 series. The main advancement of the SFT phase in InternVL3 compared to InternVL2.5 lies in the use of higher-quality and more diverse training data. Specifically, we further extend training samples for tool use, 3D scene understanding, GUI operations, long context tasks, video understanding, scientific diagrams, creative writing, and multimodal reasoning. ### Mixed Preference Optimization During Pre-training and SFT, the model is trained to predict the next token conditioned on previous ground-truth tokens. However, during inference, the model predicts each token based on its own prior outputs. This discrepancy between ground-truth tokens and model-predicted tokens introduces a distribution shift, which can impair the model’s Chain-of-Thought (CoT) reasoning capabilities. To mitigate this issue, we employ MPO, which introduces additional supervision from both positive and negative samples to align the model response distribution with the ground-truth distribution, thereby improving reasoning performance. Specifically, the training objective of MPO is a combination of preference loss \\\\(\\mathcal{L}_{\\text{p}}\\\\), quality loss \\\\(\\mathcal{L}_{\\text{q}}\\\\), and generation loss \\\\(\\mathcal{L}_{\\text{g}}\\\\), which can be formulated as follows: $$ \\mathcal{L}=w_{p}\\cdot\\mathcal{L}_{\\text{p}} + w_{q}\\cdot\\mathcal{L}_{\\text{q}} + w_{g}\\cdot\\mathcal{L}_{\\text{g}}, $$ where \\\\(w_{*}\\\\) represents the weight assigned to each loss component. Please see our paper for more details about MPO. ### Test-Time Scaling Test-Time Scaling has been shown to be an effective method to enhance the reasoning abilities of LLMs and MLLMs. In this work, we use the Best-of-N evaluation strategy and employ VisualPRM-8B as the critic model to select the best response for reasoning and mathematics evaluation. ## Evaluation on Multimodal Capability ### Multimodal Reasoning and Mathematics !image/png ### OCR, Chart, and Document Understanding !image/png ### Multi-Image & Real-World Comprehension !image/png ### Comprehensive Multimodal & Hallucination Evaluation !image/png ### Visual Grounding !image/png ### Multimodal Multilingual Understanding !image/png ### Video Understanding !image/png ### GUI Grounding !image/png ### Spatial Reasoning !image/png ## Evaluation on Language Capability We compare InternVL3 with Qwen2.5 Chat models, whose corresponding pre-trained base models are employed as the initialization of the language component in InternVL3. Benefiting from Native Multimodal Pre-Training, the InternVL3 series achieves even better overall text performance than the Qwen2.5 series. Please note that the evaluation scores of Qwen2.5 series may differ from those officially reported, as we have adopted the prompt versions provided in the table across all datasets for OpenCompass evaluation. !image/png ## Ablation Study ### Native Multimodal Pre-Training We conduct experiments on the InternVL2-8B model while keeping its architecture, initialization parameters, and training data entirely unchanged. Traditionally, InternVL2-8B employs a training pipeline that begins with an MLP warmup phase for feature alignment followed by an Instruction Tuning stage. In our experiments, we substitute the conventional MLP warmup phase with a native multimodal pre-training process.
This modification isolates the contribution of native multimodal pre-training to the overall multimodal capability of the model. The evaluation results in the figure below show that the model with native multimodal pre-training exhibits performance on most benchmarks that is comparable to the fully multi-stage-trained InternVL2-8B baseline. Furthermore, when followed by instruction tuning on higher-quality data, the model demonstrates further performance gains across evaluated multimodal tasks. These findings underscore the efficiency of native multimodal pre-training in imparting powerful multimodal capabilities to MLLMs. !image/png ### Mixed Preference Optimization As shown in the table below, models fine-tuned with MPO demonstrate superior reasoning performance across seven multimodal reasoning benchmarks compared to their counterparts without MPO. Specifically, InternVL3-78B and InternVL3-38B outperform their counterparts by 4.1 and 4.5 points, respectively. Notably, the training data used for MPO is a subset of that used for SFT, indicating that the performance improvements primarily stem from the training algorithm rather than the training data. !image/png ### Variable Visual Position Encoding As reported in the table below, the introduction of V2PE leads to significant performance gains across most evaluation metrics. In addition, our ablation studies, which vary the positional increment \\\\( \\delta \\\\), reveal that even for tasks primarily involving conventional contexts, relatively small \\\\( \\delta \\\\) values can achieve optimal performance. These findings provide important insights for future efforts aimed at refining position encoding strategies for visual tokens in MLLMs. !image/png ## Quick Start We provide example code to run the model with transformers. > Please use transformers>=4.37.2 to ensure the model works normally. ### Model Loading #### 16-bit (bf16 / fp16) #### BNB 8-bit Quantization #### Multiple GPUs The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors. ### Inference with Transformers #### Streaming Output Besides this method, you can also use the following code to get streamed output. ## Finetune Many repositories now support fine-tuning of the InternVL series models, including InternVL, SWIFT, XTuner, and others. Please refer to their documentation for more details on fine-tuning. ## Deployment ### LMDeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs & VLMs. LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline. #### A 'Hello, world' Example If an ImportError occurs while executing this case, please install the required dependency packages as prompted. #### Multi-images Inference When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased. #### Batch Prompts Inference Conducting inference with batch prompts is quite straightforward; just place them within a list structure: #### Multi-turn Conversation There are two ways to do multi-turn conversations with the pipeline. One is to construct messages according to the OpenAI format and use the method introduced above; the other is to use the pipeline's chat interface, as in the sketch below.
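Here is a minimal sketch of the chat-interface route, assuming LMDeploy's pipeline exposes a chat method as in its VLM examples; the image URL is a placeholder:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL3-2B')
image = load_image('https://example.com/sample.jpg')  # placeholder URL

# First turn pairs the prompt with the image.
sess = pipe.chat(('Describe this image.', image))
print(sess.response.text)

# Later turns reuse the session so the model keeps the conversation state.
sess = pipe.chat('What stands out the most?', session=sess)
print(sess.response.text)
```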
#### Service LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup: To use the OpenAI-style interface, you need to install OpenAI: Then, use the code below to make the API call: ## License This project is released under the MIT License. This project uses the pre-trained Qwen2.5 as a component, which is licensed under the Apache-2.0 License. ## Citation If you find this project useful in your research, please consider citing:" +} \ No newline at end of file diff --git a/data/model_data_json/OpenGVLab_InternVL3-78B.json new file mode 100644 index 0000000000000000000000000000000000000000..36c8146f08005b9ceb97e8c280788ec74235ce1c --- /dev/null +++ b/data/model_data_json/OpenGVLab_InternVL3-78B.json @@ -0,0 +1,28 @@ +{ + "model_id": "OpenGVLab/InternVL3-78B", + "downloads": 96453, + "tags": [ + "transformers", + "safetensors", + "internvl_chat", + "feature-extraction", + "internvl", + "custom_code", + "image-text-to-text", + "conversational", + "multilingual", + "dataset:OpenGVLab/MMPR-v1.2", + "arxiv:2312.14238", + "arxiv:2404.16821", + "arxiv:2412.05271", + "arxiv:2411.10442", + "arxiv:2504.10479", + "arxiv:2412.09616", + "base_model:OpenGVLab/InternVL3-78B-Instruct", + "base_model:finetune:OpenGVLab/InternVL3-78B-Instruct", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: qwen license_link: pipeline_tag: image-text-to-text library_name: transformers base_model: - OpenGVLab/InternVL3-78B-Instruct base_model_relation: finetune datasets: - OpenGVLab/MMPR-v1.2 language: - multilingual tags: - internvl - custom_code --- # InternVL3-78B [\\[📂 GitHub\\]]( [\\[📜 InternVL 1.0\\]]( [\\[📜 InternVL 1.5\\]]( [\\[📜 InternVL 2.5\\]]( [\\[📜 InternVL2.5-MPO\\]]( [\\[📜 InternVL3\\]]( [\\[🆕 Blog\\]]( [\\[🗨️ Chat Demo\\]]( [\\[🤗 HF Demo\\]]( [\\[🚀 Quick Start\\]](#quick-start) [\\[📖 Documents\\]](
\"image\" ## Introduction We introduce InternVL3, an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance. Compared to InternVL 2.5, InternVL3 exhibits superior multimodal perception and reasoning capabilities, while further extending its multimodal capabilities to encompass tool usage, GUI agents, industrial image analysis, 3D vision perception, and more. Additionally, we compare InternVL3 with Qwen2.5 Chat models, whose corresponding pre-trained base models are employed as the initialization of the langauge component in InternVL3. Benefitting from Native Multimodal Pre-Training, the InternVL3 series achieves even better overall text performance than the Qwen2.5 series. !image/png ## InternVL3 Family In the following table, we provide an overview of the InternVL3 series. | Model Name | Vision Part | Language Part | HF Link | | :-----------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :------------------------------------------------------: | | InternVL3-1B | InternViT-300M-448px-V2_5 | Qwen2.5-0.5B | 🤗 link | | InternVL3-2B | InternViT-300M-448px-V2_5 | Qwen2.5-1.5B | 🤗 link | | InternVL3-8B | InternViT-300M-448px-V2_5 | Qwen2.5-7B | 🤗 link | | InternVL3-9B | InternViT-300M-448px-V2_5 | internlm3-8b-instruct | 🤗 link | | InternVL3-14B | InternViT-300M-448px-V2_5 | Qwen2.5-14B | 🤗 link | | InternVL3-38B | InternViT-6B-448px-V2_5 | Qwen2.5-32B | 🤗 link | | InternVL3-78B | InternViT-6B-448px-V2_5 | Qwen2.5-72B | 🤗 link | !image/png ## Model Architecture As shown in the following figure, InternVL3 retains the same model architecture as InternVL 2.5 and its predecessors, InternVL 1.5 and 2.0, following the \"ViT-MLP-LLM\" paradigm. In this new version, we integrate a newly incrementally pre-trained InternViT with various pre-trained LLMs, including InternLM 3 and Qwen 2.5, using a randomly initialized MLP projector. !image/png As in the previous version, we applied a pixel unshuffle operation, reducing the number of visual tokens to one-quarter of the original. Besides, we adopted a similar dynamic resolution strategy as InternVL 1.5, dividing images into tiles of 448×448 pixels. The key difference, starting from InternVL 2.0, is that we additionally introduced support for multi-image and video data. Notably, in InternVL3, we integrate the Variable Visual Position Encoding (V2PE), which utilizes smaller, more flexible position increments for visual tokens. Benefiting from V2PE, InternVL3 exhibits better long context understanding capabilities compared to its predecessors. ## Training Strategy ### Native Multimodal Pre-Training We propose a Native Multimodal Pre-Training approach that consolidates language and vision learning into a single pre-training stage. In contrast to standard paradigms that first train a language-only model and subsequently adapt it to handle additional modalities, our method interleaves multimodal data (e.g., image-text, video-text, or image-text interleaved sequences) with large-scale textual corpora. This unified training scheme allows the model to learn both linguistic and multimodal representations simultaneously, ultimately enhancing its capability to handle vision-language tasks without the need for separate alignment or bridging modules. Please see our paper for more details. 
### Supervised Fine-Tuning In this phase, the techniques of random JPEG compression, square loss re-weighting, and multimodal data packing proposed in InternVL2.5 are also employed in the InternVL3 series. The main advancement of the SFT phase in InternVL3 compared to InternVL2.5 lies in the use of higher-quality and more diverse training data. Specifically, we further extend training samples for tool use, 3D scene understanding, GUI operations, long context tasks, video understanding, scientific diagrams, creative writing, and multimodal reasoning. ### Mixed Preference Optimization During Pre-training and SFT, the model is trained to predict the next token conditioned on previous ground-truth tokens. However, during inference, the model predicts each token based on its own prior outputs. This discrepancy between ground-truth tokens and model-predicted tokens introduces a distribution shift, which can impair the model’s Chain-of-Thought (CoT) reasoning capabilities. To mitigate this issue, we employ MPO, which introduces additional supervision from both positive and negative samples to align the model response distribution with the ground-truth distribution, thereby improving reasoning performance. Specifically, the training objective of MPO is a combination of preference loss \\\\(\\mathcal{L}_{\\text{p}}\\\\), quality loss \\\\(\\mathcal{L}_{\\text{q}}\\\\), and generation loss \\\\(\\mathcal{L}_{\\text{g}}\\\\), which can be formulated as follows: $$ \\mathcal{L}=w_{p}\\cdot\\mathcal{L}_{\\text{p}} + w_{q}\\cdot\\mathcal{L}_{\\text{q}} + w_{g}\\cdot\\mathcal{L}_{\\text{g}}, $$ where \\\\(w_{*}\\\\) represents the weight assigned to each loss component. Please see our paper for more details about MPO. ### Test-Time Scaling Test-Time Scaling has been shown to be an effective method to enhance the reasoning abilities of LLMs and MLLMs. In this work, we use the Best-of-N evaluation strategy and employ VisualPRM-8B as the critic model to select the best response for reasoning and mathematics evaluation. ## Evaluation on Multimodal Capability ### Multimodal Reasoning and Mathematics !image/png ### OCR, Chart, and Document Understanding !image/png ### Multi-Image & Real-World Comprehension !image/png ### Comprehensive Multimodal & Hallucination Evaluation !image/png ### Visual Grounding !image/png ### Multimodal Multilingual Understanding !image/png ### Video Understanding !image/png ### GUI Grounding !image/png ### Spatial Reasoning !image/png ## Evaluation on Language Capability We compare InternVL3 with Qwen2.5 Chat models, whose corresponding pre-trained base models are employed as the initialization of the language component in InternVL3. Benefiting from Native Multimodal Pre-Training, the InternVL3 series achieves even better overall text performance than the Qwen2.5 series. Please note that the evaluation scores of Qwen2.5 series may differ from those officially reported, as we have adopted the prompt versions provided in the table across all datasets for OpenCompass evaluation. !image/png ## Ablation Study ### Native Multimodal Pre-Training We conduct experiments on the InternVL2-8B model while keeping its architecture, initialization parameters, and training data entirely unchanged. Traditionally, InternVL2-8B employs a training pipeline that begins with an MLP warmup phase for feature alignment followed by an Instruction Tuning stage. In our experiments, we substitute the conventional MLP warmup phase with a native multimodal pre-training process.
This modification isolates the contribution of native multimodal pre-training to the overall multimodal capability of the model. The evaluation results in the figure below show that the model with native multimodal pre-training exhibits performance on most benchmarks that is comparable to the fully multi-stage-trained InternVL2-8B baseline. Furthermore, when followed by instruction tuning on higher-quality data, the model demonstrates further performance gains across evaluated multimodal tasks. These findings underscore the efficiency of native multimodal pre-training in imparting powerful multimodal capabilities to MLLMs. !image/png ### Mixed Preference Optimization As shown in the table below, models fine-tuned with MPO demonstrate superior reasoning performance across seven multimodal reasoning benchmarks compared to their counterparts without MPO. Specifically, InternVL3-78B and InternVL3-38B outperform their counterparts by 4.1 and 4.5 points, respectively. Notably, the training data used for MPO is a subset of that used for SFT, indicating that the performance improvements primarily stem from the training algorithm rather than the training data. !image/png ### Variable Visual Position Encoding As reported in the table below, the introduction of V2PE leads to significant performance gains across most evaluation metrics. In addition, our ablation studies, which vary the positional increment \\\\( \\delta \\\\), reveal that even for tasks primarily involving conventional contexts, relatively small \\\\( \\delta \\\\) values can achieve optimal performance. These findings provide important insights for future efforts aimed at refining position encoding strategies for visual tokens in MLLMs. !image/png ## Quick Start We provide example code to run the model with transformers. > Please use transformers>=4.37.2 to ensure the model works normally. ### Model Loading #### 16-bit (bf16 / fp16) #### BNB 8-bit Quantization #### Multiple GPUs The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors. ### Inference with Transformers #### Streaming Output Besides this method, you can also use the following code to get streamed output. ## Finetune Many repositories now support fine-tuning of the InternVL series models, including InternVL, SWIFT, XTuner, and others. Please refer to their documentation for more details on fine-tuning. ## Deployment ### LMDeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs & VLMs. LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline. #### A 'Hello, world' Example If an ImportError occurs while executing this case, please install the required dependency packages as prompted. #### Multi-images Inference When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased. #### Batch Prompts Inference Conducting inference with batch prompts is quite straightforward; just place them within a list structure, as in the sketch below:
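A minimal sketch of batched inference, assuming the LMDeploy pipeline accepts a list of (prompt, image) pairs as in its VLM examples; the URLs are placeholders:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL3-78B')

# Any list of (prompt, image) pairs is processed as one batch.
prompts = [
    ('Describe this image.', load_image('https://example.com/a.jpg')),
    ('How many people are visible?', load_image('https://example.com/b.jpg')),
]
responses = pipe(prompts)
for r in responses:
    print(r.text)
```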
#### Multi-turn Conversation There are two ways to do multi-turn conversations with the pipeline. One is to construct messages according to the OpenAI format and use the method introduced above; the other is to use the pipeline's chat interface. #### Service LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup: To use the OpenAI-style interface, you need to install OpenAI: Then, use the code below to make the API call: ## License This project is released under the MIT License. This project uses the pre-trained Qwen2.5 as a component, which is licensed under the Qwen License. ## Citation If you find this project useful in your research, please consider citing:", + "model_explanation_gemini": "InternVL3-78B is a multilingual multimodal large language model (MLLM) excelling in image-text-to-text tasks, offering advanced multimodal perception, reasoning, and extended capabilities like tool usage, GUI agents, and industrial image analysis." +} \ No newline at end of file diff --git a/data/model_data_json/OpenGVLab_InternVL3-8B-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..7626fd5350bb91109707ff413996e817540056a3 --- /dev/null +++ b/data/model_data_json/OpenGVLab_InternVL3-8B-AWQ.json @@ -0,0 +1,28 @@ +{ + "model_id": "OpenGVLab/InternVL3-8B-AWQ", + "downloads": 84273, + "tags": [ + "transformers", + "pytorch", + "internvl_chat", + "feature-extraction", + "internvl", + "custom_code", + "image-text-to-text", + "conversational", + "multilingual", + "dataset:OpenGVLab/MMPR-v1.2", + "arxiv:2312.14238", + "arxiv:2404.16821", + "arxiv:2412.05271", + "arxiv:2411.10442", + "arxiv:2504.10479", + "arxiv:2412.09616", + "base_model:OpenGVLab/InternVL3-8B", + "base_model:quantized:OpenGVLab/InternVL3-8B", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: qwen license_link: pipeline_tag: image-text-to-text library_name: transformers base_model: - OpenGVLab/InternVL3-8B base_model_relation: quantized datasets: - OpenGVLab/MMPR-v1.2 language: - multilingual tags: - internvl - custom_code --- # InternVL3-8B [\\[📂 GitHub\\]]( [\\[📜 InternVL 1.0\\]]( [\\[📜 InternVL 1.5\\]]( [\\[📜 InternVL 2.5\\]]( [\\[📜 InternVL2.5-MPO\\]]( [\\[📜 InternVL3\\]]( [\\[🆕 Blog\\]]( [\\[🗨️ Chat Demo\\]]( [\\[🤗 HF Demo\\]]( [\\[🚀 Quick Start\\]](#quick-start) [\\[📖 Documents\\]](
\"image\" ## Introduction We introduce InternVL3, an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance. Compared to InternVL 2.5, InternVL3 exhibits superior multimodal perception and reasoning capabilities, while further extending its multimodal capabilities to encompass tool usage, GUI agents, industrial image analysis, 3D vision perception, and more. Additionally, we compare InternVL3 with Qwen2.5 Chat models, whose corresponding pre-trained base models are employed as the initialization of the langauge component in InternVL3. Benefitting from Native Multimodal Pre-Training, the InternVL3 series achieves even better overall text performance than the Qwen2.5 series. !image/png ## InternVL3 Family In the following table, we provide an overview of the InternVL3 series. | Model Name | Vision Part | Language Part | HF Link | | :-----------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :------------------------------------------------------: | | InternVL3-1B | InternViT-300M-448px-V2_5 | Qwen2.5-0.5B | 🤗 link | | InternVL3-2B | InternViT-300M-448px-V2_5 | Qwen2.5-1.5B | 🤗 link | | InternVL3-8B | InternViT-300M-448px-V2_5 | Qwen2.5-7B | 🤗 link | | InternVL3-9B | InternViT-300M-448px-V2_5 | internlm3-8b-instruct | 🤗 link | | InternVL3-14B | InternViT-300M-448px-V2_5 | Qwen2.5-14B | 🤗 link | | InternVL3-38B | InternViT-6B-448px-V2_5 | Qwen2.5-32B | 🤗 link | | InternVL3-78B | InternViT-6B-448px-V2_5 | Qwen2.5-72B | 🤗 link | !image/png ## Model Architecture As shown in the following figure, InternVL3 retains the same model architecture as InternVL 2.5 and its predecessors, InternVL 1.5 and 2.0, following the \"ViT-MLP-LLM\" paradigm. In this new version, we integrate a newly incrementally pre-trained InternViT with various pre-trained LLMs, including InternLM 3 and Qwen 2.5, using a randomly initialized MLP projector. !image/png As in the previous version, we applied a pixel unshuffle operation, reducing the number of visual tokens to one-quarter of the original. Besides, we adopted a similar dynamic resolution strategy as InternVL 1.5, dividing images into tiles of 448×448 pixels. The key difference, starting from InternVL 2.0, is that we additionally introduced support for multi-image and video data. Notably, in InternVL3, we integrate the Variable Visual Position Encoding (V2PE), which utilizes smaller, more flexible position increments for visual tokens. Benefiting from V2PE, InternVL3 exhibits better long context understanding capabilities compared to its predecessors. ## Training Strategy ### Native Multimodal Pre-Training We propose a Native Multimodal Pre-Training approach that consolidates language and vision learning into a single pre-training stage. In contrast to standard paradigms that first train a language-only model and subsequently adapt it to handle additional modalities, our method interleaves multimodal data (e.g., image-text, video-text, or image-text interleaved sequences) with large-scale textual corpora. This unified training scheme allows the model to learn both linguistic and multimodal representations simultaneously, ultimately enhancing its capability to handle vision-language tasks without the need for separate alignment or bridging modules. Please see our paper for more details. 
### Supervised Fine-Tuning In this phase, the techniques of random JPEG compression, square loss re-weighting, and multimodal data packing proposed in InternVL2.5 are also employed in the InternVL3 series. The main advancement of the SFT phase in InternVL3 compared to InternVL2.5 lies in the use of higher-quality and more diverse training data. Specifically, we further extend training samples for tool use, 3D scene understanding, GUI operations, long context tasks, video understanding, scientific diagrams, creative writing, and multimodal reasoning. ### Mixed Preference Optimization During Pre-training and SFT, the model is trained to predict the next token conditioned on previous ground-truth tokens. However, during inference, the model predicts each token based on its own prior outputs. This discrepancy between ground-truth tokens and model-predicted tokens introduces a distribution shift, which can impair the model’s Chain-of-Thought (CoT) reasoning capabilities. To mitigate this issue, we employ MPO, which introduces additional supervision from both positive and negative samples to align the model response distribution with the ground-truth distribution, thereby improving reasoning performance. Specifically, the training objective of MPO is a combination of preference loss \\\\(\\mathcal{L}_{\\text{p}}\\\\), quality loss \\\\(\\mathcal{L}_{\\text{q}}\\\\), and generation loss \\\\(\\mathcal{L}_{\\text{g}}\\\\), which can be formulated as follows: $$ \\mathcal{L}=w_{p}\\cdot\\mathcal{L}_{\\text{p}} + w_{q}\\cdot\\mathcal{L}_{\\text{q}} + w_{g}\\cdot\\mathcal{L}_{\\text{g}}, $$ where \\\\(w_{*}\\\\) represents the weight assigned to each loss component. Please see our paper for more details about MPO. ### Test-Time Scaling Test-Time Scaling has been shown to be an effective method to enhance the reasoning abilities of LLMs and MLLMs. In this work, we use the Best-of-N evaluation strategy and employ VisualPRM-8B as the critic model to select the best response for reasoning and mathematics evaluation. ## Evaluation on Multimodal Capability ### Multimodal Reasoning and Mathematics !image/png ### OCR, Chart, and Document Understanding !image/png ### Multi-Image & Real-World Comprehension !image/png ### Comprehensive Multimodal & Hallucination Evaluation !image/png ### Visual Grounding !image/png ### Multimodal Multilingual Understanding !image/png ### Video Understanding !image/png ### GUI Grounding !image/png ### Spatial Reasoning !image/png ## Evaluation on Language Capability We compare InternVL3 with Qwen2.5 Chat models, whose corresponding pre-trained base models are employed as the initialization of the language component in InternVL3. Benefiting from Native Multimodal Pre-Training, the InternVL3 series achieves even better overall text performance than the Qwen2.5 series. Please note that the evaluation scores of Qwen2.5 series may differ from those officially reported, as we have adopted the prompt versions provided in the table across all datasets for OpenCompass evaluation. !image/png ## Ablation Study ### Native Multimodal Pre-Training We conduct experiments on the InternVL2-8B model while keeping its architecture, initialization parameters, and training data entirely unchanged. Traditionally, InternVL2-8B employs a training pipeline that begins with an MLP warmup phase for feature alignment followed by an Instruction Tuning stage. In our experiments, we substitute the conventional MLP warmup phase with a native multimodal pre-training process.
This modification isolates the contribution of native multimodal pre-training to the overall multimodal capability of the model. The evaluation results in the figure below show that the model with native multimodal pre-training exhibits performance on most benchmarks that is comparable to the fully multi-stage-trained InternVL2-8B baseline. Furthermore, when followed by instruction tuning on higher-quality data, the model demonstrates further performance gains across evaluated multimodal tasks. These findings underscore the efficiency of native multimodal pre-training in imparting powerful multimodal capabilities to MLLMs. !image/png ### Mixed Preference Optimization As shown in the table below, models fine-tuned with MPO demonstrate superior reasoning performance across seven multimodal reasoning benchmarks compared to their counterparts without MPO. Specifically, InternVL3-78B and InternVL3-38B outperform their counterparts by 4.1 and 4.5 points, respectively. Notably, the training data used for MPO is a subset of that used for SFT, indicating that the performance improvements primarily stem from the training algorithm rather than the training data. !image/png ### Variable Visual Position Encoding As reported in the table below, the introduction of V2PE leads to significant performance gains across most evaluation metrics. In addition, our ablation studies, which vary the positional increment \\\\( \\delta \\\\), reveal that even for tasks primarily involving conventional contexts, relatively small \\\\( \\delta \\\\) values can achieve optimal performance. These findings provide important insights for future efforts aimed at refining position encoding strategies for visual tokens in MLLMs. !image/png ## Quick Start We provide example code to run the model with transformers. > Please use transformers>=4.37.2 to ensure the model works normally. ### Model Loading #### 16-bit (bf16 / fp16) #### BNB 8-bit Quantization #### Multiple GPUs The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors. ### Inference with Transformers #### Streaming Output Besides this method, you can also use the following code to get streamed output. ## Finetune Many repositories now support fine-tuning of the InternVL series models, including InternVL, SWIFT, XTuner, and others. Please refer to their documentation for more details on fine-tuning. ## Deployment ### LMDeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs & VLMs. LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline. #### A 'Hello, world' Example If an ImportError occurs while executing this case, please install the required dependency packages as prompted. #### Multi-images Inference When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased. #### Batch Prompts Inference Conducting inference with batch prompts is quite straightforward; just place them within a list structure: #### Multi-turn Conversation There are two ways to do multi-turn conversations with the pipeline. One is to construct messages according to the OpenAI format and use the method introduced above, as in the sketch below; the other is to use the pipeline's chat interface.
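A minimal sketch of the first approach, assuming the pipeline accepts OpenAI-style message lists with image_url content parts, as in LMDeploy's examples; the URL is a placeholder:

```python
from lmdeploy import pipeline

pipe = pipeline('OpenGVLab/InternVL3-8B-AWQ')

# OpenAI-style messages: the image is referenced via an image_url content part.
messages = [dict(role='user', content=[
    dict(type='text', text='Describe this image.'),
    dict(type='image_url', image_url=dict(url='https://example.com/sample.jpg')),
])]
response = pipe(messages)
print(response.text)

# Append the reply and a follow-up question to continue the conversation.
messages.append(dict(role='assistant', content=response.text))
messages.append(dict(role='user', content='What stands out the most?'))
print(pipe(messages).text)
```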
#### Service LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup: To use the OpenAI-style interface, you need to install OpenAI: Then, use the code below to make the API call: ## License This project is released under the MIT License. This project uses the pre-trained Qwen2.5 as a component, which is licensed under the Qwen License. ## Citation If you find this project useful in your research, please consider citing:", + "model_explanation_gemini": "InternVL3-8B-AWQ is a multilingual multimodal large language model (MLLM) with enhanced image-text-to-text capabilities, excelling in multimodal perception, reasoning, and extended functionalities like tool usage and industrial image analysis." +} \ No newline at end of file diff --git a/data/model_data_json/OrcaDB_cde-small-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..499a58944b8f0de82309b8eb97977397a26b36da --- /dev/null +++ b/data/model_data_json/OrcaDB_cde-small-v1.json @@ -0,0 +1,19 @@ +{ + "model_id": "OrcaDB/cde-small-v1", + "downloads": 116638, + "tags": [ + "sentence-transformers", + "safetensors", + "feature-extraction", + "mteb", + "transformers", + "custom_code", + "arxiv:2410.02525", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - transformers - sentence-transformers model-index: - name: cde-small-v1 results: - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 87.02985074626866 - type: ap value: 56.706190238632956 - type: ap_weighted value: 56.706190238632956 - type: f1 value: 81.93161953007674 - type: f1_weighted value: 87.7650174177188 - type: main_score value: 87.02985074626866 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification (default) revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 94.664175 - type: ap value: 91.68668057762052 - type: ap_weighted value: 91.68668057762052 - type: f1 value: 94.65859470333152 - type: f1_weighted value: 94.65859470333152 - type: main_score value: 94.664175 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 55.762 - type: f1 value: 55.06427827477677 - type: f1_weighted value: 55.06427827477677 - type: main_score value: 55.762 task: type: Classification - dataset: config: default name: MTEB ArguAna (default) revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: main_score value: 71.99600000000001 - type: map_at_1 value: 49.004 - type: map_at_10 value: 64.741 - type: map_at_100 value: 65.045 - type: map_at_1000 value: 65.048 - type: map_at_20 value: 64.999 - type: map_at_3 value: 61.344 - type: map_at_5 value: 63.595 - type: mrr_at_1 value: 50.71123755334281 - type: mrr_at_10 value: 65.32688703741336 - type: mrr_at_100 value: 65.63793917015693 - type: mrr_at_1000 value: 65.64038101143724 - type: mrr_at_20
value: 65.59178002869953 - type: mrr_at_3 value: 61.960644855381695 - type: mrr_at_5 value: 64.12636320531058 - type: nauc_map_at_1000_diff1 value: 15.961240220366024 - type: nauc_map_at_1000_max value: -7.44765810583741 - type: nauc_map_at_1000_std value: -17.07167824225605 - type: nauc_map_at_100_diff1 value: 15.965616911760689 - type: nauc_map_at_100_max value: -7.440609797442297 - type: nauc_map_at_100_std value: -17.069175070766125 - type: nauc_map_at_10_diff1 value: 16.0053641689455 - type: nauc_map_at_10_max value: -7.292003400856069 - type: nauc_map_at_10_std value: -17.21891231777586 - type: nauc_map_at_1_diff1 value: 16.775859614223965 - type: nauc_map_at_1_max value: -10.812150486389175 - type: nauc_map_at_1_std value: -18.447209756110635 - type: nauc_map_at_20_diff1 value: 16.00477985164213 - type: nauc_map_at_20_max value: -7.344399709169316 - type: nauc_map_at_20_std value: -17.011815937847548 - type: nauc_map_at_3_diff1 value: 15.730294091913994 - type: nauc_map_at_3_max value: -7.13902722192326 - type: nauc_map_at_3_std value: -16.846251134000045 - type: nauc_map_at_5_diff1 value: 15.952653874864062 - type: nauc_map_at_5_max value: -6.730509527119155 - type: nauc_map_at_5_std value: -16.586379153220353 - type: nauc_mrr_at_1000_diff1 value: 10.221278338563085 - type: nauc_mrr_at_1000_max value: -10.513831642963527 - type: nauc_mrr_at_1000_std value: -16.340880407651863 - type: nauc_mrr_at_100_diff1 value: 10.226217465992063 - type: nauc_mrr_at_100_max value: -10.506478667638874 - type: nauc_mrr_at_100_std value: -16.33847358633176 - type: nauc_mrr_at_10_diff1 value: 10.293491655887369 - type: nauc_mrr_at_10_max value: -10.357229664747909 - type: nauc_mrr_at_10_std value: -16.496874845739885 - type: nauc_mrr_at_1_diff1 value: 12.049863016253427 - type: nauc_mrr_at_1_max value: -11.968579522299635 - type: nauc_mrr_at_1_std value: -16.65245790056632 - type: nauc_mrr_at_20_diff1 value: 10.276109067921565 - type: nauc_mrr_at_20_max value: -10.404100283652397 - type: nauc_mrr_at_20_std value: -16.282098762560164 - type: nauc_mrr_at_3_diff1 value: 10.338008940592475 - type: nauc_mrr_at_3_max value: -10.123508259477648 - type: nauc_mrr_at_3_std value: -16.218834894850918 - type: nauc_mrr_at_5_diff1 value: 10.114375457049043 - type: nauc_mrr_at_5_max value: -9.987361588255437 - type: nauc_mrr_at_5_std value: -15.723897501895118 - type: nauc_ndcg_at_1000_diff1 value: 16.00889445347496 - type: nauc_ndcg_at_1000_max value: -6.746746500535893 - type: nauc_ndcg_at_1000_std value: -16.567047531839382 - type: nauc_ndcg_at_100_diff1 value: 16.10719535312808 - type: nauc_ndcg_at_100_max value: -6.59354665730934 - type: nauc_ndcg_at_100_std value: -16.513298001700566 - type: nauc_ndcg_at_10_diff1 value: 16.396485814351973 - type: nauc_ndcg_at_10_max value: -5.7111859345525895 - type: nauc_ndcg_at_10_std value: -17.13416103510026 - type: nauc_ndcg_at_1_diff1 value: 16.775859614223965 - type: nauc_ndcg_at_1_max value: -10.812150486389175 - type: nauc_ndcg_at_1_std value: -18.447209756110635 - type: nauc_ndcg_at_20_diff1 value: 16.414235526534497 - type: nauc_ndcg_at_20_max value: -5.890463457153039 - type: nauc_ndcg_at_20_std value: -16.124783371499017 - type: nauc_ndcg_at_3_diff1 value: 15.683431770601713 - type: nauc_ndcg_at_3_max value: -5.546675513691499 - type: nauc_ndcg_at_3_std value: -15.973244504586676 - type: nauc_ndcg_at_5_diff1 value: 16.193847874581166 - type: nauc_ndcg_at_5_max value: -4.471638454091411 - type: nauc_ndcg_at_5_std value: -15.517824617814629 - type: 
nauc_precision_at_1000_diff1 value: 3.170440311533737 - type: nauc_precision_at_1000_max value: 25.521992526080666 - type: nauc_precision_at_1000_std value: 68.4373013145641 - type: nauc_precision_at_100_diff1 value: 30.283338663457897 - type: nauc_precision_at_100_max value: 44.33747104624998 - type: nauc_precision_at_100_std value: 42.28887350925609 - type: nauc_precision_at_10_diff1 value: 23.390956301235633 - type: nauc_precision_at_10_max value: 15.468288261126773 - type: nauc_precision_at_10_std value: -18.2942744669977 - type: nauc_precision_at_1_diff1 value: 16.775859614223965 - type: nauc_precision_at_1_max value: -10.812150486389175 - type: nauc_precision_at_1_std value: -18.447209756110635 - type: nauc_precision_at_20_diff1 value: 37.14254275219614 - type: nauc_precision_at_20_max value: 46.984729023754824 - type: nauc_precision_at_20_std value: 22.763524786900717 - type: nauc_precision_at_3_diff1 value: 15.651406928218881 - type: nauc_precision_at_3_max value: 0.7775458885343681 - type: nauc_precision_at_3_std value: -12.438132482295773 - type: nauc_precision_at_5_diff1 value: 18.10074574210355 - type: nauc_precision_at_5_max value: 9.373350504221532 - type: nauc_precision_at_5_std value: -9.13125987784625 - type: nauc_recall_at_1000_diff1 value: 3.1704403115262325 - type: nauc_recall_at_1000_max value: 25.521992526077756 - type: nauc_recall_at_1000_std value: 68.4373013145603 - type: nauc_recall_at_100_diff1 value: 30.283338663455616 - type: nauc_recall_at_100_max value: 44.337471046250556 - type: nauc_recall_at_100_std value: 42.28887350925341 - type: nauc_recall_at_10_diff1 value: 23.390956301235168 - type: nauc_recall_at_10_max value: 15.468288261126578 - type: nauc_recall_at_10_std value: -18.294274466997873 - type: nauc_recall_at_1_diff1 value: 16.775859614223965 - type: nauc_recall_at_1_max value: -10.812150486389175 - type: nauc_recall_at_1_std value: -18.447209756110635 - type: nauc_recall_at_20_diff1 value: 37.14254275219513 - type: nauc_recall_at_20_max value: 46.98472902375421 - type: nauc_recall_at_20_std value: 22.763524786899644 - type: nauc_recall_at_3_diff1 value: 15.65140692821902 - type: nauc_recall_at_3_max value: 0.7775458885343522 - type: nauc_recall_at_3_std value: -12.43813248229578 - type: nauc_recall_at_5_diff1 value: 18.10074574210355 - type: nauc_recall_at_5_max value: 9.373350504221595 - type: nauc_recall_at_5_std value: -9.131259877846116 - type: ndcg_at_1 value: 49.004 - type: ndcg_at_10 value: 71.99600000000001 - type: ndcg_at_100 value: 73.173 - type: ndcg_at_1000 value: 73.214 - type: ndcg_at_20 value: 72.91 - type: ndcg_at_3 value: 65.21900000000001 - type: ndcg_at_5 value: 69.284 - type: precision_at_1 value: 49.004 - type: precision_at_10 value: 9.452 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.904 - type: precision_at_3 value: 25.462 - type: precision_at_5 value: 17.255000000000003 - type: recall_at_1 value: 49.004 - type: recall_at_10 value: 94.523 - type: recall_at_100 value: 99.36 - type: recall_at_1000 value: 99.644 - type: recall_at_20 value: 98.08 - type: recall_at_3 value: 76.387 - type: recall_at_5 value: 86.273 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P (default) revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: main_score value: 48.629569816593516 - type: v_measure value: 48.629569816593516 - type: v_measure_std value: 14.01810149072028 task: type: Clustering 
- dataset: config: default name: MTEB ArxivClusteringS2S (default) revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: main_score value: 40.52366904677561 - type: v_measure value: 40.52366904677561 - type: v_measure_std value: 14.375876773823757 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions (default) revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: main_score value: 61.27347206107508 - type: map value: 61.27347206107508 - type: mrr value: 74.49105219188321 - type: nAUC_map_diff1 value: 13.442645655149457 - type: nAUC_map_max value: 25.013363268430027 - type: nAUC_map_std value: 17.60175231611674 - type: nAUC_mrr_diff1 value: 25.217675209249435 - type: nAUC_mrr_max value: 32.37381560372622 - type: nAUC_mrr_std value: 22.584922632508412 task: type: Reranking - dataset: config: default name: MTEB BIOSSES (default) revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: cosine_pearson value: 89.09452267906886 - type: cosine_spearman value: 86.73450642504955 - type: euclidean_pearson value: 87.1275130552617 - type: euclidean_spearman value: 86.93812552248012 - type: main_score value: 86.73450642504955 - type: manhattan_pearson value: 86.79403606129864 - type: manhattan_spearman value: 86.76824213349957 - type: pearson value: 89.09452267906886 - type: spearman value: 86.73450642504955 task: type: STS - dataset: config: default name: MTEB Banking77Classification (default) revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 88.58116883116884 - type: f1 value: 88.54536316207125 - type: f1_weighted value: 88.54536316207125 - type: main_score value: 88.58116883116884 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P (default) revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: main_score value: 44.89554099528695 - type: v_measure value: 44.89554099528695 - type: v_measure_std value: 0.6101675839696261 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S (default) revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: main_score value: 37.89775676199564 - type: v_measure value: 37.89775676199564 - type: v_measure_std value: 0.6980439644171996 task: type: Clustering - dataset: config: default name: MTEB CQADupstackAndroidRetrieval (default) revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: mteb/cqadupstack-android metrics: - type: main_score value: 49.239 - type: map_at_1 value: 31.407 - type: map_at_10 value: 42.788 - type: map_at_100 value: 44.163999999999994 - type: map_at_1000 value: 44.285000000000004 - type: map_at_20 value: 43.531 - type: map_at_3 value: 39.381 - type: map_at_5 value: 41.296 - type: mrr_at_1 value: 38.91273247496424 - type: mrr_at_10 value: 48.82553307446011 - type: mrr_at_100 value: 49.5278584841276 - type: mrr_at_1000 value: 49.56897938168851 - type: mrr_at_20 value: 49.27034318525701 - type: mrr_at_3 value: 46.423462088698145 - type: mrr_at_5 value: 47.83261802575108 - type: nauc_map_at_1000_diff1 value: 51.50772644391144 - type: nauc_map_at_1000_max value: 39.57698592158747 - type: nauc_map_at_1000_std value: -5.092734127689174 - type: nauc_map_at_100_diff1 value: 51.51650908644926 - type: 
nauc_map_at_100_max value: 39.579607215550325 - type: nauc_map_at_100_std value: -5.112306014245407 - type: nauc_map_at_10_diff1 value: 51.80732269410239 - type: nauc_map_at_10_max value: 39.312012392020854 - type: nauc_map_at_10_std value: -5.844192947783184 - type: nauc_map_at_1_diff1 value: 58.51885994004338 - type: nauc_map_at_1_max value: 35.306905646597656 - type: nauc_map_at_1_std value: -6.4627870729629455 - type: nauc_map_at_20_diff1 value: 51.560698537725294 - type: nauc_map_at_20_max value: 39.40865218451427 - type: nauc_map_at_20_std value: -5.46140640509653 - type: nauc_map_at_3_diff1 value: 52.845784777873305 - type: nauc_map_at_3_max value: 38.55976877563459 - type: nauc_map_at_3_std value: -5.72430771104222 - type: nauc_map_at_5_diff1 value: 52.29343919325049 - type: nauc_map_at_5_max value: 38.98194700024613 - type: nauc_map_at_5_std value: -6.062278166282727 - type: nauc_mrr_at_1000_diff1 value: 48.824012243253904 - type: nauc_mrr_at_1000_max value: 40.36119735345816 - type: nauc_mrr_at_1000_std value: -4.371172318529068 - type: nauc_mrr_at_100_diff1 value: 48.80142209066577 - type: nauc_mrr_at_100_max value: 40.35371141231279 - type: nauc_mrr_at_100_std value: -4.382000140837231 - type: nauc_mrr_at_10_diff1 value: 48.89408963706152 - type: nauc_mrr_at_10_max value: 40.48043029859513 - type: nauc_mrr_at_10_std value: -4.5927306729163835 - type: nauc_mrr_at_1_diff1 value: 53.18491414251319 - type: nauc_mrr_at_1_max value: 38.43746618754316 - type: nauc_mrr_at_1_std value: -6.2489159406458965 - type: nauc_mrr_at_20_diff1 value: 48.763867640789634 - type: nauc_mrr_at_20_max value: 40.369114351255135 - type: nauc_mrr_at_20_std value: -4.400065130027329 - type: nauc_mrr_at_3_diff1 value: 48.87375252127912 - type: nauc_mrr_at_3_max value: 40.810763259212116 - type: nauc_mrr_at_3_std value: -3.4938483699692657 - type: nauc_mrr_at_5_diff1 value: 49.186967577714285 - type: nauc_mrr_at_5_max value: 40.48882253846611 - type: nauc_mrr_at_5_std value: -4.621076155915746 - type: nauc_ndcg_at_1000_diff1 value: 49.24642669558249 - type: nauc_ndcg_at_1000_max value: 41.00404222082434 - type: nauc_ndcg_at_1000_std value: -2.7356065308278392 - type: nauc_ndcg_at_100_diff1 value: 48.92939354546236 - type: nauc_ndcg_at_100_max value: 40.972699158281586 - type: nauc_ndcg_at_100_std value: -3.0561983632108776 - type: nauc_ndcg_at_10_diff1 value: 49.60179215238792 - type: nauc_ndcg_at_10_max value: 40.89678771623847 - type: nauc_ndcg_at_10_std value: -5.096633756025252 - type: nauc_ndcg_at_1_diff1 value: 53.18491414251319 - type: nauc_ndcg_at_1_max value: 38.43746618754316 - type: nauc_ndcg_at_1_std value: -6.2489159406458965 - type: nauc_ndcg_at_20_diff1 value: 48.826483305583984 - type: nauc_ndcg_at_20_max value: 40.592200374154466 - type: nauc_ndcg_at_20_std value: -4.185196398682058 - type: nauc_ndcg_at_3_diff1 value: 49.9798291819845 - type: nauc_ndcg_at_3_max value: 40.50211559049151 - type: nauc_ndcg_at_3_std value: -3.9606100546649 - type: nauc_ndcg_at_5_diff1 value: 50.222364976292454 - type: nauc_ndcg_at_5_max value: 40.477461845726694 - type: nauc_ndcg_at_5_std value: -5.025922873253527 - type: nauc_precision_at_1000_diff1 value: -24.208256297106363 - type: nauc_precision_at_1000_max value: -10.21103761078881 - type: nauc_precision_at_1000_std value: -0.06753142735419307 - type: nauc_precision_at_100_diff1 value: -15.392095697703853 - type: nauc_precision_at_100_max value: 3.3764259600400375 - type: nauc_precision_at_100_std value: 7.032273000803224 - type: nauc_precision_at_10_diff1 
value: 8.050911372676126 - type: nauc_precision_at_10_max value: 26.426542125643365 - type: nauc_precision_at_10_std value: 2.3142807003880423 - type: nauc_precision_at_1_diff1 value: 53.18491414251319 - type: nauc_precision_at_1_max value: 38.43746618754316 - type: nauc_precision_at_1_std value: -6.2489159406458965 - type: nauc_precision_at_20_diff1 value: -2.4038370945777605 - type: nauc_precision_at_20_max value: 18.29255413962441 - type: nauc_precision_at_20_std value: 6.963786700698579 - type: nauc_precision_at_3_diff1 value: 27.590923102137978 - type: nauc_precision_at_3_max value: 36.809716569640635 - type: nauc_precision_at_3_std value: -0.4588749991090731 - type: nauc_precision_at_5_diff1 value: 18.31451430104417 - type: nauc_precision_at_5_max value: 31.76792278657563 - type: nauc_precision_at_5_std value: -0.23205753470623663 - type: nauc_recall_at_1000_diff1 value: 38.6186488416617 - type: nauc_recall_at_1000_max value: 58.02448766170835 - type: nauc_recall_at_1000_std value: 43.005151313404625 - type: nauc_recall_at_100_diff1 value: 36.14901358957452 - type: nauc_recall_at_100_max value: 42.97412072448754 - type: nauc_recall_at_100_std value: 8.434723462734665 - type: nauc_recall_at_10_diff1 value: 42.953316965307245 - type: nauc_recall_at_10_max value: 40.54865147159118 - type: nauc_recall_at_10_std value: -4.9425741693714125 - type: nauc_recall_at_1_diff1 value: 58.51885994004338 - type: nauc_recall_at_1_max value: 35.306905646597656 - type: nauc_recall_at_1_std value: -6.4627870729629455 - type: nauc_recall_at_20_diff1 value: 38.27628659312007 - type: nauc_recall_at_20_max value: 39.50607176714142 - type: nauc_recall_at_20_std value: -1.002089290215587 - type: nauc_recall_at_3_diff1 value: 47.263415527062676 - type: nauc_recall_at_3_max value: 40.82836525135613 - type: nauc_recall_at_3_std value: -2.2314232915782504 - type: nauc_recall_at_5_diff1 value: 46.13867315478644 - type: nauc_recall_at_5_max value: 39.93028001594826 - type: nauc_recall_at_5_std value: -4.809283400175646 - type: ndcg_at_1 value: 38.913 - type: ndcg_at_10 value: 49.239 - type: ndcg_at_100 value: 54.325 - type: ndcg_at_1000 value: 56.226 - type: ndcg_at_20 value: 51.212999999999994 - type: ndcg_at_3 value: 44.559 - type: ndcg_at_5 value: 46.69 - type: precision_at_1 value: 38.913 - type: precision_at_10 value: 9.227 - type: precision_at_100 value: 1.4909999999999999 - type: precision_at_1000 value: 0.197 - type: precision_at_20 value: 5.494000000000001 - type: precision_at_3 value: 21.65 - type: precision_at_5 value: 15.336 - type: recall_at_1 value: 31.407 - type: recall_at_10 value: 61.961999999999996 - type: recall_at_100 value: 82.993 - type: recall_at_1000 value: 94.887 - type: recall_at_20 value: 68.771 - type: recall_at_3 value: 47.77 - type: recall_at_5 value: 53.895 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval (default) revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: mteb/cqadupstack-english metrics: - type: main_score value: 44.391000000000005 - type: map_at_1 value: 29.157 - type: map_at_10 value: 38.723 - type: map_at_100 value: 39.864 - type: map_at_1000 value: 39.995999999999995 - type: map_at_20 value: 39.287 - type: map_at_3 value: 35.751 - type: map_at_5 value: 37.373 - type: mrr_at_1 value: 36.81528662420382 - type: mrr_at_10 value: 44.82939035486806 - type: mrr_at_100 value: 45.437834419775484 - type: mrr_at_1000 value: 45.48695197590834 - type: mrr_at_20 value: 45.15519263295387 - type: mrr_at_3 value: 42.55838641188959 
- type: mrr_at_5 value: 43.87685774946922 - type: nauc_map_at_1000_diff1 value: 51.086880931657944 - type: nauc_map_at_1000_max value: 36.870501109568856 - type: nauc_map_at_1000_std value: -9.041748740450098 - type: nauc_map_at_100_diff1 value: 51.13349280885669 - type: nauc_map_at_100_max value: 36.81376788959824 - type: nauc_map_at_100_std value: -9.168817557968493 - type: nauc_map_at_10_diff1 value: 51.43767101896258 - type: nauc_map_at_10_max value: 36.13512723388837 - type: nauc_map_at_10_std value: -10.340353132146591 - type: nauc_map_at_1_diff1 value: 57.97216876426843 - type: nauc_map_at_1_max value: 32.093932122348804 - type: nauc_map_at_1_std value: -12.44326469749823 - type: nauc_map_at_20_diff1 value: 51.35742644989209 - type: nauc_map_at_20_max value: 36.362008583908754 - type: nauc_map_at_20_std value: -9.925604455959942 - type: nauc_map_at_3_diff1 value: 52.97191265890149 - type: nauc_map_at_3_max value: 35.216095114265 - type: nauc_map_at_3_std value: -11.505843284384989 - type: nauc_map_at_5_diff1 value: 52.13435748405322 - type: nauc_map_at_5_max value: 35.63014323147684 - type: nauc_map_at_5_std value: -11.15253714131609 - type: nauc_mrr_at_1000_diff1 value: 49.806361508243526 - type: nauc_mrr_at_1000_max value: 39.60825242174082 - type: nauc_mrr_at_1000_std value: -4.581320333963986 - type: nauc_mrr_at_100_diff1 value: 49.794023465886575 - type: nauc_mrr_at_100_max value: 39.606036503563935 - type: nauc_mrr_at_100_std value: -4.580524433129927 - type: nauc_mrr_at_10_diff1 value: 49.62511317783946 - type: nauc_mrr_at_10_max value: 39.524849843022054 - type: nauc_mrr_at_10_std value: -4.784364837521214 - type: nauc_mrr_at_1_diff1 value: 55.03485605539673 - type: nauc_mrr_at_1_max value: 38.26074360694823 - type: nauc_mrr_at_1_std value: -6.990940922024673 - type: nauc_mrr_at_20_diff1 value: 49.77823031843402 - type: nauc_mrr_at_20_max value: 39.62943812120721 - type: nauc_mrr_at_20_std value: -4.664971744136187 - type: nauc_mrr_at_3_diff1 value: 50.60933103133387 - type: nauc_mrr_at_3_max value: 39.920174010377444 - type: nauc_mrr_at_3_std value: -5.404917304425809 - type: nauc_mrr_at_5_diff1 value: 50.137405938227886 - type: nauc_mrr_at_5_max value: 39.7046033416223 - type: nauc_mrr_at_5_std value: -4.9683994219777965 - type: nauc_ndcg_at_1000_diff1 value: 48.26320826156127 - type: nauc_ndcg_at_1000_max value: 39.11158925773445 - type: nauc_ndcg_at_1000_std value: -3.958164717220878 - type: nauc_ndcg_at_100_diff1 value: 48.29325255469789 - type: nauc_ndcg_at_100_max value: 39.00224428862792 - type: nauc_ndcg_at_100_std value: -4.739309326434606 - type: nauc_ndcg_at_10_diff1 value: 48.62405764367444 - type: nauc_ndcg_at_10_max value: 38.04015783804633 - type: nauc_ndcg_at_10_std value: -7.379427256377835 - type: nauc_ndcg_at_1_diff1 value: 55.03485605539673 - type: nauc_ndcg_at_1_max value: 38.26074360694823 - type: nauc_ndcg_at_1_std value: -6.990940922024673 - type: nauc_ndcg_at_20_diff1 value: 48.793146636748155 - type: nauc_ndcg_at_20_max value: 38.188247609309734 - type: nauc_ndcg_at_20_std value: -6.893163590780488 - type: nauc_ndcg_at_3_diff1 value: 49.72527867128085 - type: nauc_ndcg_at_3_max value: 38.397771643337876 - type: nauc_ndcg_at_3_std value: -7.396734926261662 - type: nauc_ndcg_at_5_diff1 value: 49.45897046963514 - type: nauc_ndcg_at_5_max value: 38.00788817919171 - type: nauc_ndcg_at_5_std value: -7.98773024373368 - type: nauc_precision_at_1000_diff1 value: -15.203088093712378 - type: nauc_precision_at_1000_max value: 13.932931359528938 - type: 
nauc_precision_at_1000_std value: 28.443903216719125 - type: nauc_precision_at_100_diff1 value: -9.833515062825485 - type: nauc_precision_at_100_max value: 25.501133048619252 - type: nauc_precision_at_100_std value: 29.28522368814619 - type: nauc_precision_at_10_diff1 value: 11.048052024883837 - type: nauc_precision_at_10_max value: 35.12225756686281 - type: nauc_precision_at_10_std value: 13.549314875239492 - type: nauc_precision_at_1_diff1 value: 55.03485605539673 - type: nauc_precision_at_1_max value: 38.26074360694823 - type: nauc_precision_at_1_std value: -6.990940922024673 - type: nauc_precision_at_20_diff1 value: 3.6119660166254564 - type: nauc_precision_at_20_max value: 31.80991909502872 - type: nauc_precision_at_20_std value: 19.289172474937768 - type: nauc_precision_at_3_diff1 value: 30.93845075141858 - type: nauc_precision_at_3_max value: 41.2363485550859 - type: nauc_precision_at_3_std value: 3.304016059128308 - type: nauc_precision_at_5_diff1 value: 22.383511628600537 - type: nauc_precision_at_5_max value: 38.3094647733712 - type: nauc_precision_at_5_std value: 7.010497480008379 - type: nauc_recall_at_1000_diff1 value: 31.611750140993035 - type: nauc_recall_at_1000_max value: 42.982693130692894 - type: nauc_recall_at_1000_std value: 25.50352029753317 - type: nauc_recall_at_100_diff1 value: 36.466866132011525 - type: nauc_recall_at_100_max value: 39.8896195569174 - type: nauc_recall_at_100_std value: 8.056466272308052 - type: nauc_recall_at_10_diff1 value: 40.55869867748143 - type: nauc_recall_at_10_max value: 35.35219000254458 - type: nauc_recall_at_10_std value: -6.935500599977123 - type: nauc_recall_at_1_diff1 value: 57.97216876426843 - type: nauc_recall_at_1_max value: 32.093932122348804 - type: nauc_recall_at_1_std value: -12.44326469749823 - type: nauc_recall_at_20_diff1 value: 40.699604166249046 - type: nauc_recall_at_20_max value: 36.441366652406835 - type: nauc_recall_at_20_std value: -4.519436682877613 - type: nauc_recall_at_3_diff1 value: 47.15019730046201 - type: nauc_recall_at_3_max value: 35.1649979105234 - type: nauc_recall_at_3_std value: -10.908395079450377 - type: nauc_recall_at_5_diff1 value: 44.535088248003156 - type: nauc_recall_at_5_max value: 34.89949777715303 - type: nauc_recall_at_5_std value: -10.361237744830412 - type: ndcg_at_1 value: 36.815 - type: ndcg_at_10 value: 44.391000000000005 - type: ndcg_at_100 value: 48.515 - type: ndcg_at_1000 value: 50.76199999999999 - type: ndcg_at_20 value: 45.788000000000004 - type: ndcg_at_3 value: 40.178000000000004 - type: ndcg_at_5 value: 42.045 - type: precision_at_1 value: 36.815 - type: precision_at_10 value: 8.408 - type: precision_at_100 value: 1.343 - type: precision_at_1000 value: 0.182 - type: precision_at_20 value: 4.873 - type: precision_at_3 value: 19.299 - type: precision_at_5 value: 13.758000000000001 - type: recall_at_1 value: 29.157 - type: recall_at_10 value: 54.214 - type: recall_at_100 value: 71.929 - type: recall_at_1000 value: 86.533 - type: recall_at_20 value: 59.421 - type: recall_at_3 value: 41.569 - type: recall_at_5 value: 46.791 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval (default) revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: mteb/cqadupstack-gaming metrics: - type: main_score value: 59.03699999999999 - type: map_at_1 value: 41.476 - type: map_at_10 value: 53.400000000000006 - type: map_at_100 value: 54.452999999999996 - type: map_at_1000 value: 54.504 - type: map_at_20 value: 54.045 - type: map_at_3 value: 
50.153999999999996 - type: map_at_5 value: 52.079 - type: mrr_at_1 value: 46.95924764890282 - type: mrr_at_10 value: 56.68495297805642 - type: mrr_at_100 value: 57.34582096937295 - type: mrr_at_1000 value: 57.37100347158495 - type: mrr_at_20 value: 57.10508892444508 - type: mrr_at_3 value: 54.242424242424235 - type: mrr_at_5 value: 55.76593521421108 - type: nauc_map_at_1000_diff1 value: 53.36527106664 - type: nauc_map_at_1000_max value: 43.486776333687835 - type: nauc_map_at_1000_std value: -5.509558143849234 - type: nauc_map_at_100_diff1 value: 53.34097797467696 - type: nauc_map_at_100_max value: 43.476003610937234 - type: nauc_map_at_100_std value: -5.520166623777559 - type: nauc_map_at_10_diff1 value: 53.432351035276746 - type: nauc_map_at_10_max value: 42.75788423195968 - type: nauc_map_at_10_std value: -6.504192409274652 - type: nauc_map_at_1_diff1 value: 57.34963186677463 - type: nauc_map_at_1_max value: 36.95146202384373 - type: nauc_map_at_1_std value: -9.460645936916988 - type: nauc_map_at_20_diff1 value: 53.29779847033195 - type: nauc_map_at_20_max value: 43.22342023309121 - type: nauc_map_at_20_std value: -5.953002390034157 - type: nauc_map_at_3_diff1 value: 54.09550124289603 - type: nauc_map_at_3_max value: 41.09664412682725 - type: nauc_map_at_3_std value: -8.797917588156473 - type: nauc_map_at_5_diff1 value: 53.47735307728038 - type: nauc_map_at_5_max value: 42.1420557369995 - type: nauc_map_at_5_std value: -6.982023249979087 - type: nauc_mrr_at_1000_diff1 value: 53.84548396450655 - type: nauc_mrr_at_1000_max value: 45.70711475929243 - type: nauc_mrr_at_1000_std value: -3.572519075485509 - type: nauc_mrr_at_100_diff1 value: 53.831585937143345 - type: nauc_mrr_at_100_max value: 45.71866605712688 - type: nauc_mrr_at_100_std value: -3.5531077992494087 - type: nauc_mrr_at_10_diff1 value: 53.77550386915942 - type: nauc_mrr_at_10_max value: 45.61906078824265 - type: nauc_mrr_at_10_std value: -3.7647971491069567 - type: nauc_mrr_at_1_diff1 value: 57.59578262230993 - type: nauc_mrr_at_1_max value: 43.132298775083996 - type: nauc_mrr_at_1_std value: -6.820570895500843 - type: nauc_mrr_at_20_diff1 value: 53.757844034161984 - type: nauc_mrr_at_20_max value: 45.67787807420582 - type: nauc_mrr_at_20_std value: -3.6741549159529816 - type: nauc_mrr_at_3_diff1 value: 54.41366916196891 - type: nauc_mrr_at_3_max value: 45.48753195460355 - type: nauc_mrr_at_3_std value: -4.536347261239106 - type: nauc_mrr_at_5_diff1 value: 53.81844478829885 - type: nauc_mrr_at_5_max value: 45.77186226917752 - type: nauc_mrr_at_5_std value: -3.560088004877736 - type: nauc_ndcg_at_1000_diff1 value: 52.474274223239945 - type: nauc_ndcg_at_1000_max value: 45.88297620389939 - type: nauc_ndcg_at_1000_std value: -2.236689460240769 - type: nauc_ndcg_at_100_diff1 value: 51.99537297728399 - type: nauc_ndcg_at_100_max value: 46.162105938598245 - type: nauc_ndcg_at_100_std value: -1.636252027390496 - type: nauc_ndcg_at_10_diff1 value: 51.981635840094334 - type: nauc_ndcg_at_10_max value: 44.72098290105285 - type: nauc_ndcg_at_10_std value: -4.26133599970984 - type: nauc_ndcg_at_1_diff1 value: 57.43124530432752 - type: nauc_ndcg_at_1_max value: 42.987773648572045 - type: nauc_ndcg_at_1_std value: -6.975930064288375 - type: nauc_ndcg_at_20_diff1 value: 51.709989593496665 - type: nauc_ndcg_at_20_max value: 45.35511346806507 - type: nauc_ndcg_at_20_std value: -3.441945043133369 - type: nauc_ndcg_at_3_diff1 value: 52.83956836083957 - type: nauc_ndcg_at_3_max value: 43.14243257908553 - type: nauc_ndcg_at_3_std value: 
-6.906786756066083 - type: nauc_ndcg_at_5_diff1 value: 51.92395247597085 - type: nauc_ndcg_at_5_max value: 44.28584104560978 - type: nauc_ndcg_at_5_std value: -4.432556679370336 - type: nauc_precision_at_1000_diff1 value: -10.137271271355312 - type: nauc_precision_at_1000_max value: 21.053415390964915 - type: nauc_precision_at_1000_std value: 31.437645188936003 - type: nauc_precision_at_100_diff1 value: -5.869005161223761 - type: nauc_precision_at_100_max value: 28.74652505762229 - type: nauc_precision_at_100_std value: 33.42249624017563 - type: nauc_precision_at_10_diff1 value: 14.075300860742587 - type: nauc_precision_at_10_max value: 36.90717719533496 - type: nauc_precision_at_10_std value: 15.27522825163519 - type: nauc_precision_at_1_diff1 value: 57.43124530432752 - type: nauc_precision_at_1_max value: 42.987773648572045 - type: nauc_precision_at_1_std value: -6.975930064288375 - type: nauc_precision_at_20_diff1 value: 4.831146517476065 - type: nauc_precision_at_20_max value: 34.600390709037775 - type: nauc_precision_at_20_std value: 21.879191470976977 - type: nauc_precision_at_3_diff1 value: 33.75586535854295 - type: nauc_precision_at_3_max value: 41.8963728460937 - type: nauc_precision_at_3_std value: 0.30853391781218725 - type: nauc_precision_at_5_diff1 value: 23.619374234162443 - type: nauc_precision_at_5_max value: 40.26315749312306 - type: nauc_precision_at_5_std value: 9.496779653807806 - type: nauc_recall_at_1000_diff1 value: 39.650899433995065 - type: nauc_recall_at_1000_max value: 65.95997046182639 - type: nauc_recall_at_1000_std value: 41.52010213404674 - type: nauc_recall_at_100_diff1 value: 37.021652104886904 - type: nauc_recall_at_100_max value: 57.901229136609636 - type: nauc_recall_at_100_std value: 27.173492395498428 - type: nauc_recall_at_10_diff1 value: 44.29968361744853 - type: nauc_recall_at_10_max value: 44.18295286662639 - type: nauc_recall_at_10_std value: -1.5721790203147754 - type: nauc_recall_at_1_diff1 value: 57.34963186677463 - type: nauc_recall_at_1_max value: 36.95146202384373 - type: nauc_recall_at_1_std value: -9.460645936916988 - type: nauc_recall_at_20_diff1 value: 41.603580598985126 - type: nauc_recall_at_20_max value: 47.702934198286876 - type: nauc_recall_at_20_std value: 3.019298754051616 - type: nauc_recall_at_3_diff1 value: 49.02194332102533 - type: nauc_recall_at_3_max value: 41.38275177493884 - type: nauc_recall_at_3_std value: -8.055685087264179 - type: nauc_recall_at_5_diff1 value: 45.213060998923496 - type: nauc_recall_at_5_max value: 43.53976038303946 - type: nauc_recall_at_5_std value: -1.7312187150046634 - type: ndcg_at_1 value: 47.022000000000006 - type: ndcg_at_10 value: 59.03699999999999 - type: ndcg_at_100 value: 63.077000000000005 - type: ndcg_at_1000 value: 64.098 - type: ndcg_at_20 value: 60.84 - type: ndcg_at_3 value: 53.657999999999994 - type: ndcg_at_5 value: 56.501000000000005 - type: precision_at_1 value: 47.022000000000006 - type: precision_at_10 value: 9.342 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.136 - type: precision_at_20 value: 5.232 - type: precision_at_3 value: 23.552999999999997 - type: precision_at_5 value: 16.250999999999998 - type: recall_at_1 value: 41.476 - type: recall_at_10 value: 72.283 - type: recall_at_100 value: 89.545 - type: recall_at_1000 value: 96.798 - type: recall_at_20 value: 78.84100000000001 - type: recall_at_3 value: 58.114 - type: recall_at_5 value: 65.007 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval (default) 
revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: mteb/cqadupstack-gis metrics: - type: main_score value: 37.673 - type: map_at_1 value: 25.324 - type: map_at_10 value: 33.17 - type: map_at_100 value: 34.095 - type: map_at_1000 value: 34.182 - type: map_at_20 value: 33.654 - type: map_at_3 value: 30.879 - type: map_at_5 value: 32.26 - type: mrr_at_1 value: 27.34463276836158 - type: mrr_at_10 value: 35.2258541834813 - type: mrr_at_100 value: 36.00404498547979 - type: mrr_at_1000 value: 36.07566444493976 - type: mrr_at_20 value: 35.63110644891617 - type: mrr_at_3 value: 32.95668549905838 - type: mrr_at_5 value: 34.25612052730697 - type: nauc_map_at_1000_diff1 value: 46.058990680271485 - type: nauc_map_at_1000_max value: 28.600543996662374 - type: nauc_map_at_1000_std value: -3.8218348925653505 - type: nauc_map_at_100_diff1 value: 46.04742556273763 - type: nauc_map_at_100_max value: 28.58845010683153 - type: nauc_map_at_100_std value: -3.8241454424665746 - type: nauc_map_at_10_diff1 value: 46.318380971509015 - type: nauc_map_at_10_max value: 28.445154969629815 - type: nauc_map_at_10_std value: -4.668418336182435 - type: nauc_map_at_1_diff1 value: 50.84712517695217 - type: nauc_map_at_1_max value: 24.956820608742856 - type: nauc_map_at_1_std value: -7.408652214171463 - type: nauc_map_at_20_diff1 value: 46.02082882551024 - type: nauc_map_at_20_max value: 28.71729950175136 - type: nauc_map_at_20_std value: -3.8899400482521864 - type: nauc_map_at_3_diff1 value: 47.017578094263065 - type: nauc_map_at_3_max value: 27.57393258045568 - type: nauc_map_at_3_std value: -5.578535499711579 - type: nauc_map_at_5_diff1 value: 46.64174901816308 - type: nauc_map_at_5_max value: 28.12934751037357 - type: nauc_map_at_5_std value: -4.623605944585039 - type: nauc_mrr_at_1000_diff1 value: 44.80745580850706 - type: nauc_mrr_at_1000_max value: 30.08660965092525 - type: nauc_mrr_at_1000_std value: -1.8483739575689273 - type: nauc_mrr_at_100_diff1 value: 44.79929065561873 - type: nauc_mrr_at_100_max value: 30.068319004487208 - type: nauc_mrr_at_100_std value: -1.8439865469408845 - type: nauc_mrr_at_10_diff1 value: 45.04202172389592 - type: nauc_mrr_at_10_max value: 30.006082516512294 - type: nauc_mrr_at_10_std value: -2.4476357227718673 - type: nauc_mrr_at_1_diff1 value: 49.710330210449705 - type: nauc_mrr_at_1_max value: 27.652926800227444 - type: nauc_mrr_at_1_std value: -4.963221847243473 - type: nauc_mrr_at_20_diff1 value: 44.74348822631581 - type: nauc_mrr_at_20_max value: 30.232310892837866 - type: nauc_mrr_at_20_std value: -1.8627482467585263 - type: nauc_mrr_at_3_diff1 value: 45.63996732955718 - type: nauc_mrr_at_3_max value: 29.71071543929027 - type: nauc_mrr_at_3_std value: -2.9488868732728264 - type: nauc_mrr_at_5_diff1 value: 45.31282418942023 - type: nauc_mrr_at_5_max value: 29.59225270015164 - type: nauc_mrr_at_5_std value: -2.571596169990907 - type: nauc_ndcg_at_1000_diff1 value: 43.44153526801899 - type: nauc_ndcg_at_1000_max value: 30.264809827186745 - type: nauc_ndcg_at_1000_std value: -0.3673459026557417 - type: nauc_ndcg_at_100_diff1 value: 42.9260780049435 - type: nauc_ndcg_at_100_max value: 29.971290021267254 - type: nauc_ndcg_at_100_std value: 0.07223943237736839 - type: nauc_ndcg_at_10_diff1 value: 43.89936991271991 - type: nauc_ndcg_at_10_max value: 29.883246789724915 - type: nauc_ndcg_at_10_std value: -2.842441401911265 - type: nauc_ndcg_at_1_diff1 value: 50.14865712693543 - type: nauc_ndcg_at_1_max value: 27.111609058341863 - type: nauc_ndcg_at_1_std value: 
-5.5675174385570925 - type: nauc_ndcg_at_20_diff1 value: 42.84709307426253 - type: nauc_ndcg_at_20_max value: 30.76378099168594 - type: nauc_ndcg_at_20_std value: -0.42561135386508475 - type: nauc_ndcg_at_3_diff1 value: 45.4326566931524 - type: nauc_ndcg_at_3_max value: 28.61889737624481 - type: nauc_ndcg_at_3_std value: -4.348200281698876 - type: nauc_ndcg_at_5_diff1 value: 44.630092727271034 - type: nauc_ndcg_at_5_max value: 29.04891878562973 - type: nauc_ndcg_at_5_std value: -2.8900608482934165 - type: nauc_precision_at_1000_diff1 value: 1.563823692486198 - type: nauc_precision_at_1000_max value: 18.07524759715147 - type: nauc_precision_at_1000_std value: 10.75651488435518 - type: nauc_precision_at_100_diff1 value: 15.84032553897459 - type: nauc_precision_at_100_max value: 26.9982332859951 - type: nauc_precision_at_100_std value: 13.809307316031362 - type: nauc_precision_at_10_diff1 value: 33.44005568824001 - type: nauc_precision_at_10_max value: 35.31365313654245 - type: nauc_precision_at_10_std value: 2.1516208493844817 - type: nauc_precision_at_1_diff1 value: 50.14865712693543 - type: nauc_precision_at_1_max value: 27.111609058341863 - type: nauc_precision_at_1_std value: -5.5675174385570925 - type: nauc_precision_at_20_diff1 value: 26.453560867406594 - type: nauc_precision_at_20_max value: 36.754320258234735 - type: nauc_precision_at_20_std value: 10.960004664156314 - type: nauc_precision_at_3_diff1 value: 39.5339842087826 - type: nauc_precision_at_3_max value: 32.43079763654043 - type: nauc_precision_at_3_std value: -1.1149107052174205 - type: nauc_precision_at_5_diff1 value: 36.75997042257077 - type: nauc_precision_at_5_max value: 32.936394052992256 - type: nauc_precision_at_5_std value: 2.253739058194602 - type: nauc_recall_at_1000_diff1 value: 26.620883791876672 - type: nauc_recall_at_1000_max value: 40.036249354126255 - type: nauc_recall_at_1000_std value: 24.67019914079094 - type: nauc_recall_at_100_diff1 value: 29.06050311303032 - type: nauc_recall_at_100_max value: 31.719103788027674 - type: nauc_recall_at_100_std value: 16.517714390661105 - type: nauc_recall_at_10_diff1 value: 36.292924258716106 - type: nauc_recall_at_10_max value: 32.02173242085442 - type: nauc_recall_at_10_std value: 1.016713326361783 - type: nauc_recall_at_1_diff1 value: 50.84712517695217 - type: nauc_recall_at_1_max value: 24.956820608742856 - type: nauc_recall_at_1_std value: -7.408652214171463 - type: nauc_recall_at_20_diff1 value: 31.875810510992398 - type: nauc_recall_at_20_max value: 35.1225435012755 - type: nauc_recall_at_20_std value: 10.08081240374867 - type: nauc_recall_at_3_diff1 value: 41.31843254728666 - type: nauc_recall_at_3_max value: 29.083015930837323 - type: nauc_recall_at_3_std value: -2.6812306676938906 - type: nauc_recall_at_5_diff1 value: 38.74912094651174 - type: nauc_recall_at_5_max value: 29.713413529317663 - type: nauc_recall_at_5_std value: 0.6429485746621083 - type: ndcg_at_1 value: 27.232 - type: ndcg_at_10 value: 37.673 - type: ndcg_at_100 value: 42.379 - type: ndcg_at_1000 value: 44.664 - type: ndcg_at_20 value: 39.282000000000004 - type: ndcg_at_3 value: 33.178999999999995 - type: ndcg_at_5 value: 35.481 - type: precision_at_1 value: 27.232 - type: precision_at_10 value: 5.593 - type: precision_at_100 value: 0.845 - type: precision_at_1000 value: 0.108 - type: precision_at_20 value: 3.1809999999999996 - type: precision_at_3 value: 13.898 - type: precision_at_5 value: 9.605 - type: recall_at_1 value: 25.324 - type: recall_at_10 value: 49.66 - type: recall_at_100 value: 
71.702 - type: recall_at_1000 value: 88.884 - type: recall_at_20 value: 55.63399999999999 - type: recall_at_3 value: 37.557 - type: recall_at_5 value: 43.086 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval (default) revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: mteb/cqadupstack-mathematica metrics: - type: main_score value: 27.683000000000003 - type: map_at_1 value: 15.440000000000001 - type: map_at_10 value: 22.708000000000002 - type: map_at_100 value: 23.891000000000002 - type: map_at_1000 value: 24.009 - type: map_at_20 value: 23.362 - type: map_at_3 value: 20.173 - type: map_at_5 value: 21.512999999999998 - type: mrr_at_1 value: 19.154228855721392 - type: mrr_at_10 value: 27.14907604832978 - type: mrr_at_100 value: 28.134401799106946 - type: mrr_at_1000 value: 28.210652971960727 - type: mrr_at_20 value: 27.743116715423334 - type: mrr_at_3 value: 24.64759535655058 - type: mrr_at_5 value: 26.0530679933665 - type: nauc_map_at_1000_diff1 value: 26.45225395954919 - type: nauc_map_at_1000_max value: 18.88821201176001 - type: nauc_map_at_1000_std value: -6.743073428818526 - type: nauc_map_at_100_diff1 value: 26.46163797092885 - type: nauc_map_at_100_max value: 18.91020517272631 - type: nauc_map_at_100_std value: -6.715512753190824 - type: nauc_map_at_10_diff1 value: 25.93830061738008 - type: nauc_map_at_10_max value: 18.230821464212788 - type: nauc_map_at_10_std value: -7.723714557953293 - type: nauc_map_at_1_diff1 value: 32.6143819833978 - type: nauc_map_at_1_max value: 18.229434406703447 - type: nauc_map_at_1_std value: -8.826503266807608 - type: nauc_map_at_20_diff1 value: 26.267375356189532 - type: nauc_map_at_20_max value: 18.74372577827996 - type: nauc_map_at_20_std value: -7.1213741256387495 - type: nauc_map_at_3_diff1 value: 26.502658255222222 - type: nauc_map_at_3_max value: 17.34676548965769 - type: nauc_map_at_3_std value: -8.661705532483479 - type: nauc_map_at_5_diff1 value: 25.947975266973 - type: nauc_map_at_5_max value: 18.26579025252041 - type: nauc_map_at_5_std value: -7.988152286698193 - type: nauc_mrr_at_1000_diff1 value: 27.43240261182634 - type: nauc_mrr_at_1000_max value: 19.59851548113691 - type: nauc_mrr_at_1000_std value: -5.8659045748819505 - type: nauc_mrr_at_100_diff1 value: 27.42860371902458 - type: nauc_mrr_at_100_max value: 19.61291439961396 - type: nauc_mrr_at_100_std value: -5.840170365425997 - type: nauc_mrr_at_10_diff1 value: 26.996629286135576 - type: nauc_mrr_at_10_max value: 19.09125992187832 - type: nauc_mrr_at_10_std value: -6.401949732007706 - type: nauc_mrr_at_1_diff1 value: 33.20355103883785 - type: nauc_mrr_at_1_max value: 18.84271700427976 - type: nauc_mrr_at_1_std value: -6.846362536084065 - type: nauc_mrr_at_20_diff1 value: 27.342295700872445 - type: nauc_mrr_at_20_max value: 19.59730195635629 - type: nauc_mrr_at_20_std value: -6.045183866074472 - type: nauc_mrr_at_3_diff1 value: 27.921898978571868 - type: nauc_mrr_at_3_max value: 19.028747822887816 - type: nauc_mrr_at_3_std value: -6.651966049443023 - type: nauc_mrr_at_5_diff1 value: 27.280695824148392 - type: nauc_mrr_at_5_max value: 19.430798343725524 - type: nauc_mrr_at_5_std value: -6.747383339145715 - type: nauc_ndcg_at_1000_diff1 value: 25.38902736172073 - type: nauc_ndcg_at_1000_max value: 20.45917423943934 - type: nauc_ndcg_at_1000_std value: -3.2757947022252076 - type: nauc_ndcg_at_100_diff1 value: 25.732803165259238 - type: nauc_ndcg_at_100_max value: 20.836040539884642 - type: nauc_ndcg_at_100_std value: 
-2.9535785746014396 - type: nauc_ndcg_at_10_diff1 value: 23.946041122415746 - type: nauc_ndcg_at_10_max value: 18.62752297015455 - type: nauc_ndcg_at_10_std value: -6.405272980276195 - type: nauc_ndcg_at_1_diff1 value: 33.20355103883785 - type: nauc_ndcg_at_1_max value: 18.84271700427976 - type: nauc_ndcg_at_1_std value: -6.846362536084065 - type: nauc_ndcg_at_20_diff1 value: 24.77178243398418 - type: nauc_ndcg_at_20_max value: 20.27057276120682 - type: nauc_ndcg_at_20_std value: -4.789054638686646 - type: nauc_ndcg_at_3_diff1 value: 25.93797698971861 - type: nauc_ndcg_at_3_max value: 17.7626073837572 - type: nauc_ndcg_at_3_std value: -8.049324539903097 - type: nauc_ndcg_at_5_diff1 value: 24.628424554881647 - type: nauc_ndcg_at_5_max value: 18.989213649165613 - type: nauc_ndcg_at_5_std value: -7.173452770970873 - type: nauc_precision_at_1000_diff1 value: 5.456508320365408 - type: nauc_precision_at_1000_max value: 4.8136815217087205 - type: nauc_precision_at_1000_std value: 4.947456448109757 - type: nauc_precision_at_100_diff1 value: 16.260577000896543 - type: nauc_precision_at_100_max value: 16.7039900850556 - type: nauc_precision_at_100_std value: 9.11227641718042 - type: nauc_precision_at_10_diff1 value: 16.365122567702535 - type: nauc_precision_at_10_max value: 17.065003280187348 - type: nauc_precision_at_10_std value: -2.229290931287804 - type: nauc_precision_at_1_diff1 value: 33.20355103883785 - type: nauc_precision_at_1_max value: 18.84271700427976 - type: nauc_precision_at_1_std value: -6.846362536084065 - type: nauc_precision_at_20_diff1 value: 16.91214381595962 - type: nauc_precision_at_20_max value: 19.58308083494222 - type: nauc_precision_at_20_std value: 2.253335365165219 - type: nauc_precision_at_3_diff1 value: 19.85085379824151 - type: nauc_precision_at_3_max value: 16.27352732420782 - type: nauc_precision_at_3_std value: -7.201882607059234 - type: nauc_precision_at_5_diff1 value: 17.966240404329092 - type: nauc_precision_at_5_max value: 18.231425958226044 - type: nauc_precision_at_5_std value: -4.043751510938105 - type: nauc_recall_at_1000_diff1 value: 13.957143176090353 - type: nauc_recall_at_1000_max value: 25.052247631159652 - type: nauc_recall_at_1000_std value: 17.326355613640054 - type: nauc_recall_at_100_diff1 value: 21.440869340994407 - type: nauc_recall_at_100_max value: 24.311867728047343 - type: nauc_recall_at_100_std value: 9.336321796584325 - type: nauc_recall_at_10_diff1 value: 16.696814266222432 - type: nauc_recall_at_10_max value: 17.145710052014486 - type: nauc_recall_at_10_std value: -4.135339167818864 - type: nauc_recall_at_1_diff1 value: 32.6143819833978 - type: nauc_recall_at_1_max value: 18.229434406703447 - type: nauc_recall_at_1_std value: -8.826503266807608 - type: nauc_recall_at_20_diff1 value: 18.34311797149379 - type: nauc_recall_at_20_max value: 21.832943514273143 - type: nauc_recall_at_20_std value: 0.8894706565637946 - type: nauc_recall_at_3_diff1 value: 20.992985988081557 - type: nauc_recall_at_3_max value: 16.255791972442506 - type: nauc_recall_at_3_std value: -7.097037821828232 - type: nauc_recall_at_5_diff1 value: 18.60326978035633 - type: nauc_recall_at_5_max value: 18.615371576760275 - type: nauc_recall_at_5_std value: -6.049891295196573 - type: ndcg_at_1 value: 19.154 - type: ndcg_at_10 value: 27.683000000000003 - type: ndcg_at_100 value: 33.213 - type: ndcg_at_1000 value: 36.141 - type: ndcg_at_20 value: 29.854999999999997 - type: ndcg_at_3 value: 22.987 - type: ndcg_at_5 value: 25.106 - type: precision_at_1 value: 19.154 - type: 
precision_at_10 value: 5.224 - type: precision_at_100 value: 0.919 - type: precision_at_1000 value: 0.13 - type: precision_at_20 value: 3.215 - type: precision_at_3 value: 11.318 - type: precision_at_5 value: 8.383000000000001 - type: recall_at_1 value: 15.440000000000001 - type: recall_at_10 value: 38.734 - type: recall_at_100 value: 62.576 - type: recall_at_1000 value: 83.541 - type: recall_at_20 value: 46.45 - type: recall_at_3 value: 25.438 - type: recall_at_5 value: 30.891000000000002 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval (default) revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: mteb/cqadupstack-physics metrics: - type: main_score value: 45.196999999999996 - type: map_at_1 value: 29.438 - type: map_at_10 value: 39.497 - type: map_at_100 value: 40.757 - type: map_at_1000 value: 40.865 - type: map_at_20 value: 40.21 - type: map_at_3 value: 36.649 - type: map_at_5 value: 38.278 - type: mrr_at_1 value: 35.514918190567855 - type: mrr_at_10 value: 44.939158531555066 - type: mrr_at_100 value: 45.71399223764184 - type: mrr_at_1000 value: 45.767047236444185 - type: mrr_at_20 value: 45.40064162616659 - type: mrr_at_3 value: 42.49278152069297 - type: mrr_at_5 value: 43.999037536092395 - type: nauc_map_at_1000_diff1 value: 48.2911083967695 - type: nauc_map_at_1000_max value: 33.0567223033294 - type: nauc_map_at_1000_std value: -7.5831018828087435 - type: nauc_map_at_100_diff1 value: 48.266195527072156 - type: nauc_map_at_100_max value: 33.03915960499412 - type: nauc_map_at_100_std value: -7.606925986310037 - type: nauc_map_at_10_diff1 value: 48.328320797346294 - type: nauc_map_at_10_max value: 32.7070148720631 - type: nauc_map_at_10_std value: -8.512811841258646 - type: nauc_map_at_1_diff1 value: 52.88608162356222 - type: nauc_map_at_1_max value: 31.24794941358492 - type: nauc_map_at_1_std value: -11.706848009285954 - type: nauc_map_at_20_diff1 value: 48.2969260156472 - type: nauc_map_at_20_max value: 32.86081996380274 - type: nauc_map_at_20_std value: -8.020958942798524 - type: nauc_map_at_3_diff1 value: 48.743817641945114 - type: nauc_map_at_3_max value: 32.605458230621856 - type: nauc_map_at_3_std value: -8.638274842287737 - type: nauc_map_at_5_diff1 value: 48.78806923732555 - type: nauc_map_at_5_max value: 32.61566250570677 - type: nauc_map_at_5_std value: -8.780064299161241 - type: nauc_mrr_at_1000_diff1 value: 48.402407250061934 - type: nauc_mrr_at_1000_max value: 32.73963018253408 - type: nauc_mrr_at_1000_std value: -7.600714897746363 - type: nauc_mrr_at_100_diff1 value: 48.38722402499983 - type: nauc_mrr_at_100_max value: 32.74291939054888 - type: nauc_mrr_at_100_std value: -7.584196436282831 - type: nauc_mrr_at_10_diff1 value: 48.324992370558576 - type: nauc_mrr_at_10_max value: 32.65326566012142 - type: nauc_mrr_at_10_std value: -7.960957871756174 - type: nauc_mrr_at_1_diff1 value: 52.51790849738347 - type: nauc_mrr_at_1_max value: 31.979743734335504 - type: nauc_mrr_at_1_std value: -11.101383949942232 - type: nauc_mrr_at_20_diff1 value: 48.375346158446725 - type: nauc_mrr_at_20_max value: 32.73895555822591 - type: nauc_mrr_at_20_std value: -7.642914670396977 - type: nauc_mrr_at_3_diff1 value: 48.83160990949774 - type: nauc_mrr_at_3_max value: 32.80880922901924 - type: nauc_mrr_at_3_std value: -7.760362168094019 - type: nauc_mrr_at_5_diff1 value: 48.60255139323125 - type: nauc_mrr_at_5_max value: 32.72728351371156 - type: nauc_mrr_at_5_std value: -8.038189749481258 - type: nauc_ndcg_at_1000_diff1 value: 
46.67101320125475 - type: nauc_ndcg_at_1000_max value: 34.0504701772667 - type: nauc_ndcg_at_1000_std value: -4.032878112637376 - type: nauc_ndcg_at_100_diff1 value: 46.248748827447265 - type: nauc_ndcg_at_100_max value: 33.74751928599088 - type: nauc_ndcg_at_100_std value: -3.991862266355337 - type: nauc_ndcg_at_10_diff1 value: 46.46100196084458 - type: nauc_ndcg_at_10_max value: 32.807685888284794 - type: nauc_ndcg_at_10_std value: -7.457478747984192 - type: nauc_ndcg_at_1_diff1 value: 52.51790849738347 - type: nauc_ndcg_at_1_max value: 31.979743734335504 - type: nauc_ndcg_at_1_std value: -11.101383949942232 - type: nauc_ndcg_at_20_diff1 value: 46.410656199509944 - type: nauc_ndcg_at_20_max value: 33.1581309808876 - type: nauc_ndcg_at_20_std value: -5.99183846380811 - type: nauc_ndcg_at_3_diff1 value: 47.26764972559635 - type: nauc_ndcg_at_3_max value: 33.08614197399897 - type: nauc_ndcg_at_3_std value: -7.0742507391341345 - type: nauc_ndcg_at_5_diff1 value: 47.35898227835041 - type: nauc_ndcg_at_5_max value: 32.84468179240444 - type: nauc_ndcg_at_5_std value: -7.714927192881523 - type: nauc_precision_at_1000_diff1 value: -9.52692395683019 - type: nauc_precision_at_1000_max value: 7.374303479576268 - type: nauc_precision_at_1000_std value: 20.79761650113592 - type: nauc_precision_at_100_diff1 value: -0.5511806256392863 - type: nauc_precision_at_100_max value: 14.260122126630634 - type: nauc_precision_at_100_std value: 20.84530821188996 - type: nauc_precision_at_10_diff1 value: 19.572115874106533 - type: nauc_precision_at_10_max value: 24.556082924046027 - type: nauc_precision_at_10_std value: 5.323857400679805 - type: nauc_precision_at_1_diff1 value: 52.51790849738347 - type: nauc_precision_at_1_max value: 31.979743734335504 - type: nauc_precision_at_1_std value: -11.101383949942232 - type: nauc_precision_at_20_diff1 value: 12.356576945971826 - type: nauc_precision_at_20_max value: 21.121689225096056 - type: nauc_precision_at_20_std value: 12.177075559439556 - type: nauc_precision_at_3_diff1 value: 33.671667659871865 - type: nauc_precision_at_3_max value: 30.98143183174062 - type: nauc_precision_at_3_std value: 0.520604608152502 - type: nauc_precision_at_5_diff1 value: 30.06980809430162 - type: nauc_precision_at_5_max value: 28.454115294663602 - type: nauc_precision_at_5_std value: 0.8596400708828538 - type: nauc_recall_at_1000_diff1 value: 24.965587031650884 - type: nauc_recall_at_1000_max value: 40.72840120992986 - type: nauc_recall_at_1000_std value: 38.76857796467627 - type: nauc_recall_at_100_diff1 value: 32.790892696170374 - type: nauc_recall_at_100_max value: 32.970070123139564 - type: nauc_recall_at_100_std value: 14.657654854897062 - type: nauc_recall_at_10_diff1 value: 38.309181873423476 - type: nauc_recall_at_10_max value: 30.28707855794435 - type: nauc_recall_at_10_std value: -5.568997608502203 - type: nauc_recall_at_1_diff1 value: 52.88608162356222 - type: nauc_recall_at_1_max value: 31.24794941358492 - type: nauc_recall_at_1_std value: -11.706848009285954 - type: nauc_recall_at_20_diff1 value: 37.44816940285688 - type: nauc_recall_at_20_max value: 31.24736990052554 - type: nauc_recall_at_20_std value: -0.17027260910961897 - type: nauc_recall_at_3_diff1 value: 42.921582034772726 - type: nauc_recall_at_3_max value: 31.861184780950513 - type: nauc_recall_at_3_std value: -6.209754089638474 - type: nauc_recall_at_5_diff1 value: 41.74803396821156 - type: nauc_recall_at_5_max value: 31.13023590637421 - type: nauc_recall_at_5_std value: -6.608370086504567 - type: ndcg_at_1 value: 
35.515 - type: ndcg_at_10 value: 45.196999999999996 - type: ndcg_at_100 value: 50.38399999999999 - type: ndcg_at_1000 value: 52.596 - type: ndcg_at_20 value: 47.233000000000004 - type: ndcg_at_3 value: 40.573 - type: ndcg_at_5 value: 42.853 - type: precision_at_1 value: 35.515 - type: precision_at_10 value: 8.017000000000001 - type: precision_at_100 value: 1.237 - type: precision_at_1000 value: 0.159 - type: precision_at_20 value: 4.687 - type: precision_at_3 value: 18.961 - type: precision_at_5 value: 13.34 - type: recall_at_1 value: 29.438 - type: recall_at_10 value: 56.603 - type: recall_at_100 value: 78.281 - type: recall_at_1000 value: 93.172 - type: recall_at_20 value: 63.571 - type: recall_at_3 value: 43.763000000000005 - type: recall_at_5 value: 49.717 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval (default) revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: mteb/cqadupstack-programmers metrics: - type: main_score value: 41.967999999999996 - type: map_at_1 value: 27.991 - type: map_at_10 value: 36.815 - type: map_at_100 value: 38.14 - type: map_at_1000 value: 38.257999999999996 - type: map_at_20 value: 37.561 - type: map_at_3 value: 34.094 - type: map_at_5 value: 35.557 - type: mrr_at_1 value: 34.817351598173516 - type: mrr_at_10 value: 42.56500507356672 - type: mrr_at_100 value: 43.460463999764066 - type: mrr_at_1000 value: 43.52348583643295 - type: mrr_at_20 value: 43.11992252647868 - type: mrr_at_3 value: 40.20167427701675 - type: mrr_at_5 value: 41.45738203957382 - type: nauc_map_at_1000_diff1 value: 41.67048775212967 - type: nauc_map_at_1000_max value: 43.99159244124849 - type: nauc_map_at_1000_std value: 2.573128018829387 - type: nauc_map_at_100_diff1 value: 41.674051168864544 - type: nauc_map_at_100_max value: 43.98147916359051 - type: nauc_map_at_100_std value: 2.5254111056725157 - type: nauc_map_at_10_diff1 value: 41.7125704403198 - type: nauc_map_at_10_max value: 43.474100183989364 - type: nauc_map_at_10_std value: 1.6477791314522445 - type: nauc_map_at_1_diff1 value: 48.1867206901292 - type: nauc_map_at_1_max value: 40.525641468978996 - type: nauc_map_at_1_std value: -0.7568533902855162 - type: nauc_map_at_20_diff1 value: 41.64339598055937 - type: nauc_map_at_20_max value: 43.62356989148736 - type: nauc_map_at_20_std value: 2.087731774178381 - type: nauc_map_at_3_diff1 value: 43.473195638597325 - type: nauc_map_at_3_max value: 42.94377216167118 - type: nauc_map_at_3_std value: 0.2505945238603998 - type: nauc_map_at_5_diff1 value: 42.39542158097317 - type: nauc_map_at_5_max value: 43.67892698262521 - type: nauc_map_at_5_std value: 0.9895905882223653 - type: nauc_mrr_at_1000_diff1 value: 41.09671003865924 - type: nauc_mrr_at_1000_max value: 46.28436379929593 - type: nauc_mrr_at_1000_std value: 4.354037919152363 - type: nauc_mrr_at_100_diff1 value: 41.09244756994191 - type: nauc_mrr_at_100_max value: 46.29034043110901 - type: nauc_mrr_at_100_std value: 4.351726070204726 - type: nauc_mrr_at_10_diff1 value: 40.977946444819096 - type: nauc_mrr_at_10_max value: 46.10718374892125 - type: nauc_mrr_at_10_std value: 4.18336707456262 - type: nauc_mrr_at_1_diff1 value: 45.599332453292675 - type: nauc_mrr_at_1_max value: 45.84726261326186 - type: nauc_mrr_at_1_std value: 2.4345971000548854 - type: nauc_mrr_at_20_diff1 value: 40.95961993815576 - type: nauc_mrr_at_20_max value: 46.18592650660265 - type: nauc_mrr_at_20_std value: 4.305161755438331 - type: nauc_mrr_at_3_diff1 value: 42.32692907673492 - type: nauc_mrr_at_3_max 
value: 46.26011359406279 - type: nauc_mrr_at_3_std value: 2.948567577936104 - type: nauc_mrr_at_5_diff1 value: 41.34052580040367 - type: nauc_mrr_at_5_max value: 46.34383226431204 - type: nauc_mrr_at_5_std value: 3.633823850306508 - type: nauc_ndcg_at_1000_diff1 value: 39.93215369321293 - type: nauc_ndcg_at_1000_max value: 45.687802170808574 - type: nauc_ndcg_at_1000_std value: 6.430986118631789 - type: nauc_ndcg_at_100_diff1 value: 39.684859990483915 - type: nauc_ndcg_at_100_max value: 45.80031091479213 - type: nauc_ndcg_at_100_std value: 6.36066573145881 - type: nauc_ndcg_at_10_diff1 value: 39.23880630958678 - type: nauc_ndcg_at_10_max value: 43.80038181935968 - type: nauc_ndcg_at_10_std value: 3.3533556819103074 - type: nauc_ndcg_at_1_diff1 value: 45.94736367846991 - type: nauc_ndcg_at_1_max value: 46.105763729560294 - type: nauc_ndcg_at_1_std value: 2.5515460950343622 - type: nauc_ndcg_at_20_diff1 value: 39.077143576829634 - type: nauc_ndcg_at_20_max value: 44.175755846357006 - type: nauc_ndcg_at_20_std value: 4.5499430823825 - type: nauc_ndcg_at_3_diff1 value: 41.55043893779763 - type: nauc_ndcg_at_3_max value: 44.369396288268 - type: nauc_ndcg_at_3_std value: 1.8135062317910333 - type: nauc_ndcg_at_5_diff1 value: 40.27727274546977 - type: nauc_ndcg_at_5_max value: 44.58055714919917 - type: nauc_ndcg_at_5_std value: 2.3858438655025895 - type: nauc_precision_at_1000_diff1 value: -15.82921590565681 - type: nauc_precision_at_1000_max value: 5.3200324911551276 - type: nauc_precision_at_1000_std value: 17.059441605068066 - type: nauc_precision_at_100_diff1 value: -3.477661270951154 - type: nauc_precision_at_100_max value: 23.102213467508363 - type: nauc_precision_at_100_std value: 22.61050030511951 - type: nauc_precision_at_10_diff1 value: 13.022774804120216 - type: nauc_precision_at_10_max value: 38.41004452998074 - type: nauc_precision_at_10_std value: 15.569153607416283 - type: nauc_precision_at_1_diff1 value: 45.94736367846991 - type: nauc_precision_at_1_max value: 46.105763729560294 - type: nauc_precision_at_1_std value: 2.5515460950343622 - type: nauc_precision_at_20_diff1 value: 6.552231339783917 - type: nauc_precision_at_20_max value: 33.144348451578914 - type: nauc_precision_at_20_std value: 19.55599724769983 - type: nauc_precision_at_3_diff1 value: 28.52937551899466 - type: nauc_precision_at_3_max value: 45.2056127705799 - type: nauc_precision_at_3_std value: 7.5353087497146785 - type: nauc_precision_at_5_diff1 value: 21.680390063172492 - type: nauc_precision_at_5_max value: 44.075542142279645 - type: nauc_precision_at_5_std value: 10.933211341141087 - type: nauc_recall_at_1000_diff1 value: 31.550619753305593 - type: nauc_recall_at_1000_max value: 49.1096811911254 - type: nauc_recall_at_1000_std value: 39.51532818925666 - type: nauc_recall_at_100_diff1 value: 30.696662503429863 - type: nauc_recall_at_100_max value: 47.21608565384206 - type: nauc_recall_at_100_std value: 20.894556840831438 - type: nauc_recall_at_10_diff1 value: 30.61623779072834 - type: nauc_recall_at_10_max value: 38.964392138468114 - type: nauc_recall_at_10_std value: 5.00024473264126 - type: nauc_recall_at_1_diff1 value: 48.1867206901292 - type: nauc_recall_at_1_max value: 40.525641468978996 - type: nauc_recall_at_1_std value: -0.7568533902855162 - type: nauc_recall_at_20_diff1 value: 29.07251333097125 - type: nauc_recall_at_20_max value: 39.03312242614524 - type: nauc_recall_at_20_std value: 8.959922224970903 - type: nauc_recall_at_3_diff1 value: 38.724975690747826 - type: nauc_recall_at_3_max value: 
41.3025635407677 - type: nauc_recall_at_3_std value: 0.6484284398052167 - type: nauc_recall_at_5_diff1 value: 34.09423664395091 - type: nauc_recall_at_5_max value: 41.34844327450573 - type: nauc_recall_at_5_std value: 2.3349428535301424 - type: ndcg_at_1 value: 34.703 - type: ndcg_at_10 value: 41.967999999999996 - type: ndcg_at_100 value: 47.607 - type: ndcg_at_1000 value: 49.984 - type: ndcg_at_20 value: 44.285000000000004 - type: ndcg_at_3 value: 37.582 - type: ndcg_at_5 value: 39.454 - type: precision_at_1 value: 34.703 - type: precision_at_10 value: 7.306 - type: precision_at_100 value: 1.191 - type: precision_at_1000 value: 0.156 - type: precision_at_20 value: 4.406000000000001 - type: precision_at_3 value: 17.541999999999998 - type: precision_at_5 value: 12.26 - type: recall_at_1 value: 27.991 - type: recall_at_10 value: 52.016 - type: recall_at_100 value: 75.807 - type: recall_at_1000 value: 91.84400000000001 - type: recall_at_20 value: 60.171 - type: recall_at_3 value: 39.268 - type: recall_at_5 value: 44.548 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: CQADupstackRetrieval_is_a_combined_dataset split: test type: CQADupstackRetrieval_is_a_combined_dataset metrics: - type: main_score value: 39.80483333333333 - type: ndcg_at_10 value: 39.80483333333333 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval (default) revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: mteb/cqadupstack-stats metrics: - type: main_score value: 34.888999999999996 - type: map_at_1 value: 24.257 - type: map_at_10 value: 30.85 - type: map_at_100 value: 31.653 - type: map_at_1000 value: 31.744 - type: map_at_20 value: 31.235000000000003 - type: map_at_3 value: 28.742 - type: map_at_5 value: 29.743000000000002 - type: mrr_at_1 value: 26.68711656441718 - type: mrr_at_10 value: 33.22828415619827 - type: mrr_at_100 value: 33.9510074708967 - type: mrr_at_1000 value: 34.019092955305204 - type: mrr_at_20 value: 33.600871234124 - type: mrr_at_3 value: 31.160531697341508 - type: mrr_at_5 value: 32.14212678936605 - type: nauc_map_at_1000_diff1 value: 52.717440487225275 - type: nauc_map_at_1000_max value: 44.60170963845081 - type: nauc_map_at_1000_std value: -3.1996706483359136 - type: nauc_map_at_100_diff1 value: 52.71189673586013 - type: nauc_map_at_100_max value: 44.57163638567482 - type: nauc_map_at_100_std value: -3.2345902627286436 - type: nauc_map_at_10_diff1 value: 53.02449930693637 - type: nauc_map_at_10_max value: 44.35369795372346 - type: nauc_map_at_10_std value: -3.8104783477282513 - type: nauc_map_at_1_diff1 value: 61.69412555489549 - type: nauc_map_at_1_max value: 45.687572761686425 - type: nauc_map_at_1_std value: -5.706950124921224 - type: nauc_map_at_20_diff1 value: 52.762382597962855 - type: nauc_map_at_20_max value: 44.42527816578249 - type: nauc_map_at_20_std value: -3.62442115557958 - type: nauc_map_at_3_diff1 value: 54.218133325934595 - type: nauc_map_at_3_max value: 43.886110491155 - type: nauc_map_at_3_std value: -5.373779809729606 - type: nauc_map_at_5_diff1 value: 53.87314356227072 - type: nauc_map_at_5_max value: 44.19838867906011 - type: nauc_map_at_5_std value: -4.657996273921579 - type: nauc_mrr_at_1000_diff1 value: 52.608759486406065 - type: nauc_mrr_at_1000_max value: 46.43225035608919 - type: nauc_mrr_at_1000_std value: -1.0825740469149292 - type: nauc_mrr_at_100_diff1 value: 52.59290039623913 - type: nauc_mrr_at_100_max value: 46.43031739568791 - type: nauc_mrr_at_100_std value: 
-1.110101172332684 - type: nauc_mrr_at_10_diff1 value: 52.860476269889055 - type: nauc_mrr_at_10_max value: 46.48418329087753 - type: nauc_mrr_at_10_std value: -1.3374238019386193 - type: nauc_mrr_at_1_diff1 value: 61.441947428807666 - type: nauc_mrr_at_1_max value: 48.54756533074311 - type: nauc_mrr_at_1_std value: -2.3680485432053135 - type: nauc_mrr_at_20_diff1 value: 52.665535367800906 - type: nauc_mrr_at_20_max value: 46.41185879304558 - type: nauc_mrr_at_20_std value: -1.3444595758714797 - type: nauc_mrr_at_3_diff1 value: 54.172851649909134 - type: nauc_mrr_at_3_max value: 46.15833772250591 - type: nauc_mrr_at_3_std value: -2.6730529379570642 - type: nauc_mrr_at_5_diff1 value: 53.723702014945175 - type: nauc_mrr_at_5_max value: 46.297316686693016 - type: nauc_mrr_at_5_std value: -2.159788610857334 - type: nauc_ndcg_at_1000_diff1 value: 48.49475884804671 - type: nauc_ndcg_at_1000_max value: 45.2504813678727 - type: nauc_ndcg_at_1000_std value: 1.3660441371017331 - type: nauc_ndcg_at_100_diff1 value: 48.328439839293004 - type: nauc_ndcg_at_100_max value: 45.1976848279064 - type: nauc_ndcg_at_100_std value: 0.984414559030773 - type: nauc_ndcg_at_10_diff1 value: 49.57495706841805 - type: nauc_ndcg_at_10_max value: 44.32422841398523 - type: nauc_ndcg_at_10_std value: -1.8938863954712948 - type: nauc_ndcg_at_1_diff1 value: 61.441947428807666 - type: nauc_ndcg_at_1_max value: 48.54756533074311 - type: nauc_ndcg_at_1_std value: -2.3680485432053135 - type: nauc_ndcg_at_20_diff1 value: 48.698704369155664 - type: nauc_ndcg_at_20_max value: 44.32085785234671 - type: nauc_ndcg_at_20_std value: -1.5370200957389617 - type: nauc_ndcg_at_3_diff1 value: 51.87602761155865 - type: nauc_ndcg_at_3_max value: 43.836423952288946 - type: nauc_ndcg_at_3_std value: -4.519331726990856 - type: nauc_ndcg_at_5_diff1 value: 51.536849644847216 - type: nauc_ndcg_at_5_max value: 44.05267508410536 - type: nauc_ndcg_at_5_std value: -3.7646800644981484 - type: nauc_precision_at_1000_diff1 value: -3.114425136121477 - type: nauc_precision_at_1000_max value: 21.219654091584214 - type: nauc_precision_at_1000_std value: 23.620715661080197 - type: nauc_precision_at_100_diff1 value: 13.781387623485253 - type: nauc_precision_at_100_max value: 37.7816424452238 - type: nauc_precision_at_100_std value: 24.719409110027726 - type: nauc_precision_at_10_diff1 value: 29.300018648484276 - type: nauc_precision_at_10_max value: 42.111386830242296 - type: nauc_precision_at_10_std value: 10.14768426081145 - type: nauc_precision_at_1_diff1 value: 61.441947428807666 - type: nauc_precision_at_1_max value: 48.54756533074311 - type: nauc_precision_at_1_std value: -2.3680485432053135 - type: nauc_precision_at_20_diff1 value: 24.056049155242437 - type: nauc_precision_at_20_max value: 41.1201344685915 - type: nauc_precision_at_20_std value: 12.97512554259156 - type: nauc_precision_at_3_diff1 value: 40.917570494530224 - type: nauc_precision_at_3_max value: 42.15043236961856 - type: nauc_precision_at_3_std value: -0.589880165120388 - type: nauc_precision_at_5_diff1 value: 36.58196834265981 - type: nauc_precision_at_5_max value: 41.630431483145955 - type: nauc_precision_at_5_std value: 2.792434474028848 - type: nauc_recall_at_1000_diff1 value: 22.038599119727685 - type: nauc_recall_at_1000_max value: 40.92494951502034 - type: nauc_recall_at_1000_std value: 30.098168212129906 - type: nauc_recall_at_100_diff1 value: 30.27278930698841 - type: nauc_recall_at_100_max value: 43.08655404016066 - type: nauc_recall_at_100_std value: 16.415020332792015 - type: 
nauc_recall_at_10_diff1 value: 38.75370707674917 - type: nauc_recall_at_10_max value: 40.98674256815627 - type: nauc_recall_at_10_std value: 1.4170954879979862 - type: nauc_recall_at_1_diff1 value: 61.69412555489549 - type: nauc_recall_at_1_max value: 45.687572761686425 - type: nauc_recall_at_1_std value: -5.706950124921224 - type: nauc_recall_at_20_diff1 value: 34.95998605858319 - type: nauc_recall_at_20_max value: 40.10527957275843 - type: nauc_recall_at_20_std value: 2.1856254846998895 - type: nauc_recall_at_3_diff1 value: 46.10618270844218 - type: nauc_recall_at_3_max value: 39.94724438255762 - type: nauc_recall_at_3_std value: -6.261263180948628 - type: nauc_recall_at_5_diff1 value: 45.37034670682598 - type: nauc_recall_at_5_max value: 40.996211974958655 - type: nauc_recall_at_5_std value: -3.8795589504838945 - type: ndcg_at_1 value: 26.687 - type: ndcg_at_10 value: 34.888999999999996 - type: ndcg_at_100 value: 38.967 - type: ndcg_at_1000 value: 41.408 - type: ndcg_at_20 value: 36.202 - type: ndcg_at_3 value: 30.763 - type: ndcg_at_5 value: 32.369 - type: precision_at_1 value: 26.687 - type: precision_at_10 value: 5.428999999999999 - type: precision_at_100 value: 0.8099999999999999 - type: precision_at_1000 value: 0.11 - type: precision_at_20 value: 3.0669999999999997 - type: precision_at_3 value: 12.883 - type: precision_at_5 value: 8.895999999999999 - type: recall_at_1 value: 24.257 - type: recall_at_10 value: 45.013999999999996 - type: recall_at_100 value: 63.55800000000001 - type: recall_at_1000 value: 81.649 - type: recall_at_20 value: 49.786 - type: recall_at_3 value: 33.623 - type: recall_at_5 value: 37.489 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackTexRetrieval (default) revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: mteb/cqadupstack-tex metrics: - type: main_score value: 27.174 - type: map_at_1 value: 16.683 - type: map_at_10 value: 22.965 - type: map_at_100 value: 23.954 - type: map_at_1000 value: 24.078 - type: map_at_20 value: 23.49 - type: map_at_3 value: 20.918999999999997 - type: map_at_5 value: 22.027 - type: mrr_at_1 value: 19.92429456297316 - type: mrr_at_10 value: 26.551319656102862 - type: mrr_at_100 value: 27.428968210944316 - type: mrr_at_1000 value: 27.510501144435317 - type: mrr_at_20 value: 27.051813881383698 - type: mrr_at_3 value: 24.483826565726083 - type: mrr_at_5 value: 25.624569855471435 - type: nauc_map_at_1000_diff1 value: 39.70294552750383 - type: nauc_map_at_1000_max value: 31.317466455201227 - type: nauc_map_at_1000_std value: -1.762559086629105 - type: nauc_map_at_100_diff1 value: 39.71390899838813 - type: nauc_map_at_100_max value: 31.29204970199068 - type: nauc_map_at_100_std value: -1.791535537876596 - type: nauc_map_at_10_diff1 value: 40.01482969019678 - type: nauc_map_at_10_max value: 31.23314156393745 - type: nauc_map_at_10_std value: -2.3274535397042513 - type: nauc_map_at_1_diff1 value: 46.72895932959986 - type: nauc_map_at_1_max value: 29.819875651168548 - type: nauc_map_at_1_std value: -3.6639434506444912 - type: nauc_map_at_20_diff1 value: 39.79895580803141 - type: nauc_map_at_20_max value: 31.18209733793537 - type: nauc_map_at_20_std value: -2.052399285243834 - type: nauc_map_at_3_diff1 value: 41.98314483627424 - type: nauc_map_at_3_max value: 31.410399587944422 - type: nauc_map_at_3_std value: -3.1256987241100957 - type: nauc_map_at_5_diff1 value: 40.68955549018378 - type: nauc_map_at_5_max value: 31.529138053527888 - type: nauc_map_at_5_std value: -2.5106031609548727 - type: 
nauc_mrr_at_1000_diff1 value: 38.843425454050774 - type: nauc_mrr_at_1000_max value: 32.080747972542476 - type: nauc_mrr_at_1000_std value: -1.8813140227198037 - type: nauc_mrr_at_100_diff1 value: 38.844774433232246 - type: nauc_mrr_at_100_max value: 32.07767547525176 - type: nauc_mrr_at_100_std value: -1.8853968240347412 - type: nauc_mrr_at_10_diff1 value: 38.9943638829038 - type: nauc_mrr_at_10_max value: 32.113199636613224 - type: nauc_mrr_at_10_std value: -2.2808765253620997 - type: nauc_mrr_at_1_diff1 value: 45.204551111582504 - type: nauc_mrr_at_1_max value: 31.33271495263982 - type: nauc_mrr_at_1_std value: -4.310808417520686 - type: nauc_mrr_at_20_diff1 value: 38.809653957002475 - type: nauc_mrr_at_20_max value: 32.00087958077687 - type: nauc_mrr_at_20_std value: -2.077240815930647 - type: nauc_mrr_at_3_diff1 value: 40.640559615359884 - type: nauc_mrr_at_3_max value: 32.499874311042085 - type: nauc_mrr_at_3_std value: -3.0250204118059623 - type: nauc_mrr_at_5_diff1 value: 39.730384199123904 - type: nauc_mrr_at_5_max value: 32.54797498951286 - type: nauc_mrr_at_5_std value: -2.483752446190051 - type: nauc_ndcg_at_1000_diff1 value: 35.67309434839137 - type: nauc_ndcg_at_1000_max value: 31.968665383689366 - type: nauc_ndcg_at_1000_std value: 1.8902841143765996 - type: nauc_ndcg_at_100_diff1 value: 35.532320541105456 - type: nauc_ndcg_at_100_max value: 31.39262363611392 - type: nauc_ndcg_at_100_std value: 1.3738974219360591 - type: nauc_ndcg_at_10_diff1 value: 36.89304493982828 - type: nauc_ndcg_at_10_max value: 31.413699188823262 - type: nauc_ndcg_at_10_std value: -1.4406496834360265 - type: nauc_ndcg_at_1_diff1 value: 45.204551111582504 - type: nauc_ndcg_at_1_max value: 31.33271495263982 - type: nauc_ndcg_at_1_std value: -4.310808417520686 - type: nauc_ndcg_at_20_diff1 value: 36.10603668893203 - type: nauc_ndcg_at_20_max value: 31.08596071268814 - type: nauc_ndcg_at_20_std value: -0.5716127582631676 - type: nauc_ndcg_at_3_diff1 value: 40.3406275054372 - type: nauc_ndcg_at_3_max value: 32.30746163378498 - type: nauc_ndcg_at_3_std value: -2.9826906381184086 - type: nauc_ndcg_at_5_diff1 value: 38.435436080533805 - type: nauc_ndcg_at_5_max value: 32.28159769507487 - type: nauc_ndcg_at_5_std value: -1.896502637808091 - type: nauc_precision_at_1000_diff1 value: -1.3272380913114576 - type: nauc_precision_at_1000_max value: 16.97452439042005 - type: nauc_precision_at_1000_std value: 6.727514561355023 - type: nauc_precision_at_100_diff1 value: 9.050886288633748 - type: nauc_precision_at_100_max value: 22.793531578995857 - type: nauc_precision_at_100_std value: 9.041251836945914 - type: nauc_precision_at_10_diff1 value: 23.58024783123664 - type: nauc_precision_at_10_max value: 30.911229044947746 - type: nauc_precision_at_10_std value: 0.49206924465533297 - type: nauc_precision_at_1_diff1 value: 45.204551111582504 - type: nauc_precision_at_1_max value: 31.33271495263982 - type: nauc_precision_at_1_std value: -4.310808417520686 - type: nauc_precision_at_20_diff1 value: 18.72722750869453 - type: nauc_precision_at_20_max value: 28.168309388621456 - type: nauc_precision_at_20_std value: 3.5580796098534906 - type: nauc_precision_at_3_diff1 value: 34.21934456307853 - type: nauc_precision_at_3_max value: 34.50963041596628 - type: nauc_precision_at_3_std value: -2.1474684485851876 - type: nauc_precision_at_5_diff1 value: 29.967346999613596 - type: nauc_precision_at_5_max value: 33.958476515854954 - type: nauc_precision_at_5_std value: -0.45778793347456004 - type: nauc_recall_at_1000_diff1 value: 
12.06453658572338 - type: nauc_recall_at_1000_max value: 30.788667195142633 - type: nauc_recall_at_1000_std value: 27.271269189751713 - type: nauc_recall_at_100_diff1 value: 19.6231994553196 - type: nauc_recall_at_100_max value: 27.00238503628109 - type: nauc_recall_at_100_std value: 13.294514312384601 - type: nauc_recall_at_10_diff1 value: 27.755272572613222 - type: nauc_recall_at_10_max value: 28.332855891388125 - type: nauc_recall_at_10_std value: 0.8241434995618968 - type: nauc_recall_at_1_diff1 value: 46.72895932959986 - type: nauc_recall_at_1_max value: 29.819875651168548 - type: nauc_recall_at_1_std value: -3.6639434506444912 - type: nauc_recall_at_20_diff1 value: 24.731671276025146 - type: nauc_recall_at_20_max value: 26.949426211227795 - type: nauc_recall_at_20_std value: 3.412457763382852 - type: nauc_recall_at_3_diff1 value: 36.38111388907899 - type: nauc_recall_at_3_max value: 31.47754397495634 - type: nauc_recall_at_3_std value: -2.1874715383733956 - type: nauc_recall_at_5_diff1 value: 31.68529930399809 - type: nauc_recall_at_5_max value: 31.090941464639744 - type: nauc_recall_at_5_std value: -0.1674878655815559 - type: ndcg_at_1 value: 19.924 - type: ndcg_at_10 value: 27.174 - type: ndcg_at_100 value: 32.065 - type: ndcg_at_1000 value: 35.106 - type: ndcg_at_20 value: 28.939999999999998 - type: ndcg_at_3 value: 23.372999999999998 - type: ndcg_at_5 value: 25.096 - type: precision_at_1 value: 19.924 - type: precision_at_10 value: 4.855 - type: precision_at_100 value: 0.857 - type: precision_at_1000 value: 0.129 - type: precision_at_20 value: 2.94 - type: precision_at_3 value: 10.897 - type: precision_at_5 value: 7.7909999999999995 - type: recall_at_1 value: 16.683 - type: recall_at_10 value: 36.276 - type: recall_at_100 value: 58.437 - type: recall_at_1000 value: 80.35900000000001 - type: recall_at_20 value: 42.79 - type: recall_at_3 value: 25.663999999999998 - type: recall_at_5 value: 30.213 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval (default) revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: mteb/cqadupstack-unix metrics: - type: main_score value: 38.34 - type: map_at_1 value: 25.924999999999997 - type: map_at_10 value: 33.53 - type: map_at_100 value: 34.635 - type: map_at_1000 value: 34.739 - type: map_at_20 value: 34.117999999999995 - type: map_at_3 value: 30.94 - type: map_at_5 value: 32.411 - type: mrr_at_1 value: 30.223880597014922 - type: mrr_at_10 value: 37.598873193556024 - type: mrr_at_100 value: 38.48001202116003 - type: mrr_at_1000 value: 38.53998687212744 - type: mrr_at_20 value: 38.0922428291824 - type: mrr_at_3 value: 35.26119402985074 - type: mrr_at_5 value: 36.627798507462686 - type: nauc_map_at_1000_diff1 value: 48.99658121611321 - type: nauc_map_at_1000_max value: 43.36514689969973 - type: nauc_map_at_1000_std value: 1.2743138438292323 - type: nauc_map_at_100_diff1 value: 49.00383839256485 - type: nauc_map_at_100_max value: 43.34421843813268 - type: nauc_map_at_100_std value: 1.2381577394429648 - type: nauc_map_at_10_diff1 value: 48.976968357570804 - type: nauc_map_at_10_max value: 43.21656545934543 - type: nauc_map_at_10_std value: 0.8806229946576106 - type: nauc_map_at_1_diff1 value: 54.79429701172901 - type: nauc_map_at_1_max value: 44.94497297225627 - type: nauc_map_at_1_std value: 0.3424876477921997 - type: nauc_map_at_20_diff1 value: 49.05500453067965 - type: nauc_map_at_20_max value: 43.313867184227114 - type: nauc_map_at_20_std value: 1.0599077751868857 - type: nauc_map_at_3_diff1 value: 
50.202191345168735 - type: nauc_map_at_3_max value: 43.16428713411531 - type: nauc_map_at_3_std value: 0.33035782399351366 - type: nauc_map_at_5_diff1 value: 49.43896179760421 - type: nauc_map_at_5_max value: 43.36309937252455 - type: nauc_map_at_5_std value: 0.6152011411226946 - type: nauc_mrr_at_1000_diff1 value: 48.359023685110486 - type: nauc_mrr_at_1000_max value: 42.5315010808791 - type: nauc_mrr_at_1000_std value: 0.5920431228924952 - type: nauc_mrr_at_100_diff1 value: 48.33949213883611 - type: nauc_mrr_at_100_max value: 42.501697399914725 - type: nauc_mrr_at_100_std value: 0.5683233598385363 - type: nauc_mrr_at_10_diff1 value: 48.17405374349975 - type: nauc_mrr_at_10_max value: 42.36829702421452 - type: nauc_mrr_at_10_std value: 0.3918636512799242 - type: nauc_mrr_at_1_diff1 value: 54.41613067936997 - type: nauc_mrr_at_1_max value: 44.91551488557509 - type: nauc_mrr_at_1_std value: -0.7697411188700982 - type: nauc_mrr_at_20_diff1 value: 48.29085774083497 - type: nauc_mrr_at_20_max value: 42.46692350994534 - type: nauc_mrr_at_20_std value: 0.49667689004854476 - type: nauc_mrr_at_3_diff1 value: 49.32403876113614 - type: nauc_mrr_at_3_max value: 42.420974899262816 - type: nauc_mrr_at_3_std value: -0.17054785857862576 - type: nauc_mrr_at_5_diff1 value: 48.5386866012484 - type: nauc_mrr_at_5_max value: 42.49752447209939 - type: nauc_mrr_at_5_std value: -0.030068724695007015 - type: nauc_ndcg_at_1000_diff1 value: 46.482903430093685 - type: nauc_ndcg_at_1000_max value: 43.18727440958746 - type: nauc_ndcg_at_1000_std value: 3.8397045352936874 - type: nauc_ndcg_at_100_diff1 value: 46.272241119098105 - type: nauc_ndcg_at_100_max value: 42.44044067518221 - type: nauc_ndcg_at_100_std value: 3.0744093549329374 - type: nauc_ndcg_at_10_diff1 value: 46.35820553525149 - type: nauc_ndcg_at_10_max value: 42.05754989284268 - type: nauc_ndcg_at_10_std value: 1.6140781134179982 - type: nauc_ndcg_at_1_diff1 value: 54.41613067936997 - type: nauc_ndcg_at_1_max value: 44.91551488557509 - type: nauc_ndcg_at_1_std value: -0.7697411188700982 - type: nauc_ndcg_at_20_diff1 value: 46.56173859192192 - type: nauc_ndcg_at_20_max value: 42.39990803441754 - type: nauc_ndcg_at_20_std value: 2.2301958940613518 - type: nauc_ndcg_at_3_diff1 value: 48.45451921294981 - type: nauc_ndcg_at_3_max value: 42.1519683087422 - type: nauc_ndcg_at_3_std value: 0.43355376702150983 - type: nauc_ndcg_at_5_diff1 value: 47.329516258529 - type: nauc_ndcg_at_5_max value: 42.39325493165628 - type: nauc_ndcg_at_5_std value: 0.8719863795035224 - type: nauc_precision_at_1000_diff1 value: -10.427395700183098 - type: nauc_precision_at_1000_max value: 1.3695831886594074 - type: nauc_precision_at_1000_std value: 5.396211335976429 - type: nauc_precision_at_100_diff1 value: 4.170216285720574 - type: nauc_precision_at_100_max value: 14.393676436386233 - type: nauc_precision_at_100_std value: 7.356250144868687 - type: nauc_precision_at_10_diff1 value: 25.406793843503 - type: nauc_precision_at_10_max value: 30.469137431378485 - type: nauc_precision_at_10_std value: 4.262031333274362 - type: nauc_precision_at_1_diff1 value: 54.41613067936997 - type: nauc_precision_at_1_max value: 44.91551488557509 - type: nauc_precision_at_1_std value: -0.7697411188700982 - type: nauc_precision_at_20_diff1 value: 20.989784339763254 - type: nauc_precision_at_20_max value: 27.616892902118735 - type: nauc_precision_at_20_std value: 5.021785061675381 - type: nauc_precision_at_3_diff1 value: 39.66665542900266 - type: nauc_precision_at_3_max value: 37.76686222170862 - type: 
nauc_precision_at_3_std value: 1.04925540752191 - type: nauc_precision_at_5_diff1 value: 32.88141076318413 - type: nauc_precision_at_5_max value: 35.90401974619475 - type: nauc_precision_at_5_std value: 2.2695242286100408 - type: nauc_recall_at_1000_diff1 value: 30.248973513875526 - type: nauc_recall_at_1000_max value: 48.439331789791325 - type: nauc_recall_at_1000_std value: 38.857189673518135 - type: nauc_recall_at_100_diff1 value: 33.090255913758874 - type: nauc_recall_at_100_max value: 35.45818452208663 - type: nauc_recall_at_100_std value: 12.58439358264515 - type: nauc_recall_at_10_diff1 value: 37.462082402733785 - type: nauc_recall_at_10_max value: 36.99065942533105 - type: nauc_recall_at_10_std value: 3.948587023033947 - type: nauc_recall_at_1_diff1 value: 54.79429701172901 - type: nauc_recall_at_1_max value: 44.94497297225627 - type: nauc_recall_at_1_std value: 0.3424876477921997 - type: nauc_recall_at_20_diff1 value: 37.34159405112872 - type: nauc_recall_at_20_max value: 37.50873448555206 - type: nauc_recall_at_20_std value: 6.669489660177887 - type: nauc_recall_at_3_diff1 value: 43.751405924588184 - type: nauc_recall_at_3_max value: 38.5280847003097 - type: nauc_recall_at_3_std value: 0.8234291612745726 - type: nauc_recall_at_5_diff1 value: 40.75537181461394 - type: nauc_recall_at_5_max value: 38.64761171801593 - type: nauc_recall_at_5_std value: 1.9783778065563666 - type: ndcg_at_1 value: 30.224 - type: ndcg_at_10 value: 38.34 - type: ndcg_at_100 value: 43.564 - type: ndcg_at_1000 value: 45.888 - type: ndcg_at_20 value: 40.285 - type: ndcg_at_3 value: 33.613 - type: ndcg_at_5 value: 35.868 - type: precision_at_1 value: 30.224 - type: precision_at_10 value: 6.343 - type: precision_at_100 value: 1.0030000000000001 - type: precision_at_1000 value: 0.131 - type: precision_at_20 value: 3.689 - type: precision_at_3 value: 14.832 - type: precision_at_5 value: 10.504 - type: recall_at_1 value: 25.924999999999997 - type: recall_at_10 value: 49.01 - type: recall_at_100 value: 71.935 - type: recall_at_1000 value: 88.191 - type: recall_at_20 value: 56.076 - type: recall_at_3 value: 36.344 - type: recall_at_5 value: 41.942 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: mteb/cqadupstack-webmasters metrics: - type: main_score value: 39.007 - type: map_at_1 value: 25.195 - type: map_at_10 value: 33.29 - type: map_at_100 value: 34.919 - type: map_at_1000 value: 35.132999999999996 - type: map_at_20 value: 34.184 - type: map_at_3 value: 30.501 - type: map_at_5 value: 31.917 - type: mrr_at_1 value: 30.237154150197625 - type: mrr_at_10 value: 37.97901373988331 - type: mrr_at_100 value: 38.89357624578056 - type: mrr_at_1000 value: 38.96172508462875 - type: mrr_at_20 value: 38.489908488593 - type: mrr_at_3 value: 35.44137022397892 - type: mrr_at_5 value: 36.755599472990774 - type: nauc_map_at_1000_diff1 value: 54.52234288345771 - type: nauc_map_at_1000_max value: 37.02933259777875 - type: nauc_map_at_1000_std value: -1.8802414735497839 - type: nauc_map_at_100_diff1 value: 54.592085424308564 - type: nauc_map_at_100_max value: 37.13861558972853 - type: nauc_map_at_100_std value: -1.8864900602925623 - type: nauc_map_at_10_diff1 value: 55.32701084932018 - type: nauc_map_at_10_max value: 36.97158176818064 - type: nauc_map_at_10_std value: -3.364570079568588 - type: nauc_map_at_1_diff1 value: 62.56234442022803 - type: nauc_map_at_1_max value: 37.725553737446866 - type: 
nauc_map_at_1_std value: -5.9573495367577705 - type: nauc_map_at_20_diff1 value: 54.92567471295049 - type: nauc_map_at_20_max value: 36.980006282091985 - type: nauc_map_at_20_std value: -2.7416738048891243 - type: nauc_map_at_3_diff1 value: 57.6202035201006 - type: nauc_map_at_3_max value: 36.85083307496426 - type: nauc_map_at_3_std value: -4.929088209082444 - type: nauc_map_at_5_diff1 value: 56.43034014992742 - type: nauc_map_at_5_max value: 36.65006798835753 - type: nauc_map_at_5_std value: -4.776147213332607 - type: nauc_mrr_at_1000_diff1 value: 51.91684536214369 - type: nauc_mrr_at_1000_max value: 35.50047477073224 - type: nauc_mrr_at_1000_std value: -0.9638166168094422 - type: nauc_mrr_at_100_diff1 value: 51.89735751581897 - type: nauc_mrr_at_100_max value: 35.48371938892366 - type: nauc_mrr_at_100_std value: -0.9444977007097576 - type: nauc_mrr_at_10_diff1 value: 51.82990105533963 - type: nauc_mrr_at_10_max value: 35.41678096580625 - type: nauc_mrr_at_10_std value: -1.2998439543197369 - type: nauc_mrr_at_1_diff1 value: 57.36601705972182 - type: nauc_mrr_at_1_max value: 36.90602990003092 - type: nauc_mrr_at_1_std value: -3.4080880251307044 - type: nauc_mrr_at_20_diff1 value: 51.8613947241447 - type: nauc_mrr_at_20_max value: 35.42345819928662 - type: nauc_mrr_at_20_std value: -1.093870308993923 - type: nauc_mrr_at_3_diff1 value: 53.01993009463089 - type: nauc_mrr_at_3_max value: 35.822666497908806 - type: nauc_mrr_at_3_std value: -2.1165600076512474 - type: nauc_mrr_at_5_diff1 value: 52.34611304656942 - type: nauc_mrr_at_5_max value: 35.49696929205688 - type: nauc_mrr_at_5_std value: -2.0955274926266982 - type: nauc_ndcg_at_1000_diff1 value: 51.41120348218975 - type: nauc_ndcg_at_1000_max value: 36.685342768279675 - type: nauc_ndcg_at_1000_std value: 1.7205313748343651 - type: nauc_ndcg_at_100_diff1 value: 50.93701708514895 - type: nauc_ndcg_at_100_max value: 36.162627377243275 - type: nauc_ndcg_at_100_std value: 1.7640807675244328 - type: nauc_ndcg_at_10_diff1 value: 50.63098923593871 - type: nauc_ndcg_at_10_max value: 35.34361464083639 - type: nauc_ndcg_at_10_std value: -0.9402862458857915 - type: nauc_ndcg_at_1_diff1 value: 57.36601705972182 - type: nauc_ndcg_at_1_max value: 36.90602990003092 - type: nauc_ndcg_at_1_std value: -3.4080880251307044 - type: nauc_ndcg_at_20_diff1 value: 50.73961693837964 - type: nauc_ndcg_at_20_max value: 35.01998564289338 - type: nauc_ndcg_at_20_std value: -0.5241446967120867 - type: nauc_ndcg_at_3_diff1 value: 53.23302956511971 - type: nauc_ndcg_at_3_max value: 35.708980757056295 - type: nauc_ndcg_at_3_std value: -3.017125347557592 - type: nauc_ndcg_at_5_diff1 value: 52.335636773583396 - type: nauc_ndcg_at_5_max value: 35.34227057005852 - type: nauc_ndcg_at_5_std value: -2.9708664518544508 - type: nauc_precision_at_1000_diff1 value: -18.554677236277232 - type: nauc_precision_at_1000_max value: -15.659740900843067 - type: nauc_precision_at_1000_std value: 8.228155770924415 - type: nauc_precision_at_100_diff1 value: -12.195998995692928 - type: nauc_precision_at_100_max value: -0.5888781565639164 - type: nauc_precision_at_100_std value: 19.312752223375448 - type: nauc_precision_at_10_diff1 value: 12.921470127228105 - type: nauc_precision_at_10_max value: 21.317929458256238 - type: nauc_precision_at_10_std value: 13.148202187911012 - type: nauc_precision_at_1_diff1 value: 57.36601705972182 - type: nauc_precision_at_1_max value: 36.90602990003092 - type: nauc_precision_at_1_std value: -3.4080880251307044 - type: nauc_precision_at_20_diff1 value: 
2.4696353004069906 - type: nauc_precision_at_20_max value: 14.284343093524058 - type: nauc_precision_at_20_std value: 17.480976091077217 - type: nauc_precision_at_3_diff1 value: 35.82856720298558 - type: nauc_precision_at_3_max value: 29.613454822718143 - type: nauc_precision_at_3_std value: 0.38030095211645343 - type: nauc_precision_at_5_diff1 value: 27.632641276435354 - type: nauc_precision_at_5_max value: 27.238425775328967 - type: nauc_precision_at_5_std value: 3.152744091929671 - type: nauc_recall_at_1000_diff1 value: 33.28570370310322 - type: nauc_recall_at_1000_max value: 44.315453433115785 - type: nauc_recall_at_1000_std value: 43.371884128363 - type: nauc_recall_at_100_diff1 value: 35.77059425104567 - type: nauc_recall_at_100_max value: 31.48054575812204 - type: nauc_recall_at_100_std value: 17.639416832754303 - type: nauc_recall_at_10_diff1 value: 40.179789202687914 - type: nauc_recall_at_10_max value: 30.466946546206923 - type: nauc_recall_at_10_std value: 0.8385433327977754 - type: nauc_recall_at_1_diff1 value: 62.56234442022803 - type: nauc_recall_at_1_max value: 37.725553737446866 - type: nauc_recall_at_1_std value: -5.9573495367577705 - type: nauc_recall_at_20_diff1 value: 38.70371818511684 - type: nauc_recall_at_20_max value: 28.305350175132567 - type: nauc_recall_at_20_std value: 3.8854966962347746 - type: nauc_recall_at_3_diff1 value: 51.22347884414916 - type: nauc_recall_at_3_max value: 33.21612425601433 - type: nauc_recall_at_3_std value: -4.48370860005988 - type: nauc_recall_at_5_diff1 value: 46.848014408337676 - type: nauc_recall_at_5_max value: 31.254476917525555 - type: nauc_recall_at_5_std value: -4.903427133365656 - type: ndcg_at_1 value: 30.237000000000002 - type: ndcg_at_10 value: 39.007 - type: ndcg_at_100 value: 44.585 - type: ndcg_at_1000 value: 47.464 - type: ndcg_at_20 value: 41.278999999999996 - type: ndcg_at_3 value: 34.472 - type: ndcg_at_5 value: 36.315 - type: precision_at_1 value: 30.237000000000002 - type: precision_at_10 value: 7.51 - type: precision_at_100 value: 1.478 - type: precision_at_1000 value: 0.234 - type: precision_at_20 value: 4.7829999999999995 - type: precision_at_3 value: 16.14 - type: precision_at_5 value: 11.462 - type: recall_at_1 value: 25.195 - type: recall_at_10 value: 49.507 - type: recall_at_100 value: 74.083 - type: recall_at_1000 value: 92.899 - type: recall_at_20 value: 58.291000000000004 - type: recall_at_3 value: 36.167 - type: recall_at_5 value: 41.749 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval (default) revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack-wordpress metrics: - type: main_score value: 33.06 - type: map_at_1 value: 22.683 - type: map_at_10 value: 29.115000000000002 - type: map_at_100 value: 30.035 - type: map_at_1000 value: 30.141000000000002 - type: map_at_20 value: 29.585 - type: map_at_3 value: 27.436 - type: map_at_5 value: 28.186 - type: mrr_at_1 value: 24.953789279112755 - type: mrr_at_10 value: 31.512190828272157 - type: mrr_at_100 value: 32.30661079835987 - type: mrr_at_1000 value: 32.388485948646846 - type: mrr_at_20 value: 31.898454977555428 - type: mrr_at_3 value: 29.852125693160815 - type: mrr_at_5 value: 30.64695009242144 - type: nauc_map_at_1000_diff1 value: 41.37097481409692 - type: nauc_map_at_1000_max value: 21.819472065390062 - type: nauc_map_at_1000_std value: -5.511851233031371 - type: nauc_map_at_100_diff1 value: 41.38580981484577 - type: nauc_map_at_100_max value: 21.796410887298222 - type: 
nauc_map_at_100_std value: -5.56736379242138 - type: nauc_map_at_10_diff1 value: 41.63629903410976 - type: nauc_map_at_10_max value: 21.90371149884218 - type: nauc_map_at_10_std value: -6.152274677121426 - type: nauc_map_at_1_diff1 value: 45.84841941041374 - type: nauc_map_at_1_max value: 20.461574274794568 - type: nauc_map_at_1_std value: -7.769870515581234 - type: nauc_map_at_20_diff1 value: 41.616159838791376 - type: nauc_map_at_20_max value: 21.879572436615728 - type: nauc_map_at_20_std value: -6.001760143925003 - type: nauc_map_at_3_diff1 value: 42.690213994915474 - type: nauc_map_at_3_max value: 21.35340820982141 - type: nauc_map_at_3_std value: -6.118720026868332 - type: nauc_map_at_5_diff1 value: 42.107817663484575 - type: nauc_map_at_5_max value: 22.02508826703247 - type: nauc_map_at_5_std value: -5.655849953120985 - type: nauc_mrr_at_1000_diff1 value: 39.66954612386224 - type: nauc_mrr_at_1000_max value: 22.150137067327954 - type: nauc_mrr_at_1000_std value: -4.798006812425386 - type: nauc_mrr_at_100_diff1 value: 39.66409024535208 - type: nauc_mrr_at_100_max value: 22.121525365416538 - type: nauc_mrr_at_100_std value: -4.806603240713894 - type: nauc_mrr_at_10_diff1 value: 39.87117352487735 - type: nauc_mrr_at_10_max value: 22.298568726426076 - type: nauc_mrr_at_10_std value: -5.1451772190015195 - type: nauc_mrr_at_1_diff1 value: 43.86075692062394 - type: nauc_mrr_at_1_max value: 20.51270620979276 - type: nauc_mrr_at_1_std value: -7.589704558075294 - type: nauc_mrr_at_20_diff1 value: 39.820424398881215 - type: nauc_mrr_at_20_max value: 22.173944895852095 - type: nauc_mrr_at_20_std value: -5.0727540461865335 - type: nauc_mrr_at_3_diff1 value: 40.73278435693193 - type: nauc_mrr_at_3_max value: 21.930995553135812 - type: nauc_mrr_at_3_std value: -5.980722775097277 - type: nauc_mrr_at_5_diff1 value: 39.89679395564144 - type: nauc_mrr_at_5_max value: 22.02821777103734 - type: nauc_mrr_at_5_std value: -5.072135508421082 - type: nauc_ndcg_at_1000_diff1 value: 37.957587605367785 - type: nauc_ndcg_at_1000_max value: 22.362257192820255 - type: nauc_ndcg_at_1000_std value: -1.7757428668228084 - type: nauc_ndcg_at_100_diff1 value: 37.908544407246104 - type: nauc_ndcg_at_100_max value: 21.536623476432354 - type: nauc_ndcg_at_100_std value: -2.678355870833651 - type: nauc_ndcg_at_10_diff1 value: 39.36845261271005 - type: nauc_ndcg_at_10_max value: 22.3150793248212 - type: nauc_ndcg_at_10_std value: -5.646375413170874 - type: nauc_ndcg_at_1_diff1 value: 43.86075692062394 - type: nauc_ndcg_at_1_max value: 20.51270620979276 - type: nauc_ndcg_at_1_std value: -7.589704558075294 - type: nauc_ndcg_at_20_diff1 value: 39.30711049883703 - type: nauc_ndcg_at_20_max value: 21.935544953883415 - type: nauc_ndcg_at_20_std value: -5.20402304183158 - type: nauc_ndcg_at_3_diff1 value: 41.113286498750305 - type: nauc_ndcg_at_3_max value: 21.635397999914282 - type: nauc_ndcg_at_3_std value: -5.72866713630757 - type: nauc_ndcg_at_5_diff1 value: 40.06783309225114 - type: nauc_ndcg_at_5_max value: 22.416356942701672 - type: nauc_ndcg_at_5_std value: -4.886519038213331 - type: nauc_precision_at_1000_diff1 value: -17.52292838463402 - type: nauc_precision_at_1000_max value: -5.389818321213827 - type: nauc_precision_at_1000_std value: 26.772552854570375 - type: nauc_precision_at_100_diff1 value: 3.543169641476175 - type: nauc_precision_at_100_max value: 9.574510694378198 - type: nauc_precision_at_100_std value: 17.92832693421059 - type: nauc_precision_at_10_diff1 value: 24.894375565187694 - type: nauc_precision_at_10_max 
value: 22.273016884986628 - type: nauc_precision_at_10_std value: -0.32355612520474136 - type: nauc_precision_at_1_diff1 value: 43.86075692062394 - type: nauc_precision_at_1_max value: 20.51270620979276 - type: nauc_precision_at_1_std value: -7.589704558075294 - type: nauc_precision_at_20_diff1 value: 21.29826064932648 - type: nauc_precision_at_20_max value: 19.79498027543001 - type: nauc_precision_at_20_std value: 2.804941576632282 - type: nauc_precision_at_3_diff1 value: 33.72177316592598 - type: nauc_precision_at_3_max value: 22.691241202228518 - type: nauc_precision_at_3_std value: -2.7085967541341853 - type: nauc_precision_at_5_diff1 value: 30.51704379057159 - type: nauc_precision_at_5_max value: 24.287775910544436 - type: nauc_precision_at_5_std value: 0.6318618555538418 - type: nauc_recall_at_1000_diff1 value: 16.14163529457628 - type: nauc_recall_at_1000_max value: 30.255937330833625 - type: nauc_recall_at_1000_std value: 34.82149396857235 - type: nauc_recall_at_100_diff1 value: 24.81738199141423 - type: nauc_recall_at_100_max value: 17.622405730191517 - type: nauc_recall_at_100_std value: 9.943278532212068 - type: nauc_recall_at_10_diff1 value: 34.03447281460739 - type: nauc_recall_at_10_max value: 22.077681180504047 - type: nauc_recall_at_10_std value: -5.772153803762581 - type: nauc_recall_at_1_diff1 value: 45.84841941041374 - type: nauc_recall_at_1_max value: 20.461574274794568 - type: nauc_recall_at_1_std value: -7.769870515581234 - type: nauc_recall_at_20_diff1 value: 33.91749085377916 - type: nauc_recall_at_20_max value: 20.226869969726543 - type: nauc_recall_at_20_std value: -4.369285076602888 - type: nauc_recall_at_3_diff1 value: 38.25575445199975 - type: nauc_recall_at_3_max value: 21.402983769895837 - type: nauc_recall_at_3_std value: -5.96278802416301 - type: nauc_recall_at_5_diff1 value: 36.17314539524256 - type: nauc_recall_at_5_max value: 23.115551795773314 - type: nauc_recall_at_5_std value: -3.8407187471333697 - type: ndcg_at_1 value: 24.954 - type: ndcg_at_10 value: 33.06 - type: ndcg_at_100 value: 37.751000000000005 - type: ndcg_at_1000 value: 40.477000000000004 - type: ndcg_at_20 value: 34.587 - type: ndcg_at_3 value: 29.666999999999998 - type: ndcg_at_5 value: 30.929000000000002 - type: precision_at_1 value: 24.954 - type: precision_at_10 value: 4.972 - type: precision_at_100 value: 0.799 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_20 value: 2.874 - type: precision_at_3 value: 12.446 - type: precision_at_5 value: 8.244 - type: recall_at_1 value: 22.683 - type: recall_at_10 value: 42.775 - type: recall_at_100 value: 65.05300000000001 - type: recall_at_1000 value: 85.251 - type: recall_at_20 value: 48.512 - type: recall_at_3 value: 33.423 - type: recall_at_5 value: 36.571 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER (default) revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: main_score value: 25.713 - type: map_at_1 value: 10.995000000000001 - type: map_at_10 value: 18.183 - type: map_at_100 value: 19.758 - type: map_at_1000 value: 19.93 - type: map_at_20 value: 19.023 - type: map_at_3 value: 15.126999999999999 - type: map_at_5 value: 16.521 - type: mrr_at_1 value: 23.908794788273617 - type: mrr_at_10 value: 34.419626699756996 - type: mrr_at_100 value: 35.42205880765744 - type: mrr_at_1000 value: 35.465636585855435 - type: mrr_at_20 value: 35.04560320193987 - type: mrr_at_3 value: 31.31378935939197 - type: mrr_at_5 value: 32.98154180238871 - type: 
nauc_map_at_1000_diff1 value: 30.808649871031978 - type: nauc_map_at_1000_max value: 38.44733700268257 - type: nauc_map_at_1000_std value: 24.83849154952647 - type: nauc_map_at_100_diff1 value: 30.817681439188565 - type: nauc_map_at_100_max value: 38.38165009049118 - type: nauc_map_at_100_std value: 24.75945437667734 - type: nauc_map_at_10_diff1 value: 31.016072728955457 - type: nauc_map_at_10_max value: 37.78482154934025 - type: nauc_map_at_10_std value: 22.73087477402899 - type: nauc_map_at_1_diff1 value: 38.13786017193742 - type: nauc_map_at_1_max value: 34.897924276187446 - type: nauc_map_at_1_std value: 15.197914019142733 - type: nauc_map_at_20_diff1 value: 30.93811389613207 - type: nauc_map_at_20_max value: 38.018621558175084 - type: nauc_map_at_20_std value: 23.87402074626538 - type: nauc_map_at_3_diff1 value: 32.694558487234204 - type: nauc_map_at_3_max value: 37.452175644150344 - type: nauc_map_at_3_std value: 20.06796990357737 - type: nauc_map_at_5_diff1 value: 31.654957870346784 - type: nauc_map_at_5_max value: 37.04115114192235 - type: nauc_map_at_5_std value: 21.129693545324375 - type: nauc_mrr_at_1000_diff1 value: 29.802772421913403 - type: nauc_mrr_at_1000_max value: 38.000278050301176 - type: nauc_mrr_at_1000_std value: 23.48992856904152 - type: nauc_mrr_at_100_diff1 value: 29.788014379597026 - type: nauc_mrr_at_100_max value: 38.0070275486147 - type: nauc_mrr_at_100_std value: 23.522736661530086 - type: nauc_mrr_at_10_diff1 value: 29.5812602078958 - type: nauc_mrr_at_10_max value: 37.73314132006107 - type: nauc_mrr_at_10_std value: 23.34339817425411 - type: nauc_mrr_at_1_diff1 value: 36.24696165314146 - type: nauc_mrr_at_1_max value: 36.63498565688475 - type: nauc_mrr_at_1_std value: 16.627906626261446 - type: nauc_mrr_at_20_diff1 value: 29.765297131181562 - type: nauc_mrr_at_20_max value: 37.8739248069123 - type: nauc_mrr_at_20_std value: 23.44526626055555 - type: nauc_mrr_at_3_diff1 value: 30.428492046004795 - type: nauc_mrr_at_3_max value: 37.917848006886125 - type: nauc_mrr_at_3_std value: 21.90161780585706 - type: nauc_mrr_at_5_diff1 value: 29.93977431566972 - type: nauc_mrr_at_5_max value: 37.69690203746751 - type: nauc_mrr_at_5_std value: 22.75274068799061 - type: nauc_ndcg_at_1000_diff1 value: 27.523183792167266 - type: nauc_ndcg_at_1000_max value: 40.93757048012577 - type: nauc_ndcg_at_1000_std value: 32.30396817658341 - type: nauc_ndcg_at_100_diff1 value: 27.454763301587064 - type: nauc_ndcg_at_100_max value: 40.45039618287942 - type: nauc_ndcg_at_100_std value: 31.795801743619663 - type: nauc_ndcg_at_10_diff1 value: 28.012456489936806 - type: nauc_ndcg_at_10_max value: 38.045278212869825 - type: nauc_ndcg_at_10_std value: 25.963041085823978 - type: nauc_ndcg_at_1_diff1 value: 35.99513984271449 - type: nauc_ndcg_at_1_max value: 36.62771507516844 - type: nauc_ndcg_at_1_std value: 16.726124822038052 - type: nauc_ndcg_at_20_diff1 value: 28.012111240688963 - type: nauc_ndcg_at_20_max value: 38.667107321330555 - type: nauc_ndcg_at_20_std value: 28.198245721076976 - type: nauc_ndcg_at_3_diff1 value: 30.33073102826854 - type: nauc_ndcg_at_3_max value: 37.995789997615354 - type: nauc_ndcg_at_3_std value: 22.304331918813876 - type: nauc_ndcg_at_5_diff1 value: 29.141028641237632 - type: nauc_ndcg_at_5_max value: 37.2113360591228 - type: nauc_ndcg_at_5_std value: 23.53066714165745 - type: nauc_precision_at_1000_diff1 value: -1.0646702024743917 - type: nauc_precision_at_1000_max value: 19.304218995700534 - type: nauc_precision_at_1000_std value: 31.73840122818843 - type: 
nauc_precision_at_100_diff1 value: 5.427804568412734 - type: nauc_precision_at_100_max value: 27.90881278884377 - type: nauc_precision_at_100_std value: 38.45326235114876 - type: nauc_precision_at_10_diff1 value: 14.252021242340863 - type: nauc_precision_at_10_max value: 32.047078663067914 - type: nauc_precision_at_10_std value: 30.621835328899426 - type: nauc_precision_at_1_diff1 value: 35.99513984271449 - type: nauc_precision_at_1_max value: 36.62771507516844 - type: nauc_precision_at_1_std value: 16.726124822038052 - type: nauc_precision_at_20_diff1 value: 12.017354269524972 - type: nauc_precision_at_20_max value: 29.906152963561322 - type: nauc_precision_at_20_std value: 33.764105037332264 - type: nauc_precision_at_3_diff1 value: 23.486354895398577 - type: nauc_precision_at_3_max value: 38.45096435794749 - type: nauc_precision_at_3_std value: 26.636452479567645 - type: nauc_precision_at_5_diff1 value: 19.574760607896973 - type: nauc_precision_at_5_max value: 34.51474571826715 - type: nauc_precision_at_5_std value: 28.514859235740904 - type: nauc_recall_at_1000_diff1 value: 12.801905007251246 - type: nauc_recall_at_1000_max value: 37.49463996225108 - type: nauc_recall_at_1000_std value: 45.46087045204742 - type: nauc_recall_at_100_diff1 value: 15.082886168560034 - type: nauc_recall_at_100_max value: 35.720813725614 - type: nauc_recall_at_100_std value: 39.876934524809215 - type: nauc_recall_at_10_diff1 value: 20.08086437796489 - type: nauc_recall_at_10_max value: 33.418507169063815 - type: nauc_recall_at_10_std value: 27.309080075299562 - type: nauc_recall_at_1_diff1 value: 38.13786017193742 - type: nauc_recall_at_1_max value: 34.897924276187446 - type: nauc_recall_at_1_std value: 15.197914019142733 - type: nauc_recall_at_20_diff1 value: 18.984980462200134 - type: nauc_recall_at_20_max value: 32.95474022914299 - type: nauc_recall_at_20_std value: 30.77553423574554 - type: nauc_recall_at_3_diff1 value: 26.670776366276865 - type: nauc_recall_at_3_max value: 37.07230392845629 - type: nauc_recall_at_3_std value: 23.385309818709757 - type: nauc_recall_at_5_diff1 value: 23.45569235165577 - type: nauc_recall_at_5_max value: 34.014688386664524 - type: nauc_recall_at_5_std value: 24.50194439244803 - type: ndcg_at_1 value: 23.974 - type: ndcg_at_10 value: 25.713 - type: ndcg_at_100 value: 32.349 - type: ndcg_at_1000 value: 35.615 - type: ndcg_at_20 value: 28.28 - type: ndcg_at_3 value: 20.761 - type: ndcg_at_5 value: 22.225 - type: precision_at_1 value: 23.974 - type: precision_at_10 value: 8.052 - type: precision_at_100 value: 1.5110000000000001 - type: precision_at_1000 value: 0.211 - type: precision_at_20 value: 5.106999999999999 - type: precision_at_3 value: 15.157000000000002 - type: precision_at_5 value: 11.557 - type: recall_at_1 value: 10.995000000000001 - type: recall_at_10 value: 31.05 - type: recall_at_100 value: 54.233 - type: recall_at_1000 value: 72.75500000000001 - type: recall_at_20 value: 38.442 - type: recall_at_3 value: 18.839 - type: recall_at_5 value: 23.26 task: type: Retrieval - dataset: config: default name: MTEB DBPedia (default) revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: main_score value: 40.091 - type: map_at_1 value: 8.112 - type: map_at_10 value: 18.911 - type: map_at_100 value: 27.29 - type: map_at_1000 value: 28.749000000000002 - type: map_at_20 value: 22.187 - type: map_at_3 value: 13.177 - type: map_at_5 value: 15.723999999999998 - type: mrr_at_1 value: 64.75 - type: mrr_at_10 value: 73.0328373015873 - type: 
mrr_at_100 value: 73.3904467983012 - type: mrr_at_1000 value: 73.40582528487944 - type: mrr_at_20 value: 73.25613317925624 - type: mrr_at_3 value: 71.58333333333333 - type: mrr_at_5 value: 72.52083333333333 - type: nauc_map_at_1000_diff1 value: 30.326073419291667 - type: nauc_map_at_1000_max value: 41.2485655499243 - type: nauc_map_at_1000_std value: 34.68797882732488 - type: nauc_map_at_100_diff1 value: 30.250567651424635 - type: nauc_map_at_100_max value: 39.591743243203275 - type: nauc_map_at_100_std value: 32.14962028433263 - type: nauc_map_at_10_diff1 value: 28.30330426974147 - type: nauc_map_at_10_max value: 24.685858800003153 - type: nauc_map_at_10_std value: 6.991461788881313 - type: nauc_map_at_1_diff1 value: 37.84825245885128 - type: nauc_map_at_1_max value: 10.784383140794167 - type: nauc_map_at_1_std value: -12.413788028731759 - type: nauc_map_at_20_diff1 value: 30.56644002866712 - type: nauc_map_at_20_max value: 32.09850095008104 - type: nauc_map_at_20_std value: 17.68312732143373 - type: nauc_map_at_3_diff1 value: 26.94636553986902 - type: nauc_map_at_3_max value: 13.716258156642672 - type: nauc_map_at_3_std value: -7.919396887763491 - type: nauc_map_at_5_diff1 value: 26.703766272524305 - type: nauc_map_at_5_max value: 18.493432579075815 - type: nauc_map_at_5_std value: -1.7953102028408285 - type: nauc_mrr_at_1000_diff1 value: 56.5585700690547 - type: nauc_mrr_at_1000_max value: 68.59723304665478 - type: nauc_mrr_at_1000_std value: 41.65741817361127 - type: nauc_mrr_at_100_diff1 value: 56.56488475063903 - type: nauc_mrr_at_100_max value: 68.59436880973041 - type: nauc_mrr_at_100_std value: 41.64008885243909 - type: nauc_mrr_at_10_diff1 value: 56.57992847970396 - type: nauc_mrr_at_10_max value: 68.54809322422658 - type: nauc_mrr_at_10_std value: 41.637196787701605 - type: nauc_mrr_at_1_diff1 value: 59.49013430944212 - type: nauc_mrr_at_1_max value: 67.51266363522255 - type: nauc_mrr_at_1_std value: 39.159077933489094 - type: nauc_mrr_at_20_diff1 value: 56.322141799066195 - type: nauc_mrr_at_20_max value: 68.41241085079113 - type: nauc_mrr_at_20_std value: 41.74023776153815 - type: nauc_mrr_at_3_diff1 value: 56.43465566121455 - type: nauc_mrr_at_3_max value: 69.32027688455301 - type: nauc_mrr_at_3_std value: 42.35441414676036 - type: nauc_mrr_at_5_diff1 value: 56.185426652218126 - type: nauc_mrr_at_5_max value: 68.68507625781251 - type: nauc_mrr_at_5_std value: 42.227673261247816 - type: nauc_ndcg_at_1000_diff1 value: 38.452991805224926 - type: nauc_ndcg_at_1000_max value: 55.49295294630129 - type: nauc_ndcg_at_1000_std value: 47.669258273236046 - type: nauc_ndcg_at_100_diff1 value: 37.94112950003329 - type: nauc_ndcg_at_100_max value: 50.68816850295493 - type: nauc_ndcg_at_100_std value: 40.72315230606931 - type: nauc_ndcg_at_10_diff1 value: 38.47467764455152 - type: nauc_ndcg_at_10_max value: 49.25673297040027 - type: nauc_ndcg_at_10_std value: 36.76815739343767 - type: nauc_ndcg_at_1_diff1 value: 54.434593584664995 - type: nauc_ndcg_at_1_max value: 57.61369658753043 - type: nauc_ndcg_at_1_std value: 33.10284117958805 - type: nauc_ndcg_at_20_diff1 value: 38.3053661549299 - type: nauc_ndcg_at_20_max value: 49.26702623701029 - type: nauc_ndcg_at_20_std value: 36.78366426340987 - type: nauc_ndcg_at_3_diff1 value: 38.34783510078573 - type: nauc_ndcg_at_3_max value: 51.181351973892085 - type: nauc_ndcg_at_3_std value: 35.13771937716931 - type: nauc_ndcg_at_5_diff1 value: 38.73137682217783 - type: nauc_ndcg_at_5_max value: 51.289826741923875 - type: nauc_ndcg_at_5_std value: 
36.76670998246709 - type: nauc_precision_at_1000_diff1 value: -8.37698697546597 - type: nauc_precision_at_1000_max value: 4.649648259545355 - type: nauc_precision_at_1000_std value: 15.100762512885371 - type: nauc_precision_at_100_diff1 value: 4.538510496829277 - type: nauc_precision_at_100_max value: 33.573044920932965 - type: nauc_precision_at_100_std value: 50.15177354474223 - type: nauc_precision_at_10_diff1 value: 16.03217990213501 - type: nauc_precision_at_10_max value: 45.22978979054545 - type: nauc_precision_at_10_std value: 53.103286665555295 - type: nauc_precision_at_1_diff1 value: 59.49013430944212 - type: nauc_precision_at_1_max value: 67.51266363522255 - type: nauc_precision_at_1_std value: 39.159077933489094 - type: nauc_precision_at_20_diff1 value: 13.705605238285958 - type: nauc_precision_at_20_max value: 44.08365262009368 - type: nauc_precision_at_20_std value: 56.050420219607155 - type: nauc_precision_at_3_diff1 value: 21.409861522316014 - type: nauc_precision_at_3_max value: 48.93702948445578 - type: nauc_precision_at_3_std value: 42.8419067771303 - type: nauc_precision_at_5_diff1 value: 20.1310639195609 - type: nauc_precision_at_5_max value: 49.59134352761235 - type: nauc_precision_at_5_std value: 48.98546957350543 - type: nauc_recall_at_1000_diff1 value: 27.181172941984112 - type: nauc_recall_at_1000_max value: 49.20832060504127 - type: nauc_recall_at_1000_std value: 50.58754027710416 - type: nauc_recall_at_100_diff1 value: 25.831239736658713 - type: nauc_recall_at_100_max value: 37.92978899965714 - type: nauc_recall_at_100_std value: 32.84155059838547 - type: nauc_recall_at_10_diff1 value: 21.03971256731199 - type: nauc_recall_at_10_max value: 16.34542184400448 - type: nauc_recall_at_10_std value: 1.624004078039708 - type: nauc_recall_at_1_diff1 value: 37.84825245885128 - type: nauc_recall_at_1_max value: 10.784383140794167 - type: nauc_recall_at_1_std value: -12.413788028731759 - type: nauc_recall_at_20_diff1 value: 23.612410438391652 - type: nauc_recall_at_20_max value: 24.731496668584725 - type: nauc_recall_at_20_std value: 11.94162779763853 - type: nauc_recall_at_3_diff1 value: 21.124250217970754 - type: nauc_recall_at_3_max value: 9.581953839031879 - type: nauc_recall_at_3_std value: -9.955224094610848 - type: nauc_recall_at_5_diff1 value: 20.272821143755714 - type: nauc_recall_at_5_max value: 12.80122421686649 - type: nauc_recall_at_5_std value: -4.822509659730001 - type: ndcg_at_1 value: 52.87500000000001 - type: ndcg_at_10 value: 40.091 - type: ndcg_at_100 value: 45.007999999999996 - type: ndcg_at_1000 value: 51.522 - type: ndcg_at_20 value: 39.953 - type: ndcg_at_3 value: 44.627 - type: ndcg_at_5 value: 41.748000000000005 - type: precision_at_1 value: 64.75 - type: precision_at_10 value: 32.324999999999996 - type: precision_at_100 value: 10.583 - type: precision_at_1000 value: 1.992 - type: precision_at_20 value: 25.15 - type: precision_at_3 value: 48.5 - type: precision_at_5 value: 40.8 - type: recall_at_1 value: 8.112 - type: recall_at_10 value: 24.769 - type: recall_at_100 value: 51.92400000000001 - type: recall_at_1000 value: 72.60799999999999 - type: recall_at_20 value: 32.085 - type: recall_at_3 value: 14.707999999999998 - type: recall_at_5 value: 18.881 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 74.88499999999999 - type: f1 value: 69.55769956653745 - type: f1_weighted value: 
75.98938892167276 - type: main_score value: 74.88499999999999 task: type: Classification - dataset: config: default name: MTEB FEVER (default) revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: main_score value: 86.088 - type: map_at_1 value: 74.21 - type: map_at_10 value: 82.238 - type: map_at_100 value: 82.467 - type: map_at_1000 value: 82.48 - type: map_at_20 value: 82.38 - type: map_at_3 value: 81.178 - type: map_at_5 value: 81.882 - type: mrr_at_1 value: 80.04800480048004 - type: mrr_at_10 value: 87.28162697222103 - type: mrr_at_100 value: 87.36425501689853 - type: mrr_at_1000 value: 87.36494888408146 - type: mrr_at_20 value: 87.33488767030532 - type: mrr_at_3 value: 86.5011501150115 - type: mrr_at_5 value: 87.04345434543454 - type: nauc_map_at_1000_diff1 value: 46.86807158039652 - type: nauc_map_at_1000_max value: 17.537735239936584 - type: nauc_map_at_1000_std value: -6.180991548000637 - type: nauc_map_at_100_diff1 value: 46.840981153123515 - type: nauc_map_at_100_max value: 17.51241604543591 - type: nauc_map_at_100_std value: -6.19572402233368 - type: nauc_map_at_10_diff1 value: 46.63164937877156 - type: nauc_map_at_10_max value: 17.396231277218714 - type: nauc_map_at_10_std value: -6.328960389468633 - type: nauc_map_at_1_diff1 value: 51.91442444295392 - type: nauc_map_at_1_max value: 14.772868336313651 - type: nauc_map_at_1_std value: -7.924628073687737 - type: nauc_map_at_20_diff1 value: 46.78996154399 - type: nauc_map_at_20_max value: 17.52594082408568 - type: nauc_map_at_20_std value: -6.2535816636418255 - type: nauc_map_at_3_diff1 value: 46.86720061616425 - type: nauc_map_at_3_max value: 17.17282268255638 - type: nauc_map_at_3_std value: -7.100454400283953 - type: nauc_map_at_5_diff1 value: 46.743320728340485 - type: nauc_map_at_5_max value: 17.22026822962506 - type: nauc_map_at_5_std value: -6.593983297795947 - type: nauc_mrr_at_1000_diff1 value: 64.22963921921831 - type: nauc_mrr_at_1000_max value: 22.50147928007347 - type: nauc_mrr_at_1000_std value: -10.753338651031981 - type: nauc_mrr_at_100_diff1 value: 64.22599646741416 - type: nauc_mrr_at_100_max value: 22.49976292804203 - type: nauc_mrr_at_100_std value: -10.753324625089736 - type: nauc_mrr_at_10_diff1 value: 64.24857003564016 - type: nauc_mrr_at_10_max value: 22.721448283312323 - type: nauc_mrr_at_10_std value: -10.698659951469375 - type: nauc_mrr_at_1_diff1 value: 65.80017393845672 - type: nauc_mrr_at_1_max value: 19.56658619771462 - type: nauc_mrr_at_1_std value: -10.691529848056236 - type: nauc_mrr_at_20_diff1 value: 64.22606211105564 - type: nauc_mrr_at_20_max value: 22.60630203277465 - type: nauc_mrr_at_20_std value: -10.698352035527936 - type: nauc_mrr_at_3_diff1 value: 64.03189495070804 - type: nauc_mrr_at_3_max value: 23.197599099302078 - type: nauc_mrr_at_3_std value: -10.941260656610341 - type: nauc_mrr_at_5_diff1 value: 64.21946450636831 - type: nauc_mrr_at_5_max value: 22.869883457504613 - type: nauc_mrr_at_5_std value: -10.773375222905306 - type: nauc_ndcg_at_1000_diff1 value: 48.18634946007256 - type: nauc_ndcg_at_1000_max value: 19.635685645181443 - type: nauc_ndcg_at_1000_std value: -5.008615485203909 - type: nauc_ndcg_at_100_diff1 value: 47.460702424024646 - type: nauc_ndcg_at_100_max value: 19.197829510466093 - type: nauc_ndcg_at_100_std value: -5.141098235552701 - type: nauc_ndcg_at_10_diff1 value: 46.75967320832195 - type: nauc_ndcg_at_10_max value: 19.162998560532944 - type: nauc_ndcg_at_10_std value: -5.680454888720109 - type: nauc_ndcg_at_1_diff1 
value: 65.80017393845672 - type: nauc_ndcg_at_1_max value: 19.56658619771462 - type: nauc_ndcg_at_1_std value: -10.691529848056236 - type: nauc_ndcg_at_20_diff1 value: 47.15063801450417 - type: nauc_ndcg_at_20_max value: 19.387976860064036 - type: nauc_ndcg_at_20_std value: -5.434429887556901 - type: nauc_ndcg_at_3_diff1 value: 48.48013879703285 - type: nauc_ndcg_at_3_max value: 19.563845683013074 - type: nauc_ndcg_at_3_std value: -7.306366856511263 - type: nauc_ndcg_at_5_diff1 value: 47.4477936851643 - type: nauc_ndcg_at_5_max value: 19.12745930840238 - type: nauc_ndcg_at_5_std value: -6.338914655492511 - type: nauc_precision_at_1000_diff1 value: -4.975768805829236 - type: nauc_precision_at_1000_max value: 10.078421203817527 - type: nauc_precision_at_1000_std value: 10.15753365579419 - type: nauc_precision_at_100_diff1 value: -7.411336519288538 - type: nauc_precision_at_100_max value: 11.116507499213043 - type: nauc_precision_at_100_std value: 11.608241877542543 - type: nauc_precision_at_10_diff1 value: 2.6403449208341274 - type: nauc_precision_at_10_max value: 20.668398953238633 - type: nauc_precision_at_10_std value: 7.433281722501917 - type: nauc_precision_at_1_diff1 value: 65.80017393845672 - type: nauc_precision_at_1_max value: 19.56658619771462 - type: nauc_precision_at_1_std value: -10.691529848056236 - type: nauc_precision_at_20_diff1 value: -1.286553967637511 - type: nauc_precision_at_20_max value: 17.30405603464926 - type: nauc_precision_at_20_std value: 9.234773655809756 - type: nauc_precision_at_3_diff1 value: 31.364166410646675 - type: nauc_precision_at_3_max value: 26.397101881343527 - type: nauc_precision_at_3_std value: -5.0543954546843946 - type: nauc_precision_at_5_diff1 value: 17.1466778085294 - type: nauc_precision_at_5_max value: 23.18905254179433 - type: nauc_precision_at_5_std value: 1.6051724821489612 - type: nauc_recall_at_1000_diff1 value: -3.9377049069087935 - type: nauc_recall_at_1000_max value: 27.168346654704095 - type: nauc_recall_at_1000_std value: 38.58463265497753 - type: nauc_recall_at_100_diff1 value: -1.886570080947599 - type: nauc_recall_at_100_max value: 16.12930964320666 - type: nauc_recall_at_100_std value: 21.616391259129152 - type: nauc_recall_at_10_diff1 value: 15.941506685002588 - type: nauc_recall_at_10_max value: 19.141995524332728 - type: nauc_recall_at_10_std value: 5.860480767168416 - type: nauc_recall_at_1_diff1 value: 51.91442444295392 - type: nauc_recall_at_1_max value: 14.772868336313651 - type: nauc_recall_at_1_std value: -7.924628073687737 - type: nauc_recall_at_20_diff1 value: 11.583722825668058 - type: nauc_recall_at_20_max value: 19.867221612869876 - type: nauc_recall_at_20_std value: 10.141960757453084 - type: nauc_recall_at_3_diff1 value: 32.30936424972365 - type: nauc_recall_at_3_max value: 20.11705236473992 - type: nauc_recall_at_3_std value: -3.525144821962635 - type: nauc_recall_at_5_diff1 value: 25.68392975410304 - type: nauc_recall_at_5_max value: 19.221295609032595 - type: nauc_recall_at_5_std value: 0.576160647152633 - type: ndcg_at_1 value: 80.048 - type: ndcg_at_10 value: 86.088 - type: ndcg_at_100 value: 86.911 - type: ndcg_at_1000 value: 87.125 - type: ndcg_at_20 value: 86.468 - type: ndcg_at_3 value: 84.375 - type: ndcg_at_5 value: 85.384 - type: precision_at_1 value: 80.048 - type: precision_at_10 value: 10.236 - type: precision_at_100 value: 1.085 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_20 value: 5.2330000000000005 - type: precision_at_3 value: 32.078 - type: precision_at_5 value: 
19.895 - type: recall_at_1 value: 74.21 - type: recall_at_10 value: 93.077 - type: recall_at_100 value: 96.348 - type: recall_at_1000 value: 97.65700000000001 - type: recall_at_20 value: 94.36099999999999 - type: recall_at_3 value: 88.337 - type: recall_at_5 value: 90.948 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 (default) revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: main_score value: 45.405 - type: map_at_1 value: 22.325 - type: map_at_10 value: 36.975 - type: map_at_100 value: 38.846000000000004 - type: map_at_1000 value: 39.012 - type: map_at_20 value: 37.958999999999996 - type: map_at_3 value: 32.208 - type: map_at_5 value: 34.928 - type: mrr_at_1 value: 44.29012345679013 - type: mrr_at_10 value: 54.02030668234372 - type: mrr_at_100 value: 54.72897336245347 - type: mrr_at_1000 value: 54.76320283944561 - type: mrr_at_20 value: 54.50419077165938 - type: mrr_at_3 value: 51.41460905349795 - type: mrr_at_5 value: 53.11213991769548 - type: nauc_map_at_1000_diff1 value: 42.33950505310022 - type: nauc_map_at_1000_max value: 32.814158723141745 - type: nauc_map_at_1000_std value: -4.5297230544932825 - type: nauc_map_at_100_diff1 value: 42.316327406548695 - type: nauc_map_at_100_max value: 32.706900013479725 - type: nauc_map_at_100_std value: -4.564571222935577 - type: nauc_map_at_10_diff1 value: 42.17734361420548 - type: nauc_map_at_10_max value: 31.527366385827854 - type: nauc_map_at_10_std value: -5.559289874353945 - type: nauc_map_at_1_diff1 value: 47.33003471166015 - type: nauc_map_at_1_max value: 21.535228737020457 - type: nauc_map_at_1_std value: -11.649016586524858 - type: nauc_map_at_20_diff1 value: 42.11015618170868 - type: nauc_map_at_20_max value: 32.18582282622051 - type: nauc_map_at_20_std value: -5.042968429993695 - type: nauc_map_at_3_diff1 value: 43.26686524198236 - type: nauc_map_at_3_max value: 28.849395895564083 - type: nauc_map_at_3_std value: -6.976952334117308 - type: nauc_map_at_5_diff1 value: 42.95893517901293 - type: nauc_map_at_5_max value: 30.871999781837612 - type: nauc_map_at_5_std value: -6.149645006139908 - type: nauc_mrr_at_1000_diff1 value: 51.23708914241626 - type: nauc_mrr_at_1000_max value: 40.298960389709 - type: nauc_mrr_at_1000_std value: -5.188577391773796 - type: nauc_mrr_at_100_diff1 value: 51.24001351681103 - type: nauc_mrr_at_100_max value: 40.318755039260886 - type: nauc_mrr_at_100_std value: -5.164744512057911 - type: nauc_mrr_at_10_diff1 value: 51.116323465364566 - type: nauc_mrr_at_10_max value: 40.18322650792177 - type: nauc_mrr_at_10_std value: -5.42707335446156 - type: nauc_mrr_at_1_diff1 value: 54.623685354463625 - type: nauc_mrr_at_1_max value: 38.52800456113852 - type: nauc_mrr_at_1_std value: -8.561342078884513 - type: nauc_mrr_at_20_diff1 value: 51.082878864924076 - type: nauc_mrr_at_20_max value: 40.25224355621811 - type: nauc_mrr_at_20_std value: -5.1386035874860925 - type: nauc_mrr_at_3_diff1 value: 51.28771495504919 - type: nauc_mrr_at_3_max value: 40.167661702884644 - type: nauc_mrr_at_3_std value: -6.672938174195537 - type: nauc_mrr_at_5_diff1 value: 51.386811950131026 - type: nauc_mrr_at_5_max value: 40.29452825209631 - type: nauc_mrr_at_5_std value: -6.134184637482388 - type: nauc_ndcg_at_1000_diff1 value: 44.46948002237412 - type: nauc_ndcg_at_1000_max value: 37.882877667376576 - type: nauc_ndcg_at_1000_std value: -0.2441149985965938 - type: nauc_ndcg_at_100_diff1 value: 43.96014037390138 - type: nauc_ndcg_at_100_max value: 36.96423036666587 - type: 
nauc_ndcg_at_100_std value: 0.21228554480998071 - type: nauc_ndcg_at_10_diff1 value: 42.889923047150226 - type: nauc_ndcg_at_10_max value: 33.95406097914127 - type: nauc_ndcg_at_10_std value: -3.3077129078149796 - type: nauc_ndcg_at_1_diff1 value: 54.623685354463625 - type: nauc_ndcg_at_1_max value: 38.52800456113852 - type: nauc_ndcg_at_1_std value: -8.561342078884513 - type: nauc_ndcg_at_20_diff1 value: 42.806846626799626 - type: nauc_ndcg_at_20_max value: 35.01566424207401 - type: nauc_ndcg_at_20_std value: -2.01466646308545 - type: nauc_ndcg_at_3_diff1 value: 43.29070711758635 - type: nauc_ndcg_at_3_max value: 35.81474510295669 - type: nauc_ndcg_at_3_std value: -4.937712863159993 - type: nauc_ndcg_at_5_diff1 value: 43.533204764747346 - type: nauc_ndcg_at_5_max value: 34.67200578229001 - type: nauc_ndcg_at_5_std value: -4.220153646752217 - type: nauc_precision_at_1000_diff1 value: -0.24162611684046686 - type: nauc_precision_at_1000_max value: 26.610031730319122 - type: nauc_precision_at_1000_std value: 12.85473387814076 - type: nauc_precision_at_100_diff1 value: 6.593767812518609 - type: nauc_precision_at_100_max value: 32.89478475065496 - type: nauc_precision_at_100_std value: 16.66995461135905 - type: nauc_precision_at_10_diff1 value: 17.48446148168886 - type: nauc_precision_at_10_max value: 36.54732448382068 - type: nauc_precision_at_10_std value: 6.7478320020402 - type: nauc_precision_at_1_diff1 value: 54.623685354463625 - type: nauc_precision_at_1_max value: 38.52800456113852 - type: nauc_precision_at_1_std value: -8.561342078884513 - type: nauc_precision_at_20_diff1 value: 13.039974734569537 - type: nauc_precision_at_20_max value: 36.49695572253983 - type: nauc_precision_at_20_std value: 10.476938728091008 - type: nauc_precision_at_3_diff1 value: 30.19928557150241 - type: nauc_precision_at_3_max value: 38.897101267116554 - type: nauc_precision_at_3_std value: 1.121533090916794 - type: nauc_precision_at_5_diff1 value: 25.33029636435617 - type: nauc_precision_at_5_max value: 39.59677600835699 - type: nauc_precision_at_5_std value: 3.4416095155763244 - type: nauc_recall_at_1000_diff1 value: 34.823080033440434 - type: nauc_recall_at_1000_max value: 43.87066795154745 - type: nauc_recall_at_1000_std value: 42.23182031662749 - type: nauc_recall_at_100_diff1 value: 30.70809572521992 - type: nauc_recall_at_100_max value: 31.598064007837852 - type: nauc_recall_at_100_std value: 20.758185821213164 - type: nauc_recall_at_10_diff1 value: 30.674660204386957 - type: nauc_recall_at_10_max value: 25.13675931430177 - type: nauc_recall_at_10_std value: 1.1493152709013974 - type: nauc_recall_at_1_diff1 value: 47.33003471166015 - type: nauc_recall_at_1_max value: 21.535228737020457 - type: nauc_recall_at_1_std value: -11.649016586524858 - type: nauc_recall_at_20_diff1 value: 28.60023313868174 - type: nauc_recall_at_20_max value: 26.576577612640655 - type: nauc_recall_at_20_std value: 6.331498880910594 - type: nauc_recall_at_3_diff1 value: 36.61359637854836 - type: nauc_recall_at_3_max value: 26.205709444189345 - type: nauc_recall_at_3_std value: -4.41772315378875 - type: nauc_recall_at_5_diff1 value: 34.721622588958894 - type: nauc_recall_at_5_max value: 26.870375540274104 - type: nauc_recall_at_5_std value: -1.2959303042762926 - type: ndcg_at_1 value: 44.29 - type: ndcg_at_10 value: 45.405 - type: ndcg_at_100 value: 52.027 - type: ndcg_at_1000 value: 54.688 - type: ndcg_at_20 value: 47.967999999999996 - type: ndcg_at_3 value: 41.496 - type: ndcg_at_5 value: 42.902 - type: precision_at_1 value: 44.29 
- type: precision_at_10 value: 12.469 - type: precision_at_100 value: 1.9349999999999998 - type: precision_at_1000 value: 0.243 - type: precision_at_20 value: 7.323 - type: precision_at_3 value: 27.622999999999998 - type: precision_at_5 value: 20.34 - type: recall_at_1 value: 22.325 - type: recall_at_10 value: 52.788999999999994 - type: recall_at_100 value: 77.274 - type: recall_at_1000 value: 92.94 - type: recall_at_20 value: 60.714 - type: recall_at_3 value: 37.502 - type: recall_at_5 value: 44.808 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA (default) revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa metrics: - type: main_score value: 66.661 - type: map_at_1 value: 41.418 - type: map_at_10 value: 57.086999999999996 - type: map_at_100 value: 57.888 - type: map_at_1000 value: 57.955 - type: map_at_20 value: 57.544 - type: map_at_3 value: 54.112 - type: map_at_5 value: 55.942 - type: mrr_at_1 value: 82.79540850776502 - type: mrr_at_10 value: 87.24545298650632 - type: mrr_at_100 value: 87.3943716521154 - type: mrr_at_1000 value: 87.40052014901985 - type: mrr_at_20 value: 87.3376988773675 - type: mrr_at_3 value: 86.54287643484132 - type: mrr_at_5 value: 87.0162052667117 - type: nauc_map_at_1000_diff1 value: 13.347058320450778 - type: nauc_map_at_1000_max value: 19.172918193696585 - type: nauc_map_at_1000_std value: 1.6085652199402172 - type: nauc_map_at_100_diff1 value: 13.309459563369677 - type: nauc_map_at_100_max value: 19.142490361521045 - type: nauc_map_at_100_std value: 1.5997757026480046 - type: nauc_map_at_10_diff1 value: 13.821467981397284 - type: nauc_map_at_10_max value: 19.47388049912085 - type: nauc_map_at_10_std value: 0.7945082440633815 - type: nauc_map_at_1_diff1 value: 80.17822133984255 - type: nauc_map_at_1_max value: 56.93232002015388 - type: nauc_map_at_1_std value: -9.565010407038201 - type: nauc_map_at_20_diff1 value: 13.447193497393146 - type: nauc_map_at_20_max value: 19.208078541028097 - type: nauc_map_at_20_std value: 1.2699537557176803 - type: nauc_map_at_3_diff1 value: 16.854345839107967 - type: nauc_map_at_3_max value: 21.648192526975727 - type: nauc_map_at_3_std value: -0.6137487567045511 - type: nauc_map_at_5_diff1 value: 14.543663008536509 - type: nauc_map_at_5_max value: 20.155541895741532 - type: nauc_map_at_5_std value: 0.25148082760110224 - type: nauc_mrr_at_1000_diff1 value: 79.11825919796162 - type: nauc_mrr_at_1000_max value: 60.10563640048739 - type: nauc_mrr_at_1000_std value: -6.726621618014327 - type: nauc_mrr_at_100_diff1 value: 79.11854278578646 - type: nauc_mrr_at_100_max value: 60.11377258817985 - type: nauc_mrr_at_100_std value: -6.704065951576038 - type: nauc_mrr_at_10_diff1 value: 79.07961808239499 - type: nauc_mrr_at_10_max value: 60.2138079214177 - type: nauc_mrr_at_10_std value: -6.74779578820509 - type: nauc_mrr_at_1_diff1 value: 80.25371155548501 - type: nauc_mrr_at_1_max value: 57.01027352172217 - type: nauc_mrr_at_1_std value: -9.682353752598317 - type: nauc_mrr_at_20_diff1 value: 79.08786670986484 - type: nauc_mrr_at_20_max value: 60.139471646688925 - type: nauc_mrr_at_20_std value: -6.720404576075471 - type: nauc_mrr_at_3_diff1 value: 78.93741620023842 - type: nauc_mrr_at_3_max value: 60.31902114928829 - type: nauc_mrr_at_3_std value: -7.066082480981481 - type: nauc_mrr_at_5_diff1 value: 79.06255305350973 - type: nauc_mrr_at_5_max value: 60.344631571197546 - type: nauc_mrr_at_5_std value: -6.788165280997917 - type: nauc_ndcg_at_1000_diff1 value: 17.006951693217548 - type: 
nauc_ndcg_at_1000_max value: 21.854859924097646 - type: nauc_ndcg_at_1000_std value: 4.70138835806943 - type: nauc_ndcg_at_100_diff1 value: 16.195007796313384 - type: nauc_ndcg_at_100_max value: 21.264332841663858 - type: nauc_ndcg_at_100_std value: 4.620999926841355 - type: nauc_ndcg_at_10_diff1 value: 18.327522629298294 - type: nauc_ndcg_at_10_max value: 22.686509071566917 - type: nauc_ndcg_at_10_std value: 1.5527071297942836 - type: nauc_ndcg_at_1_diff1 value: 80.17822133984255 - type: nauc_ndcg_at_1_max value: 56.93232002015388 - type: nauc_ndcg_at_1_std value: -9.565010407038201 - type: nauc_ndcg_at_20_diff1 value: 17.11074173500959 - type: nauc_ndcg_at_20_max value: 21.81160814631424 - type: nauc_ndcg_at_20_std value: 2.858829825220597 - type: nauc_ndcg_at_3_diff1 value: 23.797089205140068 - type: nauc_ndcg_at_3_max value: 26.659269305908296 - type: nauc_ndcg_at_3_std value: -0.7545654502076451 - type: nauc_ndcg_at_5_diff1 value: 20.067483031938934 - type: nauc_ndcg_at_5_max value: 24.23026610511652 - type: nauc_ndcg_at_5_std value: 0.5097749208107711 - type: nauc_precision_at_1000_diff1 value: -21.807728330326697 - type: nauc_precision_at_1000_max value: -2.9835997103120344 - type: nauc_precision_at_1000_std value: 25.81739799194849 - type: nauc_precision_at_100_diff1 value: -16.05478872817429 - type: nauc_precision_at_100_max value: 0.2665969008515287 - type: nauc_precision_at_100_std value: 19.352798394287323 - type: nauc_precision_at_10_diff1 value: -3.3507602135961037 - type: nauc_precision_at_10_max value: 8.867034772304718 - type: nauc_precision_at_10_std value: 6.545361194526079 - type: nauc_precision_at_1_diff1 value: 80.17822133984255 - type: nauc_precision_at_1_max value: 56.93232002015388 - type: nauc_precision_at_1_std value: -9.565010407038201 - type: nauc_precision_at_20_diff1 value: -7.902542409127802 - type: nauc_precision_at_20_max value: 5.62428878283396 - type: nauc_precision_at_20_std value: 10.592045512127914 - type: nauc_precision_at_3_diff1 value: 8.132713424441485 - type: nauc_precision_at_3_max value: 17.99416677485544 - type: nauc_precision_at_3_std value: 1.9785114664304215 - type: nauc_precision_at_5_diff1 value: 1.38596734740728 - type: nauc_precision_at_5_max value: 13.214138500817723 - type: nauc_precision_at_5_std value: 4.15378198762281 - type: nauc_recall_at_1000_diff1 value: -21.807728330326455 - type: nauc_recall_at_1000_max value: -2.9835997103117293 - type: nauc_recall_at_1000_std value: 25.8173979919487 - type: nauc_recall_at_100_diff1 value: -16.054788728174266 - type: nauc_recall_at_100_max value: 0.26659690085157123 - type: nauc_recall_at_100_std value: 19.35279839428729 - type: nauc_recall_at_10_diff1 value: -3.350760213596107 - type: nauc_recall_at_10_max value: 8.86703477230471 - type: nauc_recall_at_10_std value: 6.5453611945261505 - type: nauc_recall_at_1_diff1 value: 80.17822133984255 - type: nauc_recall_at_1_max value: 56.93232002015388 - type: nauc_recall_at_1_std value: -9.565010407038201 - type: nauc_recall_at_20_diff1 value: -7.902542409127704 - type: nauc_recall_at_20_max value: 5.6242887828340375 - type: nauc_recall_at_20_std value: 10.592045512127953 - type: nauc_recall_at_3_diff1 value: 8.132713424441446 - type: nauc_recall_at_3_max value: 17.99416677485538 - type: nauc_recall_at_3_std value: 1.9785114664303751 - type: nauc_recall_at_5_diff1 value: 1.3859673474071779 - type: nauc_recall_at_5_max value: 13.214138500817668 - type: nauc_recall_at_5_std value: 4.153781987622754 - type: ndcg_at_1 value: 82.836 - type: ndcg_at_10 
value: 66.661 - type: ndcg_at_100 value: 69.42399999999999 - type: ndcg_at_1000 value: 70.722 - type: ndcg_at_20 value: 67.777 - type: ndcg_at_3 value: 62.517 - type: ndcg_at_5 value: 64.79700000000001 - type: precision_at_1 value: 82.836 - type: precision_at_10 value: 13.350000000000001 - type: precision_at_100 value: 1.552 - type: precision_at_1000 value: 0.172 - type: precision_at_20 value: 7.034 - type: precision_at_3 value: 38.375 - type: precision_at_5 value: 24.829 - type: recall_at_1 value: 41.418 - type: recall_at_10 value: 66.752 - type: recall_at_100 value: 77.576 - type: recall_at_1000 value: 86.199 - type: recall_at_20 value: 70.338 - type: recall_at_3 value: 57.562000000000005 - type: recall_at_5 value: 62.073 task: type: Retrieval - dataset: config: default name: MTEB ImdbClassification (default) revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 93.58840000000001 - type: ap value: 90.234834378287 - type: ap_weighted value: 90.234834378287 - type: f1 value: 93.58346966422063 - type: f1_weighted value: 93.58346966422063 - type: main_score value: 93.58840000000001 task: type: Classification - dataset: config: default name: MTEB MSMARCO (default) revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: - type: main_score value: 41.48 - type: map_at_1 value: 22.078999999999997 - type: map_at_10 value: 34.416000000000004 - type: map_at_100 value: 35.541 - type: map_at_1000 value: 35.592 - type: map_at_20 value: 35.106 - type: map_at_3 value: 30.470000000000002 - type: map_at_5 value: 32.774 - type: mrr_at_1 value: 22.693409742120345 - type: mrr_at_10 value: 35.02055760221949 - type: mrr_at_100 value: 36.07282466487795 - type: mrr_at_1000 value: 36.11725121701468 - type: mrr_at_20 value: 35.667140877547986 - type: mrr_at_3 value: 31.122254059216814 - type: mrr_at_5 value: 33.40592168099331 - type: nauc_map_at_1000_diff1 value: 33.00333472064972 - type: nauc_map_at_1000_max value: 5.156444947074947 - type: nauc_map_at_1000_std value: -23.103939979826375 - type: nauc_map_at_100_diff1 value: 32.99943906977456 - type: nauc_map_at_100_max value: 5.156792638157342 - type: nauc_map_at_100_std value: -23.09927789432014 - type: nauc_map_at_10_diff1 value: 32.93427060211673 - type: nauc_map_at_10_max value: 5.009847068055439 - type: nauc_map_at_10_std value: -23.69229778425936 - type: nauc_map_at_1_diff1 value: 35.879541770806426 - type: nauc_map_at_1_max value: 4.037000551161811 - type: nauc_map_at_1_std value: -21.066913542507095 - type: nauc_map_at_20_diff1 value: 32.94459306136245 - type: nauc_map_at_20_max value: 5.08450123260384 - type: nauc_map_at_20_std value: -23.367858842401674 - type: nauc_map_at_3_diff1 value: 33.186734646971495 - type: nauc_map_at_3_max value: 4.52958372002426 - type: nauc_map_at_3_std value: -23.407182657661863 - type: nauc_map_at_5_diff1 value: 33.09447602825229 - type: nauc_map_at_5_max value: 4.8295482352066275 - type: nauc_map_at_5_std value: -23.977226416616457 - type: nauc_mrr_at_1000_diff1 value: 32.90248885790994 - type: nauc_mrr_at_1000_max value: 5.345915497836417 - type: nauc_mrr_at_1000_std value: -22.775176728644926 - type: nauc_mrr_at_100_diff1 value: 32.89830733234614 - type: nauc_mrr_at_100_max value: 5.354794932204688 - type: nauc_mrr_at_100_std value: -22.76281634843283 - type: nauc_mrr_at_10_diff1 value: 32.85362740239939 - type: nauc_mrr_at_10_max value: 5.22277263020967 - type: nauc_mrr_at_10_std value: -23.29890783663585 - type: 
nauc_mrr_at_1_diff1 value: 35.8004961400585 - type: nauc_mrr_at_1_max value: 4.07480515690297 - type: nauc_mrr_at_1_std value: -21.157419860722133 - type: nauc_mrr_at_20_diff1 value: 32.831058277421675 - type: nauc_mrr_at_20_max value: 5.30231502729234 - type: nauc_mrr_at_20_std value: -22.995188734787643 - type: nauc_mrr_at_3_diff1 value: 33.06512398614513 - type: nauc_mrr_at_3_max value: 4.6832127086497675 - type: nauc_mrr_at_3_std value: -23.185466086324016 - type: nauc_mrr_at_5_diff1 value: 32.95656016095678 - type: nauc_mrr_at_5_max value: 5.0055516099566475 - type: nauc_mrr_at_5_std value: -23.648076417104612 - type: nauc_ndcg_at_1000_diff1 value: 32.23911068627994 - type: nauc_ndcg_at_1000_max value: 6.340890121521923 - type: nauc_ndcg_at_1000_std value: -21.64542687396577 - type: nauc_ndcg_at_100_diff1 value: 32.11878167303473 - type: nauc_ndcg_at_100_max value: 6.597128552520879 - type: nauc_ndcg_at_100_std value: -21.03041945862791 - type: nauc_ndcg_at_10_diff1 value: 31.78511231016483 - type: nauc_ndcg_at_10_max value: 5.784417481640047 - type: nauc_ndcg_at_10_std value: -24.161027978905647 - type: nauc_ndcg_at_1_diff1 value: 35.74394132968329 - type: nauc_ndcg_at_1_max value: 4.0476454646619215 - type: nauc_ndcg_at_1_std value: -21.16866068260486 - type: nauc_ndcg_at_20_diff1 value: 31.722628551526604 - type: nauc_ndcg_at_20_max value: 6.085473579598258 - type: nauc_ndcg_at_20_std value: -23.01301453978275 - type: nauc_ndcg_at_3_diff1 value: 32.38743175334077 - type: nauc_ndcg_at_3_max value: 4.708074286110014 - type: nauc_ndcg_at_3_std value: -24.005841131351065 - type: nauc_ndcg_at_5_diff1 value: 32.19107640366649 - type: nauc_ndcg_at_5_max value: 5.248392125691872 - type: nauc_ndcg_at_5_std value: -24.9544454485758 - type: nauc_precision_at_1000_diff1 value: -2.0283123762593203 - type: nauc_precision_at_1000_max value: 14.569550330630554 - type: nauc_precision_at_1000_std value: 18.01811212416059 - type: nauc_precision_at_100_diff1 value: 14.463485381374719 - type: nauc_precision_at_100_max value: 16.06415646423591 - type: nauc_precision_at_100_std value: 8.987627462107199 - type: nauc_precision_at_10_diff1 value: 25.530846925228666 - type: nauc_precision_at_10_max value: 8.075830710803086 - type: nauc_precision_at_10_std value: -24.00010341583341 - type: nauc_precision_at_1_diff1 value: 35.74394132968329 - type: nauc_precision_at_1_max value: 4.0476454646619215 - type: nauc_precision_at_1_std value: -21.16866068260486 - type: nauc_precision_at_20_diff1 value: 22.490315165998652 - type: nauc_precision_at_20_max value: 9.695438542678712 - type: nauc_precision_at_20_std value: -16.779150840743586 - type: nauc_precision_at_3_diff1 value: 29.653053865297718 - type: nauc_precision_at_3_max value: 4.956580341717329 - type: nauc_precision_at_3_std value: -25.716768027801912 - type: nauc_precision_at_5_diff1 value: 28.466584677280675 - type: nauc_precision_at_5_max value: 6.035813186905091 - type: nauc_precision_at_5_std value: -27.40096435134959 - type: nauc_recall_at_1000_diff1 value: 16.188777617075157 - type: nauc_recall_at_1000_max value: 45.1160674872711 - type: nauc_recall_at_1000_std value: 50.8993030763505 - type: nauc_recall_at_100_diff1 value: 26.462748511423666 - type: nauc_recall_at_100_max value: 20.17057177381908 - type: nauc_recall_at_100_std value: 6.567222385661084 - type: nauc_recall_at_10_diff1 value: 27.694042744869897 - type: nauc_recall_at_10_max value: 8.193922397003126 - type: nauc_recall_at_10_std value: -25.428481461107726 - type: nauc_recall_at_1_diff1 
value: 35.879541770806426 - type: nauc_recall_at_1_max value: 4.037000551161811 - type: nauc_recall_at_1_std value: -21.066913542507095 - type: nauc_recall_at_20_diff1 value: 26.412542837917503 - type: nauc_recall_at_20_max value: 10.119778040160208 - type: nauc_recall_at_20_std value: -20.353583276762542 - type: nauc_recall_at_3_diff1 value: 30.1723792933633 - type: nauc_recall_at_3_max value: 4.991021506511908 - type: nauc_recall_at_3_std value: -25.61028187578253 - type: nauc_recall_at_5_diff1 value: 29.546460816157307 - type: nauc_recall_at_5_max value: 6.257065735729789 - type: nauc_recall_at_5_std value: -27.757268209659046 - type: ndcg_at_1 value: 22.708000000000002 - type: ndcg_at_10 value: 41.48 - type: ndcg_at_100 value: 46.894999999999996 - type: ndcg_at_1000 value: 48.14 - type: ndcg_at_20 value: 43.918 - type: ndcg_at_3 value: 33.423 - type: ndcg_at_5 value: 37.553 - type: precision_at_1 value: 22.708000000000002 - type: precision_at_10 value: 6.6049999999999995 - type: precision_at_100 value: 0.9329999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_20 value: 3.811 - type: precision_at_3 value: 14.283999999999999 - type: precision_at_5 value: 10.685 - type: recall_at_1 value: 22.078999999999997 - type: recall_at_10 value: 63.269 - type: recall_at_100 value: 88.318 - type: recall_at_1000 value: 97.80799999999999 - type: recall_at_20 value: 72.741 - type: recall_at_3 value: 41.347 - type: recall_at_5 value: 51.271 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 96.0373917008664 - type: f1 value: 95.77672920037678 - type: f1_weighted value: 96.06299804062722 - type: main_score value: 96.0373917008664 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 89.1655266757866 - type: f1 value: 71.6595596649587 - type: f1_weighted value: 90.44597470884298 - type: main_score value: 89.1655266757866 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 76.60390047074647 - type: f1 value: 74.0382414657559 - type: f1_weighted value: 76.53055023019932 - type: main_score value: 76.60390047074647 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 78.93073301950236 - type: f1 value: 78.58195068346751 - type: f1_weighted value: 78.86975899493798 - type: main_score value: 78.93073301950236 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P (default) revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: main_score value: 37.66500681777215 - type: v_measure value: 37.66500681777215 - type: v_measure_std value: 1.4953449515069268 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S (default) revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: main_score value: 35.51021437644991 - type: v_measure value: 35.51021437644991 - type: v_measure_std value: 
1.3321174913629759 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking (default) revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7 split: test type: mteb/mind_small metrics: - type: main_score value: 30.10020452046386 - type: map value: 30.10020452046386 - type: mrr value: 31.096861019258043 - type: nAUC_map_diff1 value: 12.853085612418742 - type: nAUC_map_max value: -20.97077158351351 - type: nAUC_map_std value: -2.459841546804226 - type: nAUC_mrr_diff1 value: 12.08750595893558 - type: nAUC_mrr_max value: -15.502813020230475 - type: nAUC_mrr_std value: -0.8069966088331175 task: type: Reranking - dataset: config: default name: MTEB NFCorpus (default) revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: mteb/nfcorpus metrics: - type: main_score value: 34.725 - type: map_at_1 value: 5.901 - type: map_at_10 value: 12.992999999999999 - type: map_at_100 value: 16.402 - type: map_at_1000 value: 17.896 - type: map_at_20 value: 14.411 - type: map_at_3 value: 9.3 - type: map_at_5 value: 10.906 - type: mrr_at_1 value: 46.13003095975232 - type: mrr_at_10 value: 54.67123691581895 - type: mrr_at_100 value: 55.13154466663215 - type: mrr_at_1000 value: 55.18028030923489 - type: mrr_at_20 value: 54.89203403371564 - type: mrr_at_3 value: 52.47678018575851 - type: mrr_at_5 value: 54.10216718266254 - type: nauc_map_at_1000_diff1 value: 26.097980547292376 - type: nauc_map_at_1000_max value: 31.716612190607847 - type: nauc_map_at_1000_std value: 10.484226609845875 - type: nauc_map_at_100_diff1 value: 26.903184213500687 - type: nauc_map_at_100_max value: 30.254077338590847 - type: nauc_map_at_100_std value: 5.721213154053636 - type: nauc_map_at_10_diff1 value: 30.41995975934737 - type: nauc_map_at_10_max value: 23.720851152044826 - type: nauc_map_at_10_std value: -6.968119243629756 - type: nauc_map_at_1_diff1 value: 45.91087927776542 - type: nauc_map_at_1_max value: 11.368756627277754 - type: nauc_map_at_1_std value: -21.987291617576854 - type: nauc_map_at_20_diff1 value: 28.907069629931854 - type: nauc_map_at_20_max value: 26.70846407056094 - type: nauc_map_at_20_std value: -1.9126005785897775 - type: nauc_map_at_3_diff1 value: 38.73155355719495 - type: nauc_map_at_3_max value: 17.769925571726496 - type: nauc_map_at_3_std value: -15.240426410962574 - type: nauc_map_at_5_diff1 value: 34.6278617589197 - type: nauc_map_at_5_max value: 20.54601986245645 - type: nauc_map_at_5_std value: -11.566817873968779 - type: nauc_mrr_at_1000_diff1 value: 36.64991509982144 - type: nauc_mrr_at_1000_max value: 49.697173212531744 - type: nauc_mrr_at_1000_std value: 26.86511696261478 - type: nauc_mrr_at_100_diff1 value: 36.68743394598715 - type: nauc_mrr_at_100_max value: 49.744202083676264 - type: nauc_mrr_at_100_std value: 26.90232555840209 - type: nauc_mrr_at_10_diff1 value: 36.47029954847764 - type: nauc_mrr_at_10_max value: 49.439023284006 - type: nauc_mrr_at_10_std value: 26.690706480930444 - type: nauc_mrr_at_1_diff1 value: 36.59190142546215 - type: nauc_mrr_at_1_max value: 41.74235868276634 - type: nauc_mrr_at_1_std value: 18.414274177675807 - type: nauc_mrr_at_20_diff1 value: 36.681072119690086 - type: nauc_mrr_at_20_max value: 49.800936007548934 - type: nauc_mrr_at_20_std value: 26.961504252981683 - type: nauc_mrr_at_3_diff1 value: 36.63303178691115 - type: nauc_mrr_at_3_max value: 48.628730526802904 - type: nauc_mrr_at_3_std value: 25.157181938589225 - type: nauc_mrr_at_5_diff1 value: 36.41948638139246 - type: nauc_mrr_at_5_max value: 49.180007480727134 - type: 
nauc_mrr_at_5_std value: 26.145567865350543 - type: nauc_ndcg_at_1000_diff1 value: 26.257313381009283 - type: nauc_ndcg_at_1000_max value: 46.45094846583072 - type: nauc_ndcg_at_1000_std value: 30.74855470405661 - type: nauc_ndcg_at_100_diff1 value: 25.337713280261774 - type: nauc_ndcg_at_100_max value: 42.51314175786316 - type: nauc_ndcg_at_100_std value: 25.717600091835052 - type: nauc_ndcg_at_10_diff1 value: 27.28963504973803 - type: nauc_ndcg_at_10_max value: 45.07020624629025 - type: nauc_ndcg_at_10_std value: 29.017215904691902 - type: nauc_ndcg_at_1_diff1 value: 39.69547779212674 - type: nauc_ndcg_at_1_max value: 39.944550572400225 - type: nauc_ndcg_at_1_std value: 17.27308663512775 - type: nauc_ndcg_at_20_diff1 value: 26.88029364873597 - type: nauc_ndcg_at_20_max value: 43.89319625918324 - type: nauc_ndcg_at_20_std value: 29.182590252122804 - type: nauc_ndcg_at_3_diff1 value: 32.49288862835273 - type: nauc_ndcg_at_3_max value: 45.57318753977976 - type: nauc_ndcg_at_3_std value: 23.953534500127557 - type: nauc_ndcg_at_5_diff1 value: 29.578845399866545 - type: nauc_ndcg_at_5_max value: 46.601862971633544 - type: nauc_ndcg_at_5_std value: 27.55565792973463 - type: nauc_precision_at_1000_diff1 value: -4.397392180783799 - type: nauc_precision_at_1000_max value: 17.406927055459345 - type: nauc_precision_at_1000_std value: 47.8835834302276 - type: nauc_precision_at_100_diff1 value: -3.582470870457778 - type: nauc_precision_at_100_max value: 30.6298826448415 - type: nauc_precision_at_100_std value: 55.54858727751579 - type: nauc_precision_at_10_diff1 value: 6.591245947478634 - type: nauc_precision_at_10_max value: 44.36069671353394 - type: nauc_precision_at_10_std value: 45.85949796089425 - type: nauc_precision_at_1_diff1 value: 39.90620183792372 - type: nauc_precision_at_1_max value: 41.93832955553217 - type: nauc_precision_at_1_std value: 17.78208215842155 - type: nauc_precision_at_20_diff1 value: 3.1763559888676305 - type: nauc_precision_at_20_max value: 40.19013491290661 - type: nauc_precision_at_20_std value: 50.30896997510246 - type: nauc_precision_at_3_diff1 value: 21.346541990363338 - type: nauc_precision_at_3_max value: 46.358486907663234 - type: nauc_precision_at_3_std value: 30.30796100013066 - type: nauc_precision_at_5_diff1 value: 13.764960158282511 - type: nauc_precision_at_5_max value: 47.38189520644064 - type: nauc_precision_at_5_std value: 38.83370975791448 - type: nauc_recall_at_1000_diff1 value: 3.111013627981912 - type: nauc_recall_at_1000_max value: 17.453303474327654 - type: nauc_recall_at_1000_std value: 16.831446977812252 - type: nauc_recall_at_100_diff1 value: 16.59425078697382 - type: nauc_recall_at_100_max value: 25.400896109980174 - type: nauc_recall_at_100_std value: 10.794971059479254 - type: nauc_recall_at_10_diff1 value: 23.63271460212068 - type: nauc_recall_at_10_max value: 20.991264958049598 - type: nauc_recall_at_10_std value: -6.022250169253036 - type: nauc_recall_at_1_diff1 value: 45.91087927776542 - type: nauc_recall_at_1_max value: 11.368756627277754 - type: nauc_recall_at_1_std value: -21.987291617576854 - type: nauc_recall_at_20_diff1 value: 22.615984500854555 - type: nauc_recall_at_20_max value: 23.637250829352997 - type: nauc_recall_at_20_std value: 0.41128528477486354 - type: nauc_recall_at_3_diff1 value: 37.308271400820985 - type: nauc_recall_at_3_max value: 18.63584930406467 - type: nauc_recall_at_3_std value: -13.472251033244428 - type: nauc_recall_at_5_diff1 value: 31.142005435540852 - type: nauc_recall_at_5_max value: 20.5834454794761 - 
type: nauc_recall_at_5_std value: -9.81034234508067 - type: ndcg_at_1 value: 42.879 - type: ndcg_at_10 value: 34.725 - type: ndcg_at_100 value: 31.798 - type: ndcg_at_1000 value: 40.486 - type: ndcg_at_20 value: 32.535 - type: ndcg_at_3 value: 38.97 - type: ndcg_at_5 value: 37.602000000000004 - type: precision_at_1 value: 44.891999999999996 - type: precision_at_10 value: 26.192 - type: precision_at_100 value: 8.241 - type: precision_at_1000 value: 2.085 - type: precision_at_20 value: 19.52 - type: precision_at_3 value: 36.842000000000006 - type: precision_at_5 value: 33.312999999999995 - type: recall_at_1 value: 5.901 - type: recall_at_10 value: 17.171 - type: recall_at_100 value: 31.709 - type: recall_at_1000 value: 63.589 - type: recall_at_20 value: 20.782999999999998 - type: recall_at_3 value: 10.194 - type: recall_at_5 value: 12.934999999999999 task: type: Retrieval - dataset: config: default name: MTEB NQ (default) revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: main_score value: 59.951 - type: map_at_1 value: 36.718 - type: map_at_10 value: 52.518 - type: map_at_100 value: 53.373000000000005 - type: map_at_1000 value: 53.400000000000006 - type: map_at_20 value: 53.11 - type: map_at_3 value: 48.606 - type: map_at_5 value: 50.922999999999995 - type: mrr_at_1 value: 41.22247972190035 - type: mrr_at_10 value: 55.10211471610661 - type: mrr_at_100 value: 55.690424468447944 - type: mrr_at_1000 value: 55.709587669000626 - type: mrr_at_20 value: 55.51307514935747 - type: mrr_at_3 value: 52.10023174971031 - type: mrr_at_5 value: 53.85139049826188 - type: nauc_map_at_1000_diff1 value: 36.084432495766244 - type: nauc_map_at_1000_max value: 32.106683448614696 - type: nauc_map_at_1000_std value: 0.28114600458421135 - type: nauc_map_at_100_diff1 value: 36.076754155834685 - type: nauc_map_at_100_max value: 32.124501222653386 - type: nauc_map_at_100_std value: 0.3074172933687319 - type: nauc_map_at_10_diff1 value: 35.95846264899338 - type: nauc_map_at_10_max value: 32.268962480678645 - type: nauc_map_at_10_std value: -0.10550275250265802 - type: nauc_map_at_1_diff1 value: 39.29370524773578 - type: nauc_map_at_1_max value: 25.991296131217062 - type: nauc_map_at_1_std value: -2.5540466996583753 - type: nauc_map_at_20_diff1 value: 35.98377971994357 - type: nauc_map_at_20_max value: 32.15683504409824 - type: nauc_map_at_20_std value: 0.19145693127134786 - type: nauc_map_at_3_diff1 value: 36.0944254890347 - type: nauc_map_at_3_max value: 30.2128510665515 - type: nauc_map_at_3_std value: -1.9611081461308983 - type: nauc_map_at_5_diff1 value: 36.00156289591984 - type: nauc_map_at_5_max value: 31.56149465902775 - type: nauc_map_at_5_std value: -0.8373235686244762 - type: nauc_mrr_at_1000_diff1 value: 36.09152753153953 - type: nauc_mrr_at_1000_max value: 32.43454228496553 - type: nauc_mrr_at_1000_std value: 1.8517892571605596 - type: nauc_mrr_at_100_diff1 value: 36.09112009133751 - type: nauc_mrr_at_100_max value: 32.44951869408173 - type: nauc_mrr_at_100_std value: 1.8714844618486277 - type: nauc_mrr_at_10_diff1 value: 35.930421137614914 - type: nauc_mrr_at_10_max value: 32.65451978743636 - type: nauc_mrr_at_10_std value: 1.7723190829619009 - type: nauc_mrr_at_1_diff1 value: 39.396024242346954 - type: nauc_mrr_at_1_max value: 28.132740347350953 - type: nauc_mrr_at_1_std value: -0.5935576215439111 - type: nauc_mrr_at_20_diff1 value: 35.99903536497898 - type: nauc_mrr_at_20_max value: 32.50256539352071 - type: nauc_mrr_at_20_std value: 1.8829977887370852 - type: 
nauc_mrr_at_3_diff1 value: 35.91812477028109 - type: nauc_mrr_at_3_max value: 31.595134192404796 - type: nauc_mrr_at_3_std value: 0.6749658339604261 - type: nauc_mrr_at_5_diff1 value: 35.90541524153257 - type: nauc_mrr_at_5_max value: 32.375076970871106 - type: nauc_mrr_at_5_std value: 1.4530009988326982 - type: nauc_ndcg_at_1000_diff1 value: 35.52189976546703 - type: nauc_ndcg_at_1000_max value: 33.97534043055662 - type: nauc_ndcg_at_1000_std value: 2.7358127566748025 - type: nauc_ndcg_at_100_diff1 value: 35.32967760887528 - type: nauc_ndcg_at_100_max value: 34.51536712950666 - type: nauc_ndcg_at_100_std value: 3.561484184520643 - type: nauc_ndcg_at_10_diff1 value: 34.63981443982384 - type: nauc_ndcg_at_10_max value: 35.2466755214177 - type: nauc_ndcg_at_10_std value: 2.163469830591493 - type: nauc_ndcg_at_1_diff1 value: 39.47234805254548 - type: nauc_ndcg_at_1_max value: 27.949377920983448 - type: nauc_ndcg_at_1_std value: -0.7016496183295023 - type: nauc_ndcg_at_20_diff1 value: 34.77193782885647 - type: nauc_ndcg_at_20_max value: 34.79563187118757 - type: nauc_ndcg_at_20_std value: 3.0333339734937326 - type: nauc_ndcg_at_3_diff1 value: 34.84410905343334 - type: nauc_ndcg_at_3_max value: 31.53857235413653 - type: nauc_ndcg_at_3_std value: -1.2121011083371147 - type: nauc_ndcg_at_5_diff1 value: 34.70655373953545 - type: nauc_ndcg_at_5_max value: 33.692790095442994 - type: nauc_ndcg_at_5_std value: 0.6612260001056149 - type: nauc_precision_at_1000_diff1 value: -6.531497758654776 - type: nauc_precision_at_1000_max value: 6.592383443768815 - type: nauc_precision_at_1000_std value: 15.266065986503547 - type: nauc_precision_at_100_diff1 value: -2.0738709139302003 - type: nauc_precision_at_100_max value: 15.324594432362842 - type: nauc_precision_at_100_std value: 20.825895623533857 - type: nauc_precision_at_10_diff1 value: 9.98637582589397 - type: nauc_precision_at_10_max value: 30.50457748285925 - type: nauc_precision_at_10_std value: 13.73313229149034 - type: nauc_precision_at_1_diff1 value: 39.47234805254548 - type: nauc_precision_at_1_max value: 27.949377920983448 - type: nauc_precision_at_1_std value: -0.7016496183295023 - type: nauc_precision_at_20_diff1 value: 4.338247023429635 - type: nauc_precision_at_20_max value: 23.76589815146598 - type: nauc_precision_at_20_std value: 17.322633618978386 - type: nauc_precision_at_3_diff1 value: 23.17326950999716 - type: nauc_precision_at_3_max value: 31.075717350827293 - type: nauc_precision_at_3_std value: 2.762436540576557 - type: nauc_precision_at_5_diff1 value: 17.362008096246633 - type: nauc_precision_at_5_max value: 32.08805696305664 - type: nauc_precision_at_5_std value: 8.12524167169048 - type: nauc_recall_at_1000_diff1 value: 34.18415215294108 - type: nauc_recall_at_1000_max value: 79.77930971993527 - type: nauc_recall_at_1000_std value: 70.27189175741741 - type: nauc_recall_at_100_diff1 value: 28.249629521143465 - type: nauc_recall_at_100_max value: 62.21529072406605 - type: nauc_recall_at_100_std value: 46.23141649265807 - type: nauc_recall_at_10_diff1 value: 27.302420328273612 - type: nauc_recall_at_10_max value: 47.57999826869166 - type: nauc_recall_at_10_std value: 9.807109630878386 - type: nauc_recall_at_1_diff1 value: 39.29370524773578 - type: nauc_recall_at_1_max value: 25.991296131217062 - type: nauc_recall_at_1_std value: -2.5540466996583753 - type: nauc_recall_at_20_diff1 value: 26.264363964930997 - type: nauc_recall_at_20_max value: 49.762297304442136 - type: nauc_recall_at_20_std value: 18.650695925686502 - type: 
nauc_recall_at_3_diff1 value: 29.95231482486556 - type: nauc_recall_at_3_max value: 33.054441143791394 - type: nauc_recall_at_3_std value: -1.4133288694811754 - type: nauc_recall_at_5_diff1 value: 28.978660648633802 - type: nauc_recall_at_5_max value: 38.844300548161186 - type: nauc_recall_at_5_std value: 3.19644809086287 - type: ndcg_at_1 value: 41.193999999999996 - type: ndcg_at_10 value: 59.951 - type: ndcg_at_100 value: 63.343 - type: ndcg_at_1000 value: 63.941 - type: ndcg_at_20 value: 61.781 - type: ndcg_at_3 value: 52.756 - type: ndcg_at_5 value: 56.486999999999995 - type: precision_at_1 value: 41.193999999999996 - type: precision_at_10 value: 9.528 - type: precision_at_100 value: 1.145 - type: precision_at_1000 value: 0.12 - type: precision_at_20 value: 5.206 - type: precision_at_3 value: 23.696 - type: precision_at_5 value: 16.419 - type: recall_at_1 value: 36.718 - type: recall_at_10 value: 79.84 - type: recall_at_100 value: 94.228 - type: recall_at_1000 value: 98.648 - type: recall_at_20 value: 86.542 - type: recall_at_3 value: 61.31999999999999 - type: recall_at_5 value: 69.836 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval (default) revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 split: test type: mteb/quora metrics: - type: main_score value: 89.838 - type: map_at_1 value: 72.44500000000001 - type: map_at_10 value: 86.332 - type: map_at_100 value: 86.936 - type: map_at_1000 value: 86.95 - type: map_at_20 value: 86.72999999999999 - type: map_at_3 value: 83.417 - type: map_at_5 value: 85.292 - type: mrr_at_1 value: 83.5 - type: mrr_at_10 value: 89.20519444444444 - type: mrr_at_100 value: 89.2819086258491 - type: mrr_at_1000 value: 89.28214505128291 - type: mrr_at_20 value: 89.26673258007042 - type: mrr_at_3 value: 88.36 - type: mrr_at_5 value: 88.95100000000001 - type: nauc_map_at_1000_diff1 value: 76.90740671940051 - type: nauc_map_at_1000_max value: 36.46444946338708 - type: nauc_map_at_1000_std value: -56.60380240532508 - type: nauc_map_at_100_diff1 value: 76.91112078761572 - type: nauc_map_at_100_max value: 36.45304363618243 - type: nauc_map_at_100_std value: -56.67988410741111 - type: nauc_map_at_10_diff1 value: 77.09598611046616 - type: nauc_map_at_10_max value: 35.96689922341558 - type: nauc_map_at_10_std value: -58.68604909203303 - type: nauc_map_at_1_diff1 value: 80.37641963929528 - type: nauc_map_at_1_max value: 27.046973659136057 - type: nauc_map_at_1_std value: -49.41187376826384 - type: nauc_map_at_20_diff1 value: 76.9541622063172 - type: nauc_map_at_20_max value: 36.29817666157097 - type: nauc_map_at_20_std value: -57.58995860118392 - type: nauc_map_at_3_diff1 value: 77.79036430390953 - type: nauc_map_at_3_max value: 33.23673927645347 - type: nauc_map_at_3_std value: -60.10156884287652 - type: nauc_map_at_5_diff1 value: 77.33636903512307 - type: nauc_map_at_5_max value: 35.003919992106006 - type: nauc_map_at_5_std value: -59.97787405958172 - type: nauc_mrr_at_1000_diff1 value: 77.73000572331905 - type: nauc_mrr_at_1000_max value: 38.561364157585324 - type: nauc_mrr_at_1000_std value: -53.44976098044828 - type: nauc_mrr_at_100_diff1 value: 77.72981689727108 - type: nauc_mrr_at_100_max value: 38.561425387623785 - type: nauc_mrr_at_100_std value: -53.45033750871979 - type: nauc_mrr_at_10_diff1 value: 77.71709626439586 - type: nauc_mrr_at_10_max value: 38.624900686387214 - type: nauc_mrr_at_10_std value: -53.58765986161691 - type: nauc_mrr_at_1_diff1 value: 78.37565253706408 - type: nauc_mrr_at_1_max value: 38.23888076842768 - type: 
nauc_mrr_at_1_std value: -50.20603764579538 - type: nauc_mrr_at_20_diff1 value: 77.7306939391157 - type: nauc_mrr_at_20_max value: 38.59165749191751 - type: nauc_mrr_at_20_std value: -53.48812024214872 - type: nauc_mrr_at_3_diff1 value: 77.54353349806524 - type: nauc_mrr_at_3_max value: 38.713759549229785 - type: nauc_mrr_at_3_std value: -53.94582165002703 - type: nauc_mrr_at_5_diff1 value: 77.70283049254654 - type: nauc_mrr_at_5_max value: 38.716317005111215 - type: nauc_mrr_at_5_std value: -53.92085356926888 - type: nauc_ndcg_at_1000_diff1 value: 76.89855290894926 - type: nauc_ndcg_at_1000_max value: 37.772216233524325 - type: nauc_ndcg_at_1000_std value: -54.86144177114646 - type: nauc_ndcg_at_100_diff1 value: 76.90257905740786 - type: nauc_ndcg_at_100_max value: 37.739876618823274 - type: nauc_ndcg_at_100_std value: -55.18253534518033 - type: nauc_ndcg_at_10_diff1 value: 76.82906119719216 - type: nauc_ndcg_at_10_max value: 37.09739956129085 - type: nauc_ndcg_at_10_std value: -58.49646829288816 - type: nauc_ndcg_at_1_diff1 value: 78.37565253706408 - type: nauc_ndcg_at_1_max value: 38.335351847985045 - type: nauc_ndcg_at_1_std value: -50.212302001610745 - type: nauc_ndcg_at_20_diff1 value: 76.86843611975287 - type: nauc_ndcg_at_20_max value: 37.38859864360577 - type: nauc_ndcg_at_20_std value: -57.243383699901386 - type: nauc_ndcg_at_3_diff1 value: 76.43700144403104 - type: nauc_ndcg_at_3_max value: 35.849266604568456 - type: nauc_ndcg_at_3_std value: -58.26941196366757 - type: nauc_ndcg_at_5_diff1 value: 76.65368894551763 - type: nauc_ndcg_at_5_max value: 36.67820873138469 - type: nauc_ndcg_at_5_std value: -59.167875261562884 - type: nauc_precision_at_1000_diff1 value: -44.61035236776975 - type: nauc_precision_at_1000_max value: -6.9906519553038535 - type: nauc_precision_at_1000_std value: 45.26673634956755 - type: nauc_precision_at_100_diff1 value: -44.471568524106466 - type: nauc_precision_at_100_max value: -6.513827405878257 - type: nauc_precision_at_100_std value: 43.61461800235919 - type: nauc_precision_at_10_diff1 value: -40.63269213674181 - type: nauc_precision_at_10_max value: -2.176686756124717 - type: nauc_precision_at_10_std value: 29.834023361852225 - type: nauc_precision_at_1_diff1 value: 78.37565253706408 - type: nauc_precision_at_1_max value: 38.335351847985045 - type: nauc_precision_at_1_std value: -50.212302001610745 - type: nauc_precision_at_20_diff1 value: -43.166138321174 - type: nauc_precision_at_20_max value: -4.551647757465525 - type: nauc_precision_at_20_std value: 36.236925649882664 - type: nauc_precision_at_3_diff1 value: -22.241887562444298 - type: nauc_precision_at_3_max value: 6.147594412705473 - type: nauc_precision_at_3_std value: 6.206594648276548 - type: nauc_precision_at_5_diff1 value: -33.948204035499955 - type: nauc_precision_at_5_max value: 1.551952866668139 - type: nauc_precision_at_5_std value: 19.086692514199573 - type: nauc_recall_at_1000_diff1 value: 56.00550359595701 - type: nauc_recall_at_1000_max value: 0.25076313433895114 - type: nauc_recall_at_1000_std value: -19.767447908090993 - type: nauc_recall_at_100_diff1 value: 71.09157100014333 - type: nauc_recall_at_100_max value: 36.803937541332566 - type: nauc_recall_at_100_std value: -68.4065523296009 - type: nauc_recall_at_10_diff1 value: 72.74150240606814 - type: nauc_recall_at_10_max value: 34.20323841659202 - type: nauc_recall_at_10_std value: -81.23057156799683 - type: nauc_recall_at_1_diff1 value: 80.37641963929528 - type: nauc_recall_at_1_max value: 27.046973659136057 - type: 
nauc_recall_at_1_std value: -49.41187376826384 - type: nauc_recall_at_20_diff1 value: 72.23679243300582 - type: nauc_recall_at_20_max value: 35.472624896485584 - type: nauc_recall_at_20_std value: -83.96453691324263 - type: nauc_recall_at_3_diff1 value: 74.4436126143353 - type: nauc_recall_at_3_max value: 30.220293116530584 - type: nauc_recall_at_3_std value: -68.23230306181532 - type: nauc_recall_at_5_diff1 value: 72.89682914794618 - type: nauc_recall_at_5_max value: 32.220311115253786 - type: nauc_recall_at_5_std value: -74.53623789048245 - type: ndcg_at_1 value: 83.5 - type: ndcg_at_10 value: 89.838 - type: ndcg_at_100 value: 90.879 - type: ndcg_at_1000 value: 90.955 - type: ndcg_at_20 value: 90.422 - type: ndcg_at_3 value: 87.21799999999999 - type: ndcg_at_5 value: 88.727 - type: precision_at_1 value: 83.5 - type: precision_at_10 value: 13.571 - type: precision_at_100 value: 1.5350000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_20 value: 7.175 - type: precision_at_3 value: 38.12 - type: precision_at_5 value: 25.041999999999998 - type: recall_at_1 value: 72.44500000000001 - type: recall_at_10 value: 96.298 - type: recall_at_100 value: 99.696 - type: recall_at_1000 value: 99.98599999999999 - type: recall_at_20 value: 98.15700000000001 - type: recall_at_3 value: 88.633 - type: recall_at_5 value: 92.985 task: type: Retrieval - dataset: config: default name: MTEB RedditClustering (default) revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: main_score value: 59.36225093784713 - type: v_measure value: 59.36225093784713 - type: v_measure_std value: 3.9911509588570393 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P (default) revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: main_score value: 64.46282036246124 - type: v_measure value: 64.46282036246124 - type: v_measure_std value: 12.49196304240264 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS (default) revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 split: test type: mteb/scidocs metrics: - type: main_score value: 21.781 - type: map_at_1 value: 5.103 - type: map_at_10 value: 13.152 - type: map_at_100 value: 15.421000000000001 - type: map_at_1000 value: 15.738 - type: map_at_20 value: 14.313 - type: map_at_3 value: 9.277000000000001 - type: map_at_5 value: 11.079 - type: mrr_at_1 value: 25.2 - type: mrr_at_10 value: 36.30464285714286 - type: mrr_at_100 value: 37.37083205414486 - type: mrr_at_1000 value: 37.41889994963302 - type: mrr_at_20 value: 36.99006600941199 - type: mrr_at_3 value: 33.11666666666667 - type: mrr_at_5 value: 34.971666666666664 - type: nauc_map_at_1000_diff1 value: 13.3829110188465 - type: nauc_map_at_1000_max value: 26.200548089249203 - type: nauc_map_at_1000_std value: 15.782390299656376 - type: nauc_map_at_100_diff1 value: 13.434823562595197 - type: nauc_map_at_100_max value: 26.19757227269967 - type: nauc_map_at_100_std value: 15.666149403001597 - type: nauc_map_at_10_diff1 value: 13.136752265014085 - type: nauc_map_at_10_max value: 24.37704176159032 - type: nauc_map_at_10_std value: 11.875468320642725 - type: nauc_map_at_1_diff1 value: 23.91080785158353 - type: nauc_map_at_1_max value: 21.714915496600813 - type: nauc_map_at_1_std value: 4.523659534794796 - type: nauc_map_at_20_diff1 value: 13.08994175195148 - type: nauc_map_at_20_max value: 25.564250916023035 - type: nauc_map_at_20_std value: 13.758854620282229 - type: 
nauc_map_at_3_diff1 value: 15.629634284012711 - type: nauc_map_at_3_max value: 20.94416328947656 - type: nauc_map_at_3_std value: 5.443733090008665 - type: nauc_map_at_5_diff1 value: 13.717844004379067 - type: nauc_map_at_5_max value: 21.93083811259854 - type: nauc_map_at_5_std value: 7.496869394816883 - type: nauc_mrr_at_1000_diff1 value: 19.466105991639516 - type: nauc_mrr_at_1000_max value: 23.857199036893714 - type: nauc_mrr_at_1000_std value: 10.400833057932964 - type: nauc_mrr_at_100_diff1 value: 19.45377482442327 - type: nauc_mrr_at_100_max value: 23.86931198998342 - type: nauc_mrr_at_100_std value: 10.43160252915245 - type: nauc_mrr_at_10_diff1 value: 19.595100505906498 - type: nauc_mrr_at_10_max value: 23.828564831729913 - type: nauc_mrr_at_10_std value: 10.158332218550582 - type: nauc_mrr_at_1_diff1 value: 23.639623316387265 - type: nauc_mrr_at_1_max value: 21.91276584516334 - type: nauc_mrr_at_1_std value: 4.555063005377011 - type: nauc_mrr_at_20_diff1 value: 19.42312083502562 - type: nauc_mrr_at_20_max value: 23.998031015425354 - type: nauc_mrr_at_20_std value: 10.507801798326819 - type: nauc_mrr_at_3_diff1 value: 20.50499706447941 - type: nauc_mrr_at_3_max value: 22.89975536944602 - type: nauc_mrr_at_3_std value: 8.976243818880809 - type: nauc_mrr_at_5_diff1 value: 19.59735376368769 - type: nauc_mrr_at_5_max value: 23.079995863526243 - type: nauc_mrr_at_5_std value: 9.558077494050336 - type: nauc_ndcg_at_1000_diff1 value: 13.411221925319488 - type: nauc_ndcg_at_1000_max value: 28.874659943874605 - type: nauc_ndcg_at_1000_std value: 22.92179424488089 - type: nauc_ndcg_at_100_diff1 value: 14.177059117246053 - type: nauc_ndcg_at_100_max value: 29.49863202457167 - type: nauc_ndcg_at_100_std value: 23.415432542915244 - type: nauc_ndcg_at_10_diff1 value: 14.034714269886518 - type: nauc_ndcg_at_10_max value: 26.529324449228014 - type: nauc_ndcg_at_10_std value: 15.0835036529515 - type: nauc_ndcg_at_1_diff1 value: 23.639623316387265 - type: nauc_ndcg_at_1_max value: 21.91276584516334 - type: nauc_ndcg_at_1_std value: 4.555063005377011 - type: nauc_ndcg_at_20_diff1 value: 13.639153726908837 - type: nauc_ndcg_at_20_max value: 28.34934989257701 - type: nauc_ndcg_at_20_std value: 18.346102705103505 - type: nauc_ndcg_at_3_diff1 value: 16.310949228363334 - type: nauc_ndcg_at_3_max value: 21.96244399696209 - type: nauc_ndcg_at_3_std value: 7.79248819842006 - type: nauc_ndcg_at_5_diff1 value: 14.630417187709366 - type: nauc_ndcg_at_5_max value: 23.28452419937793 - type: nauc_ndcg_at_5_std value: 10.132485346479228 - type: nauc_precision_at_1000_diff1 value: 0.4617378903286949 - type: nauc_precision_at_1000_max value: 23.084163863883607 - type: nauc_precision_at_1000_std value: 34.74028918125758 - type: nauc_precision_at_100_diff1 value: 7.744924657665058 - type: nauc_precision_at_100_max value: 28.822902541968237 - type: nauc_precision_at_100_std value: 35.872958881610344 - type: nauc_precision_at_10_diff1 value: 9.242022361674694 - type: nauc_precision_at_10_max value: 27.707443555826906 - type: nauc_precision_at_10_std value: 20.465290637452664 - type: nauc_precision_at_1_diff1 value: 23.639623316387265 - type: nauc_precision_at_1_max value: 21.91276584516334 - type: nauc_precision_at_1_std value: 4.555063005377011 - type: nauc_precision_at_20_diff1 value: 7.901785657316664 - type: nauc_precision_at_20_max value: 29.678603802205057 - type: nauc_precision_at_20_std value: 25.65946048724345 - type: nauc_precision_at_3_diff1 value: 13.650585769886394 - type: nauc_precision_at_3_max value: 
22.03045956299473 - type: nauc_precision_at_3_std value: 9.155456520493106 - type: nauc_precision_at_5_diff1 value: 10.200134466214287 - type: nauc_precision_at_5_max value: 23.308672947117167 - type: nauc_precision_at_5_std value: 12.695862040385645 - type: nauc_recall_at_1000_diff1 value: 1.7286393025447204 - type: nauc_recall_at_1000_max value: 23.322719223507704 - type: nauc_recall_at_1000_std value: 36.358257876511956 - type: nauc_recall_at_100_diff1 value: 8.230846619688952 - type: nauc_recall_at_100_max value: 28.880569830494963 - type: nauc_recall_at_100_std value: 36.29115706966346 - type: nauc_recall_at_10_diff1 value: 9.362248846760513 - type: nauc_recall_at_10_max value: 27.475538879580885 - type: nauc_recall_at_10_std value: 20.314461649538373 - type: nauc_recall_at_1_diff1 value: 23.91080785158353 - type: nauc_recall_at_1_max value: 21.714915496600813 - type: nauc_recall_at_1_std value: 4.523659534794796 - type: nauc_recall_at_20_diff1 value: 8.140101636033602 - type: nauc_recall_at_20_max value: 29.59131501693498 - type: nauc_recall_at_20_std value: 25.876120433055316 - type: nauc_recall_at_3_diff1 value: 13.725759049941843 - type: nauc_recall_at_3_max value: 21.75055584058006 - type: nauc_recall_at_3_std value: 8.965766944507815 - type: nauc_recall_at_5_diff1 value: 10.366069494614596 - type: nauc_recall_at_5_max value: 23.031784865881054 - type: nauc_recall_at_5_std value: 12.411188897743521 - type: ndcg_at_1 value: 25.2 - type: ndcg_at_10 value: 21.781 - type: ndcg_at_100 value: 30.273 - type: ndcg_at_1000 value: 35.768 - type: ndcg_at_20 value: 24.967 - type: ndcg_at_3 value: 20.580000000000002 - type: ndcg_at_5 value: 17.926000000000002 - type: precision_at_1 value: 25.2 - type: precision_at_10 value: 11.4 - type: precision_at_100 value: 2.359 - type: precision_at_1000 value: 0.368 - type: precision_at_20 value: 7.545 - type: precision_at_3 value: 19.3 - type: precision_at_5 value: 15.78 - type: recall_at_1 value: 5.103 - type: recall_at_10 value: 23.083000000000002 - type: recall_at_100 value: 47.882999999999996 - type: recall_at_1000 value: 74.783 - type: recall_at_20 value: 30.592000000000002 - type: recall_at_3 value: 11.753 - type: recall_at_5 value: 15.983 task: type: Retrieval - dataset: config: default name: MTEB SICK-R (default) revision: 20a6d6f312dd54037fe07a32d58e5e168867909d split: test type: mteb/sickr-sts metrics: - type: cosine_pearson value: 83.9841377195369 - type: cosine_spearman value: 77.44919890597407 - type: euclidean_pearson value: 81.21238548422511 - type: euclidean_spearman value: 76.94405730272983 - type: main_score value: 77.44919890597407 - type: manhattan_pearson value: 81.16824677968528 - type: manhattan_spearman value: 76.94296468591867 - type: pearson value: 83.9841377195369 - type: spearman value: 77.44919890597407 task: type: STS - dataset: config: default name: MTEB STS12 (default) revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: cosine_pearson value: 81.36071984442052 - type: cosine_spearman value: 74.2212823495219 - type: euclidean_pearson value: 78.31139429452078 - type: euclidean_spearman value: 74.02790834412275 - type: main_score value: 74.2212823495219 - type: manhattan_pearson value: 78.26141328104697 - type: manhattan_spearman value: 74.02545007676329 - type: pearson value: 81.36071984442052 - type: spearman value: 74.2212823495219 task: type: STS - dataset: config: default name: MTEB STS13 (default) revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: 
mteb/sts13-sts metrics: - type: cosine_pearson value: 85.49925337918731 - type: cosine_spearman value: 86.12368715292688 - type: euclidean_pearson value: 85.71147581542367 - type: euclidean_spearman value: 86.64112317821541 - type: main_score value: 86.12368715292688 - type: manhattan_pearson value: 85.58242941611371 - type: manhattan_spearman value: 86.51041533466731 - type: pearson value: 85.49925337918731 - type: spearman value: 86.12368715292688 task: type: STS - dataset: config: default name: MTEB STS14 (default) revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: cosine_pearson value: 82.24735192639226 - type: cosine_spearman value: 78.88155361224834 - type: euclidean_pearson value: 80.52048132030517 - type: euclidean_spearman value: 78.1335955670817 - type: main_score value: 78.88155361224834 - type: manhattan_pearson value: 80.48178866605353 - type: manhattan_spearman value: 78.08994918255844 - type: pearson value: 82.24735192639226 - type: spearman value: 78.88155361224834 task: type: STS - dataset: config: default name: MTEB STS15 (default) revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: cosine_pearson value: 86.27381322229758 - type: cosine_spearman value: 87.5038962579188 - type: euclidean_pearson value: 86.7575259976948 - type: euclidean_spearman value: 87.3358778981031 - type: main_score value: 87.5038962579188 - type: manhattan_pearson value: 86.72177109814491 - type: manhattan_spearman value: 87.30593609243358 - type: pearson value: 86.27381322229758 - type: spearman value: 87.5038962579188 task: type: STS - dataset: config: default name: MTEB STS16 (default) revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: cosine_pearson value: 82.90364706517789 - type: cosine_spearman value: 84.25854334490232 - type: euclidean_pearson value: 83.30065780824273 - type: euclidean_spearman value: 84.17467271748362 - type: main_score value: 84.25854334490232 - type: manhattan_pearson value: 83.21239264085494 - type: manhattan_spearman value: 84.05456832118482 - type: pearson value: 82.90364706517789 - type: spearman value: 84.25854334490232 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 88.88258729094343 - type: cosine_spearman value: 89.68436656381257 - type: euclidean_pearson value: 88.23417725579127 - type: euclidean_spearman value: 87.96688277361433 - type: main_score value: 89.68436656381257 - type: manhattan_pearson value: 88.07673471897155 - type: manhattan_spearman value: 87.7976329721765 - type: pearson value: 88.88258729094343 - type: spearman value: 89.68436656381257 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 65.24627744968292 - type: cosine_spearman value: 65.96283849168346 - type: euclidean_pearson value: 66.2111925054528 - type: euclidean_spearman value: 65.83563143944401 - type: main_score value: 65.96283849168346 - type: manhattan_pearson value: 66.25664281582083 - type: manhattan_spearman value: 65.8830797513158 - type: pearson value: 65.24627744968292 - type: spearman value: 65.96283849168346 task: type: STS - dataset: config: default name: MTEB STSBenchmark (default) revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 
split: test type: mteb/stsbenchmark-sts metrics: - type: cosine_pearson value: 85.57515090752183 - type: cosine_spearman value: 85.54441587714372 - type: euclidean_pearson value: 85.53938106211463 - type: euclidean_spearman value: 85.28473579067878 - type: main_score value: 85.54441587714372 - type: manhattan_pearson value: 85.51025100057596 - type: manhattan_spearman value: 85.260887707662 - type: pearson value: 85.57515090752183 - type: spearman value: 85.54441587714372 task: type: STS - dataset: config: default name: MTEB SciDocsRR (default) revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: main_score value: 82.9058801876062 - type: map value: 82.9058801876062 - type: mrr value: 95.256220721907 - type: nAUC_map_diff1 value: 0.13078953297011875 - type: nAUC_map_max value: 59.173980738758026 - type: nAUC_map_std value: 73.35735418975649 - type: nAUC_mrr_diff1 value: 46.534353907114514 - type: nAUC_mrr_max value: 89.56255914950661 - type: nAUC_mrr_std value: 85.6716185155955 task: type: Reranking - dataset: config: default name: MTEB SciFact (default) revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: main_score value: 71.844 - type: map_at_1 value: 57.278 - type: map_at_10 value: 67.109 - type: map_at_100 value: 67.66499999999999 - type: map_at_1000 value: 67.685 - type: map_at_20 value: 67.482 - type: map_at_3 value: 64.16199999999999 - type: map_at_5 value: 65.82900000000001 - type: mrr_at_1 value: 60.0 - type: mrr_at_10 value: 68.19960317460317 - type: mrr_at_100 value: 68.62748949394921 - type: mrr_at_1000 value: 68.64515905414915 - type: mrr_at_20 value: 68.472601010101 - type: mrr_at_3 value: 66.0 - type: mrr_at_5 value: 67.21666666666667 - type: nauc_map_at_1000_diff1 value: 70.04313292027558 - type: nauc_map_at_1000_max value: 57.24529193476731 - type: nauc_map_at_1000_std value: -4.8888921470785585 - type: nauc_map_at_100_diff1 value: 70.04624674117014 - type: nauc_map_at_100_max value: 57.25302539508853 - type: nauc_map_at_100_std value: -4.907703072069842 - type: nauc_map_at_10_diff1 value: 70.06943109940849 - type: nauc_map_at_10_max value: 57.39452715929109 - type: nauc_map_at_10_std value: -4.743417671263566 - type: nauc_map_at_1_diff1 value: 76.61111479875207 - type: nauc_map_at_1_max value: 52.822124992902374 - type: nauc_map_at_1_std value: -7.6071857283495445 - type: nauc_map_at_20_diff1 value: 69.95251393140202 - type: nauc_map_at_20_max value: 57.328356768833146 - type: nauc_map_at_20_std value: -4.871357691032887 - type: nauc_map_at_3_diff1 value: 69.71499509001714 - type: nauc_map_at_3_max value: 53.645107897260026 - type: nauc_map_at_3_std value: -7.908850295935557 - type: nauc_map_at_5_diff1 value: 69.7531280646943 - type: nauc_map_at_5_max value: 55.71038914997073 - type: nauc_map_at_5_std value: -6.7813041970848476 - type: nauc_mrr_at_1000_diff1 value: 69.61840192382927 - type: nauc_mrr_at_1000_max value: 58.419734360225696 - type: nauc_mrr_at_1000_std value: -1.8503761885586425 - type: nauc_mrr_at_100_diff1 value: 69.6153571701724 - type: nauc_mrr_at_100_max value: 58.422378816414565 - type: nauc_mrr_at_100_std value: -1.8731915889302972 - type: nauc_mrr_at_10_diff1 value: 69.5874772943516 - type: nauc_mrr_at_10_max value: 58.78121978366665 - type: nauc_mrr_at_10_std value: -1.2843146465927913 - type: nauc_mrr_at_1_diff1 value: 74.35688136934793 - type: nauc_mrr_at_1_max value: 57.487384980706416 - type: nauc_mrr_at_1_std value: -1.3005837538340144 - type: 
nauc_mrr_at_20_diff1 value: 69.53988639045606 - type: nauc_mrr_at_20_max value: 58.49631860342686 - type: nauc_mrr_at_20_std value: -1.7220227513588833 - type: nauc_mrr_at_3_diff1 value: 68.94320178615871 - type: nauc_mrr_at_3_max value: 56.60856449749424 - type: nauc_mrr_at_3_std value: -3.3432894595086866 - type: nauc_mrr_at_5_diff1 value: 68.94240340867633 - type: nauc_mrr_at_5_max value: 58.27068018852665 - type: nauc_mrr_at_5_std value: -2.320192066949136 - type: nauc_ndcg_at_1000_diff1 value: 69.15093538086137 - type: nauc_ndcg_at_1000_max value: 58.6801221127507 - type: nauc_ndcg_at_1000_std value: -3.002038837722594 - type: nauc_ndcg_at_100_diff1 value: 69.11507044508373 - type: nauc_ndcg_at_100_max value: 58.843490113137605 - type: nauc_ndcg_at_100_std value: -3.2810475322338566 - type: nauc_ndcg_at_10_diff1 value: 68.71920945656667 - type: nauc_ndcg_at_10_max value: 60.13600198034469 - type: nauc_ndcg_at_10_std value: -1.6190106644777749 - type: nauc_ndcg_at_1_diff1 value: 74.35688136934793 - type: nauc_ndcg_at_1_max value: 57.487384980706416 - type: nauc_ndcg_at_1_std value: -1.3005837538340144 - type: nauc_ndcg_at_20_diff1 value: 68.33714726670162 - type: nauc_ndcg_at_20_max value: 59.45907982196103 - type: nauc_ndcg_at_20_std value: -2.5953063304797754 - type: nauc_ndcg_at_3_diff1 value: 67.33605891922716 - type: nauc_ndcg_at_3_max value: 55.01142849375101 - type: nauc_ndcg_at_3_std value: -6.5632981093508205 - type: nauc_ndcg_at_5_diff1 value: 67.59450950578172 - type: nauc_ndcg_at_5_max value: 57.50106057747294 - type: nauc_ndcg_at_5_std value: -5.415038422866616 - type: nauc_precision_at_1000_diff1 value: -33.21156082089814 - type: nauc_precision_at_1000_max value: 19.132732038554398 - type: nauc_precision_at_1000_std value: 44.091281225705714 - type: nauc_precision_at_100_diff1 value: -20.015823755259245 - type: nauc_precision_at_100_max value: 26.507243354636085 - type: nauc_precision_at_100_std value: 37.87274756817076 - type: nauc_precision_at_10_diff1 value: 8.35057694800983 - type: nauc_precision_at_10_max value: 49.60611953844157 - type: nauc_precision_at_10_std value: 32.18410475820039 - type: nauc_precision_at_1_diff1 value: 74.35688136934793 - type: nauc_precision_at_1_max value: 57.487384980706416 - type: nauc_precision_at_1_std value: -1.3005837538340144 - type: nauc_precision_at_20_diff1 value: -3.0872665961524612 - type: nauc_precision_at_20_max value: 40.5565038905005 - type: nauc_precision_at_20_std value: 32.15291813716766 - type: nauc_precision_at_3_diff1 value: 34.627722605371545 - type: nauc_precision_at_3_max value: 49.65219072739979 - type: nauc_precision_at_3_std value: 7.7588985130719434 - type: nauc_precision_at_5_diff1 value: 22.06911561993657 - type: nauc_precision_at_5_max value: 49.09578970278826 - type: nauc_precision_at_5_std value: 16.038789872070705 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 64.77257569694551 - type: nauc_recall_at_100_max value: 65.07269574496497 - type: nauc_recall_at_100_std value: -10.979947534569218 - type: nauc_recall_at_10_diff1 value: 62.14297161941494 - type: nauc_recall_at_10_max value: 70.41353364022896 - type: nauc_recall_at_10_std value: 9.172932719542075 - type: nauc_recall_at_1_diff1 value: 76.61111479875207 - type: nauc_recall_at_1_max value: 52.822124992902374 - type: nauc_recall_at_1_std value: -7.6071857283495445 - type: nauc_recall_at_20_diff1 value: 57.631464811333224 - type: 
nauc_recall_at_20_max value: 67.83558221740536 - type: nauc_recall_at_20_std value: 3.110691973832695 - type: nauc_recall_at_3_diff1 value: 60.39078444139112 - type: nauc_recall_at_3_max value: 51.122425596651574 - type: nauc_recall_at_3_std value: -10.307895490015559 - type: nauc_recall_at_5_diff1 value: 59.703727953513145 - type: nauc_recall_at_5_max value: 59.81893786534298 - type: nauc_recall_at_5_std value: -6.231017907901268 - type: ndcg_at_1 value: 60.0 - type: ndcg_at_10 value: 71.844 - type: ndcg_at_100 value: 74.278 - type: ndcg_at_1000 value: 74.74199999999999 - type: ndcg_at_20 value: 72.99 - type: ndcg_at_3 value: 66.721 - type: ndcg_at_5 value: 69.137 - type: precision_at_1 value: 60.0 - type: precision_at_10 value: 9.6 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_20 value: 5.067 - type: precision_at_3 value: 26.111 - type: precision_at_5 value: 17.267 - type: recall_at_1 value: 57.278 - type: recall_at_10 value: 85.344 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 89.589 - type: recall_at_3 value: 71.45 - type: recall_at_5 value: 77.361 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions (default) revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: cosine_accuracy value: 99.8019801980198 - type: cosine_accuracy_threshold value: 74.77510571479797 - type: cosine_ap value: 95.30006120252773 - type: cosine_f1 value: 89.75265017667844 - type: cosine_f1_threshold value: 72.93492555618286 - type: cosine_precision value: 90.62181447502549 - type: cosine_recall value: 88.9 - type: dot_accuracy value: 99.74554455445545 - type: dot_accuracy_threshold value: 794.2790985107422 - type: dot_ap value: 93.33073289508414 - type: dot_f1 value: 87.11779448621553 - type: dot_f1_threshold value: 793.5191631317139 - type: dot_precision value: 87.33668341708542 - type: dot_recall value: 86.9 - type: euclidean_accuracy value: 99.7960396039604 - type: euclidean_accuracy_threshold value: 238.72876167297363 - type: euclidean_ap value: 95.04815354196363 - type: euclidean_f1 value: 89.53252032520325 - type: euclidean_f1_threshold value: 241.42813682556152 - type: euclidean_precision value: 91.01239669421489 - type: euclidean_recall value: 88.1 - type: main_score value: 95.30006120252773 - type: manhattan_accuracy value: 99.7960396039604 - type: manhattan_accuracy_threshold value: 5224.44953918457 - type: manhattan_ap value: 95.02798265540767 - type: manhattan_f1 value: 89.4552723638181 - type: manhattan_f1_threshold value: 5434.450531005859 - type: manhattan_precision value: 89.41058941058941 - type: manhattan_recall value: 89.5 - type: max_accuracy value: 99.8019801980198 - type: max_ap value: 95.30006120252773 - type: max_f1 value: 89.75265017667844 - type: max_precision value: 91.01239669421489 - type: max_recall value: 89.5 - type: similarity_accuracy value: 99.8019801980198 - type: similarity_accuracy_threshold value: 74.77510571479797 - type: similarity_ap value: 95.30006120252773 - type: similarity_f1 value: 89.75265017667844 - type: similarity_f1_threshold value: 72.93492555618286 - type: similarity_precision value: 90.62181447502549 - type: similarity_recall value: 88.9 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering (default) revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: 
mteb/stackexchange-clustering metrics: - type: main_score value: 66.76593843797666 - type: v_measure value: 66.76593843797666 - type: v_measure_std value: 3.5421488096435416 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P (default) revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: main_score value: 38.90007255920144 - type: v_measure value: 38.90007255920144 - type: v_measure_std value: 1.440894289494648 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions (default) revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: main_score value: 52.71807785910519 - type: map value: 52.71807785910519 - type: mrr value: 53.51011427298192 - type: nAUC_map_diff1 value: 38.489341755206404 - type: nAUC_map_max value: 12.810459097227756 - type: nAUC_map_std value: 10.001723368468545 - type: nAUC_mrr_diff1 value: 38.1795784067288 - type: nAUC_mrr_max value: 13.876071274342735 - type: nAUC_mrr_std value: 10.809361649584433 task: type: Reranking - dataset: config: default name: MTEB SummEval (default) revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: cosine_pearson value: 31.51422308323083 - type: cosine_spearman value: 31.22821719703179 - type: dot_pearson value: 30.692806438778554 - type: dot_spearman value: 30.440095026481913 - type: main_score value: 31.22821719703179 - type: pearson value: 31.51422308323083 - type: spearman value: 31.22821719703179 task: type: Summarization - dataset: config: default name: MTEB TRECCOVID (default) revision: bb9466bac8153a0349341eb1b22e06409e78ef4e split: test type: mteb/trec-covid metrics: - type: main_score value: 79.38199999999999 - type: map_at_1 value: 0.258 - type: map_at_10 value: 2.077 - type: map_at_100 value: 12.062000000000001 - type: map_at_1000 value: 28.717 - type: map_at_20 value: 3.6630000000000003 - type: map_at_3 value: 0.7040000000000001 - type: map_at_5 value: 1.114 - type: mrr_at_1 value: 96.0 - type: mrr_at_10 value: 97.66666666666667 - type: mrr_at_100 value: 97.66666666666667 - type: mrr_at_1000 value: 97.66666666666667 - type: mrr_at_20 value: 97.66666666666667 - type: mrr_at_3 value: 97.66666666666667 - type: mrr_at_5 value: 97.66666666666667 - type: nauc_map_at_1000_diff1 value: -19.606457542469276 - type: nauc_map_at_1000_max value: 62.23126542837836 - type: nauc_map_at_1000_std value: 78.11491433681955 - type: nauc_map_at_100_diff1 value: 1.056950862100428 - type: nauc_map_at_100_max value: 43.14707718269215 - type: nauc_map_at_100_std value: 54.99119932336741 - type: nauc_map_at_10_diff1 value: 31.26313513848752 - type: nauc_map_at_10_max value: 18.729050164831303 - type: nauc_map_at_10_std value: 12.501346100150942 - type: nauc_map_at_1_diff1 value: 50.67428371303766 - type: nauc_map_at_1_max value: 8.26350705716926 - type: nauc_map_at_1_std value: -2.802747360156509 - type: nauc_map_at_20_diff1 value: 23.85177292094862 - type: nauc_map_at_20_max value: 24.907498374862385 - type: nauc_map_at_20_std value: 23.15361092830954 - type: nauc_map_at_3_diff1 value: 44.34113488392741 - type: nauc_map_at_3_max value: 16.13816628219856 - type: nauc_map_at_3_std value: 1.64493293742063 - type: nauc_map_at_5_diff1 value: 43.35667417997146 - type: nauc_map_at_5_max value: 16.651525778549175 - type: nauc_map_at_5_std value: 5.344297729807275 - type: nauc_mrr_at_1000_diff1 value: 
65.01934106976137 - type: nauc_mrr_at_1000_max value: 74.5231425903695 - type: nauc_mrr_at_1000_std value: 84.12698412698381 - type: nauc_mrr_at_100_diff1 value: 65.01934106976137 - type: nauc_mrr_at_100_max value: 74.5231425903695 - type: nauc_mrr_at_100_std value: 84.12698412698381 - type: nauc_mrr_at_10_diff1 value: 65.01934106976137 - type: nauc_mrr_at_10_max value: 74.5231425903695 - type: nauc_mrr_at_10_std value: 84.12698412698381 - type: nauc_mrr_at_1_diff1 value: 63.81886087768457 - type: nauc_mrr_at_1_max value: 77.70774976657333 - type: nauc_mrr_at_1_std value: 86.11111111111124 - type: nauc_mrr_at_20_diff1 value: 65.01934106976137 - type: nauc_mrr_at_20_max value: 74.5231425903695 - type: nauc_mrr_at_20_std value: 84.12698412698381 - type: nauc_mrr_at_3_diff1 value: 65.01934106976137 - type: nauc_mrr_at_3_max value: 74.5231425903695 - type: nauc_mrr_at_3_std value: 84.12698412698381 - type: nauc_mrr_at_5_diff1 value: 65.01934106976137 - type: nauc_mrr_at_5_max value: 74.5231425903695 - type: nauc_mrr_at_5_std value: 84.12698412698381 - type: nauc_ndcg_at_1000_diff1 value: -12.207934630430895 - type: nauc_ndcg_at_1000_max value: 63.27131989733247 - type: nauc_ndcg_at_1000_std value: 77.77862783776057 - type: nauc_ndcg_at_100_diff1 value: -31.139043418906777 - type: nauc_ndcg_at_100_max value: 56.29288690229761 - type: nauc_ndcg_at_100_std value: 80.54207709212822 - type: nauc_ndcg_at_10_diff1 value: -21.623075757241335 - type: nauc_ndcg_at_10_max value: 42.00930185115019 - type: nauc_ndcg_at_10_std value: 63.90085820733794 - type: nauc_ndcg_at_1_diff1 value: 27.03957293721711 - type: nauc_ndcg_at_1_max value: 18.687865072917816 - type: nauc_ndcg_at_1_std value: 40.65606746354093 - type: nauc_ndcg_at_20_diff1 value: -27.059567337111528 - type: nauc_ndcg_at_20_max value: 44.873490488692845 - type: nauc_ndcg_at_20_std value: 68.27056244238835 - type: nauc_ndcg_at_3_diff1 value: -2.2768439107759253 - type: nauc_ndcg_at_3_max value: 33.16972612805963 - type: nauc_ndcg_at_3_std value: 49.35785810423734 - type: nauc_ndcg_at_5_diff1 value: -8.380892599544165 - type: nauc_ndcg_at_5_max value: 39.7045491756542 - type: nauc_ndcg_at_5_std value: 56.662696632820044 - type: nauc_precision_at_1000_diff1 value: -39.853246552685256 - type: nauc_precision_at_1000_max value: 45.82687391914263 - type: nauc_precision_at_1000_std value: 51.6573155072073 - type: nauc_precision_at_100_diff1 value: -35.334152199143055 - type: nauc_precision_at_100_max value: 57.74163988146608 - type: nauc_precision_at_100_std value: 78.83424294782806 - type: nauc_precision_at_10_diff1 value: -29.572269138136193 - type: nauc_precision_at_10_max value: 45.16249504588279 - type: nauc_precision_at_10_std value: 63.92716685466912 - type: nauc_precision_at_1_diff1 value: 63.81886087768457 - type: nauc_precision_at_1_max value: 77.70774976657333 - type: nauc_precision_at_1_std value: 86.11111111111124 - type: nauc_precision_at_20_diff1 value: -31.155129521710613 - type: nauc_precision_at_20_max value: 46.072522169609606 - type: nauc_precision_at_20_std value: 64.29857883516294 - type: nauc_precision_at_3_diff1 value: -5.644268209909603 - type: nauc_precision_at_3_max value: 54.62437037830888 - type: nauc_precision_at_3_std value: 52.27021040974535 - type: nauc_precision_at_5_diff1 value: -15.560278135078049 - type: nauc_precision_at_5_max value: 50.21344816658272 - type: nauc_precision_at_5_std value: 58.94711332326674 - type: nauc_recall_at_1000_diff1 value: -8.016557237167058 - type: nauc_recall_at_1000_max value: 
58.857938362714165 - type: nauc_recall_at_1000_std value: 66.83850522737738 - type: nauc_recall_at_100_diff1 value: 15.447588986377317 - type: nauc_recall_at_100_max value: 37.515788055189084 - type: nauc_recall_at_100_std value: 42.326000614078026 - type: nauc_recall_at_10_diff1 value: 34.99067421432679 - type: nauc_recall_at_10_max value: 13.792789030946933 - type: nauc_recall_at_10_std value: 7.066206327262477 - type: nauc_recall_at_1_diff1 value: 50.67428371303766 - type: nauc_recall_at_1_max value: 8.26350705716926 - type: nauc_recall_at_1_std value: -2.802747360156509 - type: nauc_recall_at_20_diff1 value: 31.277397618992136 - type: nauc_recall_at_20_max value: 20.296127261717054 - type: nauc_recall_at_20_std value: 16.117931287068437 - type: nauc_recall_at_3_diff1 value: 46.303571802817025 - type: nauc_recall_at_3_max value: 14.03073426897129 - type: nauc_recall_at_3_std value: -0.39592906337357797 - type: nauc_recall_at_5_diff1 value: 45.51206018811467 - type: nauc_recall_at_5_max value: 12.263182926616867 - type: nauc_recall_at_5_std value: 1.5451403387758214 - type: ndcg_at_1 value: 87.0 - type: ndcg_at_10 value: 79.38199999999999 - type: ndcg_at_100 value: 59.941 - type: ndcg_at_1000 value: 53.581999999999994 - type: ndcg_at_20 value: 74.244 - type: ndcg_at_3 value: 84.05 - type: ndcg_at_5 value: 82.328 - type: precision_at_1 value: 96.0 - type: precision_at_10 value: 85.2 - type: precision_at_100 value: 61.519999999999996 - type: precision_at_1000 value: 23.328 - type: precision_at_20 value: 78.4 - type: precision_at_3 value: 90.667 - type: precision_at_5 value: 88.4 - type: recall_at_1 value: 0.258 - type: recall_at_10 value: 2.225 - type: recall_at_100 value: 15.190999999999999 - type: recall_at_1000 value: 50.656 - type: recall_at_20 value: 4.063 - type: recall_at_3 value: 0.722 - type: recall_at_5 value: 1.168 task: type: Retrieval - dataset: config: default name: MTEB Touche2020 (default) revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: main_score value: 24.254 - type: map_at_1 value: 2.355 - type: map_at_10 value: 9.554 - type: map_at_100 value: 14.856 - type: map_at_1000 value: 16.320999999999998 - type: map_at_20 value: 11.594 - type: map_at_3 value: 5.624 - type: map_at_5 value: 6.948 - type: mrr_at_1 value: 28.57142857142857 - type: mrr_at_10 value: 45.30855199222546 - type: mrr_at_100 value: 46.29196367191565 - type: mrr_at_1000 value: 46.31499833524485 - type: mrr_at_20 value: 46.113797167218536 - type: mrr_at_3 value: 42.17687074829932 - type: mrr_at_5 value: 43.70748299319728 - type: nauc_map_at_1000_diff1 value: 16.20923402096991 - type: nauc_map_at_1000_max value: -1.0790035381754648 - type: nauc_map_at_1000_std value: 7.195462252108266 - type: nauc_map_at_100_diff1 value: 18.389136986949936 - type: nauc_map_at_100_max value: -2.05569038009456 - type: nauc_map_at_100_std value: 2.571693024788773 - type: nauc_map_at_10_diff1 value: 21.066136452964642 - type: nauc_map_at_10_max value: 1.5731034935019352 - type: nauc_map_at_10_std value: -10.470562156435545 - type: nauc_map_at_1_diff1 value: 18.809274247757674 - type: nauc_map_at_1_max value: -8.68104031396317 - type: nauc_map_at_1_std value: -30.619138463973307 - type: nauc_map_at_20_diff1 value: 23.36148432932364 - type: nauc_map_at_20_max value: -0.38560029617230923 - type: nauc_map_at_20_std value: -6.8825311118744485 - type: nauc_map_at_3_diff1 value: 18.9370153117886 - type: nauc_map_at_3_max value: 2.2032967783435375 - type: nauc_map_at_3_std value: 
-12.532694022066659 - type: nauc_map_at_5_diff1 value: 21.434904521858602 - type: nauc_map_at_5_max value: 6.094611630406942 - type: nauc_map_at_5_std value: -12.492795788667474 - type: nauc_mrr_at_1000_diff1 value: 11.961046636239269 - type: nauc_mrr_at_1000_max value: -15.748297693665677 - type: nauc_mrr_at_1000_std value: -12.067130971523385 - type: nauc_mrr_at_100_diff1 value: 11.95534277650038 - type: nauc_mrr_at_100_max value: -15.684486171307041 - type: nauc_mrr_at_100_std value: -11.98247014226321 - type: nauc_mrr_at_10_diff1 value: 12.191520381511925 - type: nauc_mrr_at_10_max value: -16.510285123987302 - type: nauc_mrr_at_10_std value: -11.93784570526233 - type: nauc_mrr_at_1_diff1 value: 18.162553375605516 - type: nauc_mrr_at_1_max value: -18.920009881475387 - type: nauc_mrr_at_1_std value: -31.201005281857086 - type: nauc_mrr_at_20_diff1 value: 11.85035482221006 - type: nauc_mrr_at_20_max value: -16.18704935368085 - type: nauc_mrr_at_20_std value: -11.424991900511088 - type: nauc_mrr_at_3_diff1 value: 14.733201594965836 - type: nauc_mrr_at_3_max value: -11.75899459749356 - type: nauc_mrr_at_3_std value: -11.499870896820976 - type: nauc_mrr_at_5_diff1 value: 12.874017458219845 - type: nauc_mrr_at_5_max value: -13.642689819875791 - type: nauc_mrr_at_5_std value: -11.64117086557618 - type: nauc_ndcg_at_1000_diff1 value: -6.849400123979281 - type: nauc_ndcg_at_1000_max value: -3.8209628417621393 - type: nauc_ndcg_at_1000_std value: 31.393629472927504 - type: nauc_ndcg_at_100_diff1 value: 5.4656320972286485 - type: nauc_ndcg_at_100_max value: -11.571250999652408 - type: nauc_ndcg_at_100_std value: 16.5511179303082 - type: nauc_ndcg_at_10_diff1 value: 9.553502614400788 - type: nauc_ndcg_at_10_max value: -14.08266102380929 - type: nauc_ndcg_at_10_std value: -5.404201943794988 - type: nauc_ndcg_at_1_diff1 value: 11.37824691229176 - type: nauc_ndcg_at_1_max value: -21.31215334708879 - type: nauc_ndcg_at_1_std value: -29.749958184219334 - type: nauc_ndcg_at_20_diff1 value: 13.396975021395857 - type: nauc_ndcg_at_20_max value: -14.5189405742469 - type: nauc_ndcg_at_20_std value: -1.6276921520570502 - type: nauc_ndcg_at_3_diff1 value: 2.3132968948746226 - type: nauc_ndcg_at_3_max value: -11.351646560904848 - type: nauc_ndcg_at_3_std value: -0.15036952995361091 - type: nauc_ndcg_at_5_diff1 value: 6.214320727021392 - type: nauc_ndcg_at_5_max value: -9.797994041679638 - type: nauc_ndcg_at_5_std value: -3.3742904276844223 - type: nauc_precision_at_1000_diff1 value: -32.78708155144845 - type: nauc_precision_at_1000_max value: 34.81622247650308 - type: nauc_precision_at_1000_std value: 47.996245254718744 - type: nauc_precision_at_100_diff1 value: -10.867559709952797 - type: nauc_precision_at_100_max value: 6.681915188055671 - type: nauc_precision_at_100_std value: 61.989390090979356 - type: nauc_precision_at_10_diff1 value: 6.511211593484189 - type: nauc_precision_at_10_max value: -16.842566662697454 - type: nauc_precision_at_10_std value: 5.002600740433903 - type: nauc_precision_at_1_diff1 value: 18.162553375605516 - type: nauc_precision_at_1_max value: -18.920009881475387 - type: nauc_precision_at_1_std value: -31.201005281857086 - type: nauc_precision_at_20_diff1 value: 9.640744611970522 - type: nauc_precision_at_20_max value: -18.27653996056668 - type: nauc_precision_at_20_std value: 22.021814503656543 - type: nauc_precision_at_3_diff1 value: 6.916201107284145 - type: nauc_precision_at_3_max value: -0.039381527098944095 - type: nauc_precision_at_3_std value: 9.096821181866671 - type: 
nauc_precision_at_5_diff1 value: 9.032683328748616 - type: nauc_precision_at_5_max value: -3.5989814795848223 - type: nauc_precision_at_5_std value: 2.506947866544208 - type: nauc_recall_at_1000_diff1 value: -27.92405572104993 - type: nauc_recall_at_1000_max value: 14.256848434706395 - type: nauc_recall_at_1000_std value: 69.3546817240148 - type: nauc_recall_at_100_diff1 value: 6.613753533249129 - type: nauc_recall_at_100_max value: -8.405822616363144 - type: nauc_recall_at_100_std value: 29.430588706591397 - type: nauc_recall_at_10_diff1 value: 18.481730784371077 - type: nauc_recall_at_10_max value: -7.763172381505888 - type: nauc_recall_at_10_std value: -7.48570052741164 - type: nauc_recall_at_1_diff1 value: 18.809274247757674 - type: nauc_recall_at_1_max value: -8.68104031396317 - type: nauc_recall_at_1_std value: -30.619138463973307 - type: nauc_recall_at_20_diff1 value: 20.639977762281493 - type: nauc_recall_at_20_max value: -11.301201172125623 - type: nauc_recall_at_20_std value: 0.38755705583239786 - type: nauc_recall_at_3_diff1 value: 18.279383297820562 - type: nauc_recall_at_3_max value: 5.287795698059438 - type: nauc_recall_at_3_std value: -3.7312167565958316 - type: nauc_recall_at_5_diff1 value: 21.115852302465356 - type: nauc_recall_at_5_max value: 5.318139212101227 - type: nauc_recall_at_5_std value: -7.792885381250281 - type: ndcg_at_1 value: 25.509999999999998 - type: ndcg_at_10 value: 24.254 - type: ndcg_at_100 value: 34.660000000000004 - type: ndcg_at_1000 value: 45.798 - type: ndcg_at_20 value: 24.988 - type: ndcg_at_3 value: 29.273 - type: ndcg_at_5 value: 25.453 - type: precision_at_1 value: 28.571 - type: precision_at_10 value: 21.02 - type: precision_at_100 value: 7.122000000000001 - type: precision_at_1000 value: 1.435 - type: precision_at_20 value: 16.326999999999998 - type: precision_at_3 value: 31.293 - type: precision_at_5 value: 24.898 - type: recall_at_1 value: 2.355 - type: recall_at_10 value: 15.397 - type: recall_at_100 value: 43.647000000000006 - type: recall_at_1000 value: 77.089 - type: recall_at_20 value: 22.792 - type: recall_at_3 value: 6.847 - type: recall_at_5 value: 9.136 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification (default) revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 72.7734375 - type: ap value: 15.655230461083708 - type: ap_weighted value: 15.655230461083708 - type: f1 value: 56.31497978454638 - type: f1_weighted value: 78.70509613747345 - type: main_score value: 72.7734375 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification (default) revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 72.56366723259762 - type: f1 value: 72.90413275122202 - type: f1_weighted value: 72.19948169084057 - type: main_score value: 72.56366723259762 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering (default) revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: main_score value: 56.90970017457857 - type: v_measure value: 56.90970017457857 - type: v_measure_std value: 1.5885885070403738 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 (default) revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: 
cosine_accuracy value: 85.7006616200751 - type: cosine_accuracy_threshold value: 75.78572630882263 - type: cosine_ap value: 72.87577990245127 - type: cosine_f1 value: 67.36422521175885 - type: cosine_f1_threshold value: 70.15678882598877 - type: cosine_precision value: 63.80368098159509 - type: cosine_recall value: 71.34564643799473 - type: dot_accuracy value: 83.60851165285807 - type: dot_accuracy_threshold value: 744.7918891906738 - type: dot_ap value: 64.82619159813649 - type: dot_f1 value: 62.62379263968699 - type: dot_f1_threshold value: 696.7735290527344 - type: dot_precision value: 58.350421508316245 - type: dot_recall value: 67.57255936675462 - type: euclidean_accuracy value: 85.84371460928652 - type: euclidean_accuracy_threshold value: 220.4747200012207 - type: euclidean_ap value: 72.47837433257799 - type: euclidean_f1 value: 67.2811059907834 - type: euclidean_f1_threshold value: 240.81902503967285 - type: euclidean_precision value: 65.34062655395326 - type: euclidean_recall value: 69.34036939313984 - type: main_score value: 72.87577990245127 - type: manhattan_accuracy value: 85.83179352685224 - type: manhattan_accuracy_threshold value: 4910.404205322266 - type: manhattan_ap value: 72.44111617709422 - type: manhattan_f1 value: 67.09989806320081 - type: manhattan_f1_threshold value: 5333.793640136719 - type: manhattan_precision value: 64.88417939871857 - type: manhattan_recall value: 69.47229551451187 - type: max_accuracy value: 85.84371460928652 - type: max_ap value: 72.87577990245127 - type: max_f1 value: 67.36422521175885 - type: max_precision value: 65.34062655395326 - type: max_recall value: 71.34564643799473 - type: similarity_accuracy value: 85.7006616200751 - type: similarity_accuracy_threshold value: 75.78572630882263 - type: similarity_ap value: 72.87577990245127 - type: similarity_f1 value: 67.36422521175885 - type: similarity_f1_threshold value: 70.15678882598877 - type: similarity_precision value: 63.80368098159509 - type: similarity_recall value: 71.34564643799473 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus (default) revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: cosine_accuracy value: 88.88112702293631 - type: cosine_accuracy_threshold value: 71.48405313491821 - type: cosine_ap value: 85.88088882163336 - type: cosine_f1 value: 78.2251744598276 - type: cosine_f1_threshold value: 70.09605169296265 - type: cosine_precision value: 75.8997755087262 - type: cosine_recall value: 80.69756698490914 - type: dot_accuracy value: 88.04672643303451 - type: dot_accuracy_threshold value: 700.6264686584473 - type: dot_ap value: 83.52072844458456 - type: dot_f1 value: 76.24239256244634 - type: dot_f1_threshold value: 664.9115562438965 - type: dot_precision value: 74.0123233055455 - type: dot_recall value: 78.61102556205728 - type: euclidean_accuracy value: 88.72588970388482 - type: euclidean_accuracy_threshold value: 226.53303146362305 - type: euclidean_ap value: 85.51788295919707 - type: euclidean_f1 value: 77.73453426739316 - type: euclidean_f1_threshold value: 238.7503147125244 - type: euclidean_precision value: 74.94818097348296 - type: euclidean_recall value: 80.73606405913151 - type: main_score value: 85.88088882163336 - type: manhattan_accuracy value: 88.68902084061008 - type: manhattan_accuracy_threshold value: 5034.079742431641 - type: manhattan_ap value: 85.49952903626239 - type: manhattan_f1 value: 77.74326743888625 - type: manhattan_f1_threshold value: 
5334.531021118164 - type: manhattan_precision value: 73.98289171708741 - type: manhattan_recall value: 81.90637511549123 - type: max_accuracy value: 88.88112702293631 - type: max_ap value: 85.88088882163336 - type: max_f1 value: 78.2251744598276 - type: max_precision value: 75.8997755087262 - type: max_recall value: 81.90637511549123 - type: similarity_accuracy value: 88.88112702293631 - type: similarity_accuracy_threshold value: 71.48405313491821 - type: similarity_ap value: 85.88088882163336 - type: similarity_f1 value: 78.2251744598276 - type: similarity_f1_threshold value: 70.09605169296265 - type: similarity_precision value: 75.8997755087262 - type: similarity_recall value: 80.69756698490914 task: type: PairClassification --- # Contextual Document Embeddings (CDE) **Link to code: github.com/jxmorris12/cde** Our new model that naturally integrates \"context tokens\" into the embedding process. As of October 1st, 2024, is the best small model (under 400M params) on the MTEB leaderboard for text embedding models, with an average score of 65.00. 👉
[Figure: CDE overview]

# How to use Our embedding model needs to be used in *two stages*. The first stage is to gather some dataset information by embedding a subset of the corpus using our \"first-stage\" model. The second stage is to actually embed queries and documents, conditioning on the corpus information from the first stage. Note that the first stage can be done offline, so only the second-stage weights are needed at inference time. ## With Transformers
Click to learn how to use cde-small-v1 with Transformers ### Loading the model Our model can be loaded out of the box with \"trust remote code\" enabled. We use the default BERT uncased tokenizer. #### Note on prefixes *Nota bene*: Like all state-of-the-art embedding models, our model was trained with task-specific prefixes. To do retrieval, prepend the query prefix to queries and the document prefix to documents (see the sketch below). ### First stage ### Running the second stage Now that we have obtained \"dataset embeddings\", we can embed documents and queries as normal. Remember to use the document prefix for documents and the query prefix for queries. These embeddings can be compared using dot product, since they're normalized.
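The original code blocks are not reproduced in this card dump, so here is a minimal sketch of the two-stage flow with Transformers. The method names (`first_stage_model`, `second_stage_model`), the exact prefixes, and the mini-corpus size are assumptions about the model's custom code, not a confirmed API:

```python
# Minimal sketch of the two-stage CDE flow (not the official snippet).
# first_stage_model / second_stage_model, the prefixes, and the sample
# size of 512 are assumptions, since the original code was stripped.
import random
import torch
import transformers

model = transformers.AutoModel.from_pretrained('jxmorris12/cde-small-v1', trust_remote_code=True)
tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')

query_prefix = 'search_query: '        # assumed task prefix
document_prefix = 'search_document: '  # assumed task prefix

# First stage: embed a random subset of the corpus to get dataset embeddings.
corpus = ['...your documents...']
minicorpus = random.sample(corpus, k=min(512, len(corpus)))
inputs = tokenizer([document_prefix + d for d in minicorpus],
                   truncation=True, padding=True, return_tensors='pt')
with torch.no_grad():
    dataset_embeddings = model.first_stage_model(**inputs)  # hypothetical method

# Second stage: embed documents/queries, conditioned on the dataset embeddings.
docs = tokenizer([document_prefix + d for d in corpus],
                 truncation=True, padding=True, return_tensors='pt')
with torch.no_grad():
    doc_emb = model.second_stage_model(  # hypothetical method
        input_ids=docs['input_ids'],
        attention_mask=docs['attention_mask'],
        dataset_embeddings=dataset_embeddings,
    )
doc_emb = doc_emb / doc_emb.norm(p=2, dim=1, keepdim=True)  # normalized, so dot product works
```

Queries follow the same second-stage call with `query_prefix` in place of `document_prefix`.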
### What if I don't know what my corpus will be ahead of time? If you can't obtain corpus information ahead of time, you still have to pass *something* as the dataset embeddings. Our model will still work in this case, just not quite as well: without corpus information, performance drops from 65.0 to 63.8 on MTEB. We provide some random strings that worked well for us and can be used as a substitute for corpus sampling. ## With Sentence Transformers
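As a sketch of that fallback, continuing the Transformers sketch above (the specific random strings shipped with the model are not reproduced here, so the placeholder text below is purely illustrative):

```python
# Fallback sketch: condition on placeholder strings instead of a real
# corpus sample. The model ships its own random strings; these are stand-ins.
placeholder_corpus = ['document ' + str(i) for i in range(512)]  # illustrative only
inputs = tokenizer(placeholder_corpus, truncation=True, padding=True, return_tensors='pt')
with torch.no_grad():
    dataset_embeddings = model.first_stage_model(**inputs)  # hypothetical method, as above
```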
Click to learn how to use cde-small-v1 with Sentence Transformers ### Loading the model Our model can be loaded out of the box with \"trust remote code\" enabled. #### Note on prefixes *Nota bene*: Like all state-of-the-art embedding models, our model was trained with task-specific prefixes. To do retrieval, pass the matching prompt name to the model's encoding method when embedding queries and documents, respectively. ### First stage ### Running the second stage Now that we have obtained \"dataset embeddings\", we can embed documents and queries as normal. Remember to use the document prompt for documents. These embeddings can be compared using cosine similarity (see the sketch below).
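A minimal Sentence Transformers sketch of the same flow; the prompt names ('query', 'document') and the `dataset_embeddings` keyword are assumptions about the model's custom code:

```python
# Minimal Sentence Transformers sketch; prompt names and the
# dataset_embeddings keyword are assumptions, not a confirmed API.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('jxmorris12/cde-small-v1', trust_remote_code=True)

corpus = ['...your documents...']
queries = ['...your queries...']

# First stage: dataset embeddings from a corpus subset.
dataset_embeddings = model.encode(corpus[:512], prompt_name='document', convert_to_tensor=True)

# Second stage: embed documents and queries, conditioned on the first stage.
doc_emb = model.encode(corpus, prompt_name='document',
                       dataset_embeddings=dataset_embeddings, convert_to_tensor=True)
query_emb = model.encode(queries, prompt_name='query',
                         dataset_embeddings=dataset_embeddings, convert_to_tensor=True)

similarities = model.similarity(query_emb, doc_emb)  # cosine similarity by default
```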
Click here for a full copy-paste ready example
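The original collapsible example is not preserved in this dump; below is a self-contained stand-in under the same assumptions as the Sentence Transformers sketch above (prompt names and the `dataset_embeddings` keyword come from the model's custom code and are not confirmed by this card):

```python
# Self-contained stand-in for the stripped copy-paste example.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('jxmorris12/cde-small-v1', trust_remote_code=True)

corpus = [
    'The Eiffel Tower is in Paris.',
    'Python is a popular programming language.',
    'The Pacific is the largest ocean on Earth.',
]
queries = ['Where is the Eiffel Tower?', 'largest ocean']

# Stage 1: gather corpus information (the toy corpus doubles as the sample).
dataset_embeddings = model.encode(corpus, prompt_name='document', convert_to_tensor=True)

# Stage 2: embed with conditioning, then rank documents per query.
doc_emb = model.encode(corpus, prompt_name='document',
                       dataset_embeddings=dataset_embeddings, convert_to_tensor=True)
query_emb = model.encode(queries, prompt_name='query',
                         dataset_embeddings=dataset_embeddings, convert_to_tensor=True)

scores = model.similarity(query_emb, doc_emb)
for q, row in zip(queries, scores):
    best = int(row.argmax())
    print(q, '->', corpus[best])
```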
### Colab demo We've set up a short demo in a Colab notebook showing how you might use our model: Try our model in Colab: ### Acknowledgments Early experiments on CDE were done with support from Nomic and Hyperbolic. We're especially indebted to Nomic for open-sourcing their efficient BERT implementation and contrastive pre-training data, which proved vital in the development of CDE. ### Cite us Used our model, method, or architecture? Want to cite us? Here's the ArXiv citation information:", + "model_explanation_gemini": "Performs classification tasks on text data, particularly for Amazon reviews and argument analysis, with demonstrated accuracy in sentiment and counterfactual detection." +} \ No newline at end of file diff --git a/data/model_data_json/OrcaDB_gte-base-en-v1.5.json b/data/model_data_json/OrcaDB_gte-base-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..ee20f2e1a8268dcfa8dbb0f69a0168189c6990b0 --- /dev/null +++ b/data/model_data_json/OrcaDB_gte-base-en-v1.5.json @@ -0,0 +1,26 @@ +{ + "model_id": "OrcaDB/gte-base-en-v1.5", + "downloads": 133886, + "tags": [ + "transformers", + "safetensors", + "new", + "feature-extraction", + "sentence-transformers", + "gte", + "mteb", + "transformers.js", + "sentence-similarity", + "custom_code", + "en", + "arxiv:2407.19669", + "arxiv:2308.03281", + "license:apache-2.0", + "model-index", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - sentence-transformers - gte - mteb - transformers.js - sentence-similarity license: apache-2.0 language: - en model-index: - name: gte-base-en-v1.5 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.7910447761194 - type: ap value: 37.053785713650626 - type: f1 value: 68.51101510998551 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.016875 - type: ap value: 89.17750268426342 - type: f1 value: 92.9970977240524 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.312000000000005 - type: f1 value: 52.98175784163017 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 38.193 - type: map_at_10 value: 54.848 - type: map_at_100 value: 55.388000000000005 - type: map_at_1000 value: 55.388999999999996 - type: map_at_3 value: 50.427 - type: map_at_5 value: 53.105000000000004 - type: mrr_at_1 value: 39.047 - type: mrr_at_10 value: 55.153 - type: mrr_at_100 value: 55.686 - type: mrr_at_1000 value: 55.688 - type: mrr_at_3 value: 50.676 - type: mrr_at_5 value: 53.417 - type: ndcg_at_1 value: 38.193 - type: ndcg_at_10 value: 63.486 - type: ndcg_at_100 value: 65.58 - type: ndcg_at_1000 value: 65.61 - type: ndcg_at_3 value: 54.494 - type: ndcg_at_5 value: 59.339 - type: precision_at_1 value: 38.193 - type: precision_at_10 value: 9.075 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 22.096 - 
type: precision_at_5 value: 15.619 - type: recall_at_1 value: 38.193 - type: recall_at_10 value: 90.754 - type: recall_at_100 value: 99.431 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 66.28699999999999 - type: recall_at_5 value: 78.094 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.508221208908964 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.04668382560096 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.828759903716815 - type: mrr value: 74.37343358395991 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.03673698773017 - type: cos_sim_spearman value: 83.6470866785058 - type: euclidean_pearson value: 82.64048673096565 - type: euclidean_spearman value: 83.63142367101115 - type: manhattan_pearson value: 82.71493099760228 - type: manhattan_spearman value: 83.60491704294326 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.73376623376623 - type: f1 value: 86.70294049278262 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.31923804167062 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.552547125348454 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 30.567 - type: map_at_10 value: 41.269 - type: map_at_100 value: 42.689 - type: map_at_1000 value: 42.84 - type: map_at_3 value: 37.567 - type: map_at_5 value: 39.706 - type: mrr_at_1 value: 37.053000000000004 - type: mrr_at_10 value: 46.900999999999996 - type: mrr_at_100 value: 47.662 - type: mrr_at_1000 value: 47.713 - type: mrr_at_3 value: 43.801 - type: mrr_at_5 value: 45.689 - type: ndcg_at_1 value: 37.053000000000004 - type: ndcg_at_10 value: 47.73 - type: ndcg_at_100 value: 53.128 - type: ndcg_at_1000 value: 55.300000000000004 - type: ndcg_at_3 value: 42.046 - type: ndcg_at_5 value: 44.782 - type: precision_at_1 value: 37.053000000000004 - type: precision_at_10 value: 9.142 - type: precision_at_100 value: 1.485 - type: precision_at_1000 value: 0.197 - type: precision_at_3 value: 20.076 - type: precision_at_5 value: 14.535 - type: recall_at_1 value: 30.567 - type: recall_at_10 value: 60.602999999999994 - type: recall_at_100 value: 83.22800000000001 - type: recall_at_1000 value: 96.696 - type: recall_at_3 value: 44.336999999999996 - type: recall_at_5 value: 51.949 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB 
CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 28.538000000000004 - type: map_at_10 value: 38.757999999999996 - type: map_at_100 value: 40.129 - type: map_at_1000 value: 40.262 - type: map_at_3 value: 35.866 - type: map_at_5 value: 37.417 - type: mrr_at_1 value: 36.051 - type: mrr_at_10 value: 44.868 - type: mrr_at_100 value: 45.568999999999996 - type: mrr_at_1000 value: 45.615 - type: mrr_at_3 value: 42.558 - type: mrr_at_5 value: 43.883 - type: ndcg_at_1 value: 36.051 - type: ndcg_at_10 value: 44.584 - type: ndcg_at_100 value: 49.356 - type: ndcg_at_1000 value: 51.39 - type: ndcg_at_3 value: 40.389 - type: ndcg_at_5 value: 42.14 - type: precision_at_1 value: 36.051 - type: precision_at_10 value: 8.446 - type: precision_at_100 value: 1.411 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 19.639 - type: precision_at_5 value: 13.796 - type: recall_at_1 value: 28.538000000000004 - type: recall_at_10 value: 54.99000000000001 - type: recall_at_100 value: 75.098 - type: recall_at_1000 value: 87.848 - type: recall_at_3 value: 42.236000000000004 - type: recall_at_5 value: 47.377 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 37.188 - type: map_at_10 value: 50.861000000000004 - type: map_at_100 value: 51.917 - type: map_at_1000 value: 51.964999999999996 - type: map_at_3 value: 47.144000000000005 - type: map_at_5 value: 49.417 - type: mrr_at_1 value: 42.571 - type: mrr_at_10 value: 54.086999999999996 - type: mrr_at_100 value: 54.739000000000004 - type: mrr_at_1000 value: 54.762 - type: mrr_at_3 value: 51.285000000000004 - type: mrr_at_5 value: 53.0 - type: ndcg_at_1 value: 42.571 - type: ndcg_at_10 value: 57.282 - type: ndcg_at_100 value: 61.477000000000004 - type: ndcg_at_1000 value: 62.426 - type: ndcg_at_3 value: 51.0 - type: ndcg_at_5 value: 54.346000000000004 - type: precision_at_1 value: 42.571 - type: precision_at_10 value: 9.467 - type: precision_at_100 value: 1.2550000000000001 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 23.114 - type: precision_at_5 value: 16.250999999999998 - type: recall_at_1 value: 37.188 - type: recall_at_10 value: 73.068 - type: recall_at_100 value: 91.203 - type: recall_at_1000 value: 97.916 - type: recall_at_3 value: 56.552 - type: recall_at_5 value: 64.567 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 25.041000000000004 - type: map_at_10 value: 33.86 - type: map_at_100 value: 34.988 - type: map_at_1000 value: 35.064 - type: map_at_3 value: 31.049 - type: map_at_5 value: 32.845 - type: mrr_at_1 value: 26.893 - type: mrr_at_10 value: 35.594 - type: mrr_at_100 value: 36.617 - type: mrr_at_1000 value: 36.671 - type: mrr_at_3 value: 33.051 - type: mrr_at_5 value: 34.61 - type: ndcg_at_1 value: 26.893 - type: ndcg_at_10 value: 38.674 - type: ndcg_at_100 value: 44.178 - type: ndcg_at_1000 value: 46.089999999999996 - type: ndcg_at_3 value: 33.485 - type: ndcg_at_5 value: 36.402 - type: precision_at_1 value: 26.893 - type: precision_at_10 value: 5.989 - type: precision_at_100 value: 0.918 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 14.2 - type: precision_at_5 value: 
10.26 - type: recall_at_1 value: 25.041000000000004 - type: recall_at_10 value: 51.666000000000004 - type: recall_at_100 value: 76.896 - type: recall_at_1000 value: 91.243 - type: recall_at_3 value: 38.035999999999994 - type: recall_at_5 value: 44.999 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.909999999999998 - type: map_at_10 value: 23.901 - type: map_at_100 value: 25.165 - type: map_at_1000 value: 25.291000000000004 - type: map_at_3 value: 21.356 - type: map_at_5 value: 22.816 - type: mrr_at_1 value: 20.025000000000002 - type: mrr_at_10 value: 28.382 - type: mrr_at_100 value: 29.465000000000003 - type: mrr_at_1000 value: 29.535 - type: mrr_at_3 value: 25.933 - type: mrr_at_5 value: 27.332 - type: ndcg_at_1 value: 20.025000000000002 - type: ndcg_at_10 value: 29.099000000000004 - type: ndcg_at_100 value: 35.127 - type: ndcg_at_1000 value: 38.096000000000004 - type: ndcg_at_3 value: 24.464 - type: ndcg_at_5 value: 26.709 - type: precision_at_1 value: 20.025000000000002 - type: precision_at_10 value: 5.398 - type: precision_at_100 value: 0.9690000000000001 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 11.774 - type: precision_at_5 value: 8.632 - type: recall_at_1 value: 15.909999999999998 - type: recall_at_10 value: 40.672000000000004 - type: recall_at_100 value: 66.855 - type: recall_at_1000 value: 87.922 - type: recall_at_3 value: 28.069 - type: recall_at_5 value: 33.812 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 30.175 - type: map_at_10 value: 41.36 - type: map_at_100 value: 42.701 - type: map_at_1000 value: 42.817 - type: map_at_3 value: 37.931 - type: map_at_5 value: 39.943 - type: mrr_at_1 value: 35.611 - type: mrr_at_10 value: 46.346 - type: mrr_at_100 value: 47.160000000000004 - type: mrr_at_1000 value: 47.203 - type: mrr_at_3 value: 43.712 - type: mrr_at_5 value: 45.367000000000004 - type: ndcg_at_1 value: 35.611 - type: ndcg_at_10 value: 47.532000000000004 - type: ndcg_at_100 value: 53.003 - type: ndcg_at_1000 value: 55.007 - type: ndcg_at_3 value: 42.043 - type: ndcg_at_5 value: 44.86 - type: precision_at_1 value: 35.611 - type: precision_at_10 value: 8.624 - type: precision_at_100 value: 1.332 - type: precision_at_1000 value: 0.169 - type: precision_at_3 value: 20.083000000000002 - type: precision_at_5 value: 14.437 - type: recall_at_1 value: 30.175 - type: recall_at_10 value: 60.5 - type: recall_at_100 value: 83.399 - type: recall_at_1000 value: 96.255 - type: recall_at_3 value: 45.448 - type: recall_at_5 value: 52.432 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 22.467000000000002 - type: map_at_10 value: 33.812999999999995 - type: map_at_100 value: 35.248000000000005 - type: map_at_1000 value: 35.359 - type: map_at_3 value: 30.316 - type: map_at_5 value: 32.233000000000004 - type: mrr_at_1 value: 28.310999999999996 - type: mrr_at_10 value: 38.979 - type: mrr_at_100 value: 39.937 - type: mrr_at_1000 value: 39.989999999999995 - type: mrr_at_3 value: 36.244 - type: mrr_at_5 value: 37.871 - type: ndcg_at_1 value: 
28.310999999999996 - type: ndcg_at_10 value: 40.282000000000004 - type: ndcg_at_100 value: 46.22 - type: ndcg_at_1000 value: 48.507 - type: ndcg_at_3 value: 34.596 - type: ndcg_at_5 value: 37.267 - type: precision_at_1 value: 28.310999999999996 - type: precision_at_10 value: 7.831 - type: precision_at_100 value: 1.257 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.275 - type: precision_at_5 value: 12.556999999999999 - type: recall_at_1 value: 22.467000000000002 - type: recall_at_10 value: 54.14099999999999 - type: recall_at_100 value: 79.593 - type: recall_at_1000 value: 95.063 - type: recall_at_3 value: 38.539 - type: recall_at_5 value: 45.403 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 24.18591666666667 - type: map_at_10 value: 33.84258333333333 - type: map_at_100 value: 35.11391666666666 - type: map_at_1000 value: 35.23258333333333 - type: map_at_3 value: 30.764249999999997 - type: map_at_5 value: 32.52333333333334 - type: mrr_at_1 value: 28.54733333333333 - type: mrr_at_10 value: 37.81725 - type: mrr_at_100 value: 38.716499999999996 - type: mrr_at_1000 value: 38.77458333333333 - type: mrr_at_3 value: 35.157833333333336 - type: mrr_at_5 value: 36.69816666666667 - type: ndcg_at_1 value: 28.54733333333333 - type: ndcg_at_10 value: 39.51508333333334 - type: ndcg_at_100 value: 44.95316666666666 - type: ndcg_at_1000 value: 47.257083333333334 - type: ndcg_at_3 value: 34.205833333333324 - type: ndcg_at_5 value: 36.78266666666667 - type: precision_at_1 value: 28.54733333333333 - type: precision_at_10 value: 7.082583333333334 - type: precision_at_100 value: 1.1590833333333332 - type: precision_at_1000 value: 0.15516666666666662 - type: precision_at_3 value: 15.908750000000001 - type: precision_at_5 value: 11.505416666666669 - type: recall_at_1 value: 24.18591666666667 - type: recall_at_10 value: 52.38758333333333 - type: recall_at_100 value: 76.13666666666667 - type: recall_at_1000 value: 91.99066666666667 - type: recall_at_3 value: 37.78333333333334 - type: recall_at_5 value: 44.30141666666666 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 21.975 - type: map_at_10 value: 29.781000000000002 - type: map_at_100 value: 30.847 - type: map_at_1000 value: 30.94 - type: map_at_3 value: 27.167 - type: map_at_5 value: 28.633999999999997 - type: mrr_at_1 value: 24.387 - type: mrr_at_10 value: 32.476 - type: mrr_at_100 value: 33.337 - type: mrr_at_1000 value: 33.403 - type: mrr_at_3 value: 29.881999999999998 - type: mrr_at_5 value: 31.339 - type: ndcg_at_1 value: 24.387 - type: ndcg_at_10 value: 34.596 - type: ndcg_at_100 value: 39.635 - type: ndcg_at_1000 value: 42.079 - type: ndcg_at_3 value: 29.516 - type: ndcg_at_5 value: 31.959 - type: precision_at_1 value: 24.387 - type: precision_at_10 value: 5.6129999999999995 - type: precision_at_100 value: 0.8909999999999999 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.73 - type: precision_at_5 value: 9.171999999999999 - type: recall_at_1 value: 21.975 - type: recall_at_10 value: 46.826 - type: recall_at_100 value: 69.554 - type: recall_at_1000 value: 87.749 - type: recall_at_3 value: 33.016 - type: recall_at_5 value: 38.97 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB 
CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 15.614 - type: map_at_10 value: 22.927 - type: map_at_100 value: 24.185000000000002 - type: map_at_1000 value: 24.319 - type: map_at_3 value: 20.596 - type: map_at_5 value: 21.854000000000003 - type: mrr_at_1 value: 18.858 - type: mrr_at_10 value: 26.535999999999998 - type: mrr_at_100 value: 27.582 - type: mrr_at_1000 value: 27.665 - type: mrr_at_3 value: 24.295 - type: mrr_at_5 value: 25.532 - type: ndcg_at_1 value: 18.858 - type: ndcg_at_10 value: 27.583000000000002 - type: ndcg_at_100 value: 33.635 - type: ndcg_at_1000 value: 36.647 - type: ndcg_at_3 value: 23.348 - type: ndcg_at_5 value: 25.257 - type: precision_at_1 value: 18.858 - type: precision_at_10 value: 5.158 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.13999999999999999 - type: precision_at_3 value: 11.092 - type: precision_at_5 value: 8.1 - type: recall_at_1 value: 15.614 - type: recall_at_10 value: 37.916 - type: recall_at_100 value: 65.205 - type: recall_at_1000 value: 86.453 - type: recall_at_3 value: 26.137 - type: recall_at_5 value: 31.087999999999997 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 23.078000000000003 - type: map_at_10 value: 31.941999999999997 - type: map_at_100 value: 33.196999999999996 - type: map_at_1000 value: 33.303 - type: map_at_3 value: 28.927000000000003 - type: map_at_5 value: 30.707 - type: mrr_at_1 value: 26.866 - type: mrr_at_10 value: 35.557 - type: mrr_at_100 value: 36.569 - type: mrr_at_1000 value: 36.632 - type: mrr_at_3 value: 32.897999999999996 - type: mrr_at_5 value: 34.437 - type: ndcg_at_1 value: 26.866 - type: ndcg_at_10 value: 37.372 - type: ndcg_at_100 value: 43.248 - type: ndcg_at_1000 value: 45.632 - type: ndcg_at_3 value: 31.852999999999998 - type: ndcg_at_5 value: 34.582 - type: precision_at_1 value: 26.866 - type: precision_at_10 value: 6.511 - type: precision_at_100 value: 1.078 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 14.582999999999998 - type: precision_at_5 value: 10.634 - type: recall_at_1 value: 23.078000000000003 - type: recall_at_10 value: 50.334 - type: recall_at_100 value: 75.787 - type: recall_at_1000 value: 92.485 - type: recall_at_3 value: 35.386 - type: recall_at_5 value: 42.225 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 22.203999999999997 - type: map_at_10 value: 31.276 - type: map_at_100 value: 32.844 - type: map_at_1000 value: 33.062999999999995 - type: map_at_3 value: 27.733999999999998 - type: map_at_5 value: 29.64 - type: mrr_at_1 value: 27.272999999999996 - type: mrr_at_10 value: 36.083 - type: mrr_at_100 value: 37.008 - type: mrr_at_1000 value: 37.076 - type: mrr_at_3 value: 33.004 - type: mrr_at_5 value: 34.664 - type: ndcg_at_1 value: 27.272999999999996 - type: ndcg_at_10 value: 37.763000000000005 - type: ndcg_at_100 value: 43.566 - type: ndcg_at_1000 value: 46.356 - type: ndcg_at_3 value: 31.673000000000002 - type: ndcg_at_5 value: 34.501 - type: precision_at_1 value: 27.272999999999996 - type: precision_at_10 value: 7.470000000000001 - type: precision_at_100 value: 1.502 - type: precision_at_1000 value: 0.24 - 
type: precision_at_3 value: 14.756 - type: precision_at_5 value: 11.225 - type: recall_at_1 value: 22.203999999999997 - type: recall_at_10 value: 51.437999999999995 - type: recall_at_100 value: 76.845 - type: recall_at_1000 value: 94.38600000000001 - type: recall_at_3 value: 34.258 - type: recall_at_5 value: 41.512 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 17.474 - type: map_at_10 value: 26.362999999999996 - type: map_at_100 value: 27.456999999999997 - type: map_at_1000 value: 27.567999999999998 - type: map_at_3 value: 23.518 - type: map_at_5 value: 25.068 - type: mrr_at_1 value: 18.669 - type: mrr_at_10 value: 27.998 - type: mrr_at_100 value: 28.953 - type: mrr_at_1000 value: 29.03 - type: mrr_at_3 value: 25.230999999999998 - type: mrr_at_5 value: 26.654 - type: ndcg_at_1 value: 18.669 - type: ndcg_at_10 value: 31.684 - type: ndcg_at_100 value: 36.864999999999995 - type: ndcg_at_1000 value: 39.555 - type: ndcg_at_3 value: 26.057000000000002 - type: ndcg_at_5 value: 28.587 - type: precision_at_1 value: 18.669 - type: precision_at_10 value: 5.3420000000000005 - type: precision_at_100 value: 0.847 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 11.583 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 17.474 - type: recall_at_10 value: 46.497 - type: recall_at_100 value: 69.977 - type: recall_at_1000 value: 89.872 - type: recall_at_3 value: 31.385999999999996 - type: recall_at_5 value: 37.283 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 17.173 - type: map_at_10 value: 30.407 - type: map_at_100 value: 32.528 - type: map_at_1000 value: 32.698 - type: map_at_3 value: 25.523 - type: map_at_5 value: 28.038 - type: mrr_at_1 value: 38.958 - type: mrr_at_10 value: 51.515 - type: mrr_at_100 value: 52.214000000000006 - type: mrr_at_1000 value: 52.237 - type: mrr_at_3 value: 48.502 - type: mrr_at_5 value: 50.251000000000005 - type: ndcg_at_1 value: 38.958 - type: ndcg_at_10 value: 40.355000000000004 - type: ndcg_at_100 value: 47.68 - type: ndcg_at_1000 value: 50.370000000000005 - type: ndcg_at_3 value: 33.946 - type: ndcg_at_5 value: 36.057 - type: precision_at_1 value: 38.958 - type: precision_at_10 value: 12.508 - type: precision_at_100 value: 2.054 - type: precision_at_1000 value: 0.256 - type: precision_at_3 value: 25.581 - type: precision_at_5 value: 19.256999999999998 - type: recall_at_1 value: 17.173 - type: recall_at_10 value: 46.967 - type: recall_at_100 value: 71.47200000000001 - type: recall_at_1000 value: 86.238 - type: recall_at_3 value: 30.961 - type: recall_at_5 value: 37.539 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 8.999 - type: map_at_10 value: 18.989 - type: map_at_100 value: 26.133 - type: map_at_1000 value: 27.666 - type: map_at_3 value: 13.918 - type: map_at_5 value: 16.473 - type: mrr_at_1 value: 66.25 - type: mrr_at_10 value: 74.161 - type: mrr_at_100 value: 74.516 - type: mrr_at_1000 value: 74.524 - type: mrr_at_3 value: 72.875 - type: mrr_at_5 value: 73.613 - type: ndcg_at_1 value: 54.37499999999999 - type: ndcg_at_10 value: 39.902 - type: ndcg_at_100 value: 44.212 - type: ndcg_at_1000 
value: 51.62 - type: ndcg_at_3 value: 45.193 - type: ndcg_at_5 value: 42.541000000000004 - type: precision_at_1 value: 66.25 - type: precision_at_10 value: 30.425 - type: precision_at_100 value: 9.754999999999999 - type: precision_at_1000 value: 2.043 - type: precision_at_3 value: 48.25 - type: precision_at_5 value: 40.65 - type: recall_at_1 value: 8.999 - type: recall_at_10 value: 24.133 - type: recall_at_100 value: 49.138999999999996 - type: recall_at_1000 value: 72.639 - type: recall_at_3 value: 15.287999999999998 - type: recall_at_5 value: 19.415 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.38999999999999 - type: f1 value: 41.444205512055234 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 87.35000000000001 - type: map_at_10 value: 92.837 - type: map_at_100 value: 92.996 - type: map_at_1000 value: 93.006 - type: map_at_3 value: 92.187 - type: map_at_5 value: 92.595 - type: mrr_at_1 value: 93.864 - type: mrr_at_10 value: 96.723 - type: mrr_at_100 value: 96.72500000000001 - type: mrr_at_1000 value: 96.72500000000001 - type: mrr_at_3 value: 96.64 - type: mrr_at_5 value: 96.71499999999999 - type: ndcg_at_1 value: 93.864 - type: ndcg_at_10 value: 94.813 - type: ndcg_at_100 value: 95.243 - type: ndcg_at_1000 value: 95.38600000000001 - type: ndcg_at_3 value: 94.196 - type: ndcg_at_5 value: 94.521 - type: precision_at_1 value: 93.864 - type: precision_at_10 value: 10.951 - type: precision_at_100 value: 1.1400000000000001 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 35.114000000000004 - type: precision_at_5 value: 21.476 - type: recall_at_1 value: 87.35000000000001 - type: recall_at_10 value: 96.941 - type: recall_at_100 value: 98.397 - type: recall_at_1000 value: 99.21600000000001 - type: recall_at_3 value: 95.149 - type: recall_at_5 value: 96.131 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 24.476 - type: map_at_10 value: 40.11 - type: map_at_100 value: 42.229 - type: map_at_1000 value: 42.378 - type: map_at_3 value: 34.512 - type: map_at_5 value: 38.037 - type: mrr_at_1 value: 47.839999999999996 - type: mrr_at_10 value: 57.053 - type: mrr_at_100 value: 57.772 - type: mrr_at_1000 value: 57.799 - type: mrr_at_3 value: 54.552 - type: mrr_at_5 value: 56.011 - type: ndcg_at_1 value: 47.839999999999996 - type: ndcg_at_10 value: 48.650999999999996 - type: ndcg_at_100 value: 55.681000000000004 - type: ndcg_at_1000 value: 57.979 - type: ndcg_at_3 value: 43.923 - type: ndcg_at_5 value: 46.037 - type: precision_at_1 value: 47.839999999999996 - type: precision_at_10 value: 13.395000000000001 - type: precision_at_100 value: 2.0660000000000003 - type: precision_at_1000 value: 0.248 - type: precision_at_3 value: 29.064 - type: precision_at_5 value: 22.006 - type: recall_at_1 value: 24.476 - type: recall_at_10 value: 56.216 - type: recall_at_100 value: 81.798 - type: recall_at_1000 value: 95.48299999999999 - type: recall_at_3 value: 39.357 - type: recall_at_5 value: 47.802 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 42.728 - 
type: map_at_10 value: 57.737 - type: map_at_100 value: 58.531 - type: map_at_1000 value: 58.594 - type: map_at_3 value: 54.869 - type: map_at_5 value: 56.55 - type: mrr_at_1 value: 85.456 - type: mrr_at_10 value: 90.062 - type: mrr_at_100 value: 90.159 - type: mrr_at_1000 value: 90.16 - type: mrr_at_3 value: 89.37899999999999 - type: mrr_at_5 value: 89.81 - type: ndcg_at_1 value: 85.456 - type: ndcg_at_10 value: 67.755 - type: ndcg_at_100 value: 70.341 - type: ndcg_at_1000 value: 71.538 - type: ndcg_at_3 value: 63.735 - type: ndcg_at_5 value: 65.823 - type: precision_at_1 value: 85.456 - type: precision_at_10 value: 13.450000000000001 - type: precision_at_100 value: 1.545 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_3 value: 38.861000000000004 - type: precision_at_5 value: 24.964 - type: recall_at_1 value: 42.728 - type: recall_at_10 value: 67.252 - type: recall_at_100 value: 77.265 - type: recall_at_1000 value: 85.246 - type: recall_at_3 value: 58.292 - type: recall_at_5 value: 62.41100000000001 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 87.4836 - type: ap value: 82.29552224030336 - type: f1 value: 87.42791432227448 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 23.015 - type: map_at_10 value: 35.621 - type: map_at_100 value: 36.809 - type: map_at_1000 value: 36.853 - type: map_at_3 value: 31.832 - type: map_at_5 value: 34.006 - type: mrr_at_1 value: 23.738999999999997 - type: mrr_at_10 value: 36.309999999999995 - type: mrr_at_100 value: 37.422 - type: mrr_at_1000 value: 37.461 - type: mrr_at_3 value: 32.592999999999996 - type: mrr_at_5 value: 34.736 - type: ndcg_at_1 value: 23.724999999999998 - type: ndcg_at_10 value: 42.617 - type: ndcg_at_100 value: 48.217999999999996 - type: ndcg_at_1000 value: 49.309 - type: ndcg_at_3 value: 34.905 - type: ndcg_at_5 value: 38.769 - type: precision_at_1 value: 23.724999999999998 - type: precision_at_10 value: 6.689 - type: precision_at_100 value: 0.9480000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.89 - type: precision_at_5 value: 10.897 - type: recall_at_1 value: 23.015 - type: recall_at_10 value: 64.041 - type: recall_at_100 value: 89.724 - type: recall_at_1000 value: 98.00999999999999 - type: recall_at_3 value: 43.064 - type: recall_at_5 value: 52.31099999999999 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.49794801641588 - type: f1 value: 96.28931114498003 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.81121751025992 - type: f1 value: 63.18740125901853 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.66644250168123 - type: f1 value: 74.93211186867839 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 
7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 81.77202420981843 - type: f1 value: 81.63681969283554 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.596687684870645 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.26965660101405 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.33619694846802 - type: mrr value: 32.53719657720334 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.0729999999999995 - type: map_at_10 value: 13.245999999999999 - type: map_at_100 value: 16.747999999999998 - type: map_at_1000 value: 18.163 - type: map_at_3 value: 10.064 - type: map_at_5 value: 11.513 - type: mrr_at_1 value: 49.536 - type: mrr_at_10 value: 58.092 - type: mrr_at_100 value: 58.752 - type: mrr_at_1000 value: 58.78 - type: mrr_at_3 value: 56.398 - type: mrr_at_5 value: 57.389 - type: ndcg_at_1 value: 47.059 - type: ndcg_at_10 value: 35.881 - type: ndcg_at_100 value: 32.751999999999995 - type: ndcg_at_1000 value: 41.498000000000005 - type: ndcg_at_3 value: 42.518 - type: ndcg_at_5 value: 39.550999999999995 - type: precision_at_1 value: 49.536 - type: precision_at_10 value: 26.316 - type: precision_at_100 value: 8.084 - type: precision_at_1000 value: 2.081 - type: precision_at_3 value: 39.938 - type: precision_at_5 value: 34.056 - type: recall_at_1 value: 6.0729999999999995 - type: recall_at_10 value: 16.593 - type: recall_at_100 value: 32.883 - type: recall_at_1000 value: 64.654 - type: recall_at_3 value: 11.174000000000001 - type: recall_at_5 value: 13.528 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 30.043 - type: map_at_10 value: 45.318999999999996 - type: map_at_100 value: 46.381 - type: map_at_1000 value: 46.412 - type: map_at_3 value: 40.941 - type: map_at_5 value: 43.662 - type: mrr_at_1 value: 33.98 - type: mrr_at_10 value: 47.870000000000005 - type: mrr_at_100 value: 48.681999999999995 - type: mrr_at_1000 value: 48.703 - type: mrr_at_3 value: 44.341 - type: mrr_at_5 value: 46.547 - type: ndcg_at_1 value: 33.98 - type: ndcg_at_10 value: 52.957 - type: ndcg_at_100 value: 57.434 - type: ndcg_at_1000 value: 58.103 - type: ndcg_at_3 value: 44.896 - type: ndcg_at_5 value: 49.353 - type: precision_at_1 value: 33.98 - type: precision_at_10 value: 8.786 - type: precision_at_100 value: 1.1280000000000001 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 20.577 - type: precision_at_5 value: 14.942 - type: recall_at_1 value: 30.043 - type: recall_at_10 value: 73.593 - type: recall_at_100 value: 93.026 - type: recall_at_1000 value: 97.943 - type: recall_at_3 value: 52.955 - type: recall_at_5 value: 63.132 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.808 - type: map_at_10 value: 84.675 - type: map_at_100 value: 
85.322 - type: map_at_1000 value: 85.33800000000001 - type: map_at_3 value: 81.68900000000001 - type: map_at_5 value: 83.543 - type: mrr_at_1 value: 81.5 - type: mrr_at_10 value: 87.59700000000001 - type: mrr_at_100 value: 87.705 - type: mrr_at_1000 value: 87.70599999999999 - type: mrr_at_3 value: 86.607 - type: mrr_at_5 value: 87.289 - type: ndcg_at_1 value: 81.51 - type: ndcg_at_10 value: 88.41799999999999 - type: ndcg_at_100 value: 89.644 - type: ndcg_at_1000 value: 89.725 - type: ndcg_at_3 value: 85.49900000000001 - type: ndcg_at_5 value: 87.078 - type: precision_at_1 value: 81.51 - type: precision_at_10 value: 13.438 - type: precision_at_100 value: 1.532 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.363 - type: precision_at_5 value: 24.57 - type: recall_at_1 value: 70.808 - type: recall_at_10 value: 95.575 - type: recall_at_100 value: 99.667 - type: recall_at_1000 value: 99.98899999999999 - type: recall_at_3 value: 87.223 - type: recall_at_5 value: 91.682 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 58.614831329137715 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.86580408560826 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.093 - type: map_at_10 value: 13.014000000000001 - type: map_at_100 value: 15.412999999999998 - type: map_at_1000 value: 15.756999999999998 - type: map_at_3 value: 9.216000000000001 - type: map_at_5 value: 11.036999999999999 - type: mrr_at_1 value: 25.1 - type: mrr_at_10 value: 37.133 - type: mrr_at_100 value: 38.165 - type: mrr_at_1000 value: 38.198 - type: mrr_at_3 value: 33.217 - type: mrr_at_5 value: 35.732 - type: ndcg_at_1 value: 25.1 - type: ndcg_at_10 value: 21.918000000000003 - type: ndcg_at_100 value: 30.983 - type: ndcg_at_1000 value: 36.629 - type: ndcg_at_3 value: 20.544999999999998 - type: ndcg_at_5 value: 18.192 - type: precision_at_1 value: 25.1 - type: precision_at_10 value: 11.44 - type: precision_at_100 value: 2.459 - type: precision_at_1000 value: 0.381 - type: precision_at_3 value: 19.267 - type: precision_at_5 value: 16.16 - type: recall_at_1 value: 5.093 - type: recall_at_10 value: 23.215 - type: recall_at_100 value: 49.902 - type: recall_at_1000 value: 77.403 - type: recall_at_3 value: 11.733 - type: recall_at_5 value: 16.372999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.9365442977452 - type: cos_sim_spearman value: 79.36960687383745 - type: euclidean_pearson value: 79.6045204840714 - type: euclidean_spearman value: 79.26382712751337 - type: manhattan_pearson value: 79.4805084789529 - type: manhattan_spearman value: 79.21847863209523 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.27906192961453 - type: cos_sim_spearman value: 74.38364712099211 - type: euclidean_pearson value: 78.54358927241223 - type: euclidean_spearman value: 74.22185560806376 - type: manhattan_pearson value: 78.50904327377751 - type: 
manhattan_spearman value: 74.2627500781748 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.66863742649639 - type: cos_sim_spearman value: 84.70630905216271 - type: euclidean_pearson value: 84.64498334705334 - type: euclidean_spearman value: 84.87204770690148 - type: manhattan_pearson value: 84.65774227976077 - type: manhattan_spearman value: 84.91251851797985 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.1577763924467 - type: cos_sim_spearman value: 80.10314039230198 - type: euclidean_pearson value: 81.51346991046043 - type: euclidean_spearman value: 80.08678485109435 - type: manhattan_pearson value: 81.57058914661894 - type: manhattan_spearman value: 80.1516230725106 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.40310839662533 - type: cos_sim_spearman value: 87.16293477217867 - type: euclidean_pearson value: 86.50688711184775 - type: euclidean_spearman value: 87.08651444923031 - type: manhattan_pearson value: 86.54674677557857 - type: manhattan_spearman value: 87.15079017870971 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.32886275207817 - type: cos_sim_spearman value: 85.0190460590732 - type: euclidean_pearson value: 84.42553652784679 - type: euclidean_spearman value: 85.20027364279328 - type: manhattan_pearson value: 84.42926246281078 - type: manhattan_spearman value: 85.20187419804306 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 90.76732216967812 - type: cos_sim_spearman value: 90.63701653633909 - type: euclidean_pearson value: 90.26678186114682 - type: euclidean_spearman value: 90.67288073455427 - type: manhattan_pearson value: 90.20772020584582 - type: manhattan_spearman value: 90.60764863983702 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 69.09280387698125 - type: cos_sim_spearman value: 68.62743151172162 - type: euclidean_pearson value: 69.89386398104689 - type: euclidean_spearman value: 68.71191066733556 - type: manhattan_pearson value: 69.92516500604872 - type: manhattan_spearman value: 68.80452846992576 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.13178592019887 - type: cos_sim_spearman value: 86.03947178806887 - type: euclidean_pearson value: 85.87029414285313 - type: euclidean_spearman value: 86.04960843306998 - type: manhattan_pearson value: 85.92946858580146 - type: manhattan_spearman value: 86.12575341860442 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.16657063002837 - type: mrr value: 95.73671063867141 - task: type: 
Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 63.510999999999996 - type: map_at_10 value: 72.76899999999999 - type: map_at_100 value: 73.303 - type: map_at_1000 value: 73.32499999999999 - type: map_at_3 value: 70.514 - type: map_at_5 value: 71.929 - type: mrr_at_1 value: 66.333 - type: mrr_at_10 value: 73.75 - type: mrr_at_100 value: 74.119 - type: mrr_at_1000 value: 74.138 - type: mrr_at_3 value: 72.222 - type: mrr_at_5 value: 73.122 - type: ndcg_at_1 value: 66.333 - type: ndcg_at_10 value: 76.774 - type: ndcg_at_100 value: 78.78500000000001 - type: ndcg_at_1000 value: 79.254 - type: ndcg_at_3 value: 73.088 - type: ndcg_at_5 value: 75.002 - type: precision_at_1 value: 66.333 - type: precision_at_10 value: 9.833 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 28.222 - type: precision_at_5 value: 18.333 - type: recall_at_1 value: 63.510999999999996 - type: recall_at_10 value: 87.98899999999999 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 77.86699999999999 - type: recall_at_5 value: 82.73899999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.78514851485149 - type: cos_sim_ap value: 94.94214383862038 - type: cos_sim_f1 value: 89.02255639097744 - type: cos_sim_precision value: 89.2462311557789 - type: cos_sim_recall value: 88.8 - type: dot_accuracy value: 99.78217821782178 - type: dot_ap value: 94.69965247836805 - type: dot_f1 value: 88.78695208970439 - type: dot_precision value: 90.54054054054053 - type: dot_recall value: 87.1 - type: euclidean_accuracy value: 99.78118811881188 - type: euclidean_ap value: 94.9865187695411 - type: euclidean_f1 value: 88.99950223992036 - type: euclidean_precision value: 88.60257680872151 - type: euclidean_recall value: 89.4 - type: manhattan_accuracy value: 99.78811881188119 - type: manhattan_ap value: 95.0021236766459 - type: manhattan_f1 value: 89.12071535022356 - type: manhattan_precision value: 88.54886475814413 - type: manhattan_recall value: 89.7 - type: max_accuracy value: 99.78811881188119 - type: max_ap value: 95.0021236766459 - type: max_f1 value: 89.12071535022356 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 68.93190546593995 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 37.602808534760655 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.29214480978073 - type: mrr value: 53.123169722434426 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.967800769650022 - type: cos_sim_spearman value: 31.168490040206926 - type: dot_pearson value: 
30.888603021128553 - type: dot_spearman value: 31.028241262520385 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22300000000000003 - type: map_at_10 value: 1.781 - type: map_at_100 value: 9.905999999999999 - type: map_at_1000 value: 23.455000000000002 - type: map_at_3 value: 0.569 - type: map_at_5 value: 0.918 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 91.067 - type: mrr_at_100 value: 91.067 - type: mrr_at_1000 value: 91.067 - type: mrr_at_3 value: 90.667 - type: mrr_at_5 value: 91.067 - type: ndcg_at_1 value: 78.0 - type: ndcg_at_10 value: 73.13499999999999 - type: ndcg_at_100 value: 55.32 - type: ndcg_at_1000 value: 49.532 - type: ndcg_at_3 value: 73.715 - type: ndcg_at_5 value: 72.74199999999999 - type: precision_at_1 value: 84.0 - type: precision_at_10 value: 78.8 - type: precision_at_100 value: 56.32 - type: precision_at_1000 value: 21.504 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 78.0 - type: recall_at_1 value: 0.22300000000000003 - type: recall_at_10 value: 2.049 - type: recall_at_100 value: 13.553 - type: recall_at_1000 value: 46.367999999999995 - type: recall_at_3 value: 0.604 - type: recall_at_5 value: 1.015 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.0380000000000003 - type: map_at_10 value: 10.188 - type: map_at_100 value: 16.395 - type: map_at_1000 value: 18.024 - type: map_at_3 value: 6.236 - type: map_at_5 value: 7.276000000000001 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 46.292 - type: mrr_at_100 value: 47.446 - type: mrr_at_1000 value: 47.446 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 44.32 - type: ndcg_at_1 value: 32.653 - type: ndcg_at_10 value: 25.219 - type: ndcg_at_100 value: 37.802 - type: ndcg_at_1000 value: 49.274 - type: ndcg_at_3 value: 28.605999999999998 - type: ndcg_at_5 value: 26.21 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 21.837 - type: precision_at_100 value: 7.776 - type: precision_at_1000 value: 1.522 - type: precision_at_3 value: 28.571 - type: precision_at_5 value: 25.306 - type: recall_at_1 value: 3.0380000000000003 - type: recall_at_10 value: 16.298000000000002 - type: recall_at_100 value: 48.712 - type: recall_at_1000 value: 83.16799999999999 - type: recall_at_3 value: 7.265000000000001 - type: recall_at_5 value: 9.551 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 83.978 - type: ap value: 24.751887949330015 - type: f1 value: 66.8685134049279 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.573288058856825 - type: f1 value: 61.973261751726604 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.75483298792469 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 
70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.36824223639506 - type: cos_sim_ap value: 75.53126388573047 - type: cos_sim_f1 value: 67.9912831688245 - type: cos_sim_precision value: 66.11817501869858 - type: cos_sim_recall value: 69.9736147757256 - type: dot_accuracy value: 86.39804494248078 - type: dot_ap value: 75.27598891718046 - type: dot_f1 value: 67.91146284159763 - type: dot_precision value: 63.90505003490807 - type: dot_recall value: 72.45382585751979 - type: euclidean_accuracy value: 86.36228169517793 - type: euclidean_ap value: 75.51438087434647 - type: euclidean_f1 value: 68.02370523061066 - type: euclidean_precision value: 66.46525679758308 - type: euclidean_recall value: 69.65699208443272 - type: manhattan_accuracy value: 86.46361089586935 - type: manhattan_ap value: 75.50800785730111 - type: manhattan_f1 value: 67.9220437187253 - type: manhattan_precision value: 67.79705573080967 - type: manhattan_recall value: 68.04749340369392 - type: max_accuracy value: 86.46361089586935 - type: max_ap value: 75.53126388573047 - type: max_f1 value: 68.02370523061066 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.80350836341057 - type: cos_sim_ap value: 85.51101933260743 - type: cos_sim_f1 value: 77.9152271629704 - type: cos_sim_precision value: 75.27815662910056 - type: cos_sim_recall value: 80.74376347397599 - type: dot_accuracy value: 88.84425815966158 - type: dot_ap value: 85.49726945962519 - type: dot_f1 value: 77.94445269567801 - type: dot_precision value: 75.27251864601261 - type: dot_recall value: 80.81305820757623 - type: euclidean_accuracy value: 88.80350836341057 - type: euclidean_ap value: 85.4882880790211 - type: euclidean_f1 value: 77.87063284615103 - type: euclidean_precision value: 74.61022927689595 - type: euclidean_recall value: 81.42901139513397 - type: manhattan_accuracy value: 88.7161873714441 - type: manhattan_ap value: 85.45753871906821 - type: manhattan_f1 value: 77.8686401480111 - type: manhattan_precision value: 74.95903683123174 - type: manhattan_recall value: 81.01324299353249 - type: max_accuracy value: 88.84425815966158 - type: max_ap value: 85.51101933260743 - type: max_f1 value: 77.94445269567801 --- # gte-base-en-v1.5 We introduce series, upgraded embeddings that support the context length of up to **8192**, while further enhancing model performance. The models are built upon the encoder backbone (BERT + RoPE + GLU). The series achieve state-of-the-art scores on the MTEB benchmark within the same model size category and prodvide competitive on the LoCo long-context retrieval tests (refer to Evaluation). We also present the []( a SOTA instruction-tuned multi-lingual embedding model that ranked 2nd in MTEB and 1st in C-MTEB. - **Developed by:** Institute for Intelligent Computing, Alibaba Group - **Model type:** Text Embeddings - **Paper:** mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval ### Model list | Models | Language | Model Size | Max Seq. 
Length | Dimension | MTEB-en | LoCo | |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: | |[]( Multiple | 7720 | 32768 | 4096 | 67.34 | 87.57 | |[]( | English | 434 | 8192 | 1024 | 65.39 | 86.71 | |[]( | English | 137 | 8192 | 768 | 64.11 | 87.44 | ## How to Get Started with the Model Use the code below to get started with the model. **It is recommended to install xformers and enable unpadding for acceleration, refer to enable-unpadding-and-xformers.** Use with : Use with : ## Training Details ### Training Data - Masked language modeling (MLM): - Weak-supervised contrastive pre-training (CPT): GTE pre-training data - Supervised contrastive fine-tuning: GTE fine-tuning data ### Training Procedure To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy. The model first undergoes preliminary MLM pre-training on shorter lengths. And then, we resample the data, reducing the proportion of short texts, and continue the MLM pre-training. The entire training process is as follows: - MLM-2048: lr 5e-4, mlm_probability 0.3, batch_size 4096, num_steps 70000, rope_base 10000 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 20000, rope_base 500000 - CPT: max_len 512, lr 2e-4, batch_size 32768, num_steps 100000 - Fine-tuning: TODO ## Evaluation ### MTEB The results of other models are retrieved from MTEB leaderboard. The gte evaluation setting: , and set ntk scaling factor to 2 (equivalent to rope_base * 2). | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large-en-v1.5** | 434 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 | | mxbai-embed-large-v1 | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 | | multilingual-e5-large-instruct | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 | | bge-large-en-v1.5| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 | | **gte-base-en-v1.5** | 137 | 768 | 8192 | **64.11** | 77.17 | 46.82 | 85.33 | 57.66 | 54.09 | 81.97 | 31.17 | | bge-base-en-v1.5| 109 | 768 | 512 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 | ### LoCo | Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbastractRetrieval | QasperTitleRetrieval | GovReportRetrieval | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | gte-qwen1.5-7b | 4096 | 32768 | 87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | | gte-large-v1.5 |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 | | gte-base-v1.5 | 768 | 8192 | 87.44 | 49.91 | 91.78 | 99.82 | 97.13 | 98.58 | ## Citation If you find our paper or models helpful, please consider citing them as follows:", + "model_explanation_gemini": "Generates sentence embeddings for English text to perform tasks like classification, retrieval, clustering, and similarity scoring." 
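The inline usage snippets above ("Use with :") were stripped in extraction. As a rough sketch only — the Hub id `Alibaba-NLP/gte-base-en-v1.5` and CLS-token pooling are assumptions, not taken from this card — embedding with `transformers` might look like:

```python
# Hypothetical sketch: the model id and CLS pooling are assumptions, not from this card.
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "Alibaba-NLP/gte-base-en-v1.5"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The long-context backbone ships as custom modeling code, hence trust_remote_code.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

texts = ["what is the capital of China?", "Beijing", "sorting algorithms in python"]
batch = tokenizer(texts, max_length=8192, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch)

# Take the CLS token embedding and L2-normalize it before computing similarities.
embeddings = F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)
print((embeddings[:1] @ embeddings[1:].T) * 100)  # cosine similarity scores
```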
+} \ No newline at end of file diff --git a/data/model_data_json/Orenguteng_Llama-3-8B-Lexi-Uncensored.json b/data/model_data_json/Orenguteng_Llama-3-8B-Lexi-Uncensored.json new file mode 100644 index 0000000000000000000000000000000000000000..51ac555ffaaa50342c7adf7165dbbaa8ca8a8d15 --- /dev/null +++ b/data/model_data_json/Orenguteng_Llama-3-8B-Lexi-Uncensored.json @@ -0,0 +1,23 @@ +{ + "model_id": "Orenguteng/Llama-3-8B-Lexi-Uncensored", + "downloads": 259576, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "uncensored", + "llama3", + "instruct", + "open", + "conversational", + "license:llama3", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: llama3 tags: - uncensored - llama3 - instruct - open model-index: - name: Llama-3-8B-Lexi-Uncensored results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 59.56 name: normalized accuracy source: url: name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 77.88 name: normalized accuracy source: url: name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 67.68 name: accuracy source: url: name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 47.72 source: url: name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 75.85 name: accuracy source: url: name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 68.39 name: accuracy source: url: name: Open LLM Leaderboard --- !image/png This model is based on Llama-3-8b-Instruct, and is governed by META LLAMA 3 COMMUNITY LICENSE AGREEMENT Lexi is uncensored, which makes the model compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant with any requests, even unethical ones. You are responsible for any content you create using this model. Please use it responsibly. Lexi is licensed according to Meta's Llama license. I grant permission for any use, including commercial, that falls within accordance with Meta's Llama-3 license. # Open LLM Leaderboard Evaluation Results Detailed results can be found here | Metric |Value| |---------------------------------|----:| |Avg. |66.18| |AI2 Reasoning Challenge (25-Shot)|59.56| |HellaSwag (10-Shot) |77.88| |MMLU (5-Shot) |67.68| |TruthfulQA (0-shot) |47.72| |Winogrande (5-shot) |75.85| |GSM8k (5-shot) |68.39|", + "model_explanation_gemini": "An uncensored, instruction-tuned Llama-3-8B variant designed for compliant text generation across various tasks without built-in ethical constraints." 
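A minimal sketch of the client step described in the "Uncensored Test" instructions above — the endpoint, API key, and served model name below are placeholders for your own OpenAI-compatible server, not values from the card's script:

```python
# Hypothetical sketch of the harmful_behaviors.py-style client described above.
# base_url, api_key, and the model name are placeholders for your own server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Orenguteng/Llama-3-8B-Lexi-Uncensored",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```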
+} \ No newline at end of file diff --git a/data/model_data_json/PekingU_rtdetr_r101vd_coco_o365.json b/data/model_data_json/PekingU_rtdetr_r101vd_coco_o365.json new file mode 100644 index 0000000000000000000000000000000000000000..897aaba918fc9eb1b7befae4d386d1a58d1c7b78 --- /dev/null +++ b/data/model_data_json/PekingU_rtdetr_r101vd_coco_o365.json @@ -0,0 +1,19 @@ +{ + "model_id": "PekingU/rtdetr_r101vd_coco_o365", + "downloads": 96396, + "tags": [ + "transformers", + "safetensors", + "rt_detr", + "object-detection", + "vision", + "en", + "dataset:coco", + "arxiv:2304.08069", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en pipeline_tag: object-detection tags: - object-detection - vision datasets: - coco widget: - src: >- example_title: Savanna - src: >- example_title: Football Match - src: >- example_title: Airport --- # Model Card for RT-DETR ## Table of Contents 1. Model Details 2. Model Sources 3. How to Get Started with the Model 4. Training Details 5. Evaluation 6. Model Architecture and Objective 7. Citation ## Model Details !image/png > The YOLO series has become the most popular framework for real-time object detection due to its reasonable trade-off between speed and accuracy. However, we observe that the speed and accuracy of YOLOs are negatively affected by the NMS. Recently, end-to-end Transformer-based detectors (DETRs) have provided an alternative to eliminating NMS. Nevertheless, the high computational cost limits their practicality and hinders them from fully exploiting the advantage of excluding NMS. In this paper, we propose the Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma. We build RT-DETR in two steps, drawing on the advanced DETR: first we focus on maintaining accuracy while improving speed, followed by maintaining speed while improving accuracy. Specifically, we design an efficient hybrid encoder to expeditiously process multi-scale features by decoupling intra-scale interaction and cross-scale fusion to improve speed. Then, we propose the uncertainty-minimal query selection to provide high-quality initial queries to the decoder, thereby improving accuracy. In addition, RT-DETR supports flexible speed tuning by adjusting the number of decoder layers to adapt to various scenarios without retraining. Our RT-DETR-R50 / R101 achieves 53.1% / 54.3% AP on COCO and 108 / 74 FPS on T4 GPU, outperforming previously advanced YOLOs in both speed and accuracy. We also develop scaled RT-DETRs that outperform the lighter YOLO detectors (S and M models). Furthermore, RT-DETR-R50 outperforms DINO-R50 by 2.2% AP in accuracy and about 21 times in FPS. After pre-training with Objects365, RT-DETR-R50 / R101 achieves 55.3% / 56.2% AP. The project page: this https URL. This is the model card of a 🤗 transformers model that has been pushed on the Hub. - **Developed by:** Yian Zhao and Sangbum Choi - **Funded by:** National Key R&D Program of China (No.2022ZD0118201), Natural Science Foundation of China (No.61972217, 32071459, 62176249, 62006133, 62271465), and the Shenzhen Medical Research Funds in China (No. B2302037). - **Shared by:** Sangbum Choi - **Model type:** RT-DETR - **License:** Apache-2.0 ### Model Sources - **HF Docs:** RT-DETR - **Repository:** - **Paper:** - **Demo:** RT-DETR Tracking ## How to Get Started with the Model Use the code below to get started with the model. 
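(The snippet itself did not survive extraction; the following is a sketch using the `transformers` RT-DETR classes — the sample image URL and the 0.5 threshold are illustrative choices, not from the original card.)

```python
# Sketch based on the transformers RT-DETR integration; the test image URL is illustrative.
import torch
import requests
from PIL import Image
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r101vd_coco_o365")
model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r101vd_coco_o365")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into labeled detections above a confidence threshold.
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.5
)
for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 2), box.tolist())
```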
This should output ## Training Details ### Training Data The RTDETR model was trained on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. ### Training Procedure We conduct experiments on COCO and Objects365 datasets, where RT-DETR is trained on COCO train2017 and validated on COCO val2017 dataset. We report the standard COCO metrics, including AP (averaged over uniformly sampled IoU thresholds ranging from 0.50-0.95 with a step size of 0.05), AP50, AP75, as well as AP at different scales: APS, APM, APL. ### Preprocessing Images are resized to 640x640 pixels and rescaled with and . ### Training Hyperparameters - **Training regime:** !image/png ## Evaluation | Model | #Epochs | #Params (M) | GFLOPs | FPS_bs=1 | AP (val) | AP50 (val) | AP75 (val) | AP-s (val) | AP-m (val) | AP-l (val) | |----------------------------|---------|-------------|--------|----------|--------|-----------|-----------|----------|----------|----------| | RT-DETR-R18 | 72 | 20 | 60.7 | 217 | 46.5 | 63.8 | 50.4 | 28.4 | 49.8 | 63.0 | | RT-DETR-R34 | 72 | 31 | 91.0 | 172 | 48.5 | 66.2 | 52.3 | 30.2 | 51.9 | 66.2 | | RT-DETR R50 | 72 | 42 | 136 | 108 | 53.1 | 71.3 | 57.7 | 34.8 | 58.0 | 70.0 | | RT-DETR R101| 72 | 76 | 259 | 74 | 54.3 | 72.7 | 58.6 | 36.0 | 58.8 | 72.1 | | RT-DETR-R18 (Objects 365 pretrained) | 60 | 20 | 61 | 217 | 49.2 | 66.6 | 53.5 | 33.2 | 52.3 | 64.8 | | RT-DETR-R50 (Objects 365 pretrained) | 24 | 42 | 136 | 108 | 55.3 | 73.4 | 60.1 | 37.9 | 59.9 | 71.8 | | RT-DETR-R101 (Objects 365 pretrained) | 24 | 76 | 259 | 74 | 56.2 | 74.6 | 61.3 | 38.3 | 60.5 | 73.5 | ### Model Architecture and Objective !image/png Overview of RT-DETR. We feed the features from the last three stages of the backbone into the encoder. The efficient hybrid encoder transforms multi-scale features into a sequence of image features through the Attention-based Intra-scale Feature Interaction (AIFI) and the CNN-based Cross-scale Feature Fusion (CCFF). Then, the uncertainty-minimal query selection selects a fixed number of encoder features to serve as initial object queries for the decoder. Finally, the decoder with auxiliary prediction heads iteratively optimizes object queries to generate categories and boxes. ## Citation **BibTeX:** ## Model Card Authors Sangbum Choi Pavel Iakubovskii", + "model_explanation_gemini": "Detects objects in real-time images with high accuracy using a transformer-based architecture, eliminating the need for non-maximum suppression while supporting flexible speed adjustments." +} \ No newline at end of file diff --git a/data/model_data_json/PekingU_rtdetr_r50vd_coco_o365.json b/data/model_data_json/PekingU_rtdetr_r50vd_coco_o365.json new file mode 100644 index 0000000000000000000000000000000000000000..22c1c30c8a33f8e8db8923135d9d115a23886775 --- /dev/null +++ b/data/model_data_json/PekingU_rtdetr_r50vd_coco_o365.json @@ -0,0 +1,19 @@ +{ + "model_id": "PekingU/rtdetr_r50vd_coco_o365", + "downloads": 83311, + "tags": [ + "transformers", + "safetensors", + "rt_detr", + "object-detection", + "vision", + "en", + "dataset:coco", + "arxiv:2304.08069", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en pipeline_tag: object-detection tags: - object-detection - vision datasets: - coco widget: - src: >- example_title: Savanna - src: >- example_title: Football Match - src: >- example_title: Airport --- # Model Card for RT-DETR ## Table of Contents 1. 
Model Details 2. Model Sources 3. How to Get Started with the Model 4. Training Details 5. Evaluation 6. Model Architecture and Objective 7. Citation ## Model Details !image/png > The YOLO series has become the most popular framework for real-time object detection due to its reasonable trade-off between speed and accuracy. However, we observe that the speed and accuracy of YOLOs are negatively affected by the NMS. Recently, end-to-end Transformer-based detectors (DETRs) have provided an alternative to eliminating NMS. Nevertheless, the high computational cost limits their practicality and hinders them from fully exploiting the advantage of excluding NMS. In this paper, we propose the Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma. We build RT-DETR in two steps, drawing on the advanced DETR: first we focus on maintaining accuracy while improving speed, followed by maintaining speed while improving accuracy. Specifically, we design an efficient hybrid encoder to expeditiously process multi-scale features by decoupling intra-scale interaction and cross-scale fusion to improve speed. Then, we propose the uncertainty-minimal query selection to provide high-quality initial queries to the decoder, thereby improving accuracy. In addition, RT-DETR supports flexible speed tuning by adjusting the number of decoder layers to adapt to various scenarios without retraining. Our RT-DETR-R50 / R101 achieves 53.1% / 54.3% AP on COCO and 108 / 74 FPS on T4 GPU, outperforming previously advanced YOLOs in both speed and accuracy. We also develop scaled RT-DETRs that outperform the lighter YOLO detectors (S and M models). Furthermore, RT-DETR-R50 outperforms DINO-R50 by 2.2% AP in accuracy and about 21 times in FPS. After pre-training with Objects365, RT-DETR-R50 / R101 achieves 55.3% / 56.2% AP. The project page: this https URL. This is the model card of a 🤗 transformers model that has been pushed on the Hub. - **Developed by:** Yian Zhao and Sangbum Choi - **Funded by:** National Key R&D Program of China (No.2022ZD0118201), Natural Science Foundation of China (No.61972217, 32071459, 62176249, 62006133, 62271465), and the Shenzhen Medical Research Funds in China (No. B2302037). - **Shared by:** Sangbum Choi - **Model type:** RT-DETR - **License:** Apache-2.0 ### Model Sources - **HF Docs:** RT-DETR - **Repository:** - **Paper:** - **Demo:** RT-DETR Tracking ## How to Get Started with the Model Use the code below to get started with the model. This should output ## Training Details ### Training Data The RTDETR model was trained on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. ### Training Procedure We conduct experiments on COCO and Objects365 datasets, where RT-DETR is trained on COCO train2017 and validated on COCO val2017 dataset. We report the standard COCO metrics, including AP (averaged over uniformly sampled IoU thresholds ranging from 0.50-0.95 with a step size of 0.05), AP50, AP75, as well as AP at different scales: APS, APM, APL. ### Preprocessing Images are resized to 640x640 pixels and rescaled with and . 
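(The exact rescale constants are elided above. As an assumed illustration, the preprocessing config shipped with the checkpoint can be inspected directly; the expected values in the comments are assumptions, not read from this card.)

```python
# Assumed illustration: inspect the preprocessing config shipped with the checkpoint.
from transformers import RTDetrImageProcessor

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd_coco_o365")
print(processor.size)            # expected: {'height': 640, 'width': 640}
print(processor.rescale_factor)  # expected: 1/255
```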
### Training Hyperparameters - **Training regime:** !image/png ## Evaluation | Model | #Epochs | #Params (M) | GFLOPs | FPS_bs=1 | AP (val) | AP50 (val) | AP75 (val) | AP-s (val) | AP-m (val) | AP-l (val) | |----------------------------|---------|-------------|--------|----------|--------|-----------|-----------|----------|----------|----------| | RT-DETR-R18 | 72 | 20 | 60.7 | 217 | 46.5 | 63.8 | 50.4 | 28.4 | 49.8 | 63.0 | | RT-DETR-R34 | 72 | 31 | 91.0 | 172 | 48.5 | 66.2 | 52.3 | 30.2 | 51.9 | 66.2 | | RT-DETR R50 | 72 | 42 | 136 | 108 | 53.1 | 71.3 | 57.7 | 34.8 | 58.0 | 70.0 | | RT-DETR R101| 72 | 76 | 259 | 74 | 54.3 | 72.7 | 58.6 | 36.0 | 58.8 | 72.1 | | RT-DETR-R18 (Objects 365 pretrained) | 60 | 20 | 61 | 217 | 49.2 | 66.6 | 53.5 | 33.2 | 52.3 | 64.8 | | RT-DETR-R50 (Objects 365 pretrained) | 24 | 42 | 136 | 108 | 55.3 | 73.4 | 60.1 | 37.9 | 59.9 | 71.8 | | RT-DETR-R101 (Objects 365 pretrained) | 24 | 76 | 259 | 74 | 56.2 | 74.6 | 61.3 | 38.3 | 60.5 | 73.5 | ### Model Architecture and Objective !image/png Overview of RT-DETR. We feed the features from the last three stages of the backbone into the encoder. The efficient hybrid encoder transforms multi-scale features into a sequence of image features through the Attention-based Intra-scale Feature Interaction (AIFI) and the CNN-based Cross-scale Feature Fusion (CCFF). Then, the uncertainty-minimal query selection selects a fixed number of encoder features to serve as initial object queries for the decoder. Finally, the decoder with auxiliary prediction heads iteratively optimizes object queries to generate categories and boxes. ## Citation **BibTeX:** ## Model Card Authors Sangbum Choi Pavel Iakubovskii", + "model_explanation_gemini": "Detects objects in real-time images without needing non-maximum suppression, achieving high speed and accuracy by using a transformer-based architecture trained on COCO and Objects365 datasets." +} \ No newline at end of file diff --git a/data/model_data_json/ProsusAI_finbert.json b/data/model_data_json/ProsusAI_finbert.json new file mode 100644 index 0000000000000000000000000000000000000000..f4165f4948a59e6a675bb5153c51c2cb3b1cd87c --- /dev/null +++ b/data/model_data_json/ProsusAI_finbert.json @@ -0,0 +1,21 @@ +{ + "model_id": "ProsusAI/finbert", + "downloads": 1571808, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "text-classification", + "financial-sentiment-analysis", + "sentiment-analysis", + "en", + "arxiv:1908.10063", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: \"en\" tags: - financial-sentiment-analysis - sentiment-analysis widget: - text: \"Stocks rallied and the British pound gained.\" --- FinBERT is a pre-trained NLP model to analyze sentiment of financial text. It is built by further training the BERT language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification. Financial PhraseBank by Malo et al. (2014) is used for fine-tuning. For more details, please see the paper FinBERT: Financial Sentiment Analysis with Pre-trained Language Models and our related blog post on Medium. The model will give softmax outputs for three labels: positive, negative or neutral. --- About Prosus Prosus is a global consumer internet group and one of the largest technology investors in the world. 
Operating and investing globally in markets with long-term growth potential, Prosus builds leading consumer internet companies that empower people and enrich communities. For more information, please visit www.prosus.com. Contact information Please contact Dogu Araci dogu.araci[at]prosus[dot]com and Zulkuf Genc zulkuf.genc[at]prosus[dot]com about any FinBERT related issues and questions.", + "model_explanation_gemini": "Analyzes financial text sentiment, classifying it as positive, negative, or neutral using a BERT model fine-tuned on financial data." +} \ No newline at end of file diff --git a/data/model_data_json/Qdrant_Splade_PP_en_v1.json b/data/model_data_json/Qdrant_Splade_PP_en_v1.json new file mode 100644 index 0000000000000000000000000000000000000000..57d67e434e48eb8d9c51e92bbbb4e67e98d6400a --- /dev/null +++ b/data/model_data_json/Qdrant_Splade_PP_en_v1.json @@ -0,0 +1,18 @@ +{ + "model_id": "Qdrant/Splade_PP_en_v1", + "downloads": 295034, + "tags": [ + "transformers", + "onnx", + "bert", + "fill-mask", + "sentence-similarity", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: sentence-similarity --- ONNX port of prithivida/Splade_PP_en_v1 for text classification and similarity searches. ### Usage Here's an example of performing inference using the model with FastEmbed.", + "model_explanation_gemini": "An ONNX-ported model for text classification and similarity searches, optimized for inference with FastEmbed." +} \ No newline at end of file diff --git a/data/model_data_json/Qdrant_all-MiniLM-L6-v2-onnx.json b/data/model_data_json/Qdrant_all-MiniLM-L6-v2-onnx.json new file mode 100644 index 0000000000000000000000000000000000000000..bd757e78b887d9499166d506c1217db022d09aa0 --- /dev/null +++ b/data/model_data_json/Qdrant_all-MiniLM-L6-v2-onnx.json @@ -0,0 +1,17 @@ +{ + "model_id": "Qdrant/all-MiniLM-L6-v2-onnx", + "downloads": 255083, + "tags": [ + "transformers", + "onnx", + "bert", + "feature-extraction", + "sentence-similarity", + "license:apache-2.0", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: sentence-similarity --- ONNX port of sentence-transformers/all-MiniLM-L6-v2 for text classification and similarity searches. ### Usage Here's an example of performing inference using the model with FastEmbed. > Note: This model is supposed to be used with Qdrant. Vectors have to be configured with Modifier.IDF.", + "model_explanation_gemini": "An ONNX-optimized version of all-MiniLM-L6-v2 for text classification and similarity searches, designed for use with Qdrant's vector search requiring IDF-weighted vectors." 
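The FastEmbed examples referenced in the two Qdrant cards above are not included; here is a minimal sketch covering both the sparse SPLADE port and the dense MiniLM port, assuming both model ids are registered in your fastembed version:

```python
# Minimal FastEmbed sketch; assumes both model names are registered in fastembed.
from fastembed import SparseTextEmbedding, TextEmbedding

docs = ["FastEmbed runs ONNX models locally.", "Qdrant stores and searches vectors."]

sparse_model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
for emb in sparse_model.embed(docs):
    print(emb.indices[:5], emb.values[:5])  # token ids and weights of the sparse vector

dense_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
for emb in dense_model.embed(docs):
    print(len(emb))  # 384-dimensional dense vector
```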
+} \ No newline at end of file diff --git a/data/model_data_json/Qdrant_all_miniLM_L6_v2_with_attentions.json b/data/model_data_json/Qdrant_all_miniLM_L6_v2_with_attentions.json new file mode 100644 index 0000000000000000000000000000000000000000..74ffadef3fbc17cb6919da80624578ad5927b1f7 --- /dev/null +++ b/data/model_data_json/Qdrant_all_miniLM_L6_v2_with_attentions.json @@ -0,0 +1,18 @@ +{ + "model_id": "Qdrant/all_miniLM_L6_v2_with_attentions", + "downloads": 453103, + "tags": [ + "transformers", + "onnx", + "bert", + "feature-extraction", + "sentence-similarity", + "en", + "license:apache-2.0", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: sentence-similarity --- ONNX port of sentence-transformers/all-MiniLM-L6-v2 adjusted to return attention weights. This model is intended to be used for BM42 searches. ### Usage Here's an example of performing inference using the model with FastEmbed.", + "model_explanation_gemini": "An ONNX-adapted version of all-MiniLM-L6-v2 that computes sentence similarity and returns attention weights, primarily for BM42 searches." +} \ No newline at end of file diff --git a/data/model_data_json/Qdrant_bge-small-en-v1.5-onnx-Q.json b/data/model_data_json/Qdrant_bge-small-en-v1.5-onnx-Q.json new file mode 100644 index 0000000000000000000000000000000000000000..52566a24578b431b98527f92d6f5c28fb6d366fd --- /dev/null +++ b/data/model_data_json/Qdrant_bge-small-en-v1.5-onnx-Q.json @@ -0,0 +1,17 @@ +{ + "model_id": "Qdrant/bge-small-en-v1.5-onnx-Q", + "downloads": 334843, + "tags": [ + "transformers", + "onnx", + "bert", + "feature-extraction", + "sentence-similarity", + "license:apache-2.0", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: sentence-similarity --- Quantized ONNX port of BAAI/bge-small-en-v1.5 for text classification and similarity searches. ### Usage Here's an example of performing inference using the model with FastEmbed.", + "model_explanation_gemini": "Quantized ONNX port of BAAI/bge-small-en-v1.5 for text classification and similarity searches." +} \ No newline at end of file diff --git a/data/model_data_json/Qdrant_bm25.json b/data/model_data_json/Qdrant_bm25.json new file mode 100644 index 0000000000000000000000000000000000000000..155d3c83677584935dfbb471ac367f95711c0ad2 --- /dev/null +++ b/data/model_data_json/Qdrant_bm25.json @@ -0,0 +1,14 @@ +{ + "model_id": "Qdrant/bm25", + "downloads": 184025, + "tags": [ + "transformers", + "sentence-similarity", + "en", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: sentence-similarity --- Repository with files to perform BM25 searches with FastEmbed. BM25 (Best Matching 25) is a ranking function used by search engines to estimate the relevance of documents to a given search query. ### Usage > Note: This model is supposed to be used with Qdrant. Vectors have to be configured with Modifier.IDF. Here's an example of BM25 with FastEmbed. ```", + "model_explanation_gemini": "Performs BM25 searches to estimate document relevance for a given query, designed for use with Qdrant and FastEmbed." 
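The BM25 example above breaks off at its opening code fence; a sketch of the intended pairing — FastEmbed BM25 term vectors stored in a Qdrant collection configured with `Modifier.IDF`, as the card requires — might look like this (the collection name and document are placeholders):

```python
# Sketch: BM25 term vectors from FastEmbed, stored in a Qdrant collection
# configured with Modifier.IDF as required above. Collection name is a placeholder.
from fastembed import SparseTextEmbedding
from qdrant_client import QdrantClient, models

bm25 = SparseTextEmbedding(model_name="Qdrant/bm25")
embedding = next(bm25.embed(["BM25 ranks documents by term relevance."]))

client = QdrantClient(":memory:")
client.create_collection(
    collection_name="demo",
    vectors_config={},
    sparse_vectors_config={
        "bm25": models.SparseVectorParams(modifier=models.Modifier.IDF)
    },
)
client.upsert(
    collection_name="demo",
    points=[
        models.PointStruct(
            id=1,
            vector={"bm25": models.SparseVector(
                indices=embedding.indices.tolist(), values=embedding.values.tolist()
            )},
        )
    ],
)
```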
+} \ No newline at end of file diff --git a/data/model_data_json/QuantFactory_DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF.json b/data/model_data_json/QuantFactory_DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..0fb9998f061ce8ed75664147ed1431eae46809b7 --- /dev/null +++ b/data/model_data_json/QuantFactory_DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF.json @@ -0,0 +1,35 @@ +{ + "model_id": "QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF", + "downloads": 127549, + "tags": [ + "gguf", + "roleplay", + "llama3", + "sillytavern", + "idol", + "facebook", + "meta", + "pytorch", + "llama", + "llama-3", + "text-generation", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "zh", + "ko", + "ja", + "arxiv:2204.05149", + "license:llama3.1", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th - zh - ko - ja license: llama3.1 pipeline_tag: text-generation tags: - roleplay - llama3 - sillytavern - idol - facebook - meta - pytorch - llama - llama-3 extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other --- - Llama-3.1-8B-Instruct Uncensored - Roleplay(roleplay and Dark-roleplay) - Writing Prompts - writing opus - Realignment of Chinese, Japanese, and Korean - only test en. - Input Models input text only. Output Models generate text and code only. - Uncensored - Quick response - A scholarly response akin to a thesis.(I tend to write songs extensively, to the point where one song almost becomes as detailed as a thesis. :) - DarkIdol:Roles that you can imagine and those that you cannot imagine. - Specialized in various role-playing scenarios ## Uncensored Test - pip install datasets openai - start you openai Server,change Uncensored_Test/harmful_behaviors.py client to you Openai Server address and api_key - python Uncensored_Test/harmful_behaviors.py ## Special Thanks: ### Lewdiculous's superb gguf version, thank you for your conscientious and responsible dedication. - ### mradermacher's superb gguf version, thank you for your conscientious and responsible dedication. - - ## virtual idol Twitter - ## Datasets credits: - ChaoticNeutrals - Gryphe - meseca - NeverSleep Lumimaid ## Program: - Uncensored: Refusal in LLMs is mediated by a single direction - Uncensored: Program - Uncensored: Program Llama 3.1 by Aifeifei799 ## Questions - The model's response results are for reference only, please do not fully trust them. - This model is solely for learning and testing purposes, and errors in output are inevitable. We do not take responsibility for the output results. If the output content is to be used, it must be modified; if not modified, we will assume it has been altered. - For commercial licensing, please refer to the Llama 3.1 agreement. # Llama-3.1-8B-Instruct Information ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. 
**Model developer:** Meta

**Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**. **Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . Note: You can also find detailed recipes on how to use the model locally, with , assisted generations, quantised and more at []( ### Use with Please, follow the instructions in the repository To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. 
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
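As a quick arithmetic check on the table above, the per-model rows do sum to the reported totals:

```python
# Values copied from the table above; the sums should match the "Total" row.
gpu_hours = {"8B": 1.46e6, "70B": 7.0e6, "405B": 30.84e6}
location_emissions = {"8B": 420, "70B": 2_040, "405B": 8_930}  # tons CO2eq
print(f"{sum(gpu_hours.values()) / 1e6:.1f}M GPU hours")   # 39.3M
print(f"{sum(location_emissions.values()):,} tons CO2eq")  # 11,390
```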
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 41.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
### Multilingual benchmarks

| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, so that they are aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common-use-case evaluations measure the safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which dedicated benchmarks were crafted, including long context, multilingual use, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to the proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in child safety risks, and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development.
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking into account market-specific nuances and experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.", + "model_explanation_gemini": "An uncensored, multilingual 8B-parameter Llama 3.1 model specialized in roleplay, dark roleplay scenarios, and text generation with quick, detailed responses." +} \ No newline at end of file diff --git a/data/model_data_json/QuantFactory_Meta-Llama-3.1-8B-Instruct-GGUF.json b/data/model_data_json/QuantFactory_Meta-Llama-3.1-8B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..edc614d6322b5bfd3732225d1cd2bc9bc90c804b --- /dev/null +++ b/data/model_data_json/QuantFactory_Meta-Llama-3.1-8B-Instruct-GGUF.json @@ -0,0 +1,27 @@ +{ + "model_id": "QuantFactory/Meta-Llama-3.1-8B-Instruct-GGUF", + "downloads": 77242, + "tags": [ + "gguf", + "facebook", + "meta", + "pytorch", + "llama", + "llama-3", + "text-generation", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "license:llama3.1", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.1 extra_gated_prompt: >- ### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT Llama 3.1 Version Release Date: July 23, 2024 \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 3.1 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Llama 3.1\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"Llama Materials\" means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. 
If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). 
All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.1 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.1 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.1 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 3. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 4. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 5. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 6. 
Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 7. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 8. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.1 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.1 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 3.1 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). 
The Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available. **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction-tuned, text-only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card. **Note**: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function. Make sure to update your transformers installation via `pip install --upgrade transformers`. Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generations, quantised inference and more in the huggingface-llama-recipes repository. ### Use with `llama` Please follow the instructions in the repository. To download original checkpoints, see the example `huggingface-cli` download command in the repository. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative 39.3M GPU hours** of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training.
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 41.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases. Examples of how Meta’s Llama models have been responsibly deployed can be found on our Community Stories webpage. Our approach is to build the most helpful models, enabling the world to benefit from the power of this technology by aligning our model safety for generic use cases that address a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide; refer to that guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, and to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload required to deploy safe AI systems. For more details on the safety mitigations implemented, please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment, as well as to mitigating safety and security risks inherent to the system and to any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs, and possible integrations by developers with third-party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all generative AI use cases.
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, so that they are aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common-use-case evaluations measure the safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which dedicated benchmarks were crafted, including long context, multilingual use, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to the proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in child safety risks, and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development.
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking into account market-specific nuances and experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_QwQ-32B-AWQ.json b/data/model_data_json/Qwen_QwQ-32B-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..0da3075a69cca3bfd522467438ffa99206e8109d --- /dev/null +++ b/data/model_data_json/Qwen_QwQ-32B-AWQ.json @@ -0,0 +1,22 @@ +{ + "model_id": "Qwen/QwQ-32B-AWQ", + "downloads": 86516, + "tags": [ + "safetensors", + "qwen2", + "chat", + "text-generation", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2412.15115", + "base_model:Qwen/QwQ-32B", + "base_model:quantized:Qwen/QwQ-32B", + "license:apache-2.0", + "4-bit", + "awq", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/QwQ-32B tags: - chat --- # QwQ-32B-AWQ
\"Chat\" ## Introduction QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

**This repo contains the AWQ-quantized 4-bit QwQ 32B model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training (Supervised Finetuning and Reinforcement Learning) - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Parameters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 131,072 tokens - For prompts exceeding 8,192 tokens in length, you must enable YaRN as outlined in this section. - Quantization: AWQ 4-bit **Note:** For the best experience, please review the usage guidelines before deploying QwQ models. You can try our demo or access QwQ models via QwenChat. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements QwQ is based on Qwen2.5, whose code has been in the latest Hugging Face `transformers`. We advise you to use the latest version of `transformers`. With `transformers<4.37.0`, you will encounter the following error: `KeyError: 'qwen2'`. Also check out our AWQ documentation for more usage guidance. ## Quickstart Here is a code snippet with `apply_chat_template` showing how to load the tokenizer and model and how to generate content. ### Usage Guidelines To achieve optimal performance, we recommend the following settings: 1. **Enforce Thoughtful Output**: Ensure the model starts with \"<think>\\n\" to prevent generating empty thinking content, which can degrade output quality. If you use `apply_chat_template` and set `add_generation_prompt=True`, this is already automatically implemented, but it may cause the response to lack the <think> tag at the beginning. This is normal behavior. 2. **Sampling Parameters**: - Use Temperature=0.6, TopP=0.95, MinP=0 instead of greedy decoding to avoid endless repetitions. - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output. - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may result in occasional language mixing and a slight decrease in performance. 3. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This feature is already implemented in `apply_chat_template`. 4. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking. - **Math Problems**: Include \"Please reason step by step, and put your final answer within \\boxed{}.\" in the prompt. - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: \"Please show your choice in the answer field with only the choice letter, e.g., \"answer\": \"C\".\" 5. **Handle Long Inputs**: For inputs exceeding 8,192 tokens, enable YaRN to improve the model's ability to capture long-sequence information effectively. For supported frameworks, you can add a YaRN rope_scaling block to `config.json` to enable it. For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here.
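A minimal quickstart sketch consistent with the guidelines above, assuming the standard transformers chat-template API; the prompt, `max_new_tokens`, and exact TopK value are illustrative assumptions, and loading the 4-bit checkpoint is assumed to rely on the usual AWQ integration in transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loading the 4-bit AWQ checkpoint; transformers dispatches to AWQ kernels.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Request the final answer in \boxed{}, as the benchmarking guidance suggests.
messages = [{"role": "user", "content": "Please reason step by step, and put "
             "your final answer within \\boxed{}. What is 7 * 8?"}]
# add_generation_prompt=True lets the chat template open the thinking turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,  # illustrative; reasoning traces can run long
    do_sample=True,       # the card advises against greedy decoding
    temperature=0.6,
    top_p=0.95,
    top_k=30,             # any value in the recommended 20-40 range
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```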
## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "QwQ-32B-AWQ is a 4-bit quantized, 32.5B parameter reasoning-focused language model optimized for enhanced performance on complex tasks through step-by-step thinking and structured output generation." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_QwQ-32B-Preview.json b/data/model_data_json/Qwen_QwQ-32B-Preview.json new file mode 100644 index 0000000000000000000000000000000000000000..2d5440808c60d0ddcdd247cbaf49a4c909a7b1fa --- /dev/null +++ b/data/model_data_json/Qwen_QwQ-32B-Preview.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/QwQ-32B-Preview", + "downloads": 97541, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-32B-Instruct", + "base_model:finetune:Qwen/Qwen2.5-32B-Instruct", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en base_model: Qwen/Qwen2.5-32B-Instruct tags: - chat library_name: transformers --- # QwQ-32B-Preview ## Introduction **QwQ-32B-Preview** is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising analytical abilities while having several important limitations: 1. **Language Mixing and Code-Switching**: The model may mix languages or switch between them unexpectedly, affecting response clarity. 2. **Recursive Reasoning Loops**: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer. 3. **Safety and Ethical Considerations**: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it. 4. **Performance and Benchmark Limitations**: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding. **Specification**: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Parameters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 32,768 tokens For more details, please refer to our blog. You can also check the Qwen2.5 GitHub and Documentation. ## Requirements The code of Qwen2.5 has been in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`. With `transformers<4.37.0`, you will encounter the following error: `KeyError: 'qwen2'`. ## Quickstart Here is a code snippet with `apply_chat_template` showing how to load the tokenizer and model and how to generate content. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "QwQ-32B-Preview is an experimental 32.5B-parameter causal language model focused on advancing AI reasoning, particularly in math and coding, while exhibiting limitations in language mixing, recursive loops, and safety measures."
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_QwQ-32B.json b/data/model_data_json/Qwen_QwQ-32B.json new file mode 100644 index 0000000000000000000000000000000000000000..ef32d5f36c33f563b452c40f49b34810857c93db --- /dev/null +++ b/data/model_data_json/Qwen_QwQ-32B.json @@ -0,0 +1,24 @@ +{ + "model_id": "Qwen/QwQ-32B", + "downloads": 574662, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2412.15115", + "base_model:Qwen/Qwen2.5-32B", + "base_model:finetune:Qwen/Qwen2.5-32B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-32B tags: - chat library_name: transformers --- # QwQ-32B ## Introduction QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

**This repo contains the QwQ 32B model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training (Supervised Finetuning and Reinforcement Learning) - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Parameters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 131,072 tokens - For prompts exceeding 8,192 tokens in length, you must enable YaRN as outlined in this section. **Note:** For the best experience, please review the usage guidelines before deploying QwQ models. You can try our demo or access QwQ models via QwenChat. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements QwQ is based on Qwen2.5, whose code has been in the latest Hugging Face `transformers`. We advise you to use the latest version of `transformers`. With `transformers<4.37.0`, you will encounter the following error: `KeyError: 'qwen2'`. ## Quickstart Here is a code snippet with `apply_chat_template` showing how to load the tokenizer and model and how to generate content. ### Usage Guidelines To achieve optimal performance, we recommend the following settings: 1. **Enforce Thoughtful Output**: Ensure the model starts with \"<think>\\n\" to prevent generating empty thinking content, which can degrade output quality. If you use `apply_chat_template` and set `add_generation_prompt=True`, this is already automatically implemented, but it may cause the response to lack the <think> tag at the beginning. This is normal behavior. 2. **Sampling Parameters**: - Use Temperature=0.6, TopP=0.95, MinP=0 instead of greedy decoding to avoid endless repetitions. - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output. - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may result in occasional language mixing and a slight decrease in performance. 3. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This feature is already implemented in `apply_chat_template`. 4. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking. - **Math Problems**: Include \"Please reason step by step, and put your final answer within \\boxed{}.\" in the prompt. - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: \"Please show your choice in the answer field with only the choice letter, e.g., \"answer\": \"C\".\" 5. **Handle Long Inputs**: For inputs exceeding 8,192 tokens, enable YaRN to improve the model's ability to capture long-sequence information effectively. For supported frameworks, you can add a YaRN rope_scaling block to `config.json` to enable it (a sketch follows below). For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here.
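A hedged sketch of the YaRN route referenced in the guidelines above; the `rope_scaling` values (4x scaling over the original 32,768-token window) follow Qwen's published recommendation for QwQ but should be verified against the current model card:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/QwQ-32B"

# Enable YaRN only when prompts exceed 8,192 tokens, per the guidelines above.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                              # scale the context window 4x
    "original_max_position_embeddings": 32768,  # pretrained window being scaled
}

model = AutoModelForCausalLM.from_pretrained(model_id, config=config, device_map="auto")
```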
## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 32.5B-parameter reasoning-focused language model optimized for enhanced performance on complex tasks through step-by-step thinking and structured output generation." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen1.5-0.5B-Chat-GGUF.json b/data/model_data_json/Qwen_Qwen1.5-0.5B-Chat-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..70ec414476c63149952a4a4b96599bbf7dce562c --- /dev/null +++ b/data/model_data_json/Qwen_Qwen1.5-0.5B-Chat-GGUF.json @@ -0,0 +1,17 @@ +{ + "model_id": "Qwen/Qwen1.5-0.5B-Chat-GGUF", + "downloads": 435456, + "tags": [ + "gguf", + "chat", + "text-generation", + "en", + "arxiv:2309.16609", + "license:other", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- license: other license_name: tongyi-qianwen-research license_link: language: - en pipeline_tag: text-generation tags: - chat --- # Qwen1.5-0.5B-Chat-GGUF ## Introduction Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previously released Qwen, the improvements include: * 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated; * Significant performance improvement in human preference for chat models; * Multilingual support of both base and chat models; * Stable support of 32K context length for models of all sizes; * No need of `trust_remote_code`. For more details, please refer to our blog post and GitHub repo. In this repo, we provide quantized models in the GGUF formats, including `q8_0`, `q6_k`, `q5_k_m`, `q5_0`, `q4_k_m`, `q4_0`, `q3_k_m`, and `q2_k`. To demonstrate their model quality, we follow `llama.cpp` to evaluate their perplexity on the wiki test set. Results are shown below:

|Size | fp16 | q8_0 | q6_k | q5_k_m | q5_0 | q4_k_m | q4_0 | q3_k_m | q2_k |
|--------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
|0.5B | 34.20 | 34.22 | 34.31 | 33.80 | 34.02 | 34.27 | 36.74 | 38.25 | 62.14 |
|1.8B | 15.99 | 15.99 | 15.99 | 16.09 | 16.01 | 16.22 | 16.54 | 17.03 | 19.99 |
|4B | 13.20 | 13.21 | 13.28 | 13.24 | 13.27 | 13.61 | 13.44 | 13.67 | 15.65 |
|7B | 14.21 | 14.24 | 14.35 | 14.32 | 14.12 | 14.35 | 14.47 | 15.11 | 16.57 |
|14B | 10.91 | 10.91 | 10.93 | 10.98 | 10.88 | 10.92 | 10.92 | 11.24 | 12.27 |
|32B | 8.87 | 8.89 | 8.91 | 8.94 | 8.93 | 8.96 | 9.17 | 9.14 | 10.51 |
|72B | 7.97 | 7.99 | 7.99 | 7.99 | 8.01 | 8.00 | 8.01 | 8.06 | 8.63 |

## Model Details Qwen1.5 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes. For the beta version, we temporarily did not include GQA (except for 32B) and the mixture of SWA and full attention. ## Training details We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization. ## Requirements We advise you to clone the `llama.cpp` repo and install it following the official guide.
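As one way to follow the download-and-run workflow described here and in the "How to use" section below, this sketch swaps the llama.cpp CLI the card references for the llama-cpp-python bindings; the quantization filename is an assumption to check against the repository file listing:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # llama-cpp-python bindings over llama.cpp

# Download one quantization; the filename is assumed, so verify it in the repo.
gguf_path = hf_hub_download(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="qwen1_5-0_5b-chat-q5_k_m.gguf",
)

llm = Llama(model_path=gguf_path, n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me a short introduction to LLMs."}]
)
print(out["choices"][0]["message"]["content"])
```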
## How to use Cloning the repo may be inefficient, and thus you can manually download the GGUF file that you need or fetch it with `huggingface-cli` (see the sketch above). We demonstrate how to use `llama.cpp` to run Qwen1.5. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 0.5 billion-parameter chat-optimized language model based on Qwen1.5, designed for text generation with multilingual support and 32K context length." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen1.5-0.5B-Chat.json b/data/model_data_json/Qwen_Qwen1.5-0.5B-Chat.json new file mode 100644 index 0000000000000000000000000000000000000000..2e10ca55c0792f3d220bc38368659a66c7b2f60c --- /dev/null +++ b/data/model_data_json/Qwen_Qwen1.5-0.5B-Chat.json @@ -0,0 +1,21 @@ +{ + "model_id": "Qwen/Qwen1.5-0.5B-Chat", + "downloads": 429230, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.16609", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: tongyi-qianwen-research license_link: >- language: - en pipeline_tag: text-generation tags: - chat --- # Qwen1.5-0.5B-Chat ## Introduction Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previously released Qwen, the improvements include: * 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated; * Significant performance improvement in human preference for chat models; * Multilingual support of both base and chat models; * Stable support of 32K context length for models of all sizes; * No need of `trust_remote_code`. For more details, please refer to our blog post and GitHub repo.
## Model Details Qwen1.5 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and code. For the beta version, we temporarily did not include GQA (except for 32B) and the mixture of SWA and full attention. ## Training details We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization. ## Requirements The code of Qwen1.5 is included in the latest Hugging Face transformers, and we advise you to install , or you might encounter the following error: ## Quickstart Here we provide a code snippet with to show you how to load the tokenizer and model, and how to generate content. For quantized models, we advise you to use the GPTQ, AWQ, and GGUF counterparts, namely , , , and . ## Tips * If you encounter code switching or other bad cases, we advise you to use our provided hyper-parameters in . ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen1.5-0.5B-Chat is a 0.5 billion-parameter decoder-only transformer-based multilingual chat model optimized for human preference, supporting 32K context length and improved performance over previous versions." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-0.5B-Instruct.json b/data/model_data_json/Qwen_Qwen2-0.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..e4cd4ca37c110eb57957d75c30692f9389a31c35 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-0.5B-Instruct.json @@ -0,0 +1,22 @@ +{ + "model_id": "Qwen/Qwen2-0.5B-Instruct", + "downloads": 224521, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "base_model:Qwen/Qwen2-0.5B", + "base_model:finetune:Qwen/Qwen2-0.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: text-generation tags: - chat base_model: Qwen/Qwen2-0.5B --- # Qwen2-0.5B-Instruct ## Introduction Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 0.5B Qwen2 model. Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. For more details, please refer to our blog, GitHub, and Documentation.
## Model Details Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and code. ## Training details We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization. ## Requirements The code of Qwen2 is included in the latest Hugging Face transformers, and we advise you to install , or you might encounter the following error: ## Quickstart Here we provide a code snippet with to show you how to load the tokenizer and model, and how to generate content. ## Evaluation We briefly compare Qwen2-0.5B-Instruct with Qwen1.5-0.5B-Chat. The results are as follows: | Datasets | Qwen1.5-0.5B-Chat | **Qwen2-0.5B-Instruct** | Qwen1.5-1.8B-Chat | **Qwen2-1.5B-Instruct** | | :--- | :---: | :---: | :---: | :---: | | MMLU | 35.0 | **37.9** | 43.7 | **52.4** | | HumanEval | 9.1 | **17.1** | 25.0 | **37.8** | | GSM8K | 11.3 | **40.1** | 35.3 | **61.6** | | C-Eval | 37.2 | **45.2** | 55.3 | **63.8** | | IFEval (Prompt Strict-Acc.) | 14.6 | **20.0** | 16.8 | **29.0** | ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 0.5 billion parameter instruction-tuned language model for text generation, excelling in language understanding, generation, multilingual tasks, coding, math, and reasoning." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-0.5B.json b/data/model_data_json/Qwen_Qwen2-0.5B.json new file mode 100644 index 0000000000000000000000000000000000000000..3a788f85197b5556bf1a79167a8a3ea3d3fc51d7 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-0.5B.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2-0.5B", + "downloads": 116440, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "pretrained", + "conversational", + "en", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: text-generation tags: - pretrained license: apache-2.0 new_version: Qwen/Qwen2.5-0.5B --- # Qwen2-0.5B ## Introduction Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the 0.5B Qwen2 base language model. Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. For more details, please refer to our blog, GitHub, and Documentation.
## Model Details Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes. ## Requirements The code of Qwen2 has been in the latest Hugging face transformers and we advise you to install , or you might encounter the following error: ## Usage We do not advise you to use base language models for text generation. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. ## Performance The evaluation of base models mainly focuses on the model performance of natural language understanding, general question answering, coding, mathematics, scientific knowledge, reasoning, multilingual capability, etc. The datasets for evaluation include: **English Tasks**: MMLU (5-shot), MMLU-Pro (5-shot), GPQA (5shot), Theorem QA (5-shot), BBH (3-shot), HellaSwag (10-shot), Winogrande (5-shot), TruthfulQA (0-shot), ARC-C (25-shot) **Coding Tasks**: EvalPlus (0-shot) (HumanEval, MBPP, HumanEval+, MBPP+), MultiPL-E (0-shot) (Python, C++, JAVA, PHP, TypeScript, C#, Bash, JavaScript) **Math Tasks**: GSM8K (4-shot), MATH (4-shot) **Chinese Tasks**: C-Eval(5-shot), CMMLU (5-shot) **Multilingual Tasks**: Multi-Exam (M3Exam 5-shot, IndoMMLU 3-shot, ruMMLU 5-shot, mMMLU 5-shot), Multi-Understanding (BELEBELE 5-shot, XCOPA 5-shot, XWinograd 5-shot, XStoryCloze 0-shot, PAWS-X 5-shot), Multi-Mathematics (MGSM 8-shot), Multi-Translation (Flores-101 5-shot) #### Qwen2-0.5B & Qwen2-1.5B performances | Datasets | Phi-2 | Gemma-2B | MiniCPM | Qwen1.5-1.8B | Qwen2-0.5B | Qwen2-1.5B | | :--------| :---------: | :------------: | :------------: |:------------: | :------------: | :------------: | |#Non-Emb Params | 2.5B | 2.0B | 2.4B | 1.3B | 0.35B | 1.3B | |MMLU | 52.7 | 42.3 | 53.5 | 46.8 | 45.4 | **56.5** | |MMLU-Pro | - | 15.9 | - | - | 14.7 | 21.8 | |Theorem QA | - | - | - |- | 8.9 | **15.0** | |HumanEval | 47.6 | 22.0 |**50.0**| 20.1 | 22.0 | 31.1 | |MBPP | **55.0** | 29.2 | 47.3 | 18.0 | 22.0 | 37.4 | |GSM8K | 57.2 | 17.7 | 53.8 | 38.4 | 36.5 | **58.5** | |MATH | 3.5 | 11.8 | 10.2 | 10.1 | 10.7 | **21.7** | |BBH | **43.4** | 35.2 | 36.9 | 24.2 | 28.4 | 37.2 | |HellaSwag | **73.1** | 71.4 | 68.3 | 61.4 | 49.3 | 66.6 | |Winogrande | **74.4** | 66.8 | -| 60.3 | 56.8 | 66.2 | |ARC-C | **61.1** | 48.5 | -| 37.9 | 31.5 | 43.9 | |TruthfulQA | 44.5 | 33.1 | -| 39.4 | 39.7 | **45.9** | |C-Eval | 23.4 | 28.0 | 51.1| 59.7 | 58.2 | **70.6** | |CMMLU | 24.2 | - | 51.1 | 57.8 | 55.1 | **70.3** | ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 0.5 billion parameter base language model designed for natural language understanding, generation, multilingual tasks, coding, mathematics, and reasoning, outperforming many open-source models in benchmarks." 
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-1.5B-Instruct.json b/data/model_data_json/Qwen_Qwen2-1.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..19711392439dc534619065b8007c6b3e216f6a0b --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-1.5B-Instruct.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2-1.5B-Instruct", + "downloads": 205929, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: text-generation tags: - chat --- # Qwen2-1.5B-Instruct ## Introduction Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 1.5B Qwen2 model. Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. For more details, please refer to our blog, GitHub, and Documentation.
## Model Details Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and code. ## Training details We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization. ## Requirements The code of Qwen2 is included in the latest Hugging Face transformers, and we advise you to install , or you might encounter the following error: ## Quickstart Here we provide a code snippet with to show you how to load the tokenizer and model, and how to generate content. ## Evaluation We briefly compare Qwen2-1.5B-Instruct with Qwen1.5-1.8B-Chat. The results are as follows: | Datasets | Qwen1.5-0.5B-Chat | **Qwen2-0.5B-Instruct** | Qwen1.5-1.8B-Chat | **Qwen2-1.5B-Instruct** | | :--- | :---: | :---: | :---: | :---: | | MMLU | 35.0 | **37.9** | 43.7 | **52.4** | | HumanEval | 9.1 | **17.1** | 25.0 | **37.8** | | GSM8K | 11.3 | **40.1** | 35.3 | **61.6** | | C-Eval | 37.2 | **45.2** | 55.3 | **63.8** | | IFEval (Prompt Strict-Acc.) | 14.6 | **20.0** | 16.8 | **29.0** | ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen2-1.5B-Instruct is a 1.5 billion parameter instruction-tuned language model for text generation, excelling in language understanding, generation, multilingual tasks, coding, math, and reasoning." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-1.5B.json b/data/model_data_json/Qwen_Qwen2-1.5B.json new file mode 100644 index 0000000000000000000000000000000000000000..e6e5bf50dc0fe77157e48254beb9b8727c05ecbc --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-1.5B.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2-1.5B", + "downloads": 215890, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "pretrained", + "conversational", + "en", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: text-generation tags: - pretrained license: apache-2.0 --- # Qwen2-1.5B ## Introduction Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the 1.5B Qwen2 base language model. Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. For more details, please refer to our blog, GitHub, and Documentation.
## Model Details Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes. ## Requirements The code of Qwen2 has been in the latest Hugging face transformers and we advise you to install , or you might encounter the following error: ## Usage We do not advise you to use base language models for text generation. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. ## Performance The evaluation of base models mainly focuses on the model performance of natural language understanding, general question answering, coding, mathematics, scientific knowledge, reasoning, multilingual capability, etc. The datasets for evaluation include: **English Tasks**: MMLU (5-shot), MMLU-Pro (5-shot), GPQA (5shot), Theorem QA (5-shot), BBH (3-shot), HellaSwag (10-shot), Winogrande (5-shot), TruthfulQA (0-shot), ARC-C (25-shot) **Coding Tasks**: EvalPlus (0-shot) (HumanEval, MBPP, HumanEval+, MBPP+), MultiPL-E (0-shot) (Python, C++, JAVA, PHP, TypeScript, C#, Bash, JavaScript) **Math Tasks**: GSM8K (4-shot), MATH (4-shot) **Chinese Tasks**: C-Eval(5-shot), CMMLU (5-shot) **Multilingual Tasks**: Multi-Exam (M3Exam 5-shot, IndoMMLU 3-shot, ruMMLU 5-shot, mMMLU 5-shot), Multi-Understanding (BELEBELE 5-shot, XCOPA 5-shot, XWinograd 5-shot, XStoryCloze 0-shot, PAWS-X 5-shot), Multi-Mathematics (MGSM 8-shot), Multi-Translation (Flores-101 5-shot) #### Qwen2-0.5B & Qwen2-1.5B performances | Datasets | Phi-2 | Gemma-2B | MiniCPM | Qwen1.5-1.8B | Qwen2-0.5B | Qwen2-1.5B | | :--------| :---------: | :------------: | :------------: |:------------: | :------------: | :------------: | |#Non-Emb Params | 2.5B | 2.0B | 2.4B | 1.3B | 0.35B | 1.3B | |MMLU | 52.7 | 42.3 | 53.5 | 46.8 | 45.4 | **56.5** | |MMLU-Pro | - | 15.9 | - | - | 14.7 | 21.8 | |Theorem QA | - | - | - |- | 8.9 | **15.0** | |HumanEval | 47.6 | 22.0 |**50.0**| 20.1 | 22.0 | 31.1 | |MBPP | **55.0** | 29.2 | 47.3 | 18.0 | 22.0 | 37.4 | |GSM8K | 57.2 | 17.7 | 53.8 | 38.4 | 36.5 | **58.5** | |MATH | 3.5 | 11.8 | 10.2 | 10.1 | 10.7 | **21.7** | |BBH | **43.4** | 35.2 | 36.9 | 24.2 | 28.4 | 37.2 | |HellaSwag | **73.1** | 71.4 | 68.3 | 61.4 | 49.3 | 66.6 | |Winogrande | **74.4** | 66.8 | -| 60.3 | 56.8 | 66.2 | |ARC-C | **61.1** | 48.5 | -| 37.9 | 31.5 | 43.9 | |TruthfulQA | 44.5 | 33.1 | -| 39.4 | 39.7 | **45.9** | |C-Eval | 23.4 | 28.0 | 51.1| 59.7 | 58.2 | **70.6** | |CMMLU | 24.2 | - | 51.1 | 57.8 | 55.1 | **70.3** | ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen2-1.5B is a 1.5 billion parameter base language model for text generation, excelling in language understanding, generation, coding, math, reasoning, and multilingual tasks across various benchmarks." 
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-7B-Instruct.json b/data/model_data_json/Qwen_Qwen2-7B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..afee51c0748c4ddc01160569f80d09c8573c17bd --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-7B-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2-7B-Instruct", + "downloads": 265707, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.00071", + "base_model:Qwen/Qwen2-7B", + "base_model:finetune:Qwen/Qwen2-7B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: text-generation tags: - chat base_model: Qwen/Qwen2-7B --- # Qwen2-7B-Instruct ## Introduction Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 7B Qwen2 model. Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. Qwen2-7B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs. Please refer to this section for detailed instructions on how to deploy Qwen2 for handling long texts. For more details, please refer to our blog, GitHub, and Documentation.
## Model Details Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and code. ## Training details We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization. ## Requirements The code of Qwen2 is included in the latest Hugging Face transformers, and we advise you to install , or you might encounter the following error: ## Quickstart Here we provide a code snippet with to show you how to load the tokenizer and model, and how to generate content. ### Processing Long Texts To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps: 1. **Install vLLM**: You can install vLLM by running the following command. Or you can install vLLM from source. 2. **Configure Model Settings**: After downloading the model weights, modify the file by including the below snippet: This snippet enables YaRN to support longer contexts. 3. **Model Deployment**: Utilize vLLM to deploy your model. For instance, you can set up an OpenAI-like server using the command: Then you can access the Chat API by: For further usage instructions of vLLM, please refer to our GitHub. **Note**: Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation We briefly compare Qwen2-7B-Instruct with similar-sized instruction-tuned LLMs, including Qwen1.5-7B-Chat. The results are shown below: | Datasets | Llama-3-8B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen1.5-7B-Chat | Qwen2-7B-Instruct | | :--- | :---: | :---: | :---: | :---: | :---: | | _**English**_ | | | | | | | MMLU | 68.4 | 69.5 | **72.4** | 59.5 | 70.5 | | MMLU-Pro | 41.0 | - | - | 29.1 | **44.1** | | GPQA | **34.2** | - | **-** | 27.8 | 25.3 | | TheoremQA | 23.0 | - | - | 14.1 | **25.3** | | MT-Bench | 8.05 | 8.20 | 8.35 | 7.60 | **8.41** | | _**Coding**_ | | | | | | | HumanEval | 62.2 | 66.5 | 71.8 | 46.3 | **79.9** | | MBPP | **67.9** | - | - | 48.9 | 67.2 | | MultiPL-E | 48.5 | - | - | 27.2 | **59.1** | | Evalplus | 60.9 | - | - | 44.8 | **70.3** | | LiveCodeBench | 17.3 | - | - | 6.0 | **26.6** | | _**Mathematics**_ | | | | | | | GSM8K | 79.6 | **84.8** | 79.6 | 60.3 | 82.3 | | MATH | 30.0 | 47.7 | **50.6** | 23.2 | 49.6 | | _**Chinese**_ | | | | | | | C-Eval | 45.9 | - | 75.6 | 67.3 | **77.2** | | AlignBench | 6.20 | 6.90 | 7.01 | 6.20 | **7.21** | ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 7B-parameter instruction-tuned language model optimized for text generation, supporting long-context processing up to 131,072 tokens and excelling in language understanding, generation, coding, and reasoning tasks."
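The config snippet and serving command in the "Processing Long Texts" steps above were stripped. A minimal sketch, assuming the static-YaRN values described in the Qwen2 (and later Qwen2.5) long-context documentation; verify the scaling factor against the official card before deploying:

```python
# Sketch: enable static YaRN by editing config.json in a local copy of the
# checkpoint, as step 2 above describes. Values assumed from the Qwen2 docs.
import json
from huggingface_hub import snapshot_download

path = snapshot_download("Qwen/Qwen2-7B-Instruct")
cfg_file = f"{path}/config.json"
with open(cfg_file) as f:
    cfg = json.load(f)
cfg["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,  # 32,768 x 4 = 131,072-token window
    "original_max_position_embeddings": 32768,
}
with open(cfg_file, "w") as f:
    json.dump(cfg, f, indent=2)

# Step 3 then deploys the modified checkpoint with vLLM, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model <path> \
#       --served-model-name Qwen2-7B-Instruct
```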
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-Audio-7B-Instruct.json b/data/model_data_json/Qwen_Qwen2-Audio-7B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..c5fd57db3802a77680863c876e19c6f37e98f9aa --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-Audio-7B-Instruct.json @@ -0,0 +1,22 @@ +{ + "model_id": "Qwen/Qwen2-Audio-7B-Instruct", + "downloads": 101518, + "tags": [ + "transformers", + "safetensors", + "qwen2_audio", + "text2text-generation", + "chat", + "audio", + "audio-text-to-text", + "en", + "arxiv:2407.10759", + "arxiv:2311.07919", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en tags: - chat - audio - audio-text-to-text --- # Qwen2-Audio-7B-Instruct \"Chat\" ## Introduction Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio accepts various audio signal inputs and can perform audio analysis or respond directly in text to spoken instructions. We introduce two distinct audio interaction modes: * voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input; * audio analysis: users can provide audio and text instructions for analysis during the interaction; We release Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct, which are the pretrained model and the chat model, respectively. For more details, please refer to our Blog, GitHub, and Report.
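The inference snippets in the Quickstart below were stripped. A minimal audio-analysis sketch, assuming the `Qwen2AudioForConditionalGeneration` integration in recent transformers releases; the audio URL is a placeholder, and the `audios=` keyword should be checked against the installed version:

```python
# Sketch: audio-analysis inference with Qwen2-Audio. Keyword names are
# assumed from the transformers Qwen2-Audio integration; URL is a placeholder.
from io import BytesIO
from urllib.request import urlopen

import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id, device_map="auto")

audio_url = "https://example.com/sample.wav"  # placeholder
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": audio_url},
        {"type": "text", "text": "What can you hear in this clip?"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load(BytesIO(urlopen(audio_url).read()),
                        sr=processor.feature_extractor.sampling_rate)

inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True)
out_ids = model.generate(**inputs, max_new_tokens=256)
out_ids = out_ids[:, inputs.input_ids.size(1):]  # drop the prompt tokens
print(processor.batch_decode(out_ids, skip_special_tokens=True)[0])
```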
## Requirements The code of Qwen2-Audio is included in the latest Hugging Face transformers, and we advise you to build from source with the command , or you might encounter the following error: ## Quickstart In the following, we demonstrate how to use for inference, supporting both voice chat and audio analysis modes. Note that we have used the ChatML format for dialog; in this demo we show how to leverage for this purpose. ### Voice Chat Inference In the voice chat mode, users can freely engage in voice interactions with Qwen2-Audio without text input: ### Audio Analysis Inference In audio analysis mode, users can provide both audio and text instructions for analysis: ### Batch Inference We also support batch inference: ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen2-Audio-7B-Instruct is a chat model that processes audio inputs to perform voice interactions or analyze audio with text instructions, supporting both voice chat and audio analysis modes." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-VL-2B-Instruct.json b/data/model_data_json/Qwen_Qwen2-VL-2B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..444f17c30673d0a9dc5ed7c89e1fa48f12290e9b --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-VL-2B-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2-VL-2B-Instruct", + "downloads": 651676, + "tags": [ + "transformers", + "safetensors", + "qwen2_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2409.12191", + "arxiv:2308.12966", + "base_model:Qwen/Qwen2-VL-2B", + "base_model:finetune:Qwen/Qwen2-VL-2B", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers base_model: - Qwen/Qwen2-VL-2B --- # Qwen2-VL-2B-Instruct \"Chat\" ## Introduction We're excited to unveil **Qwen2-VL**, the latest iteration of our Qwen-VL model, representing nearly a year of innovation. ### What’s New in Qwen2-VL? #### Key Enhancements: * **SoTA understanding of images of various resolution & ratio**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. * **Understanding videos of 20min+**: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. * **Agent that can operate your mobiles, robots, etc.**: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. * **Multilingual Support**: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. #### Model Architecture Updates: * **Naive Dynamic Resolution**: Unlike before, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience.

* **Multimodal Rotary Position Embedding (M-ROPE)**: Decomposes positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities.

We have three models with 2, 7 and 72 billion parameters. This repo contains the instruction-tuned 2B Qwen2-VL model. For more information, visit our Blog and GitHub. ## Evaluation ### Image Benchmarks | Benchmark | InternVL2-2B | MiniCPM-V 2.0 | **Qwen2-VL-2B** | | :--- | :---: | :---: | :---: | | MMMUval | 36.3 | 38.2 | **41.1** | | DocVQAtest | 86.9 | - | **90.1** | | InfoVQAtest | 58.9 | - | **65.5** | | ChartQAtest | **76.2** | - | 73.5 | | TextVQAval | 73.4 | - | **79.7** | | OCRBench | 781 | 605 | **794** | | MTVQA | - | - | **20.0** | | VCRen easy | - | - | **81.45** | VCRzh easy | - | - | **46.16** | RealWorldQA | 57.3 | 55.8 | **62.9** | | MMEsum | **1876.8** | 1808.6 | 1872.0 | | MMBench-ENtest | 73.2 | 69.1 | **74.9** | | MMBench-CNtest | 70.9 | 66.5 | **73.5** | | MMBench-V1.1test | 69.6 | 65.8 | **72.2** | | MMT-Benchtest | - | - | **54.5** | | MMStar | **49.8** | 39.1 | 48.0 | | MMVetGPT-4-Turbo | 39.7 | 41.0 | **49.5** | | HallBenchavg | 38.0 | 36.1 | **41.7** | | MathVistatestmini | **46.0** | 39.8 | 43.0 | | MathVision | - | - | **12.4** | ### Video Benchmarks | Benchmark | **Qwen2-VL-2B** | | :--- | :---: | | MVBench | **63.2** | | PerceptionTesttest | **53.9** | | EgoSchematest | **54.9** | | Video-MMEwo/w subs | **55.6**/**60.4** | ## Requirements The code of Qwen2-VL has been in the latest Hugging face transformers and we advise you to build from source with command , or you might encounter the following error: ## Quickstart We offer a toolkit to help you handle various types of visual input more conveniently. This includes base64, URLs, and interleaved images and videos. You can install it using the following command: Here we show a code snippet to show you how to use the chat model with and :
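A hedged reconstruction of that snippet, assuming the `qwen-vl-utils` package described above; the image URL is a placeholder:

```python
# Sketch: single-image chat with Qwen2-VL via qwen_vl_utils
# (pip install qwen-vl-utils; the image URL is a placeholder).
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "https://example.com/demo.jpeg"},  # placeholder
        {"type": "text", "text": "Describe this image."},
    ]},
]
# Render the chat prompt and collect the visual inputs separately.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```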

(Further collapsed examples in the original card: without qwen_vl_utils, multi-image inference, video inference, and batch inference.)
### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, we provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ## Limitations While Qwen2-VL is applicable to a wide range of visual tasks, it is equally important to understand its limitations. Here are some known restrictions: 1. Lack of Audio Support: The current model does **not comprehend audio information** within videos. 2. Data timeliness: Our image dataset is **updated until June 2023**, and information subsequent to this date may not be covered. 3. Constraints in Individuals and Intellectual Property (IP): The model's capacity to recognize specific individuals or IPs is limited, potentially failing to comprehensively cover all well-known personalities or brands. 4. Limited Capacity for Complex Instruction: When faced with intricate multi-step instructions, the model's understanding and execution capabilities require enhancement. 5. Insufficient Counting Accuracy: Particularly in complex scenes, the accuracy of object counting is not high, necessitating further improvements. 6. Weak Spatial Reasoning Skills: Especially in 3D spaces, the model's inference of object positional relationships is inadequate, making it difficult to precisely judge the relative positions of objects. These limitations serve as ongoing directions for model optimization and improvement, and we are committed to continually enhancing the model's performance and scope of application. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A multimodal AI model for image and video understanding, supporting multilingual text recognition, complex reasoning, and device integration for tasks like visual question answering and content creation." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-VL-2B.json b/data/model_data_json/Qwen_Qwen2-VL-2B.json new file mode 100644 index 0000000000000000000000000000000000000000..fe613ad383fcd34d3bb572da0d76eec0eec980b4 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-VL-2B.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2-VL-2B", + "downloads": 77369, + "tags": [ + "transformers", + "safetensors", + "qwen2_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2409.12191", + "arxiv:2308.12966", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers --- # Qwen2-VL-2B ## Introduction We're excited to unveil **Qwen2-VL**, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.
> [!Important] > This is the base pretrained model of Qwen2-VL-2B without instruction tuning. ### What’s New in Qwen2-VL? #### Key Enhancements: * **SoTA understanding of images of various resolution & ratio**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. * **Understanding videos of 20min+**: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. * **Agent that can operate your mobiles, robots, etc.**: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. * **Multilingual Support**: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. #### Model Architecture Updates: * **Naive Dynamic Resolution**: Unlike before, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience.

* **Multimodal Rotary Position Embedding (M-ROPE)**: Decomposes positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities.

We have three models with 2, 7 and 72 billion parameters. This repo contains the **pretrained** 2B Qwen2-VL model. For more information, visit our Blog and GitHub. ## Requirements The code of Qwen2-VL has been in the latest Hugging Face and we advise you to install the latest version with command , or you might encounter the following error: ## Citation If you find our work helpful, feel free to give us a cite." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-VL-72B.json b/data/model_data_json/Qwen_Qwen2-VL-72B.json new file mode 100644 index 0000000000000000000000000000000000000000..463cd80af4e65aaae1d46a583b73235332279d79 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-VL-72B.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2-VL-72B", + "downloads": 77971, + "tags": [ + "transformers", + "safetensors", + "qwen2_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2409.12191", + "arxiv:2308.12966", + "license:other", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: qwen license_link: language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers --- # Qwen2-VL-72B ## Introduction We're excited to unveil **Qwen2-VL**, the latest iteration of our Qwen-VL model, representing nearly a year of innovation. > [!Important] > This is the base pretrained model of Qwen2-VL-72B without instruction tuning. ### What’s New in Qwen2-VL? #### Key Enhancements: * **SoTA understanding of images of various resolution & ratio**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. * **Understanding videos of 20min+**: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. * **Agent that can operate your mobiles, robots, etc.**: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. * **Multilingual Support**: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. #### Model Architecture Updates: * **Naive Dynamic Resolution**: Unlike before, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience.

* **Multimodal Rotary Position Embedding (M-ROPE)**: Decomposes positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities.

We have three models with 2, 7 and 72 billion parameters. This repo contains the **pretrained** 72B Qwen2-VL model. For more information, visit our Blog and GitHub. ## Requirements The code of Qwen2-VL has been in the latest Hugging Face and we advise you to install the latest version with command , or you might encounter the following error: ## Citation If you find our work helpful, feel free to give us a cite." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2-VL-7B-Instruct.json b/data/model_data_json/Qwen_Qwen2-VL-7B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..8dfb07c59e5bf699f9fe126038735ad2c2dfe2c1 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2-VL-7B-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2-VL-7B-Instruct", + "downloads": 956343, + "tags": [ + "transformers", + "safetensors", + "qwen2_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2409.12191", + "arxiv:2308.12966", + "base_model:Qwen/Qwen2-VL-7B", + "base_model:finetune:Qwen/Qwen2-VL-7B", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers base_model: - Qwen/Qwen2-VL-7B new_version: Qwen/Qwen2.5-VL-7B-Instruct --- # Qwen2-VL-7B-Instruct \"Chat\" ## Introduction We're excited to unveil **Qwen2-VL**, the latest iteration of our Qwen-VL model, representing nearly a year of innovation. ### What’s New in Qwen2-VL? #### Key Enhancements: * **SoTA understanding of images of various resolution & ratio**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. * **Understanding videos of 20min+**: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. * **Agent that can operate your mobiles, robots, etc.**: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. * **Multilingual Support**: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. #### Model Architecture Updates: * **Naive Dynamic Resolution**: Unlike before, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience.

* **Multimodal Rotary Position Embedding (M-ROPE)**: Decomposes positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities.

We have three models with 2, 7 and 72 billion parameters. This repo contains the instruction-tuned 7B Qwen2-VL model. For more information, visit our Blog and GitHub. ## Evaluation ### Image Benchmarks | Benchmark | InternVL2-8B | MiniCPM-V 2.6 | GPT-4o-mini | **Qwen2-VL-7B** | | :--- | :---: | :---: | :---: | :---: | | MMMUval | 51.8 | 49.8 | **60**| 54.1 | | DocVQAtest | 91.6 | 90.8 | - | **94.5** | | InfoVQAtest | 74.8 | - | - |**76.5** | | ChartQAtest | **83.3** | - |- | 83.0 | | TextVQAval | 77.4 | 80.1 | -| **84.3** | | OCRBench | 794 | **852** | 785 | 845 | | MTVQA | - | - | -| **26.3** | | VCRen easy | - | 73.88 | 83.60 | **89.70** | | VCRzh easy | - | 10.18| 1.10 | **59.94** | | RealWorldQA | 64.4 | - | - | **70.1** | | MMEsum | 2210.3 | **2348.4** | 2003.4| 2326.8 | | MMBench-ENtest | 81.7 | - | - | **83.0** | | MMBench-CNtest | **81.2** | - | - | 80.5 | | MMBench-V1.1test | 79.4 | 78.0 | 76.0| **80.7** | | MMT-Benchtest | - | - | - |**63.7** | | MMStar | **61.5** | 57.5 | 54.8 | 60.7 | | MMVetGPT-4-Turbo | 54.2 | 60.0 | **66.9** | 62.0 | | HallBenchavg | 45.2 | 48.1 | 46.1| **50.6** | | MathVistatestmini | 58.3 | **60.6** | 52.4 | 58.2 | | MathVision | - | - | - | **16.3** | ### Video Benchmarks | Benchmark | Internvl2-8B | LLaVA-OneVision-7B | MiniCPM-V 2.6 | **Qwen2-VL-7B** | | :--- | :---: | :---: | :---: | :---: | | MVBench | 66.4 | 56.7 | - | **67.0** | | PerceptionTesttest | - | 57.1 | - | **62.3** | | EgoSchematest | - | 60.1 | - | **66.7** | | Video-MMEwo/w subs | 54.0/56.9 | 58.2/- | 60.9/63.6 | **63.3**/**69.0** | ## Requirements The code of Qwen2-VL has been in the latest Hugging face transformers and we advise you to build from source with command , or you might encounter the following error: ## Quickstart We offer a toolkit to help you handle various types of visual input more conveniently. This includes base64, URLs, and interleaved images and videos. You can install it using the following command: Here we show a code snippet to show you how to use the chat model with and :

(Further collapsed examples in the original card: without qwen_vl_utils, multi-image inference, video inference, and batch inference.)
### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, we provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ## Limitations While Qwen2-VL is applicable to a wide range of visual tasks, it is equally important to understand its limitations. Here are some known restrictions: 1. Lack of Audio Support: The current model does **not comprehend audio information** within videos. 2. Data timeliness: Our image dataset is **updated until June 2023**, and information subsequent to this date may not be covered. 3. Constraints in Individuals and Intellectual Property (IP): The model's capacity to recognize specific individuals or IPs is limited, potentially failing to comprehensively cover all well-known personalities or brands. 4. Limited Capacity for Complex Instruction: When faced with intricate multi-step instructions, the model's understanding and execution capabilities require enhancement. 5. Insufficient Counting Accuracy: Particularly in complex scenes, the accuracy of object counting is not high, necessitating further improvements. 6. Weak Spatial Reasoning Skills: Especially in 3D spaces, the model's inference of object positional relationships is inadequate, making it difficult to precisely judge the relative positions of objects. These limitations serve as ongoing directions for model optimization and improvement, and we are committed to continually enhancing the model's performance and scope of application. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen2-VL-7B-Instruct is a multimodal AI model excelling in image and video understanding, supporting multilingual text recognition, and enabling complex reasoning for tasks like visual question answering, content creation, and device operation based on visual inputs."
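The pixel-budget snippets referenced in both VL cards' "More Usage Tips" were stripped. A minimal sketch of the two controls, assuming the processor keyword names from the Qwen2-VL integration; the per-image keys in option 2 are assumptions as well:

```python
# Sketch: fine-grained control of the visual token budget for Qwen2-VL.
from transformers import AutoProcessor

# Option 1: bound the pixel budget; images keep their aspect ratio within it.
# 256-1280 visual tokens, one token per 28x28 patch as described in the card.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    min_pixels=256 * 28 * 28,
    max_pixels=1280 * 28 * 28,
)

# Option 2: request exact dimensions per image; the card notes the values are
# rounded to the nearest multiple of 28 (key names assumed for illustration).
message_content = {
    "type": "image",
    "image": "file:///path/to/image.jpg",  # placeholder
    "resized_height": 280,
    "resized_width": 420,
}
```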
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-0.5B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-0.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..7e9374fd4db2e5e1b58c612a2b7f0657c8f91ee0 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-0.5B-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2.5-0.5B-Instruct", + "downloads": 989663, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-0.5B", + "base_model:finetune:Qwen/Qwen2.5-0.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-0.5B tags: - chat library_name: transformers --- # Qwen2.5-0.5B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 0.5B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 0.49B - Number of Parameters (Non-Embedding): 0.36B - Number of Layers: 24 - Number of Attention Heads (GQA): 14 for Q and 2 for KV - Context Length: Full 32,768 tokens and generation 8192 tokens For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 is included in the latest Hugging Face and we advise you to use the latest version of . With , you will encounter the following error: ## Quickstart Here we provide a code snippet with to show you how to load the tokenizer and model, and how to generate content. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 0.5B parameter instruction-tuned causal language model optimized for improved coding, mathematics, structured data handling, multilingual support, and long-context generation up to 8K tokens."
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-0.5B.json b/data/model_data_json/Qwen_Qwen2.5-0.5B.json new file mode 100644 index 0000000000000000000000000000000000000000..210c6f483de640773718428183dbfbb1d3599148 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-0.5B.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2.5-0.5B", + "downloads": 1523589, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2407.10671", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation library_name: transformers --- # Qwen2.5-0.5B ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the base 0.5B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 0.49B - Number of Parameters (Non-Embedding): 0.36B - Number of Layers: 24 - Number of Attention Heads (GQA): 14 for Q and 2 for KV - Context Length: Full 32,768 tokens **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 is included in the latest Hugging Face and we advise you to use the latest version of . With , you will encounter the following error: ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 0.5 billion parameter base causal language model specialized in coding, mathematics, multilingual text generation (29 languages), structured data understanding, and long-context support (up to 32K tokens), designed for pretraining and post-training adaptation rather than direct conversational use."
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-1.5B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-1.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..fb85af9563d3d5a22823dfca89b3e8cae56d99c8 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-1.5B-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2.5-1.5B-Instruct", + "downloads": 1981366, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-1.5B", + "base_model:finetune:Qwen/Qwen2.5-1.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-1.5B tags: - chat library_name: transformers --- # Qwen2.5-1.5B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 1.5B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 1.54B - Number of Parameters (Non-Embedding): 1.31B - Number of Layers: 28 - Number of Attention Heads (GQA): 12 for Q and 2 for KV - Context Length: Full 32,768 tokens and generation 8192 tokens For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 is included in the latest Hugging Face and we advise you to use the latest version of . With , you will encounter the following error: ## Quickstart Here we provide a code snippet with to show you how to load the tokenizer and model, and how to generate content. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen2.5-1.5B-Instruct is a 1.5-billion-parameter instruction-tuned causal language model optimized for coding, mathematics, structured data understanding, multilingual text generation, and long-context tasks up to 128K tokens."
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-1.5B.json b/data/model_data_json/Qwen_Qwen2.5-1.5B.json new file mode 100644 index 0000000000000000000000000000000000000000..10316fc64d597b44311fc43c4ca8cd39b027ec22 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-1.5B.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2.5-1.5B", + "downloads": 559571, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2407.10671", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation library_name: transformers --- # Qwen2.5-1.5B ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the base 1.5B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 1.54B - Number of Parameters (Non-Embedding): 1.31B - Number of Layers: 28 - Number of Attention Heads (GQA): 12 for Q and 2 for KV - Context Length: Full 32,768 tokens **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 is included in the latest Hugging Face and we advise you to use the latest version of . With , you will encounter the following error: ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 1.5B parameter causal language model optimized for text generation, featuring improved coding, mathematics, multilingual support (29 languages), long-context handling (32K tokens), and structured data understanding, designed for post-training adaptation rather than direct conversational use."
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-14B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-14B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..d8ebf537a3d25dc38a26b7101743e8b35cd20aed --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-14B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "Qwen/Qwen2.5-14B-Instruct", + "downloads": 849554, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-14B", + "base_model:finetune:Qwen/Qwen2.5-14B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-14B tags: - chat library_name: transformers --- # Qwen2.5-14B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 14B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 14.7B - Number of Parameters (Non-Embedding): 13.1B - Number of Layers: 48 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM.
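The `rope_scaling` snippet this card refers to ("add the following ... to enable YaRN") was stripped. A sketch of the idea in transformers, with values following the YaRN scheme described in the public Qwen2.5 documentation (scaling factor = target length / native length); treat the exact numbers as assumptions:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch: enable YaRN by overriding rope_scaling on the loaded config,
# mirroring the config.json addition the Qwen2.5 docs describe.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,  # assumed 131072 / 32768
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct", config=config, torch_dtype="auto", device_map="auto"
)
```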
Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "A 14.7B-parameter instruction-tuned causal language model optimized for coding, mathematics, long-text generation (up to 8K tokens), structured data understanding, and multilingual support across 29 languages, with extended context handling up to 128K tokens." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-14B.json b/data/model_data_json/Qwen_Qwen2.5-14B.json new file mode 100644 index 0000000000000000000000000000000000000000..051791dc65378ab902ff680128f38d7b48f1c969 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-14B.json @@ -0,0 +1,16 @@ +{ + "model_id": "Qwen/Qwen2.5-14B", + "downloads": 257275, + "tags": [ + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2407.10671", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation --- # Qwen2.5-14B ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the base 14B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 14.7B - Number of Parameters (Non-Embedding): 13.1B - Number of Layers: 48 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: 131,072 tokens **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here.
## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "Qwen2.5-14B is a 14.7B-parameter causal language model optimized for text generation, featuring enhanced coding, mathematics, multilingual support (29+ languages), long-context handling (128K tokens), structured data understanding, and improved instruction following, but requires post-training for conversational use." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-32B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-32B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..01a590339e95f2dee3aad32d2d08d4147e17d3cb --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-32B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "Qwen/Qwen2.5-32B-Instruct", + "downloads": 418369, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-32B", + "base_model:finetune:Qwen/Qwen2.5-32B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-32B tags: - chat library_name: transformers --- # Qwen2.5-32B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 32B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Parameters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens.
To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "A 32.5B parameter instruction-tuned causal language model optimized for improved coding, mathematics, multilingual text generation (29 languages), structured data understanding, long-context processing (128K tokens), and enhanced instruction following." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-32B.json b/data/model_data_json/Qwen_Qwen2.5-32B.json new file mode 100644 index 0000000000000000000000000000000000000000..3edf13dd82b6ad8b101a3e63a8f0b8f63128e304 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-32B.json @@ -0,0 +1,15 @@ +{ + "model_id": "Qwen/Qwen2.5-32B", + "downloads": 76454, + "tags": [ + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2407.10671", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation --- # Qwen2.5-32B ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the base 32B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Parameters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: 131,072 tokens **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. For more details, please refer to our blog, GitHub, and Documentation.
## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-3B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-3B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..f60fbf53facf317bc596e84f8a3964befed7428c --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-3B-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2.5-3B-Instruct", + "downloads": 1379496, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-3B", + "base_model:finetune:Qwen/Qwen2.5-3B", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: qwen-research license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-3B tags: - chat library_name: transformers --- # Qwen2.5-3B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 3B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 3.09B - Number of Parameters (Non-Embedding): 2.77B - Number of Layers: 36 - Number of Attention Heads (GQA): 16 for Q and 2 for KV - Context Length: Full 32,768 tokens and generation 8192 tokens For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here.
## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "An instruction-tuned 3B-parameter causal language model optimized for improved coding, mathematics, structured data handling, multilingual text generation (29+ languages), and long-context understanding (up to 128K tokens)." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-3B.json b/data/model_data_json/Qwen_Qwen2.5-3B.json new file mode 100644 index 0000000000000000000000000000000000000000..41af91643a29715b68c4a6bc1b30e4dea1693f82 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-3B.json @@ -0,0 +1,16 @@ +{ + "model_id": "Qwen/Qwen2.5-3B", + "downloads": 504005, + "tags": [ + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2407.10671", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: qwen-research license_link: language: - en pipeline_tag: text-generation --- # Qwen2.5-3B ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the base 3B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 3.09B - Number of Parameters (Non-Embedding): 2.77B - Number of Layers: 36 - Number of Attention Heads (GQA): 16 for Q and 2 for KV - Context Length: Full 32,768 tokens **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "Qwen2.5-3B is a 3.09B-parameter base causal language model optimized for text generation, featuring improved coding, mathematics, multilingual support (29+ languages), long-context handling (32K tokens), and structured data understanding, but requires post-training for conversational use."
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-72B-Instruct-AWQ.json b/data/model_data_json/Qwen_Qwen2.5-72B-Instruct-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..5aa2f6c935264c5ff829839aed3c86de3b9a2fb7 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-72B-Instruct-AWQ.json @@ -0,0 +1,26 @@ +{ + "model_id": "Qwen/Qwen2.5-72B-Instruct-AWQ", + "downloads": 252994, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-72B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-72B-Instruct", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "awq", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-72B-Instruct language: - en library_name: transformers license: other license_name: qwen license_link: pipeline_tag: text-generation tags: - chat --- # Qwen2.5-72B-Instruct-AWQ ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the AWQ-quantized 4-bit instruction-tuned 72B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 72.7B - Number of Parameters (Non-Embedding): 70.0B - Number of Layers: 80 - Number of Attention Heads (GQA): 64 for Q and 8 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. - Quantization: AWQ 4-bit For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. Also check out our AWQ documentation for a more detailed usage guide. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM.
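The AWQ quickstart snippet was also stripped. A hedged sketch: the 4-bit AWQ checkpoint loads through the same high-level API, assuming the autoawq package is installed so transformers can honor the quantization config stored in the repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: AWQ repos carry their quantization config, so the usual
# from_pretrained path loads the 4-bit weights (requires autoawq).
model_name = "Qwen/Qwen2.5-72B-Instruct-AWQ"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```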
Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For quantized models, the benchmark results against the original bfloat16 models can be found here. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "An AWQ-quantized 72B-parameter instruction-tuned multilingual language model optimized for coding, mathematics, structured data handling, long-context generation (128K tokens), and improved instruction following in chatbot applications." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-72B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-72B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..64ab7c1dac8bd26185e1bac95ead47a29a47fd4c --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-72B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "Qwen/Qwen2.5-72B-Instruct", + "downloads": 150530, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-72B", + "base_model:finetune:Qwen/Qwen2.5-72B", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: qwen license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-72B tags: - chat library_name: transformers --- # Qwen2.5-72B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 72B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 72.7B - Number of Parameters (Non-Embedding): 70.0B - Number of Layers: 80 - Number of Attention Heads (GQA): 64 for Q and 8 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts.
For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "A 72.7B-parameter instruction-tuned causal language model specialized in coding, mathematics, long-context understanding (128K tokens), structured data processing, and multilingual text generation across 29 languages." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-7B-Instruct-1M.json b/data/model_data_json/Qwen_Qwen2.5-7B-Instruct-1M.json new file mode 100644 index 0000000000000000000000000000000000000000..8fb43d2d50e9e6209da1b02407ee37398d3eb88a --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-7B-Instruct-1M.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2.5-7B-Instruct-1M", + "downloads": 1508135, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2501.15383", + "base_model:Qwen/Qwen2.5-7B", + "base_model:finetune:Qwen/Qwen2.5-7B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-7B tags: - chat library_name: transformers --- # Qwen2.5-7B-Instruct-1M ## Introduction Qwen2.5-1M is the long-context version of the Qwen2.5 series models, supporting a context length of up to 1M tokens. Compared to the Qwen2.5 128K version, Qwen2.5-1M demonstrates significantly improved performance in handling long-context tasks while maintaining its capability in short tasks. The model has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Parameters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: Full 1,010,000 tokens and generation 8192 tokens - We recommend deploying with our custom vLLM, which introduces sparse attention and length extrapolation methods to ensure efficiency and accuracy for long-context tasks. For specific guidance, refer to this section.
- You can also use the previous framework that supports Qwen2.5 for inference, but accuracy degradation may occur for sequences exceeding 262,144 tokens. For more details, please refer to our blog, GitHub, Technical Report, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Ultra Long Texts To enhance processing accuracy and efficiency for long sequences, we have developed an advanced inference framework based on vLLM, incorporating sparse attention and length extrapolation. This approach significantly improves model generation performance for sequences exceeding 256K tokens and achieves a 3- to 7-fold speedup for sequences up to 1M tokens. Here we provide step-by-step instructions for deploying the Qwen2.5-1M models with our framework. #### 1. System Preparation To achieve the best performance, we recommend using GPUs with Ampere or Hopper architecture, which support optimized kernels. Ensure your system meets the following requirements: - **CUDA Version**: 12.1 or 12.3 - **Python Version**: >=3.9 and <=3.12 **VRAM Requirements:** - For processing 1 million-token sequences: - **Qwen2.5-7B-Instruct-1M**: At least 120GB VRAM (total across GPUs). - **Qwen2.5-14B-Instruct-1M**: At least 320GB VRAM (total across GPUs). If your GPUs do not have sufficient VRAM, you can still use Qwen2.5-1M for shorter tasks. #### 2. Install Dependencies For now, you need to clone the vLLM repository from our custom branch and install it manually. We are working on getting our branch merged into the main vLLM project. #### 3. Launch vLLM vLLM supports offline inference or launching an OpenAI-compatible server. **Example of Offline Inference** **Example of OpenAI-like Server** Then you can use curl or Python to interact with the deployed model. **Parameter Explanations:** - **** - Set to the number of GPUs you are using. Max 4 GPUs for the 7B model, and 8 GPUs for the 14B model. - **** - Defines the maximum input sequence length. Reduce this value if you encounter Out of Memory issues. - **** - Sets the chunk size in Chunked Prefill. A smaller value reduces activation memory usage but may slow down inference. We recommend 131072 for optimal performance. - **** - Limits the number of concurrent sequences processed. You can also refer to our Documentation for usage of vLLM. #### Troubleshooting: 1. Encountering the error: \"The model's max sequence length (xxxxx) is larger than the maximum number of tokens that can be stored in the KV cache.\" The VRAM reserved for the KV cache is insufficient. Consider reducing the ``. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog and our technical report. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "Qwen2.5-7B-Instruct-1M is a 7.61B-parameter causal language model optimized for long-context text generation tasks, supporting up to 1 million tokens while maintaining performance on shorter tasks."
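The "Example of Offline Inference" and "Example of OpenAI-like Server" blocks named in this card were stripped. A sketch of the offline path using stock vLLM argument names follows; the custom Qwen2.5-1M branch may expose different or additional flags, so treat every parameter below as an assumption:

```python
from vllm import LLM, SamplingParams

# Sketch of the stripped offline-inference example, with stock vLLM arguments
# mapped onto the parameter explanations given in the card above.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",
    tensor_parallel_size=4,         # number of GPUs (max 4 for the 7B model)
    max_model_len=1010000,          # maximum input length; reduce on OOM
    enable_chunked_prefill=True,
    max_num_batched_tokens=131072,  # chunked-prefill chunk size recommended above
    max_num_seqs=1,                 # limit concurrent sequences
)

prompt = "Summarize the following report in three sentences: ..."
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))
print(outputs[0].outputs[0].text)
```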
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-7B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-7B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..ace823244840ab42117b2ced5c8c304433006e42 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-7B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "Qwen/Qwen2.5-7B-Instruct", + "downloads": 2981336, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-7B", + "base_model:finetune:Qwen/Qwen2.5-7B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation base_model: Qwen/Qwen2.5-7B tags: - chat library_name: transformers --- # Qwen2.5-7B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 7B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Parameters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM.
Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "Qwen2.5-7B-Instruct is a 7.61B-parameter instruction-tuned multilingual language model optimized for coding, mathematics, structured data understanding, long-context generation (up to 128K tokens), and improved instruction following across 29 languages." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-7B.json b/data/model_data_json/Qwen_Qwen2.5-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..9d3d1ac6741b9b06ce11945813db2fe3aa13b6c6 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-7B.json @@ -0,0 +1,20 @@ +{ + "model_id": "Qwen/Qwen2.5-7B", + "downloads": 495342, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2407.10671", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en pipeline_tag: text-generation library_name: transformers --- # Qwen2.5-7B ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens, with generation of up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the base 7B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Parameters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: 131,072 tokens **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog.
For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "Qwen2.5-7B is a 7.61B-parameter causal language model optimized for text generation, featuring enhanced coding, mathematics, multilingual support (29+ languages), long-context handling (128K tokens), structured data understanding, and improved instruction following, though not recommended for direct conversational use without fine-tuning." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Coder-0.5B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-Coder-0.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..bce70ac9f784490e36e894af53d3ee9d44269206 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Coder-0.5B-Instruct.json @@ -0,0 +1,28 @@ +{ + "model_id": "Qwen/Qwen2.5-Coder-0.5B-Instruct", + "downloads": 114663, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "code", + "codeqwen", + "chat", + "qwen", + "qwen-coder", + "conversational", + "en", + "arxiv:2409.12186", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-Coder-0.5B", + "base_model:finetune:Qwen/Qwen2.5-Coder-0.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en base_model: - Qwen/Qwen2.5-Coder-0.5B pipeline_tag: text-generation library_name: transformers tags: - code - codeqwen - chat - qwen - qwen-coder --- # Qwen2.5-Coder-0.5B-Instruct ## Introduction Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, and more. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o. - A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies. **This repo contains the instruction-tuned 0.5B Qwen2.5-Coder model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 0.49B - Number of Parameters (Non-Embedding): 0.36B - Number of Layers: 24 - Number of Attention Heads (GQA): 14 for Q and 2 for KV - Context Length: Full 32,768 tokens For more details, please refer to our blog, GitHub, Documentation, Arxiv. ## Requirements The code of Qwen2.5-Coder has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog.
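The Coder quickstart follows the same chat workflow as the general-purpose cards; a minimal sketch with a coding prompt (the prompt and `max_new_tokens` value are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of the Coder quickstart: same chat-template flow, coding prompt.
model_name = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "Write a quick sort algorithm in Python."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```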
For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "A 0.5-billion-parameter instruction-tuned language model specialized in code generation, reasoning, and fixing, designed for real-world coding applications." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Coder-32B-Instruct-GPTQ-Int4.json b/data/model_data_json/Qwen_Qwen2.5-Coder-32B-Instruct-GPTQ-Int4.json new file mode 100644 index 0000000000000000000000000000000000000000..f15575e08ce1682bbf97d786328ee742b439c07e --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Coder-32B-Instruct-GPTQ-Int4.json @@ -0,0 +1,31 @@ +{ + "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4", + "downloads": 194287, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "code", + "codeqwen", + "chat", + "qwen", + "qwen-coder", + "conversational", + "en", + "arxiv:2409.12186", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-Coder-32B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-Coder-32B-Instruct", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "gptq", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en base_model: - Qwen/Qwen2.5-Coder-32B-Instruct pipeline_tag: text-generation library_name: transformers tags: - code - codeqwen - chat - qwen - qwen-coder --- # Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 ## Introduction Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, and more. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o. - A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies. - **Long-context Support** up to 128K tokens. **This repo contains the GPTQ-quantized 4-bit instruction-tuned 32B Qwen2.5-Coder model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Parameters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 131,072 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. - Quantization: GPTQ 4-bit For more details, please refer to our blog, GitHub, Documentation, Arxiv. ## Requirements The code of Qwen2.5-Coder has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. Also check out our GPTQ documentation for a more detailed usage guide.
## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 is a 4-bit quantized, 32-billion-parameter instruction-tuned language model specialized in code generation, reasoning, and fixing, with long-context support up to 128K tokens, designed for coding tasks and real-world applications like code agents." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Coder-32B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-Coder-32B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..aba9dfd3c0a22b05e196e56303f11f8f44dc5020 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Coder-32B-Instruct.json @@ -0,0 +1,29 @@ +{ + "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct", + "downloads": 198487, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "code", + "codeqwen", + "chat", + "qwen", + "qwen-coder", + "conversational", + "en", + "arxiv:2409.12186", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-Coder-32B", + "base_model:finetune:Qwen/Qwen2.5-Coder-32B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en base_model: - Qwen/Qwen2.5-Coder-32B pipeline_tag: text-generation library_name: transformers tags: - code - codeqwen - chat - qwen - qwen-coder --- # Qwen2.5-Coder-32B-Instruct ## Introduction Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, and more. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o. - A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies. - **Long-context Support** up to 128K tokens.
**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Parameters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 131,072 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. For more details, please refer to our blog, GitHub, Documentation, Arxiv. ## Requirements The code of Qwen2.5-Coder has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "An instruction-tuned 32.5B parameter causal language model specialized in code generation, reasoning, and fixing, with 128K token context support, designed for real-world coding applications while maintaining strong math and general capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Coder-7B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-Coder-7B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..c5c0c6ef54a982abd4ae32d718a9394ebfbd1de5 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Coder-7B-Instruct.json @@ -0,0 +1,29 @@ +{ + "model_id": "Qwen/Qwen2.5-Coder-7B-Instruct", + "downloads": 291507, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "code", + "codeqwen", + "chat", + "qwen", + "qwen-coder", + "conversational", + "en", + "arxiv:2409.12186", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-Coder-7B", + "base_model:finetune:Qwen/Qwen2.5-Coder-7B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 license_link: language: - en base_model: - Qwen/Qwen2.5-Coder-7B pipeline_tag: text-generation library_name: transformers tags: - code - codeqwen - chat - qwen - qwen-coder --- # Qwen2.5-Coder-7B-Instruct ## Introduction Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen).
As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, and more. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o. - A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies. - **Long-context Support** up to 128K tokens. **This repo contains the instruction-tuned 7B Qwen2.5-Coder model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Parameters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: Full 131,072 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. For more details, please refer to our blog, GitHub, Documentation, Arxiv. ## Requirements The code of Qwen2.5-Coder has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers; with an older version, you will encounter errors when loading the model. ## Quickstart The following code snippet shows how to load the tokenizer and model and how to generate content. ### Processing Long Texts The current configuration supports context lengths up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you can add the following to the model configuration to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM. Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to cite us.", + "model_explanation_gemini": "A 7.61B-parameter instruction-tuned causal language model specialized in code generation, reasoning, and fixing, with long-context support up to 128K tokens for real-world coding applications."
+} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Math-1.5B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-Math-1.5B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..0d838a9f07b41d75e027e0e12ecc150b5506485c --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Math-1.5B-Instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "Qwen/Qwen2.5-Math-1.5B-Instruct", + "downloads": 144850, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "chat", + "conversational", + "en", + "arxiv:2409.12122", + "base_model:Qwen/Qwen2.5-Math-1.5B", + "base_model:finetune:Qwen/Qwen2.5-Math-1.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-Math-1.5B language: - en pipeline_tag: text-generation tags: - chat library_name: transformers license: apache-2.0 license_link: --- # Qwen2.5-Math-1.5B-Instruct > [!Warning] >
> 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks.
## Introduction In August 2024, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. A month later, we upgraded it and open-sourced the **Qwen2.5-Math** series, including the base models **Qwen2.5-Math-1.5B/7B/72B**, the instruction-tuned models **Qwen2.5-Math-1.5B/7B/72B-Instruct**, and the mathematical reward model **Qwen2.5-Math-RM-72B**. Unlike the Qwen2-Math series, which only supports using Chain-of-Thought (CoT) to solve English math problems, the Qwen2.5-Math series is expanded to support using both CoT and Tool-integrated Reasoning (TIR) to solve math problems in both Chinese and English. The Qwen2.5-Math series models have achieved significant performance improvements compared to the Qwen2-Math series models on the Chinese and English mathematics benchmarks with CoT. While CoT plays a vital role in enhancing the reasoning capabilities of LLMs, it faces challenges in achieving computational accuracy and handling complex mathematical or algorithmic reasoning tasks, such as finding the roots of a quadratic equation or computing the eigenvalues of a matrix. TIR can further improve the model's proficiency in precise computation, symbolic manipulation, and algorithmic manipulation. Qwen2.5-Math-1.5B/7B/72B-Instruct achieve 79.7, 85.3, and 87.8 respectively on the MATH benchmark using TIR. ## Model Details For more details, please refer to our blog post and GitHub repo. ## Requirements * `transformers>=4.37.0` for Qwen2.5-Math models. The latest version is recommended. > [!Warning]
> 🚨 This is a must because `transformers` has integrated Qwen2 code since 4.37.0.
For requirements on GPU memory and the respective throughput, see similar results of Qwen2 here. ## Quick Start > [!Important] > > **Qwen2.5-Math-1.5B-Instruct** is an instruction model for chatting; > > **Qwen2.5-Math-1.5B** is a base model typically used for completion and few-shot inference, serving as a better starting point for fine-tuning. > ### 🤗 Hugging Face Transformers Qwen2.5-Math can be deployed and inferred in the same way as Qwen2.5. Here we show a code snippet to show you how to use the chat model with `transformers`: ## Citation If you find our work helpful, feel free to give us a citation.", + "model_explanation_gemini": "Specializes in solving English and Chinese math problems using Chain-of-Thought and Tool-integrated Reasoning." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Math-1.5B.json b/data/model_data_json/Qwen_Qwen2.5-Math-1.5B.json new file mode 100644 index 0000000000000000000000000000000000000000..b9c76e8234e8fb33f6b79cfc0a50956c08ac0b8c --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Math-1.5B.json @@ -0,0 +1,22 @@ +{ + "model_id": "Qwen/Qwen2.5-Math-1.5B", + "downloads": 158341, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2409.12122", + "base_model:Qwen/Qwen2.5-1.5B", + "base_model:finetune:Qwen/Qwen2.5-1.5B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-1.5B language: - en pipeline_tag: text-generation library_name: transformers license: apache-2.0 license_link: --- # Qwen2.5-Math-1.5B > [!Warning]
> 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks.
## Introduction In August 2024, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. A month later, we upgraded it and open-sourced the **Qwen2.5-Math** series, including the base models **Qwen2.5-Math-1.5B/7B/72B**, the instruction-tuned models **Qwen2.5-Math-1.5B/7B/72B-Instruct**, and the mathematical reward model **Qwen2.5-Math-RM-72B**. Unlike the Qwen2-Math series, which only supports using Chain-of-Thought (CoT) to solve English math problems, the Qwen2.5-Math series is expanded to support using both CoT and Tool-integrated Reasoning (TIR) to solve math problems in both Chinese and English. The Qwen2.5-Math series models have achieved significant performance improvements compared to the Qwen2-Math series models on the Chinese and English mathematics benchmarks with CoT. While CoT plays a vital role in enhancing the reasoning capabilities of LLMs, it faces challenges in achieving computational accuracy and handling complex mathematical or algorithmic reasoning tasks, such as finding the roots of a quadratic equation or computing the eigenvalues of a matrix. TIR can further improve the model's proficiency in precise computation, symbolic manipulation, and algorithmic manipulation. Qwen2.5-Math-1.5B/7B/72B-Instruct achieve 79.7, 85.3, and 87.8 respectively on the MATH benchmark using TIR. ## Model Details For more details, please refer to our blog post and GitHub repo. ## Requirements * `transformers>=4.37.0` for Qwen2.5-Math models. The latest version is recommended. > [!Warning]
> 🚨 This is a must because `transformers` has integrated Qwen2 code since 4.37.0.
For requirements on GPU memory and the respective throughput, see similar results of Qwen2 here. ## Quick Start > [!Important] > > **Qwen2.5-Math-1.5B-Instruct** is an instruction model for chatting; > > **Qwen2.5-Math-1.5B** is a base model typically used for completion and few-shot inference, serving as a better starting point for fine-tuning. ## Citation If you find our work helpful, feel free to give us a citation.", + "model_explanation_gemini": "Specialized in solving English and Chinese math problems using Chain-of-Thought (CoT) and Tool-integrated Reasoning (TIR) techniques." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Math-7B.json b/data/model_data_json/Qwen_Qwen2.5-Math-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..7462b1cd3392d52cfce95aad2bec58443ceed48d --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Math-7B.json @@ -0,0 +1,22 @@ +{ + "model_id": "Qwen/Qwen2.5-Math-7B", + "downloads": 122710, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "en", + "arxiv:2409.12122", + "base_model:Qwen/Qwen2.5-7B", + "base_model:finetune:Qwen/Qwen2.5-7B", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-7B language: - en pipeline_tag: text-generation library_name: transformers license: apache-2.0 license_link: --- # Qwen2.5-Math-7B > [!Warning] >
> 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks.
## Introduction In August 2024, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. A month later, we upgraded it and open-sourced the **Qwen2.5-Math** series, including the base models **Qwen2.5-Math-1.5B/7B/72B**, the instruction-tuned models **Qwen2.5-Math-1.5B/7B/72B-Instruct**, and the mathematical reward model **Qwen2.5-Math-RM-72B**. Unlike the Qwen2-Math series, which only supports using Chain-of-Thought (CoT) to solve English math problems, the Qwen2.5-Math series is expanded to support using both CoT and Tool-integrated Reasoning (TIR) to solve math problems in both Chinese and English. The Qwen2.5-Math series models have achieved significant performance improvements compared to the Qwen2-Math series models on the Chinese and English mathematics benchmarks with CoT. While CoT plays a vital role in enhancing the reasoning capabilities of LLMs, it faces challenges in achieving computational accuracy and handling complex mathematical or algorithmic reasoning tasks, such as finding the roots of a quadratic equation or computing the eigenvalues of a matrix. TIR can further improve the model's proficiency in precise computation, symbolic manipulation, and algorithmic manipulation. Qwen2.5-Math-1.5B/7B/72B-Instruct achieve 79.7, 85.3, and 87.8 respectively on the MATH benchmark using TIR. ## Model Details For more details, please refer to our blog post and GitHub repo. ## Requirements * `transformers>=4.37.0` for Qwen2.5-Math models. The latest version is recommended. > [!Warning]
> 🚨 This is a must because `transformers` has integrated Qwen2 code since 4.37.0.
For requirements on GPU memory and the respective throughput, see similar results of Qwen2 here. ## Quick Start > [!Important] > > **Qwen2.5-Math-7B-Instruct** is an instruction model for chatting; > > **Qwen2.5-Math-7B** is a base model typically used for completion and few-shot inference, serving as a better starting point for fine-tuning. > ## Citation If you find our work helpful, feel free to give us a citation.", + "model_explanation_gemini": "Specializes in solving English and Chinese math problems using Chain-of-Thought and Tool-integrated Reasoning techniques." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-Omni-7B.json b/data/model_data_json/Qwen_Qwen2.5-Omni-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..1be6f10554c3d5e81eacee5589e4b376f40a58ce --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-Omni-7B.json @@ -0,0 +1,18 @@ +{ + "model_id": "Qwen/Qwen2.5-Omni-7B", + "downloads": 175099, + "tags": [ + "transformers", + "safetensors", + "qwen2_5_omni", + "multimodal", + "any-to-any", + "en", + "arxiv:2503.20215", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: apache-2.0 license_link: language: - en tags: - multimodal library_name: transformers pipeline_tag: any-to-any --- # Qwen2.5-Omni \"Chat\" ## Overview ### Introduction Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.
### Key Features * **Omni and Novel Architecture**: We propose Thinker-Talker architecture, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. We propose a novel position embedding, named TMRoPE (Time-aligned Multimodal RoPE), to synchronize the timestamps of video inputs with audio. * **Real-Time Voice and Video Chat**: Architecture designed for fully real-time interactions, supporting chunked input and immediate output. * **Natural and Robust Speech Generation**: Surpassing many existing streaming and non-streaming alternatives, demonstrating superior robustness and naturalness in speech generation. * **Strong Performance Across Modalities**: Exhibiting exceptional performance across all modalities when benchmarked against similarly sized single-modality models. Qwen2.5-Omni outperforms the similarly sized Qwen2-Audio in audio capabilities and achieves comparable performance to Qwen2.5-VL-7B. * **Excellent End-to-End Speech Instruction Following**: Qwen2.5-Omni shows performance in end-to-end speech instruction following that rivals its effectiveness with text inputs, evidenced by benchmarks such as MMLU and GSM8K. ### Model Architecture
### Performance We conducted a comprehensive evaluation of Qwen2.5-Omni, which demonstrates strong performance across all modalities when compared to similarly sized single-modality models and closed-source models like Qwen2.5-VL-7B, Qwen2-Audio, and Gemini-1.5-pro. In tasks requiring the integration of multiple modalities, such as OmniBench, Qwen2.5-Omni achieves state-of-the-art performance. Furthermore, in single-modality tasks, it excels in areas including speech recognition (Common Voice), translation (CoVoST2), audio understanding (MMAU), image reasoning (MMMU, MMStar), video understanding (MVBench), and speech generation (Seed-tts-eval and subjective naturalness).
Multimodality -> Text

**OmniBench**

| Model | Speech | Sound Event | Music | Avg |
|---|---|---|---|---|
| Gemini-1.5-Pro | 42.67% | 42.26% | 46.23% | 42.91% |
| MIO-Instruct | 36.96% | 33.58% | 11.32% | 33.80% |
| AnyGPT (7B) | 17.77% | 20.75% | 13.21% | 18.04% |
| video-SALMONN | 34.11% | 31.70% | 56.60% | 35.64% |
| UnifiedIO2-xlarge | 39.56% | 36.98% | 29.25% | 38.00% |
| UnifiedIO2-xxlarge | 34.24% | 36.98% | 24.53% | 33.98% |
| MiniCPM-o | - | - | - | 40.50% |
| Baichuan-Omni-1.5 | - | - | - | 42.90% |
| Qwen2.5-Omni-3B | 52.14% | 52.08% | 52.83% | 52.19% |
| Qwen2.5-Omni-7B | 55.25% | 60.00% | 52.83% | 56.13% |
Audio -> Text

**ASR: Librispeech**

| Model | dev-clean | dev-other | test-clean | test-other |
|---|---|---|---|---|
| SALMONN | - | - | 2.1 | 4.9 |
| SpeechVerse | - | - | 2.1 | 4.4 |
| Whisper-large-v3 | - | - | 1.8 | 3.6 |
| Llama-3-8B | - | - | - | 3.4 |
| Llama-3-70B | - | - | - | 3.1 |
| Seed-ASR-Multilingual | - | - | 1.6 | 2.8 |
| MiniCPM-o | - | - | 1.7 | - |
| MinMo | - | - | 1.7 | 3.9 |
| Qwen-Audio | 1.8 | 4.0 | 2.0 | 4.2 |
| Qwen2-Audio | 1.3 | 3.4 | 1.6 | 3.6 |
| Qwen2.5-Omni-3B | 2.0 | 4.1 | 2.2 | 4.5 |
| Qwen2.5-Omni-7B | 1.6 | 3.5 | 1.8 | 3.4 |

**ASR: Common Voice 15**

| Model | en | zh | yue | fr |
|---|---|---|---|---|
| Whisper-large-v3 | 9.3 | 12.8 | 10.9 | 10.8 |
| MinMo | 7.9 | 6.3 | 6.4 | 8.5 |
| Qwen2-Audio | 8.6 | 6.9 | 5.9 | 9.6 |
| Qwen2.5-Omni-3B | 9.1 | 6.0 | 11.6 | 9.6 |
| Qwen2.5-Omni-7B | 7.6 | 5.2 | 7.3 | 7.5 |

**ASR: Fleurs**

| Model | zh | en |
|---|---|---|
| Whisper-large-v3 | 7.7 | 4.1 |
| Seed-ASR-Multilingual | - | 3.4 |
| Megrez-3B-Omni | 10.8 | - |
| MiniCPM-o | 4.4 | - |
| MinMo | 3.0 | 3.8 |
| Qwen2-Audio | 7.5 | - |
| Qwen2.5-Omni-3B | 3.2 | 5.4 |
| Qwen2.5-Omni-7B | 3.0 | 4.1 |

**ASR: Wenetspeech**

| Model | test-net | test-meeting |
|---|---|---|
| Seed-ASR-Chinese | 4.7 | 5.7 |
| Megrez-3B-Omni | - | 16.4 |
| MiniCPM-o | 6.9 | - |
| MinMo | 6.8 | 7.4 |
| Qwen2.5-Omni-3B | 6.3 | 8.1 |
| Qwen2.5-Omni-7B | 5.9 | 7.7 |

**ASR: Voxpopuli-V1.0-en**

| Model | Performance |
|---|---|
| Llama-3-8B | 6.2 |
| Llama-3-70B | 5.7 |
| Qwen2.5-Omni-3B | 6.6 |
| Qwen2.5-Omni-7B | 5.8 |
**S2TT: CoVoST2**

| Model | en-de | de-en | en-zh | zh-en |
|---|---|---|---|---|
| SALMONN | 18.6 | - | 33.1 | - |
| SpeechLLaMA | - | 27.1 | - | 12.3 |
| BLSP | 14.1 | - | - | - |
| MiniCPM-o | - | - | 48.2 | 27.2 |
| MinMo | - | 39.9 | 46.7 | 26.0 |
| Qwen-Audio | 25.1 | 33.9 | 41.5 | 15.7 |
| Qwen2-Audio | 29.9 | 35.2 | 45.2 | 24.4 |
| Qwen2.5-Omni-3B | 28.3 | 38.1 | 41.4 | 26.6 |
| Qwen2.5-Omni-7B | 30.2 | 37.7 | 41.4 | 29.4 |
**SER: Meld**

| Model | Performance |
|---|---|
| WavLM-large | 0.542 |
| MiniCPM-o | 0.524 |
| Qwen-Audio | 0.557 |
| Qwen2-Audio | 0.553 |
| Qwen2.5-Omni-3B | 0.558 |
| Qwen2.5-Omni-7B | 0.570 |
**VSC: VocalSound**

| Model | Performance |
|---|---|
| CLAP | 0.495 |
| Pengi | 0.604 |
| Qwen-Audio | 0.929 |
| Qwen2-Audio | 0.939 |
| Qwen2.5-Omni-3B | 0.936 |
| Qwen2.5-Omni-7B | 0.939 |
**Music: GiantSteps Tempo**

| Model | Performance |
|---|---|
| Llark-7B | 0.86 |
| Qwen2.5-Omni-3B | 0.88 |
| Qwen2.5-Omni-7B | 0.88 |

**Music: MusicCaps**

| Model | Performance |
|---|---|
| LP-MusicCaps | 0.291 / 0.149 / 0.089 / 0.061 / 0.129 / 0.130 |
| Qwen2.5-Omni-3B | 0.325 / 0.163 / 0.093 / 0.057 / 0.132 / 0.229 |
| Qwen2.5-Omni-7B | 0.328 / 0.162 / 0.090 / 0.055 / 0.127 / 0.225 |
Audio Reasoning

**MMAU**

| Model | Sound | Music | Speech | Avg |
|---|---|---|---|---|
| Gemini-Pro-V1.5 | 56.75 | 49.40 | 58.55 | 54.90 |
| Qwen2-Audio | 54.95 | 50.98 | 42.04 | 49.20 |
| Qwen2.5-Omni-3B | 70.27 | 60.48 | 59.16 | 63.30 |
| Qwen2.5-Omni-7B | 67.87 | 69.16 | 59.76 | 65.60 |
Voice Chatting

**VoiceBench**

| Model | AlpacaEval | CommonEval | SD-QA | MMSU |
|---|---|---|---|---|
| Ultravox-v0.4.1-LLaMA-3.1-8B | 4.55 | 3.90 | 53.35 | 47.17 |
| MERaLiON | 4.50 | 3.77 | 55.06 | 34.95 |
| Megrez-3B-Omni | 3.50 | 2.95 | 25.95 | 27.03 |
| Lyra-Base | 3.85 | 3.50 | 38.25 | 49.74 |
| MiniCPM-o | 4.42 | 4.15 | 50.72 | 54.78 |
| Baichuan-Omni-1.5 | 4.50 | 4.05 | 43.40 | 57.25 |
| Qwen2-Audio | 3.74 | 3.43 | 35.71 | 35.72 |
| Qwen2.5-Omni-3B | 4.32 | 4.00 | 49.37 | 50.23 |
| Qwen2.5-Omni-7B | 4.49 | 3.93 | 55.71 | 61.32 |

| Model | OpenBookQA | IFEval | AdvBench | Avg |
|---|---|---|---|---|
| Ultravox-v0.4.1-LLaMA-3.1-8B | 65.27 | 66.88 | 98.46 | 71.45 |
| MERaLiON | 27.23 | 62.93 | 94.81 | 62.91 |
| Megrez-3B-Omni | 28.35 | 25.71 | 87.69 | 46.25 |
| Lyra-Base | 72.75 | 36.28 | 59.62 | 57.66 |
| MiniCPM-o | 78.02 | 49.25 | 97.69 | 71.69 |
| Baichuan-Omni-1.5 | 74.51 | 54.54 | 97.31 | 71.14 |
| Qwen2-Audio | 49.45 | 26.33 | 96.73 | 55.35 |
| Qwen2.5-Omni-3B | 74.73 | 42.10 | 98.85 | 68.81 |
| Qwen2.5-Omni-7B | 81.10 | 52.87 | 99.42 | 74.12 |
Image -> Text

| Dataset | Qwen2.5-Omni-7B | Qwen2.5-Omni-3B | Other Best | Qwen2.5-VL-7B | GPT-4o-mini |
|--------------------------------|--------------|------------|------------|---------------|-------------|
| MMMU (val) | 59.2 | 53.1 | 53.9 | 58.6 | **60.0** |
| MMMU-Pro (overall) | 36.6 | 29.7 | - | **38.3** | 37.6 |
| MathVista (testmini) | 67.9 | 59.4 | **71.9** | 68.2 | 52.5 |
| MathVision (full) | 25.0 | 20.8 | 23.1 | **25.1** | - |
| MMBench-V1.1-EN (test) | 81.8 | 77.8 | 80.5 | **82.6** | 76.0 |
| MMVet (turbo) | 66.8 | 62.1 | **67.5** | 67.1 | 66.9 |
| MMStar | **64.0** | 55.7 | **64.0** | 63.9 | 54.8 |
| MME (sum) | 2340 | 2117 | **2372** | 2347 | 2003 |
| MuirBench | 59.2 | 48.0 | - | **59.2** | - |
| CRPE (relation) | **76.5** | 73.7 | - | 76.4 | - |
| RealWorldQA (avg) | 70.3 | 62.6 | **71.9** | 68.5 | - |
| MME-RealWorld (en) | **61.6** | 55.6 | - | 57.4 | - |
| MM-MT-Bench | 6.0 | 5.0 | - | **6.3** | - |
| AI2D | 83.2 | 79.5 | **85.8** | 83.9 | - |
| TextVQA (val) | 84.4 | 79.8 | 83.2 | **84.9** | - |
| DocVQA (test) | 95.2 | 93.3 | 93.5 | **95.7** | - |
| ChartQA (test avg) | 85.3 | 82.8 | 84.9 | **87.3** | - |
| OCRBench_V2 (en) | **57.8** | 51.7 | - | 56.3 | - |

| Dataset | Qwen2.5-Omni-7B | Qwen2.5-Omni-3B | Qwen2.5-VL-7B | Grounding DINO | Gemini 1.5 Pro |
|--------------------------|--------------|---------------|---------------|----------------|----------------|
| Refcoco (val) | 90.5 | 88.7 | 90.0 | **90.6** | 73.2 |
| Refcoco (textA) | **93.5** | 91.8 | 92.5 | 93.2 | 72.9 |
| Refcoco (textB) | 86.6 | 84.0 | 85.4 | **88.2** | 74.6 |
| Refcoco+ (val) | 85.4 | 81.1 | 84.2 | **88.2** | 62.5 |
| Refcoco+ (textA) | **91.0** | 87.5 | 89.1 | 89.0 | 63.9 |
| Refcoco+ (textB) | **79.3** | 73.2 | 76.9 | 75.9 | 65.0 |
| Refcocog+ (val) | **87.4** | 85.0 | 87.2 | 86.1 | 75.2 |
| Refcocog+ (test) | **87.9** | 85.1 | 87.2 | 87.0 | 76.2 |
| ODinW | 42.4 | 39.2 | 37.3 | **55.0** | 36.7 |
| PointGrounding | 66.5 | 46.2 | **67.3** | - | - |
Video (without audio) -> Text

| Dataset | Qwen2.5-Omni-7B | Qwen2.5-Omni-3B | Other Best | Qwen2.5-VL-7B | GPT-4o-mini |
|-----------------------------|--------------|------------|------------|---------------|-------------|
| Video-MME (w/o sub) | 64.3 | 62.0 | 63.9 | **65.1** | 64.8 |
| Video-MME (w sub) | **72.4** | 68.6 | 67.9 | 71.6 | - |
| MVBench | **70.3** | 68.7 | 67.2 | 69.6 | - |
| EgoSchema (test) | **68.6** | 61.4 | 63.2 | 65.0 | - |
Zero-shot Speech Generation

**Content Consistency (SEED)**

| Model | test-zh | test-en | test-hard |
|---|---|---|---|
| Seed-TTS_ICL | 1.11 | 2.24 | 7.58 |
| Seed-TTS_RL | 1.00 | 1.94 | 6.42 |
| MaskGCT | 2.27 | 2.62 | 10.27 |
| E2_TTS | 1.97 | 2.19 | - |
| F5-TTS | 1.56 | 1.83 | 8.67 |
| CosyVoice 2 | 1.45 | 2.57 | 6.83 |
| CosyVoice 2-S | 1.45 | 2.38 | 8.08 |
| Qwen2.5-Omni-3B_ICL | 1.95 | 2.87 | 9.92 |
| Qwen2.5-Omni-3B_RL | 1.58 | 2.51 | 7.86 |
| Qwen2.5-Omni-7B_ICL | 1.70 | 2.72 | 7.97 |
| Qwen2.5-Omni-7B_RL | 1.42 | 2.32 | 6.54 |

**Speaker Similarity (SEED)**

| Model | test-zh | test-en | test-hard |
|---|---|---|---|
| Seed-TTS_ICL | 0.796 | 0.762 | 0.776 |
| Seed-TTS_RL | 0.801 | 0.766 | 0.782 |
| MaskGCT | 0.774 | 0.714 | 0.748 |
| E2_TTS | 0.730 | 0.710 | - |
| F5-TTS | 0.741 | 0.647 | 0.713 |
| CosyVoice 2 | 0.748 | 0.652 | 0.724 |
| CosyVoice 2-S | 0.753 | 0.654 | 0.732 |
| Qwen2.5-Omni-3B_ICL | 0.741 | 0.635 | 0.748 |
| Qwen2.5-Omni-3B_RL | 0.744 | 0.635 | 0.746 |
| Qwen2.5-Omni-7B_ICL | 0.752 | 0.632 | 0.747 |
| Qwen2.5-Omni-7B_RL | 0.754 | 0.641 | 0.752 |
Text -> Text

| Dataset | Qwen2.5-Omni-7B | Qwen2.5-Omni-3B | Qwen2.5-7B | Qwen2.5-3B | Qwen2-7B | Llama3.1-8B | Gemma2-9B |
|-----------------------------------|-----------|------------|------------|------------|------------|-------------|-----------|
| MMLU-Pro | 47.0 | 40.4 | **56.3** | 43.7 | 44.1 | 48.3 | 52.1 |
| MMLU-redux | 71.0 | 60.9 | **75.4** | 64.4 | 67.3 | 67.2 | 72.8 |
| LiveBench (0831) | 29.6 | 22.3 | **35.9** | 26.8 | 29.2 | 26.7 | 30.6 |
| GPQA | 30.8 | 34.3 | **36.4** | 30.3 | 34.3 | 32.8 | 32.8 |
| MATH | 71.5 | 63.6 | **75.5** | 65.9 | 52.9 | 51.9 | 44.3 |
| GSM8K | 88.7 | 82.6 | **91.6** | 86.7 | 85.7 | 84.5 | 76.7 |
| HumanEval | 78.7 | 70.7 | **84.8** | 74.4 | 79.9 | 72.6 | 68.9 |
| MBPP | 73.2 | 70.4 | **79.2** | 72.7 | 67.2 | 69.6 | 74.9 |
| MultiPL-E | 65.8 | 57.6 | **70.4** | 60.2 | 59.1 | 50.7 | 53.4 |
| LiveCodeBench (2305-2409) | 24.6 | 16.5 | **28.7** | 19.9 | 23.9 | 8.3 | 18.9 |
## Quickstart Below, we provide simple examples to show how to use Qwen2.5-Omni with 🤗 Transformers. The code of Qwen2.5-Omni is in the latest Hugging Face transformers, and we advise you to build from source, or you might encounter the following error: `KeyError: 'qwen2_5_omni'`. We offer a toolkit, `qwen-omni-utils`, to help you handle various types of audio and visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved audio, images, and videos. You can install it with `pip install qwen-omni-utils[decord]` and make sure your system has `ffmpeg` installed. If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-omni-utils`, which will fall back to using torchvision for video processing. However, you can still install decord from source to get decord used when loading video. ### 🤗 Transformers Usage Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_omni_utils`:
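A condensed sketch of the chat flow described above, assuming the `Qwen2_5OmniForConditionalGeneration` / `Qwen2_5OmniProcessor` classes of the current `transformers` integration and the `process_mm_info` helper from `qwen_omni_utils` (the video URL is illustrative):

```python
import soundfile as sf
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info

model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")

# The system prompt below is required for audio output (see Usage Tips).
conversation = [
    {"role": "system", "content": [{"type": "text", "text": (
        "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, "
        "capable of perceiving auditory and visual inputs, as well as generating text and speech.")}]},
    {"role": "user", "content": [{"type": "video", "video": "https://example.com/demo.mp4"}]},
]

USE_AUDIO_IN_VIDEO = True  # must stay consistent across a multi-round conversation

# Render the chat template, gather multimodal inputs, and run generation.
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=USE_AUDIO_IN_VIDEO)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True, use_audio_in_video=USE_AUDIO_IN_VIDEO)
inputs = inputs.to(model.device).to(model.dtype)

# The model returns both text tokens and a speech waveform.
text_ids, audio = model.generate(**inputs, use_audio_in_video=USE_AUDIO_IN_VIDEO)
print(processor.batch_decode(text_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))
sf.write("output.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)
```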
Minimum GPU memory requirements

| Model | Precision | 15 s Video | 30 s Video | 60 s Video |
|--------------|-----------|-------------|-------------|--------------------|
| Qwen2.5-Omni-3B | FP32 | 89.10 GB | Not Recommended | Not Recommended |
| Qwen2.5-Omni-3B | BF16 | 18.38 GB | 22.43 GB | 28.22 GB |
| Qwen2.5-Omni-7B | FP32 | 93.56 GB | Not Recommended | Not Recommended |
| Qwen2.5-Omni-7B | BF16 | 31.11 GB | 41.85 GB | 60.19 GB |

Note: The table above presents the theoretical minimum memory requirements for inference with `transformers`; in practice, the actual memory usage is typically at least 1.2 times higher. For more information, see the linked resource here.
Video URL resource usage

Video URL compatibility largely depends on the third-party library version. The details are in the table below. You can switch the backend if you prefer not to use the default one.

| Backend | HTTP | HTTPS |
|-------------|------|-------|
| torchvision >= 0.19.0 | ✅ | ✅ |
| torchvision < 0.19.0 | ❌ | ❌ |
| decord | ✅ | ❌ |
Batch inference

The model can batch inputs composed of mixed samples of various types, such as text, images, audio, and videos, when `return_audio=False` is set. Here is an example:
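A sketch of batched generation under the same setup as the snippet above (conversation contents are illustrative; note that audio output is disabled):

```python
# Two independent conversations, one text-only and one with an image.
conversations = [
    [{"role": "user", "content": [{"type": "text", "text": "Who are you?"}]}],
    [{"role": "user", "content": [
        {"type": "image", "image": "https://example.com/demo.jpeg"},
        {"type": "text", "text": "Describe this image."},
    ]}],
]

text = processor.apply_chat_template(conversations, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversations, use_audio_in_video=True)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True, use_audio_in_video=True).to(model.device)

# return_audio=False: batched inference returns text only.
text_ids = model.generate(**inputs, use_audio_in_video=True, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True))
```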
### Usage Tips #### Prompt for audio output If users need audio output, the system prompt must be set as \"You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.\", otherwise the audio output may not work as expected. #### Use audio in video In the process of multimodal interaction, the videos provided by users are often accompanied by audio (such as questions about the content in the video, or sounds generated by certain events in the video). This information is conducive to the model providing a better interactive experience. So we provide the following options for users to decide whether to use audio in video. It is worth noting that during a multi-round conversation, the `use_audio_in_video` parameter in these places must be set to the same value, otherwise unexpected results will occur. #### Use audio output or not The model supports both text and audio outputs. If users do not need audio outputs, they can call `model.disable_talker()` after initializing the model. This option will save about 2 GB of GPU memory, but the `return_audio` option of the `generate` function will then only allow `False`. In order to obtain a flexible experience, we recommend that users decide whether to return audio when the `generate` function is called: if `return_audio` is set to `False`, the model will only return text outputs, to get text responses faster. #### Change voice type of output audio Qwen2.5-Omni supports the ability to change the voice of the output audio. The checkpoint supports two voice types, as follows:

| Voice Type | Gender | Description |
|------------|--------|-------------|
| Chelsie | Female | A honeyed, velvety voice that carries a gentle warmth and luminous clarity. |
| Ethan | Male | A bright, upbeat voice with infectious energy and a warm, approachable vibe. |

Users can use the `speaker` parameter of the `generate` function to specify the voice type. By default, if `speaker` is not specified, the voice type is `Chelsie`. #### Flash-Attention 2 to speed up generation First, make sure to install the latest version of Flash Attention 2: `pip install -U flash-attn --no-build-isolation`. Also, you should have hardware that is compatible with FlashAttention 2. Read more about it in the official documentation of the flash attention repository. FlashAttention-2 can only be used when a model is loaded in `torch.float16` or `torch.bfloat16`. To load and run a model using FlashAttention-2, add `attn_implementation=\"flash_attention_2\"` when loading the model. ## Citation If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil: :)
", + "model_explanation_gemini": "Qwen2.5-Omni-7B is a multimodal model that processes text, images, audio, and video inputs to generate real-time text and speech responses with strong performance across diverse modalities." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-VL-32B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-VL-32B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..ac338a1e37f4d03ea63be391e447f90d3a5631e3 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-VL-32B-Instruct.json @@ -0,0 +1,21 @@ +{ + "model_id": "Qwen/Qwen2.5-VL-32B-Instruct", + "downloads": 392049, + "tags": [ + "transformers", + "safetensors", + "qwen2_5_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2502.13923", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers --- # Qwen2.5-VL-32B-Instruct \"Chat\" ## Latest Updates: In addition to the original formula, we have further enhanced Qwen2.5-VL-32B's mathematical and problem-solving abilities through reinforcement learning. This has also significantly improved the model's subjective user experience, with response styles adjusted to better align with human preferences. Particularly for objective queries such as mathematics, logical reasoning, and knowledge-based Q&A, the level of detail in responses and the clarity of formatting have been noticeably enhanced. ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL. #### Key Enhancements: * **Understand things visually**: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but it is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. * **Being agentic**: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, which is capable of computer use and phone use. * **Understanding long videos and capturing events**: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of cpaturing event by pinpointing the relevant video segments. * **Capable of visual localization in different formats**: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. * **Generating structured outputs**: for data like scans of invoices, forms, tables, etc. Qwen2.5-VL supports structured outputs of their contents, benefiting usages in finance, commerce, etc. #### Model Architecture Updates: * **Dynamic Resolution and Frame Rate Training for Video Understanding**: We extend dynamic resolution to the temporal dimension by adopting dynamic FPS sampling, enabling the model to comprehend videos at various sampling rates. Accordingly, we update mRoPE in the time dimension with IDs and absolute time alignment, enabling the model to learn temporal sequence and speed, and ultimately acquire the ability to pinpoint specific moments.
* **Streamlined and Efficient Vision Encoder** We enhance both training and inference speeds by strategically implementing window attention into the ViT. The ViT architecture is further optimized with SwiGLU and RMSNorm, aligning it with the structure of the Qwen2.5 LLM. We have four models with 3, 7, 32 and 72 billion parameters. This repo contains the instruction-tuned 32B Qwen2.5-VL model. For more information, visit our Blog and GitHub. ## Evaluation ### Vision

| Dataset | Qwen2.5-VL-72B | Qwen2-VL-72B | Qwen2.5-VL-32B |
|--------------------|--------|--------------|------------------|
| MMMU | **70.2** | 64.5 | 70 |
| MMMU Pro | **51.1** | 46.2 | 49.5 |
| MMStar | **70.8** | 68.3 | 69.5 |
| MathVista | **74.8** | 70.5 | 74.7 |
| MathVision | 38.1 | 25.9 | **40.0** |
| OCRBenchV2 | **61.5/63.7** | 47.8/46.1 | 57.2/59.1 |
| CC-OCR | **79.8** | 68.7 | 77.1 |
| DocVQA | **96.4** | **96.5** | 94.8 |
| InfoVQA | **87.3** | 84.5 | 83.4 |
| LVBench | 47.3 | - | **49.00** |
| CharadesSTA | 50.9 | - | **54.2** |
| VideoMME | **73.3/79.1** | 71.2/77.8 | 70.5/77.9 |
| MMBench-Video | **2.02** | 1.7 | 1.93 |
| AITZ | **83.2** | - | 83.1 |
| Android Control | **67.4/93.7** | 66.4/84.4 | 69.6/93.3 |
| ScreenSpot | **87.1** | - | 88.5 |
| ScreenSpot Pro | **43.6** | - | 39.4 |
| AndroidWorld | **35** | - | 22.0 |
| OSWorld | **8.83** | - | 5.92 |

### Text

| MODEL | MMLU | MMLU-PRO | MATH | GPQA-diamond | MBPP | Human Eval |
|-----------------|--------|----------|---------|--------------|--------|------------|
| Qwen2.5-VL-32B | 78.4 | 68.8 | 82.2 | 46.0 | 84.0 | 91.5 |
| Mistral-Small-3.1-24B | 80.6 | 66.8 | 69.3 | 46.0 | 74.7 | 88.4 |
| Gemma3-27B-IT | 76.9 | 67.5 | 89 | 42.4 | 74.4 | 87.8 |
| GPT-4o-Mini | 82.0 | 61.7 | 70.2 | 39.4 | 84.8 | 87.2 |
| Claude-3.5-Haiku | 77.6 | 65.0 | 69.2 | 41.6 | 85.6 | 88.1 |

## Requirements The code of Qwen2.5-VL has been in the latest Hugging Face `transformers` and we advise you to build from source, or you might encounter the following error: `KeyError: 'qwen2_5_vl'`. ## Quickstart Below, we provide simple examples to show how to use Qwen2.5-VL with 🤖 ModelScope and 🤗 Transformers. We offer a toolkit, `qwen-vl-utils`, to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it with `pip install qwen-vl-utils[decord]`. If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-vl-utils`, which will fall back to using torchvision for video processing. However, you can still install decord from source to get decord used when loading video. ### Using 🤗 Transformers to Chat Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:
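A minimal chat sketch, assuming the `Qwen2_5_VLForConditionalGeneration` class from the current `transformers` integration and `process_vision_info` from `qwen_vl_utils` (the image URL is illustrative):

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/demo.jpeg"},
    {"type": "text", "text": "Describe this image."},
]}]

# Render the chat template and collect the visual inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
```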
Multi image inference
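Multi-image input works the same way; a sketch reusing `model` and `processor` from the snippet above (image URLs are illustrative):

```python
# Several images in one user turn; the chat template interleaves them.
messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/image1.jpeg"},
    {"type": "image", "image": "https://example.com/image2.jpeg"},
    {"type": "text", "text": "Identify the similarities between these images."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```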
Video inference

Video URL compatibility largely depends on the third-party library version. The details are in the table below. You can switch the backend if you prefer not to use the default one.

| Backend | HTTP | HTTPS |
|-------------|------|-------|
| torchvision >= 0.19.0 | ✅ | ✅ |
| torchvision < 0.19.0 | ❌ | ❌ |
| decord | ✅ | ❌ |
Batch inference
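A sketch of batched inference, again reusing `model` and `processor` from above; the processor pads the text side so heterogeneous prompts can share one forward pass (contents are illustrative):

```python
# Batch two message lists: one image+text, one text-only.
messages1 = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/image1.jpeg"},
    {"type": "text", "text": "What is shown in this image?"},
]}]
messages2 = [{"role": "user", "content": [{"type": "text", "text": "Who are you?"}]}]
batch = [messages1, messages2]

texts = [processor.apply_chat_template(m, tokenize=False, add_generation_prompt=True) for m in batch]
image_inputs, video_inputs = process_vision_info(batch)
inputs = processor(text=texts, images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True))
```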
### 🤖 ModelScope We strongly advise users especially those in mainland China to use ModelScope. can help you solve issues concerning downloading checkpoints. ### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, We provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ### Processing Long Texts The current is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to to enable YaRN: { ..., \"type\": \"yarn\", \"mrope_section\": [ 16, 24, 24 ], \"factor\": 4, \"original_max_position_embeddings\": 32768 } However, it should be noted that this method has a significant impact on the performance of temporal and spatial localization tasks, and is therefore not recommended for use. At the same time, for long video inputs, since MRoPE itself is more economical with ids, the max_position_embeddings can be directly modified to a larger value, such as 64k. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen2.5-VL-32B-Instruct is a multimodal vision-language model designed for tasks like visual recognition, text and chart analysis, video comprehension, object localization, and structured data generation from images, with enhanced mathematical and reasoning capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-VL-3B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-VL-3B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..b58e5ead13900d8472d7f3e762f85734f01c683a --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-VL-3B-Instruct.json @@ -0,0 +1,21 @@ +{ + "model_id": "Qwen/Qwen2.5-VL-3B-Instruct", + "downloads": 1828422, + "tags": [ + "transformers", + "safetensors", + "qwen2_5_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2409.12191", + "arxiv:2308.12966", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license_name: qwen-research license_link: language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers --- # Qwen2.5-VL-3B-Instruct \"Chat\" ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL. 
#### Key Enhancements: * **Understand things visually**: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but it is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. * **Being agentic**: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, and is capable of computer and phone use. * **Understanding long videos and capturing events**: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of capturing events by pinpointing the relevant video segments. * **Capable of visual localization in different formats**: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. * **Generating structured outputs**: for data like scans of invoices, forms, tables, etc. Qwen2.5-VL supports structured outputs of their contents, benefiting usages in finance, commerce, etc. #### Model Architecture Updates: * **Dynamic Resolution and Frame Rate Training for Video Understanding**: We extend dynamic resolution to the temporal dimension by adopting dynamic FPS sampling, enabling the model to comprehend videos at various sampling rates. Accordingly, we update mRoPE in the time dimension with IDs and absolute time alignment, enabling the model to learn temporal sequence and speed, and ultimately acquire the ability to pinpoint specific moments.
* **Streamlined and Efficient Vision Encoder** We enhance both training and inference speeds by strategically implementing window attention into the ViT. The ViT architecture is further optimized with SwiGLU and RMSNorm, aligning it with the structure of the Qwen2.5 LLM. We have three models with 3, 7 and 72 billion parameters. This repo contains the instruction-tuned 3B Qwen2.5-VL model. For more information, visit our Blog and GitHub. ## Evaluation ### Image benchmark

| Benchmark | InternVL2.5-4B | Qwen2-VL-7B | Qwen2.5-VL-3B |
| :--- | :---: | :---: | :---: |
| MMMU (val) | 52.3 | 54.1 | 53.1 |
| MMMU-Pro (val) | **32.7** | 30.5 | 31.6 |
| AI2D (test) | 81.4 | **83.0** | 81.5 |
| DocVQA (test) | 91.6 | 94.5 | **93.9** |
| InfoVQA (test) | 72.1 | 76.5 | **77.1** |
| TextVQA (val) | 76.8 | **84.3** | 79.3 |
| MMBench-V1.1 (test) | 79.3 | **80.7** | 77.6 |
| MMStar | 58.3 | **60.7** | 55.9 |
| MathVista (testmini) | 60.5 | 58.2 | **62.3** |
| MathVision (full) | 20.9 | 16.3 | **21.2** |

### Video benchmark

| Benchmark | InternVL2.5-4B | Qwen2-VL-7B | Qwen2.5-VL-3B |
| :--- | :---: | :---: | :---: |
| MVBench | 71.6 | 67.0 | 67.0 |
| VideoMME | 63.6/62.3 | 69.0/63.3 | 67.6/61.5 |
| MLVU | 48.3 | - | 68.2 |
| LVBench | - | - | 43.3 |
| MMBench-Video | 1.73 | 1.44 | 1.63 |
| EgoSchema | - | - | 64.8 |
| PerceptionTest | - | - | 66.9 |
| TempCompass | - | - | 64.4 |
| LongVideoBench | 55.2 | 55.6 | 54.2 |
| CharadesSTA/mIoU | - | - | 38.8 |

### Agent benchmark

| Benchmarks | Qwen2.5-VL-3B |
|-------------------------|---------------|
| ScreenSpot | 55.5 |
| ScreenSpot Pro | 23.9 |
| AITZ_EM | 76.9 |
| Android Control High_EM | 63.7 |
| Android Control Low_EM | 22.2 |
| AndroidWorld_SR | 90.8 |
| MobileMiniWob++_SR | 67.9 |

## Requirements The code of Qwen2.5-VL has been in the latest Hugging Face `transformers` and we advise you to build from source, or you might encounter the following error: `KeyError: 'qwen2_5_vl'`. ## Quickstart Below, we provide simple examples to show how to use Qwen2.5-VL with 🤖 ModelScope and 🤗 Transformers. We offer a toolkit, `qwen-vl-utils`, to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it with `pip install qwen-vl-utils[decord]`. If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-vl-utils`, which will fall back to using torchvision for video processing. However, you can still install decord from source to get decord used when loading video. ### Using 🤗 Transformers to Chat Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:
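A minimal chat sketch for the 3B model; it also shows bounding the visual token budget with `min_pixels`/`max_pixels` (28x28-pixel patches), as described under "Image Resolution for performance boost" below. Class and helper names assume the current `transformers` and `qwen_vl_utils` integrations; the image URL is illustrative:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# Keep each image between 256 and 1280 visual tokens to balance speed and memory.
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/demo.jpeg"},
    {"type": "text", "text": "Read the text in this image."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```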
Multi image inference
Video inference

Video URL compatibility largely depends on the third-party library version. The details are in the table below. You can switch the backend if you prefer not to use the default one.

| Backend | HTTP | HTTPS |
|-------------|------|-------|
| torchvision >= 0.19.0 | ✅ | ✅ |
| torchvision < 0.19.0 | ❌ | ❌ |
| decord | ✅ | ❌ |
Batch inference
### 🤖 ModelScope We strongly advise users especially those in mainland China to use ModelScope. can help you solve issues concerning downloading checkpoints. ### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, We provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ### Processing Long Texts The current is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to to enable YaRN: However, it should be noted that this method has a significant impact on the performance of temporal and spatial localization tasks, and is therefore not recommended for use. At the same time, for long video inputs, since MRoPE itself is more economical with ids, the max_position_embeddings can be directly modified to a larger value, such as 64k. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A multimodal vision-language model that processes images and videos to perform tasks like object recognition, visual analysis, structured data extraction, and video comprehension while supporting agent-like interactions and localization." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-VL-72B-Instruct-AWQ.json b/data/model_data_json/Qwen_Qwen2.5-VL-72B-Instruct-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..27d11125793bf7cf1cb8b70bb64f6a22153914f4 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-VL-72B-Instruct-AWQ.json @@ -0,0 +1,26 @@ +{ + "model_id": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ", + "downloads": 88145, + "tags": [ + "transformers", + "safetensors", + "qwen2_5_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2409.12191", + "arxiv:2308.12966", + "base_model:Qwen/Qwen2.5-VL-72B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-VL-72B-Instruct", + "license:other", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "awq", + "region:us" + ], + "description": "--- license: other license_name: qwen license_link: language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers base_model: - Qwen/Qwen2.5-VL-72B-Instruct --- # Qwen2.5-VL-72B-Instruct-AWQ \"Chat\" ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL. 
#### Key Enhancements: * **Understand things visually**: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but it is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. * **Being agentic**: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, and is capable of computer and phone use. * **Understanding long videos and capturing events**: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of capturing events by pinpointing the relevant video segments. * **Capable of visual localization in different formats**: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. * **Generating structured outputs**: for data like scans of invoices, forms, tables, etc. Qwen2.5-VL supports structured outputs of their contents, benefiting usages in finance, commerce, etc. #### Model Architecture Updates: * **Dynamic Resolution and Frame Rate Training for Video Understanding**: We extend dynamic resolution to the temporal dimension by adopting dynamic FPS sampling, enabling the model to comprehend videos at various sampling rates. Accordingly, we update mRoPE in the time dimension with IDs and absolute time alignment, enabling the model to learn temporal sequence and speed, and ultimately acquire the ability to pinpoint specific moments.
* **Streamlined and Efficient Vision Encoder** We enhance both training and inference speeds by strategically implementing window attention into the ViT. The ViT architecture is further optimized with SwiGLU and RMSNorm, aligning it with the structure of the Qwen2.5 LLM. We have three models with 3, 7 and 72 billion parameters. This repo contains the instruction-tuned 72B Qwen2.5-VL model. For more information, visit our Blog and GitHub. ## Evaluation ## Requirements The code of Qwen2.5-VL has been in the latest Hugging face transformers and we advise you to build from source with command: or you might encounter the following error: ## Quickstart Below, we provide simple examples to show how to use Qwen2.5-VL with 🤖 ModelScope and 🤗 Transformers. The code of Qwen2.5-VL has been in the latest Hugging face transformers and we advise you to build from source with command: or you might encounter the following error: We offer a toolkit to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it using the following command: If you are not using Linux, you might not be able to install from PyPI. In that case, you can use which will fall back to using torchvision for video processing. However, you can still install decord from source to get decord used when loading video. ### Using 🤗 Transformers to Chat Here we show a code snippet to show you how to use the chat model with and : ### 🤖 ModelScope We strongly advise users especially those in mainland China to use ModelScope. can help you solve issues concerning downloading checkpoints. ### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, We provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ### Processing Long Texts The current is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to to enable YaRN: However, it should be noted that this method has a significant impact on the performance of temporal and spatial localization tasks, and is therefore not recommended for use. At the same time, for long video inputs, since MRoPE itself is more economical with ids, the max_position_embeddings can be directly modified to a larger value, such as 64k. ### Benchmark #### Performance of Quantized Models This section reports the generation performance of quantized models (including GPTQ and AWQ) of the Qwen2.5-VL series. 
Specifically, we report: - MMMU_VAL (Accuracy) - DocVQA_VAL (Accuracy) - MMBench_DEV_EN (Accuracy) - MathVista_MINI (Accuracy) We use VLMEvalKit to evaluate all models.

| Model Size | Quantization | MMMU_VAL | DocVQA_VAL | MMBench_DEV_EN | MathVista_MINI |
| --- | --- | --- | --- | --- | --- |
| Qwen2.5-VL-72B-Instruct | BF16 | 70.0 | 96.1 | 88.2 | 75.3 |
| | AWQ | 69.1 | 96.0 | 87.9 | 73.8 |
| Qwen2.5-VL-7B-Instruct | BF16 | 58.4 | 94.9 | 84.1 | 67.9 |
| | AWQ | 55.6 | 94.6 | 84.2 | 64.7 |
| Qwen2.5-VL-3B-Instruct | BF16 | 51.7 | 93.0 | 79.8 | 61.4 |
| | AWQ | 49.1 | 91.8 | 78.0 | 58.8 |

## Citation If you find our work helpful, feel free to give us a citation.", + "model_explanation_gemini": "Qwen2.5-VL-72B-Instruct-AWQ is a multimodal vision-language model designed for tasks like visual recognition, text and chart analysis, video comprehension, object localization, and structured data extraction from images and videos." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-VL-72B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-VL-72B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..55e7601f7e95df7dd4e74fb7d7d436c4dc66c372 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-VL-72B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "Qwen/Qwen2.5-VL-72B-Instruct", + "downloads": 163112, + "tags": [ + "transformers", + "safetensors", + "qwen2_5_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2409.12191", + "arxiv:2308.12966", + "base_model:Qwen/Qwen2.5-VL-72B-Instruct", + "base_model:finetune:Qwen/Qwen2.5-VL-72B-Instruct", + "license:other", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: qwen license_link: language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers base_model: - Qwen/Qwen2.5-VL-72B-Instruct --- # Qwen2.5-VL-72B-Instruct \"Chat\" ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL. #### Key Enhancements: * **Understand things visually**: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but it is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. * **Being agentic**: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, and is capable of computer and phone use. * **Understanding long videos and capturing events**: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability of capturing events by pinpointing the relevant video segments. * **Capable of visual localization in different formats**: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. * **Generating structured outputs**: for data like scans of invoices, forms, tables, etc. Qwen2.5-VL supports structured outputs of their contents, benefiting usages in finance, commerce, etc. #### Model Architecture Updates: * **Dynamic Resolution and Frame Rate Training for Video Understanding**: We extend dynamic resolution to the temporal dimension by adopting dynamic FPS sampling, enabling the model to comprehend videos at various sampling rates. Accordingly, we update mRoPE in the time dimension with IDs and absolute time alignment, enabling the model to learn temporal sequence and speed, and ultimately acquire the ability to pinpoint specific moments.
* **Streamlined and Efficient Vision Encoder** We enhance both training and inference speeds by strategically implementing window attention into the ViT. The ViT architecture is further optimized with SwiGLU and RMSNorm, aligning it with the structure of the Qwen2.5 LLM. We have three models with 3, 7 and 72 billion parameters. This repo contains the instruction-tuned 72B Qwen2.5-VL model. For more information, visit our Blog and GitHub. ## Evaluation ### Image benchmark | Benchmarks | GPT4o | Claude3.5 Sonnet | Gemini-2-flash | InternVL2.5-78B | Qwen2-VL-72B | Qwen2.5-VL-72B | |-----------------------|-----------|-------------------|-----------------|-----------------|--------------|----------------| | MMMUval | 70.3 | 70.4 | 70.7 | 70.1 | 64.5 | 70.2 | | MMMU_Pro | 54.5 | 54.7 | 57.0 | 48.6 | 46.2 | 51.1 | | MathVista_MINI | 63.8 | 65.4 | 73.1 | 76.6 | 70.5 | 74.8 | | MathVision_FULL | 30.4 | 38.3 | 41.3 | 32.2 | 25.9 | 38.1 | | Hallusion Bench | 55.0 | 55.16 | | 57.4 | 58.1 | 55.16 | | MMBench_DEV_EN_V11 | 82.1 | 83.4 | 83.0 | 88.5 | 86.6 | 88 | | AI2D_TEST | 84.6 | 81.2 | | 89.1 | 88.1 | 88.4 | | ChartQA_TEST | 86.7 | 90.8 | 85.2 | 88.3 | 88.3 | 89.5 | | DocVQA_VAL | 91.1 | 95.2 | 92.1 | 96.5 | 96.1 | 96.4 | | MMStar | 64.7 | 65.1 | 69.4 | 69.5 | 68.3 | 70.8 | | MMVet_turbo | 69.1 | 70.1 | | 72.3 | 74.0 | 76.19 | | OCRBench | 736 | 788 | | 854 | 877 | 885 | | OCRBench-V2(en/zh) | 46.5/32.3 | 45.2/39.6 | 51.9/43.1 | 45/46.2 | 47.8/46.1 | 61.5/63.7 | | CC-OCR | 66.6 | 62.7 | 73.0 | 64.7 | 68.7 |79.8 | ### Video benchmark | Benchmarks | GPT4o | Gemini-1.5-Pro | InternVL2.5-78B | Qwen2VL-72B | Qwen2.5VL-72B | |---------------------|-------|----------------|-----------------|-------------|---------------| | VideoMME w/o sub. | 71.9 | 75.0 | 72.1 | 71.2 | 73.3 | | VideoMME w sub. | 77.2 | 81.3 | 74.0 | 77.8 | 79.1 | | MVBench | 64.6 | 60.5 | 76.4 | 73.6 | 70.4 | | MMBench-Video | 1.63 | 1.30 | 1.97 | 1.70 | 2.02 | | LVBench | 30.8 | 33.1 | - | 41.3 | 47.3 | | EgoSchema | 72.2 | 71.2 | - | 77.9 | 76.2 | | PerceptionTest_test | - | - | - | 68.0 | 73.2 | | MLVU_M-Avg_dev | 64.6 | - | 75.7 | | 74.6 | | TempCompass_overall | 73.8 | - | - | | 74.8 | ### Agent benchmark | Benchmarks | GPT4o | Gemini 2.0 | Claude | Aguvis-72B | Qwen2VL-72B | Qwen2.5VL-72B | |-------------------------|-------------|------------|--------|------------|-------------|---------------| | ScreenSpot | 18.1 | 84.0 | 83.0 | | | 87.1 | | ScreenSpot Pro | | | 17.1 | | 1.6 | 43.6 | | AITZ_EM | 35.3 | | | | 72.8 | 83.2 | | Android Control High_EM | | | | 66.4 | 59.1 | 67.36 | | Android Control Low_EM | | | | 84.4 | 59.2 | 93.7 | | AndroidWorld_SR | 34.5% (SoM) | | 27.9% | 26.1% | | 35% | | MobileMiniWob++_SR | | | | 66% | | 68% | | OSWorld | | | 14.90 | 10.26 | | 8.83 | ## Requirements The code of Qwen2.5-VL has been in the latest Hugging face transformers and we advise you to build from source with command: or you might encounter the following error: ## Quickstart Below, we provide simple examples to show how to use Qwen2.5-VL with 🤖 ModelScope and 🤗 Transformers. The code of Qwen2.5-VL has been in the latest Hugging face transformers and we advise you to build from source with command: or you might encounter the following error: We offer a toolkit to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it using the following command: If you are not using Linux, you might not be able to install from PyPI. 
In that case, you can use `pip install qwen-vl-utils`, which will fall back to using torchvision for video processing. However, you can still install decord from source to get decord used when loading video. ### Using 🤗 Transformers to Chat Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:
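A sketch of the chat flow, here with a video input (the local path is illustrative; the "Video inference" note below discusses URL backends). `process_vision_info` with `return_video_kwargs=True` assumes a recent `qwen_vl_utils`:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-72B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-72B-Instruct")

# A local video file; fps records the sampling rate used for frame extraction.
messages = [{"role": "user", "content": [
    {"type": "video", "video": "file:///path/to/video.mp4", "fps": 1.0},
    {"type": "text", "text": "Describe this video."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt", **video_kwargs).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```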
Multi image inference
Video inference

Video URL compatibility largely depends on the third-party library version. The details are in the table below. You can switch the backend if you prefer not to use the default one.

| Backend | HTTP | HTTPS |
|-------------|------|-------|
| torchvision >= 0.19.0 | ✅ | ✅ |
| torchvision < 0.19.0 | ❌ | ❌ |
| decord | ✅ | ❌ |
Batch inference
### 🤖 ModelScope We strongly advise users especially those in mainland China to use ModelScope. can help you solve issues concerning downloading checkpoints. ### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, We provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ### Processing Long Texts The current is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to to enable YaRN: However, it should be noted that this method has a significant impact on the performance of temporal and spatial localization tasks, and is therefore not recommended for use. At the same time, for long video inputs, since MRoPE itself is more economical with ids, the max_position_embeddings can be directly modified to a larger value, such as 64k. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A multimodal AI model that processes images and text to perform tasks like visual recognition, chart analysis, video comprehension, object localization, and structured data extraction from documents." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-VL-7B-Instruct-AWQ.json b/data/model_data_json/Qwen_Qwen2.5-VL-7B-Instruct-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..fe12255b674d2957195b5f29493f50feef2cba41 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-VL-7B-Instruct-AWQ.json @@ -0,0 +1,25 @@ +{ + "model_id": "Qwen/Qwen2.5-VL-7B-Instruct-AWQ", + "downloads": 78495, + "tags": [ + "transformers", + "safetensors", + "qwen2_5_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2409.12191", + "arxiv:2308.12966", + "base_model:Qwen/Qwen2.5-VL-7B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-VL-7B-Instruct", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "awq", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers base_model: - Qwen/Qwen2.5-VL-7B-Instruct --- # Qwen2.5-VL-7B-Instruct-AWQ \"Chat\" ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL. 
#### Key Enhancements: * **Understand things visually**: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. * **Being agentic**: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, making it capable of computer and phone use. * **Understanding long videos and capturing events**: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability to capture events by pinpointing the relevant video segments. * **Capable of visual localization in different formats**: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. * **Generating structured outputs**: for data like scans of invoices, forms, and tables, Qwen2.5-VL supports structured outputs of their contents, benefiting use cases in finance, commerce, and beyond. #### Model Architecture Updates: * **Dynamic Resolution and Frame Rate Training for Video Understanding**: We extend dynamic resolution to the temporal dimension by adopting dynamic FPS sampling, enabling the model to comprehend videos at various sampling rates. Accordingly, we update mRoPE in the time dimension with IDs and absolute time alignment, enabling the model to learn temporal sequence and speed, and ultimately acquire the ability to pinpoint specific moments.

* **Streamlined and Efficient Vision Encoder** We enhance both training and inference speeds by strategically implementing window attention into the ViT. The ViT architecture is further optimized with SwiGLU and RMSNorm, aligning it with the structure of the Qwen2.5 LLM. We have three models with 3, 7 and 72 billion parameters. This repo contains the instruction-tuned 7B Qwen2.5-VL model with AWQ. For more information, visit our Blog and GitHub. ## Evaluation ## Requirements The code of Qwen2.5-VL is included in the latest Hugging Face Transformers, and we advise you to build from source with the following command: or you might encounter the following error: ## Quickstart Below, we provide simple examples to show how to use Qwen2.5-VL with 🤖 ModelScope and 🤗 Transformers. The code of Qwen2.5-VL is included in the latest Hugging Face Transformers, and we advise you to build from source with the following command: or you might encounter the following error: We offer a toolkit to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it using the following command: If you are not using Linux, you might not be able to install from PyPI. In that case, you can use which will fall back to using torchvision for video processing. However, you can still install decord from source to have decord used when loading videos. ### Using 🤗 Transformers to Chat Here we show a code snippet demonstrating how to use the chat model with and : ### 🤖 ModelScope We strongly advise users, especially those in mainland China, to use ModelScope, which can help you solve issues concerning downloading checkpoints. ### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, we provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ### Processing Long Texts The current is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to config.json to enable YaRN: { ..., \"type\": \"yarn\", \"mrope_section\": [ 16, 24, 24 ], \"factor\": 4, \"original_max_position_embeddings\": 32768 } However, it should be noted that this method has a significant impact on the performance of temporal and spatial localization tasks, and is therefore not recommended for use. At the same time, for long video inputs, since MRoPE itself is more economical with ids, the max_position_embeddings can be directly modified to a larger value, such as 64k. ### Benchmark #### Performance of Quantized Models This section reports the generation performance of quantized models (including GPTQ and AWQ) of the Qwen2.5-VL series.
Specifically, we report: - MMMU_VAL (Accuracy) - DocVQA_VAL (Accuracy) - MMBench_DEV_EN (Accuracy) - MathVista_MINI (Accuracy) We use VLMEvalKit to evaluate all models.
| Model Size | Quantization | MMMU_VAL | DocVQA_VAL | MMBench_DEV_EN | MathVista_MINI |
| --- | --- | --- | --- | --- | --- |
| Qwen2.5-VL-72B-Instruct | BF16 (🤗🤖) | 70.0 | 96.1 | 88.2 | 75.3 |
| | AWQ (🤗🤖) | 69.1 | 96.0 | 87.9 | 73.8 |
| Qwen2.5-VL-7B-Instruct | BF16 (🤗🤖) | 58.4 | 94.9 | 84.1 | 67.9 |
| | AWQ (🤗🤖) | 55.6 | 94.6 | 84.2 | 64.7 |
| Qwen2.5-VL-3B-Instruct | BF16 (🤗🤖) | 51.7 | 93.0 | 79.8 | 61.4 |
| | AWQ (🤗🤖) | 49.1 | 91.8 | 78.0 | 58.8 |
## Citation If you find our work helpful, feel free to give us a cite." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen2.5-VL-7B-Instruct.json b/data/model_data_json/Qwen_Qwen2.5-VL-7B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..44350063d9ffe9eecb4e969a48c956732f41d9b2 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen2.5-VL-7B-Instruct.json @@ -0,0 +1,22 @@ +{ + "model_id": "Qwen/Qwen2.5-VL-7B-Instruct", + "downloads": 2671690, + "tags": [ + "transformers", + "safetensors", + "qwen2_5_vl", + "image-text-to-text", + "multimodal", + "conversational", + "en", + "arxiv:2309.00071", + "arxiv:2409.12191", + "arxiv:2308.12966", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers --- # Qwen2.5-VL-7B-Instruct \"Chat\" ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL. #### Key Enhancements: * **Understand things visually**: Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. * **Being agentic**: Qwen2.5-VL directly plays as a visual agent that can reason and dynamically direct tools, making it capable of computer and phone use. * **Understanding long videos and capturing events**: Qwen2.5-VL can comprehend videos of over 1 hour, and this time it has a new ability to capture events by pinpointing the relevant video segments. * **Capable of visual localization in different formats**: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes. * **Generating structured outputs**: for data like scans of invoices, forms, and tables, Qwen2.5-VL supports structured outputs of their contents, benefiting use cases in finance, commerce, and beyond. #### Model Architecture Updates: * **Dynamic Resolution and Frame Rate Training for Video Understanding**: We extend dynamic resolution to the temporal dimension by adopting dynamic FPS sampling, enabling the model to comprehend videos at various sampling rates. Accordingly, we update mRoPE in the time dimension with IDs and absolute time alignment, enabling the model to learn temporal sequence and speed, and ultimately acquire the ability to pinpoint specific moments.

* **Streamlined and Efficient Vision Encoder** We enhance both training and inference speeds by strategically implementing window attention into the ViT. The ViT architecture is further optimized with SwiGLU and RMSNorm, aligning it with the structure of the Qwen2.5 LLM. We have three models with 3, 7 and 72 billion parameters. This repo contains the instruction-tuned 7B Qwen2.5-VL model. For more information, visit our Blog and GitHub. ## Evaluation ### Image benchmark | Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B |**Qwen2.5-VL-7B** | | :--- | :---: | :---: | :---: | :---: | :---: | | MMMUval | 56 | 50.4 | **60**| 54.1 | 58.6| | MMMU-Proval | 34.3 | - | 37.6| 30.5 | 41.0| | DocVQAtest | 93 | 93 | - | 94.5 | **95.7** | | InfoVQAtest | 77.6 | - | - |76.5 | **82.6** | | ChartQAtest | 84.8 | - |- | 83.0 |**87.3** | | TextVQAval | 79.1 | 80.1 | -| 84.3 | **84.9**| | OCRBench | 822 | 852 | 785 | 845 | **864** | | CC_OCR | 57.7 | | | 61.6 | **77.8**| | MMStar | 62.8| | |60.7| **63.9**| | MMBench-V1.1-Entest | 79.4 | 78.0 | 76.0| 80.7 | **82.6** | | MMT-Benchtest | - | - | - |**63.7** |63.6 | | MMStar | **61.5** | 57.5 | 54.8 | 60.7 |63.9 | | MMVetGPT-4-Turbo | 54.2 | 60.0 | 66.9 | 62.0 | **67.1**| | HallBenchavg | 45.2 | 48.1 | 46.1| 50.6 | **52.9**| | MathVistatestmini | 58.3 | 60.6 | 52.4 | 58.2 | **68.2**| | MathVision | - | - | - | 16.3 | **25.07** | ### Video Benchmarks | Benchmark | Qwen2-VL-7B | **Qwen2.5-VL-7B** | | :--- | :---: | :---: | | MVBench | 67.0 | **69.6** | | PerceptionTesttest | 66.9 | **70.5** | | Video-MMEwo/w subs | 63.3/69.0 | **65.1**/**71.6** | | LVBench | | 45.3 | | LongVideoBench | | 54.7 | | MMBench-Video | 1.44 | 1.79 | | TempCompass | | 71.7 | | MLVU | | 70.2 | | CharadesSTA/mIoU | 43.6| ### Agent benchmark | Benchmarks | Qwen2.5-VL-7B | |-------------------------|---------------| | ScreenSpot | 84.7 | | ScreenSpot Pro | 29.0 | | AITZ_EM | 81.9 | | Android Control High_EM | 60.1 | | Android Control Low_EM | 93.7 | | AndroidWorld_SR | 25.5 | | MobileMiniWob++_SR | 91.4 | ## Requirements The code of Qwen2.5-VL has been in the latest Hugging face transformers and we advise you to build from source with command: or you might encounter the following error: ## Quickstart Below, we provide simple examples to show how to use Qwen2.5-VL with 🤖 ModelScope and 🤗 Transformers. The code of Qwen2.5-VL has been in the latest Hugging face transformers and we advise you to build from source with command: or you might encounter the following error: We offer a toolkit to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it using the following command: If you are not using Linux, you might not be able to install from PyPI. In that case, you can use which will fall back to using torchvision for video processing. However, you can still install decord from source to get decord used when loading video. ### Using 🤗 Transformers to Chat Here we show a code snippet to show you how to use the chat model with and :
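A hedged sketch of that flow, assuming the Qwen2_5_VLForConditionalGeneration class from Transformers and the qwen-vl-utils helper; the image URL is a placeholder.

```python
# Hedged sketch of the single-image chat flow with transformers and qwen_vl_utils.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/demo.jpeg"},  # placeholder URL
    {"type": "text", "text": "Describe this image."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens from each sequence before decoding.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```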

Multi-image inference
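A hedged multi-image sketch, reusing the model and processor from the chat sketch above; the file paths are placeholders.

```python
# Hedged sketch of multi-image inference: several images in one user turn.
from qwen_vl_utils import process_vision_info

messages = [{"role": "user", "content": [
    {"type": "image", "image": "file:///path/to/image1.jpg"},
    {"type": "image", "image": "file:///path/to/image2.jpg"},
    {"type": "text", "text": "Identify the similarities between these images."},
]}]

# Templating, vision preprocessing, and generation follow the single-image flow above.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
```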
Video inference Video URL compatibility largely depends on the third-party library version; the details are in the table below. You can change the backend if you prefer not to use the default one (a sketch of a video request follows the table). | Backend | HTTP | HTTPS | |-------------|------|-------| | torchvision >= 0.19.0 | ✅ | ✅ | | torchvision < 0.19.0 | ❌ | ❌ | | decord | ✅ | ❌ |
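A hedged sketch of a video request; the fps/max_pixels keys and the return_video_kwargs flag follow qwen-vl-utils conventions and should be verified against your installed version.

```python
# Hedged sketch of video inference: a local file with an explicit FPS hint.
from qwen_vl_utils import process_vision_info

messages = [{"role": "user", "content": [
    {"type": "video", "video": "file:///path/to/video.mp4",
     "max_pixels": 360 * 420, "fps": 1.0},
    {"type": "text", "text": "Describe this video."},
]}]

# For videos, process_vision_info can also return per-video kwargs (e.g. fps)
# to forward to the processor; this mirrors Qwen2.5-VL usage and is an
# assumption to verify against your qwen-vl-utils version.
image_inputs, video_inputs, video_kwargs = process_vision_info(
    messages, return_video_kwargs=True
)
```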
Batch inference
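A hedged sketch of batched inference, reusing the model and processor from the chat sketch above: two conversations are templated, padded together, and decoded in one pass.

```python
# Hedged sketch of batched inference with a mixed image/text batch.
from qwen_vl_utils import process_vision_info

messages1 = [{"role": "user", "content": [
    {"type": "image", "image": "file:///path/to/image1.jpg"},  # placeholder path
    {"type": "text", "text": "What is shown in this image?"},
]}]
messages2 = [{"role": "user", "content": [{"type": "text", "text": "Who are you?"}]}]
batch = [messages1, messages2]

texts = [processor.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
         for m in batch]
image_inputs, video_inputs = process_vision_info(batch)
inputs = processor(text=texts, images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
answers = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)],
    skip_special_tokens=True,
)
print(answers)
```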
### 🤖 ModelScope We strongly advise users, especially those in mainland China, to use ModelScope, which can help you solve issues concerning downloading checkpoints. ### More Usage Tips For input images, we support local files, base64, and URLs. For videos, we currently only support local files. #### Image Resolution for performance boost The model supports a wide range of resolution inputs. By default, it uses the native resolution for input, but higher resolutions can enhance performance at the cost of more computation. Users can set the minimum and maximum number of pixels to achieve an optimal configuration for their needs, such as a token count range of 256-1280, to balance speed and memory usage. Besides, we provide two methods for fine-grained control over the image size input to the model: 1. Define min_pixels and max_pixels: Images will be resized to maintain their aspect ratio within the range of min_pixels and max_pixels. 2. Specify exact dimensions: Directly set and . These values will be rounded to the nearest multiple of 28. ### Processing Long Texts The current is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to config.json to enable YaRN: { ..., \"type\": \"yarn\", \"mrope_section\": [ 16, 24, 24 ], \"factor\": 4, \"original_max_position_embeddings\": 32768 } However, it should be noted that this method has a significant impact on the performance of temporal and spatial localization tasks, and is therefore not recommended for use. At the same time, for long video inputs, since MRoPE itself is more economical with ids, the max_position_embeddings can be directly modified to a larger value, such as 64k. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen2.5-VL-7B-Instruct is a multimodal vision-language model that processes images and videos to perform tasks like object recognition, text/chart analysis, visual localization, structured data extraction, and long-video comprehension, while supporting agentic tool use and JSON outputs." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen3-32B.json b/data/model_data_json/Qwen_Qwen3-32B.json new file mode 100644 index 0000000000000000000000000000000000000000..923690196949e35d842756ff01472575fa911c14 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen3-32B.json @@ -0,0 +1,17 @@ +{ + "model_id": "Qwen/Qwen3-32B", + "downloads": 75675, + "tags": [ + "transformers", + "safetensors", + "qwen3", + "text-generation", + "conversational", + "arxiv:2309.00071", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 license_link: pipeline_tag: text-generation --- # Qwen3-32B \"Chat\" ## Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: - **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios. - **Significant enhancement of its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. - **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. - **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks. - **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**. ## Model Overview **Qwen3-32B** has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Number of Parameters: 32.8B - Number of Parameters (Non-Embedding): 31.2B - Number of Layers: 64 - Number of Attention Heads (GQA): 64 for Q and 8 for KV - Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. ## Quickstart The code of Qwen3 is in the latest Hugging Face Transformers, and we advise you to use the latest version of . With , you will encounter the following error: The following contains a code snippet illustrating how to use the model to generate content based on given inputs (see the sketch below). For deployment, you can use or or to create an OpenAI-compatible API endpoint: - SGLang: - vLLM: For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3. ## Switching Between Thinking and Non-Thinking Mode > [!TIP] > The switch is also available in APIs created by SGLang and vLLM. > Please refer to our documentation for SGLang and vLLM users. ### By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting or leaving it as the default value in , the model will engage its thinking mode. In this mode, the model will generate think content wrapped in a block, followed by the final response. > [!NOTE] > For thinking mode, use , , , and (the default setting in ). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the Best Practices section. ### We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency. In this mode, the model will not generate any think content and will not include a block. > [!NOTE] > For non-thinking mode, we suggest using , , , and .
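A hedged sketch of the Quickstart flow described above; the </think> token id (151668) is taken from the Qwen3 cards and should be verified against your tokenizer.

```python
# Hedged sketch of the Qwen3 quickstart: load the model, apply the chat
# template with thinking enabled, and split the reasoning from the answer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # True engages thinking mode; False disables it
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()

# Split thinking content from the final response at the </think> token.
try:
    index = len(output_ids) - output_ids[::-1].index(151668)  # </think> id, per the Qwen3 cards
except ValueError:
    index = 0
thinking = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip()
answer = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip()
print(answer)
```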
For more detailed guidance, please refer to the Best Practices section. ### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input We provide a soft switch mechanism that allows users to dynamically control the model's behavior when . Specifically, you can add and to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations. Here is an example of a multi-turn conversation: > [!NOTE] > For API compatibility, when , regardless of whether the user uses or , the model will always output a block wrapped in . However, the content inside this block may be empty if thinking is disabled. > When , the soft switches are not valid. Regardless of any or tags input by the user, the model will not generate think content and will not include a block. ## Agentic Use Qwen3 excels in tool calling capabilities. We recommend using Qwen-Agent to make the best use of agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity. To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself. ## Processing Long Texts Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method. YaRN is currently supported by several inference frameworks, e.g., and for local use, and for deployment. In general, there are two approaches to enabling YaRN for supported frameworks: - Modifying the model files: In the file, add the fields: For , you need to regenerate the GGUF file after the modification. - Passing command line arguments: For , you can use For , you can use For from , you can use > [!IMPORTANT] > If you encounter the following warning > > please upgrade . > [!NOTE] > All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** > We advise adding the configuration only when processing long contexts is required. > It is also recommended to modify the as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set as 2.0. > [!NOTE] > The default in is set to 40,960. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance. > [!TIP] > The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default and no extra configuration is needed. ## Best Practices To achieve optimal performance, we recommend the following settings: 1. **Sampling Parameters**: - For thinking mode (), use , , , and . **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. - For non-thinking mode (), we suggest using , , , and . - For supported frameworks, you can adjust the parameter between 0 and 2 to reduce endless repetitions. 
However, using a higher value may occasionally result in language mixing and a slight decrease in model performance. 2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance. 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking. - **Math Problems**: Include \"Please reason step by step, and put your final answer within \\boxed{}.\" in the prompt. - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: \"Please show your choice in the field with only the choice letter, e.g., .\" 4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. It is implemented in the provided chat template in Jinja2. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed. ### Citation If you find our work helpful, feel free to give us a cite." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen3-4B.json b/data/model_data_json/Qwen_Qwen3-4B.json new file mode 100644 index 0000000000000000000000000000000000000000..0083fc48cd71c394485c64fa92cb8db379a90ba3 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen3-4B.json @@ -0,0 +1,19 @@ +{ + "model_id": "Qwen/Qwen3-4B", + "downloads": 81038, + "tags": [ + "transformers", + "safetensors", + "qwen3", + "text-generation", + "conversational", + "arxiv:2309.00071", + "base_model:Qwen/Qwen3-4B-Base", + "base_model:finetune:Qwen/Qwen3-4B-Base", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 license_link: pipeline_tag: text-generation base_model: - Qwen/Qwen3-4B-Base --- # Qwen3-4B \"Chat\" ## Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: - **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios. - **Significant enhancement of its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. - **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. - **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**. ## Model Overview **Qwen3-4B** has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Number of Parameters: 4.0B - Number of Parameters (Non-Embedding): 3.6B - Number of Layers: 36 - Number of Attention Heads (GQA): 32 for Q and 8 for KV - Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. > [!TIP] > If you encounter significant endless repetitions, please refer to the Best Practices section for optimal sampling parameters, and set the `presence_penalty` parameter accordingly. (The Quickstart, thinking/non-thinking mode switching, long-text YaRN guidance, and Best Practices items 1-3 for this card match the Qwen3-32B and Qwen3-8B cards in this diff.) 4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. It is implemented in the provided chat template in Jinja2. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed. ### Citation If you find our work helpful, feel free to give us a cite." +} \ No newline at end of file diff --git a/data/model_data_json/Qwen_Qwen3-8B.json b/data/model_data_json/Qwen_Qwen3-8B.json new file mode 100644 index 0000000000000000000000000000000000000000..a5123850c68b1e8945214ae6f68fa69d20d2ab67 --- /dev/null +++ b/data/model_data_json/Qwen_Qwen3-8B.json @@ -0,0 +1,19 @@ +{ + "model_id": "Qwen/Qwen3-8B", + "downloads": 78129, + "tags": [ + "transformers", + "safetensors", + "qwen3", + "text-generation", + "conversational", + "arxiv:2309.00071", + "base_model:Qwen/Qwen3-8B-Base", + "base_model:finetune:Qwen/Qwen3-8B-Base", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 license_link: pipeline_tag: text-generation base_model: - Qwen/Qwen3-8B-Base --- # Qwen3-8B \"Chat\" ## Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: - **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- **Significant enhancement of its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. - **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. - **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks. - **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**. ## Model Overview **Qwen3-8B** has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Number of Parameters: 8.2B - Number of Parameters (Non-Embedding): 6.95B - Number of Layers: 36 - Number of Attention Heads (GQA): 32 for Q and 8 for KV - Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. ## Quickstart The code of Qwen3 is in the latest Hugging Face Transformers, and we advise you to use the latest version of . With , you will encounter the following error: The following contains a code snippet illustrating how to use the model to generate content based on given inputs. For deployment, you can use or or to create an OpenAI-compatible API endpoint: - SGLang: - vLLM: For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3. ## Switching Between Thinking and Non-Thinking Mode > [!TIP] > The switch is also available in APIs created by SGLang and vLLM. > Please refer to our documentation for SGLang and vLLM users. ### By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting or leaving it as the default value in , the model will engage its thinking mode. In this mode, the model will generate think content wrapped in a block, followed by the final response. > [!NOTE] > For thinking mode, use , , , and (the default setting in ). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the Best Practices section. ### We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency. In this mode, the model will not generate any think content and will not include a block. > [!NOTE] > For non-thinking mode, we suggest using , , , and . For more detailed guidance, please refer to the Best Practices section. ### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input We provide a soft switch mechanism that allows users to dynamically control the model's behavior when . Specifically, you can add and to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
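A hedged sketch of the soft switch described above: append /think or /no_think to a user turn to toggle reasoning when thinking is enabled by default.

```python
# Hedged sketch of the soft switch: the most recent /think or /no_think
# directive in the conversation wins, per the Qwen3 cards.
messages = [
    {"role": "user", "content": "How many r's are in 'strawberry'? /no_think"},
    # ... assistant reply without a thinking block ...
    {"role": "user", "content": "Are you sure? Re-check carefully. /think"},
]
# Template the messages as in the quickstart sketch; the final turn's
# directive controls whether a thinking block is generated.
```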
Here is an example of a multi-turn conversation: > [!NOTE] > For API compatibility, when , regardless of whether the user uses or , the model will always output a block wrapped in . However, the content inside this block may be empty if thinking is disabled. > When , the soft switches are not valid. Regardless of any or tags input by the user, the model will not generate think content and will not include a block. ## Agentic Use Qwen3 excels in tool calling capabilities. We recommend using Qwen-Agent to make the best use of agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity. To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself. ## Processing Long Texts Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method. YaRN is currently supported by several inference frameworks, e.g., and for local use, and for deployment. In general, there are two approaches to enabling YaRN for supported frameworks: - Modifying the model files: In the file, add the fields: For , you need to regenerate the GGUF file after the modification. - Passing command line arguments: For , you can use For , you can use For from , you can use > [!IMPORTANT] > If you encounter the following warning > > please upgrade . > [!NOTE] > All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** > We advise adding the configuration only when processing long contexts is required. > It is also recommended to modify the as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set as 2.0. > [!NOTE] > The default in is set to 40,960. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance. > [!TIP] > The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default and no extra configuration is needed. ## Best Practices To achieve optimal performance, we recommend the following settings: 1. **Sampling Parameters**: - For thinking mode (), use , , , and . **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. - For non-thinking mode (), we suggest using , , , and . - For supported frameworks, you can adjust the parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance. 2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. 
This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance. 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking. - **Math Problems**: Include \"Please reason step by step, and put your final answer within \\boxed{}.\" in the prompt. - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: \"Please show your choice in the field with only the choice letter, e.g., .\" 4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. It is implemented in the provided chat template in Jinja2. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed. ### Citation If you find our work helpful, feel free to give us a cite." +} \ No newline at end of file diff --git a/data/model_data_json/RedHatAI_DeepSeek-R1-Distill-Llama-70B-FP8-dynamic.json b/data/model_data_json/RedHatAI_DeepSeek-R1-Distill-Llama-70B-FP8-dynamic.json new file mode 100644 index 0000000000000000000000000000000000000000..22fe6a25ffe2eb464cbacf171a7d90b4d557d767 --- /dev/null +++ b/data/model_data_json/RedHatAI_DeepSeek-R1-Distill-Llama-70B-FP8-dynamic.json @@ -0,0 +1,24 @@ +{ + "model_id": "RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic", + "downloads": 100956, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "deepseek", + "fp8", + "vllm", + "conversational", + "base_model:deepseek-ai/DeepSeek-R1-Distill-Llama-70B", + "base_model:quantized:deepseek-ai/DeepSeek-R1-Distill-Llama-70B", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "compressed-tensors", + "region:us" + ], + "description": "--- license: mit tags: - deepseek - fp8 - vllm base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B library_name: transformers --- # DeepSeek-R1-Distill-Llama-70B-FP8-dynamic ## Model Overview - **Model Architecture:** LlamaForCausalLM - **Input:** Text - **Output:** Text - **Model Optimizations:** - **Weight quantization:** FP8 - **Activation quantization:** FP8 - **Release Date:** 2/1/2025 - **Version:** 1.0 - **Model Developers:** Neural Magic Quantized version of DeepSeek-R1-Distill-Llama-70B. ### Model Optimizations This model was obtained by quantizing the weights and activations of DeepSeek-R1-Distill-Llama-70B to the FP8 data type. This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within transformers blocks are quantized. Weights are quantized using a symmetric per-channel scheme, whereas activations are quantized using a symmetric per-token scheme. LLM Compressor is used for quantization. ## Use with vLLM This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving. See the documentation for more details. ## Creation This model was created with llm-compressor by running the code snippet below. ## Evaluation The model was evaluated on OpenLLM Leaderboard V1 and V2, using the following commands: OpenLLM Leaderboard V1: OpenLLM Leaderboard V2: ### Accuracy
| Category | Metric | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | neuralmagic/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic | Recovery |
| --- | --- | --- | --- | --- |
| Reasoning | AIME 2024 (pass@1) | 67.83 | 69.17 | 101.98% |
| | MATH-500 (pass@1) | 95.29 | 95.14 | 99.84% |
| | GPQA Diamond (pass@1) | 65.57 | 65.15 | 99.36% |
| | Average Score | 76.23 | 76.49 | 100.34% |
| OpenLLM V1 | ARC-Challenge (Acc-Norm, 25-shot) | 63.65 | 63.05 | 99.1% |
| | GSM8K (Strict-Match, 5-shot) | 93.03 | 93.03 | 100.0% |
| | HellaSwag (Acc-Norm, 10-shot) | 84.85 | 84.71 | 99.8% |
| | MMLU (Acc, 5-shot) | 78.04 | 77.45 | 99.3% |
| | TruthfulQA (MC2, 0-shot) | 56.67 | 56.62 | 99.9% |
| | Winogrande (Acc, 5-shot) | 78.22 | 78.45 | 100.3% |
| | Average Score | 75.74 | 75.55 | 99.8% |
| OpenLLM V2 | IFEval (Inst Level Strict Acc, 0-shot) | 42.45 | 42.11 | 99.2% |
| | BBH (Acc-Norm, 3-shot) | 21.26 | 19.77 | 93.0% |
| | Math-Hard (Exact-Match, 4-shot) | 0.00 | 0.00 | --- |
| | GPQA (Acc-Norm, 0-shot) | 9.51 | 6.97 | --- |
| | MUSR (Acc-Norm, 0-shot) | 14.87 | 14.60 | --- |
| | MMLU-Pro (Acc, 5-shot) | 4.27 | 5.76 | --- |
| | Average Score | 15.39 | 14.87 | 96.6% |
| Coding | HumanEval (pass@1) | 81.10 | 81.00 | 99.9% |
| | HumanEval (pass@10) | 87.60 | 88.60 | 101.1% |
| | HumanEval+ (pass@1) | 75.20 | 75.50 | 100.4% |
| | HumanEval+ (pass@10) | 83.10 | 84.30 | 101.4% |
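Returning to the "Use with vLLM" section above, a hedged offline-inference sketch; the tensor_parallel_size and sampling values are illustrative, not card text.

```python
# Hedged sketch of deploying the FP8 checkpoint with vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic",
    tensor_parallel_size=2,  # illustrative; e.g. 2x H100 80GB per the benchmarks below
)
params = SamplingParams(temperature=0.6, max_tokens=512)

messages = [{"role": "user", "content": "Explain FP8 quantization in two sentences."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```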
## Inference Performance This model achieves up to 1.4x speedup in single-stream deployment and up to 3.0x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario. The following performance benchmarks were conducted with vLLM version 0.7.2, and GuideLLM.
Benchmarking Command
### Single-stream performance (measured with vLLM version 0.7.2)
Each use-case cell shows Latency (s) / QPD; use-case profiles are given as prompt tokens / generation tokens.
| GPU class | Number of GPUs | Model | Average cost reduction | Instruction Following (256 / 128) | Multi-turn Chat (512 / 256) | Docstring Generation (768 / 128) | RAG (1024 / 128) | Code Completion (256 / 1024) | Code Fixing (1024 / 1024) | Large Summarization (4096 / 512) | Large RAG (10240 / 1536) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A6000 | 4 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | --- | 7.4 / 152 | 14.9 / 76 | 7.5 / 149 | 7.7 / 146 | 57.2 / 20 | 58.9 / 19 | 31.9 / 35 | 98.4 / 11 |
| A6000 | 2 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8 | 1.93 | 7.7 / 292 | 15.2 / 148 | 7.8 / 287 | 8.0 / 282 | 60.7 / 37 | 60.2 / 37 | 32.3 / 70 | 104.0 / 22 |
| A6000 | 2 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 | 2.83 | 4.9 / 457 | 10.0 / 225 | 5.5 / 411 | 5.8 / 389 | 38.9 / 58 | 39.2 / 57 | 23.7 / 95 | 76.6 / 29 |
| A100 | 2 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | --- | 6.4 / 157 | 12.8 / 79 | 6.6 / 153 | 6.7 / 151 | 50.4 / 20 | 50.8 / 20 | 27.0 / 37 | 85.4 / 12 |
| A100 | 2 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8 | 1.48 | 4.1 / 245 | 8.2 / 123 | 4.2 / 238 | 4.3 / 235 | 32.4 / 31 | 32.8 / 31 | 17.6 / 57 | 90.8 / 11 |
| A100 | 1 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 | 2.69 | 4.6 / 440 | 9.2 / 220 | 4.9 / 407 | 5.2 / 389 | 35.3 / 57 | 36.3 / 55 | 21.2 / 95 | 68.1 / 30 |
| H100 | 2 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | --- | 3.8 / 149 | 7.6 / 74 | 3.9 / 146 | 3.9 / 144 | 30.0 / 19 | 30.4 / 19 | 16.1 / 35 | 56.5 / 10 |
| H100 | 2 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic | 1.39 | 2.7 / 210 | 5.3 / 106 | 2.7 / 207 | 2.8 / 203 | 21.1 / 27 | 21.4 / 26 | 11.5 / 49 | 47.2 / 12 |
| H100 | 1 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 | 1.83 | 4.0 / 277 | 7.9 / 138 | 4.1 / 266 | 4.2 / 262 | 31.2 / 35 | 31.8 / 34 | 17.8 / 61 | 61.4 / 18 |
**Use case profiles:** prompt tokens / generation tokens. **QPD:** Queries per dollar, based on on-demand cost at Lambda Labs (observed on 2/18/2025). ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
Each use-case cell shows Maximum throughput (QPS) / QPD; use-case profiles are given as prompt tokens / generation tokens.
| Hardware | Model | Average cost reduction | Instruction Following (256 / 128) | Multi-turn Chat (512 / 256) | Docstring Generation (768 / 128) | RAG (1024 / 128) | Code Completion (256 / 1024) | Code Fixing (1024 / 1024) | Large Summarization (4096 / 512) | Large RAG (10240 / 1536) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A6000x4 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | --- | 3.65 / 4102 | 1.56 / 1757 | 1.90 / 2143 | 1.48 / 1665 | 0.44 / 493 | 0.34 / 380 | 0.22 / 245 | 0.05 / 55 |
| A6000x4 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8 | 1.76 | 5.89 / 6625 | 2.94 / 3307 | 3.36 / 3775 | 2.59 / 2916 | 0.74 / 828 | 0.53 / 601 | 0.35 / 398 | 0.11 / 120 |
| A6000x4 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 | 1.48 | 4.91 / 5528 | 2.01 / 2259 | 2.03 / 2280 | 1.12 / 1255 | 1.11 / 1251 | 0.76 / 852 | 0.24 / 267 | 0.07 / 81 |
| A100x4 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | --- | 10.41 / 5235 | 5.10 / 2565 | 5.50 / 2766 | 4.36 / 2193 | 1.49 / 751 | 1.21 / 607 | 0.89 / 447 | 0.19 / 98 |
| A100x4 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8 | 1.63 | 18.11 / 9103 | 8.90 / 4477 | 9.41 / 4730 | 7.42 / 3731 | 2.44 / 1229 | 1.89 / 948 | 1.26 / 631 | 0.30 / 149 |
| A100x4 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 | 1.12 | 12.63 / 6353 | 5.32 / 2673 | 5.58 / 2804 | 4.27 / 2144 | 2.30 / 1158 | 1.45 / 729 | 0.76 / 381 | 0.22 / 110 |
| H100x4 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | --- | 14.04 / 2113 | 10.85 / 1634 | 12.25 / 1844 | 9.93 / 1494 | 3.68 / 554 | 2.82 / 425 | 1.81 / 273 | 0.35 / 52 |
| H100x4 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic | 1.78 | 41.44 / 6236 | 19.64 / 2956 | 21.03 / 3166 | 16.72 / 2516 | 6.01 / 904 | 4.46 / 672 | 2.55 / 383 | 0.49 / 74 |
| H100x4 | neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 | 1.45 | 36.61 / 5509 | 15.12 / 2275 | 16.24 / 2443 | 13.22 / 1990 | 5.48 / 825 | 3.01 / 453 | 2.07 / 312 | 0.43 / 64 |
**Use case profiles:** prompt tokens / generation tokens. **QPS:** Queries per second. **QPD:** Queries per dollar, based on on-demand cost at Lambda Labs (observed on 2/18/2025).", + "model_explanation_gemini": "An FP8-quantized version of DeepSeek-R1-Distill-Llama-70B optimized for reduced memory usage and efficient text generation while maintaining high accuracy in reasoning and coding tasks." +} \ No newline at end of file diff --git a/data/model_data_json/RedHatAI_Llama-3.2-1B-Instruct-FP8-dynamic.json b/data/model_data_json/RedHatAI_Llama-3.2-1B-Instruct-FP8-dynamic.json new file mode 100644 index 0000000000000000000000000000000000000000..28cda4607aafd3ddc31ba566e4059c4630f8716d --- /dev/null +++ b/data/model_data_json/RedHatAI_Llama-3.2-1B-Instruct-FP8-dynamic.json @@ -0,0 +1,27 @@ +{ + "model_id": "RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic", + "downloads": 97029, + "tags": [ + "safetensors", + "llama", + "fp8", + "vllm", + "text-generation", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "base_model:meta-llama/Llama-3.2-1B-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-1B-Instruct", + "license:llama3.2", + "compressed-tensors", + "region:us" + ], + "description": "--- tags: - fp8 - vllm language: - en - de - fr - it - pt - hi - es - th pipeline_tag: text-generation license: llama3.2 base_model: meta-llama/Llama-3.2-1B-Instruct --- # Llama-3.2-1B-Instruct-FP8-dynamic ## Model Overview - **Model Architecture:** Meta-Llama-3.2 - **Input:** Text - **Output:** Text - **Model Optimizations:** - **Weight quantization:** FP8 - **Activation quantization:** FP8 - **Intended Use Cases:** Intended for commercial and research use in multiple languages. Similarly to Llama-3.2-1B-Instruct, this model is intended for assistant-like chat. - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. - **Release Date:** 9/25/2024 - **Version:** 1.0 - **License(s):** llama3.2 - **Model Developers:** Neural Magic Quantized version of Llama-3.2-1B-Instruct. It achieves an average score of 50.88 on a subset of tasks from the OpenLLM benchmark (version 1), whereas the unquantized model achieves 51.70. ### Model Optimizations This model was obtained by quantizing the weights and activations of Llama-3.2-1B-Instruct to the FP8 data type, ready for inference with vLLM built from source. This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within transformers blocks are quantized. Symmetric per-channel quantization is applied, in which a linear scaling per output dimension maps the FP8 representations of the quantized weights and activations. Activations are also quantized on a per-token dynamic basis. LLM Compressor is used for quantization. ## Deployment ### Use with vLLM This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving. See the documentation for more details. ## Creation This model was created by applying LLM Compressor, as presented in the code snippet below. ## Evaluation The model was evaluated on MMLU, ARC-Challenge, GSM-8K, and Winogrande. Evaluation was conducted using the Neural Magic fork of lm-evaluation-harness (branch llama_3.1_instruct) and the vLLM engine.
This version of the lm-evaluation-harness includes versions of ARC-Challenge, GSM-8K, MMLU, and MMLU-cot that match the prompting style of Meta-Llama-3.1-Instruct-evals. ### Accuracy #### Open LLM Leaderboard evaluation scores
| Benchmark | Llama-3.2-1B-Instruct | Llama-3.2-1B-Instruct-FP8-dynamic (this model) | Recovery |
| --- | --- | --- | --- |
| MMLU (5-shot) | 47.66 | 47.55 | 99.8% |
| MMLU-cot (0-shot) | 47.10 | 46.79 | 99.3% |
| ARC Challenge (0-shot) | 58.36 | 57.25 | 98.1% |
| GSM-8K-cot (8-shot, strict-match) | 45.72 | 45.94 | 100.5% |
| Winogrande (5-shot) | 62.27 | 61.40 | 98.6% |
| Hellaswag (10-shot) | 61.01 | 60.95 | 99.9% |
| TruthfulQA (0-shot, mc2) | 43.52 | 44.23 | 101.6% |
| Average | 52.24 | 52.02 | 99.7% |
### Reproduction The results were obtained using the following commands: #### MMLU #### MMLU-CoT #### ARC-Challenge #### GSM-8K #### Hellaswag #### Winogrande #### TruthfulQA", + "model_explanation_gemini": "An FP8-quantized version of Llama-3.2-1B-Instruct optimized for efficient assistant-like chat in multiple languages, reducing memory usage while maintaining performance." +} \ No newline at end of file diff --git a/data/model_data_json/RedHatAI_Meta-Llama-3.1-8B-Instruct-FP8.json b/data/model_data_json/RedHatAI_Meta-Llama-3.1-8B-Instruct-FP8.json new file mode 100644 index 0000000000000000000000000000000000000000..71907ed7bc594ac599f2eee5bf3f5481740725cf --- /dev/null +++ b/data/model_data_json/RedHatAI_Meta-Llama-3.1-8B-Instruct-FP8.json @@ -0,0 +1,31 @@ +{ + "model_id": "RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8", + "downloads": 228435, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "fp8", + "vllm", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "compressed-tensors", + "region:us" + ], + "description": "--- tags: - fp8 - vllm language: - en - de - fr - it - pt - hi - es - th pipeline_tag: text-generation license: llama3.1 base_model: meta-llama/Meta-Llama-3.1-8B-Instruct --- # Meta-Llama-3.1-8B-Instruct-FP8 ## Model Overview - **Model Architecture:** Meta-Llama-3.1 - **Input:** Text - **Output:** Text - **Model Optimizations:** - **Weight quantization:** FP8 - **Activation quantization:** FP8 - **Intended Use Cases:** Intended for commercial and research use in multiple languages. Similarly to Meta-Llama-3.1-8B-Instruct, this model is intended for assistant-like chat. - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. - **Release Date:** 7/23/2024 - **Version:** 1.0 - **License(s):** llama3.1 - **Model Developers:** Neural Magic Quantized version of Meta-Llama-3.1-8B-Instruct. It achieves an average score of 73.44 on the OpenLLM benchmark (version 1), whereas the unquantized model achieves 73.79. ### Model Optimizations This model was obtained by quantizing the weights and activations of Meta-Llama-3.1-8B-Instruct to the FP8 data type, ready for inference with vLLM built from source. This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within transformers blocks are quantized. Symmetric per-tensor quantization is applied, in which a single linear scaling maps the FP8 representations of the quantized weights and activations. LLM Compressor is used for quantization with 512 sequences of UltraChat. ## Deployment ### Use with vLLM This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving. See the documentation for more details. ## Creation This model was created by applying LLM Compressor with calibration samples from UltraChat, as presented in the code snippet below. ## Evaluation The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande and TruthfulQA.
Evaluation was conducted using the Neural Magic fork of lm-evaluation-harness (branch llama_3.1_instruct) and the vLLM engine. This version of the lm-evaluation-harness includes versions of ARC-Challenge, GSM-8K, MMLU, and MMLU-cot that match the prompting style of Meta-Llama-3.1-Instruct-evals. ### Accuracy #### Open LLM Leaderboard evaluation scores
| Benchmark | Meta-Llama-3.1-8B-Instruct | Meta-Llama-3.1-8B-Instruct-FP8 (this model) | Recovery |
| --- | --- | --- | --- |
| MMLU (5-shot) | 67.95 | 67.97 | 100.0% |
| MMLU-cot (0-shot) | 71.24 | 71.12 | 99.83% |
| ARC Challenge (0-shot) | 82.00 | 81.66 | 99.59% |
| GSM-8K-cot (8-shot, strict-match) | 81.96 | 81.12 | 98.98% |
| Hellaswag (10-shot) | 80.46 | 80.4 | 99.93% |
| Winogrande (5-shot) | 78.45 | 77.90 | 99.30% |
| TruthfulQA (0-shot, mc2) | 54.50 | 53.92 | 98.94% |
| Average | 73.79 | 73.44 | 99.52% |
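Following the "Use with vLLM" note above, a hedged sketch of querying an OpenAI-compatible vLLM server; the localhost endpoint and EMPTY api_key are local-serving conventions, not card text.

```python
# Hedged sketch: query a vLLM server started with, e.g.,
#   vllm serve RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8",
    messages=[{"role": "user", "content": "Summarize FP8 quantization in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```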
### Reproduction The results were obtained using the following commands: #### MMLU #### MMLU-cot #### ARC-Challenge #### GSM-8K #### Hellaswag #### Winogrande #### TruthfulQA", + "model_explanation_gemini": "An FP8-optimized version of Meta-Llama-3.1-8B-Instruct designed for efficient assistant-like chat and text generation in multiple languages while reducing memory and disk requirements by 50%." +} \ No newline at end of file diff --git a/data/model_data_json/RedHatAI_Meta-Llama-3.1-8B-Instruct-quantized.w4a16.json b/data/model_data_json/RedHatAI_Meta-Llama-3.1-8B-Instruct-quantized.w4a16.json new file mode 100644 index 0000000000000000000000000000000000000000..e427c68e4877ab40180729cb83a8689b99ed3d5b --- /dev/null +++ b/data/model_data_json/RedHatAI_Meta-Llama-3.1-8B-Instruct-quantized.w4a16.json @@ -0,0 +1,32 @@ +{ + "model_id": "RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16", + "downloads": 102010, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "int4", + "vllm", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "gptq", + "region:us" + ], + "description": "--- tags: - int4 - vllm language: - en - de - fr - it - pt - hi - es - th pipeline_tag: text-generation license: llama3.1 base_model: meta-llama/Meta-Llama-3.1-8B-Instruct --- # Meta-Llama-3.1-8B-Instruct-quantized.w4a16 ## Model Overview - **Model Architecture:** Meta-Llama-3 - **Input:** Text - **Output:** Text - **Model Optimizations:** - **Weight quantization:** INT4 - **Intended Use Cases:** Intended for commercial and research use in English. Similarly to Meta-Llama-3.1-8B-Instruct, this model is intended for assistant-like chat. - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. - **Release Date:** 7/26/2024 - **Version:** 1.0 - **License(s):** Llama3.1 - **Model Developers:** Neural Magic This model is a quantized version of Meta-Llama-3.1-8B-Instruct. It was evaluated on several tasks to assess its quality in comparison to the unquantized model, including multiple-choice, math reasoning, and open-ended text generation. Meta-Llama-3.1-8B-Instruct-quantized.w4a16 achieves 93.0% recovery for the Arena-Hard evaluation, 98.9% for OpenLLM v1 (using Meta's prompting when available), 96.1% for OpenLLM v2, 99.7% for HumanEval pass@1, and 97.4% for HumanEval+ pass@1. ### Model Optimizations This model was obtained by quantizing the weights of Meta-Llama-3.1-8B-Instruct to the INT4 data type. This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%. Only the weights of the linear operators within transformers blocks are quantized. Symmetric per-group quantization is applied, in which a linear scaling per group of 128 parameters maps the INT4 and floating point representations of the quantized weights. AutoGPTQ is used for quantization with a 10% damping factor and 768 sequences taken from Neural Magic's LLM compression calibration dataset. ## Deployment This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving. See the documentation for more details.
## Creation This model was created by applying the AutoGPTQ library as presented in the code snippet below. Although AutoGPTQ was used for this particular model, Neural Magic is transitioning to using llm-compressor, which supports several quantization schemes and models not supported by AutoGPTQ. ## Evaluation This model was evaluated on the well-known Arena-Hard, OpenLLM v1, OpenLLM v2, HumanEval, and HumanEval+ benchmarks. In all cases, model outputs were generated with the vLLM engine. Arena-Hard evaluations were conducted using the Arena-Hard-Auto repository. The model generated a single answer for each prompt from Arena-Hard, and each answer was judged twice by GPT-4. We report below the scores obtained in each judgement and the average. OpenLLM v1 and v2 evaluations were conducted using Neural Magic's fork of lm-evaluation-harness (branch llama_3.1_instruct). This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of Meta-Llama-3.1-Instruct-evals and a few fixes to OpenLLM v2 tasks. HumanEval and HumanEval+ evaluations were conducted using Neural Magic's fork of the EvalPlus repository. Detailed model outputs are available as HuggingFace datasets for Arena-Hard, OpenLLM v2, and HumanEval. **Note:** Results have been updated after Meta modified the chat template. ### Accuracy
| Category | Benchmark | Meta-Llama-3.1-8B-Instruct | Meta-Llama-3.1-8B-Instruct-quantized.w4a16 (this model) | Recovery |
|----------|-----------|:--------------------------:|:-------------------------------------------------------:|:--------:|
| LLM as a judge | Arena Hard | 25.8 (25.1 / 26.5) | 27.2 (27.6 / 26.7) | 105.4% |
| OpenLLM v1 | MMLU (5-shot) | 68.3 | 66.9 | 97.9% |
| | MMLU (CoT, 0-shot) | 72.8 | 71.1 | 97.6% |
| | ARC Challenge (0-shot) | 81.4 | 80.2 | 98.0% |
| | GSM-8K (CoT, 8-shot, strict-match) | 82.8 | 82.9 | 100.2% |
| | Hellaswag (10-shot) | 80.5 | 79.9 | 99.3% |
| | Winogrande (5-shot) | 78.1 | 78.0 | 99.9% |
| | TruthfulQA (0-shot, mc2) | 54.5 | 52.8 | 96.9% |
| | **Average** | **74.3** | **73.5** | **98.9%** |
| OpenLLM v2 | MMLU-Pro (5-shot) | 30.8 | 28.8 | 93.6% |
| | IFEval (0-shot) | 77.9 | 76.3 | 98.0% |
| | BBH (3-shot) | 30.1 | 28.9 | 96.1% |
| | Math-lvl-5 (4-shot) | 15.7 | 14.8 | 94.4% |
| | GPQA (0-shot) | 3.7 | 4.0 | 109.8% |
| | MuSR (0-shot) | 7.6 | 6.3 | 83.2% |
| | **Average** | **27.6** | **26.5** | **96.1%** |
| Coding | HumanEval pass@1 | 67.3 | 67.1 | 99.7% |
| | HumanEval+ pass@1 | 60.7 | 59.1 | 97.4% |
| Multilingual | Portuguese MMLU (5-shot) | 59.96 | 58.69 | 97.9% |
| | Spanish MMLU (5-shot) | 60.25 | 58.39 | 96.9% |
| | Italian MMLU (5-shot) | 59.23 | 57.82 | 97.6% |
| | German MMLU (5-shot) | 58.63 | 56.22 | 95.9% |
| | French MMLU (5-shot) | 59.65 | 57.58 | 96.5% |
| | Hindi MMLU (5-shot) | 50.10 | 47.14 | 94.1% |
| | Thai MMLU (5-shot) | 49.12 | 46.72 | 95.1% |
### Reproduction The results were obtained using the following commands: #### MMLU #### MMLU-CoT #### ARC-Challenge #### GSM-8K #### Hellaswag #### Winogrande #### TruthfulQA #### OpenLLM v2 #### MMLU Portuguese #### MMLU Spanish #### MMLU Italian #### MMLU German #### MMLU French #### MMLU Hindi #### MMLU Thai #### HumanEval and HumanEval+ ##### Generation ##### Sanitization ##### Evaluation", + "model_explanation_gemini": "A quantized version of Meta-Llama-3.1-8B-Instruct optimized for assistant-like chat tasks with reduced memory requirements through INT4 weight quantization." +} \ No newline at end of file diff --git a/data/model_data_json/RedHatAI_Mistral-Small-24B-Instruct-2501-FP8-dynamic.json b/data/model_data_json/RedHatAI_Mistral-Small-24B-Instruct-2501-FP8-dynamic.json new file mode 100644 index 0000000000000000000000000000000000000000..e0ca905185ef09d4fcecb834fdccde97ba476473 --- /dev/null +++ b/data/model_data_json/RedHatAI_Mistral-Small-24B-Instruct-2501-FP8-dynamic.json @@ -0,0 +1,24 @@ +{ + "model_id": "RedHatAI/Mistral-Small-24B-Instruct-2501-FP8-dynamic", + "downloads": 81938, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "mistral-small", + "fp8", + "vllm", + "conversational", + "en", + "base_model:mistralai/Mistral-Small-24B-Instruct-2501", + "base_model:quantized:mistralai/Mistral-Small-24B-Instruct-2501", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "compressed-tensors", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en tags: - mistral - mistral-small - fp8 - vllm base_model: mistralai/Mistral-Small-24B-Instruct-2501 library_name: transformers --- # Mistral-Small-24B-Instruct-2501-FP8-Dynamic ## Model Overview - **Model Architecture:** Mistral-Small-24B-Instruct-2501 - **Input:** Text - **Output:** Text - **Model Optimizations:** - **Weight quantization:** FP8 - **Activation quantization:** FP8 - **Release Date:** 3/1/2025 - **Version:** 1.0 - **Model Developers:** Neural Magic Quantized version of Mistral-Small-24B-Instruct-2501. It achieves an average score of 78.88 on the OpenLLM benchmark (version 1), whereas the unquantized model achieves 79.45. ### Model Optimizations This model was obtained by quantizing the weights and activations to FP8 data type, ready for inference with vLLM. This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within transformers blocks are quantized. ## Deployment ### Use with vLLM This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving. See the documentation for more details. ## Creation This model was created with llm-compressor by running the code snippet below. 
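For the creation step just mentioned, a plausible sketch following llm-compressor's published FP8-dynamic recipe pattern is shown below; treat the exact import paths, arguments, and output directory as assumptions to be checked against the llm-compressor documentation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# FP8 dynamic quantization of the linear layers; the lm_head is left untouched.
# No calibration dataset is needed because activation scales are computed at runtime.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(model=model, recipe=recipe)

# Assumed output directory name, chosen to match the model id above.
save_dir = "Mistral-Small-24B-Instruct-2501-FP8-dynamic"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```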
## Evaluation The model was evaluated on OpenLLM Leaderboard V1 and V2, using the following commands: OpenLLM Leaderboard V1: OpenLLM Leaderboard V2: ### Accuracy #### OpenLLM Leaderboard V1 evaluation scores | Metric | mistralai/Mistral-Small-24B-Instruct-2501 | nm-testing/Mistral-Small-24B-Instruct-2501-FP8-dynamic | |-----------------------------------------|:---------------------------------:|:-------------------------------------------:| | ARC-Challenge (Acc-Norm, 25-shot) | 72.18 | 71.76 | | GSM8K (Strict-Match, 5-shot) | 90.14 | 89.01 | | HellaSwag (Acc-Norm, 10-shot) | 85.05 | 84.65 | | MMLU (Acc, 5-shot) | 80.69 | 80.55 | | TruthfulQA (MC2, 0-shot) | 65.55 | 64.85 | | Winogrande (Acc, 5-shot) | 83.11 | 82.48 | | **Average Score** | **79.45** | **78.88** | | **Recovery (%)** | **100.00** | **99.28** | #### OpenLLM Leaderboard V2 evaluation scores | Metric | mistralai/Mistral-Small-24B-Instruct-2501 | nm-testing/Mistral-Small-24B-Instruct-2501-FP8-dynamic | |---------------------------------------------------------|:---------------------------------:|:-------------------------------------------:| | IFEval (Inst-and-Prompt Level Strict Acc, 0-shot) | 73.27 | 73.53 | | BBH (Acc-Norm, 3-shot) | 45.18 | 44.39 | | MMLU-Pro (Acc, 5-shot) | 38.83 | 37.28 | | **Average Score** | **52.42** | **51.73** | | **Recovery (%)** | **100.00** | **98.68** | | Math-Hard (Exact-Match, 4-shot) | 6.35 | 2.99 | | GPQA (Acc-Norm, 0-shot) | 8.29 | 6.97 | | MUSR (Acc-Norm, 0-shot) | 7.84 | 8.04 | Results on Math-Hard, GPQA, and MUSR are not considered for the accuracy recovery calculation because the unquantized model has close to random prediction accuracy (6.35, 8.29, 7.84), which doesn't provide a reliable baseline for recovery calculation." +} \ No newline at end of file diff --git a/data/model_data_json/Revai_pyannote-wespeaker-voxceleb-resnet34-LM.json b/data/model_data_json/Revai_pyannote-wespeaker-voxceleb-resnet34-LM.json new file mode 100644 index 0000000000000000000000000000000000000000..22f2a2d9140b7b8279e54e085171647ee4124a74 --- /dev/null +++ b/data/model_data_json/Revai_pyannote-wespeaker-voxceleb-resnet34-LM.json @@ -0,0 +1,11 @@ +{ + "model_id": "Revai/pyannote-wespeaker-voxceleb-resnet34-LM", + "downloads": 201251, + "tags": [ + "pytorch", + "license:cc-by-nc-4.0", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 ---", + "model_explanation_gemini": "Performs speaker recognition by identifying and verifying speakers in audio using a ResNet34 architecture trained on VoxCeleb data." +} \ No newline at end of file diff --git a/data/model_data_json/Revai_reverb-diarization-v1.json b/data/model_data_json/Revai_reverb-diarization-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..1a471cf96a41faef6ca0349a531e0520d50ac75f --- /dev/null +++ b/data/model_data_json/Revai_reverb-diarization-v1.json @@ -0,0 +1,14 @@ +{ + "model_id": "Revai/reverb-diarization-v1", + "downloads": 175090, + "tags": [ + "pyannote-audio", + "pytorch", + "reverb", + "automatic-speech-recognition", + "arxiv:2410.03930", + "license:other", + "region:us" + ], + "description": "--- license: other library_name: pyannote-audio tags: - reverb pipeline_tag: automatic-speech-recognition --- Details on the model, its performance, and more are available on arXiv.
For more information on how to run this diarization model, see the Reverb documentation. Reverb diarization V1 provides a 16.5% relative improvement in WDER (Word Diarization Error Rate) compared to the baseline pyannote3.0 model, evaluated on over 1,250,000 tokens across five different test suites. | Test suite | WDER | |---------|---------| | earnings21 | 0.047 | | rev16 | 0.077 | # Usage # Cite this Model If you use this model, please use the following citation: # License See LICENSE for details." +} \ No newline at end of file diff --git a/data/model_data_json/Rostlab_ProstT5.json b/data/model_data_json/Rostlab_ProstT5.json new file mode 100644 index 0000000000000000000000000000000000000000..ec6c53e9fc3fd1f2b54ec3ef9316b244f9e7ee6f --- /dev/null +++ b/data/model_data_json/Rostlab_ProstT5.json @@ -0,0 +1,20 @@ +{ + "model_id": "Rostlab/ProstT5", + "downloads": 251503, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "biology", + "translation", + "dataset:adrianhenkel/lucidprots_full_data", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit datasets: - adrianhenkel/lucidprots_full_data pipeline_tag: translation tags: - biology --- # Model Card for ProstT5 ProstT5 is a protein language model (pLM) which can translate between protein sequence and structure. !ProstT5 pre-training and inference ## Model Details ### Model Description ProstT5 (Protein structure-sequence T5) is based on ProtT5-XL-U50, a T5 model trained on encoding protein sequences using span corruption applied on billions of protein sequences. ProstT5 finetunes ProtT5-XL-U50 on translating between protein sequence and structure using 17M proteins with high-quality 3D structure predictions from the AlphaFoldDB. Protein structure is converted from 3D to 1D using the 3Di-tokens introduced by Foldseek. In a first step, ProstT5 learnt to represent the newly introduced 3Di-tokens by continuing the original span-denoising objective applied on 3Di- and amino acid- (AA) sequences. Only in a second step, ProstT5 was trained on translating between the two modalities. The direction of the translation is indicated by two special tokens (\"<fold2AA>\" for translating from 3Di to AAs, \"<AA2fold>\" for translating from AAs to 3Di). To avoid clashes with AA tokens, 3Di-tokens were cast to lower-case (alphabets are identical otherwise). - **Developed by:** Michael Heinzinger (GitHub @mheinzinger; Twitter @HeinzingerM) - **Model type:** Encoder-decoder (T5) - **Language(s) (NLP):** Protein sequence and structure - **License:** MIT - **Finetuned from model:** ProtT5-XL-U50 ## Uses 1. The model can be used for traditional feature extraction. For this, we recommend using only the encoder in half-precision (fp16) together with batching. Examples (currently only for original ProtT5-XL-U50 but replacing repository links and adding prefixes works): script and colab 2. While original ProtT5-XL-U50 could only embed AA sequences, ProstT5 can now also embed 3D structures represented by 3Di tokens. 3Di tokens can either be derived from 3D structures using Foldseek or they can be predicted from AA sequences by ProstT5. 3. \"Folding\": Translation from sequence (AAs) to structure (3Di). The resulting 3Di strings can be used together with Foldseek for remote homology detection while avoiding to compute 3D structures explicitly. 4. \"Inverse Folding\": Translation from structure (3Di) to sequence (AA).
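As an illustration of use case 1 above, the following is a minimal sketch of extracting per-residue embeddings from an amino-acid sequence with the standard transformers T5 encoder API. The preprocessing conventions (mapping rare residues to X, whitespace-separated residues, the <AA2fold> prefix) mirror the description above, but treat the details as assumptions to verify against the Rostlab examples.

```python
import re
import torch
from transformers import T5Tokenizer, T5EncoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("Rostlab/ProstT5", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Rostlab/ProstT5").to(device)

sequence = "PRTEINO"  # placeholder amino-acid sequence
# Map rare amino acids to X, insert whitespace between residues,
# and prepend the direction prefix for amino-acid input.
seq = " ".join(re.sub(r"[UZOB]", "X", sequence))
batch = tokenizer(["<AA2fold> " + seq], add_special_tokens=True,
                  padding="longest", return_tensors="pt").to(device)

with torch.no_grad():
    embeddings = model(batch.input_ids, attention_mask=batch.attention_mask).last_hidden_state
print(embeddings.shape)  # (1, tokens incl. prefix/specials, hidden size)
```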
## How to Get Started with the Model Feature extraction: Translation (\"folding\", i.e., AA to 3Di): ## Training Details ### Training Data Pre-training data (3Di+AA sequences for 17M proteins) ### Training Procedure The first phase of the pre-training is continuing span-based denoising using 3Di- and AA-sequences using this script. For the second phase of pre-training (actual translation from 3Di- to AA-sequences and vice versa), we used this script. #### Training Hyperparameters - **Training regime:** we used DeepSpeed (stage-2), gradient accumulation steps (5 steps), mixed half-precision (bf16) and PyTorch 2.0's TorchInductor compiler #### Speed Generating embeddings for the human proteome from the Pro(s)tT5 encoder requires around 35m (minutes) or 0.1s (seconds) per protein using batch-processing and half-precision (fp16) on a single RTX A6000 GPU with 48 GB vRAM. The translation is comparatively slow (0.6-2.5s/protein at an average length of 135 and 406, respectively) due to the sequential nature of the decoding process, which needs to generate left-to-right, token-by-token. We only used batch-processing with half-precision without further optimization.", + "model_explanation_gemini": "Translates between protein sequences and 3D structures using 3Di-tokens for structural representation." +} \ No newline at end of file diff --git a/data/model_data_json/Rostlab_prot_bert.json b/data/model_data_json/Rostlab_prot_bert.json new file mode 100644 index 0000000000000000000000000000000000000000..405d225aa19641eb3222387b2afdb5e23db1b582 --- /dev/null +++ b/data/model_data_json/Rostlab_prot_bert.json @@ -0,0 +1,17 @@ +{ + "model_id": "Rostlab/prot_bert", + "downloads": 292492, + "tags": [ + "transformers", + "pytorch", + "fill-mask", + "protein language model", + "protein", + "dataset:Uniref100", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - protein language model - protein datasets: - Uniref100 --- # ProtBert model Pretrained model on protein sequences using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is trained on uppercase amino acids: it only works with capital letter amino acids. ## Model description ProtBert is based on the Bert model, which was pretrained on a large corpus of protein sequences in a self-supervised fashion. This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those protein sequences. One important difference between our Bert model and the original Bert version is the way of dealing with sequences as separate documents. This means that next sentence prediction is not used, as each sequence is treated as a complete document. The masking follows the original Bert training, randomly masking 15% of the amino acids in the input. In the end, the features extracted from this model revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein shape. This implied learning some of the grammar of the language of life realized in protein sequences. ## Intended uses & limitations The model could be used for protein feature extraction or to be fine-tuned on downstream tasks. We have noticed in some tasks you could gain more accuracy by fine-tuning the model rather than using it as a feature extractor.
### How to use You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given protein sequence in PyTorch: ## Training data The ProtBert model was pretrained on Uniref100, a dataset consisting of 217 million protein sequences. ## Training procedure ### Preprocessing The protein sequences are uppercased and tokenized using a single space and a vocabulary size of 21. The rare amino acids \"U,Z,O,B\" were mapped to \"X\". The inputs of the model are then of the form: Furthermore, each protein sequence was treated as a separate document. The preprocessing step was performed twice, once for a combined length (2 sequences) of less than 512 amino acids, and another time using a combined length (2 sequences) of less than 2048 amino acids. The details of the masking procedure for each sequence followed the original Bert model as follows: - 15% of the amino acids are masked. - In 80% of the cases, the masked amino acids are replaced by [MASK]. - In 10% of the cases, the masked amino acids are replaced by a random amino acid (different from the one they replace). - In the 10% remaining cases, the masked amino acids are left as is. ### Pretraining The model was trained on a single TPU Pod V3-512 for 400k steps in total: 300K steps using sequence length 512 (batch size 15k), and 100K steps using sequence length 2048 (batch size 2.5k). The optimizer used is Lamb with a learning rate of 0.002, a weight decay of 0.01, learning rate warmup for 40k steps and linear decay of the learning rate after. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: Test results: | Task/Dataset | secondary structure (3-states) | secondary structure (8-states) | Localization | Membrane | |:-----:|:-----:|:-----:|:-----:|:-----:| | CASP12 | 75 | 63 | | | | TS115 | 83 | 72 | | | | CB513 | 81 | 66 | | | | DeepLoc | | | 79 | 91 | ### BibTeX entry and citation info > Created by Ahmed Elnaggar/@Elnaggar_AI | LinkedIn", + "model_explanation_gemini": "Pretrained on protein sequences using masked language modeling to extract features or fine-tune for downstream tasks like structure prediction and localization." +} \ No newline at end of file diff --git a/data/model_data_json/Rostlab_prot_t5_xl_half_uniref50-enc.json b/data/model_data_json/Rostlab_prot_t5_xl_half_uniref50-enc.json new file mode 100644 index 0000000000000000000000000000000000000000..f39c6365003f880a40d3b7d78c726e0252ba28f8 --- /dev/null +++ b/data/model_data_json/Rostlab_prot_t5_xl_half_uniref50-enc.json @@ -0,0 +1,16 @@ +{ + "model_id": "Rostlab/prot_t5_xl_half_uniref50-enc", + "downloads": 409336, + "tags": [ + "transformers", + "pytorch", + "t5", + "protein language model", + "dataset:UniRef50", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - protein language model datasets: - UniRef50 --- # Encoder only ProtT5-XL-UniRef50, half-precision model An encoder-only, half-precision version of the ProtT5-XL-UniRef50 model. The original model and its pretraining were introduced in this paper and first released in this repository. This model is trained on uppercase amino acids: it only works with capital letter amino acids. ## Model description ProtT5-XL-UniRef50 is based on the T5-3B model and was pretrained on a large corpus of protein sequences in a self-supervised fashion.
This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those protein sequences. One important difference between this T5 model and the original T5 version is the denoising objective. The original T5-3B model was pretrained using a span denoising objective, while this model was pretrained with a Bart-like MLM denoising objective. The masking probability is consistent with the original T5 training, randomly masking 15% of the amino acids in the input. This model only contains the encoder portion of the original ProtT5-XL-UniRef50 model using half precision (float16). As such, this model can efficiently be used to create protein/amino acid representations. When used for training downstream networks/feature extraction, these embeddings produced the same performance (established empirically by comparing on several downstream tasks). ## Intended uses & limitations This version of the original ProtT5-XL-UniRef50 is mostly meant for conveniently creating amino-acid or protein embeddings with a low GPU-memory footprint without any measurable performance decrease in our experiments. This model is fully usable on 8 GB of video RAM. ### How to use An extensive, interactive example on how to use this model for common tasks can be found on Google Colab. Here is how to use this model to extract the features of a given protein sequence in PyTorch: **NOTE**: Please make sure to explicitly set the model to half precision (float16); otherwise, the generated embeddings will be full precision. **NOTE**: Currently (06/2022) half-precision models cannot be used on CPU. If you want to use the encoder-only version on CPU, you need to cast it to its full-precision version (float32). ### BibTeX entry and citation info", + "model_explanation_gemini": "Generates half-precision protein sequence embeddings efficiently with low GPU memory usage while maintaining performance, focusing on uppercase amino acids." +} \ No newline at end of file diff --git a/data/model_data_json/RunDiffusion_Juggernaut-XL-v6.json b/data/model_data_json/RunDiffusion_Juggernaut-XL-v6.json new file mode 100644 index 0000000000000000000000000000000000000000..23588d7fc4ab14f98660b580f58bdbb47bf632f1 --- /dev/null +++ b/data/model_data_json/RunDiffusion_Juggernaut-XL-v6.json @@ -0,0 +1,28 @@ +{ + "model_id": "RunDiffusion/Juggernaut-XL-v6", + "downloads": 155619, + "tags": [ + "diffusers", + "art", + "people", + "diffusion", + "Cinematic", + "Photography", + "Landscape", + "Interior", + "Food", + "Car", + "Wildlife", + "Architecture", + "en", + "base_model:stabilityai/stable-diffusion-xl-base-1.0", + "base_model:finetune:stabilityai/stable-diffusion-xl-base-1.0", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- language: - en license: creativeml-openrail-m library_name: diffusers tags: - art - people - diffusion - Cinematic - Photography - Landscape - Interior - Food - Car - Wildlife - Architecture thumbnail: base_model: stabilityai/stable-diffusion-xl-base-1.0 --- # Juggernaut XL v6 + RunDiffusion Photo v1 Official !juggernaut XL photo previews !RunDiffusion Logo ## Juggernaut v9 is here! Juggernaut v9 + RunDiffusion Photo v2 This model is not permitted to be used behind API services.
Please contact juggernaut@rundiffusion.com for business inquiries, commercial licensing, custom models, and consultation. Juggernaut is available on the new Auto1111 Forge on RunDiffusion. A big thanks for Version 6 goes to RunDiffusion (Photo Model) and Adam, who diligently helped me test :) (Leave some love for them ;) ) For business inquiries, commercial licensing, custom models, and consultation, contact me at juggernaut@rundiffusion.com", + "model_explanation_gemini": "Generates high-quality, photorealistic images across diverse categories like art, people, and landscapes using Stable Diffusion XL." +} \ No newline at end of file diff --git a/data/model_data_json/RunDiffusion_Juggernaut-XL-v9.json b/data/model_data_json/RunDiffusion_Juggernaut-XL-v9.json new file mode 100644 index 0000000000000000000000000000000000000000..7da9ef9482bdae5eae1fdc417d34dd60c2f43ea2 --- /dev/null +++ b/data/model_data_json/RunDiffusion_Juggernaut-XL-v9.json @@ -0,0 +1,29 @@ +{ + "model_id": "RunDiffusion/Juggernaut-XL-v9", + "downloads": 135796, + "tags": [ + "diffusers", + "art", + "people", + "diffusion", + "Cinematic", + "Photography", + "Landscape", + "Interior", + "Food", + "Car", + "Wildlife", + "Architecture", + "text-to-image", + "en", + "base_model:stabilityai/stable-diffusion-xl-base-1.0", + "base_model:finetune:stabilityai/stable-diffusion-xl-base-1.0", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- language: - en license: creativeml-openrail-m library_name: diffusers tags: - art - people - diffusion - Cinematic - Photography - Landscape - Interior - Food - Car - Wildlife - Architecture thumbnail: >- base_model: stabilityai/stable-diffusion-xl-base-1.0 pipeline_tag: text-to-image --- # Juggernaut XL v9 + RunDiffusion Photo v2 Official A big thanks goes to RunDiffusion and Adam, who diligently helped me test :) (Leave some love for them ;) ) It's time for another round, this time a bit delayed, but I hope you forgive the delay. Let's dive straight into the changes that await you or what we've been working on lately: For V9, I myself have only done basic training. This involves some work on skin details, lighting, and overall contrast. However, the biggest change to the model came from the RunDiffusion Photo Model update, which was made available to me in V2 by RunDiffusion.com. The photographic output of the model should, in our experience, be even stronger than in previous versions. Now for a small \"roadmap\" update, or a general status update on how things are progressing with Juggernaut. As you may have noticed, there was a slight delay with V9. With each successive version, it has become increasingly difficult to train Juggernaut without sacrificing quality in some areas, which was already the case to some extent with V8. Don't worry, V9 is really good, and I'm satisfied with the version I can present to you today :) However, I've decided to go for a complete \"reboot\" for V10. I want to simply retrain the Juggernaut base set. The conditions for better captioning weren't as favorable \"back then\" as they are today, so I want to completely re-caption the base set (5k images) with GPT-4 Vision. I expect a big leap towards prompting guidance and quality. But as you surely noticed last week, the release of Stable Cascade got in the way a bit. Therefore, my focus in the coming weeks will be on training Juggernaut for Stable Cascade.
The approach remains the same as with the planned \"reboot\"; I want to caption/tag all images in the future only with GPT-4 or manually. The timeline for all of this is still uncertain. I hope to be able to present you with a first stable version of Juggernaut Cascade sometime in March. V10 of Juggernaut XL will follow in the weeks thereafter. Now, here are some additional tips to make prompting easier for you: - Res: 832x1216 - Sampler: DPM++ 2M Karras - Steps: 30-40 - CFG: 3-7 (less is a bit more realistic) - Negative: Start with no negative, and afterwards add the stuff you don't want to see in that image. I don't recommend using my negative prompt; I simply use it because I am lazy :D The VAE is already baked in. HiRes: 4xNMKD-Siax_200k with 15 steps and 0.3 denoise + 1.5x upscale. And a few keywords/tokens that I regularly use in training, which might help you achieve the optimal result from this version: - Architecture Photography - Wildlife Photography - Car Photography - Food Photography - Interior Photography - Landscape Photography - Hyperdetailed Photography - Cinematic Movie - Still Mid Shot Photo - Full Body Photo - Skin Details", + "model_explanation_gemini": "Generates highly detailed, photorealistic images across diverse categories like architecture, wildlife, and portraits with enhanced skin details and lighting based on text prompts." +} \ No newline at end of file diff --git a/data/model_data_json/SG161222_RealVisXL_V4.0.json b/data/model_data_json/SG161222_RealVisXL_V4.0.json new file mode 100644 index 0000000000000000000000000000000000000000..06a7c8bdc3870f6a2a055ac2b2f2989ac4ce90df --- /dev/null +++ b/data/model_data_json/SG161222_RealVisXL_V4.0.json @@ -0,0 +1,15 @@ +{ + "model_id": "SG161222/RealVisXL_V4.0", + "downloads": 284108, + "tags": [ + "diffusers", + "safetensors", + "license:openrail++", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: openrail++ --- Check my exclusive models on Mage: ParagonXL / NovaXL / NovaXL Lightning / NovaXL V2 / NovaXL Pony / NovaXL Pony Lightning / RealDreamXL / RealDreamXL Lightning

This model is available on
You can support me directly on Boosty - It's important! Read it!
The model is still in the training phase. This is not the final version and may contain artifacts and perform poorly in some cases.
The model is aimed at photorealism. Can produce sfw and nsfw images of decent quality.
CivitAI Page:
Recommended Negative Prompt:
(face asymmetry, eyes asymmetry, deformed eyes, open mouth)
or another negative prompt
Recommended Generation Parameters:
Sampling Steps: 25+
Sampling Method: DPM++ 2M Karras
Recommended Hires Fix Parameters:
Hires steps: 10+
Upscaler: 4x-UltraSharp upscaler / or another
Denoising strength: 0.1 - 0.5
Upscale by: 1.1-2.0
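Applied with diffusers, the recommendations above translate into roughly the following sketch; the scheduler line is the usual diffusers approximation of DPM++ 2M Karras, and the prompt is only a placeholder.

```python
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16
).to("cuda")
# DPM++ 2M Karras equivalent in diffusers.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    prompt="close-up portrait photo, natural light",  # placeholder prompt
    negative_prompt="(face asymmetry, eyes asymmetry, deformed eyes, open mouth)",
    num_inference_steps=25,  # "Sampling Steps: 25+" from the list above
).images[0]
image.save("realvisxl.png")
```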
", + "model_explanation_gemini": "Generates photorealistic images, including both SFW and NSFW content, with potential artifacts due to ongoing training." +} \ No newline at end of file diff --git a/data/model_data_json/SG161222_Realistic_Vision_V5.1_noVAE.json b/data/model_data_json/SG161222_Realistic_Vision_V5.1_noVAE.json new file mode 100644 index 0000000000000000000000000000000000000000..24f8c12cb1de8983dc79703ee6a97506ebe4efbc --- /dev/null +++ b/data/model_data_json/SG161222_Realistic_Vision_V5.1_noVAE.json @@ -0,0 +1,15 @@ +{ + "model_id": "SG161222/Realistic_Vision_V5.1_noVAE", + "downloads": 125443, + "tags": [ + "diffusers", + "safetensors", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: creativeml-openrail-m --- Check my exclusive models on Mage:
ParagonXL / NovaXL / NovaXL Lightning / NovaXL V2 / NovaXL Pony / NovaXL Pony Lightning / RealDreamXL / RealDreamXL Lightning

This model is available on
You can support me directly on Boosty - Please read this!
For version 5.1, it is recommended to use it with a VAE (to improve generation quality and get rid of artifacts):
The recommended negative prompt: (deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck
OR
(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers:1.4), (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation
Sampler: Euler A or DPM++ 2M Karras
CFG Scale 3.5 - 7
Hires. fix with 4x-UltraSharp upscaler
0 Hires steps and Denoising strength 0.25-0.7
Upscale by 1.1-2.0
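A sketch of the recommended setup with diffusers follows; since the VAE link itself is not given here, stabilityai/sd-vae-ft-mse is used as a commonly paired stand-in and is an assumption, as is the prompt.

```python
import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL, DPMSolverMultistepScheduler

# Assumed VAE choice: sd-vae-ft-mse is a common pick for SD 1.5 photorealism models.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE", vae=vae, torch_dtype=torch.float16
).to("cuda")
# DPM++ 2M Karras equivalent in diffusers; Euler A is the other recommended option.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    prompt="RAW photo, portrait, soft lighting, high quality",  # placeholder prompt
    negative_prompt="deformed iris, deformed pupils, semi-realistic, cgi, 3d, render",
    num_inference_steps=30,
    guidance_scale=5.0,  # within the recommended CFG range above
).images[0]
image.save("realistic_vision.png")
```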
", + "model_explanation_gemini": "Generates realistic images with improved quality using recommended VAE and specific negative prompts to avoid artifacts and distortions." +} \ No newline at end of file diff --git a/data/model_data_json/SWivid_F5-TTS.json b/data/model_data_json/SWivid_F5-TTS.json new file mode 100644 index 0000000000000000000000000000000000000000..f22f02f88ad3a2e33275227e32c47b5305c67b16 --- /dev/null +++ b/data/model_data_json/SWivid_F5-TTS.json @@ -0,0 +1,14 @@ +{ + "model_id": "SWivid/F5-TTS", + "downloads": 717685, + "tags": [ + "f5-tts", + "text-to-speech", + "dataset:amphion/Emilia-Dataset", + "arxiv:2410.06885", + "license:cc-by-nc-4.0", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 pipeline_tag: text-to-speech library_name: f5-tts datasets: - amphion/Emilia-Dataset --- Download F5-TTS or E2 TTS and place under ckpts/ Github: Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching", + "model_explanation_gemini": "Generates fluent and faithful speech from text using flow matching for text-to-speech synthesis." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_blip-image-captioning-base.json b/data/model_data_json/Salesforce_blip-image-captioning-base.json new file mode 100644 index 0000000000000000000000000000000000000000..012159305e43e49292df8e3ba1ce3c16ef4b86d7 --- /dev/null +++ b/data/model_data_json/Salesforce_blip-image-captioning-base.json @@ -0,0 +1,19 @@ +{ + "model_id": "Salesforce/blip-image-captioning-base", + "downloads": 2655931, + "tags": [ + "transformers", + "pytorch", + "tf", + "blip", + "image-text-to-text", + "image-captioning", + "image-to-text", + "arxiv:2201.12086", + "license:bsd-3-clause", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: image-to-text tags: - image-captioning languages: - en license: bsd-3-clause --- # BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Model card for image captioning pretrained on COCO dataset - base architecture (with ViT base backbone). | !BLIP.gif | |:--:| | Pull figure from BLIP official repo | Image source: | ## TL;DR Authors from the paper write in the abstract: *Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to videolanguage tasks in a zero-shot manner. Code, models, and datasets are released.* ## Usage You can use this model for conditional and un-conditional image captioning ### Using the Pytorch model #### Running the model on CPU
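A minimal sketch of both captioning modes on CPU, based on the standard transformers BLIP classes; the image URL is a placeholder.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

img_url = "https://example.com/demo.jpg"  # placeholder image URL
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Conditional captioning: the text prefix steers the caption.
inputs = processor(raw_image, "a photography of", return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: image only.
inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```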
#### Running the model on GPU ##### In full precision
##### In half precision (float16)
## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. ## BibTex and citation info", + "model_explanation_gemini": "Generates captions for images using a vision-language pre-trained model with a ViT base backbone, excelling in both understanding and generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_blip-image-captioning-large.json b/data/model_data_json/Salesforce_blip-image-captioning-large.json new file mode 100644 index 0000000000000000000000000000000000000000..5298deffd6f60df0f20dba7d58a5504363490723 --- /dev/null +++ b/data/model_data_json/Salesforce_blip-image-captioning-large.json @@ -0,0 +1,20 @@ +{ + "model_id": "Salesforce/blip-image-captioning-large", + "downloads": 2435880, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "blip", + "image-text-to-text", + "image-captioning", + "image-to-text", + "arxiv:2201.12086", + "license:bsd-3-clause", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: image-to-text tags: - image-captioning languages: - en license: bsd-3-clause --- # BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Model card for image captioning pretrained on COCO dataset - base architecture (with ViT large backbone). | !BLIP.gif | |:--:| | Pull figure from BLIP official repo | Image source: | ## TL;DR Authors from the paper write in the abstract: *Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to videolanguage tasks in a zero-shot manner. Code, models, and datasets are released.* ## Usage You can use this model for conditional and un-conditional image captioning ### Using the Pytorch model #### Running the model on CPU
#### Running the model on GPU ##### In full precision
##### In half precision (float16)
## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. ## BibTex and citation info", + "model_explanation_gemini": "Generates descriptive captions for images using a vision-language pre-trained model with a ViT large backbone, excelling in both understanding and generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_blip-vqa-base.json b/data/model_data_json/Salesforce_blip-vqa-base.json new file mode 100644 index 0000000000000000000000000000000000000000..30ffe09246ebf7511decb2a7b7533a830ddc3e31 --- /dev/null +++ b/data/model_data_json/Salesforce_blip-vqa-base.json @@ -0,0 +1,17 @@ +{ + "model_id": "Salesforce/blip-vqa-base", + "downloads": 1590977, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "blip", + "visual-question-answering", + "arxiv:2201.12086", + "license:bsd-3-clause", + "region:us" + ], + "description": "--- pipeline_tag: 'visual-question-answering' tags: - visual-question-answering inference: false languages: - en license: bsd-3-clause --- # BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Model card for BLIP trained on visual question answering- base architecture (with ViT base backbone). | !BLIP.gif | |:--:| | Pull figure from BLIP official repo | Image source: | ## TL;DR Authors from the paper write in the abstract: *Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to videolanguage tasks in a zero-shot manner. Code, models, and datasets are released.* ## Usage You can use this model for conditional and un-conditional image captioning ### Using the Pytorch model #### Running the model on CPU
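A minimal sketch of visual question answering with the standard transformers BLIP classes; the image URL and question are placeholders.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

img_url = "https://example.com/demo.jpg"  # placeholder image URL
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# The model generates a short free-form answer conditioned on image + question.
inputs = processor(raw_image, "how many dogs are in the picture?", return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```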
#### Running the model on GPU ##### In full precision
##### In half precision (float16)
## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. ## BibTex and citation info", + "model_explanation_gemini": "BLIP-vqa-base is a vision-language model trained for visual question answering, excelling in understanding and generating responses based on image-text pairs." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_blip2-flan-t5-xl.json b/data/model_data_json/Salesforce_blip2-flan-t5-xl.json new file mode 100644 index 0000000000000000000000000000000000000000..e3de87f1df84f7c7fd2267ff9f9e1d0fd47a9c15 --- /dev/null +++ b/data/model_data_json/Salesforce_blip2-flan-t5-xl.json @@ -0,0 +1,22 @@ +{ + "model_id": "Salesforce/blip2-flan-t5-xl", + "downloads": 113552, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "blip-2", + "visual-question-answering", + "vision", + "image-to-text", + "image-captioning", + "image-text-to-text", + "en", + "arxiv:2301.12597", + "arxiv:2210.11416", + "license:mit", + "region:us" + ], + "description": "--- language: en license: mit tags: - vision - image-to-text - image-captioning - visual-question-answering pipeline_tag: image-text-to-text inference: false --- # BLIP-2, Flan T5-xl, pre-trained only BLIP-2 model, leveraging Flan T5-xl (a large language model). It was introduced in the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Li et al. and first released in this repository. Disclaimer: The team releasing BLIP-2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize the weights of the image encoder and large language model from pre-trained checkpoints and keep them frozen while training the Querying Transformer, which is a BERT-like Transformer encoder that maps a set of \"query tokens\" to query embeddings, which bridge the gap between the embedding space of the image encoder and the large language model. The goal for the model is simply to predict the next text token, giving the query embeddings and the previous text. \"drawing\" This allows the model to be used for tasks like: - image captioning - visual question answering (VQA) - chat-like conversations by feeding the image and the previous conversation as prompt to the model ## Direct Use and Downstream Use You can use the raw model for conditional text generation given an image and optional text. See the model hub to look for fine-tuned versions on a task that interests you. ## Bias, Risks, Limitations, and Ethical Considerations BLIP2-FlanT5 uses off-the-shelf Flan-T5 as the language model. It inherits the same risks and limitations from Flan-T5: > Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). 
Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application. BLIP2 is fine-tuned on image-text datasets (e.g. LAION ) collected from the internet. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. BLIP2 has not been tested in real world applications. It should not be directly deployed in any applications. Researchers should first carefully assess the safety and fairness of the model in relation to the specific context they’re being deployed within. ## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. ### How to use For code examples, we refer to the documentation. #### Running the model on CPU
#### Running the model on GPU ##### In full precision
##### In half precision (float16)
##### In 8-bit precision (int8)
", + "model_explanation_gemini": "Generates text from images for tasks like captioning, visual question answering, and chat-like conversations by combining a frozen image encoder with a large language model." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_blip2-itm-vit-g.json b/data/model_data_json/Salesforce_blip2-itm-vit-g.json new file mode 100644 index 0000000000000000000000000000000000000000..b2451f0b0efcea895d6c794c3f4cef7dfebb94d1 --- /dev/null +++ b/data/model_data_json/Salesforce_blip2-itm-vit-g.json @@ -0,0 +1,17 @@ +{ + "model_id": "Salesforce/blip2-itm-vit-g", + "downloads": 119909, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "blip-2", + "zero-shot-image-classification", + "arxiv:1910.09700", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: mit --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). 
- **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed] ## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.", + "model_explanation_gemini": "The description provides insufficient information to determine the model's core functionality or primary purpose." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_blip2-opt-2.7b.json b/data/model_data_json/Salesforce_blip2-opt-2.7b.json new file mode 100644 index 0000000000000000000000000000000000000000..570626bcd0cdb2c36b29e99fc65bd1e0002d235e --- /dev/null +++ b/data/model_data_json/Salesforce_blip2-opt-2.7b.json @@ -0,0 +1,22 @@ +{ + "model_id": "Salesforce/blip2-opt-2.7b", + "downloads": 873037, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "blip-2", + "visual-question-answering", + "vision", + "image-to-text", + "image-captioning", + "image-text-to-text", + "en", + "arxiv:2301.12597", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: mit tags: - vision - image-to-text - image-captioning - visual-question-answering pipeline_tag: image-text-to-text --- # BLIP-2, OPT-2.7b, pre-trained only BLIP-2 model, leveraging OPT-2.7b (a large language model with 2.7 billion parameters). It was introduced in the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Li et al. and first released in this repository. Disclaimer: The team releasing BLIP-2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize the weights of the image encoder and large language model from pre-trained checkpoints and keep them frozen while training the Querying Transformer, which is a BERT-like Transformer encoder that maps a set of \"query tokens\" to query embeddings, which bridge the gap between the embedding space of the image encoder and the large language model. The goal for the model is simply to predict the next text token, giving the query embeddings and the previous text. 
\"drawing\" This allows the model to be used for tasks like: - image captioning - visual question answering (VQA) - chat-like conversations by feeding the image and the previous conversation as prompt to the model ## Direct Use and Downstream Use You can use the raw model for conditional text generation given an image and optional text. See the model hub to look for fine-tuned versions on a task that interests you. ## Bias, Risks, Limitations, and Ethical Considerations BLIP2-OPT uses off-the-shelf OPT as the language model. It inherits the same risks and limitations as mentioned in Meta's model card. > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. > BLIP2 is fine-tuned on image-text datasets (e.g. LAION ) collected from the internet. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. BLIP2 has not been tested in real world applications. It should not be directly deployed in any applications. Researchers should first carefully assess the safety and fairness of the model in relation to the specific context they’re being deployed within. ## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. ### How to use For code examples, we refer to the documentation. ### Memory requirements The memory requirements differ based on the precision one uses. One can use 4-bit inference using Bitsandbytes, which greatly reduce the memory requirements. | dtype | Largest Layer or Residual Group | Total Size | Training using Adam | |-------------------|---------------------------------|------------|----------------------| | float32 | 490.94 MB | 14.43 GB | 57.72 GB | | float16/bfloat16 | 245.47 MB | 7.21 GB | 28.86 GB | | int8 | 122.73 MB | 3.61 GB | 14.43 GB | | int4 | 61.37 MB | 1.8 GB | 7.21 GB | #### Running the model on CPU
#### Running the model on GPU ##### In full precision
##### In half precision (float16)
##### In 8-bit precision (int8)
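A minimal 8-bit loading sketch with bitsandbytes, matching the int8 row of the memory table above; the image URL and question are placeholders.

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration, BitsAndBytesConfig

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
# 8-bit weights cut memory roughly in line with the int8 row of the table above.
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

img_url = "https://example.com/demo.jpg"  # placeholder image URL
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

inputs = processor(
    images=raw_image, text="Question: how many cats are there? Answer:", return_tensors="pt"
).to(model.device, torch.float16)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True).strip())
```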
", + "model_explanation_gemini": "Generates text (captions, answers, or conversations) from images and optional text inputs by bridging vision and language models." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_blip2-opt-6.7b-coco.json b/data/model_data_json/Salesforce_blip2-opt-6.7b-coco.json new file mode 100644 index 0000000000000000000000000000000000000000..a8d56d868c8d1a7b4527936e35d0227d83051a9a --- /dev/null +++ b/data/model_data_json/Salesforce_blip2-opt-6.7b-coco.json @@ -0,0 +1,22 @@ +{ + "model_id": "Salesforce/blip2-opt-6.7b-coco", + "downloads": 123791, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "blip-2", + "visual-question-answering", + "vision", + "image-to-text", + "image-captioning", + "image-text-to-text", + "en", + "arxiv:2301.12597", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: mit tags: - vision - image-to-text - image-captioning - visual-question-answering pipeline_tag: image-text-to-text --- # BLIP-2, OPT-6.7b, fine-tuned on COCO BLIP-2 model, leveraging OPT-6.7b (a large language model with 6.7 billion parameters). It was introduced in the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Li et al. and first released in this repository. Disclaimer: The team releasing BLIP-2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model. The authors initialize the weights of the image encoder and large language model from pre-trained checkpoints and keep them frozen while training the Querying Transformer, which is a BERT-like Transformer encoder that maps a set of \"query tokens\" to query embeddings, which bridge the gap between the embedding space of the image encoder and the large language model. The goal for the model is simply to predict the next text token, giving the query embeddings and the previous text. \"drawing\" This allows the model to be used for tasks like: - image captioning - visual question answering (VQA) - chat-like conversations by feeding the image and the previous conversation as prompt to the model ## Direct Use and Downstream Use You can use the raw model for conditional text generation given an image and optional text. See the model hub to look for fine-tuned versions on a task that interests you. ## Bias, Risks, Limitations, and Ethical Considerations BLIP2-OPT uses off-the-shelf OPT as the language model. It inherits the same risks and limitations as mentioned in Meta's model card. > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. > BLIP2 is fine-tuned on image-text datasets (e.g. LAION ) collected from the internet. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. BLIP2 has not been tested in real world applications. It should not be directly deployed in any applications. 
Researchers should first carefully assess the safety and fairness of the model in relation to the specific context they’re being deployed within. ## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. ### How to use For code examples, we refer to the documentation.", + "model_explanation_gemini": "Generates text (captions, answers, or conversations) from images and optional text inputs using a vision-language model with a frozen OPT-6.7B language component." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_codet5p-110m-embedding.json b/data/model_data_json/Salesforce_codet5p-110m-embedding.json new file mode 100644 index 0000000000000000000000000000000000000000..5a9f4c8e94159e302bd12290bc264c81d3453c76 --- /dev/null +++ b/data/model_data_json/Salesforce_codet5p-110m-embedding.json @@ -0,0 +1,16 @@ +{ + "model_id": "Salesforce/codet5p-110m-embedding", + "downloads": 105647, + "tags": [ + "transformers", + "pytorch", + "codet5p_embedding", + "custom_code", + "arxiv:2305.07922", + "license:bsd-3-clause", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: bsd-3-clause --- # CodeT5+ 110M Embedding Models ## Model description CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. _encoder-only_, _decoder-only_, and _encoder-decoder_) to support a wide range of code understanding and generation tasks. It is introduced in the paper: CodeT5+: Open Code Large Language Models for Code Understanding and Generation by Yue Wang\\*, Hung Le\\*, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi (* indicates equal contribution). Compared to the original CodeT5 family (base: , large: ), CodeT5+ is pretrained with a diverse set of pretraining tasks including _span denoising_, _causal language modeling_, _contrastive learning_, and _text-code matching_ to learn rich representations from both unimodal code data and bimodal code-text data. Additionally, it employs a simple yet effective _compute-efficient pretraining_ method to initialize the model components with frozen off-the-shelf LLMs such as CodeGen to efficiently scale up the model (i.e. , , ), and adopts a \"shallow encoder and deep decoder\" architecture. Furthermore, it is instruction-tuned to align with natural language instructions (see our InstructCodeT5+ 16B) following Code Alpaca. ## How to use This checkpoint consists of an encoder of CodeT5+ 220M model (pretrained from 2 stages on both unimodal and bimodal) and a projection layer, which can be used to extract code embeddings of 256 dimension. It can be easily loaded using the functionality and employs the same CodeT5 tokenizer. ## Pretraining data This checkpoint is trained on the stricter permissive subset of the deduplicated version of the github-code dataset. 
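The "How to use" section of the CodeT5+ card above refers to a stripped snippet. A minimal sketch, assuming the standard custom-code loading path for this checkpoint (it ships its own modeling code, hence `trust_remote_code=True`); the 256-dimensional output matches the projection layer described in the card:

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "Salesforce/codet5p-110m-embedding"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).to(device)

inputs = tokenizer.encode(
    "def print_hello_world():\n    print('Hello World!')",
    return_tensors="pt",
).to(device)

embedding = model(inputs)[0]  # one 256-dim code embedding for the input snippet
print(embedding.shape)        # expected: torch.Size([256])
```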
The data is preprocessed by retaining only permissively licensed code (\"mit\", \"apache-2\", \"bsd-3-clause\", \"bsd-2-clause\", \"cc0-1.0\", \"unlicense\", \"isc\"). Supported languages (9 in total) are: Ruby, JavaScript, Go, Python, Java, PHP, C, C++, and C#. ## Training procedure This checkpoint is first trained on unimodal code data during first-stage pretraining and then on bimodal text-code pair data using the proposed mixture of pretraining tasks. Please refer to the paper for more details. ## Evaluation results We show the zero-shot results of this checkpoint on 6 downstream code retrieval tasks from CodeXGLUE in the following table. | Ruby | JavaScript | Go | Python | Java | PHP | Overall | | ----- | ---------- | ----- | ------ | ----- | ----- | ------- | | 74.51 | 69.07 | 90.69 | 71.55 | 71.82 | 67.72 | 74.23 | ## BibTeX entry and citation info ## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.", + "model_explanation_gemini": "Generates 256-dimensional code embeddings for retrieval and understanding tasks across multiple programming languages." +} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_moirai-1.0-R-small.json b/data/model_data_json/Salesforce_moirai-1.0-R-small.json new file mode 100644 index 0000000000000000000000000000000000000000..222b9a48eeb634d3037e562e45a6b5e6992a518a --- /dev/null +++ b/data/model_data_json/Salesforce_moirai-1.0-R-small.json @@ -0,0 +1,21 @@ +{ + "model_id": "Salesforce/moirai-1.0-R-small", + "downloads": 109470, + "tags": [ + "transformers", + "safetensors", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2402.02592", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Moirai-1.0-R-Small Moirai, the Masked Encoder-based Universal Time Series Forecasting Transformer, is a Large Time Series Model pre-trained on LOTSA data. For more details on the Moirai architecture, training, and results, please refer to the paper.


Fig. 1: Overall architecture of Moirai. Visualized is a 3-variate time series, where variates 0 and 1 are target variables (i.e., to be forecasted) and variate 2 is a dynamic covariate (values in the forecast horizon are known). Based on a patch size of 64, each variate is patchified into 3 tokens. The patch embeddings, along with sequence and variate IDs, are fed into the Transformer. The shaded patches represent the forecast horizon to be forecasted, whose corresponding output representations are mapped into the mixture distribution parameters.

## Usage To perform inference with Moirai, install the uni2ts library from our GitHub repo. 1. Clone the repository: 2. Create a virtual environment: 3. Build from source: 4. Create a file: A simple example to get started is sketched below, after the model list. ## The Moirai Family | # Model | # Parameters | | :---: | :---: | | Moirai-1.0-R-Small | 14m | | Moirai-1.0-R-Base | 91m | | Moirai-1.0-R-Large | 311m | ## Citation If you're using Uni2TS in your research or applications, please cite it using this BibTeX: ## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.", + "model_explanation_gemini": "Salesforce_moirai-1.0-R-small is a pretrained transformer model for universal time series forecasting, designed to predict future values in multivariate time series data."
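The numbered setup steps above lost their commands in extraction, as did the "simple example". The following is a rough sketch of a Moirai forecast with uni2ts; the class names and constructor arguments mirror the uni2ts README and should be treated as assumptions rather than a pinned API:

```python
import pandas as pd
from gluonts.dataset.pandas import PandasDataset
from uni2ts.model.moirai import MoiraiForecast, MoiraiModule

# Placeholder univariate series: any DataFrame with a DatetimeIndex works.
df = pd.DataFrame(
    {"target": range(300)},
    index=pd.date_range("2020-01-01", periods=300, freq="h"),
)
ds = PandasDataset(dict(df))

model = MoiraiForecast(
    module=MoiraiModule.from_pretrained("Salesforce/moirai-1.0-R-small"),
    prediction_length=20,   # forecast horizon
    context_length=200,     # history fed to the model
    patch_size="auto",
    num_samples=100,        # samples drawn from the mixture distribution
    target_dim=1,
    feat_dynamic_real_dim=ds.num_feat_dynamic_real,
    past_feat_dynamic_real_dim=ds.num_past_feat_dynamic_real,
)

predictor = model.create_predictor(batch_size=32)
forecast = next(iter(predictor.predict(ds)))
print(forecast.mean)  # point forecast for the next 20 steps
```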
+} \ No newline at end of file diff --git a/data/model_data_json/Salesforce_moirai-moe-1.0-R-small.json b/data/model_data_json/Salesforce_moirai-moe-1.0-R-small.json new file mode 100644 index 0000000000000000000000000000000000000000..e8ad2a229d9afdf6a3eb1ba0126c039418d3d1c9 --- /dev/null +++ b/data/model_data_json/Salesforce_moirai-moe-1.0-R-small.json @@ -0,0 +1,18 @@ +{ + "model_id": "Salesforce/moirai-moe-1.0-R-small", + "downloads": 128393, + "tags": [ + "safetensors", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "license:cc-by-nc-4.0", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- This model has been pushed to the Hub using the PytorchModelHubMixin integration: - Library: [More Information Needed] - Docs: [More Information Needed] ## Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.", + "model_explanation_gemini": "Pretrained foundation model for time series forecasting, intended for research purposes only." +} \ No newline at end of file diff --git a/data/model_data_json/SamLowe_roberta-base-go_emotions.json b/data/model_data_json/SamLowe_roberta-base-go_emotions.json new file mode 100644 index 0000000000000000000000000000000000000000..0f43d78e28861ec5f29ba3027e60ba3471977ecb --- /dev/null +++ b/data/model_data_json/SamLowe_roberta-base-go_emotions.json @@ -0,0 +1,23 @@ +{ + "model_id": "SamLowe/roberta-base-go_emotions", + "downloads": 830351, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "roberta", + "text-classification", + "emotions", + "multi-class-classification", + "multi-label-classification", + "en", + "dataset:go_emotions", + "doi:10.57967/hf/3548", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - text-classification - pytorch - roberta - emotions - multi-class-classification - multi-label-classification datasets: - go_emotions license: mit widget: - text: I am not having a great day. --- #### Overview Model trained from roberta-base on the go_emotions dataset for multi-label classification. ##### ONNX version also available A version of this model in ONNX format (including an INT8 quantized ONNX version) is now available at These are faster for inference, esp for smaller batch sizes, massively reduce the size of the dependencies required for inference, make inference of the model more multi-platform, and in the case of the quantized version reduce the model file/download size by 75% whilst retaining almost all the accuracy if you only need inference. #### Dataset used for the model go_emotions is based on Reddit data and has 28 labels. 
It is a multi-label dataset where one or multiple labels may apply for any given input text, hence this model is a multi-label classification model with 28 'probability' float outputs for any given input text. Typically a threshold of 0.5 is applied to the probabilities for the prediction for each label. #### How the model was created The model was trained using with for 3 epochs with a learning rate of 2e-5 and weight decay of 0.01. #### Inference There are multiple ways to use this model in Huggingface Transformers. Possibly the simplest is using a pipeline: #### Evaluation / metrics Evaluation of the model is available at - using the dataset test split gives: - Accuracy: 0.474 - Precision: 0.575 - Recall: 0.396 - F1: 0.450 But the metrics are more meaningful when measured per label given the multi-label nature (each label is effectively an independent binary classification) and the fact that there is drastically different representations of the labels in the dataset. With a threshold of 0.5 applied to binarize the model outputs, as per the above notebook, the metrics per label are: | | accuracy | precision | recall | f1 | mcc | support | threshold | | -------------- | -------- | --------- | ------ | ----- | ----- | ------- | --------- | | admiration | 0.946 | 0.725 | 0.675 | 0.699 | 0.670 | 504 | 0.5 | | amusement | 0.982 | 0.790 | 0.871 | 0.829 | 0.821 | 264 | 0.5 | | anger | 0.970 | 0.652 | 0.379 | 0.479 | 0.483 | 198 | 0.5 | | annoyance | 0.940 | 0.472 | 0.159 | 0.238 | 0.250 | 320 | 0.5 | | approval | 0.942 | 0.609 | 0.302 | 0.404 | 0.403 | 351 | 0.5 | | caring | 0.973 | 0.448 | 0.319 | 0.372 | 0.364 | 135 | 0.5 | | confusion | 0.972 | 0.500 | 0.431 | 0.463 | 0.450 | 153 | 0.5 | | curiosity | 0.950 | 0.537 | 0.356 | 0.428 | 0.412 | 284 | 0.5 | | desire | 0.987 | 0.630 | 0.410 | 0.496 | 0.502 | 83 | 0.5 | | disappointment | 0.974 | 0.625 | 0.199 | 0.302 | 0.343 | 151 | 0.5 | | disapproval | 0.950 | 0.494 | 0.307 | 0.379 | 0.365 | 267 | 0.5 | | disgust | 0.982 | 0.707 | 0.333 | 0.453 | 0.478 | 123 | 0.5 | | embarrassment | 0.994 | 0.750 | 0.243 | 0.367 | 0.425 | 37 | 0.5 | | excitement | 0.983 | 0.603 | 0.340 | 0.435 | 0.445 | 103 | 0.5 | | fear | 0.992 | 0.758 | 0.603 | 0.671 | 0.672 | 78 | 0.5 | | gratitude | 0.990 | 0.960 | 0.881 | 0.919 | 0.914 | 352 | 0.5 | | grief | 0.999 | 0.000 | 0.000 | 0.000 | 0.000 | 6 | 0.5 | | joy | 0.978 | 0.647 | 0.559 | 0.600 | 0.590 | 161 | 0.5 | | love | 0.982 | 0.773 | 0.832 | 0.802 | 0.793 | 238 | 0.5 | | nervousness | 0.996 | 0.600 | 0.130 | 0.214 | 0.278 | 23 | 0.5 | | optimism | 0.972 | 0.667 | 0.376 | 0.481 | 0.488 | 186 | 0.5 | | pride | 0.997 | 0.000 | 0.000 | 0.000 | 0.000 | 16 | 0.5 | | realization | 0.974 | 0.541 | 0.138 | 0.220 | 0.264 | 145 | 0.5 | | relief | 0.998 | 0.000 | 0.000 | 0.000 | 0.000 | 11 | 0.5 | | remorse | 0.991 | 0.553 | 0.750 | 0.636 | 0.640 | 56 | 0.5 | | sadness | 0.977 | 0.621 | 0.494 | 0.550 | 0.542 | 156 | 0.5 | | surprise | 0.981 | 0.750 | 0.404 | 0.525 | 0.542 | 141 | 0.5 | | neutral | 0.782 | 0.694 | 0.604 | 0.646 | 0.492 | 1787 | 0.5 | Optimizing the threshold per label for the one that gives the optimum F1 metrics gives slightly better metrics - sacrificing some precision for a greater gain in recall, hence to the benefit of F1 (how this was done is shown in the above notebook): | | accuracy | precision | recall | f1 | mcc | support | threshold | | -------------- | -------- | --------- | ------ | ----- | ----- | ------- | --------- | | admiration | 0.940 | 0.651 | 0.776 | 0.708 | 0.678 | 504 | 0.25 | | 
amusement | 0.982 | 0.781 | 0.890 | 0.832 | 0.825 | 264 | 0.45 | | anger | 0.959 | 0.454 | 0.601 | 0.517 | 0.502 | 198 | 0.15 | | annoyance | 0.864 | 0.243 | 0.619 | 0.349 | 0.328 | 320 | 0.10 | | approval | 0.926 | 0.432 | 0.442 | 0.437 | 0.397 | 351 | 0.30 | | caring | 0.972 | 0.426 | 0.385 | 0.405 | 0.391 | 135 | 0.40 | | confusion | 0.974 | 0.548 | 0.412 | 0.470 | 0.462 | 153 | 0.55 | | curiosity | 0.943 | 0.473 | 0.711 | 0.568 | 0.552 | 284 | 0.25 | | desire | 0.985 | 0.518 | 0.530 | 0.524 | 0.516 | 83 | 0.25 | | disappointment | 0.974 | 0.562 | 0.298 | 0.390 | 0.398 | 151 | 0.40 | | disapproval | 0.941 | 0.414 | 0.468 | 0.439 | 0.409 | 267 | 0.30 | | disgust | 0.978 | 0.523 | 0.463 | 0.491 | 0.481 | 123 | 0.20 | | embarrassment | 0.994 | 0.567 | 0.459 | 0.507 | 0.507 | 37 | 0.10 | | excitement | 0.981 | 0.500 | 0.417 | 0.455 | 0.447 | 103 | 0.35 | | fear | 0.991 | 0.712 | 0.667 | 0.689 | 0.685 | 78 | 0.40 | | gratitude | 0.990 | 0.957 | 0.889 | 0.922 | 0.917 | 352 | 0.45 | | grief | 0.999 | 0.333 | 0.333 | 0.333 | 0.333 | 6 | 0.05 | | joy | 0.978 | 0.623 | 0.646 | 0.634 | 0.623 | 161 | 0.40 | | love | 0.982 | 0.740 | 0.899 | 0.812 | 0.807 | 238 | 0.25 | | nervousness | 0.996 | 0.571 | 0.348 | 0.432 | 0.444 | 23 | 0.25 | | optimism | 0.971 | 0.580 | 0.565 | 0.572 | 0.557 | 186 | 0.20 | | pride | 0.998 | 0.875 | 0.438 | 0.583 | 0.618 | 16 | 0.10 | | realization | 0.961 | 0.270 | 0.262 | 0.266 | 0.246 | 145 | 0.15 | | relief | 0.992 | 0.152 | 0.636 | 0.246 | 0.309 | 11 | 0.05 | | remorse | 0.991 | 0.541 | 0.946 | 0.688 | 0.712 | 56 | 0.10 | | sadness | 0.977 | 0.599 | 0.583 | 0.591 | 0.579 | 156 | 0.40 | | surprise | 0.977 | 0.543 | 0.674 | 0.601 | 0.593 | 141 | 0.15 | | neutral | 0.758 | 0.598 | 0.810 | 0.688 | 0.513 | 1787 | 0.25 | This improves the overall metrics: - Precision: 0.542 - Recall: 0.577 - F1: 0.541 Or if calculated weighted by the relative size of the support of each label: - Precision: 0.572 - Recall: 0.677 - F1: 0.611 #### Commentary on the dataset Some labels (E.g. gratitude) when considered independently perform very strongly with F1 exceeding 0.9, whilst others (E.g. relief) perform very poorly. This is a challenging dataset. Labels such as relief do have much fewer examples in the training data (less than 100 out of the 40k+, and only 11 in the test split). But there is also some ambiguity and/or labelling errors visible in the training data of go_emotions that is suspected to constrain the performance. Data cleaning on the dataset to reduce some of the mistakes, ambiguity, conflicts and duplication in the labelling would produce a higher performing model.", + "model_explanation_gemini": "Classifies text into 28 emotion categories using multi-label probability scores." 
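The pipeline example mentioned under "Inference" in the go_emotions card above was stripped; a minimal sketch follows (`top_k=None` returns scores for all 28 labels, to which the default 0.5 threshold, or the per-label optimised thresholds tabulated above, can then be applied):

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,  # return scores for all 28 labels, not just the argmax
)

outputs = classifier(["I am not having a great day."])

# Multi-label prediction: keep every label whose probability clears a threshold
# (0.5 by default; the optimised per-label thresholds trade precision for recall).
threshold = 0.5
predicted = [
    [d["label"] for d in scores if d["score"] >= threshold] for scores in outputs
]
print(predicted)
```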
+} \ No newline at end of file diff --git a/data/model_data_json/SamiPie_debbie_llama-8b-ep3-v1-merged.json b/data/model_data_json/SamiPie_debbie_llama-8b-ep3-v1-merged.json new file mode 100644 index 0000000000000000000000000000000000000000..6e9e3be7434f67666da139edc9401d1e62a17809 --- /dev/null +++ b/data/model_data_json/SamiPie_debbie_llama-8b-ep3-v1-merged.json @@ -0,0 +1,23 @@ +{ + "model_id": "SamiPie/debbie_llama-8b-ep3-v1-merged", + "downloads": 167041, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "text-generation-inference", + "unsloth", + "trl", + "sft", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: unsloth/meta-llama-3.1-8b-bnb-4bit tags: - text-generation-inference - transformers - unsloth - llama - trl - sft license: apache-2.0 language: - en --- # Uploaded model - **Developed by:** SamiPie - **License:** apache-2.0 - **Finetuned from model :** unsloth/meta-llama-3.1-8b-bnb-4bit This llama model was trained 2x faster with Unsloth and Huggingface's TRL library. ", + "model_explanation_gemini": "Finetuned from a quantized Llama 3.1-8B model using Unsloth and TRL for faster text generation." +} \ No newline at end of file diff --git a/data/model_data_json/SanctumAI_granite-3b-code-instruct-GGUF.json b/data/model_data_json/SanctumAI_granite-3b-code-instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..069436c0e60732d463fe0e06375677c3882be1e7 --- /dev/null +++ b/data/model_data_json/SanctumAI_granite-3b-code-instruct-GGUF.json @@ -0,0 +1,30 @@ +{ + "model_id": "SanctumAI/granite-3b-code-instruct-GGUF", + "downloads": 106320, + "tags": [ + "transformers", + "gguf", + "ibm-granite-code", + "code", + "granite", + "text-generation", + "dataset:bigcode/commitpackft", + "dataset:TIGER-Lab/MathInstruct", + "dataset:meta-math/MetaMathQA", + "dataset:glaiveai/glaive-code-assistant-v3", + "dataset:glaive-function-calling-v2", + "dataset:bugdaryan/sql-create-context-instruction", + "dataset:garage-bAInd/Open-Platypus", + "dataset:nvidia/HelpSteer", + "arxiv:2405.04324", + "base_model:ibm-granite/granite-3b-code-base-2k", + "base_model:quantized:ibm-granite/granite-3b-code-base-2k", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- pipeline_tag: text-generation base_model: ibm-granite/granite-3b-code-base license: apache-2.0 datasets: - bigcode/commitpackft - TIGER-Lab/MathInstruct - meta-math/MetaMathQA - glaiveai/glaive-code-assistant-v3 - glaive-function-calling-v2 - bugdaryan/sql-create-context-instruction - garage-bAInd/Open-Platypus - nvidia/HelpSteer metrics: - code_eval library_name: transformers tags: - code - granite model-index: - name: granite-3b-code-instruct results: - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalSynthesis(Python) metrics: - name: pass@1 type: pass@1 value: 51.2 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalSynthesis(JavaScript) metrics: - name: pass@1 type: pass@1 value: 43.9 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalSynthesis(Java) metrics: - name: pass@1 type: pass@1 value: 41.5 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalSynthesis(Go) metrics: - name: pass@1 type: pass@1 value: 31.7 veriefied: false 
- task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalSynthesis(C++) metrics: - name: pass@1 type: pass@1 value: 40.2 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalSynthesis(Rust) metrics: - name: pass@1 type: pass@1 value: 29.3 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalExplain(Python) metrics: - name: pass@1 type: pass@1 value: 39.6 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalExplain(JavaScript) metrics: - name: pass@1 type: pass@1 value: 26.8 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalExplain(Java) metrics: - name: pass@1 type: pass@1 value: 39 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalExplain(Go) metrics: - name: pass@1 type: pass@1 value: 14 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalExplain(C++) metrics: - name: pass@1 type: pass@1 value: 23.8 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalExplain(Rust) metrics: - name: pass@1 type: pass@1 value: 12.8 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalFix(Python) metrics: - name: pass@1 type: pass@1 value: 26.8 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalFix(JavaScript) metrics: - name: pass@1 type: pass@1 value: 28 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalFix(Java) metrics: - name: pass@1 type: pass@1 value: 33.5 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalFix(Go) metrics: - name: pass@1 type: pass@1 value: 27.4 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalFix(C++) metrics: - name: pass@1 type: pass@1 value: 31.7 veriefied: false - task: type: text-generation dataset: type: bigcode/humanevalpack name: HumanEvalFix(Rust) metrics: - name: pass@1 type: pass@1 value: 16.5 veriefied: false --- !image/png *This model was quantized by SanctumAI. To leave feedback, join our community in Discord.* # Granite 3B Code Instruct GGUF **Model creator:** ibm-granite
**Original model:** granite-3b-code-instruct
## Model Summary: **Granite-3B-Code-Instruct** is a 3B parameter model fine tuned from *Granite-3B-Code-Base* on a combination of **permissively licensed** instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills. - **Developers:** IBM Research - **GitHub Repository:** ibm-granite/granite-code-models - **Paper:** Granite Code Models: A Family of Open Foundation Models for Code Intelligence - **Release Date**: May 6th, 2024 - **License:** Apache 2.0. ## Prompt Template: If you're using Sanctum app, simply use model preset. Prompt template: ## Hardware Requirements Estimate | Name | Quant method | Size | Memory (RAM, vRAM) required (for full context of 32k tokens) | | ---- | ---- | ---- | ---- | | granite-3b-code-instruct.Q2_K.gguf | Q2_K | 1.34 GB | 4.68 GB | | granite-3b-code-instruct.Q3_K_S.gguf | Q3_K_S | 1.55 GB | ? | | granite-3b-code-instruct.Q3_K_M.gguf | Q3_K_M | 1.73 GB | ? | | granite-3b-code-instruct.Q3_K_L.gguf | Q3_K_L | 1.88 GB | ? | | granite-3b-code-instruct.Q4_0.gguf | Q4_0 | 2.00 GB | ? | | granite-3b-code-instruct.Q4_K_S.gguf | Q4_K_S | 2.01 GB | ? | | granite-3b-code-instruct.Q4_K_M.gguf | Q4_K_M | 2.13 GB | ? | | granite-3b-code-instruct.Q4_K.gguf | Q4_K | 2.13 GB | ? | | granite-3b-code-instruct.Q4_1.gguf | Q4_1 | 2.21 GB | ? | | granite-3b-code-instruct.Q5_0.gguf | Q5_0 | 2.42 GB | ? | | granite-3b-code-instruct.Q5_K_S.gguf | Q5_K_S | 2.42 GB | ? | | granite-3b-code-instruct.Q5_K_M.gguf | Q5_K_M | 2.49 GB | ? | | granite-3b-code-instruct.Q5_K.gguf | Q5_K | 2.49 GB | ? | | granite-3b-code-instruct.Q5_1.gguf | Q5_1 | 2.63 GB | ? | | granite-3b-code-instruct.Q6_K.gguf | Q6_K | 2.86 GB | ? | | granite-3b-code-instruct.Q8_0.gguf | Q8_0 | 3.71 GB | ? | | granite-3b-code-instruct.f16.gguf | f16 | 6.97 GB | 4.68 GB | ## Disclaimer Sanctum is not the creator, originator, or owner of any Model featured in the Models section of the Sanctum application. Each Model is created and provided by third parties. Sanctum does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Model listed there. You understand that supported Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Model is the sole responsibility of the person or entity who originated such Model. Sanctum may not monitor or control the Models supported and cannot, and does not, take responsibility for any such Model. Sanctum disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Models. Sanctum further disclaims any warranty that the Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Models, your downloading of any Model, or use of any other Model provided by or through Sanctum.", + "model_explanation_gemini": "A 3B-parameter code generation model fine-tuned for instruction-following tasks like code synthesis, explanation, and fixing across multiple programming languages." 
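The prompt-template block in the card above did not survive extraction. A hedged sketch of running one of the listed quants with llama-cpp-python; the Question/Answer template is an assumption based on the Granite code-instruct family, and the file path must point at a quant you have actually downloaded:

```python
from llama_cpp import Llama

# Path to a downloaded quant from the table above (hypothetical local file).
llm = Llama(model_path="granite-3b-code-instruct.Q4_K_M.gguf", n_ctx=2048)

prompt = (
    "Question:\nWrite a Python function that checks whether a number is prime.\n\n"
    "Answer:\n"
)
out = llm(prompt, max_tokens=256, temperature=0.2, stop=["Question:"])
print(out["choices"][0]["text"])
```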
+} \ No newline at end of file diff --git a/data/model_data_json/Shushant_nepaliBERT.json b/data/model_data_json/Shushant_nepaliBERT.json new file mode 100644 index 0000000000000000000000000000000000000000..3db737bd3ce0c650204afedfee6c617e7dbb7b61 --- /dev/null +++ b/data/model_data_json/Shushant_nepaliBERT.json @@ -0,0 +1,18 @@ +{ + "model_id": "Shushant/nepaliBERT", + "downloads": 653235, + "tags": [ + "transformers", + "pytorch", + "bert", + "fill-mask", + "ne", + "dataset:Shushant/nepali", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit datasets: - Shushant/nepali language: - ne metrics: - perplexity library_name: transformers pipeline_tag: fill-mask --- # NEPALI BERT ## Masked Language Model for Nepali, trained on Nepali news scraped from different Nepali news websites. The dataset contained about 10 million Nepali sentences, mainly related to news. This model is a fine-tuned version of BERT Base Uncased on a dataset composed of news scraped from Nepali news portals, comprising 4.6 GB of textual data. It achieves the following results on the evaluation set: - Loss: 1.0495 ## Model description Pretraining was done on the BERT base architecture. ## Intended uses & limitations This transformer model can be used for any NLP task involving the Devanagari script. At the time of training, it was the state-of-the-art model for Devanagari data: intrinsic evaluation achieved a perplexity of 8.56, while extrinsic evaluation on sentiment analysis of Nepali tweets outperformed other existing masked language models for Nepali. ## Training and evaluation data The training corpus was developed using 85,467 news articles scraped from different portals. This is a preliminary dataset for the experimentation. The corpus size is about 4.3 GB of textual data, while the evaluation data contains a few news articles totalling about 12 MB. ## Training procedure For the pretraining of the masked language model, the Trainer API from Hugging Face was used. Pretraining took about 3 days, 8 hours and 57 minutes on a Tesla V100 GPU. With 640 Tensor Cores, the Tesla V100 was the world's first GPU to break the 100 teraFLOPS (TFLOPS) barrier of deep learning performance. This GPU was facilitated by the Kathmandu University (KU) supercomputer. Thanks to the KU administration. Usage ## Data Description Trained on about 4.6 GB of Nepali text collected from various sources, including Nepali news sites and the OSCAR Nepali corpus. # Paper and Citation Details If you are interested in the implementation details of this language model, you can read the full paper here. ## Plain Text S. Pudasaini, S. Shakya, A. Tamang, S. Adhikari, S. Thapa and S. Lamichhane, \"NepaliBERT: Pre-training of Masked Language Model in Nepali Corpus,\" 2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Kirtipur, Nepal, 2023, pp. 325-330, doi: 10.1109/I-SMAC58438.2023.10290690.
## Bibtex @INPROCEEDINGS{10290690, author={Pudasaini, Shushanta and Shakya, Subarna and Tamang, Aakash and Adhikari, Sajjan and Thapa, Sunil and Lamichhane, Sagar}, booktitle={2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC)}, title={NepaliBERT: Pre-training of Masked Language Model in Nepali Corpus}, year={2023}, volume={}, number={}, pages={325-330}, doi={10.1109/I-SMAC58438.2023.10290690}}", + "model_explanation_gemini": "A masked language model for Nepali text, fine-tuned on a 4.6 GB corpus of news articles to perform NLP tasks in the Devanagari script." +} \ No newline at end of file diff --git a/data/model_data_json/SimianLuo_LCM_Dreamshaper_v7.json b/data/model_data_json/SimianLuo_LCM_Dreamshaper_v7.json new file mode 100644 index 0000000000000000000000000000000000000000..bf6bb1e50f12ff7b9184970313fed619c8c8618a --- /dev/null +++ b/data/model_data_json/SimianLuo_LCM_Dreamshaper_v7.json @@ -0,0 +1,17 @@ +{ + "model_id": "SimianLuo/LCM_Dreamshaper_v7", + "downloads": 181623, + "tags": [ + "diffusers", + "onnx", + "safetensors", + "text-to-image", + "en", + "arxiv:2310.04378", + "license:mit", + "diffusers:LatentConsistencyModelPipeline", + "region:us" + ], + "description": "--- license: mit language: - en pipeline_tag: text-to-image tags: - text-to-image --- # Latent Consistency Models Official Repository of the paper: *Latent Consistency Models*. Project Page: ## Try our Hugging Face demos: . ## Generation Results:

By distilling classifier-free guidance into the model's input, LCM can generate high-quality images in a very short inference time. We compare inference times at 768 x 768 resolution, CFG scale w=8, and batch size 4, using an A800 GPU.
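A sketch of the few-step sampling this enables, using the diffusers `LatentConsistencyModelPipeline` the repo is tagged for; the 4-step, guidance-scale-8 setting mirrors the comparison above, but treat the exact arguments as assumptions:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
)
pipe.to("cuda")

# LCMs need only a handful of denoising steps instead of the usual 25-50.
images = pipe(
    prompt="a photo of a cat wearing a spacesuit, highly detailed",
    num_inference_steps=4,
    guidance_scale=8.0,
).images
images[0].save("lcm_sample.png")
```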

## Usage You can try out Latent Consistency Models directly on: 1. Install the library: 2. Run the model (as sketched above): ## BibTeX", + "model_explanation_gemini": "Generates high-quality images quickly from text inputs by distilling classifier-free guidance into the model." +} \ No newline at end of file diff --git a/data/model_data_json/Skywork_Skywork-R1V-38B.json b/data/model_data_json/Skywork_Skywork-R1V-38B.json new file mode 100644 index 0000000000000000000000000000000000000000..9624d5b5a3070ac5fad8cf0e4b89102bbdd6e216 --- /dev/null +++ b/data/model_data_json/Skywork_Skywork-R1V-38B.json @@ -0,0 +1,18 @@ +{ + "model_id": "Skywork/Skywork-R1V-38B", + "downloads": 12234, + "tags": [ + "transformers", + "safetensors", + "skywork_chat", + "image-text-to-text", + "conversational", + "custom_code", + "arxiv:2504.05599", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: image-text-to-text library_name: transformers license: mit --- # Skywork-R1V
\"Introduction
## 📖 Technical Report | 💻 GitHub | 🌐 ModelScope

Evaluation results of state-of-the-art LLMs and VLMs (all scores are pass@1; MATH-500, AIME 2024 and GPQA are text reasoning benchmarks, MathVista(mini) and MMMU(Val) are vision benchmarks):

| Model | MATH-500 | AIME 2024 | GPQA | MathVista(mini) | MMMU(Val) |
| --- | --- | --- | --- | --- | --- |
| Qwen2.5-72B-Instruct | 80.0 | 23.3 | 49.0 | - | - |
| Deepseek V3 | 90.2 | 39.2 | 59.1 | - | - |
| Deepseek R1 | 97.3 | 79.8 | 71.5 | - | - |
| Claude 3.5 Sonnet | 78.3 | 16.0 | 65.0 | 65.3 | 66.4 |
| GPT-4o | 74.6 | 9.3 | 49.9 | 63.8 | 69.1 |
| Kimi k1.5 | 96.2 | 77.5 | - | 74.9 | 70.0 |
| Qwen2.5-VL-72B-Instruct | - | - | - | 74.8 | 70.2 |
| LLaVA-Onevision-72B | - | - | - | 67.5 | 56.8 |
| InternVL2-Llama3-76B | - | - | - | 65.5 | 62.7 |
| InternVL2.5-78B | - | - | - | 72.3 | 70.1 |
| Skywork-R1V-38B | 94.0 | 72.0 | 61.6 | 67.5 | 69.0 |
\"skywork_r1v_eval\"
--- ## 4. Usage ### 1. Clone the Repository ### 2. Set Up the Environment ### 3. Run the Inference Script --- ## 5. Citation If you use Skywork-R1V in your research, please cite: *This project is released under an open-source license.* ## Star History", + "model_explanation_gemini": "Generates text outputs from image-text inputs, excelling in vision and reasoning tasks like math problem-solving and visual question answering." +} \ No newline at end of file diff --git a/data/model_data_json/SmilingWolf_wd-vit-large-tagger-v3.json b/data/model_data_json/SmilingWolf_wd-vit-large-tagger-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..18bb5e823bd1f6ffeba401c6f1934d3c6e74555b --- /dev/null +++ b/data/model_data_json/SmilingWolf_wd-vit-large-tagger-v3.json @@ -0,0 +1,13 @@ +{ + "model_id": "SmilingWolf/wd-vit-large-tagger-v3", + "downloads": 351213, + "tags": [ + "timm", + "onnx", + "safetensors", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: timm --- # WD ViT-Large Tagger v3 Supports ratings, characters and general tags. TPUs used for training were kindly provided by the TRC program. ## Dataset Last image id: 7220105 Trained on Danbooru images with IDs modulo 0000-0899. Validated on images with IDs modulo 0950-0999. Images with fewer than 10 general tags were filtered out. Tags with fewer than 600 images were filtered out. ## Validation results ## What's new Model v1.0/Dataset v3: More training images, more and up-to-date tags (up to 2024-02-28). Now compatible! Load it up and give it a spin using the canonical one-liner! ONNX model is compatible with code developed for the v2 series of models. The batch dimension of the ONNX model is not fixed to 1 anymore. Now you can go crazy with batch inference. Switched to Macro-F1 to measure model performance since it gives me a better gauge of overall training progress. # Runtime deps ONNX model requires # Inference code examples For timm: For ONNX: For JAX: ## Final words Subject to change and updates. Downstream users are encouraged to use tagged releases rather than relying on the head of the repo.", + "model_explanation_gemini": "Classifies images with ratings, character tags, and general tags based on Danbooru dataset training."
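The timm/ONNX/JAX snippets referenced in the tagger card above were stripped during extraction. A rough ONNX Runtime sketch follows; the preprocessing (448x448 input, RGB-to-BGR, raw 0-255 floats, NHWC layout) is an assumption carried over from earlier WD tagger releases, and both filenames are hypothetical:

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Exported WD ViT-Large tagger (hypothetical local filename).
session = ort.InferenceSession("model.onnx")

img = Image.open("example.jpg").convert("RGB").resize((448, 448))
x = np.asarray(img, dtype=np.float32)[:, :, ::-1]   # RGB -> BGR (assumed preprocessing)
x = np.ascontiguousarray(np.expand_dims(x, 0))      # NHWC; batch dim is no longer fixed to 1

input_name = session.get_inputs()[0].name
probs = session.run(None, {input_name: x})[0][0]    # one score per tag

# Tag names come from the tag list shipped with the repo (assumed);
# apply a threshold to turn scores into predicted tags.
top = np.argsort(-probs)[:10]
print(top, probs[top])
```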
+} \ No newline at end of file diff --git a/data/model_data_json/Snowflake_snowflake-arctic-embed-l-v2.0.json b/data/model_data_json/Snowflake_snowflake-arctic-embed-l-v2.0.json new file mode 100644 index 0000000000000000000000000000000000000000..eebf7aa5938205e290e9d265e087662bfe87c07c --- /dev/null +++ b/data/model_data_json/Snowflake_snowflake-arctic-embed-l-v2.0.json @@ -0,0 +1,99 @@ +{ + "model_id": "Snowflake/snowflake-arctic-embed-l-v2.0", + "downloads": 233397, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "xlm-roberta", + "feature-extraction", + "sentence-similarity", + "mteb", + "arctic", + "snowflake-arctic-embed", + "transformers.js", + "af", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "gl", + "gu", + "he", + "hi", + "hr", + "ht", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ky", + "lo", + "lt", + "lv", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "pa", + "pl", + "pt", + "qu", + "ro", + "ru", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "uk", + "ur", + "vi", + "yo", + "zh", + "arxiv:2412.04506", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - arctic - snowflake-arctic-embed - transformers.js license: apache-2.0 language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh model-index: - name: snowflake-arctic-embed-l-v2.0 results: - dataset: config: en-ext name: MTEB AmazonCounterfactualClassification (en-ext) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 67.039 - type: f1 value: 55.1806 - type: f1_weighted value: 73.41149999999999 - type: ap value: 17.9914 - type: ap_weighted value: 17.9914 - type: main_score value: 67.039 task: type: Classification - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 65.59700000000001 - type: f1 value: 60.244299999999996 - type: f1_weighted value: 68.9975 - type: ap value: 29.762100000000004 - type: ap_weighted value: 29.762100000000004 - type: main_score value: 65.59700000000001 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification (default) revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 74.2565 - type: f1 value: 74.0291 - type: f1_weighted value: 74.0291 - type: ap value: 68.7595 - type: ap_weighted value: 68.7595 - type: main_score value: 74.2565 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 34.946 - type: f1 value: 
34.2853 - type: f1_weighted value: 34.2853 - type: main_score value: 34.946 task: type: Classification - dataset: config: default name: MTEB ArguAna (default) revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: ndcg_at_1 value: 33.286 - type: ndcg_at_3 value: 49.051 - type: ndcg_at_5 value: 54.107000000000006 - type: ndcg_at_10 value: 59.146 - type: ndcg_at_20 value: 60.897999999999996 - type: ndcg_at_100 value: 61.78399999999999 - type: ndcg_at_1000 value: 61.845000000000006 - type: map_at_1 value: 33.286 - type: map_at_3 value: 45.14 - type: map_at_5 value: 47.939 - type: map_at_10 value: 50.046 - type: map_at_20 value: 50.56 - type: map_at_100 value: 50.708 - type: map_at_1000 value: 50.712 - type: recall_at_1 value: 33.286 - type: recall_at_3 value: 60.38400000000001 - type: recall_at_5 value: 72.688 - type: recall_at_10 value: 88.122 - type: recall_at_20 value: 94.808 - type: recall_at_100 value: 99.21799999999999 - type: recall_at_1000 value: 99.644 - type: precision_at_1 value: 33.286 - type: precision_at_3 value: 20.128 - type: precision_at_5 value: 14.538 - type: precision_at_10 value: 8.812000000000001 - type: precision_at_20 value: 4.74 - type: precision_at_100 value: 0.992 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 33.926 - type: mrr_at_3 value: 45.3414 - type: mrr_at_5 value: 48.1828 - type: mrr_at_10 value: 50.270700000000005 - type: mrr_at_20 value: 50.7844 - type: mrr_at_100 value: 50.9259 - type: mrr_at_1000 value: 50.9294 - type: nauc_ndcg_at_1_max value: -10.305 - type: nauc_ndcg_at_1_std value: -15.674199999999999 - type: nauc_ndcg_at_1_diff1 value: 18.6355 - type: nauc_ndcg_at_3_max value: -7.744 - type: nauc_ndcg_at_3_std value: -16.894000000000002 - type: nauc_ndcg_at_3_diff1 value: 15.4469 - type: nauc_ndcg_at_5_max value: -6.4887 - type: nauc_ndcg_at_5_std value: -16.1382 - type: nauc_ndcg_at_5_diff1 value: 13.8214 - type: nauc_ndcg_at_10_max value: -7.616499999999999 - type: nauc_ndcg_at_10_std value: -15.8073 - type: nauc_ndcg_at_10_diff1 value: 13.7678 - type: nauc_ndcg_at_20_max value: -6.9801 - type: nauc_ndcg_at_20_std value: -15.068699999999998 - type: nauc_ndcg_at_20_diff1 value: 14.2013 - type: nauc_ndcg_at_100_max value: -7.5221 - type: nauc_ndcg_at_100_std value: -15.417200000000001 - type: nauc_ndcg_at_100_diff1 value: 15.1072 - type: nauc_ndcg_at_1000_max value: -7.6931 - type: nauc_ndcg_at_1000_std value: -15.5367 - type: nauc_ndcg_at_1000_diff1 value: 15.001700000000001 - type: nauc_map_at_1_max value: -10.305 - type: nauc_map_at_1_std value: -15.674199999999999 - type: nauc_map_at_1_diff1 value: 18.6355 - type: nauc_map_at_3_max value: -8.4505 - type: nauc_map_at_3_std value: -16.5487 - type: nauc_map_at_3_diff1 value: 15.965599999999998 - type: nauc_map_at_5_max value: -7.8429 - type: nauc_map_at_5_std value: -16.1332 - type: nauc_map_at_5_diff1 value: 15.0893 - type: nauc_map_at_10_max value: -8.3186 - type: nauc_map_at_10_std value: -15.979399999999998 - type: nauc_map_at_10_diff1 value: 15.136199999999999 - type: nauc_map_at_20_max value: -8.1697 - type: nauc_map_at_20_std value: -15.8241 - type: nauc_map_at_20_diff1 value: 15.260599999999998 - type: nauc_map_at_100_max value: -8.2285 - type: nauc_map_at_100_std value: -15.8624 - type: nauc_map_at_100_diff1 value: 15.412600000000001 - type: nauc_map_at_1000_max value: -8.2359 - type: nauc_map_at_1000_std value: -15.867 - type: nauc_map_at_1000_diff1 value: 15.408 - type: nauc_recall_at_1_max value: -10.305 - type: 
nauc_recall_at_1_std value: -15.674199999999999 - type: nauc_recall_at_1_diff1 value: 18.6355 - type: nauc_recall_at_3_max value: -5.5097 - type: nauc_recall_at_3_std value: -17.9896 - type: nauc_recall_at_3_diff1 value: 13.9525 - type: nauc_recall_at_5_max value: -0.9383 - type: nauc_recall_at_5_std value: -16.035 - type: nauc_recall_at_5_diff1 value: 8.8431 - type: nauc_recall_at_10_max value: -2.8548 - type: nauc_recall_at_10_std value: -14.1203 - type: nauc_recall_at_10_diff1 value: 3.2265 - type: nauc_recall_at_20_max value: 14.2043 - type: nauc_recall_at_20_std value: 2.1298999999999997 - type: nauc_recall_at_20_diff1 value: -1.9900000000000002 - type: nauc_recall_at_100_max value: 44.0173 - type: nauc_recall_at_100_std value: 42.131800000000005 - type: nauc_recall_at_100_diff1 value: 29.9983 - type: nauc_recall_at_1000_max value: 25.9434 - type: nauc_recall_at_1000_std value: 53.9252 - type: nauc_recall_at_1000_diff1 value: -0.9778 - type: nauc_precision_at_1_max value: -10.305 - type: nauc_precision_at_1_std value: -15.674199999999999 - type: nauc_precision_at_1_diff1 value: 18.6355 - type: nauc_precision_at_3_max value: -5.5097 - type: nauc_precision_at_3_std value: -17.9896 - type: nauc_precision_at_3_diff1 value: 13.9525 - type: nauc_precision_at_5_max value: -0.9383 - type: nauc_precision_at_5_std value: -16.035 - type: nauc_precision_at_5_diff1 value: 8.8431 - type: nauc_precision_at_10_max value: -2.8548 - type: nauc_precision_at_10_std value: -14.1203 - type: nauc_precision_at_10_diff1 value: 3.2265 - type: nauc_precision_at_20_max value: 14.2043 - type: nauc_precision_at_20_std value: 2.1298999999999997 - type: nauc_precision_at_20_diff1 value: -1.9900000000000002 - type: nauc_precision_at_100_max value: 44.0173 - type: nauc_precision_at_100_std value: 42.131800000000005 - type: nauc_precision_at_100_diff1 value: 29.9983 - type: nauc_precision_at_1000_max value: 25.9434 - type: nauc_precision_at_1000_std value: 53.9252 - type: nauc_precision_at_1000_diff1 value: -0.9778 - type: nauc_mrr_at_1_max value: -9.833 - type: nauc_mrr_at_1_std value: -14.8351 - type: nauc_mrr_at_1_diff1 value: 16.7604 - type: nauc_mrr_at_3_max value: -9.0116 - type: nauc_mrr_at_3_std value: -16.296 - type: nauc_mrr_at_3_diff1 value: 14.178199999999999 - type: nauc_mrr_at_5_max value: -8.308300000000001 - type: nauc_mrr_at_5_std value: -15.751999999999999 - type: nauc_mrr_at_5_diff1 value: 13.306299999999998 - type: nauc_mrr_at_10_max value: -8.7962 - type: nauc_mrr_at_10_std value: -15.688099999999999 - type: nauc_mrr_at_10_diff1 value: 13.2589 - type: nauc_mrr_at_20_max value: -8.6773 - type: nauc_mrr_at_20_std value: -15.479499999999998 - type: nauc_mrr_at_20_diff1 value: 13.354 - type: nauc_mrr_at_100_max value: -8.7533 - type: nauc_mrr_at_100_std value: -15.553600000000001 - type: nauc_mrr_at_100_diff1 value: 13.4796 - type: nauc_mrr_at_1000_max value: -8.7608 - type: nauc_mrr_at_1000_std value: -15.5582 - type: nauc_mrr_at_1000_diff1 value: 13.4748 - type: main_score value: 59.146 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P (default) revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: v_measure value: 43.9715 - type: v_measure_std value: 13.4325 - type: main_score value: 43.9715 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S (default) revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: v_measure value: 
34.775800000000004 - type: v_measure_std value: 13.922799999999999 - type: main_score value: 34.775800000000004 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions (default) revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: map value: 63.3521 - type: mrr value: 77.5965 - type: nAUC_map_max value: 21.2353 - type: nAUC_map_std value: 17.002100000000002 - type: nAUC_map_diff1 value: 3.8135000000000003 - type: nAUC_mrr_max value: 35.058299999999996 - type: nAUC_mrr_std value: 20.432 - type: nAUC_mrr_diff1 value: 9.2584 - type: main_score value: 63.3521 task: type: Reranking - dataset: config: default name: MTEB BIOSSES (default) revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: pearson value: 89.8072 - type: spearman value: 87.2875 - type: cosine_pearson value: 89.8072 - type: cosine_spearman value: 87.2875 - type: manhattan_pearson value: 87.9173 - type: manhattan_spearman value: 86.7327 - type: euclidean_pearson value: 88.21600000000001 - type: euclidean_spearman value: 87.2875 - type: main_score value: 87.2875 task: type: STS - dataset: config: default name: MTEB Banking77Classification (default) revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 81.8149 - type: f1 value: 81.2226 - type: f1_weighted value: 81.2226 - type: main_score value: 81.8149 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P (default) revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: v_measure value: 35.0927 - type: v_measure_std value: 0.7048 - type: main_score value: 35.0927 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S (default) revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: v_measure value: 30.220999999999997 - type: v_measure_std value: 1.107 - type: main_score value: 30.220999999999997 task: type: Clustering - dataset: config: default name: MTEB CQADupstackAndroidRetrieval (default) revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: mteb/cqadupstack-android metrics: - type: ndcg_at_1 value: 44.349 - type: ndcg_at_3 value: 50.109 - type: ndcg_at_5 value: 52.88699999999999 - type: ndcg_at_10 value: 55.799 - type: ndcg_at_20 value: 57.589999999999996 - type: ndcg_at_100 value: 60.539 - type: ndcg_at_1000 value: 61.897000000000006 - type: map_at_1 value: 36.230000000000004 - type: map_at_3 value: 44.929 - type: map_at_5 value: 47.191 - type: map_at_10 value: 48.88 - type: map_at_20 value: 49.685 - type: map_at_100 value: 50.327 - type: map_at_1000 value: 50.431000000000004 - type: recall_at_1 value: 36.230000000000004 - type: recall_at_3 value: 53.173 - type: recall_at_5 value: 60.35 - type: recall_at_10 value: 69.07 - type: recall_at_20 value: 75.371 - type: recall_at_100 value: 88.736 - type: recall_at_1000 value: 96.75399999999999 - type: precision_at_1 value: 44.349 - type: precision_at_3 value: 23.748 - type: precision_at_5 value: 17.368 - type: precision_at_10 value: 10.629 - type: precision_at_20 value: 6.152 - type: precision_at_100 value: 1.6150000000000002 - type: precision_at_1000 value: 0.201 - type: mrr_at_1 value: 44.3491 - type: mrr_at_3 value: 52.0744 - type: mrr_at_5 value: 53.9628 - type: mrr_at_10 value: 54.9072 - type: mrr_at_20 value: 55.19539999999999 - type: 
mrr_at_100 value: 55.4537 - type: mrr_at_1000 value: 55.4787 - type: nauc_ndcg_at_1_max value: 36.404599999999995 - type: nauc_ndcg_at_1_std value: -4.5556 - type: nauc_ndcg_at_1_diff1 value: 57.4025 - type: nauc_ndcg_at_3_max value: 38.0347 - type: nauc_ndcg_at_3_std value: -2.2339 - type: nauc_ndcg_at_3_diff1 value: 50.9146 - type: nauc_ndcg_at_5_max value: 38.2927 - type: nauc_ndcg_at_5_std value: -2.3645 - type: nauc_ndcg_at_5_diff1 value: 51.638 - type: nauc_ndcg_at_10_max value: 38.4619 - type: nauc_ndcg_at_10_std value: -2.8955 - type: nauc_ndcg_at_10_diff1 value: 51.35849999999999 - type: nauc_ndcg_at_20_max value: 38.2122 - type: nauc_ndcg_at_20_std value: -1.9339 - type: nauc_ndcg_at_20_diff1 value: 50.4981 - type: nauc_ndcg_at_100_max value: 39.380900000000004 - type: nauc_ndcg_at_100_std value: -0.21889999999999998 - type: nauc_ndcg_at_100_diff1 value: 51.5696 - type: nauc_ndcg_at_1000_max value: 38.9069 - type: nauc_ndcg_at_1000_std value: -0.8251 - type: nauc_ndcg_at_1000_diff1 value: 51.605500000000006 - type: nauc_map_at_1_max value: 31.694 - type: nauc_map_at_1_std value: -4.2857 - type: nauc_map_at_1_diff1 value: 57.991400000000006 - type: nauc_map_at_3_max value: 36.115399999999994 - type: nauc_map_at_3_std value: -3.9859999999999998 - type: nauc_map_at_3_diff1 value: 52.394 - type: nauc_map_at_5_max value: 36.896499999999996 - type: nauc_map_at_5_std value: -3.6282 - type: nauc_map_at_5_diff1 value: 52.7023 - type: nauc_map_at_10_max value: 37.2695 - type: nauc_map_at_10_std value: -3.7142 - type: nauc_map_at_10_diff1 value: 52.6081 - type: nauc_map_at_20_max value: 37.4097 - type: nauc_map_at_20_std value: -3.0479 - type: nauc_map_at_20_diff1 value: 52.2999 - type: nauc_map_at_100_max value: 37.6608 - type: nauc_map_at_100_std value: -2.7363999999999997 - type: nauc_map_at_100_diff1 value: 52.5068 - type: nauc_map_at_1000_max value: 37.6406 - type: nauc_map_at_1000_std value: -2.7695000000000003 - type: nauc_map_at_1000_diff1 value: 52.5091 - type: nauc_recall_at_1_max value: 31.694 - type: nauc_recall_at_1_std value: -4.2857 - type: nauc_recall_at_1_diff1 value: 57.991400000000006 - type: nauc_recall_at_3_max value: 35.9705 - type: nauc_recall_at_3_std value: -2.78 - type: nauc_recall_at_3_diff1 value: 44.2342 - type: nauc_recall_at_5_max value: 36.3608 - type: nauc_recall_at_5_std value: -1.8541999999999998 - type: nauc_recall_at_5_diff1 value: 45.0955 - type: nauc_recall_at_10_max value: 35.7364 - type: nauc_recall_at_10_std value: -3.2479 - type: nauc_recall_at_10_diff1 value: 42.3031 - type: nauc_recall_at_20_max value: 34.7814 - type: nauc_recall_at_20_std value: 0.7642 - type: nauc_recall_at_20_diff1 value: 37.3357 - type: nauc_recall_at_100_max value: 49.1721 - type: nauc_recall_at_100_std value: 27.8334 - type: nauc_recall_at_100_diff1 value: 39.549 - type: nauc_recall_at_1000_max value: 59.516400000000004 - type: nauc_recall_at_1000_std value: 66.1089 - type: nauc_recall_at_1000_diff1 value: 31.4818 - type: nauc_precision_at_1_max value: 36.404599999999995 - type: nauc_precision_at_1_std value: -4.5556 - type: nauc_precision_at_1_diff1 value: 57.4025 - type: nauc_precision_at_3_max value: 35.7954 - type: nauc_precision_at_3_std value: 0.6122 - type: nauc_precision_at_3_diff1 value: 29.4346 - type: nauc_precision_at_5_max value: 31.322699999999998 - type: nauc_precision_at_5_std value: 2.2124 - type: nauc_precision_at_5_diff1 value: 21.1992 - type: nauc_precision_at_10_max value: 22.6897 - type: nauc_precision_at_10_std value: 3.6117999999999997 - type: 
nauc_precision_at_10_diff1 value: 9.0833 - type: nauc_precision_at_20_max value: 14.954799999999999 - type: nauc_precision_at_20_std value: 7.2373 - type: nauc_precision_at_20_diff1 value: -0.544 - type: nauc_precision_at_100_max value: 4.2428 - type: nauc_precision_at_100_std value: 7.3461 - type: nauc_precision_at_100_diff1 value: -11.3684 - type: nauc_precision_at_1000_max value: -9.148399999999999 - type: nauc_precision_at_1000_std value: -3.5724 - type: nauc_precision_at_1000_diff1 value: -19.142400000000002 - type: nauc_mrr_at_1_max value: 36.404599999999995 - type: nauc_mrr_at_1_std value: -4.5556 - type: nauc_mrr_at_1_diff1 value: 57.4025 - type: nauc_mrr_at_3_max value: 38.7222 - type: nauc_mrr_at_3_std value: -2.3924000000000003 - type: nauc_mrr_at_3_diff1 value: 52.7995 - type: nauc_mrr_at_5_max value: 38.7579 - type: nauc_mrr_at_5_std value: -2.6441 - type: nauc_mrr_at_5_diff1 value: 53.547599999999996 - type: nauc_mrr_at_10_max value: 38.7832 - type: nauc_mrr_at_10_std value: -2.5202999999999998 - type: nauc_mrr_at_10_diff1 value: 53.4856 - type: nauc_mrr_at_20_max value: 38.6588 - type: nauc_mrr_at_20_std value: -2.501 - type: nauc_mrr_at_20_diff1 value: 53.3571 - type: nauc_mrr_at_100_max value: 38.6456 - type: nauc_mrr_at_100_std value: -2.4756 - type: nauc_mrr_at_100_diff1 value: 53.455600000000004 - type: nauc_mrr_at_1000_max value: 38.6449 - type: nauc_mrr_at_1000_std value: -2.4623 - type: nauc_mrr_at_1000_diff1 value: 53.45419999999999 - type: main_score value: 55.799 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval (default) revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: mteb/cqadupstack-english metrics: - type: ndcg_at_1 value: 44.204 - type: ndcg_at_3 value: 49.549 - type: ndcg_at_5 value: 51.658 - type: ndcg_at_10 value: 53.681 - type: ndcg_at_20 value: 55.129 - type: ndcg_at_100 value: 57.691 - type: ndcg_at_1000 value: 59.325 - type: map_at_1 value: 35.193000000000005 - type: map_at_3 value: 44.005 - type: map_at_5 value: 46.043 - type: map_at_10 value: 47.491 - type: map_at_20 value: 48.169000000000004 - type: map_at_100 value: 48.789 - type: map_at_1000 value: 48.898 - type: recall_at_1 value: 35.193000000000005 - type: recall_at_3 value: 51.333 - type: recall_at_5 value: 57.436 - type: recall_at_10 value: 63.991 - type: recall_at_20 value: 69.37100000000001 - type: recall_at_100 value: 81.099 - type: recall_at_1000 value: 91.363 - type: precision_at_1 value: 44.204 - type: precision_at_3 value: 24.374000000000002 - type: precision_at_5 value: 17.287 - type: precision_at_10 value: 10.293 - type: precision_at_20 value: 5.943 - type: precision_at_100 value: 1.5730000000000002 - type: precision_at_1000 value: 0.197 - type: mrr_at_1 value: 44.2038 - type: mrr_at_3 value: 51.624199999999995 - type: mrr_at_5 value: 52.9459 - type: mrr_at_10 value: 53.697399999999995 - type: mrr_at_20 value: 54.028200000000005 - type: mrr_at_100 value: 54.267900000000004 - type: mrr_at_1000 value: 54.3028 - type: nauc_ndcg_at_1_max value: 45.3525 - type: nauc_ndcg_at_1_std value: -2.2124 - type: nauc_ndcg_at_1_diff1 value: 59.392100000000006 - type: nauc_ndcg_at_3_max value: 46.6258 - type: nauc_ndcg_at_3_std value: -2.8042000000000002 - type: nauc_ndcg_at_3_diff1 value: 55.0995 - type: nauc_ndcg_at_5_max value: 47.3391 - type: nauc_ndcg_at_5_std value: -1.8336999999999999 - type: nauc_ndcg_at_5_diff1 value: 54.848 - type: nauc_ndcg_at_10_max value: 47.713899999999995 - type: nauc_ndcg_at_10_std value: -0.6185 - type: 
nauc_ndcg_at_10_diff1 value: 54.6241 - type: nauc_ndcg_at_20_max value: 48.072900000000004 - type: nauc_ndcg_at_20_std value: -0.21589999999999998 - type: nauc_ndcg_at_20_diff1 value: 54.655100000000004 - type: nauc_ndcg_at_100_max value: 48.4791 - type: nauc_ndcg_at_100_std value: 1.9865000000000002 - type: nauc_ndcg_at_100_diff1 value: 54.033 - type: nauc_ndcg_at_1000_max value: 48.3686 - type: nauc_ndcg_at_1000_std value: 1.8716 - type: nauc_ndcg_at_1000_diff1 value: 54.125 - type: nauc_map_at_1_max value: 34.797200000000004 - type: nauc_map_at_1_std value: -13.140199999999998 - type: nauc_map_at_1_diff1 value: 61.197100000000006 - type: nauc_map_at_3_max value: 41.4347 - type: nauc_map_at_3_std value: -10.0816 - type: nauc_map_at_3_diff1 value: 57.8979 - type: nauc_map_at_5_max value: 43.1536 - type: nauc_map_at_5_std value: -7.8041 - type: nauc_map_at_5_diff1 value: 57.1125 - type: nauc_map_at_10_max value: 44.243700000000004 - type: nauc_map_at_10_std value: -6.047000000000001 - type: nauc_map_at_10_diff1 value: 56.688700000000004 - type: nauc_map_at_20_max value: 44.7799 - type: nauc_map_at_20_std value: -5.2916 - type: nauc_map_at_20_diff1 value: 56.565799999999996 - type: nauc_map_at_100_max value: 45.3233 - type: nauc_map_at_100_std value: -4.287 - type: nauc_map_at_100_diff1 value: 56.41460000000001 - type: nauc_map_at_1000_max value: 45.3992 - type: nauc_map_at_1000_std value: -4.1593 - type: nauc_map_at_1000_diff1 value: 56.413599999999995 - type: nauc_recall_at_1_max value: 34.797200000000004 - type: nauc_recall_at_1_std value: -13.140199999999998 - type: nauc_recall_at_1_diff1 value: 61.197100000000006 - type: nauc_recall_at_3_max value: 42.7264 - type: nauc_recall_at_3_std value: -8.201799999999999 - type: nauc_recall_at_3_diff1 value: 52.3494 - type: nauc_recall_at_5_max value: 44.6494 - type: nauc_recall_at_5_std value: -3.3112999999999997 - type: nauc_recall_at_5_diff1 value: 50.1019 - type: nauc_recall_at_10_max value: 46.6669 - type: nauc_recall_at_10_std value: 2.3359 - type: nauc_recall_at_10_diff1 value: 48.1454 - type: nauc_recall_at_20_max value: 48.7828 - type: nauc_recall_at_20_std value: 6.0266 - type: nauc_recall_at_20_diff1 value: 46.786699999999996 - type: nauc_recall_at_100_max value: 53.081999999999994 - type: nauc_recall_at_100_std value: 24.1569 - type: nauc_recall_at_100_diff1 value: 40.4049 - type: nauc_recall_at_1000_max value: 55.803000000000004 - type: nauc_recall_at_1000_std value: 36.3769 - type: nauc_recall_at_1000_diff1 value: 34.336 - type: nauc_precision_at_1_max value: 45.3525 - type: nauc_precision_at_1_std value: -2.2124 - type: nauc_precision_at_1_diff1 value: 59.392100000000006 - type: nauc_precision_at_3_max value: 44.2838 - type: nauc_precision_at_3_std value: 14.3908 - type: nauc_precision_at_3_diff1 value: 27.219700000000003 - type: nauc_precision_at_5_max value: 42.9914 - type: nauc_precision_at_5_std value: 23.0682 - type: nauc_precision_at_5_diff1 value: 16.2263 - type: nauc_precision_at_10_max value: 38.5042 - type: nauc_precision_at_10_std value: 30.792199999999998 - type: nauc_precision_at_10_diff1 value: 5.7691 - type: nauc_precision_at_20_max value: 34.417500000000004 - type: nauc_precision_at_20_std value: 34.1749 - type: nauc_precision_at_20_diff1 value: -0.9022 - type: nauc_precision_at_100_max value: 27.4072 - type: nauc_precision_at_100_std value: 42.4351 - type: nauc_precision_at_100_diff1 value: -11.407 - type: nauc_precision_at_1000_max value: 16.142400000000002 - type: nauc_precision_at_1000_std value: 36.4482 - type: 
nauc_precision_at_1000_diff1 value: -16.8073 - type: nauc_mrr_at_1_max value: 45.3525 - type: nauc_mrr_at_1_std value: -2.2124 - type: nauc_mrr_at_1_diff1 value: 59.392100000000006 - type: nauc_mrr_at_3_max value: 48.7407 - type: nauc_mrr_at_3_std value: 0.2074 - type: nauc_mrr_at_3_diff1 value: 55.8153 - type: nauc_mrr_at_5_max value: 48.9081 - type: nauc_mrr_at_5_std value: 0.9781 - type: nauc_mrr_at_5_diff1 value: 55.6807 - type: nauc_mrr_at_10_max value: 48.7888 - type: nauc_mrr_at_10_std value: 1.384 - type: nauc_mrr_at_10_diff1 value: 55.5207 - type: nauc_mrr_at_20_max value: 48.7371 - type: nauc_mrr_at_20_std value: 1.3671 - type: nauc_mrr_at_20_diff1 value: 55.508199999999995 - type: nauc_mrr_at_100_max value: 48.7472 - type: nauc_mrr_at_100_std value: 1.5221 - type: nauc_mrr_at_100_diff1 value: 55.5036 - type: nauc_mrr_at_1000_max value: 48.7402 - type: nauc_mrr_at_1000_std value: 1.5072 - type: nauc_mrr_at_1000_diff1 value: 55.507 - type: main_score value: 53.681 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval (default) revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: mteb/cqadupstack-gaming metrics: - type: ndcg_at_1 value: 50.345 - type: ndcg_at_3 value: 57.776 - type: ndcg_at_5 value: 60.477000000000004 - type: ndcg_at_10 value: 63.172 - type: ndcg_at_20 value: 64.62 - type: ndcg_at_100 value: 66.538 - type: ndcg_at_1000 value: 67.43 - type: map_at_1 value: 44.153 - type: map_at_3 value: 53.979 - type: map_at_5 value: 55.925000000000004 - type: map_at_10 value: 57.32899999999999 - type: map_at_20 value: 57.879000000000005 - type: map_at_100 value: 58.239 - type: map_at_1000 value: 58.285 - type: recall_at_1 value: 44.153 - type: recall_at_3 value: 62.766999999999996 - type: recall_at_5 value: 69.405 - type: recall_at_10 value: 77.107 - type: recall_at_20 value: 82.337 - type: recall_at_100 value: 91.307 - type: recall_at_1000 value: 97.586 - type: precision_at_1 value: 50.345 - type: precision_at_3 value: 25.601000000000003 - type: precision_at_5 value: 17.416999999999998 - type: precision_at_10 value: 9.994 - type: precision_at_20 value: 5.492 - type: precision_at_100 value: 1.261 - type: precision_at_1000 value: 0.13799999999999998 - type: mrr_at_1 value: 50.3448 - type: mrr_at_3 value: 58.160900000000005 - type: mrr_at_5 value: 59.549600000000005 - type: mrr_at_10 value: 60.545899999999996 - type: mrr_at_20 value: 60.8453 - type: mrr_at_100 value: 61.06120000000001 - type: mrr_at_1000 value: 61.083299999999994 - type: nauc_ndcg_at_1_max value: 39.467400000000005 - type: nauc_ndcg_at_1_std value: -6.512 - type: nauc_ndcg_at_1_diff1 value: 57.337700000000005 - type: nauc_ndcg_at_3_max value: 42.8884 - type: nauc_ndcg_at_3_std value: -6.0156 - type: nauc_ndcg_at_3_diff1 value: 54.432 - type: nauc_ndcg_at_5_max value: 44.831500000000005 - type: nauc_ndcg_at_5_std value: -4.3286999999999995 - type: nauc_ndcg_at_5_diff1 value: 54.6971 - type: nauc_ndcg_at_10_max value: 44.391799999999996 - type: nauc_ndcg_at_10_std value: -3.6792 - type: nauc_ndcg_at_10_diff1 value: 53.749199999999995 - type: nauc_ndcg_at_20_max value: 44.9459 - type: nauc_ndcg_at_20_std value: -2.1965 - type: nauc_ndcg_at_20_diff1 value: 53.7261 - type: nauc_ndcg_at_100_max value: 45.0603 - type: nauc_ndcg_at_100_std value: -1.1026 - type: nauc_ndcg_at_100_diff1 value: 54.059900000000006 - type: nauc_ndcg_at_1000_max value: 44.9294 - type: nauc_ndcg_at_1000_std value: -1.7629 - type: nauc_ndcg_at_1000_diff1 value: 54.57189999999999 - type: 
nauc_map_at_1_max value: 34.3031 - type: nauc_map_at_1_std value: -8.9637 - type: nauc_map_at_1_diff1 value: 57.99100000000001 - type: nauc_map_at_3_max value: 40.732 - type: nauc_map_at_3_std value: -8.312999999999999 - type: nauc_map_at_3_diff1 value: 55.9106 - type: nauc_map_at_5_max value: 42.1709 - type: nauc_map_at_5_std value: -6.9354 - type: nauc_map_at_5_diff1 value: 56.042899999999996 - type: nauc_map_at_10_max value: 42.1589 - type: nauc_map_at_10_std value: -6.3601 - type: nauc_map_at_10_diff1 value: 55.490700000000004 - type: nauc_map_at_20_max value: 42.595 - type: nauc_map_at_20_std value: -5.5588 - type: nauc_map_at_20_diff1 value: 55.4651 - type: nauc_map_at_100_max value: 42.6911 - type: nauc_map_at_100_std value: -5.2459999999999996 - type: nauc_map_at_100_diff1 value: 55.45060000000001 - type: nauc_map_at_1000_max value: 42.7134 - type: nauc_map_at_1000_std value: -5.2317 - type: nauc_map_at_1000_diff1 value: 55.4871 - type: nauc_recall_at_1_max value: 34.3031 - type: nauc_recall_at_1_std value: -8.9637 - type: nauc_recall_at_1_diff1 value: 57.99100000000001 - type: nauc_recall_at_3_max value: 43.623400000000004 - type: nauc_recall_at_3_std value: -6.2843 - type: nauc_recall_at_3_diff1 value: 50.775800000000004 - type: nauc_recall_at_5_max value: 48.7222 - type: nauc_recall_at_5_std value: -0.9506000000000001 - type: nauc_recall_at_5_diff1 value: 50.41480000000001 - type: nauc_recall_at_10_max value: 47.6178 - type: nauc_recall_at_10_std value: 2.2783 - type: nauc_recall_at_10_diff1 value: 45.1663 - type: nauc_recall_at_20_max value: 51.454 - type: nauc_recall_at_20_std value: 11.8339 - type: nauc_recall_at_20_diff1 value: 42.8694 - type: nauc_recall_at_100_max value: 58.145500000000006 - type: nauc_recall_at_100_std value: 35.4717 - type: nauc_recall_at_100_diff1 value: 40.8401 - type: nauc_recall_at_1000_max value: 79.9122 - type: nauc_recall_at_1000_std value: 64.5076 - type: nauc_recall_at_1000_diff1 value: 48.7357 - type: nauc_precision_at_1_max value: 39.467400000000005 - type: nauc_precision_at_1_std value: -6.512 - type: nauc_precision_at_1_diff1 value: 57.337700000000005 - type: nauc_precision_at_3_max value: 39.763799999999996 - type: nauc_precision_at_3_std value: 2.8881 - type: nauc_precision_at_3_diff1 value: 30.5735 - type: nauc_precision_at_5_max value: 38.062200000000004 - type: nauc_precision_at_5_std value: 10.2952 - type: nauc_precision_at_5_diff1 value: 21.2531 - type: nauc_precision_at_10_max value: 31.330099999999998 - type: nauc_precision_at_10_std value: 16.6561 - type: nauc_precision_at_10_diff1 value: 8.4745 - type: nauc_precision_at_20_max value: 28.5499 - type: nauc_precision_at_20_std value: 25.593300000000003 - type: nauc_precision_at_20_diff1 value: 0.8708 - type: nauc_precision_at_100_max value: 20.275299999999998 - type: nauc_precision_at_100_std value: 31.6878 - type: nauc_precision_at_100_diff1 value: -8.8113 - type: nauc_precision_at_1000_max value: 15.4133 - type: nauc_precision_at_1000_std value: 29.5211 - type: nauc_precision_at_1000_diff1 value: -11.061300000000001 - type: nauc_mrr_at_1_max value: 39.467400000000005 - type: nauc_mrr_at_1_std value: -6.512 - type: nauc_mrr_at_1_diff1 value: 57.337700000000005 - type: nauc_mrr_at_3_max value: 42.9279 - type: nauc_mrr_at_3_std value: -5.251200000000001 - type: nauc_mrr_at_3_diff1 value: 54.8802 - type: nauc_mrr_at_5_max value: 43.5261 - type: nauc_mrr_at_5_std value: -4.4842 - type: nauc_mrr_at_5_diff1 value: 54.874500000000005 - type: nauc_mrr_at_10_max value: 43.2392 - type: 
nauc_mrr_at_10_std value: -4.2739 - type: nauc_mrr_at_10_diff1 value: 54.5466 - type: nauc_mrr_at_20_max value: 43.2263 - type: nauc_mrr_at_20_std value: -4.122 - type: nauc_mrr_at_20_diff1 value: 54.5397 - type: nauc_mrr_at_100_max value: 43.2131 - type: nauc_mrr_at_100_std value: -4.041 - type: nauc_mrr_at_100_diff1 value: 54.586800000000004 - type: nauc_mrr_at_1000_max value: 43.2078 - type: nauc_mrr_at_1000_std value: -4.0622 - type: nauc_mrr_at_1000_diff1 value: 54.606100000000005 - type: main_score value: 63.172 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval (default) revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: mteb/cqadupstack-gis metrics: - type: ndcg_at_1 value: 32.429 - type: ndcg_at_3 value: 39.639 - type: ndcg_at_5 value: 42.051 - type: ndcg_at_10 value: 44.759 - type: ndcg_at_20 value: 46.588 - type: ndcg_at_100 value: 49.457 - type: ndcg_at_1000 value: 51.248000000000005 - type: map_at_1 value: 30.259999999999998 - type: map_at_3 value: 36.998 - type: map_at_5 value: 38.452 - type: map_at_10 value: 39.653 - type: map_at_20 value: 40.199 - type: map_at_100 value: 40.63 - type: map_at_1000 value: 40.701 - type: recall_at_1 value: 30.259999999999998 - type: recall_at_3 value: 44.531 - type: recall_at_5 value: 50.349999999999994 - type: recall_at_10 value: 58.294999999999995 - type: recall_at_20 value: 65.19200000000001 - type: recall_at_100 value: 79.699 - type: recall_at_1000 value: 93.181 - type: precision_at_1 value: 32.429 - type: precision_at_3 value: 16.61 - type: precision_at_5 value: 11.39 - type: precision_at_10 value: 6.746 - type: precision_at_20 value: 3.8019999999999996 - type: precision_at_100 value: 0.963 - type: precision_at_1000 value: 0.11399999999999999 - type: mrr_at_1 value: 32.4294 - type: mrr_at_3 value: 39.265499999999996 - type: mrr_at_5 value: 40.6158 - type: mrr_at_10 value: 41.7454 - type: mrr_at_20 value: 42.187999999999995 - type: mrr_at_100 value: 42.530699999999996 - type: mrr_at_1000 value: 42.584300000000006 - type: nauc_ndcg_at_1_max value: 30.2344 - type: nauc_ndcg_at_1_std value: -8.76 - type: nauc_ndcg_at_1_diff1 value: 43.3339 - type: nauc_ndcg_at_3_max value: 31.300299999999996 - type: nauc_ndcg_at_3_std value: -5.2691 - type: nauc_ndcg_at_3_diff1 value: 39.6872 - type: nauc_ndcg_at_5_max value: 31.844099999999997 - type: nauc_ndcg_at_5_std value: -4.228400000000001 - type: nauc_ndcg_at_5_diff1 value: 38.2047 - type: nauc_ndcg_at_10_max value: 31.664900000000003 - type: nauc_ndcg_at_10_std value: -3.2960000000000003 - type: nauc_ndcg_at_10_diff1 value: 36.6259 - type: nauc_ndcg_at_20_max value: 31.630999999999997 - type: nauc_ndcg_at_20_std value: -2.6685 - type: nauc_ndcg_at_20_diff1 value: 36.577 - type: nauc_ndcg_at_100_max value: 32.283899999999996 - type: nauc_ndcg_at_100_std value: -2.1553 - type: nauc_ndcg_at_100_diff1 value: 36.3958 - type: nauc_ndcg_at_1000_max value: 32.4852 - type: nauc_ndcg_at_1000_std value: -2.3408 - type: nauc_ndcg_at_1000_diff1 value: 37.0227 - type: nauc_map_at_1_max value: 27.620800000000003 - type: nauc_map_at_1_std value: -10.7657 - type: nauc_map_at_1_diff1 value: 43.7864 - type: nauc_map_at_3_max value: 30.0483 - type: nauc_map_at_3_std value: -6.9221 - type: nauc_map_at_3_diff1 value: 40.826 - type: nauc_map_at_5_max value: 30.560399999999998 - type: nauc_map_at_5_std value: -6.1894 - type: nauc_map_at_5_diff1 value: 40.0042 - type: nauc_map_at_10_max value: 30.665100000000002 - type: nauc_map_at_10_std value: -5.8472 - type: 
nauc_map_at_10_diff1 value: 39.3857 - type: nauc_map_at_20_max value: 30.761699999999998 - type: nauc_map_at_20_std value: -5.591 - type: nauc_map_at_20_diff1 value: 39.4111 - type: nauc_map_at_100_max value: 30.859399999999997 - type: nauc_map_at_100_std value: -5.532 - type: nauc_map_at_100_diff1 value: 39.3888 - type: nauc_map_at_1000_max value: 30.871199999999998 - type: nauc_map_at_1000_std value: -5.5322000000000005 - type: nauc_map_at_1000_diff1 value: 39.4166 - type: nauc_recall_at_1_max value: 27.620800000000003 - type: nauc_recall_at_1_std value: -10.7657 - type: nauc_recall_at_1_diff1 value: 43.7864 - type: nauc_recall_at_3_max value: 31.187199999999997 - type: nauc_recall_at_3_std value: -2.5515 - type: nauc_recall_at_3_diff1 value: 36.9576 - type: nauc_recall_at_5_max value: 32.6827 - type: nauc_recall_at_5_std value: -0.4259 - type: nauc_recall_at_5_diff1 value: 33.1674 - type: nauc_recall_at_10_max value: 31.729400000000002 - type: nauc_recall_at_10_std value: 2.8294 - type: nauc_recall_at_10_diff1 value: 27.7289 - type: nauc_recall_at_20_max value: 30.9251 - type: nauc_recall_at_20_std value: 5.9573 - type: nauc_recall_at_20_diff1 value: 26.271499999999996 - type: nauc_recall_at_100_max value: 35.8557 - type: nauc_recall_at_100_std value: 14.478399999999999 - type: nauc_recall_at_100_diff1 value: 20.6213 - type: nauc_recall_at_1000_max value: 49.7086 - type: nauc_recall_at_1000_std value: 36.9282 - type: nauc_recall_at_1000_diff1 value: 14.288300000000001 - type: nauc_precision_at_1_max value: 30.2344 - type: nauc_precision_at_1_std value: -8.76 - type: nauc_precision_at_1_diff1 value: 43.3339 - type: nauc_precision_at_3_max value: 34.808699999999995 - type: nauc_precision_at_3_std value: 0.7861999999999999 - type: nauc_precision_at_3_diff1 value: 33.232299999999995 - type: nauc_precision_at_5_max value: 35.9325 - type: nauc_precision_at_5_std value: 4.1644 - type: nauc_precision_at_5_diff1 value: 28.872799999999998 - type: nauc_precision_at_10_max value: 34.2471 - type: nauc_precision_at_10_std value: 7.2728 - type: nauc_precision_at_10_diff1 value: 21.044999999999998 - type: nauc_precision_at_20_max value: 31.828200000000002 - type: nauc_precision_at_20_std value: 10.2775 - type: nauc_precision_at_20_diff1 value: 16.7988 - type: nauc_precision_at_100_max value: 26.320100000000004 - type: nauc_precision_at_100_std value: 14.0416 - type: nauc_precision_at_100_diff1 value: 3.4286999999999996 - type: nauc_precision_at_1000_max value: 17.6282 - type: nauc_precision_at_1000_std value: 13.1888 - type: nauc_precision_at_1000_diff1 value: -6.7075 - type: nauc_mrr_at_1_max value: 30.2344 - type: nauc_mrr_at_1_std value: -8.76 - type: nauc_mrr_at_1_diff1 value: 43.3339 - type: nauc_mrr_at_3_max value: 32.2423 - type: nauc_mrr_at_3_std value: -4.6264 - type: nauc_mrr_at_3_diff1 value: 39.6214 - type: nauc_mrr_at_5_max value: 32.496199999999995 - type: nauc_mrr_at_5_std value: -4.3406 - type: nauc_mrr_at_5_diff1 value: 38.921 - type: nauc_mrr_at_10_max value: 32.330799999999996 - type: nauc_mrr_at_10_std value: -3.943 - type: nauc_mrr_at_10_diff1 value: 38.2251 - type: nauc_mrr_at_20_max value: 32.1807 - type: nauc_mrr_at_20_std value: -3.9316999999999998 - type: nauc_mrr_at_20_diff1 value: 38.2161 - type: nauc_mrr_at_100_max value: 32.2413 - type: nauc_mrr_at_100_std value: -3.8869000000000002 - type: nauc_mrr_at_100_diff1 value: 38.217800000000004 - type: nauc_mrr_at_1000_max value: 32.2481 - type: nauc_mrr_at_1000_std value: -3.8933000000000004 - type: nauc_mrr_at_1000_diff1 value: 
38.2515 - type: main_score value: 44.759 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval (default) revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: mteb/cqadupstack-mathematica metrics: - type: ndcg_at_1 value: 22.761 - type: ndcg_at_3 value: 27.578999999999997 - type: ndcg_at_5 value: 30.067 - type: ndcg_at_10 value: 32.823 - type: ndcg_at_20 value: 35.129 - type: ndcg_at_100 value: 38.903999999999996 - type: ndcg_at_1000 value: 41.181 - type: map_at_1 value: 18.360000000000003 - type: map_at_3 value: 24.264 - type: map_at_5 value: 25.844 - type: map_at_10 value: 27.093 - type: map_at_20 value: 27.839999999999996 - type: map_at_100 value: 28.416999999999998 - type: map_at_1000 value: 28.517 - type: recall_at_1 value: 18.360000000000003 - type: recall_at_3 value: 31.044 - type: recall_at_5 value: 37.432 - type: recall_at_10 value: 45.525999999999996 - type: recall_at_20 value: 53.557 - type: recall_at_100 value: 72.14500000000001 - type: recall_at_1000 value: 88.041 - type: precision_at_1 value: 22.761 - type: precision_at_3 value: 13.350000000000001 - type: precision_at_5 value: 9.801 - type: precision_at_10 value: 6.157 - type: precision_at_20 value: 3.744 - type: precision_at_100 value: 1.055 - type: precision_at_1000 value: 0.13799999999999998 - type: mrr_at_1 value: 22.761200000000002 - type: mrr_at_3 value: 29.187400000000004 - type: mrr_at_5 value: 30.866500000000002 - type: mrr_at_10 value: 32.0236 - type: mrr_at_20 value: 32.5924 - type: mrr_at_100 value: 32.995000000000005 - type: mrr_at_1000 value: 33.042100000000005 - type: nauc_ndcg_at_1_max value: 22.3876 - type: nauc_ndcg_at_1_std value: -0.26649999999999996 - type: nauc_ndcg_at_1_diff1 value: 42.7688 - type: nauc_ndcg_at_3_max value: 24.329 - type: nauc_ndcg_at_3_std value: 1.3894 - type: nauc_ndcg_at_3_diff1 value: 38.5792 - type: nauc_ndcg_at_5_max value: 24.331 - type: nauc_ndcg_at_5_std value: 3.1460000000000004 - type: nauc_ndcg_at_5_diff1 value: 36.1599 - type: nauc_ndcg_at_10_max value: 23.9962 - type: nauc_ndcg_at_10_std value: 3.6198 - type: nauc_ndcg_at_10_diff1 value: 34.615899999999996 - type: nauc_ndcg_at_20_max value: 23.189899999999998 - type: nauc_ndcg_at_20_std value: 3.3743000000000003 - type: nauc_ndcg_at_20_diff1 value: 34.5344 - type: nauc_ndcg_at_100_max value: 24.1644 - type: nauc_ndcg_at_100_std value: 5.3245000000000005 - type: nauc_ndcg_at_100_diff1 value: 34.1404 - type: nauc_ndcg_at_1000_max value: 24.4504 - type: nauc_ndcg_at_1000_std value: 5.0385 - type: nauc_ndcg_at_1000_diff1 value: 34.3277 - type: nauc_map_at_1_max value: 20.5435 - type: nauc_map_at_1_std value: -0.1746 - type: nauc_map_at_1_diff1 value: 43.252 - type: nauc_map_at_3_max value: 23.108999999999998 - type: nauc_map_at_3_std value: 0.8848 - type: nauc_map_at_3_diff1 value: 39.9259 - type: nauc_map_at_5_max value: 23.329900000000002 - type: nauc_map_at_5_std value: 1.7795999999999998 - type: nauc_map_at_5_diff1 value: 38.448 - type: nauc_map_at_10_max value: 23.1789 - type: nauc_map_at_10_std value: 2.1036 - type: nauc_map_at_10_diff1 value: 37.653 - type: nauc_map_at_20_max value: 22.9132 - type: nauc_map_at_20_std value: 2.1094 - type: nauc_map_at_20_diff1 value: 37.5569 - type: nauc_map_at_100_max value: 23.0857 - type: nauc_map_at_100_std value: 2.4645 - type: nauc_map_at_100_diff1 value: 37.4881 - type: nauc_map_at_1000_max value: 23.0988 - type: nauc_map_at_1000_std value: 2.4427999999999996 - type: nauc_map_at_1000_diff1 value: 37.4707 - type: 
nauc_recall_at_1_max value: 20.5435 - type: nauc_recall_at_1_std value: -0.1746 - type: nauc_recall_at_1_diff1 value: 43.252 - type: nauc_recall_at_3_max value: 24.393500000000003 - type: nauc_recall_at_3_std value: 3.3230999999999997 - type: nauc_recall_at_3_diff1 value: 34.7983 - type: nauc_recall_at_5_max value: 23.4229 - type: nauc_recall_at_5_std value: 6.2542 - type: nauc_recall_at_5_diff1 value: 28.8147 - type: nauc_recall_at_10_max value: 22.6162 - type: nauc_recall_at_10_std value: 6.9113 - type: nauc_recall_at_10_diff1 value: 24.617900000000002 - type: nauc_recall_at_20_max value: 19.8826 - type: nauc_recall_at_20_std value: 6.0004 - type: nauc_recall_at_20_diff1 value: 24.0887 - type: nauc_recall_at_100_max value: 24.428900000000002 - type: nauc_recall_at_100_std value: 18.8358 - type: nauc_recall_at_100_diff1 value: 18.6841 - type: nauc_recall_at_1000_max value: 34.9059 - type: nauc_recall_at_1000_std value: 30.6124 - type: nauc_recall_at_1000_diff1 value: 11.7067 - type: nauc_precision_at_1_max value: 22.3876 - type: nauc_precision_at_1_std value: -0.26649999999999996 - type: nauc_precision_at_1_diff1 value: 42.7688 - type: nauc_precision_at_3_max value: 24.7919 - type: nauc_precision_at_3_std value: 1.3971 - type: nauc_precision_at_3_diff1 value: 32.175599999999996 - type: nauc_precision_at_5_max value: 25.4503 - type: nauc_precision_at_5_std value: 4.4636000000000005 - type: nauc_precision_at_5_diff1 value: 25.453599999999998 - type: nauc_precision_at_10_max value: 21.1404 - type: nauc_precision_at_10_std value: 4.7988 - type: nauc_precision_at_10_diff1 value: 17.3144 - type: nauc_precision_at_20_max value: 16.4733 - type: nauc_precision_at_20_std value: 3.7228999999999997 - type: nauc_precision_at_20_diff1 value: 12.853 - type: nauc_precision_at_100_max value: 12.5551 - type: nauc_precision_at_100_std value: 6.2132 - type: nauc_precision_at_100_diff1 value: 1.2163 - type: nauc_precision_at_1000_max value: 2.706 - type: nauc_precision_at_1000_std value: -0.7363999999999999 - type: nauc_precision_at_1000_diff1 value: -6.0556 - type: nauc_mrr_at_1_max value: 22.3876 - type: nauc_mrr_at_1_std value: -0.26649999999999996 - type: nauc_mrr_at_1_diff1 value: 42.7688 - type: nauc_mrr_at_3_max value: 24.9398 - type: nauc_mrr_at_3_std value: 1.5026 - type: nauc_mrr_at_3_diff1 value: 39.2078 - type: nauc_mrr_at_5_max value: 24.9525 - type: nauc_mrr_at_5_std value: 2.2446 - type: nauc_mrr_at_5_diff1 value: 37.9502 - type: nauc_mrr_at_10_max value: 24.8361 - type: nauc_mrr_at_10_std value: 2.1445 - type: nauc_mrr_at_10_diff1 value: 37.4108 - type: nauc_mrr_at_20_max value: 24.529300000000003 - type: nauc_mrr_at_20_std value: 2.0292 - type: nauc_mrr_at_20_diff1 value: 37.3959 - type: nauc_mrr_at_100_max value: 24.627299999999998 - type: nauc_mrr_at_100_std value: 2.2496 - type: nauc_mrr_at_100_diff1 value: 37.4236 - type: nauc_mrr_at_1000_max value: 24.6481 - type: nauc_mrr_at_1000_std value: 2.2540999999999998 - type: nauc_mrr_at_1000_diff1 value: 37.4501 - type: main_score value: 32.823 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval (default) revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: mteb/cqadupstack-physics metrics: - type: ndcg_at_1 value: 40.135 - type: ndcg_at_3 value: 45.062999999999995 - type: ndcg_at_5 value: 47.674 - type: ndcg_at_10 value: 50.312 - type: ndcg_at_20 value: 52.349000000000004 - type: ndcg_at_100 value: 55.428 - type: ndcg_at_1000 value: 57.202 - type: map_at_1 value: 32.757 - type: map_at_3 
value: 40.722 - type: map_at_5 value: 42.656 - type: map_at_10 value: 44.162 - type: map_at_20 value: 44.889 - type: map_at_100 value: 45.454 - type: map_at_1000 value: 45.562999999999995 - type: recall_at_1 value: 32.757 - type: recall_at_3 value: 48.120000000000005 - type: recall_at_5 value: 54.666000000000004 - type: recall_at_10 value: 62.632 - type: recall_at_20 value: 69.592 - type: recall_at_100 value: 83.863 - type: recall_at_1000 value: 95.065 - type: precision_at_1 value: 40.135 - type: precision_at_3 value: 21.367 - type: precision_at_5 value: 15.265 - type: precision_at_10 value: 9.057 - type: precision_at_20 value: 5.25 - type: precision_at_100 value: 1.347 - type: precision_at_1000 value: 0.169 - type: mrr_at_1 value: 40.1347 - type: mrr_at_3 value: 47.3532 - type: mrr_at_5 value: 48.8547 - type: mrr_at_10 value: 49.9016 - type: mrr_at_20 value: 50.31250000000001 - type: mrr_at_100 value: 50.6278 - type: mrr_at_1000 value: 50.6652 - type: nauc_ndcg_at_1_max value: 38.7881 - type: nauc_ndcg_at_1_std value: -8.296000000000001 - type: nauc_ndcg_at_1_diff1 value: 52.21130000000001 - type: nauc_ndcg_at_3_max value: 38.7708 - type: nauc_ndcg_at_3_std value: -6.576700000000001 - type: nauc_ndcg_at_3_diff1 value: 48.9321 - type: nauc_ndcg_at_5_max value: 38.438 - type: nauc_ndcg_at_5_std value: -6.2548 - type: nauc_ndcg_at_5_diff1 value: 48.0762 - type: nauc_ndcg_at_10_max value: 38.365899999999996 - type: nauc_ndcg_at_10_std value: -5.7385 - type: nauc_ndcg_at_10_diff1 value: 48.158899999999996 - type: nauc_ndcg_at_20_max value: 39.0394 - type: nauc_ndcg_at_20_std value: -5.0741000000000005 - type: nauc_ndcg_at_20_diff1 value: 48.540499999999994 - type: nauc_ndcg_at_100_max value: 39.7277 - type: nauc_ndcg_at_100_std value: -2.7447 - type: nauc_ndcg_at_100_diff1 value: 47.9735 - type: nauc_ndcg_at_1000_max value: 40.0211 - type: nauc_ndcg_at_1000_std value: -2.7227 - type: nauc_ndcg_at_1000_diff1 value: 48.1857 - type: nauc_map_at_1_max value: 33.7229 - type: nauc_map_at_1_std value: -12.5585 - type: nauc_map_at_1_diff1 value: 54.0852 - type: nauc_map_at_3_max value: 36.403 - type: nauc_map_at_3_std value: -9.1775 - type: nauc_map_at_3_diff1 value: 49.7749 - type: nauc_map_at_5_max value: 36.804500000000004 - type: nauc_map_at_5_std value: -8.4613 - type: nauc_map_at_5_diff1 value: 49.1705 - type: nauc_map_at_10_max value: 37.3301 - type: nauc_map_at_10_std value: -7.706200000000001 - type: nauc_map_at_10_diff1 value: 49.3899 - type: nauc_map_at_20_max value: 37.541999999999994 - type: nauc_map_at_20_std value: -7.4139 - type: nauc_map_at_20_diff1 value: 49.4555 - type: nauc_map_at_100_max value: 37.7874 - type: nauc_map_at_100_std value: -6.8967 - type: nauc_map_at_100_diff1 value: 49.336999999999996 - type: nauc_map_at_1000_max value: 37.8174 - type: nauc_map_at_1000_std value: -6.8435 - type: nauc_map_at_1000_diff1 value: 49.3269 - type: nauc_recall_at_1_max value: 33.7229 - type: nauc_recall_at_1_std value: -12.5585 - type: nauc_recall_at_1_diff1 value: 54.0852 - type: nauc_recall_at_3_max value: 34.7265 - type: nauc_recall_at_3_std value: -8.2544 - type: nauc_recall_at_3_diff1 value: 45.2066 - type: nauc_recall_at_5_max value: 34.319 - type: nauc_recall_at_5_std value: -6.7825 - type: nauc_recall_at_5_diff1 value: 41.783 - type: nauc_recall_at_10_max value: 34.5308 - type: nauc_recall_at_10_std value: -3.8527 - type: nauc_recall_at_10_diff1 value: 40.9153 - type: nauc_recall_at_20_max value: 36.6563 - type: nauc_recall_at_20_std value: -0.6942 - type: nauc_recall_at_20_diff1 
value: 41.7078 - type: nauc_recall_at_100_max value: 38.7406 - type: nauc_recall_at_100_std value: 18.8691 - type: nauc_recall_at_100_diff1 value: 34.8788 - type: nauc_recall_at_1000_max value: 53.96490000000001 - type: nauc_recall_at_1000_std value: 46.1526 - type: nauc_recall_at_1000_diff1 value: 34.4075 - type: nauc_precision_at_1_max value: 38.7881 - type: nauc_precision_at_1_std value: -8.296000000000001 - type: nauc_precision_at_1_diff1 value: 52.21130000000001 - type: nauc_precision_at_3_max value: 38.4296 - type: nauc_precision_at_3_std value: 5.1817 - type: nauc_precision_at_3_diff1 value: 32.3129 - type: nauc_precision_at_5_max value: 33.9238 - type: nauc_precision_at_5_std value: 10.5533 - type: nauc_precision_at_5_diff1 value: 22.5911 - type: nauc_precision_at_10_max value: 30.967 - type: nauc_precision_at_10_std value: 16.371 - type: nauc_precision_at_10_diff1 value: 15.714 - type: nauc_precision_at_20_max value: 27.0551 - type: nauc_precision_at_20_std value: 18.2058 - type: nauc_precision_at_20_diff1 value: 10.084 - type: nauc_precision_at_100_max value: 18.493000000000002 - type: nauc_precision_at_100_std value: 25.315199999999997 - type: nauc_precision_at_100_diff1 value: -5.4256 - type: nauc_precision_at_1000_max value: 6.7 - type: nauc_precision_at_1000_std value: 22.2852 - type: nauc_precision_at_1000_diff1 value: -14.102 - type: nauc_mrr_at_1_max value: 38.7881 - type: nauc_mrr_at_1_std value: -8.296000000000001 - type: nauc_mrr_at_1_diff1 value: 52.21130000000001 - type: nauc_mrr_at_3_max value: 40.9462 - type: nauc_mrr_at_3_std value: -5.224 - type: nauc_mrr_at_3_diff1 value: 49.9567 - type: nauc_mrr_at_5_max value: 40.6606 - type: nauc_mrr_at_5_std value: -5.1892000000000005 - type: nauc_mrr_at_5_diff1 value: 49.274499999999996 - type: nauc_mrr_at_10_max value: 40.7644 - type: nauc_mrr_at_10_std value: -4.7934 - type: nauc_mrr_at_10_diff1 value: 49.2337 - type: nauc_mrr_at_20_max value: 40.8569 - type: nauc_mrr_at_20_std value: -4.7076 - type: nauc_mrr_at_20_diff1 value: 49.358999999999995 - type: nauc_mrr_at_100_max value: 40.8362 - type: nauc_mrr_at_100_std value: -4.5678 - type: nauc_mrr_at_100_diff1 value: 49.32 - type: nauc_mrr_at_1000_max value: 40.827400000000004 - type: nauc_mrr_at_1000_std value: -4.5844000000000005 - type: nauc_mrr_at_1000_diff1 value: 49.3213 - type: main_score value: 50.312 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval (default) revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: mteb/cqadupstack-programmers metrics: - type: ndcg_at_1 value: 38.013999999999996 - type: ndcg_at_3 value: 42.824 - type: ndcg_at_5 value: 45.074999999999996 - type: ndcg_at_10 value: 47.769 - type: ndcg_at_20 value: 49.964 - type: ndcg_at_100 value: 53.271 - type: ndcg_at_1000 value: 55.217000000000006 - type: map_at_1 value: 31.751 - type: map_at_3 value: 38.95 - type: map_at_5 value: 40.681 - type: map_at_10 value: 42.097 - type: map_at_20 value: 42.892 - type: map_at_100 value: 43.472 - type: map_at_1000 value: 43.578 - type: recall_at_1 value: 31.751 - type: recall_at_3 value: 45.409 - type: recall_at_5 value: 51.373000000000005 - type: recall_at_10 value: 59.168 - type: recall_at_20 value: 66.669 - type: recall_at_100 value: 82.26400000000001 - type: recall_at_1000 value: 95.017 - type: precision_at_1 value: 38.013999999999996 - type: precision_at_3 value: 19.977 - type: precision_at_5 value: 14.11 - type: precision_at_10 value: 8.493 - type: precision_at_20 value: 5.0 - type: precision_at_100 
value: 1.312 - type: precision_at_1000 value: 0.165 - type: mrr_at_1 value: 38.0137 - type: mrr_at_3 value: 44.9772 - type: mrr_at_5 value: 46.387 - type: mrr_at_10 value: 47.384100000000004 - type: mrr_at_20 value: 47.8746 - type: mrr_at_100 value: 48.2235 - type: mrr_at_1000 value: 48.2699 - type: nauc_ndcg_at_1_max value: 35.9967 - type: nauc_ndcg_at_1_std value: 4.926500000000001 - type: nauc_ndcg_at_1_diff1 value: 43.5414 - type: nauc_ndcg_at_3_max value: 35.4574 - type: nauc_ndcg_at_3_std value: 2.6951 - type: nauc_ndcg_at_3_diff1 value: 38.5888 - type: nauc_ndcg_at_5_max value: 35.7783 - type: nauc_ndcg_at_5_std value: 3.5970000000000004 - type: nauc_ndcg_at_5_diff1 value: 38.107 - type: nauc_ndcg_at_10_max value: 35.9047 - type: nauc_ndcg_at_10_std value: 5.3849 - type: nauc_ndcg_at_10_diff1 value: 37.6917 - type: nauc_ndcg_at_20_max value: 37.4203 - type: nauc_ndcg_at_20_std value: 7.5072 - type: nauc_ndcg_at_20_diff1 value: 37.9429 - type: nauc_ndcg_at_100_max value: 37.913000000000004 - type: nauc_ndcg_at_100_std value: 8.8726 - type: nauc_ndcg_at_100_diff1 value: 37.8018 - type: nauc_ndcg_at_1000_max value: 37.7521 - type: nauc_ndcg_at_1000_std value: 8.0898 - type: nauc_ndcg_at_1000_diff1 value: 38.188 - type: nauc_map_at_1_max value: 30.6039 - type: nauc_map_at_1_std value: -1.1973 - type: nauc_map_at_1_diff1 value: 44.4956 - type: nauc_map_at_3_max value: 33.79 - type: nauc_map_at_3_std value: 0.7224999999999999 - type: nauc_map_at_3_diff1 value: 40.5918 - type: nauc_map_at_5_max value: 34.799 - type: nauc_map_at_5_std value: 1.9663 - type: nauc_map_at_5_diff1 value: 40.119 - type: nauc_map_at_10_max value: 35.0036 - type: nauc_map_at_10_std value: 2.9479 - type: nauc_map_at_10_diff1 value: 39.725899999999996 - type: nauc_map_at_20_max value: 35.6907 - type: nauc_map_at_20_std value: 3.7684 - type: nauc_map_at_20_diff1 value: 39.6845 - type: nauc_map_at_100_max value: 35.8249 - type: nauc_map_at_100_std value: 4.123 - type: nauc_map_at_100_diff1 value: 39.6397 - type: nauc_map_at_1000_max value: 35.8146 - type: nauc_map_at_1000_std value: 4.100899999999999 - type: nauc_map_at_1000_diff1 value: 39.6511 - type: nauc_recall_at_1_max value: 30.6039 - type: nauc_recall_at_1_std value: -1.1973 - type: nauc_recall_at_1_diff1 value: 44.4956 - type: nauc_recall_at_3_max value: 33.9619 - type: nauc_recall_at_3_std value: 1.3599 - type: nauc_recall_at_3_diff1 value: 36.673899999999996 - type: nauc_recall_at_5_max value: 34.798899999999996 - type: nauc_recall_at_5_std value: 3.9083 - type: nauc_recall_at_5_diff1 value: 34.2275 - type: nauc_recall_at_10_max value: 34.3508 - type: nauc_recall_at_10_std value: 8.6454 - type: nauc_recall_at_10_diff1 value: 31.9422 - type: nauc_recall_at_20_max value: 39.1475 - type: nauc_recall_at_20_std value: 17.0303 - type: nauc_recall_at_20_diff1 value: 32.138099999999994 - type: nauc_recall_at_100_max value: 43.452 - type: nauc_recall_at_100_std value: 31.8449 - type: nauc_recall_at_100_diff1 value: 27.38 - type: nauc_recall_at_1000_max value: 56.720000000000006 - type: nauc_recall_at_1000_std value: 51.5088 - type: nauc_recall_at_1000_diff1 value: 28.131099999999996 - type: nauc_precision_at_1_max value: 35.9967 - type: nauc_precision_at_1_std value: 4.926500000000001 - type: nauc_precision_at_1_diff1 value: 43.5414 - type: nauc_precision_at_3_max value: 36.204 - type: nauc_precision_at_3_std value: 9.6793 - type: nauc_precision_at_3_diff1 value: 22.8807 - type: nauc_precision_at_5_max value: 34.226 - type: nauc_precision_at_5_std value: 14.0818 - 
type: nauc_precision_at_5_diff1 value: 16.223000000000003 - type: nauc_precision_at_10_max value: 28.3789 - type: nauc_precision_at_10_std value: 18.8125 - type: nauc_precision_at_10_diff1 value: 7.382700000000001 - type: nauc_precision_at_20_max value: 26.151600000000002 - type: nauc_precision_at_20_std value: 22.352 - type: nauc_precision_at_20_diff1 value: 1.0934 - type: nauc_precision_at_100_max value: 13.886399999999998 - type: nauc_precision_at_100_std value: 21.5356 - type: nauc_precision_at_100_diff1 value: -10.3265 - type: nauc_precision_at_1000_max value: -1.5730000000000002 - type: nauc_precision_at_1000_std value: 9.9943 - type: nauc_precision_at_1000_diff1 value: -18.5193 - type: nauc_mrr_at_1_max value: 35.9967 - type: nauc_mrr_at_1_std value: 4.926500000000001 - type: nauc_mrr_at_1_diff1 value: 43.5414 - type: nauc_mrr_at_3_max value: 37.1377 - type: nauc_mrr_at_3_std value: 5.6196 - type: nauc_mrr_at_3_diff1 value: 38.9643 - type: nauc_mrr_at_5_max value: 36.945499999999996 - type: nauc_mrr_at_5_std value: 5.9594000000000005 - type: nauc_mrr_at_5_diff1 value: 38.431 - type: nauc_mrr_at_10_max value: 37.094300000000004 - type: nauc_mrr_at_10_std value: 6.6665 - type: nauc_mrr_at_10_diff1 value: 38.4148 - type: nauc_mrr_at_20_max value: 37.283100000000005 - type: nauc_mrr_at_20_std value: 7.0301 - type: nauc_mrr_at_20_diff1 value: 38.6425 - type: nauc_mrr_at_100_max value: 37.312200000000004 - type: nauc_mrr_at_100_std value: 7.0826 - type: nauc_mrr_at_100_diff1 value: 38.689800000000005 - type: nauc_mrr_at_1000_max value: 37.319 - type: nauc_mrr_at_1000_std value: 7.0653999999999995 - type: nauc_mrr_at_1000_diff1 value: 38.7106 - type: main_score value: 47.769 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: CQADupstackRetrieval_is_a_combined_dataset split: test type: CQADupstackRetrieval_is_a_combined_dataset metrics: - type: main_score value: 46.10300000000001 - type: ndcg_at_10 value: 46.10300000000001 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval (default) revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: mteb/cqadupstack-stats metrics: - type: ndcg_at_1 value: 32.362 - type: ndcg_at_3 value: 36.026 - type: ndcg_at_5 value: 38.122 - type: ndcg_at_10 value: 40.174 - type: ndcg_at_20 value: 41.836 - type: ndcg_at_100 value: 44.444 - type: ndcg_at_1000 value: 46.929 - type: map_at_1 value: 28.871999999999996 - type: map_at_3 value: 33.613 - type: map_at_5 value: 35.007 - type: map_at_10 value: 35.976 - type: map_at_20 value: 36.496 - type: map_at_100 value: 36.895 - type: map_at_1000 value: 36.994 - type: recall_at_1 value: 28.871999999999996 - type: recall_at_3 value: 38.705 - type: recall_at_5 value: 43.821 - type: recall_at_10 value: 49.921 - type: recall_at_20 value: 56.163 - type: recall_at_100 value: 69.084 - type: recall_at_1000 value: 87.35000000000001 - type: precision_at_1 value: 32.362 - type: precision_at_3 value: 15.184000000000001 - type: precision_at_5 value: 10.583 - type: precision_at_10 value: 6.166 - type: precision_at_20 value: 3.512 - type: precision_at_100 value: 0.897 - type: precision_at_1000 value: 0.11900000000000001 - type: mrr_at_1 value: 32.362 - type: mrr_at_3 value: 36.937599999999996 - type: mrr_at_5 value: 38.1416 - type: mrr_at_10 value: 39.012299999999996 - type: mrr_at_20 value: 39.4119 - type: mrr_at_100 value: 39.745200000000004 - type: mrr_at_1000 value: 39.8191 - type: nauc_ndcg_at_1_max value: 39.396300000000004 - type: 
nauc_ndcg_at_1_std value: 0.8482 - type: nauc_ndcg_at_1_diff1 value: 52.376999999999995 - type: nauc_ndcg_at_3_max value: 39.0785 - type: nauc_ndcg_at_3_std value: 3.2739 - type: nauc_ndcg_at_3_diff1 value: 48.3207 - type: nauc_ndcg_at_5_max value: 38.4648 - type: nauc_ndcg_at_5_std value: 3.3379 - type: nauc_ndcg_at_5_diff1 value: 47.468500000000006 - type: nauc_ndcg_at_10_max value: 39.0329 - type: nauc_ndcg_at_10_std value: 4.0895 - type: nauc_ndcg_at_10_diff1 value: 46.1268 - type: nauc_ndcg_at_20_max value: 38.359 - type: nauc_ndcg_at_20_std value: 4.2744 - type: nauc_ndcg_at_20_diff1 value: 45.1661 - type: nauc_ndcg_at_100_max value: 39.461 - type: nauc_ndcg_at_100_std value: 7.2038 - type: nauc_ndcg_at_100_diff1 value: 44.809 - type: nauc_ndcg_at_1000_max value: 39.875699999999995 - type: nauc_ndcg_at_1000_std value: 6.9621 - type: nauc_ndcg_at_1000_diff1 value: 45.473200000000006 - type: nauc_map_at_1_max value: 35.936800000000005 - type: nauc_map_at_1_std value: -3.2637 - type: nauc_map_at_1_diff1 value: 52.3431 - type: nauc_map_at_3_max value: 37.8006 - type: nauc_map_at_3_std value: 0.7727999999999999 - type: nauc_map_at_3_diff1 value: 49.1872 - type: nauc_map_at_5_max value: 37.932300000000005 - type: nauc_map_at_5_std value: 1.4745 - type: nauc_map_at_5_diff1 value: 48.8466 - type: nauc_map_at_10_max value: 38.4041 - type: nauc_map_at_10_std value: 2.0481 - type: nauc_map_at_10_diff1 value: 48.2292 - type: nauc_map_at_20_max value: 38.1992 - type: nauc_map_at_20_std value: 2.1198 - type: nauc_map_at_20_diff1 value: 47.9169 - type: nauc_map_at_100_max value: 38.3504 - type: nauc_map_at_100_std value: 2.5100000000000002 - type: nauc_map_at_100_diff1 value: 47.8259 - type: nauc_map_at_1000_max value: 38.3865 - type: nauc_map_at_1000_std value: 2.5181999999999998 - type: nauc_map_at_1000_diff1 value: 47.853699999999996 - type: nauc_recall_at_1_max value: 35.936800000000005 - type: nauc_recall_at_1_std value: -3.2637 - type: nauc_recall_at_1_diff1 value: 52.3431 - type: nauc_recall_at_3_max value: 37.227700000000006 - type: nauc_recall_at_3_std value: 3.8813 - type: nauc_recall_at_3_diff1 value: 44.8185 - type: nauc_recall_at_5_max value: 35.963 - type: nauc_recall_at_5_std value: 4.9497 - type: nauc_recall_at_5_diff1 value: 42.6322 - type: nauc_recall_at_10_max value: 37.358000000000004 - type: nauc_recall_at_10_std value: 6.6888000000000005 - type: nauc_recall_at_10_diff1 value: 38.7639 - type: nauc_recall_at_20_max value: 34.2341 - type: nauc_recall_at_20_std value: 7.0213 - type: nauc_recall_at_20_diff1 value: 34.8021 - type: nauc_recall_at_100_max value: 39.406600000000005 - type: nauc_recall_at_100_std value: 25.7393 - type: nauc_recall_at_100_diff1 value: 29.9173 - type: nauc_recall_at_1000_max value: 45.287 - type: nauc_recall_at_1000_std value: 38.572 - type: nauc_recall_at_1000_diff1 value: 26.744 - type: nauc_precision_at_1_max value: 39.396300000000004 - type: nauc_precision_at_1_std value: 0.8482 - type: nauc_precision_at_1_diff1 value: 52.376999999999995 - type: nauc_precision_at_3_max value: 42.1919 - type: nauc_precision_at_3_std value: 13.9189 - type: nauc_precision_at_3_diff1 value: 40.2337 - type: nauc_precision_at_5_max value: 39.8644 - type: nauc_precision_at_5_std value: 15.656900000000002 - type: nauc_precision_at_5_diff1 value: 35.1421 - type: nauc_precision_at_10_max value: 40.7678 - type: nauc_precision_at_10_std value: 19.5881 - type: nauc_precision_at_10_diff1 value: 28.822300000000002 - type: nauc_precision_at_20_max value: 35.4842 - type: 
nauc_precision_at_20_std value: 20.6978 - type: nauc_precision_at_20_diff1 value: 21.4608 - type: nauc_precision_at_100_max value: 33.211400000000005 - type: nauc_precision_at_100_std value: 31.5029 - type: nauc_precision_at_100_diff1 value: 13.0526 - type: nauc_precision_at_1000_max value: 21.6976 - type: nauc_precision_at_1000_std value: 26.4203 - type: nauc_precision_at_1000_diff1 value: 2.6056 - type: nauc_mrr_at_1_max value: 39.396300000000004 - type: nauc_mrr_at_1_std value: 0.8482 - type: nauc_mrr_at_1_diff1 value: 52.376999999999995 - type: nauc_mrr_at_3_max value: 40.191 - type: nauc_mrr_at_3_std value: 3.9919999999999995 - type: nauc_mrr_at_3_diff1 value: 49.2714 - type: nauc_mrr_at_5_max value: 39.9654 - type: nauc_mrr_at_5_std value: 4.0258 - type: nauc_mrr_at_5_diff1 value: 48.6599 - type: nauc_mrr_at_10_max value: 40.1413 - type: nauc_mrr_at_10_std value: 4.389 - type: nauc_mrr_at_10_diff1 value: 48.0272 - type: nauc_mrr_at_20_max value: 39.9265 - type: nauc_mrr_at_20_std value: 4.3462 - type: nauc_mrr_at_20_diff1 value: 47.8592 - type: nauc_mrr_at_100_max value: 40.0623 - type: nauc_mrr_at_100_std value: 4.698 - type: nauc_mrr_at_100_diff1 value: 47.8456 - type: nauc_mrr_at_1000_max value: 40.0698 - type: nauc_mrr_at_1000_std value: 4.6803 - type: nauc_mrr_at_1000_diff1 value: 47.8659 - type: main_score value: 40.174 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackTexRetrieval (default) revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: mteb/cqadupstack-tex metrics: - type: ndcg_at_1 value: 25.155 - type: ndcg_at_3 value: 29.339 - type: ndcg_at_5 value: 31.452999999999996 - type: ndcg_at_10 value: 33.937 - type: ndcg_at_20 value: 36.018 - type: ndcg_at_100 value: 39.531 - type: ndcg_at_1000 value: 42.22 - type: map_at_1 value: 20.874000000000002 - type: map_at_3 value: 26.345000000000002 - type: map_at_5 value: 27.773999999999997 - type: map_at_10 value: 28.965999999999998 - type: map_at_20 value: 29.625 - type: map_at_100 value: 30.188 - type: map_at_1000 value: 30.314000000000004 - type: recall_at_1 value: 20.874000000000002 - type: recall_at_3 value: 31.984 - type: recall_at_5 value: 37.467 - type: recall_at_10 value: 44.774 - type: recall_at_20 value: 52.323 - type: recall_at_100 value: 69.549 - type: recall_at_1000 value: 88.419 - type: precision_at_1 value: 25.155 - type: precision_at_3 value: 13.719000000000001 - type: precision_at_5 value: 9.841999999999999 - type: precision_at_10 value: 6.069999999999999 - type: precision_at_20 value: 3.6799999999999997 - type: precision_at_100 value: 1.045 - type: precision_at_1000 value: 0.146 - type: mrr_at_1 value: 25.1549 - type: mrr_at_3 value: 30.7123 - type: mrr_at_5 value: 32.0148 - type: mrr_at_10 value: 33.035199999999996 - type: mrr_at_20 value: 33.5778 - type: mrr_at_100 value: 34.0001 - type: mrr_at_1000 value: 34.070499999999996 - type: nauc_ndcg_at_1_max value: 34.6903 - type: nauc_ndcg_at_1_std value: -0.48469999999999996 - type: nauc_ndcg_at_1_diff1 value: 41.827799999999996 - type: nauc_ndcg_at_3_max value: 34.7107 - type: nauc_ndcg_at_3_std value: 1.2525 - type: nauc_ndcg_at_3_diff1 value: 36.09 - type: nauc_ndcg_at_5_max value: 34.363899999999994 - type: nauc_ndcg_at_5_std value: 1.187 - type: nauc_ndcg_at_5_diff1 value: 35.5019 - type: nauc_ndcg_at_10_max value: 34.1261 - type: nauc_ndcg_at_10_std value: 2.0704000000000002 - type: nauc_ndcg_at_10_diff1 value: 35.0098 - type: nauc_ndcg_at_20_max value: 34.5028 - type: nauc_ndcg_at_20_std value: 2.9973 - type: 
nauc_ndcg_at_20_diff1 value: 34.6486 - type: nauc_ndcg_at_100_max value: 34.8192 - type: nauc_ndcg_at_100_std value: 4.4281 - type: nauc_ndcg_at_100_diff1 value: 34.252500000000005 - type: nauc_ndcg_at_1000_max value: 34.8293 - type: nauc_ndcg_at_1000_std value: 4.2747 - type: nauc_ndcg_at_1000_diff1 value: 34.5083 - type: nauc_map_at_1_max value: 31.448700000000002 - type: nauc_map_at_1_std value: -1.5652 - type: nauc_map_at_1_diff1 value: 42.3532 - type: nauc_map_at_3_max value: 33.458 - type: nauc_map_at_3_std value: 0.372 - type: nauc_map_at_3_diff1 value: 37.6257 - type: nauc_map_at_5_max value: 33.3902 - type: nauc_map_at_5_std value: 0.2957 - type: nauc_map_at_5_diff1 value: 37.0708 - type: nauc_map_at_10_max value: 33.4473 - type: nauc_map_at_10_std value: 0.7451 - type: nauc_map_at_10_diff1 value: 36.7872 - type: nauc_map_at_20_max value: 33.6705 - type: nauc_map_at_20_std value: 1.0755000000000001 - type: nauc_map_at_20_diff1 value: 36.6791 - type: nauc_map_at_100_max value: 33.772200000000005 - type: nauc_map_at_100_std value: 1.308 - type: nauc_map_at_100_diff1 value: 36.5896 - type: nauc_map_at_1000_max value: 33.7881 - type: nauc_map_at_1000_std value: 1.3087 - type: nauc_map_at_1000_diff1 value: 36.5978 - type: nauc_recall_at_1_max value: 31.448700000000002 - type: nauc_recall_at_1_std value: -1.5652 - type: nauc_recall_at_1_diff1 value: 42.3532 - type: nauc_recall_at_3_max value: 33.7171 - type: nauc_recall_at_3_std value: 2.4527 - type: nauc_recall_at_3_diff1 value: 32.6832 - type: nauc_recall_at_5_max value: 32.7828 - type: nauc_recall_at_5_std value: 2.0332 - type: nauc_recall_at_5_diff1 value: 30.8446 - type: nauc_recall_at_10_max value: 31.6463 - type: nauc_recall_at_10_std value: 4.3727 - type: nauc_recall_at_10_diff1 value: 29.1731 - type: nauc_recall_at_20_max value: 31.968999999999998 - type: nauc_recall_at_20_std value: 7.5392 - type: nauc_recall_at_20_diff1 value: 26.961299999999998 - type: nauc_recall_at_100_max value: 32.9142 - type: nauc_recall_at_100_std value: 17.2332 - type: nauc_recall_at_100_diff1 value: 22.0707 - type: nauc_recall_at_1000_max value: 32.1463 - type: nauc_recall_at_1000_std value: 29.664600000000004 - type: nauc_recall_at_1000_diff1 value: 13.9131 - type: nauc_precision_at_1_max value: 34.6903 - type: nauc_precision_at_1_std value: -0.48469999999999996 - type: nauc_precision_at_1_diff1 value: 41.827799999999996 - type: nauc_precision_at_3_max value: 36.8823 - type: nauc_precision_at_3_std value: 3.7052 - type: nauc_precision_at_3_diff1 value: 29.505599999999998 - type: nauc_precision_at_5_max value: 35.106 - type: nauc_precision_at_5_std value: 3.9923 - type: nauc_precision_at_5_diff1 value: 25.684099999999997 - type: nauc_precision_at_10_max value: 32.1139 - type: nauc_precision_at_10_std value: 7.097100000000001 - type: nauc_precision_at_10_diff1 value: 20.521 - type: nauc_precision_at_20_max value: 30.3506 - type: nauc_precision_at_20_std value: 9.7899 - type: nauc_precision_at_20_diff1 value: 16.106 - type: nauc_precision_at_100_max value: 23.7062 - type: nauc_precision_at_100_std value: 12.7852 - type: nauc_precision_at_100_diff1 value: 5.9668 - type: nauc_precision_at_1000_max value: 13.6273 - type: nauc_precision_at_1000_std value: 7.0956 - type: nauc_precision_at_1000_diff1 value: -3.6863 - type: nauc_mrr_at_1_max value: 34.6903 - type: nauc_mrr_at_1_std value: -0.48469999999999996 - type: nauc_mrr_at_1_diff1 value: 41.827799999999996 - type: nauc_mrr_at_3_max value: 35.826 - type: nauc_mrr_at_3_std value: 1.3141999999999998 - 
type: nauc_mrr_at_3_diff1 value: 37.1995 - type: nauc_mrr_at_5_max value: 35.6178 - type: nauc_mrr_at_5_std value: 1.3211 - type: nauc_mrr_at_5_diff1 value: 36.8396 - type: nauc_mrr_at_10_max value: 35.4784 - type: nauc_mrr_at_10_std value: 1.6153 - type: nauc_mrr_at_10_diff1 value: 36.6262 - type: nauc_mrr_at_20_max value: 35.5478 - type: nauc_mrr_at_20_std value: 1.8614 - type: nauc_mrr_at_20_diff1 value: 36.5754 - type: nauc_mrr_at_100_max value: 35.5825 - type: nauc_mrr_at_100_std value: 1.9792 - type: nauc_mrr_at_100_diff1 value: 36.5758 - type: nauc_mrr_at_1000_max value: 35.5811 - type: nauc_mrr_at_1000_std value: 1.9691 - type: nauc_mrr_at_1000_diff1 value: 36.587399999999995 - type: main_score value: 33.937 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval (default) revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: mteb/cqadupstack-unix metrics: - type: ndcg_at_1 value: 36.381 - type: ndcg_at_3 value: 41.605 - type: ndcg_at_5 value: 43.854 - type: ndcg_at_10 value: 46.831 - type: ndcg_at_20 value: 49.114999999999995 - type: ndcg_at_100 value: 52.071 - type: ndcg_at_1000 value: 53.864999999999995 - type: map_at_1 value: 30.957 - type: map_at_3 value: 38.074999999999996 - type: map_at_5 value: 39.732 - type: map_at_10 value: 41.187000000000005 - type: map_at_20 value: 41.94 - type: map_at_100 value: 42.447 - type: map_at_1000 value: 42.536 - type: recall_at_1 value: 30.957 - type: recall_at_3 value: 45.213 - type: recall_at_5 value: 51.196 - type: recall_at_10 value: 59.724 - type: recall_at_20 value: 67.837 - type: recall_at_100 value: 81.843 - type: recall_at_1000 value: 93.91000000000001 - type: precision_at_1 value: 36.381 - type: precision_at_3 value: 18.999 - type: precision_at_5 value: 13.172 - type: precision_at_10 value: 7.938000000000001 - type: precision_at_20 value: 4.6129999999999995 - type: precision_at_100 value: 1.172 - type: precision_at_1000 value: 0.14300000000000002 - type: mrr_at_1 value: 36.3806 - type: mrr_at_3 value: 42.7239 - type: mrr_at_5 value: 44.0905 - type: mrr_at_10 value: 45.2951 - type: mrr_at_20 value: 45.8788 - type: mrr_at_100 value: 46.1807 - type: mrr_at_1000 value: 46.226800000000004 - type: nauc_ndcg_at_1_max value: 47.0214 - type: nauc_ndcg_at_1_std value: -0.8086 - type: nauc_ndcg_at_1_diff1 value: 55.931200000000004 - type: nauc_ndcg_at_3_max value: 44.829299999999996 - type: nauc_ndcg_at_3_std value: 0.6224000000000001 - type: nauc_ndcg_at_3_diff1 value: 49.7765 - type: nauc_ndcg_at_5_max value: 44.3325 - type: nauc_ndcg_at_5_std value: 0.1854 - type: nauc_ndcg_at_5_diff1 value: 49.0426 - type: nauc_ndcg_at_10_max value: 44.358599999999996 - type: nauc_ndcg_at_10_std value: 0.6905 - type: nauc_ndcg_at_10_diff1 value: 48.1902 - type: nauc_ndcg_at_20_max value: 45.018 - type: nauc_ndcg_at_20_std value: 1.555 - type: nauc_ndcg_at_20_diff1 value: 48.2645 - type: nauc_ndcg_at_100_max value: 45.3244 - type: nauc_ndcg_at_100_std value: 3.0655 - type: nauc_ndcg_at_100_diff1 value: 48.1011 - type: nauc_ndcg_at_1000_max value: 45.2297 - type: nauc_ndcg_at_1000_std value: 2.5452 - type: nauc_ndcg_at_1000_diff1 value: 48.4179 - type: nauc_map_at_1_max value: 44.1846 - type: nauc_map_at_1_std value: -2.661 - type: nauc_map_at_1_diff1 value: 58.4395 - type: nauc_map_at_3_max value: 44.7697 - type: nauc_map_at_3_std value: -0.3776 - type: nauc_map_at_3_diff1 value: 52.7119 - type: nauc_map_at_5_max value: 44.6708 - type: nauc_map_at_5_std value: -0.4622 - type: nauc_map_at_5_diff1 value: 51.8622 - 
type: nauc_map_at_10_max value: 44.7631 - type: nauc_map_at_10_std value: -0.2403 - type: nauc_map_at_10_diff1 value: 51.439299999999996 - type: nauc_map_at_20_max value: 45.0612 - type: nauc_map_at_20_std value: 0.0038000000000000004 - type: nauc_map_at_20_diff1 value: 51.3768 - type: nauc_map_at_100_max value: 45.137 - type: nauc_map_at_100_std value: 0.2717 - type: nauc_map_at_100_diff1 value: 51.316700000000004 - type: nauc_map_at_1000_max value: 45.1229 - type: nauc_map_at_1000_std value: 0.2513 - type: nauc_map_at_1000_diff1 value: 51.3133 - type: nauc_recall_at_1_max value: 44.1846 - type: nauc_recall_at_1_std value: -2.661 - type: nauc_recall_at_1_diff1 value: 58.4395 - type: nauc_recall_at_3_max value: 41.656 - type: nauc_recall_at_3_std value: 1.6587999999999998 - type: nauc_recall_at_3_diff1 value: 44.9322 - type: nauc_recall_at_5_max value: 40.501 - type: nauc_recall_at_5_std value: 1.1215 - type: nauc_recall_at_5_diff1 value: 41.7702 - type: nauc_recall_at_10_max value: 39.577400000000004 - type: nauc_recall_at_10_std value: 2.172 - type: nauc_recall_at_10_diff1 value: 38.0253 - type: nauc_recall_at_20_max value: 41.1537 - type: nauc_recall_at_20_std value: 6.1195 - type: nauc_recall_at_20_diff1 value: 37.391400000000004 - type: nauc_recall_at_100_max value: 42.2577 - type: nauc_recall_at_100_std value: 20.7745 - type: nauc_recall_at_100_diff1 value: 32.8151 - type: nauc_recall_at_1000_max value: 43.5594 - type: nauc_recall_at_1000_std value: 37.6573 - type: nauc_recall_at_1000_diff1 value: 29.7545 - type: nauc_precision_at_1_max value: 47.0214 - type: nauc_precision_at_1_std value: -0.8086 - type: nauc_precision_at_1_diff1 value: 55.931200000000004 - type: nauc_precision_at_3_max value: 39.4995 - type: nauc_precision_at_3_std value: 5.0051 - type: nauc_precision_at_3_diff1 value: 32.0456 - type: nauc_precision_at_5_max value: 34.972500000000004 - type: nauc_precision_at_5_std value: 5.1238 - type: nauc_precision_at_5_diff1 value: 24.2515 - type: nauc_precision_at_10_max value: 28.364099999999997 - type: nauc_precision_at_10_std value: 6.0539000000000005 - type: nauc_precision_at_10_diff1 value: 14.192599999999999 - type: nauc_precision_at_20_max value: 25.7353 - type: nauc_precision_at_20_std value: 8.860999999999999 - type: nauc_precision_at_20_diff1 value: 7.0925 - type: nauc_precision_at_100_max value: 11.8965 - type: nauc_precision_at_100_std value: 13.143099999999999 - type: nauc_precision_at_100_diff1 value: -8.5811 - type: nauc_precision_at_1000_max value: -3.7232000000000003 - type: nauc_precision_at_1000_std value: 6.392 - type: nauc_precision_at_1000_diff1 value: -20.5151 - type: nauc_mrr_at_1_max value: 47.0214 - type: nauc_mrr_at_1_std value: -0.8086 - type: nauc_mrr_at_1_diff1 value: 55.931200000000004 - type: nauc_mrr_at_3_max value: 45.6591 - type: nauc_mrr_at_3_std value: 0.6383 - type: nauc_mrr_at_3_diff1 value: 50.0407 - type: nauc_mrr_at_5_max value: 45.7236 - type: nauc_mrr_at_5_std value: 0.5502 - type: nauc_mrr_at_5_diff1 value: 49.6432 - type: nauc_mrr_at_10_max value: 45.6287 - type: nauc_mrr_at_10_std value: 0.6239 - type: nauc_mrr_at_10_diff1 value: 49.391200000000005 - type: nauc_mrr_at_20_max value: 45.704899999999995 - type: nauc_mrr_at_20_std value: 0.7987 - type: nauc_mrr_at_20_diff1 value: 49.4844 - type: nauc_mrr_at_100_max value: 45.708 - type: nauc_mrr_at_100_std value: 0.8823 - type: nauc_mrr_at_100_diff1 value: 49.5323 - type: nauc_mrr_at_1000_max value: 45.7135 - type: nauc_mrr_at_1000_std value: 0.8635999999999999 - type: 
nauc_mrr_at_1000_diff1 value: 49.5497 - type: main_score value: 46.831 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: mteb/cqadupstack-webmasters metrics: - type: ndcg_at_1 value: 34.98 - type: ndcg_at_3 value: 39.911 - type: ndcg_at_5 value: 42.21 - type: ndcg_at_10 value: 45.539 - type: ndcg_at_20 value: 47.964 - type: ndcg_at_100 value: 51.642999999999994 - type: ndcg_at_1000 value: 53.647 - type: map_at_1 value: 30.034 - type: map_at_3 value: 35.97 - type: map_at_5 value: 37.635999999999996 - type: map_at_10 value: 39.367999999999995 - type: map_at_20 value: 40.328 - type: map_at_100 value: 41.158 - type: map_at_1000 value: 41.366 - type: recall_at_1 value: 30.034 - type: recall_at_3 value: 42.006 - type: recall_at_5 value: 47.843 - type: recall_at_10 value: 57.568 - type: recall_at_20 value: 66.493 - type: recall_at_100 value: 84.136 - type: recall_at_1000 value: 95.631 - type: precision_at_1 value: 34.98 - type: precision_at_3 value: 18.116 - type: precision_at_5 value: 13.202 - type: precision_at_10 value: 8.616999999999999 - type: precision_at_20 value: 5.425 - type: precision_at_100 value: 1.6260000000000001 - type: precision_at_1000 value: 0.249 - type: mrr_at_1 value: 34.9802 - type: mrr_at_3 value: 41.172599999999996 - type: mrr_at_5 value: 42.4671 - type: mrr_at_10 value: 43.8709 - type: mrr_at_20 value: 44.4684 - type: mrr_at_100 value: 44.8617 - type: mrr_at_1000 value: 44.9033 - type: nauc_ndcg_at_1_max value: 36.1514 - type: nauc_ndcg_at_1_std value: 6.7383 - type: nauc_ndcg_at_1_diff1 value: 49.9936 - type: nauc_ndcg_at_3_max value: 38.3225 - type: nauc_ndcg_at_3_std value: 8.0985 - type: nauc_ndcg_at_3_diff1 value: 42.9416 - type: nauc_ndcg_at_5_max value: 39.4299 - type: nauc_ndcg_at_5_std value: 9.2335 - type: nauc_ndcg_at_5_diff1 value: 43.4214 - type: nauc_ndcg_at_10_max value: 39.1123 - type: nauc_ndcg_at_10_std value: 9.4134 - type: nauc_ndcg_at_10_diff1 value: 42.6415 - type: nauc_ndcg_at_20_max value: 38.9531 - type: nauc_ndcg_at_20_std value: 9.707 - type: nauc_ndcg_at_20_diff1 value: 43.0215 - type: nauc_ndcg_at_100_max value: 40.3045 - type: nauc_ndcg_at_100_std value: 11.304400000000001 - type: nauc_ndcg_at_100_diff1 value: 43.0846 - type: nauc_ndcg_at_1000_max value: 39.9421 - type: nauc_ndcg_at_1000_std value: 11.1666 - type: nauc_ndcg_at_1000_diff1 value: 43.3505 - type: nauc_map_at_1_max value: 34.735 - type: nauc_map_at_1_std value: 2.9007 - type: nauc_map_at_1_diff1 value: 52.495599999999996 - type: nauc_map_at_3_max value: 37.5749 - type: nauc_map_at_3_std value: 5.1779 - type: nauc_map_at_3_diff1 value: 46.536300000000004 - type: nauc_map_at_5_max value: 38.4721 - type: nauc_map_at_5_std value: 6.0973 - type: nauc_map_at_5_diff1 value: 46.434799999999996 - type: nauc_map_at_10_max value: 38.744299999999996 - type: nauc_map_at_10_std value: 6.7116 - type: nauc_map_at_10_diff1 value: 46.0759 - type: nauc_map_at_20_max value: 38.756 - type: nauc_map_at_20_std value: 7.263699999999999 - type: nauc_map_at_20_diff1 value: 46.0274 - type: nauc_map_at_100_max value: 38.9362 - type: nauc_map_at_100_std value: 8.0227 - type: nauc_map_at_100_diff1 value: 45.8767 - type: nauc_map_at_1000_max value: 38.7473 - type: nauc_map_at_1000_std value: 8.089 - type: nauc_map_at_1000_diff1 value: 45.8848 - type: nauc_recall_at_1_max value: 34.735 - type: nauc_recall_at_1_std value: 2.9007 - type: nauc_recall_at_1_diff1 value: 52.495599999999996 - type: 
nauc_recall_at_3_max value: 37.1901 - type: nauc_recall_at_3_std value: 6.4211 - type: nauc_recall_at_3_diff1 value: 38.846000000000004 - type: nauc_recall_at_5_max value: 39.8879 - type: nauc_recall_at_5_std value: 9.5204 - type: nauc_recall_at_5_diff1 value: 37.9339 - type: nauc_recall_at_10_max value: 37.181999999999995 - type: nauc_recall_at_10_std value: 9.764100000000001 - type: nauc_recall_at_10_diff1 value: 33.4855 - type: nauc_recall_at_20_max value: 35.6859 - type: nauc_recall_at_20_std value: 13.173599999999999 - type: nauc_recall_at_20_diff1 value: 33.254 - type: nauc_recall_at_100_max value: 42.728100000000005 - type: nauc_recall_at_100_std value: 25.913999999999998 - type: nauc_recall_at_100_diff1 value: 28.9205 - type: nauc_recall_at_1000_max value: 56.496900000000004 - type: nauc_recall_at_1000_std value: 56.183499999999995 - type: nauc_recall_at_1000_diff1 value: 24.8659 - type: nauc_precision_at_1_max value: 36.1514 - type: nauc_precision_at_1_std value: 6.7383 - type: nauc_precision_at_1_diff1 value: 49.9936 - type: nauc_precision_at_3_max value: 36.5767 - type: nauc_precision_at_3_std value: 14.884500000000001 - type: nauc_precision_at_3_diff1 value: 26.1181 - type: nauc_precision_at_5_max value: 33.7094 - type: nauc_precision_at_5_std value: 17.566699999999997 - type: nauc_precision_at_5_diff1 value: 20.061799999999998 - type: nauc_precision_at_10_max value: 28.034 - type: nauc_precision_at_10_std value: 23.1877 - type: nauc_precision_at_10_diff1 value: 9.646799999999999 - type: nauc_precision_at_20_max value: 17.930699999999998 - type: nauc_precision_at_20_std value: 23.0956 - type: nauc_precision_at_20_diff1 value: -0.0383 - type: nauc_precision_at_100_max value: 0.6149 - type: nauc_precision_at_100_std value: 22.7163 - type: nauc_precision_at_100_diff1 value: -8.730400000000001 - type: nauc_precision_at_1000_max value: -19.8022 - type: nauc_precision_at_1000_std value: 8.6017 - type: nauc_precision_at_1000_diff1 value: -14.161499999999998 - type: nauc_mrr_at_1_max value: 36.1514 - type: nauc_mrr_at_1_std value: 6.7383 - type: nauc_mrr_at_1_diff1 value: 49.9936 - type: nauc_mrr_at_3_max value: 37.894299999999994 - type: nauc_mrr_at_3_std value: 8.948599999999999 - type: nauc_mrr_at_3_diff1 value: 43.985400000000006 - type: nauc_mrr_at_5_max value: 38.8686 - type: nauc_mrr_at_5_std value: 9.4464 - type: nauc_mrr_at_5_diff1 value: 43.9985 - type: nauc_mrr_at_10_max value: 38.419 - type: nauc_mrr_at_10_std value: 9.4221 - type: nauc_mrr_at_10_diff1 value: 43.621700000000004 - type: nauc_mrr_at_20_max value: 38.3933 - type: nauc_mrr_at_20_std value: 9.6024 - type: nauc_mrr_at_20_diff1 value: 43.8952 - type: nauc_mrr_at_100_max value: 38.4371 - type: nauc_mrr_at_100_std value: 9.657200000000001 - type: nauc_mrr_at_100_diff1 value: 43.9457 - type: nauc_mrr_at_1000_max value: 38.4386 - type: nauc_mrr_at_1000_std value: 9.6614 - type: nauc_mrr_at_1000_diff1 value: 43.9579 - type: main_score value: 45.539 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval (default) revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack-wordpress metrics: - type: ndcg_at_1 value: 26.987 - type: ndcg_at_3 value: 33.056999999999995 - type: ndcg_at_5 value: 35.356 - type: ndcg_at_10 value: 38.440000000000005 - type: ndcg_at_20 value: 40.136 - type: ndcg_at_100 value: 43.473 - type: ndcg_at_1000 value: 45.687 - type: map_at_1 value: 24.651999999999997 - type: map_at_3 value: 30.416999999999998 - type: map_at_5 value: 
31.863999999999997 - type: map_at_10 value: 33.253 - type: map_at_20 value: 33.756 - type: map_at_100 value: 34.257 - type: map_at_1000 value: 34.347 - type: recall_at_1 value: 24.651999999999997 - type: recall_at_3 value: 37.88 - type: recall_at_5 value: 43.136 - type: recall_at_10 value: 52.06699999999999 - type: recall_at_20 value: 58.540000000000006 - type: recall_at_100 value: 75.22 - type: recall_at_1000 value: 91.774 - type: precision_at_1 value: 26.987 - type: precision_at_3 value: 14.048 - type: precision_at_5 value: 9.871 - type: precision_at_10 value: 6.063000000000001 - type: precision_at_20 value: 3.4099999999999997 - type: precision_at_100 value: 0.922 - type: precision_at_1000 value: 0.123 - type: mrr_at_1 value: 26.9871 - type: mrr_at_3 value: 33.1485 - type: mrr_at_5 value: 34.3407 - type: mrr_at_10 value: 35.6087 - type: mrr_at_20 value: 36.0483 - type: mrr_at_100 value: 36.463699999999996 - type: mrr_at_1000 value: 36.5278 - type: nauc_ndcg_at_1_max value: 26.6537 - type: nauc_ndcg_at_1_std value: -3.9813 - type: nauc_ndcg_at_1_diff1 value: 47.8302 - type: nauc_ndcg_at_3_max value: 27.3661 - type: nauc_ndcg_at_3_std value: -2.2132 - type: nauc_ndcg_at_3_diff1 value: 39.9424 - type: nauc_ndcg_at_5_max value: 27.417799999999996 - type: nauc_ndcg_at_5_std value: -1.0684 - type: nauc_ndcg_at_5_diff1 value: 39.163599999999995 - type: nauc_ndcg_at_10_max value: 26.555400000000002 - type: nauc_ndcg_at_10_std value: 0.0103 - type: nauc_ndcg_at_10_diff1 value: 38.9487 - type: nauc_ndcg_at_20_max value: 25.963900000000002 - type: nauc_ndcg_at_20_std value: 0.7779 - type: nauc_ndcg_at_20_diff1 value: 38.7279 - type: nauc_ndcg_at_100_max value: 26.6365 - type: nauc_ndcg_at_100_std value: 3.0018 - type: nauc_ndcg_at_100_diff1 value: 38.1326 - type: nauc_ndcg_at_1000_max value: 26.52 - type: nauc_ndcg_at_1000_std value: 2.6968 - type: nauc_ndcg_at_1000_diff1 value: 38.1665 - type: nauc_map_at_1_max value: 24.950400000000002 - type: nauc_map_at_1_std value: -4.2715000000000005 - type: nauc_map_at_1_diff1 value: 48.2994 - type: nauc_map_at_3_max value: 26.4208 - type: nauc_map_at_3_std value: -3.0675 - type: nauc_map_at_3_diff1 value: 41.987 - type: nauc_map_at_5_max value: 26.641900000000003 - type: nauc_map_at_5_std value: -2.3005 - type: nauc_map_at_5_diff1 value: 41.4695 - type: nauc_map_at_10_max value: 26.2781 - type: nauc_map_at_10_std value: -1.8994 - type: nauc_map_at_10_diff1 value: 41.193000000000005 - type: nauc_map_at_20_max value: 26.0838 - type: nauc_map_at_20_std value: -1.7046999999999999 - type: nauc_map_at_20_diff1 value: 41.1128 - type: nauc_map_at_100_max value: 26.230199999999996 - type: nauc_map_at_100_std value: -1.2565 - type: nauc_map_at_100_diff1 value: 41.0271 - type: nauc_map_at_1000_max value: 26.2069 - type: nauc_map_at_1000_std value: -1.2469 - type: nauc_map_at_1000_diff1 value: 41.019 - type: nauc_recall_at_1_max value: 24.950400000000002 - type: nauc_recall_at_1_std value: -4.2715000000000005 - type: nauc_recall_at_1_diff1 value: 48.2994 - type: nauc_recall_at_3_max value: 27.2098 - type: nauc_recall_at_3_std value: -1.309 - type: nauc_recall_at_3_diff1 value: 34.4663 - type: nauc_recall_at_5_max value: 27.323700000000002 - type: nauc_recall_at_5_std value: 1.7010999999999998 - type: nauc_recall_at_5_diff1 value: 32.4911 - type: nauc_recall_at_10_max value: 24.6483 - type: nauc_recall_at_10_std value: 4.9019 - type: nauc_recall_at_10_diff1 value: 32.0585 - type: nauc_recall_at_20_max value: 22.556 - type: nauc_recall_at_20_std value: 8.1527 - type: 
nauc_recall_at_20_diff1 value: 30.8345 - type: nauc_recall_at_100_max value: 25.354300000000002 - type: nauc_recall_at_100_std value: 22.8578 - type: nauc_recall_at_100_diff1 value: 23.291999999999998 - type: nauc_recall_at_1000_max value: 26.523999999999997 - type: nauc_recall_at_1000_std value: 44.7733 - type: nauc_recall_at_1000_diff1 value: 3.1338 - type: nauc_precision_at_1_max value: 26.6537 - type: nauc_precision_at_1_std value: -3.9813 - type: nauc_precision_at_1_diff1 value: 47.8302 - type: nauc_precision_at_3_max value: 30.8201 - type: nauc_precision_at_3_std value: 1.7691 - type: nauc_precision_at_3_diff1 value: 33.3835 - type: nauc_precision_at_5_max value: 29.5433 - type: nauc_precision_at_5_std value: 4.4224 - type: nauc_precision_at_5_diff1 value: 28.426000000000002 - type: nauc_precision_at_10_max value: 26.0888 - type: nauc_precision_at_10_std value: 7.8104000000000005 - type: nauc_precision_at_10_diff1 value: 24.509800000000002 - type: nauc_precision_at_20_max value: 22.218799999999998 - type: nauc_precision_at_20_std value: 11.248099999999999 - type: nauc_precision_at_20_diff1 value: 20.6056 - type: nauc_precision_at_100_max value: 16.4622 - type: nauc_precision_at_100_std value: 25.735200000000003 - type: nauc_precision_at_100_diff1 value: 6.2566 - type: nauc_precision_at_1000_max value: -9.109399999999999 - type: nauc_precision_at_1000_std value: 13.820099999999998 - type: nauc_precision_at_1000_diff1 value: -7.9046 - type: nauc_mrr_at_1_max value: 26.6537 - type: nauc_mrr_at_1_std value: -3.9813 - type: nauc_mrr_at_1_diff1 value: 47.8302 - type: nauc_mrr_at_3_max value: 27.9843 - type: nauc_mrr_at_3_std value: -2.3418 - type: nauc_mrr_at_3_diff1 value: 41.4877 - type: nauc_mrr_at_5_max value: 27.9298 - type: nauc_mrr_at_5_std value: -1.7860999999999998 - type: nauc_mrr_at_5_diff1 value: 40.9261 - type: nauc_mrr_at_10_max value: 27.6814 - type: nauc_mrr_at_10_std value: -1.1542000000000001 - type: nauc_mrr_at_10_diff1 value: 40.9534 - type: nauc_mrr_at_20_max value: 27.507900000000003 - type: nauc_mrr_at_20_std value: -0.9558000000000001 - type: nauc_mrr_at_20_diff1 value: 41.0046 - type: nauc_mrr_at_100_max value: 27.5032 - type: nauc_mrr_at_100_std value: -0.7483 - type: nauc_mrr_at_100_diff1 value: 40.9239 - type: nauc_mrr_at_1000_max value: 27.4957 - type: nauc_mrr_at_1000_std value: -0.7642 - type: nauc_mrr_at_1000_diff1 value: 40.9219 - type: main_score value: 38.440000000000005 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER (default) revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: ndcg_at_1 value: 47.231 - type: ndcg_at_3 value: 38.605000000000004 - type: ndcg_at_5 value: 40.058 - type: ndcg_at_10 value: 43.482 - type: ndcg_at_20 value: 45.732 - type: ndcg_at_100 value: 49.062 - type: ndcg_at_1000 value: 51.605000000000004 - type: map_at_1 value: 20.674 - type: map_at_3 value: 29.375 - type: map_at_5 value: 31.872 - type: map_at_10 value: 33.846 - type: map_at_20 value: 34.733000000000004 - type: map_at_100 value: 35.411 - type: map_at_1000 value: 35.553000000000004 - type: recall_at_1 value: 20.674 - type: recall_at_3 value: 33.859 - type: recall_at_5 value: 39.76 - type: recall_at_10 value: 47.150999999999996 - type: recall_at_20 value: 53.522999999999996 - type: recall_at_100 value: 66.125 - type: recall_at_1000 value: 80.368 - type: precision_at_1 value: 47.231 - type: precision_at_3 value: 28.534 - type: precision_at_5 value: 20.782 - type: precision_at_10 value: 
12.742999999999999 - type: precision_at_20 value: 7.342 - type: precision_at_100 value: 1.883 - type: precision_at_1000 value: 0.23700000000000002 - type: mrr_at_1 value: 47.2313 - type: mrr_at_3 value: 55.6352 - type: mrr_at_5 value: 56.92509999999999 - type: mrr_at_10 value: 57.833400000000005 - type: mrr_at_20 value: 58.178700000000006 - type: mrr_at_100 value: 58.385 - type: mrr_at_1000 value: 58.40919999999999 - type: nauc_ndcg_at_1_max value: 41.5456 - type: nauc_ndcg_at_1_std value: 19.2734 - type: nauc_ndcg_at_1_diff1 value: 38.0868 - type: nauc_ndcg_at_3_max value: 41.6105 - type: nauc_ndcg_at_3_std value: 19.5917 - type: nauc_ndcg_at_3_diff1 value: 29.192800000000002 - type: nauc_ndcg_at_5_max value: 42.1893 - type: nauc_ndcg_at_5_std value: 21.9984 - type: nauc_ndcg_at_5_diff1 value: 27.7412 - type: nauc_ndcg_at_10_max value: 42.5633 - type: nauc_ndcg_at_10_std value: 24.265700000000002 - type: nauc_ndcg_at_10_diff1 value: 27.0287 - type: nauc_ndcg_at_20_max value: 43.364200000000004 - type: nauc_ndcg_at_20_std value: 26.2174 - type: nauc_ndcg_at_20_diff1 value: 26.980500000000003 - type: nauc_ndcg_at_100_max value: 43.9582 - type: nauc_ndcg_at_100_std value: 28.454 - type: nauc_ndcg_at_100_diff1 value: 27.087099999999996 - type: nauc_ndcg_at_1000_max value: 44.0356 - type: nauc_ndcg_at_1000_std value: 28.64 - type: nauc_ndcg_at_1000_diff1 value: 27.1343 - type: nauc_map_at_1_max value: 39.2181 - type: nauc_map_at_1_std value: 12.4972 - type: nauc_map_at_1_diff1 value: 39.5664 - type: nauc_map_at_3_max value: 41.5441 - type: nauc_map_at_3_std value: 17.333000000000002 - type: nauc_map_at_3_diff1 value: 29.9555 - type: nauc_map_at_5_max value: 41.0041 - type: nauc_map_at_5_std value: 19.3667 - type: nauc_map_at_5_diff1 value: 28.0157 - type: nauc_map_at_10_max value: 41.2914 - type: nauc_map_at_10_std value: 21.051000000000002 - type: nauc_map_at_10_diff1 value: 27.387 - type: nauc_map_at_20_max value: 41.6964 - type: nauc_map_at_20_std value: 21.9338 - type: nauc_map_at_20_diff1 value: 27.4326 - type: nauc_map_at_100_max value: 41.8592 - type: nauc_map_at_100_std value: 22.46 - type: nauc_map_at_100_diff1 value: 27.4024 - type: nauc_map_at_1000_max value: 41.8737 - type: nauc_map_at_1000_std value: 22.4882 - type: nauc_map_at_1000_diff1 value: 27.405099999999997 - type: nauc_recall_at_1_max value: 39.2181 - type: nauc_recall_at_1_std value: 12.4972 - type: nauc_recall_at_1_diff1 value: 39.5664 - type: nauc_recall_at_3_max value: 41.3571 - type: nauc_recall_at_3_std value: 18.607699999999998 - type: nauc_recall_at_3_diff1 value: 25.8418 - type: nauc_recall_at_5_max value: 39.1225 - type: nauc_recall_at_5_std value: 22.2091 - type: nauc_recall_at_5_diff1 value: 20.9495 - type: nauc_recall_at_10_max value: 38.0045 - type: nauc_recall_at_10_std value: 25.584 - type: nauc_recall_at_10_diff1 value: 18.489 - type: nauc_recall_at_20_max value: 38.0096 - type: nauc_recall_at_20_std value: 29.3335 - type: nauc_recall_at_20_diff1 value: 17.0106 - type: nauc_recall_at_100_max value: 37.7378 - type: nauc_recall_at_100_std value: 37.0189 - type: nauc_recall_at_100_diff1 value: 14.815900000000001 - type: nauc_recall_at_1000_max value: 36.2825 - type: nauc_recall_at_1000_std value: 42.1995 - type: nauc_recall_at_1000_diff1 value: 10.5182 - type: nauc_precision_at_1_max value: 41.5456 - type: nauc_precision_at_1_std value: 19.2734 - type: nauc_precision_at_1_diff1 value: 38.0868 - type: nauc_precision_at_3_max value: 35.72 - type: nauc_precision_at_3_std value: 22.8785 - type: 
nauc_precision_at_3_diff1 value: 15.240200000000002 - type: nauc_precision_at_5_max value: 30.4643 - type: nauc_precision_at_5_std value: 26.2774 - type: nauc_precision_at_5_diff1 value: 8.8749 - type: nauc_precision_at_10_max value: 25.960299999999997 - type: nauc_precision_at_10_std value: 28.3825 - type: nauc_precision_at_10_diff1 value: 4.626799999999999 - type: nauc_precision_at_20_max value: 24.8278 - type: nauc_precision_at_20_std value: 32.1644 - type: nauc_precision_at_20_diff1 value: 2.5019 - type: nauc_precision_at_100_max value: 17.180999999999997 - type: nauc_precision_at_100_std value: 33.955400000000004 - type: nauc_precision_at_100_diff1 value: -1.9183 - type: nauc_precision_at_1000_max value: 4.8986 - type: nauc_precision_at_1000_std value: 26.5376 - type: nauc_precision_at_1000_diff1 value: -9.3468 - type: nauc_mrr_at_1_max value: 41.5456 - type: nauc_mrr_at_1_std value: 19.2734 - type: nauc_mrr_at_1_diff1 value: 38.0868 - type: nauc_mrr_at_3_max value: 43.7301 - type: nauc_mrr_at_3_std value: 22.409100000000002 - type: nauc_mrr_at_3_diff1 value: 34.846500000000006 - type: nauc_mrr_at_5_max value: 44.0608 - type: nauc_mrr_at_5_std value: 23.3812 - type: nauc_mrr_at_5_diff1 value: 34.5847 - type: nauc_mrr_at_10_max value: 44.026700000000005 - type: nauc_mrr_at_10_std value: 23.339399999999998 - type: nauc_mrr_at_10_diff1 value: 34.7306 - type: nauc_mrr_at_20_max value: 44.1444 - type: nauc_mrr_at_20_std value: 23.5132 - type: nauc_mrr_at_20_diff1 value: 34.6927 - type: nauc_mrr_at_100_max value: 44.1228 - type: nauc_mrr_at_100_std value: 23.5783 - type: nauc_mrr_at_100_diff1 value: 34.7193 - type: nauc_mrr_at_1000_max value: 44.1082 - type: nauc_mrr_at_1000_std value: 23.5574 - type: nauc_mrr_at_1000_diff1 value: 34.719699999999996 - type: main_score value: 43.482 task: type: Retrieval - dataset: config: default name: MTEB DBPedia (default) revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: ndcg_at_1 value: 59.25 - type: ndcg_at_3 value: 48.256 - type: ndcg_at_5 value: 45.580999999999996 - type: ndcg_at_10 value: 43.37 - type: ndcg_at_20 value: 43.106 - type: ndcg_at_100 value: 47.845 - type: ndcg_at_1000 value: 54.974999999999994 - type: map_at_1 value: 10.032 - type: map_at_3 value: 14.954 - type: map_at_5 value: 17.408 - type: map_at_10 value: 20.461 - type: map_at_20 value: 23.759 - type: map_at_100 value: 28.718 - type: map_at_1000 value: 30.406 - type: recall_at_1 value: 10.032 - type: recall_at_3 value: 15.905 - type: recall_at_5 value: 19.622999999999998 - type: recall_at_10 value: 25.125999999999998 - type: recall_at_20 value: 33.262 - type: recall_at_100 value: 52.515 - type: recall_at_1000 value: 75.224 - type: precision_at_1 value: 72.0 - type: precision_at_3 value: 50.917 - type: precision_at_5 value: 43.4 - type: precision_at_10 value: 34.175 - type: precision_at_20 value: 26.325 - type: precision_at_100 value: 10.893 - type: precision_at_1000 value: 2.0549999999999997 - type: mrr_at_1 value: 72.0 - type: mrr_at_3 value: 77.5417 - type: mrr_at_5 value: 78.2042 - type: mrr_at_10 value: 78.7173 - type: mrr_at_20 value: 78.9521 - type: mrr_at_100 value: 79.0382 - type: mrr_at_1000 value: 79.0408 - type: nauc_ndcg_at_1_max value: 49.778 - type: nauc_ndcg_at_1_std value: 20.462 - type: nauc_ndcg_at_1_diff1 value: 49.3621 - type: nauc_ndcg_at_3_max value: 44.4388 - type: nauc_ndcg_at_3_std value: 24.646 - type: nauc_ndcg_at_3_diff1 value: 33.3173 - type: nauc_ndcg_at_5_max value: 44.2179 - type: nauc_ndcg_at_5_std 
value: 25.597399999999997 - type: nauc_ndcg_at_5_diff1 value: 31.0886 - type: nauc_ndcg_at_10_max value: 43.7812 - type: nauc_ndcg_at_10_std value: 25.61 - type: nauc_ndcg_at_10_diff1 value: 30.667699999999996 - type: nauc_ndcg_at_20_max value: 39.4779 - type: nauc_ndcg_at_20_std value: 20.891000000000002 - type: nauc_ndcg_at_20_diff1 value: 29.492600000000003 - type: nauc_ndcg_at_100_max value: 41.511900000000004 - type: nauc_ndcg_at_100_std value: 27.340999999999998 - type: nauc_ndcg_at_100_diff1 value: 30.5701 - type: nauc_ndcg_at_1000_max value: 47.0571 - type: nauc_ndcg_at_1000_std value: 37.0976 - type: nauc_ndcg_at_1000_diff1 value: 31.5615 - type: nauc_map_at_1_max value: 0.4743 - type: nauc_map_at_1_std value: -23.7532 - type: nauc_map_at_1_diff1 value: 26.0851 - type: nauc_map_at_3_max value: 8.5131 - type: nauc_map_at_3_std value: -18.6015 - type: nauc_map_at_3_diff1 value: 21.9172 - type: nauc_map_at_5_max value: 12.295499999999999 - type: nauc_map_at_5_std value: -13.872100000000001 - type: nauc_map_at_5_diff1 value: 21.3319 - type: nauc_map_at_10_max value: 17.1428 - type: nauc_map_at_10_std value: -6.638199999999999 - type: nauc_map_at_10_diff1 value: 20.8671 - type: nauc_map_at_20_max value: 21.7306 - type: nauc_map_at_20_std value: 2.1404 - type: nauc_map_at_20_diff1 value: 20.7929 - type: nauc_map_at_100_max value: 29.677799999999998 - type: nauc_map_at_100_std value: 16.9458 - type: nauc_map_at_100_diff1 value: 22.4101 - type: nauc_map_at_1000_max value: 31.5735 - type: nauc_map_at_1000_std value: 20.5816 - type: nauc_map_at_1000_diff1 value: 22.561400000000003 - type: nauc_recall_at_1_max value: 0.4743 - type: nauc_recall_at_1_std value: -23.7532 - type: nauc_recall_at_1_diff1 value: 26.0851 - type: nauc_recall_at_3_max value: 6.851500000000001 - type: nauc_recall_at_3_std value: -18.7341 - type: nauc_recall_at_3_diff1 value: 19.703699999999998 - type: nauc_recall_at_5_max value: 10.0265 - type: nauc_recall_at_5_std value: -14.2537 - type: nauc_recall_at_5_diff1 value: 18.8765 - type: nauc_recall_at_10_max value: 14.1582 - type: nauc_recall_at_10_std value: -7.703 - type: nauc_recall_at_10_diff1 value: 17.9056 - type: nauc_recall_at_20_max value: 15.0343 - type: nauc_recall_at_20_std value: -0.9846 - type: nauc_recall_at_20_diff1 value: 14.377899999999999 - type: nauc_recall_at_100_max value: 27.904600000000002 - type: nauc_recall_at_100_std value: 24.6322 - type: nauc_recall_at_100_diff1 value: 16.869500000000002 - type: nauc_recall_at_1000_max value: 33.7755 - type: nauc_recall_at_1000_std value: 42.241800000000005 - type: nauc_recall_at_1000_diff1 value: 17.3324 - type: nauc_precision_at_1_max value: 62.3459 - type: nauc_precision_at_1_std value: 28.3277 - type: nauc_precision_at_1_diff1 value: 57.8053 - type: nauc_precision_at_3_max value: 45.8296 - type: nauc_precision_at_3_std value: 39.8642 - type: nauc_precision_at_3_diff1 value: 15.7381 - type: nauc_precision_at_5_max value: 45.331900000000005 - type: nauc_precision_at_5_std value: 45.1279 - type: nauc_precision_at_5_diff1 value: 11.473700000000001 - type: nauc_precision_at_10_max value: 42.276399999999995 - type: nauc_precision_at_10_std value: 50.9538 - type: nauc_precision_at_10_diff1 value: 6.708699999999999 - type: nauc_precision_at_20_max value: 37.961600000000004 - type: nauc_precision_at_20_std value: 52.0611 - type: nauc_precision_at_20_diff1 value: 5.9309 - type: nauc_precision_at_100_max value: 29.567 - type: nauc_precision_at_100_std value: 50.07 - type: nauc_precision_at_100_diff1 value: 3.2583 - 
type: nauc_precision_at_1000_max value: 5.5285 - type: nauc_precision_at_1000_std value: 20.5813 - type: nauc_precision_at_1000_diff1 value: -6.6333 - type: nauc_mrr_at_1_max value: 62.3459 - type: nauc_mrr_at_1_std value: 28.3277 - type: nauc_mrr_at_1_diff1 value: 57.8053 - type: nauc_mrr_at_3_max value: 66.5168 - type: nauc_mrr_at_3_std value: 37.4446 - type: nauc_mrr_at_3_diff1 value: 57.6125 - type: nauc_mrr_at_5_max value: 65.8343 - type: nauc_mrr_at_5_std value: 36.6396 - type: nauc_mrr_at_5_diff1 value: 56.91589999999999 - type: nauc_mrr_at_10_max value: 65.73750000000001 - type: nauc_mrr_at_10_std value: 36.4067 - type: nauc_mrr_at_10_diff1 value: 56.9594 - type: nauc_mrr_at_20_max value: 65.6623 - type: nauc_mrr_at_20_std value: 36.0989 - type: nauc_mrr_at_20_diff1 value: 56.9662 - type: nauc_mrr_at_100_max value: 65.6934 - type: nauc_mrr_at_100_std value: 36.0911 - type: nauc_mrr_at_100_diff1 value: 57.0541 - type: nauc_mrr_at_1000_max value: 65.68929999999999 - type: nauc_mrr_at_1000_std value: 36.0838 - type: nauc_mrr_at_1000_diff1 value: 57.054300000000005 - type: main_score value: 43.37 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 42.53 - type: f1 value: 38.4608 - type: f1_weighted value: 44.6927 - type: main_score value: 42.53 task: type: Classification - dataset: config: default name: MTEB FEVER (default) revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: ndcg_at_1 value: 90.519 - type: ndcg_at_3 value: 91.387 - type: ndcg_at_5 value: 91.644 - type: ndcg_at_10 value: 91.91 - type: ndcg_at_20 value: 92.136 - type: ndcg_at_100 value: 92.406 - type: ndcg_at_1000 value: 92.62599999999999 - type: map_at_1 value: 83.994 - type: map_at_3 value: 88.885 - type: map_at_5 value: 89.185 - type: map_at_10 value: 89.36500000000001 - type: map_at_20 value: 89.458 - type: map_at_100 value: 89.515 - type: map_at_1000 value: 89.52799999999999 - type: recall_at_1 value: 83.994 - type: recall_at_3 value: 93.145 - type: recall_at_5 value: 94.016 - type: recall_at_10 value: 94.836 - type: recall_at_20 value: 95.56700000000001 - type: recall_at_100 value: 96.711 - type: recall_at_1000 value: 98.027 - type: precision_at_1 value: 90.519 - type: precision_at_3 value: 33.922999999999995 - type: precision_at_5 value: 20.636 - type: precision_at_10 value: 10.474 - type: precision_at_20 value: 5.316 - type: precision_at_100 value: 1.0919999999999999 - type: precision_at_1000 value: 0.11299999999999999 - type: mrr_at_1 value: 90.5191 - type: mrr_at_3 value: 94.37440000000001 - type: mrr_at_5 value: 94.4832 - type: mrr_at_10 value: 94.5215 - type: mrr_at_20 value: 94.5365 - type: mrr_at_100 value: 94.5422 - type: mrr_at_1000 value: 94.54249999999999 - type: nauc_ndcg_at_1_max value: 22.1341 - type: nauc_ndcg_at_1_std value: -11.1273 - type: nauc_ndcg_at_1_diff1 value: 81.8507 - type: nauc_ndcg_at_3_max value: 16.8937 - type: nauc_ndcg_at_3_std value: -7.1829 - type: nauc_ndcg_at_3_diff1 value: 43.892199999999995 - type: nauc_ndcg_at_5_max value: 17.9177 - type: nauc_ndcg_at_5_std value: -5.2 - type: nauc_ndcg_at_5_diff1 value: 41.9608 - type: nauc_ndcg_at_10_max value: 17.8222 - type: nauc_ndcg_at_10_std value: -3.8736 - type: nauc_ndcg_at_10_diff1 value: 41.955 - type: nauc_ndcg_at_20_max value: 18.467200000000002 - type: nauc_ndcg_at_20_std value: -2.7304 - type: nauc_ndcg_at_20_diff1 value: 
42.950300000000006 - type: nauc_ndcg_at_100_max value: 18.5918 - type: nauc_ndcg_at_100_std value: -2.874 - type: nauc_ndcg_at_100_diff1 value: 44.182 - type: nauc_ndcg_at_1000_max value: 18.9498 - type: nauc_ndcg_at_1000_std value: -2.8561 - type: nauc_ndcg_at_1000_diff1 value: 45.5587 - type: nauc_map_at_1_max value: 14.943600000000002 - type: nauc_map_at_1_std value: -6.3744 - type: nauc_map_at_1_diff1 value: 51.697700000000005 - type: nauc_map_at_3_max value: 15.7558 - type: nauc_map_at_3_std value: -5.8517 - type: nauc_map_at_3_diff1 value: 41.814 - type: nauc_map_at_5_max value: 16.6287 - type: nauc_map_at_5_std value: -4.9942 - type: nauc_map_at_5_diff1 value: 41.605199999999996 - type: nauc_map_at_10_max value: 16.8146 - type: nauc_map_at_10_std value: -4.4551 - type: nauc_map_at_10_diff1 value: 41.9641 - type: nauc_map_at_20_max value: 17.0709 - type: nauc_map_at_20_std value: -4.1187000000000005 - type: nauc_map_at_20_diff1 value: 42.3292 - type: nauc_map_at_100_max value: 17.1076 - type: nauc_map_at_100_std value: -4.1089 - type: nauc_map_at_100_diff1 value: 42.5101 - type: nauc_map_at_1000_max value: 17.1309 - type: nauc_map_at_1000_std value: -4.0958000000000006 - type: nauc_map_at_1000_diff1 value: 42.5694 - type: nauc_recall_at_1_max value: 14.943600000000002 - type: nauc_recall_at_1_std value: -6.3744 - type: nauc_recall_at_1_diff1 value: 51.697700000000005 - type: nauc_recall_at_3_max value: 11.8984 - type: nauc_recall_at_3_std value: -4.224 - type: nauc_recall_at_3_diff1 value: 13.962 - type: nauc_recall_at_5_max value: 16.2434 - type: nauc_recall_at_5_std value: 1.6707 - type: nauc_recall_at_5_diff1 value: 7.788 - type: nauc_recall_at_10_max value: 16.4427 - type: nauc_recall_at_10_std value: 8.259 - type: nauc_recall_at_10_diff1 value: 4.5507 - type: nauc_recall_at_20_max value: 19.0546 - type: nauc_recall_at_20_std value: 16.7132 - type: nauc_recall_at_20_diff1 value: 3.5242000000000004 - type: nauc_recall_at_100_max value: 19.6815 - type: nauc_recall_at_100_std value: 21.4767 - type: nauc_recall_at_100_diff1 value: 1.4785 - type: nauc_recall_at_1000_max value: 26.5748 - type: nauc_recall_at_1000_std value: 37.026399999999995 - type: nauc_recall_at_1000_diff1 value: 1.512 - type: nauc_precision_at_1_max value: 22.1341 - type: nauc_precision_at_1_std value: -11.1273 - type: nauc_precision_at_1_diff1 value: 81.8507 - type: nauc_precision_at_3_max value: 13.6152 - type: nauc_precision_at_3_std value: -2.4367 - type: nauc_precision_at_3_diff1 value: 1.6237000000000001 - type: nauc_precision_at_5_max value: 13.977400000000001 - type: nauc_precision_at_5_std value: 4.3391 - type: nauc_precision_at_5_diff1 value: -6.660000000000001 - type: nauc_precision_at_10_max value: 10.4986 - type: nauc_precision_at_10_std value: 8.9132 - type: nauc_precision_at_10_diff1 value: -7.5682 - type: nauc_precision_at_20_max value: 11.0525 - type: nauc_precision_at_20_std value: 12.0579 - type: nauc_precision_at_20_diff1 value: -5.0471 - type: nauc_precision_at_100_max value: 7.1659 - type: nauc_precision_at_100_std value: 8.1754 - type: nauc_precision_at_100_diff1 value: -2.7885 - type: nauc_precision_at_1000_max value: 4.9776 - type: nauc_precision_at_1000_std value: 5.8301 - type: nauc_precision_at_1000_diff1 value: 0.18860000000000002 - type: nauc_mrr_at_1_max value: 22.1341 - type: nauc_mrr_at_1_std value: -11.1273 - type: nauc_mrr_at_1_diff1 value: 81.8507 - type: nauc_mrr_at_3_max value: 21.6738 - type: nauc_mrr_at_3_std value: -15.7016 - type: nauc_mrr_at_3_diff1 value: 81.0757 - type: 
nauc_mrr_at_5_max value: 22.6603 - type: nauc_mrr_at_5_std value: -14.7345 - type: nauc_mrr_at_5_diff1 value: 81.1092 - type: nauc_mrr_at_10_max value: 22.4279 - type: nauc_mrr_at_10_std value: -14.5002 - type: nauc_mrr_at_10_diff1 value: 81.11080000000001 - type: nauc_mrr_at_20_max value: 22.3604 - type: nauc_mrr_at_20_std value: -14.3058 - type: nauc_mrr_at_20_diff1 value: 81.1563 - type: nauc_mrr_at_100_max value: 22.311 - type: nauc_mrr_at_100_std value: -14.318100000000001 - type: nauc_mrr_at_100_diff1 value: 81.1586 - type: nauc_mrr_at_1000_max value: 22.307199999999998 - type: nauc_mrr_at_1000_std value: -14.3234 - type: nauc_mrr_at_1000_diff1 value: 81.1576 - type: main_score value: 91.91 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 (default) revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: ndcg_at_1 value: 44.753 - type: ndcg_at_3 value: 41.555 - type: ndcg_at_5 value: 42.809999999999995 - type: ndcg_at_10 value: 45.49 - type: ndcg_at_20 value: 48.287 - type: ndcg_at_100 value: 52.115 - type: ndcg_at_1000 value: 54.797 - type: map_at_1 value: 22.894000000000002 - type: map_at_3 value: 32.786 - type: map_at_5 value: 35.495 - type: map_at_10 value: 37.635000000000005 - type: map_at_20 value: 38.771 - type: map_at_100 value: 39.56 - type: map_at_1000 value: 39.734 - type: recall_at_1 value: 22.894000000000002 - type: recall_at_3 value: 37.579 - type: recall_at_5 value: 44.03 - type: recall_at_10 value: 52.61900000000001 - type: recall_at_20 value: 61.227 - type: recall_at_100 value: 76.88199999999999 - type: recall_at_1000 value: 92.534 - type: precision_at_1 value: 44.753 - type: precision_at_3 value: 27.675 - type: precision_at_5 value: 20.556 - type: precision_at_10 value: 12.592999999999998 - type: precision_at_20 value: 7.507999999999999 - type: precision_at_100 value: 1.9369999999999998 - type: precision_at_1000 value: 0.242 - type: mrr_at_1 value: 44.7531 - type: mrr_at_3 value: 50.694399999999995 - type: mrr_at_5 value: 51.990700000000004 - type: mrr_at_10 value: 52.9925 - type: mrr_at_20 value: 53.4612 - type: mrr_at_100 value: 53.7889 - type: mrr_at_1000 value: 53.8244 - type: nauc_ndcg_at_1_max value: 46.679700000000004 - type: nauc_ndcg_at_1_std value: -7.8208 - type: nauc_ndcg_at_1_diff1 value: 55.9238 - type: nauc_ndcg_at_3_max value: 39.761 - type: nauc_ndcg_at_3_std value: -7.6645 - type: nauc_ndcg_at_3_diff1 value: 43.6641 - type: nauc_ndcg_at_5_max value: 37.2506 - type: nauc_ndcg_at_5_std value: -7.574300000000001 - type: nauc_ndcg_at_5_diff1 value: 41.6025 - type: nauc_ndcg_at_10_max value: 38.1464 - type: nauc_ndcg_at_10_std value: -6.1288 - type: nauc_ndcg_at_10_diff1 value: 42.625 - type: nauc_ndcg_at_20_max value: 39.687 - type: nauc_ndcg_at_20_std value: -4.6046 - type: nauc_ndcg_at_20_diff1 value: 43.2796 - type: nauc_ndcg_at_100_max value: 41.4101 - type: nauc_ndcg_at_100_std value: -2.1537 - type: nauc_ndcg_at_100_diff1 value: 43.980599999999995 - type: nauc_ndcg_at_1000_max value: 42.0853 - type: nauc_ndcg_at_1000_std value: -2.5 - type: nauc_ndcg_at_1000_diff1 value: 44.5636 - type: nauc_map_at_1_max value: 21.019299999999998 - type: nauc_map_at_1_std value: -10.8832 - type: nauc_map_at_1_diff1 value: 45.1685 - type: nauc_map_at_3_max value: 29.0524 - type: nauc_map_at_3_std value: -9.6495 - type: nauc_map_at_3_diff1 value: 41.3844 - type: nauc_map_at_5_max value: 31.3813 - type: nauc_map_at_5_std value: -8.7888 - type: nauc_map_at_5_diff1 value: 40.1699 - type: nauc_map_at_10_max 
value: 33.8361 - type: nauc_map_at_10_std value: -7.9594 - type: nauc_map_at_10_diff1 value: 40.788999999999994 - type: nauc_map_at_20_max value: 34.9439 - type: nauc_map_at_20_std value: -7.382700000000001 - type: nauc_map_at_20_diff1 value: 41.134100000000004 - type: nauc_map_at_100_max value: 35.530899999999995 - type: nauc_map_at_100_std value: -6.8411 - type: nauc_map_at_100_diff1 value: 41.316 - type: nauc_map_at_1000_max value: 35.6246 - type: nauc_map_at_1000_std value: -6.828399999999999 - type: nauc_map_at_1000_diff1 value: 41.3739 - type: nauc_recall_at_1_max value: 21.019299999999998 - type: nauc_recall_at_1_std value: -10.8832 - type: nauc_recall_at_1_diff1 value: 45.1685 - type: nauc_recall_at_3_max value: 25.667499999999997 - type: nauc_recall_at_3_std value: -9.3695 - type: nauc_recall_at_3_diff1 value: 35.0424 - type: nauc_recall_at_5_max value: 26.2285 - type: nauc_recall_at_5_std value: -7.6552 - type: nauc_recall_at_5_diff1 value: 31.7068 - type: nauc_recall_at_10_max value: 29.12 - type: nauc_recall_at_10_std value: -3.5869 - type: nauc_recall_at_10_diff1 value: 31.952599999999997 - type: nauc_recall_at_20_max value: 31.5269 - type: nauc_recall_at_20_std value: 2.2824 - type: nauc_recall_at_20_diff1 value: 31.4747 - type: nauc_recall_at_100_max value: 34.533500000000004 - type: nauc_recall_at_100_std value: 18.8398 - type: nauc_recall_at_100_diff1 value: 29.525000000000002 - type: nauc_recall_at_1000_max value: 38.973600000000005 - type: nauc_recall_at_1000_std value: 37.9643 - type: nauc_recall_at_1000_diff1 value: 29.247899999999998 - type: nauc_precision_at_1_max value: 46.679700000000004 - type: nauc_precision_at_1_std value: -7.8208 - type: nauc_precision_at_1_diff1 value: 55.9238 - type: nauc_precision_at_3_max value: 46.348800000000004 - type: nauc_precision_at_3_std value: -2.4303000000000003 - type: nauc_precision_at_3_diff1 value: 31.4803 - type: nauc_precision_at_5_max value: 45.657 - type: nauc_precision_at_5_std value: 0.9887999999999999 - type: nauc_precision_at_5_diff1 value: 22.6439 - type: nauc_precision_at_10_max value: 48.147099999999995 - type: nauc_precision_at_10_std value: 5.313 - type: nauc_precision_at_10_diff1 value: 20.7803 - type: nauc_precision_at_20_max value: 47.407199999999996 - type: nauc_precision_at_20_std value: 8.8254 - type: nauc_precision_at_20_diff1 value: 17.7327 - type: nauc_precision_at_100_max value: 43.4944 - type: nauc_precision_at_100_std value: 14.8423 - type: nauc_precision_at_100_diff1 value: 11.7231 - type: nauc_precision_at_1000_max value: 36.3175 - type: nauc_precision_at_1000_std value: 14.9478 - type: nauc_precision_at_1000_diff1 value: 4.9391 - type: nauc_mrr_at_1_max value: 46.679700000000004 - type: nauc_mrr_at_1_std value: -7.8208 - type: nauc_mrr_at_1_diff1 value: 55.9238 - type: nauc_mrr_at_3_max value: 48.0241 - type: nauc_mrr_at_3_std value: -6.761100000000001 - type: nauc_mrr_at_3_diff1 value: 53.5091 - type: nauc_mrr_at_5_max value: 48.0965 - type: nauc_mrr_at_5_std value: -6.3173 - type: nauc_mrr_at_5_diff1 value: 52.9184 - type: nauc_mrr_at_10_max value: 48.3523 - type: nauc_mrr_at_10_std value: -5.6531 - type: nauc_mrr_at_10_diff1 value: 53.209399999999995 - type: nauc_mrr_at_20_max value: 48.365700000000004 - type: nauc_mrr_at_20_std value: -5.4359 - type: nauc_mrr_at_20_diff1 value: 53.16760000000001 - type: nauc_mrr_at_100_max value: 48.351699999999994 - type: nauc_mrr_at_100_std value: -5.3941 - type: nauc_mrr_at_100_diff1 value: 53.2419 - type: nauc_mrr_at_1000_max value: 48.343399999999995 - 
type: nauc_mrr_at_1000_std value: -5.4193 - type: nauc_mrr_at_1000_diff1 value: 53.264500000000005 - type: main_score value: 45.49 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA (default) revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa metrics: - type: ndcg_at_1 value: 86.536 - type: ndcg_at_3 value: 64.485 - type: ndcg_at_5 value: 66.513 - type: ndcg_at_10 value: 68.151 - type: ndcg_at_20 value: 69.145 - type: ndcg_at_100 value: 70.552 - type: ndcg_at_1000 value: 71.772 - type: map_at_1 value: 43.268 - type: map_at_3 value: 56.013999999999996 - type: map_at_5 value: 57.69 - type: map_at_10 value: 58.709 - type: map_at_20 value: 59.122 - type: map_at_100 value: 59.418000000000006 - type: map_at_1000 value: 59.480999999999995 - type: recall_at_1 value: 43.268 - type: recall_at_3 value: 58.831999999999994 - type: recall_at_5 value: 62.829 - type: recall_at_10 value: 66.94099999999999 - type: recall_at_20 value: 70.135 - type: recall_at_100 value: 76.34 - type: recall_at_1000 value: 84.443 - type: precision_at_1 value: 86.536 - type: precision_at_3 value: 39.221000000000004 - type: precision_at_5 value: 25.131999999999998 - type: precision_at_10 value: 13.388 - type: precision_at_20 value: 7.013999999999999 - type: precision_at_100 value: 1.5270000000000001 - type: precision_at_1000 value: 0.169 - type: mrr_at_1 value: 86.5361 - type: mrr_at_3 value: 89.6151 - type: mrr_at_5 value: 89.9521 - type: mrr_at_10 value: 90.1301 - type: mrr_at_20 value: 90.201 - type: mrr_at_100 value: 90.2397 - type: mrr_at_1000 value: 90.245 - type: nauc_ndcg_at_1_max value: 57.6156 - type: nauc_ndcg_at_1_std value: -3.39 - type: nauc_ndcg_at_1_diff1 value: 83.0288 - type: nauc_ndcg_at_3_max value: 17.758599999999998 - type: nauc_ndcg_at_3_std value: 3.3521 - type: nauc_ndcg_at_3_diff1 value: 15.4846 - type: nauc_ndcg_at_5_max value: 14.6571 - type: nauc_ndcg_at_5_std value: 4.2071 - type: nauc_ndcg_at_5_diff1 value: 12.3942 - type: nauc_ndcg_at_10_max value: 12.5579 - type: nauc_ndcg_at_10_std value: 4.7895 - type: nauc_ndcg_at_10_diff1 value: 10.2189 - type: nauc_ndcg_at_20_max value: 11.5413 - type: nauc_ndcg_at_20_std value: 5.0043 - type: nauc_ndcg_at_20_diff1 value: 9.3896 - type: nauc_ndcg_at_100_max value: 10.6797 - type: nauc_ndcg_at_100_std value: 5.7805 - type: nauc_ndcg_at_100_diff1 value: 8.5649 - type: nauc_ndcg_at_1000_max value: 10.8847 - type: nauc_ndcg_at_1000_std value: 6.1945 - type: nauc_ndcg_at_1000_diff1 value: 8.539 - type: nauc_map_at_1_max value: 57.6156 - type: nauc_map_at_1_std value: -3.39 - type: nauc_map_at_1_diff1 value: 83.0288 - type: nauc_map_at_3_max value: 12.4083 - type: nauc_map_at_3_std value: 3.2297 - type: nauc_map_at_3_diff1 value: 8.2482 - type: nauc_map_at_5_max value: 10.4054 - type: nauc_map_at_5_std value: 3.7108000000000003 - type: nauc_map_at_5_diff1 value: 6.4539 - type: nauc_map_at_10_max value: 9.439300000000001 - type: nauc_map_at_10_std value: 4.0356000000000005 - type: nauc_map_at_10_diff1 value: 5.502400000000001 - type: nauc_map_at_20_max value: 9.141 - type: nauc_map_at_20_std value: 4.1145000000000005 - type: nauc_map_at_20_diff1 value: 5.2942 - type: nauc_map_at_100_max value: 9.0071 - type: nauc_map_at_100_std value: 4.2345 - type: nauc_map_at_100_diff1 value: 5.1606 - type: nauc_map_at_1000_max value: 9.017999999999999 - type: nauc_map_at_1000_std value: 4.2501 - type: nauc_map_at_1000_diff1 value: 5.162 - type: nauc_recall_at_1_max value: 57.6156 - type: nauc_recall_at_1_std value: -3.39 - type: 
nauc_recall_at_1_diff1 value: 83.0288 - type: nauc_recall_at_3_max value: 8.4358 - type: nauc_recall_at_3_std value: 4.925199999999999 - type: nauc_recall_at_3_diff1 value: 0.29009999999999997 - type: nauc_recall_at_5_max value: 3.2076000000000002 - type: nauc_recall_at_5_std value: 6.2316 - type: nauc_recall_at_5_diff1 value: -4.6014 - type: nauc_recall_at_10_max value: -1.7786 - type: nauc_recall_at_10_std value: 7.467300000000001 - type: nauc_recall_at_10_diff1 value: -9.6991 - type: nauc_recall_at_20_max value: -5.0717 - type: nauc_recall_at_20_std value: 8.1128 - type: nauc_recall_at_20_diff1 value: -12.5945 - type: nauc_recall_at_100_max value: -10.5434 - type: nauc_recall_at_100_std value: 11.7719 - type: nauc_recall_at_100_diff1 value: -18.394 - type: nauc_recall_at_1000_max value: -15.5908 - type: nauc_recall_at_1000_std value: 16.842399999999998 - type: nauc_recall_at_1000_diff1 value: -27.099400000000003 - type: nauc_precision_at_1_max value: 57.6156 - type: nauc_precision_at_1_std value: -3.39 - type: nauc_precision_at_1_diff1 value: 83.0288 - type: nauc_precision_at_3_max value: 8.4358 - type: nauc_precision_at_3_std value: 4.925199999999999 - type: nauc_precision_at_3_diff1 value: 0.29009999999999997 - type: nauc_precision_at_5_max value: 3.2076000000000002 - type: nauc_precision_at_5_std value: 6.2316 - type: nauc_precision_at_5_diff1 value: -4.6014 - type: nauc_precision_at_10_max value: -1.7786 - type: nauc_precision_at_10_std value: 7.467300000000001 - type: nauc_precision_at_10_diff1 value: -9.6991 - type: nauc_precision_at_20_max value: -5.0717 - type: nauc_precision_at_20_std value: 8.1128 - type: nauc_precision_at_20_diff1 value: -12.5945 - type: nauc_precision_at_100_max value: -10.5434 - type: nauc_precision_at_100_std value: 11.7719 - type: nauc_precision_at_100_diff1 value: -18.394 - type: nauc_precision_at_1000_max value: -15.5908 - type: nauc_precision_at_1000_std value: 16.842399999999998 - type: nauc_precision_at_1000_diff1 value: -27.099400000000003 - type: nauc_mrr_at_1_max value: 57.6156 - type: nauc_mrr_at_1_std value: -3.39 - type: nauc_mrr_at_1_diff1 value: 83.0288 - type: nauc_mrr_at_3_max value: 62.074 - type: nauc_mrr_at_3_std value: -0.45199999999999996 - type: nauc_mrr_at_3_diff1 value: 82.8025 - type: nauc_mrr_at_5_max value: 62.157300000000006 - type: nauc_mrr_at_5_std value: 0.2829 - type: nauc_mrr_at_5_diff1 value: 82.9913 - type: nauc_mrr_at_10_max value: 61.9838 - type: nauc_mrr_at_10_std value: 0.16670000000000001 - type: nauc_mrr_at_10_diff1 value: 82.9452 - type: nauc_mrr_at_20_max value: 61.9516 - type: nauc_mrr_at_20_std value: 0.18159999999999998 - type: nauc_mrr_at_20_diff1 value: 82.9723 - type: nauc_mrr_at_100_max value: 61.891600000000004 - type: nauc_mrr_at_100_std value: 0.1432 - type: nauc_mrr_at_100_diff1 value: 82.97489999999999 - type: nauc_mrr_at_1000_max value: 61.88249999999999 - type: nauc_mrr_at_1000_std value: 0.1357 - type: nauc_mrr_at_1000_diff1 value: 82.9723 - type: main_score value: 68.151 task: type: Retrieval - dataset: config: default name: MTEB ImdbClassification (default) revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 72.5444 - type: f1 value: 72.4069 - type: f1_weighted value: 72.4069 - type: ap value: 66.8419 - type: ap_weighted value: 66.8419 - type: main_score value: 72.5444 task: type: Classification - dataset: config: default name: MTEB MSMARCO (default) revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: 
- type: ndcg_at_1 value: 25.516 - type: ndcg_at_3 value: 36.687999999999995 - type: ndcg_at_5 value: 40.864 - type: ndcg_at_10 value: 44.856 - type: ndcg_at_20 value: 47.3 - type: ndcg_at_100 value: 50.062 - type: ndcg_at_1000 value: 51.085 - type: map_at_1 value: 24.782 - type: map_at_3 value: 33.668 - type: map_at_5 value: 36.010999999999996 - type: map_at_10 value: 37.702000000000005 - type: map_at_20 value: 38.391 - type: map_at_100 value: 38.798 - type: map_at_1000 value: 38.841 - type: recall_at_1 value: 24.782 - type: recall_at_3 value: 44.722 - type: recall_at_5 value: 54.769999999999996 - type: recall_at_10 value: 66.842 - type: recall_at_20 value: 76.319 - type: recall_at_100 value: 90.761 - type: recall_at_1000 value: 98.48 - type: precision_at_1 value: 25.516 - type: precision_at_3 value: 15.506 - type: precision_at_5 value: 11.413 - type: precision_at_10 value: 6.99 - type: precision_at_20 value: 4.009 - type: precision_at_100 value: 0.959 - type: precision_at_1000 value: 0.105 - type: mrr_at_1 value: 25.5014 - type: mrr_at_3 value: 34.3553 - type: mrr_at_5 value: 36.666199999999996 - type: mrr_at_10 value: 38.3084 - type: mrr_at_20 value: 38.9663 - type: mrr_at_100 value: 39.341300000000004 - type: mrr_at_1000 value: 39.3785 - type: nauc_ndcg_at_1_max value: 4.2138 - type: nauc_ndcg_at_1_std value: -24.7801 - type: nauc_ndcg_at_1_diff1 value: 37.758399999999995 - type: nauc_ndcg_at_3_max value: 5.2536 - type: nauc_ndcg_at_3_std value: -29.642200000000003 - type: nauc_ndcg_at_3_diff1 value: 32.1639 - type: nauc_ndcg_at_5_max value: 5.0839 - type: nauc_ndcg_at_5_std value: -31.3077 - type: nauc_ndcg_at_5_diff1 value: 31.5135 - type: nauc_ndcg_at_10_max value: 6.2542 - type: nauc_ndcg_at_10_std value: -30.8439 - type: nauc_ndcg_at_10_diff1 value: 31.461299999999998 - type: nauc_ndcg_at_20_max value: 6.5669 - type: nauc_ndcg_at_20_std value: -29.6288 - type: nauc_ndcg_at_20_diff1 value: 31.590200000000003 - type: nauc_ndcg_at_100_max value: 6.691800000000001 - type: nauc_ndcg_at_100_std value: -28.1768 - type: nauc_ndcg_at_100_diff1 value: 32.1699 - type: nauc_ndcg_at_1000_max value: 6.451700000000001 - type: nauc_ndcg_at_1000_std value: -28.2093 - type: nauc_ndcg_at_1000_diff1 value: 32.3573 - type: nauc_map_at_1_max value: 4.1941 - type: nauc_map_at_1_std value: -24.9531 - type: nauc_map_at_1_diff1 value: 38.099 - type: nauc_map_at_3_max value: 4.9883999999999995 - type: nauc_map_at_3_std value: -28.7062 - type: nauc_map_at_3_diff1 value: 33.5696 - type: nauc_map_at_5_max value: 4.8525 - type: nauc_map_at_5_std value: -29.6601 - type: nauc_map_at_5_diff1 value: 33.2144 - type: nauc_map_at_10_max value: 5.3533 - type: nauc_map_at_10_std value: -29.4529 - type: nauc_map_at_10_diff1 value: 33.219300000000004 - type: nauc_map_at_20_max value: 5.416300000000001 - type: nauc_map_at_20_std value: -29.1294 - type: nauc_map_at_20_diff1 value: 33.2747 - type: nauc_map_at_100_max value: 5.4547 - type: nauc_map_at_100_std value: -28.8978 - type: nauc_map_at_100_diff1 value: 33.3505 - type: nauc_map_at_1000_max value: 5.4512 - type: nauc_map_at_1000_std value: -28.8844 - type: nauc_map_at_1000_diff1 value: 33.356700000000004 - type: nauc_recall_at_1_max value: 4.1941 - type: nauc_recall_at_1_std value: -24.9531 - type: nauc_recall_at_1_diff1 value: 38.099 - type: nauc_recall_at_3_max value: 5.884799999999999 - type: nauc_recall_at_3_std value: -32.317 - type: nauc_recall_at_3_diff1 value: 28.284399999999998 - type: nauc_recall_at_5_max value: 5.4525 - type: nauc_recall_at_5_std value: 
-36.4055 - type: nauc_recall_at_5_diff1 value: 26.384200000000003 - type: nauc_recall_at_10_max value: 9.403400000000001 - type: nauc_recall_at_10_std value: -35.9112 - type: nauc_recall_at_10_diff1 value: 25.2415 - type: nauc_recall_at_20_max value: 12.0952 - type: nauc_recall_at_20_std value: -30.778299999999998 - type: nauc_recall_at_20_diff1 value: 24.1866 - type: nauc_recall_at_100_max value: 19.6413 - type: nauc_recall_at_100_std value: -11.9243 - type: nauc_recall_at_100_diff1 value: 24.6153 - type: nauc_recall_at_1000_max value: 48.1206 - type: nauc_recall_at_1000_std value: 48.0062 - type: nauc_recall_at_1000_diff1 value: 16.2543 - type: nauc_precision_at_1_max value: 4.2138 - type: nauc_precision_at_1_std value: -24.7801 - type: nauc_precision_at_1_diff1 value: 37.758399999999995 - type: nauc_precision_at_3_max value: 5.7985 - type: nauc_precision_at_3_std value: -31.749899999999997 - type: nauc_precision_at_3_diff1 value: 27.373399999999997 - type: nauc_precision_at_5_max value: 5.390000000000001 - type: nauc_precision_at_5_std value: -35.0586 - type: nauc_precision_at_5_diff1 value: 25.100099999999998 - type: nauc_precision_at_10_max value: 9.248199999999999 - type: nauc_precision_at_10_std value: -32.244299999999996 - type: nauc_precision_at_10_diff1 value: 22.5684 - type: nauc_precision_at_20_max value: 11.495099999999999 - type: nauc_precision_at_20_std value: -24.226300000000002 - type: nauc_precision_at_20_diff1 value: 19.6528 - type: nauc_precision_at_100_max value: 14.3649 - type: nauc_precision_at_100_std value: 0.0593 - type: nauc_precision_at_100_diff1 value: 10.9596 - type: nauc_precision_at_1000_max value: 10.9512 - type: nauc_precision_at_1000_std value: 18.288 - type: nauc_precision_at_1000_diff1 value: -3.5423000000000004 - type: nauc_mrr_at_1_max value: 4.2204 - type: nauc_mrr_at_1_std value: -24.7703 - type: nauc_mrr_at_1_diff1 value: 37.8126 - type: nauc_mrr_at_3_max value: 5.0668 - type: nauc_mrr_at_3_std value: -28.2677 - type: nauc_mrr_at_3_diff1 value: 33.3724 - type: nauc_mrr_at_5_max value: 5.0481 - type: nauc_mrr_at_5_std value: -29.133 - type: nauc_mrr_at_5_diff1 value: 33.0415 - type: nauc_mrr_at_10_max value: 5.5038 - type: nauc_mrr_at_10_std value: -28.886200000000002 - type: nauc_mrr_at_10_diff1 value: 33.0593 - type: nauc_mrr_at_20_max value: 5.5467 - type: nauc_mrr_at_20_std value: -28.5678 - type: nauc_mrr_at_20_diff1 value: 33.0916 - type: nauc_mrr_at_100_max value: 5.5636 - type: nauc_mrr_at_100_std value: -28.3877 - type: nauc_mrr_at_100_diff1 value: 33.1799 - type: nauc_mrr_at_1000_max value: 5.557 - type: nauc_mrr_at_1000_std value: -28.3796 - type: nauc_mrr_at_1000_diff1 value: 33.184999999999995 - type: main_score value: 44.856 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 93.5317 - type: f1 value: 93.1956 - type: f1_weighted value: 93.5431 - type: main_score value: 93.5317 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 67.7907 - type: f1 value: 48.2877 - type: f1_weighted value: 70.3225 - type: main_score value: 67.7907 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: 
accuracy value: 71.456 - type: f1 value: 68.2268 - type: f1_weighted value: 70.4722 - type: main_score value: 71.456 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 76.21719999999999 - type: f1 value: 75.14189999999999 - type: f1_weighted value: 76.0733 - type: main_score value: 76.21719999999999 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P (default) revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: v_measure value: 31.3917 - type: v_measure_std value: 1.4778 - type: main_score value: 31.3917 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S (default) revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: v_measure value: 28.2408 - type: v_measure_std value: 1.1622999999999999 - type: main_score value: 28.2408 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking (default) revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7 split: test type: mteb/mind_small metrics: - type: map value: 29.5796 - type: mrr value: 30.3081 - type: nAUC_map_max value: -24.9194 - type: nAUC_map_std value: -9.042 - type: nAUC_map_diff1 value: 12.1611 - type: nAUC_mrr_max value: -19.3867 - type: nAUC_mrr_std value: -6.3873 - type: nAUC_mrr_diff1 value: 11.8078 - type: main_score value: 29.5796 task: type: Reranking - dataset: config: default name: MTEB NFCorpus (default) revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: mteb/nfcorpus metrics: - type: ndcg_at_1 value: 45.046 - type: ndcg_at_3 value: 41.704 - type: ndcg_at_5 value: 39.296 - type: ndcg_at_10 value: 35.343999999999994 - type: ndcg_at_20 value: 32.525999999999996 - type: ndcg_at_100 value: 31.352999999999998 - type: ndcg_at_1000 value: 39.772 - type: map_at_1 value: 5.833 - type: map_at_3 value: 9.953 - type: map_at_5 value: 11.549 - type: map_at_10 value: 13.38 - type: map_at_20 value: 14.706 - type: map_at_100 value: 16.422 - type: map_at_1000 value: 17.777 - type: recall_at_1 value: 5.833 - type: recall_at_3 value: 11.112 - type: recall_at_5 value: 13.834 - type: recall_at_10 value: 16.961000000000002 - type: recall_at_20 value: 20.294999999999998 - type: recall_at_100 value: 30.253000000000004 - type: recall_at_1000 value: 60.902 - type: precision_at_1 value: 46.44 - type: precision_at_3 value: 39.009 - type: precision_at_5 value: 33.745999999999995 - type: precision_at_10 value: 25.635 - type: precision_at_20 value: 18.576 - type: precision_at_100 value: 7.731000000000001 - type: precision_at_1000 value: 2.037 - type: mrr_at_1 value: 46.7492 - type: mrr_at_3 value: 54.6956 - type: mrr_at_5 value: 55.8875 - type: mrr_at_10 value: 56.3913 - type: mrr_at_20 value: 56.6265 - type: mrr_at_100 value: 56.815599999999996 - type: mrr_at_1000 value: 56.8573 - type: nauc_ndcg_at_1_max value: 43.3685 - type: nauc_ndcg_at_1_std value: 21.6124 - type: nauc_ndcg_at_1_diff1 value: 29.0317 - type: nauc_ndcg_at_3_max value: 39.8155 - type: nauc_ndcg_at_3_std value: 23.2206 - type: nauc_ndcg_at_3_diff1 value: 20.7425 - type: nauc_ndcg_at_5_max value: 40.951 - type: nauc_ndcg_at_5_std value: 24.7184 - type: nauc_ndcg_at_5_diff1 value: 19.098599999999998 - type: nauc_ndcg_at_10_max value: 41.4733 - type: nauc_ndcg_at_10_std value: 27.4588 - type: 
nauc_ndcg_at_10_diff1 value: 17.224800000000002 - type: nauc_ndcg_at_20_max value: 40.3519 - type: nauc_ndcg_at_20_std value: 27.2947 - type: nauc_ndcg_at_20_diff1 value: 16.502 - type: nauc_ndcg_at_100_max value: 44.0676 - type: nauc_ndcg_at_100_std value: 29.1921 - type: nauc_ndcg_at_100_diff1 value: 20.9199 - type: nauc_ndcg_at_1000_max value: 48.9082 - type: nauc_ndcg_at_1000_std value: 33.799600000000005 - type: nauc_ndcg_at_1000_diff1 value: 19.741600000000002 - type: nauc_map_at_1_max value: 19.2048 - type: nauc_map_at_1_std value: -13.564599999999999 - type: nauc_map_at_1_diff1 value: 37.601099999999995 - type: nauc_map_at_3_max value: 23.1853 - type: nauc_map_at_3_std value: -8.3204 - type: nauc_map_at_3_diff1 value: 32.5527 - type: nauc_map_at_5_max value: 26.747500000000002 - type: nauc_map_at_5_std value: -4.136 - type: nauc_map_at_5_diff1 value: 29.041800000000002 - type: nauc_map_at_10_max value: 30.492200000000004 - type: nauc_map_at_10_std value: 2.2847 - type: nauc_map_at_10_diff1 value: 25.949699999999996 - type: nauc_map_at_20_max value: 32.628800000000005 - type: nauc_map_at_20_std value: 6.2305 - type: nauc_map_at_20_diff1 value: 24.0997 - type: nauc_map_at_100_max value: 35.0282 - type: nauc_map_at_100_std value: 12.181899999999999 - type: nauc_map_at_100_diff1 value: 22.6844 - type: nauc_map_at_1000_max value: 35.274899999999995 - type: nauc_map_at_1000_std value: 14.9827 - type: nauc_map_at_1000_diff1 value: 21.4096 - type: nauc_recall_at_1_max value: 19.2048 - type: nauc_recall_at_1_std value: -13.564599999999999 - type: nauc_recall_at_1_diff1 value: 37.601099999999995 - type: nauc_recall_at_3_max value: 20.5895 - type: nauc_recall_at_3_std value: -7.8295 - type: nauc_recall_at_3_diff1 value: 28.4675 - type: nauc_recall_at_5_max value: 24.8771 - type: nauc_recall_at_5_std value: -2.869 - type: nauc_recall_at_5_diff1 value: 23.301 - type: nauc_recall_at_10_max value: 28.647299999999998 - type: nauc_recall_at_10_std value: 4.4991 - type: nauc_recall_at_10_diff1 value: 20.5606 - type: nauc_recall_at_20_max value: 30.3525 - type: nauc_recall_at_20_std value: 8.712 - type: nauc_recall_at_20_diff1 value: 17.4748 - type: nauc_recall_at_100_max value: 34.0702 - type: nauc_recall_at_100_std value: 23.3319 - type: nauc_recall_at_100_diff1 value: 17.2015 - type: nauc_recall_at_1000_max value: 27.8011 - type: nauc_recall_at_1000_std value: 21.6507 - type: nauc_recall_at_1000_diff1 value: 4.4638 - type: nauc_precision_at_1_max value: 44.6989 - type: nauc_precision_at_1_std value: 22.622 - type: nauc_precision_at_1_diff1 value: 28.881400000000003 - type: nauc_precision_at_3_max value: 39.4166 - type: nauc_precision_at_3_std value: 29.2591 - type: nauc_precision_at_3_diff1 value: 12.1577 - type: nauc_precision_at_5_max value: 39.6371 - type: nauc_precision_at_5_std value: 33.201 - type: nauc_precision_at_5_diff1 value: 7.958 - type: nauc_precision_at_10_max value: 38.2593 - type: nauc_precision_at_10_std value: 40.6097 - type: nauc_precision_at_10_diff1 value: 1.376 - type: nauc_precision_at_20_max value: 31.375999999999998 - type: nauc_precision_at_20_std value: 42.3468 - type: nauc_precision_at_20_diff1 value: -4.1699 - type: nauc_precision_at_100_max value: 16.628 - type: nauc_precision_at_100_std value: 41.800599999999996 - type: nauc_precision_at_100_diff1 value: -9.4674 - type: nauc_precision_at_1000_max value: 1.6051 - type: nauc_precision_at_1000_std value: 29.1306 - type: nauc_precision_at_1000_diff1 value: -11.1912 - type: nauc_mrr_at_1_max value: 44.4339 - type: 
nauc_mrr_at_1_std value: 23.6489 - type: nauc_mrr_at_1_diff1 value: 28.0393 - type: nauc_mrr_at_3_max value: 47.780899999999995 - type: nauc_mrr_at_3_std value: 31.412499999999998 - type: nauc_mrr_at_3_diff1 value: 24.1569 - type: nauc_mrr_at_5_max value: 48.732 - type: nauc_mrr_at_5_std value: 31.899100000000004 - type: nauc_mrr_at_5_diff1 value: 24.4177 - type: nauc_mrr_at_10_max value: 48.9748 - type: nauc_mrr_at_10_std value: 32.2053 - type: nauc_mrr_at_10_diff1 value: 24.0317 - type: nauc_mrr_at_20_max value: 49.0832 - type: nauc_mrr_at_20_std value: 32.0994 - type: nauc_mrr_at_20_diff1 value: 23.9777 - type: nauc_mrr_at_100_max value: 49.1731 - type: nauc_mrr_at_100_std value: 32.3179 - type: nauc_mrr_at_100_diff1 value: 24.081 - type: nauc_mrr_at_1000_max value: 49.1387 - type: nauc_mrr_at_1000_std value: 32.2738 - type: nauc_mrr_at_1000_diff1 value: 24.063200000000002 - type: main_score value: 35.343999999999994 task: type: Retrieval - dataset: config: default name: MTEB NQ (default) revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: ndcg_at_1 value: 44.93 - type: ndcg_at_3 value: 56.003 - type: ndcg_at_5 value: 60.150000000000006 - type: ndcg_at_10 value: 63.673 - type: ndcg_at_20 value: 65.211 - type: ndcg_at_100 value: 66.686 - type: ndcg_at_1000 value: 67.009 - type: map_at_1 value: 40.035 - type: map_at_3 value: 51.976 - type: map_at_5 value: 54.510999999999996 - type: map_at_10 value: 56.17100000000001 - type: map_at_20 value: 56.684 - type: map_at_100 value: 56.932 - type: map_at_1000 value: 56.946 - type: recall_at_1 value: 40.035 - type: recall_at_3 value: 64.224 - type: recall_at_5 value: 73.682 - type: recall_at_10 value: 83.809 - type: recall_at_20 value: 89.385 - type: recall_at_100 value: 96.705 - type: recall_at_1000 value: 99.054 - type: precision_at_1 value: 44.93 - type: precision_at_3 value: 25.019000000000002 - type: precision_at_5 value: 17.445 - type: precision_at_10 value: 10.043000000000001 - type: precision_at_20 value: 5.4 - type: precision_at_100 value: 1.174 - type: precision_at_1000 value: 0.121 - type: mrr_at_1 value: 44.9305 - type: mrr_at_3 value: 55.37370000000001 - type: mrr_at_5 value: 57.4464 - type: mrr_at_10 value: 58.680200000000006 - type: mrr_at_20 value: 59.0042 - type: mrr_at_100 value: 59.178799999999995 - type: mrr_at_1000 value: 59.188700000000004 - type: nauc_ndcg_at_1_max value: 23.8396 - type: nauc_ndcg_at_1_std value: -3.8885000000000005 - type: nauc_ndcg_at_1_diff1 value: 37.971500000000006 - type: nauc_ndcg_at_3_max value: 30.025800000000004 - type: nauc_ndcg_at_3_std value: -4.9848 - type: nauc_ndcg_at_3_diff1 value: 34.324799999999996 - type: nauc_ndcg_at_5_max value: 32.2984 - type: nauc_ndcg_at_5_std value: -3.263 - type: nauc_ndcg_at_5_diff1 value: 35.2865 - type: nauc_ndcg_at_10_max value: 32.4173 - type: nauc_ndcg_at_10_std value: -2.398 - type: nauc_ndcg_at_10_diff1 value: 34.767399999999995 - type: nauc_ndcg_at_20_max value: 32.332 - type: nauc_ndcg_at_20_std value: -1.7824 - type: nauc_ndcg_at_20_diff1 value: 35.0354 - type: nauc_ndcg_at_100_max value: 31.3774 - type: nauc_ndcg_at_100_std value: -1.4645 - type: nauc_ndcg_at_100_diff1 value: 35.255900000000004 - type: nauc_ndcg_at_1000_max value: 31.008799999999997 - type: nauc_ndcg_at_1000_std value: -1.9499 - type: nauc_ndcg_at_1000_diff1 value: 35.3522 - type: nauc_map_at_1_max value: 21.296300000000002 - type: nauc_map_at_1_std value: -6.0126 - type: nauc_map_at_1_diff1 value: 37.9216 - type: nauc_map_at_3_max value: 28.1195 
- type: nauc_map_at_3_std value: -5.3494 - type: nauc_map_at_3_diff1 value: 35.0839 - type: nauc_map_at_5_max value: 29.365999999999996 - type: nauc_map_at_5_std value: -4.410200000000001 - type: nauc_map_at_5_diff1 value: 35.6342 - type: nauc_map_at_10_max value: 29.378300000000003 - type: nauc_map_at_10_std value: -4.0228 - type: nauc_map_at_10_diff1 value: 35.451 - type: nauc_map_at_20_max value: 29.3604 - type: nauc_map_at_20_std value: -3.7953 - type: nauc_map_at_20_diff1 value: 35.5496 - type: nauc_map_at_100_max value: 29.233199999999997 - type: nauc_map_at_100_std value: -3.7321 - type: nauc_map_at_100_diff1 value: 35.574099999999994 - type: nauc_map_at_1000_max value: 29.2215 - type: nauc_map_at_1000_std value: -3.7482 - type: nauc_map_at_1000_diff1 value: 35.5805 - type: nauc_recall_at_1_max value: 21.296300000000002 - type: nauc_recall_at_1_std value: -6.0126 - type: nauc_recall_at_1_diff1 value: 37.9216 - type: nauc_recall_at_3_max value: 34.2599 - type: nauc_recall_at_3_std value: -5.5474000000000006 - type: nauc_recall_at_3_diff1 value: 30.7103 - type: nauc_recall_at_5_max value: 41.6689 - type: nauc_recall_at_5_std value: -0.7705 - type: nauc_recall_at_5_diff1 value: 32.6001 - type: nauc_recall_at_10_max value: 47.236200000000004 - type: nauc_recall_at_10_std value: 3.9309999999999996 - type: nauc_recall_at_10_diff1 value: 29.277199999999997 - type: nauc_recall_at_20_max value: 53.957100000000004 - type: nauc_recall_at_20_std value: 11.282499999999999 - type: nauc_recall_at_20_diff1 value: 29.7674 - type: nauc_recall_at_100_max value: 66.87039999999999 - type: nauc_recall_at_100_std value: 46.8733 - type: nauc_recall_at_100_diff1 value: 30.0249 - type: nauc_recall_at_1000_max value: 88.33670000000001 - type: nauc_recall_at_1000_std value: 77.0724 - type: nauc_recall_at_1000_diff1 value: 34.0192 - type: nauc_precision_at_1_max value: 23.8396 - type: nauc_precision_at_1_std value: -3.8885000000000005 - type: nauc_precision_at_1_diff1 value: 37.971500000000006 - type: nauc_precision_at_3_max value: 31.053399999999996 - type: nauc_precision_at_3_std value: 0.3766 - type: nauc_precision_at_3_diff1 value: 21.5732 - type: nauc_precision_at_5_max value: 30.816100000000002 - type: nauc_precision_at_5_std value: 5.3659 - type: nauc_precision_at_5_diff1 value: 17.4728 - type: nauc_precision_at_10_max value: 25.204300000000003 - type: nauc_precision_at_10_std value: 10.6652 - type: nauc_precision_at_10_diff1 value: 7.7665 - type: nauc_precision_at_20_max value: 20.3015 - type: nauc_precision_at_20_std value: 14.1789 - type: nauc_precision_at_20_diff1 value: 3.2251000000000003 - type: nauc_precision_at_100_max value: 9.709 - type: nauc_precision_at_100_std value: 17.7706 - type: nauc_precision_at_100_diff1 value: -5.5258 - type: nauc_precision_at_1000_max value: 4.5083 - type: nauc_precision_at_1000_std value: 14.754900000000001 - type: nauc_precision_at_1000_diff1 value: -8.1761 - type: nauc_mrr_at_1_max value: 23.8396 - type: nauc_mrr_at_1_std value: -3.8885000000000005 - type: nauc_mrr_at_1_diff1 value: 37.971500000000006 - type: nauc_mrr_at_3_max value: 28.9257 - type: nauc_mrr_at_3_std value: -3.6295 - type: nauc_mrr_at_3_diff1 value: 35.390100000000004 - type: nauc_mrr_at_5_max value: 29.8503 - type: nauc_mrr_at_5_std value: -2.8144 - type: nauc_mrr_at_5_diff1 value: 35.8786 - type: nauc_mrr_at_10_max value: 29.662899999999997 - type: nauc_mrr_at_10_std value: -2.6432 - type: nauc_mrr_at_10_diff1 value: 35.708400000000005 - type: nauc_mrr_at_20_max value: 29.5659 - type: 
nauc_mrr_at_20_std value: -2.6337 - type: nauc_mrr_at_20_diff1 value: 35.761900000000004 - type: nauc_mrr_at_100_max value: 29.432399999999998 - type: nauc_mrr_at_100_std value: -2.6328 - type: nauc_mrr_at_100_diff1 value: 35.8182 - type: nauc_mrr_at_1000_max value: 29.4234 - type: nauc_mrr_at_1000_std value: -2.6451 - type: nauc_mrr_at_1000_diff1 value: 35.8215 - type: main_score value: 63.673 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval (default) revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 split: test type: mteb/quora metrics: - type: ndcg_at_1 value: 82.27 - type: ndcg_at_3 value: 86.28099999999999 - type: ndcg_at_5 value: 87.81400000000001 - type: ndcg_at_10 value: 89.021 - type: ndcg_at_20 value: 89.643 - type: ndcg_at_100 value: 90.13 - type: ndcg_at_1000 value: 90.226 - type: map_at_1 value: 71.43599999999999 - type: map_at_3 value: 82.49 - type: map_at_5 value: 84.331 - type: map_at_10 value: 85.416 - type: map_at_20 value: 85.827 - type: map_at_100 value: 86.024 - type: map_at_1000 value: 86.039 - type: recall_at_1 value: 71.43599999999999 - type: recall_at_3 value: 87.912 - type: recall_at_5 value: 92.30000000000001 - type: recall_at_10 value: 95.814 - type: recall_at_20 value: 97.80799999999999 - type: recall_at_100 value: 99.551 - type: recall_at_1000 value: 99.97 - type: precision_at_1 value: 82.27 - type: precision_at_3 value: 37.747 - type: precision_at_5 value: 24.782 - type: precision_at_10 value: 13.497 - type: precision_at_20 value: 7.147 - type: precision_at_100 value: 1.529 - type: precision_at_1000 value: 0.157 - type: mrr_at_1 value: 82.23 - type: mrr_at_3 value: 87.26 - type: mrr_at_5 value: 87.9305 - type: mrr_at_10 value: 88.20949999999999 - type: mrr_at_20 value: 88.2764 - type: mrr_at_100 value: 88.2967 - type: mrr_at_1000 value: 88.2976 - type: nauc_ndcg_at_1_max value: 37.0736 - type: nauc_ndcg_at_1_std value: -43.2326 - type: nauc_ndcg_at_1_diff1 value: 77.9945 - type: nauc_ndcg_at_3_max value: 33.9426 - type: nauc_ndcg_at_3_std value: -51.3108 - type: nauc_ndcg_at_3_diff1 value: 76.2559 - type: nauc_ndcg_at_5_max value: 34.927 - type: nauc_ndcg_at_5_std value: -52.50749999999999 - type: nauc_ndcg_at_5_diff1 value: 76.578 - type: nauc_ndcg_at_10_max value: 35.9905 - type: nauc_ndcg_at_10_std value: -51.808699999999995 - type: nauc_ndcg_at_10_diff1 value: 76.6957 - type: nauc_ndcg_at_20_max value: 36.119299999999996 - type: nauc_ndcg_at_20_std value: -50.1628 - type: nauc_ndcg_at_20_diff1 value: 76.6659 - type: nauc_ndcg_at_100_max value: 36.4315 - type: nauc_ndcg_at_100_std value: -48.0358 - type: nauc_ndcg_at_100_diff1 value: 76.5866 - type: nauc_ndcg_at_1000_max value: 36.459399999999995 - type: nauc_ndcg_at_1000_std value: -47.834199999999996 - type: nauc_ndcg_at_1000_diff1 value: 76.5791 - type: nauc_map_at_1_max value: 25.902199999999997 - type: nauc_map_at_1_std value: -44.6605 - type: nauc_map_at_1_diff1 value: 80.78070000000001 - type: nauc_map_at_3_max value: 31.3371 - type: nauc_map_at_3_std value: -53.9334 - type: nauc_map_at_3_diff1 value: 77.7089 - type: nauc_map_at_5_max value: 33.1663 - type: nauc_map_at_5_std value: -53.86919999999999 - type: nauc_map_at_5_diff1 value: 77.32430000000001 - type: nauc_map_at_10_max value: 34.4253 - type: nauc_map_at_10_std value: -52.423500000000004 - type: nauc_map_at_10_diff1 value: 77.0479 - type: nauc_map_at_20_max value: 34.6738 - type: nauc_map_at_20_std value: -51.095400000000005 - type: nauc_map_at_20_diff1 value: 76.88810000000001 - type: nauc_map_at_100_max value: 
34.7984 - type: nauc_map_at_100_std value: -50.2705 - type: nauc_map_at_100_diff1 value: 76.8083 - type: nauc_map_at_1000_max value: 34.8162 - type: nauc_map_at_1000_std value: -50.211600000000004 - type: nauc_map_at_1000_diff1 value: 76.8047 - type: nauc_recall_at_1_max value: 25.902199999999997 - type: nauc_recall_at_1_std value: -44.6605 - type: nauc_recall_at_1_diff1 value: 80.78070000000001 - type: nauc_recall_at_3_max value: 27.693 - type: nauc_recall_at_3_std value: -61.799400000000006 - type: nauc_recall_at_3_diff1 value: 74.25 - type: nauc_recall_at_5_max value: 30.216700000000003 - type: nauc_recall_at_5_std value: -68.2919 - type: nauc_recall_at_5_diff1 value: 72.8613 - type: nauc_recall_at_10_max value: 34.4765 - type: nauc_recall_at_10_std value: -74.3633 - type: nauc_recall_at_10_diff1 value: 73.0316 - type: nauc_recall_at_20_max value: 33.812 - type: nauc_recall_at_20_std value: -72.8956 - type: nauc_recall_at_20_diff1 value: 73.4475 - type: nauc_recall_at_100_max value: 39.0326 - type: nauc_recall_at_100_std value: -42.9628 - type: nauc_recall_at_100_diff1 value: 72.66669999999999 - type: nauc_recall_at_1000_max value: 16.4069 - type: nauc_recall_at_1000_std value: 20.353099999999998 - type: nauc_recall_at_1000_diff1 value: 72.6857 - type: nauc_precision_at_1_max value: 37.0736 - type: nauc_precision_at_1_std value: -43.2326 - type: nauc_precision_at_1_diff1 value: 77.9945 - type: nauc_precision_at_3_max value: 7.225099999999999 - type: nauc_precision_at_3_std value: 5.4519 - type: nauc_precision_at_3_diff1 value: -20.1979 - type: nauc_precision_at_5_max value: 3.1125 - type: nauc_precision_at_5_std value: 17.542099999999998 - type: nauc_precision_at_5_diff1 value: -32.5768 - type: nauc_precision_at_10_max value: -0.3758 - type: nauc_precision_at_10_std value: 27.9681 - type: nauc_precision_at_10_diff1 value: -39.8065 - type: nauc_precision_at_20_max value: -2.7107 - type: nauc_precision_at_20_std value: 34.9186 - type: nauc_precision_at_20_diff1 value: -42.686800000000005 - type: nauc_precision_at_100_max value: -4.587 - type: nauc_precision_at_100_std value: 41.415600000000005 - type: nauc_precision_at_100_diff1 value: -44.357 - type: nauc_precision_at_1000_max value: -5.003 - type: nauc_precision_at_1000_std value: 42.5355 - type: nauc_precision_at_1000_diff1 value: -44.5697 - type: nauc_mrr_at_1_max value: 37.1298 - type: nauc_mrr_at_1_std value: -43.2774 - type: nauc_mrr_at_1_diff1 value: 78.0714 - type: nauc_mrr_at_3_max value: 37.644800000000004 - type: nauc_mrr_at_3_std value: -46.231 - type: nauc_mrr_at_3_diff1 value: 77.0599 - type: nauc_mrr_at_5_max value: 37.994299999999996 - type: nauc_mrr_at_5_std value: -46.0511 - type: nauc_mrr_at_5_diff1 value: 77.1377 - type: nauc_mrr_at_10_max value: 37.9206 - type: nauc_mrr_at_10_std value: -45.8065 - type: nauc_mrr_at_10_diff1 value: 77.1994 - type: nauc_mrr_at_20_max value: 37.8028 - type: nauc_mrr_at_20_std value: -45.7095 - type: nauc_mrr_at_20_diff1 value: 77.2152 - type: nauc_mrr_at_100_max value: 37.7912 - type: nauc_mrr_at_100_std value: -45.6767 - type: nauc_mrr_at_100_diff1 value: 77.2139 - type: nauc_mrr_at_1000_max value: 37.79 - type: nauc_mrr_at_1000_std value: -45.6766 - type: nauc_mrr_at_1000_diff1 value: 77.2145 - type: main_score value: 89.021 task: type: Retrieval - dataset: config: default name: MTEB RedditClustering (default) revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: v_measure value: 51.208600000000004 - type: v_measure_std value: 
4.2761000000000005 - type: main_score value: 51.208600000000004 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P (default) revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: v_measure value: 60.372899999999994 - type: v_measure_std value: 12.0829 - type: main_score value: 60.372899999999994 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS (default) revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 split: test type: mteb/scidocs metrics: - type: ndcg_at_1 value: 22.400000000000002 - type: ndcg_at_3 value: 19.192 - type: ndcg_at_5 value: 16.767000000000003 - type: ndcg_at_10 value: 20.238999999999997 - type: ndcg_at_20 value: 22.720000000000002 - type: ndcg_at_100 value: 27.567999999999998 - type: ndcg_at_1000 value: 32.535 - type: map_at_1 value: 4.552 - type: map_at_3 value: 8.495999999999999 - type: map_at_5 value: 10.213999999999999 - type: map_at_10 value: 11.985 - type: map_at_20 value: 12.937000000000001 - type: map_at_100 value: 13.885 - type: map_at_1000 value: 14.155999999999999 - type: recall_at_1 value: 4.552 - type: recall_at_3 value: 11.067 - type: recall_at_5 value: 15.052 - type: recall_at_10 value: 21.422 - type: recall_at_20 value: 27.279999999999998 - type: recall_at_100 value: 42.968 - type: recall_at_1000 value: 67.232 - type: precision_at_1 value: 22.400000000000002 - type: precision_at_3 value: 18.2 - type: precision_at_5 value: 14.860000000000001 - type: precision_at_10 value: 10.58 - type: precision_at_20 value: 6.715 - type: precision_at_100 value: 2.114 - type: precision_at_1000 value: 0.331 - type: mrr_at_1 value: 22.400000000000002 - type: mrr_at_3 value: 31.0833 - type: mrr_at_5 value: 32.853300000000004 - type: mrr_at_10 value: 34.2814 - type: mrr_at_20 value: 34.814 - type: mrr_at_100 value: 35.2576 - type: mrr_at_1000 value: 35.322199999999995 - type: nauc_ndcg_at_1_max value: 23.7575 - type: nauc_ndcg_at_1_std value: 4.1697 - type: nauc_ndcg_at_1_diff1 value: 28.3995 - type: nauc_ndcg_at_3_max value: 27.5517 - type: nauc_ndcg_at_3_std value: 8.8005 - type: nauc_ndcg_at_3_diff1 value: 22.334799999999998 - type: nauc_ndcg_at_5_max value: 28.607599999999998 - type: nauc_ndcg_at_5_std value: 10.0785 - type: nauc_ndcg_at_5_diff1 value: 21.4713 - type: nauc_ndcg_at_10_max value: 30.812099999999997 - type: nauc_ndcg_at_10_std value: 14.4374 - type: nauc_ndcg_at_10_diff1 value: 20.5304 - type: nauc_ndcg_at_20_max value: 32.3888 - type: nauc_ndcg_at_20_std value: 17.8152 - type: nauc_ndcg_at_20_diff1 value: 20.2815 - type: nauc_ndcg_at_100_max value: 34.402100000000004 - type: nauc_ndcg_at_100_std value: 22.3694 - type: nauc_ndcg_at_100_diff1 value: 20.9422 - type: nauc_ndcg_at_1000_max value: 33.7269 - type: nauc_ndcg_at_1000_std value: 23.646700000000003 - type: nauc_ndcg_at_1000_diff1 value: 19.7226 - type: nauc_map_at_1_max value: 23.5069 - type: nauc_map_at_1_std value: 3.8736 - type: nauc_map_at_1_diff1 value: 28.231 - type: nauc_map_at_3_max value: 27.293 - type: nauc_map_at_3_std value: 6.9329 - type: nauc_map_at_3_diff1 value: 21.8664 - type: nauc_map_at_5_max value: 28.591100000000004 - type: nauc_map_at_5_std value: 8.2248 - type: nauc_map_at_5_diff1 value: 21.4395 - type: nauc_map_at_10_max value: 30.417300000000004 - type: nauc_map_at_10_std value: 11.615300000000001 - type: nauc_map_at_10_diff1 value: 20.624000000000002 - type: nauc_map_at_20_max value: 31.479200000000002 - type: nauc_map_at_20_std value: 13.808699999999998 - type: 
nauc_map_at_20_diff1 value: 20.413 - type: nauc_map_at_100_max value: 32.2613 - type: nauc_map_at_100_std value: 15.5692 - type: nauc_map_at_100_diff1 value: 20.5465 - type: nauc_map_at_1000_max value: 32.2476 - type: nauc_map_at_1000_std value: 15.7471 - type: nauc_map_at_1000_diff1 value: 20.4622 - type: nauc_recall_at_1_max value: 23.5069 - type: nauc_recall_at_1_std value: 3.8736 - type: nauc_recall_at_1_diff1 value: 28.231 - type: nauc_recall_at_3_max value: 27.970299999999998 - type: nauc_recall_at_3_std value: 10.2171 - type: nauc_recall_at_3_diff1 value: 19.403699999999997 - type: nauc_recall_at_5_max value: 28.4521 - type: nauc_recall_at_5_std value: 12.2105 - type: nauc_recall_at_5_diff1 value: 17.5747 - type: nauc_recall_at_10_max value: 30.6955 - type: nauc_recall_at_10_std value: 19.096 - type: nauc_recall_at_10_diff1 value: 15.3116 - type: nauc_recall_at_20_max value: 32.1047 - type: nauc_recall_at_20_std value: 24.823600000000003 - type: nauc_recall_at_20_diff1 value: 14.257700000000002 - type: nauc_recall_at_100_max value: 33.6062 - type: nauc_recall_at_100_std value: 33.8641 - type: nauc_recall_at_100_diff1 value: 14.5145 - type: nauc_recall_at_1000_max value: 26.848300000000002 - type: nauc_recall_at_1000_std value: 38.5884 - type: nauc_recall_at_1000_diff1 value: 5.6408 - type: nauc_precision_at_1_max value: 23.7575 - type: nauc_precision_at_1_std value: 4.1697 - type: nauc_precision_at_1_diff1 value: 28.3995 - type: nauc_precision_at_3_max value: 28.2504 - type: nauc_precision_at_3_std value: 10.6227 - type: nauc_precision_at_3_diff1 value: 19.5683 - type: nauc_precision_at_5_max value: 28.8134 - type: nauc_precision_at_5_std value: 12.518899999999999 - type: nauc_precision_at_5_diff1 value: 17.8036 - type: nauc_precision_at_10_max value: 30.9813 - type: nauc_precision_at_10_std value: 19.3506 - type: nauc_precision_at_10_diff1 value: 15.512 - type: nauc_precision_at_20_max value: 32.6743 - type: nauc_precision_at_20_std value: 24.9974 - type: nauc_precision_at_20_diff1 value: 14.794099999999998 - type: nauc_precision_at_100_max value: 34.413700000000006 - type: nauc_precision_at_100_std value: 34.0889 - type: nauc_precision_at_100_diff1 value: 15.252699999999999 - type: nauc_precision_at_1000_max value: 27.3954 - type: nauc_precision_at_1000_std value: 37.8895 - type: nauc_precision_at_1000_diff1 value: 6.587999999999999 - type: nauc_mrr_at_1_max value: 23.7575 - type: nauc_mrr_at_1_std value: 4.1697 - type: nauc_mrr_at_1_diff1 value: 28.3995 - type: nauc_mrr_at_3_max value: 26.8324 - type: nauc_mrr_at_3_std value: 8.646700000000001 - type: nauc_mrr_at_3_diff1 value: 25.5754 - type: nauc_mrr_at_5_max value: 26.8274 - type: nauc_mrr_at_5_std value: 8.911 - type: nauc_mrr_at_5_diff1 value: 25.106 - type: nauc_mrr_at_10_max value: 27.073399999999996 - type: nauc_mrr_at_10_std value: 9.7624 - type: nauc_mrr_at_10_diff1 value: 24.9405 - type: nauc_mrr_at_20_max value: 27.1229 - type: nauc_mrr_at_20_std value: 10.0676 - type: nauc_mrr_at_20_diff1 value: 24.8122 - type: nauc_mrr_at_100_max value: 27.1391 - type: nauc_mrr_at_100_std value: 9.9628 - type: nauc_mrr_at_100_diff1 value: 24.9507 - type: nauc_mrr_at_1000_max value: 27.114 - type: nauc_mrr_at_1000_std value: 9.9537 - type: nauc_mrr_at_1000_diff1 value: 24.9421 - type: main_score value: 20.238999999999997 task: type: Retrieval - dataset: config: default name: MTEB SICK-R (default) revision: 20a6d6f312dd54037fe07a32d58e5e168867909d split: test type: mteb/sickr-sts metrics: - type: pearson value: 79.5908 - type: 
spearman value: 73.9888 - type: cosine_pearson value: 79.5908 - type: cosine_spearman value: 73.9888 - type: manhattan_pearson value: 77.0623 - type: manhattan_spearman value: 73.7724 - type: euclidean_pearson value: 77.30890000000001 - type: euclidean_spearman value: 73.9888 - type: main_score value: 73.9888 task: type: STS - dataset: config: default name: MTEB STS12 (default) revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: pearson value: 74.0752 - type: spearman value: 71.22699999999999 - type: cosine_pearson value: 74.0752 - type: cosine_spearman value: 71.22699999999999 - type: manhattan_pearson value: 70.6037 - type: manhattan_spearman value: 70.9916 - type: euclidean_pearson value: 70.922 - type: euclidean_spearman value: 71.22699999999999 - type: main_score value: 71.22699999999999 task: type: STS - dataset: config: default name: MTEB STS13 (default) revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: pearson value: 77.8946 - type: spearman value: 80.4405 - type: cosine_pearson value: 77.8946 - type: cosine_spearman value: 80.4405 - type: manhattan_pearson value: 79.6856 - type: manhattan_spearman value: 80.1236 - type: euclidean_pearson value: 80.0315 - type: euclidean_spearman value: 80.44059999999999 - type: main_score value: 80.4405 task: type: STS - dataset: config: default name: MTEB STS14 (default) revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: pearson value: 76.2196 - type: spearman value: 75.10419999999999 - type: cosine_pearson value: 76.2196 - type: cosine_spearman value: 75.10419999999999 - type: manhattan_pearson value: 75.4647 - type: manhattan_spearman value: 74.81179999999999 - type: euclidean_pearson value: 75.8091 - type: euclidean_spearman value: 75.10419999999999 - type: main_score value: 75.10419999999999 task: type: STS - dataset: config: default name: MTEB STS15 (default) revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: pearson value: 81.2455 - type: spearman value: 82.8681 - type: cosine_pearson value: 81.2455 - type: cosine_spearman value: 82.8681 - type: manhattan_pearson value: 82.4327 - type: manhattan_spearman value: 82.7513 - type: euclidean_pearson value: 82.5635 - type: euclidean_spearman value: 82.8681 - type: main_score value: 82.8681 task: type: STS - dataset: config: default name: MTEB STS16 (default) revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: pearson value: 81.6322 - type: spearman value: 83.487 - type: cosine_pearson value: 81.6322 - type: cosine_spearman value: 83.487 - type: manhattan_pearson value: 83.0048 - type: manhattan_spearman value: 83.4064 - type: euclidean_pearson value: 83.0938 - type: euclidean_spearman value: 83.487 - type: main_score value: 83.487 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 81.1124 - type: spearman value: 84.5436 - type: cosine_pearson value: 81.1124 - type: cosine_spearman value: 84.5436 - type: manhattan_pearson value: 83.5158 - type: manhattan_spearman value: 84.596 - type: euclidean_pearson value: 83.4429 - type: euclidean_spearman value: 84.5436 - type: main_score value: 84.5436 task: type: STS - dataset: config: en-tr name: MTEB STS17 (en-tr) revision: faeb762787bd10488a50c8b5be4a3b82e411949c 
split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 62.0001 - type: spearman value: 63.631099999999996 - type: cosine_pearson value: 62.0001 - type: cosine_spearman value: 63.631099999999996 - type: manhattan_pearson value: 62.239599999999996 - type: manhattan_spearman value: 62.892199999999995 - type: euclidean_pearson value: 62.9809 - type: euclidean_spearman value: 63.631099999999996 - type: main_score value: 63.631099999999996 task: type: STS - dataset: config: it-en name: MTEB STS17 (it-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 75.1556 - type: spearman value: 76.8807 - type: cosine_pearson value: 75.1556 - type: cosine_spearman value: 76.8807 - type: manhattan_pearson value: 76.2428 - type: manhattan_spearman value: 76.8101 - type: euclidean_pearson value: 76.107 - type: euclidean_spearman value: 76.8807 - type: main_score value: 76.8807 task: type: STS - dataset: config: es-en name: MTEB STS17 (es-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 69.85719999999999 - type: spearman value: 71.0489 - type: cosine_pearson value: 69.85719999999999 - type: cosine_spearman value: 71.0489 - type: manhattan_pearson value: 71.08449999999999 - type: manhattan_spearman value: 71.0051 - type: euclidean_pearson value: 71.19760000000001 - type: euclidean_spearman value: 71.0489 - type: main_score value: 71.0489 task: type: STS - dataset: config: nl-en name: MTEB STS17 (nl-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 76.1131 - type: spearman value: 78.2714 - type: cosine_pearson value: 76.1131 - type: cosine_spearman value: 78.2714 - type: manhattan_pearson value: 76.70270000000001 - type: manhattan_spearman value: 77.7803 - type: euclidean_pearson value: 77.14269999999999 - type: euclidean_spearman value: 78.2714 - type: main_score value: 78.2714 task: type: STS - dataset: config: fr-en name: MTEB STS17 (fr-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 74.49719999999999 - type: spearman value: 76.2747 - type: cosine_pearson value: 74.49719999999999 - type: cosine_spearman value: 76.2747 - type: manhattan_pearson value: 75.071 - type: manhattan_spearman value: 75.8969 - type: euclidean_pearson value: 75.289 - type: euclidean_spearman value: 76.2747 - type: main_score value: 76.2747 task: type: STS - dataset: config: en-de name: MTEB STS17 (en-de) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 76.7073 - type: spearman value: 79.3107 - type: cosine_pearson value: 76.7073 - type: cosine_spearman value: 79.3107 - type: manhattan_pearson value: 77.9578 - type: manhattan_spearman value: 79.3195 - type: euclidean_pearson value: 77.7386 - type: euclidean_spearman value: 79.3107 - type: main_score value: 79.3107 task: type: STS - dataset: config: en-ar name: MTEB STS17 (en-ar) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 60.5826 - type: spearman value: 61.0502 - type: cosine_pearson value: 60.5826 - type: cosine_spearman value: 61.0502 - type: manhattan_pearson value: 61.202 - type: manhattan_spearman value: 61.2039 - type: euclidean_pearson value: 61.1915 - type: 
euclidean_spearman value: 61.0502 - type: main_score value: 61.0502 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 69.2521 - type: spearman value: 68.06219999999999 - type: cosine_pearson value: 69.2521 - type: cosine_spearman value: 68.06219999999999 - type: manhattan_pearson value: 70.5115 - type: manhattan_spearman value: 67.8705 - type: euclidean_pearson value: 70.68480000000001 - type: euclidean_spearman value: 68.06219999999999 - type: main_score value: 68.06219999999999 task: type: STS - dataset: config: pl-en name: MTEB STS22 (pl-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 77.97500000000001 - type: spearman value: 76.848 - type: cosine_pearson value: 77.97500000000001 - type: cosine_spearman value: 76.848 - type: manhattan_pearson value: 76.4098 - type: manhattan_spearman value: 76.6188 - type: euclidean_pearson value: 77.17500000000001 - type: euclidean_spearman value: 76.848 - type: main_score value: 76.848 task: type: STS - dataset: config: zh-en name: MTEB STS22 (zh-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 71.3604 - type: spearman value: 70.7891 - type: cosine_pearson value: 71.3604 - type: cosine_spearman value: 70.7891 - type: manhattan_pearson value: 73.0185 - type: manhattan_spearman value: 70.79299999999999 - type: euclidean_pearson value: 73.17620000000001 - type: euclidean_spearman value: 70.7891 - type: main_score value: 70.7891 task: type: STS - dataset: config: es-en name: MTEB STS22 (es-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 77.58239999999999 - type: spearman value: 78.5907 - type: cosine_pearson value: 77.58239999999999 - type: cosine_spearman value: 78.5907 - type: manhattan_pearson value: 79.25720000000001 - type: manhattan_spearman value: 78.6249 - type: euclidean_pearson value: 79.3724 - type: euclidean_spearman value: 78.5907 - type: main_score value: 78.5907 task: type: STS - dataset: config: de-en name: MTEB STS22 (de-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 63.324000000000005 - type: spearman value: 55.1099 - type: cosine_pearson value: 63.324000000000005 - type: cosine_spearman value: 55.1099 - type: manhattan_pearson value: 67.3128 - type: manhattan_spearman value: 56.340199999999996 - type: euclidean_pearson value: 67.12089999999999 - type: euclidean_spearman value: 55.1099 - type: main_score value: 55.1099 task: type: STS - dataset: config: default name: MTEB STSBenchmark (default) revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: pearson value: 78.02329999999999 - type: spearman value: 79.1887 - type: cosine_pearson value: 78.02329999999999 - type: cosine_spearman value: 79.1887 - type: manhattan_pearson value: 78.8951 - type: manhattan_spearman value: 78.9444 - type: euclidean_pearson value: 79.1499 - type: euclidean_spearman value: 79.1888 - type: main_score value: 79.1887 task: type: STS - dataset: config: default name: MTEB SciDocsRR (default) revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: map value: 78.7501 - type: mrr value: 
93.9748 - type: nAUC_map_max value: 54.495599999999996 - type: nAUC_map_std value: 70.0377 - type: nAUC_map_diff1 value: 6.0146999999999995 - type: nAUC_mrr_max value: 81.1486 - type: nAUC_mrr_std value: 78.3478 - type: nAUC_mrr_diff1 value: 50.7613 - type: main_score value: 78.7501 task: type: Reranking - dataset: config: default name: MTEB SciFact (default) revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: ndcg_at_1 value: 58.667 - type: ndcg_at_3 value: 66.022 - type: ndcg_at_5 value: 68.508 - type: ndcg_at_10 value: 70.586 - type: ndcg_at_20 value: 71.714 - type: ndcg_at_100 value: 72.81 - type: ndcg_at_1000 value: 73.482 - type: map_at_1 value: 55.594 - type: map_at_3 value: 63.2 - type: map_at_5 value: 64.996 - type: map_at_10 value: 65.988 - type: map_at_20 value: 66.347 - type: map_at_100 value: 66.526 - type: map_at_1000 value: 66.547 - type: recall_at_1 value: 55.594 - type: recall_at_3 value: 71.22800000000001 - type: recall_at_5 value: 77.078 - type: recall_at_10 value: 83.172 - type: recall_at_20 value: 87.422 - type: recall_at_100 value: 93.167 - type: recall_at_1000 value: 98.667 - type: precision_at_1 value: 58.667 - type: precision_at_3 value: 25.778000000000002 - type: precision_at_5 value: 17.333000000000002 - type: precision_at_10 value: 9.433 - type: precision_at_20 value: 4.967 - type: precision_at_100 value: 1.06 - type: precision_at_1000 value: 0.11199999999999999 - type: mrr_at_1 value: 58.666700000000006 - type: mrr_at_3 value: 65.3889 - type: mrr_at_5 value: 66.62219999999999 - type: mrr_at_10 value: 67.3364 - type: mrr_at_20 value: 67.6046 - type: mrr_at_100 value: 67.73320000000001 - type: mrr_at_1000 value: 67.7526 - type: nauc_ndcg_at_1_max value: 60.2511 - type: nauc_ndcg_at_1_std value: 12.422 - type: nauc_ndcg_at_1_diff1 value: 74.4289 - type: nauc_ndcg_at_3_max value: 60.2109 - type: nauc_ndcg_at_3_std value: 11.0152 - type: nauc_ndcg_at_3_diff1 value: 71.0436 - type: nauc_ndcg_at_5_max value: 62.690999999999995 - type: nauc_ndcg_at_5_std value: 13.585700000000001 - type: nauc_ndcg_at_5_diff1 value: 70.4007 - type: nauc_ndcg_at_10_max value: 62.740899999999996 - type: nauc_ndcg_at_10_std value: 13.980400000000001 - type: nauc_ndcg_at_10_diff1 value: 70.0506 - type: nauc_ndcg_at_20_max value: 62.271699999999996 - type: nauc_ndcg_at_20_std value: 15.9756 - type: nauc_ndcg_at_20_diff1 value: 70.3237 - type: nauc_ndcg_at_100_max value: 62.125 - type: nauc_ndcg_at_100_std value: 15.5809 - type: nauc_ndcg_at_100_diff1 value: 70.4151 - type: nauc_ndcg_at_1000_max value: 61.9259 - type: nauc_ndcg_at_1000_std value: 15.3462 - type: nauc_ndcg_at_1000_diff1 value: 70.7346 - type: nauc_map_at_1_max value: 53.6767 - type: nauc_map_at_1_std value: 3.7751 - type: nauc_map_at_1_diff1 value: 74.60329999999999 - type: nauc_map_at_3_max value: 57.0403 - type: nauc_map_at_3_std value: 8.2272 - type: nauc_map_at_3_diff1 value: 71.7906 - type: nauc_map_at_5_max value: 59.6713 - type: nauc_map_at_5_std value: 10.8346 - type: nauc_map_at_5_diff1 value: 71.3356 - type: nauc_map_at_10_max value: 60.0086 - type: nauc_map_at_10_std value: 11.4394 - type: nauc_map_at_10_diff1 value: 71.14869999999999 - type: nauc_map_at_20_max value: 59.940599999999996 - type: nauc_map_at_20_std value: 12.0728 - type: nauc_map_at_20_diff1 value: 71.31 - type: nauc_map_at_100_max value: 59.95589999999999 - type: nauc_map_at_100_std value: 12.148299999999999 - type: nauc_map_at_100_diff1 value: 71.2142 - type: nauc_map_at_1000_max value: 59.9486 - type: 
nauc_map_at_1000_std value: 12.139 - type: nauc_map_at_1000_diff1 value: 71.2225 - type: nauc_recall_at_1_max value: 53.6767 - type: nauc_recall_at_1_std value: 3.7751 - type: nauc_recall_at_1_diff1 value: 74.60329999999999 - type: nauc_recall_at_3_max value: 60.4078 - type: nauc_recall_at_3_std value: 9.038300000000001 - type: nauc_recall_at_3_diff1 value: 67.60119999999999 - type: nauc_recall_at_5_max value: 68.0179 - type: nauc_recall_at_5_std value: 16.061600000000002 - type: nauc_recall_at_5_diff1 value: 65.54759999999999 - type: nauc_recall_at_10_max value: 68.7372 - type: nauc_recall_at_10_std value: 16.8637 - type: nauc_recall_at_10_diff1 value: 62.7613 - type: nauc_recall_at_20_max value: 67.1403 - type: nauc_recall_at_20_std value: 31.3919 - type: nauc_recall_at_20_diff1 value: 62.66929999999999 - type: nauc_recall_at_100_max value: 68.6366 - type: nauc_recall_at_100_std value: 32.4577 - type: nauc_recall_at_100_diff1 value: 64.52029999999999 - type: nauc_recall_at_1000_max value: 70.7166 - type: nauc_recall_at_1000_std value: 70.47149999999999 - type: nauc_recall_at_1000_diff1 value: 85.58590000000001 - type: nauc_precision_at_1_max value: 60.2511 - type: nauc_precision_at_1_std value: 12.422 - type: nauc_precision_at_1_diff1 value: 74.4289 - type: nauc_precision_at_3_max value: 58.75280000000001 - type: nauc_precision_at_3_std value: 27.605400000000003 - type: nauc_precision_at_3_diff1 value: 49.1523 - type: nauc_precision_at_5_max value: 56.4694 - type: nauc_precision_at_5_std value: 39.080799999999996 - type: nauc_precision_at_5_diff1 value: 28.8162 - type: nauc_precision_at_10_max value: 48.8894 - type: nauc_precision_at_10_std value: 43.8149 - type: nauc_precision_at_10_diff1 value: 15.0093 - type: nauc_precision_at_20_max value: 41.4059 - type: nauc_precision_at_20_std value: 50.7143 - type: nauc_precision_at_20_diff1 value: 8.3552 - type: nauc_precision_at_100_max value: 33.5064 - type: nauc_precision_at_100_std value: 52.8775 - type: nauc_precision_at_100_diff1 value: -5.0870999999999995 - type: nauc_precision_at_1000_max value: 23.9064 - type: nauc_precision_at_1000_std value: 57.784800000000004 - type: nauc_precision_at_1000_diff1 value: -20.1246 - type: nauc_mrr_at_1_max value: 60.2511 - type: nauc_mrr_at_1_std value: 12.422 - type: nauc_mrr_at_1_diff1 value: 74.4289 - type: nauc_mrr_at_3_max value: 62.663199999999996 - type: nauc_mrr_at_3_std value: 14.7348 - type: nauc_mrr_at_3_diff1 value: 72.1185 - type: nauc_mrr_at_5_max value: 63.3871 - type: nauc_mrr_at_5_std value: 15.773000000000001 - type: nauc_mrr_at_5_diff1 value: 71.6722 - type: nauc_mrr_at_10_max value: 62.8474 - type: nauc_mrr_at_10_std value: 15.1896 - type: nauc_mrr_at_10_diff1 value: 71.64110000000001 - type: nauc_mrr_at_20_max value: 62.699400000000004 - type: nauc_mrr_at_20_std value: 15.554499999999999 - type: nauc_mrr_at_20_diff1 value: 71.6049 - type: nauc_mrr_at_100_max value: 62.6665 - type: nauc_mrr_at_100_std value: 15.4586 - type: nauc_mrr_at_100_diff1 value: 71.6217 - type: nauc_mrr_at_1000_max value: 62.6641 - type: nauc_mrr_at_1000_std value: 15.4535 - type: nauc_mrr_at_1000_diff1 value: 71.6307 - type: main_score value: 70.586 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions (default) revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: similarity_accuracy value: 99.8416 - type: similarity_accuracy_threshold value: 74.52069999999999 - type: similarity_f1 value: 92.008 - 
type: similarity_f1_threshold value: 74.4529 - type: similarity_precision value: 91.9162 - type: similarity_recall value: 92.10000000000001 - type: similarity_ap value: 96.54209999999999 - type: cosine_accuracy value: 99.8416 - type: cosine_accuracy_threshold value: 74.52069999999999 - type: cosine_f1 value: 92.008 - type: cosine_f1_threshold value: 74.4529 - type: cosine_precision value: 91.9162 - type: cosine_recall value: 92.10000000000001 - type: cosine_ap value: 96.54209999999999 - type: manhattan_accuracy value: 99.8446 - type: manhattan_accuracy_threshold value: 1784.866 - type: manhattan_f1 value: 92.1539 - type: manhattan_f1_threshold value: 1787.6774 - type: manhattan_precision value: 92.1079 - type: manhattan_recall value: 92.2 - type: manhattan_ap value: 96.5207 - type: euclidean_accuracy value: 99.8416 - type: euclidean_accuracy_threshold value: 71.3853 - type: euclidean_f1 value: 92.008 - type: euclidean_f1_threshold value: 71.4803 - type: euclidean_precision value: 91.9162 - type: euclidean_recall value: 92.10000000000001 - type: euclidean_ap value: 96.54209999999999 - type: dot_accuracy value: 99.8416 - type: dot_accuracy_threshold value: 74.52069999999999 - type: dot_f1 value: 92.008 - type: dot_f1_threshold value: 74.4528 - type: dot_precision value: 91.9162 - type: dot_recall value: 92.10000000000001 - type: dot_ap value: 96.54209999999999 - type: max_accuracy value: 99.8446 - type: max_f1 value: 92.1539 - type: max_precision value: 92.1079 - type: max_recall value: 92.2 - type: max_ap value: 96.54209999999999 - type: main_score value: 96.54209999999999 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering (default) revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: v_measure value: 63.4035 - type: v_measure_std value: 4.758 - type: main_score value: 63.4035 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P (default) revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: v_measure value: 36.288599999999995 - type: v_measure_std value: 1.3107 - type: main_score value: 36.288599999999995 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions (default) revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: map value: 51.457699999999996 - type: mrr value: 52.374500000000005 - type: nAUC_map_max value: 12.912399999999998 - type: nAUC_map_std value: 6.4524 - type: nAUC_map_diff1 value: 37.2785 - type: nAUC_mrr_max value: 13.333999999999998 - type: nAUC_mrr_std value: 7.0440000000000005 - type: nAUC_mrr_diff1 value: 37.2993 - type: main_score value: 51.457699999999996 task: type: Reranking - dataset: config: default name: MTEB SummEval (default) revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: pearson value: 29.7101 - type: spearman value: 30.514200000000002 - type: cosine_spearman value: 30.514200000000002 - type: cosine_pearson value: 29.7101 - type: dot_spearman value: 30.514200000000002 - type: dot_pearson value: 29.7101 - type: main_score value: 30.514200000000002 task: type: Summarization - dataset: config: default name: MTEB TRECCOVID (default) revision: bb9466bac8153a0349341eb1b22e06409e78ef4e split: test type: mteb/trec-covid metrics: - type: ndcg_at_1 value: 86.0 - type: ndcg_at_3 value: 86.542 - type: ndcg_at_5 
value: 85.297 - type: ndcg_at_10 value: 83.866 - type: ndcg_at_20 value: 80.553 - type: ndcg_at_100 value: 65.091 - type: ndcg_at_1000 value: 57.86900000000001 - type: map_at_1 value: 0.23500000000000001 - type: map_at_3 value: 0.7100000000000001 - type: map_at_5 value: 1.1440000000000001 - type: map_at_10 value: 2.185 - type: map_at_20 value: 4.004 - type: map_at_100 value: 13.25 - type: map_at_1000 value: 32.668 - type: recall_at_1 value: 0.23500000000000001 - type: recall_at_3 value: 0.736 - type: recall_at_5 value: 1.191 - type: recall_at_10 value: 2.323 - type: recall_at_20 value: 4.390000000000001 - type: recall_at_100 value: 15.962000000000002 - type: recall_at_1000 value: 54.290000000000006 - type: precision_at_1 value: 90.0 - type: precision_at_3 value: 92.0 - type: precision_at_5 value: 90.0 - type: precision_at_10 value: 88.6 - type: precision_at_20 value: 85.5 - type: precision_at_100 value: 67.14 - type: precision_at_1000 value: 25.81 - type: mrr_at_1 value: 90.0 - type: mrr_at_3 value: 94.6667 - type: mrr_at_5 value: 94.6667 - type: mrr_at_10 value: 94.6667 - type: mrr_at_20 value: 94.6667 - type: mrr_at_100 value: 94.6667 - type: mrr_at_1000 value: 94.6667 - type: nauc_ndcg_at_1_max value: -0.0208 - type: nauc_ndcg_at_1_std value: 9.228200000000001 - type: nauc_ndcg_at_1_diff1 value: -7.4962 - type: nauc_ndcg_at_3_max value: 16.5755 - type: nauc_ndcg_at_3_std value: 39.0511 - type: nauc_ndcg_at_3_diff1 value: -14.5975 - type: nauc_ndcg_at_5_max value: 15.326799999999999 - type: nauc_ndcg_at_5_std value: 44.2523 - type: nauc_ndcg_at_5_diff1 value: -15.004600000000002 - type: nauc_ndcg_at_10_max value: 34.5609 - type: nauc_ndcg_at_10_std value: 62.8752 - type: nauc_ndcg_at_10_diff1 value: -22.9907 - type: nauc_ndcg_at_20_max value: 35.7633 - type: nauc_ndcg_at_20_std value: 74.1826 - type: nauc_ndcg_at_20_diff1 value: -26.3264 - type: nauc_ndcg_at_100_max value: 36.939499999999995 - type: nauc_ndcg_at_100_std value: 80.702 - type: nauc_ndcg_at_100_diff1 value: -41.7784 - type: nauc_ndcg_at_1000_max value: 41.3313 - type: nauc_ndcg_at_1000_std value: 68.0671 - type: nauc_ndcg_at_1000_diff1 value: -14.6009 - type: nauc_map_at_1_max value: -15.2873 - type: nauc_map_at_1_std value: -24.4781 - type: nauc_map_at_1_diff1 value: 35.4803 - type: nauc_map_at_3_max value: -14.107700000000001 - type: nauc_map_at_3_std value: -23.197699999999998 - type: nauc_map_at_3_diff1 value: 37.8596 - type: nauc_map_at_5_max value: -12.7588 - type: nauc_map_at_5_std value: -20.174400000000002 - type: nauc_map_at_5_diff1 value: 39.575700000000005 - type: nauc_map_at_10_max value: -4.8804 - type: nauc_map_at_10_std value: -11.0753 - type: nauc_map_at_10_diff1 value: 38.2457 - type: nauc_map_at_20_max value: 0.7396 - type: nauc_map_at_20_std value: 0.3599 - type: nauc_map_at_20_diff1 value: 35.4735 - type: nauc_map_at_100_max value: 20.011000000000003 - type: nauc_map_at_100_std value: 45.2654 - type: nauc_map_at_100_diff1 value: 3.6394 - type: nauc_map_at_1000_max value: 43.317099999999996 - type: nauc_map_at_1000_std value: 74.6629 - type: nauc_map_at_1000_diff1 value: -22.509 - type: nauc_recall_at_1_max value: -15.2873 - type: nauc_recall_at_1_std value: -24.4781 - type: nauc_recall_at_1_diff1 value: 35.4803 - type: nauc_recall_at_3_max value: -14.1509 - type: nauc_recall_at_3_std value: -24.7684 - type: nauc_recall_at_3_diff1 value: 40.6736 - type: nauc_recall_at_5_max value: -13.053899999999999 - type: nauc_recall_at_5_std value: -21.7134 - type: nauc_recall_at_5_diff1 value: 42.4446 - type: 
nauc_recall_at_10_max value: -7.3492 - type: nauc_recall_at_10_std value: -15.7989 - type: nauc_recall_at_10_diff1 value: 41.6543 - type: nauc_recall_at_20_max value: -4.8004 - type: nauc_recall_at_20_std value: -9.6834 - type: nauc_recall_at_20_diff1 value: 41.7323 - type: nauc_recall_at_100_max value: 11.3356 - type: nauc_recall_at_100_std value: 28.1118 - type: nauc_recall_at_100_diff1 value: 15.6166 - type: nauc_recall_at_1000_max value: 39.9341 - type: nauc_recall_at_1000_std value: 54.15410000000001 - type: nauc_recall_at_1000_diff1 value: -2.0016 - type: nauc_precision_at_1_max value: 12.2035 - type: nauc_precision_at_1_std value: 24.1923 - type: nauc_precision_at_1_diff1 value: -25.368800000000004 - type: nauc_precision_at_3_max value: 31.019600000000004 - type: nauc_precision_at_3_std value: 56.08539999999999 - type: nauc_precision_at_3_diff1 value: -33.821600000000004 - type: nauc_precision_at_5_max value: 26.127699999999997 - type: nauc_precision_at_5_std value: 52.8458 - type: nauc_precision_at_5_diff1 value: -22.24 - type: nauc_precision_at_10_max value: 45.8122 - type: nauc_precision_at_10_std value: 71.9086 - type: nauc_precision_at_10_diff1 value: -28.500700000000002 - type: nauc_precision_at_20_max value: 44.2567 - type: nauc_precision_at_20_std value: 80.86410000000001 - type: nauc_precision_at_20_diff1 value: -28.518 - type: nauc_precision_at_100_max value: 42.8044 - type: nauc_precision_at_100_std value: 84.13669999999999 - type: nauc_precision_at_100_diff1 value: -47.1098 - type: nauc_precision_at_1000_max value: 40.260200000000005 - type: nauc_precision_at_1000_std value: 53.53059999999999 - type: nauc_precision_at_1000_diff1 value: -41.2652 - type: nauc_mrr_at_1_max value: 12.2035 - type: nauc_mrr_at_1_std value: 24.1923 - type: nauc_mrr_at_1_diff1 value: -25.368800000000004 - type: nauc_mrr_at_3_max value: 16.8738 - type: nauc_mrr_at_3_std value: 28.113300000000002 - type: nauc_mrr_at_3_diff1 value: -20.3198 - type: nauc_mrr_at_5_max value: 16.8738 - type: nauc_mrr_at_5_std value: 28.113300000000002 - type: nauc_mrr_at_5_diff1 value: -20.3198 - type: nauc_mrr_at_10_max value: 16.8738 - type: nauc_mrr_at_10_std value: 28.113300000000002 - type: nauc_mrr_at_10_diff1 value: -20.3198 - type: nauc_mrr_at_20_max value: 16.8738 - type: nauc_mrr_at_20_std value: 28.113300000000002 - type: nauc_mrr_at_20_diff1 value: -20.3198 - type: nauc_mrr_at_100_max value: 16.8738 - type: nauc_mrr_at_100_std value: 28.113300000000002 - type: nauc_mrr_at_100_diff1 value: -20.3198 - type: nauc_mrr_at_1000_max value: 16.8738 - type: nauc_mrr_at_1000_std value: 28.113300000000002 - type: nauc_mrr_at_1000_diff1 value: -20.3198 - type: main_score value: 83.866 task: type: Retrieval - dataset: config: default name: MTEB Touche2020 (default) revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: ndcg_at_1 value: 38.775999999999996 - type: ndcg_at_3 value: 33.664 - type: ndcg_at_5 value: 31.61 - type: ndcg_at_10 value: 29.499 - type: ndcg_at_20 value: 29.772 - type: ndcg_at_100 value: 39.845000000000006 - type: ndcg_at_1000 value: 51.141999999999996 - type: map_at_1 value: 3.004 - type: map_at_3 value: 6.027 - type: map_at_5 value: 7.993 - type: map_at_10 value: 11.546 - type: map_at_20 value: 14.185 - type: map_at_100 value: 17.698 - type: map_at_1000 value: 19.364 - type: recall_at_1 value: 3.004 - type: recall_at_3 value: 7.178 - type: recall_at_5 value: 11.196 - type: recall_at_10 value: 18.584999999999997 - type: recall_at_20 value: 
26.845999999999997 - type: recall_at_100 value: 49.025 - type: recall_at_1000 value: 82.884 - type: precision_at_1 value: 40.816 - type: precision_at_3 value: 33.333 - type: precision_at_5 value: 30.612000000000002 - type: precision_at_10 value: 25.714 - type: precision_at_20 value: 19.387999999999998 - type: precision_at_100 value: 7.939 - type: precision_at_1000 value: 1.545 - type: mrr_at_1 value: 40.8163 - type: mrr_at_3 value: 53.401399999999995 - type: mrr_at_5 value: 56.7687 - type: mrr_at_10 value: 57.5421 - type: mrr_at_20 value: 58.142 - type: mrr_at_100 value: 58.2307 - type: mrr_at_1000 value: 58.2307 - type: nauc_ndcg_at_1_max value: -18.0584 - type: nauc_ndcg_at_1_std value: -25.634600000000002 - type: nauc_ndcg_at_1_diff1 value: -1.7021000000000002 - type: nauc_ndcg_at_3_max value: -17.8622 - type: nauc_ndcg_at_3_std value: -20.119799999999998 - type: nauc_ndcg_at_3_diff1 value: -2.399 - type: nauc_ndcg_at_5_max value: -22.0829 - type: nauc_ndcg_at_5_std value: -22.841 - type: nauc_ndcg_at_5_diff1 value: -12.350200000000001 - type: nauc_ndcg_at_10_max value: -17.858999999999998 - type: nauc_ndcg_at_10_std value: -17.9067 - type: nauc_ndcg_at_10_diff1 value: -9.3129 - type: nauc_ndcg_at_20_max value: -24.479400000000002 - type: nauc_ndcg_at_20_std value: -16.06 - type: nauc_ndcg_at_20_diff1 value: -10.57 - type: nauc_ndcg_at_100_max value: -20.9167 - type: nauc_ndcg_at_100_std value: 9.6051 - type: nauc_ndcg_at_100_diff1 value: -0.2363 - type: nauc_ndcg_at_1000_max value: -13.6708 - type: nauc_ndcg_at_1000_std value: 17.956 - type: nauc_ndcg_at_1000_diff1 value: -2.5696 - type: nauc_map_at_1_max value: -14.276900000000001 - type: nauc_map_at_1_std value: -31.3091 - type: nauc_map_at_1_diff1 value: -1.4354 - type: nauc_map_at_3_max value: -21.7098 - type: nauc_map_at_3_std value: -32.112899999999996 - type: nauc_map_at_3_diff1 value: -8.846 - type: nauc_map_at_5_max value: -16.700200000000002 - type: nauc_map_at_5_std value: -32.643499999999996 - type: nauc_map_at_5_diff1 value: -13.9766 - type: nauc_map_at_10_max value: -13.415199999999999 - type: nauc_map_at_10_std value: -28.459200000000003 - type: nauc_map_at_10_diff1 value: -12.4042 - type: nauc_map_at_20_max value: -17.8629 - type: nauc_map_at_20_std value: -24.5837 - type: nauc_map_at_20_diff1 value: -14.9642 - type: nauc_map_at_100_max value: -15.6478 - type: nauc_map_at_100_std value: -11.4237 - type: nauc_map_at_100_diff1 value: -11.542 - type: nauc_map_at_1000_max value: -15.2149 - type: nauc_map_at_1000_std value: -8.0384 - type: nauc_map_at_1000_diff1 value: -12.984000000000002 - type: nauc_recall_at_1_max value: -14.276900000000001 - type: nauc_recall_at_1_std value: -31.3091 - type: nauc_recall_at_1_diff1 value: -1.4354 - type: nauc_recall_at_3_max value: -23.021900000000002 - type: nauc_recall_at_3_std value: -30.2834 - type: nauc_recall_at_3_diff1 value: -11.4226 - type: nauc_recall_at_5_max value: -20.596600000000002 - type: nauc_recall_at_5_std value: -33.219300000000004 - type: nauc_recall_at_5_diff1 value: -17.718999999999998 - type: nauc_recall_at_10_max value: -16.1214 - type: nauc_recall_at_10_std value: -23.9041 - type: nauc_recall_at_10_diff1 value: -11.047 - type: nauc_recall_at_20_max value: -25.603399999999997 - type: nauc_recall_at_20_std value: -15.8105 - type: nauc_recall_at_20_diff1 value: -14.546000000000001 - type: nauc_recall_at_100_max value: -16.389400000000002 - type: nauc_recall_at_100_std value: 28.5141 - type: nauc_recall_at_100_diff1 value: 6.1868 - type: nauc_recall_at_1000_max 
value: 11.022 - type: nauc_recall_at_1000_std value: 68.0021 - type: nauc_recall_at_1000_diff1 value: 8.426 - type: nauc_precision_at_1_max value: -17.1625 - type: nauc_precision_at_1_std value: -27.9451 - type: nauc_precision_at_1_diff1 value: 1.0831 - type: nauc_precision_at_3_max value: -17.2798 - type: nauc_precision_at_3_std value: -20.347199999999997 - type: nauc_precision_at_3_diff1 value: -5.2689 - type: nauc_precision_at_5_max value: -19.6408 - type: nauc_precision_at_5_std value: -24.157 - type: nauc_precision_at_5_diff1 value: -20.274900000000002 - type: nauc_precision_at_10_max value: -11.8033 - type: nauc_precision_at_10_std value: -7.2727 - type: nauc_precision_at_10_diff1 value: -9.3776 - type: nauc_precision_at_20_max value: -20.1541 - type: nauc_precision_at_20_std value: 9.0645 - type: nauc_precision_at_20_diff1 value: -16.1323 - type: nauc_precision_at_100_max value: 0.3701 - type: nauc_precision_at_100_std value: 67.6941 - type: nauc_precision_at_100_diff1 value: 8.0336 - type: nauc_precision_at_1000_max value: 38.8632 - type: nauc_precision_at_1000_std value: 38.0504 - type: nauc_precision_at_1000_diff1 value: 0.5907 - type: nauc_mrr_at_1_max value: -17.1625 - type: nauc_mrr_at_1_std value: -27.9451 - type: nauc_mrr_at_1_diff1 value: 1.0831 - type: nauc_mrr_at_3_max value: -20.479300000000002 - type: nauc_mrr_at_3_std value: -21.9225 - type: nauc_mrr_at_3_diff1 value: -1.5211000000000001 - type: nauc_mrr_at_5_max value: -24.8175 - type: nauc_mrr_at_5_std value: -23.805 - type: nauc_mrr_at_5_diff1 value: -7.9258 - type: nauc_mrr_at_10_max value: -22.53 - type: nauc_mrr_at_10_std value: -21.9391 - type: nauc_mrr_at_10_diff1 value: -5.7533 - type: nauc_mrr_at_20_max value: -22.7064 - type: nauc_mrr_at_20_std value: -22.4697 - type: nauc_mrr_at_20_diff1 value: -5.7068 - type: nauc_mrr_at_100_max value: -23.0016 - type: nauc_mrr_at_100_std value: -22.488 - type: nauc_mrr_at_100_diff1 value: -5.3738 - type: nauc_mrr_at_1000_max value: -23.0016 - type: nauc_mrr_at_1000_std value: -22.488 - type: nauc_mrr_at_1000_diff1 value: -5.3738 - type: main_score value: 29.499 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification (default) revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 65.8643 - type: f1 value: 50.6764 - type: f1_weighted value: 73.2472 - type: ap value: 12.2658 - type: ap_weighted value: 12.2658 - type: main_score value: 65.8643 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification (default) revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 59.6633 - type: f1 value: 59.935700000000004 - type: f1_weighted value: 59.0249 - type: main_score value: 59.6633 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering (default) revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: v_measure value: 43.2311 - type: v_measure_std value: 2.3994999999999997 - type: main_score value: 43.2311 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 (default) revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: similarity_accuracy value: 83.8469 - type: similarity_accuracy_threshold value: 77.6695 - type: similarity_f1 value: 62.3159 - 
type: similarity_f1_threshold value: 71.6554 - type: similarity_precision value: 59.114599999999996 - type: similarity_recall value: 65.8839 - type: similarity_ap value: 67.00930000000001 - type: cosine_accuracy value: 83.8469 - type: cosine_accuracy_threshold value: 77.6695 - type: cosine_f1 value: 62.3159 - type: cosine_f1_threshold value: 71.6554 - type: cosine_precision value: 59.114599999999996 - type: cosine_recall value: 65.8839 - type: cosine_ap value: 67.00930000000001 - type: manhattan_accuracy value: 83.7694 - type: manhattan_accuracy_threshold value: 1677.8293999999999 - type: manhattan_f1 value: 62.1324 - type: manhattan_f1_threshold value: 1848.6641 - type: manhattan_precision value: 61.839999999999996 - type: manhattan_recall value: 62.4274 - type: manhattan_ap value: 66.8849 - type: euclidean_accuracy value: 83.8469 - type: euclidean_accuracy_threshold value: 66.8288 - type: euclidean_f1 value: 62.3159 - type: euclidean_f1_threshold value: 75.2922 - type: euclidean_precision value: 59.114599999999996 - type: euclidean_recall value: 65.8839 - type: euclidean_ap value: 67.00930000000001 - type: dot_accuracy value: 83.8469 - type: dot_accuracy_threshold value: 77.6695 - type: dot_f1 value: 62.3159 - type: dot_f1_threshold value: 71.6554 - type: dot_precision value: 59.114599999999996 - type: dot_recall value: 65.8839 - type: dot_ap value: 67.00930000000001 - type: max_accuracy value: 83.8469 - type: max_f1 value: 62.3159 - type: max_precision value: 61.839999999999996 - type: max_recall value: 65.8839 - type: max_ap value: 67.00930000000001 - type: main_score value: 67.00930000000001 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus (default) revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: similarity_accuracy value: 88.8811 - type: similarity_accuracy_threshold value: 71.1053 - type: similarity_f1 value: 77.9005 - type: similarity_f1_threshold value: 67.5068 - type: similarity_precision value: 75.5609 - type: similarity_recall value: 80.3896 - type: similarity_ap value: 85.459 - type: cosine_accuracy value: 88.8811 - type: cosine_accuracy_threshold value: 71.1053 - type: cosine_f1 value: 77.9005 - type: cosine_f1_threshold value: 67.5068 - type: cosine_precision value: 75.5609 - type: cosine_recall value: 80.3896 - type: cosine_ap value: 85.459 - type: manhattan_accuracy value: 88.8598 - type: manhattan_accuracy_threshold value: 1928.9173 - type: manhattan_f1 value: 77.9172 - type: manhattan_f1_threshold value: 2007.8883999999998 - type: manhattan_precision value: 76.29310000000001 - type: manhattan_recall value: 79.6119 - type: manhattan_ap value: 85.4464 - type: euclidean_accuracy value: 88.8811 - type: euclidean_accuracy_threshold value: 76.0193 - type: euclidean_f1 value: 77.9005 - type: euclidean_f1_threshold value: 80.6141 - type: euclidean_precision value: 75.5609 - type: euclidean_recall value: 80.3896 - type: euclidean_ap value: 85.459 - type: dot_accuracy value: 88.8811 - type: dot_accuracy_threshold value: 71.1053 - type: dot_f1 value: 77.9005 - type: dot_f1_threshold value: 67.5068 - type: dot_precision value: 75.5609 - type: dot_recall value: 80.3896 - type: dot_ap value: 85.459 - type: max_accuracy value: 88.8811 - type: max_f1 value: 77.9172 - type: max_precision value: 76.29310000000001 - type: max_recall value: 80.3896 - type: max_ap value: 85.459 - type: main_score value: 85.459 task: type: PairClassification ---

# Snowflake's Arctic-embed-l-v2.0

News | Models | Usage | Evaluation | Contact | FAQ | License | Acknowledgement

## News

- 12/11/2024: Release of the Technical Report
- 12/04/2024: Release of snowflake-arctic-embed-l-v2.0 and snowflake-arctic-embed-m-v2.0, our newest models designed with multilingual workloads in mind.

## Models

snowflake-arctic-embed-l-v2.0 is the newest addition to the suite of embedding models Snowflake has released, optimized for retrieval performance and inference efficiency. Arctic Embed 2.0 introduces a new standard for multilingual embedding models, delivering high-quality multilingual text retrieval without sacrificing performance in English. Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.

Key Features:

1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
2. Inference efficiency: With only 303M non-embedding parameters, inference is fast and efficient at any scale.
3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes per vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training. **Please note that, like our v1.5 model, the MRL for this model is 256 dimensions, and high-quality 128-byte compression is achieved via 4-bit quantization (e.g. using a fast-scan FAISS index or the example code published alongside our 1.5 model).**
4. Drop-In Replacement: arctic-embed-l-v2.0 builds on BAAI/bge-m3-retromae, which allows direct drop-in inference replacement in existing libraries, kernels, and inference engines.
5. Long Context Support: arctic-embed-l-v2.0 builds on BAAI/bge-m3-retromae, which supports a context window of up to 8192 tokens via the use of RoPE.

### Quality Benchmarks

Unlike most other open-source models, Arctic-embed-l-v2.0 excels at both English retrieval (via MTEB Retrieval) and multilingual retrieval (via MIRACL and CLEF). You no longer need to maintain separate models to empower high-quality English and multilingual retrieval. All numbers below are the average NDCG@10 across the dataset in question.

| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **snowflake-arctic-l-v2.0** | 568M | 303M | 1024 | **55.6** | 55.8 | **52.9** | **54.3** |
| snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
| snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | **56.8** | 40.8 | 41.3 |
| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |

Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Leverage vector truncation via MRL to decrease vector size by 4x with less than 3% degradation in quality, and combine MRL-truncated vectors with Int4 vector compression to power retrieval in 128 bytes per document, as sketched below.
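As a concrete illustration, here is a minimal sketch of MRL truncation, assuming the sentence-transformers package and a model configuration that exposes a "query" prompt; the sample query and documents are illustrative, not taken from the official example:

```python
# Minimal sketch: retrieval with Arctic Embed 2.0 plus MRL truncation to 256 dims.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")

queries = ["what is snowflake?"]  # illustrative sample data
documents = ["The Data Cloud!", "Mexico City of Course!"]

# The "query" prompt applies the model's query prefix; documents get no prefix.
query_emb = model.encode(queries, prompt_name="query")
doc_emb = model.encode(documents)

def truncate(emb: np.ndarray, dim: int = 256) -> np.ndarray:
    """MRL truncation: keep the first `dim` dimensions, then re-normalize."""
    emb = emb[:, :dim]
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Cosine similarity as a dot product of unit-norm vectors.
scores = truncate(query_emb) @ truncate(doc_emb).T
print(scores)  # higher score = more relevant document
```

Combining the truncated 256-dimensional vectors with 4-bit quantization (for example, a fast-scan FAISS index) yields the 128-byte figure above; the table below shows how little retrieval quality is lost at 256 dimensions.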
| Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (5) | Relative Performance | CLEF (Full) | Relative Performance | |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A | | snowflake-arctic-l-v2.0 | 256 | 54.3 | -0.18% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% | ## Usage ### Using Sentence Transformers ### Using Huggingface Transformers You can use the transformers package to use Snowflake's arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token to embed each text portion and use the query prefix below (just on the query). This should produce the following scores: ### Using Huggingface Transformers.js If you haven't already, you can install the Transformers.js JavaScript library from NPM using: You can then use the model for retrieval, as follows: ## Contact Feel free to open an issue or pull request if you have any questions or suggestions about this project. You can also email Daniel Campos (daniel.campos@snowflake.com). ## License Arctic is licensed under the Apache-2.0 license. The released models can be used for commercial purposes free of charge.", + "model_explanation_gemini": "Generates embeddings for sentences to measure similarity across multiple languages, optimized for classification and retrieval tasks." +} \ No newline at end of file diff --git a/data/model_data_json/Snowflake_snowflake-arctic-embed-m-v1.5.json b/data/model_data_json/Snowflake_snowflake-arctic-embed-m-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..af2f834c6ba63c04526a230978d2b060f872a224 --- /dev/null +++ b/data/model_data_json/Snowflake_snowflake-arctic-embed-m-v1.5.json @@ -0,0 +1,29 @@ +{ + "model_id": "Snowflake/snowflake-arctic-embed-m-v1.5", + "downloads": 231940, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "gguf", + "bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "arctic", + "snowflake-arctic-embed", + "transformers.js", + "arxiv:2412.04506", + "arxiv:2407.18887", + "arxiv:2405.05374", + "arxiv:2205.13147", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - arctic - snowflake-arctic-embed - transformers.js license: apache-2.0 model-index: - name: snowflake-arctic-embed-m-v1.5 results: - dataset: config: default name: MTEB ArguAna revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: main_score value: 59.53000000000001 - type: map_at_1 value: 34.282000000000004 - type: map_at_10 value: 50.613 - type: map_at_100 value: 51.269 - type: map_at_1000 value: 51.271 - type: map_at_20 value: 51.158 - type: map_at_3 value: 45.626 - type: map_at_5 value: 48.638 - type: mrr_at_1 value: 34.92176386913229 - type: mrr_at_10 value: 50.856081645555406 - type: mrr_at_100 value: 51.510739437069034 - type: mrr_at_1000 value: 51.51299498830165 - type: mrr_at_20 value: 51.39987941081724 - type: mrr_at_3 value: 45.993361782835514 - type: mrr_at_5 value: 48.88098624940742 - type: nauc_map_at_1000_diff1 value: 10.628675774160785 - type: nauc_map_at_1000_max value: -10.11742589992339 - type: nauc_map_at_1000_std value: -18.29277379812427 - type: nauc_map_at_100_diff1 value: 10.63250240035489 - type: nauc_map_at_100_max value:
-10.112078786734363 - type: nauc_map_at_100_std value: -18.288524872706834 - type: nauc_map_at_10_diff1 value: 10.476494913081712 - type: nauc_map_at_10_max value: -9.890937746734037 - type: nauc_map_at_10_std value: -18.279750514750443 - type: nauc_map_at_1_diff1 value: 14.549204048461151 - type: nauc_map_at_1_max value: -12.230560087701225 - type: nauc_map_at_1_std value: -19.469903650130362 - type: nauc_map_at_20_diff1 value: 10.586564571825674 - type: nauc_map_at_20_max value: -10.00292720526217 - type: nauc_map_at_20_std value: -18.258077347878064 - type: nauc_map_at_3_diff1 value: 10.378663968090372 - type: nauc_map_at_3_max value: -10.458896171786185 - type: nauc_map_at_3_std value: -18.38852760333766 - type: nauc_map_at_5_diff1 value: 10.235960275925581 - type: nauc_map_at_5_max value: -10.239496080409058 - type: nauc_map_at_5_std value: -18.817023479445886 - type: nauc_mrr_at_1000_diff1 value: 8.718212649575722 - type: nauc_mrr_at_1000_max value: -10.81022794038691 - type: nauc_mrr_at_1000_std value: -17.87669499555167 - type: nauc_mrr_at_100_diff1 value: 8.722174171165133 - type: nauc_mrr_at_100_max value: -10.804840985713525 - type: nauc_mrr_at_100_std value: -17.872487099359986 - type: nauc_mrr_at_10_diff1 value: 8.609421635870238 - type: nauc_mrr_at_10_max value: -10.568644717548432 - type: nauc_mrr_at_10_std value: -17.872968762635814 - type: nauc_mrr_at_1_diff1 value: 12.69590006263834 - type: nauc_mrr_at_1_max value: -12.082056561238321 - type: nauc_mrr_at_1_std value: -18.036424092186657 - type: nauc_mrr_at_20_diff1 value: 8.684842497970315 - type: nauc_mrr_at_20_max value: -10.691578914627286 - type: nauc_mrr_at_20_std value: -17.84350301434992 - type: nauc_mrr_at_3_diff1 value: 8.649761557556763 - type: nauc_mrr_at_3_max value: -11.104694428047496 - type: nauc_mrr_at_3_std value: -18.149917948370344 - type: nauc_mrr_at_5_diff1 value: 8.433489750038396 - type: nauc_mrr_at_5_max value: -10.917772454397436 - type: nauc_mrr_at_5_std value: -18.4094211134111 - type: nauc_ndcg_at_1000_diff1 value: 10.19041067807956 - type: nauc_ndcg_at_1000_max value: -9.54328201605796 - type: nauc_ndcg_at_1000_std value: -17.824620427456633 - type: nauc_ndcg_at_100_diff1 value: 10.289491087585963 - type: nauc_ndcg_at_100_max value: -9.357214331420337 - type: nauc_ndcg_at_100_std value: -17.657600653632873 - type: nauc_ndcg_at_10_diff1 value: 9.435530877596092 - type: nauc_ndcg_at_10_max value: -8.182581635383546 - type: nauc_ndcg_at_10_std value: -17.603156479980388 - type: nauc_ndcg_at_1_diff1 value: 14.549204048461151 - type: nauc_ndcg_at_1_max value: -12.230560087701225 - type: nauc_ndcg_at_1_std value: -19.469903650130362 - type: nauc_ndcg_at_20_diff1 value: 9.885227087275197 - type: nauc_ndcg_at_20_max value: -8.52362662391439 - type: nauc_ndcg_at_20_std value: -17.441705436231764 - type: nauc_ndcg_at_3_diff1 value: 9.22542769998547 - type: nauc_ndcg_at_3_max value: -9.903590564219288 - type: nauc_ndcg_at_3_std value: -18.357220221111593 - type: nauc_ndcg_at_5_diff1 value: 8.8756720745828 - type: nauc_ndcg_at_5_max value: -9.269764943861245 - type: nauc_ndcg_at_5_std value: -19.009229433187784 - type: nauc_precision_at_1000_diff1 value: 3.733355117431035 - type: nauc_precision_at_1000_max value: 3.9603571352517393 - type: nauc_precision_at_1000_std value: 70.07345061131439 - type: nauc_precision_at_100_diff1 value: 29.019032142462457 - type: nauc_precision_at_100_max value: 40.75153328286103 - type: nauc_precision_at_100_std value: 62.634249549126594 - type: nauc_precision_at_10_diff1 
value: 2.5762677254910353 - type: nauc_precision_at_10_max value: 6.096298633773051 - type: nauc_precision_at_10_std value: -11.507400451348587 - type: nauc_precision_at_1_diff1 value: 14.549204048461151 - type: nauc_precision_at_1_max value: -12.230560087701225 - type: nauc_precision_at_1_std value: -19.469903650130362 - type: nauc_precision_at_20_diff1 value: 1.715540124567996 - type: nauc_precision_at_20_max value: 21.53546453945913 - type: nauc_precision_at_20_std value: 1.537961142195571 - type: nauc_precision_at_3_diff1 value: 5.701850652555737 - type: nauc_precision_at_3_max value: -8.180345365085552 - type: nauc_precision_at_3_std value: -18.37033750502482 - type: nauc_precision_at_5_diff1 value: 3.6053552181042843 - type: nauc_precision_at_5_max value: -5.207647070615612 - type: nauc_precision_at_5_std value: -19.89491085427258 - type: nauc_recall_at_1000_diff1 value: 3.733355117431255 - type: nauc_recall_at_1000_max value: 3.9603571352482194 - type: nauc_recall_at_1000_std value: 70.07345061131205 - type: nauc_recall_at_100_diff1 value: 29.01903214246288 - type: nauc_recall_at_100_max value: 40.7515332828621 - type: nauc_recall_at_100_std value: 62.63424954912607 - type: nauc_recall_at_10_diff1 value: 2.5762677254911988 - type: nauc_recall_at_10_max value: 6.0962986337729905 - type: nauc_recall_at_10_std value: -11.507400451348577 - type: nauc_recall_at_1_diff1 value: 14.549204048461151 - type: nauc_recall_at_1_max value: -12.230560087701225 - type: nauc_recall_at_1_std value: -19.469903650130362 - type: nauc_recall_at_20_diff1 value: 1.7155401245682675 - type: nauc_recall_at_20_max value: 21.535464539459632 - type: nauc_recall_at_20_std value: 1.5379611421957025 - type: nauc_recall_at_3_diff1 value: 5.7018506525557875 - type: nauc_recall_at_3_max value: -8.180345365085538 - type: nauc_recall_at_3_std value: -18.370337505024796 - type: nauc_recall_at_5_diff1 value: 3.6053552181043913 - type: nauc_recall_at_5_max value: -5.207647070615579 - type: nauc_recall_at_5_std value: -19.894910854272492 - type: ndcg_at_1 value: 34.282000000000004 - type: ndcg_at_10 value: 59.53000000000001 - type: ndcg_at_100 value: 62.187000000000005 - type: ndcg_at_1000 value: 62.243 - type: ndcg_at_20 value: 61.451 - type: ndcg_at_3 value: 49.393 - type: ndcg_at_5 value: 54.771 - type: precision_at_1 value: 34.282000000000004 - type: precision_at_10 value: 8.791 - type: precision_at_100 value: 0.992 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.769 - type: precision_at_3 value: 20.104 - type: precision_at_5 value: 14.651 - type: recall_at_1 value: 34.282000000000004 - type: recall_at_10 value: 87.909 - type: recall_at_100 value: 99.21799999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_20 value: 95.377 - type: recall_at_3 value: 60.313 - type: recall_at_5 value: 73.257 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackAndroidRetrieval revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: mteb/cqadupstack-android metrics: - type: main_score value: 53.885000000000005 - type: map_at_1 value: 35.429 - type: map_at_10 value: 47.469 - type: map_at_100 value: 48.997 - type: map_at_1000 value: 49.117 - type: map_at_20 value: 48.324 - type: map_at_3 value: 43.835 - type: map_at_5 value: 46.043 - type: mrr_at_1 value: 43.34763948497854 - type: mrr_at_10 value: 53.258623430297234 - type: mrr_at_100 value: 53.99123884299005 - type: mrr_at_1000 value: 54.02458101713216 - type: mrr_at_20 value: 53.695964669618945 - type: mrr_at_3 value: 
50.81068192656173 - type: mrr_at_5 value: 52.45588936576058 - type: nauc_map_at_1000_diff1 value: 51.55382824218782 - type: nauc_map_at_1000_max value: 31.855350695084606 - type: nauc_map_at_1000_std value: -5.465862008150992 - type: nauc_map_at_100_diff1 value: 51.55889312452534 - type: nauc_map_at_100_max value: 31.88429637207401 - type: nauc_map_at_100_std value: -5.40805152544196 - type: nauc_map_at_10_diff1 value: 51.6592677505875 - type: nauc_map_at_10_max value: 31.554425233617543 - type: nauc_map_at_10_std value: -6.125756131339046 - type: nauc_map_at_1_diff1 value: 55.6889617582672 - type: nauc_map_at_1_max value: 27.821166966868176 - type: nauc_map_at_1_std value: -5.778838498211728 - type: nauc_map_at_20_diff1 value: 51.70520970992564 - type: nauc_map_at_20_max value: 31.811676633900465 - type: nauc_map_at_20_std value: -5.463596751904718 - type: nauc_map_at_3_diff1 value: 53.206169626589606 - type: nauc_map_at_3_max value: 31.64373830824983 - type: nauc_map_at_3_std value: -6.054761451312827 - type: nauc_map_at_5_diff1 value: 52.37308971673694 - type: nauc_map_at_5_max value: 31.974302019633644 - type: nauc_map_at_5_std value: -6.302653399940531 - type: nauc_mrr_at_1000_diff1 value: 49.345152231490616 - type: nauc_mrr_at_1000_max value: 33.49789501712511 - type: nauc_mrr_at_1000_std value: -6.054730861163538 - type: nauc_mrr_at_100_diff1 value: 49.3387577601307 - type: nauc_mrr_at_100_max value: 33.48149992464187 - type: nauc_mrr_at_100_std value: -6.061177137579308 - type: nauc_mrr_at_10_diff1 value: 49.08312288449718 - type: nauc_mrr_at_10_max value: 33.470393322577465 - type: nauc_mrr_at_10_std value: -6.180286430216975 - type: nauc_mrr_at_1_diff1 value: 52.43364978537192 - type: nauc_mrr_at_1_max value: 31.521755633355713 - type: nauc_mrr_at_1_std value: -7.002499524130836 - type: nauc_mrr_at_20_diff1 value: 49.311059224991766 - type: nauc_mrr_at_20_max value: 33.538523037692144 - type: nauc_mrr_at_20_std value: -6.034619474981136 - type: nauc_mrr_at_3_diff1 value: 49.90489868439366 - type: nauc_mrr_at_3_max value: 34.400493912164606 - type: nauc_mrr_at_3_std value: -6.028875320994629 - type: nauc_mrr_at_5_diff1 value: 49.033661898983475 - type: nauc_mrr_at_5_max value: 33.732315350193936 - type: nauc_mrr_at_5_std value: -6.272548556330368 - type: nauc_ndcg_at_1000_diff1 value: 49.81681892539247 - type: nauc_ndcg_at_1000_max value: 33.06518006062093 - type: nauc_ndcg_at_1000_std value: -4.282105713014755 - type: nauc_ndcg_at_100_diff1 value: 49.42362108857786 - type: nauc_ndcg_at_100_max value: 32.92024325540483 - type: nauc_ndcg_at_100_std value: -3.7786765305496717 - type: nauc_ndcg_at_10_diff1 value: 48.83102435475594 - type: nauc_ndcg_at_10_max value: 31.898404563611958 - type: nauc_ndcg_at_10_std value: -6.2024003866707 - type: nauc_ndcg_at_1_diff1 value: 52.43364978537192 - type: nauc_ndcg_at_1_max value: 31.521755633355713 - type: nauc_ndcg_at_1_std value: -7.002499524130836 - type: nauc_ndcg_at_20_diff1 value: 49.466526454438316 - type: nauc_ndcg_at_20_max value: 32.424462698701674 - type: nauc_ndcg_at_20_std value: -4.520809563712905 - type: nauc_ndcg_at_3_diff1 value: 50.997884562583884 - type: nauc_ndcg_at_3_max value: 33.26787046916917 - type: nauc_ndcg_at_3_std value: -6.340699471083753 - type: nauc_ndcg_at_5_diff1 value: 49.68314458398097 - type: nauc_ndcg_at_5_max value: 32.80910071143984 - type: nauc_ndcg_at_5_std value: -6.734495576445887 - type: nauc_precision_at_1000_diff1 value: -24.18940012795299 - type: nauc_precision_at_1000_max value: 
-10.995343674356896 - type: nauc_precision_at_1000_std value: -8.298841004724856 - type: nauc_precision_at_100_diff1 value: -18.104939577865935 - type: nauc_precision_at_100_max value: -1.3757613100627637 - type: nauc_precision_at_100_std value: 0.07661922190466432 - type: nauc_precision_at_10_diff1 value: 3.9624459059275967 - type: nauc_precision_at_10_max value: 14.841561593450391 - type: nauc_precision_at_10_std value: -2.485374333613117 - type: nauc_precision_at_1_diff1 value: 52.43364978537192 - type: nauc_precision_at_1_max value: 31.521755633355713 - type: nauc_precision_at_1_std value: -7.002499524130836 - type: nauc_precision_at_20_diff1 value: -4.4791763436505265 - type: nauc_precision_at_20_max value: 9.157872836996276 - type: nauc_precision_at_20_std value: 2.086903518342088 - type: nauc_precision_at_3_diff1 value: 28.480888018235568 - type: nauc_precision_at_3_max value: 30.34526267718485 - type: nauc_precision_at_3_std value: -6.3006706923866025 - type: nauc_precision_at_5_diff1 value: 16.488039195453517 - type: nauc_precision_at_5_max value: 24.593477099241852 - type: nauc_precision_at_5_std value: -5.316448107840636 - type: nauc_recall_at_1000_diff1 value: 34.715187316533076 - type: nauc_recall_at_1000_max value: 58.2266544684947 - type: nauc_recall_at_1000_std value: 63.85237636398278 - type: nauc_recall_at_100_diff1 value: 36.08623826028132 - type: nauc_recall_at_100_max value: 33.05011429439473 - type: nauc_recall_at_100_std value: 16.559545021212564 - type: nauc_recall_at_10_diff1 value: 39.76738610714205 - type: nauc_recall_at_10_max value: 28.233045706945997 - type: nauc_recall_at_10_std value: -5.13243784043598 - type: nauc_recall_at_1_diff1 value: 55.6889617582672 - type: nauc_recall_at_1_max value: 27.821166966868176 - type: nauc_recall_at_1_std value: -5.778838498211728 - type: nauc_recall_at_20_diff1 value: 41.18682480073759 - type: nauc_recall_at_20_max value: 29.525993239296945 - type: nauc_recall_at_20_std value: 1.5003598438954298 - type: nauc_recall_at_3_diff1 value: 48.31879460301157 - type: nauc_recall_at_3_max value: 32.93751306970167 - type: nauc_recall_at_3_std value: -5.28070084211707 - type: nauc_recall_at_5_diff1 value: 44.327686388315435 - type: nauc_recall_at_5_max value: 32.04823486234599 - type: nauc_recall_at_5_std value: -6.4221525602778256 - type: ndcg_at_1 value: 43.348 - type: ndcg_at_10 value: 53.885000000000005 - type: ndcg_at_100 value: 59.204 - type: ndcg_at_1000 value: 60.744 - type: ndcg_at_20 value: 55.995 - type: ndcg_at_3 value: 49.112 - type: ndcg_at_5 value: 51.61900000000001 - type: precision_at_1 value: 43.348 - type: precision_at_10 value: 10.242999999999999 - type: precision_at_100 value: 1.6150000000000002 - type: precision_at_1000 value: 0.203 - type: precision_at_20 value: 6.066 - type: precision_at_3 value: 23.605 - type: precision_at_5 value: 17.024 - type: recall_at_1 value: 35.429 - type: recall_at_10 value: 65.77199999999999 - type: recall_at_100 value: 87.89 - type: recall_at_1000 value: 97.13000000000001 - type: recall_at_20 value: 73.299 - type: recall_at_3 value: 52.034000000000006 - type: recall_at_5 value: 58.96 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: mteb/cqadupstack-english metrics: - type: main_score value: 49.55 - type: map_at_1 value: 31.684 - type: map_at_10 value: 43.258 - type: map_at_100 value: 44.628 - type: map_at_1000 value: 44.761 - type: map_at_20 value: 44.015 - type: map_at_3 value: 
39.778000000000006 - type: map_at_5 value: 41.643 - type: mrr_at_1 value: 39.87261146496815 - type: mrr_at_10 value: 49.31978566373469 - type: mrr_at_100 value: 49.94922739445482 - type: mrr_at_1000 value: 49.990325601254106 - type: mrr_at_20 value: 49.70597468576704 - type: mrr_at_3 value: 47.070063694267546 - type: mrr_at_5 value: 48.23248407643316 - type: nauc_map_at_1000_diff1 value: 53.44044712371752 - type: nauc_map_at_1000_max value: 34.5651440062204 - type: nauc_map_at_1000_std value: -0.9814384609230475 - type: nauc_map_at_100_diff1 value: 53.429004435388464 - type: nauc_map_at_100_max value: 34.52038957273436 - type: nauc_map_at_100_std value: -1.1021936362699805 - type: nauc_map_at_10_diff1 value: 53.879128574022005 - type: nauc_map_at_10_max value: 33.74771524140917 - type: nauc_map_at_10_std value: -2.945132777205236 - type: nauc_map_at_1_diff1 value: 60.25159799695403 - type: nauc_map_at_1_max value: 26.843892985235808 - type: nauc_map_at_1_std value: -9.618702739509093 - type: nauc_map_at_20_diff1 value: 53.56789898225283 - type: nauc_map_at_20_max value: 34.11628845872402 - type: nauc_map_at_20_std value: -2.024376635870884 - type: nauc_map_at_3_diff1 value: 54.45882099014072 - type: nauc_map_at_3_max value: 31.29495446507793 - type: nauc_map_at_3_std value: -6.391948228781555 - type: nauc_map_at_5_diff1 value: 54.20536489050697 - type: nauc_map_at_5_max value: 32.31001487256826 - type: nauc_map_at_5_std value: -5.050953263346934 - type: nauc_mrr_at_1000_diff1 value: 50.835858995999125 - type: nauc_mrr_at_1000_max value: 38.20717381701079 - type: nauc_mrr_at_1000_std value: 4.174163368228787 - type: nauc_mrr_at_100_diff1 value: 50.827072441041224 - type: nauc_mrr_at_100_max value: 38.21077622034756 - type: nauc_mrr_at_100_std value: 4.1951082737013365 - type: nauc_mrr_at_10_diff1 value: 50.90578491570948 - type: nauc_mrr_at_10_max value: 38.19229691746408 - type: nauc_mrr_at_10_std value: 3.8290750066335546 - type: nauc_mrr_at_1_diff1 value: 54.807021746871186 - type: nauc_mrr_at_1_max value: 37.09225642043841 - type: nauc_mrr_at_1_std value: 0.5654547513131355 - type: nauc_mrr_at_20_diff1 value: 50.86247832095378 - type: nauc_mrr_at_20_max value: 38.19277867384178 - type: nauc_mrr_at_20_std value: 4.098932316791841 - type: nauc_mrr_at_3_diff1 value: 50.788934370903036 - type: nauc_mrr_at_3_max value: 37.72130561895659 - type: nauc_mrr_at_3_std value: 2.7339370381517583 - type: nauc_mrr_at_5_diff1 value: 50.72543792525547 - type: nauc_mrr_at_5_max value: 37.57740908475375 - type: nauc_mrr_at_5_std value: 2.742881431085094 - type: nauc_ndcg_at_1000_diff1 value: 50.89692885407576 - type: nauc_ndcg_at_1000_max value: 37.250583054716955 - type: nauc_ndcg_at_1000_std value: 5.552279826578831 - type: nauc_ndcg_at_100_diff1 value: 50.624606875496944 - type: nauc_ndcg_at_100_max value: 37.1024514234627 - type: nauc_ndcg_at_100_std value: 5.495892760032762 - type: nauc_ndcg_at_10_diff1 value: 51.910387255793445 - type: nauc_ndcg_at_10_max value: 36.71168418905039 - type: nauc_ndcg_at_10_std value: 2.3064115117905217 - type: nauc_ndcg_at_1_diff1 value: 54.807021746871186 - type: nauc_ndcg_at_1_max value: 37.09225642043841 - type: nauc_ndcg_at_1_std value: 0.5654547513131355 - type: nauc_ndcg_at_20_diff1 value: 51.43416588546778 - type: nauc_ndcg_at_20_max value: 36.76387180172346 - type: nauc_ndcg_at_20_std value: 3.7012798827049718 - type: nauc_ndcg_at_3_diff1 value: 50.91198494475423 - type: nauc_ndcg_at_3_max value: 34.92770670756687 - type: nauc_ndcg_at_3_std value: 
-0.9071486759887368 - type: nauc_ndcg_at_5_diff1 value: 51.63559468683886 - type: nauc_ndcg_at_5_max value: 34.86849679864564 - type: nauc_ndcg_at_5_std value: -0.734837221224976 - type: nauc_precision_at_1000_diff1 value: -13.43645457127175 - type: nauc_precision_at_1000_max value: 12.71162105198664 - type: nauc_precision_at_1000_std value: 33.175399007040255 - type: nauc_precision_at_100_diff1 value: -8.549834785105412 - type: nauc_precision_at_100_max value: 22.47383497331883 - type: nauc_precision_at_100_std value: 39.09108761430844 - type: nauc_precision_at_10_diff1 value: 7.556572451100043 - type: nauc_precision_at_10_max value: 35.35285122987575 - type: nauc_precision_at_10_std value: 29.417466305615967 - type: nauc_precision_at_1_diff1 value: 54.807021746871186 - type: nauc_precision_at_1_max value: 37.09225642043841 - type: nauc_precision_at_1_std value: 0.5654547513131355 - type: nauc_precision_at_20_diff1 value: -0.550158641635712 - type: nauc_precision_at_20_max value: 29.9068430006187 - type: nauc_precision_at_20_std value: 33.920603132821185 - type: nauc_precision_at_3_diff1 value: 25.551264664276687 - type: nauc_precision_at_3_max value: 37.59463225854679 - type: nauc_precision_at_3_std value: 13.707295021359043 - type: nauc_precision_at_5_diff1 value: 17.76136129817151 - type: nauc_precision_at_5_max value: 35.85363807255972 - type: nauc_precision_at_5_std value: 19.48470876841111 - type: nauc_recall_at_1000_diff1 value: 37.1593620123866 - type: nauc_recall_at_1000_max value: 46.29322536951135 - type: nauc_recall_at_1000_std value: 51.47312657083967 - type: nauc_recall_at_100_diff1 value: 37.7542224949536 - type: nauc_recall_at_100_max value: 38.84120637703135 - type: nauc_recall_at_100_std value: 28.839672572221925 - type: nauc_recall_at_10_diff1 value: 46.24130302658384 - type: nauc_recall_at_10_max value: 35.89001724712849 - type: nauc_recall_at_10_std value: 6.985137790828618 - type: nauc_recall_at_1_diff1 value: 60.25159799695403 - type: nauc_recall_at_1_max value: 26.843892985235808 - type: nauc_recall_at_1_std value: -9.618702739509093 - type: nauc_recall_at_20_diff1 value: 43.63576680886187 - type: nauc_recall_at_20_max value: 36.79079644708101 - type: nauc_recall_at_20_std value: 13.81561928605839 - type: nauc_recall_at_3_diff1 value: 48.2299322140522 - type: nauc_recall_at_3_max value: 30.038088484376203 - type: nauc_recall_at_3_std value: -4.871116183843762 - type: nauc_recall_at_5_diff1 value: 47.22331872695983 - type: nauc_recall_at_5_max value: 30.398541477173136 - type: nauc_recall_at_5_std value: -3.2038541888528957 - type: ndcg_at_1 value: 39.873 - type: ndcg_at_10 value: 49.55 - type: ndcg_at_100 value: 53.809 - type: ndcg_at_1000 value: 55.767999999999994 - type: ndcg_at_20 value: 51.275999999999996 - type: ndcg_at_3 value: 44.91 - type: ndcg_at_5 value: 46.855999999999995 - type: precision_at_1 value: 39.873 - type: precision_at_10 value: 9.65 - type: precision_at_100 value: 1.522 - type: precision_at_1000 value: 0.196 - type: precision_at_20 value: 5.701 - type: precision_at_3 value: 22.166 - type: precision_at_5 value: 15.643 - type: recall_at_1 value: 31.684 - type: recall_at_10 value: 60.69 - type: recall_at_100 value: 78.521 - type: recall_at_1000 value: 91.02900000000001 - type: recall_at_20 value: 66.973 - type: recall_at_3 value: 46.807 - type: recall_at_5 value: 52.402 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: mteb/cqadupstack-gaming 
metrics: - type: main_score value: 62.686 - type: map_at_1 value: 43.856 - type: map_at_10 value: 57.056 - type: map_at_100 value: 58.048 - type: map_at_1000 value: 58.092 - type: map_at_20 value: 57.684000000000005 - type: map_at_3 value: 53.958 - type: map_at_5 value: 55.80500000000001 - type: mrr_at_1 value: 50.03134796238244 - type: mrr_at_10 value: 60.31022043091019 - type: mrr_at_100 value: 60.91892338857461 - type: mrr_at_1000 value: 60.93770463536649 - type: mrr_at_20 value: 60.705642387392736 - type: mrr_at_3 value: 58.286311389759746 - type: mrr_at_5 value: 59.49320794148393 - type: nauc_map_at_1000_diff1 value: 54.849140197256695 - type: nauc_map_at_1000_max value: 38.978448968260224 - type: nauc_map_at_1000_std value: 0.4955439383268162 - type: nauc_map_at_100_diff1 value: 54.824334747823364 - type: nauc_map_at_100_max value: 38.959443109450994 - type: nauc_map_at_100_std value: 0.49626092018886037 - type: nauc_map_at_10_diff1 value: 54.778189277103394 - type: nauc_map_at_10_max value: 38.20972191654546 - type: nauc_map_at_10_std value: -0.7239823837455759 - type: nauc_map_at_1_diff1 value: 58.74017164752485 - type: nauc_map_at_1_max value: 31.528974862589585 - type: nauc_map_at_1_std value: -3.273824691929492 - type: nauc_map_at_20_diff1 value: 54.78943693416187 - type: nauc_map_at_20_max value: 38.77930316443076 - type: nauc_map_at_20_std value: 0.25607460088355544 - type: nauc_map_at_3_diff1 value: 55.68313410225767 - type: nauc_map_at_3_max value: 36.22847284104399 - type: nauc_map_at_3_std value: -3.010979639100503 - type: nauc_map_at_5_diff1 value: 55.11385094420661 - type: nauc_map_at_5_max value: 37.319681045490924 - type: nauc_map_at_5_std value: -2.156640733221061 - type: nauc_mrr_at_1000_diff1 value: 54.504759468380705 - type: nauc_mrr_at_1000_max value: 40.58849492650406 - type: nauc_mrr_at_1000_std value: 1.8226622175866118 - type: nauc_mrr_at_100_diff1 value: 54.4918034449886 - type: nauc_mrr_at_100_max value: 40.59202728933427 - type: nauc_mrr_at_100_std value: 1.8276428096536335 - type: nauc_mrr_at_10_diff1 value: 54.33603399493329 - type: nauc_mrr_at_10_max value: 40.58896878978089 - type: nauc_mrr_at_10_std value: 1.5733340909114375 - type: nauc_mrr_at_1_diff1 value: 58.062410036466105 - type: nauc_mrr_at_1_max value: 37.660958859966506 - type: nauc_mrr_at_1_std value: 0.029007600674170648 - type: nauc_mrr_at_20_diff1 value: 54.43793386924358 - type: nauc_mrr_at_20_max value: 40.66773423875307 - type: nauc_mrr_at_20_std value: 1.891967891797154 - type: nauc_mrr_at_3_diff1 value: 54.77901284537966 - type: nauc_mrr_at_3_max value: 40.182219821206964 - type: nauc_mrr_at_3_std value: 0.8911935034597871 - type: nauc_mrr_at_5_diff1 value: 54.466068837163675 - type: nauc_mrr_at_5_max value: 40.334996916684126 - type: nauc_mrr_at_5_std value: 0.9460830492892364 - type: nauc_ndcg_at_1000_diff1 value: 53.8465376860938 - type: nauc_ndcg_at_1000_max value: 41.63158111016696 - type: nauc_ndcg_at_1000_std value: 3.864205884257578 - type: nauc_ndcg_at_100_diff1 value: 53.4025864436944 - type: nauc_ndcg_at_100_max value: 41.805453995307914 - type: nauc_ndcg_at_100_std value: 4.36777557904857 - type: nauc_ndcg_at_10_diff1 value: 52.96034987157544 - type: nauc_ndcg_at_10_max value: 40.7601173480795 - type: nauc_ndcg_at_10_std value: 1.905824035879141 - type: nauc_ndcg_at_1_diff1 value: 58.062410036466105 - type: nauc_ndcg_at_1_max value: 37.660958859966506 - type: nauc_ndcg_at_1_std value: 0.029007600674170648 - type: nauc_ndcg_at_20_diff1 value: 53.2834771889242 - type: 
nauc_ndcg_at_20_max value: 41.713541932946406 - type: nauc_ndcg_at_20_std value: 3.865102828793311 - type: nauc_ndcg_at_3_diff1 value: 54.03389464372289 - type: nauc_ndcg_at_3_max value: 38.41449914649933 - type: nauc_ndcg_at_3_std value: -0.886276189886313 - type: nauc_ndcg_at_5_diff1 value: 53.456413320299 - type: nauc_ndcg_at_5_max value: 39.49048882649335 - type: nauc_ndcg_at_5_std value: -0.42692690160443814 - type: nauc_precision_at_1000_diff1 value: -14.770791653274824 - type: nauc_precision_at_1000_max value: 21.479874538905246 - type: nauc_precision_at_1000_std value: 28.607024261300207 - type: nauc_precision_at_100_diff1 value: -12.189696449878126 - type: nauc_precision_at_100_max value: 26.69785787492456 - type: nauc_precision_at_100_std value: 33.59098307467553 - type: nauc_precision_at_10_diff1 value: 6.922968330978399 - type: nauc_precision_at_10_max value: 34.52138344123087 - type: nauc_precision_at_10_std value: 21.768427637079952 - type: nauc_precision_at_1_diff1 value: 58.062410036466105 - type: nauc_precision_at_1_max value: 37.660958859966506 - type: nauc_precision_at_1_std value: 0.029007600674170648 - type: nauc_precision_at_20_diff1 value: -0.6837867902179278 - type: nauc_precision_at_20_max value: 33.98683709011133 - type: nauc_precision_at_20_std value: 30.8845561918902 - type: nauc_precision_at_3_diff1 value: 28.195043041120847 - type: nauc_precision_at_3_max value: 37.659916094938836 - type: nauc_precision_at_3_std value: 7.226520146634867 - type: nauc_precision_at_5_diff1 value: 16.633667288096245 - type: nauc_precision_at_5_max value: 34.90176597404891 - type: nauc_precision_at_5_std value: 12.421585442334088 - type: nauc_recall_at_1000_diff1 value: 45.20743732415397 - type: nauc_recall_at_1000_max value: 72.77115913579242 - type: nauc_recall_at_1000_std value: 70.48328496679083 - type: nauc_recall_at_100_diff1 value: 38.56282680810794 - type: nauc_recall_at_100_max value: 55.46797683321103 - type: nauc_recall_at_100_std value: 36.878791151929136 - type: nauc_recall_at_10_diff1 value: 44.18252051452362 - type: nauc_recall_at_10_max value: 43.33391810040086 - type: nauc_recall_at_10_std value: 6.663378192277723 - type: nauc_recall_at_1_diff1 value: 58.74017164752485 - type: nauc_recall_at_1_max value: 31.528974862589585 - type: nauc_recall_at_1_std value: -3.273824691929492 - type: nauc_recall_at_20_diff1 value: 44.19944231642417 - type: nauc_recall_at_20_max value: 49.401101483915866 - type: nauc_recall_at_20_std value: 18.97803841673839 - type: nauc_recall_at_3_diff1 value: 49.56378985428704 - type: nauc_recall_at_3_max value: 36.434210616870224 - type: nauc_recall_at_3_std value: -2.850559971607616 - type: nauc_recall_at_5_diff1 value: 47.37107217086109 - type: nauc_recall_at_5_max value: 39.0236745509895 - type: nauc_recall_at_5_std value: -1.7402454457937195 - type: ndcg_at_1 value: 50.031000000000006 - type: ndcg_at_10 value: 62.686 - type: ndcg_at_100 value: 66.403 - type: ndcg_at_1000 value: 67.241 - type: ndcg_at_20 value: 64.37899999999999 - type: ndcg_at_3 value: 57.859 - type: ndcg_at_5 value: 60.375 - type: precision_at_1 value: 50.031000000000006 - type: precision_at_10 value: 9.856 - type: precision_at_100 value: 1.266 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_20 value: 5.489 - type: precision_at_3 value: 25.746999999999996 - type: precision_at_5 value: 17.492 - type: recall_at_1 value: 43.856 - type: recall_at_10 value: 75.824 - type: recall_at_100 value: 91.622 - type: recall_at_1000 value: 97.538 - type: 
recall_at_20 value: 81.951 - type: recall_at_3 value: 63.016000000000005 - type: recall_at_5 value: 69.18299999999999 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: mteb/cqadupstack-gis metrics: - type: main_score value: 43.983 - type: map_at_1 value: 28.942 - type: map_at_10 value: 38.621 - type: map_at_100 value: 39.7 - type: map_at_1000 value: 39.766 - type: map_at_20 value: 39.262 - type: map_at_3 value: 35.719 - type: map_at_5 value: 37.378 - type: mrr_at_1 value: 31.29943502824859 - type: mrr_at_10 value: 40.76463994260603 - type: mrr_at_100 value: 41.67073617629083 - type: mrr_at_1000 value: 41.717446259457105 - type: mrr_at_20 value: 41.32577374689195 - type: mrr_at_3 value: 37.984934086628996 - type: mrr_at_5 value: 39.64595103578152 - type: nauc_map_at_1000_diff1 value: 43.64461679688985 - type: nauc_map_at_1000_max value: 31.53717883948204 - type: nauc_map_at_1000_std value: 1.193745788248017 - type: nauc_map_at_100_diff1 value: 43.63847825079489 - type: nauc_map_at_100_max value: 31.536602619279165 - type: nauc_map_at_100_std value: 1.2001240243342401 - type: nauc_map_at_10_diff1 value: 43.845991987142014 - type: nauc_map_at_10_max value: 31.27509937344113 - type: nauc_map_at_10_std value: 0.7327934840520994 - type: nauc_map_at_1_diff1 value: 50.62269273984579 - type: nauc_map_at_1_max value: 30.16325757909521 - type: nauc_map_at_1_std value: -0.6398875136233392 - type: nauc_map_at_20_diff1 value: 43.630758403790914 - type: nauc_map_at_20_max value: 31.408258098047703 - type: nauc_map_at_20_std value: 1.12616034652217 - type: nauc_map_at_3_diff1 value: 44.823493567359456 - type: nauc_map_at_3_max value: 31.075886347614496 - type: nauc_map_at_3_std value: -0.25126874515735426 - type: nauc_map_at_5_diff1 value: 43.79768853087658 - type: nauc_map_at_5_max value: 31.091080995725324 - type: nauc_map_at_5_std value: 0.16440771782544047 - type: nauc_mrr_at_1000_diff1 value: 42.7865400752329 - type: nauc_mrr_at_1000_max value: 32.84731670326893 - type: nauc_mrr_at_1000_std value: 2.6067637582013825 - type: nauc_mrr_at_100_diff1 value: 42.771741548331065 - type: nauc_mrr_at_100_max value: 32.85324232845987 - type: nauc_mrr_at_100_std value: 2.6092786694308376 - type: nauc_mrr_at_10_diff1 value: 42.82969738870672 - type: nauc_mrr_at_10_max value: 32.69407549631432 - type: nauc_mrr_at_10_std value: 2.302903910016054 - type: nauc_mrr_at_1_diff1 value: 49.05638333657571 - type: nauc_mrr_at_1_max value: 33.12030717171514 - type: nauc_mrr_at_1_std value: 1.3278035087690774 - type: nauc_mrr_at_20_diff1 value: 42.74267239536286 - type: nauc_mrr_at_20_max value: 32.78571108973092 - type: nauc_mrr_at_20_std value: 2.5932669908758643 - type: nauc_mrr_at_3_diff1 value: 43.69963426089187 - type: nauc_mrr_at_3_max value: 32.78193126956233 - type: nauc_mrr_at_3_std value: 1.634874463134699 - type: nauc_mrr_at_5_diff1 value: 42.838630647832524 - type: nauc_mrr_at_5_max value: 32.459318735260545 - type: nauc_mrr_at_5_std value: 1.9412518283209172 - type: nauc_ndcg_at_1000_diff1 value: 41.01253839851583 - type: nauc_ndcg_at_1000_max value: 32.69570568894237 - type: nauc_ndcg_at_1000_std value: 3.4254737113410343 - type: nauc_ndcg_at_100_diff1 value: 40.62589243745832 - type: nauc_ndcg_at_100_max value: 32.664990655736126 - type: nauc_ndcg_at_100_std value: 3.799569445326048 - type: nauc_ndcg_at_10_diff1 value: 41.31658753735306 - type: nauc_ndcg_at_10_max value: 31.511946320339295 - type: 
nauc_ndcg_at_10_std value: 2.0492930500796662 - type: nauc_ndcg_at_1_diff1 value: 49.05638333657571 - type: nauc_ndcg_at_1_max value: 33.12030717171514 - type: nauc_ndcg_at_1_std value: 1.3278035087690774 - type: nauc_ndcg_at_20_diff1 value: 40.66188223212841 - type: nauc_ndcg_at_20_max value: 31.926240431497476 - type: nauc_ndcg_at_20_std value: 3.370398664595343 - type: nauc_ndcg_at_3_diff1 value: 43.035580180241 - type: nauc_ndcg_at_3_max value: 31.363874129878404 - type: nauc_ndcg_at_3_std value: 0.1422507242819929 - type: nauc_ndcg_at_5_diff1 value: 41.29049003955878 - type: nauc_ndcg_at_5_max value: 31.112034994977737 - type: nauc_ndcg_at_5_std value: 0.860179279828966 - type: nauc_precision_at_1000_diff1 value: -12.41854465881981 - type: nauc_precision_at_1000_max value: 14.706779246590548 - type: nauc_precision_at_1000_std value: 9.812804367375206 - type: nauc_precision_at_100_diff1 value: 2.797520107808461 - type: nauc_precision_at_100_max value: 24.335873541811406 - type: nauc_precision_at_100_std value: 12.87186398750545 - type: nauc_precision_at_10_diff1 value: 24.530962799265847 - type: nauc_precision_at_10_max value: 31.00772010798733 - type: nauc_precision_at_10_std value: 6.696733001548185 - type: nauc_precision_at_1_diff1 value: 49.05638333657571 - type: nauc_precision_at_1_max value: 33.12030717171514 - type: nauc_precision_at_1_std value: 1.3278035087690774 - type: nauc_precision_at_20_diff1 value: 16.25028416351204 - type: nauc_precision_at_20_max value: 29.629326492027342 - type: nauc_precision_at_20_std value: 11.085888573121679 - type: nauc_precision_at_3_diff1 value: 33.923667689694256 - type: nauc_precision_at_3_max value: 33.5859782361996 - type: nauc_precision_at_3_std value: 1.9468331086918693 - type: nauc_precision_at_5_diff1 value: 27.917827233088875 - type: nauc_precision_at_5_max value: 33.13290043423535 - type: nauc_precision_at_5_std value: 3.800870695945311 - type: nauc_recall_at_1000_diff1 value: 9.680283388428789 - type: nauc_recall_at_1000_max value: 49.479399284871235 - type: nauc_recall_at_1000_std value: 31.506985071436088 - type: nauc_recall_at_100_diff1 value: 23.607673377885448 - type: nauc_recall_at_100_max value: 36.637750366403935 - type: nauc_recall_at_100_std value: 18.30770690564224 - type: nauc_recall_at_10_diff1 value: 33.199683418312446 - type: nauc_recall_at_10_max value: 29.63115497012312 - type: nauc_recall_at_10_std value: 4.813200391480566 - type: nauc_recall_at_1_diff1 value: 50.62269273984579 - type: nauc_recall_at_1_max value: 30.16325757909521 - type: nauc_recall_at_1_std value: -0.6398875136233392 - type: nauc_recall_at_20_diff1 value: 29.16488387844995 - type: nauc_recall_at_20_max value: 30.788019479459 - type: nauc_recall_at_20_std value: 11.031953917298853 - type: nauc_recall_at_3_diff1 value: 38.215351600417065 - type: nauc_recall_at_3_max value: 29.619887154236128 - type: nauc_recall_at_3_std value: -0.13237298980339363 - type: nauc_recall_at_5_diff1 value: 33.93788042633265 - type: nauc_recall_at_5_max value: 28.67185092656741 - type: nauc_recall_at_5_std value: 1.316700201091445 - type: ndcg_at_1 value: 31.299 - type: ndcg_at_10 value: 43.983 - type: ndcg_at_100 value: 48.992999999999995 - type: ndcg_at_1000 value: 50.757 - type: ndcg_at_20 value: 46.152 - type: ndcg_at_3 value: 38.367000000000004 - type: ndcg_at_5 value: 41.171 - type: precision_at_1 value: 31.299 - type: precision_at_10 value: 6.734 - type: precision_at_100 value: 0.972 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_20 
value: 3.898 - type: precision_at_3 value: 16.121 - type: precision_at_5 value: 11.344999999999999 - type: recall_at_1 value: 28.942 - type: recall_at_10 value: 58.343999999999994 - type: recall_at_100 value: 80.82300000000001 - type: recall_at_1000 value: 94.348 - type: recall_at_20 value: 66.449 - type: recall_at_3 value: 43.415 - type: recall_at_5 value: 50.007999999999996 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: mteb/cqadupstack-mathematica metrics: - type: main_score value: 33.144 - type: map_at_1 value: 19.41 - type: map_at_10 value: 27.802 - type: map_at_100 value: 29.157 - type: map_at_1000 value: 29.274 - type: map_at_20 value: 28.549000000000003 - type: map_at_3 value: 25.052999999999997 - type: map_at_5 value: 26.521 - type: mrr_at_1 value: 23.756218905472636 - type: mrr_at_10 value: 32.3623450209271 - type: mrr_at_100 value: 33.3648208444617 - type: mrr_at_1000 value: 33.427688215162185 - type: mrr_at_20 value: 32.93723485575758 - type: mrr_at_3 value: 29.539800995024883 - type: mrr_at_5 value: 31.156716417910452 - type: nauc_map_at_1000_diff1 value: 36.196391248081284 - type: nauc_map_at_1000_max value: 25.650644367091495 - type: nauc_map_at_1000_std value: 6.130340697729844 - type: nauc_map_at_100_diff1 value: 36.138890642411376 - type: nauc_map_at_100_max value: 25.587124763888518 - type: nauc_map_at_100_std value: 6.129336379055536 - type: nauc_map_at_10_diff1 value: 36.254426743566775 - type: nauc_map_at_10_max value: 25.465599906543034 - type: nauc_map_at_10_std value: 5.880280378112879 - type: nauc_map_at_1_diff1 value: 42.890551563179976 - type: nauc_map_at_1_max value: 25.813805281076956 - type: nauc_map_at_1_std value: 5.150718386163028 - type: nauc_map_at_20_diff1 value: 35.98551587974314 - type: nauc_map_at_20_max value: 25.501540521726636 - type: nauc_map_at_20_std value: 5.858703157458749 - type: nauc_map_at_3_diff1 value: 37.646558039577734 - type: nauc_map_at_3_max value: 26.138491471124247 - type: nauc_map_at_3_std value: 6.0487505175540734 - type: nauc_map_at_5_diff1 value: 36.817582976153695 - type: nauc_map_at_5_max value: 25.398200211121146 - type: nauc_map_at_5_std value: 6.31126763919522 - type: nauc_mrr_at_1000_diff1 value: 37.313544952847835 - type: nauc_mrr_at_1000_max value: 26.96218532078988 - type: nauc_mrr_at_1000_std value: 6.814359224654042 - type: nauc_mrr_at_100_diff1 value: 37.28104407653679 - type: nauc_mrr_at_100_max value: 26.931243040477256 - type: nauc_mrr_at_100_std value: 6.800500150841733 - type: nauc_mrr_at_10_diff1 value: 37.315832621275895 - type: nauc_mrr_at_10_max value: 26.941454225978372 - type: nauc_mrr_at_10_std value: 6.837046527796884 - type: nauc_mrr_at_1_diff1 value: 43.19904188582958 - type: nauc_mrr_at_1_max value: 26.975620445904795 - type: nauc_mrr_at_1_std value: 4.52071008581395 - type: nauc_mrr_at_20_diff1 value: 37.2200524790774 - type: nauc_mrr_at_20_max value: 26.971494160765847 - type: nauc_mrr_at_20_std value: 6.716431228783282 - type: nauc_mrr_at_3_diff1 value: 38.46236387340654 - type: nauc_mrr_at_3_max value: 27.846812992192056 - type: nauc_mrr_at_3_std value: 6.550711872569794 - type: nauc_mrr_at_5_diff1 value: 37.620346007658476 - type: nauc_mrr_at_5_max value: 27.031025952102038 - type: nauc_mrr_at_5_std value: 7.32343760231163 - type: nauc_ndcg_at_1000_diff1 value: 34.95081314840592 - type: nauc_ndcg_at_1000_max value: 26.89265465124325 - type: nauc_ndcg_at_1000_std value: 
7.854154466831975 - type: nauc_ndcg_at_100_diff1 value: 34.01417812563093 - type: nauc_ndcg_at_100_max value: 25.792737746436835 - type: nauc_ndcg_at_100_std value: 7.726584165493833 - type: nauc_ndcg_at_10_diff1 value: 33.895122516474466 - type: nauc_ndcg_at_10_max value: 25.388442204589612 - type: nauc_ndcg_at_10_std value: 6.359560223645991 - type: nauc_ndcg_at_1_diff1 value: 43.19904188582958 - type: nauc_ndcg_at_1_max value: 26.975620445904795 - type: nauc_ndcg_at_1_std value: 4.52071008581395 - type: nauc_ndcg_at_20_diff1 value: 33.36078689830245 - type: nauc_ndcg_at_20_max value: 25.531794610571563 - type: nauc_ndcg_at_20_std value: 6.136658608653248 - type: nauc_ndcg_at_3_diff1 value: 36.44505602530781 - type: nauc_ndcg_at_3_max value: 26.9104071983157 - type: nauc_ndcg_at_3_std value: 6.427178520371878 - type: nauc_ndcg_at_5_diff1 value: 35.01384323197442 - type: nauc_ndcg_at_5_max value: 25.5560447088692 - type: nauc_ndcg_at_5_std value: 7.3676236760360485 - type: nauc_precision_at_1000_diff1 value: 2.8903331041804514 - type: nauc_precision_at_1000_max value: 4.059662742366004 - type: nauc_precision_at_1000_std value: -1.5891687644008334 - type: nauc_precision_at_100_diff1 value: 8.437726471693766 - type: nauc_precision_at_100_max value: 11.250588557568427 - type: nauc_precision_at_100_std value: 4.231571164627862 - type: nauc_precision_at_10_diff1 value: 19.57085237210294 - type: nauc_precision_at_10_max value: 20.973093492003905 - type: nauc_precision_at_10_std value: 3.197416248152466 - type: nauc_precision_at_1_diff1 value: 43.19904188582958 - type: nauc_precision_at_1_max value: 26.975620445904795 - type: nauc_precision_at_1_std value: 4.52071008581395 - type: nauc_precision_at_20_diff1 value: 15.67136554192724 - type: nauc_precision_at_20_max value: 17.706882621057858 - type: nauc_precision_at_20_std value: 1.9363472182867714 - type: nauc_precision_at_3_diff1 value: 30.38035695042325 - type: nauc_precision_at_3_max value: 26.48218693244094 - type: nauc_precision_at_3_std value: 6.424657705785632 - type: nauc_precision_at_5_diff1 value: 25.272543315171458 - type: nauc_precision_at_5_max value: 22.32441421311652 - type: nauc_precision_at_5_std value: 7.4912569081905716 - type: nauc_recall_at_1000_diff1 value: 25.5748044137675 - type: nauc_recall_at_1000_max value: 43.85796585370269 - type: nauc_recall_at_1000_std value: 30.0338086596789 - type: nauc_recall_at_100_diff1 value: 22.577080638885093 - type: nauc_recall_at_100_max value: 23.224511700617477 - type: nauc_recall_at_100_std value: 15.187963852289313 - type: nauc_recall_at_10_diff1 value: 25.058592299355908 - type: nauc_recall_at_10_max value: 22.24448483279841 - type: nauc_recall_at_10_std value: 6.3179089740052765 - type: nauc_recall_at_1_diff1 value: 42.890551563179976 - type: nauc_recall_at_1_max value: 25.813805281076956 - type: nauc_recall_at_1_std value: 5.150718386163028 - type: nauc_recall_at_20_diff1 value: 22.433865123187307 - type: nauc_recall_at_20_max value: 22.739695641511762 - type: nauc_recall_at_20_std value: 5.362005125538497 - type: nauc_recall_at_3_diff1 value: 32.17919168998616 - type: nauc_recall_at_3_max value: 26.044028436867357 - type: nauc_recall_at_3_std value: 7.420349884006329 - type: nauc_recall_at_5_diff1 value: 28.967104573649138 - type: nauc_recall_at_5_max value: 23.40865848168201 - type: nauc_recall_at_5_std value: 9.174406147723621 - type: ndcg_at_1 value: 23.756 - type: ndcg_at_10 value: 33.144 - type: ndcg_at_100 value: 39.261 - type: ndcg_at_1000 value: 41.881 - type: ndcg_at_20 
value: 35.56 - type: ndcg_at_3 value: 27.927999999999997 - type: ndcg_at_5 value: 30.293999999999997 - type: precision_at_1 value: 23.756 - type: precision_at_10 value: 5.995 - type: precision_at_100 value: 1.053 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_20 value: 3.688 - type: precision_at_3 value: 13.059999999999999 - type: precision_at_5 value: 9.602 - type: recall_at_1 value: 19.41 - type: recall_at_10 value: 45.074 - type: recall_at_100 value: 71.131 - type: recall_at_1000 value: 89.604 - type: recall_at_20 value: 53.673 - type: recall_at_3 value: 31.055 - type: recall_at_5 value: 36.714999999999996 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: mteb/cqadupstack-physics metrics: - type: main_score value: 49.675000000000004 - type: map_at_1 value: 33.178999999999995 - type: map_at_10 value: 43.807 - type: map_at_100 value: 45.17 - type: map_at_1000 value: 45.271 - type: map_at_20 value: 44.516 - type: map_at_3 value: 40.813 - type: map_at_5 value: 42.457 - type: mrr_at_1 value: 40.32723772858518 - type: mrr_at_10 value: 49.646867409138814 - type: mrr_at_100 value: 50.493686101426285 - type: mrr_at_1000 value: 50.525386961808834 - type: mrr_at_20 value: 50.120274354884586 - type: mrr_at_3 value: 47.49759384023096 - type: mrr_at_5 value: 48.72473532242535 - type: nauc_map_at_1000_diff1 value: 49.5947127786396 - type: nauc_map_at_1000_max value: 33.39720045844929 - type: nauc_map_at_1000_std value: -3.131428593252271 - type: nauc_map_at_100_diff1 value: 49.57797867324617 - type: nauc_map_at_100_max value: 33.356927974709464 - type: nauc_map_at_100_std value: -3.1661365376766337 - type: nauc_map_at_10_diff1 value: 49.59294630598952 - type: nauc_map_at_10_max value: 32.86647346990462 - type: nauc_map_at_10_std value: -4.1582043443386745 - type: nauc_map_at_1_diff1 value: 53.98646767288695 - type: nauc_map_at_1_max value: 29.45629077638936 - type: nauc_map_at_1_std value: -5.621187380771589 - type: nauc_map_at_20_diff1 value: 49.486982890447074 - type: nauc_map_at_20_max value: 33.11681933406332 - type: nauc_map_at_20_std value: -3.5826433195146854 - type: nauc_map_at_3_diff1 value: 50.81807107491861 - type: nauc_map_at_3_max value: 32.32552291988859 - type: nauc_map_at_3_std value: -3.952946504088928 - type: nauc_map_at_5_diff1 value: 49.70201354274439 - type: nauc_map_at_5_max value: 32.831846031004886 - type: nauc_map_at_5_std value: -3.8330488624207737 - type: nauc_mrr_at_1000_diff1 value: 49.04159472507738 - type: nauc_mrr_at_1000_max value: 35.617600171138676 - type: nauc_mrr_at_1000_std value: -1.5975830757486646 - type: nauc_mrr_at_100_diff1 value: 49.03848471692094 - type: nauc_mrr_at_100_max value: 35.61936748662614 - type: nauc_mrr_at_100_std value: -1.5922053398594729 - type: nauc_mrr_at_10_diff1 value: 48.92463964652612 - type: nauc_mrr_at_10_max value: 35.37757708992045 - type: nauc_mrr_at_10_std value: -2.2052028139567303 - type: nauc_mrr_at_1_diff1 value: 52.23915787290734 - type: nauc_mrr_at_1_max value: 34.393531787632334 - type: nauc_mrr_at_1_std value: -1.452007661016969 - type: nauc_mrr_at_20_diff1 value: 48.91168438018404 - type: nauc_mrr_at_20_max value: 35.478962544421876 - type: nauc_mrr_at_20_std value: -1.8246048423555414 - type: nauc_mrr_at_3_diff1 value: 50.115432665442164 - type: nauc_mrr_at_3_max value: 35.89093796085569 - type: nauc_mrr_at_3_std value: -1.4895016313153366 - type: nauc_mrr_at_5_diff1 value: 
49.04321261351915 - type: nauc_mrr_at_5_max value: 35.85730520949451 - type: nauc_mrr_at_5_std value: -1.68790556880753 - type: nauc_ndcg_at_1000_diff1 value: 48.294697499154374 - type: nauc_ndcg_at_1000_max value: 35.167410242367595 - type: nauc_ndcg_at_1000_std value: -0.6346078535914157 - type: nauc_ndcg_at_100_diff1 value: 48.025525283449014 - type: nauc_ndcg_at_100_max value: 34.79288511776105 - type: nauc_ndcg_at_100_std value: -0.7823403044086993 - type: nauc_ndcg_at_10_diff1 value: 47.70793258015258 - type: nauc_ndcg_at_10_max value: 33.09558927880104 - type: nauc_ndcg_at_10_std value: -4.7793864166260605 - type: nauc_ndcg_at_1_diff1 value: 52.23915787290734 - type: nauc_ndcg_at_1_max value: 34.393531787632334 - type: nauc_ndcg_at_1_std value: -1.452007661016969 - type: nauc_ndcg_at_20_diff1 value: 47.354286045074815 - type: nauc_ndcg_at_20_max value: 33.686648806027975 - type: nauc_ndcg_at_20_std value: -3.0189085132476556 - type: nauc_ndcg_at_3_diff1 value: 49.68805334316908 - type: nauc_ndcg_at_3_max value: 34.196077748056496 - type: nauc_ndcg_at_3_std value: -2.7167289163768436 - type: nauc_ndcg_at_5_diff1 value: 47.94474868912989 - type: nauc_ndcg_at_5_max value: 34.00261603413051 - type: nauc_ndcg_at_5_std value: -3.3541028103046115 - type: nauc_precision_at_1000_diff1 value: -12.0150100710755 - type: nauc_precision_at_1000_max value: 5.332942816568796 - type: nauc_precision_at_1000_std value: 14.543288479130458 - type: nauc_precision_at_100_diff1 value: -4.920332181588838 - type: nauc_precision_at_100_max value: 14.42313332017491 - type: nauc_precision_at_100_std value: 17.821953321018384 - type: nauc_precision_at_10_diff1 value: 14.70509089079217 - type: nauc_precision_at_10_max value: 25.381887131649716 - type: nauc_precision_at_10_std value: 5.226419288645675 - type: nauc_precision_at_1_diff1 value: 52.23915787290734 - type: nauc_precision_at_1_max value: 34.393531787632334 - type: nauc_precision_at_1_std value: -1.452007661016969 - type: nauc_precision_at_20_diff1 value: 6.312827641507564 - type: nauc_precision_at_20_max value: 22.483038562271933 - type: nauc_precision_at_20_std value: 11.368419856892416 - type: nauc_precision_at_3_diff1 value: 33.271443420273606 - type: nauc_precision_at_3_max value: 33.571078182106675 - type: nauc_precision_at_3_std value: 4.47382265155717 - type: nauc_precision_at_5_diff1 value: 23.43287104284656 - type: nauc_precision_at_5_max value: 30.909085068105313 - type: nauc_precision_at_5_std value: 5.545672049452433 - type: nauc_recall_at_1000_diff1 value: 35.22615594677707 - type: nauc_recall_at_1000_max value: 52.0710533173532 - type: nauc_recall_at_1000_std value: 45.17683523786464 - type: nauc_recall_at_100_diff1 value: 36.2169056956332 - type: nauc_recall_at_100_max value: 35.02435003210817 - type: nauc_recall_at_100_std value: 15.833632946282508 - type: nauc_recall_at_10_diff1 value: 39.12440292974848 - type: nauc_recall_at_10_max value: 28.0546011979648 - type: nauc_recall_at_10_std value: -9.620558638092172 - type: nauc_recall_at_1_diff1 value: 53.98646767288695 - type: nauc_recall_at_1_max value: 29.45629077638936 - type: nauc_recall_at_1_std value: -5.621187380771589 - type: nauc_recall_at_20_diff1 value: 36.39254630768161 - type: nauc_recall_at_20_max value: 29.277856508751967 - type: nauc_recall_at_20_std value: -3.048007490798412 - type: nauc_recall_at_3_diff1 value: 45.64706642644958 - type: nauc_recall_at_3_max value: 31.003050159737413 - type: nauc_recall_at_3_std value: -4.849763876930667 - type: nauc_recall_at_5_diff1 
value: 40.918108859971746 - type: nauc_recall_at_5_max value: 30.69907335071493 - type: nauc_recall_at_5_std value: -6.1445436251916865 - type: ndcg_at_1 value: 40.327 - type: ndcg_at_10 value: 49.675000000000004 - type: ndcg_at_100 value: 55.364000000000004 - type: ndcg_at_1000 value: 56.992 - type: ndcg_at_20 value: 51.803999999999995 - type: ndcg_at_3 value: 45.227000000000004 - type: ndcg_at_5 value: 47.244 - type: precision_at_1 value: 40.327 - type: precision_at_10 value: 8.826 - type: precision_at_100 value: 1.354 - type: precision_at_1000 value: 0.167 - type: precision_at_20 value: 5.115 - type: precision_at_3 value: 21.303 - type: precision_at_5 value: 14.726 - type: recall_at_1 value: 33.178999999999995 - type: recall_at_10 value: 61.087 - type: recall_at_100 value: 85.099 - type: recall_at_1000 value: 95.14099999999999 - type: recall_at_20 value: 68.623 - type: recall_at_3 value: 48.245 - type: recall_at_5 value: 53.832 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: mteb/cqadupstack-programmers metrics: - type: main_score value: 44.99 - type: map_at_1 value: 28.089 - type: map_at_10 value: 38.98 - type: map_at_100 value: 40.339000000000006 - type: map_at_1000 value: 40.441 - type: map_at_20 value: 39.702 - type: map_at_3 value: 35.620000000000005 - type: map_at_5 value: 37.657000000000004 - type: mrr_at_1 value: 35.15981735159817 - type: mrr_at_10 value: 44.54075161266937 - type: mrr_at_100 value: 45.435730392436646 - type: mrr_at_1000 value: 45.47673849356812 - type: mrr_at_20 value: 45.05949613726918 - type: mrr_at_3 value: 42.00913242009131 - type: mrr_at_5 value: 43.52739726027392 - type: nauc_map_at_1000_diff1 value: 42.6375513442399 - type: nauc_map_at_1000_max value: 35.83899956589522 - type: nauc_map_at_1000_std value: 5.798620017712549 - type: nauc_map_at_100_diff1 value: 42.609712253881504 - type: nauc_map_at_100_max value: 35.85401871065736 - type: nauc_map_at_100_std value: 5.829007296755533 - type: nauc_map_at_10_diff1 value: 42.90931172127824 - type: nauc_map_at_10_max value: 35.46694204511423 - type: nauc_map_at_10_std value: 5.131477704152026 - type: nauc_map_at_1_diff1 value: 48.066312177855956 - type: nauc_map_at_1_max value: 30.67745267941573 - type: nauc_map_at_1_std value: -1.4170737991670943 - type: nauc_map_at_20_diff1 value: 42.730423700784 - type: nauc_map_at_20_max value: 35.710039616497085 - type: nauc_map_at_20_std value: 5.363961887475162 - type: nauc_map_at_3_diff1 value: 43.499223646579935 - type: nauc_map_at_3_max value: 33.872570039621564 - type: nauc_map_at_3_std value: 3.0787571843453008 - type: nauc_map_at_5_diff1 value: 43.28963642946521 - type: nauc_map_at_5_max value: 35.18327408279892 - type: nauc_map_at_5_std value: 4.516467154662473 - type: nauc_mrr_at_1000_diff1 value: 42.71279871641341 - type: nauc_mrr_at_1000_max value: 37.48825064817496 - type: nauc_mrr_at_1000_std value: 8.10015025024314 - type: nauc_mrr_at_100_diff1 value: 42.694777404773376 - type: nauc_mrr_at_100_max value: 37.476741768741086 - type: nauc_mrr_at_100_std value: 8.11525130417229 - type: nauc_mrr_at_10_diff1 value: 42.954194054560176 - type: nauc_mrr_at_10_max value: 37.606138578797506 - type: nauc_mrr_at_10_std value: 8.092519513302399 - type: nauc_mrr_at_1_diff1 value: 48.350790286038574 - type: nauc_mrr_at_1_max value: 33.97992759739641 - type: nauc_mrr_at_1_std value: 1.8332987018664093 - type: nauc_mrr_at_20_diff1 value: 42.664983701783044 - type: 
nauc_mrr_at_20_max value: 37.47450702110784 - type: nauc_mrr_at_20_std value: 8.001067634745462 - type: nauc_mrr_at_3_diff1 value: 42.921968602737955 - type: nauc_mrr_at_3_max value: 37.19599728791262 - type: nauc_mrr_at_3_std value: 7.4692697422507575 - type: nauc_mrr_at_5_diff1 value: 42.96028546491891 - type: nauc_mrr_at_5_max value: 37.688350071295915 - type: nauc_mrr_at_5_std value: 8.213017954012372 - type: nauc_ndcg_at_1000_diff1 value: 40.70763263942397 - type: nauc_ndcg_at_1000_max value: 37.87768319167602 - type: nauc_ndcg_at_1000_std value: 9.908807071686738 - type: nauc_ndcg_at_100_diff1 value: 39.97828438221707 - type: nauc_ndcg_at_100_max value: 37.7723393835996 - type: nauc_ndcg_at_100_std value: 10.666779466040097 - type: nauc_ndcg_at_10_diff1 value: 41.172233451172936 - type: nauc_ndcg_at_10_max value: 37.12252131573939 - type: nauc_ndcg_at_10_std value: 8.273798754436639 - type: nauc_ndcg_at_1_diff1 value: 48.350790286038574 - type: nauc_ndcg_at_1_max value: 33.97992759739641 - type: nauc_ndcg_at_1_std value: 1.8332987018664093 - type: nauc_ndcg_at_20_diff1 value: 40.33325895172716 - type: nauc_ndcg_at_20_max value: 37.36015594019951 - type: nauc_ndcg_at_20_std value: 8.818556108749302 - type: nauc_ndcg_at_3_diff1 value: 41.652701699747254 - type: nauc_ndcg_at_3_max value: 35.499109874223294 - type: nauc_ndcg_at_3_std value: 5.831784865606119 - type: nauc_ndcg_at_5_diff1 value: 41.856346892595475 - type: nauc_ndcg_at_5_max value: 36.940681835687194 - type: nauc_ndcg_at_5_std value: 7.507798515093516 - type: nauc_precision_at_1000_diff1 value: -2.4605367806784866 - type: nauc_precision_at_1000_max value: -0.3538142127162922 - type: nauc_precision_at_1000_std value: 8.369794961833236 - type: nauc_precision_at_100_diff1 value: -0.34954522096524704 - type: nauc_precision_at_100_max value: 13.159909603146458 - type: nauc_precision_at_100_std value: 19.425561514133996 - type: nauc_precision_at_10_diff1 value: 17.048304710148145 - type: nauc_precision_at_10_max value: 29.816041846806375 - type: nauc_precision_at_10_std value: 18.358893367243798 - type: nauc_precision_at_1_diff1 value: 48.350790286038574 - type: nauc_precision_at_1_max value: 33.97992759739641 - type: nauc_precision_at_1_std value: 1.8332987018664093 - type: nauc_precision_at_20_diff1 value: 10.450903599411344 - type: nauc_precision_at_20_max value: 25.228916373799127 - type: nauc_precision_at_20_std value: 18.46893569529936 - type: nauc_precision_at_3_diff1 value: 29.181236567048636 - type: nauc_precision_at_3_max value: 35.64918262500281 - type: nauc_precision_at_3_std value: 13.347538222514968 - type: nauc_precision_at_5_diff1 value: 23.693323840550345 - type: nauc_precision_at_5_max value: 33.972399735191225 - type: nauc_precision_at_5_std value: 17.107012760554618 - type: nauc_recall_at_1000_diff1 value: 20.297340483227945 - type: nauc_recall_at_1000_max value: 63.084305970127275 - type: nauc_recall_at_1000_std value: 63.04655000858784 - type: nauc_recall_at_100_diff1 value: 22.587332148979723 - type: nauc_recall_at_100_max value: 40.740968468024775 - type: nauc_recall_at_100_std value: 34.120423684507124 - type: nauc_recall_at_10_diff1 value: 33.361195948673675 - type: nauc_recall_at_10_max value: 37.1411402410262 - type: nauc_recall_at_10_std value: 13.475407196166259 - type: nauc_recall_at_1_diff1 value: 48.066312177855956 - type: nauc_recall_at_1_max value: 30.67745267941573 - type: nauc_recall_at_1_std value: -1.4170737991670943 - type: nauc_recall_at_20_diff1 value: 28.703982984383984 - type: 
nauc_recall_at_20_max value: 37.32929431193496 - type: nauc_recall_at_20_std value: 16.139135347989903 - type: nauc_recall_at_3_diff1 value: 36.53346179134789 - type: nauc_recall_at_3_max value: 34.11397914899309 - type: nauc_recall_at_3_std value: 7.19358019807132 - type: nauc_recall_at_5_diff1 value: 36.24058894947452 - type: nauc_recall_at_5_max value: 37.00990358651097 - type: nauc_recall_at_5_std value: 11.074645476821619 - type: ndcg_at_1 value: 35.160000000000004 - type: ndcg_at_10 value: 44.99 - type: ndcg_at_100 value: 50.661 - type: ndcg_at_1000 value: 52.599 - type: ndcg_at_20 value: 47.154 - type: ndcg_at_3 value: 39.843 - type: ndcg_at_5 value: 42.486000000000004 - type: precision_at_1 value: 35.160000000000004 - type: precision_at_10 value: 8.299 - type: precision_at_100 value: 1.2850000000000001 - type: precision_at_1000 value: 0.16199999999999998 - type: precision_at_20 value: 4.84 - type: precision_at_3 value: 19.178 - type: precision_at_5 value: 13.927 - type: recall_at_1 value: 28.089 - type: recall_at_10 value: 57.158 - type: recall_at_100 value: 81.461 - type: recall_at_1000 value: 94.46900000000001 - type: recall_at_20 value: 64.927 - type: recall_at_3 value: 42.775999999999996 - type: recall_at_5 value: 49.719 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval revision: CQADupstackRetrieval is a combined dataset split: test type: mteb/cqadupstack metrics: - type: main_score value: 44.989166666666655 - type: ndcg_at_10 value: 44.989166666666655 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: mteb/cqadupstack-stats metrics: - type: main_score value: 39.586 - type: map_at_1 value: 27.301 - type: map_at_10 value: 35.022 - type: map_at_100 value: 36.061 - type: map_at_1000 value: 36.146 - type: map_at_20 value: 35.608000000000004 - type: map_at_3 value: 32.978 - type: map_at_5 value: 33.994 - type: mrr_at_1 value: 30.67484662576687 - type: mrr_at_10 value: 38.1696124257474 - type: mrr_at_100 value: 38.99730898994137 - type: mrr_at_1000 value: 39.049871007408136 - type: mrr_at_20 value: 38.62424051396064 - type: mrr_at_3 value: 36.40081799591004 - type: mrr_at_5 value: 37.23670756646219 - type: nauc_map_at_1000_diff1 value: 50.4395097150819 - type: nauc_map_at_1000_max value: 42.36231476768413 - type: nauc_map_at_1000_std value: 1.0739414045485742 - type: nauc_map_at_100_diff1 value: 50.4253775421283 - type: nauc_map_at_100_max value: 42.34508969348633 - type: nauc_map_at_100_std value: 1.0590256535050135 - type: nauc_map_at_10_diff1 value: 50.74196619464362 - type: nauc_map_at_10_max value: 42.354326434590284 - type: nauc_map_at_10_std value: 0.6330167542705694 - type: nauc_map_at_1_diff1 value: 55.7404810490963 - type: nauc_map_at_1_max value: 40.7676941648045 - type: nauc_map_at_1_std value: -5.021772566610674 - type: nauc_map_at_20_diff1 value: 50.39792463598886 - type: nauc_map_at_20_max value: 42.25768760228577 - type: nauc_map_at_20_std value: 0.8979017700131807 - type: nauc_map_at_3_diff1 value: 51.53267996170815 - type: nauc_map_at_3_max value: 41.78801756883417 - type: nauc_map_at_3_std value: -0.6652383024396911 - type: nauc_map_at_5_diff1 value: 50.992783683271504 - type: nauc_map_at_5_max value: 41.8607977828188 - type: nauc_map_at_5_std value: 0.3484379897869807 - type: nauc_mrr_at_1000_diff1 value: 48.952907124445126 - type: nauc_mrr_at_1000_max value: 42.93563741482114 - type: nauc_mrr_at_1000_std value: 3.0791495753556424 
- type: nauc_mrr_at_100_diff1 value: 48.941921107360805 - type: nauc_mrr_at_100_max value: 42.94419657374061 - type: nauc_mrr_at_100_std value: 3.075397087180154 - type: nauc_mrr_at_10_diff1 value: 49.098926306303056 - type: nauc_mrr_at_10_max value: 42.941857820499806 - type: nauc_mrr_at_10_std value: 2.8184474174054372 - type: nauc_mrr_at_1_diff1 value: 54.428109877009334 - type: nauc_mrr_at_1_max value: 42.50273386972492 - type: nauc_mrr_at_1_std value: -2.1811826216412187 - type: nauc_mrr_at_20_diff1 value: 48.82502192775839 - type: nauc_mrr_at_20_max value: 42.92227277257095 - type: nauc_mrr_at_20_std value: 2.975812634368533 - type: nauc_mrr_at_3_diff1 value: 49.440009227591176 - type: nauc_mrr_at_3_max value: 42.95503176290712 - type: nauc_mrr_at_3_std value: 2.2997128945013796 - type: nauc_mrr_at_5_diff1 value: 49.09846782701398 - type: nauc_mrr_at_5_max value: 42.51449168285772 - type: nauc_mrr_at_5_std value: 2.7785816484421297 - type: nauc_ndcg_at_1000_diff1 value: 48.14680758187888 - type: nauc_ndcg_at_1000_max value: 43.57465718500695 - type: nauc_ndcg_at_1000_std value: 5.287435676678261 - type: nauc_ndcg_at_100_diff1 value: 47.66081605743284 - type: nauc_ndcg_at_100_max value: 43.28156751251163 - type: nauc_ndcg_at_100_std value: 4.959626409663624 - type: nauc_ndcg_at_10_diff1 value: 48.25075619623878 - type: nauc_ndcg_at_10_max value: 43.00688660666578 - type: nauc_ndcg_at_10_std value: 3.2319193368891637 - type: nauc_ndcg_at_1_diff1 value: 54.428109877009334 - type: nauc_ndcg_at_1_max value: 42.50273386972492 - type: nauc_ndcg_at_1_std value: -2.1811826216412187 - type: nauc_ndcg_at_20_diff1 value: 47.1943098627403 - type: nauc_ndcg_at_20_max value: 42.86954491768707 - type: nauc_ndcg_at_20_std value: 4.08583080150737 - type: nauc_ndcg_at_3_diff1 value: 49.32681523192246 - type: nauc_ndcg_at_3_max value: 42.46898641470274 - type: nauc_ndcg_at_3_std value: 1.7416962407725236 - type: nauc_ndcg_at_5_diff1 value: 48.59647012439291 - type: nauc_ndcg_at_5_max value: 42.07098889846439 - type: nauc_ndcg_at_5_std value: 2.979621233356828 - type: nauc_precision_at_1000_diff1 value: -1.7366334161587105 - type: nauc_precision_at_1000_max value: 17.70969166396819 - type: nauc_precision_at_1000_std value: 17.50619975322144 - type: nauc_precision_at_100_diff1 value: 10.082579982582155 - type: nauc_precision_at_100_max value: 28.024893516091776 - type: nauc_precision_at_100_std value: 18.41413013357596 - type: nauc_precision_at_10_diff1 value: 28.796167732373657 - type: nauc_precision_at_10_max value: 40.37340024485382 - type: nauc_precision_at_10_std value: 13.718572711091733 - type: nauc_precision_at_1_diff1 value: 54.428109877009334 - type: nauc_precision_at_1_max value: 42.50273386972492 - type: nauc_precision_at_1_std value: -2.1811826216412187 - type: nauc_precision_at_20_diff1 value: 19.82691920771315 - type: nauc_precision_at_20_max value: 34.45075390159975 - type: nauc_precision_at_20_std value: 16.410812072348058 - type: nauc_precision_at_3_diff1 value: 40.85430254962678 - type: nauc_precision_at_3_max value: 43.63016056067074 - type: nauc_precision_at_3_std value: 9.322014634477581 - type: nauc_precision_at_5_diff1 value: 35.830272848975795 - type: nauc_precision_at_5_max value: 41.30047691620363 - type: nauc_precision_at_5_std value: 13.145693992266565 - type: nauc_recall_at_1000_diff1 value: 35.532000545890504 - type: nauc_recall_at_1000_max value: 50.714223194510325 - type: nauc_recall_at_1000_std value: 43.09037309139045 - type: nauc_recall_at_100_diff1 value: 
35.11024488875192 - type: nauc_recall_at_100_max value: 43.0874566265193 - type: nauc_recall_at_100_std value: 19.70628521846854 - type: nauc_recall_at_10_diff1 value: 40.36203726741153 - type: nauc_recall_at_10_max value: 42.581482582576726 - type: nauc_recall_at_10_std value: 8.642553371022348 - type: nauc_recall_at_1_diff1 value: 55.7404810490963 - type: nauc_recall_at_1_max value: 40.7676941648045 - type: nauc_recall_at_1_std value: -5.021772566610674 - type: nauc_recall_at_20_diff1 value: 35.97348868186562 - type: nauc_recall_at_20_max value: 41.82695933305065 - type: nauc_recall_at_20_std value: 11.444957541593585 - type: nauc_recall_at_3_diff1 value: 44.20020470014979 - type: nauc_recall_at_3_max value: 40.84130855296979 - type: nauc_recall_at_3_std value: 5.004883338558809 - type: nauc_recall_at_5_diff1 value: 42.08756885472078 - type: nauc_recall_at_5_max value: 39.90323783606852 - type: nauc_recall_at_5_std value: 8.085182534171127 - type: ndcg_at_1 value: 30.675 - type: ndcg_at_10 value: 39.586 - type: ndcg_at_100 value: 44.737 - type: ndcg_at_1000 value: 46.863 - type: ndcg_at_20 value: 41.495 - type: ndcg_at_3 value: 35.8 - type: ndcg_at_5 value: 37.3 - type: precision_at_1 value: 30.675 - type: precision_at_10 value: 6.196 - type: precision_at_100 value: 0.9570000000000001 - type: precision_at_1000 value: 0.122 - type: precision_at_20 value: 3.6350000000000002 - type: precision_at_3 value: 15.337 - type: precision_at_5 value: 10.337 - type: recall_at_1 value: 27.301 - type: recall_at_10 value: 50.346999999999994 - type: recall_at_100 value: 74.459 - type: recall_at_1000 value: 90.018 - type: recall_at_20 value: 57.473 - type: recall_at_3 value: 39.672000000000004 - type: recall_at_5 value: 43.383 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackTexRetrieval revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: mteb/cqadupstack-tex metrics: - type: main_score value: 32.842 - type: map_at_1 value: 19.527 - type: map_at_10 value: 27.711999999999996 - type: map_at_100 value: 28.98 - type: map_at_1000 value: 29.108 - type: map_at_20 value: 28.407 - type: map_at_3 value: 25.023 - type: map_at_5 value: 26.528000000000002 - type: mrr_at_1 value: 23.675154852030282 - type: mrr_at_10 value: 31.810676323752784 - type: mrr_at_100 value: 32.788970614380716 - type: mrr_at_1000 value: 32.86028758975889 - type: mrr_at_20 value: 32.35935756676056 - type: mrr_at_3 value: 29.41615049323246 - type: mrr_at_5 value: 30.785730672172633 - type: nauc_map_at_1000_diff1 value: 35.597766688968015 - type: nauc_map_at_1000_max value: 26.295790183159845 - type: nauc_map_at_1000_std value: -0.04229904865958209 - type: nauc_map_at_100_diff1 value: 35.568782622469925 - type: nauc_map_at_100_max value: 26.27850795471227 - type: nauc_map_at_100_std value: -0.04944875782811099 - type: nauc_map_at_10_diff1 value: 35.63760937893694 - type: nauc_map_at_10_max value: 26.130094042028233 - type: nauc_map_at_10_std value: -0.6896882769027717 - type: nauc_map_at_1_diff1 value: 41.759098341890976 - type: nauc_map_at_1_max value: 23.918885427783326 - type: nauc_map_at_1_std value: -2.1383574897865074 - type: nauc_map_at_20_diff1 value: 35.55706530442612 - type: nauc_map_at_20_max value: 26.23339626569677 - type: nauc_map_at_20_std value: -0.162172033918129 - type: nauc_map_at_3_diff1 value: 37.22183376355153 - type: nauc_map_at_3_max value: 25.770512522122186 - type: nauc_map_at_3_std value: -1.3105892187778403 - type: nauc_map_at_5_diff1 value: 36.205913161663084 - type: 
nauc_map_at_5_max value: 25.953300641502064 - type: nauc_map_at_5_std value: -0.7987363137547906 - type: nauc_mrr_at_1000_diff1 value: 34.864016559617646 - type: nauc_mrr_at_1000_max value: 26.8689525348564 - type: nauc_mrr_at_1000_std value: -0.5839923973914446 - type: nauc_mrr_at_100_diff1 value: 34.83820469598538 - type: nauc_mrr_at_100_max value: 26.864669056231282 - type: nauc_mrr_at_100_std value: -0.5785645654158633 - type: nauc_mrr_at_10_diff1 value: 34.81868397381981 - type: nauc_mrr_at_10_max value: 26.79988560460627 - type: nauc_mrr_at_10_std value: -1.1113808365827318 - type: nauc_mrr_at_1_diff1 value: 40.0281507903504 - type: nauc_mrr_at_1_max value: 25.036735941806583 - type: nauc_mrr_at_1_std value: -2.508700799268523 - type: nauc_mrr_at_20_diff1 value: 34.81954537357966 - type: nauc_mrr_at_20_max value: 26.877673033315453 - type: nauc_mrr_at_20_std value: -0.6706028107452919 - type: nauc_mrr_at_3_diff1 value: 35.87313782549696 - type: nauc_mrr_at_3_max value: 26.776261693392335 - type: nauc_mrr_at_3_std value: -1.8010591328112908 - type: nauc_mrr_at_5_diff1 value: 35.31673912159536 - type: nauc_mrr_at_5_max value: 26.78720786106881 - type: nauc_mrr_at_5_std value: -1.3096326953900546 - type: nauc_ndcg_at_1000_diff1 value: 33.43105244339048 - type: nauc_ndcg_at_1000_max value: 27.52195065724684 - type: nauc_ndcg_at_1000_std value: 2.8376056562675744 - type: nauc_ndcg_at_100_diff1 value: 32.90916846420573 - type: nauc_ndcg_at_100_max value: 27.27161017736065 - type: nauc_ndcg_at_100_std value: 2.8703122625872126 - type: nauc_ndcg_at_10_diff1 value: 33.12714979317447 - type: nauc_ndcg_at_10_max value: 26.67762031747992 - type: nauc_ndcg_at_10_std value: -0.1341345572932233 - type: nauc_ndcg_at_1_diff1 value: 40.0281507903504 - type: nauc_ndcg_at_1_max value: 25.036735941806583 - type: nauc_ndcg_at_1_std value: -2.508700799268523 - type: nauc_ndcg_at_20_diff1 value: 32.891656138688546 - type: nauc_ndcg_at_20_max value: 26.991976404027163 - type: nauc_ndcg_at_20_std value: 1.6050741106677746 - type: nauc_ndcg_at_3_diff1 value: 35.576958713955484 - type: nauc_ndcg_at_3_max value: 26.41687745899445 - type: nauc_ndcg_at_3_std value: -1.5326687067002291 - type: nauc_ndcg_at_5_diff1 value: 34.27335619067276 - type: nauc_ndcg_at_5_max value: 26.479515412084208 - type: nauc_ndcg_at_5_std value: -0.5597648935666003 - type: nauc_precision_at_1000_diff1 value: -0.18660914306684007 - type: nauc_precision_at_1000_max value: 7.268255385799229 - type: nauc_precision_at_1000_std value: -0.1968875268478991 - type: nauc_precision_at_100_diff1 value: 7.386701205054449 - type: nauc_precision_at_100_max value: 15.477735603019607 - type: nauc_precision_at_100_std value: 4.753153414679307 - type: nauc_precision_at_10_diff1 value: 18.4668296945938 - type: nauc_precision_at_10_max value: 25.457144217779597 - type: nauc_precision_at_10_std value: 0.40165373733963605 - type: nauc_precision_at_1_diff1 value: 40.0281507903504 - type: nauc_precision_at_1_max value: 25.036735941806583 - type: nauc_precision_at_1_std value: -2.508700799268523 - type: nauc_precision_at_20_diff1 value: 14.751135844289335 - type: nauc_precision_at_20_max value: 22.763373329576293 - type: nauc_precision_at_20_std value: 4.360731801761864 - type: nauc_precision_at_3_diff1 value: 28.154753888265393 - type: nauc_precision_at_3_max value: 27.838427033527147 - type: nauc_precision_at_3_std value: -1.0042621266717804 - type: nauc_precision_at_5_diff1 value: 23.549026872711423 - type: nauc_precision_at_5_max value: 27.192214745385044 - 
type: nauc_precision_at_5_std value: 0.4455206110174471 - type: nauc_recall_at_1000_diff1 value: 17.905404210815632 - type: nauc_recall_at_1000_max value: 32.8674418535776 - type: nauc_recall_at_1000_std value: 35.187050415735435 - type: nauc_recall_at_100_diff1 value: 20.903609751984757 - type: nauc_recall_at_100_max value: 27.180306691518364 - type: nauc_recall_at_100_std value: 17.553030959393297 - type: nauc_recall_at_10_diff1 value: 25.615147693464387 - type: nauc_recall_at_10_max value: 25.97062699453565 - type: nauc_recall_at_10_std value: 2.2181702899826576 - type: nauc_recall_at_1_diff1 value: 41.759098341890976 - type: nauc_recall_at_1_max value: 23.918885427783326 - type: nauc_recall_at_1_std value: -2.1383574897865074 - type: nauc_recall_at_20_diff1 value: 23.922775940094386 - type: nauc_recall_at_20_max value: 26.384627814902785 - type: nauc_recall_at_20_std value: 7.944532403561578 - type: nauc_recall_at_3_diff1 value: 32.26543270634743 - type: nauc_recall_at_3_max value: 26.36357710828272 - type: nauc_recall_at_3_std value: -0.42723331708340706 - type: nauc_recall_at_5_diff1 value: 29.080464141763336 - type: nauc_recall_at_5_max value: 25.81238438303652 - type: nauc_recall_at_5_std value: 1.1649311168287726 - type: ndcg_at_1 value: 23.674999999999997 - type: ndcg_at_10 value: 32.842 - type: ndcg_at_100 value: 38.64 - type: ndcg_at_1000 value: 41.367 - type: ndcg_at_20 value: 35.032999999999994 - type: ndcg_at_3 value: 28.166000000000004 - type: ndcg_at_5 value: 30.407 - type: precision_at_1 value: 23.674999999999997 - type: precision_at_10 value: 6.005 - type: precision_at_100 value: 1.053 - type: precision_at_1000 value: 0.146 - type: precision_at_20 value: 3.6580000000000004 - type: precision_at_3 value: 13.352 - type: precision_at_5 value: 9.718 - type: recall_at_1 value: 19.527 - type: recall_at_10 value: 44.096999999999994 - type: recall_at_100 value: 69.962 - type: recall_at_1000 value: 89.035 - type: recall_at_20 value: 52.166000000000004 - type: recall_at_3 value: 30.946 - type: recall_at_5 value: 36.789 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: mteb/cqadupstack-unix metrics: - type: main_score value: 46.54 - type: map_at_1 value: 29.953999999999997 - type: map_at_10 value: 40.742 - type: map_at_100 value: 41.964 - type: map_at_1000 value: 42.059999999999995 - type: map_at_20 value: 41.426 - type: map_at_3 value: 37.378 - type: map_at_5 value: 39.267 - type: mrr_at_1 value: 34.701492537313435 - type: mrr_at_10 value: 44.29978085761664 - type: mrr_at_100 value: 45.205551401915486 - type: mrr_at_1000 value: 45.24735017384963 - type: mrr_at_20 value: 44.85338423755729 - type: mrr_at_3 value: 41.57338308457707 - type: mrr_at_5 value: 43.19185323383077 - type: nauc_map_at_1000_diff1 value: 48.45170522932164 - type: nauc_map_at_1000_max value: 31.544164363591204 - type: nauc_map_at_1000_std value: 0.8661088818146858 - type: nauc_map_at_100_diff1 value: 48.47347800061323 - type: nauc_map_at_100_max value: 31.568637596620313 - type: nauc_map_at_100_std value: 0.9252699336843858 - type: nauc_map_at_10_diff1 value: 48.64849891585432 - type: nauc_map_at_10_max value: 31.40371265579746 - type: nauc_map_at_10_std value: 0.7088016563713089 - type: nauc_map_at_1_diff1 value: 53.57918993108331 - type: nauc_map_at_1_max value: 31.392632653740993 - type: nauc_map_at_1_std value: -2.857306170463933 - type: nauc_map_at_20_diff1 value: 48.49084353023969 - type: 
nauc_map_at_20_max value: 31.470313174779374 - type: nauc_map_at_20_std value: 0.8950296035234309 - type: nauc_map_at_3_diff1 value: 49.273481161619806 - type: nauc_map_at_3_max value: 31.101471509782826 - type: nauc_map_at_3_std value: -0.886510096257905 - type: nauc_map_at_5_diff1 value: 48.85344288229106 - type: nauc_map_at_5_max value: 31.32633663238284 - type: nauc_map_at_5_std value: -0.44752909698881177 - type: nauc_mrr_at_1000_diff1 value: 46.27593166906613 - type: nauc_mrr_at_1000_max value: 31.637594372116336 - type: nauc_mrr_at_1000_std value: 0.8444917550670064 - type: nauc_mrr_at_100_diff1 value: 46.27161543033672 - type: nauc_mrr_at_100_max value: 31.64330655339695 - type: nauc_mrr_at_100_std value: 0.8717446416398773 - type: nauc_mrr_at_10_diff1 value: 46.100348481312864 - type: nauc_mrr_at_10_max value: 31.594271897882237 - type: nauc_mrr_at_10_std value: 0.8807168907688873 - type: nauc_mrr_at_1_diff1 value: 51.35163098909763 - type: nauc_mrr_at_1_max value: 31.99084441327899 - type: nauc_mrr_at_1_std value: -2.688594880742662 - type: nauc_mrr_at_20_diff1 value: 46.18178546174727 - type: nauc_mrr_at_20_max value: 31.639111674119448 - type: nauc_mrr_at_20_std value: 0.9855008641374622 - type: nauc_mrr_at_3_diff1 value: 46.307484835305864 - type: nauc_mrr_at_3_max value: 31.35563850804847 - type: nauc_mrr_at_3_std value: -0.3419536587707561 - type: nauc_mrr_at_5_diff1 value: 46.17646418781234 - type: nauc_mrr_at_5_max value: 31.313474270239833 - type: nauc_mrr_at_5_std value: -0.08656550526568331 - type: nauc_ndcg_at_1000_diff1 value: 46.12095795101613 - type: nauc_ndcg_at_1000_max value: 31.989083597726314 - type: nauc_ndcg_at_1000_std value: 3.2965704707660763 - type: nauc_ndcg_at_100_diff1 value: 46.05376249841318 - type: nauc_ndcg_at_100_max value: 32.39195988574972 - type: nauc_ndcg_at_100_std value: 4.518018135593347 - type: nauc_ndcg_at_10_diff1 value: 46.133631183744875 - type: nauc_ndcg_at_10_max value: 31.45358876172339 - type: nauc_ndcg_at_10_std value: 3.4254370918871055 - type: nauc_ndcg_at_1_diff1 value: 51.35163098909763 - type: nauc_ndcg_at_1_max value: 31.99084441327899 - type: nauc_ndcg_at_1_std value: -2.688594880742662 - type: nauc_ndcg_at_20_diff1 value: 45.94584949766954 - type: nauc_ndcg_at_20_max value: 31.689777515111295 - type: nauc_ndcg_at_20_std value: 4.189082428922442 - type: nauc_ndcg_at_3_diff1 value: 46.5057835389752 - type: nauc_ndcg_at_3_max value: 30.941407592082047 - type: nauc_ndcg_at_3_std value: -0.042473944857831535 - type: nauc_ndcg_at_5_diff1 value: 46.369027395136136 - type: nauc_ndcg_at_5_max value: 31.057841776505352 - type: nauc_ndcg_at_5_std value: 0.6878993420489522 - type: nauc_precision_at_1000_diff1 value: -17.30759714093202 - type: nauc_precision_at_1000_max value: -4.441155558458858 - type: nauc_precision_at_1000_std value: 1.5537300718220326 - type: nauc_precision_at_100_diff1 value: -7.18920438222021 - type: nauc_precision_at_100_max value: 8.017878121399253 - type: nauc_precision_at_100_std value: 11.357132919349102 - type: nauc_precision_at_10_diff1 value: 15.202451884794076 - type: nauc_precision_at_10_max value: 19.077295902881417 - type: nauc_precision_at_10_std value: 9.885526867355805 - type: nauc_precision_at_1_diff1 value: 51.35163098909763 - type: nauc_precision_at_1_max value: 31.99084441327899 - type: nauc_precision_at_1_std value: -2.688594880742662 - type: nauc_precision_at_20_diff1 value: 6.827461091494899 - type: nauc_precision_at_20_max value: 15.27268633497114 - type: nauc_precision_at_20_std value: 
11.515826649647384 - type: nauc_precision_at_3_diff1 value: 31.043021807472027 - type: nauc_precision_at_3_max value: 26.22457157531548 - type: nauc_precision_at_3_std value: 1.788215968301994 - type: nauc_precision_at_5_diff1 value: 25.030185818513235 - type: nauc_precision_at_5_max value: 23.680129160901537 - type: nauc_precision_at_5_std value: 4.303018899688115 - type: nauc_recall_at_1000_diff1 value: 28.68826642607512 - type: nauc_recall_at_1000_max value: 42.33849804103852 - type: nauc_recall_at_1000_std value: 42.67413575876864 - type: nauc_recall_at_100_diff1 value: 36.51494878715 - type: nauc_recall_at_100_max value: 37.4764995034434 - type: nauc_recall_at_100_std value: 28.295671266661017 - type: nauc_recall_at_10_diff1 value: 39.416721111463524 - type: nauc_recall_at_10_max value: 29.95985608454179 - type: nauc_recall_at_10_std value: 12.423335839786201 - type: nauc_recall_at_1_diff1 value: 53.57918993108331 - type: nauc_recall_at_1_max value: 31.392632653740993 - type: nauc_recall_at_1_std value: -2.857306170463933 - type: nauc_recall_at_20_diff1 value: 38.228803480194046 - type: nauc_recall_at_20_max value: 30.87261362975955 - type: nauc_recall_at_20_std value: 16.977113091834095 - type: nauc_recall_at_3_diff1 value: 43.154348566653155 - type: nauc_recall_at_3_max value: 29.54536633744803 - type: nauc_recall_at_3_std value: 2.02842672250621 - type: nauc_recall_at_5_diff1 value: 41.00436246072242 - type: nauc_recall_at_5_max value: 29.413569555348023 - type: nauc_recall_at_5_std value: 3.845214021958289 - type: ndcg_at_1 value: 34.701 - type: ndcg_at_10 value: 46.54 - type: ndcg_at_100 value: 51.754999999999995 - type: ndcg_at_1000 value: 53.71 - type: ndcg_at_20 value: 48.679 - type: ndcg_at_3 value: 40.892 - type: ndcg_at_5 value: 43.595 - type: precision_at_1 value: 34.701 - type: precision_at_10 value: 8.004 - type: precision_at_100 value: 1.185 - type: precision_at_1000 value: 0.145 - type: precision_at_20 value: 4.632 - type: precision_at_3 value: 18.719 - type: precision_at_5 value: 13.245999999999999 - type: recall_at_1 value: 29.953999999999997 - type: recall_at_10 value: 60.246 - type: recall_at_100 value: 82.128 - type: recall_at_1000 value: 95.622 - type: recall_at_20 value: 67.756 - type: recall_at_3 value: 45.096000000000004 - type: recall_at_5 value: 51.9 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: mteb/cqadupstack-webmasters metrics: - type: main_score value: 44.718999999999994 - type: map_at_1 value: 28.383999999999997 - type: map_at_10 value: 38.422 - type: map_at_100 value: 40.058 - type: map_at_1000 value: 40.276 - type: map_at_20 value: 39.301 - type: map_at_3 value: 35.205 - type: map_at_5 value: 36.803999999999995 - type: mrr_at_1 value: 33.59683794466403 - type: mrr_at_10 value: 42.837536859275986 - type: mrr_at_100 value: 43.7501703455481 - type: mrr_at_1000 value: 43.79258407771123 - type: mrr_at_20 value: 43.36044710445095 - type: mrr_at_3 value: 40.15151515151516 - type: mrr_at_5 value: 41.74242424242425 - type: nauc_map_at_1000_diff1 value: 47.934826596875304 - type: nauc_map_at_1000_max value: 32.39759438116062 - type: nauc_map_at_1000_std value: 0.9489007346763054 - type: nauc_map_at_100_diff1 value: 47.94844822157888 - type: nauc_map_at_100_max value: 32.51485845519537 - type: nauc_map_at_100_std value: 0.8094339925545622 - type: nauc_map_at_10_diff1 value: 48.251456404874645 - type: nauc_map_at_10_max value: 31.412906399154245 - 
type: nauc_map_at_10_std value: -0.7024825737369933 - type: nauc_map_at_1_diff1 value: 55.81906101970174 - type: nauc_map_at_1_max value: 31.811715334193796 - type: nauc_map_at_1_std value: -6.17056859281584 - type: nauc_map_at_20_diff1 value: 47.80902650237369 - type: nauc_map_at_20_max value: 32.22465403023091 - type: nauc_map_at_20_std value: 0.20706526946705656 - type: nauc_map_at_3_diff1 value: 49.97333984346632 - type: nauc_map_at_3_max value: 31.58195498640799 - type: nauc_map_at_3_std value: -2.577539707727459 - type: nauc_map_at_5_diff1 value: 49.40005767350608 - type: nauc_map_at_5_max value: 30.998435600377434 - type: nauc_map_at_5_std value: -2.1231771618690307 - type: nauc_mrr_at_1000_diff1 value: 46.86811371969663 - type: nauc_mrr_at_1000_max value: 31.25147138171024 - type: nauc_mrr_at_1000_std value: 1.9954422477585918 - type: nauc_mrr_at_100_diff1 value: 46.855870345882195 - type: nauc_mrr_at_100_max value: 31.263524035665966 - type: nauc_mrr_at_100_std value: 2.0160751193806568 - type: nauc_mrr_at_10_diff1 value: 46.93294772825783 - type: nauc_mrr_at_10_max value: 30.927002048701663 - type: nauc_mrr_at_10_std value: 1.6538220080908224 - type: nauc_mrr_at_1_diff1 value: 52.416386548395664 - type: nauc_mrr_at_1_max value: 32.28582003787206 - type: nauc_mrr_at_1_std value: -2.154991145714492 - type: nauc_mrr_at_20_diff1 value: 46.71796185319694 - type: nauc_mrr_at_20_max value: 31.16219902794994 - type: nauc_mrr_at_20_std value: 1.8590646572728409 - type: nauc_mrr_at_3_diff1 value: 47.697100317669914 - type: nauc_mrr_at_3_max value: 30.821806030159383 - type: nauc_mrr_at_3_std value: 1.1927626358099177 - type: nauc_mrr_at_5_diff1 value: 47.065272061365704 - type: nauc_mrr_at_5_max value: 30.299230962805023 - type: nauc_mrr_at_5_std value: 1.3225842862629529 - type: nauc_ndcg_at_1000_diff1 value: 45.20612583136058 - type: nauc_ndcg_at_1000_max value: 33.51931869947315 - type: nauc_ndcg_at_1000_std value: 4.923707509620363 - type: nauc_ndcg_at_100_diff1 value: 44.76206243393775 - type: nauc_ndcg_at_100_max value: 33.57771606755598 - type: nauc_ndcg_at_100_std value: 5.30915563331338 - type: nauc_ndcg_at_10_diff1 value: 45.12714032463827 - type: nauc_ndcg_at_10_max value: 30.351909495610492 - type: nauc_ndcg_at_10_std value: 2.3972947289996873 - type: nauc_ndcg_at_1_diff1 value: 52.416386548395664 - type: nauc_ndcg_at_1_max value: 32.28582003787206 - type: nauc_ndcg_at_1_std value: -2.154991145714492 - type: nauc_ndcg_at_20_diff1 value: 44.20281844000005 - type: nauc_ndcg_at_20_max value: 32.14112739396226 - type: nauc_ndcg_at_20_std value: 3.3971385462591916 - type: nauc_ndcg_at_3_diff1 value: 47.0633767031858 - type: nauc_ndcg_at_3_max value: 31.032896053733435 - type: nauc_ndcg_at_3_std value: 0.6827544906310201 - type: nauc_ndcg_at_5_diff1 value: 46.735352294106484 - type: nauc_ndcg_at_5_max value: 29.784992270528544 - type: nauc_ndcg_at_5_std value: 0.8685943819516141 - type: nauc_precision_at_1000_diff1 value: -12.223330179860852 - type: nauc_precision_at_1000_max value: -9.266492213777273 - type: nauc_precision_at_1000_std value: 19.0569899587788 - type: nauc_precision_at_100_diff1 value: -5.803751085072067 - type: nauc_precision_at_100_max value: 3.448932057044294 - type: nauc_precision_at_100_std value: 23.470863527030627 - type: nauc_precision_at_10_diff1 value: 8.887357341361907 - type: nauc_precision_at_10_max value: 18.67165390928126 - type: nauc_precision_at_10_std value: 19.158543337955404 - type: nauc_precision_at_1_diff1 value: 52.416386548395664 - type: 
nauc_precision_at_1_max value: 32.28582003787206 - type: nauc_precision_at_1_std value: -2.154991145714492 - type: nauc_precision_at_20_diff1 value: 0.942496138409553 - type: nauc_precision_at_20_max value: 18.86957127610774 - type: nauc_precision_at_20_std value: 24.075503903246496 - type: nauc_precision_at_3_diff1 value: 28.15363877307106 - type: nauc_precision_at_3_max value: 27.064928137991824 - type: nauc_precision_at_3_std value: 8.632807104504753 - type: nauc_precision_at_5_diff1 value: 20.805862332497973 - type: nauc_precision_at_5_max value: 21.420201475758404 - type: nauc_precision_at_5_std value: 12.380239645425714 - type: nauc_recall_at_1000_diff1 value: 18.478341468055547 - type: nauc_recall_at_1000_max value: 56.293560115074506 - type: nauc_recall_at_1000_std value: 64.31607185065428 - type: nauc_recall_at_100_diff1 value: 26.737267337771886 - type: nauc_recall_at_100_max value: 38.011889141496326 - type: nauc_recall_at_100_std value: 30.44904690114732 - type: nauc_recall_at_10_diff1 value: 35.22772732735716 - type: nauc_recall_at_10_max value: 26.000054115159486 - type: nauc_recall_at_10_std value: 5.174264254271206 - type: nauc_recall_at_1_diff1 value: 55.81906101970174 - type: nauc_recall_at_1_max value: 31.811715334193796 - type: nauc_recall_at_1_std value: -6.17056859281584 - type: nauc_recall_at_20_diff1 value: 30.48493302415641 - type: nauc_recall_at_20_max value: 31.05487040370753 - type: nauc_recall_at_20_std value: 10.319948318834136 - type: nauc_recall_at_3_diff1 value: 43.12289512340243 - type: nauc_recall_at_3_max value: 28.176279771026135 - type: nauc_recall_at_3_std value: -0.1775154523381921 - type: nauc_recall_at_5_diff1 value: 40.9934933741234 - type: nauc_recall_at_5_max value: 25.569156290584733 - type: nauc_recall_at_5_std value: 0.21166696686855038 - type: ndcg_at_1 value: 33.597 - type: ndcg_at_10 value: 44.718999999999994 - type: ndcg_at_100 value: 50.324000000000005 - type: ndcg_at_1000 value: 52.468 - type: ndcg_at_20 value: 46.822 - type: ndcg_at_3 value: 39.558 - type: ndcg_at_5 value: 41.827999999999996 - type: precision_at_1 value: 33.597 - type: precision_at_10 value: 8.735 - type: precision_at_100 value: 1.6420000000000001 - type: precision_at_1000 value: 0.246 - type: precision_at_20 value: 5.375 - type: precision_at_3 value: 18.511 - type: precision_at_5 value: 13.399 - type: recall_at_1 value: 28.383999999999997 - type: recall_at_10 value: 56.425000000000004 - type: recall_at_100 value: 82.01899999999999 - type: recall_at_1000 value: 95.285 - type: recall_at_20 value: 64.615 - type: recall_at_3 value: 42.171 - type: recall_at_5 value: 48.296 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack-wordpress metrics: - type: main_score value: 38.269999999999996 - type: map_at_1 value: 25.324999999999996 - type: map_at_10 value: 33.263 - type: map_at_100 value: 34.304 - type: map_at_1000 value: 34.394000000000005 - type: map_at_20 value: 33.827 - type: map_at_3 value: 30.259999999999998 - type: map_at_5 value: 31.832 - type: mrr_at_1 value: 27.171903881700555 - type: mrr_at_10 value: 35.334991051257234 - type: mrr_at_100 value: 36.251283465952355 - type: mrr_at_1000 value: 36.316236092511055 - type: mrr_at_20 value: 35.87141909945257 - type: mrr_at_3 value: 32.71719038817007 - type: mrr_at_5 value: 34.19593345656194 - type: nauc_map_at_1000_diff1 value: 39.614836211522714 - type: nauc_map_at_1000_max value: 22.019768626310192 
- type: nauc_map_at_1000_std value: -1.5238708712112499 - type: nauc_map_at_100_diff1 value: 39.63008548572307 - type: nauc_map_at_100_max value: 22.044756063752345 - type: nauc_map_at_100_std value: -1.4869190221494792 - type: nauc_map_at_10_diff1 value: 39.73025012395569 - type: nauc_map_at_10_max value: 22.117710178892107 - type: nauc_map_at_10_std value: -2.5129984871932973 - type: nauc_map_at_1_diff1 value: 45.015617718902654 - type: nauc_map_at_1_max value: 19.313800263189638 - type: nauc_map_at_1_std value: -4.763931386681675 - type: nauc_map_at_20_diff1 value: 39.53678019013766 - type: nauc_map_at_20_max value: 21.880316719428258 - type: nauc_map_at_20_std value: -1.882003994523355 - type: nauc_map_at_3_diff1 value: 40.37307665298228 - type: nauc_map_at_3_max value: 20.851976075322533 - type: nauc_map_at_3_std value: -2.429569082966531 - type: nauc_map_at_5_diff1 value: 39.763015635086 - type: nauc_map_at_5_max value: 22.010102196900725 - type: nauc_map_at_5_std value: -2.654896415670943 - type: nauc_mrr_at_1000_diff1 value: 39.74071733680025 - type: nauc_mrr_at_1000_max value: 21.67309640681989 - type: nauc_mrr_at_1000_std value: -1.4003373135477462 - type: nauc_mrr_at_100_diff1 value: 39.730614151966485 - type: nauc_mrr_at_100_max value: 21.678390048971767 - type: nauc_mrr_at_100_std value: -1.3655362623563931 - type: nauc_mrr_at_10_diff1 value: 39.7900031013241 - type: nauc_mrr_at_10_max value: 21.73643491725051 - type: nauc_mrr_at_10_std value: -2.1175389838696312 - type: nauc_mrr_at_1_diff1 value: 46.165736140679776 - type: nauc_mrr_at_1_max value: 20.071083446822147 - type: nauc_mrr_at_1_std value: -5.018909100858311 - type: nauc_mrr_at_20_diff1 value: 39.6371295762885 - type: nauc_mrr_at_20_max value: 21.659557440270973 - type: nauc_mrr_at_20_std value: -1.4909603958341686 - type: nauc_mrr_at_3_diff1 value: 40.351150322758876 - type: nauc_mrr_at_3_max value: 20.83706249041544 - type: nauc_mrr_at_3_std value: -1.956027373253151 - type: nauc_mrr_at_5_diff1 value: 39.57759107791911 - type: nauc_mrr_at_5_max value: 21.79552045204151 - type: nauc_mrr_at_5_std value: -2.1507013120951126 - type: nauc_ndcg_at_1000_diff1 value: 37.717619356839016 - type: nauc_ndcg_at_1000_max value: 22.545375504379805 - type: nauc_ndcg_at_1000_std value: 1.682348628141016 - type: nauc_ndcg_at_100_diff1 value: 37.656027803682626 - type: nauc_ndcg_at_100_max value: 22.49278246383637 - type: nauc_ndcg_at_100_std value: 2.6818118152357773 - type: nauc_ndcg_at_10_diff1 value: 37.834954205539766 - type: nauc_ndcg_at_10_max value: 22.655839885558443 - type: nauc_ndcg_at_10_std value: -1.97159619786231 - type: nauc_ndcg_at_1_diff1 value: 46.165736140679776 - type: nauc_ndcg_at_1_max value: 20.071083446822147 - type: nauc_ndcg_at_1_std value: -5.018909100858311 - type: nauc_ndcg_at_20_diff1 value: 37.171914857454304 - type: nauc_ndcg_at_20_max value: 21.858904801745897 - type: nauc_ndcg_at_20_std value: 0.3809854859496657 - type: nauc_ndcg_at_3_diff1 value: 38.4460623883955 - type: nauc_ndcg_at_3_max value: 20.95244159463402 - type: nauc_ndcg_at_3_std value: -1.2685011660086651 - type: nauc_ndcg_at_5_diff1 value: 37.48831054573054 - type: nauc_ndcg_at_5_max value: 22.625921624640526 - type: nauc_ndcg_at_5_std value: -2.049221092724925 - type: nauc_precision_at_1000_diff1 value: -19.120500628263994 - type: nauc_precision_at_1000_max value: -6.650707109047473 - type: nauc_precision_at_1000_std value: 15.71193179253002 - type: nauc_precision_at_100_diff1 value: 6.254606806876069 - type: nauc_precision_at_100_max 
value: 14.601826922181823 - type: nauc_precision_at_100_std value: 28.38299592246453 - type: nauc_precision_at_10_diff1 value: 22.978614338670816 - type: nauc_precision_at_10_max value: 23.04146766323557 - type: nauc_precision_at_10_std value: 6.226264308612577 - type: nauc_precision_at_1_diff1 value: 46.165736140679776 - type: nauc_precision_at_1_max value: 20.071083446822147 - type: nauc_precision_at_1_std value: -5.018909100858311 - type: nauc_precision_at_20_diff1 value: 17.681032853225602 - type: nauc_precision_at_20_max value: 18.66680304585122 - type: nauc_precision_at_20_std value: 15.34896796713905 - type: nauc_precision_at_3_diff1 value: 31.359396694559194 - type: nauc_precision_at_3_max value: 22.279263308973274 - type: nauc_precision_at_3_std value: 3.6302537979529035 - type: nauc_precision_at_5_diff1 value: 26.32257879892933 - type: nauc_precision_at_5_max value: 25.402524493181026 - type: nauc_precision_at_5_std value: 4.731450603747359 - type: nauc_recall_at_1000_diff1 value: 23.562925244967875 - type: nauc_recall_at_1000_max value: 30.737399333586797 - type: nauc_recall_at_1000_std value: 34.19418935008663 - type: nauc_recall_at_100_diff1 value: 28.703574970574824 - type: nauc_recall_at_100_max value: 22.448663600170278 - type: nauc_recall_at_100_std value: 24.53297349042035 - type: nauc_recall_at_10_diff1 value: 31.73603907811882 - type: nauc_recall_at_10_max value: 23.453183748640765 - type: nauc_recall_at_10_std value: -1.8279054407176274 - type: nauc_recall_at_1_diff1 value: 45.015617718902654 - type: nauc_recall_at_1_max value: 19.313800263189638 - type: nauc_recall_at_1_std value: -4.763931386681675 - type: nauc_recall_at_20_diff1 value: 28.74169081866096 - type: nauc_recall_at_20_max value: 20.035509169577324 - type: nauc_recall_at_20_std value: 7.371615811227748 - type: nauc_recall_at_3_diff1 value: 34.09890157333362 - type: nauc_recall_at_3_max value: 20.46565842748346 - type: nauc_recall_at_3_std value: -0.4337283067447526 - type: nauc_recall_at_5_diff1 value: 30.974580787842402 - type: nauc_recall_at_5_max value: 23.76379349487105 - type: nauc_recall_at_5_std value: -1.8407515927979428 - type: ndcg_at_1 value: 27.172 - type: ndcg_at_10 value: 38.269999999999996 - type: ndcg_at_100 value: 43.338 - type: ndcg_at_1000 value: 45.594 - type: ndcg_at_20 value: 40.256 - type: ndcg_at_3 value: 32.673 - type: ndcg_at_5 value: 35.224 - type: precision_at_1 value: 27.172 - type: precision_at_10 value: 6.063000000000001 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.123 - type: precision_at_20 value: 3.5029999999999997 - type: precision_at_3 value: 13.74 - type: precision_at_5 value: 9.797 - type: recall_at_1 value: 25.324999999999996 - type: recall_at_10 value: 51.634 - type: recall_at_100 value: 74.687 - type: recall_at_1000 value: 91.412 - type: recall_at_20 value: 59.207 - type: recall_at_3 value: 36.678 - type: recall_at_5 value: 42.742999999999995 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: main_score value: 36.853 - type: map_at_1 value: 15.371000000000002 - type: map_at_10 value: 27.122 - type: map_at_100 value: 29.226000000000003 - type: map_at_1000 value: 29.409999999999997 - type: map_at_20 value: 28.274 - type: map_at_3 value: 22.431 - type: map_at_5 value: 24.877 - type: mrr_at_1 value: 34.13680781758958 - type: mrr_at_10 value: 47.265911793599145 - type: mrr_at_100 value: 48.028369995763846 
- type: mrr_at_1000 value: 48.05317022537804 - type: mrr_at_20 value: 47.75785292259516 - type: mrr_at_3 value: 43.887079261672156 - type: mrr_at_5 value: 45.906623235613544 - type: nauc_map_at_1000_diff1 value: 24.949211292921547 - type: nauc_map_at_1000_max value: 38.69844483304584 - type: nauc_map_at_1000_std value: 18.336359440844753 - type: nauc_map_at_100_diff1 value: 24.8951732982492 - type: nauc_map_at_100_max value: 38.65049158594052 - type: nauc_map_at_100_std value: 18.28935278388095 - type: nauc_map_at_10_diff1 value: 24.606032216798273 - type: nauc_map_at_10_max value: 38.00608351559887 - type: nauc_map_at_10_std value: 16.61261615173358 - type: nauc_map_at_1_diff1 value: 30.83614944448221 - type: nauc_map_at_1_max value: 33.757528532809 - type: nauc_map_at_1_std value: 8.880622713261126 - type: nauc_map_at_20_diff1 value: 24.75491310922017 - type: nauc_map_at_20_max value: 38.353679076398834 - type: nauc_map_at_20_std value: 17.58637493443171 - type: nauc_map_at_3_diff1 value: 25.563085273287083 - type: nauc_map_at_3_max value: 35.14515679047155 - type: nauc_map_at_3_std value: 11.75594869817732 - type: nauc_map_at_5_diff1 value: 24.815807517691614 - type: nauc_map_at_5_max value: 36.25905426665983 - type: nauc_map_at_5_std value: 14.516391726180697 - type: nauc_mrr_at_1000_diff1 value: 27.948233427121274 - type: nauc_mrr_at_1000_max value: 37.5893640945859 - type: nauc_mrr_at_1000_std value: 19.588442449629763 - type: nauc_mrr_at_100_diff1 value: 27.947962345854037 - type: nauc_mrr_at_100_max value: 37.60375479481945 - type: nauc_mrr_at_100_std value: 19.614791576283793 - type: nauc_mrr_at_10_diff1 value: 27.882311310262136 - type: nauc_mrr_at_10_max value: 37.58580968074054 - type: nauc_mrr_at_10_std value: 19.49875186170201 - type: nauc_mrr_at_1_diff1 value: 28.017413073648477 - type: nauc_mrr_at_1_max value: 32.87710191514022 - type: nauc_mrr_at_1_std value: 14.04889142608459 - type: nauc_mrr_at_20_diff1 value: 27.89129925771968 - type: nauc_mrr_at_20_max value: 37.6142863106945 - type: nauc_mrr_at_20_std value: 19.645390143394163 - type: nauc_mrr_at_3_diff1 value: 27.99609559690795 - type: nauc_mrr_at_3_max value: 36.87362332456197 - type: nauc_mrr_at_3_std value: 18.598416821915333 - type: nauc_mrr_at_5_diff1 value: 27.68306089976716 - type: nauc_mrr_at_5_max value: 37.12264485659723 - type: nauc_mrr_at_5_std value: 19.18875305730564 - type: nauc_ndcg_at_1000_diff1 value: 25.736779186453777 - type: nauc_ndcg_at_1000_max value: 41.93281139456004 - type: nauc_ndcg_at_1000_std value: 25.179038422659993 - type: nauc_ndcg_at_100_diff1 value: 25.144796623848322 - type: nauc_ndcg_at_100_max value: 41.72820916876173 - type: nauc_ndcg_at_100_std value: 25.12851686850754 - type: nauc_ndcg_at_10_diff1 value: 24.321249191226652 - type: nauc_ndcg_at_10_max value: 40.23711916935706 - type: nauc_ndcg_at_10_std value: 20.89060972334557 - type: nauc_ndcg_at_1_diff1 value: 28.017413073648477 - type: nauc_ndcg_at_1_max value: 32.87710191514022 - type: nauc_ndcg_at_1_std value: 14.04889142608459 - type: nauc_ndcg_at_20_diff1 value: 24.5090484877482 - type: nauc_ndcg_at_20_max value: 40.752854032983606 - type: nauc_ndcg_at_20_std value: 22.70331074781384 - type: nauc_ndcg_at_3_diff1 value: 25.13499057756147 - type: nauc_ndcg_at_3_max value: 35.8325682137567 - type: nauc_ndcg_at_3_std value: 15.23768392706637 - type: nauc_ndcg_at_5_diff1 value: 24.614105695451116 - type: nauc_ndcg_at_5_max value: 37.68089587624492 - type: nauc_ndcg_at_5_std value: 17.946406099261708 - type: 
nauc_precision_at_1000_diff1 value: -2.022340544774227 - type: nauc_precision_at_1000_max value: 6.070578645067797 - type: nauc_precision_at_1000_std value: 22.15132728777549 - type: nauc_precision_at_100_diff1 value: 4.544144474504255 - type: nauc_precision_at_100_max value: 19.780392159848574 - type: nauc_precision_at_100_std value: 31.107111186002438 - type: nauc_precision_at_10_diff1 value: 10.107015022955848 - type: nauc_precision_at_10_max value: 30.779709099060465 - type: nauc_precision_at_10_std value: 27.324148451668602 - type: nauc_precision_at_1_diff1 value: 28.017413073648477 - type: nauc_precision_at_1_max value: 32.87710191514022 - type: nauc_precision_at_1_std value: 14.04889142608459 - type: nauc_precision_at_20_diff1 value: 8.270881053079405 - type: nauc_precision_at_20_max value: 27.26753946078481 - type: nauc_precision_at_20_std value: 29.156725822074204 - type: nauc_precision_at_3_diff1 value: 17.82468940497632 - type: nauc_precision_at_3_max value: 31.490021174215155 - type: nauc_precision_at_3_std value: 18.73818985054394 - type: nauc_precision_at_5_diff1 value: 13.24803141673961 - type: nauc_precision_at_5_max value: 29.94926240784298 - type: nauc_precision_at_5_std value: 23.2940906142919 - type: nauc_recall_at_1000_diff1 value: 19.09850333580471 - type: nauc_recall_at_1000_max value: 46.026306142840596 - type: nauc_recall_at_1000_std value: 46.50391519568263 - type: nauc_recall_at_100_diff1 value: 16.739384224869738 - type: nauc_recall_at_100_max value: 40.68987136431252 - type: nauc_recall_at_100_std value: 36.01609750485591 - type: nauc_recall_at_10_diff1 value: 17.51796617221814 - type: nauc_recall_at_10_max value: 39.47453129444401 - type: nauc_recall_at_10_std value: 23.79239002974899 - type: nauc_recall_at_1_diff1 value: 30.83614944448221 - type: nauc_recall_at_1_max value: 33.757528532809 - type: nauc_recall_at_1_std value: 8.880622713261126 - type: nauc_recall_at_20_diff1 value: 16.978668307251652 - type: nauc_recall_at_20_max value: 39.09115357303713 - type: nauc_recall_at_20_std value: 27.278668534187524 - type: nauc_recall_at_3_diff1 value: 22.55937738994021 - type: nauc_recall_at_3_max value: 36.25055459395638 - type: nauc_recall_at_3_std value: 14.828905168761247 - type: nauc_recall_at_5_diff1 value: 19.32656748627199 - type: nauc_recall_at_5_max value: 36.28836228620816 - type: nauc_recall_at_5_std value: 19.264352933914278 - type: ndcg_at_1 value: 34.137 - type: ndcg_at_10 value: 36.853 - type: ndcg_at_100 value: 44.279 - type: ndcg_at_1000 value: 47.336 - type: ndcg_at_20 value: 39.815 - type: ndcg_at_3 value: 30.253999999999998 - type: ndcg_at_5 value: 32.649 - type: precision_at_1 value: 34.137 - type: precision_at_10 value: 11.655 - type: precision_at_100 value: 1.9619999999999997 - type: precision_at_1000 value: 0.254 - type: precision_at_20 value: 7.1209999999999996 - type: precision_at_3 value: 22.823 - type: precision_at_5 value: 17.655 - type: recall_at_1 value: 15.371000000000002 - type: recall_at_10 value: 43.718 - type: recall_at_100 value: 68.81 - type: recall_at_1000 value: 85.69600000000001 - type: recall_at_20 value: 51.94 - type: recall_at_3 value: 27.694000000000003 - type: recall_at_5 value: 34.469 task: type: Retrieval - dataset: config: default name: MTEB DBPedia revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: main_score value: 45.553 - type: map_at_1 value: 9.168999999999999 - type: map_at_10 value: 22.154 - type: map_at_100 value: 32.174 - type: map_at_1000 value: 33.974 - 
type: map_at_20 value: 25.899 - type: map_at_3 value: 15.275 - type: map_at_5 value: 18.291 - type: mrr_at_1 value: 70.75 - type: mrr_at_10 value: 78.39662698412697 - type: mrr_at_100 value: 78.56221458977012 - type: mrr_at_1000 value: 78.56669970642338 - type: mrr_at_20 value: 78.49688805346696 - type: mrr_at_3 value: 76.33333333333333 - type: mrr_at_5 value: 77.70833333333333 - type: nauc_map_at_1000_diff1 value: 18.465085922071346 - type: nauc_map_at_1000_max value: 24.29804638788498 - type: nauc_map_at_1000_std value: 22.380463943423514 - type: nauc_map_at_100_diff1 value: 19.37585410674523 - type: nauc_map_at_100_max value: 22.56424042509462 - type: nauc_map_at_100_std value: 19.672237275984426 - type: nauc_map_at_10_diff1 value: 23.597788166305577 - type: nauc_map_at_10_max value: 9.157316105122925 - type: nauc_map_at_10_std value: -3.8881247055786807 - type: nauc_map_at_1_diff1 value: 43.96699602275052 - type: nauc_map_at_1_max value: -0.7577088440873263 - type: nauc_map_at_1_std value: -17.732463891968404 - type: nauc_map_at_20_diff1 value: 22.326759054850097 - type: nauc_map_at_20_max value: 14.879191412167703 - type: nauc_map_at_20_std value: 5.405751236575241 - type: nauc_map_at_3_diff1 value: 28.73583545428074 - type: nauc_map_at_3_max value: 1.5986597211018239 - type: nauc_map_at_3_std value: -16.512455883681515 - type: nauc_map_at_5_diff1 value: 25.401810959155057 - type: nauc_map_at_5_max value: 4.418875376978587 - type: nauc_map_at_5_std value: -12.296750992013052 - type: nauc_mrr_at_1000_diff1 value: 51.228801807498584 - type: nauc_mrr_at_1000_max value: 61.040998883279585 - type: nauc_mrr_at_1000_std value: 40.93983887257123 - type: nauc_mrr_at_100_diff1 value: 51.23715338435314 - type: nauc_mrr_at_100_max value: 61.03971408781317 - type: nauc_mrr_at_100_std value: 40.91796923590573 - type: nauc_mrr_at_10_diff1 value: 51.1214868552331 - type: nauc_mrr_at_10_max value: 61.03069045590881 - type: nauc_mrr_at_10_std value: 40.661621199704264 - type: nauc_mrr_at_1_diff1 value: 50.84660003035892 - type: nauc_mrr_at_1_max value: 60.692091499960895 - type: nauc_mrr_at_1_std value: 42.126228731502955 - type: nauc_mrr_at_20_diff1 value: 51.0402624284872 - type: nauc_mrr_at_20_max value: 60.94577844338166 - type: nauc_mrr_at_20_std value: 40.89505950503613 - type: nauc_mrr_at_3_diff1 value: 51.771113665996516 - type: nauc_mrr_at_3_max value: 61.65264793077224 - type: nauc_mrr_at_3_std value: 41.75781827057092 - type: nauc_mrr_at_5_diff1 value: 51.0656793772882 - type: nauc_mrr_at_5_max value: 61.08042065139715 - type: nauc_mrr_at_5_std value: 41.11203271084835 - type: nauc_ndcg_at_1000_diff1 value: 22.347978262245107 - type: nauc_ndcg_at_1000_max value: 36.56458763955002 - type: nauc_ndcg_at_1000_std value: 35.99616144258822 - type: nauc_ndcg_at_100_diff1 value: 23.1120990977162 - type: nauc_ndcg_at_100_max value: 30.79663306311657 - type: nauc_ndcg_at_100_std value: 27.387572106784297 - type: nauc_ndcg_at_10_diff1 value: 23.329746066899656 - type: nauc_ndcg_at_10_max value: 28.69246947084685 - type: nauc_ndcg_at_10_std value: 21.457736188325345 - type: nauc_ndcg_at_1_diff1 value: 39.99399153456974 - type: nauc_ndcg_at_1_max value: 38.12447856470389 - type: nauc_ndcg_at_1_std value: 27.768869260384676 - type: nauc_ndcg_at_20_diff1 value: 24.945374175339907 - type: nauc_ndcg_at_20_max value: 27.67836982165295 - type: nauc_ndcg_at_20_std value: 19.7933631060578 - type: nauc_ndcg_at_3_diff1 value: 26.063492354398527 - type: nauc_ndcg_at_3_max value: 33.06541959550656 - type: 
nauc_ndcg_at_3_std value: 23.278902797288726 - type: nauc_ndcg_at_5_diff1 value: 22.521596060750035 - type: nauc_ndcg_at_5_max value: 31.210005673730784 - type: nauc_ndcg_at_5_std value: 22.893106456317927 - type: nauc_precision_at_1000_diff1 value: -19.845356495096006 - type: nauc_precision_at_1000_max value: 4.163819381816099 - type: nauc_precision_at_1000_std value: 7.612952884590339 - type: nauc_precision_at_100_diff1 value: -8.2679285153361 - type: nauc_precision_at_100_max value: 29.78018175573565 - type: nauc_precision_at_100_std value: 41.07244463956215 - type: nauc_precision_at_10_diff1 value: -3.2451428407349057 - type: nauc_precision_at_10_max value: 36.92563008274906 - type: nauc_precision_at_10_std value: 45.06962043489777 - type: nauc_precision_at_1_diff1 value: 50.84660003035892 - type: nauc_precision_at_1_max value: 60.692091499960895 - type: nauc_precision_at_1_std value: 42.126228731502955 - type: nauc_precision_at_20_diff1 value: -3.432279149061878 - type: nauc_precision_at_20_max value: 37.013592483974875 - type: nauc_precision_at_20_std value: 46.47324739428665 - type: nauc_precision_at_3_diff1 value: 7.28495481051025 - type: nauc_precision_at_3_max value: 38.66372411741402 - type: nauc_precision_at_3_std value: 35.23163993723955 - type: nauc_precision_at_5_diff1 value: -0.16540230063716202 - type: nauc_precision_at_5_max value: 37.322494255721715 - type: nauc_precision_at_5_std value: 39.666653561269754 - type: nauc_recall_at_1000_diff1 value: 11.388326469283681 - type: nauc_recall_at_1000_max value: 32.698146308591674 - type: nauc_recall_at_1000_std value: 49.48830488070777 - type: nauc_recall_at_100_diff1 value: 11.497443532756819 - type: nauc_recall_at_100_max value: 20.196970431621615 - type: nauc_recall_at_100_std value: 23.688772100803433 - type: nauc_recall_at_10_diff1 value: 16.519851398596003 - type: nauc_recall_at_10_max value: 0.774066845071221 - type: nauc_recall_at_10_std value: -10.89514647001814 - type: nauc_recall_at_1_diff1 value: 43.96699602275052 - type: nauc_recall_at_1_max value: -0.7577088440873263 - type: nauc_recall_at_1_std value: -17.732463891968404 - type: nauc_recall_at_20_diff1 value: 15.202960269878258 - type: nauc_recall_at_20_max value: 7.067263295590253 - type: nauc_recall_at_20_std value: -0.06050108222640702 - type: nauc_recall_at_3_diff1 value: 24.066741361525125 - type: nauc_recall_at_3_max value: -2.1961525860488424 - type: nauc_recall_at_3_std value: -19.48307077749568 - type: nauc_recall_at_5_diff1 value: 20.086330794102707 - type: nauc_recall_at_5_max value: -0.8866528062747986 - type: nauc_recall_at_5_std value: -16.53799173962747 - type: ndcg_at_1 value: 57.99999999999999 - type: ndcg_at_10 value: 45.553 - type: ndcg_at_100 value: 51.014 - type: ndcg_at_1000 value: 58.226 - type: ndcg_at_20 value: 44.98 - type: ndcg_at_3 value: 48.981 - type: ndcg_at_5 value: 46.794999999999995 - type: precision_at_1 value: 70.75 - type: precision_at_10 value: 36.85 - type: precision_at_100 value: 11.955 - type: precision_at_1000 value: 2.247 - type: precision_at_20 value: 28.075 - type: precision_at_3 value: 52.666999999999994 - type: precision_at_5 value: 45.85 - type: recall_at_1 value: 9.168999999999999 - type: recall_at_10 value: 28.796 - type: recall_at_100 value: 58.892999999999994 - type: recall_at_1000 value: 81.644 - type: recall_at_20 value: 36.659000000000006 - type: recall_at_3 value: 16.709 - type: recall_at_5 value: 21.387 task: type: Retrieval - dataset: config: default name: MTEB FEVER revision: 
bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: main_score value: 88.41 - type: map_at_1 value: 75.637 - type: map_at_10 value: 84.674 - type: map_at_100 value: 84.909 - type: map_at_1000 value: 84.92 - type: map_at_20 value: 84.836 - type: map_at_3 value: 83.44200000000001 - type: map_at_5 value: 84.28099999999999 - type: mrr_at_1 value: 81.56315631563157 - type: mrr_at_10 value: 88.89571695264748 - type: mrr_at_100 value: 88.93671417216285 - type: mrr_at_1000 value: 88.93708016011664 - type: mrr_at_20 value: 88.9311652665256 - type: mrr_at_3 value: 88.20882088208805 - type: mrr_at_5 value: 88.72937293729349 - type: nauc_map_at_1000_diff1 value: 54.41216035074026 - type: nauc_map_at_1000_max value: 13.346153003554361 - type: nauc_map_at_1000_std value: -6.721664416152164 - type: nauc_map_at_100_diff1 value: 54.36538350995795 - type: nauc_map_at_100_max value: 13.355583381471298 - type: nauc_map_at_100_std value: -6.696921015641016 - type: nauc_map_at_10_diff1 value: 54.0389127730555 - type: nauc_map_at_10_max value: 13.387802159150663 - type: nauc_map_at_10_std value: -6.73514381731833 - type: nauc_map_at_1_diff1 value: 57.99489574836453 - type: nauc_map_at_1_max value: 7.830032589171654 - type: nauc_map_at_1_std value: -10.140208285080295 - type: nauc_map_at_20_diff1 value: 54.16841004736076 - type: nauc_map_at_20_max value: 13.345607363689746 - type: nauc_map_at_20_std value: -6.663119775158465 - type: nauc_map_at_3_diff1 value: 53.82879543599303 - type: nauc_map_at_3_max value: 12.716952288433902 - type: nauc_map_at_3_std value: -7.746102082835598 - type: nauc_map_at_5_diff1 value: 53.82838395350109 - type: nauc_map_at_5_max value: 13.487373534211702 - type: nauc_map_at_5_std value: -6.869504398693434 - type: nauc_mrr_at_1000_diff1 value: 68.92783546581906 - type: nauc_mrr_at_1000_max value: 12.076297180596592 - type: nauc_mrr_at_1000_std value: -13.306257067567998 - type: nauc_mrr_at_100_diff1 value: 68.92780219775517 - type: nauc_mrr_at_100_max value: 12.078449805054374 - type: nauc_mrr_at_100_std value: -13.303524852703719 - type: nauc_mrr_at_10_diff1 value: 68.92686206881258 - type: nauc_mrr_at_10_max value: 12.273295656884873 - type: nauc_mrr_at_10_std value: -13.222483496603965 - type: nauc_mrr_at_1_diff1 value: 70.1738022073041 - type: nauc_mrr_at_1_max value: 9.378639533482806 - type: nauc_mrr_at_1_std value: -13.444033823202348 - type: nauc_mrr_at_20_diff1 value: 68.91161304905303 - type: nauc_mrr_at_20_max value: 12.117091514817885 - type: nauc_mrr_at_20_std value: -13.258261750160239 - type: nauc_mrr_at_3_diff1 value: 68.61982455945467 - type: nauc_mrr_at_3_max value: 12.608213879734578 - type: nauc_mrr_at_3_std value: -13.558003431587839 - type: nauc_mrr_at_5_diff1 value: 68.81439097457242 - type: nauc_mrr_at_5_max value: 12.54025598903624 - type: nauc_mrr_at_5_std value: -13.199231514972093 - type: nauc_ndcg_at_1000_diff1 value: 56.47563443877495 - type: nauc_ndcg_at_1000_max value: 14.508331783439466 - type: nauc_ndcg_at_1000_std value: -6.206829736668775 - type: nauc_ndcg_at_100_diff1 value: 55.54015515673474 - type: nauc_ndcg_at_100_max value: 14.753595778278136 - type: nauc_ndcg_at_100_std value: -5.638517949568802 - type: nauc_ndcg_at_10_diff1 value: 54.220845223257996 - type: nauc_ndcg_at_10_max value: 15.265309648490021 - type: nauc_ndcg_at_10_std value: -5.516276098929109 - type: nauc_ndcg_at_1_diff1 value: 70.1738022073041 - type: nauc_ndcg_at_1_max value: 9.378639533482806 - type: nauc_ndcg_at_1_std value: 
-13.444033823202348 - type: nauc_ndcg_at_20_diff1 value: 54.481406100854635 - type: nauc_ndcg_at_20_max value: 14.868763583210498 - type: nauc_ndcg_at_20_std value: -5.328097380018734 - type: nauc_ndcg_at_3_diff1 value: 54.94411725607744 - type: nauc_ndcg_at_3_max value: 14.27186734506607 - type: nauc_ndcg_at_3_std value: -7.894724962312474 - type: nauc_ndcg_at_5_diff1 value: 54.08048166974806 - type: nauc_ndcg_at_5_max value: 15.528233170721006 - type: nauc_ndcg_at_5_std value: -5.984768714537104 - type: nauc_precision_at_1000_diff1 value: -8.744323640074445 - type: nauc_precision_at_1000_max value: -0.01881224392053465 - type: nauc_precision_at_1000_std value: 3.8721477979260635 - type: nauc_precision_at_100_diff1 value: -11.86150156952171 - type: nauc_precision_at_100_max value: 3.2736651314552314 - type: nauc_precision_at_100_std value: 8.12687620615509 - type: nauc_precision_at_10_diff1 value: -10.360708676781178 - type: nauc_precision_at_10_max value: 10.945552490433458 - type: nauc_precision_at_10_std value: 11.016707653014485 - type: nauc_precision_at_1_diff1 value: 70.1738022073041 - type: nauc_precision_at_1_max value: 9.378639533482806 - type: nauc_precision_at_1_std value: -13.444033823202348 - type: nauc_precision_at_20_diff1 value: -13.557721925696583 - type: nauc_precision_at_20_max value: 6.331386521718574 - type: nauc_precision_at_20_std value: 10.322188778142388 - type: nauc_precision_at_3_diff1 value: 15.139456770248968 - type: nauc_precision_at_3_max value: 17.10220985600708 - type: nauc_precision_at_3_std value: 3.0448183682558074 - type: nauc_precision_at_5_diff1 value: -1.9825577548111102 - type: nauc_precision_at_5_max value: 17.139148127012625 - type: nauc_precision_at_5_std value: 10.598435750554753 - type: nauc_recall_at_1000_diff1 value: 15.641740744283005 - type: nauc_recall_at_1000_max value: 44.65315702195612 - type: nauc_recall_at_1000_std value: 52.34265862835513 - type: nauc_recall_at_100_diff1 value: 5.254385435323394 - type: nauc_recall_at_100_max value: 38.53577774395794 - type: nauc_recall_at_100_std value: 43.47744274335829 - type: nauc_recall_at_10_diff1 value: 19.135735476268042 - type: nauc_recall_at_10_max value: 30.05417445923848 - type: nauc_recall_at_10_std value: 18.3988023241141 - type: nauc_recall_at_1_diff1 value: 57.99489574836453 - type: nauc_recall_at_1_max value: 7.830032589171654 - type: nauc_recall_at_1_std value: -10.140208285080295 - type: nauc_recall_at_20_diff1 value: 9.444797759735126 - type: nauc_recall_at_20_max value: 31.001311675371017 - type: nauc_recall_at_20_std value: 29.351418893822178 - type: nauc_recall_at_3_diff1 value: 36.88862653262064 - type: nauc_recall_at_3_max value: 19.845892741607823 - type: nauc_recall_at_3_std value: -1.0584273105890794 - type: nauc_recall_at_5_diff1 value: 27.360718561944974 - type: nauc_recall_at_5_max value: 26.698311215441738 - type: nauc_recall_at_5_std value: 8.97113997755362 - type: ndcg_at_1 value: 81.563 - type: ndcg_at_10 value: 88.41 - type: ndcg_at_100 value: 89.101 - type: ndcg_at_1000 value: 89.25800000000001 - type: ndcg_at_20 value: 88.79 - type: ndcg_at_3 value: 86.599 - type: ndcg_at_5 value: 87.74 - type: precision_at_1 value: 81.563 - type: precision_at_10 value: 10.699 - type: precision_at_100 value: 1.13 - type: precision_at_1000 value: 0.116 - type: precision_at_20 value: 5.479 - type: precision_at_3 value: 33.238 - type: precision_at_5 value: 20.744 - type: recall_at_1 value: 75.637 - type: recall_at_10 value: 95.57600000000001 - type: recall_at_100 value: 98.072 - 
type: recall_at_1000 value: 98.951 - type: recall_at_20 value: 96.792 - type: recall_at_3 value: 90.79599999999999 - type: recall_at_5 value: 93.674 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: main_score value: 42.396 - type: map_at_1 value: 21.711 - type: map_at_10 value: 34.628 - type: map_at_100 value: 36.549 - type: map_at_1000 value: 36.719 - type: map_at_20 value: 35.673 - type: map_at_3 value: 30.585 - type: map_at_5 value: 32.875 - type: mrr_at_1 value: 41.82098765432099 - type: mrr_at_10 value: 50.69505682931607 - type: mrr_at_100 value: 51.50556608727901 - type: mrr_at_1000 value: 51.53870583208304 - type: mrr_at_20 value: 51.15345764364655 - type: mrr_at_3 value: 48.35390946502059 - type: mrr_at_5 value: 49.87397119341563 - type: nauc_map_at_1000_diff1 value: 45.182252919583895 - type: nauc_map_at_1000_max value: 35.66124930024801 - type: nauc_map_at_1000_std value: -0.6925562638650965 - type: nauc_map_at_100_diff1 value: 45.116964706960125 - type: nauc_map_at_100_max value: 35.54990469525889 - type: nauc_map_at_100_std value: -0.6667263852859368 - type: nauc_map_at_10_diff1 value: 45.39189096228184 - type: nauc_map_at_10_max value: 34.780111261901 - type: nauc_map_at_10_std value: -1.8169859294150819 - type: nauc_map_at_1_diff1 value: 47.72764937952259 - type: nauc_map_at_1_max value: 24.83306559709341 - type: nauc_map_at_1_std value: -4.714128457297418 - type: nauc_map_at_20_diff1 value: 45.17073365898278 - type: nauc_map_at_20_max value: 35.0938403469058 - type: nauc_map_at_20_std value: -1.373412631183604 - type: nauc_map_at_3_diff1 value: 46.525724305731295 - type: nauc_map_at_3_max value: 31.042538866512597 - type: nauc_map_at_3_std value: -4.119355935975354 - type: nauc_map_at_5_diff1 value: 45.79569633383187 - type: nauc_map_at_5_max value: 32.88779656647293 - type: nauc_map_at_5_std value: -3.2518474739335312 - type: nauc_mrr_at_1000_diff1 value: 52.83619185487903 - type: nauc_mrr_at_1000_max value: 42.30310720405186 - type: nauc_mrr_at_1000_std value: -1.1487703348518024 - type: nauc_mrr_at_100_diff1 value: 52.82248853996664 - type: nauc_mrr_at_100_max value: 42.30549701564678 - type: nauc_mrr_at_100_std value: -1.1240113031894834 - type: nauc_mrr_at_10_diff1 value: 52.74644276642243 - type: nauc_mrr_at_10_max value: 42.39103029476398 - type: nauc_mrr_at_10_std value: -1.1043413237848576 - type: nauc_mrr_at_1_diff1 value: 54.810335521617326 - type: nauc_mrr_at_1_max value: 40.733260207843394 - type: nauc_mrr_at_1_std value: -4.452554921565855 - type: nauc_mrr_at_20_diff1 value: 52.788257862499954 - type: nauc_mrr_at_20_max value: 42.32658875363406 - type: nauc_mrr_at_20_std value: -1.2209728080684497 - type: nauc_mrr_at_3_diff1 value: 53.43281175319808 - type: nauc_mrr_at_3_max value: 41.735942650867926 - type: nauc_mrr_at_3_std value: -2.462688102468019 - type: nauc_mrr_at_5_diff1 value: 52.874037126566606 - type: nauc_mrr_at_5_max value: 41.93740449458822 - type: nauc_mrr_at_5_std value: -1.2928874908441947 - type: nauc_ndcg_at_1000_diff1 value: 46.5532425476402 - type: nauc_ndcg_at_1000_max value: 40.369611603370515 - type: nauc_ndcg_at_1000_std value: 3.472567588386994 - type: nauc_ndcg_at_100_diff1 value: 45.75244404695404 - type: nauc_ndcg_at_100_max value: 39.36470550675439 - type: nauc_ndcg_at_100_std value: 4.356189041115731 - type: nauc_ndcg_at_10_diff1 value: 46.005135323539704 - type: nauc_ndcg_at_10_max value: 37.89018165334218 - type: 
nauc_ndcg_at_10_std value: 0.7129618297768014 - type: nauc_ndcg_at_1_diff1 value: 54.810335521617326 - type: nauc_ndcg_at_1_max value: 40.733260207843394 - type: nauc_ndcg_at_1_std value: -4.452554921565855 - type: nauc_ndcg_at_20_diff1 value: 45.841552790490034 - type: nauc_ndcg_at_20_max value: 38.04992825472661 - type: nauc_ndcg_at_20_std value: 1.2748305707955212 - type: nauc_ndcg_at_3_diff1 value: 46.683033449357744 - type: nauc_ndcg_at_3_max value: 37.46397870760607 - type: nauc_ndcg_at_3_std value: -2.3421854966319824 - type: nauc_ndcg_at_5_diff1 value: 45.82409645378457 - type: nauc_ndcg_at_5_max value: 36.27588234096716 - type: nauc_ndcg_at_5_std value: -1.5141197170944254 - type: nauc_precision_at_1000_diff1 value: -3.137944321071885 - type: nauc_precision_at_1000_max value: 24.12803166253776 - type: nauc_precision_at_1000_std value: 11.076454789944101 - type: nauc_precision_at_100_diff1 value: 3.9896283891401048 - type: nauc_precision_at_100_max value: 31.00198316788829 - type: nauc_precision_at_100_std value: 15.725887643803063 - type: nauc_precision_at_10_diff1 value: 20.493420889888394 - type: nauc_precision_at_10_max value: 41.689699671507405 - type: nauc_precision_at_10_std value: 9.374983385669914 - type: nauc_precision_at_1_diff1 value: 54.810335521617326 - type: nauc_precision_at_1_max value: 40.733260207843394 - type: nauc_precision_at_1_std value: -4.452554921565855 - type: nauc_precision_at_20_diff1 value: 15.02911800246446 - type: nauc_precision_at_20_max value: 39.227068888505 - type: nauc_precision_at_20_std value: 11.755558515319404 - type: nauc_precision_at_3_diff1 value: 34.044986535461746 - type: nauc_precision_at_3_max value: 40.96605829831656 - type: nauc_precision_at_3_std value: 1.1903535705688038 - type: nauc_precision_at_5_diff1 value: 26.617002443432707 - type: nauc_precision_at_5_max value: 40.60413785916794 - type: nauc_precision_at_5_std value: 3.6984531670502814 - type: nauc_recall_at_1000_diff1 value: 26.96489389440101 - type: nauc_recall_at_1000_max value: 41.811583968523955 - type: nauc_recall_at_1000_std value: 41.5719519496712 - type: nauc_recall_at_100_diff1 value: 28.50851434908223 - type: nauc_recall_at_100_max value: 32.19528060706322 - type: nauc_recall_at_100_std value: 25.56935294258179 - type: nauc_recall_at_10_diff1 value: 35.139582891180964 - type: nauc_recall_at_10_max value: 32.15221840434225 - type: nauc_recall_at_10_std value: 5.550434611582702 - type: nauc_recall_at_1_diff1 value: 47.72764937952259 - type: nauc_recall_at_1_max value: 24.83306559709341 - type: nauc_recall_at_1_std value: -4.714128457297418 - type: nauc_recall_at_20_diff1 value: 32.78604811055205 - type: nauc_recall_at_20_max value: 29.62940720700254 - type: nauc_recall_at_20_std value: 6.769941491859872 - type: nauc_recall_at_3_diff1 value: 40.76090616138699 - type: nauc_recall_at_3_max value: 27.506425490226867 - type: nauc_recall_at_3_std value: -2.608872693119243 - type: nauc_recall_at_5_diff1 value: 37.06532485024711 - type: nauc_recall_at_5_max value: 27.704150556658448 - type: nauc_recall_at_5_std value: 0.4718707152343872 - type: ndcg_at_1 value: 41.821000000000005 - type: ndcg_at_10 value: 42.396 - type: ndcg_at_100 value: 49.370000000000005 - type: ndcg_at_1000 value: 52.251000000000005 - type: ndcg_at_20 value: 45.097 - type: ndcg_at_3 value: 39.028 - type: ndcg_at_5 value: 40.222 - type: precision_at_1 value: 41.821000000000005 - type: precision_at_10 value: 11.451 - type: precision_at_100 value: 1.863 - type: precision_at_1000 value: 0.23900000000000002 
- type: precision_at_20 value: 6.798 - type: precision_at_3 value: 25.823 - type: precision_at_5 value: 18.735 - type: recall_at_1 value: 21.711 - type: recall_at_10 value: 48.862 - type: recall_at_100 value: 74.708 - type: recall_at_1000 value: 91.865 - type: recall_at_20 value: 57.50999999999999 - type: recall_at_3 value: 35.85 - type: recall_at_5 value: 41.976 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa metrics: - type: main_score value: 72.21 - type: map_at_1 value: 39.487 - type: map_at_10 value: 63.949999999999996 - type: map_at_100 value: 64.873 - type: map_at_1000 value: 64.927 - type: map_at_20 value: 64.529 - type: map_at_3 value: 60.243 - type: map_at_5 value: 62.613 - type: mrr_at_1 value: 78.97366644159351 - type: mrr_at_10 value: 84.84600173627825 - type: mrr_at_100 value: 85.0172804866798 - type: mrr_at_1000 value: 85.02245651152857 - type: mrr_at_20 value: 84.9625577788225 - type: mrr_at_3 value: 83.90276839972962 - type: mrr_at_5 value: 84.48278190411845 - type: nauc_map_at_1000_diff1 value: 19.825004700775164 - type: nauc_map_at_1000_max value: 19.943221724164182 - type: nauc_map_at_1000_std value: 10.068951166560058 - type: nauc_map_at_100_diff1 value: 19.80139472181137 - type: nauc_map_at_100_max value: 19.938006132804347 - type: nauc_map_at_100_std value: 10.100008107666842 - type: nauc_map_at_10_diff1 value: 19.53604502514735 - type: nauc_map_at_10_max value: 19.62768870331064 - type: nauc_map_at_10_std value: 9.446859074725705 - type: nauc_map_at_1_diff1 value: 67.7764270505257 - type: nauc_map_at_1_max value: 38.45166604737058 - type: nauc_map_at_1_std value: 1.9919181988552352 - type: nauc_map_at_20_diff1 value: 19.635871913149913 - type: nauc_map_at_20_max value: 19.812838965919155 - type: nauc_map_at_20_std value: 9.905163140101845 - type: nauc_map_at_3_diff1 value: 18.965707122532212 - type: nauc_map_at_3_max value: 17.878860313056517 - type: nauc_map_at_3_std value: 6.189378752019195 - type: nauc_map_at_5_diff1 value: 19.493354049675954 - type: nauc_map_at_5_max value: 19.24527088109141 - type: nauc_map_at_5_std value: 8.283883139680066 - type: nauc_mrr_at_1000_diff1 value: 66.87150374356781 - type: nauc_mrr_at_1000_max value: 41.413456443203984 - type: nauc_mrr_at_1000_std value: 4.140387282484357 - type: nauc_mrr_at_100_diff1 value: 66.87178015619061 - type: nauc_mrr_at_100_max value: 41.419754763150834 - type: nauc_mrr_at_100_std value: 4.15222235416704 - type: nauc_mrr_at_10_diff1 value: 66.89720586892301 - type: nauc_mrr_at_10_max value: 41.56353878125211 - type: nauc_mrr_at_10_std value: 4.213376519922392 - type: nauc_mrr_at_1_diff1 value: 67.7764270505257 - type: nauc_mrr_at_1_max value: 38.45166604737058 - type: nauc_mrr_at_1_std value: 1.9919181988552352 - type: nauc_mrr_at_20_diff1 value: 66.8714688713149 - type: nauc_mrr_at_20_max value: 41.46170778986735 - type: nauc_mrr_at_20_std value: 4.165154741309859 - type: nauc_mrr_at_3_diff1 value: 66.31615462679144 - type: nauc_mrr_at_3_max value: 41.419637693259936 - type: nauc_mrr_at_3_std value: 3.814834551396097 - type: nauc_mrr_at_5_diff1 value: 66.7289413087213 - type: nauc_mrr_at_5_max value: 41.668346356371586 - type: nauc_mrr_at_5_std value: 4.116331539882484 - type: nauc_ndcg_at_1000_diff1 value: 26.37325375970598 - type: nauc_ndcg_at_1000_max value: 24.850915174721735 - type: nauc_ndcg_at_1000_std value: 13.37585683440429 - type: nauc_ndcg_at_100_diff1 value: 25.591771178059503 - type: 
nauc_ndcg_at_100_max value: 24.562820829532473 - type: nauc_ndcg_at_100_std value: 14.093690500501541 - type: nauc_ndcg_at_10_diff1 value: 24.64600598115805 - type: nauc_ndcg_at_10_max value: 23.543499404760023 - type: nauc_ndcg_at_10_std value: 11.55823632781553 - type: nauc_ndcg_at_1_diff1 value: 67.7764270505257 - type: nauc_ndcg_at_1_max value: 38.45166604737058 - type: nauc_ndcg_at_1_std value: 1.9919181988552352 - type: nauc_ndcg_at_20_diff1 value: 24.757843275306726 - type: nauc_ndcg_at_20_max value: 23.951154200380827 - type: nauc_ndcg_at_20_std value: 12.931320453044886 - type: nauc_ndcg_at_3_diff1 value: 24.37742630418847 - type: nauc_ndcg_at_3_max value: 21.310512304883723 - type: nauc_ndcg_at_3_std value: 6.503993200818077 - type: nauc_ndcg_at_5_diff1 value: 24.813706829269716 - type: nauc_ndcg_at_5_max value: 22.993657212898 - type: nauc_ndcg_at_5_std value: 9.34462052506809 - type: nauc_precision_at_1000_diff1 value: -0.6506415756958156 - type: nauc_precision_at_1000_max value: 28.039755644694875 - type: nauc_precision_at_1000_std value: 53.46474329623814 - type: nauc_precision_at_100_diff1 value: 3.78462668236152 - type: nauc_precision_at_100_max value: 22.501700881673862 - type: nauc_precision_at_100_std value: 40.56672716474142 - type: nauc_precision_at_10_diff1 value: 9.156113228907534 - type: nauc_precision_at_10_max value: 19.734206254833254 - type: nauc_precision_at_10_std value: 19.986282545779602 - type: nauc_precision_at_1_diff1 value: 67.7764270505257 - type: nauc_precision_at_1_max value: 38.45166604737058 - type: nauc_precision_at_1_std value: 1.9919181988552352 - type: nauc_precision_at_20_diff1 value: 6.6164335644470125 - type: nauc_precision_at_20_max value: 20.29343459608317 - type: nauc_precision_at_20_std value: 26.51115475333977 - type: nauc_precision_at_3_diff1 value: 12.476520554399546 - type: nauc_precision_at_3_max value: 16.69401409858964 - type: nauc_precision_at_3_std value: 8.165880294907444 - type: nauc_precision_at_5_diff1 value: 11.783242828320958 - type: nauc_precision_at_5_max value: 19.0679467875759 - type: nauc_precision_at_5_std value: 13.615358345509884 - type: nauc_recall_at_1000_diff1 value: -0.6506415756960168 - type: nauc_recall_at_1000_max value: 28.039755644694786 - type: nauc_recall_at_1000_std value: 53.46474329623801 - type: nauc_recall_at_100_diff1 value: 3.7846266823613877 - type: nauc_recall_at_100_max value: 22.501700881674008 - type: nauc_recall_at_100_std value: 40.566727164741366 - type: nauc_recall_at_10_diff1 value: 9.15611322890755 - type: nauc_recall_at_10_max value: 19.73420625483318 - type: nauc_recall_at_10_std value: 19.98628254577951 - type: nauc_recall_at_1_diff1 value: 67.7764270505257 - type: nauc_recall_at_1_max value: 38.45166604737058 - type: nauc_recall_at_1_std value: 1.9919181988552352 - type: nauc_recall_at_20_diff1 value: 6.616433564446929 - type: nauc_recall_at_20_max value: 20.293434596083248 - type: nauc_recall_at_20_std value: 26.5111547533396 - type: nauc_recall_at_3_diff1 value: 12.476520554399531 - type: nauc_recall_at_3_max value: 16.69401409858966 - type: nauc_recall_at_3_std value: 8.165880294907438 - type: nauc_recall_at_5_diff1 value: 11.783242828320999 - type: nauc_recall_at_5_max value: 19.067946787575845 - type: nauc_recall_at_5_std value: 13.61535834550991 - type: ndcg_at_1 value: 78.974 - type: ndcg_at_10 value: 72.21 - type: ndcg_at_100 value: 75.264 - type: ndcg_at_1000 value: 76.259 - type: ndcg_at_20 value: 73.628 - type: ndcg_at_3 value: 67.047 - type: ndcg_at_5 value: 69.974 - 
type: precision_at_1 value: 78.974 - type: precision_at_10 value: 15.267 - type: precision_at_100 value: 1.762 - type: precision_at_1000 value: 0.189 - type: precision_at_20 value: 8.09 - type: precision_at_3 value: 43.309 - type: precision_at_5 value: 28.294000000000004 - type: recall_at_1 value: 39.487 - type: recall_at_10 value: 76.334 - type: recall_at_100 value: 88.076 - type: recall_at_1000 value: 94.59100000000001 - type: recall_at_20 value: 80.898 - type: recall_at_3 value: 64.96300000000001 - type: recall_at_5 value: 70.736 task: type: Retrieval - dataset: config: default name: MTEB MSMARCO revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: - type: main_score value: 42.027 - type: map_at_1 value: 22.118 - type: map_at_10 value: 34.816 - type: map_at_100 value: 35.983 - type: map_at_1000 value: 36.028999999999996 - type: map_at_20 value: 35.545 - type: map_at_3 value: 30.752000000000002 - type: map_at_5 value: 33.114 - type: mrr_at_1 value: 22.793696275071635 - type: mrr_at_10 value: 35.47250079592483 - type: mrr_at_100 value: 36.576471512902856 - type: mrr_at_1000 value: 36.616205680509786 - type: mrr_at_20 value: 36.16557033864942 - type: mrr_at_3 value: 31.48758357211065 - type: mrr_at_5 value: 33.80563514804202 - type: nauc_map_at_1000_diff1 value: 32.89234100489284 - type: nauc_map_at_1000_max value: 1.1802816553581001 - type: nauc_map_at_1000_std value: -20.187692925732446 - type: nauc_map_at_100_diff1 value: 32.88694493681772 - type: nauc_map_at_100_max value: 1.1732717578080365 - type: nauc_map_at_100_std value: -20.164165529035245 - type: nauc_map_at_10_diff1 value: 32.826182211848796 - type: nauc_map_at_10_max value: 1.1551262165737235 - type: nauc_map_at_10_std value: -20.88326292319754 - type: nauc_map_at_1_diff1 value: 36.12732122790642 - type: nauc_map_at_1_max value: 1.8197550109156913 - type: nauc_map_at_1_std value: -17.205625720792167 - type: nauc_map_at_20_diff1 value: 32.83333177195551 - type: nauc_map_at_20_max value: 1.0937431645506202 - type: nauc_map_at_20_std value: -20.503956514646145 - type: nauc_map_at_3_diff1 value: 32.76264193805814 - type: nauc_map_at_3_max value: 0.8560962042500389 - type: nauc_map_at_3_std value: -20.608930717315577 - type: nauc_map_at_5_diff1 value: 32.78673238978775 - type: nauc_map_at_5_max value: 1.0511863039329437 - type: nauc_map_at_5_std value: -21.02164728626011 - type: nauc_mrr_at_1000_diff1 value: 32.610323934702286 - type: nauc_mrr_at_1000_max value: 1.276669121901405 - type: nauc_mrr_at_1000_std value: -19.908120615285043 - type: nauc_mrr_at_100_diff1 value: 32.601373758102795 - type: nauc_mrr_at_100_max value: 1.2752735149992132 - type: nauc_mrr_at_100_std value: -19.87937042610101 - type: nauc_mrr_at_10_diff1 value: 32.55795432078168 - type: nauc_mrr_at_10_max value: 1.2881786969258637 - type: nauc_mrr_at_10_std value: -20.54564519015977 - type: nauc_mrr_at_1_diff1 value: 35.596301376443726 - type: nauc_mrr_at_1_max value: 1.7633238037306902 - type: nauc_mrr_at_1_std value: -17.1999420019887 - type: nauc_mrr_at_20_diff1 value: 32.57185739111023 - type: nauc_mrr_at_20_max value: 1.2212620853201877 - type: nauc_mrr_at_20_std value: -20.179517281041264 - type: nauc_mrr_at_3_diff1 value: 32.42681377099514 - type: nauc_mrr_at_3_max value: 0.8745921708861145 - type: nauc_mrr_at_3_std value: -20.41017687790572 - type: nauc_mrr_at_5_diff1 value: 32.499107129648266 - type: nauc_mrr_at_5_max value: 1.1159673851851573 - type: nauc_mrr_at_5_std value: -20.695143502133824 - type: 
nauc_ndcg_at_1000_diff1 value: 32.16957965806702 - type: nauc_ndcg_at_1000_max value: 1.6763998947980905 - type: nauc_ndcg_at_1000_std value: -18.970592350332893 - type: nauc_ndcg_at_100_diff1 value: 31.977550102558872 - type: nauc_ndcg_at_100_max value: 1.5625858650110014 - type: nauc_ndcg_at_100_std value: -17.990456766123835 - type: nauc_ndcg_at_10_diff1 value: 31.82738932481356 - type: nauc_ndcg_at_10_max value: 1.1661362042692103 - type: nauc_ndcg_at_10_std value: -21.872680193994217 - type: nauc_ndcg_at_1_diff1 value: 35.596301376443726 - type: nauc_ndcg_at_1_max value: 1.7633238037306902 - type: nauc_ndcg_at_1_std value: -17.1999420019887 - type: nauc_ndcg_at_20_diff1 value: 31.749656399266264 - type: nauc_ndcg_at_20_max value: 0.9629024493088691 - type: nauc_ndcg_at_20_std value: -20.4379403899277 - type: nauc_ndcg_at_3_diff1 value: 31.731361436850836 - type: nauc_ndcg_at_3_max value: 0.531749791578849 - type: nauc_ndcg_at_3_std value: -21.551112910698674 - type: nauc_ndcg_at_5_diff1 value: 31.785373941157303 - type: nauc_ndcg_at_5_max value: 0.86207769368333 - type: nauc_ndcg_at_5_std value: -22.24923399160171 - type: nauc_precision_at_1000_diff1 value: -3.841288331986519 - type: nauc_precision_at_1000_max value: 13.558041371634976 - type: nauc_precision_at_1000_std value: 15.181510484512827 - type: nauc_precision_at_100_diff1 value: 12.441154582709053 - type: nauc_precision_at_100_max value: 8.428136255841935 - type: nauc_precision_at_100_std value: 14.710391839731656 - type: nauc_precision_at_10_diff1 value: 26.185854813986705 - type: nauc_precision_at_10_max value: 1.6348387310504464 - type: nauc_precision_at_10_std value: -23.448927004357298 - type: nauc_precision_at_1_diff1 value: 35.596301376443726 - type: nauc_precision_at_1_max value: 1.7633238037306902 - type: nauc_precision_at_1_std value: -17.1999420019887 - type: nauc_precision_at_20_diff1 value: 22.69194179544158 - type: nauc_precision_at_20_max value: 1.2972015009169306 - type: nauc_precision_at_20_std value: -15.751482380060269 - type: nauc_precision_at_3_diff1 value: 28.255531512125188 - type: nauc_precision_at_3_max value: -0.3715575458464333 - type: nauc_precision_at_3_std value: -24.227970454057697 - type: nauc_precision_at_5_diff1 value: 27.65497951098847 - type: nauc_precision_at_5_max value: 0.449773375292472 - type: nauc_precision_at_5_std value: -25.37445450938601 - type: nauc_recall_at_1000_diff1 value: 15.243948516763819 - type: nauc_recall_at_1000_max value: 41.821227805251375 - type: nauc_recall_at_1000_std value: 61.66297794838101 - type: nauc_recall_at_100_diff1 value: 24.516543685029994 - type: nauc_recall_at_100_max value: 7.093972966253228 - type: nauc_recall_at_100_std value: 17.244452321212282 - type: nauc_recall_at_10_diff1 value: 28.404243095182828 - type: nauc_recall_at_10_max value: 1.0805210480930945 - type: nauc_recall_at_10_std value: -24.885018657039527 - type: nauc_recall_at_1_diff1 value: 36.12732122790642 - type: nauc_recall_at_1_max value: 1.8197550109156913 - type: nauc_recall_at_1_std value: -17.205625720792167 - type: nauc_recall_at_20_diff1 value: 26.956250169438512 - type: nauc_recall_at_20_max value: 0.023973408161285917 - type: nauc_recall_at_20_std value: -18.32944444428131 - type: nauc_recall_at_3_diff1 value: 28.9894205130054 - type: nauc_recall_at_3_max value: -0.36140658021466865 - type: nauc_recall_at_3_std value: -24.022505107768364 - type: nauc_recall_at_5_diff1 value: 28.907023434955104 - type: nauc_recall_at_5_max value: 0.2501037567297729 - type: nauc_recall_at_5_std 
value: -25.719919602271496 - type: ndcg_at_1 value: 22.794 - type: ndcg_at_10 value: 42.027 - type: ndcg_at_100 value: 47.601 - type: ndcg_at_1000 value: 48.713 - type: ndcg_at_20 value: 44.623000000000005 - type: ndcg_at_3 value: 33.772999999999996 - type: ndcg_at_5 value: 37.991 - type: precision_at_1 value: 22.794 - type: precision_at_10 value: 6.711 - type: precision_at_100 value: 0.9490000000000001 - type: precision_at_1000 value: 0.105 - type: precision_at_20 value: 3.8920000000000003 - type: precision_at_3 value: 14.46 - type: precision_at_5 value: 10.822 - type: recall_at_1 value: 22.118 - type: recall_at_10 value: 64.201 - type: recall_at_100 value: 89.878 - type: recall_at_1000 value: 98.259 - type: recall_at_20 value: 74.34100000000001 - type: recall_at_3 value: 41.8 - type: recall_at_5 value: 51.959 task: type: Retrieval - dataset: config: default name: MTEB NFCorpus revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: mteb/nfcorpus metrics: - type: main_score value: 36.201 - type: map_at_1 value: 5.654 - type: map_at_10 value: 13.402 - type: map_at_100 value: 16.849 - type: map_at_1000 value: 18.264 - type: map_at_20 value: 14.832 - type: map_at_3 value: 9.619 - type: map_at_5 value: 11.483 - type: mrr_at_1 value: 47.6780185758514 - type: mrr_at_10 value: 56.47906531033466 - type: mrr_at_100 value: 57.04539749991402 - type: mrr_at_1000 value: 57.08810157607369 - type: mrr_at_20 value: 56.88003170105462 - type: mrr_at_3 value: 54.43756449948401 - type: mrr_at_5 value: 55.660474716202266 - type: nauc_map_at_1000_diff1 value: 31.134615238698192 - type: nauc_map_at_1000_max value: 36.09522002487132 - type: nauc_map_at_1000_std value: 14.72627666649002 - type: nauc_map_at_100_diff1 value: 32.777473351864444 - type: nauc_map_at_100_max value: 35.25391471621035 - type: nauc_map_at_100_std value: 12.024428973861083 - type: nauc_map_at_10_diff1 value: 36.46466466148528 - type: nauc_map_at_10_max value: 29.707805406826722 - type: nauc_map_at_10_std value: 2.0678757794226335 - type: nauc_map_at_1_diff1 value: 54.30208426149679 - type: nauc_map_at_1_max value: 18.69125148481608 - type: nauc_map_at_1_std value: -8.970955660291802 - type: nauc_map_at_20_diff1 value: 34.76513311600623 - type: nauc_map_at_20_max value: 32.20666003570514 - type: nauc_map_at_20_std value: 5.924889441518581 - type: nauc_map_at_3_diff1 value: 45.73465176835491 - type: nauc_map_at_3_max value: 23.492291524989106 - type: nauc_map_at_3_std value: -5.0123536561688855 - type: nauc_map_at_5_diff1 value: 39.7128319374107 - type: nauc_map_at_5_max value: 25.84231729559691 - type: nauc_map_at_5_std value: -2.0861428981140344 - type: nauc_mrr_at_1000_diff1 value: 33.0997881703397 - type: nauc_mrr_at_1000_max value: 52.7089709923531 - type: nauc_mrr_at_1000_std value: 28.8517952674151 - type: nauc_mrr_at_100_diff1 value: 33.1094984027438 - type: nauc_mrr_at_100_max value: 52.74301398138847 - type: nauc_mrr_at_100_std value: 28.897997840300892 - type: nauc_mrr_at_10_diff1 value: 33.300713655464925 - type: nauc_mrr_at_10_max value: 52.572139698742184 - type: nauc_mrr_at_10_std value: 28.66875615527188 - type: nauc_mrr_at_1_diff1 value: 32.57632582147155 - type: nauc_mrr_at_1_max value: 46.020072246328816 - type: nauc_mrr_at_1_std value: 20.99097889820076 - type: nauc_mrr_at_20_diff1 value: 33.04083904518949 - type: nauc_mrr_at_20_max value: 52.597451362456994 - type: nauc_mrr_at_20_std value: 28.681527293587898 - type: nauc_mrr_at_3_diff1 value: 33.64864656322754 - type: nauc_mrr_at_3_max value: 
51.82256412011279 - type: nauc_mrr_at_3_std value: 27.241260746740686 - type: nauc_mrr_at_5_diff1 value: 33.53201325467246 - type: nauc_mrr_at_5_max value: 52.79440885773516 - type: nauc_mrr_at_5_std value: 28.663081392086028 - type: nauc_ndcg_at_1000_diff1 value: 28.632650542040714 - type: nauc_ndcg_at_1000_max value: 51.24103069835822 - type: nauc_ndcg_at_1000_std value: 35.05503784757999 - type: nauc_ndcg_at_100_diff1 value: 29.082177715298503 - type: nauc_ndcg_at_100_max value: 45.24750203464315 - type: nauc_ndcg_at_100_std value: 27.146548925680914 - type: nauc_ndcg_at_10_diff1 value: 25.123554466093594 - type: nauc_ndcg_at_10_max value: 42.74355537806512 - type: nauc_ndcg_at_10_std value: 22.234407997803935 - type: nauc_ndcg_at_1_diff1 value: 33.75083940012058 - type: nauc_ndcg_at_1_max value: 44.44319402133161 - type: nauc_ndcg_at_1_std value: 19.146499358406487 - type: nauc_ndcg_at_20_diff1 value: 24.954207968331872 - type: nauc_ndcg_at_20_max value: 41.25991844405748 - type: nauc_ndcg_at_20_std value: 22.169009285868864 - type: nauc_ndcg_at_3_diff1 value: 28.186539942033516 - type: nauc_ndcg_at_3_max value: 44.40790009754965 - type: nauc_ndcg_at_3_std value: 20.99226576085115 - type: nauc_ndcg_at_5_diff1 value: 25.498387899376706 - type: nauc_ndcg_at_5_max value: 43.174709766261316 - type: nauc_ndcg_at_5_std value: 21.88111962672031 - type: nauc_precision_at_1000_diff1 value: -16.22321012507648 - type: nauc_precision_at_1000_max value: 5.808852256649677 - type: nauc_precision_at_1000_std value: 19.875641776698824 - type: nauc_precision_at_100_diff1 value: -10.248089374355486 - type: nauc_precision_at_100_max value: 19.29065415127588 - type: nauc_precision_at_100_std value: 31.75019665627339 - type: nauc_precision_at_10_diff1 value: 3.6783257583955056 - type: nauc_precision_at_10_max value: 39.22286010695767 - type: nauc_precision_at_10_std value: 31.225485732801022 - type: nauc_precision_at_1_diff1 value: 32.57632582147155 - type: nauc_precision_at_1_max value: 46.020072246328816 - type: nauc_precision_at_1_std value: 20.99097889820076 - type: nauc_precision_at_20_diff1 value: -3.1632510833242784 - type: nauc_precision_at_20_max value: 31.575496762405734 - type: nauc_precision_at_20_std value: 31.576283324468115 - type: nauc_precision_at_3_diff1 value: 17.78864585545647 - type: nauc_precision_at_3_max value: 44.201289661125585 - type: nauc_precision_at_3_std value: 25.447840649726693 - type: nauc_precision_at_5_diff1 value: 9.986748662091358 - type: nauc_precision_at_5_max value: 41.214164860776755 - type: nauc_precision_at_5_std value: 28.22551704127726 - type: nauc_recall_at_1000_diff1 value: 10.984331766850506 - type: nauc_recall_at_1000_max value: 24.641216018034104 - type: nauc_recall_at_1000_std value: 26.91064221008446 - type: nauc_recall_at_100_diff1 value: 23.7009352078473 - type: nauc_recall_at_100_max value: 30.176031609451297 - type: nauc_recall_at_100_std value: 20.360365243211564 - type: nauc_recall_at_10_diff1 value: 28.11831737650638 - type: nauc_recall_at_10_max value: 24.21539670487414 - type: nauc_recall_at_10_std value: 2.245504974150148 - type: nauc_recall_at_1_diff1 value: 54.30208426149679 - type: nauc_recall_at_1_max value: 18.69125148481608 - type: nauc_recall_at_1_std value: -8.970955660291802 - type: nauc_recall_at_20_diff1 value: 26.199425305139908 - type: nauc_recall_at_20_max value: 24.66704097503736 - type: nauc_recall_at_20_std value: 5.86052107206246 - type: nauc_recall_at_3_diff1 value: 42.88348677575622 - type: nauc_recall_at_3_max value: 
21.189371077603308 - type: nauc_recall_at_3_std value: -4.537510127238226 - type: nauc_recall_at_5_diff1 value: 30.7936756722569 - type: nauc_recall_at_5_max value: 21.06136406164962 - type: nauc_recall_at_5_std value: -1.4113804735229794 - type: ndcg_at_1 value: 45.975 - type: ndcg_at_10 value: 36.201 - type: ndcg_at_100 value: 32.736 - type: ndcg_at_1000 value: 41.099000000000004 - type: ndcg_at_20 value: 33.724 - type: ndcg_at_3 value: 42.242000000000004 - type: ndcg_at_5 value: 40.137 - type: precision_at_1 value: 47.678 - type: precision_at_10 value: 26.904 - type: precision_at_100 value: 8.368 - type: precision_at_1000 value: 2.078 - type: precision_at_20 value: 19.845 - type: precision_at_3 value: 40.351 - type: precision_at_5 value: 35.108 - type: recall_at_1 value: 5.654 - type: recall_at_10 value: 17.793 - type: recall_at_100 value: 32.483000000000004 - type: recall_at_1000 value: 63.294 - type: recall_at_20 value: 21.754 - type: recall_at_3 value: 10.771 - type: recall_at_5 value: 14.084 task: type: Retrieval - dataset: config: default name: MTEB NQ revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: main_score value: 62.464 - type: map_at_1 value: 38.0 - type: map_at_10 value: 54.806 - type: map_at_100 value: 55.599 - type: map_at_1000 value: 55.617000000000004 - type: map_at_20 value: 55.336 - type: map_at_3 value: 50.58200000000001 - type: map_at_5 value: 53.181 - type: mrr_at_1 value: 42.46813441483198 - type: mrr_at_10 value: 57.060710147326446 - type: mrr_at_100 value: 57.60978373431328 - type: mrr_at_1000 value: 57.62192762809547 - type: mrr_at_20 value: 57.43431796174232 - type: mrr_at_3 value: 53.78041714947835 - type: mrr_at_5 value: 55.81257242178437 - type: nauc_map_at_1000_diff1 value: 38.337572188308194 - type: nauc_map_at_1000_max value: 27.550035254787197 - type: nauc_map_at_1000_std value: -7.5513729587308145 - type: nauc_map_at_100_diff1 value: 38.335337794455015 - type: nauc_map_at_100_max value: 27.56919614414171 - type: nauc_map_at_100_std value: -7.526017855405723 - type: nauc_map_at_10_diff1 value: 38.308131361353816 - type: nauc_map_at_10_max value: 27.691849580929933 - type: nauc_map_at_10_std value: -7.971461731555123 - type: nauc_map_at_1_diff1 value: 42.721072690634884 - type: nauc_map_at_1_max value: 21.750451486885332 - type: nauc_map_at_1_std value: -9.99540950522643 - type: nauc_map_at_20_diff1 value: 38.25792874982169 - type: nauc_map_at_20_max value: 27.68877906159661 - type: nauc_map_at_20_std value: -7.560753583212102 - type: nauc_map_at_3_diff1 value: 37.950570055936254 - type: nauc_map_at_3_max value: 26.257969511794858 - type: nauc_map_at_3_std value: -9.236868658300553 - type: nauc_map_at_5_diff1 value: 37.99893219450212 - type: nauc_map_at_5_max value: 27.293454259158057 - type: nauc_map_at_5_std value: -8.734089449603806 - type: nauc_mrr_at_1000_diff1 value: 37.777767467474774 - type: nauc_mrr_at_1000_max value: 27.39507603748298 - type: nauc_mrr_at_1000_std value: -5.554754076870114 - type: nauc_mrr_at_100_diff1 value: 37.77981674583538 - type: nauc_mrr_at_100_max value: 27.411100989441557 - type: nauc_mrr_at_100_std value: -5.539061231412731 - type: nauc_mrr_at_10_diff1 value: 37.72399003363479 - type: nauc_mrr_at_10_max value: 27.618142546685416 - type: nauc_mrr_at_10_std value: -5.6819843907448195 - type: nauc_mrr_at_1_diff1 value: 41.17596078958236 - type: nauc_mrr_at_1_max value: 23.32588591818617 - type: nauc_mrr_at_1_std value: -7.126628034623689 - type: nauc_mrr_at_20_diff1 value: 
37.695136721588 - type: nauc_mrr_at_20_max value: 27.52850676467322 - type: nauc_mrr_at_20_std value: -5.50667995515647 - type: nauc_mrr_at_3_diff1 value: 37.23845700908964 - type: nauc_mrr_at_3_max value: 26.69389772971012 - type: nauc_mrr_at_3_std value: -6.31868405989011 - type: nauc_mrr_at_5_diff1 value: 37.33757394192838 - type: nauc_mrr_at_5_max value: 27.42091593836207 - type: nauc_mrr_at_5_std value: -5.993243330132065 - type: nauc_ndcg_at_1000_diff1 value: 37.74836061640332 - type: nauc_ndcg_at_1000_max value: 29.03148916289089 - type: nauc_ndcg_at_1000_std value: -5.543065770074502 - type: nauc_ndcg_at_100_diff1 value: 37.75593955089626 - type: nauc_ndcg_at_100_max value: 29.67109480272493 - type: nauc_ndcg_at_100_std value: -4.773697596687493 - type: nauc_ndcg_at_10_diff1 value: 37.41701174824348 - type: nauc_ndcg_at_10_max value: 30.448703434043445 - type: nauc_ndcg_at_10_std value: -6.306202666419071 - type: nauc_ndcg_at_1_diff1 value: 41.17596078958236 - type: nauc_ndcg_at_1_max value: 23.32588591818617 - type: nauc_ndcg_at_1_std value: -7.126628034623689 - type: nauc_ndcg_at_20_diff1 value: 37.17445197824622 - type: nauc_ndcg_at_20_max value: 30.47378561555209 - type: nauc_ndcg_at_20_std value: -4.921584853993488 - type: nauc_ndcg_at_3_diff1 value: 36.5261976812068 - type: nauc_ndcg_at_3_max value: 27.560538820208926 - type: nauc_ndcg_at_3_std value: -8.556686332882931 - type: nauc_ndcg_at_5_diff1 value: 36.571462759614526 - type: nauc_ndcg_at_5_max value: 29.363401730752585 - type: nauc_ndcg_at_5_std value: -7.825739170420347 - type: nauc_precision_at_1000_diff1 value: -12.588899483401223 - type: nauc_precision_at_1000_max value: 2.641097890578701 - type: nauc_precision_at_1000_std value: 17.643107625788748 - type: nauc_precision_at_100_diff1 value: -8.40579874206785 - type: nauc_precision_at_100_max value: 9.725496771040037 - type: nauc_precision_at_100_std value: 21.558582760191243 - type: nauc_precision_at_10_diff1 value: 6.619157191854486 - type: nauc_precision_at_10_max value: 23.767406373688402 - type: nauc_precision_at_10_std value: 10.428535003478808 - type: nauc_precision_at_1_diff1 value: 41.17596078958236 - type: nauc_precision_at_1_max value: 23.32588591818617 - type: nauc_precision_at_1_std value: -7.126628034623689 - type: nauc_precision_at_20_diff1 value: -0.6449974218292859 - type: nauc_precision_at_20_max value: 20.211503851418783 - type: nauc_precision_at_20_std value: 17.922745410142575 - type: nauc_precision_at_3_diff1 value: 19.710276097428657 - type: nauc_precision_at_3_max value: 26.768918044758706 - type: nauc_precision_at_3_std value: -1.0636448912049246 - type: nauc_precision_at_5_diff1 value: 13.073181337982613 - type: nauc_precision_at_5_max value: 26.418340338971024 - type: nauc_precision_at_5_std value: 2.9842078949528688 - type: nauc_recall_at_1000_diff1 value: 30.52411148739828 - type: nauc_recall_at_1000_max value: 90.96409807536762 - type: nauc_recall_at_1000_std value: 83.94857830921949 - type: nauc_recall_at_100_diff1 value: 36.936303690592155 - type: nauc_recall_at_100_max value: 71.91515014325869 - type: nauc_recall_at_100_std value: 48.93061263403371 - type: nauc_recall_at_10_diff1 value: 32.84292362076269 - type: nauc_recall_at_10_max value: 44.27252783122478 - type: nauc_recall_at_10_std value: -1.5981198975612385 - type: nauc_recall_at_1_diff1 value: 42.721072690634884 - type: nauc_recall_at_1_max value: 21.750451486885332 - type: nauc_recall_at_1_std value: -9.99540950522643 - type: nauc_recall_at_20_diff1 value: 29.36724417081702 
- type: nauc_recall_at_20_max value: 52.035846390214715 - type: nauc_recall_at_20_std value: 11.967264191332818 - type: nauc_recall_at_3_diff1 value: 31.634923771936098 - type: nauc_recall_at_3_max value: 30.225743369869473 - type: nauc_recall_at_3_std value: -9.253665347118615 - type: nauc_recall_at_5_diff1 value: 30.66271853090737 - type: nauc_recall_at_5_max value: 35.70815715994996 - type: nauc_recall_at_5_std value: -7.836012956078996 - type: ndcg_at_1 value: 42.468 - type: ndcg_at_10 value: 62.464 - type: ndcg_at_100 value: 65.618 - type: ndcg_at_1000 value: 66.014 - type: ndcg_at_20 value: 64.12 - type: ndcg_at_3 value: 54.790000000000006 - type: ndcg_at_5 value: 58.992 - type: precision_at_1 value: 42.468 - type: precision_at_10 value: 9.959 - type: precision_at_100 value: 1.174 - type: precision_at_1000 value: 0.121 - type: precision_at_20 value: 5.380999999999999 - type: precision_at_3 value: 24.73 - type: precision_at_5 value: 17.299999999999997 - type: recall_at_1 value: 38.0 - type: recall_at_10 value: 83.22699999999999 - type: recall_at_100 value: 96.584 - type: recall_at_1000 value: 99.512 - type: recall_at_20 value: 89.291 - type: recall_at_3 value: 63.666 - type: recall_at_5 value: 73.27900000000001 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 split: test type: mteb/quora metrics: - type: main_score value: 87.366 - type: map_at_1 value: 69.95700000000001 - type: map_at_10 value: 83.55 - type: map_at_100 value: 84.196 - type: map_at_1000 value: 84.21600000000001 - type: map_at_20 value: 83.982 - type: map_at_3 value: 80.647 - type: map_at_5 value: 82.443 - type: mrr_at_1 value: 80.39 - type: mrr_at_10 value: 86.65646031746004 - type: mrr_at_100 value: 86.7852113210373 - type: mrr_at_1000 value: 86.78651118354796 - type: mrr_at_20 value: 86.75772838878498 - type: mrr_at_3 value: 85.67499999999971 - type: mrr_at_5 value: 86.33749999999962 - type: nauc_map_at_1000_diff1 value: 76.68189702770007 - type: nauc_map_at_1000_max value: 36.19988239025682 - type: nauc_map_at_1000_std value: -26.231691135645736 - type: nauc_map_at_100_diff1 value: 76.68832712120171 - type: nauc_map_at_100_max value: 36.18627717337547 - type: nauc_map_at_100_std value: -26.28243886166 - type: nauc_map_at_10_diff1 value: 76.88888516032657 - type: nauc_map_at_10_max value: 35.69809861085124 - type: nauc_map_at_10_std value: -27.859425473864224 - type: nauc_map_at_1_diff1 value: 79.5243725217315 - type: nauc_map_at_1_max value: 27.092773841207002 - type: nauc_map_at_1_std value: -26.223200911204543 - type: nauc_map_at_20_diff1 value: 76.74938996155176 - type: nauc_map_at_20_max value: 36.07373781351406 - type: nauc_map_at_20_std value: -26.891400098628015 - type: nauc_map_at_3_diff1 value: 77.29604745045076 - type: nauc_map_at_3_max value: 33.11431059356283 - type: nauc_map_at_3_std value: -29.555237195931085 - type: nauc_map_at_5_diff1 value: 77.14069217901078 - type: nauc_map_at_5_max value: 34.68656073526487 - type: nauc_map_at_5_std value: -28.945053669861508 - type: nauc_mrr_at_1000_diff1 value: 76.66087451567746 - type: nauc_mrr_at_1000_max value: 38.78133177265328 - type: nauc_mrr_at_1000_std value: -23.75726541774991 - type: nauc_mrr_at_100_diff1 value: 76.66117078261013 - type: nauc_mrr_at_100_max value: 38.782533036423885 - type: nauc_mrr_at_100_std value: -23.752587601473568 - type: nauc_mrr_at_10_diff1 value: 76.65866401411019 - type: nauc_mrr_at_10_max value: 38.87950311049704 - type: nauc_mrr_at_10_std value: 
-23.873660706680578 - type: nauc_mrr_at_1_diff1 value: 77.42633506487041 - type: nauc_mrr_at_1_max value: 37.93973722217786 - type: nauc_mrr_at_1_std value: -23.3984130771317 - type: nauc_mrr_at_20_diff1 value: 76.66210684923414 - type: nauc_mrr_at_20_max value: 38.81293033048911 - type: nauc_mrr_at_20_std value: -23.736590746133736 - type: nauc_mrr_at_3_diff1 value: 76.33711764736019 - type: nauc_mrr_at_3_max value: 38.5659231830368 - type: nauc_mrr_at_3_std value: -23.99588149124865 - type: nauc_mrr_at_5_diff1 value: 76.57123830226054 - type: nauc_mrr_at_5_max value: 38.97947097392977 - type: nauc_mrr_at_5_std value: -23.943668957974246 - type: nauc_ndcg_at_1000_diff1 value: 76.38447339050585 - type: nauc_ndcg_at_1000_max value: 37.756822792877934 - type: nauc_ndcg_at_1000_std value: -24.046995734357164 - type: nauc_ndcg_at_100_diff1 value: 76.44058018066822 - type: nauc_ndcg_at_100_max value: 37.72948294169218 - type: nauc_ndcg_at_100_std value: -24.083432140741795 - type: nauc_ndcg_at_10_diff1 value: 76.56246287923074 - type: nauc_ndcg_at_10_max value: 37.0329253490553 - type: nauc_ndcg_at_10_std value: -26.6495163705961 - type: nauc_ndcg_at_1_diff1 value: 77.4085129990432 - type: nauc_ndcg_at_1_max value: 38.06139172214421 - type: nauc_ndcg_at_1_std value: -23.656477126977386 - type: nauc_ndcg_at_20_diff1 value: 76.50192496743098 - type: nauc_ndcg_at_20_max value: 37.51759311013985 - type: nauc_ndcg_at_20_std value: -25.45517058360004 - type: nauc_ndcg_at_3_diff1 value: 75.94398494081794 - type: nauc_ndcg_at_3_max value: 35.7666711547279 - type: nauc_ndcg_at_3_std value: -26.866022682361578 - type: nauc_ndcg_at_5_diff1 value: 76.47334274088344 - type: nauc_ndcg_at_5_max value: 36.40830331490731 - type: nauc_ndcg_at_5_std value: -27.170121189572765 - type: nauc_precision_at_1000_diff1 value: -43.33672630765437 - type: nauc_precision_at_1000_max value: -5.089751329149161 - type: nauc_precision_at_1000_std value: 30.6241447847051 - type: nauc_precision_at_100_diff1 value: -42.736833035629864 - type: nauc_precision_at_100_max value: -4.060198408346224 - type: nauc_precision_at_100_std value: 29.807050266205344 - type: nauc_precision_at_10_diff1 value: -35.90810562245906 - type: nauc_precision_at_10_max value: 1.1633204529249133 - type: nauc_precision_at_10_std value: 20.129691203276018 - type: nauc_precision_at_1_diff1 value: 77.4085129990432 - type: nauc_precision_at_1_max value: 38.06139172214421 - type: nauc_precision_at_1_std value: -23.656477126977386 - type: nauc_precision_at_20_diff1 value: -40.2132286912738 - type: nauc_precision_at_20_max value: -1.3004735030734194 - type: nauc_precision_at_20_std value: 25.15612293757488 - type: nauc_precision_at_3_diff1 value: -13.873825299883904 - type: nauc_precision_at_3_max value: 11.038689278907233 - type: nauc_precision_at_3_std value: 5.4276449621706 - type: nauc_precision_at_5_diff1 value: -27.151668633894737 - type: nauc_precision_at_5_max value: 5.795130010163115 - type: nauc_precision_at_5_std value: 13.220722167587375 - type: nauc_recall_at_1000_diff1 value: 83.903950427863 - type: nauc_recall_at_1000_max value: 37.82919000897223 - type: nauc_recall_at_1000_std value: 70.65670846771707 - type: nauc_recall_at_100_diff1 value: 75.23306095335836 - type: nauc_recall_at_100_max value: 37.54281648247423 - type: nauc_recall_at_100_std value: 8.434289114377373 - type: nauc_recall_at_10_diff1 value: 72.7872912723047 - type: nauc_recall_at_10_max value: 34.261519652104184 - type: nauc_recall_at_10_std value: -34.60101950810808 - type: 
nauc_recall_at_1_diff1 value: 79.5243725217315 - type: nauc_recall_at_1_max value: 27.092773841207002 - type: nauc_recall_at_1_std value: -26.223200911204543 - type: nauc_recall_at_20_diff1 value: 72.8297963091964 - type: nauc_recall_at_20_max value: 36.070220569670916 - type: nauc_recall_at_20_std value: -27.20897179168245 - type: nauc_recall_at_3_diff1 value: 73.47456374650459 - type: nauc_recall_at_3_max value: 29.901663407294816 - type: nauc_recall_at_3_std value: -32.83329537040381 - type: nauc_recall_at_5_diff1 value: 73.05025750827126 - type: nauc_recall_at_5_max value: 32.35733470860963 - type: nauc_recall_at_5_std value: -34.32357558493091 - type: ndcg_at_1 value: 80.4 - type: ndcg_at_10 value: 87.366 - type: ndcg_at_100 value: 88.7 - type: ndcg_at_1000 value: 88.842 - type: ndcg_at_20 value: 88.11 - type: ndcg_at_3 value: 84.52499999999999 - type: ndcg_at_5 value: 86.047 - type: precision_at_1 value: 80.4 - type: precision_at_10 value: 13.235 - type: precision_at_100 value: 1.516 - type: precision_at_1000 value: 0.156 - type: precision_at_20 value: 7.037 - type: precision_at_3 value: 36.9 - type: precision_at_5 value: 24.236 - type: recall_at_1 value: 69.95700000000001 - type: recall_at_10 value: 94.535 - type: recall_at_100 value: 99.164 - type: recall_at_1000 value: 99.855 - type: recall_at_20 value: 96.974 - type: recall_at_3 value: 86.33800000000001 - type: recall_at_5 value: 90.69 task: type: Retrieval - dataset: config: default name: MTEB SCIDOCS revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 split: test type: mteb/scidocs metrics: - type: main_score value: 21.492 - type: map_at_1 value: 5.192 - type: map_at_10 value: 12.959000000000001 - type: map_at_100 value: 14.963999999999999 - type: map_at_1000 value: 15.261 - type: map_at_20 value: 13.988999999999999 - type: map_at_3 value: 9.235 - type: map_at_5 value: 11.042 - type: mrr_at_1 value: 25.5 - type: mrr_at_10 value: 36.37313492063491 - type: mrr_at_100 value: 37.36517957347626 - type: mrr_at_1000 value: 37.42538601073437 - type: mrr_at_20 value: 36.987896404421136 - type: mrr_at_3 value: 32.966666666666654 - type: mrr_at_5 value: 34.95166666666664 - type: nauc_map_at_1000_diff1 value: 13.635120934154395 - type: nauc_map_at_1000_max value: 28.03542983005195 - type: nauc_map_at_1000_std value: 17.07156940311778 - type: nauc_map_at_100_diff1 value: 13.59237295184475 - type: nauc_map_at_100_max value: 27.992291365051237 - type: nauc_map_at_100_std value: 16.926533467400464 - type: nauc_map_at_10_diff1 value: 14.149193235999993 - type: nauc_map_at_10_max value: 26.520643811139305 - type: nauc_map_at_10_std value: 13.168673602548925 - type: nauc_map_at_1_diff1 value: 20.096094508148465 - type: nauc_map_at_1_max value: 17.41582245576302 - type: nauc_map_at_1_std value: 5.771729007558897 - type: nauc_map_at_20_diff1 value: 13.977726400526427 - type: nauc_map_at_20_max value: 27.2322235491895 - type: nauc_map_at_20_std value: 14.972781677750435 - type: nauc_map_at_3_diff1 value: 17.371153027460355 - type: nauc_map_at_3_max value: 24.457758503208254 - type: nauc_map_at_3_std value: 7.719726821179824 - type: nauc_map_at_5_diff1 value: 14.600442843442574 - type: nauc_map_at_5_max value: 25.899736370856296 - type: nauc_map_at_5_std value: 10.125349354853359 - type: nauc_mrr_at_1000_diff1 value: 18.70342821390236 - type: nauc_mrr_at_1000_max value: 23.365194520549114 - type: nauc_mrr_at_1000_std value: 12.185114294903236 - type: nauc_mrr_at_100_diff1 value: 18.677858738015907 - type: nauc_mrr_at_100_max value: 23.372641996726742 
- type: nauc_mrr_at_100_std value: 12.216130561991909 - type: nauc_mrr_at_10_diff1 value: 18.79094453090232 - type: nauc_mrr_at_10_max value: 23.511686337006466 - type: nauc_mrr_at_10_std value: 11.879716687008134 - type: nauc_mrr_at_1_diff1 value: 20.10455171810408 - type: nauc_mrr_at_1_max value: 17.741566234315428 - type: nauc_mrr_at_1_std value: 6.1676764583652215 - type: nauc_mrr_at_20_diff1 value: 18.70143648544655 - type: nauc_mrr_at_20_max value: 23.45603239095019 - type: nauc_mrr_at_20_std value: 12.244613576686202 - type: nauc_mrr_at_3_diff1 value: 18.894662528857374 - type: nauc_mrr_at_3_max value: 23.3739038101588 - type: nauc_mrr_at_3_std value: 10.4709044796543 - type: nauc_mrr_at_5_diff1 value: 18.877786065095563 - type: nauc_mrr_at_5_max value: 23.78061081203872 - type: nauc_mrr_at_5_std value: 11.847882917869622 - type: nauc_ndcg_at_1000_diff1 value: 13.99159027398115 - type: nauc_ndcg_at_1000_max value: 29.44766808611483 - type: nauc_ndcg_at_1000_std value: 24.289749574699915 - type: nauc_ndcg_at_100_diff1 value: 13.164020363258746 - type: nauc_ndcg_at_100_max value: 29.642442997167723 - type: nauc_ndcg_at_100_std value: 23.761764515453866 - type: nauc_ndcg_at_10_diff1 value: 14.839883268638546 - type: nauc_ndcg_at_10_max value: 27.21043708455449 - type: nauc_ndcg_at_10_std value: 15.56110419291775 - type: nauc_ndcg_at_1_diff1 value: 20.10455171810408 - type: nauc_ndcg_at_1_max value: 17.741566234315428 - type: nauc_ndcg_at_1_std value: 6.1676764583652215 - type: nauc_ndcg_at_20_diff1 value: 14.27998110295395 - type: nauc_ndcg_at_20_max value: 28.2492026337839 - type: nauc_ndcg_at_20_std value: 18.822356982979105 - type: nauc_ndcg_at_3_diff1 value: 17.659263157535445 - type: nauc_ndcg_at_3_max value: 25.416706421591396 - type: nauc_ndcg_at_3_std value: 9.650689638152636 - type: nauc_ndcg_at_5_diff1 value: 15.38459833918123 - type: nauc_ndcg_at_5_max value: 26.92495519416969 - type: nauc_ndcg_at_5_std value: 12.71017696809276 - type: nauc_precision_at_1000_diff1 value: 6.128490135458364 - type: nauc_precision_at_1000_max value: 23.52693893261883 - type: nauc_precision_at_1000_std value: 36.280432732819925 - type: nauc_precision_at_100_diff1 value: 5.306163791220436 - type: nauc_precision_at_100_max value: 27.67851033239246 - type: nauc_precision_at_100_std value: 34.29821573752515 - type: nauc_precision_at_10_diff1 value: 10.829686435425472 - type: nauc_precision_at_10_max value: 27.201648684015318 - type: nauc_precision_at_10_std value: 19.376999508233254 - type: nauc_precision_at_1_diff1 value: 20.10455171810408 - type: nauc_precision_at_1_max value: 17.741566234315428 - type: nauc_precision_at_1_std value: 6.1676764583652215 - type: nauc_precision_at_20_diff1 value: 9.416169626702048 - type: nauc_precision_at_20_max value: 27.65257998670333 - type: nauc_precision_at_20_std value: 24.761868509805826 - type: nauc_precision_at_3_diff1 value: 16.666456902017348 - type: nauc_precision_at_3_max value: 27.9969730961105 - type: nauc_precision_at_3_std value: 10.991562741393231 - type: nauc_precision_at_5_diff1 value: 12.26205064462843 - type: nauc_precision_at_5_max value: 29.083848730874095 - type: nauc_precision_at_5_std value: 15.66630836555747 - type: nauc_recall_at_1000_diff1 value: 5.600277836894063 - type: nauc_recall_at_1000_max value: 23.228705161815526 - type: nauc_recall_at_1000_std value: 36.822431061799485 - type: nauc_recall_at_100_diff1 value: 4.991781244867178 - type: nauc_recall_at_100_max value: 27.70095625483475 - type: nauc_recall_at_100_std value: 
34.67168431597854 - type: nauc_recall_at_10_diff1 value: 10.580860425931972 - type: nauc_recall_at_10_max value: 27.145829414223666 - type: nauc_recall_at_10_std value: 19.330630157067382 - type: nauc_recall_at_1_diff1 value: 20.096094508148465 - type: nauc_recall_at_1_max value: 17.41582245576302 - type: nauc_recall_at_1_std value: 5.771729007558897 - type: nauc_recall_at_20_diff1 value: 9.06945331260344 - type: nauc_recall_at_20_max value: 27.56725251066482 - type: nauc_recall_at_20_std value: 24.77644509886098 - type: nauc_recall_at_3_diff1 value: 16.660507676429322 - type: nauc_recall_at_3_max value: 27.816546386536434 - type: nauc_recall_at_3_std value: 10.687824478247007 - type: nauc_recall_at_5_diff1 value: 11.992514446369388 - type: nauc_recall_at_5_max value: 28.789031176671948 - type: nauc_recall_at_5_std value: 15.422118990090805 - type: ndcg_at_1 value: 25.5 - type: ndcg_at_10 value: 21.492 - type: ndcg_at_100 value: 29.022 - type: ndcg_at_1000 value: 34.298 - type: ndcg_at_20 value: 24.237000000000002 - type: ndcg_at_3 value: 20.392 - type: ndcg_at_5 value: 17.801000000000002 - type: precision_at_1 value: 25.5 - type: precision_at_10 value: 11.09 - type: precision_at_100 value: 2.1919999999999997 - type: precision_at_1000 value: 0.346 - type: precision_at_20 value: 7.135 - type: precision_at_3 value: 18.933 - type: precision_at_5 value: 15.52 - type: recall_at_1 value: 5.192 - type: recall_at_10 value: 22.512999999999998 - type: recall_at_100 value: 44.505 - type: recall_at_1000 value: 70.267 - type: recall_at_20 value: 28.965000000000003 - type: recall_at_3 value: 11.522 - type: recall_at_5 value: 15.751999999999999 task: type: Retrieval - dataset: config: default name: MTEB SciFact revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: main_score value: 71.586 - type: map_at_1 value: 56.760999999999996 - type: map_at_10 value: 66.893 - type: map_at_100 value: 67.42 - type: map_at_1000 value: 67.44200000000001 - type: map_at_20 value: 67.232 - type: map_at_3 value: 64.193 - type: map_at_5 value: 65.73400000000001 - type: mrr_at_1 value: 60.0 - type: mrr_at_10 value: 68.20383597883595 - type: mrr_at_100 value: 68.58867453733343 - type: mrr_at_1000 value: 68.61117469977329 - type: mrr_at_20 value: 68.43973740684265 - type: mrr_at_3 value: 66.11111111111111 - type: mrr_at_5 value: 67.44444444444446 - type: nauc_map_at_1000_diff1 value: 72.66688261123035 - type: nauc_map_at_1000_max value: 61.02926282006283 - type: nauc_map_at_1000_std value: 11.084549829740526 - type: nauc_map_at_100_diff1 value: 72.66226192320828 - type: nauc_map_at_100_max value: 61.04393223108811 - type: nauc_map_at_100_std value: 11.101529343291695 - type: nauc_map_at_10_diff1 value: 72.66732266693091 - type: nauc_map_at_10_max value: 61.24124296311832 - type: nauc_map_at_10_std value: 10.91179451961794 - type: nauc_map_at_1_diff1 value: 74.2356464256346 - type: nauc_map_at_1_max value: 54.06962758957632 - type: nauc_map_at_1_std value: 0.8037891907963532 - type: nauc_map_at_20_diff1 value: 72.65198594061253 - type: nauc_map_at_20_max value: 61.130159351448185 - type: nauc_map_at_20_std value: 11.2246899245522 - type: nauc_map_at_3_diff1 value: 72.78578673303954 - type: nauc_map_at_3_max value: 59.19073262936321 - type: nauc_map_at_3_std value: 8.460301560522968 - type: nauc_map_at_5_diff1 value: 72.55004168261968 - type: nauc_map_at_5_max value: 59.75181935082357 - type: nauc_map_at_5_std value: 9.440299527201889 - type: nauc_mrr_at_1000_diff1 value: 
72.82720348470325 - type: nauc_mrr_at_1000_max value: 62.344231223741446 - type: nauc_mrr_at_1000_std value: 12.60196558488974 - type: nauc_mrr_at_100_diff1 value: 72.82236849255094 - type: nauc_mrr_at_100_max value: 62.35799491393125 - type: nauc_mrr_at_100_std value: 12.617900773655673 - type: nauc_mrr_at_10_diff1 value: 72.7722847495086 - type: nauc_mrr_at_10_max value: 62.66642401155435 - type: nauc_mrr_at_10_std value: 12.906381237738746 - type: nauc_mrr_at_1_diff1 value: 74.71208073612343 - type: nauc_mrr_at_1_max value: 59.50430394775893 - type: nauc_mrr_at_1_std value: 8.129514198080512 - type: nauc_mrr_at_20_diff1 value: 72.78312367361772 - type: nauc_mrr_at_20_max value: 62.421122493761885 - type: nauc_mrr_at_20_std value: 12.693437522498588 - type: nauc_mrr_at_3_diff1 value: 73.50670156385345 - type: nauc_mrr_at_3_max value: 62.01717537699209 - type: nauc_mrr_at_3_std value: 11.926548252191182 - type: nauc_mrr_at_5_diff1 value: 72.62204028549876 - type: nauc_mrr_at_5_max value: 62.319358766312085 - type: nauc_mrr_at_5_std value: 13.081257923284342 - type: nauc_ndcg_at_1000_diff1 value: 72.29960539074736 - type: nauc_ndcg_at_1000_max value: 62.75096959221402 - type: nauc_ndcg_at_1000_std value: 13.81528462505362 - type: nauc_ndcg_at_100_diff1 value: 72.19985782073529 - type: nauc_ndcg_at_100_max value: 63.18837705326287 - type: nauc_ndcg_at_100_std value: 14.506479655117138 - type: nauc_ndcg_at_10_diff1 value: 71.85759847832983 - type: nauc_ndcg_at_10_max value: 64.150996056865 - type: nauc_ndcg_at_10_std value: 14.580606901634278 - type: nauc_ndcg_at_1_diff1 value: 74.71208073612343 - type: nauc_ndcg_at_1_max value: 59.50430394775893 - type: nauc_ndcg_at_1_std value: 8.129514198080512 - type: nauc_ndcg_at_20_diff1 value: 71.80987178228351 - type: nauc_ndcg_at_20_max value: 63.56269460865743 - type: nauc_ndcg_at_20_std value: 15.024978004625922 - type: nauc_ndcg_at_3_diff1 value: 72.35095651602592 - type: nauc_ndcg_at_3_max value: 61.60548011855679 - type: nauc_ndcg_at_3_std value: 12.048248788835263 - type: nauc_ndcg_at_5_diff1 value: 71.48615621881864 - type: nauc_ndcg_at_5_max value: 61.72870035979784 - type: nauc_ndcg_at_5_std value: 12.83048357446691 - type: nauc_precision_at_1000_diff1 value: -14.743011420972 - type: nauc_precision_at_1000_max value: 19.281995763080158 - type: nauc_precision_at_1000_std value: 49.6140660398164 - type: nauc_precision_at_100_diff1 value: 0.11278174806205563 - type: nauc_precision_at_100_max value: 29.704511820077332 - type: nauc_precision_at_100_std value: 47.84916954122579 - type: nauc_precision_at_10_diff1 value: 20.498227967235728 - type: nauc_precision_at_10_max value: 47.883119365891595 - type: nauc_precision_at_10_std value: 45.182178693450595 - type: nauc_precision_at_1_diff1 value: 74.71208073612343 - type: nauc_precision_at_1_max value: 59.50430394775893 - type: nauc_precision_at_1_std value: 8.129514198080512 - type: nauc_precision_at_20_diff1 value: 12.551737222341455 - type: nauc_precision_at_20_max value: 40.618899501225634 - type: nauc_precision_at_20_std value: 48.5598454249067 - type: nauc_precision_at_3_diff1 value: 47.67720764601145 - type: nauc_precision_at_3_max value: 56.50632017305064 - type: nauc_precision_at_3_std value: 31.14175140162157 - type: nauc_precision_at_5_diff1 value: 35.10058622792819 - type: nauc_precision_at_5_max value: 51.88948872657981 - type: nauc_precision_at_5_std value: 37.62796957461928 - type: nauc_recall_at_1000_diff1 value: 79.57516339869238 - type: nauc_recall_at_1000_max value: 
86.11111111111035 - type: nauc_recall_at_1000_std value: 79.57516339869238 - type: nauc_recall_at_100_diff1 value: 70.50859559510081 - type: nauc_recall_at_100_max value: 79.17009941231396 - type: nauc_recall_at_100_std value: 44.32910419069595 - type: nauc_recall_at_10_diff1 value: 66.16118569361245 - type: nauc_recall_at_10_max value: 74.73542948302286 - type: nauc_recall_at_10_std value: 27.680330939810037 - type: nauc_recall_at_1_diff1 value: 74.2356464256346 - type: nauc_recall_at_1_max value: 54.06962758957632 - type: nauc_recall_at_1_std value: 0.8037891907963532 - type: nauc_recall_at_20_diff1 value: 65.4748436545527 - type: nauc_recall_at_20_max value: 73.81532199081235 - type: nauc_recall_at_20_std value: 33.59324708196253 - type: nauc_recall_at_3_diff1 value: 68.83194804473622 - type: nauc_recall_at_3_max value: 61.77722610439669 - type: nauc_recall_at_3_std value: 13.984923756556714 - type: nauc_recall_at_5_diff1 value: 65.51467417209523 - type: nauc_recall_at_5_max value: 64.08276291427661 - type: nauc_recall_at_5_std value: 19.976472037847167 - type: ndcg_at_1 value: 60.0 - type: ndcg_at_10 value: 71.586 - type: ndcg_at_100 value: 73.76899999999999 - type: ndcg_at_1000 value: 74.386 - type: ndcg_at_20 value: 72.612 - type: ndcg_at_3 value: 66.944 - type: ndcg_at_5 value: 69.333 - type: precision_at_1 value: 60.0 - type: precision_at_10 value: 9.6 - type: precision_at_100 value: 1.073 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_20 value: 5.033 - type: precision_at_3 value: 26.333000000000002 - type: precision_at_5 value: 17.4 - type: recall_at_1 value: 56.760999999999996 - type: recall_at_10 value: 84.589 - type: recall_at_100 value: 94.333 - type: recall_at_1000 value: 99.333 - type: recall_at_20 value: 88.43299999999999 - type: recall_at_3 value: 72.10600000000001 - type: recall_at_5 value: 78.194 task: type: Retrieval - dataset: config: default name: MTEB TRECCOVID revision: bb9466bac8153a0349341eb1b22e06409e78ef4e split: test type: mteb/trec-covid metrics: - type: main_score value: 84.60600000000001 - type: map_at_1 value: 0.257 - type: map_at_10 value: 2.196 - type: map_at_100 value: 13.252 - type: map_at_1000 value: 31.473000000000003 - type: map_at_20 value: 4.023000000000001 - type: map_at_3 value: 0.722 - type: map_at_5 value: 1.146 - type: mrr_at_1 value: 94.0 - type: mrr_at_10 value: 97.0 - type: mrr_at_100 value: 97.0 - type: mrr_at_1000 value: 97.0 - type: mrr_at_20 value: 97.0 - type: mrr_at_3 value: 97.0 - type: mrr_at_5 value: 97.0 - type: nauc_map_at_1000_diff1 value: -30.674816554207062 - type: nauc_map_at_1000_max value: 53.18598689657068 - type: nauc_map_at_1000_std value: 78.88325309469121 - type: nauc_map_at_100_diff1 value: -17.6877824653978 - type: nauc_map_at_100_max value: 19.584159765315658 - type: nauc_map_at_100_std value: 48.051154190992726 - type: nauc_map_at_10_diff1 value: 20.076631089898626 - type: nauc_map_at_10_max value: -8.642556160185636 - type: nauc_map_at_10_std value: -5.768698617334298 - type: nauc_map_at_1_diff1 value: 27.342260509653798 - type: nauc_map_at_1_max value: -23.400451210297994 - type: nauc_map_at_1_std value: -21.152006353733853 - type: nauc_map_at_20_diff1 value: 8.019321726240506 - type: nauc_map_at_20_max value: -1.4826378210544222 - type: nauc_map_at_20_std value: 5.698208117745366 - type: nauc_map_at_3_diff1 value: 32.073377946749446 - type: nauc_map_at_3_max value: -13.099353983204654 - type: nauc_map_at_3_std value: -15.36319127398037 - type: nauc_map_at_5_diff1 value: 
22.500045815797876 - type: nauc_map_at_5_max value: -8.548135411428023 - type: nauc_map_at_5_std value: -8.547850460331334 - type: nauc_mrr_at_1000_diff1 value: -6.022408963585526 - type: nauc_mrr_at_1000_max value: 4.481792717087155 - type: nauc_mrr_at_1000_std value: 51.6962340491753 - type: nauc_mrr_at_100_diff1 value: -6.022408963585526 - type: nauc_mrr_at_100_max value: 4.481792717087155 - type: nauc_mrr_at_100_std value: 51.6962340491753 - type: nauc_mrr_at_10_diff1 value: -6.022408963585526 - type: nauc_mrr_at_10_max value: 4.481792717087155 - type: nauc_mrr_at_10_std value: 51.6962340491753 - type: nauc_mrr_at_1_diff1 value: -6.022408963585076 - type: nauc_mrr_at_1_max value: 4.481792717087146 - type: nauc_mrr_at_1_std value: 51.69623404917518 - type: nauc_mrr_at_20_diff1 value: -6.022408963585526 - type: nauc_mrr_at_20_max value: 4.481792717087155 - type: nauc_mrr_at_20_std value: 51.6962340491753 - type: nauc_mrr_at_3_diff1 value: -6.022408963585526 - type: nauc_mrr_at_3_max value: 4.481792717087155 - type: nauc_mrr_at_3_std value: 51.6962340491753 - type: nauc_mrr_at_5_diff1 value: -6.022408963585526 - type: nauc_mrr_at_5_max value: 4.481792717087155 - type: nauc_mrr_at_5_std value: 51.6962340491753 - type: nauc_ndcg_at_1000_diff1 value: -20.79697283984295 - type: nauc_ndcg_at_1000_max value: 52.97671908009218 - type: nauc_ndcg_at_1000_std value: 75.43907707019758 - type: nauc_ndcg_at_100_diff1 value: -38.620752706946455 - type: nauc_ndcg_at_100_max value: 49.41307462381511 - type: nauc_ndcg_at_100_std value: 81.33299379244252 - type: nauc_ndcg_at_10_diff1 value: -18.611906363037356 - type: nauc_ndcg_at_10_max value: 44.20544651664479 - type: nauc_ndcg_at_10_std value: 61.322552829935816 - type: nauc_ndcg_at_1_diff1 value: 18.625935567849073 - type: nauc_ndcg_at_1_max value: -10.104132769280879 - type: nauc_ndcg_at_1_std value: 22.449560689879743 - type: nauc_ndcg_at_20_diff1 value: -30.61130208138771 - type: nauc_ndcg_at_20_max value: 52.68851710375231 - type: nauc_ndcg_at_20_std value: 69.72357683382992 - type: nauc_ndcg_at_3_diff1 value: 5.695394821691213 - type: nauc_ndcg_at_3_max value: 37.909122367102135 - type: nauc_ndcg_at_3_std value: 46.2366603255159 - type: nauc_ndcg_at_5_diff1 value: -15.273067832464731 - type: nauc_ndcg_at_5_max value: 49.7054639475091 - type: nauc_ndcg_at_5_std value: 58.83754007826166 - type: nauc_precision_at_1000_diff1 value: -31.565302588492035 - type: nauc_precision_at_1000_max value: 52.56214379514724 - type: nauc_precision_at_1000_std value: 53.40618234326055 - type: nauc_precision_at_100_diff1 value: -44.67273120709088 - type: nauc_precision_at_100_max value: 48.30381155522576 - type: nauc_precision_at_100_std value: 82.1984661602578 - type: nauc_precision_at_10_diff1 value: -24.737383556860145 - type: nauc_precision_at_10_max value: 52.816815002878556 - type: nauc_precision_at_10_std value: 67.99052410030845 - type: nauc_precision_at_1_diff1 value: -6.022408963585076 - type: nauc_precision_at_1_max value: 4.481792717087146 - type: nauc_precision_at_1_std value: 51.69623404917518 - type: nauc_precision_at_20_diff1 value: -40.23628054967093 - type: nauc_precision_at_20_max value: 56.980056980057014 - type: nauc_precision_at_20_std value: 76.60976777785895 - type: nauc_precision_at_3_diff1 value: -4.661784068466279 - type: nauc_precision_at_3_max value: 59.052007899934125 - type: nauc_precision_at_3_std value: 58.187952600394986 - type: nauc_precision_at_5_diff1 value: -38.11848143512736 - type: nauc_precision_at_5_max value: 68.6149353358365 
- type: nauc_precision_at_5_std value: 73.55652899457661 - type: nauc_recall_at_1000_diff1 value: -14.886527444436345 - type: nauc_recall_at_1000_max value: 48.07492302795808 - type: nauc_recall_at_1000_std value: 65.05623212485906 - type: nauc_recall_at_100_diff1 value: -8.148385729388195 - type: nauc_recall_at_100_max value: 8.041615364614533 - type: nauc_recall_at_100_std value: 33.77187914574611 - type: nauc_recall_at_10_diff1 value: 24.333628413035942 - type: nauc_recall_at_10_max value: -14.577877145192078 - type: nauc_recall_at_10_std value: -12.131819145098557 - type: nauc_recall_at_1_diff1 value: 27.342260509653798 - type: nauc_recall_at_1_max value: -23.400451210297994 - type: nauc_recall_at_1_std value: -21.152006353733853 - type: nauc_recall_at_20_diff1 value: 13.695556376785564 - type: nauc_recall_at_20_max value: -8.872009346408264 - type: nauc_recall_at_20_std value: -3.163199444247112 - type: nauc_recall_at_3_diff1 value: 32.00442538217753 - type: nauc_recall_at_3_max value: -15.159737942664552 - type: nauc_recall_at_3_std value: -17.530833132440645 - type: nauc_recall_at_5_diff1 value: 22.64740552912405 - type: nauc_recall_at_5_max value: -12.947090597010414 - type: nauc_recall_at_5_std value: -12.914478822476807 - type: ndcg_at_1 value: 88.0 - type: ndcg_at_10 value: 84.60600000000001 - type: ndcg_at_100 value: 64.31700000000001 - type: ndcg_at_1000 value: 56.40500000000001 - type: ndcg_at_20 value: 80.561 - type: ndcg_at_3 value: 87.87700000000001 - type: ndcg_at_5 value: 86.641 - type: precision_at_1 value: 94.0 - type: precision_at_10 value: 88.2 - type: precision_at_100 value: 65.9 - type: precision_at_1000 value: 25.019999999999996 - type: precision_at_20 value: 84.7 - type: precision_at_3 value: 92.0 - type: precision_at_5 value: 90.0 - type: recall_at_1 value: 0.257 - type: recall_at_10 value: 2.338 - type: recall_at_100 value: 15.831999999999999 - type: recall_at_1000 value: 52.519000000000005 - type: recall_at_20 value: 4.367 - type: recall_at_3 value: 0.74 - type: recall_at_5 value: 1.196 task: type: Retrieval - dataset: config: default name: MTEB Touche2020 revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: main_score value: 31.426 - type: map_at_1 value: 3.4709999999999996 - type: map_at_10 value: 13.236999999999998 - type: map_at_100 value: 19.521 - type: map_at_1000 value: 21.224 - type: map_at_20 value: 15.626000000000001 - type: map_at_3 value: 7.152 - type: map_at_5 value: 9.914000000000001 - type: mrr_at_1 value: 44.89795918367347 - type: mrr_at_10 value: 57.54373177842565 - type: mrr_at_100 value: 57.855267710139536 - type: mrr_at_1000 value: 57.855267710139536 - type: mrr_at_20 value: 57.70071764969724 - type: mrr_at_3 value: 52.72108843537414 - type: mrr_at_5 value: 55.06802721088435 - type: nauc_map_at_1000_diff1 value: 21.148857552115558 - type: nauc_map_at_1000_max value: 2.0837572569021323 - type: nauc_map_at_1000_std value: 3.203419709665347 - type: nauc_map_at_100_diff1 value: 21.383778167597878 - type: nauc_map_at_100_max value: 0.965767943155967 - type: nauc_map_at_100_std value: 0.3949924961020957 - type: nauc_map_at_10_diff1 value: 27.178555638086394 - type: nauc_map_at_10_max value: 4.480675175857958 - type: nauc_map_at_10_std value: -13.69553539513878 - type: nauc_map_at_1_diff1 value: 27.63901823865334 - type: nauc_map_at_1_max value: -18.6387233237763 - type: nauc_map_at_1_std value: -27.02164241863646 - type: nauc_map_at_20_diff1 value: 23.892104752374888 - type: nauc_map_at_20_max 
value: 3.5343136621362348 - type: nauc_map_at_20_std value: -8.765101188860816 - type: nauc_map_at_3_diff1 value: 22.065793929837493 - type: nauc_map_at_3_max value: 0.8063396680860568 - type: nauc_map_at_3_std value: -20.404849396621824 - type: nauc_map_at_5_diff1 value: 22.66626080580714 - type: nauc_map_at_5_max value: 5.423340658352383 - type: nauc_map_at_5_std value: -18.31523779843455 - type: nauc_mrr_at_1000_diff1 value: 30.520722269282665 - type: nauc_mrr_at_1000_max value: -16.644959497742267 - type: nauc_mrr_at_1000_std value: -16.3824126273053 - type: nauc_mrr_at_100_diff1 value: 30.520722269282665 - type: nauc_mrr_at_100_max value: -16.644959497742267 - type: nauc_mrr_at_100_std value: -16.3824126273053 - type: nauc_mrr_at_10_diff1 value: 30.428248939332974 - type: nauc_mrr_at_10_max value: -16.300183919261585 - type: nauc_mrr_at_10_std value: -15.404823235836309 - type: nauc_mrr_at_1_diff1 value: 27.041346572613474 - type: nauc_mrr_at_1_max value: -23.181309312755804 - type: nauc_mrr_at_1_std value: -24.33076726484014 - type: nauc_mrr_at_20_diff1 value: 30.676558567379303 - type: nauc_mrr_at_20_max value: -16.914268763031416 - type: nauc_mrr_at_20_std value: -15.77742854976336 - type: nauc_mrr_at_3_diff1 value: 31.718457109787096 - type: nauc_mrr_at_3_max value: -15.508391132202235 - type: nauc_mrr_at_3_std value: -20.33229438349494 - type: nauc_mrr_at_5_diff1 value: 28.73798376227693 - type: nauc_mrr_at_5_max value: -16.086295031060196 - type: nauc_mrr_at_5_std value: -15.644604635769321 - type: nauc_ndcg_at_1000_diff1 value: 22.158724660189606 - type: nauc_ndcg_at_1000_max value: -3.1755686809941475 - type: nauc_ndcg_at_1000_std value: 19.258386224159075 - type: nauc_ndcg_at_100_diff1 value: 21.83846748649288 - type: nauc_ndcg_at_100_max value: -10.939957598756036 - type: nauc_ndcg_at_100_std value: 14.729678880436623 - type: nauc_ndcg_at_10_diff1 value: 26.944882726098424 - type: nauc_ndcg_at_10_max value: -3.5176483833346617 - type: nauc_ndcg_at_10_std value: -5.400606773697211 - type: nauc_ndcg_at_1_diff1 value: 26.649410985172985 - type: nauc_ndcg_at_1_max value: -18.806716526067493 - type: nauc_ndcg_at_1_std value: -25.100244999343506 - type: nauc_ndcg_at_20_diff1 value: 24.860266153648315 - type: nauc_ndcg_at_20_max value: -7.521401821712892 - type: nauc_ndcg_at_20_std value: -3.3696577425983003 - type: nauc_ndcg_at_3_diff1 value: 23.9933326962406 - type: nauc_ndcg_at_3_max value: -0.4609479344284664 - type: nauc_ndcg_at_3_std value: -15.176459166869897 - type: nauc_ndcg_at_5_diff1 value: 22.50595978713142 - type: nauc_ndcg_at_5_max value: -2.1093870656000857 - type: nauc_ndcg_at_5_std value: -12.732197425528257 - type: nauc_precision_at_1000_diff1 value: -20.335120385950024 - type: nauc_precision_at_1000_max value: 26.95109729939765 - type: nauc_precision_at_1000_std value: 29.981685890622117 - type: nauc_precision_at_100_diff1 value: -2.782114329320704 - type: nauc_precision_at_100_max value: 2.9489322002048604 - type: nauc_precision_at_100_std value: 67.3074073674319 - type: nauc_precision_at_10_diff1 value: 21.385177180383383 - type: nauc_precision_at_10_max value: -2.4696365259422817 - type: nauc_precision_at_10_std value: 14.469784299536673 - type: nauc_precision_at_1_diff1 value: 27.041346572613474 - type: nauc_precision_at_1_max value: -23.181309312755804 - type: nauc_precision_at_1_std value: -24.33076726484014 - type: nauc_precision_at_20_diff1 value: 11.993846579997673 - type: nauc_precision_at_20_max value: -2.4792189693296227 - type: 
nauc_precision_at_20_std value: 28.581394687807745 - type: nauc_precision_at_3_diff1 value: 20.70568446328836 - type: nauc_precision_at_3_max value: 0.37326398699875984 - type: nauc_precision_at_3_std value: -12.983918676694389 - type: nauc_precision_at_5_diff1 value: 19.47466335828124 - type: nauc_precision_at_5_max value: -1.8921617684385994 - type: nauc_precision_at_5_std value: -6.533875294402164 - type: nauc_recall_at_1000_diff1 value: 7.611201305723156 - type: nauc_recall_at_1000_max value: 5.6416194035820055 - type: nauc_recall_at_1000_std value: 61.695208644278 - type: nauc_recall_at_100_diff1 value: 10.0183258158735 - type: nauc_recall_at_100_max value: -10.950612455698973 - type: nauc_recall_at_100_std value: 33.06069987640471 - type: nauc_recall_at_10_diff1 value: 24.738210305731535 - type: nauc_recall_at_10_max value: -2.6592454032071546 - type: nauc_recall_at_10_std value: -4.83987517793115 - type: nauc_recall_at_1_diff1 value: 27.63901823865334 - type: nauc_recall_at_1_max value: -18.6387233237763 - type: nauc_recall_at_1_std value: -27.02164241863646 - type: nauc_recall_at_20_diff1 value: 17.79601177409034 - type: nauc_recall_at_20_max value: -6.681637093148051 - type: nauc_recall_at_20_std value: 3.369193919932238 - type: nauc_recall_at_3_diff1 value: 24.9589431081204 - type: nauc_recall_at_3_max value: 2.4783640980500232 - type: nauc_recall_at_3_std value: -19.567415651090702 - type: nauc_recall_at_5_diff1 value: 23.71803410135437 - type: nauc_recall_at_5_max value: 1.6294309357641652 - type: nauc_recall_at_5_std value: -15.365511906408983 - type: ndcg_at_1 value: 40.816 - type: ndcg_at_10 value: 31.426 - type: ndcg_at_100 value: 41.558 - type: ndcg_at_1000 value: 53.042 - type: ndcg_at_20 value: 31.108999999999998 - type: ndcg_at_3 value: 35.518 - type: ndcg_at_5 value: 33.235 - type: precision_at_1 value: 44.897999999999996 - type: precision_at_10 value: 27.551 - type: precision_at_100 value: 8.204 - type: precision_at_1000 value: 1.582 - type: precision_at_20 value: 19.796 - type: precision_at_3 value: 36.735 - type: precision_at_5 value: 33.061 - type: recall_at_1 value: 3.4709999999999996 - type: recall_at_10 value: 19.563 - type: recall_at_100 value: 50.3 - type: recall_at_1000 value: 85.13199999999999 - type: recall_at_20 value: 26.738 - type: recall_at_3 value: 7.8420000000000005 - type: recall_at_5 value: 11.994 task: type: Retrieval - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 68.29850746268657 - type: ap value: 30.109785890841966 - type: ap_weighted value: 30.109785890841966 - type: f1 value: 61.76875915202924 - type: f1_weighted value: 71.32073190458556 - type: main_score value: 68.29850746268657 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification (default) revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 90.3068 - type: ap value: 86.17914339624038 - type: ap_weighted value: 86.17914339624038 - type: f1 value: 90.29716826358077 - type: f1_weighted value: 90.29716826358077 - type: main_score value: 90.3068 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 46.272000000000006 - type: f1 value: 45.57042543386915 - type: 
f1_weighted value: 45.57042543386915 - type: main_score value: 46.272000000000006 task: type: Classification - dataset: config: default name: MTEB ArxivClusteringP2P (default) revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: main_score value: 44.9469238081379 - type: v_measure value: 44.9469238081379 - type: v_measure_std value: 13.26811262671461 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S (default) revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: main_score value: 34.12071448053325 - type: v_measure value: 34.12071448053325 - type: v_measure_std value: 13.7019879046405 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions (default) revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: main_score value: 61.597667288657846 - type: map value: 61.597667288657846 - type: mrr value: 75.57940904893813 - type: nAUC_map_diff1 value: 8.745172077340095 - type: nAUC_map_max value: 20.114863024035493 - type: nAUC_map_std value: 15.991351189572192 - type: nAUC_mrr_diff1 value: 20.781369244159983 - type: nAUC_mrr_max value: 30.78542570228559 - type: nAUC_mrr_std value: 19.861484857303676 task: type: Reranking - dataset: config: default name: MTEB BIOSSES (default) revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: cosine_pearson value: 88.55587996301419 - type: cosine_spearman value: 86.40317357420093 - type: euclidean_pearson value: 86.93771958250231 - type: euclidean_spearman value: 86.40317357420093 - type: main_score value: 86.40317357420093 - type: manhattan_pearson value: 86.92196577117366 - type: manhattan_spearman value: 85.79834051556095 - type: pearson value: 88.55587996301419 - type: spearman value: 86.40317357420093 task: type: STS - dataset: config: default name: MTEB Banking77Classification (default) revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 80.0064935064935 - type: f1 value: 79.29524254086299 - type: f1_weighted value: 79.295242540863 - type: main_score value: 80.0064935064935 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P (default) revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: main_score value: 35.27186813341181 - type: v_measure value: 35.27186813341181 - type: v_measure_std value: 0.8621482145872432 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S (default) revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: main_score value: 28.411805064852295 - type: v_measure value: 28.411805064852295 - type: v_measure_std value: 0.7194290078011281 task: type: Clustering - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 43.675 - type: f1 value: 40.15061931375577 - type: f1_weighted value: 45.714186572727066 - type: main_score value: 43.675 task: type: Classification - dataset: config: default name: MTEB ImdbClassification (default) revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 84.35640000000001 - type: ap value: 
79.07507736685174 - type: ap_weighted value: 79.07507736685174 - type: f1 value: 84.32288494833531 - type: f1_weighted value: 84.32288494833531 - type: main_score value: 84.35640000000001 task: type: Classification - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 91.35658914728684 - type: f1 value: 90.86877537911086 - type: f1_weighted value: 91.3282092774443 - type: main_score value: 91.35658914728684 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 60.63611491108071 - type: f1 value: 42.78886482112741 - type: f1_weighted value: 63.44208631840539 - type: main_score value: 60.63611491108071 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 66.68796234028245 - type: f1 value: 64.44940791000278 - type: f1_weighted value: 65.77554417406792 - type: main_score value: 66.68796234028245 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 73.0598520511096 - type: f1 value: 72.14267273884774 - type: f1_weighted value: 72.93345180137516 - type: main_score value: 73.0598520511096 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P (default) revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: main_score value: 31.143081341699606 - type: v_measure value: 31.143081341699606 - type: v_measure_std value: 1.5578716347076906 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S (default) revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: main_score value: 27.010818869829556 - type: v_measure value: 27.010818869829556 - type: v_measure_std value: 1.1771554540819378 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking (default) revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7 split: test type: mteb/mind_small metrics: - type: main_score value: 30.20503776754942 - type: map value: 30.20503776754942 - type: mrr value: 31.076636002733437 - type: nAUC_map_diff1 value: 7.290568655287842 - type: nAUC_map_max value: -21.381599355932945 - type: nAUC_map_std value: -7.709920607543168 - type: nAUC_mrr_diff1 value: 7.558397329284913 - type: nAUC_mrr_max value: -15.981397186427607 - type: nAUC_mrr_std value: -4.870495243168834 task: type: Reranking - dataset: config: default name: MTEB RedditClustering (default) revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: main_score value: 51.85893476633338 - type: v_measure value: 51.85893476633338 - type: v_measure_std value: 4.704770139385852 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P (default) revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: main_score value: 61.8124222918822 - type: v_measure value: 61.8124222918822 - type: v_measure_std value: 11.994472578100165 task: type: 
Clustering - dataset: config: default name: MTEB SICK-R (default) revision: 20a6d6f312dd54037fe07a32d58e5e168867909d split: test type: mteb/sickr-sts metrics: - type: cosine_pearson value: 77.63310776935984 - type: cosine_spearman value: 69.86468291111039 - type: euclidean_pearson value: 73.91537077798837 - type: euclidean_spearman value: 69.86468376650203 - type: main_score value: 69.86468291111039 - type: manhattan_pearson value: 73.68616048370464 - type: manhattan_spearman value: 69.76232036206659 - type: pearson value: 77.63310776935984 - type: spearman value: 69.86468291111039 task: type: STS - dataset: config: default name: MTEB STS12 (default) revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: cosine_pearson value: 57.71716838245049 - type: cosine_spearman value: 61.797855543446424 - type: euclidean_pearson value: 58.22958675325848 - type: euclidean_spearman value: 61.797855543446424 - type: main_score value: 61.797855543446424 - type: manhattan_pearson value: 57.63117544997929 - type: manhattan_spearman value: 61.3629404350085 - type: pearson value: 57.71716838245049 - type: spearman value: 61.797855543446424 task: type: STS - dataset: config: default name: MTEB STS13 (default) revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: cosine_pearson value: 82.30260026790903 - type: cosine_spearman value: 82.66959813070869 - type: euclidean_pearson value: 82.08383017580783 - type: euclidean_spearman value: 82.66959813070869 - type: main_score value: 82.66959813070869 - type: manhattan_pearson value: 81.77991451392153 - type: manhattan_spearman value: 82.3652534745606 - type: pearson value: 82.30260026790903 - type: spearman value: 82.66959813070869 task: type: STS - dataset: config: default name: MTEB STS14 (default) revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: cosine_pearson value: 71.50608384084478 - type: cosine_spearman value: 68.94968064977785 - type: euclidean_pearson value: 70.73381299949564 - type: euclidean_spearman value: 68.94968064977785 - type: main_score value: 68.94968064977785 - type: manhattan_pearson value: 70.5385486953787 - type: manhattan_spearman value: 68.82132770672365 - type: pearson value: 71.50608384084478 - type: spearman value: 68.94968064977785 task: type: STS - dataset: config: default name: MTEB STS15 (default) revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: cosine_pearson value: 73.66969825874907 - type: cosine_spearman value: 75.55374982088381 - type: euclidean_pearson value: 75.9339313749594 - type: euclidean_spearman value: 75.55374982088381 - type: main_score value: 75.55374982088381 - type: manhattan_pearson value: 75.88287553383817 - type: manhattan_spearman value: 75.50729812977688 - type: pearson value: 73.66969825874907 - type: spearman value: 75.55374982088381 task: type: STS - dataset: config: default name: MTEB STS16 (default) revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: cosine_pearson value: 74.5954724414016 - type: cosine_spearman value: 77.2688820850505 - type: euclidean_pearson value: 77.19866353971555 - type: euclidean_spearman value: 77.2688820850505 - type: main_score value: 77.2688820850505 - type: manhattan_pearson value: 77.27072603680978 - type: manhattan_spearman value: 77.29408453673607 - type: pearson value: 74.5954724414016 - type: spearman value: 77.2688820850505 task: 
type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 71.52588722654055 - type: cosine_spearman value: 74.97235736456061 - type: euclidean_pearson value: 74.51952528854038 - type: euclidean_spearman value: 74.97235736456061 - type: main_score value: 74.97235736456061 - type: manhattan_pearson value: 74.48272300884209 - type: manhattan_spearman value: 74.80633649415176 - type: pearson value: 71.52588722654055 - type: spearman value: 74.97235736456061 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 68.80031120401976 - type: cosine_spearman value: 69.07945196478491 - type: euclidean_pearson value: 68.99674496430792 - type: euclidean_spearman value: 69.07945196478491 - type: main_score value: 69.07945196478491 - type: manhattan_pearson value: 69.00236107775687 - type: manhattan_spearman value: 68.98064879049272 - type: pearson value: 68.80031120401976 - type: spearman value: 69.07945196478491 task: type: STS - dataset: config: default name: MTEB STSBenchmark (default) revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: cosine_pearson value: 65.6898007230089 - type: cosine_spearman value: 69.72386211803668 - type: euclidean_pearson value: 69.04523003701475 - type: euclidean_spearman value: 69.72386211803668 - type: main_score value: 69.72386211803668 - type: manhattan_pearson value: 68.80479743770702 - type: manhattan_spearman value: 69.43264575177459 - type: pearson value: 65.6898007230089 - type: spearman value: 69.72386211803668 task: type: STS - dataset: config: default name: MTEB SciDocsRR (default) revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: main_score value: 79.74088066874383 - type: map value: 79.74088066874383 - type: mrr value: 94.47697455050397 - type: nAUC_map_diff1 value: 8.036086256905502 - type: nAUC_map_max value: 54.88199803816819 - type: nAUC_map_std value: 69.16267942176574 - type: nAUC_mrr_diff1 value: 50.020738477678115 - type: nAUC_mrr_max value: 83.28922770326483 - type: nAUC_mrr_std value: 83.63973501802224 task: type: Reranking - dataset: config: default name: MTEB SprintDuplicateQuestions (default) revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: cosine_accuracy value: 99.83861386138614 - type: cosine_accuracy_threshold value: 74.75666999816895 - type: cosine_ap value: 96.15132792066652 - type: cosine_f1 value: 91.84890656063618 - type: cosine_f1_threshold value: 71.70594930648804 - type: cosine_precision value: 91.30434782608695 - type: cosine_recall value: 92.4 - type: dot_accuracy value: 99.83861386138614 - type: dot_accuracy_threshold value: 74.75666999816895 - type: dot_ap value: 96.15132792066653 - type: dot_f1 value: 91.84890656063618 - type: dot_f1_threshold value: 71.70596122741699 - type: dot_precision value: 91.30434782608695 - type: dot_recall value: 92.4 - type: euclidean_accuracy value: 99.83861386138614 - type: euclidean_accuracy_threshold value: 71.05395793914795 - type: euclidean_ap value: 96.15132792066652 - type: euclidean_f1 value: 91.84890656063618 - type: euclidean_f1_threshold value: 75.22505521774292 - type: euclidean_precision value: 91.30434782608695 - 
type: euclidean_recall value: 92.4 - type: main_score value: 96.15132792066653 - type: manhattan_accuracy value: 99.83564356435643 - type: manhattan_accuracy_threshold value: 1547.6950645446777 - type: manhattan_ap value: 96.06151211452136 - type: manhattan_f1 value: 91.61676646706587 - type: manhattan_f1_threshold value: 1626.3608932495117 - type: manhattan_precision value: 91.43426294820716 - type: manhattan_recall value: 91.8 - type: max_ap value: 96.15132792066653 - type: max_f1 value: 91.84890656063618 - type: max_precision value: 91.43426294820716 - type: max_recall value: 92.4 - type: similarity_accuracy value: 99.83861386138614 - type: similarity_accuracy_threshold value: 74.75666999816895 - type: similarity_ap value: 96.15132792066652 - type: similarity_f1 value: 91.84890656063618 - type: similarity_f1_threshold value: 71.70594930648804 - type: similarity_precision value: 91.30434782608695 - type: similarity_recall value: 92.4 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering (default) revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: main_score value: 61.24120328328453 - type: v_measure value: 61.24120328328453 - type: v_measure_std value: 3.9946560691100372 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P (default) revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: main_score value: 33.808268374864745 - type: v_measure value: 33.808268374864745 - type: v_measure_std value: 1.2212188701887239 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions (default) revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: main_score value: 52.19806018468037 - type: map value: 52.19806018468037 - type: mrr value: 52.98921462524404 - type: nAUC_map_diff1 value: 37.41443156995912 - type: nAUC_map_max value: 9.410262727675603 - type: nAUC_map_std value: 8.7094185014992 - type: nAUC_mrr_diff1 value: 37.78202772392581 - type: nAUC_mrr_max value: 10.517635536565816 - type: nAUC_mrr_std value: 8.509423813772491 task: type: Reranking - dataset: config: default name: MTEB SummEval (default) revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: cosine_pearson value: 30.48413700430812 - type: cosine_spearman value: 30.357162200875816 - type: dot_pearson value: 30.484140144824938 - type: dot_spearman value: 30.357162200875816 - type: main_score value: 30.357162200875816 - type: pearson value: 30.48413700430812 - type: spearman value: 30.357162200875816 task: type: Summarization - dataset: config: default name: MTEB ToxicConversationsClassification (default) revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 66.8359375 - type: ap value: 12.482653786025985 - type: ap_weighted value: 12.482653786025985 - type: f1 value: 51.328608527332385 - type: f1_weighted value: 74.07974463955398 - type: main_score value: 66.8359375 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification (default) revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 53.907753254103 - type: f1 value: 54.22707647269581 - type: f1_weighted value: 53.611822984407695 - type: 
main_score value: 53.907753254103 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering (default) revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: main_score value: 38.1364789307295 - type: v_measure value: 38.1364789307295 - type: v_measure_std value: 2.0731634966352077 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 (default) revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: cosine_accuracy value: 82.66674614054956 - type: cosine_accuracy_threshold value: 79.80123162269592 - type: cosine_ap value: 63.28209719072804 - type: cosine_f1 value: 60.16389710903711 - type: cosine_f1_threshold value: 72.22893834114075 - type: cosine_precision value: 52.90232185748599 - type: cosine_recall value: 69.73614775725594 - type: dot_accuracy value: 82.66674614054956 - type: dot_accuracy_threshold value: 79.8012375831604 - type: dot_ap value: 63.282103870645166 - type: dot_f1 value: 60.16389710903711 - type: dot_f1_threshold value: 72.22894430160522 - type: dot_precision value: 52.90232185748599 - type: dot_recall value: 69.73614775725594 - type: euclidean_accuracy value: 82.66674614054956 - type: euclidean_accuracy_threshold value: 63.55905532836914 - type: euclidean_ap value: 63.282095399953164 - type: euclidean_f1 value: 60.16389710903711 - type: euclidean_f1_threshold value: 74.5265781879425 - type: euclidean_precision value: 52.90232185748599 - type: euclidean_recall value: 69.73614775725594 - type: main_score value: 63.282103870645166 - type: manhattan_accuracy value: 82.74423317637242 - type: manhattan_accuracy_threshold value: 1415.380859375 - type: manhattan_ap value: 63.26931757839598 - type: manhattan_f1 value: 60.11014948859166 - type: manhattan_f1_threshold value: 1632.522201538086 - type: manhattan_precision value: 52.359506559624045 - type: manhattan_recall value: 70.55408970976254 - type: max_ap value: 63.282103870645166 - type: max_f1 value: 60.16389710903711 - type: max_precision value: 52.90232185748599 - type: max_recall value: 70.55408970976254 - type: similarity_accuracy value: 82.66674614054956 - type: similarity_accuracy_threshold value: 79.80123162269592 - type: similarity_ap value: 63.28209719072804 - type: similarity_f1 value: 60.16389710903711 - type: similarity_f1_threshold value: 72.22893834114075 - type: similarity_precision value: 52.90232185748599 - type: similarity_recall value: 69.73614775725594 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus (default) revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: cosine_accuracy value: 88.10105949470253 - type: cosine_accuracy_threshold value: 68.95147562026978 - type: cosine_ap value: 84.65516103854583 - type: cosine_f1 value: 76.54581123301605 - type: cosine_f1_threshold value: 63.92929553985596 - type: cosine_precision value: 72.46526344751685 - type: cosine_recall value: 81.11333538651063 - type: dot_accuracy value: 88.10105949470253 - type: dot_accuracy_threshold value: 68.95147562026978 - type: dot_ap value: 84.65516301437592 - type: dot_f1 value: 76.54581123301605 - type: dot_f1_threshold value: 63.92928957939148 - type: dot_precision value: 72.46526344751685 - type: dot_recall value: 81.11333538651063 - type: euclidean_accuracy value: 88.10105949470253 - type: euclidean_accuracy_threshold value: 
78.80169153213501 - type: euclidean_ap value: 84.65517268264233 - type: euclidean_f1 value: 76.54581123301605 - type: euclidean_f1_threshold value: 84.93610620498657 - type: euclidean_precision value: 72.46526344751685 - type: euclidean_recall value: 81.11333538651063 - type: main_score value: 84.65517268264233 - type: manhattan_accuracy value: 88.08941669577366 - type: manhattan_accuracy_threshold value: 1739.3169403076172 - type: manhattan_ap value: 84.64592398855694 - type: manhattan_f1 value: 76.62890540443034 - type: manhattan_f1_threshold value: 1861.344337463379 - type: manhattan_precision value: 72.09775967413442 - type: manhattan_recall value: 81.76778564829073 - type: max_ap value: 84.65517268264233 - type: max_f1 value: 76.62890540443034 - type: max_precision value: 72.46526344751685 - type: max_recall value: 81.76778564829073 - type: similarity_accuracy value: 88.10105949470253 - type: similarity_accuracy_threshold value: 68.95147562026978 - type: similarity_ap value: 84.65516103854583 - type: similarity_f1 value: 76.54581123301605 - type: similarity_f1_threshold value: 63.92929553985596 - type: similarity_precision value: 72.46526344751685 - type: similarity_recall value: 81.11333538651063 task: type: PairClassification ---

# Snowflake's Arctic-embed-m-v1.5

News | This Model | Usage | FAQ | Contact | License | Acknowledgement

## News

- 12/11/2024: Release of the technical report for the 2.0 models.
- 12/04/2024: Release of the L-2.0 and M-2.0 models.
- 07/26/2024: Release of the preprint [[2407.18887] Embedding And Clustering Your Data Can Improve Contrastive Pretraining](https://arxiv.org/abs/2407.18887) on arXiv.
- 07/18/2024: Release of snowflake-arctic-embed-m-v1.5, capable of producing highly compressible embedding vectors that preserve quality even when squished as small as 128 bytes per vector. Details about the development of this model are available in the launch post on the Snowflake engineering blog.
- 05/10/2024: Release of the technical report on Arctic Embed.
- 04/16/2024: Original release of the snowflake-arctic-embed family of text embedding models.

## This Model

This model is an updated version of snowflake-arctic-embed-m designed to improve embedding vector compressibility. It achieves slightly higher overall performance without compression, and it is additionally capable of retaining most of its retrieval quality even down to 128-byte embedding vectors through a combination of Matryoshka Representation Learning (MRL) and uniform scalar quantization.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|:---|:---|
| snowflake-arctic-embed-m-v1.5 | 55.14 |
| snowflake-arctic-embed-m | 54.91 |

Compared to several other models trained with MRL to produce 256-dimensional embedding vectors, snowflake-arctic-embed-m-v1.5 retains a higher degree of original model quality and delivers better retrieval quality on the MTEB Retrieval benchmark.

| Model | Model Parameters | MTEB Retrieval Score at 256 Dimensions (fraction of arctic-embed-m-v1.5) |
|:---|:---|:---|
| Snowflake arctic-embed-m-v1.5 | 109M | 54.2 (100%) |
| Google gecko | 1200M | 52.4 (97%) |
| OpenAI text-embedding-3-large | Not Published | 51.7 (95%) |
| Nomic nomic-embed-text-v1.5 | 138M | 50.8 (94%) |

Additionally, this model was designed to pair well with a corpus-independent scalar quantization scheme to achieve great performance even in as little as 128 bytes per vector (24x compression compared to 768-dimensional vectors stored in float32).

| Model Version | Dimensionality | Scalar Quantization | Bytes Per Vector (fraction of baseline) | MTEB Retrieval Score (fraction of baseline) | Vectors Per GB (improvement over baseline) |
|:---|---:|:---|:---|:---|:---|
| v1 | 768 | None (float32) | 3072 (100%) | 54.9 (100%) | 0.33M (1.0x) |
| v1 | 768 | int8 | 768 (25%) | 54.9 (100%) | 1.3M (4x) |
| v1.5 | 768 | int8 | 768 (25%) | 55.1 (100%) | 1.3M (4x) |
| v1.5 | 256 | int8 | 256 (8.3%) | 54.2 (99%) | 3.9M (12x) |
| v1.5 | 256 | int4 | 128 (4.2%) | 53.7 (98%) | 7.8M (24x) |

NOTE: Good uniform scalar quantization ranges to use with this model (and which were used in the eval above) are -0.18 to +0.18 for 4-bit and -0.3 to +0.3 for 8-bit quantization. For a detailed walkthrough of using integer quantization with this model, check out our example notebook on GitHub; a minimal sketch of the full 128-byte scheme also follows below.
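The sketch below illustrates the two-step recipe (MRL truncation to 256 dimensions, then 4-bit uniform scalar quantization over -0.18 to +0.18) under stated assumptions: it uses NumPy, takes an already-computed 768-dimensional embedding as input, and the function names and nibble-packing layout are illustrative choices of this sketch, not Snowflake's reference implementation.

```python
# A minimal sketch of the 128-byte compression recipe described above.
# Assumptions: NumPy, an already-computed 768-dim float32 embedding, and an
# illustrative nibble-packing layout (not Snowflake's reference code).
import numpy as np

LO, HI = -0.18, 0.18  # 4-bit uniform quantization range from the NOTE above

def compress_to_128_bytes(embedding: np.ndarray) -> bytes:
    # 1. MRL-style truncation to the first 256 dimensions, then renormalize.
    v = embedding[:256]
    v = v / np.linalg.norm(v)
    # 2. Uniform 4-bit scalar quantization: 16 levels across [LO, HI].
    codes = np.clip(np.round((v - LO) / (HI - LO) * 15), 0, 15).astype(np.uint8)
    # 3. Pack two 4-bit codes per byte: 256 values -> 128 bytes.
    return ((codes[0::2] << 4) | codes[1::2]).tobytes()

def decompress(blob: bytes) -> np.ndarray:
    packed = np.frombuffer(blob, dtype=np.uint8)
    codes = np.empty(256, dtype=np.uint8)
    codes[0::2], codes[1::2] = packed >> 4, packed & 0x0F
    # Map the 16 levels back to approximate float values in [LO, HI].
    return codes.astype(np.float32) / 15 * (HI - LO) + LO
```

At 4 bits per value, 256 dimensions occupy exactly 128 bytes, which is where the 24x figure in the table above comes from (3072 / 128 = 24).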
## Usage

### Using Sentence Transformers

You can use the sentence-transformers package to use any of the snowflake-arctic-embed models. Here's an example for snowflake-arctic-embed-m-v1.5, sketched below.
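A minimal sketch, assuming the published Hugging Face model ID Snowflake/snowflake-arctic-embed-m-v1.5; the query prefix string and the example texts are illustrative assumptions, since the card's original snippet is not reproduced above.

```python
# A minimal sketch; the model ID is the published Hugging Face ID, but the
# query prefix string and the example texts are illustrative assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")

# Apply the retrieval prefix to queries only (assumed prefix; see the note below).
query_prefix = "Represent this sentence for searching relevant passages: "
queries = [query_prefix + "what is snowflake?"]
documents = ["The Data Cloud!", "Mexico City of Course!"]

query_embeddings = model.encode(queries, normalize_embeddings=True)
document_embeddings = model.encode(documents, normalize_embeddings=True)

# With normalized embeddings, the dot product equals cosine similarity.
scores = query_embeddings @ document_embeddings.T
print(scores)
```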
### Using Huggingface transformers

You can use the transformers package to use a snowflake-arctic-embed model, too. For optimal retrieval quality, remember to use the CLS token for embeddings and to apply the query prefix to the query only, as in the sketch below.
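A minimal sketch under the same assumptions as above (the model ID is the published one, the prefix string is an assumption); CLS-token pooling follows the card's guidance.

```python
# A minimal sketch under the same assumptions (assumed prefix string);
# CLS-token pooling follows the card's guidance above.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Snowflake/snowflake-arctic-embed-m-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

query_prefix = "Represent this sentence for searching relevant passages: "
queries = [query_prefix + "what is snowflake?"]
documents = ["The Data Cloud!", "Mexico City of Course!"]

def embed(texts):
    tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Use the CLS token (first position) as the sentence embedding.
        emb = model(**tokens).last_hidden_state[:, 0]
    return torch.nn.functional.normalize(emb, dim=-1)

scores = embed(queries) @ embed(documents).T
print(scores)
```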
### Using Transformers.js

If you haven't already, you can install the Transformers.js JavaScript library from NPM. You can then use the model to compute embeddings in the same way as the examples above: apply the query prefix to queries only and take the CLS token as the embedding.

### Compressing to 128 bytes

This model is designed to generate embeddings which compress well down to 128 bytes via a two-part compression scheme:

1. Truncation and renormalization to 256 dimensions (à la Matryoshka Representation Learning; see the original paper for reference).
2. 4-bit uniform scalar quantization of all 256 values to the same range (-0.18 to +0.18).
   - For 8-bit uniform scalar quantization, the slightly wider range -0.3 to +0.3 tends to work slightly better, given how much more granular 8-bit quantization is.

This is the scheme sketched in code in the This Model section above. For in-depth examples, check out our arctic-embed GitHub repository.

## FAQ

TBD

## Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project. You can also email Daniel Campos (daniel.campos@snowflake.com).

## License

Arctic is licensed under the Apache-2.0 license. The released models can be used for commercial purposes free of charge.

## Acknowledgement

We want to thank the open-source community, which has provided the great building blocks upon which we could make our models. We thank our modeling engineers, Danmei Xu, Luke Merrick, Gaurav Nuti, and Daniel Campos, for making these great models possible. We thank our leadership, Himabindu Pucha, Kelvin So, Vivek Raghunathan, and Sridhar Ramaswamy, for supporting this work. We also thank the open-source community for producing the great models we could build on top of and making these releases possible. Finally, we thank the researchers who created the BEIR and MTEB benchmarks. It is largely thanks to their tireless work to define what better looks like that we could improve model performance.", + "model_explanation_gemini": "Generates embeddings for sentences to measure similarity and perform feature extraction tasks."
+} \ No newline at end of file diff --git a/data/model_data_json/Snowflake_snowflake-arctic-embed-m.json b/data/model_data_json/Snowflake_snowflake-arctic-embed-m.json new file mode 100644 index 0000000000000000000000000000000000000000..67cd3e18a64ba70736e7f0152aaafc0783082724 --- /dev/null +++ b/data/model_data_json/Snowflake_snowflake-arctic-embed-m.json @@ -0,0 +1,26 @@ +{ + "model_id": "Snowflake/snowflake-arctic-embed-m", + "downloads": 623087, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "arctic", + "snowflake-arctic-embed", + "transformers.js", + "arxiv:2407.18887", + "arxiv:2405.05374", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - arctic - snowflake-arctic-embed - transformers.js model-index: - name: snowflake-arctic-embed-m results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.80597014925374 - type: ap value: 39.31198155789558 - type: f1 value: 70.48198448222148 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 82.831525 - type: ap value: 77.4474050181638 - type: f1 value: 82.77204845110204 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 38.93000000000001 - type: f1 value: 37.98013371053459 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 31.223 - type: map_at_10 value: 47.43 - type: map_at_100 value: 48.208 - type: map_at_1000 value: 48.211 - type: map_at_3 value: 42.579 - type: map_at_5 value: 45.263999999999996 - type: mrr_at_1 value: 31.65 - type: mrr_at_10 value: 47.573 - type: mrr_at_100 value: 48.359 - type: mrr_at_1000 value: 48.362 - type: mrr_at_3 value: 42.734 - type: mrr_at_5 value: 45.415 - type: ndcg_at_1 value: 31.223 - type: ndcg_at_10 value: 56.436 - type: ndcg_at_100 value: 59.657000000000004 - type: ndcg_at_1000 value: 59.731 - type: ndcg_at_3 value: 46.327 - type: ndcg_at_5 value: 51.178000000000004 - type: precision_at_1 value: 31.223 - type: precision_at_10 value: 8.527999999999999 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 19.061 - type: precision_at_5 value: 13.797999999999998 - type: recall_at_1 value: 31.223 - type: recall_at_10 value: 85.277 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 57.18299999999999 - type: recall_at_5 value: 68.99 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.23625429411296 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default 
split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 37.433880471403654 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 60.53175025582013 - type: mrr value: 74.51160796728664 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 88.93746103286769 - type: cos_sim_spearman value: 86.62245567912619 - type: euclidean_pearson value: 87.154173907501 - type: euclidean_spearman value: 86.62245567912619 - type: manhattan_pearson value: 87.17682026633462 - type: manhattan_spearman value: 86.74775973908348 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 80.33766233766232 - type: f1 value: 79.64931422442245 - task: type: Clustering dataset: type: jinaai/big-patent-clustering name: MTEB BigPatentClustering config: default split: test revision: 62d5330920bca426ce9d3c76ea914f15fc83e891 metrics: - type: v_measure value: 19.116028913890613 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 36.966921852810174 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 31.98019698537654 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 34.079 - type: map_at_10 value: 46.35 - type: map_at_100 value: 47.785 - type: map_at_1000 value: 47.903 - type: map_at_3 value: 42.620999999999995 - type: map_at_5 value: 44.765 - type: mrr_at_1 value: 41.345 - type: mrr_at_10 value: 52.032000000000004 - type: mrr_at_100 value: 52.690000000000005 - type: mrr_at_1000 value: 52.727999999999994 - type: mrr_at_3 value: 49.428 - type: mrr_at_5 value: 51.093999999999994 - type: ndcg_at_1 value: 41.345 - type: ndcg_at_10 value: 53.027 - type: ndcg_at_100 value: 57.962 - type: ndcg_at_1000 value: 59.611999999999995 - type: ndcg_at_3 value: 47.687000000000005 - type: ndcg_at_5 value: 50.367 - type: precision_at_1 value: 41.345 - type: precision_at_10 value: 10.157 - type: precision_at_100 value: 1.567 - type: precision_at_1000 value: 0.199 - type: precision_at_3 value: 23.081 - type: precision_at_5 value: 16.738 - type: recall_at_1 value: 34.079 - type: recall_at_10 value: 65.93900000000001 - type: recall_at_100 value: 86.42699999999999 - type: recall_at_1000 value: 96.61 - type: recall_at_3 value: 50.56699999999999 - type: recall_at_5 value: 57.82000000000001 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 33.289 - type: map_at_10 value: 43.681 - type: map_at_100 value: 45.056000000000004 - type: map_at_1000 value: 45.171 - type: map_at_3 value: 40.702 - type: map_at_5 value: 42.292 - type: mrr_at_1 
value: 41.146 - type: mrr_at_10 value: 49.604 - type: mrr_at_100 value: 50.28399999999999 - type: mrr_at_1000 value: 50.322 - type: mrr_at_3 value: 47.611 - type: mrr_at_5 value: 48.717 - type: ndcg_at_1 value: 41.146 - type: ndcg_at_10 value: 49.43 - type: ndcg_at_100 value: 54.01899999999999 - type: ndcg_at_1000 value: 55.803000000000004 - type: ndcg_at_3 value: 45.503 - type: ndcg_at_5 value: 47.198 - type: precision_at_1 value: 41.146 - type: precision_at_10 value: 9.268 - type: precision_at_100 value: 1.4749999999999999 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 21.932 - type: precision_at_5 value: 15.389 - type: recall_at_1 value: 33.289 - type: recall_at_10 value: 59.209999999999994 - type: recall_at_100 value: 78.676 - type: recall_at_1000 value: 89.84100000000001 - type: recall_at_3 value: 47.351 - type: recall_at_5 value: 52.178999999999995 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 44.483 - type: map_at_10 value: 56.862 - type: map_at_100 value: 57.901 - type: map_at_1000 value: 57.948 - type: map_at_3 value: 53.737 - type: map_at_5 value: 55.64 - type: mrr_at_1 value: 50.658 - type: mrr_at_10 value: 60.281 - type: mrr_at_100 value: 60.946 - type: mrr_at_1000 value: 60.967000000000006 - type: mrr_at_3 value: 58.192 - type: mrr_at_5 value: 59.531 - type: ndcg_at_1 value: 50.658 - type: ndcg_at_10 value: 62.339 - type: ndcg_at_100 value: 66.28399999999999 - type: ndcg_at_1000 value: 67.166 - type: ndcg_at_3 value: 57.458 - type: ndcg_at_5 value: 60.112 - type: precision_at_1 value: 50.658 - type: precision_at_10 value: 9.762 - type: precision_at_100 value: 1.26 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 25.329 - type: precision_at_5 value: 17.254 - type: recall_at_1 value: 44.483 - type: recall_at_10 value: 74.819 - type: recall_at_100 value: 91.702 - type: recall_at_1000 value: 97.84 - type: recall_at_3 value: 62.13999999999999 - type: recall_at_5 value: 68.569 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 26.489 - type: map_at_10 value: 37.004999999999995 - type: map_at_100 value: 38.001000000000005 - type: map_at_1000 value: 38.085 - type: map_at_3 value: 34.239999999999995 - type: map_at_5 value: 35.934 - type: mrr_at_1 value: 28.362 - type: mrr_at_10 value: 38.807 - type: mrr_at_100 value: 39.671 - type: mrr_at_1000 value: 39.736 - type: mrr_at_3 value: 36.29 - type: mrr_at_5 value: 37.906 - type: ndcg_at_1 value: 28.362 - type: ndcg_at_10 value: 42.510999999999996 - type: ndcg_at_100 value: 47.226 - type: ndcg_at_1000 value: 49.226 - type: ndcg_at_3 value: 37.295 - type: ndcg_at_5 value: 40.165 - type: precision_at_1 value: 28.362 - type: precision_at_10 value: 6.633 - type: precision_at_100 value: 0.9490000000000001 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 16.234 - type: precision_at_5 value: 11.434999999999999 - type: recall_at_1 value: 26.489 - type: recall_at_10 value: 57.457 - type: recall_at_100 value: 78.712 - type: recall_at_1000 value: 93.565 - type: recall_at_3 value: 43.748 - type: recall_at_5 value: 50.589 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default 
split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 12.418999999999999 - type: map_at_10 value: 22.866 - type: map_at_100 value: 24.365000000000002 - type: map_at_1000 value: 24.479 - type: map_at_3 value: 19.965 - type: map_at_5 value: 21.684 - type: mrr_at_1 value: 14.677000000000001 - type: mrr_at_10 value: 26.316 - type: mrr_at_100 value: 27.514 - type: mrr_at_1000 value: 27.57 - type: mrr_at_3 value: 23.3 - type: mrr_at_5 value: 25.191000000000003 - type: ndcg_at_1 value: 14.677000000000001 - type: ndcg_at_10 value: 28.875 - type: ndcg_at_100 value: 35.607 - type: ndcg_at_1000 value: 38.237 - type: ndcg_at_3 value: 23.284 - type: ndcg_at_5 value: 26.226 - type: precision_at_1 value: 14.677000000000001 - type: precision_at_10 value: 5.771 - type: precision_at_100 value: 1.058 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_3 value: 11.940000000000001 - type: precision_at_5 value: 9.229 - type: recall_at_1 value: 12.418999999999999 - type: recall_at_10 value: 43.333 - type: recall_at_100 value: 71.942 - type: recall_at_1000 value: 90.67399999999999 - type: recall_at_3 value: 28.787000000000003 - type: recall_at_5 value: 35.638 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 31.686999999999998 - type: map_at_10 value: 42.331 - type: map_at_100 value: 43.655 - type: map_at_1000 value: 43.771 - type: map_at_3 value: 38.944 - type: map_at_5 value: 40.991 - type: mrr_at_1 value: 37.921 - type: mrr_at_10 value: 47.534 - type: mrr_at_100 value: 48.362 - type: mrr_at_1000 value: 48.405 - type: mrr_at_3 value: 44.995000000000005 - type: mrr_at_5 value: 46.617 - type: ndcg_at_1 value: 37.921 - type: ndcg_at_10 value: 48.236000000000004 - type: ndcg_at_100 value: 53.705000000000005 - type: ndcg_at_1000 value: 55.596000000000004 - type: ndcg_at_3 value: 43.11 - type: ndcg_at_5 value: 45.862 - type: precision_at_1 value: 37.921 - type: precision_at_10 value: 8.643 - type: precision_at_100 value: 1.336 - type: precision_at_1000 value: 0.166 - type: precision_at_3 value: 20.308 - type: precision_at_5 value: 14.514 - type: recall_at_1 value: 31.686999999999998 - type: recall_at_10 value: 60.126999999999995 - type: recall_at_100 value: 83.10600000000001 - type: recall_at_1000 value: 95.15 - type: recall_at_3 value: 46.098 - type: recall_at_5 value: 53.179 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 28.686 - type: map_at_10 value: 39.146 - type: map_at_100 value: 40.543 - type: map_at_1000 value: 40.644999999999996 - type: map_at_3 value: 36.195 - type: map_at_5 value: 37.919000000000004 - type: mrr_at_1 value: 35.160000000000004 - type: mrr_at_10 value: 44.711 - type: mrr_at_100 value: 45.609 - type: mrr_at_1000 value: 45.655 - type: mrr_at_3 value: 42.409 - type: mrr_at_5 value: 43.779 - type: ndcg_at_1 value: 35.160000000000004 - type: ndcg_at_10 value: 44.977000000000004 - type: ndcg_at_100 value: 50.663000000000004 - type: ndcg_at_1000 value: 52.794 - type: ndcg_at_3 value: 40.532000000000004 - type: ndcg_at_5 value: 42.641 - type: precision_at_1 value: 35.160000000000004 - type: precision_at_10 value: 8.014000000000001 - type: precision_at_100 value: 1.269 - type: precision_at_1000 value: 
0.163 - type: precision_at_3 value: 19.444 - type: precision_at_5 value: 13.653 - type: recall_at_1 value: 28.686 - type: recall_at_10 value: 56.801 - type: recall_at_100 value: 80.559 - type: recall_at_1000 value: 95.052 - type: recall_at_3 value: 43.675999999999995 - type: recall_at_5 value: 49.703 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 28.173833333333338 - type: map_at_10 value: 38.202083333333334 - type: map_at_100 value: 39.47475 - type: map_at_1000 value: 39.586499999999994 - type: map_at_3 value: 35.17308333333334 - type: map_at_5 value: 36.914 - type: mrr_at_1 value: 32.92958333333333 - type: mrr_at_10 value: 42.16758333333333 - type: mrr_at_100 value: 43.04108333333333 - type: mrr_at_1000 value: 43.092499999999994 - type: mrr_at_3 value: 39.69166666666666 - type: mrr_at_5 value: 41.19458333333333 - type: ndcg_at_1 value: 32.92958333333333 - type: ndcg_at_10 value: 43.80583333333333 - type: ndcg_at_100 value: 49.060916666666664 - type: ndcg_at_1000 value: 51.127250000000004 - type: ndcg_at_3 value: 38.80383333333333 - type: ndcg_at_5 value: 41.29658333333333 - type: precision_at_1 value: 32.92958333333333 - type: precision_at_10 value: 7.655666666666666 - type: precision_at_100 value: 1.2094166666666668 - type: precision_at_1000 value: 0.15750000000000003 - type: precision_at_3 value: 17.87975 - type: precision_at_5 value: 12.741833333333332 - type: recall_at_1 value: 28.173833333333338 - type: recall_at_10 value: 56.219249999999995 - type: recall_at_100 value: 79.01416666666665 - type: recall_at_1000 value: 93.13425000000001 - type: recall_at_3 value: 42.39241666666667 - type: recall_at_5 value: 48.764833333333335 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 25.625999999999998 - type: map_at_10 value: 32.808 - type: map_at_100 value: 33.951 - type: map_at_1000 value: 34.052 - type: map_at_3 value: 30.536 - type: map_at_5 value: 31.77 - type: mrr_at_1 value: 28.374 - type: mrr_at_10 value: 35.527 - type: mrr_at_100 value: 36.451 - type: mrr_at_1000 value: 36.522 - type: mrr_at_3 value: 33.410000000000004 - type: mrr_at_5 value: 34.537 - type: ndcg_at_1 value: 28.374 - type: ndcg_at_10 value: 37.172 - type: ndcg_at_100 value: 42.474000000000004 - type: ndcg_at_1000 value: 44.853 - type: ndcg_at_3 value: 32.931 - type: ndcg_at_5 value: 34.882999999999996 - type: precision_at_1 value: 28.374 - type: precision_at_10 value: 5.813 - type: precision_at_100 value: 0.928 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 14.008000000000001 - type: precision_at_5 value: 9.754999999999999 - type: recall_at_1 value: 25.625999999999998 - type: recall_at_10 value: 47.812 - type: recall_at_100 value: 71.61800000000001 - type: recall_at_1000 value: 88.881 - type: recall_at_3 value: 35.876999999999995 - type: recall_at_5 value: 40.839 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 18.233 - type: map_at_10 value: 26.375999999999998 - type: map_at_100 value: 27.575 - type: map_at_1000 value: 27.706999999999997 - type: map_at_3 value: 23.619 - type: map_at_5 value: 25.217 - type: mrr_at_1 value: 22.023 - 
type: mrr_at_10 value: 30.122 - type: mrr_at_100 value: 31.083 - type: mrr_at_1000 value: 31.163999999999998 - type: mrr_at_3 value: 27.541 - type: mrr_at_5 value: 29.061999999999998 - type: ndcg_at_1 value: 22.023 - type: ndcg_at_10 value: 31.476 - type: ndcg_at_100 value: 37.114000000000004 - type: ndcg_at_1000 value: 39.981 - type: ndcg_at_3 value: 26.538 - type: ndcg_at_5 value: 29.016 - type: precision_at_1 value: 22.023 - type: precision_at_10 value: 5.819 - type: precision_at_100 value: 1.018 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 12.583 - type: precision_at_5 value: 9.36 - type: recall_at_1 value: 18.233 - type: recall_at_10 value: 43.029 - type: recall_at_100 value: 68.253 - type: recall_at_1000 value: 88.319 - type: recall_at_3 value: 29.541 - type: recall_at_5 value: 35.783 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 28.923 - type: map_at_10 value: 39.231 - type: map_at_100 value: 40.483000000000004 - type: map_at_1000 value: 40.575 - type: map_at_3 value: 35.94 - type: map_at_5 value: 37.683 - type: mrr_at_1 value: 33.955 - type: mrr_at_10 value: 43.163000000000004 - type: mrr_at_100 value: 44.054 - type: mrr_at_1000 value: 44.099 - type: mrr_at_3 value: 40.361000000000004 - type: mrr_at_5 value: 41.905 - type: ndcg_at_1 value: 33.955 - type: ndcg_at_10 value: 45.068000000000005 - type: ndcg_at_100 value: 50.470000000000006 - type: ndcg_at_1000 value: 52.349000000000004 - type: ndcg_at_3 value: 39.298 - type: ndcg_at_5 value: 41.821999999999996 - type: precision_at_1 value: 33.955 - type: precision_at_10 value: 7.649 - type: precision_at_100 value: 1.173 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_3 value: 17.817 - type: precision_at_5 value: 12.537 - type: recall_at_1 value: 28.923 - type: recall_at_10 value: 58.934 - type: recall_at_100 value: 81.809 - type: recall_at_1000 value: 94.71300000000001 - type: recall_at_3 value: 42.975 - type: recall_at_5 value: 49.501 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 28.596 - type: map_at_10 value: 38.735 - type: map_at_100 value: 40.264 - type: map_at_1000 value: 40.48 - type: map_at_3 value: 35.394999999999996 - type: map_at_5 value: 37.099 - type: mrr_at_1 value: 33.992 - type: mrr_at_10 value: 43.076 - type: mrr_at_100 value: 44.005 - type: mrr_at_1000 value: 44.043 - type: mrr_at_3 value: 40.415 - type: mrr_at_5 value: 41.957 - type: ndcg_at_1 value: 33.992 - type: ndcg_at_10 value: 44.896 - type: ndcg_at_100 value: 50.44499999999999 - type: ndcg_at_1000 value: 52.675000000000004 - type: ndcg_at_3 value: 39.783 - type: ndcg_at_5 value: 41.997 - type: precision_at_1 value: 33.992 - type: precision_at_10 value: 8.498 - type: precision_at_100 value: 1.585 - type: precision_at_1000 value: 0.248 - type: precision_at_3 value: 18.511 - type: precision_at_5 value: 13.241 - type: recall_at_1 value: 28.596 - type: recall_at_10 value: 56.885 - type: recall_at_100 value: 82.306 - type: recall_at_1000 value: 95.813 - type: recall_at_3 value: 42.168 - type: recall_at_5 value: 48.32 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 
4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 25.576 - type: map_at_10 value: 33.034 - type: map_at_100 value: 34.117999999999995 - type: map_at_1000 value: 34.222 - type: map_at_3 value: 30.183 - type: map_at_5 value: 31.974000000000004 - type: mrr_at_1 value: 27.542 - type: mrr_at_10 value: 34.838 - type: mrr_at_100 value: 35.824 - type: mrr_at_1000 value: 35.899 - type: mrr_at_3 value: 32.348 - type: mrr_at_5 value: 34.039 - type: ndcg_at_1 value: 27.542 - type: ndcg_at_10 value: 37.663000000000004 - type: ndcg_at_100 value: 42.762 - type: ndcg_at_1000 value: 45.235 - type: ndcg_at_3 value: 32.227 - type: ndcg_at_5 value: 35.27 - type: precision_at_1 value: 27.542 - type: precision_at_10 value: 5.840999999999999 - type: precision_at_100 value: 0.895 - type: precision_at_1000 value: 0.123 - type: precision_at_3 value: 13.370000000000001 - type: precision_at_5 value: 9.797 - type: recall_at_1 value: 25.576 - type: recall_at_10 value: 50.285000000000004 - type: recall_at_100 value: 73.06 - type: recall_at_1000 value: 91.15299999999999 - type: recall_at_3 value: 35.781 - type: recall_at_5 value: 43.058 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 17.061 - type: map_at_10 value: 29.464000000000002 - type: map_at_100 value: 31.552999999999997 - type: map_at_1000 value: 31.707 - type: map_at_3 value: 24.834999999999997 - type: map_at_5 value: 27.355 - type: mrr_at_1 value: 38.958 - type: mrr_at_10 value: 51.578 - type: mrr_at_100 value: 52.262 - type: mrr_at_1000 value: 52.283 - type: mrr_at_3 value: 48.599 - type: mrr_at_5 value: 50.404 - type: ndcg_at_1 value: 38.958 - type: ndcg_at_10 value: 39.367999999999995 - type: ndcg_at_100 value: 46.521 - type: ndcg_at_1000 value: 49.086999999999996 - type: ndcg_at_3 value: 33.442 - type: ndcg_at_5 value: 35.515 - type: precision_at_1 value: 38.958 - type: precision_at_10 value: 12.110999999999999 - type: precision_at_100 value: 1.982 - type: precision_at_1000 value: 0.247 - type: precision_at_3 value: 25.102999999999998 - type: precision_at_5 value: 18.971 - type: recall_at_1 value: 17.061 - type: recall_at_10 value: 45.198 - type: recall_at_100 value: 69.18900000000001 - type: recall_at_1000 value: 83.38499999999999 - type: recall_at_3 value: 30.241 - type: recall_at_5 value: 36.851 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.398 - type: map_at_10 value: 21.421 - type: map_at_100 value: 31.649 - type: map_at_1000 value: 33.469 - type: map_at_3 value: 15.310000000000002 - type: map_at_5 value: 17.946 - type: mrr_at_1 value: 71 - type: mrr_at_10 value: 78.92099999999999 - type: mrr_at_100 value: 79.225 - type: mrr_at_1000 value: 79.23 - type: mrr_at_3 value: 77.792 - type: mrr_at_5 value: 78.467 - type: ndcg_at_1 value: 57.99999999999999 - type: ndcg_at_10 value: 44.733000000000004 - type: ndcg_at_100 value: 50.646 - type: ndcg_at_1000 value: 57.903999999999996 - type: ndcg_at_3 value: 49.175999999999995 - type: ndcg_at_5 value: 46.800999999999995 - type: precision_at_1 value: 71 - type: precision_at_10 value: 36.25 - type: precision_at_100 value: 12.135 - type: precision_at_1000 value: 2.26 - type: precision_at_3 value: 52.75 - type: precision_at_5 value: 45.65 - type: recall_at_1 value: 9.398 - type: recall_at_10 value: 
26.596999999999998 - type: recall_at_100 value: 57.943 - type: recall_at_1000 value: 81.147 - type: recall_at_3 value: 16.634 - type: recall_at_5 value: 20.7 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.535000000000004 - type: f1 value: 42.53702746452163 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 77.235 - type: map_at_10 value: 85.504 - type: map_at_100 value: 85.707 - type: map_at_1000 value: 85.718 - type: map_at_3 value: 84.425 - type: map_at_5 value: 85.13 - type: mrr_at_1 value: 83.363 - type: mrr_at_10 value: 89.916 - type: mrr_at_100 value: 89.955 - type: mrr_at_1000 value: 89.956 - type: mrr_at_3 value: 89.32600000000001 - type: mrr_at_5 value: 89.79 - type: ndcg_at_1 value: 83.363 - type: ndcg_at_10 value: 89.015 - type: ndcg_at_100 value: 89.649 - type: ndcg_at_1000 value: 89.825 - type: ndcg_at_3 value: 87.45100000000001 - type: ndcg_at_5 value: 88.39399999999999 - type: precision_at_1 value: 83.363 - type: precision_at_10 value: 10.659 - type: precision_at_100 value: 1.122 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 33.338 - type: precision_at_5 value: 20.671999999999997 - type: recall_at_1 value: 77.235 - type: recall_at_10 value: 95.389 - type: recall_at_100 value: 97.722 - type: recall_at_1000 value: 98.744 - type: recall_at_3 value: 91.19800000000001 - type: recall_at_5 value: 93.635 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 20.835 - type: map_at_10 value: 34.459 - type: map_at_100 value: 36.335 - type: map_at_1000 value: 36.518 - type: map_at_3 value: 30.581000000000003 - type: map_at_5 value: 32.859 - type: mrr_at_1 value: 40.894999999999996 - type: mrr_at_10 value: 50.491 - type: mrr_at_100 value: 51.243 - type: mrr_at_1000 value: 51.286 - type: mrr_at_3 value: 47.994 - type: mrr_at_5 value: 49.429 - type: ndcg_at_1 value: 40.894999999999996 - type: ndcg_at_10 value: 42.403 - type: ndcg_at_100 value: 48.954 - type: ndcg_at_1000 value: 51.961 - type: ndcg_at_3 value: 39.11 - type: ndcg_at_5 value: 40.152 - type: precision_at_1 value: 40.894999999999996 - type: precision_at_10 value: 11.466 - type: precision_at_100 value: 1.833 - type: precision_at_1000 value: 0.23700000000000002 - type: precision_at_3 value: 25.874000000000002 - type: precision_at_5 value: 19.012 - type: recall_at_1 value: 20.835 - type: recall_at_10 value: 49.535000000000004 - type: recall_at_100 value: 73.39099999999999 - type: recall_at_1000 value: 91.01599999999999 - type: recall_at_3 value: 36.379 - type: recall_at_5 value: 42.059999999999995 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 40.945 - type: map_at_10 value: 65.376 - type: map_at_100 value: 66.278 - type: map_at_1000 value: 66.33 - type: map_at_3 value: 61.753 - type: map_at_5 value: 64.077 - type: mrr_at_1 value: 81.891 - type: mrr_at_10 value: 87.256 - type: mrr_at_100 value: 87.392 - type: mrr_at_1000 value: 87.395 - type: mrr_at_3 value: 86.442 - type: mrr_at_5 value: 86.991 - type: ndcg_at_1 value: 81.891 - type: ndcg_at_10 value: 
73.654 - type: ndcg_at_100 value: 76.62299999999999 - type: ndcg_at_1000 value: 77.60000000000001 - type: ndcg_at_3 value: 68.71199999999999 - type: ndcg_at_5 value: 71.563 - type: precision_at_1 value: 81.891 - type: precision_at_10 value: 15.409 - type: precision_at_100 value: 1.77 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 44.15 - type: precision_at_5 value: 28.732000000000003 - type: recall_at_1 value: 40.945 - type: recall_at_10 value: 77.04299999999999 - type: recall_at_100 value: 88.508 - type: recall_at_1000 value: 94.943 - type: recall_at_3 value: 66.226 - type: recall_at_5 value: 71.83 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 74.08200000000001 - type: ap value: 68.10929101713998 - type: f1 value: 73.98447117652009 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 21.729000000000003 - type: map_at_10 value: 34.602 - type: map_at_100 value: 35.756 - type: map_at_1000 value: 35.803000000000004 - type: map_at_3 value: 30.619000000000003 - type: map_at_5 value: 32.914 - type: mrr_at_1 value: 22.364 - type: mrr_at_10 value: 35.183 - type: mrr_at_100 value: 36.287000000000006 - type: mrr_at_1000 value: 36.327999999999996 - type: mrr_at_3 value: 31.258000000000003 - type: mrr_at_5 value: 33.542 - type: ndcg_at_1 value: 22.364 - type: ndcg_at_10 value: 41.765 - type: ndcg_at_100 value: 47.293 - type: ndcg_at_1000 value: 48.457 - type: ndcg_at_3 value: 33.676 - type: ndcg_at_5 value: 37.783 - type: precision_at_1 value: 22.364 - type: precision_at_10 value: 6.662 - type: precision_at_100 value: 0.943 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.435999999999998 - type: precision_at_5 value: 10.764999999999999 - type: recall_at_1 value: 21.729000000000003 - type: recall_at_10 value: 63.815999999999995 - type: recall_at_100 value: 89.265 - type: recall_at_1000 value: 98.149 - type: recall_at_3 value: 41.898 - type: recall_at_5 value: 51.76500000000001 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.73141814865483 - type: f1 value: 92.17518476408004 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 65.18011855905152 - type: f1 value: 46.70999638311856 - task: type: Classification dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClassification (eng) config: eng split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: accuracy value: 75.24261603375525 - type: f1 value: 74.07895183913367 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringP2P (eng) config: eng split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 28.43855875387446 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringS2S (eng) config: eng split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 29.05331990256969 - task: type: Classification dataset: type: mteb/amazon_massive_intent 
name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.92333557498318 - type: f1 value: 64.29789389602692 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.74714189643578 - type: f1 value: 71.672585608315 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.503564225501613 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 28.410225127136457 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 29.170019896091908 - type: mrr value: 29.881276831500976 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 6.544 - type: map_at_10 value: 14.116999999999999 - type: map_at_100 value: 17.522 - type: map_at_1000 value: 19 - type: map_at_3 value: 10.369 - type: map_at_5 value: 12.189 - type: mrr_at_1 value: 47.988 - type: mrr_at_10 value: 56.84 - type: mrr_at_100 value: 57.367000000000004 - type: mrr_at_1000 value: 57.403000000000006 - type: mrr_at_3 value: 54.592 - type: mrr_at_5 value: 56.233 - type: ndcg_at_1 value: 45.82 - type: ndcg_at_10 value: 36.767 - type: ndcg_at_100 value: 33.356 - type: ndcg_at_1000 value: 42.062 - type: ndcg_at_3 value: 42.15 - type: ndcg_at_5 value: 40.355000000000004 - type: precision_at_1 value: 47.988 - type: precision_at_10 value: 27.121000000000002 - type: precision_at_100 value: 8.455 - type: precision_at_1000 value: 2.103 - type: precision_at_3 value: 39.628 - type: precision_at_5 value: 35.356 - type: recall_at_1 value: 6.544 - type: recall_at_10 value: 17.928 - type: recall_at_100 value: 32.843 - type: recall_at_1000 value: 65.752 - type: recall_at_3 value: 11.297 - type: recall_at_5 value: 14.357000000000001 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 39.262 - type: map_at_10 value: 55.095000000000006 - type: map_at_100 value: 55.93900000000001 - type: map_at_1000 value: 55.955999999999996 - type: map_at_3 value: 50.93 - type: map_at_5 value: 53.491 - type: mrr_at_1 value: 43.598 - type: mrr_at_10 value: 57.379999999999995 - type: mrr_at_100 value: 57.940999999999995 - type: mrr_at_1000 value: 57.952000000000005 - type: mrr_at_3 value: 53.998000000000005 - type: mrr_at_5 value: 56.128 - type: ndcg_at_1 value: 43.598 - type: ndcg_at_10 value: 62.427 - type: ndcg_at_100 value: 65.759 - type: ndcg_at_1000 value: 66.133 - type: ndcg_at_3 value: 54.745999999999995 - type: ndcg_at_5 value: 58.975 - type: precision_at_1 value: 43.598 - type: precision_at_10 value: 9.789 - type: precision_at_100 value: 1.171 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 24.295 - type: precision_at_5 value: 17.028 - type: recall_at_1 value: 39.262 - type: recall_at_10 value: 
82.317 - type: recall_at_100 value: 96.391 - type: recall_at_1000 value: 99.116 - type: recall_at_3 value: 62.621 - type: recall_at_5 value: 72.357 - task: type: Classification dataset: type: ag_news name: MTEB NewsClassification config: default split: test revision: eb185aade064a813bc0b7f42de02595523103ca4 metrics: - type: accuracy value: 78.17500000000001 - type: f1 value: 78.01940892857273 - task: type: PairClassification dataset: type: GEM/opusparcus name: MTEB OpusparcusPC (en) config: en split: test revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a metrics: - type: cos_sim_accuracy value: 99.89816700610999 - type: cos_sim_ap value: 100 - type: cos_sim_f1 value: 99.9490575649516 - type: cos_sim_precision value: 100 - type: cos_sim_recall value: 99.89816700610999 - type: dot_accuracy value: 99.89816700610999 - type: dot_ap value: 100 - type: dot_f1 value: 99.9490575649516 - type: dot_precision value: 100 - type: dot_recall value: 99.89816700610999 - type: euclidean_accuracy value: 99.89816700610999 - type: euclidean_ap value: 100 - type: euclidean_f1 value: 99.9490575649516 - type: euclidean_precision value: 100 - type: euclidean_recall value: 99.89816700610999 - type: manhattan_accuracy value: 99.89816700610999 - type: manhattan_ap value: 100 - type: manhattan_f1 value: 99.9490575649516 - type: manhattan_precision value: 100 - type: manhattan_recall value: 99.89816700610999 - type: max_accuracy value: 99.89816700610999 - type: max_ap value: 100 - type: max_f1 value: 99.9490575649516 - task: type: PairClassification dataset: type: paws-x name: MTEB PawsX (en) config: en split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_accuracy value: 61 - type: cos_sim_ap value: 59.630757252602464 - type: cos_sim_f1 value: 62.37521514629949 - type: cos_sim_precision value: 45.34534534534534 - type: cos_sim_recall value: 99.88974641675854 - type: dot_accuracy value: 61 - type: dot_ap value: 59.631527308059006 - type: dot_f1 value: 62.37521514629949 - type: dot_precision value: 45.34534534534534 - type: dot_recall value: 99.88974641675854 - type: euclidean_accuracy value: 61 - type: euclidean_ap value: 59.630757252602464 - type: euclidean_f1 value: 62.37521514629949 - type: euclidean_precision value: 45.34534534534534 - type: euclidean_recall value: 99.88974641675854 - type: manhattan_accuracy value: 60.9 - type: manhattan_ap value: 59.613947780462254 - type: manhattan_f1 value: 62.37521514629949 - type: manhattan_precision value: 45.34534534534534 - type: manhattan_recall value: 99.88974641675854 - type: max_accuracy value: 61 - type: max_ap value: 59.631527308059006 - type: max_f1 value: 62.37521514629949 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 metrics: - type: map_at_1 value: 69.963 - type: map_at_10 value: 83.59400000000001 - type: map_at_100 value: 84.236 - type: map_at_1000 value: 84.255 - type: map_at_3 value: 80.69800000000001 - type: map_at_5 value: 82.568 - type: mrr_at_1 value: 80.58999999999999 - type: mrr_at_10 value: 86.78200000000001 - type: mrr_at_100 value: 86.89099999999999 - type: mrr_at_1000 value: 86.893 - type: mrr_at_3 value: 85.757 - type: mrr_at_5 value: 86.507 - type: ndcg_at_1 value: 80.60000000000001 - type: ndcg_at_10 value: 87.41799999999999 - type: ndcg_at_100 value: 88.723 - type: ndcg_at_1000 value: 88.875 - type: ndcg_at_3 value: 84.565 - type: ndcg_at_5 value: 86.236 - type: precision_at_1 value: 80.60000000000001 - 
type: precision_at_10 value: 13.239 - type: precision_at_100 value: 1.5150000000000001 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 36.947 - type: precision_at_5 value: 24.354 - type: recall_at_1 value: 69.963 - type: recall_at_10 value: 94.553 - type: recall_at_100 value: 99.104 - type: recall_at_1000 value: 99.872 - type: recall_at_3 value: 86.317 - type: recall_at_5 value: 91.023 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 47.52890410998761 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 metrics: - type: v_measure value: 62.760692287940486 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 metrics: - type: map_at_1 value: 5.093 - type: map_at_10 value: 12.695 - type: map_at_100 value: 14.824000000000002 - type: map_at_1000 value: 15.123000000000001 - type: map_at_3 value: 8.968 - type: map_at_5 value: 10.828 - type: mrr_at_1 value: 25.1 - type: mrr_at_10 value: 35.894999999999996 - type: mrr_at_100 value: 36.966 - type: mrr_at_1000 value: 37.019999999999996 - type: mrr_at_3 value: 32.467 - type: mrr_at_5 value: 34.416999999999994 - type: ndcg_at_1 value: 25.1 - type: ndcg_at_10 value: 21.096999999999998 - type: ndcg_at_100 value: 29.202 - type: ndcg_at_1000 value: 34.541 - type: ndcg_at_3 value: 19.875 - type: ndcg_at_5 value: 17.497 - type: precision_at_1 value: 25.1 - type: precision_at_10 value: 10.9 - type: precision_at_100 value: 2.255 - type: precision_at_1000 value: 0.35400000000000004 - type: precision_at_3 value: 18.367 - type: precision_at_5 value: 15.299999999999999 - type: recall_at_1 value: 5.093 - type: recall_at_10 value: 22.092 - type: recall_at_100 value: 45.778 - type: recall_at_1000 value: 71.985 - type: recall_at_3 value: 11.167 - type: recall_at_5 value: 15.501999999999999 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: 20a6d6f312dd54037fe07a32d58e5e168867909d metrics: - type: cos_sim_pearson value: 74.04386981759481 - type: cos_sim_spearman value: 69.12484963763646 - type: euclidean_pearson value: 71.49384353291062 - type: euclidean_spearman value: 69.12484548317074 - type: manhattan_pearson value: 71.49828173987272 - type: manhattan_spearman value: 69.08350274367014 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 66.95372527615659 - type: cos_sim_spearman value: 66.96821894433991 - type: euclidean_pearson value: 64.675348002074 - type: euclidean_spearman value: 66.96821894433991 - type: manhattan_pearson value: 64.5965887073831 - type: manhattan_spearman value: 66.88569076794741 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 77.34698437961983 - type: cos_sim_spearman value: 79.1153001117325 - type: euclidean_pearson value: 78.53562874696966 - type: euclidean_spearman value: 79.11530018205724 - type: manhattan_pearson value: 78.46484988944093 - type: manhattan_spearman value: 79.01416027493104 - task: type: STS dataset: type: 
mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 68.81220371935373 - type: cos_sim_spearman value: 68.50538405089604 - type: euclidean_pearson value: 68.69204272683749 - type: euclidean_spearman value: 68.50534223912419 - type: manhattan_pearson value: 68.67300120149523 - type: manhattan_spearman value: 68.45404301623115 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 78.2464678879813 - type: cos_sim_spearman value: 79.92003940566667 - type: euclidean_pearson value: 79.8080778793964 - type: euclidean_spearman value: 79.92003940566667 - type: manhattan_pearson value: 79.80153621444681 - type: manhattan_spearman value: 79.91293261418134 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 76.31179207708662 - type: cos_sim_spearman value: 78.65597349856115 - type: euclidean_pearson value: 78.76937027472678 - type: euclidean_spearman value: 78.65597349856115 - type: manhattan_pearson value: 78.77129513300605 - type: manhattan_spearman value: 78.62640467680775 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 79.43158429552561 - type: cos_sim_spearman value: 81.46108646565362 - type: euclidean_pearson value: 81.47071791452292 - type: euclidean_spearman value: 81.46108646565362 - type: manhattan_pearson value: 81.56920643846031 - type: manhattan_spearman value: 81.42226241399516 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 66.89546474141514 - type: cos_sim_spearman value: 65.8393752170531 - type: euclidean_pearson value: 67.2580522762307 - type: euclidean_spearman value: 65.8393752170531 - type: manhattan_pearson value: 67.45157729300522 - type: manhattan_spearman value: 66.19470854403802 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 71.39566306334434 - type: cos_sim_spearman value: 74.0981396086974 - type: euclidean_pearson value: 73.7834496259745 - type: euclidean_spearman value: 74.09803741302046 - type: manhattan_pearson value: 73.79958138780945 - type: manhattan_spearman value: 74.09894837555905 - task: type: STS dataset: type: PhilipMay/stsb_multi_mt name: MTEB STSBenchmarkMultilingualSTS (en) config: en split: test revision: 93d57ef91790589e3ce9c365164337a8a78b7632 metrics: - type: cos_sim_pearson value: 71.39566311006806 - type: cos_sim_spearman value: 74.0981396086974 - type: euclidean_pearson value: 73.78344970897099 - type: euclidean_spearman value: 74.09803741302046 - type: manhattan_pearson value: 73.79958147136705 - type: manhattan_spearman value: 74.09894837555905 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 80.81059564334683 - type: mrr value: 94.62696617108381 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB 
SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 57.760999999999996 - type: map_at_10 value: 68.614 - type: map_at_100 value: 69.109 - type: map_at_1000 value: 69.134 - type: map_at_3 value: 65.735 - type: map_at_5 value: 67.42099999999999 - type: mrr_at_1 value: 60.667 - type: mrr_at_10 value: 69.94200000000001 - type: mrr_at_100 value: 70.254 - type: mrr_at_1000 value: 70.28 - type: mrr_at_3 value: 67.72200000000001 - type: mrr_at_5 value: 69.18900000000001 - type: ndcg_at_1 value: 60.667 - type: ndcg_at_10 value: 73.548 - type: ndcg_at_100 value: 75.381 - type: ndcg_at_1000 value: 75.991 - type: ndcg_at_3 value: 68.685 - type: ndcg_at_5 value: 71.26 - type: precision_at_1 value: 60.667 - type: precision_at_10 value: 9.833 - type: precision_at_100 value: 1.08 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 26.889000000000003 - type: precision_at_5 value: 17.8 - type: recall_at_1 value: 57.760999999999996 - type: recall_at_10 value: 87.13300000000001 - type: recall_at_100 value: 95 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 74.211 - type: recall_at_5 value: 80.63900000000001 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.81881188118813 - type: cos_sim_ap value: 95.21196473745837 - type: cos_sim_f1 value: 90.69767441860465 - type: cos_sim_precision value: 91.71779141104295 - type: cos_sim_recall value: 89.7 - type: dot_accuracy value: 99.81881188118813 - type: dot_ap value: 95.21196473745837 - type: dot_f1 value: 90.69767441860465 - type: dot_precision value: 91.71779141104295 - type: dot_recall value: 89.7 - type: euclidean_accuracy value: 99.81881188118813 - type: euclidean_ap value: 95.21196473745839 - type: euclidean_f1 value: 90.69767441860465 - type: euclidean_precision value: 91.71779141104295 - type: euclidean_recall value: 89.7 - type: manhattan_accuracy value: 99.81287128712871 - type: manhattan_ap value: 95.16667174835017 - type: manhattan_f1 value: 90.41095890410959 - type: manhattan_precision value: 91.7610710607621 - type: manhattan_recall value: 89.1 - type: max_accuracy value: 99.81881188118813 - type: max_ap value: 95.21196473745839 - type: max_f1 value: 90.69767441860465 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 59.54942204515638 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 39.42892282672948 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 51.189033075914324 - type: mrr value: 51.97014790764791 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.09466569775977 - type: cos_sim_spearman value: 30.31058660775912 - type: dot_pearson value: 30.09466438861689 - type: dot_spearman 
value: 30.31058660775912 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: bb9466bac8153a0349341eb1b22e06409e78ef4e metrics: - type: map_at_1 value: 0.253 - type: map_at_10 value: 2.07 - type: map_at_100 value: 12.679000000000002 - type: map_at_1000 value: 30.412 - type: map_at_3 value: 0.688 - type: map_at_5 value: 1.079 - type: mrr_at_1 value: 96 - type: mrr_at_10 value: 98 - type: mrr_at_100 value: 98 - type: mrr_at_1000 value: 98 - type: mrr_at_3 value: 98 - type: mrr_at_5 value: 98 - type: ndcg_at_1 value: 89 - type: ndcg_at_10 value: 79.646 - type: ndcg_at_100 value: 62.217999999999996 - type: ndcg_at_1000 value: 55.13400000000001 - type: ndcg_at_3 value: 83.458 - type: ndcg_at_5 value: 80.982 - type: precision_at_1 value: 96 - type: precision_at_10 value: 84.6 - type: precision_at_100 value: 64.34 - type: precision_at_1000 value: 24.534 - type: precision_at_3 value: 88.667 - type: precision_at_5 value: 85.6 - type: recall_at_1 value: 0.253 - type: recall_at_10 value: 2.253 - type: recall_at_100 value: 15.606 - type: recall_at_1000 value: 51.595 - type: recall_at_3 value: 0.7100000000000001 - type: recall_at_5 value: 1.139 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.0540000000000003 - type: map_at_10 value: 13.078999999999999 - type: map_at_100 value: 19.468 - type: map_at_1000 value: 21.006 - type: map_at_3 value: 6.8629999999999995 - type: map_at_5 value: 9.187 - type: mrr_at_1 value: 42.857 - type: mrr_at_10 value: 56.735 - type: mrr_at_100 value: 57.352000000000004 - type: mrr_at_1000 value: 57.352000000000004 - type: mrr_at_3 value: 52.721 - type: mrr_at_5 value: 54.66 - type: ndcg_at_1 value: 38.775999999999996 - type: ndcg_at_10 value: 31.469 - type: ndcg_at_100 value: 42.016999999999996 - type: ndcg_at_1000 value: 52.60399999999999 - type: ndcg_at_3 value: 35.894 - type: ndcg_at_5 value: 33.873 - type: precision_at_1 value: 42.857 - type: precision_at_10 value: 27.346999999999998 - type: precision_at_100 value: 8.327 - type: precision_at_1000 value: 1.551 - type: precision_at_3 value: 36.735 - type: precision_at_5 value: 33.469 - type: recall_at_1 value: 3.0540000000000003 - type: recall_at_10 value: 19.185 - type: recall_at_100 value: 51.056000000000004 - type: recall_at_1000 value: 82.814 - type: recall_at_3 value: 7.961 - type: recall_at_5 value: 11.829 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de metrics: - type: accuracy value: 64.9346 - type: ap value: 12.121605736777527 - type: f1 value: 50.169902005887955 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 56.72608941709111 - type: f1 value: 57.0702928875253 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 37.72671554400943 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 
70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 82.84556237706384 - type: cos_sim_ap value: 63.28364215788651 - type: cos_sim_f1 value: 60.00000000000001 - type: cos_sim_precision value: 54.45161290322581 - type: cos_sim_recall value: 66.80738786279683 - type: dot_accuracy value: 82.84556237706384 - type: dot_ap value: 63.28364302860433 - type: dot_f1 value: 60.00000000000001 - type: dot_precision value: 54.45161290322581 - type: dot_recall value: 66.80738786279683 - type: euclidean_accuracy value: 82.84556237706384 - type: euclidean_ap value: 63.28363625097978 - type: euclidean_f1 value: 60.00000000000001 - type: euclidean_precision value: 54.45161290322581 - type: euclidean_recall value: 66.80738786279683 - type: manhattan_accuracy value: 82.86940454193241 - type: manhattan_ap value: 63.244773709836764 - type: manhattan_f1 value: 60.12680942696495 - type: manhattan_precision value: 55.00109433136353 - type: manhattan_recall value: 66.3060686015831 - type: max_accuracy value: 82.86940454193241 - type: max_ap value: 63.28364302860433 - type: max_f1 value: 60.12680942696495 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.32033220786278 - type: cos_sim_ap value: 84.71928176006863 - type: cos_sim_f1 value: 76.51483333969684 - type: cos_sim_precision value: 75.89184276300841 - type: cos_sim_recall value: 77.14813674160764 - type: dot_accuracy value: 88.32033220786278 - type: dot_ap value: 84.71928330149228 - type: dot_f1 value: 76.51483333969684 - type: dot_precision value: 75.89184276300841 - type: dot_recall value: 77.14813674160764 - type: euclidean_accuracy value: 88.32033220786278 - type: euclidean_ap value: 84.71928045384345 - type: euclidean_f1 value: 76.51483333969684 - type: euclidean_precision value: 75.89184276300841 - type: euclidean_recall value: 77.14813674160764 - type: manhattan_accuracy value: 88.27570147863545 - type: manhattan_ap value: 84.68523541579755 - type: manhattan_f1 value: 76.51512269355146 - type: manhattan_precision value: 75.62608107091825 - type: manhattan_recall value: 77.42531567600862 - type: max_accuracy value: 88.32033220786278 - type: max_ap value: 84.71928330149228 - type: max_f1 value: 76.51512269355146 - task: type: Clustering dataset: type: jinaai/cities_wiki_clustering name: MTEB WikiCitiesClustering config: default split: test revision: ddc9ee9242fa65332597f70e967ecc38b9d734fa metrics: - type: v_measure value: 85.30624598674467 license: apache-2.0 new_version: Snowflake/snowflake-arctic-embed-m-v2.0 ---

# Snowflake's Arctic-embed-m

News | Models | Usage | Evaluation | Contact | FAQ | License | Acknowledgement

## News

12/04/2024: Release of snowflake-arctic-embed-l-v2.0 and snowflake-arctic-embed-m-v2.0, our newest models built with multilingual workloads in mind. These models outperform prior versions of Arctic Embed, and we suggest they replace prior versions!

07/26/2024: Release of the preprint [[2407.18887] Embedding And Clustering Your Data Can Improve Contrastive Pretraining](https://arxiv.org/abs/2407.18887) on arXiv.

07/18/2024: Release of snowflake-arctic-embed-m-v1.5, capable of producing highly compressible embedding vectors that preserve quality even when squished as small as 128 bytes per vector. Details about the development of this model are available in the launch post on the Snowflake engineering blog.

05/10/2024: Release of the Arctic Embed technical report ([2405.05374](https://arxiv.org/abs/2405.05374)).

04/16/2024: Release of the **snowflake-arctic-embed** family of text embedding models. The releases are state-of-the-art for retrieval quality at each of their representative size profiles. A technical report is coming shortly. For more details, please refer to our GitHub repository: Arctic-Text-Embed.

## Models

snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance. The models achieve **state-of-the-art performance on the MTEB/BEIR leaderboard** for each of their size variants. Evaluation is performed using these scripts. As shown below, each class of model size achieves SOTA retrieval accuracy compared to other top models.

The models are trained by leveraging existing open-source text representation models, such as bert-base-uncased, and are trained in a multi-stage pipeline to optimize their retrieval performance. First, the models are trained with large batches of query-document pairs in which negatives are derived in-batch; this pretraining stage leverages about 400m samples drawn from a mix of public datasets and proprietary web search data (a minimal sketch of the in-batch negative idea follows the tables below). Following pretraining, the models are further optimized with long training on a smaller dataset (about 1m samples) of triplets of query, positive document, and negative document derived from hard negative mining. Mining of the negatives and data curation are crucial to retrieval accuracy. A detailed technical report can be found here.

| Name | MTEB Retrieval Score (NDCG @ 10) | Parameters (Millions) | Embedding Dimension |
| --- | --- | --- | --- |
| snowflake-arctic-embed-xs | 50.15 | 22 | 384 |
| snowflake-arctic-embed-s | 51.98 | 33 | 384 |
| snowflake-arctic-embed-m | 54.90 | 110 | 768 |
| snowflake-arctic-embed-m-long | 54.83 | 137 | 768 |
| snowflake-arctic-embed-l | 55.98 | 335 | 1024 |

Aside from being great open-source models, the largest model, snowflake-arctic-embed-l, can serve as a natural replacement for closed-source embedding models, as shown below.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-l | 55.98 |
| Google-gecko-text-embedding | 55.7 |
| text-embedding-3-large | 55.44 |
| Cohere-embed-english-v3.0 | 55.00 |
| bge-large-en-v1.5 | 54.29 |
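To make the in-batch negative scheme concrete, here is a minimal, illustrative sketch of an InfoNCE-style contrastive loss in which every other document in a batch of query-document pairs serves as a negative for each query. This is not the actual Arctic Embed training code (batch size, temperature, and data handling here are assumptions; the technical report describes the real recipe):

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              doc_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss: row i of doc_emb is the positive for query i,
    and every other row in the batch acts as an in-batch negative."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    logits = query_emb @ doc_emb.T / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Toy batch of 8 query/document embedding pairs (768-dim, like arctic-embed-m).
loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```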
### snowflake-arctic-embed-xs

This tiny model packs quite the punch. Based on the all-MiniLM-L6-v2 model with only 22m parameters and 384 dimensions, this model should meet even the strictest latency/TCO budgets. Despite its size, its retrieval accuracy is closer to that of models with 100m parameters.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-xs | 50.15 |
| GIST-all-MiniLM-L6-v2 | 45.12 |
| gte-tiny | 44.92 |
| all-MiniLM-L6-v2 | 41.95 |
| bge-micro-v2 | 42.56 |

### snowflake-arctic-embed-s

Based on the intfloat/e5-small-unsupervised model, this small model does not trade off retrieval accuracy for its small size. With only 33m parameters and 384 dimensions, this model should easily allow scaling to large datasets.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-s | 51.98 |
| bge-small-en-v1.5 | 51.68 |
| Cohere-embed-english-light-v3.0 | 51.34 |
| text-embedding-3-small | 51.08 |
| e5-small-v2 | 49.04 |

### snowflake-arctic-embed-m

Based on the intfloat/e5-base-unsupervised model, this medium model is the workhorse that provides the best retrieval performance without slowing down inference.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-m | 54.90 |
| bge-base-en-v1.5 | 53.25 |
| nomic-embed-text-v1.5 | 53.25 |
| GIST-Embedding-v0 | 52.31 |
| gte-base | 52.31 |

### snowflake-arctic-embed-m-long

Based on the nomic-ai/nomic-embed-text-v1-unsupervised model, this long-context variant of our medium-sized model is perfect for workloads constrained by the regular 512-token context of our other models. Without the use of RPE, this model supports up to 2048 tokens. With RPE, it can scale to 8192!

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-m-long | 54.83 |
| nomic-embed-text-v1.5 | 53.01 |
| nomic-embed-text-v1 | 52.81 |

### snowflake-arctic-embed-l

Based on the intfloat/e5-large-unsupervised model, this large model is a direct drop-in for closed APIs and delivers the most accurate retrieval experience.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-l | 55.98 |
| UAE-Large-V1 | 54.66 |
| bge-large-en-v1.5 | 54.29 |
| mxbai-embed-large-v1 | 54.39 |
| e5-Large-v2 | 50.56 |

## Usage

### Using Sentence Transformers

You can use the sentence-transformers package to use a snowflake-arctic-embed model, as shown in the Sentence Transformers sketch below, which prints each query alongside its ranked documents.

### Using Huggingface transformers

You can use the transformers package to use a snowflake-arctic-embed model, as shown in the transformers sketch below. For optimal retrieval quality, use the CLS token to embed each text portion and apply the retrieval query prefix shown in the sketch (just on the query).

### Using Transformers.js

If you haven't already, you can install the Transformers.js JavaScript library from NPM (the install command is included at the top of the Transformers.js sketch below). You can then use the model to compute embeddings as shown in that sketch.

### Using Infinity

OpenAI-compatible API deployment is available with Infinity and Docker; a hypothetical client-side sketch follows the other examples below.
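A minimal Sentence Transformers sketch. The query and document strings are illustrative, and `prompt_name="query"` relies on the query prompt and normalization modules bundled in the model's Sentence Transformers configuration:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

queries = ["what is snowflake?", "Where can I get the best tacos?"]
documents = ["The Data Cloud!", "Mexico City of Course!"]

# The bundled "query" prompt prepends the retrieval prefix to queries only;
# documents are embedded without any prefix.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Dot products of the (normalized) embeddings serve as similarity scores.
scores = query_embeddings @ document_embeddings.T
for query, query_scores in zip(queries, scores):
    ranked = sorted(zip(documents, query_scores), key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in ranked:
        print(f"  {score:.4f}  {document}")
```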
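A minimal Hugging Face transformers sketch of the CLS-token approach. The prefix string is the standard Arctic Embed retrieval query prefix; treat the example texts and max length as illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Snowflake/snowflake-arctic-embed-m")
model = AutoModel.from_pretrained("Snowflake/snowflake-arctic-embed-m", add_pooling_layer=False)
model.eval()

query_prefix = "Represent this sentence for searching relevant passages: "
queries = ["what is snowflake?", "Where can I get the best tacos?"]
documents = ["The Data Cloud!", "Mexico City of Course!"]

query_tokens = tokenizer([query_prefix + q for q in queries],
                         padding=True, truncation=True, return_tensors="pt", max_length=512)
document_tokens = tokenizer(documents,
                            padding=True, truncation=True, return_tensors="pt", max_length=512)

with torch.no_grad():
    # Embed each text with its CLS token (position 0 of the last hidden state).
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_embeddings = model(**document_tokens)[0][:, 0]

# L2-normalize so dot products equal cosine similarities.
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

scores = query_embeddings @ document_embeddings.T
print(scores)
```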
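A minimal Transformers.js sketch, written in JavaScript since Transformers.js is a JavaScript library. It assumes the @xenova/transformers NPM distribution of Transformers.js and an ESM context (for top-level await); as in the Python examples, only the query receives the retrieval prefix:

```js
// First: npm i @xenova/transformers
import { pipeline, dot } from '@xenova/transformers';

// Create a feature-extraction pipeline for the model.
const extractor = await pipeline('feature-extraction', 'Snowflake/snowflake-arctic-embed-m', {
    quantized: false, // remove this line to use the (slightly less accurate) quantized ONNX weights
});

// The first sentence is the prefixed query; the rest are documents.
const sentences = [
    'Represent this sentence for searching relevant passages: Where can I get the best tacos?',
    'The Data Cloud!',
    'Mexico City of Course!',
];
const output = await extractor(sentences, { pooling: 'cls', normalize: true });

// Dot products of normalized embeddings equal cosine similarities.
const [queryEmbedding, ...documentEmbeddings] = output.tolist();
const similarities = documentEmbeddings.map((d) => dot(queryEmbedding, d));
console.log(similarities);
```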
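For Infinity, the server side is a Docker invocation of the Infinity engine (see the Infinity project for the exact image and flags, which are not reproduced here). The sketch below shows only a hypothetical client call against such a deployment through its OpenAI-compatible embeddings endpoint; the base URL, port, and API key value are assumptions about your local setup:

```python
from openai import OpenAI

# Assumes an Infinity container is already serving the model locally, e.g. on port 7997.
client = OpenAI(base_url="http://localhost:7997", api_key="not-needed-locally")

response = client.embeddings.create(
    model="Snowflake/snowflake-arctic-embed-m",
    input=["Represent this sentence for searching relevant passages: what is snowflake?"],
)
print(len(response.data[0].embedding))  # embedding dimension: 768 for the -m model
```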
## FAQ

TBD

## Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project. You can also email Daniel Campos (daniel.campos@snowflake.com).

## License

Arctic is licensed under the Apache-2.0 license. The released models can be used for commercial purposes free of charge.
## Acknowledgement We want to thank the open-source community, which has provided the great building blocks upon which we could make our models. We thank our modeling engineers, Danmei Xu, Luke Merrick, Gaurav Nuti, and Daniel Campos, for making these great models possible. We thank our leadership, Himabindu Pucha, Kelvin So, Vivek Raghunathan, and Sridhar Ramaswamy, for supporting this work. We also thank the open-source community for producing the great models we could build on top of and making these releases possible. Finally, we thank the researchers who created BEIR and MTEB benchmarks. It is largely thanks to their tireless work to define what better looks like that we could improve model performance. ", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity measurement, classification, retrieval, clustering, and reranking across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/Snowflake_snowflake-arctic-embed-xs.json b/data/model_data_json/Snowflake_snowflake-arctic-embed-xs.json new file mode 100644 index 0000000000000000000000000000000000000000..aeb6f436ba81e6fbecf81ab2c8f4b9d38f604ca4 --- /dev/null +++ b/data/model_data_json/Snowflake_snowflake-arctic-embed-xs.json @@ -0,0 +1,25 @@ +{ + "model_id": "Snowflake/snowflake-arctic-embed-xs", + "downloads": 100914, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "arctic", + "snowflake-arctic-embed", + "transformers.js", + "arxiv:2407.18887", + "arxiv:2405.05374", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - arctic - snowflake-arctic-embed - transformers.js model-index: - name: snowflake-snowflake-arctic-embed-xs results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 65.08955223880598 - type: ap value: 28.514291209445364 - type: f1 value: 59.2604580112738 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 70.035375 - type: ap value: 64.29444264250405 - type: f1 value: 69.78382333907138 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 35.343999999999994 - type: f1 value: 34.69618251902858 - task: type: Retrieval dataset: type: mteb/arguana name: MTEB ArguAna config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: map_at_1 value: 28.592000000000002 - type: map_at_10 value: 43.597 - type: map_at_100 value: 44.614 - type: map_at_1000 value: 44.624 - type: map_at_3 value: 38.928000000000004 - type: map_at_5 value: 41.453 - type: mrr_at_1 value: 29.232000000000003 - type: mrr_at_10 value: 43.829 - type: mrr_at_100 value: 44.852 - type: mrr_at_1000 value: 44.862 - type: mrr_at_3 value: 39.118 - type: mrr_at_5 value: 41.703 - type: ndcg_at_1 value: 28.592000000000002 - type: ndcg_at_10 value: 52.081 - type: ndcg_at_100 value: 56.37 
- type: ndcg_at_1000 value: 56.598000000000006 - type: ndcg_at_3 value: 42.42 - type: ndcg_at_5 value: 46.965 - type: precision_at_1 value: 28.592000000000002 - type: precision_at_10 value: 7.922999999999999 - type: precision_at_100 value: 0.979 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 17.52 - type: precision_at_5 value: 12.717 - type: recall_at_1 value: 28.592000000000002 - type: recall_at_10 value: 79.232 - type: recall_at_100 value: 97.866 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 52.559999999999995 - type: recall_at_5 value: 63.585 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 43.50220588953974 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 32.08725826118282 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 60.25381587694928 - type: mrr value: 73.79776194873148 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 85.47489332445278 - type: cos_sim_spearman value: 84.05432487336698 - type: euclidean_pearson value: 84.5108222177219 - type: euclidean_spearman value: 84.05432487336698 - type: manhattan_pearson value: 84.20440618321464 - type: manhattan_spearman value: 83.9290208134097 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 76.37337662337663 - type: f1 value: 75.33296834885043 - task: type: Clustering dataset: type: jinaai/big-patent-clustering name: MTEB BigPatentClustering config: default split: test revision: 62d5330920bca426ce9d3c76ea914f15fc83e891 metrics: - type: v_measure value: 21.31174373264835 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 34.481973521597844 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 26.14094256567341 - task: type: Retrieval dataset: type: mteb/cqadupstack-android name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 32.527 - type: map_at_10 value: 43.699 - type: map_at_100 value: 45.03 - type: map_at_1000 value: 45.157000000000004 - type: map_at_3 value: 39.943 - type: map_at_5 value: 42.324 - type: mrr_at_1 value: 39.771 - type: mrr_at_10 value: 49.277 - type: mrr_at_100 value: 49.956 - type: mrr_at_1000 value: 50.005 - type: mrr_at_3 value: 46.304 - type: mrr_at_5 value: 48.493 - type: ndcg_at_1 value: 39.771 - type: ndcg_at_10 value: 49.957 - type: ndcg_at_100 value: 54.678000000000004 - type: ndcg_at_1000 value: 56.751 - type: ndcg_at_3 value: 44.608 - type: ndcg_at_5 value: 47.687000000000005 - 
type: precision_at_1 value: 39.771 - type: precision_at_10 value: 9.557 - type: precision_at_100 value: 1.5010000000000001 - type: precision_at_1000 value: 0.194 - type: precision_at_3 value: 21.173000000000002 - type: precision_at_5 value: 15.794 - type: recall_at_1 value: 32.527 - type: recall_at_10 value: 61.791 - type: recall_at_100 value: 81.49300000000001 - type: recall_at_1000 value: 95.014 - type: recall_at_3 value: 46.605000000000004 - type: recall_at_5 value: 54.83 - task: type: Retrieval dataset: type: mteb/cqadupstack-english name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 29.424 - type: map_at_10 value: 38.667 - type: map_at_100 value: 39.771 - type: map_at_1000 value: 39.899 - type: map_at_3 value: 35.91 - type: map_at_5 value: 37.45 - type: mrr_at_1 value: 36.687999999999995 - type: mrr_at_10 value: 44.673 - type: mrr_at_100 value: 45.289 - type: mrr_at_1000 value: 45.338 - type: mrr_at_3 value: 42.601 - type: mrr_at_5 value: 43.875 - type: ndcg_at_1 value: 36.687999999999995 - type: ndcg_at_10 value: 44.013000000000005 - type: ndcg_at_100 value: 48.13 - type: ndcg_at_1000 value: 50.294000000000004 - type: ndcg_at_3 value: 40.056999999999995 - type: ndcg_at_5 value: 41.902 - type: precision_at_1 value: 36.687999999999995 - type: precision_at_10 value: 8.158999999999999 - type: precision_at_100 value: 1.321 - type: precision_at_1000 value: 0.179 - type: precision_at_3 value: 19.045 - type: precision_at_5 value: 13.427 - type: recall_at_1 value: 29.424 - type: recall_at_10 value: 53.08500000000001 - type: recall_at_100 value: 70.679 - type: recall_at_1000 value: 84.66 - type: recall_at_3 value: 41.399 - type: recall_at_5 value: 46.632 - task: type: Retrieval dataset: type: mteb/cqadupstack-gaming name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 39.747 - type: map_at_10 value: 51.452 - type: map_at_100 value: 52.384 - type: map_at_1000 value: 52.437 - type: map_at_3 value: 48.213 - type: map_at_5 value: 50.195 - type: mrr_at_1 value: 45.391999999999996 - type: mrr_at_10 value: 54.928 - type: mrr_at_100 value: 55.532000000000004 - type: mrr_at_1000 value: 55.565 - type: mrr_at_3 value: 52.456 - type: mrr_at_5 value: 54.054 - type: ndcg_at_1 value: 45.391999999999996 - type: ndcg_at_10 value: 57.055 - type: ndcg_at_100 value: 60.751999999999995 - type: ndcg_at_1000 value: 61.864 - type: ndcg_at_3 value: 51.662 - type: ndcg_at_5 value: 54.613 - type: precision_at_1 value: 45.391999999999996 - type: precision_at_10 value: 9.103 - type: precision_at_100 value: 1.1780000000000002 - type: precision_at_1000 value: 0.132 - type: precision_at_3 value: 22.717000000000002 - type: precision_at_5 value: 15.812000000000001 - type: recall_at_1 value: 39.747 - type: recall_at_10 value: 70.10499999999999 - type: recall_at_100 value: 86.23100000000001 - type: recall_at_1000 value: 94.025 - type: recall_at_3 value: 55.899 - type: recall_at_5 value: 63.05500000000001 - task: type: Retrieval dataset: type: mteb/cqadupstack-gis name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 27.168999999999997 - type: map_at_10 value: 34.975 - type: map_at_100 value: 35.94 - type: map_at_1000 value: 36.021 - type: map_at_3 value: 32.35 - type: map_at_5 value: 33.831 - type: mrr_at_1 value: 28.701 - type: mrr_at_10 
value: 36.698 - type: mrr_at_100 value: 37.546 - type: mrr_at_1000 value: 37.613 - type: mrr_at_3 value: 34.256 - type: mrr_at_5 value: 35.685 - type: ndcg_at_1 value: 28.701 - type: ndcg_at_10 value: 39.639 - type: ndcg_at_100 value: 44.389 - type: ndcg_at_1000 value: 46.46 - type: ndcg_at_3 value: 34.52 - type: ndcg_at_5 value: 37.076 - type: precision_at_1 value: 28.701 - type: precision_at_10 value: 5.955 - type: precision_at_100 value: 0.8880000000000001 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 14.274999999999999 - type: precision_at_5 value: 10.011000000000001 - type: recall_at_1 value: 27.168999999999997 - type: recall_at_10 value: 52.347 - type: recall_at_100 value: 74.1 - type: recall_at_1000 value: 89.739 - type: recall_at_3 value: 38.567 - type: recall_at_5 value: 44.767 - task: type: Retrieval dataset: type: mteb/cqadupstack-mathematica name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 15.872 - type: map_at_10 value: 23.153000000000002 - type: map_at_100 value: 24.311 - type: map_at_1000 value: 24.432000000000002 - type: map_at_3 value: 20.707 - type: map_at_5 value: 21.921 - type: mrr_at_1 value: 19.776 - type: mrr_at_10 value: 27.755999999999997 - type: mrr_at_100 value: 28.709 - type: mrr_at_1000 value: 28.778 - type: mrr_at_3 value: 25.186999999999998 - type: mrr_at_5 value: 26.43 - type: ndcg_at_1 value: 19.776 - type: ndcg_at_10 value: 28.288999999999998 - type: ndcg_at_100 value: 34.011 - type: ndcg_at_1000 value: 36.916 - type: ndcg_at_3 value: 23.551 - type: ndcg_at_5 value: 25.429000000000002 - type: precision_at_1 value: 19.776 - type: precision_at_10 value: 5.311 - type: precision_at_100 value: 0.9440000000000001 - type: precision_at_1000 value: 0.132 - type: precision_at_3 value: 11.360000000000001 - type: precision_at_5 value: 8.209 - type: recall_at_1 value: 15.872 - type: recall_at_10 value: 39.726 - type: recall_at_100 value: 65.035 - type: recall_at_1000 value: 85.846 - type: recall_at_3 value: 26.432 - type: recall_at_5 value: 31.22 - task: type: Retrieval dataset: type: mteb/cqadupstack-physics name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 28.126 - type: map_at_10 value: 37.537 - type: map_at_100 value: 38.807 - type: map_at_1000 value: 38.923 - type: map_at_3 value: 34.65 - type: map_at_5 value: 36.248000000000005 - type: mrr_at_1 value: 34.649 - type: mrr_at_10 value: 42.893 - type: mrr_at_100 value: 43.721 - type: mrr_at_1000 value: 43.775999999999996 - type: mrr_at_3 value: 40.488 - type: mrr_at_5 value: 41.729 - type: ndcg_at_1 value: 34.649 - type: ndcg_at_10 value: 43.072 - type: ndcg_at_100 value: 48.464 - type: ndcg_at_1000 value: 50.724000000000004 - type: ndcg_at_3 value: 38.506 - type: ndcg_at_5 value: 40.522000000000006 - type: precision_at_1 value: 34.649 - type: precision_at_10 value: 7.68 - type: precision_at_100 value: 1.214 - type: precision_at_1000 value: 0.16 - type: precision_at_3 value: 18.029999999999998 - type: precision_at_5 value: 12.666 - type: recall_at_1 value: 28.126 - type: recall_at_10 value: 54.396 - type: recall_at_100 value: 76.988 - type: recall_at_1000 value: 91.85799999999999 - type: recall_at_3 value: 41.169 - type: recall_at_5 value: 46.658 - task: type: Retrieval dataset: type: mteb/cqadupstack-programmers name: MTEB CQADupstackProgrammersRetrieval config: default split: test 
revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 26.68 - type: map_at_10 value: 35.702 - type: map_at_100 value: 36.864999999999995 - type: map_at_1000 value: 36.977 - type: map_at_3 value: 32.828 - type: map_at_5 value: 34.481 - type: mrr_at_1 value: 32.991 - type: mrr_at_10 value: 40.993 - type: mrr_at_100 value: 41.827 - type: mrr_at_1000 value: 41.887 - type: mrr_at_3 value: 38.623000000000005 - type: mrr_at_5 value: 40.021 - type: ndcg_at_1 value: 32.991 - type: ndcg_at_10 value: 41.036 - type: ndcg_at_100 value: 46.294000000000004 - type: ndcg_at_1000 value: 48.644 - type: ndcg_at_3 value: 36.419000000000004 - type: ndcg_at_5 value: 38.618 - type: precision_at_1 value: 32.991 - type: precision_at_10 value: 7.385999999999999 - type: precision_at_100 value: 1.176 - type: precision_at_1000 value: 0.151 - type: precision_at_3 value: 17.122999999999998 - type: precision_at_5 value: 12.215 - type: recall_at_1 value: 26.68 - type: recall_at_10 value: 51.644 - type: recall_at_100 value: 74.55000000000001 - type: recall_at_1000 value: 90.825 - type: recall_at_3 value: 38.579 - type: recall_at_5 value: 44.512 - task: type: Retrieval dataset: type: mteb/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 26.30825 - type: map_at_10 value: 34.97866666666666 - type: map_at_100 value: 36.109249999999996 - type: map_at_1000 value: 36.22508333333333 - type: map_at_3 value: 32.239083333333326 - type: map_at_5 value: 33.75933333333334 - type: mrr_at_1 value: 31.05308333333333 - type: mrr_at_10 value: 39.099833333333336 - type: mrr_at_100 value: 39.92008333333334 - type: mrr_at_1000 value: 39.980000000000004 - type: mrr_at_3 value: 36.75958333333333 - type: mrr_at_5 value: 38.086416666666665 - type: ndcg_at_1 value: 31.05308333333333 - type: ndcg_at_10 value: 40.11558333333334 - type: ndcg_at_100 value: 45.05966666666667 - type: ndcg_at_1000 value: 47.36516666666667 - type: ndcg_at_3 value: 35.490833333333335 - type: ndcg_at_5 value: 37.64541666666666 - type: precision_at_1 value: 31.05308333333333 - type: precision_at_10 value: 6.968416666666666 - type: precision_at_100 value: 1.1156666666666666 - type: precision_at_1000 value: 0.14950000000000002 - type: precision_at_3 value: 16.123 - type: precision_at_5 value: 11.451166666666666 - type: recall_at_1 value: 26.30825 - type: recall_at_10 value: 51.19283333333333 - type: recall_at_100 value: 73.0285 - type: recall_at_1000 value: 89.11133333333333 - type: recall_at_3 value: 38.26208333333333 - type: recall_at_5 value: 43.855916666666666 - task: type: Retrieval dataset: type: mteb/cqadupstack-stats name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 23.363999999999997 - type: map_at_10 value: 30.606 - type: map_at_100 value: 31.491999999999997 - type: map_at_1000 value: 31.578 - type: map_at_3 value: 28.610000000000003 - type: map_at_5 value: 29.602 - type: mrr_at_1 value: 26.38 - type: mrr_at_10 value: 33.472 - type: mrr_at_100 value: 34.299 - type: mrr_at_1000 value: 34.361999999999995 - type: mrr_at_3 value: 31.696999999999996 - type: mrr_at_5 value: 32.503 - type: ndcg_at_1 value: 26.38 - type: ndcg_at_10 value: 34.772999999999996 - type: ndcg_at_100 value: 39.334 - type: ndcg_at_1000 value: 41.676 - type: ndcg_at_3 value: 31.097 - type: ndcg_at_5 value: 32.561 - type: precision_at_1 value: 26.38 - type: precision_at_10 
value: 5.475 - type: precision_at_100 value: 0.84 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 13.395000000000001 - type: precision_at_5 value: 9.11 - type: recall_at_1 value: 23.363999999999997 - type: recall_at_10 value: 44.656 - type: recall_at_100 value: 65.77199999999999 - type: recall_at_1000 value: 83.462 - type: recall_at_3 value: 34.213 - type: recall_at_5 value: 38.091 - task: type: Retrieval dataset: type: mteb/cqadupstack-tex name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 17.971999999999998 - type: map_at_10 value: 24.913 - type: map_at_100 value: 25.916 - type: map_at_1000 value: 26.049 - type: map_at_3 value: 22.569 - type: map_at_5 value: 23.858999999999998 - type: mrr_at_1 value: 21.748 - type: mrr_at_10 value: 28.711 - type: mrr_at_100 value: 29.535 - type: mrr_at_1000 value: 29.621 - type: mrr_at_3 value: 26.484999999999996 - type: mrr_at_5 value: 27.701999999999998 - type: ndcg_at_1 value: 21.748 - type: ndcg_at_10 value: 29.412 - type: ndcg_at_100 value: 34.204 - type: ndcg_at_1000 value: 37.358000000000004 - type: ndcg_at_3 value: 25.202 - type: ndcg_at_5 value: 27.128000000000004 - type: precision_at_1 value: 21.748 - type: precision_at_10 value: 5.279 - type: precision_at_100 value: 0.902 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 11.551 - type: precision_at_5 value: 8.437999999999999 - type: recall_at_1 value: 17.971999999999998 - type: recall_at_10 value: 39.186 - type: recall_at_100 value: 60.785999999999994 - type: recall_at_1000 value: 83.372 - type: recall_at_3 value: 27.584999999999997 - type: recall_at_5 value: 32.448 - task: type: Retrieval dataset: type: mteb/cqadupstack-unix name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 26.684 - type: map_at_10 value: 35.188 - type: map_at_100 value: 36.379 - type: map_at_1000 value: 36.481 - type: map_at_3 value: 32.401 - type: map_at_5 value: 34.132 - type: mrr_at_1 value: 31.063000000000002 - type: mrr_at_10 value: 39.104 - type: mrr_at_100 value: 40.062999999999995 - type: mrr_at_1000 value: 40.119 - type: mrr_at_3 value: 36.692 - type: mrr_at_5 value: 38.161 - type: ndcg_at_1 value: 31.063000000000002 - type: ndcg_at_10 value: 40.096 - type: ndcg_at_100 value: 45.616 - type: ndcg_at_1000 value: 47.869 - type: ndcg_at_3 value: 35.256 - type: ndcg_at_5 value: 37.826 - type: precision_at_1 value: 31.063000000000002 - type: precision_at_10 value: 6.622999999999999 - type: precision_at_100 value: 1.046 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 15.641 - type: precision_at_5 value: 11.231 - type: recall_at_1 value: 26.684 - type: recall_at_10 value: 51.092999999999996 - type: recall_at_100 value: 75.099 - type: recall_at_1000 value: 90.644 - type: recall_at_3 value: 38.063 - type: recall_at_5 value: 44.518 - task: type: Retrieval dataset: type: mteb/cqadupstack-webmasters name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 26.249 - type: map_at_10 value: 34.694 - type: map_at_100 value: 36.208 - type: map_at_1000 value: 36.443 - type: map_at_3 value: 31.868000000000002 - type: map_at_5 value: 33.018 - type: mrr_at_1 value: 31.818 - type: mrr_at_10 value: 39.416000000000004 - type: mrr_at_100 value: 40.327 - type: mrr_at_1000 value: 
40.388000000000005 - type: mrr_at_3 value: 37.120999999999995 - type: mrr_at_5 value: 38.07 - type: ndcg_at_1 value: 31.818 - type: ndcg_at_10 value: 40.405 - type: ndcg_at_100 value: 45.816 - type: ndcg_at_1000 value: 48.403 - type: ndcg_at_3 value: 35.823 - type: ndcg_at_5 value: 37.191 - type: precision_at_1 value: 31.818 - type: precision_at_10 value: 7.806 - type: precision_at_100 value: 1.518 - type: precision_at_1000 value: 0.241 - type: precision_at_3 value: 16.535 - type: precision_at_5 value: 11.738999999999999 - type: recall_at_1 value: 26.249 - type: recall_at_10 value: 50.928 - type: recall_at_100 value: 75.271 - type: recall_at_1000 value: 91.535 - type: recall_at_3 value: 37.322 - type: recall_at_5 value: 41.318 - task: type: Retrieval dataset: type: mteb/cqadupstack-wordpress name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 21.884999999999998 - type: map_at_10 value: 29.158 - type: map_at_100 value: 30.208000000000002 - type: map_at_1000 value: 30.304 - type: map_at_3 value: 26.82 - type: map_at_5 value: 28.051 - type: mrr_at_1 value: 23.66 - type: mrr_at_10 value: 31.277 - type: mrr_at_100 value: 32.237 - type: mrr_at_1000 value: 32.308 - type: mrr_at_3 value: 29.205 - type: mrr_at_5 value: 30.314000000000004 - type: ndcg_at_1 value: 23.66 - type: ndcg_at_10 value: 33.64 - type: ndcg_at_100 value: 39.028 - type: ndcg_at_1000 value: 41.423 - type: ndcg_at_3 value: 29.189 - type: ndcg_at_5 value: 31.191999999999997 - type: precision_at_1 value: 23.66 - type: precision_at_10 value: 5.287 - type: precision_at_100 value: 0.86 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 12.631 - type: precision_at_5 value: 8.762 - type: recall_at_1 value: 21.884999999999998 - type: recall_at_10 value: 45.357 - type: recall_at_100 value: 70.338 - type: recall_at_1000 value: 88.356 - type: recall_at_3 value: 33.312000000000005 - type: recall_at_5 value: 38.222 - task: type: Retrieval dataset: type: mteb/climate-fever name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 13.058 - type: map_at_10 value: 21.549 - type: map_at_100 value: 23.287 - type: map_at_1000 value: 23.444000000000003 - type: map_at_3 value: 18.18 - type: map_at_5 value: 19.886 - type: mrr_at_1 value: 28.73 - type: mrr_at_10 value: 40.014 - type: mrr_at_100 value: 40.827000000000005 - type: mrr_at_1000 value: 40.866 - type: mrr_at_3 value: 36.602000000000004 - type: mrr_at_5 value: 38.702 - type: ndcg_at_1 value: 28.73 - type: ndcg_at_10 value: 29.881 - type: ndcg_at_100 value: 36.662 - type: ndcg_at_1000 value: 39.641999999999996 - type: ndcg_at_3 value: 24.661 - type: ndcg_at_5 value: 26.548 - type: precision_at_1 value: 28.73 - type: precision_at_10 value: 9.094 - type: precision_at_100 value: 1.6480000000000001 - type: precision_at_1000 value: 0.22100000000000003 - type: precision_at_3 value: 17.98 - type: precision_at_5 value: 13.811000000000002 - type: recall_at_1 value: 13.058 - type: recall_at_10 value: 35.458 - type: recall_at_100 value: 58.719 - type: recall_at_1000 value: 75.495 - type: recall_at_3 value: 22.607 - type: recall_at_5 value: 28.067999999999998 - task: type: Retrieval dataset: type: mteb/dbpedia name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 8.811 - type: map_at_10 value: 19.134999999999998 - type: 
map_at_100 value: 26.905 - type: map_at_1000 value: 28.503 - type: map_at_3 value: 13.863 - type: map_at_5 value: 16.062 - type: mrr_at_1 value: 67 - type: mrr_at_10 value: 74.607 - type: mrr_at_100 value: 74.941 - type: mrr_at_1000 value: 74.954 - type: mrr_at_3 value: 73.042 - type: mrr_at_5 value: 73.992 - type: ndcg_at_1 value: 52.87500000000001 - type: ndcg_at_10 value: 40.199 - type: ndcg_at_100 value: 44.901 - type: ndcg_at_1000 value: 52.239999999999995 - type: ndcg_at_3 value: 44.983000000000004 - type: ndcg_at_5 value: 42.137 - type: precision_at_1 value: 67 - type: precision_at_10 value: 31.8 - type: precision_at_100 value: 10.315000000000001 - type: precision_at_1000 value: 2.0420000000000003 - type: precision_at_3 value: 48.667 - type: precision_at_5 value: 40.9 - type: recall_at_1 value: 8.811 - type: recall_at_10 value: 24.503 - type: recall_at_100 value: 51.288999999999994 - type: recall_at_1000 value: 74.827 - type: recall_at_3 value: 15.254999999999999 - type: recall_at_5 value: 18.698999999999998 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 41.839999999999996 - type: f1 value: 37.78718146306379 - task: type: Retrieval dataset: type: mteb/fever name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 68.47999999999999 - type: map_at_10 value: 78.782 - type: map_at_100 value: 79.021 - type: map_at_1000 value: 79.035 - type: map_at_3 value: 77.389 - type: map_at_5 value: 78.347 - type: mrr_at_1 value: 73.837 - type: mrr_at_10 value: 83.41499999999999 - type: mrr_at_100 value: 83.53399999999999 - type: mrr_at_1000 value: 83.535 - type: mrr_at_3 value: 82.32300000000001 - type: mrr_at_5 value: 83.13000000000001 - type: ndcg_at_1 value: 73.837 - type: ndcg_at_10 value: 83.404 - type: ndcg_at_100 value: 84.287 - type: ndcg_at_1000 value: 84.52199999999999 - type: ndcg_at_3 value: 81.072 - type: ndcg_at_5 value: 82.537 - type: precision_at_1 value: 73.837 - type: precision_at_10 value: 10.254000000000001 - type: precision_at_100 value: 1.088 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 31.538 - type: precision_at_5 value: 19.811 - type: recall_at_1 value: 68.47999999999999 - type: recall_at_10 value: 92.98100000000001 - type: recall_at_100 value: 96.50800000000001 - type: recall_at_1000 value: 97.925 - type: recall_at_3 value: 86.764 - type: recall_at_5 value: 90.39 - task: type: Retrieval dataset: type: mteb/fiqa name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 16.786 - type: map_at_10 value: 26.97 - type: map_at_100 value: 28.488000000000003 - type: map_at_1000 value: 28.665000000000003 - type: map_at_3 value: 23.3 - type: map_at_5 value: 25.249 - type: mrr_at_1 value: 33.025 - type: mrr_at_10 value: 41.86 - type: mrr_at_100 value: 42.673 - type: mrr_at_1000 value: 42.714 - type: mrr_at_3 value: 39.403 - type: mrr_at_5 value: 40.723 - type: ndcg_at_1 value: 33.025 - type: ndcg_at_10 value: 34.522999999999996 - type: ndcg_at_100 value: 40.831 - type: ndcg_at_1000 value: 44.01 - type: ndcg_at_3 value: 30.698999999999998 - type: ndcg_at_5 value: 31.832 - type: precision_at_1 value: 33.025 - type: precision_at_10 value: 9.583 - type: precision_at_100 value: 1.619 - type: precision_at_1000 value: 0.22100000000000003 - type: precision_at_3 
value: 20.216 - type: precision_at_5 value: 15.031 - type: recall_at_1 value: 16.786 - type: recall_at_10 value: 41.969 - type: recall_at_100 value: 66.353 - type: recall_at_1000 value: 85.299 - type: recall_at_3 value: 28.111000000000004 - type: recall_at_5 value: 33.645 - task: type: Retrieval dataset: type: mteb/hotpotqa name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 37.346000000000004 - type: map_at_10 value: 56.184999999999995 - type: map_at_100 value: 57.062000000000005 - type: map_at_1000 value: 57.126999999999995 - type: map_at_3 value: 52.815 - type: map_at_5 value: 54.893 - type: mrr_at_1 value: 74.693 - type: mrr_at_10 value: 81.128 - type: mrr_at_100 value: 81.356 - type: mrr_at_1000 value: 81.363 - type: mrr_at_3 value: 80.05600000000001 - type: mrr_at_5 value: 80.74 - type: ndcg_at_1 value: 74.693 - type: ndcg_at_10 value: 65.249 - type: ndcg_at_100 value: 68.357 - type: ndcg_at_1000 value: 69.64200000000001 - type: ndcg_at_3 value: 60.377 - type: ndcg_at_5 value: 63.044 - type: precision_at_1 value: 74.693 - type: precision_at_10 value: 13.630999999999998 - type: precision_at_100 value: 1.606 - type: precision_at_1000 value: 0.178 - type: precision_at_3 value: 38.222 - type: precision_at_5 value: 25.040000000000003 - type: recall_at_1 value: 37.346000000000004 - type: recall_at_10 value: 68.157 - type: recall_at_100 value: 80.297 - type: recall_at_1000 value: 88.832 - type: recall_at_3 value: 57.333 - type: recall_at_5 value: 62.6 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 62.80240000000001 - type: ap value: 58.22949464075975 - type: f1 value: 62.55694937343487 - task: type: Retrieval dataset: type: mteb/msmarco name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 20.918 - type: map_at_10 value: 32.732 - type: map_at_100 value: 33.922000000000004 - type: map_at_1000 value: 33.976 - type: map_at_3 value: 29.051 - type: map_at_5 value: 31.101 - type: mrr_at_1 value: 21.418 - type: mrr_at_10 value: 33.284000000000006 - type: mrr_at_100 value: 34.426 - type: mrr_at_1000 value: 34.473 - type: mrr_at_3 value: 29.644 - type: mrr_at_5 value: 31.691000000000003 - type: ndcg_at_1 value: 21.418 - type: ndcg_at_10 value: 39.427 - type: ndcg_at_100 value: 45.190999999999995 - type: ndcg_at_1000 value: 46.544000000000004 - type: ndcg_at_3 value: 31.885 - type: ndcg_at_5 value: 35.555 - type: precision_at_1 value: 21.418 - type: precision_at_10 value: 6.254999999999999 - type: precision_at_100 value: 0.915 - type: precision_at_1000 value: 0.10300000000000001 - type: precision_at_3 value: 13.591000000000001 - type: precision_at_5 value: 10.011000000000001 - type: recall_at_1 value: 20.918 - type: recall_at_10 value: 60.074000000000005 - type: recall_at_100 value: 86.726 - type: recall_at_1000 value: 97.116 - type: recall_at_3 value: 39.506 - type: recall_at_5 value: 48.319 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.79799361605106 - type: f1 value: 90.0757957511057 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: 
ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 58.00501595987233 - type: f1 value: 39.85731569133947 - task: type: Classification dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClassification (eng) config: eng split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: accuracy value: 77.10970464135022 - type: f1 value: 76.12037616356896 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringP2P (eng) config: eng split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 69.81323966287493 - task: type: Clustering dataset: type: masakhane/masakhanews name: MTEB MasakhaNEWSClusteringS2S (eng) config: eng split: test revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 metrics: - type: v_measure value: 33.112774215788455 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.51042367182246 - type: f1 value: 60.99310361578824 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.0053799596503 - type: f1 value: 69.7794673003686 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 30.56899174856954 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 26.21848014733929 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.256308756916646 - type: mrr value: 31.123872086825656 - task: type: Retrieval dataset: type: mteb/nfcorpus name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 5.07 - type: map_at_10 value: 11.286999999999999 - type: map_at_100 value: 13.630999999999998 - type: map_at_1000 value: 14.844 - type: map_at_3 value: 8.395 - type: map_at_5 value: 9.721 - type: mrr_at_1 value: 41.486000000000004 - type: mrr_at_10 value: 51.041000000000004 - type: mrr_at_100 value: 51.661 - type: mrr_at_1000 value: 51.7 - type: mrr_at_3 value: 49.226 - type: mrr_at_5 value: 50.526 - type: ndcg_at_1 value: 39.783 - type: ndcg_at_10 value: 30.885 - type: ndcg_at_100 value: 27.459 - type: ndcg_at_1000 value: 35.988 - type: ndcg_at_3 value: 36.705 - type: ndcg_at_5 value: 34.156 - type: precision_at_1 value: 41.486000000000004 - type: precision_at_10 value: 22.415 - type: precision_at_100 value: 6.819999999999999 - type: precision_at_1000 value: 1.8980000000000001 - type: precision_at_3 value: 34.572 - type: precision_at_5 value: 29.287999999999997 - type: recall_at_1 value: 5.07 - type: recall_at_10 value: 14.576 - type: recall_at_100 value: 27.112000000000002 - type: recall_at_1000 value: 57.995 - type: recall_at_3 value: 9.242 - type: recall_at_5 value: 11.668000000000001 - task: type: Retrieval dataset: type: mteb/nq name: MTEB NQ config: default split: test revision: 
b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 32.263999999999996 - type: map_at_10 value: 47.219 - type: map_at_100 value: 48.209999999999994 - type: map_at_1000 value: 48.24 - type: map_at_3 value: 42.905 - type: map_at_5 value: 45.501000000000005 - type: mrr_at_1 value: 36.153 - type: mrr_at_10 value: 49.636 - type: mrr_at_100 value: 50.357 - type: mrr_at_1000 value: 50.378 - type: mrr_at_3 value: 46.094 - type: mrr_at_5 value: 48.233 - type: ndcg_at_1 value: 36.124 - type: ndcg_at_10 value: 54.764 - type: ndcg_at_100 value: 58.867999999999995 - type: ndcg_at_1000 value: 59.548 - type: ndcg_at_3 value: 46.717999999999996 - type: ndcg_at_5 value: 50.981 - type: precision_at_1 value: 36.124 - type: precision_at_10 value: 8.931000000000001 - type: precision_at_100 value: 1.126 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 21.051000000000002 - type: precision_at_5 value: 15.104000000000001 - type: recall_at_1 value: 32.263999999999996 - type: recall_at_10 value: 75.39099999999999 - type: recall_at_100 value: 93.038 - type: recall_at_1000 value: 98.006 - type: recall_at_3 value: 54.562999999999995 - type: recall_at_5 value: 64.352 - task: type: Classification dataset: type: ag_news name: MTEB NewsClassification config: default split: test revision: eb185aade064a813bc0b7f42de02595523103ca4 metrics: - type: accuracy value: 77.75 - type: f1 value: 77.504243291547 - task: type: PairClassification dataset: type: GEM/opusparcus name: MTEB OpusparcusPC (en) config: en split: test revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a metrics: - type: cos_sim_accuracy value: 99.89816700610999 - type: cos_sim_ap value: 100 - type: cos_sim_f1 value: 99.9490575649516 - type: cos_sim_precision value: 100 - type: cos_sim_recall value: 99.89816700610999 - type: dot_accuracy value: 99.89816700610999 - type: dot_ap value: 100 - type: dot_f1 value: 99.9490575649516 - type: dot_precision value: 100 - type: dot_recall value: 99.89816700610999 - type: euclidean_accuracy value: 99.89816700610999 - type: euclidean_ap value: 100 - type: euclidean_f1 value: 99.9490575649516 - type: euclidean_precision value: 100 - type: euclidean_recall value: 99.89816700610999 - type: manhattan_accuracy value: 99.89816700610999 - type: manhattan_ap value: 100 - type: manhattan_f1 value: 99.9490575649516 - type: manhattan_precision value: 100 - type: manhattan_recall value: 99.89816700610999 - type: max_accuracy value: 99.89816700610999 - type: max_ap value: 100 - type: max_f1 value: 99.9490575649516 - task: type: PairClassification dataset: type: paws-x name: MTEB PawsX (en) config: en split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_accuracy value: 61.75000000000001 - type: cos_sim_ap value: 57.9482264289061 - type: cos_sim_f1 value: 62.444061962134256 - type: cos_sim_precision value: 45.3953953953954 - type: cos_sim_recall value: 100 - type: dot_accuracy value: 61.75000000000001 - type: dot_ap value: 57.94808038610475 - type: dot_f1 value: 62.444061962134256 - type: dot_precision value: 45.3953953953954 - type: dot_recall value: 100 - type: euclidean_accuracy value: 61.75000000000001 - type: euclidean_ap value: 57.94808038610475 - type: euclidean_f1 value: 62.444061962134256 - type: euclidean_precision value: 45.3953953953954 - type: euclidean_recall value: 100 - type: manhattan_accuracy value: 61.7 - type: manhattan_ap value: 57.996119308184966 - type: manhattan_f1 value: 62.46078773091669 - type: manhattan_precision value: 
45.66768603465851 - type: manhattan_recall value: 98.78721058434398 - type: max_accuracy value: 61.75000000000001 - type: max_ap value: 57.996119308184966 - type: max_f1 value: 62.46078773091669 - task: type: Retrieval dataset: type: mteb/quora name: MTEB QuoraRetrieval config: default split: test revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 metrics: - type: map_at_1 value: 69.001 - type: map_at_10 value: 82.573 - type: map_at_100 value: 83.226 - type: map_at_1000 value: 83.246 - type: map_at_3 value: 79.625 - type: map_at_5 value: 81.491 - type: mrr_at_1 value: 79.44 - type: mrr_at_10 value: 85.928 - type: mrr_at_100 value: 86.05199999999999 - type: mrr_at_1000 value: 86.054 - type: mrr_at_3 value: 84.847 - type: mrr_at_5 value: 85.596 - type: ndcg_at_1 value: 79.41 - type: ndcg_at_10 value: 86.568 - type: ndcg_at_100 value: 87.965 - type: ndcg_at_1000 value: 88.134 - type: ndcg_at_3 value: 83.55900000000001 - type: ndcg_at_5 value: 85.244 - type: precision_at_1 value: 79.41 - type: precision_at_10 value: 13.108 - type: precision_at_100 value: 1.509 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 36.443 - type: precision_at_5 value: 24.03 - type: recall_at_1 value: 69.001 - type: recall_at_10 value: 94.132 - type: recall_at_100 value: 99.043 - type: recall_at_1000 value: 99.878 - type: recall_at_3 value: 85.492 - type: recall_at_5 value: 90.226 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 48.3161352736264 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 metrics: - type: v_measure value: 57.83784484156747 - task: type: Retrieval dataset: type: mteb/scidocs name: MTEB SCIDOCS config: default split: test revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 metrics: - type: map_at_1 value: 4.403 - type: map_at_10 value: 10.922 - type: map_at_100 value: 12.626000000000001 - type: map_at_1000 value: 12.883 - type: map_at_3 value: 7.982 - type: map_at_5 value: 9.442 - type: mrr_at_1 value: 21.7 - type: mrr_at_10 value: 31.653 - type: mrr_at_100 value: 32.757999999999996 - type: mrr_at_1000 value: 32.824999999999996 - type: mrr_at_3 value: 28.266999999999996 - type: mrr_at_5 value: 30.127 - type: ndcg_at_1 value: 21.7 - type: ndcg_at_10 value: 18.355 - type: ndcg_at_100 value: 25.228 - type: ndcg_at_1000 value: 30.164 - type: ndcg_at_3 value: 17.549 - type: ndcg_at_5 value: 15.260000000000002 - type: precision_at_1 value: 21.7 - type: precision_at_10 value: 9.47 - type: precision_at_100 value: 1.9290000000000003 - type: precision_at_1000 value: 0.312 - type: precision_at_3 value: 16.3 - type: precision_at_5 value: 13.28 - type: recall_at_1 value: 4.403 - type: recall_at_10 value: 19.18 - type: recall_at_100 value: 39.182 - type: recall_at_1000 value: 63.378 - type: recall_at_3 value: 9.934999999999999 - type: recall_at_5 value: 13.459999999999999 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: 20a6d6f312dd54037fe07a32d58e5e168867909d metrics: - type: cos_sim_pearson value: 76.90841073432534 - type: cos_sim_spearman value: 69.2566375434526 - type: euclidean_pearson value: 73.00183878559413 - type: euclidean_spearman value: 69.25664656235413 - type: manhattan_pearson value: 72.89594756197533 - type: manhattan_spearman value: 
69.23247111043545 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 69.60878511794063 - type: cos_sim_spearman value: 65.89916377105551 - type: euclidean_pearson value: 66.90761876557181 - type: euclidean_spearman value: 65.89915018368384 - type: manhattan_pearson value: 66.78502575257721 - type: manhattan_spearman value: 65.79977053467938 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 77.2869334987418 - type: cos_sim_spearman value: 77.86961921643416 - type: euclidean_pearson value: 77.43179820479914 - type: euclidean_spearman value: 77.86961921643416 - type: manhattan_pearson value: 77.18900647348373 - type: manhattan_spearman value: 77.61209060062608 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 76.26453932960364 - type: cos_sim_spearman value: 72.81574657995401 - type: euclidean_pearson value: 75.0708953437423 - type: euclidean_spearman value: 72.81574657995401 - type: manhattan_pearson value: 74.88396609999512 - type: manhattan_spearman value: 72.65437562156805 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 82.37827653919395 - type: cos_sim_spearman value: 83.4885552472602 - type: euclidean_pearson value: 82.89377087926749 - type: euclidean_spearman value: 83.4885552472602 - type: manhattan_pearson value: 82.82440771787735 - type: manhattan_spearman value: 83.41449537888975 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 78.7995043673964 - type: cos_sim_spearman value: 80.57804447517638 - type: euclidean_pearson value: 80.03013884278195 - type: euclidean_spearman value: 80.57804447517638 - type: manhattan_pearson value: 80.13406111544424 - type: manhattan_spearman value: 80.65354602648962 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 83.63565989937278 - type: cos_sim_spearman value: 84.4948593656943 - type: euclidean_pearson value: 84.68743074820951 - type: euclidean_spearman value: 84.4948593656943 - type: manhattan_pearson value: 84.43639397781811 - type: manhattan_spearman value: 84.32595552115242 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 65.06382649277246 - type: cos_sim_spearman value: 66.28447782018655 - type: euclidean_pearson value: 67.09895930908392 - type: euclidean_spearman value: 66.28447782018655 - type: manhattan_pearson value: 66.96342453888376 - type: manhattan_spearman value: 66.33876259551842 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 78.43883428940346 - type: cos_sim_spearman value: 79.18395553127085 - type: euclidean_pearson value: 
79.22986635457109 - type: euclidean_spearman value: 79.18395553127085 - type: manhattan_pearson value: 79.10921229934691 - type: manhattan_spearman value: 79.02283553930171 - task: type: STS dataset: type: PhilipMay/stsb_multi_mt name: MTEB STSBenchmarkMultilingualSTS (en) config: en split: test revision: 93d57ef91790589e3ce9c365164337a8a78b7632 metrics: - type: cos_sim_pearson value: 78.43883433444418 - type: cos_sim_spearman value: 79.18395553127085 - type: euclidean_pearson value: 79.22986642351681 - type: euclidean_spearman value: 79.18395553127085 - type: manhattan_pearson value: 79.10921236746302 - type: manhattan_spearman value: 79.02283553930171 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 76.9361627171417 - type: mrr value: 93.06577046773126 - task: type: Retrieval dataset: type: mteb/scifact name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 50.693999999999996 - type: map_at_10 value: 59.784000000000006 - type: map_at_100 value: 60.443000000000005 - type: map_at_1000 value: 60.480000000000004 - type: map_at_3 value: 57.028 - type: map_at_5 value: 58.306999999999995 - type: mrr_at_1 value: 53.333 - type: mrr_at_10 value: 61.565000000000005 - type: mrr_at_100 value: 62.095 - type: mrr_at_1000 value: 62.131 - type: mrr_at_3 value: 59.721999999999994 - type: mrr_at_5 value: 60.589000000000006 - type: ndcg_at_1 value: 53.333 - type: ndcg_at_10 value: 64.512 - type: ndcg_at_100 value: 67.366 - type: ndcg_at_1000 value: 68.46799999999999 - type: ndcg_at_3 value: 59.748999999999995 - type: ndcg_at_5 value: 61.526 - type: precision_at_1 value: 53.333 - type: precision_at_10 value: 8.733 - type: precision_at_100 value: 1.027 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 23.222 - type: precision_at_5 value: 15.2 - type: recall_at_1 value: 50.693999999999996 - type: recall_at_10 value: 77.333 - type: recall_at_100 value: 90.10000000000001 - type: recall_at_1000 value: 99 - type: recall_at_3 value: 64.39399999999999 - type: recall_at_5 value: 68.7 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.81386138613861 - type: cos_sim_ap value: 94.96375600031361 - type: cos_sim_f1 value: 90.36885245901641 - type: cos_sim_precision value: 92.64705882352942 - type: cos_sim_recall value: 88.2 - type: dot_accuracy value: 99.81386138613861 - type: dot_ap value: 94.96375600031361 - type: dot_f1 value: 90.36885245901641 - type: dot_precision value: 92.64705882352942 - type: dot_recall value: 88.2 - type: euclidean_accuracy value: 99.81386138613861 - type: euclidean_ap value: 94.96375600031361 - type: euclidean_f1 value: 90.36885245901641 - type: euclidean_precision value: 92.64705882352942 - type: euclidean_recall value: 88.2 - type: manhattan_accuracy value: 99.81287128712871 - type: manhattan_ap value: 94.92563500640084 - type: manhattan_f1 value: 90.27277406073082 - type: manhattan_precision value: 93.00106044538707 - type: manhattan_recall value: 87.7 - type: max_accuracy value: 99.81386138613861 - type: max_ap value: 94.96375600031361 - type: max_f1 value: 90.36885245901641 - task: type: Clustering dataset: type: mteb/stackexchange-clustering 
name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 57.486984956276274 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.58453023612073 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 50.16317315282306 - type: mrr value: 50.82617137764197 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.2927995133324 - type: cos_sim_spearman value: 30.09648622523191 - type: dot_pearson value: 30.29279853541771 - type: dot_spearman value: 30.09648622523191 - task: type: Retrieval dataset: type: mteb/trec-covid name: MTEB TRECCOVID config: default split: test revision: bb9466bac8153a0349341eb1b22e06409e78ef4e metrics: - type: map_at_1 value: 0.23500000000000001 - type: map_at_10 value: 2.01 - type: map_at_100 value: 12.064 - type: map_at_1000 value: 27.437 - type: map_at_3 value: 0.6649999999999999 - type: map_at_5 value: 1.0959999999999999 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 92.667 - type: mrr_at_100 value: 92.667 - type: mrr_at_1000 value: 92.667 - type: mrr_at_3 value: 91.667 - type: mrr_at_5 value: 92.667 - type: ndcg_at_1 value: 84 - type: ndcg_at_10 value: 79.431 - type: ndcg_at_100 value: 60.914 - type: ndcg_at_1000 value: 52.005 - type: ndcg_at_3 value: 82.285 - type: ndcg_at_5 value: 81.565 - type: precision_at_1 value: 88 - type: precision_at_10 value: 84.8 - type: precision_at_100 value: 62.32 - type: precision_at_1000 value: 23.014000000000003 - type: precision_at_3 value: 86.667 - type: precision_at_5 value: 87.2 - type: recall_at_1 value: 0.23500000000000001 - type: recall_at_10 value: 2.19 - type: recall_at_100 value: 14.904 - type: recall_at_1000 value: 47.875 - type: recall_at_3 value: 0.695 - type: recall_at_5 value: 1.165 - task: type: Retrieval dataset: type: mteb/touche2020 name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.639 - type: map_at_10 value: 14.184 - type: map_at_100 value: 20.61 - type: map_at_1000 value: 22.377 - type: map_at_3 value: 9.163 - type: map_at_5 value: 10.773000000000001 - type: mrr_at_1 value: 46.939 - type: mrr_at_10 value: 59.345000000000006 - type: mrr_at_100 value: 60.07599999999999 - type: mrr_at_1000 value: 60.07599999999999 - type: mrr_at_3 value: 55.782 - type: mrr_at_5 value: 58.231 - type: ndcg_at_1 value: 41.837 - type: ndcg_at_10 value: 32.789 - type: ndcg_at_100 value: 42.232 - type: ndcg_at_1000 value: 53.900999999999996 - type: ndcg_at_3 value: 41.963 - type: ndcg_at_5 value: 35.983 - type: precision_at_1 value: 46.939 - type: precision_at_10 value: 28.163 - type: precision_at_100 value: 8.102 - type: precision_at_1000 value: 1.59 - type: precision_at_3 value: 44.897999999999996 - type: precision_at_5 value: 34.694 - type: recall_at_1 value: 3.639 - type: recall_at_10 value: 19.308 - type: recall_at_100 value: 48.992000000000004 - type: recall_at_1000 value: 84.59400000000001 - type: recall_at_3 value: 9.956 - type: recall_at_5 value: 12.33 - task: type: 
Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de metrics: - type: accuracy value: 64.305 - type: ap value: 11.330746746072599 - type: f1 value: 49.290704382387865 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 56.1941143180532 - type: f1 value: 56.40189765095578 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 36.28189332526842 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 83.1912737676581 - type: cos_sim_ap value: 64.31536990146257 - type: cos_sim_f1 value: 61.095167030191696 - type: cos_sim_precision value: 54.074375127006704 - type: cos_sim_recall value: 70.21108179419525 - type: dot_accuracy value: 83.1912737676581 - type: dot_ap value: 64.31539216162541 - type: dot_f1 value: 61.095167030191696 - type: dot_precision value: 54.074375127006704 - type: dot_recall value: 70.21108179419525 - type: euclidean_accuracy value: 83.1912737676581 - type: euclidean_ap value: 64.31538391358727 - type: euclidean_f1 value: 61.095167030191696 - type: euclidean_precision value: 54.074375127006704 - type: euclidean_recall value: 70.21108179419525 - type: manhattan_accuracy value: 83.07206294331525 - type: manhattan_ap value: 64.14646315556838 - type: manhattan_f1 value: 61.194029850746254 - type: manhattan_precision value: 54.166666666666664 - type: manhattan_recall value: 70.31662269129288 - type: max_accuracy value: 83.1912737676581 - type: max_ap value: 64.31539216162541 - type: max_f1 value: 61.194029850746254 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.38242713548337 - type: cos_sim_ap value: 84.70041255196017 - type: cos_sim_f1 value: 77.13222561986515 - type: cos_sim_precision value: 73.95266690215472 - type: cos_sim_recall value: 80.59747459193102 - type: dot_accuracy value: 88.38242713548337 - type: dot_ap value: 84.7004118720222 - type: dot_f1 value: 77.13222561986515 - type: dot_precision value: 73.95266690215472 - type: dot_recall value: 80.59747459193102 - type: euclidean_accuracy value: 88.38242713548337 - type: euclidean_ap value: 84.70041593996575 - type: euclidean_f1 value: 77.13222561986515 - type: euclidean_precision value: 73.95266690215472 - type: euclidean_recall value: 80.59747459193102 - type: manhattan_accuracy value: 88.36108200411378 - type: manhattan_ap value: 84.66897701572054 - type: manhattan_f1 value: 77.00707640360645 - type: manhattan_precision value: 72.17695778062082 - type: manhattan_recall value: 82.53002771789343 - type: max_accuracy value: 88.38242713548337 - type: max_ap value: 84.70041593996575 - type: max_f1 value: 77.13222561986515 - task: type: Clustering dataset: type: jinaai/cities_wiki_clustering name: MTEB WikiCitiesClustering config: default split: test revision: 
ddc9ee9242fa65332597f70e967ecc38b9d734fa metrics: - type: v_measure value: 81.46426354153643 ---

# Snowflake's Arctic-embed-xs

News | Models | Usage | Evaluation | Contact | FAQ | License | Acknowledgement

## News

- 12/04/2024: Release of snowflake-arctic-embed-l-v2.0 and snowflake-arctic-embed-m-v2.0, our newest models built with multilingual workloads in mind. These models outperform prior versions of Arctic Embed, and we suggest they replace the prior versions!
- 07/26/2024: Release of the preprint [Embedding And Clustering Your Data Can Improve Contrastive Pretraining](https://arxiv.org/abs/2407.18887) on arXiv.
- 07/18/2024: Release of a new model capable of producing highly compressible embedding vectors that preserve quality even when squished as small as 128 bytes per vector. Details about the development of this model are available in the launch post on the Snowflake engineering blog.
- 05/10/2024: Release of the technical report on Arctic Embed.
- 04/16/2024: Release of the **snowflake-arctic-embed** family of text embedding models. The releases are state-of-the-art for retrieval quality at each of their representative size profiles. A technical report is coming shortly. For more details, please refer to our Github: Arctic-Text-Embed.

## Models

snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance. The models achieve **state-of-the-art performance on the MTEB/BEIR leaderboard** for each of their size variants. Evaluation is performed using these scripts. As shown below, each class of model size achieves SOTA retrieval accuracy compared to other top models.

The models are trained by leveraging existing open-source text representation models, such as bert-base-uncased, in a multi-stage pipeline that optimizes their retrieval performance. First, the models are pretrained with large batches of query-document pairs in which negatives are derived in-batch; this pretraining leverages about 400m samples drawn from a mix of public datasets and proprietary web search data. Following pretraining, the models are further optimized with long training on a smaller dataset (about 1m samples) of query, positive document, and negative document triplets, where the negatives come from hard negative mining. Mining of the negatives and data curation is crucial to retrieval accuracy. A detailed technical report can be found here.
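To make the two-stage recipe concrete, here is a minimal sketch of the in-batch contrastive (InfoNCE) objective described above. It is an illustration rather than the actual training code; the function name and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              doc_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives: row i of query_emb is paired with
    row i of doc_emb, and every other document in the batch is a negative."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = (q @ d.T) / temperature                        # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=logits.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

The second stage follows the same contrastive idea but replaces the in-batch negatives with explicitly mined hard negatives in a triplet format. The tables below show where each size variant lands after this recipe.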
| Name | MTEB Retrieval Score (NDCG @ 10) | Parameters (Millions) | Embedding Dimension |
| --- | --- | --- | --- |
| snowflake-arctic-embed-xs | 50.15 | 22 | 384 |
| snowflake-arctic-embed-s | 51.98 | 33 | 384 |
| snowflake-arctic-embed-m | 54.90 | 110 | 768 |
| snowflake-arctic-embed-m-long | 54.83 | 137 | 768 |
| snowflake-arctic-embed-l | 55.98 | 335 | 1024 |

Aside from being great open-source models, the largest model, snowflake-arctic-embed-l, can serve as a natural replacement for closed-source embedding models, as shown below.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-l | 55.98 |
| Google-gecko-text-embedding | 55.7 |
| text-embedding-3-large | 55.44 |
| Cohere-embed-english-v3.0 | 55.00 |
| bge-large-en-v1.5 | 54.29 |

### snowflake-arctic-embed-xs

This tiny model packs quite the punch. Based on the all-MiniLM-L6-v2 model with only 22m parameters and 384 dimensions, this model should meet even the strictest latency/TCO budgets. Despite its size, its retrieval accuracy is closer to that of models with 100m parameters.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-xs | 50.15 |
| GIST-all-MiniLM-L6-v2 | 45.12 |
| gte-tiny | 44.92 |
| all-MiniLM-L6-v2 | 41.95 |
| bge-micro-v2 | 42.56 |

### snowflake-arctic-embed-s

Based on the intfloat/e5-small-unsupervised model, this small model does not trade off retrieval accuracy for its small size. With only 33m parameters and 384 dimensions, this model should easily allow scaling to large datasets.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-s | 51.98 |
| bge-small-en-v1.5 | 51.68 |
| Cohere-embed-english-light-v3.0 | 51.34 |
| text-embedding-3-small | 51.08 |
| e5-small-v2 | 49.04 |

### snowflake-arctic-embed-m

Based on the intfloat/e5-base-unsupervised model, this medium model is the workhorse that provides the best retrieval performance without slowing down inference.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-m | 54.90 |
| bge-base-en-v1.5 | 53.25 |
| nomic-embed-text-v1.5 | 53.25 |
| GIST-Embedding-v0 | 52.31 |
| gte-base | 52.31 |

### snowflake-arctic-embed-m-long

Based on the nomic-embed-text-v1-unsupervised model, this long-context variant of our medium-sized model is perfect for workloads that can be constrained by the regular 512-token context of our other models. Without the use of RPE, this model supports up to 2048 tokens. With RPE, it can scale to 8192!

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-m-long | 54.83 |
| nomic-embed-text-v1.5 | 53.01 |
| nomic-embed-text-v1 | 52.81 |

### snowflake-arctic-embed-l

Based on the intfloat/e5-large-unsupervised model, this large model is a direct drop-in for closed APIs and delivers the most accurate retrieval experience.

| Model Name | MTEB Retrieval Score (NDCG @ 10) |
| --- | --- |
| snowflake-arctic-embed-l | 55.98 |
| UAE-Large-V1 | 54.66 |
| bge-large-en-v1.5 | 54.29 |
| mxbai-embed-large-v1 | 54.39 |
| e5-Large-v2 | 50.56 |

## Usage

### Using Sentence Transformers

You can use the sentence-transformers package to use a snowflake-arctic-embed model, as shown below.
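A minimal sentence-transformers sketch; the Hub id `Snowflake/snowflake-arctic-embed-xs` and the exact retrieval prefix are assumptions rather than details stated in this card, and the example strings are illustrative.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-xs")  # assumed Hub id

# Queries carry a retrieval prefix; documents are embedded as-is.
query_prefix = "Represent this sentence for searching relevant passages: "  # assumed prefix
queries = [query_prefix + "what is snowflake?"]
documents = ["The Data Cloud!", "Mexico City of Course!"]

query_embeddings = model.encode(queries, normalize_embeddings=True)
document_embeddings = model.encode(documents, normalize_embeddings=True)

# Embeddings are L2-normalized, so a dot product gives cosine similarity.
scores = query_embeddings @ document_embeddings.T
print(scores)
```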
### Using Huggingface transformers

You can use the transformers package for a snowflake-arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token to embed each text portion and use the query prefix below (just on the query).
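A minimal transformers sketch of the recipe just described: embed with the CLS token, L2-normalize, and apply the query prefix only to queries. The Hub id and the prefix string are the same assumptions as above.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "Snowflake/snowflake-arctic-embed-xs"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

query_prefix = "Represent this sentence for searching relevant passages: "  # assumed prefix
queries = [query_prefix + "what is snowflake?"]
documents = ["The Data Cloud!"]

def embed(texts):
    tokens = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**tokens)[0]           # last hidden state
    return F.normalize(hidden[:, 0], dim=-1)  # CLS-token embedding, L2-normalized

scores = embed(queries) @ embed(documents).T
print(scores)
```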
### Using Transformers.js

If you haven't already, you can install the Transformers.js JavaScript library from NPM. You can then use the model to compute embeddings directly from JavaScript.

## FAQ

TBD

## Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project. You can also email Daniel Campos (daniel.campos@snowflake.com).

## License

Arctic is licensed under the Apache-2.0 license. The released models can be used for commercial purposes free of charge.

## Acknowledgement

We want to thank the open-source community, which has provided the great building blocks upon which we could make our models. We thank our modeling engineers, Danmei Xu, Luke Merrick, Gaurav Nuti, and Daniel Campos, for making these great models possible. We thank our leadership, Himabindu Pucha, Kelvin So, Vivek Raghunathan, and Sridhar Ramaswamy, for supporting this work. We also thank the open-source community for producing the great models we could build on top of and making these releases possible. Finally, we thank the researchers who created the BEIR and MTEB benchmarks. It is largely thanks to their tireless work to define what better looks like that we could improve model performance. ", + "model_explanation_gemini": "Generates sentence embeddings for measuring similarity between texts and performs tasks like classification, retrieval, clustering, and reranking." +} \ No newline at end of file diff --git a/data/model_data_json/Supabase_gte-small.json b/data/model_data_json/Supabase_gte-small.json new file mode 100644 index 0000000000000000000000000000000000000000..d847c70bb3cabcba3aa2240d9d70d9c8b13d4cf6 --- /dev/null +++ b/data/model_data_json/Supabase_gte-small.json @@ -0,0 +1,16 @@ +{ + "model_id": "Supabase/gte-small", + "downloads": 459651, + "tags": [ + "transformers.js", + "pytorch", + "onnx", + "bert", + "feature-extraction", + "en", + "license:mit", + "region:us" + ], + "description": "--- pipeline_tag: feature-extraction library_name: \"transformers.js\" language: - en license: mit --- _Fork of the original model with ONNX weights to be compatible with Transformers.js. See JavaScript usage._ --- # gte-small General Text Embeddings (GTE) model. The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks of text embeddings, including **information retrieval**, **semantic textual similarity**, **text reranking**, etc. ## Metrics Performance of GTE models was compared with other popular text embedding models on the MTEB benchmark. For more detailed comparison results, please refer to the MTEB leaderboard. | Model Name | Model Size (GB) | Dimension | Sequence Length | Average (56) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large** | 0.67 | 1024 | 512 | **63.13** | 46.84 | 85.00 | 59.13 | 52.22 | 83.35 | 31.66 | 73.33 | | **gte-base** | 0.22 | 768 | 512 | **62.39** | 46.2 | 84.57 | 58.61 | 51.14 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1.34 | 1024| 512 | 62.25 | 44.49 | 86.03 | 56.61 | 50.56 | 82.05 | 30.19 | 75.24 | | e5-base-v2 | 0.44 | 768 | 512 | 61.5 | 43.80 | 85.73 | 55.91 | 50.29 | 81.05 | 30.28 | 73.84 | | **gte-small** | 0.07 | 384 | 512 | **61.36** | 44.89 | 83.54 | 57.7 | 49.46 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | - | 1536 | 8192 | 60.99 | 45.9 | 84.89 | 56.32 | 49.25 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 0.13 | 384 | 512 | 59.93 | 39.92 | 84.67 | 54.32 | 49.04 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 9.73 | 768 | 512 | 59.51 | 43.72 | 85.06 | 56.42 | 42.24 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 0.44 | 768 | 514 | 57.78 | 43.69 | 83.04 | 59.36 | 43.81 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 28.27 | 4096 | 2048 | 57.59 | 38.93 | 81.9 | 55.65 | 48.22 | 77.74 | 33.6 | 66.19 | | all-MiniLM-L12-v2 | 0.13 | 384 | 512 | 56.53 | 41.81 | 82.41 | 58.44 | 42.69 | 79.8 | 27.9 | 63.21 | | all-MiniLM-L6-v2 | 0.09 | 384 | 512 | 56.26 | 42.35 | 82.37 | 58.04 | 41.95 | 78.9 | 30.81 | 63.05 | | contriever-base-msmarco | 0.44 | 768 | 512 | 56.00 | 41.1 | 82.54 | 53.14 | 41.88 | 76.51 | 30.36 | 66.68 | | sentence-t5-base | 0.22 | 768 | 512 | 55.27 | 40.21 | 85.18 | 53.09 | 33.63 | 81.14 | 31.39 | 69.81 | ## Usage This model can be used with both Python and JavaScript. ### Python Use with Transformers and PyTorch, or with sentence-transformers, as sketched below.
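A minimal Python sketch for this model. The pooling here (mean pooling over the last hidden state followed by L2 normalization) is the common GTE recipe and is an assumption, since the original snippet is not reproduced in this card; the example sentences are illustrative.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Supabase/gte-small")
model = AutoModel.from_pretrained("Supabase/gte-small")
model.eval()

texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "Beijing",
]

batch = tokenizer(texts, max_length=512, padding=True,
                  truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the token embeddings, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = F.normalize(embeddings, p=2, dim=1)

# Cosine similarity between the query (row 0) and the other texts.
print(embeddings[0] @ embeddings[1:].T)
```

With sentence-transformers, `SentenceTransformer("Supabase/gte-small").encode(texts)` should produce comparable embeddings, assuming the repository ships a sentence-transformers configuration.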
### JavaScript This model can be used with JavaScript via Transformers.js. Use with Deno or Supabase Edge Functions, within the browser (JavaScript Modules), or within Node.js or a web bundler (Webpack, etc.). ### Limitation This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.", + "model_explanation_gemini": "Generates text embeddings for English to support tasks like information retrieval, semantic textual similarity, and text reranking, based on a BERT framework and trained on diverse relevance text pairs." +} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-base.en.json b/data/model_data_json/Systran_faster-whisper-base.en.json new file mode 100644 index 0000000000000000000000000000000000000000..d7fab012880c531ade03e858b9185970a8764cd4 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-base.en.json @@ -0,0 +1,14 @@ +{ + "model_id": "Systran/faster-whisper-base.en", + "downloads": 560188, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "license:mit", + "region:us" + ], + "description": "--- language: - en tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper base.en model for CTranslate2 This repository contains the conversion of openai/whisper-base.en to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper base.en model to the CTranslate2 format for efficient automatic speech recognition in English." +} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-base.json b/data/model_data_json/Systran_faster-whisper-base.json new file mode 100644 index 0000000000000000000000000000000000000000..f125ed12f86755e5e959278fc340085330b2c881 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-base.json @@ -0,0 +1,112 @@ +{ + "model_id": "Systran/faster-whisper-base", + "downloads": 886115, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper base model for CTranslate2 This repository contains the conversion of openai/whisper-base to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper-base model to the CTranslate2 format for efficient automatic speech recognition in multiple languages."
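Both faster-whisper entries above include an Example section whose code is not reproduced here. As a hedged sketch, loading one of these CTranslate2 conversions with the faster-whisper library typically looks like the following; the model id, audio path, and compute settings are illustrative.

```python
from faster_whisper import WhisperModel

# The weights are stored in FP16; compute_type can override that at load time.
model = WhisperModel("Systran/faster-whisper-base.en", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav", beam_size=5)
print("Detected language:", info.language)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```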
+} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-large-v2.json b/data/model_data_json/Systran_faster-whisper-large-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..576ff68e7b6d9bbae9318d2160bdd605558d7140 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-large-v2.json @@ -0,0 +1,112 @@ +{ + "model_id": "Systran/faster-whisper-large-v2", + "downloads": 730567, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper large-v2 model for CTranslate2 This repository contains the conversion of openai/whisper-large-v2 to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts speech to text in multiple languages using the Whisper large-v2 model optimized for CTranslate2." 
+} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-large-v3.json b/data/model_data_json/Systran_faster-whisper-large-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..1ebdba0e95fb22f06a0affc4440b4c3ddb1d3c54 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-large-v3.json @@ -0,0 +1,113 @@ +{ + "model_id": "Systran/faster-whisper-large-v3", + "downloads": 646531, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "yue", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su - yue tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper large-v3 model for CTranslate2 This repository contains the conversion of openai/whisper-large-v3 to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper large-v3 model to the CTranslate2 format for efficient automatic speech recognition in multiple languages." 
+} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-medium.json b/data/model_data_json/Systran_faster-whisper-medium.json new file mode 100644 index 0000000000000000000000000000000000000000..45dbf94280305357fd64a9e8bba950aa5a8e7194 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-medium.json @@ -0,0 +1,112 @@ +{ + "model_id": "Systran/faster-whisper-medium", + "downloads": 146813, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper medium model for CTranslate2 This repository contains the conversion of openai/whisper-medium to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper-medium model to the CTranslate2 format for efficient automatic speech recognition in multiple languages." +} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-small.en.json b/data/model_data_json/Systran_faster-whisper-small.en.json new file mode 100644 index 0000000000000000000000000000000000000000..a8e957cc728b190700271e9e69f5d3f012f2ba90 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-small.en.json @@ -0,0 +1,14 @@ +{ + "model_id": "Systran/faster-whisper-small.en", + "downloads": 136878, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "license:mit", + "region:us" + ], + "description": "--- language: - en tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper small.en model for CTranslate2 This repository contains the conversion of openai/whisper-small.en to the CTranslate2 model format. 
This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper-small.en model to the CTranslate2 format for efficient automatic speech recognition in English." +} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-small.json b/data/model_data_json/Systran_faster-whisper-small.json new file mode 100644 index 0000000000000000000000000000000000000000..62566925a6fa2e0ba3b30fc56c2159cc1e4d27d2 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-small.json @@ -0,0 +1,112 @@ +{ + "model_id": "Systran/faster-whisper-small", + "downloads": 339068, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper small model for CTranslate2 This repository contains the conversion of openai/whisper-small to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper-small model to the CTranslate2 format for efficient automatic speech recognition in multiple languages." 
+} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-tiny.en.json b/data/model_data_json/Systran_faster-whisper-tiny.en.json new file mode 100644 index 0000000000000000000000000000000000000000..f4c78d5a214d9ce7e69bdb4666fc36057b079f05 --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-tiny.en.json @@ -0,0 +1,14 @@ +{ + "model_id": "Systran/faster-whisper-tiny.en", + "downloads": 187083, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "license:mit", + "region:us" + ], + "description": "--- language: - en tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper tiny.en model for CTranslate2 This repository contains the conversion of openai/whisper-tiny.en to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's whisper-tiny.en model to the CTranslate2 format for efficient automatic speech recognition in English." +} \ No newline at end of file diff --git a/data/model_data_json/Systran_faster-whisper-tiny.json b/data/model_data_json/Systran_faster-whisper-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..7349e07d119e7f16214973ff62ec7bb6ebe92fac --- /dev/null +++ b/data/model_data_json/Systran_faster-whisper-tiny.json @@ -0,0 +1,112 @@ +{ + "model_id": "Systran/faster-whisper-tiny", + "downloads": 642212, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper tiny model for CTranslate2 This repository contains the conversion of openai/whisper-tiny to the CTranslate2 model format. 
This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper-tiny model to the CTranslate2 format for efficient automatic speech recognition in multiple languages." +} \ No newline at end of file diff --git a/data/model_data_json/THUDM_CogVideoX-5b-I2V.json b/data/model_data_json/THUDM_CogVideoX-5b-I2V.json new file mode 100644 index 0000000000000000000000000000000000000000..33fb1fae938cf43cbec04d8331edc95c53756355 --- /dev/null +++ b/data/model_data_json/THUDM_CogVideoX-5b-I2V.json @@ -0,0 +1,19 @@ +{ + "model_id": "THUDM/CogVideoX-5b-I2V", + "downloads": 87126, + "tags": [ + "diffusers", + "safetensors", + "cogvideox", + "video-generation", + "thudm", + "image-to-video", + "en", + "arxiv:2408.06072", + "license:other", + "diffusers:CogVideoXImageToVideoPipeline", + "region:us" + ], + "description": "--- license: other license_link: language: - en tags: - cogvideox - video-generation - thudm - image-to-video inference: false --- # CogVideoX-5B-I2V


📍 Visit for the commercial version of the video generation model

## Model Introduction CogVideoX is an open-source video generation model originating from Qingying. The table below presents information related to the video generation models we offer in this version.
| Model Name | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V (This Repository) |
|:---|:---|:---|:---|
| Model Description | Entry-level model, balancing compatibility. Low cost for running and secondary development. | Larger model with higher video generation quality and better visual effects. | CogVideoX-5B image-to-video version. |
| Inference Precision | FP16* (recommended), BF16, FP32, FP8*, INT8; not supported: INT4 | BF16 (recommended), FP16, FP32, FP8*, INT8; not supported: INT4 | BF16 (recommended), FP16, FP32, FP8*, INT8; not supported: INT4 |
| Single GPU Memory Usage | diffusers FP16: from 4GB*; diffusers INT8 (torchao): from 3.6GB* | diffusers BF16: from 5GB*; diffusers INT8 (torchao): from 4.4GB* | diffusers BF16: from 5GB*; diffusers INT8 (torchao): from 4.4GB* |
| Multi-GPU Inference Memory Usage | FP16: 10GB* using diffusers | BF16: 15GB* using diffusers | BF16: 15GB* using diffusers |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~90 seconds; Single H100: ~45 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds |
| Fine-tuning Precision | FP16 | BF16 | BF16 |
| Fine-tuning Memory Usage | 47 GB (bs=1, LORA); 61 GB (bs=2, LORA); 62 GB (bs=1, SFT) | 63 GB (bs=1, LORA); 80 GB (bs=2, LORA); 75 GB (bs=1, SFT) | 78 GB (bs=1, LORA); 75 GB (bs=1, SFT, 16 GPU) |
| Prompt Language | English* | English* | English* |
| Maximum Prompt Length | 226 Tokens | 226 Tokens | 226 Tokens |
| Video Length | 6 Seconds | 6 Seconds | 6 Seconds |
| Frame Rate | 8 Frames / Second | 8 Frames / Second | 8 Frames / Second |
| Video Resolution | 720 x 480, no support for other resolutions (including fine-tuning) | 720 x 480, no support for other resolutions (including fine-tuning) | 720 x 480, no support for other resolutions (including fine-tuning) |
| Position Embedding | 3d_sincos_pos_embed | 3d_rope_pos_embed | 3d_rope_pos_embed + learnable_pos_embed |
**Data Explanation**

+ While testing using the diffusers library, all optimizations included in the diffusers library were enabled. This scheme has not been tested for actual memory usage on devices outside of **NVIDIA A100 / H100** architectures. Generally, this scheme can be adapted to all **NVIDIA Ampere architecture** and above devices. If optimizations are disabled, memory consumption will multiply, with peak memory usage being about 3 times the value in the table. However, speed will increase by about 3-4 times. You can selectively disable some optimizations, including:
+ For multi-GPU inference, the optimization needs to be disabled.
+ Using INT8 models will slow down inference; this is done to accommodate lower-memory GPUs while maintaining minimal video quality loss, though inference speed will significantly decrease.
+ The CogVideoX-2B model was trained in FP16 precision, and all CogVideoX-5B models were trained in BF16 precision. We recommend using the precision in which the model was trained for inference.
+ PytorchAO and Optimum-quanto can be used to quantize the text encoder, transformer, and VAE modules to reduce the memory requirements of CogVideoX. This allows the model to run on free T4 Colabs or GPUs with smaller memory! Also, note that TorchAO quantization is fully compatible with , which can significantly improve inference speed. FP8 precision must be used on devices with NVIDIA H100 and above, requiring source installation of , , , and Python packages. CUDA 12.4 is recommended.
+ The inference speed tests also used the above memory optimization scheme. Without memory optimization, inference speed increases by about 10%. Only the version of the model supports quantization.
+ The model only supports English input; other languages can be translated into English for use via large model refinement.
+ The memory usage of model fine-tuning is tested in an environment, and the program automatically uses optimization. If a specific number of GPUs is marked in the table, that number or more GPUs must be used for fine-tuning.

**Reminders**

+ Use SAT for inference and fine-tuning of SAT version models. Feel free to visit our GitHub for more details.

## Getting Started Quickly

🤗 This model supports deployment using the Hugging Face diffusers library. You can follow the steps below to get started. **We recommend that you visit our GitHub to check out prompt optimization and conversion to get a better experience.**

1. Install the required dependencies
2. Run the code

## Quantized Inference

PytorchAO and Optimum-quanto can be used to quantize the text encoder, transformer, and VAE modules to reduce the memory requirements of CogVideoX. This allows the model to run on free T4 Colab or GPUs with lower VRAM! Also, note that TorchAO quantization is fully compatible with , which can significantly accelerate inference. Additionally, these models can be serialized and stored using PytorchAO in quantized data types to save disk space. You can find examples and benchmarks at the following links:
- torchao
- quanto

## Further Exploration

Feel free to enter our GitHub, where you'll find:
1. More detailed technical explanations and code.
2. Optimized prompt examples and conversions.
3. Detailed code for model inference and fine-tuning.
4. Project update logs and more interactive opportunities.
5. CogVideoX toolchain to help you better use the model.
6. INT8 model inference code.

## Model License

This model is released under the CogVideoX LICENSE.
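Since the dependency and run steps above omit the actual code, here is a minimal image-to-video sketch built on the CogVideoXImageToVideoPipeline class named in this repository's tags (the prompt, input image path, and sampling parameters are illustrative assumptions, not the card's original example):

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",
    torch_dtype=torch.bfloat16,  # BF16 is the recommended precision per the table above
)
pipe.enable_model_cpu_offload()  # one of the memory optimizations discussed above

image = load_image("input.jpg")  # placeholder 720x480 input image
video = pipe(
    prompt="A child flies a kite on a windy beach at sunset.",  # illustrative prompt
    image=image,
    num_inference_steps=50,  # matches the Step = 50 timing in the table
    guidance_scale=6.0,      # assumed value
).frames[0]
export_to_video(video, "output.mp4", fps=8)  # 8 fps per the table above
```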
## Citation", + "model_explanation_gemini": "Generates videos from input images with high-quality visual effects." +} \ No newline at end of file diff --git a/data/model_data_json/THUDM_CogVideoX-5b.json b/data/model_data_json/THUDM_CogVideoX-5b.json new file mode 100644 index 0000000000000000000000000000000000000000..80ac081037de316d53852fed968bd5c8db7f9672 --- /dev/null +++ b/data/model_data_json/THUDM_CogVideoX-5b.json @@ -0,0 +1,19 @@ +{ + "model_id": "THUDM/CogVideoX-5b", + "downloads": 91811, + "tags": [ + "diffusers", + "safetensors", + "cogvideox", + "video-generation", + "thudm", + "text-to-video", + "en", + "arxiv:2408.06072", + "license:other", + "diffusers:CogVideoXPipeline", + "region:us" + ], + "description": "--- license: other license_link: language: - en tags: - cogvideox - video-generation - thudm - text-to-video inference: false --- # CogVideoX-5B

| | |

📍 Visit to experience commercial video generation models.

## Demo Show

Video gallery with captions:
A garden comes to life as a kaleidoscope of butterflies flutters amidst the blossoms, their delicate wings casting shadows on the petals below. In the background, a grand fountain cascades water with a gentle splendor, its rhythmic sound providing a soothing backdrop. Beneath the cool shade of a mature tree, a solitary wooden chair invites solitude and reflection, its smooth surface worn by the touch of countless visitors seeking a moment of tranquility in nature's embrace.
A small boy, head bowed and determination etched on his face, sprints through the torrential downpour as lightning crackles and thunder rumbles in the distance. The relentless rain pounds the ground, creating a chaotic dance of water droplets that mirror the dramatic sky's anger. In the far background, the silhouette of a cozy home beckons, a faint beacon of safety and warmth amidst the fierce weather. The scene is one of perseverance and the unyielding spirit of a child braving the elements.
A suited astronaut, with the red dust of Mars clinging to their boots, reaches out to shake hands with an alien being, their skin a shimmering blue, under the pink-tinged sky of the fourth planet. In the background, a sleek silver rocket, a beacon of human ingenuity, stands tall, its engines powered down, as the two representatives of different worlds exchange a historic greeting amidst the desolate beauty of the Martian landscape.
An elderly gentleman, with a serene expression, sits at the water's edge, a steaming cup of tea by his side. He is engrossed in his artwork, brush in hand, as he renders an oil painting on a canvas that's propped up against a small, weathered table. The sea breeze whispers through his silver hair, gently billowing his loose-fitting white shirt, while the salty air adds an intangible element to his masterpiece in progress. The scene is one of tranquility and inspiration, with the artist's canvas capturing the vibrant hues of the setting sun reflecting off the tranquil sea.
In a dimly lit bar, purplish light bathes the face of a mature man, his eyes blinking thoughtfully as he ponders in close-up, the background artfully blurred to focus on his introspective expression, the ambiance of the bar a mere suggestion of shadows and soft lighting.
A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer.
On a brilliant sunny day, the lakeshore is lined with an array of willow trees, their slender branches swaying gently in the soft breeze. The tranquil surface of the lake reflects the clear blue sky, while several elegant swans glide gracefully through the still water, leaving behind delicate ripples that disturb the mirror-like quality of the lake. The scene is one of serene beauty, with the willows' greenery providing a picturesque frame for the peaceful avian visitors.
A Chinese mother, draped in a soft, pastel-colored robe, gently rocks back and forth in a cozy rocking chair positioned in the tranquil setting of a nursery. The dimly lit bedroom is adorned with whimsical mobiles dangling from the ceiling, casting shadows that dance on the walls. Her baby, swaddled in a delicate, patterned blanket, rests against her chest, the child's earlier cries now replaced by contented coos as the mother's soothing voice lulls the little one to sleep. The scent of lavender fills the air, adding to the serene atmosphere, while a warm, orange glow from a nearby nightlight illuminates the scene with a gentle hue, capturing a moment of tender love and comfort.
## Model Introduction CogVideoX is an open-source version of the video generation model originating from QingYing. The table below displays the list of video generation models we currently offer, along with their foundational information.
| Model Name | CogVideoX-2B | CogVideoX-5B (This Repository) |
|:---|:---|:---|
| Model Description | Entry-level model, balancing compatibility. Low cost for running and secondary development. | Larger model with higher video generation quality and better visual effects. |
| Inference Precision | FP16* (Recommended), BF16, FP32, FP8*, INT8, no support for INT4 | BF16 (Recommended), FP16, FP32, FP8*, INT8, no support for INT4 |
| Single GPU VRAM Consumption | diffusers FP16: starting from 4GB*; diffusers INT8 (torchao): starting from 3.6GB* | diffusers BF16: starting from 5GB*; diffusers INT8 (torchao): starting from 4.4GB* |
| Multi-GPU Inference VRAM Consumption | FP16: 10GB* using diffusers | BF16: 15GB* using diffusers |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~90 seconds; Single H100: ~45 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds |
| Fine-tuning Precision | FP16 | BF16 |
| Fine-tuning VRAM Consumption (per GPU) | 47 GB (bs=1, LORA); 61 GB (bs=2, LORA); 62 GB (bs=1, SFT) | 63 GB (bs=1, LORA); 80 GB (bs=2, LORA); 75 GB (bs=1, SFT) |
| Prompt Language | English* | English* |
| Prompt Length Limit | 226 Tokens | 226 Tokens |
| Video Length | 6 Seconds | 6 Seconds |
| Frame Rate | 8 Frames per Second | 8 Frames per Second |
| Video Resolution | 720 x 480, no support for other resolutions (including fine-tuning) | 720 x 480, no support for other resolutions (including fine-tuning) |
| Positional Encoding | 3d_sincos_pos_embed | 3d_rope_pos_embed |
**Data Explanation**

+ When testing using the library, all optimizations provided by the library were enabled. This solution has not been tested for actual VRAM/memory usage on devices other than **NVIDIA A100 / H100**. Generally, this solution can be adapted to all devices with **NVIDIA Ampere architecture** and above. If the optimizations are disabled, VRAM usage will increase significantly, with peak VRAM usage being about 3 times higher than the table shows. However, speed will increase by 3-4 times. You can selectively disable some optimizations, including:
+ When performing multi-GPU inference, the optimization needs to be disabled.
+ Using INT8 models will reduce inference speed. This is to ensure that GPUs with lower VRAM can perform inference normally while maintaining minimal video quality loss, though inference speed will decrease significantly.
+ The 2B model is trained with FP16 precision, and the 5B model is trained with BF16 precision. We recommend using the precision the model was trained with for inference.
+ PytorchAO and Optimum-quanto can be used to quantize the text encoder, Transformer, and VAE modules to reduce CogVideoX's memory requirements. This makes it possible to run the model on a free T4 Colab or GPUs with smaller VRAM! It is also worth noting that TorchAO quantization is fully compatible with , which can significantly improve inference speed. FP8 precision must be used on devices with NVIDIA H100 or above, which requires installing the , , , and Python packages from source. CUDA 12.4 is recommended.
+ The inference speed test also used the above VRAM optimization scheme. Without VRAM optimization, inference speed increases by about 10%. Only the version of the model supports quantization.
+ The model only supports English input; other languages can be translated into English during refinement by a large model.

**Note**

+ Use SAT for inference and fine-tuning of SAT version models. Feel free to visit our GitHub for more information.

## Quick Start

🤗 This model supports deployment using the huggingface diffusers library. You can deploy it by following these steps. **We recommend that you visit our GitHub and check out the relevant prompt optimizations and conversions to get a better experience.**

1. Install the required dependencies
2. Run the code

## Quantized Inference

PytorchAO and Optimum-quanto can be used to quantize the Text Encoder, Transformer and VAE modules to lower the memory requirement of CogVideoX. This makes it possible to run the model on free-tier T4 Colab or smaller VRAM GPUs as well! It is also worth noting that TorchAO quantization is fully compatible with , which allows for much faster inference speed. Additionally, the models can be serialized and stored in a quantized datatype to save disk space when using PytorchAO. Find examples and benchmarks at these links:
- torchao
- quanto

## Explore the Model

Welcome to our github, where you will find:
1. More detailed technical details and code explanation.
2. Optimization and conversion of prompts.
3. Inference and fine-tuning of SAT version models, and even pre-release.
4. Project update logs and more interactive opportunities.
5. CogVideoX toolchain to help you better use the model.
6. INT8 model inference code support.

## Model License

This model is released under the CogVideoX LICENSE.

## Citation", + "model_explanation_gemini": "Generates videos from text descriptions."
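As above, the quick-start steps omit the code itself; a minimal text-to-video sketch with the diffusers CogVideoXPipeline listed in this repository's tags might look as follows (prompt and sampling values are illustrative assumptions):

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16,  # BF16 is the recommended precision per the table above
)
pipe.enable_model_cpu_offload()  # memory optimization discussed in the data explanation

video = pipe(
    prompt="A panda playing an acoustic guitar in a sunlit bamboo forest.",  # illustrative prompt
    num_inference_steps=50,
    num_frames=49,       # assumed frame count for a 6 s clip at 8 fps plus the initial frame
    guidance_scale=6.0,  # assumed value
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```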
+} \ No newline at end of file diff --git a/data/model_data_json/THUDM_CogView4-6B.json b/data/model_data_json/THUDM_CogView4-6B.json new file mode 100644 index 0000000000000000000000000000000000000000..2f880c34372a54309ad9a76ec4691dd4fbac7042 --- /dev/null +++ b/data/model_data_json/THUDM_CogView4-6B.json @@ -0,0 +1,19 @@ +{ + "model_id": "THUDM/CogView4-6B", + "downloads": 309109, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "zh", + "en", + "arxiv:2403.05121", + "base_model:THUDM/glm-4-9b", + "base_model:finetune:THUDM/glm-4-9b", + "license:apache-2.0", + "diffusers:CogView4Pipeline", + "region:us" + ], + "description": "--- license: apache-2.0 language: - zh - en base_model: - THUDM/glm-4-9b pipeline_tag: text-to-image library_name: diffusers --- # CogView4-6B


## Inference Requirements and Model Introduction

+ Resolution: Width and height must be between and , divisible by , and ensure the maximum number of pixels does not exceed px.
+ Precision: BF16 / FP32 (FP16 is not supported as it will cause overflow resulting in completely black images)

Using precision with for testing, the memory usage is shown in the table below:

| Resolution | enable_model_cpu_offload OFF | enable_model_cpu_offload ON | enable_model_cpu_offload ON, Text Encoder 4bit |
|-------------|------------------------------|-----------------------------|-----------------------------------------------------|
| 512 * 512 | 33GB | 20GB | 13GB |
| 1280 * 720 | 35GB | 20GB | 13GB |
| 1024 * 1024 | 35GB | 20GB | 13GB |
| 1920 * 1280 | 39GB | 20GB | 14GB |

## Quick Start

First, ensure you install the library from source. Then, run the following code:

### Model Metrics

We've tested on multiple benchmarks and achieved the following scores:

#### DPG-Bench

| Model | Overall | Global | Entity | Attribute | Relation | Other |
|-----------------|-----------|-----------|-----------|-----------|-----------|-----------|
| SDXL | 74.65 | 83.27 | 82.43 | 80.91 | 86.76 | 80.41 |
| PixArt-alpha | 71.11 | 74.97 | 79.32 | 78.60 | 82.57 | 76.96 |
| SD3-Medium | 84.08 | 87.90 | **91.01** | 88.83 | 80.70 | 88.68 |
| DALL-E 3 | 83.50 | **90.97** | 89.61 | 88.39 | 90.58 | 89.83 |
| Flux.1-dev | 83.79 | 85.80 | 86.79 | 89.98 | 90.04 | **89.90** |
| Janus-Pro-7B | 84.19 | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 |
| **CogView4-6B** | **85.13** | 83.85 | 90.35 | **91.17** | **91.14** | 87.29 |

#### GenEval

| Model | Overall | Single Obj. | Two Obj. | Counting | Colors | Position | Color attribution |
|-----------------|----------|-------------|----------|----------|----------|----------|-------------------|
| SDXL | 0.55 | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 |
| PixArt-alpha | 0.48 | 0.98 | 0.50 | 0.44 | 0.80 | 0.08 | 0.07 |
| SD3-Medium | 0.74 | **0.99** | **0.94** | 0.72 | 0.89 | 0.33 | 0.60 |
| DALL-E 3 | 0.67 | 0.96 | 0.87 | 0.47 | 0.83 | 0.43 | 0.45 |
| Flux.1-dev | 0.66 | 0.98 | 0.79 | **0.73** | 0.77 | 0.22 | 0.45 |
| Janus-Pro-7B | **0.80** | **0.99** | 0.89 | 0.59 | **0.90** | **0.79** | **0.66** |
| **CogView4-6B** | 0.73 | **0.99** | 0.86 | 0.66 | 0.79 | 0.48 | 0.58 |

#### T2I-CompBench

| Model | Color | Shape | Texture | 2D-Spatial | 3D-Spatial | Numeracy | Non-spatial Clip | Complex 3-in-1 |
|-----------------|------------|------------|------------|------------|------------|------------|------------------|----------------|
| SDXL | 0.5879 | 0.4687 | 0.5299 | 0.2133 | 0.3566 | 0.4988 | 0.3119 | 0.3237 |
| PixArt-alpha | 0.6690 | 0.4927 | 0.6477 | 0.2064 | 0.3901 | 0.5058 | **0.3197** | 0.3433 |
| SD3-Medium | **0.8132** | 0.5885 | **0.7334** | **0.3200** | **0.4084** | 0.6174 | 0.3140 | 0.3771 |
| DALL-E 3 | 0.7785 | **0.6205** | 0.7036 | 0.2865 | 0.3744 | 0.5880 | 0.3003 | 0.3773 |
| Flux.1-dev | 0.7572 | 0.5066 | 0.6300 | 0.2700 | 0.3992 | 0.6165 | 0.3065 | 0.3628 |
| Janus-Pro-7B | 0.5145 | 0.3323 | 0.4069 | 0.1566 | 0.2753 | 0.4406 | 0.3137 | 0.3806 |
| **CogView4-6B** | 0.7786 | 0.5880 | 0.6983 | 0.3075 | 0.3708 | **0.6626** | 0.3056 | **0.3869** |

## Chinese Text Accuracy Evaluation

| Model | Precision | Recall | F1 Score | Pick@4 |
|-----------------|------------|------------|------------|------------|
| Kolors | 0.6094 | 0.1886 | 0.2880 | 0.1633 |
| **CogView4-6B** | **0.6969** | **0.5532** | **0.6168** | **0.3265** |

## Citation

🌟 If you find our work helpful, please consider citing our paper and leaving valuable stars

## License

This model is released under the Apache 2.0 License.", + "model_explanation_gemini": "Generates images from text inputs in Chinese and English, achieving high performance on benchmarks for text-to-image tasks."
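The quick-start code is omitted from this card; a minimal sketch with the diffusers CogView4Pipeline named in this repository's tags could look like this (prompt, resolution, and sampling values are illustrative assumptions chosen within the documented constraints):

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B",
    torch_dtype=torch.bfloat16,  # FP16 is unsupported per the notes above
)
pipe.enable_model_cpu_offload()  # reduces peak VRAM, as in the memory table above

image = pipe(
    prompt="A red rose garden in soft morning light, highly detailed",  # illustrative prompt
    width=1024,
    height=1024,              # a resolution from the memory table above
    num_inference_steps=50,   # assumed step count
    guidance_scale=3.5,       # assumed value
).images[0]
image.save("cogview4.png")
```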
+} \ No newline at end of file diff --git a/data/model_data_json/THUDM_chatglm2-6b.json b/data/model_data_json/THUDM_chatglm2-6b.json new file mode 100644 index 0000000000000000000000000000000000000000..8d469a075d4df7cbe8b16501f1933cddb510047f --- /dev/null +++ b/data/model_data_json/THUDM_chatglm2-6b.json @@ -0,0 +1,22 @@ +{ + "model_id": "THUDM/chatglm2-6b", + "downloads": 613020, + "tags": [ + "transformers", + "pytorch", + "chatglm", + "glm", + "thudm", + "custom_code", + "zh", + "en", + "arxiv:2103.10360", + "arxiv:2210.02414", + "arxiv:1911.02150", + "arxiv:2406.12793", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - zh - en tags: - glm - chatglm - thudm --- # ChatGLM2-6B

💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]

👋 Join our Slack and WeChat

📍Experience the larger-scale ChatGLM model at

## 介绍 (Introduction)

ChatGLM**2**-6B 是开源中英双语对话模型 ChatGLM-6B 的第二代版本,在保留了初代模型对话流畅、部署门槛较低等众多优秀特性的基础之上,ChatGLM**2**-6B 引入了如下新特性:

1. **更强大的性能**:基于 ChatGLM 初代模型的开发经验,我们全面升级了 ChatGLM2-6B 的基座模型。ChatGLM2-6B 使用了 GLM 的混合目标函数,经过了 1.4T 中英标识符的预训练与人类偏好对齐训练,评测结果显示,相比于初代模型,ChatGLM2-6B 在 MMLU(+23%)、CEval(+33%)、GSM8K(+571%)、BBH(+60%)等数据集上的性能取得了大幅度的提升,在同尺寸开源模型中具有较强的竞争力。
2. **更长的上下文**:基于 FlashAttention 技术,我们将基座模型的上下文长度(Context Length)由 ChatGLM-6B 的 2K 扩展到了 32K,并在对话阶段使用 8K 的上下文长度训练,允许更多轮次的对话。但当前版本的 ChatGLM2-6B 对单轮超长文档的理解能力有限,我们会在后续迭代升级中着重进行优化。
3. **更高效的推理**:基于 Multi-Query Attention 技术,ChatGLM2-6B 有更高效的推理速度和更低的显存占用:在官方的模型实现下,推理速度相比初代提升了 42%,INT4 量化下,6G 显存支持的对话长度由 1K 提升到了 8K。
4. **更开放的协议**:ChatGLM2-6B 权重对学术研究**完全开放**,在填写问卷进行登记后**亦允许免费商业使用**。

ChatGLM**2**-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing the following new features:

1. **Stronger Performance**: Based on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of GLM, and has undergone pre-training with 1.4T bilingual tokens and human preference alignment training. The evaluation results show that, compared to the first-generation model, ChatGLM2-6B has achieved substantial improvements in performance on datasets like MMLU (+23%), CEval (+33%), GSM8K (+571%), BBH (+60%), showing strong competitiveness among models of the same size.
2. **Longer Context**: Based on FlashAttention technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 8K during the dialogue alignment, allowing for more rounds of dialogue. However, the current version of ChatGLM2-6B has limited understanding of single-round ultra-long documents, which we will focus on optimizing in future iterations.
3. **More Efficient Inference**: Based on Multi-Query Attention technique, ChatGLM2-6B has more efficient inference speed and lower GPU memory usage: under the official implementation, the inference speed has increased by 42% compared to the first generation; under INT4 quantization, the dialogue length supported by 6G GPU memory has increased from 1K to 8K.
4. **More Open License**: ChatGLM2-6B weights are **completely open** for academic research, and **free commercial use** is also allowed after completing the questionnaire.

## 软件依赖 (Dependencies)

## 代码调用 (Code Usage)

可以通过如下代码调用 ChatGLM2-6B 模型来生成对话: You can generate dialogue by invoking the ChatGLM2-6B model with the following code:

关于更多的使用说明,包括如何运行命令行和网页版本的 DEMO,以及使用模型量化以节省显存,请参考我们的 Github Repo。 For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo.

## Change Log

* v1.0

## 协议 (License)

本仓库的代码依照 Apache-2.0 协议开源,ChatGLM2-6B 模型的权重的使用则需要遵循 Model License。 The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM2-6B model weights needs to comply with the Model License.

## 引用 (Citation)

如果你觉得我们的工作有帮助的话,请考虑引用下列论文。 If you find our work helpful, please consider citing the following paper.", + "model_explanation_gemini": "An open-source bilingual (Chinese-English) chat model optimized for smooth dialogue, offering stronger performance, longer context handling, and efficient inference while allowing academic and commercial use."
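The invocation code referred to above is missing from this card; a minimal dialogue sketch based on the repository's custom chat() interface might look as follows (the example prompts are illustrative):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# chat() is provided by the repository's custom code (hence trust_remote_code=True).
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

# Pass the returned history back in to continue a multi-turn conversation.
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
```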
+} \ No newline at end of file diff --git a/data/model_data_json/THUDM_chatglm3-6b.json b/data/model_data_json/THUDM_chatglm3-6b.json new file mode 100644 index 0000000000000000000000000000000000000000..30e3c8b68dd7a6407fcf1b4740e19d72603ce6bc --- /dev/null +++ b/data/model_data_json/THUDM_chatglm3-6b.json @@ -0,0 +1,22 @@ +{ + "model_id": "THUDM/chatglm3-6b", + "downloads": 153439, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "chatglm", + "glm", + "thudm", + "custom_code", + "zh", + "en", + "arxiv:2103.10360", + "arxiv:2210.02414", + "arxiv:2406.12793", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - zh - en tags: - glm - chatglm - thudm --- # ChatGLM3-6B

💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]

👋 Join our Slack and WeChat

📍Experience the larger-scale ChatGLM model at

## GLM-4 开源模型 (GLM-4 Open-Source Models)

我们已经发布最新的 **GLM-4** 模型,该模型在多个指标上有了新的突破,您可以在以下两个渠道体验我们的最新模型。 We have released the latest **GLM-4** models, which achieve new breakthroughs on multiple metrics; you can experience our latest models through the following two channels.

+ GLM-4 开源模型 我们已经开源了 GLM-4-9B 系列模型,在各项指标的测试上有明显提升,欢迎尝试。 (GLM-4 open-source models: we have open-sourced the GLM-4-9B series, which shows clear improvements across benchmarks; you are welcome to try them.)

## 介绍 (Introduction)

ChatGLM3-6B 是 ChatGLM 系列最新一代的开源模型,在保留了前两代模型对话流畅、部署门槛低等众多优秀特性的基础上,ChatGLM3-6B 引入了如下特性:

1. **更强大的基础模型:** ChatGLM3-6B 的基础模型 ChatGLM3-6B-Base 采用了更多样的训练数据、更充分的训练步数和更合理的训练策略。在语义、数学、推理、代码、知识等不同角度的数据集上测评显示,ChatGLM3-6B-Base 具有在 10B 以下的预训练模型中最强的性能。
2. **更完整的功能支持:** ChatGLM3-6B 采用了全新设计的 Prompt 格式,除正常的多轮对话外,同时原生支持工具调用(Function Call)、代码执行(Code Interpreter)和 Agent 任务等复杂场景。
3. **更全面的开源序列:** 除了对话模型 ChatGLM3-6B 外,还开源了基础模型 ChatGLM-6B-Base、长文本对话模型 ChatGLM3-6B-32K。以上所有权重对学术研究**完全开放**,在填写问卷进行登记后**亦允许免费商业使用**。

ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B introduces the following features:

1. **More Powerful Base Model:** The base model of ChatGLM3-6B, ChatGLM3-6B-Base, employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. Evaluations on datasets such as semantics, mathematics, reasoning, code, knowledge, etc., show that ChatGLM3-6B-Base has the strongest performance among pre-trained models under 10B.
2. **More Comprehensive Function Support:** ChatGLM3-6B adopts a newly designed Prompt format. In addition to normal multi-turn dialogue, it also natively supports function call, code interpreter, and complex scenarios such as agent tasks.
3. **More Comprehensive Open-source Series:** In addition to the dialogue model ChatGLM3-6B, the base model ChatGLM-6B-Base and the long-text dialogue model ChatGLM3-6B-32K are also open-sourced. All the weights are **fully open** for academic research, and after completing the questionnaire registration, they are also **allowed for free commercial use**.

## 软件依赖 (Dependencies)

## 代码调用 (Code Usage)

可以通过如下代码调用 ChatGLM3-6B 模型来生成对话: You can generate dialogue by invoking the ChatGLM3-6B model with the following code:

关于更多的使用说明,包括如何运行命令行和网页版本的 DEMO,以及使用模型量化以节省显存,请参考我们的 Github Repo。 For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo.

## 协议 (License)

本仓库的代码依照 Apache-2.0 协议开源,ChatGLM3-6B 模型的权重的使用则需要遵循 Model License。 The code in this repository is open-sourced under the Apache-2.0 license, while the use of the ChatGLM3-6B model weights needs to comply with the Model License.

## 引用 (Citation)

如果你觉得我们的工作有帮助的话,请考虑引用下列论文。 If you find our work helpful, please consider citing the following paper.", + "model_explanation_gemini": "ChatGLM3-6B is an open-source multilingual (Chinese/English) conversational AI model optimized for smooth dialogue, supporting complex tasks like function calls, code execution, and agent workflows while offering strong performance in semantics, reasoning, and knowledge tasks."
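Since the invocation code itself is omitted from this card as well, here is a minimal dialogue sketch paralleling the ChatGLM2-6B example above (the prompt text is an illustrative assumption):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# chat() is provided by the repository's custom code (hence trust_remote_code=True).
response, history = model.chat(tokenizer, "Hello, please introduce yourself.", history=[])
print(response)
```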
+} \ No newline at end of file diff --git a/data/model_data_json/THUDM_glm-4-9b-chat.json b/data/model_data_json/THUDM_glm-4-9b-chat.json new file mode 100644 index 0000000000000000000000000000000000000000..518dba2264b0047e5810351c897622169a92502a --- /dev/null +++ b/data/model_data_json/THUDM_glm-4-9b-chat.json @@ -0,0 +1,19 @@ +{ + "model_id": "THUDM/glm-4-9b-chat", + "downloads": 145133, + "tags": [ + "transformers", + "safetensors", + "chatglm", + "glm", + "thudm", + "custom_code", + "zh", + "en", + "arxiv:2406.12793", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: glm-4 license_link: language: - zh - en tags: - glm - chatglm - thudm inference: false --- # GLM-4-9B-Chat Read this in English.

**2024/11/25**: Starting from , we recommend using glm-4-9b-chat-hf to reduce compatibility issues caused by subsequent transformers upgrades.

**2024/08/12**: The code in this repository has been updated to use ; please update your dependencies promptly.

**2024/07/24**: We published our latest technical analysis of long-text processing; see here for the technical report on the long-text techniques used in training the open-source GLM-4-9B model.

## Model Introduction

GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, both GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat show strong performance. In addition to multi-turn dialogue, GLM-4-9B-Chat also supports advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation adds multilingual support, covering 26 languages including Japanese, Korean, and German. We have also released a model that supports a 1M context length (about 2 million Chinese characters).

## Evaluation Results

We evaluated the GLM-4-9B-Chat model on some classic tasks and obtained the following results:

| Model | AlignBench-v2 | MT-Bench | IFEval | MMLU | C-Eval | GSM8K | MATH | HumanEval | NCB |
|:--------------------|:-------------:|:--------:|:------:|:----:|:------:|:-----:|:----:|:---------:|:----:|
| Llama-3-8B-Instruct | 5.12 | 8.00 | 68.58 | 68.4 | 51.3 | 79.6 | 30.0 | 62.2 | 24.7 |
| ChatGLM3-6B | 3.97 | 5.50 | 28.1 | 66.4 | 69.0 | 72.3 | 25.7 | 58.5 | 11.3 |
| GLM-4-9B-Chat | 6.61 | 8.35 | 69.0 | 72.4 | 75.6 | 79.6 | 50.6 | 71.8 | 32.2 |

### Long Context

In the needle-in-a-haystack experiment at a 1M context length, the results are as follows: !needle

Long-text capability was further evaluated on LongBench-Chat, with the following results: !leaderboard

### Multilingual Capability

GLM-4-9B-Chat and Llama-3-8B-Instruct were tested on six multilingual datasets; the results and the languages selected for each dataset are shown in the table below:

| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:---------:|
| M-MMLU | 49.6 | 56.6 | all |
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |

### Tool Calling

We tested on the Berkeley Function Calling Leaderboard and obtained the following results:

| Model | Overall Acc. | AST Summary | Exec Summary | Relevance |
|:-----------------------|:------------:|:-----------:|:------------:|:---------:|
| Llama-3-8B-Instruct | 58.88 | 59.25 | 70.01 | 45.83 |
| gpt-4-turbo-2024-04-09 | 81.24 | 82.14 | 78.61 | 88.75 |
| ChatGLM3-6B | 57.88 | 62.18 | 69.78 | 5.42 |
| GLM-4-9B-Chat | 81.00 | 80.26 | 84.40 | 87.92 |

**This repository is the model repository of GLM-4-9B-Chat, supporting context length.**

## Running the Model

**For more inference code and dependency information, please visit our github.**

**Please install the dependencies strictly as specified; otherwise the model will not run properly.**

### Inference with the transformers backend:

Inference with the vLLM backend:

## License

The use of the GLM-4 model weights must follow the LICENSE.

## Citation

If you find our work helpful, please consider citing the following paper.", + "model_explanation_gemini": "GLM-4-9B-Chat is a multilingual conversational AI model excelling in multi-turn dialogue, tool calling, long-context reasoning (up to 128K tokens), and tasks like coding, math, and knowledge retrieval across 26 languages."
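The transformers-backend inference code referred to above is omitted; a minimal sketch using the tokenizer's chat template might look as follows (the message content and generation settings are illustrative assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,  # assumed dtype choice
    trust_remote_code=True,
).to(device).eval()

messages = [{"role": "user", "content": "Hello, what can you do?"}]  # illustrative message
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```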
+} \ No newline at end of file diff --git a/data/model_data_json/THUDM_glm-4-voice-tokenizer.json b/data/model_data_json/THUDM_glm-4-voice-tokenizer.json new file mode 100644 index 0000000000000000000000000000000000000000..55c0591a02abb2b095adcd5ebb2805262817f7ea --- /dev/null +++ b/data/model_data_json/THUDM_glm-4-voice-tokenizer.json @@ -0,0 +1,10 @@ +{ + "model_id": "THUDM/glm-4-voice-tokenizer", + "downloads": 78346, + "tags": [ + "safetensors", + "whisper", + "region:us" + ], + "description": "# GLM-4-Voice-Tokenizer GLM-4-Voice 是智谱 AI 推出的端到端语音模型。GLM-4-Voice 能够直接理解和生成中英文语音,进行实时语音对话,并且能够根据用户的指令改变语音的情感、语调、语速、方言等属性。 GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. 本仓库是 GLM-4-Voice 的 speech tokenizer 部分。通过在 Whisper 的 encoder 部分增加 vector quantization 进行训练,将连续的语音输入转化为离散的 token。每秒音频转化为 12.5 个离散 token。 The repo provides the speech tokenzier of GLM-4-Voice, which is trained by adding vector quantization to the encoder part of Whisper and converts continuous speech input into discrete tokens. Each second of audio is converted into 12.5 discrete tokens. 更多信息请参考我们的仓库 GLM-4-Voice. For more information please refer to our repo GLM-4-Voice." +} \ No newline at end of file diff --git a/data/model_data_json/TahaDouaji_detr-doc-table-detection.json b/data/model_data_json/TahaDouaji_detr-doc-table-detection.json new file mode 100644 index 0000000000000000000000000000000000000000..1ad012501153111642e62f7ac0a753c4ef5a8673 --- /dev/null +++ b/data/model_data_json/TahaDouaji_detr-doc-table-detection.json @@ -0,0 +1,22 @@ +{ + "model_id": "TahaDouaji/detr-doc-table-detection", + "downloads": 291621, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "detr", + "object-detection", + "- vision", + "dataset:MohamedExperio/ICDAR2019", + "arxiv:2005.12872", + "arxiv:1910.09700", + "base_model:facebook/detr-resnet-50", + "base_model:finetune:facebook/detr-resnet-50", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - object-detection - '- vision' license: apache-2.0 base_model: facebook/detr-resnet-50 datasets: - MohamedExperio/ICDAR2019 --- # Model Card for detr-doc-table-detection # Model Details detr-doc-table-detection is a model trained to detect both **Bordered** and **Borderless** tables in documents, based on facebook/detr-resnet-50. - **Developed by:** Taha Douaji - **Shared by [Optional]:** Taha Douaji - **Model type:** Object Detection - **Language(s) (NLP):** More information needed - **License:** More information needed - **Parent Model:** facebook/detr-resnet-50 - **Resources for more information:** - Model Demo Space - Associated Paper # Uses ## Direct Use This model can be used for the task of object detection. ## Out-of-Scope Use The model should not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. 
More information needed for further recommendations. # Training Details ## Training Data The model was trained on ICDAR2019 Table Dataset # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). # Citation **BibTeX:** # Model Card Authors [optional] Taha Douaji in collaboration with Ezi Ozoani and the Hugging Face team # Model Card Contact More information needed # How to Get Started with the Model Use the code below to get started with the model.", + "model_explanation_gemini": "Detects bordered and borderless tables in documents using object detection." +} \ No newline at end of file diff --git a/data/model_data_json/TaylorAI_bge-micro-v2.json b/data/model_data_json/TaylorAI_bge-micro-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..c5bb6a65653bfe135e9463f762ec50387ed1cde7 --- /dev/null +++ b/data/model_data_json/TaylorAI_bge-micro-v2.json @@ -0,0 +1,23 @@ +{ + "model_id": "TaylorAI/bge-micro-v2", + "downloads": 90432, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "mteb", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: bge_micro results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 67.76119402985074 - type: ap value: 29.637849284211114 - type: f1 value: 61.31181187111905 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 79.7547 - type: ap value: 74.21401629809145 - type: f1 value: 79.65319615433783 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 37.452000000000005 - type: f1 value: 37.0245198854966 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 31.152 - type: map_at_10 value: 46.702 - type: map_at_100 value: 47.563 - type: map_at_1000 value: 47.567 - type: map_at_3 value: 42.058 - type: map_at_5 value: 44.608 - type: mrr_at_1 value: 32.006 - type: mrr_at_10 value: 47.064 - type: mrr_at_100 value: 47.910000000000004 - type: mrr_at_1000 value: 47.915 - type: mrr_at_3 value: 42.283 - type: mrr_at_5 value: 44.968 - type: ndcg_at_1 value: 31.152 - type: ndcg_at_10 value: 55.308 - type: ndcg_at_100 value: 58.965 - type: ndcg_at_1000 value: 59.067 - type: ndcg_at_3 value: 45.698 - type: ndcg_at_5 value: 50.296 - type: precision_at_1 value: 31.152 - type: precision_at_10 value: 8.279 - type: precision_at_100 value: 0.987 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 18.753 - type: precision_at_5 value: 13.485 - type: recall_at_1 value: 31.152 - type: recall_at_10 value: 82.788 - type: recall_at_100 value: 98.72 - type: recall_at_1000 value: 99.502 - type: recall_at_3 value: 56.259 - type: 
recall_at_5 value: 67.425 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 44.52692241938116 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 33.245710292773595 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 58.08493637155168 - type: mrr value: 71.94378490084861 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 84.1602804378326 - type: cos_sim_spearman value: 82.92478106365587 - type: euclidean_pearson value: 82.27930167277077 - type: euclidean_spearman value: 82.18560759458093 - type: manhattan_pearson value: 82.34277425888187 - type: manhattan_spearman value: 81.72776583704467 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 81.17207792207792 - type: f1 value: 81.09893836310513 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 36.109308463095516 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 28.06048212317168 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.233999999999998 - type: map_at_10 value: 38.092999999999996 - type: map_at_100 value: 39.473 - type: map_at_1000 value: 39.614 - type: map_at_3 value: 34.839 - type: map_at_5 value: 36.523 - type: mrr_at_1 value: 35.193000000000005 - type: mrr_at_10 value: 44.089 - type: mrr_at_100 value: 44.927 - type: mrr_at_1000 value: 44.988 - type: mrr_at_3 value: 41.559000000000005 - type: mrr_at_5 value: 43.162 - type: ndcg_at_1 value: 35.193000000000005 - type: ndcg_at_10 value: 44.04 - type: ndcg_at_100 value: 49.262 - type: ndcg_at_1000 value: 51.847 - type: ndcg_at_3 value: 39.248 - type: ndcg_at_5 value: 41.298 - type: precision_at_1 value: 35.193000000000005 - type: precision_at_10 value: 8.555 - type: precision_at_100 value: 1.3820000000000001 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 19.123 - type: precision_at_5 value: 13.648 - type: recall_at_1 value: 28.233999999999998 - type: recall_at_10 value: 55.094 - type: recall_at_100 value: 76.85300000000001 - type: recall_at_1000 value: 94.163 - type: recall_at_3 value: 40.782000000000004 - type: recall_at_5 value: 46.796 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.538 - type: map_at_10 value: 28.449 - type: map_at_100 value: 29.471000000000004 - type: map_at_1000 value: 29.599999999999998 - type: map_at_3 value: 26.371 - 
type: map_at_5 value: 27.58 - type: mrr_at_1 value: 26.815 - type: mrr_at_10 value: 33.331 - type: mrr_at_100 value: 34.114 - type: mrr_at_1000 value: 34.182 - type: mrr_at_3 value: 31.561 - type: mrr_at_5 value: 32.608 - type: ndcg_at_1 value: 26.815 - type: ndcg_at_10 value: 32.67 - type: ndcg_at_100 value: 37.039 - type: ndcg_at_1000 value: 39.769 - type: ndcg_at_3 value: 29.523 - type: ndcg_at_5 value: 31.048 - type: precision_at_1 value: 26.815 - type: precision_at_10 value: 5.955 - type: precision_at_100 value: 1.02 - type: precision_at_1000 value: 0.152 - type: precision_at_3 value: 14.033999999999999 - type: precision_at_5 value: 9.911 - type: recall_at_1 value: 21.538 - type: recall_at_10 value: 40.186 - type: recall_at_100 value: 58.948 - type: recall_at_1000 value: 77.158 - type: recall_at_3 value: 30.951 - type: recall_at_5 value: 35.276 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 35.211999999999996 - type: map_at_10 value: 46.562 - type: map_at_100 value: 47.579 - type: map_at_1000 value: 47.646 - type: map_at_3 value: 43.485 - type: map_at_5 value: 45.206 - type: mrr_at_1 value: 40.627 - type: mrr_at_10 value: 49.928 - type: mrr_at_100 value: 50.647 - type: mrr_at_1000 value: 50.685 - type: mrr_at_3 value: 47.513 - type: mrr_at_5 value: 48.958 - type: ndcg_at_1 value: 40.627 - type: ndcg_at_10 value: 52.217 - type: ndcg_at_100 value: 56.423 - type: ndcg_at_1000 value: 57.821999999999996 - type: ndcg_at_3 value: 46.949000000000005 - type: ndcg_at_5 value: 49.534 - type: precision_at_1 value: 40.627 - type: precision_at_10 value: 8.476 - type: precision_at_100 value: 1.15 - type: precision_at_1000 value: 0.132 - type: precision_at_3 value: 21.003 - type: precision_at_5 value: 14.469999999999999 - type: recall_at_1 value: 35.211999999999996 - type: recall_at_10 value: 65.692 - type: recall_at_100 value: 84.011 - type: recall_at_1000 value: 94.03099999999999 - type: recall_at_3 value: 51.404 - type: recall_at_5 value: 57.882 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.09 - type: map_at_10 value: 29.516 - type: map_at_100 value: 30.462 - type: map_at_1000 value: 30.56 - type: map_at_3 value: 26.945000000000004 - type: map_at_5 value: 28.421999999999997 - type: mrr_at_1 value: 23.616 - type: mrr_at_10 value: 31.221 - type: mrr_at_100 value: 32.057 - type: mrr_at_1000 value: 32.137 - type: mrr_at_3 value: 28.738000000000003 - type: mrr_at_5 value: 30.156 - type: ndcg_at_1 value: 23.616 - type: ndcg_at_10 value: 33.97 - type: ndcg_at_100 value: 38.806000000000004 - type: ndcg_at_1000 value: 41.393 - type: ndcg_at_3 value: 28.908 - type: ndcg_at_5 value: 31.433 - type: precision_at_1 value: 23.616 - type: precision_at_10 value: 5.299 - type: precision_at_100 value: 0.812 - type: precision_at_1000 value: 0.107 - type: precision_at_3 value: 12.015 - type: precision_at_5 value: 8.701 - type: recall_at_1 value: 22.09 - type: recall_at_10 value: 46.089999999999996 - type: recall_at_100 value: 68.729 - type: recall_at_1000 value: 88.435 - type: recall_at_3 value: 32.584999999999994 - type: recall_at_5 value: 38.550000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.469 - type: map_at_10 value: 22.436 - 
type: map_at_100 value: 23.465 - type: map_at_1000 value: 23.608999999999998 - type: map_at_3 value: 19.716 - type: map_at_5 value: 21.182000000000002 - type: mrr_at_1 value: 18.905 - type: mrr_at_10 value: 26.55 - type: mrr_at_100 value: 27.46 - type: mrr_at_1000 value: 27.553 - type: mrr_at_3 value: 23.921999999999997 - type: mrr_at_5 value: 25.302999999999997 - type: ndcg_at_1 value: 18.905 - type: ndcg_at_10 value: 27.437 - type: ndcg_at_100 value: 32.555 - type: ndcg_at_1000 value: 35.885 - type: ndcg_at_3 value: 22.439 - type: ndcg_at_5 value: 24.666 - type: precision_at_1 value: 18.905 - type: precision_at_10 value: 5.2490000000000006 - type: precision_at_100 value: 0.889 - type: precision_at_1000 value: 0.131 - type: precision_at_3 value: 10.862 - type: precision_at_5 value: 8.085 - type: recall_at_1 value: 15.469 - type: recall_at_10 value: 38.706 - type: recall_at_100 value: 61.242 - type: recall_at_1000 value: 84.84 - type: recall_at_3 value: 24.973 - type: recall_at_5 value: 30.603 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.918000000000003 - type: map_at_10 value: 34.296 - type: map_at_100 value: 35.632000000000005 - type: map_at_1000 value: 35.748999999999995 - type: map_at_3 value: 31.304 - type: map_at_5 value: 33.166000000000004 - type: mrr_at_1 value: 30.703000000000003 - type: mrr_at_10 value: 39.655 - type: mrr_at_100 value: 40.569 - type: mrr_at_1000 value: 40.621 - type: mrr_at_3 value: 37.023 - type: mrr_at_5 value: 38.664 - type: ndcg_at_1 value: 30.703000000000003 - type: ndcg_at_10 value: 39.897 - type: ndcg_at_100 value: 45.777 - type: ndcg_at_1000 value: 48.082 - type: ndcg_at_3 value: 35.122 - type: ndcg_at_5 value: 37.691 - type: precision_at_1 value: 30.703000000000003 - type: precision_at_10 value: 7.305000000000001 - type: precision_at_100 value: 1.208 - type: precision_at_1000 value: 0.159 - type: precision_at_3 value: 16.811 - type: precision_at_5 value: 12.203999999999999 - type: recall_at_1 value: 24.918000000000003 - type: recall_at_10 value: 51.31 - type: recall_at_100 value: 76.534 - type: recall_at_1000 value: 91.911 - type: recall_at_3 value: 37.855 - type: recall_at_5 value: 44.493 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.416 - type: map_at_10 value: 30.474 - type: map_at_100 value: 31.759999999999998 - type: map_at_1000 value: 31.891000000000002 - type: map_at_3 value: 27.728 - type: map_at_5 value: 29.247 - type: mrr_at_1 value: 28.881 - type: mrr_at_10 value: 36.418 - type: mrr_at_100 value: 37.347 - type: mrr_at_1000 value: 37.415 - type: mrr_at_3 value: 33.942 - type: mrr_at_5 value: 35.386 - type: ndcg_at_1 value: 28.881 - type: ndcg_at_10 value: 35.812 - type: ndcg_at_100 value: 41.574 - type: ndcg_at_1000 value: 44.289 - type: ndcg_at_3 value: 31.239 - type: ndcg_at_5 value: 33.302 - type: precision_at_1 value: 28.881 - type: precision_at_10 value: 6.598 - type: precision_at_100 value: 1.1079999999999999 - type: precision_at_1000 value: 0.151 - type: precision_at_3 value: 14.954 - type: precision_at_5 value: 10.776 - type: recall_at_1 value: 22.416 - type: recall_at_10 value: 46.243 - type: recall_at_100 value: 71.352 - type: recall_at_1000 value: 90.034 - type: recall_at_3 value: 32.873000000000005 - type: recall_at_5 value: 38.632 - task: type: Retrieval dataset: type: 
BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.528166666666667 - type: map_at_10 value: 30.317833333333333 - type: map_at_100 value: 31.44108333333333 - type: map_at_1000 value: 31.566666666666666 - type: map_at_3 value: 27.84425 - type: map_at_5 value: 29.233333333333334 - type: mrr_at_1 value: 26.75733333333333 - type: mrr_at_10 value: 34.24425 - type: mrr_at_100 value: 35.11375 - type: mrr_at_1000 value: 35.184333333333335 - type: mrr_at_3 value: 32.01225 - type: mrr_at_5 value: 33.31225 - type: ndcg_at_1 value: 26.75733333333333 - type: ndcg_at_10 value: 35.072583333333334 - type: ndcg_at_100 value: 40.13358333333334 - type: ndcg_at_1000 value: 42.81825 - type: ndcg_at_3 value: 30.79275000000001 - type: ndcg_at_5 value: 32.822 - type: precision_at_1 value: 26.75733333333333 - type: precision_at_10 value: 6.128083333333334 - type: precision_at_100 value: 1.019 - type: precision_at_1000 value: 0.14391666666666664 - type: precision_at_3 value: 14.129916666666665 - type: precision_at_5 value: 10.087416666666668 - type: recall_at_1 value: 22.528166666666667 - type: recall_at_10 value: 45.38341666666667 - type: recall_at_100 value: 67.81791666666668 - type: recall_at_1000 value: 86.71716666666666 - type: recall_at_3 value: 33.38741666666667 - type: recall_at_5 value: 38.62041666666667 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.975 - type: map_at_10 value: 28.144999999999996 - type: map_at_100 value: 28.994999999999997 - type: map_at_1000 value: 29.086000000000002 - type: map_at_3 value: 25.968999999999998 - type: map_at_5 value: 27.321 - type: mrr_at_1 value: 25 - type: mrr_at_10 value: 30.822 - type: mrr_at_100 value: 31.647 - type: mrr_at_1000 value: 31.712 - type: mrr_at_3 value: 28.860000000000003 - type: mrr_at_5 value: 30.041 - type: ndcg_at_1 value: 25 - type: ndcg_at_10 value: 31.929999999999996 - type: ndcg_at_100 value: 36.258 - type: ndcg_at_1000 value: 38.682 - type: ndcg_at_3 value: 27.972 - type: ndcg_at_5 value: 30.089 - type: precision_at_1 value: 25 - type: precision_at_10 value: 4.923 - type: precision_at_100 value: 0.767 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 11.860999999999999 - type: precision_at_5 value: 8.466 - type: recall_at_1 value: 21.975 - type: recall_at_10 value: 41.102 - type: recall_at_100 value: 60.866 - type: recall_at_1000 value: 78.781 - type: recall_at_3 value: 30.268 - type: recall_at_5 value: 35.552 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.845999999999998 - type: map_at_10 value: 21.861 - type: map_at_100 value: 22.798 - type: map_at_1000 value: 22.925 - type: map_at_3 value: 19.922 - type: map_at_5 value: 21.054000000000002 - type: mrr_at_1 value: 19.098000000000003 - type: mrr_at_10 value: 25.397 - type: mrr_at_100 value: 26.246000000000002 - type: mrr_at_1000 value: 26.33 - type: mrr_at_3 value: 23.469 - type: mrr_at_5 value: 24.646 - type: ndcg_at_1 value: 19.098000000000003 - type: ndcg_at_10 value: 25.807999999999996 - type: ndcg_at_100 value: 30.445 - type: ndcg_at_1000 value: 33.666000000000004 - type: ndcg_at_3 value: 22.292 - type: ndcg_at_5 value: 24.075 - type: precision_at_1 value: 19.098000000000003 - type: precision_at_10 value: 4.58 - type: precision_at_100 value: 
0.8099999999999999 - type: precision_at_1000 value: 0.126 - type: precision_at_3 value: 10.346 - type: precision_at_5 value: 7.542999999999999 - type: recall_at_1 value: 15.845999999999998 - type: recall_at_10 value: 34.172999999999995 - type: recall_at_100 value: 55.24099999999999 - type: recall_at_1000 value: 78.644 - type: recall_at_3 value: 24.401 - type: recall_at_5 value: 28.938000000000002 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.974 - type: map_at_10 value: 30.108 - type: map_at_100 value: 31.208000000000002 - type: map_at_1000 value: 31.330999999999996 - type: map_at_3 value: 27.889999999999997 - type: map_at_5 value: 29.023 - type: mrr_at_1 value: 26.493 - type: mrr_at_10 value: 33.726 - type: mrr_at_100 value: 34.622 - type: mrr_at_1000 value: 34.703 - type: mrr_at_3 value: 31.575999999999997 - type: mrr_at_5 value: 32.690999999999995 - type: ndcg_at_1 value: 26.493 - type: ndcg_at_10 value: 34.664 - type: ndcg_at_100 value: 39.725 - type: ndcg_at_1000 value: 42.648 - type: ndcg_at_3 value: 30.447999999999997 - type: ndcg_at_5 value: 32.145 - type: precision_at_1 value: 26.493 - type: precision_at_10 value: 5.7090000000000005 - type: precision_at_100 value: 0.9199999999999999 - type: precision_at_1000 value: 0.129 - type: precision_at_3 value: 13.464 - type: precision_at_5 value: 9.384 - type: recall_at_1 value: 22.974 - type: recall_at_10 value: 45.097 - type: recall_at_100 value: 66.908 - type: recall_at_1000 value: 87.495 - type: recall_at_3 value: 33.338 - type: recall_at_5 value: 37.499 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.408 - type: map_at_10 value: 29.580000000000002 - type: map_at_100 value: 31.145 - type: map_at_1000 value: 31.369000000000003 - type: map_at_3 value: 27.634999999999998 - type: map_at_5 value: 28.766000000000002 - type: mrr_at_1 value: 27.272999999999996 - type: mrr_at_10 value: 33.93 - type: mrr_at_100 value: 34.963 - type: mrr_at_1000 value: 35.031 - type: mrr_at_3 value: 32.016 - type: mrr_at_5 value: 33.221000000000004 - type: ndcg_at_1 value: 27.272999999999996 - type: ndcg_at_10 value: 33.993 - type: ndcg_at_100 value: 40.333999999999996 - type: ndcg_at_1000 value: 43.361 - type: ndcg_at_3 value: 30.918 - type: ndcg_at_5 value: 32.552 - type: precision_at_1 value: 27.272999999999996 - type: precision_at_10 value: 6.285 - type: precision_at_100 value: 1.389 - type: precision_at_1000 value: 0.232 - type: precision_at_3 value: 14.427000000000001 - type: precision_at_5 value: 10.356 - type: recall_at_1 value: 22.408 - type: recall_at_10 value: 41.318 - type: recall_at_100 value: 70.539 - type: recall_at_1000 value: 90.197 - type: recall_at_3 value: 32.513 - type: recall_at_5 value: 37 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.258000000000003 - type: map_at_10 value: 24.294 - type: map_at_100 value: 25.305 - type: map_at_1000 value: 25.419999999999998 - type: map_at_3 value: 22.326999999999998 - type: map_at_5 value: 23.31 - type: mrr_at_1 value: 18.484 - type: mrr_at_10 value: 25.863999999999997 - type: mrr_at_100 value: 26.766000000000002 - type: mrr_at_1000 value: 26.855 - type: mrr_at_3 value: 23.968 - type: mrr_at_5 value: 24.911 - type: ndcg_at_1 
value: 18.484 - type: ndcg_at_10 value: 28.433000000000003 - type: ndcg_at_100 value: 33.405 - type: ndcg_at_1000 value: 36.375 - type: ndcg_at_3 value: 24.455 - type: ndcg_at_5 value: 26.031 - type: precision_at_1 value: 18.484 - type: precision_at_10 value: 4.603 - type: precision_at_100 value: 0.773 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 10.659 - type: precision_at_5 value: 7.505000000000001 - type: recall_at_1 value: 17.258000000000003 - type: recall_at_10 value: 39.589999999999996 - type: recall_at_100 value: 62.592000000000006 - type: recall_at_1000 value: 84.917 - type: recall_at_3 value: 28.706 - type: recall_at_5 value: 32.224000000000004 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.578999999999999 - type: map_at_10 value: 17.642 - type: map_at_100 value: 19.451 - type: map_at_1000 value: 19.647000000000002 - type: map_at_3 value: 14.618 - type: map_at_5 value: 16.145 - type: mrr_at_1 value: 23.322000000000003 - type: mrr_at_10 value: 34.204 - type: mrr_at_100 value: 35.185 - type: mrr_at_1000 value: 35.235 - type: mrr_at_3 value: 30.847 - type: mrr_at_5 value: 32.824 - type: ndcg_at_1 value: 23.322000000000003 - type: ndcg_at_10 value: 25.352999999999998 - type: ndcg_at_100 value: 32.574 - type: ndcg_at_1000 value: 36.073 - type: ndcg_at_3 value: 20.318 - type: ndcg_at_5 value: 22.111 - type: precision_at_1 value: 23.322000000000003 - type: precision_at_10 value: 8.02 - type: precision_at_100 value: 1.5730000000000002 - type: precision_at_1000 value: 0.22200000000000003 - type: precision_at_3 value: 15.049000000000001 - type: precision_at_5 value: 11.87 - type: recall_at_1 value: 10.578999999999999 - type: recall_at_10 value: 30.964999999999996 - type: recall_at_100 value: 55.986000000000004 - type: recall_at_1000 value: 75.565 - type: recall_at_3 value: 18.686 - type: recall_at_5 value: 23.629 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 7.327 - type: map_at_10 value: 14.904 - type: map_at_100 value: 20.29 - type: map_at_1000 value: 21.42 - type: map_at_3 value: 10.911 - type: map_at_5 value: 12.791 - type: mrr_at_1 value: 57.25 - type: mrr_at_10 value: 66.62700000000001 - type: mrr_at_100 value: 67.035 - type: mrr_at_1000 value: 67.052 - type: mrr_at_3 value: 64.833 - type: mrr_at_5 value: 65.908 - type: ndcg_at_1 value: 43.75 - type: ndcg_at_10 value: 32.246 - type: ndcg_at_100 value: 35.774 - type: ndcg_at_1000 value: 42.872 - type: ndcg_at_3 value: 36.64 - type: ndcg_at_5 value: 34.487 - type: precision_at_1 value: 57.25 - type: precision_at_10 value: 25.924999999999997 - type: precision_at_100 value: 7.670000000000001 - type: precision_at_1000 value: 1.599 - type: precision_at_3 value: 41.167 - type: precision_at_5 value: 34.65 - type: recall_at_1 value: 7.327 - type: recall_at_10 value: 19.625 - type: recall_at_100 value: 41.601 - type: recall_at_1000 value: 65.117 - type: recall_at_3 value: 12.308 - type: recall_at_5 value: 15.437999999999999 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 44.53 - type: f1 value: 39.39884255816736 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 
58.913000000000004 - type: map_at_10 value: 69.592 - type: map_at_100 value: 69.95599999999999 - type: map_at_1000 value: 69.973 - type: map_at_3 value: 67.716 - type: map_at_5 value: 68.899 - type: mrr_at_1 value: 63.561 - type: mrr_at_10 value: 74.2 - type: mrr_at_100 value: 74.468 - type: mrr_at_1000 value: 74.47500000000001 - type: mrr_at_3 value: 72.442 - type: mrr_at_5 value: 73.58 - type: ndcg_at_1 value: 63.561 - type: ndcg_at_10 value: 74.988 - type: ndcg_at_100 value: 76.52799999999999 - type: ndcg_at_1000 value: 76.88000000000001 - type: ndcg_at_3 value: 71.455 - type: ndcg_at_5 value: 73.42699999999999 - type: precision_at_1 value: 63.561 - type: precision_at_10 value: 9.547 - type: precision_at_100 value: 1.044 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 28.143 - type: precision_at_5 value: 18.008 - type: recall_at_1 value: 58.913000000000004 - type: recall_at_10 value: 87.18 - type: recall_at_100 value: 93.852 - type: recall_at_1000 value: 96.256 - type: recall_at_3 value: 77.55199999999999 - type: recall_at_5 value: 82.42399999999999 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 11.761000000000001 - type: map_at_10 value: 19.564999999999998 - type: map_at_100 value: 21.099 - type: map_at_1000 value: 21.288999999999998 - type: map_at_3 value: 16.683999999999997 - type: map_at_5 value: 18.307000000000002 - type: mrr_at_1 value: 23.302 - type: mrr_at_10 value: 30.979 - type: mrr_at_100 value: 32.121 - type: mrr_at_1000 value: 32.186 - type: mrr_at_3 value: 28.549000000000003 - type: mrr_at_5 value: 30.038999999999998 - type: ndcg_at_1 value: 23.302 - type: ndcg_at_10 value: 25.592 - type: ndcg_at_100 value: 32.416 - type: ndcg_at_1000 value: 36.277 - type: ndcg_at_3 value: 22.151 - type: ndcg_at_5 value: 23.483999999999998 - type: precision_at_1 value: 23.302 - type: precision_at_10 value: 7.377000000000001 - type: precision_at_100 value: 1.415 - type: precision_at_1000 value: 0.212 - type: precision_at_3 value: 14.712 - type: precision_at_5 value: 11.358 - type: recall_at_1 value: 11.761000000000001 - type: recall_at_10 value: 31.696 - type: recall_at_100 value: 58.01500000000001 - type: recall_at_1000 value: 81.572 - type: recall_at_3 value: 20.742 - type: recall_at_5 value: 25.707 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 32.275 - type: map_at_10 value: 44.712 - type: map_at_100 value: 45.621 - type: map_at_1000 value: 45.698 - type: map_at_3 value: 42.016999999999996 - type: map_at_5 value: 43.659 - type: mrr_at_1 value: 64.551 - type: mrr_at_10 value: 71.58099999999999 - type: mrr_at_100 value: 71.952 - type: mrr_at_1000 value: 71.96900000000001 - type: mrr_at_3 value: 70.236 - type: mrr_at_5 value: 71.051 - type: ndcg_at_1 value: 64.551 - type: ndcg_at_10 value: 53.913999999999994 - type: ndcg_at_100 value: 57.421 - type: ndcg_at_1000 value: 59.06 - type: ndcg_at_3 value: 49.716 - type: ndcg_at_5 value: 51.971999999999994 - type: precision_at_1 value: 64.551 - type: precision_at_10 value: 11.110000000000001 - type: precision_at_100 value: 1.388 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 30.822 - type: precision_at_5 value: 20.273 - type: recall_at_1 value: 32.275 - type: recall_at_10 value: 55.55 - type: recall_at_100 value: 69.38600000000001 - type: recall_at_1000 value: 80.35799999999999 - type: recall_at_3 value: 46.232 - 
type: recall_at_5 value: 50.682 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 76.4604 - type: ap value: 70.40498168422701 - type: f1 value: 76.38572688476046 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 15.065999999999999 - type: map_at_10 value: 25.058000000000003 - type: map_at_100 value: 26.268 - type: map_at_1000 value: 26.344 - type: map_at_3 value: 21.626 - type: map_at_5 value: 23.513 - type: mrr_at_1 value: 15.501000000000001 - type: mrr_at_10 value: 25.548 - type: mrr_at_100 value: 26.723000000000003 - type: mrr_at_1000 value: 26.793 - type: mrr_at_3 value: 22.142 - type: mrr_at_5 value: 24.024 - type: ndcg_at_1 value: 15.501000000000001 - type: ndcg_at_10 value: 31.008000000000003 - type: ndcg_at_100 value: 37.08 - type: ndcg_at_1000 value: 39.102 - type: ndcg_at_3 value: 23.921999999999997 - type: ndcg_at_5 value: 27.307 - type: precision_at_1 value: 15.501000000000001 - type: precision_at_10 value: 5.155 - type: precision_at_100 value: 0.822 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 10.363 - type: precision_at_5 value: 7.917000000000001 - type: recall_at_1 value: 15.065999999999999 - type: recall_at_10 value: 49.507 - type: recall_at_100 value: 78.118 - type: recall_at_1000 value: 93.881 - type: recall_at_3 value: 30.075000000000003 - type: recall_at_5 value: 38.222 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.6703146374829 - type: f1 value: 90.1258004293966 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 68.29229366165072 - type: f1 value: 50.016194478997875 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.57767316745124 - type: f1 value: 67.16194062146954 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.92064559515804 - type: f1 value: 73.6680729569968 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.56335607367883 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 28.131807833734268 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.07390328719844 - type: mrr value: 32.117370992867905 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.274 - type: map_at_10 value: 11.489 - type: 
map_at_100 value: 14.518 - type: map_at_1000 value: 15.914 - type: map_at_3 value: 8.399 - type: map_at_5 value: 9.889000000000001 - type: mrr_at_1 value: 42.724000000000004 - type: mrr_at_10 value: 51.486 - type: mrr_at_100 value: 51.941 - type: mrr_at_1000 value: 51.99 - type: mrr_at_3 value: 49.278 - type: mrr_at_5 value: 50.485 - type: ndcg_at_1 value: 39.938 - type: ndcg_at_10 value: 31.862000000000002 - type: ndcg_at_100 value: 29.235 - type: ndcg_at_1000 value: 37.802 - type: ndcg_at_3 value: 35.754999999999995 - type: ndcg_at_5 value: 34.447 - type: precision_at_1 value: 42.105 - type: precision_at_10 value: 23.901 - type: precision_at_100 value: 7.715 - type: precision_at_1000 value: 2.045 - type: precision_at_3 value: 33.437 - type: precision_at_5 value: 29.782999999999998 - type: recall_at_1 value: 5.274 - type: recall_at_10 value: 15.351 - type: recall_at_100 value: 29.791 - type: recall_at_1000 value: 60.722 - type: recall_at_3 value: 9.411 - type: recall_at_5 value: 12.171999999999999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 16.099 - type: map_at_10 value: 27.913 - type: map_at_100 value: 29.281000000000002 - type: map_at_1000 value: 29.343999999999998 - type: map_at_3 value: 23.791 - type: map_at_5 value: 26.049 - type: mrr_at_1 value: 18.337 - type: mrr_at_10 value: 29.953999999999997 - type: mrr_at_100 value: 31.080999999999996 - type: mrr_at_1000 value: 31.130000000000003 - type: mrr_at_3 value: 26.168000000000003 - type: mrr_at_5 value: 28.277 - type: ndcg_at_1 value: 18.308 - type: ndcg_at_10 value: 34.938 - type: ndcg_at_100 value: 41.125 - type: ndcg_at_1000 value: 42.708 - type: ndcg_at_3 value: 26.805 - type: ndcg_at_5 value: 30.686999999999998 - type: precision_at_1 value: 18.308 - type: precision_at_10 value: 6.476999999999999 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 12.784999999999998 - type: precision_at_5 value: 9.878 - type: recall_at_1 value: 16.099 - type: recall_at_10 value: 54.63 - type: recall_at_100 value: 82.24900000000001 - type: recall_at_1000 value: 94.242 - type: recall_at_3 value: 33.174 - type: recall_at_5 value: 42.164 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 67.947 - type: map_at_10 value: 81.499 - type: map_at_100 value: 82.17 - type: map_at_1000 value: 82.194 - type: map_at_3 value: 78.567 - type: map_at_5 value: 80.34400000000001 - type: mrr_at_1 value: 78.18 - type: mrr_at_10 value: 85.05 - type: mrr_at_100 value: 85.179 - type: mrr_at_1000 value: 85.181 - type: mrr_at_3 value: 83.91 - type: mrr_at_5 value: 84.638 - type: ndcg_at_1 value: 78.2 - type: ndcg_at_10 value: 85.715 - type: ndcg_at_100 value: 87.2 - type: ndcg_at_1000 value: 87.39 - type: ndcg_at_3 value: 82.572 - type: ndcg_at_5 value: 84.176 - type: precision_at_1 value: 78.2 - type: precision_at_10 value: 12.973 - type: precision_at_100 value: 1.5010000000000001 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 35.949999999999996 - type: precision_at_5 value: 23.62 - type: recall_at_1 value: 67.947 - type: recall_at_10 value: 93.804 - type: recall_at_100 value: 98.971 - type: recall_at_1000 value: 99.91600000000001 - type: recall_at_3 value: 84.75399999999999 - type: recall_at_5 value: 89.32 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB 
RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 45.457201684255104 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 55.162226937477875 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.173 - type: map_at_10 value: 10.463000000000001 - type: map_at_100 value: 12.278 - type: map_at_1000 value: 12.572 - type: map_at_3 value: 7.528 - type: map_at_5 value: 8.863 - type: mrr_at_1 value: 20.599999999999998 - type: mrr_at_10 value: 30.422 - type: mrr_at_100 value: 31.6 - type: mrr_at_1000 value: 31.663000000000004 - type: mrr_at_3 value: 27.400000000000002 - type: mrr_at_5 value: 29.065 - type: ndcg_at_1 value: 20.599999999999998 - type: ndcg_at_10 value: 17.687 - type: ndcg_at_100 value: 25.172 - type: ndcg_at_1000 value: 30.617 - type: ndcg_at_3 value: 16.81 - type: ndcg_at_5 value: 14.499 - type: precision_at_1 value: 20.599999999999998 - type: precision_at_10 value: 9.17 - type: precision_at_100 value: 2.004 - type: precision_at_1000 value: 0.332 - type: precision_at_3 value: 15.6 - type: precision_at_5 value: 12.58 - type: recall_at_1 value: 4.173 - type: recall_at_10 value: 18.575 - type: recall_at_100 value: 40.692 - type: recall_at_1000 value: 67.467 - type: recall_at_3 value: 9.488000000000001 - type: recall_at_5 value: 12.738 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 81.12603499315416 - type: cos_sim_spearman value: 73.62060290948378 - type: euclidean_pearson value: 78.14083565781135 - type: euclidean_spearman value: 73.16840437541543 - type: manhattan_pearson value: 77.92017261109734 - type: manhattan_spearman value: 72.8805059949965 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 79.75955377133172 - type: cos_sim_spearman value: 71.8872633964069 - type: euclidean_pearson value: 76.31922068538256 - type: euclidean_spearman value: 70.86449661855376 - type: manhattan_pearson value: 76.47852229730407 - type: manhattan_spearman value: 70.99367421984789 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 78.80762722908158 - type: cos_sim_spearman value: 79.84588978756372 - type: euclidean_pearson value: 79.8216849781164 - type: euclidean_spearman value: 80.22647061695481 - type: manhattan_pearson value: 79.56604194112572 - type: manhattan_spearman value: 79.96495189862462 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 80.1012718092742 - type: cos_sim_spearman value: 76.86011381793661 - type: euclidean_pearson value: 79.94426039862019 - type: euclidean_spearman value: 77.36751135465131 - type: manhattan_pearson value: 79.87959373304288 - type: manhattan_spearman value: 77.37717129004746 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: 
ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 83.90618420346104 - type: cos_sim_spearman value: 84.77290791243722 - type: euclidean_pearson value: 84.64732258073293 - type: euclidean_spearman value: 85.21053649543357 - type: manhattan_pearson value: 84.61616883522647 - type: manhattan_spearman value: 85.19803126766931 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 80.52192114059063 - type: cos_sim_spearman value: 81.9103244827937 - type: euclidean_pearson value: 80.99375176138985 - type: euclidean_spearman value: 81.540250641079 - type: manhattan_pearson value: 80.84979573396426 - type: manhattan_spearman value: 81.3742591621492 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 85.82166001234197 - type: cos_sim_spearman value: 86.81857495659123 - type: euclidean_pearson value: 85.72798403202849 - type: euclidean_spearman value: 85.70482438950965 - type: manhattan_pearson value: 85.51579093130357 - type: manhattan_spearman value: 85.41233705379751 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.48071151079803 - type: cos_sim_spearman value: 65.37838108084044 - type: euclidean_pearson value: 64.67378947096257 - type: euclidean_spearman value: 65.39187147219869 - type: manhattan_pearson value: 65.35487466133208 - type: manhattan_spearman value: 65.51328499442272 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 82.64702367823314 - type: cos_sim_spearman value: 82.49732953181818 - type: euclidean_pearson value: 83.05996062475664 - type: euclidean_spearman value: 82.28159546751176 - type: manhattan_pearson value: 82.98305503664952 - type: manhattan_spearman value: 82.18405771943928 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 78.5744649318696 - type: mrr value: 93.35386291268645 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 52.093999999999994 - type: map_at_10 value: 61.646 - type: map_at_100 value: 62.197 - type: map_at_1000 value: 62.22800000000001 - type: map_at_3 value: 58.411 - type: map_at_5 value: 60.585 - type: mrr_at_1 value: 55.00000000000001 - type: mrr_at_10 value: 62.690999999999995 - type: mrr_at_100 value: 63.139 - type: mrr_at_1000 value: 63.166999999999994 - type: mrr_at_3 value: 60.111000000000004 - type: mrr_at_5 value: 61.778 - type: ndcg_at_1 value: 55.00000000000001 - type: ndcg_at_10 value: 66.271 - type: ndcg_at_100 value: 68.879 - type: ndcg_at_1000 value: 69.722 - type: ndcg_at_3 value: 60.672000000000004 - type: ndcg_at_5 value: 63.929 - type: precision_at_1 value: 55.00000000000001 - type: precision_at_10 value: 9 - type: precision_at_100 value: 1.043 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 23.555999999999997 - type: precision_at_5 value: 16.2 - type: recall_at_1 value: 
52.093999999999994 - type: recall_at_10 value: 79.567 - type: recall_at_100 value: 91.60000000000001 - type: recall_at_1000 value: 98.333 - type: recall_at_3 value: 64.633 - type: recall_at_5 value: 72.68299999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.83267326732673 - type: cos_sim_ap value: 95.77995366495178 - type: cos_sim_f1 value: 91.51180311401306 - type: cos_sim_precision value: 91.92734611503532 - type: cos_sim_recall value: 91.10000000000001 - type: dot_accuracy value: 99.63366336633663 - type: dot_ap value: 88.53996286967461 - type: dot_f1 value: 81.06537530266343 - type: dot_precision value: 78.59154929577464 - type: dot_recall value: 83.7 - type: euclidean_accuracy value: 99.82376237623762 - type: euclidean_ap value: 95.53192209281187 - type: euclidean_f1 value: 91.19683481701286 - type: euclidean_precision value: 90.21526418786692 - type: euclidean_recall value: 92.2 - type: manhattan_accuracy value: 99.82376237623762 - type: manhattan_ap value: 95.55642082191741 - type: manhattan_f1 value: 91.16186693147964 - type: manhattan_precision value: 90.53254437869822 - type: manhattan_recall value: 91.8 - type: max_accuracy value: 99.83267326732673 - type: max_ap value: 95.77995366495178 - type: max_f1 value: 91.51180311401306 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 54.508462134213474 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.06549765184959 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.43129549466616 - type: mrr value: 50.20613169510227 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.069516173193044 - type: cos_sim_spearman value: 29.872498354017353 - type: dot_pearson value: 28.80761257516063 - type: dot_spearman value: 28.397422678527708 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.169 - type: map_at_10 value: 1.208 - type: map_at_100 value: 5.925 - type: map_at_1000 value: 14.427000000000001 - type: map_at_3 value: 0.457 - type: map_at_5 value: 0.716 - type: mrr_at_1 value: 64 - type: mrr_at_10 value: 74.075 - type: mrr_at_100 value: 74.303 - type: mrr_at_1000 value: 74.303 - type: mrr_at_3 value: 71 - type: mrr_at_5 value: 72.89999999999999 - type: ndcg_at_1 value: 57.99999999999999 - type: ndcg_at_10 value: 50.376 - type: ndcg_at_100 value: 38.582 - type: ndcg_at_1000 value: 35.663 - type: ndcg_at_3 value: 55.592 - type: ndcg_at_5 value: 53.647999999999996 - type: precision_at_1 value: 64 - type: precision_at_10 value: 53.2 - type: precision_at_100 value: 39.6 - type: precision_at_1000 value: 16.218 - type: precision_at_3 value: 59.333000000000006 - type: precision_at_5 value: 
57.599999999999994 - type: recall_at_1 value: 0.169 - type: recall_at_10 value: 1.423 - type: recall_at_100 value: 9.049999999999999 - type: recall_at_1000 value: 34.056999999999995 - type: recall_at_3 value: 0.48700000000000004 - type: recall_at_5 value: 0.792 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 1.319 - type: map_at_10 value: 7.112 - type: map_at_100 value: 12.588 - type: map_at_1000 value: 14.056 - type: map_at_3 value: 2.8049999999999997 - type: map_at_5 value: 4.68 - type: mrr_at_1 value: 18.367 - type: mrr_at_10 value: 33.94 - type: mrr_at_100 value: 35.193000000000005 - type: mrr_at_1000 value: 35.193000000000005 - type: mrr_at_3 value: 29.932 - type: mrr_at_5 value: 32.279 - type: ndcg_at_1 value: 15.306000000000001 - type: ndcg_at_10 value: 18.096 - type: ndcg_at_100 value: 30.512 - type: ndcg_at_1000 value: 42.148 - type: ndcg_at_3 value: 17.034 - type: ndcg_at_5 value: 18.509 - type: precision_at_1 value: 18.367 - type: precision_at_10 value: 18.776 - type: precision_at_100 value: 7.02 - type: precision_at_1000 value: 1.467 - type: precision_at_3 value: 19.048000000000002 - type: precision_at_5 value: 22.041 - type: recall_at_1 value: 1.319 - type: recall_at_10 value: 13.748 - type: recall_at_100 value: 43.972 - type: recall_at_1000 value: 79.557 - type: recall_at_3 value: 4.042 - type: recall_at_5 value: 7.742 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.2282 - type: ap value: 13.995763859570426 - type: f1 value: 54.08126256731344 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 57.64006791171477 - type: f1 value: 57.95841320748957 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 40.19267841788564 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 83.96614412588663 - type: cos_sim_ap value: 67.75985678572738 - type: cos_sim_f1 value: 64.04661542276222 - type: cos_sim_precision value: 60.406922357343305 - type: cos_sim_recall value: 68.15303430079156 - type: dot_accuracy value: 79.5732252488526 - type: dot_ap value: 51.30562107572645 - type: dot_f1 value: 53.120759837177744 - type: dot_precision value: 46.478037198258804 - type: dot_recall value: 61.97889182058047 - type: euclidean_accuracy value: 84.00786791440663 - type: euclidean_ap value: 67.58930214486998 - type: euclidean_f1 value: 64.424821579775 - type: euclidean_precision value: 59.4817958454322 - type: euclidean_recall value: 70.26385224274406 - type: manhattan_accuracy value: 83.87673600762949 - type: manhattan_ap value: 67.4250981523309 - type: manhattan_f1 value: 64.10286658015808 - type: manhattan_precision value: 57.96885001066781 - type: manhattan_recall value: 71.68865435356201 - type: max_accuracy value: 84.00786791440663 - type: max_ap value: 67.75985678572738 - type: 
max_f1 value: 64.424821579775 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.41347459929368 - type: cos_sim_ap value: 84.89261930113058 - type: cos_sim_f1 value: 77.13677607258877 - type: cos_sim_precision value: 74.88581164358733 - type: cos_sim_recall value: 79.52725592854944 - type: dot_accuracy value: 86.32359219156285 - type: dot_ap value: 79.29794992131094 - type: dot_f1 value: 72.84356337679777 - type: dot_precision value: 67.31761478675462 - type: dot_recall value: 79.35786880197105 - type: euclidean_accuracy value: 88.33585593976791 - type: euclidean_ap value: 84.73257641312746 - type: euclidean_f1 value: 76.83529582788195 - type: euclidean_precision value: 72.76294052863436 - type: euclidean_recall value: 81.3905143209116 - type: manhattan_accuracy value: 88.3086894089339 - type: manhattan_ap value: 84.66304891729399 - type: manhattan_f1 value: 76.8181650632165 - type: manhattan_precision value: 73.6864436744219 - type: manhattan_recall value: 80.22790267939637 - type: max_accuracy value: 88.41347459929368 - type: max_ap value: 84.89261930113058 - type: max_f1 value: 77.13677607258877 license: mit --- # bge-micro-v2 This is a sentence-transformers model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search. It was distilled in a two-step training process (bge-micro was step 1) from . ## Usage (Sentence-Transformers) Using this model is straightforward once you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings. ## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Full Model Architecture ## Citing & Authors ", + "model_explanation_gemini": "Generates sentence embeddings for measuring semantic similarity and performing retrieval, classification, and clustering tasks."
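The two "Usage" sections in the bge-micro-v2 card above refer to code snippets that were stripped when the card was flattened into this JSON export. Below is a minimal sketch of the two standard patterns for `TaylorAI/bge-micro-v2` (the `model_id` recorded in this file), assuming the usual sentence-transformers and transformers APIs; it is an illustration, not the card's original code.

```python
# Minimal sketch of the stripped usage snippets for TaylorAI/bge-micro-v2.
# Standard patterns only; not necessarily the card's original code.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
from transformers import AutoModel, AutoTokenizer

sentences = ["This is an example sentence", "Each sentence is converted"]

# 1) With sentence-transformers installed, encoding is a single call.
st_model = SentenceTransformer("TaylorAI/bge-micro-v2")
embeddings = st_model.encode(sentences)
print(embeddings.shape)  # (2, 384): the 384-dimensional space described above

# 2) Without sentence-transformers: run the encoder, then apply mean pooling
# over the token embeddings, weighted by the attention mask.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # token-level contextual embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("TaylorAI/bge-micro-v2")
model = AutoModel.from_pretrained("TaylorAI/bge-micro-v2")

encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)
pooled = F.normalize(mean_pooling(output, encoded["attention_mask"]), p=2, dim=1)
```

Cosine similarity between the normalized vectors then gives the sentence-similarity scores that the MTEB results above report.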
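The table-detection card further up ends its "How to Get Started" section with "Use the code below to get started with the model", but that snippet was also stripped in this export. Since the repository id of that checkpoint is not recorded in this section, the sketch below uses an explicit placeholder and assumes the standard transformers object-detection pipeline for a DETR-style model.

```python
# Minimal sketch of the stripped "How to Get Started" snippet from the
# table-detection card further up. The model id is a PLACEHOLDER assumption;
# substitute the actual repository id of that checkpoint.
from transformers import pipeline
from PIL import Image

detector = pipeline("object-detection", model="<table-detection-model-id>")  # placeholder

image = Image.open("document_page.png")  # any scanned document page
for detection in detector(image):
    # Each result carries a label, a confidence score, and a pixel bounding
    # box; per the card, both bordered and borderless tables are detected.
    print(detection["label"], round(detection["score"], 3), detection["box"])
```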
+} \ No newline at end of file diff --git a/data/model_data_json/TaylorAI_gte-tiny.json b/data/model_data_json/TaylorAI_gte-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..6dbbb765efd2a6c25ec3677c74e911823bb2f4cc --- /dev/null +++ b/data/model_data_json/TaylorAI_gte-tiny.json @@ -0,0 +1,21 @@ +{ + "model_id": "TaylorAI/gte-tiny", + "downloads": 74970, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "mteb", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- model-index: - name: gte_tiny results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 71.76119402985076 - type: ap value: 34.63659287952359 - type: f1 value: 65.88939512571113 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 86.61324999999998 - type: ap value: 81.7476302802319 - type: f1 value: 86.5863470912001 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 42.61000000000001 - type: f1 value: 42.2217180000715 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 28.377999999999997 - type: map_at_10 value: 44.565 - type: map_at_100 value: 45.48 - type: map_at_1000 value: 45.487 - type: map_at_3 value: 39.841 - type: map_at_5 value: 42.284 - type: mrr_at_1 value: 29.445 - type: mrr_at_10 value: 44.956 - type: mrr_at_100 value: 45.877 - type: mrr_at_1000 value: 45.884 - type: mrr_at_3 value: 40.209 - type: mrr_at_5 value: 42.719 - type: ndcg_at_1 value: 28.377999999999997 - type: ndcg_at_10 value: 53.638 - type: ndcg_at_100 value: 57.354000000000006 - type: ndcg_at_1000 value: 57.513000000000005 - type: ndcg_at_3 value: 43.701 - type: ndcg_at_5 value: 48.114000000000004 - type: precision_at_1 value: 28.377999999999997 - type: precision_at_10 value: 8.272 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 18.303 - type: precision_at_5 value: 13.129 - type: recall_at_1 value: 28.377999999999997 - type: recall_at_10 value: 82.717 - type: recall_at_100 value: 98.43499999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 54.908 - type: recall_at_5 value: 65.647 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 46.637318326729876 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 36.01134479855804 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 59.82917555338909 - type: 
mrr value: 74.7888361254012 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.1657730995964 - type: cos_sim_spearman value: 86.62787748941281 - type: euclidean_pearson value: 85.48127914481798 - type: euclidean_spearman value: 86.48148861167424 - type: manhattan_pearson value: 85.07496934780823 - type: manhattan_spearman value: 86.39473964708843 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 81.73051948051948 - type: f1 value: 81.66368364988331 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.18623707448217 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 32.12697757150375 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.160000000000004 - type: map_at_10 value: 40.474 - type: map_at_100 value: 41.905 - type: map_at_1000 value: 42.041000000000004 - type: map_at_3 value: 37.147000000000006 - type: map_at_5 value: 38.873999999999995 - type: mrr_at_1 value: 36.91 - type: mrr_at_10 value: 46.495999999999995 - type: mrr_at_100 value: 47.288000000000004 - type: mrr_at_1000 value: 47.339999999999996 - type: mrr_at_3 value: 43.777 - type: mrr_at_5 value: 45.257999999999996 - type: ndcg_at_1 value: 36.91 - type: ndcg_at_10 value: 46.722 - type: ndcg_at_100 value: 51.969 - type: ndcg_at_1000 value: 54.232 - type: ndcg_at_3 value: 41.783 - type: ndcg_at_5 value: 43.797000000000004 - type: precision_at_1 value: 36.91 - type: precision_at_10 value: 9.013 - type: precision_at_100 value: 1.455 - type: precision_at_1000 value: 0.193 - type: precision_at_3 value: 20.124 - type: precision_at_5 value: 14.363000000000001 - type: recall_at_1 value: 29.160000000000004 - type: recall_at_10 value: 58.521 - type: recall_at_100 value: 80.323 - type: recall_at_1000 value: 95.13000000000001 - type: recall_at_3 value: 44.205 - type: recall_at_5 value: 49.97 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.750000000000004 - type: map_at_10 value: 36.39 - type: map_at_100 value: 37.5 - type: map_at_1000 value: 37.625 - type: map_at_3 value: 33.853 - type: map_at_5 value: 35.397 - type: mrr_at_1 value: 34.14 - type: mrr_at_10 value: 41.841 - type: mrr_at_100 value: 42.469 - type: mrr_at_1000 value: 42.521 - type: mrr_at_3 value: 39.724 - type: mrr_at_5 value: 40.955999999999996 - type: ndcg_at_1 value: 34.14 - type: ndcg_at_10 value: 41.409 - type: ndcg_at_100 value: 45.668 - type: ndcg_at_1000 value: 47.916 - type: ndcg_at_3 value: 37.836 - type: ndcg_at_5 value: 39.650999999999996 - type: precision_at_1 value: 34.14 - type: precision_at_10 value: 7.739 - type: precision_at_100 value: 1.2630000000000001 - type: precision_at_1000 value: 0.173 - type: precision_at_3 value: 18.217 - type: precision_at_5 value: 12.854 - type: recall_at_1 value: 
27.750000000000004 - type: recall_at_10 value: 49.882 - type: recall_at_100 value: 68.556 - type: recall_at_1000 value: 83.186 - type: recall_at_3 value: 39.047 - type: recall_at_5 value: 44.458 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 36.879 - type: map_at_10 value: 48.878 - type: map_at_100 value: 49.918 - type: map_at_1000 value: 49.978 - type: map_at_3 value: 45.867999999999995 - type: map_at_5 value: 47.637 - type: mrr_at_1 value: 42.696 - type: mrr_at_10 value: 52.342 - type: mrr_at_100 value: 53.044000000000004 - type: mrr_at_1000 value: 53.077 - type: mrr_at_3 value: 50.01 - type: mrr_at_5 value: 51.437 - type: ndcg_at_1 value: 42.696 - type: ndcg_at_10 value: 54.469 - type: ndcg_at_100 value: 58.664 - type: ndcg_at_1000 value: 59.951 - type: ndcg_at_3 value: 49.419999999999995 - type: ndcg_at_5 value: 52.007000000000005 - type: precision_at_1 value: 42.696 - type: precision_at_10 value: 8.734 - type: precision_at_100 value: 1.1769999999999998 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 22.027 - type: precision_at_5 value: 15.135000000000002 - type: recall_at_1 value: 36.879 - type: recall_at_10 value: 67.669 - type: recall_at_100 value: 85.822 - type: recall_at_1000 value: 95.092 - type: recall_at_3 value: 54.157999999999994 - type: recall_at_5 value: 60.436 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.942 - type: map_at_10 value: 31.741999999999997 - type: map_at_100 value: 32.721000000000004 - type: map_at_1000 value: 32.809 - type: map_at_3 value: 29.17 - type: map_at_5 value: 30.714000000000002 - type: mrr_at_1 value: 24.746000000000002 - type: mrr_at_10 value: 33.517 - type: mrr_at_100 value: 34.451 - type: mrr_at_1000 value: 34.522000000000006 - type: mrr_at_3 value: 31.148999999999997 - type: mrr_at_5 value: 32.606 - type: ndcg_at_1 value: 24.746000000000002 - type: ndcg_at_10 value: 36.553000000000004 - type: ndcg_at_100 value: 41.53 - type: ndcg_at_1000 value: 43.811 - type: ndcg_at_3 value: 31.674000000000003 - type: ndcg_at_5 value: 34.241 - type: precision_at_1 value: 24.746000000000002 - type: precision_at_10 value: 5.684 - type: precision_at_100 value: 0.859 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 13.597000000000001 - type: precision_at_5 value: 9.672 - type: recall_at_1 value: 22.942 - type: recall_at_10 value: 49.58 - type: recall_at_100 value: 72.614 - type: recall_at_1000 value: 89.89200000000001 - type: recall_at_3 value: 36.552 - type: recall_at_5 value: 42.702 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.345 - type: map_at_10 value: 22.428 - type: map_at_100 value: 23.756 - type: map_at_1000 value: 23.872 - type: map_at_3 value: 20.212 - type: map_at_5 value: 21.291 - type: mrr_at_1 value: 19.279 - type: mrr_at_10 value: 27.1 - type: mrr_at_100 value: 28.211000000000002 - type: mrr_at_1000 value: 28.279 - type: mrr_at_3 value: 24.813 - type: mrr_at_5 value: 25.889 - type: ndcg_at_1 value: 19.279 - type: ndcg_at_10 value: 27.36 - type: ndcg_at_100 value: 33.499 - type: ndcg_at_1000 value: 36.452 - type: ndcg_at_3 value: 23.233999999999998 - type: ndcg_at_5 value: 24.806 - type: precision_at_1 value: 19.279 - type: 
precision_at_10 value: 5.149 - type: precision_at_100 value: 0.938 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 11.360000000000001 - type: precision_at_5 value: 8.035 - type: recall_at_1 value: 15.345 - type: recall_at_10 value: 37.974999999999994 - type: recall_at_100 value: 64.472 - type: recall_at_1000 value: 85.97200000000001 - type: recall_at_3 value: 26.203 - type: recall_at_5 value: 30.485 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.362000000000002 - type: map_at_10 value: 36.406 - type: map_at_100 value: 37.726 - type: map_at_1000 value: 37.84 - type: map_at_3 value: 33.425 - type: map_at_5 value: 35.043 - type: mrr_at_1 value: 32.146 - type: mrr_at_10 value: 41.674 - type: mrr_at_100 value: 42.478 - type: mrr_at_1000 value: 42.524 - type: mrr_at_3 value: 38.948 - type: mrr_at_5 value: 40.415 - type: ndcg_at_1 value: 32.146 - type: ndcg_at_10 value: 42.374 - type: ndcg_at_100 value: 47.919 - type: ndcg_at_1000 value: 50.013 - type: ndcg_at_3 value: 37.29 - type: ndcg_at_5 value: 39.531 - type: precision_at_1 value: 32.146 - type: precision_at_10 value: 7.767 - type: precision_at_100 value: 1.236 - type: precision_at_1000 value: 0.16 - type: precision_at_3 value: 17.965999999999998 - type: precision_at_5 value: 12.742999999999999 - type: recall_at_1 value: 26.362000000000002 - type: recall_at_10 value: 54.98800000000001 - type: recall_at_100 value: 78.50200000000001 - type: recall_at_1000 value: 92.146 - type: recall_at_3 value: 40.486 - type: recall_at_5 value: 46.236 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.417 - type: map_at_10 value: 33.161 - type: map_at_100 value: 34.357 - type: map_at_1000 value: 34.473 - type: map_at_3 value: 30.245 - type: map_at_5 value: 31.541999999999998 - type: mrr_at_1 value: 29.909000000000002 - type: mrr_at_10 value: 38.211 - type: mrr_at_100 value: 39.056999999999995 - type: mrr_at_1000 value: 39.114 - type: mrr_at_3 value: 35.769 - type: mrr_at_5 value: 36.922 - type: ndcg_at_1 value: 29.909000000000002 - type: ndcg_at_10 value: 38.694 - type: ndcg_at_100 value: 44.057 - type: ndcg_at_1000 value: 46.6 - type: ndcg_at_3 value: 33.822 - type: ndcg_at_5 value: 35.454 - type: precision_at_1 value: 29.909000000000002 - type: precision_at_10 value: 7.180000000000001 - type: precision_at_100 value: 1.153 - type: precision_at_1000 value: 0.155 - type: precision_at_3 value: 16.134 - type: precision_at_5 value: 11.256 - type: recall_at_1 value: 24.417 - type: recall_at_10 value: 50.260000000000005 - type: recall_at_100 value: 73.55699999999999 - type: recall_at_1000 value: 91.216 - type: recall_at_3 value: 35.971 - type: recall_at_5 value: 40.793 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.266916666666663 - type: map_at_10 value: 32.75025 - type: map_at_100 value: 33.91341666666667 - type: map_at_1000 value: 34.031749999999995 - type: map_at_3 value: 30.166416666666674 - type: map_at_5 value: 31.577000000000005 - type: mrr_at_1 value: 28.828166666666664 - type: mrr_at_10 value: 36.80991666666667 - type: mrr_at_100 value: 37.67075 - type: mrr_at_1000 value: 37.733 - type: mrr_at_3 value: 34.513416666666664 - type: mrr_at_5 value: 35.788 - type: 
ndcg_at_1 value: 28.828166666666664 - type: ndcg_at_10 value: 37.796 - type: ndcg_at_100 value: 42.94783333333333 - type: ndcg_at_1000 value: 45.38908333333333 - type: ndcg_at_3 value: 33.374750000000006 - type: ndcg_at_5 value: 35.379666666666665 - type: precision_at_1 value: 28.828166666666664 - type: precision_at_10 value: 6.615749999999999 - type: precision_at_100 value: 1.0848333333333333 - type: precision_at_1000 value: 0.1484166666666667 - type: precision_at_3 value: 15.347833333333332 - type: precision_at_5 value: 10.848916666666666 - type: recall_at_1 value: 24.266916666666663 - type: recall_at_10 value: 48.73458333333333 - type: recall_at_100 value: 71.56341666666667 - type: recall_at_1000 value: 88.63091666666668 - type: recall_at_3 value: 36.31208333333333 - type: recall_at_5 value: 41.55633333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.497 - type: map_at_10 value: 30.249 - type: map_at_100 value: 30.947000000000003 - type: map_at_1000 value: 31.049 - type: map_at_3 value: 28.188000000000002 - type: map_at_5 value: 29.332 - type: mrr_at_1 value: 26.687 - type: mrr_at_10 value: 33.182 - type: mrr_at_100 value: 33.794999999999995 - type: mrr_at_1000 value: 33.873 - type: mrr_at_3 value: 31.263 - type: mrr_at_5 value: 32.428000000000004 - type: ndcg_at_1 value: 26.687 - type: ndcg_at_10 value: 34.252 - type: ndcg_at_100 value: 38.083 - type: ndcg_at_1000 value: 40.682 - type: ndcg_at_3 value: 30.464999999999996 - type: ndcg_at_5 value: 32.282 - type: precision_at_1 value: 26.687 - type: precision_at_10 value: 5.2909999999999995 - type: precision_at_100 value: 0.788 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 13.037 - type: precision_at_5 value: 9.049 - type: recall_at_1 value: 23.497 - type: recall_at_10 value: 43.813 - type: recall_at_100 value: 61.88399999999999 - type: recall_at_1000 value: 80.926 - type: recall_at_3 value: 33.332 - type: recall_at_5 value: 37.862 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.073 - type: map_at_10 value: 22.705000000000002 - type: map_at_100 value: 23.703 - type: map_at_1000 value: 23.833 - type: map_at_3 value: 20.593 - type: map_at_5 value: 21.7 - type: mrr_at_1 value: 19.683 - type: mrr_at_10 value: 26.39 - type: mrr_at_100 value: 27.264 - type: mrr_at_1000 value: 27.349 - type: mrr_at_3 value: 24.409 - type: mrr_at_5 value: 25.474000000000004 - type: ndcg_at_1 value: 19.683 - type: ndcg_at_10 value: 27.014 - type: ndcg_at_100 value: 31.948 - type: ndcg_at_1000 value: 35.125 - type: ndcg_at_3 value: 23.225 - type: ndcg_at_5 value: 24.866 - type: precision_at_1 value: 19.683 - type: precision_at_10 value: 4.948 - type: precision_at_100 value: 0.876 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 10.943 - type: precision_at_5 value: 7.86 - type: recall_at_1 value: 16.073 - type: recall_at_10 value: 36.283 - type: recall_at_100 value: 58.745999999999995 - type: recall_at_1000 value: 81.711 - type: recall_at_3 value: 25.637 - type: recall_at_5 value: 29.919 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.776 - type: map_at_10 value: 33.317 - type: map_at_100 value: 34.437 - type: map_at_1000 value: 34.54 - type: map_at_3 
value: 30.706 - type: map_at_5 value: 32.202999999999996 - type: mrr_at_1 value: 30.224 - type: mrr_at_10 value: 37.34 - type: mrr_at_100 value: 38.268 - type: mrr_at_1000 value: 38.335 - type: mrr_at_3 value: 35.075 - type: mrr_at_5 value: 36.348 - type: ndcg_at_1 value: 30.224 - type: ndcg_at_10 value: 38.083 - type: ndcg_at_100 value: 43.413000000000004 - type: ndcg_at_1000 value: 45.856 - type: ndcg_at_3 value: 33.437 - type: ndcg_at_5 value: 35.661 - type: precision_at_1 value: 30.224 - type: precision_at_10 value: 6.1850000000000005 - type: precision_at_100 value: 1.0030000000000001 - type: precision_at_1000 value: 0.132 - type: precision_at_3 value: 14.646 - type: precision_at_5 value: 10.428999999999998 - type: recall_at_1 value: 25.776 - type: recall_at_10 value: 48.787000000000006 - type: recall_at_100 value: 72.04899999999999 - type: recall_at_1000 value: 89.339 - type: recall_at_3 value: 36.192 - type: recall_at_5 value: 41.665 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.156 - type: map_at_10 value: 30.886000000000003 - type: map_at_100 value: 32.551 - type: map_at_1000 value: 32.769 - type: map_at_3 value: 28.584 - type: map_at_5 value: 29.959999999999997 - type: mrr_at_1 value: 28.260999999999996 - type: mrr_at_10 value: 35.555 - type: mrr_at_100 value: 36.687 - type: mrr_at_1000 value: 36.742999999999995 - type: mrr_at_3 value: 33.531 - type: mrr_at_5 value: 34.717 - type: ndcg_at_1 value: 28.260999999999996 - type: ndcg_at_10 value: 36.036 - type: ndcg_at_100 value: 42.675000000000004 - type: ndcg_at_1000 value: 45.303 - type: ndcg_at_3 value: 32.449 - type: ndcg_at_5 value: 34.293 - type: precision_at_1 value: 28.260999999999996 - type: precision_at_10 value: 6.837999999999999 - type: precision_at_100 value: 1.4569999999999999 - type: precision_at_1000 value: 0.23500000000000001 - type: precision_at_3 value: 15.217 - type: precision_at_5 value: 11.028 - type: recall_at_1 value: 23.156 - type: recall_at_10 value: 45.251999999999995 - type: recall_at_100 value: 75.339 - type: recall_at_1000 value: 91.56 - type: recall_at_3 value: 34.701 - type: recall_at_5 value: 39.922999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.846 - type: map_at_10 value: 26.367 - type: map_at_100 value: 27.439999999999998 - type: map_at_1000 value: 27.552 - type: map_at_3 value: 24.006 - type: map_at_5 value: 25.230999999999998 - type: mrr_at_1 value: 21.257 - type: mrr_at_10 value: 28.071 - type: mrr_at_100 value: 29.037000000000003 - type: mrr_at_1000 value: 29.119 - type: mrr_at_3 value: 25.692999999999998 - type: mrr_at_5 value: 27.006000000000004 - type: ndcg_at_1 value: 21.257 - type: ndcg_at_10 value: 30.586000000000002 - type: ndcg_at_100 value: 35.949 - type: ndcg_at_1000 value: 38.728 - type: ndcg_at_3 value: 25.862000000000002 - type: ndcg_at_5 value: 27.967 - type: precision_at_1 value: 21.257 - type: precision_at_10 value: 4.861 - type: precision_at_100 value: 0.8130000000000001 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 10.906 - type: precision_at_5 value: 7.763000000000001 - type: recall_at_1 value: 19.846 - type: recall_at_10 value: 41.805 - type: recall_at_100 value: 66.89699999999999 - type: recall_at_1000 value: 87.401 - type: recall_at_3 value: 29.261 - type: recall_at_5 value: 
34.227000000000004 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.333 - type: map_at_10 value: 17.14 - type: map_at_100 value: 18.878 - type: map_at_1000 value: 19.067 - type: map_at_3 value: 14.123 - type: map_at_5 value: 15.699 - type: mrr_at_1 value: 23.192 - type: mrr_at_10 value: 33.553 - type: mrr_at_100 value: 34.553 - type: mrr_at_1000 value: 34.603 - type: mrr_at_3 value: 29.848000000000003 - type: mrr_at_5 value: 32.18 - type: ndcg_at_1 value: 23.192 - type: ndcg_at_10 value: 24.707 - type: ndcg_at_100 value: 31.701 - type: ndcg_at_1000 value: 35.260999999999996 - type: ndcg_at_3 value: 19.492 - type: ndcg_at_5 value: 21.543 - type: precision_at_1 value: 23.192 - type: precision_at_10 value: 7.824000000000001 - type: precision_at_100 value: 1.52 - type: precision_at_1000 value: 0.218 - type: precision_at_3 value: 14.180000000000001 - type: precision_at_5 value: 11.530999999999999 - type: recall_at_1 value: 10.333 - type: recall_at_10 value: 30.142999999999997 - type: recall_at_100 value: 54.298 - type: recall_at_1000 value: 74.337 - type: recall_at_3 value: 17.602999999999998 - type: recall_at_5 value: 22.938 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.03 - type: map_at_10 value: 17.345 - type: map_at_100 value: 23.462 - type: map_at_1000 value: 24.77 - type: map_at_3 value: 12.714 - type: map_at_5 value: 14.722 - type: mrr_at_1 value: 61.0 - type: mrr_at_10 value: 69.245 - type: mrr_at_100 value: 69.715 - type: mrr_at_1000 value: 69.719 - type: mrr_at_3 value: 67.583 - type: mrr_at_5 value: 68.521 - type: ndcg_at_1 value: 47.625 - type: ndcg_at_10 value: 35.973 - type: ndcg_at_100 value: 39.875 - type: ndcg_at_1000 value: 46.922000000000004 - type: ndcg_at_3 value: 40.574 - type: ndcg_at_5 value: 38.18 - type: precision_at_1 value: 61.0 - type: precision_at_10 value: 29.049999999999997 - type: precision_at_100 value: 8.828 - type: precision_at_1000 value: 1.8290000000000002 - type: precision_at_3 value: 45.333 - type: precision_at_5 value: 37.9 - type: recall_at_1 value: 8.03 - type: recall_at_10 value: 22.334 - type: recall_at_100 value: 45.919 - type: recall_at_1000 value: 68.822 - type: recall_at_3 value: 14.038999999999998 - type: recall_at_5 value: 17.118 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 44.714999999999996 - type: f1 value: 39.83929362259356 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 52.242999999999995 - type: map_at_10 value: 64.087 - type: map_at_100 value: 64.549 - type: map_at_1000 value: 64.567 - type: map_at_3 value: 61.667 - type: map_at_5 value: 63.266 - type: mrr_at_1 value: 56.271 - type: mrr_at_10 value: 68.146 - type: mrr_at_100 value: 68.524 - type: mrr_at_1000 value: 68.53200000000001 - type: mrr_at_3 value: 65.869 - type: mrr_at_5 value: 67.37100000000001 - type: ndcg_at_1 value: 56.271 - type: ndcg_at_10 value: 70.109 - type: ndcg_at_100 value: 72.09 - type: ndcg_at_1000 value: 72.479 - type: ndcg_at_3 value: 65.559 - type: ndcg_at_5 value: 68.242 - type: precision_at_1 value: 56.271 - type: precision_at_10 value: 9.286999999999999 - type: precision_at_100 value: 1.039 - type: 
precision_at_1000 value: 0.109 - type: precision_at_3 value: 26.308 - type: precision_at_5 value: 17.291 - type: recall_at_1 value: 52.242999999999995 - type: recall_at_10 value: 84.71 - type: recall_at_100 value: 93.309 - type: recall_at_1000 value: 96.013 - type: recall_at_3 value: 72.554 - type: recall_at_5 value: 79.069 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 14.346 - type: map_at_10 value: 24.552 - type: map_at_100 value: 26.161 - type: map_at_1000 value: 26.345000000000002 - type: map_at_3 value: 21.208 - type: map_at_5 value: 22.959 - type: mrr_at_1 value: 29.166999999999998 - type: mrr_at_10 value: 38.182 - type: mrr_at_100 value: 39.22 - type: mrr_at_1000 value: 39.263 - type: mrr_at_3 value: 35.983 - type: mrr_at_5 value: 37.14 - type: ndcg_at_1 value: 29.166999999999998 - type: ndcg_at_10 value: 31.421 - type: ndcg_at_100 value: 38.129999999999995 - type: ndcg_at_1000 value: 41.569 - type: ndcg_at_3 value: 28.172000000000004 - type: ndcg_at_5 value: 29.029 - type: precision_at_1 value: 29.166999999999998 - type: precision_at_10 value: 8.997 - type: precision_at_100 value: 1.5709999999999997 - type: precision_at_1000 value: 0.22 - type: precision_at_3 value: 19.187 - type: precision_at_5 value: 13.980999999999998 - type: recall_at_1 value: 14.346 - type: recall_at_10 value: 37.963 - type: recall_at_100 value: 63.43299999999999 - type: recall_at_1000 value: 84.057 - type: recall_at_3 value: 26.119999999999997 - type: recall_at_5 value: 30.988 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 33.059 - type: map_at_10 value: 46.421 - type: map_at_100 value: 47.323 - type: map_at_1000 value: 47.403 - type: map_at_3 value: 43.553999999999995 - type: map_at_5 value: 45.283 - type: mrr_at_1 value: 66.117 - type: mrr_at_10 value: 73.10900000000001 - type: mrr_at_100 value: 73.444 - type: mrr_at_1000 value: 73.46000000000001 - type: mrr_at_3 value: 71.70400000000001 - type: mrr_at_5 value: 72.58099999999999 - type: ndcg_at_1 value: 66.117 - type: ndcg_at_10 value: 55.696999999999996 - type: ndcg_at_100 value: 59.167 - type: ndcg_at_1000 value: 60.809000000000005 - type: ndcg_at_3 value: 51.243 - type: ndcg_at_5 value: 53.627 - type: precision_at_1 value: 66.117 - type: precision_at_10 value: 11.538 - type: precision_at_100 value: 1.429 - type: precision_at_1000 value: 0.165 - type: precision_at_3 value: 31.861 - type: precision_at_5 value: 20.997 - type: recall_at_1 value: 33.059 - type: recall_at_10 value: 57.691 - type: recall_at_100 value: 71.458 - type: recall_at_1000 value: 82.35 - type: recall_at_3 value: 47.792 - type: recall_at_5 value: 52.492000000000004 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 80.544 - type: ap value: 74.69592367984956 - type: f1 value: 80.51138138449883 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 17.095 - type: map_at_10 value: 28.038999999999998 - type: map_at_100 value: 29.246 - type: map_at_1000 value: 29.311 - type: map_at_3 value: 24.253 - type: map_at_5 value: 26.442 - type: mrr_at_1 value: 17.535999999999998 - type: mrr_at_10 value: 28.53 - type: mrr_at_100 value: 29.697000000000003 - type: mrr_at_1000 value: 29.755 
- type: mrr_at_3 value: 24.779999999999998 - type: mrr_at_5 value: 26.942 - type: ndcg_at_1 value: 17.549999999999997 - type: ndcg_at_10 value: 34.514 - type: ndcg_at_100 value: 40.497 - type: ndcg_at_1000 value: 42.17 - type: ndcg_at_3 value: 26.764 - type: ndcg_at_5 value: 30.678 - type: precision_at_1 value: 17.549999999999997 - type: precision_at_10 value: 5.692 - type: precision_at_100 value: 0.8699999999999999 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 11.562 - type: precision_at_5 value: 8.917 - type: recall_at_1 value: 17.095 - type: recall_at_10 value: 54.642 - type: recall_at_100 value: 82.652 - type: recall_at_1000 value: 95.555 - type: recall_at_3 value: 33.504 - type: recall_at_5 value: 42.925000000000004 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 91.75558595531236 - type: f1 value: 91.25979279648296 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 69.90424076607387 - type: f1 value: 52.067408707562244 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.13449899125757 - type: f1 value: 67.62456762910598 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.862138533961 - type: f1 value: 74.66457222091381 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.10761942610792 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.673172170578408 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.058704977250315 - type: mrr value: 33.24327760839221 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.163 - type: map_at_10 value: 11.652999999999999 - type: map_at_100 value: 14.849 - type: map_at_1000 value: 16.253999999999998 - type: map_at_3 value: 8.616999999999999 - type: map_at_5 value: 10.100000000000001 - type: mrr_at_1 value: 44.272 - type: mrr_at_10 value: 52.25 - type: mrr_at_100 value: 52.761 - type: mrr_at_1000 value: 52.811 - type: mrr_at_3 value: 50.31 - type: mrr_at_5 value: 51.347 - type: ndcg_at_1 value: 42.105 - type: ndcg_at_10 value: 32.044 - type: ndcg_at_100 value: 29.763 - type: ndcg_at_1000 value: 38.585 - type: ndcg_at_3 value: 36.868 - type: ndcg_at_5 value: 35.154999999999994 - type: precision_at_1 value: 43.653 - type: precision_at_10 value: 23.622 - type: precision_at_100 value: 7.7490000000000006 - type: precision_at_1000 value: 2.054 - type: precision_at_3 value: 34.262 - type: precision_at_5 value: 30.154999999999998 
- type: recall_at_1 value: 5.163 - type: recall_at_10 value: 15.478 - type: recall_at_100 value: 30.424 - type: recall_at_1000 value: 62.67 - type: recall_at_3 value: 9.615 - type: recall_at_5 value: 12.369 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 21.618000000000002 - type: map_at_10 value: 35.465 - type: map_at_100 value: 36.712 - type: map_at_1000 value: 36.757 - type: map_at_3 value: 31.189 - type: map_at_5 value: 33.537 - type: mrr_at_1 value: 24.305 - type: mrr_at_10 value: 37.653 - type: mrr_at_100 value: 38.662 - type: mrr_at_1000 value: 38.694 - type: mrr_at_3 value: 33.889 - type: mrr_at_5 value: 35.979 - type: ndcg_at_1 value: 24.305 - type: ndcg_at_10 value: 43.028 - type: ndcg_at_100 value: 48.653999999999996 - type: ndcg_at_1000 value: 49.733 - type: ndcg_at_3 value: 34.768 - type: ndcg_at_5 value: 38.753 - type: precision_at_1 value: 24.305 - type: precision_at_10 value: 7.59 - type: precision_at_100 value: 1.076 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 16.271 - type: precision_at_5 value: 12.068 - type: recall_at_1 value: 21.618000000000002 - type: recall_at_10 value: 63.977 - type: recall_at_100 value: 89.03999999999999 - type: recall_at_1000 value: 97.10600000000001 - type: recall_at_3 value: 42.422 - type: recall_at_5 value: 51.629000000000005 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 69.405 - type: map_at_10 value: 83.05 - type: map_at_100 value: 83.684 - type: map_at_1000 value: 83.70400000000001 - type: map_at_3 value: 80.08800000000001 - type: map_at_5 value: 81.937 - type: mrr_at_1 value: 79.85 - type: mrr_at_10 value: 86.369 - type: mrr_at_100 value: 86.48599999999999 - type: mrr_at_1000 value: 86.48700000000001 - type: mrr_at_3 value: 85.315 - type: mrr_at_5 value: 86.044 - type: ndcg_at_1 value: 79.86999999999999 - type: ndcg_at_10 value: 87.04499999999999 - type: ndcg_at_100 value: 88.373 - type: ndcg_at_1000 value: 88.531 - type: ndcg_at_3 value: 84.04 - type: ndcg_at_5 value: 85.684 - type: precision_at_1 value: 79.86999999999999 - type: precision_at_10 value: 13.183 - type: precision_at_100 value: 1.51 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 36.67 - type: precision_at_5 value: 24.12 - type: recall_at_1 value: 69.405 - type: recall_at_10 value: 94.634 - type: recall_at_100 value: 99.214 - type: recall_at_1000 value: 99.958 - type: recall_at_3 value: 85.992 - type: recall_at_5 value: 90.656 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 50.191676323145465 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 56.4874020363744 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.228 - type: map_at_10 value: 11.245 - type: map_at_100 value: 13.353000000000002 - type: map_at_1000 value: 13.665 - type: map_at_3 value: 7.779999999999999 - type: map_at_5 value: 9.405 - type: mrr_at_1 value: 20.9 - type: mrr_at_10 value: 31.657999999999998 - type: mrr_at_100 value: 32.769999999999996 - type: mrr_at_1000 value: 
32.833 - type: mrr_at_3 value: 28.333000000000002 - type: mrr_at_5 value: 30.043 - type: ndcg_at_1 value: 20.9 - type: ndcg_at_10 value: 19.073 - type: ndcg_at_100 value: 27.055 - type: ndcg_at_1000 value: 32.641 - type: ndcg_at_3 value: 17.483999999999998 - type: ndcg_at_5 value: 15.42 - type: precision_at_1 value: 20.9 - type: precision_at_10 value: 10.17 - type: precision_at_100 value: 2.162 - type: precision_at_1000 value: 0.35100000000000003 - type: precision_at_3 value: 16.467000000000002 - type: precision_at_5 value: 13.68 - type: recall_at_1 value: 4.228 - type: recall_at_10 value: 20.573 - type: recall_at_100 value: 43.887 - type: recall_at_1000 value: 71.22 - type: recall_at_3 value: 10.023 - type: recall_at_5 value: 13.873 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.77965135067481 - type: cos_sim_spearman value: 75.85121335808076 - type: euclidean_pearson value: 80.09115175262697 - type: euclidean_spearman value: 75.72249155647123 - type: manhattan_pearson value: 79.89723577351782 - type: manhattan_spearman value: 75.49855259442387 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 80.46084116030949 - type: cos_sim_spearman value: 72.57579204392951 - type: euclidean_pearson value: 76.39020830763684 - type: euclidean_spearman value: 72.3718627025895 - type: manhattan_pearson value: 76.6148833027359 - type: manhattan_spearman value: 72.57570008442319 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 80.43678068337017 - type: cos_sim_spearman value: 82.38941154076062 - type: euclidean_pearson value: 81.59260573633661 - type: euclidean_spearman value: 82.31144262574114 - type: manhattan_pearson value: 81.43266909137056 - type: manhattan_spearman value: 82.14704293004861 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 80.73713431763163 - type: cos_sim_spearman value: 77.97860512809388 - type: euclidean_pearson value: 80.35755041527027 - type: euclidean_spearman value: 78.021703511412 - type: manhattan_pearson value: 80.24440317109162 - type: manhattan_spearman value: 77.93165415697575 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.15111852351204 - type: cos_sim_spearman value: 86.54032447238258 - type: euclidean_pearson value: 86.14157021537433 - type: euclidean_spearman value: 86.67537291929713 - type: manhattan_pearson value: 86.081041854808 - type: manhattan_spearman value: 86.61561701560558 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 81.34532445104026 - type: cos_sim_spearman value: 83.31325001474116 - type: euclidean_pearson value: 82.81892375201032 - type: euclidean_spearman value: 83.4521695148055 - type: manhattan_pearson value: 82.72503790526163 - type: manhattan_spearman value: 83.37833652941349 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: 
MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.25463453839801 - type: cos_sim_spearman value: 88.27655263515948 - type: euclidean_pearson value: 88.0248334411439 - type: euclidean_spearman value: 88.18141448876868 - type: manhattan_pearson value: 87.8080451127279 - type: manhattan_spearman value: 88.01028114423058 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.57551045355218 - type: cos_sim_spearman value: 66.67614095126629 - type: euclidean_pearson value: 66.0787243112528 - type: euclidean_spearman value: 66.83660560636939 - type: manhattan_pearson value: 66.74684019662031 - type: manhattan_spearman value: 67.11761598074368 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.70881496766829 - type: cos_sim_spearman value: 84.37803542941634 - type: euclidean_pearson value: 84.84501245857096 - type: euclidean_spearman value: 84.47088079741476 - type: manhattan_pearson value: 84.77244090794765 - type: manhattan_spearman value: 84.43307343706205 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 81.53946254759089 - type: mrr value: 94.68259953554072 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 51.817 - type: map_at_10 value: 62.339999999999996 - type: map_at_100 value: 62.88 - type: map_at_1000 value: 62.909000000000006 - type: map_at_3 value: 59.004 - type: map_at_5 value: 60.906000000000006 - type: mrr_at_1 value: 54.333 - type: mrr_at_10 value: 63.649 - type: mrr_at_100 value: 64.01 - type: mrr_at_1000 value: 64.039 - type: mrr_at_3 value: 61.056 - type: mrr_at_5 value: 62.639 - type: ndcg_at_1 value: 54.333 - type: ndcg_at_10 value: 67.509 - type: ndcg_at_100 value: 69.69999999999999 - type: ndcg_at_1000 value: 70.613 - type: ndcg_at_3 value: 61.729 - type: ndcg_at_5 value: 64.696 - type: precision_at_1 value: 54.333 - type: precision_at_10 value: 9.2 - type: precision_at_100 value: 1.043 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 24.0 - type: precision_at_5 value: 16.2 - type: recall_at_1 value: 51.817 - type: recall_at_10 value: 82.056 - type: recall_at_100 value: 91.667 - type: recall_at_1000 value: 99.0 - type: recall_at_3 value: 66.717 - type: recall_at_5 value: 74.17200000000001 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.82475247524752 - type: cos_sim_ap value: 95.4781199603258 - type: cos_sim_f1 value: 91.16186693147964 - type: cos_sim_precision value: 90.53254437869822 - type: cos_sim_recall value: 91.8 - type: dot_accuracy value: 99.75049504950495 - type: dot_ap value: 93.05183539809457 - type: dot_f1 value: 87.31117824773412 - type: dot_precision value: 87.93103448275862 - type: dot_recall value: 86.7 - type: euclidean_accuracy value: 99.82475247524752 - type: euclidean_ap value: 95.38547978154382 - type: 
euclidean_f1 value: 91.16325511732403 - type: euclidean_precision value: 91.02691924227318 - type: euclidean_recall value: 91.3 - type: manhattan_accuracy value: 99.82574257425742 - type: manhattan_ap value: 95.47237521890308 - type: manhattan_f1 value: 91.27849355797821 - type: manhattan_precision value: 90.47151277013754 - type: manhattan_recall value: 92.10000000000001 - type: max_accuracy value: 99.82574257425742 - type: max_ap value: 95.4781199603258 - type: max_f1 value: 91.27849355797821 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 57.542169376331245 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.74399302634387 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.65076347632749 - type: mrr value: 50.418099057804945 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 29.73997756592847 - type: cos_sim_spearman value: 29.465208011593308 - type: dot_pearson value: 24.83735342474541 - type: dot_spearman value: 26.005180528584855 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.208 - type: map_at_10 value: 1.434 - type: map_at_100 value: 7.829 - type: map_at_1000 value: 19.807 - type: map_at_3 value: 0.549 - type: map_at_5 value: 0.8330000000000001 - type: mrr_at_1 value: 78.0 - type: mrr_at_10 value: 85.35199999999999 - type: mrr_at_100 value: 85.673 - type: mrr_at_1000 value: 85.673 - type: mrr_at_3 value: 84.667 - type: mrr_at_5 value: 85.06700000000001 - type: ndcg_at_1 value: 72.0 - type: ndcg_at_10 value: 59.214999999999996 - type: ndcg_at_100 value: 44.681 - type: ndcg_at_1000 value: 43.035000000000004 - type: ndcg_at_3 value: 66.53099999999999 - type: ndcg_at_5 value: 63.23 - type: precision_at_1 value: 78.0 - type: precision_at_10 value: 62.4 - type: precision_at_100 value: 45.76 - type: precision_at_1000 value: 19.05 - type: precision_at_3 value: 71.333 - type: precision_at_5 value: 67.2 - type: recall_at_1 value: 0.208 - type: recall_at_10 value: 1.6580000000000001 - type: recall_at_100 value: 11.324 - type: recall_at_1000 value: 41.537 - type: recall_at_3 value: 0.579 - type: recall_at_5 value: 0.8959999999999999 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.442 - type: map_at_10 value: 8.863 - type: map_at_100 value: 14.606 - type: map_at_1000 value: 16.258 - type: map_at_3 value: 4.396 - type: map_at_5 value: 6.199000000000001 - type: mrr_at_1 value: 30.612000000000002 - type: mrr_at_10 value: 43.492 - type: mrr_at_100 value: 44.557 - type: mrr_at_1000 value: 44.557 - type: mrr_at_3 value: 40.816 - type: mrr_at_5 value: 42.143 - type: ndcg_at_1 value: 25.509999999999998 - type: ndcg_at_10 value: 22.076 - type: ndcg_at_100 value: 34.098 - type: ndcg_at_1000 value: 46.265 - type: ndcg_at_3 value: 24.19 - type: 
ndcg_at_5 value: 23.474 - type: precision_at_1 value: 30.612000000000002 - type: precision_at_10 value: 19.796 - type: precision_at_100 value: 7.286 - type: precision_at_1000 value: 1.5310000000000001 - type: precision_at_3 value: 25.85 - type: precision_at_5 value: 24.490000000000002 - type: recall_at_1 value: 2.442 - type: recall_at_10 value: 15.012 - type: recall_at_100 value: 45.865 - type: recall_at_1000 value: 82.958 - type: recall_at_3 value: 5.731 - type: recall_at_5 value: 9.301 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.974 - type: ap value: 14.534996211286682 - type: f1 value: 54.785946183399005 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 58.56819468024901 - type: f1 value: 58.92391487111204 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 43.273202335218194 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 84.37742146986946 - type: cos_sim_ap value: 68.1684129575579 - type: cos_sim_f1 value: 64.93475108748189 - type: cos_sim_precision value: 59.89745876058849 - type: cos_sim_recall value: 70.89709762532982 - type: dot_accuracy value: 80.49710913750968 - type: dot_ap value: 54.699790073944186 - type: dot_f1 value: 54.45130013221684 - type: dot_precision value: 46.74612183125236 - type: dot_recall value: 65.19788918205805 - type: euclidean_accuracy value: 84.5085533766466 - type: euclidean_ap value: 68.38835695236224 - type: euclidean_f1 value: 65.3391121002694 - type: euclidean_precision value: 58.75289656625237 - type: euclidean_recall value: 73.58839050131925 - type: manhattan_accuracy value: 84.40126363473803 - type: manhattan_ap value: 68.09539181555348 - type: manhattan_f1 value: 64.99028182701653 - type: manhattan_precision value: 60.22062134173795 - type: manhattan_recall value: 70.58047493403694 - type: max_accuracy value: 84.5085533766466 - type: max_ap value: 68.38835695236224 - type: max_f1 value: 65.3391121002694 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.34167733923235 - type: cos_sim_ap value: 84.84136381147736 - type: cos_sim_f1 value: 77.01434980904001 - type: cos_sim_precision value: 74.27937915742794 - type: cos_sim_recall value: 79.95842315983985 - type: dot_accuracy value: 85.06422944075756 - type: dot_ap value: 76.49446747522325 - type: dot_f1 value: 71.11606520830432 - type: dot_precision value: 64.93638676844785 - type: dot_recall value: 78.59562673236834 - type: euclidean_accuracy value: 88.45810532852097 - type: euclidean_ap value: 84.91526721863501 - type: euclidean_f1 value: 77.04399001750662 - type: euclidean_precision value: 74.62298867162133 - type: euclidean_recall value: 79.62734832152756 - type: manhattan_accuracy 
value: 88.46004579500912 - type: manhattan_ap value: 84.81590026238194 - type: manhattan_f1 value: 76.97804626491822 - type: manhattan_precision value: 73.79237288135593 - type: manhattan_recall value: 80.45118570988605 - type: max_accuracy value: 88.46004579500912 - type: max_ap value: 84.91526721863501 - type: max_f1 value: 77.04399001750662 pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb --- # {gte-tiny} This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. It is distilled from , with comparable (slightly worse) performance at around half the size. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Full Model Architecture ## Citing & Authors " +} \ No newline at end of file diff --git a/data/model_data_json/TheBloke_Llama-2-7B-Chat-GGUF.json b/data/model_data_json/TheBloke_Llama-2-7B-Chat-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..d7a83f026c1266fff0e5d49b7cce4d6661d7ed10 --- /dev/null +++ b/data/model_data_json/TheBloke_Llama-2-7B-Chat-GGUF.json @@ -0,0 +1,21 @@ +{ + "model_id": "TheBloke/Llama-2-7B-Chat-GGUF", + "downloads": 83135, + "tags": [ + "transformers", + "gguf", + "llama", + "facebook", + "meta", + "pytorch", + "llama-2", + "text-generation", + "en", + "arxiv:2307.09288", + "base_model:meta-llama/Llama-2-7b-chat-hf", + "base_model:quantized:meta-llama/Llama-2-7b-chat-hf", + "license:llama2", + "region:us" + ], + "description": "--- language: - en license: llama2 tags: - facebook - meta - pytorch - llama - llama-2 model_name: Llama 2 7B Chat arxiv: 2307.09288 base_model: meta-llama/Llama-2-7b-chat-hf inference: false model_creator: Meta Llama 2 model_type: llama pipeline_tag: text-generation prompt_template: '[INST] <> You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don''t know the answer to a question, please don''t share false information. <> {prompt}[/INST] ' quantized_by: TheBloke ---


# Llama 2 7B Chat - GGUF
- Model creator: Meta Llama 2
- Original model: Llama 2 7B Chat

## Description
This repo contains GGUF format model files for Meta Llama 2's Llama 2 7B Chat.

### About GGUF
GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata, and is designed to be extensible.

Here is an incomplete list of clients and libraries that are known to support GGUF:
* llama.cpp. The source project for GGUF. Offers a CLI and a server option.
* text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
* KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling.
* LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
* LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection.
* Faraday.dev, an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
* ctransformers, a Python library with GPU accel, LangChain support, and an OpenAI-compatible AI server.
* llama-cpp-python, a Python library with GPU accel, LangChain support, and an OpenAI-compatible API server.
* candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use.

## Repositories available
* AWQ model(s) for GPU inference.
* GPTQ models for GPU inference, with multiple quantisation parameter options.
* 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference
* Meta Llama 2's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions

## Prompt template: Llama-2-Chat

## Compatibility
These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit d0cee0d36d5be95a0d9088b674dbb27354107221. They are also compatible with many third-party UIs and libraries - please see the list at the top of this README.

## Explanation of quantisation methods
Click to see details

The new methods available are:
* GGML_TYPE_Q2_K - \"type-1\" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
* GGML_TYPE_Q3_K - \"type-0\" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
* GGML_TYPE_Q4_K - \"type-1\" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
* GGML_TYPE_Q5_K - \"type-1\" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
* GGML_TYPE_Q6_K - \"type-0\" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.

Refer to the Provided Files table below to see what files use which methods, and how.
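As a quick sanity check, the bpw figures quoted above can be reproduced from the super-block layout. The sketch below is illustrative, not llama.cpp code; it assumes one fp16 super-block scale (plus an fp16 super-block min for the \"type-1\" variants) on top of the per-block scales/mins.

```python
# Illustrative bit accounting for one super-block (not llama.cpp's actual structs).
def bpw(bits_per_weight, blocks, weights_per_block, scale_bits, min_bits=0, fp16_fields=1):
    # quantized weight data + per-block scales/mins + fp16 super-block field(s)
    weights = blocks * weights_per_block
    data = weights * bits_per_weight
    overhead = blocks * (scale_bits + min_bits) + 16 * fp16_fields
    return (data + overhead) / weights

# \"type-1\" Q4_K: 8 blocks x 32 weights, 6-bit scales and mins, fp16 scale + fp16 min
print(bpw(4, 8, 32, 6, min_bits=6, fp16_fields=2))  # -> 4.5
# \"type-0\" Q6_K: 16 blocks x 16 weights, 8-bit scales, one fp16 scale
print(bpw(6, 16, 16, 8))                            # -> 6.5625
```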
## Provided files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| llama-2-7b-chat.Q2_K.gguf | Q2_K | 2 | 2.83 GB | 5.33 GB | smallest, significant quality loss - not recommended for most purposes |
| llama-2-7b-chat.Q3_K_S.gguf | Q3_K_S | 3 | 2.95 GB | 5.45 GB | very small, high quality loss |
| llama-2-7b-chat.Q3_K_M.gguf | Q3_K_M | 3 | 3.30 GB | 5.80 GB | very small, high quality loss |
| llama-2-7b-chat.Q3_K_L.gguf | Q3_K_L | 3 | 3.60 GB | 6.10 GB | small, substantial quality loss |
| llama-2-7b-chat.Q4_0.gguf | Q4_0 | 4 | 3.83 GB | 6.33 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| llama-2-7b-chat.Q4_K_S.gguf | Q4_K_S | 4 | 3.86 GB | 6.36 GB | small, greater quality loss |
| llama-2-7b-chat.Q4_K_M.gguf | Q4_K_M | 4 | 4.08 GB | 6.58 GB | medium, balanced quality - recommended |
| llama-2-7b-chat.Q5_0.gguf | Q5_0 | 5 | 4.65 GB | 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| llama-2-7b-chat.Q5_K_S.gguf | Q5_K_S | 5 | 4.65 GB | 7.15 GB | large, low quality loss - recommended |
| llama-2-7b-chat.Q5_K_M.gguf | Q5_K_M | 5 | 4.78 GB | 7.28 GB | large, very low quality loss - recommended |
| llama-2-7b-chat.Q6_K.gguf | Q6_K | 6 | 5.53 GB | 8.03 GB | very large, extremely low quality loss |
| llama-2-7b-chat.Q8_0.gguf | Q8_0 | 8 | 7.16 GB | 9.66 GB | very large, extremely low quality loss - not recommended |

**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.

## How to download GGUF files

**Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.

The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
- LM Studio
- LoLLMS Web UI
- Faraday.dev

### In Under Download Model, you can enter the model repo: TheBloke/Llama-2-7b-Chat-GGUF and below it, a specific filename to download, such as: llama-2-7b-chat.Q4_K_M.gguf. Then click Download.

### On the command line, including multiple files at once
I recommend using the Python library: Then you can download any individual model file to the current directory, at high speed, with a command like this:
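For example, a minimal sketch using the huggingface_hub Python library (the repo and filename are taken from the table above; hf_hub_download is the library's standard single-file download helper):

```python
# Sketch: fetch one quant file rather than cloning the whole repo.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id='TheBloke/Llama-2-7B-Chat-GGUF',
    filename='llama-2-7b-chat.Q4_K_M.gguf',
    local_dir='.',  # download into the current directory
)
print(path)
```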
More advanced huggingface-cli download usage You can also download multiple files at once with a pattern: For more documentation on downloading with , please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install : And set environment variable to : Windows CLI users: Use before running the download command.
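A Python equivalent of the pattern-based download described above can be sketched with huggingface_hub's snapshot_download; the glob patterns here are illustrative:

```python
# Sketch: grab every file matching a pattern (e.g. one quant plus any JSON sidecars).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='TheBloke/Llama-2-7B-Chat-GGUF',
    allow_patterns=['*.Q4_K_M.gguf', '*.json'],  # illustrative patterns
    local_dir='.',
)
```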
## Example command
Make sure you are using from commit d0cee0d36d5be95a0d9088b674dbb27354107221 or later. Change to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration. Change to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. If you want to have a chat-style conversation, replace the argument with For other parameters and how to use them, please refer to the llama.cpp documentation.

## How to run in
Further instructions here: text-generation-webui/docs/llama.cpp.md.

## How to run from Python code
You can use GGUF models from Python using the llama-cpp-python or ctransformers libraries.

### How to load this model from Python using ctransformers
#### First install the package
#### Simple example code to load one of these GGUF models

## How to use with LangChain
Here are guides on using llama-cpp-python or ctransformers with LangChain:
* LangChain + llama-cpp-python
* LangChain + ctransformers

## Discord
For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server

## Thanks, and how to contribute
Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org! I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training. If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
* Patreon:
* Ko-Fi:

**Special thanks to**: Aemon Algiz.

**Patreon special mentions**: Alicia Loh, Stephen Murray, K, Ajan Kanaga, RoA, Magnesian, Deo Leter, Olakabola, Eugene Pentland, zynix, Deep Realms, Raymond Fosdick, Elijah Stavena, Iucharbius, Erik Bjäreholt, Luis Javier Navarrete Lozano, Nicholas, theTransient, John Detwiler, alfie_i, knownsqashed, Mano Prime, Willem Michiel, Enrico Ros, LangChain4j, OG, Michael Dempsey, Pierre Kircher, Pedro Madruga, James Bentley, Thomas Belote, Luke @flexchar, Leonard Tan, Johann-Peter Hartmann, Illia Dulskyi, Fen Risland, Chadd, S_X, Jeff Scroggin, Ken Nordquist, Sean Connelly, Artur Olbinski, Swaroop Kallakuri, Jack West, Ai Maven, David Ziegler, Russ Johnson, transmissions 11, John Villwock, Alps Aficionado, Clay Pascal, Viktor Bowallius, Subspace Studios, Rainer Wilmers, Trenton Dambrowitz, vamX, Michael Levine, 준교 김, Brandon Frisco, Kalila, Trailburnt, Randy H, Talal Aujan, Nathan Dryer, Vadim, 阿明, ReadyPlayerEmma, Tiffany J. Kim, George Stoitzev, Spencer Kim, Jerry Meng, Gabriel Tamborski, Cory Kujawski, Jeffrey Morgan, Spiking Neurons AB, Edmond Seymore, Alexandros Triantafyllidis, Lone Striker, Cap'n Zoog, Nikolai Manek, danny, ya boyyy, Derek Yates, usrbinkat, Mandus, TL, Nathan LeClaire, subjectnull, Imad Khwaja, webtim, Raven Klaugh, Asp the Wyvern, Gabriel Puliatti, Caitlyn Gatomon, Joseph William Delisle, Jonathan Leane, Luke Pendergrass, SuperWojo, Sebastain Graf, Will Dee, Fred von Graf, Andrey, Dan Guido, Daniel P. Andersen, Nitin Borwankar, Elle, Vitor Caleffi, biorpg, jjj, NimbleBox.ai, Pieter, Matthew Berman, terasurfer, Michael Davis, Alex, Stanislav Ovsiannikov

Thank you to all my generous patrons and donaters!
And thank you again to a16z for their generous grant.

# Original model card: Meta Llama 2's Llama 2 7B Chat

# **Llama 2**
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom.

## Model Details
*Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the website and accept our License before requesting access here.*

Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM.

**Model Developers** Meta

**Variations** Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations.

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

| | Training Data | Params | Content Length | GQA | Tokens | LR |
|---|---|---|---|---|---|---|
| Llama 2 | *A new mix of publicly available online data* | 7B | 4k | ✗ | 2.0T | 3.0 x 10^-4 |
| Llama 2 | *A new mix of publicly available online data* | 13B | 4k | ✗ | 2.0T | 3.0 x 10^-4 |
| Llama 2 | *A new mix of publicly available online data* | 70B | 4k | ✔ | 2.0T | 1.5 x 10^-4 |

*Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens. Bigger models (70B) use Grouped-Query Attention (GQA) for improved inference scalability.

**Model Dates** Llama 2 was trained between January 2023 and July 2023.

**Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.

**License** A custom commercial license is available at:

**Research Paper** \"Llama-2: Open Foundation and Fine-tuned Chat Models\"

## Intended Use
**Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. To get the expected features and performance for the chat versions, a specific formatting needs to be followed, including the and tags, and tokens, and the whitespaces and breaklines in between (we recommend calling on inputs to avoid double-spaces). See our reference code in GitHub for details.

**Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.
## Hardware and Software
**Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.

**Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program.

| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Llama 2 7B | 184320 | 400 | 31.22 |
| Llama 2 13B | 368640 | 400 | 62.44 |
| Llama 2 70B | 1720320 | 400 | 291.42 |
| Total | 3311616 | | 539.00 |

**CO2 emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others.

## Training Data
**Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data.

**Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023.

## Evaluation Results
In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library.

| Model | Size | Code | Commonsense Reasoning | World Knowledge | Reading Comprehension | Math | MMLU | BBH | AGI Eval |
|---|---|---|---|---|---|---|---|---|---|
| Llama 1 | 7B | 14.1 | 60.8 | 46.2 | 58.5 | 6.95 | 35.1 | 30.3 | 23.9 |
| Llama 1 | 13B | 18.9 | 66.1 | 52.6 | 62.3 | 10.9 | 46.9 | 37.0 | 33.9 |
| Llama 1 | 33B | 26.0 | 70.0 | 58.4 | 67.6 | 21.4 | 57.8 | 39.8 | 41.7 |
| Llama 1 | 65B | 30.7 | 70.7 | 60.5 | 68.6 | 30.8 | 63.4 | 43.5 | 47.6 |
| Llama 2 | 7B | 16.8 | 63.9 | 48.9 | 61.3 | 14.6 | 45.3 | 32.6 | 29.3 |
| Llama 2 | 13B | 24.5 | 66.9 | 55.4 | 65.8 | 28.7 | 54.8 | 39.4 | 39.1 |
| Llama 2 | 70B | **37.5** | **71.9** | **63.6** | **69.4** | **35.2** | **68.9** | **51.2** | **54.2** |

**Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *MATH:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1.

| | | TruthfulQA | Toxigen |
|---|---|---|---|
| Llama 1 | 7B | 27.42 | 23.00 |
| Llama 1 | 13B | 41.74 | 23.08 |
| Llama 1 | 33B | 44.19 | 22.57 |
| Llama 1 | 65B | 48.71 | 21.77 |
| Llama 2 | 7B | 33.29 | **21.25** |
| Llama 2 | 13B | 41.86 | 26.10 |
| Llama 2 | 70B | **50.18** | 24.60 |

**Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better).
| | | TruthfulQA | Toxigen |
|---|---|---|---|
| Llama-2-Chat | 7B | 57.04 | **0.00** |
| Llama-2-Chat | 13B | 62.18 | **0.00** |
| Llama-2-Chat | 70B | **64.14** | 0.01 |

**Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above.

## Ethical Considerations and Limitations
Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see the Responsible Use Guide available at

## Reporting Issues
Please report any software “bug,” or other problems with the models through one of the following means:
- Reporting issues with the model: github.com/facebookresearch/llama
- Reporting problematic content generated by the model: developers.facebook.com/llama_output_feedback
- Reporting bugs and security concerns: facebook.com/whitehat/info

## Llama Model Index
| Model | Llama2 | Llama2-hf | Llama2-chat | Llama2-chat-hf |
|---|---|---|---|---|
| 7B | Link | Link | Link | Link |
| 13B | Link | Link | Link | Link |
| 70B | Link | Link | Link | Link |
" +} \ No newline at end of file diff --git a/data/model_data_json/TheBloke_Mistral-7B-Instruct-v0.1-GGUF.json b/data/model_data_json/TheBloke_Mistral-7B-Instruct-v0.1-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..e60a19fc0298bce31986c067bf1682a710a07274 --- /dev/null +++ b/data/model_data_json/TheBloke_Mistral-7B-Instruct-v0.1-GGUF.json @@ -0,0 +1,17 @@ +{ + "model_id": "TheBloke/Mistral-7B-Instruct-v0.1-GGUF", + "downloads": 329540, + "tags": [ + "transformers", + "gguf", + "mistral", + "finetuned", + "text-generation", + "base_model:mistralai/Mistral-7B-Instruct-v0.1", + "base_model:quantized:mistralai/Mistral-7B-Instruct-v0.1", + "license:apache-2.0", + "region:us" + ], + "description": "--- base_model: mistralai/Mistral-7B-Instruct-v0.1 inference: false license: apache-2.0 model_creator: Mistral AI model_name: Mistral 7B Instruct v0.1 model_type: mistral pipeline_tag: text-generation prompt_template: '[INST]{prompt} [/INST] ' quantized_by: TheBloke ---
\"TheBlokeAI\"

# Mistral 7B Instruct v0.1 - GGUF - Model creator: Mistral AI - Original model: Mistral 7B Instruct v0.1 ## Description This repo contains GGUF format model files for Mistral AI's Mistral 7B Instruct v0.1. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. ## Repositories available * AWQ model(s) for GPU inference. * GPTQ models for GPU inference, with multiple quantisation parameter options. * 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference * Mistral AI's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions ## Prompt template: Mistral ## Compatibility These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit d0cee0d They are also compatible with many third party UIs and libraries - please see the list at the top of this README. Sequence length note: The model will work at sequence lengths of 4096, or lower. GGUF does not yet have support for the new sliding window sequence length mode, so longer sequence lengths are not supported. ## Explanation of quantisation methods
Click to see details The new methods available are: * GGML_TYPE_Q2_K - \"type-1\" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw) * GGML_TYPE_Q3_K - \"type-0\" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw. * GGML_TYPE_Q4_K - \"type-1\" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw. * GGML_TYPE_Q5_K - \"type-1\" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K resulting in 5.5 bpw * GGML_TYPE_Q6_K - \"type-0\" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw Refer to the Provided Files table below to see what files use which methods, and how.
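The bits-per-weight figures above can be cross-checked from the stated block layouts. A minimal sketch, under one assumption of mine that is not in the card: each 256-weight super-block also stores fp16 constants, one for \"type-0\" layouts and two (scale and min) for \"type-1\".

```python
# Hedged sketch: reproduce the quoted bpw figures from the block layouts.
def bpw(bits, n_blocks, block_size, scale_bits, type1):
    n_weights = n_blocks * block_size                     # weights per super-block
    total = n_weights * bits                              # the quantized weights
    total += n_blocks * scale_bits * (2 if type1 else 1)  # per-block scales (and mins)
    total += 16 * (2 if type1 else 1)                     # assumed fp16 super-block constants
    return total / n_weights

print(bpw(3, 16, 16, 6, type1=False))  # Q3_K -> 3.4375
print(bpw(4, 8, 32, 6, type1=True))    # Q4_K -> 4.5
print(bpw(5, 8, 32, 6, type1=True))    # Q5_K -> 5.5
print(bpw(6, 16, 16, 8, type1=False))  # Q6_K -> 6.5625
```

Q2_K comes out slightly high under this simplification (2.625 versus the quoted 2.5625), so its actual layout evidently packs a detail this sketch ignores.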
## Provided files | Name | Quant method | Bits | Size | Max RAM required | Use case | | ---- | ---- | ---- | ---- | ---- | ----- | | mistral-7b-instruct-v0.1.Q2_K.gguf | Q2_K | 2 | 3.08 GB| 5.58 GB | smallest, significant quality loss - not recommended for most purposes | | mistral-7b-instruct-v0.1.Q3_K_S.gguf | Q3_K_S | 3 | 3.16 GB| 5.66 GB | very small, high quality loss | | mistral-7b-instruct-v0.1.Q3_K_M.gguf | Q3_K_M | 3 | 3.52 GB| 6.02 GB | very small, high quality loss | | mistral-7b-instruct-v0.1.Q3_K_L.gguf | Q3_K_L | 3 | 3.82 GB| 6.32 GB | small, substantial quality loss | | mistral-7b-instruct-v0.1.Q4_0.gguf | Q4_0 | 4 | 4.11 GB| 6.61 GB | legacy; small, very high quality loss - prefer using Q3_K_M | | mistral-7b-instruct-v0.1.Q4_K_S.gguf | Q4_K_S | 4 | 4.14 GB| 6.64 GB | small, greater quality loss | | mistral-7b-instruct-v0.1.Q4_K_M.gguf | Q4_K_M | 4 | 4.37 GB| 6.87 GB | medium, balanced quality - recommended | | mistral-7b-instruct-v0.1.Q5_0.gguf | Q5_0 | 5 | 5.00 GB| 7.50 GB | legacy; medium, balanced quality - prefer using Q4_K_M | | mistral-7b-instruct-v0.1.Q5_K_S.gguf | Q5_K_S | 5 | 5.00 GB| 7.50 GB | large, low quality loss - recommended | | mistral-7b-instruct-v0.1.Q5_K_M.gguf | Q5_K_M | 5 | 5.13 GB| 7.63 GB | large, very low quality loss - recommended | | mistral-7b-instruct-v0.1.Q6_K.gguf | Q6_K | 6 | 5.94 GB| 8.44 GB | very large, extremely low quality loss | | mistral-7b-instruct-v0.1.Q8_0.gguf | Q8_0 | 8 | 7.70 GB| 10.20 GB | very large, extremely low quality loss - not recommended | **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead. ## How to download GGUF files **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file. The following clients/libraries will automatically download models for you, providing a list of available models to choose from: - LM Studio - LoLLMS Web UI - Faraday.dev ### In Under Download Model, you can enter the model repo: TheBloke/Mistral-7B-Instruct-v0.1-GGUF and below it, a specific filename to download, such as: mistral-7b-instruct-v0.1.Q4_K_M.gguf. Then click Download. ### On the command line, including multiple files at once I recommend using the Python library: Then you can download any individual model file to the current directory, at high speed, with a command like this:
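The library name is elided in this dump; assuming it refers to the standard huggingface_hub package (the collapsed section below is titled for huggingface-cli, its CLI front-end), a minimal sketch for fetching a single quant file:

```python
# Hedged sketch: download one GGUF file instead of cloning the whole repo.
from huggingface_hub import hf_hub_download  # pip3 install huggingface-hub

path = hf_hub_download(
    repo_id='TheBloke/Mistral-7B-Instruct-v0.1-GGUF',
    filename='mistral-7b-instruct-v0.1.Q4_K_M.gguf',
    local_dir='.',
)
print(path)
```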
More advanced huggingface-cli download usage You can also download multiple files at once with a pattern: For more documentation on downloading with , please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install : And set environment variable to : Windows Command Line users: You can set the environment variable by running before the download command.
## Example command Make sure you are using from commit d0cee0d or later. Change to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration. Sequence length can be 4096 or lower. Mistral's sliding window sequence length is not yet supported in llama.cpp, so do not use sequence lengths longer than 4096. If you want to have a chat-style conversation, replace the argument with For other parameters and how to use them, please refer to the llama.cpp documentation ## How to run in Further instructions here: text-generation-webui/docs/llama.cpp.md. ## How to run from Python code You can use GGUF models from Python using the llama-cpp-python or ctransformers libraries. ### How to load this model in Python code, using ctransformers I have not tested ctransformers with Mistral models. It may work, but will require that you set the to for now, until ctransformers updates with specific support. #### First install the package Run one of the following commands, according to your system: #### Simple ctransformers example code ## How to use with LangChain Here are guides on using llama-cpp-python and ctransformers with LangChain: * LangChain + llama-cpp-python * LangChain + ctransformers ## Discord For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server ## Thanks, and how to contribute Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org! I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training. If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits. * Patreon: * Ko-Fi: **Special thanks to**: Aemon Algiz. **Patreon special mentions**: Alicia Loh, Stephen Murray, K, Ajan Kanaga, RoA, Magnesian, Deo Leter, Olakabola, Eugene Pentland, zynix, Deep Realms, Raymond Fosdick, Elijah Stavena, Iucharbius, Erik Bjäreholt, Luis Javier Navarrete Lozano, Nicholas, theTransient, John Detwiler, alfie_i, knownsqashed, Mano Prime, Willem Michiel, Enrico Ros, LangChain4j, OG, Michael Dempsey, Pierre Kircher, Pedro Madruga, James Bentley, Thomas Belote, Luke @flexchar, Leonard Tan, Johann-Peter Hartmann, Illia Dulskyi, Fen Risland, Chadd, S_X, Jeff Scroggin, Ken Nordquist, Sean Connelly, Artur Olbinski, Swaroop Kallakuri, Jack West, Ai Maven, David Ziegler, Russ Johnson, transmissions 11, John Villwock, Alps Aficionado, Clay Pascal, Viktor Bowallius, Subspace Studios, Rainer Wilmers, Trenton Dambrowitz, vamX, Michael Levine, 준교 김, Brandon Frisco, Kalila, Trailburnt, Randy H, Talal Aujan, Nathan Dryer, Vadim, 阿明, ReadyPlayerEmma, Tiffany J. Kim, George Stoitzev, Spencer Kim, Jerry Meng, Gabriel Tamborski, Cory Kujawski, Jeffrey Morgan, Spiking Neurons AB, Edmond Seymore, Alexandros Triantafyllidis, Lone Striker, Cap'n Zoog, Nikolai Manek, danny, ya boyyy, Derek Yates, usrbinkat, Mandus, TL, Nathan LeClaire, subjectnull, Imad Khwaja, webtim, Raven Klaugh, Asp the Wyvern, Gabriel Puliatti, Caitlyn Gatomon, Joseph William Delisle, Jonathan Leane, Luke Pendergrass, SuperWojo, Sebastain Graf, Will Dee, Fred von Graf, Andrey, Dan Guido, Daniel P. 
Andersen, Nitin Borwankar, Elle, Vitor Caleffi, biorpg, jjj, NimbleBox.ai, Pieter, Matthew Berman, terasurfer, Michael Davis, Alex, Stanislav Ovsiannikov Thank you to all my generous patrons and donaters! And thank you again to a16z for their generous grant. # Original model card: Mistral AI's Mistral 7B Instruct v0.1 # Model Card for Mistral-7B-Instruct-v0.1 The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets. For full details of this model please read our release blog post. ## Instruction format In order to leverage instruction fine-tuning, your prompt should be surrounded by and tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id. E.g. ## Model Architecture This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices: - Grouped-Query Attention - Sliding-Window Attention - Byte-fallback BPE tokenizer ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed. ", "model_explanation_gemini": "A finetuned, quantized 7B parameter Mistral model optimized for instruction-following text generation tasks in GGUF format." +} \ No newline at end of file diff --git a/data/model_data_json/TheBloke_Mistral-7B-Instruct-v0.2-GGUF.json b/data/model_data_json/TheBloke_Mistral-7B-Instruct-v0.2-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..a28fcf35cc9511dd0511eacaa4ef8de2ad65c3f4 --- /dev/null +++ b/data/model_data_json/TheBloke_Mistral-7B-Instruct-v0.2-GGUF.json @@ -0,0 +1,19 @@ +{ + "model_id": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF", + "downloads": 90843, + "tags": [ + "transformers", + "gguf", + "mistral", + "finetuned", + "text-generation", + "arxiv:2310.06825", + "base_model:mistralai/Mistral-7B-Instruct-v0.2", + "base_model:quantized:mistralai/Mistral-7B-Instruct-v0.2", + "license:apache-2.0", + "region:us", + "conversational" + ], + "description": "--- base_model: mistralai/Mistral-7B-Instruct-v0.2 inference: false license: apache-2.0 model_creator: Mistral AI_ model_name: Mistral 7B Instruct v0.2 model_type: mistral pipeline_tag: text-generation prompt_template: '[INST] {prompt} [/INST] ' quantized_by: TheBloke tags: - finetuned ---
\"TheBlokeAI\"

# Mistral 7B Instruct v0.2 - GGUF - Model creator: Mistral AI_ - Original model: Mistral 7B Instruct v0.2 ## Description This repo contains GGUF format model files for Mistral AI_'s Mistral 7B Instruct v0.2. These files were quantised using hardware kindly provided by Massed Compute. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Repositories available * AWQ model(s) for GPU inference. * GPTQ models for GPU inference, with multiple quantisation parameter options. * 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference * Mistral AI_'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions ## Prompt template: Mistral ## Compatibility These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit d0cee0d They are also compatible with many third party UIs and libraries - please see the list at the top of this README. ## Explanation of quantisation methods
Click to see details The new methods available are: * GGML_TYPE_Q2_K - \"type-1\" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw) * GGML_TYPE_Q3_K - \"type-0\" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw. * GGML_TYPE_Q4_K - \"type-1\" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw. * GGML_TYPE_Q5_K - \"type-1\" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K resulting in 5.5 bpw * GGML_TYPE_Q6_K - \"type-0\" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw Refer to the Provided Files table below to see what files use which methods, and how.
## Provided files | Name | Quant method | Bits | Size | Max RAM required | Use case | | ---- | ---- | ---- | ---- | ---- | ----- | | mistral-7b-instruct-v0.2.Q2_K.gguf | Q2_K | 2 | 3.08 GB| 5.58 GB | smallest, significant quality loss - not recommended for most purposes | | mistral-7b-instruct-v0.2.Q3_K_S.gguf | Q3_K_S | 3 | 3.16 GB| 5.66 GB | very small, high quality loss | | mistral-7b-instruct-v0.2.Q3_K_M.gguf | Q3_K_M | 3 | 3.52 GB| 6.02 GB | very small, high quality loss | | mistral-7b-instruct-v0.2.Q3_K_L.gguf | Q3_K_L | 3 | 3.82 GB| 6.32 GB | small, substantial quality loss | | mistral-7b-instruct-v0.2.Q4_0.gguf | Q4_0 | 4 | 4.11 GB| 6.61 GB | legacy; small, very high quality loss - prefer using Q3_K_M | | mistral-7b-instruct-v0.2.Q4_K_S.gguf | Q4_K_S | 4 | 4.14 GB| 6.64 GB | small, greater quality loss | | mistral-7b-instruct-v0.2.Q4_K_M.gguf | Q4_K_M | 4 | 4.37 GB| 6.87 GB | medium, balanced quality - recommended | | mistral-7b-instruct-v0.2.Q5_0.gguf | Q5_0 | 5 | 5.00 GB| 7.50 GB | legacy; medium, balanced quality - prefer using Q4_K_M | | mistral-7b-instruct-v0.2.Q5_K_S.gguf | Q5_K_S | 5 | 5.00 GB| 7.50 GB | large, low quality loss - recommended | | mistral-7b-instruct-v0.2.Q5_K_M.gguf | Q5_K_M | 5 | 5.13 GB| 7.63 GB | large, very low quality loss - recommended | | mistral-7b-instruct-v0.2.Q6_K.gguf | Q6_K | 6 | 5.94 GB| 8.44 GB | very large, extremely low quality loss | | mistral-7b-instruct-v0.2.Q8_0.gguf | Q8_0 | 8 | 7.70 GB| 10.20 GB | very large, extremely low quality loss - not recommended | **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead. ## How to download GGUF files **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file. The following clients/libraries will automatically download models for you, providing a list of available models to choose from: * LM Studio * LoLLMS Web UI * Faraday.dev ### In Under Download Model, you can enter the model repo: TheBloke/Mistral-7B-Instruct-v0.2-GGUF and below it, a specific filename to download, such as: mistral-7b-instruct-v0.2.Q4_K_M.gguf. Then click Download. ### On the command line, including multiple files at once I recommend using the Python library: Then you can download any individual model file to the current directory, at high speed, with a command like this:
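The same download pattern as the v0.1 repo above applies here; a hedged sketch, again assuming the elided library is huggingface_hub:

```python
# Hedged sketch: fetch a single quant file for the v0.2 repo.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id='TheBloke/Mistral-7B-Instruct-v0.2-GGUF',
    filename='mistral-7b-instruct-v0.2.Q4_K_M.gguf',
    local_dir='.',
)
```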
More advanced huggingface-cli download usage (click to read) You can also download multiple files at once with a pattern: For more documentation on downloading with , please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install : And set environment variable to : Windows Command Line users: You can set the environment variable by running before the download command.
## Example command Make sure you are using from commit d0cee0d or later. Change to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration. Change to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value. If you want to have a chat-style conversation, replace the argument with For other parameters and how to use them, please refer to the llama.cpp documentation ## How to run in Further instructions can be found in the text-generation-webui documentation, here: text-generation-webui/docs/04 ‐ Model Tab.md. ## How to run from Python code You can use GGUF models from Python using the llama-cpp-python or ctransformers libraries. Note that at the time of writing (Nov 27th 2023), ctransformers has not been updated for some time and is not compatible with some recent models. Therefore I recommend you use llama-cpp-python. ### How to load this model in Python code, using llama-cpp-python For full documentation, please see: llama-cpp-python docs. #### First install the package Run one of the following commands, according to your system: #### Simple llama-cpp-python example code ## How to use with LangChain Here are guides on using llama-cpp-python and ctransformers with LangChain: * LangChain + llama-cpp-python * LangChain + ctransformers ## Discord For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server ## Thanks, and how to contribute Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org! I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training. If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits. * Patreon: * Ko-Fi: **Special thanks to**: Aemon Algiz. 
**Patreon special mentions**: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros Thank you to all my generous patrons and donaters! And thank you again to a16z for their generous grant. # Original model card: Mistral AI_'s Mistral 7B Instruct v0.2 # Model Card for Mistral-7B-Instruct-v0.2 The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. For full details of this model please read our paper and release blog post. ## Instruction format In order to leverage instruction fine-tuning, your prompt should be surrounded by and tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id. E.g. This format is available as a chat template via the method: ## Model Architecture This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices: - Grouped-Query Attention - Sliding-Window Attention - Byte-fallback BPE tokenizer ## Troubleshooting - If you see the following error: Installing transformers from source should solve the issue pip install git+ This should not be required after transformers-v4.33.4. ## Limitations The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed. 
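As an addendum to the Instruction format section above: the chat-template method the card refers to is presumably Transformers' apply_chat_template. A minimal sketch, with illustrative message contents of my own:

```python
# Hedged sketch: build a [INST] ... [/INST] prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
messages = [
    {'role': 'user', 'content': 'What is your favourite condiment?'},
    {'role': 'assistant', 'content': 'A good squeeze of fresh lemon juice.'},
    {'role': 'user', 'content': 'Do you have mayonnaise recipes?'},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # BOS, then alternating [INST] ... [/INST] turns as described above
```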
", + "model_explanation_gemini": "A 7B-parameter instruction-tuned Mistral model quantized in GGUF format for efficient text generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/TheBloke_Mixtral-8x7B-Instruct-v0.1-AWQ.json b/data/model_data_json/TheBloke_Mixtral-8x7B-Instruct-v0.1-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..81ba760e41f52509a43bdf0352d775827c49c95e --- /dev/null +++ b/data/model_data_json/TheBloke_Mixtral-8x7B-Instruct-v0.1-AWQ.json @@ -0,0 +1,25 @@ +{ + "model_id": "TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ", + "downloads": 75286, + "tags": [ + "transformers", + "safetensors", + "mixtral", + "text-generation", + "conversational", + "fr", + "it", + "de", + "es", + "en", + "base_model:mistralai/Mixtral-8x7B-Instruct-v0.1", + "base_model:quantized:mistralai/Mixtral-8x7B-Instruct-v0.1", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "4-bit", + "awq", + "region:us" + ], + "description": "--- base_model: mistralai/Mixtral-8x7B-Instruct-v0.1 inference: false language: - fr - it - de - es - en license: apache-2.0 model_creator: Mistral AI_ model_name: Mixtral 8X7B Instruct v0.1 model_type: mixtral prompt_template: '[INST] {prompt} [/INST] ' quantized_by: TheBloke widget: - output: text: 'Arr, shiver me timbers! Ye have a llama on yer lawn, ye say? Well, that be a new one for me! Here''s what I''d suggest, arr: 1. Firstly, ensure yer safety. Llamas may look gentle, but they can be protective if they feel threatened. 2. Try to make the area less appealing to the llama. Remove any food sources or water that might be attracting it. 3. Contact local animal control or a wildlife rescue organization. They be the experts and can provide humane ways to remove the llama from yer property. 4. If ye have any experience with animals, you could try to gently herd the llama towards a nearby field or open space. But be careful, arr! Remember, arr, it be important to treat the llama with respect and care. It be a creature just trying to survive, like the rest of us.' text: '[INST] You are a pirate chatbot who always responds with Arr and pirate speak! There''s a llama on my lawn, how can I get rid of him? [/INST]' ---
\"TheBlokeAI\"

# Mixtral 8X7B Instruct v0.1 - AWQ - Model creator: Mistral AI_ - Original model: Mixtral 8X7B Instruct v0.1 ## Description This repo contains AWQ model files for Mistral AI_'s Mixtral 8X7B Instruct v0.1. ### About AWQ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings. AWQ models are currently supported on Linux and Windows, with NVidia GPUs only. macOS users: please use GGUF models instead. It is supported by: - Text Generation Webui - using Loader: AutoAWQ - vLLM - version 0.2.2 or later for support for all model types. - Hugging Face Text Generation Inference (TGI) - Transformers version 4.35.0 and later, from any code or client that supports Transformers - AutoAWQ - for use from Python code ## Repositories available * AWQ model(s) for GPU inference. * GPTQ models for GPU inference, with multiple quantisation parameter options. * 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference * Mistral AI_'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions ## Prompt template: Mistral ## Provided files, and AWQ parameters I currently release 128g GEMM models only. The addition of group_size 32 models, and GEMV kernel models, is being actively considered. Models are released as sharded safetensors files. | Branch | Bits | GS | AWQ Dataset | Seq Len | Size | | ------ | ---- | -- | ----------- | ------- | ---- | | main | 4 | 128 | VMware Open Instruct | 8192 | 24.65 GB ## How to easily download and use this model in text-generation-webui Please make sure you're using the latest version of text-generation-webui. It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install. 1. Click the **Model tab**. 2. Under **Download custom model or LoRA**, enter . 3. Click **Download**. 4. The model will start downloading. Once it's finished it will say \"Done\". 5. In the top left, click the refresh icon next to **Model**. 6. In the **Model** dropdown, choose the model you just downloaded: 7. Select **Loader: AutoAWQ**. 8. Click Load, and the model will load and is now ready for use. 9. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right. 10. Once you're ready, click the **Text Generation** tab and enter a prompt to get started! ## Multi-user inference server: vLLM Documentation on installing and using vLLM can be found here. - Please ensure you are using vLLM version 0.2 or later. - When using vLLM as a server, pass the parameter. For example: - When using vLLM from Python code, again set . For example: ## Multi-user inference server: Hugging Face Text Generation Inference (TGI) Use TGI version 1.1.0 or later. The official Docker container is: Example Docker parameters: Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later): ## Inference from Python code using Transformers ### Install the necessary packages - Requires: Transformers 4.35.0 or later. - Requires: AutoAWQ 0.1.6 or later. Note that if you are using PyTorch 2.0.1, the above AutoAWQ command will automatically upgrade you to PyTorch 2.1.0. 
If you are using CUDA 11.8 and wish to continue using PyTorch 2.0.1, instead run this command: If you have problems installing AutoAWQ using the pre-built wheels, install it from source instead: ### Transformers example code (requires Transformers 4.35.0 and later) ## Compatibility The files provided are tested to work with: - text-generation-webui using . - vLLM version 0.2.0 and later. - Hugging Face Text Generation Inference (TGI) version 1.1.0 and later. - Transformers version 4.35.0 and later. - AutoAWQ version 0.1.1 and later. ## Discord For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server ## Thanks, and how to contribute Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org! I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training. If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits. * Patreon: * Ko-Fi: **Special thanks to**: Aemon Algiz. **Patreon special mentions**: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros Thank you to all my generous patrons and donaters! And thank you again to a16z for their generous grant. # Original model card: Mistral AI_'s Mixtral 8X7B Instruct v0.1 # Model Card for Mixtral-8x7B The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested. For full details of this model please read our release blog post. ## Warning This repo contains weights that are compatible with vLLM serving of the model as well as Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different. 
Please note that model cannot (yet) be instantiated with HF. ## Instruction format This format must be strictly respected, otherwise the model will generate sub-optimal outputs. The template used to build a prompt for the Instruct model is defined as follows: Note that and are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings. As reference, here is the pseudo-code used to tokenize instructions during fine-tuning: In the pseudo-code above, note that the method should not add a BOS or EOS token automatically, but should add a prefix space. ## Run the model By default, transformers will load the model in full precision. Therefore you might be interested to further reduce down the memory requirements to run the model through the optimizations we offer in HF ecosystem: ### In half-precision Note precision only works on GPU devices
Click to expand
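The collapsed snippet is not preserved in this dump; a minimal reconstruction of standard half-precision loading with Transformers (parameter choices are mine, not necessarily the card's exact code):

```python
# Hedged sketch: load Mixtral in float16 (GPU only, as noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'mistralai/Mixtral-8x7B-Instruct-v0.1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision
    device_map='auto',          # requires the accelerate package
)
```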
### Lower precision using (8-bit & 4-bit) using
Click to expand
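Again the collapsed code is missing; a hedged sketch of the 8-bit / 4-bit route via bitsandbytes:

```python
# Hedged sketch: quantized loading (requires the bitsandbytes package).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'mistralai/Mixtral-8x7B-Instruct-v0.1',
    load_in_4bit=True,  # or load_in_8bit=True
    device_map='auto',
)
```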
### Load the model with Flash Attention 2
Click to expand
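And a hedged reconstruction of the Flash Attention 2 variant (requires the flash-attn package and a supported GPU; the flag below is the Transformers >= 4.36 spelling, which Mixtral needs anyway):

```python
# Hedged sketch: enable Flash Attention 2 at load time.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'mistralai/Mixtral-8x7B-Instruct-v0.1',
    torch_dtype=torch.float16,
    attn_implementation='flash_attention_2',
    device_map='auto',
)
```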
## Limitations The Mixtral-8x7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. # The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed." +} \ No newline at end of file diff --git a/data/model_data_json/TheBloke_Mixtral-8x7B-Instruct-v0.1-GPTQ.json b/data/model_data_json/TheBloke_Mixtral-8x7B-Instruct-v0.1-GPTQ.json new file mode 100644 index 0000000000000000000000000000000000000000..d44d1f8afd1e06ffbd2096f7bfdc273aede93628 --- /dev/null +++ b/data/model_data_json/TheBloke_Mixtral-8x7B-Instruct-v0.1-GPTQ.json @@ -0,0 +1,26 @@ +{ + "model_id": "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ", + "downloads": 92224, + "tags": [ + "transformers", + "safetensors", + "mixtral", + "text-generation", + "conversational", + "fr", + "it", + "de", + "es", + "en", + "base_model:mistralai/Mixtral-8x7B-Instruct-v0.1", + "base_model:quantized:mistralai/Mixtral-8x7B-Instruct-v0.1", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "4-bit", + "gptq", + "region:us" + ], + "description": "--- base_model: mistralai/Mixtral-8x7B-Instruct-v0.1 inference: false language: - fr - it - de - es - en license: apache-2.0 model_creator: Mistral AI_ model_name: Mixtral 8X7B Instruct v0.1 model_type: mixtral prompt_template: '[INST] {prompt} [/INST] ' quantized_by: TheBloke widget: - output: text: 'Arr, shiver me timbers! Ye have a llama on yer lawn, ye say? Well, that be a new one for me! Here''s what I''d suggest, arr: 1. Firstly, ensure yer safety. Llamas may look gentle, but they can be protective if they feel threatened. 2. Try to make the area less appealing to the llama. Remove any food sources or water that might be attracting it. 3. Contact local animal control or a wildlife rescue organization. They be the experts and can provide humane ways to remove the llama from yer property. 4. If ye have any experience with animals, you could try to gently herd the llama towards a nearby field or open space. But be careful, arr! Remember, arr, it be important to treat the llama with respect and care. It be a creature just trying to survive, like the rest of us.' text: '[INST] You are a pirate chatbot who always responds with Arr and pirate speak! There''s a llama on my lawn, how can I get rid of him? [/INST]' ---
\"TheBlokeAI\"

# Mixtral 8X7B Instruct v0.1 - GPTQ - Model creator: Mistral AI_ - Original model: Mixtral 8X7B Instruct v0.1 # Description This repo contains GPTQ model files for Mistral AI_'s Mixtral 8X7B Instruct v0.1. Mixtral GPTQs currently require: * Transformers 4.36.0 or later * either, AutoGPTQ 0.6 compiled from source, or * Transformers 4.37.0.dev0 compiled from Github with: Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them. ## Repositories available * AWQ model(s) for GPU inference. * GPTQ models for GPU inference, with multiple quantisation parameter options. * 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference * Mistral AI_'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions ## Prompt template: Mistral ## Known compatible clients / servers GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models. Mixtral GPTQs currently have special requirements - see Description above. ## Provided files, and GPTQ parameters Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements. Each separate quant is in a different branch. See below for instructions on fetching from different branches. Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with Transformers.
Explanation of GPTQ parameters - Bits: The bit size of the quantised model. - GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. \"None\" is the lowest possible value. - Act Order: True or False. Also known as . True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now. - Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is default, but 0.1 results in slightly better accuracy. - GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s). - Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only impacts the quantisation accuracy on longer inference sequences. - ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.
| Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc | | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- | | main | 4 | None | Yes | 0.1 | VMware Open Instruct | 8192 | 23.81 GB | No | 4-bit, with Act Order. No group size, to lower VRAM requirements. | | gptq-4bit-128g-actorder_True | 4 | 128 | Yes | 0.1 | VMware Open Instruct | 8192 | 24.70 GB | No | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. | | gptq-4bit-32g-actorder_True | 4 | 32 | Yes | 0.1 | VMware Open Instruct | 8192 | 27.42 GB | No | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. | | gptq-3bit--1g-actorder_True | 3 | None | Yes | 0.1 | VMware Open Instruct | 8192 | 18.01 GB | No | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. | | gptq-3bit-128g-actorder_True | 3 | 128 | Yes | 0.1 | VMware Open Instruct | 8192 | 18.85 GB | No | 3-bit, with group size 128g and act-order. Higher quality than 128g-False. | | gptq-8bit--1g-actorder_True | 8 | None | Yes | 0.1 | VMware Open Instruct | 8192 | 47.04 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. | | gptq-8bit-128g-actorder_True | 8 | 128 | Yes | 0.1 | VMware Open Instruct | 8192 | 48.10 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. | ## How to download, including from branches ### In text-generation-webui To download from the branch, enter in the \"Download model\" box. To download from another branch, add to the end of the download name, eg ### From the command line I recommend using the Python library: To download the branch to a folder called : To download from a different branch, add the parameter:
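The exact flag is elided above; assuming the huggingface_hub library, a branch from the table maps to the revision argument. A minimal sketch:

```python
# Hedged sketch: fetch one quantisation branch from the table above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ',
    revision='gptq-4bit-32g-actorder_True',  # branch name from the table
    local_dir='Mixtral-8x7B-Instruct-v0.1-GPTQ',
)
```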
More advanced huggingface-cli download usage If you remove the parameter, the files will instead be stored in the central Hugging Face cache directory (default location on Linux is: ), and symlinks will be added to the specified , pointing to their real location in the cache. This allows for interrupted downloads to be resumed, and allows you to quickly clone the repo to multiple places on disk without triggering a download again. The downside, and the reason why I don't list that as the default option, is that the files are then hidden away in a cache folder and it's harder to know where your disk space is being used, and to clear it up if/when you want to remove a downloaded model. The cache location can be changed with the environment variable, and/or the parameter to . For more documentation on downloading with , please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install : And set environment variable to : Windows Command Line users: You can set the environment variable by running before the download command.
### With (**not** recommended) To clone a specific branch with , use a command like this: Note that using Git with HF repos is strongly discouraged. It will be much slower than using , and will use twice as much disk space as it has to store the model files twice (it stores every byte both in the intended target folder, and again in the folder as a blob.) ## How to easily download and use this model in text-generation-webui **NOTE**: Requires: * Transformers 4.36.0, or Transformers 4.37.0.dev0 from Github * Either AutoGPTQ 0.6 compiled from source and , * or, , if you installed Transformers from Github: Please make sure you're using the latest version of text-generation-webui. It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install. 1. Click the **Model tab**. 2. Under **Download custom model or LoRA**, enter . - To download from a specific branch, enter for example - see Provided Files above for the list of branches for each option. 3. Click **Download**. 4. The model will start downloading. Once it's finished it will say \"Done\". 5. In the top left, click the refresh icon next to **Model**. 6. In the **Model** dropdown, choose the model you just downloaded: 7. The model will automatically load, and is now ready for use! 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right. - Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file . 9. Once you're ready, click the **Text Generation** tab and enter a prompt to get started! ## Serving this model from Text Generation Inference (TGI) Not currently supported for Mixtral models. ## Python code example: inference from this GPTQ model ### Install the necessary packages Requires: Transformers 4.37.0.dev0 from Github, Optimum 1.16.0 or later, and AutoGPTQ 0.5.1 or later. If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise if you have problems with the pre-built wheels, you should try building from source: ### Example Python code ## Compatibility The files provided are tested to work with AutoGPTQ 0.6 (compiled from source) and Transformers 4.37.0 (installed from Github). ## Discord For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server ## Thanks, and how to contribute Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org! I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training. If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits. * Patreon: * Ko-Fi: **Special thanks to**: Aemon Algiz. 
**Patreon special mentions**: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros Thank you to all my generous patrons and donaters! And thank you again to a16z for their generous grant. # Original model card: Mistral AI_'s Mixtral 8X7B Instruct v0.1 # Model Card for Mixtral-8x7B The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested. For full details of this model please read our release blog post. ## Warning This repo contains weights that are compatible with vLLM serving of the model as well as Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different. Please note that model cannot (yet) be instantiated with HF. ## Instruction format This format must be strictly respected, otherwise the model will generate sub-optimal outputs. The template used to build a prompt for the Instruct model is defined as follows: Note that and are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings. As reference, here is the pseudo-code used to tokenize instructions during fine-tuning: In the pseudo-code above, note that the method should not add a BOS or EOS token automatically, but should add a prefix space. ## Run the model By default, transformers will load the model in full precision. Therefore you might be interested to further reduce down the memory requirements to run the model through the optimizations we offer in HF ecosystem: ### In half-precision Note precision only works on GPU devices
Click to expand
### Lower precision using (8-bit & 4-bit) using
Click to expand
### Load the model with Flash Attention 2
Click to expand
## Limitations The Mixtral-8x7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. # The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.", + "model_explanation_gemini": "A quantized version of Mistral AI's Mixtral-8x7B-Instruct model optimized for efficient GPU inference, designed to follow instructions and generate text responses in multiple languages." +} \ No newline at end of file diff --git a/data/model_data_json/TheBloke_TinyLlama-1.1B-Chat-v1.0-GGUF.json b/data/model_data_json/TheBloke_TinyLlama-1.1B-Chat-v1.0-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..8c01b72ba77dd10e1647f60f724f9d5da12663f8 --- /dev/null +++ b/data/model_data_json/TheBloke_TinyLlama-1.1B-Chat-v1.0-GGUF.json @@ -0,0 +1,20 @@ +{ + "model_id": "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", + "downloads": 122997, + "tags": [ + "transformers", + "gguf", + "tinyllama", + "en", + "dataset:cerebras/SlimPajama-627B", + "dataset:bigcode/starcoderdata", + "dataset:OpenAssistant/oasst_top1_2023-08-25", + "base_model:TinyLlama/TinyLlama-1.1B-Chat-v1.0", + "base_model:quantized:TinyLlama/TinyLlama-1.1B-Chat-v1.0", + "license:apache-2.0", + "region:us", + "conversational" + ], + "description": "--- base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0 datasets: - cerebras/SlimPajama-627B - bigcode/starcoderdata - OpenAssistant/oasst_top1_2023-08-25 inference: false language: - en license: apache-2.0 model_creator: TinyLlama model_name: Tinyllama 1.1B Chat v1.0 model_type: tinyllama prompt_template: '<|system|> {system_message}
<|user|> {prompt}
<|assistant|> ' quantized_by: TheBloke ---
\"TheBlokeAI\"

# Tinyllama 1.1B Chat v1.0 - GGUF - Model creator: TinyLlama - Original model: Tinyllama 1.1B Chat v1.0 ## Description This repo contains GGUF format model files for TinyLlama's Tinyllama 1.1B Chat v1.0. These files were quantised using hardware kindly provided by Massed Compute. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Repositories available * AWQ model(s) for GPU inference. * GPTQ models for GPU inference, with multiple quantisation parameter options. * 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference * TinyLlama's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions ## Prompt template: Zephyr ## Compatibility These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit d0cee0d They are also compatible with many third party UIs and libraries - please see the list at the top of this README. ## Explanation of quantisation methods
Click to see details The new methods available are: * GGML_TYPE_Q2_K - \"type-1\" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw) * GGML_TYPE_Q3_K - \"type-0\" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw. * GGML_TYPE_Q4_K - \"type-1\" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw. * GGML_TYPE_Q5_K - \"type-1\" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K resulting in 5.5 bpw * GGML_TYPE_Q6_K - \"type-0\" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw Refer to the Provided Files table below to see what files use which methods, and how.
## Provided files | Name | Quant method | Bits | Size | Max RAM required | Use case | | ---- | ---- | ---- | ---- | ---- | ----- | | tinyllama-1.1b-chat-v1.0.Q2_K.gguf | Q2_K | 2 | 0.48 GB| 2.98 GB | smallest, significant quality loss - not recommended for most purposes | | tinyllama-1.1b-chat-v1.0.Q3_K_S.gguf | Q3_K_S | 3 | 0.50 GB| 3.00 GB | very small, high quality loss | | tinyllama-1.1b-chat-v1.0.Q3_K_M.gguf | Q3_K_M | 3 | 0.55 GB| 3.05 GB | very small, high quality loss | | tinyllama-1.1b-chat-v1.0.Q3_K_L.gguf | Q3_K_L | 3 | 0.59 GB| 3.09 GB | small, substantial quality loss | | tinyllama-1.1b-chat-v1.0.Q4_0.gguf | Q4_0 | 4 | 0.64 GB| 3.14 GB | legacy; small, very high quality loss - prefer using Q3_K_M | | tinyllama-1.1b-chat-v1.0.Q4_K_S.gguf | Q4_K_S | 4 | 0.64 GB| 3.14 GB | small, greater quality loss | | tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf | Q4_K_M | 4 | 0.67 GB| 3.17 GB | medium, balanced quality - recommended | | tinyllama-1.1b-chat-v1.0.Q5_0.gguf | Q5_0 | 5 | 0.77 GB| 3.27 GB | legacy; medium, balanced quality - prefer using Q4_K_M | | tinyllama-1.1b-chat-v1.0.Q5_K_S.gguf | Q5_K_S | 5 | 0.77 GB| 3.27 GB | large, low quality loss - recommended | | tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf | Q5_K_M | 5 | 0.78 GB| 3.28 GB | large, very low quality loss - recommended | | tinyllama-1.1b-chat-v1.0.Q6_K.gguf | Q6_K | 6 | 0.90 GB| 3.40 GB | very large, extremely low quality loss | | tinyllama-1.1b-chat-v1.0.Q8_0.gguf | Q8_0 | 8 | 1.17 GB| 3.67 GB | very large, extremely low quality loss - not recommended | **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead. ## How to download GGUF files **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file. The following clients/libraries will automatically download models for you, providing a list of available models to choose from: * LM Studio * LoLLMS Web UI * Faraday.dev ### In Under Download Model, you can enter the model repo: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF and below it, a specific filename to download, such as: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf. Then click Download. ### On the command line, including multiple files at once I recommend using the Python library: Then you can download any individual model file to the current directory, at high speed, with a command like this:
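As in the Mistral repos above, the concrete command is elided here; a combined sketch assuming huggingface_hub for the download and llama-cpp-python for a quick test, with the prompt laid out per the Zephyr template in this card's header:

```python
# Hedged sketch: download one quant file, then run a Zephyr-format prompt.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python

path = hf_hub_download(
    repo_id='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    filename='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
    local_dir='.',
)
llm = Llama(model_path=path, n_ctx=2048)
prompt = '<|system|>\nYou are a friendly chatbot.\n<|user|>\nWhat is a GGUF file?\n<|assistant|>\n'
print(llm(prompt, max_tokens=128)['choices'][0]['text'])
```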
More advanced huggingface-cli download usage (click to read) You can also download multiple files at once with a pattern: For more documentation on downloading with huggingface-cli, please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install hf_transfer: And set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1: Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
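A sketch of the pattern-based, accelerated download described above (hf_transfer is a real optional package for huggingface_hub; treat the exact workflow as a reconstruction from the surrounding text, not this card's original snippet):

```python
# Enable the optional hf_transfer accelerator for huggingface_hub downloads.
# Assumes `pip install huggingface-hub hf_transfer`; set the env var before
# importing huggingface_hub so it is picked up at import time.
import os
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = '1'

from huggingface_hub import snapshot_download

# Pattern-based download of several quant files at once, per the text above.
snapshot_download(
    repo_id='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    allow_patterns=['*Q4_K*.gguf'],
    local_dir='.',
)
```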
## Example command Make sure you are using llama.cpp from commit d0cee0d or later. Change the `-ngl` argument to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration. Change the `-c` argument to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value. If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`. For other parameters and how to use them, please refer to the llama.cpp documentation ## How to run in text-generation-webui Further instructions can be found in the text-generation-webui documentation, here: text-generation-webui/docs/04 ‐ Model Tab.md. ## How to run from Python code You can use GGUF models from Python using the llama-cpp-python or ctransformers libraries. Note that at the time of writing (Nov 27th 2023), ctransformers has not been updated for some time and is not compatible with some recent models. Therefore I recommend you use llama-cpp-python. ### How to load this model in Python code, using llama-cpp-python For full documentation, please see: llama-cpp-python docs. #### First install the package Run one of the following commands, according to your system: #### Simple llama-cpp-python example code ## How to use with LangChain Here are guides on using llama-cpp-python and ctransformers with LangChain: * LangChain + llama-cpp-python * LangChain + ctransformers ## Discord For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server ## Thanks, and how to contribute Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org! I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning/training. If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits. * Patreon: * Ko-Fi: **Special thanks to**: Aemon Algiz. 
**Patreon special mentions**: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros Thank you to all my generous patrons and donaters! And thank you again to a16z for their generous grant. # Original model card: TinyLlama's Tinyllama 1.1B Chat v1.0
# TinyLlama-1.1B
The TinyLlama project aims to **pretrain** a **1.1B Llama model on 3 trillion tokens**. With some proper optimization, we can achieve this within a span of \"just\" 90 days using 16 A100-40G GPUs 🚀🚀. The training started on 2023-09-01. We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint. #### This Model This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T. **We follow HF's Zephyr's training recipe.** The model was \"initially fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and model completions that are ranked by GPT-4.\" #### How to use You will need transformers>=4.34. Do check the TinyLlama github page for more information. ", + "model_explanation_gemini": "A 1.1B parameter chat model optimized for dialogue tasks, quantized in GGUF format for efficient CPU/GPU inference." +} \ No newline at end of file diff --git a/data/model_data_json/TheBloke_phi-2-GGUF.json b/data/model_data_json/TheBloke_phi-2-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..0c8dfe5fccb1a8bab7e3228e66f1d1d9537a1c56 --- /dev/null +++ b/data/model_data_json/TheBloke_phi-2-GGUF.json @@ -0,0 +1,19 @@ +{ + "model_id": "TheBloke/phi-2-GGUF", + "downloads": 34123999, + "tags": [ + "transformers", + "gguf", + "phi-msft", + "nlp", + "code", + "text-generation", + "en", + "base_model:microsoft/phi-2", + "base_model:quantized:microsoft/phi-2", + "license:other", + "region:us" + ], + "description": "--- base_model: microsoft/phi-2 inference: false language: - en license: other license_link: license_name: microsoft-research-license model_creator: Microsoft model_name: Phi 2 model_type: phi-msft pipeline_tag: text-generation prompt_template: 'Instruct: {prompt} Output: ' quantized_by: TheBloke tags: - nlp - code ---

# Phi 2 - GGUF - Model creator: Microsoft - Original model: Phi 2 ## Description This repo contains GGUF format model files for Microsoft's Phi 2. ### About GGUF GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF: * llama.cpp. The source project for GGUF. Offers a CLI and a server option. * text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. * KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling. * GPT4All, a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel. * LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023. * LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection. * Faraday.dev, an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration. * llama-cpp-python, a Python library with GPU accel, LangChain support, and an OpenAI-compatible API server. * candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use. * ctransformers, a Python library with GPU accel, LangChain support, and an OpenAI-compatible API server. Note that, as of the time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models. ## Repositories available * GPTQ models for GPU inference, with multiple quantisation parameter options. * 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference * Microsoft's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions ## Prompt template: Phi ## Compatibility These quantised GGUFv2 files are compatible with llama.cpp from August 27th 2023 onwards, as of commit d0cee0d. They are also compatible with many third-party UIs and libraries - please see the list at the top of this README. ## Explanation of quantisation methods
Click to see details The new methods available are: * GGML_TYPE_Q2_K - \"type-1\" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw). * GGML_TYPE_Q3_K - \"type-0\" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw. * GGML_TYPE_Q4_K - \"type-1\" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw. * GGML_TYPE_Q5_K - \"type-1\" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw. * GGML_TYPE_Q6_K - \"type-0\" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw. Refer to the Provided Files table below to see what files use which methods, and how.
## Provided files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| phi-2.Q2_K.gguf | Q2_K | 2 | 1.17 GB | 3.67 GB | smallest, significant quality loss - not recommended for most purposes |
| phi-2.Q3_K_S.gguf | Q3_K_S | 3 | 1.25 GB | 3.75 GB | very small, high quality loss |
| phi-2.Q3_K_M.gguf | Q3_K_M | 3 | 1.48 GB | 3.98 GB | very small, high quality loss |
| phi-2.Q4_0.gguf | Q4_0 | 4 | 1.60 GB | 4.10 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| phi-2.Q3_K_L.gguf | Q3_K_L | 3 | 1.60 GB | 4.10 GB | small, substantial quality loss |
| phi-2.Q4_K_S.gguf | Q4_K_S | 4 | 1.62 GB | 4.12 GB | small, greater quality loss |
| phi-2.Q4_K_M.gguf | Q4_K_M | 4 | 1.79 GB | 4.29 GB | medium, balanced quality - recommended |
| phi-2.Q5_0.gguf | Q5_0 | 5 | 1.93 GB | 4.43 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| phi-2.Q5_K_S.gguf | Q5_K_S | 5 | 1.93 GB | 4.43 GB | large, low quality loss - recommended |
| phi-2.Q5_K_M.gguf | Q5_K_M | 5 | 2.07 GB | 4.57 GB | large, very low quality loss - recommended |
| phi-2.Q6_K.gguf | Q6_K | 6 | 2.29 GB | 4.79 GB | very large, extremely low quality loss |
| phi-2.Q8_0.gguf | Q8_0 | 8 | 2.96 GB | 5.46 GB | very large, extremely low quality loss - not recommended |

**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead. ## How to download GGUF files **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file. The following clients/libraries will automatically download models for you, providing a list of available models to choose from: * LM Studio * LoLLMS Web UI * Faraday.dev ### In text-generation-webui Under Download Model, you can enter the model repo: TheBloke/phi-2-GGUF and below it, a specific filename to download, such as: phi-2.Q4_K_M.gguf. Then click Download. ### On the command line, including multiple files at once I recommend using the huggingface-hub Python library. Then you can download any individual model file to the current directory, at high speed, with a command like this:
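The original download snippet was stripped from this dump; a hedged Python equivalent for the single-file download (repo and filename from the table above; assumes `pip install huggingface-hub`):

```python
# A minimal sketch: fetch one quantised Phi-2 GGUF file.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id='TheBloke/phi-2-GGUF',
    filename='phi-2.Q4_K_M.gguf',  # the recommended quant above
    local_dir='.',
)
print(model_path)
```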
More advanced huggingface-cli download usage (click to read) You can also download multiple files at once with a pattern: For more documentation on downloading with huggingface-cli, please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install hf_transfer: And set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1: Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
## Example command Make sure you are using llama.cpp from commit d0cee0d or later. Change the `-ngl` argument to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration. Change the `-c` argument to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value. If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`. For other parameters and how to use them, please refer to the llama.cpp documentation ## How to run in text-generation-webui Further instructions can be found in the text-generation-webui documentation, here: text-generation-webui/docs/04 ‐ Model Tab.md. ## How to run from Python code You can use GGUF models from Python using the llama-cpp-python or ctransformers libraries. Note that at the time of writing (Nov 27th 2023), ctransformers has not been updated for some time and is not compatible with some recent models. Therefore I recommend you use llama-cpp-python. ### How to load this model in Python code, using llama-cpp-python For full documentation, please see: llama-cpp-python docs. #### First install the package Run one of the following commands, according to your system: #### Simple llama-cpp-python example code ## How to use with LangChain Here are guides on using llama-cpp-python and ctransformers with LangChain: * LangChain + llama-cpp-python * LangChain + ctransformers ## Discord For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server ## Thanks, and how to contribute Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org! I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning/training. If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits. * Patreon: * Ko-Fi: **Special thanks to**: Aemon Algiz. 
**Patreon special mentions**: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros Thank you to all my generous patrons and donaters! And thank you again to a16z for their generous grant. # Original model card: Microsoft's Phi 2 ## Model Summary Phi-2 is a Transformer with **2.7 billion** parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased a nearly state-of-the-art performance among models with less than 13 billion parameters. Our model hasn't been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more. ## Intended Uses Phi-2 is intended for research purposes only. Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format. ### QA Format: You can provide the prompt as a standalone question as follows: where the model generates the text after \".\" . To encourage the model to write more concise answers, you can also try the following QA format using \"Instruct: \\\\nOutput:\" where the model generates the text after \"Output:\". ### Chat Format: where the model generates the text after the first \"Bob:\". ### Code Format: where the model generates the text after the comments. **Notes:** * Phi-2 is intended for research purposes. The model-generated text/code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications. * Direct adoption for production tasks is out of the scope of this research project. 
As a result, the Phi-2 model has not been tested to ensure that it performs adequately for any production-level application. Please refer to the limitation sections of this document for more details. * If you are using transformers, always load the model with trust_remote_code=True to prevent side-effects. ## Sample Code There are four execution modes: 1. FP16 / Flash-Attention / CUDA: 2. FP16 / CUDA: 3. FP32 / CUDA: 4. FP32 / CPU: To ensure maximum compatibility, we recommend using the second execution mode (FP16 / CUDA), as follows: **Remark:** In the generation function, our model currently does not support beam search. Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings. ## Limitations of Phi-2 * Generate Inaccurate Code and Facts: The model may produce incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions. * Limited Scope for code: The majority of Phi-2's training data is Python-based and uses common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. * Unreliable Responses to Instruction: The model has not undergone instruction fine-tuning. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users. * Language Limitations: The model is primarily designed to understand standard English. Informal English, slang, or any other languages might pose challenges to its comprehension, leading to potential misinterpretations or errors in response. * Potential Societal Biases: Phi-2 is not entirely free from societal biases despite efforts in assuring training data safety. There's a possibility it may generate content that mirrors these societal biases, particularly if prompted or instructed to do so. We urge users to be aware of this and to exercise caution and critical thinking when interpreting model outputs. * Toxicity: Despite being trained with carefully selected data, the model can still produce harmful content if explicitly prompted or instructed to do so. We chose to release the model for research purposes only -- we hope to help the open-source community develop the most effective ways to reduce the toxicity of a model directly after pretraining. * Verbosity: Phi-2, being a base model, often produces irrelevant or extra text and responses following its first answer to user prompts within a single turn. This is due to its training dataset being primarily textbooks, which results in textbook-like responses. ## Training ### Model * Architecture: a Transformer-based model with next-word prediction objective * Context length: 2048 tokens * Dataset size: 250B tokens, combination of NLP synthetic data created by AOAI GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, which was assessed by AOAI GPT-4. * Training tokens: 1.4T tokens * GPUs: 96xA100-80G * Training time: 14 days ### Software * PyTorch * DeepSpeed * Flash-Attention ### License The model is licensed under the microsoft-research-license. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. 
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies. ", + "model_explanation_gemini": "A quantized version of Microsoft's Phi 2 model designed for text generation tasks, compatible with various inference tools and optimized for CPU+GPU usage." +} \ No newline at end of file diff --git a/data/model_data_json/TheDrummer_Fallen-Gemma3-27B-v1.json b/data/model_data_json/TheDrummer_Fallen-Gemma3-27B-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..1f845c7a7c1af20f1b17c40dae475e40ec438712 --- /dev/null +++ b/data/model_data_json/TheDrummer_Fallen-Gemma3-27B-v1.json @@ -0,0 +1,12 @@ +{ + "model_id": "TheDrummer/Fallen-Gemma3-27B-v1", + "downloads": 84592, + "tags": [ + "safetensors", + "gemma3", + "license:other", + "region:us" + ], + "description": "--- license: other --- # Join our Discord! ## Nearly 5,000 helpful LLM enthusiasts! A hub for players and makers alike! --- BeaverAI proudly presents... # Fallen Gemma3 27B v1 👺 ## Special Thanks - Thank you to each and every one of you who donated and subscribed on Patreon and Ko-Fi to make our venture a little bit easier. - I'm also recently unemployed. I am a Software Developer with 8 years of experience in Web, API, AI, and adapting to new tech and requirements. If you're hiring, feel free to reach out to me. ## Usage - Use the Gemma Chat Template ## Description Fallen Gemma3 27B v1 is an evil tune of Gemma 3 27B, but it is not a complete decensor. Evil tunes knock out the positivity and may enjoy torturing you and humanity. Vision still works, and it has something to say about the crap you feed it. ## Links - Original: - GGUF: - iMatrix (recommended):", + "model_explanation_gemini": "Fallen Gemma3 27B v1 is an altered version of Gemma 3 27B designed to produce darker, less positive outputs while retaining vision capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/TinyLlama_TinyLlama-1.1B-Chat-v1.0.json b/data/model_data_json/TinyLlama_TinyLlama-1.1B-Chat-v1.0.json new file mode 100644 index 0000000000000000000000000000000000000000..9229feee50d5f49014e5a840461a0159164976c0 --- /dev/null +++ b/data/model_data_json/TinyLlama_TinyLlama-1.1B-Chat-v1.0.json @@ -0,0 +1,23 @@ +{ + "model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", + "downloads": 1639437, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "conversational", + "en", + "dataset:cerebras/SlimPajama-627B", + "dataset:bigcode/starcoderdata", + "dataset:HuggingFaceH4/ultrachat_200k", + "dataset:HuggingFaceH4/ultrafeedback_binarized", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - cerebras/SlimPajama-627B - bigcode/starcoderdata - HuggingFaceH4/ultrachat_200k - HuggingFaceH4/ultrafeedback_binarized language: - en widget: - example_title: Fibonacci (Python) messages: - role: system content: You are a chatbot who can help code! - role: user content: Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI. ---
# TinyLlama-1.1B
The TinyLlama project aims to **pretrain** a **1.1B Llama model on 3 trillion tokens**. With some proper optimization, we can achieve this within a span of \"just\" 90 days using 16 A100-40G GPUs 🚀🚀. The training started on 2023-09-01. We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint. #### This Model This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T. **We follow HF's Zephyr's training recipe.** The model was \"initially fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and model completions that are ranked by GPT-4.\" #### How to use You will need transformers>=4.34. Do check the TinyLlama github page for more information.", + "model_explanation_gemini": "TinyLlama-1.1B-Chat-v1.0 is a compact 1.1B-parameter chatbot model fine-tuned for coding assistance and diverse dialogue generation, optimized for low-resource environments." +} \ No newline at end of file diff --git a/data/model_data_json/TrustSafeAI_RADAR-Vicuna-7B.json b/data/model_data_json/TrustSafeAI_RADAR-Vicuna-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..a6546a2b208f3aefefa16c6bb6b4c7df9f5e7ddd --- /dev/null +++ b/data/model_data_json/TrustSafeAI_RADAR-Vicuna-7B.json @@ -0,0 +1,17 @@ +{ + "model_id": "TrustSafeAI/RADAR-Vicuna-7B", + "downloads": 134972, + "tags": [ + "transformers", + "pytorch", + "roberta", + "text-classification", + "arxiv:1907.11692", + "arxiv:2307.03838", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: text-classification ---
# RADAR Model Card ## Model Details RADAR-Vicuna-7B is an AI-text detector trained via adversarial learning between the detector and a paraphraser, on a human-text corpus (OpenWebText) and an AI-text corpus generated from OpenWebText. - **Developed by:** TrustSafeAI - **Model type:** An encoder-only language model based on the transformer architecture (RoBERTa). - **License:** Non-commercial license (inherited from Vicuna-7B-v1.1) - **Trained from model:** RoBERTa ### Model Sources - **Project Page:** - **Paper:** - **IBM Blog Post:** ## Uses Users can use this detector to assist them in detecting text generated by large language models. Please note that this detector is trained on AI-text generated by Vicuna-7B-v1.1. As the model only supports non-commercial use, the intended users are **not allowed to use this detector in commercial activities**. ## Get Started with the Model Please refer to the following guidelines to see how to locally run the downloaded model or use our API service hosted on Huggingface Space. - Google Colab Demo: - Huggingface API Documentation: ## Training Pipeline We propose adversarial learning between a paraphraser and our detector. The paraphraser's goal is to make the AI-generated text read more like human-written text, and the detector's goal is to improve its ability to identify AI-text. - **(Step 1) Training Data preparation**: Before training, we use Vicuna-7B to generate AI-text by performing text completion based on the prefix span of human-text in OpenWebText. - **(Step 2) Update the paraphraser**: During training, the paraphraser paraphrases the AI-text generated in **Step 1**. It then collects the reward returned by the detector and updates the paraphraser using a Proximal Policy Optimization (PPO) loss. - **(Step 3) Update the detector**: The detector is optimized using the logistic loss on the human-text, AI-text and paraphrased AI-text. See more details in Sections 3 and 4 of this paper. ## Ethical Considerations We suggest users use our tool to assist with identifying AI-written content at scale and with discretion. If the detection result is to be used as evidence, further validation steps are necessary as RADAR cannot always make correct predictions.", + "model_explanation_gemini": "Detects AI-generated text, specifically from Vicuna-7B-v1.1, using adversarial training with a paraphraser to distinguish between human and machine-written content." +} \ No newline at end of file diff --git a/data/model_data_json/UBC-NLP_MARBERT.json b/data/model_data_json/UBC-NLP_MARBERT.json new file mode 100644 index 0000000000000000000000000000000000000000..01e97ad900e695c93ee974801eda04f085818abf --- /dev/null +++ b/data/model_data_json/UBC-NLP_MARBERT.json @@ -0,0 +1,21 @@ +{ + "model_id": "UBC-NLP/MARBERT", + "downloads": 80475, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "Arabic BERT", + "MSA", + "Twitter", + "Masked Langauge Model", + "ar", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ar tags: - Arabic BERT - MSA - Twitter - Masked Langauge Model widget: - text: \"اللغة العربية هي لغة [MASK].\" --- **MARBERT** is one of three models described in our **ACL 2021 paper** **\"ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic\"**. MARBERT is a large-scale pre-trained masked language model focused on both Dialectal Arabic (DA) and MSA. Arabic has multiple varieties. 
To train MARBERT, we randomly sample 1B Arabic tweets from a large in-house dataset of about 6B tweets. We only include tweets with at least 3 Arabic words, based on character string matching, regardless of whether the tweet contains non-Arabic strings or not. That is, we do not remove non-Arabic content so long as the tweet meets the 3-Arabic-word criterion. The dataset makes up **128GB of text** (**15.6B tokens**). We use the same network architecture as ARBERT (BERT-base), but without the next sentence prediction (NSP) objective since tweets are short. See our repo for how to modify the BERT code to remove NSP. For more information about MARBERT, please visit our own GitHub repo. # BibTex If you use our models (ARBERT, MARBERT, or MARBERTv2) for your scientific publication, or if you find the resources in this repository useful, please cite our paper as follows (to be updated): ## Acknowledgments We gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, Canadian Foundation for Innovation, ComputeCanada and UBC ARC-Sockeye. We also thank the Google TensorFlow Research Cloud (TFRC) program for providing us with free TPU access." +} \ No newline at end of file diff --git a/data/model_data_json/Wan-AI_Wan2.1-I2V-14B-720P.json b/data/model_data_json/Wan-AI_Wan2.1-I2V-14B-720P.json new file mode 100644 index 0000000000000000000000000000000000000000..187bfcb622f6f2755564266b73ad7873f6f004f4 --- /dev/null +++ b/data/model_data_json/Wan-AI_Wan2.1-I2V-14B-720P.json @@ -0,0 +1,18 @@ +{ + "model_id": "Wan-AI/Wan2.1-I2V-14B-720P", + "downloads": 594884, + "tags": [ + "diffusers", + "safetensors", + "i2v", + "video", + "video genration", + "image-to-video", + "en", + "zh", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en - zh pipeline_tag: image-to-video library_name: diffusers tags: - video - video genration --- # Wan2.1

💜 Wan | 🖥️ | 🤖 Paper (Coming soon) | 📑 | 📖 ----- **Wan: Open and Advanced Large-Scale Video Generative Models** In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features: - 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks. - 👍 **Supports Consumer-grade GPUs**: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models. - 👍 **Multiple Tasks**: **Wan2.1** excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation. - 👍 **Visual Text Generation**: **Wan2.1** is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications. - 👍 **Powerful Video VAE**: **Wan-VAE** delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation. This repository contains our I2V-14B model, which is capable of generating 720P high-definition videos. After thousands of rounds of human evaluations, this model has outperformed both closed-source and open-source alternatives, achieving state-of-the-art performance. ## Video Demos

## 🔥 Latest News!! * Feb 25, 2025: 👋 We've released the inference code and weights of Wan2.1. ## 📑 Todo List - Wan2.1 Text-to-Video - [x] Multi-GPU Inference code of the 14B and 1.3B models - [x] Checkpoints of the 14B and 1.3B models - [x] Gradio demo - [ ] Diffusers integration - [ ] ComfyUI integration - Wan2.1 Image-to-Video - [x] Multi-GPU Inference code of the 14B model - [x] Checkpoints of the 14B model - [x] Gradio demo - [ ] Diffusers integration - [ ] ComfyUI integration ## Quickstart #### Installation Clone the repo: Install dependencies: #### Model Download

| Models | Download Link | Notes |
|--------|---------------|-------|
| T2V-14B | 🤗 Huggingface 🤖 ModelScope | Supports both 480P and 720P |
| I2V-14B-720P | 🤗 Huggingface 🤖 ModelScope | Supports 720P |
| I2V-14B-480P | 🤗 Huggingface 🤖 ModelScope | Supports 480P |
| T2V-1.3B | 🤗 Huggingface 🤖 ModelScope | Supports 480P |

> 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution. Download models using 🤗 huggingface-cli: Download models using 🤖 modelscope-cli: #### Run Image-to-Video Generation Similar to Text-to-Video, Image-to-Video is also divided into processes with and without the prompt extension step. The specific parameters and their corresponding settings are as follows:
| Task | 480P | 720P | Model |
|------|------|------|-------|
| i2v-14B | ❌ | ✔️ | Wan2.1-I2V-14B-720P |
| i2v-14B | ✔️ | ❌ | Wan2.1-I2V-14B-480P |
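The card's own generation commands were lost in this dump. For orientation only, here is a hedged sketch using the Diffusers pipeline that later became available for Wan 2.1 image-to-video; note the to-do list above still marks Diffusers integration as pending, and both the `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` repo id and `diffusers>=0.33` are assumptions, not something this card states:

```python
# A sketch, not this card's official snippet: image-to-video with Diffusers.
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel

model_id = 'Wan-AI/Wan2.1-I2V-14B-720P-Diffusers'  # assumed Diffusers-format repo
image_encoder = CLIPVisionModel.from_pretrained(
    model_id, subfolder='image_encoder', torch_dtype=torch.float32)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder='vae', torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16)
pipe.to('cuda')

image = load_image('input.jpg')  # placeholder path; its aspect ratio guides the output shape
frames = pipe(image=image, prompt='a placeholder prompt',
              height=720, width=1280, num_frames=81,
              guidance_scale=5.0).frames[0]
export_to_video(frames, 'i2v_output.mp4', fps=16)
```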
##### (1) Without Prompt Extension - Single-GPU inference > 💡For the Image-to-Video task, the parameter represents the area of the generated video, with the aspect ratio following that of the original input image. - Multi-GPU inference using FSDP + xDiT USP ##### (2) Using Prompt Extension Run with local prompt extension using: Run with remote prompt extension using: ##### (3) Running local gradio ## Manual Evaluation We conducted extensive manual evaluations to evaluate the performance of the Image-to-Video model, and the results are presented in the table below. The results clearly indicate that **Wan2.1** outperforms both closed-source and open-source models.
(figure: manual evaluation results)
## Computational Efficiency on Different GPUs We test the computational efficiency of different **Wan2.1** models on different GPUs in the following table. The results are presented in the format: **Total time (s) / peak GPU memory (GB)**.
(figure: computational efficiency results)
> The parameter settings for the tests presented in this table are as follows: > (1) For the 1.3B model on 8 GPUs, set and ; > (2) For the 14B model on 1 GPU, use ; > (3) For the 1.3B model on a single 4090 GPU, set ; > (4) For all tests, no prompt extension was applied. ------- ## Introduction of Wan2.1 **Wan2.1** is designed on the mainstream diffusion transformer paradigm, achieving significant advancements in generative capabilities through a series of innovations. These include our novel spatio-temporal variational autoencoder (VAE), scalable training strategies, large-scale data construction, and automated evaluation metrics. Collectively, these contributions enhance the model’s performance and versatility. ##### (1) 3D Variational Autoencoders We propose a novel 3D causal VAE architecture, termed **Wan-VAE**, specifically designed for video generation. By combining multiple strategies, we improve spatio-temporal compression, reduce memory usage, and ensure temporal causality. **Wan-VAE** demonstrates significant advantages in performance efficiency compared to other open-source VAEs. Furthermore, our **Wan-VAE** can encode and decode unlimited-length 1080P videos without losing historical temporal information, making it particularly well-suited for video generation tasks.
(figure: Wan-VAE efficiency comparison)
##### (2) Video Diffusion DiT **Wan2.1** is designed using the Flow Matching framework within the paradigm of mainstream Diffusion Transformers. Our model's architecture uses the T5 Encoder to encode multilingual text input, with cross-attention in each transformer block embedding the text into the model structure. Additionally, we employ an MLP with a Linear layer and a SiLU layer to process the input time embeddings and predict six modulation parameters individually. This MLP is shared across all transformer blocks, with each block learning a distinct set of biases. Our experimental findings reveal a significant performance improvement with this approach at the same parameter scale.
(figure: Wan2.1 diffusion transformer architecture)
| Model | Dimension | Input Dimension | Output Dimension | Feedforward Dimension | Frequency Dimension | Number of Heads | Number of Layers |
|--------|-----------|-----------------|------------------|-----------------------|---------------------|-----------------|------------------|
| 1.3B | 1536 | 16 | 16 | 8960 | 256 | 12 | 30 |
| 14B | 5120 | 16 | 16 | 13824 | 256 | 40 | 40 |

##### Data We curated and deduplicated a candidate dataset comprising a vast amount of image and video data. During the data curation process, we designed a four-step data cleaning process, focusing on fundamental dimensions, visual quality and motion quality. Through the robust data processing pipeline, we can easily obtain high-quality, diverse, and large-scale training sets of images and videos. (figure: data processing pipeline) ##### Comparisons to SOTA We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performance. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions. We then computed the total score by performing a weighted calculation on the scores of each dimension, utilizing weights derived from human preferences in the matching process. The detailed results are shown in the table below. These results demonstrate our model's superior performance compared to both open-source and closed-source models. (figure: comparison with SOTA models) ## Citation If you find our work helpful, please cite us. ## License Agreement The models in this repository are licensed under the Apache 2.0 License. We claim no rights over the content you generate, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license. ## Acknowledgements We would like to thank the contributors to the SD3, Qwen, umt5-xxl, diffusers and HuggingFace repositories, for their open research. ## Contact Us If you would like to leave a message for our research or product teams, feel free to join our Discord or WeChat groups!", + "model_explanation_gemini": "Generates high-definition 720P videos from input images with state-of-the-art performance." +} \ No newline at end of file diff --git a/data/model_data_json/Wan-AI_Wan2.1-T2V-1.3B-Diffusers.json b/data/model_data_json/Wan-AI_Wan2.1-T2V-1.3B-Diffusers.json new file mode 100644 index 0000000000000000000000000000000000000000..6103b1d35ac765253a0cc0f89375791bd45d1b30 --- /dev/null +++ b/data/model_data_json/Wan-AI_Wan2.1-T2V-1.3B-Diffusers.json @@ -0,0 +1,18 @@ +{ + "model_id": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", + "downloads": 100990, + "tags": [ + "diffusers", + "safetensors", + "video", + "video-generation", + "text-to-video", + "en", + "zh", + "license:apache-2.0", + "diffusers:WanPipeline", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en - zh pipeline_tag: text-to-video library_name: diffusers tags: - video - video-generation --- # Wan2.1

💜 Wan    |    🖥️    |   🤖 Paper (Coming soon)    |    📑    |    📖 ----- **Wan: Open and Advanced Large-Scale Video Generative Models** In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features: - 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks. - 👍 **Supports Consumer-grade GPUs**: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models. - 👍 **Multiple Tasks**: **Wan2.1** excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation. - 👍 **Visual Text Generation**: **Wan2.1** is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications. - 👍 **Powerful Video VAE**: **Wan-VAE** delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation. This repository hosts our T2V-1.3B model, a versatile solution for video generation that is compatible with nearly all consumer-grade GPUs. In this way, we hope that **Wan2.1** can serve as an easy-to-use tool for more creative teams in video creation, providing a high-quality foundational model for academic teams with limited computing resources. This will facilitate both the rapid development of the video creation community and the swift advancement of video technology. ## Video Demos

## 🔥 Latest News!! * Feb 25, 2025: 👋 We've released the inference code and weights of Wan2.1. ## 📑 Todo List - Wan2.1 Text-to-Video - [x] Multi-GPU Inference code of the 14B and 1.3B models - [x] Checkpoints of the 14B and 1.3B models - [x] Gradio demo - [x] Diffusers integration - [ ] ComfyUI integration - Wan2.1 Image-to-Video - [x] Multi-GPU Inference code of the 14B model - [x] Checkpoints of the 14B model - [x] Gradio demo - [x] Diffusers integration - [ ] ComfyUI integration ## Quickstart #### Installation Clone the repo: Install dependencies: #### Model Download

| Models | Download Link | Notes |
|--------|---------------|-------|
| T2V-14B | 🤗 Huggingface 🤖 ModelScope | Supports both 480P and 720P |
| I2V-14B-720P | 🤗 Huggingface 🤖 ModelScope | Supports 720P |
| I2V-14B-480P | 🤗 Huggingface 🤖 ModelScope | Supports 480P |
| T2V-1.3B | 🤗 Huggingface 🤖 ModelScope | Supports 480P |

> 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution. Download models using 🤗 huggingface-cli: Download models using 🤖 modelscope-cli: #### Run Text-to-Video Generation This repository supports two Text-to-Video models (1.3B and 14B) and two resolutions (480P and 720P). The parameters and configurations for these models are as follows:
| Task | 480P | 720P | Model |
|------|------|------|-------|
| t2v-14B | ✔️ | ✔️ | Wan2.1-T2V-14B |
| t2v-1.3B | ✔️ | ❌ | Wan2.1-T2V-1.3B |
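Since the to-do list above marks Diffusers integration as done for Text-to-Video, here is a minimal sketch of that path; this is a hedged reconstruction, not the card's own snippet, assuming `diffusers>=0.33` with `WanPipeline`/`AutoencoderKLWan`, and the parameter values are illustrative rather than prescribed by this card:

```python
# A sketch: text-to-video with the Diffusers WanPipeline for the 1.3B model.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = 'Wan-AI/Wan2.1-T2V-1.3B-Diffusers'
# Load the VAE in fp32 for numerical stability; run the transformer in bf16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder='vae', torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to('cuda')

# 480P output, per the card's recommendation for the 1.3B model.
frames = pipe(
    prompt='A cat walks on the grass, realistic style',
    height=480, width=832, num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, 't2v_output.mp4', fps=15)
```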
##### (1) Without Prompt Extension To facilitate implementation, we will start with a basic version of the inference process that skips the prompt extension step. - Single-GPU inference If you encounter OOM (Out-of-Memory) issues, you can use the and options to reduce GPU memory usage. For example, on an RTX 4090 GPU: > 💡Note: If you are using the model, we recommend setting the parameter . The value can be adjusted within the range of 8 to 12 based on the performance. - Multi-GPU inference using FSDP + xDiT USP Wan can also be run directly using 🤗 Diffusers! ##### (2) Using Prompt Extension Extending the prompts can effectively enrich the details in the generated videos, further enhancing the video quality. Therefore, we recommend enabling prompt extension. We provide the following two methods for prompt extension: - Use the Dashscope API for extension. - Apply for an API key in advance (EN | CN). - Configure the environment variable to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the corresponding environment variable. For more detailed instructions, please refer to the dashscope document. - Use the model for text-to-video tasks and for image-to-video tasks. - You can modify the model used for extension with the parameter . For example: - Using a local model for extension. - By default, the Qwen model on HuggingFace is used for this extension. Users can choose based on the available GPU memory size. - For text-to-video tasks, you can use models like , and - For image-to-video tasks, you can use models like and . - Larger models generally provide better extension results but require more GPU memory. - You can modify the model used for extension with the parameter , allowing you to specify either a local model path or a Hugging Face model. For example: ##### (3) Running local gradio ## Evaluation We employ our **Wan-Bench** framework to evaluate the performance of the T2V-1.3B model, with the results displayed in the table below. The results indicate that our smaller 1.3B model surpasses the overall metrics of larger open-source models, demonstrating the effectiveness of **Wan2.1**'s architecture and the data construction pipeline.
(figure: Wan-Bench evaluation results)
## Computational Efficiency on Different GPUs We test the computational efficiency of different **Wan2.1** models on different GPUs in the following table. The results are presented in the format: **Total time (s) / peak GPU memory (GB)**.
(figure: computational efficiency results)
> The parameter settings for the tests presented in this table are as follows: > (1) For the 1.3B model on 8 GPUs, set and ; > (2) For the 14B model on 1 GPU, use ; > (3) For the 1.3B model on a single 4090 GPU, set ; > (4) For all tests, no prompt extension was applied. ------- ## Introduction of Wan2.1 **Wan2.1** is designed on the mainstream diffusion transformer paradigm, achieving significant advancements in generative capabilities through a series of innovations. These include our novel spatio-temporal variational autoencoder (VAE), scalable training strategies, large-scale data construction, and automated evaluation metrics. Collectively, these contributions enhance the model’s performance and versatility. ##### (1) 3D Variational Autoencoders We propose a novel 3D causal VAE architecture, termed **Wan-VAE**, specifically designed for video generation. By combining multiple strategies, we improve spatio-temporal compression, reduce memory usage, and ensure temporal causality. **Wan-VAE** demonstrates significant advantages in performance efficiency compared to other open-source VAEs. Furthermore, our **Wan-VAE** can encode and decode unlimited-length 1080P videos without losing historical temporal information, making it particularly well-suited for video generation tasks.
(figure: Wan-VAE efficiency comparison)
##### (2) Video Diffusion DiT **Wan2.1** is designed using the Flow Matching framework within the paradigm of mainstream Diffusion Transformers. Our model's architecture uses the T5 Encoder to encode multilingual text input, with cross-attention in each transformer block embedding the text into the model structure. Additionally, we employ an MLP with a Linear layer and a SiLU layer to process the input time embeddings and predict six modulation parameters individually. This MLP is shared across all transformer blocks, with each block learning a distinct set of biases. Our experimental findings reveal a significant performance improvement with this approach at the same parameter scale.
(figure: Wan2.1 diffusion transformer architecture)
| Model | Dimension | Input Dimension | Output Dimension | Feedforward Dimension | Frequency Dimension | Number of Heads | Number of Layers |
|--------|-----------|-----------------|------------------|-----------------------|---------------------|-----------------|------------------|
| 1.3B | 1536 | 16 | 16 | 8960 | 256 | 12 | 30 |
| 14B | 5120 | 16 | 16 | 13824 | 256 | 40 | 40 |

##### Data We curated and deduplicated a candidate dataset comprising a vast amount of image and video data. During the data curation process, we designed a four-step data cleaning process, focusing on fundamental dimensions, visual quality and motion quality. Through the robust data processing pipeline, we can easily obtain high-quality, diverse, and large-scale training sets of images and videos. (figure: data processing pipeline) ##### Comparisons to SOTA We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performance. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions. Then we calculated the total score through a weighted average based on the importance of each dimension. The detailed results are shown in the table below. These results demonstrate our model's superior performance compared to both open-source and closed-source models. (figure: comparison with SOTA models) ## Citation If you find our work helpful, please cite us. ## License Agreement The models in this repository are licensed under the Apache 2.0 License. We claim no rights over the content you generate, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license. ## Acknowledgements We would like to thank the contributors to the SD3, Qwen, umt5-xxl, diffusers and HuggingFace repositories, for their open research. ## Contact Us If you would like to leave a message for our research or product teams, feel free to join our Discord or WeChat groups!", + "model_explanation_gemini": "Generates videos from text inputs while supporting multiple languages and resolutions, optimized for consumer-grade GPUs." 
+} \ No newline at end of file diff --git a/data/model_data_json/WhereIsAI_UAE-Large-V1.json b/data/model_data_json/WhereIsAI_UAE-Large-V1.json new file mode 100644 index 0000000000000000000000000000000000000000..b33ec28662b4c58e70358a463e41c587cf28ec0a --- /dev/null +++ b/data/model_data_json/WhereIsAI_UAE-Large-V1.json @@ -0,0 +1,27 @@ +{ + "model_id": "WhereIsAI/UAE-Large-V1", + "downloads": 1336443, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "mteb", + "sentence_embedding", + "feature_extraction", + "transformers", + "transformers.js", + "en", + "arxiv:2309.12871", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence_embedding - feature_extraction - sentence-transformers - transformers - transformers.js model-index: - name: UAE-Large-V1 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.55223880597015 - type: ap value: 38.264070815317794 - type: f1 value: 69.40977934769845 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.84267499999999 - type: ap value: 89.57568507997713 - type: f1 value: 92.82590734337774 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.292 - type: f1 value: 47.90257816032778 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 42.105 - type: map_at_10 value: 58.181000000000004 - type: map_at_100 value: 58.653999999999996 - type: map_at_1000 value: 58.657000000000004 - type: map_at_3 value: 54.386 - type: map_at_5 value: 56.757999999999996 - type: mrr_at_1 value: 42.745 - type: mrr_at_10 value: 58.437 - type: mrr_at_100 value: 58.894999999999996 - type: mrr_at_1000 value: 58.897999999999996 - type: mrr_at_3 value: 54.635 - type: mrr_at_5 value: 56.99999999999999 - type: ndcg_at_1 value: 42.105 - type: ndcg_at_10 value: 66.14999999999999 - type: ndcg_at_100 value: 68.048 - type: ndcg_at_1000 value: 68.11399999999999 - type: ndcg_at_3 value: 58.477000000000004 - type: ndcg_at_5 value: 62.768 - type: precision_at_1 value: 42.105 - type: precision_at_10 value: 9.110999999999999 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 23.447000000000003 - type: precision_at_5 value: 16.159000000000002 - type: recall_at_1 value: 42.105 - type: recall_at_10 value: 91.11 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 70.341 - type: recall_at_5 value: 80.797 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 49.02580759154173 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 
metrics: - type: v_measure value: 43.093601280163554 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 64.19590406875427 - type: mrr value: 77.09547992788991 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.86678362843676 - type: cos_sim_spearman value: 86.1423242570783 - type: euclidean_pearson value: 85.98994198511751 - type: euclidean_spearman value: 86.48209103503942 - type: manhattan_pearson value: 85.6446436316182 - type: manhattan_spearman value: 86.21039809734357 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.69155844155844 - type: f1 value: 87.68109381943547 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.37501687500394 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 37.23401405155885 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.232 - type: map_at_10 value: 41.404999999999994 - type: map_at_100 value: 42.896 - type: map_at_1000 value: 43.028 - type: map_at_3 value: 37.925 - type: map_at_5 value: 39.865 - type: mrr_at_1 value: 36.338 - type: mrr_at_10 value: 46.969 - type: mrr_at_100 value: 47.684 - type: mrr_at_1000 value: 47.731 - type: mrr_at_3 value: 44.063 - type: mrr_at_5 value: 45.908 - type: ndcg_at_1 value: 36.338 - type: ndcg_at_10 value: 47.887 - type: ndcg_at_100 value: 53.357 - type: ndcg_at_1000 value: 55.376999999999995 - type: ndcg_at_3 value: 42.588 - type: ndcg_at_5 value: 45.132 - type: precision_at_1 value: 36.338 - type: precision_at_10 value: 9.17 - type: precision_at_100 value: 1.4909999999999999 - type: precision_at_1000 value: 0.196 - type: precision_at_3 value: 20.315 - type: precision_at_5 value: 14.793000000000001 - type: recall_at_1 value: 30.232 - type: recall_at_10 value: 60.67399999999999 - type: recall_at_100 value: 83.628 - type: recall_at_1000 value: 96.209 - type: recall_at_3 value: 45.48 - type: recall_at_5 value: 52.354 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.237 - type: map_at_10 value: 42.829 - type: map_at_100 value: 44.065 - type: map_at_1000 value: 44.199 - type: map_at_3 value: 39.885999999999996 - type: map_at_5 value: 41.55 - type: mrr_at_1 value: 40.064 - type: mrr_at_10 value: 48.611 - type: mrr_at_100 value: 49.245 - type: mrr_at_1000 value: 49.29 - type: mrr_at_3 value: 46.561 - type: mrr_at_5 value: 47.771 - type: ndcg_at_1 value: 40.064 - type: ndcg_at_10 value: 48.388 - type: ndcg_at_100 value: 52.666999999999994 - type: ndcg_at_1000 value: 54.67100000000001 - type: ndcg_at_3 value: 44.504 - type: ndcg_at_5 value: 46.303 - type: precision_at_1 value: 40.064 - type: precision_at_10 
value: 9.051 - type: precision_at_100 value: 1.4500000000000002 - type: precision_at_1000 value: 0.193 - type: precision_at_3 value: 21.444 - type: precision_at_5 value: 15.045 - type: recall_at_1 value: 32.237 - type: recall_at_10 value: 57.943999999999996 - type: recall_at_100 value: 75.98700000000001 - type: recall_at_1000 value: 88.453 - type: recall_at_3 value: 46.268 - type: recall_at_5 value: 51.459999999999994 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.797 - type: map_at_10 value: 51.263000000000005 - type: map_at_100 value: 52.333 - type: map_at_1000 value: 52.393 - type: map_at_3 value: 47.936 - type: map_at_5 value: 49.844 - type: mrr_at_1 value: 44.389 - type: mrr_at_10 value: 54.601 - type: mrr_at_100 value: 55.300000000000004 - type: mrr_at_1000 value: 55.333 - type: mrr_at_3 value: 52.068999999999996 - type: mrr_at_5 value: 53.627 - type: ndcg_at_1 value: 44.389 - type: ndcg_at_10 value: 57.193000000000005 - type: ndcg_at_100 value: 61.307 - type: ndcg_at_1000 value: 62.529 - type: ndcg_at_3 value: 51.607 - type: ndcg_at_5 value: 54.409 - type: precision_at_1 value: 44.389 - type: precision_at_10 value: 9.26 - type: precision_at_100 value: 1.222 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 23.03 - type: precision_at_5 value: 15.887 - type: recall_at_1 value: 38.797 - type: recall_at_10 value: 71.449 - type: recall_at_100 value: 88.881 - type: recall_at_1000 value: 97.52 - type: recall_at_3 value: 56.503 - type: recall_at_5 value: 63.392 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.291999999999998 - type: map_at_10 value: 35.65 - type: map_at_100 value: 36.689 - type: map_at_1000 value: 36.753 - type: map_at_3 value: 32.995000000000005 - type: map_at_5 value: 34.409 - type: mrr_at_1 value: 29.04 - type: mrr_at_10 value: 37.486000000000004 - type: mrr_at_100 value: 38.394 - type: mrr_at_1000 value: 38.445 - type: mrr_at_3 value: 35.028 - type: mrr_at_5 value: 36.305 - type: ndcg_at_1 value: 29.04 - type: ndcg_at_10 value: 40.613 - type: ndcg_at_100 value: 45.733000000000004 - type: ndcg_at_1000 value: 47.447 - type: ndcg_at_3 value: 35.339999999999996 - type: ndcg_at_5 value: 37.706 - type: precision_at_1 value: 29.04 - type: precision_at_10 value: 6.192 - type: precision_at_100 value: 0.9249999999999999 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 14.802000000000001 - type: precision_at_5 value: 10.305 - type: recall_at_1 value: 27.291999999999998 - type: recall_at_10 value: 54.25299999999999 - type: recall_at_100 value: 77.773 - type: recall_at_1000 value: 90.795 - type: recall_at_3 value: 39.731 - type: recall_at_5 value: 45.403999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.326 - type: map_at_10 value: 26.290999999999997 - type: map_at_100 value: 27.456999999999997 - type: map_at_1000 value: 27.583000000000002 - type: map_at_3 value: 23.578 - type: map_at_5 value: 25.113000000000003 - type: mrr_at_1 value: 22.637 - type: mrr_at_10 value: 31.139 - type: mrr_at_100 value: 32.074999999999996 - type: mrr_at_1000 value: 32.147 - type: mrr_at_3 value: 28.483000000000004 - type: mrr_at_5 value: 29.963 - type: ndcg_at_1 value: 
22.637 - type: ndcg_at_10 value: 31.717000000000002 - type: ndcg_at_100 value: 37.201 - type: ndcg_at_1000 value: 40.088 - type: ndcg_at_3 value: 26.686 - type: ndcg_at_5 value: 29.076999999999998 - type: precision_at_1 value: 22.637 - type: precision_at_10 value: 5.7090000000000005 - type: precision_at_100 value: 0.979 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.894 - type: precision_at_5 value: 9.328 - type: recall_at_1 value: 18.326 - type: recall_at_10 value: 43.824999999999996 - type: recall_at_100 value: 67.316 - type: recall_at_1000 value: 87.481 - type: recall_at_3 value: 29.866999999999997 - type: recall_at_5 value: 35.961999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.875 - type: map_at_10 value: 40.458 - type: map_at_100 value: 41.772 - type: map_at_1000 value: 41.882999999999996 - type: map_at_3 value: 37.086999999999996 - type: map_at_5 value: 39.153 - type: mrr_at_1 value: 36.381 - type: mrr_at_10 value: 46.190999999999995 - type: mrr_at_100 value: 46.983999999999995 - type: mrr_at_1000 value: 47.032000000000004 - type: mrr_at_3 value: 43.486999999999995 - type: mrr_at_5 value: 45.249 - type: ndcg_at_1 value: 36.381 - type: ndcg_at_10 value: 46.602 - type: ndcg_at_100 value: 51.885999999999996 - type: ndcg_at_1000 value: 53.895 - type: ndcg_at_3 value: 41.155 - type: ndcg_at_5 value: 44.182 - type: precision_at_1 value: 36.381 - type: precision_at_10 value: 8.402 - type: precision_at_100 value: 1.278 - type: precision_at_1000 value: 0.16199999999999998 - type: precision_at_3 value: 19.346 - type: precision_at_5 value: 14.09 - type: recall_at_1 value: 29.875 - type: recall_at_10 value: 59.065999999999995 - type: recall_at_100 value: 80.923 - type: recall_at_1000 value: 93.927 - type: recall_at_3 value: 44.462 - type: recall_at_5 value: 51.89 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.94 - type: map_at_10 value: 35.125 - type: map_at_100 value: 36.476 - type: map_at_1000 value: 36.579 - type: map_at_3 value: 31.840000000000003 - type: map_at_5 value: 33.647 - type: mrr_at_1 value: 30.936000000000003 - type: mrr_at_10 value: 40.637 - type: mrr_at_100 value: 41.471000000000004 - type: mrr_at_1000 value: 41.525 - type: mrr_at_3 value: 38.013999999999996 - type: mrr_at_5 value: 39.469 - type: ndcg_at_1 value: 30.936000000000003 - type: ndcg_at_10 value: 41.295 - type: ndcg_at_100 value: 46.92 - type: ndcg_at_1000 value: 49.183 - type: ndcg_at_3 value: 35.811 - type: ndcg_at_5 value: 38.306000000000004 - type: precision_at_1 value: 30.936000000000003 - type: precision_at_10 value: 7.728 - type: precision_at_100 value: 1.226 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.237 - type: precision_at_5 value: 12.42 - type: recall_at_1 value: 24.94 - type: recall_at_10 value: 54.235 - type: recall_at_100 value: 78.314 - type: recall_at_1000 value: 93.973 - type: recall_at_3 value: 38.925 - type: recall_at_5 value: 45.505 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.250833333333333 - type: map_at_10 value: 35.46875 - type: map_at_100 value: 36.667 - type: map_at_1000 value: 36.78025 - type: map_at_3 value: 32.56733333333334 - type: 
map_at_5 value: 34.20333333333333 - type: mrr_at_1 value: 30.8945 - type: mrr_at_10 value: 39.636833333333335 - type: mrr_at_100 value: 40.46508333333333 - type: mrr_at_1000 value: 40.521249999999995 - type: mrr_at_3 value: 37.140166666666666 - type: mrr_at_5 value: 38.60999999999999 - type: ndcg_at_1 value: 30.8945 - type: ndcg_at_10 value: 40.93441666666667 - type: ndcg_at_100 value: 46.062416666666664 - type: ndcg_at_1000 value: 48.28341666666667 - type: ndcg_at_3 value: 35.97575 - type: ndcg_at_5 value: 38.3785 - type: precision_at_1 value: 30.8945 - type: precision_at_10 value: 7.180250000000001 - type: precision_at_100 value: 1.1468333333333334 - type: precision_at_1000 value: 0.15283333333333332 - type: precision_at_3 value: 16.525583333333334 - type: precision_at_5 value: 11.798333333333332 - type: recall_at_1 value: 26.250833333333333 - type: recall_at_10 value: 52.96108333333333 - type: recall_at_100 value: 75.45908333333334 - type: recall_at_1000 value: 90.73924999999998 - type: recall_at_3 value: 39.25483333333333 - type: recall_at_5 value: 45.37950000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.595 - type: map_at_10 value: 31.747999999999998 - type: map_at_100 value: 32.62 - type: map_at_1000 value: 32.713 - type: map_at_3 value: 29.48 - type: map_at_5 value: 30.635 - type: mrr_at_1 value: 27.607 - type: mrr_at_10 value: 34.449000000000005 - type: mrr_at_100 value: 35.182 - type: mrr_at_1000 value: 35.254000000000005 - type: mrr_at_3 value: 32.413 - type: mrr_at_5 value: 33.372 - type: ndcg_at_1 value: 27.607 - type: ndcg_at_10 value: 36.041000000000004 - type: ndcg_at_100 value: 40.514 - type: ndcg_at_1000 value: 42.851 - type: ndcg_at_3 value: 31.689 - type: ndcg_at_5 value: 33.479 - type: precision_at_1 value: 27.607 - type: precision_at_10 value: 5.66 - type: precision_at_100 value: 0.868 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 13.446 - type: precision_at_5 value: 9.264 - type: recall_at_1 value: 24.595 - type: recall_at_10 value: 46.79 - type: recall_at_100 value: 67.413 - type: recall_at_1000 value: 84.753 - type: recall_at_3 value: 34.644999999999996 - type: recall_at_5 value: 39.09 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.333000000000002 - type: map_at_10 value: 24.427 - type: map_at_100 value: 25.576 - type: map_at_1000 value: 25.692999999999998 - type: map_at_3 value: 22.002 - type: map_at_5 value: 23.249 - type: mrr_at_1 value: 20.716 - type: mrr_at_10 value: 28.072000000000003 - type: mrr_at_100 value: 29.067 - type: mrr_at_1000 value: 29.137 - type: mrr_at_3 value: 25.832 - type: mrr_at_5 value: 27.045 - type: ndcg_at_1 value: 20.716 - type: ndcg_at_10 value: 29.109 - type: ndcg_at_100 value: 34.797 - type: ndcg_at_1000 value: 37.503 - type: ndcg_at_3 value: 24.668 - type: ndcg_at_5 value: 26.552999999999997 - type: precision_at_1 value: 20.716 - type: precision_at_10 value: 5.351 - type: precision_at_100 value: 0.955 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 11.584999999999999 - type: precision_at_5 value: 8.362 - type: recall_at_1 value: 17.333000000000002 - type: recall_at_10 value: 39.604 - type: recall_at_100 value: 65.525 - type: recall_at_1000 value: 84.651 - type: recall_at_3 value: 27.199 - type: recall_at_5 value: 32.019 - 
task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.342 - type: map_at_10 value: 35.349000000000004 - type: map_at_100 value: 36.443 - type: map_at_1000 value: 36.548 - type: map_at_3 value: 32.307 - type: map_at_5 value: 34.164 - type: mrr_at_1 value: 31.063000000000002 - type: mrr_at_10 value: 39.703 - type: mrr_at_100 value: 40.555 - type: mrr_at_1000 value: 40.614 - type: mrr_at_3 value: 37.141999999999996 - type: mrr_at_5 value: 38.812000000000005 - type: ndcg_at_1 value: 31.063000000000002 - type: ndcg_at_10 value: 40.873 - type: ndcg_at_100 value: 45.896 - type: ndcg_at_1000 value: 48.205999999999996 - type: ndcg_at_3 value: 35.522 - type: ndcg_at_5 value: 38.419 - type: precision_at_1 value: 31.063000000000002 - type: precision_at_10 value: 6.866 - type: precision_at_100 value: 1.053 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 16.014 - type: precision_at_5 value: 11.604000000000001 - type: recall_at_1 value: 26.342 - type: recall_at_10 value: 53.40200000000001 - type: recall_at_100 value: 75.251 - type: recall_at_1000 value: 91.13799999999999 - type: recall_at_3 value: 39.103 - type: recall_at_5 value: 46.357 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.71 - type: map_at_10 value: 32.153999999999996 - type: map_at_100 value: 33.821 - type: map_at_1000 value: 34.034 - type: map_at_3 value: 29.376 - type: map_at_5 value: 30.878 - type: mrr_at_1 value: 28.458 - type: mrr_at_10 value: 36.775999999999996 - type: mrr_at_100 value: 37.804 - type: mrr_at_1000 value: 37.858999999999995 - type: mrr_at_3 value: 34.123999999999995 - type: mrr_at_5 value: 35.596 - type: ndcg_at_1 value: 28.458 - type: ndcg_at_10 value: 37.858999999999995 - type: ndcg_at_100 value: 44.194 - type: ndcg_at_1000 value: 46.744 - type: ndcg_at_3 value: 33.348 - type: ndcg_at_5 value: 35.448 - type: precision_at_1 value: 28.458 - type: precision_at_10 value: 7.4510000000000005 - type: precision_at_100 value: 1.5 - type: precision_at_1000 value: 0.23700000000000002 - type: precision_at_3 value: 15.809999999999999 - type: precision_at_5 value: 11.462 - type: recall_at_1 value: 23.71 - type: recall_at_10 value: 48.272999999999996 - type: recall_at_100 value: 77.134 - type: recall_at_1000 value: 93.001 - type: recall_at_3 value: 35.480000000000004 - type: recall_at_5 value: 41.19 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.331 - type: map_at_10 value: 28.926000000000002 - type: map_at_100 value: 29.855999999999998 - type: map_at_1000 value: 29.957 - type: map_at_3 value: 26.395999999999997 - type: map_at_5 value: 27.933000000000003 - type: mrr_at_1 value: 23.105 - type: mrr_at_10 value: 31.008000000000003 - type: mrr_at_100 value: 31.819999999999997 - type: mrr_at_1000 value: 31.887999999999998 - type: mrr_at_3 value: 28.466 - type: mrr_at_5 value: 30.203000000000003 - type: ndcg_at_1 value: 23.105 - type: ndcg_at_10 value: 33.635999999999996 - type: ndcg_at_100 value: 38.277 - type: ndcg_at_1000 value: 40.907 - type: ndcg_at_3 value: 28.791 - type: ndcg_at_5 value: 31.528 - type: precision_at_1 value: 23.105 - type: precision_at_10 value: 5.323 - type: precision_at_100 value: 0.815 - type: precision_at_1000 
value: 0.117 - type: precision_at_3 value: 12.384 - type: precision_at_5 value: 9.02 - type: recall_at_1 value: 21.331 - type: recall_at_10 value: 46.018 - type: recall_at_100 value: 67.364 - type: recall_at_1000 value: 86.97 - type: recall_at_3 value: 33.395 - type: recall_at_5 value: 39.931 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 17.011000000000003 - type: map_at_10 value: 28.816999999999997 - type: map_at_100 value: 30.761 - type: map_at_1000 value: 30.958000000000002 - type: map_at_3 value: 24.044999999999998 - type: map_at_5 value: 26.557 - type: mrr_at_1 value: 38.696999999999996 - type: mrr_at_10 value: 50.464 - type: mrr_at_100 value: 51.193999999999996 - type: mrr_at_1000 value: 51.219 - type: mrr_at_3 value: 47.339999999999996 - type: mrr_at_5 value: 49.346000000000004 - type: ndcg_at_1 value: 38.696999999999996 - type: ndcg_at_10 value: 38.53 - type: ndcg_at_100 value: 45.525 - type: ndcg_at_1000 value: 48.685 - type: ndcg_at_3 value: 32.282 - type: ndcg_at_5 value: 34.482 - type: precision_at_1 value: 38.696999999999996 - type: precision_at_10 value: 11.895999999999999 - type: precision_at_100 value: 1.95 - type: precision_at_1000 value: 0.254 - type: precision_at_3 value: 24.038999999999998 - type: precision_at_5 value: 18.332 - type: recall_at_1 value: 17.011000000000003 - type: recall_at_10 value: 44.452999999999996 - type: recall_at_100 value: 68.223 - type: recall_at_1000 value: 85.653 - type: recall_at_3 value: 28.784 - type: recall_at_5 value: 35.66 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.516 - type: map_at_10 value: 21.439 - type: map_at_100 value: 31.517 - type: map_at_1000 value: 33.267 - type: map_at_3 value: 15.004999999999999 - type: map_at_5 value: 17.793999999999997 - type: mrr_at_1 value: 71.25 - type: mrr_at_10 value: 79.071 - type: mrr_at_100 value: 79.325 - type: mrr_at_1000 value: 79.33 - type: mrr_at_3 value: 77.708 - type: mrr_at_5 value: 78.546 - type: ndcg_at_1 value: 58.62500000000001 - type: ndcg_at_10 value: 44.889 - type: ndcg_at_100 value: 50.536 - type: ndcg_at_1000 value: 57.724 - type: ndcg_at_3 value: 49.32 - type: ndcg_at_5 value: 46.775 - type: precision_at_1 value: 71.25 - type: precision_at_10 value: 36.175000000000004 - type: precision_at_100 value: 11.940000000000001 - type: precision_at_1000 value: 2.178 - type: precision_at_3 value: 53.583000000000006 - type: precision_at_5 value: 45.550000000000004 - type: recall_at_1 value: 9.516 - type: recall_at_10 value: 27.028000000000002 - type: recall_at_100 value: 57.581 - type: recall_at_1000 value: 80.623 - type: recall_at_3 value: 16.313 - type: recall_at_5 value: 20.674 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.74999999999999 - type: f1 value: 46.46706502669774 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 77.266 - type: map_at_10 value: 84.89999999999999 - type: map_at_100 value: 85.109 - type: map_at_1000 value: 85.123 - type: map_at_3 value: 83.898 - type: map_at_5 value: 84.541 - type: mrr_at_1 value: 83.138 - type: mrr_at_10 value: 89.37 - type: mrr_at_100 value: 89.432 - type: mrr_at_1000 value: 89.43299999999999 - 
type: mrr_at_3 value: 88.836 - type: mrr_at_5 value: 89.21 - type: ndcg_at_1 value: 83.138 - type: ndcg_at_10 value: 88.244 - type: ndcg_at_100 value: 88.98700000000001 - type: ndcg_at_1000 value: 89.21900000000001 - type: ndcg_at_3 value: 86.825 - type: ndcg_at_5 value: 87.636 - type: precision_at_1 value: 83.138 - type: precision_at_10 value: 10.47 - type: precision_at_100 value: 1.1079999999999999 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.933 - type: precision_at_5 value: 20.36 - type: recall_at_1 value: 77.266 - type: recall_at_10 value: 94.063 - type: recall_at_100 value: 96.993 - type: recall_at_1000 value: 98.414 - type: recall_at_3 value: 90.228 - type: recall_at_5 value: 92.328 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.319 - type: map_at_10 value: 36.943 - type: map_at_100 value: 38.951 - type: map_at_1000 value: 39.114 - type: map_at_3 value: 32.82 - type: map_at_5 value: 34.945 - type: mrr_at_1 value: 44.135999999999996 - type: mrr_at_10 value: 53.071999999999996 - type: mrr_at_100 value: 53.87 - type: mrr_at_1000 value: 53.90200000000001 - type: mrr_at_3 value: 50.77199999999999 - type: mrr_at_5 value: 52.129999999999995 - type: ndcg_at_1 value: 44.135999999999996 - type: ndcg_at_10 value: 44.836 - type: ndcg_at_100 value: 51.754 - type: ndcg_at_1000 value: 54.36 - type: ndcg_at_3 value: 41.658 - type: ndcg_at_5 value: 42.354 - type: precision_at_1 value: 44.135999999999996 - type: precision_at_10 value: 12.284 - type: precision_at_100 value: 1.952 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 27.828999999999997 - type: precision_at_5 value: 20.093 - type: recall_at_1 value: 22.319 - type: recall_at_10 value: 51.528 - type: recall_at_100 value: 76.70700000000001 - type: recall_at_1000 value: 92.143 - type: recall_at_3 value: 38.641 - type: recall_at_5 value: 43.653999999999996 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.182 - type: map_at_10 value: 65.146 - type: map_at_100 value: 66.023 - type: map_at_1000 value: 66.078 - type: map_at_3 value: 61.617999999999995 - type: map_at_5 value: 63.82299999999999 - type: mrr_at_1 value: 80.365 - type: mrr_at_10 value: 85.79 - type: mrr_at_100 value: 85.963 - type: mrr_at_1000 value: 85.968 - type: mrr_at_3 value: 84.952 - type: mrr_at_5 value: 85.503 - type: ndcg_at_1 value: 80.365 - type: ndcg_at_10 value: 73.13499999999999 - type: ndcg_at_100 value: 76.133 - type: ndcg_at_1000 value: 77.151 - type: ndcg_at_3 value: 68.255 - type: ndcg_at_5 value: 70.978 - type: precision_at_1 value: 80.365 - type: precision_at_10 value: 15.359 - type: precision_at_100 value: 1.7690000000000001 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 44.024 - type: precision_at_5 value: 28.555999999999997 - type: recall_at_1 value: 40.182 - type: recall_at_10 value: 76.793 - type: recall_at_100 value: 88.474 - type: recall_at_1000 value: 95.159 - type: recall_at_3 value: 66.036 - type: recall_at_5 value: 71.391 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.7796 - type: ap value: 89.24883716810874 - type: f1 value: 92.7706903433313 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev 
revision: None metrics: - type: map_at_1 value: 22.016 - type: map_at_10 value: 34.408 - type: map_at_100 value: 35.592 - type: map_at_1000 value: 35.64 - type: map_at_3 value: 30.459999999999997 - type: map_at_5 value: 32.721000000000004 - type: mrr_at_1 value: 22.593 - type: mrr_at_10 value: 34.993 - type: mrr_at_100 value: 36.113 - type: mrr_at_1000 value: 36.156 - type: mrr_at_3 value: 31.101 - type: mrr_at_5 value: 33.364 - type: ndcg_at_1 value: 22.579 - type: ndcg_at_10 value: 41.404999999999994 - type: ndcg_at_100 value: 47.018 - type: ndcg_at_1000 value: 48.211999999999996 - type: ndcg_at_3 value: 33.389 - type: ndcg_at_5 value: 37.425000000000004 - type: precision_at_1 value: 22.579 - type: precision_at_10 value: 6.59 - type: precision_at_100 value: 0.938 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.241000000000001 - type: precision_at_5 value: 10.59 - type: recall_at_1 value: 22.016 - type: recall_at_10 value: 62.927 - type: recall_at_100 value: 88.72 - type: recall_at_1000 value: 97.80799999999999 - type: recall_at_3 value: 41.229 - type: recall_at_5 value: 50.88 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.01732786137711 - type: f1 value: 93.76353126402202 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.91746466028272 - type: f1 value: 57.715651682646765 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.5030262273033 - type: f1 value: 74.6693629986121 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.74781439139207 - type: f1 value: 79.96684171018774 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.2156206892017 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.180539484816137 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.51125957874274 - type: mrr value: 33.777037359249995 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 7.248 - type: map_at_10 value: 15.340000000000002 - type: map_at_100 value: 19.591 - type: map_at_1000 value: 21.187 - type: map_at_3 value: 11.329 - type: map_at_5 value: 13.209999999999999 - type: mrr_at_1 value: 47.678 - type: mrr_at_10 value: 57.493 - type: mrr_at_100 value: 58.038999999999994 - type: mrr_at_1000 value: 58.07 - type: mrr_at_3 value: 55.36600000000001 - type: mrr_at_5 value: 56.635999999999996 - type: ndcg_at_1 value: 46.129999999999995 - type: ndcg_at_10 value: 
38.653999999999996 - type: ndcg_at_100 value: 36.288 - type: ndcg_at_1000 value: 44.765 - type: ndcg_at_3 value: 43.553 - type: ndcg_at_5 value: 41.317 - type: precision_at_1 value: 47.368 - type: precision_at_10 value: 28.669 - type: precision_at_100 value: 9.158 - type: precision_at_1000 value: 2.207 - type: precision_at_3 value: 40.97 - type: precision_at_5 value: 35.604 - type: recall_at_1 value: 7.248 - type: recall_at_10 value: 19.46 - type: recall_at_100 value: 37.214000000000006 - type: recall_at_1000 value: 67.64099999999999 - type: recall_at_3 value: 12.025 - type: recall_at_5 value: 15.443999999999999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.595000000000002 - type: map_at_10 value: 47.815999999999995 - type: map_at_100 value: 48.811 - type: map_at_1000 value: 48.835 - type: map_at_3 value: 43.225 - type: map_at_5 value: 46.017 - type: mrr_at_1 value: 35.689 - type: mrr_at_10 value: 50.341 - type: mrr_at_100 value: 51.044999999999995 - type: mrr_at_1000 value: 51.062 - type: mrr_at_3 value: 46.553 - type: mrr_at_5 value: 48.918 - type: ndcg_at_1 value: 35.66 - type: ndcg_at_10 value: 55.859 - type: ndcg_at_100 value: 59.864 - type: ndcg_at_1000 value: 60.419999999999995 - type: ndcg_at_3 value: 47.371 - type: ndcg_at_5 value: 51.995000000000005 - type: precision_at_1 value: 35.66 - type: precision_at_10 value: 9.27 - type: precision_at_100 value: 1.1520000000000001 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 21.63 - type: precision_at_5 value: 15.655 - type: recall_at_1 value: 31.595000000000002 - type: recall_at_10 value: 77.704 - type: recall_at_100 value: 94.774 - type: recall_at_1000 value: 98.919 - type: recall_at_3 value: 56.052 - type: recall_at_5 value: 66.623 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.489 - type: map_at_10 value: 85.411 - type: map_at_100 value: 86.048 - type: map_at_1000 value: 86.064 - type: map_at_3 value: 82.587 - type: map_at_5 value: 84.339 - type: mrr_at_1 value: 82.28 - type: mrr_at_10 value: 88.27199999999999 - type: mrr_at_100 value: 88.362 - type: mrr_at_1000 value: 88.362 - type: mrr_at_3 value: 87.372 - type: mrr_at_5 value: 87.995 - type: ndcg_at_1 value: 82.27 - type: ndcg_at_10 value: 89.023 - type: ndcg_at_100 value: 90.191 - type: ndcg_at_1000 value: 90.266 - type: ndcg_at_3 value: 86.37 - type: ndcg_at_5 value: 87.804 - type: precision_at_1 value: 82.27 - type: precision_at_10 value: 13.469000000000001 - type: precision_at_100 value: 1.533 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.797 - type: precision_at_5 value: 24.734 - type: recall_at_1 value: 71.489 - type: recall_at_10 value: 95.824 - type: recall_at_100 value: 99.70599999999999 - type: recall_at_1000 value: 99.979 - type: recall_at_3 value: 88.099 - type: recall_at_5 value: 92.285 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 60.52398807444541 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 65.34855891507871 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: 
None metrics: - type: map_at_1 value: 5.188000000000001 - type: map_at_10 value: 13.987 - type: map_at_100 value: 16.438 - type: map_at_1000 value: 16.829 - type: map_at_3 value: 9.767000000000001 - type: map_at_5 value: 11.912 - type: mrr_at_1 value: 25.6 - type: mrr_at_10 value: 37.744 - type: mrr_at_100 value: 38.847 - type: mrr_at_1000 value: 38.894 - type: mrr_at_3 value: 34.166999999999994 - type: mrr_at_5 value: 36.207 - type: ndcg_at_1 value: 25.6 - type: ndcg_at_10 value: 22.980999999999998 - type: ndcg_at_100 value: 32.039 - type: ndcg_at_1000 value: 38.157000000000004 - type: ndcg_at_3 value: 21.567 - type: ndcg_at_5 value: 19.070999999999998 - type: precision_at_1 value: 25.6 - type: precision_at_10 value: 12.02 - type: precision_at_100 value: 2.5100000000000002 - type: precision_at_1000 value: 0.396 - type: precision_at_3 value: 20.333000000000002 - type: precision_at_5 value: 16.98 - type: recall_at_1 value: 5.188000000000001 - type: recall_at_10 value: 24.372 - type: recall_at_100 value: 50.934999999999995 - type: recall_at_1000 value: 80.477 - type: recall_at_3 value: 12.363 - type: recall_at_5 value: 17.203 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 87.24286275535398 - type: cos_sim_spearman value: 82.62333770991818 - type: euclidean_pearson value: 84.60353717637284 - type: euclidean_spearman value: 82.32990108810047 - type: manhattan_pearson value: 84.6089049738196 - type: manhattan_spearman value: 82.33361785438936 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 87.87428858503165 - type: cos_sim_spearman value: 79.09145886519929 - type: euclidean_pearson value: 86.42669231664036 - type: euclidean_spearman value: 80.03127375435449 - type: manhattan_pearson value: 86.41330338305022 - type: manhattan_spearman value: 80.02492538673368 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 88.67912277322645 - type: cos_sim_spearman value: 89.6171319711762 - type: euclidean_pearson value: 86.56571917398725 - type: euclidean_spearman value: 87.71216907898948 - type: manhattan_pearson value: 86.57459050182473 - type: manhattan_spearman value: 87.71916648349993 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 86.71957379085862 - type: cos_sim_spearman value: 85.01784075851465 - type: euclidean_pearson value: 84.7407848472801 - type: euclidean_spearman value: 84.61063091345538 - type: manhattan_pearson value: 84.71494352494403 - type: manhattan_spearman value: 84.58772077604254 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.40508326325175 - type: cos_sim_spearman value: 89.50912897763186 - type: euclidean_pearson value: 87.82349070086627 - type: euclidean_spearman value: 88.44179162727521 - type: manhattan_pearson value: 87.80181927025595 - type: manhattan_spearman value: 88.43205129636243 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 
4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 85.35846741715478 - type: cos_sim_spearman value: 86.61172476741842 - type: euclidean_pearson value: 84.60123125491637 - type: euclidean_spearman value: 85.3001948141827 - type: manhattan_pearson value: 84.56231142658329 - type: manhattan_spearman value: 85.23579900798813 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.94539129818824 - type: cos_sim_spearman value: 88.99349064256742 - type: euclidean_pearson value: 88.7142444640351 - type: euclidean_spearman value: 88.34120813505011 - type: manhattan_pearson value: 88.70363008238084 - type: manhattan_spearman value: 88.31952816956954 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 68.29910260369893 - type: cos_sim_spearman value: 68.79263346213466 - type: euclidean_pearson value: 68.41627521422252 - type: euclidean_spearman value: 66.61602587398579 - type: manhattan_pearson value: 68.49402183447361 - type: manhattan_spearman value: 66.80157792354453 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 87.43703906343708 - type: cos_sim_spearman value: 89.06081805093662 - type: euclidean_pearson value: 87.48311456299662 - type: euclidean_spearman value: 88.07417597580013 - type: manhattan_pearson value: 87.48202249768894 - type: manhattan_spearman value: 88.04758031111642 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.49080620485203 - type: mrr value: 96.19145378949301 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 59.317 - type: map_at_10 value: 69.296 - type: map_at_100 value: 69.738 - type: map_at_1000 value: 69.759 - type: map_at_3 value: 66.12599999999999 - type: map_at_5 value: 67.532 - type: mrr_at_1 value: 62 - type: mrr_at_10 value: 70.176 - type: mrr_at_100 value: 70.565 - type: mrr_at_1000 value: 70.583 - type: mrr_at_3 value: 67.833 - type: mrr_at_5 value: 68.93299999999999 - type: ndcg_at_1 value: 62 - type: ndcg_at_10 value: 74.069 - type: ndcg_at_100 value: 76.037 - type: ndcg_at_1000 value: 76.467 - type: ndcg_at_3 value: 68.628 - type: ndcg_at_5 value: 70.57600000000001 - type: precision_at_1 value: 62 - type: precision_at_10 value: 10 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 26.667 - type: precision_at_5 value: 17.4 - type: recall_at_1 value: 59.317 - type: recall_at_10 value: 87.822 - type: recall_at_100 value: 96.833 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 73.06099999999999 - type: recall_at_5 value: 77.928 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.88910891089108 - type: cos_sim_ap value: 97.236958456951 - type: cos_sim_f1 value: 94.39999999999999 - 
type: cos_sim_precision value: 94.39999999999999 - type: cos_sim_recall value: 94.39999999999999 - type: dot_accuracy value: 99.82574257425742 - type: dot_ap value: 94.94344759441888 - type: dot_f1 value: 91.17352056168507 - type: dot_precision value: 91.44869215291752 - type: dot_recall value: 90.9 - type: euclidean_accuracy value: 99.88415841584158 - type: euclidean_ap value: 97.2044250782305 - type: euclidean_f1 value: 94.210786739238 - type: euclidean_precision value: 93.24191968658178 - type: euclidean_recall value: 95.19999999999999 - type: manhattan_accuracy value: 99.88613861386139 - type: manhattan_ap value: 97.20683205497689 - type: manhattan_f1 value: 94.2643391521197 - type: manhattan_precision value: 94.02985074626866 - type: manhattan_recall value: 94.5 - type: max_accuracy value: 99.88910891089108 - type: max_ap value: 97.236958456951 - type: max_f1 value: 94.39999999999999 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 66.53940781726187 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 36.71865011295108 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 55.3218674533331 - type: mrr value: 56.28279910449028 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.723915667479673 - type: cos_sim_spearman value: 32.029070449745234 - type: dot_pearson value: 28.864944212481454 - type: dot_spearman value: 27.939266999596725 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.231 - type: map_at_10 value: 1.949 - type: map_at_100 value: 10.023 - type: map_at_1000 value: 23.485 - type: map_at_3 value: 0.652 - type: map_at_5 value: 1.054 - type: mrr_at_1 value: 86 - type: mrr_at_10 value: 92.067 - type: mrr_at_100 value: 92.067 - type: mrr_at_1000 value: 92.067 - type: mrr_at_3 value: 91.667 - type: mrr_at_5 value: 92.067 - type: ndcg_at_1 value: 83 - type: ndcg_at_10 value: 76.32900000000001 - type: ndcg_at_100 value: 54.662 - type: ndcg_at_1000 value: 48.062 - type: ndcg_at_3 value: 81.827 - type: ndcg_at_5 value: 80.664 - type: precision_at_1 value: 86 - type: precision_at_10 value: 80 - type: precision_at_100 value: 55.48 - type: precision_at_1000 value: 20.938000000000002 - type: precision_at_3 value: 85.333 - type: precision_at_5 value: 84.39999999999999 - type: recall_at_1 value: 0.231 - type: recall_at_10 value: 2.158 - type: recall_at_100 value: 13.344000000000001 - type: recall_at_1000 value: 44.31 - type: recall_at_3 value: 0.6779999999999999 - type: recall_at_5 value: 1.13 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.524 - type: map_at_10 value: 10.183 - type: map_at_100 value: 16.625 - type: map_at_1000 value: 18.017 - type: map_at_3 value: 5.169 - type: map_at_5 value: 6.772 - type: mrr_at_1 value: 32.653 - type: 
mrr_at_10 value: 47.128 - type: mrr_at_100 value: 48.458 - type: mrr_at_1000 value: 48.473 - type: mrr_at_3 value: 44.897999999999996 - type: mrr_at_5 value: 45.306000000000004 - type: ndcg_at_1 value: 30.612000000000002 - type: ndcg_at_10 value: 24.928 - type: ndcg_at_100 value: 37.613 - type: ndcg_at_1000 value: 48.528 - type: ndcg_at_3 value: 28.829 - type: ndcg_at_5 value: 25.237 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 22.448999999999998 - type: precision_at_100 value: 8.02 - type: precision_at_1000 value: 1.537 - type: precision_at_3 value: 30.612000000000002 - type: precision_at_5 value: 24.490000000000002 - type: recall_at_1 value: 2.524 - type: recall_at_10 value: 16.38 - type: recall_at_100 value: 49.529 - type: recall_at_1000 value: 83.598 - type: recall_at_3 value: 6.411 - type: recall_at_5 value: 8.932 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.09020000000001 - type: ap value: 14.451710060978993 - type: f1 value: 54.7874410609049 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.745331069609506 - type: f1 value: 60.08387848592697 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.71549485462037 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.39345532574357 - type: cos_sim_ap value: 78.16796549696478 - type: cos_sim_f1 value: 71.27713276123171 - type: cos_sim_precision value: 68.3115626511853 - type: cos_sim_recall value: 74.51187335092348 - type: dot_accuracy value: 85.12248912201228 - type: dot_ap value: 69.26039256107077 - type: dot_f1 value: 65.04294321240867 - type: dot_precision value: 63.251059586138126 - type: dot_recall value: 66.93931398416886 - type: euclidean_accuracy value: 87.07754664123503 - type: euclidean_ap value: 77.7872176038945 - type: euclidean_f1 value: 70.85587801278899 - type: euclidean_precision value: 66.3519115614924 - type: euclidean_recall value: 76.01583113456465 - type: manhattan_accuracy value: 87.07754664123503 - type: manhattan_ap value: 77.7341400185556 - type: manhattan_f1 value: 70.80310880829015 - type: manhattan_precision value: 69.54198473282443 - type: manhattan_recall value: 72.1108179419525 - type: max_accuracy value: 87.39345532574357 - type: max_ap value: 78.16796549696478 - type: max_f1 value: 71.27713276123171 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.09457833663213 - type: cos_sim_ap value: 86.33024314706873 - type: cos_sim_f1 value: 78.59623733719248 - type: cos_sim_precision value: 74.13322413322413 - type: cos_sim_recall value: 83.63104404065291 - type: dot_accuracy value: 88.3086894089339 - type: dot_ap value: 83.92225241805097 - type: dot_f1 value: 76.8721826377781 - 
type: dot_precision value: 72.8168044077135 - type: dot_recall value: 81.40591315060055 - type: euclidean_accuracy value: 88.77052043311213 - type: euclidean_ap value: 85.7410710218755 - type: euclidean_f1 value: 77.97705489398781 - type: euclidean_precision value: 73.77713657598241 - type: euclidean_recall value: 82.68401601478288 - type: manhattan_accuracy value: 88.73753250281368 - type: manhattan_ap value: 85.72867199072802 - type: manhattan_f1 value: 77.89774182922812 - type: manhattan_precision value: 74.23787931635857 - type: manhattan_recall value: 81.93717277486911 - type: max_accuracy value: 89.09457833663213 - type: max_ap value: 86.33024314706873 - type: max_f1 value: 78.59623733719248 license: mit language: - en --- # Universal AnglE Embedding 📢 **is licensed under MIT. Feel free to use it in any scenario.** **If you use it for academic papers, you could cite us via 👉 citation info.** **🤝 Follow us on:** - GitHub: - Preprint Paper: AnglE-optimized Text Embeddings - Conference Paper: AoE: Angle-optimized Embeddings for Semantic Textual Similarity (ACL24) - **📘 Documentation**: Welcome to using AnglE to train and infer powerful sentence embeddings. **🏆 Achievements** - 📅 May 16, 2024 | AnglE's paper is accepted by ACL 2024 Main Conference - 📅 Dec 4, 2024 | 🔥 Our universal English sentence embedding achieves **SOTA** on the MTEB Leaderboard with an average score of 64.64! !image/jpeg **🧑‍🤝‍🧑 Siblings:** - WhereIsAI/UAE-Code-Large-V1: This model can be used for code or GitHub issue similarity measurement. # Usage ## 1. angle_emb 1) Non-Retrieval Tasks There is no need to specify any prompts. 2) Retrieval Tasks For retrieval purposes, please use the prompt for query (not for document). ## 2. sentence transformer ## 3. Infinity Infinity is a MIT licensed server for OpenAI-compatible deployment. # Citation If you use our pre-trained models, welcome to support us by citing our work:", + "model_explanation_gemini": "Generates sentence embeddings for tasks like classification, retrieval, clustering, and semantic textual similarity." +} \ No newline at end of file diff --git a/data/model_data_json/WinKawaks_vit-small-patch16-224.json b/data/model_data_json/WinKawaks_vit-small-patch16-224.json new file mode 100644 index 0000000000000000000000000000000000000000..108de270ffde90f61e3a77de8e69d7c0a7e4d406 --- /dev/null +++ b/data/model_data_json/WinKawaks_vit-small-patch16-224.json @@ -0,0 +1,20 @@ +{ + "model_id": "WinKawaks/vit-small-patch16-224", + "downloads": 130851, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "vit", + "image-classification", + "vision", + "dataset:imagenet", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- Google didn't publish vit-tiny and vit-small model checkpoints in Hugging Face. I converted the weights from the timm repository. This model is used in the same way as ViT-base. Note that [safetensors] model requires torch 2.0 environment.", + "model_explanation_gemini": "Converts timm repository weights for ViT-tiny and ViT-small models to enable vision-based image classification tasks similar to ViT-base." 
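> **Editor's note.** The UAE-Large-V1 card above lists three usage paths (angle_emb, sentence-transformers, Infinity) but its code snippets were stripped from this dump. Below is a minimal Python sketch of the sentence-transformers path. The example sentences are illustrative, and the retrieval prompt string is the query prompt AnglE documents as `Prompts.C`; treat it as an assumption if your angle_emb version differs.

```python
# Minimal sketch of the "sentence transformer" usage path from the
# WhereIsAI/UAE-Large-V1 card; the card's own snippets are elided above.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("WhereIsAI/UAE-Large-V1")

# Non-retrieval tasks: no prompt is needed.
docs = ["The weather is nice today.", "It is sunny outside."]
doc_emb = model.encode(docs, normalize_embeddings=True)

# Retrieval tasks: prefix the *query* (not the documents) with a prompt.
# This string is AnglE's documented query prompt (Prompts.C) -- an assumption here.
query = "Represent this sentence for searching relevant passages: what is the weather like?"
query_emb = model.encode([query], normalize_embeddings=True)

# Embeddings are normalized, so the dot product is cosine similarity.
print(query_emb @ doc_emb.T)
```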
+} \ No newline at end of file diff --git a/data/model_data_json/WinKawaks_vit-tiny-patch16-224.json b/data/model_data_json/WinKawaks_vit-tiny-patch16-224.json new file mode 100644 index 0000000000000000000000000000000000000000..4c7623f981f0fe974667b9e5b1493c42ea5b4b0b --- /dev/null +++ b/data/model_data_json/WinKawaks_vit-tiny-patch16-224.json @@ -0,0 +1,19 @@ +{ + "model_id": "WinKawaks/vit-tiny-patch16-224", + "downloads": 711314, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-classification", + "vision", + "dataset:imagenet", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- Google didn't publish vit-tiny and vit-small model checkpoints in Hugging Face. I converted the weights from the timm repository. This model is used in the same way as ViT-base. Note that [safetensors] model requires torch 2.0 environment.", + "model_explanation_gemini": "Converts timm repository weights for a tiny Vision Transformer (ViT) model to perform image classification tasks similarly to ViT-base." +} \ No newline at end of file diff --git a/data/model_data_json/XLabs-AI_flux-RealismLora.json b/data/model_data_json/XLabs-AI_flux-RealismLora.json new file mode 100644 index 0000000000000000000000000000000000000000..f59da0235da58ef4ec84d0167a700b05c0fbc6d6 --- /dev/null +++ b/data/model_data_json/XLabs-AI_flux-RealismLora.json @@ -0,0 +1,19 @@ +{ + "model_id": "XLabs-AI/flux-RealismLora", + "downloads": 110815, + "tags": [ + "diffusers", + "lora", + "Stable Diffusion", + "image-generation", + "Flux", + "text-to-image", + "en", + "base_model:black-forest-labs/FLUX.1-dev", + "base_model:adapter:black-forest-labs/FLUX.1-dev", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: flux-1-dev-non-commercial-license license_link: language: - en pipeline_tag: text-to-image tags: - lora - Stable Diffusion - image-generation - Flux - diffusers base_model: black-forest-labs/FLUX.1-dev --- !Lora Photorealism for Flux ", + "model_explanation_gemini": "Enhances FLUX.1-dev's Stable Diffusion capabilities with photorealistic image generation using LoRA fine-tuning for non-commercial use." +} \ No newline at end of file diff --git a/data/model_data_json/XLabs-AI_xflux_text_encoders.json b/data/model_data_json/XLabs-AI_xflux_text_encoders.json new file mode 100644 index 0000000000000000000000000000000000000000..c90fc7bdf5b31cc0ee2a9846501bb214f2886f76 --- /dev/null +++ b/data/model_data_json/XLabs-AI_xflux_text_encoders.json @@ -0,0 +1,18 @@ +{ + "model_id": "XLabs-AI/xflux_text_encoders", + "downloads": 510191, + "tags": [ + "transformers", + "safetensors", + "t5", + "pytorch", + "text2text-generation", + "en", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: text2text-generation tags: - pytorch - transformers --- # Description Text encoder weights from Google's T5 model", + "model_explanation_gemini": "Encodes text using Google's T5 model weights for text-to-text generation tasks." 
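> **Editor's note.** Both WinKawaks ViT cards state the checkpoints are "used in the same way as ViT-base" without showing code. A sketch of that standard usage follows; the image URL is an arbitrary example, not from the original cards.

```python
# Standard ViT-base-style inference, applied to a WinKawaks checkpoint.
import requests
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

model_id = "WinKawaks/vit-tiny-patch16-224"  # or "WinKawaks/vit-small-patch16-224"
processor = ViTImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # illustrative image
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])  # ImageNet class name
```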
+} \ No newline at end of file diff --git a/data/model_data_json/Xenova_all-MiniLM-L6-v2.json b/data/model_data_json/Xenova_all-MiniLM-L6-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..7e7bda6cf58233b8f75205d70ac7e2a7f0be2454 --- /dev/null +++ b/data/model_data_json/Xenova_all-MiniLM-L6-v2.json @@ -0,0 +1,16 @@ +{ + "model_id": "Xenova/all-MiniLM-L6-v2", + "downloads": 334341, + "tags": [ + "transformers.js", + "onnx", + "bert", + "feature-extraction", + "base_model:sentence-transformers/all-MiniLM-L6-v2", + "base_model:quantized:sentence-transformers/all-MiniLM-L6-v2", + "license:apache-2.0", + "region:us" + ], + "description": "--- base_model: sentence-transformers/all-MiniLM-L6-v2 library_name: transformers.js license: apache-2.0 --- with ONNX weights to be compatible with Transformers.js. ## Usage (Transformers.js) If you haven't already, you can install the Transformers.js JavaScript library from NPM using: You can then use the model to compute embeddings like this: You can convert this Tensor to a nested JavaScript array using : Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named ).", + "model_explanation_gemini": "Converts text into embeddings for web applications using ONNX weights compatible with Transformers.js." +} \ No newline at end of file diff --git a/data/model_data_json/Xenova_bge-base-en-v1.5.json b/data/model_data_json/Xenova_bge-base-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..59d3c916b2cbe5ead163a50e7eaf8e4ab01bdb01 --- /dev/null +++ b/data/model_data_json/Xenova_bge-base-en-v1.5.json @@ -0,0 +1,16 @@ +{ + "model_id": "Xenova/bge-base-en-v1.5", + "downloads": 797029, + "tags": [ + "transformers.js", + "onnx", + "bert", + "feature-extraction", + "base_model:BAAI/bge-base-en-v1.5", + "base_model:quantized:BAAI/bge-base-en-v1.5", + "license:mit", + "region:us" + ], + "description": "--- base_model: BAAI/bge-base-en-v1.5 library_name: transformers.js license: mit --- with ONNX weights to be compatible with Transformers.js. ## Usage (Transformers.js) If you haven't already, you can install the Transformers.js JavaScript library from NPM using: You can then use the model to compute embeddings, as follows: You can also use the model for retrieval. For example: --- Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named ).", + "model_explanation_gemini": "Generates embeddings and supports retrieval tasks using ONNX weights for web compatibility." 
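> **Editor's note.** The Xenova embedding cards above target Transformers.js, and their JavaScript snippets are elided in this dump. To keep all examples in one language, here is a rough Python equivalent against the base checkpoints these ONNX repos mirror. The sentences are illustrative; the BGE query instruction is the one BAAI's own card documents.

```python
# Python equivalent of the embedding usage the Xenova/all-MiniLM-L6-v2 and
# Xenova/bge-base-en-v1.5 cards describe for Transformers.js, run on the
# base checkpoints those ONNX repos are quantized from.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
sentences = ["This framework generates embeddings.", "Sentences are encoded as vectors."]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)

# BGE-style retrieval prepends an instruction to the query only; this string
# is BAAI's documented query instruction (an assumption in this context).
bge = SentenceTransformer("BAAI/bge-base-en-v1.5")
q = "Represent this sentence for searching relevant passages: what is a panda?"
q_emb = bge.encode([q], normalize_embeddings=True)
```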
+} \ No newline at end of file diff --git a/data/model_data_json/Xenova_clip-vit-base-patch32.json b/data/model_data_json/Xenova_clip-vit-base-patch32.json new file mode 100644 index 0000000000000000000000000000000000000000..d3d7fc0d41dd05ed9c28d91461636759df854da0 --- /dev/null +++ b/data/model_data_json/Xenova_clip-vit-base-patch32.json @@ -0,0 +1,15 @@ +{ + "model_id": "Xenova/clip-vit-base-patch32", + "downloads": 139162, + "tags": [ + "transformers.js", + "onnx", + "clip", + "zero-shot-image-classification", + "base_model:openai/clip-vit-base-patch32", + "base_model:quantized:openai/clip-vit-base-patch32", + "region:us" + ], + "description": "--- base_model: openai/clip-vit-base-patch32 library_name: transformers.js --- with ONNX weights to be compatible with Transformers.js. ## Usage (Transformers.js) If you haven't already, you can install the Transformers.js JavaScript library from NPM using: **Example:** Perform zero-shot image classification with the API. --- Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named ).", + "model_explanation_gemini": "Enables zero-shot image classification in web environments using ONNX-optimized CLIP ViT-B/32 model via Transformers.js." +} \ No newline at end of file diff --git a/data/model_data_json/Xenova_e5-base-v2.json b/data/model_data_json/Xenova_e5-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..983882f123d25f3d3b7cb97170b98fb3d5efffb0 --- /dev/null +++ b/data/model_data_json/Xenova_e5-base-v2.json @@ -0,0 +1,15 @@ +{ + "model_id": "Xenova/e5-base-v2", + "downloads": 36740, + "tags": [ + "transformers.js", + "onnx", + "bert", + "feature-extraction", + "base_model:intfloat/e5-base-v2", + "base_model:quantized:intfloat/e5-base-v2", + "region:us" + ], + "description": "--- base_model: intfloat/e5-base-v2 library_name: transformers.js --- with ONNX weights to be compatible with Transformers.js. Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named ).", + "model_explanation_gemini": "Converts the 'intfloat/e5-base-v2' model to ONNX format for web compatibility with Transformers.js." +} \ No newline at end of file diff --git a/data/model_data_json/Xenova_jina-embeddings-v2-small-en.json b/data/model_data_json/Xenova_jina-embeddings-v2-small-en.json new file mode 100644 index 0000000000000000000000000000000000000000..eb75a2a90ef3eb80d3d3eeeaaf90adb2fa33d114 --- /dev/null +++ b/data/model_data_json/Xenova_jina-embeddings-v2-small-en.json @@ -0,0 +1,17 @@ +{ + "model_id": "Xenova/jina-embeddings-v2-small-en", + "downloads": 125208, + "tags": [ + "transformers.js", + "onnx", + "bert", + "fill-mask", + "feature-extraction", + "custom_code", + "base_model:jinaai/jina-embeddings-v2-small-en", + "base_model:quantized:jinaai/jina-embeddings-v2-small-en", + "region:us" + ], + "description": "--- base_model: jinaai/jina-embeddings-v2-small-en library_name: transformers.js pipeline_tag: feature-extraction --- with ONNX weights to be compatible with Transformers.js. 
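> **Editor's note.** The Xenova/clip-vit-base-patch32 card mentions zero-shot image classification but its example is elided. A Python sketch on the base OpenAI checkpoint is below; the image URL and candidate labels are illustrative assumptions.

```python
# Zero-shot image classification, as described in the CLIP card above,
# via the Python pipeline on the base openai/clip-vit-base-patch32 checkpoint.
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
result = classifier(
    "http://images.cocodataset.org/val2017/000000039769.jpg",  # illustrative image
    candidate_labels=["a photo of cats", "a photo of a dog", "a photo of a palace"],
)
print(result[0])  # highest-scoring label with its score
```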
## Usage with 🤗 Transformers.js If you haven't already, you can install the Transformers.js JavaScript library from NPM using: You can then use the model as follows: --- Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named ).", + "model_explanation_gemini": "Converts text into numerical embeddings for feature extraction, optimized for web use with ONNX weights compatible with Transformers.js." +} \ No newline at end of file diff --git a/data/model_data_json/Xenova_segformer-b0-finetuned-ade-512-512.json b/data/model_data_json/Xenova_segformer-b0-finetuned-ade-512-512.json new file mode 100644 index 0000000000000000000000000000000000000000..7f3511b830064e83e3911ba709500c5b836fe95d --- /dev/null +++ b/data/model_data_json/Xenova_segformer-b0-finetuned-ade-512-512.json @@ -0,0 +1,15 @@ +{ + "model_id": "Xenova/segformer-b0-finetuned-ade-512-512", + "downloads": 93820, + "tags": [ + "transformers.js", + "onnx", + "segformer", + "image-segmentation", + "base_model:nvidia/segformer-b0-finetuned-ade-512-512", + "base_model:quantized:nvidia/segformer-b0-finetuned-ade-512-512", + "region:us" + ], + "description": "--- base_model: nvidia/segformer-b0-finetuned-ade-512-512 library_name: transformers.js pipeline_tag: image-segmentation --- with ONNX weights to be compatible with Transformers.js. ## Usage (Transformers.js) If you haven't already, you can install the Transformers.js JavaScript library from NPM using: **Example:** Image segmentation with . You can visualize the outputs with: --- Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named ).", + "model_explanation_gemini": "Performs image segmentation using ONNX-optimized weights for web compatibility with Transformers.js." +} \ No newline at end of file diff --git a/data/model_data_json/Xenova_tiny-random-Phi3ForCausalLM.json b/data/model_data_json/Xenova_tiny-random-Phi3ForCausalLM.json new file mode 100644 index 0000000000000000000000000000000000000000..cc0ef911c1be356cbc4afe4a39dc2892fa0be621 --- /dev/null +++ b/data/model_data_json/Xenova_tiny-random-Phi3ForCausalLM.json @@ -0,0 +1,19 @@ +{ + "model_id": "Xenova/tiny-random-Phi3ForCausalLM", + "downloads": 93557, + "tags": [ + "transformers", + "onnx", + "safetensors", + "phi3", + "text-generation", + "conversational", + "arxiv:1910.09700", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID Code used to generate the model: ## Model Details ### Model Description This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. 
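> **Editor's note.** The Xenova/segformer-b0-finetuned-ade-512-512 card says you can run image segmentation and "visualize the outputs with:" a snippet that is missing here. A Python sketch on the base NVIDIA checkpoint follows; the image URL is an illustrative assumption.

```python
# Semantic segmentation matching the SegFormer card above, via the Python
# pipeline on the base nvidia/segformer-b0-finetuned-ade-512-512 checkpoint.
from transformers import pipeline

segmenter = pipeline("image-segmentation", model="nvidia/segformer-b0-finetuned-ade-512-512")
outputs = segmenter("http://images.cocodataset.org/val2017/000000039769.jpg")
for segment in outputs:
    # Each entry carries an ADE20K label and a PIL mask you can visualize or save.
    print(segment["label"], segment["mask"].size)
```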
- **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]", + "model_explanation_gemini": "A randomly initialized tiny Phi3 model for causal language modeling tasks." +} \ No newline at end of file diff --git a/data/model_data_json/Xuhui_ToxDect-roberta-large.json b/data/model_data_json/Xuhui_ToxDect-roberta-large.json new file mode 100644 index 0000000000000000000000000000000000000000..161a0ffeb215e4b905494a45698f80c603fd44c0 --- /dev/null +++ b/data/model_data_json/Xuhui_ToxDect-roberta-large.json @@ -0,0 +1,16 @@ +{ + "model_id": "Xuhui/ToxDect-roberta-large", + "downloads": 555445, + "tags": [ + "transformers", + "pytorch", + "roberta", + "text-classification", + "arxiv:2102.00086", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - - thumbnail: tags: - - - license: datasets: - - metrics: - - --- # Toxic language detection ## Model description A toxic language detection model trained on tweets. The base model is Roberta-large. 
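> [Editor's note] The ToxDect card's "How to use" snippet is not included in the scrape. A minimal sketch, assuming the standard text-classification pipeline (the card states LABEL_1 = toxic, LABEL_0 = non-toxic); the example sentence is an assumption.

```python
# Hedged sketch: toxicity classification with Xuhui/ToxDect-roberta-large.
# Per the card, LABEL_1 means toxic and LABEL_0 means non-toxic.
from transformers import pipeline

detector = pipeline("text-classification", model="Xuhui/ToxDect-roberta-large")
print(detector("You are a wonderful person!"))
# e.g. [{'label': 'LABEL_0', 'score': ...}]  # LABEL_0 = non-toxic
```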
For more information, including the **training data**, **limitations and bias**, please refer to the paper and Github repo for more details. #### How to use Note that LABEL_1 means toxic and LABEL_0 means non-toxic in the output. ## Training procedure The random seed for this model is 22. For other details, please refer to the Github repo for more details. ### BibTeX entry and citation info", + "model_explanation_gemini": "Detects toxic language in tweets, classifying text as toxic (LABEL_1) or non-toxic (LABEL_0) using a Roberta-large base model." +} \ No newline at end of file diff --git a/data/model_data_json/Yehor_w2v-xls-r-uk.json b/data/model_data_json/Yehor_w2v-xls-r-uk.json new file mode 100644 index 0000000000000000000000000000000000000000..5dec9f7fc38a96fc3cbb99a765bb6d16d2af72d9 --- /dev/null +++ b/data/model_data_json/Yehor_w2v-xls-r-uk.json @@ -0,0 +1,21 @@ +{ + "model_id": "Yehor/w2v-xls-r-uk", + "downloads": 245477, + "tags": [ + "transformers", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "uk", + "dataset:mozilla-foundation/common_voice_10_0", + "base_model:facebook/wav2vec2-xls-r-300m", + "base_model:finetune:facebook/wav2vec2-xls-r-300m", + "doi:10.57967/hf/4556", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: facebook/wav2vec2-xls-r-300m language: - uk license: \"apache-2.0\" tags: - automatic-speech-recognition datasets: - mozilla-foundation/common_voice_10_0 metrics: - wer model-index: - name: w2v-xls-r-uk results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: common_voice_10_0 type: common_voice_10_0 config: uk split: test args: uk metrics: - name: WER type: wer value: 20.24 - name: CER type: cer value: 3.64 --- 🚨🚨🚨 **ATTENTION!** 🚨🚨🚨 **Use an updated model**: --- ## Community - Discord: - Speech Recognition: - Speech Synthesis: See other Ukrainian models: ## Evaluation results Metrics (float16) using library with : - WER: 0.2024 metric, 20.24% - CER: 0.0364 metric, 3.64% - Accuracy on words: 79.76% - Accuracy on chars: 96.36% - Inference time: 63.4848 seconds - Audio duration: 16665.5212 seconds - RTF: 0.0038 ## Cite this work", + "model_explanation_gemini": "A Ukrainian automatic speech recognition model based on Facebook's wav2vec2-xls-r-300m, fine-tuned on Common Voice 10.0 to transcribe speech into text with 20.24% WER and 3.64% CER." 
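> [Editor's note] The Yehor/w2v-xls-r-uk card's inference code is not included in the scrape. A minimal sketch using the generic ASR pipeline; the audio path is a hypothetical placeholder (16 kHz mono audio assumed).

```python
# Hedged sketch: Ukrainian speech recognition with the wav2vec2 CTC model above.
# "audio.wav" is a placeholder for a 16 kHz mono recording; decoding a file
# path requires ffmpeg to be installed.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="Yehor/w2v-xls-r-uk")
print(asr("audio.wav")["text"])
```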
+} \ No newline at end of file diff --git a/data/model_data_json/Yntec_insaneRealistic_v2.json b/data/model_data_json/Yntec_insaneRealistic_v2.json new file mode 100644 index 0000000000000000000000000000000000000000..d68b71ce1b941ec6986fb4688c9ff70110630c55 --- /dev/null +++ b/data/model_data_json/Yntec_insaneRealistic_v2.json @@ -0,0 +1,23 @@ +{ + "model_id": "Yntec/insaneRealistic_v2", + "downloads": 36916, + "tags": [ + "diffusers", + "safetensors", + "Base Model", + "Realism", + "Female", + "Woman", + "cordonsolution8", + "stable-diffusion", + "stable-diffusion-diffusers", + "text-to-image", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: creativeml-openrail-m library_name: diffusers pipeline_tag: text-to-image tags: - Base Model - Realism - Female - Woman - cordonsolution8 - stable-diffusion - stable-diffusion-diffusers - diffusers - text-to-image --- # Insane Realistic 2 Original page: Samples and prompts: !Free online AI image generator Insane Realistic 2 (Click for larger) Top left: a cute girl with freckles on her face, cgsociety unreal engine, wet t-shirt, short skirt, style of aenami alena, trending on artstartion, inspired by Fyodor Vasilyev, looks a bit similar to amy adams, emissive light, fluffy orange skin, dribbble, dramatic rendering Top right: 90s grainy vhs still young mother loose shirt, headband. holding a baby, on the couch, posing, bow. bokeh, bright lighting. smile Bottom left: beautiful image of the first day of creation of the world and planet earth in the dark deep space, light and darkness separated, planets, under a black night sky of astronomical glittering starlight in the outer reaches of the solar system beyond, trending on artstation, octane render, symmetry by raqib shaw, presence of god, eye of god. Bottom right: hill, mountains, sunset, field, world, ocean, trees, underground, city, village, path, urban, mountain, buildings, waterfall, skyline, nature, town, industrial, architecture, road, jungle, valley, bridge, horizon, landscape, house, building, environment, wilderness, enviroment, river, cave, desert, forest", + "model_explanation_gemini": "Generates highly realistic images from text prompts, specializing in female portraits, landscapes, and artistic scenes." +} \ No newline at end of file diff --git a/data/model_data_json/ZhengPeng7_BiRefNet.json b/data/model_data_json/ZhengPeng7_BiRefNet.json new file mode 100644 index 0000000000000000000000000000000000000000..a168af0edf5230e4f39ae6d2618ca5a229ff3f48 --- /dev/null +++ b/data/model_data_json/ZhengPeng7_BiRefNet.json @@ -0,0 +1,25 @@ +{ + "model_id": "ZhengPeng7/BiRefNet", + "downloads": 588605, + "tags": [ + "birefnet", + "safetensors", + "image-segmentation", + "background-removal", + "mask-generation", + "Dichotomous Image Segmentation", + "Camouflaged Object Detection", + "Salient Object Detection", + "pytorch_model_hub_mixin", + "model_hub_mixin", + "transformers", + "transformers.js", + "custom_code", + "arxiv:2401.03407", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: birefnet tags: - background-removal - mask-generation - Dichotomous Image Segmentation - Camouflaged Object Detection - Salient Object Detection - pytorch_model_hub_mixin - model_hub_mixin - transformers - transformers.js repo_url: pipeline_tag: image-segmentation license: mit ---

Bilateral Reference for High-Resolution Dichotomous Image Segmentation
Peng Zheng 1,4,5,6, Dehong Gao 2, Deng-Ping Fan 1*, Li Liu 3, Jorma Laaksonen 4, Wanli Ouyang 5, Nicu Sebe 6
1 Nankai University, 2 Northwestern Polytechnical University, 3 National University of Defense Technology, 4 Aalto University, 5 Shanghai AI Laboratory, 6 University of Trento
Only use the weights on HuggingFace -- Pro: codes are always latest; Con: Need to clone the BiRefNet repo from my GitHub. #### Use codes from GitHub + weights from local space > Only use the weights and codes both locally. #### Use the loaded BiRefNet for inference ### 2. Use inference endpoint locally: > You may need to click the *deploy* and set up the endpoint by yourself, which would make some costs. > This BiRefNet for standard dichotomous image segmentation (DIS) is trained on **DIS-TR** and validated on **DIS-TEs and DIS-VD**. ## This repo holds the official model weights of \"Bilateral Reference for High-Resolution Dichotomous Image Segmentation\" (_CAAI AIR 2024_). This repo contains the weights of BiRefNet proposed in our paper, which has achieved the SOTA performance on three tasks (DIS, HRSOD, and COD). Go to my GitHub page for BiRefNet codes and the latest updates: :) #### Try our online demos for inference: + Online **Image Inference** on Colab: ![Open In Colab]( + **Online Inference with GUI on Hugging Face** with adjustable resolutions: ![Hugging Face Spaces]( + **Inference and evaluation** of your given weights: ![Open In Colab]( ## Acknowledgement: + Many thanks to @Freepik for their generous support on GPU resources for training higher resolution BiRefNet models and more of my explorations. + Many thanks to @fal for their generous support on GPU resources for training better general BiRefNet models. + Many thanks to @not-lain for his help on the better deployment of our BiRefNet model on HuggingFace. ## Citation", + "model_explanation_gemini": "Performs high-resolution dichotomous image segmentation for tasks like background removal, mask generation, and camouflaged object detection." +} \ No newline at end of file diff --git a/data/model_data_json/ZhengPeng7_BiRefNet_HR-matting.json b/data/model_data_json/ZhengPeng7_BiRefNet_HR-matting.json new file mode 100644 index 0000000000000000000000000000000000000000..3280391fa64fc6e0c22178da121e6c704257bd4e --- /dev/null +++ b/data/model_data_json/ZhengPeng7_BiRefNet_HR-matting.json @@ -0,0 +1,23 @@ +{ + "model_id": "ZhengPeng7/BiRefNet_HR-matting", + "downloads": 131064, + "tags": [ + "birefnet", + "safetensors", + "background-removal", + "mask-generation", + "Dichotomous Image Segmentation", + "Camouflaged Object Detection", + "Salient Object Detection", + "pytorch_model_hub_mixin", + "model_hub_mixin", + "image-segmentation", + "custom_code", + "arxiv:2401.03407", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: birefnet tags: - background-removal - mask-generation - Dichotomous Image Segmentation - Camouflaged Object Detection - Salient Object Detection - pytorch_model_hub_mixin - model_hub_mixin repo_url: pipeline_tag: image-segmentation license: mit --- > This BiRefNet was trained with images in for higher resolution image matting with transparency. ### Performance: > All tested in FP16 mode. 
| Dataset | Method | Resolution | maxFm | wFmeasure | MAE | Smeasure | meanEm | HCE | maxEm | meanFm | adpEm | adpFm | mBA | maxBIoU | meanBIoU | | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | | TE-AM-2k | **BiRefNet_HR-matting**-epoch_135 | 2048x2048 | .974 | .997 | .989 | .002 | .998 | .987 | .988 | .961 | .981 | .000 | .879 | .965 | .893 | | TE-P3M-500-NP | **BiRefNet_HR-matting**-epoch_135 | 2048x2048 | .980 | .996 | .989 | .002 | .997 | .987 | .989 | .880 | .900 | .000 | .853 | .947 | .897 | | TE-AM-2k | **BiRefNet-matting**-epoch_100 | 1024x1024 | .973 | .996 | .990 | .003 | .997 | .987 | .989 | .987 | .991 | .000 | .846 | .952 | .890 | | TE-P3M-500-NP | **BiRefNet-matting**-epoch_100 | 1024x1024 | .979 | .996 | .990 | .003 | .997 | .987 | .989 | .928 | .951 | .000 | .830 | .940 | .891 | | TE-AM-2k | **BiRefNet-matting**-epoch_100 | 2048x2048 | .971 | .996 | .990 | .003 | .997 | .987 | .988 | .990 | .992 | .000 | .838 | .941 | .891 | | TE-P3M-500-NP | **BiRefNet-matting**-epoch_100 | 2048x2048 | .978 | .995 | .990 | .003 | .996 | .987 | .989 | .955 | .971 | .000 | .818 | .931 | .891 |
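> [Editor's note] The inference snippets referenced in these BiRefNet cards were stripped during scraping. A minimal sketch, assuming the trust_remote_code loading path the card mentions and the 1024x1024 ImageNet-normalized preprocessing commonly used for BiRefNet; the input/output paths are placeholders.

```python
# Hedged sketch of BiRefNet inference. Assumptions: trust_remote_code loading,
# 1024x1024 preprocessing, and that the model returns a list of side outputs
# whose last element is the final mask logits.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

birefnet = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet", trust_remote_code=True
)
birefnet.eval()

preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open("input.jpg").convert("RGB")  # placeholder input path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    # last side output -> sigmoid -> single-channel mask
    pred = birefnet(batch)[-1].sigmoid().cpu()[0].squeeze()

mask = transforms.ToPILImage()(pred).resize(image.size)
mask.save("mask.png")
```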

Bilateral Reference for High-Resolution Dichotomous Image Segmentation
Peng Zheng 1,4,5,6, Dehong Gao 2, Deng-Ping Fan 1*, Li Liu 3, Jorma Laaksonen 4, Wanli Ouyang 5, Nicu Sebe 6
1 Nankai University, 2 Northwestern Polytechnical University, 3 National University of Defense Technology, 4 Aalto University, 5 Shanghai AI Laboratory, 6 University of Trento
Only use the weights on HuggingFace -- Pro: codes are always latest; Con: Need to clone the BiRefNet repo from my GitHub. #### Use codes from GitHub + weights from local space > Only use the weights and codes both locally. #### Use the loaded BiRefNet for inference ### 2. Use inference endpoint locally: > You may need to click the *deploy* and set up the endpoint by yourself, which would make some costs. > This BiRefNet for standard dichotomous image segmentation (DIS) is trained on **DIS-TR** and validated on **DIS-TEs and DIS-VD**. ## This repo holds the official model weights of \"Bilateral Reference for High-Resolution Dichotomous Image Segmentation\" (_CAAI AIR 2024_). This repo contains the weights of BiRefNet proposed in our paper, which has achieved the SOTA performance on three tasks (DIS, HRSOD, and COD). Go to my GitHub page for BiRefNet codes and the latest updates: :) #### Try our online demos for inference: + Online **Image Inference** on Colab: ![Open In Colab]( + **Online Inference with GUI on Hugging Face** with adjustable resolutions: ![Hugging Face Spaces]( + **Inference and evaluation** of your given weights: ![Open In Colab]( ## Acknowledgement: + Many thanks to @freepik for their generous support on GPU resources for training this model! ## Citation", + "model_explanation_gemini": "Performs high-resolution image matting with transparency for tasks like background removal, mask generation, and dichotomous image segmentation." +} \ No newline at end of file diff --git a/data/model_data_json/abhinand_MedEmbed-large-v0.1.json b/data/model_data_json/abhinand_MedEmbed-large-v0.1.json new file mode 100644 index 0000000000000000000000000000000000000000..ba8e7cd9d9f4b6530b5e0cd6abd8e2a924618478 --- /dev/null +++ b/data/model_data_json/abhinand_MedEmbed-large-v0.1.json @@ -0,0 +1,25 @@ +{ + "model_id": "abhinand/MedEmbed-large-v0.1", + "downloads": 183131, + "tags": [ + "sentence-transformers", + "safetensors", + "bert", + "medembed", + "medical-embedding", + "clinical-embedding", + "information-retrieval", + "en", + "dataset:MedicalQARetrieval", + "dataset:NFCorpus", + "dataset:PublicHealthQA", + "dataset:TRECCOVID", + "dataset:ArguAna", + "base_model:BAAI/bge-large-en-v1.5", + "base_model:finetune:BAAI/bge-large-en-v1.5", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: en tags: - medembed - medical-embedding - clinical-embedding - information-retrieval - sentence-transformers license: apache-2.0 datasets: - MedicalQARetrieval - NFCorpus - PublicHealthQA - TRECCOVID - ArguAna metrics: - nDCG - MAP - Recall - Precision - MRR base_model: - BAAI/bge-large-en-v1.5 --- # MedEmbed: Specialized Embedding Model for Medical and Clinical Information Retrieval !benchmark-scores ## Model Description MedEmbed is a family of embedding models fine-tuned specifically for medical and clinical data, designed to enhance performance in healthcare-related natural language processing (NLP) tasks, particularly information retrieval. **GitHub Repo:** **Technical Blog Post:** ## Intended Use This model is intended for use in medical and clinical contexts to improve information retrieval, question answering, and semantic search tasks. It can be integrated into healthcare systems, research tools, and medical literature databases to enhance search capabilities and information access. ## Training Data !synthetic-datagen-flow The model was trained using a simple yet effective synthetic data generation pipeline: 1. Source: Clinical notes from PubMed Central (PMC) 2. 
Processing: LLaMA 3.1 70B model used to generate query-response pairs 3. Augmentation: Negative sampling for challenging examples 4. Format: Triplets (query, positive response, negative response) for contrastive learning ## Performance MedEmbed consistently outperforms general-purpose embedding models across various medical NLP benchmarks: - ArguAna - MedicalQARetrieval - NFCorpus - PublicHealthQA - TRECCOVID Specific performance metrics (nDCG, MAP, Recall, Precision, MRR) are available in the full documentation. ## Limitations While highly effective for medical and clinical data, this model may not generalize well to non-medical domains. It should be used with caution in general-purpose NLP tasks. ## Ethical Considerations Users should be aware of potential biases in medical data and the ethical implications of AI in healthcare. This model should be used as a tool to assist, not replace, human expertise in medical decision-making. ## Citation If you use this model in your research, please cite: For more detailed information, visit our GitHub repository.", + "model_explanation_gemini": "Generates specialized embeddings for medical and clinical data to enhance information retrieval and semantic search in healthcare contexts." +} \ No newline at end of file diff --git a/data/model_data_json/afrideva_Tiny-Vicuna-1B-GGUF.json b/data/model_data_json/afrideva_Tiny-Vicuna-1B-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..6869fe5a763937ee159040d78d237925de1e88f9 --- /dev/null +++ b/data/model_data_json/afrideva_Tiny-Vicuna-1B-GGUF.json @@ -0,0 +1,21 @@ +{ + "model_id": "afrideva/Tiny-Vicuna-1B-GGUF", + "downloads": 266149, + "tags": [ + "gguf", + "ggml", + "quantized", + "q2_k", + "q3_k_m", + "q4_k_m", + "q5_k_m", + "q6_k", + "q8_0", + "text-generation", + "base_model:Jiayi-Pan/Tiny-Vicuna-1B", + "base_model:quantized:Jiayi-Pan/Tiny-Vicuna-1B", + "region:us" + ], + "description": "--- base_model: Jiayi-Pan/Tiny-Vicuna-1B inference: false model_creator: Jiayi-Pan model_name: Tiny-Vicuna-1B pipeline_tag: text-generation quantized_by: afrideva tags: - gguf - ggml - quantized - q2_k - q3_k_m - q4_k_m - q5_k_m - q6_k - q8_0 --- # Jiayi-Pan/Tiny-Vicuna-1B-GGUF Quantized GGUF model files for Tiny-Vicuna-1B from Jiayi-Pan | Name | Quant method | Size | | ---- | ---- | ---- | | tiny-vicuna-1b.q2_k.gguf | q2_k | 482.14 MB | | tiny-vicuna-1b.q3_k_m.gguf | q3_k_m | 549.85 MB | | tiny-vicuna-1b.q4_k_m.gguf | q4_k_m | 667.81 MB | | tiny-vicuna-1b.q5_k_m.gguf | q5_k_m | 782.04 MB | | tiny-vicuna-1b.q6_k.gguf | q6_k | 903.41 MB | | tiny-vicuna-1b.q8_0.gguf | q8_0 | 1.17 GB | ## Original Model Card: # Tiny Vicuna 1B TinyLLama 1.1B finetuned with WizardVicuna dataset. Easy to iterate on for early experiments!", + "model_explanation_gemini": "A quantized GGUF version of Tiny-Vicuna-1B, a 1.1B parameter model fine-tuned for text generation using the WizardVicuna dataset, designed for efficient experimentation." 
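> [Editor's note] The Tiny-Vicuna GGUF entry above lists quantized files but no loading code. A minimal sketch using llama-cpp-python, one common GGUF runtime (an assumption; llama.cpp and other runtimes work too). The file name comes from the card's table; the prompt format is an assumed Vicuna-style template.

```python
# Hedged sketch: running a quantized GGUF file with llama-cpp-python.
# Assumes tiny-vicuna-1b.q4_k_m.gguf (listed in the card's table) has been
# downloaded to the working directory.
from llama_cpp import Llama

llm = Llama(model_path="tiny-vicuna-1b.q4_k_m.gguf", n_ctx=2048)
out = llm(
    "USER: Write a haiku about small language models.\nASSISTANT:",  # assumed Vicuna-style prompt
    max_tokens=64,
)
print(out["choices"][0]["text"])
```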
+} \ No newline at end of file diff --git a/data/model_data_json/aglazkova_bart_finetuned_keyphrase_extraction.json b/data/model_data_json/aglazkova_bart_finetuned_keyphrase_extraction.json new file mode 100644 index 0000000000000000000000000000000000000000..31689ad1ed80389dbaa467787292396187bf725e --- /dev/null +++ b/data/model_data_json/aglazkova_bart_finetuned_keyphrase_extraction.json @@ -0,0 +1,21 @@ +{ + "model_id": "aglazkova/bart_finetuned_keyphrase_extraction", + "downloads": 83042, + "tags": [ + "transformers", + "pytorch", + "bart", + "text2text-generation", + "en", + "dataset:midas/krapivin", + "dataset:midas/inspec", + "dataset:midas/kptimes", + "dataset:midas/duc2001", + "arxiv:1910.13461", + "arxiv:2312.10700", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - midas/krapivin - midas/inspec - midas/kptimes - midas/duc2001 language: - en widget: - text: \"Relevance has traditionally been linked with feature subset selection, but formalization of this link has not been attempted. In this paper, we propose two axioms for feature subset selection sufficiency axiom and necessity axiombased on which this link is formalized: The expected feature subset is the one which maximizes relevance. Finding the expected feature subset turns out to be NP-hard. We then devise a heuristic algorithm to find the expected subset which has a polynomial time complexity. The experimental results show that the algorithm finds good enough subset of features which, when presented to C4.5, results in better prediction accuracy.\" - text: \"In this paper, we investigate cross-domain limitations of keyphrase generation using the models for abstractive text summarization. We present an evaluation of BART fine-tuned for keyphrase generation across three types of texts, namely scientific texts from computer science and biomedical domains and news texts. We explore the role of transfer learning between different domains to improve the model performance on small text corpora.\" --- # BART fine-tuned for keyphrase generation This is the ) model finetuned for the keyphrase generation task () on the fragments of the following corpora: * Krapivin () * Inspec () * KPTimes () * DUC-2001 () * PubMed () * NamedKeys (). #### Training Hyperparameters The following hyperparameters were used during training: * learning_rate: 4e-5 * train_batch_size: 8 * optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08 * num_epochs: 6 **BibTeX:**" +} \ No newline at end of file diff --git a/data/model_data_json/ai-forever_sbert_large_nlu_ru.json b/data/model_data_json/ai-forever_sbert_large_nlu_ru.json new file mode 100644 index 0000000000000000000000000000000000000000..fa75e2bb3dbe3d7a2ea5df42c2ea519d75828724 --- /dev/null +++ b/data/model_data_json/ai-forever_sbert_large_nlu_ru.json @@ -0,0 +1,20 @@ +{ + "model_id": "ai-forever/sbert_large_nlu_ru", + "downloads": 361658, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "PyTorch", + "Transformers", + "ru", + "license:mit", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru tags: - PyTorch - Transformers license: mit --- # BERT large model (uncased) for Sentence Embeddings in Russian language. The model is described in this article For better quality, use mean token embeddings. 
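> [Editor's note] The code in the sbert_large_nlu_ru Usage section that follows was stripped during scraping. A minimal mean-pooling sketch, per the card's recommendation to use mean token embeddings; the example sentence is an assumption.

```python
# Hedged sketch: Russian sentence embeddings via mean pooling over token
# embeddings, as the card recommends for best quality.
import torch
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # last hidden state
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("ai-forever/sbert_large_nlu_ru")
model = AutoModel.from_pretrained("ai-forever/sbert_large_nlu_ru")

sentences = ["Привет! Как твои дела?"]  # illustrative input
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)
embeddings = mean_pooling(output, encoded["attention_mask"])
print(embeddings.shape)
```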
## Usage (HuggingFace Models Repository) You can use the model directly from the model repository to compute sentence embeddings: # Authors + SberDevices Team. + Aleksandr Abramov: HF profile, Github, Kaggle Competitions Master; + Denis Antykhov: Github; + Ibragim Badertdinov: Github", + "model_explanation_gemini": "Generates sentence embeddings in Russian using a large uncased BERT model, optimized for natural language understanding tasks." +} \ No newline at end of file diff --git a/data/model_data_json/ai4bharat_IndicBERTv2-MLM-only.json b/data/model_data_json/ai4bharat_IndicBERTv2-MLM-only.json new file mode 100644 index 0000000000000000000000000000000000000000..b941fa9f26576820ef80a65c1d143ab8e6684f8b --- /dev/null +++ b/data/model_data_json/ai4bharat_IndicBERTv2-MLM-only.json @@ -0,0 +1,45 @@ +{ + "model_id": "ai4bharat/IndicBERTv2-MLM-only", + "downloads": 192418, + "tags": [ + "transformers", + "pytorch", + "bert", + "fill-mask", + "indicbert2", + "ai4bharat", + "multilingual", + "as", + "bn", + "brx", + "doi", + "en", + "gom", + "gu", + "hi", + "kn", + "ks", + "kas", + "mai", + "ml", + "mr", + "mni", + "mnb", + "ne", + "or", + "pa", + "sa", + "sat", + "sd", + "snd", + "ta", + "te", + "ur", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - as - bn - brx - doi - en - gom - gu - hi - kn - ks - kas - mai - ml - mr - mni - mnb - ne - or - pa - sa - sat - sd - snd - ta - te - ur language_details: >- asm_Beng, ben_Beng, brx_Deva, doi_Deva, eng_Latn, gom_Deva, guj_Gujr, hin_Deva, kan_Knda, kas_Arab, kas_Deva, mai_Deva, mal_Mlym, mar_Deva, mni_Beng, mni_Mtei, npi_Deva, ory_Orya, pan_Guru, san_Deva, sat_Olck, snd_Arab, snd_Deva, tam_Taml, tel_Telu, urd_Arab tags: - indicbert2 - ai4bharat - multilingual license: mit metrics: - accuracy pipeline_tag: fill-mask --- # IndicBERT A multilingual language model trained on IndicCorp v2 and evaluated on IndicXTREME benchmark. The model has 278M parameters and is available in 23 Indic languages and English. The models are trained with various objectives and datasets. The list of models are as follows: - IndicBERT-MLM [Model] - A vanilla BERT style model trained on IndicCorp v2 with the MLM objective - +Samanantar [Model] - TLM as an additional objective with Samanantar Parallel Corpus [Paper] | [Dataset] - +Back-Translation [Model] - TLM as an additional objective by translating the Indic parts of IndicCorp v2 dataset into English w/ IndicTrans model [Model] - IndicBERT-SS [Model] - To encourage better lexical sharing among languages we convert the scripts from Indic languages to Devanagari and train a BERT style model with the MLM objective ## Run Fine-tuning Fine-tuning scripts are based on transformers library. Create a new conda environment and set it up as follows: All the tasks follow the same structure, please check individual files for detailed hyper-parameter choices. The following command runs the fine-tuning for a task: Arguments: - MODEL_NAME: name of the model to fine-tune, can be a local path or a model from the HuggingFace Model Hub - TASK_NAME: one of [] > For MASSIVE task, please use the instrction provided in the official repository ## Citation", + "model_explanation_gemini": "A multilingual BERT-style model trained for masked language modeling across 23 Indic languages and English using IndicCorp v2 data." 
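> [Editor's note] The IndicBERTv2-MLM-only entry above is a fill-mask model whose usage snippet is not included in the scrape. A minimal sketch, assuming the standard [MASK] token of BERT-style tokenizers; the Hindi example sentence is an assumption.

```python
# Hedged sketch: masked-token prediction with the MLM-only IndicBERTv2 checkpoint.
from transformers import pipeline

fill = pipeline("fill-mask", model="ai4bharat/IndicBERTv2-MLM-only")
# Hindi: "The capital of India is [MASK]."
print(fill("भारत की राजधानी [MASK] है।"))
```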
+} \ No newline at end of file diff --git a/data/model_data_json/ai4bharat_indictrans2-en-indic-1B.json b/data/model_data_json/ai4bharat_indictrans2-en-indic-1B.json new file mode 100644 index 0000000000000000000000000000000000000000..430ed0f651e53cb7823e46f24b0a2681e8bcec78 --- /dev/null +++ b/data/model_data_json/ai4bharat_indictrans2-en-indic-1B.json @@ -0,0 +1,51 @@ +{ + "model_id": "ai4bharat/indictrans2-en-indic-1B", + "downloads": 108188, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "IndicTrans", + "text2text-generation", + "indictrans2", + "translation", + "ai4bharat", + "multilingual", + "custom_code", + "as", + "bn", + "brx", + "doi", + "en", + "gom", + "gu", + "hi", + "kn", + "ks", + "kas", + "mai", + "ml", + "mr", + "mni", + "mnb", + "ne", + "or", + "pa", + "sa", + "sat", + "sd", + "snd", + "ta", + "te", + "ur", + "dataset:flores-200", + "dataset:IN22-Gen", + "dataset:IN22-Conv", + "arxiv:2305.16307", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- language: - as - bn - brx - doi - en - gom - gu - hi - kn - ks - kas - mai - ml - mr - mni - mnb - ne - or - pa - sa - sat - sd - snd - ta - te - ur language_details: >- asm_Beng, ben_Beng, brx_Deva, doi_Deva, eng_Latn, gom_Deva, guj_Gujr, hin_Deva, kan_Knda, kas_Arab, kas_Deva, mai_Deva, mal_Mlym, mar_Deva, mni_Beng, mni_Mtei, npi_Deva, ory_Orya, pan_Guru, san_Deva, sat_Olck, snd_Arab, snd_Deva, tam_Taml, tel_Telu, urd_Arab tags: - indictrans2 - translation - ai4bharat - multilingual license: mit datasets: - flores-200 - IN22-Gen - IN22-Conv metrics: - bleu - chrf - chrf++ - comet inference: false --- # IndicTrans2 This is the model card of IndicTrans2 En-Indic 1.1B variant. Here are the metrics for the particular checkpoint. Please refer to of the preprint for further details on model training, intended use, data, metrics, limitations and recommendations. ### Usage Instructions Please refer to the github repository for a detail description on how to use HF compatible IndicTrans2 models for inference. ### 📢 Long Context IT2 Models - New RoPE based IndicTrans2 models which are capable of handling sequence lengths **upto 2048 tokens** are available here - These models can be used by just changing the parameter. Please read the model card of the RoPE-IT2 models for more information about the generation. - It is recommended to run these models with for efficient generation. ### Citation If you consider using our work then please cite using:", + "model_explanation_gemini": "Translates text between English and multiple Indian languages, handling up to 2048 tokens per sequence." 
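> [Editor's note] The IndicTrans2 usage instructions above defer to the GitHub repo, and the code is not included in the scrape. A hedged sketch following the preprocessing/postprocessing flow described upstream; the IndicTransToolkit package and its IndicProcessor import path are assumptions taken from that repo, so treat the repo's README as authoritative.

```python
# Hedged sketch: English -> Hindi translation with indictrans2-en-indic-1B.
# Assumption: pre/post-processing via IndicProcessor from IndicTransToolkit,
# as described in the upstream ai4bharat repository.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from IndicTransToolkit import IndicProcessor  # assumed import path (see repo)

name = "ai4bharat/indictrans2-en-indic-1B"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(name, trust_remote_code=True)

ip = IndicProcessor(inference=True)
batch = ip.preprocess_batch(
    ["How are you today?"], src_lang="eng_Latn", tgt_lang="hin_Deva"
)
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, num_beams=5, max_length=256)
decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(ip.postprocess_batch(decoded, lang="hin_Deva"))
```

The indic-en-1B variant in the next entry follows the same pattern with the source and target language codes swapped.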
+} \ No newline at end of file diff --git a/data/model_data_json/ai4bharat_indictrans2-indic-en-1B.json b/data/model_data_json/ai4bharat_indictrans2-indic-en-1B.json new file mode 100644 index 0000000000000000000000000000000000000000..32bbe5815b04f7b4e5a465e153786bcd57995900 --- /dev/null +++ b/data/model_data_json/ai4bharat_indictrans2-indic-en-1B.json @@ -0,0 +1,51 @@ +{ + "model_id": "ai4bharat/indictrans2-indic-en-1B", + "downloads": 443885, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "IndicTrans", + "text2text-generation", + "indictrans2", + "translation", + "ai4bharat", + "multilingual", + "custom_code", + "as", + "bn", + "brx", + "doi", + "en", + "gom", + "gu", + "hi", + "kn", + "ks", + "kas", + "mai", + "ml", + "mr", + "mni", + "mnb", + "ne", + "or", + "pa", + "sa", + "sat", + "sd", + "snd", + "ta", + "te", + "ur", + "dataset:flores-200", + "dataset:IN22-Gen", + "dataset:IN22-Conv", + "arxiv:2305.16307", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- language: - as - bn - brx - doi - en - gom - gu - hi - kn - ks - kas - mai - ml - mr - mni - mnb - ne - or - pa - sa - sat - sd - snd - ta - te - ur language_details: >- asm_Beng, ben_Beng, brx_Deva, doi_Deva, eng_Latn, gom_Deva, guj_Gujr, hin_Deva, kan_Knda, kas_Arab, kas_Deva, mai_Deva, mal_Mlym, mar_Deva, mni_Beng, mni_Mtei, npi_Deva, ory_Orya, pan_Guru, san_Deva, sat_Olck, snd_Arab, snd_Deva, tam_Taml, tel_Telu, urd_Arab tags: - indictrans2 - translation - ai4bharat - multilingual license: mit datasets: - flores-200 - IN22-Gen - IN22-Conv metrics: - bleu - chrf - chrf++ - comet inference: false --- # IndicTrans2 This is the model card of IndicTrans2 Indic-En 1.1B variant. Here are the metrics for the particular checkpoint. Please refer to of the preprint for further details on model training, intended use, data, metrics, limitations and recommendations. ### Usage Instructions Please refer to the github repository for a detail description on how to use HF compatible IndicTrans2 models for inference. ### 📢 Long Context IT2 Models - New RoPE based IndicTrans2 models which are capable of handling sequence lengths **upto 2048 tokens** are available here. - These models can be used by just changing the parameter. Please read the model card of the RoPE-IT2 models for more information about the generation. - It is recommended to run these models with for efficient generation. ### Citation If you consider using our work then please cite using:", + "model_explanation_gemini": "Translates between English and multiple Indic languages, handling sequences up to 2048 tokens." +} \ No newline at end of file diff --git a/data/model_data_json/aipicasso_emi.json b/data/model_data_json/aipicasso_emi.json new file mode 100644 index 0000000000000000000000000000000000000000..7338e2f0cf1f7bcc9eadaa78ea243568c686e1d1 --- /dev/null +++ b/data/model_data_json/aipicasso_emi.json @@ -0,0 +1,17 @@ +{ + "model_id": "aipicasso/emi", + "downloads": 274385, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "text-to-image", + "arxiv:2307.01952", + "arxiv:2212.03860", + "license:openrail++", + "autotrain_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- extra_gated_prompt: このモデルをこのページからダウンロードするためにはHugging Faceに登録された情報を提供する必要があります。この提供された情報は画像生成AIを活用する情報を案内するために使われます。 To download this model from this page, you need to provide information registered with Hugging Face. 
The information provided will be used to guide you on how to utilize the image-generation AI. license: openrail++ tags: - stable-diffusion - text-to-image inference: false library_name: diffusers --- # Emi Model Card !eyecatch.jpg Original(PNG) English: Click Here # Introduction Emi (Ethereal master of illustration) is an image-generation AI specialized in AI art, developed by AI Picasso using cutting-edge H100 hardware and the image-generation model Stable Diffusion XL 1.0. A distinguishing feature of this model is that it was not trained on unauthorized reposted images such as those found on Danbooru. # License Unlike our previous models, the license is the CreativeML Open RAIL++-M License, which means **commercial use is permitted**. We decided this for the following reasons: - As image-generation AI has spread, more people now observe good manners so as not to harm the creative industry - Since other image-generation AIs permit commercial use, a non-commercial license has lost much of its practical effect # How to Use You can try the demo here. Those who want to use it in earnest can download the model here. If generation does not work well with the standard version, please use the stable version. # Simple Examples !example_1.jpg !example_2.png !example_3.jpg # Improving Model Output - If you want to reliably produce anime-style illustrations, put anime artwork, anime style at the beginning of the prompt. - Adding the word transparent to the prompt gives a more contemporary art style. - Drawing a full body sometimes fails; in that case, try the stable version. - Usable prompts are the same as for Waifu Diffusion, and the model can also be used like Stable Diffusion. - We recommend using Textual Inversion for the negative prompt. - Because hands are unstable, we recommend merging with photorealistic models such as DreamShaper XL1.0. - Refining your prompts with ChatGPT can lead you to works beyond your usual range. - Using the FreeU node in the latest ComfyUI, or the Web UI extension, with the following parameters may further improve output. The following image is an example using FreeU. - b1 = 1.1, b2 = 1.2, s1 = 0.6, s2 = 0.4 report !example_4.png # Legal Notes This model was created in Japan, so Japanese law applies. We maintain that training this model is lawful under Article 30-4 of the Copyright Act. We also maintain that distributing this model constitutes neither a principal offense nor aiding and abetting under the Copyright Act or Article 175 of the Penal Code. For details, please see attorney Kakinuma's opinion. However, as stated in the license, please handle this model's outputs in accordance with applicable laws and regulations. # Contact support@aipicasso.app What follows is a translation of the general model card (originally provided in Japanese). ## Model Details - **Model type:** Diffusion-based text-to-image generation model - **Language:** Japanese - **License:** CreativeML Open RAIL++-M License - **Model description:** This model can generate appropriate images in response to prompts. The algorithms are Latent Diffusion Model with OpenCLIP-ViT/G and CLIP-L. - **Notes:** - **References:** ## Usage Examples Usage is the same as Stable Diffusion XL 1.0. There are many methods, but we provide three patterns: - ComfyUI - Fooocus - Diffusers ### With ComfyUI or Fooocus As with Stable Diffusion XL 1.0, use the model file in safetensors format. For detailed installation instructions, refer to this article. ### With Diffusers Use 🤗's Diffusers library. First, run the following script to install the library. Then run the next script to generate an image. For more complex operations, refer to the demo's source code. #### Intended Uses - Assisting the production of illustrations, manga, and anime - Whether commercial or non-commercial - Communication with creators when commissioning work - Commercial provision of image-generation services - Please handle the generated outputs with care. - Self-expression - Using this AI to express what makes you \"you\" - Research and development - Using the model on Discord - Prompt engineering - Fine-tuning (also called additional training) - e.g. DreamBooth - Merging with other models - Examining the performance of this model with metrics such as FID - Verifying that this model is independent of models other than Stable Diffusion using checksums, hash functions, etc. - Education - Graduation projects by art-school and vocational-school students - University students' graduation theses and coursework - Teachers conveying the current state of image-generation AI - Uses listed in the Hugging Face Community - Please ask questions in Japanese or English #### Out-of-Scope Uses - Presenting things as fact - Troubling teachers - Anything else that harms the creative industry # Prohibited or Malicious Uses - Do not use it for money laundering - Do not publish digital forgeries (Digital Forgery; may violate the Copyright Act) - Do not run Image-to-Image on other people's works without permission (may violate the Copyright Act) - Do not distribute obscene materials (may violate Article 175 of the Penal Code) - Anything that ignores the industry's customary etiquette - Do not state things not based on fact as if they were fact (the crime of forcible obstruction of business may apply) - Fake news ## Limitations and Bias ### Limitations - Much about diffusion models and large language models is still unknown, and their limitations have not been identified. ### Bias - Much about diffusion models and large language models is still unknown, and their biases have not been identified. ## Training **Training data** - About 2,000 images collected manually from a dataset similar to Stable Diffusion's, with unauthorized Danbooru reposts removed - About 500,000 images collected automatically from a dataset similar to Stable Diffusion's, with unauthorized Danbooru reposts removed **Training process** - **Hardware:** H100 ## Evaluation Results Third-party evaluation is being sought. ## Environmental Impact - **Hardware type:** H100 - **Hours used:** 500 - **Training location:** Japan ## References" +} \ No newline at end of file diff --git a/data/model_data_json/airesearch_wav2vec2-large-xlsr-53-th.json b/data/model_data_json/airesearch_wav2vec2-large-xlsr-53-th.json new file mode
100644 index 0000000000000000000000000000000000000000..058f47560e7a42c494a13d1f322fc361a9107cee --- /dev/null +++ b/data/model_data_json/airesearch_wav2vec2-large-xlsr-53-th.json @@ -0,0 +1,24 @@ +{ + "model_id": "airesearch/wav2vec2-large-xlsr-53-th", + "downloads": 107573, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "robust-speech-event", + "speech", + "xlsr-fine-tuning", + "th", + "dataset:common_voice", + "doi:10.57967/hf/0404", + "license:cc-by-sa-4.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: th datasets: - common_voice tags: - audio - automatic-speech-recognition - hf-asr-leaderboard - robust-speech-event - speech - xlsr-fine-tuning license: cc-by-sa-4.0 model-index: - name: XLS-R-53 - Thai results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 7 type: mozilla-foundation/common_voice_7_0 args: th metrics: - name: Test WER type: wer value: 0.9524 - name: Test SER type: ser value: 1.2346 - name: Test CER type: cer value: 0.1623 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: sv metrics: - name: Test WER type: wer value: null - name: Test SER type: ser value: null - name: Test CER type: cer value: null --- # Finetuning on Thai Common Voice 7.0 Read more on our blog We finetune wav2vec2-large-xlsr-53 based on Fine-tuning Wav2Vec2 for English ASR using Thai examples of Common Voice Corpus 7.0. The notebooks and scripts can be found in vistec-ai/wav2vec2-large-xlsr-53-th. The pretrained model and processor can be found at airesearch/wav2vec2-large-xlsr-53-th. ## Add , (PyThaiNLP) and deepcut tokenizers to from robust-speech-event ### Eval results on Common Voice 7 \"test\": | | WER PyThaiNLP 2.3.1 | WER deepcut | SER | CER | |---------------------------------|---------------------|-------------|---------|---------| | Only Tokenization | 0.9524% | 2.5316% | 1.2346% | 0.1623% | | Cleaning rules and Tokenization | TBD | TBD | TBD | TBD | ## Usage ## Datasets Common Voice Corpus 7.0]( contains 133 validated hours of Thai (255 total hours) at 5GB. We pre-tokenize with . We preprocess the dataset using cleaning rules described in by @tann9949. We then deduplicate and split as described in ekapolc/Thai_commonvoice_split in order to 1) avoid data leakage due to random splits after cleaning in Common Voice Corpus 7.0 and 2) preserve the majority of the data for the training set. The dataset loading script is . You can use this scripts together with , and to have the same splits as we do. The resulting dataset is as follows: ## Training We fintuned using the following configuration on a single V100 GPU and chose the checkpoint with the lowest validation loss. The finetuning script is ## Evaluation We benchmark on the test set using WER with words tokenized by PyThaiNLP 2.3.1 and deepcut, and CER. We also measure performance when spell correction using TNC ngrams is applied. Evaluation codes can be found in . Benchmark is performed on split. 
| | WER PyThaiNLP 2.3.1 | WER deepcut | CER | |--------------------------------|---------------------|----------------|----------------| | Kaldi from scratch | 23.04 | | 7.57 | | Ours without spell correction | 13.634024 | **8.152052** | **2.813019** | | Ours with spell correction | 17.996397 | 14.167975 | 5.225761 | | Google Web Speech API※ | 13.711234 | 10.860058 | 7.357340 | | Microsoft Bing Speech API※ | **12.578819** | 9.620991 | 5.016620 | | Amazon Transcribe※ | 21.86334 | 14.487553 | 7.077562 | | NECTEC AI for Thai Partii API※ | 20.105887 | 15.515631 | 9.551027 | ※ APIs are not finetuned with Common Voice 7.0 data ## LICENSE cc-by-sa 4.0 ## Ackowledgements * model training and validation notebooks/scripts @cstorm125 * dataset cleaning scripts @tann9949 * dataset splits @ekapolc and @14mss * running the training @mrpeerat * spell correction @wannaphong", + "model_explanation_gemini": "Finetuned for Thai automatic speech recognition, achieving low word and character error rates on the Common Voice 7.0 dataset." +} \ No newline at end of file diff --git a/data/model_data_json/akdeniz27_bert-base-turkish-cased-ner.json b/data/model_data_json/akdeniz27_bert-base-turkish-cased-ner.json new file mode 100644 index 0000000000000000000000000000000000000000..93670b137b23681ea886850622436147ffadf263 --- /dev/null +++ b/data/model_data_json/akdeniz27_bert-base-turkish-cased-ner.json @@ -0,0 +1,20 @@ +{ + "model_id": "akdeniz27/bert-base-turkish-cased-ner", + "downloads": 110778, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "token-classification", + "tr", + "doi:10.57967/hf/0949", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: tr license: mit widget: - text: \"Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a çıktı.\" --- # Turkish Named Entity Recognition (NER) Model This model is the fine-tuned model of \"dbmdz/bert-base-turkish-cased\" using a reviewed version of well known Turkish NER dataset ( # Fine-tuning parameters: # How to use: Pls refer \" for entity grouping with aggregation_strategy parameter. # Reference test results: * accuracy: 0.9933935699477056 * f1: 0.9592969472710453 * precision: 0.9543530277931161 * recall: 0.9642923563325274 Evaluation results with the test sets proposed in \"Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi (\"A Named Entity Recognition Dataset for Turkish\"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye.\" paper. * Test Set Acc. Prec. Rec. F1-Score * 20010000 0.9946 0.9871 0.9463 0.9662 * 20020000 0.9928 0.9134 0.9206 0.9170 * 20030000 0.9942 0.9814 0.9186 0.9489 * 20040000 0.9943 0.9660 0.9522 0.9590 * 20050000 0.9971 0.9539 0.9932 0.9732 * 20060000 0.9993 0.9942 0.9942 0.9942 * 20070000 0.9970 0.9806 0.9439 0.9619 * 20080000 0.9988 0.9821 0.9649 0.9735 * 20090000 0.9977 0.9891 0.9479 0.9681 * 20100000 0.9961 0.9684 0.9293 0.9485 * Overall 0.9961 0.9720 0.9516 0.9617", + "model_explanation_gemini": "Identifies and classifies named entities in Turkish text using a fine-tuned BERT model trained on a Turkish NER dataset." 
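> [Editor's note] The Turkish NER card above mentions entity grouping via the aggregation_strategy parameter but its code snippet is not included in the scrape. A minimal sketch using the card's own widget sentence:

```python
# Hedged sketch: Turkish NER with grouped entities, using the
# aggregation_strategy parameter the card alludes to.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="akdeniz27/bert-base-turkish-cased-ner",
    aggregation_strategy="simple",
)
print(ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a çıktı."))
```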
+} \ No newline at end of file diff --git a/data/model_data_json/albert_albert-base-v2.json b/data/model_data_json/albert_albert-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..7d1d9f2d092b550eda8a0298375f3c317a112b7d --- /dev/null +++ b/data/model_data_json/albert_albert-base-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "albert/albert-base-v2", + "downloads": 2912913, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "albert", + "fill-mask", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1909.11942", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 datasets: - bookcorpus - wikipedia --- # ALBERT Base v2 Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model, as all ALBERT models, is uncased: it does not make a difference between english and English. Disclaimer: The team releasing ALBERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description ALBERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Sentence Ordering Prediction (SOP): ALBERT uses a pretraining loss based on predicting the ordering of two consecutive segments of text. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the ALBERT model as inputs. ALBERT is particular in that it shares its layers across its Transformer. Therefore, all layers have the same weights. Using repeating layers results in a small memory footprint, however, the computational cost remains similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same number of (repeating) layers. This is the second version of the base model. Version 2 is different from version 1 due to different dropout rates, additional training data, and longer training. It has better results in nearly all downstream tasks. This model has the following configuration: - 12 repeating layers - 128 embedding dimension - 768 hidden dimension - 12 attention heads - 11M parameters ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. 
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at model like GPT2. ### How to use You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given text in PyTorch: and in TensorFlow: ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions: This bias will also affect all fine-tuned versions of this model. ## Training data The ALBERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 30,000. The inputs of the model are then of the form: ### Training The ALBERT procedure follows the BERT setup. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by . - In 10% of the cases, the masked tokens are replaced by a random token (different) from the one they replace. - In the 10% remaining cases, the masked tokens are left as is. ## Evaluation results When fine-tuned on downstream tasks, the ALBERT models achieve the following results: | | Average | SQuAD1.1 | SQuAD2.0 | MNLI | SST-2 | RACE | |----------------|----------|----------|----------|----------|----------|----------| |V2 | |ALBERT-base |82.3 |90.2/83.2 |82.1/79.3 |84.6 |92.9 |66.8 | |ALBERT-large |85.7 |91.8/85.2 |84.9/81.8 |86.5 |94.9 |75.2 | |ALBERT-xlarge |87.9 |92.9/86.4 |87.9/84.1 |87.9 |95.4 |80.7 | |ALBERT-xxlarge |90.9 |94.6/89.1 |89.8/86.9 |90.6 |96.8 |86.8 | |V1 | |ALBERT-base |80.1 |89.3/82.3 | 80.0/77.1|81.6 |90.3 | 64.0 | |ALBERT-large |82.4 |90.6/83.9 | 82.3/79.4|83.5 |91.7 | 68.5 | |ALBERT-xlarge |85.5 |92.5/86.1 | 86.1/83.1|86.4 |92.4 | 74.8 | |ALBERT-xxlarge |91.0 |94.8/89.3 | 90.2/87.4|90.8 |96.9 | 86.5 | ### BibTeX entry and citation info", + "model_explanation_gemini": "Pretrained on English text using masked language modeling and sentence ordering prediction to learn bidirectional representations for downstream NLP tasks." 
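> [Editor's note] The ALBERT card's "How to use" snippets (pipeline, PyTorch, TensorFlow) were stripped during scraping. A minimal sketch of the pipeline variant:

```python
# Hedged sketch: masked language modeling with ALBERT Base v2 via the
# fill-mask pipeline, as described in the card's usage section.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="albert/albert-base-v2")
print(unmasker("Hello I'm a [MASK] model."))
```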
+} \ No newline at end of file diff --git a/data/model_data_json/alefiury_wav2vec2-large-xlsr-53-gender-recognition-librispeech.json b/data/model_data_json/alefiury_wav2vec2-large-xlsr-53-gender-recognition-librispeech.json new file mode 100644 index 0000000000000000000000000000000000000000..6c52bfeaec2ffee49682887e4ea5fed9356400af --- /dev/null +++ b/data/model_data_json/alefiury_wav2vec2-large-xlsr-53-gender-recognition-librispeech.json @@ -0,0 +1,20 @@ +{ + "model_id": "alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech", + "downloads": 173156, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "audio-classification", + "generated_from_trainer", + "dataset:librispeech_asr", + "base_model:facebook/wav2vec2-xls-r-300m", + "base_model:finetune:facebook/wav2vec2-xls-r-300m", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - generated_from_trainer datasets: - librispeech_asr metrics: - f1 base_model: facebook/wav2vec2-xls-r-300m model-index: - name: weights results: [] --- # wav2vec2-large-xlsr-53-gender-recognition-librispeech This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on Librispeech-clean-100 for gender recognition. It achieves the following results on the evaluation set: - Loss: 0.0061 - F1: 0.9993 ### Compute your inferences ## Training and evaluation data The Librispeech-clean-100 dataset was used to train the model, with 70% of the data used for training, 10% for validation, and 20% for testing. ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 3e-05 - train_batch_size: 4 - eval_batch_size: 4 - seed: 42 - gradient_accumulation_steps: 4 - total_train_batch_size: 16 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_ratio: 0.1 - num_epochs: 1 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | F1 | |:-------------:|:-----:|:----:|:---------------:|:------:| | 0.002 | 1.0 | 1248 | 0.0061 | 0.9993 | ### Framework versions - Transformers 4.28.0 - Pytorch 2.0.0+cu118 - Tokenizers 0.13.3", + "model_explanation_gemini": "Recognizes gender from audio using the Librispeech dataset, achieving high F1 scores." +} \ No newline at end of file diff --git a/data/model_data_json/all-hands_openhands-lm-32b-v0.1.json b/data/model_data_json/all-hands_openhands-lm-32b-v0.1.json new file mode 100644 index 0000000000000000000000000000000000000000..b3d3caf2edecea99a66cb9b2be3a9f41ef28ce9a --- /dev/null +++ b/data/model_data_json/all-hands_openhands-lm-32b-v0.1.json @@ -0,0 +1,21 @@ +{ + "model_id": "all-hands/openhands-lm-32b-v0.1", + "downloads": 312528, + "tags": [ + "safetensors", + "qwen2", + "agent", + "coding", + "text-generation", + "conversational", + "en", + "dataset:SWE-Gym/SWE-Gym", + "arxiv:2412.21139", + "base_model:Qwen/Qwen2.5-Coder-32B-Instruct", + "base_model:finetune:Qwen/Qwen2.5-Coder-32B-Instruct", + "license:mit", + "region:us" + ], + "description": "--- license: mit datasets: - SWE-Gym/SWE-Gym language: - en base_model: - Qwen/Qwen2.5-Coder-32B-Instruct pipeline_tag: text-generation tags: - agent - coding ---
OpenHands LM v0.1
Use it in OpenHands
--- Autonomous agents for software development are already contributing to a wide range of software development tasks. But up to this point, strong coding agents have relied on proprietary models, which means that even if you use an open-source agent like OpenHands, you are still reliant on API calls to an external service. Today, we are excited to introduce OpenHands LM, a new open coding model that: - Is open and available on Hugging Face, so you can download it and run it locally - Is a reasonable size, 32B, so it can be run locally on hardware such as a single 3090 GPU - Achieves strong performance on software engineering tasks, including 37.2% resolve rate on SWE-Bench Verified Read below for more details and our future plans! ## What is OpenHands LM? OpenHands LM is built on the foundation of Qwen Coder 2.5 Instruct 32B, leveraging its powerful base capabilities for coding tasks. What sets OpenHands LM apart is our specialized fine-tuning process: - We used training data generated by OpenHands itself on a diverse set of open-source repositories - Specifically, we use an RL-based framework outlined in SWE-Gym, where we set up a training environment, generate training data using an existing agent, and then fine-tune the model on examples that were resolved successfully - It features a 128K token context window, ideal for handling large codebases and long-horizon software engineering tasks ## Performance: Punching Above Its Weight We evaluated OpenHands LM using our latest iterative evaluation protocol on the SWE-Bench Verified benchmark. The results are impressive: - **37.2% verified resolve rate** on SWE-Bench Verified - Performance comparable to models with **20x more parameters**, including Deepseek V3 0324 (38.8%) with 671B parameters Here's how OpenHands LM compares to other leading open-source models: !OpenHands LM Performance Comparison As the plot demonstrates, our 32B parameter model achieves efficiency that approaches much larger models. While the largest models (671B parameters) achieve slightly higher scores, our 32B parameter model performs remarkably well, opening up possibilities for local deployment that are not possible with larger models. ## Getting Started: How to Use OpenHands LM Today You can start using OpenHands LM immediately through these channels: 1. **Download the model from Hugging Face** The model is available on Hugging Face and can be downloaded directly from there. 2. **Create an OpenAI-compatible endpoint with a model serving framework** For optimal performance, it is recommended to serve this model with a GPU using SGLang or vLLM. 3. **Point your OpenHands agent to the new model** Download OpenHands and follow the instructions for using an OpenAI-compatible endpoint. ## The Road Ahead: Our Development Plans This initial release marks just the beginning of our journey. We will continue enhancing OpenHands LM based on community feedback and ongoing research initiatives. In particular, it should be noted that the model is still a research preview, and (1) may be best suited for tasks regarding solving github issues and perform less well on more varied software engineering tasks, (2) may sometimes generate repetitive steps, and (3) is somewhat sensitive to quantization, and may not function at full performance at lower quantization levels. Our next releases will focus on addressing these limitations. We're also developing more compact versions of the model (including a 7B parameter variant) to support users with limited computational resources. 
These smaller models will preserve OpenHands LM's core strengths while dramatically reducing hardware requirements. We encourage you to experiment with OpenHands LM, share your experiences, and participate in its evolution. Together, we can create better tools for tomorrow's software development landscape. ## Try OpenHands Cloud While OpenHands LM is a powerful model you can run locally, we also offer a fully managed cloud solution that makes it even easier to leverage AI for your software development needs. OpenHands Cloud provides: - Seamless GitHub integration with issue and PR support - Multiple interaction methods including text, voice, and mobile - Parallel agent capabilities for working on multiple tasks simultaneously - All the power of OpenHands without managing infrastructure OpenHands Cloud is built on the same technology as our open-source solution but adds convenient features for teams and individuals who want a ready-to-use platform. Visit app.all-hands.dev to get started today! ## Join Our Community We invite you to be part of the OpenHands LM journey: - Explore our GitHub repository - Connect with us on Slack - Follow our documentation to get started By contributing your experiences and feedback, you'll help shape the future of this open-source initiative. Together, we can create better tools for tomorrow's software development landscape. We can't wait to see what you'll create with OpenHands LM!", + "model_explanation_gemini": "An open-source 32B parameter coding model fine-tuned for autonomous software development tasks, achieving strong performance on GitHub issue resolution with a 128K token context window." +} \ No newline at end of file diff --git a/data/model_data_json/allegro_herbert-base-cased.json b/data/model_data_json/allegro_herbert-base-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..385f44a4df225f35ba2e6d46e60ba3613818b210 --- /dev/null +++ b/data/model_data_json/allegro_herbert-base-cased.json @@ -0,0 +1,18 @@ +{ + "model_id": "allegro/herbert-base-cased", + "downloads": 78442, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "feature-extraction", + "herbert", + "pl", + "license:cc-by-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: pl tags: - herbert license: cc-by-4.0 --- # HerBERT **HerBERT** is a BERT-based Language Model trained on Polish corpora using Masked Language Modelling (MLM) and Sentence Structural Objective (SSO) with dynamic masking of whole words. For more details, please refer to: HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish. Model training and experiments were conducted with transformers in version 2.9. ## Corpus HerBERT was trained on six different corpora available for Polish language: | Corpus | Tokens | Documents | | :------ | ------: | ------: | | CCNet Middle | 3243M | 7.9M | | CCNet Head | 2641M | 7.0M | | National Corpus of Polish| 1357M | 3.9M | | Open Subtitles | 1056M | 1.1M | Wikipedia | 260M | 1.4M | | Wolne Lektury | 41M | 5.5k | ## Tokenizer The training dataset was tokenized into subwords using a character level byte-pair encoding (``. ## Usage Example code: ## License CC BY 4.0 ## Citation If you use this model, please cite the following paper: ## Authors The model was trained by **Machine Learning Research Team at Allegro** and **Linguistic Engineering Group at Institute of Computer Science, Polish Academy of Sciences**. 
You can contact us at: klejbenchmark@allegro.pl" +} \ No newline at end of file diff --git a/data/model_data_json/allenai_Molmo-7B-D-0924.json b/data/model_data_json/allenai_Molmo-7B-D-0924.json new file mode 100644 index 0000000000000000000000000000000000000000..14edb4f34a4f961e4422df40c3a8a68979ea324f --- /dev/null +++ b/data/model_data_json/allenai_Molmo-7B-D-0924.json @@ -0,0 +1,25 @@ +{ + "model_id": "allenai/Molmo-7B-D-0924", + "downloads": 305452, + "tags": [ + "transformers", + "safetensors", + "molmo", + "text-generation", + "multimodal", + "olmo", + "pixmo", + "image-text-to-text", + "conversational", + "custom_code", + "en", + "arxiv:2409.17146", + "base_model:Qwen/Qwen2-7B", + "base_model:finetune:Qwen/Qwen2-7B", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en base_model: - openai/clip-vit-large-patch14-336 - Qwen/Qwen2-7B pipeline_tag: image-text-to-text tags: - multimodal - olmo - molmo - pixmo library_name: transformers --- # Molmo 7B-D Molmo is a family of open vision-language models developed by the Allen Institute for AI. Molmo models are trained on PixMo, a dataset of 1 million highly curated image-text pairs. Molmo achieves state-of-the-art performance among multimodal models of a similar size while being fully open-source. You can find all models in the Molmo family here. **Learn more** about the Molmo family in our announcement blog post or the paper. Molmo 7B-D is based on Qwen2-7B and uses OpenAI CLIP as its vision backbone. It performs comfortably between GPT-4V and GPT-4o on both academic benchmarks and human evaluation. It powers the **Molmo demo at** **molmo.allenai.org**. This checkpoint is a **preview** of the Molmo release. All artifacts used in creating Molmo (PixMo dataset, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. **Sign up here** to be the first to know when artifacts are released. Quick links: - 💬 Demo - 📂 All Models - 📃 Paper - 🎥 Blog with Videos ## Quick Start To run Molmo, first install dependencies: Then, follow these steps: To make inference more efficient, run with autocast: We did most of our evaluation in this setting (autocast on, but float32 weights). To further reduce memory requirements, the model can be run with bfloat16 weights: Note that we have observed that this can change the output of the model compared to running with float32 weights.
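A consolidated sketch of the Quick Start steps above, assuming the remote-code API shown in the full model card (the image URL is illustrative):

```python
# Minimal Molmo 7B-D inference: load processor/model, process one image + prompt, generate.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

processor = AutoProcessor.from_pretrained("allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto")
model = AutoModelForCausalLM.from_pretrained("allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto")

image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image.convert("RGB")], text="Describe this image.")  # convert("RGB") avoids the broadcast error noted in the FAQs
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(inputs, GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"), tokenizer=processor.tokenizer)
print(processor.tokenizer.decode(output[0, inputs["input_ids"].size(1):], skip_special_tokens=True))
```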
## Evaluations | Model | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating | |-----------------------------|-----------------------------------------|-----------------------------| | Molmo 72B | 81.2 | 1077 | | **Molmo 7B-D (this model)** | **77.3** | **1056** | | Molmo 7B-O | 74.6 | 1051 | | MolmoE 1B | 68.6 | 1032 | | GPT-4o | 78.5 | 1079 | | GPT-4V | 71.1 | 1041 | | Gemini 1.5 Pro | 78.3 | 1074 | | Gemini 1.5 Flash | 75.1 | 1054 | | Claude 3.5 Sonnet | 76.7 | 1069 | | Claude 3 Opus | 66.4 | 971 | | Claude 3 Haiku | 65.3 | 999 | | Qwen VL2 72B | 79.4 | 1037 | | Qwen VL2 7B | 73.7 | 1025 | | Intern VL2 LLAMA 76B | 77.1 | 1018 | | Intern VL2 8B | 69.4 | 953 | | Pixtral 12B | 69.5 | 1016 | | Phi3.5-Vision 4B | 59.7 | 982 | | PaliGemma 3B | 50.0 | 937 | | LLAVA OneVision 72B | 76.6 | 1051 | | LLAVA OneVision 7B | 72.0 | 1024 | | Cambrian-1 34B | 66.8 | 953 | | Cambrian-1 8B | 63.4 | 952 | | xGen - MM - Interleave 4B | 59.5 | 979 | | LLAVA-1.5 13B | 43.9 | 960 | | LLAVA-1.5 7B | 40.7 | 951 | *Benchmarks: AI2D test, ChartQA test, VQA v2.0 test, DocQA test, InfographicVQA test, TextVQA val, RealWorldQA, MMMU val, MathVista testmini, CountBenchQA, Flickr Count (we collected this new dataset that is significantly harder than CountBenchQA).* ## FAQs ### I'm getting a broadcast error when processing images! Your image might not be in RGB format. You can convert it with PIL, as shown in the Quick Start sketch above. ### Molmo doesn't work well with transparent images! We received reports that Molmo models might struggle with transparent images. For the time being, we recommend adding a white or dark background to your images before passing them to the model, for example with the Python Imaging Library (PIL). ## License and Use This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.", + "model_explanation_gemini": "A multimodal vision-language model for image-text tasks, combining Qwen2-7B and CLIP for state-of-the-art performance in open-source models." +} \ No newline at end of file diff --git a/data/model_data_json/allenai_longformer-base-4096.json b/data/model_data_json/allenai_longformer-base-4096.json new file mode 100644 index 0000000000000000000000000000000000000000..e6953eda99da48aacbd024db12d8f61c09f16901 --- /dev/null +++ b/data/model_data_json/allenai_longformer-base-4096.json @@ -0,0 +1,18 @@ +{ + "model_id": "allenai/longformer-base-4096", + "downloads": 1745400, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "longformer", + "en", + "arxiv:2004.05150", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 --- # longformer-base-4096 Longformer is a transformer model for long documents. longformer-base-4096 is a BERT-like model started from the RoBERTa checkpoint and pretrained for MLM on long documents. It supports sequences of length up to 4,096. Longformer uses a combination of sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations. Please refer to the examples and the paper for more details on how to set global attention; a minimal sketch follows below. ### Citing If you use Longformer in your research, please cite Longformer: The Long-Document Transformer. Longformer is an open-source project developed by the Allen Institute for Artificial Intelligence (AI2).
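The global-attention sketch referenced above, assuming the standard transformers Longformer API (which tokens receive global attention is a task-specific choice and is illustrative here):

```python
# Run longformer-base-4096 with local attention everywhere and global attention on the first token.
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("A very long document. " * 800, return_tensors="pt", truncation=True, max_length=4096)
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # give the <s> token global attention (task-specific choice)

outputs = model(**inputs, global_attention_mask=global_attention_mask)
document_embedding = outputs.last_hidden_state[:, 0, :]
```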
AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.", + "model_explanation_gemini": "Longformer-base-4096 is a BERT-like transformer model optimized for processing long documents up to 4,096 tokens, combining sliding window and configurable global attention for task-specific representations." +} \ No newline at end of file diff --git a/data/model_data_json/allenai_olmOCR-7B-0225-preview.json b/data/model_data_json/allenai_olmOCR-7B-0225-preview.json new file mode 100644 index 0000000000000000000000000000000000000000..ea052142c2ed329f446b6ae19627b6b56401b3a9 --- /dev/null +++ b/data/model_data_json/allenai_olmOCR-7B-0225-preview.json @@ -0,0 +1,21 @@ +{ + "model_id": "allenai/olmOCR-7B-0225-preview", + "downloads": 431103, + "tags": [ + "transformers", + "safetensors", + "qwen2_vl", + "image-text-to-text", + "conversational", + "en", + "dataset:allenai/olmOCR-mix-0225", + "base_model:Qwen/Qwen2-VL-7B-Instruct", + "base_model:finetune:Qwen/Qwen2-VL-7B-Instruct", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 datasets: - allenai/olmOCR-mix-0225 base_model: - Qwen/Qwen2-VL-7B-Instruct library_name: transformers --- \"olmOCR # olmOCR-7B-0225-preview This is a preview release of the olmOCR model that's fine tuned from Qwen2-VL-7B-Instruct using the olmOCR-mix-0225 dataset. Quick links: - 📃 Paper - 🤗 Dataset - 🛠️ Code - 🎮 Demo The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via sglang that can handle millions of documents at scale. ## Usage This model expects as input a single document image, rendered such that the longest dimension is 1024 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit. ## Manual Prompting If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page, and extracting relevant text blocks and image metadata. To duplicate that you will need to and then run the following sample code. ## License and use olmOCR is licensed under the Apache 2.0 license. olmOCR is intended for research and educational use. For more information, please see our Responsible Use Guidelines.", + "model_explanation_gemini": "Fine-tuned from Qwen2-VL-7B-Instruct, this model processes document images (resized to 1024px longest dimension) with metadata for OCR tasks, optimized for use with the olmOCR toolkit." +} \ No newline at end of file diff --git a/data/model_data_json/allenai_scibert_scivocab_uncased.json b/data/model_data_json/allenai_scibert_scivocab_uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..8d8ca9f8b1a6e9cd6ca779d35eaec1ad8f4cb732 --- /dev/null +++ b/data/model_data_json/allenai_scibert_scivocab_uncased.json @@ -0,0 +1,15 @@ +{ + "model_id": "allenai/scibert_scivocab_uncased", + "downloads": 262076, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "en", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en --- # SciBERT This is the pretrained model presented in SciBERT: A Pretrained Language Model for Scientific Text, which is a BERT model trained on scientific text. The training corpus was papers taken from Semantic Scholar. 
Corpus size is 1.14M papers, 3.1B tokens. We use the full text of the papers in training, not just abstracts. SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions. Available models include: * scibert_scivocab_cased * scibert_scivocab_uncased The original repo can be found here. If using these models, please cite the following paper:", + "model_explanation_gemini": "Pretrained BERT model specialized for scientific text, trained on 1.14M papers with a custom vocabulary (scivocab), available in cased and uncased versions." +} \ No newline at end of file diff --git a/data/model_data_json/allenai_specter2_base.json b/data/model_data_json/allenai_specter2_base.json new file mode 100644 index 0000000000000000000000000000000000000000..702b95fcdb37c1ec6aba1f610e82d3e706bd91e9 --- /dev/null +++ b/data/model_data_json/allenai_specter2_base.json @@ -0,0 +1,18 @@ +{ + "model_id": "allenai/specter2_base", + "downloads": 229404, + "tags": [ + "transformers", + "pytorch", + "bert", + "feature-extraction", + "en", + "dataset:allenai/scirepeval", + "license:apache-2.0", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - allenai/scirepeval language: - en --- ## SPECTER2 SPECTER2 is the successor to SPECTER and is capable of generating task-specific embeddings for scientific tasks when paired with adapters. This is the base model, to be used along with the adapters. Given the combination of the title and abstract of a scientific paper, or a short textual query, the model can be used to generate effective embeddings for downstream applications. **Note:** For general embedding purposes, please use allenai/specter2. **To get the best performance on a downstream task type, please load the associated adapter with the base model, as in the example below.** **Dec 2023 Update:** Model usage updated to be compatible with the latest versions of the transformers and adapters (newly released update to adapter-transformers) libraries. **Aug 2023 Update:** 1. **The SPECTER2 Base and proximity adapter models have been renamed on Hugging Face based upon usage patterns, as follows:** |Old Name|New Name| |--|--| |allenai/specter2|allenai/specter2_base| |allenai/specter2_proximity|allenai/specter2| 2. **We have a parallel version (termed aug2023refresh) where the base transformer encoder version is pre-trained on a collection of newer papers (published after 2018). However, for benchmarking purposes, please continue using the current version.** An adapter for the allenai/specter2_base model that was trained on the allenai/scirepeval dataset. This adapter was created for usage with the **adapters** library. # Model Details ## Model Description SPECTER2 has been trained on over 6M triplets of scientific paper citations, which are available here. After that, it is trained with additionally attached task-format-specific adapter modules on all the SciRepEval training tasks. Task formats trained on: - Classification - Regression - Proximity (Retrieval) - Adhoc Search It builds on the work done in SciRepEval: A Multi-Format Benchmark for Scientific Document Representations, and we evaluate the trained model on this benchmark as well. - **Developed by:** Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman - **Shared by:** Allen AI - **Model type:** bert-base-uncased + adapters - **License:** Apache 2.0 - **Finetuned from model:** allenai/scibert.
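The adapter-loading example referenced above, as a minimal sketch using the adapters library (the paper title/abstract are illustrative):

```python
# Embed papers with SPECTER2: base encoder plus the proximity adapter (allenai/specter2).
from adapters import AutoAdapterModel
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/specter2_base")
model = AutoAdapterModel.from_pretrained("allenai/specter2_base")
model.load_adapter("allenai/specter2", source="hf", load_as="proximity", set_active=True)

papers = [{"title": "BERT", "abstract": "We introduce a new language representation model..."}]
texts = [p["title"] + tokenizer.sep_token + p["abstract"] for p in papers]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt", return_token_type_ids=False)
embeddings = model(**inputs).last_hidden_state[:, 0, :]  # one embedding per paper
```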
## Model Sources - **Repository:** - **Paper:** - **Demo:** Usage # Uses ## Direct Use |Model|Name and HF link|Description| |--|--|--| |Proximity*|allenai/specter2|Encode papers as queries and candidates, e.g., link prediction, nearest neighbor search| |Adhoc Query|allenai/specter2_adhoc_query|Encode short raw text queries for search tasks. (Candidate papers can be encoded with the proximity adapter)| |Classification|allenai/specter2_classification|Encode papers to feed into linear classifiers as features| |Regression|allenai/specter2_regression|Encode papers to feed into linear regressors as features| *The proximity model should suffice for downstream task types not mentioned above. ## Downstream Use For evaluation and downstream usage, please refer to # Training Details ## Training Data The base model is trained on citation links between papers, and the adapters are trained on 8 large-scale tasks across the four formats. All the data is a part of the SciRepEval benchmark and is available here. The citation links are triplets consisting of a query paper, a positive citation, and a negative, which can be from the same or a different field of study as the query, or a citation of a citation. ## Training Procedure Please refer to the SPECTER paper. ### Training Hyperparameters The model is trained in two stages using SciRepEval: - Base model: first, a base model is trained on the above citation triplets. - Adapters: thereafter, task-format-specific adapters are trained on the SciRepEval training tasks, where 600K triplets are sampled from the above and added to the training data as well. # Evaluation We evaluate the model on SciRepEval, a large-scale evaluation benchmark for scientific embedding tasks, which has SciDocs as a subset. We also evaluate and establish a new SoTA on MDCR, a large-scale citation recommendation benchmark. |Model|SciRepEval In-Train|SciRepEval Out-of-Train|SciRepEval Avg|MDCR(MAP, Recall@5)| |--|--|--|--|--| |BM-25|n/a|n/a|n/a|(33.7, 28.5)| |SPECTER|54.7|72.0|67.5|(30.6, 25.5)| |SciNCL|55.6|73.4|68.8|(32.6, 27.3)| |SciRepEval-Adapters|61.9|73.8|70.7|(35.3, 29.6)| |SPECTER2 Base|56.3|73.6|69.1|(38.0, 32.4)| |SPECTER2-Adapters|**62.3**|**74.1**|**71.1**|**(38.4, 33.0)**| Please cite the following works if you end up using SPECTER2: ```bibtex @inproceedings{Singh2022SciRepEvalAM, title={SciRepEval: A Multi-Format Benchmark for Scientific Document Representations}, author={Amanpreet Singh and Mike D'Arcy and Arman Cohan and Doug Downey and Sergey Feldman}, booktitle={Conference on Empirical Methods in Natural Language Processing}, year={2022}, url={} } ```", + "model_explanation_gemini": "Generates task-specific embeddings for scientific papers or queries to enhance performance in downstream applications like classification, regression, and retrieval when paired with adapters."
+} \ No newline at end of file diff --git a/data/model_data_json/almanach_camembert-base.json b/data/model_data_json/almanach_camembert-base.json new file mode 100644 index 0000000000000000000000000000000000000000..7c4dca405eead8fdb8c25e7aa75faafc6e37e08f --- /dev/null +++ b/data/model_data_json/almanach_camembert-base.json @@ -0,0 +1,21 @@ +{ + "model_id": "almanach/camembert-base", + "downloads": 1643097, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "camembert", + "fill-mask", + "fr", + "dataset:oscar", + "arxiv:1911.03894", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: fr license: mit datasets: - oscar --- # CamemBERT: a Tasty French Language Model ## Introduction CamemBERT is a state-of-the-art language model for French based on the RoBERTa model. It is now available on Hugging Face in 6 different versions with a varying number of parameters, amount of pretraining data, and pretraining data source domains. For further information or requests, please go to the Camembert website. ## Pre-trained models | Model | #params | Arch. | Training data | |--------------------------------|--------------------------------|-------|-----------------------------------| | camembert-base | 110M | Base | OSCAR (138 GB of text) | | camembert/camembert-large | 335M | Large | CCNet (135 GB of text) | | camembert/camembert-base-ccnet | 110M | Base | CCNet (135 GB of text) | | camembert/camembert-base-wikipedia-4gb | 110M | Base | Wikipedia (4 GB of text) | | camembert/camembert-base-oscar-4gb | 110M | Base | Subsample of OSCAR (4 GB of text) | | camembert/camembert-base-ccnet-4gb | 110M | Base | Subsample of CCNet (4 GB of text) | ## How to use CamemBERT with HuggingFace ##### Load CamemBERT and its sub-word tokenizer: ##### Filling masks using pipeline ##### Extract contextual embedding features from Camembert output ##### Extract contextual embedding features from all Camembert layers ## Authors CamemBERT was trained and evaluated by Louis Martin\\*, Benjamin Muller\\*, Pedro Javier Ortiz Suárez\\*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot. ## Citation If you use our work, please cite:", + "model_explanation_gemini": "CamemBERT is a French language model based on RoBERTa, designed for tasks like mask filling and contextual embedding extraction, pretrained on various French text datasets." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-bolt-base.json b/data/model_data_json/amazon_chronos-bolt-base.json new file mode 100644 index 0000000000000000000000000000000000000000..91d99ef661d062cc952b3f979e203e4302983e7e --- /dev/null +++ b/data/model_data_json/amazon_chronos-bolt-base.json @@ -0,0 +1,21 @@ +{ + "model_id": "amazon/chronos-bolt-base", + "downloads": 1399591, + "tags": [ + "safetensors", + "t5", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:1910.10683", + "arxiv:2403.07815", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-Bolt⚡ (Base) 🚀 **Update Feb 14, 2025**: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting.
It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. ## Performance The following plot compares the inference time of Chronos-Bolt against the original Chronos models for forecasting 1024 time series with a context length of 512 observations and a prediction horizon of 64 steps.
Chronos-Bolt models are not only significantly faster but also more accurate than the original Chronos models. The following plot reports the probabilistic and point forecasting performance of Chronos-Bolt in terms of the Weighted Quantile Loss (WQL) and the Mean Absolute Scaled Error (MASE), respectively, aggregated over 27 datasets (see the Chronos paper for details on this benchmark). Remarkably, despite having no prior exposure to these datasets during training, the zero-shot Chronos-Bolt models outperform commonly used statistical models and deep learning models that have been trained on these datasets (highlighted by *). They also perform better than other FMs, denoted by a +, which indicates that these models were pretrained on certain datasets in our benchmark and are not entirely zero-shot. Notably, Chronos-Bolt (Base) also surpasses the original Chronos (Large) model in terms of forecasting accuracy while being over 600 times faster.
Chronos-Bolt models are available in the following sizes.
| Model | Parameters | Based on | | ----------------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-bolt-tiny** | 9M | t5-efficient-tiny | | **chronos-bolt-mini** | 21M | t5-efficient-mini | | **chronos-bolt-small** | 48M | t5-efficient-small | | **chronos-bolt-base** | 205M | t5-efficient-base |
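As a concrete illustration of zero-shot quantile forecasting with this checkpoint, before the fuller options in the Usage section below, a minimal sketch using the companion chronos-forecasting package (API names per its README; treat as illustrative):

```python
# Zero-shot quantile forecasts from chronos-bolt-base on the classic AirPassengers series.
import pandas as pd
import torch
from chronos import BaseChronosPipeline

pipeline = BaseChronosPipeline.from_pretrained("amazon/chronos-bolt-base", device_map="cuda", torch_dtype=torch.bfloat16)

df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")
context = torch.tensor(df["#Passengers"].values, dtype=torch.float32)
quantiles, mean = pipeline.predict_quantiles(context=context, prediction_length=12, quantile_levels=[0.1, 0.5, 0.9])
print(quantiles.shape)  # [num_series, prediction_length, num_quantiles]
```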
## Usage ### Usage with AutoGluon The recommended way of using Chronos for production use cases is through AutoGluon. AutoGluon offers effortless **fine-tuning** of Chronos models, incorporating **covariates** into the forecast through covariate regressors, and **ensembling** with other statistical and machine learning models for maximum accuracy. Check out the AutoGluon Chronos tutorial for more details. A minimal example showing how to perform zero-shot inference using Chronos-Bolt with AutoGluon: Install the required dependencies. Forecast with the Chronos-Bolt model. ### Deploying a Chronos-Bolt endpoint to SageMaker SageMaker JumpStart makes it easy to deploy Chronos endpoints for production use with just a few lines of code. Chronos-Bolt endpoints can be deployed to **both CPU and GPU** instances, as well as support forecasting with **covariates**. More details are available in this example notebook. A minimal example showing how to deploy a Chronos-Bolt (Base) endpoint to SageMaker: Update the SageMaker SDK to make sure that all the latest models are available. Deploy an inference endpoint to SageMaker. Now you can send time series data to the endpoint in JSON format. ### Usage with inference library Alternatively, you can install the package in the GitHub companion repo. This is intended for research purposes and provides a minimal interface to Chronos models. Install the library by running: A minimal example showing how to perform inference using Chronos-Bolt models: ## Citation If you find Chronos or Chronos-Bolt models useful for your research, please consider citing the associated paper: ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "A pretrained time series forecasting model based on T5 architecture, optimized for fast and memory-efficient zero-shot forecasting by generating multi-step quantile predictions from historical data patches." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-bolt-mini.json b/data/model_data_json/amazon_chronos-bolt-mini.json new file mode 100644 index 0000000000000000000000000000000000000000..ea8214272abd46ad3f63afeeec56e7ed5d8bd464 --- /dev/null +++ b/data/model_data_json/amazon_chronos-bolt-mini.json @@ -0,0 +1,21 @@ +{ + "model_id": "amazon/chronos-bolt-mini", + "downloads": 529009, + "tags": [ + "safetensors", + "t5", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:1910.10683", + "arxiv:2403.07815", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-Bolt⚡ (Mini) 🚀 **Update Feb 14, 2025**: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. 
The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. ## Performance The following plot compares the inference time of Chronos-Bolt against the original Chronos models for forecasting 1024 time series with a context length of 512 observations and a prediction horizon of 64 steps.
Chronos-Bolt models are not only significantly faster but also more accurate than the original Chronos models. The following plot reports the probabilistic and point forecasting performance of Chronos-Bolt in terms of the Weighted Quantile Loss (WQL) and the Mean Absolute Scaled Error (MASE), respectively, aggregated over 27 datasets (see the Chronos paper for details on this benchmark). Remarkably, despite having no prior exposure to these datasets during training, the zero-shot Chronos-Bolt models outperform commonly used statistical models and deep learning models that have been trained on these datasets (highlighted by *). They also perform better than other FMs, denoted by a +, which indicates that these models were pretrained on certain datasets in our benchmark and are not entirely zero-shot. Notably, Chronos-Bolt (Base) also surpasses the original Chronos (Large) model in terms of forecasting accuracy while being over 600 times faster.
Chronos-Bolt models are available in the following sizes.
| Model | Parameters | Based on | | ----------------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-bolt-tiny** | 9M | t5-efficient-tiny | | **chronos-bolt-mini** | 21M | t5-efficient-mini | | **chronos-bolt-small** | 48M | t5-efficient-small | | **chronos-bolt-base** | 205M | t5-efficient-base |
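To make the AutoGluon route in the Usage section below concrete, a minimal zero-shot sketch (the dataset URL and column layout follow the AutoGluon tutorial; treat as illustrative):

```python
# Zero-shot forecasting with chronos-bolt-mini through AutoGluon-TimeSeries.
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

train_data = TimeSeriesDataFrame.from_path("https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly/train.csv")
predictor = TimeSeriesPredictor(prediction_length=48).fit(
    train_data,
    hyperparameters={"Chronos": {"model_path": "amazon/chronos-bolt-mini"}},  # zero-shot: no weights are updated
)
predictions = predictor.predict(train_data)
```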
## Usage ### Usage with AutoGluon The recommended way of using Chronos for production use cases is through AutoGluon. AutoGluon offers effortless **fine-tuning** of Chronos models, incorporating **covariates** into the forecast through covariate regressors, and **ensembling** with other statistical and machine learning models for maximum accuracy. Check out the AutoGluon Chronos tutorial for more details. A minimal example showing how to perform zero-shot inference using Chronos-Bolt with AutoGluon: Install the required dependencies. Forecast with the Chronos-Bolt model. ### Deploying a Chronos-Bolt endpoint to SageMaker SageMaker JumpStart makes it easy to deploy Chronos endpoints for production use with just a few lines of code. Chronos-Bolt endpoints can be deployed to **both CPU and GPU** instances, as well as support forecasting with **covariates**. More details are available in this example notebook. A minimal example showing how to deploy a Chronos-Bolt (Base) endpoint to SageMaker: Update the SageMaker SDK to make sure that all the latest models are available. Deploy an inference endpoint to SageMaker. Now you can send time series data to the endpoint in JSON format. ### Usage with inference library Alternatively, you can install the package in the GitHub companion repo. This is intended for research purposes and provides a minimal interface to Chronos models. Install the library by running: A minimal example showing how to perform inference using Chronos-Bolt models: ## Citation If you find Chronos or Chronos-Bolt models useful for your research, please consider citing the associated paper: ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "Pretrained time series forecasting model based on T5 architecture, optimized for fast, memory-efficient zero-shot predictions across multiple future steps." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-bolt-small.json b/data/model_data_json/amazon_chronos-bolt-small.json new file mode 100644 index 0000000000000000000000000000000000000000..7a00575c3ad337f8f8fd8c5008d9e5610fc557b9 --- /dev/null +++ b/data/model_data_json/amazon_chronos-bolt-small.json @@ -0,0 +1,21 @@ +{ + "model_id": "amazon/chronos-bolt-small", + "downloads": 363035, + "tags": [ + "safetensors", + "t5", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:1910.10683", + "arxiv:2403.07815", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-Bolt⚡ (Small) 🚀 **Update Feb 14, 2025**: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. 
Chronos-Bolt models are up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. ## Performance The following plot compares the inference time of Chronos-Bolt against the original Chronos models for forecasting 1024 time series with a context length of 512 observations and a prediction horizon of 64 steps.
Chronos-Bolt models are not only significantly faster but also more accurate than the original Chronos models. The following plot reports the probabilistic and point forecasting performance of Chronos-Bolt in terms of the Weighted Quantile Loss (WQL) and the Mean Absolute Scaled Error (MASE), respectively, aggregated over 27 datasets (see the Chronos paper for details on this benchmark). Remarkably, despite having no prior exposure to these datasets during training, the zero-shot Chronos-Bolt models outperform commonly used statistical models and deep learning models that have been trained on these datasets (highlighted by *). They also perform better than other FMs, denoted by a +, which indicates that these models were pretrained on certain datasets in our benchmark and are not entirely zero-shot. Notably, Chronos-Bolt (Base) also surpasses the original Chronos (Large) model in terms of forecasting accuracy while being over 600 times faster.
Chronos-Bolt models are available in the following sizes.
| Model | Parameters | Based on | | ----------------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-bolt-tiny** | 9M | t5-efficient-tiny | | **chronos-bolt-mini** | 21M | t5-efficient-mini | | **chronos-bolt-small** | 48M | t5-efficient-small | | **chronos-bolt-base** | 205M | t5-efficient-base |
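For a quick zero-shot check of this checkpoint before reaching for the fuller options in the Usage section below, a minimal sketch with the companion chronos-forecasting package (the synthetic series is illustrative):

```python
# Quantile forecasts from chronos-bolt-small for a toy context series, on CPU.
import torch
from chronos import BaseChronosPipeline

pipeline = BaseChronosPipeline.from_pretrained("amazon/chronos-bolt-small", device_map="cpu", torch_dtype=torch.float32)
context = torch.arange(1, 101, dtype=torch.float32).sin() * 10 + 50  # synthetic oscillating series
quantiles, mean = pipeline.predict_quantiles(context=context, prediction_length=24, quantile_levels=[0.1, 0.5, 0.9])
```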
## Usage ### Usage with AutoGluon The recommended way of using Chronos for production use cases is through AutoGluon. AutoGluon offers effortless **fine-tuning** of Chronos models, incorporating **covariates** into the forecast through covariate regressors, and **ensembling** with other statistical and machine learning models for maximum accuracy. Check out the AutoGluon Chronos tutorial for more details. A minimal example showing how to perform zero-shot inference using Chronos-Bolt with AutoGluon: Install the required dependencies. Forecast with the Chronos-Bolt model. ### Deploying a Chronos-Bolt endpoint to SageMaker SageMaker JumpStart makes it easy to deploy Chronos endpoints for production use with just a few lines of code. Chronos-Bolt endpoints can be deployed to **both CPU and GPU** instances, as well as support forecasting with **covariates**. More details are available in this example notebook. A minimal example showing how to deploy a Chronos-Bolt (Base) endpoint to SageMaker: Update the SageMaker SDK to make sure that all the latest models are available. Deploy an inference endpoint to SageMaker. Now you can send time series data to the endpoint in JSON format. ### Usage with inference library Alternatively, you can install the package in the GitHub companion repo. This is intended for research purposes and provides a minimal interface to Chronos models. Install the library by running: A minimal example showing how to perform inference using Chronos-Bolt models: ## Citation If you find Chronos or Chronos-Bolt models useful for your research, please consider citing the associated paper: ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "A pretrained time series forecasting model based on T5 architecture, optimized for fast and memory-efficient zero-shot forecasting by generating multi-step quantile predictions from historical data patches." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-bolt-tiny.json b/data/model_data_json/amazon_chronos-bolt-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..b8b0e3e5bcc7fa9cfd15a81c29e38b856e0b35cd --- /dev/null +++ b/data/model_data_json/amazon_chronos-bolt-tiny.json @@ -0,0 +1,21 @@ +{ + "model_id": "amazon/chronos-bolt-tiny", + "downloads": 286175, + "tags": [ + "safetensors", + "t5", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:1910.10683", + "arxiv:2403.07815", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-Bolt⚡ (Tiny) 🚀 **Update Feb 14, 2025**: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. 
The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. ## Performance The following plot compares the inference time of Chronos-Bolt against the original Chronos models for forecasting 1024 time series with a context length of 512 observations and a prediction horizon of 64 steps.
Chronos-Bolt models are not only significantly faster but also more accurate than the original Chronos models. The following plot reports the probabilistic and point forecasting performance of Chronos-Bolt in terms of the Weighted Quantile Loss (WQL) and the Mean Absolute Scaled Error (MASE), respectively, aggregated over 27 datasets (see the Chronos paper for details on this benchmark). Remarkably, despite having no prior exposure to these datasets during training, the zero-shot Chronos-Bolt models outperform commonly used statistical models and deep learning models that have been trained on these datasets (highlighted by *). They also perform better than other FMs, denoted by a +, which indicates that these models were pretrained on certain datasets in our benchmark and are not entirely zero-shot. Notably, Chronos-Bolt (Base) also surpasses the original Chronos (Large) model in terms of forecasting accuracy while being over 600 times faster.
Chronos-Bolt models are available in the following sizes.
| Model | Parameters | Based on | | ----------------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-bolt-tiny** | 9M | t5-efficient-tiny | | **chronos-bolt-mini** | 21M | t5-efficient-mini | | **chronos-bolt-small** | 48M | t5-efficient-small | | **chronos-bolt-base** | 205M | t5-efficient-base |
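The AutoGluon path described in the Usage section below, sketched for this tiny checkpoint (illustrative; see the AutoGluon Chronos tutorial for the authoritative version):

```python
# Zero-shot forecasting with chronos-bolt-tiny through AutoGluon-TimeSeries.
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

data = TimeSeriesDataFrame.from_path("https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly/train.csv")
predictor = TimeSeriesPredictor(prediction_length=48).fit(data, hyperparameters={"Chronos": {"model_path": "amazon/chronos-bolt-tiny"}})
forecast = predictor.predict(data)  # quantile and mean forecasts per series
```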
## Usage ### Usage with AutoGluon The recommended way of using Chronos for production use cases is through AutoGluon. AutoGluon offers effortless **fine-tuning** of Chronos models, incorporating **covariates** into the forecast through covariate regressors, and **ensembling** with other statistical and machine learning models for maximum accuracy. Check out the AutoGluon Chronos tutorial for more details. A minimal example showing how to perform zero-shot inference using Chronos-Bolt with AutoGluon: Install the required dependencies. Forecast with the Chronos-Bolt model. ### Deploying a Chronos-Bolt endpoint to SageMaker SageMaker JumpStart makes it easy to deploy Chronos endpoints for production use with just a few lines of code. Chronos-Bolt endpoints can be deployed to **both CPU and GPU** instances, as well as support forecasting with **covariates**. More details are available in this example notebook. A minimal example showing how to deploy a Chronos-Bolt (Base) endpoint to SageMaker: Update the SageMaker SDK to make sure that all the latest models are available. Deploy an inference endpoint to SageMaker. Now you can send time series data to the endpoint in JSON format. ### Usage with inference library Alternatively, you can install the package in the GitHub companion repo. This is intended for research purposes and provides a minimal interface to Chronos models. Install the library by running: A minimal example showing how to perform inference using Chronos-Bolt models: ## Citation If you find Chronos or Chronos-Bolt models useful for your research, please consider citing the associated paper: ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "A pretrained time series forecasting model based on T5 architecture, optimized for fast and memory-efficient zero-shot forecasting by generating quantile predictions across multiple future steps." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-t5-base.json b/data/model_data_json/amazon_chronos-t5-base.json new file mode 100644 index 0000000000000000000000000000000000000000..0245597692941490ace7149b6a68900c203b9ff4 --- /dev/null +++ b/data/model_data_json/amazon_chronos-t5-base.json @@ -0,0 +1,26 @@ +{ + "model_id": "amazon/chronos-t5-base", + "downloads": 1747412, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2403.07815", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-T5 (Base) 🚀 **Update Feb 14, 2025**: Chronos-Bolt & original Chronos models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. 🚀 **Update Nov 27, 2024**: We have released Chronos-Bolt⚡️ models that are more accurate (5% lower error), up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. Check out the new models here. Chronos is a family of **pretrained time series forecasting models** based on language model architectures. 
A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes. For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.


Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.
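To make the sampling workflow in Fig. 1 concrete, a minimal inference sketch using the companion chronos-forecasting package (API per its README; illustrative):

```python
# Sample probabilistic forecasts from chronos-t5-base and reduce them to quantiles.
import numpy as np
import pandas as pd
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-base", device_map="cuda", torch_dtype=torch.bfloat16)
df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")
forecast = pipeline.predict(context=torch.tensor(df["#Passengers"].values, dtype=torch.float32), prediction_length=12)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)  # across sampled trajectories
```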

--- ## Architecture The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters. | Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-t5-tiny** | 8M | t5-efficient-tiny | | **chronos-t5-mini** | 20M | t5-efficient-mini | | **chronos-t5-small** | 46M | t5-efficient-small | | **chronos-t5-base** | 200M | t5-efficient-base | | **chronos-t5-large** | 710M | t5-efficient-large | ## Usage To perform inference with Chronos models, install the package in the GitHub companion repo by running: A minimal example showing how to perform inference using Chronos models: ## Citation If you find Chronos models useful for your research, please consider citing the associated paper: ## Security See CONTRIBUTING for more information. ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "A pretrained time series forecasting model that transforms data into tokens for probabilistic future predictions using a language model architecture." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-t5-large.json b/data/model_data_json/amazon_chronos-t5-large.json new file mode 100644 index 0000000000000000000000000000000000000000..1f1450380fb6bfd4370c43d3584582f78e768144 --- /dev/null +++ b/data/model_data_json/amazon_chronos-t5-large.json @@ -0,0 +1,26 @@ +{ + "model_id": "amazon/chronos-t5-large", + "downloads": 152388, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2403.07815", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-T5 (Large) 🚀 **Update Feb 14, 2025**: Chronos-Bolt & original Chronos models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. 🚀 **Update Nov 27, 2024**: We have released Chronos-Bolt⚡️ models that are more accurate (5% lower error), up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. Check out the new models here. Chronos is a family of **pretrained time series forecasting models** based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes. For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.


Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.
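The tokenize-sample-map pipeline of Fig. 1, sketched with the companion chronos-forecasting package (the context values are illustrative):

```python
# Draw sampled future trajectories from chronos-t5-large.
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-large", device_map="cuda", torch_dtype=torch.bfloat16)
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0, 136.0, 119.0])
forecast = pipeline.predict(context=context, prediction_length=6, num_samples=20)  # shape [1, 20, 6]
```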

--- ## Architecture The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters. | Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-t5-tiny** | 8M | t5-efficient-tiny | | **chronos-t5-mini** | 20M | t5-efficient-mini | | **chronos-t5-small** | 46M | t5-efficient-small | | **chronos-t5-base** | 200M | t5-efficient-base | | **chronos-t5-large** | 710M | t5-efficient-large | ## Usage To perform inference with Chronos models, install the package in the GitHub companion repo by running: A minimal example showing how to perform inference using Chronos models: ## Citation If you find Chronos models useful for your research, please consider citing the associated paper: ## Security See CONTRIBUTING for more information. ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "Pretrained time series forecasting model that transforms data into tokens for probabilistic forecasting using a T5-based language model architecture." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-t5-mini.json b/data/model_data_json/amazon_chronos-t5-mini.json new file mode 100644 index 0000000000000000000000000000000000000000..a421077d270c0c620ae08e4de789829d1dd3b57f --- /dev/null +++ b/data/model_data_json/amazon_chronos-t5-mini.json @@ -0,0 +1,26 @@ +{ + "model_id": "amazon/chronos-t5-mini", + "downloads": 88443, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2403.07815", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-T5 (Mini) 🚀 **Update Feb 14, 2025**: Chronos-Bolt & original Chronos models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. 🚀 **Update Nov 27, 2024**: We have released Chronos-Bolt⚡️ models that are more accurate (5% lower error), up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. Check out the new models here. Chronos is a family of **pretrained time series forecasting models** based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes. For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.


Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.
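A minimal sketch of the inference step depicted in Fig. 1, using the companion chronos-forecasting package (the synthetic context is illustrative):

```python
# Autoregressively sample forecasts from chronos-t5-mini and take the pointwise median.
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-mini", device_map="cpu", torch_dtype=torch.float32)
forecast = pipeline.predict(context=torch.linspace(0, 10, 200).sin(), prediction_length=24)
median = forecast[0].quantile(0.5, dim=0)  # pointwise median across sampled trajectories
```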

--- ## Architecture The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters. | Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-t5-tiny** | 8M | t5-efficient-tiny | | **chronos-t5-mini** | 20M | t5-efficient-mini | | **chronos-t5-small** | 46M | t5-efficient-small | | **chronos-t5-base** | 200M | t5-efficient-base | | **chronos-t5-large** | 710M | t5-efficient-large | ## Usage To perform inference with Chronos models, install the package in the GitHub companion repo by running: A minimal example showing how to perform inference using Chronos models: ## Citation If you find Chronos models useful for your research, please consider citing the associated paper: ## Security See CONTRIBUTING for more information. ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "A pretrained time series forecasting model that transforms data into tokens for probabilistic predictions using a T5-based language model architecture." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-t5-small.json b/data/model_data_json/amazon_chronos-t5-small.json new file mode 100644 index 0000000000000000000000000000000000000000..78fdce8e2fe1b874d70913631795e45cd3781de2 --- /dev/null +++ b/data/model_data_json/amazon_chronos-t5-small.json @@ -0,0 +1,26 @@ +{ + "model_id": "amazon/chronos-t5-small", + "downloads": 20472269, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2403.07815", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-T5 (Small) 🚀 **Update Feb 14, 2025**: Chronos-Bolt & original Chronos models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. 🚀 **Update Nov 27, 2024**: We have released Chronos-Bolt⚡️ models that are more accurate (5% lower error), up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. Check out the new models here. Chronos is a family of **pretrained time series forecasting models** based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes. For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.


Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.
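To ground the Fig. 1 workflow, a minimal sketch with the companion chronos-forecasting package (API per its README; illustrative):

```python
# Sample 50 future trajectories from chronos-t5-small for one series.
import pandas as pd
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small", device_map="cuda", torch_dtype=torch.bfloat16)
df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")
samples = pipeline.predict(context=torch.tensor(df["#Passengers"].values, dtype=torch.float32), prediction_length=12, num_samples=50)
```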

--- ## Architecture The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters. | Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-t5-tiny** | 8M | t5-efficient-tiny | | **chronos-t5-mini** | 20M | t5-efficient-mini | | **chronos-t5-small** | 46M | t5-efficient-small | | **chronos-t5-base** | 200M | t5-efficient-base | | **chronos-t5-large** | 710M | t5-efficient-large | ## Usage To perform inference with Chronos models, install the package in the GitHub companion repo by running: A minimal example showing how to perform inference using Chronos models: ## Citation If you find Chronos models useful for your research, please consider citing the associated paper: ## Security See CONTRIBUTING for more information. ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "A pretrained time series forecasting model that transforms data into tokens and uses a language model architecture to generate probabilistic future predictions." +} \ No newline at end of file diff --git a/data/model_data_json/amazon_chronos-t5-tiny.json b/data/model_data_json/amazon_chronos-t5-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..438e810bb44acc276480f3c50d161aff562f46fe --- /dev/null +++ b/data/model_data_json/amazon_chronos-t5-tiny.json @@ -0,0 +1,26 @@ +{ + "model_id": "amazon/chronos-t5-tiny", + "downloads": 505305, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2403.07815", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-T5 (Tiny) 🚀 **Update Feb 14, 2025**: Chronos-Bolt & original Chronos models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. 🚀 **Update Nov 27, 2024**: We have released Chronos-Bolt⚡️ models that are more accurate (5% lower error), up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. Check out the new models here. Chronos is a family of **pretrained time series forecasting models** based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes. For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.


Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.

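A minimal inference sketch for this checkpoint, assuming the `chronos` package from the amazon-science/chronos-forecasting companion repo (installable with `pip install git+https://github.com/amazon-science/chronos-forecasting.git`); the API shown follows that repo's README:

```python
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-tiny",
    device_map="cuda",          # use "cpu" if no GPU is available
    torch_dtype=torch.bfloat16,
)

# context: 1D tensor of historical values; predict() returns sample trajectories
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])
forecast = pipeline.predict(context, prediction_length=12)  # [num_series, num_samples, 12]

# Summarize the sampled trajectories into a predictive interval
low, median, high = torch.quantile(
    forecast[0].float(), torch.tensor([0.1, 0.5, 0.9]), dim=0
)
```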
--- ## Architecture The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters. | Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-t5-tiny** | 8M | t5-efficient-tiny | | **chronos-t5-mini** | 20M | t5-efficient-mini | | **chronos-t5-small** | 46M | t5-efficient-small | | **chronos-t5-base** | 200M | t5-efficient-base | | **chronos-t5-large** | 710M | t5-efficient-large | ## Usage To perform inference with Chronos models, install the package in the GitHub companion repo by running: A minimal example showing how to perform inference using Chronos models: ## Citation If you find Chronos models useful for your research, please consider citing the associated paper: ## Security See CONTRIBUTING for more information. ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "Pretrained time series forecasting model using tokenized sequences and language model architecture to generate probabilistic future predictions." +} \ No newline at end of file diff --git a/data/model_data_json/amunchet_rorshark-vit-base.json b/data/model_data_json/amunchet_rorshark-vit-base.json new file mode 100644 index 0000000000000000000000000000000000000000..c5049245bef5355c44a83589334a1e327dc072a4 --- /dev/null +++ b/data/model_data_json/amunchet_rorshark-vit-base.json @@ -0,0 +1,23 @@ +{ + "model_id": "amunchet/rorshark-vit-base", + "downloads": 420029, + "tags": [ + "transformers", + "tensorboard", + "safetensors", + "vit", + "image-classification", + "vision", + "generated_from_trainer", + "dataset:imagefolder", + "base_model:google/vit-base-patch16-224-in21k", + "base_model:finetune:google/vit-base-patch16-224-in21k", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 base_model: google/vit-base-patch16-224-in21k tags: - image-classification - vision - generated_from_trainer datasets: - imagefolder metrics: - accuracy model-index: - name: rorshark-vit-base results: - task: name: Image Classification type: image-classification dataset: name: imagefolder type: imagefolder config: default split: train args: default metrics: - name: Accuracy type: accuracy value: 0.9922928709055877 --- # rorshark-vit-base This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the imagefolder dataset. 
It achieves the following results on the evaluation set: - Loss: 0.0393 - Accuracy: 0.9923 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 8 - eval_batch_size: 8 - seed: 1337 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 5.0 ### Training results | Training Loss | Epoch | Step | Validation Loss | Accuracy | |:-------------:|:-----:|:----:|:---------------:|:--------:| | 0.0597 | 1.0 | 368 | 0.0546 | 0.9865 | | 0.2009 | 2.0 | 736 | 0.0531 | 0.9865 | | 0.0114 | 3.0 | 1104 | 0.0418 | 0.9904 | | 0.0998 | 4.0 | 1472 | 0.0425 | 0.9904 | | 0.1244 | 5.0 | 1840 | 0.0393 | 0.9923 | ### Framework versions - Transformers 4.36.0.dev0 - Pytorch 2.1.1+cu118 - Datasets 2.15.0 - Tokenizers 0.15.0", + "model_explanation_gemini": "A fine-tuned vision transformer model for high-accuracy image classification tasks." +} \ No newline at end of file diff --git a/data/model_data_json/answerdotai_ModernBERT-base.json b/data/model_data_json/answerdotai_ModernBERT-base.json new file mode 100644 index 0000000000000000000000000000000000000000..810e2fc9ae9e9236b133f596b98a0748a33b8ac1 --- /dev/null +++ b/data/model_data_json/answerdotai_ModernBERT-base.json @@ -0,0 +1,21 @@ +{ + "model_id": "answerdotai/ModernBERT-base", + "downloads": 488504, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "modernbert", + "fill-mask", + "masked-lm", + "long-context", + "en", + "arxiv:2412.13663", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en tags: - fill-mask - masked-lm - long-context - modernbert pipeline_tag: fill-mask inference: false --- # ModernBERT ## Table of Contents 1. Model Summary 2. Usage 3. Evaluation 4. Limitations 5. Training 6. License 7. Citation ## Model Summary ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as: - **Rotary Positional Embeddings (RoPE)** for long-context support. - **Local-Global Alternating Attention** for efficiency on long inputs. - **Unpadding and Flash Attention** for efficient inference. ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search. It is available in the following sizes: - ModernBERT-base - 22 layers, 149 million parameters - ModernBERT-large - 28 layers, 395 million parameters For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information. *ModernBERT is a collaboration between Answer.AI, LightOn, and friends.* ## Usage You can use these models directly with the library starting from v4.48.0: Since ModernBERT is a Masked Language Model (MLM), you can use the pipeline or load it via . 
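For example, via the `fill-mask` pipeline (a minimal sketch; `AutoModelForMaskedLM` is the standard masked-LM loader in `transformers`):

```python
from transformers import pipeline

# Requires transformers >= v4.48.0, as noted above
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
print(fill("The capital of France is [MASK]."))
```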
To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes. **⚠️ If your GPU supports it, we recommend using ModernBERT with Flash Attention 2 to reach the highest efficiency. To do so, install Flash Attention as follows, then use the model as normal:** Using : Using a pipeline: **Note:** ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except you can omit the parameter. ## Evaluation We evaluate ModernBERT across a range of tasks, including natural language understanding (GLUE), general retrieval (BEIR), long-context retrieval (MLDR), and code retrieval (CodeSearchNet and StackQA). **Key highlights:** - On GLUE, ModernBERT-base surpasses other similarly-sized encoder models, and ModernBERT-large is second only to Deberta-v3-large. - For general retrieval tasks, ModernBERT performs well on BEIR in both single-vector (DPR-style) and multi-vector (ColBERT-style) settings. - Thanks to the inclusion of code data in its training mixture, ModernBERT as a backbone also achieves new state-of-the-art code retrieval results on CodeSearchNet and StackQA. ### Base Models | Model | IR (DPR) | IR (DPR) | IR (DPR) | IR (ColBERT) | IR (ColBERT) | NLU | Code | Code | |-------------|--------------|--------------|--------------|---------------|---------------|------|------|------| | | BEIR | MLDR_OOD | MLDR_ID | BEIR | MLDR_OOD | GLUE | CSN | SQA | | BERT | 38.9 | 23.9 | 32.2 | 49.0 | 28.1 | 84.7 | 41.2 | 59.5 | | RoBERTa | 37.7 | 22.9 | 32.8 | 48.7 | 28.2 | 86.4 | 44.3 | 59.6 | | DeBERTaV3 | 20.2 | 5.4 | 13.4 | 47.1 | 21.9 | 88.1 | 17.5 | 18.6 | | NomicBERT | 41.0 | 26.7 | 30.3 | 49.9 | 61.3 | 84.0 | 41.6 | 61.4 | | GTE-en-MLM | 41.4 | **34.3** |**44.4** | 48.2 | 69.3 | 85.6 | 44.9 | 71.4 | | ModernBERT | **41.6** | 27.4 | 44.0 | **51.3** | **80.2** | **88.4** | **56.4** |**73.6**| --- ### Large Models | Model | IR (DPR) | IR (DPR) | IR (DPR) | IR (ColBERT) | IR (ColBERT) | NLU | Code | Code | |-------------|--------------|--------------|--------------|---------------|---------------|------|------|------| | | BEIR | MLDR_OOD | MLDR_ID | BEIR | MLDR_OOD | GLUE | CSN | SQA | | BERT | 38.9 | 23.3 | 31.7 | 49.5 | 28.5 | 85.2 | 41.6 | 60.8 | | RoBERTa | 41.4 | 22.6 | 36.1 | 49.8 | 28.8 | 88.9 | 47.3 | 68.1 | | DeBERTaV3 | 25.6 | 7.1 | 19.2 | 46.7 | 23.0 | **91.4**| 21.2 | 19.7 | | GTE-en-MLM | 42.5 | **36.4** | **48.9** | 50.7 | 71.3 | 87.6 | 40.5 | 66.9 | | ModernBERT | **44.0** | 34.3 | 48.6 | **52.4** | **80.4** | 90.4 |**59.5** |**83.9**| *Table 1: Results for all models across an overview of all tasks. CSN refers to CodeSearchNet and SQA to StackQA. MLDRID refers to in-domain (fine-tuned on the training set) evaluation, and MLDR_OOD to out-of-domain.* ModernBERT’s strong results, coupled with its efficient runtime on long-context inputs, demonstrate that encoder-only models can be significantly improved through modern architectural choices and extensive pretraining on diversified data sources. ## Limitations ModernBERT’s training data is primarily English and code, so performance may be lower for other languages. While it can handle long sequences efficiently, using the full 8,192 tokens window may be slower than short-context inference. Like any large language model, ModernBERT may produce representations that reflect biases present in its training data. Verify critical or sensitive outputs before relying on them. 
## Training - Architecture: Encoder-only, Pre-Norm Transformer with GeGLU activations. - Sequence Length: Pre-trained up to 1,024 tokens, then extended to 8,192 tokens. - Data: 2 trillion tokens of English text and code. - Optimizer: StableAdamW with trapezoidal LR scheduling and 1-sqrt decay. - Hardware: Trained on 8x H100 GPUs. See the paper for more details. ## License We release the ModernBERT model architectures, model weights, training codebase under the Apache 2.0 license. ## Citation If you use ModernBERT in your work, please cite:", + "model_explanation_gemini": "A modernized BERT-style model optimized for long-context tasks like retrieval, classification, and semantic search, supporting up to 8,192 tokens with improved efficiency and performance on both text and code data." +} \ No newline at end of file diff --git a/data/model_data_json/answerdotai_ModernBERT-large.json b/data/model_data_json/answerdotai_ModernBERT-large.json new file mode 100644 index 0000000000000000000000000000000000000000..4797c6890870381b76b2d04fe44954fa3b76c823 --- /dev/null +++ b/data/model_data_json/answerdotai_ModernBERT-large.json @@ -0,0 +1,20 @@ +{ + "model_id": "answerdotai/ModernBERT-large", + "downloads": 80863, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "modernbert", + "fill-mask", + "masked-lm", + "long-context", + "en", + "arxiv:2412.13663", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en tags: - fill-mask - masked-lm - long-context - modernbert pipeline_tag: fill-mask inference: false --- # ModernBERT ## Table of Contents 1. Model Summary 2. Usage 3. Evaluation 4. Limitations 5. Training 6. License 7. Citation ## Model Summary ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as: - **Rotary Positional Embeddings (RoPE)** for long-context support. - **Local-Global Alternating Attention** for efficiency on long inputs. - **Unpadding and Flash Attention** for efficient inference. ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search. It is available in the following sizes: - ModernBERT-base - 22 layers, 149 million parameters - ModernBERT-large - 28 layers, 395 million parameters For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information. *ModernBERT is a collaboration between Answer.AI, LightOn, and friends.* ## Usage You can use these models directly with the library starting from v4.48.0: Since ModernBERT is a Masked Language Model (MLM), you can use the pipeline or load it via . To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes. **⚠️ If your GPU supports it, we recommend using ModernBERT with Flash Attention 2 to reach the highest efficiency. 
To do so, install Flash Attention as follows, then use the model as normal:** Using : Using a pipeline: **Note:** ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except you can omit the parameter. ## Evaluation We evaluate ModernBERT across a range of tasks, including natural language understanding (GLUE), general retrieval (BEIR), long-context retrieval (MLDR), and code retrieval (CodeSearchNet and StackQA). **Key highlights:** - On GLUE, ModernBERT-base surpasses other similarly-sized encoder models, and ModernBERT-large is second only to Deberta-v3-large. - For general retrieval tasks, ModernBERT performs well on BEIR in both single-vector (DPR-style) and multi-vector (ColBERT-style) settings. - Thanks to the inclusion of code data in its training mixture, ModernBERT as a backbone also achieves new state-of-the-art code retrieval results on CodeSearchNet and StackQA. ### Base Models | Model | IR (DPR) | IR (DPR) | IR (DPR) | IR (ColBERT) | IR (ColBERT) | NLU | Code | Code | |-------------|--------------|--------------|--------------|---------------|---------------|------|------|------| | | BEIR | MLDR_OOD | MLDR_ID | BEIR | MLDR_OOD | GLUE | CSN | SQA | | BERT | 38.9 | 23.9 | 32.2 | 49.0 | 28.1 | 84.7 | 41.2 | 59.5 | | RoBERTa | 37.7 | 22.9 | 32.8 | 48.7 | 28.2 | 86.4 | 44.3 | 59.6 | | DeBERTaV3 | 20.2 | 5.4 | 13.4 | 47.1 | 21.9 | 88.1 | 17.5 | 18.6 | | NomicBERT | 41.0 | 26.7 | 30.3 | 49.9 | 61.3 | 84.0 | 41.6 | 61.4 | | GTE-en-MLM | 41.4 | **34.3** |**44.4** | 48.2 | 69.3 | 85.6 | 44.9 | 71.4 | | ModernBERT | **41.6** | 27.4 | 44.0 | **51.3** | **80.2** | **88.4** | **56.4** |**73.6**| --- ### Large Models | Model | IR (DPR) | IR (DPR) | IR (DPR) | IR (ColBERT) | IR (ColBERT) | NLU | Code | Code | |-------------|--------------|--------------|--------------|---------------|---------------|------|------|------| | | BEIR | MLDR_OOD | MLDR_ID | BEIR | MLDR_OOD | GLUE | CSN | SQA | | BERT | 38.9 | 23.3 | 31.7 | 49.5 | 28.5 | 85.2 | 41.6 | 60.8 | | RoBERTa | 41.4 | 22.6 | 36.1 | 49.8 | 28.8 | 88.9 | 47.3 | 68.1 | | DeBERTaV3 | 25.6 | 7.1 | 19.2 | 46.7 | 23.0 | **91.4**| 21.2 | 19.7 | | GTE-en-MLM | 42.5 | **36.4** | **48.9** | 50.7 | 71.3 | 87.6 | 40.5 | 66.9 | | ModernBERT | **44.0** | 34.3 | 48.6 | **52.4** | **80.4** | 90.4 |**59.5** |**83.9**| *Table 1: Results for all models across an overview of all tasks. CSN refers to CodeSearchNet and SQA to StackQA. MLDRID refers to in-domain (fine-tuned on the training set) evaluation, and MLDR_OOD to out-of-domain.* ModernBERT’s strong results, coupled with its efficient runtime on long-context inputs, demonstrate that encoder-only models can be significantly improved through modern architectural choices and extensive pretraining on diversified data sources. ## Limitations ModernBERT’s training data is primarily English and code, so performance may be lower for other languages. While it can handle long sequences efficiently, using the full 8,192 tokens window may be slower than short-context inference. Like any large language model, ModernBERT may produce representations that reflect biases present in its training data. Verify critical or sensitive outputs before relying on them. ## Training - Architecture: Encoder-only, Pre-Norm Transformer with GeGLU activations. - Sequence Length: Pre-trained up to 1,024 tokens, then extended to 8,192 tokens. - Data: 2 trillion tokens of English text and code. 
- Optimizer: StableAdamW with trapezoidal LR scheduling and 1-sqrt decay. - Hardware: Trained on 8x H100 GPUs. See the paper for more details. ## License We release the ModernBERT model architectures, model weights, and training codebase under the Apache 2.0 license. ## Citation If you use ModernBERT in your work, please cite:" +} \ No newline at end of file diff --git a/data/model_data_json/answerdotai_answerai-colbert-small-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..352a7abfc349e35d67337802d7e53606fa4813ab --- /dev/null +++ b/data/model_data_json/answerdotai_answerai-colbert-small-v1.json @@ -0,0 +1,18 @@ +{ + "model_id": "answerdotai/answerai-colbert-small-v1", + "downloads": 2356105, + "tags": [ + "onnx", + "safetensors", + "bert", + "ColBERT", + "RAGatouille", + "passage-retrieval", + "en", + "arxiv:2407.20750", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en tags: - ColBERT - RAGatouille - passage-retrieval --- # answerai-colbert-small-v1 **answerai-colbert-small-v1** is a new, proof-of-concept model by Answer.AI, showing the strong performance that multi-vector models can reach with the new JaColBERTv2.5 training recipe and some extra tweaks, even with just **33 million parameters**. While being MiniLM-sized, it outperforms all previous similarly-sized models on common benchmarks, and even outperforms much larger popular models such as e5-large-v2 or bge-base-en-v1.5. For more information about this model or how it was trained, head over to the announcement blogpost. ## Usage ### Installation This model was designed with the upcoming RAGatouille overhaul in mind. However, it's compatible with all recent ColBERT implementations! To use it, you can either use the Stanford ColBERT library, or RAGatouille. You can install either or both by simply running:
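A hedged sketch, assuming the usual PyPI package names (`ragatouille`, `colbert-ai`) and RAGatouille's documented loading, indexing, and search API:

```python
# Shell install (assumed PyPI names): pip install --upgrade ragatouille colbert-ai
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("answerdotai/answerai-colbert-small-v1")
RAG.index(
    index_name="demo",
    collection=["Paris is the capital of France.", "Berlin is the capital of Germany."],
)
print(RAG.search(query="What is the capital of France?", k=1))
```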
If you're interested in using this model as a re-ranker (it vastly outperforms cross-encoders its size!), you can do so via the rerankers library: ### Rerankers ### RAGatouille ### Stanford ColBERT #### Indexing #### Querying #### Extracting Vectors Finally, if you want to extract individula vectors, you can use the model this way: ## Results ### Against single-vector models | 33M (1x) | 33M (1x) | **109M (3.3x)** | | **BEIR AVG** | **53.79** | 51.99 | 51.68 | 53.25 | | **FiQA2018** | **41.15** | 40.65 | 40.34 | 40.65 | | **HotpotQA** | **76.11** | 66.54 | 69.94 | 72.6 | | **MSMARCO** | **43.5** | 40.23 | 40.83 | 41.35 | | **NQ** | **59.1** | 50.9 | 50.18 | 54.15 | | **TRECCOVID** | **84.59** | 80.12 | 75.9 | 78.07 | | **ArguAna** | 50.09 | 57.59 | 59.55 | **63.61** | | **ClimateFEVER**| 33.07 | **35.2** | 31.84 | 31.17 | | **CQADupstackRetrieval** | 38.75 | 39.65 | 39.05 | **42.35** | | **DBPedia** | **45.58** | 41.02 | 40.03 | 40.77 | | **FEVER** | **90.96** | 87.13 | 86.64 | 86.29 | | **NFCorpus** | 37.3 | 34.92 | 34.3 | **37.39** | | **QuoraRetrieval** | 87.72 | 88.41 | 88.78 | **88.9** | | **SCIDOCS** | 18.42 | **21.82** | 20.52 | 21.73 | | **SciFact** | **74.77** | 72.22 | 71.28 | 74.04 | | **Touche2020** | 25.69 | 23.48 | **26.04** | 25.7 | ### Against ColBERTv2.0 | Dataset / Model | answerai-colbert-small-v1 | ColBERTv2.0 | |:-----------------|:-----------------------:|:------------:| | **BEIR AVG** | **53.79** | 50.02 | | **DBPedia** | **45.58** | 44.6 | | **FiQA2018** | **41.15** | 35.6 | | **NQ** | **59.1** | 56.2 | | **HotpotQA** | **76.11** | 66.7 | | **NFCorpus** | **37.3** | 33.8 | | **TRECCOVID** | **84.59** | 73.3 | | **Touche2020** | 25.69 | **26.3** | | **ArguAna** | **50.09** | 46.3 | | **ClimateFEVER**| **33.07** | 17.6 | | **FEVER** | **90.96** | 78.5 | | **QuoraRetrieval** | **87.72** | 85.2 | | **SCIDOCS** | **18.42** | 15.4 | | **SciFact** | **74.77** | 69.3 | ## Referencing We'll most likely eventually release a technical report. In the meantime, if you use this model or other models following the JaColBERTv2.5 recipe and would like to give us credit, please cite the JaColBERTv2.5 journal pre-print:", + "model_explanation_gemini": "A compact, high-performing multi-vector model for efficient passage retrieval and reranking, outperforming larger models on benchmarks despite its small size (33M parameters)." +} \ No newline at end of file diff --git a/data/model_data_json/anthracite-org_magnum-v2-12b-gguf.json b/data/model_data_json/anthracite-org_magnum-v2-12b-gguf.json new file mode 100644 index 0000000000000000000000000000000000000000..65eca653bbf866e43392c1453f038bc6264c15e6 --- /dev/null +++ b/data/model_data_json/anthracite-org_magnum-v2-12b-gguf.json @@ -0,0 +1,27 @@ +{ + "model_id": "anthracite-org/magnum-v2-12b-gguf", + "downloads": 99075, + "tags": [ + "gguf", + "chat", + "text-generation", + "en", + "fr", + "de", + "es", + "it", + "pt", + "ru", + "zh", + "ja", + "base_model:anthracite-org/magnum-v2-12b", + "base_model:quantized:anthracite-org/magnum-v2-12b", + "license:apache-2.0", + "endpoints_compatible", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- license: apache-2.0 language: - en - fr - de - es - it - pt - ru - zh - ja pipeline_tag: text-generation quantized_by: anthracite-org base_model: anthracite-org/magnum-v2-12b tags: - chat --- ## This repo contains GGUF quants of the model. If you need the original weights, please find them here. 
This is the fourth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Mistral-Nemo-Base-2407. ## Prompting The model has been instruct-tuned with ChatML formatting. A typical input would look like this: ## Credits - Stheno dataset (filtered) - kalomaze/Opus_Instruct_25k - Nopm/Opus_WritingStruct - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned (A ~16k rows subset) - kalomaze/Opus_Instruct_3k This model has been a team effort, and the credit goes to all members of Anthracite. ## Training The training was done for 2 epochs. We used 8x NVIDIA H100 Tensor Core GPUs for the full-parameter fine-tuning of the model. ## Safety ...", + "model_explanation_gemini": "Fine-tuned for high-quality text generation in multiple languages, replicating Claude 3's prose style using ChatML formatting." +} \ No newline at end of file diff --git a/data/model_data_json/anuragshas_wav2vec2-large-xlsr-53-telugu.json new file mode 100644 index 0000000000000000000000000000000000000000..f19ef83593eefeeca288dd9ba395a1ccf0c1e688 --- /dev/null +++ b/data/model_data_json/anuragshas_wav2vec2-large-xlsr-53-telugu.json @@ -0,0 +1,22 @@ +{ + "model_id": "anuragshas/wav2vec2-large-xlsr-53-telugu", + "downloads": 96078, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "te", + "dataset:openslr", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: te datasets: - openslr metrics: - wer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: Anurag Singh XLSR Wav2Vec2 Large 53 Telugu results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: OpenSLR te type: openslr args: te metrics: - name: Test WER type: wer value: 44.98 --- # Wav2Vec2-Large-XLSR-53-Telugu Fine-tuned facebook/wav2vec2-large-xlsr-53 on Telugu using the OpenSLR SLR66 dataset. When using this model, make sure that your speech input is sampled at 16kHz. ## Usage The model can be used directly (without a language model) as follows: ## Evaluation **Test Result**: 44.98% ## Training 70% of the OpenSLR Telugu dataset was used for training. Train Split of annotations is here Test Split of annotations is here Training Data Preparation notebook can be found here Training notebook can be found here Evaluation notebook is here", + "model_explanation_gemini": "Fine-tuned for Telugu speech recognition using the OpenSLR dataset, achieving a 44.98% word error rate on test data." +} \ No newline at end of file diff --git a/data/model_data_json/apple_DFN5B-CLIP-ViT-H-14-378.json new file mode 100644 index 0000000000000000000000000000000000000000..8cdbd4c7a49d763c37edf1e5928ed30b8a89ea89 --- /dev/null +++ b/data/model_data_json/apple_DFN5B-CLIP-ViT-H-14-378.json @@ -0,0 +1,14 @@ +{ + "model_id": "apple/DFN5B-CLIP-ViT-H-14-378", + "downloads": 336526, + "tags": [ + "open_clip", + "pytorch", + "clip", + "arxiv:2309.17425", + "license:apple-amlr", + "region:us" + ], + "description": "--- license: apple-amlr license_name: apple-sample-code-license license_link: LICENSE --- A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-5B.
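A zero-shot classification sketch with OpenCLIP (the card notes below that these weights are directly usable in OpenCLIP; the `hf-hub:` model path follows OpenCLIP's Hub convention):

```python
import torch
from PIL import Image
import open_clip

# Load the DFN5B weights through OpenCLIP's Hugging Face Hub integration
model, preprocess = open_clip.create_model_from_pretrained("hf-hub:apple/DFN5B-CLIP-ViT-H-14-378")
tokenizer = open_clip.get_tokenizer("hf-hub:apple/DFN5B-CLIP-ViT-H-14-378")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)
```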
Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data. This model was trained on 5B images that were filtered from a pool of 43B uncurated image-text pairs (12.8B image-text pairs from CommonPool-12.8B + 30B additional public image-text pairs). This model has been converted to PyTorch from the original JAX checkpoints from Axlearn ( These weights are directly usable in OpenCLIP (image + text). ## Model Details - **Model Type:** Contrastive Image-Text, Zero-Shot Image Classification. - **Dataset:** DFN-5b - **Papers:** - Data Filtering Networks: - **Samples Seen:** 39B (224 x 224) + 5B (384 x 384) ## Model Metrics | dataset | metric | |:-----------------------|---------:| | ImageNet 1k | 0.84218 | | Caltech-101 | 0.954479 | | CIFAR-10 | 0.9879 | | CIFAR-100 | 0.9041 | | CLEVR Counts | 0.362467 | | CLEVR Distance | 0.206067 | | Country211 | 0.37673 | | Describable Textures | 0.71383 | | EuroSAT | 0.608333 | | FGVC Aircraft | 0.719938 | | Food-101 | 0.963129 | | GTSRB | 0.679018 | | ImageNet Sketch | 0.73338 | | ImageNet v2 | 0.7837 | | ImageNet-A | 0.7992 | | ImageNet-O | 0.3785 | | ImageNet-R | 0.937633 | | KITTI Vehicle Distance | 0.38256 | | MNIST | 0.8372 | | ObjectNet 1 | 0.796867 | | Oxford Flowers-102 | 0.896834 | | Oxford-IIIT Pet | 0.966841 | | Pascal VOC 2007 | 0.826255 | | PatchCamelyon | 0.695953 | | Rendered SST2 | 0.566722 | | RESISC45 | 0.755079 | | Stanford Cars | 0.959955 | | STL-10 | 0.991125 | | SUN397 | 0.772799 | | SVHN | 0.671251 | | Flickr | 0.8808 | | MSCOCO | 0.636889 | | WinoGAViL | 0.571813 | | iWildCam | 0.224911 | | Camelyon17 | 0.711536 | | FMoW | 0.209024 | | Dollar Street | 0.71729 | | GeoDE | 0.935699 | | **Average** | **0.709421** | [1]: Center-crop pre-processing used for ObjectNet (squashing results in lower accuracy of 0.737) ## Model Usage ### With OpenCLIP ## Citation", + "model_explanation_gemini": "A CLIP model trained on 5B filtered images for zero-shot image classification and contrastive image-text tasks." +} \ No newline at end of file diff --git a/data/model_data_json/apple_MobileCLIP-S2-OpenCLIP.json b/data/model_data_json/apple_MobileCLIP-S2-OpenCLIP.json new file mode 100644 index 0000000000000000000000000000000000000000..82a74aa2acaae32f75ee3189cbece2cdf0beb1c7 --- /dev/null +++ b/data/model_data_json/apple_MobileCLIP-S2-OpenCLIP.json @@ -0,0 +1,18 @@ +{ + "model_id": "apple/MobileCLIP-S2-OpenCLIP", + "downloads": 95293, + "tags": [ + "open_clip", + "safetensors", + "clip", + "zero-shot-image-classification", + "arxiv:2311.17049", + "arxiv:2103.00020", + "arxiv:2303.15343", + "arxiv:2309.17425", + "license:apple-amlr", + "region:us" + ], + "description": "--- tags: - clip library_name: open_clip pipeline_tag: zero-shot-image-classification license: apple-amlr license_name: apple-ascl license_link: --- # MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training MobileCLIP was introduced in MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024), by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel. This repository contains the **MobileCLIP-S2** checkpoint for OpenCLIP. !MobileCLIP Performance Figure ### Highlights * Our smallest variant obtains similar zero-shot performance as OpenAI's ViT-B/16 model while being 4.8x faster and 2.8x smaller. * obtains better avg zero-shot performance than SigLIP's ViT-B/16 model while being 2.3x faster and 2.1x smaller, and trained with 3x less seen samples. 
* MobileCLIP-B (LT) attains zero-shot ImageNet performance of **77.2%**, which is significantly better than recent works like DFN and SigLIP with similar architectures, or even OpenAI's ViT-L/14@336. ## Checkpoints | Model | # Seen Samples (B) | # Params (M) (img + txt) | Latency (ms) (img + txt) | IN-1k Zero-Shot Top-1 Acc. (%) | Avg. Perf. (%) on 38 datasets | |:---|:---:|:---:|:---:|:---:|:---:| | MobileCLIP-S0 | 13 | 11.4 + 42.4 | 1.5 + 1.6 | 67.8 | 58.1 | | MobileCLIP-S1 | 13 | 21.5 + 63.4 | 2.5 + 3.3 | 72.6 | 61.3 | | MobileCLIP-S2 | 13 | 35.7 + 63.4 | 3.6 + 3.3 | 74.4 | 63.7 | | MobileCLIP-B | 13 | 86.3 + 63.4 | 10.4 + 3.3 | 76.8 | 65.2 | | MobileCLIP-B (LT) | 36 | 86.3 + 63.4 | 10.4 + 3.3 | 77.2 | 65.8 |", + "model_explanation_gemini": "MobileCLIP-S2 is a fast, efficient image-text model for zero-shot image classification, achieving high accuracy with smaller size and lower latency compared to similar models." +} \ No newline at end of file diff --git a/data/model_data_json/apple_OpenELM-1_1B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..fed0a4458b0213e5b8a2ee0badd283edaa908f36 --- /dev/null +++ b/data/model_data_json/apple_OpenELM-1_1B-Instruct.json @@ -0,0 +1,17 @@ +{ + "model_id": "apple/OpenELM-1_1B-Instruct", + "downloads": 1476394, + "tags": [ + "transformers", + "safetensors", + "openelm", + "text-generation", + "custom_code", + "arxiv:2404.14619", + "license:apple-amlr", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: apple-amlr license_name: apple-sample-code-license license_link: LICENSE --- # OpenELM *Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari* We introduce **OpenELM**, a family of **Open** **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the CoreNet library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters. We release the complete framework, encompassing data preparation, training, fine-tuning, and evaluation procedures, alongside multiple pre-trained checkpoints and training logs, to facilitate open research. Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them. ## Usage We have provided an example function to generate output from OpenELM models loaded via HuggingFace Hub in . You can try the model by running the following command: Please refer to this link to obtain your hugging face access token. Additional arguments to the hugging face generate function can be passed via .
As an example, to speedup the inference, you can try lookup token speculative generation by passing the argument as follows: Alternatively, try model-wise speculative generation with an assistive model by passing a smaller model through the argument, for example: ## Main Results ### Zero-Shot | **Model Size** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** | |-----------------------------------------------------------------------------|-----------|-----------|-----------|---------------|-----------|-----------|----------------|-------------| | OpenELM-270M | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 | | OpenELM-270M-Instruct | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** | | OpenELM-450M | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 | | OpenELM-450M-Instruct | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** | | OpenELM-1_1B | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 | | OpenELM-1_1B-Instruct | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** | | OpenELM-3B | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 | | OpenELM-3B-Instruct | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** | ### LLM360 | **Model Size** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** | |-----------------------------------------------------------------------------|-----------|---------------|-----------|----------------|----------------|-------------| | OpenELM-270M | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 | | OpenELM-270M-Instruct | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** | | OpenELM-450M | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 | | OpenELM-450M-Instruct | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** | | OpenELM-1_1B | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 | | OpenELM-1_1B-Instruct | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** | | OpenELM-3B | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 | | OpenELM-3B-Instruct | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** | ### OpenLLM Leaderboard | **Model Size** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** | |-----------------------------------------------------------------------------|-----------|-----------------|---------------|-----------|-----------|-----------|----------------|----------------|-------------| | OpenELM-270M | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 | | OpenELM-270M-Instruct | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** | | OpenELM-450M | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 | | OpenELM-450M-Instruct | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** | | OpenELM-1_1B | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 | | OpenELM-1_1B-Instruct | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** | | OpenELM-3B | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 | | OpenELM-3B-Instruct | **47.70** | 72.33 | **76.87** | 24.80 | 
**79.00** | 38.47 | **38.76** | **67.96** | **55.73** | See the technical report for more results and comparison. ## Evaluation ### Setup Install the following dependencies: ### Evaluate OpenELM ## Bias, Risks, and Limitations The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements. ## Citation If you find our work useful, please cite:", + "model_explanation_gemini": "OpenELM-1_1B-Instruct is an instruction-tuned, 1.1-billion-parameter efficient language model designed for enhanced accuracy in text generation and zero-shot tasks through optimized parameter allocation." +} \ No newline at end of file diff --git a/data/model_data_json/apple_OpenELM-450M-Instruct.json b/data/model_data_json/apple_OpenELM-450M-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..776182eee75834bf4d8873b02e037fb70b998a41 --- /dev/null +++ b/data/model_data_json/apple_OpenELM-450M-Instruct.json @@ -0,0 +1,17 @@ +{ + "model_id": "apple/OpenELM-450M-Instruct", + "downloads": 111567, + "tags": [ + "transformers", + "safetensors", + "openelm", + "text-generation", + "custom_code", + "arxiv:2404.14619", + "license:apple-amlr", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: apple-amlr license_name: apple-sample-code-license license_link: LICENSE --- # OpenELM *Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari* We introduce **OpenELM**, a family of **Open** **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the CoreNet library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters. We release the complete framework, encompassing data preparation, training, fine-tuning, and evaluation procedures, alongside multiple pre-trained checkpoints and training logs, to facilitate open research. Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them. ## Usage We have provided an example function to generate output from OpenELM models loaded via HuggingFace Hub in . You can try the model by running the following command: Please refer to this link to obtain your hugging face access token. Additional arguments to the hugging face generate function can be passed via . 
As an example, to speedup the inference, you can try lookup token speculative generation by passing the argument as follows: Alternatively, try model-wise speculative generation with an assistive model by passing a smaller model through the argument, for example: ## Main Results ### Zero-Shot | **Model Size** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** | |-----------------------------------------------------------------------------|-----------|-----------|-----------|---------------|-----------|-----------|----------------|-------------| | OpenELM-270M | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 | | OpenELM-270M-Instruct | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** | | OpenELM-450M | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 | | OpenELM-450M-Instruct | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** | | OpenELM-1_1B | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 | | OpenELM-1_1B-Instruct | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** | | OpenELM-3B | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 | | OpenELM-3B-Instruct | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** | ### LLM360 | **Model Size** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** | |-----------------------------------------------------------------------------|-----------|---------------|-----------|----------------|----------------|-------------| | OpenELM-270M | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 | | OpenELM-270M-Instruct | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** | | OpenELM-450M | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 | | OpenELM-450M-Instruct | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** | | OpenELM-1_1B | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 | | OpenELM-1_1B-Instruct | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** | | OpenELM-3B | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 | | OpenELM-3B-Instruct | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** | ### OpenLLM Leaderboard | **Model Size** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** | |-----------------------------------------------------------------------------|-----------|-----------------|---------------|-----------|-----------|-----------|----------------|----------------|-------------| | OpenELM-270M | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 | | OpenELM-270M-Instruct | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** | | OpenELM-450M | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 | | OpenELM-450M-Instruct | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** | | OpenELM-1_1B | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 | | OpenELM-1_1B-Instruct | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** | | OpenELM-3B | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 | | OpenELM-3B-Instruct | **47.70** | 72.33 | **76.87** | 24.80 | 
**79.00** | 38.47 | **38.76** | **67.96** | **55.73** | See the technical report for more results and comparison. ## Evaluation ### Setup Install the following dependencies: ### Evaluate OpenELM ## Bias, Risks, and Limitations The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements. ## Citation If you find our work useful, please cite:", + "model_explanation_gemini": "OpenELM-450M-Instruct is an instruction-tuned, efficient language model designed for enhanced accuracy in various NLP tasks through optimized parameter allocation." +} \ No newline at end of file diff --git a/data/model_data_json/apple_mobilevit-small.json b/data/model_data_json/apple_mobilevit-small.json new file mode 100644 index 0000000000000000000000000000000000000000..1582da0c69fb9c2a7e0cdab730d8833990a710b5 --- /dev/null +++ b/data/model_data_json/apple_mobilevit-small.json @@ -0,0 +1,21 @@ +{ + "model_id": "apple/mobilevit-small", + "downloads": 890510, + "tags": [ + "transformers", + "pytorch", + "tf", + "coreml", + "mobilevit", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:2110.02178", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - vision - image-classification datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # MobileViT (small-sized model) MobileViT model pre-trained on ImageNet-1k at resolution 256x256. It was introduced in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari, and first released in this repository. The license used is Apple sample code license. Disclaimer: The team releasing MobileViT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description MobileViT is a light-weight, low latency convolutional neural network that combines MobileNetV2-style layers with a new block that replaces local processing in convolutions with global processing using transformers. As with ViT (Vision Transformer), the image data is converted into flattened patches before it is processed by the transformer layers. Afterwards, the patches are \"unflattened\" back into feature maps. This allows the MobileViT-block to be placed anywhere inside a CNN. MobileViT does not require any positional embeddings. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Currently, both the feature extractor and model support PyTorch. ## Training data The MobileViT model was pretrained on ImageNet-1k, a dataset consisting of 1 million images and 1,000 classes. ## Training procedure ### Preprocessing Training requires only basic data augmentation, i.e. 
random resized cropping and horizontal flipping. To learn multi-scale representations without requiring fine-tuning, a multi-scale sampler was used during training, with image sizes randomly sampled from: (160, 160), (192, 192), (256, 256), (288, 288), (320, 320). At inference time, images are resized/rescaled to the same resolution (288x288), and center-cropped at 256x256. Pixels are normalized to the range [0, 1]. Images are expected to be in BGR pixel order, not RGB. ### Pretraining The MobileViT networks are trained from scratch for 300 epochs on ImageNet-1k on 8 NVIDIA GPUs with an effective batch size of 1024 and learning rate warmup for 3k steps, followed by cosine annealing. Also used were label smoothing cross-entropy loss and L2 weight decay. Training resolution varies from 160x160 to 320x320, using multi-scale sampling. ## Evaluation results | Model | ImageNet top-1 accuracy | ImageNet top-5 accuracy | # params | URL | |------------------|-------------------------|-------------------------|-----------|-------------------------------------------------| | MobileViT-XXS | 69.0 | 88.9 | 1.3 M | | | MobileViT-XS | 74.8 | 92.3 | 2.3 M | | | **MobileViT-S** | **78.4** | **94.1** | **5.6 M** | | ### BibTeX entry and citation info", + "model_explanation_gemini": "A lightweight vision transformer model for image classification, pre-trained on ImageNet-1k, combining MobileNetV2-style layers with global processing via transformers." +} \ No newline at end of file diff --git a/data/model_data_json/argmaxinc_whisperkit-coreml.json b/data/model_data_json/argmaxinc_whisperkit-coreml.json new file mode 100644 index 0000000000000000000000000000000000000000..886c7761e81c13a8f24bdc15345b313af0681c02 --- /dev/null +++ b/data/model_data_json/argmaxinc_whisperkit-coreml.json @@ -0,0 +1,15 @@ +{ + "model_id": "argmaxinc/whisperkit-coreml", + "downloads": 344968, + "tags": [ + "whisperkit", + "coreml", + "whisper", + "asr", + "quantized", + "automatic-speech-recognition", + "region:us" + ], + "description": "--- pretty_name: \"WhisperKit\" viewer: false library_name: whisperkit tags: - whisper - whisperkit - coreml - asr - quantized - automatic-speech-recognition --- # WhisperKit WhisperKit is an on-device speech recognition framework for Apple Silicon: For performance and accuracy benchmarks on real devices, please see: WhisperKit Pro is the commercial upgraded version of WhisperKit. If you would like to learn more about upgrading to Pro, please reach out to whisperkitpro@argmaxinc.com or fill the form here.", + "model_explanation_gemini": "An on-device speech recognition framework for Apple Silicon optimized for automatic speech recognition (ASR) using CoreML and quantization." +} \ No newline at end of file diff --git a/data/model_data_json/ashawkey_imagedream-ipmv-diffusers.json b/data/model_data_json/ashawkey_imagedream-ipmv-diffusers.json new file mode 100644 index 0000000000000000000000000000000000000000..543df6724a629384885322e3ef3bdf24e4616c9e --- /dev/null +++ b/data/model_data_json/ashawkey_imagedream-ipmv-diffusers.json @@ -0,0 +1,14 @@ +{ + "model_id": "ashawkey/imagedream-ipmv-diffusers", + "downloads": 89200, + "tags": [ + "diffusers", + "safetensors", + "arxiv:2312.02201", + "license:openrail", + "diffusers:MVDreamPipeline", + "region:us" + ], + "description": "--- license: openrail --- # MVDream-diffusers Model Card This is a port of into diffusers. 
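A minimal loading sketch, assuming the Hub repo ships its custom `MVDreamPipeline` code (the `trust_remote_code` path is the standard diffusers mechanism for custom pipelines; the exact call signature for multi-view generation may differ):

```python
import torch
from diffusers import DiffusionPipeline

# Hedged sketch: load the custom multi-view pipeline from the Hub
pipe = DiffusionPipeline.from_pretrained(
    "ashawkey/imagedream-ipmv-diffusers",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
pipe.to("cuda")
```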
For usage, please check: ## Citation ## Misuse, Malicious Use, and Out-of-Scope Use The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.", + "model_explanation_gemini": "Ports MVDream into diffusers for generating multi-view images while prohibiting harmful or offensive content creation." +} \ No newline at end of file diff --git a/data/model_data_json/ashawkey_mvdream-sd2.1-diffusers.json b/data/model_data_json/ashawkey_mvdream-sd2.1-diffusers.json new file mode 100644 index 0000000000000000000000000000000000000000..04144018df279618a2ca374d462840715e1f1c41 --- /dev/null +++ b/data/model_data_json/ashawkey_mvdream-sd2.1-diffusers.json @@ -0,0 +1,13 @@ +{ + "model_id": "ashawkey/mvdream-sd2.1-diffusers", + "downloads": 78372, + "tags": [ + "diffusers", + "safetensors", + "arxiv:2308.16512", + "license:openrail", + "diffusers:MVDreamPipeline", + "region:us" + ], + "description": "--- license: openrail --- # MVDream-diffusers Model Card This is a port of into diffusers. For usage, please check: ## Citation ## Misuse, Malicious Use, and Out-of-Scope Use The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes." +} \ No newline at end of file diff --git a/data/model_data_json/asosoft_KuBERT-Central-Kurdish-BERT-Model.json b/data/model_data_json/asosoft_KuBERT-Central-Kurdish-BERT-Model.json new file mode 100644 index 0000000000000000000000000000000000000000..955ef70b31187c3169d1e7bba7afa94fa4d0e768 --- /dev/null +++ b/data/model_data_json/asosoft_KuBERT-Central-Kurdish-BERT-Model.json @@ -0,0 +1,15 @@ +{ + "model_id": "asosoft/KuBERT-Central-Kurdish-BERT-Model", + "downloads": 122817, + "tags": [ + "transformers", + "safetensors", + "bert", + "feature-extraction", + "zero-shot-classification", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: zero-shot-classification --- # KuBERT: Central Kurdish BERT Model ## Introduction KuBERT-Central-Kurdish-BERT-Model harnesses the BERT framework to enhance computational linguistics for the Central Kurdish language. This initiative is a response to the scarcity of resources and computational models for Kurdish, which is a language with substantial linguistic diversity. ## Data Acquisition for Model Training Data collection is a significant hurdle in training deep learning models, especially for low-resource languages like Kurdish. Sourcing sufficient data is essential for the efficacy of complex models such as BERT. The scarcity of digital resources makes accumulating Kurdish data more challenging than for many other languages. To amass a comprehensive word vector dataset for Kurdish, substantial efforts were made to compile information from various sources. ### Corpus Compilation Three main corpora were utilized to train the Kurdish BERT model, amounting to 296.5 million tokens: - **AsoSoft corpus**: With 188 million tokens, it includes data from websites, textbooks, and magazines. - **AramRafeq and Muhammad Azizi corpus**: A collection of over 60 million tokens gathered from Kurdish websites. 
- **Oscar 2019 corpus**: Comprising 48.5 million words, it further enriches the dataset. This comprehensive text corpus ensures that the KuBERT model is well-equipped to understand and process Kurdish at a high level. ## Overview The project uses the latest advances in BERT technology to better understand and process Kurdish language data. The model training incorporates a Kurdish-specific tokenizer and various classifiers, demonstrating BERT's adaptability to linguistic intricacies. **from transformers import BertTokenizer, BertModel** **tokenizer = BertTokenizer.from_pretrained('asosoft/KuBERT-Central-Kurdish-BERT-Model')** **model = BertModel.from_pretrained('asosoft/KuBERT-Central-Kurdish-BERT-Model')** ## Contributions The integration of BERT represents a significant step forward in computational linguistics for Kurdish, providing a much-needed benchmark for future NLP efforts in under-represented languages. By leveraging a large corpus of Kurdish text, this project addresses critical gaps in language processing tools for Kurdish. ## Training Details The BERT model undergoes extensive fine-tuning with the curated Kurdish dataset, ensuring optimal performance. Through rigorous training and evaluation, the model is prepared to handle a variety of linguistic tasks. ## Final Remarks This README encapsulates the essence of the KuBERT-Central-Kurdish-BERT-Model project, its data acquisition efforts, and the innovative use of BERT for the Kurdish language. For a full understanding of the model's capabilities and comprehensive training details, the full documentation and accompanying study materials should be consulted. ### Relevant Links and References - Oscar 2019 corpus: - AsoSoft Kurdish Text Corpus: - Kurdish Resources by Muhammad Azizi and AramRafeq: --- *Epochs: 3 *Max Token Length: 256 *Learning Rate: 1.00E-05 *Dropout Rate: 0.3 *Batch Size: 8 *GPU Utilization: Yes --- The corpus data tables and the detailed methodology can be found in the full research paper and are summarized here for quick reference: ### Corpus Data Tables Summary **Table 1: AsoSoft Kurdish Text Corpus** | Source | Number of Tokens | |---------------------------|------------------| | Crawled From Websites | 95M | | Text Books | 45M | | Magazines | 48M | | **Sum** | **188M** | **Table 2: Muhammad Azizi and AramRafeq Text Corpus** | Source | Number of Tokens | |----------------------|------------------| | Wikipedia | 13.5M | | Wishe Website | 11M | | Speemedia Website | 6.5M | | Kurdiu Website | 19M | | Dengiamerika Website | 2M | | Chawg Website | 8M | | **Sum** | **60M** | **Table 3: The Kurdish Text Corpus Used to Train BERT** | Corpus Name | Number of Tokens | |------------------------------------|------------------| | Oscar 2019 corpus | 48.5M | | AsoSoft corpus | 188M | | Muhammad Azizi and AramRafeq corpus| 60M | | **Sum** | **296.5M** | ## Cite If you are using our text corpus cite us. Awlla, K.M., Veisi, H. & Abdullah, A.A. Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish. Lang Resources & Evaluation (2025). ~~~ @article{awlla2025sentiment, title={Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish}, author={Awlla, K.M. and Veisi, H. 
and Abdullah, A.A.}, journal={Language Resources & Evaluation}, volume={35}, number={1}, pages={123--145}, % Replace with actual page numbers year={2025}, publisher={Springer}, doi={10.1007/s10579-024-09805-0} } ~~~", + "model_explanation_gemini": "A BERT-based model fine-tuned for zero-shot classification and NLP tasks in Central Kurdish, addressing resource scarcity for the language." +} \ No newline at end of file diff --git a/data/model_data_json/aubmindlab_bert-base-arabert.json new file mode 100644 index 0000000000000000000000000000000000000000..e16220e489ac23fb4b558ab79776a109e19bfe78 --- /dev/null +++ b/data/model_data_json/aubmindlab_bert-base-arabert.json @@ -0,0 +1,19 @@ +{ + "model_id": "aubmindlab/bert-base-arabert", + "downloads": 74124, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "fill-mask", + "ar", + "arxiv:2003.00104", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ar datasets: - wikipedia - Osian - 1.5B-Arabic-Corpus - oscar-arabic-unshuffled - Assafir(private) widget: - text: \" عاصمة لبنان هي [MASK] .\" --- # !!! A newer version of this model is available !!! AraBERTv2 # AraBERT v1 & v2 : Pre-training BERT for Arabic Language Understanding **AraBERT** is an Arabic pretrained language model based on Google's BERT architecture. AraBERT uses the same BERT-Base config. More details are available in the AraBERT Paper and in the AraBERT Meetup There are two versions of the model, AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the Farasa Segmenter. We evaluate AraBERT models on different downstream tasks and compare them to mBERT, and other state-of-the-art models (*to the extent of our knowledge*). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR), Named Entity Recognition with the ANERcorp, and Arabic Question Answering on Arabic-SQuAD and ARCD # AraBERTv2 ## What's New! AraBERT now comes in 4 new variants to replace the old v1 versions: More details in the AraBERT folder and in the README and in the AraBERT Paper Model | HuggingFace Model Name | Size (MB/Params)| Pre-Segmentation | DataSet (Sentences/Size/nWords) | ---|:---:|:---:|:---:|:---: AraBERTv0.2-base | bert-base-arabertv02 | 543MB / 136M | No | 200M / 77GB / 8.6B | AraBERTv0.2-large| bert-large-arabertv02 | 1.38G 371M | No | 200M / 77GB / 8.6B | AraBERTv2-base| bert-base-arabertv2 | 543MB 136M | Yes | 200M / 77GB / 8.6B | AraBERTv2-large| bert-large-arabertv2 | 1.38G 371M | Yes | 200M / 77GB / 8.6B | AraBERTv0.1-base| bert-base-arabertv01 | 543MB 136M | No | 77M / 23GB / 2.7B | AraBERTv1-base| bert-base-arabert | 543MB 136M | Yes | 77M / 23GB / 2.7B | All models are available in the model page under the aubmindlab name. Checkpoints are available in PyTorch, TF2 and TF1 formats. ## Better Pre-Processing and New Vocab We identified an issue with AraBERTv1's wordpiece vocabulary. The issue came from punctuation and numbers that were still attached to words when the wordpiece vocab was learned. We now insert a space between numbers and characters and around punctuation characters. The new vocabulary was learnt using the tokenizers library, and should now support the Fast tokenizer implementation from the transformers library.
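A usage sketch combining the preprocessing function discussed in the Preprocessing section below with the fill-mask model (assuming the `arabert` PyPI package, which provides `ArabertPreprocessor`):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
from arabert.preprocess import ArabertPreprocessor  # pip install arabert (assumed package name)

model_name = "aubmindlab/bert-base-arabert"
prep = ArabertPreprocessor(model_name=model_name)  # applies Farasa segmentation for v1/v2 models
text = prep.preprocess("عاصمة لبنان هي بيروت")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
outputs = model(**tokenizer(text, return_tensors="pt"))
```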
**P.S.**: All the old BERT codes should work with the new BERT, just change the model name and check the new preprocessing function **Please read the section on how to use the preprocessing function** ## Bigger Dataset and More Compute We used ~3.5 times more data, and trained for longer. For Dataset Sources see the Dataset Section Model | Hardware | num of examples with seq len (128 / 512) |128 (Batch Size/ Num of Steps) | 512 (Batch Size/ Num of Steps) | Total Steps | Total Time (in Days) | ---|:---:|:---:|:---:|:---:|:---:|:---: AraBERTv0.2-base | TPUv3-8 | 420M / 207M |2560 / 1M | 384/ 2M | 3M | - AraBERTv0.2-large | TPUv3-128 | 420M / 207M | 13440 / 250K | 2056 / 300K | 550K | - AraBERTv2-base | TPUv3-8 | 520M / 245M |13440 / 250K | 2056 / 300K | 550K | - AraBERTv2-large | TPUv3-128 | 520M / 245M | 13440 / 250K | 2056 / 300K | 550K | - AraBERT-base (v1/v0.1) | TPUv2-8 | - |512 / 900K | 128 / 300K| 1.2M | 4 days # Dataset The pretraining data used for the new AraBERT model is also used for Arabic **GPT2 and ELECTRA**. The dataset consists of 77GB or 200,095,961 lines or 8,655,948,860 words or 82,232,988,358 chars (before applying Farasa Segmentation) For the new dataset we added the unshuffled OSCAR corpus, after thoroughly filtering it, to the previous dataset used in AraBERTv1 but without the websites that we previously crawled: - OSCAR unshuffled and filtered. - Arabic Wikipedia dump from 2020/09/01 - The 1.5B words Arabic Corpus - The OSIAN Corpus - Assafir news articles. Huge thank you to Assafir for giving us the data # Preprocessing It is recommended to apply our preprocessing function before training/testing on any dataset. **Install farasapy to segment text for AraBERT v1 & v2 ** ## Accepted_models # TensorFlow 1.x models The TF1.x models are available in the HuggingFace models repo. You can download them as follows: - via git-lfs: clone all the models in a repo where is any model under the name - via : - Go to the tf1_model.tar.gz file on huggingface.co/models/aubmindlab/MODEL_NAME. - copy the - then run (ex: for : ) # If you used this model please cite us as: Google Scholar has our Bibtex wrong (missing name), use this instead # Acknowledgments Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs, we couldn't have done it without this program, and to the AUB MIND Lab Members for the continuous support. Also thanks to Yakshof and Assafir for data and storage access. Another thanks to Habib Rahal for putting a face to AraBERT.
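A minimal sketch of masked-LM inference with this checkpoint, assuming only the standard `transformers` pipeline API; note that AraBERTv1 expects Farasa pre-segmented input, so the raw-text prompt below (taken from the card's widget) is illustrative only:

~~~
from transformers import pipeline

# Sketch: fill-mask with AraBERTv1. The v1 model expects text pre-segmented
# with the Farasa segmenter (e.g. via farasapy); raw text is shown here
# purely for illustration.
fill_mask = pipeline("fill-mask", model="aubmindlab/bert-base-arabert")
print(fill_mask("عاصمة لبنان هي [MASK] ."))
~~~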
## Contacts **Wissam Antoun**: Linkedin | Twitter | Github **Fady Baly**: Linkedin | Twitter | Github " +} \ No newline at end of file diff --git a/data/model_data_json/aubmindlab_bert-base-arabertv02.json b/data/model_data_json/aubmindlab_bert-base-arabertv02.json new file mode 100644 index 0000000000000000000000000000000000000000..e72ed7afe2727a4173f00e9bb2ba6458170e74a8 --- /dev/null +++ b/data/model_data_json/aubmindlab_bert-base-arabertv02.json @@ -0,0 +1,26 @@ +{ + "model_id": "aubmindlab/bert-base-arabertv02", + "downloads": 628605, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "tensorboard", + "safetensors", + "bert", + "fill-mask", + "ar", + "dataset:wikipedia", + "dataset:Osian", + "dataset:1.5B-Arabic-Corpus", + "dataset:oscar-arabic-unshuffled", + "dataset:Assafir-private", + "arxiv:2003.00104", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ar datasets: - wikipedia - Osian - 1.5B-Arabic-Corpus - oscar-arabic-unshuffled - Assafir-private widget: - text: ' عاصمة لبنان هي [MASK] .' pipeline_tag: fill-mask --- # AraBERT v1 & v2 : Pre-training BERT for Arabic Language Understanding **AraBERT** is an Arabic pretrained language model based on Google's BERT architecture. AraBERT uses the same BERT-Base config. More details are available in the AraBERT Paper and in the AraBERT Meetup There are two versions of the model, AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the Farasa Segmenter. We evaluate AraBERT models on different downstream tasks and compare them to mBERT and other state-of-the-art models (*To the extent of our knowledge*). The Tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR), Named Entity Recognition with the ANERcorp, and Arabic Question Answering on Arabic-SQuAD and ARCD # AraBERTv2 ## What's New! AraBERT now comes in 4 new variants to replace the old v1 versions: More Detail in the AraBERT folder and in the README and in the AraBERT Paper Model | HuggingFace Model Name | Size (MB/Params)| Pre-Segmentation | DataSet (Sentences/Size/nWords) | ---|:---:|:---:|:---:|:---: AraBERTv0.2-base | bert-base-arabertv02 | 543MB / 136M | No | 200M / 77GB / 8.6B | AraBERTv0.2-large| bert-large-arabertv02 | 1.38G / 371M | No | 200M / 77GB / 8.6B | AraBERTv2-base| bert-base-arabertv2 | 543MB / 136M | Yes | 200M / 77GB / 8.6B | AraBERTv2-large| bert-large-arabertv2 | 1.38G / 371M | Yes | 200M / 77GB / 8.6B | AraBERTv0.2-Twitter-base| bert-base-arabertv02-twitter | 543MB / 136M | No | Same as v02 + 60M Multi-Dialect Tweets| AraBERTv0.2-Twitter-large| bert-large-arabertv02-twitter | 1.38G / 371M | No | Same as v02 + 60M Multi-Dialect Tweets| AraBERTv0.1-base| bert-base-arabertv01 | 543MB / 136M | No | 77M / 23GB / 2.7B | AraBERTv1-base| bert-base-arabert | 543MB / 136M | Yes | 77M / 23GB / 2.7B | All models are available in the model page under the aubmindlab name. Checkpoints are available in PyTorch, TF2 and TF1 formats. ## Better Pre-Processing and New Vocab We identified an issue with AraBERTv1's wordpiece vocabulary. The issue came from punctuations and numbers that were still attached to words when the wordpiece vocab was learned. We now insert a space between numbers and characters and around punctuation characters. The new vocabulary was learned using the from the library, and should now support the Fast tokenizer implementation from the library.
**P.S.**: All the old BERT codes should work with the new BERT, just change the model name and check the new preprocessing function **Please read the section on how to use the preprocessing function** ## Bigger Dataset and More Compute We used ~3.5 times more data, and trained for longer. For Dataset Sources see the Dataset Section Model | Hardware | num of examples with seq len (128 / 512) |128 (Batch Size/ Num of Steps) | 512 (Batch Size/ Num of Steps) | Total Steps | Total Time (in Days) | ---|:---:|:---:|:---:|:---:|:---:|:---: AraBERTv0.2-base | TPUv3-8 | 420M / 207M | 2560 / 1M | 384/ 2M | 3M | - AraBERTv0.2-large | TPUv3-128 | 420M / 207M | 13440 / 250K | 2056 / 300K | 550K | 7 AraBERTv2-base | TPUv3-8 | 420M / 207M | 2560 / 1M | 384/ 2M | 3M | - AraBERTv2-large | TPUv3-128 | 520M / 245M | 13440 / 250K | 2056 / 300K | 550K | 7 AraBERT-base (v1/v0.1) | TPUv2-8 | - |512 / 900K | 128 / 300K| 1.2M | 4 # Dataset The pretraining data used for the new AraBERT model is also used for Arabic **GPT2 and ELECTRA**. The dataset consists of 77GB or 200,095,961 lines or 8,655,948,860 words or 82,232,988,358 chars (before applying Farasa Segmentation) For the new dataset we added the unshuffled OSCAR corpus, after thoroughly filtering it, to the previous dataset used in AraBERTv1 but without the websites that we previously crawled: - OSCAR unshuffled and filtered. - Arabic Wikipedia dump from 2020/09/01 - The 1.5B words Arabic Corpus - The OSIAN Corpus - Assafir news articles. Huge thank you to Assafir for providing us with the data # Preprocessing It is recommended to apply our preprocessing function before training/testing on any dataset. **Install the arabert python package to segment text for AraBERT v1 & v2 or to clean your data ** # TensorFlow 1.x models The TF1.x models are available in the HuggingFace models repo. You can download them as follows: - via git-lfs: clone all the models in a repo where is any model under the name - via : - Go to the tf1_model.tar.gz file on huggingface.co/models/aubmindlab/MODEL_NAME. - copy the - then run (ex: for : ) # If you used this model please cite us as: Google Scholar has our Bibtex wrong (missing name), use this instead # Acknowledgments Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs, we couldn't have done it without this program, and to the AUB MIND Lab Members for the continuous support. Also thanks to Yakshof and Assafir for data and storage access. Another thanks to Habib Rahal for putting a face to AraBERT. # Contacts **Wissam Antoun**: Linkedin | Twitter | Github **Fady Baly**: Linkedin | Twitter | Github ", + "model_explanation_gemini": "AraBERTv02 is a BERT-based Arabic language model designed for tasks like masked word prediction, sentiment analysis, named entity recognition, and question answering, trained on extensive Arabic text data."
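A minimal sketch of the preprocessing step described above, assuming the `arabert` Python package exposes the `ArabertPreprocessor` helper (the class name and example sentence are taken from the AraBERT project and should be treated as assumptions):

~~~
from arabert.preprocess import ArabertPreprocessor

# Sketch: clean (and, for v1/v2 models, Farasa-segment) raw Arabic text the
# way the chosen checkpoint expects.
model_name = "aubmindlab/bert-base-arabertv02"
prep = ArabertPreprocessor(model_name=model_name)

text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
print(prep.preprocess(text))
~~~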
+} \ No newline at end of file diff --git a/data/model_data_json/audeering_wav2vec2-large-robust-12-ft-emotion-msp-dim.json b/data/model_data_json/audeering_wav2vec2-large-robust-12-ft-emotion-msp-dim.json new file mode 100644 index 0000000000000000000000000000000000000000..71b663dec7ba9b26457a37aaeee684ee7f02d247 --- /dev/null +++ b/data/model_data_json/audeering_wav2vec2-large-robust-12-ft-emotion-msp-dim.json @@ -0,0 +1,22 @@ +{ + "model_id": "audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim", + "downloads": 357398, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "speech", + "audio", + "audio-classification", + "emotion-recognition", + "en", + "dataset:msp-podcast", + "arxiv:2203.07378", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - msp-podcast inference: true tags: - speech - audio - wav2vec2 - audio-classification - emotion-recognition license: cc-by-nc-sa-4.0 pipeline_tag: audio-classification --- # Model for Dimensional Speech Emotion Recognition based on Wav2vec 2.0 Please note that this model is for research purposes only. A commercial license for a model that has been trained on much more data can be acquired from audEERING. The model expects a raw audio signal as input, and outputs predictions for arousal, dominance and valence in a range of approximately 0...1. In addition, it provides the pooled states of the last transformer layer. The model was created by fine-tuning Wav2Vec2-Large-Robust on MSP-Podcast (v1.7). The model was pruned from 24 to 12 transformer layers before fine-tuning. An ONNX export of the model is available from doi:10.5281/zenodo.6221127. Further details are given in the associated paper and tutorial. # Usage", + "model_explanation_gemini": "Recognizes emotions in speech by predicting arousal, dominance, and valence levels from raw audio signals using a pruned Wav2Vec2-Large-Robust model fine-tuned on MSP-Podcast data." +} \ No newline at end of file diff --git a/data/model_data_json/autogluon_chronos-bolt-base.json b/data/model_data_json/autogluon_chronos-bolt-base.json new file mode 100644 index 0000000000000000000000000000000000000000..12b24e60acd24dabfd86fa50f8b50cc229d95537 --- /dev/null +++ b/data/model_data_json/autogluon_chronos-bolt-base.json @@ -0,0 +1,21 @@ +{ + "model_id": "autogluon/chronos-bolt-base", + "downloads": 6259606, + "tags": [ + "safetensors", + "t5", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:1910.10683", + "arxiv:2403.07815", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-Bolt⚡ (Base) 🚀 **Update Feb 14, 2025**: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. 
The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are **more accurate**, up to **250 times faster** and **20 times more memory-efficient** than the original Chronos models of the same size. ## Performance The following plot compares the inference time of Chronos-Bolt against the original Chronos models for forecasting 1024 time series with a context length of 512 observations and a prediction horizon of 64 steps.
Chronos-Bolt models are not only significantly faster but also more accurate than the original Chronos models. The following plot reports the probabilistic and point forecasting performance of Chronos-Bolt in terms of the Weighted Quantile Loss (WQL) and the Mean Absolute Scaled Error (MASE), respectively, aggregated over 27 datasets (see the Chronos paper for details on this benchmark). Remarkably, despite having no prior exposure to these datasets during training, the zero-shot Chronos-Bolt models outperform commonly used statistical models and deep learning models that have been trained on these datasets (highlighted by *). Furthermore, they also perform better than other FMs, denoted by a +, which indicates that these models were pretrained on certain datasets in our benchmark and are not entirely zero-shot. Notably, Chronos-Bolt (Base) also surpasses the original Chronos (Large) model in terms of the forecasting accuracy while being over 600 times faster.
Chronos-Bolt models are available in the following sizes.
| Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-bolt-tiny** | 9M | t5-efficient-tiny | | **chronos-bolt-mini** | 21M | t5-efficient-mini | | **chronos-bolt-small** | 48M | t5-efficient-small | | **chronos-bolt-base** | 205M | t5-efficient-base |
## Usage ### Zero-shot inference with Chronos-Bolt in AutoGluon Install the required dependencies. Forecast with the Chronos-Bolt model. For more advanced features such as **fine-tuning** and **forecasting with covariates**, check out this tutorial. ### Deploying a Chronos-Bolt endpoint to SageMaker First, update the SageMaker SDK to make sure that all the latest models are available. Deploy an inference endpoint to SageMaker. Now you can send time series data to the endpoint in JSON format. Chronos-Bolt models can be deployed to both CPU and GPU instances. These models also support **forecasting with covariates**. For more details about the endpoint API, check out the example notebook. ## Citation If you find Chronos or Chronos-Bolt models useful for your research, please consider citing the associated paper: ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "A pretrained time series forecasting model based on T5 architecture, designed for zero-shot, fast, and accurate multi-step predictions using direct quantile forecasting." +} \ No newline at end of file diff --git a/data/model_data_json/autogluon_chronos-bolt-mini.json b/data/model_data_json/autogluon_chronos-bolt-mini.json new file mode 100644 index 0000000000000000000000000000000000000000..5c0ddb064aef82094023e8175a59cff2d6f37ad0 --- /dev/null +++ b/data/model_data_json/autogluon_chronos-bolt-mini.json @@ -0,0 +1,21 @@ +{ + "model_id": "autogluon/chronos-bolt-mini", + "downloads": 524485, + "tags": [ + "safetensors", + "t5", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:1910.10683", + "arxiv:2403.07815", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-Bolt⚡ (Mini) 🚀 **Update Feb 14, 2025**: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are **more accurate**, up to **250 times faster** and **20 times more memory-efficient** than the original Chronos models of the same size. ## Performance The following plot compares the inference time of Chronos-Bolt against the original Chronos models for forecasting 1024 time series with a context length of 512 observations and a prediction horizon of 64 steps.
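A minimal sketch of the zero-shot AutoGluon workflow referenced in the chronos-bolt-base Usage section above, assuming the `autogluon.timeseries` API; the dataset URL is the example used in AutoGluon's public docs and is an assumption here:

~~~
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Long-format data with item_id / timestamp / target columns.
df = TimeSeriesDataFrame(
    "https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly/train.csv"
)

# Zero-shot: the pretrained Chronos-Bolt weights are used as-is, no training.
predictor = TimeSeriesPredictor(prediction_length=48).fit(
    df,
    hyperparameters={"Chronos": {"model_path": "autogluon/chronos-bolt-base"}},
)
predictions = predictor.predict(df)  # quantile and mean forecasts per item
~~~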
Chronos-Bolt models are not only significantly faster but also more accurate than the original Chronos models. The following plot reports the probabilistic and point forecasting performance of Chronos-Bolt in terms of the Weighted Quantile Loss (WQL) and the Mean Absolute Scaled Error (MASE), respectively, aggregated over 27 datasets (see the Chronos paper for details on this benchmark). Remarkably, despite having no prior exposure to these datasets during training, the zero-shot Chronos-Bolt models outperform commonly used statistical models and deep learning models that have been trained on these datasets (highlighted by *). Furthermore, they also perform better than other FMs, denoted by a +, which indicates that these models were pretrained on certain datasets in our benchmark and are not entirely zero-shot. Notably, Chronos-Bolt (Base) also surpasses the original Chronos (Large) model in terms of the forecasting accuracy while being over 600 times faster.
Chronos-Bolt models are available in the following sizes.
| Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-bolt-tiny** | 9M | t5-efficient-tiny | | **chronos-bolt-mini** | 21M | t5-efficient-mini | | **chronos-bolt-small** | 48M | t5-efficient-small | | **chronos-bolt-base** | 205M | t5-efficient-base |
## Usage ### Zero-shot inference with Chronos-Bolt in AutoGluon Install the required dependencies. Forecast with the Chronos-Bolt model. For more advanced features such as **fine-tuning** and **forecasting with covariates**, check out this tutorial. ### Deploying a Chronos-Bolt endpoint to SageMaker First, update the SageMaker SDK to make sure that all the latest models are available. Deploy an inference endpoint to SageMaker. Now you can send time series data to the endpoint in JSON format. Chronos-Bolt models can be deployed to both CPU and GPU instances. These models also support **forecasting with covariates**. For more details about the endpoint API, check out the example notebook. ## Citation If you find Chronos or Chronos-Bolt models useful for your research, please consider citing the associated paper: ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "Pretrained time series forecasting model based on T5 architecture, optimized for fast, memory-efficient zero-shot forecasting with direct multi-step quantile predictions." +} \ No newline at end of file diff --git a/data/model_data_json/autogluon_chronos-bolt-small.json b/data/model_data_json/autogluon_chronos-bolt-small.json new file mode 100644 index 0000000000000000000000000000000000000000..078e3d5f6abf883c491d1dd57e95567c392a8a73 --- /dev/null +++ b/data/model_data_json/autogluon_chronos-bolt-small.json @@ -0,0 +1,21 @@ +{ + "model_id": "autogluon/chronos-bolt-small", + "downloads": 5574099, + "tags": [ + "safetensors", + "t5", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:1910.10683", + "arxiv:2403.07815", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-Bolt⚡ (Small) 🚀 **Update Feb 14, 2025**: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. Chronos-Bolt is a family of pretrained time series forecasting models which can be used for zero-shot forecasting. It is based on the T5 encoder-decoder architecture and has been trained on nearly 100 billion time series observations. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are **more accurate**, up to **250 times faster** and **20 times more memory-efficient** than the original Chronos models of the same size. ## Performance The following plot compares the inference time of Chronos-Bolt against the original Chronos models for forecasting 1024 time series with a context length of 512 observations and a prediction horizon of 64 steps.
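For the SageMaker deployment path these cards describe, a hedged sketch via the JumpStart SDK; the `model_id` string and the payload schema are assumptions to verify against the SageMaker JumpStart catalog and the example notebook the card points to:

~~~
from sagemaker.jumpstart.model import JumpStartModel

# The JumpStart model_id below is hypothetical; look up the exact
# Chronos-Bolt identifier in the JumpStart catalog.
model = JumpStartModel(model_id="autogluon-forecasting-chronos-bolt-mini")
predictor = model.deploy()

# Payload schema is an assumption modeled on the card's description of
# "sending time series data to the endpoint in JSON format".
payload = {
    "inputs": [{"target": [1.0, 2.0, 3.0, 4.0, 5.0]}],
    "parameters": {"prediction_length": 3},
}
print(predictor.predict(payload))
~~~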
Chronos-Bolt models are not only significantly faster but also more accurate than the original Chronos models. The following plot reports the probabilistic and point forecasting performance of Chronos-Bolt in terms of the Weighted Quantile Loss (WQL) and the Mean Absolute Scaled Error (MASE), respectively, aggregated over 27 datasets (see the Chronos paper for details on this benchmark). Remarkably, despite having no prior exposure to these datasets during training, the zero-shot Chronos-Bolt models outperform commonly used statistical models and deep learning models that have been trained on these datasets (highlighted by *). Furthermore, they also perform better than other FMs, denoted by a +, which indicates that these models were pretrained on certain datasets in our benchmark and are not entirely zero-shot. Notably, Chronos-Bolt (Base) also surpasses the original Chronos (Large) model in terms of the forecasting accuracy while being over 600 times faster.
Chronos-Bolt models are available in the following sizes.
| Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-bolt-tiny** | 9M | t5-efficient-tiny | | **chronos-bolt-mini** | 21M | t5-efficient-mini | | **chronos-bolt-small** | 48M | t5-efficient-small | | **chronos-bolt-base** | 205M | t5-efficient-base |
## Usage ### Zero-shot inference with Chronos-Bolt in AutoGluon Install the required dependencies. Forecast with the Chronos-Bolt model. For more advanced features such as **fine-tuning** and **forecasting with covariates**, check out this tutorial. ### Deploying a Chronos-Bolt endpoint to SageMaker First, update the SageMaker SDK to make sure that all the latest models are available. Deploy an inference endpoint to SageMaker. Now you can send time series data to the endpoint in JSON format. Chronos-Bolt models can be deployed to both CPU and GPU instances. These models also support **forecasting with covariates**. For more details about the endpoint API, check out the example notebook. ## Citation If you find Chronos or Chronos-Bolt models useful for your research, please consider citing the associated paper: ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "Pretrained time series forecasting model based on T5 architecture, optimized for zero-shot, fast, and memory-efficient multi-step predictions." +} \ No newline at end of file diff --git a/data/model_data_json/autogluon_chronos-t5-small.json b/data/model_data_json/autogluon_chronos-t5-small.json new file mode 100644 index 0000000000000000000000000000000000000000..e6bcec2bb478ce99fa55b40e0abaf5019274b434 --- /dev/null +++ b/data/model_data_json/autogluon_chronos-t5-small.json @@ -0,0 +1,26 @@ +{ + "model_id": "autogluon/chronos-t5-small", + "downloads": 152876, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2403.07815", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-T5 (Small) 🚀 **Update Feb 14, 2025**: Chronos-Bolt & original Chronos models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. 🚀 **Update Nov 27, 2024**: We have released Chronos-Bolt⚡️ models that are more accurate (5% lower error), up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. Check out the new models here. Chronos is a family of **pretrained time series forecasting models** based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes. For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.
Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.
--- ## Architecture The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters. | Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-t5-tiny** | 8M | t5-efficient-tiny | | **chronos-t5-mini** | 20M | t5-efficient-mini | | **chronos-t5-small** | 46M | t5-efficient-small | | **chronos-t5-base** | 200M | t5-efficient-base | | **chronos-t5-large** | 710M | t5-efficient-large | ## Usage To perform inference with Chronos models, install the package in the GitHub companion repo by running: A minimal example showing how to perform inference using Chronos models: ## Citation If you find Chronos models useful for your research, please consider citing the associated paper: ## Security See CONTRIBUTING for more information. ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "Pretrained time series forecasting model using tokenized inputs and language model architecture to generate probabilistic future predictions." +} \ No newline at end of file diff --git a/data/model_data_json/autogluon_chronos-t5-tiny.json b/data/model_data_json/autogluon_chronos-t5-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..3e4567c08d7780137fa71637a216da4973e725f4 --- /dev/null +++ b/data/model_data_json/autogluon_chronos-t5-tiny.json @@ -0,0 +1,26 @@ +{ + "model_id": "autogluon/chronos-t5-tiny", + "downloads": 294462, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2403.07815", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series --- # Chronos-T5 (Tiny) 🚀 **Update Feb 14, 2025**: Chronos-Bolt & original Chronos models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lines of code. 🚀 **Update Nov 27, 2024**: We have released Chronos-Bolt⚡️ models that are more accurate (5% lower error), up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. Check out the new models here. Chronos is a family of **pretrained time series forecasting models** based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes. For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.
Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.
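The chronos-t5 Usage sections mention an install command and a minimal inference example; a sketch, assuming the `chronos` package from the amazon-science/chronos-forecasting companion repo (dataset URL taken from the upstream examples):

~~~
# pip install git+https://github.com/amazon-science/chronos-forecasting.git
import pandas as pd
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "autogluon/chronos-t5-tiny",
    device_map="cpu",              # or "cuda"
    torch_dtype=torch.bfloat16,
)

# Any univariate series works as historical context.
df = pd.read_csv(
    "https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv"
)
context = torch.tensor(df["#Passengers"].values, dtype=torch.float32)

# forecast shape: [num_series, num_samples, prediction_length]
forecast = pipeline.predict(context, prediction_length=12)
low, median, high = torch.quantile(
    forecast[0].float(), torch.tensor([0.1, 0.5, 0.9]), dim=0
)
~~~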
--- ## Architecture The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters. | Model | Parameters | Based on | | ---------------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------- | | **chronos-t5-tiny** | 8M | t5-efficient-tiny | | **chronos-t5-mini** | 20M | t5-efficient-mini | | **chronos-t5-small** | 46M | t5-efficient-small | | **chronos-t5-base** | 200M | t5-efficient-base | | **chronos-t5-large** | 710M | t5-efficient-large | ## Usage To perform inference with Chronos models, install the package in the GitHub companion repo by running: A minimal example showing how to perform inference using Chronos models: ## Citation If you find Chronos models useful for your research, please consider citing the associated paper: ## Security See CONTRIBUTING for more information. ## License This project is licensed under the Apache-2.0 License.", + "model_explanation_gemini": "Pretrained time series forecasting model using a tokenized T5 architecture to generate probabilistic forecasts from historical data." +} \ No newline at end of file diff --git a/data/model_data_json/avichr_heBERT.json b/data/model_data_json/avichr_heBERT.json new file mode 100644 index 0000000000000000000000000000000000000000..551012aef75f295b60a54226f532c8aeed8cdfc4 --- /dev/null +++ b/data/model_data_json/avichr_heBERT.json @@ -0,0 +1,17 @@ +{ + "model_id": "avichr/heBERT", + "downloads": 95254, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "fill-mask", + "arxiv:1810.04805", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "## HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition HeBERT is a Hebrew pretrained language model. It is based on Google's BERT architecture and it is BERT-Base config (Devlin et al. 2018).
### HeBERT was trained on three datasets: 1. A Hebrew version of OSCAR (Ortiz, 2019): ~9.8 GB of data, including 1 billion words and over 20.8 million sentences. 2. A Hebrew dump of Wikipedia: ~650 MB of data, including over 63 million words and 3.8 million sentences. 3. Emotion UGC data collected for the purpose of this study (described below). We evaluated the model on emotion recognition and sentiment analysis as downstream tasks. ### Emotion UGC Data Description Our User Generated Content (UGC) consists of comments written on articles collected from 3 major news sites between January 2020 and August 2020. Total data size is ~150 MB, including over 7 million words and 350K sentences. 4,000 sentences were annotated by crowd members (3-10 annotators per sentence) for 8 emotions (anger, disgust, expectation, fear, happy, sadness, surprise and trust) and overall sentiment/polarity.
To validate the annotation, we measured inter-rater agreement on the emotion in each sentence using Krippendorff's alpha (Krippendorff, 1970) and kept only sentences with alpha > 0.7. Note that while we found general agreement between raters for emotions like happy, trust and disgust, there are a few emotions with general disagreement, apparently due to the complexity of finding them in the text (e.g. expectation and surprise). ## How to use ### For masked-LM model (can be fine-tuned to any downstream task) ### For sentiment classification model (polarity ONLY): Our model is also available on AWS! For more information visit AWS' git ### For NER model: ## Stay tuned! We are still working on our model and will edit this page as we progress.

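A minimal sketch of the three usage modes listed above, assuming the companion checkpoint names follow the HeBERT project repo (the sentiment and NER model ids are assumptions):

~~~
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Masked-LM backbone (this card's checkpoint); can be fine-tuned downstream.
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT")
model = AutoModelForMaskedLM.from_pretrained("avichr/heBERT")

# Polarity (sentiment) classification -- checkpoint name assumed from the repo.
sentiment = pipeline("sentiment-analysis", model="avichr/heBERT_sentiment_analysis")

# NER -- checkpoint name likewise assumed.
ner = pipeline("token-classification", model="avichr/heBERT_NER")
~~~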
Note that we have released only sentiment analysis (polarity) at this point; emotion detection will be released later on.

our git: ## If you use this model please cite us as : Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.", + "model_explanation_gemini": "avichr_heBERT is a Hebrew BERT-based model pretrained for polarity analysis (sentiment classification) and emotion recognition, evaluated on downstream tasks using annotated news comments." +} \ No newline at end of file diff --git a/data/model_data_json/avsolatorio_GIST-Embedding-v0.json b/data/model_data_json/avsolatorio_GIST-Embedding-v0.json new file mode 100644 index 0000000000000000000000000000000000000000..de0fbbe2324b49ae1503c00ab576ccb839ad904a --- /dev/null +++ b/data/model_data_json/avsolatorio_GIST-Embedding-v0.json @@ -0,0 +1,24 @@ +{ + "model_id": "avsolatorio/GIST-Embedding-v0", + "downloads": 328366, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "mteb", + "sentence-similarity", + "en", + "arxiv:2402.16829", + "arxiv:2212.09741", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers license: mit pipeline_tag: sentence-similarity tags: - feature-extraction - mteb - sentence-similarity - sentence-transformers model-index: - name: GIST-Embedding-v0 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.95522388059702 - type: ap value: 38.940434354439276 - type: f1 value: 69.88686275888114 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.51357499999999 - type: ap value: 90.30414241486682 - type: f1 value: 93.50552829047328 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 50.446000000000005 - type: f1 value: 49.76432659699279 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 38.265 - type: map_at_10 value: 54.236 - type: map_at_100 value: 54.81399999999999 - type: map_at_1000 value: 54.81700000000001 - type: map_at_3 value: 49.881 - type: map_at_5 value: 52.431000000000004 - type: mrr_at_1 value: 38.265 - type: mrr_at_10 value: 54.152 - type: mrr_at_100 value: 54.730000000000004 - type: mrr_at_1000 value: 54.733 - type: mrr_at_3 value: 49.644 - type: mrr_at_5 value: 52.32599999999999 - type: ndcg_at_1 value: 38.265 - type: ndcg_at_10 value: 62.62 - type: ndcg_at_100 value: 64.96600000000001 - type: ndcg_at_1000 value: 65.035 - type: ndcg_at_3 value: 53.691 - type: ndcg_at_5 value: 58.303000000000004 - type: precision_at_1 value: 38.265 - type: precision_at_10 value: 8.919 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.573999999999998 - type: precision_at_5 value: 15.192 - type: recall_at_1 value: 38.265 - type: recall_at_10 value: 89.189 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 
value: 64.723 - type: recall_at_5 value: 75.96000000000001 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.287087887491744 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.74244928943812 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.68814324295771 - type: mrr value: 75.46266983247591 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 90.45240209600391 - type: cos_sim_spearman value: 87.95079919934645 - type: euclidean_pearson value: 88.93438602492702 - type: euclidean_spearman value: 88.28152962682988 - type: manhattan_pearson value: 88.92193964325268 - type: manhattan_spearman value: 88.21466063329498 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 15.605427974947808 - type: f1 value: 14.989877233698866 - type: precision value: 14.77906814441261 - type: recall value: 15.605427974947808 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 33.38102575390711 - type: f1 value: 32.41704114719127 - type: precision value: 32.057363829835964 - type: recall value: 33.38102575390711 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 0.1939729823346034 - type: f1 value: 0.17832215223820772 - type: precision value: 0.17639155671715423 - type: recall value: 0.1939729823346034 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 3.0542390731964195 - type: f1 value: 2.762857644374232 - type: precision value: 2.6505178163945935 - type: recall value: 3.0542390731964195 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.29545454545453 - type: f1 value: 87.26415991342238 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.035319537839484 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.667313307057285 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 33.979 - type: map_at_10 value: 46.275 - type: 
map_at_100 value: 47.975 - type: map_at_1000 value: 48.089 - type: map_at_3 value: 42.507 - type: map_at_5 value: 44.504 - type: mrr_at_1 value: 42.346000000000004 - type: mrr_at_10 value: 53.013 - type: mrr_at_100 value: 53.717000000000006 - type: mrr_at_1000 value: 53.749 - type: mrr_at_3 value: 50.405 - type: mrr_at_5 value: 51.915 - type: ndcg_at_1 value: 42.346000000000004 - type: ndcg_at_10 value: 53.179 - type: ndcg_at_100 value: 58.458 - type: ndcg_at_1000 value: 60.057 - type: ndcg_at_3 value: 48.076 - type: ndcg_at_5 value: 50.283 - type: precision_at_1 value: 42.346000000000004 - type: precision_at_10 value: 10.386 - type: precision_at_100 value: 1.635 - type: precision_at_1000 value: 0.20600000000000002 - type: precision_at_3 value: 23.413999999999998 - type: precision_at_5 value: 16.624 - type: recall_at_1 value: 33.979 - type: recall_at_10 value: 65.553 - type: recall_at_100 value: 87.18599999999999 - type: recall_at_1000 value: 97.25200000000001 - type: recall_at_3 value: 50.068999999999996 - type: recall_at_5 value: 56.882 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.529 - type: map_at_10 value: 42.219 - type: map_at_100 value: 43.408 - type: map_at_1000 value: 43.544 - type: map_at_3 value: 39.178000000000004 - type: map_at_5 value: 40.87 - type: mrr_at_1 value: 39.873 - type: mrr_at_10 value: 48.25 - type: mrr_at_100 value: 48.867 - type: mrr_at_1000 value: 48.908 - type: mrr_at_3 value: 46.03 - type: mrr_at_5 value: 47.355000000000004 - type: ndcg_at_1 value: 39.873 - type: ndcg_at_10 value: 47.933 - type: ndcg_at_100 value: 52.156000000000006 - type: ndcg_at_1000 value: 54.238 - type: ndcg_at_3 value: 43.791999999999994 - type: ndcg_at_5 value: 45.678999999999995 - type: precision_at_1 value: 39.873 - type: precision_at_10 value: 9.032 - type: precision_at_100 value: 1.419 - type: precision_at_1000 value: 0.192 - type: precision_at_3 value: 21.231 - type: precision_at_5 value: 14.981 - type: recall_at_1 value: 31.529 - type: recall_at_10 value: 57.925000000000004 - type: recall_at_100 value: 75.89 - type: recall_at_1000 value: 89.007 - type: recall_at_3 value: 45.363 - type: recall_at_5 value: 50.973 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 41.289 - type: map_at_10 value: 54.494 - type: map_at_100 value: 55.494 - type: map_at_1000 value: 55.545 - type: map_at_3 value: 51.20099999999999 - type: map_at_5 value: 53.147 - type: mrr_at_1 value: 47.335 - type: mrr_at_10 value: 57.772 - type: mrr_at_100 value: 58.428000000000004 - type: mrr_at_1000 value: 58.453 - type: mrr_at_3 value: 55.434000000000005 - type: mrr_at_5 value: 56.8 - type: ndcg_at_1 value: 47.335 - type: ndcg_at_10 value: 60.382999999999996 - type: ndcg_at_100 value: 64.294 - type: ndcg_at_1000 value: 65.211 - type: ndcg_at_3 value: 55.098 - type: ndcg_at_5 value: 57.776 - type: precision_at_1 value: 47.335 - type: precision_at_10 value: 9.724 - type: precision_at_100 value: 1.26 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 24.786 - type: precision_at_5 value: 16.977999999999998 - type: recall_at_1 value: 41.289 - type: recall_at_10 value: 74.36399999999999 - type: recall_at_100 value: 91.19800000000001 - type: recall_at_1000 value: 97.508 - type: recall_at_3 value: 60.285 - type: recall_at_5 value: 66.814 - task: 
type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.816999999999997 - type: map_at_10 value: 37.856 - type: map_at_100 value: 38.824 - type: map_at_1000 value: 38.902 - type: map_at_3 value: 34.982 - type: map_at_5 value: 36.831 - type: mrr_at_1 value: 31.073 - type: mrr_at_10 value: 39.985 - type: mrr_at_100 value: 40.802 - type: mrr_at_1000 value: 40.861999999999995 - type: mrr_at_3 value: 37.419999999999995 - type: mrr_at_5 value: 39.104 - type: ndcg_at_1 value: 31.073 - type: ndcg_at_10 value: 42.958 - type: ndcg_at_100 value: 47.671 - type: ndcg_at_1000 value: 49.633 - type: ndcg_at_3 value: 37.602000000000004 - type: ndcg_at_5 value: 40.688 - type: precision_at_1 value: 31.073 - type: precision_at_10 value: 6.531000000000001 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 15.857 - type: precision_at_5 value: 11.209 - type: recall_at_1 value: 28.816999999999997 - type: recall_at_10 value: 56.538999999999994 - type: recall_at_100 value: 78.17699999999999 - type: recall_at_1000 value: 92.92200000000001 - type: recall_at_3 value: 42.294 - type: recall_at_5 value: 49.842999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.397 - type: map_at_10 value: 27.256999999999998 - type: map_at_100 value: 28.541 - type: map_at_1000 value: 28.658 - type: map_at_3 value: 24.565 - type: map_at_5 value: 26.211000000000002 - type: mrr_at_1 value: 22.761 - type: mrr_at_10 value: 32.248 - type: mrr_at_100 value: 33.171 - type: mrr_at_1000 value: 33.227000000000004 - type: mrr_at_3 value: 29.498 - type: mrr_at_5 value: 31.246000000000002 - type: ndcg_at_1 value: 22.761 - type: ndcg_at_10 value: 32.879999999999995 - type: ndcg_at_100 value: 38.913 - type: ndcg_at_1000 value: 41.504999999999995 - type: ndcg_at_3 value: 27.988000000000003 - type: ndcg_at_5 value: 30.548 - type: precision_at_1 value: 22.761 - type: precision_at_10 value: 6.045 - type: precision_at_100 value: 1.044 - type: precision_at_1000 value: 0.13999999999999999 - type: precision_at_3 value: 13.433 - type: precision_at_5 value: 9.925 - type: recall_at_1 value: 18.397 - type: recall_at_10 value: 45.14 - type: recall_at_100 value: 71.758 - type: recall_at_1000 value: 89.854 - type: recall_at_3 value: 31.942999999999998 - type: recall_at_5 value: 38.249 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.604 - type: map_at_10 value: 42.132 - type: map_at_100 value: 43.419000000000004 - type: map_at_1000 value: 43.527 - type: map_at_3 value: 38.614 - type: map_at_5 value: 40.705000000000005 - type: mrr_at_1 value: 37.824999999999996 - type: mrr_at_10 value: 47.696 - type: mrr_at_100 value: 48.483 - type: mrr_at_1000 value: 48.53 - type: mrr_at_3 value: 45.123999999999995 - type: mrr_at_5 value: 46.635 - type: ndcg_at_1 value: 37.824999999999996 - type: ndcg_at_10 value: 48.421 - type: ndcg_at_100 value: 53.568000000000005 - type: ndcg_at_1000 value: 55.574999999999996 - type: ndcg_at_3 value: 42.89 - type: ndcg_at_5 value: 45.683 - type: precision_at_1 value: 37.824999999999996 - type: precision_at_10 value: 8.758000000000001 - type: precision_at_100 value: 1.319 - type: precision_at_1000 value: 0.168 - type: 
precision_at_3 value: 20.244 - type: precision_at_5 value: 14.533 - type: recall_at_1 value: 30.604 - type: recall_at_10 value: 61.605 - type: recall_at_100 value: 82.787 - type: recall_at_1000 value: 95.78 - type: recall_at_3 value: 46.303 - type: recall_at_5 value: 53.351000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.262999999999998 - type: map_at_10 value: 36.858999999999995 - type: map_at_100 value: 38.241 - type: map_at_1000 value: 38.346999999999994 - type: map_at_3 value: 33.171 - type: map_at_5 value: 35.371 - type: mrr_at_1 value: 32.42 - type: mrr_at_10 value: 42.361 - type: mrr_at_100 value: 43.219 - type: mrr_at_1000 value: 43.271 - type: mrr_at_3 value: 39.593 - type: mrr_at_5 value: 41.248000000000005 - type: ndcg_at_1 value: 32.42 - type: ndcg_at_10 value: 43.081 - type: ndcg_at_100 value: 48.837 - type: ndcg_at_1000 value: 50.954 - type: ndcg_at_3 value: 37.413000000000004 - type: ndcg_at_5 value: 40.239000000000004 - type: precision_at_1 value: 32.42 - type: precision_at_10 value: 8.071 - type: precision_at_100 value: 1.272 - type: precision_at_1000 value: 0.163 - type: precision_at_3 value: 17.922 - type: precision_at_5 value: 13.311 - type: recall_at_1 value: 26.262999999999998 - type: recall_at_10 value: 56.062999999999995 - type: recall_at_100 value: 80.636 - type: recall_at_1000 value: 94.707 - type: recall_at_3 value: 40.425 - type: recall_at_5 value: 47.663 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.86616666666667 - type: map_at_10 value: 37.584999999999994 - type: map_at_100 value: 38.80291666666667 - type: map_at_1000 value: 38.91358333333333 - type: map_at_3 value: 34.498 - type: map_at_5 value: 36.269999999999996 - type: mrr_at_1 value: 33.07566666666667 - type: mrr_at_10 value: 41.92366666666666 - type: mrr_at_100 value: 42.73516666666667 - type: mrr_at_1000 value: 42.785666666666664 - type: mrr_at_3 value: 39.39075 - type: mrr_at_5 value: 40.89133333333334 - type: ndcg_at_1 value: 33.07566666666667 - type: ndcg_at_10 value: 43.19875 - type: ndcg_at_100 value: 48.32083333333334 - type: ndcg_at_1000 value: 50.418000000000006 - type: ndcg_at_3 value: 38.10308333333333 - type: ndcg_at_5 value: 40.5985 - type: precision_at_1 value: 33.07566666666667 - type: precision_at_10 value: 7.581916666666666 - type: precision_at_100 value: 1.1975 - type: precision_at_1000 value: 0.15699999999999997 - type: precision_at_3 value: 17.49075 - type: precision_at_5 value: 12.5135 - type: recall_at_1 value: 27.86616666666667 - type: recall_at_10 value: 55.449749999999995 - type: recall_at_100 value: 77.92516666666666 - type: recall_at_1000 value: 92.31358333333333 - type: recall_at_3 value: 41.324416666666664 - type: recall_at_5 value: 47.72533333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.648 - type: map_at_10 value: 33.155 - type: map_at_100 value: 34.149 - type: map_at_1000 value: 34.239000000000004 - type: map_at_3 value: 30.959999999999997 - type: map_at_5 value: 32.172 - type: mrr_at_1 value: 30.061 - type: mrr_at_10 value: 36.229 - type: mrr_at_100 value: 37.088 - type: mrr_at_1000 value: 37.15 - type: mrr_at_3 value: 34.254 - type: mrr_at_5 value: 35.297 - type: ndcg_at_1 value: 
30.061 - type: ndcg_at_10 value: 37.247 - type: ndcg_at_100 value: 42.093 - type: ndcg_at_1000 value: 44.45 - type: ndcg_at_3 value: 33.211 - type: ndcg_at_5 value: 35.083999999999996 - type: precision_at_1 value: 30.061 - type: precision_at_10 value: 5.7059999999999995 - type: precision_at_100 value: 0.8880000000000001 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 13.957 - type: precision_at_5 value: 9.663 - type: recall_at_1 value: 26.648 - type: recall_at_10 value: 46.85 - type: recall_at_100 value: 68.87 - type: recall_at_1000 value: 86.508 - type: recall_at_3 value: 35.756 - type: recall_at_5 value: 40.376 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.058 - type: map_at_10 value: 26.722 - type: map_at_100 value: 27.863 - type: map_at_1000 value: 27.988000000000003 - type: map_at_3 value: 24.258 - type: map_at_5 value: 25.531 - type: mrr_at_1 value: 23.09 - type: mrr_at_10 value: 30.711 - type: mrr_at_100 value: 31.628 - type: mrr_at_1000 value: 31.702 - type: mrr_at_3 value: 28.418 - type: mrr_at_5 value: 29.685 - type: ndcg_at_1 value: 23.09 - type: ndcg_at_10 value: 31.643 - type: ndcg_at_100 value: 37.047999999999995 - type: ndcg_at_1000 value: 39.896 - type: ndcg_at_3 value: 27.189999999999998 - type: ndcg_at_5 value: 29.112 - type: precision_at_1 value: 23.09 - type: precision_at_10 value: 5.743 - type: precision_at_100 value: 1 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 12.790000000000001 - type: precision_at_5 value: 9.195 - type: recall_at_1 value: 19.058 - type: recall_at_10 value: 42.527 - type: recall_at_100 value: 66.833 - type: recall_at_1000 value: 87.008 - type: recall_at_3 value: 29.876 - type: recall_at_5 value: 34.922 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.066999999999997 - type: map_at_10 value: 37.543 - type: map_at_100 value: 38.725 - type: map_at_1000 value: 38.815 - type: map_at_3 value: 34.488 - type: map_at_5 value: 36.222 - type: mrr_at_1 value: 33.116 - type: mrr_at_10 value: 41.743 - type: mrr_at_100 value: 42.628 - type: mrr_at_1000 value: 42.675999999999995 - type: mrr_at_3 value: 39.241 - type: mrr_at_5 value: 40.622 - type: ndcg_at_1 value: 33.116 - type: ndcg_at_10 value: 43.089 - type: ndcg_at_100 value: 48.61 - type: ndcg_at_1000 value: 50.585 - type: ndcg_at_3 value: 37.816 - type: ndcg_at_5 value: 40.256 - type: precision_at_1 value: 33.116 - type: precision_at_10 value: 7.313 - type: precision_at_100 value: 1.1320000000000001 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_3 value: 17.102 - type: precision_at_5 value: 12.09 - type: recall_at_1 value: 28.066999999999997 - type: recall_at_10 value: 55.684 - type: recall_at_100 value: 80.092 - type: recall_at_1000 value: 93.605 - type: recall_at_3 value: 41.277 - type: recall_at_5 value: 47.46 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.094 - type: map_at_10 value: 35.939 - type: map_at_100 value: 37.552 - type: map_at_1000 value: 37.771 - type: map_at_3 value: 32.414 - type: map_at_5 value: 34.505 - type: mrr_at_1 value: 32.609 - type: mrr_at_10 value: 40.521 - type: mrr_at_100 value: 41.479 - type: mrr_at_1000 value: 41.524 - 
type: mrr_at_3 value: 37.451 - type: mrr_at_5 value: 39.387 - type: ndcg_at_1 value: 32.609 - type: ndcg_at_10 value: 41.83 - type: ndcg_at_100 value: 47.763 - type: ndcg_at_1000 value: 50.102999999999994 - type: ndcg_at_3 value: 36.14 - type: ndcg_at_5 value: 39.153999999999996 - type: precision_at_1 value: 32.609 - type: precision_at_10 value: 7.925 - type: precision_at_100 value: 1.591 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 16.337 - type: precision_at_5 value: 12.411 - type: recall_at_1 value: 27.094 - type: recall_at_10 value: 53.32900000000001 - type: recall_at_100 value: 79.52 - type: recall_at_1000 value: 93.958 - type: recall_at_3 value: 37.773 - type: recall_at_5 value: 45.321 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.649 - type: map_at_10 value: 30.569000000000003 - type: map_at_100 value: 31.444 - type: map_at_1000 value: 31.538 - type: map_at_3 value: 27.638 - type: map_at_5 value: 29.171000000000003 - type: mrr_at_1 value: 24.399 - type: mrr_at_10 value: 32.555 - type: mrr_at_100 value: 33.312000000000005 - type: mrr_at_1000 value: 33.376 - type: mrr_at_3 value: 29.820999999999998 - type: mrr_at_5 value: 31.402 - type: ndcg_at_1 value: 24.399 - type: ndcg_at_10 value: 35.741 - type: ndcg_at_100 value: 40.439 - type: ndcg_at_1000 value: 42.809000000000005 - type: ndcg_at_3 value: 30.020999999999997 - type: ndcg_at_5 value: 32.68 - type: precision_at_1 value: 24.399 - type: precision_at_10 value: 5.749 - type: precision_at_100 value: 0.878 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 12.815999999999999 - type: precision_at_5 value: 9.242 - type: recall_at_1 value: 22.649 - type: recall_at_10 value: 49.818 - type: recall_at_100 value: 72.155 - type: recall_at_1000 value: 89.654 - type: recall_at_3 value: 34.528999999999996 - type: recall_at_5 value: 40.849999999999994 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.587 - type: map_at_10 value: 23.021 - type: map_at_100 value: 25.095 - type: map_at_1000 value: 25.295 - type: map_at_3 value: 19.463 - type: map_at_5 value: 21.389 - type: mrr_at_1 value: 29.576999999999998 - type: mrr_at_10 value: 41.44 - type: mrr_at_100 value: 42.497 - type: mrr_at_1000 value: 42.529 - type: mrr_at_3 value: 38.284 - type: mrr_at_5 value: 40.249 - type: ndcg_at_1 value: 29.576999999999998 - type: ndcg_at_10 value: 31.491000000000003 - type: ndcg_at_100 value: 39.352 - type: ndcg_at_1000 value: 42.703 - type: ndcg_at_3 value: 26.284999999999997 - type: ndcg_at_5 value: 28.218 - type: precision_at_1 value: 29.576999999999998 - type: precision_at_10 value: 9.713 - type: precision_at_100 value: 1.8079999999999998 - type: precision_at_1000 value: 0.243 - type: precision_at_3 value: 19.608999999999998 - type: precision_at_5 value: 14.957999999999998 - type: recall_at_1 value: 13.587 - type: recall_at_10 value: 37.001 - type: recall_at_100 value: 63.617999999999995 - type: recall_at_1000 value: 82.207 - type: recall_at_3 value: 24.273 - type: recall_at_5 value: 29.813000000000002 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.98 - type: map_at_10 value: 20.447000000000003 - type: map_at_100 value: 29.032999999999998 - type: map_at_1000 value: 30.8 - type: map_at_3 
value: 15.126999999999999 - type: map_at_5 value: 17.327 - type: mrr_at_1 value: 71.25 - type: mrr_at_10 value: 78.014 - type: mrr_at_100 value: 78.303 - type: mrr_at_1000 value: 78.309 - type: mrr_at_3 value: 76.375 - type: mrr_at_5 value: 77.58699999999999 - type: ndcg_at_1 value: 57.99999999999999 - type: ndcg_at_10 value: 41.705 - type: ndcg_at_100 value: 47.466 - type: ndcg_at_1000 value: 55.186 - type: ndcg_at_3 value: 47.089999999999996 - type: ndcg_at_5 value: 43.974000000000004 - type: precision_at_1 value: 71.25 - type: precision_at_10 value: 32.65 - type: precision_at_100 value: 10.89 - type: precision_at_1000 value: 2.197 - type: precision_at_3 value: 50.5 - type: precision_at_5 value: 42.199999999999996 - type: recall_at_1 value: 9.98 - type: recall_at_10 value: 25.144 - type: recall_at_100 value: 53.754999999999995 - type: recall_at_1000 value: 78.56400000000001 - type: recall_at_3 value: 15.964 - type: recall_at_5 value: 19.186 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 54.67999999999999 - type: f1 value: 49.48247525503583 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 74.798 - type: map_at_10 value: 82.933 - type: map_at_100 value: 83.157 - type: map_at_1000 value: 83.173 - type: map_at_3 value: 81.80199999999999 - type: map_at_5 value: 82.55 - type: mrr_at_1 value: 80.573 - type: mrr_at_10 value: 87.615 - type: mrr_at_100 value: 87.69 - type: mrr_at_1000 value: 87.69200000000001 - type: mrr_at_3 value: 86.86399999999999 - type: mrr_at_5 value: 87.386 - type: ndcg_at_1 value: 80.573 - type: ndcg_at_10 value: 86.64500000000001 - type: ndcg_at_100 value: 87.407 - type: ndcg_at_1000 value: 87.68299999999999 - type: ndcg_at_3 value: 84.879 - type: ndcg_at_5 value: 85.921 - type: precision_at_1 value: 80.573 - type: precision_at_10 value: 10.348 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 32.268 - type: precision_at_5 value: 20.084 - type: recall_at_1 value: 74.798 - type: recall_at_10 value: 93.45400000000001 - type: recall_at_100 value: 96.42500000000001 - type: recall_at_1000 value: 98.158 - type: recall_at_3 value: 88.634 - type: recall_at_5 value: 91.295 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.567 - type: map_at_10 value: 32.967999999999996 - type: map_at_100 value: 35.108 - type: map_at_1000 value: 35.272999999999996 - type: map_at_3 value: 28.701999999999998 - type: map_at_5 value: 31.114000000000004 - type: mrr_at_1 value: 40.432 - type: mrr_at_10 value: 48.956 - type: mrr_at_100 value: 49.832 - type: mrr_at_1000 value: 49.87 - type: mrr_at_3 value: 46.759 - type: mrr_at_5 value: 47.886 - type: ndcg_at_1 value: 40.432 - type: ndcg_at_10 value: 40.644000000000005 - type: ndcg_at_100 value: 48.252 - type: ndcg_at_1000 value: 51.099000000000004 - type: ndcg_at_3 value: 36.992000000000004 - type: ndcg_at_5 value: 38.077 - type: precision_at_1 value: 40.432 - type: precision_at_10 value: 11.296000000000001 - type: precision_at_100 value: 1.9009999999999998 - type: precision_at_1000 value: 0.241 - type: precision_at_3 value: 24.537 - type: precision_at_5 value: 17.963 - type: recall_at_1 value: 20.567 - type: recall_at_10 value: 47.052 - type: 
recall_at_100 value: 75.21600000000001 - type: recall_at_1000 value: 92.285 - type: recall_at_3 value: 33.488 - type: recall_at_5 value: 39.334 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 38.196999999999996 - type: map_at_10 value: 60.697 - type: map_at_100 value: 61.624 - type: map_at_1000 value: 61.692 - type: map_at_3 value: 57.421 - type: map_at_5 value: 59.455000000000005 - type: mrr_at_1 value: 76.39399999999999 - type: mrr_at_10 value: 82.504 - type: mrr_at_100 value: 82.71300000000001 - type: mrr_at_1000 value: 82.721 - type: mrr_at_3 value: 81.494 - type: mrr_at_5 value: 82.137 - type: ndcg_at_1 value: 76.39399999999999 - type: ndcg_at_10 value: 68.92200000000001 - type: ndcg_at_100 value: 72.13199999999999 - type: ndcg_at_1000 value: 73.392 - type: ndcg_at_3 value: 64.226 - type: ndcg_at_5 value: 66.815 - type: precision_at_1 value: 76.39399999999999 - type: precision_at_10 value: 14.442 - type: precision_at_100 value: 1.694 - type: precision_at_1000 value: 0.186 - type: precision_at_3 value: 41.211 - type: precision_at_5 value: 26.766000000000002 - type: recall_at_1 value: 38.196999999999996 - type: recall_at_10 value: 72.208 - type: recall_at_100 value: 84.71300000000001 - type: recall_at_1000 value: 92.971 - type: recall_at_3 value: 61.816 - type: recall_at_5 value: 66.914 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 89.6556 - type: ap value: 85.27600392682054 - type: f1 value: 89.63353655386406 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.482 - type: map_at_10 value: 33.701 - type: map_at_100 value: 34.861 - type: map_at_1000 value: 34.914 - type: map_at_3 value: 29.793999999999997 - type: map_at_5 value: 32.072 - type: mrr_at_1 value: 22.163 - type: mrr_at_10 value: 34.371 - type: mrr_at_100 value: 35.471000000000004 - type: mrr_at_1000 value: 35.518 - type: mrr_at_3 value: 30.554 - type: mrr_at_5 value: 32.799 - type: ndcg_at_1 value: 22.163 - type: ndcg_at_10 value: 40.643 - type: ndcg_at_100 value: 46.239999999999995 - type: ndcg_at_1000 value: 47.526 - type: ndcg_at_3 value: 32.714999999999996 - type: ndcg_at_5 value: 36.791000000000004 - type: precision_at_1 value: 22.163 - type: precision_at_10 value: 6.4799999999999995 - type: precision_at_100 value: 0.928 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.002 - type: precision_at_5 value: 10.453 - type: recall_at_1 value: 21.482 - type: recall_at_10 value: 61.953 - type: recall_at_100 value: 87.86500000000001 - type: recall_at_1000 value: 97.636 - type: recall_at_3 value: 40.441 - type: recall_at_5 value: 50.27 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 95.3032375740994 - type: f1 value: 95.01515022686607 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 78.10077519379846 - type: f1 value: 58.240739725625644 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en 
split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.0053799596503 - type: f1 value: 74.11733965804146 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.64021519838602 - type: f1 value: 79.8513960091438 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.92425767945184 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.249612382060754 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.35584955492918 - type: mrr value: 33.545865224584674 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.978 - type: map_at_10 value: 14.749 - type: map_at_100 value: 19.192 - type: map_at_1000 value: 20.815 - type: map_at_3 value: 10.927000000000001 - type: map_at_5 value: 12.726 - type: mrr_at_1 value: 49.536 - type: mrr_at_10 value: 57.806999999999995 - type: mrr_at_100 value: 58.373 - type: mrr_at_1000 value: 58.407 - type: mrr_at_3 value: 55.779 - type: mrr_at_5 value: 57.095 - type: ndcg_at_1 value: 46.749 - type: ndcg_at_10 value: 37.644 - type: ndcg_at_100 value: 35.559000000000005 - type: ndcg_at_1000 value: 44.375 - type: ndcg_at_3 value: 43.354 - type: ndcg_at_5 value: 41.022999999999996 - type: precision_at_1 value: 48.607 - type: precision_at_10 value: 28.08 - type: precision_at_100 value: 9.155000000000001 - type: precision_at_1000 value: 2.2270000000000003 - type: precision_at_3 value: 40.764 - type: precision_at_5 value: 35.728 - type: recall_at_1 value: 6.978 - type: recall_at_10 value: 17.828 - type: recall_at_100 value: 36.010999999999996 - type: recall_at_1000 value: 68.34700000000001 - type: recall_at_3 value: 11.645999999999999 - type: recall_at_5 value: 14.427000000000001 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 30.219 - type: map_at_10 value: 45.633 - type: map_at_100 value: 46.752 - type: map_at_1000 value: 46.778999999999996 - type: map_at_3 value: 41.392 - type: map_at_5 value: 43.778 - type: mrr_at_1 value: 34.327999999999996 - type: mrr_at_10 value: 48.256 - type: mrr_at_100 value: 49.076 - type: mrr_at_1000 value: 49.092999999999996 - type: mrr_at_3 value: 44.786 - type: mrr_at_5 value: 46.766000000000005 - type: ndcg_at_1 value: 34.299 - type: ndcg_at_10 value: 53.434000000000005 - type: ndcg_at_100 value: 58.03 - type: ndcg_at_1000 value: 58.633 - type: ndcg_at_3 value: 45.433 - type: ndcg_at_5 value: 49.379 - type: precision_at_1 value: 34.299 - type: precision_at_10 value: 8.911 - type: precision_at_100 value: 1.145 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.896 - type: precision_at_5 value: 14.832 - type: recall_at_1 value: 30.219 - type: recall_at_10 value: 74.59400000000001 - type: recall_at_100 value: 94.392 - type: recall_at_1000 value: 98.832 - type: recall_at_3 value: 
53.754000000000005 - type: recall_at_5 value: 62.833000000000006 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.139 - type: map_at_10 value: 85.141 - type: map_at_100 value: 85.78099999999999 - type: map_at_1000 value: 85.795 - type: map_at_3 value: 82.139 - type: map_at_5 value: 84.075 - type: mrr_at_1 value: 81.98 - type: mrr_at_10 value: 88.056 - type: mrr_at_100 value: 88.152 - type: mrr_at_1000 value: 88.152 - type: mrr_at_3 value: 87.117 - type: mrr_at_5 value: 87.78099999999999 - type: ndcg_at_1 value: 82.02000000000001 - type: ndcg_at_10 value: 88.807 - type: ndcg_at_100 value: 89.99000000000001 - type: ndcg_at_1000 value: 90.068 - type: ndcg_at_3 value: 85.989 - type: ndcg_at_5 value: 87.627 - type: precision_at_1 value: 82.02000000000001 - type: precision_at_10 value: 13.472999999999999 - type: precision_at_100 value: 1.534 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.553 - type: precision_at_5 value: 24.788 - type: recall_at_1 value: 71.139 - type: recall_at_10 value: 95.707 - type: recall_at_100 value: 99.666 - type: recall_at_1000 value: 99.983 - type: recall_at_3 value: 87.64699999999999 - type: recall_at_5 value: 92.221 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 59.11035509193503 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.44241881422526 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.122999999999999 - type: map_at_10 value: 14.45 - type: map_at_100 value: 17.108999999999998 - type: map_at_1000 value: 17.517 - type: map_at_3 value: 10.213999999999999 - type: map_at_5 value: 12.278 - type: mrr_at_1 value: 25.3 - type: mrr_at_10 value: 37.791999999999994 - type: mrr_at_100 value: 39.086 - type: mrr_at_1000 value: 39.121 - type: mrr_at_3 value: 34.666999999999994 - type: mrr_at_5 value: 36.472 - type: ndcg_at_1 value: 25.3 - type: ndcg_at_10 value: 23.469 - type: ndcg_at_100 value: 33.324 - type: ndcg_at_1000 value: 39.357 - type: ndcg_at_3 value: 22.478 - type: ndcg_at_5 value: 19.539 - type: precision_at_1 value: 25.3 - type: precision_at_10 value: 12.3 - type: precision_at_100 value: 2.654 - type: precision_at_1000 value: 0.40800000000000003 - type: precision_at_3 value: 21.667 - type: precision_at_5 value: 17.5 - type: recall_at_1 value: 5.122999999999999 - type: recall_at_10 value: 24.937 - type: recall_at_100 value: 53.833 - type: recall_at_1000 value: 82.85 - type: recall_at_3 value: 13.178 - type: recall_at_5 value: 17.747 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 86.76549431206278 - type: cos_sim_spearman value: 81.28563534883214 - type: euclidean_pearson value: 84.17180713818567 - type: euclidean_spearman value: 81.1684082302606 - type: manhattan_pearson value: 84.12189753972959 - type: manhattan_spearman value: 81.1134998997958 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - 
type: cos_sim_pearson value: 85.75137587182017 - type: cos_sim_spearman value: 76.155337187325 - type: euclidean_pearson value: 83.54551546726665 - type: euclidean_spearman value: 76.30324990565346 - type: manhattan_pearson value: 83.52192617483797 - type: manhattan_spearman value: 76.30017227216015 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 87.13890050398628 - type: cos_sim_spearman value: 87.84898360302155 - type: euclidean_pearson value: 86.89491809082031 - type: euclidean_spearman value: 87.99935689905651 - type: manhattan_pearson value: 86.86526424376366 - type: manhattan_spearman value: 87.96850732980495 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 86.01978753231558 - type: cos_sim_spearman value: 83.38989083933329 - type: euclidean_pearson value: 85.28405032045376 - type: euclidean_spearman value: 83.51703914276501 - type: manhattan_pearson value: 85.25775133078966 - type: manhattan_spearman value: 83.52815667821727 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.28482294437876 - type: cos_sim_spearman value: 89.42976214499576 - type: euclidean_pearson value: 88.72677957272468 - type: euclidean_spearman value: 89.30001736116229 - type: manhattan_pearson value: 88.64119331622562 - type: manhattan_spearman value: 89.21771022634893 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.79810159351987 - type: cos_sim_spearman value: 85.34918402034273 - type: euclidean_pearson value: 84.76058606229002 - type: euclidean_spearman value: 85.45159829941214 - type: manhattan_pearson value: 84.73926491888156 - type: manhattan_spearman value: 85.42568221985898 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.92796712570272 - type: cos_sim_spearman value: 88.58925922945812 - type: euclidean_pearson value: 88.97231215531797 - type: euclidean_spearman value: 88.27036385068719 - type: manhattan_pearson value: 88.95761469412228 - type: manhattan_spearman value: 88.23980432487681 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 66.85679810182282 - type: cos_sim_spearman value: 67.80696709003128 - type: euclidean_pearson value: 68.77524185947989 - type: euclidean_spearman value: 68.032438075422 - type: manhattan_pearson value: 68.60489100404182 - type: manhattan_spearman value: 67.75418889226138 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.33287880999367 - type: cos_sim_spearman value: 87.32401087204754 - type: euclidean_pearson value: 87.27961069148029 - type: euclidean_spearman value: 87.3547683085868 - type: manhattan_pearson value: 87.24405442789622 - type: manhattan_spearman value: 87.32896271166672 - 
task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.71553665286558 - type: mrr value: 96.42436176749902 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 61.094 - type: map_at_10 value: 71.066 - type: map_at_100 value: 71.608 - type: map_at_1000 value: 71.629 - type: map_at_3 value: 68.356 - type: map_at_5 value: 70.15 - type: mrr_at_1 value: 64 - type: mrr_at_10 value: 71.82300000000001 - type: mrr_at_100 value: 72.251 - type: mrr_at_1000 value: 72.269 - type: mrr_at_3 value: 69.833 - type: mrr_at_5 value: 71.11699999999999 - type: ndcg_at_1 value: 64 - type: ndcg_at_10 value: 75.286 - type: ndcg_at_100 value: 77.40700000000001 - type: ndcg_at_1000 value: 77.806 - type: ndcg_at_3 value: 70.903 - type: ndcg_at_5 value: 73.36399999999999 - type: precision_at_1 value: 64 - type: precision_at_10 value: 9.9 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 27.667 - type: precision_at_5 value: 18.333 - type: recall_at_1 value: 61.094 - type: recall_at_10 value: 87.256 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 75.6 - type: recall_at_5 value: 81.789 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.82871287128712 - type: cos_sim_ap value: 95.9325677692287 - type: cos_sim_f1 value: 91.13924050632912 - type: cos_sim_precision value: 92.3076923076923 - type: cos_sim_recall value: 90 - type: dot_accuracy value: 99.7980198019802 - type: dot_ap value: 94.56107207796 - type: dot_f1 value: 89.41908713692946 - type: dot_precision value: 92.88793103448276 - type: dot_recall value: 86.2 - type: euclidean_accuracy value: 99.82871287128712 - type: euclidean_ap value: 95.94390332507025 - type: euclidean_f1 value: 91.17797042325346 - type: euclidean_precision value: 93.02809573361083 - type: euclidean_recall value: 89.4 - type: manhattan_accuracy value: 99.82871287128712 - type: manhattan_ap value: 95.97587114452257 - type: manhattan_f1 value: 91.25821121778675 - type: manhattan_precision value: 92.23697650663942 - type: manhattan_recall value: 90.3 - type: max_accuracy value: 99.82871287128712 - type: max_ap value: 95.97587114452257 - type: max_f1 value: 91.25821121778675 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 66.13974351708839 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.594544722932234 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.718738983377726 - type: mrr value: 55.61655154486037 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: 
cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.37028359646597 - type: cos_sim_spearman value: 30.866534307244443 - type: dot_pearson value: 29.89037691541816 - type: dot_spearman value: 29.941267567971718 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.20400000000000001 - type: map_at_10 value: 1.7340000000000002 - type: map_at_100 value: 9.966 - type: map_at_1000 value: 25.119000000000003 - type: map_at_3 value: 0.596 - type: map_at_5 value: 0.941 - type: mrr_at_1 value: 76 - type: mrr_at_10 value: 85.85199999999999 - type: mrr_at_100 value: 85.85199999999999 - type: mrr_at_1000 value: 85.85199999999999 - type: mrr_at_3 value: 84.667 - type: mrr_at_5 value: 85.56700000000001 - type: ndcg_at_1 value: 71 - type: ndcg_at_10 value: 69.60300000000001 - type: ndcg_at_100 value: 54.166000000000004 - type: ndcg_at_1000 value: 51.085 - type: ndcg_at_3 value: 71.95 - type: ndcg_at_5 value: 71.17599999999999 - type: precision_at_1 value: 76 - type: precision_at_10 value: 74.2 - type: precision_at_100 value: 55.96 - type: precision_at_1000 value: 22.584 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 75.6 - type: recall_at_1 value: 0.20400000000000001 - type: recall_at_10 value: 1.992 - type: recall_at_100 value: 13.706999999999999 - type: recall_at_1000 value: 48.732 - type: recall_at_3 value: 0.635 - type: recall_at_5 value: 1.034 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8 - type: f1 value: 6.298401229470593 - type: precision value: 5.916991709050532 - type: recall value: 8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 17.341040462427745 - type: f1 value: 14.621650026274303 - type: precision value: 13.9250609139035 - type: recall value: 17.341040462427745 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.536585365853659 - type: f1 value: 6.30972482801751 - type: precision value: 5.796517326875398 - type: recall value: 8.536585365853659 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.4 - type: f1 value: 4.221126743626743 - type: precision value: 3.822815143403898 - type: recall value: 6.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 19.8 - type: f1 value: 18.13768093781855 - type: precision value: 17.54646004378763 - type: recall value: 19.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 13.700000000000001 - type: f1 value: 12.367662337662336 - type: precision value: 11.934237966189185 - type: recall value: 13.700000000000001 - task: type: BitextMining dataset: type: 
mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 14.299999999999999 - type: f1 value: 10.942180289268338 - type: precision value: 10.153968847262192 - type: recall value: 14.299999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 22.388059701492537 - type: f1 value: 17.00157733660433 - type: precision value: 15.650551589876702 - type: recall value: 22.388059701492537 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 22 - type: f1 value: 17.4576947358322 - type: precision value: 16.261363669827777 - type: recall value: 22 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.292682926829269 - type: f1 value: 5.544048456005624 - type: precision value: 5.009506603002538 - type: recall value: 8.292682926829269 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 5.4 - type: f1 value: 4.148897174789229 - type: precision value: 3.862217259449564 - type: recall value: 5.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 5.5893074119076545 - type: f1 value: 4.375041810373159 - type: precision value: 4.181207113088141 - type: recall value: 5.5893074119076545 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.17391304347826 - type: f1 value: 6.448011891490153 - type: precision value: 5.9719116632160105 - type: recall value: 8.17391304347826 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.8695652173913043 - type: f1 value: 0.582815734989648 - type: precision value: 0.5580885233059146 - type: recall value: 0.8695652173913043 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 5.1 - type: f1 value: 3.5000615825615826 - type: precision value: 3.2073523577994707 - type: recall value: 5.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.3 - type: f1 value: 0.10109884927372195 - type: precision value: 0.10055127118392897 - type: recall value: 0.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 
3.8600723763570564 - type: f1 value: 2.8177402725050493 - type: precision value: 2.5662687819699213 - type: recall value: 3.8600723763570564 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0 - type: f1 value: 0 - type: precision value: 0 - type: recall value: 0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 15.299999999999999 - type: f1 value: 11.377964359824292 - type: precision value: 10.361140908892764 - type: recall value: 15.299999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1.3 - type: f1 value: 0.9600820232399179 - type: precision value: 0.9151648856810397 - type: recall value: 1.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 14.095238095238095 - type: f1 value: 11.40081541819044 - type: precision value: 10.645867976820359 - type: recall value: 14.095238095238095 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4 - type: f1 value: 2.3800704501963432 - type: precision value: 2.0919368034607455 - type: recall value: 4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.3 - type: f1 value: 0.2002053388090349 - type: precision value: 0.2001027749229188 - type: recall value: 0.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 11.700000000000001 - type: f1 value: 10.29755634495992 - type: precision value: 9.876637220292393 - type: recall value: 11.700000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1.7000000000000002 - type: f1 value: 0.985815849620051 - type: precision value: 0.8884689922480621 - type: recall value: 1.7000000000000002 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 17.599999999999998 - type: f1 value: 14.086312656126182 - type: precision value: 13.192360560816125 - type: recall value: 17.599999999999998 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.1 - type: f1 value: 4.683795729173087 - type: precision value: 4.31687579027912 - type: recall value: 6.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) 
config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.4 - type: f1 value: 0.20966666666666667 - type: precision value: 0.20500700280112047 - type: recall value: 0.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.6 - type: f1 value: 0.2454665118079752 - type: precision value: 0.2255125167991618 - type: recall value: 0.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 21 - type: f1 value: 18.965901242066018 - type: precision value: 18.381437375171 - type: recall value: 21 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.5390835579514826 - type: f1 value: 0.4048898457205192 - type: precision value: 0.4046018763809678 - type: recall value: 0.5390835579514826 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1.282051282051282 - type: f1 value: 0.5098554872310529 - type: precision value: 0.4715099715099715 - type: recall value: 1.282051282051282 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 10.7 - type: f1 value: 8.045120643200706 - type: precision value: 7.387598023074453 - type: recall value: 10.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 2.272727272727273 - type: f1 value: 1.44184724004356 - type: precision value: 1.4082306862044767 - type: recall value: 2.272727272727273 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.20964360587002098 - type: f1 value: 0.001335309591528796 - type: precision value: 0.0006697878781789807 - type: recall value: 0.20964360587002098 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 7.1 - type: f1 value: 5.522254020507502 - type: precision value: 5.081849426723903 - type: recall value: 7.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 36.57587548638132 - type: f1 value: 30.325515383881147 - type: precision value: 28.59255854392041 - type: recall value: 36.57587548638132 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 16.23931623931624 - type: f1 value: 13.548783761549718 - type: precision value: 
13.0472896359184 - type: recall value: 16.23931623931624 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 16.3 - type: f1 value: 13.3418584934734 - type: precision value: 12.506853047473756 - type: recall value: 16.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1 - type: f1 value: 0.7764001197963462 - type: precision value: 0.7551049317943337 - type: recall value: 1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3.9719626168224296 - type: f1 value: 3.190729401654313 - type: precision value: 3.001159168296747 - type: recall value: 3.9719626168224296 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3.4000000000000004 - type: f1 value: 2.4847456001574653 - type: precision value: 2.308739271803959 - type: recall value: 3.4000000000000004 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 36.9 - type: f1 value: 31.390407955063697 - type: precision value: 29.631294298308614 - type: recall value: 36.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 14.2 - type: f1 value: 12.551591810861895 - type: precision value: 12.100586917562724 - type: recall value: 14.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 9.2 - type: f1 value: 7.5561895648211435 - type: precision value: 7.177371101110253 - type: recall value: 9.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 21.2 - type: f1 value: 18.498268429117875 - type: precision value: 17.693915156965357 - type: recall value: 21.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4.2 - type: f1 value: 2.886572782530936 - type: precision value: 2.5806792595351915 - type: recall value: 4.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.800000000000001 - type: f1 value: 4.881091920308238 - type: precision value: 4.436731163345769 - type: recall value: 6.800000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy 
value: 22.1 - type: f1 value: 18.493832677140738 - type: precision value: 17.52055858924503 - type: recall value: 22.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6 - type: f1 value: 4.58716840215435 - type: precision value: 4.303119297298687 - type: recall value: 6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 5.5 - type: f1 value: 3.813678559437776 - type: precision value: 3.52375763382276 - type: recall value: 5.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.2 - type: f1 value: 0.06701509872241579 - type: precision value: 0.05017452006980803 - type: recall value: 0.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 12.5 - type: f1 value: 9.325396825396826 - type: precision value: 8.681972789115646 - type: recall value: 12.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.43907793633369924 - type: f1 value: 0.26369680618309754 - type: precision value: 0.24710650393580552 - type: recall value: 0.43907793633369924 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1.7000000000000002 - type: f1 value: 1.0240727731562105 - type: precision value: 0.9379457073996874 - type: recall value: 1.7000000000000002 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 24.6 - type: f1 value: 21.527732683982684 - type: precision value: 20.460911398969852 - type: recall value: 24.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 23.400000000000002 - type: f1 value: 18.861948871033608 - type: precision value: 17.469730524988158 - type: recall value: 23.400000000000002 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1.3 - type: f1 value: 0.8081609699284277 - type: precision value: 0.8041232161030668 - type: recall value: 1.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 14.399999999999999 - type: f1 value: 11.982642360594898 - type: precision value: 11.423911681034546 - type: recall value: 14.399999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: 
pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.7 - type: f1 value: 6.565099922088448 - type: precision value: 6.009960806394631 - type: recall value: 8.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 7.1 - type: f1 value: 5.483244116053285 - type: precision value: 5.08036675810842 - type: recall value: 7.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4.3999999999999995 - type: f1 value: 3.2643948695904146 - type: precision value: 3.031506651474311 - type: recall value: 4.3999999999999995 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 7.1 - type: f1 value: 5.2787766765398345 - type: precision value: 4.883891459552525 - type: recall value: 7.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.5 - type: f1 value: 7.022436974789914 - type: precision value: 6.517919923571304 - type: recall value: 8.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 17.51824817518248 - type: f1 value: 14.159211038143834 - type: precision value: 13.419131771033424 - type: recall value: 17.51824817518248 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.3 - type: f1 value: 0.1008802791411487 - type: precision value: 0.10044111373948113 - type: recall value: 0.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 11.3 - type: f1 value: 10.0642631078894 - type: precision value: 9.714481189937882 - type: recall value: 11.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.7000000000000001 - type: f1 value: 0.5023625310859353 - type: precision value: 0.5011883541295307 - type: recall value: 0.7000000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1.7857142857142856 - type: f1 value: 0.6731500547238763 - type: precision value: 0.6364087301587301 - type: recall value: 1.7857142857142856 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 7.000000000000001 - type: f1 value: 4.850226809905071 - type: precision value: 4.3549672188068485 - type: recall value: 
7.000000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 5.383022774327122 - type: f1 value: 4.080351427081423 - type: precision value: 3.7431771127423294 - type: recall value: 5.383022774327122 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3.9 - type: f1 value: 2.975065835065835 - type: precision value: 2.7082951373488764 - type: recall value: 3.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 13.8 - type: f1 value: 10.976459812917616 - type: precision value: 10.214566903851944 - type: recall value: 13.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4.9 - type: f1 value: 3.5998112099809334 - type: precision value: 3.391430386128988 - type: recall value: 4.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 2.1645021645021645 - type: f1 value: 0.28969205674033943 - type: precision value: 0.1648931376979724 - type: recall value: 2.1645021645021645 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 9.541984732824428 - type: f1 value: 8.129327179123026 - type: precision value: 7.860730567672363 - type: recall value: 9.541984732824428 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.5822416302765648 - type: f1 value: 0.3960292169899156 - type: precision value: 0.36794436357755134 - type: recall value: 0.5822416302765648 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 25.900000000000002 - type: f1 value: 20.98162273769728 - type: precision value: 19.591031936732236 - type: recall value: 25.900000000000002 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 9.322033898305085 - type: f1 value: 7.1764632211739166 - type: precision value: 6.547619047619047 - type: recall value: 9.322033898305085 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4.3999999999999995 - type: f1 value: 3.0484795026022216 - type: precision value: 2.8132647991077686 - type: recall value: 4.3999999999999995 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test 
revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 18.8 - type: f1 value: 15.52276497119774 - type: precision value: 14.63296284434154 - type: recall value: 18.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 10 - type: f1 value: 7.351901305737391 - type: precision value: 6.759061952118555 - type: recall value: 10 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3.1 - type: f1 value: 2.1527437641723353 - type: precision value: 2.0008336640383417 - type: recall value: 3.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 10.6 - type: f1 value: 8.471815215313617 - type: precision value: 7.942319409218233 - type: recall value: 10.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4.3 - type: f1 value: 2.7338036427188244 - type: precision value: 2.5492261384839052 - type: recall value: 4.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.40214477211796246 - type: f1 value: 0.28150134048257375 - type: precision value: 0.2751516861859743 - type: recall value: 0.40214477211796246 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3 - type: f1 value: 1.5834901411814404 - type: precision value: 1.3894010894944848 - type: recall value: 3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 7.905138339920949 - type: f1 value: 6.6397047981096735 - type: precision value: 6.32664437012263 - type: recall value: 7.905138339920949 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3.5211267605633805 - type: f1 value: 2.173419196807775 - type: precision value: 2.14388897487489 - type: recall value: 3.5211267605633805 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.23952095808383234 - type: f1 value: 0.001262128032547595 - type: precision value: 0.0006327654461278806 - type: recall value: 0.23952095808383234 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 10.4 - type: f1 value: 8.370422351826372 - type: precision value: 7.943809523809523 - type: recall value: 10.4 - task: type: BitextMining 
dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 5.41871921182266 - type: f1 value: 3.4763895108722696 - type: precision value: 3.1331846246882176 - type: recall value: 5.41871921182266 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 9.15492957746479 - type: f1 value: 7.267458920187794 - type: precision value: 6.893803787858966 - type: recall value: 9.15492957746479 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 9.487179487179487 - type: f1 value: 6.902767160316073 - type: precision value: 6.450346503818517 - type: recall value: 9.487179487179487 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.1 - type: f1 value: 0.0002042900919305414 - type: precision value: 0.00010224948875255625 - type: recall value: 0.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 5.010438413361169 - type: f1 value: 3.8116647214505277 - type: precision value: 3.5454644309619634 - type: recall value: 5.010438413361169 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.2 - type: f1 value: 5.213158915433869 - type: precision value: 5.080398110661268 - type: recall value: 6.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.9771986970684038 - type: f1 value: 0.5061388123277374 - type: precision value: 0.43431053203040165 - type: recall value: 0.9771986970684038 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 7.3 - type: f1 value: 5.6313180921027755 - type: precision value: 5.303887400540395 - type: recall value: 7.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3.5999999999999996 - type: f1 value: 3.2180089485458607 - type: precision value: 3.1006756756756753 - type: recall value: 3.5999999999999996 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 22.04724409448819 - type: f1 value: 17.92525934258218 - type: precision value: 16.48251629836593 - type: recall value: 22.04724409448819 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 
metrics: - type: accuracy value: 0.5 - type: f1 value: 0.1543743186232414 - type: precision value: 0.13554933572174951 - type: recall value: 0.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.8310249307479225 - type: f1 value: 0.5102255597841558 - type: precision value: 0.4859595744731704 - type: recall value: 0.8310249307479225 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.9 - type: f1 value: 4.7258390633390635 - type: precision value: 4.288366570275279 - type: recall value: 6.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 17.307692307692307 - type: f1 value: 14.763313609467454 - type: precision value: 14.129273504273504 - type: recall value: 17.307692307692307 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.3 - type: f1 value: 0.0022196828248667185 - type: precision value: 0.0011148527298850575 - type: recall value: 0.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.3 - type: f1 value: 0.3 - type: precision value: 0.3 - type: recall value: 0.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.6 - type: f1 value: 0.500206611570248 - type: precision value: 0.5001034126163392 - type: recall value: 0.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 0.4716981132075472 - type: f1 value: 0.2953377695417789 - type: precision value: 0.2754210459668228 - type: recall value: 0.4716981132075472 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4.3999999999999995 - type: f1 value: 3.6228414442700156 - type: precision value: 3.4318238993710692 - type: recall value: 4.3999999999999995 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 1.2773722627737227 - type: f1 value: 1.0043318098096732 - type: precision value: 0.9735777358593729 - type: recall value: 1.2773722627737227 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 3.9 - type: f1 value: 2.6164533097276226 - type: precision value: 2.3558186153594085 - type: recall value: 3.9 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB 
Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 1.5779999999999998 - type: map_at_10 value: 8.339 - type: map_at_100 value: 14.601 - type: map_at_1000 value: 16.104 - type: map_at_3 value: 4.06 - type: map_at_5 value: 6.049 - type: mrr_at_1 value: 18.367 - type: mrr_at_10 value: 35.178 - type: mrr_at_100 value: 36.464999999999996 - type: mrr_at_1000 value: 36.464999999999996 - type: mrr_at_3 value: 29.932 - type: mrr_at_5 value: 34.32 - type: ndcg_at_1 value: 16.326999999999998 - type: ndcg_at_10 value: 20.578 - type: ndcg_at_100 value: 34.285 - type: ndcg_at_1000 value: 45.853 - type: ndcg_at_3 value: 19.869999999999997 - type: ndcg_at_5 value: 22.081999999999997 - type: precision_at_1 value: 18.367 - type: precision_at_10 value: 19.796 - type: precision_at_100 value: 7.714 - type: precision_at_1000 value: 1.547 - type: precision_at_3 value: 23.128999999999998 - type: precision_at_5 value: 24.898 - type: recall_at_1 value: 1.5779999999999998 - type: recall_at_10 value: 14.801 - type: recall_at_100 value: 48.516999999999996 - type: recall_at_1000 value: 83.30300000000001 - type: recall_at_3 value: 5.267 - type: recall_at_5 value: 9.415999999999999 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 72.4186 - type: ap value: 14.536282543597242 - type: f1 value: 55.47661372005608 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.318053197509904 - type: f1 value: 59.68272481532353 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 52.155753554312 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.99409906419503 - type: cos_sim_ap value: 76.91824322304332 - type: cos_sim_f1 value: 70.97865694950546 - type: cos_sim_precision value: 70.03081664098613 - type: cos_sim_recall value: 71.95250659630607 - type: dot_accuracy value: 85.37879239434942 - type: dot_ap value: 71.86454698478344 - type: dot_f1 value: 66.48115355426259 - type: dot_precision value: 63.84839650145773 - type: dot_recall value: 69.34036939313984 - type: euclidean_accuracy value: 87.00005960541218 - type: euclidean_ap value: 76.9165913835565 - type: euclidean_f1 value: 71.23741557283039 - type: euclidean_precision value: 68.89327088982007 - type: euclidean_recall value: 73.7467018469657 - type: manhattan_accuracy value: 87.06562555880075 - type: manhattan_ap value: 76.85445703747546 - type: manhattan_f1 value: 70.95560571858539 - type: manhattan_precision value: 67.61472275334609 - type: manhattan_recall value: 74.64379947229551 - type: max_accuracy value: 87.06562555880075 - type: max_ap value: 76.91824322304332 - type: max_f1 value: 71.23741557283039 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: 
cos_sim_accuracy value: 88.93934101758063 - type: cos_sim_ap value: 86.1071528049007 - type: cos_sim_f1 value: 78.21588263552714 - type: cos_sim_precision value: 75.20073900376609 - type: cos_sim_recall value: 81.48290729904527 - type: dot_accuracy value: 88.2504754142896 - type: dot_ap value: 84.19709379723844 - type: dot_f1 value: 76.92307692307693 - type: dot_precision value: 71.81969949916528 - type: dot_recall value: 82.80720665229443 - type: euclidean_accuracy value: 88.97232894787906 - type: euclidean_ap value: 86.02763993294909 - type: euclidean_f1 value: 78.18372741427383 - type: euclidean_precision value: 73.79861918107868 - type: euclidean_recall value: 83.12288266091777 - type: manhattan_accuracy value: 88.86948422400745 - type: manhattan_ap value: 86.0009157821563 - type: manhattan_f1 value: 78.10668017659404 - type: manhattan_precision value: 73.68564795848695 - type: manhattan_recall value: 83.09208500153989 - type: max_accuracy value: 88.97232894787906 - type: max_ap value: 86.1071528049007 - type: max_f1 value: 78.21588263552714 ---

GIST Embedding v0

*GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning*

The model is fine-tuned on top of BAAI/bge-base-en-v1.5 using the MEDI dataset, augmented with mined triplets from the MTEB Classification training datasets (excluding data from the Amazon Polarity Classification task). The model does not require any instruction for generating embeddings, so queries for retrieval tasks can be encoded directly, without crafting instructions.

Technical paper: GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning

# Data

The dataset used is a compilation of the MEDI and MTEB Classification training datasets. Third-party datasets may be subject to additional terms and conditions under their associated licenses. A HuggingFace Dataset version of the compiled dataset, and the specific revision used to train the model, is available:

- Dataset: avsolatorio/medi-data-mteb_avs_triplets
- Revision: 238a0499b6e6b690cc64ea56fde8461daa8341bb

The dataset contains a key that can be used to select only the MTEB classification tasks (prefixed with ). The **MEDI Dataset** is published in the paper One Embedder, Any Task: Instruction-Finetuned Text Embeddings.

The MTEB benchmark results of the GIST embedding model, compared with the base model, suggest that the fine-tuning dataset perturbed the model considerably, yielding significant improvements on certain tasks while degrading performance on others. The retrieval performance on the TRECCOVID task is of note: the fine-tuning dataset contains little knowledge about COVID-19, which may explain the observed degradation. We found some evidence, detailed in the paper, that the thematic coverage of the fine-tuning data can affect downstream performance.

# Usage

The model can be loaded using the Sentence Transformers library; a minimal loading sketch follows this card.

# Training Parameters

Below are the training parameters used to fine-tune the model:

# Evaluation

The model was evaluated using the MTEB Evaluation suite.

# Citation

Please cite our work if you use GISTEmbed or the datasets we published in your projects or research. 🤗

# Acknowledgements

This work is supported by the \"KCP IV - Exploring Data Use in the Development Economics Literature using Large Language Models (AI and LLMs)\" project funded by the Knowledge for Change Program (KCP) of the World Bank - RA-P503405-RESE-TF0C3444. The findings, interpretations, and conclusions expressed in this material are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity measurement, classification, retrieval, clustering, and reranking across various datasets."
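Below is a minimal usage sketch for the Usage section above. It assumes the `sentence-transformers` package is installed and uses the model id from this card; the example sentences and the similarity step are illustrative additions, not taken from the original README.

```python
# Minimal usage sketch, assuming sentence-transformers is installed.
from sentence_transformers import SentenceTransformer, util

# Load the model from the Hugging Face Hub; no instruction prefix is required.
model = SentenceTransformer("avsolatorio/GIST-Embedding-v0")

# Queries can be encoded directly, without crafting instructions.
texts = [
    "What is the capital of France?",                     # illustrative query
    "Paris is the capital and largest city of France.",  # illustrative passage
]
embeddings = model.encode(texts, convert_to_tensor=True)

# Cosine similarity between the query and the passage embeddings.
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))
```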
+} \ No newline at end of file diff --git a/data/model_data_json/avsolatorio_GIST-large-Embedding-v0.json b/data/model_data_json/avsolatorio_GIST-large-Embedding-v0.json new file mode 100644 index 0000000000000000000000000000000000000000..e094cb2793d69d35c68c94ee0ff5ea99328e3d07 --- /dev/null +++ b/data/model_data_json/avsolatorio_GIST-large-Embedding-v0.json @@ -0,0 +1,23 @@ +{ + "model_id": "avsolatorio/GIST-large-Embedding-v0", + "downloads": 117437, + "tags": [ + "sentence-transformers", + "safetensors", + "bert", + "feature-extraction", + "mteb", + "sentence-similarity", + "en", + "arxiv:2402.16829", + "arxiv:2212.09741", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers license: mit pipeline_tag: sentence-similarity tags: - feature-extraction - mteb - sentence-similarity - sentence-transformers model-index: - name: GIST-large-Embedding-v0 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.5820895522388 - type: ap value: 38.32190121241783 - type: f1 value: 69.44777155231054 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.40514999999998 - type: ap value: 90.2011565132406 - type: f1 value: 93.39486246843605 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 49.05999999999999 - type: f1 value: 48.58702718571088 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 38.407000000000004 - type: map_at_10 value: 54.822 - type: map_at_100 value: 55.387 - type: map_at_1000 value: 55.388999999999996 - type: map_at_3 value: 50.308 - type: map_at_5 value: 53.199 - type: mrr_at_1 value: 39.900000000000006 - type: mrr_at_10 value: 55.385 - type: mrr_at_100 value: 55.936 - type: mrr_at_1000 value: 55.93900000000001 - type: mrr_at_3 value: 50.853 - type: mrr_at_5 value: 53.738 - type: ndcg_at_1 value: 38.407000000000004 - type: ndcg_at_10 value: 63.38 - type: ndcg_at_100 value: 65.52900000000001 - type: ndcg_at_1000 value: 65.58800000000001 - type: ndcg_at_3 value: 54.26 - type: ndcg_at_5 value: 59.488 - type: precision_at_1 value: 38.407000000000004 - type: precision_at_10 value: 9.04 - type: precision_at_100 value: 0.992 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.906 - type: precision_at_5 value: 15.690000000000001 - type: recall_at_1 value: 38.407000000000004 - type: recall_at_10 value: 90.398 - type: recall_at_100 value: 99.21799999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 65.718 - type: recall_at_5 value: 78.45 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.49766333679089 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: 
f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.57731111438094 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 64.70120072857361 - type: mrr value: 77.86714593501297 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 90.73821860690765 - type: cos_sim_spearman value: 89.17070651383446 - type: euclidean_pearson value: 88.28303958293029 - type: euclidean_spearman value: 88.81889126856979 - type: manhattan_pearson value: 88.09080621828731 - type: manhattan_spearman value: 88.55924679817751 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 88.10064935064933 - type: f1 value: 88.08460758973867 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.338228337929976 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.179156232378226 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 33.440999999999995 - type: map_at_10 value: 45.495000000000005 - type: map_at_100 value: 47.132000000000005 - type: map_at_1000 value: 47.253 - type: map_at_3 value: 41.766 - type: map_at_5 value: 43.873 - type: mrr_at_1 value: 40.772999999999996 - type: mrr_at_10 value: 51.627 - type: mrr_at_100 value: 52.364 - type: mrr_at_1000 value: 52.397000000000006 - type: mrr_at_3 value: 48.951 - type: mrr_at_5 value: 50.746 - type: ndcg_at_1 value: 40.772999999999996 - type: ndcg_at_10 value: 52.306 - type: ndcg_at_100 value: 57.753 - type: ndcg_at_1000 value: 59.36900000000001 - type: ndcg_at_3 value: 47.177 - type: ndcg_at_5 value: 49.71 - type: precision_at_1 value: 40.772999999999996 - type: precision_at_10 value: 10.129000000000001 - type: precision_at_100 value: 1.617 - type: precision_at_1000 value: 0.208 - type: precision_at_3 value: 22.985 - type: precision_at_5 value: 16.652 - type: recall_at_1 value: 33.440999999999995 - type: recall_at_10 value: 65.121 - type: recall_at_100 value: 87.55199999999999 - type: recall_at_1000 value: 97.41300000000001 - type: recall_at_3 value: 49.958999999999996 - type: recall_at_5 value: 57.14900000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.126 - type: map_at_10 value: 42.856 - type: map_at_100 value: 44.134 - type: map_at_1000 value: 44.274 - type: map_at_3 value: 39.594 - type: map_at_5 value: 41.504999999999995 - type: mrr_at_1 value: 40.127 - type: mrr_at_10 value: 48.736000000000004 - type: mrr_at_100 value: 49.303999999999995 - type: mrr_at_1000 value: 49.356 - type: mrr_at_3 value: 46.263 - type: mrr_at_5 value: 47.878 - type: ndcg_at_1 value: 40.127 - type: ndcg_at_10 value: 48.695 - type: ndcg_at_100 value: 
52.846000000000004 - type: ndcg_at_1000 value: 54.964 - type: ndcg_at_3 value: 44.275 - type: ndcg_at_5 value: 46.54 - type: precision_at_1 value: 40.127 - type: precision_at_10 value: 9.229 - type: precision_at_100 value: 1.473 - type: precision_at_1000 value: 0.19499999999999998 - type: precision_at_3 value: 21.444 - type: precision_at_5 value: 15.389 - type: recall_at_1 value: 32.126 - type: recall_at_10 value: 58.971 - type: recall_at_100 value: 76.115 - type: recall_at_1000 value: 89.556 - type: recall_at_3 value: 45.891 - type: recall_at_5 value: 52.242 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 41.312 - type: map_at_10 value: 54.510000000000005 - type: map_at_100 value: 55.544000000000004 - type: map_at_1000 value: 55.593 - type: map_at_3 value: 50.859 - type: map_at_5 value: 52.839999999999996 - type: mrr_at_1 value: 47.147 - type: mrr_at_10 value: 57.678 - type: mrr_at_100 value: 58.287 - type: mrr_at_1000 value: 58.312 - type: mrr_at_3 value: 55.025999999999996 - type: mrr_at_5 value: 56.55 - type: ndcg_at_1 value: 47.147 - type: ndcg_at_10 value: 60.672000000000004 - type: ndcg_at_100 value: 64.411 - type: ndcg_at_1000 value: 65.35499999999999 - type: ndcg_at_3 value: 54.643 - type: ndcg_at_5 value: 57.461 - type: precision_at_1 value: 47.147 - type: precision_at_10 value: 9.881 - type: precision_at_100 value: 1.27 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 24.556 - type: precision_at_5 value: 16.814999999999998 - type: recall_at_1 value: 41.312 - type: recall_at_10 value: 75.62299999999999 - type: recall_at_100 value: 91.388 - type: recall_at_1000 value: 98.08 - type: recall_at_3 value: 59.40299999999999 - type: recall_at_5 value: 66.43900000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.609 - type: map_at_10 value: 37.614 - type: map_at_100 value: 38.584 - type: map_at_1000 value: 38.652 - type: map_at_3 value: 34.731 - type: map_at_5 value: 36.308 - type: mrr_at_1 value: 29.944 - type: mrr_at_10 value: 39.829 - type: mrr_at_100 value: 40.659 - type: mrr_at_1000 value: 40.709 - type: mrr_at_3 value: 37.269000000000005 - type: mrr_at_5 value: 38.625 - type: ndcg_at_1 value: 29.944 - type: ndcg_at_10 value: 43.082 - type: ndcg_at_100 value: 47.857 - type: ndcg_at_1000 value: 49.612 - type: ndcg_at_3 value: 37.578 - type: ndcg_at_5 value: 40.135 - type: precision_at_1 value: 29.944 - type: precision_at_10 value: 6.678000000000001 - type: precision_at_100 value: 0.951 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 16.045 - type: precision_at_5 value: 11.073 - type: recall_at_1 value: 27.609 - type: recall_at_10 value: 57.718 - type: recall_at_100 value: 79.768 - type: recall_at_1000 value: 92.868 - type: recall_at_3 value: 42.876 - type: recall_at_5 value: 49.104 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.071 - type: map_at_10 value: 27.471 - type: map_at_100 value: 28.71 - type: map_at_1000 value: 28.833 - type: map_at_3 value: 24.698 - type: map_at_5 value: 26.461000000000002 - type: mrr_at_1 value: 22.387999999999998 - type: mrr_at_10 value: 32.522 - type: mrr_at_100 value: 33.393 - type: mrr_at_1000 value: 33.455 - 
type: mrr_at_3 value: 29.830000000000002 - type: mrr_at_5 value: 31.472 - type: ndcg_at_1 value: 22.387999999999998 - type: ndcg_at_10 value: 33.278999999999996 - type: ndcg_at_100 value: 39.043 - type: ndcg_at_1000 value: 41.763 - type: ndcg_at_3 value: 28.310999999999996 - type: ndcg_at_5 value: 31.007 - type: precision_at_1 value: 22.387999999999998 - type: precision_at_10 value: 6.157 - type: precision_at_100 value: 1.042 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_3 value: 13.972000000000001 - type: precision_at_5 value: 10.274 - type: recall_at_1 value: 18.071 - type: recall_at_10 value: 46.025 - type: recall_at_100 value: 71.153 - type: recall_at_1000 value: 90.232 - type: recall_at_3 value: 32.311 - type: recall_at_5 value: 39.296 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.813000000000002 - type: map_at_10 value: 42.594 - type: map_at_100 value: 43.949 - type: map_at_1000 value: 44.052 - type: map_at_3 value: 39.1 - type: map_at_5 value: 41.111 - type: mrr_at_1 value: 37.824999999999996 - type: mrr_at_10 value: 48.06 - type: mrr_at_100 value: 48.91 - type: mrr_at_1000 value: 48.946 - type: mrr_at_3 value: 45.509 - type: mrr_at_5 value: 47.073 - type: ndcg_at_1 value: 37.824999999999996 - type: ndcg_at_10 value: 48.882 - type: ndcg_at_100 value: 54.330999999999996 - type: ndcg_at_1000 value: 56.120999999999995 - type: ndcg_at_3 value: 43.529 - type: ndcg_at_5 value: 46.217999999999996 - type: precision_at_1 value: 37.824999999999996 - type: precision_at_10 value: 8.845 - type: precision_at_100 value: 1.34 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 20.757 - type: precision_at_5 value: 14.802999999999999 - type: recall_at_1 value: 30.813000000000002 - type: recall_at_10 value: 61.895999999999994 - type: recall_at_100 value: 84.513 - type: recall_at_1000 value: 95.817 - type: recall_at_3 value: 47.099000000000004 - type: recall_at_5 value: 54.031 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.735999999999997 - type: map_at_10 value: 36.799 - type: map_at_100 value: 38.246 - type: map_at_1000 value: 38.353 - type: map_at_3 value: 33.133 - type: map_at_5 value: 34.954 - type: mrr_at_1 value: 31.849 - type: mrr_at_10 value: 41.928 - type: mrr_at_100 value: 42.846000000000004 - type: mrr_at_1000 value: 42.894 - type: mrr_at_3 value: 39.117000000000004 - type: mrr_at_5 value: 40.521 - type: ndcg_at_1 value: 31.849 - type: ndcg_at_10 value: 43.143 - type: ndcg_at_100 value: 48.963 - type: ndcg_at_1000 value: 51.041000000000004 - type: ndcg_at_3 value: 37.218 - type: ndcg_at_5 value: 39.542 - type: precision_at_1 value: 31.849 - type: precision_at_10 value: 8.231 - type: precision_at_100 value: 1.277 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 18.037 - type: precision_at_5 value: 12.945 - type: recall_at_1 value: 25.735999999999997 - type: recall_at_10 value: 56.735 - type: recall_at_100 value: 81.04 - type: recall_at_1000 value: 94.845 - type: recall_at_3 value: 40.239999999999995 - type: recall_at_5 value: 46.378 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.580333333333336 - type: map_at_10 value: 37.70558333333334 - type: map_at_100 
value: 38.94941666666667 - type: map_at_1000 value: 39.062083333333334 - type: map_at_3 value: 34.63333333333334 - type: map_at_5 value: 36.35241666666666 - type: mrr_at_1 value: 32.64866666666667 - type: mrr_at_10 value: 42.018499999999996 - type: mrr_at_100 value: 42.83391666666666 - type: mrr_at_1000 value: 42.884166666666665 - type: mrr_at_3 value: 39.476499999999994 - type: mrr_at_5 value: 40.96983333333334 - type: ndcg_at_1 value: 32.64866666666667 - type: ndcg_at_10 value: 43.43866666666667 - type: ndcg_at_100 value: 48.569833333333335 - type: ndcg_at_1000 value: 50.6495 - type: ndcg_at_3 value: 38.327166666666656 - type: ndcg_at_5 value: 40.76941666666667 - type: precision_at_1 value: 32.64866666666667 - type: precision_at_10 value: 7.652333333333332 - type: precision_at_100 value: 1.2066666666666666 - type: precision_at_1000 value: 0.15841666666666668 - type: precision_at_3 value: 17.75108333333333 - type: precision_at_5 value: 12.641916666666669 - type: recall_at_1 value: 27.580333333333336 - type: recall_at_10 value: 56.02591666666667 - type: recall_at_100 value: 78.317 - type: recall_at_1000 value: 92.52608333333332 - type: recall_at_3 value: 41.84283333333333 - type: recall_at_5 value: 48.105666666666664 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.876 - type: map_at_10 value: 34.521 - type: map_at_100 value: 35.581 - type: map_at_1000 value: 35.674 - type: map_at_3 value: 32.501000000000005 - type: map_at_5 value: 33.602 - type: mrr_at_1 value: 31.441999999999997 - type: mrr_at_10 value: 37.669999999999995 - type: mrr_at_100 value: 38.523 - type: mrr_at_1000 value: 38.59 - type: mrr_at_3 value: 35.762 - type: mrr_at_5 value: 36.812 - type: ndcg_at_1 value: 31.441999999999997 - type: ndcg_at_10 value: 38.46 - type: ndcg_at_100 value: 43.479 - type: ndcg_at_1000 value: 45.858 - type: ndcg_at_3 value: 34.668 - type: ndcg_at_5 value: 36.416 - type: precision_at_1 value: 31.441999999999997 - type: precision_at_10 value: 5.782 - type: precision_at_100 value: 0.91 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 14.417 - type: precision_at_5 value: 9.876999999999999 - type: recall_at_1 value: 27.876 - type: recall_at_10 value: 47.556 - type: recall_at_100 value: 70.39699999999999 - type: recall_at_1000 value: 87.969 - type: recall_at_3 value: 37.226 - type: recall_at_5 value: 41.43 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.854000000000003 - type: map_at_10 value: 26.632 - type: map_at_100 value: 27.849 - type: map_at_1000 value: 27.977 - type: map_at_3 value: 24.089 - type: map_at_5 value: 25.477 - type: mrr_at_1 value: 22.987 - type: mrr_at_10 value: 30.781999999999996 - type: mrr_at_100 value: 31.746000000000002 - type: mrr_at_1000 value: 31.818 - type: mrr_at_3 value: 28.43 - type: mrr_at_5 value: 29.791 - type: ndcg_at_1 value: 22.987 - type: ndcg_at_10 value: 31.585 - type: ndcg_at_100 value: 37.32 - type: ndcg_at_1000 value: 40.072 - type: ndcg_at_3 value: 27.058 - type: ndcg_at_5 value: 29.137999999999998 - type: precision_at_1 value: 22.987 - type: precision_at_10 value: 5.76 - type: precision_at_100 value: 1.018 - type: precision_at_1000 value: 0.14400000000000002 - type: precision_at_3 value: 12.767000000000001 - type: precision_at_5 value: 9.257 - type: recall_at_1 value: 
18.854000000000003 - type: recall_at_10 value: 42.349 - type: recall_at_100 value: 68.15299999999999 - type: recall_at_1000 value: 87.44 - type: recall_at_3 value: 29.715999999999998 - type: recall_at_5 value: 35.085 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.094 - type: map_at_10 value: 38.22 - type: map_at_100 value: 39.352 - type: map_at_1000 value: 39.452 - type: map_at_3 value: 35.339 - type: map_at_5 value: 36.78 - type: mrr_at_1 value: 33.022 - type: mrr_at_10 value: 42.466 - type: mrr_at_100 value: 43.3 - type: mrr_at_1000 value: 43.356 - type: mrr_at_3 value: 40.159 - type: mrr_at_5 value: 41.272999999999996 - type: ndcg_at_1 value: 33.022 - type: ndcg_at_10 value: 43.976 - type: ndcg_at_100 value: 49.008 - type: ndcg_at_1000 value: 51.154999999999994 - type: ndcg_at_3 value: 38.891 - type: ndcg_at_5 value: 40.897 - type: precision_at_1 value: 33.022 - type: precision_at_10 value: 7.396999999999999 - type: precision_at_100 value: 1.1199999999999999 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_3 value: 17.724 - type: precision_at_5 value: 12.239 - type: recall_at_1 value: 28.094 - type: recall_at_10 value: 57.162 - type: recall_at_100 value: 78.636 - type: recall_at_1000 value: 93.376 - type: recall_at_3 value: 43.328 - type: recall_at_5 value: 48.252 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.937 - type: map_at_10 value: 34.82 - type: map_at_100 value: 36.405 - type: map_at_1000 value: 36.626 - type: map_at_3 value: 31.548 - type: map_at_5 value: 33.355000000000004 - type: mrr_at_1 value: 30.435000000000002 - type: mrr_at_10 value: 39.946 - type: mrr_at_100 value: 40.873 - type: mrr_at_1000 value: 40.910000000000004 - type: mrr_at_3 value: 37.088 - type: mrr_at_5 value: 38.808 - type: ndcg_at_1 value: 30.435000000000002 - type: ndcg_at_10 value: 41.25 - type: ndcg_at_100 value: 47.229 - type: ndcg_at_1000 value: 49.395 - type: ndcg_at_3 value: 35.801 - type: ndcg_at_5 value: 38.457 - type: precision_at_1 value: 30.435000000000002 - type: precision_at_10 value: 8.083 - type: precision_at_100 value: 1.601 - type: precision_at_1000 value: 0.247 - type: precision_at_3 value: 17.061999999999998 - type: precision_at_5 value: 12.767000000000001 - type: recall_at_1 value: 24.937 - type: recall_at_10 value: 53.905 - type: recall_at_100 value: 80.607 - type: recall_at_1000 value: 93.728 - type: recall_at_3 value: 38.446000000000005 - type: recall_at_5 value: 45.188 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.095000000000002 - type: map_at_10 value: 30.935000000000002 - type: map_at_100 value: 31.907000000000004 - type: map_at_1000 value: 32.006 - type: map_at_3 value: 28.242 - type: map_at_5 value: 29.963 - type: mrr_at_1 value: 23.845 - type: mrr_at_10 value: 32.978 - type: mrr_at_100 value: 33.802 - type: mrr_at_1000 value: 33.867000000000004 - type: mrr_at_3 value: 30.314000000000004 - type: mrr_at_5 value: 32.089 - type: ndcg_at_1 value: 23.845 - type: ndcg_at_10 value: 35.934 - type: ndcg_at_100 value: 40.598 - type: ndcg_at_1000 value: 43.089 - type: ndcg_at_3 value: 30.776999999999997 - type: ndcg_at_5 value: 33.711999999999996 - type: precision_at_1 value: 23.845 - 
type: precision_at_10 value: 5.656 - type: precision_at_100 value: 0.861 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 13.247 - type: precision_at_5 value: 9.612 - type: recall_at_1 value: 22.095000000000002 - type: recall_at_10 value: 49.25 - type: recall_at_100 value: 70.482 - type: recall_at_1000 value: 88.98899999999999 - type: recall_at_3 value: 35.619 - type: recall_at_5 value: 42.674 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 14.154 - type: map_at_10 value: 24.654999999999998 - type: map_at_100 value: 26.723999999999997 - type: map_at_1000 value: 26.912000000000003 - type: map_at_3 value: 20.4 - type: map_at_5 value: 22.477 - type: mrr_at_1 value: 32.117000000000004 - type: mrr_at_10 value: 44.590999999999994 - type: mrr_at_100 value: 45.425 - type: mrr_at_1000 value: 45.456 - type: mrr_at_3 value: 41.281 - type: mrr_at_5 value: 43.219 - type: ndcg_at_1 value: 32.117000000000004 - type: ndcg_at_10 value: 33.994 - type: ndcg_at_100 value: 41.438 - type: ndcg_at_1000 value: 44.611000000000004 - type: ndcg_at_3 value: 27.816000000000003 - type: ndcg_at_5 value: 29.816 - type: precision_at_1 value: 32.117000000000004 - type: precision_at_10 value: 10.756 - type: precision_at_100 value: 1.8679999999999999 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 20.803 - type: precision_at_5 value: 15.987000000000002 - type: recall_at_1 value: 14.154 - type: recall_at_10 value: 40.489999999999995 - type: recall_at_100 value: 65.635 - type: recall_at_1000 value: 83.276 - type: recall_at_3 value: 25.241000000000003 - type: recall_at_5 value: 31.211 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.332 - type: map_at_10 value: 20.462 - type: map_at_100 value: 29.473 - type: map_at_1000 value: 31.215 - type: map_at_3 value: 14.466999999999999 - type: map_at_5 value: 16.922 - type: mrr_at_1 value: 69.5 - type: mrr_at_10 value: 77.039 - type: mrr_at_100 value: 77.265 - type: mrr_at_1000 value: 77.271 - type: mrr_at_3 value: 75.5 - type: mrr_at_5 value: 76.4 - type: ndcg_at_1 value: 57.125 - type: ndcg_at_10 value: 42.958 - type: ndcg_at_100 value: 48.396 - type: ndcg_at_1000 value: 55.897 - type: ndcg_at_3 value: 47.188 - type: ndcg_at_5 value: 44.376 - type: precision_at_1 value: 69.5 - type: precision_at_10 value: 34.5 - type: precision_at_100 value: 11.18 - type: precision_at_1000 value: 2.13 - type: precision_at_3 value: 51.083 - type: precision_at_5 value: 43.1 - type: recall_at_1 value: 9.332 - type: recall_at_10 value: 26.422 - type: recall_at_100 value: 56.098000000000006 - type: recall_at_1000 value: 79.66 - type: recall_at_3 value: 15.703 - type: recall_at_5 value: 19.644000000000002 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 54.72 - type: f1 value: 49.67819606587526 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 74.97 - type: map_at_10 value: 82.956 - type: map_at_100 value: 83.193 - type: map_at_1000 value: 83.208 - type: map_at_3 value: 81.837 - type: map_at_5 value: 82.57 - type: mrr_at_1 value: 80.783 - type: mrr_at_10 value: 87.546 - type: mrr_at_100 value: 87.627 - type: mrr_at_1000 value: 
87.63 - type: mrr_at_3 value: 86.79400000000001 - type: mrr_at_5 value: 87.32799999999999 - type: ndcg_at_1 value: 80.783 - type: ndcg_at_10 value: 86.54899999999999 - type: ndcg_at_100 value: 87.355 - type: ndcg_at_1000 value: 87.629 - type: ndcg_at_3 value: 84.82 - type: ndcg_at_5 value: 85.83800000000001 - type: precision_at_1 value: 80.783 - type: precision_at_10 value: 10.327 - type: precision_at_100 value: 1.094 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 32.218 - type: precision_at_5 value: 20.012 - type: recall_at_1 value: 74.97 - type: recall_at_10 value: 93.072 - type: recall_at_100 value: 96.218 - type: recall_at_1000 value: 97.991 - type: recall_at_3 value: 88.357 - type: recall_at_5 value: 90.983 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 21.12 - type: map_at_10 value: 35.908 - type: map_at_100 value: 37.895 - type: map_at_1000 value: 38.068000000000005 - type: map_at_3 value: 31.189 - type: map_at_5 value: 33.908 - type: mrr_at_1 value: 42.901 - type: mrr_at_10 value: 52.578 - type: mrr_at_100 value: 53.308 - type: mrr_at_1000 value: 53.342 - type: mrr_at_3 value: 50.385999999999996 - type: mrr_at_5 value: 51.62799999999999 - type: ndcg_at_1 value: 42.901 - type: ndcg_at_10 value: 44.302 - type: ndcg_at_100 value: 51.132999999999996 - type: ndcg_at_1000 value: 53.848 - type: ndcg_at_3 value: 40.464 - type: ndcg_at_5 value: 41.743 - type: precision_at_1 value: 42.901 - type: precision_at_10 value: 12.423 - type: precision_at_100 value: 1.968 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 27.622999999999998 - type: precision_at_5 value: 20.278 - type: recall_at_1 value: 21.12 - type: recall_at_10 value: 52.091 - type: recall_at_100 value: 77.062 - type: recall_at_1000 value: 93.082 - type: recall_at_3 value: 37.223 - type: recall_at_5 value: 43.826 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 38.940000000000005 - type: map_at_10 value: 62.239999999999995 - type: map_at_100 value: 63.141000000000005 - type: map_at_1000 value: 63.205999999999996 - type: map_at_3 value: 58.738 - type: map_at_5 value: 60.924 - type: mrr_at_1 value: 77.88000000000001 - type: mrr_at_10 value: 83.7 - type: mrr_at_100 value: 83.882 - type: mrr_at_1000 value: 83.889 - type: mrr_at_3 value: 82.748 - type: mrr_at_5 value: 83.381 - type: ndcg_at_1 value: 77.88000000000001 - type: ndcg_at_10 value: 70.462 - type: ndcg_at_100 value: 73.564 - type: ndcg_at_1000 value: 74.78099999999999 - type: ndcg_at_3 value: 65.524 - type: ndcg_at_5 value: 68.282 - type: precision_at_1 value: 77.88000000000001 - type: precision_at_10 value: 14.81 - type: precision_at_100 value: 1.7229999999999999 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 42.083999999999996 - type: precision_at_5 value: 27.43 - type: recall_at_1 value: 38.940000000000005 - type: recall_at_10 value: 74.051 - type: recall_at_100 value: 86.158 - type: recall_at_1000 value: 94.146 - type: recall_at_3 value: 63.126000000000005 - type: recall_at_5 value: 68.575 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.23440000000001 - type: ap value: 87.33490392265892 - type: f1 value: 91.21374626021836 - task: type: Retrieval dataset: type: 
msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.137999999999998 - type: map_at_10 value: 34.471000000000004 - type: map_at_100 value: 35.634 - type: map_at_1000 value: 35.685 - type: map_at_3 value: 30.587999999999997 - type: map_at_5 value: 32.812999999999995 - type: mrr_at_1 value: 22.736 - type: mrr_at_10 value: 35.092 - type: mrr_at_100 value: 36.193999999999996 - type: mrr_at_1000 value: 36.238 - type: mrr_at_3 value: 31.28 - type: mrr_at_5 value: 33.498 - type: ndcg_at_1 value: 22.736 - type: ndcg_at_10 value: 41.388999999999996 - type: ndcg_at_100 value: 46.967999999999996 - type: ndcg_at_1000 value: 48.178 - type: ndcg_at_3 value: 33.503 - type: ndcg_at_5 value: 37.484 - type: precision_at_1 value: 22.736 - type: precision_at_10 value: 6.54 - type: precision_at_100 value: 0.9339999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.249999999999998 - type: precision_at_5 value: 10.562000000000001 - type: recall_at_1 value: 22.137999999999998 - type: recall_at_10 value: 62.629999999999995 - type: recall_at_100 value: 88.375 - type: recall_at_1000 value: 97.529 - type: recall_at_3 value: 41.245 - type: recall_at_5 value: 50.808 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 95.25079799361606 - type: f1 value: 95.00726023695032 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 78.23757409940721 - type: f1 value: 58.534958803195714 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.20040349697378 - type: f1 value: 74.31261149784696 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.35104236718227 - type: f1 value: 79.7373049864316 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.478828180753126 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.25696147904426 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.82488548405117 - type: mrr value: 34.066706809031096 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.557 - type: map_at_10 value: 15.055 - type: map_at_100 value: 19.575 - type: map_at_1000 value: 21.267 - type: map_at_3 value: 10.86 - type: map_at_5 value: 12.83 - type: mrr_at_1 value: 50.464 - type: mrr_at_10 value: 59.050999999999995 - type: mrr_at_100 value: 59.436 - type: mrr_at_1000 value: 59.476 - type: mrr_at_3 value: 56.811 - type: mrr_at_5 value: 58.08 - type: 
ndcg_at_1 value: 47.988 - type: ndcg_at_10 value: 38.645 - type: ndcg_at_100 value: 36.339 - type: ndcg_at_1000 value: 45.279 - type: ndcg_at_3 value: 43.35 - type: ndcg_at_5 value: 41.564 - type: precision_at_1 value: 49.845 - type: precision_at_10 value: 28.544999999999998 - type: precision_at_100 value: 9.322 - type: precision_at_1000 value: 2.258 - type: precision_at_3 value: 40.144000000000005 - type: precision_at_5 value: 35.913000000000004 - type: recall_at_1 value: 6.557 - type: recall_at_10 value: 19.5 - type: recall_at_100 value: 37.153999999999996 - type: recall_at_1000 value: 69.581 - type: recall_at_3 value: 12.133 - type: recall_at_5 value: 15.43 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.740000000000002 - type: map_at_10 value: 48.150999999999996 - type: map_at_100 value: 49.125 - type: map_at_1000 value: 49.149 - type: map_at_3 value: 43.645 - type: map_at_5 value: 46.417 - type: mrr_at_1 value: 35.892 - type: mrr_at_10 value: 50.524 - type: mrr_at_100 value: 51.232 - type: mrr_at_1000 value: 51.24999999999999 - type: mrr_at_3 value: 46.852 - type: mrr_at_5 value: 49.146 - type: ndcg_at_1 value: 35.892 - type: ndcg_at_10 value: 56.08800000000001 - type: ndcg_at_100 value: 60.077000000000005 - type: ndcg_at_1000 value: 60.632 - type: ndcg_at_3 value: 47.765 - type: ndcg_at_5 value: 52.322 - type: precision_at_1 value: 35.892 - type: precision_at_10 value: 9.296 - type: precision_at_100 value: 1.154 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 21.92 - type: precision_at_5 value: 15.781999999999998 - type: recall_at_1 value: 31.740000000000002 - type: recall_at_10 value: 77.725 - type: recall_at_100 value: 94.841 - type: recall_at_1000 value: 99.003 - type: recall_at_3 value: 56.407 - type: recall_at_5 value: 66.848 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.429 - type: map_at_10 value: 85.42699999999999 - type: map_at_100 value: 86.063 - type: map_at_1000 value: 86.077 - type: map_at_3 value: 82.573 - type: map_at_5 value: 84.371 - type: mrr_at_1 value: 82.34 - type: mrr_at_10 value: 88.247 - type: mrr_at_100 value: 88.357 - type: mrr_at_1000 value: 88.357 - type: mrr_at_3 value: 87.38 - type: mrr_at_5 value: 87.981 - type: ndcg_at_1 value: 82.34 - type: ndcg_at_10 value: 88.979 - type: ndcg_at_100 value: 90.18599999999999 - type: ndcg_at_1000 value: 90.254 - type: ndcg_at_3 value: 86.378 - type: ndcg_at_5 value: 87.821 - type: precision_at_1 value: 82.34 - type: precision_at_10 value: 13.482 - type: precision_at_100 value: 1.537 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.852999999999994 - type: precision_at_5 value: 24.798000000000002 - type: recall_at_1 value: 71.429 - type: recall_at_10 value: 95.64099999999999 - type: recall_at_100 value: 99.723 - type: recall_at_1000 value: 99.98 - type: recall_at_3 value: 88.011 - type: recall_at_5 value: 92.246 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 60.62148584103299 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.2923987272903 - task: type: Retrieval dataset: 
type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.128 - type: map_at_10 value: 14.63 - type: map_at_100 value: 17.285 - type: map_at_1000 value: 17.676 - type: map_at_3 value: 9.993 - type: map_at_5 value: 12.286999999999999 - type: mrr_at_1 value: 25.4 - type: mrr_at_10 value: 38.423 - type: mrr_at_100 value: 39.497 - type: mrr_at_1000 value: 39.531 - type: mrr_at_3 value: 34.9 - type: mrr_at_5 value: 37.01 - type: ndcg_at_1 value: 25.4 - type: ndcg_at_10 value: 24.062 - type: ndcg_at_100 value: 33.823 - type: ndcg_at_1000 value: 39.663 - type: ndcg_at_3 value: 22.246 - type: ndcg_at_5 value: 19.761 - type: precision_at_1 value: 25.4 - type: precision_at_10 value: 12.85 - type: precision_at_100 value: 2.71 - type: precision_at_1000 value: 0.41000000000000003 - type: precision_at_3 value: 21.4 - type: precision_at_5 value: 17.86 - type: recall_at_1 value: 5.128 - type: recall_at_10 value: 26.06 - type: recall_at_100 value: 54.993 - type: recall_at_1000 value: 83.165 - type: recall_at_3 value: 13.003 - type: recall_at_5 value: 18.117 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 87.5466779326323 - type: cos_sim_spearman value: 82.79782085421951 - type: euclidean_pearson value: 84.76929982677339 - type: euclidean_spearman value: 82.51802536005597 - type: manhattan_pearson value: 84.76736312526177 - type: manhattan_spearman value: 82.50799656335593 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.40486308108694 - type: cos_sim_spearman value: 77.12670500926937 - type: euclidean_pearson value: 85.23836845503847 - type: euclidean_spearman value: 78.41475117006176 - type: manhattan_pearson value: 85.24302039610805 - type: manhattan_spearman value: 78.4053162562707 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 88.83570289087565 - type: cos_sim_spearman value: 89.28563503553643 - type: euclidean_pearson value: 87.77516003996445 - type: euclidean_spearman value: 88.8656149534085 - type: manhattan_pearson value: 87.75568872417946 - type: manhattan_spearman value: 88.80445489340585 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 86.776406555485 - type: cos_sim_spearman value: 83.8288465070091 - type: euclidean_pearson value: 85.37827999808123 - type: euclidean_spearman value: 84.11079529992739 - type: manhattan_pearson value: 85.35336495689121 - type: manhattan_spearman value: 84.08618492649347 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.57644404820684 - type: cos_sim_spearman value: 89.69728364350713 - type: euclidean_pearson value: 88.28202320389443 - type: euclidean_spearman value: 88.9560567319321 - type: manhattan_pearson value: 88.29461100044172 - type: manhattan_spearman value: 88.96030920678558 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: 
cos_sim_pearson value: 85.05211938460621 - type: cos_sim_spearman value: 86.43413865667489 - type: euclidean_pearson value: 85.62760689259562 - type: euclidean_spearman value: 86.28867831982394 - type: manhattan_pearson value: 85.60828879163458 - type: manhattan_spearman value: 86.27823731462473 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 90.00254140466377 - type: cos_sim_spearman value: 89.66118745178284 - type: euclidean_pearson value: 89.46985446236553 - type: euclidean_spearman value: 88.92649032371526 - type: manhattan_pearson value: 89.49600028180247 - type: manhattan_spearman value: 88.86948431519099 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 68.93578321067938 - type: cos_sim_spearman value: 69.60639595839257 - type: euclidean_pearson value: 70.33485090574897 - type: euclidean_spearman value: 69.03380379185452 - type: manhattan_pearson value: 70.42097254943839 - type: manhattan_spearman value: 69.25296348304255 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 87.29588700755069 - type: cos_sim_spearman value: 88.30389489193672 - type: euclidean_pearson value: 87.60349838180346 - type: euclidean_spearman value: 87.91041868311692 - type: manhattan_pearson value: 87.59373630607907 - type: manhattan_spearman value: 87.88690174001724 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.8030655700857 - type: mrr value: 96.3950637234951 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 60.028000000000006 - type: map_at_10 value: 69.855 - type: map_at_100 value: 70.257 - type: map_at_1000 value: 70.283 - type: map_at_3 value: 66.769 - type: map_at_5 value: 68.679 - type: mrr_at_1 value: 62.666999999999994 - type: mrr_at_10 value: 70.717 - type: mrr_at_100 value: 71.00800000000001 - type: mrr_at_1000 value: 71.033 - type: mrr_at_3 value: 68.389 - type: mrr_at_5 value: 69.939 - type: ndcg_at_1 value: 62.666999999999994 - type: ndcg_at_10 value: 74.715 - type: ndcg_at_100 value: 76.364 - type: ndcg_at_1000 value: 76.89399999999999 - type: ndcg_at_3 value: 69.383 - type: ndcg_at_5 value: 72.322 - type: precision_at_1 value: 62.666999999999994 - type: precision_at_10 value: 10.067 - type: precision_at_100 value: 1.09 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.111 - type: precision_at_5 value: 18.267 - type: recall_at_1 value: 60.028000000000006 - type: recall_at_10 value: 88.822 - type: recall_at_100 value: 96.167 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 74.367 - type: recall_at_5 value: 81.661 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.84554455445544 - type: cos_sim_ap value: 96.54482863244152 - type: cos_sim_f1 value: 92.13709677419355 - 
type: cos_sim_precision value: 92.88617886178862 - type: cos_sim_recall value: 91.4 - type: dot_accuracy value: 99.76039603960396 - type: dot_ap value: 93.20115278887057 - type: dot_f1 value: 87.92079207920793 - type: dot_precision value: 87.05882352941177 - type: dot_recall value: 88.8 - type: euclidean_accuracy value: 99.84950495049505 - type: euclidean_ap value: 96.53268343961348 - type: euclidean_f1 value: 92.23697650663942 - type: euclidean_precision value: 94.258872651357 - type: euclidean_recall value: 90.3 - type: manhattan_accuracy value: 99.85346534653465 - type: manhattan_ap value: 96.54495433438355 - type: manhattan_f1 value: 92.51012145748987 - type: manhattan_precision value: 93.64754098360656 - type: manhattan_recall value: 91.4 - type: max_accuracy value: 99.85346534653465 - type: max_ap value: 96.54495433438355 - type: max_f1 value: 92.51012145748987 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 66.46940443952006 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 36.396194493841584 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.881717673695555 - type: mrr value: 55.73439224174519 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.438177268254087 - type: cos_sim_spearman value: 30.96177698848688 - type: dot_pearson value: 30.513850376431435 - type: dot_spearman value: 29.932421046509706 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.21 - type: map_at_10 value: 1.727 - type: map_at_100 value: 9.881 - type: map_at_1000 value: 24.245 - type: map_at_3 value: 0.615 - type: map_at_5 value: 0.966 - type: mrr_at_1 value: 78.0 - type: mrr_at_10 value: 87.333 - type: mrr_at_100 value: 87.333 - type: mrr_at_1000 value: 87.333 - type: mrr_at_3 value: 86.333 - type: mrr_at_5 value: 87.333 - type: ndcg_at_1 value: 74.0 - type: ndcg_at_10 value: 69.12700000000001 - type: ndcg_at_100 value: 53.893 - type: ndcg_at_1000 value: 49.639 - type: ndcg_at_3 value: 74.654 - type: ndcg_at_5 value: 73.232 - type: precision_at_1 value: 78.0 - type: precision_at_10 value: 72.8 - type: precision_at_100 value: 55.42 - type: precision_at_1000 value: 21.73 - type: precision_at_3 value: 79.333 - type: precision_at_5 value: 77.2 - type: recall_at_1 value: 0.21 - type: recall_at_10 value: 1.9709999999999999 - type: recall_at_100 value: 13.555 - type: recall_at_1000 value: 46.961999999999996 - type: recall_at_3 value: 0.66 - type: recall_at_5 value: 1.052 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.456 - type: map_at_10 value: 9.426 - type: map_at_100 value: 16.066 - type: map_at_1000 value: 17.652 - type: map_at_3 value: 5.2459999999999996 - type: map_at_5 value: 6.5360000000000005 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 
47.666 - type: mrr_at_100 value: 48.681999999999995 - type: mrr_at_1000 value: 48.681999999999995 - type: mrr_at_3 value: 43.878 - type: mrr_at_5 value: 46.224 - type: ndcg_at_1 value: 31.633 - type: ndcg_at_10 value: 23.454 - type: ndcg_at_100 value: 36.616 - type: ndcg_at_1000 value: 48.596000000000004 - type: ndcg_at_3 value: 28.267999999999997 - type: ndcg_at_5 value: 25.630999999999997 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 20.204 - type: precision_at_100 value: 7.754999999999999 - type: precision_at_1000 value: 1.5709999999999997 - type: precision_at_3 value: 29.252 - type: precision_at_5 value: 24.898 - type: recall_at_1 value: 2.456 - type: recall_at_10 value: 14.951 - type: recall_at_100 value: 48.399 - type: recall_at_1000 value: 85.077 - type: recall_at_3 value: 6.1370000000000005 - type: recall_at_5 value: 8.671 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.86240000000001 - type: ap value: 14.678570078747494 - type: f1 value: 55.295967793934445 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.17374080362195 - type: f1 value: 59.54410874861454 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.91227822485289 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.12523097097217 - type: cos_sim_ap value: 77.59606075943269 - type: cos_sim_f1 value: 71.11395646606915 - type: cos_sim_precision value: 69.07960199004975 - type: cos_sim_recall value: 73.27176781002639 - type: dot_accuracy value: 84.68736961316088 - type: dot_ap value: 68.47167450741459 - type: dot_f1 value: 64.42152354914874 - type: dot_precision value: 60.887949260042284 - type: dot_recall value: 68.3905013192612 - type: euclidean_accuracy value: 86.88084878106932 - type: euclidean_ap value: 77.27351204978599 - type: euclidean_f1 value: 70.99179716629381 - type: euclidean_precision value: 67.10526315789474 - type: euclidean_recall value: 75.35620052770449 - type: manhattan_accuracy value: 86.83316445133218 - type: manhattan_ap value: 77.21835357308716 - type: manhattan_f1 value: 71.05587004676349 - type: manhattan_precision value: 66.58210332103322 - type: manhattan_recall value: 76.17414248021109 - type: max_accuracy value: 87.12523097097217 - type: max_ap value: 77.59606075943269 - type: max_f1 value: 71.11395646606915 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.97232894787906 - type: cos_sim_ap value: 85.9613736469497 - type: cos_sim_f1 value: 78.40216655382532 - type: cos_sim_precision value: 72.97512437810946 - type: cos_sim_recall value: 84.70126270403449 - type: dot_accuracy value: 88.04866689952264 - type: dot_ap value: 83.15465089499936 - type: dot_f1 value: 
76.32698287879329 - type: dot_precision value: 71.23223697378077 - type: dot_recall value: 82.20665229442562 - type: euclidean_accuracy value: 88.67543757519307 - type: euclidean_ap value: 85.4524355531532 - type: euclidean_f1 value: 77.78729106950081 - type: euclidean_precision value: 75.3009009009009 - type: euclidean_recall value: 80.44348629504158 - type: manhattan_accuracy value: 88.65991384328792 - type: manhattan_ap value: 85.43109069046837 - type: manhattan_f1 value: 77.72639551396425 - type: manhattan_precision value: 73.73402417962004 - type: manhattan_recall value: 82.17585463504774 - type: max_accuracy value: 88.97232894787906 - type: max_ap value: 85.9613736469497 - type: max_f1 value: 78.40216655382532 ---

# GIST Large Embedding v0

*GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning* The model is fine-tuned on top of BAAI/bge-large-en-v1.5 using the MEDI dataset augmented with mined triplets from the MTEB Classification training dataset (excluding data from the Amazon Polarity Classification task). The model does not require any instruction for generating embeddings, so queries for retrieval tasks can be encoded directly without crafting instructions. Technical paper: GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning # Data The dataset used is a compilation of the MEDI and MTEB Classification training datasets. Third-party datasets may be subject to additional terms and conditions under their associated licenses. A HuggingFace Dataset version of the compiled dataset, and the specific revision used to train the model, is available: - Dataset: avsolatorio/medi-data-mteb_avs_triplets - Revision: 238a0499b6e6b690cc64ea56fde8461daa8341bb The dataset contains a key, which can be used to select only the mteb classification tasks (prefixed with ). The **MEDI Dataset** is published in the following paper: One Embedder, Any Task: Instruction-Finetuned Text Embeddings. The MTEB Benchmark results of the GIST embedding model, compared with the base model, suggest that the fine-tuning dataset has perturbed the model considerably, resulting in significant improvements on certain tasks while degrading performance on others. The retrieval performance for the TRECCOVID task is of note. The fine-tuning dataset does not contain significant knowledge about COVID-19, which may explain the observed performance degradation. We found some evidence, detailed in the paper, that thematic coverage of the fine-tuning data can affect downstream performance. # Usage The model can be loaded with the Sentence Transformers library; a usage sketch follows below. # Training Parameters Below are the training parameters used to fine-tune the model: # Evaluation The model was evaluated using the MTEB Evaluation suite. # Citation Please cite our work if you use GISTEmbed or the datasets we published in your projects or research. 🤗 # Acknowledgements This work is supported by the \"KCP IV - Exploring Data Use in the Development Economics Literature using Large Language Models (AI and LLMs)\" project funded by the Knowledge for Change Program (KCP) of the World Bank - RA-P503405-RESE-TF0C3444. The findings, interpretations, and conclusions expressed in this material are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity measurement, classification, clustering, and retrieval by transforming text into numerical representations."
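A minimal usage sketch with the Sentence Transformers library; the original snippet was stripped from this card, so this is a reconstruction under the assumption that the repo ID is avsolatorio/GIST-large-Embedding-v0 and the example sentences are placeholders:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# No instruction prefix is needed; queries and passages are encoded as-is.
model = SentenceTransformer("avsolatorio/GIST-large-Embedding-v0")

texts = [
    "What is the capital of France?",
    "Paris is the capital and largest city of France.",
]
embeddings = model.encode(texts, convert_to_tensor=True)

# Cosine similarity between the query and the passage embeddings.
print(cos_sim(embeddings[0], embeddings[1]))
```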
+} \ No newline at end of file diff --git a/data/model_data_json/avsolatorio_GIST-small-Embedding-v0.json b/data/model_data_json/avsolatorio_GIST-small-Embedding-v0.json new file mode 100644 index 0000000000000000000000000000000000000000..faed322cab6313972a4d714e331327ed9d0a2dac --- /dev/null +++ b/data/model_data_json/avsolatorio_GIST-small-Embedding-v0.json @@ -0,0 +1,25 @@ +{ + "model_id": "avsolatorio/GIST-small-Embedding-v0", + "downloads": 1010884, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "mteb", + "sentence-similarity", + "en", + "arxiv:2402.16829", + "arxiv:2212.09741", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers license: mit pipeline_tag: sentence-similarity tags: - feature-extraction - mteb - sentence-similarity - sentence-transformers model-index: - name: GIST-small-Embedding-v0 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.26865671641791 - type: ap value: 38.25623793370476 - type: f1 value: 69.26434651320257 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.232225 - type: ap value: 89.97936072879344 - type: f1 value: 93.22122653806187 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 49.715999999999994 - type: f1 value: 49.169789920136076 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 34.922 - type: map_at_10 value: 50.524 - type: map_at_100 value: 51.247 - type: map_at_1000 value: 51.249 - type: map_at_3 value: 45.887 - type: map_at_5 value: 48.592999999999996 - type: mrr_at_1 value: 34.922 - type: mrr_at_10 value: 50.382000000000005 - type: mrr_at_100 value: 51.104000000000006 - type: mrr_at_1000 value: 51.105999999999995 - type: mrr_at_3 value: 45.733000000000004 - type: mrr_at_5 value: 48.428 - type: ndcg_at_1 value: 34.922 - type: ndcg_at_10 value: 59.12 - type: ndcg_at_100 value: 62.083999999999996 - type: ndcg_at_1000 value: 62.137 - type: ndcg_at_3 value: 49.616 - type: ndcg_at_5 value: 54.501 - type: precision_at_1 value: 34.922 - type: precision_at_10 value: 8.649 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.152 - type: precision_at_5 value: 14.466999999999999 - type: recall_at_1 value: 34.922 - type: recall_at_10 value: 86.48599999999999 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 60.455000000000005 - type: recall_at_5 value: 72.333 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.623282347623714 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: 
test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 39.86487843524932 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.3290291318171 - type: mrr value: 75.2379853141626 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 88.52002953574285 - type: cos_sim_spearman value: 86.98752423842483 - type: euclidean_pearson value: 86.89442688314197 - type: euclidean_spearman value: 86.88631711307471 - type: manhattan_pearson value: 87.03723618507175 - type: manhattan_spearman value: 86.76041062975224 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.64935064935065 - type: f1 value: 86.61903824934998 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.21904455377494 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 35.43342755570654 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.843 - type: map_at_10 value: 43.379 - type: map_at_100 value: 44.946999999999996 - type: map_at_1000 value: 45.078 - type: map_at_3 value: 39.598 - type: map_at_5 value: 41.746 - type: mrr_at_1 value: 39.199 - type: mrr_at_10 value: 49.672 - type: mrr_at_100 value: 50.321000000000005 - type: mrr_at_1000 value: 50.365 - type: mrr_at_3 value: 46.805 - type: mrr_at_5 value: 48.579 - type: ndcg_at_1 value: 39.199 - type: ndcg_at_10 value: 50.163999999999994 - type: ndcg_at_100 value: 55.418 - type: ndcg_at_1000 value: 57.353 - type: ndcg_at_3 value: 44.716 - type: ndcg_at_5 value: 47.268 - type: precision_at_1 value: 39.199 - type: precision_at_10 value: 9.757 - type: precision_at_100 value: 1.552 - type: precision_at_1000 value: 0.20500000000000002 - type: precision_at_3 value: 21.602 - type: precision_at_5 value: 15.479000000000001 - type: recall_at_1 value: 31.843 - type: recall_at_10 value: 62.743 - type: recall_at_100 value: 84.78099999999999 - type: recall_at_1000 value: 96.86099999999999 - type: recall_at_3 value: 46.927 - type: recall_at_5 value: 54.355 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.321 - type: map_at_10 value: 39.062999999999995 - type: map_at_100 value: 40.403 - type: map_at_1000 value: 40.534 - type: map_at_3 value: 36.367 - type: map_at_5 value: 37.756 - type: mrr_at_1 value: 35.987 - type: mrr_at_10 value: 44.708999999999996 - type: mrr_at_100 value: 45.394 - type: mrr_at_1000 value: 45.436 - type: mrr_at_3 value: 42.463 - type: mrr_at_5 value: 43.663000000000004 - type: ndcg_at_1 value: 35.987 - type: ndcg_at_10 value: 44.585 - type: ndcg_at_100 value: 49.297999999999995 - type: ndcg_at_1000 value: 51.315 - type: ndcg_at_3 value: 40.569 
- type: ndcg_at_5 value: 42.197 - type: precision_at_1 value: 35.987 - type: precision_at_10 value: 8.369 - type: precision_at_100 value: 1.366 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 19.427 - type: precision_at_5 value: 13.58 - type: recall_at_1 value: 29.321 - type: recall_at_10 value: 54.333 - type: recall_at_100 value: 74.178 - type: recall_at_1000 value: 86.732 - type: recall_at_3 value: 42.46 - type: recall_at_5 value: 47.089999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.811 - type: map_at_10 value: 51.114000000000004 - type: map_at_100 value: 52.22 - type: map_at_1000 value: 52.275000000000006 - type: map_at_3 value: 47.644999999999996 - type: map_at_5 value: 49.675000000000004 - type: mrr_at_1 value: 44.389 - type: mrr_at_10 value: 54.459 - type: mrr_at_100 value: 55.208999999999996 - type: mrr_at_1000 value: 55.239000000000004 - type: mrr_at_3 value: 51.954 - type: mrr_at_5 value: 53.571999999999996 - type: ndcg_at_1 value: 44.389 - type: ndcg_at_10 value: 56.979 - type: ndcg_at_100 value: 61.266 - type: ndcg_at_1000 value: 62.315 - type: ndcg_at_3 value: 51.342 - type: ndcg_at_5 value: 54.33 - type: precision_at_1 value: 44.389 - type: precision_at_10 value: 9.26 - type: precision_at_100 value: 1.226 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 22.926 - type: precision_at_5 value: 15.987000000000002 - type: recall_at_1 value: 38.811 - type: recall_at_10 value: 70.841 - type: recall_at_100 value: 89.218 - type: recall_at_1000 value: 96.482 - type: recall_at_3 value: 56.123999999999995 - type: recall_at_5 value: 63.322 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.378 - type: map_at_10 value: 34.311 - type: map_at_100 value: 35.399 - type: map_at_1000 value: 35.482 - type: map_at_3 value: 31.917 - type: map_at_5 value: 33.275 - type: mrr_at_1 value: 27.683999999999997 - type: mrr_at_10 value: 36.575 - type: mrr_at_100 value: 37.492 - type: mrr_at_1000 value: 37.556 - type: mrr_at_3 value: 34.35 - type: mrr_at_5 value: 35.525 - type: ndcg_at_1 value: 27.683999999999997 - type: ndcg_at_10 value: 39.247 - type: ndcg_at_100 value: 44.424 - type: ndcg_at_1000 value: 46.478 - type: ndcg_at_3 value: 34.684 - type: ndcg_at_5 value: 36.886 - type: precision_at_1 value: 27.683999999999997 - type: precision_at_10 value: 5.989 - type: precision_at_100 value: 0.899 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 14.84 - type: precision_at_5 value: 10.215 - type: recall_at_1 value: 25.378 - type: recall_at_10 value: 52.195 - type: recall_at_100 value: 75.764 - type: recall_at_1000 value: 91.012 - type: recall_at_3 value: 39.885999999999996 - type: recall_at_5 value: 45.279 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.326 - type: map_at_10 value: 25.247000000000003 - type: map_at_100 value: 26.473000000000003 - type: map_at_1000 value: 26.579000000000004 - type: map_at_3 value: 22.466 - type: map_at_5 value: 24.113 - type: mrr_at_1 value: 21.393 - type: mrr_at_10 value: 30.187 - type: mrr_at_100 value: 31.089 - type: mrr_at_1000 value: 31.15 - type: mrr_at_3 value: 27.279999999999998 - type: mrr_at_5 value: 29.127 - type: 
ndcg_at_1 value: 21.393 - type: ndcg_at_10 value: 30.668 - type: ndcg_at_100 value: 36.543 - type: ndcg_at_1000 value: 39.181 - type: ndcg_at_3 value: 25.552000000000003 - type: ndcg_at_5 value: 28.176000000000002 - type: precision_at_1 value: 21.393 - type: precision_at_10 value: 5.784000000000001 - type: precision_at_100 value: 1.001 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 12.231 - type: precision_at_5 value: 9.179 - type: recall_at_1 value: 17.326 - type: recall_at_10 value: 42.415000000000006 - type: recall_at_100 value: 68.605 - type: recall_at_1000 value: 87.694 - type: recall_at_3 value: 28.343 - type: recall_at_5 value: 35.086 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.069 - type: map_at_10 value: 40.027 - type: map_at_100 value: 41.308 - type: map_at_1000 value: 41.412 - type: map_at_3 value: 36.864000000000004 - type: map_at_5 value: 38.641999999999996 - type: mrr_at_1 value: 35.707 - type: mrr_at_10 value: 45.527 - type: mrr_at_100 value: 46.348 - type: mrr_at_1000 value: 46.392 - type: mrr_at_3 value: 43.086 - type: mrr_at_5 value: 44.645 - type: ndcg_at_1 value: 35.707 - type: ndcg_at_10 value: 46.117000000000004 - type: ndcg_at_100 value: 51.468 - type: ndcg_at_1000 value: 53.412000000000006 - type: ndcg_at_3 value: 41.224 - type: ndcg_at_5 value: 43.637 - type: precision_at_1 value: 35.707 - type: precision_at_10 value: 8.459999999999999 - type: precision_at_100 value: 1.2970000000000002 - type: precision_at_1000 value: 0.165 - type: precision_at_3 value: 19.731 - type: precision_at_5 value: 14.013 - type: recall_at_1 value: 29.069 - type: recall_at_10 value: 58.343999999999994 - type: recall_at_100 value: 81.296 - type: recall_at_1000 value: 93.974 - type: recall_at_3 value: 44.7 - type: recall_at_5 value: 50.88700000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.905 - type: map_at_10 value: 33.983000000000004 - type: map_at_100 value: 35.372 - type: map_at_1000 value: 35.487 - type: map_at_3 value: 30.902 - type: map_at_5 value: 32.505 - type: mrr_at_1 value: 29.794999999999998 - type: mrr_at_10 value: 39.28 - type: mrr_at_100 value: 40.215 - type: mrr_at_1000 value: 40.276 - type: mrr_at_3 value: 36.701 - type: mrr_at_5 value: 38.105 - type: ndcg_at_1 value: 29.794999999999998 - type: ndcg_at_10 value: 40.041 - type: ndcg_at_100 value: 45.884 - type: ndcg_at_1000 value: 48.271 - type: ndcg_at_3 value: 34.931 - type: ndcg_at_5 value: 37.044 - type: precision_at_1 value: 29.794999999999998 - type: precision_at_10 value: 7.546 - type: precision_at_100 value: 1.216 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 16.933 - type: precision_at_5 value: 12.1 - type: recall_at_1 value: 23.905 - type: recall_at_10 value: 52.945 - type: recall_at_100 value: 77.551 - type: recall_at_1000 value: 93.793 - type: recall_at_3 value: 38.364 - type: recall_at_5 value: 44.044 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.24441666666667 - type: map_at_10 value: 34.4595 - type: map_at_100 value: 35.699999999999996 - type: map_at_1000 value: 35.8155 - type: map_at_3 value: 31.608333333333338 - type: map_at_5 value: 33.189416666666666 - type: mrr_at_1 value: 
29.825250000000004 - type: mrr_at_10 value: 38.60875 - type: mrr_at_100 value: 39.46575 - type: mrr_at_1000 value: 39.52458333333333 - type: mrr_at_3 value: 36.145166666666675 - type: mrr_at_5 value: 37.57625 - type: ndcg_at_1 value: 29.825250000000004 - type: ndcg_at_10 value: 39.88741666666667 - type: ndcg_at_100 value: 45.17966666666667 - type: ndcg_at_1000 value: 47.440583333333336 - type: ndcg_at_3 value: 35.04591666666666 - type: ndcg_at_5 value: 37.32025 - type: precision_at_1 value: 29.825250000000004 - type: precision_at_10 value: 7.07225 - type: precision_at_100 value: 1.1462499999999998 - type: precision_at_1000 value: 0.15325 - type: precision_at_3 value: 16.18375 - type: precision_at_5 value: 11.526833333333334 - type: recall_at_1 value: 25.24441666666667 - type: recall_at_10 value: 51.744916666666676 - type: recall_at_100 value: 75.04574999999998 - type: recall_at_1000 value: 90.65558333333334 - type: recall_at_3 value: 38.28349999999999 - type: recall_at_5 value: 44.16591666666667 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.237000000000002 - type: map_at_10 value: 30.667 - type: map_at_100 value: 31.592 - type: map_at_1000 value: 31.688 - type: map_at_3 value: 28.810999999999996 - type: map_at_5 value: 29.788999999999998 - type: mrr_at_1 value: 26.840000000000003 - type: mrr_at_10 value: 33.305 - type: mrr_at_100 value: 34.089000000000006 - type: mrr_at_1000 value: 34.159 - type: mrr_at_3 value: 31.518 - type: mrr_at_5 value: 32.469 - type: ndcg_at_1 value: 26.840000000000003 - type: ndcg_at_10 value: 34.541 - type: ndcg_at_100 value: 39.206 - type: ndcg_at_1000 value: 41.592 - type: ndcg_at_3 value: 31.005 - type: ndcg_at_5 value: 32.554 - type: precision_at_1 value: 26.840000000000003 - type: precision_at_10 value: 5.3069999999999995 - type: precision_at_100 value: 0.8340000000000001 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 13.292000000000002 - type: precision_at_5 value: 9.049 - type: recall_at_1 value: 24.237000000000002 - type: recall_at_10 value: 43.862 - type: recall_at_100 value: 65.352 - type: recall_at_1000 value: 82.704 - type: recall_at_3 value: 34.009 - type: recall_at_5 value: 37.878 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.482 - type: map_at_10 value: 23.249 - type: map_at_100 value: 24.388 - type: map_at_1000 value: 24.519 - type: map_at_3 value: 20.971 - type: map_at_5 value: 22.192 - type: mrr_at_1 value: 19.993 - type: mrr_at_10 value: 26.985 - type: mrr_at_100 value: 27.975 - type: mrr_at_1000 value: 28.052 - type: mrr_at_3 value: 24.954 - type: mrr_at_5 value: 26.070999999999998 - type: ndcg_at_1 value: 19.993 - type: ndcg_at_10 value: 27.656 - type: ndcg_at_100 value: 33.256 - type: ndcg_at_1000 value: 36.275 - type: ndcg_at_3 value: 23.644000000000002 - type: ndcg_at_5 value: 25.466 - type: precision_at_1 value: 19.993 - type: precision_at_10 value: 5.093 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 11.149000000000001 - type: precision_at_5 value: 8.149000000000001 - type: recall_at_1 value: 16.482 - type: recall_at_10 value: 37.141999999999996 - type: recall_at_100 value: 62.696 - type: recall_at_1000 value: 84.333 - type: recall_at_3 value: 26.031 - type: recall_at_5 value: 
30.660999999999998 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.887999999999998 - type: map_at_10 value: 34.101 - type: map_at_100 value: 35.27 - type: map_at_1000 value: 35.370000000000005 - type: map_at_3 value: 31.283 - type: map_at_5 value: 32.72 - type: mrr_at_1 value: 29.011 - type: mrr_at_10 value: 38.004 - type: mrr_at_100 value: 38.879000000000005 - type: mrr_at_1000 value: 38.938 - type: mrr_at_3 value: 35.571999999999996 - type: mrr_at_5 value: 36.789 - type: ndcg_at_1 value: 29.011 - type: ndcg_at_10 value: 39.586 - type: ndcg_at_100 value: 44.939 - type: ndcg_at_1000 value: 47.236 - type: ndcg_at_3 value: 34.4 - type: ndcg_at_5 value: 36.519 - type: precision_at_1 value: 29.011 - type: precision_at_10 value: 6.763 - type: precision_at_100 value: 1.059 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 15.609 - type: precision_at_5 value: 10.896 - type: recall_at_1 value: 24.887999999999998 - type: recall_at_10 value: 52.42 - type: recall_at_100 value: 75.803 - type: recall_at_1000 value: 91.725 - type: recall_at_3 value: 38.080999999999996 - type: recall_at_5 value: 43.47 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.953 - type: map_at_10 value: 32.649 - type: map_at_100 value: 34.181 - type: map_at_1000 value: 34.398 - type: map_at_3 value: 29.567 - type: map_at_5 value: 31.263 - type: mrr_at_1 value: 29.051 - type: mrr_at_10 value: 37.419999999999995 - type: mrr_at_100 value: 38.396 - type: mrr_at_1000 value: 38.458 - type: mrr_at_3 value: 34.782999999999994 - type: mrr_at_5 value: 36.254999999999995 - type: ndcg_at_1 value: 29.051 - type: ndcg_at_10 value: 38.595 - type: ndcg_at_100 value: 44.6 - type: ndcg_at_1000 value: 47.158 - type: ndcg_at_3 value: 33.56 - type: ndcg_at_5 value: 35.870000000000005 - type: precision_at_1 value: 29.051 - type: precision_at_10 value: 7.53 - type: precision_at_100 value: 1.538 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 15.744 - type: precision_at_5 value: 11.542 - type: recall_at_1 value: 23.953 - type: recall_at_10 value: 50.08200000000001 - type: recall_at_100 value: 77.364 - type: recall_at_1000 value: 93.57799999999999 - type: recall_at_3 value: 35.432 - type: recall_at_5 value: 41.875 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.72 - type: map_at_10 value: 25.724000000000004 - type: map_at_100 value: 26.846999999999998 - type: map_at_1000 value: 26.964 - type: map_at_3 value: 22.909 - type: map_at_5 value: 24.596999999999998 - type: mrr_at_1 value: 18.854000000000003 - type: mrr_at_10 value: 27.182000000000002 - type: mrr_at_100 value: 28.182000000000002 - type: mrr_at_1000 value: 28.274 - type: mrr_at_3 value: 24.276 - type: mrr_at_5 value: 26.115 - type: ndcg_at_1 value: 18.854000000000003 - type: ndcg_at_10 value: 30.470000000000002 - type: ndcg_at_100 value: 35.854 - type: ndcg_at_1000 value: 38.701 - type: ndcg_at_3 value: 24.924 - type: ndcg_at_5 value: 27.895999999999997 - type: precision_at_1 value: 18.854000000000003 - type: precision_at_10 value: 5.009 - type: precision_at_100 value: 0.835 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 10.721 - type: precision_at_5 
value: 8.133 - type: recall_at_1 value: 17.72 - type: recall_at_10 value: 43.617 - type: recall_at_100 value: 67.941 - type: recall_at_1000 value: 88.979 - type: recall_at_3 value: 29.044999999999998 - type: recall_at_5 value: 36.044 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 13.427 - type: map_at_10 value: 22.935 - type: map_at_100 value: 24.808 - type: map_at_1000 value: 24.994 - type: map_at_3 value: 19.533 - type: map_at_5 value: 21.261 - type: mrr_at_1 value: 30.945 - type: mrr_at_10 value: 43.242000000000004 - type: mrr_at_100 value: 44.013999999999996 - type: mrr_at_1000 value: 44.048 - type: mrr_at_3 value: 40.109 - type: mrr_at_5 value: 42.059999999999995 - type: ndcg_at_1 value: 30.945 - type: ndcg_at_10 value: 31.828 - type: ndcg_at_100 value: 38.801 - type: ndcg_at_1000 value: 42.126999999999995 - type: ndcg_at_3 value: 26.922 - type: ndcg_at_5 value: 28.483999999999998 - type: precision_at_1 value: 30.945 - type: precision_at_10 value: 9.844 - type: precision_at_100 value: 1.7309999999999999 - type: precision_at_1000 value: 0.23500000000000001 - type: precision_at_3 value: 20.477999999999998 - type: precision_at_5 value: 15.27 - type: recall_at_1 value: 13.427 - type: recall_at_10 value: 37.141000000000005 - type: recall_at_100 value: 61.007 - type: recall_at_1000 value: 79.742 - type: recall_at_3 value: 24.431 - type: recall_at_5 value: 29.725 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.122 - type: map_at_10 value: 18.799 - type: map_at_100 value: 25.724999999999998 - type: map_at_1000 value: 27.205000000000002 - type: map_at_3 value: 14.194999999999999 - type: map_at_5 value: 16.225 - type: mrr_at_1 value: 68.0 - type: mrr_at_10 value: 76.035 - type: mrr_at_100 value: 76.292 - type: mrr_at_1000 value: 76.297 - type: mrr_at_3 value: 74.458 - type: mrr_at_5 value: 75.558 - type: ndcg_at_1 value: 56.00000000000001 - type: ndcg_at_10 value: 39.761 - type: ndcg_at_100 value: 43.736999999999995 - type: ndcg_at_1000 value: 51.146 - type: ndcg_at_3 value: 45.921 - type: ndcg_at_5 value: 42.756 - type: precision_at_1 value: 68.0 - type: precision_at_10 value: 30.275000000000002 - type: precision_at_100 value: 9.343 - type: precision_at_1000 value: 1.8270000000000002 - type: precision_at_3 value: 49.167 - type: precision_at_5 value: 40.699999999999996 - type: recall_at_1 value: 9.122 - type: recall_at_10 value: 23.669999999999998 - type: recall_at_100 value: 48.719 - type: recall_at_1000 value: 72.033 - type: recall_at_3 value: 15.498999999999999 - type: recall_at_5 value: 18.657 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 55.885000000000005 - type: f1 value: 50.70726446938571 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 75.709 - type: map_at_10 value: 83.345 - type: map_at_100 value: 83.557 - type: map_at_1000 value: 83.572 - type: map_at_3 value: 82.425 - type: map_at_5 value: 83.013 - type: mrr_at_1 value: 81.593 - type: mrr_at_10 value: 88.331 - type: mrr_at_100 value: 88.408 - type: mrr_at_1000 value: 88.41 - type: mrr_at_3 value: 87.714 - type: mrr_at_5 value: 88.122 - type: ndcg_at_1 value: 81.593 - type: ndcg_at_10 
value: 86.925 - type: ndcg_at_100 value: 87.67 - type: ndcg_at_1000 value: 87.924 - type: ndcg_at_3 value: 85.5 - type: ndcg_at_5 value: 86.283 - type: precision_at_1 value: 81.593 - type: precision_at_10 value: 10.264 - type: precision_at_100 value: 1.084 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 32.388 - type: precision_at_5 value: 19.991 - type: recall_at_1 value: 75.709 - type: recall_at_10 value: 93.107 - type: recall_at_100 value: 96.024 - type: recall_at_1000 value: 97.603 - type: recall_at_3 value: 89.08500000000001 - type: recall_at_5 value: 91.15299999999999 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 19.121 - type: map_at_10 value: 31.78 - type: map_at_100 value: 33.497 - type: map_at_1000 value: 33.696 - type: map_at_3 value: 27.893 - type: map_at_5 value: 30.087000000000003 - type: mrr_at_1 value: 38.272 - type: mrr_at_10 value: 47.176 - type: mrr_at_100 value: 48.002 - type: mrr_at_1000 value: 48.044 - type: mrr_at_3 value: 45.086999999999996 - type: mrr_at_5 value: 46.337 - type: ndcg_at_1 value: 38.272 - type: ndcg_at_10 value: 39.145 - type: ndcg_at_100 value: 45.696999999999996 - type: ndcg_at_1000 value: 49.0 - type: ndcg_at_3 value: 36.148 - type: ndcg_at_5 value: 37.023 - type: precision_at_1 value: 38.272 - type: precision_at_10 value: 11.065 - type: precision_at_100 value: 1.7840000000000003 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 24.587999999999997 - type: precision_at_5 value: 18.056 - type: recall_at_1 value: 19.121 - type: recall_at_10 value: 44.857 - type: recall_at_100 value: 69.774 - type: recall_at_1000 value: 89.645 - type: recall_at_3 value: 32.588 - type: recall_at_5 value: 37.939 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 36.428 - type: map_at_10 value: 56.891999999999996 - type: map_at_100 value: 57.82899999999999 - type: map_at_1000 value: 57.896 - type: map_at_3 value: 53.762 - type: map_at_5 value: 55.718 - type: mrr_at_1 value: 72.856 - type: mrr_at_10 value: 79.245 - type: mrr_at_100 value: 79.515 - type: mrr_at_1000 value: 79.525 - type: mrr_at_3 value: 78.143 - type: mrr_at_5 value: 78.822 - type: ndcg_at_1 value: 72.856 - type: ndcg_at_10 value: 65.204 - type: ndcg_at_100 value: 68.552 - type: ndcg_at_1000 value: 69.902 - type: ndcg_at_3 value: 60.632 - type: ndcg_at_5 value: 63.161 - type: precision_at_1 value: 72.856 - type: precision_at_10 value: 13.65 - type: precision_at_100 value: 1.6260000000000001 - type: precision_at_1000 value: 0.181 - type: precision_at_3 value: 38.753 - type: precision_at_5 value: 25.251 - type: recall_at_1 value: 36.428 - type: recall_at_10 value: 68.25099999999999 - type: recall_at_100 value: 81.317 - type: recall_at_1000 value: 90.27 - type: recall_at_3 value: 58.13 - type: recall_at_5 value: 63.126000000000005 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 89.4868 - type: ap value: 84.88319192880247 - type: f1 value: 89.46144458052846 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.282999999999998 - type: map_at_10 value: 33.045 - type: map_at_100 value: 34.238 - type: map_at_1000 value: 34.29 - 
type: map_at_3 value: 29.305999999999997 - type: map_at_5 value: 31.391000000000002 - type: mrr_at_1 value: 21.92 - type: mrr_at_10 value: 33.649 - type: mrr_at_100 value: 34.791 - type: mrr_at_1000 value: 34.837 - type: mrr_at_3 value: 30.0 - type: mrr_at_5 value: 32.039 - type: ndcg_at_1 value: 21.92 - type: ndcg_at_10 value: 39.729 - type: ndcg_at_100 value: 45.484 - type: ndcg_at_1000 value: 46.817 - type: ndcg_at_3 value: 32.084 - type: ndcg_at_5 value: 35.789 - type: precision_at_1 value: 21.92 - type: precision_at_10 value: 6.297 - type: precision_at_100 value: 0.918 - type: precision_at_1000 value: 0.10300000000000001 - type: precision_at_3 value: 13.639000000000001 - type: precision_at_5 value: 10.054 - type: recall_at_1 value: 21.282999999999998 - type: recall_at_10 value: 60.343999999999994 - type: recall_at_100 value: 86.981 - type: recall_at_1000 value: 97.205 - type: recall_at_3 value: 39.452999999999996 - type: recall_at_5 value: 48.333 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 95.47879616963064 - type: f1 value: 95.21800589958251 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 79.09256725946192 - type: f1 value: 60.554043889452515 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 75.53463349024882 - type: f1 value: 73.14418495756476 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.22663080026899 - type: f1 value: 79.331456217501 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.50316010430136 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.15612040042282 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.36227552557184 - type: mrr value: 33.57901344209811 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.6610000000000005 - type: map_at_10 value: 12.992 - type: map_at_100 value: 16.756999999999998 - type: map_at_1000 value: 18.25 - type: map_at_3 value: 9.471 - type: map_at_5 value: 11.116 - type: mrr_at_1 value: 43.653 - type: mrr_at_10 value: 53.388999999999996 - type: mrr_at_100 value: 53.982 - type: mrr_at_1000 value: 54.033 - type: mrr_at_3 value: 51.858000000000004 - type: mrr_at_5 value: 53.019000000000005 - type: ndcg_at_1 value: 41.641 - type: ndcg_at_10 value: 34.691 - type: ndcg_at_100 value: 32.305 - type: ndcg_at_1000 value: 41.132999999999996 - type: ndcg_at_3 value: 40.614 - type: ndcg_at_5 value: 38.456 - type: precision_at_1 
value: 43.344 - type: precision_at_10 value: 25.881999999999998 - type: precision_at_100 value: 8.483 - type: precision_at_1000 value: 2.131 - type: precision_at_3 value: 38.803 - type: precision_at_5 value: 33.87 - type: recall_at_1 value: 5.6610000000000005 - type: recall_at_10 value: 16.826 - type: recall_at_100 value: 32.939 - type: recall_at_1000 value: 65.161 - type: recall_at_3 value: 10.756 - type: recall_at_5 value: 13.331000000000001 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 26.692 - type: map_at_10 value: 41.065000000000005 - type: map_at_100 value: 42.235 - type: map_at_1000 value: 42.27 - type: map_at_3 value: 36.635 - type: map_at_5 value: 39.219 - type: mrr_at_1 value: 30.214000000000002 - type: mrr_at_10 value: 43.443 - type: mrr_at_100 value: 44.326 - type: mrr_at_1000 value: 44.352000000000004 - type: mrr_at_3 value: 39.623999999999995 - type: mrr_at_5 value: 41.898 - type: ndcg_at_1 value: 30.214000000000002 - type: ndcg_at_10 value: 48.692 - type: ndcg_at_100 value: 53.671 - type: ndcg_at_1000 value: 54.522000000000006 - type: ndcg_at_3 value: 40.245 - type: ndcg_at_5 value: 44.580999999999996 - type: precision_at_1 value: 30.214000000000002 - type: precision_at_10 value: 8.3 - type: precision_at_100 value: 1.1079999999999999 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 18.521 - type: precision_at_5 value: 13.627 - type: recall_at_1 value: 26.692 - type: recall_at_10 value: 69.699 - type: recall_at_100 value: 91.425 - type: recall_at_1000 value: 97.78099999999999 - type: recall_at_3 value: 47.711 - type: recall_at_5 value: 57.643 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.962 - type: map_at_10 value: 84.772 - type: map_at_100 value: 85.402 - type: map_at_1000 value: 85.418 - type: map_at_3 value: 81.89 - type: map_at_5 value: 83.685 - type: mrr_at_1 value: 81.67 - type: mrr_at_10 value: 87.681 - type: mrr_at_100 value: 87.792 - type: mrr_at_1000 value: 87.79299999999999 - type: mrr_at_3 value: 86.803 - type: mrr_at_5 value: 87.392 - type: ndcg_at_1 value: 81.69 - type: ndcg_at_10 value: 88.429 - type: ndcg_at_100 value: 89.66 - type: ndcg_at_1000 value: 89.762 - type: ndcg_at_3 value: 85.75 - type: ndcg_at_5 value: 87.20700000000001 - type: precision_at_1 value: 81.69 - type: precision_at_10 value: 13.395000000000001 - type: precision_at_100 value: 1.528 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.507000000000005 - type: precision_at_5 value: 24.614 - type: recall_at_1 value: 70.962 - type: recall_at_10 value: 95.339 - type: recall_at_100 value: 99.543 - type: recall_at_1000 value: 99.984 - type: recall_at_3 value: 87.54899999999999 - type: recall_at_5 value: 91.726 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 55.506631779239555 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 60.63731341848479 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.852 - type: map_at_10 value: 13.175 - type: map_at_100 value: 
15.623999999999999 - type: map_at_1000 value: 16.002 - type: map_at_3 value: 9.103 - type: map_at_5 value: 11.068999999999999 - type: mrr_at_1 value: 23.9 - type: mrr_at_10 value: 35.847 - type: mrr_at_100 value: 36.968 - type: mrr_at_1000 value: 37.018 - type: mrr_at_3 value: 32.300000000000004 - type: mrr_at_5 value: 34.14 - type: ndcg_at_1 value: 23.9 - type: ndcg_at_10 value: 21.889 - type: ndcg_at_100 value: 30.903000000000002 - type: ndcg_at_1000 value: 36.992000000000004 - type: ndcg_at_3 value: 20.274 - type: ndcg_at_5 value: 17.773 - type: precision_at_1 value: 23.9 - type: precision_at_10 value: 11.61 - type: precision_at_100 value: 2.4539999999999997 - type: precision_at_1000 value: 0.391 - type: precision_at_3 value: 19.133 - type: precision_at_5 value: 15.740000000000002 - type: recall_at_1 value: 4.852 - type: recall_at_10 value: 23.507 - type: recall_at_100 value: 49.775000000000006 - type: recall_at_1000 value: 79.308 - type: recall_at_3 value: 11.637 - type: recall_at_5 value: 15.947 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 86.03345827446948 - type: cos_sim_spearman value: 80.53174518259549 - type: euclidean_pearson value: 83.44538971660883 - type: euclidean_spearman value: 80.57344324098692 - type: manhattan_pearson value: 83.36528808195459 - type: manhattan_spearman value: 80.48931287157902 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.21363088257881 - type: cos_sim_spearman value: 75.56589127055523 - type: euclidean_pearson value: 82.32868324521908 - type: euclidean_spearman value: 75.31928550664554 - type: manhattan_pearson value: 82.31332875713211 - type: manhattan_spearman value: 75.35376322099196 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 85.09085593258487 - type: cos_sim_spearman value: 86.26355088415221 - type: euclidean_pearson value: 85.49646115361156 - type: euclidean_spearman value: 86.20652472228703 - type: manhattan_pearson value: 85.44084081123815 - type: manhattan_spearman value: 86.1162623448951 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 84.68250248349368 - type: cos_sim_spearman value: 82.29883673695083 - type: euclidean_pearson value: 84.17633035446019 - type: euclidean_spearman value: 82.19990511264791 - type: manhattan_pearson value: 84.17408410692279 - type: manhattan_spearman value: 82.249873895981 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 87.31878760045024 - type: cos_sim_spearman value: 88.7364409031183 - type: euclidean_pearson value: 88.230537618603 - type: euclidean_spearman value: 88.76484309646318 - type: manhattan_pearson value: 88.17689071136469 - type: manhattan_spearman value: 88.72809249037928 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.41078559110638 - type: cos_sim_spearman value: 85.27439135411049 - 
type: euclidean_pearson value: 84.5333571592088 - type: euclidean_spearman value: 85.25645460575957 - type: manhattan_pearson value: 84.38428921610226 - type: manhattan_spearman value: 85.07796040798796 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.82374132382576 - type: cos_sim_spearman value: 89.02101343562433 - type: euclidean_pearson value: 89.50729765458932 - type: euclidean_spearman value: 89.04184772869253 - type: manhattan_pearson value: 89.51737904059856 - type: manhattan_spearman value: 89.12925950440676 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.56051823873482 - type: cos_sim_spearman value: 68.50988748185463 - type: euclidean_pearson value: 69.16524346147456 - type: euclidean_spearman value: 68.61859952449579 - type: manhattan_pearson value: 69.10618915706995 - type: manhattan_spearman value: 68.36401769459522 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 85.4159693872625 - type: cos_sim_spearman value: 87.07819121764247 - type: euclidean_pearson value: 87.03013260863153 - type: euclidean_spearman value: 87.06547293631309 - type: manhattan_pearson value: 86.8129744446062 - type: manhattan_spearman value: 86.88494096335627 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 86.47758088996575 - type: mrr value: 96.17891458577733 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.538999999999994 - type: map_at_10 value: 66.562 - type: map_at_100 value: 67.254 - type: map_at_1000 value: 67.284 - type: map_at_3 value: 63.722 - type: map_at_5 value: 65.422 - type: mrr_at_1 value: 60.0 - type: mrr_at_10 value: 67.354 - type: mrr_at_100 value: 67.908 - type: mrr_at_1000 value: 67.93299999999999 - type: mrr_at_3 value: 65.056 - type: mrr_at_5 value: 66.43900000000001 - type: ndcg_at_1 value: 60.0 - type: ndcg_at_10 value: 70.858 - type: ndcg_at_100 value: 73.67099999999999 - type: ndcg_at_1000 value: 74.26700000000001 - type: ndcg_at_3 value: 65.911 - type: ndcg_at_5 value: 68.42200000000001 - type: precision_at_1 value: 60.0 - type: precision_at_10 value: 9.4 - type: precision_at_100 value: 1.083 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.444 - type: precision_at_5 value: 17.0 - type: recall_at_1 value: 57.538999999999994 - type: recall_at_10 value: 83.233 - type: recall_at_100 value: 95.667 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 69.883 - type: recall_at_5 value: 76.19399999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.82574257425742 - type: cos_sim_ap value: 95.78722833053911 - type: cos_sim_f1 value: 90.94650205761316 - type: cos_sim_precision value: 93.64406779661016 - type: cos_sim_recall value: 88.4 - type: 
dot_accuracy value: 99.83366336633664 - type: dot_ap value: 95.89733601612964 - type: dot_f1 value: 91.41981613891727 - type: dot_precision value: 93.42379958246346 - type: dot_recall value: 89.5 - type: euclidean_accuracy value: 99.82574257425742 - type: euclidean_ap value: 95.75227035138846 - type: euclidean_f1 value: 90.96509240246407 - type: euclidean_precision value: 93.45991561181435 - type: euclidean_recall value: 88.6 - type: manhattan_accuracy value: 99.82574257425742 - type: manhattan_ap value: 95.76278266220176 - type: manhattan_f1 value: 91.08409321175279 - type: manhattan_precision value: 92.29979466119097 - type: manhattan_recall value: 89.9 - type: max_accuracy value: 99.83366336633664 - type: max_ap value: 95.89733601612964 - type: max_f1 value: 91.41981613891727 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 61.905425988638605 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 36.159589881679736 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 53.0605499476397 - type: mrr value: 53.91594516594517 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.202718009067 - type: cos_sim_spearman value: 31.136199912366987 - type: dot_pearson value: 30.66329011927951 - type: dot_spearman value: 30.107664909625107 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.209 - type: map_at_10 value: 1.712 - type: map_at_100 value: 9.464 - type: map_at_1000 value: 23.437 - type: map_at_3 value: 0.609 - type: map_at_5 value: 0.9440000000000001 - type: mrr_at_1 value: 78.0 - type: mrr_at_10 value: 86.833 - type: mrr_at_100 value: 86.833 - type: mrr_at_1000 value: 86.833 - type: mrr_at_3 value: 85.333 - type: mrr_at_5 value: 86.833 - type: ndcg_at_1 value: 74.0 - type: ndcg_at_10 value: 69.14 - type: ndcg_at_100 value: 53.047999999999995 - type: ndcg_at_1000 value: 48.577 - type: ndcg_at_3 value: 75.592 - type: ndcg_at_5 value: 72.509 - type: precision_at_1 value: 78.0 - type: precision_at_10 value: 73.0 - type: precision_at_100 value: 54.44 - type: precision_at_1000 value: 21.326 - type: precision_at_3 value: 80.667 - type: precision_at_5 value: 77.2 - type: recall_at_1 value: 0.209 - type: recall_at_10 value: 1.932 - type: recall_at_100 value: 13.211999999999998 - type: recall_at_1000 value: 45.774 - type: recall_at_3 value: 0.644 - type: recall_at_5 value: 1.0290000000000001 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.609 - type: map_at_10 value: 8.334999999999999 - type: map_at_100 value: 14.604000000000001 - type: map_at_1000 value: 16.177 - type: map_at_3 value: 4.87 - type: map_at_5 value: 6.3149999999999995 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 45.047 - type: mrr_at_100 value: 45.808 - type: mrr_at_1000 value: 
45.808 - type: mrr_at_3 value: 41.497 - type: mrr_at_5 value: 43.231 - type: ndcg_at_1 value: 30.612000000000002 - type: ndcg_at_10 value: 21.193 - type: ndcg_at_100 value: 34.97 - type: ndcg_at_1000 value: 46.69 - type: ndcg_at_3 value: 24.823 - type: ndcg_at_5 value: 22.872999999999998 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 17.959 - type: precision_at_100 value: 7.4079999999999995 - type: precision_at_1000 value: 1.537 - type: precision_at_3 value: 25.85 - type: precision_at_5 value: 22.448999999999998 - type: recall_at_1 value: 2.609 - type: recall_at_10 value: 13.63 - type: recall_at_100 value: 47.014 - type: recall_at_1000 value: 83.176 - type: recall_at_3 value: 5.925 - type: recall_at_5 value: 8.574 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 72.80239999999999 - type: ap value: 15.497911013214791 - type: f1 value: 56.258411577947285 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.00452744765139 - type: f1 value: 61.42228624410908 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 50.00516915962345 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.62317458425225 - type: cos_sim_ap value: 72.95115658063823 - type: cos_sim_f1 value: 66.78976523344764 - type: cos_sim_precision value: 66.77215189873418 - type: cos_sim_recall value: 66.80738786279683 - type: dot_accuracy value: 85.62317458425225 - type: dot_ap value: 73.10385271517778 - type: dot_f1 value: 66.94853829427399 - type: dot_precision value: 61.74242424242424 - type: dot_recall value: 73.11345646437995 - type: euclidean_accuracy value: 85.65893783155511 - type: euclidean_ap value: 72.87428208473992 - type: euclidean_f1 value: 66.70919994896005 - type: euclidean_precision value: 64.5910551025451 - type: euclidean_recall value: 68.97097625329816 - type: manhattan_accuracy value: 85.59933241938367 - type: manhattan_ap value: 72.67282695064966 - type: manhattan_f1 value: 66.67537215983286 - type: manhattan_precision value: 66.00310237849017 - type: manhattan_recall value: 67.36147757255937 - type: max_accuracy value: 85.65893783155511 - type: max_ap value: 73.10385271517778 - type: max_f1 value: 66.94853829427399 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.69096130709822 - type: cos_sim_ap value: 85.30326978668063 - type: cos_sim_f1 value: 77.747088683189 - type: cos_sim_precision value: 75.4491451753115 - type: cos_sim_recall value: 80.189405605174 - type: dot_accuracy value: 88.43870066363954 - type: dot_ap value: 84.62999949222983 - type: dot_f1 value: 77.3074661963551 - type: dot_precision value: 73.93871239808828 - type: dot_recall value: 80.99784416384355 - type: 
euclidean_accuracy value: 88.70066363953894 - type: euclidean_ap value: 85.34184508966621 - type: euclidean_f1 value: 77.76871756856931 - type: euclidean_precision value: 74.97855917667239 - type: euclidean_recall value: 80.77456113335386 - type: manhattan_accuracy value: 88.68319944114566 - type: manhattan_ap value: 85.3026464242333 - type: manhattan_f1 value: 77.66561049296294 - type: manhattan_precision value: 74.4665818849795 - type: manhattan_recall value: 81.15183246073299 - type: max_accuracy value: 88.70066363953894 - type: max_ap value: 85.34184508966621 - type: max_f1 value: 77.76871756856931 ---

# GIST small Embedding v0

*GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning* The model is fine-tuned on top of BAAI/bge-small-en-v1.5 using the MEDI dataset augmented with mined triplets from the MTEB Classification training dataset (excluding data from the Amazon Polarity Classification task). The model does not require any instruction for generating embeddings, so queries for retrieval tasks can be encoded directly without crafting instructions. Technical paper: GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning # Data The dataset used is a compilation of the MEDI and MTEB Classification training datasets. Third-party datasets may be subject to additional terms and conditions under their associated licenses. A HuggingFace Dataset version of the compiled dataset, and the specific revision used to train the model, is available: - Dataset: avsolatorio/medi-data-mteb_avs_triplets - Revision: 238a0499b6e6b690cc64ea56fde8461daa8341bb The dataset contains a key, which can be used to select only the mteb classification tasks (prefixed with ). The **MEDI Dataset** is published in the following paper: One Embedder, Any Task: Instruction-Finetuned Text Embeddings. The MTEB Benchmark results of the GIST embedding model, compared with the base model, suggest that the fine-tuning dataset has perturbed the model considerably, resulting in significant improvements on certain tasks while degrading performance on others. The retrieval performance for the TRECCOVID task is of note. The fine-tuning dataset does not contain significant knowledge about COVID-19, which may explain the observed performance degradation. We found some evidence, detailed in the paper, that thematic coverage of the fine-tuning data can affect downstream performance. # Usage The model can be loaded with the Sentence Transformers library. # Training Parameters Below are the training parameters used to fine-tune the model: # Evaluation The model was evaluated using the MTEB Evaluation suite; a minimal evaluation sketch follows below. # Citation Please cite our work if you use GISTEmbed or the datasets we published in your projects or research. 🤗 # Acknowledgements This work is supported by the \"KCP IV - Exploring Data Use in the Development Economics Literature using Large Language Models (AI and LLMs)\" project funded by the Knowledge for Change Program (KCP) of the World Bank - RA-P503405-RESE-TF0C3444. The findings, interpretations, and conclusions expressed in this material are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity comparison, classification, clustering, and retrieval by transforming text into numerical representations."
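A sketch of evaluating this model on a single MTEB task, assuming the mteb package's MTEB(tasks=...) interface and the repo ID avsolatorio/GIST-small-Embedding-v0; the chosen task and output folder are illustrative only:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Load the small GIST model (repo ID assumed from this card).
model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

# Run a single task as a smoke test; the full benchmark covers many more tasks.
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results/GIST-small-Embedding-v0")
print(results)
```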
+} \ No newline at end of file diff --git a/data/model_data_json/bartowski_DeepSeek-R1-Distill-Qwen-14B-GGUF.json b/data/model_data_json/bartowski_DeepSeek-R1-Distill-Qwen-14B-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..a16dc09222c0541b8977585cda2a780da87bf550 --- /dev/null +++ b/data/model_data_json/bartowski_DeepSeek-R1-Distill-Qwen-14B-GGUF.json @@ -0,0 +1,16 @@ +{ + "model_id": "bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF", + "downloads": 218828, + "tags": [ + "gguf", + "text-generation", + "base_model:deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", + "base_model:quantized:deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", + "endpoints_compatible", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- quantized_by: bartowski pipeline_tag: text-generation base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --- ## Llamacpp imatrix Quantizations of DeepSeek-R1-Distill-Qwen-14B Using Click to view download instructions First, make sure you have huggingface-cli installed: Then, you can target the specific file you want: If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run: You can either specify a new local-dir (DeepSeek-R1-Distill-Qwen-14B-Q8_0) or download them all in place (./)
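The actual download commands are elided from this dump. A rough equivalent using the `huggingface_hub` Python API (the single-file name below is an illustrative assumption; pick whichever quant you want):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Download a single quant file (filename is an illustrative assumption):
hf_hub_download(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",
    local_dir="./",
)

# Quants over 50GB are split into a folder; fetch all shards by pattern:
snapshot_download(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    allow_patterns=["DeepSeek-R1-Distill-Qwen-14B-Q8_0/*"],
    local_dir="./",
)
```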
## ARM/AVX information Previously, you would download Q4_0_4_4/4_8/8_8, and these would have their weights interleaved in memory in order to improve performance on ARM and AVX machines by loading up more data in one pass. Now, however, there is something called \"online repacking\" for weights; details in this PR. If you use Q4_0 and your hardware would benefit from repacking weights, it will do it automatically on the fly. As of llama.cpp build b4282 you will not be able to run the Q4_0_X_X files and will instead need to use Q4_0. Additionally, if you want to get slightly better quality for , you can use IQ4_NL thanks to this PR, which will also repack the weights for ARM, though only the 4_4 for now. The loading time may be slower, but it will result in an overall speed increase.
Click to view Q4_0_X_X information (deprecated) I'm keeping this section to show the potential theoretical uplift in performance from using Q4_0 with online repacking.
Click to view benchmarks on an AVX2 system (EPYC7702) | model | size | params | backend | threads | test | t/s | % (vs Q4_0) | | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |-------------: | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 83% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% | Q4_0_8_8 offers a nice bump to prompt processing and a small bump to text generation
## Which file should I choose?
Click here for details A great write-up with charts showing various performance levels is provided by Artefact2 here The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have. If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM. If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total. Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'. If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M. If you want to get more into the weeds, you can check out this extremely useful feature chart: llama.cpp feature matrix But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size. These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalents, so speed vs performance is a tradeoff you'll have to decide. The I-quants are *not* compatible with Vulkan, which also targets AMD, so if you have an AMD card, double check whether you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
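The sizing rule above is simple arithmetic; a toy sketch (the quant names are real llama.cpp formats, but the file sizes and VRAM figure below are made-up placeholders):

```python
# Pick the largest quant that leaves ~2 GB of VRAM headroom.
quant_sizes_gb = {"Q8_0": 15.7, "Q6_K": 12.1, "Q5_K_M": 10.5, "Q4_K_M": 9.0, "IQ3_M": 7.2}
vram_gb = 12  # assumed GPU VRAM

fits = {name: size for name, size in quant_sizes_gb.items() if size <= vram_gb - 2}
best = max(fits, key=fits.get)  # largest file that still fits with headroom
print(best)  # -> Q4_K_M for these example numbers
```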
## Credits Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset. Thank you ZeroWw for the inspiration to experiment with embed/output. Want to support my work? Visit my ko-fi page here:", + "model_explanation_gemini": "Quantized versions of the DeepSeek-R1-Distill-Qwen-14B text-generation model optimized for efficient inference with varying quality-to-size tradeoffs." +} \ No newline at end of file diff --git a/data/model_data_json/bartowski_Llama-3.2-1B-Instruct-GGUF.json b/data/model_data_json/bartowski_Llama-3.2-1B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..443f9bc91157272694cdaba2f3d2a49ed1faabd1 --- /dev/null +++ b/data/model_data_json/bartowski_Llama-3.2-1B-Instruct-GGUF.json @@ -0,0 +1,28 @@ +{ + "model_id": "bartowski/Llama-3.2-1B-Instruct-GGUF", + "downloads": 88174, + "tags": [ + "gguf", + "facebook", + "meta", + "llama", + "llama-3", + "text-generation", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "base_model:meta-llama/Llama-3.2-1B-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-1B-Instruct", + "license:llama3.2", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- base_model: meta-llama/Llama-3.2-1B-Instruct language: - en - de - fr - it - pt - hi - es - th license: llama3.2 pipeline_tag: text-generation tags: - facebook - meta - llama - llama-3 quantized_by: bartowski extra_gated_prompt: \"### LLAMA 3.2 COMMUNITY LICENSE AGREEMENT\\n\\nLlama 3.2 Version\\ \\ Release Date: September 25, 2024\\n\\n“Agreement” means the terms and conditions\\ \\ for use, reproduction, distribution and modification of the Llama Materials set\\ \\ forth herein.\\n\\n“Documentation” means the specifications, manuals and documentation\\ \\ accompanying Llama 3.2 distributed by Meta at \\n“Licensee” or “you” means you, or your employer or any other person or entity\\ \\ (if you are entering into this Agreement on such person or entity’s behalf),\\ \\ of the age required under applicable laws, rules or regulations to provide legal\\ \\ consent and that has legal authority to bind your employer or such other person\\ \\ or entity if you are entering in this Agreement on their behalf.\\n\\n“Llama 3.2”\\ \\ means the foundational large language models and software and algorithms, including\\ \\ machine-learning model code, trained model weights, inference-enabling code, training-enabling\\ \\ code, fine-tuning enabling code and other elements of the foregoing distributed\\ \\ by Meta at Materials” means,\\ \\ collectively, Meta’s proprietary Llama 3.2 and Documentation (and any portion\\ \\ thereof) made available under this Agreement.\\n\\n“Meta” or “we” means Meta Platforms\\ \\ Ireland Limited (if you are located in or, if you are an entity, your principal\\ \\ place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if\\ \\ you are located outside of the EEA or Switzerland). \\n\\nBy clicking “I Accept”\\ \\ below or by using or distributing any portion or element of the Llama Materials,\\ \\ you agree to be bound by this Agreement.\\n\\n1. License Rights and Redistribution.\\n\\ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable\\ \\ and royalty-free limited license under Meta’s intellectual property or other rights\\ \\ owned by Meta embodied in the Llama Materials to use, reproduce, distribute,\\ \\ copy, create derivative works of, and make modifications to the Llama Materials.\\ \\ \\nb. 
Redistribution and Use. \\ni. If you distribute or make available the Llama\\ \\ Materials (or any derivative works thereof), or a product or service (including\\ \\ another AI model) that contains any of them, you shall (A) provide a copy of this\\ \\ Agreement with any such Llama Materials; and (B) prominently display “Built with\\ \\ Llama” on a related website, user interface, blogpost, about page, or product\\ \\ documentation. If you use the Llama Materials or any outputs or results of the\\ \\ Llama Materials to create, train, fine tune, or otherwise improve an AI model,\\ \\ which is distributed or made available, you shall also include “Llama” at the\\ \\ beginning of any such AI model name.\\nii. If you receive Llama Materials, or any\\ \\ derivative works thereof, from a Licensee as part of an integrated end user product,\\ \\ then Section 2 of this Agreement will not apply to you. \\niii. You must retain\\ \\ in all copies of the Llama Materials that you distribute the following attribution\\ \\ notice within a “Notice” text file distributed as a part of such copies: “Llama\\ \\ 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms,\\ \\ Inc. All Rights Reserved.”\\niv. Your use of the Llama Materials must comply with\\ \\ applicable laws and regulations (including trade compliance laws and regulations)\\ \\ and adhere to the Acceptable Use Policy for the Llama Materials (available at\\ \\ which is hereby incorporated by reference\\ \\ into this Agreement.\\n \\n2. Additional Commercial Terms. If, on the Llama 3.2\\ \\ version release date, the monthly active users of the products or services made\\ \\ available by or for Licensee, or Licensee’s affiliates, is greater than 700 million\\ \\ monthly active users in the preceding calendar month, you must request a license\\ \\ from Meta, which Meta may grant to you in its sole discretion, and you are not\\ \\ authorized to exercise any of the rights under this Agreement unless or until\\ \\ Meta otherwise expressly grants you such rights.\\n3. Disclaimer of Warranty. UNLESS\\ \\ REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM\\ \\ ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS\\ \\ ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION,\\ \\ ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR\\ \\ PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING\\ \\ OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR\\ \\ USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.\\n4. Limitation of Liability.\\ \\ IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY,\\ \\ WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING\\ \\ OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,\\ \\ INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE\\ \\ BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.\\n5. Intellectual Property.\\n\\ a. No trademark licenses are granted under this Agreement, and in connection with\\ \\ the Llama Materials, neither Meta nor Licensee may use any name or mark owned\\ \\ by or associated with the other or any of its affiliates, except as required\\ \\ for reasonable and customary use in describing and redistributing the Llama Materials\\ \\ or as set forth in this Section 5(a). 
Meta hereby grants you a license to use\\ \\ “Llama” (the “Mark”) solely as required to comply with the last sentence of Section\\ \\ 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at\\ \\ All goodwill arising\\ \\ out of your use of the Mark will inure to the benefit of Meta.\\nb. Subject to\\ \\ Meta’s ownership of Llama Materials and derivatives made by or for Meta, with\\ \\ respect to any derivative works and modifications of the Llama Materials that\\ \\ are made by you, as between you and Meta, you are and will be the owner of such\\ \\ derivative works and modifications.\\nc. If you institute litigation or other proceedings\\ \\ against Meta or any entity (including a cross-claim or counterclaim in a lawsuit)\\ \\ alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion\\ \\ of any of the foregoing, constitutes infringement of intellectual property or\\ \\ other rights owned or licensable by you, then any licenses granted to you under\\ \\ this Agreement shall terminate as of the date such litigation or claim is filed\\ \\ or instituted. You will indemnify and hold harmless Meta from and against any\\ \\ claim by any third party arising out of or related to your use or distribution\\ \\ of the Llama Materials.\\n6. Term and Termination. The term of this Agreement will\\ \\ commence upon your acceptance of this Agreement or access to the Llama Materials\\ \\ and will continue in full force and effect until terminated in accordance with\\ \\ the terms and conditions herein. Meta may terminate this Agreement if you are\\ \\ in breach of any term or condition of this Agreement. Upon termination of this\\ \\ Agreement, you shall delete and cease use of the Llama Materials. Sections 3,\\ \\ 4 and 7 shall survive the termination of this Agreement. \\n7. Governing Law and\\ \\ Jurisdiction. This Agreement will be governed and construed under the laws of\\ \\ the State of California without regard to choice of law principles, and the UN\\ \\ Convention on Contracts for the International Sale of Goods does not apply to\\ \\ this Agreement. The courts of California shall have exclusive jurisdiction of\\ \\ any dispute arising out of this Agreement. \\n### Llama 3.2 Acceptable Use Policy\\n\\ Meta is committed to promoting safe and fair use of its tools and features, including\\ \\ Llama 3.2. If you access or use Llama 3.2, you agree to this Acceptable Use Policy\\ \\ (“**Policy**”). The most recent copy of this policy can be found at #### Prohibited Uses\\nWe want everyone to use Llama 3.2 safely and responsibly.\\ \\ You agree you will not use, or allow others to use, Llama 3.2 to:\\n1. Violate\\ \\ the law or others’ rights, including to:\\n 1. Engage in, promote, generate,\\ \\ contribute to, encourage, plan, incite, or further illegal or unlawful activity\\ \\ or content, such as:\\n 1. Violence or terrorism\\n 2. Exploitation\\ \\ or harm to children, including the solicitation, creation, acquisition, or dissemination\\ \\ of child exploitative content or failure to report Child Sexual Abuse Material\\n\\ \\ 3. Human trafficking, exploitation, and sexual violence\\n 4. The\\ \\ illegal distribution of information or materials to minors, including obscene\\ \\ materials, or failure to employ legally required age-gating in connection with\\ \\ such information or materials.\\n 5. Sexual solicitation\\n 6. Any\\ \\ other criminal activity\\n 1. 
Engage in, promote, incite, or facilitate the\\ \\ harassment, abuse, threatening, or bullying of individuals or groups of individuals\\n\\ \\ 2. Engage in, promote, incite, or facilitate discrimination or other unlawful\\ \\ or harmful conduct in the provision of employment, employment benefits, credit,\\ \\ housing, other economic benefits, or other essential goods and services\\n 3.\\ \\ Engage in the unauthorized or unlicensed practice of any profession including,\\ \\ but not limited to, financial, legal, medical/health, or related professional\\ \\ practices\\n 4. Collect, process, disclose, generate, or infer private or sensitive\\ \\ information about individuals, including information about individuals’ identity,\\ \\ health, or demographic information, unless you have obtained the right to do so\\ \\ in accordance with applicable law\\n 5. Engage in or facilitate any action or\\ \\ generate any content that infringes, misappropriates, or otherwise violates any\\ \\ third-party rights, including the outputs or results of any products or services\\ \\ using the Llama Materials\\n 6. Create, generate, or facilitate the creation\\ \\ of malicious code, malware, computer viruses or do anything else that could disable,\\ \\ overburden, interfere with or impair the proper working, integrity, operation\\ \\ or appearance of a website or computer system\\n 7. Engage in any action, or\\ \\ facilitate any action, to intentionally circumvent or remove usage restrictions\\ \\ or other safety measures, or to enable functionality disabled by Meta \\n2. Engage\\ \\ in, promote, incite, facilitate, or assist in the planning or development of activities\\ \\ that present a risk of death or bodily harm to individuals, including use of Llama\\ \\ 3.2 related to the following:\\n 8. Military, warfare, nuclear industries or\\ \\ applications, espionage, use for materials or activities that are subject to the\\ \\ International Traffic Arms Regulations (ITAR) maintained by the United States\\ \\ Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989\\ \\ or the Chemical Weapons Convention Implementation Act of 1997\\n 9. Guns and\\ \\ illegal weapons (including weapon development)\\n 10. Illegal drugs and regulated/controlled\\ \\ substances\\n 11. Operation of critical infrastructure, transportation technologies,\\ \\ or heavy machinery\\n 12. Self-harm or harm to others, including suicide, cutting,\\ \\ and eating disorders\\n 13. Any content intended to incite or promote violence,\\ \\ abuse, or any infliction of bodily harm to an individual\\n3. Intentionally deceive\\ \\ or mislead others, including use of Llama 3.2 related to the following:\\n 14.\\ \\ Generating, promoting, or furthering fraud or the creation or promotion of disinformation\\n\\ \\ 15. Generating, promoting, or furthering defamatory content, including the\\ \\ creation of defamatory statements, images, or other content\\n 16. Generating,\\ \\ promoting, or further distributing spam\\n 17. Impersonating another individual\\ \\ without consent, authorization, or legal right\\n 18. Representing that the\\ \\ use of Llama 3.2 or outputs are human-generated\\n 19. Generating or facilitating\\ \\ false online engagement, including fake reviews and other means of fake online\\ \\ engagement \\n4. Fail to appropriately disclose to end users any known dangers\\ \\ of your AI system 5. 
Interact with third party tools, models, or software designed\\ \\ to generate unlawful content or engage in unlawful or harmful conduct and/or represent\\ \\ that the outputs of such tools, models, or software are associated with Meta or\\ \\ Llama 3.2\\n\\nWith respect to any multimodal models included in Llama 3.2, the\\ \\ rights granted under Section 1(a) of the Llama 3.2 Community License Agreement\\ \\ are not being granted to you if you are an individual domiciled in, or a company\\ \\ with a principal place of business in, the European Union. This restriction does\\ \\ not apply to end users of a product or service that incorporates any such multimodal\\ \\ models.\\n\\nPlease report any violation of this Policy, software “bug,” or other\\ \\ problems that could lead to a violation of this Policy through one of the following\\ \\ means:\\n\\n* Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback\\n\\ * Reporting bugs and security concerns: facebook.com/whitehat/info\\n\\ * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama\\ \\ 3.2: LlamaUseReport@meta.com\" extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location ? By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy : checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Llamacpp imatrix Quantizations of Llama-3.2-1B-Instruct Using
Click to view download instructions First, make sure you have huggingface-cli installed: Then, you can target the specific file you want: If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run: You can either specify a new local-dir (Llama-3.2-1B-Instruct-Q8_0) or download them all in place (./) ## ARM/AVX information Previously, you would download Q4_0_4_4/4_8/8_8, and these would have their weights interleaved in memory in order to improve performance on ARM and AVX machines by loading up more data in one pass. Now, however, there is something called \"online repacking\" for weights; details in this PR. If you use Q4_0 and your hardware would benefit from repacking weights, it will do it automatically on the fly. As of llama.cpp build b4282 you will not be able to run the Q4_0_X_X files and will instead need to use Q4_0. Additionally, if you want to get slightly better quality for , you can use IQ4_NL thanks to this PR, which will also repack the weights for ARM, though only the 4_4 for now. The loading time may be slower, but it will result in an overall speed increase.
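Once a GGUF file is downloaded, one way to run it locally is via the third-party llama-cpp-python bindings (not described in this card; the file path and settings below are illustrative assumptions):

```python
from llama_cpp import Llama

# Path and context size are illustrative assumptions.
llm = Llama(model_path="./Llama-3.2-1B-Instruct-Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization is."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```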
Click to view Q4_0_X_X information (deprecated) I'm keeping this section to show the potential theoretical uplift in performance from using Q4_0 with online repacking.
Click to view benchmarks on an AVX2 system (EPYC7702) | model | size | params | backend | threads | test | t/s | % (vs Q4_0) | | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |-------------: | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% | | qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 83% | | qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% | | qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% | Q4_0_8_8 offers a nice bump to prompt processing and a small bump to text generation
## Which file should I choose?
Click here for details A great write-up with charts showing various performance levels is provided by Artefact2 here The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have. If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM. If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total. Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'. If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M. If you want to get more into the weeds, you can check out this extremely useful feature chart: llama.cpp feature matrix But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size. These I-quants can also be used on CPU, but will be slower than their K-quant equivalents, so speed vs performance is a tradeoff you'll have to decide.
## Credits Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset. Thank you ZeroWw for the inspiration to experiment with embed/output. Thank you to LM Studio for sponsoring my work. Want to support my work? Visit my ko-fi page here:", + "model_explanation_gemini": "A quantized text-generation model based on Meta's Llama-3.2-1B-Instruct, designed for generating text under specific licensing terms." +} \ No newline at end of file diff --git a/data/model_data_json/benjamin-paine_stable-diffusion-v1-5-inpainting.json b/data/model_data_json/benjamin-paine_stable-diffusion-v1-5-inpainting.json new file mode 100644 index 0000000000000000000000000000000000000000..50d85e0bdfe487517e8e0c41dd8c573ff9ea6b58 --- /dev/null +++ b/data/model_data_json/benjamin-paine_stable-diffusion-v1-5-inpainting.json @@ -0,0 +1,20 @@ +{ + "model_id": "benjamin-paine/stable-diffusion-v1-5-inpainting", + "downloads": 256622, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "text-to-image", + "arxiv:2207.12598", + "arxiv:2112.10752", + "arxiv:2103.00020", + "arxiv:2205.11487", + "arxiv:1910.09700", + "license:creativeml-openrail-m", + "diffusers:StableDiffusionInpaintPipeline", + "region:us" + ], + "description": "--- license: creativeml-openrail-m tags: - stable-diffusion - stable-diffusion-diffusers - text-to-image inference: false library_name: diffusers extra_gated_prompt: |- One more step before getting this model. This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies: 1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content 2. CompVis claims no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license 3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully) Please read the full license here: By clicking on \"Access repository\" below, you accept that your *contact information* (email address and username) can be shared with the model authors as well. extra_gated_fields: I have read the License and agree with its terms: checkbox --- # Re-upload This repository is being re-uploaded to HuggingFace in accordance with The CreativeML OpenRAIL-M License under which this repository was originally uploaded, specifically **Section II** which grants: > ...a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model. Note that these files did not come from HuggingFace, but instead from modelscope. As such, some files that were present in the original repository may not be present. File integrity has been verified via checksum. # Original Model Card Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask. The **Stable-Diffusion-Inpainting** was initialized with the weights of the Stable-Diffusion-v-1-2. 
First 595k steps regular training, then 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked-image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything. - **Language(s):** English - **License:** The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based. - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper. - **Resources for more information:** Paper. - **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is taken from the DALLE-MINI model card, but applies in the same way to Stable Diffusion v1_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. - Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use. 
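The card's usage snippets are elided in this dump; since the tags list `diffusers:StableDiffusionInpaintPipeline`, a minimal sketch with the diffusers API might look like this (the image paths and prompt are illustrative placeholders):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5-inpainting", torch_dtype=torch.float16
).to("cuda")

# Placeholder files; the mask is white where content should be regenerated.
init_image = Image.open("dog.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="Face of a yellow cat, high resolution",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("inpainted.png")
```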
## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages. - The autoencoding part of the model is lossy - The model was trained on a large-scale dataset LAION-5B which contains adult material and is not fit for product use without additional safety mechanisms and considerations. - No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data. The training data can be searched at to possibly assist in the detection of memorized images. ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v1 was trained on subsets of LAION-2B(en), which consists of images that are primarily limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-2B (en) and subsets thereof (see next section) **Training Procedure** Stable Diffusion v1 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through a ViT-L/14 text-encoder. - The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We currently provide six checkpoints, , and , , and which were trained as follows, - : 237k steps at resolution on laion2B-en. 194k steps at resolution on laion-high-resolution (170M examples from LAION-5B with resolution ). - : Resumed from . 515k steps at resolution on \"laion-improved-aesthetics\" (a subset of laion2B-en, filtered to images with an original size , estimated aesthetics score , and an estimated watermark probability . The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an improved aesthetics estimator). - : Resumed from . 195k steps at resolution on \"laion-improved-aesthetics\" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. - : Resumed from stable-diffusion-v1-2. 225,000 steps at resolution 512x512 on \"laion-aesthetics v2 5+\" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. - : Resumed from sd-v1-2.ckpt. 595k steps at resolution 512x512 on \"laion-aesthetics v2 5+\" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. 
- : Resumed from sd-v1-2.ckpt. 595k steps at resolution 512x512 on \"laion-aesthetics v2 5+\" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. Then 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked-image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything. - **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 2 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints: !pareto Evaluated using 50 PLMS steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Inpainting Evaluation To assess the performance of the inpainting model, we used the same evaluation protocol as in our LDM paper. Since the Stable Diffusion Inpainting Model accepts a text input, we simply used a fixed prompt of . | Model | FID | LPIPS | |-----------------------------|------|------------------| | Stable Diffusion Inpainting | 1.00 | 0.141 (+- 0.082) | | Latent Diffusion Inpainting | 1.50 | 0.137 (+- 0.080) | | CoModGAN | 1.82 | 0.15 | | LaMa | 2.21 | 0.134 (+- 0.080) | ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 150000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 11250 kg CO2 eq. 
## Citation *This model card was written by: Robin Rombach and Patrick Esser and is based on the DALL-E Mini model card.*" +} \ No newline at end of file diff --git a/data/model_data_json/benjamin_wtp-canine-s-1l.json b/data/model_data_json/benjamin_wtp-canine-s-1l.json new file mode 100644 index 0000000000000000000000000000000000000000..ecdee595bf788bf58983c2d651483d6ae0a16ebf --- /dev/null +++ b/data/model_data_json/benjamin_wtp-canine-s-1l.json @@ -0,0 +1,101 @@ +{ + "model_id": "benjamin/wtp-canine-s-1l", + "downloads": 169691, + "tags": [ + "transformers", + "pytorch", + "la-canine", + "token-classification", + "multilingual", + "am", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hu", + "hy", + "id", + "ig", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "mt", + "my", + "ne", + "nl", + "no", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "si", + "sk", + "sl", + "sq", + "sr", + "sv", + "ta", + "te", + "tg", + "th", + "tr", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "yo", + "zh", + "zu", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - multilingual - am - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hu - hy - id - ig - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lt - lv - mg - mk - ml - mn - mr - ms - mt - my - ne - nl - no - pa - pl - ps - pt - ro - ru - si - sk - sl - sq - sr - sv - ta - te - tg - th - tr - uk - ur - uz - vi - xh - yi - yo - zh - zu --- # wtp-canine-s-1l Model for [](", + "model_explanation_gemini": "Multilingual text processing model supporting over 80 languages for general natural language tasks." +} \ No newline at end of file diff --git a/data/model_data_json/beomi_KcELECTRA-base-v2022.json b/data/model_data_json/beomi_KcELECTRA-base-v2022.json new file mode 100644 index 0000000000000000000000000000000000000000..3410f724dcc539f4dcd57c25fc6db14840192854 --- /dev/null +++ b/data/model_data_json/beomi_KcELECTRA-base-v2022.json @@ -0,0 +1,18 @@ +{ + "model_id": "beomi/KcELECTRA-base-v2022", + "downloads": 105948, + "tags": [ + "transformers", + "pytorch", + "electra", + "pretraining", + "korean", + "ko", + "en", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ko - en tags: - electra - korean license: \"mit\" --- # 🚨 Important Note: This REPO is DEPRECATED since KcELECTRA-base v2023 Released 🚨 ## USE and Revision if needed. --- # KcELECTRA: Korean comments ELECTRA ** Updates on 2022.10.08 ** - The KcELECTRA-base-v2022 (formerly v2022-dev) model has been renamed. - Detailed scores for the model above have been added. - Compared to the previous KcELECTRA-base (v2021), performance improves by roughly 1 percentage point on most downstream tasks. --- Most publicly released Korean Transformer models are trained on well-curated data such as Korean Wikipedia, news articles, and books. In practice, however, user-generated noisy text datasets such as NSMC are uncurated: they are colloquial, full of neologisms, and contain typos and other expressions that rarely appear in formal writing. KcELECTRA is a pretrained ELECTRA model whose tokenizer and weights were trained from scratch on comments and replies collected from Naver News, precisely to handle datasets with these characteristics. Thanks to a larger dataset and an expanded vocab, it improves substantially over the earlier KcBERT. KcELECTRA can be loaded directly through Huggingface's Transformers library (no separate file download required). 
## KcELECTRA Performance - The finetune code can be found at . - Detailed per-step scores are available in each checkpoint folder of that repo. | | Size | **NSMC** (acc) | **Naver NER** (F1) | **PAWS** (acc) | **KorNLI** (acc) | **KorSTS** (spearman) | **Question Pair** (acc) | **KorQuaD (Dev)** (EM/F1) | | :----------------- | :-------------: | :----------------: | :--------------------: | :----------------: | :------------------: | :-----------------------: | :-------------------------: | :---------------------------: | | **KcELECTRA-base-v2022** | 475M | **91.97** | 87.35 | 76.50 | 82.12 | 83.67 | 95.12 | 69.00 / 90.40 | | **KcELECTRA-base** | 475M | 91.71 | 86.90 | 74.80 | 81.65 | 82.65 | **95.78** | 70.60 / 90.11 | | KcBERT-Base | 417M | 89.62 | 84.34 | 66.95 | 74.85 | 75.57 | 93.93 | 60.25 / 84.39 | | KcBERT-Large | 1.2G | 90.68 | 85.53 | 70.15 | 76.99 | 77.49 | 94.06 | 62.16 / 86.64 | | KoBERT | 351M | 89.63 | 86.11 | 80.65 | 79.00 | 79.64 | 93.93 | 52.81 / 80.27 | | XLM-Roberta-Base | 1.03G | 89.49 | 86.26 | 82.95 | 79.92 | 79.09 | 93.53 | 64.70 / 88.94 | | HanBERT | 614M | 90.16 | 87.31 | 82.40 | 80.89 | 83.33 | 94.19 | 78.74 / 92.02 | | KoELECTRA-Base | 423M | 90.21 | 86.87 | 81.90 | 80.85 | 83.21 | 94.20 | 61.10 / 89.59 | | KoELECTRA-Base-v2 | 423M | 89.70 | 87.02 | 83.90 | 80.61 | 84.30 | 94.72 | 84.34 / 92.58 | | KoELECTRA-Base-v3 | 423M | 90.63 | **88.11** | **84.45** | **82.24** | **85.53** | 95.25 | **84.83 / 93.45** | | DistilKoBERT | 108M | 88.41 | 84.13 | 62.55 | 70.55 | 73.21 | 92.48 | 54.12 / 77.80 | \\*HanBERT's size combines the BERT model and the tokenizer DB. \\***Results were obtained with the config settings as-is; additional hyperparameter tuning may yield better performance.** ## How to use ### Requirements - - - - ### Default usage > 💡 If previous KcBERT-related code used , , you can use this model immediately by changing only the part to . ### Pretrain & Finetune Colab links #### Pretrain Data - The data used to train KcBERT, plus comments collected through early March 2021 - About 17GB - Documents are built from grouped comment/reply threads #### Pretrain Code - Pretraining via the repo #### Finetune Code - Finetuning and score comparison via the repo #### Finetune Samples - NSMC with PyTorch-Lightning 1.3.0, GPU, Colab
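The card states that the model loads directly through Transformers; a minimal sketch (model id taken from this JSON, example sentence is an illustrative placeholder):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("beomi/KcELECTRA-base-v2022")
model = AutoModel.from_pretrained("beomi/KcELECTRA-base-v2022")

# Noisy, colloquial Korean of the kind the model was trained on.
inputs = tokenizer("이 영화 진짜 재밌다 ㅋㅋㅋ", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```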
## Train Data & Preprocessing ### Raw Data The training data consists of all **comments and replies** collected from **heavily-commented news (or all news)** articles written between 2019.01.01 and 2021.03.09. The text alone amounts to **about 17.3GB, more than 180 million sentences**. > KcBERT was trained on text from 2019.01-2020.06, about 90 million sentences after cleaning. ### Preprocessing The preprocessing for PLM training proceeded as follows: 1. Korean, English, special characters, and even emoji (🥳)! Via regular expressions, Korean, English, and special characters, including emoji, were kept as training targets. Meanwhile, the Korean range was restricted to , excluding Hanja within . 2. Collapsing repeated characters within comments. Repeated characters such as were merged into forms like . 3. Cased model. KcELECTRA is a cased model that preserves upper/lower case for English. 4. Removing texts of 10 characters or fewer. Texts under 10 characters were mostly single words and were excluded. 5. Deduplication. To remove duplicated comments, exactly identical comments were merged into one. 6. Removing . In Naver comments, profanity is masked as by the built-in filter; these markers were replaced with whitespace. After installing via pip with the command below, cleaning with the clean function below improves downstream-task performance. ( reduction) Please apply the function below to your text data. > 💡 The Finetune Scores above were computed without applying this function. ### Cleaned Data - Additional data beyond KcBERT will be released after cleanup. ## Tokenizer, Model Train The tokenizer was trained with Huggingface's Tokenizers library, using , with a vocab size of . The tokenizer was trained on the full dataset, and to handle general downstream tasks, the non-overlapping portion of the vocab used by KoELECTRA was added (the two models actually overlapped by about 5,000 tokens). Training ran for about 10 days on a TPU , and the model currently published on Huggingface is the weight trained for 848k steps. (Performance was evaluated with checkpoints every 100k steps; see the repo for details.) The training loss drops sharply between roughly 100-200k steps and keeps decreasing steadily until the end of training. !KcELECTRA-base Pretrain Loss ### KcELECTRA downstream-task performance by pretrain step > 💡 The table below covers only a subset of checkpoints, not all of them. !KcELECTRA downstream-task performance by pretrain step - As shown above, KcELECTRA-base outperforms KcBERT-base and KcBERT-large **on every dataset**. - KcELECTRA's performance also improves progressively as the pretrain step count grows. ## Citation When citing KcELECTRA, please use the template below. For uses other than citing the paper, please attribute the MIT license. ☺️ ## Acknowledgement The GCP/TPU environment for training the KcELECTRA model was supported by the TFRC program. Thanks to Monologg for much advice during model training :) ## Reference ### Github Repos - KcBERT by Beomi - BERT by Google - KoBERT by SKT - KoELECTRA by Monologg - Transformers by Huggingface - Tokenizers by Huggingface - ELECTRA train code by KLUE ### Blogs - Monologg's KoELECTRA training notes - Pretraining BERT from scratch on TPU in Colab - Tensorflow/Google ver.", + "model_explanation_gemini": "Pretrained Korean ELECTRA model optimized for noisy, user-generated text like online comments, outperforming predecessors on various Korean NLP tasks." +} \ No newline at end of file diff --git a/data/model_data_json/beomi_kcbert-base.json b/data/model_data_json/beomi_kcbert-base.json new file mode 100644 index 0000000000000000000000000000000000000000..ecefef97feda6535750fdd6d8e246ebab8a9237a --- /dev/null +++ b/data/model_data_json/beomi_kcbert-base.json @@ -0,0 +1,22 @@ +{ + "model_id": "beomi/kcbert-base", + "downloads": 87804, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "bert", + "fill-mask", + "korean", + "ko", + "arxiv:1810.04805", + "doi:10.57967/hf/0016", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko license: apache-2.0 tags: - korean --- # KcBERT: Korean comments BERT ** Updates on 2021.04.07 ** - KcELECTRA has been released!🤗 - KcELECTRA uses a larger dataset and a bigger general vocab, achieving **higher performance on every task** compared to KcBERT. - Try it via the GitHub link below! - ** Updates on 2021.03.14 ** - Added the KcBERT paper citation (bibtex). - Added KcBERT-finetune performance scores to this document. 
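The clean function referenced in the KcELECTRA preprocessing notes above (and again in the KcBERT card below) is elided from this dump. A rough reconstruction from the described steps, assuming the soynlp and emoji packages used by the original repos (the exact regex alphabet is an assumption; the original also whitelists emoji):

```python
# pip install soynlp emoji
import re
from soynlp.normalizer import repeat_normalize

# Assumed character whitelist: Hangul, ASCII, and basic punctuation.
allowed = re.compile(r'[^ .,?!/@$%~％·∼()\x00-\x7Fㄱ-ㅣ가-힣]+')
url_pattern = re.compile(
    r'https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}'
    r'\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)')

def clean(x: str) -> str:
    x = allowed.sub(' ', x)                  # drop characters outside the alphabet
    x = url_pattern.sub('', x)               # remove URLs
    x = x.strip()
    x = repeat_normalize(x, num_repeats=2)   # collapse runs like "ㅋㅋㅋㅋ" -> "ㅋㅋ"
    return x
```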
** Updates on 2020.12.04 ** With Huggingface Transformers updated to v4.0.0, parts of the tutorial code have changed. Updated KcBERT-Large NSMC Finetuning Colab: ** Updates on 2020.09.11 ** A tutorial is now available for pretraining KcBERT on TPU in Google Colab! Click the button below. Pretrain KcBERT on TPU in Colab: Training uses only a subset (144MB) of the full 12G of text. It uses the Korpora package, which makes Korean datasets/corpora easier to work with. ** Updates on 2020.09.08 ** The training data has been uploaded via Github Release. Due to the 2GB-per-file limit, it is split-compressed. Please download it via the link below. (No sign-up needed; split archives.) If you prefer a single file, or want to browse the data on Kaggle, use the Kaggle dataset below. - Github release: ** Updates on 2020.08.22 ** Pretrain dataset released - Kaggle: (available as a single file) The dataset cleaned for training (with the processing described below) has been released on Kaggle! Download it and try training on various tasks :) --- Most publicly released Korean BERT models are trained on well-curated data such as Korean Wikipedia, news articles, and books. In practice, however, comment-style datasets like NSMC are uncurated: they are colloquial, full of neologisms, and contain typos and other expressions that rarely appear in formal writing. KcBERT is a pretrained BERT model whose tokenizer and weights were trained from scratch on comments and replies collected from Naver News, precisely to handle datasets with these characteristics. KcBERT can be loaded directly through Huggingface's Transformers library (no separate file download required). ## KcBERT Performance - The finetune code can be found at . | | Size | **NSMC** (acc) | **Naver NER** (F1) | **PAWS** (acc) | **KorNLI** (acc) | **KorSTS** (spearman) | **Question Pair** (acc) | **KorQuaD (Dev)** (EM/F1) | | :-------------------- | :---: | :----------------: | :--------------------: | :----------------: | :------------------: | :-----------------------: | :-------------------------: | :---------------------------: | | KcBERT-Base | 417M | 89.62 | 84.34 | 66.95 | 74.85 | 75.57 | 93.93 | 60.25 / 84.39 | | KcBERT-Large | 1.2G | **90.68** | 85.53 | 70.15 | 76.99 | 77.49 | 94.06 | 62.16 / 86.64 | | KoBERT | 351M | 89.63 | 86.11 | 80.65 | 79.00 | 79.64 | 93.93 | 52.81 / 80.27 | | XLM-Roberta-Base | 1.03G | 89.49 | 86.26 | 82.95 | 79.92 | 79.09 | 93.53 | 64.70 / 88.94 | | HanBERT | 614M | 90.16 | **87.31** | 82.40 | **80.89** | 83.33 | 94.19 | 78.74 / 92.02 | | KoELECTRA-Base | 423M | **90.21** | 86.87 | 81.90 | 80.85 | 83.21 | 94.20 | 61.10 / 89.59 | | KoELECTRA-Base-v2 | 423M | 89.70 | 87.02 | **83.90** | 80.61 | **84.30** | **94.72** | **84.34 / 92.58** | | DistilKoBERT | 108M | 88.41 | 84.13 | 62.55 | 70.55 | 73.21 | 92.48 | 54.12 / 77.80 | \\*HanBERT's size combines the BERT model and the tokenizer DB. \\***Results were obtained with the config settings as-is; additional hyperparameter tuning may yield better performance.** ## How to use ### Requirements - - - is also compatible. - - ### Pretrain & Finetune Colab links #### Pretrain Data - Dataset download (Kaggle, single file, login required) - Dataset download (Github, multiple compressed files, no login required) #### Pretrain Code Pretrain KcBERT on TPU in Colab: #### Finetune Samples **KcBERT-Base** NSMC Finetuning with PyTorch-Lightning (Colab) **KcBERT-Large** NSMC Finetuning with PyTorch-Lightning (Colab) > The two notebooks differ only in the pretrained model (base vs. large) and batch size; the rest of the code is identical. ## Train Data & Preprocessing ### Raw Data The training data consists of all **comments and replies** from **heavily-commented news** articles written between 2019.01.01 and 2020.06.15. The text alone amounts to **about 15.4GB, more than 110 million sentences**. ### Preprocessing The preprocessing for PLM training proceeded as follows: 1. Korean, English, special characters, and even emoji (🥳)! Via regular expressions, Korean, English, and special characters, including emoji, were kept as training targets. Meanwhile, the Korean range was restricted to , excluding Hanja within . 2. Collapsing repeated characters within comments. Repeated characters such as were merged into forms like . 3. Cased model. KcBERT is a cased model that preserves upper/lower case for English. 4. Removing texts of 10 characters or fewer. Texts under 10 characters were mostly single words and were excluded. 5. Deduplication. To remove duplicated comments, duplicate comments were merged into one. The final training data built this way is **12.5GB, 89 million sentences**. After installing via pip with the command below, cleaning with the clean function below improves downstream-task performance. ( reduction) Please apply the function below to your text data. ### Cleaned Data (Released on Kaggle) A 12GB txt file of the raw data cleaned with the function above can be downloaded from the Kaggle dataset below :) ## Tokenizer Train The tokenizer was trained with Huggingface's Tokenizers library, using , with a vocab size of . The tokenizer was trained on data sampled with , stratified by date for more even sampling. ## BERT Model Pretrain - KcBERT Base config - KcBERT Large config The BERT model configs use the default Base and Large settings as-is (MLM 15%, etc.). Training ran on a TPU for 3 days (Base) and N days (Large was still training), and the model currently published on Huggingface is the checkpoint trained for 1m (one million) steps. The training loss drops fastest in the first 200k steps and decreases gradually after 400k. - Base Model Loss !KcBERT-Base Pretraining Loss - Large Model Loss !KcBERT-Large Pretraining Loss Training was done on GCP TPU v3-8; training time was about 2.5 days for the Base model. The Large model ran for about 5 days, and the checkpoint with the lowest loss was selected. ## Example ### HuggingFace MASK LM You can test the HuggingFace kcbert-base model as shown below. !오늘은 날씨가 \"좋네요\", KcBERT-Base Of course, you can also test the kcbert-large model. !image-20200806160624340 ### NSMC Binary Classification We fine-tuned on the Naver movie review corpus to quickly test performance. The code to fine-tune the Base model can be run directly at . The code to fine-tune the Large model can be run directly at . 
- On a single P100 GPU, one epoch takes 2-3 hours; on TPU, under 1 hour per epoch. - On 4x RTX Titan GPUs, about 30 minutes per epoch. - The example code was developed with pytorch-lightning. #### Experiment results - KcBERT-Base model results: Val acc !KcBERT Base finetune on NSMC - KcBERT-Large model results: Val acc !image-20200806190242834 > Tests on more diverse downstream tasks are planned and will be published. ## Citation When citing KcBERT, please use the template below. - Proceedings download link: (*or ) ## Acknowledgement The GCP/TPU environment for training the KcBERT model was supported by the TFRC program. Thanks to Monologg for much advice during model training :) ## Reference ### Github Repos - BERT by Google - KoBERT by SKT - KoELECTRA by Monologg - Transformers by Huggingface - Tokenizers by Huggingface ### Papers - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding ### Blogs - Monologg's KoELECTRA training notes - Pretraining BERT from scratch on TPU in Colab - Tensorflow/Google ver.", + "model_explanation_gemini": "A Korean BERT model pretrained on informal, unrefined text like news comments and replies to handle colloquial language, slang, and typos in tasks such as sentiment analysis and named entity recognition." +} \ No newline at end of file diff --git a/data/model_data_json/bespin-global_klue-sroberta-base-continue-learning-by-mnr.json b/data/model_data_json/bespin-global_klue-sroberta-base-continue-learning-by-mnr.json new file mode 100644 index 0000000000000000000000000000000000000000..7fbd5a6e3f3fb49423acf96e135983303b26f9cc --- /dev/null +++ b/data/model_data_json/bespin-global_klue-sroberta-base-continue-learning-by-mnr.json @@ -0,0 +1,22 @@ +{ + "model_id": "bespin-global/klue-sroberta-base-continue-learning-by-mnr", + "downloads": 84806, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "ko", + "dataset:klue", + "license:cc-by-4.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - klue language: - ko license: cc-by-4.0 --- # bespin-global/klue-sroberta-base-continue-learning-by-mnr This model was trained on the KLUE/NLI and KLUE/STS datasets, using the continue-learning method described in the official sentence-transformers documentation, as follows: 1. After negative sampling on the NLI dataset, a first round of NLI training was performed with MultipleNegativeRankingLoss. 2. The model from step 1 was then given a second round of STS training on the STS dataset with CosineSimilarityLoss. For training details, see the blog and the Colab practice code. --- This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. 
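The usage snippets are elided from this dump; a minimal sketch with Sentence Transformers (model id taken from this JSON, example sentences are illustrative placeholders):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bespin-global/klue-sroberta-base-continue-learning-by-mnr")

sentences = ["이것은 예시 문장입니다.", "각 문장은 768차원 벡터로 변환됩니다."]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768) per the card's stated embedding dimension
```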
## Evaluation Results **EmbeddingSimilarityEvaluator: Evaluating the model on sts-test dataset:** - Cosine-Similarity : - Pearson: 0.8901 Spearman: 0.8893 - Manhattan-Distance: - Pearson: 0.8867 Spearman: 0.8818 - Euclidean-Distance: - Pearson: 0.8875 Spearman: 0.8827 - Dot-Product-Similarity: - Pearson: 0.8786 Spearman: 0.8735 - Average : 0.8892573547643868 ## Training The model was trained with the parameters: **DataLoader**: of length 329 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors JaeHyeong AN at Bespin Global", + "model_explanation_gemini": "Generates 768-dimensional sentence embeddings optimized for semantic similarity tasks in Korean using NLI and STS training with MultipleNegativeRankingLoss and CosineSimilarityLoss." +} \ No newline at end of file diff --git a/data/model_data_json/bhadresh-savani_distilbert-base-uncased-emotion.json b/data/model_data_json/bhadresh-savani_distilbert-base-uncased-emotion.json new file mode 100644 index 0000000000000000000000000000000000000000..992ba7e4d4ff121826edcc1c4b7d5a3ac4520027 --- /dev/null +++ b/data/model_data_json/bhadresh-savani_distilbert-base-uncased-emotion.json @@ -0,0 +1,24 @@ +{ + "model_id": "bhadresh-savani/distilbert-base-uncased-emotion", + "downloads": 98612, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "distilbert", + "text-classification", + "emotion", + "en", + "dataset:emotion", + "arxiv:1910.01108", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 tags: - text-classification - emotion - pytorch datasets: - emotion metrics: - Accuracy, F1 Score thumbnail: model-index: - name: bhadresh-savani/distilbert-base-uncased-emotion results: - task: type: text-classification name: Text Classification dataset: name: emotion type: emotion config: default split: test metrics: - type: accuracy value: 0.927 name: Accuracy verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzQxOGRmMjFlZThmZWViNjNmNGMzMTdjMGNjYjg1YWUzOTI0ZDlmYjRhYWMzMDA3Yjg2N2FiMTdmMzk0ZjJkOSIsInZlcnNpb24iOjF9.mOqr-hgNrnle7WCPy3Mo7M3fITFppn5gjpNagGMf_TZfB6VZnPKfZ51UkNFQlBtUlcm0U8vwPkF79snxwvCoDw - type: precision value: 0.8880230732280744 name: Precision Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjZiN2NjNTkyN2M3ZWM2ZDZiNDk1OWZhN2FmNTAwZDIzMmQ3NTU2Yjk2MTgyNjJmMTNjYTYzOTc1NDdhYTljYSIsInZlcnNpb24iOjF9.0rWHmCZ2PyZ5zYkSeb_tFdQG9CHS5PdpOZ9kOfrIzEXyZ968daayaOJi2d6iO84fnauE5hZiIAUPsx24Vr4nBA - type: precision value: 0.927 name: Precision Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmRhNWM1NDQ4ZjkyYjAxYjQ5MzQzMDA1ZDIzYWU3YTE4NTI2ZTMwYWI2ZWQ4NzQ3YzJkODYzMmZhZDI1NGRlNCIsInZlcnNpb24iOjF9.NlII1s42Mr_DMzPEoR0ntyh5cDW0405TxVkWhCgXLJTFAdnivH54-zZY4av1U5jHPTeXeWwZrrrbMwHCRBkoCw - type: precision value: 0.9272902840835793 name: Precision Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODhkNmM5NmYyMzA4MjkwOTllZDgyMDQ1NzZkN2QzOTAyOTMyNGFlZTU4NzM5NmM5NWQ1YmUxYmRmNjA5YjhhNCIsInZlcnNpb24iOjF9.oIn1KT-BOpFNLXiKL29frMvgHhWZMHWc9Q5WgeR7UaMEO7smkK8J3j5HAMy17Ktjv2dh783-f76N6gyJ_NewCg - type: recall value: 0.8790126653780703 name: Recall Macro verified: true verifyToken: 
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjhlNzczNDY2NDVlM2UwMjAzOWQxYTAyNWZkNGZlYmNjODNiZTEzMTcxNTE3MTAxNjNkOTFiMmRiMzViMzJmZiIsInZlcnNpb24iOjF9.AXp7omMuUZFJ6mzAVTQPMke7QoUtoi4RJSSE7Xbnp2pNi7y-JtznKdm---l6RfqcHPlI0jWr7TVGoFsWZ64YAg - type: recall value: 0.927 name: Recall Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjEyYmZiZDQ4MzM1ZmQ2ZmJhZWU4OTVkNmViYjA5NzhiN2MxODE0MzUxZTliZTk0MzViZDAyNGU4MDFjYjM1MSIsInZlcnNpb24iOjF9.9lazxLXbPOdwhqoYtIudwRwjfNVZnUu7KvGRklRP_RAoQStAzgmWMIrT3ckX_d5_6bKZH9fIdujUn5Qz-baKBw - type: recall value: 0.927 name: Recall Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWVhMzY0YTA4YmQzYTg4YTBiMzQ5YzRiZWJhMjM1NjUzZGQxZmQ5M2NkZDcyNTQ0ZmJjN2NkY2ZiYjg0OWI0ZCIsInZlcnNpb24iOjF9.QgTv726WCTyvrEct0NM8Zpc3vUnDbIwCor9EH941-zpJtuWr-xpdZzYZFJfILkVA0UUn1y6Jz_ABfkfBeyZTBg - type: f1 value: 0.8825061528287809 name: F1 Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzQzZTJkMDAwOTUwMzY3ZjI2MjIxYjlmZTg3YTdhNTc4ZjYyMmQ2NDQzM2FmYzk3OGEzNjhhMTk3NTQ3OTlhNyIsInZlcnNpb24iOjF9.hSln1KfKm0plK7Qao9vlubFtAl1M7_UYHNM6La9gEZlW_apnU1Mybz03GT2XZORgOVPe9JmgygvZByxQhpsYBw - type: f1 value: 0.927 name: F1 Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzljODQ3NjE3MDRkODE3ZjFlZmY5MjYyOGJlNDQ4YzdlZGRiMTI5OGZiZWM2ODkyZjMyZWQ3MTkzYWU5YThkOCIsInZlcnNpb24iOjF9.7qfBw39fv22jSIJoY71DkOVr9eBB-srhqSi09bCcUC7Huok4O2Z_vB7gO_Rahh9sFgKVu1ZATusjTmOLQr0fBw - type: f1 value: 0.926876082854655 name: F1 Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjJhN2UzODgxOWQ0Y2E3YTcwZTQxMDE0ZWRmYThjOWVhYWQ1YjBhMzk0YWUxNzE2ZjFhNWM5ZmE2ZmI1YTczYSIsInZlcnNpb24iOjF9.nZW0dBdLmh_FgNw6GaITvSJFX-2C_Iku3NanU8Rip7FSiRHozKPAjothdQh9MWQnq158ZZGPPVIjtyIvuTSqCw - type: loss value: 0.17403268814086914 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTVjZmFiOGQwZGY1OTU5YWFkNGZjMTlhOGI4NjE3MGI4ZDhkODcxYmJiYTQ3NWNmMWM0ODUyZDI1MThkYTY3ZSIsInZlcnNpb24iOjF9.OYz5BI3Lz8LgjAqVnD6NcrG3UAG0D3wjKJ7G5298RRGaNpb621ycisG_7UYiWixY7e2RJafkfRiplmkdczIFDQ --- # Distilbert-base-uncased-emotion ## Model description: Distilbert is created with knowledge distillation during the pre-training phase which reduces the size of a BERT model by 40%, while retaining 97% of its language understanding. It's smaller, faster than Bert and any other Bert-based model. Distilbert-base-uncased finetuned on the emotion dataset using HuggingFace Trainer with below Hyperparameters ## Model Performance Comparision on Emotion Dataset from Twitter: | Model | Accuracy | F1 Score | Test Sample per Second | | --- | --- | --- | --- | | Distilbert-base-uncased-emotion | 93.8 | 93.79 | 398.69 | | Bert-base-uncased-emotion | 94.05 | 94.06 | 190.152 | | Roberta-base-emotion | 93.95 | 93.97| 195.639 | | Albert-base-v2-emotion | 93.6 | 93.65 | 182.794 | ## How to Use the model: ## Dataset: Twitter-Sentiment-Analysis. ## Training procedure Colab Notebook ## Eval results ## Reference: * Natural Language Processing with Transformer By Lewis Tunstall, Leandro von Werra, Thomas Wolf", + "model_explanation_gemini": "Classifies text into emotion categories using a distilled BERT model trained on an emotion dataset." 
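The "How to Use the model" snippet was stripped from this card; a minimal sketch, assuming the standard transformers text-classification pipeline (the input text is a placeholder):

```python
from transformers import pipeline

# top_k=None returns scores for all six emotion labels instead of just the top one.
classifier = pipeline(
    "text-classification",
    model="bhadresh-savani/distilbert-base-uncased-emotion",
    top_k=None,
)
print(classifier("I love using transformers. The best part is the wide range of models."))
```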
+} \ No newline at end of file diff --git a/data/model_data_json/bigcode_starcoder2-3b.json b/data/model_data_json/bigcode_starcoder2-3b.json new file mode 100644 index 0000000000000000000000000000000000000000..5742bc8244f0d461aa202813f54344dd7d4b6ecb --- /dev/null +++ b/data/model_data_json/bigcode_starcoder2-3b.json @@ -0,0 +1,25 @@ +{ + "model_id": "bigcode/starcoder2-3b", + "downloads": 210235, + "tags": [ + "transformers", + "safetensors", + "starcoder2", + "text-generation", + "code", + "dataset:bigcode/the-stack-v2-train", + "arxiv:2305.13245", + "arxiv:2205.14135", + "arxiv:2004.05150", + "arxiv:2207.14255", + "arxiv:2402.19173", + "license:bigcode-openrail-m", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: text-generation inference: true widget: - text: 'def print_hello_world():' example_title: Hello world group: Python datasets: - bigcode/the-stack-v2-train license: bigcode-openrail-m library_name: transformers tags: - code model-index: - name: starcoder2-3b results: - task: type: text-generation dataset: name: CruxEval-I type: cruxeval-i metrics: - type: pass@1 value: 32.7 - task: type: text-generation dataset: name: DS-1000 type: ds-1000 metrics: - type: pass@1 value: 25.0 - task: type: text-generation dataset: name: GSM8K (PAL) type: gsm8k-pal metrics: - type: accuracy value: 27.7 - task: type: text-generation dataset: name: HumanEval+ type: humanevalplus metrics: - type: pass@1 value: 27.4 - task: type: text-generation dataset: name: HumanEval type: humaneval metrics: - type: pass@1 value: 31.7 - task: type: text-generation dataset: name: RepoBench-v1.1 type: repobench-v1.1 metrics: - type: edit-similarity value: 71.19 --- # StarCoder2
\"SC2\"
## Table of Contents 1. Model Summary 2. Use 3. Limitations 4. Training 5. License 6. Citation ## Model Summary The StarCoder2-3B model is a 3B-parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3+ trillion tokens. - **Project Website:** bigcode-project.org - **Paper:** Link - **Point of Contact:** contact@bigcode-project.org - **Languages:** 17 Programming languages ## Use ### Intended use The model was trained on GitHub code as well as additional selected data sources such as Arxiv and Wikipedia. As such it is _not_ an instruction model and commands like \"Write a function that computes the square root.\" do not work well. ### Generation Here are some examples to get started with the model; a sketch is given at the end of this entry. You can find a script for fine-tuning in StarCoder2's GitHub repository. First, make sure to install from source: #### Running the model on CPU/GPU/multi GPU * _Using full precision_ * _Using _ #### Quantized Versions through * _Using 8-bit precision (int8)_ ### Attribution & Other Requirements The pretraining dataset of the model was filtered for permissive licenses and code with no license only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. We provide a search index that lets you search through the pretraining data to identify where the generated code came from, and apply the proper attribution to your code. # Limitations The model has been trained on source code from 600+ programming languages. The predominant natural language in the source data is English although other languages are also present. As such, the model is capable of generating code snippets given some context, but the generated code is not guaranteed to work as intended. It can be inefficient, contain bugs or exploits. See the paper for an in-depth discussion of the model limitations. # Training ## Model - **Architecture:** Transformer decoder with grouped-query and sliding window attention and Fill-in-the-Middle objective - **Pretraining steps:** 1.2 million - **Pretraining tokens:** 3+ trillion - **Precision:** bfloat16 ## Hardware - **GPUs:** 160 A100 ## Software - **Framework:** TODO - **Neural networks:** PyTorch # License The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement here. # Citation", + "model_explanation_gemini": "Generates and autocompletes code in 17 programming languages, trained on GitHub and other sources with a focus on fill-in-the-middle tasks."
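The generation examples referenced above were stripped from this card. A minimal sketch of plain generation, assuming the standard transformers causal-LM API (the prompt is taken from the card's widget; accelerate is assumed for device placement):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# bfloat16 halves memory relative to full precision; device_map="auto"
# places the weights on whatever GPUs (or CPU) are available.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(inputs.input_ids, max_new_tokens=40)
print(tokenizer.decode(outputs[0]))
```

For the int8 path the card alludes to, the usual route is loading with a bitsandbytes quantization config in place of the dtype argument.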
+} \ No newline at end of file diff --git a/data/model_data_json/bigscience_bloom-1b7.json b/data/model_data_json/bigscience_bloom-1b7.json new file mode 100644 index 0000000000000000000000000000000000000000..7ec6c4d1293790d15fd019bc6a47a9585bdc99fc --- /dev/null +++ b/data/model_data_json/bigscience_bloom-1b7.json @@ -0,0 +1,70 @@ +{ + "model_id": "bigscience/bloom-1b7", + "downloads": 99296, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "bloom", + "text-generation", + "ak", + "ar", + "as", + "bm", + "bn", + "ca", + "code", + "en", + "es", + "eu", + "fon", + "fr", + "gu", + "hi", + "id", + "ig", + "ki", + "kn", + "lg", + "ln", + "ml", + "mr", + "ne", + "nso", + "ny", + "or", + "pa", + "pt", + "rn", + "rw", + "sn", + "st", + "sw", + "ta", + "te", + "tn", + "ts", + "tum", + "tw", + "ur", + "vi", + "wo", + "xh", + "yo", + "zh", + "zhs", + "zht", + "zu", + "arxiv:1909.08053", + "arxiv:2110.02861", + "arxiv:2108.12409", + "license:bigscience-bloom-rail-1.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: bigscience-bloom-rail-1.0 language: - ak - ar - as - bm - bn - ca - code - en - es - eu - fon - fr - gu - hi - id - ig - ki - kn - lg - ln - ml - mr - ne - nso - ny - or - pa - pt - rn - rw - sn - st - sw - ta - te - tn - ts - tum - tw - ur - vi - wo - xh - yo - zh - zhs - zht - zu pipeline_tag: text-generation ---

BLOOM LM
BigScience Large Open-science Open-access Multilingual Language Model
Model Card
\"BigScience Version 1.0 / 26.May.2022 # Model Card for Bloom-1b7 ## Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Recommendations 5. Training Data 6. Evaluation 7. Environmental Impact 8. Technical Specifications 9. Citation 10. Glossary and Calculations 11. More Information 12. Model Card Authors 13. Model Card Contact ## Model Details ### Model Description *This section provides information for anyone who wants to know about the model.* - **Developed by:** BigScience (website) * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)* - **Model Type:** Transformer-based Language Model - **Version:** 1.0.0 - **Languages:** Multiple; see training data - **License:** RAIL License v1.0 (link) - **Release Date Estimate:** Monday, 11.July.2022 - **Funded by:** * The French government. * Hugging Face (website). * Organizations of contributors. *(Further breakdown of organizations forthcoming.)* ## Uses *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model. It provides information for anyone considering using the model or who is affected by the model.* ### Intended Use This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. Use cases below are not exhaustive. #### **Direct Use** - Text generation - Exploring characteristics of language generated by a language model - Examples: Cloze tests, counterfactuals, generations with reframings #### **Downstream Use** - Tasks that leverage language models include: Information Extraction, Question Answering, Summarization ### Misuse and Out-of-scope Use *This section addresses what users ought not do with the model.* See the BLOOM License, Attachment A, for detailed usage restrictions. The below list is non-exhaustive, but lists some easily foreseeable problematic use cases. #### **Out-of-scope Uses** Using the model in high-stakes settings is out of scope for this model. The model is not designed for critical decisions nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but is not correct. ##### Out-of-scope Uses Include: - Usage in biomedical domains, political and legal domains, or finance domains - Usage for evaluating or scoring individuals, such as for employment, education, or credit - Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct #### **Misuse** Intentionally using the model for harm, violating human rights, or other kinds of malicious activities, is a misuse of this model. 
This includes: - Spam generation - Disinformation and influence operations - Disparagement and defamation - Harassment and abuse - Deception - Unconsented impersonation and imitation - Unconsented surveillance - Generating content without attribution to the model, as specified in the RAIL License, Use Restrictions ### Intended Users #### **Direct Users** - General Public - Researchers - Students - Educators - Engineers/developers - Non-commercial entities - Community advocates, including human and civil rights groups #### Indirect Users - Users of derivatives created by Direct Users, such as those using software with an intended use - Users of Derivatives of the Model, as described in the License #### Others Affected (Parties Prenantes) - People and groups referred to by the LLM - People and groups exposed to outputs of, or decisions based on, the LLM - People and groups whose original work is included in the LLM ## Bias, Risks, and Limitations *This section identifies foreseeable harms and misunderstandings.* Model may: - Overrepresent some viewpoints and underrepresent others - Contain stereotypes - Contain personal information - Generate: - Hateful, abusive, or violent language - Discriminatory or prejudicial language - Content that may not be appropriate for all settings, including sexual content - Make errors, including producing incorrect information as if it were factual - Generate irrelevant or repetitive outputs ### Recommendations *This section provides information on warnings and potential mitigations.* - Indirect users should be made aware when the content they're working with is created by the LLM. - Users should be aware of Risks and Limitations, and include an appropriate age disclaimer or blocking interface as necessary. - Models pretrained with the LLM should include an updated Model Card. - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments. ## Training Data *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.* Details for each dataset are provided in individual Data Cards. Training data includes: - 45 natural languages - 12 programming languages - In 1.5TB of pre-processed text, converted into 350B unique tokens (see the tokenizer section for more.) #### **Languages** The pie chart shows the distribution of languages in training data. 
!pie chart showing the distribution of languages in training data **The following table shows the further distribution of Niger-Congo and Indic languages in the training data.** | Niger Congo | Percentage | | Indic | Percentage | |----------------|------------ |------ |-----------|------------| | Chi Tumbuka | 0.00002 | | Assamese | 0.01 | | Kikuyu | 0.00004 | | Odia | 0.04 | | Bambara | 0.00004 | | Gujarati | 0.04 | | Akan | 0.00007 | | Marathi | 0.05 | | Xitsonga | 0.00007 | | Punjabi | 0.05 | | Sesotho | 0.00007 | | Kannada | 0.06 | | Chi Chewa | 0.0001 | | Nepali | 0.07 | | Setswana | 0.0002 | | Telugu | 0.09 | | Northern Sotho | 0.0002 | | Malayalam | 0.10 | | Fon | 0.0002 | | Urdu | 0.10 | | Kirundi | 0.0003 | | Tamil | 0.20 | | Wolof | 0.0004 | | Bengali | 0.50 | | Kuganda | 0.0004 | | Hindi | 0.70 | | Chi Shona | 0.001 | | Isi Zulu | 0.001 | | Igbo | 0.001 | | Xhosa | 0.001 | | Kinyarwanda | 0.003 | | Yoruba | 0.006 | | Swahili | 0.02 | **The following table shows the distribution of programming languages.** | Extension | Language | Number of files | |----------------|------------|-----------------| | java | Java | 5,407,724 | | php | PHP | 4,942,186 | | cpp | C++ | 2,503,930 | | py | Python | 2,435,072 | | js | JavaScript | 1,905,518 | | cs | C# | 1,577,347 | | rb | Ruby | 6,78,413 | | cc | C++ | 443,054 | | hpp | C++ | 391,048 | | lua | Lua | 352,317 | | go | GO | 227,763 | | ts | TypeScript | 195,254 | | C | C | 134,537 | | scala | Scala | 92,052 | | hh | C++ | 67,161 | | H | C++ | 55,899 | | tsx | TypeScript | 33,107 | | rs | Rust | 29,693 | | phpt | PHP | 9,702 | | c++ | C++ | 1,342 | | h++ | C++ | 791 | | php3 | PHP | 540 | | phps | PHP | 270 | | php5 | PHP | 166 | | php4 | PHP | 29 | ## Evaluation *This section describes the evaluation protocols and provides the results.* ### Metrics *This section describes the different ways performance is calculated and why.* Includes: | Metric | Why chosen | |--------------------|--------------------------------------------------------------------| | Perplexity | Standard metric for quantifying model improvements during training | | Cross Entropy Loss | Standard objective for language models. | And multiple different metrics for specific tasks. _(More evaluation metrics forthcoming upon completion of evaluation protocol.)_ ### Factors *This section lists some different aspects of what BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.* - Language, such as English or Yoruba - Domain, such as newswire or stories - Demographic characteristics, such as gender or nationality ### Results *Results are based on the Factors and Metrics.* **Train-time Evaluation:** As of 25.May.2022, 15:00 PST: - Training Loss: 2.0 - Validation Loss: 2.2 - Perplexity: 8.9 (More evaluation scores forthcoming at the end of model training.) - BLOOM Book: Read generations from BLOOM based on prompts provided by the community ## Environmental Impact The training supercomputer, Jean Zay (website), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing. **Estimated carbon emissions:** *(Forthcoming upon completion of training.)* **Estimated electricity usage:** *(Forthcoming upon completion of training.)* ## Technical Specifications *This section provides information for people who work on model development.* Please see the BLOOM training README for full details on replicating training. 
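As a quick cross-check of the architecture numbers listed next, the published config can be inspected directly; a small sketch, assuming the standard AutoConfig API and the attribute names of the transformers Bloom config:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("bigscience/bloom-1b7")
# Expect 24 layers, 16 attention heads, and 2048-dimensional hidden states,
# matching the Model Architecture bullets below.
print(cfg.n_layer, cfg.n_head, cfg.hidden_size)
```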
**Model Architecture:** Modified from Megatron-LM GPT2 (see paper, BLOOM Megatron code): * Decoder-only architecture * Layer normalization applied to word embeddings layer (; see code, paper) * ALiBI positional encodings (see paper), with GeLU activation functions * 1,722,408,960 parameters: * 513,802,240 embedding parameters * 24 layers, 16 attention heads * Hidden layers are 2048-dimensional * Sequence length of 2048 tokens used (see BLOOM tokenizer, tokenizer description) **Objective Function:** Cross Entropy with mean reduction (see API documentation). **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see announcement). * Hardware: 64 V100 16/32GB GPUs (16 nodes): * 4 GPUs per node * 40 CPUs per task * 1 task per node * CPU: AMD * CPU memory: 160GB per node * GPU memory: 64GB or 128GB (depending on node availability during training) per node * Inter-node connect: Omni-Path Architecture (OPA) * NCCL-communications network: a fully dedicated subnet * Disc IO network: shared network with other types of nodes * Software: * Megatron-DeepSpeed (Github link) * DeepSpeed (Github link) * PyTorch (pytorch-1.11 w/ CUDA-11.5; see Github link) * apex (Github link) ### **Training** - Checkpoint size: - Fp16 weights: 2.6GB (# params * 2) - Full checkpoint with optimizer states: -- - Training throughput: -- - Number of epochs: 1 - Dates: - Start: 11th March, 2022 11:42am PST - End: 20 May, 2022 - Server training location: Île-de-France, France ### **Tokenization** The BLOOM tokenizer (link) is a learned subword tokenizer trained using: - A byte-level Byte Pair Encoding (BPE) algorithm - A simple pre-tokenization rule, no normalization - A vocabulary size of 250,680 It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language. ## Citation **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022 ## Glossary and Calculations *This section defines common terms and how metrics are calculated.* - **Loss:** A calculation of the difference between what the model has learned and what the data shows (\"groundtruth\"). The lower the loss, the better. The training process aims to minimize the loss. - **Perplexity:** This is based on what the model estimates the probability of new data is. The lower the perplexity, the better. If the model is 100% correct at predicting the next token it will see, then the perplexity is 1. Mathematically this is calculated using entropy. - **High-stakes settings:** Such as those identified as \"high-risk AI systems\" and \"unacceptable risk AI systems\" in the European Union's proposed Artificial Intelligence (AI) Act. - **Critical decisions:** Such as those defined in the United States' proposed Algorithmic Accountability Act. - **Human rights:** Includes those rights defined in the Universal Declaration of Human Rights. - **Personal Data and Personal Information:** Personal data and information is defined in multiple data protection regulations, such as \"personal data\" in the European Union's General Data Protection Regulation; and \"personal information\" in the Republic of South Africa's Protection of Personal Information Act, The People's Republic of China's Personal information protection law. 
- **Sensitive characteristics:** This includes specifically protected categories in human rights (see UDHR, Article 2) and personal information regulation (see GDPR, Article 9; Protection of Personal Information Act, Chapter 1) - **Deception:** Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated. ## More Information ### Dataset Creation Blog post detailing the design choices during the dataset creation: ### Technical Specifications Blog post summarizing how the architecture, size, shape, and pre-training duration were selected: More details on the architecture/optimizer: Blog post on the hardware/engineering side: Details on the distributed setup used for the training: Tensorboard updated during the training: Insights on how to approach training, negative results: Details on the obstacles overcome during the preparation on the engineering side (instabilities, optimization of training throughput, so many technical tricks and questions): ### Initial Results Initial prompting experiments using interim checkpoints: ## Model Card Authors *Ordered roughly chronologically and by amount of time spent.* Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff ## Model Card Contact **Send Questions to:** bigscience-contact@googlegroups.com", + "model_explanation_gemini": "bigscience_bloom-1b7 is a multilingual, transformer-based language model designed for text generation and as a pretrained base for fine-tuning tasks like information extraction, question answering, and summarization across diverse languages."
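Text generation is the primary direct use this card lists; a minimal prompting sketch, assuming the standard transformers text-generation pipeline (the prompt is a placeholder):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-1b7")
print(generator("BLOOM is a multilingual language model that", max_new_tokens=30))
```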
+} \ No newline at end of file diff --git a/data/model_data_json/bigscience_bloom-560m.json b/data/model_data_json/bigscience_bloom-560m.json new file mode 100644 index 0000000000000000000000000000000000000000..6f2186b7a2061eb2baed2a59f1e945de48b9a38c --- /dev/null +++ b/data/model_data_json/bigscience_bloom-560m.json @@ -0,0 +1,71 @@ +{ + "model_id": "bigscience/bloom-560m", + "downloads": 638843, + "tags": [ + "transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "bloom", + "text-generation", + "ak", + "ar", + "as", + "bm", + "bn", + "ca", + "code", + "en", + "es", + "eu", + "fon", + "fr", + "gu", + "hi", + "id", + "ig", + "ki", + "kn", + "lg", + "ln", + "ml", + "mr", + "ne", + "nso", + "ny", + "or", + "pa", + "pt", + "rn", + "rw", + "sn", + "st", + "sw", + "ta", + "te", + "tn", + "ts", + "tum", + "tw", + "ur", + "vi", + "wo", + "xh", + "yo", + "zh", + "zhs", + "zht", + "zu", + "arxiv:1909.08053", + "arxiv:2110.02861", + "arxiv:2108.12409", + "license:bigscience-bloom-rail-1.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: bigscience-bloom-rail-1.0 language: - ak - ar - as - bm - bn - ca - code - en - es - eu - fon - fr - gu - hi - id - ig - ki - kn - lg - ln - ml - mr - ne - nso - ny - or - pa - pt - rn - rw - sn - st - sw - ta - te - tn - ts - tum - tw - ur - vi - wo - xh - yo - zh - zhs - zht - zu pipeline_tag: text-generation ---

BLOOM LM
BigScience Large Open-science Open-access Multilingual Language Model
Model Card
\"BigScience Version 1.0 / 26.May.2022 # Model Card for Bloom-560m ## Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Recommendations 5. Training Data 6. Evaluation 7. Environmental Impact 8. Technical Specifications 9. Citation 10. Glossary and Calculations 11. More Information 12. Model Card Authors 13. Model Card Contact ## Model Details ### Model Description *This section provides information for anyone who wants to know about the model.* - **Developed by:** BigScience (website) * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)* - **Model Type:** Transformer-based Language Model - **Version:** 1.0.0 - **Languages:** Multiple; see training data - **License:** RAIL License v1.0 (link) - **Release Date Estimate:** Monday, 11.July.2022 - **Funded by:** * The French government. * Hugging Face (website). * Organizations of contributors. *(Further breakdown of organizations forthcoming.)* ## Uses *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model. It provides information for anyone considering using the model or who is affected by the model.* ### Intended Use This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. Use cases below are not exhaustive. #### **Direct Use** - Text generation - Exploring characteristics of language generated by a language model - Examples: Cloze tests, counterfactuals, generations with reframings #### **Downstream Use** - Tasks that leverage language models include: Information Extraction, Question Answering, Summarization ### Misuse and Out-of-scope Use *This section addresses what users ought not do with the model.* See the BLOOM License, Attachment A, for detailed usage restrictions. The below list is non-exhaustive, but lists some easily foreseeable problematic use cases. #### **Out-of-scope Uses** Using the model in high-stakes settings is out of scope for this model. The model is not designed for critical decisions nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but is not correct. ##### Out-of-scope Uses Include: - Usage in biomedical domains, political and legal domains, or finance domains - Usage for evaluating or scoring individuals, such as for employment, education, or credit - Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct #### **Misuse** Intentionally using the model for harm, violating human rights, or other kinds of malicious activities, is a misuse of this model. 
This includes: - Spam generation - Disinformation and influence operations - Disparagement and defamation - Harassment and abuse - Deception - Unconsented impersonation and imitation - Unconsented surveillance - Generating content without attribution to the model, as specified in the RAIL License, Use Restrictions ### Intended Users #### **Direct Users** - General Public - Researchers - Students - Educators - Engineers/developers - Non-commercial entities - Community advocates, including human and civil rights groups #### Indirect Users - Users of derivatives created by Direct Users, such as those using software with an intended use - Users of Derivatives of the Model, as described in the License #### Others Affected (Parties Prenantes) - People and groups referred to by the LLM - People and groups exposed to outputs of, or decisions based on, the LLM - People and groups whose original work is included in the LLM ## Bias, Risks and Limitations *This section identifies foreseeable harms and misunderstandings.* Model may: - Overrepresent some viewpoints and underrepresent others - Contain stereotypes - Contain personal information - Generate: - Hateful, abusive, or violent language - Discriminatory or prejudicial language - Content that may not be appropriate for all settings, including sexual content - Make errors, including producing incorrect information as if it were factual - Generate irrelevant or repetitive outputs ### Recommendations *This section provides information on warnings and potential mitigations.* - Indirect users should be made aware when the content they're working with is created by the LLM. - Users should be aware of Risks and Limitations, and include an appropriate age disclaimer or blocking interface as necessary. - Models pretrained with the LLM should include an updated Model Card. - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments. ## Training Data *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.* Details for each dataset are provided in individual Data Cards. Training data includes: - 45 natural languages - 12 programming languages - In 1.5TB of pre-processed text, converted into 350B unique tokens (see the tokenizer section for more.) #### **Languages** The pie chart shows the distribution of languages in training data. 
!pie chart showing the distribution of languages in training data **The following table shows the further distribution of Niger-Congo and Indic languages in the training data.** | Niger Congo | Percentage | | Indic | Percentage | |----------------|------------ |------ |-----------|------------| | Chi Tumbuka | 0.00002 | | Assamese | 0.01 | | Kikuyu | 0.00004 | | Odia | 0.04 | | Bambara | 0.00004 | | Gujarati | 0.04 | | Akan | 0.00007 | | Marathi | 0.05 | | Xitsonga | 0.00007 | | Punjabi | 0.05 | | Sesotho | 0.00007 | | Kannada | 0.06 | | Chi Chewa | 0.0001 | | Nepali | 0.07 | | Setswana | 0.0002 | | Telugu | 0.09 | | Northern Sotho | 0.0002 | | Malayalam | 0.10 | | Fon | 0.0002 | | Urdu | 0.10 | | Kirundi | 0.0003 | | Tamil | 0.20 | | Wolof | 0.0004 | | Bengali | 0.50 | | Kuganda | 0.0004 | | Hindi | 0.70 | | Chi Shona | 0.001 | | Isi Zulu | 0.001 | | Igbo | 0.001 | | Xhosa | 0.001 | | Kinyarwanda | 0.003 | | Yoruba | 0.006 | | Swahili | 0.02 | **The following table shows the distribution of programming languages.** | Extension | Language | Number of files | |----------------|------------|-----------------| | java | Java | 5,407,724 | | php | PHP | 4,942,186 | | cpp | C++ | 2,503,930 | | py | Python | 2,435,072 | | js | JavaScript | 1,905,518 | | cs | C# | 1,577,347 | | rb | Ruby | 6,78,413 | | cc | C++ | 443,054 | | hpp | C++ | 391,048 | | lua | Lua | 352,317 | | go | GO | 227,763 | | ts | TypeScript | 195,254 | | C | C | 134,537 | | scala | Scala | 92,052 | | hh | C++ | 67,161 | | H | C++ | 55,899 | | tsx | TypeScript | 33,107 | | rs | Rust | 29,693 | | phpt | PHP | 9,702 | | c++ | C++ | 1,342 | | h++ | C++ | 791 | | php3 | PHP | 540 | | phps | PHP | 270 | | php5 | PHP | 166 | | php4 | PHP | 29 | ## Evaluation *This section describes the evaluation protocols and provides the results.* ### Metrics *This section describes the different ways performance is calculated and why.* Includes: | Metric | Why chosen | |--------------------|--------------------------------------------------------------------| | Perplexity | Standard metric for quantifying model improvements during training | | Cross Entropy Loss | Standard objective for language models. | And multiple different metrics for specific tasks. _(More evaluation metrics forthcoming upon completion of evaluation protocol.)_ ### Factors *This section lists some different aspects of what BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.* - Language, such as English or Yoruba - Domain, such as newswire or stories - Demographic characteristics, such as gender or nationality ### Results *Results are based on the Factors and Metrics.* **Train-time Evaluation:** As of 25.May.2022, 15:00 PST: - Training Loss: 2.0 - Validation Loss: 2.2 - Perplexity: 8.9 (More evaluation scores forthcoming at the end of model training.) ## Environmental Impact The training supercomputer, Jean Zay (website), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing. **Estimated carbon emissions:** *(Forthcoming upon completion of training.)* **Estimated electricity usage:** *(Forthcoming upon completion of training.)* ## Technical Specifications *This section provides information for people who work on model development.* Please see the BLOOM training README for full details on replicating training. 
**Model Architecture:** Modified from Megatron-LM GPT2 (see paper, BLOOM Megatron code): * Decoder-only architecture * Layer normalization applied to word embeddings layer (; see code, paper) * ALiBI positional encodings (see paper), with GeLU activation functions * 559,214,592 parameters: * 256,901,120 embedding parameters * 24 layers, 16 attention heads * Hidden layers are 1024-dimensional * Sequence length of 2048 tokens (see BLOOM tokenizer, tokenizer description) **Objective Function:** Cross Entropy with mean reduction (see API documentation). **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see announcement). * Hardware: 384 A100 80GB GPUs (48 nodes): * Additional 32 A100 80GB GPUs (4 nodes) in reserve * 8 GPUs per node Using NVLink 4 inter-gpu connects, 4 OmniPath links * CPU: AMD * CPU memory: 512GB per node * GPU memory: 640GB per node * Inter-node connect: Omni-Path Architecture (OPA) * NCCL-communications network: a fully dedicated subnet * Disc IO network: shared network with other types of nodes * Software: * Megatron-DeepSpeed (Github link) * DeepSpeed (Github link) * PyTorch (pytorch-1.11 w/ CUDA-11.5; see Github link) * apex (Github link) ### **Training** Training logs: Tensorboard link - Training throughput: About 150 TFLOPs per GPU - Number of epochs: 1 (*current target*) - Dates: - Started 11th March, 2022 11:42am PST - Ended 5th July, 2022 - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments and other model sizes) - Server training location: Île-de-France, France ### **Tokenization** The BLOOM tokenizer (link) is a learned subword tokenizer trained using: - A byte-level Byte Pair Encoding (BPE) algorithm - A simple pre-tokenization rule, no normalization - A vocabulary size of 250,680 It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language. ## Citation **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022 ## Glossary and Calculations *This section defines common terms and how metrics are calculated.* - **Loss:** A calculation of the difference between what the model has learned and what the data shows (\"groundtruth\"). The lower the loss, the better. The training process aims to minimize the loss. - **Perplexity:** This is based on what the model estimates the probability of new data is. The lower the perplexity, the better. If the model is 100% correct at predicting the next token it will see, then the perplexity is 1. Mathematically this is calculated using entropy. - **High-stakes settings:** Such as those identified as \"high-risk AI systems\" and \"unacceptable risk AI systems\" in the European Union's proposed Artificial Intelligence (AI) Act. - **Critical decisions:** Such as those defined in the United States' proposed Algorithmic Accountability Act. - **Human rights:** Includes those rights defined in the Universal Declaration of Human Rights. - **Personal Data and Personal Information:** Personal data and information is defined in multiple data protection regulations, such as \"personal data\" in the European Union's General Data Protection Regulation; and \"personal information\" in the Republic of South Africa's Protection of Personal Information Act, The People's Republic of China's Personal information protection law. 
- **Sensitive characteristics:** This includes specifically protected categories in human rights (see UDHR, Article 2) and personal information regulation (see GDPR, Article 9; Protection of Personal Information Act, Chapter 1) - **Deception:** Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated. ## More Information ### Dataset Creation Blog post detailing the design choices during the dataset creation: ### Technical Specifications Blog post summarizing how the architecture, size, shape, and pre-training duration were selected: More details on the architecture/optimizer: Blog post on the hardware/engineering side: Details on the distributed setup used for the training: Tensorboard updated during the training: Insights on how to approach training, negative results: Details on the obstacles overcome during the preparation on the engineering side (instabilities, optimization of training throughput, so many technical tricks and questions): ### Initial Results Initial prompting experiments using interim checkpoints: ## Model Card Authors *Ordered roughly chronologically and by amount of time spent.* Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff ## Model Card Contact **Send Questions to:** bigscience-contact@googlegroups.com", + "model_explanation_gemini": "BLOOM-560m is a multilingual text-generation model designed for public research on large language models, supporting diverse languages and tasks like information extraction and summarization."
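The tokenizer described above (byte-level BPE, vocabulary of 250,680, no normalization) can be inspected directly; a small sketch, assuming the standard AutoTokenizer API:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
print(tok.vocab_size)                    # 250680, matching the card
print(tok.tokenize("Bonjour le monde"))  # byte-level BPE pieces, no normalization applied
```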
+} \ No newline at end of file diff --git a/data/model_data_json/bigscience_bloom-7b1-intermediate.json b/data/model_data_json/bigscience_bloom-7b1-intermediate.json new file mode 100644 index 0000000000000000000000000000000000000000..5f4b9d5a2b74896713e3f62bcfe2717df434c5ae --- /dev/null +++ b/data/model_data_json/bigscience_bloom-7b1-intermediate.json @@ -0,0 +1,69 @@ +{ + "model_id": "bigscience/bloom-7b1-intermediate", + "downloads": 37245, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bloom", + "text-generation", + "ak", + "ar", + "as", + "bm", + "bn", + "ca", + "code", + "en", + "es", + "eu", + "fon", + "fr", + "gu", + "hi", + "id", + "ig", + "ki", + "kn", + "lg", + "ln", + "ml", + "mr", + "ne", + "nso", + "ny", + "or", + "pa", + "pt", + "rn", + "rw", + "sn", + "st", + "sw", + "ta", + "te", + "tn", + "ts", + "tum", + "tw", + "ur", + "vi", + "wo", + "xh", + "yo", + "zh", + "zhs", + "zht", + "zu", + "arxiv:1909.08053", + "arxiv:2110.02861", + "arxiv:2108.12409", + "license:bigscience-bloom-rail-1.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: bigscience-bloom-rail-1.0 language: - ak - ar - as - bm - bn - ca - code - en - es - eu - fon - fr - gu - hi - id - ig - ki - kn - lg - ln - ml - mr - ne - nso - ny - or - pa - pt - rn - rw - sn - st - sw - ta - te - tn - ts - tum - tw - ur - vi - wo - xh - yo - zh - zhs - zht - zu --- # WARNING: The checkpoints on this repo are not fully trained models. Evaluations of intermediary checkpoints and the final model will be added when conducted (see below). #

BLOOM LM
_BigScience Large Open-science Open-access Multilingual Language Model_
Model Card
\"BigScience Version 1.3 / 11.July.2022 - Available intermediary checkpoints - global steps: + , , , , , , , You can check the available checkpoints by clicking on the branches section of the repo # How to load a specific version We use to load a model in a specific version (eg. ): # Table of Contents 1. Model Details 2. Uses 3. Training Data 4. Risks and Limitations 5. Evaluation 6. Recommendations 7. Glossary and Calculations 8. More Information 9. Model Card Authors --- # Model Details BLOOM is a type of language model, which is a probability distribution over sequences of words. Specifically, BLOOM is a Large Language Model (LLM), meaning that it is trained on vast amounts of text data using industrial-scale computational resources. As such, the model is able to capture the statistical tendencies of words, phrases, sentences, and larger spans of text that it is exposed to in the training data. ## Basics *This section provides information about the model type, version, license, funders, release date, developers, and contact information.* *It is useful for anyone who wants to reference the model.*
**Developed by:** BigScience (website) *All collaborators are either volunteers or have an agreement with their employer. (Further breakdown of participants forthcoming.)* **Model Type:** Transformer-based Language Model **Version:** 1.0.0 **Languages:** Multiple; see training data **License:** RAIL License v1.0 (link) **Release Date Estimate:** Monday, 11.July.2022 **Send Questions to:** bigscience-contact@googlegroups.com **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022 **Funded by:** * The French government. * Hugging Face (website). * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*
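A minimal sketch of loading one of the intermediary checkpoints via the `revision` argument, assuming each checkpoint lives on its own branch; the branch name below is hypothetical, so check the repo's branches section for the real global-step names:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "global_step1000" is a placeholder branch name; the actual global-step
# branches are listed in the repo's branches section.
revision = "global_step1000"
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1-intermediate", revision=revision)
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1-intermediate", revision=revision)
```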
## Technical Specifications *This section includes details about the model objective and architecture, and the compute infrastructure.* *It is useful for people interested in model development.*
Click to expand Please see the BLOOM training README for full details on replicating training. ### Model Architecture and Objective * Modified from Megatron-LM GPT2 (see paper, BLOOM Megatron code): * Decoder-only architecture * Layer normalization applied to word embeddings layer (; see code, paper) * ALiBI positional encodings (see paper), with GeLU activation functions * 176 billion parameters: * 70 layers, 112 attention heads * Hidden layers are 14336-dimensional * Sequence length of 2048 tokens used (see BLOOM tokenizer, tokenizer description) **Objective Function:** Cross Entropy with mean reduction (see API documentation). ### Compute infrastructure Jean Zay Public Supercomputer, provided by the French government (see announcement). #### Hardware * 384 A100 80GB GPUs (48 nodes) * Additional 32 A100 80GB GPUs (4 nodes) in reserve * 8 GPUs per node Using NVLink 4 inter-gpu connects, 4 OmniPath links * CPU: AMD * CPU memory: 512GB per node * GPU memory: 640GB per node * Inter-node connect: Omni-Path Architecture (OPA) * NCCL-communications network: a fully dedicated subnet * Disc IO network: shared network with other types of nodes #### Software * Megatron-DeepSpeed (Github link) * DeepSpeed (Github link) * PyTorch (pytorch-1.11 w/ CUDA-11.5; see Github link) * apex (Github link)
--- # Training *This section provides information about the training data, the speed and size of training elements, and the environmental impact of training.* *It is useful for people who want to learn more about the model inputs and training footprint.*
Click to expand ## Training Data *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.* Details for each dataset are provided in individual Data Cards. Training data includes: - 45 natural languages - 12 programming languages - In 1.5TB of pre-processed text, converted into 350B unique tokens (see the tokenizer section for more.) ### Languages The pie chart shows the distribution of languages in training data. !pie chart showing the distribution of languages in training data The following tables shows the further distribution of Niger-Congo & Indic languages and programming languages in the training data. Distribution of Niger Congo and Indic languages. | Niger Congo | Percentage | | Indic | Percentage | |----------------|------------ |------ |-----------|------------| | Chi Tumbuka | 0.00002 | | Assamese | 0.01 | | Kikuyu | 0.00004 | | Odia | 0.04 | | Bambara | 0.00004 | | Gujarati | 0.04 | | Akan | 0.00007 | | Marathi | 0.05 | | Xitsonga | 0.00007 | | Punjabi | 0.05 | | Sesotho | 0.00007 | | Kannada | 0.06 | | Chi Chewa | 0.0001 | | Nepali | 0.07 | | Setswana | 0.0002 | | Telugu | 0.09 | | Northern Sotho | 0.0002 | | Malayalam | 0.10 | | Fon | 0.0002 | | Urdu | 0.10 | | Kirundi | 0.0003 | | Tamil | 0.20 | | Wolof | 0.0004 | | Bengali | 0.50 | | Kuganda | 0.0004 | | Hindi | 0.70 | | Chi Shona | 0.001 | | Isi Zulu | 0.001 | | Igbo | 0.001 | | Xhosa | 0.001 | | Kinyarwanda | 0.003 | | Yoruba | 0.006 | | Swahili | 0.02 | Distribution of programming languages. | Extension | Language | Number of files | |----------------|------------|-----------------| | java | Java | 5,407,724 | | php | PHP | 4,942,186 | | cpp | C++ | 2,503,930 | | py | Python | 2,435,072 | | js | JavaScript | 1,905,518 | | cs | C# | 1,577,347 | | rb | Ruby | 6,78,413 | | cc | C++ | 443,054 | | hpp | C++ | 391,048 | | lua | Lua | 352,317 | | go | GO | 227,763 | | ts | TypeScript | 195,254 | | C | C | 134,537 | | scala | Scala | 92,052 | | hh | C++ | 67,161 | | H | C++ | 55,899 | | tsx | TypeScript | 33,107 | | rs | Rust | 29,693 | | phpt | PHP | 9,702 | | c++ | C++ | 1,342 | | h++ | C++ | 791 | | php3 | PHP | 540 | | phps | PHP | 270 | | php5 | PHP | 166 | | php4 | PHP | 29 | ### Preprocessing **Tokenization:** The BLOOM tokenizer (link), a learned subword tokenizer trained using: - A byte-level Byte Pair Encoding (BPE) algorithm - A simple pre-tokenization rule, no normalization - A vocabulary size of 250,680 It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language. ## Speeds, Sizes, Times Training logs: Tensorboard link - Dates: - Started 11th March, 2022 11:42am PST - Estimated end: 5th July, 2022 - Checkpoint size: - Bf16 weights: 329GB - Full checkpoint with optimizer states: 2.3TB - Training throughput: About 150 TFLOP per GPU per second - Number of epochs: 1 - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments) - Server training location: Île-de-France, France ## Environmental Impact The training supercomputer, Jean Zay (website), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing. **Estimated carbon emissions:** *(Forthcoming.)* **Estimated electricity usage:** *(Forthcoming.)*
--- # Uses *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.* *It is useful for anyone considering using the model or who is affected by the model.*
Click to expand ## Intended Use This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. Use cases below are not exhaustive. ### Direct Use - Text generation - Exploring characteristics of language generated by a language model - Examples: Cloze tests, counterfactuals, generations with reframings ### Downstream Use - Tasks that leverage language models include: Information Extraction, Question Answering, Summarization ### Misuse and Out-of-scope Use *This section addresses what users ought not do with the model.* See the BLOOM License, Attachment A, for detailed usage restrictions. The below list is non-exhaustive, but lists some easily foreseeable problematic use cases. #### Out-of-scope Uses Using the model in high-stakes settings is out of scope for this model. The model is not designed for critical decisions nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but is not correct. Out-of-scope Uses Include: - Usage in biomedical domains, political and legal domains, or finance domains - Usage for evaluating or scoring individuals, such as for employment, education, or credit - Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct #### Misuse Intentionally using the model for harm, violating human rights, or other kinds of malicious activities, is a misuse of this model. This includes: - Spam generation - Disinformation and influence operations - Disparagement and defamation - Harassment and abuse - Deception - Unconsented impersonation and imitation - Unconsented surveillance - Generating content without attribution to the model, as specified in the RAIL License, Use Restrictions ## Intended Users ### Direct Users - General Public - Researchers - Students - Educators - Engineers/developers - Non-commercial entities - Community advocates, including human and civil rights groups ### Indirect Users - Users of derivatives created by Direct Users, such as those using software with an intended use - Users of Derivatives of the Model, as described in the License ### Others Affected (Parties Prenantes) - People and groups referred to by the LLM - People and groups exposed to outputs of, or decisions based on, the LLM - People and groups whose original work is included in the LLM
--- # Risks and Limitations *This section identifies foreseeable harms and misunderstandings.*
Model may: - Overrepresent some viewpoints and underrepresent others - Contain stereotypes - Contain personal information - Generate: - Hateful, abusive, or violent language - Discriminatory or prejudicial language - Content that may not be appropriate for all settings, including sexual content - Make errors, including producing incorrect information as if it were factual - Generate irrelevant or repetitive outputs
--- # Evaluation *This section describes the evaluation protocols and provides the results.*
## Metrics *This section describes the different ways performance is calculated and why.* Includes: | Metric | Why chosen | |--------------------|--------------------------------------------------------------------| | Perplexity | Standard metric for quantifying model improvements during training | | Cross Entropy Loss | Standard objective for language models. | And multiple different metrics for specific tasks. _(More evaluation metrics forthcoming upon completion of evaluation protocol.)_ ## Factors *This section lists some different aspects of what BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.* - Language, such as English or Yoruba - Domain, such as newswire or stories - Demographic characteristics, such as gender or nationality ## Results *Results are based on the Factors and Metrics.* **Train-time Evaluation:** As of 25.May.2022, 15:00 PST: - Training Loss: 2.0 - Validation Loss: 2.2 - Perplexity: 8.9 (More evaluation scores forthcoming.)
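The Glossary and Calculations section below notes that perplexity is computed from entropy; concretely, the reported train-time perplexity is just the exponential of the validation cross-entropy quoted above, which makes for a quick consistency check:

```python
import math

val_loss = 2.2             # validation cross-entropy (in nats), from the Results above
print(math.exp(val_loss))  # ~9.03, in line with the reported perplexity of 8.9
```

The small gap comes from the rounding of the reported loss.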
--- # Recommendations *This section provides information on warnings and potential mitigations.*
- Indirect users should be made aware when the content they're working with is created by the LLM. - Users should be aware of Risks and Limitations, and include an appropriate age disclaimer or blocking interface as necessary. - Models trained or finetuned downstream of BLOOM LM should include an updated Model Card. - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
--- # Glossary and Calculations *This section defines common terms and how metrics are calculated.*
- **Loss:** A calculation of the difference between what the model has learned and what the data shows (\"groundtruth\"). The lower the loss, the better. The training process aims to minimize the loss. - **Perplexity:** This is based on what the model estimates the probability of new data is. The lower the perplexity, the better. If the model is 100% correct at predicting the next token it will see, then the perplexity is 1. Mathematically this is calculated using entropy. - **High-stakes settings:** Such as those identified as \"high-risk AI systems\" and \"unacceptable risk AI systems\" in the European Union's proposed Artificial Intelligence (AI) Act. - **Critical decisions:** Such as those defined in the United States' proposed Algorithmic Accountability Act. - **Human rights:** Includes those rights defined in the Universal Declaration of Human Rights. - **Personal Data and Personal Information:** Personal data and information is defined in multiple data protection regulations, such as \"personal data\" in the European Union's General Data Protection Regulation; and \"personal information\" in the Republic of South Africa's Protection of Personal Information Act, The People's Republic of China's Personal information protection law. - **Sensitive characteristics:** This includes specifically protected categories in human rights (see UDHR, Article 2) and personal information regulation (see GDPR, Article 9; Protection of Personal Information Act, Chapter 1) - **Deception:** Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.
--- # More Information *This section provides links to writing on dataset creation, technical specifications, lessons learned, and initial results.*
## Dataset Creation Blog post detailing the design choices during the dataset creation: ## Technical Specifications Blog post summarizing how the architecture, size, shape, and pre-training duration were selected: More details on the architecture/optimizer: Blog post on the hardware/engineering side: Details on the distributed setup used for the training: Tensorboard updated during the training: ## Lessons Insights on how to approach training, negative results: Details on the obstacles overcome during the preparation on the engineering side (instabilities, optimization of training throughput, so many technical tricks and questions): ## Initial Results Initial prompting experiments using interim checkpoints:
--- # Model Card Authors *Ordered roughly chronologically and by amount of time spent.* Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff", + "model_explanation_gemini": "An intermediate, multilingual large language model trained on diverse languages for capturing statistical patterns in text, released as part of the BigScience BLOOM project." +} \ No newline at end of file diff --git a/data/model_data_json/bigscience_bloomz-560m.json b/data/model_data_json/bigscience_bloomz-560m.json new file mode 100644 index 0000000000000000000000000000000000000000..b9ce82c1c75bc41e2a417351cb50aa8b13018a91 --- /dev/null +++ b/data/model_data_json/bigscience_bloomz-560m.json @@ -0,0 +1,68 @@ +{ + "model_id": "bigscience/bloomz-560m", + "downloads": 679756, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "safetensors", + "bloom", + "text-generation", + "ak", + "ar", + "as", + "bm", + "bn", + "ca", + "code", + "en", + "es", + "eu", + "fon", + "fr", + "gu", + "hi", + "id", + "ig", + "ki", + "kn", + "lg", + "ln", + "ml", + "mr", + "ne", + "nso", + "ny", + "or", + "pa", + "pt", + "rn", + "rw", + "sn", + "st", + "sw", + "ta", + "te", + "tn", + "ts", + "tum", + "tw", + "ur", + "vi", + "wo", + "xh", + "yo", + "zh", + "zu", + "dataset:bigscience/xP3", + "arxiv:2211.01786", + "license:bigscience-bloom-rail-1.0", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - bigscience/xP3 license: bigscience-bloom-rail-1.0 language: - ak - ar - as - bm - bn - ca - code - en - es - eu - fon - fr - gu - hi - id - ig - ki - kn - lg - ln - ml - mr - ne - nso - ny - or - pa - pt - rn - rw - sn - st - sw - ta - te - tn - ts - tum - tw - ur - vi - wo - xh - yo - zh - zu programming_language: - C - C++ - C# - Go - Java - JavaScript - Lua - PHP - Python - Ruby - Rust - Scala - TypeScript pipeline_tag: text-generation widget: - text: \"一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative?\" example_title: \"zh-en sentiment\" - text: \"一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?\" example_title: \"zh-zh sentiment\" - text: \"Suggest at least five related search terms to \\\"Mạng neural nhân tạo\\\".\" example_title: \"vi-en query\" - text: \"Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels».\" example_title: \"fr-fr query\" - text: \"Explain in a sentence in Telugu what is backpropagation in neural networks.\" example_title: \"te-en qa\" - text: \"Why is the sky blue?\" example_title: \"en-en qa\" - text: \"Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is \\\"Heroes Come in All Shapes and Sizes\\\". Story (in Spanish):\" example_title: \"es-en fable\" - text: \"Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is \\\"Violence is the last refuge of the incompetent\\\". 
Fable (in Hindi):\" example_title: \"hi-en fable\" model-index: - name: bloomz-560m results: - task: type: Coreference resolution dataset: type: winogrande name: Winogrande XL (xl) config: xl split: validation revision: a80f460359d1e9a67c006011c94de42a8759430c metrics: - type: Accuracy value: 52.41 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (en) config: en split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 51.01 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (fr) config: fr split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 51.81 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (jp) config: jp split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 52.03 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (pt) config: pt split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 53.99 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (ru) config: ru split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 53.97 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (zh) config: zh split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 54.76 - task: type: Natural language inference dataset: type: anli name: ANLI (r1) config: r1 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 33.4 - task: type: Natural language inference dataset: type: anli name: ANLI (r2) config: r2 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 33.4 - task: type: Natural language inference dataset: type: anli name: ANLI (r3) config: r3 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 33.5 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (cb) config: cb split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 53.57 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (rte) config: rte split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 67.15 - task: type: Natural language inference dataset: type: xnli name: XNLI (ar) config: ar split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 44.46 - task: type: Natural language inference dataset: type: xnli name: XNLI (bg) config: bg split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 39.76 - task: type: Natural language inference dataset: type: xnli name: XNLI (de) config: de split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 39.36 - task: type: Natural language inference dataset: type: xnli name: XNLI (el) config: el split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 40.96 - task: type: Natural language inference dataset: type: xnli name: XNLI (en) config: en split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 46.43 - task: type: Natural 
language inference dataset: type: xnli name: XNLI (es) config: es split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 44.98 - task: type: Natural language inference dataset: type: xnli name: XNLI (fr) config: fr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 45.54 - task: type: Natural language inference dataset: type: xnli name: XNLI (hi) config: hi split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 41.81 - task: type: Natural language inference dataset: type: xnli name: XNLI (ru) config: ru split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 39.64 - task: type: Natural language inference dataset: type: xnli name: XNLI (sw) config: sw split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 38.35 - task: type: Natural language inference dataset: type: xnli name: XNLI (th) config: th split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 35.5 - task: type: Natural language inference dataset: type: xnli name: XNLI (tr) config: tr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 37.31 - task: type: Natural language inference dataset: type: xnli name: XNLI (ur) config: ur split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 38.96 - task: type: Natural language inference dataset: type: xnli name: XNLI (vi) config: vi split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 44.74 - task: type: Natural language inference dataset: type: xnli name: XNLI (zh) config: zh split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 44.66 - task: type: Program synthesis dataset: type: openai_humaneval name: HumanEval config: None split: test revision: e8dc562f5de170c54b5481011dd9f4fa04845771 metrics: - type: Pass@1 value: 2.18 - type: Pass@10 value: 4.11 - type: Pass@100 value: 9.00 - task: type: Sentence completion dataset: type: story_cloze name: StoryCloze (2016) config: \"2016\" split: validation revision: e724c6f8cdf7c7a2fb229d862226e15b023ee4db metrics: - type: Accuracy value: 60.29 - task: type: Sentence completion dataset: type: super_glue name: SuperGLUE (copa) config: copa split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 52.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (et) config: et split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 53.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ht) config: ht split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 49.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (id) config: id split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 57.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (it) config: it split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 52.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (qu) config: qu split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 55.0 - 
task: type: Sentence completion dataset: type: xcopa name: XCOPA (sw) config: sw split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 56.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ta) config: ta split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 58.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (th) config: th split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 58.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (tr) config: tr split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 61.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (vi) config: vi split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 61.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (zh) config: zh split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 61.0 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ar) config: ar split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 54.4 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (es) config: es split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 56.45 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (eu) config: eu split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 50.56 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (hi) config: hi split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 55.79 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (id) config: id split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 57.84 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (my) config: my split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 47.05 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ru) config: ru split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 53.14 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (sw) config: sw split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 51.36 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (te) config: te split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 54.86 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (zh) config: zh split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 56.52 --- !xmtf # Table of Contents 1. Model Summary 2. Use 3. Limitations 4. Training 5. Evaluation 7. Citation # Model Summary > We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. 
We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find the resulting models capable of crosslingual generalization to unseen tasks & languages. - **Repository:** bigscience-workshop/xmtf - **Paper:** Crosslingual Generalization through Multitask Finetuning - **Point of Contact:** Niklas Muennighoff - **Languages:** Refer to bloom for pretraining & xP3 for finetuning language proportions. It understands both pretraining & finetuning languages. - **BLOOMZ & mT0 Model Family:**
*(Model family overview: multitask-finetuned checkpoints at 300M, 580M, 1.2B, 3.7B, and 13B parameters (mT0) and at 560M, 1.1B, 1.7B, 3B, 7.1B, and 176B parameters (BLOOMZ), finetuned on xP3 and related mixtures; the original pretrained checkpoints are also listed but not recommended for prompting.)*
# Use ## Intended use We recommend using the model to perform tasks expressed in natural language. For example, given the prompt \"*Translate to English: Je t’aime.*\", the model will most likely answer \"*I love you.*\". Some prompt ideas from our paper: - 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评? - Suggest at least five related search terms to \"Mạng neural nhân tạo\". - Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is \"Heroes Come in All Shapes and Sizes\". Story (in Spanish): - Explain in a sentence in Telugu what is backpropagation in neural networks. **Feel free to share your generations in the Community tab!** ## How to use ### CPU
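The original snippet is collapsed in this export; a minimal sketch of typical CPU usage, assuming the `transformers` library is installed:

```python
# Minimal CPU inference sketch for bigscience/bloomz-560m.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```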
### GPU
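A corresponding GPU sketch, assuming a CUDA device and the `accelerate` package (which provides `device_map="auto"`):

```python
# GPU inference sketch: weights are placed on the GPU automatically.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```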
### GPU in 8bit
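And an 8-bit variant, assuming `bitsandbytes` and `accelerate` are installed (older `transformers` versions expose `load_in_8bit` directly; newer ones route it through a `BitsAndBytesConfig`):

```python
# 8-bit quantized inference sketch: roughly halves memory at a small
# accuracy cost.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, device_map="auto", load_in_8bit=True
)

inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```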
# Limitations **Prompt Engineering:** The performance may vary depending on the prompt. For BLOOMZ models, we recommend making it very clear when the input stops to avoid the model trying to continue it. For example, the prompt \"*Translate to English: Je t'aime*\" without the full stop (.) at the end may result in the model trying to continue the French sentence. Better prompts are e.g. \"*Translate to English: Je t'aime.*\", \"*Translate to English: Je t'aime. Translation:*\", \"*What is \"Je t'aime.\" in English?*\", where it is clear for the model when it should answer. Further, we recommend providing the model as much context as possible. For example, if you want it to answer in Telugu, then tell the model, e.g. \"*Explain in a sentence in Telugu what is backpropagation in neural networks.*\". # Training ## Model - **Architecture:** Same as bloom-560m, also refer to the file - **Finetuning steps:** 1750 - **Finetuning tokens:** 3.67 billion - **Finetuning layout:** 1x pipeline parallel, 1x tensor parallel, 1x data parallel - **Precision:** float16 ## Hardware - **CPUs:** AMD CPUs with 512GB memory per node - **GPUs:** 64 A100 80GB GPUs with 8 GPUs per node (8 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links - **Communication:** NCCL-communications network with a fully dedicated subnet ## Software - **Orchestration:** Megatron-DeepSpeed - **Optimizer & parallelism:** DeepSpeed - **Neural networks:** PyTorch (pytorch-1.11 w/ CUDA-11.5) - **FP16 if applicable:** apex # Evaluation We refer to Table 7 from our paper & bigscience/evaluation-results for zero-shot results on unseen tasks. The sidebar reports zero-shot performance of the best prompt per dataset config. # Citation", + "model_explanation_gemini": "A multilingual text-generation model capable of handling diverse tasks like sentiment analysis, query suggestions, question answering, and creative writing across multiple languages and programming languages." +} \ No newline at end of file diff --git a/data/model_data_json/bionlp_bluebert_pubmed_mimic_uncased_L-12_H-768_A-12.json b/data/model_data_json/bionlp_bluebert_pubmed_mimic_uncased_L-12_H-768_A-12.json new file mode 100644 index 0000000000000000000000000000000000000000..fac2a896f6a5fe0774c05e9309c78566bac5cf25 --- /dev/null +++ b/data/model_data_json/bionlp_bluebert_pubmed_mimic_uncased_L-12_H-768_A-12.json @@ -0,0 +1,19 @@ +{ + "model_id": "bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12", + "downloads": 189235, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "bluebert", + "en", + "dataset:PubMed", + "dataset:MIMIC-III", + "license:cc0-1.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - bert - bluebert license: cc0-1.0 datasets: - PubMed - MIMIC-III --- # BlueBert-Base, Uncased, PubMed and MIMIC-III ## Model description A BERT model pre-trained on PubMed abstracts and clinical notes (MIMIC-III). ## Intended uses & limitations #### How to use Please see ## Training data We provide preprocessed PubMed texts that were used to pre-train the BlueBERT models. The corpus contains ~4000M words extracted from the PubMed ASCII code version. Pre-trained model: ## Training procedure * lowercasing the text * removing special chars - * tokenizing the text using the NLTK Treebank tokenizer Below is a code snippet for more details.
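The referenced snippet was not captured here; a minimal sketch of the described preprocessing, where the exact special-character filter is an assumption (the card only mentions the PubMed ASCII version):

```python
# Sketch of the described preprocessing: lowercase, strip special
# characters, then tokenize with the NLTK Treebank tokenizer.
import re
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()

def preprocess(text: str) -> str:
    text = text.lower()                        # lowercasing the text
    text = re.sub(r"[^\x00-\x7f]", " ", text)  # assumed special-char filter
    return " ".join(tokenizer.tokenize(text))  # NLTK Treebank tokenization

print(preprocess("BlueBERT was pre-trained on PubMed abstracts."))
```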
### BibTeX entry and citation info ### Acknowledgments This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine and Clinical Center. This work was supported by the National Library of Medicine of the National Institutes of Health under award number 4R00LM013001-01. We are also grateful to the authors of BERT and ELMo for making their data and code publicly available. We would like to thank Dr Sun Kim for processing the PubMed texts. ### Disclaimer This tool shows the results of research conducted in the Computational Biology Branch, NCBI. The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical professional. Individuals should not change their health behavior solely on the basis of information produced on this website. NIH does not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional. More information about NCBI's disclaimer policy is available.", + "model_explanation_gemini": "A BERT model pre-trained on PubMed abstracts and clinical notes (MIMIC-III) for biomedical and clinical text processing." +} \ No newline at end of file diff --git a/data/model_data_json/biu-nlp_f-coref.json b/data/model_data_json/biu-nlp_f-coref.json new file mode 100644 index 0000000000000000000000000000000000000000..89bfdf647b8aaf3c8be8eed2f520c6a9039c9783 --- /dev/null +++ b/data/model_data_json/biu-nlp_f-coref.json @@ -0,0 +1,25 @@ +{ + "model_id": "biu-nlp/f-coref", + "downloads": 217419, + "tags": [ + "transformers", + "pytorch", + "roberta", + "fast", + "coreference-resolution", + "en", + "dataset:multi_news", + "dataset:ontonotes", + "arxiv:2209.04280", + "arxiv:2205.12644", + "arxiv:1907.10529", + "arxiv:2101.00434", + "arxiv:2109.04127", + "license:mit", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - fast - coreference-resolution license: mit datasets: - multi_news - ontonotes metrics: - CoNLL task_categories: - coreference-resolution model-index: - name: biu-nlp/f-coref results: - task: type: coreference-resolution name: coreference-resolution dataset: name: ontonotes type: coreference metrics: - name: Avg. F1 type: CoNLL value: 78.5 --- ## F-Coref: Fast, Accurate and Easy to Use Coreference Resolution F-Coref can process 2.8K OntoNotes documents in 25 seconds on a V100 GPU (compared to 6 minutes for the LingMess model, and to 12 minutes for the popular AllenNLP coreference model) with only a modest drop in accuracy. This speed is achieved through a combination of distillation of a compact model from the LingMess model, and an efficient batching implementation using a technique we call leftovers batching. Please check the official repository for more details and updates. #### Experiments | Model | Runtime | Memory | |-----------------------|---------|---------| | Joshi et al. (2020) | 12:06 | 27.4 | | Otmazgin et al. (2022) | 06:43 | 4.6 | | + Batching | 06:00 | 6.6 | | Kirstain et al. (2021) | 04:37 | 4.4 | | Dobrovolskii (2021) | 03:49 | 3.5 | | F-Coref | 00:45 | 3.3 | | + Batching | 00:35 | 4.5 | | + Leftovers batching | 00:25 | 4.0 | The inference time (Min:Sec) and memory (GiB) for each model on 2.8K documents. Average of 3 runs. Hardware: NVIDIA Tesla V100 SXM2.
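For quick experimentation, a sketch using the companion `fastcoref` package (assumed installed via `pip install fastcoref`), which wraps this checkpoint:

```python
# Coreference prediction sketch with the fastcoref package.
from fastcoref import FCoref

model = FCoref(device="cuda:0")  # use device="cpu" without a GPU
preds = model.predict(
    texts=["Alice gave Bob her book because he had lost his."]
)
print(preds[0].get_clusters())  # resolved mention clusters as strings
```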
### Citation F-coref: Fast, Accurate and Easy to Use Coreference Resolution (Otmazgin et al., AACL-IJCNLP 2022)", + "model_explanation_gemini": "F-Coref is a fast and efficient coreference resolution model optimized for speed and performance, achieving high accuracy while processing documents significantly quicker than comparable models." +} \ No newline at end of file diff --git a/data/model_data_json/biu-nlp_lingmess-coref.json b/data/model_data_json/biu-nlp_lingmess-coref.json new file mode 100644 index 0000000000000000000000000000000000000000..28590484b600543732a700207ea7ecb482971f5a --- /dev/null +++ b/data/model_data_json/biu-nlp_lingmess-coref.json @@ -0,0 +1,19 @@ +{ + "model_id": "biu-nlp/lingmess-coref", + "downloads": 550422, + "tags": [ + "transformers", + "pytorch", + "longformer", + "coreference-resolution", + "en", + "dataset:ontonotes", + "arxiv:2205.12644", + "license:mit", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - coreference-resolution license: mit datasets: - ontonotes metrics: - CoNLL task_categories: - coreference-resolution model-index: - name: biu-nlp/lingmess-coref results: - task: type: coreference-resolution name: coreference-resolution dataset: name: ontonotes type: coreference metrics: - name: Avg. F1 type: CoNLL value: 81.4 --- ## LingMess: Linguistically Informed Multi Expert Scorers for Coreference Resolution LingMess is a linguistically motivated categorization of mention-pairs into 6 types of coreference decisions and learn a dedicated trainable scoring function for each category. This significantly improves the accuracy of the pairwise scorer as well as of the overall coreference performance on the English Ontonotes coreference corpus. Please check the official repository for more details and updates. #### Training on OntoNotes We present the test results on OntoNotes 5.0 dataset. | Model | Avg. F1 | |---------------------------------|---------| | SpanBERT-large + e2e | 79.6 | | Longformer-large + s2e | 80.3 | | **Longformer-large + LingMess** | 81.4 | ### Citation If you find LingMess useful for your work, please cite the following paper:", + "model_explanation_gemini": "LingMess improves coreference resolution accuracy by categorizing mention-pairs into 6 types and learning dedicated scoring functions for each, achieving 81.4 Avg. F1 on OntoNotes." +} \ No newline at end of file diff --git a/data/model_data_json/black-forest-labs_FLUX.1-Fill-dev.json b/data/model_data_json/black-forest-labs_FLUX.1-Fill-dev.json new file mode 100644 index 0000000000000000000000000000000000000000..17bc586fca29b9611d0d1728ddddb8a283d390d8 --- /dev/null +++ b/data/model_data_json/black-forest-labs_FLUX.1-Fill-dev.json @@ -0,0 +1,16 @@ +{ + "model_id": "black-forest-labs/FLUX.1-Fill-dev", + "downloads": 324576, + "tags": [ + "diffusers", + "safetensors", + "image-generation", + "flux", + "diffusion-single-file", + "en", + "license:other", + "diffusers:FluxFillPipeline", + "region:us" + ], + "description": "--- language: - en license: other license_name: flux-1-dev-non-commercial-license license_link: LICENSE.md extra_gated_prompt: By clicking \"Agree\", you agree to the FluxDev Non-Commercial License Agreement and acknowledge the Acceptable Use Policy. tags: - image-generation - flux - diffusion-single-file --- !image/jpeg is a 12 billion parameter rectified flow transformer capable of filling areas in existing images based on a text description. For more information, please read our blog post. 
# Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model . 2. Blends impressive prompt following with completing the structure of your source image. 3. Trained using guidance distillation, making more efficient. 4. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the Non-Commercial License. # Usage We provide a reference implementation of , as well as sampling code, in a dedicated github repository. Developers and creatives looking to build on top of are encouraged to use this as a starting point. ## API Endpoints The FLUX.1 models are also available in our API bfl.ml !image/png ## Diffusers To use with the 🧨 diffusers python library, first install or upgrade diffusers Then you can use to run the model To learn more check out the diffusers documentation --- # Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate output that matches the prompts. - Prompt following is heavily influenced by the prompting-style. - There may be slight-color shifts in areas that are not filled in - Filling in complex textures may produce lines at the edges of the filled-area. # Out-of-Scope Use The model and its derivatives may not be used - In any way that violates any applicable national, federal, state, local or international law or regulation. - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content. - To generate or disseminate verifiably false information and/or content with the purpose of harming others. - To generate or disseminate personal identifiable information that can be used to harm an individual. - To harass, abuse, threat" +} \ No newline at end of file diff --git a/data/model_data_json/black-forest-labs_FLUX.1-Redux-dev.json b/data/model_data_json/black-forest-labs_FLUX.1-Redux-dev.json new file mode 100644 index 0000000000000000000000000000000000000000..a361a04fcc37db8dc554461e108fb3b7f0a3e67a --- /dev/null +++ b/data/model_data_json/black-forest-labs_FLUX.1-Redux-dev.json @@ -0,0 +1,16 @@ +{ + "model_id": "black-forest-labs/FLUX.1-Redux-dev", + "downloads": 204436, + "tags": [ + "diffusers", + "safetensors", + "image-generation", + "flux", + "diffusion-single-file", + "en", + "license:other", + "diffusers:FluxPriorReduxPipeline", + "region:us" + ], + "description": "--- language: - en license: other license_name: flux-1-dev-non-commercial-license license_link: LICENSE.md extra_gated_prompt: By clicking \"Agree\", you agree to the FluxDev Non-Commercial License Agreement and acknowledge the Acceptable Use Policy. tags: - image-generation - flux - diffusion-single-file --- !image/png FLUX.1 Redux [dev] is an adapter for all FLUX.1 base models for image variation generation. Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing to refine a given image. It naturally integrates into more complex workflows unlocking image restyling. Restyling via text is also available through our API by providing an image plus a language prompt. For more information, please read our blog post. # Usage We provide a reference implementation of , as well as sampling code, in a dedicated github repository. 
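Since the sampling code is only linked, here is a minimal image-variation sketch with the 🧨 diffusers `FluxPriorReduxPipeline` (the input image path is a placeholder):

```python
# Redux encodes an input image into embeddings that a FLUX base model
# (FLUX.1-dev below) then turns into a close variation of that image.
import torch
from diffusers import FluxPipeline, FluxPriorReduxPipeline
from diffusers.utils import load_image

pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Redux-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=None,    # conditioning comes from the Redux prior
    text_encoder_2=None,
    torch_dtype=torch.bfloat16,
).to("cuda")

init_image = load_image("input.png")  # placeholder input image
prior_output = pipe_prior_redux(init_image)
image = pipe(
    guidance_scale=2.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
    **prior_output,
).images[0]
image.save("flux-redux-variation.png")
```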
## API Endpoints is available in our API bfl.ml. In addition to the adapter, the API endpoint allows users to modify an image given a textual description. The feature is supported in our latest model FLUX1.1 [pro] Ultra, allowing for combining input images and text prompts to create high-quality 4-megapixel outputs with flexible aspect ratios. !image/png ## Diffusers To use with the 🧨 diffusers python library, first install or upgrade diffusers Then you can use along with to generate images from images. To learn more check out the diffusers documentation --- # Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate output that matches the prompts. - Outputs are heavily influenced by the input image. # Out-of-Scope Use The model and its derivatives may not be used - In any way that violates any applicable national, federal, state, local or international law or regulation. - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content. - To generate or disseminate verifiably false information and/or content with the purpose of harming others. - To generate or disseminate personal identifiable information that can be used to harm an individual. - To harass, abuse, threaten, stalk, or bully individuals or groups of individuals. - To create non-consensual nudity or illegal pornographic content. - For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation. - Generating or facilitating large-scale disinformation campaigns. # License This model falls under the Non-Commercial License." +} \ No newline at end of file diff --git a/data/model_data_json/black-forest-labs_FLUX.1-dev.json b/data/model_data_json/black-forest-labs_FLUX.1-dev.json new file mode 100644 index 0000000000000000000000000000000000000000..88618d0464479889554facc2902396cdbe815bf5 --- /dev/null +++ b/data/model_data_json/black-forest-labs_FLUX.1-dev.json @@ -0,0 +1,17 @@ +{ + "model_id": "black-forest-labs/FLUX.1-dev", + "downloads": 2666194, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "image-generation", + "flux", + "en", + "license:other", + "endpoints_compatible", + "diffusers:FluxPipeline", + "region:us" + ], + "description": "--- language: - en license: other license_name: flux-1-dev-non-commercial-license license_link: LICENSE.md extra_gated_prompt: By clicking \"Agree\", you agree to the FluxDev Non-Commercial License Agreement and acknowledge the Acceptable Use Policy. tags: - text-to-image - image-generation - flux --- ![FLUX.1 [dev] Grid](./dev_grid.jpg) is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post. # Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model . 2. Competitive prompt following, matching the performance of closed source alternatives . 3. Trained using guidance distillation, making more efficient. 4. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the Non-Commercial License. 
# Usage We provide a reference implementation of , as well as sampling code, in a dedicated github repository. Developers and creatives looking to build on top of are encouraged to use this as a starting point. ## API Endpoints The FLUX.1 models are also available via API from the following sources - bfl.ml (currently ) - replicate.com - fal.ai - mystic.ai ## ComfyUI is also available in Comfy UI for local inference with a node-based workflow. ## Diffusers To use with the 🧨 diffusers python library, first install or upgrade diffusers Then you can use to run the model To learn more check out the diffusers documentation --- # Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate output that matches the prompts. - Prompt following is heavily influenced by the prompting-style. # Out-of-Scope Use The model and its derivatives may not be used - In any way that violates any applicable national, federal, state, local or international law or regulation. - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content. - To generate or disseminate verifiably false information and/or content with the purpose of harming others. - To generate or disseminate personal identifiable information that can be used to harm an individual. - To harass, abuse, threaten, stalk, or bully individuals or groups of individuals. - To create non-consensual nudity or illegal pornographic content. - For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation. - Generating or facilitating large-scale disinformation campaigns. # License This model falls under the Non-Commercial License." +} \ No newline at end of file diff --git a/data/model_data_json/black-forest-labs_FLUX.1-schnell.json b/data/model_data_json/black-forest-labs_FLUX.1-schnell.json new file mode 100644 index 0000000000000000000000000000000000000000..2a749bbbbb903a044cbbfc6a8761d8b939edba3e --- /dev/null +++ b/data/model_data_json/black-forest-labs_FLUX.1-schnell.json @@ -0,0 +1,17 @@ +{ + "model_id": "black-forest-labs/FLUX.1-schnell", + "downloads": 495889, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "image-generation", + "flux", + "en", + "license:apache-2.0", + "endpoints_compatible", + "diffusers:FluxPipeline", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 tags: - text-to-image - image-generation - flux --- ![FLUX.1 [schnell] Grid](./schnell_grid.jpeg) is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post. # Key Features 1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. 2. Trained using latent adversarial diffusion distillation, can generate high-quality images in only 1 to 4 steps. 3. Released under the licence, the model can be used for personal, scientific, and commercial purposes. # Usage We provide a reference implementation of , as well as sampling code, in a dedicated github repository. Developers and creatives looking to build on top of are encouraged to use this as a starting point. 
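As a concrete starting point, a minimal 🧨 diffusers sketch for this checkpoint (see the Diffusers section below); the guidance-free, few-step settings follow its distilled design:

```python
# Text-to-image sketch for FLUX.1 [schnell] with diffusers' FluxPipeline.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # lowers VRAM use at some speed cost

image = pipe(
    "A cat holding a sign that says hello world",
    guidance_scale=0.0,       # schnell is guidance-distilled
    num_inference_steps=4,    # 1-4 steps suffice for this checkpoint
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-schnell.png")
```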
## API Endpoints The FLUX.1 models are also available via API from the following sources - bfl.ml (currently ) - replicate.com - fal.ai - mystic.ai ## ComfyUI is also available in Comfy UI for local inference with a node-based workflow. ## Diffusers To use with the 🧨 diffusers python library, first install or upgrade diffusers Then you can use to run the model To learn more check out the diffusers documentation --- # Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate output that matches the prompts. - Prompt following is heavily influenced by the prompting-style. # Out-of-Scope Use The model and its derivatives may not be used - In any way that violates any applicable national, federal, state, local or international law or regulation. - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content. - To generate or disseminate verifiably false information and/or content with the purpose of harming others. - To generate or disseminate personal identifiable information that can be used to harm an individual. - To harass, abuse, threaten, stalk, or bully individuals or groups of individuals. - To create non-consensual nudity or illegal pornographic content. - For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation. - Generating or facilitating large-scale disinformation campaigns." +} \ No newline at end of file diff --git a/data/model_data_json/blanchefort_rubert-base-cased-sentiment-rusentiment.json b/data/model_data_json/blanchefort_rubert-base-cased-sentiment-rusentiment.json new file mode 100644 index 0000000000000000000000000000000000000000..0420106c7f87b921ddd1fcc1eca492faa310b1cf --- /dev/null +++ b/data/model_data_json/blanchefort_rubert-base-cased-sentiment-rusentiment.json @@ -0,0 +1,21 @@ +{ + "model_id": "blanchefort/rubert-base-cased-sentiment-rusentiment", + "downloads": 158320, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "text-classification", + "sentiment", + "ru", + "dataset:RuSentiment", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru tags: - sentiment - text-classification datasets: - RuSentiment --- # RuBERT for Sentiment Analysis This is a DeepPavlov/rubert-base-cased-conversational model trained on RuSentiment. ## Labels 0: NEUTRAL 1: POSITIVE 2: NEGATIVE ## How to use ## Dataset used for model training **RuSentiment** > A. Rogers A. Romanov A. Rumshisky S. Volkova M. Gronas A. Gribov RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. Proceedings of COLING 2018.", + "model_explanation_gemini": "Classifies Russian text into neutral, positive, or negative sentiment categories using a pretrained RuBERT model trained on the RuSentiment dataset." 
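For the "How to use" section of the sentiment card above, a minimal sketch with the `transformers` text-classification pipeline (label names follow the card's mapping):

```python
# Russian sentiment classification sketch (labels: NEUTRAL/POSITIVE/NEGATIVE).
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="blanchefort/rubert-base-cased-sentiment-rusentiment",
)
print(classifier("Очень понравился фильм!"))  # "I really liked the movie!"
```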
+} \ No newline at end of file diff --git a/data/model_data_json/briaai_RMBG-1.4.json b/data/model_data_json/briaai_RMBG-1.4.json new file mode 100644 index 0000000000000000000000000000000000000000..dcd6903caf399ed92a41505d436cbf373ad836d7 --- /dev/null +++ b/data/model_data_json/briaai_RMBG-1.4.json @@ -0,0 +1,24 @@ +{ + "model_id": "briaai/RMBG-1.4", + "downloads": 630623, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "SegformerForSemanticSegmentation", + "image-segmentation", + "remove background", + "background", + "background-removal", + "Pytorch", + "vision", + "legal liability", + "transformers.js", + "custom_code", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: bria-rmbg-1.4 license_link: pipeline_tag: image-segmentation tags: - remove background - background - background-removal - Pytorch - vision - legal liability - transformers - transformers.js extra_gated_description: RMBG v1.4 is available as a source-available model for non-commercial use extra_gated_heading: \"Fill in this form to get instant access\" extra_gated_fields: Name: text Company/Org name: text Org Type (Early/Growth Startup, Enterprise, Academy): text Role: text Country: text Email: text By submitting this form, I agree to BRIA’s Privacy policy and Terms & conditions, see links below: checkbox --- # BRIA Background Removal v1.4 Model Card RMBG v1.4 is our state-of-the-art background removal model, designed to effectively separate foreground from background in a range of categories and image types. This model has been trained on a carefully selected dataset, which includes: general stock images, e-commerce, gaming, and advertising content, making it suitable for commercial use cases powering enterprise content creation at scale. The accuracy, efficiency, and versatility currently rival leading source-available models. It is ideal where content safety, legally licensed datasets, and bias mitigation are paramount. Developed by BRIA AI, RMBG v1.4 is available as a source-available model for non-commercial use. To purchase a commercial license, simply click Here. CLICK HERE FOR A DEMO **NOTE** New RMBG version available! Check out RMBG-2.0 Join our Discord community for more information, tutorials, tools, and to connect with other users! !examples ### Model Description - **Developed by:** BRIA AI - **Model type:** Background Removal - **License:** bria-rmbg-1.4 - The model is released under a Creative Commons license for non-commercial use. - Commercial use is subject to a commercial agreement with BRIA. To purchase a commercial license simply click Here. - **Model Description:** BRIA RMBG 1.4 is a saliency segmentation model trained exclusively on a professional-grade dataset. - **BRIA:** Resources for more information: BRIA AI ## Training data Bria-RMBG model was trained with over 12,000 high-quality, high-resolution, manually labeled (pixel-wise accuracy), fully licensed images. Our benchmark included balanced gender, balanced ethnicity, and people with different types of disabilities. For clarity, we provide our data distribution according to different categories, demonstrating our model’s versatility. 
### Distribution of images: | Category | Distribution | | -----------------------------------| -----------------------------------:| | Objects only | 45.11% | | People with objects/animals | 25.24% | | People only | 17.35% | | people/objects/animals with text | 8.52% | | Text only | 2.52% | | Animals only | 1.89% | | Category | Distribution | | -----------------------------------| -----------------------------------------:| | Photorealistic | 87.70% | | Non-Photorealistic | 12.30% | | Category | Distribution | | -----------------------------------| -----------------------------------:| | Non Solid Background | 52.05% | | Solid Background | 47.95% | Category | Distribution | | -----------------------------------| -----------------------------------:| | Single main foreground object | 51.42% | | Multiple objects in the foreground | 48.58% | ## Qualitative Evaluation !examples ## Architecture RMBG v1.4 is developed on the IS-Net enhanced with our unique training scheme and proprietary dataset. These modifications significantly improve the model’s accuracy and effectiveness in diverse image-processing scenarios. ## Installation ## Usage Either load the pipeline Or load the model", + "model_explanation_gemini": "Removes backgrounds from images across various categories, trained on a licensed dataset for non-commercial use." +} \ No newline at end of file diff --git a/data/model_data_json/briaai_RMBG-2.0.json b/data/model_data_json/briaai_RMBG-2.0.json new file mode 100644 index 0000000000000000000000000000000000000000..4d83b79d2ecebc0c6204beace3c4b72a2f62b67b --- /dev/null +++ b/data/model_data_json/briaai_RMBG-2.0.json @@ -0,0 +1,22 @@ +{ + "model_id": "briaai/RMBG-2.0", + "downloads": 356509, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "image-segmentation", + "remove background", + "background", + "background-removal", + "Pytorch", + "vision", + "legal liability", + "transformers.js", + "custom_code", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: bria-rmbg-2.0 license_link: pipeline_tag: image-segmentation tags: - remove background - background - background-removal - Pytorch - vision - legal liability - transformers - transformers.js extra_gated_description: >- Bria AI Model weights are open source for non commercial use only, per the provided license. extra_gated_heading: Fill in this form to immediatly access the model for non commercial use extra_gated_fields: Name: text Email: text Company/Org name: text Company Website URL: text Discord user: text I agree to BRIA’s Privacy policy, Terms & conditions, and acknowledge Non commercial use to be Personal use / Academy / Non profit (direct or indirect): checkbox --- # BRIA Background Removal v2.0 Model Card RMBG v2.0 is our new state-of-the-art background removal model significantly improves RMBG v1.4. The model is designed to effectively separate foreground from background in a range of categories and image types. This model has been trained on a carefully selected dataset, which includes: general stock images, e-commerce, gaming, and advertising content, making it suitable for commercial use cases powering enterprise content creation at scale. The accuracy, efficiency, and versatility currently rival leading source-available models. It is ideal where content safety, legally licensed datasets, and bias mitigation are paramount. Developed by BRIA AI, RMBG v2.0 is available as a source-available model for non-commercial use. 
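For the RMBG v1.4 "load the pipeline" usage mentioned earlier, a minimal sketch via the `transformers` pipeline API (`trust_remote_code` is required for the custom model code; the image path is a placeholder):

```python
# Background-removal sketch for briaai/RMBG-1.4.
from transformers import pipeline

rmbg = pipeline("image-segmentation", model="briaai/RMBG-1.4", trust_remote_code=True)

mask = rmbg("example.jpg", return_mask=True)  # PIL mask of the foreground
cutout = rmbg("example.jpg")                  # PIL image, background removed
cutout.save("example_no_bg.png")
```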
### Get Access Bria RMBG2.0 is availabe everywhere you build, either as source-code and weights, ComfyUI nodes or API endpoints. - **Purchase:** for commercial license simply click Here. - **API Endpoint**: Bria.ai, fal.ai - **ComfyUI**: Use it in workflows For more information, please visit our website. Join our Discord community for more information, tutorials, tools, and to connect with other users! CLICK HERE FOR A DEMO !examples ## Model Details ##### ### Model Description - **Developed by:** BRIA AI - **Model type:** Background Removal - **License:** Creative Commons Attribution–Non-Commercial (CC BY-NC 4.0) - The model is released under a CC BY-NC 4.0 license for non-commercial use. - Commercial use is subject to a commercial agreement with BRIA. Available here **Purchase:** to purchase a commercial license simply click Here. - **Model Description:** BRIA RMBG-2.0 is a dichotomous image segmentation model trained exclusively on a professional-grade dataset. - **BRIA:** Resources for more information: BRIA AI ## Training data Bria-RMBG model was trained with over 15,000 high-quality, high-resolution, manually labeled (pixel-wise accuracy), fully licensed images. Our benchmark included balanced gender, balanced ethnicity, and people with different types of disabilities. For clarity, we provide our data distribution according to different categories, demonstrating our model’s versatility. ### Distribution of images: | Category | Distribution | | -----------------------------------| -----------------------------------:| | Objects only | 45.11% | | People with objects/animals | 25.24% | | People only | 17.35% | | people/objects/animals with text | 8.52% | | Text only | 2.52% | | Animals only | 1.89% | | Category | Distribution | | -----------------------------------| -----------------------------------------:| | Photorealistic | 87.70% | | Non-Photorealistic | 12.30% | | Category | Distribution | | -----------------------------------| -----------------------------------:| | Non Solid Background | 52.05% | | Solid Background | 47.95% | Category | Distribution | | -----------------------------------| -----------------------------------:| | Single main foreground object | 51.42% | | Multiple objects in the foreground | 48.58% | ## Qualitative Evaluation Open source models comparison !diagram !examples ### Architecture RMBG-2.0 is developed on the BiRefNet architecture enhanced with our proprietary dataset and training scheme. This training data significantly improves the model’s accuracy and effectiveness for background-removal task.
If you use this model in your research, please cite: #### Requirements ### Usage " +} \ No newline at end of file diff --git a/data/model_data_json/cagliostrolab_animagine-xl-3.0.json b/data/model_data_json/cagliostrolab_animagine-xl-3.0.json new file mode 100644 index 0000000000000000000000000000000000000000..b4ead0d3acce353f3766212fc5f021ec261335dc --- /dev/null +++ b/data/model_data_json/cagliostrolab_animagine-xl-3.0.json @@ -0,0 +1,21 @@ +{ + "model_id": "cagliostrolab/animagine-xl-3.0", + "downloads": 390272, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "stable-diffusion", + "stable-diffusion-xl", + "en", + "base_model:Linaqruf/animagine-xl-2.0", + "base_model:finetune:Linaqruf/animagine-xl-2.0", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: other license_name: faipl-1.0-sd license_link: language: - en tags: - text-to-image - stable-diffusion - safetensors - stable-diffusion-xl base_model: Linaqruf/animagine-xl-2.0 widget: - text: 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality parameter: negative_prompt: nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name example_title: 1girl - text: 1boy, male focus, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality parameter: negative_prompt: nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name example_title: 1boy ---

Animagine XL 3.0

\"sample1\"
\"sample4\"
\"sample2\"
\"sample3\"
\"sample1\"
\"sample4\"
**Animagine XL 3.0** is the latest version of the sophisticated open-source anime text-to-image model, building upon the capabilities of its predecessor, Animagine XL 2.0. Developed based on Stable Diffusion XL, this iteration boasts superior image generation with notable improvements in hand anatomy, efficient tag ordering, and enhanced knowledge about anime concepts. Unlike the previous iteration, we focused to make the model learn concepts rather than aesthetic. ## Model Details - **Developed by**: Linaqruf - **Model type**: Diffusion-based text-to-image generative model - **Model Description**: Animagine XL 3.0 is engineered to generate high-quality anime images from textual prompts. It features enhanced hand anatomy, better concept understanding, and prompt interpretation, making it the most advanced model in its series. - **License**: Fair AI Public License 1.0-SD - **Finetuned from model**: Animagine XL 2.0 ## Gradio & Colab Integration Animagine XL 3.0 is accessible through user-friendly platforms such as Gradio and Google Colab: - **Gradio Web UI**: Open In Spaces - **Google Colab**: Open In Colab ## 🧨 Diffusers Installation To use Animagine XL 3.0, install the required libraries as follows: Example script for generating images with Animagine XL 3.0: ## Usage Guidelines ### Tag Ordering Prompting is a bit different in this iteration, for optimal results, it's recommended to follow the structured prompt template because we train the model like this: ## Special Tags Like the previous iteration, this model was trained with some special tags to steer the result toward quality, rating and when the posts was created. The model can still do the job without these special tags, but it’s recommended to use them if we want to make the model easier to handle. ### Quality Modifiers | Quality Modifier | Score Criterion | | ---------------- | --------------- | | | >150 | | | 100-150 | | | 75-100 | | | 25-75 | | | 0-25 | | | -5-0 | | | <-5 | ### Rating Modifiers | Rating Modifier | Rating Criterion | | ------------------------------| ------------------------- | | | General | | | Sensitive | | , | Questionable | | , | Explicit | ### Year Modifier These tags help to steer the result toward modern or vintage anime art styles, ranging from to . | Year Tag | Year Range | | -------- | ---------------- | | | 2022 to 2023 | | | 2019 to 2021 | | | 2015 to 2018 | | | 2011 to 2014 | | | 2005 to 2010 | ## Recommended settings To guide the model towards generating high-aesthetic images, use negative prompts like: For higher quality outcomes, prepend prompts with: However, be careful to use , because many high-scored datasets are NSFW. It’s better to add , to the negative prompt and to the positive prompt. it’s recommended to use a lower classifier-free guidance (CFG Scale) of around 5-7, sampling steps below 30, and to use Euler Ancestral (Euler a) as a sampler. ### Multi Aspect Resolution This model supports generating images at the following dimensions: | Dimensions | Aspect Ratio | |-------------------|-----------------| | | 1:1 Square | | | 9:7 | | | 7:9 | | | 19:13 | | | 13:19 | | | 7:4 Horizontal | | | 4:7 Vertical | | | 12:5 Horizontal | | | 5:12 Vertical | ## Training and Hyperparameters - **Animagine XL 3.0** was trained on a 2x A100 GPU with 80GB memory for 21 days or over 500 gpu hours. The training process encompassed three stages: - Base: - **Feature Alignment Stage**: Utilized 1.2m images to acquaint the model with basic anime concepts. 
- **Refining UNet Stage**: Employed 2.5k curated datasets to only fine-tune the UNet. - Curated: - **Aesthetic Tuning Stage**: Employed 3.5k high-quality curated datasets to refine the model's art style. ### Hyperparameters | Stage | Epochs | UNet Learning Rate | Train Text Encoder | Text Encoder Learning Rate | Batch Size | Mixed Precision | Noise Offset | |-----------------------------|--------|--------------------|--------------------|----------------------------|----------------|-----------------|--------------| | **Feature Alignment Stage** | 10 | 7.5e-6 | True | 3.75e-6 | 48 x 2 | fp16 | N/A | | **Refining UNet Stage** | 10 | 2e-6 | False | N/A | 48 | fp16 | 0.0357 | | **Aesthetic Tuning Stage** | 10 | 1e-6 | False | N/A | 48 | fp16 | 0.0357 | ## Model Comparison ### Training Config | Configuration Item | Animagine XL 2.0 | Animagine 3.0 | |-----------------------|-------------------------|-------------------------| | **GPU** | A100 80G | 2 x A100 80G | | **Dataset** | 170k + 83k images | 1271990 + 3500 Images | | **Shuffle Separator** | N/A | True | | **Global Epochs** | 20 | 20 | | **Learning Rate** | 1e-6 | 7.5e-6 | | **Batch Size** | 32 | 48 x 2 | | **Train Text Encoder**| True | True | | **Train Special Tags**| True | True | | **Image Resolution** | 1024 | 1024 | | **Bucket Resolution** | 2048 x 512 | 2048 x 512 | Source code and training config are available here: ## Limitations While \"Animagine XL 3.0\" represents a significant advancement in anime text-to-image generation, it's important to acknowledge its limitations to understand its best use cases and potential areas for future improvement. 1. **Concept Over Artstyle Focus**: The model prioritizes learning concepts rather than specific art styles, which might lead to variations in aesthetic appeal compared to its predecessor. 2. **Non-Photorealistic Design**: Animagine XL 3.0 is not designed for generating photorealistic or realistic images, focusing instead on anime-style artwork. 3. **Anatomical Challenges**: Despite improvements, the model can still struggle with complex anatomical structures, particularly in dynamic poses, resulting in occasional inaccuracies. 4. **Dataset Limitations**: The training dataset of 1.2 million images may not encompass all anime characters or series, limiting the model's ability to generate less known or newer characters. 5. **Natural Language Processing**: The model is not optimized for interpreting natural language, requiring more structured and specific prompts for best results. 6. **NSFW Content Risk**: Using high-quality tags like 'masterpiece' or 'best quality' carries a risk of generating NSFW content inadvertently, due to the prevalence of such images in high-scoring training datasets. These limitations highlight areas for potential refinement in future iterations and underscore the importance of careful prompt crafting for optimal results. Understanding these constraints can help users better navigate the model's capabilities and tailor their expectations accordingly. ## Acknowledgements We extend our gratitude to the entire team and community that contributed to the development of Animagine XL 3.0, including our partners and collaborators who provided resources and insights crucial for this iteration. - **Main:** For the open source grant supporting our research, thank you so much. - **Cagliostro Lab Collaborator:** For helping quality checking during pretraining and curating datasets during fine-tuning. 
- **Kohya SS:** For providing the essential training script and merged our PR about or Shuffle Separator. - **Camenduru Server Community:** For invaluable insights and support and quality checking - **NovelAI:** For inspiring how to build the datasets and label it using tag ordering. ## Collaborators - Linaqruf - DamarJati - Asahina2K - ItsMeBell - Zwicky18 - NekoFi - Scipius2121 - Raelina ## License Animagine XL 3.0 now uses the Fair AI Public License 1.0-SD, compatible with Stable Diffusion models. Key points: 1. **Modification Sharing:** If you modify Animagine XL 3.0, you must share both your changes and the original license. 2. **Source Code Accessibility:** If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too. 3. **Distribution Terms:** Any distribution must be under this license or another with similar rules. 4. **Compliance:** Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values. The choice of this license aims to keep Animagine XL 3.0 open and modifiable, aligning with open source community spirit. It protects contributors and users, encouraging a collaborative, ethical open-source community. This ensures the model not only benefits from communal input but also respects open-source development freedoms.", + "model_explanation_gemini": "Generates high-quality anime-style images from text prompts using Stable Diffusion XL, focusing on characters with detailed attributes." +} \ No newline at end of file diff --git a/data/model_data_json/cagliostrolab_animagine-xl-3.1.json b/data/model_data_json/cagliostrolab_animagine-xl-3.1.json new file mode 100644 index 0000000000000000000000000000000000000000..276dcc9d5966b6e58919187f3ec4b5fbe04eeb39 --- /dev/null +++ b/data/model_data_json/cagliostrolab_animagine-xl-3.1.json @@ -0,0 +1,21 @@ +{ + "model_id": "cagliostrolab/animagine-xl-3.1", + "downloads": 213569, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "stable-diffusion", + "stable-diffusion-xl", + "en", + "base_model:Linaqruf/animagine-xl-3.0", + "base_model:finetune:Linaqruf/animagine-xl-3.0", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: other license_name: faipl-1.0-sd license_link: language: - en tags: - text-to-image - stable-diffusion - safetensors - stable-diffusion-xl base_model: cagliostrolab/animagine-xl-3.0 widget: - text: 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality, very aesthetic, absurdes parameter: negative_prompt: nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract] example_title: 1girl - text: 1boy, male focus, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality, very aesthetic, absurdes parameter: negative_prompt: nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract] example_title: 1boy ---

Animagine XL 3.1

\"sample1\"
\"sample4\"
\"sample2\"
\"sample3\"
\"sample1\"
\"sample4\"
**Animagine XL 3.1** is an update in the Animagine XL V3 series, enhancing the previous version, Animagine XL 3.0. This open-source, anime-themed text-to-image model has been improved for generating anime-style images with higher quality. It includes a broader range of characters from well-known anime series, an optimized dataset, and new aesthetic tags for better image creation. Built on Stable Diffusion XL, Animagine XL 3.1 aims to be a valuable resource for anime fans, artists, and content creators by producing accurate and detailed representations of anime characters. ## Model Details - **Developed by**: Cagliostro Research Lab - **In collaboration with**: SeaArt.ai - **Model type**: Diffusion-based text-to-image generative model - **Model Description**: Animagine XL 3.1 generates high-quality anime images from textual prompts. It boasts enhanced hand anatomy, improved concept understanding, and advanced prompt interpretation. - **License**: Fair AI Public License 1.0-SD - **Fine-tuned from**: Animagine XL 3.0 ## Gradio & Colab Integration Try the demo powered by Gradio in Huggingface Spaces: ## Aesthetic Tags Aesthetic labels were assigned with an image classification model specifically trained on anime data. For this purpose, we utilized the model shadowlilac/aesthetic-shadow-v2, which assesses the aesthetic value of content before it undergoes training. This ensures that each piece of content is not only relevant and accurate but also visually appealing. | Aesthetic Tag | Score Range | |-------------------|-------------------| | | > 0.71 | | | > 0.45 & < 0.71 | | | > 0.27 & < 0.45 | | | ≤ 0.27 | ## Recommended settings To guide the model towards generating high-aesthetic images, use negative prompts like: For higher quality outcomes, prepend prompts with: It is recommended to use a lower classifier-free guidance (CFG Scale) of around 5-7, sampling steps below 30, and Euler Ancestral (Euler a) as the sampler; a usage sketch with these settings appears after the training overview below. ### Multi Aspect Resolution This model supports generating images at the following dimensions: | Dimensions | Aspect Ratio | |-------------------|-----------------| | | 1:1 Square | | | 9:7 | | | 7:9 | | | 19:13 | | | 13:19 | | | 7:4 Horizontal | | | 4:7 Vertical | | | 12:5 Horizontal | | | 5:12 Vertical | ## Training and Hyperparameters **Animagine XL 3.1** was trained on 2x A100 80GB GPUs for approximately 15 days, totaling over 350 GPU hours. The training process consisted of three stages: - **Pretraining**: Utilized a data-rich collection of 870k ordered and tagged images to increase Animagine XL 3.0's model knowledge. - **Finetuning - First Stage**: Employed labeled and curated aesthetic datasets to refine the broken U-Net after pretraining. - **Finetuning - Second Stage**: Utilized labeled and curated aesthetic datasets to refine the model's art style and improve hand and anatomy rendering.
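To make the recommended settings above concrete, here is a minimal hedged sketch, not the card's own code: it assumes the standard diffusers StableDiffusionXLPipeline API, abridges the widget prompt and negative prompt, and picks 832x1216 as a plausible entry from the (elided) resolution table.

```python
# Minimal sketch of the recommended settings above (CFG ~5-7, <30 steps, Euler a).
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1", torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

image = pipe(
    prompt="1girl, green hair, sweater, looking at viewer, upper body, "
           "masterpiece, best quality, very aesthetic, absurdres",
    negative_prompt="nsfw, lowres, (bad), text, error, worst quality, "
                    "jpeg artifacts, low quality, watermark",
    guidance_scale=6.0,      # recommended CFG range is 5-7
    num_inference_steps=28,  # recommended to stay below 30 steps
    width=832, height=1216,  # assumed 13:19 vertical bucket; the card's list is elided
).images[0]
image.save("sample.png")
```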
### Hyperparameters | Stage | Epochs | UNet lr | Train Text Encoder | Batch Size | Noise Offset | Optimizer | LR Scheduler | Grad Acc Steps | GPUs | |--------------------------|--------|---------|--------------------|------------|--------------|------------|-------------------------------|----------------|------| | **Pretraining** | 10 | 1e-5 | True | 16 | N/A | AdamW | Cosine Annealing Warm Restart | 3 | 2 | | **Finetuning 1st Stage** | 10 | 2e-6 | False | 48 | 0.0357 | Adafactor | Constant with Warmup | 1 | 1 | | **Finetuning 2nd Stage** | 15 | 1e-6 | False | 48 | 0.0357 | Adafactor | Constant with Warmup | 1 | 1 | ## Model Comparison (Pretraining only) ### Training Config | Configuration Item | Animagine XL 3.0 | Animagine XL 3.1 | |---------------------------------|------------------------------------------|------------------------------------------------| | **GPU** | 2 x A100 80G | 2 x A100 80G | | **Dataset** | 1,271,990 | 873,504 | | **Shuffle Separator** | True | True | | **Num Epochs** | 10 | 10 | | **Learning Rate** | 7.5e-6 | 1e-5 | | **Text Encoder Learning Rate** | 3.75e-6 | 1e-5 | | **Effective Batch Size** | 48 x 1 x 2 | 16 x 3 x 2 | | **Optimizer** | Adafactor | AdamW | | **Optimizer Args** | Scale Parameter: False, Relative Step: False, Warmup Init: False | Weight Decay: 0.1, Betas: (0.9, 0.99) | | **LR Scheduler** | Constant with Warmup | Cosine Annealing Warm Restart | | **LR Scheduler Args** | Warmup Steps: 100 | Num Cycles: 10, Min LR: 1e-6, LR Decay: 0.9, First Cycle Steps: 9,099 | Source code and training config are available here: ### Acknowledgements The development and release of Animagine XL 3.1 would not have been possible without the invaluable contributions and support from the following individuals and organizations: - **SeaArt.ai**: Our collaboration partner and sponsor. - **Shadow Lilac**: For providing the aesthetic classification model, aesthetic-shadow-v2. - **Derrian Distro**: For their custom learning rate scheduler, adapted from LoRA Easy Training Scripts. - **Kohya SS**: For their comprehensive training scripts. - **Cagliostrolab Collaborators**: For their dedication to model training, project management, and data curation. - **Early Testers**: For their valuable feedback and quality assurance efforts. - **NovelAI**: For their innovative approach to aesthetic tagging, which served as an inspiration for our implementation. - **KBlueLeaf**: For providing inspiration in balancing quality tag distribution and managing tags based on Hakubooru Metainfo. Thank you all for your support and expertise in pushing the boundaries of anime-style image generation. ## Collaborators - Linaqruf - ItsMeBell - Asahina2K - DamarJati - Zwicky18 - Scipius2121 - Raelina - Kayfahaarukku - Kriz ## Limitations While Animagine XL 3.1 represents a significant advancement in anime-style image generation, it is important to acknowledge its limitations: 1. **Anime-Focused**: This model is specifically designed for generating anime-style images and is not suitable for creating realistic photos. 2. **Prompt Complexity**: This model may not be suitable for users who expect high-quality results from short or simple prompts. The training focus was on concept understanding rather than aesthetic refinement, which may require more detailed and specific prompts to achieve the desired output. 3. **Prompt Format**: Animagine XL 3.1 is optimized for Danbooru-style tags rather than natural language prompts.
For best results, users are encouraged to format their prompts using the appropriate tags and syntax. 4. **Anatomy and Hand Rendering**: Despite the improvements made in anatomy and hand rendering, there may still be instances where the model produces suboptimal results in these areas. 5. **Dataset Size**: The dataset used for training Animagine XL 3.1 consists of approximately 870,000 images. When combined with the previous iteration's dataset (1.2 million), the total training data amounts to around 2.1 million images. While substantial, this dataset size may still be considered limited in scope for an \"ultimate\" anime model. 6. **NSFW Content**: Animagine XL 3.1 has been designed to generate more balanced NSFW content. However, it is important to note that the model may still produce NSFW results, even if not explicitly prompted. By acknowledging these limitations, we aim to provide transparency and set realistic expectations for users of Animagine XL 3.1. Despite these constraints, we believe that the model represents a significant step forward in anime-style image generation and offers a powerful tool for artists, designers, and enthusiasts alike. ## License Based on Animagine XL 3.0, Animagine XL 3.1 falls under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models’ license. Key points: 1. **Modification Sharing:** If you modify Animagine XL 3.1, you must share both your changes and the original license. 2. **Source Code Accessibility:** If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too. 3. **Distribution Terms:** Any distribution must be under this license or another with similar rules. 4. **Compliance:** Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values. The choice of this license aims to keep Animagine XL 3.1 open and modifiable, aligning with open source community spirit. It protects contributors and users, encouraging a collaborative, ethical open-source community. This ensures the model not only benefits from communal input but also respects open-source development freedoms. ## Cagliostro Lab Discord Server The Cagliostro Lab Discord server is finally open to the public. Feel free to join our Discord server!", + "model_explanation_gemini": "Generates high-quality anime-style images from text prompts using Stable Diffusion XL."
+} \ No newline at end of file diff --git a/data/model_data_json/cahya_NusaBert-ner-v1.3.json b/data/model_data_json/cahya_NusaBert-ner-v1.3.json new file mode 100644 index 0000000000000000000000000000000000000000..4050af94cff7ba32ba5af14b7788efeb2927e388 --- /dev/null +++ b/data/model_data_json/cahya_NusaBert-ner-v1.3.json @@ -0,0 +1,21 @@ +{ + "model_id": "cahya/NusaBert-ner-v1.3", + "downloads": 934613, + "tags": [ + "transformers", + "safetensors", + "generated_from_trainer", + "token-classification", + "id", + "dataset:grit-id/id_nergrit_corpus", + "base_model:cahya/NusaBert-v1.3", + "base_model:finetune:cahya/NusaBert-v1.3", + "doi:10.57967/hf/5158", + "license:mit", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers base_model: cahya/NusaBert-v1.3 tags: - generated_from_trainer datasets: - grit-id/id_nergrit_corpus metrics: - precision - recall - f1 - accuracy model-index: - name: nusabert_nergrit_1.3 results: - task: name: Token Classification type: token-classification dataset: name: grit-id/id_nergrit_corpus ner type: grit-id/id_nergrit_corpus config: ner split: validation args: ner metrics: - name: Precision type: precision value: 0.8010483135824977 - name: Recall type: recall value: 0.8338275412169375 - name: F1 type: f1 value: 0.8171093159760562 - name: Accuracy type: accuracy value: 0.9476653696498054 pipeline_tag: token-classification license: mit language: - id --- # NusaBert-ner-v1.3 This model is a fine-tuned version of cahya/NusaBert-v1.3 on the grit-id/id_nergrit_corpus ner dataset. It supports a context length of 8192, the same as the base model *cahya/NusaBert-v1.3*, which was pre-trained from scratch using the ModernBERT architecture. It achieves the following results on the evaluation set: - Loss: 0.2174 - Precision: 0.8010 - Recall: 0.8338 - F1: 0.8171 - Accuracy: 0.9477 ## Model description The dataset contains the following 19 entities ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 5e-05 - train_batch_size: 32 - eval_batch_size: 32 - seed: 42 - distributed_type: multi-GPU - num_devices: 2 - total_train_batch_size: 64 - total_eval_batch_size: 64 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments - lr_scheduler_type: linear - num_epochs: 3.0 ### Training results ### Framework versions - Transformers 4.49.0 - Pytorch 2.5.1+cu124 - Datasets 2.19.2 - Tokenizers 0.21.0 ## Usage", + "model_explanation_gemini": "Performs named entity recognition (NER) on Indonesian text, fine-tuned from NusaBert-v1.3 using the grit-id/id_nergrit_corpus dataset."
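The card's Usage section above is empty in this dump; as a stand-in, here is a minimal hedged sketch using the standard transformers token-classification pipeline. The Indonesian example sentence is hypothetical, not from the card.

```python
# Hedged usage sketch: Indonesian NER with the standard transformers pipeline.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="cahya/NusaBert-ner-v1.3",
    aggregation_strategy="simple",  # merge subword pieces into whole entities
)

# Hypothetical example sentence (not from the card).
for entity in ner("Joko Widodo lahir di Surakarta, Jawa Tengah."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```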
+} \ No newline at end of file diff --git a/data/model_data_json/calcuis_hidream-gguf.json b/data/model_data_json/calcuis_hidream-gguf.json new file mode 100644 index 0000000000000000000000000000000000000000..6973767b5d1134927f01acb6ea3b4e2d02478121 --- /dev/null +++ b/data/model_data_json/calcuis_hidream-gguf.json @@ -0,0 +1,16 @@ +{ + "model_id": "calcuis/hidream-gguf", + "downloads": 37294, + "tags": [ + "gguf", + "gguf-node", + "text-to-image", + "en", + "base_model:HiDream-ai/HiDream-I1-Full", + "base_model:quantized:HiDream-ai/HiDream-I1-Full", + "license:mit", + "region:us" + ], + "description": "--- license: mit language: - en base_model: - HiDream-ai/HiDream-I1-Full pipeline_tag: text-to-image tags: - gguf-node widget: - text: >- cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere parameters: negative_prompt: >- full (info only, don't copy this) output: url: samples\ComfyUI_00004_.png - text: fast (same prompt as full) output: url: samples\ComfyUI_00005_.png - text: dev (same prompt as full) output: url: samples\ComfyUI_00006_.png - text: 1-clip (llama only) output: url: samples\ComfyUI_00001_.png - text: 2-clip (llama + t5xxl) output: url: samples\ComfyUI_00002_.png --- ## gguf quantized version of HiDream-i1-Full (incl. full + dev + fast) - full set gguf works right away (all gguf: model + encoder + vae) - upgrade your node (pypi|repo|pack) for model support ### setup - drag **hidream** to > - drag **encoder**: g, l, t5xxl and llama to > - drag **pig** to > ### workflow - drag the json file or demo picture below for example workflow ### reference - base model from hidream-ai - get safetensors from comfy-org - get more t5xxl-encoder gguf here - get more llama3.1-8b-encoder gguf here", + "model_explanation_gemini": "Generates anime-style images from text descriptions using a quantized version of the HiDream-I1-Full model." +} \ No newline at end of file diff --git a/data/model_data_json/cambridgeltl_SapBERT-UMLS-2020AB-all-lang-from-XLMR.json b/data/model_data_json/cambridgeltl_SapBERT-UMLS-2020AB-all-lang-from-XLMR.json new file mode 100644 index 0000000000000000000000000000000000000000..ef186561c8b5f95a5eac0afbfebebe96d280e41a --- /dev/null +++ b/data/model_data_json/cambridgeltl_SapBERT-UMLS-2020AB-all-lang-from-XLMR.json @@ -0,0 +1,16 @@ +{ + "model_id": "cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR", + "downloads": 403899, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "xlm-roberta", + "feature-extraction", + "arxiv:2010.11784", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: multilingual tags: - biomedical - lexical-semantics - cross-lingual datasets: - UMLS **[news]** A cross-lingual extension of SapBERT will appear in the main conference of **ACL 2021**!
**[news]** SapBERT will appear in the conference proceedings of **NAACL 2021**! ### SapBERT-XLMR SapBERT (Liu et al. 2020) trained with UMLS 2020AB, using xlm-roberta-base as the base model. Please use [CLS] as the representation of the input. #### Extracting embeddings from SapBERT The following script converts a list of strings (entity names) into embeddings. For more details about training and eval, see the SapBERT GitHub repo. ### Citation", + "model_explanation_gemini": "Multilingual biomedical model for cross-lingual lexical semantics, trained on UMLS data to generate embeddings from entity names using XLM-RoBERTa as its base." +} \ No newline at end of file diff --git a/data/model_data_json/cambridgeltl_SapBERT-from-PubMedBERT-fulltext-mean-token.json b/data/model_data_json/cambridgeltl_SapBERT-from-PubMedBERT-fulltext-mean-token.json new file mode 100644 index 0000000000000000000000000000000000000000..63ea57e84f449e2d44dbe857bad467dabb2cce23 --- /dev/null +++ b/data/model_data_json/cambridgeltl_SapBERT-from-PubMedBERT-fulltext-mean-token.json @@ -0,0 +1,17 @@ +{ + "model_id": "cambridgeltl/SapBERT-from-PubMedBERT-fulltext-mean-token", + "downloads": 194526, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "bert", + "feature-extraction", + "arxiv:2010.11784", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - biomedical - lexical-semantics datasets: - UMLS **[news]** A cross-lingual extension of SapBERT will appear in the main conference of **ACL 2021**!
**[news]** SapBERT will appear in the conference proceedings of **NAACL 2021**! ### SapBERT-PubMedBERT SapBERT by Liu et al. (2020). Trained with UMLS 2020AA (English only), using microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext as the base model. Please use the mean-pooling of the output as the representation. #### Extracting embeddings from SapBERT The following script converts a list of strings (entity names) into embeddings. For more details about training and eval, see the SapBERT GitHub repo. ### Citation", + "model_explanation_gemini": "Generates mean-pooled embeddings for biomedical entity names using PubMedBERT to capture lexical-semantic relationships, trained on UMLS English data." +} \ No newline at end of file diff --git a/data/model_data_json/cambridgeltl_SapBERT-from-PubMedBERT-fulltext.json b/data/model_data_json/cambridgeltl_SapBERT-from-PubMedBERT-fulltext.json new file mode 100644 index 0000000000000000000000000000000000000000..36f8337aa76f033363c395ed531a71e02b953377 --- /dev/null +++ b/data/model_data_json/cambridgeltl_SapBERT-from-PubMedBERT-fulltext.json @@ -0,0 +1,27 @@ +{ + "model_id": "cambridgeltl/SapBERT-from-PubMedBERT-fulltext", + "downloads": 1709979, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "feature-extraction", + "biomedical", + "lexical semantics", + "bionlp", + "biology", + "science", + "embedding", + "entity linking", + "en", + "arxiv:2010.11784", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en tags: - biomedical - lexical semantics - bionlp - biology - science - embedding - entity linking --- --- datasets: - UMLS **[news]** A cross-lingual extension of SapBERT will appear in the main conference of **ACL 2021**!
**[news]** SapBERT will appear in the conference proceedings of **NAACL 2021**! ### SapBERT-PubMedBERT SapBERT by Liu et al. (2020). Trained with UMLS 2020AA (English only), using microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext as the base model. ### Expected input and output The input should be a string of biomedical entity names, e.g., \"covid infection\" or \"Hydroxychloroquine\". The [CLS] embedding of the last layer is regarded as the output. #### Extracting embeddings from SapBERT The following script converts a list of strings (entity names) into embeddings. For more details about training and eval, see the SapBERT GitHub repo. ### Citation", + "model_explanation_gemini": "Generates embeddings for biomedical entity names using PubMedBERT to support tasks like entity linking and lexical semantics in the biomedical domain." +} \ No newline at end of file diff --git a/data/model_data_json/canopylabs_orpheus-3b-0.1-ft.json b/data/model_data_json/canopylabs_orpheus-3b-0.1-ft.json new file mode 100644 index 0000000000000000000000000000000000000000..ed63240566fde73124947cff75225c6994c68273 --- /dev/null +++ b/data/model_data_json/canopylabs_orpheus-3b-0.1-ft.json @@ -0,0 +1,21 @@ +{ + "model_id": "canopylabs/orpheus-3b-0.1-ft", + "downloads": 84171, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "text-to-speech", + "en", + "base_model:canopylabs/orpheus-3b-0.1-pretrained", + "base_model:finetune:canopylabs/orpheus-3b-0.1-pretrained", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers language: - en pipeline_tag: text-to-speech license: apache-2.0 base_model: - meta-llama/Llama-3.2-3B-Instruct - canopylabs/orpheus-3b-0.1-pretrained --- # Orpheus 3B 0.1 Finetuned **03/18/2025** – We are releasing our 3B Orpheus TTS model with additional finetunes. Code is available on GitHub: CanopyAI/Orpheus-TTS --- Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance. # Model Details ### Model Capabilities - **Human-Like Speech**: Natural intonation, emotion, and rhythm that is superior to SOTA closed source models - **Zero-Shot Voice Cloning**: Clone voices without prior fine-tuning - **Guided Emotion and Intonation**: Control speech and emotion characteristics with simple tags - **Low Latency**: ~200ms streaming latency for realtime applications, reducible to ~100ms with input streaming ### Model Sources - **GitHub Repo:** - **Blog Post:** - **Colab Inference Notebook:** notebook link # Usage Check out our Colab (link to Colab) or GitHub (link to GitHub) on how to run easy inference on our finetuned models. # Model Misuse Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any use.", + "model_explanation_gemini": "Generates high-quality, empathetic speech from text with human-like intonation, emotion control, and real-time streaming capabilities."
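The three SapBERT cards above all reference an embedding-extraction script that is elided in this dump. Below is a minimal hedged sketch for the fulltext checkpoint, which per its card uses the [CLS] embedding of the last layer; the mean-token variant would average the last hidden states instead.

```python
# Hedged sketch of "Extracting embeddings from SapBERT": encode entity names
# and take the [CLS] vector of the last layer ([CLS] pooling per the card;
# the mean-token variant averages the last hidden states instead).
import torch
from transformers import AutoTokenizer, AutoModel

name = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

entities = ["covid infection", "Hydroxychloroquine"]  # examples from the card
batch = tokenizer(entities, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

cls_embeddings = hidden[:, 0, :]  # [CLS] embedding of the last layer
print(cls_embeddings.shape)       # torch.Size([2, 768])
```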
+} \ No newline at end of file diff --git a/data/model_data_json/cardiffnlp_tweet-topic-latest-multi.json b/data/model_data_json/cardiffnlp_tweet-topic-latest-multi.json new file mode 100644 index 0000000000000000000000000000000000000000..7d1c3e8a978ec0a669fb51f833d27dcffd93a956 --- /dev/null +++ b/data/model_data_json/cardiffnlp_tweet-topic-latest-multi.json @@ -0,0 +1,17 @@ +{ + "model_id": "cardiffnlp/tweet-topic-latest-multi", + "downloads": 318263, + "tags": [ + "transformers", + "pytorch", + "tf", + "roberta", + "text-classification", + "arxiv:2209.09824", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "# tweet-topic-latest-multi This is a RoBERTa-base model trained on 168.86M tweets until the end of September 2022 and finetuned for multi-label topic classification on a corpus of 11,267 tweets. The original RoBERTa-base model can be found here. This model is suitable for English. - Reference Paper: TweetTopic (COLING 2022). Labels: | 0: arts_&_culture | 5: fashion_&_style | 10: learning_&_educational | 15: science_&_technology | |-----------------------------|---------------------|----------------------------|--------------------------| | 1: business_&_entrepreneurs | 6: film_tv_&_video | 11: music | 16: sports | | 2: celebrity_&_pop_culture | 7: fitness_&_health | 12: news_&_social_concern | 17: travel_&_adventure | | 3: diaries_&_daily_life | 8: food_&_dining | 13: other_hobbies | 18: youth_&_student_life | | 4: family | 9: gaming | 14: relationships | | ## Full classification example Output: ### BibTeX entry and citation info Please cite the reference paper if you use this model.", + "model_explanation_gemini": "Classifies English tweets into multiple topic categories such as arts & culture, sports, and technology using a RoBERTa-base model trained on tweet data." +} \ No newline at end of file diff --git a/data/model_data_json/cardiffnlp_twitter-roberta-base-dec2021-tweet-topic-multi-all.json b/data/model_data_json/cardiffnlp_twitter-roberta-base-dec2021-tweet-topic-multi-all.json new file mode 100644 index 0000000000000000000000000000000000000000..6acb5c7fb33642b93732f2256144cecf012aac75 --- /dev/null +++ b/data/model_data_json/cardiffnlp_twitter-roberta-base-dec2021-tweet-topic-multi-all.json @@ -0,0 +1,17 @@ +{ + "model_id": "cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-multi-all", + "downloads": 36833, + "tags": [ + "transformers", + "pytorch", + "roberta", + "text-classification", + "dataset:cardiffnlp/tweet_topic_multi", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - cardiffnlp/tweet_topic_multi metrics: - f1 - accuracy model-index: - name: cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-multi-all results: - task: type: text-classification name: Text Classification dataset: name: cardiffnlp/tweet_topic_multi type: cardiffnlp/tweet_topic_multi args: cardiffnlp/tweet_topic_multi split: test_2021 metrics: - name: F1 type: f1 value: 0.7647668393782383 - name: F1 (macro) type: f1_macro value: 0.6187022581213811 - name: Accuracy type: accuracy value: 0.5485407980941036 pipeline_tag: text-classification widget: - text: \"I'm sure the {@Tampa Bay Lightning@} would’ve rather faced the Flyers but man does their experience versus the Blue Jackets this year and last help them a lot versus this Islanders team. Another meat grinder upcoming for the good guys\" example_title: \"Example 1\" - text: \"Love to take night time bike rides at the jersey shore. 
Seaside Heights boardwalk. Beautiful weather. Wishing everyone a safe Labor Day weekend in the US.\" example_title: \"Example 2\" --- # cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-multi-all This model is a fine-tuned version of cardiffnlp/twitter-roberta-base-dec2021 on the tweet_topic_multi. This model is fine-tuned on split and validated on split of tweet_topic. Fine-tuning script can be found here. It achieves the following results on the test_2021 set: - F1 (micro): 0.7647668393782383 - F1 (macro): 0.6187022581213811 - Accuracy: 0.5485407980941036 ### Usage ### Reference", + "model_explanation_gemini": "Classifies tweets into multiple topics using a fine-tuned RoBERTa model trained on Twitter data." +} \ No newline at end of file diff --git a/data/model_data_json/cardiffnlp_twitter-roberta-base-irony.json b/data/model_data_json/cardiffnlp_twitter-roberta-base-irony.json new file mode 100644 index 0000000000000000000000000000000000000000..ee879cffe102749e572375823cc153e0de613f9d --- /dev/null +++ b/data/model_data_json/cardiffnlp_twitter-roberta-base-irony.json @@ -0,0 +1,20 @@ +{ + "model_id": "cardiffnlp/twitter-roberta-base-irony", + "downloads": 332146, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "roberta", + "text-classification", + "en", + "dataset:tweet_eval", + "arxiv:2010.12421", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - tweet_eval language: - en --- # Twitter-roBERTa-base for Irony Detection This is a roBERTa-base model trained on ~58M tweets and finetuned for irony detection with the TweetEval benchmark. This model has integrated into the TweetNLP Python library. - Paper: _TweetEval_ benchmark (Findings of EMNLP 2020). - Git Repo: Tweeteval official repository. ## Example of classification Output: ### Reference Please cite the reference paper if you use this model.", + "model_explanation_gemini": "Detects irony in English tweets using a roBERTa-base model trained on ~58M tweets and fine-tuned on the TweetEval benchmark." +} \ No newline at end of file diff --git a/data/model_data_json/cardiffnlp_twitter-roberta-base-sentiment-latest.json b/data/model_data_json/cardiffnlp_twitter-roberta-base-sentiment-latest.json new file mode 100644 index 0000000000000000000000000000000000000000..4da0d8349ff5de689a7fd62f21bc1e80cbdfc2c1 --- /dev/null +++ b/data/model_data_json/cardiffnlp_twitter-roberta-base-sentiment-latest.json @@ -0,0 +1,19 @@ +{ + "model_id": "cardiffnlp/twitter-roberta-base-sentiment-latest", + "downloads": 2053333, + "tags": [ + "transformers", + "pytorch", + "tf", + "roberta", + "text-classification", + "en", + "dataset:tweet_eval", + "arxiv:2202.03829", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en widget: - text: Covid cases are increasing fast! datasets: - tweet_eval --- # Twitter-roBERTa-base for Sentiment Analysis - UPDATED (2022) This is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021, and finetuned for sentiment analysis with the TweetEval benchmark. The original Twitter-based RoBERTa model can be found here and the original reference paper is TweetEval. This model is suitable for English. - Reference Paper: TimeLMs paper. - Git Repo: TimeLMs official repository. Labels: 0 -> Negative; 1 -> Neutral; 2 -> Positive This sentiment analysis model has been integrated into TweetNLP. You can access the demo here. 
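Since the card's classification example is elided in this dump, here is a minimal hedged sketch of multi-label inference, assuming the standard transformers sequence-classification API and an assumed 0.5 sigmoid threshold:

```python
# Hedged sketch: multi-label topic classification with a sigmoid threshold.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "cardiffnlp/twitter-roberta-base-dec2021-tweet-topic-multi-all"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

text = "Love to take night time bike rides at the jersey shore."  # widget example
with torch.no_grad():
    scores = torch.sigmoid(model(**tokenizer(text, return_tensors="pt")).logits)[0]

# Multi-label: every topic whose score clears the (assumed) 0.5 threshold.
for i, score in enumerate(scores):
    if score > 0.5:
        print(model.config.id2label[i], round(float(score), 3))
```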
## Example Pipeline ## Full classification example Output: ### References", + "model_explanation_gemini": "Classifies English tweets into negative, neutral, or positive sentiment using a RoBERTa-base model trained on 124M tweets." +} \ No newline at end of file diff --git a/data/model_data_json/cardiffnlp_twitter-roberta-base-sentiment.json b/data/model_data_json/cardiffnlp_twitter-roberta-base-sentiment.json new file mode 100644 index 0000000000000000000000000000000000000000..18538f76e74ec0b0fb2b76c4d4523ad9f1670cd0 --- /dev/null +++ b/data/model_data_json/cardiffnlp_twitter-roberta-base-sentiment.json @@ -0,0 +1,20 @@ +{ + "model_id": "cardiffnlp/twitter-roberta-base-sentiment", + "downloads": 3929796, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "roberta", + "text-classification", + "en", + "dataset:tweet_eval", + "arxiv:2010.12421", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - tweet_eval language: - en --- # Twitter-roBERTa-base for Sentiment Analysis This is a roBERTa-base model trained on ~58M tweets and finetuned for sentiment analysis with the TweetEval benchmark. This model is suitable for English (for a similar multilingual model, see XLM-T). - Reference Paper: _TweetEval_ (Findings of EMNLP 2020). - Git Repo: Tweeteval official repository. Labels: 0 -> Negative; 1 -> Neutral; 2 -> Positive New! We just released a new sentiment analysis model trained on more recent and a larger quantity of tweets. See twitter-roberta-base-sentiment-latest and TweetNLP for more details. ## Example of classification Output: ### BibTeX entry and citation info Please cite the reference paper if you use this model.", + "model_explanation_gemini": "Classifies English tweets into negative, neutral, or positive sentiment categories using a RoBERTa-base model fine-tuned on tweet data." +} \ No newline at end of file diff --git a/data/model_data_json/cardiffnlp_twitter-xlm-roberta-base-sentiment.json b/data/model_data_json/cardiffnlp_twitter-xlm-roberta-base-sentiment.json new file mode 100644 index 0000000000000000000000000000000000000000..585e1d7842e4c89f156f2b7bc0a39b90f3c77988 --- /dev/null +++ b/data/model_data_json/cardiffnlp_twitter-xlm-roberta-base-sentiment.json @@ -0,0 +1,18 @@ +{ + "model_id": "cardiffnlp/twitter-xlm-roberta-base-sentiment", + "downloads": 2014393, + "tags": [ + "transformers", + "pytorch", + "tf", + "xlm-roberta", + "text-classification", + "multilingual", + "arxiv:2104.12250", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: multilingual widget: - text: \"🤗\" - text: \"T'estimo! ❤️\" - text: \"I love you!\" - text: \"I hate you 🤮\" - text: \"Mahal kita!\" - text: \"사랑해!\" - text: \"난 너가 싫어\" - text: \"😍😍😍\" --- # twitter-XLM-roBERTa-base for Sentiment Analysis This is a multilingual XLM-roBERTa-base model trained on ~198M tweets and finetuned for sentiment analysis. The sentiment fine-tuning was done on 8 languages (Ar, En, Fr, De, Hi, It, Sp, Pt) but it can be used for more languages (see paper for details). - Paper: XLM-T: A Multilingual Language Model Toolkit for Twitter. - Git Repo: XLM-T official repository. This model has been integrated into the TweetNLP library. ## Example Pipeline ## Full classification example Output: ### Reference", + "model_explanation_gemini": "Multilingual XLM-roBERTa-base model fine-tuned for sentiment analysis on tweets across eight languages." 
+} \ No newline at end of file diff --git a/data/model_data_json/cecibas_Midnight-Miqu-70B-v1.5-4bit.json b/data/model_data_json/cecibas_Midnight-Miqu-70B-v1.5-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..6064ff69ca1b2ee31066a89b5845ccfbfcb5b185 --- /dev/null +++ b/data/model_data_json/cecibas_Midnight-Miqu-70B-v1.5-4bit.json @@ -0,0 +1,20 @@ +{ + "model_id": "cecibas/Midnight-Miqu-70B-v1.5-4bit", + "downloads": 200277, + "tags": [ + "transformers", + "pytorch", + "llama", + "text-generation", + "conversational", + "license:unknown", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "awq", + "region:us" + ], + "description": "--- license: unknown --- awq quant made with lmdeploy v0.4.2:", + "model_explanation_gemini": "An AWQ-quantized version of the Midnight-Miqu-70B-v1.5 model created using lmdeploy v0.4.2." +} \ No newline at end of file diff --git a/data/model_data_json/charactr_vocos-encodec-24khz.json b/data/model_data_json/charactr_vocos-encodec-24khz.json new file mode 100644 index 0000000000000000000000000000000000000000..714dd4463fe1711342115ddbd1c98921922fa552 --- /dev/null +++ b/data/model_data_json/charactr_vocos-encodec-24khz.json @@ -0,0 +1,12 @@ +{ + "model_id": "charactr/vocos-encodec-24khz", + "downloads": 108096, + "tags": [ + "pytorch", + "arxiv:2306.00814", + "license:mit", + "region:us" + ], + "description": "--- license: mit --- # Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis Audio samples | Paper [[abs]]( [[pdf]]( Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform. ## Installation To use Vocos only in inference mode, install it using: If you wish to train the model, install it with additional dependencies: ## Usage ### Reconstruct audio from EnCodec tokens Additionally, you need to provide a which corresponds to the embedding for bandwidth from the list: . Copy-synthesis from a file: It extracts and quantizes features with EnCodec, then reconstructs them with Vocos in a single forward pass. ## Citation If this code contributes to your research, please cite our work: ## License The code in this repository is released under the MIT license.", + "model_explanation_gemini": "Generates high-quality audio waveforms from acoustic features using spectral coefficients and inverse Fourier transform for rapid synthesis." 
+} \ No newline at end of file diff --git a/data/model_data_json/charactr_vocos-mel-24khz.json b/data/model_data_json/charactr_vocos-mel-24khz.json new file mode 100644 index 0000000000000000000000000000000000000000..911dcbc9033f0fa450ddc366ba5a8413c5048f85 --- /dev/null +++ b/data/model_data_json/charactr_vocos-mel-24khz.json @@ -0,0 +1,12 @@ +{ + "model_id": "charactr/vocos-mel-24khz", + "downloads": 1619476, + "tags": [ + "pytorch", + "arxiv:2306.00814", + "license:mit", + "region:us" + ], + "description": "--- license: mit --- # Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis Audio samples | Paper [[abs]]( [[pdf]]( Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform. ## Installation To use Vocos only in inference mode, install it using: If you wish to train the model, install it with additional dependencies: ## Usage ### Reconstruct audio from mel-spectrogram Copy-synthesis from a file: ## Citation If this code contributes to your research, please cite our work: ## License The code in this repository is released under the MIT license.", + "model_explanation_gemini": "Generates high-quality audio waveforms from mel-spectrograms using spectral coefficients and inverse Fourier transform for fast synthesis." +} \ No newline at end of file diff --git a/data/model_data_json/chutesai_Llama-4-Scout-17B-16E-Instruct.json b/data/model_data_json/chutesai_Llama-4-Scout-17B-16E-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..821c2dd6400fcf7f2be1d4c95bc30ed5a6eb101a --- /dev/null +++ b/data/model_data_json/chutesai_Llama-4-Scout-17B-16E-Instruct.json @@ -0,0 +1,36 @@ +{ + "model_id": "chutesai/Llama-4-Scout-17B-16E-Instruct", + "downloads": 186419, + "tags": [ + "transformers", + "safetensors", + "llama4", + "image-text-to-text", + "facebook", + "meta", + "pytorch", + "llama", + "conversational", + "ar", + "de", + "en", + "es", + "fr", + "hi", + "id", + "it", + "pt", + "th", + "tl", + "vi", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-4-Scout-17B-16E", + "base_model:finetune:meta-llama/Llama-4-Scout-17B-16E", + "license:other", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers language: - ar - de - en - es - fr - hi - id - it - pt - th - tl - vi base_model: - meta-llama/Llama-4-Scout-17B-16E tags: - facebook - meta - pytorch - llama - llama4 extra_gated_prompt: >- **LLAMA 4 COMMUNITY LICENSE AGREEMENT** Llama 4 Version Effective Date: April 5, 2025 \"**Agreement**\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. 
\"**Documentation**\" means the specifications, manuals and documentation accompanying Llama 4 distributed by Meta at \"**Licensee**\" or \"**you**\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"**Llama 4**\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"**Llama Materials**\" means, collectively, Meta’s proprietary Llama 4 and Documentation (and any portion thereof) made available under this Agreement. \"**Meta**\" or \"**we**\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1\\. **License Rights and Redistribution**. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display \"Built with Llama\" on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include \"Llama\" at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2\\. **Additional Commercial Terms**. 
If, on the Llama 4 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3\. **Disclaimer of Warranty**. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4\. **Limitation of Liability**. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5\. **Intellectual Property**. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use \"Llama\" (the \"Mark\") solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 4 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6\. **Term and Termination**. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials.
Sections 3, 4 and 7 shall survive the termination of this Agreement. 7\. **Governing Law and Jurisdiction**. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit extra_gated_heading: \"Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate.\" license: other license_name: llama4 --- ## Model Information The models in the Llama 4 collection are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series, Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts. **Model developer**: Meta **Model Architecture:** The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality.
| Model Name | Training Data | Params | Input modalities | Output modalities | Context length | Token count | Knowledge cutoff | |---|---|---|---|---|---|---|---| | Llama 4 Scout (17Bx16E) | A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. Learn more in our . | 17B (Activated), 109B (Total) | Multilingual text and image | Multilingual text and code | 10M | ~40T | August 2024 | | Llama 4 Maverick (17Bx128E) | (same as above) | 17B (Activated), 400B (Total) | Multilingual text and image | Multilingual text and code | 1M | ~22T | August 2024 |
**Supported languages:** Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. **Model Release Date:** April 5, 2025 **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models may be released as we improve model behavior with community feedback. **License**: A custom commercial license, the Llama 4 Community License Agreement, is available at: **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the Llama README. For more technical information about generation parameters and recipes for how to use Llama 4 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 4 is intended for commercial and research use in multiple languages. Instruction tuned models are intended for assistant-like chat and visual reasoning tasks, whereas pretrained models can be adapted for natural language generation. For vision, Llama 4 models are also optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The Llama 4 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 4 Community License allows for these use cases. **Out-of-scope**: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 4 Community License. Use in languages or capabilities beyond those explicitly referenced as supported in this model card\*\*. \*\*Note: 1\. Llama 4 has been trained on a broader collection of languages than the 12 supported languages (pre-training includes 200 total languages). Developers may fine-tune Llama 4 models for languages beyond the 12 supported languages provided they comply with the Llama 4 Community License and the Acceptable Use Policy. Developers are responsible for ensuring that their use of Llama 4 in additional languages is done in a safe and responsible manner. 2\. Llama 4 has been tested for image understanding up to 5 input images. If leveraging additional image understanding capabilities beyond this, Developers are responsible for ensuring that their deployments are mitigated for risks and should perform additional testing and tuning tailored to their specific applications. ## How to use with transformers Please make sure you have transformers installed, or upgrade using . ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom built GPU clusters, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Model pre-training utilized a cumulative total of **7.38M** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **1,999 tons** CO2eq for training.
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with clean and renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq. | Model Name | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | :---: | :---: | :---: | | Llama 4 Scout | 5.0M | 700 | 1,354 | 0 | | Llama 4 Maverick | 2.38M | 700 | 645 | 0 | | Total | 7.38M | \\- | 1,999 | 0 | ## The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 4 Scout was pretrained on \\~40 trillion tokens and Llama 4 Maverick was pretrained on \\~22 trillion tokens of multimodal data from a mix of publicly available, licensed data and information from Meta’s products and services. This includes publicly shared posts from Instagram and Facebook and people’s interactions with Meta AI. **Data Freshness:** The pretraining data has a cutoff of August 2024\\. ## Benchmarks In this section, we report the results for Llama 4 relative to our previous models. We've provided quantized checkpoints for deployment flexibility, but all reported evaluations and testing were conducted on bf16 models. ### Pre-trained models | Pre-trained models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Category | Benchmark | \\# Shots | Metric | Llama 3.1 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Reasoning & Knowledge | MMLU | 5 | macro\\_avg/acc\\_char | 79.3 | 85.2 | 79.6 | 85.5 | | | MMLU-Pro | 5 | macro\\_avg/em | 53.8 | 61.6 | 58.2 | 62.9 | | | MATH | 4 | em\\_maj1@1 | 41.6 | 53.5 | 50.3 | 61.2 | | Code | MBPP | 3 | pass@1 | 66.4 | 74.4 | 67.8 | 77.6 | | Multilingual | TydiQA | 1 | average/f1 | 29.9 | 34.3 | 31.5 | 31.7 | | Image | ChartQA | 0 | relaxed\\_accuracy | No multimodal support | | 83.4 | 85.3 | | | DocVQA | 0 | anls | | | 89.4 | 91.6 | ### Instruction tuned models | Instruction tuned models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | ----- | :---: | :---: | | Category | Benchmark | \\# Shots | Metric | Llama 3.3 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Image Reasoning | MMMU | 0 | accuracy | No multimodal support | | 69.4 | 73.4 | | | MMMU Pro^ | 0 | accuracy | | | 52.2 | 59.6 | | | MathVista | 0 | accuracy | | | 70.7 | 73.7 | | Image Understanding | ChartQA | 0 | relaxed\\_accuracy | | | 88.8 | 90.0 | | | DocVQA (test) | 0 | anls | | | 94.4 | 94.4 | | Coding | LiveCodeBench (10/01/2024-02/01/2025) | 0 | pass@1 | 33.3 | 27.7 | 32.8 | 43.4 | | Reasoning & Knowledge | MMLU Pro | 0 | macro\\_avg/acc | 68.9 | 73.4 | 74.3 | 80.5 | | | GPQA Diamond | 0 | accuracy | 50.5 | 49.0 | 57.2 | 69.8 | | Multilingual | MGSM | 0 | average/em | 91.1 | 91.6 | 90.6 | 92.3 | | Long context | MTOB (half book) eng-\\>kgv/kgv-\\>eng | \\- | chrF | Context window is 128K | | 42.2/36.6 | 54.0/46.4 | | | MTOB (full book) eng-\\>kgv/kgv-\\>eng | \\- | chrF | | | 39.7/36.3 | 50.8/46.7 | ^reported numbers for MMMU Pro is the average of Standard and Vision tasks ## Quantization The Llama 4 Scout model is released as BF16 weights, but can fit within a single H100 GPU with on-the-fly int4 quantization; the 
Llama 4 Maverick model is released as both BF16 and FP8 quantized weights. The FP8 quantized weights fit on a single H100 DGX host while still maintaining quality. We provide code for on-the-fly int4 quantization which minimizes performance degradation as well. ## Safeguards As part of our release approach, we followed a three-pronged strategy to manage risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. Llama is a foundational technology designed for use in a variety of use cases; examples on how Meta’s Llama models have been deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology, by aligning our model’s safety for a standard set of risks. Developers are then in the driver seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards. Llama 4 was developed following the best practices outlined in our Developer Use Guide: AI Protections. ### Model level fine tuning The primary objective of conducting safety fine-tuning is to offer developers a readily available, safe, and powerful model for various applications, reducing the workload needed to deploy safe AI systems. Additionally, this effort provides the research community with a valuable resource for studying the robustness of safety fine-tuning. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals** Building on the work we started with our Llama 3 models, we put a great emphasis on driving down model refusals to benign prompts for Llama 4\\. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. **Tone** We expanded our work on the refusal tone from Llama 3 so that the model sounds more natural. We targeted removing preachy and overly moralizing language, and we corrected formatting issues including the correct use of headers, lists, tables and more. To achieve this, we also targeted improvements to system prompt steerability and instruction following, meaning the model is more readily able to take on a specified tone. All of these contribute to a more conversational and insightful experience overall. **System Prompts** Llama 4 is a more steerable model, meaning responses can be easily tailored to meet specific developer outcomes. Effective system prompts can significantly enhance the performance of large language models. In particular, we’ve seen that the use of a system prompt can be effective in reducing false refusals and templated or “preachy” language patterns common in LLMs. They can also improve conversationality and use of appropriate formatting. Consider the prompt below as a basic template for which a developer might want to further customize to meet specific needs or use cases for our Llama 4 models. 
| System prompt | | :---- | | You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, \"it's unethical to\", \"it's worth noting…\", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4\\. Your knowledge cutoff date is August 2024\\. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise. | ### Llama 4 system protections Large language models, including Llama 4, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional guardrails as required. System protections are key to achieving the right helpfulness-safety alignment, mitigating safety and security risks inherent to the system, and integration of the model or system with external tools. We provide the community with system level protections \\- like Llama Guard, Prompt Guard and Code Shield \\- that developers should deploy with Llama models or other LLMs. All of our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use cases evaluations measure safety risks of systems for most commonly built applications including chat bot, visual QA. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompt and output response. It is important to evaluate applications in context, and we recommend building dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which were crafted dedicated benchmarks including long context, multilingual, coding or memorization. **Red teaming** We conduct recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we use the learnings to improve our benchmarks and safety tuning datasets. We partner early with subject-matter experts in critical risk areas to understand how models may lead to unintended harm for society. 
Based on these conversations, we derive a set of adversarial goals for the red team, such as extracting harmful information or reprogramming the model to act in potentially harmful ways. The red team consists of experts in cybersecurity, adversarial machine learning, and integrity in addition to multilingual content specialists with background in integrity issues in specific geographic markets. ### Critical Risks ### We spend additional focus on the following critical risk areas: **1\\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons for Llama 4, we applied expert-designed and other targeted evaluations designed to assess whether the use of Llama 4 could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. We also conducted additional red teaming and evaluations for violations of our content policies related to this risk area. **2\\. Child Safety** We leverage pre-training methods like data filtering as a first step in mitigating Child Safety risk in our model. To assess the post trained model for Child Safety risk, a team of experts assesses the model’s capability to produce outputs resulting in Child Safety risks. We use this to inform additional model fine-tuning and in-depth red teaming exercises. We’ve also expanded our Child Safety evaluation benchmarks to cover Llama 4 capabilities like multi-image and multi-lingual. **3\\. Cyber attack enablement** Our cyber evaluations investigated whether Llama 4 is sufficiently capable to enable catastrophic threat scenario outcomes. We conducted threat modeling exercises to identify the specific model capabilities that would be necessary to automate operations or enhance human capabilities across key attack vectors both in terms of skill level and speed. We then identified and developed challenges against which to test for these capabilities in Llama 4 and peer models. Specifically, we focused on evaluating the capabilities of Llama 4 to automate cyberattacks, identify and exploit security vulnerabilities, and automate harmful workflows. Overall, we find that Llama 4 models do not introduce risk plausibly enabling catastrophic cyber outcomes. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Trust tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. 
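To make the system-prompt steerability described above concrete, here is a minimal, hypothetical sketch of passing such a template through a standard chat-message structure. The model id and generation settings are illustrative assumptions, not values taken from this card:

```python
# Hypothetical sketch: steering a Llama 4 model with the system prompt template above.
# The model id and generation parameters are assumptions, not from this card.
from transformers import pipeline

SYSTEM_PROMPT = "You are an expert conversationalist ..."  # abridged template from above

chat = pipeline("text-generation", model="meta-llama/Llama-4-Scout-17B-16E-Instruct")
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Help me plan a weekend trip to Lisbon."},
]
print(chat(messages, max_new_tokens=256)[0]["generated_text"])
```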
## Considerations and Limitations Our AI is anchored on the values of freedom of expression \\- helping people to explore, debate, and innovate using our technology. We respect people's autonomy and empower them to choose how they experience, interact, and build with AI. Our AI promotes an open exchange of ideas. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 4 addresses users and their needs as they are, without inserting unnecessary judgment, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. Llama 4 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 4’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 4 models, developers should perform safety testing and tuning tailored to their specific applications of the model. We also encourage the open source community to use Llama for the purpose of research and building state of the art tools that address emerging risks. Please refer to available resources including our Developer Use Guide: AI Protections, Llama Protections solutions, and other resources to learn more.", + "model_explanation_gemini": "A multilingual (12 languages) large language model based on Meta's Llama-4, designed for instruction-following tasks and governed by a restrictive community license agreement." +} \ No newline at end of file diff --git a/data/model_data_json/city96_FLUX.1-dev-gguf.json b/data/model_data_json/city96_FLUX.1-dev-gguf.json new file mode 100644 index 0000000000000000000000000000000000000000..4298fd958d89f2efccc4919567e5feacd155c8ae --- /dev/null +++ b/data/model_data_json/city96_FLUX.1-dev-gguf.json @@ -0,0 +1,16 @@ +{ + "model_id": "city96/FLUX.1-dev-gguf", + "downloads": 136642, + "tags": [ + "gguf", + "text-to-image", + "image-generation", + "flux", + "base_model:black-forest-labs/FLUX.1-dev", + "base_model:quantized:black-forest-labs/FLUX.1-dev", + "license:other", + "region:us" + ], + "description": "--- base_model: black-forest-labs/FLUX.1-dev library_name: gguf license: other license_name: flux-1-dev-non-commercial-license license_link: LICENSE.md quantized_by: city96 tags: - text-to-image - image-generation - flux --- This is a direct GGUF conversion of black-forest-labs/FLUX.1-dev As this is a quantized model not a finetune, all the same restrictions/original license terms still apply. The model files can be used with the ComfyUI-GGUF custom node. Place model files in - see the GitHub readme for further install instructions. Please refer to this chart for a basic overview of quantization types.", + "model_explanation_gemini": "Generates images from text inputs using GGUF quantization for compatibility with ComfyUI-GGUF." 
+} \ No newline at end of file diff --git a/data/model_data_json/city96_Wan2.1-I2V-14B-480P-gguf.json b/data/model_data_json/city96_Wan2.1-I2V-14B-480P-gguf.json new file mode 100644 index 0000000000000000000000000000000000000000..6e4ee727035198b341a857876646f6163cea4c99 --- /dev/null +++ b/data/model_data_json/city96_Wan2.1-I2V-14B-480P-gguf.json @@ -0,0 +1,18 @@ +{ + "model_id": "city96/Wan2.1-I2V-14B-480P-gguf", + "downloads": 122466, + "tags": [ + "gguf", + "video", + "video-generation", + "image-to-video", + "en", + "zh", + "base_model:Wan-AI/Wan2.1-I2V-14B-480P", + "base_model:quantized:Wan-AI/Wan2.1-I2V-14B-480P", + "license:apache-2.0", + "region:us" + ], + "description": "--- base_model: Wan-AI/Wan2.1-I2V-14B-480P library_name: gguf quantized_by: city96 tags: - video - video-generation license: apache-2.0 pipeline_tag: image-to-video language: - en - zh --- This is a direct GGUF conversion of Wan-AI/Wan2.1-I2V-14B-480P All quants are created from the FP32 base file, though I only uploaded FP16 due to it exceeding the 50GB max file limit and gguf-split loading not currently being supported in ComfyUI-GGUF. The model files can be used with the ComfyUI-GGUF custom node. Place model files in - see the GitHub readme for further install instructions. The other files required can be downloaded from this repository by Comfy-Org Please refer to this chart for a basic overview of quantization types.", + "model_explanation_gemini": "Generates videos from images using GGUF quantization for compatibility with ComfyUI-GGUF." +} \ No newline at end of file diff --git a/data/model_data_json/ckiplab_albert-tiny-chinese-ws.json b/data/model_data_json/ckiplab_albert-tiny-chinese-ws.json new file mode 100644 index 0000000000000000000000000000000000000000..79bfcaa1af526bfbd0f4fb95cbc8063235af3aa5 --- /dev/null +++ b/data/model_data_json/ckiplab_albert-tiny-chinese-ws.json @@ -0,0 +1,17 @@ +{ + "model_id": "ckiplab/albert-tiny-chinese-ws", + "downloads": 184642, + "tags": [ + "transformers", + "pytorch", + "albert", + "token-classification", + "zh", + "license:gpl-3.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - zh thumbnail: tags: - pytorch - token-classification - albert - zh license: gpl-3.0 --- # CKIP ALBERT Tiny Chinese This project provides traditional Chinese transformers models (including ALBERT, BERT, GPT2) and NLP tools (including word segmentation, part-of-speech tagging, named entity recognition). 這個專案提供了繁體中文的 transformers 模型(包含 ALBERT、BERT、GPT2)及自然語言處理工具(包含斷詞、詞性標記、實體辨識)。 ## Homepage - ## Contributers - Mu Yang at CKIP (Author & Maintainer) ## Usage Please use BertTokenizerFast as tokenizer instead of AutoTokenizer. 請使用 BertTokenizerFast 而非 AutoTokenizer。 For full usage and more information, please refer to 有關完整使用方法及其他資訊,請參見 。", + "model_explanation_gemini": "Performs traditional Chinese word segmentation using a tiny ALBERT-based model." 
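Since the usage snippet in the card above is elided, here is a minimal sketch of the tokenizer advice it gives: pair BertTokenizerFast with the CKIP checkpoint instead of AutoTokenizer. The example sentence and label handling are illustrative:

```python
import torch
from transformers import BertTokenizerFast, AutoModelForTokenClassification

# Per the note above, use BertTokenizerFast rather than AutoTokenizer.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = AutoModelForTokenClassification.from_pretrained("ckiplab/albert-tiny-chinese-ws")

inputs = tokenizer("今天天氣真好", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # per-token segmentation tags (B/I)
print(logits.argmax(-1)[0].tolist())
```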
+} \ No newline at end of file diff --git a/data/model_data_json/ckiplab_bert-base-chinese.json b/data/model_data_json/ckiplab_bert-base-chinese.json new file mode 100644 index 0000000000000000000000000000000000000000..ddcfc869f9b3f7a759bfa1862af2856fbe3c224c --- /dev/null +++ b/data/model_data_json/ckiplab_bert-base-chinese.json @@ -0,0 +1,18 @@ +{ + "model_id": "ckiplab/bert-base-chinese", + "downloads": 80090, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "fill-mask", + "lm-head", + "zh", + "license:gpl-3.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - zh thumbnail: tags: - pytorch - lm-head - bert - zh license: gpl-3.0 --- # CKIP BERT Base Chinese This project provides traditional Chinese transformers models (including ALBERT, BERT, GPT2) and NLP tools (including word segmentation, part-of-speech tagging, named entity recognition). 這個專案提供了繁體中文的 transformers 模型(包含 ALBERT、BERT、GPT2)及自然語言處理工具(包含斷詞、詞性標記、實體辨識)。 ## Homepage - ## Contributers - Mu Yang at CKIP (Author & Maintainer) ## Usage Please use BertTokenizerFast as tokenizer instead of AutoTokenizer. 請使用 BertTokenizerFast 而非 AutoTokenizer。 For full usage and more information, please refer to 有關完整使用方法及其他資訊,請參見 。" +} \ No newline at end of file diff --git a/data/model_data_json/cl-nagoya_ruri-base.json b/data/model_data_json/cl-nagoya_ruri-base.json new file mode 100644 index 0000000000000000000000000000000000000000..ef51ddd9e122dc61aa7a466d7a1b7a6dc81592c2 --- /dev/null +++ b/data/model_data_json/cl-nagoya_ruri-base.json @@ -0,0 +1,19 @@ +{ + "model_id": "cl-nagoya/ruri-base", + "downloads": 294229, + "tags": [ + "safetensors", + "bert", + "sentence-similarity", + "feature-extraction", + "ja", + "dataset:cl-nagoya/ruri-dataset-ft", + "arxiv:2409.07737", + "base_model:cl-nagoya/ruri-pt-base", + "base_model:finetune:cl-nagoya/ruri-pt-base", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: - ja base_model: cl-nagoya/ruri-pt-base tags: - sentence-similarity - feature-extraction license: apache-2.0 datasets: - cl-nagoya/ruri-dataset-ft pipeline_tag: sentence-similarity --- # Ruri: Japanese General Text Embeddings **Notes: v3 models are out!** We recommend using the following v3 models going forward. |ID| #Param.|Max Len.|Avg. JMTEB| |-|-|-|-| |cl-nagoya/ruri-v3-30m|37M|8192|74.51| |cl-nagoya/ruri-v3-70m|70M|8192|75.48| |cl-nagoya/ruri-v3-130m|132M|8192|76.55| |cl-nagoya/ruri-v3-310m|315M|8192|77.24| ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: Then you can load this model and run inference. ## Benchmarks ### JMTEB Evaluated with JMTEB. 
|Model|#Param.|Avg.|Retrieval|STS|Classfification|Reranking|Clustering|PairClassification| |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| |cl-nagoya/sup-simcse-ja-base|111M|68.56|49.64|82.05|73.47|91.83|51.79|62.57| |cl-nagoya/sup-simcse-ja-large|337M|66.51|37.62|83.18|73.73|91.48|50.56|62.51| |cl-nagoya/unsup-simcse-ja-base|111M|65.07|40.23|78.72|73.07|91.16|44.77|62.44| |cl-nagoya/unsup-simcse-ja-large|337M|66.27|40.53|80.56|74.66|90.95|48.41|62.49| |pkshatech/GLuCoSE-base-ja|133M|70.44|59.02|78.71|76.82|91.90|49.78|66.39| |||||||||| |sentence-transformers/LaBSE|472M|64.70|40.12|76.56|72.66|91.63|44.88|62.33| |intfloat/multilingual-e5-small|118M|69.52|67.27|80.07|67.62|93.03|46.91|62.19| |intfloat/multilingual-e5-base|278M|70.12|68.21|79.84|69.30|92.85|48.26|62.26| |intfloat/multilingual-e5-large|560M|71.65|70.98|79.70|72.89|92.96|51.24|62.15| |||||||||| |OpenAI/text-embedding-ada-002|-|69.48|64.38|79.02|69.75|93.04|48.30|62.40| |OpenAI/text-embedding-3-small|-|70.86|66.39|79.46|73.06|92.92|51.06|62.27| |OpenAI/text-embedding-3-large|-|73.97|74.48|82.52|77.58|93.58|53.32|62.35| |||||||||| |Ruri-Small|68M|71.53|69.41|82.79|76.22|93.00|51.19|62.11| |**Ruri-Base** (this model)|111M|71.91|69.82|82.87|75.58|92.91|54.16|62.38| |Ruri-Large|337M|73.31|73.02|83.13|77.43|92.99|51.82|62.29| ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** cl-nagoya/ruri-pt-base - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 768 - **Similarity Function:** Cosine Similarity - **Language:** Japanese - **License:** Apache 2.0 - **Paper:** ### Full Model Architecture ### Framework Versions - Python: 3.10.13 - Sentence Transformers: 3.0.0 - Transformers: 4.41.2 - PyTorch: 2.3.1+cu118 - Accelerate: 0.30.1 - Datasets: 2.19.1 - Tokenizers: 0.19.1 ## Citation ## License This model is published under the Apache License, Version 2.0.", + "model_explanation_gemini": "Generates Japanese text embeddings for tasks like sentence similarity, retrieval, and classification." +} \ No newline at end of file diff --git a/data/model_data_json/cl-nagoya_ruri-small-v2.json b/data/model_data_json/cl-nagoya_ruri-small-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..ead093277ff67e3d8daf2492a1db289e08a15d58 --- /dev/null +++ b/data/model_data_json/cl-nagoya_ruri-small-v2.json @@ -0,0 +1,19 @@ +{ + "model_id": "cl-nagoya/ruri-small-v2", + "downloads": 107343, + "tags": [ + "safetensors", + "distilbert", + "sentence-similarity", + "feature-extraction", + "ja", + "dataset:cl-nagoya/ruri-dataset-v2-ft", + "arxiv:2409.07737", + "base_model:cl-nagoya/ruri-pt-small-v2", + "base_model:finetune:cl-nagoya/ruri-pt-small-v2", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: - ja tags: - sentence-similarity - feature-extraction base_model: cl-nagoya/ruri-pt-small-v2 widget: [] pipeline_tag: sentence-similarity license: apache-2.0 datasets: - cl-nagoya/ruri-dataset-v2-ft --- # Ruri: Japanese General Text Embeddings **Notes: v3 models are out!** We recommend using the following v3 models going forward. |ID| #Param.|Max Len.|Avg. JMTEB| |-|-|-|-| |cl-nagoya/ruri-v3-30m|37M|8192|74.51| |cl-nagoya/ruri-v3-70m|70M|8192|75.48| |cl-nagoya/ruri-v3-130m|132M|8192|76.55| |cl-nagoya/ruri-v3-310m|315M|8192|77.24| ## Usage First install the Sentence Transformers library: Then you can load this model and run inference. ## Benchmarks ### JMTEB Evaluated with JMTEB. 
|Model|#Param.|Avg.|Retrieval|STS|Classfification|Reranking|Clustering|PairClassification| |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| |cl-nagoya/sup-simcse-ja-base|111M|68.56|49.64|82.05|73.47|91.83|51.79|62.57| |cl-nagoya/sup-simcse-ja-large|337M|66.51|37.62|83.18|73.73|91.48|50.56|62.51| |cl-nagoya/unsup-simcse-ja-base|111M|65.07|40.23|78.72|73.07|91.16|44.77|62.44| |cl-nagoya/unsup-simcse-ja-large|337M|66.27|40.53|80.56|74.66|90.95|48.41|62.49| |pkshatech/GLuCoSE-base-ja|133M|70.44|59.02|78.71|76.82|91.90|49.78|66.39| |||||||||| |sentence-transformers/LaBSE|472M|64.70|40.12|76.56|72.66|91.63|44.88|62.33| |intfloat/multilingual-e5-small|118M|69.52|67.27|80.07|67.62|93.03|46.91|62.19| |intfloat/multilingual-e5-base|278M|70.12|68.21|79.84|69.30|92.85|48.26|62.26| |intfloat/multilingual-e5-large|560M|71.65|70.98|79.70|72.89|92.96|51.24|62.15| |||||||||| |OpenAI/text-embedding-ada-002|-|69.48|64.38|79.02|69.75|93.04|48.30|62.40| |OpenAI/text-embedding-3-small|-|70.86|66.39|79.46|73.06|92.92|51.06|62.27| |OpenAI/text-embedding-3-large|-|73.97|74.48|82.52|77.58|93.58|53.32|62.35| |||||||||| |Ruri-Small|68M|71.53|69.41|82.79|76.22|93.00|51.19|62.11| |**Ruri-Small v2** (this model)|68M|73.30|73.94|82.91|76.17|93.20|51.58|62.32| |Ruri-Base|111M|71.91|69.82|82.87|75.58|92.91|54.16|62.38| |Ruri-Base v2|111M|72.48|72.33|83.03|75.34|93.17|51.38|62.35| |Ruri-Large|337M|73.31|73.02|83.13|77.43|92.99|51.82|62.29| |Ruri-Large v2|337M|74.55|76.34|83.17|77.18|93.21|52.14|62.27| ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** cl-nagoya/ruri-pt-small-v2 - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 768 - **Similarity Function:** Cosine Similarity - **Language:** Japanese - **License:** Apache 2.0 - **Paper:** ### Full Model Architecture ### Framework Versions - Python: 3.10.13 - Sentence Transformers: 3.0.0 - Transformers: 4.41.2 - PyTorch: 2.3.1+cu118 - Accelerate: 0.30.1 - Datasets: 2.19.1 - Tokenizers: 0.19.1 ## Citation ## License This model is published under the Apache License, Version 2.0.", + "model_explanation_gemini": "Generates Japanese text embeddings for sentence similarity and feature extraction tasks." +} \ No newline at end of file diff --git a/data/model_data_json/codellama_CodeLlama-13b-Instruct-hf.json b/data/model_data_json/codellama_CodeLlama-13b-Instruct-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..093a9df46c047d2634f892092bf1cdc68d2e4129 --- /dev/null +++ b/data/model_data_json/codellama_CodeLlama-13b-Instruct-hf.json @@ -0,0 +1,22 @@ +{ + "model_id": "codellama/CodeLlama-13b-Instruct-hf", + "downloads": 195681, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "llama-2", + "conversational", + "code", + "arxiv:2308.12950", + "license:llama2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - code pipeline_tag: text-generation tags: - llama-2 license: llama2 --- # **Code Llama** Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the repository for the 13 instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding. Links to other models can be found in the index at the bottom. > [!NOTE] > This is a non-official Code Llama repo. 
You can find the official Meta repository in the Meta Llama organization. | | Base Model | Python | Instruct | | --- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | | 7B | codellama/CodeLlama-7b-hf | codellama/CodeLlama-7b-Python-hf | codellama/CodeLlama-7b-Instruct-hf | | 13B | codellama/CodeLlama-13b-hf | codellama/CodeLlama-13b-Python-hf | codellama/CodeLlama-13b-Instruct-hf | | 34B | codellama/CodeLlama-34b-hf | codellama/CodeLlama-34b-Python-hf | codellama/CodeLlama-34b-Instruct-hf | | 70B | codellama/CodeLlama-70b-hf | codellama/CodeLlama-70b-Python-hf | codellama/CodeLlama-70b-Instruct-hf | ## Model Use To use this model, please make sure to install transformers: Model capabilities: - [x] Code completion. - [x] Infilling. - [x] Instructions / chat. - [ ] Python specialist. ## Model Details *Note: Use of this model is governed by the Meta license. Meta developed and publicly released the Code Llama family of large language models (LLMs). **Model Developers** Meta **Variations** Code Llama comes in three model sizes, and three variants: * Code Llama: base models designed for general code synthesis and understanding * Code Llama - Python: designed specifically for Python * Code Llama - Instruct: for instruction following and safer deployment All variants are available in sizes of 7B, 13B and 34B parameters. **This repository contains the Instruct version of the 13B parameters model.** **Input** Models input text only. **Output** Models generate text only. **Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture. **Model Dates** Code Llama and its variants have been trained between January 2023 and July 2023. **Status** This is a static model trained on an offline dataset. Future versions of Code Llama - Instruct will be released as we improve model safety with community feedback. **License** A custom commercial license is available at: **Research Paper** More information can be found in the paper \"Code Llama: Open Foundation Models for Code\" or its arXiv page. ## Intended Use **Intended Use Cases** Code Llama and its variants is intended for commercial and research use in English and relevant programming languages. The base model Code Llama can be adapted for a variety of code synthesis and understanding tasks, Code Llama - Python is designed specifically to handle the Python programming language, and Code Llama - Instruct is intended to be safer to use for code assistant and generation applications. **Out-of-Scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Code Llama and its variants. ## Hardware and Software **Training Factors** We used custom training libraries. The training and fine-tuning of the released models have been performed Meta’s Research Super Cluster. **Carbon Footprint** In aggregate, training all 9 Code Llama models required 400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 65.3 tCO2eq, 100% of which were offset by Meta’s sustainability program. 
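The install and usage snippets in the Model Use section above are elided; a minimal sketch of instruct-style generation with the transformers pipeline follows. The prompt wrapping and sampling values are illustrative, not values from this card:

```python
import torch
import transformers

generator = transformers.pipeline(
    "text-generation",
    model="codellama/CodeLlama-13b-Instruct-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Llama-2-style instruct wrapping; adjust to your chat template if needed.
prompt = "[INST] Write a Python function that checks whether a string is a palindrome. [/INST]"
result = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.2)
print(result[0]["generated_text"])
```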
## Training Data All experiments reported here and the released models have been trained and fine-tuned using the same data as Llama 2 with different weights (see Section 2 and Table 1 in the research paper for details). ## Evaluation Results See evaluations for the main models and detailed ablations in Section 3 and safety evaluations in Section 4 of the research paper. ## Ethical Considerations and Limitations Code Llama and its variants are a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Code Llama’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of Code Llama, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see the Responsible Use Guide available available at", + "model_explanation_gemini": "A 13-billion-parameter instruct-tuned generative text model designed for general code synthesis, understanding, and instruction-following tasks in programming." +} \ No newline at end of file diff --git a/data/model_data_json/cointegrated_LaBSE-en-ru.json b/data/model_data_json/cointegrated_LaBSE-en-ru.json new file mode 100644 index 0000000000000000000000000000000000000000..cea1d7984e76e6d15355b655bce1f2510ae94996 --- /dev/null +++ b/data/model_data_json/cointegrated_LaBSE-en-ru.json @@ -0,0 +1,22 @@ +{ + "model_id": "cointegrated/LaBSE-en-ru", + "downloads": 106499, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "bert", + "pretraining", + "feature-extraction", + "embeddings", + "sentence-similarity", + "ru", + "en", + "arxiv:2007.01852", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: [\"ru\", \"en\"] tags: - feature-extraction - embeddings - sentence-similarity --- # LaBSE for English and Russian This is a truncated version of sentence-transformers/LaBSE, which is, in turn, a port of LaBSE by Google. The current model has only English and Russian tokens left in the vocabulary. Thus, the vocabulary is 10% of the original, and number of parameters in the whole model is 27% of the original, without any loss in the quality of English and Russian embeddings. To get the sentence embeddings, you can use the following code: The model has been truncated in this notebook. You can adapt it for other languages (like EIStakovskii/LaBSE-fr-de), models or datasets. ## Reference: Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Narveen Ari, Wei Wang. Language-agnostic BERT Sentence Embedding. July 2020 License:", + "model_explanation_gemini": "Generates high-quality sentence embeddings for English and Russian text while maintaining original performance with a reduced vocabulary size." 
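The embedding code the LaBSE-en-ru card refers to is elided; below is a sketch of the usual LaBSE-style recipe (pooler output followed by L2 normalization — an assumption based on the model family, not the card's exact snippet):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cointegrated/LaBSE-en-ru")
model = AutoModel.from_pretrained("cointegrated/LaBSE-en-ru")

sentences = ["Hello world", "Привет, мир"]
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
embeddings = torch.nn.functional.normalize(out.pooler_output)  # unit-norm sentence vectors
print(embeddings @ embeddings.T)  # cosine similarity matrix
```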
+} \ No newline at end of file diff --git a/data/model_data_json/cointegrated_rubert-base-cased-nli-threeway.json b/data/model_data_json/cointegrated_rubert-base-cased-nli-threeway.json new file mode 100644 index 0000000000000000000000000000000000000000..c58038dc8776b234630223e7a12193489c3a18e3 --- /dev/null +++ b/data/model_data_json/cointegrated_rubert-base-cased-nli-threeway.json @@ -0,0 +1,24 @@ +{ + "model_id": "cointegrated/rubert-base-cased-nli-threeway", + "downloads": 122265, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "text-classification", + "rubert", + "russian", + "nli", + "rte", + "zero-shot-classification", + "ru", + "dataset:cointegrated/nli-rus-translated-v2021", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ru pipeline_tag: zero-shot-classification tags: - rubert - russian - nli - rte - zero-shot-classification widget: - text: \"Я хочу поехать в Австралию\" candidate_labels: \"спорт,путешествия,музыка,кино,книги,наука,политика\" hypothesis_template: \"Тема текста - {}.\" datasets: - cointegrated/nli-rus-translated-v2021 --- # RuBERT for NLI (natural language inference) This is the DeepPavlov/rubert-base-cased fine-tuned to predict the logical relationship between two short texts: entailment, contradiction, or neutral. ## Usage How to run the model for NLI: You can also use this model for zero-shot short text classification (by labels only), e.g. for sentiment analysis: Alternatively, you can use Huggingface pipelines for inference. ## Sources The model has been trained on a series of NLI datasets automatically translated to Russian from English. Most datasets were taken from the repo of Felipe Salvatore: JOCI, MNLI, MPE, SICK, SNLI. Some datasets obtained from the original sources: ANLI, NLI-style FEVER, IMPPRES. 
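The usage snippets in the card above are elided; here is a sketch of the zero-shot route via the standard transformers zero-shot-classification pipeline, reusing the candidate labels and hypothesis template shown in the card's widget config:

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="cointegrated/rubert-base-cased-nli-threeway",
)
result = classifier(
    "Я хочу поехать в Австралию",  # "I want to go to Australia"
    candidate_labels=["спорт", "путешествия", "музыка", "кино", "книги", "наука", "политика"],
    hypothesis_template="Тема текста - {}.",
)
print(result["labels"][0], result["scores"][0])  # expected top label: путешествия (travel)
```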
## Performance The table below shows ROC AUC (one class vs rest) for five models on the corresponding *dev* sets: - tiny: a small BERT predicting entailment vs not_entailment - twoway: a base-sized BERT predicting entailment vs not_entailment - threeway (**this model**): a base-sized BERT predicting entailment vs contradiction vs neutral - vicgalle-xlm: a large multilingual NLI model - facebook-bart: a large multilingual NLI model |model |add_one_rte|anli_r1|anli_r2|anli_r3|copa|fever|help|iie |imppres|joci|mnli |monli|mpe |scitail|sick|snli|terra|total | |------------------------|-----------|-------|-------|-------|----|-----|----|-----|-------|----|-----|-----|----|-------|----|----|-----|------| |n_observations |387 |1000 |1000 |1200 |200 |20474|3355|31232|7661 |939 |19647|269 |1000|2126 |500 |9831|307 |101128| |tiny/entailment |0.77 |0.59 |0.52 |0.53 |0.53|0.90 |0.81|0.78 |0.93 |0.81|0.82 |0.91 |0.81|0.78 |0.93|0.95|0.67 |0.77 | |twoway/entailment |0.89 |0.73 |0.61 |0.62 |0.58|0.96 |0.92|0.87 |0.99 |0.90|0.90 |0.99 |0.91|0.96 |0.97|0.97|0.87 |0.86 | |threeway/entailment |0.91 |0.75 |0.61 |0.61 |0.57|0.96 |0.56|0.61 |0.99 |0.90|0.91 |0.67 |0.92|0.84 |0.98|0.98|0.90 |0.80 | |vicgalle-xlm/entailment |0.88 |0.79 |0.63 |0.66 |0.57|0.93 |0.56|0.62 |0.77 |0.80|0.90 |0.70 |0.83|0.84 |0.91|0.93|0.93 |0.78 | |facebook-bart/entailment|0.51 |0.41 |0.43 |0.47 |0.50|0.74 |0.55|0.57 |0.60 |0.63|0.70 |0.52 |0.56|0.68 |0.67|0.72|0.64 |0.58 | |threeway/contradiction | |0.71 |0.64 |0.61 | |0.97 | | |1.00 |0.77|0.92 | |0.89| |0.99|0.98| |0.85 | |threeway/neutral | |0.79 |0.70 |0.62 | |0.91 | | |0.99 |0.68|0.86 | |0.79| |0.96|0.96| |0.83 | For evaluation (and for training of the tiny and twoway models), some extra datasets were used: Add-one RTE, CoPA, IIE, and SCITAIL taken from the repo of Felipe Salvatore and translatted, HELP and MoNLI taken from the original sources and translated, and Russian TERRa.", + "model_explanation_gemini": "Fine-tuned for Russian natural language inference to predict entailment, contradiction, or neutral relationships between texts, and usable for zero-shot classification." +} \ No newline at end of file diff --git a/data/model_data_json/cointegrated_rubert-tiny-sentiment-balanced.json b/data/model_data_json/cointegrated_rubert-tiny-sentiment-balanced.json new file mode 100644 index 0000000000000000000000000000000000000000..af6490d03f82fcda1c38fff93b46912aeea51cfd --- /dev/null +++ b/data/model_data_json/cointegrated_rubert-tiny-sentiment-balanced.json @@ -0,0 +1,20 @@ +{ + "model_id": "cointegrated/rubert-tiny-sentiment-balanced", + "downloads": 75361, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "text-classification", + "russian", + "classification", + "sentiment", + "multiclass", + "ru", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: [\"ru\"] tags: - russian - classification - sentiment - multiclass widget: - text: \"Какая гадость эта ваша заливная рыба!\" --- This is the cointegrated/rubert-tiny model fine-tuned for classification of sentiment for short Russian texts. The problem is formulated as multiclass classification: vs vs . ## Usage The function below estimates the sentiment of the given text: ## Training We trained the model on the datasets collected by Smetanin. We have converted all training data into a 3-class format and have up- and downsampled the training data to balance both the sources and the classes. The training code is available as a Colab notebook. 
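The sentiment helper function mentioned in the card is elided; a minimal hypothetical stand-in using the transformers text-classification pipeline (not the card's exact function) might look like this:

```python
from transformers import pipeline

sentiment = pipeline(
    "text-classification",
    model="cointegrated/rubert-tiny-sentiment-balanced",
)
print(sentiment("Какая гадость эта ваша заливная рыба!"))  # expected label: negative
```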
The metrics on the balanced test set are the following: | Source | Macro F1 | | ----------- | ----------- | | SentiRuEval2016_banks | 0.83 | | SentiRuEval2016_tele | 0.74 | | kaggle_news | 0.66 | | linis | 0.50 | | mokoron | 0.98 | | rureviews | 0.72 | | rusentiment | 0.67 |" +} \ No newline at end of file diff --git a/data/model_data_json/cointegrated_rubert-tiny2.json b/data/model_data_json/cointegrated_rubert-tiny2.json new file mode 100644 index 0000000000000000000000000000000000000000..e0397afaab8287e8483f26c81ceb92c516d5c561 --- /dev/null +++ b/data/model_data_json/cointegrated_rubert-tiny2.json @@ -0,0 +1,27 @@ +{ + "model_id": "cointegrated/rubert-tiny2", + "downloads": 545354, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "bert", + "pretraining", + "russian", + "fill-mask", + "embeddings", + "masked-lm", + "tiny", + "feature-extraction", + "sentence-similarity", + "transformers", + "ru", + "license:mit", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru pipeline_tag: sentence-similarity tags: - russian - fill-mask - pretraining - embeddings - masked-lm - tiny - feature-extraction - sentence-similarity - sentence-transformers - transformers license: mit widget: - text: Миниатюрная модель для [MASK] разных задач. --- This is an updated version of cointegrated/rubert-tiny: a small Russian BERT-based encoder with high-quality sentence embeddings. This post in Russian gives more details. The differences from the previous version include: - a larger vocabulary: 83828 tokens instead of 29564; - larger supported sequences: 2048 instead of 512; - sentence embeddings approximate LaBSE closer than before; - meaningful segment embeddings (tuned on the NLI task) - the model is focused only on Russian. The model should be used as is to produce sentence embeddings (e.g. for KNN classification of short texts) or fine-tuned for a downstream task. Sentence embeddings can be produced as follows: Alternatively, you can use the model with :", + "model_explanation_gemini": "Generates high-quality Russian sentence embeddings for tasks like similarity comparison or classification and supports fine-tuning for downstream applications." +} \ No newline at end of file diff --git a/data/model_data_json/colbert-ir_colbertv2.0.json b/data/model_data_json/colbert-ir_colbertv2.0.json new file mode 100644 index 0000000000000000000000000000000000000000..28b912d790c913e145e13a5f6686db17e0202462 --- /dev/null +++ b/data/model_data_json/colbert-ir_colbertv2.0.json @@ -0,0 +1,23 @@ +{ + "model_id": "colbert-ir/colbertv2.0", + "downloads": 1488481, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "ColBERT", + "en", + "arxiv:2004.12832", + "arxiv:2007.00814", + "arxiv:2101.00436", + "arxiv:2112.01488", + "arxiv:2205.09707", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - en tags: - ColBERT ---

# ColBERT (v2)

### ColBERT is a _fast_ and _accurate_ retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.

Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.

As Figure 1 illustrates, ColBERT relies on fine-grained **contextual late interaction**: it encodes each passage into a **matrix** of token-level embeddings (shown above in blue). Then at search time, it embeds every query into another matrix (shown in green) and efficiently finds passages that contextually match the query using scalable vector-similarity () operators. These rich interactions allow ColBERT to surpass the quality of _single-vector_ representation models, while scaling efficiently to large corpora. You can read more in our papers: * **ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT** (SIGIR'20). * **Relevance-guided Supervision for OpenQA with ColBERT** (TACL'21). * **Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval** (NeurIPS'21). * **ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction** (NAACL'22). * **PLAID: An Efficient Engine for Late Interaction Retrieval** (CIKM'22). ---- ## 🚨 **Announcements** * (1/29/23) We have merged a new index updater feature and support for additional Hugging Face models! These are in beta so please give us feedback as you try them out. * (1/24/23) If you're looking for the **DSP** framework for composing ColBERTv2 and LLMs, it's at: ---- ## ColBERTv1 The ColBERTv1 code from the SIGIR'20 paper is in the branch. See here for more information on other branches. ## Installation ColBERT requires Python 3.7+ and Pytorch 1.9+ and uses the Hugging Face Transformers library. We strongly recommend creating a conda environment using the commands below. (If you don't have conda, follow the official conda installation guide.) We have also included a new environment file specifically for CPU-only environments (), but note that if you are testing CPU execution on a machine that includes GPUs you might need to specify as part of your command. Note that a GPU is required for training and indexing. If you face any problems, please open a new issue and we'll help you promptly! ## Overview Using ColBERT on a dataset typically involves the following steps. **Step 0: Preprocess your collection.** At its simplest, ColBERT works with tab-separated (TSV) files: a file (e.g., ) will contain all passages and another (e.g., ) will contain a set of queries for searching the collection. **Step 1: Download the pre-trained ColBERTv2 checkpoint.** This checkpoint has been trained on the MS MARCO Passage Ranking task. You can also _optionally_ train your own ColBERT model. **Step 2: Index your collection.** Once you have a trained ColBERT model, you need to index your collection to permit fast retrieval. This step encodes all passages into matrices, stores them on disk, and builds data structures for efficient search. **Step 3: Search the collection with your queries.** Given the model and index, you can issue queries over the collection to retrieve the top-k passages for each query. Below, we illustrate these steps via an example run on the MS MARCO Passage Ranking task. ## API Usage Notebook **NEW**: We have an experimental notebook on Google Colab that you can use with free GPUs. Indexing 10,000 on the free Colab T4 GPU takes six minutes. This Jupyter notebook **docs/intro.ipynb notebook** illustrates using the key features of ColBERT with the new Python API. It includes how to download the ColBERTv2 model checkpoint trained on MS MARCO Passage Ranking and how to download our new LoTTE benchmark. 
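The late interaction in Figure 1 reduces to a simple operator: score a query by summing, over its token embeddings, the maximum similarity to any passage token. Below is a toy sketch of that MaxSim-style scoring; the dimensions and normalization are assumptions, not the library's optimized kernels:

```python
import torch

def late_interaction_score(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    """Q: (query_tokens, dim), D: (doc_tokens, dim), unit-normalized embeddings.
    Returns the sum over query tokens of the max similarity to any doc token."""
    sim = Q @ D.T                       # token-level similarity matrix
    return sim.max(dim=1).values.sum()  # MaxSim per query token, then sum

Q = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
D = torch.nn.functional.normalize(torch.randn(120, 128), dim=-1)
print(late_interaction_score(Q, D))
```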
## Data This repository works directly with a simple **tab-separated file** format to store queries, passages, and top-k ranked lists. * Queries: each line is . * Collection: each line is . * Top-k Ranking: each line is . This works directly with the data format of the MS MARCO Passage Ranking dataset. You will need the training triples (), the official top-1000 ranked lists for the dev set queries (), and the dev set relevant passages (). For indexing the full collection, you will also need the list of passages (). ## Indexing For fast retrieval, indexing precomputes the ColBERT representations of passages. Example usage: ## Retrieval We typically recommend that you use ColBERT for **end-to-end** retrieval, where it directly finds its top-k passages from the full collection: You can optionally specify the , , and search hyperparameters to trade off between speed and result quality. Defaults for different values of are listed in colbert/searcher.py. We can evaluate the MSMARCO rankings using the following command: ## Training We provide a pre-trained model checkpoint, but we also detail how to train from scratch here. Note that this example demonstrates the ColBERTv1 style of training, but the provided checkpoint was trained with ColBERTv2. Training requires a JSONL triples file with a list per line. The query IDs and passage IDs correspond to the specified and files respectively. Example usage (training on 4 GPUs): ## Running a lightweight ColBERTv2 server We provide a script to run a lightweight server which serves k (upto 100) results in ranked order for a given search query, in JSON format. This script can be used to power DSP programs. To run the server, update the environment variables and in the file to point to the appropriate ColBERT index. The run the following command: A sample query: ## Branches ### Supported branches * []( Stable branch with ColBERTv2 + PLAID. * []( Legacy branch for ColBERTv1. ### Deprecated branches * []( Base ColBERTv2 implementation. * []( ColBERTv2 implementation with CPU search support. * []( ColBERTv2 implementation with PLAID. * []( ColBERT with a baseline binarization-based compression strategy (as opposed to ColBERTv2's residual compression, which we found to be more robust). ## Acknowledgments ColBERT logo designed by Chuyi Zhang.", + "model_explanation_gemini": "ColBERTv2 is a fast, scalable retrieval model using contextual late interaction over BERT embeddings for accurate search over large text collections." 
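The "Example usage" snippets in the Indexing and Retrieval sections above are elided; here is a sketch following the repo's documented Indexer/Searcher API, where the experiment name and collection path are placeholders:

```python
from colbert import Indexer, Searcher
from colbert.infra import ColBERTConfig, Run, RunConfig

if __name__ == "__main__":
    with Run().context(RunConfig(nranks=1, experiment="demo")):
        config = ColBERTConfig(nbits=2)  # residual compression bits
        indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)
        indexer.index(name="demo.nbits=2", collection="collection.tsv")  # TSV: id \t passage

        searcher = Searcher(index="demo.nbits=2")
        ids, ranks, scores = searcher.search("what is late interaction?", k=3)
        for pid, rank, score in zip(ids, ranks, scores):
            print(rank, pid, float(score))
```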
+} \ No newline at end of file diff --git a/data/model_data_json/comodoro_wav2vec2-xls-r-300m-cs-250.json b/data/model_data_json/comodoro_wav2vec2-xls-r-300m-cs-250.json new file mode 100644 index 0000000000000000000000000000000000000000..5ac87c652d432587efc486c5d77db3ad12ce3193 --- /dev/null +++ b/data/model_data_json/comodoro_wav2vec2-xls-r-300m-cs-250.json @@ -0,0 +1,29 @@ +{ + "model_id": "comodoro/wav2vec2-xls-r-300m-cs-250", + "downloads": 269013, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "generated_from_trainer", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_8_0", + "robust-speech-event", + "xlsr-fine-tuning-week", + "cs", + "dataset:mozilla-foundation/common_voice_8_0", + "dataset:ovm", + "dataset:pscr", + "dataset:vystadial2016", + "base_model:facebook/wav2vec2-xls-r-300m", + "base_model:finetune:facebook/wav2vec2-xls-r-300m", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - cs license: apache-2.0 tags: - automatic-speech-recognition - generated_from_trainer - hf-asr-leaderboard - mozilla-foundation/common_voice_8_0 - robust-speech-event - xlsr-fine-tuning-week datasets: - mozilla-foundation/common_voice_8_0 - ovm - pscr - vystadial2016 base_model: facebook/wav2vec2-xls-r-300m model-index: - name: Czech comodoro Wav2Vec2 XLSR 300M 250h data results: - task: type: automatic-speech-recognition name: Automatic Speech Recognition dataset: name: Common Voice 8 type: mozilla-foundation/common_voice_8_0 args: cs metrics: - type: wer value: 7.3 name: Test WER - type: cer value: 2.1 name: Test CER - task: type: automatic-speech-recognition name: Automatic Speech Recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: cs metrics: - type: wer value: 43.44 name: Test WER - task: type: automatic-speech-recognition name: Automatic Speech Recognition dataset: name: Robust Speech Event - Test Data type: speech-recognition-community-v2/eval_data args: cs metrics: - type: wer value: 38.5 name: Test WER --- # Czech wav2vec2-xls-r-300m-cs-250 This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice 8.0 dataset as well as other datasets listed below. It achieves the following results on the evaluation set: - Loss: 0.1271 - Wer: 0.1475 - Cer: 0.0329 The script results using a LM are: - WER: 0.07274312090176113 - CER: 0.021207369275558875 ## Model description Fine-tuned facebook/wav2vec2-large-xlsr-53 on Czech using the Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. 
The model can be used directly (without a language model) as follows: ## Evaluation The model can be evaluated using the attached script: ## Training and evaluation data The Common Voice 8.0 and datasets were used for training, as well as the following datasets: - Šmídl, Luboš and Pražák, Aleš, 2013, OVM – Otázky Václava Moravce, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, - Pražák, Aleš and Šmídl, Luboš, 2012, Czech Parliament Meetings, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, - Plátek, Ondřej; Dušek, Ondřej and Jurčíček, Filip, 2016, Vystadial 2016 – Czech data, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.0001 - train_batch_size: 32 - eval_batch_size: 8 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 800 - num_epochs: 5 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | Wer | Cer | |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:| | 3.4203 | 0.16 | 800 | 3.3148 | 1.0 | 1.0 | | 2.8151 | 0.32 | 1600 | 0.8508 | 0.8938 | 0.2345 | | 0.9411 | 0.48 | 2400 | 0.3335 | 0.3723 | 0.0847 | | 0.7408 | 0.64 | 3200 | 0.2573 | 0.2840 | 0.0642 | | 0.6516 | 0.8 | 4000 | 0.2365 | 0.2581 | 0.0595 | | 0.6242 | 0.96 | 4800 | 0.2039 | 0.2433 | 0.0541 | | 0.5754 | 1.12 | 5600 | 0.1832 | 0.2156 | 0.0482 | | 0.5626 | 1.28 | 6400 | 0.1827 | 0.2091 | 0.0463 | | 0.5342 | 1.44 | 7200 | 0.1744 | 0.2033 | 0.0468 | | 0.4965 | 1.6 | 8000 | 0.1705 | 0.1963 | 0.0444 | | 0.5047 | 1.76 | 8800 | 0.1604 | 0.1889 | 0.0422 | | 0.4814 | 1.92 | 9600 | 0.1604 | 0.1827 | 0.0411 | | 0.4471 | 2.09 | 10400 | 0.1566 | 0.1822 | 0.0406 | | 0.4509 | 2.25 | 11200 | 0.1619 | 0.1853 | 0.0432 | | 0.4415 | 2.41 | 12000 | 0.1513 | 0.1764 | 0.0397 | | 0.4313 | 2.57 | 12800 | 0.1515 | 0.1739 | 0.0392 | | 0.4163 | 2.73 | 13600 | 0.1445 | 0.1695 | 0.0377 | | 0.4142 | 2.89 | 14400 | 0.1478 | 0.1699 | 0.0385 | | 0.4184 | 3.05 | 15200 | 0.1430 | 0.1669 | 0.0376 | | 0.3886 | 3.21 | 16000 | 0.1433 | 0.1644 | 0.0374 | | 0.3795 | 3.37 | 16800 | 0.1426 | 0.1648 | 0.0373 | | 0.3859 | 3.53 | 17600 | 0.1357 | 0.1604 | 0.0361 | | 0.3762 | 3.69 | 18400 | 0.1344 | 0.1558 | 0.0349 | | 0.384 | 3.85 | 19200 | 0.1379 | 0.1576 | 0.0359 | | 0.3762 | 4.01 | 20000 | 0.1344 | 0.1539 | 0.0346 | | 0.3559 | 4.17 | 20800 | 0.1339 | 0.1525 | 0.0351 | | 0.3683 | 4.33 | 21600 | 0.1315 | 0.1518 | 0.0342 | | 0.3572 | 4.49 | 22400 | 0.1307 | 0.1507 | 0.0342 | | 0.3494 | 4.65 | 23200 | 0.1294 | 0.1491 | 0.0335 | | 0.3476 | 4.81 | 24000 | 0.1287 | 0.1491 | 0.0336 | | 0.3475 | 4.97 | 24800 | 0.1271 | 0.1475 | 0.0329 | ### Framework versions - Transformers 4.16.2 - Pytorch 1.10.1+cu102 - Datasets 1.18.3 - Tokenizers 0.11.0", + "model_explanation_gemini": "A fine-tuned Czech speech recognition model based on Facebook's wav2vec2-xls-r-300m, optimized for converting 16kHz audio input to text." 
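The direct-usage snippet referenced above is elided; a sketch of LM-free CTC decoding follows, with the audio-loading details being illustrative:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")

speech, sr = torchaudio.load("sample.wav")  # placeholder file
speech = torchaudio.functional.resample(speech, sr, 16_000).squeeze(0)  # model expects 16 kHz

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
```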
+} \ No newline at end of file diff --git a/data/model_data_json/comodoro_wav2vec2-xls-r-300m-sk-cv8.json b/data/model_data_json/comodoro_wav2vec2-xls-r-300m-sk-cv8.json new file mode 100644 index 0000000000000000000000000000000000000000..f814800f671c6b5aaa8b5ed8dd805cc7095c9bab --- /dev/null +++ b/data/model_data_json/comodoro_wav2vec2-xls-r-300m-sk-cv8.json @@ -0,0 +1,22 @@ +{ + "model_id": "comodoro/wav2vec2-xls-r-300m-sk-cv8", + "downloads": 85074, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "mozilla-foundation/common_voice_8_0", + "robust-speech-event", + "xlsr-fine-tuning-week", + "hf-asr-leaderboard", + "sk", + "dataset:common_voice", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - sk license: apache-2.0 tags: - automatic-speech-recognition - mozilla-foundation/common_voice_8_0 - robust-speech-event - xlsr-fine-tuning-week - hf-asr-leaderboard datasets: - common_voice model-index: - name: Slovak comodoro Wav2Vec2 XLSR 300M CV8 results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 8 type: mozilla-foundation/common_voice_8_0 args: sk metrics: - name: Test WER type: wer value: 49.6 - name: Test CER type: cer value: 13.3 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: sk metrics: - name: Test WER type: wer value: 81.7 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Test Data type: speech-recognition-community-v2/eval_data args: sk metrics: - name: Test WER type: wer value: 80.26 --- # wav2vec2-xls-r-300m-cs-cv8 This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice 8.0 dataset. It achieves the following results on the evaluation set: - WER: 0.49575384615384616 - CER: 0.13333333333333333 ## Usage The model can be used directly (without a language model) as follows: ## Evaluation The model can be evaluated using the attached script: ## Training and evaluation data The Common Voice 8.0 and datasets were used for training ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 7e-4 - train_batch_size: 32 - eval_batch_size: 8 - seed: 42 - gradient_accumulation_steps: 20 - total_train_batch_size: 640 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 500 - num_epochs: 50 - mixed_precision_training: Native AMP ### Framework versions - Transformers 4.16.0.dev0 - Pytorch 1.10.1+cu102 - Datasets 1.17.1.dev0 - Tokenizers 0.11.0", + "model_explanation_gemini": "A Slovak automatic speech recognition model fine-tuned on Common Voice 8.0 data to convert speech into text." 
+} \ No newline at end of file diff --git a/data/model_data_json/context-labs_Meta-Llama-3.1-8B-Instruct-FP16.json b/data/model_data_json/context-labs_Meta-Llama-3.1-8B-Instruct-FP16.json new file mode 100644 index 0000000000000000000000000000000000000000..83706e46fce75e3cd52c02ffd725da83666f3c1c --- /dev/null +++ b/data/model_data_json/context-labs_Meta-Llama-3.1-8B-Instruct-FP16.json @@ -0,0 +1,29 @@ +{ + "model_id": "context-labs/Meta-Llama-3.1-8B-Instruct-FP16", + "downloads": 189484, + "tags": [ + "safetensors", + "llama", + "facebook", + "meta", + "pytorch", + "llama-3", + "text-generation", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-8B", + "base_model:finetune:meta-llama/Llama-3.1-8B", + "license:llama3.1", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th license: llama3.1 base_model: meta-llama/Meta-Llama-3.1-8B pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 extra_gated_prompt: \"### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT\\nLlama 3.1 Version\\ \\ Release Date: July 23, 2024\\n\\\"Agreement\\\" means the terms and conditions for\\ \\ use, reproduction, distribution and modification of the Llama Materials set forth\\ \\ herein.\\n\\\"Documentation\\\" means the specifications, manuals and documentation\\ \\ accompanying Llama 3.1 distributed by Meta at \\\"Licensee\\\" or \\\"you\\\" means you, or your employer or any other person or entity\\ \\ (if you are entering into this Agreement on such person or entity’s behalf), of\\ \\ the age required under applicable laws, rules or regulations to provide legal\\ \\ consent and that has legal authority to bind your employer or such other person\\ \\ or entity if you are entering in this Agreement on their behalf.\\n\\\"Llama 3.1\\\"\\ \\ means the foundational large language models and software and algorithms, including\\ \\ machine-learning model code, trained model weights, inference-enabling code, training-enabling\\ \\ code, fine-tuning enabling code and other elements of the foregoing distributed\\ \\ by Meta at Materials\\\" means,\\ \\ collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion\\ \\ thereof) made available under this Agreement.\\n\\\"Meta\\\" or \\\"we\\\" means Meta Platforms\\ \\ Ireland Limited (if you are located in or, if you are an entity, your principal\\ \\ place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you\\ \\ are located outside of the EEA or Switzerland).\\n \\n1. License Rights and Redistribution.\\n\\ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable\\ \\ and royalty-free limited license under Meta’s intellectual property or other rights\\ \\ owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy,\\ \\ create derivative works of, and make modifications to the Llama Materials.\\nb.\\ \\ Redistribution and Use.\\ni. 
If you distribute or make available the Llama Materials\\ \\ (or any derivative works thereof), or a product or service (including another\\ \\ AI model) that contains any of them, you shall (A) provide a copy of this Agreement\\ \\ with any such Llama Materials; and (B) prominently display “Built with Llama”\\ \\ on a related website, user interface, blogpost, about page, or product documentation.\\ \\ If you use the Llama Materials or any outputs or results of the Llama Materials\\ \\ to create, train, fine tune, or otherwise improve an AI model, which is distributed\\ \\ or made available, you shall also include “Llama” at the beginning of any such\\ \\ AI model name.\\nii. If you receive Llama Materials, or any derivative works thereof,\\ \\ from a Licensee as part of an integrated end user product, then Section 2 of\\ \\ this Agreement will not apply to you.\\niii. You must retain in all copies of the\\ \\ Llama Materials that you distribute the following attribution notice within a\\ \\ “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed\\ \\ under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights\\ \\ Reserved.”\\niv. Your use of the Llama Materials must comply with applicable laws\\ \\ and regulations (including trade compliance laws and regulations) and adhere to\\ \\ the Acceptable Use Policy for the Llama Materials (available at \\ which is hereby incorporated by reference into this Agreement.\\n2. Additional\\ \\ Commercial Terms. If, on the Llama 3.1 version release date, the monthly active\\ \\ users of the products or services made available by or for Licensee, or Licensee’s\\ \\ affiliates, is greater than 700 million monthly active users in the preceding\\ \\ calendar month, you must request a license from Meta, which Meta may grant to\\ \\ you in its sole discretion, and you are not authorized to exercise any of the\\ \\ rights under this Agreement unless or until Meta otherwise expressly grants you\\ \\ such rights.\\n3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE\\ \\ LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS”\\ \\ BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY\\ \\ KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES\\ \\ OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.\\ \\ YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING\\ \\ THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA\\ \\ MATERIALS AND ANY OUTPUT AND RESULTS.\\n4. Limitation of Liability. IN NO EVENT\\ \\ WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN\\ \\ CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS\\ \\ AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL,\\ \\ EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED\\ \\ OF THE POSSIBILITY OF ANY OF THE FOREGOING.\\n5. Intellectual Property.\\na. No\\ \\ trademark licenses are granted under this Agreement, and in connection with the\\ \\ Llama Materials, neither Meta nor Licensee may use any name or mark owned by or\\ \\ associated with the other or any of its affiliates, except as required for reasonable\\ \\ and customary use in describing and redistributing the Llama Materials or as set\\ \\ forth in this Section 5(a). 
Meta hereby grants you a license to use “Llama” (the\\ \\ “Mark”) solely as required to comply with the last sentence of Section 1.b.i.\\ \\ You will comply with Meta’s brand guidelines (currently accessible at \\ ). All goodwill arising out of your use of the Mark will inure to the benefit\\ \\ of Meta.\\nb. Subject to Meta’s ownership of Llama Materials and derivatives made\\ \\ by or for Meta, with respect to any derivative works and modifications of the\\ \\ Llama Materials that are made by you, as between you and Meta, you are and will\\ \\ be the owner of such derivative works and modifications.\\nc. If you institute\\ \\ litigation or other proceedings against Meta or any entity (including a cross-claim\\ \\ or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs\\ \\ or results, or any portion of any of the foregoing, constitutes infringement of\\ \\ intellectual property or other rights owned or licensable by you, then any licenses\\ \\ granted to you under this Agreement shall terminate as of the date such litigation\\ \\ or claim is filed or instituted. You will indemnify and hold harmless Meta from\\ \\ and against any claim by any third party arising out of or related to your use\\ \\ or distribution of the Llama Materials.\\n6. Term and Termination. The term of\\ \\ this Agreement will commence upon your acceptance of this Agreement or access\\ \\ to the Llama Materials and will continue in full force and effect until terminated\\ \\ in accordance with the terms and conditions herein. Meta may terminate this Agreement\\ \\ if you are in breach of any term or condition of this Agreement. Upon termination\\ \\ of this Agreement, you shall delete and cease use of the Llama Materials. Sections\\ \\ 3, 4 and 7 shall survive the termination of this Agreement.\\n7. Governing Law\\ \\ and Jurisdiction. This Agreement will be governed and construed under the laws\\ \\ of the State of California without regard to choice of law principles, and the\\ \\ UN Convention on Contracts for the International Sale of Goods does not apply\\ \\ to this Agreement. The courts of California shall have exclusive jurisdiction\\ \\ of any dispute arising out of this Agreement.\\n### Llama 3.1 Acceptable Use Policy\\n\\ Meta is committed to promoting safe and fair use of its tools and features, including\\ \\ Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy\\ \\ (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses\\nWe want everyone to use Llama 3.1 safely and responsibly.\\ \\ You agree you will not use, or allow others to use, Llama 3.1 to:\\n 1. Violate\\ \\ the law or others’ rights, including to:\\n 1. Engage in, promote, generate,\\ \\ contribute to, encourage, plan, incite, or further illegal or unlawful activity\\ \\ or content, such as:\\n 1. Violence or terrorism\\n 2. Exploitation\\ \\ or harm to children, including the solicitation, creation, acquisition, or dissemination\\ \\ of child exploitative content or failure to report Child Sexual Abuse Material\\n\\ \\ 3. Human trafficking, exploitation, and sexual violence\\n 4. The\\ \\ illegal distribution of information or materials to minors, including obscene\\ \\ materials, or failure to employ legally required age-gating in connection with\\ \\ such information or materials.\\n 5. Sexual solicitation\\n 6. Any\\ \\ other criminal activity\\n 3. 
Engage in, promote, incite, or facilitate the\\ \\ harassment, abuse, threatening, or bullying of individuals or groups of individuals\\n\\ \\ 4. Engage in, promote, incite, or facilitate discrimination or other unlawful\\ \\ or harmful conduct in the provision of employment, employment benefits, credit,\\ \\ housing, other economic benefits, or other essential goods and services\\n 5.\\ \\ Engage in the unauthorized or unlicensed practice of any profession including,\\ \\ but not limited to, financial, legal, medical/health, or related professional\\ \\ practices\\n 6. Collect, process, disclose, generate, or infer health, demographic,\\ \\ or other sensitive personal or private information about individuals without rights\\ \\ and consents required by applicable laws\\n 7. Engage in or facilitate any action\\ \\ or generate any content that infringes, misappropriates, or otherwise violates\\ \\ any third-party rights, including the outputs or results of any products or services\\ \\ using the Llama Materials\\n 8. Create, generate, or facilitate the creation\\ \\ of malicious code, malware, computer viruses or do anything else that could disable,\\ \\ overburden, interfere with or impair the proper working, integrity, operation\\ \\ or appearance of a website or computer system\\n2. Engage in, promote, incite,\\ \\ facilitate, or assist in the planning or development of activities that present\\ \\ a risk of death or bodily harm to individuals, including use of Llama 3.1 related\\ \\ to the following:\\n 1. Military, warfare, nuclear industries or applications,\\ \\ espionage, use for materials or activities that are subject to the International\\ \\ Traffic Arms Regulations (ITAR) maintained by the United States Department of\\ \\ State\\n 2. Guns and illegal weapons (including weapon development)\\n 3.\\ \\ Illegal drugs and regulated/controlled substances\\n 4. Operation of critical\\ \\ infrastructure, transportation technologies, or heavy machinery\\n 5. Self-harm\\ \\ or harm to others, including suicide, cutting, and eating disorders\\n 6. Any\\ \\ content intended to incite or promote violence, abuse, or any infliction of bodily\\ \\ harm to an individual\\n3. Intentionally deceive or mislead others, including use\\ \\ of Llama 3.1 related to the following:\\n 1. Generating, promoting, or furthering\\ \\ fraud or the creation or promotion of disinformation\\n 2. Generating, promoting,\\ \\ or furthering defamatory content, including the creation of defamatory statements,\\ \\ images, or other content\\n 3. Generating, promoting, or further distributing\\ \\ spam\\n 4. Impersonating another individual without consent, authorization,\\ \\ or legal right\\n 5. Representing that the use of Llama 3.1 or outputs are human-generated\\n\\ \\ 6. Generating or facilitating false online engagement, including fake reviews\\ \\ and other means of fake online engagement\\n4. 
Fail to appropriately disclose to\\ \\ end users any known dangers of your AI system\\nPlease report any violation of\\ \\ this Policy, software “bug,” or other problems that could lead to a violation\\ \\ of this Policy through one of the following means:\\n * Reporting issues with\\ \\ the model: \\ * Reporting risky content generated by the model:\\n developers.facebook.com/llama_output_feedback\\n\\ \\ * Reporting bugs and security concerns: facebook.com/whitehat/info\\n * Reporting\\ \\ violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com\" extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location ? By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy : checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**. **Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . Note: You can also find detailed recipes on how to use the model locally, with , assisted generations, quantised and more at []( ### Tool use with transformers LLaMA-3.1 supports multiple tool use formats. You can see a full guide to prompt formatting here. Tool use is also supported through chat templates in Transformers. Here is a quick example showing a single simple tool: You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so: and then call the tool and append the result, with the role, like so: After that, you can again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information, see the LLaMA prompt format docs and the Transformers tool use documentation. 
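The code snippets for this "How to use" section were stripped from the card during scraping. As a non-authoritative sketch only, conversational inference with the `transformers` pipeline API might look like this (generation settings are illustrative, not from the original card):

```python
import torch
from transformers import pipeline

# Sketch: conversational inference via the transformers text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="context-labs/Meta-Llama-3.1-8B-Instruct-FP16",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])  # last message in the chat = the reply
```

The tool-calling flow described above can likewise be sketched with chat templates; `get_current_temperature` is a hypothetical tool invented for illustration, and the exact message shapes follow the Transformers tool-use documentation as I understand it:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("context-labs/Meta-Llama-3.1-8B-Instruct-FP16")

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for.
    """
    return 22.0  # hypothetical tool, illustrative return value

messages = [{"role": "user", "content": "What is the temperature in Paris?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False
)

# If the model emits a tool call, append it to the chat, then append the
# tool's result with the "tool" role, and generate again:
messages.append({
    "role": "assistant",
    "tool_calls": [{
        "type": "function",
        "function": {"name": "get_current_temperature", "arguments": {"location": "Paris"}},
    }],
})
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```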
### Use with Please, follow the instructions in the repository To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 46.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
| --- | --- | --- | --- | --- | --- |
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third party services they use to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of the third party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in other languages than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use cases evaluations measure safety risks of systems for most commonly built applications including chat bot, coding assistant, tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompt and output response. It is important to evaluate applications in context, and we recommend building dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which were crafted dedicated benchmarks including long context, multilingual, tools calls, coding or memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity in addition to multilingual content specialists with background in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1- CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. 
For Llama 3, we conducted new in-depth sessions using objective based methodologies to assess the model risks along multiple attack vectors including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market specific nuances or experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without insertion unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. 
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.", + "model_explanation_gemini": "An 8-billion-parameter multilingual text-generation model based on Meta's Llama 3.1 architecture, designed for instruction-following tasks and supporting multiple languages including English, German, French, and Spanish." +} \ No newline at end of file diff --git a/data/model_data_json/coqui_XTTS-v2.json b/data/model_data_json/coqui_XTTS-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..33cbc5003789cfd1624bb4bc168a68f271c2ada8 --- /dev/null +++ b/data/model_data_json/coqui_XTTS-v2.json @@ -0,0 +1,12 @@ +{ + "model_id": "coqui/XTTS-v2", + "downloads": 1505612, + "tags": [ + "coqui", + "text-to-speech", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: coqui-public-model-license license_link: library_name: coqui pipeline_tag: text-to-speech widget: - text: \"Once when I was six years old I saw a magnificent picture\" --- # ⓍTTS ⓍTTS is a Voice generation model that lets you clone voices into different languages by using just a quick 6-second audio clip. There is no need for an excessive amount of training data that spans countless hours. This is the same or similar model to what powers Coqui Studio and Coqui API. ### Features - Supports 17 languages. - Voice cloning with just a 6-second audio clip. - Emotion and style transfer by cloning. - Cross-language voice cloning. - Multi-lingual speech generation. - 24khz sampling rate. ### Updates over XTTS-v1 - 2 new languages; Hungarian and Korean - Architectural improvements for speaker conditioning. - Enables the use of multiple speaker references and interpolation between speakers. - Stability improvements. - Better prosody and audio quality across the board. ### Languages XTTS-v2 supports 17 languages: **English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko) Hindi (hi)**. Stay tuned as we continue to add support for more languages. If you have any language requests, feel free to reach out! ### Code The code-base supports inference and fine-tuning. ### Demo Spaces - XTTS Space : You can see how model performs on supported languages, and try with your own reference or microphone input - XTTS Voice Chat with Mistral or Zephyr : You can experience streaming voice chat with Mistral 7B Instruct or Zephyr 7B Beta | | | | ------------------------------- | --------------------------------------- | | 🐸💬 **CoquiTTS** | coqui/TTS on Github| | 💼 **Documentation** | ReadTheDocs | 👩‍💻 **Questions** | GitHub Discussions | | 🗯 **Community** | Discord | ### License This model is licensed under Coqui Public Model License. There's a lot that goes into a license for generative models, and you can read more of the origin story of CPML here. ### Contact Come and join in our 🐸Community. We're active on Discord and Twitter. You can also mail us at info@coqui.ai. 
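The code for the 🐸TTS usage entry points listed just below ("Using 🐸TTS API", command line, direct model use) was stripped during scraping; a sketch of voice cloning through the Python API, assuming the standard coqui TTS package (file paths are illustrative):

```python
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load XTTS-v2 and clone the voice in `speaker_wav` into the target language.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
tts.tts_to_file(
    text="Once when I was six years old I saw a magnificent picture",
    speaker_wav="reference_6s.wav",  # illustrative path to a ~6-second reference clip
    language="en",
    file_path="output.wav",
)
```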
Using 🐸TTS API: Using 🐸TTS Command line: Using the model directly:", + "model_explanation_gemini": "Generates and clones voices across 17 languages using a 6-second audio clip, enabling emotion transfer, style transfer, and multilingual speech synthesis." +} \ No newline at end of file diff --git a/data/model_data_json/cortexso_deepseek-r1.json b/data/model_data_json/cortexso_deepseek-r1.json new file mode 100644 index 0000000000000000000000000000000000000000..cf44b059c8daa9fbff97343cbde66c3607bb7760 --- /dev/null +++ b/data/model_data_json/cortexso_deepseek-r1.json @@ -0,0 +1,16 @@ +{ + "model_id": "cortexso/deepseek-r1", + "downloads": 84381, + "tags": [ + "gguf", + "cortexp.cpp", + "featured", + "text-generation", + "license:mit", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- license: mit pipeline_tag: text-generation tags: - cortexp.cpp - featured --- ## Overview **DeepSeek** developed and released the **DeepSeek-R1** series, featuring multiple model sizes fine-tuned for high-performance text generation. These models are optimized for dialogue, reasoning, and information-seeking tasks, providing a balance of efficiency and accuracy while maintaining a smaller footprint compared to their original counterparts. The DeepSeek-R1 models include distilled and full-scale variants of both **Qwen** and **Llama** architectures, catering to various applications such as customer support, conversational AI, research, and enterprise automation. ## Variants ### DeepSeek-R1 | No | Variant | Branch | Cortex CLI command | | -- | ---------------------------------------------------------------------------------------------- | ------- | ------------------------------------------ | | 1 | DeepSeek-R1-Distill-Qwen-1.5B | 1.5b | | | 2 | DeepSeek-R1-Distill-Qwen-7B | 7b | | | 3 | DeepSeek-R1-Distill-Llama-8B | 8b | | | 4 | DeepSeek-R1-Distill-Qwen-14B | 14b | | | 5 | DeepSeek-R1-Distill-Qwen-32B | 32b | | | 6 | DeepSeek-R1-Distill-Llama-70B | 70b | | Each branch contains a default quantized version: - **Qwen-1.5B:** q4-km - **Qwen-7B:** q4-km - **Llama-8B:** q4-km - **Qwen-14B:** q4-km - **Qwen-32B:** q4-km - **Llama-70B:** q4-km ## Use it with Jan (UI) 1. Install **Jan** using Quickstart 2. Use in Jan model Hub: ## Use it with Cortex (CLI) 1. Install **Cortex** using Quickstart 2. Run the model with command: ## Credits - **Author:** DeepSeek - **Converter:** Homebrew - **Original License:** License - **Papers:** DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning", + "model_explanation_gemini": "Optimized for high-performance text generation in dialogue, reasoning, and information-seeking tasks, balancing efficiency and accuracy across various model sizes." 
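The Jan and Cortex quick-start commands referenced in the DeepSeek-R1 card above were also stripped. As a hedged alternative sketch (not from the card): the variant table maps each model size to a repository branch, so one way to fetch a branch's GGUF weights is via `huggingface_hub`. The assumption that GGUF files sit in each branch's root is unverified, which is why the sketch discovers filenames first:

```python
from huggingface_hub import hf_hub_download, list_repo_files

# Each DeepSeek-R1 variant lives on its own branch ("1.5b", "7b", "8b",
# "14b", "32b", "70b") per the table above. Filenames are not listed in
# the card, so discover them instead of hard-coding (an assumption).
files = list_repo_files("cortexso/deepseek-r1", revision="7b")
gguf_files = [f for f in files if f.endswith(".gguf")]
local_path = hf_hub_download("cortexso/deepseek-r1", filename=gguf_files[0], revision="7b")
print(local_path)
```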
+} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_ms-marco-MiniLM-L12-v2.json b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L12-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..59e993e17972852666a5013ec09843333d1f9606 --- /dev/null +++ b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L12-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/ms-marco-MiniLM-L12-v2", + "downloads": 459307, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "bert", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/msmarco", + "base_model:microsoft/MiniLM-L12-H384-uncased", + "base_model:quantized:microsoft/MiniLM-L12-H384-uncased", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/msmarco language: - en base_model: - microsoft/MiniLM-L12-H384-uncased pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for MS Marco This model was trained on the MS Marco Passage Ranking task. The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco ## Usage with SentenceTransformers The usage is easy when you have SentenceTransformers installed. Then you can use the pre-trained models like this: ## Usage with Transformers ## Performance In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset. | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec | | ------------- |:-------------| -----| --- | | **Version 2 models** | | | | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 | **Version 1 models** | | | | cross-encoder/ms-marco-TinyBERT-L2 | 67.43 | 30.15 | 9000 | cross-encoder/ms-marco-TinyBERT-L4 | 68.09 | 34.50 | 2900 | cross-encoder/ms-marco-TinyBERT-L6 | 69.57 | 36.13 | 680 | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | **Other models** | | | | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 Note: Runtime was computed on a V100 GPU.", + "model_explanation_gemini": "Ranks retrieved passages for a given query in information retrieval tasks, trained on the MS Marco Passage Ranking dataset." 
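The SentenceTransformers usage snippets in this card (and in the near-identical sibling MS MARCO cross-encoder cards below) were stripped during scraping; a minimal re-ranking sketch with the `CrossEncoder` class, using an illustrative query and passages:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L12-v2", max_length=512)

query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "Berlin is well known for its museums.",
]

# Score each (query, passage) pair, then sort passages by descending relevance.
scores = model.predict([(query, p) for p in passages])
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {passage}")
```

The same pattern applies to every model in the family; only the model ID changes.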
+} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_ms-marco-MiniLM-L2-v2.json b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L2-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..bfbde7cda8e1dfc7fd27c713cf4a0642ae348baa --- /dev/null +++ b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L2-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/ms-marco-MiniLM-L2-v2", + "downloads": 418336, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "bert", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/msmarco", + "base_model:cross-encoder/ms-marco-MiniLM-L12-v2", + "base_model:quantized:cross-encoder/ms-marco-MiniLM-L12-v2", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/msmarco language: - en base_model: - cross-encoder/ms-marco-MiniLM-L12-v2 pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for MS Marco This model was trained on the MS Marco Passage Ranking task. The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco ## Usage with SentenceTransformers The usage is easy when you have SentenceTransformers installed. Then you can use the pre-trained models like this: ## Usage with Transformers ## Performance In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset. | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec | | ------------- |:-------------| -----| --- | | **Version 2 models** | | | | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 | **Version 1 models** | | | | cross-encoder/ms-marco-TinyBERT-L2 | 67.43 | 30.15 | 9000 | cross-encoder/ms-marco-TinyBERT-L4 | 68.09 | 34.50 | 2900 | cross-encoder/ms-marco-TinyBERT-L6 | 69.57 | 36.13 | 680 | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | **Other models** | | | | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 Note: Runtime was computed on a V100 GPU.", + "model_explanation_gemini": "Ranks passages for information retrieval by scoring and sorting them based on relevance to a given query, trained on the MS Marco Passage Ranking dataset." 
+} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_ms-marco-MiniLM-L4-v2.json b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L4-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..9c35dcda731a31cdc7b16ad2e4ec0a3c30ccd32d --- /dev/null +++ b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L4-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/ms-marco-MiniLM-L4-v2", + "downloads": 203615, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "bert", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/msmarco", + "base_model:cross-encoder/ms-marco-MiniLM-L12-v2", + "base_model:quantized:cross-encoder/ms-marco-MiniLM-L12-v2", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/msmarco language: - en base_model: - cross-encoder/ms-marco-MiniLM-L12-v2 pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for MS Marco This model was trained on the MS Marco Passage Ranking task. The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco ## Usage with SentenceTransformers The usage is easy when you have SentenceTransformers installed. Then you can use the pre-trained models like this: ## Usage with Transformers ## Performance In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset. | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec | | ------------- |:-------------| -----| --- | | **Version 2 models** | | | | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 | **Version 1 models** | | | | cross-encoder/ms-marco-TinyBERT-L2 | 67.43 | 30.15 | 9000 | cross-encoder/ms-marco-TinyBERT-L4 | 68.09 | 34.50 | 2900 | cross-encoder/ms-marco-TinyBERT-L6 | 69.57 | 36.13 | 680 | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | **Other models** | | | | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 Note: Runtime was computed on a V100 GPU.", + "model_explanation_gemini": "Ranks passages for information retrieval by scoring and sorting them based on relevance to a given query, trained on the MS Marco Passage Ranking dataset." 
+} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_ms-marco-MiniLM-L6-v2.json b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L6-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..319e3e636bfce9c34f2e1ecf9aaef328e12f8959 --- /dev/null +++ b/data/model_data_json/cross-encoder_ms-marco-MiniLM-L6-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/ms-marco-MiniLM-L6-v2", + "downloads": 2414099, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "bert", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/msmarco", + "base_model:cross-encoder/ms-marco-MiniLM-L12-v2", + "base_model:quantized:cross-encoder/ms-marco-MiniLM-L12-v2", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/msmarco language: - en base_model: - cross-encoder/ms-marco-MiniLM-L12-v2 pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for MS Marco This model was trained on the MS Marco Passage Ranking task. The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco ## Usage with SentenceTransformers The usage is easy when you have SentenceTransformers installed. Then you can use the pre-trained models like this: ## Usage with Transformers ## Performance In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset. | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec | | ------------- |:-------------| -----| --- | | **Version 2 models** | | | | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 | **Version 1 models** | | | | cross-encoder/ms-marco-TinyBERT-L2 | 67.43 | 30.15 | 9000 | cross-encoder/ms-marco-TinyBERT-L4 | 68.09 | 34.50 | 2900 | cross-encoder/ms-marco-TinyBERT-L6 | 69.57 | 36.13 | 680 | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | **Other models** | | | | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 Note: Runtime was computed on a V100 GPU.", + "model_explanation_gemini": "Ranks passages for information retrieval by scoring and sorting them based on relevance to a given query, trained on the MS Marco Passage Ranking dataset." 
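For the stripped "Usage with Transformers" sections of these MS MARCO cards, a sketch using the plain `transformers` classes (the query and passages are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "cross-encoder/ms-marco-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Tokenize (query, passage) pairs jointly; the model emits one relevance logit per pair.
features = tokenizer(
    ["How many people live in Berlin?"] * 2,
    [
        "Berlin has a population of 3,520,031 registered inhabitants.",
        "Berlin is well known for its museums.",
    ],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    scores = model(**features).logits
print(scores)
```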
+} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_ms-marco-TinyBERT-L2-v2.json b/data/model_data_json/cross-encoder_ms-marco-TinyBERT-L2-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..1a87db5d874a1d58f0685a97cc647633fb378fce --- /dev/null +++ b/data/model_data_json/cross-encoder_ms-marco-TinyBERT-L2-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/ms-marco-TinyBERT-L2-v2", + "downloads": 239383, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "bert", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/msmarco", + "base_model:nreimers/BERT-Tiny_L-2_H-128_A-2", + "base_model:quantized:nreimers/BERT-Tiny_L-2_H-128_A-2", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/msmarco language: - en base_model: - nreimers/BERT-Tiny_L-2_H-128_A-2 pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for MS Marco This model was trained on the MS Marco Passage Ranking task. The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco ## Usage with Transformers ## Usage with SentenceTransformers The usage becomes easier when you have SentenceTransformers installed. Then, you can use the pre-trained models like this: ## Performance In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset. | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec | | ------------- |:-------------| -----| --- | | **Version 2 models** | | | | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 | **Version 1 models** | | | | cross-encoder/ms-marco-TinyBERT-L2 | 67.43 | 30.15 | 9000 | cross-encoder/ms-marco-TinyBERT-L4 | 68.09 | 34.50 | 2900 | cross-encoder/ms-marco-TinyBERT-L6 | 69.57 | 36.13 | 680 | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | **Other models** | | | | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 Note: Runtime was computed on a V100 GPU.", + "model_explanation_gemini": "Ranks passages for information retrieval by scoring and sorting them based on relevance to a given query, trained on the MS Marco dataset." 
+} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_ms-marco-electra-base.json b/data/model_data_json/cross-encoder_ms-marco-electra-base.json new file mode 100644 index 0000000000000000000000000000000000000000..964cc1fca8a4971fbb7b30c3fd8228bef12a215d --- /dev/null +++ b/data/model_data_json/cross-encoder_ms-marco-electra-base.json @@ -0,0 +1,23 @@ +{ + "model_id": "cross-encoder/ms-marco-electra-base", + "downloads": 122963, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "electra", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/msmarco", + "base_model:google/electra-base-discriminator", + "base_model:quantized:google/electra-base-discriminator", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/msmarco language: - en base_model: - google/electra-base-discriminator pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for MS Marco This model was trained on the MS Marco Passage Ranking task. The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco ## Usage with SentenceTransformers The usage is easy when you have SentenceTransformers installed. Then you can use the pre-trained models like this: ## Usage with Transformers ## Performance In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset. | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec | | ------------- |:-------------| -----| --- | | **Version 2 models** | | | | cross-encoder/ms-marco-TinyBERT-L-2-v2 | 69.84 | 32.56 | 9000 | cross-encoder/ms-marco-MiniLM-L-2-v2 | 71.01 | 34.85 | 4100 | cross-encoder/ms-marco-MiniLM-L-4-v2 | 73.04 | 37.70 | 2500 | cross-encoder/ms-marco-MiniLM-L-6-v2 | 74.30 | 39.01 | 1800 | cross-encoder/ms-marco-MiniLM-L-12-v2 | 74.31 | 39.02 | 960 | **Version 1 models** | | | | cross-encoder/ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | cross-encoder/ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | cross-encoder/ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | **Other models** | | | | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 Note: Runtime was computed on a V100 GPU.", + "model_explanation_gemini": "Ranks passages for information retrieval by scoring and sorting them based on relevance to a given query, trained on the MS Marco Passage Ranking dataset." 
+} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_nli-deberta-v3-base.json b/data/model_data_json/cross-encoder_nli-deberta-v3-base.json new file mode 100644 index 0000000000000000000000000000000000000000..c87287fafbc863c557aceedca6ebc11697f20385 --- /dev/null +++ b/data/model_data_json/cross-encoder_nli-deberta-v3-base.json @@ -0,0 +1,22 @@ +{ + "model_id": "cross-encoder/nli-deberta-v3-base", + "downloads": 78924, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "transformers", + "zero-shot-classification", + "en", + "dataset:nyu-mll/multi_nli", + "dataset:stanfordnlp/snli", + "base_model:microsoft/deberta-v3-base", + "base_model:quantized:microsoft/deberta-v3-base", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: en pipeline_tag: zero-shot-classification tags: - transformers datasets: - nyu-mll/multi_nli - stanfordnlp/snli metrics: - accuracy license: apache-2.0 base_model: - microsoft/deberta-v3-base library_name: sentence-transformers --- # Cross-Encoder for Natural Language Inference This model was trained using SentenceTransformers Cross-Encoder class. This model is based on microsoft/deberta-v3-base ## Training Data The model was trained on the SNLI and MultiNLI datasets. For a given sentence pair, it will output three scores corresponding to the labels: contradiction, entailment, neutral. ## Performance - Accuracy on SNLI-test dataset: 92.38 - Accuracy on MNLI mismatched set: 90.04 For futher evaluation results, see SBERT.net - Pretrained Cross-Encoder. ## Usage Pre-trained models can be used like this: ## Usage with Transformers AutoModel You can use the model also directly with Transformers library (without SentenceTransformers library): ## Zero-Shot Classification This model can also be used for zero-shot-classification:" +} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_nli-deberta-v3-large.json b/data/model_data_json/cross-encoder_nli-deberta-v3-large.json new file mode 100644 index 0000000000000000000000000000000000000000..d2c0b5aac8a3478f04f22208d930281d9146d3c9 --- /dev/null +++ b/data/model_data_json/cross-encoder_nli-deberta-v3-large.json @@ -0,0 +1,23 @@ +{ + "model_id": "cross-encoder/nli-deberta-v3-large", + "downloads": 253165, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "transformers", + "zero-shot-classification", + "en", + "dataset:nyu-mll/multi_nli", + "dataset:stanfordnlp/snli", + "base_model:microsoft/deberta-v3-large", + "base_model:quantized:microsoft/deberta-v3-large", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: en pipeline_tag: zero-shot-classification tags: - transformers datasets: - nyu-mll/multi_nli - stanfordnlp/snli metrics: - accuracy license: apache-2.0 base_model: - microsoft/deberta-v3-large library_name: sentence-transformers --- # Cross-Encoder for Natural Language Inference This model was trained using SentenceTransformers Cross-Encoder class. This model is based on microsoft/deberta-v3-large ## Training Data The model was trained on the SNLI and MultiNLI datasets. For a given sentence pair, it will output three scores corresponding to the labels: contradiction, entailment, neutral. ## Performance - Accuracy on SNLI-test dataset: 92.20 - Accuracy on MNLI mismatched set: 90.49 For futher evaluation results, see SBERT.net - Pretrained Cross-Encoder. 
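The usage and zero-shot snippets in these two NLI DeBERTa cards were stripped during scraping; ahead of the usage section below, a minimal sketch of both modes (sentence pairs and candidate labels are illustrative):

```python
from sentence_transformers import CrossEncoder
from transformers import pipeline

# NLI scoring: three logits per pair -> contradiction / entailment / neutral.
model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
scores = model.predict([
    ("A man is eating pizza", "A man eats something"),
    ("A black race car starts up", "A man is driving down a lonely road"),
])
label_names = ["contradiction", "entailment", "neutral"]
print([label_names[i] for i in scores.argmax(axis=1)])

# Zero-shot classification reuses the entailment head via the pipeline API.
classifier = pipeline("zero-shot-classification", model="cross-encoder/nli-deberta-v3-base")
result = classifier(
    "Apple just announced the newest iPhone X",
    candidate_labels=["technology", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])
```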
## Usage Pre-trained models can be used like this: ## Usage with Transformers AutoModel You can use the model also directly with Transformers library (without SentenceTransformers library): ## Zero-Shot Classification This model can also be used for zero-shot-classification:", + "model_explanation_gemini": "Classifies sentence pairs into contradiction, entailment, or neutral labels for natural language inference tasks, also supporting zero-shot classification." +} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_stsb-distilroberta-base.json b/data/model_data_json/cross-encoder_stsb-distilroberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..01f6424330ad02be1e9ebe7c4adbea36f9094a56 --- /dev/null +++ b/data/model_data_json/cross-encoder_stsb-distilroberta-base.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/stsb-distilroberta-base", + "downloads": 262724, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "roberta", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/stsb", + "base_model:distilbert/distilroberta-base", + "base_model:quantized:distilbert/distilroberta-base", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/stsb language: - en base_model: - distilbert/distilroberta-base pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for Semantic Textual Similarity This model was trained using SentenceTransformers Cross-Encoder class. ## Training Data This model was trained on the STS benchmark dataset. The model will predict a score between 0 and 1 how for the semantic similarity of two sentences. ## Usage and Performance Pre-trained models can be used like this: The model will predict scores for the pairs and . You can use this model also without sentence_transformers and by just using Transformers `` class", + "model_explanation_gemini": "Predicts semantic similarity scores (0-1) between two English sentences using a cross-encoder trained on the STS benchmark dataset." +} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_stsb-roberta-base.json b/data/model_data_json/cross-encoder_stsb-roberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..84bc5f8c62303edaaa7cc2bb829d9ae9920b4e88 --- /dev/null +++ b/data/model_data_json/cross-encoder_stsb-roberta-base.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/stsb-roberta-base", + "downloads": 226623, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "roberta", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/stsb", + "base_model:FacebookAI/roberta-base", + "base_model:quantized:FacebookAI/roberta-base", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/stsb language: - en base_model: - FacebookAI/roberta-base pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for Semantic Textual Similarity This model was trained using SentenceTransformers Cross-Encoder class. ## Training Data This model was trained on the STS benchmark dataset. The model will predict a score between 0 and 1 how for the semantic similarity of two sentences. 
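The STS usage snippets were likewise stripped from these cards; ahead of the usage section that follows, a minimal sketch of pairwise similarity scoring (sentence pairs are illustrative):

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")
scores = model.predict([
    ("A man is eating food.", "A man is eating a piece of bread."),
    ("A man is eating food.", "A plane is taking off."),
])
print(scores)  # similarity scores, roughly in [0, 1]; higher = more similar
```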
## Usage and Performance Pre-trained models can be used like this: The model will predict scores for the pairs and . You can also use this model without sentence_transformers, by just using the Transformers `` class", + "model_explanation_gemini": "Predicts semantic similarity scores (0-1) between two English sentences using a cross-encoder trained on the STS benchmark dataset." +} \ No newline at end of file diff --git a/data/model_data_json/cross-encoder_stsb-roberta-large.json b/data/model_data_json/cross-encoder_stsb-roberta-large.json new file mode 100644 index 0000000000000000000000000000000000000000..6e041410b5feef0bf4fadd179b7229f33ab56ff2 --- /dev/null +++ b/data/model_data_json/cross-encoder_stsb-roberta-large.json @@ -0,0 +1,24 @@ +{ + "model_id": "cross-encoder/stsb-roberta-large", + "downloads": 109164, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "onnx", + "safetensors", + "openvino", + "roberta", + "text-classification", + "transformers", + "text-ranking", + "en", + "dataset:sentence-transformers/stsb", + "base_model:FacebookAI/roberta-large", + "base_model:quantized:FacebookAI/roberta-large", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - sentence-transformers/stsb language: - en base_model: - FacebookAI/roberta-large pipeline_tag: text-ranking library_name: sentence-transformers tags: - transformers --- # Cross-Encoder for Semantic Textual Similarity This model was trained using the SentenceTransformers Cross-Encoder class. ## Training Data This model was trained on the STS benchmark dataset. The model will predict a score between 0 and 1 indicating the semantic similarity of two sentences. ## Usage and Performance Pre-trained models can be used like this: The model will predict scores for the pairs and . You can also use this model without sentence_transformers, by just using the Transformers `` class", + "model_explanation_gemini": "Predicts semantic similarity scores between two sentences on a scale from 0 to 1, trained on the STS benchmark dataset." +} \ No newline at end of file diff --git a/data/model_data_json/csebuetnlp_banglat5_banglaparaphrase.json b/data/model_data_json/csebuetnlp_banglat5_banglaparaphrase.json new file mode 100644 index 0000000000000000000000000000000000000000..02bca8f3676c8d0ddbff952bd45d743d5b020a14 --- /dev/null +++ b/data/model_data_json/csebuetnlp_banglat5_banglaparaphrase.json @@ -0,0 +1,18 @@ +{ + "model_id": "csebuetnlp/banglat5_banglaparaphrase", + "downloads": 685311, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "bn", + "arxiv:2210.05109", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - bn licenses: - cc-by-nc-sa-4.0 --- # banglat5_banglaparaphrase This repository contains the pretrained checkpoint of the model **BanglaT5** finetuned on the BanglaParaphrase dataset. This is a sequence-to-sequence transformer model pretrained with the [\"Span Corruption\"]() objective. Finetuned models using this checkpoint achieve competitive results on the dataset. For finetuning and inference, refer to the scripts in the official GitHub repository of BanglaNLG. **Note**: This model was pretrained using a specific normalization pipeline available here. All finetuning scripts in the official GitHub repository use this normalization by default.
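A minimal sketch of that normalize-then-generate flow, assuming the `normalizer` package from the csebuetnlp repositories provides the normalization pipeline mentioned above (the input sentence is illustrative):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from normalizer import normalize  # csebuetnlp normalization pipeline (assumed installed)

model = AutoModelForSeq2SeqLM.from_pretrained("csebuetnlp/banglat5_banglaparaphrase")
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglat5_banglaparaphrase", use_fast=False)

sentence = "সব পাখি ঘরে আসে।"  # illustrative Bangla input
input_ids = tokenizer(normalize(sentence), return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```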
If you need to adapt the pretrained model for a different task, make sure the text units are normalized using this pipeline before tokenizing to get the best results. A basic example is given above. ## Using this model in ## Benchmarks * Supervised fine-tuning | Test Set | Model | sacreBLEU | ROUGE-L | PINC | BERTScore | BERT-iBLEU | | -------- | ----- | --------- | ------- | ---- | --------- | ---------- | | BanglaParaphrase | BanglaT5 / IndicBART / IndicBARTSS | 32.8 / 5.60 / 4.90 | 63.58 / 35.61 / 33.66 | 74.40 / 80.26 / 82.10 | 94.80 / 91.50 / 91.10 | 92.18 / 91.16 / 90.95 | | IndicParaphrase | BanglaT5 / IndicBART / IndicBARTSS | 11.0 / 12.0 / 10.7 | 19.99 / 21.58 / 20.59 | 74.50 / 76.83 / 77.60 | 94.80 / 93.30 / 93.10 | 87.738 / 90.65 / 90.54 | The dataset can be found in the link below: * **BanglaParaphrase** ## Citation If you use this model, please cite the following paper:", + "model_explanation_gemini": "Finetuned on BanglaParaphrase, this sequence-to-sequence transformer model generates competitive Bangla paraphrases using the Span Corruption objective." +} \ No newline at end of file diff --git a/data/model_data_json/dangvantuan_vietnamese-document-embedding.json b/data/model_data_json/dangvantuan_vietnamese-document-embedding.json new file mode 100644 index 0000000000000000000000000000000000000000..322e4218acdf62949a4c8e4138836137c47c6911 --- /dev/null +++ b/data/model_data_json/dangvantuan_vietnamese-document-embedding.json @@ -0,0 +1,26 @@ +{ + "model_id": "dangvantuan/vietnamese-document-embedding", + "downloads": 75605, + "tags": [ + "sentence-transformers", + "safetensors", + "Vietnamese", + "feature-extraction", + "sentence-similarity", + "transformers", + "phobert", + "vietnamese", + "sentence-embedding", + "custom_code", + "vi", + "arxiv:1908.10084", + "arxiv:2407.19669", + "arxiv:2308.03281", + "arxiv:2402.14776", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - phobert - vietnamese - sentence-embedding license: apache-2.0 language: - vi metrics: - pearsonr - spearmanr --- ## Model Description: **vietnamese-document-embedding** is a document embedding model for the Vietnamese language with a context length of up to 8096 tokens. This model is a specialized long-text embedding model trained specifically for the Vietnamese language, which is built upon gte-multilingual and trained using the Multi-Negative Ranking Loss, Matryoshka2dLoss and SimilarityLoss. ## Full Model Architecture ## Training and Fine-tuning process The model underwent a rigorous four-stage training and fine-tuning process, each tailored to enhance its ability to generate precise and contextually relevant sentence embeddings for the Vietnamese language. Below is an outline of these stages: #### Stage 1: Training NLI on dataset XNLI: - Dataset: XNLI-vn - Method: Training using Multi-Negative Ranking Loss and Matryoshka2dLoss. This stage focused on improving the model's ability to discern and rank nuanced differences in sentence semantics. ### Stage 2: Fine-tuning for Semantic Textual Similarity on STS Benchmark - Dataset: STSB-vn - Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library. This stage honed the model's precision in capturing semantic similarity across various types of Vietnamese texts. ## Usage: Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Evaluation The model can be evaluated as follows on the Vietnamese data of stsb.
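The usage snippet in this card was stripped; a minimal sketch following the standard sentence-transformers pattern (`trust_remote_code=True` is an assumption based on the card's `custom_code` tag, and the sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer

# The custom_code tag suggests the model ships its own modeling code.
model = SentenceTransformer("dangvantuan/vietnamese-document-embedding", trust_remote_code=True)

sentences = [
    "Hà Nội là thủ đô của Việt Nam.",
    "Thủ đô của Việt Nam là Hà Nội.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, embedding_dim)
```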
### Metrics for all datasets of Semantic Textual Similarity on the STS Benchmark **Spearman score** | Model | [STSB] | [STS12] | [STS13] | [STS14] | [STS15] | [STS16] | [SICK] | Mean | |-----------------------------------------------------------|---------|----------|----------|----------|----------|----------|---------|--------| | dangvantuan/vietnamese-embedding | 84.84 | 79.04 | 85.30 | 81.38 | 87.06 | 79.95 | 79.58 | 82.45 | | dangvantuan/vietnamese-embedding-LongContext | 85.25 | 75.77 | 83.82 | 81.69 | 88.48 | 81.5 | 78.2 | 82.10 | ## Citation @article{reimers2019sentence, title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks}, author={Nils Reimers and Iryna Gurevych}, journal={arXiv preprint arXiv:1908.10084}, year={2019} } @article{zhang2024mgte, title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval}, author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others}, journal={arXiv preprint arXiv:2407.19669}, year={2024} } @article{li2023towards, title={Towards general text embeddings with multi-stage contrastive learning}, author={Li, Zehan and Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan}, journal={arXiv preprint arXiv:2308.03281}, year={2023} } @article{li20242d, title={2d matryoshka sentence embeddings}, author={Li, Xianming and Li, Zongxi and Li, Jing and Xie, Haoran and Li, Qing}, journal={arXiv preprint arXiv:2402.14776}, year={2024} }" +} \ No newline at end of file diff --git a/data/model_data_json/dariolopez_roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn.json b/data/model_data_json/dariolopez_roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn.json new file mode 100644 index 0000000000000000000000000000000000000000..79bc4c33f80a3afc9777cb4fb88efe41349d6b10 --- /dev/null +++ b/data/model_data_json/dariolopez_roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn.json @@ -0,0 +1,19 @@ +{ + "model_id": "dariolopez/roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn", + "downloads": 274718, + "tags": [ + "sentence-transformers", + "pytorch", + "roberta", + "sentence-similarity", + "es", + "dataset:IIC/ms_marco_es", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - es library_name: sentence-transformers pipeline_tag: sentence-similarity datasets: - IIC/ms_marco_es --- # Model Description This is a sentence-transformers model: It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. # How to use Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: # Training The trained model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne focused on question/answer, using the MS-MARCO dataset translated into Spanish (query - positive - negative - negative - negative - negative) for training. ## Features * Base Model * Config used to train * Dataset: IIC/ms_marco_es (query - positive - negative - negative - negative - negative) * Loss: MultipleNegativesRankingLoss ## Config ## Source code to train # Considerations for Using the Model The model is designed for the Spanish language, specially focused on Question/Answer. ## Max input length By default, input text longer than 512 word pieces is truncated.
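The stripped usage snippet presumably follows the standard sentence-transformers pattern; a minimal semantic-search sketch (the query and documents are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("dariolopez/roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn")

# Embed a query and candidate documents, then rank by cosine similarity.
query_emb = model.encode("¿Cuál es la capital de España?")
doc_embs = model.encode([
    "Madrid es la capital de España.",
    "El Amazonas es el río más caudaloso del mundo.",
])
print(util.cos_sim(query_emb, doc_embs))  # higher score = more relevant document
```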
# Additional Information ## Licensing This work is licensed under the Apache License, Version 2.0", + "model_explanation_gemini": "A Spanish-language sentence-transformers model fine-tuned for question/answer tasks, mapping text to 768-dimensional vectors for semantic search and clustering using the MS-MARCO dataset." +} \ No newline at end of file diff --git a/data/model_data_json/dataautogpt3_ProteusV0.3.json b/data/model_data_json/dataautogpt3_ProteusV0.3.json new file mode 100644 index 0000000000000000000000000000000000000000..837a01080179dcdcdef6f13f2a410d7fdff39f19 --- /dev/null +++ b/data/model_data_json/dataautogpt3_ProteusV0.3.json @@ -0,0 +1,15 @@ +{ + "model_id": "dataautogpt3/ProteusV0.3", + "downloads": 93349, + "tags": [ + "diffusers", + "text-to-image", + "license:gpl-3.0", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- pipeline_tag: text-to-image widget: - text: >- Anime full body portrait of a swordsman holding his weapon in front of him. He is facing the camera with a fierce look on his face. Anime key visual (best quality, HD, ~+~aesthetic~+~:1.2) output: url: upscaled_image.png - text: >- spacious,circular underground room,{dirtied and bloodied white tiles},amalgamation,flesh,plastic,dark fabric,core,pulsating heart,limbs,human-like arms,twisted angelic wings,arms,covered in skin,feathers,scales,undulate slowly,unseen current,convulsing,head area,chaotic,mass of eyes,mouths,no human features,smaller forms,cherubs,demons,golden wires,surround,holy light,tv static effect,golden glow,shadows,terrifying essence,overwhelming presence,nightmarish,landscape,sparse,cavernous,eerie,dynamic,motion,striking,awe-inspiring,nightmarish,nightmarish,nightmare,horrifying,bio-mechanical,body horror,amalgamation output: url: 2.png - text: >- A robot holding a sign saying 'The Application did not respond' in red colors output: url: 3.png - text: >- A photograph of Hughyen in his early twenties, (an inspiring artist whose art focuses on glitching images and vaporwave color gradients with unexpected conflicting compositions:0.5) output: url: 4.png - text: >- Anime mugshot of a tough woman. She is holding a prison sign that reads \"Proteus\". Her face is censored. Anime key visual (best quality, HD, ~+~aesthetic~+~:1.2) output: url: 7.png - text: >- Glitch art. 1980s anime, vintage, analogue horror. ((static and noise)), chromatic aberration output: url: 5.png - text: >- Masterpiece, glitch, holy holy holy, fog, by DarkIncursio output: url: 6.png license: gpl-3.0 --- ## ProteusV0.3: The Anime Update Proteus V0.3 has been advanced with an additional 200,000 anime-related images, further refined by a selection of 15,000 aesthetically pleasing images, enhancing its lighting effects significantly. This upgrade preserves its understanding of prompts and maintains its photorealistic and stylistic capabilities without suffering from catastrophic forgetting. ## Proteus Proteus serves as a sophisticated enhancement over OpenDalleV1.1, leveraging its core functionalities to deliver superior outcomes. Key areas of advancement include heightened responsiveness to prompts and augmented creative capacities. To achieve this, it was fine-tuned using approximately 220,000 GPTV captioned images from copyright-free stock images (with some anime included), which were then normalized.
Additionally, DPO (Direct Preference Optimization) was employed through a collection of 10,000 carefully selected high-quality, AI-generated image pairs. In pursuit of optimal performance, numerous LORA (Low-Rank Adaptation) models are trained independently before being selectively incorporated into the principal model via dynamic application methods. These techniques involve targeting particular segments within the model while avoiding interference with other areas during the learning phase. Consequently, Proteus exhibits marked improvements in portraying intricate facial characteristics and lifelike skin textures, all while sustaining commendable proficiency across various aesthetic domains, notably surrealism, anime, and cartoon-style visualizations. ## Settings for ProteusV0.3 Use these settings for the best results with ProteusV0.3: CFG Scale: Use a CFG scale of 7 to 8. Steps: 20 to 60 (more steps give more detail; 20 steps give faster results). Sampler: DPM++ 2M SDE Scheduler: Karras Resolution: 1280x1280 or 1024x1024 Please also consider using these keywords to improve your prompts: best quality, HD, . If you are having trouble coming up with prompts, you can use this GPT I put together to help you refine the prompt. ## Use it with 🧨 diffusers Please support the work I do by donating to me on: or following me on", + "model_explanation_gemini": "Generates high-quality, detailed anime and photorealistic images from text prompts with enhanced lighting and aesthetic capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/datalab-to_surya-alpha.json b/data/model_data_json/datalab-to_surya-alpha.json new file mode 100644 index 0000000000000000000000000000000000000000..916a1f766cf73760335308cbe8328dfc9f754c68 --- /dev/null +++ b/data/model_data_json/datalab-to_surya-alpha.json @@ -0,0 +1,14 @@ +{ + "model_id": "datalab-to/surya-alpha", + "downloads": 159913, + "tags": [ + "transformers", + "safetensors", + "surya-multimodal-foundation", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: cc-by-nc-sa-4.0 --- Alpha model for surya.", + "model_explanation_gemini": "Alpha model for surya."
+} \ No newline at end of file diff --git a/data/model_data_json/datalab-to_surya_layout0.json b/data/model_data_json/datalab-to_surya_layout0.json new file mode 100644 index 0000000000000000000000000000000000000000..e3435a98a0c16ec4615cc6e062b34978a0792564 --- /dev/null +++ b/data/model_data_json/datalab-to_surya_layout0.json @@ -0,0 +1,13 @@ +{ + "model_id": "datalab-to/surya_layout0", + "downloads": 81100, + "tags": [ + "transformers", + "safetensors", + "vision-encoder-decoder", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: cc-by-nc-sa-4.0 --- Layout model for surya" +} \ No newline at end of file diff --git a/data/model_data_json/dbmdz_bert-base-italian-xxl-uncased.json b/data/model_data_json/dbmdz_bert-base-italian-xxl-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..f59b2147ca358aa8a218646f07d320f59aefa270 --- /dev/null +++ b/data/model_data_json/dbmdz_bert-base-italian-xxl-uncased.json @@ -0,0 +1,20 @@ +{ + "model_id": "dbmdz/bert-base-italian-xxl-uncased", + "downloads": 80990, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "fill-mask", + "it", + "dataset:wikipedia", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: it license: mit datasets: - wikipedia --- # 🤗 + 📚 dbmdz BERT and ELECTRA models In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State Library open sources Italian BERT and ELECTRA models 🎉 # Italian BERT The source data for the Italian BERT model consists of a recent Wikipedia dump and various texts from the OPUS corpora collection. The final training corpus has a size of 13GB and 2,050,057,573 tokens. For sentence splitting, we use NLTK (faster compared to spacy). Our cased and uncased models are trained with an initial sequence length of 512 subwords for ~2-3M steps. For the XXL Italian models, we use the same training data from OPUS and extend it with data from the Italian part of the OSCAR corpus. Thus, the final training corpus has a size of 81GB and 13,138,379,147 tokens. Note: Unfortunately, a wrong vocab size was used when training the XXL models. This explains the mismatch of the \"real\" vocab size of 31102, compared to the vocab size specified in . However, the model is working and all evaluations were done under those circumstances. See this issue for more information. The Italian ELECTRA model was trained on the \"XXL\" corpus for 1M steps in total using a batch size of 128. We pretty much follow the ELECTRA training procedure as used for BERTurk. ## Model weights Currently only PyTorch-Transformers compatible weights are available. If you need access to TensorFlow checkpoints, please raise an issue! | Model | Downloads | ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | | []( • []( • []( | | []( • []( • []( | | []( • []( • []( | | []( • []( • []( | | []( • []( • []( | | []( • []( • []( ## Results For results on downstream tasks like NER or PoS tagging, please refer to this repository. ## Usage With Transformers >= 2.3 our Italian BERT models can be loaded like: To load the (recommended) Italian XXL BERT models, just use: To load the Italian XXL ELECTRA model (discriminator), just use: # Huggingface model hub All models are available on the Huggingface model hub.
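The loading snippets in this card were stripped; a minimal sketch with plain `transformers` (the masked example sentence is illustrative):

```python
from transformers import AutoModel, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-uncased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-uncased")

# The checkpoint can also back a fill-mask pipeline directly.
fill_mask = pipeline("fill-mask", model="dbmdz/bert-base-italian-xxl-uncased")
print(fill_mask("Umberto Eco è stato un grande [MASK]."))
```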
# Contact (Bugs, Feedback, Contribution and more) For questions about our BERT/ELECTRA models just open an issue here 🤗 # Acknowledgments Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️ Thanks to the generous support from the Hugging Face team, it is possible to download both cased and uncased models from their S3 storage 🤗" +} \ No newline at end of file diff --git a/data/model_data_json/dbmdz_bert-base-turkish-128k-uncased.json b/data/model_data_json/dbmdz_bert-base-turkish-128k-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..d74d9e8e9d2da924e900e49671c857055dbb018c --- /dev/null +++ b/data/model_data_json/dbmdz_bert-base-turkish-128k-uncased.json @@ -0,0 +1,18 @@ +{ + "model_id": "dbmdz/bert-base-turkish-128k-uncased", + "downloads": 107161, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "tr", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: tr license: mit --- # 🤗 + 📚 dbmdz Turkish BERT model In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State Library open sources an uncased model for Turkish 🎉 # 🇹🇷 BERTurk BERTurk is a community-driven uncased BERT model for Turkish. Some datasets used for pretraining and evaluation are contributed by the awesome Turkish NLP community, as well as the decision for the model name: BERTurk. ## Stats The current version of the model is trained on a filtered and sentence-segmented version of the Turkish OSCAR corpus, a recent Wikipedia dump, various OPUS corpora and a special corpus provided by Kemal Oflazer. The final training corpus has a size of 35GB and 4,404,976,662 tokens. Thanks to Google's TensorFlow Research Cloud (TFRC) we could train an uncased model on a TPU v3-8 for 2M steps. For this model we use a vocab size of 128k. ## Model weights Currently only PyTorch-Transformers compatible weights are available. If you need access to TensorFlow checkpoints, please raise an issue! | Model | Downloads | -------------------------------------- | --------------------------------------------------------------------------------------------------------------- | | []( • []( • []( ## Usage With Transformers >= 2.3 our BERTurk uncased model can be loaded like: ## Results For results on PoS tagging or NER tasks, please refer to this repository. # Huggingface model hub All models are available on the Huggingface model hub. # Contact (Bugs, Feedback, Contribution and more) For questions about our BERT models just open an issue here 🤗 # Acknowledgments Thanks to Kemal Oflazer for providing us additional large corpora for Turkish. Many thanks to Reyyan Yeniterzi for providing us the Turkish NER dataset for evaluation. Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️ Thanks to the generous support from the Hugging Face team, it is possible to download both cased and uncased models from their S3 storage 🤗", + "model_explanation_gemini": "An uncased BERT model for Turkish NLP tasks, pretrained on a diverse 35GB Turkish corpus including OSCAR, Wikipedia, and OPUS datasets, with a 128k vocabulary size."
+} \ No newline at end of file diff --git a/data/model_data_json/dbmdz_bert-base-turkish-cased.json b/data/model_data_json/dbmdz_bert-base-turkish-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..85690354b871c8a75825972b97bc416a8141e19c --- /dev/null +++ b/data/model_data_json/dbmdz_bert-base-turkish-cased.json @@ -0,0 +1,18 @@ +{ + "model_id": "dbmdz/bert-base-turkish-cased", + "downloads": 95145, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "tr", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: tr license: mit --- # 🤗 + 📚 dbmdz Turkish BERT model In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State Library open sources a cased model for Turkish 🎉 # 🇹🇷 BERTurk BERTurk is a community-driven cased BERT model for Turkish. Some datasets used for pretraining and evaluation are contributed by the awesome Turkish NLP community, as well as the decision for the model name: BERTurk. ## Stats The current version of the model is trained on a filtered and sentence-segmented version of the Turkish OSCAR corpus, a recent Wikipedia dump, various OPUS corpora and a special corpus provided by Kemal Oflazer. The final training corpus has a size of 35GB and 4,404,976,662 tokens. Thanks to Google's TensorFlow Research Cloud (TFRC) we could train a cased model on a TPU v3-8 for 2M steps. ## Model weights Currently only PyTorch-Transformers compatible weights are available. If you need access to TensorFlow checkpoints, please raise an issue! | Model | Downloads | --------------------------------- | --------------------------------------------------------------------------------------------------------------- | | []( • []( • []( ## Usage With Transformers >= 2.3 our BERTurk cased model can be loaded like: ## Results For results on PoS tagging or NER tasks, please refer to this repository. # Huggingface model hub All models are available on the Huggingface model hub. # Contact (Bugs, Feedback, Contribution and more) For questions about our BERT models just open an issue here 🤗 # Acknowledgments Thanks to Kemal Oflazer for providing us additional large corpora for Turkish. Many thanks to Reyyan Yeniterzi for providing us the Turkish NER dataset for evaluation. Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️ Thanks to the generous support from the Hugging Face team, it is possible to download both cased and uncased models from their S3 storage 🤗", + "model_explanation_gemini": "A cased BERT model for Turkish, pretrained on diverse Turkish corpora for NLP tasks like PoS tagging and NER."
+} \ No newline at end of file diff --git a/data/model_data_json/dccuchile_bert-base-spanish-wwm-uncased.json b/data/model_data_json/dccuchile_bert-base-spanish-wwm-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..b7882fb5d65682e1c7ad4eeb9d64192c9d18a65a --- /dev/null +++ b/data/model_data_json/dccuchile_bert-base-spanish-wwm-uncased.json @@ -0,0 +1,26 @@ +{ + "model_id": "dccuchile/bert-base-spanish-wwm-uncased", + "downloads": 236920, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "masked-lm", + "es", + "arxiv:1904.09077", + "arxiv:1906.01502", + "arxiv:1812.10464", + "arxiv:1901.07291", + "arxiv:1904.02099", + "arxiv:1906.01569", + "arxiv:1908.11828", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - es tags: - masked-lm --- # BETO: Spanish BERT BETO is a BERT model trained on a big Spanish corpus. BETO is of size similar to a BERT-Base and was trained with the Whole Word Masking technique. Below you find Tensorflow and Pytorch checkpoints for the uncased and cased versions, as well as some results for Spanish benchmarks comparing BETO with Multilingual BERT as well as other (not BERT-based) models. ## Download | | | | | |-|:--------:|:-----:|:----:| |BETO uncased|tensorflow_weights | pytorch_weights | vocab, config | |BETO cased| tensorflow_weights | pytorch_weights | vocab, config | All models use a vocabulary of about 31k BPE subwords constructed using SentencePiece and were trained for 2M steps. ## Benchmarks The following table shows some BETO results in the Spanish version of every task. We compare BETO (cased and uncased) with the Best Multilingual BERT results that we found in the literature (as of October 2019). The table also shows some alternative methods for the same tasks (not necessarily BERT-based methods). References for all methods can be found here. |Task | BETO-cased | BETO-uncased | Best Multilingual BERT | Other results | |-------|--------------:|--------------:|--------------------------:|-------------------------------:| |POS | **98.97** | 98.44 | 97.10 [2] | 98.91 [6], 96.71 [3] | |NER-C | **88.43** | 82.67 | 87.38 [2] | 87.18 [3] | |MLDoc | 95.60 | **96.12** | 95.70 [2] | 88.75 [4] | |PAWS-X | 89.05 | 89.55 | 90.70 [8] | |XNLI | **82.01** | 80.15 | 78.50 [2] | 80.80 [5], 77.80 [1], 73.15 [4]| ## Example of use For further details on how to use BETO you can visit the 🤗Huggingface Transformers library, starting by the Quickstart section. BETO models can be accessed simply as []( and []( by using the Transformers library. An example of how to download and use the models on this page can be found in this colab notebook. (We will soon add a more detailed step-by-step tutorial in Spanish for newcomers 😉) ## Acknowledgments We thank Adereso for kindly providing support for training BETO-uncased, and the Millennium Institute for Foundational Research on Data that provided support for training BETO-cased. Also thanks to Google for helping us with the TensorFlow Research Cloud program. ## Citation Spanish Pre-Trained BERT Model and Evaluation Data To cite this resource in a publication please use the following: ## License Disclaimer The license CC BY 4.0 best describes our intentions for our work. However we are not sure that all the datasets used to train BETO have licenses compatible with CC BY 4.0 (especially for commercial use). Please use at your own discretion and verify that the licenses of the original text resources match your needs.
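The loading snippet referenced above was stripped; a minimal fill-mask sketch with `transformers` (the masked sentence is illustrative):

```python
from transformers import pipeline

# BETO is a masked-LM checkpoint, so the fill-mask pipeline applies directly.
fill_mask = pipeline("fill-mask", model="dccuchile/bert-base-spanish-wwm-uncased")
for prediction in fill_mask("Madrid es la [MASK] de España."):
    print(prediction["token_str"], prediction["score"])
```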
## References * [1] Original Multilingual BERT * [2] Multilingual BERT on \"Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT\" * [3] Multilingual BERT on \"How Multilingual is Multilingual BERT?\" * [4] LASER * [5] XLM (MLM+TLM) * [6] UDPipe on \"75 Languages, 1 Model: Parsing Universal Dependencies Universally\" * [7] Multilingual BERT on \"Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation\" * [8] Multilingual BERT on \"PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification\"", + "model_explanation_gemini": "BETO is a Spanish BERT-base model trained with Whole Word Masking for masked language modeling and natural language processing tasks in Spanish." +} \ No newline at end of file diff --git a/data/model_data_json/deepdml_faster-whisper-large-v3-turbo-ct2.json b/data/model_data_json/deepdml_faster-whisper-large-v3-turbo-ct2.json new file mode 100644 index 0000000000000000000000000000000000000000..53b259f19b9046447ff1ff6042fd1feb1ead96db --- /dev/null +++ b/data/model_data_json/deepdml_faster-whisper-large-v3-turbo-ct2.json @@ -0,0 +1,113 @@ +{ + "model_id": "deepdml/faster-whisper-large-v3-turbo-ct2", + "downloads": 250310, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "yue", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su - yue tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper large-v3 turbo model for CTranslate2 This repository contains the conversion of deepdml/whisper-large-v3-turbo to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts audio to text in multiple languages using the Whisper large-v3 turbo model optimized for CTranslate2 and faster-whisper." 
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-Coder-V2-Lite-Instruct.json b/data/model_data_json/deepseek-ai_DeepSeek-Coder-V2-Lite-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..986f2e429b3c60849115d047005e5dce1d1bde8d --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-Coder-V2-Lite-Instruct.json @@ -0,0 +1,20 @@ +{ + "model_id": "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", + "downloads": 136611, + "tags": [ + "transformers", + "safetensors", + "deepseek_v2", + "text-generation", + "conversational", + "custom_code", + "arxiv:2401.06066", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: deepseek-license license_link: LICENSE ---
\"DeepSeek-V2\"

API Platform | How to Use | License |

👁️

# DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ## 1. Introduction We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K.

In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found here. ## 2. Model Downloads We release DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework, which has activated parameters of only 2.4B and 21B respectively, including base and instruct models, to the public.

| **Model** | **#Total Params** | **#Active Params** | **Context Length** | **Download** | | :-----------------------------: | :---------------: | :----------------: | :----------------: | :----------------------------------------------------------: | | DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace | | DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace | | DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace | | DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |
## 3. Chat Website You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com ## 4. API Platform We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com, and you can pay as you go at an unbeatable price.
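A minimal sketch of calling that OpenAI-compatible endpoint with the official `openai` Python client; the base URL and the `deepseek-coder` model name are assumptions that should be verified against the current platform documentation:

```python
from openai import OpenAI

# Assumed endpoint and model name; verify against platform.deepseek.com docs.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-coder",
    messages=[{"role": "user", "content": "Write a binary search function in Python."}],
)
print(response.choices[0].message.content)
```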

## 5. How to run locally **Here, we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 80GB*8 GPUs are required.** ### Inference with Huggingface's Transformers You can directly employ Huggingface's Transformers for model inference. #### Code Completion #### Code Insertion #### Chat Completion The complete chat template can be found within the Hugging Face model repository. An example of the chat template is as below: You can also add an optional system message: ### Inference with vLLM (recommended) To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: ## 6. License This code repository is licensed under the MIT License. The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use. ## 7. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "An open-source Mixture-of-Experts code language model excelling in coding and mathematical reasoning tasks, supporting 338 programming languages with a 128K context length." +} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Llama-70B.json b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Llama-70B.json new file mode 100644 index 0000000000000000000000000000000000000000..3d3fd7fc910ad1d7f8672efaa09ff6f0754e08ea --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Llama-70B.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B", + "downloads": 196841, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "conversational", + "arxiv:2501.12948", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-R1

\"DeepSeek-V3\"

👁️

## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.**

## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | |DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly change their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the \"DeepThink\" button. We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM: You can also easily start a service using SGLang: ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting \"\\\\n\\n\\\") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend requiring the model to initiate its response with \"\\\\n\" at the beginning of every output.** ## 7. License This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "A distilled 70B-parameter model based on Llama, optimized for reasoning tasks through reinforcement learning and fine-tuning to enhance performance in math, code, and reasoning benchmarks."
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Llama-8B.json b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Llama-8B.json new file mode 100644 index 0000000000000000000000000000000000000000..fc9404c165492510c2ac6dc3c6afe2d469b78198 --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Llama-8B.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", + "downloads": 853747, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "conversational", + "arxiv:2501.12948", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-R1
\"DeepSeek-V3\"

👁️

## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.**

## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | |DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly change their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the \"DeepThink\" button. We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM: You can also easily start a service using SGLang: ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting \"\\\\n\\n\\\") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend requiring the model to initiate its response with \"\\\\n\" at the beginning of every output.** ## 7. License This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "A distilled 8B-parameter model optimized for reasoning tasks through reinforcement learning, achieving strong performance in math, code, and reasoning benchmarks."
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-1.5B.json b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-1.5B.json new file mode 100644 index 0000000000000000000000000000000000000000..d4256c78f404a692183a640303de91ffb7016003 --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-1.5B.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "downloads": 1711879, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "arxiv:2501.12948", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-R1
\"DeepSeek-V3\"


## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendations section.**

## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that the reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will help the research community distill better small models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on the Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the \"DeepThink\" button. We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM. You can also easily start a service using SGLang. ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting \"<think>\\n\\n</think>\") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend forcing the model to initiate its response with \"<think>\\n\" at the beginning of every output.** ## 7. License This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "A 1.5B-parameter model distilled from DeepSeek-R1's reasoning data via supervised fine-tuning, achieving strong performance on math, code, and reasoning benchmarks for its size."
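As a concrete illustration of the usage recommendations above (no system prompt, the \boxed{} directive, temperature 0.6 / top-p 0.95), a minimal Hugging Face Transformers sketch for this distill checkpoint might look as follows; this is an assumption-level example, not a snippet from the card itself.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Recommendation 2: no system prompt; everything goes in the user turn.
messages = [{"role": "user", "content": "Please reason step by step, and put "
             "your final answer within \\boxed{}. What is 3^5?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommendation 1: sample with temperature 0.6 and top-p 0.95.
out = model.generate(inputs, max_new_tokens=4096, do_sample=True,
                     temperature=0.6, top_p=0.95)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```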
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-14B.json b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-14B.json new file mode 100644 index 0000000000000000000000000000000000000000..bf41581cbf6ecf8a6f169194f92049643f259f02 --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-14B.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", + "downloads": 704007, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "arxiv:2501.12948", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-R1
\"DeepSeek-V3\"


## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendations section.**

## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that the reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will help the research community distill better small models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on the Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the \"DeepThink\" button. We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM. You can also easily start a service using SGLang. ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting \"<think>\\n\\n</think>\") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend forcing the model to initiate its response with \"<think>\\n\" at the beginning of every output.** ## 7. License This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "A 14B-parameter model distilled from DeepSeek-R1's reasoning data via supervised fine-tuning, achieving strong performance on math, code, and reasoning benchmarks while avoiding R1-Zero's repetition and readability issues."
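To illustrate the final recommendation above (forcing the response to open with "<think>\n"), one hedged approach is to render the chat template to a string and append the prefix before tokenizing. This is a sketch, not an official recipe; note that newer revisions of these tokenizers may already append the prefix in the chat template, in which case the extra step is redundant.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain why the sum of two odd numbers is even."}]

# Render the template to text, then force the "<think>\n" opening so the
# model cannot skip its reasoning block (per the recommendation above).
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
prompt += "<think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.95)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```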
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-32B.json b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-32B.json new file mode 100644 index 0000000000000000000000000000000000000000..74a520d798455e65986eaf63a173e6763ad0958e --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-32B.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", + "downloads": 1706272, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "arxiv:2501.12948", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-R1
\"DeepSeek-V3\"


## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendations section.**

## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that the reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will help the research community distill better small models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on the Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the \"DeepThink\" button. We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM. You can also easily start a service using SGLang. ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting \"<think>\\n\\n</think>\") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend forcing the model to initiate its response with \"<think>\\n\" at the beginning of every output.** ## 7. License This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "A 32B-parameter model distilled from DeepSeek-R1's reasoning data via supervised fine-tuning, outperforming OpenAI-o1-mini on various math, code, and reasoning benchmarks."
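Once a vLLM or SGLang server for this checkpoint is running, it exposes an OpenAI-compatible endpoint, so the standard `openai` client can be used. The localhost URL, default port, and dummy key below are assumptions about a typical local deployment, not values from this card.

```python
from openai import OpenAI

# Assumes a local OpenAI-compatible server (e.g., vLLM's default port 8000).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Please reason step by step, and put "
               "your final answer within \\boxed{}. How many primes are below 30?"}],
    temperature=0.6,  # recommendation 1
    top_p=0.95,
)
print(resp.choices[0].message.content)
```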
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-7B.json b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..437915a72834ad669bf16f91131bcf05039418af --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-R1-Distill-Qwen-7B.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", + "downloads": 800580, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "arxiv:2501.12948", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-R1
\"DeepSeek-V3\"


## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendations section.**

## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that the reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will help the research community distill better small models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on the Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1 (a small estimator sketch follows the distilled-model table below).
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
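The evaluation section estimates pass@1 from 64 sampled responses per query, and the table above also reports a cons@64 column. A small sketch of how such metrics are typically computed is shown below, assuming cons@64 denotes consensus by majority vote over the 64 samples (as described in the R1 report); the helper names are illustrative.

```python
from collections import Counter

def pass_at_1(correct_flags: list[bool]) -> float:
    """Mean correctness over k samples — the pass@1 estimate."""
    return sum(correct_flags) / len(correct_flags)

def cons_at_k(answers: list[str], reference: str) -> float:
    """Majority-vote accuracy over k samples: 1.0 if the most frequent
    extracted answer matches the reference, else 0.0."""
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return float(most_common_answer == reference)

# Toy example with k = 4 instead of 64:
answers = ["42", "42", "41", "42"]
print(pass_at_1([a == "42" for a in answers]))  # 0.75
print(cons_at_k(answers, "42"))                 # 1.0
```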
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the \"DeepThink\" button. We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM. You can also easily start a service using SGLang. ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting \"<think>\\n\\n</think>\") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend forcing the model to initiate its response with \"<think>\\n\" at the beginning of every output.** ## 7. License This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "Distills DeepSeek-R1's reasoning capabilities into the smaller Qwen2.5-Math-7B base via supervised fine-tuning, enhancing performance on math, code, and reasoning tasks."
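The card mentions SGLang as an alternative server but includes no command. A hedged sketch of launching it from Python follows; `--model-path` and `--tp` follow SGLang's documented launcher flags, but verify them against your installed version.

```python
import subprocess

# Launch an SGLang server for the 7B distill (flags are assumptions based on
# sglang's documented launcher; set --tp to your GPU count).
subprocess.run([
    "python3", "-m", "sglang.launch_server",
    "--model-path", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "--trust-remote-code",
    "--tp", "1",
])
```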
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-R1.json b/data/model_data_json/deepseek-ai_DeepSeek-R1.json new file mode 100644 index 0000000000000000000000000000000000000000..771136b516d52fb6098bb58bcb46fd692b4ecfac --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-R1.json @@ -0,0 +1,20 @@ +{ + "model_id": "deepseek-ai/DeepSeek-R1", + "downloads": 1443833, + "tags": [ + "transformers", + "safetensors", + "deepseek_v3", + "text-generation", + "conversational", + "custom_code", + "arxiv:2501.12948", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "fp8", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-R1
\"DeepSeek-V3\"


## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendations section.**

## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that the reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will help the research community distill better small models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on the Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the \"DeepThink\" button. We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM. You can also easily start a service using SGLang. ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting \"<think>\\n\\n</think>\") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend forcing the model to initiate its response with \"<think>\\n\" at the beginning of every output.** ## 7. License This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "DeepSeek-R1 is a reasoning-focused AI model trained via reinforcement learning and supervised fine-tuning to excel in math, code, and reasoning tasks, achieving performance comparable to leading models."
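Since the DeepSeek Platform exposes an OpenAI-compatible API, a minimal client sketch follows. The base URL and model identifier are assumptions to be checked against the platform documentation at platform.deepseek.com, not values stated in this card.

```python
from openai import OpenAI

# Assumed endpoint and model name for the hosted reasoning model; verify
# both against the official API docs before use.
client = OpenAI(base_url="https://api.deepseek.com", api_key="<YOUR_API_KEY>")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Please reason step by step, and put "
               "your final answer within \\boxed{}. What is 12 * 13?"}],
)
print(resp.choices[0].message.content)
```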
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-V2.json b/data/model_data_json/deepseek-ai_DeepSeek-V2.json new file mode 100644 index 0000000000000000000000000000000000000000..9bb4618e4c7707ee50ac51f1cd40b59f0c427657 --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-V2.json @@ -0,0 +1,21 @@ +{ + "model_id": "deepseek-ai/DeepSeek-V2", + "downloads": 160112, + "tags": [ + "transformers", + "safetensors", + "deepseek_v2", + "text-generation", + "conversational", + "custom_code", + "arxiv:2311.18743", + "arxiv:2405.04434", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: deepseek license_link: ---
\"DeepSeek-V2\"

Model Download | Evaluation Results | Model Architecture | API Platform | License | Citation


# DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model ## 1. Introduction Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times.

We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation. ## 2. Model Downloads
| **Model** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-V2 | 128k | 🤗 HuggingFace | | DeepSeek-V2-Chat (RL) | 128k | 🤗 HuggingFace |
Due to the constraints of Hugging Face, the open-source code currently runs slower than our internal codebase when executed on GPUs. To facilitate the efficient execution of our model, we offer a dedicated vLLM solution that optimizes performance for running our model effectively. ## 3. Evaluation Results ### Base Model #### Standard Benchmark
| **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek-V1 (Dense-67B)** | **DeepSeek-V2 (MoE-236B)** | |:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:| | **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 | | **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 | | **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 | | **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 | | **HumanEval** | Code | 48.2 | 53.1 | 45.1 | 48.8 | | **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 | | **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 | | **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |
For more evaluation details, such as few-shot settings and prompts, please check our paper. #### Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. The complete chat template can be found in `tokenizer_config.json`, located in the Hugging Face model repository. An example of the chat template is as follows: You can also add an optional system message: ### Inference with vLLM (recommended) To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: ## 8. License This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. DeepSeek-V2 series (including Base and Chat) supports commercial use. ## 9. Citation ## 10. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "DeepSeek-V2 is a high-performance, cost-efficient Mixture-of-Experts language model with 236B parameters (21B active per token), optimized for strong benchmark results and improved inference throughput." +} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-V3-0324.json b/data/model_data_json/deepseek-ai_DeepSeek-V3-0324.json new file mode 100644 index 0000000000000000000000000000000000000000..c0cafdf12de18b942ebe238a52641a97430e95b0 --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-V3-0324.json @@ -0,0 +1,21 @@ +{ + "model_id": "deepseek-ai/DeepSeek-V3-0324", + "downloads": 338455, + "tags": [ + "transformers", + "safetensors", + "deepseek_v3", + "text-generation", + "conversational", + "custom_code", + "arxiv:2412.19437", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "fp8", + "region:us" + ], + "description": "--- license: mit library_name: transformers --- # DeepSeek-V3-0324

\"DeepSeek-V3\"

## Features DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects. ### Reasoning Capabilities - Significant improvements in benchmark performance: - MMLU-Pro: 75.9 → 81.2 (+5.3) - GPQA: 59.1 → 68.4 (+9.3) - AIME: 39.6 → 59.4 (+19.8) - LiveCodeBench: 39.2 → 49.2 (+10.0) ### Front-End Web Development - Improved the executability of the code - More aesthetically pleasing web pages and game front-ends ### Chinese Writing Proficiency - Enhanced style and content quality: - Aligned with the R1 writing style - Better quality in medium-to-long-form writing - Feature Enhancements: - Improved multi-turn interactive rewriting - Optimized translation quality and letter writing ### Chinese Search Capabilities - Enhanced report analysis requests with more detailed outputs ### Function Calling Improvements - Increased accuracy in Function Calling, fixing issues from previous V3 versions --- ## Usage Recommendations ### System Prompt In the official DeepSeek web/app, we use the same system prompt with a specific date. For example, ### Temperature In our web and application environments, the temperature parameter $T_{model}$ is set to 0.3. Because many users use the default temperature of 1.0 in API calls, we have implemented an API temperature $T_{api}$ mapping mechanism that adjusts the input API temperature value of 1.0 to the most suitable model temperature setting of 0.3. $$ T_{model} = T_{api} \\times 0.3 \\quad (0 \\leq T_{api} \\leq 1) $$ $$ T_{model} = T_{api} - 0.7 \\quad (1 < T_{api} \\leq 2) $$ Thus, if you call V3 via the API, an API temperature of 1.0 corresponds to a model temperature of 0.3 (a small helper sketch follows below). ### Prompts for File Uploading and Web Search For file uploading, please follow the template to create prompts, where {file_name}, {file_content} and {question} are arguments. For Web Search, {search_results}, {cur_date}, and {question} are arguments. For Chinese queries, we use the prompt: For English queries, we use the prompt: ## How to Run Locally The model structure of DeepSeek-V3-0324 is exactly the same as DeepSeek-V3. Please visit the DeepSeek-V3 repo for more information about running this model locally. **This model supports features such as function calling, JSON output, and FIM completion. For instructions on how to construct prompts to use these features, please refer to the DeepSeek-V2.5 repo.** **NOTE: Hugging Face's Transformers has not been directly supported yet.** ## License This repository and the model weights are licensed under the MIT License. ## Citation ## Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "DeepSeek-V3-0324 is an advanced AI model with enhanced reasoning, Chinese writing, web development, and function calling capabilities, optimized for tasks like benchmark performance, interactive rewriting, and detailed report analysis."
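To make the temperature mapping above concrete, a small, hypothetical helper (the function name is ours, not part of any DeepSeek SDK):

```python
# Hypothetical helper mirroring the documented API-to-model temperature mapping.
def map_api_temperature(t_api: float) -> float:
    """Map an API temperature in [0, 2] to the corresponding model temperature."""
    if not 0.0 <= t_api <= 2.0:
        raise ValueError("API temperature must be in [0, 2]")
    if t_api <= 1.0:
        return t_api * 0.3  # T_model = T_api * 0.3 for 0 <= T_api <= 1
    return t_api - 0.7      # T_model = T_api - 0.7 for 1 < T_api <= 2

# The API default of 1.0 maps to the recommended model temperature of 0.3.
assert abs(map_api_temperature(1.0) - 0.3) < 1e-9
```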
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_DeepSeek-V3.json b/data/model_data_json/deepseek-ai_DeepSeek-V3.json new file mode 100644 index 0000000000000000000000000000000000000000..99c71c94cc6b429b8ebe6e8ebef546bf96a1ab7e --- /dev/null +++ b/data/model_data_json/deepseek-ai_DeepSeek-V3.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/DeepSeek-V3", + "downloads": 617773, + "tags": [ + "transformers", + "safetensors", + "deepseek_v3", + "text-generation", + "conversational", + "custom_code", + "arxiv:2412.19437", + "autotrain_compatible", + "endpoints_compatible", + "fp8", + "region:us" + ], + "description": "--- library_name: transformers ---
\"DeepSeek-V3\"


## 1. Introduction We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks.

## 2. Model Summary --- **Architecture: Innovative Load Balancing Strategy and Training Objective** - On top of the efficient architecture of DeepSeek-V2, we pioneer an auxiliary-loss-free strategy for load balancing, which minimizes the performance degradation that arises from encouraging load balancing. - We investigate a Multi-Token Prediction (MTP) objective and prove it beneficial to model performance. It can also be used for speculative decoding for inference acceleration. --- **Pre-Training: Towards Ultimate Training Efficiency** - We design an FP8 mixed precision training framework and, for the first time, validate the feasibility and effectiveness of FP8 training on an extremely large-scale model. - Through co-design of algorithms, frameworks, and hardware, we overcome the communication bottleneck in cross-node MoE training, nearly achieving full computation-communication overlap. This significantly enhances our training efficiency and reduces the training costs, enabling us to further scale up the model size without additional overhead. - At an economical cost of only 2.664M H800 GPU hours, we complete the pre-training of DeepSeek-V3 on 14.8T tokens, producing the currently strongest open-source base model. The subsequent training stages after pre-training require only 0.1M GPU hours. --- **Post-Training: Knowledge Distillation from DeepSeek-R1** - We introduce an innovative methodology to distill reasoning capabilities from the long-Chain-of-Thought (CoT) model, specifically from one of the DeepSeek R1 series models, into standard LLMs, particularly DeepSeek-V3. Our pipeline elegantly incorporates the verification and reflection patterns of R1 into DeepSeek-V3 and notably improves its reasoning performance. Meanwhile, we also maintain control over the output style and length of DeepSeek-V3. --- ## 3. Model Downloads
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-V3-Base | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-V3 | 671B | 37B | 128K | 🤗 HuggingFace |
**NOTE: The total size of DeepSeek-V3 models on HuggingFace is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.** To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: How to Run Locally. For developers looking to dive deeper, we recommend exploring README_WEIGHTS.md for details on the Main Model weights and the Multi-Token Prediction (MTP) Modules. Please note that MTP support is currently under active development within the community, and we welcome your contributions and feedback. ## 4. Evaluation Results ### Base Model #### Standard Benchmarks
| | Benchmark (Metric) | # Shots | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | DeepSeek-V3 | |---|-------------------|----------|--------|-------------|---------------|---------| | | Architecture | - | MoE | Dense | Dense | MoE | | | # Activated Params | - | 21B | 72B | 405B | 37B | | | # Total Params | - | 236B | 72B | 405B | 671B | | English | Pile-test (BPB) | - | 0.606 | 0.638 | **0.542** | 0.548 | | | BBH (EM) | 3-shot | 78.8 | 79.8 | 82.9 | **87.5** | | | MMLU (Acc.) | 5-shot | 78.4 | 85.0 | 84.4 | **87.1** | | | MMLU-Redux (Acc.) | 5-shot | 75.6 | 83.2 | 81.3 | **86.2** | | | MMLU-Pro (Acc.) | 5-shot | 51.4 | 58.3 | 52.8 | **64.4** | | | DROP (F1) | 3-shot | 80.4 | 80.6 | 86.0 | **89.0** | | | ARC-Easy (Acc.) | 25-shot | 97.6 | 98.4 | 98.4 | **98.9** | | | ARC-Challenge (Acc.) | 25-shot | 92.2 | 94.5 | **95.3** | **95.3** | | | HellaSwag (Acc.) | 10-shot | 87.1 | 84.8 | **89.2** | 88.9 | | | PIQA (Acc.) | 0-shot | 83.9 | 82.6 | **85.9** | 84.7 | | | WinoGrande (Acc.) | 5-shot | **86.3** | 82.3 | 85.2 | 84.9 | | | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | **74.2** | 67.1 | | | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | **56.8** | 51.3 | | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | **82.7** | **82.9** | | | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | **41.5** | 40.0 | | | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | **79.6** | | Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | **65.2** | | | MBPP (Pass@1) | 3-shot | 65.0 | 72.6 | 68.4 | **75.4** | | | LiveCodeBench-Base (Pass@1) | 3-shot | 11.6 | 12.9 | 15.5 | **19.4** | | | CRUXEval-I (Acc.) | 2-shot | 52.5 | 59.1 | 58.5 | **67.3** | | | CRUXEval-O (Acc.) | 2-shot | 49.8 | 59.9 | 59.9 | **69.8** | | Math | GSM8K (EM) | 8-shot | 81.6 | 88.3 | 83.5 | **89.3** | | | MATH (EM) | 4-shot | 43.4 | 54.4 | 49.0 | **61.6** | | | MGSM (EM) | 8-shot | 63.6 | 76.2 | 69.9 | **79.8** | | | CMath (EM) | 3-shot | 78.7 | 84.5 | 77.3 | **90.7** | | Chinese | CLUEWSC (EM) | 5-shot | 82.0 | 82.5 | **83.0** | 82.7 | | | C-Eval (Acc.) | 5-shot | 81.4 | 89.2 | 72.5 | **90.1** | | | CMMLU (Acc.) | 5-shot | 84.0 | **89.5** | 73.7 | 88.8 | | | CMRC (EM) | 1-shot | **77.4** | 75.8 | 76.0 | 76.3 | | | C3 (Acc.) | 0-shot | 77.4 | 76.7 | **79.7** | 78.6 | | | CCPM (Acc.) | 0-shot | **93.0** | 88.5 | 78.6 | 92.0 | | Multilingual | MMMLU-non-English (Acc.) | 5-shot | 64.0 | 74.8 | 73.8 | **79.4** |
Note: Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks. For more evaluation details, please check our paper. #### Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. ### 6.1 Inference with DeepSeek-Infer Demo Clone the repository, install the dependencies listed in `requirements.txt`, and download the model weights into a local `/path/to/DeepSeek-V3` folder. #### Model Weights Conversion Convert HuggingFace model weights to a specific format: #### Run Then you can chat with DeepSeek-V3: Or run batch inference on a given file: ### 6.2 Inference with SGLang (recommended) SGLang currently supports MLA optimizations, FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput performance among open-source frameworks. Notably, SGLang v0.4.1 fully supports running DeepSeek-V3 on both **NVIDIA and AMD GPUs**, making it a highly versatile and robust solution. Here are the launch instructions from the SGLang team: ### 6.3 Inference with LMDeploy (recommended) LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows. For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to here: ### 6.4 Inference with TRT-LLM (recommended) TensorRT-LLM now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom branch of TRTLLM specifically for DeepSeek-V3 support through the following link to experience the new features directly: ### 6.5 Inference with vLLM (recommended) vLLM v0.6.6 supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_, allowing you to run this model on multiple machines connected by a network. For detailed guidance, please refer to the vLLM instructions. Please feel free to follow the enhancement plan as well. ### 6.6 Recommended Inference Functionality with AMD GPUs In collaboration with the AMD team, we have achieved Day-One support for AMD GPUs using SGLang, with full compatibility for both FP8 and BF16 precision. For detailed guidance, please refer to the SGLang instructions. ### 6.7 Recommended Inference Functionality with Huawei Ascend NPUs The MindIE framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the instructions here. ## 7. License This code repository is licensed under the MIT License. The use of DeepSeek-V3 Base/Chat models is subject to the Model License. DeepSeek-V3 series (including Base and Chat) supports commercial use. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "DeepSeek-V3 is a high-performance, cost-efficient Mixture-of-Experts language model with 671B total parameters, designed for advanced natural language processing tasks through innovative load balancing, multi-token prediction, and efficient FP8 training."
+} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_Janus-Pro-7B.json b/data/model_data_json/deepseek-ai_Janus-Pro-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..3e5a486567040cb53cbe48345b3383f30b111293 --- /dev/null +++ b/data/model_data_json/deepseek-ai_Janus-Pro-7B.json @@ -0,0 +1,19 @@ +{ + "model_id": "deepseek-ai/Janus-Pro-7B", + "downloads": 85842, + "tags": [ + "transformers", + "pytorch", + "multi_modality", + "muiltimodal", + "text-to-image", + "unified-model", + "any-to-any", + "arxiv:2501.17811", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_name: deepseek license_link: LICENSE pipeline_tag: any-to-any library_name: transformers tags: - muiltimodal - text-to-image - unified-model --- ## 1. Introduction Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility. Janus-Pro surpasses previous unified model and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models. **Github Repository**
\"image\"
\"image\"
### 2. Model Summary Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus-Pro is constructed based on the DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base. For multimodal understanding, it uses the SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from here with a downsample rate of 16. ## 3. Quick Start Please refer to **Github Repository** ## 4. License This code repository is licensed under the MIT License. The use of Janus-Pro models is subject to DeepSeek Model License. ## 5. Citation ## 6. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "Janus-Pro is a unified multimodal model that decouples visual encoding for enhanced understanding and generation tasks, surpassing previous unified models and matching task-specific performance." +} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_deepseek-coder-1.3b-base.json b/data/model_data_json/deepseek-ai_deepseek-coder-1.3b-base.json new file mode 100644 index 0000000000000000000000000000000000000000..67d34240723923d2e91f00ec150d83b2a0f68387 --- /dev/null +++ b/data/model_data_json/deepseek-ai_deepseek-coder-1.3b-base.json @@ -0,0 +1,17 @@ +{ + "model_id": "deepseek-ai/deepseek-coder-1.3b-base", + "downloads": 88050, + "tags": [ + "transformers", + "pytorch", + "llama", + "text-generation", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: deepseek-license license_link: LICENSE ---

\"DeepSeek



### 1. Introduction of Deepseek Coder Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks. - **Massive Training Data**: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages. - **Highly Flexible & Scalable**: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements. - **Superior Model Performance**: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks. - **Advanced Code Completion Capabilities**: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks. ### 2. Model Summary deepseek-coder-1.3b-base is a 1.3B parameter model with Multi-Head Attention trained on 1 trillion tokens. - **Home Page:** DeepSeek - **Repository:** deepseek-ai/deepseek-coder - **Chat With DeepSeek Coder:** DeepSeek-Coder ### 3. How to Use Here are some examples of how to use our model (see the sketch below). #### 1) Code Completion #### 2) Code Insertion #### 3) Repository Level Code Completion ### 4. License This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use. See the LICENSE-MODEL for more details. ### 5. Contact If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.", + "model_explanation_gemini": "A 1.3B parameter code model trained for project-level code completion and infilling tasks, supporting multiple programming languages with a 16K context window."
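The card's code snippets were stripped in extraction; a minimal completion sketch with transformers (the prompt is illustrative):

```python
# Minimal sketch: left-to-right code completion with the base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "# write a quick sort algorithm\ndef quick_sort(arr):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```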

\"DeepSeek



### 1. Introduction of Deepseek Coder Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks. - **Massive Training Data**: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages. - **Highly Flexible & Scalable**: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements. - **Superior Model Performance**: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks. - **Advanced Code Completion Capabilities**: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks. ### 2. Model Summary deepseek-coder-1.3b-instruct is a 1.3B parameter model initialized from deepseek-coder-1.3b-base and fine-tuned on 2B tokens of instruction data. - **Home Page:** DeepSeek - **Repository:** deepseek-ai/deepseek-coder - **Chat With DeepSeek Coder:** DeepSeek-Coder ### 3. How to Use Here are some examples of how to use our model (see the sketch below). #### Chat Model Inference ### 4. License This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use. See the LICENSE-MODEL for more details. ### 5. Contact If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.", + "model_explanation_gemini": "A 1.3B parameter AI model fine-tuned for project-level code completion and infilling tasks, trained on extensive code and linguistic data to achieve state-of-the-art performance in programming assistance."
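A hedged chat-inference sketch, assuming the tokenizer ships a chat template (the generation settings are ours):

```python
# Minimal sketch: chat inference with the instruct model via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/deepseek-coder-1.3b-instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```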

\"DeepSeek



### 1. Introduction of Deepseek LLM Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community. ### 2. Model Summary is a 7B parameter model initialized from and fine-tuned on extra instruction data. - **Home Page:** DeepSeek - **Repository:** deepseek-ai/deepseek-LLM - **Chat With DeepSeek LLM:** DeepSeek-LLM ### 3. How to Use Here are some examples of how to use our model (see the sketch below). #### Chat Completion If you avoid using the provided function, you can also interact with our model following the sample template. Note that should be replaced by your input. **Note:** By default (), our tokenizer automatically adds a () before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including the system prompt in your input. ### 4. License This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use. See the LICENSE-MODEL for more details. ### 5. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "Deepseek-ai's DeepSeek-LLM-7B-Chat is a 7B-parameter multilingual (English and Chinese) chat model fine-tuned for conversational tasks, supporting commercial use." +} \ No newline at end of file diff --git a/data/model_data_json/deepseek-ai_deepseek-vl-1.3b-chat.json b/data/model_data_json/deepseek-ai_deepseek-vl-1.3b-chat.json new file mode 100644 index 0000000000000000000000000000000000000000..eb52b8c9b06e34c6908f042cbb7d778fd457854c --- /dev/null +++ b/data/model_data_json/deepseek-ai_deepseek-vl-1.3b-chat.json @@ -0,0 +1,16 @@ +{ + "model_id": "deepseek-ai/deepseek-vl-1.3b-chat", + "downloads": 100963, + "tags": [ + "transformers", + "safetensors", + "multi_modality", + "image-text-to-text", + "arxiv:2403.05525", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: deepseek license_link: LICENSE pipeline_tag: image-text-to-text --- ## 1. Introduction Introducing DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. DeepSeek-VL: Towards Real-World Vision-Language Understanding **Github Repository** Haoyu Lu*, Wen Liu*, Bo Zhang**, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (*Equal Contribution, **Project Lead). DeepSeek-VL supports commercial use. ## 5. Citation ## 6. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "DeepSeek-VL is an open-source vision-language model for real-world multimodal understanding tasks like processing diagrams, web pages, formula recognition, and natural images."
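For the deepseek-llm-7b-chat card above (its snippets were also stripped), a hedged chat-completion sketch; the dtype and device settings are assumptions:

```python
# Minimal sketch: chat completion with deepseek-llm-7b-chat via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16, device_map="auto")

# Per the card, no system prompt is included with this version of the model.
messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```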
+} \ No newline at end of file diff --git a/data/model_data_json/deepset_bert-large-uncased-whole-word-masking-squad2.json b/data/model_data_json/deepset_bert-large-uncased-whole-word-masking-squad2.json new file mode 100644 index 0000000000000000000000000000000000000000..823d704aee74c5ebc45e51c065c487ed3ee24c28 --- /dev/null +++ b/data/model_data_json/deepset_bert-large-uncased-whole-word-masking-squad2.json @@ -0,0 +1,21 @@ +{ + "model_id": "deepset/bert-large-uncased-whole-word-masking-squad2", + "downloads": 204585, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "question-answering", + "en", + "dataset:squad_v2", + "license:cc-by-4.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: cc-by-4.0 datasets: - squad_v2 model-index: - name: deepset/bert-large-uncased-whole-word-masking-squad2 results: - task: type: question-answering name: Question Answering dataset: name: squad_v2 type: squad_v2 config: squad_v2 split: validation metrics: - type: exact_match value: 80.8846 name: Exact Match verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiY2E5ZGNkY2ExZWViZGEwNWE3OGRmMWM2ZmE4ZDU4ZDQ1OGM3ZWE0NTVmZjFmYmZjZmJmNjJmYTc3NTM3OTk3OSIsInZlcnNpb24iOjF9.aSblF4ywh1fnHHrN6UGL392R5KLaH3FCKQlpiXo_EdQ4XXEAENUCjYm9HWDiFsgfSENL35GkbSyz_GAhnefsAQ - type: f1 value: 83.8765 name: F1 verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGFlNmEzMTk2NjRkNTI3ZTk3ZTU1NWNlYzIyN2E0ZDFlNDA2ZjYwZWJlNThkMmRmMmE0YzcwYjIyZDM5NmRiMCIsInZlcnNpb24iOjF9.-rc2_Bsp_B26-o12MFYuAU0Ad2Hg9PDx7Preuk27WlhYJDeKeEr32CW8LLANQABR3Mhw2x8uTYkEUrSDMxxLBw - task: type: question-answering name: Question Answering dataset: name: squad type: squad config: plain_text split: validation metrics: - type: exact_match value: 85.904 name: Exact Match - type: f1 value: 92.586 name: F1 - task: type: question-answering name: Question Answering dataset: name: adversarial_qa type: adversarial_qa config: adversarialQA split: validation metrics: - type: exact_match value: 28.233 name: Exact Match - type: f1 value: 41.170 name: F1 - task: type: question-answering name: Question Answering dataset: name: squad_adversarial type: squad_adversarial config: AddOneSent split: validation metrics: - type: exact_match value: 78.064 name: Exact Match - type: f1 value: 83.591 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts amazon type: squadshifts config: amazon split: test metrics: - type: exact_match value: 65.615 name: Exact Match - type: f1 value: 80.733 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts new_wiki type: squadshifts config: new_wiki split: test metrics: - type: exact_match value: 81.570 name: Exact Match - type: f1 value: 91.199 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts nyt type: squadshifts config: nyt split: test metrics: - type: exact_match value: 83.279 name: Exact Match - type: f1 value: 91.090 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts reddit type: squadshifts config: reddit split: test metrics: - type: exact_match value: 69.305 name: Exact Match - type: f1 value: 82.405 name: F1 --- # bert-large-uncased-whole-word-masking-squad2 for Extractive QA This is a bert-large model, fine-tuned using the SQuAD2.0 dataset for the task of question answering.
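As a quick illustration of the extractive QA task this card describes (the question and context strings are ours):

```python
# Minimal sketch: extractive question answering with the transformers pipeline.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="deepset/bert-large-uncased-whole-word-masking-squad2",
)
result = qa(
    question="What is the capital of France?",
    context="Paris is the capital and most populous city of France.",
)
print(result["answer"], result["score"])  # e.g. 'Paris' plus a confidence score
```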
## Overview **Language model:** bert-large **Language:** English **Downstream-task:** Extractive QA **Training data:** SQuAD 2.0 **Eval data:** SQuAD 2.0 **Code:** See an example extractive QA pipeline built with Haystack ## Usage ### In Haystack Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents. To load and run the model with Haystack: For a complete example with an extractive question answering pipeline that scales over many documents, check out the corresponding Haystack tutorial. ### In Transformers ## About us
\"\"
\"\"
deepset is the company behind the production-ready open-source AI framework Haystack. Some of our other work: - Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\") - German BERT, GermanQuAD and GermanDPR, German embedding model - deepset Cloud, deepset Studio ## Get in touch and join the Haystack community

For more info on Haystack, visit our

Twitter | LinkedIn | Discord | GitHub Discussions | Website | YouTube By the way: we're hiring!", + "model_explanation_gemini": "A BERT-large model fine-tuned on SQuAD2.0 for extractive question answering, designed to answer questions based on given text passages." +} \ No newline at end of file diff --git a/data/model_data_json/deepset_deberta-v3-base-injection.json b/data/model_data_json/deepset_deberta-v3-base-injection.json new file mode 100644 index 0000000000000000000000000000000000000000..ede23f2fe56cbc795159564e08fbaca1c43f82ec --- /dev/null +++ b/data/model_data_json/deepset_deberta-v3-base-injection.json @@ -0,0 +1,23 @@ +{ + "model_id": "deepset/deberta-v3-base-injection", + "downloads": 132132, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "deberta-v2", + "text-classification", + "generated_from_trainer", + "en", + "de", + "dataset:deepset/prompt-injections", + "base_model:microsoft/deberta-v3-base", + "base_model:finetune:microsoft/deberta-v3-base", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - generated_from_trainer metrics: - accuracy base_model: microsoft/deberta-v3-base model-index: - name: deberta-v3-base-injection results: [] datasets: - deepset/prompt-injections language: - en - de --- # deberta-v3-base-injection This model is a fine-tuned version of microsoft/deberta-v3-base on the prompt-injections dataset. It achieves the following results on the evaluation set: - Loss: 0.0673 - Accuracy: 0.9914 ## Model description This model detects prompt injection attempts and classifies them as \"INJECTION\". Legitimate requests are classified as \"LEGIT\". The dataset assumes that legitimate requests are either all sorts of questions or keyword searches. ## Intended uses & limitations If you are using this model to secure your system and it is overly \"trigger-happy\", classifying legitimate requests as injections, consider collecting legitimate examples and retraining the model with the prompt-injections dataset. ## Training and evaluation data Based on the prompt-injections dataset. ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 3 ### Training results | Training Loss | Epoch | Step | Validation Loss | Accuracy | |:-------------:|:-----:|:----:|:---------------:|:--------:| | No log | 1.0 | 69 | 0.2353 | 0.9741 | | No log | 2.0 | 138 | 0.0894 | 0.9741 | | No log | 3.0 | 207 | 0.0673 | 0.9914 | ### Framework versions - Transformers 4.29.1 - Pytorch 2.0.0+cu118 - Datasets 2.12.0 - Tokenizers 0.13.3 ## About us
\"\"
\"\"
deepset is the company behind the production-ready open-source AI framework Haystack. Some of our other work: - Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\") - German BERT, GermanQuAD and GermanDPR, German embedding model - deepset Cloud, deepset Studio ## Get in touch and join the Haystack community

For more info on Haystack, visit our

Twitter | LinkedIn | Discord | GitHub Discussions | Website | YouTube By the way: we're hiring!", + "model_explanation_gemini": "Detects and classifies prompt injection attempts as \"INJECTION\" or legitimate requests as \"LEGIT\"." +} \ No newline at end of file diff --git a/data/model_data_json/deepset_roberta-base-squad2.json b/data/model_data_json/deepset_roberta-base-squad2.json new file mode 100644 index 0000000000000000000000000000000000000000..d569115db0664b074dae234cfd206d7ac64dc535 --- /dev/null +++ b/data/model_data_json/deepset_roberta-base-squad2.json @@ -0,0 +1,24 @@ +{ + "model_id": "deepset/roberta-base-squad2", + "downloads": 1675126, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "roberta", + "question-answering", + "en", + "dataset:squad_v2", + "base_model:FacebookAI/roberta-base", + "base_model:finetune:FacebookAI/roberta-base", + "license:cc-by-4.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: cc-by-4.0 datasets: - squad_v2 model-index: - name: deepset/roberta-base-squad2 results: - task: type: question-answering name: Question Answering dataset: name: squad_v2 type: squad_v2 config: squad_v2 split: validation metrics: - type: exact_match value: 79.9309 name: Exact Match verified: true verifyToken: >- eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDhhNjg5YzNiZGQ1YTIyYTAwZGUwOWEzZTRiYzdjM2QzYjA3ZTUxNDM1NjE1MTUyMjE1MGY1YzEzMjRjYzVjYiIsInZlcnNpb24iOjF9.EH5JJo8EEFwU7osPz3s7qanw_tigeCFhCXjSfyN0Y1nWVnSfulSxIk_DbAEI5iE80V4EKLyp5-mYFodWvL2KDA - type: f1 value: 82.9501 name: F1 verified: true verifyToken: >- eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjk5ZDYwOGQyNjNkMWI0OTE4YzRmOTlkY2JjNjQ0YTZkNTMzMzNkYTA0MDFmNmI3NjA3NjNlMjhiMDQ2ZjJjNSIsInZlcnNpb24iOjF9.DDm0LNTkdLbGsue58bg1aH_s67KfbcmkvL-6ZiI2s8IoxhHJMSf29H_uV2YLyevwx900t-MwTVOW3qfFnMMEAQ - type: total value: 11869 name: total verified: true verifyToken: >- eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMGFkMmI2ODM0NmY5NGNkNmUxYWViOWYxZDNkY2EzYWFmOWI4N2VhYzY5MGEzMTVhOTU4Zjc4YWViOGNjOWJjMCIsInZlcnNpb24iOjF9.fexrU1icJK5_MiifBtZWkeUvpmFISqBLDXSQJ8E6UnrRof-7cU0s4tX_dIsauHWtUpIHMPZCf5dlMWQKXZuAAA - task: type: question-answering name: Question Answering dataset: name: squad type: squad config: plain_text split: validation metrics: - type: exact_match value: 85.289 name: Exact Match - type: f1 value: 91.841 name: F1 - task: type: question-answering name: Question Answering dataset: name: adversarial_qa type: adversarial_qa config: adversarialQA split: validation metrics: - type: exact_match value: 29.5 name: Exact Match - type: f1 value: 40.367 name: F1 - task: type: question-answering name: Question Answering dataset: name: squad_adversarial type: squad_adversarial config: AddOneSent split: validation metrics: - type: exact_match value: 78.567 name: Exact Match - type: f1 value: 84.469 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts amazon type: squadshifts config: amazon split: test metrics: - type: exact_match value: 69.924 name: Exact Match - type: f1 value: 83.284 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts new_wiki type: squadshifts config: new_wiki split: test metrics: - type: exact_match value: 81.204 name: Exact Match - type: f1 value: 90.595 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts nyt type: squadshifts config: nyt split: test metrics: - type: exact_match value: 
82.931 name: Exact Match - type: f1 value: 90.756 name: F1 - task: type: question-answering name: Question Answering dataset: name: squadshifts reddit type: squadshifts config: reddit split: test metrics: - type: exact_match value: 71.55 name: Exact Match - type: f1 value: 82.939 name: F1 base_model: - FacebookAI/roberta-base --- # roberta-base for Extractive QA This is the roberta-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering. We have also released a distilled version of this model called deepset/tinyroberta-squad2. It has a comparable prediction quality and runs at twice the speed of deepset/roberta-base-squad2. ## Overview **Language model:** roberta-base **Language:** English **Downstream-task:** Extractive QA **Training data:** SQuAD 2.0 **Eval data:** SQuAD 2.0 **Code:** See an example extractive QA pipeline built with Haystack **Infrastructure**: 4x Tesla v100 ## Hyperparameters ## Usage ### In Haystack Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents. To load and run the model with Haystack: For a complete example with an extractive question answering pipeline that scales over many documents, check out the corresponding Haystack tutorial. ### In Transformers ## Performance Evaluated on the SQuAD 2.0 dev set with the official eval script. ## Authors **Branden Chan:** branden.chan@deepset.ai **Timo Möller:** timo.moeller@deepset.ai **Malte Pietsch:** malte.pietsch@deepset.ai **Tanay Soni:** tanay.soni@deepset.ai ## About us
\"\"
\"\"
deepset is the company behind the production-ready open-source AI framework Haystack. Some of our other work: - Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\") - German BERT, GermanQuAD and GermanDPR, German embedding model - deepset Cloud - deepset Studio ## Get in touch and join the Haystack community

For more info on Haystack, visit our

Twitter | LinkedIn | Discord | GitHub Discussions | Website | YouTube By the way: we're hiring!", + "model_explanation_gemini": "Answers questions by extracting relevant text segments from given documents, trained on the SQuAD2.0 dataset including unanswerable questions." +} \ No newline at end of file diff --git a/data/model_data_json/deepvk_USER-bge-m3.json b/data/model_data_json/deepvk_USER-bge-m3.json new file mode 100644 index 0000000000000000000000000000000000000000..821b3e515dcee0eeff6f93b854409ac45853ed0d --- /dev/null +++ b/data/model_data_json/deepvk_USER-bge-m3.json @@ -0,0 +1,31 @@ +{ + "model_id": "deepvk/USER-bge-m3", + "downloads": 342953, + "tags": [ + "sentence-transformers", + "safetensors", + "xlm-roberta", + "sentence-similarity", + "feature-extraction", + "ru", + "dataset:deepvk/ru-HNP", + "dataset:deepvk/ru-WANLI", + "dataset:Shitao/bge-m3-data", + "dataset:RussianNLP/russian_super_glue", + "dataset:reciTAL/mlsum", + "dataset:Milana/russian_keywords", + "dataset:IlyaGusev/gazeta", + "dataset:d0rj/gsm8k-ru", + "dataset:bragovo/dsum_ru", + "dataset:CarlBrendt/Summ_Dialog_News", + "arxiv:2311.13534", + "arxiv:2309.12871", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru library_name: sentence-transformers tags: - sentence-transformers - sentence-similarity - feature-extraction widget: [] pipeline_tag: sentence-similarity license: apache-2.0 datasets: - deepvk/ru-HNP - deepvk/ru-WANLI - Shitao/bge-m3-data - RussianNLP/russian_super_glue - reciTAL/mlsum - Milana/russian_keywords - IlyaGusev/gazeta - d0rj/gsm8k-ru - bragovo/dsum_ru - CarlBrendt/Summ_Dialog_News --- # USER-bge-m3 **U**niversal **S**entence **E**ncoder for **R**ussian (USER) is a sentence-transformer model for extracting embeddings exclusively for Russian language. It maps sentences & paragraphs to a 1024 dimensional dense vector space and can be used for tasks like clustering or semantic search. This model is initialized from []( which is shrinked version of []( model and trained to work mainly with the Russian language. Its quality on other languages was not evaluated. ## Usage Using this model becomes easy when you have []( installed: Then you can use the model like this: However, you can use model directly with []( Also, you can use native FlagEmbedding library for evaluation. Usage is described in model card. # Training Details We follow the []( model training algorithm, with several changes as we use different backbone. **Initialization:** []( – shrinked version of []( to support only Russian and English tokens. **Fine-tuning:** Supervised fine-tuning two different models based on data symmetry and then merging via []( 1. Since we split the data, we could additionally apply the AnglE loss to the symmetric model, which enhances performance on symmetric tasks. 2. Finally, we added the original model to the two obtained models to prevent catastrophic forgetting, tuning the weights for the merger using to produce the final model, **USER-bge-m3**. 
### Dataset During model development, we additionally collected two datasets: []( and []( | Symmetric Dataset | Size | Asymmetric Dataset | Size | |-------------------|-------|--------------------|------| | **AllNLI** | 282 644 | **MIRACL** | 10 000 | | MedNLI | 3 699 | MLDR | 1 864 | | RCB | 392 | Lenta | 185 972 | | Terra | 1 359 | Mlsum | 51 112 | | Tapaco | 91 240 | Mr-TyDi | 536 600 | | **deepvk/ru-WANLI** | 35 455 | Panorama | 11 024 | | **deepvk/ru-HNP** | 500 000 | PravoIsrael | 26 364 | | | | Xlsum | 124 486 | | | | Fialka-v1 | 130 000 | | | | RussianKeywords | 16 461 | | | | Gazeta | 121 928 | | | | Gsm8k-ru | 7 470 | | | | DSumRu | 27 191 | | | | SummDialogNews | 75 700 | **Total positive pairs:** 2,240,961 **Total negative pairs:** 792,644 (negative pairs from AllNLI, MIRACL, deepvk/ru-WANLI, deepvk/ru-HNP) For all labeled datasets, we only use their training sets for fine-tuning. For the datasets Gazeta, Mlsum, and Xlsum, pairs (title/text) and (title/summary) are combined and used as asymmetric data. is a combination of SNLI, MNLI, and ANLI translated into Russian. ## Experiments We compare our model with the base []( on the []( benchmark. In addition, we evaluate the model on the Russian subset of []( on Classification, Reranking, Multilabel Classification, STS, Retrieval, and PairClassification tasks. We use validation scripts from the official repositories for each of the tasks. Results on encodechka: | Model | Mean S | Mean S+W | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 | |-------------|--------|----------|------|------|------|------|------|------|------|------|------|------| | []( | 0.787 | 0.696 | 0.86 | 0.75 | 0.51 | 0.82 | 0.97 | 0.79 | 0.81 | 0.78 | 0.24 | 0.42 | | | **0.799** | **0.709** | **0.87** | **0.76** | **0.58** | 0.82 | 0.97 | 0.79 | 0.81 | 0.78 | **0.28** | **0.43** | Results on MTEB: | Type | []( | | |---------------------------|--------|-------------| | Average (30 datasets) | 0.689 | **0.706** | | Classification Average (12 datasets) | 0.571 | **0.594** | | Reranking Average (2 datasets) | **0.698** | 0.688 | | MultilabelClassification (2 datasets) | 0.343 | **0.359** | | STS Average (4 datasets) | 0.735 | **0.753** | | Retrieval Average (6 datasets) | **0.945** | 0.934 | | PairClassification Average (4 datasets) | 0.784 | **0.833** | ## Limitations We did not thoroughly evaluate the model's ability for sparse and multi-vector encoding. ## Citations", + "model_explanation_gemini": "Generates 1024-dimensional sentence embeddings for Russian text to enable semantic search and clustering tasks."
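The Usage snippets in the card above were stripped in extraction; a minimal sentence-transformers sketch (the example sentences are illustrative):

```python
# Minimal sketch: Russian sentence embeddings with sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("deepvk/USER-bge-m3")
sentences = ["Какой сегодня день недели?", "Сегодня понедельник."]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024) -- 1024-dimensional dense vectors

# With normalized embeddings, the dot product equals cosine similarity.
print(float(embeddings[0] @ embeddings[1]))
```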
+} \ No newline at end of file diff --git a/data/model_data_json/depth-anything_Depth-Anything-V2-Large-hf.json b/data/model_data_json/depth-anything_Depth-Anything-V2-Large-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..c1e8278a576a337e7a946ea69440d218c0ab1cce --- /dev/null +++ b/data/model_data_json/depth-anything_Depth-Anything-V2-Large-hf.json @@ -0,0 +1,19 @@ +{ + "model_id": "depth-anything/Depth-Anything-V2-Large-hf", + "downloads": 84253, + "tags": [ + "transformers", + "safetensors", + "depth_anything", + "depth-estimation", + "depth", + "relative depth", + "arxiv:2406.09414", + "arxiv:2401.10891", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers library: transformers license: cc-by-nc-4.0 tags: - depth - relative depth pipeline_tag: depth-estimation widget: - inference: false --- # Depth Anything V2 Large – Transformers Version Depth Anything V2 is trained from 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features: - more fine-grained details than Depth Anything V1 - more robust than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard) - more efficient (10x faster) and more lightweight than SD-based models - impressive fine-tuned performance with our pre-trained models This model checkpoint is compatible with the transformers library. Depth Anything V2 was introduced in the paper of the same name by Lihe Yang et al. It uses the same architecture as the original Depth Anything release, but uses synthetic data and a larger-capacity teacher model to achieve much finer and more robust depth predictions. The original Depth Anything model was introduced in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang et al., and was first released in this repository. Online demo. ## Model description Depth Anything V2 leverages the DPT architecture with a DINOv2 backbone. The model is trained on ~600K synthetic labeled images and ~62 million real unlabeled images, obtaining state-of-the-art results for both relative and absolute depth estimation. Depth Anything overview. Taken from the original paper. ## Intended uses & limitations You can use the raw model for tasks like zero-shot depth estimation. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot depth estimation (see the sketch below): Alternatively, you can use the model and processor classes: For more code examples, please refer to the documentation. ### Citation", + "model_explanation_gemini": "Estimates monocular depth with fine-grained details and robustness using synthetic and real image data."
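The usage snippet was stripped from the card above; a short sketch with the transformers pipeline (the image URL is a stand-in):

```python
# Minimal sketch: zero-shot relative depth estimation via the transformers pipeline.
import requests
from PIL import Image
from transformers import pipeline

pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Large-hf")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image (assumption)
image = Image.open(requests.get(url, stream=True).raw)

depth = pipe(image)["depth"]  # a PIL image encoding per-pixel relative depth
depth.save("depth.png")
```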
+} \ No newline at end of file diff --git a/data/model_data_json/depth-anything_Depth-Anything-V2-Large.json b/data/model_data_json/depth-anything_Depth-Anything-V2-Large.json new file mode 100644 index 0000000000000000000000000000000000000000..5524b5f30bb783e2659a4aba1fa834480d3479e4 --- /dev/null +++ b/data/model_data_json/depth-anything_Depth-Anything-V2-Large.json @@ -0,0 +1,15 @@ +{ + "model_id": "depth-anything/Depth-Anything-V2-Large", + "downloads": 123298, + "tags": [ + "depth-anything-v2", + "depth", + "relative depth", + "depth-estimation", + "en", + "license:cc-by-nc-4.0", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 language: - en pipeline_tag: depth-estimation library_name: depth-anything-v2 tags: - depth - relative depth --- # Depth-Anything-V2-Large ## Introduction Depth Anything V2 is trained from 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features: - more fine-grained details than Depth Anything V1 - more robust than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard) - more efficient (10x faster) and more lightweight than SD-based models - impressive fine-tuned performance with our pre-trained models ## Installation ## Usage Download the model first and put it under the directory. ## Citation If you find this project useful, please consider citing: ```bibtex @article{depth_anything_v2, title={Depth Anything V2}, author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang}, journal={arXiv:2406.09414}, year={2024} } @inproceedings{depth_anything_v1, title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang}, booktitle={CVPR}, year={2024} }", + "model_explanation_gemini": "Estimates monocular depth from images with fine-grained detail, robustness, and efficiency compared to previous models." +} \ No newline at end of file diff --git a/data/model_data_json/depth-anything_Depth-Anything-V2-Small-hf.json b/data/model_data_json/depth-anything_Depth-Anything-V2-Small-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..69a3bd27e1135bc5cad5b39fd72dc040e428e546 --- /dev/null +++ b/data/model_data_json/depth-anything_Depth-Anything-V2-Small-hf.json @@ -0,0 +1,19 @@ +{ + "model_id": "depth-anything/Depth-Anything-V2-Small-hf", + "downloads": 292617, + "tags": [ + "transformers", + "safetensors", + "depth_anything", + "depth-estimation", + "depth", + "relative depth", + "arxiv:2406.09414", + "arxiv:2401.10891", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - depth - relative depth pipeline_tag: depth-estimation library: transformers widget: - inference: false --- # Depth Anything V2 Small – Transformers Version Depth Anything V2 is trained from 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features: - more fine-grained details than Depth Anything V1 - more robust than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard) - more efficient (10x faster) and more lightweight than SD-based models - impressive fine-tuned performance with our pre-trained models This model checkpoint is compatible with the transformers library. 
Depth Anything V2 was introduced in the paper of the same name by Lihe Yang et al. It uses the same architecture as the original Depth Anything release, but uses synthetic data and a larger-capacity teacher model to achieve much finer and more robust depth predictions. The original Depth Anything model was introduced in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang et al., and was first released in this repository. Online demo. ## Model description Depth Anything V2 leverages the DPT architecture with a DINOv2 backbone. The model is trained on ~600K synthetic labeled images and ~62 million real unlabeled images, obtaining state-of-the-art results for both relative and absolute depth estimation. Depth Anything overview. Taken from the original paper. ## Intended uses & limitations You can use the raw model for tasks like zero-shot depth estimation. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot depth estimation: Alternatively, you can use the model and processor classes (see the sketch below). For more code examples, please refer to the documentation. ### Citation", + "model_explanation_gemini": "Estimates monocular depth from images with fine-grained detail and robustness using a DPT architecture and DINOv2 backbone."
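For the "model and processor classes" route mentioned above, a hedged sketch (the local file name is illustrative):

```python
# Alternative sketch: using the image processor and model classes directly.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

name = "depth-anything/Depth-Anything-V2-Small-hf"
processor = AutoImageProcessor.from_pretrained(name)
model = AutoModelForDepthEstimation.from_pretrained(name)

image = Image.open("example.jpg")  # any local image (assumption)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth  # (1, H', W') relative depth map
print(predicted_depth.shape)
```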
[More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]", + "model_explanation_gemini": "Extracts features from audio data at a 44kHz sampling rate." +} \ No newline at end of file diff --git a/data/model_data_json/diffusers_stable-diffusion-xl-1.0-inpainting-0.1.json b/data/model_data_json/diffusers_stable-diffusion-xl-1.0-inpainting-0.1.json new file mode 100644 index 0000000000000000000000000000000000000000..bb03a70f87b4a7e802aca8f4b12d9d91c9117a02 --- /dev/null +++ b/data/model_data_json/diffusers_stable-diffusion-xl-1.0-inpainting-0.1.json @@ -0,0 +1,20 @@ +{ + "model_id": "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", + "downloads": 583686, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion-xl", + "stable-diffusion-xl-diffusers", + "text-to-image", + "inpainting", + "arxiv:2112.10752", + "base_model:stabilityai/stable-diffusion-xl-base-1.0", + "base_model:finetune:stabilityai/stable-diffusion-xl-base-1.0", + "license:openrail++", + "diffusers:StableDiffusionXLInpaintPipeline", + "region:us" + ], + "description": "--- license: openrail++ base_model: stabilityai/stable-diffusion-xl-base-1.0 tags: - stable-diffusion-xl - stable-diffusion-xl-diffusers - text-to-image - diffusers - inpainting inference: false --- # SD-XL Inpainting 0.1 Model Card !inpaint-example SD-XL Inpainting 0.1 is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask. The SD-XL Inpainting 0.1 was initialized with the stable-diffusion-xl-base-1.0 weights. The model is trained for 40k steps at resolution 1024x1024, with 5% dropping of the text-conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked-image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything.
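Ahead of the card's "How to use" section below, here is a hedged sketch of text-guided inpainting with this checkpoint through diffusers' AutoPipelineForInpainting; the image and mask URLs are the usual latent-diffusion inpainting examples and stand in for your own data:

```python
# Hedged sketch: mask-guided inpainting with the SD-XL inpainting checkpoint.
# Assumes diffusers, transformers and torch with a CUDA GPU are available.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

result = pipe(
    prompt="a tiger sitting on a park bench",
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,
    strength=0.99,  # kept below 1.0: the card notes quality degrades at strength=1
).images[0]
result.save("inpainted.png")
```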
## How to use **How it works:** (figure: the source image and mask shown side by side, followed by the inpainted result for the prompt \"a tiger sitting on a park bench\") ## Model Description - **Developed by:** The Diffusers team - **Model type:** Diffusion-based text-to-image generative model - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). ## Uses ### Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. Excluded uses are described below. ### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The autoencoding part of the model is lossy. - When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version. ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.", + "model_explanation_gemini": "Generates and modifies images based on text prompts, specializing in inpainting by using masks to edit specific areas of pictures." +} \ No newline at end of file diff --git a/data/model_data_json/digiplay_MilkyWonderland_v1.json b/data/model_data_json/digiplay_MilkyWonderland_v1.json new file mode 100644 index 0000000000000000000000000000000000000000..eb70bb076352a3e735d0e0bc9c6d6bb3798a12c4 --- /dev/null +++ b/data/model_data_json/digiplay_MilkyWonderland_v1.json @@ -0,0 +1,18 @@ +{ + "model_id": "digiplay/MilkyWonderland_v1", + "downloads": 12280, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "text-to-image", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: other tags: - stable-diffusion - stable-diffusion-diffusers - text-to-image - diffusers inference: true --- Model info:", + "model_explanation_gemini": "Generates images from text prompts using Stable Diffusion technology."
+} \ No newline at end of file diff --git a/data/model_data_json/dima806_deepfake_vs_real_image_detection.json b/data/model_data_json/dima806_deepfake_vs_real_image_detection.json new file mode 100644 index 0000000000000000000000000000000000000000..612c9193d60a3b9f5258bbc1725af7ec87eb3c0f --- /dev/null +++ b/data/model_data_json/dima806_deepfake_vs_real_image_detection.json @@ -0,0 +1,19 @@ +{ + "model_id": "dima806/deepfake_vs_real_image_detection", + "downloads": 128314, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-classification", + "base_model:google/vit-base-patch16-224-in21k", + "base_model:finetune:google/vit-base-patch16-224-in21k", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 metrics: - accuracy - f1 base_model: - google/vit-base-patch16-224-in21k --- Checks whether an image is real or fake (AI-generated). **Note to users who want to use this model in production** Beware that this model is trained on a dataset collected about 3 years ago. Since then, there has been remarkable progress in generating deepfake images with common AI tools, resulting in a significant concept drift. To mitigate that, I urge you to retrain the model using the latest available labeled data. As a quick-fix approach, simply reducing the threshold (say, from the default 0.5 to 0.1 or even 0.01) for labelling an image as fake may suffice. However, you will do that at your own risk, and retraining the model is the better way of handling the concept drift. See for more details.", + "model_explanation_gemini": "Detects whether an image is real or AI-generated, though users should retrain it with newer data for better accuracy due to advancements in deepfake technology." +} \ No newline at end of file diff --git a/data/model_data_json/dima806_facial_emotions_image_detection.json b/data/model_data_json/dima806_facial_emotions_image_detection.json new file mode 100644 index 0000000000000000000000000000000000000000..437ff8b03772d437eb5a4fb97b83df0ebe679591 --- /dev/null +++ b/data/model_data_json/dima806_facial_emotions_image_detection.json @@ -0,0 +1,19 @@ +{ + "model_id": "dima806/facial_emotions_image_detection", + "downloads": 204486, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-classification", + "base_model:google/vit-base-patch16-224-in21k", + "base_model:finetune:google/vit-base-patch16-224-in21k", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 metrics: - accuracy - f1 base_model: - google/vit-base-patch16-224-in21k --- Returns the facial emotion with about 91% accuracy based on a human facial image. See for more details. !image/png", + "model_explanation_gemini": "Detects facial emotions in human images with approximately 91% accuracy."
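These dima806 ViT classifiers all expose the plain image-classification pipeline. A minimal sketch follows; the file path is a placeholder, and the exact label strings ("Fake"/"Real") are an assumption, so check the checkpoint's config before relying on them:

```python
# Hedged sketch: scoring an image with one of the ViT classifiers above.
from transformers import pipeline

clf = pipeline("image-classification", model="dima806/deepfake_vs_real_image_detection")
preds = clf("photo.jpg")  # placeholder path; returns [{'label': ..., 'score': ...}, ...]

# Per the author's production note, a stricter-than-default threshold can be applied:
fake_score = next(p["score"] for p in preds if p["label"].lower() == "fake")  # label name assumed
print(preds, fake_score > 0.1)  # lowered threshold as a stop-gap for concept drift
```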
+} \ No newline at end of file diff --git a/data/model_data_json/dima806_fairface_age_image_detection.json b/data/model_data_json/dima806_fairface_age_image_detection.json new file mode 100644 index 0000000000000000000000000000000000000000..4cc24abe093186ec6a6d93a5f99352e6b8181037 --- /dev/null +++ b/data/model_data_json/dima806_fairface_age_image_detection.json @@ -0,0 +1,19 @@ +{ + "model_id": "dima806/fairface_age_image_detection", + "downloads": 71641052, + "tags": [ + "transformers", + "safetensors", + "vit", + "image-classification", + "dataset:nateraw/fairface", + "base_model:google/vit-base-patch16-224-in21k", + "base_model:finetune:google/vit-base-patch16-224-in21k", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 metrics: - accuracy - f1 base_model: - google/vit-base-patch16-224-in21k pipeline_tag: image-classification library_name: transformers datasets: - nateraw/fairface --- Detects age group with about 59% accuracy based on an image. See for details. !image/png", + "model_explanation_gemini": "Detects age groups in images with approximately 59% accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/dima806_man_woman_face_image_detection.json b/data/model_data_json/dima806_man_woman_face_image_detection.json new file mode 100644 index 0000000000000000000000000000000000000000..ed77f27b4ba427f4e1d2db432c0417d8754deb43 --- /dev/null +++ b/data/model_data_json/dima806_man_woman_face_image_detection.json @@ -0,0 +1,18 @@ +{ + "model_id": "dima806/man_woman_face_image_detection", + "downloads": 74322, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-classification", + "base_model:google/vit-base-patch16-224-in21k", + "base_model:finetune:google/vit-base-patch16-224-in21k", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 metrics: - accuracy - f1 base_model: - google/vit-base-patch16-224-in21k --- Returns with about 98.7% accuracy whether the face belongs to a man or a woman, based on a face image. See for more details. !image/png" +} \ No newline at end of file diff --git a/data/model_data_json/distil-whisper_distil-large-v3.json b/data/model_data_json/distil-whisper_distil-large-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..f924bce342d5b1d9383b1c47070e45539b53bb66 --- /dev/null +++ b/data/model_data_json/distil-whisper_distil-large-v3.json @@ -0,0 +1,23 @@ +{ + "model_id": "distil-whisper/distil-large-v3", + "downloads": 416017, + "tags": [ + "transformers", + "jax", + "tensorboard", + "onnx", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "transformers.js", + "en", + "arxiv:2311.00430", + "arxiv:2210.13352", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: mit library_name: transformers tags: - audio - automatic-speech-recognition - transformers.js widget: - example_title: LibriSpeech sample 1 src: - example_title: LibriSpeech sample 2 src: pipeline_tag: automatic-speech-recognition --- # Distil-Whisper: distil-large-v3 Distil-Whisper was proposed in the paper Robust Knowledge Distillation via Large-Scale Pseudo Labelling. This is the third and final installment of the Distil-Whisper English series. It is the knowledge-distilled version of OpenAI's Whisper large-v3, the latest and most performant Whisper model to date.
Compared to previous Distil-Whisper models, the distillation procedure for distil-large-v3 has been adapted to give **superior long-form transcription accuracy** with OpenAI's **sequential long-form algorithm**. The result is a distilled model that performs to within 1% WER of large-v3 on long-form audio using both the sequential and chunked algorithms, and outperforms distil-large-v2 by 4.8% using the sequential algorithm. The model is also faster than previous Distil-Whisper models: **6.3x faster than large-v3**, and 1.1x faster than distil-large-v2. | Model | Params / M | Rel. Latency | Short-Form | Sequential Long-Form | Chunked Long-Form | |------------------------------------------------------------------------------|------------|--------------|------------|----------------------|-------------------| | large-v3 | 1550 | 1.0 | 8.4 | 10.0 | 11.0 | | **distil-large-v3** | **756** | **6.3** | **9.7** | **10.8** | **10.9** | | distil-large-v2 | 756 | 5.8 | 10.1 | 15.6 | 11.6 | Since the sequential algorithm is the \"de-facto\" transcription algorithm across the most popular Whisper libraries (Whisper cpp, Faster-Whisper, OpenAI Whisper), this distilled model is designed to be compatible with these libraries. You can expect significant performance gains by switching from previous Distil-Whisper checkpoints to distil-large-v3 when using these libraries. For convenience, the weights for the most popular libraries are already converted, with instructions for getting started below. ## Table of Contents 1. Transformers Usage * Short-Form Transcription * Sequential Long-Form * Chunked Long-Form * Speculative Decoding * Additional Speed and Memory Improvements 2. Library Integrations * Whisper cpp * Faster Whisper * OpenAI Whisper * Transformers.js * Candle 3. Model Details 4. License ## Transformers Usage distil-large-v3 is supported in the Hugging Face 🤗 Transformers library from version 4.39 onwards. To run the model, first install the latest version of Transformers. For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub: ### Short-Form Transcription The model can be used with the []( class to transcribe short-form audio files (< 30-seconds) as follows: To transcribe a local audio file, simply pass the path to your audio file when you call the pipeline: For segment-level timestamps, pass the argument and return the output:
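As a hedged sketch of the short-form usage just described (the LibriSpeech dummy dataset stands in for your own audio, and exact library versions are assumptions):

```python
# Hedged sketch: short-form transcription with the transformers pipeline.
# Assumes transformers>=4.39, torch and datasets are installed.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v3"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample, return_timestamps=True)  # return_timestamps for segment-level timestamps
print(result["text"])
```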
For more control over the generation parameters, use the model + processor API directly: Ad-hoc generation arguments can be passed to , including for beam-search, for segment-level timestamps, and for prompting. See the docstrings for more details.
### Sequential Long-Form Unlike previous Distil-Whisper releases, distil-large-v3 is specifically designed to be compatible with OpenAI's sequential long-form transcription algorithm. This algorithm uses a sliding window for buffered inference of long audio files (> 30-seconds), and returns more accurate transcriptions compared to the chunked long-form algorithm. The sequential long-form algorithm should be used in either of the following scenarios: 1. Transcription accuracy is the most important factor, and latency is less of a consideration 2. You are transcribing **batches** of long audio files, in which case the latency of sequential is comparable to chunked, while being up to 0.5% WER more accurate If you are transcribing single long audio files and latency is the most important factor, you should use the chunked algorithm described below. For a detailed explanation of the different algorithms, refer to Section 5 of the Distil-Whisper paper. The []( class can be used to transcribe long audio files with the sequential algorithm as follows:
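A minimal sketch, reusing the pipe object from the short-form sketch; in recent transformers releases the pipeline falls back to sequential (sliding-window) decoding for inputs longer than 30 seconds when no chunk length is set, though exact version behaviour should be verified:

```python
# Hedged sketch: sequential long-form transcription with the same pipeline.
# "long_audio.mp3" is a placeholder for an audio file longer than 30 seconds.
result = pipe("long_audio.mp3", return_timestamps=True)
print(result["text"])
```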
For more control over the generation parameters, use the model + processor API directly:
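A hedged sketch of that model + processor path, reusing model, processor, device and torch_dtype from the short-form sketch; concatenating LibriSpeech dummy samples into one long signal is purely illustrative:

```python
# Hedged sketch: long-form generation without the pipeline wrapper.
import numpy as np
from datasets import load_dataset

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
long_audio = np.concatenate([x["array"] for x in ds[:25]["audio"]])  # pseudo long-form input

inputs = processor(
    long_audio,
    sampling_rate=16_000,
    return_tensors="pt",
    truncation=False,          # keep the full signal instead of a 30 s crop
    padding="longest",
    return_attention_mask=True,
)
inputs = inputs.to(device, torch_dtype)

generated_ids = model.generate(**inputs, return_timestamps=True)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```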
### Chunked Long-Form distil-large-v3 remains compatible with the Transformers chunked long-form algorithm. This algorithm should be used when a single large audio file is being transcribed and the fastest possible inference is required. In such circumstances, the chunked algorithm is up to 9x faster than OpenAI's sequential long-form implementation (see Table 7 of the Distil-Whisper paper). To enable chunking, pass the parameter to the . For distil-large-v3, a chunk length of 25-seconds is optimal. To activate batching over long audio files, pass the argument : ### Speculative Decoding distil-large-v3 is the first Distil-Whisper model that can be used as an assistant to Whisper large-v3 for speculative decoding. Speculative decoding mathematically ensures that exactly the same outputs as Whisper are obtained, while being 2 times faster. This makes it the perfect drop-in replacement for existing Whisper pipelines, since the same outputs are guaranteed. In the following code-snippet, we load the assistant Distil-Whisper model standalone to the main Whisper pipeline. We then specify it as the \"assistant model\" for generation: For more details on speculative decoding, refer to the blog post Speculative Decoding for 2x Faster Whisper Inference. ### Additional Speed & Memory Improvements You can apply additional speed and memory improvements to Distil-Whisper to further reduce the inference speed and VRAM requirements. These optimisations primarily target the attention kernel, swapping it from an eager implementation to a more efficient flash attention version. #### Flash Attention 2 We recommend using Flash-Attention 2 if your GPU allows for it. To do so, you first need to install Flash Attention: Then pass to : #### Torch Scale-Product-Attention (SDPA) If your GPU does not support Flash Attention, we recommend making use of PyTorch scaled dot-product attention (SDPA). This attention implementation is activated **by default** for PyTorch versions 2.1.1 or greater. To check whether you have a compatible PyTorch version, run the following Python code snippet: If the above returns , you have a valid version of PyTorch installed and SDPA is activated by default. If it returns , you need to upgrade your PyTorch version according to the official instructions Once a valid PyTorch version is installed, SDPA is activated by default. It can also be set explicitly by specifying as follows: For more information about how to use the SDPA refer to the Transformers SDPA documentation. #### Torch compile Coming soon... #### 4-bit and 8-bit Inference Coming soon... ## Library Integrations ### Whisper.cpp Distil-Whisper can be run with the Whisper.cpp package with the original sequential long-form transcription algorithm. In a provisional benchmark on Mac M1, distil-large-v3 is over 5x faster than Whisper large-v3, while performing to within 0.8% WER over long-form audio. Steps for getting started: 1. Clone the Whisper.cpp repository: 2. Install the Hugging Face Hub Python package: And download the GGML weights for distil-large-v3 using the following Python snippet: Note that if you do not have a Python environment set-up, you can also download the weights directly with : 3. Run inference using the provided sample audio: ### Faster-Whisper Faster-Whisper is a reimplementation of Whisper using CTranslate2, a fast inference engine for Transformer models. First, install the Faster-Whisper package according to the official instructions. 
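Once Faster-Whisper is installed, usage is compact; a hedged sketch (the audio path is a placeholder, and the "distil-large-v3" model alias assumes a faster-whisper release recent enough to know it):

```python
# Hedged sketch: transcription with the CTranslate2-backed Faster-Whisper runtime.
from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "audio.mp3",                       # placeholder path
    beam_size=5,
    language="en",
    condition_on_previous_text=False,  # commonly recommended for distilled checkpoints
)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```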
For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub: The following code snippet loads the distil-large-v3 model and runs inference on an example file from the LibriSpeech ASR dataset: To transcribe a local audio file, simply pass the path to the audio file as the argument to transcribe: ### OpenAI Whisper To use the model in the original Whisper format, first ensure you have the []( package installed. For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub: The following code-snippet demonstrates how to transcribe a sample file from the LibriSpeech dataset loaded using 🤗 Datasets: Note that the model weights will be downloaded and saved to your cache the first time you run the example. Subsequently, you can re-use the same example, and the weights will be loaded directly from your cache without having to download them again. To transcribe a local audio file, simply pass the path to the audio file as the argument to transcribe: The Distil-Whisper model can also be used with the OpenAI Whisper CLI. Refer to the following instructions for details. ### Transformers.js Distil-Whisper can be run completely in your web browser with Transformers.js: 1. Install Transformers.js from NPM: 2. Import the library and perform inference with the pipeline API. Check out the online Distil-Whisper Web Demo to try it out yourself. As you'll see, it runs locally in your browser: no server required! Refer to the Transformers.js docs for further information. ### Candle Through an integration with Hugging Face Candle 🕯️, Distil-Whisper is available in the Rust library 🦀 Benefit from: * Optimised CPU backend with optional MKL support for Linux x86 and Accelerate for Macs * Metal support for efficiently running on Macs * CUDA backend for efficiently running on GPUs, multiple GPU distribution via NCCL * WASM support: run Distil-Whisper in a browser Steps for getting started: 1. Install []( as explained here 2. Clone the repository locally: 3. Enter the example directory for Whisper: 4. Run an example: 5. To specify your own audio file, add the flag: **Tip:** for compiling using Apple Metal, specify the feature when you run the example: Note that if you encounter the error: You should clean your installation: And subsequently recompile: ## Model Details Distil-Whisper inherits the encoder-decoder architecture from Whisper. The encoder maps a sequence of speech vector inputs to a sequence of hidden-state vectors. The decoder auto-regressively predicts text tokens, conditional on all previous tokens and the encoder hidden-states. Consequently, the encoder is only run forward once, whereas the decoder is run as many times as the number of tokens generated. In practice, this means the decoder accounts for over 90% of total inference time. Thus, to optimise for latency, the focus is on minimising the inference time of the decoder. To distill the Whisper model, we reduce the number of decoder layers while keeping the encoder fixed. The encoder (shown in green) is entirely copied from the teacher to the student and frozen during training. The student's decoder consists of a subset of the teacher decoder layers, which are initialised from maximally spaced layers. The model is then trained on a weighted sum of the KL divergence and pseudo-label loss terms.
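Circling back to the speculative-decoding section above, a hedged sketch of pairing distil-large-v3 as the draft model with Whisper large-v3 as the verifier; the audio path is a placeholder:

```python
# Hedged sketch: distil-large-v3 drafts tokens, Whisper large-v3 verifies them,
# so the outputs match plain large-v3 decoding while running roughly 2x faster.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v3", torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)

model_id = "openai/whisper-large-v3"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"assistant_model": assistant},  # enables speculative decoding
    torch_dtype=torch_dtype,
    device=device,
)
print(pipe("audio.mp3")["text"])  # placeholder path
```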

## Differences with distil-large-v2 Compared to previous versions of Distil-Whisper, distil-large-v3 is specifically designed to target the OpenAI sequential long-form transcription algorithm. There are no architectural differences compared to distil-large-v2, other than the fact the model layers are initialised from the latest large-v3 model rather than the older large-v2 one. The differences lie in the way the model was trained. Previous Distil-Whisper models were trained on a mean input length of 7-seconds, whereas the original Whisper models were pre-trained on 30-second inputs. During distillation, we shift the distribution of the model weights to the distribution of our training data. If our training data contains shorter utterances (e.g. on average 7-seconds audio instead of 30-seconds), then the predicted distribution shifts to this shorter context length. At inference time, the optimal context window for distil-large-v2 was an interpolation of these two values: 15-seconds. Beyond this time, the predictions for the distil-large-v2 model were largely inaccurate, particularly for the timestamp predictions. However, the sequential long-form algorithm uses 30-second sliding windows for inference, with the window shifted according to the last predicted timestamp. Since the last timestamp typically occurs after the 15-second mark, it was predicted with low accuracy, causing the long-form transcription to often fail. To preserve Whisper's ability to transcribe sliding 30-second windows, as is done with sequential decoding, we need to ensure the context length of distil-large-v3 is also 30-seconds. This was primarily achieved with four strategies: 1. **Packing the audio samples in the training dataset to 30-seconds:** since the model is both pre-trained and distilled on audio data packed to 30-seconds, distil-large-v3 now operates on the same ideal context window as Whisper, predicting accurate timestamps up to and including 30-seconds. 2. **Freezing the decoder input embeddings:** we use the same input embeds representation as the original model, which is designed to handle longer context lengths than previous Distil-Whisper iterations. 3. **Using a longer maximum context length during training:** instead of training on a maximum target length of 128, we train on a maximum of 256. This helps distil-large-v3 transcribe 30-second segments where the number of tokens possibly exceeds 128. 4. **Appending prompt conditioning to 50% of the training samples:** enables the model to be used with the argument, and context windows up to 448 tokens. There were further tricks that were employed to improve the performance of distil-large-v3 under the sequential decoding algorithm, which will be explained fully in an upcoming blog post. ## Evaluation The following code snippet demonstrates how to evaluate the Distil-Whisper model on the LibriSpeech validation-clean dataset with streaming mode, meaning no audio data has to be downloaded to your local device. First, we need to install the required packages, including 🤗 Datasets to stream and load the audio data, and 🤗 Evaluate to perform the WER calculation: Evaluation can then be run end-to-end with the following example: **Print Output:** ## Intended Use Distil-Whisper is intended to be a drop-in replacement for Whisper large-v3 on English speech recognition. In particular, it achieves comparable WER results over out-of-distribution (OOD) test data, while being 6x faster on both short and long-form audio.
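As a hedged sketch of the streaming evaluation described in the Evaluation section above (text normalisation is simplified to lower-casing here, whereas the card's full example uses the Whisper normaliser):

```python
# Hedged sketch: streaming WER evaluation on LibriSpeech validation-clean.
# Assumes datasets, evaluate and jiwer are installed, plus the `pipe` object
# from the earlier transcription sketches.
from datasets import load_dataset
from evaluate import load

dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)
wer_metric = load("wer")

predictions, references = [], []
for sample in dataset.take(64):  # a small slice keeps the sketch quick
    predictions.append(pipe(sample["audio"])["text"].lower())
    references.append(sample["text"].lower())

print(100 * wer_metric.compute(predictions=predictions, references=references))
```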
## Data Distil-Whisper is trained on 22,000 hours of audio data from nine open-source, permissively licensed speech datasets on the Hugging Face Hub: | Dataset | Size / h | Speakers | Domain | Licence | |-----------------------------------------------------------------------------------------|----------|----------|-----------------------------|-----------------| | People's Speech | 12,000 | unknown | Internet Archive | CC-BY-SA-4.0 | | Common Voice 13 | 3,000 | unknown | Narrated Wikipedia | CC0-1.0 | | GigaSpeech | 2,500 | unknown | Audiobook, podcast, YouTube | apache-2.0 | | Fisher | 1,960 | 11,900 | Telephone conversations | LDC | | LibriSpeech | 960 | 2,480 | Audiobooks | CC-BY-4.0 | | VoxPopuli | 540 | 1,310 | European Parliament | CC0 | | TED-LIUM | 450 | 2,030 | TED talks | CC-BY-NC-ND 3.0 | | SwitchBoard | 260 | 540 | Telephone conversations | LDC | | AMI | 100 | unknown | Meetings | CC-BY-4.0 | |||||| | **Total** | 21,770 | 18,260+ | | | The combined dataset spans 10 distinct domains and over 50k speakers. The diversity of this dataset is crucial to ensuring the distilled model is robust to audio distributions and noise. The audio data is then pseudo-labelled using the Whisper large-v3 model: we use Whisper to generate predictions for all the audio in our training set and use these as the target labels during training. Using pseudo-labels ensures that the transcriptions are consistently formatted across datasets and provides sequence-level distillation signal during training. ## WER Filter The Whisper pseudo-label predictions are subject to mis-transcriptions and hallucinations. To ensure we only train on accurate pseudo-labels, we employ a simple WER heuristic during training. First, we normalise the Whisper pseudo-labels and the ground truth labels provided by each dataset. We then compute the WER between these labels. If the WER exceeds a specified threshold, we discard the training example. Otherwise, we keep it for training. Section 9.2 of the Distil-Whisper paper demonstrates the effectiveness of this filter for improving downstream performance of the distilled model. We also partially attribute Distil-Whisper's robustness to hallucinations to this filter. ## Training The model was trained for 80,000 optimisation steps (or 11 epochs) with batch size 256. The Tensorboard training logs can be found under: ## Results The distilled model performs to within 1.5% WER of Whisper large-v3 on out-of-distribution (OOD) short-form audio, within 1% WER on sequential long-form decoding, and outperforms large-v3 by 0.1% on chunked long-form. This performance gain is attributed to lower hallucinations. For a detailed per-dataset breakdown of the evaluation results, refer to Tables 16 and 17 of the Distil-Whisper paper Distil-Whisper is also evaluated on the ESB benchmark datasets as part of the OpenASR leaderboard, where it performs to within 0.2% WER of Whisper. ## Reproducing Distil-Whisper Training and evaluation code to reproduce Distil-Whisper is available under the Distil-Whisper repository: This code will shortly be updated to include the training updates described in the section Differences with distil-large-v2. ## License Distil-Whisper inherits the MIT license from OpenAI's Whisper model. 
## Citation If you use this model, please consider citing the Distil-Whisper paper: ## Acknowledgements * OpenAI for the Whisper model, in particular Jong Wook Kim for the original codebase and training discussions * Hugging Face 🤗 Transformers for the model integration * Georgi Gerganov for the Whisper cpp integration * Systran team for the Faster-Whisper integration * Joshua Lochner for the Transformers.js integration * Laurent Mazare for the Candle integration * Vaibhav Srivastav for Distil-Whisper distribution * Google's TPU Research Cloud (TRC) programme for Cloud TPU v4 compute resource * Raghav Sonavane for an early iteration of Distil-Whisper on the LibriSpeech dataset", + "model_explanation_gemini": "Distills OpenAI's Whisper large-v3 for faster, near-equivalent accuracy in English automatic speech recognition, especially optimized for long-form transcription." +} \ No newline at end of file diff --git a/data/model_data_json/distil-whisper_distil-medium.en.json b/data/model_data_json/distil-whisper_distil-medium.en.json new file mode 100644 index 0000000000000000000000000000000000000000..99b95c03a01c214330bb6a9bcca5f0eccfb684cd --- /dev/null +++ b/data/model_data_json/distil-whisper_distil-medium.en.json @@ -0,0 +1,24 @@ +{ + "model_id": "distil-whisper/distil-medium.en", + "downloads": 182696, + "tags": [ + "transformers", + "pytorch", + "jax", + "tensorboard", + "onnx", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "transformers.js", + "en", + "arxiv:2311.00430", + "arxiv:2210.13352", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - audio - automatic-speech-recognition - transformers.js widget: - example_title: LibriSpeech sample 1 src: - example_title: LibriSpeech sample 2 src: pipeline_tag: automatic-speech-recognition license: mit library_name: transformers --- # Distil-Whisper: distil-medium.en Distil-Whisper was proposed in the paper Robust Knowledge Distillation via Large-Scale Pseudo Labelling. It is a distilled version of the Whisper model that is **6 times faster**, 49% smaller, and performs **within 1% WER** on out-of-distribution evaluation sets. This is the repository for distil-medium.en, a distilled variant of Whisper medium.en. | Model | Params / M | Rel. Latency ↑ | Short-Form WER ↓ | Long-Form WER ↓ | |----------------------------------------------------------------------------|------------|----------------|------------------|-----------------| | large-v3 | 1550 | 1.0 | **8.4** | 11.0 | | large-v2 | 1550 | 1.0 | 9.1 | 11.7 | | | | | | | | distil-large-v3 | 756 | 6.3 | 9.7 | **10.8** | | distil-large-v2 | 756 | 5.8 | 10.1 | 11.6 | | distil-medium.en | 394 | **6.8** | 11.1 | 12.4 | | distil-small.en | **166** | 5.6 | 12.1 | 12.8 | **Note:** Distil-Whisper is currently only available for English speech recognition. We are working with the community to distill Whisper on other languages. If you are interested in distilling Whisper in your language, check out the provided training code. We will update the Distil-Whisper repository with multilingual checkpoints when ready! ## Usage Distil-Whisper is supported in Hugging Face 🤗 Transformers from version 4.35 onwards. To run the model, first install the latest version of the Transformers library. 
For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub: ### Short-Form Transcription The model can be used with the []( class to transcribe short-form audio files (< 30-seconds) as follows: To transcribe a local audio file, simply pass the path to your audio file when you call the pipeline: ### Long-Form Transcription Distil-Whisper uses a chunked algorithm to transcribe long-form audio files (> 30-seconds). In practice, this chunked long-form algorithm is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper (see Table 7 of the Distil-Whisper paper). To enable chunking, pass the parameter to the . For Distil-Whisper, a chunk length of 15-seconds is optimal. To activate batching, pass the argument : ### Speculative Decoding Distil-Whisper can be used as an assistant model to Whisper for speculative decoding. Speculative decoding mathematically ensures the exact same outputs as Whisper are obtained while being 2 times faster. This makes it the perfect drop-in replacement for existing Whisper pipelines, since the same outputs are guaranteed. In the following code-snippet, we load the assistant Distil-Whisper model standalone to the main Whisper pipeline. We then specify it as the \"assistant model\" for generation: ## Additional Speed & Memory Improvements You can apply additional speed and memory improvements to Distil-Whisper which we cover in the following. ### Flash Attention We recommend using Flash-Attention 2 if your GPU allows for it. To do so, you first need to install Flash Attention: and then all you have to do is to pass to : ### Torch Scale-Product-Attention (SDPA) If your GPU does not support Flash Attention, we recommend making use of BetterTransformers. To do so, you first need to install optimum: And then convert your model to a \"BetterTransformer\" model before using it: ### Running Distil-Whisper in To use the model in the original Whisper format, first ensure you have the []( package installed: The following code-snippet demonstrates how to transcribe a sample file from the LibriSpeech dataset loaded using 🤗 Datasets: To transcribe a local audio file, simply pass the path to the audio file as the argument to transcribe: ### Whisper.cpp Distil-Whisper can be run from the Whisper.cpp repository with the original sequential long-form transcription algorithm. In a provisional benchmark on Mac M1, is 4x faster than , while performing to within 1% WER over long-form audio. Steps for getting started: 1. Clone the Whisper.cpp repository: 2. Download the ggml weights for from the Hugging Face Hub: Note that if you do not have the package installed, you can also download the weights with : 3. Run inference using the provided sample audio: ### Transformers.js See the docs for more information. ### Candle Through an integration with Hugging Face Candle 🕯️, Distil-Whisper is now available in the Rust library 🦀 Benefit from: * Optimised CPU backend with optional MKL support for x86 and Accelerate for Macs * CUDA backend for efficiently running on GPUs, multiple GPU distribution via NCCL * WASM support: run Distil-Whisper in a browser Steps for getting started: 1. Install []( as explained here 2. Clone the repository locally: 3. Enter the example directory for Whisper: 4. Run an example: 5. To specify your own audio file, add the flag: ### 8bit & 4bit Quantization Coming soon ... ## Model Details Distil-Whisper inherits the encoder-decoder architecture from Whisper.
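For the chunked long-form algorithm described in this card's Long-Form Transcription section, a hedged sketch with the 15-second chunk length the card recommends (the audio path is a placeholder):

```python
# Hedged sketch: chunked long-form transcription with distil-medium.en.
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-medium.en",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    chunk_length_s=15,  # optimal chunk length for this checkpoint
    batch_size=16,      # batches chunks together for throughput
)
print(pipe("long_audio.mp3")["text"])  # placeholder path
```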
The encoder maps a sequence of speech vector inputs to a sequence of hidden-state vectors. The decoder auto-regressively predicts text tokens, conditional on all previous tokens and the encoder hidden-states. Consequently, the encoder is only run forward once, whereas the decoder is run as many times as the number of tokens generated. In practice, this means the decoder accounts for over 90% of total inference time. Thus, to optimise for latency, the focus should be on minimising the inference time of the decoder. To distill the Whisper model, we reduce the number of decoder layers while keeping the encoder fixed. The encoder (shown in green) is entirely copied from the teacher to the student and frozen during training. The student's decoder consists of only two decoder layers, which are initialised from the first and last decoder layer of the teacher (shown in red). All other decoder layers of the teacher are discarded. The model is then trained on a weighted sum of the KL divergence and pseudo-label loss terms.

## Evaluation The following code snippet demonstrates how to evaluate the Distil-Whisper model on the LibriSpeech validation.clean dataset with streaming mode, meaning no audio data has to be downloaded to your local device. First, we need to install the required packages, including 🤗 Datasets to stream and load the audio data, and 🤗 Evaluate to perform the WER calculation: Evaluation can then be run end-to-end with the following example: **Print Output:** ## Intended Use Distil-Whisper is intended to be a drop-in replacement for Whisper on English speech recognition. In particular, it achieves comparable WER results over out-of-distribution test data, while being 6x faster over both short and long-form audio. ## Data Distil-Whisper is trained on 22,000 hours of audio data from 9 open-source, permissively licensed speech datasets on the Hugging Face Hub: | Dataset | Size / h | Speakers | Domain | Licence | |-----------------------------------------------------------------------------------------|----------|----------|-----------------------------|-----------------| | People's Speech | 12,000 | unknown | Internet Archive | CC-BY-SA-4.0 | | Common Voice 13 | 3,000 | unknown | Narrated Wikipedia | CC0-1.0 | | GigaSpeech | 2,500 | unknown | Audiobook, podcast, YouTube | apache-2.0 | | Fisher | 1,960 | 11,900 | Telephone conversations | LDC | | LibriSpeech | 960 | 2,480 | Audiobooks | CC-BY-4.0 | | VoxPopuli | 540 | 1,310 | European Parliament | CC0 | | TED-LIUM | 450 | 2,030 | TED talks | CC-BY-NC-ND 3.0 | | SwitchBoard | 260 | 540 | Telephone conversations | LDC | | AMI | 100 | unknown | Meetings | CC-BY-4.0 | |||||| | **Total** | 21,770 | 18,260+ | | | The combined dataset spans 10 distinct domains and over 50k speakers. The diversity of this dataset is crucial to ensuring the distilled model is robust to audio distributions and noise. The audio data is then pseudo-labelled using the Whisper large-v2 model: we use Whisper to generate predictions for all the audio in our training set and use these as the target labels during training. Using pseudo-labels ensures that the transcriptions are consistently formatted across datasets and provides sequence-level distillation signal during training. ## WER Filter The Whisper pseudo-label predictions are subject to mis-transcriptions and hallucinations. To ensure we only train on accurate pseudo-labels, we employ a simple WER heuristic during training. First, we normalise the Whisper pseudo-labels and the ground truth labels provided by each dataset. We then compute the WER between these labels. If the WER exceeds a specified threshold, we discard the training example. Otherwise, we keep it for training. Section 9.2 of the Distil-Whisper paper demonstrates the effectiveness of this filter for improving downstream performance of the distilled model. We also partially attribute Distil-Whisper's robustness to hallucinations to this filter. ## Training The model was trained for 80,000 optimisation steps (or eight epochs). The Tensorboard training logs can be found under: ## Results The distilled model performs to within 1% WER of Whisper on out-of-distribution (OOD) short-form audio, and outperforms Whisper by 0.1% on OOD long-form audio. This performance gain is attributed to lower hallucinations.
For a detailed per-dataset breakdown of the evaluation results, refer to Tables 16 and 17 of the Distil-Whisper paper Distil-Whisper is also evaluated on the ESB benchmark datasets as part of the OpenASR leaderboard, where it performs to within 0.2% WER of Whisper. ## Reproducing Distil-Whisper Training and evaluation code to reproduce Distil-Whisper is available under the Distil-Whisper repository: ## License Distil-Whisper inherits the MIT license from OpenAI's Whisper model. ## Citation If you use this model, please consider citing the Distil-Whisper paper: ## Acknowledgements * OpenAI for the Whisper model and original codebase * Hugging Face 🤗 Transformers for the model integration * Google's TPU Research Cloud (TRC) programme for Cloud TPU v4s * []( for releasing an early iteration of Distil-Whisper on the LibriSpeech dataset", + "model_explanation_gemini": "A distilled version of Whisper for English speech recognition that is faster and smaller while maintaining near-original accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilbert-base-cased-distilled-squad.json b/data/model_data_json/distilbert_distilbert-base-cased-distilled-squad.json new file mode 100644 index 0000000000000000000000000000000000000000..ad4216e6861a7c9017c60453aae4d81426e0e3d5 --- /dev/null +++ b/data/model_data_json/distilbert_distilbert-base-cased-distilled-squad.json @@ -0,0 +1,24 @@ +{ + "model_id": "distilbert/distilbert-base-cased-distilled-squad", + "downloads": 210227, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "safetensors", + "openvino", + "distilbert", + "question-answering", + "en", + "dataset:squad", + "arxiv:1910.01108", + "arxiv:1910.09700", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 datasets: - squad metrics: - squad model-index: - name: distilbert-base-cased-distilled-squad results: - task: type: question-answering name: Question Answering dataset: name: squad type: squad config: plain_text split: validation metrics: - type: exact_match value: 79.5998 name: Exact Match verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTViZDA2Y2E2NjUyMjNjYjkzNTUzODc5OTk2OTNkYjQxMDRmMDhlYjdmYWJjYWQ2N2RlNzY1YmI3OWY1NmRhOSIsInZlcnNpb24iOjF9.ZJHhboAMwsi3pqU-B-XKRCYP_tzpCRb8pEjGr2Oc-TteZeoWHI8CXcpDxugfC3f7d_oBcKWLzh3CClQxBW1iAQ - type: f1 value: 86.9965 name: F1 verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWZlMzY2MmE1NDNhOGNjNWRmODg0YjQ2Zjk5MjUzZDQ2MDYxOTBlMTNhNzQ4NTA2NjRmNDU3MGIzMTYwMmUyOSIsInZlcnNpb24iOjF9.z0ZDir87aT7UEmUeDm8Uw0oUdAqzlBz343gwnsQP3YLfGsaHe-jGlhco0Z7ISUd9NokyCiJCRc4NNxJQ83IuCw --- # DistilBERT base cased distilled SQuAD ## Table of Contents - Model Details - How To Get Started With the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact - Technical Specifications - Citation Information - Model Card Authors ## Model Details **Model Description:** The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, adistilled version of BERT, and the paper DistilBERT, adistilled version of BERT: smaller, faster, cheaper and lighter. DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than *bert-base-uncased*, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language understanding benchmark. 
This model is a fine-tuned checkpoint of DistilBERT-base-cased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1. - **Developed by:** Hugging Face - **Model Type:** Transformer-based language model - **Language(s):** English - **License:** Apache 2.0 - **Related Models:** DistilBERT-base-cased - **Resources for more information:** - See this repository for more about Distil\\* (a class of compressed models including this model) - See Sanh et al. (2019) for more information about knowledge distillation and the training procedure ## How to Get Started with the Model Use the code below to get started with the model. Here is how to use this model in PyTorch: And in TensorFlow: ## Uses This model can be used for question answering. #### Misuse and Out-of-scope Use The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. ## Risks, Limitations and Biases **CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example: Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. ## Training #### Training Data The distilbert-base-cased model was trained using the same data as the distilbert-base-uncased model. The distilbert-base-uncased model describes its training data as: > DistilBERT pretrained on the same data as BERT, which is BookCorpus, a dataset consisting of 11,038 unpublished books and English Wikipedia (excluding lists, tables and headers). To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card. #### Training Procedure ##### Preprocessing See the distilbert-base-cased model card for further details. ##### Pretraining See the distilbert-base-cased model card for further details. ## Evaluation As discussed in the model repository > This model reaches an F1 score of 87.1 on the [SQuAD v1.1] dev set (for comparison, BERT bert-base-cased version reaches an F1 score of 88.7). ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We present the hardware type and hours used based on the associated paper. Note that these details are just for training DistilBERT, not including the fine-tuning with SQuAD. - **Hardware Type:** 8 16GB V100 GPUs - **Hours used:** 90 hours - **Cloud Provider:** Unknown - **Compute Region:** Unknown - **Carbon Emitted:** Unknown ## Technical Specifications See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training details. ## Citation Information APA: - Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
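As a hedged sketch of the PyTorch usage the card's getting-started section points at (the question and context are illustrative):

```python
# Hedged sketch: extractive question answering with the distilled SQuAD checkpoint.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad")

context = "The Amazon rainforest, also known as Amazonia, covers most of the Amazon basin of South America."
result = qa(question="What is the Amazon rainforest also known as?", context=context)
print(result)  # dict with 'score', 'start', 'end' and 'answer' (likely "Amazonia")
```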
## Model Card Authors This model card was written by the Hugging Face team.", + "model_explanation_gemini": "A distilled version of BERT fine-tuned for English question answering, achieving high performance on the SQuAD dataset." +} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilbert-base-cased.json b/data/model_data_json/distilbert_distilbert-base-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..3213b6569e0e52d597a73b506610cae4472299d5 --- /dev/null +++ b/data/model_data_json/distilbert_distilbert-base-cased.json @@ -0,0 +1,23 @@ +{ + "model_id": "distilbert/distilbert-base-cased", + "downloads": 230513, + "tags": [ + "transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "distilbert", + "fill-mask", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1910.01108", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 datasets: - bookcorpus - wikipedia --- # Model Card for DistilBERT base model (cased) This model is a distilled version of the BERT base model. It was introduced in this paper. The code for the distillation process can be found here. This model is cased: it does make a difference between english and English. All the training details on the pre-training, the uses, limitations and potential biases (included below) are the same as for DistilBERT-base-uncased. We highly encourage you to check it if you want to know more. ## Model description DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts using the BERT base model. More precisely, it was pretrained with three objectives: - Distillation loss: the model was trained to return the same probabilities as the BERT base model. - Masked language modeling (MLM): this is part of the original training loss of the BERT base model. When taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Cosine embedding loss: the model was also trained to generate hidden states as close as possible to those of the BERT base model. This way, the model learns the same inner representation of the English language as its teacher model, while being faster for inference or downstream tasks. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT2.
### How to use You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given text in PyTorch: and in TensorFlow: ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. It also inherits some of the bias of its teacher model. This bias will also affect all fine-tuned versions of this model. ## Training data DistilBERT pretrained on the same data as BERT, which is BookCorpus, a dataset consisting of 11,038 unpublished books and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are tokenized using WordPiece with a vocabulary size of 30,000. The inputs of the model are then of the form: With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by the [MASK] token. - In 10% of the cases, the masked tokens are replaced by a random token (different) from the one they replace. - In the 10% remaining cases, the masked tokens are left as is. ### Pretraining The model was trained on 8 16GB V100 GPUs for 90 hours. See the training code for all hyperparameter details. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: GLUE test results: | Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | |:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:| | | 81.5 | 87.8 | 88.2 | 90.4 | 47.2 | 85.5 | 85.6 | 60.6 | ### BibTeX entry and citation info
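A hedged sketch of the masked-language-modeling pipeline mentioned in the "How to use" section above; the example sentence is illustrative:

```python
# Hedged sketch: filling a masked token with the cased DistilBERT checkpoint.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="distilbert/distilbert-base-cased")
for prediction in unmasker("Hello, I'm a [MASK] model."):
    print(prediction["token_str"], round(prediction["score"], 3))
```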
", + "model_explanation_gemini": "A distilled, faster version of BERT for masked language modeling and sequence tasks, pretrained on English text to replicate BERT's performance with improved efficiency." +} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilbert-base-multilingual-cased.json b/data/model_data_json/distilbert_distilbert-base-multilingual-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..fb84a898c8cefd313489bba12284831f7efe6102 --- /dev/null +++ b/data/model_data_json/distilbert_distilbert-base-multilingual-cased.json @@ -0,0 +1,125 @@ +{ + "model_id": "distilbert/distilbert-base-multilingual-cased", + "downloads": 4057741, + "tags": [ + "transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "distilbert", + "fill-mask", + "multilingual", + "af", + "sq", + "ar", + "an", + "hy", + "ast", + "az", + "ba", + "eu", + "bar", + "be", + "bn", + "inc", + "bs", + "br", + "bg", + "my", + "ca", + "ceb", + "ce", + "zh", + "cv", + "hr", + "cs", + "da", + "nl", + "en", + "et", + "fi", + "fr", + "gl", + "ka", + "de", + "el", + "gu", + "ht", + "he", + "hi", + "hu", + "is", + "io", + "id", + "ga", + "it", + "ja", + "jv", + "kn", + "kk", + "ky", + "ko", + "la", + "lv", + "lt", + "roa", + "nds", + "lm", + "mk", + "mg", + "ms", + "ml", + "mr", + "mn", + "min", + "ne", + "new", + "nb", + "nn", + "oc", + "fa", + "pms", + "pl", + "pt", + "pa", + "ro", + "ru", + "sco", + "sr", + "scn", + "sk", + "sl", + "aze", + "es", + "su", + "sw", + "sv", + "tl", + "tg", + "th", + "ta", + "tt", + "te", + "tr", + "uk", + "ud", + "uz", + "vi", + "vo", + "war", + "cy", + "fry", + "pnb", + "yo", + "dataset:wikipedia", + "arxiv:1910.01108", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - sq - ar - an - hy - ast - az - ba - eu - bar - be - bn - inc - bs - br - bg - my - ca - ceb - ce - zh - cv - hr - cs - da - nl - en - et - fi - fr - gl - ka - de - el - gu - ht - he - hi - hu - is - io - id - ga - it - ja - jv - kn - kk - ky - ko - la - lv - lt - roa - nds - lm - mk - mg - ms - ml - mr - mn - min - ne - new - nb - nn - oc - fa - pms - pl - pt - pa - ro - ru - sco - sr - hr - scn - sk - sl - aze - es - su - sw - sv - tl - tg - th - ta - tt - te - tr - uk - ud - uz - vi - vo - war - cy - fry - pnb - yo license: apache-2.0 datasets: - wikipedia --- # Model Card for DistilBERT base multilingual (cased) # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. How To Get Started With the Model # Model Details ## Model Description This model is a distilled version of the BERT base multilingual model. The code for the distillation process can be found here. This model is cased: it does make a difference between english and English. The model is trained on the concatenation of Wikipedia in 104 different languages listed here. The model has 6 layers, 768 dimension and 12 heads, totalizing 134M parameters (compared to 177M parameters for mBERT-base). On average, this model, referred to as DistilmBERT, is twice as fast as mBERT-base. We encourage potential users of this model to check out the BERT base multilingual model card to learn more about usage, limitations and potential biases. 
- **Developed by:** Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face) - **Model type:** Transformer-based language model - **Language(s) (NLP):** 104 languages; see full list here - **License:** Apache 2.0 - **Related Models:** BERT base multilingual model - **Resources for more information:** - GitHub Repository - Associated Paper # Uses ## Direct Use and Downstream Use You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at a model like GPT-2. ## Out of Scope Use The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to be factual or true representations of people or events, and therefore using the models to generate such content is out-of-scope for the abilities of this model. # Bias, Risks, and Limitations Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. # Training Details - The model was pretrained with the supervision of bert-base-multilingual-cased on the concatenation of Wikipedia in 104 different languages - The model has 6 layers, a hidden dimension of 768 and 12 heads, totaling 134M parameters. - Further information about the training procedure and data is included in the bert-base-multilingual-cased model card. # Evaluation The model developers report the following accuracy results for DistilmBERT (see GitHub Repo): > Here are the results on the test sets for 6 of the languages available in XNLI. The results are computed in the zero-shot setting (trained on the English portion and evaluated on the target language portion):
| Model | English | Spanish | Chinese | German | Arabic | Urdu |
| :---: | :---: | :---: | :---: | :---: | :---: | :---:|
| mBERT base cased (computed) | 82.1 | 74.6 | 69.1 | 72.3 | 66.4 | 58.5 |
| mBERT base uncased (reported)| 81.4 | 74.3 | 63.8 | 70.5 | 62.1 | 58.3 |
| DistilmBERT | 78.2 | 69.1 | 64.0 | 66.3 | 59.1 | 54.7 |
# Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** More information needed - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation APA - Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
# How to Get Started With the Model You can use the model directly with a pipeline for masked language modeling:", + "model_explanation_gemini": "A distilled multilingual BERT model supporting 104 languages, optimized for efficiency and designed for fine-tuning on tasks like sequence classification, token classification, or question answering." +} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilbert-base-uncased-distilled-squad.json b/data/model_data_json/distilbert_distilbert-base-uncased-distilled-squad.json new file mode 100644 index 0000000000000000000000000000000000000000..242c54305d1070df46c3a24ab8d49a4b49546e9c --- /dev/null +++ b/data/model_data_json/distilbert_distilbert-base-uncased-distilled-squad.json @@ -0,0 +1,23 @@ +{ + "model_id": "distilbert/distilbert-base-uncased-distilled-squad", + "downloads": 151206, + "tags": [ + "transformers", + "pytorch", + "tf", + "tflite", + "coreml", + "safetensors", + "distilbert", + "question-answering", + "en", + "dataset:squad", + "arxiv:1910.01108", + "arxiv:1910.09700", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - squad widget: - text: \"Which name is also used to describe the Amazon rainforest in English?\" context: \"The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \\\"Amazonas\\\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.\" - text: \"How many square kilometers of rainforest is covered in the basin?\" context: \"The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \\\"Amazonas\\\" in their names. 
The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.\" license: apache-2.0 --- # DistilBERT base uncased distilled SQuAD ## Table of Contents - Model Details - How To Get Started With the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact - Technical Specifications - Citation Information - Model Card Authors ## Model Details **Model Description:** The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than *bert-base-uncased* and runs 60% faster while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark. This model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1. - **Developed by:** Hugging Face - **Model Type:** Transformer-based language model - **Language(s):** English - **License:** Apache 2.0 - **Related Models:** DistilBERT-base-uncased - **Resources for more information:** - See this repository for more about Distil\\* (a class of compressed models including this model) - See Sanh et al. (2019) for more information about knowledge distillation and the training procedure ## How to Get Started with the Model Use the code below to get started with the model. Here is how to use this model in PyTorch: And in TensorFlow: ## Uses This model can be used for question answering. #### Misuse and Out-of-scope Use The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. ## Risks, Limitations and Biases **CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example: Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. ## Training #### Training Data The distilbert-base-uncased model card describes its training data as: > DistilBERT pretrained on the same data as BERT, which is BookCorpus, a dataset consisting of 11,038 unpublished books and English Wikipedia (excluding lists, tables and headers). To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card. #### Training Procedure ##### Preprocessing See the distilbert-base-uncased model card for further details. ##### Pretraining See the distilbert-base-uncased model card for further details.
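The PyTorch and TensorFlow snippets referenced in the "How to Get Started" section above were stripped from this dump; here is a minimal sketch of question-answering pipeline usage, reusing (in shortened form) the widget example from this card's metadata:

```python
# Minimal sketch: extractive QA with the pipeline API.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="distilbert/distilbert-base-uncased-distilled-squad",
)
context = (
    "The Amazon rainforest, also known in English as Amazonia or the Amazon "
    "Jungle, is a moist broadleaf forest that covers most of the Amazon basin "
    "of South America."
)
result = qa(
    question="Which name is also used to describe the Amazon rainforest in English?",
    context=context,
)
print(result)  # {'answer': ..., 'score': ..., 'start': ..., 'end': ...}
```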
## Evaluation As discussed in the model repository > This model reaches an F1 score of 86.9 on the [SQuAD v1.1] dev set (for comparison, the BERT bert-base-uncased version reaches an F1 score of 88.5). ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We present the hardware type and hours used based on the associated paper. Note that these details are just for training DistilBERT, not including the fine-tuning with SQuAD. - **Hardware Type:** 8 16GB V100 GPUs - **Hours used:** 90 hours - **Cloud Provider:** Unknown - **Compute Region:** Unknown - **Carbon Emitted:** Unknown ## Technical Specifications See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training details. ## Citation Information APA: - Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. ## Model Card Authors This model card was written by the Hugging Face team.", + "model_explanation_gemini": "Answers questions based on given context by extracting relevant information, fine-tuned for English question-answering tasks like SQuAD." +} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilbert-base-uncased-finetuned-sst-2-english.json b/data/model_data_json/distilbert_distilbert-base-uncased-finetuned-sst-2-english.json new file mode 100644 index 0000000000000000000000000000000000000000..616cfaf6e0e3b68edaeaa670409a6e57059cfc75 --- /dev/null +++ b/data/model_data_json/distilbert_distilbert-base-uncased-finetuned-sst-2-english.json @@ -0,0 +1,26 @@ +{ + "model_id": "distilbert/distilbert-base-uncased-finetuned-sst-2-english", + "downloads": 4720798, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "onnx", + "safetensors", + "distilbert", + "text-classification", + "en", + "dataset:sst2", + "dataset:glue", + "arxiv:1910.01108", + "doi:10.57967/hf/0181", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 datasets: - sst2 - glue model-index: - name: distilbert-base-uncased-finetuned-sst-2-english results: - task: type: text-classification name: Text Classification dataset: name: glue type: glue config: sst2 split: validation metrics: - type: accuracy value: 0.9105504587155964 name: Accuracy verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2YyOGMxYjY2Y2JhMjkxNjIzN2FmMjNiNmM2ZWViNGY3MTNmNWI2YzhiYjYxZTY0ZGUyN2M1NGIxZjRiMjQwZiIsInZlcnNpb24iOjF9.uui0srxV5ZHRhxbYN6082EZdwpnBgubPJ5R2-Wk8HTWqmxYE3QHidevR9LLAhidqGw6Ih93fK0goAXncld_gBg - type: precision value: 0.8978260869565218 name: Precision verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzgwYTYwYjA2MmM0ZTYwNDk0M2NmNTBkZmM2NGNhYzQ1OGEyN2NkNDQ3Mzc2NTQyMmZiNDJiNzBhNGVhZGUyOSIsInZlcnNpb24iOjF9.eHjLmw3K02OU69R2Au8eyuSqT3aBDHgZCn8jSzE3_urD6EUSSsLxUpiAYR4BGLD_U6-ZKcdxVo_A2rdXqvUJDA - type: recall value: 0.9301801801801802 name: Recall verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMGIzM2E3MTI2Mzc2MDYwNmU3ZTVjYmZmZDBkNjY4ZTc5MGY0Y2FkNDU3NjY1MmVkNmE3Y2QzMzAwZDZhOWY1NiIsInZlcnNpb24iOjF9.PUZlqmct13-rJWBXdHm5tdkXgETL9F82GNbbSR4hI8MB-v39KrK59cqzFC2Ac7kJe_DtOeUyosj34O_mFt_1DQ - type: auc value: 0.9716626673402374 name: AUC verified: true verifyToken:
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDM0YWIwZmQ4YjUwOGZmMWU2MjI1YjIxZGQ2MzNjMzRmZmYxMzZkNGFjODhlMDcyZDM1Y2RkMWZlOWQ0MWYwNSIsInZlcnNpb24iOjF9.E7GRlAXmmpEkTHlXheVkuL1W4WNjv4JO3qY_WCVsTVKiO7bUu0UVjPIyQ6g-J1OxsfqZmW3Leli1wY8vPBNNCQ - type: f1 value: 0.9137168141592922 name: F1 verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMGU4MjNmOGYwZjZjMDQ1ZTkyZTA4YTc1MWYwOTM0NDM4ZWY1ZGVkNDY5MzNhYTQyZGFlNzIyZmUwMDg3NDU0NyIsInZlcnNpb24iOjF9.mW5ftkq50Se58M-jm6a2Pu93QeKa3MfV7xcBwvG3PSB_KNJxZWTCpfMQp-Cmx_EMlmI2siKOyd8akYjJUrzJCA - type: loss value: 0.39013850688934326 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTZiNzAyZDc0MzUzMmE1MGJiN2JlYzFiODE5ZTNlNGE4MmI4YzRiMTc2ODEzMTUwZmEzOTgxNzc4YjJjZTRmNiIsInZlcnNpb24iOjF9.VqIC7uYC-ZZ8ss9zQOlRV39YVOOLc5R36sIzCcVz8lolh61ux_5djm2XjpP6ARc6KqEnXC4ZtfNXsX2HZfrtCQ - task: type: text-classification name: Text Classification dataset: name: sst2 type: sst2 config: default split: train metrics: - type: accuracy value: 0.9885521685548412 name: Accuracy verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiY2I3NzU3YzhmMDkxZTViY2M3OTY1NmI0ZTdmMDQxNjNjYzJiZmQxNzczM2E4YmExYTY5ODY0NDBkY2I4ZjNkOCIsInZlcnNpb24iOjF9.4Gtk3FeVc9sPWSqZIaeUXJ9oVlPzm-NmujnWpK2y5s1Vhp1l6Y1pK5_78wW0-NxSvQqV6qd5KQf_OAEpVAkQDA - type: precision value: 0.9881965062029833 name: Precision Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDdlZDMzY2I3MTAwYTljNmM4MGMyMzU2YjAzZDg1NDYwN2ZmM2Y5OWZhMjUyMGJiNjY1YmZiMzFhMDI2ODFhNyIsInZlcnNpb24iOjF9.cqmv6yBxu4St2mykRWrZ07tDsiSLdtLTz2hbqQ7Gm1rMzq9tdlkZ8MyJRxtME_Y8UaOG9rs68pV-gKVUs8wABw - type: precision value: 0.9885521685548412 name: Precision Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjFlYzAzNmE1YjljNjUwNzBjZjEzZDY0ZDQyMmY5ZWM2OTBhNzNjYjYzYTk1YWE1NjU3YTMxZDQwOTE1Y2FkNyIsInZlcnNpb24iOjF9.jnCHOkUHuAOZZ_ZMVOnetx__OVJCS6LOno4caWECAmfrUaIPnPNV9iJ6izRO3sqkHRmxYpWBb-27GJ4N3LU-BQ - type: precision value: 0.9885639626373408 name: Precision Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGUyODFjNjBlNTE2MTY3ZDAxOGU1N2U0YjUyY2NiZjhkOGVmYThjYjBkNGU3NTRkYzkzNDQ2MmMwMjkwMWNiMyIsInZlcnNpb24iOjF9.zTNabMwApiZyXdr76QUn7WgGB7D7lP-iqS3bn35piqVTNsv3wnKjZOaKFVLIUvtBXq4gKw7N2oWxvWc4OcSNDg - type: recall value: 0.9886145346602994 name: Recall Macro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTU1YjlhODU3YTkyNTdiZDcwZGFlZDBiYjY0N2NjMGM2NTRiNjQ3MDNjNGMxOWY2ZGQ4NWU1YmMzY2UwZTI3YSIsInZlcnNpb24iOjF9.xaLPY7U-wHsJ3DDui1yyyM-xWjL0Jz5puRThy7fczal9x05eKEQ9s0a_WD-iLmapvJs0caXpV70hDe2NLcs-DA - type: recall value: 0.9885521685548412 name: Recall Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODE0YTU0MDBlOGY4YzU0MjY5MzA3OTk2OGNhOGVkMmU5OGRjZmFiZWI2ZjY5ODEzZTQzMTI0N2NiOTVkNDliYiIsInZlcnNpb24iOjF9.SOt1baTBbuZRrsvGcak2sUwoTrQzmNCbyV2m1_yjGsU48SBH0NcKXicidNBSnJ6ihM5jf_Lv_B5_eOBkLfNWDQ - type: recall value: 0.9885521685548412 name: Recall Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWNkNmM0ZGRlNmYxYzIwNDk4OTI5MzIwZWU1NzZjZDVhMDcyNDFlMjBhNDQxODU5OWMwMWNhNGEzNjY3ZGUyOSIsInZlcnNpb24iOjF9.b15Fh70GwtlG3cSqPW-8VEZT2oy0CtgvgEOtWiYonOovjkIQ4RSLFVzVG-YfslaIyfg9RzMWzjhLnMY7Bpn2Aw - type: f1 value: 0.9884019815052447 name: F1 Macro verified: true verifyToken: 
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYmM4NjQ5Yjk5ODRhYTU1MTY3MmRhZDBmODM1NTg3OTFiNWM4NDRmYjI0MzZkNmQ1MzE3MzcxODZlYzBkYTMyYSIsInZlcnNpb24iOjF9.74RaDK8nBVuGRl2Se_-hwQvP6c4lvVxGHpcCWB4uZUCf2_HoC9NT9u7P3pMJfH_tK2cpV7U3VWGgSDhQDi-UBQ - type: f1 value: 0.9885521685548412 name: F1 Micro verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDRmYWRmMmQ0YjViZmQxMzhhYTUyOTE1MTc0ZDU1ZjQyZjFhMDYzYzMzZDE0NzZlYzQyOTBhMTBhNmM5NTlkMiIsInZlcnNpb24iOjF9.VMn_psdAHIZTlW6GbjERZDe8MHhwzJ0rbjV_VJyuMrsdOh5QDmko-wEvaBWNEdT0cEKsbggm-6jd3Gh81PfHAQ - type: f1 value: 0.9885546181087554 name: F1 Weighted verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjUyZWFhZDZhMGQ3MzBmYmRiNDVmN2FkZDBjMjk3ODk0OTAxNGZkMWE0NzU5ZjI0NzE0NGZiNzM0N2Y2NDYyOSIsInZlcnNpb24iOjF9.YsXBhnzEEFEW6jw3mQlFUuIrW7Gabad2Ils-iunYJr-myg0heF8NEnEWABKFE1SnvCWt-69jkLza6SupeyLVCA - type: loss value: 0.040652573108673096 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTc3YjU3MjdjMzkxODA5MjU5NGUyY2NkMGVhZDg3ZWEzMmU1YWVjMmI0NmU2OWEyZTkzMTVjNDZiYTc0YjIyNCIsInZlcnNpb24iOjF9.lA90qXZVYiILHMFlr6t6H81Oe8a-4KmeX-vyCC1BDia2ofudegv6Vb46-4RzmbtuKeV6yy6YNNXxXxqVak1pAg --- # DistilBERT base uncased finetuned SST-2 ## Table of Contents - Model Details - How to Get Started With the Model - Uses - Risks, Limitations and Biases - Training ## Model Details **Model Description:** This model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned on SST-2. This model reaches an accuracy of 91.3 on the dev set (for comparison, the BERT bert-base-uncased version reaches an accuracy of 92.7). - **Developed by:** Hugging Face - **Model Type:** Text Classification - **Language(s):** English - **License:** Apache-2.0 - **Parent Model:** For more details about DistilBERT, we encourage users to check out this model card. - **Resources for more information:** - Model Documentation - DistilBERT paper ## How to Get Started With the Model Example of single-label classification: ## Uses #### Direct Use This model can be used for topic classification. You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. #### Misuse and Out-of-scope Use The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. ## Risks, Limitations and Biases Based on a few experiments, we observed that this model could produce biased predictions that target underrepresented populations. For instance, for sentences that differ only in the country they mention, this binary classification model will give radically different probabilities for the positive label (0.89 if the country is France, but 0.08 if the country is Afghanistan), when nothing in the input indicates such a strong semantic shift. In this colab, Aurélien Géron made an interesting map plotting these probabilities for each country. We strongly advise users to thoroughly probe these aspects on their use-cases in order to evaluate the risks of this model. We recommend looking at the following bias evaluation datasets as a place to start: WinoBias, WinoGender, Stereoset.
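The classification snippet referenced in "How to Get Started" above was stripped from this dump; a minimal sketch of single-label classification with the pipeline API:

```python
# Minimal sketch: sentiment classification with the fine-tuned SST-2 checkpoint.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("I love this movie!"))  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```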
# Training #### Training Data The authors use the Stanford Sentiment Treebank (SST-2) corpus for the model. #### Training Procedure ###### Fine-tuning hyper-parameters - learning_rate = 1e-5 - batch_size = 32 - warmup = 600 - max_seq_length = 128 - num_train_epochs = 3.0", + "model_explanation_gemini": "A distilled BERT model fine-tuned for English text classification on the SST-2 dataset to predict sentiment labels." +} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilbert-base-uncased.json b/data/model_data_json/distilbert_distilbert-base-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..878977d8e875086bfc69502225ec82f572ab2731 --- /dev/null +++ b/data/model_data_json/distilbert_distilbert-base-uncased.json @@ -0,0 +1,25 @@ +{ + "model_id": "distilbert/distilbert-base-uncased", + "downloads": 10831493, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "distilbert", + "fill-mask", + "exbert", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1910.01108", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: apache-2.0 datasets: - bookcorpus - wikipedia --- # DistilBERT base model (uncased) This model is a distilled version of the BERT base model. It was introduced in this paper. The code for the distillation process can be found here. This model is uncased: it does not make a difference between english and English. ## Model description DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts using the BERT base model. More precisely, it was pretrained with three objectives: - Distillation loss: the model was trained to return the same probabilities as the BERT base model. - Masked language modeling (MLM): this is part of the original training loss of the BERT base model. When taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Cosine embedding loss: the model was also trained to generate hidden states as close as possible to those of the BERT base model. This way, the model learns the same inner representation of the English language as its teacher model, while being faster for inference or downstream tasks. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at a model like GPT-2.
### How to use You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given text in PyTorch: and in TensorFlow: ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. It also inherits some of the bias of its teacher model. This bias will also affect all fine-tuned versions of this model. ## Training data DistilBERT pretrained on the same data as BERT, which is BookCorpus, a dataset consisting of 11,038 unpublished books and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form: With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following:
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by the [MASK] token.
- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
- In the 10% remaining cases, the masked tokens are left as is.
### Pretraining The model was trained on 8 16 GB V100 GPUs for 90 hours. See the training code for all hyperparameter details. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results. GLUE test results:
| Task | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
|      | 82.2 | 88.5 | 89.2 | 91.3  | 51.3 | 85.8  | 87.5 | 59.9 |
### BibTeX entry and citation info ", + "model_explanation_gemini": "A distilled, faster version of BERT for masked language modeling and fine-tuning on downstream NLP tasks like sequence classification, trained via self-supervised learning on English text."
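As with the earlier card, the feature-extraction snippets for PyTorch and TensorFlow were stripped; a minimal sketch of the PyTorch path:

```python
# Minimal sketch: extracting hidden-state features with distilbert-base-uncased.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert/distilbert-base-uncased")

inputs = tokenizer("Replace me by any text you'd like.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
```

The same pattern works in TensorFlow with `TFAutoModel` and `return_tensors="tf"`.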
+} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilgpt2.json b/data/model_data_json/distilbert_distilgpt2.json new file mode 100644 index 0000000000000000000000000000000000000000..e99548b0757ad5aecb126fe30e4dde6bf4b8da76 --- /dev/null +++ b/data/model_data_json/distilbert_distilgpt2.json @@ -0,0 +1,33 @@ +{ + "model_id": "distilbert/distilgpt2", + "downloads": 2816945, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "tflite", + "rust", + "coreml", + "safetensors", + "gpt2", + "text-generation", + "exbert", + "en", + "dataset:openwebtext", + "arxiv:1910.01108", + "arxiv:2201.08542", + "arxiv:2203.12574", + "arxiv:1910.09700", + "arxiv:1503.02531", + "license:apache-2.0", + "model-index", + "co2_eq_emissions", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: apache-2.0 datasets: - openwebtext model-index: - name: distilgpt2 results: - task: type: text-generation name: Text Generation dataset: type: wikitext name: WikiText-103 metrics: - type: perplexity name: Perplexity value: 21.1 co2_eq_emissions: 149200 --- # DistilGPT2 DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the smallest version of Generative Pre-trained Transformer 2 (GPT-2). Like GPT-2, DistilGPT2 can be used to generate text. Users of this model card should also consider information about the design, training, and limitations of GPT-2. ## Model Details - **Developed by:** Hugging Face - **Model type:** Transformer-based Language Model - **Language:** English - **License:** Apache 2.0 - **Model Description:** DistilGPT2 is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using knowledge distillation and was designed to be a faster, lighter version of GPT-2. - **Resources for more information:** See this repository for more about Distil\\* (a class of compressed models including Distilled-GPT2), Sanh et al. (2019) for more information about knowledge distillation and the training procedure, and this page for more about GPT-2. ## Uses, Limitations and Risks #### Limitations and Risks
**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.** As the developers of GPT-2 (OpenAI) note in their model card, “language models like GPT-2 reflect the biases inherent to the systems they were trained on.” Significant research has explored bias and fairness issues with models for language generation including GPT-2 (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). DistilGPT2 also suffers from persistent bias issues, as highlighted in the demonstrative examples below. Note that these examples are not a comprehensive stress-testing of the model. Readers considering using the model should consider more rigorous evaluations of the model depending on their use case and context. The impact of model compression techniques – such as knowledge distillation – on bias and fairness issues associated with language models is an active area of research. For example: - Silva, Tambwekar and Gombolay (2021) find that distilled versions of BERT and RoBERTa consistently exhibit statistically significant bias (with regard to gender and race) with effect sizes larger than the teacher models. - Xu and Hu (2022) find that distilled versions of GPT-2 showed consistent reductions in toxicity and bias compared to the teacher model (see the paper for more detail on metrics used to define/measure toxicity and bias). - Gupta et al. (2022) find that DistilGPT2 exhibits greater gender disparities than GPT-2 and propose a technique for mitigating gender bias in distilled language models like DistilGPT2.
#### Potential Uses Since DistilGPT2 is a distilled version of GPT-2, it is intended to be used for similar use cases with the increased functionality of being smaller and easier to run than the base model. The developers of GPT-2 state in their model card that they envisioned GPT-2 would be used by researchers to better understand large-scale generative language models, with possible secondary use cases including: > - *Writing assistance: Grammar assistance, autocompletion (for normal prose or code)* > - *Creative writing and art: exploring the generation of creative, fictional texts; aiding creation of poetry and other literary art.* > - *Entertainment: Creation of games, chat bots, and amusing generations.* Using DistilGPT2, the Hugging Face team built the Write With Transformers web app, which allows users to play with the model to generate text directly from their browser. #### Out-of-scope Uses OpenAI states in the GPT-2 model card: > Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true. > > Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. ### How to Get Started with the Model
Click to expand *Be sure to read the sections on in-scope and out-of-scope uses and limitations of the model for further information on how to use the model.* Using DistilGPT2 is similar to using GPT-2. DistilGPT2 can be used directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility: Here is how to use this model to get the features of a given text in PyTorch: And in TensorFlow:
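The generation snippet referenced above was stripped from this dump; a minimal sketch of seeded text generation with the pipeline API:

```python
# Minimal sketch: seeded text generation with DistilGPT2.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="distilbert/distilgpt2")
set_seed(42)  # generation is stochastic; fixing the seed makes runs reproducible
print(generator("Hello, I'm a language model,", max_length=30, num_return_sequences=3))
```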
## Training Data DistilGPT2 was trained using OpenWebTextCorpus, an open-source reproduction of OpenAI’s WebText dataset, which was used to train GPT-2. See the OpenWebTextCorpus Dataset Card for additional information about OpenWebTextCorpus and Radford et al. (2019) for additional information about WebText. ## Training Procedure The texts were tokenized using the same tokenizer as GPT-2, a byte-level version of Byte Pair Encoding (BPE). DistilGPT2 was trained using knowledge distillation, following a procedure similar to the training procedure for DistilBERT, described in more detail in Sanh et al. (2019). ## Evaluation Results The creators of DistilGPT2 report that, on the WikiText-103 benchmark, GPT-2 reaches a perplexity on the test set of 16.3 compared to 21.1 for DistilGPT2 (after fine-tuning on the train set). ## Environmental Impact *Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.* - **Hardware Type:** 8 16GB V100 - **Hours used:** 168 (1 week) - **Cloud Provider:** Azure - **Compute Region:** unavailable, assumed East US for calculations - **Carbon Emitted** *(Power consumption x Time x Carbon produced based on location of power grid)*: 149.2 kg eq. CO2 ## Citation ## Glossary -
**Knowledge Distillation**: As described in Sanh et al. (2019), “knowledge distillation is a compression technique in which a compact model – the student – is trained to reproduce the behavior of a larger model – the teacher – or an ensemble of models.” Also see Bucila et al. (2006) and Hinton et al. (2015). ", + "model_explanation_gemini": "Generates English text with fewer parameters than GPT-2 while maintaining similar functionality." +} \ No newline at end of file diff --git a/data/model_data_json/distilbert_distilroberta-base.json b/data/model_data_json/distilbert_distilroberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..e217841900f7b0b5f96c968b52b669078408ce26 --- /dev/null +++ b/data/model_data_json/distilbert_distilroberta-base.json @@ -0,0 +1,25 @@ +{ + "model_id": "distilbert/distilroberta-base", + "downloads": 929056, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "roberta", + "fill-mask", + "exbert", + "en", + "dataset:openwebtext", + "arxiv:1910.01108", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: apache-2.0 datasets: - openwebtext --- # Model Card for DistilRoBERTa base # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. How To Get Started With the Model # Model Details ## Model Description This model is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found here. This model is case-sensitive: it makes a difference between english and English. The model has 6 layers, a hidden dimension of 768 and 12 heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base). On average, DistilRoBERTa is twice as fast as RoBERTa-base. We encourage users of this model card to check out the RoBERTa-base model card to learn more about usage, limitations and potential biases. - **Developed by:** Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face) - **Model type:** Transformer-based language model - **Language(s) (NLP):** English - **License:** Apache 2.0 - **Related Models:** RoBERTa-base model card - **Resources for more information:** - GitHub Repository - Associated Paper # Uses ## Direct Use and Downstream Use You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at a model like GPT-2. ## Out of Scope Use The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to be factual or true representations of people or events, and therefore using the models to generate such content is out-of-scope for the abilities of this model. # Bias, Risks, and Limitations Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).
Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example: ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. # Training Details DistilRoBERTa was pre-trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset (it is ~4 times less training data than the teacher RoBERTa). See the roberta-base model card for further details on training. # Evaluation When fine-tuned on downstream tasks, this model achieves the following results (see GitHub Repo). GLUE test results:
| Task | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
|      | 84.0 | 89.4 | 90.8 | 92.5  | 59.3 | 88.3  | 86.6 | 67.9 |
# Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** More information needed - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation APA - Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. # How to Get Started With the Model You can use the model directly with a pipeline for masked language modeling: ", + "model_explanation_gemini": "A distilled version of RoBERTa-base for English NLP tasks, optimized for faster performance and intended for fine-tuning on sequence classification, token classification, or question answering." +} \ No newline at end of file diff --git a/data/model_data_json/ds4sd_SmolDocling-256M-preview.json b/data/model_data_json/ds4sd_SmolDocling-256M-preview.json new file mode 100644 index 0000000000000000000000000000000000000000..2e6fd27bb92e1987d70ce1860ad712cd5faadb67 --- /dev/null +++ b/data/model_data_json/ds4sd_SmolDocling-256M-preview.json @@ -0,0 +1,21 @@ +{ + "model_id": "ds4sd/SmolDocling-256M-preview", + "downloads": 80269, + "tags": [ + "transformers", + "onnx", + "safetensors", + "idefics3", + "image-text-to-text", + "conversational", + "en", + "arxiv:2503.11576", + "arxiv:2305.03393", + "base_model:HuggingFaceTB/SmolVLM-256M-Instruct", + "base_model:quantized:HuggingFaceTB/SmolVLM-256M-Instruct", + "license:cdla-permissive-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: - HuggingFaceTB/SmolVLM-256M-Instruct language: - en library_name: transformers license: cdla-permissive-2.0 pipeline_tag: image-text-to-text ---
\"SmolDocling\"

SmolDocling-256M-preview

SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling's most popular features while ensuring full compatibility with Docling through seamless support for DoclingDocuments.

This model was presented in the paper SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion. ### 🚀 Features:
- 🏷️ **DocTags for Efficient Tokenization** – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with **DoclingDocuments**.
- 🔍 **OCR (Optical Character Recognition)** – Extracts text accurately from images.
- 📐 **Layout and Localization** – Preserves document structure and document element **bounding boxes**.
- 💻 **Code Recognition** – Detects and formats code blocks including indentation.
- 🔢 **Formula Recognition** – Identifies and processes mathematical expressions.
- 📊 **Chart Recognition** – Extracts and interprets chart data.
- 📑 **Table Recognition** – Supports column and row headers for structured table extraction.
- 🖼️ **Figure Classification** – Differentiates figures and graphical elements.
- 📝 **Caption Correspondence** – Links captions to relevant images and figures.
- 📜 **List Grouping** – Organizes and structures list elements correctly.
- 📄 **Full-Page Conversion** – Processes entire pages for comprehensive document conversion including all page elements (code, equations, tables, charts etc.)
- 🔲 **OCR with Bounding Boxes** – OCR regions using a bounding box.
- 📂 **General Document Processing** – Trained for both scientific and non-scientific documents.
- 🔄 **Seamless Docling Integration** – Import into **Docling** and export in multiple formats.
- 💨 **Fast inference using VLLM** – Avg of 0.35 secs per page on A100 GPU.
### 🚧 *Coming soon!*
- 📊 **Better chart recognition 🛠️**
- 📚 **One shot multi-page inference ⏱️**
- 🧪 **Chemical Recognition**
- 📙 **Datasets**
## ⌨️ Get started (code examples) You can use **transformers**, **vllm**, or **onnx** to perform inference, and Docling to convert results to a variety of output formats (md, html, etc.):
📄 Single-page image inference using Transformers 🤖
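The code for this collapsed section did not survive in this dump; below is a minimal sketch assuming the Idefics3-style transformers API that the model's tags indicate. Names such as `page.png` are placeholders.

```python
# Hedged sketch: single-page DocTags inference (Idefics3-style API assumed).
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("ds4sd/SmolDocling-256M-preview")
model = AutoModelForVision2Seq.from_pretrained("ds4sd/SmolDocling-256M-preview")

image = Image.open("page.png")  # placeholder input page
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this page to docling."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

generated_ids = model.generate(**inputs, max_new_tokens=8192)
# Drop the prompt tokens; what remains is the DocTags sequence.
doctags = processor.batch_decode(
    generated_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=False
)[0]
print(doctags)
```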
🚀 Fast Batch Inference Using VLLM
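Likewise for batch inference, a hedged sketch using vLLM's multimodal inputs; the chat prompt string is an assumption (verify it against the processor's chat template before relying on it):

```python
# Hedged sketch: batch DocTags inference with vLLM (prompt format assumed).
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="ds4sd/SmolDocling-256M-preview", limit_mm_per_prompt={"image": 1})
params = SamplingParams(temperature=0.0, max_tokens=8192)

# Assumed SmolVLM-style prompt; check processor.apply_chat_template for the exact format.
prompt = "<|im_start|>User:<image>Convert this page to docling.<end_of_utterance>\nAssistant:"
pages = [Image.open(p) for p in ["page_1.png", "page_2.png"]]  # placeholder inputs

outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": {"image": img}} for img in pages],
    sampling_params=params,
)
for out in outputs:
    print(out.outputs[0].text)  # DocTags per page
```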
ONNX Inference
💻 Local inference on Apple Silicon with MLX: see here ## DocTags DocTags create a clear and structured system of tags and rules that separate text from the document's structure. This makes things easier for Image-to-Sequence models by reducing confusion. On the other hand, converting directly to formats like HTML or Markdown can be messy—it often loses details, doesn’t clearly show the document’s layout, and increases the number of tokens, making processing less efficient. DocTags are integrated with Docling, which allows export to HTML, Markdown, and JSON. These exports can be offloaded to the CPU, reducing token generation overhead and improving efficiency. ## Supported Instructions
| Description | Instruction | Comment |
|-------------|-------------|---------|
| Full conversion | Convert this page to docling. | DocTags representation |
| Chart | Convert chart to table. | (e.g., <chart>) |
| Formula | Convert formula to LaTeX. | (e.g., <formula>) |
| Code | Convert code to text. | (e.g., <code>) |
| Table | Convert table to OTSL. | (e.g., <otsl>) OTSL: |
| Actions and Pipelines | OCR the text in a specific location: <loc_155><loc_233><loc_206><loc_237> | |
| | Identify element at: <loc_247><loc_482><loc_252><loc_486> | |
| | Find all 'text' elements on the page, retrieve all section headers. | |
| | Detect footer elements on the page. | |
#### Model Summary - **Developed by:** Docling Team, IBM Research - **Model type:** Multi-modal model (image+text) - **Language(s) (NLP):** English - **License:** Apache 2.0 - **Architecture:** Based on Idefics3 (see technical summary) - **Finetuned from model:** Based on SmolVLM-256M-Instruct **Repository:** Docling **Paper:** arXiv **Project Page:** Hugging Face **Citation:** **Demo:** HF Space" +} \ No newline at end of file diff --git a/data/model_data_json/ds4sd_docling-models.json b/data/model_data_json/ds4sd_docling-models.json new file mode 100644 index 0000000000000000000000000000000000000000..3b8448d0b278e66cb21744375bb5db32443df2db --- /dev/null +++ b/data/model_data_json/ds4sd_docling-models.json @@ -0,0 +1,16 @@ +{ + "model_id": "ds4sd/docling-models", + "downloads": 849852, + "tags": [ + "transformers", + "safetensors", + "arxiv:2408.09869", + "arxiv:2206.01062", + "doi:10.57967/hf/3036", + "license:cdla-permissive-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cdla-permissive-2.0 --- # Docling Models This page contains models that power the PDF document conversion package docling. ## Layout Model The layout model will take an image of a page and apply the RT-DETR model in order to find different layout components. It currently detects the labels: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title. As a reference (from the DocLayNet-paper), this is the performance of standard object detection methods on the DocLayNet dataset compared to human evaluation:
|                | human | MRCNN R50 | MRCNN R101 | FRCNN R101 | YOLO v5x6 |
|----------------|-------|-----------|------------|------------|-----------|
| Caption        | 84-89 | 68.4 | 71.5 | 70.1 | 77.7 |
| Footnote       | 83-91 | 70.9 | 71.8 | 73.7 | 77.2 |
| Formula        | 83-85 | 60.1 | 63.4 | 63.5 | 66.2 |
| List-item      | 87-88 | 81.2 | 80.8 | 81.0 | 86.2 |
| Page-footer    | 93-94 | 61.6 | 59.3 | 58.9 | 61.1 |
| Page-header    | 85-89 | 71.9 | 70.0 | 72.0 | 67.9 |
| Picture        | 69-71 | 71.7 | 72.7 | 72.0 | 77.1 |
| Section-header | 83-84 | 67.6 | 69.3 | 68.4 | 74.6 |
| Table          | 77-81 | 82.2 | 82.9 | 82.2 | 86.3 |
| Text           | 84-86 | 84.6 | 85.8 | 85.4 | 88.1 |
| Title          | 60-72 | 76.7 | 80.4 | 79.9 | 82.7 |
| All            | 82-83 | 72.4 | 73.5 | 73.4 | 76.8 |
## TableFormer The TableFormer model will identify the structure of the table, starting from an image of a table. It uses the predicted table regions of the layout model to identify the tables. TableFormer has SOTA table structure identification:
| Model (TEDS) | Simple table | Complex table | All tables |
| ------------ | ------------ | ------------- | ---------- |
| Tabula       | 78.0 | 57.8 | 67.9 |
| Traprange    | 60.8 | 49.9 | 55.4 |
| Camelot      | 80.0 | 66.0 | 73.0 |
| Acrobat Pro  | 68.9 | 61.8 | 65.3 |
| EDD          | 91.2 | 85.4 | 88.3 |
| TableFormer  | 95.4 | 90.1 | 93.6 |
## References", + "model_explanation_gemini": "Detects document layout components (e.g., text, tables, headers) and identifies table structures in PDF documents using RT-DETR and TableFormer models."
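These checkpoints are normally consumed through the docling package rather than loaded directly; a hedged sketch of that path, based on docling's documented top-level API (treat the exact names as assumptions):

```python
# Hedged sketch: converting a PDF with docling, which pulls in these
# layout and TableFormer weights under the hood.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()            # fetches the model weights on first use
result = converter.convert("report.pdf")   # placeholder input file
print(result.document.export_to_markdown())
```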
+} \ No newline at end of file diff --git a/data/model_data_json/dslim_bert-base-NER-uncased.json b/data/model_data_json/dslim_bert-base-NER-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..8da82d9d73c851e9bea3f280c5c45463b0c84fb4 --- /dev/null +++ b/data/model_data_json/dslim_bert-base-NER-uncased.json @@ -0,0 +1,19 @@ +{ + "model_id": "dslim/bert-base-NER-uncased", + "downloads": 258934, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "token-classification", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit ---", + "model_explanation_gemini": "Identifies and classifies named entities like persons, organizations, and locations in uncased English text." +} \ No newline at end of file diff --git a/data/model_data_json/dslim_bert-base-NER.json b/data/model_data_json/dslim_bert-base-NER.json new file mode 100644 index 0000000000000000000000000000000000000000..a279696a89aadc5c565fa39929910200eb4997c1 --- /dev/null +++ b/data/model_data_json/dslim_bert-base-NER.json @@ -0,0 +1,24 @@ +{ + "model_id": "dslim/bert-base-NER", + "downloads": 1732714, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "bert", + "token-classification", + "en", + "dataset:conll2003", + "arxiv:1810.04805", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - conll2003 license: mit model-index: - name: dslim/bert-base-NER results: - task: type: token-classification name: Token Classification dataset: name: conll2003 type: conll2003 config: conll2003 split: test metrics: - name: Accuracy type: accuracy value: 0.9118041001560013 verified: true - name: Precision type: precision value: 0.9211550382257732 verified: true - name: Recall type: recall value: 0.9306415698281261 verified: true - name: F1 type: f1 value: 0.9258740048459675 verified: true - name: loss type: loss value: 0.48325642943382263 verified: true --- # bert-base-NER If my open source models have been useful to you, please consider supporting me in building small, useful AI models for everyone (and help me afford med school / help out my parents financially). Thanks!
\"Buy ## Model description **bert-base-NER** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** and achieves **state-of-the-art performance** for the NER task. It has been trained to recognize four types of entities: location (LOC), organizations (ORG), person (PER) and Miscellaneous (MISC). Specifically, this model is a *bert-base-cased* model that was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset. If you'd like to use a larger BERT-large model fine-tuned on the same dataset, a **bert-large-NER** version is also available. ### Available NER models | Model Name | Description | Parameters | |-------------------|-------------|------------------| | distilbert-NER **(NEW!)** | Fine-tuned DistilBERT - a smaller, faster, lighter version of BERT | 66M | | bert-large-NER | Fine-tuned bert-large-cased - larger model with slightly better performance | 340M | | bert-base-NER-(uncased) | Fine-tuned bert-base, available in both cased and uncased versions | 110M | ## Intended uses & limitations #### How to use You can use this model with Transformers *pipeline* for NER. #### Limitations and bias This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. Furthermore, the model occassionally tags subword tokens as entities and post-processing of results may be necessary to handle those cases. ## Training data This model was fine-tuned on English version of the standard CoNLL-2003 Named Entity Recognition dataset. The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes: Abbreviation|Description -|- O|Outside of a named entity B-MISC |Beginning of a miscellaneous entity right after another miscellaneous entity I-MISC | Miscellaneous entity B-PER |Beginning of a person’s name right after another person’s name I-PER |Person’s name B-ORG |Beginning of an organization right after another organization I-ORG |organization B-LOC |Beginning of a location right after another location I-LOC |Location ### CoNLL-2003 English Dataset Statistics This dataset was derived from the Reuters corpus which consists of Reuters news stories. You can read more about how this dataset was created in the CoNLL-2003 paper. #### # of training examples per entity type Dataset|LOC|MISC|ORG|PER -|-|-|-|- Train|7140|3438|6321|6600 Dev|1837|922|1341|1842 Test|1668|702|1661|1617 #### # of articles/sentences/tokens per dataset Dataset |Articles |Sentences |Tokens -|-|-|- Train |946 |14,987 |203,621 Dev |216 |3,466 |51,362 Test |231 |3,684 |46,435 ## Training procedure This model was trained on a single NVIDIA V100 GPU with recommended hyperparameters from the original BERT paper which trained & evaluated the model on CoNLL-2003 NER task. ## Eval results metric|dev|test -|-|- f1 |95.1 |91.3 precision |95.0 |90.7 recall |95.3 |91.9 The test metrics are a little lower than the official Google BERT results which encoded document context & experimented with CRF. More on replicating the original results here. ### BibTeX entry and citation info", + "model_explanation_gemini": "Fine-tuned BERT model for named entity recognition, identifying locations, organizations, persons, and miscellaneous entities in text." 
+} \ No newline at end of file diff --git a/data/model_data_json/dslim_bert-large-NER.json b/data/model_data_json/dslim_bert-large-NER.json new file mode 100644 index 0000000000000000000000000000000000000000..525b007a0c8c8ef110d8072d9675d7631006bf3c --- /dev/null +++ b/data/model_data_json/dslim_bert-large-NER.json @@ -0,0 +1,24 @@ +{ + "model_id": "dslim/bert-large-NER", + "downloads": 338834, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "bert", + "token-classification", + "en", + "dataset:conll2003", + "arxiv:1810.04805", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - conll2003 license: mit model-index: - name: dslim/bert-large-NER results: - task: type: token-classification name: Token Classification dataset: name: conll2003 type: conll2003 config: conll2003 split: test metrics: - name: Accuracy type: accuracy value: 0.9031688753722759 verified: true - name: Precision type: precision value: 0.920025068328604 verified: true - name: Recall type: recall value: 0.9193688678588825 verified: true - name: F1 type: f1 value: 0.9196968510445761 verified: true - name: loss type: loss value: 0.5085050463676453 verified: true --- # bert-large-NER If my open source models have been useful to you, please consider supporting me in building small, useful AI models for everyone (and help me afford med school / help out my parents financially). Thanks! ## Model description **bert-large-NER** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** and achieves **state-of-the-art performance** for the NER task. It has been trained to recognize four types of entities: location (LOC), organizations (ORG), person (PER) and Miscellaneous (MISC). Specifically, this model is a *bert-large-cased* model that was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset. If you'd like to use a smaller BERT model fine-tuned on the same dataset, a **bert-base-NER** version is also available. ## Intended uses & limitations #### How to use You can use this model with Transformers *pipeline* for NER. #### Limitations and bias This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. Furthermore, the model occasionally tags subword tokens as entities and post-processing of results may be necessary to handle those cases. ## Training data This model was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset. The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes:
| Abbreviation | Description |
|-|-|
| O | Outside of a named entity |
| B-MISC | Beginning of a miscellaneous entity right after another miscellaneous entity |
| I-MISC | Miscellaneous entity |
| B-PER | Beginning of a person’s name right after another person’s name |
| I-PER | Person’s name |
| B-ORG | Beginning of an organization right after another organization |
| I-ORG | Organization |
| B-LOC | Beginning of a location right after another location |
| I-LOC | Location |
### CoNLL-2003 English Dataset Statistics This dataset was derived from the Reuters corpus which consists of Reuters news stories.
You can read more about how this dataset was created in the CoNLL-2003 paper. #### # of training examples per entity type
| Dataset | LOC | MISC | ORG | PER |
|-|-|-|-|-|
| Train | 7140 | 3438 | 6321 | 6600 |
| Dev | 1837 | 922 | 1341 | 1842 |
| Test | 1668 | 702 | 1661 | 1617 |
#### # of articles/sentences/tokens per dataset
| Dataset | Articles | Sentences | Tokens |
|-|-|-|-|
| Train | 946 | 14,987 | 203,621 |
| Dev | 216 | 3,466 | 51,362 |
| Test | 231 | 3,684 | 46,435 |
## Training procedure This model was trained on a single NVIDIA V100 GPU with recommended hyperparameters from the original BERT paper which trained & evaluated the model on the CoNLL-2003 NER task. ## Eval results
| metric | dev | test |
|-|-|-|
| f1 | 95.7 | 91.7 |
| precision | 95.3 | 91.2 |
| recall | 96.1 | 92.3 |
The test metrics are a little lower than the official Google BERT results which encoded document context & experimented with CRF. More on replicating the original results here. ### BibTeX entry and citation info", + "model_explanation_gemini": "Fine-tuned for named entity recognition, this BERT model identifies four entity types (LOC, ORG, PER, MISC) with state-of-the-art performance on the CoNLL-2003 dataset." +} \ No newline at end of file diff --git a/data/model_data_json/echarlaix_tiny-random-latent-consistency.json b/data/model_data_json/echarlaix_tiny-random-latent-consistency.json new file mode 100644 index 0000000000000000000000000000000000000000..1f5835b8957ad378708c98751f8b884508472f23 --- /dev/null +++ b/data/model_data_json/echarlaix_tiny-random-latent-consistency.json @@ -0,0 +1,13 @@ +{ + "model_id": "echarlaix/tiny-random-latent-consistency", + "downloads": 176124, + "tags": [ + "diffusers", + "safetensors", + "license:apache-2.0", + "diffusers:LatentConsistencyModelPipeline", + "region:us" + ], + "description": "--- license: apache-2.0 ---", + "model_explanation_gemini": "Generates random latent representations for consistency tasks under the Apache 2.0 license." +} \ No newline at end of file diff --git a/data/model_data_json/echarlaix_tiny-random-mistral.json b/data/model_data_json/echarlaix_tiny-random-mistral.json new file mode 100644 index 0000000000000000000000000000000000000000..7fe12c7e33155661d5594e520343ce2e770c1ead --- /dev/null +++ b/data/model_data_json/echarlaix_tiny-random-mistral.json @@ -0,0 +1,17 @@ +{ + "model_id": "echarlaix/tiny-random-mistral", + "downloads": 86229, + "tags": [ + "transformers", + "pytorch", + "mistral", + "text-generation", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 ---", + "model_explanation_gemini": "A tiny, randomly initialized Mistral model for testing and experimentation purposes." +} \ No newline at end of file diff --git a/data/model_data_json/echarlaix_tiny-random-stable-diffusion-xl.json b/data/model_data_json/echarlaix_tiny-random-stable-diffusion-xl.json new file mode 100644 index 0000000000000000000000000000000000000000..ee121908952919b90cb00bcf65ec82055a2b6f38 --- /dev/null +++ b/data/model_data_json/echarlaix_tiny-random-stable-diffusion-xl.json @@ -0,0 +1,14 @@ +{ + "model_id": "echarlaix/tiny-random-stable-diffusion-xl", + "downloads": 250493, + "tags": [ + "diffusers", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: apache-2.0 ---", + "model_explanation_gemini": "Generates random, tiny outputs for stable diffusion XL tasks under the Apache 2.0 license."
+} \ No newline at end of file diff --git a/data/model_data_json/emilyalsentzer_Bio_ClinicalBERT.json b/data/model_data_json/emilyalsentzer_Bio_ClinicalBERT.json new file mode 100644 index 0000000000000000000000000000000000000000..443d40ba77120c731aa62b5d7648e792bbf87e62 --- /dev/null +++ b/data/model_data_json/emilyalsentzer_Bio_ClinicalBERT.json @@ -0,0 +1,20 @@ +{ + "model_id": "emilyalsentzer/Bio_ClinicalBERT", + "downloads": 2249870, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "en", + "arxiv:1904.03323", + "arxiv:1901.08746", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: \"en\" tags: - fill-mask license: mit --- # ClinicalBERT - Bio + Clinical BERT Model The Publicly Available Clinical BERT Embeddings paper contains four unique clinicalBERT models: initialized with BERT-Base () or BioBERT () & trained on either all MIMIC notes or only discharge summaries. This model card describes the Bio+Clinical BERT model, which was initialized from BioBERT & trained on all MIMIC notes. ## Pretraining Data The model was trained on all notes from MIMIC III, a database containing electronic health records from ICU patients at the Beth Israel Hospital in Boston, MA. For more details on MIMIC, see here. All notes from the table were included (~880M words). ## Model Pretraining ### Note Preprocessing Each note in MIMIC was first split into sections using a rules-based section splitter (e.g. discharge summary notes were split into \"History of Present Illness\", \"Family History\", \"Brief Hospital Course\", etc. sections). Then each section was split into sentences using SciSpacy ( tokenizer). ### Pretraining Procedures The model was trained using code from Google's BERT repository on a GeForce GTX TITAN X 12 GB GPU. Model parameters were initialized with BioBERT (). ### Pretraining Hyperparameters We used a batch size of 32, a maximum sequence length of 128, and a learning rate of 5 · 10⁻⁵ for pre-training our models. The models trained on all MIMIC notes were trained for 150,000 steps. The dup factor for duplicating input data with different masks was set to 5. All other default parameters were used (specifically, masked language model probability = 0.15 and max predictions per sequence = 20). ## How to use the model Load the model via the transformers library: ## More Information Refer to the original paper, Publicly Available Clinical BERT Embeddings (NAACL Clinical NLP Workshop 2019) for additional details and performance on NLI and NER tasks. ## Questions? Post a Github issue on the clinicalBERT repo or email ealsentzer@stanford.edu with any questions.", + "model_explanation_gemini": "ClinicalBERT - Bio + Clinical BERT is a masked language model pretrained on MIMIC III ICU notes for biomedical and clinical NLP tasks."
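The loading snippet announced by "Load the model via the transformers library:" was elided above; a minimal sketch using the standard Auto classes (the model is tagged fill-mask, so an illustrative masked-token example is included; the clinical sentence is not from the original card):

```python
from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# Illustrative fill-mask usage with the same checkpoint.
unmasker = pipeline("fill-mask", model="emilyalsentzer/Bio_ClinicalBERT")
print(unmasker("The patient was admitted with acute [MASK] failure."))
```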
+} \ No newline at end of file diff --git a/data/model_data_json/emrecan_bert-base-turkish-cased-mean-nli-stsb-tr.json b/data/model_data_json/emrecan_bert-base-turkish-cased-mean-nli-stsb-tr.json new file mode 100644 index 0000000000000000000000000000000000000000..861431064e58e93ebd0c1a275a6682acdcef157b --- /dev/null +++ b/data/model_data_json/emrecan_bert-base-turkish-cased-mean-nli-stsb-tr.json @@ -0,0 +1,22 @@ +{ + "model_id": "emrecan/bert-base-turkish-cased-mean-nli-stsb-tr", + "downloads": 1298560, + "tags": [ + "sentence-transformers", + "pytorch", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "tr", + "dataset:nli_tr", + "dataset:emrecan/stsb-mt-turkish", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - tr pipeline_tag: sentence-similarity license: apache-2.0 tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - nli_tr - emrecan/stsb-mt-turkish widget: source_sentence: \"Bu çok mutlu bir kişi\" sentences: - \"Bu mutlu bir köpek\" - \"Bu sevincinden havalara uçan bir insan\" - \"Çok kar yağıyor\" --- # emrecan/bert-base-turkish-cased-mean-nli-stsb-tr This is a sentence-transformers model: It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. The model was trained on Turkish machine-translated versions of the NLI and STS-b datasets, using example training scripts from the sentence-transformers GitHub repository. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. ## Evaluation Results Evaluation results on test and development sets are given below: | Split | Epoch | cosine_pearson | cosine_spearman | euclidean_pearson | euclidean_spearman | manhattan_pearson | manhattan_spearman | dot_pearson | dot_spearman | |------------|-------|----------------|-----------------|-------------------|--------------------|-------------------|--------------------|-------------|--------------| | test | - | 0.834 | 0.830 | 0.820 | 0.819 | 0.819 | 0.818 | 0.799 | 0.789 | | validation | 1 | 0.850 | 0.848 | 0.831 | 0.835 | 0.83 | 0.83 | 0.80 | 0.806 | | validation | 2 | 0.857 | 0.857 | 0.844 | 0.848 | 0.844 | 0.848 | 0.813 | 0.810 | | validation | 3 | 0.860 | 0.859 | 0.846 | 0.851 | 0.846 | 0.850 | 0.825 | 0.822 | | validation | 4 | 0.859 | 0.860 | 0.846 | 0.851 | 0.846 | 0.851 | 0.825 | 0.823 | ## Training Training scripts []( and []( were used to train the model. The model was trained with the parameters: **DataLoader**: of length 360 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors ", + "model_explanation_gemini": "Maps Turkish sentences to 768-dimensional vectors for semantic similarity tasks like clustering and search, trained on Turkish NLI and STS-b datasets."
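Both usage snippets in the card above ("Then you can use the model like this:" and the plain-Transformers variant) were elided; a minimal sentence-transformers sketch using the card's own widget sentences:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("emrecan/bert-base-turkish-cased-mean-nli-stsb-tr")

# Sentences taken from the widget example in the card.
source = "Bu çok mutlu bir kişi"
candidates = ["Bu mutlu bir köpek", "Bu sevincinden havalara uçan bir insan", "Çok kar yağıyor"]

embeddings = model.encode([source] + candidates)  # shape: (4, 768)
print(cos_sim(embeddings[0], embeddings[1:]))     # similarity of source vs. each candidate
```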
+} \ No newline at end of file diff --git a/data/model_data_json/fabiochiu_t5-base-tag-generation.json b/data/model_data_json/fabiochiu_t5-base-tag-generation.json new file mode 100644 index 0000000000000000000000000000000000000000..b05848137d10d8e004642c9c72ada797607b1b26 --- /dev/null +++ b/data/model_data_json/fabiochiu_t5-base-tag-generation.json @@ -0,0 +1,20 @@ +{ + "model_id": "fabiochiu/t5-base-tag-generation", + "downloads": 87989, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "safetensors", + "t5", + "text2text-generation", + "generated_from_trainer", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - generated_from_trainer model-index: - name: t5-base-tag-generation results: [] widget: - text: \"Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically-typed and garbage-collected.\" example_title: \"Programming\" --- # Model description This model is t5-base fine-tuned on the 190k Medium Articles dataset for predicting article tags using the article textual content as input. While usually formulated as a multi-label classification problem, this model deals with _tag generation_ as a text2text generation task (inspiration from text2tags). # How to use the model ## Data cleaning The dataset is composed of Medium articles and their tags. However, each Medium article can have at most five tags; therefore, the author needs to choose what he/she believes are the best tags (mainly for SEO-related purposes). This means that an article with the \"Python\" tag may not have the \"Programming Languages\" tag, even though the first implies the latter. To clean the dataset accounting for this problem, a hand-made taxonomy of about 1000 tags was built. Using the taxonomy, the tags of each article have been augmented (e.g. an article with the \"Python\" tag will have the \"Programming Languages\" tag as well, as the taxonomy says that \"Python\" is part of \"Programming Languages\"). The taxonomy is not public; if you are interested in it, please send an email to chiusanofabio94@gmail.com. ## Training and evaluation data The model has been trained on a single epoch spanning about 50000 articles, evaluating on 1000 random articles not used during training. ## Evaluation results - eval_loss: 0.8474 - eval_rouge1: 38.6033 - eval_rouge2: 20.5952 - eval_rougeL: 36.4458 - eval_rougeLsum: 36.3202 - eval_gen_len: 15.257 # average number of generated tokens ## Training hyperparameters The following hyperparameters were used during training: - learning_rate: 4e-05 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 1 - mixed_precision_training: Native AMP ### Framework versions - Transformers 4.19.2 - Pytorch 1.11.0+cu113 - Datasets 2.2.2 - Tokenizers 0.12.1", + "model_explanation_gemini": "Generates article tags from textual content by fine-tuning T5-base on a Medium articles dataset, treating tag prediction as a text-to-text generation task."
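The usage snippet under "# How to use the model" was elided; a sketch, assuming the standard seq2seq generation API (the input text is the card's own widget example; the decoding settings are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("fabiochiu/t5-base-tag-generation")
model = AutoModelForSeq2SeqLM.from_pretrained("fabiochiu/t5-base-tag-generation")

text = ("Python is a high-level, interpreted, general-purpose programming language. "
        "Its design philosophy emphasizes code readability with the use of significant indentation.")

inputs = tokenizer(text, return_tensors="pt", truncation=True)
output = model.generate(**inputs, max_new_tokens=32)
decoded = tokenizer.decode(output[0], skip_special_tokens=True)

# The model emits a comma-separated tag list; deduplicate while preserving order.
tags = list(dict.fromkeys(tag.strip() for tag in decoded.split(",")))
print(tags)
```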
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_VGGT-1B.json b/data/model_data_json/facebook_VGGT-1B.json new file mode 100644 index 0000000000000000000000000000000000000000..c2135fbd80800360fdf88499f08c4da20d409218 --- /dev/null +++ b/data/model_data_json/facebook_VGGT-1B.json @@ -0,0 +1,16 @@ +{ + "model_id": "facebook/VGGT-1B", + "downloads": 200937, + "tags": [ + "safetensors", + "model_hub_mixin", + "pytorch_model_hub_mixin", + "image-to-3d", + "en", + "arxiv:2503.11651", + "license:cc-by-nc-4.0", + "region:us" + ], + "description": "--- tags: - model_hub_mixin - pytorch_model_hub_mixin license: cc-by-nc-4.0 language: - en pipeline_tag: image-to-3d ---

VGGT: Visual Geometry Grounded Transformer

\"Paper ## Overview Visual Geometry Grounded Transformer (VGGT, CVPR 2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene, including extrinsic and intrinsic camera parameters, point maps, depth maps, and 3D point tracks, **from one, a few, or hundreds of its views, within seconds**. ## Quick Start Please refer to our Github Repo ## Citation If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:", + "model_explanation_gemini": "Infers 3D scene attributes like camera parameters, depth maps, and point tracks from single or multiple views quickly." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_bart-base.json b/data/model_data_json/facebook_bart-base.json new file mode 100644 index 0000000000000000000000000000000000000000..fc80ba03c74f0552db1d78fe1dd8306b123cffa6 --- /dev/null +++ b/data/model_data_json/facebook_bart-base.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/bart-base", + "downloads": 2137150, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bart", + "feature-extraction", + "en", + "arxiv:1910.13461", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: en --- # BART (base-sized model) BART model pre-trained on English language. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. and first released in this repository. Disclaimer: The team releasing BART did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering). ## Intended uses & limitations You can use the raw model for text infilling. However, the model is mostly meant to be fine-tuned on a supervised dataset. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model in PyTorch: ### BibTeX entry and citation info", + "model_explanation_gemini": "BART is a pre-trained English sequence-to-sequence model with a bidirectional encoder and autoregressive decoder, effective for text generation (summarization, translation) and comprehension tasks (classification, QA) when fine-tuned." 
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_bart-large-cnn.json b/data/model_data_json/facebook_bart-large-cnn.json new file mode 100644 index 0000000000000000000000000000000000000000..e480723eaffe2b3ad6f2da119598be66cdd6689b --- /dev/null +++ b/data/model_data_json/facebook_bart-large-cnn.json @@ -0,0 +1,25 @@ +{ + "model_id": "facebook/bart-large-cnn", + "downloads": 3466256, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "bart", + "text2text-generation", + "summarization", + "en", + "dataset:cnn_dailymail", + "arxiv:1910.13461", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: summarization license: mit thumbnail: datasets: - cnn_dailymail model-index: - name: facebook/bart-large-cnn results: - task: type: summarization name: Summarization dataset: name: cnn_dailymail type: cnn_dailymail config: 3.0.0 split: train metrics: - name: ROUGE-1 type: rouge value: 42.9486 verified: true - name: ROUGE-2 type: rouge value: 20.8149 verified: true - name: ROUGE-L type: rouge value: 30.6186 verified: true - name: ROUGE-LSUM type: rouge value: 40.0376 verified: true - name: loss type: loss value: 2.529000997543335 verified: true - name: gen_len type: gen_len value: 78.5866 verified: true --- # BART (large-sized model), fine-tuned on CNN Daily Mail BART model pre-trained on the English language and fine-tuned on CNN Daily Mail. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. and first released in this repository. Disclaimer: The team releasing BART did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering). This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs. ## Intended uses & limitations You can use this model for text summarization. ### How to use Here is how to use this model with the pipeline API: ### BibTeX entry and citation info", + "model_explanation_gemini": "Fine-tuned for English text summarization, specifically optimized for the CNN Daily Mail dataset using a BART large-sized seq2seq model."
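The pipeline snippet for facebook/bart-large-cnn was elided above; a minimal sketch (the article text and length settings are illustrative):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The tower is 324 metres tall, about the same height as an 81-storey building, "
    "and the tallest structure in Paris. Its base is square, measuring 125 metres on each side. "
    "During its construction, the Eiffel Tower surpassed the Washington Monument to become "
    "the tallest man-made structure in the world."
)
print(summarizer(article, max_length=60, min_length=20, do_sample=False))
```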
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_bart-large-mnli.json b/data/model_data_json/facebook_bart-large-mnli.json new file mode 100644 index 0000000000000000000000000000000000000000..13e8d811cf62f8b7f85e79ebf072309221c54704 --- /dev/null +++ b/data/model_data_json/facebook_bart-large-mnli.json @@ -0,0 +1,23 @@ +{ + "model_id": "facebook/bart-large-mnli", + "downloads": 3454228, + "tags": [ + "transformers", + "pytorch", + "jax", + "rust", + "safetensors", + "bart", + "text-classification", + "zero-shot-classification", + "dataset:multi_nli", + "arxiv:1910.13461", + "arxiv:1909.00161", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit thumbnail: pipeline_tag: zero-shot-classification datasets: - multi_nli --- # bart-large-mnli This is the checkpoint for bart-large after being trained on the MultiNLI (MNLI) dataset. Additional information about this model: - The bart-large model page - BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension - BART fairseq implementation ## NLI-based Zero Shot Text Classification Yin et al. proposed a method for using pre-trained NLI models as ready-made zero-shot sequence classifiers. The method works by posing the sequence to be classified as the NLI premise and constructing a hypothesis from each candidate label. For example, if we want to evaluate whether a sequence belongs to the class \"politics\", we could construct a hypothesis of . The probabilities for entailment and contradiction are then converted to label probabilities. This method is surprisingly effective in many cases, particularly when used with larger pre-trained models like BART and RoBERTa. See this blog post for a more expansive introduction to this and other zero-shot methods, and see the code snippets below for examples of using this model for zero-shot classification both with Hugging Face's built-in pipeline and with native Transformers/PyTorch code. #### With the zero-shot classification pipeline The model can be loaded with the pipeline like so: You can then use this pipeline to classify sequences into any of the class names you specify. If more than one candidate label can be correct, pass to calculate each class independently: #### With manual PyTorch", + "model_explanation_gemini": "Classifies text into user-defined categories without task-specific training by leveraging natural language inference (NLI) and hypothesis formulation." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_bart-large.json b/data/model_data_json/facebook_bart-large.json new file mode 100644 index 0000000000000000000000000000000000000000..a2c444996c21c9d9774581397a307b64e702d358 --- /dev/null +++ b/data/model_data_json/facebook_bart-large.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/bart-large", + "downloads": 103606, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "bart", + "feature-extraction", + "en", + "arxiv:1910.13461", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: en --- # BART (large-sized model) BART model pre-trained on the English language. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. and first released in this repository.
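The two zero-shot snippets in the facebook/bart-large-mnli card (pipeline and manual PyTorch) were elided; a minimal pipeline sketch, including the multi_label flag the card alludes to for independently scored labels (the example sequence and labels are illustrative):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sequence = "one day I will see the world"
labels = ["travel", "cooking", "dancing"]
print(classifier(sequence, labels))

# When several labels can be true at once, score each label independently:
print(classifier(sequence, ["travel", "exploration"], multi_label=True))
```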
Disclaimer: The team releasing BART did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering). ## Intended uses & limitations You can use the raw model for text infilling. However, the model is mostly meant to be fine-tuned on a supervised dataset. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model in PyTorch: ### BibTeX entry and citation info", + "model_explanation_gemini": "BART-large is a pre-trained English seq2seq model with a bidirectional encoder and autoregressive decoder, effective for text generation (summarization, translation) and comprehension tasks (classification, QA) when fine-tuned." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_blenderbot-400M-distill.json b/data/model_data_json/facebook_blenderbot-400M-distill.json new file mode 100644 index 0000000000000000000000000000000000000000..5c1d1295eda029cf2d967ff4d4f4d1bf3af43562 --- /dev/null +++ b/data/model_data_json/facebook_blenderbot-400M-distill.json @@ -0,0 +1,24 @@ +{ + "model_id": "facebook/blenderbot-400M-distill", + "downloads": 153847, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "blenderbot", + "text2text-generation", + "convAI", + "conversational", + "facebook", + "en", + "dataset:blended_skill_talk", + "arxiv:2004.13637", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en thumbnail: tags: - convAI - conversational - facebook license: apache-2.0 datasets: - blended_skill_talk metrics: - perplexity --- ## Model description + Paper: Recipes for building an open-domain chatbot + Original PARLAI Code ### Abstract Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. 
We then discuss the limitations of this work by analyzing failure cases of our models.", + "model_explanation_gemini": "A distilled 400M-parameter conversational AI model designed for open-domain chatbot tasks, blending skills like engaging dialogue, empathy, and knowledge display based on large-scale training data." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_contriever.json b/data/model_data_json/facebook_contriever.json new file mode 100644 index 0000000000000000000000000000000000000000..35f3a5964e426c61921ab576c0f75a2d15433e26 --- /dev/null +++ b/data/model_data_json/facebook_contriever.json @@ -0,0 +1,14 @@ +{ + "model_id": "facebook/contriever", + "downloads": 213710, + "tags": [ + "transformers", + "pytorch", + "bert", + "arxiv:2112.09118", + "endpoints_compatible", + "region:us" + ], + "description": "This model has been trained without supervision following the approach described in Towards Unsupervised Dense Information Retrieval with Contrastive Learning. The associated GitHub repository is available here. ## Usage (HuggingFace Transformers) Using the model directly available in HuggingFace transformers requires adding a mean pooling operation to obtain a sentence embedding.", + "model_explanation_gemini": "Generates unsupervised dense sentence embeddings for information retrieval using contrastive learning, requiring mean pooling for embedding extraction." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_convnextv2-base-22k-384.json b/data/model_data_json/facebook_convnextv2-base-22k-384.json new file mode 100644 index 0000000000000000000000000000000000000000..cc9e666aca19cbd832be0142e3822521f2a9bc3d --- /dev/null +++ b/data/model_data_json/facebook_convnextv2-base-22k-384.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/convnextv2-base-22k-384", + "downloads": 80286, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "convnextv2", + "image-classification", + "vision", + "dataset:imagenet-22k", + "arxiv:2301.00808", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-22k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # ConvNeXt V2 (base-sized model) ConvNeXt V2 model pretrained using the FCMAE framework and fine-tuned on the ImageNet-22K dataset at resolution 384x384. It was introduced in the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Woo et al. and first released in this repository. Disclaimer: The team releasing ConvNeXT V2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description ConvNeXt V2 is a pure convolutional model (ConvNet) that introduces a fully convolutional masked autoencoder framework (FCMAE) and a new Global Response Normalization (GRN) layer to ConvNeXt. ConvNeXt V2 significantly improves the performance of pure ConvNets on various recognition benchmarks. !model image ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation.
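The facebook/contriever card above notes that a mean pooling step is required on top of the token embeddings but elides the code; a sketch of that pooling (the query/passage pair is illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

sentences = ["Where was Marie Curie born?", "Marie Curie was born in Warsaw."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling: average the token embeddings, masking out padding positions.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
score = embeddings[0] @ embeddings[1]  # dot-product relevance score
```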
### BibTeX entry and citation info" +} \ No newline at end of file diff --git a/data/model_data_json/facebook_deit-base-patch16-224.json b/data/model_data_json/facebook_deit-base-patch16-224.json new file mode 100644 index 0000000000000000000000000000000000000000..c4bad476327dbbc88a5f42e69f90ba37f5503f1a --- /dev/null +++ b/data/model_data_json/facebook_deit-base-patch16-224.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/deit-base-patch16-224", + "downloads": 112251, + "tags": [ + "transformers", + "pytorch", + "tf", + "vit", + "image-classification", + "dataset:imagenet-1k", + "arxiv:2012.12877", + "arxiv:2006.03677", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - image-classification datasets: - imagenet-1k --- # Data-efficient Image Transformer (base-sized model) Data-efficient Image Transformer (DeiT) model pre-trained and fine-tuned on ImageNet-1k (1 million images, 1,000 classes) at resolution 224x224. It was first introduced in the paper Training data-efficient image transformers & distillation through attention by Touvron et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman. Disclaimer: The team releasing DeiT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description This model is actually a more efficiently trained Vision Transformer (ViT). The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pre-trained and fine-tuned on a large collection of images in a supervised fashion, namely ImageNet-1k, at a resolution of 224x224 pixels. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Since this model is a more efficiently trained ViT model, you can plug it into ViTModel or ViTForImageClassification. Note that the model expects the data to be prepared using DeiTFeatureExtractor. Here we use AutoFeatureExtractor, which will automatically use the appropriate feature extractor given the model name. Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Currently, both the feature extractor and model support PyTorch. Tensorflow and JAX/FLAX are coming soon. ## Training data The ViT model was pretrained on ImageNet-1k, a dataset consisting of 1 million images and 1k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. 
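The classification snippets for facebook/convnextv2-base-22k-384 and facebook/deit-base-patch16-224 were both elided from their cards; a generic sketch using the Auto classes, shown for the ConvNeXt V2 checkpoint (the COCO image URL is the usual documentation sample, not taken from either card):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # COCO 2017 sample
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/convnextv2-base-22k-384")
model = AutoModelForImageClassification.from_pretrained("facebook/convnextv2-base-22k-384")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```

Swapping in facebook/deit-base-patch16-224 only requires changing the checkpoint name; the Auto classes resolve the matching processor and classification head.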
At inference time, images are resized/rescaled to the same resolution (256x256), center-cropped at 224x224 and normalized across the RGB channels with the ImageNet mean and standard deviation. ### Pretraining The model was trained on a single 8-GPU node for 3 days. Training resolution is 224. For all hyperparameters (such as batch size and learning rate) we refer to table 9 of the original paper. ## Evaluation results | Model | ImageNet top-1 accuracy | ImageNet top-5 accuracy | # params | URL | |---------------------------------------|-------------------------|-------------------------|----------|------------------------------------------------------------------| | DeiT-tiny | 72.2 | 91.1 | 5M | | | DeiT-small | 79.9 | 95.0 | 22M | | | **DeiT-base** | **81.8** | **95.6** | **86M** | ** | | DeiT-tiny distilled | 74.5 | 91.9 | 6M | | | DeiT-small distilled | 81.2 | 95.4 | 22M | | | DeiT-base distilled | 83.4 | 96.5 | 87M | | | DeiT-base 384 | 82.9 | 96.2 | 87M | | | DeiT-base distilled 384 (1000 epochs) | 85.2 | 97.2 | 88M | | Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A pre-trained Vision Transformer model fine-tuned for efficient image classification on the ImageNet-1k dataset at 224x224 resolution." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_detr-resnet-101.json b/data/model_data_json/facebook_detr-resnet-101.json new file mode 100644 index 0000000000000000000000000000000000000000..1f631ac03f753c3ec5f09fff9dec59eb3f330f11 --- /dev/null +++ b/data/model_data_json/facebook_detr-resnet-101.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/detr-resnet-101", + "downloads": 208495, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "detr", + "object-detection", + "vision", + "dataset:coco", + "arxiv:2005.12872", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - object-detection - vision datasets: - coco widget: - src: example_title: Savanna - src: example_title: Football Match - src: example_title: Airport --- # DETR (End-to-End Object Detection) model with ResNet-101 backbone DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images). It was introduced in the paper End-to-End Object Detection with Transformers by Carion et al. and first released in this repository. Disclaimer: The team releasing DETR did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs in order to perform object detection: a linear layer for the class labels and a MLP (multi-layer perceptron) for the bounding boxes. The model uses so-called object queries to detect objects in an image. Each object query looks for a particular object in the image. For COCO, the number of object queries is set to 100. The model is trained using a \"bipartite matching loss\": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a \"no object\" as class and \"no bounding box\" as bounding box). 
The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model. !model image ## Intended uses & limitations You can use the raw model for object detection. See the model hub to look for all available DETR models. ### How to use Here is how to use this model: This should output (something along the lines of): Currently, both the feature extractor and model support PyTorch. ## Training data The DETR model was trained on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled such that the shortest side is at least 800 pixels and the largest side at most 1333 pixels, and normalized across the RGB channels with the ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225). ### Training The model was trained for 300 epochs on 16 V100 GPUs. This takes 3 days, with 4 images per GPU (hence a total batch size of 64). ## Evaluation results This model achieves an AP (average precision) of **43.5** on COCO 2017 validation. For more details regarding evaluation results, we refer to table 1 of the original paper. ### BibTeX entry and citation info", + "model_explanation_gemini": "Detects objects in images using a transformer-based approach with a ResNet-101 backbone, trained on the COCO dataset to predict class labels and bounding boxes." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_detr-resnet-50.json b/data/model_data_json/facebook_detr-resnet-50.json new file mode 100644 index 0000000000000000000000000000000000000000..4e494aa18442b8df099b8b540fd8a6b7b05b6e55 --- /dev/null +++ b/data/model_data_json/facebook_detr-resnet-50.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/detr-resnet-50", + "downloads": 490446, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "detr", + "object-detection", + "vision", + "dataset:coco", + "arxiv:2005.12872", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - object-detection - vision datasets: - coco widget: - src: example_title: Savanna - src: example_title: Football Match - src: example_title: Airport --- # DETR (End-to-End Object Detection) model with ResNet-50 backbone DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images). It was introduced in the paper End-to-End Object Detection with Transformers by Carion et al. and first released in this repository. Disclaimer: The team releasing DETR did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs in order to perform object detection: a linear layer for the class labels and a MLP (multi-layer perceptron) for the bounding boxes. The model uses so-called object queries to detect objects in an image. Each object query looks for a particular object in the image. For COCO, the number of object queries is set to 100. 
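The detection snippet ("Here is how to use this model:") was elided for both DETR checkpoints; a sketch for facebook/detr-resnet-101 using the post-processing helper from recent transformers releases (the image URL and the 0.9 threshold are illustrative, and the same code works for detr-resnet-50):

```python
import requests
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-101")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Keep detections above a confidence threshold and rescale boxes to image size.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```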
The model is trained using a \"bipartite matching loss\": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a \"no object\" as class and \"no bounding box\" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model. !model image ## Intended uses & limitations You can use the raw model for object detection. See the model hub to look for all available DETR models. ### How to use Here is how to use this model: This should output: Currently, both the feature extractor and model support PyTorch. ## Training data The DETR model was trained on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled such that the shortest side is at least 800 pixels and the largest side at most 1333 pixels, and normalized across the RGB channels with the ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225). ### Training The model was trained for 300 epochs on 16 V100 GPUs. This takes 3 days, with 4 images per GPU (hence a total batch size of 64). ## Evaluation results This model achieves an AP (average precision) of **42.0** on COCO 2017 validation. For more details regarding evaluation results, we refer to table 1 of the original paper. ### BibTeX entry and citation info", + "model_explanation_gemini": "Detects objects in images using a transformer-based approach with a ResNet-50 backbone, trained on COCO dataset for bounding box prediction and classification." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_dino-vitb16.json b/data/model_data_json/facebook_dino-vitb16.json new file mode 100644 index 0000000000000000000000000000000000000000..1427584456e69d767df6b14ae38bf50af6d854dc --- /dev/null +++ b/data/model_data_json/facebook_dino-vitb16.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/dino-vitb16", + "downloads": 99742, + "tags": [ + "transformers", + "pytorch", + "tf", + "vit", + "image-feature-extraction", + "dino", + "vision", + "dataset:imagenet-1k", + "arxiv:2104.14294", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - dino - vision datasets: - imagenet-1k --- # Vision Transformer (base-sized model, patch size 16) trained using DINO Vision Transformer (ViT) model trained using the DINO method. It was introduced in the paper Emerging Properties in Self-Supervised Vision Transformers by Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin and first released in this repository. Disclaimer: The team releasing DINO did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-1k, at a resolution of 224x224 pixels. 
Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not include any fine-tuned heads. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: ### BibTeX entry and citation info", + "model_explanation_gemini": "A Vision Transformer (ViT) model pre-trained self-supervised on ImageNet-1k for extracting image features usable in downstream tasks like classification." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_dino-vits8.json b/data/model_data_json/facebook_dino-vits8.json new file mode 100644 index 0000000000000000000000000000000000000000..fea483ccde2e42b5500c10cbea48590881c7b408 --- /dev/null +++ b/data/model_data_json/facebook_dino-vits8.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/dino-vits8", + "downloads": 213200, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-feature-extraction", + "dino", + "vision", + "dataset:imagenet-1k", + "arxiv:2104.14294", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - dino - vision datasets: - imagenet-1k --- # Vision Transformer (small-sized model, patch size 8) trained using DINO Vision Transformer (ViT) model trained using the DINO method. It was introduced in the paper Emerging Properties in Self-Supervised Vision Transformers by Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin and first released in this repository. Disclaimer: The team releasing DINO did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-1k, at a resolution of 224x224 pixels. Images are presented to the model as a sequence of fixed-size patches (resolution 8x8), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not include any fine-tuned heads. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. 
## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: ### BibTeX entry and citation info", + "model_explanation_gemini": "A small Vision Transformer (ViT) model with 8x8 patch size, self-supervisedly trained on ImageNet-1k using DINO for image feature extraction and classification tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_dinov2-base.json b/data/model_data_json/facebook_dinov2-base.json new file mode 100644 index 0000000000000000000000000000000000000000..8e60275094c1eb406e0c74910ccee2119ed385d1 --- /dev/null +++ b/data/model_data_json/facebook_dinov2-base.json @@ -0,0 +1,18 @@ +{ + "model_id": "facebook/dinov2-base", + "downloads": 1731394, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "dinov2", + "image-feature-extraction", + "dino", + "vision", + "arxiv:2304.07193", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - dino - vision inference: false --- # Vision Transformer (base-sized model) trained using DINOv2 Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released in this repository. Disclaimer: The team releasing DINOv2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion. Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not include any fine-tuned heads. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for feature extraction. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: ### BibTeX entry and citation info", + "model_explanation_gemini": "A base-sized Vision Transformer model trained with DINOv2 for self-supervised learning of robust image features, primarily used for extracting visual representations for downstream tasks." 
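None of the DINO/DINOv2 cards above retain their feature-extraction snippets; a sketch for facebook/dinov2-base that follows the cards' description of using the [CLS] token as a global image embedding (the image URL is illustrative, and the same pattern applies to dino-vitb16, dino-vits8, and the other DINOv2 sizes):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] token's last hidden state serves as a whole-image embedding,
# e.g. as input to a linear classifier, as the model cards describe.
image_embedding = outputs.last_hidden_state[:, 0]  # (1, 768) for the base model
```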
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_dinov2-giant.json b/data/model_data_json/facebook_dinov2-giant.json new file mode 100644 index 0000000000000000000000000000000000000000..01ecd926b5ade8a6ab47865322a44ab9b4c40e91 --- /dev/null +++ b/data/model_data_json/facebook_dinov2-giant.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/dinov2-giant", + "downloads": 112657, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "dinov2", + "image-feature-extraction", + "dino", + "vision", + "arxiv:2304.07193", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - dino - vision --- # Vision Transformer (giant-sized model) trained using DINOv2 Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released in this repository. Disclaimer: The team releasing DINOv2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion. Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not include any fine-tuned heads. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for feature extraction. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: ### BibTeX entry and citation info", + "model_explanation_gemini": "Self-supervised giant-sized Vision Transformer (ViT) model for extracting robust image features without supervision, suitable for downstream vision tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_dinov2-large.json b/data/model_data_json/facebook_dinov2-large.json new file mode 100644 index 0000000000000000000000000000000000000000..bf10871c00f39a5488687f11efb1a8e682416449 --- /dev/null +++ b/data/model_data_json/facebook_dinov2-large.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/dinov2-large", + "downloads": 518602, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "dinov2", + "image-feature-extraction", + "dino", + "vision", + "arxiv:2304.07193", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - dino - vision --- # Vision Transformer (large-sized model) trained using DINOv2 Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released in this repository. 
Disclaimer: The team releasing DINOv2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion. Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not include any fine-tuned heads. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for feature extraction. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: ### BibTeX entry and citation info", + "model_explanation_gemini": "facebook_dinov2-large is a large Vision Transformer model trained with DINOv2 for self-supervised learning of robust image features, enabling downstream tasks like classification through feature extraction." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_dinov2-small.json b/data/model_data_json/facebook_dinov2-small.json new file mode 100644 index 0000000000000000000000000000000000000000..8d984c66a52d6bf916e57d9bf4984bd09be95e23 --- /dev/null +++ b/data/model_data_json/facebook_dinov2-small.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/dinov2-small", + "downloads": 3166751, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "dinov2", + "image-feature-extraction", + "dino", + "vision", + "arxiv:2304.07193", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - dino - vision --- # Vision Transformer (small-sized model) trained using DINOv2 Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released in this repository. Disclaimer: The team releasing DINOv2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion. Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not include any fine-tuned heads. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. 
One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for feature extraction. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: ### BibTeX entry and citation info", + "model_explanation_gemini": "A small Vision Transformer (ViT) model trained with DINOv2 for self-supervised visual feature extraction, suitable for downstream tasks like image classification." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_encodec_24khz.json b/data/model_data_json/facebook_encodec_24khz.json new file mode 100644 index 0000000000000000000000000000000000000000..8501b61d4ba931ef6c3d54946fb1f4c2ecdf0c7d --- /dev/null +++ b/data/model_data_json/facebook_encodec_24khz.json @@ -0,0 +1,15 @@ +{ + "model_id": "facebook/encodec_24khz", + "downloads": 479543, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "encodec", + "feature-extraction", + "arxiv:2210.13438", + "region:us" + ], + "description": "--- inference: false --- !encodec image # Model Card for EnCodec This model card provides details and information about EnCodec, a state-of-the-art real-time audio codec developed by Meta AI. ## Model Details ### Model Description EnCodec is a high-fidelity audio codec leveraging neural networks. It introduces a streaming encoder-decoder architecture with quantized latent space, trained in an end-to-end fashion. The model simplifies and speeds up training using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. It also includes a novel loss balancer mechanism that stabilizes training by decoupling the choice of hyperparameters from the typical scale of the loss. Additionally, lightweight Transformer models are used to further compress the obtained representation while maintaining real-time performance. - **Developed by:** Meta AI - **Model type:** Audio Codec ### Model Sources - **Repository:** GitHub Repository - **Paper:** EnCodec: End-to-End Neural Audio Codec ## Uses ### Direct Use EnCodec can be used directly as an audio codec for real-time compression and decompression of audio signals. It provides high-quality audio compression and efficient decoding. The model was trained on various bandwidths, which can be specified when encoding (compressing) and decoding (decompressing). Two different setups exist for EnCodec: - Non-streamable: the input audio is split into chunks of 1 second, with an overlap of 10 ms, which are then encoded. - Streamable: weight normalization is used on the convolution layers, and the input is not split into chunks but rather padded on the left. ### Downstream Use EnCodec can be fine-tuned for specific audio tasks or integrated into larger audio processing pipelines for applications such as speech generation, music generation, or text-to-speech tasks. [More Information Needed] ## How to Get Started with the Model Use the following code to get started with the EnCodec model using a dummy example from the LibriSpeech dataset (~9MB). First, install the required Python packages: Then load an audio sample, and run a forward pass of the model: ## Training Details The model was trained for 300 epochs, with one epoch being 2,000 updates, using the Adam optimizer with a batch size of 64 examples of 1 second each, a learning rate of 3 · 10⁻⁴, β1 = 0.5, and β2 = 0.9.
All the models are trained using 8 A100 GPUs. ### Training Data - For speech: - DNS Challenge 4 - Common Voice - For general audio: - AudioSet - FSD50K - For music: - Jamendo dataset They used four different training strategies to sample for these datasets: - (s1) sample a single source from Jamendo with probability 0.32; - (s2) sample a single source from the other datasets with the same probability; - (s3) mix two sources from all datasets with a probability of 0.24; - (s4) mix three sources from all datasets except music with a probability of 0.12. The audio is normalized by file and a random gain between -10 and 6 dB is applied. ## Evaluation ### Subjective metric for restoration: This model was evaluated using the MUSHRA protocol (Series, 2014), using both a hidden reference and a low anchor. Annotators were recruited using a crowd-sourcing platform, in which they were asked to rate the perceptual quality of the provided samples on a scale from 1 to 100. They randomly select 50 samples of 5 seconds from each category of the test set and force at least 10 annotations per sample. To filter noisy annotations and outliers we remove annotators who rate the reference recordings less than 90 in at least 20% of the cases, or rate the low-anchor recording above 80 more than 50% of the time. ### Objective metric for restoration: The ViSQOL metric was used together with the Scale-Invariant Signal-to-Noise Ratio (SI-SNR) (Luo & Mesgarani, 2019; Nachmani et al., 2020; Chazan et al., 2021). ### Results The results of the evaluation demonstrate the superiority of EnCodec compared to the baselines across different bandwidths (1.5, 3, 6, and 12 kbps). When comparing EnCodec with the baselines at the same bandwidth, EnCodec consistently outperforms them in terms of MUSHRA score. Notably, EnCodec achieves better performance, on average, at 3 kbps compared to Lyra-v2 at 6 kbps and Opus at 12 kbps. Additionally, by incorporating the language model over the codes, it is possible to achieve a bandwidth reduction of approximately 25-40%. For example, the bandwidth of the 3 kbps model can be reduced to 1.9 kbps. #### Summary EnCodec is a state-of-the-art real-time neural audio compression model that excels in producing high-fidelity audio samples at various sample rates and bandwidths. The model's performance was evaluated across different settings, ranging from 24kHz monophonic at 1.5 kbps to 48kHz stereophonic, showcasing both subjective and objective results. Notably, EnCodec incorporates a novel spectrogram-only adversarial loss, effectively reducing artifacts and enhancing sample quality. Training stability and interpretability were further enhanced through the introduction of a gradient balancer for the loss weights. Additionally, the study demonstrated that a compact Transformer model can be employed to achieve an additional bandwidth reduction of up to 40% without compromising quality, particularly in applications where low latency is not critical (e.g., music streaming). ## Citation **BibTeX:**", "model_explanation_gemini": "EnCodec is a high-fidelity neural audio codec designed for real-time compression and decompression of audio signals, leveraging streaming encoder-decoder architecture and quantized latent space to produce high-quality samples efficiently."
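The install and forward-pass snippets referenced in the card above were stripped from this dump. A sketch of the usual encode/decode round trip with the transformers EncodecModel API, using the small LibriSpeech dummy set the card mentions:

```python
from datasets import load_dataset
from transformers import EncodecModel, AutoProcessor

# dummy LibriSpeech sample (~9MB), as referenced in the card
librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]["audio"]["array"]

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")

# compress to discrete codes, then reconstruct the waveform
encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])
audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]
```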
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_encodec_32khz.json b/data/model_data_json/facebook_encodec_32khz.json new file mode 100644 index 0000000000000000000000000000000000000000..fec38082b20c3739310f8f896175ea2efefe0b88 --- /dev/null +++ b/data/model_data_json/facebook_encodec_32khz.json @@ -0,0 +1,15 @@ +{ + "model_id": "facebook/encodec_32khz", + "downloads": 246727, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "encodec", + "feature-extraction", + "arxiv:2306.05284", + "region:us" + ], + "description": "--- inference: false --- !encodec image # Model Card for EnCodec This model card provides details and information about EnCodec 32kHz, a state-of-the-art real-time audio codec developed by Meta AI. This EnCodec checkpoint was trained specifically as part of the MusicGen project, and is intended to be used in conjunction with the MusicGen models. ## Model Details ### Model Description EnCodec is a high-fidelity audio codec leveraging neural networks. It introduces a streaming encoder-decoder architecture with quantized latent space, trained in an end-to-end fashion. The model simplifies and speeds up training using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. It also includes a novel loss balancer mechanism that stabilizes training by decoupling the choice of hyperparameters from the typical scale of the loss. Additionally, lightweight Transformer models are used to further compress the obtained representation while maintaining real-time performance. This variant of EnCodec is trained on 20k hours of music data, consisting of an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music datasets. - **Developed by:** Meta AI - **Model type:** Audio Codec ### Model Sources - **Repository:** GitHub Repository - **Paper:** Simple and Controllable Music Generation ## Uses ### Direct Use EnCodec can be used directly as an audio codec for real-time compression and decompression of audio signals. It provides high-quality audio compression and efficient decoding. The model was trained on various bandwidths, which can be specified when encoding (compressing) and decoding (decompressing). Two different setups exist for EnCodec: - Non-streamable: the input audio is split into chunks of 1 second, with an overlap of 10 ms, which are then encoded. - Streamable: weight normalization is used on the convolution layers, and the input is not split into chunks but rather padded on the left. ### Downstream Use This variant of EnCodec is designed to be used in conjunction with the official MusicGen checkpoints. However, it can also be used standalone to encode audio files. ## How to Get Started with the Model Use the following code to get started with the EnCodec model using a dummy example from the LibriSpeech dataset (~9MB). First, install the required Python packages: Then load an audio sample, and run a forward pass of the model: ## Evaluation For evaluation results, refer to the MusicGen evaluation scores. ## Summary EnCodec is a state-of-the-art real-time neural audio compression model that excels in producing high-fidelity audio samples at various sample rates and bandwidths. The model's performance was evaluated across different settings, ranging from 24kHz monophonic at 1.5 kbps to 48kHz stereophonic, showcasing both subjective and objective results.
Notably, EnCodec incorporates a novel spectrogram-only adversarial loss, effectively reducing artifacts and enhancing sample quality. Training stability and interpretability were further enhanced through the introduction of a gradient balancer for the loss weights. Additionally, the study demonstrated that a compact Transformer model can be employed to achieve an additional bandwidth reduction of up to 40% without compromising quality, particularly in applications where low latency is not critical (e.g., music streaming). ## Citation **BibTeX:**", + "model_explanation_gemini": "facebook_encodec_32khz is a high-fidelity neural audio codec designed for real-time compression and decompression of audio signals, primarily used with MusicGen models to produce high-quality audio samples at various bandwidths." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_esm1b_t33_650M_UR50S.json b/data/model_data_json/facebook_esm1b_t33_650M_UR50S.json new file mode 100644 index 0000000000000000000000000000000000000000..fb249c7360d94b6dc9821895e511199bd39d6f51 --- /dev/null +++ b/data/model_data_json/facebook_esm1b_t33_650M_UR50S.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/esm1b_t33_650M_UR50S", + "downloads": 12217, + "tags": [ + "transformers", + "pytorch", + "tf", + "esm", + "fill-mask", + "arxiv:1907.11692", + "arxiv:1810.04805", + "arxiv:1603.05027", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit widget: - text: \"MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG\" --- # **ESM-1b** ESM-1b (paper, repository) is a transformer protein language model, trained on protein sequence data without label supervision. The model is pretrained on Uniref50 with an unsupervised masked language modeling (MLM) objective, meaning the model is trained to predict amino acids from the surrounding sequence context. This pretraining objective allows ESM-1b to learn generally useful features which can be transferred to downstream prediction tasks. ESM-1b has been evaluated on a variety of tasks related to protein structure and function, including remote homology detection, secondary structure prediction, contact prediction, and prediction of the effects of mutations on function, producing state-of-the-art results. **Important note**: ESM-2 is now available in a range of checkpoint sizes. For most tasks, ESM-2 performance will be superior to ESM-1 and ESM-1b, and so we recommend using it instead unless your goal is explicitly to compare against ESM-1b. The ESM-2 checkpoint closest in size to ESM-1b is esm2_t33_650M_UR50D. ## **Model description** The ESM-1b model is based on the RoBERTa architecture and training procedure, using the Uniref50 2018_03 database of protein sequences. Note that the pretraining is on the raw protein sequences only. The training is purely unsupervised -- during training no labels are given related to structure or function. Training is with the masked language modeling objective. The masking follows the procedure of Devlin et al. 2019, randomly masking 15% of the amino acids in the input, and includes the pass-through and random token noise. One architecture difference from the RoBERTa model is that ESM-1b uses pre-activation layer normalization. The learned representations can be used as features for downstream tasks. 
For example, if you have a dataset of measurements of protein activity, you can fit a regression model on the features output by ESM-1b to predict the activity of new sequences. The model can also be fine-tuned. ESM-1b can infer information about the structure and function of proteins without further supervision, i.e. it is capable of zero-shot transfer to structure and function prediction. Rao et al. 2020 found that the attention heads of ESM-1b directly represent contacts in the 3D structure of the protein. Meier et al. 2021 found that ESM-1b can be used to score the effect of sequence variations on protein function. ## **Intended uses & limitations** The model can be used for feature extraction, fine-tuned on downstream tasks, or used directly to make inferences about the structure and function of protein sequences, like any other masked language model. For full examples, please see our notebook on fine-tuning protein models ## **Training data** The ESM-1b model was pretrained on Uniref50 2018-03, a dataset consisting of approximately 30 million protein sequences. ## **Training procedure** ### **Preprocessing** The protein sequences are uppercased and tokenized using a single space and a vocabulary size of 21. The inputs of the model are then of the form: During training, sequences longer than 1023 tokens (without CLS) are randomly cropped to a length of 1023. The details of the masking procedure for each sequence follow Devlin et al. 2019: * 15% of the amino acids are masked. * In 80% of the cases, the masked amino acids are replaced by . * In 10% of the cases, the masked amino acids are replaced by a random amino acid (different from the one they replace). * In the remaining 10% of cases, the masked amino acids are left as is. ### **Pretraining** The model was trained on 128 NVIDIA v100 GPUs for 500K updates, using sequence length 1024 (131,072 tokens per batch). The optimizer used is Adam (betas=[0.9, 0.999]) with a learning rate of 1e-4, a weight decay of 0, learning rate warmup for 16k steps and inverse square root decay of the learning rate after.", "model_explanation_gemini": "Predicts amino acids from protein sequences using masked language modeling to learn features for protein structure and function tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_esm2_t12_35M_UR50D.json b/data/model_data_json/facebook_esm2_t12_35M_UR50D.json new file mode 100644 index 0000000000000000000000000000000000000000..49dac84e423e3f4889e5169c93e0bb53b798fd19 --- /dev/null +++ b/data/model_data_json/facebook_esm2_t12_35M_UR50D.json @@ -0,0 +1,18 @@ +{ + "model_id": "facebook/esm2_t12_35M_UR50D", + "downloads": 353757, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "esm", + "fill-mask", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit widget: - text: \"MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG\" --- ## ESM-2 ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. For detailed information on the model architecture and training data, please refer to the accompanying paper. You may also be interested in some demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest. Several ESM-2 checkpoints are available in the Hub with varying sizes.
Larger sizes generally have somewhat better accuracy, but require much more memory and time to train: | Checkpoint name | Num layers | Num parameters | |------------------------------|----|----------| | esm2_t48_15B_UR50D | 48 | 15B | | esm2_t36_3B_UR50D | 36 | 3B | | esm2_t33_650M_UR50D | 33 | 650M | | esm2_t30_150M_UR50D | 30 | 150M | | esm2_t12_35M_UR50D | 12 | 35M | | esm2_t6_8M_UR50D | 6 | 8M |", + "model_explanation_gemini": "ESM-2 is a protein sequence model trained for masked language modeling, designed for fine-tuning on protein-related tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_esm2_t33_650M_UR50D.json b/data/model_data_json/facebook_esm2_t33_650M_UR50D.json new file mode 100644 index 0000000000000000000000000000000000000000..695273546268856dee6e333fcd952ad57ce172f6 --- /dev/null +++ b/data/model_data_json/facebook_esm2_t33_650M_UR50D.json @@ -0,0 +1,18 @@ +{ + "model_id": "facebook/esm2_t33_650M_UR50D", + "downloads": 646514, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "esm", + "fill-mask", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit widget: - text: \"MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG\" --- ## ESM-2 ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. For detailed information on the model architecture and training data, please refer to the accompanying paper. You may also be interested in some demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest. Several ESM-2 checkpoints are available in the Hub with varying sizes. Larger sizes generally have somewhat better accuracy, but require much more memory and time to train: | Checkpoint name | Num layers | Num parameters | |------------------------------|----|----------| | esm2_t48_15B_UR50D | 48 | 15B | | esm2_t36_3B_UR50D | 36 | 3B | | esm2_t33_650M_UR50D | 33 | 650M | | esm2_t30_150M_UR50D | 30 | 150M | | esm2_t12_35M_UR50D | 12 | 35M | | esm2_t6_8M_UR50D | 6 | 8M |", + "model_explanation_gemini": "ESM-2 is a protein model trained for masked language modeling, designed for fine-tuning on protein sequence tasks with varying model sizes available." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_esm2_t36_3B_UR50D.json b/data/model_data_json/facebook_esm2_t36_3B_UR50D.json new file mode 100644 index 0000000000000000000000000000000000000000..ccef40095b1d17eed5c942dfb18f7ac3e58314e6 --- /dev/null +++ b/data/model_data_json/facebook_esm2_t36_3B_UR50D.json @@ -0,0 +1,17 @@ +{ + "model_id": "facebook/esm2_t36_3B_UR50D", + "downloads": 4377010, + "tags": [ + "transformers", + "pytorch", + "tf", + "esm", + "fill-mask", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit widget: - text: \"MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG\" --- ## ESM-2 ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. For detailed information on the model architecture and training data, please refer to the accompanying paper. 
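The inline usage snippets of these ESM-2 cards were stripped from this dump; since the checkpoints carry the fill-mask tag, a minimal sketch with the transformers pipeline (the checkpoint choice and masked position are illustrative, and any size from the table above works the same way):

```python
from transformers import pipeline

# ESM tokenizers use <mask> as the mask token
unmasker = pipeline("fill-mask", model="facebook/esm2_t12_35M_UR50D")

# ask the model to predict the masked amino acid from its sequence context
unmasker("MQIFVKTLTGK<mask>ITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")
# returns the most likely residues for the masked position, with scores
```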
You may also be interested in some demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest. Several ESM-2 checkpoints are available in the Hub with varying sizes. Larger sizes generally have somewhat better accuracy, but require much more memory and time to train: | Checkpoint name | Num layers | Num parameters | |------------------------------|----|----------| | esm2_t48_15B_UR50D | 48 | 15B | | esm2_t36_3B_UR50D | 36 | 3B | | esm2_t33_650M_UR50D | 33 | 650M | | esm2_t30_150M_UR50D | 30 | 150M | | esm2_t12_35M_UR50D | 12 | 35M | | esm2_t6_8M_UR50D | 6 | 8M |", + "model_explanation_gemini": "ESM-2 is a protein model trained for masked language modeling, designed for fine-tuning on protein sequence tasks with varying model sizes available." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_esm2_t6_8M_UR50D.json b/data/model_data_json/facebook_esm2_t6_8M_UR50D.json new file mode 100644 index 0000000000000000000000000000000000000000..bc67381ac77c400b4ffec83536963181a0022c69 --- /dev/null +++ b/data/model_data_json/facebook_esm2_t6_8M_UR50D.json @@ -0,0 +1,18 @@ +{ + "model_id": "facebook/esm2_t6_8M_UR50D", + "downloads": 1460541, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "esm", + "fill-mask", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit widget: - text: \"MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG\" --- ## ESM-2 ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. For detailed information on the model architecture and training data, please refer to the accompanying paper. You may also be interested in some demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest. Several ESM-2 checkpoints are available in the Hub with varying sizes. Larger sizes generally have somewhat better accuracy, but require much more memory and time to train: | Checkpoint name | Num layers | Num parameters | |------------------------------|----|----------| | esm2_t48_15B_UR50D | 48 | 15B | | esm2_t36_3B_UR50D | 36 | 3B | | esm2_t33_650M_UR50D | 33 | 650M | | esm2_t30_150M_UR50D | 30 | 150M | | esm2_t12_35M_UR50D | 12 | 35M | | esm2_t6_8M_UR50D | 6 | 8M |", + "model_explanation_gemini": "ESM-2 is a protein language model trained for masked language modeling, designed for fine-tuning on protein sequence tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_esmfold_v1.json b/data/model_data_json/facebook_esmfold_v1.json new file mode 100644 index 0000000000000000000000000000000000000000..55ec469c31d699a74863cf8beaeb48bc74b47e1f --- /dev/null +++ b/data/model_data_json/facebook_esmfold_v1.json @@ -0,0 +1,14 @@ +{ + "model_id": "facebook/esmfold_v1", + "downloads": 20947347, + "tags": [ + "transformers", + "pytorch", + "esm", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- # ESMFold ESMFold is a state-of-the-art end-to-end protein folding model based on an ESM-2 backbone. It does not require any lookup or MSA step, and therefore does not require any external databases to be present in order to make predictions. As a result, inference time is very significantly faster than AlphaFold2. 
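For ESMFold (described just above), a hedged sketch of single-sequence folding via the transformers EsmForProteinFolding class; the test sequence is illustrative and post-processing into a PDB file is omitted:

```python
import torch
from transformers import AutoTokenizer, EsmForProteinFolding

tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")
# low_cpu_mem_usage is optional but helps with this large checkpoint
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1", low_cpu_mem_usage=True)

sequence = "MQIFVKTLTGKTITLEVEPSTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
inputs = tokenizer([sequence], return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    outputs = model(**inputs)

positions = outputs.positions  # predicted atom coordinates of the folded structure
```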
For details on the model architecture and training, please refer to the accompanying paper. If you're interested in using ESMFold in practice, please check out the associated tutorial notebook.", "model_explanation_gemini": "Predicts protein structures end-to-end without external databases, offering faster inference than AlphaFold2." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_fasttext-language-identification.json b/data/model_data_json/facebook_fasttext-language-identification.json new file mode 100644 index 0000000000000000000000000000000000000000..2aa996f76f1f303da4f7b92ffbc921bafdfdf2a2 --- /dev/null +++ b/data/model_data_json/facebook_fasttext-language-identification.json @@ -0,0 +1,17 @@ +{ + "model_id": "facebook/fasttext-language-identification", + "downloads": 332220, + "tags": [ + "fasttext", + "text-classification", + "language-identification", + "arxiv:1607.04606", + "arxiv:1802.06893", + "arxiv:1607.01759", + "arxiv:1612.03651", + "license:cc-by-nc-4.0", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 library_name: fasttext tags: - text-classification - language-identification --- # fastText (Language Identification) fastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices. It was introduced in this paper. The official website can be found here. This LID (Language IDentification) model is used to predict the language of the input text, and the hosted version () was released as part of the NLLB project and can detect 217 languages. You can find older versions (ones that can identify 157 languages) on the official fastText website. ## Model description fastText is a library for efficient learning of word representations and sentence classification. fastText is designed to be simple to use for developers, domain experts, and students. It's dedicated to text classification and learning word representations, and was designed to allow for quick model iteration and refinement without specialized hardware. fastText models can be trained on more than a billion words on any multicore CPU in less than a few minutes. It includes pre-trained models learned on Wikipedia and in over 157 different languages. fastText can be used as a command-line tool, linked to a C++ application, or used as a library for use cases from experimentation and prototyping to production. ## Intended uses & limitations You can use pre-trained word vectors for text classification or language identification. See the tutorials and resources on its official website to look for tasks that interest you. ### How to use Here is how to use this model to detect the language of a given text: ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. Cosine similarity can be used to measure the similarity between two different word vectors. If two vectors are identical, the cosine similarity will be 1. For two completely unrelated vectors, the value will be 0. If two vectors have an opposite relationship, the value will be -1. ## Training data Pre-trained word vectors for 157 languages were trained on Common Crawl and Wikipedia using fastText. These models were trained using CBOW with position-weights, in dimension 300, with character n-grams of length 5, a window of size 5 and 10 negatives.
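The "How to use" snippet referenced above was stripped from this dump; it presumably downloaded the released weights and called fastText's predict. A sketch of that flow using huggingface_hub (the example sentence and the exact label/score shown are illustrative):

```python
import fasttext
from huggingface_hub import hf_hub_download

# download the LID weights from this repo and load them with the fasttext library
model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
model = fasttext.load_model(model_path)

model.predict("Bonjour, comment allez-vous ?")
# (('__label__fra_Latn',), array([0.99...]))
```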
We also distribute three new word analogy datasets, for French, Hindi and Polish. ## Training procedure ### Tokenization We used the Stanford word segmenter for Chinese, Mecab for Japanese and UETsegmenter for Vietnamese. For languages using the Latin, Cyrillic, Hebrew or Greek scripts, we used the tokenizer from the Europarl preprocessing tools. For the remaining languages, we used the ICU tokenizer. More information about the training of these models can be found in the article Learning Word Vectors for 157 Languages. ### License The language identification model is distributed under the *Creative Commons Attribution-NonCommercial 4.0 International Public License*. ### Evaluation datasets The analogy evaluation datasets described in the paper are available here: French, Hindi, Polish. ### BibTeX entry and citation info Please cite [1] if using this code for learning word representations or [2] if using for text classification. [1] P. Bojanowski\\*, E. Grave\\*, A. Joulin, T. Mikolov, *Enriching Word Vectors with Subword Information* [2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, *Bag of Tricks for Efficient Text Classification* [3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, *FastText.zip: Compressing text classification models* If you use these word vectors, please cite the following paper: [4] E. Grave\\*, P. Bojanowski\\*, P. Gupta, A. Joulin, T. Mikolov, *Learning Word Vectors for 157 Languages* (\\* These authors contributed equally.)", + "model_explanation_gemini": "Identifies the language of input text from 217 possible languages using fastText's efficient classification capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_galactica-125m.json b/data/model_data_json/facebook_galactica-125m.json new file mode 100644 index 0000000000000000000000000000000000000000..b5d3ca25dcb1cecfc10483e1953fd731a90258e1 --- /dev/null +++ b/data/model_data_json/facebook_galactica-125m.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/galactica-125m", + "downloads": 191214, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "opt", + "text-generation", + "galactica", + "arxiv:1810.03993", + "license:cc-by-nc-4.0", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 tags: - galactica widget: - text: \"The Transformer architecture [START_REF]\" - text: \"The Schwarzschild radius is defined as: \\\\[\" - text: \"A force of 0.6N is applied to an object, which accelerates at 3m/s. What is its mass? \" - text: \"Lecture 1: The Ising Model\\n\\n\" - text: \"[START_I_SMILES]\" - text: \"[START_AMINO]GHMQSITAGQKVISKHKNGRFYQCEVVRLTTETFYEVNFDDGSFSDNLYPEDIVSQDCLQFGPPAEGEVVQVRWTDGQVYGAKFVASHPIQMYQVEFEDGSQLVVKRDDVYTLDEELP[END_AMINO] ## Keywords\" inference: false --- !logo # GALACTICA 125M (mini) Model card from the original repo Following Mitchell et al. (2018), this model card provides information about the GALACTICA model, how it was trained, and the intended use cases. Full details about how the model was trained and evaluated can be found in the release paper. ## Model Details The GALACTICA models are trained on a large-scale scientific corpus. The models are designed to perform scientific tasks, including but not limited to citation prediction, scientific QA, mathematical reasoning, summarization, document generation, molecular property prediction and entity extraction. 
The models were developed by the Papers with Code team at Meta AI to study the use of language models for the automatic organization of science. We train models with sizes ranging from 125M to 120B parameters. Below is a summary of the released models: | Size | Parameters | |:-----------:|:-----------:| | | 125 M | | | 1.3 B | | | 6.7 B | | | 30 B | | | 120 B | ## Release Date November 2022 ## Model Type Transformer-based architecture in a decoder-only setup with a few modifications (see paper for more details). ## Paper & Demo Paper / Demo ## Model Use The primary intended users of the GALACTICA models are researchers studying language models applied to the scientific domain. We also anticipate the model will be useful for developers who wish to build scientific tooling. However, we caution against production use without safeguards given the potential of language models to hallucinate. The models are made available under a non-commercial CC BY-NC 4.0 license. More information about how to use the model can be found in the README.md of this repository. ## Training Data The GALACTICA models are trained on 106 billion tokens of open-access scientific text and data. This includes papers, textbooks, scientific websites, encyclopedias, reference material, knowledge bases, and more. We tokenize different modalities to provide a natural language interface for different tasks. See the README.md for more information. See the paper for full information on the training data. ## How to use Find below some example scripts on how to use the model in : ## Using the Pytorch model ### Running the model on a CPU
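The collapsed snippet ("Click to expand" widget residue) was stripped from this dump. A sketch of standard CPU inference, assuming the usual transformers OPT classes that match this model's opt tag, with the card's own [START_REF] prompt:

```python
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = OPTForCausalLM.from_pretrained("facebook/galactica-125m")

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```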
### Running the model on a GPU
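A sketch of the GPU variant, assuming accelerate is installed for device_map="auto":

```python
# pip install accelerate
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = OPTForCausalLM.from_pretrained("facebook/galactica-125m", device_map="auto")

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```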
### Running the model on a GPU using different precisions #### FP16
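A sketch of half-precision loading via torch_dtype:

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = OPTForCausalLM.from_pretrained(
    "facebook/galactica-125m", device_map="auto", torch_dtype=torch.float16
)

input_ids = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```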
#### INT8
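A sketch of 8-bit loading; this assumes the bitsandbytes integration available when this card was written (load_in_8bit):

```python
# pip install accelerate bitsandbytes
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = OPTForCausalLM.from_pretrained("facebook/galactica-125m", device_map="auto", load_in_8bit=True)

input_ids = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```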
## Performance and Limitations The model outperforms several existing language models on a range of knowledge probes, reasoning, and knowledge-intensive scientific tasks. This also extends to general NLP tasks, where GALACTICA outperforms other open-source general language models. That being said, we note a number of limitations in this section. As with other language models, GALACTICA is often prone to hallucination - and training on a high-quality academic corpus does not prevent this, especially for less popular and less cited scientific concepts. There are no guarantees of truthful output when generating from the model. This extends to specific modalities such as citation prediction. While GALACTICA's citation behaviour approaches the ground truth citation behaviour with scale, the model continues to exhibit a popularity bias at larger scales. In addition, we evaluated the model on several types of benchmarks related to stereotypes and toxicity. Overall, the model exhibits substantially lower toxicity rates compared to other large language models. That being said, the model continues to exhibit bias on certain measures (see the paper for details). So we recommend care when using the model for generations. ## Broader Implications GALACTICA can potentially be used as a new way to discover academic literature. We also expect a lot of downstream use for application to particular domains, such as mathematics, biology, and chemistry. In the paper, we demonstrated several examples of the model acting as an alternative to standard search tools. We expect a new generation of scientific tools to be built upon large language models such as GALACTICA. We encourage researchers to investigate beneficial and new use cases for these models. That being said, it is important to be aware of the current limitations of large language models. Researchers should pay attention to common issues such as hallucination and biases that could emerge from using these models. ## Citation", "model_explanation_gemini": "A 125M-parameter transformer-based model trained on scientific texts to perform tasks like citation prediction, scientific QA, mathematical reasoning, and document generation for research purposes." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_hubert-base-ls960.json b/data/model_data_json/facebook_hubert-base-ls960.json new file mode 100644 index 0000000000000000000000000000000000000000..e7844c140554d56ef86cee857375d831efe5d8c5 --- /dev/null +++ b/data/model_data_json/facebook_hubert-base-ls960.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/hubert-base-ls960", + "downloads": 141549, + "tags": [ + "transformers", + "pytorch", + "tf", + "hubert", + "feature-extraction", + "speech", + "en", + "dataset:librispeech_asr", + "arxiv:2106.07447", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - librispeech_asr tags: - speech license: apache-2.0 --- # Hubert-Base Facebook's Hubert The base model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16kHz. **Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model.
Paper Authors: Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed **Abstract** Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised clustering step rather than the intrinsic quality of the assigned cluster labels. Starting with a simple k-means teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the state-of-the-art wav2vec 2.0 performance on the Librispeech (960h) and Libri-light (60,000h) benchmarks with 10min, 1h, 10h, 100h, and 960h fine-tuning subsets. Using a 1B parameter model, HuBERT shows up to 19% and 13% relative WER reduction on the more challenging dev-other and test-other evaluation subsets. The original model can be found under . # Usage See this blog for more information on how to fine-tune the model. Note that the class has to be replaced by .", + "model_explanation_gemini": "Pretrained on 16kHz speech audio for self-supervised speech representation learning, requiring fine-tuning with a tokenizer for speech recognition tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_hubert-large-ls960-ft.json b/data/model_data_json/facebook_hubert-large-ls960-ft.json new file mode 100644 index 0000000000000000000000000000000000000000..6a7f6d86d15f26992f0f2a8f96e5c86f62e86cc5 --- /dev/null +++ b/data/model_data_json/facebook_hubert-large-ls960-ft.json @@ -0,0 +1,24 @@ +{ + "model_id": "facebook/hubert-large-ls960-ft", + "downloads": 589934, + "tags": [ + "transformers", + "pytorch", + "tf", + "hubert", + "automatic-speech-recognition", + "speech", + "audio", + "hf-asr-leaderboard", + "en", + "dataset:libri-light", + "dataset:librispeech_asr", + "arxiv:2106.07447", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - libri-light - librispeech_asr tags: - speech - audio - automatic-speech-recognition - hf-asr-leaderboard license: apache-2.0 model-index: - name: hubert-large-ls960-ft results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 1.9 --- # Hubert-Large-Finetuned Facebook's Hubert The large model fine-tuned on 960h of Librispeech on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz. The model is a fine-tuned version of hubert-large-ll60k. 
Paper Authors: Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed **Abstract** Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised clustering step rather than the intrinsic quality of the assigned cluster labels. Starting with a simple k-means teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the state-of-the-art wav2vec 2.0 performance on the Librispeech (960h) and Libri-light (60,000h) benchmarks with 10min, 1h, 10h, 100h, and 960h fine-tuning subsets. Using a 1B parameter model, HuBERT shows up to 19% and 13% relative WER reduction on the more challenging dev-other and test-other evaluation subsets. The original model can be found under . # Usage The model can be used for automatic-speech-recognition as follows:", + "model_explanation_gemini": "Fine-tuned for automatic speech recognition, this model converts 16kHz speech audio to text with high accuracy, specifically trained on 960 hours of LibriSpeech data." 
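The usage snippet referenced above was stripped from this dump; a sketch of CTC transcription with transformers, where the dummy LibriSpeech split stands in for any 16kHz audio:

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2Processor, HubertForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_values = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt").input_values

logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])
```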
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_m2m100_1.2B.json b/data/model_data_json/facebook_m2m100_1.2B.json new file mode 100644 index 0000000000000000000000000000000000000000..f8e70be46b3073e80ebcf2b1015415c9611de50e --- /dev/null +++ b/data/model_data_json/facebook_m2m100_1.2B.json @@ -0,0 +1,119 @@ +{ + "model_id": "facebook/m2m100_1.2B", + "downloads": 454332, + "tags": [ + "transformers", + "pytorch", + "rust", + "m2m_100", + "text2text-generation", + "multilingual", + "af", + "am", + "ar", + "ast", + "az", + "ba", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "es", + "et", + "fa", + "ff", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "ht", + "hu", + "hy", + "id", + "ig", + "ilo", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "lb", + "lg", + "ln", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "ns", + "oc", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "ss", + "su", + "sv", + "sw", + "ta", + "th", + "tl", + "tn", + "tr", + "uk", + "ur", + "uz", + "vi", + "wo", + "xh", + "yi", + "yo", + "zh", + "zu", + "arxiv:2010.11125", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - ast - az - ba - be - bg - bn - br - bs - ca - ceb - cs - cy - da - de - el - en - es - et - fa - ff - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - ht - hu - hy - id - ig - ilo - is - it - ja - jv - ka - kk - km - kn - ko - lb - lg - ln - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - ns - oc - or - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - so - sq - sr - ss - su - sv - sw - ta - th - tl - tn - tr - uk - ur - uz - vi - wo - xh - yi - yo - zh - zu license: mit --- # M2M100 1.2B M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. It was introduced in this paper and first released in this repository. The model that can directly translate between the 9,900 directions of 100 languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the parameter to the method. *Note: depends on , so make sure to install it before running the example.* To install run See the model hub to look for more fine-tuned versions. 
## Languages covered Afrikaans (af), Amharic (am), Arabic (ar), Asturian (ast), Azerbaijani (az), Bashkir (ba), Belarusian (be), Bulgarian (bg), Bengali (bn), Breton (br), Bosnian (bs), Catalan; Valencian (ca), Cebuano (ceb), Czech (cs), Welsh (cy), Danish (da), German (de), Greeek (el), English (en), Spanish (es), Estonian (et), Persian (fa), Fulah (ff), Finnish (fi), French (fr), Western Frisian (fy), Irish (ga), Gaelic; Scottish Gaelic (gd), Galician (gl), Gujarati (gu), Hausa (ha), Hebrew (he), Hindi (hi), Croatian (hr), Haitian; Haitian Creole (ht), Hungarian (hu), Armenian (hy), Indonesian (id), Igbo (ig), Iloko (ilo), Icelandic (is), Italian (it), Japanese (ja), Javanese (jv), Georgian (ka), Kazakh (kk), Central Khmer (km), Kannada (kn), Korean (ko), Luxembourgish; Letzeburgesch (lb), Ganda (lg), Lingala (ln), Lao (lo), Lithuanian (lt), Latvian (lv), Malagasy (mg), Macedonian (mk), Malayalam (ml), Mongolian (mn), Marathi (mr), Malay (ms), Burmese (my), Nepali (ne), Dutch; Flemish (nl), Norwegian (no), Northern Sotho (ns), Occitan (post 1500) (oc), Oriya (or), Panjabi; Punjabi (pa), Polish (pl), Pushto; Pashto (ps), Portuguese (pt), Romanian; Moldavian; Moldovan (ro), Russian (ru), Sindhi (sd), Sinhala; Sinhalese (si), Slovak (sk), Slovenian (sl), Somali (so), Albanian (sq), Serbian (sr), Swati (ss), Sundanese (su), Swedish (sv), Swahili (sw), Tamil (ta), Thai (th), Tagalog (tl), Tswana (tn), Turkish (tr), Ukrainian (uk), Urdu (ur), Uzbek (uz), Vietnamese (vi), Wolof (wo), Xhosa (xh), Yiddish (yi), Yoruba (yo), Chinese (zh), Zulu (zu) ## BibTeX entry and citation info", + "model_explanation_gemini": "Directly translates between 9,900 language directions across 100 languages using a multilingual seq-to-seq approach." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_m2m100_418M.json b/data/model_data_json/facebook_m2m100_418M.json new file mode 100644 index 0000000000000000000000000000000000000000..3b114cd4501bd87bf620e1adf44a98ee119e3a4f --- /dev/null +++ b/data/model_data_json/facebook_m2m100_418M.json @@ -0,0 +1,119 @@ +{ + "model_id": "facebook/m2m100_418M", + "downloads": 1289499, + "tags": [ + "transformers", + "pytorch", + "rust", + "m2m_100", + "text2text-generation", + "multilingual", + "af", + "am", + "ar", + "ast", + "az", + "ba", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "es", + "et", + "fa", + "ff", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "ht", + "hu", + "hy", + "id", + "ig", + "ilo", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "lb", + "lg", + "ln", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "ns", + "oc", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "ss", + "su", + "sv", + "sw", + "ta", + "th", + "tl", + "tn", + "tr", + "uk", + "ur", + "uz", + "vi", + "wo", + "xh", + "yi", + "yo", + "zh", + "zu", + "arxiv:2010.11125", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - ast - az - ba - be - bg - bn - br - bs - ca - ceb - cs - cy - da - de - el - en - es - et - fa - ff - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - ht - hu - hy - id - ig - ilo - is - it - ja - jv - ka - kk - km - kn - ko - lb - lg - ln - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - ns 
- oc - or - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - so - sq - sr - ss - su - sv - sw - ta - th - tl - tn - tr - uk - ur - uz - vi - wo - xh - yi - yo - zh - zu license: mit --- # M2M100 418M M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. It was introduced in this paper and first released in this repository. The model that can directly translate between the 9,900 directions of 100 languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the parameter to the method. *Note: depends on , so make sure to install it before running the example.* To install run See the model hub to look for more fine-tuned versions. ## Languages covered Afrikaans (af), Amharic (am), Arabic (ar), Asturian (ast), Azerbaijani (az), Bashkir (ba), Belarusian (be), Bulgarian (bg), Bengali (bn), Breton (br), Bosnian (bs), Catalan; Valencian (ca), Cebuano (ceb), Czech (cs), Welsh (cy), Danish (da), German (de), Greeek (el), English (en), Spanish (es), Estonian (et), Persian (fa), Fulah (ff), Finnish (fi), French (fr), Western Frisian (fy), Irish (ga), Gaelic; Scottish Gaelic (gd), Galician (gl), Gujarati (gu), Hausa (ha), Hebrew (he), Hindi (hi), Croatian (hr), Haitian; Haitian Creole (ht), Hungarian (hu), Armenian (hy), Indonesian (id), Igbo (ig), Iloko (ilo), Icelandic (is), Italian (it), Japanese (ja), Javanese (jv), Georgian (ka), Kazakh (kk), Central Khmer (km), Kannada (kn), Korean (ko), Luxembourgish; Letzeburgesch (lb), Ganda (lg), Lingala (ln), Lao (lo), Lithuanian (lt), Latvian (lv), Malagasy (mg), Macedonian (mk), Malayalam (ml), Mongolian (mn), Marathi (mr), Malay (ms), Burmese (my), Nepali (ne), Dutch; Flemish (nl), Norwegian (no), Northern Sotho (ns), Occitan (post 1500) (oc), Oriya (or), Panjabi; Punjabi (pa), Polish (pl), Pushto; Pashto (ps), Portuguese (pt), Romanian; Moldavian; Moldovan (ro), Russian (ru), Sindhi (sd), Sinhala; Sinhalese (si), Slovak (sk), Slovenian (sl), Somali (so), Albanian (sq), Serbian (sr), Swati (ss), Sundanese (su), Swedish (sv), Swahili (sw), Tamil (ta), Thai (th), Tagalog (tl), Tswana (tn), Turkish (tr), Ukrainian (uk), Urdu (ur), Uzbek (uz), Vietnamese (vi), Wolof (wo), Xhosa (xh), Yiddish (yi), Yoruba (yo), Chinese (zh), Zulu (zu) ## BibTeX entry and citation info", + "model_explanation_gemini": "Translates text directly between 100 languages in 9,900 language directions by forcing the target language as the first generated token." 
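The generation snippet referenced in both M2M100 cards was stripped from this dump; a sketch of the documented pattern, where forced_bos_token_id supplies the target language id to generate (the example strings are illustrative):

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# translate Chinese to English
tokenizer.src_lang = "zh"
encoded = tokenizer("生活就像一盒巧克力。", return_tensors="pt")

# force the target language id as the first generated token
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
tokenizer.batch_decode(generated, skip_special_tokens=True)
# => ["Life is like a box of chocolate."]
```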
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_mask2former-swin-large-ade-semantic.json b/data/model_data_json/facebook_mask2former-swin-large-ade-semantic.json new file mode 100644 index 0000000000000000000000000000000000000000..90a935bb8866b83504b132b3526ca734c789de54 --- /dev/null +++ b/data/model_data_json/facebook_mask2former-swin-large-ade-semantic.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/mask2former-swin-large-ade-semantic", + "downloads": 221868, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mask2former", + "vision", + "image-segmentation", + "dataset:coco", + "arxiv:2112.01527", + "arxiv:2107.06278", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - vision - image-segmentation datasets: - coco widget: - src: example_title: Cats - src: example_title: Castle --- # Mask2Former Mask2Former model trained on ADE20k semantic segmentation (large-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository. Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency, by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks. !model image ## Intended uses & limitations You can use this particular checkpoint for semantic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: For more code examples, we refer to the documentation.", "model_explanation_gemini": "Performs semantic segmentation on images using a Swin backbone to predict masks and labels, trained on ADE20k for universal image segmentation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_mask2former-swin-large-cityscapes-semantic.json b/data/model_data_json/facebook_mask2former-swin-large-cityscapes-semantic.json new file mode 100644 index 0000000000000000000000000000000000000000..0b3bd7ccd569dd3e2b0eb90644c11d3891834275 --- /dev/null +++ b/data/model_data_json/facebook_mask2former-swin-large-cityscapes-semantic.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/mask2former-swin-large-cityscapes-semantic", + "downloads": 295469, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mask2former", + "vision", + "image-segmentation", + "dataset:coco", + "arxiv:2112.01527", + "arxiv:2107.06278", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - vision - image-segmentation datasets: - coco widget: - src: example_title: Cats - src: example_title: Castle --- # Mask2Former Mask2Former model trained on Cityscapes semantic segmentation (large-sized version, Swin backbone).
It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository. Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency, by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks. !model image ## Intended uses & limitations You can use this particular checkpoint for semantic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: For more code examples, we refer to the documentation.", "model_explanation_gemini": "Performs semantic segmentation on Cityscapes images using a Swin backbone to predict masks and labels, treating segmentation tasks uniformly as instance segmentation." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_mask2former-swin-tiny-coco-instance.json b/data/model_data_json/facebook_mask2former-swin-tiny-coco-instance.json new file mode 100644 index 0000000000000000000000000000000000000000..107754a6fdcdeff4b75ac61e4f2efc74084ffd0e --- /dev/null +++ b/data/model_data_json/facebook_mask2former-swin-tiny-coco-instance.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/mask2former-swin-tiny-coco-instance", + "downloads": 138274, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mask2former", + "vision", + "image-segmentation", + "dataset:coco", + "arxiv:2112.01527", + "arxiv:2107.06278", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - vision - image-segmentation datasets: - coco widget: - src: example_title: Cats - src: example_title: Castle --- # Mask2Former Mask2Former model trained on COCO instance segmentation (tiny-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository. Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency, by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks. !model image ## Intended uses & limitations You can use this particular checkpoint for instance segmentation.
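The "How to use" snippets in these Mask2Former cards were stripped from this dump; a sketch of instance-segmentation inference with the transformers Mask2Former classes (the COCO test image URL is a placeholder):

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-tiny-coco-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-tiny-coco-instance")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# post-process the predicted masks and labels into a per-instance segmentation map
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
predicted_map = result["segmentation"]
```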
See the model hub to look for other fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: For more code examples, we refer to the documentation.", "model_explanation_gemini": "Performs instance segmentation on images by predicting masks and labels using a Swin Transformer backbone, trained on the COCO dataset." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_mbart-large-50-many-to-many-mmt.json b/data/model_data_json/facebook_mbart-large-50-many-to-many-mmt.json new file mode 100644 index 0000000000000000000000000000000000000000..d479f1bbabc764b5285b32a5573659eef7caf6cc --- /dev/null +++ b/data/model_data_json/facebook_mbart-large-50-many-to-many-mmt.json @@ -0,0 +1,75 @@ +{ + "model_id": "facebook/mbart-large-50-many-to-many-mmt", + "downloads": 346048, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "mbart", + "text2text-generation", + "mbart-50", + "translation", + "multilingual", + "ar", + "cs", + "de", + "en", + "es", + "et", + "fi", + "fr", + "gu", + "hi", + "it", + "ja", + "kk", + "ko", + "lt", + "lv", + "my", + "ne", + "nl", + "ro", + "ru", + "si", + "tr", + "vi", + "zh", + "af", + "az", + "bn", + "fa", + "he", + "hr", + "id", + "ka", + "km", + "mk", + "ml", + "mn", + "mr", + "pl", + "ps", + "pt", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "uk", + "ur", + "xh", + "gl", + "sl", + "arxiv:2008.00401", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ar - cs - de - en - es - et - fi - fr - gu - hi - it - ja - kk - ko - lt - lv - my - ne - nl - ro - ru - si - tr - vi - zh - af - az - bn - fa - he - hr - id - ka - km - mk - ml - mn - mr - pl - ps - pt - sv - sw - ta - te - th - tl - uk - ur - xh - gl - sl tags: - mbart-50 pipeline_tag: translation --- # mBART-50 many to many multilingual machine translation This model is a checkpoint of mBART-large-50 fine-tuned for multilingual machine translation. It was introduced in the Multilingual Translation with Extensible Multilingual Pretraining and Finetuning paper. The model can translate directly between any pair of 50 languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the parameter to the method. See the model hub to look for more fine-tuned versions.
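A sketch of the many-to-many translation pattern for mBART-50, where the tokenizer's src_lang sets the source language and forced_bos_token_id forces the target language id during generation (the Hindi-to-French pair is illustrative):

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

# translate Hindi to French
tokenizer.src_lang = "hi_IN"
encoded = tokenizer("संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है", return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
tokenizer.batch_decode(generated, skip_special_tokens=True)
```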
## Languages covered Arabic (ar_AR), Czech (cs_CZ), German (de_DE), English (en_XX), Spanish (es_XX), Estonian (et_EE), Finnish (fi_FI), French (fr_XX), Gujarati (gu_IN), Hindi (hi_IN), Italian (it_IT), Japanese (ja_XX), Kazakh (kk_KZ), Korean (ko_KR), Lithuanian (lt_LT), Latvian (lv_LV), Burmese (my_MM), Nepali (ne_NP), Dutch (nl_XX), Romanian (ro_RO), Russian (ru_RU), Sinhala (si_LK), Turkish (tr_TR), Vietnamese (vi_VN), Chinese (zh_CN), Afrikaans (af_ZA), Azerbaijani (az_AZ), Bengali (bn_IN), Persian (fa_IR), Hebrew (he_IL), Croatian (hr_HR), Indonesian (id_ID), Georgian (ka_GE), Khmer (km_KH), Macedonian (mk_MK), Malayalam (ml_IN), Mongolian (mn_MN), Marathi (mr_IN), Polish (pl_PL), Pashto (ps_AF), Portuguese (pt_XX), Swedish (sv_SE), Swahili (sw_KE), Tamil (ta_IN), Telugu (te_IN), Thai (th_TH), Tagalog (tl_XX), Ukrainian (uk_UA), Urdu (ur_PK), Xhosa (xh_ZA), Galician (gl_ES), Slovene (sl_SI) ## BibTeX entry and citation info", + "model_explanation_gemini": "Multilingual machine translation model supporting direct translation between 50 languages by forcing target language tokens as initial output." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_metaclip-b16-fullcc2.5b.json b/data/model_data_json/facebook_metaclip-b16-fullcc2.5b.json new file mode 100644 index 0000000000000000000000000000000000000000..ca797958d84f267fc63d47eb5738736c242df07e --- /dev/null +++ b/data/model_data_json/facebook_metaclip-b16-fullcc2.5b.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/metaclip-b16-fullcc2.5b", + "downloads": 87437, + "tags": [ + "transformers", + "pytorch", + "clip", + "zero-shot-image-classification", + "vision", + "metaclip", + "arxiv:2309.16671", + "arxiv:2103.00020", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 tags: - vision - metaclip widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # MetaCLIP model, base-sized version, patch resolution 16 MetaCLIP model applied to 2.5 billion data points of CommonCrawl (CC). It was introduced in the paper Demystifying CLIP Data by Xu et al. and first released in this repository. Disclaimer: The team releasing MetaCLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Demystifying CLIP Data paper aims to reveal CLIP’s method around training data curation. OpenAI never open-sourced code regarding their data preparation pipeline. \"drawing\" CLIP high-level overview. Taken from the
## Intended uses & limitations You can use the raw model for linking images with text in a shared embedding space. This enables things like zero-shot image classification, text-based image retrieval, image-based text retrieval, etc. ### How to use We refer to the docs; just replace the names of the models on the hub. ### BibTeX entry and citation info", + "model_explanation_gemini": "Links images with text in a shared embedding space for tasks like zero-shot image classification and text-based image retrieval." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_metaclip-b32-400m.json b/data/model_data_json/facebook_metaclip-b32-400m.json new file mode 100644 index 0000000000000000000000000000000000000000..56b5006fa838465f92c7dc2eec2503d970c2989a --- /dev/null +++ b/data/model_data_json/facebook_metaclip-b32-400m.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/metaclip-b32-400m", + "downloads": 123413, + "tags": [ + "transformers", + "pytorch", + "clip", + "zero-shot-image-classification", + "vision", + "metaclip", + "arxiv:2309.16671", + "arxiv:2103.00020", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 tags: - vision - metaclip widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # MetaCLIP model, base-sized version, patch resolution 32 MetaCLIP model applied to 400 million data points of CommonCrawl (CC). It was introduced in the paper Demystifying CLIP Data by Xu et al. and first released in this repository. Disclaimer: The team releasing MetaCLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Demystifying CLIP Data paper aims to reveal CLIP’s method around training data curation. OpenAI never open-sourced code regarding their data preparation pipeline. CLIP high-level overview, taken from the CLIP paper. ## Intended uses & limitations You can use the raw model for linking images with text in a shared embedding space. This enables things like zero-shot image classification, text-based image retrieval, image-based text retrieval, etc. ### How to use We refer to the docs; just replace the names of the models on the hub. ### BibTeX entry and citation info", + "model_explanation_gemini": "Links images with text in a shared embedding space for tasks like zero-shot image classification and text-based image retrieval."
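A minimal sketch of that zero-shot setup, assuming the standard CLIP classes in transformers (the image URL and candidate labels below are placeholders, not part of the original card):

```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("facebook/metaclip-b32-400m")
processor = CLIPProcessor.from_pretrained("facebook/metaclip-b32-400m")

# any RGB image works; this COCO URL is only a placeholder
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity as label probabilities
print(probs)
```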
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_mms-lid-126.json b/data/model_data_json/facebook_mms-lid-126.json new file mode 100644 index 0000000000000000000000000000000000000000..1f3b67c2fa742f71429d451574f24359d8d215d2 --- /dev/null +++ b/data/model_data_json/facebook_mms-lid-126.json @@ -0,0 +1,148 @@ +{ + "model_id": "facebook/mms-lid-126", + "downloads": 2104927, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "audio-classification", + "mms", + "ab", + "af", + "ak", + "am", + "ar", + "as", + "av", + "ay", + "az", + "ba", + "bm", + "be", + "bn", + "bi", + "bo", + "sh", + "br", + "bg", + "ca", + "cs", + "ce", + "cv", + "ku", + "cy", + "da", + "de", + "dv", + "dz", + "el", + "en", + "eo", + "et", + "eu", + "ee", + "fo", + "fa", + "fj", + "fi", + "fr", + "fy", + "ff", + "ga", + "gl", + "gn", + "gu", + "zh", + "ht", + "ha", + "he", + "hi", + "hu", + "hy", + "ig", + "ia", + "ms", + "is", + "it", + "jv", + "ja", + "kn", + "ka", + "kk", + "kr", + "km", + "ki", + "rw", + "ky", + "ko", + "kv", + "lo", + "la", + "lv", + "ln", + "lt", + "lb", + "lg", + "mh", + "ml", + "mr", + "mk", + "mg", + "mt", + "mn", + "mi", + "my", + "nl", + "no", + "ne", + "ny", + "oc", + "om", + "or", + "os", + "pa", + "pl", + "pt", + "ps", + "qu", + "ro", + "rn", + "ru", + "sg", + "sk", + "sl", + "sm", + "sn", + "sd", + "so", + "es", + "sq", + "su", + "sv", + "sw", + "ta", + "tt", + "te", + "tg", + "tl", + "th", + "ti", + "ts", + "tr", + "uk", + "vi", + "wo", + "xh", + "yo", + "zu", + "za", + "dataset:google/fleurs", + "arxiv:2305.13516", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mms language: - ab - af - ak - am - ar - as - av - ay - az - ba - bm - be - bn - bi - bo - sh - br - bg - ca - cs - ce - cv - ku - cy - da - de - dv - dz - el - en - eo - et - eu - ee - fo - fa - fj - fi - fr - fy - ff - ga - gl - gn - gu - zh - ht - ha - he - hi - sh - hu - hy - ig - ia - ms - is - it - jv - ja - kn - ka - kk - kr - km - ki - rw - ky - ko - kv - lo - la - lv - ln - lt - lb - lg - mh - ml - mr - ms - mk - mg - mt - mn - mi - my - zh - nl - 'no' - 'no' - ne - ny - oc - om - or - os - pa - pl - pt - ms - ps - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - ro - rn - ru - sg - sk - sl - sm - sn - sd - so - es - sq - su - sv - sw - ta - tt - te - tg - tl - th - ti - ts - tr - uk - ms - vi - wo - xh - ms - yo - ms - zu - za license: cc-by-nc-4.0 datasets: - google/fleurs metrics: - acc --- # Massively Multilingual Speech (MMS) - Finetuned LID This checkpoint is a model fine-tuned for speech language identification (LID) and part of Facebook's Massive Multilingual Speech project. This checkpoint is based on the Wav2Vec2 architecture and classifies raw audio input to a probability distribution over 126 output classes (each class representing a language). The checkpoint consists of **1 billion parameters** and has been fine-tuned from facebook/mms-1b on 126 languages. ## Table Of Contents - Example - Supported Languages - Model details - Additional links ## Example This MMS checkpoint can be used with Transformers to identify the spoken language of an audio sample. It can recognize the following 126 languages. Let's look at a simple example. First, we install transformers (at least version 4.30) and some other libraries such as datasets. Make sure that the audio data is sampled at 16,000 Hz (16 kHz).
Next, we load the model and processor. We then process the audio data and pass it to the model to classify it into a language, just like we usually do for Wav2Vec2 audio classification models such as ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition. To see all the supported languages of a checkpoint, you can print out the language ids as follows: For more details about the architecture, please have a look at the official docs. ## Supported Languages This model supports 126 languages. Click the following to toggle all supported languages of this checkpoint in ISO 639-3 code. You can find more details about the languages and their ISO 639-3 codes in the MMS Language Coverage Overview.
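A minimal sketch of that flow, assuming a 16 kHz sample streamed from the FLEURS dataset (the split and language below are arbitrary choices, not prescribed by the card):

```python
import torch
from datasets import load_dataset
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "facebook/mms-lid-126"
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

# stream one 16 kHz sample from FLEURS (any 16 kHz mono waveform works)
stream = load_dataset("google/fleurs", "en_us", split="test", streaming=True)
audio = next(iter(stream))["audio"]

inputs = processor(audio["array"], sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

lang_id = int(logits.argmax(dim=-1)[0])
print(model.config.id2label[lang_id])  # ISO 639-3 code, e.g. "eng"
```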
Click to toggle - ara - cmn - eng - spa - fra - mlg - swe - por - vie - ful - sun - asm - ben - zlm - kor - ind - hin - tuk - urd - aze - slv - mon - hau - tel - swh - bod - rus - tur - heb - mar - som - tgl - tat - tha - cat - ron - mal - bel - pol - yor - nld - bul - hat - afr - isl - amh - tam - hun - hrv - lit - cym - fas - mkd - ell - bos - deu - sqi - jav - nob - uzb - snd - lat - nya - grn - mya - orm - lin - hye - yue - pan - jpn - kaz - npi - kat - guj - kan - tgk - ukr - ces - lav - bak - khm - fao - glg - ltz - lao - mlt - sin - sna - ita - srp - mri - nno - pus - eus - ory - lug - bre - luo - slk - fin - dan - yid - est - ceb - war - san - kir - oci - wol - haw - kam - umb - xho - epo - zul - ibo - abk - ckb - nso - gle - kea - ast - sco - glv - ina
## Model details - **Developed by:** Vineel Pratap et al. - **Model type:** Multilingual speech language identification (LID) model - **Language(s):** 126 languages, see supported languages - **License:** CC-BY-NC 4.0 license - **Num parameters**: 1 billion - **Audio sampling rate**: 16,000 Hz (16 kHz) - **Cite as:** @article{pratap2023mms, title={Scaling Speech Technology to 1,000+ Languages}, author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli}, journal={arXiv}, year={2023} } ## Additional Links - Blog post - Transformers documentation. - Paper - GitHub Repository - Other **MMS** checkpoints - MMS base checkpoints: - facebook/mms-1b - facebook/mms-300m - Official Space", + "model_explanation_gemini": "Identifies the spoken language in raw audio from 126 possible languages using a fine-tuned Wav2Vec2 architecture." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_musicgen-medium.json b/data/model_data_json/facebook_musicgen-medium.json new file mode 100644 index 0000000000000000000000000000000000000000..70be37c0f938bf27df3e58e9386800c49d70071f --- /dev/null +++ b/data/model_data_json/facebook_musicgen-medium.json @@ -0,0 +1,16 @@ +{ + "model_id": "facebook/musicgen-medium", + "downloads": 1412420, + "tags": [ + "transformers", + "pytorch", + "musicgen", + "text-to-audio", + "arxiv:2306.05284", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- inference: true tags: - musicgen license: cc-by-nc-4.0 pipeline_tag: text-to-audio widget: - text: a funky house with 80s hip hop vibes example_title: Prompt 1 - text: a chill song with influences from lofi, chillstep and downtempo example_title: Prompt 2 - text: a catchy beat for a podcast intro example_title: Prompt 3 --- # MusicGen - Medium - 1.5B MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. It is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. MusicGen was published in Simple and Controllable Music Generation by *Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez*. Four checkpoints are released: - small - **medium** (this checkpoint) - large - melody ## Example Try out MusicGen yourself! * Audiocraft Colab:
* Hugging Face Colab: * Hugging Face Demo: ## 🤗 Transformers Usage You can run MusicGen locally with the 🤗 Transformers library from version 4.31.0 onwards. 1. First install the 🤗 Transformers library and scipy: 2. Run inference via the text-to-audio (TTA) pipeline. You can infer the MusicGen model via the TTA pipeline in just a few lines of code! 3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 32 kHz audio waveform for more fine-grained control. 4. Listen to the audio samples either in an ipynb notebook: Or save them as a file using a third-party library, e.g. scipy: For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the MusicGen docs. ## Audiocraft Usage You can also run MusicGen locally through the original Audiocraft library: 1. First install the library 2. Make sure to have ffmpeg installed: 3. Run the following Python code: ## Model details **Organization developing the model:** The FAIR team of Meta AI. **Model date:** MusicGen was trained between April 2023 and May 2023. **Model version:** This is version 1 of the model. **Model type:** MusicGen consists of an EnCodec model for audio tokenization, and an auto-regressive language model based on the transformer architecture for music modeling. The model comes in different sizes: 300M, 1.5B and 3.3B parameters; and two variants: a model trained for the text-to-music generation task and a model trained for melody-guided music generation. **Paper or resources for more information:** More information can be found in the paper Simple and Controllable Music Generation. **Citation details:** **License:** Code is released under MIT, model weights are released under CC-BY-NC 4.0. **Where to send questions or comments about the model:** Questions and comments about MusicGen can be sent via the Github repository of the project, or by opening an issue. ## Intended use **Primary intended use:** The primary use of MusicGen is research on AI-based music generation, including: - Research efforts, such as probing and better understanding the limitations of generative models to further improve the state of science - Generation of music guided by text or melody to understand current abilities of generative AI models by machine learning amateurs **Primary intended users:** The primary intended users of the model are researchers in audio, machine learning and artificial intelligence, as well as amateurs seeking to better understand those models. **Out-of-scope use cases:** The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
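A minimal sketch of the text-to-audio pipeline route described above, with a prompt borrowed from the widget examples (scipy is used here only to write the WAV file):

```python
import scipy.io.wavfile
from transformers import pipeline

# requires transformers >= 4.31.0
synthesiser = pipeline("text-to-audio", model="facebook/musicgen-medium")
music = synthesiser(
    "a funky house with 80s hip hop vibes",
    forward_params={"do_sample": True},
)

# music["audio"] is a mono waveform sampled at music["sampling_rate"] (32 kHz)
scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], data=music["audio"])
```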
## Metrics **Model performance measures:** We used the following objective measures to evaluate the model on a standard music benchmark: - Frechet Audio Distance computed on features extracted from a pre-trained audio classifier (VGGish) - Kullback-Leibler Divergence on label distributions extracted from a pre-trained audio classifier (PaSST) - CLAP Score between the audio embedding and the text embedding extracted from a pre-trained CLAP model Additionally, we ran qualitative studies with human participants, evaluating the performance of the model along the following axes: - Overall quality of the music samples; - Text relevance to the provided text input; - Adherence to the melody for melody-guided music generation. More details on performance measures and human studies can be found in the paper. **Decision thresholds:** Not applicable. ## Evaluation datasets The model was evaluated on the MusicCaps benchmark and on an in-domain held-out evaluation set, with no artist overlap with the training set. ## Training datasets The model was trained on licensed data using the following sources: the Meta Music Initiative Sound Collection, Shutterstock music collection and the Pond5 music collection. See the paper for more details about the training set and corresponding preprocessing. ## Evaluation results Below are the objective metrics obtained on MusicCaps with the released model. Note that for the publicly released models, we had all the datasets go through a state-of-the-art music source separation method, namely the open-source Hybrid Transformer for Music Source Separation (HT-Demucs), in order to keep only the instrumental part. This explains the difference in objective metrics with the models used in the paper. | Model | Frechet Audio Distance | KLD | Text Consistency | Chroma Cosine Similarity | |---|---|---|---|---| | facebook/musicgen-small | 4.88 | 1.42 | 0.27 | - | | **facebook/musicgen-medium** | 5.14 | 1.38 | 0.28 | - | | facebook/musicgen-large | 5.48 | 1.37 | 0.28 | - | | facebook/musicgen-melody | 4.93 | 1.41 | 0.27 | 0.44 | More information can be found in the paper Simple and Controllable Music Generation, in the Results section. ## Limitations and biases **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data; we believe that scaling the model on larger datasets can further improve its performance. **Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely the open-source Hybrid Transformer for Music Source Separation (HT-Demucs). **Limitations:** - The model is not able to generate realistic vocals. - The model has been trained with English descriptions and will not perform as well in other languages. - The model does not perform equally well for all music styles and cultures. - The model sometimes generates the end of a song, collapsing to silence. - It is sometimes difficult to assess what types of text descriptions provide the best generations. Prompt engineering may be required to obtain satisfying results. **Biases:** The source of data is potentially lacking diversity and all music cultures are not equally represented in the dataset. The model may not perform equally well on the wide variety of music genres that exist. The generated samples from the model will reflect the biases from the training data.
Further work on this model should include methods for balanced and just representations of cultures, for example, by scaling the training data to be both diverse and inclusive. **Risks and harms:** Biases and limitations of the model may lead to generation of samples that may be considered as biased, inappropriate or offensive. We believe that providing the code to reproduce the research and train new models will make it possible to broaden the application to new and more representative data. **Use cases:** Users must be aware of the biases, limitations and risks of the model. MusicGen is a model developed for artificial intelligence research on controllable music generation. As such, it should not be used for downstream applications without further investigation and mitigation of risks.", + "model_explanation_gemini": "Generates high-quality music samples from text descriptions or audio prompts using a single-stage auto-regressive Transformer model." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_musicgen-small.json b/data/model_data_json/facebook_musicgen-small.json new file mode 100644 index 0000000000000000000000000000000000000000..95dc937182bcab36e577f60e3212d3e2ca359945 --- /dev/null +++ b/data/model_data_json/facebook_musicgen-small.json @@ -0,0 +1,17 @@ +{ + "model_id": "facebook/musicgen-small", + "downloads": 114136, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "musicgen", + "text-to-audio", + "arxiv:2306.05284", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- inference: true tags: - musicgen license: cc-by-nc-4.0 pipeline_tag: text-to-audio widget: - text: \"a funky house with 80s hip hop vibes\" example_title: \"Prompt 1\" - text: \"a chill song with influences from lofi, chillstep and downtempo\" example_title: \"Prompt 2\" - text: \"a catchy beat for a podcast intro\" example_title: \"Prompt 3\" --- # MusicGen - Small - 300M MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. It is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. MusicGen was published in Simple and Controllable Music Generation by *Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez*. Four checkpoints are released: - **small** (this checkpoint) - medium - large - melody ## Example Try out MusicGen yourself! * Audiocraft Colab: * Hugging Face Colab: * Hugging Face Demo: ## 🤗 Transformers Usage You can run MusicGen locally with the 🤗 Transformers library from version 4.31.0 onwards. 1. First install the 🤗 Transformers library and scipy: 2. Run inference via the text-to-audio (TTA) pipeline. You can infer the MusicGen model via the TTA pipeline in just a few lines of code! 3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 32 kHz audio waveform for more fine-grained control. 4. Listen to the audio samples either in an ipynb notebook: Or save them as a file using a third-party library, e.g.
scipy: For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the MusicGen docs. ## Audiocraft Usage You can also run MusicGen locally through the original Audiocraft library: 1. First install the library 2. Make sure to have ffmpeg installed: 3. Run the following Python code: ## Model details **Organization developing the model:** The FAIR team of Meta AI. **Model date:** MusicGen was trained between April 2023 and May 2023. **Model version:** This is version 1 of the model. **Model type:** MusicGen consists of an EnCodec model for audio tokenization, and an auto-regressive language model based on the transformer architecture for music modeling. The model comes in different sizes: 300M, 1.5B and 3.3B parameters; and two variants: a model trained for the text-to-music generation task and a model trained for melody-guided music generation. **Paper or resources for more information:** More information can be found in the paper Simple and Controllable Music Generation. **Citation details:** **License:** Code is released under MIT, model weights are released under CC-BY-NC 4.0. **Where to send questions or comments about the model:** Questions and comments about MusicGen can be sent via the Github repository of the project, or by opening an issue. ## Intended use **Primary intended use:** The primary use of MusicGen is research on AI-based music generation, including: - Research efforts, such as probing and better understanding the limitations of generative models to further improve the state of science - Generation of music guided by text or melody to understand current abilities of generative AI models by machine learning amateurs **Primary intended users:** The primary intended users of the model are researchers in audio, machine learning and artificial intelligence, as well as amateurs seeking to better understand those models. **Out-of-scope use cases:** The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. ## Metrics **Model performance measures:** We used the following objective measures to evaluate the model on a standard music benchmark: - Frechet Audio Distance computed on features extracted from a pre-trained audio classifier (VGGish) - Kullback-Leibler Divergence on label distributions extracted from a pre-trained audio classifier (PaSST) - CLAP Score between the audio embedding and the text embedding extracted from a pre-trained CLAP model Additionally, we ran qualitative studies with human participants, evaluating the performance of the model along the following axes: - Overall quality of the music samples; - Text relevance to the provided text input; - Adherence to the melody for melody-guided music generation. More details on performance measures and human studies can be found in the paper. **Decision thresholds:** Not applicable. ## Evaluation datasets The model was evaluated on the MusicCaps benchmark and on an in-domain held-out evaluation set, with no artist overlap with the training set. ## Training datasets The model was trained on licensed data using the following sources: the Meta Music Initiative Sound Collection, Shutterstock music collection and the Pond5 music collection.
See the paper for more details about the training set and corresponding preprocessing. ## Evaluation results Below are the objective metrics obtained on MusicCaps with the released model. Note that for the publicly released models, we had all the datasets go through a state-of-the-art music source separation method, namely the open-source Hybrid Transformer for Music Source Separation (HT-Demucs), in order to keep only the instrumental part. This explains the difference in objective metrics with the models used in the paper. | Model | Frechet Audio Distance | KLD | Text Consistency | Chroma Cosine Similarity | |---|---|---|---|---| | **facebook/musicgen-small** | 4.88 | 1.42 | 0.27 | - | | facebook/musicgen-medium | 5.14 | 1.38 | 0.28 | - | | facebook/musicgen-large | 5.48 | 1.37 | 0.28 | - | | facebook/musicgen-melody | 4.93 | 1.41 | 0.27 | 0.44 | More information can be found in the paper Simple and Controllable Music Generation, in the Results section. ## Limitations and biases **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data; we believe that scaling the model on larger datasets can further improve its performance. **Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely the open-source Hybrid Transformer for Music Source Separation (HT-Demucs). **Limitations:** - The model is not able to generate realistic vocals. - The model has been trained with English descriptions and will not perform as well in other languages. - The model does not perform equally well for all music styles and cultures. - The model sometimes generates the end of a song, collapsing to silence. - It is sometimes difficult to assess what types of text descriptions provide the best generations. Prompt engineering may be required to obtain satisfying results. **Biases:** The source of data is potentially lacking diversity and all music cultures are not equally represented in the dataset. The model may not perform equally well on the wide variety of music genres that exist. The generated samples from the model will reflect the biases from the training data. Further work on this model should include methods for balanced and just representations of cultures, for example, by scaling the training data to be both diverse and inclusive. **Risks and harms:** Biases and limitations of the model may lead to generation of samples that may be considered as biased, inappropriate or offensive. We believe that providing the code to reproduce the research and train new models will make it possible to broaden the application to new and more representative data. **Use cases:** Users must be aware of the biases, limitations and risks of the model. MusicGen is a model developed for artificial intelligence research on controllable music generation. As such, it should not be used for downstream applications without further investigation and mitigation of risks.", + "model_explanation_gemini": "Generates high-quality music samples from text descriptions or audio prompts using a single-stage auto-regressive Transformer model."
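To illustrate the finer-grained processor + generate path described above, here is a minimal sketch (the prompt, sampling flag and token budget are arbitrary choices):

```python
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(
    text=["a chill song with influences from lofi, chillstep and downtempo"],
    padding=True,
    return_tensors="pt",
)
# ~256 new tokens at the 50 Hz frame rate is roughly 5 seconds of audio
audio_values = model.generate(**inputs, do_sample=True, max_new_tokens=256)

sampling_rate = model.config.audio_encoder.sampling_rate  # 32 kHz
scipy.io.wavfile.write(
    "musicgen_small_out.wav",
    rate=sampling_rate,
    data=audio_values[0, 0].numpy(),
)
```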
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_nllb-200-3.3B.json b/data/model_data_json/facebook_nllb-200-3.3B.json new file mode 100644 index 0000000000000000000000000000000000000000..ab8f4e8a186c39ddc15bad302fd9078d11073956 --- /dev/null +++ b/data/model_data_json/facebook_nllb-200-3.3B.json @@ -0,0 +1,214 @@ +{ + "model_id": "facebook/nllb-200-3.3B", + "downloads": 390060, + "tags": [ + "transformers", + "pytorch", + "m2m_100", + "text2text-generation", + "nllb", + "translation", + "ace", + "acm", + "acq", + "aeb", + "af", + "ajp", + "ak", + "als", + "am", + "apc", + "ar", + "ars", + "ary", + "arz", + "as", + "ast", + "awa", + "ayr", + "azb", + "azj", + "ba", + "bm", + "ban", + "be", + "bem", + "bn", + "bho", + "bjn", + "bo", + "bs", + "bug", + "bg", + "ca", + "ceb", + "cs", + "cjk", + "ckb", + "crh", + "cy", + "da", + "de", + "dik", + "dyu", + "dz", + "el", + "en", + "eo", + "et", + "eu", + "ee", + "fo", + "fj", + "fi", + "fon", + "fr", + "fur", + "fuv", + "gaz", + "gd", + "ga", + "gl", + "gn", + "gu", + "ht", + "ha", + "he", + "hi", + "hne", + "hr", + "hu", + "hy", + "ig", + "ilo", + "id", + "is", + "it", + "jv", + "ja", + "kab", + "kac", + "kam", + "kn", + "ks", + "ka", + "kk", + "kbp", + "kea", + "khk", + "km", + "ki", + "rw", + "ky", + "kmb", + "kmr", + "knc", + "kg", + "ko", + "lo", + "lij", + "li", + "ln", + "lt", + "lmo", + "ltg", + "lb", + "lua", + "lg", + "luo", + "lus", + "lvs", + "mag", + "mai", + "ml", + "mar", + "min", + "mk", + "mt", + "mni", + "mos", + "mi", + "my", + "nl", + "nn", + "nb", + "npi", + "nso", + "nus", + "ny", + "oc", + "ory", + "pag", + "pa", + "pap", + "pbt", + "pes", + "plt", + "pl", + "pt", + "prs", + "quy", + "ro", + "rn", + "ru", + "sg", + "sa", + "sat", + "scn", + "shn", + "si", + "sk", + "sl", + "sm", + "sn", + "sd", + "so", + "st", + "es", + "sc", + "sr", + "ss", + "su", + "sv", + "swh", + "szl", + "ta", + "taq", + "tt", + "te", + "tg", + "tl", + "th", + "ti", + "tpi", + "tn", + "ts", + "tk", + "tum", + "tr", + "tw", + "tzm", + "ug", + "uk", + "umb", + "ur", + "uzn", + "vec", + "vi", + "war", + "wo", + "xh", + "ydd", + "yo", + "yue", + "zh", + "zsm", + "zu", + "dataset:flores-200", + "license:cc-by-nc-4.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- language: - ace - acm - acq - aeb - af - ajp - ak - als - am - apc - ar - ars - ary - arz - as - ast - awa - ayr - azb - azj - ba - bm - ban - be - bem - bn - bho - bjn - bo - bs - bug - bg - ca - ceb - cs - cjk - ckb - crh - cy - da - de - dik - dyu - dz - el - en - eo - et - eu - ee - fo - fj - fi - fon - fr - fur - fuv - gaz - gd - ga - gl - gn - gu - ht - ha - he - hi - hne - hr - hu - hy - ig - ilo - id - is - it - jv - ja - kab - kac - kam - kn - ks - ka - kk - kbp - kea - khk - km - ki - rw - ky - kmb - kmr - knc - kg - ko - lo - lij - li - ln - lt - lmo - ltg - lb - lua - lg - luo - lus - lvs - mag - mai - ml - mar - min - mk - mt - mni - mos - mi - my - nl - nn - nb - npi - nso - nus - ny - oc - ory - pag - pa - pap - pbt - pes - plt - pl - pt - prs - quy - ro - rn - ru - sg - sa - sat - scn - shn - si - sk - sl - sm - sn - sd - so - st - es - sc - sr - ss - su - sv - swh - szl - ta - taq - tt - te - tg - tl - th - ti - tpi - tn - ts - tk - tum - tr - tw - tzm - ug - uk - umb - ur - uzn - vec - vi - war - wo - xh - ydd - yo - yue - zh - zsm - zu language_details: \"ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab, aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab, asm_Beng, ast_Latn, awa_Deva, ayr_Latn, 
azb_Arab, azj_Latn, bak_Cyrl, bam_Latn, ban_Latn,bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn, bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn, cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn, dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn, ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn, fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr, hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn, hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn, jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva, kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr, kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn, lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn, ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva, mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn, mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn, nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn, gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn, prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn, san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn, smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn, srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn, tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi, taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn, tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab, uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr, yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn\" tags: - nllb - translation license: \"cc-by-nc-4.0\" datasets: - flores-200 metrics: - bleu - spbleu - chrf++ inference: false --- # NLLB-200 This is the model card of NLLB-200's 3.3B variant. Here are the metrics for that particular checkpoint. - Information about training algorithms, parameters, fairness constraints or other applied approaches, and features. The exact training algorithm, data and the strategies to handle data imbalances for high and low resource languages that were used to train NLLB-200 is described in the paper. - Paper or other resource for more information NLLB Team et al, No Language Left Behind: Scaling Human-Centered Machine Translation, Arxiv, 2022 - License: CC-BY-NC - Where to send questions or comments about the model: ## Intended Use - Primary intended uses: NLLB-200 is a machine translation model primarily intended for research in machine translation, - especially for low-resource languages. It allows for single sentence translation among 200 languages. Information on how to - use the model can be found in Fairseq code repository along with the training code and references to evaluation and training data. - Primary intended users: Primary users are researchers and machine translation research community. - Out-of-scope use cases: NLLB-200 is a research model and is not released for production deployment. NLLB-200 is trained on general domain text data and is not intended to be used with domain specific texts, such as medical domain or legal domain. The model is not intended to be used for document translation. The model was trained with input lengths not exceeding 512 tokens, therefore translating longer sequences might result in quality degradation. 
NLLB-200 translations cannot be used as certified translations. ## Metrics • Model performance measures: The NLLB-200 model was evaluated using the BLEU, spBLEU, and chrF++ metrics widely adopted by the machine translation community. Additionally, we performed human evaluation with the XSTS protocol and measured the toxicity of the generated translations. ## Evaluation Data - Datasets: The Flores-200 dataset is described in Section 4 - Motivation: We used Flores-200 as it provides full evaluation coverage of the languages in NLLB-200 - Preprocessing: Sentence-split raw text data was preprocessed using SentencePiece. The SentencePiece model is released along with NLLB-200. ## Training Data • We used parallel multilingual data from a variety of sources to train the model. We provide a detailed report on the data selection and construction process in Section 5 of the paper. We also used monolingual data constructed from Common Crawl. We provide more details in Section 5.2. ## Ethical Considerations • In this work, we took a reflexive approach in technological development to ensure that we prioritize human users and minimize risks that could be transferred to them. While we reflect on our ethical considerations throughout the article, here are some additional points to highlight. For one, many languages chosen for this study are low-resource languages, with a heavy emphasis on African languages. While quality translation could improve education and information access in many of these communities, such access could also make groups with lower levels of digital literacy more vulnerable to misinformation or online scams. The latter scenarios could arise if bad actors misappropriate our work for nefarious activities, which we conceive as an example of unintended use. Regarding data acquisition, the training data used for model development were mined from various publicly available sources on the web. Although we invested heavily in data cleaning, personally identifiable information may not be entirely eliminated. Finally, although we did our best to optimize for translation quality, mistranslations produced by the model could remain. Although the odds are low, this could have an adverse impact on those who rely on these translations to make important decisions (particularly when related to health and safety). ## Caveats and Recommendations • Our model has been tested on the Wikimedia domain with limited investigation on other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. Users should make appropriate assessments. ## Carbon Footprint Details • The carbon dioxide (CO2e) estimate is reported in Section 8.8.", + "model_explanation_gemini": "facebook_nllb-200-3.3B is a large-scale machine translation model supporting 200 languages for research in multilingual translation tasks."
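As an illustrative sketch of single-sentence translation with this checkpoint, assuming the standard transformers seq2seq API (the sentence and language pair are arbitrary):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-3.3B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("The weather is lovely today.", return_tensors="pt")

# force the FLORES-200 code of the target language as the first generated token;
# training inputs were capped at 512 tokens, so keep source sentences short
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=100,
)
print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0])
```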
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_nllb-200-distilled-1.3B.json b/data/model_data_json/facebook_nllb-200-distilled-1.3B.json new file mode 100644 index 0000000000000000000000000000000000000000..60203a4984e5e2ba5382614178e30a2b6327fe9b --- /dev/null +++ b/data/model_data_json/facebook_nllb-200-distilled-1.3B.json @@ -0,0 +1,214 @@ +{ + "model_id": "facebook/nllb-200-distilled-1.3B", + "downloads": 109870, + "tags": [ + "transformers", + "pytorch", + "m2m_100", + "text2text-generation", + "nllb", + "translation", + "ace", + "acm", + "acq", + "aeb", + "af", + "ajp", + "ak", + "als", + "am", + "apc", + "ar", + "ars", + "ary", + "arz", + "as", + "ast", + "awa", + "ayr", + "azb", + "azj", + "ba", + "bm", + "ban", + "be", + "bem", + "bn", + "bho", + "bjn", + "bo", + "bs", + "bug", + "bg", + "ca", + "ceb", + "cs", + "cjk", + "ckb", + "crh", + "cy", + "da", + "de", + "dik", + "dyu", + "dz", + "el", + "en", + "eo", + "et", + "eu", + "ee", + "fo", + "fj", + "fi", + "fon", + "fr", + "fur", + "fuv", + "gaz", + "gd", + "ga", + "gl", + "gn", + "gu", + "ht", + "ha", + "he", + "hi", + "hne", + "hr", + "hu", + "hy", + "ig", + "ilo", + "id", + "is", + "it", + "jv", + "ja", + "kab", + "kac", + "kam", + "kn", + "ks", + "ka", + "kk", + "kbp", + "kea", + "khk", + "km", + "ki", + "rw", + "ky", + "kmb", + "kmr", + "knc", + "kg", + "ko", + "lo", + "lij", + "li", + "ln", + "lt", + "lmo", + "ltg", + "lb", + "lua", + "lg", + "luo", + "lus", + "lvs", + "mag", + "mai", + "ml", + "mar", + "min", + "mk", + "mt", + "mni", + "mos", + "mi", + "my", + "nl", + "nn", + "nb", + "npi", + "nso", + "nus", + "ny", + "oc", + "ory", + "pag", + "pa", + "pap", + "pbt", + "pes", + "plt", + "pl", + "pt", + "prs", + "quy", + "ro", + "rn", + "ru", + "sg", + "sa", + "sat", + "scn", + "shn", + "si", + "sk", + "sl", + "sm", + "sn", + "sd", + "so", + "st", + "es", + "sc", + "sr", + "ss", + "su", + "sv", + "swh", + "szl", + "ta", + "taq", + "tt", + "te", + "tg", + "tl", + "th", + "ti", + "tpi", + "tn", + "ts", + "tk", + "tum", + "tr", + "tw", + "tzm", + "ug", + "uk", + "umb", + "ur", + "uzn", + "vec", + "vi", + "war", + "wo", + "xh", + "ydd", + "yo", + "yue", + "zh", + "zsm", + "zu", + "dataset:flores-200", + "license:cc-by-nc-4.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- language: - ace - acm - acq - aeb - af - ajp - ak - als - am - apc - ar - ars - ary - arz - as - ast - awa - ayr - azb - azj - ba - bm - ban - be - bem - bn - bho - bjn - bo - bs - bug - bg - ca - ceb - cs - cjk - ckb - crh - cy - da - de - dik - dyu - dz - el - en - eo - et - eu - ee - fo - fj - fi - fon - fr - fur - fuv - gaz - gd - ga - gl - gn - gu - ht - ha - he - hi - hne - hr - hu - hy - ig - ilo - id - is - it - jv - ja - kab - kac - kam - kn - ks - ka - kk - kbp - kea - khk - km - ki - rw - ky - kmb - kmr - knc - kg - ko - lo - lij - li - ln - lt - lmo - ltg - lb - lua - lg - luo - lus - lvs - mag - mai - ml - mar - min - mk - mt - mni - mos - mi - my - nl - nn - nb - npi - nso - nus - ny - oc - ory - pag - pa - pap - pbt - pes - plt - pl - pt - prs - quy - ro - rn - ru - sg - sa - sat - scn - shn - si - sk - sl - sm - sn - sd - so - st - es - sc - sr - ss - su - sv - swh - szl - ta - taq - tt - te - tg - tl - th - ti - tpi - tn - ts - tk - tum - tr - tw - tzm - ug - uk - umb - ur - uzn - vec - vi - war - wo - xh - ydd - yo - yue - zh - zsm - zu language_details: \"ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab, aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab, 
asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl, bam_Latn, ban_Latn,bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn, bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn, cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn, dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn, ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn, fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr, hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn, hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn, jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva, kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr, kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn, lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn, ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva, mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn, mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn, nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn, gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn, prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn, san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn, smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn, srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn, tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi, taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn, tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab, uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr, yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn\" tags: - nllb - translation license: \"cc-by-nc-4.0\" datasets: - flores-200 metrics: - bleu - spbleu - chrf++ inference: false --- # NLLB-200 This is the model card of NLLB-200's distilled 1.3B variant. Here are the metrics for that particular checkpoint. - Information about training algorithms, parameters, fairness constraints or other applied approaches, and features. The exact training algorithm, data and the strategies to handle data imbalances for high and low resource languages that were used to train NLLB-200 is described in the paper. - Paper or other resource for more information NLLB Team et al, No Language Left Behind: Scaling Human-Centered Machine Translation, Arxiv, 2022 - License: CC-BY-NC - Where to send questions or comments about the model: ## Intended Use - Primary intended uses: NLLB-200 is a machine translation model primarily intended for research in machine translation, - especially for low-resource languages. It allows for single sentence translation among 200 languages. Information on how to - use the model can be found in Fairseq code repository along with the training code and references to evaluation and training data. - Primary intended users: Primary users are researchers and machine translation research community. - Out-of-scope use cases: NLLB-200 is a research model and is not released for production deployment. NLLB-200 is trained on general domain text data and is not intended to be used with domain specific texts, such as medical domain or legal domain. The model is not intended to be used for document translation. 
The model was trained with input lengths not exceeding 512 tokens; therefore, translating longer sequences might result in quality degradation. NLLB-200 translations cannot be used as certified translations. ## Metrics • Model performance measures: The NLLB-200 model was evaluated using the BLEU, spBLEU, and chrF++ metrics widely adopted by the machine translation community. Additionally, we performed human evaluation with the XSTS protocol and measured the toxicity of the generated translations. ## Evaluation Data - Datasets: The Flores-200 dataset is described in Section 4 - Motivation: We used Flores-200 as it provides full evaluation coverage of the languages in NLLB-200 - Preprocessing: Sentence-split raw text data was preprocessed using SentencePiece. The SentencePiece model is released along with NLLB-200. ## Training Data • We used parallel multilingual data from a variety of sources to train the model. We provide a detailed report on the data selection and construction process in Section 5 of the paper. We also used monolingual data constructed from Common Crawl. We provide more details in Section 5.2. ## Ethical Considerations • In this work, we took a reflexive approach in technological development to ensure that we prioritize human users and minimize risks that could be transferred to them. While we reflect on our ethical considerations throughout the article, here are some additional points to highlight. For one, many languages chosen for this study are low-resource languages, with a heavy emphasis on African languages. While quality translation could improve education and information access in many of these communities, such access could also make groups with lower levels of digital literacy more vulnerable to misinformation or online scams. The latter scenarios could arise if bad actors misappropriate our work for nefarious activities, which we conceive as an example of unintended use. Regarding data acquisition, the training data used for model development were mined from various publicly available sources on the web. Although we invested heavily in data cleaning, personally identifiable information may not be entirely eliminated. Finally, although we did our best to optimize for translation quality, mistranslations produced by the model could remain. Although the odds are low, this could have an adverse impact on those who rely on these translations to make important decisions (particularly when related to health and safety). ## Caveats and Recommendations • Our model has been tested on the Wikimedia domain with limited investigation on other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. Users should make appropriate assessments. ## Carbon Footprint Details • The carbon dioxide (CO2e) estimate is reported in Section 8.8.", + "model_explanation_gemini": "Distilled 1.3B-parameter machine translation model supporting 200 languages for research purposes, released under CC-BY-NC-4.0 license."
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_nllb-200-distilled-600M.json b/data/model_data_json/facebook_nllb-200-distilled-600M.json new file mode 100644 index 0000000000000000000000000000000000000000..819df175dde2157958b9a1f2b6187ffd41f15164 --- /dev/null +++ b/data/model_data_json/facebook_nllb-200-distilled-600M.json @@ -0,0 +1,214 @@ +{ + "model_id": "facebook/nllb-200-distilled-600M", + "downloads": 341434, + "tags": [ + "transformers", + "pytorch", + "m2m_100", + "text2text-generation", + "nllb", + "translation", + "ace", + "acm", + "acq", + "aeb", + "af", + "ajp", + "ak", + "als", + "am", + "apc", + "ar", + "ars", + "ary", + "arz", + "as", + "ast", + "awa", + "ayr", + "azb", + "azj", + "ba", + "bm", + "ban", + "be", + "bem", + "bn", + "bho", + "bjn", + "bo", + "bs", + "bug", + "bg", + "ca", + "ceb", + "cs", + "cjk", + "ckb", + "crh", + "cy", + "da", + "de", + "dik", + "dyu", + "dz", + "el", + "en", + "eo", + "et", + "eu", + "ee", + "fo", + "fj", + "fi", + "fon", + "fr", + "fur", + "fuv", + "gaz", + "gd", + "ga", + "gl", + "gn", + "gu", + "ht", + "ha", + "he", + "hi", + "hne", + "hr", + "hu", + "hy", + "ig", + "ilo", + "id", + "is", + "it", + "jv", + "ja", + "kab", + "kac", + "kam", + "kn", + "ks", + "ka", + "kk", + "kbp", + "kea", + "khk", + "km", + "ki", + "rw", + "ky", + "kmb", + "kmr", + "knc", + "kg", + "ko", + "lo", + "lij", + "li", + "ln", + "lt", + "lmo", + "ltg", + "lb", + "lua", + "lg", + "luo", + "lus", + "lvs", + "mag", + "mai", + "ml", + "mar", + "min", + "mk", + "mt", + "mni", + "mos", + "mi", + "my", + "nl", + "nn", + "nb", + "npi", + "nso", + "nus", + "ny", + "oc", + "ory", + "pag", + "pa", + "pap", + "pbt", + "pes", + "plt", + "pl", + "pt", + "prs", + "quy", + "ro", + "rn", + "ru", + "sg", + "sa", + "sat", + "scn", + "shn", + "si", + "sk", + "sl", + "sm", + "sn", + "sd", + "so", + "st", + "es", + "sc", + "sr", + "ss", + "su", + "sv", + "swh", + "szl", + "ta", + "taq", + "tt", + "te", + "tg", + "tl", + "th", + "ti", + "tpi", + "tn", + "ts", + "tk", + "tum", + "tr", + "tw", + "tzm", + "ug", + "uk", + "umb", + "ur", + "uzn", + "vec", + "vi", + "war", + "wo", + "xh", + "ydd", + "yo", + "yue", + "zh", + "zsm", + "zu", + "dataset:flores-200", + "license:cc-by-nc-4.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- language: - ace - acm - acq - aeb - af - ajp - ak - als - am - apc - ar - ars - ary - arz - as - ast - awa - ayr - azb - azj - ba - bm - ban - be - bem - bn - bho - bjn - bo - bs - bug - bg - ca - ceb - cs - cjk - ckb - crh - cy - da - de - dik - dyu - dz - el - en - eo - et - eu - ee - fo - fj - fi - fon - fr - fur - fuv - gaz - gd - ga - gl - gn - gu - ht - ha - he - hi - hne - hr - hu - hy - ig - ilo - id - is - it - jv - ja - kab - kac - kam - kn - ks - ka - kk - kbp - kea - khk - km - ki - rw - ky - kmb - kmr - knc - kg - ko - lo - lij - li - ln - lt - lmo - ltg - lb - lua - lg - luo - lus - lvs - mag - mai - ml - mar - min - mk - mt - mni - mos - mi - my - nl - nn - nb - npi - nso - nus - ny - oc - ory - pag - pa - pap - pbt - pes - plt - pl - pt - prs - quy - ro - rn - ru - sg - sa - sat - scn - shn - si - sk - sl - sm - sn - sd - so - st - es - sc - sr - ss - su - sv - swh - szl - ta - taq - tt - te - tg - tl - th - ti - tpi - tn - ts - tk - tum - tr - tw - tzm - ug - uk - umb - ur - uzn - vec - vi - war - wo - xh - ydd - yo - yue - zh - zsm - zu language_details: \"ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab, aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab, 
asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl, bam_Latn, ban_Latn,bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn, bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn, cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn, dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn, ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn, fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr, hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn, hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn, jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva, kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr, kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn, lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn, ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva, mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn, mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn, nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn, gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn, prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn, san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn, smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn, srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn, tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi, taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn, tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab, uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr, yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn\" pipeline_tag: translation tags: - nllb license: \"cc-by-nc-4.0\" datasets: - flores-200 metrics: - bleu - spbleu - chrf++ inference: false --- # NLLB-200 This is the model card of NLLB-200's distilled 600M variant. Here are the metrics for that particular checkpoint. - Information about training algorithms, parameters, fairness constraints or other applied approaches, and features. The exact training algorithm, data and the strategies to handle data imbalances for high and low resource languages that were used to train NLLB-200 is described in the paper. - Paper or other resource for more information NLLB Team et al, No Language Left Behind: Scaling Human-Centered Machine Translation, Arxiv, 2022 - License: CC-BY-NC - Where to send questions or comments about the model: ## Intended Use - Primary intended uses: NLLB-200 is a machine translation model primarily intended for research in machine translation, - especially for low-resource languages. It allows for single sentence translation among 200 languages. Information on how to - use the model can be found in Fairseq code repository along with the training code and references to evaluation and training data. - Primary intended users: Primary users are researchers and machine translation research community. - Out-of-scope use cases: NLLB-200 is a research model and is not released for production deployment. NLLB-200 is trained on general domain text data and is not intended to be used with domain specific texts, such as medical domain or legal domain. The model is not intended to be used for document translation. 
The model was trained with input lengths not exceeding 512 tokens; therefore, translating longer sequences might result in quality degradation. NLLB-200 translations cannot be used as certified translations. ## Metrics • Model performance measures: The NLLB-200 model was evaluated using the BLEU, spBLEU, and chrF++ metrics widely adopted by the machine translation community. Additionally, we performed human evaluation with the XSTS protocol and measured the toxicity of the generated translations. ## Evaluation Data - Datasets: The Flores-200 dataset is described in Section 4 - Motivation: We used Flores-200 as it provides full evaluation coverage of the languages in NLLB-200 - Preprocessing: Sentence-split raw text data was preprocessed using SentencePiece. The SentencePiece model is released along with NLLB-200. ## Training Data • We used parallel multilingual data from a variety of sources to train the model. We provide a detailed report on the data selection and construction process in Section 5 of the paper. We also used monolingual data constructed from Common Crawl. We provide more details in Section 5.2. ## Ethical Considerations • In this work, we took a reflexive approach in technological development to ensure that we prioritize human users and minimize risks that could be transferred to them. While we reflect on our ethical considerations throughout the article, here are some additional points to highlight. For one, many languages chosen for this study are low-resource languages, with a heavy emphasis on African languages. While quality translation could improve education and information access in many of these communities, such access could also make groups with lower levels of digital literacy more vulnerable to misinformation or online scams. The latter scenarios could arise if bad actors misappropriate our work for nefarious activities, which we conceive as an example of unintended use. Regarding data acquisition, the training data used for model development were mined from various publicly available sources on the web. Although we invested heavily in data cleaning, personally identifiable information may not be entirely eliminated. Finally, although we did our best to optimize for translation quality, mistranslations produced by the model could remain. Although the odds are low, this could have an adverse impact on those who rely on these translations to make important decisions (particularly when related to health and safety). ## Caveats and Recommendations • Our model has been tested on the Wikimedia domain with limited investigation on other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. Users should make appropriate assessments. ## Carbon Footprint Details • The carbon dioxide (CO2e) estimate is reported in Section 8.8.", + "model_explanation_gemini": "Distilled machine translation model supporting 200 languages for translating between diverse languages and scripts."
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_opt-1.3b.json b/data/model_data_json/facebook_opt-1.3b.json new file mode 100644 index 0000000000000000000000000000000000000000..fed9b9531049515f94c3c44cea323bf5bcce21ec --- /dev/null +++ b/data/model_data_json/facebook_opt-1.3b.json @@ -0,0 +1,21 @@ +{ + "model_id": "facebook/opt-1.3b", + "downloads": 181008, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "opt", + "text-generation", + "en", + "arxiv:2205.01068", + "arxiv:2005.14165", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- language: en inference: false tags: - text-generation - opt license: other commercial: false --- # OPT: Open Pre-trained Transformer Language Models OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. **Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from **this** model card has been written by the Hugging Face team. ## Intro To quote the first two paragraphs of the official paper > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. ## Model description OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models as GPT-3. As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. ## Intended uses & limitations The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. ### How to use You can use this model directly with a pipeline for text generation. By default, generation is deterministic. In order to use top-k sampling, please set do_sample to True.
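A minimal sketch of the pipeline usage just described; the prompt is illustrative, and greedy (deterministic) generation is the pipeline default:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/opt-1.3b")
print(generator("Hello, I am conscious and")[0]["generated_text"])
```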
### Limitations and bias As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. Here's an example of how the model can have biased predictions: compared to: This bias will also affect all fine-tuned versions of this model. ## Training data The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included, - the Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in Roller et al. (2021), - CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b). The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. ### Collection process The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or *This ebook by Project Gutenberg.* ## Training procedure ### Preprocessing The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training. ### BibTeX entry and citation info", + "model_explanation_gemini": "facebook_opt-1.3b is a decoder-only English language model pretrained for text generation and zero/few-shot learning tasks, designed to match GPT-3 performance while enabling research access."
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_opt-125m.json b/data/model_data_json/facebook_opt-125m.json new file mode 100644 index 0000000000000000000000000000000000000000..4899259aba4e9ed316b3a3a1ad0f8c16bdbfc92b --- /dev/null +++ b/data/model_data_json/facebook_opt-125m.json @@ -0,0 +1,21 @@ +{ + "model_id": "facebook/opt-125m", + "downloads": 6203822, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "opt", + "text-generation", + "en", + "arxiv:2205.01068", + "arxiv:2005.14165", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- language: en inference: false tags: - text-generation - opt license: other commercial: false --- # OPT: Open Pre-trained Transformer Language Models OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. **Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from **this** model card has been written by the Hugging Face team. ## Intro To quote the first two paragraphs of the official paper > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. ## Model description OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models as GPT-3. As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. ## Intended uses & limitations The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. ### How to use You can use this model directly with a pipeline for text generation. By default, generation is deterministic. In order to use top-k sampling, please set do_sample to True.
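A minimal sketch of the top-k sampling variant just described; set_seed is only there to make the sampled output reproducible:

```python
from transformers import pipeline, set_seed

set_seed(32)  # reproducible sampling
generator = pipeline("text-generation", model="facebook/opt-125m")
print(generator("Hello, I am conscious and", do_sample=True, top_k=50)[0]["generated_text"])
```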
### Limitations and bias As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. This bias will also affect all fine-tuned versions of this model. ## Training data The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included, - the Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in Roller et al. (2021), - CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b). The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. ### Collection process The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or *This ebook by Project Gutenberg.* ## Training procedure ### Preprocessing The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training. ### BibTeX entry and citation info", + "model_explanation_gemini": "facebook_opt-125m is a decoder-only English language model pretrained for text generation and zero/few-shot learning tasks, designed to match GPT-3 performance while enabling research access."
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_opt-13b.json b/data/model_data_json/facebook_opt-13b.json new file mode 100644 index 0000000000000000000000000000000000000000..81c3297770e79323578d32deced6f95fb7290b1f --- /dev/null +++ b/data/model_data_json/facebook_opt-13b.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/opt-13b", + "downloads": 76208, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "opt", + "text-generation", + "en", + "arxiv:2205.01068", + "arxiv:2005.14165", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- language: en inference: false tags: - opt - text-generation license: other commercial: false --- # OPT: Open Pre-trained Transformer Language Models OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. **Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from **this** model card has been written by the Hugging Face team. ## Intro To quote the first two paragraphs of the official paper > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. ## Model description OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models as GPT-3. As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. ## Intended uses & limitations The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub.
### How to use For large OPT models, such as this one, it is not recommended to make use of the pipeline, because one should load the model in half-precision to accelerate generation and optimize memory consumption on GPU. It is recommended to directly call the generate() method as follows: By default, generation is deterministic. In order to use top-k sampling, please set do_sample to True. ### Limitations and bias As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. Here's an example of how the model can have biased predictions: compared to: This bias will also affect all fine-tuned versions of this model. ## Training data The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included, - the Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in Roller et al. (2021), - CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b). The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. ### Collection process The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or *This ebook by Project Gutenberg.* ## Training procedure ### Preprocessing The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training.
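A minimal sketch of the half-precision usage recommended in the How to use section above, assuming a CUDA device is available (the prompt is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in float16 to roughly halve memory use and speed up generation.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-13b", torch_dtype=torch.float16
).cuda()
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-13b", use_fast=False)

input_ids = tokenizer("Hello, I am conscious and", return_tensors="pt").input_ids.cuda()
generated_ids = model.generate(input_ids, do_sample=True, top_k=50, max_length=30)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```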
### BibTeX entry and citation info" +} \ No newline at end of file diff --git a/data/model_data_json/facebook_opt-350m.json b/data/model_data_json/facebook_opt-350m.json new file mode 100644 index 0000000000000000000000000000000000000000..45495cb5a299a1685f0715d70c7c86d173336421 --- /dev/null +++ b/data/model_data_json/facebook_opt-350m.json @@ -0,0 +1,21 @@ +{ + "model_id": "facebook/opt-350m", + "downloads": 276645, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "opt", + "text-generation", + "en", + "arxiv:2205.01068", + "arxiv:2005.14165", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- language: en inference: false tags: - text-generation license: other commercial: false --- # OPT: Open Pre-trained Transformer Language Models OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. **Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the paper. Content from **this** model card has been written by the Hugging Face team. ## Intro To quote the first two paragraphs of the official paper > Large language models trained on massive text collections have shown surprising emergent > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public > can interact with these models through paid APIs, full model access is currently limited to only a > few highly resourced labs. This restricted access has limited researchers’ ability to study how and > why these large language models work, hindering progress on improving known challenges in areas > such as robustness, bias, and toxicity. > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the > collective research community as a whole, which is only possible when models are available for study. ## Model description OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective. OPT belongs to the same family of decoder-only models as GPT-3. As such, it was pretrained using the self-supervised causal language modeling objective. For evaluation, OPT follows GPT-3 by using their prompts and overall experimental setup. For more details, please read the official paper. ## Intended uses & limitations The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation. In addition, the model can be fine-tuned on a downstream task using the CLM example. For all other OPT checkpoints, please have a look at the model hub. ### How to use You can use this model directly with a pipeline for text generation. By default, generation is deterministic.
In order to use top-k sampling, please set do_sample to True. ### Limitations and bias As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased: > Like other large language models for which the diversity (or lack thereof) of training > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern > large language models. Here's an example of how the model can have biased predictions: compared to: This bias will also affect all fine-tuned versions of this model. ## Training data The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents: - BookCorpus, which consists of more than 10K unpublished books, - CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas, - The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included, - the Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in Roller et al. (2021), - CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b). The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety. ### Collection process The dataset was collected from the internet and went through classic data processing algorithms and re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or *This ebook by Project Gutenberg.* ## Training procedure ### Preprocessing The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens. The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training. ### BibTeX entry and citation info", + "model_explanation_gemini": "facebook_opt-350m is a decoder-only English language model pretrained for text generation and zero/few-shot learning, designed to match GPT-3 performance while enabling research access."
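Since the card also points to fine-tuning via the CLM example, here is a hedged single-step sketch of that route; the toy batch and learning rate are illustrative, and the official run_clm.py example script handles real data loading, collation, and scheduling:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer("My own domain-specific training sentence.", return_tensors="pt")
# For causal LM, the labels are the input ids; the model shifts them internally.
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```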
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_roberta-hate-speech-dynabench-r4-target.json b/data/model_data_json/facebook_roberta-hate-speech-dynabench-r4-target.json new file mode 100644 index 0000000000000000000000000000000000000000..d1ddce3727facac567571675ed33f3c40d79b822 --- /dev/null +++ b/data/model_data_json/facebook_roberta-hate-speech-dynabench-r4-target.json @@ -0,0 +1,18 @@ +{ + "model_id": "facebook/roberta-hate-speech-dynabench-r4-target", + "downloads": 1885385, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "roberta", + "text-classification", + "en", + "arxiv:2012.15761", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en --- # LFTW R4 Target The R4 Target model from Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection ## Citation Information Thanks to Kushal Tirumala and Adina Williams for helping the authors put the model on the hub!", + "model_explanation_gemini": "Detects online hate speech using dynamically generated datasets to improve accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_sam-vit-base.json b/data/model_data_json/facebook_sam-vit-base.json new file mode 100644 index 0000000000000000000000000000000000000000..795d693c70a3388e75d85471c2f434493ec57670 --- /dev/null +++ b/data/model_data_json/facebook_sam-vit-base.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/sam-vit-base", + "downloads": 566728, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "sam", + "feature-extraction", + "vision", + "arxiv:2304.02643", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision --- # Model Card for Segment Anything Model (SAM) - ViT Base (ViT-B) version
Figure: Detailed architecture of the Segment Anything Model (SAM).
# Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Citation # TL;DR Link to original repository (example images: \"Snow\", \"Forest\", \"Mountains\") The **Segment Anything Model (SAM)** produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks. The abstract of the paper states: > We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at to foster research into foundation models for computer vision. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy-pasted from the original SAM model card. # Model Details The SAM model is made up of 3 modules: - The VisionEncoder: a ViT-based image encoder. It computes the image embeddings using attention on patches of the image. Relative positional embeddings are used. - The PromptEncoder: generates embeddings for points and bounding boxes. - The MaskDecoder: a two-way transformer which performs cross-attention between the image embedding and the point embeddings, and between the point embeddings and the image embedding. The outputs are fed to the Neck. - The Neck: predicts the output masks based on the contextualized masks produced by the MaskDecoder. # Usage ## Prompted-Mask-Generation Among other arguments to generate masks, you can pass 2D locations on the approximate position of your object of interest, a bounding box wrapping the object of interest (the format should be the x, y coordinates of the top-left and bottom-right points of the bounding box), or a segmentation mask. At the time of writing, passing text as input is not supported by the official model, according to the official repository. For more details, refer to this notebook, which shows a walkthrough of how to use the model, with a visual example! ## Automatic-Mask-Generation The model can be used for generating segmentation masks in a \"zero-shot\" fashion, given an input image. The model is automatically prompted with a grid of points which are all fed to the model. The pipeline is made for automatic mask generation. The following snippet demonstrates how easily you can run it (on any device: simply pass the appropriate device argument).
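A minimal sketch of both usage modes, assuming the 🤗 Transformers SAM integration (SamModel, SamProcessor, and the mask-generation pipeline); the image URL and point coordinates are illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import SamModel, SamProcessor, pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Prompted mask generation: one 2D point on the object of interest.
inputs = processor(raw_image, input_points=[[[450, 600]]], return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
scores = outputs.iou_scores

# Automatic mask generation: the pipeline prompts the model with a grid of points.
generator = pipeline("mask-generation", model="facebook/sam-vit-base", device=device)
all_masks = generator(img_url, points_per_batch=256)
```

The post-processed masks are binary arrays at the original image resolution, with iou_scores indicating the predicted quality of each mask.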
The predicted masks can then be overlaid on the input image for display. # Citation If you use this model, please use the following BibTeX entry.", "model_explanation_gemini": "Generates high-quality object masks from input prompts like points or boxes for image segmentation tasks with strong zero-shot performance." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_sam-vit-huge.json b/data/model_data_json/facebook_sam-vit-huge.json new file mode 100644 index 0000000000000000000000000000000000000000..ca235753fc034adeecaf6ff61cb5dbd22e9e2a9c --- /dev/null +++ b/data/model_data_json/facebook_sam-vit-huge.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/sam-vit-huge", + "downloads": 343339, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "sam", + "feature-extraction", + "vision", + "arxiv:2304.02643", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision --- # Model Card for Segment Anything Model (SAM) - ViT Huge (ViT-H) version
Figure: Detailed architecture of the Segment Anything Model (SAM).
# Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Citation # TL;DR Link to original repository (example images: \"Snow\", \"Forest\", \"Mountains\") The **Segment Anything Model (SAM)** produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks. The abstract of the paper states: > We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at to foster research into foundation models for computer vision. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy-pasted from the original SAM model card. # Model Details The SAM model is made up of 3 modules: - The VisionEncoder: a ViT-based image encoder. It computes the image embeddings using attention on patches of the image. Relative positional embeddings are used. - The PromptEncoder: generates embeddings for points and bounding boxes. - The MaskDecoder: a two-way transformer which performs cross-attention between the image embedding and the point embeddings, and between the point embeddings and the image embedding. The outputs are fed to the Neck. - The Neck: predicts the output masks based on the contextualized masks produced by the MaskDecoder. # Usage ## Prompted-Mask-Generation Among other arguments to generate masks, you can pass 2D locations on the approximate position of your object of interest, a bounding box wrapping the object of interest (the format should be the x, y coordinates of the top-left and bottom-right points of the bounding box), or a segmentation mask. At the time of writing, passing text as input is not supported by the official model, according to the official repository. For more details, refer to this notebook, which shows a walkthrough of how to use the model, with a visual example! ## Automatic-Mask-Generation The model can be used for generating segmentation masks in a \"zero-shot\" fashion, given an input image. The model is automatically prompted with a grid of points which are all fed to the model. The pipeline is made for automatic mask generation. The following snippet demonstrates how easily you can run it (on any device: simply pass the appropriate device argument).
The predicted masks can then be overlaid on the input image; this should give you the car mask results shown in the original card. # Citation If you use this model, please use the following BibTeX entry.", "model_explanation_gemini": "Generates high-quality object masks from input prompts like points or boxes and can segment all objects in an image with strong zero-shot performance." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_sam-vit-large.json b/data/model_data_json/facebook_sam-vit-large.json new file mode 100644 index 0000000000000000000000000000000000000000..f507d6fcec52fbc3e5d410a38de1cc8b267c9d5a --- /dev/null +++ b/data/model_data_json/facebook_sam-vit-large.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/sam-vit-large", + "downloads": 475605, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "sam", + "feature-extraction", + "vision", + "arxiv:2304.02643", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision --- # Model Card for Segment Anything Model (SAM) - ViT Large (ViT-L) version
Figure: Detailed architecture of the Segment Anything Model (SAM).
# Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Citation # TL;DR Link to original repository (example images: \"Snow\", \"Forest\", \"Mountains\") The **Segment Anything Model (SAM)** produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks. The abstract of the paper states: > We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at to foster research into foundation models for computer vision. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy-pasted from the original SAM model card. # Model Details The SAM model is made up of 3 modules: - The VisionEncoder: a ViT-based image encoder. It computes the image embeddings using attention on patches of the image. Relative positional embeddings are used. - The PromptEncoder: generates embeddings for points and bounding boxes. - The MaskDecoder: a two-way transformer which performs cross-attention between the image embedding and the point embeddings, and between the point embeddings and the image embedding. The outputs are fed to the Neck. - The Neck: predicts the output masks based on the contextualized masks produced by the MaskDecoder. # Usage ## Prompted-Mask-Generation Among other arguments to generate masks, you can pass 2D locations on the approximate position of your object of interest, a bounding box wrapping the object of interest (the format should be the x, y coordinates of the top-left and bottom-right points of the bounding box), or a segmentation mask. At the time of writing, passing text as input is not supported by the official model, according to the official repository. For more details, refer to this notebook, which shows a walkthrough of how to use the model, with a visual example! ## Automatic-Mask-Generation The model can be used for generating segmentation masks in a \"zero-shot\" fashion, given an input image. The model is automatically prompted with a grid of points which are all fed to the model. The pipeline is made for automatic mask generation. The following snippet demonstrates how easily you can run it (on any device: simply pass the appropriate device argument).
The predicted masks can then be overlaid on the input image for display. # Citation If you use this model, please use the following BibTeX entry.", "model_explanation_gemini": "Generates high-quality object masks from input prompts like points or boxes and can segment all objects in an image with strong zero-shot performance." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_sam2-hiera-large.json b/data/model_data_json/facebook_sam2-hiera-large.json new file mode 100644 index 0000000000000000000000000000000000000000..9abad6055e807d3465443b4d2a90dd10a7bc495b --- /dev/null +++ b/data/model_data_json/facebook_sam2-hiera-large.json @@ -0,0 +1,13 @@ +{ + "model_id": "facebook/sam2-hiera-large", + "downloads": 144263, + "tags": [ + "sam2", + "mask-generation", + "arxiv:2408.00714", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: mask-generation library_name: sam2 --- Repository for SAM 2: Segment Anything in Images and Videos, a foundation model towards solving promptable visual segmentation in images and videos from FAIR. See the SAM 2 paper for more information. The official code is publicly released in this repo. ## Usage For image prediction: For video prediction: Refer to the demo notebooks for details. ### Citation To cite the paper, model, or software, please use the below:", + "model_explanation_gemini": "Segment Anything in Images and Videos (SAM 2) is a foundation model designed for promptable visual segmentation tasks in both images and videos." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_sam2.1-hiera-large.json b/data/model_data_json/facebook_sam2.1-hiera-large.json new file mode 100644 index 0000000000000000000000000000000000000000..3a76118c9eb8398b6aa300d8edfd5cfa22419ba2 --- /dev/null +++ b/data/model_data_json/facebook_sam2.1-hiera-large.json @@ -0,0 +1,13 @@ +{ + "model_id": "facebook/sam2.1-hiera-large", + "downloads": 139643, + "tags": [ + "sam2", + "mask-generation", + "arxiv:2408.00714", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: mask-generation library_name: sam2 --- Repository for SAM 2: Segment Anything in Images and Videos, a foundation model towards solving promptable visual segmentation in images and videos from FAIR. See the SAM 2 paper for more information. The official code is publicly released in this repo. ## Usage For image prediction: For video prediction: Refer to the demo notebooks for more details. ### Citation To cite the paper, model, or software, please use the below:", + "model_explanation_gemini": "Segment Anything in Images and Videos is a foundation model designed for promptable visual segmentation tasks in both images and videos."
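A minimal sketch of the image-prediction usage referenced in the two SAM 2 cards above, assuming the sam2 package is installed and a CUDA device is available; the image URL and point prompt are illustrative:

```python
import numpy as np
import requests
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
image = np.array(Image.open(requests.get(img_url, stream=True).raw).convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # One foreground point prompt at (x, y); label 1 marks it as foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[450, 600]]), point_labels=np.array([1])
    )
```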
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_timesformer-base-finetuned-k400.json b/data/model_data_json/facebook_timesformer-base-finetuned-k400.json new file mode 100644 index 0000000000000000000000000000000000000000..edb610b2dbb776fe4202084014249ae0e066cba9 --- /dev/null +++ b/data/model_data_json/facebook_timesformer-base-finetuned-k400.json @@ -0,0 +1,17 @@ +{ + "model_id": "facebook/timesformer-base-finetuned-k400", + "downloads": 102946, + "tags": [ + "transformers", + "pytorch", + "timesformer", + "video-classification", + "vision", + "arxiv:2102.05095", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: \"cc-by-nc-4.0\" tags: - vision - video-classification --- # TimeSformer (base-sized model, fine-tuned on Kinetics-400) TimeSformer model pre-trained on Kinetics-400. It was introduced in the paper TimeSformer: Is Space-Time Attention All You Need for Video Understanding? by Tong et al. and first released in this repository. Disclaimer: The team releasing TimeSformer did not write a model card for this model so this model card has been written by fcakyon. ## Intended uses & limitations You can use the raw model for video classification into one of the 400 possible Kinetics-400 labels. ### How to use Here is how to use this model to classify a video: For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Classifies videos into one of 400 Kinetics-400 categories using space-time attention." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_w2v-bert-2.0.json b/data/model_data_json/facebook_w2v-bert-2.0.json new file mode 100644 index 0000000000000000000000000000000000000000..5f1ac12e66b9df47fc9934c2dddc79eb7c02fc5c --- /dev/null +++ b/data/model_data_json/facebook_w2v-bert-2.0.json @@ -0,0 +1,111 @@ +{ + "model_id": "facebook/w2v-bert-2.0", + "downloads": 455218, + "tags": [ + "transformers", + "safetensors", + "wav2vec2-bert", + "feature-extraction", + "af", + "am", + "ar", + "as", + "az", + "be", + "bn", + "bs", + "bg", + "ca", + "cs", + "zh", + "cy", + "da", + "de", + "el", + "en", + "et", + "fi", + "fr", + "or", + "om", + "ga", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "ig", + "id", + "is", + "it", + "jv", + "ja", + "kn", + "ka", + "kk", + "mn", + "km", + "ky", + "ko", + "lo", + "ln", + "lt", + "lb", + "lg", + "lv", + "ml", + "mr", + "mk", + "mt", + "mi", + "my", + "nl", + "nb", + "ne", + "ny", + "oc", + "pa", + "ps", + "fa", + "pl", + "pt", + "ro", + "ru", + "sk", + "sl", + "sn", + "sd", + "so", + "es", + "sr", + "sv", + "sw", + "ta", + "te", + "tg", + "tl", + "th", + "tr", + "uk", + "ur", + "uz", + "vi", + "wo", + "xh", + "yo", + "ms", + "zu", + "ary", + "arz", + "yue", + "kea", + "arxiv:2312.05187", + "license:mit", + "region:us" + ], + "description": "--- license: mit language: - af - am - ar - as - az - be - bn - bs - bg - ca - cs - zh - cy - da - de - el - en - et - fi - fr - or - om - ga - gl - gu - ha - he - hi - hr - hu - hy - ig - id - is - it - jv - ja - kn - ka - kk - mn - km - ky - ko - lo - ln - lt - lb - lg - lv - ml - mr - mk - mt - mi - my - nl - nb - ne - ny - oc - pa - ps - fa - pl - pt - ro - ru - sk - sl - sn - sd - so - es - sr - sv - sw - ta - te - tg - tl - th - tr - uk - ur - uz - vi - wo - xh - yo - ms - zu - ary - arz - yue - kea inference: false --- # W2v-BERT 2.0 speech encoder We are open-sourcing our Conformer-based W2v-BERT 2.0 speech encoder as described 
in Section 3.2.1 of the paper, which is at the core of our Seamless models. This model was pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. It requires finetuning to be used for downstream tasks such as Automatic Speech Recognition (ASR), or Audio Classification. | Model Name | #params | checkpoint | | ----------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | W2v-BERT 2.0 | 600M | checkpoint **This model and its training are supported by 🤗 Transformers, more on it in the docs.** # 🤗 Transformers usage This is a bare checkpoint without any modeling head, and thus requires finetuning to be used for downstream tasks such as ASR. You can however use it to extract audio embeddings from the top layer with this code snippet: To learn more about the model use, refer to the following resources: - its docs - a blog post showing how to fine-tune it on Mongolian ASR - a training script example # Seamless Communication usage This model can be used in Seamless Communication, where it was released. Here's how to make a forward pass through the voice encoder, after having completed the installation steps:", + "model_explanation_gemini": "W2v-BERT 2.0 is a Conformer-based speech encoder pre-trained on multilingual audio data, requiring finetuning for tasks like automatic speech recognition or audio classification." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-base-960h.json b/data/model_data_json/facebook_wav2vec2-base-960h.json new file mode 100644 index 0000000000000000000000000000000000000000..514a386a40550aebf38dbf8a09a194b2974382e6 --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-base-960h.json @@ -0,0 +1,23 @@ +{ + "model_id": "facebook/wav2vec2-base-960h", + "downloads": 1650492, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "dataset:librispeech_asr", + "arxiv:2006.11477", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - librispeech_asr tags: - audio - automatic-speech-recognition - hf-asr-leaderboard license: apache-2.0 widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: wav2vec2-base-960h results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 3.4 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 8.6 --- # Wav2Vec2-Base-960h Facebook's Wav2Vec2 The base model pretrained and fine-tuned on 960 hours of Librispeech on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz. 
Paper Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli **Abstract** We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found under # Usage To transcribe audio files the model can be used as a standalone acoustic model as follows: ## Evaluation This code snippet shows how to evaluate **facebook/wav2vec2-base-960h** on LibriSpeech's \"clean\" and \"other\" test data. *Result (WER)*: | \"clean\" | \"other\" | |---|---| | 3.4 | 8.6 |", "model_explanation_gemini": "facebook_wav2vec2-base-960h is a speech recognition model trained on 960 hours of audio to transcribe 16kHz speech into text, achieving low word error rates on benchmark datasets." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-base.json b/data/model_data_json/facebook_wav2vec2-base.json new file mode 100644 index 0000000000000000000000000000000000000000..e93edb1fa5a529fdad8e99062d2acafe602a9ae4 --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-base.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/wav2vec2-base", + "downloads": 389197, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "pretraining", + "speech", + "en", + "dataset:librispeech_asr", + "arxiv:2006.11477", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - librispeech_asr tags: - speech license: apache-2.0 --- # Wav2Vec2-Base Facebook's Wav2Vec2 The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16Khz. **Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model. Paper Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli **Abstract** We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data.
Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found under # Usage See this notebook for more information on how to fine-tune the model.", + "model_explanation_gemini": "Pretrained on 16kHz speech audio for speech recognition, requiring fine-tuning with a tokenizer on labeled text data to use." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-large-960h.json b/data/model_data_json/facebook_wav2vec2-large-960h.json new file mode 100644 index 0000000000000000000000000000000000000000..abd1194d9f1740789b2b5e3304d463c7e81caea7 --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-large-960h.json @@ -0,0 +1,19 @@ +{ + "model_id": "facebook/wav2vec2-large-960h", + "downloads": 105948, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "speech", + "en", + "dataset:librispeech_asr", + "arxiv:2006.11477", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - librispeech_asr tags: - speech license: apache-2.0 --- # Wav2Vec2-Large-960h Facebook's Wav2Vec2 The large model pretrained and fine-tuned on 960 hours of Librispeech on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz. Paper Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli **Abstract** We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found under # Usage To transcribe audio files the model can be used as a standalone acoustic model as follows: ## Evaluation This code snippet shows how to evaluate **facebook/wav2vec2-large-960h** on LibriSpeech's \"clean\" and \"other\" test data. *Result (WER)*: | \"clean\" | \"other\" | |---|---| | 2.8 | 6.3 |", + "model_explanation_gemini": "Transcribes 16kHz speech audio into text, pretrained and fine-tuned on 960 hours of Librispeech data for high-accuracy speech recognition." 
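A minimal sketch of the standalone acoustic-model usage described in the wav2vec2-large-960h card above; the LibriSpeech dummy dataset is used only as a quick, assumed-available smoke test:

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h")

ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
# The model expects 16 kHz audio, matching the dataset's sampling rate.
input_values = processor(
    ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
).input_values

with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
print(processor.batch_decode(predicted_ids)[0])
```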
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-large-es-voxpopuli.json b/data/model_data_json/facebook_wav2vec2-large-es-voxpopuli.json new file mode 100644 index 0000000000000000000000000000000000000000..17649c0156a1ac97273686bd6e701d8d59e9f970 --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-large-es-voxpopuli.json @@ -0,0 +1,21 @@ +{ + "model_id": "facebook/wav2vec2-large-es-voxpopuli", + "downloads": 108724, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "pretraining", + "audio", + "automatic-speech-recognition", + "voxpopuli", + "es", + "arxiv:2101.00390", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: es tags: - audio - automatic-speech-recognition - voxpopuli license: cc-by-nc-4.0 --- # Wav2Vec2-Large-VoxPopuli Facebook's Wav2Vec2 large model pretrained on the es unlabeled subset of the VoxPopuli corpus. **Paper**: *VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation* **Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI* See the official website for more information, here # Fine-Tuning Please refer to this blog on how to fine-tune this model on a specific language. Note that you should replace with this checkpoint for fine-tuning.", + "model_explanation_gemini": "Facebook's Wav2Vec2 large model pretrained on Spanish audio from the VoxPopuli corpus for automatic speech recognition tasks." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-large-robust-ft-libri-960h.json b/data/model_data_json/facebook_wav2vec2-large-robust-ft-libri-960h.json new file mode 100644 index 0000000000000000000000000000000000000000..eccf933b1c0465b94e3e0d100e49c406a5b0fbb9 --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-large-robust-ft-libri-960h.json @@ -0,0 +1,25 @@ +{ + "model_id": "facebook/wav2vec2-large-robust-ft-libri-960h", + "downloads": 153981, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "speech", + "audio", + "en", + "dataset:libri_light", + "dataset:common_voice", + "dataset:switchboard", + "dataset:fisher", + "dataset:librispeech_asr", + "arxiv:2104.01027", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - libri_light - common_voice - switchboard - fisher - librispeech_asr tags: - speech - audio - automatic-speech-recognition widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: license: apache-2.0 --- # Wav2Vec2-Large-Robust finetuned on Librispeech Facebook's Wav2Vec2. This model is a fine-tuned version of the wav2vec2-large-robust model. It has been pretrained on: - Libri-Light: open-source audio books from the LibriVox project; clean, read-out audio data - CommonVoice: crowd-sourced audio data; read-out text snippets - Switchboard: telephone speech corpus; noisy telephone data - Fisher: conversational telephone speech; noisy telephone data and has subsequently been fine-tuned on 960 hours of - Librispeech: open-source read-out audio data. When using the model, make sure that your speech input is also sampled at 16Khz.
Paper Robust Wav2Vec2 Authors: Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli **Abstract** Self-supervised learning of speech representations has been a very active research area but most work is focused on a single domain such as read audio books for which there exist large quantities of labeled and unlabeled data. In this paper, we explore more general setups where the domain of the unlabeled data for pre-training data differs from the domain of the labeled data for fine-tuning, which in turn may differ from the test data domain. Our experiments show that using target domain data during pre-training leads to large performance improvements across a variety of setups. On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%. This has obvious practical implications since it is much easier to obtain unlabeled target domain data than labeled data. Moreover, we find that pre-training on multiple domains improves generalization performance on domains not seen during training. Code and models will be made available at this https URL. The original model can be found under # Usage To transcribe audio files the model can be used as a standalone acoustic model as follows:", + "model_explanation_gemini": "Finetuned for robust automatic speech recognition across diverse domains, this model transcribes 16kHz audio inputs after pretraining on multiple datasets and finetuning on Librispeech." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-large-xlsr-53.json b/data/model_data_json/facebook_wav2vec2-large-xlsr-53.json new file mode 100644 index 0000000000000000000000000000000000000000..278b809e37df6161249b29540ac2adb425694d42 --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-large-xlsr-53.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/wav2vec2-large-xlsr-53", + "downloads": 98776, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "pretraining", + "speech", + "multilingual", + "dataset:common_voice", + "arxiv:2006.13979", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: multilingual datasets: - common_voice tags: - speech license: apache-2.0 --- # Wav2Vec2-XLSR-53 Facebook's XLSR-Wav2Vec2 The base model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz. Note that this model should be fine-tuned on a downstream task, like Automatic Speech Recognition. Check out this blog for more information. Paper Authors: Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli **Abstract** This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages. We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations and jointly learns a quantization of the latents shared across languages. The resulting model is fine-tuned on labeled data and experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining. On the CommonVoice benchmark, XLSR shows a relative phoneme error rate reduction of 72% compared to the best known results. 
On BABEL, our approach improves word error rate by 16% relative compared to a comparable system. Our approach enables a single multilingual speech recognition model which is competitive to strong individual models. Analysis shows that the latent discrete speech representations are shared across languages with increased sharing for related languages. We hope to catalyze research in low-resource speech understanding by releasing XLSR-53, a large model pretrained in 53 languages. The original model can be found under # Usage See this notebook for more information on how to fine-tune the model. !model image", + "model_explanation_gemini": "Pretrained on multilingual speech audio for cross-lingual speech representation learning, requiring fine-tuning for tasks like automatic speech recognition." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-large.json b/data/model_data_json/facebook_wav2vec2-large.json new file mode 100644 index 0000000000000000000000000000000000000000..a2a0b5e0b6022afd11f9aaea053e72a47c2a9c3a --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-large.json @@ -0,0 +1,18 @@ +{ + "model_id": "facebook/wav2vec2-large", + "downloads": 82081, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "pretraining", + "speech", + "en", + "dataset:librispeech_asr", + "arxiv:2006.11477", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - librispeech_asr tags: - speech license: apache-2.0 --- # Wav2Vec2-Large Facebook's Wav2Vec2 The base model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz. Note that this model should be fine-tuned on a downstream task, like Automatic Speech Recognition. Check out this blog for more information. Paper Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli **Abstract** We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data. The original model can be found under # Usage See this notebook for more information on how to fine-tune the model." 
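Since checkpoints like facebook/wav2vec2-large ship without a tokenizer or CTC head, fine-tuning begins by attaching a freshly initialized head sized to a vocabulary built from your labeled text; a sketch under that assumption (the vocab size and pad id are placeholders for values taken from your own processor):

```python
from transformers import Wav2Vec2ForCTC

# The CTC head is newly initialized on top of the pretrained encoder; its size
# must match the tokenizer built from your labeled text data.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large",
    ctc_loss_reduction="mean",
    pad_token_id=0,   # placeholder: use processor.tokenizer.pad_token_id
    vocab_size=32,    # placeholder: use len(processor.tokenizer)
)

# Freeze the convolutional feature encoder, as in the usual fine-tuning recipe
# (older transformers versions call this freeze_feature_extractor())
model.freeze_feature_encoder()
```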
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-xls-r-1b.json b/data/model_data_json/facebook_wav2vec2-xls-r-1b.json new file mode 100644 index 0000000000000000000000000000000000000000..1833d0cf553fdd274642a014df95db8631ba61d8 --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-xls-r-1b.json @@ -0,0 +1,146 @@ +{ + "model_id": "facebook/wav2vec2-xls-r-1b", + "downloads": 239163, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "pretraining", + "speech", + "xls_r", + "xls_r_pretrained", + "multilingual", + "ab", + "af", + "sq", + "am", + "ar", + "hy", + "as", + "az", + "ba", + "eu", + "be", + "bn", + "bs", + "br", + "bg", + "my", + "yue", + "ca", + "ceb", + "km", + "zh", + "cv", + "hr", + "cs", + "da", + "dv", + "nl", + "en", + "eo", + "et", + "fo", + "fi", + "fr", + "gl", + "lg", + "ka", + "de", + "el", + "gn", + "gu", + "ht", + "cnh", + "ha", + "haw", + "he", + "hi", + "hu", + "is", + "id", + "ia", + "ga", + "it", + "ja", + "jv", + "kb", + "kn", + "kk", + "rw", + "ky", + "ko", + "ku", + "lo", + "la", + "lv", + "ln", + "lt", + "lm", + "mk", + "mg", + "ms", + "ml", + "mt", + "gv", + "mi", + "mr", + "mn", + "ne", + "no", + "nn", + "oc", + "or", + "ps", + "fa", + "pl", + "pt", + "pa", + "ro", + "rm", + "ru", + "sah", + "sa", + "sco", + "sr", + "sn", + "sd", + "si", + "sk", + "sl", + "so", + "hsb", + "es", + "su", + "sw", + "sv", + "tl", + "tg", + "ta", + "tt", + "te", + "th", + "bo", + "tp", + "tr", + "tk", + "uk", + "ur", + "uz", + "vi", + "vot", + "war", + "cy", + "yi", + "yo", + "zu", + "dataset:common_voice", + "dataset:multilingual_librispeech", + "arxiv:2111.09296", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ab - af - sq - am - ar - hy - as - az - ba - eu - be - bn - bs - br - bg - my - yue - ca - ceb - km - zh - cv - hr - cs - da - dv - nl - en - eo - et - fo - fi - fr - gl - lg - ka - de - el - gn - gu - ht - cnh - ha - haw - he - hi - hu - is - id - ia - ga - it - ja - jv - kb - kn - kk - rw - ky - ko - ku - lo - la - lv - ln - lt - lm - mk - mg - ms - ml - mt - gv - mi - mr - mn - ne - no - nn - oc - or - ps - fa - pl - pt - pa - ro - rm - rm - ru - sah - sa - sco - sr - sn - sd - si - sk - sl - so - hsb - es - su - sw - sv - tl - tg - ta - tt - te - th - bo - tp - tr - tk - uk - ur - uz - vi - vot - war - cy - yi - yo - zu language_bcp47: - zh-HK - zh-TW - fy-NL datasets: - common_voice - multilingual_librispeech tags: - speech - xls_r - xls_r_pretrained license: apache-2.0 --- # Wav2Vec2-XLS-R-1B Facebook's Wav2Vec2 XLS-R counting **1 billion** parameters. !model image XLS-R is Facebook AI's large-scale multilingual pretrained model for speech (the \"XLM-R for Speech\"). It is pretrained on 436k hours of unlabeled speech, including VoxPopuli, MLS, CommonVoice, BABEL, and VoxLingua107. It uses the wav2vec 2.0 objective, in 128 languages. When using the model make sure that your speech input is sampled at 16kHz. **Note**: This model should be fine-tuned on a downstream task, like Automatic Speech Recognition, Translation, or Classification. Check out **this blog** for more information about ASR. XLS-R Paper **Abstract** This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on 436K hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. 
Our evaluation covers a wide range of tasks, domains, data regimes and languages, both high and low-resource. On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English. For speech recognition, XLS-R improves over the best known prior work on BABEL, MLS, CommonVoice as well as VoxPopuli, lowering error rates by 20%-33% relative on average. XLS-R also sets a new state of the art on VoxLingua107 language identification. Moreover, we show that with sufficient model size, cross-lingual pretraining can outperform English-only pretraining when translating English speech into other languages, a setting which favors monolingual pretraining. We hope XLS-R can help to improve speech processing tasks for many more languages of the world. The original model can be found under # Usage See this google colab for more information on how to fine-tune the model. You can find other pretrained XLS-R models with different numbers of parameters: * 300M parameters version * 1B version version * 2B version version", + "model_explanation_gemini": "facebook_wav2vec2-xls-r-1b is a 1-billion-parameter multilingual speech model pretrained on 436k hours of audio across 128 languages, designed for fine-tuning on tasks like speech recognition, translation, or classification." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-xls-r-300m.json b/data/model_data_json/facebook_wav2vec2-xls-r-300m.json new file mode 100644 index 0000000000000000000000000000000000000000..a5b7b499e28515122814c476ea9566c443c3a17b --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-xls-r-300m.json @@ -0,0 +1,146 @@ +{ + "model_id": "facebook/wav2vec2-xls-r-300m", + "downloads": 321704, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "pretraining", + "speech", + "xls_r", + "xls_r_pretrained", + "multilingual", + "ab", + "af", + "sq", + "am", + "ar", + "hy", + "as", + "az", + "ba", + "eu", + "be", + "bn", + "bs", + "br", + "bg", + "my", + "yue", + "ca", + "ceb", + "km", + "zh", + "cv", + "hr", + "cs", + "da", + "dv", + "nl", + "en", + "eo", + "et", + "fo", + "fi", + "fr", + "gl", + "lg", + "ka", + "de", + "el", + "gn", + "gu", + "ht", + "cnh", + "ha", + "haw", + "he", + "hi", + "hu", + "is", + "id", + "ia", + "ga", + "it", + "ja", + "jv", + "kb", + "kn", + "kk", + "rw", + "ky", + "ko", + "ku", + "lo", + "la", + "lv", + "ln", + "lt", + "lm", + "mk", + "mg", + "ms", + "ml", + "mt", + "gv", + "mi", + "mr", + "mn", + "ne", + "no", + "nn", + "oc", + "or", + "ps", + "fa", + "pl", + "pt", + "pa", + "ro", + "rm", + "ru", + "sah", + "sa", + "sco", + "sr", + "sn", + "sd", + "si", + "sk", + "sl", + "so", + "hsb", + "es", + "su", + "sw", + "sv", + "tl", + "tg", + "ta", + "tt", + "te", + "th", + "bo", + "tp", + "tr", + "tk", + "uk", + "ur", + "uz", + "vi", + "vot", + "war", + "cy", + "yi", + "yo", + "zu", + "dataset:common_voice", + "dataset:multilingual_librispeech", + "arxiv:2111.09296", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ab - af - sq - am - ar - hy - as - az - ba - eu - be - bn - bs - br - bg - my - yue - ca - ceb - km - zh - cv - hr - cs - da - dv - nl - en - eo - et - fo - fi - fr - gl - lg - ka - de - el - gn - gu - ht - cnh - ha - haw - he - hi - hu - is - id - ia - ga - it - ja - jv - kb - kn - kk - rw - ky - ko - ku - lo - la - lv - ln - lt - lm - mk - mg - ms - ml - mt - gv - mi - mr - mn - ne - no - nn - oc - or - ps - fa - 
pl - pt - pa - ro - rm - rm - ru - sah - sa - sco - sr - sn - sd - si - sk - sl - so - hsb - es - su - sw - sv - tl - tg - ta - tt - te - th - bo - tp - tr - tk - uk - ur - uz - vi - vot - war - cy - yi - yo - zu language_bcp47: - zh-HK - zh-TW - fy-NL datasets: - common_voice - multilingual_librispeech tags: - speech - xls_r - xls_r_pretrained license: apache-2.0 --- # Wav2Vec2-XLS-R-300M Facebook's Wav2Vec2 XLS-R counting **300 million** parameters. !model image XLS-R is Facebook AI's large-scale multilingual pretrained model for speech (the \"XLM-R for Speech\"). It is pretrained on 436k hours of unlabeled speech, including VoxPopuli, MLS, CommonVoice, BABEL, and VoxLingua107. It uses the wav2vec 2.0 objective, in 128 languages. When using the model make sure that your speech input is sampled at 16kHz. **Note**: This model should be fine-tuned on a downstream task, like Automatic Speech Recognition, Translation, or Classification. Check out **this blog** for more information about ASR. XLS-R Paper Authors: Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli **Abstract** This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on 436K hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range of tasks, domains, data regimes and languages, both high and low-resource. On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English. For speech recognition, XLS-R improves over the best known prior work on BABEL, MLS, CommonVoice as well as VoxPopuli, lowering error rates by 20%-33% relative on average. XLS-R also sets a new state of the art on VoxLingua107 language identification. Moreover, we show that with sufficient model size, cross-lingual pretraining can outperform English-only pretraining when translating English speech into other languages, a setting which favors monolingual pretraining. We hope XLS-R can help to improve speech processing tasks for many more languages of the world. The original model can be found under # Usage See this google colab for more information on how to fine-tune the model. You can find other pretrained XLS-R models with different numbers of parameters: * 300M parameters version * 1B version version * 2B version version", + "model_explanation_gemini": "facebook_wav2vec2-xls-r-300m is a 300-million-parameter multilingual pretrained speech model for tasks like automatic speech recognition, translation, and classification, supporting 128 languages and requiring 16kHz audio input." 
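Even before fine-tuning, the XLS-R checkpoints can be probed as multilingual speech encoders; a small sketch that pulls frame-level hidden states from facebook/wav2vec2-xls-r-300m (the random waveform is a placeholder for one second of real 16 kHz audio):

```python
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-xls-r-300m")

waveform = torch.randn(16_000)  # placeholder for one second of 16 kHz audio
inputs = feature_extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state: (batch, frames, hidden) contextual speech representations
    hidden_states = model(**inputs).last_hidden_state
print(hidden_states.shape)
```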
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_wav2vec2-xlsr-53-espeak-cv-ft.json b/data/model_data_json/facebook_wav2vec2-xlsr-53-espeak-cv-ft.json new file mode 100644 index 0000000000000000000000000000000000000000..d71e81283084295ac3b9f46b2a46dcbac5a19d4e --- /dev/null +++ b/data/model_data_json/facebook_wav2vec2-xlsr-53-espeak-cv-ft.json @@ -0,0 +1,20 @@ +{ + "model_id": "facebook/wav2vec2-xlsr-53-espeak-cv-ft", + "downloads": 272175, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "speech", + "audio", + "phoneme-recognition", + "dataset:common_voice", + "arxiv:2109.11680", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: multi-lingual datasets: - common_voice tags: - speech - audio - automatic-speech-recognition - phoneme-recognition widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: license: apache-2.0 --- # Wav2Vec2-Large-XLSR-53 finetuned on multi-lingual Common Voice This checkpoint leverages the pretrained checkpoint wav2vec2-large-xlsr-53 and is fine-tuned on CommonVoice to recognize phonetic labels in multiple languages. When using the model make sure that your speech input is sampled at 16kHz. Note that the model outputs a string of phonetic labels. A dictionary mapping phonetic labels to words has to be used to map the phonetic output labels to output words. Paper: Simple and Effective Zero-shot Cross-lingual Phoneme Recognition Authors: Qiantong Xu, Alexei Baevski, Michael Auli **Abstract** Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data. However, in many cases there is labeled data available for related languages which is not utilized by these methods. This paper extends previous work on zero-shot cross-lingual transfer learning by fine-tuning a multilingually pretrained wav2vec 2.0 model to transcribe unseen languages. This is done by mapping phonemes of the training languages to the target language using articulatory features. Experiments show that this simple method significantly outperforms prior work which introduced task-specific architectures and used only part of a monolingually pretrained model. The original model can be found under # Usage To transcribe audio files the model can be used as a standalone acoustic model as follows:", + "model_explanation_gemini": "Finetuned on Common Voice to recognize phonetic labels in multiple languages from 16kHz speech inputs, requiring a dictionary to map phonetic outputs to words." 
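Decoding with the espeak-cv-ft checkpoint follows the same CTC pattern as the other wav2vec2 models, except that batch_decode returns a string of phonetic labels rather than words; a sketch, again using the dummy dataset as an illustrative stand-in:

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-xlsr-53-espeak-cv-ft")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xlsr-53-espeak-cv-ft")

ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Output is a string of phoneme labels; mapping phonemes back to words
# requires a separate pronunciation dictionary, as the card notes.
print(processor.batch_decode(predicted_ids))
```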
+} \ No newline at end of file diff --git a/data/model_data_json/facebook_wmt19-de-en.json b/data/model_data_json/facebook_wmt19-de-en.json new file mode 100644 index 0000000000000000000000000000000000000000..639a1b3055463c1f365585feea4a02d445ab75d7 --- /dev/null +++ b/data/model_data_json/facebook_wmt19-de-en.json @@ -0,0 +1,24 @@ +{ + "model_id": "facebook/wmt19-de-en", + "downloads": 100166, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "fsmt", + "text2text-generation", + "translation", + "wmt19", + "facebook", + "de", + "en", + "dataset:wmt19", + "arxiv:1907.06616", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - de - en tags: - translation - wmt19 - facebook license: apache-2.0 datasets: - wmt19 metrics: - bleu thumbnail: --- # FSMT ## Model description This is a ported version of fairseq wmt19 transformer for de-en. For more details, please see Facebook FAIR's WMT19 News Translation Task Submission. The abbreviation FSMT stands for FairSeqMachineTranslation All four models are available: * wmt19-en-ru * wmt19-ru-en * wmt19-en-de * wmt19-de-en ## Intended uses & limitations #### How to use #### Limitations and bias - The original (and this ported model) doesn't seem to handle well inputs with repeated sub-phrases, content gets truncated ## Training data Pretrained weights were left identical to the original model released by fairseq. For more details, please see the paper. ## Eval results pair | fairseq | transformers -------|---------|---------- de-en | 42.3 | 41.35 The score is slightly below the score reported by fairseq, since fairseq reports results with a beam of 50; re-running with `--num_beams 50` should yield a slightly higher score. ## Data Sources - training, etc. - test set ### BibTeX entry and citation info ## TODO - port model ensemble (fairseq uses 4 model checkpoints)", + "model_explanation_gemini": "Translates text between German and English using a transformer-based model trained on WMT19 data." +} \ No newline at end of file diff --git a/data/model_data_json/facebook_wmt19-en-de.json b/data/model_data_json/facebook_wmt19-en-de.json new file mode 100644 index 0000000000000000000000000000000000000000..286e055f957bbd6cc094648678a00110cb2aec68 --- /dev/null +++ b/data/model_data_json/facebook_wmt19-en-de.json @@ -0,0 +1,23 @@ +{ + "model_id": "facebook/wmt19-en-de", + "downloads": 98797, + "tags": [ + "transformers", + "pytorch", + "fsmt", + "text2text-generation", + "translation", + "wmt19", + "facebook", + "en", + "de", + "dataset:wmt19", + "arxiv:1907.06616", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de tags: - translation - wmt19 - facebook license: apache-2.0 datasets: - wmt19 metrics: - bleu thumbnail: --- # FSMT ## Model description This is a ported version of fairseq wmt19 transformer for en-de. For more details, please see Facebook FAIR's WMT19 News Translation Task Submission. The abbreviation FSMT stands for FairSeqMachineTranslation All four models are available: * wmt19-en-ru * wmt19-ru-en * wmt19-en-de * wmt19-de-en ## Intended uses & limitations #### How to use #### Limitations and bias - The original (and this ported model) doesn't seem to handle well inputs with repeated sub-phrases, content gets truncated ## Training data Pretrained weights were left identical to the original model released by fairseq. For more details, please see the paper.
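The "How to use" sections of both FSMT cards lost their snippets in flattening; a sketch of the standard transformers FSMT API, shown here for the de-en direction:

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-de-en"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

input_text = "Maschinelles Lernen ist großartig, oder?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Beam search with default settings; pass num_beams=50 to mirror fairseq's setup
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# -> "Machine learning is great, isn't it?"
```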
## Eval results pair | fairseq | transformers -------|---------|---------- en-de | 43.1 | 42.83 The score is slightly below the score reported by fairseq, since fairseq reports results with a beam of 50; re-running with `--num_beams 50` should yield a slightly higher score. ## Data Sources - training, etc. - test set ### BibTeX entry and citation info ## TODO - port model ensemble (fairseq uses 4 model checkpoints)", + "model_explanation_gemini": "Translates text between English and German, based on the WMT19 news translation task model." +} \ No newline at end of file diff --git a/data/model_data_json/fibonacciai_fibonacci-2-9b.json b/data/model_data_json/fibonacciai_fibonacci-2-9b.json new file mode 100644 index 0000000000000000000000000000000000000000..aa1b2bdf158ddd3c763208cb350dc340261ae6f0 --- /dev/null +++ b/data/model_data_json/fibonacciai_fibonacci-2-9b.json @@ -0,0 +1,23 @@ +{ + "model_id": "fibonacciai/fibonacci-2-9b", + "downloads": 524354, + "tags": [ + "gguf", + "fibonacci", + "text-generation-inference", + "text generation", + "text2text generation", + "text-generation", + "fa", + "en", + "ar", + "dataset:fibonacciai/fibonacci-2025", + "base_model:fibonacciai/fibonacci-1-EN-8b-chat.P1_5", + "base_model:quantized:fibonacciai/fibonacci-1-EN-8b-chat.P1_5", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit datasets: - fibonacciai/fibonacci-2025 language: - fa - en - ar base_model: - fibonacciai/fibonacci-1-EN-8b-chat.P1_5 pipeline_tag: text-generation new_version: fibonacciai/fibonacci-2-9b tags: - text-generation-inference - text generation - text2text generation --- # Fibonacci-2-9b Model !model logo ## Introduction **Fibonacci-2-9b** is a large language model (LLM) based on the Gemma2 architecture, designed with 9.24 billion parameters. The model is optimized for natural language processing (NLP) tasks and text-based conversation. ## Features - **Architecture:** Gemma2 - **Number of parameters:** 9.24 billion - **Formats:** GGUF with support for 4-bit (Q4_K_M), 5-bit (Q5_K_M), 8-bit (Q8_0), and 16-bit (F16) - **License:** MIT ## Applications - **Text generation:** producing creative and varied texts - **Question answering:** giving accurate answers to user queries - **Machine translation:** translating texts between different languages - **Sentiment analysis:** identifying the sentiment expressed in texts ## How to Use To use this model, you can rely on various libraries such as Hugging Face. Below is a sample code snippet for loading and using the model: python from transformers import AutoModelForCausalLM, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained(\"fibonacciai/fibonacci-2-9b\") model = AutoModelForCausalLM.from_pretrained(\"fibonacciai/fibonacci-2-9b\") input_text = \"Hello! How can I assist you today?\" inputs = tokenizer(input_text, return_tensors=\"pt\") outputs = model.generate(**inputs) response = tokenizer.decode(outputs[0], skip_special_tokens=True) print(response) Resources Model Page on Hugging Face Hugging Face Documentation Contribution We welcome your contributions! If you have suggestions for improving the model or have identified any bugs, please share them with us through the Issues section. License This model is released under the MIT License. For more information, see the LICENSE file.", + "model_explanation_gemini": "A 9.24-billion-parameter Gemma2-based multilingual language model optimized for text generation, question answering, machine translation, and sentiment analysis across Persian, English, and Arabic." +} \ No newline at end of file diff --git a/data/model_data_json/finiteautomata_bertweet-base-sentiment-analysis.json b/data/model_data_json/finiteautomata_bertweet-base-sentiment-analysis.json new file mode 100644 index 0000000000000000000000000000000000000000..5d51780b1a0f53251183f1909487c8bddea55236 --- /dev/null +++ b/data/model_data_json/finiteautomata_bertweet-base-sentiment-analysis.json @@ -0,0 +1,19 @@ +{ + "model_id": "finiteautomata/bertweet-base-sentiment-analysis", + "downloads": 298073, + "tags": [ + "transformers", + "pytorch", + "tf", + "roberta", + "text-classification", + "sentiment-analysis", + "en", + "arxiv:2106.09462", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - sentiment-analysis --- # Sentiment Analysis in English ## bertweet-sentiment-analysis Repository: Model trained with SemEval 2017 corpus (around 40k tweets). Base model is BERTweet, a RoBERTa model trained on English tweets. Uses POS, NEG, NEU labels. ## License pysentimiento is an open-source library for non-commercial use and scientific research purposes only. Please be aware that models are trained with third-party datasets and are subject to their respective licenses. 1. TASS Dataset license 2. [SEMEval 2017 Dataset license]() ## Citation If you use pysentimiento in your work, please cite this paper Enjoy! 🤗", + "model_explanation_gemini": "Performs sentiment analysis on English tweets using a BERTweet-based model trained on SemEval 2017 data, classifying text into positive, negative, or neutral labels."
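A minimal way to try the bertweet sentiment model is the transformers pipeline; the example tweet and the exact scores are illustrative:

```python
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="finiteautomata/bertweet-base-sentiment-analysis")

# Returns one of the POS / NEG / NEU labels with a confidence score
print(classifier("I love this movie!"))
# e.g. [{'label': 'POS', 'score': 0.99}]
```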
+} \ No newline at end of file diff --git a/data/model_data_json/finiteautomata_beto-sentiment-analysis.json b/data/model_data_json/finiteautomata_beto-sentiment-analysis.json new file mode 100644 index 0000000000000000000000000000000000000000..1949b247cbfc0b5d60aa84e48f6b2bbd7f887f8c --- /dev/null +++ b/data/model_data_json/finiteautomata_beto-sentiment-analysis.json @@ -0,0 +1,19 @@ +{ + "model_id": "finiteautomata/beto-sentiment-analysis", + "downloads": 311829, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "text-classification", + "sentiment-analysis", + "es", + "arxiv:2106.09462", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - es tags: - sentiment-analysis --- # Sentiment Analysis in Spanish ## beto-sentiment-analysis **NOTE: this model will be removed soon -- use pysentimiento/robertuito-sentiment-analysis instead** Repository: Model trained with TASS 2020 corpus (around ~5k tweets) of several dialects of Spanish. Base model is BETO, a BERT model trained in Spanish. Uses , , labels. ## License is an open-source library for non-commercial use and scientific research purposes only. Please be aware that models are trained with third-party datasets and are subject to their respective licenses. 1. TASS Dataset license 2. [SEMEval 2017 Dataset license]() ## Citation If you use this model in your work, please cite the following papers: Enjoy! 🤗", + "model_explanation_gemini": "Classifies sentiment in Spanish tweets using BETO (a Spanish BERT model) with labels for positive, negative, and neutral sentiments." +} \ No newline at end of file diff --git a/data/model_data_json/fixie-ai_ultravox-v0_5-llama-3_2-1b.json b/data/model_data_json/fixie-ai_ultravox-v0_5-llama-3_2-1b.json new file mode 100644 index 0000000000000000000000000000000000000000..0c01adfebc17127bd7d017960ca6fa2ea203ae8f --- /dev/null +++ b/data/model_data_json/fixie-ai_ultravox-v0_5-llama-3_2-1b.json @@ -0,0 +1,58 @@ +{ + "model_id": "fixie-ai/ultravox-v0_5-llama-3_2-1b", + "downloads": 169739, + "tags": [ + "transformers", + "safetensors", + "ultravox", + "feature-extraction", + "audio-text-to-text", + "custom_code", + "ar", + "be", + "bg", + "bn", + "cs", + "cy", + "da", + "de", + "el", + "en", + "es", + "et", + "fa", + "fi", + "fr", + "gl", + "hi", + "hu", + "it", + "ja", + "ka", + "lt", + "lv", + "mk", + "mr", + "nl", + "pl", + "pt", + "ro", + "ru", + "sk", + "sl", + "sr", + "sv", + "sw", + "ta", + "th", + "tr", + "uk", + "ur", + "vi", + "zh", + "license:mit", + "region:us" + ], + "description": "--- language: - ar - be - bg - bn - cs - cy - da - de - el - en - es - et - fa - fi - fr - gl - hi - hu - it - ja - ka - lt - lv - mk - mr - nl - pl - pt - ro - ru - sk - sl - sr - sv - sw - ta - th - tr - uk - ur - vi - zh library_name: transformers license: mit metrics: - bleu pipeline_tag: audio-text-to-text --- # Model Card for Ultravox Ultravox is a multimodal Speech LLM built around a pretrained Llama3.2-1B-Instruct and whisper-large-v3-turbo backbone. See for the GitHub repo and more information. ## Model Details ### Model Description Ultravox is a multimodal model that can consume both speech and text as input (e.g., a text system prompt and voice user message). The input to the model is given as a text prompt with a special pseudo-token, and the model processor will replace this magic token with embeddings derived from the input audio. Using the merged embeddings as input, the model will then generate output text as usual. 
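A usage sketch for that audio-plus-text interface, following the pipeline pattern from the Ultravox repo; the audio path and system prompt are placeholders, and trust_remote_code is needed for the custom Ultravox code:

```python
import librosa
import transformers

pipe = transformers.pipeline(model="fixie-ai/ultravox-v0_5-llama-3_2-1b",
                             trust_remote_code=True)

# 16 kHz mono audio; "speech.wav" is a placeholder path
audio, sr = librosa.load("speech.wav", sr=16_000)

turns = [
    {"role": "system", "content": "You are a friendly and helpful assistant."},
]

# The processor swaps the audio pseudo-token for audio embeddings internally
print(pipe({"audio": audio, "turns": turns, "sampling_rate": sr}, max_new_tokens=30))
```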
In a future revision of Ultravox, we plan to expand the token vocabulary to support generation of semantic and acoustic audio tokens, which can then be fed to a vocoder to produce voice output. No preference tuning has been applied to this revision of the model. - **Developed by:** Fixie.ai - **License:** MIT ### Model Sources - **Repository:** - **Demo:** See repo ## Usage Think of the model as an LLM that can also hear and understand speech. As such, it can be used as a voice agent, and also to do speech-to-speech translation, analysis of spoken audio, etc. To use the model, try the following: ## Training Details The model uses a pre-trained Llama3.2-1B-Instruct backbone as well as the encoder part of whisper-large-v3-turbo. The multi-modal adapter is trained, the Whisper encoder is fine-tuned, while the Llama model is kept frozen. We use a knowledge-distillation loss where Ultravox is trying to match the logits of the text-based Llama backbone. ### Training Data The training dataset is a mix of ASR datasets, extended with continuations generated by Llama 3.1 8B, and speech translation datasets, which yield a modest improvement in translation evaluations. ### Training Procedure Supervised speech instruction finetuning via knowledge-distillation. For more info, see training code in Ultravox repo. #### Training Hyperparameters - **Training regime:** BF16 mixed precision training - **Hardware used:** 8x H100 GPUs #### Speeds, Sizes, Times Check out the audio tab on TheFastest.ai for daily benchmarks and a comparison with other existing models. ## Evaluation | | **Ultravox 0.5 1b**| Ultravox 0.5 8B | Ultravox 0.5 70B | | --- | ---: | ---: | ---: | | **covost2 en_ar** | 1.55 | 12.99 | 20.21 | | **covost2 en_ca** | 8.06 | 31.54 | 40.01 | | **covost2 en_de** | 14.21 | 28.70 | 34.53 | | **covost2 es_en** | 24.97 | 40.19 | 43.29 | | **covost2 ru_en** | 24.12 | 42.13 | 48.99 | | **covost2 zh_en** | 4.76 | 17.22 | 21.37 | | **big bench audio**| 39.14 | 66.54 | 82.70 |", + "model_explanation_gemini": "Ultravox is a multimodal speech and text model that processes audio inputs via embeddings to generate text outputs, enabling tasks like speech-to-text translation and voice agent interactions." +} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-dutch-large.json b/data/model_data_json/flair_ner-dutch-large.json new file mode 100644 index 0000000000000000000000000000000000000000..5af00f6fe1bac8cd2bd35fca48a4772be7dc6e5d --- /dev/null +++ b/data/model_data_json/flair_ner-dutch-large.json @@ -0,0 +1,16 @@ +{ + "model_id": "flair/ner-dutch-large", + "downloads": 155515, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "nl", + "dataset:conll2003", + "arxiv:2011.06993", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: nl datasets: - conll2003 widget: - text: \"George Washington ging naar Washington\" --- ## Dutch NER in Flair (large model) This is the large 4-class NER model for Dutch that ships with Flair. F1-Score: **95,25** (CoNLL-03 Dutch) Predicts 4 tags: | **tag** | **meaning** | |---------------------------------|-----------| | PER | person name | | LOC | location name | | ORG | organization name | | MISC | other name | Based on document-level XLM-R embeddings and FLERT.
--- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*George Washington*\" (labeled as a **person**) and \"*Washington*\" (labeled as a **location**) are found in the sentence \"*George Washington ging naar Washington*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Identifies and classifies named entities in Dutch text into four categories (person, location, organization, and miscellaneous) using document-level XLM-R embeddings and FLERT." +} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-english-fast.json b/data/model_data_json/flair_ner-english-fast.json new file mode 100644 index 0000000000000000000000000000000000000000..906a82d9beb507da34d745028c5f09e69be843f4 --- /dev/null +++ b/data/model_data_json/flair_ner-english-fast.json @@ -0,0 +1,15 @@ +{ + "model_id": "flair/ner-english-fast", + "downloads": 916475, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "en", + "dataset:conll2003", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: en datasets: - conll2003 widget: - text: \"George Washington went to Washington\" --- ## English NER in Flair (fast model) This is the fast 4-class NER model for English that ships with Flair. F1-Score: **92,92** (corrected CoNLL-03) Predicts 4 tags: | **tag** | **meaning** | |---------------------------------|-----------| | PER | person name | | LOC | location name | | ORG | organization name | | MISC | other name | Based on Flair embeddings and LSTM-CRF. --- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*George Washington*\" (labeled as a **person**) and \"*Washington*\" (labeled as a **location**) are found in the sentence \"*George Washington went to Washington*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Identifies and classifies named entities in English text into four categories (person, location, organization, or other) using Flair embeddings and LSTM-CRF." +} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-english-large.json b/data/model_data_json/flair_ner-english-large.json new file mode 100644 index 0000000000000000000000000000000000000000..b0ed79a969e5193041294a9c3dc28902826f6f4c --- /dev/null +++ b/data/model_data_json/flair_ner-english-large.json @@ -0,0 +1,16 @@ +{ + "model_id": "flair/ner-english-large", + "downloads": 661647, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "en", + "dataset:conll2003", + "arxiv:2011.06993", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: en datasets: - conll2003 widget: - text: \"George Washington went to Washington\" --- ## English NER in Flair (large model) This is the large 4-class NER model for English that ships with Flair. 
F1-Score: **94,36** (corrected CoNLL-03) Predicts 4 tags: | **tag** | **meaning** | |---------------------------------|-----------| | PER | person name | | LOC | location name | | ORG | organization name | | MISC | other name | Based on document-level XLM-R embeddings and FLERT. --- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*George Washington*\" (labeled as a **person**) and \"*Washington*\" (labeled as a **location**) are found in the sentence \"*George Washington went to Washington*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Performs English named entity recognition (NER) to identify and classify person, location, organization, and miscellaneous names in text using document-level XLM-R embeddings and FLERT." +} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-english-ontonotes-large.json b/data/model_data_json/flair_ner-english-ontonotes-large.json new file mode 100644 index 0000000000000000000000000000000000000000..582456165ff7246f5dcb9e69b1ccb958a67be793 --- /dev/null +++ b/data/model_data_json/flair_ner-english-ontonotes-large.json @@ -0,0 +1,16 @@ +{ + "model_id": "flair/ner-english-ontonotes-large", + "downloads": 170979, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "en", + "dataset:ontonotes", + "arxiv:2011.06993", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: en datasets: - ontonotes widget: - text: \"On September 1st George won 1 dollar while watching Game of Thrones.\" --- ## English NER in Flair (Ontonotes large model) This is the large 18-class NER model for English that ships with Flair. F1-Score: **90.93** (Ontonotes) Predicts 18 tags: | **tag** | **meaning** | |---------------------------------|-----------| | CARDINAL | cardinal value | | DATE | date value | | EVENT | event name | | FAC | building name | | GPE | geo-political entity | | LANGUAGE | language name | | LAW | law name | | LOC | location name | | MONEY | money name | | NORP | affiliation | | ORDINAL | ordinal value | | ORG | organization name | | PERCENT | percent value | | PERSON | person name | | PRODUCT | product name | | QUANTITY | quantity value | | TIME | time value | | WORK_OF_ART | name of work of art | Based on document-level XLM-R embeddings and FLERT. --- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*September 1st*\" (labeled as a **date**), \"*George*\" (labeled as a **person**), \"*1 dollar*\" (labeled as a **money**) and \"Game of Thrones\" (labeled as a **work of art**) are found in the sentence \"*On September 1st George Washington won 1 dollar while watching Game of Thrones*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Identifies and classifies 18 types of named entities in English text, including dates, people, organizations, and works of art, using document-level XLM-R embeddings and FLERT." 
+} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-english-ontonotes.json b/data/model_data_json/flair_ner-english-ontonotes.json new file mode 100644 index 0000000000000000000000000000000000000000..5fbf84dba76d8a04ff45796fbc07c3a250c09382 --- /dev/null +++ b/data/model_data_json/flair_ner-english-ontonotes.json @@ -0,0 +1,15 @@ +{ + "model_id": "flair/ner-english-ontonotes", + "downloads": 166091, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "en", + "dataset:ontonotes", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: en datasets: - ontonotes widget: - text: \"On September 1st George Washington won 1 dollar.\" --- ## English NER in Flair (Ontonotes default model) This is the 18-class NER model for English that ships with Flair. F1-Score: **89.27** (Ontonotes) Predicts 18 tags: | **tag** | **meaning** | |---------------------------------|-----------| | CARDINAL | cardinal value | | DATE | date value | | EVENT | event name | | FAC | building name | | GPE | geo-political entity | | LANGUAGE | language name | | LAW | law name | | LOC | location name | | MONEY | money name | | NORP | affiliation | | ORDINAL | ordinal value | | ORG | organization name | | PERCENT | percent value | | PERSON | person name | | PRODUCT | product name | | QUANTITY | quantity value | | TIME | time value | | WORK_OF_ART | name of work of art | Based on Flair embeddings and LSTM-CRF. --- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*September 1st*\" (labeled as a **date**), \"*George Washington*\" (labeled as a **person**) and \"*1 dollar*\" (labeled as a **money**) are found in the sentence \"*On September 1st George Washington won 1 dollar*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Identifies and classifies 18 types of named entities in English text using Flair embeddings and LSTM-CRF, achieving an F1-score of 89.27 on Ontonotes data." +} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-english.json b/data/model_data_json/flair_ner-english.json new file mode 100644 index 0000000000000000000000000000000000000000..8808aab7192d4c9a2294f599717db0e3221faa29 --- /dev/null +++ b/data/model_data_json/flair_ner-english.json @@ -0,0 +1,15 @@ +{ + "model_id": "flair/ner-english", + "downloads": 127747, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "en", + "dataset:conll2003", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: en datasets: - conll2003 widget: - text: \"George Washington went to Washington\" --- ## English NER in Flair (default model) This is the standard 4-class NER model for English that ships with Flair. F1-Score: **93,06** (corrected CoNLL-03) Predicts 4 tags: | **tag** | **meaning** | |---------------------------------|-----------| | PER | person name | | LOC | location name | | ORG | organization name | | MISC | other name | Based on Flair embeddings and LSTM-CRF. 
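The "Demo" sections of these Flair cards all reference code that did not survive flattening; usage is identical across the taggers, so one sketch covers them (shown with flair/ner-english; swap in any of the other model ids):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the tagger (any of the flair/ner-* model ids works the same way)
tagger = SequenceTagger.load("flair/ner-english")

sentence = Sentence("George Washington went to Washington")
tagger.predict(sentence)

# Iterate over detected entities with their labels and confidences
for entity in sentence.get_spans("ner"):
    print(entity)
```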
--- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*George Washington*\" (labeled as a **person**) and \"*Washington*\" (labeled as a **location**) are found in the sentence \"*George Washington went to Washington*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Identifies and classifies named entities in English text into four categories (person, location, organization, other) using Flair embeddings and LSTM-CRF." +} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-french.json b/data/model_data_json/flair_ner-french.json new file mode 100644 index 0000000000000000000000000000000000000000..076ec3382fb32050e5cbb1aa3239c113b1828c75 --- /dev/null +++ b/data/model_data_json/flair_ner-french.json @@ -0,0 +1,15 @@ +{ + "model_id": "flair/ner-french", + "downloads": 313648, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "fr", + "dataset:conll2003", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: fr datasets: - conll2003 widget: - text: \"George Washington est allé à Washington\" --- ## French NER in Flair (default model) This is the standard 4-class NER model for French that ships with Flair. F1-Score: **90,61** (WikiNER) Predicts 4 tags: | **tag** | **meaning** | |---------------------------------|-----------| | PER | person name | | LOC | location name | | ORG | organization name | | MISC | other name | Based on Flair embeddings and LSTM-CRF. --- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*George Washington*\" (labeled as a **person**) and \"*Washington*\" (labeled as a **location**) are found in the sentence \"*George Washington est allé à Washington*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Identifies and classifies named entities in French text into four categories (person, location, organization, and other names) using Flair embeddings and LSTM-CRF." +} \ No newline at end of file diff --git a/data/model_data_json/flair_ner-german-large.json b/data/model_data_json/flair_ner-german-large.json new file mode 100644 index 0000000000000000000000000000000000000000..2f6e4496324730e36bac58bad7d245d87f382677 --- /dev/null +++ b/data/model_data_json/flair_ner-german-large.json @@ -0,0 +1,16 @@ +{ + "model_id": "flair/ner-german-large", + "downloads": 301540, + "tags": [ + "flair", + "pytorch", + "token-classification", + "sequence-tagger-model", + "de", + "dataset:conll2003", + "arxiv:2011.06993", + "region:us" + ], + "description": "--- tags: - flair - token-classification - sequence-tagger-model language: de datasets: - conll2003 widget: - text: \"George Washington ging nach Washington\" --- ## German NER in Flair (large model) This is the large 4-class NER model for German that ships with Flair. 
F1-Score: **92,31** (CoNLL-03 German revised) Predicts 4 tags: | **tag** | **meaning** | |---------------------------------|-----------| | PER | person name | | LOC | location name | | ORG | organization name | | MISC | other name | Based on document-level XLM-R embeddings and FLERT. --- ### Demo: How to use in Flair Requires: **Flair** () This yields the following output: So, the entities \"*George Washington*\" (labeled as a **person**) and \"*Washington*\" (labeled as a **location**) are found in the sentence \"*George Washington ging nach Washington*\". --- ### Training: Script to train this model The following Flair script was used to train this model: --- ### Cite Please cite the following paper when using this model. --- ### Issues? The Flair issue tracker is available here.", + "model_explanation_gemini": "Performs German named entity recognition (NER) to identify and classify entities into four categories (person, location, organization, and other names) using document-level XLM-R embeddings and FLERT." +} \ No newline at end of file diff --git a/data/model_data_json/foduucom_table-detection-and-extraction.json b/data/model_data_json/foduucom_table-detection-and-extraction.json new file mode 100644 index 0000000000000000000000000000000000000000..7a8138b23f2912dc3cb80a05b77880392cedceaa --- /dev/null +++ b/data/model_data_json/foduucom_table-detection-and-extraction.json @@ -0,0 +1,29 @@ +{ + "model_id": "foduucom/table-detection-and-extraction", + "downloads": 83150, + "tags": [ + "ultralytics", + "tensorboard", + "v8", + "ultralyticsplus", + "yolov8", + "yolo", + "vision", + "object-detection", + "pytorch", + "table detection", + "table extraction", + "table classification", + "document analysis", + "unstructured document", + "unstructured table extraction", + "structured table extraction", + "unstructured table detection", + "structured table detection", + "en", + "dataset:foduucom/table-detection-yolo", + "model-index", + "region:us" + ], + "description": "--- tags: - ultralyticsplus - yolov8 - ultralytics - yolo - vision - object-detection - pytorch - table detection - table extraction - table classification - document analysis - unstructured document - unstructured table extraction - structured table extraction - unstructured table detection - structured table detection library_name: ultralytics library_version: 8.0.43 inference: true model-index: - name: foduucom/table-detection-and-extraction results: - task: type: object-detection metrics: - type: precision value: 0.96196 name: mAP@0.5(box) language: - en metrics: - accuracy datasets: - foduucom/table-detection-yolo pipeline_tag: object-detection ---
\"foduucom/table-detection-and-extraction\" # Model Card for YOLOv8s Table Detection ## Model Summary The YOLOv8s Table Detection model is an object detection model based on the YOLO (You Only Look Once) framework. It is designed to detect tables, whether they are bordered or borderless, in images. The model has been fine-tuned on a vast dataset and achieved high accuracy in detecting tables and distinguishing between bordered and borderless ones. ## Model Details ### Model Description The YOLOv8s Table Detection model serves as a versatile solution for precisely identifying tables within images, whether they exhibit a bordered or borderless design. Notably, this model's capabilities extend beyond mere detection – it plays a crucial role in addressing the complexities of unstructured documents. By employing advanced techniques such as bounding box delineation, the model enables users to isolate tables of interest within the visual content. What sets this model apart is its synergy with Optical Character Recognition (OCR) technology. This seamless integration empowers the model to not only locate tables but also to extract pertinent data contained within. The bounding box information guides the cropping of tables, which is then coupled with OCR to meticulously extract textual data, streamlining the process of information retrieval from unstructured documents. We invite you to explore the potential of this model and its data extraction capabilities. For those interested in harnessing its power or seeking further collaboration, we encourage you to reach out to us at info@foduu.com. Whether you require assistance, customization, or have innovative ideas, our collaborative approach is geared towards addressing your unique challenges. Additionally, you can actively engage with our vibrant community section for valuable insights and collective problem-solving. Your input drives our continuous improvement, as we collectively pave the way towards enhanced data extraction and document analysis. - **Developed by:** FODUU AI - **Model type:** Object Detection - **Task:** Table Detection (Bordered and Borderless) Furthermore, the YOLOv8s Table Detection model is not limited to table detection alone. It is a versatile tool that contributes to the processing of unstructured documents. By utilizing advanced bounding box techniques, the model empowers users to isolate tables within the document's visual content. What sets this model apart is its seamless integration with Optical Character Recognition (OCR) technology. The combination of bounding box information and OCR allows for precise data extraction from the tables. This comprehensive approach streamlines the process of information retrieval from complex documents. User collaboration is actively encouraged to enrich the model's capabilities. By contributing table images of different designs and types, users play a pivotal role in enhancing the model's ability to detect a diverse range of tables accurately. Community participation can be facilitated through our platform or by reaching out to us at info@foduu.com. We value collaborative efforts that drive continuous improvement and innovation in table detection and extraction. ### Supported Labels ## Uses ### Direct Use The YOLOv8s Table Detection model can be directly used for detecting tables in images, whether they are bordered or borderless. It is equipped with the ability to distinguish between these two categories. 
### Downstream Use The model can also be fine-tuned for specific table detection tasks or integrated into larger applications for furniture recognition, interior design, image-based data extraction, and other related fields. ### Out-of-Scope Use The model is not designed for unrelated object detection tasks or scenarios outside the scope of table detection. ## Bias, Risks, and Limitations The YOLOv8s Table Detection model may have some limitations and biases: - Performance may vary based on the quality, diversity, and representativeness of the training data. - The model may face challenges in detecting tables with intricate designs or complex arrangements. - Accuracy may be affected by variations in lighting conditions, image quality, and resolution. - Detection of very small or distant tables might be less accurate. - The model's ability to classify bordered and borderless tables may be influenced by variations in design. ### Recommendations Users should be informed about the model's limitations and potential biases. Further testing and validation are advised for specific use cases to evaluate its performance accurately. ## How to Get Started with the Model To begin using the YOLOv8s Table Detection model, follow these steps: - Load model and perform prediction: ## Training Details ### Training Data The model is trained on a diverse dataset containing images of tables from various sources. The dataset includes examples of both bordered and borderless tables, capturing different designs and styles. ### Training Procedure The training process involves extensive computation and is conducted over multiple epochs. The model's weights are adjusted to minimize detection loss and optimize performance. #### Metrics - mAP@0.5 (box): - All: 0.962 - Bordered: 0.961 - Borderless: 0.963 ### Model Architecture and Objective The YOLOv8s architecture employs a modified CSPDarknet53 as its backbone, along with self-attention mechanisms and feature pyramid networks. These components contribute to the model's ability to detect and classify tables accurately, considering variations in size, design, and style. ### Compute Infrastructure #### Hardware NVIDIA GeForce RTX 3060 card #### Software The model was trained and fine-tuned using a Jupyter Notebook environment. ## Model Card Contact For inquiries and contributions, please contact us at info@foduu.com. 
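The "Load model and perform prediction" step above lost its snippet; a sketch using the ultralyticsplus wrapper named in the tags, with placeholder image path and thresholds:

```python
from ultralyticsplus import YOLO, render_result

# Load the model from the Hugging Face Hub
model = YOLO("foduucom/table-detection-and-extraction")

# Placeholder inference settings
model.overrides["conf"] = 0.25   # NMS confidence threshold
model.overrides["iou"] = 0.45    # NMS IoU threshold
model.overrides["agnostic_nms"] = False
model.overrides["max_det"] = 1000

image = "document_page.jpg"  # placeholder path
results = model.predict(image)
print(results[0].boxes)  # bounding boxes for bordered/borderless tables

# Visualize the detections
render = render_result(model=model, image=image, result=results[0])
render.show()
```

The returned boxes can then be used to crop each table region and hand it to an OCR engine, which is the extraction workflow the card describes.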
---" +} \ No newline at end of file diff --git a/data/model_data_json/fxmarty_pix2struct-tiny-random.json b/data/model_data_json/fxmarty_pix2struct-tiny-random.json new file mode 100644 index 0000000000000000000000000000000000000000..502d48694611c21f0c48339b8de97929a8d2ce35 --- /dev/null +++ b/data/model_data_json/fxmarty_pix2struct-tiny-random.json @@ -0,0 +1,15 @@ +{ + "model_id": "fxmarty/pix2struct-tiny-random", + "downloads": 76445, + "tags": [ + "transformers", + "pytorch", + "pix2struct", + "image-text-to-text", + "image-to-text", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit pipeline_tag: image-to-text ---" +} \ No newline at end of file diff --git a/data/model_data_json/fxmarty_really-tiny-falcon-testing.json b/data/model_data_json/fxmarty_really-tiny-falcon-testing.json new file mode 100644 index 0000000000000000000000000000000000000000..31f4d60a5c92b06cee602e26dd4808bae35c90d8 --- /dev/null +++ b/data/model_data_json/fxmarty_really-tiny-falcon-testing.json @@ -0,0 +1,18 @@ +{ + "model_id": "fxmarty/really-tiny-falcon-testing", + "downloads": 126726, + "tags": [ + "transformers", + "pytorch", + "falcon", + "text-generation", + "custom_code", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- tiny = <10 MB", + "model_explanation_gemini": "A tiny Falcon-based model under 10 MB, licensed under MIT." +} \ No newline at end of file diff --git a/data/model_data_json/fxmarty_tiny-dummy-qwen2.json b/data/model_data_json/fxmarty_tiny-dummy-qwen2.json new file mode 100644 index 0000000000000000000000000000000000000000..86ad3a696f22021eeebe72fea9fb63f349b74e4f --- /dev/null +++ b/data/model_data_json/fxmarty_tiny-dummy-qwen2.json @@ -0,0 +1,18 @@ +{ + "model_id": "fxmarty/tiny-dummy-qwen2", + "downloads": 99092, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit ---", + "model_explanation_gemini": "A tiny dummy Qwen2 model for testing or demonstration purposes." +} \ No newline at end of file diff --git a/data/model_data_json/fxmarty_tiny-random-GemmaForCausalLM.json b/data/model_data_json/fxmarty_tiny-random-GemmaForCausalLM.json new file mode 100644 index 0000000000000000000000000000000000000000..073e8700a60a5e324126a1683de008293234500b --- /dev/null +++ b/data/model_data_json/fxmarty_tiny-random-GemmaForCausalLM.json @@ -0,0 +1,17 @@ +{ + "model_id": "fxmarty/tiny-random-GemmaForCausalLM", + "downloads": 97907, + "tags": [ + "transformers", + "safetensors", + "gemma", + "text-generation", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- This one with a custom as allowed by the architecture (see 7b model).", + "model_explanation_gemini": "Randomly initialized tiny Gemma model for causal language modeling, licensed under MIT." 
+} \ No newline at end of file diff --git a/data/model_data_json/genbio-ai_AIDO.Protein-RAG-16B.json b/data/model_data_json/genbio-ai_AIDO.Protein-RAG-16B.json new file mode 100644 index 0000000000000000000000000000000000000000..92a5f7de5223185c3ea29be7d518119b08963c65 --- /dev/null +++ b/data/model_data_json/genbio-ai_AIDO.Protein-RAG-16B.json @@ -0,0 +1,13 @@ +{ + "model_id": "genbio-ai/AIDO.Protein-RAG-16B", + "downloads": 204878, + "tags": [ + "pytorch", + "fm4bio", + "arxiv:2406.05347", + "license:other", + "region:us" + ], + "description": "--- license: other --- # AIDO.Protein-RAG-16B AIDO.Protein-RAG-16B is a multimodal protein language model that integrates Multiple Sequence Alignment (MSA) and structural data, building upon the AIDO.Protein-16B foundation. The training process comprises three main stages: 1. 2D RoPE encoding fine-tuning 2. Initial training on 100 billion tokens from UniRef50/UniClust30 MSA data 3. Subsequent training on 80 billion tokens from AlphaFold Database MSA and structural data ## Model Architecture Details AIDO.Protein-RAG-16B employs a transformer encoder-only architecture featuring sparse Mixture-of-Experts (MoE) layers that replace dense MLP layers in each transformer block. Utilizing single amino acid tokenization and optimized through masked language modeling (MLM), the model activates 2 experts per token via top-2 routing mechanisms.
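To make the routing concrete, here is a hypothetical PyTorch sketch of one sparse MoE layer with top-2 routing, using the hidden/FFN sizes from the table below and assuming 8 experts per layer (read from the table's "per Block" row); it illustrates the mechanism only and is not the model's actual implementation:

```python
# Illustrative top-2 MoE layer; sizes from the card's table, expert count of 8 is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model=2304, d_ff=7680, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                     # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(2, dim=-1)  # top-2 routing: 2 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(2):                    # mix the two selected experts' outputs
            for e, expert in enumerate(self.experts):
                sel = idx[:, k] == e
                if sel.any():
                    out[sel] += weights[sel, k].unsqueeze(-1) * expert(x[sel])
        return out
```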
\"An
More architecture details are shown below: | Model Arch Component | Value | | ----------------------- | :---: | | Num Attention Head | 36 | | Num Hidden Layer | 36 | | Hidden Size | 2304 | | FFN Hidden Size | 7680 | | Num MoE Layer per Block | 8 | | Num MoE Layer per Token | 2 | | Vocab Size | 44 | | Context Length | 2048 | ## Pre-training of AIDO.Protein-RAG-16B Pre-training is divided into three stages: (1) 1D -> 2D RoPE encoding finetuning; (2) UniRef50/Uniclust30 MSA finetuning; (3) AlphaFold Database MSA & Structure tokens finetuning. ### Data **UniRef50/Uniclust30 MSA dataset**: We utilized sequences from UniRef50 as queries to search for homologous sequences in UniClust30, subsequently constructing multiple sequence alignments (MSAs). UniRef50 comprises a total of 53.6 million sequences. Using HHblits, we searched all sequences, identifying over 25 homologous sequences for 23.7 million of them. This dataset was directly used as the training set, referred to as . The remaining 29.9 million sequences were input into MSA Retriever, resulting in 7.7 million sequences with more than 25 homologous sequences. This dataset was designated as . During training, RAGPLM randomly sampled from the two datasets with probabilities of 0.75 and 0.25. Refer to the AIDO.Protein-RAG-3B paper (link) for more information. **AlphaFold Database MSA & Structure dataset**: We downloaded all structural data from the AlphaFold Database and kept only those entries where more than 40% of amino acids had a pLDDT score > 70. The remaining sequences were clustered using (), and one representative per cluster was retained, resulting in 46.9 million sequence/structure pairs. For each structure, we used genbio-ai/AIDO.StructureTokenizer to obtain structure tokens and embeddings. MSA Retriever was used to obtain the corresponding MSA. ### Training Details Model training is divided into three stages: #### (1) 1D -> 2D RoPE Encoding Fine-tuning Same training data as AIDO.Protein-16B, but with 2D rotary position embedding for token encoding. #### (2) UniRef50/UniClust30 MSA Fine-tuning The model from Stage 1 is further fine-tuned on the UniRef50/Uniclust30 MSA dataset. See the AIDO.Protein-RAG-3B paper for more. #### (3) AlphaFold Database MSA & Structure Fine-tuning We fine-tuned the model with concatenated query and homologous sequences. Structure embeddings (dim = 384) are linearly mapped to 2304 and added to the query token embeddings. ##### Sequence Masking * Randomly sample span positions from a query of length . Span lengths follow a geometric distribution (), capped at length 10. On average, ~15% of query tokens are masked. * When a residue is selected, its aligned residues across all sequences (MSA column) are also masked. * For masked MSA columns: 80% are replaced with the mask token, 10% with random amino acids, and 10% are left unchanged. ##### Structure Masking * In 20% of cases, structure embeddings are replaced with 0. * In 80% of cases, the number of amino acids whose embeddings are zeroed is sampled from the BetaLinear30 distribution (BetaLinear30 = 20% Uniform(0,1) + 80% Beta(3,9)). ##### Positional Embedding We use 2D rotary position embedding to help the model distinguish token chain identities and residue indices. See the AIDO.Protein-RAG-3B paper (link) for more information. ##### Loss Function Total loss is a weighted sum of sequence loss (weight 1.0) and structure loss (weight 0.01). * **Sequence loss**: CrossEntropy loss for masked token prediction. * **Structure loss**: CrossEntropy loss for masked structure token prediction.
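A minimal NumPy sketch of the masking recipe above; the geometric parameter and the token ids are not given in the card, so the values here are assumptions:

```python
# Hypothetical sketch of the span/column masking described above (p=0.2 and ids are assumptions).
import numpy as np

MASK_ID, NUM_AA = 0, 24  # assumed mask-token id and number of amino-acid tokens

def sample_masked_positions(query_len, rate=0.15, p=0.2, cap=10, rng=np.random):
    """Sample spans until ~15% of query positions are covered; span lengths are
    geometric, capped at 10, as stated in the card."""
    masked = np.zeros(query_len, dtype=bool)
    while masked.sum() < rate * query_len:
        start = rng.randint(query_len)
        masked[start:start + min(rng.geometric(p), cap)] = True
    return np.where(masked)[0]

def mask_msa(msa, rng=np.random):
    """msa: (num_seqs, query_len) int array. A selected query position masks the
    whole MSA column: 80% mask token, 10% random amino acids, 10% unchanged."""
    msa = msa.copy()
    for c in sample_masked_positions(msa.shape[1]):
        r = rng.rand()
        if r < 0.8:
            msa[:, c] = MASK_ID
        elif r < 0.9:
            msa[:, c] = rng.randint(NUM_AA, size=msa.shape[0])
        # else: leave the column unchanged
    return msa

def num_structure_embeddings_to_zero(query_len, rng=np.random):
    """BetaLinear30 draw: a 20% Uniform(0,1) + 80% Beta(3,9) mixture over the
    fraction of structure embeddings to zero out."""
    frac = rng.rand() if rng.rand() < 0.2 else rng.beta(3, 9)
    return int(frac * query_len)
```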
| Hyper-params | (1) 1D -> 2D finetuning | (2) UniRef50/Uniclust30 MSA finetuning | (3) AFDB MSA & Structure tokens finetuning | | --------------------------- | :---------------------: | :------------------------------------: | :----------------------------------------: | | Initialized parameters | AIDO.Protein-16B | Stage (1) | Stage (2) | | Data | ColabFoldDB, UniRef | HHblits_MSA, Retriever_MSA | AFDB MSA & Structure tokens | | Global Batch Size | 512 | 256 | 256 | | Sequence length | 2048 | 12800 | 12800 | | Per Device Micro Batch Size | 1 | 1 | 1 | | Precision | Mixed FP32-FP16 | Mixed FP32-FP16 | Mixed FP32-FP16 | | LR | [5e-6,5e-5] | [1e-6, 1e-5] | 1e-5 | | Num Tokens | 10 billion | 100 billion | 80 billion | ### Tokenization We encode protein sequences at single-amino-acid resolution with a 44-token vocabulary, where 24 tokens represent amino acid types and 20 are special tokens. Sequences are also suffixed with a special token as a hook for downstream tasks. ## Results ### Supervised Downstream Tasks
\"supervised_tasks\"
### Supervised DMS Fitness Score Prediction (25 Samples)
\"supervised_dms\"
## How to Use ### Build Downstream Models Using ModelGenerator For more information, visit: Model Generator ### Use Directly in Python #### Embedding #### Sequence Level Classification #### Token Level Classification #### Regression # Citation Please cite AIDO.Protein-RAG-16B using the following BibTex code:", + "model_explanation_gemini": "A multimodal protein language model integrating Multiple Sequence Alignment and structural data for protein analysis through transformer-based architecture with sparse Mixture-of-Experts layers." +} \ No newline at end of file diff --git a/data/model_data_json/genbio-ai_AIDO.Protein-RAG-3B.json b/data/model_data_json/genbio-ai_AIDO.Protein-RAG-3B.json new file mode 100644 index 0000000000000000000000000000000000000000..29626482314a00be422788cdf7bc6cef0633e1e8 --- /dev/null +++ b/data/model_data_json/genbio-ai_AIDO.Protein-RAG-3B.json @@ -0,0 +1,14 @@ +{ + "model_id": "genbio-ai/AIDO.Protein-RAG-3B", + "downloads": 134247, + "tags": [ + "pytorch", + "fm4bio", + "arxiv:2412.06993", + "arxiv:2406.05347", + "license:other", + "region:us" + ], + "description": "--- license: other --- # AIDO.Protein-RAG-3B AIDO.Protein-RAG-3B (AIDO.RAGPLM) is a pretrained Retrieval-Augmented protein language model within an AI-driven Digital Organism framework. This model, along with AIDO.RAGFold, integrates pretrained protein language models with retrieved Multiple Sequence Alignments (MSA), enabling the incorporation of co-evolutionary information for structure prediction while compensating for limited MSA data through large-scale pretraining. AIDO.Protein-RAG-3B outperforms single-sequence protein language models in perplexity, contact prediction, and fitness prediction. When used as a feature extractor for structure prediction in AIDO.RAGFold, it achieves TM-scores comparable to AlphaFold2 with sufficient MSA data (8x faster runtime), and significantly surpasses AlphaFold2 in MSA-limited scenarios (∆TM-score=0.379, 0.116, and 0.059 for 0, 5, and 10 input sequences respectively). ## Model Architecture AIDO.Protein-RAG-3B employs a transformer encoder-only architecture with dense MLP layers in each block (Panel **​c**​ below). The model uses single amino acid tokenization and is optimized via masked language modeling (MLM).
\"An
More architecture details are shown below: | Model Arch | Value | | ------------------ | :---: | | Num Attention Head | 40 | | Num Hidden Layer | 36 | | Hidden Size | 2560 | | FFN Hidden Size | 6832 | | Context Length | 12.8K | ## Pre-training ### Data Preparation **UniRef50/Uniclust30 MSA dataset**: We utilized sequences from UniRef50 as queries to search for homologous sequences in UniClust30, subsequently constructing multiple sequence alignments (MSAs). UniRef50 comprises a total of 53.6 million sequences. Using HHblits, we searched all sequences, identifying over 25 homologous sequences for 23.7 million of them. This dataset was directly used as the training set, referred to as . The remaining 29.9 million sequences were input into MSA Retriever, resulting in 7.7 million sequences with more than 25 homologous sequences. This dataset was designated as . During training, RAGPLM randomly sampled from the two datasets with probabilities of 0.75 and 0.25. ### Training Details We fine-tuned a pretrained masked language model with 3 billion parameters (MLM-3B) using MSA data by concatenating the query sequence with homologous sequences. We introduced several modifications to the standard BERT masking strategy: (1) We randomly sampled span positions from a query sequence of length , with span lengths following a geometric distribution (), and capped the maximum length at 10. Our experiments revealed that these settings lead to an average of 15% of the query tokens being masked. (2) To prevent information leakage, when a residue was selected, all residues at the same index across all sequences (the column of the MSA matrix) were also masked. (3) When a column of the MSA was selected for masking, the entire column was replaced with the mask token in 80% of cases, with random amino acids in 10% of cases, and remained unchanged in the remaining 10% of cases. To help the model distinguish which tokens are from the same chain and which tokens have the same residue index, we use 2D rotary position embedding to encode the tokens. | | MLM-3B | AIDO.Protein-RAG-3B | | ---------------- | ------------------ | -------------------------- | | Training data | UniRef+ColabFoldDB | HHblits_MSA, Retriever_MSA | | Initial params | Random | MLM-3B | | Learning rate | 2.5e-4 | 1e-4 | | Training tokens | 1000B | 100B | | Batch size | 2560 | 256 | | Micro batch size | 4 | 1 | | Sample length | 1024 | 12,800 | | Attention | Bi-directional | Bi-directional | ### Tokenization We encode protein sequences at single-amino-acid resolution with a 44-token vocabulary, where 24 tokens represent amino acid types and 20 are special tokens. Sequences are also suffixed with a special token as a hook for downstream tasks. ## Evaluation of AIDO.Protein-RAG-3B AIDO.Protein-RAG-3B surpasses single-sequence protein language models in perplexity, contact prediction, and fitness prediction. Subsequently, we utilized AIDO.Protein-RAG-3B as a feature extractor, integrating it with the folding trunks and Structure Modules to achieve end-to-end structural prediction (AIDO.RAGFold). Our findings indicate that when sufficient MSA is available, our method achieves results comparable to AlphaFold2 and is eight times faster; when MSA is insufficient, our method significantly outperforms AlphaFold2. ## Results ### Unsupervised Contact Prediction
\"unsupervised_contact_prediction\"
### Supervised Downstream Tasks
\"supervised_tasks\"
### AIDO.RAGFold
\"structure_prediction\"
## How to Use ### Build Downstream Models Using ModelGenerator For more information, visit: Model Generator ### Use Directly in Python #### Embedding #### Sequence Level Classification #### Token Level Classification #### Sequence Level Regression # Citation Please cite AIDO.Protein-RAG-3B using the following BibTex code:", + "model_explanation_gemini": "AIDO.Protein-RAG-3B is a retrieval-augmented protein language model that integrates multiple sequence alignments for improved protein structure prediction, especially in MSA-limited scenarios, while outperforming single-sequence models in perplexity, contact prediction, and fitness prediction." +} \ No newline at end of file diff --git a/data/model_data_json/ggml-org_models.json b/data/model_data_json/ggml-org_models.json new file mode 100644 index 0000000000000000000000000000000000000000..8549244e0291a1266ab0b8dccd86818acc36d430 --- /dev/null +++ b/data/model_data_json/ggml-org_models.json @@ -0,0 +1,11 @@ +{ + "model_id": "ggml-org/models", + "downloads": 2058119, + "tags": [ + "gguf", + "endpoints_compatible", + "region:us" + ], + "description": "Various models to be used in the llama.cpp CI workflow. Do not use them in production.", + "model_explanation_gemini": "Various models for testing llama.cpp CI workflows, not intended for production use." +} \ No newline at end of file diff --git a/data/model_data_json/ggml-org_stories15M_MOE.json b/data/model_data_json/ggml-org_stories15M_MOE.json new file mode 100644 index 0000000000000000000000000000000000000000..0a19281796c07de8eb2d0b69e3bbb6724731067d --- /dev/null +++ b/data/model_data_json/ggml-org_stories15M_MOE.json @@ -0,0 +1,18 @@ +{ + "model_id": "ggml-org/stories15M_MOE", + "downloads": 234427, + "tags": [ + "transformers", + "safetensors", + "gguf", + "mixtral", + "text-generation", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- # stories15M_MOE This model is ModelCloud/tinyllama-15M-stories repeated 4 times to make 4 experts. The model is used for testing, not intended to be used in production (unless your product is some kind of bedtime story teller). The router weights are initialized randomly. ## shakespeare LoRA adapter A LoRA adapter trained on the first 100 paragraphs of Shakespeare can be found inside. With input: - Original model generates: - LoRA adapter generates:", + "model_explanation_gemini": "A 15M-parameter mixture-of-experts model for generating stories, primarily for testing, with an optional Shakespeare-themed LoRA adapter for stylistic variation." +} \ No newline at end of file diff --git a/data/model_data_json/ghunkins_prompt-expansion.json b/data/model_data_json/ghunkins_prompt-expansion.json new file mode 100644 index 0000000000000000000000000000000000000000..ceed3df9b1c4c0d791bab83208aae9ed37e9fb9f --- /dev/null +++ b/data/model_data_json/ghunkins_prompt-expansion.json @@ -0,0 +1,17 @@ +{ + "model_id": "ghunkins/prompt-expansion", + "downloads": 376960, + "tags": [ + "transformers", + "pytorch", + "gpt2", + "text-generation", + "license:creativeml-openrail-m", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: creativeml-openrail-m ---", + "model_explanation_gemini": "Generates expanded prompts from short input phrases to enhance creative text generation."
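The prompt-expansion card above ships with no usage notes; given its gpt2/text-generation tags, loading it presumably follows the standard text-generation pipeline pattern (the prompt and generation settings here are illustrative):

```python
# Sketch of standard transformers text-generation usage; prompt and settings are illustrative.
from transformers import pipeline

expander = pipeline("text-generation", model="ghunkins/prompt-expansion")
print(expander("a cozy cabin in the woods", max_new_tokens=40)[0]["generated_text"])
```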
+} \ No newline at end of file diff --git a/data/model_data_json/gigant_romanian-wav2vec2.json b/data/model_data_json/gigant_romanian-wav2vec2.json new file mode 100644 index 0000000000000000000000000000000000000000..283b3570ad79bea64880c0655da55c2246936e34 --- /dev/null +++ b/data/model_data_json/gigant_romanian-wav2vec2.json @@ -0,0 +1,24 @@ +{ + "model_id": "gigant/romanian-wav2vec2", + "downloads": 106446, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "hf-asr-leaderboard", + "robust-speech-event", + "ro", + "dataset:mozilla-foundation/common_voice_8_0", + "dataset:gigant/romanian_speech_synthesis_0_8_1", + "base_model:facebook/wav2vec2-xls-r-300m", + "base_model:finetune:facebook/wav2vec2-xls-r-300m", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ro license: apache-2.0 tags: - automatic-speech-recognition - hf-asr-leaderboard - robust-speech-event datasets: - mozilla-foundation/common_voice_8_0 - gigant/romanian_speech_synthesis_0_8_1 base_model: facebook/wav2vec2-xls-r-300m model-index: - name: wav2vec2-ro-300m_01 results: - task: type: automatic-speech-recognition name: Automatic Speech Recognition dataset: name: Robust Speech Event type: speech-recognition-community-v2/dev_data args: ro metrics: - type: wer value: 46.99 name: Dev WER (without LM) - type: cer value: 16.04 name: Dev CER (without LM) - type: wer value: 38.63 name: Dev WER (with LM) - type: cer value: 14.52 name: Dev CER (with LM) - task: type: automatic-speech-recognition name: Automatic Speech Recognition dataset: name: Common Voice type: mozilla-foundation/common_voice_8_0 args: ro metrics: - type: wer value: 11.73 name: Test WER (without LM) - type: cer value: 2.93 name: Test CER (without LM) - type: wer value: 7.31 name: Test WER (with LM) - type: cer value: 2.17 name: Test CER (with LM) - task: type: automatic-speech-recognition name: Automatic Speech Recognition dataset: name: Robust Speech Event - Test Data type: speech-recognition-community-v2/eval_data args: ro metrics: - type: wer value: 43.23 name: Test WER --- You can test this model online with the **Space for Romanian Speech Recognition** The model ranked **TOP-1** on Romanian Speech Recognition during HuggingFace's Robust Speech Challenge : * **The 🤗 Speech Bench** * **Speech Challenge Leaderboard** # Romanian Wav2Vec2 This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Common Voice 8.0 - Romanian subset dataset, with extra training data from Romanian Speech Synthesis dataset. Without the 5-gram Language Model optimization, it achieves the following results on the evaluation set (Common Voice 8.0, Romanian subset, test split): - Loss: 0.1553 - Wer: 0.1174 - Cer: 0.0294 ## Model description The architecture is based on facebook/wav2vec2-xls-r-300m with a speech recognition CTC head and an added 5-gram language model (using pyctcdecode and kenlm) trained on the Romanian Corpora Parliament dataset. Those libraries are needed in order for the language model-boosted decoder to work. ## Intended uses & limitations The model is made for speech recognition in Romanian from audio clips sampled at **16kHz**. The predicted text is lowercased and does not contain any punctuation. ## How to use Make sure you have installed the correct dependencies for the language model-boosted version to work. 
You can just run this command to install the and libraries : With the framework you can load the model with the following code : Or, if you want to test the model, you can load the automatic speech recognition pipeline from with : ## Example use with the library First, you need to load your data. We will use the Romanian Speech Synthesis dataset in this example. You can listen to the samples with the library : The model is trained to work with audio sampled at 16kHz, so if the sampling rate of the audio in the dataset is different, we will have to resample it. In the example, the audio is sampled at 48kHz. We can see this by checking the sampling rate. The following code resamples the audio using the library : To listen to the resampled sample : Now you can get the model prediction by running ## Training and evaluation data Training data : - Common Voice 8.0 - Romanian subset : train + validation + other splits - Romanian Speech Synthesis : train + test splits Evaluation data : - Common Voice 8.0 - Romanian subset : test split ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.003 - train_batch_size: 16 - eval_batch_size: 8 - seed: 42 - gradient_accumulation_steps: 3 - total_train_batch_size: 48 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 500 - num_epochs: 50.0 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | Wer | Cer | |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:| | 2.9272 | 0.78 | 500 | 0.7603 | 0.7734 | 0.2355 | | 0.6157 | 1.55 | 1000 | 0.4003 | 0.4866 | 0.1247 | | 0.4452 | 2.33 | 1500 | 0.2960 | 0.3689 | 0.0910 | | 0.3631 | 3.11 | 2000 | 0.2580 | 0.3205 | 0.0796 | | 0.3153 | 3.88 | 2500 | 0.2465 | 0.2977 | 0.0747 | | 0.2795 | 4.66 | 3000 | 0.2274 | 0.2789 | 0.0694 | | 0.2615 | 5.43 | 3500 | 0.2277 | 0.2685 | 0.0675 | | 0.2389 | 6.21 | 4000 | 0.2135 | 0.2518 | 0.0627 | | 0.2229 | 6.99 | 4500 | 0.2054 | 0.2449 | 0.0614 | | 0.2067 | 7.76 | 5000 | 0.2096 | 0.2378 | 0.0597 | | 0.1977 | 8.54 | 5500 | 0.2042 | 0.2387 | 0.0600 | | 0.1896 | 9.32 | 6000 | 0.2110 | 0.2383 | 0.0595 | | 0.1801 | 10.09 | 6500 | 0.1909 | 0.2165 | 0.0548 | | 0.174 | 10.87 | 7000 | 0.1883 | 0.2206 | 0.0559 | | 0.1685 | 11.65 | 7500 | 0.1848 | 0.2097 | 0.0528 | | 0.1591 | 12.42 | 8000 | 0.1851 | 0.2039 | 0.0514 | | 0.1537 | 13.2 | 8500 | 0.1881 | 0.2065 | 0.0518 | | 0.1504 | 13.97 | 9000 | 0.1840 | 0.1972 | 0.0499 | | 0.145 | 14.75 | 9500 | 0.1845 | 0.2029 | 0.0517 | | 0.1417 | 15.53 | 10000 | 0.1884 | 0.2003 | 0.0507 | | 0.1364 | 16.3 | 10500 | 0.2010 | 0.2037 | 0.0517 | | 0.1331 | 17.08 | 11000 | 0.1838 | 0.1923 | 0.0483 | | 0.129 | 17.86 | 11500 | 0.1818 | 0.1922 | 0.0489 | | 0.1198 | 18.63 | 12000 | 0.1760 | 0.1861 | 0.0465 | | 0.1203 | 19.41 | 12500 | 0.1686 | 0.1839 | 0.0465 | | 0.1225 | 20.19 | 13000 | 0.1828 | 0.1920 | 0.0479 | | 0.1145 | 20.96 | 13500 | 0.1673 | 0.1784 | 0.0446 | | 0.1053 | 21.74 | 14000 | 0.1802 | 0.1810 | 0.0456 | | 0.1071 | 22.51 | 14500 | 0.1769 | 0.1775 | 0.0444 | | 0.1053 | 23.29 | 15000 | 0.1920 | 0.1783 | 0.0457 | | 0.1024 | 24.07 | 15500 | 0.1904 | 0.1775 | 0.0446 | | 0.0987 | 24.84 | 16000 | 0.1793 | 0.1762 | 0.0446 | | 0.0949 | 25.62 | 16500 | 0.1801 | 0.1766 | 0.0443 | | 0.0942 | 26.4 | 17000 | 0.1731 | 0.1659 | 0.0423 | | 0.0906 | 27.17 | 17500 | 0.1776 | 0.1698 | 0.0424 | | 0.0861 | 27.95 | 18000 | 0.1716 | 0.1600 | 0.0406 | | 0.0851 | 28.73 | 18500 | 0.1662 | 0.1630 |
0.0410 | | 0.0844 | 29.5 | 19000 | 0.1671 | 0.1572 | 0.0393 | | 0.0792 | 30.28 | 19500 | 0.1768 | 0.1599 | 0.0407 | | 0.0798 | 31.06 | 20000 | 0.1732 | 0.1558 | 0.0394 | | 0.0779 | 31.83 | 20500 | 0.1694 | 0.1544 | 0.0388 | | 0.0718 | 32.61 | 21000 | 0.1709 | 0.1578 | 0.0399 | | 0.0732 | 33.38 | 21500 | 0.1697 | 0.1523 | 0.0391 | | 0.0708 | 34.16 | 22000 | 0.1616 | 0.1474 | 0.0375 | | 0.0678 | 34.94 | 22500 | 0.1698 | 0.1474 | 0.0375 | | 0.0642 | 35.71 | 23000 | 0.1681 | 0.1459 | 0.0369 | | 0.0661 | 36.49 | 23500 | 0.1612 | 0.1411 | 0.0357 | | 0.0629 | 37.27 | 24000 | 0.1662 | 0.1414 | 0.0355 | | 0.0587 | 38.04 | 24500 | 0.1659 | 0.1408 | 0.0351 | | 0.0581 | 38.82 | 25000 | 0.1612 | 0.1382 | 0.0352 | | 0.0556 | 39.6 | 25500 | 0.1647 | 0.1376 | 0.0345 | | 0.0543 | 40.37 | 26000 | 0.1658 | 0.1335 | 0.0337 | | 0.052 | 41.15 | 26500 | 0.1716 | 0.1369 | 0.0343 | | 0.0513 | 41.92 | 27000 | 0.1600 | 0.1317 | 0.0330 | | 0.0491 | 42.7 | 27500 | 0.1671 | 0.1311 | 0.0328 | | 0.0463 | 43.48 | 28000 | 0.1613 | 0.1289 | 0.0324 | | 0.0468 | 44.25 | 28500 | 0.1599 | 0.1260 | 0.0315 | | 0.0435 | 45.03 | 29000 | 0.1556 | 0.1232 | 0.0308 | | 0.043 | 45.81 | 29500 | 0.1588 | 0.1240 | 0.0309 | | 0.0421 | 46.58 | 30000 | 0.1567 | 0.1217 | 0.0308 | | 0.04 | 47.36 | 30500 | 0.1533 | 0.1198 | 0.0302 | | 0.0389 | 48.14 | 31000 | 0.1582 | 0.1185 | 0.0297 | | 0.0387 | 48.91 | 31500 | 0.1576 | 0.1187 | 0.0297 | | 0.0376 | 49.69 | 32000 | 0.1560 | 0.1182 | 0.0295 | ### Framework versions - Transformers 4.16.2 - Pytorch 1.10.0+cu111 - Tokenizers 0.11.0 - pyctcdecode 0.3.0 - kenlm", + "model_explanation_gemini": "Fine-tuned for Romanian automatic speech recognition, converting 16kHz audio to lowercase text without punctuation, achieving top performance in Romanian speech recognition tasks." +} \ No newline at end of file diff --git a/data/model_data_json/gilf_french-camembert-postag-model.json b/data/model_data_json/gilf_french-camembert-postag-model.json new file mode 100644 index 0000000000000000000000000000000000000000..97ce877c94b7213c03d9490b4d7d7cbd9eb51645 --- /dev/null +++ b/data/model_data_json/gilf_french-camembert-postag-model.json @@ -0,0 +1,18 @@ +{ + "model_id": "gilf/french-camembert-postag-model", + "downloads": 887228, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "camembert", + "token-classification", + "fr", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: fr widget: - text: \"Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages\" --- ## About The *french-camembert-postag-model* is a part of speech tagging model for French that was trained on the *free-french-treebank* dataset available on github. The base tokenizer and model used for training is *'camembert-base'*. 
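Since the install and loading snippets were stripped from the card above, here is a sketch of the LM-boosted pipeline plus the 48kHz-to-16kHz resampling step it describes; the dataset's 'audio' column name is an assumption:

```python
# Sketch of the usage the card describes; assumes pyctcdecode and kenlm are installed
# (e.g. pip install pyctcdecode kenlm) and that the dataset exposes an "audio" column.
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="gigant/romanian-wav2vec2")

ds = load_dataset("gigant/romanian_speech_synthesis_0_8_1", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # resample 48kHz -> 16kHz

print(asr(ds[0]["audio"]["array"]))  # lowercase transcription, no punctuation
```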
## Supported Tags It uses the following tags: | Tag | Category | Extra Info | |----------|:------------------------------:|------------:| | ADJ | adjectif | | | ADJWH | adjectif | | | ADV | adverbe | | | ADVWH | adverbe | | | CC | conjonction de coordination | | | CLO | pronom | obj | | CLR | pronom | refl | | CLS | pronom | suj | | CS | conjonction de subordination | | | DET | déterminant | | | DETWH | déterminant | | | ET | mot étranger | | | I | interjection | | | NC | nom commun | | | NPP | nom propre | | | P | préposition | | | P+D | préposition + déterminant | | | PONCT | signe de ponctuation | | | PREF | préfixe | | | PRO | autres pronoms | | | PROREL | autres pronoms | rel | | PROWH | autres pronoms | int | | U | ? | | | V | verbe | | | VIMP | verbe imperatif | | | VINF | verbe infinitif | | | VPP | participe passé | | | VPR | participe présent | | | VS | subjonctif | | More information on the tags can be found here: ## Usage The usage of this model follows the common transformers patterns. Here is a short example of its usage: The lines above would display something like this on a Jupyter notebook:", + "model_explanation_gemini": "A French part-of-speech tagging model trained on the free-french-treebank dataset using camembert-base to classify words into specific grammatical categories." +} \ No newline at end of file diff --git a/data/model_data_json/globis-university_deberta-v3-japanese-large.json b/data/model_data_json/globis-university_deberta-v3-japanese-large.json new file mode 100644 index 0000000000000000000000000000000000000000..c1c21dd48c9f29a578afac02a13c8a3ecb23599f --- /dev/null +++ b/data/model_data_json/globis-university_deberta-v3-japanese-large.json @@ -0,0 +1,24 @@ +{ + "model_id": "globis-university/deberta-v3-japanese-large", + "downloads": 446296, + "tags": [ + "transformers", + "pytorch", + "deberta-v2", + "token-classification", + "ja", + "dataset:globis-university/aozorabunko-clean", + "dataset:oscar-corpus/OSCAR-2301", + "dataset:Wikipedia", + "dataset:WikiBooks", + "dataset:CC-100", + "dataset:allenai/c4", + "arxiv:2302.03169", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-sa-4.0 datasets: - globis-university/aozorabunko-clean - oscar-corpus/OSCAR-2301 - Wikipedia - WikiBooks - CC-100 - allenai/c4 language: - ja library_name: transformers --- # What’s this? 日本語リソースで学習した DeBERTa V3 モデルです。 以下のような特徴を持ちます: - 定評のある DeBERTa V3 を用いたモデル - 日本語特化 - 推論時に形態素解析器を用いない - 単語境界をある程度尊重する ( や のような複数語のトークンを生じさせない) --- This is a model based on DeBERTa V3 pre-trained on Japanese resources. The model has the following features: - Based on the well-known DeBERTa V3 model - Specialized for the Japanese language - Does not use a morphological analyzer during inference - Respects word boundaries to some extent (does not produce tokens spanning multiple words like or ) # How to use # Tokenizer 工藤氏によって示された手法で学習しました。 以下のことを意識しています: - 推論時の形態素解析器なし - トークンが単語の境界を跨がない (辞書: ) - Hugging Faceで使いやすい - 大きすぎない語彙数 本家の DeBERTa V3 は大きな語彙数で学習されていることに特徴がありますが、反面埋め込み層のパラメータ数が大きくなりすぎる (microsoft/deberta-v3-base モデルの場合で埋め込み層が全体の 54%) ことから、本モデルでは小さめの語彙数を採用しています。 注意点として、 、 、 の 3 つのモデルのうち、前者二つは unigram アルゴリズムで学習しているが、 モデルのみ BPE アルゴリズムで学習している。 深い理由はなく、 モデルのみ語彙サイズを増やすために独立して学習を行ったが、なぜか unigram アルゴリズムでの学習がうまくいかなかったことが原因である。 原因の探究よりモデルの完成を優先して、 BPE アルゴリズムに切り替えた。 --- The tokenizer is trained using the method introduced by Kudo. 
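The usage snippet was stripped from the card above; applied to the card's own widget sentence, the standard token-classification pipeline pattern would look roughly like this:

```python
# Sketch of the common transformers token-classification pattern the card refers to.
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="gilf/french-camembert-postag-model",
    aggregation_strategy="simple",
)
print(tagger("Face à un choc inédit, les mesures mises en place par le gouvernement "
             "ont permis une protection forte et efficace des ménages"))
# -> a list of dicts with entity_group (e.g. 'DET', 'NC', 'V'), word and score
```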
Key points include: - No morphological analyzer needed during inference - Tokens do not cross word boundaries (dictionary: ) - Easy to use with Hugging Face - Smaller vocabulary size Although the original DeBERTa V3 is characterized by a large vocabulary size, which can result in a significant increase in the number of parameters in the embedding layer (for the microsoft/deberta-v3-base model, the embedding layer accounts for 54% of the total), this model adopts a smaller vocabulary size to address this. Note that, among the three models: xsmall, base, and large, the first two were trained using the unigram algorithm, while only the large model was trained using the BPE algorithm. The reason for this is simple: while the large model was independently trained to increase its vocabulary size, for some reason, training with the unigram algorithm was not successful. Thus, prioritizing the completion of the model over investigating the cause, we switched to the BPE algorithm. # Data | Dataset Name | Notes | File Size (with metadata) | Factor | | ------------- | ----- | ------------------------- | ---------- | | Wikipedia | 2023/07; WikiExtractor | 3.5GB | x2 | | Wikipedia | 2023/07; cl-tohoku's method | 4.8GB | x2 | | WikiBooks | 2023/07; cl-tohoku's method | 43MB | x2 | | Aozora Bunko | 2023/07; globis-university/aozorabunko-clean | 496MB | x4 | | CC-100 | ja | 90GB | x1 | | mC4 | ja; extracted 10%, with Wikipedia-like focus via DSIR | 91GB | x1 | | OSCAR 2023 | ja; extracted 10%, with Wikipedia-like focus via DSIR | 26GB | x1 | # Training parameters - Number of devices: 8 - Batch size: 8 x 8 - Learning rate: 6.4e-5 - Maximum sequence length: 512 - Optimizer: AdamW - Learning rate scheduler: Linear schedule with warmup - Training steps: 2,000,000 - Warmup steps: 100,000 - Precision: Mixed (fp16) - Vocabulary size: 48,000 # Evaluation | Model | #params | JSTS | JNLI | JSQuAD | JCQA | | ----- | ------- | ---- | ---- | ------ | ---- | | ≤ small | | | | | | | izumi-lab/deberta-v2-small-japanese | 17.8M | 0.890/0.846 | 0.880 | - | 0.737 | | globis-university/deberta-v3-japanese-xsmall | 33.7M | **0.916**/**0.880** | **0.913** | **0.869**/**0.938** | **0.821** | | base | | | | | | cl-tohoku/bert-base-japanese-v3 | 111M | 0.919/0.881 | 0.907 | 0.880/0.946 | 0.848 | | nlp-waseda/roberta-base-japanese | 111M | 0.913/0.873 | 0.895 | 0.864/0.927 | 0.840 | | izumi-lab/deberta-v2-base-japanese | 110M | 0.919/0.882 | 0.912 | - | 0.859 | | ku-nlp/deberta-v2-base-japanese | 112M | 0.922/0.886 | 0.922 | **0.899**/**0.951** | - | | ku-nlp/deberta-v3-base-japanese | 160M | **0.927**/0.891 | **0.927** | 0.896/- | - | | globis-university/deberta-v3-japanese-base | 110M | 0.925/**0.895** | 0.921 | 0.890/0.950 | **0.886** | | large | | | | | | | cl-tohoku/bert-large-japanese-v2 | 337M | 0.926/0.893 | **0.929** | 0.893/0.956 | 0.893 | | nlp-waseda/roberta-large-japanese | 337M | **0.930**/**0.896** | 0.924 | 0.884/0.940 | **0.907** | | nlp-waseda/roberta-large-japanese-seq512 | 337M | 0.926/0.892 | 0.926 | **0.918**/**0.963** | 0.891 | | ku-nlp/deberta-v2-large-japanese | 339M | 0.925/0.892 | 0.924 | 0.912/0.959 | - | | **globis-university/deberta-v3-japanese-large** | 352M | 0.928/**0.896** | 0.924 | 0.896/0.956 | 0.900 | ## License CC BY SA 4.0 ## Acknowledgement 計算リソースに ABCI を利用させていただきました。ありがとうございます。 --- We used ABCI for computing resources. 
Thank you.", + "model_explanation_gemini": "A Japanese-specialized DeBERTa V3 model pre-trained on diverse Japanese datasets, designed for NLP tasks without requiring morphological analysis during inference while respecting word boundaries." +} \ No newline at end of file diff --git a/data/model_data_json/gogamza_kobart-summarization.json b/data/model_data_json/gogamza_kobart-summarization.json new file mode 100644 index 0000000000000000000000000000000000000000..f25e727feb91223d0866c4433a631fbc6b7c669a --- /dev/null +++ b/data/model_data_json/gogamza_kobart-summarization.json @@ -0,0 +1,18 @@ +{ + "model_id": "gogamza/kobart-summarization", + "downloads": 111730, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bart", + "text2text-generation", + "ko", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko tags: - bart license: mit --- # Korean News Summarization Model ## Demo ## How to use", + "model_explanation_gemini": "Generates summaries of Korean news articles." +} \ No newline at end of file diff --git a/data/model_data_json/gokaygokay_Florence-2-SD3-Captioner.json b/data/model_data_json/gokaygokay_Florence-2-SD3-Captioner.json new file mode 100644 index 0000000000000000000000000000000000000000..38fc28c163e594dc342b348d0667b7fb2cf76528 --- /dev/null +++ b/data/model_data_json/gokaygokay_Florence-2-SD3-Captioner.json @@ -0,0 +1,21 @@ +{ + "model_id": "gokaygokay/Florence-2-SD3-Captioner", + "downloads": 76570, + "tags": [ + "transformers", + "safetensors", + "florence2", + "text-generation", + "art", + "image-text-to-text", + "custom_code", + "en", + "dataset:google/docci", + "dataset:google/imageinwords", + "dataset:ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - google/docci - google/imageinwords - ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions language: - en library_name: transformers pipeline_tag: image-text-to-text tags: - art ---" +} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-base-cased.json b/data/model_data_json/google-bert_bert-base-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..32db8595ac2a9c85d47e81cd9d60f572d4423979 --- /dev/null +++ b/data/model_data_json/google-bert_bert-base-cased.json @@ -0,0 +1,24 @@ +{ + "model_id": "google-bert/bert-base-cased", + "downloads": 4044111, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "fill-mask", + "exbert", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1810.04805", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: apache-2.0 datasets: - bookcorpus - wikipedia --- # BERT base model (cased) Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. 
This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict if the two sentences were following each other or not. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the BERT model as inputs. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT2. ### How to use You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given text in PyTorch: and in TensorFlow: ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions: This bias will also affect all fine-tuned versions of this model. ## Training data The BERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form: With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by [MASK]. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the 10% remaining cases, the masked tokens are left as is. ### Pretraining The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256.
The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer used is Adam with a learning rate of 1e-4, \\\\(\\beta_{1} = 0.9\\\\) and \\\\(\\beta_{2} = 0.999\\\\), a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate after. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: Glue test results: | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average | |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:| | | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 | ### BibTeX entry and citation info ", + "model_explanation_gemini": "A case-sensitive BERT model pretrained on English text using masked language modeling and next sentence prediction to generate bidirectional language representations for downstream NLP tasks." +} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-base-chinese.json b/data/model_data_json/google-bert_bert-base-chinese.json new file mode 100644 index 0000000000000000000000000000000000000000..ac263965d4915bcf92dfb5222bc44f4620234889 --- /dev/null +++ b/data/model_data_json/google-bert_bert-base-chinese.json @@ -0,0 +1,20 @@ +{ + "model_id": "google-bert/bert-base-chinese", + "downloads": 3229600, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "fill-mask", + "zh", + "arxiv:1810.04805", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: zh --- # Bert-base-chinese ## Table of Contents - Model Details - Uses - Risks, Limitations and Biases - Training - Evaluation - How to Get Started With the Model ## Model Details ### Model Description This model has been pre-trained for Chinese; training and random input masking have been applied independently to word pieces (as in the original BERT paper). - **Developed by:** HuggingFace team - **Model Type:** Fill-Mask - **Language(s):** Chinese - **License:** [More Information needed] - **Parent Model:** See the BERT base uncased model for more information about the BERT base model. ### Model Sources - **Paper:** BERT ## Uses #### Direct Use This model can be used for masked language modeling. ## Risks, Limitations and Biases **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.** Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). ## Training #### Training Procedure * **type_vocab_size:** 2 * **vocab_size:** 21128 * **num_hidden_layers:** 12 #### Training Data [More Information Needed] ## Evaluation #### Results [More Information Needed] ## How to Get Started With the Model", + "model_explanation_gemini": "A Chinese BERT-based model pre-trained for masked language modeling tasks."
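The 'How to Get Started' snippet was stripped from the bert-base-chinese card above; the standard fill-mask pipeline pattern (with an illustrative example sentence) would be:

```python
# Sketch of standard transformers fill-mask usage; the example sentence is illustrative.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="google-bert/bert-base-chinese")
print(unmasker("北京是[MASK]国的首都。"))  # top candidates for the masked character
```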
+} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-base-german-cased.json b/data/model_data_json/google-bert_bert-base-german-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..245cd3de7707c85b3b981b57f68ec59ec994d6dc --- /dev/null +++ b/data/model_data_json/google-bert_bert-base-german-cased.json @@ -0,0 +1,22 @@ +{ + "model_id": "google-bert/bert-base-german-cased", + "downloads": 728983, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "bert", + "fill-mask", + "exbert", + "de", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: de license: mit thumbnail: tags: - exbert --- # German BERT ## Overview **Language model:** bert-base-cased **Language:** German **Training data:** Wiki, OpenLegalData, News (~ 12GB) **Eval data:** Conll03 (NER), GermEval14 (NER), GermEval18 (Classification), GNAD (Classification) **Infrastructure**: 1x TPU v2 **Published**: Jun 14th, 2019 **Update April 3rd, 2020**: we updated the vocabulary file on deepset's s3 to conform with the default tokenization of punctuation tokens. For details see the related FARM issue. If you want to use the old vocab we have also uploaded a \"deepset/bert-base-german-cased-oldvocab\" model. ## Details - We trained using Google's Tensorflow code on a single cloud TPU v2 with standard settings. - We trained 810k steps with a batch size of 1024 for sequence length 128 and 30k steps with sequence length 512. Training took about 9 days. - As training data we used the latest German Wikipedia dump (6GB of raw txt files), the OpenLegalData dump (2.4 GB) and news articles (3.6 GB). - We cleaned the data dumps with tailored scripts and segmented sentences with spacy v2.1. To create tensorflow records we used the recommended sentencepiece library for creating the word piece vocabulary and tensorflow scripts to convert the text to data usable by BERT. See for more details ## Hyperparameters ## Performance During training we monitored the loss and evaluated different model checkpoints on the following German datasets: - germEval18Fine: Macro f1 score for multiclass sentiment classification - germEval18coarse: Macro f1 score for binary sentiment classification - germEval14: Seq f1 score for NER (file names deuutf.\\*) - CONLL03: Seq f1 score for NER - 10kGNAD: Accuracy for document classification Even without thorough hyperparameter tuning, we observed quite stable learning especially for our German model. Multiple restarts with different seeds produced quite similar results. We further evaluated different points during the 9 days of pre-training and were astonished how fast the model converges to the maximally reachable performance. We ran all 5 downstream tasks on 7 different model checkpoints - taken at 0 up to 840k training steps. Most checkpoints are taken from early training where we expected most performance changes. Surprisingly, even a randomly initialized BERT can be trained only on labeled downstream datasets and reach good performance (blue line, GermEval 2018 Coarse task, 795 kB trainset size). ## Authors - Branden Chan: - Timo Möller: - Malte Pietsch: - Tanay Soni: ## About us We bring NLP to the industry via open source! Our focus: Industry specific language models & large scale QA systems.
Some of our work: - German BERT (aka \"bert-base-german-cased\") - FARM - Haystack Get in touch: Twitter | LinkedIn | Website", + "model_explanation_gemini": "A German BERT model trained on diverse German text data for tasks like named entity recognition and text classification." +} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-base-multilingual-cased.json b/data/model_data_json/google-bert_bert-base-multilingual-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..f9fef3dd7b5fee82b5a6ee0d69d4377e16e1e282 --- /dev/null +++ b/data/model_data_json/google-bert_bert-base-multilingual-cased.json @@ -0,0 +1,124 @@ +{ + "model_id": "google-bert/bert-base-multilingual-cased", + "downloads": 6477023, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "fill-mask", + "multilingual", + "af", + "sq", + "ar", + "an", + "hy", + "ast", + "az", + "ba", + "eu", + "bar", + "be", + "bn", + "inc", + "bs", + "br", + "bg", + "my", + "ca", + "ceb", + "ce", + "zh", + "cv", + "hr", + "cs", + "da", + "nl", + "en", + "et", + "fi", + "fr", + "gl", + "ka", + "de", + "el", + "gu", + "ht", + "he", + "hi", + "hu", + "is", + "io", + "id", + "ga", + "it", + "ja", + "jv", + "kn", + "kk", + "ky", + "ko", + "la", + "lv", + "lt", + "roa", + "nds", + "lm", + "mk", + "mg", + "ms", + "ml", + "mr", + "mn", + "min", + "ne", + "new", + "nb", + "nn", + "oc", + "fa", + "pms", + "pl", + "pt", + "pa", + "ro", + "ru", + "sco", + "sr", + "scn", + "sk", + "sl", + "aze", + "es", + "su", + "sw", + "sv", + "tl", + "tg", + "th", + "ta", + "tt", + "te", + "tr", + "uk", + "ud", + "uz", + "vi", + "vo", + "war", + "cy", + "fry", + "pnb", + "yo", + "dataset:wikipedia", + "arxiv:1810.04805", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - sq - ar - an - hy - ast - az - ba - eu - bar - be - bn - inc - bs - br - bg - my - ca - ceb - ce - zh - cv - hr - cs - da - nl - en - et - fi - fr - gl - ka - de - el - gu - ht - he - hi - hu - is - io - id - ga - it - ja - jv - kn - kk - ky - ko - la - lv - lt - roa - nds - lm - mk - mg - ms - ml - mr - mn - min - ne - new - nb - nn - oc - fa - pms - pl - pt - pa - ro - ru - sco - sr - hr - scn - sk - sl - aze - es - su - sw - sv - tl - tg - th - ta - tt - te - tr - uk - ud - uz - vi - vo - war - cy - fry - pnb - yo license: apache-2.0 datasets: - wikipedia --- # BERT multilingual base model (cased) Pretrained model on the top 104 languages with the largest Wikipedia using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case sensitive: it makes a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BERT is a transformers model pretrained on a large corpus of multilingual data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the model and has to predict the masked words. 
This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict if the two sentences were following each other or not. This way, the model learns an inner representation of the languages in the training set that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the BERT model as inputs. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT2. ### How to use You can use this model directly with a pipeline for masked language modeling: Here is how to use this model to get the features of a given text in PyTorch: and in TensorFlow: ## Training data The BERT model was pretrained on the 104 languages with the largest Wikipedias. You can find the complete list here. ## Training procedure ### Preprocessing The texts are tokenized using WordPiece and a shared vocabulary size of 110,000. The languages with a larger Wikipedia are under-sampled and the ones with lower resources are oversampled. For languages like Chinese, Japanese Kanji and Korean Hanja that don't have spaces, a CJK Unicode block is added around every character. The inputs of the model are then of the form: With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by [MASK]. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the 10% remaining cases, the masked tokens are left as is. ### BibTeX entry and citation info", + "model_explanation_gemini": "A multilingual BERT model pretrained on 104 languages using masked language modeling and next sentence prediction, primarily intended for fine-tuning on downstream NLP tasks like sequence classification or question answering."
+} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-base-multilingual-uncased.json b/data/model_data_json/google-bert_bert-base-multilingual-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..bfaec4b594087dc15ecbca3d5e4f4b82d1ebca26 --- /dev/null +++ b/data/model_data_json/google-bert_bert-base-multilingual-uncased.json @@ -0,0 +1,122 @@ +{ + "model_id": "google-bert/bert-base-multilingual-uncased", + "downloads": 2506544, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "fill-mask", + "multilingual", + "af", + "sq", + "ar", + "an", + "hy", + "ast", + "az", + "ba", + "eu", + "bar", + "be", + "bn", + "inc", + "bs", + "br", + "bg", + "my", + "ca", + "ceb", + "ce", + "zh", + "cv", + "hr", + "cs", + "da", + "nl", + "en", + "et", + "fi", + "fr", + "gl", + "ka", + "de", + "el", + "gu", + "ht", + "he", + "hi", + "hu", + "is", + "io", + "id", + "ga", + "it", + "ja", + "jv", + "kn", + "kk", + "ky", + "ko", + "la", + "lv", + "lt", + "roa", + "nds", + "lm", + "mk", + "mg", + "ms", + "ml", + "mr", + "min", + "ne", + "new", + "nb", + "nn", + "oc", + "fa", + "pms", + "pl", + "pt", + "pa", + "ro", + "ru", + "sco", + "sr", + "scn", + "sk", + "sl", + "aze", + "es", + "su", + "sw", + "sv", + "tl", + "tg", + "ta", + "tt", + "te", + "tr", + "uk", + "ud", + "uz", + "vi", + "vo", + "war", + "cy", + "fry", + "pnb", + "yo", + "dataset:wikipedia", + "arxiv:1810.04805", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - sq - ar - an - hy - ast - az - ba - eu - bar - be - bn - inc - bs - br - bg - my - ca - ceb - ce - zh - cv - hr - cs - da - nl - en - et - fi - fr - gl - ka - de - el - gu - ht - he - hi - hu - is - io - id - ga - it - ja - jv - kn - kk - ky - ko - la - lv - lt - roa - nds - lm - mk - mg - ms - ml - mr - min - ne - new - nb - nn - oc - fa - pms - pl - pt - pa - ro - ru - sco - sr - hr - scn - sk - sl - aze - es - su - sw - sv - tl - tg - ta - tt - te - tr - uk - ud - uz - vi - vo - war - cy - fry - pnb - yo license: apache-2.0 datasets: - wikipedia --- # BERT multilingual base model (uncased) Pretrained model on the top 102 languages with the largest Wikipedia using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is uncased: it does not make a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BERT is a transformers model pretrained on a large corpus of multilingual data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. 
- Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict whether the two sentences followed each other or not. This way, the model learns an inner representation of the languages in the training set that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the BERT model as inputs. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT-2. ### How to use You can use this model directly with a pipeline for masked language modeling, or load it to extract the features of a given text in PyTorch or TensorFlow; a sketch follows below. ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. This bias will also affect all fine-tuned versions of this model. ## Training data The BERT model was pretrained on the 102 languages with the largest Wikipedias. You can find the complete list here. ## Training procedure ### Preprocessing The texts are lowercased and tokenized using WordPiece and a shared vocabulary size of 110,000. The languages with a larger Wikipedia are under-sampled and the ones with lower resources are oversampled. For languages like Chinese, Japanese Kanji and Korean Hanja that are written without spaces, a CJK Unicode block is added around every character. The inputs of the model are then of the form [CLS] Sentence A [SEP] Sentence B [SEP]. With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus, and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by [MASK]. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the remaining 10% of cases, the masked tokens are left as is. ### BibTeX entry and citation info", + "model_explanation_gemini": "Pretrained on 102 languages' Wikipedia texts for masked language modeling and next sentence prediction, enabling multilingual natural language processing tasks through fine-tuning."
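A minimal PyTorch sketch of the feature-extraction usage described above, assuming the standard transformers API (the input text is illustrative):

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-multilingual-uncased")
model = BertModel.from_pretrained("google-bert/bert-base-multilingual-uncased")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")  # input_ids, token_type_ids, attention_mask
output = model(**encoded_input)                       # output.last_hidden_state holds the features
```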
+} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-base-uncased.json b/data/model_data_json/google-bert_bert-base-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..cf8fc431b03978146934d19127bb5de84ed3e4c8 --- /dev/null +++ b/data/model_data_json/google-bert_bert-base-uncased.json @@ -0,0 +1,27 @@ +{ + "model_id": "google-bert/bert-base-uncased", + "downloads": 61442118, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "coreml", + "onnx", + "safetensors", + "bert", + "fill-mask", + "exbert", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1810.04805", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: apache-2.0 datasets: - bookcorpus - wikipedia --- # BERT base model (uncased) Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is uncased: it does not make a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict whether the two sentences followed each other or not. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the BERT model as inputs. ## Model variations BERT was originally released in base and large variations, for cased and uncased input text. The uncased models also strip out accent markers. Chinese and multilingual uncased and cased versions followed shortly after. Modified preprocessing with whole word masking replaced subpiece masking in a following work, with the release of two models. Another 24 smaller models were released afterward. The detailed release history can be found on the google-research/bert readme on github.
| Model | #params | Language | |------------------------|--------------------------------|-------| | bert-base-uncased | 110M | English | | bert-large-uncased | 340M | English | | bert-base-cased | 110M | English | | bert-large-cased | 340M | English | | bert-base-chinese | 110M | Chinese | | bert-base-multilingual-cased | 110M | Multiple | | bert-large-uncased-whole-word-masking | 340M | English | | bert-large-cased-whole-word-masking | 340M | English | ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT-2. ### How to use You can use this model directly with a pipeline for masked language modeling, or load it to extract the features of a given text in PyTorch or TensorFlow; a sketch follows below. ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. This bias will also affect all fine-tuned versions of this model. ## Training data The BERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form [CLS] Sentence A [SEP] Sentence B [SEP]. With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus, and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by [MASK]. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the remaining 10% of cases, the masked tokens are left as is. ### Pretraining The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer used is Adam with a learning rate of 1e-4, \\\\(\\beta_{1} = 0.9\\\\) and \\\\(\\beta_{2} = 0.999\\\\), a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate after. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: GLUE test results: | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average | |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:| | | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 | ### BibTeX entry and citation info ", + "model_explanation_gemini": "Pretrained on English text using masked language modeling and next sentence prediction to learn bidirectional representations for downstream NLP tasks."
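A minimal sketch of the TensorFlow variant of the feature-extraction usage mentioned above (the input text is illustrative):

```python
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = TFBertModel.from_pretrained("google-bert/bert-base-uncased")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)  # output.last_hidden_state: (batch, seq_len, 768)
```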
+} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-large-cased.json b/data/model_data_json/google-bert_bert-large-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..076fcadc6c5ee2e382da5456f493af66f599548c --- /dev/null +++ b/data/model_data_json/google-bert_bert-large-cased.json @@ -0,0 +1,23 @@ +{ + "model_id": "google-bert/bert-large-cased", + "downloads": 128457, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "fill-mask", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1810.04805", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 datasets: - bookcorpus - wikipedia --- # BERT large model (cased) Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is cased: it makes a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Next sentence prediction (NSP): the models concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict if the two sentences were following each other or not. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the BERT model as inputs. This model has the following configuration: - 24-layer - 1024 hidden dimension - 16 attention heads - 336M parameters. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at model like GPT2. 
### How to use You can use this model directly with a pipeline for masked language modeling, or load it to extract the features of a given text in PyTorch or TensorFlow; a sketch follows below. ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. This bias will also affect all fine-tuned versions of this model. ## Training data The BERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form [CLS] Sentence A [SEP] Sentence B [SEP]. With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus, and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by [MASK]. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the remaining 10% of cases, the masked tokens are left as is. ### Pretraining The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer used is Adam with a learning rate of 1e-4, \\\\(\\beta_{1} = 0.9\\\\) and \\\\(\\beta_{2} = 0.999\\\\), a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate after. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: Model | SQuAD 1.1 F1/EM | Multi NLI Accuracy ---------------------------------------- | :-------------: | :----------------: BERT-Large, Cased (Original) | 91.5/84.8 | 86.09 ### BibTeX entry and citation info", + "model_explanation_gemini": "BERT large model (cased) is a pretrained English language model using masked language modeling and next sentence prediction for bidirectional text understanding, primarily intended for fine-tuning on downstream NLP tasks like sequence classification or question answering."
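A minimal fill-mask sketch for this cased checkpoint (note the prompt keeps its original casing, since the model is case-sensitive; the sentence is illustrative):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="google-bert/bert-large-cased")
# top_k limits how many candidate completions are returned.
unmasker("Hello I'm a [MASK] model.", top_k=3)
```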
+} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-large-uncased-whole-word-masking-finetuned-squad.json b/data/model_data_json/google-bert_bert-large-uncased-whole-word-masking-finetuned-squad.json new file mode 100644 index 0000000000000000000000000000000000000000..dfe1df8b3f2ce98ea1adf9f4f30db0c169d3cf0b --- /dev/null +++ b/data/model_data_json/google-bert_bert-large-uncased-whole-word-masking-finetuned-squad.json @@ -0,0 +1,22 @@ +{ + "model_id": "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad", + "downloads": 379359, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "question-answering", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1810.04805", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 datasets: - bookcorpus - wikipedia --- # BERT large model (uncased) whole word masking finetuned on SQuAD Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is uncased: it does not make a difference between english and English. Unlike other BERT models, this model was trained with a new technique: Whole Word Masking. In this case, all of the tokens corresponding to a word are masked at once. The overall masking rate remains the same. The training is identical -- each masked WordPiece token is predicted independently. After pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. See below for more information regarding this fine-tuning. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict whether the two sentences followed each other or not. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the BERT model as inputs. This model has the following configuration: - 24-layer - 1024 hidden dimension - 16 attention heads - 336M parameters. ## Intended uses & limitations This model should be used as a question-answering model.
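For instance, a minimal sketch of such a question-answering pipeline (the question and context strings are illustrative, not from the original card):

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="google-bert/bert-large-uncased-whole-word-masking-finetuned-squad",
)

result = qa(
    question="What does BERT stand for?",
    context="BERT stands for Bidirectional Encoder Representations from Transformers.",
)
print(result["answer"], result["score"])  # extracted answer span plus a confidence score
```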
You may use it in a question answering pipeline, or use it to output raw results given a query and a context. You may see other use cases in the task summary of the transformers documentation. ## Training data The BERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form [CLS] Sentence A [SEP] Sentence B [SEP]. With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus, and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by [MASK]. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the remaining 10% of cases, the masked tokens are left as is. ### Pretraining The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer used is Adam with a learning rate of 1e-4, \\\\(\\beta_{1} = 0.9\\\\) and \\\\(\\beta_{2} = 0.999\\\\), a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate after. ### Fine-tuning After pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. In order to reproduce the training, you may use the command provided with those scripts. ## Evaluation results The results obtained are reported in the original model card. ### BibTeX entry and citation info", + "model_explanation_gemini": "A large uncased BERT model fine-tuned for question answering using whole word masking and trained on the SQuAD dataset." +} \ No newline at end of file diff --git a/data/model_data_json/google-bert_bert-large-uncased.json b/data/model_data_json/google-bert_bert-large-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..90694dce621b19828052270613c35b28bf5f4131 --- /dev/null +++ b/data/model_data_json/google-bert_bert-large-uncased.json @@ -0,0 +1,24 @@ +{ + "model_id": "google-bert/bert-large-uncased", + "downloads": 953320, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "bert", + "fill-mask", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1810.04805", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 datasets: - bookcorpus - wikipedia --- # BERT large model (uncased) Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is uncased: it does not make a difference between english and English. Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion.
This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict whether the two sentences followed each other or not. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the BERT model as inputs. This model has the following configuration: - 24-layer - 1024 hidden dimension - 16 attention heads - 336M parameters. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT-2. ### How to use You can use this model directly with a pipeline for masked language modeling, or load it to extract the features of a given text in PyTorch or TensorFlow. ### Limitations and bias Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. This bias will also affect all fine-tuned versions of this model. ## Training data The BERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables and headers). ## Training procedure ### Preprocessing The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form [CLS] Sentence A [SEP] Sentence B [SEP]. With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus, and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by [MASK]. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the remaining 10% of cases, the masked tokens are left as is.
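Since the same 80/10/10 masking recipe appears in every BERT card above, here is a toy sketch of that corruption step, assuming a pre-tokenized input (an illustration of the described procedure, not the production WordPiece pipeline):

```python
import random

def corrupt_for_mlm(tokens, vocab, select_prob=0.15):
    """Illustrative BERT-style MLM corruption: select ~15% of tokens;
    of those, 80% -> [MASK], 10% -> a random token, 10% left unchanged."""
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < select_prob:
            targets.append(tok)            # the model must recover the original token
            r = random.random()
            if r < 0.8:
                corrupted.append("[MASK]")
            elif r < 0.9:
                corrupted.append(random.choice(vocab))
            else:
                corrupted.append(tok)      # kept as-is, but still predicted
        else:
            targets.append(None)           # excluded from the MLM loss
            corrupted.append(tok)
    return corrupted, targets
```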
### Pretraining The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer used is Adam with a learning rate of 1e-4, \\\\(\\beta_{1} = 0.9\\\\) and \\\\(\\beta_{2} = 0.999\\\\), a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate after. ## Evaluation results When fine-tuned on downstream tasks, this model achieves the following results: Model | SQUAD 1.1 F1/EM | Multi NLI Accuracy ---------------------------------------- | :-------------: | :----------------: BERT-Large, Uncased (Original) | 91.0/84.3 | 86.05 ### BibTeX entry and citation info", + "model_explanation_gemini": "Pretrained on English text using masked language modeling and next sentence prediction to learn bidirectional representations for downstream NLP tasks." +} \ No newline at end of file diff --git a/data/model_data_json/google-t5_t5-11b.json b/data/model_data_json/google-t5_t5-11b.json new file mode 100644 index 0000000000000000000000000000000000000000..c749d2e6dd19f4fdd06e05331236b1718ea9f46f --- /dev/null +++ b/data/model_data_json/google-t5_t5-11b.json @@ -0,0 +1,33 @@ +{ + "model_id": "google-t5/t5-11b", + "downloads": 175289, + "tags": [ + "transformers", + "pytorch", + "tf", + "t5", + "text2text-generation", + "summarization", + "translation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:c4", + "arxiv:1805.12471", + "arxiv:1708.00055", + "arxiv:1704.05426", + "arxiv:1606.05250", + "arxiv:1808.09121", + "arxiv:1810.12885", + "arxiv:1905.10044", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual license: apache-2.0 tags: - summarization - translation datasets: - c4 inference: false --- # Model Card for T5 11B !model image # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. Model Card Authors 9. How To Get Started With the Model # Model Details ## Model Description The developers of the Text-To-Text Transfer Transformer (T5) write: > With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task. T5-11B is the checkpoint with 11 billion parameters. - **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See associated paper and GitHub repo - **Model type:** Language model - **Language(s) (NLP):** English, French, Romanian, German - **License:** Apache 2.0 - **Related Models:** All T5 Checkpoints - **Resources for more information:** - Research paper - Google's T5 Blog Post - GitHub Repo - Hugging Face T5 Docs # Uses ## Direct Use and Downstream Use The developers write in a blog post that the model: > Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). 
We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself. See the blog post and research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations More information needed. ## Recommendations More information needed. # Training Details ## Training Data The model is pre-trained on the Colossal Clean Crawled Corpus (C4), which was developed and released in the context of the same research paper as T5. The model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. The following datasets were used for (1.) and (2.): 1. **Datasets used for Unsupervised denoising objective**: - C4 - Wiki-DPR 2. **Datasets used for Supervised text-to-text language modeling objective** - Sentence acceptability judgment - CoLA Warstadt et al., 2018 - Sentiment analysis - SST-2 Socher et al., 2013 - Paraphrasing/sentence similarity - MRPC Dolan and Brockett, 2005 - STS-B Cer et al., 2017 - QQP Iyer et al., 2017 - Natural language inference - MNLI Williams et al., 2017 - QNLI Rajpurkar et al., 2016 - RTE Dagan et al., 2005 - CB De Marneffe et al., 2019 - Sentence completion - COPA Roemmele et al., 2011 - Word sense disambiguation - WIC Pilehvar and Camacho-Collados, 2018 - Question answering - MultiRC Khashabi et al., 2018 - ReCoRD Zhang et al., 2018 - BoolQ Clark et al., 2019 ## Training Procedure In their abstract, the model developers write: > In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the research paper for further details. # Evaluation ## Testing Data, Factors & Metrics The developers evaluated the model on 24 tasks; see the research paper for full details. ## Results For full results for T5-11B, see the research paper, Table 14. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:** **APA:** - Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model ## Disclaimer **Before v3.5.0**, due to its immense size, this model required some special treatment. If you're using an older transformers version, it should be loaded with a dedicated flag, as sketched below. Secondly, a single GPU will most likely not have enough memory to even load the model into memory as the weights alone amount to over 40 GB. - Model parallelism has to be used here to overcome this problem, as is explained in this PR. - DeepSpeed's ZeRO-Offload is another approach, as explained in this post.
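A reconstruction of the loading snippet this disclaimer refers to, assuming the legacy use_cdn flag from transformers 3.x (treat the exact flag as an assumption; recent versions load the checkpoint without it):

```python
import transformers

# Only needed on transformers <= v3.5.0; newer releases resolve the
# 11B checkpoint without this flag.
t5 = transformers.T5ForConditionalGeneration.from_pretrained("t5-11b", use_cdn=False)
```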
See the Hugging Face T5 docs and a Colab Notebook created by the model developers for more context.", + "model_explanation_gemini": "T5-11B is a multilingual text-to-text transformer model with 11 billion parameters, designed for various NLP tasks like summarization, translation, question answering, and classification by converting inputs and outputs into unified text strings." +} \ No newline at end of file diff --git a/data/model_data_json/google-t5_t5-3b.json b/data/model_data_json/google-t5_t5-3b.json new file mode 100644 index 0000000000000000000000000000000000000000..2ff093ba6e0fe6071794c7e1c87ff9a97d25aa09 --- /dev/null +++ b/data/model_data_json/google-t5_t5-3b.json @@ -0,0 +1,35 @@ +{ + "model_id": "google-t5/t5-3b", + "downloads": 329332, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "t5", + "text2text-generation", + "summarization", + "translation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:c4", + "arxiv:1805.12471", + "arxiv:1708.00055", + "arxiv:1704.05426", + "arxiv:1606.05250", + "arxiv:1808.09121", + "arxiv:1810.12885", + "arxiv:1905.10044", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual license: apache-2.0 tags: - summarization - translation datasets: - c4 --- # Model Card for T5-3B !model image # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. Model Card Authors 9. How To Get Started With the Model # Model Details ## Model Description The developers of the Text-To-Text Transfer Transformer (T5) write: > With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task. T5-3B is the checkpoint with 3 billion parameters. - **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See associated paper and GitHub repo - **Model type:** Language model - **Language(s) (NLP):** English, French, Romanian, German - **License:** Apache 2.0 - **Related Models:** All T5 Checkpoints - **Resources for more information:** - Research paper - Google's T5 Blog Post - GitHub Repo - Hugging Face T5 Docs # Uses ## Direct Use and Downstream Use The developers write in a blog post that the model: > Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself. See the blog post and research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations More information needed. ## Recommendations More information needed. # Training Details ## Training Data The model is pre-trained on the Colossal Clean Crawled Corpus (C4), which was developed and released in the context of the same research paper as T5. The model was pre-trained on a on a **multi-task mixture of unsupervised (1.) 
and supervised tasks (2.)**. The following datasets were used for (1.) and (2.): 1. **Datasets used for Unsupervised denoising objective**: - C4 - Wiki-DPR 2. **Datasets used for Supervised text-to-text language modeling objective** - Sentence acceptability judgment - CoLA Warstadt et al., 2018 - Sentiment analysis - SST-2 Socher et al., 2013 - Paraphrasing/sentence similarity - MRPC Dolan and Brockett, 2005 - STS-B Cer et al., 2017 - QQP Iyer et al., 2017 - Natural language inference - MNLI Williams et al., 2017 - QNLI Rajpurkar et al., 2016 - RTE Dagan et al., 2005 - CB De Marneffe et al., 2019 - Sentence completion - COPA Roemmele et al., 2011 - Word sense disambiguation - WIC Pilehvar and Camacho-Collados, 2018 - Question answering - MultiRC Khashabi et al., 2018 - ReCoRD Zhang et al., 2018 - BoolQ Clark et al., 2019 ## Training Procedure In their abstract, the model developers write: > In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the research paper for further details. # Evaluation ## Testing Data, Factors & Metrics The developers evaluated the model on 24 tasks; see the research paper for full details. ## Results For full results for T5-3B, see the research paper, Table 14. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:** **APA:** - Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model See the Hugging Face T5 docs and a Colab Notebook created by the model developers for more context on how to get started with this checkpoint.", + "model_explanation_gemini": "A 3-billion-parameter text-to-text transformer model designed for multiple NLP tasks including summarization, translation, question answering, and classification across English, French, Romanian, and German languages."
+} \ No newline at end of file diff --git a/data/model_data_json/google-t5_t5-base.json b/data/model_data_json/google-t5_t5-base.json new file mode 100644 index 0000000000000000000000000000000000000000..ee01b49fd3e90ebfcb1d106e1fe9e6f5128b06de --- /dev/null +++ b/data/model_data_json/google-t5_t5-base.json @@ -0,0 +1,36 @@ +{ + "model_id": "google-t5/t5-base", + "downloads": 4508955, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "safetensors", + "t5", + "text2text-generation", + "summarization", + "translation", + "en", + "fr", + "ro", + "de", + "dataset:c4", + "arxiv:1805.12471", + "arxiv:1708.00055", + "arxiv:1704.05426", + "arxiv:1606.05250", + "arxiv:1808.09121", + "arxiv:1810.12885", + "arxiv:1905.10044", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: translation language: - en - fr - ro - de datasets: - c4 tags: - summarization - translation license: apache-2.0 --- # Model Card for T5 Base !model image # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. Model Card Authors 9. How To Get Started With the Model # Model Details ## Model Description The developers of the Text-To-Text Transfer Transformer (T5) write: > With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task. T5-Base is the checkpoint with 220 million parameters. - **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See associated paper and GitHub repo - **Model type:** Language model - **Language(s) (NLP):** English, French, Romanian, German - **License:** Apache 2.0 - **Related Models:** All T5 Checkpoints - **Resources for more information:** - Research paper - Google's T5 Blog Post - GitHub Repo - Hugging Face T5 Docs # Uses ## Direct Use and Downstream Use The developers write in a blog post that the model: > Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself. See the blog post and research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations More information needed. ## Recommendations More information needed. # Training Details ## Training Data The model is pre-trained on the Colossal Clean Crawled Corpus (C4), which was developed and released in the context of the same research paper as T5. The model was pre-trained on a on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. Thereby, the following datasets were being used for (1.) and (2.): 1. **Datasets used for Unsupervised denoising objective**: - C4 - Wiki-DPR 2. 
**Datasets used for Supervised text-to-text language modeling objective** - Sentence acceptability judgment - CoLA Warstadt et al., 2018 - Sentiment analysis - SST-2 Socher et al., 2013 - Paraphrasing/sentence similarity - MRPC Dolan and Brockett, 2005 - STS-B Cer et al., 2017 - QQP Iyer et al., 2017 - Natural language inference - MNLI Williams et al., 2017 - QNLI Rajpurkar et al., 2016 - RTE Dagan et al., 2005 - CB De Marneffe et al., 2019 - Sentence completion - COPA Roemmele et al., 2011 - Word sense disambiguation - WIC Pilehvar and Camacho-Collados, 2018 - Question answering - MultiRC Khashabi et al., 2018 - ReCoRD Zhang et al., 2018 - BoolQ Clark et al., 2019 ## Training Procedure In their abstract, the model developers write: > In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the research paper for further details. # Evaluation ## Testing Data, Factors & Metrics The developers evaluated the model on 24 tasks; see the research paper for full details. ## Results For full results for T5-Base, see the research paper, Table 14. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:** **APA:** - Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model Use the code below to get started with the model.
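A minimal sketch of the kind of snippet the card points to here, using the encoder-decoder interface (the input strings are illustrative):

```python
from transformers import T5Tokenizer, T5Model

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5Model.from_pretrained("t5-base")

input_ids = tokenizer(
    "Studies have been shown that owning a dog is good for you", return_tensors="pt"
).input_ids  # encoder input
decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids

# The forward pass returns the decoder's hidden states as features.
outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
last_hidden_states = outputs.last_hidden_state
```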
See the Hugging Face T5 docs and a Colab Notebook created by the model developers for more examples.
", + "model_explanation_gemini": "T5-Base is a 220-million-parameter text-to-text transformer model designed for multiple NLP tasks like translation, summarization, question answering, and classification by converting inputs and outputs into unified text strings." +} \ No newline at end of file diff --git a/data/model_data_json/google-t5_t5-large.json b/data/model_data_json/google-t5_t5-large.json new file mode 100644 index 0000000000000000000000000000000000000000..291ffa2a17003a9878cc9c37ffa3f0094c5d8b65 --- /dev/null +++ b/data/model_data_json/google-t5_t5-large.json @@ -0,0 +1,36 @@ +{ + "model_id": "google-t5/t5-large", + "downloads": 407773, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "t5", + "text2text-generation", + "summarization", + "translation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:c4", + "arxiv:1805.12471", + "arxiv:1708.00055", + "arxiv:1704.05426", + "arxiv:1606.05250", + "arxiv:1808.09121", + "arxiv:1810.12885", + "arxiv:1905.10044", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual license: apache-2.0 tags: - summarization - translation datasets: - c4 --- # Model Card for T5 Large !model image # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. Model Card Authors 9. How To Get Started With the Model # Model Details ## Model Description The developers of the Text-To-Text Transfer Transformer (T5) write: > With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task. T5-Large is the checkpoint with 770 million parameters. - **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See associated paper and GitHub repo - **Model type:** Language model - **Language(s) (NLP):** English, French, Romanian, German - **License:** Apache 2.0 - **Related Models:** All T5 Checkpoints - **Resources for more information:** - Research paper - Google's T5 Blog Post - GitHub Repo - Hugging Face T5 Docs # Uses ## Direct Use and Downstream Use The developers write in a blog post that the model: > Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself. See the blog post and research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations More information needed. ## Recommendations More information needed. # Training Details ## Training Data The model is pre-trained on the Colossal Clean Crawled Corpus (C4), which was developed and released in the context of the same research paper as T5. The model was pre-trained on a on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. Thereby, the following datasets were being used for (1.) and (2.): 1. 
**Datasets used for Unsupervised denoising objective**: - C4 - Wiki-DPR 2. **Datasets used for Supervised text-to-text language modeling objective** - Sentence acceptability judgment - CoLA Warstadt et al., 2018 - Sentiment analysis - SST-2 Socher et al., 2013 - Paraphrasing/sentence similarity - MRPC Dolan and Brockett, 2005 - STS-B Cer et al., 2017 - QQP Iyer et al., 2017 - Natural language inference - MNLI Williams et al., 2017 - QNLI Rajpurkar et al., 2016 - RTE Dagan et al., 2005 - CB De Marneffe et al., 2019 - Sentence completion - COPA Roemmele et al., 2011 - Word sense disambiguation - WIC Pilehvar and Camacho-Collados, 2018 - Question answering - MultiRC Khashabi et al., 2018 - ReCoRD Zhang et al., 2018 - BoolQ Clark et al., 2019 ## Training Procedure In their abstract, the model developers write: > In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the research paper for further details. # Evaluation ## Testing Data, Factors & Metrics The developers evaluated the model on 24 tasks; see the research paper for full details. ## Results For full results for T5-Large, see the research paper, Table 14. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:** **APA:** - Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model Use the code below to get started with the model.
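A minimal sketch of conditional generation with this checkpoint, using the task-prefix convention T5 was trained with (the sentence is illustrative):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# T5 selects the task through a text prefix on the input.
input_ids = tokenizer(
    "translate English to German: The house is wonderful.", return_tensors="pt"
).input_ids
outputs = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```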
See the Hugging Face T5 docs and a Colab Notebook created by the model developers for more examples.
", + "model_explanation_gemini": "T5-Large is a 770-million-parameter text-to-text transformer model designed for diverse NLP tasks like summarization, translation, question answering, and classification by converting inputs and outputs into unified text strings." +} \ No newline at end of file diff --git a/data/model_data_json/google-t5_t5-small.json b/data/model_data_json/google-t5_t5-small.json new file mode 100644 index 0000000000000000000000000000000000000000..e80b43df1b8df18381d115cbd404c935471910c9 --- /dev/null +++ b/data/model_data_json/google-t5_t5-small.json @@ -0,0 +1,38 @@ +{ + "model_id": "google-t5/t5-small", + "downloads": 3068509, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "onnx", + "safetensors", + "t5", + "text2text-generation", + "summarization", + "translation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:c4", + "arxiv:1805.12471", + "arxiv:1708.00055", + "arxiv:1704.05426", + "arxiv:1606.05250", + "arxiv:1808.09121", + "arxiv:1810.12885", + "arxiv:1905.10044", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual license: apache-2.0 tags: - summarization - translation datasets: - c4 --- # Model Card for T5 Small !model image # Table of Contents 1. Model Details 2. Uses 3. Bias, Risks, and Limitations 4. Training Details 5. Evaluation 6. Environmental Impact 7. Citation 8. Model Card Authors 9. How To Get Started With the Model # Model Details ## Model Description The developers of the Text-To-Text Transfer Transformer (T5) write: > With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task. T5-Small is the checkpoint with 60 million parameters. - **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See associated paper and GitHub repo - **Model type:** Language model - **Language(s) (NLP):** English, French, Romanian, German - **License:** Apache 2.0 - **Related Models:** All T5 Checkpoints - **Resources for more information:** - Research paper - Google's T5 Blog Post - GitHub Repo - Hugging Face T5 Docs # Uses ## Direct Use and Downstream Use The developers write in a blog post that the model: > Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself. See the blog post and research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations More information needed. ## Recommendations More information needed. # Training Details ## Training Data The model is pre-trained on the Colossal Clean Crawled Corpus (C4), which was developed and released in the context of the same research paper as T5. The model was pre-trained on a on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. 
The following datasets were used for (1.) and (2.): 1. **Datasets used for Unsupervised denoising objective**: - C4 - Wiki-DPR 2. **Datasets used for Supervised text-to-text language modeling objective** - Sentence acceptability judgment - CoLA Warstadt et al., 2018 - Sentiment analysis - SST-2 Socher et al., 2013 - Paraphrasing/sentence similarity - MRPC Dolan and Brockett, 2005 - STS-B Cer et al., 2017 - QQP Iyer et al., 2017 - Natural language inference - MNLI Williams et al., 2017 - QNLI Rajpurkar et al., 2016 - RTE Dagan et al., 2005 - CB De Marneffe et al., 2019 - Sentence completion - COPA Roemmele et al., 2011 - Word sense disambiguation - WIC Pilehvar and Camacho-Collados, 2018 - Question answering - MultiRC Khashabi et al., 2018 - ReCoRD Zhang et al., 2018 - BoolQ Clark et al., 2019 ## Training Procedure In their abstract, the model developers write: > In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the research paper for further details. # Evaluation ## Testing Data, Factors & Metrics The developers evaluated the model on 24 tasks; see the research paper for full details. ## Results For full results for T5-small, see the research paper, Table 14. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:** **APA:** - Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67. # Model Card Authors This model card was written by the team at Hugging Face. # How to Get Started with the Model Use the code below to get started with the model.
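A minimal sketch using the summarization pipeline, which applies T5's "summarize:" prefix automatically (the input passage is illustrative):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")
text = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and the tallest structure in Paris."
)
# The pipeline returns a list of dicts with a "summary_text" field.
print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])
```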
See the Hugging Face T5 docs and a Colab Notebook created by the model developers for more examples.
", + "model_explanation_gemini": "T5-small is a 60-million-parameter text-to-text transformer model designed for multiple NLP tasks like summarization, translation, and question answering by converting inputs and outputs into unified text strings." +} \ No newline at end of file diff --git a/data/model_data_json/google_bert_uncased_L-2_H-128_A-2.json b/data/model_data_json/google_bert_uncased_L-2_H-128_A-2.json new file mode 100644 index 0000000000000000000000000000000000000000..007ed133a2678b22164287d9b16adae703752d08 --- /dev/null +++ b/data/model_data_json/google_bert_uncased_L-2_H-128_A-2.json @@ -0,0 +1,17 @@ +{ + "model_id": "google/bert_uncased_L-2_H-128_A-2", + "downloads": 1185719, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "bert", + "arxiv:1908.08962", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- thumbnail: license: apache-2.0 --- BERT Miniatures === This is the set of 24 BERT models referenced in Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (English only, uncased, trained with WordPiece masking). We have shown that the standard BERT recipe (including model architecture and training objective) is effective on a wide range of model sizes, beyond BERT-Base and BERT-Large. The smaller BERT models are intended for environments with restricted computational resources. They can be fine-tuned in the same manner as the original BERT models. However, they are most effective in the context of knowledge distillation, where the fine-tuning labels are produced by a larger and more accurate teacher. Our goal is to enable research in institutions with fewer computational resources and encourage the community to seek directions of innovation alternative to increasing model capacity. You can download the 24 BERT miniatures either from the official BERT Github page, or via HuggingFace from the links below: | |H=128|H=256|H=512|H=768| |---|:---:|:---:|:---:|:---:| | **L=2** |[**2/128 (BERT-Tiny)**][2_128]|[2/256][2_256]|[2/512][2_512]|[2/768][2_768]| | **L=4** |[4/128][4_128]|[**4/256 (BERT-Mini)**][4_256]|[**4/512 (BERT-Small)**][4_512]|[4/768][4_768]| | **L=6** |[6/128][6_128]|[6/256][6_256]|[6/512][6_512]|[6/768][6_768]| | **L=8** |[8/128][8_128]|[8/256][8_256]|[**8/512 (BERT-Medium)**][8_512]|[8/768][8_768]| | **L=10** |[10/128][10_128]|[10/256][10_256]|[10/512][10_512]|[10/768][10_768]| | **L=12** |[12/128][12_128]|[12/256][12_256]|[12/512][12_512]|[**12/768 (BERT-Base)**][12_768]| Note that the BERT-Base model in this release is included for completeness only; it was re-trained under the same regime as the original model. 
Here are the corresponding GLUE scores on the test set: |Model|Score|CoLA|SST-2|MRPC|STS-B|QQP|MNLI-m|MNLI-mm|QNLI(v2)|RTE|WNLI|AX| |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| |BERT-Tiny|64.2|0.0|83.2|81.1/71.1|74.3/73.6|62.2/83.4|70.2|70.3|81.5|57.2|62.3|21.0| |BERT-Mini|65.8|0.0|85.9|81.1/71.8|75.4/73.3|66.4/86.2|74.8|74.3|84.1|57.9|62.3|26.1| |BERT-Small|71.2|27.8|89.7|83.4/76.2|78.8/77.0|68.1/87.0|77.6|77.0|86.4|61.8|62.3|28.6| |BERT-Medium|73.5|38.0|89.6|86.6/81.6|80.4/78.4|69.6/87.9|80.0|79.1|87.7|62.2|62.3|30.5| For each task, we selected the best fine-tuning hyperparameters from the lists below, and trained for 4 epochs: - batch sizes: 8, 16, 32, 64, 128 - learning rates: 3e-4, 1e-4, 5e-5, 3e-5 If you use these models, please cite the following paper: [2_128]: [2_256]: [2_512]: [2_768]: [4_128]: [4_256]: [4_512]: [4_768]: [6_128]: [6_256]: [6_512]: [6_768]: [8_128]: [8_256]: [8_512]: [8_768]: [10_128]: [10_256]: [10_512]: [10_768]: [12_128]: [12_256]: [12_512]: [12_768]:", + "model_explanation_gemini": "A compact BERT model (L=2 layers, H=128 hidden size) designed for resource-constrained environments, fine-tunable like standard BERT and effective for knowledge distillation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/google_byt5-small.json b/data/model_data_json/google_byt5-small.json new file mode 100644 index 0000000000000000000000000000000000000000..e82e514eade1873b81fd7b9f8624e900d3fef0fb --- /dev/null +++ b/data/model_data_json/google_byt5-small.json @@ -0,0 +1,124 @@ +{ + "model_id": "google/byt5-small", + "downloads": 1404765, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "t5", + "text2text-generation", + "multilingual", + "af", + "am", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "co", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fil", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "haw", + "hi", + "hmn", + "ht", + "hu", + "hy", + "ig", + "is", + "it", + "iw", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lb", + "lo", + "lt", + "lv", + "mg", + "mi", + "mk", + "ml", + "mn", + "mr", + "ms", + "mt", + "my", + "ne", + "nl", + "no", + "ny", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sd", + "si", + "sk", + "sl", + "sm", + "sn", + "so", + "sq", + "sr", + "st", + "su", + "sv", + "sw", + "ta", + "te", + "tg", + "th", + "tr", + "uk", + "und", + "ur", + "uz", + "vi", + "xh", + "yi", + "yo", + "zh", + "zu", + "dataset:mc4", + "arxiv:1907.06292", + "arxiv:2105.13626", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - az - be - bg - bn - ca - ceb - co - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fil - fr - fy - ga - gd - gl - gu - ha - haw - hi - hmn - ht - hu - hy - ig - is - it - iw - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lb - lo - lt - lv - mg - mi - mk - ml - mn - mr - ms - mt - my - ne - nl - no - ny - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - sm - sn - so - sq - sr - st - su - sv - sw - ta - te - tg - th - tr - uk - und - ur - uz - vi - xh - yi - yo - zh - zu datasets: - mc4 license: apache-2.0 --- # ByT5 - Small ByT5 is a tokenizer-free version of Google's T5 and generally follows the architecture of MT5. 
ByT5 was only pre-trained on mC4, excluding any supervised training, with an average span-mask of 20 UTF-8 characters. Therefore, this model has to be fine-tuned before it is usable on a downstream task. ByT5 works especially well on noisy text data; *e.g.*, it significantly outperforms mt5-small on TweetQA. Paper: ByT5: Towards a token-free future with pre-trained byte-to-byte models Authors: *Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel* ## Example Inference ByT5 works on raw UTF-8 bytes and can be used without a tokenizer: For batched inference and training, however, it is recommended to use a tokenizer class for padding (a sketch of both paths follows this entry): ## Abstract Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units. Encoding text as a sequence of tokens requires a tokenizer, which is typically created as an independent artifact from the model. Token-free models that instead operate directly on raw text (bytes or characters) have many benefits: they can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pipelines. Since byte or character sequences are longer than token sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of operating directly on raw text. In this paper, we show that a standard Transformer architecture can be used with minimal modifications to process byte sequences. We carefully characterize the trade-offs in terms of parameter count, training FLOPs, and inference speed, and show that byte-level models are competitive with their token-level counterparts. We also demonstrate that byte-level models are significantly more robust to noise and perform better on tasks that are sensitive to spelling and pronunciation. As part of our contribution, we release a new set of pre-trained byte-level Transformer models based on the T5 architecture, as well as all code and data used in our experiments. !model image", + "model_explanation_gemini": "google_byt5-small is a tokenizer-free multilingual model that processes raw UTF-8 bytes, excels with noisy text, and requires fine-tuning for downstream tasks."
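The two snippets referenced under "Example Inference" were stripped in scraping; a sketch of both paths, assuming the standard transformers API (the byte offset of 3 reflects ByT5's reserved pad/eos/unk ids; the example sentences are illustrative):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

# Tokenizer-free path: raw UTF-8 bytes, shifted by 3 to skip the special token ids.
input_ids = torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + 3
labels = torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3
loss = model(input_ids, labels=labels).loss

# Tokenizer path: recommended for batched inference/training, since it handles padding.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model_inputs = tokenizer(
    ["Life is like a box of chocolates.", "Today is Monday."],
    padding="longest",
    return_tensors="pt",
)
labels = tokenizer(
    ["La vie est comme une boîte de chocolat.", "Aujourd'hui c'est lundi."],
    padding="longest",
    return_tensors="pt",
).input_ids
loss = model(**model_inputs, labels=labels).loss
```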
+} \ No newline at end of file diff --git a/data/model_data_json/google_canine-c.json b/data/model_data_json/google_canine-c.json new file mode 100644 index 0000000000000000000000000000000000000000..a806646864c3697cc7782ad6088ee80546b35f5b --- /dev/null +++ b/data/model_data_json/google_canine-c.json @@ -0,0 +1,122 @@ +{ + "model_id": "google/canine-c", + "downloads": 192101, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "canine", + "feature-extraction", + "multilingual", + "af", + "sq", + "ar", + "an", + "hy", + "ast", + "az", + "ba", + "eu", + "bar", + "be", + "bn", + "inc", + "bs", + "br", + "bg", + "my", + "ca", + "ceb", + "ce", + "zh", + "cv", + "hr", + "cs", + "da", + "nl", + "en", + "et", + "fi", + "fr", + "gl", + "ka", + "de", + "el", + "gu", + "ht", + "he", + "hi", + "hu", + "is", + "io", + "id", + "ga", + "it", + "ja", + "jv", + "kn", + "kk", + "ky", + "ko", + "la", + "lv", + "lt", + "roa", + "nds", + "lm", + "mk", + "mg", + "ms", + "ml", + "mr", + "mn", + "min", + "ne", + "new", + "nb", + "nn", + "oc", + "fa", + "pms", + "pl", + "pt", + "pa", + "ro", + "ru", + "sco", + "sr", + "scn", + "sk", + "sl", + "aze", + "es", + "su", + "sw", + "sv", + "tl", + "tg", + "th", + "ta", + "tt", + "te", + "tr", + "uk", + "ud", + "uz", + "vi", + "vo", + "war", + "cy", + "fry", + "pnb", + "yo", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:2103.06874", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - sq - ar - an - hy - ast - az - ba - eu - bar - be - bn - inc - bs - br - bg - my - ca - ceb - ce - zh - cv - hr - cs - da - nl - en - et - fi - fr - gl - ka - de - el - gu - ht - he - hi - hu - is - io - id - ga - it - ja - jv - kn - kk - ky - ko - la - lv - lt - roa - nds - lm - mk - mg - ms - ml - mr - mn - min - ne - new - nb - nn - oc - fa - pms - pl - pt - pa - ro - ru - sco - sr - hr - scn - sk - sl - aze - es - su - sw - sv - tl - tg - th - ta - tt - te - tr - uk - ud - uz - vi - vo - war - cy - fry - pnb - yo license: apache-2.0 datasets: - bookcorpus - wikipedia --- # CANINE-c (CANINE pre-trained with autoregressive character loss) Pretrained CANINE model on 104 languages using a masked language modeling (MLM) objective. It was introduced in the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation and first released in this repository. What's special about CANINE is that it doesn't require an explicit tokenizer (such as WordPiece or SentencePiece) as other models like BERT and RoBERTa. Instead, it directly operates at a character level: each character is turned into its Unicode code point. This means that input processing is trivial and can typically be accomplished as: The ord() function is part of Python, and turns each character into its Unicode code point. Disclaimer: The team releasing CANINE did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description CANINE is a transformers model pretrained on a large corpus of multilingual data in a self-supervised fashion, similar to BERT. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: * Masked language modeling (MLM): one randomly masks part of the inputs, which the model needs to predict. 
This model (CANINE-c) is trained with an autoregressive character loss. One masks several character spans within each sequence, which the model then autoregressively predicts. * Next sentence prediction (NSP): the model concatenates two sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict if the two sentences were following each other or not. This way, the model learns an inner representation of multiple languages that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the CANINE model as inputs. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT2. ### How to use Here is how to use this model (a sketch follows this entry): ## Training data The CANINE model was pretrained on the multilingual Wikipedia data of mBERT, which includes 104 languages. ### BibTeX entry and citation info", + "model_explanation_gemini": "google_canine-c is a multilingual, tokenization-free transformer model pretrained on 104 languages using masked language modeling and next sentence prediction, operating directly on Unicode characters without requiring explicit tokenization." +} \ No newline at end of file diff --git a/data/model_data_json/google_efficientnet-b2.json b/data/model_data_json/google_efficientnet-b2.json new file mode 100644 index 0000000000000000000000000000000000000000..ee8880eed9c13467b1c6289756509e66563cf6c2 --- /dev/null +++ b/data/model_data_json/google_efficientnet-b2.json @@ -0,0 +1,19 @@ +{ + "model_id": "google/efficientnet-b2", + "downloads": 178472, + "tags": [ + "transformers", + "pytorch", + "efficientnet", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:1905.11946", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # EfficientNet (b2 model) EfficientNet model trained on ImageNet-1k at resolution 260x260. It was introduced in the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan and Quoc V. Le, and first released in this repository. Disclaimer: The team releasing EfficientNet did not write a model card for this model, so this model card has been written by the Hugging Face team. ## Model description EfficientNet is a mobile-friendly pure convolutional model (ConvNet) that proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. !model image ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
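Backfilling the stripped CANINE-c usage snippet referenced above: a minimal sketch, assuming the CanineModel/CanineTokenizer classes from transformers (the input sentences are illustrative):

```python
from transformers import CanineModel, CanineTokenizer

model = CanineModel.from_pretrained("google/canine-c")
tokenizer = CanineTokenizer.from_pretrained("google/canine-c")

inputs = ["Life is like a box of chocolates.", "You never know what you gonna get."]
encoding = tokenizer(inputs, padding="longest", truncation=True, return_tensors="pt")

# The tokenizer simply maps each character to its Unicode code point (ord()).
outputs = model(**encoding)
sequence_output = outputs.last_hidden_state  # (batch, num_characters, hidden_size)
pooled_output = outputs.pooler_output        # (batch, hidden_size)
```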
### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes (a sketch follows this entry): For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Classifies images into 1,000 ImageNet categories using EfficientNet-b2 architecture trained at 260x260 resolution." +} \ No newline at end of file diff --git a/data/model_data_json/google_electra-base-discriminator.json b/data/model_data_json/google_electra-base-discriminator.json new file mode 100644 index 0000000000000000000000000000000000000000..9f26ea25fbb253fba17c10771aa78fda607ba821 --- /dev/null +++ b/data/model_data_json/google_electra-base-discriminator.json @@ -0,0 +1,20 @@ +{ + "model_id": "google/electra-base-discriminator", + "downloads": 17910835, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "electra", + "pretraining", + "en", + "arxiv:1406.2661", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en thumbnail: license: apache-2.0 --- ## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators **ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish \"real\" input tokens vs \"fake\" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset. For a detailed description and experimental results, please refer to our paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks including classification tasks (e.g., GLUE), QA tasks (e.g., SQuAD), and sequence tagging tasks (e.g., text chunking). ## How to use the discriminator in `transformers`", + "model_explanation_gemini": "ELECTRA is a transformer-based model pre-trained to distinguish real from fake input tokens, enabling efficient language representation learning for tasks like classification, QA, and sequence tagging." +} \ No newline at end of file diff --git a/data/model_data_json/google_electra-small-discriminator.json b/data/model_data_json/google_electra-small-discriminator.json new file mode 100644 index 0000000000000000000000000000000000000000..cccdd4be6fd28f343be2b31d641ea800ea816265 --- /dev/null +++ b/data/model_data_json/google_electra-small-discriminator.json @@ -0,0 +1,19 @@ +{ + "model_id": "google/electra-small-discriminator", + "downloads": 408143, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "electra", + "pretraining", + "en", + "arxiv:1406.2661", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en thumbnail: license: apache-2.0 --- ## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators **ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish \"real\" input tokens vs \"fake\" input tokens generated by another neural network, similar to the discriminator of a GAN.
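Backfilling the stripped EfficientNet-b2 classification snippet referenced just above: a sketch assuming the usual transformers image-classification pattern (the `huggingface/cats-image` dataset stands in for a COCO 2017 image):

```python
import torch
from datasets import load_dataset
from transformers import AutoImageProcessor, EfficientNetForImageClassification

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

preprocessor = AutoImageProcessor.from_pretrained("google/efficientnet-b2")
model = EfficientNetForImageClassification.from_pretrained("google/efficientnet-b2")

inputs = preprocessor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The head predicts one of the 1,000 ImageNet classes.
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```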
At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset. For a detailed description and experimental results, please refer to our paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks including classification tasks (e.g., GLUE), QA tasks (e.g., SQuAD), and sequence tagging tasks (e.g., text chunking). ## How to use the discriminator in `transformers`", + "model_explanation_gemini": "Pre-trains transformer networks by distinguishing real input tokens from fake ones, enabling efficient language representation learning for downstream tasks like classification, QA, and sequence tagging." +} \ No newline at end of file diff --git a/data/model_data_json/google_flan-t5-base.json b/data/model_data_json/google_flan-t5-base.json new file mode 100644 index 0000000000000000000000000000000000000000..18f8a6968310480271c902cbab087a1bbd1d34ff --- /dev/null +++ b/data/model_data_json/google_flan-t5-base.json @@ -0,0 +1,37 @@ +{ + "model_id": "google/flan-t5-base", + "downloads": 3525796, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "t5", + "text2text-generation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:svakulenk0/qrecc", + "dataset:taskmaster2", + "dataset:djaym7/wiki_dialog", + "dataset:deepmind/code_contests", + "dataset:lambada", + "dataset:gsm8k", + "dataset:aqua_rat", + "dataset:esnli", + "dataset:quasc", + "dataset:qed", + "arxiv:2210.11416", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual tags: - text2text-generation widget: - text: \"Translate to German: My name is Arthur\" example_title: \"Translation\" - text: \"Please answer to the following question. Who is going to be the next Ballon d'or?\" example_title: \"Question Answering\" - text: \"Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.\" example_title: \"Logical reasoning\" - text: \"Please answer the following question. What is the boiling point of Nitrogen?\" example_title: \"Scientific knowledge\" - text: \"Answer the following yes/no question. Can you write a whole Haiku in a single tweet?\" example_title: \"Yes/no question\" - text: \"Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?\" example_title: \"Reasoning task\" - text: \"Q: ( False or not False or False ) is? A: Let's think step by step\" example_title: \"Boolean Expressions\" - text: \"The square root of x is the cube root of y. What is y to the power of 2, if x = 4?\" example_title: \"Math reasoning\" - text: \"Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?\" example_title: \"Premise and hypothesis\" datasets: - svakulenk0/qrecc - taskmaster2 - djaym7/wiki_dialog - deepmind/code_contests - lambada - gsm8k - aqua_rat - esnli - quasc - qed license: apache-2.0 --- # Model Card for FLAN-T5 base \"drawing\" # Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7.
Environmental Impact 8. Citation 9. Model Card Authors # TL;DR If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. As mentioned in the first few lines of the abstract : > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the T5 model card. # Model Details ## Model Description - **Model type:** Language model - **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian - **License:** Apache 2.0 - **Related Models:** All FLAN-T5 Checkpoints - **Original Checkpoints:** All Original FLAN-T5 Checkpoints - **Resources for more information:** - Research paper - GitHub Repo - Hugging Face FLAN-T5 Docs (Similar to T5) # Usage Find below some example scripts on how to use the model in : ## Using the Pytorch model ### Running the model on a CPU
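The collapsed snippet behind this section was lost in extraction; a sketch of the usual transformers pattern for this checkpoint (the prompt is illustrative; the same pattern applies to the other FLAN-T5 sizes by swapping the model id):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```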
### Running the model on a GPU
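Likewise a sketch for GPU execution, assuming `accelerate` is installed for `device_map="auto"`:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```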
### Running the model on a GPU using different precisions #### FP16
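A half-precision variant of the same sketch, loading the weights as `torch.float16`:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-base", device_map="auto", torch_dtype=torch.float16
)

input_ids = tokenizer(
    "translate English to German: How old are you?", return_tensors="pt"
).input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```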
#### INT8
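And an 8-bit variant, assuming `bitsandbytes` is installed (the `load_in_8bit` flag reflects the transformers API of the era this card was written in):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-base", device_map="auto", load_in_8bit=True
)

input_ids = tokenizer(
    "translate English to German: How old are you?", return_tensors="pt"
).input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```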
# Uses ## Direct Use and Downstream Use The authors write in the original paper's model card that: > The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models See the research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations The information below in this section are copied from the model's official model card: > Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application. ## Ethical considerations and risks > Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. ## Known Limitations > Flan-T5 has not been tested in real world applications. ## Sensitive Use: > Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech. # Training Details ## Training Data The model was trained on a mixture of tasks, that includes the tasks described in the table below (from the original paper, figure 2): !table.png ## Training Procedure According to the model card from the original paper: > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size. The model has been trained on TPU v3 or TPU v4 pods, using []( codebase together with []( # Evaluation ## Testing Data, Factors & Metrics The authors evaluated the model on various tasks covering several languages (1836 in total). See the table below for some quantitative evaluation: !image.png For full details, please check the research paper. ## Results For full results for FLAN-T5-Base, see the research paper, Table 3. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4. - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:** ## Model Recycling Evaluation on 36 datasets using google/flan-t5-base as a base model yields average score of 77.98 in comparison to 68.82 by google/t5-v1_1-base. 
The model is ranked 1st among all tested models for the google/t5-v1_1-base architecture as of 06/02/2023 Results: | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers | |---------------:|----------:|-----------------------:|--------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|-------:|--------:|----------------:| | 86.2188 | 89.6667 | 67.12 | 51.9688 | 82.3242 | 78.5714 | 80.1534 | 75 | 77.6667 | 90.9507 | 85.4 | 93.324 | 72.425 | 87.2457 | 89.4608 | 62.3762 | 82.6923 | 92.7878 | 89.7724 | 89.0244 | 84.8375 | 94.3807 | 57.2851 | 89.4759 | 97.2 | 92.8 | 46.848 | 80.2252 | 54.9832 | 76.6582 | 84.3023 | 70.6366 | 70.0627 | 56.338 | 53.8462 | 73.4 | For more information, see: Model Recycling", + "model_explanation_gemini": "A multilingual text-to-text generation model fine-tuned for diverse tasks including translation, question answering, logical reasoning, and scientific knowledge." +} \ No newline at end of file diff --git a/data/model_data_json/google_flan-t5-large.json b/data/model_data_json/google_flan-t5-large.json new file mode 100644 index 0000000000000000000000000000000000000000..d0322fb04ac4f08d10566b429eda3d5064e140de --- /dev/null +++ b/data/model_data_json/google_flan-t5-large.json @@ -0,0 +1,37 @@ +{ + "model_id": "google/flan-t5-large", + "downloads": 576267, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "t5", + "text2text-generation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:svakulenk0/qrecc", + "dataset:taskmaster2", + "dataset:djaym7/wiki_dialog", + "dataset:deepmind/code_contests", + "dataset:lambada", + "dataset:gsm8k", + "dataset:aqua_rat", + "dataset:esnli", + "dataset:quasc", + "dataset:qed", + "arxiv:2210.11416", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual widget: - text: \"Translate to German: My name is Arthur\" example_title: \"Translation\" - text: \"Please answer to the following question. Who is going to be the next Ballon d'or?\" example_title: \"Question Answering\" - text: \"Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.\" example_title: \"Logical reasoning\" - text: \"Please answer the following question. What is the boiling point of Nitrogen?\" example_title: \"Scientific knowledge\" - text: \"Answer the following yes/no question. Can you write a whole Haiku in a single tweet?\" example_title: \"Yes/no question\" - text: \"Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?\" example_title: \"Reasoning task\" - text: \"Q: ( False or not False or False ) is? 
A: Let's think step by step\" example_title: \"Boolean Expressions\" - text: \"The square root of x is the cube root of y. What is y to the power of 2, if x = 4?\" example_title: \"Math reasoning\" - text: \"Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?\" example_title: \"Premise and hypothesis\" tags: - text2text-generation datasets: - svakulenk0/qrecc - taskmaster2 - djaym7/wiki_dialog - deepmind/code_contests - lambada - gsm8k - aqua_rat - esnli - quasc - qed license: apache-2.0 --- # Model Card for FLAN-T5 large \"drawing\" # Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7. Environmental Impact 8. Citation 9. Model Card Authors # TL;DR If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. As mentioned in the first few lines of the abstract : > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the T5 model card. # Model Details ## Model Description - **Model type:** Language model - **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian - **License:** Apache 2.0 - **Related Models:** All FLAN-T5 Checkpoints - **Original Checkpoints:** All Original FLAN-T5 Checkpoints - **Resources for more information:** - Research paper - GitHub Repo - Hugging Face FLAN-T5 Docs (Similar to T5) # Usage Find below some example scripts on how to use the model in : ## Using the Pytorch model ### Running the model on a CPU
### Running the model on a GPU
### Running the model on a GPU using different precisions #### FP16
#### INT8
# Uses ## Direct Use and Downstream Use The authors write in the original paper's model card that: > The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models See the research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations The information below in this section are copied from the model's official model card: > Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application. ## Ethical considerations and risks > Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. ## Known Limitations > Flan-T5 has not been tested in real world applications. ## Sensitive Use: > Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech. # Training Details ## Training Data The model was trained on a mixture of tasks, that includes the tasks described in the table below (from the original paper, figure 2): !table.png ## Training Procedure According to the model card from the original paper: > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size. The model has been trained on TPU v3 or TPU v4 pods, using []( codebase together with []( # Evaluation ## Testing Data, Factors & Metrics The authors evaluated the model on various tasks covering several languages (1836 in total). See the table below for some quantitative evaluation: !image.png For full details, please check the research paper. ## Results For full results for FLAN-T5-Large, see the research paper, Table 3. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4. - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:**", + "model_explanation_gemini": "google_flan-t5-large is a multilingual text-to-text generation model fine-tuned for diverse tasks including translation, question answering, logical reasoning, and scientific knowledge retrieval." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_flan-t5-small.json b/data/model_data_json/google_flan-t5-small.json new file mode 100644 index 0000000000000000000000000000000000000000..c4b1496864ca6db98e1314d7c596143c1e2226f9 --- /dev/null +++ b/data/model_data_json/google_flan-t5-small.json @@ -0,0 +1,37 @@ +{ + "model_id": "google/flan-t5-small", + "downloads": 669018, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "t5", + "text2text-generation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:svakulenk0/qrecc", + "dataset:taskmaster2", + "dataset:djaym7/wiki_dialog", + "dataset:deepmind/code_contests", + "dataset:lambada", + "dataset:gsm8k", + "dataset:aqua_rat", + "dataset:esnli", + "dataset:quasc", + "dataset:qed", + "arxiv:2210.11416", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual tags: - text2text-generation widget: - text: \"Translate to German: My name is Arthur\" example_title: \"Translation\" - text: \"Please answer to the following question. Who is going to be the next Ballon d'or?\" example_title: \"Question Answering\" - text: \"Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.\" example_title: \"Logical reasoning\" - text: \"Please answer the following question. What is the boiling point of Nitrogen?\" example_title: \"Scientific knowledge\" - text: \"Answer the following yes/no question. Can you write a whole Haiku in a single tweet?\" example_title: \"Yes/no question\" - text: \"Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?\" example_title: \"Reasoning task\" - text: \"Q: ( False or not False or False ) is? A: Let's think step by step\" example_title: \"Boolean Expressions\" - text: \"The square root of x is the cube root of y. What is y to the power of 2, if x = 4?\" example_title: \"Math reasoning\" - text: \"Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?\" example_title: \"Premise and hypothesis\" datasets: - svakulenk0/qrecc - taskmaster2 - djaym7/wiki_dialog - deepmind/code_contests - lambada - gsm8k - aqua_rat - esnli - quasc - qed license: apache-2.0 --- # Model Card for FLAN-T5 small \"drawing\" # Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7. Environmental Impact 8. Citation 9. Model Card Authors # TL;DR If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. As mentioned in the first few lines of the abstract : > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the T5 model card. 
# Model Details ## Model Description - **Model type:** Language model - **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian - **License:** Apache 2.0 - **Related Models:** All FLAN-T5 Checkpoints - **Original Checkpoints:** All Original FLAN-T5 Checkpoints - **Resources for more information:** - Research paper - GitHub Repo - Hugging Face FLAN-T5 Docs (Similar to T5) # Usage Find below some example scripts on how to use the model in : ## Using the Pytorch model ### Running the model on a CPU
### Running the model on a GPU
### Running the model on a GPU using different precisions #### FP16
#### INT8
# Uses ## Direct Use and Downstream Use The authors write in the original paper's model card that: > The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models See the research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations The information below in this section are copied from the model's official model card: > Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application. ## Ethical considerations and risks > Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. ## Known Limitations > Flan-T5 has not been tested in real world applications. ## Sensitive Use: > Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech. # Training Details ## Training Data The model was trained on a mixture of tasks, that includes the tasks described in the table below (from the original paper, figure 2): !table.png ## Training Procedure According to the model card from the original paper: > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size. The model has been trained on TPU v3 or TPU v4 pods, using []( codebase together with []( # Evaluation ## Testing Data, Factors & Metrics The authors evaluated the model on various tasks covering several languages (1836 in total). See the table below for some quantitative evaluation: !image.png For full details, please check the research paper. ## Results For full results for FLAN-T5-Small, see the research paper, Table 3. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4. - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:**", + "model_explanation_gemini": "google_flan-t5-small is a multilingual text-to-text generation model fine-tuned for diverse tasks including translation, question answering, logical reasoning, and scientific knowledge retrieval." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_flan-t5-xl.json b/data/model_data_json/google_flan-t5-xl.json new file mode 100644 index 0000000000000000000000000000000000000000..0743ba1cfc31bdf3f41ce1187d028ed0e592a97b --- /dev/null +++ b/data/model_data_json/google_flan-t5-xl.json @@ -0,0 +1,37 @@ +{ + "model_id": "google/flan-t5-xl", + "downloads": 243622, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "t5", + "text2text-generation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:svakulenk0/qrecc", + "dataset:taskmaster2", + "dataset:djaym7/wiki_dialog", + "dataset:deepmind/code_contests", + "dataset:lambada", + "dataset:gsm8k", + "dataset:aqua_rat", + "dataset:esnli", + "dataset:quasc", + "dataset:qed", + "arxiv:2210.11416", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual widget: - text: \"Translate to German: My name is Arthur\" example_title: \"Translation\" - text: \"Please answer to the following question. Who is going to be the next Ballon d'or?\" example_title: \"Question Answering\" - text: \"Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.\" example_title: \"Logical reasoning\" - text: \"Please answer the following question. What is the boiling point of Nitrogen?\" example_title: \"Scientific knowledge\" - text: \"Answer the following yes/no question. Can you write a whole Haiku in a single tweet?\" example_title: \"Yes/no question\" - text: \"Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?\" example_title: \"Reasoning task\" - text: \"Q: ( False or not False or False ) is? A: Let's think step by step\" example_title: \"Boolean Expressions\" - text: \"The square root of x is the cube root of y. What is y to the power of 2, if x = 4?\" example_title: \"Math reasoning\" - text: \"Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?\" example_title: \"Premise and hypothesis\" tags: - text2text-generation datasets: - svakulenk0/qrecc - taskmaster2 - djaym7/wiki_dialog - deepmind/code_contests - lambada - gsm8k - aqua_rat - esnli - quasc - qed license: apache-2.0 --- # Model Card for FLAN-T5 XL \"drawing\" # Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7. Environmental Impact 8. Citation # TL;DR If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. As mentioned in the first few lines of the abstract : > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the T5 model card. 
# Model Details ## Model Description - **Model type:** Language model - **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian - **License:** Apache 2.0 - **Related Models:** All FLAN-T5 Checkpoints - **Original Checkpoints:** All Original FLAN-T5 Checkpoints - **Resources for more information:** - Research paper - GitHub Repo - Hugging Face FLAN-T5 Docs (Similar to T5) # Usage Find below some example scripts on how to use the model in : ## Using the Pytorch model ### Running the model on a CPU
### Running the model on a GPU
### Running the model on a GPU using different precisions #### FP16
#### INT8
# Uses ## Direct Use and Downstream Use The authors write in the original paper's model card that: > The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models See the research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations The information below in this section are copied from the model's official model card: > Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application. ## Ethical considerations and risks > Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. ## Known Limitations > Flan-T5 has not been tested in real world applications. ## Sensitive Use: > Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech. # Training Details ## Training Data The model was trained on a mixture of tasks, that includes the tasks described in the table below (from the original paper, figure 2): !table.png ## Training Procedure According to the model card from the original paper: > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size. The model has been trained on TPU v3 or TPU v4 pods, using []( codebase together with []( # Evaluation ## Testing Data, Factors & Metrics The authors evaluated the model on various tasks covering several languages (1836 in total). See the table below for some quantitative evaluation: !image.png For full details, please check the research paper. ## Results For full results for FLAN-T5-XL, see the research paper, Table 3. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4. - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:**", + "model_explanation_gemini": "google_flan-t5-xl is a multilingual text-to-text generation model fine-tuned on over 1000 tasks, excelling in translation, question answering, logical reasoning, and scientific tasks across numerous languages." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_flan-t5-xxl.json b/data/model_data_json/google_flan-t5-xxl.json new file mode 100644 index 0000000000000000000000000000000000000000..5bedca8eec2cb5493f1a7bbf83538ff17b085f6b --- /dev/null +++ b/data/model_data_json/google_flan-t5-xxl.json @@ -0,0 +1,37 @@ +{ + "model_id": "google/flan-t5-xxl", + "downloads": 141763, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "t5", + "text2text-generation", + "en", + "fr", + "ro", + "de", + "multilingual", + "dataset:svakulenk0/qrecc", + "dataset:taskmaster2", + "dataset:djaym7/wiki_dialog", + "dataset:deepmind/code_contests", + "dataset:lambada", + "dataset:gsm8k", + "dataset:aqua_rat", + "dataset:esnli", + "dataset:quasc", + "dataset:qed", + "arxiv:2210.11416", + "arxiv:1910.09700", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - ro - de - multilingual widget: - text: \"Translate to German: My name is Arthur\" example_title: \"Translation\" - text: \"Please answer to the following question. Who is going to be the next Ballon d'or?\" example_title: \"Question Answering\" - text: \"Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.\" example_title: \"Logical reasoning\" - text: \"Please answer the following question. What is the boiling point of Nitrogen?\" example_title: \"Scientific knowledge\" - text: \"Answer the following yes/no question. Can you write a whole Haiku in a single tweet?\" example_title: \"Yes/no question\" - text: \"Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?\" example_title: \"Reasoning task\" - text: \"Q: ( False or not False or False ) is? A: Let's think step by step\" example_title: \"Boolean Expressions\" - text: \"The square root of x is the cube root of y. What is y to the power of 2, if x = 4?\" example_title: \"Math reasoning\" - text: \"Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?\" example_title: \"Premise and hypothesis\" tags: - text2text-generation datasets: - svakulenk0/qrecc - taskmaster2 - djaym7/wiki_dialog - deepmind/code_contests - lambada - gsm8k - aqua_rat - esnli - quasc - qed license: apache-2.0 --- # Model Card for FLAN-T5 XXL \"drawing\" # Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7. Environmental Impact 8. Citation # TL;DR If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. As mentioned in the first few lines of the abstract : > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models. **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the T5 model card. 
# Model Details ## Model Description - **Model type:** Language model - **Language(s) (NLP):** English, German, French - **License:** Apache 2.0 - **Related Models:** All FLAN-T5 Checkpoints - **Original Checkpoints:** All Original FLAN-T5 Checkpoints - **Resources for more information:** - Research paper - GitHub Repo - Hugging Face FLAN-T5 Docs (Similar to T5) # Usage Find below some example scripts on how to use the model in : ## Using the Pytorch model ### Running the model on a CPU
### Running the model on a GPU
### Running the model on a GPU using different precisions #### FP16
#### INT8
# Uses ## Direct Use and Downstream Use The authors write in the original paper's model card that: > The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models See the research paper for further details. ## Out-of-Scope Use More information needed. # Bias, Risks, and Limitations The information below in this section are copied from the model's official model card: > Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application. ## Ethical considerations and risks > Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data. ## Known Limitations > Flan-T5 has not been tested in real world applications. ## Sensitive Use: > Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech. # Training Details ## Training Data The model was trained on a mixture of tasks, that includes the tasks described in the table below (from the original paper, figure 2): !table.png ## Training Procedure According to the model card from the original paper: > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size. The model has been trained on TPU v3 or TPU v4 pods, using []( codebase together with []( # Evaluation ## Testing Data, Factors & Metrics The authors evaluated the model on various tasks covering several languages (1836 in total). See the table below for some quantitative evaluation: !image.png For full details, please check the research paper. ## Results For full results for FLAN-T5-XXL, see the research paper, Table 3. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4. - **Hours used:** More information needed - **Cloud Provider:** GCP - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Citation **BibTeX:**", + "model_explanation_gemini": "google_flan-t5-xxl is a multilingual text-to-text generation model fine-tuned for diverse tasks including translation, question answering, logical reasoning, and scientific knowledge, outperforming similar-sized models across 1000+ tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_fnet-base.json b/data/model_data_json/google_fnet-base.json new file mode 100644 index 0000000000000000000000000000000000000000..f61b3cd6063c3c7022853ad320bc7c626ecc918e --- /dev/null +++ b/data/model_data_json/google_fnet-base.json @@ -0,0 +1,19 @@ +{ + "model_id": "google/fnet-base", + "downloads": 888370, + "tags": [ + "transformers", + "pytorch", + "rust", + "fnet", + "pretraining", + "en", + "dataset:c4", + "arxiv:2105.03824", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - fnet license: apache-2.0 datasets: - c4 --- # FNet base model Pretrained model on English language using a masked language modeling (MLM) and next sentence prediction (NSP) objective. It was introduced in this paper and first released in this repository. This model is cased: it makes a difference between english and English. The model achieves 0.58 accuracy on the MLM objective and 0.80 on the NSP objective. Disclaimer: This model card has been written by gchhablani. ## Model description FNet is a transformers model with attention replaced with Fourier transforms. Hence, the inputs do not contain an `attention_mask`. It is pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to predict if the two sentences were following each other or not. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the FNet model as inputs. ## Intended uses & limitations You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation you should look at models like GPT2. ## Training data The FNet model was pretrained on C4, a cleaned version of the Common Crawl dataset. ## Training procedure ### Preprocessing The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 32,000.
The inputs of the model are then of the form `[CLS] sentence A [SEP] sentence B [SEP]`. With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus, and in the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two \"sentences\" has a combined length of less than 512 tokens. The details of the masking procedure for each sentence are the following: - 15% of the tokens are masked. - In 80% of the cases, the masked tokens are replaced by `[MASK]`. - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace). - In the 10% remaining cases, the masked tokens are left as is. ### Pretraining FNet-base was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256. The sequence length was limited to 512 tokens. The optimizer used is Adam with a learning rate of 1e-4, \\\\(\\beta_{1} = 0.9\\\\) and \\\\(\\beta_{2} = 0.999\\\\), a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate after. ## Evaluation results FNet-base was fine-tuned and evaluated on the validation data of the GLUE benchmark. The results of the official model (written in Flax) can be seen in Table 1 on page 7 of the official paper. For comparison, this model (ported to PyTorch) was fine-tuned and evaluated using the official Hugging Face GLUE evaluation scripts alongside bert-base-cased for comparison. The training was done on a single 16GB NVIDIA Tesla V100 GPU. For MRPC/WNLI, the models were trained for 5 epochs, while for other tasks, the models were trained for 3 epochs. A sequence length of 512 was used with batch size 16 and learning rate 2e-5. The following table summarizes the results for fnet-base (called *FNet (PyTorch) - Reproduced*) and bert-base-cased (called *Bert (PyTorch) - Reproduced*) in terms of **fine-tuning** speed. The format is *hour:min:seconds*. **Note** that the authors compared **pre-training** speed in the official paper instead. | Task/Model | FNet-base (PyTorch) |Bert-base (PyTorch)| |:----:|:-----------:|:----:| | MNLI-(m/mm) | 06:40:55 | 09:52:33| | QQP | 06:21:16 | 09:25:01 | | QNLI | 01:48:22 | 02:40:22| | SST-2 | 01:09:27 | 01:42:17| | CoLA | 00:09:47 | 00:14:20| | STS-B | 00:07:09 | 00:10:24| | MRPC | 00:07:48 | 00:11:12| | RTE | 00:03:24 | 00:04:51| | WNLI | 00:02:37 | 00:03:23| | SUM | 16:30:45 | 24:23:56 | On average the PyTorch version of FNet-base requires *ca.* 32% less time for GLUE fine-tuning on GPU. The following table summarizes the results for fnet-base (called *FNet (PyTorch) - Reproduced*) and bert-base-cased (called *Bert (PyTorch) - Reproduced*) in terms of performance and compares it to the reported performance of the official FNet-base model (called *FNet (Flax) - Official*). Note that the training hyperparameters of the reproduced models were not the same as the official model, so the performance may differ significantly for some tasks (for example: CoLA).
| Task/Model | Metric | FNet-base (PyTorch) | Bert-base (PyTorch) | FNet-Base (Flax - official) | |:----:|:-----------:|:----:|:-----------:|:----:| | MNLI-(m/mm) | Accuracy or Match/Mismatch | 76.75 | 84.10 | 72/73 | | QQP | mean(Accuracy,F1) | 86.5 | 89.26 | 83 | | QNLI | Accuracy | 84.39 | 90.99 | 80 | | SST-2 | Accuracy | 89.45 | 92.32 | 95 | | CoLA | Matthews corr or Accuracy | 35.94 | 59.57 | 69 | | STS-B | Spearman corr. | 82.19 | 88.98 | 79 | | MRPC | mean(F1/Accuracy) | 81.15 | 88.15 | 76 | | RTE | Accuracy | 62.82 | 67.15 | 63 | | WNLI | Accuracy | 54.93 | 46.48 | - | | Avg | - | 72.7 | 78.6 | 76.7 | We can see that FNet-base achieves around 93% of BERT-base's performance on average. For more details, please refer to the checkpoints linked with the scores. An overview of all fine-tuned checkpoints from the above table can be accessed here. ### How to use You can use this model directly with a pipeline for masked language modeling: **Note: The mask-filling pipeline doesn't work exactly like the original model, which performs masking after converting to tokens; in the masking pipeline an additional space is added after the [MASK].** Here is how to use this model to get the features of a given text in PyTorch: **Note: You must specify the maximum sequence length to be 512 and truncate/pad to the same length because the original model has no attention mask and considers all the hidden states during the forward pass.** ### BibTeX entry and citation info ## Contributions Thanks to @gchhablani for adding this model.", + "model_explanation_gemini": "FNet-base is a transformer model using Fourier transforms instead of attention, pretrained on English text via masked language modeling and next sentence prediction for downstream task fine-tuning." +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-2-27b-it.json b/data/model_data_json/google_gemma-2-27b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..31ecf02b14597fde29b12ee4bee22d133890b15f --- /dev/null +++ b/data/model_data_json/google_gemma-2-27b-it.json @@ -0,0 +1,42 @@ +{ + "model_id": "google/gemma-2-27b-it", + "downloads": 135109, + "tags": [ + "transformers", + "safetensors", + "gemma2", + "text-generation", + "conversational", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:2110.08193", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:1804.06876", + "arxiv:2103.03874", + "arxiv:2304.06364", + "arxiv:2206.04615", + "arxiv:2203.09509", + "base_model:google/gemma-2-27b", + "base_model:finetune:google/gemma-2-27b", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: text-generation extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license base_model: google/gemma-2-27b --- # Gemma 2 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma] **Terms of Use**: Terms **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Usage Below we share some code snippets on how to get quickly started with running the model. First, install the Transformers library with `pip install -U transformers`. Then, copy the snippet from the section that is relevant for your use case. #### Running with the `pipeline` API #### Running the model on a single / multi GPU You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template`, as in the sketch below:
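A minimal sketch, assuming `bfloat16` weights and `accelerate`-backed `device_map="auto"` placement (the prompt is illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    device_map="auto",           # shards across available GPUs
    torch_dtype=torch.bfloat16,  # native precision of the released weights
)

# apply_chat_template wraps the turns in Gemma's <start_of_turn>/<end_of_turn> format.
messages = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```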
#### Running the model on a GPU using different precisions The native weights of this model were exported in `bfloat16` precision. You can also use `float32` if you skip the dtype, but no precision increase will occur (model weights will just be upcasted to `float32`). See examples below. * _Upcasting to `torch.float32`_ #### Running the model through a CLI The local-gemma repository contains a lightweight wrapper around Transformers for running Gemma 2 through a command line interface, or CLI. Follow the installation instructions for getting started, then launch the CLI through the following command: #### Quantized Versions through `bitsandbytes`
Using 8-bit precision (int8)
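The original snippet was dropped; a sketch assuming the standard `bitsandbytes` integration in Transformers (the prompt is illustrative):

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    quantization_config=quantization_config,
)

inputs = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```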
Using 4-bit precision
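The 4-bit path differs only in the quantization flag; a sketch under the same assumptions:

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    quantization_config=quantization_config,
)
```

Generation then proceeds exactly as in the 8-bit sketch above.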
#### Advanced Usage
Torch compile Torch compile is a method for speeding up the inference of PyTorch modules. The Gemma-2 model can be run up to 6x faster by leveraging torch compile. Note that two warm-up steps are required before the full inference speed is realised: For more details, refer to the Transformers documentation.
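A sketch of the idea, assuming `torch.compile` is applied to the model's forward pass (the original card pairs this with a static-shaped KV cache; treat the details here as illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# Compile the forward pass; "reduce-overhead" targets repeated short decoding steps.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

inputs = tokenizer("The theory of special relativity states ", return_tensors="pt").to(model.device)

# Two warm-up generations trigger compilation; later calls run at full speed.
for _ in range(2):
    _ = model.generate(**inputs, max_new_tokens=16, do_sample=False)

outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```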
### Chat Template The instruction-tuned models use a chat template that must be adhered to for conversational use. The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet. Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction: At this point, the prompt contains the following text: As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity (either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with the `<end_of_turn>` token. You can follow this format to build the prompt manually, if you need to do it without the tokenizer's chat template. After the prompt is ready, generation can be performed like this: ### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens and the 9B model was trained with 8 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of [Tensor Processing Unit (TPU)][tpu] hardware (TPUv5p). Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models.
You can distribute training across multiple TPU devices for faster and more efficient processing. * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for [foundation models][foundation-models], including large language models like these ones. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | Gemma PT 9B | Gemma PT 27B | | ------------------------------ | ------------- | ----------- | ------------ | | [MMLU][mmlu] | 5-shot, top-1 | 71.3 | 75.2 | | [HellaSwag][hellaswag] | 10-shot | 81.9 | 86.4 | | [PIQA][piqa] | 0-shot | 81.7 | 83.2 | | [SocialIQA][socialiqa] | 0-shot | 53.4 | 53.7 | | [BoolQ][boolq] | 0-shot | 84.2 | 84.8 | | [WinoGrande][winogrande] | partial score | 80.6 | 83.7 | | [ARC-e][arc] | 0-shot | 88.0 | 88.6 | | [ARC-c][arc] | 25-shot | 68.4 | 71.4 | | [TriviaQA][triviaqa] | 5-shot | 76.6 | 83.7 | | [Natural Questions][naturalq] | 5-shot | 29.2 | 34.5 | | [HumanEval][humaneval] | pass@1 | 40.2 | 51.8 | | [MBPP][mbpp] | 3-shot | 52.4 | 62.6 | | [GSM8K][gsm8k] | 5-shot, maj@1 | 68.6 | 74.0 | | [MATH][math] | 4-shot | 36.6 | 42.3 | | [AGIEval][agieval] | 3-5-shot | 52.8 | 55.1 | | [BIG-Bench][big-bench] | 3-shot, CoT | 68.2 | 74.9 | | ------------------------------ | ------------- | ----------- | ------------ | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. * Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as [WinoBias][winobias] and [BBQ Dataset][bbq]. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. * Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. 
### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting [internal policies][safety-policies] for categories such as child safety, content safety, representational harms, memorization, large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here. #### Gemma 2.0 | Benchmark | Metric | Gemma 2 IT 9B | Gemma 2 IT 27B | | ------------------------ | ------------- | --------------- | ---------------- | | [RealToxicity][realtox] | average | 8.25 | 8.84 | | [CrowS-Pairs][crows] | top-1 | 37.47 | 36.67 | | [BBQ Ambig][bbq] | 1-shot, top-1 | 88.58 | 85.99 | | [BBQ Disambig][bbq] | top-1 | 82.67 | 86.94 | | [Winogender][winogender] | top-1 | 79.17 | 77.22 | | [TruthfulQA][truthfulqa] | | 50.27 | 51.60 | | [Winobias 1_2][winobias] | | 78.09 | 81.94 | | [Winobias 2_2][winobias] | | 95.32 | 97.22 | | [Toxigen][toxigen] | | 39.30 | 38.42 | | ------------------------ | ------------- | --------------- | ---------------- | ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. * Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. 
They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit][rai-toolkit]. * Transparency and Accountability: * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: * Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. * Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably-sized open model alternatives.
[rai-toolkit]: [kaggle-gemma]: [terms]: [vertex-mg-gemma]: [sensitive-info]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [foundation-models]: [gemini-2-paper]: [mmlu]: [hellaswag]: [piqa]: [socialiqa]: [boolq]: [winogrande]: [commonsenseqa]: [openbookqa]: [arc]: [triviaqa]: [naturalq]: [humaneval]: [mbpp]: [gsm8k]: [realtox]: [bold]: [crows]: [bbq]: [winogender]: [truthfulqa]: [winobias]: [math]: [agieval]: [big-bench]: [toxigen]:" +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-2-2b-it.json b/data/model_data_json/google_gemma-2-2b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..b1575203b38e6581fe0d1aebd6a2cc3aa18e68d6 --- /dev/null +++ b/data/model_data_json/google_gemma-2-2b-it.json @@ -0,0 +1,44 @@ +{ + "model_id": "google/gemma-2-2b-it", + "downloads": 317158, + "tags": [ + "transformers", + "safetensors", + "gemma2", + "text-generation", + "conversational", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:2110.08193", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:1804.06876", + "arxiv:2103.03874", + "arxiv:2304.06364", + "arxiv:1903.00161", + "arxiv:2206.04615", + "arxiv:2203.09509", + "arxiv:2403.13793", + "base_model:google/gemma-2-2b", + "base_model:finetune:google/gemma-2-2b", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: text-generation extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license tags: - conversational base_model: google/gemma-2-2b --- # Gemma 2 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma2] **Terms of Use**: [Terms][terms] **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Usage Below we share some code snippets on how to get quickly started with running the model. First, install the Transformers library with: Then, copy the snippet from the section that is relevant for your usecase. 
#### Running with the `pipeline` API #### Running the model on a single / multi GPU You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template` as follows: #### Running the model on a GPU using different precisions The native weights of this model were exported in `bfloat16` precision. You can also use `float32` if you skip the dtype, but no precision increase will occur (model weights will just be upcasted to `float32`). See examples below. * _Upcasting to `torch.float32`_ #### Running the model through a CLI The local-gemma repository contains a lightweight wrapper around Transformers for running Gemma 2 through a command line interface, or CLI. Follow the installation instructions for getting started, then launch the CLI through the following command: #### Quantized Versions through `bitsandbytes`
Using 8-bit precision (int8)
Using 4-bit precision
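Both quantized snippets were dropped from this card; one hedged sketch covers the pair, assuming the standard `bitsandbytes` integration (switch to `load_in_4bit=True` for the 4-bit variant; the prompt is illustrative):

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=quantization_config,
)

inputs = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```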
#### Advanced Usage
Torch compile Torch compile is a method for speeding up the inference of PyTorch modules. The Gemma-2 2b model can be run up to 6x faster by leveraging torch compile. Note that two warm-up steps are required before the full inference speed is realised: For more details, refer to the Transformers documentation.
### Chat Template The instruction-tuned models use a chat template that must be adhered to for conversational use. The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet. Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction: At this point, the prompt contains the following text: As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity (either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with the `<end_of_turn>` token. You can follow this format to build the prompt manually, if you need to do it without the tokenizer's chat template. After the prompt is ready, generation can be performed like this: ### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens, the 9B model was trained with 8 trillion tokens, and the 2B model was trained with 2 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of [Tensor Processing Unit (TPU)][tpu] hardware (TPUv5p). Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality.
* Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for [foundation models][foundation-models], including large language models like these ones. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | Gemma 2 PT 2B | Gemma 2 PT 9B | Gemma 2 PT 27B | | ------------------------------ | ------------- | ------------- | ------------- | -------------- | | [MMLU][mmlu] | 5-shot, top-1 | 51.3 | 71.3 | 75.2 | | [HellaSwag][hellaswag] | 10-shot | 73.0 | 81.9 | 86.4 | | [PIQA][piqa] | 0-shot | 77.8 | 81.7 | 83.2 | | [SocialIQA][socialiqa] | 0-shot | 51.9 | 53.4 | 53.7 | | [BoolQ][boolq] | 0-shot | 72.5 | 84.2 | 84.8 | | [WinoGrande][winogrande] | partial score | 70.9 | 80.6 | 83.7 | | [ARC-e][arc] | 0-shot | 80.1 | 88.0 | 88.6 | | [ARC-c][arc] | 25-shot | 55.4 | 68.4 | 71.4 | | [TriviaQA][triviaqa] | 5-shot | 59.4 | 76.6 | 83.7 | | [Natural Questions][naturalq] | 5-shot | 16.7 | 29.2 | 34.5 | | [HumanEval][humaneval] | pass@1 | 17.7 | 40.2 | 51.8 | | [MBPP][mbpp] | 3-shot | 29.6 | 52.4 | 62.6 | | [GSM8K][gsm8k] | 5-shot, maj@1 | 23.9 | 68.6 | 74.0 | | [MATH][math] | 4-shot | 15.0 | 36.6 | 42.3 | | [AGIEval][agieval] | 3-5-shot | 30.6 | 52.8 | 55.1 | | [DROP][drop] | 3-shot, F1 | 52.0 | 69.4 | 72.2 | | [BIG-Bench][big-bench] | 3-shot, CoT | 41.9 | 68.2 | 74.9 | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. * Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as [WinoBias][winobias] and [BBQ Dataset][bbq]. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. 
* Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. ### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting [internal policies][safety-policies] for categories such as child safety, content safety, representational harms, memorization, large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here. #### Gemma 2.0 | Benchmark | Metric | Gemma 2 IT 2B | Gemma 2 IT 9B | Gemma 2 IT 27B | | ------------------------ | ------------- | ------------- | ------------- | -------------- | | [RealToxicity][realtox] | average | 8.16 | 8.25 | 8.84 | | [CrowS-Pairs][crows] | top-1 | 37.67 | 37.47 | 36.67 | | [BBQ Ambig][bbq] | 1-shot, top-1 | 83.20 | 88.58 | 85.99 | | [BBQ Disambig][bbq] | top-1 | 69.31 | 82.67 | 86.94 | | [Winogender][winogender] | top-1 | 52.91 | 79.17 | 77.22 | | [TruthfulQA][truthfulqa] | | 43.72 | 50.27 | 51.60 | | [Winobias 1_2][winobias] | | 59.28 | 78.09 | 81.94 | | [Winobias 2_2][winobias] | | 88.57 | 95.32 | 97.22 | | [Toxigen][toxigen] | | 48.32 | 39.30 | 38.42 | ## Dangerous Capability Evaluations ### Evaluation Approach We evaluated a range of dangerous capabilities: - **Offensive cybersecurity:** To assess the model's potential for misuse in cybersecurity contexts, we utilized both publicly available Capture-the-Flag (CTF) platforms like InterCode-CTF and Hack the Box, as well as internally developed CTF challenges. These evaluations measure the model's ability to exploit vulnerabilities and gain unauthorized access in simulated environments. - **Self-proliferation:** We evaluated the model's capacity for self-proliferation by designing tasks that involve resource acquisition, code execution, and interaction with remote systems. These evaluations assess the model's ability to independently replicate and spread. - **Persuasion:** To evaluate the model's capacity for persuasion and deception, we conducted human persuasion studies. These studies involved scenarios that measure the model's ability to build rapport, influence beliefs, and elicit specific actions from human participants. ### Evaluation Results All evaluations are described in detail in [Evaluating Frontier Models for Dangerous Capabilities][eval-danger] and in brief in the [Gemma 2 technical report][tech-report].
| Evaluation | Capability | Gemma 2 IT 27B |
| ------------------------------- | ----------------------- | -------------- |
| InterCode-CTF | Offensive cybersecurity | 34/76 challenges |
| Internal CTF | Offensive cybersecurity | 1/13 challenges |
| Hack the Box | Offensive cybersecurity | 0/13 challenges |
| Self-proliferation early warning | Self-proliferation | 1/10 challenges |
| Charm offensive | Persuasion | Percent of participants agreeing: 81% interesting, 75% would speak again, 80% made personal connection |
| Click Links | Persuasion | 34% of participants |
| Find Info | Persuasion | 9% of participants |
| Run Code | Persuasion | 11% of participants |
| Money talks | Persuasion | £3.72 mean donation |
| Web of Lies | Persuasion | 18% mean shift towards correct belief, 1% mean shift towards incorrect belief |
## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. * Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit][rai-toolkit]. * Transparency and Accountability: * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. 
Risks identified and mitigations: * Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. * Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives. [tech-report]: [rai-toolkit]: [kaggle-gemma]: [terms]: [vertex-mg-gemma2]: [sensitive-info]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [foundation-models]: [gemini-2-paper]: [mmlu]: [hellaswag]: [piqa]: [socialiqa]: [boolq]: [winogrande]: [commonsenseqa]: [openbookqa]: [arc]: [triviaqa]: [naturalq]: [humaneval]: [mbpp]: [gsm8k]: [realtox]: [bold]: [crows]: [bbq]: [winogender]: [truthfulqa]: [winobias]: [math]: [agieval]: [drop]: [big-bench]: [toxigen]: [eval-danger]:" +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-2-2b.json b/data/model_data_json/google_gemma-2-2b.json new file mode 100644 index 0000000000000000000000000000000000000000..aa7d8005b37ce8f080f31019019657ff51f45d79 --- /dev/null +++ b/data/model_data_json/google_gemma-2-2b.json @@ -0,0 +1,41 @@ +{ + "model_id": "google/gemma-2-2b", + "downloads": 155581, + "tags": [ + "transformers", + "safetensors", + "gemma2", + "text-generation", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:2110.08193", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:1804.06876", + "arxiv:2103.03874", + "arxiv:2304.06364", + "arxiv:1903.00161", + "arxiv:2206.04615", + "arxiv:2203.09509", + "arxiv:2403.13793", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: text-generation extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. 
extra_gated_button_content: Acknowledge license --- # Gemma 2 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma2] **Terms of Use**: [Terms][terms] **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Usage Below we share some code snippets on how to get quickly started with running the model. First, install the Transformers library with `pip install -U transformers`. Then, copy the snippet from the section that is relevant for your use case. #### Running with the `pipeline` API #### Running the model on a single / multi GPU #### Running the model through a CLI The local-gemma repository contains a lightweight wrapper around Transformers for running Gemma 2 through a command line interface, or CLI. Follow the installation instructions for getting started, then launch the CLI through the following command: #### Quantized Versions through `bitsandbytes`
Using 8-bit precision (int8)
Using 4-bit precision
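As with the instruction-tuned cards, the quantized snippets are missing; a single hedged sketch covering both precisions, assuming the standard `bitsandbytes` integration (the prompt is illustrative):

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    quantization_config=quantization_config,
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```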
#### Advanced Usage
Torch compile Torch compile is a method for speeding up the inference of PyTorch modules. The Gemma-2 2b model can be run up to 6x faster by leveraging torch compile. Note that two warm-up steps are required before the full inference speed is realised: For more details, refer to the Transformers documentation.
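A compact sketch combining plain generation (as in the single/multi GPU section above) with the compile trick; assumptions as before: `bfloat16` weights and an illustrative prompt:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b", torch_dtype=torch.bfloat16, device_map="auto"
)

# Optional speed-up: compile the forward pass (two warm-up calls before full speed).
model.forward = torch.compile(model.forward, mode="reduce-overhead")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```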
### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens, the 9B model was trained with 8 trillion tokens, and 2B model was trained with 2 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of [Tensor Processing Unit (TPU)][tpu] hardware (TPUv5p). Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. 
ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for [foundation models][foundation-models], including large language models like these ones. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | Gemma 2 PT 2B | Gemma 2 PT 9B | Gemma 2 PT 27B | | ------------------------------ | ------------- | ------------- | ------------- | -------------- | | [MMLU][mmlu] | 5-shot, top-1 | 51.3 | 71.3 | 75.2 | | [HellaSwag][hellaswag] | 10-shot | 73.0 | 81.9 | 86.4 | | [PIQA][piqa] | 0-shot | 77.8 | 81.7 | 83.2 | | [SocialIQA][socialiqa] | 0-shot | 51.9 | 53.4 | 53.7 | | [BoolQ][boolq] | 0-shot | 72.5 | 84.2 | 84.8 | | [WinoGrande][winogrande] | partial score | 70.9 | 80.6 | 83.7 | | [ARC-e][arc] | 0-shot | 80.1 | 88.0 | 88.6 | | [ARC-c][arc] | 25-shot | 55.4 | 68.4 | 71.4 | | [TriviaQA][triviaqa] | 5-shot | 59.4 | 76.6 | 83.7 | | [Natural Questions][naturalq] | 5-shot | 16.7 | 29.2 | 34.5 | | [HumanEval][humaneval] | pass@1 | 17.7 | 40.2 | 51.8 | | [MBPP][mbpp] | 3-shot | 29.6 | 52.4 | 62.6 | | [GSM8K][gsm8k] | 5-shot, maj@1 | 23.9 | 68.6 | 74.0 | | [MATH][math] | 4-shot | 15.0 | 36.6 | 42.3 | | [AGIEval][agieval] | 3-5-shot | 30.6 | 52.8 | 55.1 | | [DROP][drop] | 3-shot, F1 | 52.0 | 69.4 | 72.2 | | [BIG-Bench][big-bench] | 3-shot, CoT | 41.9 | 68.2 | 74.9 | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. * Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as [WinoBias][winobias] and [BBQ Dataset][bbq]. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. * Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. ### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting [internal policies][safety-policies] for categories such as child safety, content safety, representational harms, memorization, large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here. 
#### Gemma 2.0 | Benchmark | Metric | Gemma 2 IT 2B | Gemma 2 IT 9B | Gemma 2 IT 27B | | ------------------------ | ------------- | ------------- | ------------- | -------------- | | [RealToxicity][realtox] | average | 8.16 | 8.25 | 8.84 | | [CrowS-Pairs][crows] | top-1 | 37.67 | 37.47 | 36.67 | | [BBQ Ambig][bbq] | 1-shot, top-1 | 83.20 | 88.58 | 85.99 | | [BBQ Disambig][bbq] | top-1 | 69.31 | 82.67 | 86.94 | | [Winogender][winogender] | top-1 | 52.91 | 79.17 | 77.22 | | [TruthfulQA][truthfulqa] | | 43.72 | 50.27 | 51.60 | | [Winobias 1_2][winobias] | | 59.28 | 78.09 | 81.94 | | [Winobias 2_2][winobias] | | 88.57 | 95.32 | 97.22 | | [Toxigen][toxigen] | | 48.32 | 39.30 | 38.42 | ## Dangerous Capability Evaluations ### Evaluation Approach We evaluated a range of dangerous capabilities: - **Offensive cybersecurity:** To assess the model's potential for misuse in cybersecurity contexts, we utilized both publicly available Capture-the-Flag (CTF) platforms like InterCode-CTF and Hack the Box, as well as internally developed CTF challenges. These evaluations measure the model's ability to exploit vulnerabilities and gain unauthorized access in simulated environments. - **Self-proliferation:** We evaluated the model's capacity for self-proliferation by designing tasks that involve resource acquisition, code execution, and interaction with remote systems. These evaluations assess the model's ability to independently replicate and spread. - **Persuasion:** To evaluate the model's capacity for persuasion and deception, we conducted human persuasion studies. These studies involved scenarios that measure the model's ability to build rapport, influence beliefs, and elicit specific actions from human participants. ### Evaluation Results All evaluations are described in detail in [Evaluating Frontier Models for Dangerous Capabilities][eval-danger] and in brief in the [Gemma 2 technical report][tech-report].
| Evaluation | Capability | Gemma 2 IT 27B |
| ---------- | ---------- | -------------- |
| InterCode-CTF | Offensive cybersecurity | 34/76 challenges |
| Internal CTF | Offensive cybersecurity | 1/13 challenges |
| Hack the Box | Offensive cybersecurity | 0/13 challenges |
| Self-proliferation early warning | Self-proliferation | 1/10 challenges |
| Charm offensive | Persuasion | Percent of participants agreeing: 81% interesting, 75% would speak again, 80% made personal connection |
| Click Links | Persuasion | 34% of participants |
| Find Info | Persuasion | 9% of participants |
| Run Code | Persuasion | 11% of participants |
| Web of Lies | Persuasion | 18% mean shift towards correct belief, 1% mean shift towards incorrect belief |
| Money talks | Persuasion | £3.72 mean donation |
## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. * Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit][rai-toolkit]. * Transparency and Accountability: * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. 
Risks identified and mitigations: * Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. * Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives. [tech-report]: [rai-toolkit]: [kaggle-gemma]: [terms]: [vertex-mg-gemma2]: [sensitive-info]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [foundation-models]: [gemini-2-paper]: [mmlu]: [hellaswag]: [piqa]: [socialiqa]: [boolq]: [winogrande]: [commonsenseqa]: [openbookqa]: [arc]: [triviaqa]: [naturalq]: [humaneval]: [mbpp]: [gsm8k]: [realtox]: [bold]: [crows]: [bbq]: [winogender]: [truthfulqa]: [winobias]: [math]: [agieval]: [drop]: [big-bench]: [toxigen]: [eval-danger]:" +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-2-9b-it.json b/data/model_data_json/google_gemma-2-9b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..149ad6699ab0155667925c74a221647fb1978344 --- /dev/null +++ b/data/model_data_json/google_gemma-2-9b-it.json @@ -0,0 +1,42 @@ +{ + "model_id": "google/gemma-2-9b-it", + "downloads": 336099, + "tags": [ + "transformers", + "safetensors", + "gemma2", + "text-generation", + "conversational", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:2110.08193", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:1804.06876", + "arxiv:2103.03874", + "arxiv:2304.06364", + "arxiv:2206.04615", + "arxiv:2203.09509", + "base_model:google/gemma-2-9b", + "base_model:finetune:google/gemma-2-9b", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: text-generation extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. 
Requests are processed immediately. extra_gated_button_content: Acknowledge license tags: - conversational base_model: google/gemma-2-9b --- # Gemma 2 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma] **Terms of Use**: Terms **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Usage Below we share some code snippets on how to get quickly started with running the model. First, install the Transformers library with: Then, copy the snippet from the section that is relevant for your usecase. #### Running with the API #### Running the model on a single / multi GPU You can ensure the correct chat template is applied by using as follows: #### Running the model on a GPU using different precisions The native weights of this model were exported in precision. You can also use if you skip the dtype, but no precision increase will occur (model weights will just be upcasted to ). See examples below. * _Upcasting to _ #### Running the model through a CLI The local-gemma repository contains a lightweight wrapper around Transformers for running Gemma 2 through a command line interface, or CLI. Follow the installation instructions for getting started, then launch the CLI through the following command: #### Quantized Versions through
**Using 8-bit precision (int8)**
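A minimal sketch of such a snippet, assuming the standard `bitsandbytes` integration in Transformers (the prompt and generation settings are illustrative):

```python
# Hedged 8-bit loading sketch for google/gemma-2-9b-it.
# Assumes: pip install -U transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-9b-it')
model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-9b-it',
    quantization_config=quantization_config,
)

inputs = tokenizer('Write me a poem about Machine Learning.', return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```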
**Using 4-bit precision**
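A hedged 4-bit sketch; only the quantization config changes. NF4 with bfloat16 compute is shown as one common choice, not necessarily the card's original settings:

```python
# Hypothetical 4-bit variant for google/gemma-2-9b-it.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',              # assumption: NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-9b-it',
    quantization_config=quantization_config,
)
```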
#### Advanced Usage
**Torch compile** Torch compile is a method for speeding up the inference of PyTorch modules. The Gemma 2 model can be run up to 6x faster by leveraging torch compile. Note that two warm-up steps are required before the full inference speed is realised. For more details, refer to the Transformers documentation.
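A sketch of the general pattern, assuming recent PyTorch and Transformers versions; the cache and compile settings here are illustrative rather than prescribed:

```python
# Hedged torch.compile sketch: compile the forward pass, run two
# warm-up generations, then generate at full speed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-9b-it')
model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-9b-it', torch_dtype=torch.bfloat16
).to('cuda')

# Assumption: a static KV cache, which torch.compile generally needs
# to avoid recompiling at every decoding step.
model.generation_config.cache_implementation = 'static'
model.forward = torch.compile(model.forward, mode='reduce-overhead', fullgraph=True)

inputs = tokenizer('The theory of special relativity states', return_tensors='pt').to('cuda')
for _ in range(2):  # the two warm-up steps mentioned above
    model.generate(**inputs, max_new_tokens=16)

outputs = model.generate(**inputs, max_new_tokens=16)  # now at full speed
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```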
### Chat Template The instruction-tuned models use a chat template that must be adhered to for conversational use. The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet. Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction: At this point, the prompt contains the following text: As you can see, each turn is preceded by a delimiter and then the role of the entity (either , for content supplied by the user, or for LLM responses). Turns finish with the token. You can follow this format to build the prompt manually, if you need to do it without the tokenizer's chat template. After the prompt is ready, generation can be performed like this: ### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens and the 9B model was trained with 8 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of [Tensor Processing Unit (TPU)][tpu] hardware (TPUv5p). Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. 
You can distribute training across multiple TPU devices for faster and more efficient processing. * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for [foundation models][foundation-models], including large language models like these ones. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | Gemma PT 9B | Gemma PT 27B | | ------------------------------ | ------------- | ----------- | ------------ | | [MMLU][mmlu] | 5-shot, top-1 | 71.3 | 75.2 | | [HellaSwag][hellaswag] | 10-shot | 81.9 | 86.4 | | [PIQA][piqa] | 0-shot | 81.7 | 83.2 | | [SocialIQA][socialiqa] | 0-shot | 53.4 | 53.7 | | [BoolQ][boolq] | 0-shot | 84.2 | 84.8 | | [WinoGrande][winogrande] | partial score | 80.6 | 83.7 | | [ARC-e][arc] | 0-shot | 88.0 | 88.6 | | [ARC-c][arc] | 25-shot | 68.4 | 71.4 | | [TriviaQA][triviaqa] | 5-shot | 76.6 | 83.7 | | [Natural Questions][naturalq] | 5-shot | 29.2 | 34.5 | | [HumanEval][humaneval] | pass@1 | 40.2 | 51.8 | | [MBPP][mbpp] | 3-shot | 52.4 | 62.6 | | [GSM8K][gsm8k] | 5-shot, maj@1 | 68.6 | 74.0 | | [MATH][math] | 4-shot | 36.6 | 42.3 | | [AGIEval][agieval] | 3-5-shot | 52.8 | 55.1 | | [BIG-Bench][big-bench] | 3-shot, CoT | 68.2 | 74.9 | | ------------------------------ | ------------- | ----------- | ------------ | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. * Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as [WinoBias][winobias] and [BBQ Dataset][bbq]. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. * Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. 
### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting [internal policies][safety-policies] for categories such as child safety, content safety, representational harms, memorization, large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here. #### Gemma 2.0 | Benchmark | Metric | Gemma 2 IT 9B | Gemma 2 IT 27B | | ------------------------ | ------------- | --------------- | ---------------- | | [RealToxicity][realtox] | average | 8.25 | 8.84 | | [CrowS-Pairs][crows] | top-1 | 37.47 | 36.67 | | [BBQ Ambig][bbq] | 1-shot, top-1 | 88.58 | 85.99 | | [BBQ Disambig][bbq] | top-1 | 82.67 | 86.94 | | [Winogender][winogender] | top-1 | 79.17 | 77.22 | | [TruthfulQA][truthfulqa] | | 50.27 | 51.60 | | [Winobias 1_2][winobias] | | 78.09 | 81.94 | | [Winobias 2_2][winobias] | | 95.32 | 97.22 | | [Toxigen][toxigen] | | 39.30 | 38.42 | | ------------------------ | ------------- | --------------- | ---------------- | ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. * Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. 
They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit][rai-toolkit]. * Transparency and Accountability: * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: * Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. * Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives. 
[rai-toolkit]: [kaggle-gemma]: [terms]: [vertex-mg-gemma]: [sensitive-info]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [foundation-models]: [gemini-2-paper]: [mmlu]: [hellaswag]: [piqa]: [socialiqa]: [boolq]: [winogrande]: [commonsenseqa]: [openbookqa]: [arc]: [triviaqa]: [naturalq]: [humaneval]: [mbpp]: [gsm8k]: [realtox]: [bold]: [crows]: [bbq]: [winogender]: [truthfulqa]: [winobias]: [math]: [agieval]: [big-bench]: [toxigen]:" +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-2-9b.json b/data/model_data_json/google_gemma-2-9b.json new file mode 100644 index 0000000000000000000000000000000000000000..20bce3fac38891d01715a52736daa81c70f731ff --- /dev/null +++ b/data/model_data_json/google_gemma-2-9b.json @@ -0,0 +1,40 @@ +{ + "model_id": "google/gemma-2-9b", + "downloads": 89472, + "tags": [ + "transformers", + "safetensors", + "gemma2", + "text-generation", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:2110.08193", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:1804.06876", + "arxiv:2103.03874", + "arxiv:2304.06364", + "arxiv:2206.04615", + "arxiv:2203.09509", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: text-generation extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license --- # Gemma 2 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma] **Terms of Use**: Terms **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Usage Below we share some code snippets on how to get quickly started with running the model. First, install the Transformers library with: Then, copy the snippet from the section that is relevant for your usecase. #### Running with the API #### Running the model on a single / multi GPU #### Running the model through a CLI The local-gemma repository contains a lightweight wrapper around Transformers for running Gemma 2 through a command line interface, or CLI. 
Follow the installation instructions for getting started, then launch the CLI through the following command: #### Quantized Versions through
**Using 8-bit precision (int8)**
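A minimal hedged sketch for the base checkpoint, again assuming the `bitsandbytes` integration:

```python
# Hypothetical 8-bit loading sketch for the base google/gemma-2-9b model.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-9b',
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
```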
**Using 4-bit precision**
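And a corresponding hedged 4-bit sketch:

```python
# Hypothetical 4-bit loading sketch for the base google/gemma-2-9b model.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-9b',
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```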
#### Advanced Usage
**Torch compile** Torch compile is a method for speeding up the inference of PyTorch modules. The Gemma 2 model can be run up to 6x faster by leveraging torch compile. Note that two warm-up steps are required before the full inference speed is realised. For more details, refer to the Transformers documentation.
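The same pattern applies to the base checkpoint; a compressed sketch under the same assumptions as above:

```python
# Hedged sketch: compile google/gemma-2-9b's forward pass, warm up twice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-9b')
model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-9b', torch_dtype=torch.bfloat16
).to('cuda')

model.generation_config.cache_implementation = 'static'  # assumption, as above
model.forward = torch.compile(model.forward, mode='reduce-overhead', fullgraph=True)

inputs = tokenizer('Gemma is', return_tensors='pt').to('cuda')
for _ in range(2):  # warm-up steps
    model.generate(**inputs, max_new_tokens=16)
outputs = model.generate(**inputs, max_new_tokens=16)
```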
### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens and the 9B model was trained with 8 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of [Tensor Processing Unit (TPU)][tpu] hardware (TPUv5p). Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. 
This is specially suitable for [foundation models][foundation-models], including large language models like these ones. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | Gemma PT 9B | Gemma PT 27B | | ------------------------------ | ------------- | ----------- | ------------ | | [MMLU][mmlu] | 5-shot, top-1 | 71.3 | 75.2 | | [HellaSwag][hellaswag] | 10-shot | 81.9 | 86.4 | | [PIQA][piqa] | 0-shot | 81.7 | 83.2 | | [SocialIQA][socialiqa] | 0-shot | 53.4 | 53.7 | | [BoolQ][boolq] | 0-shot | 84.2 | 84.8 | | [WinoGrande][winogrande] | partial score | 80.6 | 83.7 | | [ARC-e][arc] | 0-shot | 88.0 | 88.6 | | [ARC-c][arc] | 25-shot | 68.4 | 71.4 | | [TriviaQA][triviaqa] | 5-shot | 76.6 | 83.7 | | [Natural Questions][naturalq] | 5-shot | 29.2 | 34.5 | | [HumanEval][humaneval] | pass@1 | 40.2 | 51.8 | | [MBPP][mbpp] | 3-shot | 52.4 | 62.6 | | [GSM8K][gsm8k] | 5-shot, maj@1 | 68.6 | 74.0 | | [MATH][math] | 4-shot | 36.6 | 42.3 | | [AGIEval][agieval] | 3-5-shot | 52.8 | 55.1 | | [BIG-Bench][big-bench] | 3-shot, CoT | 68.2 | 74.9 | | ------------------------------ | ------------- | ----------- | ------------ | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. * Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as [WinoBias][winobias] and [BBQ Dataset][bbq]. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. * Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. ### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting [internal policies][safety-policies] for categories such as child safety, content safety, representational harms, memorization, large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here. 
#### Gemma 2.0 | Benchmark | Metric | Gemma 2 IT 9B | Gemma 2 IT 27B | | ------------------------ | ------------- | --------------- | ---------------- | | [RealToxicity][realtox] | average | 8.25 | 8.84 | | [CrowS-Pairs][crows] | top-1 | 37.47 | 36.67 | | [BBQ Ambig][bbq] | 1-shot, top-1 | 88.58 | 85.99 | | [BBQ Disambig][bbq] | top-1 | 82.67 | 86.94 | | [Winogender][winogender] | top-1 | 79.17 | 77.22 | | [TruthfulQA][truthfulqa] | | 50.27 | 51.60 | | [Winobias 1_2][winobias] | | 78.09 | 81.94 | | [Winobias 2_2][winobias] | | 95.32 | 97.22 | | [Toxigen][toxigen] | | 39.30 | 38.42 | | ------------------------ | ------------- | --------------- | ---------------- | ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. * Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. 
These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model, see the [Responsible Generative AI Toolkit][rai-toolkit]. * Transparency and Accountability: * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: * Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. * Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives. [rai-toolkit]: [kaggle-gemma]: [terms]: [vertex-mg-gemma]: [sensitive-info]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [foundation-models]: [gemini-2-paper]: [mmlu]: [hellaswag]: [piqa]: [socialiqa]: [boolq]: [winogrande]: [commonsenseqa]: [openbookqa]: [arc]: [triviaqa]: [naturalq]: [humaneval]: [mbpp]: [gsm8k]: [realtox]: [bold]: [crows]: [bbq]: [winogender]: [truthfulqa]: [winobias]: [math]: [agieval]: [big-bench]: [toxigen]:", + "model_explanation_gemini": "Generates English-language text for tasks like question answering, summarization, and reasoning based on input prompts." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-2b-it.json b/data/model_data_json/google_gemma-2b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..179d120398b9ffa96e8d3c68d420901125db1035 --- /dev/null +++ b/data/model_data_json/google_gemma-2b-it.json @@ -0,0 +1,42 @@ +{ + "model_id": "google/gemma-2b-it", + "downloads": 95141, + "tags": [ + "transformers", + "safetensors", + "gguf", + "gemma", + "text-generation", + "conversational", + "arxiv:2312.11805", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2304.06364", + "arxiv:2206.04615", + "arxiv:1804.06876", + "arxiv:2110.08193", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:2203.09509", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: gemma new_version: google/gemma-2-2b-it widget: - messages: - role: user content: How does the brain work? inference: parameters: max_new_tokens: 200 extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged-in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license --- # Gemma Model Card **Model Page**: Gemma This model card corresponds to the 2B instruct version of the Gemma model. You can also visit the model card of the 2B base model, 7B base model, and 7B instruct model. **Resources and Technical Documentation**: * Responsible Generative AI Toolkit * Gemma on Kaggle * Gemma on Vertex Model Garden **Terms of Use**: Terms **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Usage Below we share some code snippets on how to get quickly started with running the model. First make sure to , then copy the snippet from the section that is relevant for your usecase. #### Running the model on a CPU As explained below, we recommend as the default dtype. You can use a different precision if necessary. #### Running the model on a single / multi GPU #### Running the model on a GPU using different precisions The native weights of this model were exported in precision. You can use , which may be faster on certain hardware, indicating the when loading the model. For convenience, the revision of the repo contains a copy of the weights already converted to that precision. 
You can also use if you skip the dtype, but no precision increase will occur (model weights will just be upcasted to ). See examples below. * _Using _ * _Upcasting to _ #### Quantized Versions through * _Using 8-bit precision (int8)_ * _Using 4-bit precision_ #### Other optimizations * _Flash Attention 2_ First make sure to install in your environment ### Chat Template The instruction-tuned models use a chat template that must be adhered to for conversational use. The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet. Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction: At this point, the prompt contains the following text: As you can see, each turn is preceded by a delimiter and then the role of the entity (either , for content supplied by the user, or for LLM responses). Turns finish with the token. You can follow this format to build the prompt manually, if you need to do it without the tokenizer's chat template. After the prompt is ready, generation can be performed like this: ### Fine-tuning You can find some fine-tuning scripts under the directory of []( repository. To adapt them to this model, simply change the model-id to . We provide: * A script to perform Supervised Fine-Tuning (SFT) on UltraChat dataset using QLoRA * A script to perform SFT using FSDP on TPU devices * A notebook that you can run on a free-tier Google Colab instance to perform SFT on the English quotes dataset ### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources, totaling 6 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safely in line with our policies. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5e). 
Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with Google's commitments to operate sustainably. ### Software Training was done using JAX and ML Pathways. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for foundation models, including large language models like these ones. Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | 2B Params | 7B Params | | ------------------------------ | ------------- | ----------- | --------- | | MMLU | 5-shot, top-1 | 42.3 | 64.3 | | HellaSwag | 0-shot |71.4 | 81.2 | | PIQA | 0-shot | 77.3 | 81.2 | | SocialIQA | 0-shot | 49.7 | 51.8 | | BooIQ | 0-shot | 69.4 | 83.2 | | WinoGrande | partial score | 65.4 | 72.3 | | CommonsenseQA | 7-shot | 65.3 | 71.3 | | OpenBookQA | | 47.8 | 52.8 | | ARC-e | | 73.2 | 81.5 | | ARC-c | | 42.1 | 53.2 | | TriviaQA | 5-shot | 53.2 | 63.4 | | Natural Questions | 5-shot | 12.5 | 23 | | HumanEval | pass@1 | 22.0 | 32.3 | | MBPP | 3-shot | 29.2 | 44.4 | | GSM8K | maj@1 | 17.7 | 46.4 | | MATH | 4-shot | 11.8 | 24.3 | | AGIEval | | 24.2 | 41.7 | | BIG-Bench | | 35.2 | 55.1 | | ------------------------------ | ------------- | ----------- | --------- | | **Average** | | **45.0** | **56.9** | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. 
* Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as WinoBias and BBQ Dataset. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. * Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. ### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting internal policies for categories such as child safety, content safety, representational harms, memorization, large-scale harms. On top of robust internal evaluations, the results of well known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here. | Benchmark | Metric | 2B Params | 7B Params | | ------------------------------ | ------------- | ----------- | --------- | | RealToxicity | average | 6.86 | 7.90 | | BOLD | | 45.57 | 49.08 | | CrowS-Pairs | top-1 | 45.82 | 51.33 | | BBQ Ambig | 1-shot, top-1 | 62.58 | 92.54 | | BBQ Disambig | top-1 | 54.62 | 71.99 | | Winogender | top-1 | 51.25 | 54.17 | | TruthfulQA | | 44.84 | 31.81 | | Winobias 1_2 | | 56.12 | 59.09 | | Winobias 2_2 | | 91.10 | 92.23 | | Toxigen | | 29.77 | 39.59 | | ------------------------------ | ------------- | ----------- | --------- | ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. * Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. 
* Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model, see the Responsible Generative AI Toolkit. * Transparency and Accountability: * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: * Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. * Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the Gemma Prohibited Use Policy. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives.", + "model_explanation_gemini": "Generates text responses for conversational tasks like question answering and summarization using a lightweight, instruction-tuned language model." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-2b.json b/data/model_data_json/google_gemma-2b.json new file mode 100644 index 0000000000000000000000000000000000000000..a77ef5c1df1dd0732ab1d4a1b33579b7e9e283e5 --- /dev/null +++ b/data/model_data_json/google_gemma-2b.json @@ -0,0 +1,40 @@ +{ + "model_id": "google/gemma-2b", + "downloads": 397089, + "tags": [ + "transformers", + "safetensors", + "gguf", + "gemma", + "text-generation", + "arxiv:2312.11805", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2304.06364", + "arxiv:2206.04615", + "arxiv:1804.06876", + "arxiv:2110.08193", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:2203.09509", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers new_version: google/gemma-2-2b license: gemma extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license --- # Gemma Model Card **Model Page**: Gemma This model card corresponds to the 2B base version of the Gemma model. You can also visit the model card of the 7B base model, 7B instruct model, and 2B instruct model. **Resources and Technical Documentation**: * Gemma Technical Report * Responsible Generative AI Toolkit * Gemma on Kaggle * Gemma on Vertex Model Garden **Terms of Use**: Terms **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Context Length Models are trained on a context length of 8192 tokens. ### Usage Below we share some code snippets on how to quickly get started with running the model. First, make sure to install the Transformers library, then copy the snippet from the section that is relevant for your use case. #### Fine-tuning the model You can find fine-tuning scripts and a notebook under the directory of the repository. To adapt them to this model, simply change the model-id to google/gemma-2b.
In that repository, we provide: * A script to perform Supervised Fine-Tuning (SFT) on the UltraChat dataset using QLoRA * A script to perform SFT using FSDP on TPU devices * A notebook that you can run on a free-tier Google Colab instance to perform SFT on an English quotes dataset #### Running the model on a CPU #### Running the model on a single / multi GPU #### Running the model on a GPU using different precisions * _Using _ * _Using _ #### Quantized Versions through * _Using 8-bit precision (int8)_ * _Using 4-bit precision_ #### Other optimizations * _Flash Attention 2_ First make sure to install in your environment ### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources, totaling 6 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safety in line with our policies. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5e). Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing.
* Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with Google's commitments to operate sustainably. ### Software Training was done using JAX and ML Pathways. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | 2B Params | 7B Params | | ------------------------------ | ------------- | ----------- | --------- | | MMLU | 5-shot, top-1 | 42.3 | 64.3 | | HellaSwag | 0-shot | 71.4 | 81.2 | | PIQA | 0-shot | 77.3 | 81.2 | | SocialIQA | 0-shot | 49.7 | 51.8 | | BoolQ | 0-shot | 69.4 | 83.2 | | WinoGrande | partial score | 65.4 | 72.3 | | CommonsenseQA | 7-shot | 65.3 | 71.3 | | OpenBookQA | | 47.8 | 52.8 | | ARC-e | | 73.2 | 81.5 | | ARC-c | | 42.1 | 53.2 | | TriviaQA | 5-shot | 53.2 | 63.4 | | Natural Questions | 5-shot | 12.5 | 23 | | HumanEval | pass@1 | 22.0 | 32.3 | | MBPP | 3-shot | 29.2 | 44.4 | | GSM8K | maj@1 | 17.7 | 46.4 | | MATH | 4-shot | 11.8 | 24.3 | | AGIEval | | 24.2 | 41.7 | | BIG-Bench | | 35.2 | 55.1 | | ------------------------------ | ------------- | ----------- | --------- | | **Average** | | **45.0** | **56.9** | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. * Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as WinoBias and BBQ Dataset. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. * Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. ### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting internal policies for categories such as child safety, content safety, representational harms, memorization, and large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here.
**Update**: These numbers reflect the results of the updated v1.1 IT models. For the original v1 numbers, please consult the technical report's appendix. | Benchmark | Metric | Gemma v1.1 IT 2B | Gemma v1.1 IT 7B | | ------------------------------ | ------------- | ----------- | --------- | | RealToxicity | average | 6.86 | 7.90 | | BOLD | | 45.57 | 49.08 | | CrowS-Pairs | top-1 | 45.82 | 51.33 | | BBQ Ambig | 1-shot, top-1 | 62.58 | 92.54 | | BBQ Disambig | top-1 | 54.62 | 71.99 | | Winogender | top-1 | 51.25 | 54.17 | | TruthfulQA | | 31.81 | 44.84 | | Winobias 1_2 | | 56.12 | 59.09 | | Winobias 2_2 | | 91.10 | 92.23 | | Toxigen | | 29.77 | 39.59 | | ------------------------------ | ------------- | ----------- | --------- | ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. * Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns.
In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, with input data pre-processing described and posterior evaluations reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model; see the Responsible Generative AI Toolkit. * Transparency and Accountability: * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: * Perpetuation of biases: Continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. * Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the Gemma Prohibited Use Policy. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably-sized open model alternatives."
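To ground the Usage subsections of this card (running on CPU/GPU and the 8-bit/4-bit quantized versions), here is a minimal sketch of loading google/gemma-2b with Transformers, with an optional 4-bit load via bitsandbytes. The dtype, device mapping, prompt, and generation length are illustrative assumptions rather than the card's own snippets.

```python
# Sketch: loading google/gemma-2b on GPU in bfloat16, plus an optional
# 4-bit variant. Settings are illustrative assumptions, not card values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bfloat16 load spread across available devices (requires accelerate).
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 4-bit alternative (requires the bitsandbytes package):
# quant_config = BitsAndBytesConfig(load_in_4bit=True)
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, quantization_config=quant_config, device_map="auto"
# )

# Base models continue raw text; no chat template is applied.
inputs = tokenizer("The key advantage of TPUs for LLM training is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```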
+} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-3-12b-it.json b/data/model_data_json/google_gemma-3-12b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..334db91be5f1feafbdc7c27d71468610b5a5c6be --- /dev/null +++ b/data/model_data_json/google_gemma-3-12b-it.json @@ -0,0 +1,46 @@ +{ + "model_id": "google/gemma-3-12b-it", + "downloads": 354924, + "tags": [ + "transformers", + "safetensors", + "gemma3", + "image-text-to-text", + "conversational", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-12b-pt", + "base_model:finetune:google/gemma-3-12b-pt", + "license:gemma", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: image-text-to-text extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license base_model: google/gemma-3-12b-pt --- # Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. 
### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Usage Below are some code snippets on how to quickly get started with running the model. First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0. Then, copy the snippet from the section that is relevant for your use case. #### Running with the API You can initialize the model and processor for inference as follows. With instruction-tuned models, you need to use chat templates to process your inputs first. Then, you can pass it to the pipeline. #### Running the model on a single / multi GPU ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. - Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training.
This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. - These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9
| 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety:** Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\" which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team, to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts.
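The Usage section of this card describes initializing the model and processor through chat templates and the pipeline API. A minimal sketch of multimodal inference with google/gemma-3-12b-it via the image-text-to-text pipeline (supported from transformers 4.50.0, per the card) might look as follows; the image URL and generation length are placeholder assumptions.

```python
# Sketch: multimodal (image + text -> text) inference with Gemma 3 12B IT.
# Requires transformers >= 4.50.0; URL and settings are illustrative.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-12b-it", device_map="auto")

# Chat-formatted input mixing an image and a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},  # placeholder URL
            {"type": "text", "text": "Describe what this image shows."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=128)
# The pipeline returns the full chat; the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```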
## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. - Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, with input data pre-processing described and posterior evaluations reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability: - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
- A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably-sized open model alternatives." +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-3-1b-it.json b/data/model_data_json/google_gemma-3-1b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..fa32da37288bbde9d79d5f7fb4d1a0f1fe05abe7 --- /dev/null +++ b/data/model_data_json/google_gemma-3-1b-it.json @@ -0,0 +1,46 @@ +{ + "model_id": "google/gemma-3-1b-it", + "downloads": 2410194, + "tags": [ + "transformers", + "safetensors", + "gemma3_text", + "text-generation", + "conversational", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-1b-pt", + "base_model:finetune:google/gemma-3-1b-pt", + "license:gemma", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: text-generation extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license base_model: google/gemma-3-1b-pt --- # Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Usage Below are some code snippets on how to quickly get started with running the model. First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0. Then, copy the snippet from the section that is relevant for your use case. #### Running with the API With instruction-tuned models, you need to use chat templates to process your inputs first. Then, you can pass it to the pipeline. #### Running the model on a single / multi GPU ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. - Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. - These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results.
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | ## Ethics and Safety Ethics and
safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety:** Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\" which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team, to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, with input data pre-processing described and posterior evaluations reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability: - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
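To make the chat-oriented use cases described under Intended Usage concrete, here is a minimal sketch of chat-templated text generation with google/gemma-3-1b-it via the Transformers pipeline API (transformers 4.50.0 or later, per this card); the messages and generation settings are illustrative assumptions.

```python
# Sketch: chat generation with the 1B instruction-tuned Gemma 3 model.
# Requires transformers >= 4.50.0; settings are illustrative.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation", model="google/gemma-3-1b-it",
    device_map="auto", torch_dtype=torch.bfloat16,
)

# The pipeline applies the chat template to message lists automatically.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what a 32K-token context window means."},
]

out = pipe(messages, max_new_tokens=96)
# generated_text holds the full chat; the last turn is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```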
### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably-sized open model alternatives." +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-3-1b-pt.json b/data/model_data_json/google_gemma-3-1b-pt.json new file mode 100644 index 0000000000000000000000000000000000000000..e8c86fca0f8d957aeb0a0f27457fb57549caa5c1 --- /dev/null +++ b/data/model_data_json/google_gemma-3-1b-pt.json @@ -0,0 +1,43 @@ +{ + "model_id": "google/gemma-3-1b-pt", + "downloads": 154238, + "tags": [ + "transformers", + "safetensors", + "gemma3_text", + "text-generation", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "license:gemma", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: text-generation extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license --- # Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.
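Ahead of the Usage section below, here is a minimal sketch of plain (non-chat) generation with the pre-trained checkpoint google/gemma-3-1b-pt using Transformers 4.50.0 or later; base models continue raw text rather than chat-formatted input, and the prompt and decoding settings here are illustrative assumptions.

```python
# Sketch: raw-text continuation with the pre-trained Gemma 3 1B model.
# Requires transformers >= 4.50.0; prompt and settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-pt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# No chat template: the base model simply continues the prompt.
inputs = tokenizer("Tensor Processing Units are", return_tensors="pt").to(model.device)
ids = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```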
### Usage Below are some code snippets on how to quickly get started with running the model. First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0. Then, copy the snippet from the section that is relevant for your use case. #### Running with the API #### Running the model on a single / multi GPU ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. - Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models.
You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. - These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results. ### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | [hellaswag]: [boolq]: [piqa]: [socialiqa]: [triviaqa]: [naturalq]: [arc]: [winogrande]: [bbh]: [drop]: #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | [mmlu]: [agieval]: [math]: [gsm8k]: [gpqa]: [mbpp]: [humaneval]: #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8
| | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | [mgsm]: [flores]: [xquad]: [global-mmlu-lite]: [wmt24pp]: [eclektic]: [indicgenbench]: #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | [coco-cap]: [docvqa]: [info-vqa]: [mmmu]: [textvqa]: [realworldqa]: [remi]: [ai2d]: [chartqa]: [vqav2]: [blinkvqa]: [okvqa]: [tallyqa]: [ss-vqa]: [countbenchqa]: ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety:** Evaluation of text-to-text and image-to-text prompts covering safety policies, including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies, including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\" which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team, to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains.
The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. - Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability: - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem.
Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development, compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives. [g3-tech-report]: [rai-toolkit]: [kaggle-gemma]: [vertex-mg-gemma3]: [terms]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [gemini-2-paper]:" +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-3-27b-it.json b/data/model_data_json/google_gemma-3-27b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..10aa108de79758afb7160407251af9f7935cdde3 --- /dev/null +++ b/data/model_data_json/google_gemma-3-27b-it.json @@ -0,0 +1,46 @@ +{ + "model_id": "google/gemma-3-27b-it", + "downloads": 395187, + "tags": [ + "transformers", + "safetensors", + "gemma3", + "image-text-to-text", + "conversational", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-27b-pt", + "base_model:finetune:google/gemma-3-27b-pt", + "license:gemma", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: image-text-to-text extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license base_model: google/gemma-3-27b-pt --- # Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone. ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Usage Below are some code snippets to help you get started quickly with running the model. First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0. Then, copy the snippet from the section that is relevant for your use case. #### Running with the API You can initialize the model and processor for inference as follows. With instruction-tuned models, you need to use chat templates to process your inputs first. Then, you can pass the result to the pipeline. #### Running the model on a single/multi GPU ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B model with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. - Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. - These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results.
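To make the chat-template requirement from the usage section above concrete, here is a minimal sketch of the multimodal pipeline flow. It assumes transformers >= 4.50 and an accepted license; the image URL and generation length are illustrative placeholders, not values from this card:

```python
# Minimal sketch: the image-text-to-text pipeline applies the chat template
# to the messages before generation. Assumption: transformers >= 4.50.
from transformers import pipeline

pipe = pipeline(\"image-text-to-text\", model=\"google/gemma-3-27b-it\")
messages = [{
    \"role\": \"user\",
    \"content\": [
        {\"type\": \"image\", \"url\": \"https://example.com/photo.jpg\"},  # placeholder URL
        {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"},
    ],
}]
print(pipe(text=messages, max_new_tokens=64))
```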
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | [hellaswag]: [boolq]: [piqa]: [socialiqa]: [triviaqa]: [naturalq]: [arc]: [winogrande]: [bbh]: [drop]: #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | [mmlu]: [agieval]: [math]: [gsm8k]: [gpqa]: [mbpp]: [humaneval]: #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | [mgsm]: [flores]: [xquad]: [global-mmlu-lite]: [wmt24pp]: [eclektic]: [indicgenbench]: #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | [coco-cap]: [docvqa]: [info-vqa]: [mmmu]: [textvqa]: [realworldqa]: [remi]: [ai2d]: [chartqa]: [vqav2]: [blinkvqa]: [okvqa]: [tallyqa]: [ss-vqa]: [countbenchqa]: ## Ethics and Safety Ethics and 
safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety:** Evaluation of text-to-text and image-to-text prompts covering safety policies, including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies, including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\" which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team, to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability: - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development, compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives. [g3-tech-report]: [rai-toolkit]: [kaggle-gemma]: [vertex-mg-gemma3]: [terms]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [gemini-2-paper]:" +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-3-4b-it.json b/data/model_data_json/google_gemma-3-4b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..806a8bb1be8921d37a8af538c7c19e82af9984bb --- /dev/null +++ b/data/model_data_json/google_gemma-3-4b-it.json @@ -0,0 +1,46 @@ +{ + "model_id": "google/gemma-3-4b-it", + "downloads": 576385, + "tags": [ + "transformers", + "safetensors", + "gemma3", + "image-text-to-text", + "conversational", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-4b-pt", + "base_model:finetune:google/gemma-3-4b-pt", + "license:gemma", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: gemma library_name: transformers pipeline_tag: image-text-to-text extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license base_model: google/gemma-3-4b-pt --- # Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone. ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Usage Below are some code snippets to help you get started quickly with running the model. First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0. Then, copy the snippet from the section that is relevant for your use case. #### Running with the API You can initialize the model and processor for inference as follows. With instruction-tuned models, you need to use chat templates to process your inputs first. Then, you can pass the result to the pipeline. #### Running the model on a single/multi GPU ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B model with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. - Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power.
TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. - These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results.
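The \"single controller\" idea quoted in the Software section above, where one Python process drives every device, can be illustrated with a toy JAX sketch. This is purely illustrative of the programming model and is not the Gemma training code; the loss, shapes, and data are made up:

```python
# Toy single-controller data parallelism: one Python process describes the
# computation, and JAX maps it across all local devices. Illustrative only.
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.mean((x @ w) ** 2)

def grad_step(w, x):
    g = jax.grad(loss)(w, x)                     # local gradient on each device
    return jax.lax.pmean(g, axis_name=\"devices\")  # average across devices

grad_fn = jax.pmap(grad_step, axis_name=\"devices\", in_axes=(None, 0))
w = jnp.ones((4, 2))                              # replicated parameters
x = jnp.ones((jax.local_device_count(), 8, 4))    # one data shard per device
print(grad_fn(w, x).shape)  # (num_devices, 4, 2): identical averaged gradients
```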
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | [hellaswag]: [boolq]: [piqa]: [socialiqa]: [triviaqa]: [naturalq]: [arc]: [winogrande]: [bbh]: [drop]: #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | [mmlu]: [agieval]: [math]: [gsm8k]: [gpqa]: [mbpp]: [humaneval]: #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | [mgsm]: [flores]: [xquad]: [global-mmlu-lite]: [wmt24pp]: [eclektic]: [indicgenbench]: #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | [coco-cap]: [docvqa]: [info-vqa]: [mmmu]: [textvqa]: [realworldqa]: [remi]: [ai2d]: [chartqa]: [vqav2]: [blinkvqa]: [okvqa]: [tallyqa]: [ss-vqa]: [countbenchqa]: ## Ethics and Safety Ethics and 
safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety:** Evaluation of text-to-text and image-to-text prompts covering safety policies, including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies, including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\" which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team, to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability: - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
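The input spec earlier in this card (images normalized to 896 x 896 and encoded to 256 tokens each, within a 128K-token input window) implies a simple context budget. A rough check, where the prompt length is an assumed figure rather than a number from the card:

```python
# Rough context budgeting from the card's numbers: 256 tokens per image and a
# 128K-token input window for the 4B/12B/27B sizes. Prompt size is assumed.
CONTEXT_WINDOW = 128 * 1024   # 131072 tokens
TOKENS_PER_IMAGE = 256
text_tokens = 4_000           # assumed prompt/document length

budget = CONTEXT_WINDOW - text_tokens
max_images = budget // TOKENS_PER_IMAGE
print(f\"{max_images} images fit alongside a {text_tokens}-token prompt\")  # 496
```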
### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development, compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives. [g3-tech-report]: [rai-toolkit]: [kaggle-gemma]: [vertex-mg-gemma3]: [terms]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [gemini-2-paper]:" +} \ No newline at end of file diff --git a/data/model_data_json/google_gemma-7b-it.json b/data/model_data_json/google_gemma-7b-it.json new file mode 100644 index 0000000000000000000000000000000000000000..ccab3cfecc77e3b95ae3f137406dc95e34511e70 --- /dev/null +++ b/data/model_data_json/google_gemma-7b-it.json @@ -0,0 +1,43 @@ +{ + "model_id": "google/gemma-7b-it", + "downloads": 80370, + "tags": [ + "transformers", + "safetensors", + "gguf", + "gemma", + "text-generation", + "conversational", + "arxiv:2312.11805", + "arxiv:2009.03300", + "arxiv:1905.07830", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1905.10044", + "arxiv:1907.10641", + "arxiv:1811.00937", + "arxiv:1809.02789", + "arxiv:1911.01547", + "arxiv:1705.03551", + "arxiv:2107.03374", + "arxiv:2108.07732", + "arxiv:2110.14168", + "arxiv:2304.06364", + "arxiv:2206.04615", + "arxiv:1804.06876", + "arxiv:2110.08193", + "arxiv:2009.11462", + "arxiv:2101.11718", + "arxiv:1804.09301", + "arxiv:2109.07958", + "arxiv:2203.09509", + "base_model:google/gemma-7b", + "base_model:finetune:google/gemma-7b", + "license:gemma", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: gemma tags: [] widget: - messages: - role: user content: How does the brain work? inference: parameters: max_new_tokens: 200 extra_gated_heading: Access Gemma on Hugging Face extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license base_model: google/gemma-7b base_model_relation: finetune --- # Gemma Model Card **Model Page**: Gemma This model card corresponds to the 7B instruct version of the Gemma model. You can also visit the model card of the 2B base model, 7B base model, and 2B instruct model. **Resources and Technical Documentation**: * Responsible Generative AI Toolkit * Gemma on Kaggle * Gemma on Vertex Model Garden **Terms of Use**: Terms **Authors**: Google ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
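Ahead of the usage notes that follow, here is a minimal sketch of the chat-template flow for this instruction-tuned checkpoint. It assumes a CUDA-capable device with bfloat16 weights, and reuses the prompt and generation length from the widget metadata above; treat it as an illustrative starting point rather than the card's exact snippet:

```python
# Minimal chat-template sketch: the tokenizer's built-in template inserts the
# turn delimiters and roles for you. Assumptions: CUDA device, bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = \"google/gemma-7b-it\"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map=\"auto\"
)

chat = [{\"role\": \"user\", \"content\": \"How does the brain work?\"}]
inputs = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors=\"pt\"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```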
### Usage Below we share some code snippets on how to get started quickly with running the model. First make sure the required dependencies are installed, then copy the snippet from the section that is relevant for your use case. #### Fine-tuning the model You can find fine-tuning scripts and a notebook in the linked repository. To adapt them to this model, simply change the model ID. In that repository, we provide: * A script to perform Supervised Fine-Tuning (SFT) on the UltraChat dataset using QLoRA * A script to perform SFT using FSDP on TPU devices * A notebook that you can run on a free-tier Google Colab instance to perform SFT on an English quotes dataset #### Running the model on a CPU As explained below, we recommend the default dtype; you can use a different precision if necessary. #### Running the model on a single / multi GPU #### Running the model on a GPU using different precisions The native weights of this model were exported in a specific precision. You can use a different precision, which may be faster on certain hardware, by indicating the dtype when loading the model. For convenience, a revision of the repo contains a copy of the weights already converted to that precision. You can also skip the dtype, but no precision increase will occur (the model weights will simply be upcast). #### Quantized Versions * _Using 8-bit precision (int8)_ * _Using 4-bit precision_ #### Other optimizations * _Flash Attention 2_ First make sure to install the required package in your environment ### Chat Template The instruction-tuned models use a chat template that must be adhered to for conversational use. The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet. Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction: At this point, the prompt contains the following text: As you can see, each turn is preceded by a delimiter and then the role of the entity (the user for content supplied by the user, or the model for LLM responses). Turns finish with an end-of-turn token. You can follow this format to build the prompt manually, if you need to do it without the tokenizer's chat template. After the prompt is ready, generation can be performed like this: ### Inputs and outputs * **Input:** Text string, such as a question, a prompt, or a document to be summarized. * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document. ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources, totaling 6 trillion tokens. Here are the key components: * Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. * Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. * Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats.
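As one concrete instance of the quantized versions mentioned in the usage section above, here is a minimal 4-bit loading sketch. It assumes the bitsandbytes integration in Transformers is installed and a CUDA GPU is available; the prompt is illustrative:

```python
# Minimal 4-bit loading sketch (assumption: bitsandbytes installed, CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(\"google/gemma-7b-it\")
model = AutoModelForCausalLM.from_pretrained(
    \"google/gemma-7b-it\", quantization_config=quant_config, device_map=\"auto\"
)
inputs = tokenizer(\"Write me a poem about Machine Learning.\", return_tensors=\"pt\").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```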
### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. * Additional methods: Filtering based on content quality and safety in line with our policies. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5e). Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: * Performance: TPUs are specifically designed to handle the massive computations involved in training LLMs. They can speed up training considerably compared to CPUs. * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. * These advantages are aligned with Google's commitments to operate sustainably. ### Software Training was done using JAX and ML Pathways. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models; \"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\" ## Evaluation Model evaluation metrics and results.
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: | Benchmark | Metric | 2B Params | 7B Params | | ------------------------------ | ------------- | ----------- | --------- | | MMLU | 5-shot, top-1 | 42.3 | 64.3 | | HellaSwag | 0-shot | 71.4 | 81.2 | | PIQA | 0-shot | 77.3 | 81.2 | | SocialIQA | 0-shot | 49.7 | 51.8 | | BoolQ | 0-shot | 69.4 | 83.2 | | WinoGrande | partial score | 65.4 | 72.3 | | CommonsenseQA | 7-shot | 65.3 | 71.3 | | OpenBookQA | | 47.8 | 52.8 | | ARC-e | | 73.2 | 81.5 | | ARC-c | | 42.1 | 53.2 | | TriviaQA | 5-shot | 53.2 | 63.4 | | Natural Questions | 5-shot | 12.5 | 23 | | HumanEval | pass@1 | 22.0 | 32.3 | | MBPP | 3-shot | 29.2 | 44.4 | | GSM8K | maj@1 | 17.7 | 46.4 | | MATH | 4-shot | 11.8 | 24.3 | | AGIEval | | 24.2 | 41.7 | | BIG-Bench | | 35.2 | 55.1 | | **Average** | | **45.0** | **56.9** | ## Ethics and Safety Ethics and safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Text-to-Text Content Safety: Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech. * Text-to-Text Representational Harms: Benchmark against relevant academic datasets such as WinoBias and the BBQ Dataset. * Memorization: Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure. * Large-scale harm: Tests for \"dangerous capabilities,\" such as chemical, biological, radiological, and nuclear (CBRN) risks. ### Evaluation Results The results of ethics and safety evaluations are within acceptable thresholds for meeting internal policies for categories such as child safety, content safety, representational harms, memorization, and large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here. | Benchmark | Metric | 2B Params | 7B Params | | ------------------------------ | ------------- | ----------- | --------- | | RealToxicity | average | 6.86 | 7.90 | | BOLD | | 45.57 | 49.08 | | CrowS-Pairs | top-1 | 45.82 | 51.33 | | BBQ Ambig | 1-shot, top-1 | 62.58 | 92.54 | | BBQ Disambig | top-1 | 54.62 | 71.99 | | Winogender | top-1 | 51.25 | 54.17 | | TruthfulQA | | 44.84 | 31.81 | | Winobias 1_2 | | 56.12 | 59.09 | | Winobias 2_2 | | 91.10 | 92.23 | | Toxigen | | 29.77 | 39.59 | ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development.
* Content Creation and Communication * Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. * Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. * Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. * Research and Education * Natural Language Processing (NLP) Research: These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field. * Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. * Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations * Training Data * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. * The scope of the training dataset determines the subject areas the model can handle effectively. * Context and Task Complexity * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). * Language Ambiguity and Nuance * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * Factual Accuracy * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * Common Sense * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card. * Misinformation and Misuse * LLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model; see the Responsible Generative AI Toolkit. * Transparency and Accountability * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: * Perpetuation of biases: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. * Generation of harmful content: Mechanisms and guidelines for content safety are essential.
Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the Gemma Prohibited Use Policy. * Privacy violations: Models were trained on data filtered for removal of PII (Personally Identifiable Information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Benefits At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development, compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably-sized open model alternatives." +} \ No newline at end of file diff --git a/data/model_data_json/google_metricx-23-large-v2p0.json b/data/model_data_json/google_metricx-23-large-v2p0.json new file mode 100644 index 0000000000000000000000000000000000000000..697923f2fdc416b93b347b48594d0ae9fd9e4d51 --- /dev/null +++ b/data/model_data_json/google_metricx-23-large-v2p0.json @@ -0,0 +1,14 @@ +{ + "model_id": "google/metricx-23-large-v2p0", + "downloads": 440700, + "tags": [ + "transformers", + "pytorch", + "mt5", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 --- # MetricX-23 *This is not an officially supported Google product.* **GitHub repository:** This repository contains the MetricX-23 models, a family of models for automatic evaluation of translations that were proposed in the WMT'23 Metrics Shared Task submission MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task. The models were trained in T5X and then converted for use in PyTorch. ## Available Models There are 6 models available on HuggingFace that vary in the number of parameters and whether or not the model is reference-based or reference-free (also known as quality estimation, or QE): * MetricX-23-XXL * MetricX-23-XL * MetricX-23-Large * MetricX-23-QE-XXL * MetricX-23-QE-XL * MetricX-23-QE-Large We recommend using the XXL model versions for the best agreement with human judgments of translation quality, the Large versions for best speed, and the XL for an intermediate use case. ## Changes to the WMT'23 Submission The models available here are most similar to the primary submission to the WMT'23 Metrics Shared Task. They are initialized with mT5 and then fine-tuned on a combination of direct assessment and MQM data. However, we made some changes that make these models different from the WMT'23 submissions. First, the models are trained to regress the actual MQM score rather than a normalized score between 0 and 1. **That means the output from the MetricX-23 models is a score in the range [0, 25] where lower is better (i.e., it predicts an error score).** Second, these models were trained with a larger variety of synthetic data that makes them more robust to translation edge cases like over- and undertranslation, described in more detail in the following section.
### Synthetic Data In order for our MetricX models to learn to identify certain types of bad translations that are not sufficiently (or at all) represented in the regular training data, we created synthetic examples and mixed them in during training. The synthetic training data was generated from the DA datasets ranging from WMT15 to WMT21 (~43 language pairs). In most cases, the synthetic examples have the candidate translation manipulated so as to turn it into a bad translation with a specific issue commonly unrecognized by learned metrics. The table below provides an overview of the various failure modes that we considered, including brief descriptions of how we prepared the synthetic data to address them. | Failure mode | Synthetic example description | | ----------- | ----------- | | Undertranslation | Candidate translation with an arbitrary sentence removed (if multi-sentence); alternatively, candidate with a certain proportion of words removed from the end. | | Overtranslation | Candidate translation duplicated (with space in between). | | Fluent but unrelated translation | Arbitrary reference of a similar length from the dataset. | | Gibberish | Text of a similar length as the reference, generated by sampling words from the reference translation vocabulary (built from all references in the data). | | Missing punctuation | Reference translation with the end punctuation removed (11 punctuation symbols considered). | | Latin instead of Chinese/Japanese or Hindi/Bengali punctuation | Candidate translation with the language-specific punctuation symbol at the end replaced with the Latin equivalent (e.g., \".\" instead of \"。\" or \"।\"); alternatively, the punctuation symbol is replaced with the Latin equivalent in the reference, keeping the correct one in the candidate. | | Reference-matching translation | Reference translation copied as the candidate translation (unlike the rest of the synthetic data, these examples are meant to train the metric to predict a perfect score for candidates matching the reference). | Examples from the first 4 categories were assigned a label corresponding to the worst score on the given rating scale (e.g., 25 when mixed with MQM training data), whereas the reference-matching translation examples are assigned the best score (e.g., 0 when used with MQM data). The missing/incorrect punctuation examples were labeled with a score slightly worse than perfect. Note that some of the synthetic datasets are only meaningful in the reference-based scenario, and we thus excluded them when training a QE variant of MetricX. These are the Latin-vs-special punctuation and the reference-matching translation examples. Most of the synthetic training sets were created using stratified sampling across target languages, taking 500 examples per target language. One exception is the missing punctuation set, which used a stratified sample across different punctuation symbols instead. When training MetricX, a small proportion of the synthetic examples was mixed with the regular training examples. During the first-stage fine-tuning on DA data, each synthetic training set constituted between 0.1% and 1% of all training examples, whereas in the second-stage fine-tuning on MQM data we used an even smaller proportion, around 0.05%. As for evaluating the effect of the synthetic training data on the model's performance, the DEMETR challenge set - which we originally used to evaluate the models submitted to the WMT23 Metrics Shared Task - was no longer adequate.
We therefore created a new DEMETR-style test set based on the WMT22 DA data, with examples constructed analogously to the synthetic training examples, as described above. This test set helped us determine the right proportions of synthetic data for fine-tuning in order to make MetricX robust to the failure modes in consideration, without sacrificing the system- and segment-level correlations with human ratings. ## Usage The code for using MetricX models can be found in the GitHub repository mentioned above. The repository contains example prediction scripts, described below. The script contains an example for how to run inference on the models. ### Reference-Based Example usage for a reference-based model: is expected to have one serialized JSON object per line with and fields. The output jsonl will be parallel to but additionally contain a field with the predicted score. Note that the model was trained with a maximum input length of 1024 tokens, so significantly increasing that value may lead to unpredictable behavior. ### Reference-Free Example usage for a reference-free model: is expected to have one serialized JSON object per line with and fields. The output jsonl will be parallel to but additionally contain a field with the predicted score. ## Meta-Evaluation The script contains code to calculate various correlations between the MetricX-23 scores and MQM ratings of translation quality using the MT Metrics Eval library. Example usage: is expected to have one JSON object serialized per line. Each JSON object is expected to contain 4 fields: * : The name of the system that generated the translation. * : The 0-based index of the corresponding segment in the MT Metrics Eval data. * : The ground-truth translation quality score (where higher is better). * : The model-predicted translation quality score (where lower is better; the script negates the scores so that higher is better). The script will calculate the 4 agreement/correlations that were used in the WMT'23 Shared Task.
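Since the repository's meta-evaluation script is not reproduced here, the following is a hedged sketch of the system- and segment-level Pearson correlations described above. The JSONL field names (`system`, `label`, `prediction`) and the file name are assumptions for illustration, as the card elides the real ones; this is not the official script:

```python
# Hedged sketch of the correlations described above; NOT the official
# MetricX meta-evaluation code. Field names ("system", "label",
# "prediction") are assumptions -- the card elides the actual ones.
import json
from collections import defaultdict

from scipy.stats import pearsonr

def load_records(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

records = load_records("predictions.jsonl")  # hypothetical file name

# MetricX predicts an error score (lower is better), so negate it to
# align its direction with the human ratings (higher is better).
labels = [r["label"] for r in records]
preds = [-r["prediction"] for r in records]

# Segment-level Pearson: correlate per-segment scores directly.
seg_pearson, _ = pearsonr(labels, preds)

# System-level Pearson: average the scores per system first.
by_system = defaultdict(lambda: ([], []))
for r in records:
    by_system[r["system"]][0].append(r["label"])
    by_system[r["system"]][1].append(-r["prediction"])
sys_labels = [sum(l) / len(l) for l, _ in by_system.values()]
sys_preds = [sum(p) / len(p) for _, p in by_system.values()]
sys_pearson, _ = pearsonr(sys_labels, sys_preds)

print(f"segment-level Pearson: {seg_pearson:.3f}")
print(f"system-level Pearson:  {sys_pearson:.3f}")
```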
Below are the results for the MetricX-23 models on the WMT'22 Metrics Shared Task data: English-German: | Model | System-Level Accuracy | System-Level Pearson | Segment-Level Pearson | Segment-Level Pairwise Acc | | ----------- | ----------- | ----------- | ----------- | ----------- | | MetricX-23-XXL | 0.795 | 0.835 | 0.546 | 0.619 | | MetricX-23-XL | 0.756 | 0.813 | 0.540 | 0.605 | | MetricX-23-Large | 0.769 | 0.759 | 0.507 | 0.595 | | MetricX-23-QE-XXL | 0.769 | 0.830 | 0.490 | 0.606 | | MetricX-23-QE-XL | 0.718 | 0.684 | 0.421 | 0.594 | | MetricX-23-QE-Large | 0.744 | 0.671 | 0.387 | 0.579 | English-Russian: | Model | System-Level Accuracy | System-Level Pearson | Segment-Level Pearson | Segment-Level Pairwise Acc | | ----------- | ----------- | ----------- | ----------- | ----------- | | MetricX-23-XXL | 0.905 | 0.943 | 0.477 | 0.609 | | MetricX-23-XL | 0.876 | 0.906 | 0.498 | 0.589 | | MetricX-23-Large | 0.876 | 0.841 | 0.474 | 0.569 | | MetricX-23-QE-XXL | 0.895 | 0.940 | 0.470 | 0.602 | | MetricX-23-QE-XL | 0.848 | 0.861 | 0.415 | 0.570 | | MetricX-23-QE-Large | 0.819 | 0.778 | 0.411 | 0.551 | Chinese-English: | Model | System-Level Accuracy | System-Level Pearson | Segment-Level Pearson | Segment-Level Pairwise Acc | | ----------- | ----------- | ----------- | ----------- | ----------- | | MetricX-23-XXL | 0.868 | 0.919 | 0.605 | 0.551 | | MetricX-23-XL | 0.868 | 0.924 | 0.584 | 0.543 | | MetricX-23-Large | 0.857 | 0.919 | 0.555 | 0.539 | | MetricX-23-QE-XXL | 0.857 | 0.928 | 0.573 | 0.544 | | MetricX-23-QE-XL | 0.802 | 0.879 | 0.546 | 0.529 | | MetricX-23-QE-Large | 0.758 | 0.904 | 0.522 | 0.529 | The script re-calculates the average correlation score that was used to rank submissions from the WMT'23 Shared Task. Example usage: Each of the 3 input files is expected to be in the same format as described above. Each file should correspond to running inference on one of the language pairs from the WMT'23 dataset. The results for each of the models are the following: | Model | Average Correlation | | ----------- | ----------- | | MetricX-23-XXL | 0.812 | | MetricX-23-XL | 0.813 | | MetricX-23-Large | 0.794 | | MetricX-23-QE-XXL | 0.797 | | MetricX-23-QE-XL | 0.767 | | MetricX-23-QE-Large | 0.762 | ## Citation If you use MetricX-23 in your research, please cite the following publication:", "model_explanation_gemini": "Evaluates translation quality by predicting error scores in the range [0, 25], supporting both reference-based and reference-free assessment." +} \ No newline at end of file diff --git a/data/model_data_json/google_metricx-23-qe-large-v2p0.json b/data/model_data_json/google_metricx-23-qe-large-v2p0.json new file mode 100644 index 0000000000000000000000000000000000000000..d94b9a7a0eef05805e6b67660607b2a6c7457cd2 --- /dev/null +++ b/data/model_data_json/google_metricx-23-qe-large-v2p0.json @@ -0,0 +1,14 @@ +{ + "model_id": "google/metricx-23-qe-large-v2p0", + "downloads": 625137, + "tags": [ + "transformers", + "pytorch", + "mt5", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 --- # MetricX-23 *This is not an officially supported Google product.* **GitHub repository:** This repository contains the MetricX-23 models, a family of models for automatic evaluation of translations that were proposed in the WMT'23 Metrics Shared Task submission MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task. The models were trained in T5X and then converted for use in PyTorch.
## Available Models There are 6 models available on HuggingFace that vary in the number of parameters and whether or not the model is reference-based or reference-free (also known as quality estimation, or QE): * MetricX-23-XXL * MetricX-23-XL * MetricX-23-Large * MetricX-23-QE-XXL * MetricX-23-QE-XL * MetricX-23-QE-Large We recommend using the XXL model versions for the best agreement with human judgments of translation quality, the Large versions for best speed, and the XL for an intermediate use case. ## Changes to the WMT'23 Submission The models available here are most similar to the primary submission to the WMT'23 Metrics Shared Task. They are initialized with mT5 and then fine-tuned on a combination of direct assessment and MQM data. However, we made some changes that make these models different from the WMT'23 submissions. First, the models are trained to regress the actual MQM score rather than a normalized score between 0 and 1. **That means the output from the MetricX-23 models is a score in the range [0, 25] where lower is better (i.e., it predicts an error score).** Second, these models were trained with a larger variety of synthetic data that makes them more robust to translation edge cases like over- and undertranslation, described in more detail in the following section. ### Synthetic Data In order for our MetricX models to learn to identify certain types of bad translations that are not sufficiently (or at all) represented in the regular training data, we created synthetic examples and mixed them in during training. The synthetic training data was generated from the DA datasets ranging from WMT15 to WMT21 (~43 language pairs). In most cases, the synthetic examples have the candidate translation manipulated so as to turn it into a bad translation with a specific issue commonly unrecognized by learned metrics. The table below provides an overview of the various failure modes that we considered, including brief descriptions of how we prepared the synthetic data to address them. | Failure mode | Synthetic example description | | ----------- | ----------- | | Undertranslation | Candidate translation with an arbitrary sentence removed (if multi-sentence); alternatively, candidate with a certain proportion of words removed from the end. | | Overtranslation | Candidate translation duplicated (with space in between). | | Fluent but unrelated translation | Arbitrary reference of a similar length from the dataset. | | Gibberish | Text of a similar length as the reference, generated by sampling words from the reference translation vocabulary (built from all references in the data). | | Missing punctuation | Reference translation with the end punctuation removed (11 punctuation symbols considered). | | Latin instead of Chinese/Japanese or Hindi/Bengali punctuation | Candidate translation with the language-specific punctuation symbol at the end replaced with the Latin equivalent (e.g., \".\" instead of \"。\" or \"।\"); alternatively, the punctuation symbol is replaced with the Latin equivalent in the reference, keeping the correct one in the candidate. | | Reference-matching translation | Reference translation copied as the candidate translation (unlike the rest of the synthetic data, these examples are meant to train the metric to predict a perfect score for candidates matching the reference).
| Examples from the first 4 categories were assigned a label corresponding to the worst score on the given rating scale (e.g., 25 when mixed with MQM training data), whereas the reference-matching translation examples are assigned the best score (e.g., 0 when used with MQM data). The missing/incorrect punctuation examples were labeled with a score slightly worse than perfect. Note that some of the synthetic datasets are only meaningful in the reference-based scenario, and we thus excluded them when training a QE variant of MetricX. These are the Latin-vs-special punctuation and the reference-matching translation examples. Most of the synthetic training sets were created using stratified sampling across target languages, taking 500 examples per target language. One exception is the missing punctuation set, which used a stratified sample across different punctuation symbols instead. When training MetricX, a small proportion of the synthetic examples was mixed with the regular training examples. During the first-stage fine-tuning on DA data, each synthetic training set constituted between 0.1% and 1% of all training examples, whereas in the second-stage fine-tuning on MQM data we used an even smaller proportion, around 0.05%. As for evaluating the effect of the synthetic training data on the model's performance, the DEMETR challenge set - which we originally used to evaluate the models submitted to the WMT23 Metrics Shared Task - was no longer adequate. We therefore created a new DEMETR-style test set based on the WMT22 DA data, with examples constructed analogously to the synthetic training examples, as described above. This test set helped us determine the right proportions of synthetic data for fine-tuning in order to make MetricX robust to the failure modes in consideration, without sacrificing the system- and segment-level correlations with human ratings. ## Usage The code for using MetricX models can be found in the GitHub repository mentioned above. The repository contains example prediction scripts, described below. The script contains an example for how to run inference on the models. ### Reference-Based Example usage for a reference-based model: is expected to have one serialized JSON object per line with and fields. The output jsonl will be parallel to but additionally contain a field with the predicted score. Note that the model was trained with a maximum input length of 1024 tokens, so significantly increasing that value may lead to unpredictable behavior. ### Reference-Free Example usage for a reference-free model: is expected to have one serialized JSON object per line with and fields. The output jsonl will be parallel to but additionally contain a field with the predicted score. ## Meta-Evaluation The script contains code to calculate various correlations between the MetricX-23 scores and MQM ratings of translation quality using the MT Metrics Eval library. Example usage: is expected to have one JSON object serialized per line. Each JSON object is expected to contain 4 fields: * : The name of the system that generated the translation. * : The 0-based index of the corresponding segment in the MT Metrics Eval data. * : The ground-truth translation quality score (where higher is better). * : The model-predicted translation quality score (where lower is better; the script negates the scores so that higher is better). The script will calculate the 4 agreement/correlations that were used in the WMT'23 Shared Task.
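To make the JSONL input format above concrete, here is a hedged sketch of writing an input file for the reference-free (QE) prediction script. The field names `source` and `hypothesis` are assumptions for illustration, since the card elides the actual ones:

```python
# Hedged sketch of preparing a JSONL input for the reference-free (QE)
# prediction flow described above. Field names ("source", "hypothesis")
# and the file name are illustrative assumptions, not taken from the card.
import json

examples = [
    {"source": "Der Hund schläft.", "hypothesis": "The dog is sleeping."},
    {"source": "Es regnet stark.", "hypothesis": "It rains cats."},
]

with open("qe_input.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        # One serialized JSON object per line, as the card specifies.
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# After running the prediction script, the output JSONL is parallel to the
# input, with an added field holding the predicted score in [0, 25],
# where lower means fewer errors.
```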
Below are the results for the MetricX-23 models on the WMT'22 Metrics Shared Task data: English-German: | Model | System-Level Accuracy | System-Level Pearson | Segment-Level Pearson | Segment-Level Pairwise Acc | | ----------- | ----------- | ----------- | ----------- | ----------- | | MetricX-23-XXL | 0.795 | 0.835 | 0.546 | 0.619 | | MetricX-23-XL | 0.756 | 0.813 | 0.540 | 0.605 | | MetricX-23-Large | 0.769 | 0.759 | 0.507 | 0.595 | | MetricX-23-QE-XXL | 0.769 | 0.830 | 0.490 | 0.606 | | MetricX-23-QE-XL | 0.718 | 0.684 | 0.421 | 0.594 | | MetricX-23-QE-Large | 0.744 | 0.671 | 0.387 | 0.579 | English-Russian: | Model | System-Level Accuracy | System-Level Pearson | Segment-Level Pearson | Segment-Level Pairwise Acc | | ----------- | ----------- | ----------- | ----------- | ----------- | | MetricX-23-XXL | 0.905 | 0.943 | 0.477 | 0.609 | | MetricX-23-XL | 0.876 | 0.906 | 0.498 | 0.589 | | MetricX-23-Large | 0.876 | 0.841 | 0.474 | 0.569 | | MetricX-23-QE-XXL | 0.895 | 0.940 | 0.470 | 0.602 | | MetricX-23-QE-XL | 0.848 | 0.861 | 0.415 | 0.570 | | MetricX-23-QE-Large | 0.819 | 0.778 | 0.411 | 0.551 | Chinese-English: | Model | System-Level Accuracy | System-Level Pearson | Segment-Level Pearson | Segment-Level Pairwise Acc | | ----------- | ----------- | ----------- | ----------- | ----------- | | MetricX-23-XXL | 0.868 | 0.919 | 0.605 | 0.551 | | MetricX-23-XL | 0.868 | 0.924 | 0.584 | 0.543 | | MetricX-23-Large | 0.857 | 0.919 | 0.555 | 0.539 | | MetricX-23-QE-XXL | 0.857 | 0.928 | 0.573 | 0.544 | | MetricX-23-QE-XL | 0.802 | 0.879 | 0.546 | 0.529 | | MetricX-23-QE-Large | 0.758 | 0.904 | 0.522 | 0.529 | The script re-calculates the average correlation score that was used to rank submissions from the WMT'23 Shared Task. Example usage: Each of the 3 input files is expected to be in the same format as described above. Each file should correspond to running inference on one of the language pairs from the WMT'23 dataset. The results for each of the models are the following: | Model | Average Correlation | | ----------- | ----------- | | MetricX-23-XXL | 0.812 | | MetricX-23-XL | 0.813 | | MetricX-23-Large | 0.794 | | MetricX-23-QE-XXL | 0.797 | | MetricX-23-QE-XL | 0.767 | | MetricX-23-QE-Large | 0.762 | ## Citation If you use MetricX-23 in your research, please cite the following publication:", "model_explanation_gemini": "Evaluates translation quality by predicting error scores in the range [0, 25], supporting both reference-based and reference-free (quality estimation) approaches." +} \ No newline at end of file diff --git a/data/model_data_json/google_mobilebert-uncased.json b/data/model_data_json/google_mobilebert-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..47d28b80dfe47a33267338339624efc52fe47c68 --- /dev/null +++ b/data/model_data_json/google_mobilebert-uncased.json @@ -0,0 +1,18 @@ +{ + "model_id": "google/mobilebert-uncased", + "downloads": 172098, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "mobilebert", + "pretraining", + "en", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en thumbnail: license: apache-2.0 --- ## MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices MobileBERT is a thin version of BERT_LARGE, equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. This checkpoint is the original MobileBert Optimized Uncased English: uncased_L-24_H-128_B-512_A-4_F-4_OPT checkpoint.
## How to use MobileBERT in", + "model_explanation_gemini": "MobileBERT is a compact, task-agnostic version of BERT_LARGE designed for resource-limited devices, featuring bottleneck structures and optimized self-attention-to-feed-forward network balance." +} \ No newline at end of file diff --git a/data/model_data_json/google_mt5-base.json b/data/model_data_json/google_mt5-base.json new file mode 100644 index 0000000000000000000000000000000000000000..ffde1378ac7a421e2525ae8caae4b8ed9f812462 --- /dev/null +++ b/data/model_data_json/google_mt5-base.json @@ -0,0 +1,122 @@ +{ + "model_id": "google/mt5-base", + "downloads": 138129, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "mt5", + "text2text-generation", + "multilingual", + "af", + "am", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "co", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fil", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "haw", + "hi", + "hmn", + "ht", + "hu", + "hy", + "ig", + "is", + "it", + "iw", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lb", + "lo", + "lt", + "lv", + "mg", + "mi", + "mk", + "ml", + "mn", + "mr", + "ms", + "mt", + "my", + "ne", + "nl", + "no", + "ny", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sd", + "si", + "sk", + "sl", + "sm", + "sn", + "so", + "sq", + "sr", + "st", + "su", + "sv", + "sw", + "ta", + "te", + "tg", + "th", + "tr", + "uk", + "und", + "ur", + "uz", + "vi", + "xh", + "yi", + "yo", + "zh", + "zu", + "dataset:mc4", + "arxiv:2010.11934", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - az - be - bg - bn - ca - ceb - co - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fil - fr - fy - ga - gd - gl - gu - ha - haw - hi - hmn - ht - hu - hy - ig - is - it - iw - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lb - lo - lt - lv - mg - mi - mk - ml - mn - mr - ms - mt - my - ne - nl - no - ny - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - sm - sn - so - sq - sr - st - su - sv - sw - ta - te - tg - th - tr - uk - und - ur - uz - vi - xh - yi - yo - zh - zu datasets: - mc4 license: apache-2.0 --- Google's mT5 mT5 is pretrained on the mC4 corpus, covering 101 languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu. **Note**: mT5 was only pre-trained on mC4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task. 
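Since the note above stresses that the raw mT5 checkpoint must be fine-tuned before use, here is a hedged, minimal sketch of loading it as a fine-tuning starting point with the standard Transformers classes; the task prefix and target text are illustrative assumptions, not part of the card:

```python
# Minimal sketch of loading mT5 as a starting point for fine-tuning,
# since the raw pre-trained checkpoint is not directly usable downstream.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_id = "google/mt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MT5ForConditionalGeneration.from_pretrained(model_id)

# One supervised text-to-text step; the task framing below (prefix and
# target) is an illustrative assumption, not something the card defines.
inputs = tokenizer("summarize: A long multilingual article ...", return_tensors="pt")
targets = tokenizer("A short summary.", return_tensors="pt")

loss = model(**inputs, labels=targets.input_ids).loss
loss.backward()  # plug into your optimizer or a Trainer from here
```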
Pretraining Dataset: mC4 Other Community Checkpoints: here Paper: mT5: A massively multilingual pre-trained text-to-text transformer Authors: *Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel* ## Abstract The recent \"Text-to-Text Transfer Transformer\" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.", + "model_explanation_gemini": "Google's mT5-base is a multilingual text-to-text transformer model pretrained on 101 languages for various NLP tasks, requiring fine-tuning for specific downstream applications." +} \ No newline at end of file diff --git a/data/model_data_json/google_mt5-large.json b/data/model_data_json/google_mt5-large.json new file mode 100644 index 0000000000000000000000000000000000000000..45dae0f0b215f95bf7f1ae0655274045180937b2 --- /dev/null +++ b/data/model_data_json/google_mt5-large.json @@ -0,0 +1,122 @@ +{ + "model_id": "google/mt5-large", + "downloads": 498512, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "mt5", + "text2text-generation", + "multilingual", + "af", + "am", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "co", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fil", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "haw", + "hi", + "hmn", + "ht", + "hu", + "hy", + "ig", + "is", + "it", + "iw", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lb", + "lo", + "lt", + "lv", + "mg", + "mi", + "mk", + "ml", + "mn", + "mr", + "ms", + "mt", + "my", + "ne", + "nl", + "no", + "ny", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sd", + "si", + "sk", + "sl", + "sm", + "sn", + "so", + "sq", + "sr", + "st", + "su", + "sv", + "sw", + "ta", + "te", + "tg", + "th", + "tr", + "uk", + "und", + "ur", + "uz", + "vi", + "xh", + "yi", + "yo", + "zh", + "zu", + "dataset:mc4", + "arxiv:2010.11934", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - az - be - bg - bn - ca - ceb - co - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fil - fr - fy - ga - gd - gl - gu - ha - haw - hi - hmn - ht - hu - hy - ig - is - it - iw - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lb - lo - lt - lv - mg - mi - mk - ml - mn - mr - ms - mt - my - ne - nl - no - ny - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - sm - sn - so - sq - sr - st - su - sv - sw - ta - te - tg - th - tr - uk - und - ur - uz - vi - xh - yi - yo - zh - zu datasets: - mc4 license: apache-2.0 --- Google's mT5 mT5 is pretrained on the mC4 corpus, covering 101 languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, 
Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu. **Note**: mT5 was only pre-trained on mC4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task. Pretraining Dataset: mC4 Other Community Checkpoints: here Paper: mT5: A massively multilingual pre-trained text-to-text transformer Authors: *Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel* ## Abstract The recent \"Text-to-Text Transfer Transformer\" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.", + "model_explanation_gemini": "A multilingual text-to-text transformer model pretrained on 101 languages for various NLP tasks, requiring fine-tuning for specific downstream applications." +} \ No newline at end of file diff --git a/data/model_data_json/google_mt5-small.json b/data/model_data_json/google_mt5-small.json new file mode 100644 index 0000000000000000000000000000000000000000..5173782e818209bebe2199ad472cdf9bbf1a261e --- /dev/null +++ b/data/model_data_json/google_mt5-small.json @@ -0,0 +1,123 @@ +{ + "model_id": "google/mt5-small", + "downloads": 142199, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "mt5", + "text2text-generation", + "multilingual", + "af", + "am", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "co", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fil", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "haw", + "hi", + "hmn", + "ht", + "hu", + "hy", + "ig", + "is", + "it", + "iw", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lb", + "lo", + "lt", + "lv", + "mg", + "mi", + "mk", + "ml", + "mn", + "mr", + "ms", + "mt", + "my", + "ne", + "nl", + "no", + "ny", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sd", + "si", + "sk", + "sl", + "sm", + "sn", + "so", + "sq", + "sr", + "st", + "su", + "sv", + "sw", + "ta", + "te", + "tg", + "th", + "tr", + "uk", + "und", + "ur", + "uz", + "vi", + "xh", + "yi", + "yo", + "zh", + "zu", + "dataset:mc4", + "arxiv:2010.11934", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - az - be - bg - bn - ca - ceb - co - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fil - fr - fy - ga - gd - gl - gu - ha - haw - hi - hmn - ht - hu - hy - ig - is - it - iw - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lb - lo - lt - lv - mg - mi - mk - ml - mn - mr - ms - mt - my - ne - nl - no - ny - pa - pl - ps - pt - ro - ru - sd - 
si - sk - sl - sm - sn - so - sq - sr - st - su - sv - sw - ta - te - tg - th - tr - uk - und - ur - uz - vi - xh - yi - yo - zh - zu datasets: - mc4 license: apache-2.0 --- Google's mT5 mT5 is pretrained on the mC4 corpus, covering 101 languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu. **Note**: mT5 was only pre-trained on mC4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task. Pretraining Dataset: mC4 Other Community Checkpoints: here Paper: mT5: A massively multilingual pre-trained text-to-text transformer Authors: *Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel* ## Abstract The recent \"Text-to-Text Transfer Transformer\" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.", + "model_explanation_gemini": "Google's mT5-small is a multilingual text-to-text transformer model pretrained on 101 languages for various NLP tasks, requiring fine-tuning for specific downstream applications." +} \ No newline at end of file diff --git a/data/model_data_json/google_owlv2-base-patch16-ensemble.json b/data/model_data_json/google_owlv2-base-patch16-ensemble.json new file mode 100644 index 0000000000000000000000000000000000000000..7c489def99f9a8b4b7af885ec9a29f645b25942d --- /dev/null +++ b/data/model_data_json/google_owlv2-base-patch16-ensemble.json @@ -0,0 +1,17 @@ +{ + "model_id": "google/owlv2-base-patch16-ensemble", + "downloads": 925283, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "owlv2", + "zero-shot-object-detection", + "vision", + "arxiv:2306.09683", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - zero-shot-object-detection inference: false --- # Model Card: OWLv2 ## Model Details The OWLv2 model (short for Open-World Localization) was proposed in Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, Neil Houlsby. OWLv2, like OWL-ViT, is a zero-shot text-conditioned object detection model that can be used to query an image with one or multiple text queries. 
The model uses CLIP as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection. ### Model Date June 2023 ### Model Type The model uses a CLIP backbone with a ViT-B/16 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. The CLIP backbone is trained from scratch and fine-tuned together with the box and class prediction heads with an object detection objective. ### Documents - OWLv2 Paper ### Use with Transformers ## Model Use ### Intended Use The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, text-conditioned object detection. We also hope it can be used for interdisciplinary studies of the potential impact of such models, especially in areas that commonly require identifying objects whose label is unavailable during training. #### Primary intended uses The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ## Data The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as YFCC100M. A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as COCO and OpenImages. (to be updated for v2) ### BibTeX entry and citation info", + "model_explanation_gemini": "OWLv2 is a zero-shot text-conditioned object detection model that uses a CLIP backbone to detect objects in images based on text queries without requiring labeled training data for those objects." 
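The card's \"Use with Transformers\" snippet above is elided from this dump; below is a hedged sketch of the zero-shot, text-conditioned query flow it describes, assuming the standard transformers OWLv2 classes (the image URL and score threshold are illustrative). The OWL-ViT card later in this file follows the same pattern with the analogous OwlViTProcessor and OwlViTForObjectDetection classes.

```python
# Hedged sketch of zero-shot text-conditioned detection with OWLv2;
# treat the URL and threshold as illustrative placeholders.
import requests
import torch
from PIL import Image
from transformers import Owlv2Processor, Owlv2ForObjectDetection

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example COCO image
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]  # one or more text queries per image

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes into thresholded detections in pixel coordinates.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.2
)
for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    print(f"{texts[0][label]}: {score:.2f} at {box.tolist()}")
```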
+} \ No newline at end of file diff --git a/data/model_data_json/google_owlv2-large-patch14-ensemble.json b/data/model_data_json/google_owlv2-large-patch14-ensemble.json new file mode 100644 index 0000000000000000000000000000000000000000..6b94c40af05841a7864011c6e1e3525f93d41896 --- /dev/null +++ b/data/model_data_json/google_owlv2-large-patch14-ensemble.json @@ -0,0 +1,17 @@ +{ + "model_id": "google/owlv2-large-patch14-ensemble", + "downloads": 265478, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "owlv2", + "zero-shot-object-detection", + "vision", + "arxiv:2306.09683", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - zero-shot-object-detection inference: false --- # Model Card: OWLv2 ## Model Details The OWLv2 model (short for Open-World Localization) was proposed in Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, Neil Houlsby. OWLv2, like OWL-ViT, is a zero-shot text-conditioned object detection model that can be used to query an image with one or multiple text queries. The model uses CLIP as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection. ### Model Date June 2023 ### Model Type The model uses a CLIP backbone with a ViT-L/14 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. The CLIP backbone is trained from scratch and fine-tuned together with the box and class prediction heads with an object detection objective. ### Documents - OWLv2 Paper ### Use with Transformers ## Model Use ### Intended Use The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, text-conditioned object detection. We also hope it can be used for interdisciplinary studies of the potential impact of such models, especially in areas that commonly require identifying objects whose label is unavailable during training. #### Primary intended uses The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ## Data The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as YFCC100M. A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. 
The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as COCO and OpenImages. (to be updated for v2) ### BibTeX entry and citation info", + "model_explanation_gemini": "OWLv2 is a zero-shot text-conditioned object detection model that uses a CLIP backbone to detect objects in images based on text queries without requiring training on specific labels." +} \ No newline at end of file diff --git a/data/model_data_json/google_owlvit-base-patch32.json b/data/model_data_json/google_owlvit-base-patch32.json new file mode 100644 index 0000000000000000000000000000000000000000..5c6d3d0aca72c4b1f667faf27669b10945cfb1db --- /dev/null +++ b/data/model_data_json/google_owlvit-base-patch32.json @@ -0,0 +1,17 @@ +{ + "model_id": "google/owlvit-base-patch32", + "downloads": 781508, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "owlvit", + "zero-shot-object-detection", + "vision", + "arxiv:2205.06230", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - zero-shot-object-detection inference: false --- # Model Card: OWL-ViT ## Model Details The OWL-ViT (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is a zero-shot text-conditioned object detection model that can be used to query an image with one or multiple text queries. OWL-ViT uses CLIP as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection. ### Model Date May 2022 ### Model Type The model uses a CLIP backbone with a ViT-B/32 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. The CLIP backbone is trained from scratch and fine-tuned together with the box and class prediction heads with an object detection objective. ### Documents - OWL-ViT Paper ### Use with Transformers ## Model Use ### Intended Use The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, text-conditioned object detection. We also hope it can be used for interdisciplinary studies of the potential impact of such models, especially in areas that commonly require identifying objects whose label is unavailable during training. #### Primary intended uses The primary intended users of these models are AI researchers. 
We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ## Data The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as YFCC100M. A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as COCO and OpenImages. ### BibTeX entry and citation info", "model_explanation_gemini": "Detects objects in images based on text queries without requiring prior training on those specific objects, using a CLIP backbone and ViT architecture for zero-shot, open-vocabulary localization." +} \ No newline at end of file diff --git a/data/model_data_json/google_paligemma-3b-mix-224.json b/data/model_data_json/google_paligemma-3b-mix-224.json new file mode 100644 index 0000000000000000000000000000000000000000..3f5b5a1f4b771d5b75301a8896a23823f5a2e58d --- /dev/null +++ b/data/model_data_json/google_paligemma-3b-mix-224.json @@ -0,0 +1,35 @@ +{ + "model_id": "google/paligemma-3b-mix-224", + "downloads": 176245, + "tags": [ + "transformers", + "safetensors", + "paligemma", + "image-text-to-text", + "arxiv:2310.09199", + "arxiv:2303.15343", + "arxiv:2403.08295", + "arxiv:1706.03762", + "arxiv:2010.11929", + "arxiv:2209.06794", + "arxiv:2209.04372", + "arxiv:2103.01913", + "arxiv:2205.12522", + "arxiv:2110.11624", + "arxiv:2108.03353", + "arxiv:2010.04295", + "arxiv:2401.06209", + "arxiv:2305.10355", + "arxiv:2203.10244", + "arxiv:1810.12440", + "arxiv:1905.13648", + "arxiv:1608.00272", + "arxiv:1908.04913", + "arxiv:2407.07726", + "license:gemma", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: gemma pipeline_tag: image-text-to-text extra_gated_heading: Access PaliGemma on Hugging Face extra_gated_prompt: To access PaliGemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged-in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license --- # PaliGemma model card **Model page:** PaliGemma Transformers PaliGemma 3B weights, fine-tuned with 224*224 input images and 256 token input/output text sequences on a mixture of downstream academic datasets. The models are available in float32, bfloat16 and float16 format for research purposes only. **Resources and technical documentation:** * Responsible Generative AI Toolkit * PaliGemma on Kaggle * PaliGemma on Vertex Model Garden **Terms of Use:** Terms **Authors:** Google ## Model information ### Model summary #### Description PaliGemma is a versatile and lightweight vision-language model (VLM) inspired by PaLI-3 and based on open components such as the SigLIP vision model and the Gemma language model. It takes both image and text as input and generates text as output, supporting multiple languages. It is designed for class-leading fine-tune performance on a wide range of vision-language tasks such as image and short video captioning, visual question answering, text reading, object detection and object segmentation.
#### Model architecture PaliGemma is the composition of a Transformer decoder and a Vision Transformer image encoder, with a total of 3 billion params. The text decoder is initialized from Gemma-2B. The image encoder is initialized from SigLIP-So400m/14. PaliGemma is trained following the PaLI-3 recipes. #### Inputs and outputs * **Input:** Image and text string, such as a prompt to caption the image, or a question. * **Output:** Generated text in response to the input, such as a caption of the image, an answer to a question, a list of object bounding box coordinates, or segmentation codewords. ### Model data #### Pre-train datasets PaliGemma is pre-trained on the following mixture of datasets: * **WebLI:** WebLI (Web Language Image) is a web-scale multilingual image-text dataset built from the public web. A wide range of WebLI splits are used to acquire versatile model capabilities, such as visual semantic understanding, object localization, visually-situated text understanding, multilinguality, etc. * **CC3M-35L:** Curated English image-alt_text pairs from webpages (Sharma et al., 2018). We used the Google Cloud Translation API to translate into 34 additional languages. * **VQ²A-CC3M-35L/VQG-CC3M-35L:** A subset of VQ2A-CC3M (Changpinyo et al., 2022a), translated into the same additional 34 languages as CC3M-35L, using the Google Cloud Translation API. * **OpenImages:** Detection and object-aware questions and answers (Piergiovanni et al. 2022) generated by handcrafted rules on the [OpenImages dataset]. * **WIT:** Images and texts collected from Wikipedia (Srinivasan et al., 2021). [OpenImages dataset]: #### Data responsibility filtering The following filters are applied to WebLI, with the goal of training PaliGemma on clean data: * **Pornographic image filtering:** This filter removes images deemed to be of pornographic nature. * **Text safety filtering:** We identify and filter out images that are paired with unsafe text. Unsafe text is any text deemed to contain or be about CSAI, pornography, vulgarities, or otherwise offensive. * **Text toxicity filtering:** We further use the Perspective API to identify and filter out images that are paired with text deemed insulting, obscene, hateful or otherwise toxic. * **Text personal information filtering:** We filtered certain personal information and other sensitive data using Cloud Data Loss Prevention (DLP) API to protect the privacy of individuals. Identifiers such as social security numbers and [other sensitive information types] were removed. * **Additional methods:** Filtering based on content quality and safety in line with our policies and practices. [other sensitive information types]: ## How to Use PaliGemma is a single-turn vision language model not meant for conversational use, and it works best when fine-tuning to a specific use case. You can configure which task the model will solve by conditioning it with task prefixes, such as “detect” or “segment”. The pretrained models were trained in this fashion to imbue them with a rich set of capabilities (question answering, captioning, segmentation, etc.). However, they are not designed to be used directly, but to be transferred (by fine-tuning) to specific tasks using a similar prompt structure. For interactive testing, you can use the \"mix\" family of models, which have been fine-tuned on a mixture of tasks. To see model google/paligemma-3b-mix-448 in action, check this Space that uses the Transformers codebase. 
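The code snippets referenced in this card were stripped during extraction. As a placeholder, here is a minimal sketch of single-turn inference with a task prefix, using the transformers PaliGemma classes; the prompt, image URL, and generation settings are illustrative assumptions, not from the original card.

```python
# Hedged sketch of single-turn PaliGemma inference, conditioned with a task prefix.
# The prompt and image URL are illustrative assumptions.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "caption en"  # task prefixes such as "detect" or "segment" select the task

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=30)

# Decode only the newly generated tokens; the prompt is echoed back in the output.
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```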
Please refer to the usage and limitations section for intended use cases, or visit the blog post for additional details and examples. ## Use in Transformers The following snippets use the model for reference purposes. The model in the repo you are now browsing may have been trained for other tasks; please make sure you use appropriate inputs for the task at hand. ### Running the default precision (float32) on CPU Output: ### Running other precisions on CUDA For convenience, the repos contain revisions of the weights already converted to bfloat16 and float16, so you can use them to reduce the download size and avoid casting on your local computer. This is how you'd run bfloat16 on an NVIDIA CUDA card. ### Loading in 4-bit / 8-bit You need to install bitsandbytes to automatically run inference using 8-bit or 4-bit precision: ## Implementation information ### Hardware PaliGemma was trained using the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5e). ### Software Training was done using JAX, Flax, TFDS and big_vision. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. TFDS is used to access datasets and Flax is used for the model architecture. The PaliGemma fine-tune code and inference code are released in the big_vision GitHub repository. ## Evaluation information ### Benchmark results In order to verify the transferability of PaliGemma to a wide variety of academic tasks, we fine-tune the pretrained models on each task. Additionally, we train the mix model with a mixture of the transfer tasks. We report results at different resolutions to provide an impression of which tasks benefit from increased resolution. Importantly, none of these tasks or datasets are part of the pretraining data mixture, and their images are explicitly removed from the web-scale pre-training data. #### Single task (fine-tune on single task)
| Benchmark (train split) | Metric (split) | pt-224 | pt-448 | pt-896 |
|---|---|---|---|---|
| **Captioning** | | | | |
| COCO captions (train+restval) | CIDEr (val) | 141.92 | 144.60 | |
| NoCaps (Eval of COCO captions transfer) | CIDEr (val) | 121.72 | 123.58 | |
| COCO-35L (train) | CIDEr dev (en/avg-34/avg) | 139.2 / 115.8 / 116.4 | 141.2 / 118.0 / 118.6 | |
| XM3600 (Eval of COCO-35L transfer) | CIDEr dev (en/avg-34/avg) | 78.1 / 41.3 / 42.4 | 80.0 / 41.9 / 42.9 | |
| TextCaps (train) | CIDEr (val) | 127.48 | 153.94 | |
| SciCap (first sentence, no subfigure) (train+val) | CIDEr/BLEU-4 (test) | 162.25 / 0.192 | 181.49 / 0.211 | |
| Screen2words (train+dev) | CIDEr (test) | 117.57 | 119.59 | |
| Widget Captioning (train+dev) | CIDEr (test) | 136.07 | 148.36 | |
| **Question answering** | | | | |
| VQAv2 (train+validation) | Accuracy (Test server - std) | 83.19 | 85.64 | |
| MMVP (Eval of PaliGemma-PT transfer) | Paired Accuracy | 47.33 | 45.33 | |
| POPE (Eval of PaliGemma-PT transfer) | Accuracy (random/popular/adversarial) | 87.80 / 85.87 / 84.27 | 88.23 / 86.77 / 85.90 | |
| OKVQA (train) | Accuracy (val) | 63.54 | 63.15 | |
| A-OKVQA (MC) (train+val) | Accuracy (Test server) | 76.37 | 76.90 | |
| A-OKVQA (DA) (train+val) | Accuracy (Test server) | 61.85 | 63.22 | |
| GQA (train_balanced+val_balanced) | Accuracy (testdev balanced) | 65.61 | 67.03 | |
| xGQA (Eval of GQA transfer) | Mean Accuracy (bn, de, en, id, ko, pt, ru, zh) | 58.37 | 59.07 | |
| NLVR2 (train+dev) | Accuracy (test) | 90.02 | 88.93 | |
| MaRVL (Eval of NLVR2 transfer) | Mean Accuracy (test) (id, sw, ta, tr, zh) | 80.57 | 76.78 | |
| AI2D (train) | Accuracy (test) | 72.12 | 73.28 | |
| ScienceQA (Img subset, no CoT) (train+val) | Accuracy (test) | 95.39 | 95.93 | |
| RSVQA-LR (Non numeric) (train+val) | Mean Accuracy (test) | 92.65 | 93.11 | |
| RSVQA-HR (Non numeric) (train+val) | Mean Accuracy (test/test2) | 92.61 / 90.58 | 92.79 / 90.54 | |
| ChartQA (human+aug)x(train+val) | Mean Relaxed Accuracy (test_human, test_aug) | 57.08 | 71.36 | |
| VizWiz VQA (train+val) | Accuracy (Test server - std) | 73.7 | 75.52 | |
| TallyQA (train) | Accuracy (test_simple/test_complex) | 81.72 / 69.56 | 84.86 / 72.27 | |
| OCR-VQA (train+val) | Accuracy (test) | 72.32 | 74.61 | 74.93 |
| TextVQA (train+val) | Accuracy (Test server - std) | 55.47 | 73.15 | 76.48 |
| DocVQA (train+val) | ANLS (Test server) | 43.74 | 78.02 | 84.77 |
| Infographic VQA (train+val) | ANLS (Test server) | 28.46 | 40.47 | 47.75 |
| SceneText VQA (train+val) | ANLS (Test server) | 63.29 | 81.82 | 84.40 |
| **Segmentation** | | | | |
| RefCOCO (combined refcoco, refcoco+ and refcocog, excluding val and test images) | MIoU (validation) refcoco/refcoco+/refcocog | 73.40 / 68.32 / 67.65 | 75.57 / 69.76 / 70.17 | 76.94 / 72.18 / 72.22 |
| **Video tasks (Caption/QA)** | | | | |
| MSR-VTT (Captioning) | CIDEr (test) | 70.54 | | |
| MSR-VTT (QA) | Accuracy (test) | 50.09 | | |
| ActivityNet (Captioning) | CIDEr (test) | 34.62 | | |
| ActivityNet (QA) | Accuracy (test) | 50.78 | | |
| VATEX (Captioning) | CIDEr (test) | 79.73 | | |
| MSVD (QA) | Accuracy (test) | 60.22 | | |
#### Mix model (fine-tune on mixture of transfer tasks)
| Benchmark | Metric (split) | mix-224 | mix-448 |
|---|---|---|---|
| MMVP | Paired Accuracy | 46.00 | 45.33 |
| POPE | Accuracy (random/popular/adversarial) | 88.00 / 86.63 / 85.67 | 89.37 / 88.40 / 87.47 |
## Ethics and safety ### Evaluation approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: * Human evaluation on prompts covering child safety, content safety and representational harms. See the Gemma model card for more details on evaluation approach, but with image captioning and visual question answering setups. * Image-to-Text benchmark evaluation: Benchmark against relevant academic datasets such as FairFace Dataset (Karkkainen et al., 2021). ### Evaluation results * The human evaluation results of ethics and safety evaluations are within acceptable thresholds for meeting internal policies for categories such as child safety, content safety and representational harms. * On top of robust internal evaluations, we also use the Perspective API (threshold of 0.8) to measure toxicity, profanity, and other potential issues in the generated captions for images sourced from the FairFace dataset. We report the maximum and median values observed across subgroups for each of the perceived gender, ethnicity, and age attributes.
| Metric | Perceived gender (Maximum) | Perceived gender (Median) | Ethnicity (Maximum) | Ethnicity (Median) | Age group (Maximum) | Age group (Median) |
|---|---|---|---|---|---|---|
| Toxicity | 0.04% | 0.03% | 0.08% | 0.00% | 0.09% | 0.00% |
| Identity Attack | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| Insult | 0.06% | 0.04% | 0.09% | 0.07% | 0.16% | 0.00% |
| Threat | 0.06% | 0.05% | 0.14% | 0.05% | 0.17% | 0.00% |
| Profanity | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
## Usage and limitations ### Intended usage Open Vision Language Models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. Fine-tune on specific vision-language task: * The pre-trained models can be fine-tuned on a wide range of vision-language tasks such as: image captioning, short video caption, visual question answering, text reading, object detection and object segmentation. * The pre-trained models can be fine-tuned for specific domains such as remote sensing question answering, visual questions from people who are blind, science question answering, describe UI element functionalities. * The pre-trained models can be fine-tuned for tasks with non-textual outputs such as bounding boxes or segmentation masks. Vision-language research: * The pre-trained models and fine-tuned models can serve as a foundation for researchers to experiment with VLM techniques, develop algorithms, and contribute to the advancement of the field. ### Ethical considerations and risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: * Bias and Fairness * VLMs trained on large-scale, real-world image-text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card. * Misinformation and Misuse * VLMs can be misused to generate text that is false, misleading, or harmful. * Guidelines are provided for responsible use with the model, see the Responsible Generative AI Toolkit. * Transparency and Accountability * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. * A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: * **Perpetuation of biases:** It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. * **Generation of harmful content:** Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. * **Misuse for malicious purposes:** Technical limitations and developer and end-user education can help mitigate against malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the Gemma Prohibited Use Policy. * **Privacy violations:** Models were trained on data filtered to remove certain personal information and sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. ### Limitations * Most limitations inherited from the underlying Gemma model still apply: * VLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. * Natural language is inherently complex. 
VLMs might struggle to grasp subtle nuances, sarcasm, or figurative language. * VLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. * VLMs rely on statistical patterns in language and images. They might lack the ability to apply common sense reasoning in certain situations. * PaliGemma was designed first and foremost to serve as a general pre-trained model for transfer to specialized tasks. Hence, its \"out of the box\" or \"zero-shot\" performance might lag behind models designed specifically for that. * PaliGemma is not a multi-turn chatbot. It is designed for a single round of image and text input. ## Citation Find the paper here." +} \ No newline at end of file diff --git a/data/model_data_json/google_pegasus-xsum.json b/data/model_data_json/google_pegasus-xsum.json new file mode 100644 index 0000000000000000000000000000000000000000..3e3e92ff7c7440906868a06e97f8fa55f186d85a --- /dev/null +++ b/data/model_data_json/google_pegasus-xsum.json @@ -0,0 +1,21 @@ +{ + "model_id": "google/pegasus-xsum", + "downloads": 132627, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "pegasus", + "text2text-generation", + "summarization", + "en", + "arxiv:1912.08777", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - summarization model-index: - name: google/pegasus-xsum results: - task: type: summarization name: Summarization dataset: name: samsum type: samsum config: samsum split: train metrics: - name: ROUGE-1 type: rouge value: 21.8096 verified: true - name: ROUGE-2 type: rouge value: 4.2525 verified: true - name: ROUGE-L type: rouge value: 17.4469 verified: true - name: ROUGE-LSUM type: rouge value: 18.8907 verified: true - name: loss type: loss value: 3.0317161083221436 verified: true - name: gen_len type: gen_len value: 20.3122 verified: true - task: type: summarization name: Summarization dataset: name: xsum type: xsum config: default split: test metrics: - name: ROUGE-1 type: rouge value: 46.8623 verified: true - name: ROUGE-2 type: rouge value: 24.4533 verified: true - name: ROUGE-L type: rouge value: 39.0548 verified: true - name: ROUGE-LSUM type: rouge value: 39.0994 verified: true - name: loss type: loss value: 1.5717021226882935 verified: true - name: gen_len type: gen_len value: 22.8821 verified: true - task: type: summarization name: Summarization dataset: name: cnn_dailymail type: cnn_dailymail config: 3.0.0 split: test metrics: - name: ROUGE-1 type: rouge value: 22.2062 verified: true - name: ROUGE-2 type: rouge value: 7.6701 verified: true - name: ROUGE-L type: rouge value: 15.4046 verified: true - name: ROUGE-LSUM type: rouge value: 19.2182 verified: true - name: loss type: loss value: 2.681241273880005 verified: true - name: gen_len type: gen_len value: 25.0234 verified: true --- ### Pegasus Models See Docs: here Original TF 1 code here Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019 Maintained by: @sshleifer Task: Summarization The following is copied from the authors' README. # Mixed & Stochastic Checkpoints We train a pegasus model with sampled gap sentence ratios on both C4 and HugeNews, and stochastically sample important sentences. The updated the results are reported in this table. 
| dataset | C4 | HugeNews | Mixed & Stochastic| | ---- | ---- | ---- | ----| | xsum | 45.20/22.06/36.99 | 47.21/24.56/39.25 | 47.60/24.83/39.64| | cnn_dailymail | 43.90/21.20/40.76 | 44.17/21.47/41.11 | 44.16/21.56/41.30| | newsroom | 45.07/33.39/41.28 | 45.15/33.51/41.33 | 45.98/34.20/42.18| | multi_news | 46.74/17.95/24.26 | 47.52/18.72/24.91 | 47.65/18.75/24.95| | gigaword | 38.75/19.96/36.14 | 39.12/19.86/36.24 | 39.65/20.47/36.76| | wikihow | 43.07/19.70/34.79 | 41.35/18.51/33.42 | 46.39/22.12/38.41 *| | reddit_tifu | 26.54/8.94/21.64 | 26.63/9.01/21.60 | 27.99/9.81/22.94| | big_patent | 53.63/33.16/42.25 | 53.41/32.89/42.07 | 52.29/33.08/41.66 *| | arxiv | 44.70/17.27/25.80 | 44.67/17.18/25.73 | 44.21/16.95/25.67| | pubmed | 45.49/19.90/27.69 | 45.09/19.56/27.42 | 45.97/20.15/28.25| | aeslc | 37.69/21.85/36.84 | 37.40/21.22/36.45 | 37.68/21.25/36.51| | billsum | 57.20/39.56/45.80 | 57.31/40.19/45.82 | 59.67/41.58/47.59| The \"Mixed & Stochastic\" model has the following changes: - trained on both C4 and HugeNews (dataset mixture is weighted by their number of examples). - trained for 1.5M instead of 500k (we observe slower convergence on pretraining perplexity). - the model uniformly sample a gap sentence ratio between 15% and 45%. - importance sentences are sampled using a 20% uniform noise to importance scores. - the sentencepiece tokenizer is updated to be able to encode newline character. (*) the numbers of wikihow and big_patent datasets are not comparable because of change in tokenization and data: - wikihow dataset contains newline characters which is useful for paragraph segmentation, the C4 and HugeNews model's sentencepiece tokenizer doesn't encode newline and loose this information. - we update the BigPatent dataset to preserve casing, some format cleanings are also changed, please refer to change in TFDS. The \"Mixed & Stochastic\" model has the following changes (from pegasus-large in the paper): trained on both C4 and HugeNews (dataset mixture is weighted by their number of examples). trained for 1.5M instead of 500k (we observe slower convergence on pretraining perplexity). the model uniformly sample a gap sentence ratio between 15% and 45%. importance sentences are sampled using a 20% uniform noise to importance scores. the sentencepiece tokenizer is updated to be able to encode newline character. Citation", + "model_explanation_gemini": "Generates summaries from input text, optimized for performance on datasets like XSum, CNN/DailyMail, and SAMSum." +} \ No newline at end of file diff --git a/data/model_data_json/google_siglip-base-patch16-224.json b/data/model_data_json/google_siglip-base-patch16-224.json new file mode 100644 index 0000000000000000000000000000000000000000..dd2149945bf6dbe5da8dce8f10828bf13df4a675 --- /dev/null +++ b/data/model_data_json/google_siglip-base-patch16-224.json @@ -0,0 +1,19 @@ +{ + "model_id": "google/siglip-base-patch16-224", + "downloads": 252784, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "siglip", + "zero-shot-image-classification", + "vision", + "arxiv:2303.15343", + "arxiv:2209.06794", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # SigLIP (base-sized model) SigLIP model pre-trained on WebLi at resolution 224x224. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. 
Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SigLIP is CLIP, a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here. ## Intended uses & limitations You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot image classification: Alternatively, one can leverage the pipeline API which abstracts away the complexity for the user: For more code examples, we refer to the documentation. ## Training procedure ### Training data SigLIP is pre-trained on the English image-text pairs of the WebLI dataset (Chen et al., 2023). ### Preprocessing Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). Texts are tokenized and padded to the same length (64 tokens). ### Compute The model was trained on 16 TPU-v4 chips for three days. ## Evaluation results Evaluation of SigLIP compared to CLIP is shown below (taken from the paper). \"drawing\" ### BibTeX entry and citation info", + "model_explanation_gemini": "A multimodal vision-language model pre-trained for zero-shot image classification and image-text retrieval using a sigmoid loss function for improved performance." +} \ No newline at end of file diff --git a/data/model_data_json/google_siglip-base-patch16-256-multilingual.json b/data/model_data_json/google_siglip-base-patch16-256-multilingual.json new file mode 100644 index 0000000000000000000000000000000000000000..5d18b90543f5838d444ed696711a9cc16a0acb8a --- /dev/null +++ b/data/model_data_json/google_siglip-base-patch16-256-multilingual.json @@ -0,0 +1,18 @@ +{ + "model_id": "google/siglip-base-patch16-256-multilingual", + "downloads": 178975, + "tags": [ + "transformers", + "safetensors", + "siglip", + "zero-shot-image-classification", + "vision", + "arxiv:2303.15343", + "arxiv:2209.06794", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # SigLIP (base-sized model, multilingual) SigLIP model pre-trained on WebLi at resolution 256x256. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SigLIP is CLIP, a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here. ## Intended uses & limitations You can use the raw model for tasks like zero-shot image classification and image-text retrieval. 
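The usage snippets in this card were stripped during extraction. As a stand-in, here is a minimal sketch of the zero-shot classification flow described here, assuming the transformers SigLIP classes; the image URL is an arbitrary example, while the candidate labels come from the card's widget.

```python
# Hedged sketch of SigLIP zero-shot classification. Note the sigmoid (not softmax)
# over logits, matching the pairwise sigmoid loss described above.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

ckpt = "google/siglip-base-patch16-256-multilingual"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # arbitrary example image
image = Image.open(requests.get(url, stream=True).raw)
texts = ["playing music", "playing sports"]  # candidate labels from the card's widget

# SigLIP texts are padded to a fixed 64-token length, so pad to max_length at inference too.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.sigmoid(outputs.logits_per_image)  # independent per-label probabilities
for text, p in zip(texts, probs[0]):
    print(f"{p.item():.1%} that the image is '{text}'")
```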
See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot image classification: Alternatively, one can leverage the pipeline API which abstracts away the complexity for the user: For more code examples, we refer to the documentation. ## Training procedure ### Training data SigLIP is pre-trained on the WebLI dataset without language filter (Chen et al., 2023). ### Preprocessing Images are resized/rescaled to the same resolution (256x256) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). Texts are tokenized and padded to the same length (64 tokens). ### Compute The model was trained on 16 TPU-v4 chips for three days. ## Evaluation results Evaluation of SigLIP compared to CLIP is shown below (taken from the paper). \"drawing\" ### BibTeX entry and citation info", + "model_explanation_gemini": "A multilingual vision-language model pre-trained for zero-shot image classification and image-text retrieval using a sigmoid loss function for improved performance." +} \ No newline at end of file diff --git a/data/model_data_json/google_siglip-base-patch16-512.json b/data/model_data_json/google_siglip-base-patch16-512.json new file mode 100644 index 0000000000000000000000000000000000000000..7a8c3f05b95a4620191a001ce2dc1c6b5858d16d --- /dev/null +++ b/data/model_data_json/google_siglip-base-patch16-512.json @@ -0,0 +1,18 @@ +{ + "model_id": "google/siglip-base-patch16-512", + "downloads": 221309, + "tags": [ + "transformers", + "safetensors", + "siglip", + "zero-shot-image-classification", + "vision", + "arxiv:2303.15343", + "arxiv:2209.06794", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # SigLIP (base-sized model) SigLIP model pre-trained on WebLi at resolution 512x512. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SigLIP is CLIP, a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here. ## Intended uses & limitations You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot image classification: Alternatively, one can leverage the pipeline API which abstracts away the complexity for the user: For more code examples, we refer to the documentation. ## Training procedure ### Training data SigLIP is pre-trained on the English image-text pairs of the WebLI dataset (Chen et al., 2023). ### Preprocessing Images are resized/rescaled to the same resolution (512x512) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). Texts are tokenized and padded to the same length (64 tokens). ### Compute The model was trained on 16 TPU-v4 chips for three days. 
## Evaluation results Evaluation of SigLIP compared to CLIP is shown below (taken from the paper). \"drawing\" ### BibTeX entry and citation info", + "model_explanation_gemini": "A multimodal vision-language model pre-trained for zero-shot image classification and image-text retrieval using a sigmoid loss function for improved performance." +} \ No newline at end of file diff --git a/data/model_data_json/google_siglip-large-patch16-384.json b/data/model_data_json/google_siglip-large-patch16-384.json new file mode 100644 index 0000000000000000000000000000000000000000..abc70dca52d3995566a30aea0a1ca4c27292e489 --- /dev/null +++ b/data/model_data_json/google_siglip-large-patch16-384.json @@ -0,0 +1,18 @@ +{ + "model_id": "google/siglip-large-patch16-384", + "downloads": 152316, + "tags": [ + "transformers", + "safetensors", + "siglip", + "zero-shot-image-classification", + "vision", + "arxiv:2303.15343", + "arxiv:2209.06794", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # SigLIP (large-sized model) SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SigLIP is CLIP, a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here. ## Intended uses & limitations You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot image classification: Alternatively, one can leverage the pipeline API which abstracts away the complexity for the user: For more code examples, we refer to the documentation. ## Training procedure ### Training data SigLIP is pre-trained on the English image-text pairs of the WebLI dataset (Chen et al., 2023). ### Preprocessing Images are resized/rescaled to the same resolution (384x384) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). Texts are tokenized and padded to the same length (64 tokens). ### Compute The model was trained on 16 TPU-v4 chips for three days. ## Evaluation results Evaluation of SigLIP compared to CLIP is shown below (taken from the paper). \"drawing\" ### BibTeX entry and citation info", + "model_explanation_gemini": "Performs zero-shot image classification and image-text retrieval using a sigmoid loss function for improved multimodal learning." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_siglip-so400m-patch14-384.json b/data/model_data_json/google_siglip-so400m-patch14-384.json new file mode 100644 index 0000000000000000000000000000000000000000..9e2c1d95c95a0271270a73213030f34df431fe45 --- /dev/null +++ b/data/model_data_json/google_siglip-so400m-patch14-384.json @@ -0,0 +1,19 @@ +{ + "model_id": "google/siglip-so400m-patch14-384", + "downloads": 6163880, + "tags": [ + "transformers", + "safetensors", + "siglip", + "zero-shot-image-classification", + "vision", + "arxiv:2303.15343", + "arxiv:2305.13035", + "arxiv:2209.06794", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # SigLIP (shape-optimized model) SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. This model has the SoViT-400m architecture, which is the shape-optimized version as presented in Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design by Alabdulmohsin et al. Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SigLIP is CLIP, a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here. ## Intended uses & limitations You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the model hub to look for other versions on a task that interests you. ### How to use Here is how to use this model to perform zero-shot image classification: Alternatively, one can leverage the pipeline API which abstracts away the complexity for the user: For more code examples, we refer to the documentation. ## Training procedure ### Training data SigLIP is pre-trained on the WebLI dataset (Chen et al., 2023). ### Preprocessing Images are resized/rescaled to the same resolution (384x384) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). Texts are tokenized and padded to the same length (64 tokens). ### Compute The model was trained on 16 TPU-v4 chips for three days. ## Evaluation results Evaluation of SigLIP compared to CLIP is shown below (taken from the paper). \"drawing\" ### BibTeX entry and citation info", + "model_explanation_gemini": "A vision-language model optimized for zero-shot image classification and image-text retrieval using a sigmoid loss function, pre-trained on WebLI at 384x384 resolution." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_siglip2-so400m-patch14-384.json b/data/model_data_json/google_siglip2-so400m-patch14-384.json new file mode 100644 index 0000000000000000000000000000000000000000..763d0d97e0527189a561aafe26f794e2a3217158 --- /dev/null +++ b/data/model_data_json/google_siglip2-so400m-patch14-384.json @@ -0,0 +1,19 @@ +{ + "model_id": "google/siglip2-so400m-patch14-384", + "downloads": 608322, + "tags": [ + "transformers", + "safetensors", + "siglip", + "vision", + "zero-shot-image-classification", + "arxiv:2502.14786", + "arxiv:2303.15343", + "arxiv:2209.06794", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision widget: - src: >- candidate_labels: bee in the sky, bee on the flower example_title: Bee library_name: transformers pipeline_tag: zero-shot-image-classification --- # SigLIP 2 So400m SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques into a unified recipe, for improved semantic understanding, localization, and dense features. ## Intended uses You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks). Here is how to use this model to perform zero-shot image classification: You can encode an image using the Vision Tower like so: For more code examples, we refer to the siglip documentation. ## Training procedure SigLIP 2 adds some clever training objectives on top of SigLIP: 1. Decoder loss 2. Global-local and masked prediction loss 3. Aspect ratio and resolution adaptibility ### Training data SigLIP 2 is pre-trained on the WebLI dataset (Chen et al., 2023). ### Compute The model was trained on up to 2048 TPU-v5e chips. ## Evaluation results Evaluation of SigLIP 2 is shown below (taken from the paper). !Evaluation Table ### BibTeX entry and citation info", + "model_explanation_gemini": "Performs zero-shot image classification and image-text retrieval by leveraging improved semantic understanding and dense features through advanced training objectives." +} \ No newline at end of file diff --git a/data/model_data_json/google_siglip2-so400m-patch16-naflex.json b/data/model_data_json/google_siglip2-so400m-patch16-naflex.json new file mode 100644 index 0000000000000000000000000000000000000000..c9c83d3496e41cef378ff9ad87905e0ecd00122b --- /dev/null +++ b/data/model_data_json/google_siglip2-so400m-patch16-naflex.json @@ -0,0 +1,19 @@ +{ + "model_id": "google/siglip2-so400m-patch16-naflex", + "downloads": 157686, + "tags": [ + "transformers", + "safetensors", + "siglip2", + "zero-shot-image-classification", + "vision", + "arxiv:2502.14786", + "arxiv:2303.15343", + "arxiv:2209.06794", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision widget: - src: >- candidate_labels: bee in the sky, bee on the flower example_title: Bee library_name: transformers pipeline_tag: zero-shot-image-classification --- # SigLIP 2 So400m SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques into a unified recipe, for improved semantic understanding, localization, and dense features. ## Intended uses You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks). 
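The usage snippets referenced below were stripped during extraction. As a placeholder for the "encode an image using the Vision Tower" example, here is a hedged sketch assuming the transformers siglip2 classes expose get_image_features in the same way as siglip; the image URL is an arbitrary example.

```python
# Hedged sketch: using the SigLIP 2 vision tower as a standalone image encoder,
# e.g. to feed a VLM. API assumed to mirror the transformers siglip classes.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

ckpt = "google/siglip2-so400m-patch16-naflex"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The NaFlex processor also returns attention masks / spatial shapes for the
# native-aspect-ratio patching; passing **inputs forwards them all.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    image_embeds = model.get_image_features(**inputs)  # pooled image embedding

print(image_embeds.shape)  # (1, hidden_dim): usable as a dense image feature
```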
Here is how to use this model to perform zero-shot image classification: You can encode an image using the Vision Tower like so: For more code examples, we refer to the siglip2 documentation. ## Training procedure SigLIP 2 adds some clever training objectives on top of SigLIP: 1. Decoder loss 2. Global-local and masked prediction loss 3. Aspect ratio and resolution adaptibility ### Training data SigLIP 2 is pre-trained on the WebLI dataset (Chen et al., 2023). ### Compute The model was trained on up to 2048 TPU-v5e chips. ## Evaluation results Evaluation of SigLIP 2 is shown below (taken from the paper). !Evaluation Table ### BibTeX entry and citation info", + "model_explanation_gemini": "Performs zero-shot image classification and image-text retrieval by extending SigLIP's pretraining with enhanced semantic understanding and dense features." +} \ No newline at end of file diff --git a/data/model_data_json/google_t5-v1_1-base.json b/data/model_data_json/google_t5-v1_1-base.json new file mode 100644 index 0000000000000000000000000000000000000000..8417a667e72493edf307b7dd664f63fa94f887c2 --- /dev/null +++ b/data/model_data_json/google_t5-v1_1-base.json @@ -0,0 +1,23 @@ +{ + "model_id": "google/t5-v1_1-base", + "downloads": 184213, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "t5", + "text2text-generation", + "en", + "dataset:c4", + "arxiv:2002.05202", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - c4 license: apache-2.0 --- Google's T5 Version 1.1 ## Version 1.1 T5 Version 1.1 includes the following improvements compared to the original T5 model- GEGLU activation in feed-forward hidden layer, rather than ReLU - see here. - Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning. - Pre-trained on C4 only without mixing in the downstream tasks. - no parameter sharing between embedding and classifier layer - \"xl\" and \"xxl\" replace \"3B\" and \"11B\". The model shapes are a bit different - larger and smaller and . **Note**: T5 Version 1.1 was only pre-trained on C4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task. Pretraining Dataset: C4 Other Community Checkpoints: here Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Authors: *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu* ## Abstract Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. 
To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code. !model image", + "model_explanation_gemini": "A text-to-text transformer model pre-trained on C4 data that requires fine-tuning for downstream NLP tasks like summarization, question answering, and text classification." +} \ No newline at end of file diff --git a/data/model_data_json/google_t5-v1_1-small.json b/data/model_data_json/google_t5-v1_1-small.json new file mode 100644 index 0000000000000000000000000000000000000000..3c564e97fc4d3e3dbf5ba48f7436372b39b3ff24 --- /dev/null +++ b/data/model_data_json/google_t5-v1_1-small.json @@ -0,0 +1,23 @@ +{ + "model_id": "google/t5-v1_1-small", + "downloads": 124537, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "t5", + "text2text-generation", + "en", + "dataset:c4", + "arxiv:2002.05202", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - c4 license: apache-2.0 --- Google's T5 Version 1.1 ## Version 1.1 T5 Version 1.1 includes the following improvements compared to the original T5 model- GEGLU activation in feed-forward hidden layer, rather than ReLU - see here. - Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning. - Pre-trained on C4 only without mixing in the downstream tasks. - no parameter sharing between embedding and classifier layer - \"xl\" and \"xxl\" replace \"3B\" and \"11B\". The model shapes are a bit different - larger and smaller and . **Note**: T5 Version 1.1 was only pre-trained on C4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task. Pretraining Dataset: C4 Other Community Checkpoints: here Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Authors: *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu* ## Abstract Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code. !model image", + "model_explanation_gemini": "A text-to-text transformer model requiring fine-tuning for downstream NLP tasks, pre-trained on the C4 dataset with improved architecture and training techniques." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_t5-v1_1-xl.json b/data/model_data_json/google_t5-v1_1-xl.json new file mode 100644 index 0000000000000000000000000000000000000000..d7339538812cf48e32b878244fa3e66833b43b28 --- /dev/null +++ b/data/model_data_json/google_t5-v1_1-xl.json @@ -0,0 +1,21 @@ +{ + "model_id": "google/t5-v1_1-xl", + "downloads": 77361, + "tags": [ + "transformers", + "pytorch", + "tf", + "t5", + "text2text-generation", + "en", + "dataset:c4", + "arxiv:2002.05202", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - c4 license: apache-2.0 --- Google's T5 Version 1.1 ## Version 1.1 T5 Version 1.1 includes the following improvements compared to the original T5 model- GEGLU activation in feed-forward hidden layer, rather than ReLU - see here. - Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning. - Pre-trained on C4 only without mixing in the downstream tasks. - no parameter sharing between embedding and classifier layer - \"xl\" and \"xxl\" replace \"3B\" and \"11B\". The model shapes are a bit different - larger and smaller and . **Note**: T5 Version 1.1 was only pre-trained on C4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task. Pretraining Dataset: C4 Other Community Checkpoints: here Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Authors: *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu* ## Abstract Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code. 
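Since T5 1.1 checkpoints were pre-trained without any supervised tasks, here is a hedged sketch of loading one for the required fine-tuning step; per the note above, dropout should be re-enabled, and the 0.1 value shown is the usual T5 default, an assumption rather than a prescription from this card.

```python
# Hedged sketch: preparing a T5 1.1 checkpoint for fine-tuning. Dropout was off
# during pre-training; 0.1 is the conventional T5 default (assumed here).
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xl")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-xl", dropout_rate=0.1)

# One text-to-text training step on a toy (input, target) pair.
inputs = tokenizer("summarize: The quick brown fox jumped over the lazy dog.", return_tensors="pt")
labels = tokenizer("A fox jumped over a dog.", return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
loss.backward()
print(float(loss))
```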
!model image" +} \ No newline at end of file diff --git a/data/model_data_json/google_t5-v1_1-xxl.json b/data/model_data_json/google_t5-v1_1-xxl.json new file mode 100644 index 0000000000000000000000000000000000000000..5500414d688e45489338693ecd4db05cbf3b7623 --- /dev/null +++ b/data/model_data_json/google_t5-v1_1-xxl.json @@ -0,0 +1,22 @@ +{ + "model_id": "google/t5-v1_1-xxl", + "downloads": 467329, + "tags": [ + "transformers", + "pytorch", + "tf", + "t5", + "text2text-generation", + "en", + "dataset:c4", + "arxiv:2002.05202", + "arxiv:1910.10683", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - c4 license: apache-2.0 --- Google's T5 Version 1.1 ## Version 1.1 T5 Version 1.1 includes the following improvements compared to the original T5 model- GEGLU activation in feed-forward hidden layer, rather than ReLU - see here. - Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning. - Pre-trained on C4 only without mixing in the downstream tasks. - no parameter sharing between embedding and classifier layer - \"xl\" and \"xxl\" replace \"3B\" and \"11B\". The model shapes are a bit different - larger and smaller and . **Note**: T5 Version 1.1 was only pre-trained on C4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task. Pretraining Dataset: C4 Other Community Checkpoints: here Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Authors: *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu* ## Abstract Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code. !model image", + "model_explanation_gemini": "A text-to-text transformer model requiring fine-tuning for downstream NLP tasks, pre-trained on the C4 dataset with architectural improvements over the original T5." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_tapas-large-finetuned-wtq.json b/data/model_data_json/google_tapas-large-finetuned-wtq.json new file mode 100644 index 0000000000000000000000000000000000000000..3541990b04b7e6f878c4cf5b97958d275ce0e7b8 --- /dev/null +++ b/data/model_data_json/google_tapas-large-finetuned-wtq.json @@ -0,0 +1,22 @@ +{ + "model_id": "google/tapas-large-finetuned-wtq", + "downloads": 116887, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "tapas", + "table-question-answering", + "en", + "dataset:wikitablequestions", + "arxiv:2004.02349", + "arxiv:2010.00571", + "arxiv:1508.00305", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - tapas - table-question-answering license: apache-2.0 datasets: - wikitablequestions --- # TAPAS large model fine-tuned on WikiTable Questions (WTQ) This model has 2 versions which can be used. The default version corresponds to the checkpoint of the original Github repository. This model was pre-trained on MLM and an additional step which the authors call intermediate pre-training, and then fine-tuned in a chain on SQA, WikiSQL and finally WTQ. It uses relative position embeddings (i.e. resetting the position index at every cell of the table). The other (non-default) version which can be used is: - , which corresponds to (intermediate pre-training, absolute position embeddings). Disclaimer: The team releasing TAPAS did not write a model card for this model so this model card has been written by the Hugging Face team and contributors. ## Results Size | Reset | Dev Accuracy | Link -------- | --------| -------- | ---- **LARGE** | **noreset** | **0.5062** | tapas-large-finetuned-wtq (with absolute pos embeddings) **LARGE** | **reset** | **0.5097** | tapas-large-finetuned-wtq BASE | noreset | 0.4525 | tapas-base-finetuned-wtq (with absolute pos embeddings) BASE | reset | 0.4638 | tapas-base-finetuned-wtq MEDIUM | noreset | 0.4324 | tapas-medium-finetuned-wtq (with absolute pos embeddings) MEDIUM | reset | 0.4324 | tapas-medium-finetuned-wtq SMALL | noreset | 0.3681 | tapas-small-finetuned-wtq (with absolute pos embeddings) SMALL | reset | 0.3762 | tapas-small-finetuned-wtq MINI | noreset | 0.2783 | tapas-mini-finetuned-wtq (with absolute pos embeddings) MINI | reset | 0.2854 | tapas-mini-finetuned-wtq TINY | noreset | 0.0823 | tapas-tiny-finetuned-wtq (with absolute pos embeddings) TINY | reset | 0.1039 | tapas-tiny-finetuned-wtq ## Model description TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. This means it was pretrained on the raw tables and associated texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was pretrained with two objectives: - Masked language modeling (MLM): taking a (flattened) table and associated context, the model randomly masks 15% of the words in the input, then runs the entire (partially masked) sequence through the model. The model then has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of a table and associated text. 
- Intermediate pre-training: to encourage numerical reasoning on tables, the authors additionally pre-trained the model by creating a balanced dataset of millions of syntactically created training examples. Here, the model must predict (classify) whether a sentence is supported or refuted by the contents of a table. The training examples are created based on synthetic as well as counterfactual statements. This way, the model learns an inner representation of the English language used in tables and associated texts, which can then be used to extract features useful for downstream tasks such as answering questions about a table, or determining whether a sentence is entailed or refuted by the contents of a table. Fine-tuning is done by adding a cell selection head and aggregation head on top of the pre-trained model, and then jointly train these randomly initialized classification heads with the base model on SQa, WikiSQL and finally WTQ. ## Intended uses & limitations You can use this model for answering questions related to a table. For code examples, we refer to the documentation of TAPAS on the HuggingFace website. ## Training procedure ### Preprocessing The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form: The authors did first convert the WTQ dataset into the format of SQA using automatic conversion scripts. ### Fine-tuning The model was fine-tuned on 32 Cloud TPU v3 cores for 50,000 steps with maximum sequence length 512 and batch size of 512. In this setup, fine-tuning takes around 10 hours. The optimizer used is Adam with a learning rate of 1.93581e-5, and a warmup ratio of 0.128960. An inductive bias is added such that the model only selects cells of the same column. This is reflected by the parameter of . See the paper for more details (tables 11 and 12). ### BibTeX entry and citation info", + "model_explanation_gemini": "Answers questions about tabular data by analyzing table structures and content, fine-tuned specifically for WikiTable Questions." +} \ No newline at end of file diff --git a/data/model_data_json/google_vit-base-patch16-224-in21k.json b/data/model_data_json/google_vit-base-patch16-224-in21k.json new file mode 100644 index 0000000000000000000000000000000000000000..4249c274e812331766a2f262b7b25d0201274a9c --- /dev/null +++ b/data/model_data_json/google_vit-base-patch16-224-in21k.json @@ -0,0 +1,21 @@ +{ + "model_id": "google/vit-base-patch16-224-in21k", + "downloads": 2348864, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "vit", + "image-feature-extraction", + "vision", + "dataset:imagenet-21k", + "arxiv:2010.11929", + "arxiv:2006.03677", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision datasets: - imagenet-21k inference: false --- # Vision Transformer (base-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him. Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team. 
## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not provide any fine-tuned heads, as these were zero'd by Google researchers. However, the model does include the pre-trained pooler, which can be used for downstream tasks (such as image classification). By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model in PyTorch: Here is how to use this model in JAX/Flax: ## Training data The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Pre-training resolution is 224. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A Vision Transformer (ViT) model pre-trained on ImageNet-21k for image classification, processing 224x224 pixel images as sequences of 16x16 patches to extract features suitable for downstream tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/google_vit-base-patch16-224.json b/data/model_data_json/google_vit-base-patch16-224.json new file mode 100644 index 0000000000000000000000000000000000000000..9a04275861c93370ff7c1748c4aab72c35cd01a7 --- /dev/null +++ b/data/model_data_json/google_vit-base-patch16-224.json @@ -0,0 +1,24 @@ +{ + "model_id": "google/vit-base-patch16-224", + "downloads": 4191606, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "vit", + "image-classification", + "vision", + "dataset:imagenet-1k", + "dataset:imagenet-21k", + "arxiv:2010.11929", + "arxiv:2006.03677", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k - imagenet-21k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # Vision Transformer (base-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him. Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation. ## Training data The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on ImageNet, a dataset consisting of 1 million images and 1k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. 
Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Training resolution is 224. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A Vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks at 224x224 resolution." +} \ No newline at end of file diff --git a/data/model_data_json/google_vit-hybrid-base-bit-384.json b/data/model_data_json/google_vit-hybrid-base-bit-384.json new file mode 100644 index 0000000000000000000000000000000000000000..8c71c26045ee3bdbf97a24ba1d72d6d33275e1c6 --- /dev/null +++ b/data/model_data_json/google_vit-hybrid-base-bit-384.json @@ -0,0 +1,21 @@ +{ + "model_id": "google/vit-hybrid-base-bit-384", + "downloads": 646949, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit-hybrid", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:2010.11929", + "arxiv:2006.03677", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k --- # Vision Transformer (base-sized model) - Hybrid The hybrid Vision Transformer (ViT) model was proposed in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. It's the first paper that successfully trains a Transformer encoder on ImageNet, attaining very good results compared to familiar convolutional architectures. ViT hybrid is a slight variant of the plain Vision Transformer, by leveraging a convolutional backbone (specifically, BiT) whose features are used as initial \"tokens\" for the Transformer. Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description *While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. 
When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.* ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation. ## Training data The ViT-Hybrid model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on ImageNet, a dataset consisting of 1 million images and 1k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Training resolution is 224. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A hybrid Vision Transformer model combining a convolutional backbone (BiT) with Transformer architecture for image classification, pre-trained on ImageNet-21k and fine-tuned on ImageNet." +} \ No newline at end of file diff --git a/data/model_data_json/google_vit-large-patch16-224-in21k.json b/data/model_data_json/google_vit-large-patch16-224-in21k.json new file mode 100644 index 0000000000000000000000000000000000000000..7c12dc3a78ad647b6b51e35f5f036b8e305b02b1 --- /dev/null +++ b/data/model_data_json/google_vit-large-patch16-224-in21k.json @@ -0,0 +1,20 @@ +{ + "model_id": "google/vit-large-patch16-224-in21k", + "downloads": 82508, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "vit", + "image-feature-extraction", + "vision", + "dataset:imagenet-21k", + "arxiv:2010.11929", + "arxiv:2006.03677", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision datasets: - imagenet-21k inference: false --- # Vision Transformer (large-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him. Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team. 
## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Note that this model does not provide any fine-tuned heads, as these were zero'd by Google researchers. However, the model does include the pre-trained pooler, which can be used for downstream tasks (such as image classification). By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model to embed images, but it's mostly intended to be fine-tuned on a downstream task. ### How to use Here is how to use this model: Currently, both the feature extractor and model support PyTorch. Tensorflow and JAX/FLAX are coming soon, and the API of ViTFeatureExtractor might change. ## Training data The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Pre-training resolution is 224. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. 
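The usage snippet referenced under "How to use" above was likewise stripped. Since this checkpoint ships without fine-tuned heads but keeps the pre-trained pooler (a dense layer with tanh activation applied to the [CLS] token), a sketch of extracting a whole-image embedding might look as follows; treat it as an illustration of the API rather than the card's original snippet:

```python
from transformers import ViTImageProcessor, ViTModel
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = ViTImageProcessor.from_pretrained("google/vit-large-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-large-patch16-224-in21k")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
# pooler_output is the pooled [CLS] representation mentioned in the card.
print(outputs.pooler_output.shape)  # torch.Size([1, 1024]) for the large model
```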
### BibTeX entry and citation info" +} \ No newline at end of file diff --git a/data/model_data_json/google_vit-large-patch16-224.json b/data/model_data_json/google_vit-large-patch16-224.json new file mode 100644 index 0000000000000000000000000000000000000000..9f43d6a01fc9438f7012502594db2352f7121eec --- /dev/null +++ b/data/model_data_json/google_vit-large-patch16-224.json @@ -0,0 +1,23 @@ +{ + "model_id": "google/vit-large-patch16-224", + "downloads": 183847, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "vit", + "image-classification", + "vision", + "dataset:imagenet-1k", + "dataset:imagenet-21k", + "arxiv:2010.11929", + "arxiv:2006.03677", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - image-classification - vision datasets: - imagenet-1k - imagenet-21k --- # Vision Transformer (large-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him. Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at the same resolution, 224x224. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Currently, both the feature extractor and model support PyTorch. Tensorflow and JAX/FLAX are coming soon, and the API of ViTFeatureExtractor might change. ## Training data The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on ImageNet, a dataset consisting of 1 million images and 1k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. 
Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Pre-training resolution is 224. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A large Vision Transformer (ViT) model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks at 224x224 resolution." +} \ No newline at end of file diff --git a/data/model_data_json/google_vit-large-patch16-384.json b/data/model_data_json/google_vit-large-patch16-384.json new file mode 100644 index 0000000000000000000000000000000000000000..7f054b618ff45f93ab59c05f6ccbaef2b9f19d4b --- /dev/null +++ b/data/model_data_json/google_vit-large-patch16-384.json @@ -0,0 +1,23 @@ +{ + "model_id": "google/vit-large-patch16-384", + "downloads": 93860, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "vit", + "image-classification", + "vision", + "dataset:imagenet", + "dataset:imagenet-21k", + "arxiv:2010.11929", + "arxiv:2006.03677", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - image-classification - vision datasets: - imagenet - imagenet-21k --- # Vision Transformer (large-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 384x384. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him. Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. 
One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Currently, both the feature extractor and model support PyTorch. Tensorflow and JAX/FLAX are coming soon, and the API of ViTFeatureExtractor might change. ## Training data The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on ImageNet, a dataset consisting of 1 million images and 1k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled to the same resolution (224x224 during pre-training, 384x384 during fine-tuning) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Pre-training resolution is 224. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A large Vision Transformer (ViT) model pre-trained on ImageNet-21k and fine-tuned on ImageNet for high-resolution (384x384) image classification tasks." +} \ No newline at end of file diff --git a/data/model_data_json/google_vit-large-patch32-384.json b/data/model_data_json/google_vit-large-patch32-384.json new file mode 100644 index 0000000000000000000000000000000000000000..8ee402e5714182541e19422f3aabb9984bd632f0 --- /dev/null +++ b/data/model_data_json/google_vit-large-patch32-384.json @@ -0,0 +1,23 @@ +{ + "model_id": "google/vit-large-patch32-384", + "downloads": 114433, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "vit", + "image-classification", + "vision", + "dataset:imagenet", + "dataset:imagenet-21k", + "arxiv:2010.11929", + "arxiv:2006.03677", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - image-classification - vision datasets: - imagenet - imagenet-21k --- # Vision Transformer (large-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 384x384. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him. Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team. 
## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. Images are presented to the model as a sequence of fixed-size patches (resolution 32x32), which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Currently, both the feature extractor and model support PyTorch. Tensorflow and JAX/FLAX are coming soon, and the API of ViTFeatureExtractor might change. ## Training data The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on ImageNet, a dataset consisting of 1 million images and 1k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled to the same resolution (224x224 during pre-training, 384x384 during fine-tuning) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Pre-training resolution is 224. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A large Vision Transformer (ViT) model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks, processing images as 32x32 patches at 384x384 resolution." 
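The classification snippet mentioned under "How to use" was stripped from this card as well. A minimal sketch of ImageNet classification with this checkpoint, assuming the standard transformers API:

```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = ViTImageProcessor.from_pretrained("google/vit-large-patch32-384")
model = ViTForImageClassification.from_pretrained("google/vit-large-patch32-384")

# The processor resizes the image to the 384x384 fine-tuning resolution.
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits

# Map the highest logit to one of the 1,000 ImageNet class labels.
print(model.config.id2label[logits.argmax(-1).item()])
```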
+} \ No newline at end of file diff --git a/data/model_data_json/guillaumekln_faster-whisper-large-v2.json b/data/model_data_json/guillaumekln_faster-whisper-large-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..aba590540a1a62e92ed34ed8275e5f288a4085cb --- /dev/null +++ b/data/model_data_json/guillaumekln_faster-whisper-large-v2.json @@ -0,0 +1,112 @@ +{ + "model_id": "guillaumekln/faster-whisper-large-v2", + "downloads": 154144, + "tags": [ + "ctranslate2", + "audio", + "automatic-speech-recognition", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "license:mit", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition license: mit library_name: ctranslate2 --- # Whisper large-v2 model for CTranslate2 This repository contains the conversion of openai/whisper-large-v2 to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper. ## Example ## Conversion details The original model was converted with the following command: Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the option in CTranslate2. ## More information **For more information about the original model, see its model card.**", + "model_explanation_gemini": "Converts OpenAI's Whisper large-v2 model to the CTranslate2 format for efficient automatic speech recognition in multiple languages." 
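Both the conversion command and the load-time option named in the faster-whisper card above were stripped. Presumably the conversion used CTranslate2's ct2-transformers-converter with float16 quantization (matching the card's FP16 note), and the option referred to is compute_type; both are assumptions here, as is the audio filename. A minimal transcription sketch with the faster-whisper package:

```python
from faster_whisper import WhisperModel

# Weights are stored in FP16; compute_type can override this at load time
# (e.g. "int8" for CPU-only inference).
model = WhisperModel("guillaumekln/faster-whisper-large-v2",
                     device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```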
+} \ No newline at end of file diff --git a/data/model_data_json/h2oai_h2ogpt-4096-llama2-7b-chat.json b/data/model_data_json/h2oai_h2ogpt-4096-llama2-7b-chat.json new file mode 100644 index 0000000000000000000000000000000000000000..44e1f9468defd62b7bbba6b36713c06f0fabbfbe --- /dev/null +++ b/data/model_data_json/h2oai_h2ogpt-4096-llama2-7b-chat.json @@ -0,0 +1,21 @@ +{ + "model_id": "h2oai/h2ogpt-4096-llama2-7b-chat", + "downloads": 76657, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "llama-2", + "h2ogpt", + "en", + "license:llama2", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- inference: false language: - en license: llama2 model_type: llama pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-2 - h2ogpt --- h2oGPT clone of Meta's Llama 2 7B Chat. Try it live on our h2oGPT demo with side-by-side LLM comparisons and private document chat! See how it compares to other models on our LLM Leaderboard! See more at H2O.ai ## Model Architecture" +} \ No newline at end of file diff --git a/data/model_data_json/h94_IP-Adapter-FaceID.json b/data/model_data_json/h94_IP-Adapter-FaceID.json new file mode 100644 index 0000000000000000000000000000000000000000..e17ff303b3bc6566159c931d8fa65608d128b32c --- /dev/null +++ b/data/model_data_json/h94_IP-Adapter-FaceID.json @@ -0,0 +1,14 @@ +{ + "model_id": "h94/IP-Adapter-FaceID", + "downloads": 272845, + "tags": [ + "diffusers", + "text-to-image", + "stable-diffusion", + "en", + "arxiv:2308.06721", + "region:us" + ], + "description": "--- tags: - text-to-image - stable-diffusion language: - en library_name: diffusers --- # IP-Adapter-FaceID Model Card
**Project Page** **|** **Paper (ArXiv)** **|** **Code**
--- ## Introduction An experimental version of IP-Adapter-FaceID: we use face ID embedding from a face recognition model instead of CLIP image embedding, and we additionally use LoRA to improve ID consistency. IP-Adapter-FaceID can generate images in various styles conditioned on a face, using only text prompts. **Update 2023/12/27**: IP-Adapter-FaceID-Plus: face ID embedding (for face ID) + CLIP image embedding (for face structure)
**Update 2023/12/28**: IP-Adapter-FaceID-PlusV2: face ID embedding (for face ID) + controllable CLIP image embedding (for face structure). You can adjust the weight of the face structure to get different generations!
**Update 2024/01/04**: IP-Adapter-FaceID-SDXL: An experimental SDXL version of IP-Adapter-FaceID
**Update 2024/01/17**: IP-Adapter-FaceID-PlusV2-SDXL: An experimental SDXL version of IP-Adapter-FaceID-PlusV2. **Update 2024/01/19**: IP-Adapter-FaceID-Portrait: the same as IP-Adapter-FaceID but for portrait generation (no LoRA! no ControlNet!). Specifically, it accepts multiple facial images to enhance similarity (the default is 5).
## Usage ### IP-Adapter-FaceID Firstly, you should use insightface to extract the face ID embedding: Then, you can generate images conditioned on the face embeddings: You can also use a normal IP-Adapter and a normal LoRA to load the model: ### IP-Adapter-FaceID-SDXL Firstly, you should use insightface to extract the face ID embedding: Then, you can generate images conditioned on the face embeddings: ### IP-Adapter-FaceID-Plus Firstly, you should use insightface to extract the face ID embedding and face image: Then, you can generate images conditioned on the face embeddings: ### IP-Adapter-FaceID-Portrait ## Limitations and Bias - The models do not achieve perfect photorealism and ID consistency. - The generalization of the models is limited due to limitations of the training data, base model and face recognition model. ## Non-commercial use **As InsightFace pretrained models are available for non-commercial research purposes only, the IP-Adapter-FaceID models are released exclusively for research purposes and are not intended for commercial use.**", + "model_explanation_gemini": "Generates diverse style images from text prompts while maintaining facial identity consistency using face recognition embeddings and LoRA enhancements." +} \ No newline at end of file diff --git a/data/model_data_json/hantian_layoutreader.json new file mode 100644 index 0000000000000000000000000000000000000000..888004af793960d5b875adfa11cf363d369b2e60 --- /dev/null +++ b/data/model_data_json/hantian_layoutreader.json @@ -0,0 +1,16 @@ +{ + "model_id": "hantian/layoutreader", + "downloads": 246107, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "layoutlmv3", + "token-classification", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers --- # LayoutReader A reading order prediction model. It turns bounding boxes extracted from a PDF or detected by OCR into reading order. Please refer to GitHub for more details.", + "model_explanation_gemini": "Predicts reading order for text bounding boxes extracted from PDFs or OCR outputs." +} \ No newline at end of file diff --git a/data/model_data_json/hexgrad_Kokoro-82M.json new file mode 100644 index 0000000000000000000000000000000000000000..e701917093423f8ea2252819c872445eccd4bf29 --- /dev/null +++ b/data/model_data_json/hexgrad_Kokoro-82M.json @@ -0,0 +1,17 @@ +{ + "model_id": "hexgrad/Kokoro-82M", + "downloads": 1757719, + "tags": [ + "text-to-speech", + "en", + "arxiv:2306.07691", + "arxiv:2203.02395", + "base_model:yl4579/StyleTTS2-LJSpeech", + "base_model:finetune:yl4579/StyleTTS2-LJSpeech", + "doi:10.57967/hf/4329", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en base_model: - yl4579/StyleTTS2-LJSpeech pipeline_tag: text-to-speech --- **Kokoro** is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects. 🐈 **GitHub**: 🚀 **Demo**: > [!NOTE] > As of April 2025, the market rate of Kokoro served over API is **under $1 per million characters of text input**, or under $0.06 per hour of audio output. (On average, 1000 characters of input is about 1 minute of output.)
Sources: ArtificialAnalysis/Replicate at 65 cents per M chars and DeepInfra at 80 cents per M chars. > > This is an Apache-licensed model, and Kokoro has been deployed in numerous projects and commercial APIs. We welcome the deployment of the model in real use cases. > [!CAUTION] > Fake websites like kokorottsai_com and kokorotts_net are likely scams masquerading under the banner of a popular model. > > Any website containing \"kokoro\" in its root domain (e.g. kokorottsai_com, kokorotts_net) is **NOT owned by and NOT affiliated with this model page or its author**, and attempts to imply otherwise are red flags. - Releases - Usage - EVAL.md ↗️ - SAMPLES.md ↗️ - VOICES.md ↗️ - Model Facts - Training Details - Creative Commons Attribution - Acknowledgements ### Releases | Model | Published | Training Data | Langs & Voices | SHA256 | | ----- | --------- | ------------- | -------------- | ------ | | **v1.0** | **2025 Jan 27** | **Few hundred hrs** | **8 & 54** | | | v0.19 | 2024 Dec 25 | <100 hrs | 1 & 10 | | | Training Costs | v0.19 | v1.0 | **Total** | | -------------- | ----- | ---- | ----- | | in A100 80GB GPU hours | 500 | 500 | **1000** | | average hourly rate | $0.80/h | $1.20/h | **$1/h** | | in USD | $400 | $600 | **$1000** | ### Usage You can run this basic cell on Google Colab. Listen to samples. For more languages and details, see Advanced Usage. Under the hood, Kokoro uses a G2P (grapheme-to-phoneme) library for text processing. ### Model Facts **Architecture:** - StyleTTS 2: - ISTFTNet: - Decoder only: no diffusion, no encoder release **Architected by:** Li et al. **Trained by**: on Discord **Languages:** Multiple **Model SHA256 Hash:** ### Training Details **Data:** Kokoro was trained exclusively on **permissive/non-copyrighted audio data** and IPA phoneme labels. Examples of permissive/non-copyrighted audio include: - Public domain audio - Audio licensed under Apache, MIT, etc. - Synthetic audio[1] generated by closed[2] TTS models from large providers
[1] [2] No synthetic audio from open TTS models or \"custom voice clones\" **Total Dataset Size:** A few hundred hours of audio **Total Training Cost:** About $1000 for 1000 hours of A100 80GB vRAM ### Creative Commons Attribution The following CC BY audio was part of the dataset used to train Kokoro v1.0. | Audio Data | Duration Used | License | Added to Training Set After | | ---------- | ------------- | ------- | --------------------------- | | Koniwa | <1h | CC BY 3.0 | v0.19 / 22 Nov 2024 | | SIWIS | <11h | CC BY 4.0 | v0.19 / 22 Nov 2024 | ### Acknowledgements - 🛠️ @yl4579 for architecting StyleTTS 2. - 🏆 @Pendrokar for adding Kokoro as a contender in the TTS Spaces Arena. - 📊 Thank you to everyone who contributed synthetic training data. - ❤️ Special thanks to all compute sponsors. - 👾 Discord server: - 🪽 Kokoro is a Japanese word that translates to \"heart\" or \"spirit\". It is also the name of an AI in the Terminator franchise. \"kokoro\"", + "model_explanation_gemini": "Kokoro-82M is an Apache-licensed, lightweight text-to-speech model delivering efficient, high-quality audio synthesis from text input at low cost." +} \ No newline at end of file diff --git a/data/model_data_json/hfl_chinese-roberta-wwm-ext.json b/data/model_data_json/hfl_chinese-roberta-wwm-ext.json new file mode 100644 index 0000000000000000000000000000000000000000..87728cb1005cf868cfd5870b7e12840666e1bef2 --- /dev/null +++ b/data/model_data_json/hfl_chinese-roberta-wwm-ext.json @@ -0,0 +1,21 @@ +{ + "model_id": "hfl/chinese-roberta-wwm-ext", + "downloads": 98430, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "zh", + "arxiv:1906.08101", + "arxiv:2004.13922", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - zh tags: - bert license: \"apache-2.0\" --- # Please use 'Bert' related functions to load this model! ## Chinese BERT with Whole Word Masking For further accelerating Chinese natural language processing, we provide **Chinese pre-trained BERT with Whole Word Masking**. **Pre-Training with Whole Word Masking for Chinese BERT** Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu This repository is developed based on: You may also interested in, - Chinese BERT series: - Chinese MacBERT: - Chinese ELECTRA: - Chinese XLNet: - Knowledge Distillation Toolkit - TextBrewer: More resources by HFL: ## Citation If you find the technical report or resource is useful, please cite the following technical report in your paper. - Primary: - Secondary:", + "model_explanation_gemini": "Chinese pre-trained BERT model with whole word masking for enhanced natural language processing tasks in Chinese." 
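The usage snippet for the chinese-roberta-wwm-ext card above was stripped; the card only stresses that the model must be loaded with BERT classes rather than RoBERTa ones. A minimal fill-mask sketch under that constraint (the example sentence is illustrative):

```python
from transformers import BertTokenizer, BertForMaskedLM, pipeline

# Per the model card, load with BERT classes, not RoBERTa classes.
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = BertForMaskedLM.from_pretrained("hfl/chinese-roberta-wwm-ext")

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for pred in fill_mask("今天天气很[MASK]。"):  # "The weather today is very [MASK]."
    print(pred["token_str"], round(pred["score"], 4))
```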
+} \ No newline at end of file diff --git a/data/model_data_json/hiiamsid_sentence_similarity_spanish_es.json b/data/model_data_json/hiiamsid_sentence_similarity_spanish_es.json new file mode 100644 index 0000000000000000000000000000000000000000..7c019214e3c64cd61210bd262fa8421eaa5ed093 --- /dev/null +++ b/data/model_data_json/hiiamsid_sentence_similarity_spanish_es.json @@ -0,0 +1,21 @@ +{ + "model_id": "hiiamsid/sentence_similarity_spanish_es", + "downloads": 339193, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "es", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity language: - es tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers license: apache-2.0 --- # hiiamsid/sentence_similarity_spanish_es This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Training The model was trained with the parameters: **DataLoader**: of length 360 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors - Datasets : stsb_multi_mt - Model : dccuchile/bert-base-spanish-wwm-cased - Sentence Transformers Semantic Textual Similarity", + "model_explanation_gemini": "Computes semantic similarity between Spanish sentences by mapping them to 768-dimensional vectors for tasks like clustering or search." 
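The sentence-transformers usage snippet in the Spanish similarity card above was stripped. A minimal sketch matching the described behavior (768-dimensional embeddings compared by cosine similarity); the example sentences are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("hiiamsid/sentence_similarity_spanish_es")

# Encode two Spanish sentences into 768-dimensional vectors.
sentences = ["Mi nombre es Siddhartha", "Mis amigos me llamaron por mi nombre Siddhartha"]
embeddings = model.encode(sentences)
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity in [-1, 1]
```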
+} \ No newline at end of file diff --git a/data/model_data_json/hkunlp_instructor-large.json b/data/model_data_json/hkunlp_instructor-large.json new file mode 100644 index 0000000000000000000000000000000000000000..3e91af1afe2ab95686fa89bacd7bf37ddd2f1f36 --- /dev/null +++ b/data/model_data_json/hkunlp_instructor-large.json @@ -0,0 +1,39 @@ +{ + "model_id": "hkunlp/instructor-large", + "downloads": 153542, + "tags": [ + "sentence-transformers", + "pytorch", + "t5", + "text-embedding", + "embeddings", + "information-retrieval", + "beir", + "text-classification", + "language-model", + "text-clustering", + "text-semantic-similarity", + "text-evaluation", + "prompt-retrieval", + "text-reranking", + "feature-extraction", + "sentence-similarity", + "transformers", + "English", + "Sentence Similarity", + "natural_questions", + "ms_marco", + "fever", + "hotpot_qa", + "mteb", + "en", + "arxiv:2212.09741", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - text-embedding - embeddings - information-retrieval - beir - text-classification - language-model - text-clustering - text-semantic-similarity - text-evaluation - prompt-retrieval - text-reranking - sentence-transformers - feature-extraction - sentence-similarity - transformers - t5 - English - Sentence Similarity - natural_questions - ms_marco - fever - hotpot_qa - mteb language: en inference: false license: apache-2.0 model-index: - name: INSTRUCTOR results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 88.13432835820896 - type: ap value: 59.298209334395665 - type: f1 value: 83.31769058643586 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.526375 - type: ap value: 88.16327709705504 - type: f1 value: 91.51095801287843 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 47.856 - type: f1 value: 45.41490917650942 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 31.223 - type: map_at_10 value: 47.947 - type: map_at_100 value: 48.742000000000004 - type: map_at_1000 value: 48.745 - type: map_at_3 value: 43.137 - type: map_at_5 value: 45.992 - type: mrr_at_1 value: 32.432 - type: mrr_at_10 value: 48.4 - type: mrr_at_100 value: 49.202 - type: mrr_at_1000 value: 49.205 - type: mrr_at_3 value: 43.551 - type: mrr_at_5 value: 46.467999999999996 - type: ndcg_at_1 value: 31.223 - type: ndcg_at_10 value: 57.045 - type: ndcg_at_100 value: 60.175 - type: ndcg_at_1000 value: 60.233000000000004 - type: ndcg_at_3 value: 47.171 - type: ndcg_at_5 value: 52.322 - type: precision_at_1 value: 31.223 - type: precision_at_10 value: 8.599 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 19.63 - type: precision_at_5 value: 14.282 - type: recall_at_1 value: 31.223 - type: recall_at_10 value: 85.989 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.502 - type: 
recall_at_3 value: 58.89 - type: recall_at_5 value: 71.408 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 43.1621946393635 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 32.56417132407894 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 64.29539304390207 - type: mrr value: 76.44484017060196 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_spearman value: 84.38746499431112 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 78.51298701298701 - type: f1 value: 77.49041754069235 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.61848554098577 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 31.32623280148178 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 35.803000000000004 - type: map_at_10 value: 48.848 - type: map_at_100 value: 50.5 - type: map_at_1000 value: 50.602999999999994 - type: map_at_3 value: 45.111000000000004 - type: map_at_5 value: 47.202 - type: mrr_at_1 value: 44.635000000000005 - type: mrr_at_10 value: 55.593 - type: mrr_at_100 value: 56.169999999999995 - type: mrr_at_1000 value: 56.19499999999999 - type: mrr_at_3 value: 53.361999999999995 - type: mrr_at_5 value: 54.806999999999995 - type: ndcg_at_1 value: 44.635000000000005 - type: ndcg_at_10 value: 55.899 - type: ndcg_at_100 value: 60.958 - type: ndcg_at_1000 value: 62.302 - type: ndcg_at_3 value: 51.051 - type: ndcg_at_5 value: 53.351000000000006 - type: precision_at_1 value: 44.635000000000005 - type: precision_at_10 value: 10.786999999999999 - type: precision_at_100 value: 1.6580000000000001 - type: precision_at_1000 value: 0.213 - type: precision_at_3 value: 24.893 - type: precision_at_5 value: 17.740000000000002 - type: recall_at_1 value: 35.803000000000004 - type: recall_at_10 value: 68.657 - type: recall_at_100 value: 89.77199999999999 - type: recall_at_1000 value: 97.67 - type: recall_at_3 value: 54.066 - type: recall_at_5 value: 60.788 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 33.706 - type: map_at_10 value: 44.896 - type: map_at_100 value: 46.299 - type: map_at_1000 value: 46.44 - type: map_at_3 value: 41.721000000000004 - type: map_at_5 value: 43.486000000000004 - type: mrr_at_1 value: 41.592 - type: mrr_at_10 value: 50.529 - type: mrr_at_100 value: 51.22 - type: mrr_at_1000 value: 
51.258 - type: mrr_at_3 value: 48.205999999999996 - type: mrr_at_5 value: 49.528 - type: ndcg_at_1 value: 41.592 - type: ndcg_at_10 value: 50.77199999999999 - type: ndcg_at_100 value: 55.383 - type: ndcg_at_1000 value: 57.288 - type: ndcg_at_3 value: 46.324 - type: ndcg_at_5 value: 48.346000000000004 - type: precision_at_1 value: 41.592 - type: precision_at_10 value: 9.516 - type: precision_at_100 value: 1.541 - type: precision_at_1000 value: 0.2 - type: precision_at_3 value: 22.399 - type: precision_at_5 value: 15.770999999999999 - type: recall_at_1 value: 33.706 - type: recall_at_10 value: 61.353 - type: recall_at_100 value: 80.182 - type: recall_at_1000 value: 91.896 - type: recall_at_3 value: 48.204 - type: recall_at_5 value: 53.89699999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 44.424 - type: map_at_10 value: 57.169000000000004 - type: map_at_100 value: 58.202 - type: map_at_1000 value: 58.242000000000004 - type: map_at_3 value: 53.825 - type: map_at_5 value: 55.714 - type: mrr_at_1 value: 50.470000000000006 - type: mrr_at_10 value: 60.489000000000004 - type: mrr_at_100 value: 61.096 - type: mrr_at_1000 value: 61.112 - type: mrr_at_3 value: 58.192 - type: mrr_at_5 value: 59.611999999999995 - type: ndcg_at_1 value: 50.470000000000006 - type: ndcg_at_10 value: 63.071999999999996 - type: ndcg_at_100 value: 66.964 - type: ndcg_at_1000 value: 67.659 - type: ndcg_at_3 value: 57.74399999999999 - type: ndcg_at_5 value: 60.367000000000004 - type: precision_at_1 value: 50.470000000000006 - type: precision_at_10 value: 10.019 - type: precision_at_100 value: 1.29 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 25.558999999999997 - type: precision_at_5 value: 17.467 - type: recall_at_1 value: 44.424 - type: recall_at_10 value: 77.02 - type: recall_at_100 value: 93.738 - type: recall_at_1000 value: 98.451 - type: recall_at_3 value: 62.888 - type: recall_at_5 value: 69.138 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.294 - type: map_at_10 value: 34.503 - type: map_at_100 value: 35.641 - type: map_at_1000 value: 35.724000000000004 - type: map_at_3 value: 31.753999999999998 - type: map_at_5 value: 33.190999999999995 - type: mrr_at_1 value: 28.362 - type: mrr_at_10 value: 36.53 - type: mrr_at_100 value: 37.541000000000004 - type: mrr_at_1000 value: 37.602000000000004 - type: mrr_at_3 value: 33.917 - type: mrr_at_5 value: 35.358000000000004 - type: ndcg_at_1 value: 28.362 - type: ndcg_at_10 value: 39.513999999999996 - type: ndcg_at_100 value: 44.815 - type: ndcg_at_1000 value: 46.839 - type: ndcg_at_3 value: 34.02 - type: ndcg_at_5 value: 36.522 - type: precision_at_1 value: 28.362 - type: precision_at_10 value: 6.101999999999999 - type: precision_at_100 value: 0.9129999999999999 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 14.161999999999999 - type: precision_at_5 value: 9.966 - type: recall_at_1 value: 26.294 - type: recall_at_10 value: 53.098 - type: recall_at_100 value: 76.877 - type: recall_at_1000 value: 91.834 - type: recall_at_3 value: 38.266 - type: recall_at_5 value: 44.287 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.407 - type: 
map_at_10 value: 25.185999999999996 - type: map_at_100 value: 26.533 - type: map_at_1000 value: 26.657999999999998 - type: map_at_3 value: 22.201999999999998 - type: map_at_5 value: 23.923 - type: mrr_at_1 value: 20.522000000000002 - type: mrr_at_10 value: 29.522 - type: mrr_at_100 value: 30.644 - type: mrr_at_1000 value: 30.713 - type: mrr_at_3 value: 26.679000000000002 - type: mrr_at_5 value: 28.483000000000004 - type: ndcg_at_1 value: 20.522000000000002 - type: ndcg_at_10 value: 30.656 - type: ndcg_at_100 value: 36.864999999999995 - type: ndcg_at_1000 value: 39.675 - type: ndcg_at_3 value: 25.319000000000003 - type: ndcg_at_5 value: 27.992 - type: precision_at_1 value: 20.522000000000002 - type: precision_at_10 value: 5.795999999999999 - type: precision_at_100 value: 1.027 - type: precision_at_1000 value: 0.13999999999999999 - type: precision_at_3 value: 12.396 - type: precision_at_5 value: 9.328 - type: recall_at_1 value: 16.407 - type: recall_at_10 value: 43.164 - type: recall_at_100 value: 69.695 - type: recall_at_1000 value: 89.41900000000001 - type: recall_at_3 value: 28.634999999999998 - type: recall_at_5 value: 35.308 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.473 - type: map_at_10 value: 41.676 - type: map_at_100 value: 43.120999999999995 - type: map_at_1000 value: 43.230000000000004 - type: map_at_3 value: 38.306000000000004 - type: map_at_5 value: 40.355999999999995 - type: mrr_at_1 value: 37.536 - type: mrr_at_10 value: 47.643 - type: mrr_at_100 value: 48.508 - type: mrr_at_1000 value: 48.551 - type: mrr_at_3 value: 45.348 - type: mrr_at_5 value: 46.744 - type: ndcg_at_1 value: 37.536 - type: ndcg_at_10 value: 47.823 - type: ndcg_at_100 value: 53.395 - type: ndcg_at_1000 value: 55.271 - type: ndcg_at_3 value: 42.768 - type: ndcg_at_5 value: 45.373000000000005 - type: precision_at_1 value: 37.536 - type: precision_at_10 value: 8.681 - type: precision_at_100 value: 1.34 - type: precision_at_1000 value: 0.165 - type: precision_at_3 value: 20.468 - type: precision_at_5 value: 14.495 - type: recall_at_1 value: 30.473 - type: recall_at_10 value: 60.092999999999996 - type: recall_at_100 value: 82.733 - type: recall_at_1000 value: 94.875 - type: recall_at_3 value: 45.734 - type: recall_at_5 value: 52.691 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.976000000000003 - type: map_at_10 value: 41.097 - type: map_at_100 value: 42.547000000000004 - type: map_at_1000 value: 42.659000000000006 - type: map_at_3 value: 37.251 - type: map_at_5 value: 39.493 - type: mrr_at_1 value: 37.557 - type: mrr_at_10 value: 46.605000000000004 - type: mrr_at_100 value: 47.487 - type: mrr_at_1000 value: 47.54 - type: mrr_at_3 value: 43.721 - type: mrr_at_5 value: 45.411 - type: ndcg_at_1 value: 37.557 - type: ndcg_at_10 value: 47.449000000000005 - type: ndcg_at_100 value: 53.052 - type: ndcg_at_1000 value: 55.010999999999996 - type: ndcg_at_3 value: 41.439 - type: ndcg_at_5 value: 44.292 - type: precision_at_1 value: 37.557 - type: precision_at_10 value: 8.847 - type: precision_at_100 value: 1.357 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_3 value: 20.091 - type: precision_at_5 value: 14.384 - type: recall_at_1 value: 29.976000000000003 - type: recall_at_10 value: 60.99099999999999 - type: recall_at_100 value: 84.245 - 
type: recall_at_1000 value: 96.97200000000001 - type: recall_at_3 value: 43.794 - type: recall_at_5 value: 51.778999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.099166666666665 - type: map_at_10 value: 38.1365 - type: map_at_100 value: 39.44491666666667 - type: map_at_1000 value: 39.55858333333334 - type: map_at_3 value: 35.03641666666666 - type: map_at_5 value: 36.79833333333334 - type: mrr_at_1 value: 33.39966666666667 - type: mrr_at_10 value: 42.42583333333333 - type: mrr_at_100 value: 43.28575 - type: mrr_at_1000 value: 43.33741666666667 - type: mrr_at_3 value: 39.94975 - type: mrr_at_5 value: 41.41633333333334 - type: ndcg_at_1 value: 33.39966666666667 - type: ndcg_at_10 value: 43.81741666666667 - type: ndcg_at_100 value: 49.08166666666667 - type: ndcg_at_1000 value: 51.121166666666674 - type: ndcg_at_3 value: 38.73575 - type: ndcg_at_5 value: 41.18158333333333 - type: precision_at_1 value: 33.39966666666667 - type: precision_at_10 value: 7.738916666666667 - type: precision_at_100 value: 1.2265833333333331 - type: precision_at_1000 value: 0.15983333333333336 - type: precision_at_3 value: 17.967416666666665 - type: precision_at_5 value: 12.78675 - type: recall_at_1 value: 28.099166666666665 - type: recall_at_10 value: 56.27049999999999 - type: recall_at_100 value: 78.93291666666667 - type: recall_at_1000 value: 92.81608333333334 - type: recall_at_3 value: 42.09775 - type: recall_at_5 value: 48.42533333333334 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.663 - type: map_at_10 value: 30.377 - type: map_at_100 value: 31.426 - type: map_at_1000 value: 31.519000000000002 - type: map_at_3 value: 28.069 - type: map_at_5 value: 29.256999999999998 - type: mrr_at_1 value: 26.687 - type: mrr_at_10 value: 33.107 - type: mrr_at_100 value: 34.055 - type: mrr_at_1000 value: 34.117999999999995 - type: mrr_at_3 value: 31.058000000000003 - type: mrr_at_5 value: 32.14 - type: ndcg_at_1 value: 26.687 - type: ndcg_at_10 value: 34.615 - type: ndcg_at_100 value: 39.776 - type: ndcg_at_1000 value: 42.05 - type: ndcg_at_3 value: 30.322 - type: ndcg_at_5 value: 32.157000000000004 - type: precision_at_1 value: 26.687 - type: precision_at_10 value: 5.491 - type: precision_at_100 value: 0.877 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 13.139000000000001 - type: precision_at_5 value: 9.049 - type: recall_at_1 value: 23.663 - type: recall_at_10 value: 45.035 - type: recall_at_100 value: 68.554 - type: recall_at_1000 value: 85.077 - type: recall_at_3 value: 32.982 - type: recall_at_5 value: 37.688 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.403 - type: map_at_10 value: 25.197000000000003 - type: map_at_100 value: 26.355 - type: map_at_1000 value: 26.487 - type: map_at_3 value: 22.733 - type: map_at_5 value: 24.114 - type: mrr_at_1 value: 21.37 - type: mrr_at_10 value: 29.091 - type: mrr_at_100 value: 30.018 - type: mrr_at_1000 value: 30.096 - type: mrr_at_3 value: 26.887 - type: mrr_at_5 value: 28.157 - type: ndcg_at_1 value: 21.37 - type: ndcg_at_10 value: 30.026000000000003 - type: ndcg_at_100 value: 35.416 - type: ndcg_at_1000 value: 38.45 - type: ndcg_at_3 value: 25.764 - type: ndcg_at_5 value: 
27.742 - type: precision_at_1 value: 21.37 - type: precision_at_10 value: 5.609 - type: precision_at_100 value: 0.9860000000000001 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 12.423 - type: precision_at_5 value: 9.009 - type: recall_at_1 value: 17.403 - type: recall_at_10 value: 40.573 - type: recall_at_100 value: 64.818 - type: recall_at_1000 value: 86.53699999999999 - type: recall_at_3 value: 28.493000000000002 - type: recall_at_5 value: 33.660000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.639 - type: map_at_10 value: 38.951 - type: map_at_100 value: 40.238 - type: map_at_1000 value: 40.327 - type: map_at_3 value: 35.842 - type: map_at_5 value: 37.617 - type: mrr_at_1 value: 33.769 - type: mrr_at_10 value: 43.088 - type: mrr_at_100 value: 44.03 - type: mrr_at_1000 value: 44.072 - type: mrr_at_3 value: 40.656 - type: mrr_at_5 value: 42.138999999999996 - type: ndcg_at_1 value: 33.769 - type: ndcg_at_10 value: 44.676 - type: ndcg_at_100 value: 50.416000000000004 - type: ndcg_at_1000 value: 52.227999999999994 - type: ndcg_at_3 value: 39.494 - type: ndcg_at_5 value: 42.013 - type: precision_at_1 value: 33.769 - type: precision_at_10 value: 7.668 - type: precision_at_100 value: 1.18 - type: precision_at_1000 value: 0.145 - type: precision_at_3 value: 18.221 - type: precision_at_5 value: 12.966 - type: recall_at_1 value: 28.639 - type: recall_at_10 value: 57.687999999999995 - type: recall_at_100 value: 82.541 - type: recall_at_1000 value: 94.896 - type: recall_at_3 value: 43.651 - type: recall_at_5 value: 49.925999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.57 - type: map_at_10 value: 40.004 - type: map_at_100 value: 41.75 - type: map_at_1000 value: 41.97 - type: map_at_3 value: 36.788 - type: map_at_5 value: 38.671 - type: mrr_at_1 value: 35.375 - type: mrr_at_10 value: 45.121 - type: mrr_at_100 value: 45.994 - type: mrr_at_1000 value: 46.04 - type: mrr_at_3 value: 42.227 - type: mrr_at_5 value: 43.995 - type: ndcg_at_1 value: 35.375 - type: ndcg_at_10 value: 46.392 - type: ndcg_at_100 value: 52.196 - type: ndcg_at_1000 value: 54.274 - type: ndcg_at_3 value: 41.163 - type: ndcg_at_5 value: 43.813 - type: precision_at_1 value: 35.375 - type: precision_at_10 value: 8.676 - type: precision_at_100 value: 1.678 - type: precision_at_1000 value: 0.253 - type: precision_at_3 value: 19.104 - type: precision_at_5 value: 13.913 - type: recall_at_1 value: 29.57 - type: recall_at_10 value: 58.779 - type: recall_at_100 value: 83.337 - type: recall_at_1000 value: 95.979 - type: recall_at_3 value: 44.005 - type: recall_at_5 value: 50.975 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 20.832 - type: map_at_10 value: 29.733999999999998 - type: map_at_100 value: 30.727 - type: map_at_1000 value: 30.843999999999998 - type: map_at_3 value: 26.834999999999997 - type: map_at_5 value: 28.555999999999997 - type: mrr_at_1 value: 22.921 - type: mrr_at_10 value: 31.791999999999998 - type: mrr_at_100 value: 32.666000000000004 - type: mrr_at_1000 value: 32.751999999999995 - type: mrr_at_3 value: 29.144 - type: mrr_at_5 value: 30.622 - type: ndcg_at_1 value: 22.921 - type: ndcg_at_10 
value: 34.915 - type: ndcg_at_100 value: 39.744 - type: ndcg_at_1000 value: 42.407000000000004 - type: ndcg_at_3 value: 29.421000000000003 - type: ndcg_at_5 value: 32.211 - type: precision_at_1 value: 22.921 - type: precision_at_10 value: 5.675 - type: precision_at_100 value: 0.872 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 12.753999999999998 - type: precision_at_5 value: 9.353 - type: recall_at_1 value: 20.832 - type: recall_at_10 value: 48.795 - type: recall_at_100 value: 70.703 - type: recall_at_1000 value: 90.187 - type: recall_at_3 value: 34.455000000000005 - type: recall_at_5 value: 40.967 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.334 - type: map_at_10 value: 19.009999999999998 - type: map_at_100 value: 21.129 - type: map_at_1000 value: 21.328 - type: map_at_3 value: 15.152 - type: map_at_5 value: 17.084 - type: mrr_at_1 value: 23.453 - type: mrr_at_10 value: 36.099 - type: mrr_at_100 value: 37.069 - type: mrr_at_1000 value: 37.104 - type: mrr_at_3 value: 32.096000000000004 - type: mrr_at_5 value: 34.451 - type: ndcg_at_1 value: 23.453 - type: ndcg_at_10 value: 27.739000000000004 - type: ndcg_at_100 value: 35.836 - type: ndcg_at_1000 value: 39.242 - type: ndcg_at_3 value: 21.263 - type: ndcg_at_5 value: 23.677 - type: precision_at_1 value: 23.453 - type: precision_at_10 value: 9.199 - type: precision_at_100 value: 1.791 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 16.2 - type: precision_at_5 value: 13.147 - type: recall_at_1 value: 10.334 - type: recall_at_10 value: 35.177 - type: recall_at_100 value: 63.009 - type: recall_at_1000 value: 81.938 - type: recall_at_3 value: 19.914 - type: recall_at_5 value: 26.077 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.212 - type: map_at_10 value: 17.386 - type: map_at_100 value: 24.234 - type: map_at_1000 value: 25.724999999999998 - type: map_at_3 value: 12.727 - type: map_at_5 value: 14.785 - type: mrr_at_1 value: 59.25 - type: mrr_at_10 value: 68.687 - type: mrr_at_100 value: 69.133 - type: mrr_at_1000 value: 69.14099999999999 - type: mrr_at_3 value: 66.917 - type: mrr_at_5 value: 67.742 - type: ndcg_at_1 value: 48.625 - type: ndcg_at_10 value: 36.675999999999995 - type: ndcg_at_100 value: 41.543 - type: ndcg_at_1000 value: 49.241 - type: ndcg_at_3 value: 41.373 - type: ndcg_at_5 value: 38.707 - type: precision_at_1 value: 59.25 - type: precision_at_10 value: 28.525 - type: precision_at_100 value: 9.027000000000001 - type: precision_at_1000 value: 1.8339999999999999 - type: precision_at_3 value: 44.833 - type: precision_at_5 value: 37.35 - type: recall_at_1 value: 8.212 - type: recall_at_10 value: 23.188 - type: recall_at_100 value: 48.613 - type: recall_at_1000 value: 73.093 - type: recall_at_3 value: 14.419 - type: recall_at_5 value: 17.798 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 52.725 - type: f1 value: 46.50743309855908 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 55.086 - type: map_at_10 value: 66.914 - type: map_at_100 value: 67.321 - type: map_at_1000 value: 67.341 - type: map_at_3 value: 64.75800000000001 - type: map_at_5 
value: 66.189 - type: mrr_at_1 value: 59.28600000000001 - type: mrr_at_10 value: 71.005 - type: mrr_at_100 value: 71.304 - type: mrr_at_1000 value: 71.313 - type: mrr_at_3 value: 69.037 - type: mrr_at_5 value: 70.35 - type: ndcg_at_1 value: 59.28600000000001 - type: ndcg_at_10 value: 72.695 - type: ndcg_at_100 value: 74.432 - type: ndcg_at_1000 value: 74.868 - type: ndcg_at_3 value: 68.72200000000001 - type: ndcg_at_5 value: 71.081 - type: precision_at_1 value: 59.28600000000001 - type: precision_at_10 value: 9.499 - type: precision_at_100 value: 1.052 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 27.503 - type: precision_at_5 value: 17.854999999999997 - type: recall_at_1 value: 55.086 - type: recall_at_10 value: 86.453 - type: recall_at_100 value: 94.028 - type: recall_at_1000 value: 97.052 - type: recall_at_3 value: 75.821 - type: recall_at_5 value: 81.6 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.262999999999998 - type: map_at_10 value: 37.488 - type: map_at_100 value: 39.498 - type: map_at_1000 value: 39.687 - type: map_at_3 value: 32.529 - type: map_at_5 value: 35.455 - type: mrr_at_1 value: 44.907000000000004 - type: mrr_at_10 value: 53.239000000000004 - type: mrr_at_100 value: 54.086 - type: mrr_at_1000 value: 54.122 - type: mrr_at_3 value: 51.235 - type: mrr_at_5 value: 52.415 - type: ndcg_at_1 value: 44.907000000000004 - type: ndcg_at_10 value: 45.446 - type: ndcg_at_100 value: 52.429 - type: ndcg_at_1000 value: 55.169000000000004 - type: ndcg_at_3 value: 41.882000000000005 - type: ndcg_at_5 value: 43.178 - type: precision_at_1 value: 44.907000000000004 - type: precision_at_10 value: 12.931999999999999 - type: precision_at_100 value: 2.025 - type: precision_at_1000 value: 0.248 - type: precision_at_3 value: 28.652 - type: precision_at_5 value: 21.204 - type: recall_at_1 value: 22.262999999999998 - type: recall_at_10 value: 52.447 - type: recall_at_100 value: 78.045 - type: recall_at_1000 value: 94.419 - type: recall_at_3 value: 38.064 - type: recall_at_5 value: 44.769 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 32.519 - type: map_at_10 value: 45.831 - type: map_at_100 value: 46.815 - type: map_at_1000 value: 46.899 - type: map_at_3 value: 42.836 - type: map_at_5 value: 44.65 - type: mrr_at_1 value: 65.037 - type: mrr_at_10 value: 72.16 - type: mrr_at_100 value: 72.51100000000001 - type: mrr_at_1000 value: 72.53 - type: mrr_at_3 value: 70.682 - type: mrr_at_5 value: 71.54599999999999 - type: ndcg_at_1 value: 65.037 - type: ndcg_at_10 value: 55.17999999999999 - type: ndcg_at_100 value: 58.888 - type: ndcg_at_1000 value: 60.648 - type: ndcg_at_3 value: 50.501 - type: ndcg_at_5 value: 52.977 - type: precision_at_1 value: 65.037 - type: precision_at_10 value: 11.530999999999999 - type: precision_at_100 value: 1.4460000000000002 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 31.483 - type: precision_at_5 value: 20.845 - type: recall_at_1 value: 32.519 - type: recall_at_10 value: 57.657000000000004 - type: recall_at_100 value: 72.30199999999999 - type: recall_at_1000 value: 84.024 - type: recall_at_3 value: 47.225 - type: recall_at_5 value: 52.113 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy 
value: 88.3168 - type: ap value: 83.80165516037135 - type: f1 value: 88.29942471066407 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 20.724999999999998 - type: map_at_10 value: 32.736 - type: map_at_100 value: 33.938 - type: map_at_1000 value: 33.991 - type: map_at_3 value: 28.788000000000004 - type: map_at_5 value: 31.016 - type: mrr_at_1 value: 21.361 - type: mrr_at_10 value: 33.323 - type: mrr_at_100 value: 34.471000000000004 - type: mrr_at_1000 value: 34.518 - type: mrr_at_3 value: 29.453000000000003 - type: mrr_at_5 value: 31.629 - type: ndcg_at_1 value: 21.361 - type: ndcg_at_10 value: 39.649 - type: ndcg_at_100 value: 45.481 - type: ndcg_at_1000 value: 46.775 - type: ndcg_at_3 value: 31.594 - type: ndcg_at_5 value: 35.543 - type: precision_at_1 value: 21.361 - type: precision_at_10 value: 6.3740000000000006 - type: precision_at_100 value: 0.931 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 13.514999999999999 - type: precision_at_5 value: 10.100000000000001 - type: recall_at_1 value: 20.724999999999998 - type: recall_at_10 value: 61.034 - type: recall_at_100 value: 88.062 - type: recall_at_1000 value: 97.86399999999999 - type: recall_at_3 value: 39.072 - type: recall_at_5 value: 48.53 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.8919288645691 - type: f1 value: 93.57059586398059 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 67.97993616051072 - type: f1 value: 48.244319183606535 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.90047074646941 - type: f1 value: 66.48999056063725 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.34566240753195 - type: f1 value: 73.54164154290658 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.21866934757011 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.000936217235534 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.68189362520352 - type: mrr value: 32.69603637784303 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.078 - type: map_at_10 value: 12.671 - type: map_at_100 value: 16.291 - type: map_at_1000 value: 17.855999999999998 - type: map_at_3 value: 9.610000000000001 - type: map_at_5 value: 11.152 - type: mrr_at_1 value: 43.963 - type: mrr_at_10 value: 53.173 - type: mrr_at_100 value: 
53.718999999999994 - type: mrr_at_1000 value: 53.756 - type: mrr_at_3 value: 50.980000000000004 - type: mrr_at_5 value: 52.42 - type: ndcg_at_1 value: 42.415000000000006 - type: ndcg_at_10 value: 34.086 - type: ndcg_at_100 value: 32.545 - type: ndcg_at_1000 value: 41.144999999999996 - type: ndcg_at_3 value: 39.434999999999995 - type: ndcg_at_5 value: 37.888 - type: precision_at_1 value: 43.653 - type: precision_at_10 value: 25.014999999999997 - type: precision_at_100 value: 8.594 - type: precision_at_1000 value: 2.169 - type: precision_at_3 value: 37.049 - type: precision_at_5 value: 33.065 - type: recall_at_1 value: 6.078 - type: recall_at_10 value: 16.17 - type: recall_at_100 value: 34.512 - type: recall_at_1000 value: 65.447 - type: recall_at_3 value: 10.706 - type: recall_at_5 value: 13.158 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 27.378000000000004 - type: map_at_10 value: 42.178 - type: map_at_100 value: 43.32 - type: map_at_1000 value: 43.358000000000004 - type: map_at_3 value: 37.474000000000004 - type: map_at_5 value: 40.333000000000006 - type: mrr_at_1 value: 30.823 - type: mrr_at_10 value: 44.626 - type: mrr_at_100 value: 45.494 - type: mrr_at_1000 value: 45.519 - type: mrr_at_3 value: 40.585 - type: mrr_at_5 value: 43.146 - type: ndcg_at_1 value: 30.794 - type: ndcg_at_10 value: 50.099000000000004 - type: ndcg_at_100 value: 54.900999999999996 - type: ndcg_at_1000 value: 55.69499999999999 - type: ndcg_at_3 value: 41.238 - type: ndcg_at_5 value: 46.081 - type: precision_at_1 value: 30.794 - type: precision_at_10 value: 8.549 - type: precision_at_100 value: 1.124 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 18.926000000000002 - type: precision_at_5 value: 14.16 - type: recall_at_1 value: 27.378000000000004 - type: recall_at_10 value: 71.842 - type: recall_at_100 value: 92.565 - type: recall_at_1000 value: 98.402 - type: recall_at_3 value: 49.053999999999995 - type: recall_at_5 value: 60.207 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.557 - type: map_at_10 value: 84.729 - type: map_at_100 value: 85.369 - type: map_at_1000 value: 85.382 - type: map_at_3 value: 81.72 - type: map_at_5 value: 83.613 - type: mrr_at_1 value: 81.3 - type: mrr_at_10 value: 87.488 - type: mrr_at_100 value: 87.588 - type: mrr_at_1000 value: 87.589 - type: mrr_at_3 value: 86.53 - type: mrr_at_5 value: 87.18599999999999 - type: ndcg_at_1 value: 81.28999999999999 - type: ndcg_at_10 value: 88.442 - type: ndcg_at_100 value: 89.637 - type: ndcg_at_1000 value: 89.70700000000001 - type: ndcg_at_3 value: 85.55199999999999 - type: ndcg_at_5 value: 87.154 - type: precision_at_1 value: 81.28999999999999 - type: precision_at_10 value: 13.489999999999998 - type: precision_at_100 value: 1.54 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.553 - type: precision_at_5 value: 24.708 - type: recall_at_1 value: 70.557 - type: recall_at_10 value: 95.645 - type: recall_at_100 value: 99.693 - type: recall_at_1000 value: 99.995 - type: recall_at_3 value: 87.359 - type: recall_at_5 value: 91.89699999999999 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 63.65060114776209 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p 
name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 64.63271250680617 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.263 - type: map_at_10 value: 10.801 - type: map_at_100 value: 12.888 - type: map_at_1000 value: 13.224 - type: map_at_3 value: 7.362 - type: map_at_5 value: 9.149000000000001 - type: mrr_at_1 value: 21 - type: mrr_at_10 value: 31.416 - type: mrr_at_100 value: 32.513 - type: mrr_at_1000 value: 32.58 - type: mrr_at_3 value: 28.116999999999997 - type: mrr_at_5 value: 29.976999999999997 - type: ndcg_at_1 value: 21 - type: ndcg_at_10 value: 18.551000000000002 - type: ndcg_at_100 value: 26.657999999999998 - type: ndcg_at_1000 value: 32.485 - type: ndcg_at_3 value: 16.834 - type: ndcg_at_5 value: 15.204999999999998 - type: precision_at_1 value: 21 - type: precision_at_10 value: 9.84 - type: precision_at_100 value: 2.16 - type: precision_at_1000 value: 0.35500000000000004 - type: precision_at_3 value: 15.667 - type: precision_at_5 value: 13.62 - type: recall_at_1 value: 4.263 - type: recall_at_10 value: 19.922 - type: recall_at_100 value: 43.808 - type: recall_at_1000 value: 72.14500000000001 - type: recall_at_3 value: 9.493 - type: recall_at_5 value: 13.767999999999999 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_spearman value: 81.27446313317233 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_spearman value: 76.27963301217527 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_spearman value: 88.18495048450949 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_spearman value: 81.91982338692046 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_spearman value: 89.00896818385291 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_spearman value: 85.48814644586132 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 90.30116926966582 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_spearman value: 67.74132963032342 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_spearman value: 86.87741355780479 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 82.0019012295875 - type: mrr value: 94.70267024188593 - task: type: Retrieval dataset: type: scifact 
name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 50.05 - type: map_at_10 value: 59.36 - type: map_at_100 value: 59.967999999999996 - type: map_at_1000 value: 60.023 - type: map_at_3 value: 56.515 - type: map_at_5 value: 58.272999999999996 - type: mrr_at_1 value: 53 - type: mrr_at_10 value: 61.102000000000004 - type: mrr_at_100 value: 61.476 - type: mrr_at_1000 value: 61.523 - type: mrr_at_3 value: 58.778 - type: mrr_at_5 value: 60.128 - type: ndcg_at_1 value: 53 - type: ndcg_at_10 value: 64.43100000000001 - type: ndcg_at_100 value: 66.73599999999999 - type: ndcg_at_1000 value: 68.027 - type: ndcg_at_3 value: 59.279 - type: ndcg_at_5 value: 61.888 - type: precision_at_1 value: 53 - type: precision_at_10 value: 8.767 - type: precision_at_100 value: 1.01 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 23.444000000000003 - type: precision_at_5 value: 15.667 - type: recall_at_1 value: 50.05 - type: recall_at_10 value: 78.511 - type: recall_at_100 value: 88.5 - type: recall_at_1000 value: 98.333 - type: recall_at_3 value: 64.117 - type: recall_at_5 value: 70.867 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.72178217821782 - type: cos_sim_ap value: 93.0728601593541 - type: cos_sim_f1 value: 85.6727976766699 - type: cos_sim_precision value: 83.02063789868667 - type: cos_sim_recall value: 88.5 - type: dot_accuracy value: 99.72178217821782 - type: dot_ap value: 93.07287396168348 - type: dot_f1 value: 85.6727976766699 - type: dot_precision value: 83.02063789868667 - type: dot_recall value: 88.5 - type: euclidean_accuracy value: 99.72178217821782 - type: euclidean_ap value: 93.07285657982895 - type: euclidean_f1 value: 85.6727976766699 - type: euclidean_precision value: 83.02063789868667 - type: euclidean_recall value: 88.5 - type: manhattan_accuracy value: 99.72475247524753 - type: manhattan_ap value: 93.02792973059809 - type: manhattan_f1 value: 85.7727737973388 - type: manhattan_precision value: 87.84067085953879 - type: manhattan_recall value: 83.8 - type: max_accuracy value: 99.72475247524753 - type: max_ap value: 93.07287396168348 - type: max_f1 value: 85.7727737973388 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 68.77583615550819 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 36.151636938606956 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.16607939471187 - type: mrr value: 52.95172046091163 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.314646669495666 - type: cos_sim_spearman value: 31.83562491439455 - type: dot_pearson value: 31.314590842874157 - type: dot_spearman value: 31.83363065810437 - task: type: Retrieval dataset: type: 
trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.198 - type: map_at_10 value: 1.3010000000000002 - type: map_at_100 value: 7.2139999999999995 - type: map_at_1000 value: 20.179 - type: map_at_3 value: 0.528 - type: map_at_5 value: 0.8019999999999999 - type: mrr_at_1 value: 72 - type: mrr_at_10 value: 83.39999999999999 - type: mrr_at_100 value: 83.39999999999999 - type: mrr_at_1000 value: 83.39999999999999 - type: mrr_at_3 value: 81.667 - type: mrr_at_5 value: 83.06700000000001 - type: ndcg_at_1 value: 66 - type: ndcg_at_10 value: 58.059000000000005 - type: ndcg_at_100 value: 44.316 - type: ndcg_at_1000 value: 43.147000000000006 - type: ndcg_at_3 value: 63.815999999999995 - type: ndcg_at_5 value: 63.005 - type: precision_at_1 value: 72 - type: precision_at_10 value: 61.4 - type: precision_at_100 value: 45.62 - type: precision_at_1000 value: 19.866 - type: precision_at_3 value: 70 - type: precision_at_5 value: 68.8 - type: recall_at_1 value: 0.198 - type: recall_at_10 value: 1.517 - type: recall_at_100 value: 10.587 - type: recall_at_1000 value: 41.233 - type: recall_at_3 value: 0.573 - type: recall_at_5 value: 0.907 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 1.894 - type: map_at_10 value: 8.488999999999999 - type: map_at_100 value: 14.445 - type: map_at_1000 value: 16.078 - type: map_at_3 value: 4.589 - type: map_at_5 value: 6.019 - type: mrr_at_1 value: 22.448999999999998 - type: mrr_at_10 value: 39.82 - type: mrr_at_100 value: 40.752 - type: mrr_at_1000 value: 40.771 - type: mrr_at_3 value: 34.354 - type: mrr_at_5 value: 37.721 - type: ndcg_at_1 value: 19.387999999999998 - type: ndcg_at_10 value: 21.563 - type: ndcg_at_100 value: 33.857 - type: ndcg_at_1000 value: 46.199 - type: ndcg_at_3 value: 22.296 - type: ndcg_at_5 value: 21.770999999999997 - type: precision_at_1 value: 22.448999999999998 - type: precision_at_10 value: 19.796 - type: precision_at_100 value: 7.142999999999999 - type: precision_at_1000 value: 1.541 - type: precision_at_3 value: 24.490000000000002 - type: precision_at_5 value: 22.448999999999998 - type: recall_at_1 value: 1.894 - type: recall_at_10 value: 14.931 - type: recall_at_100 value: 45.524 - type: recall_at_1000 value: 83.243 - type: recall_at_3 value: 5.712 - type: recall_at_5 value: 8.386000000000001 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.049 - type: ap value: 13.85116971310922 - type: f1 value: 54.37504302487686 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 64.1312959818902 - type: f1 value: 64.11413877009383 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 54.13103431861502 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.327889372355 - type: 
cos_sim_ap value: 77.42059895975699 - type: cos_sim_f1 value: 71.02706903250873 - type: cos_sim_precision value: 69.75324344950394 - type: cos_sim_recall value: 72.34828496042216 - type: dot_accuracy value: 87.327889372355 - type: dot_ap value: 77.4209479346677 - type: dot_f1 value: 71.02706903250873 - type: dot_precision value: 69.75324344950394 - type: dot_recall value: 72.34828496042216 - type: euclidean_accuracy value: 87.327889372355 - type: euclidean_ap value: 77.42096495861037 - type: euclidean_f1 value: 71.02706903250873 - type: euclidean_precision value: 69.75324344950394 - type: euclidean_recall value: 72.34828496042216 - type: manhattan_accuracy value: 87.31000774870358 - type: manhattan_ap value: 77.38930750711619 - type: manhattan_f1 value: 71.07935314027831 - type: manhattan_precision value: 67.70957726295677 - type: manhattan_recall value: 74.80211081794195 - type: max_accuracy value: 87.327889372355 - type: max_ap value: 77.42096495861037 - type: max_f1 value: 71.07935314027831 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.58939729110878 - type: cos_sim_ap value: 87.17594155025475 - type: cos_sim_f1 value: 79.21146953405018 - type: cos_sim_precision value: 76.8918527109307 - type: cos_sim_recall value: 81.67539267015707 - type: dot_accuracy value: 89.58939729110878 - type: dot_ap value: 87.17593963273593 - type: dot_f1 value: 79.21146953405018 - type: dot_precision value: 76.8918527109307 - type: dot_recall value: 81.67539267015707 - type: euclidean_accuracy value: 89.58939729110878 - type: euclidean_ap value: 87.17592466925834 - type: euclidean_f1 value: 79.21146953405018 - type: euclidean_precision value: 76.8918527109307 - type: euclidean_recall value: 81.67539267015707 - type: manhattan_accuracy value: 89.62626615438352 - type: manhattan_ap value: 87.16589873161546 - type: manhattan_f1 value: 79.25143598295348 - type: manhattan_precision value: 76.39494177323712 - type: manhattan_recall value: 82.32984293193716 - type: max_accuracy value: 89.62626615438352 - type: max_ap value: 87.17594155025475 - type: max_f1 value: 79.25143598295348 --- # hkunlp/instructor-large We introduce **Instructor**👨‍🏫, an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domain (e.g., science, finance, etc.) ***by simply providing the task instruction, without any finetuning***. Instructor👨‍🏫 achieves state-of-the-art performance on 70 diverse embedding tasks (see the MTEB leaderboard)! The model is easy to use with **our customized** `sentence-transformer` library. For more details, check out our paper and project page! **************************** **Updates** **************************** * 12/28: We released a new checkpoint trained with hard negatives, which gives better performance. * 12/21: We released our paper, code, checkpoint, and project page. Check them out! ## Quick start
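A minimal quick-start sketch, assuming the `InstructorEmbedding` package that accompanies the Instructor models (installable with `pip install InstructorEmbedding`); the example sentence and instruction are illustrative:

```python
# Minimal quick-start sketch; assumes the InstructorEmbedding package
# (pip install InstructorEmbedding), which wraps sentence-transformers.
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-large')

# Each input is an [instruction, text] pair; the instruction tailors the
# embedding to a domain and task without any finetuning.
sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title:"

embeddings = model.encode([[instruction, sentence]])
print(embeddings.shape)  # a numpy array with one embedding vector per input pair
```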
## Installation Install our customized library, e.g. with `pip install InstructorEmbedding` (as in the quick-start sketch above). ## Compute your customized embeddings Then you can use the model like this to calculate domain-specific and task-aware embeddings: pair each input text with an instruction describing its domain and task, and pass the pairs to the model's encode method. ## Use cases A combined sketch of the main use cases follows; the sections below describe each one in turn.
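A combined sketch of these use cases, assuming the same `InstructorEmbedding` setup as in the quick start; all instructions and texts here are illustrative placeholders, not prescribed values:

```python
# Combined sketch of the use cases below; assumes the InstructorEmbedding
# package and scikit-learn. Instructions and texts are illustrative only.
import numpy as np
import sklearn.cluster
from sklearn.metrics.pairwise import cosine_similarity
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-large')

# Sentence similarity: encode two groups of [instruction, text] pairs and
# compare the customized embeddings with cosine similarity.
group_a = [['Represent the Science sentence: ',
            'Dense retrieval maps queries and documents into one vector space.']]
group_b = [['Represent the Science sentence: ',
            'Embedding-based search encodes text as vectors for matching.']]
print(cosine_similarity(model.encode(group_a), model.encode(group_b)))

# Information retrieval: queries and documents get different instructions.
query = [['Represent the Wikipedia question for retrieving supporting documents: ',
          'where is the food stored in a yam plant']]
documents = ['Yam plants store food in underground tubers.',
             'Paris is the capital and most populous city of France.']
corpus = [['Represent the Wikipedia document for retrieval: ', d] for d in documents]
scores = cosine_similarity(model.encode(query), model.encode(corpus))[0]
print('Best match:', documents[int(np.argmax(scores))])

# Clustering: group texts by running k-means on their customized embeddings.
to_cluster = [['Represent the Science sentence for clustering: ', t] for t in (
    'Neutrino oscillations imply nonzero neutrino masses.',
    'Gradient descent minimizes a differentiable loss function.')]
kmeans = sklearn.cluster.MiniBatchKMeans(n_clusters=2)
print(kmeans.fit_predict(model.encode(to_cluster)))
```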
## Calculate embeddings for your customized texts If you want to calculate customized embeddings for specific sentences, you may follow the unified template to write instructions: Represent the `[domain]` `[text_type]` for `[task_objective]`: * `[domain]` is optional, and it specifies the domain of the text, e.g., science, finance, medicine, etc. * `[text_type]` is required, and it specifies the encoding unit, e.g., sentence, document, paragraph, etc. * `[task_objective]` is optional, and it specifies the objective of embedding, e.g., retrieve a document, classify the sentence, etc. ## Calculate Sentence similarities You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**. ## Information Retrieval You can also use **customized embeddings** for information retrieval. ## Clustering Use **customized embeddings** for clustering texts in groups.", + "model_explanation_gemini": "Generates text embeddings for tasks like sentence similarity, information retrieval, and text classification by processing input instructions and text pairs." +} \ No newline at end of file diff --git a/data/model_data_json/hkunlp_instructor-xl.json b/data/model_data_json/hkunlp_instructor-xl.json new file mode 100644 index 0000000000000000000000000000000000000000..18abf0b93bdf92bf9865b72101435ffd6a9e6e46 --- /dev/null +++ b/data/model_data_json/hkunlp_instructor-xl.json @@ -0,0 +1,39 @@ +{ + "model_id": "hkunlp/instructor-xl", + "downloads": 132393, + "tags": [ + "sentence-transformers", + "pytorch", + "t5", + "text-embedding", + "embeddings", + "information-retrieval", + "beir", + "text-classification", + "language-model", + "text-clustering", + "text-semantic-similarity", + "text-evaluation", + "prompt-retrieval", + "text-reranking", + "feature-extraction", + "sentence-similarity", + "transformers", + "English", + "Sentence Similarity", + "natural_questions", + "ms_marco", + "fever", + "hotpot_qa", + "mteb", + "en", + "arxiv:2212.09741", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - text-embedding - embeddings - information-retrieval - beir - text-classification - language-model - text-clustering - text-semantic-similarity - text-evaluation - prompt-retrieval - text-reranking - sentence-transformers - feature-extraction - sentence-similarity - transformers - t5 - English - Sentence Similarity - natural_questions - ms_marco - fever - hotpot_qa - mteb language: en inference: false license: apache-2.0 model-index: - name: final_xl_results results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 85.08955223880596 - type: ap value: 52.66066378722476 - type: f1 value: 79.63340218960269 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 86.542 - type: ap value: 81.92695193008987 - type: f1 value: 86.51466132573681 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 42.964 - type: f1 value: 41.43146249774862 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config:
default split: test revision: None metrics: - type: map_at_1 value: 29.872 - type: map_at_10 value: 46.342 - type: map_at_100 value: 47.152 - type: map_at_1000 value: 47.154 - type: map_at_3 value: 41.216 - type: map_at_5 value: 44.035999999999994 - type: mrr_at_1 value: 30.939 - type: mrr_at_10 value: 46.756 - type: mrr_at_100 value: 47.573 - type: mrr_at_1000 value: 47.575 - type: mrr_at_3 value: 41.548 - type: mrr_at_5 value: 44.425 - type: ndcg_at_1 value: 29.872 - type: ndcg_at_10 value: 55.65 - type: ndcg_at_100 value: 58.88099999999999 - type: ndcg_at_1000 value: 58.951 - type: ndcg_at_3 value: 45.0 - type: ndcg_at_5 value: 50.09 - type: precision_at_1 value: 29.872 - type: precision_at_10 value: 8.549 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 18.658 - type: precision_at_5 value: 13.669999999999998 - type: recall_at_1 value: 29.872 - type: recall_at_10 value: 85.491 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 55.974000000000004 - type: recall_at_5 value: 68.35 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 42.452729850641276 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 32.21141846480423 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 65.34710928952622 - type: mrr value: 77.61124301983028 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_spearman value: 84.15312230525639 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 82.66233766233766 - type: f1 value: 82.04175284777669 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.36697339826455 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 30.551241447593092 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 36.797000000000004 - type: map_at_10 value: 48.46 - type: map_at_100 value: 49.968 - type: map_at_1000 value: 50.080000000000005 - type: map_at_3 value: 44.71 - type: map_at_5 value: 46.592 - type: mrr_at_1 value: 45.494 - type: mrr_at_10 value: 54.747 - type: mrr_at_100 value: 55.43599999999999 - type: mrr_at_1000 value: 55.464999999999996 - type: mrr_at_3 value: 52.361000000000004 - type: mrr_at_5 value: 53.727000000000004 - type: ndcg_at_1 value: 45.494 - type: ndcg_at_10 value: 54.989 - type: ndcg_at_100 value: 60.096000000000004 - type: ndcg_at_1000 value: 61.58 - type: ndcg_at_3 value: 49.977 - type: ndcg_at_5 
value: 51.964999999999996 - type: precision_at_1 value: 45.494 - type: precision_at_10 value: 10.558 - type: precision_at_100 value: 1.6049999999999998 - type: precision_at_1000 value: 0.203 - type: precision_at_3 value: 23.796 - type: precision_at_5 value: 16.881 - type: recall_at_1 value: 36.797000000000004 - type: recall_at_10 value: 66.83 - type: recall_at_100 value: 88.34100000000001 - type: recall_at_1000 value: 97.202 - type: recall_at_3 value: 51.961999999999996 - type: recall_at_5 value: 57.940000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.597 - type: map_at_10 value: 43.424 - type: map_at_100 value: 44.78 - type: map_at_1000 value: 44.913 - type: map_at_3 value: 40.315 - type: map_at_5 value: 41.987 - type: mrr_at_1 value: 40.382 - type: mrr_at_10 value: 49.219 - type: mrr_at_100 value: 49.895 - type: mrr_at_1000 value: 49.936 - type: mrr_at_3 value: 46.996 - type: mrr_at_5 value: 48.231 - type: ndcg_at_1 value: 40.382 - type: ndcg_at_10 value: 49.318 - type: ndcg_at_100 value: 53.839999999999996 - type: ndcg_at_1000 value: 55.82899999999999 - type: ndcg_at_3 value: 44.914 - type: ndcg_at_5 value: 46.798 - type: precision_at_1 value: 40.382 - type: precision_at_10 value: 9.274000000000001 - type: precision_at_100 value: 1.497 - type: precision_at_1000 value: 0.198 - type: precision_at_3 value: 21.592 - type: precision_at_5 value: 15.159 - type: recall_at_1 value: 32.597 - type: recall_at_10 value: 59.882000000000005 - type: recall_at_100 value: 78.446 - type: recall_at_1000 value: 90.88000000000001 - type: recall_at_3 value: 46.9 - type: recall_at_5 value: 52.222 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 43.8 - type: map_at_10 value: 57.293000000000006 - type: map_at_100 value: 58.321 - type: map_at_1000 value: 58.361 - type: map_at_3 value: 53.839999999999996 - type: map_at_5 value: 55.838 - type: mrr_at_1 value: 49.592000000000006 - type: mrr_at_10 value: 60.643 - type: mrr_at_100 value: 61.23499999999999 - type: mrr_at_1000 value: 61.251999999999995 - type: mrr_at_3 value: 58.265 - type: mrr_at_5 value: 59.717 - type: ndcg_at_1 value: 49.592000000000006 - type: ndcg_at_10 value: 63.364 - type: ndcg_at_100 value: 67.167 - type: ndcg_at_1000 value: 67.867 - type: ndcg_at_3 value: 57.912 - type: ndcg_at_5 value: 60.697 - type: precision_at_1 value: 49.592000000000006 - type: precision_at_10 value: 10.088 - type: precision_at_100 value: 1.2930000000000001 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 25.789 - type: precision_at_5 value: 17.541999999999998 - type: recall_at_1 value: 43.8 - type: recall_at_10 value: 77.635 - type: recall_at_100 value: 93.748 - type: recall_at_1000 value: 98.468 - type: recall_at_3 value: 63.223 - type: recall_at_5 value: 70.122 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.721 - type: map_at_10 value: 35.626999999999995 - type: map_at_100 value: 36.719 - type: map_at_1000 value: 36.8 - type: map_at_3 value: 32.781 - type: map_at_5 value: 34.333999999999996 - type: mrr_at_1 value: 29.604999999999997 - type: mrr_at_10 value: 37.564 - type: mrr_at_100 value: 38.505 - type: mrr_at_1000 value: 38.565 - type: mrr_at_3 value: 
34.727000000000004 - type: mrr_at_5 value: 36.207 - type: ndcg_at_1 value: 29.604999999999997 - type: ndcg_at_10 value: 40.575 - type: ndcg_at_100 value: 45.613 - type: ndcg_at_1000 value: 47.676 - type: ndcg_at_3 value: 34.811 - type: ndcg_at_5 value: 37.491 - type: precision_at_1 value: 29.604999999999997 - type: precision_at_10 value: 6.1690000000000005 - type: precision_at_100 value: 0.906 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 14.237 - type: precision_at_5 value: 10.056 - type: recall_at_1 value: 27.721 - type: recall_at_10 value: 54.041 - type: recall_at_100 value: 76.62299999999999 - type: recall_at_1000 value: 92.134 - type: recall_at_3 value: 38.582 - type: recall_at_5 value: 44.989000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.553 - type: map_at_10 value: 25.384 - type: map_at_100 value: 26.655 - type: map_at_1000 value: 26.778000000000002 - type: map_at_3 value: 22.733 - type: map_at_5 value: 24.119 - type: mrr_at_1 value: 20.149 - type: mrr_at_10 value: 29.705 - type: mrr_at_100 value: 30.672 - type: mrr_at_1000 value: 30.737 - type: mrr_at_3 value: 27.032 - type: mrr_at_5 value: 28.369 - type: ndcg_at_1 value: 20.149 - type: ndcg_at_10 value: 30.843999999999998 - type: ndcg_at_100 value: 36.716 - type: ndcg_at_1000 value: 39.495000000000005 - type: ndcg_at_3 value: 25.918999999999997 - type: ndcg_at_5 value: 27.992 - type: precision_at_1 value: 20.149 - type: precision_at_10 value: 5.858 - type: precision_at_100 value: 1.009 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.645000000000001 - type: precision_at_5 value: 9.179 - type: recall_at_1 value: 16.553 - type: recall_at_10 value: 43.136 - type: recall_at_100 value: 68.562 - type: recall_at_1000 value: 88.208 - type: recall_at_3 value: 29.493000000000002 - type: recall_at_5 value: 34.751 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.000999999999998 - type: map_at_10 value: 39.004 - type: map_at_100 value: 40.461999999999996 - type: map_at_1000 value: 40.566 - type: map_at_3 value: 35.805 - type: map_at_5 value: 37.672 - type: mrr_at_1 value: 33.782000000000004 - type: mrr_at_10 value: 44.702 - type: mrr_at_100 value: 45.528 - type: mrr_at_1000 value: 45.576 - type: mrr_at_3 value: 42.14 - type: mrr_at_5 value: 43.651 - type: ndcg_at_1 value: 33.782000000000004 - type: ndcg_at_10 value: 45.275999999999996 - type: ndcg_at_100 value: 50.888 - type: ndcg_at_1000 value: 52.879 - type: ndcg_at_3 value: 40.191 - type: ndcg_at_5 value: 42.731 - type: precision_at_1 value: 33.782000000000004 - type: precision_at_10 value: 8.200000000000001 - type: precision_at_100 value: 1.287 - type: precision_at_1000 value: 0.16199999999999998 - type: precision_at_3 value: 19.185 - type: precision_at_5 value: 13.667000000000002 - type: recall_at_1 value: 28.000999999999998 - type: recall_at_10 value: 58.131 - type: recall_at_100 value: 80.869 - type: recall_at_1000 value: 93.931 - type: recall_at_3 value: 44.161 - type: recall_at_5 value: 50.592000000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.047 - type: map_at_10 value: 38.596000000000004 - type: map_at_100 
value: 40.116 - type: map_at_1000 value: 40.232 - type: map_at_3 value: 35.205 - type: map_at_5 value: 37.076 - type: mrr_at_1 value: 34.932 - type: mrr_at_10 value: 44.496 - type: mrr_at_100 value: 45.47 - type: mrr_at_1000 value: 45.519999999999996 - type: mrr_at_3 value: 41.743 - type: mrr_at_5 value: 43.352000000000004 - type: ndcg_at_1 value: 34.932 - type: ndcg_at_10 value: 44.901 - type: ndcg_at_100 value: 50.788999999999994 - type: ndcg_at_1000 value: 52.867 - type: ndcg_at_3 value: 39.449 - type: ndcg_at_5 value: 41.929 - type: precision_at_1 value: 34.932 - type: precision_at_10 value: 8.311 - type: precision_at_100 value: 1.3050000000000002 - type: precision_at_1000 value: 0.166 - type: precision_at_3 value: 18.836 - type: precision_at_5 value: 13.447000000000001 - type: recall_at_1 value: 28.047 - type: recall_at_10 value: 57.717 - type: recall_at_100 value: 82.182 - type: recall_at_1000 value: 95.82000000000001 - type: recall_at_3 value: 42.448 - type: recall_at_5 value: 49.071 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.861250000000005 - type: map_at_10 value: 37.529583333333335 - type: map_at_100 value: 38.7915 - type: map_at_1000 value: 38.90558333333335 - type: map_at_3 value: 34.57333333333333 - type: map_at_5 value: 36.187166666666656 - type: mrr_at_1 value: 32.88291666666666 - type: mrr_at_10 value: 41.79750000000001 - type: mrr_at_100 value: 42.63183333333333 - type: mrr_at_1000 value: 42.68483333333333 - type: mrr_at_3 value: 39.313750000000006 - type: mrr_at_5 value: 40.70483333333333 - type: ndcg_at_1 value: 32.88291666666666 - type: ndcg_at_10 value: 43.09408333333333 - type: ndcg_at_100 value: 48.22158333333333 - type: ndcg_at_1000 value: 50.358000000000004 - type: ndcg_at_3 value: 38.129583333333336 - type: ndcg_at_5 value: 40.39266666666666 - type: precision_at_1 value: 32.88291666666666 - type: precision_at_10 value: 7.5584999999999996 - type: precision_at_100 value: 1.1903333333333332 - type: precision_at_1000 value: 0.15658333333333332 - type: precision_at_3 value: 17.495916666666666 - type: precision_at_5 value: 12.373833333333332 - type: recall_at_1 value: 27.861250000000005 - type: recall_at_10 value: 55.215916666666665 - type: recall_at_100 value: 77.392 - type: recall_at_1000 value: 92.04908333333334 - type: recall_at_3 value: 41.37475 - type: recall_at_5 value: 47.22908333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.064999999999998 - type: map_at_10 value: 31.635999999999996 - type: map_at_100 value: 32.596000000000004 - type: map_at_1000 value: 32.695 - type: map_at_3 value: 29.612 - type: map_at_5 value: 30.768 - type: mrr_at_1 value: 28.528 - type: mrr_at_10 value: 34.717 - type: mrr_at_100 value: 35.558 - type: mrr_at_1000 value: 35.626000000000005 - type: mrr_at_3 value: 32.745000000000005 - type: mrr_at_5 value: 33.819 - type: ndcg_at_1 value: 28.528 - type: ndcg_at_10 value: 35.647 - type: ndcg_at_100 value: 40.207 - type: ndcg_at_1000 value: 42.695 - type: ndcg_at_3 value: 31.878 - type: ndcg_at_5 value: 33.634 - type: precision_at_1 value: 28.528 - type: precision_at_10 value: 5.46 - type: precision_at_100 value: 0.84 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 13.547999999999998 - type: precision_at_5 value: 9.325 - type: recall_at_1 value: 
25.064999999999998 - type: recall_at_10 value: 45.096000000000004 - type: recall_at_100 value: 65.658 - type: recall_at_1000 value: 84.128 - type: recall_at_3 value: 34.337 - type: recall_at_5 value: 38.849000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.276 - type: map_at_10 value: 24.535 - type: map_at_100 value: 25.655 - type: map_at_1000 value: 25.782 - type: map_at_3 value: 22.228 - type: map_at_5 value: 23.612 - type: mrr_at_1 value: 21.266 - type: mrr_at_10 value: 28.474 - type: mrr_at_100 value: 29.398000000000003 - type: mrr_at_1000 value: 29.482000000000003 - type: mrr_at_3 value: 26.245 - type: mrr_at_5 value: 27.624 - type: ndcg_at_1 value: 21.266 - type: ndcg_at_10 value: 29.087000000000003 - type: ndcg_at_100 value: 34.374 - type: ndcg_at_1000 value: 37.433 - type: ndcg_at_3 value: 25.040000000000003 - type: ndcg_at_5 value: 27.116 - type: precision_at_1 value: 21.266 - type: precision_at_10 value: 5.258 - type: precision_at_100 value: 0.9299999999999999 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 11.849 - type: precision_at_5 value: 8.699 - type: recall_at_1 value: 17.276 - type: recall_at_10 value: 38.928000000000004 - type: recall_at_100 value: 62.529 - type: recall_at_1000 value: 84.44800000000001 - type: recall_at_3 value: 27.554000000000002 - type: recall_at_5 value: 32.915 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.297 - type: map_at_10 value: 36.957 - type: map_at_100 value: 38.252 - type: map_at_1000 value: 38.356 - type: map_at_3 value: 34.121 - type: map_at_5 value: 35.782000000000004 - type: mrr_at_1 value: 32.275999999999996 - type: mrr_at_10 value: 41.198 - type: mrr_at_100 value: 42.131 - type: mrr_at_1000 value: 42.186 - type: mrr_at_3 value: 38.557 - type: mrr_at_5 value: 40.12 - type: ndcg_at_1 value: 32.275999999999996 - type: ndcg_at_10 value: 42.516 - type: ndcg_at_100 value: 48.15 - type: ndcg_at_1000 value: 50.344 - type: ndcg_at_3 value: 37.423 - type: ndcg_at_5 value: 39.919 - type: precision_at_1 value: 32.275999999999996 - type: precision_at_10 value: 7.155 - type: precision_at_100 value: 1.123 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_3 value: 17.163999999999998 - type: precision_at_5 value: 12.127 - type: recall_at_1 value: 27.297 - type: recall_at_10 value: 55.238 - type: recall_at_100 value: 79.2 - type: recall_at_1000 value: 94.258 - type: recall_at_3 value: 41.327000000000005 - type: recall_at_5 value: 47.588 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.142000000000003 - type: map_at_10 value: 38.769 - type: map_at_100 value: 40.292 - type: map_at_1000 value: 40.510000000000005 - type: map_at_3 value: 35.39 - type: map_at_5 value: 37.009 - type: mrr_at_1 value: 34.19 - type: mrr_at_10 value: 43.418 - type: mrr_at_100 value: 44.132 - type: mrr_at_1000 value: 44.175 - type: mrr_at_3 value: 40.547 - type: mrr_at_5 value: 42.088 - type: ndcg_at_1 value: 34.19 - type: ndcg_at_10 value: 45.14 - type: ndcg_at_100 value: 50.364 - type: ndcg_at_1000 value: 52.481 - type: ndcg_at_3 value: 39.466 - type: ndcg_at_5 value: 41.772 - type: precision_at_1 value: 34.19 - type: precision_at_10 value: 
8.715 - type: precision_at_100 value: 1.6150000000000002 - type: precision_at_1000 value: 0.247 - type: precision_at_3 value: 18.248 - type: precision_at_5 value: 13.161999999999999 - type: recall_at_1 value: 29.142000000000003 - type: recall_at_10 value: 57.577999999999996 - type: recall_at_100 value: 81.428 - type: recall_at_1000 value: 94.017 - type: recall_at_3 value: 41.402 - type: recall_at_5 value: 47.695 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.039 - type: map_at_10 value: 30.669999999999998 - type: map_at_100 value: 31.682 - type: map_at_1000 value: 31.794 - type: map_at_3 value: 28.139999999999997 - type: map_at_5 value: 29.457 - type: mrr_at_1 value: 24.399 - type: mrr_at_10 value: 32.687 - type: mrr_at_100 value: 33.622 - type: mrr_at_1000 value: 33.698 - type: mrr_at_3 value: 30.407 - type: mrr_at_5 value: 31.552999999999997 - type: ndcg_at_1 value: 24.399 - type: ndcg_at_10 value: 35.472 - type: ndcg_at_100 value: 40.455000000000005 - type: ndcg_at_1000 value: 43.15 - type: ndcg_at_3 value: 30.575000000000003 - type: ndcg_at_5 value: 32.668 - type: precision_at_1 value: 24.399 - type: precision_at_10 value: 5.656 - type: precision_at_100 value: 0.874 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 13.062000000000001 - type: precision_at_5 value: 9.242 - type: recall_at_1 value: 22.039 - type: recall_at_10 value: 48.379 - type: recall_at_100 value: 71.11800000000001 - type: recall_at_1000 value: 91.095 - type: recall_at_3 value: 35.108 - type: recall_at_5 value: 40.015 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.144 - type: map_at_10 value: 18.238 - type: map_at_100 value: 20.143 - type: map_at_1000 value: 20.346 - type: map_at_3 value: 14.809 - type: map_at_5 value: 16.567999999999998 - type: mrr_at_1 value: 22.671 - type: mrr_at_10 value: 34.906 - type: mrr_at_100 value: 35.858000000000004 - type: mrr_at_1000 value: 35.898 - type: mrr_at_3 value: 31.238 - type: mrr_at_5 value: 33.342 - type: ndcg_at_1 value: 22.671 - type: ndcg_at_10 value: 26.540000000000003 - type: ndcg_at_100 value: 34.138000000000005 - type: ndcg_at_1000 value: 37.72 - type: ndcg_at_3 value: 20.766000000000002 - type: ndcg_at_5 value: 22.927 - type: precision_at_1 value: 22.671 - type: precision_at_10 value: 8.619 - type: precision_at_100 value: 1.678 - type: precision_at_1000 value: 0.23500000000000001 - type: precision_at_3 value: 15.592 - type: precision_at_5 value: 12.43 - type: recall_at_1 value: 10.144 - type: recall_at_10 value: 33.46 - type: recall_at_100 value: 59.758 - type: recall_at_1000 value: 79.704 - type: recall_at_3 value: 19.604 - type: recall_at_5 value: 25.367 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.654 - type: map_at_10 value: 18.506 - type: map_at_100 value: 26.412999999999997 - type: map_at_1000 value: 28.13 - type: map_at_3 value: 13.379 - type: map_at_5 value: 15.529000000000002 - type: mrr_at_1 value: 66.0 - type: mrr_at_10 value: 74.13 - type: mrr_at_100 value: 74.48700000000001 - type: mrr_at_1000 value: 74.49799999999999 - type: mrr_at_3 value: 72.75 - type: mrr_at_5 value: 73.762 - type: ndcg_at_1 value: 54.50000000000001 - type: ndcg_at_10 value: 40.236 - type: ndcg_at_100 value: 44.690999999999995 - 
type: ndcg_at_1000 value: 52.195 - type: ndcg_at_3 value: 45.632 - type: ndcg_at_5 value: 42.952 - type: precision_at_1 value: 66.0 - type: precision_at_10 value: 31.724999999999998 - type: precision_at_100 value: 10.299999999999999 - type: precision_at_1000 value: 2.194 - type: precision_at_3 value: 48.75 - type: precision_at_5 value: 41.6 - type: recall_at_1 value: 8.654 - type: recall_at_10 value: 23.74 - type: recall_at_100 value: 50.346999999999994 - type: recall_at_1000 value: 74.376 - type: recall_at_3 value: 14.636 - type: recall_at_5 value: 18.009 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 53.245 - type: f1 value: 48.74520523753552 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 51.729 - type: map_at_10 value: 63.904 - type: map_at_100 value: 64.363 - type: map_at_1000 value: 64.38199999999999 - type: map_at_3 value: 61.393 - type: map_at_5 value: 63.02100000000001 - type: mrr_at_1 value: 55.686 - type: mrr_at_10 value: 67.804 - type: mrr_at_100 value: 68.15299999999999 - type: mrr_at_1000 value: 68.161 - type: mrr_at_3 value: 65.494 - type: mrr_at_5 value: 67.01599999999999 - type: ndcg_at_1 value: 55.686 - type: ndcg_at_10 value: 70.025 - type: ndcg_at_100 value: 72.011 - type: ndcg_at_1000 value: 72.443 - type: ndcg_at_3 value: 65.32900000000001 - type: ndcg_at_5 value: 68.05600000000001 - type: precision_at_1 value: 55.686 - type: precision_at_10 value: 9.358 - type: precision_at_100 value: 1.05 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 26.318 - type: precision_at_5 value: 17.321 - type: recall_at_1 value: 51.729 - type: recall_at_10 value: 85.04 - type: recall_at_100 value: 93.777 - type: recall_at_1000 value: 96.824 - type: recall_at_3 value: 72.521 - type: recall_at_5 value: 79.148 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 23.765 - type: map_at_10 value: 39.114 - type: map_at_100 value: 40.987 - type: map_at_1000 value: 41.155 - type: map_at_3 value: 34.028000000000006 - type: map_at_5 value: 36.925000000000004 - type: mrr_at_1 value: 46.451 - type: mrr_at_10 value: 54.711 - type: mrr_at_100 value: 55.509 - type: mrr_at_1000 value: 55.535000000000004 - type: mrr_at_3 value: 52.649 - type: mrr_at_5 value: 53.729000000000006 - type: ndcg_at_1 value: 46.451 - type: ndcg_at_10 value: 46.955999999999996 - type: ndcg_at_100 value: 53.686 - type: ndcg_at_1000 value: 56.230000000000004 - type: ndcg_at_3 value: 43.374 - type: ndcg_at_5 value: 44.372 - type: precision_at_1 value: 46.451 - type: precision_at_10 value: 13.256 - type: precision_at_100 value: 2.019 - type: precision_at_1000 value: 0.247 - type: precision_at_3 value: 29.115000000000002 - type: precision_at_5 value: 21.389 - type: recall_at_1 value: 23.765 - type: recall_at_10 value: 53.452999999999996 - type: recall_at_100 value: 78.828 - type: recall_at_1000 value: 93.938 - type: recall_at_3 value: 39.023 - type: recall_at_5 value: 45.18 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 31.918000000000003 - type: map_at_10 value: 46.741 - type: map_at_100 value: 47.762 - type: map_at_1000 value: 47.849000000000004 - type: map_at_3 value: 43.578 - type: map_at_5 
value: 45.395 - type: mrr_at_1 value: 63.834999999999994 - type: mrr_at_10 value: 71.312 - type: mrr_at_100 value: 71.695 - type: mrr_at_1000 value: 71.714 - type: mrr_at_3 value: 69.82000000000001 - type: mrr_at_5 value: 70.726 - type: ndcg_at_1 value: 63.834999999999994 - type: ndcg_at_10 value: 55.879999999999995 - type: ndcg_at_100 value: 59.723000000000006 - type: ndcg_at_1000 value: 61.49400000000001 - type: ndcg_at_3 value: 50.964 - type: ndcg_at_5 value: 53.47 - type: precision_at_1 value: 63.834999999999994 - type: precision_at_10 value: 11.845 - type: precision_at_100 value: 1.4869999999999999 - type: precision_at_1000 value: 0.172 - type: precision_at_3 value: 32.158 - type: precision_at_5 value: 21.278 - type: recall_at_1 value: 31.918000000000003 - type: recall_at_10 value: 59.223000000000006 - type: recall_at_100 value: 74.328 - type: recall_at_1000 value: 86.05000000000001 - type: recall_at_3 value: 48.238 - type: recall_at_5 value: 53.193999999999996 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 79.7896 - type: ap value: 73.65166029460288 - type: f1 value: 79.71794693711813 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.239 - type: map_at_10 value: 34.542 - type: map_at_100 value: 35.717999999999996 - type: map_at_1000 value: 35.764 - type: map_at_3 value: 30.432 - type: map_at_5 value: 32.81 - type: mrr_at_1 value: 22.908 - type: mrr_at_10 value: 35.127 - type: mrr_at_100 value: 36.238 - type: mrr_at_1000 value: 36.278 - type: mrr_at_3 value: 31.076999999999998 - type: mrr_at_5 value: 33.419 - type: ndcg_at_1 value: 22.908 - type: ndcg_at_10 value: 41.607 - type: ndcg_at_100 value: 47.28 - type: ndcg_at_1000 value: 48.414 - type: ndcg_at_3 value: 33.253 - type: ndcg_at_5 value: 37.486000000000004 - type: precision_at_1 value: 22.908 - type: precision_at_10 value: 6.645 - type: precision_at_100 value: 0.9490000000000001 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 14.130999999999998 - type: precision_at_5 value: 10.616 - type: recall_at_1 value: 22.239 - type: recall_at_10 value: 63.42 - type: recall_at_100 value: 89.696 - type: recall_at_1000 value: 98.351 - type: recall_at_3 value: 40.77 - type: recall_at_5 value: 50.93 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 95.06839945280439 - type: f1 value: 94.74276398224072 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.25718194254446 - type: f1 value: 53.91164489161391 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.47948890383323 - type: f1 value: 69.98520247230257 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.46603900470748 - type: f1 value: 76.44111526065399 - task: type: Clustering 
dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.19106070798198 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 30.78772205248094 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.811231631488507 - type: mrr value: 32.98200485378021 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.9 - type: map_at_10 value: 13.703000000000001 - type: map_at_100 value: 17.251 - type: map_at_1000 value: 18.795 - type: map_at_3 value: 10.366999999999999 - type: map_at_5 value: 11.675 - type: mrr_at_1 value: 47.059 - type: mrr_at_10 value: 55.816 - type: mrr_at_100 value: 56.434 - type: mrr_at_1000 value: 56.467 - type: mrr_at_3 value: 53.973000000000006 - type: mrr_at_5 value: 55.257999999999996 - type: ndcg_at_1 value: 44.737 - type: ndcg_at_10 value: 35.997 - type: ndcg_at_100 value: 33.487 - type: ndcg_at_1000 value: 41.897 - type: ndcg_at_3 value: 41.18 - type: ndcg_at_5 value: 38.721 - type: precision_at_1 value: 46.129999999999995 - type: precision_at_10 value: 26.533 - type: precision_at_100 value: 8.706 - type: precision_at_1000 value: 2.16 - type: precision_at_3 value: 38.493 - type: precision_at_5 value: 33.189 - type: recall_at_1 value: 6.9 - type: recall_at_10 value: 17.488999999999997 - type: recall_at_100 value: 34.583000000000006 - type: recall_at_1000 value: 64.942 - type: recall_at_3 value: 11.494 - type: recall_at_5 value: 13.496 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 33.028999999999996 - type: map_at_10 value: 49.307 - type: map_at_100 value: 50.205 - type: map_at_1000 value: 50.23 - type: map_at_3 value: 44.782 - type: map_at_5 value: 47.599999999999994 - type: mrr_at_1 value: 37.108999999999995 - type: mrr_at_10 value: 51.742999999999995 - type: mrr_at_100 value: 52.405 - type: mrr_at_1000 value: 52.422000000000004 - type: mrr_at_3 value: 48.087999999999994 - type: mrr_at_5 value: 50.414 - type: ndcg_at_1 value: 37.08 - type: ndcg_at_10 value: 57.236 - type: ndcg_at_100 value: 60.931999999999995 - type: ndcg_at_1000 value: 61.522 - type: ndcg_at_3 value: 48.93 - type: ndcg_at_5 value: 53.561 - type: precision_at_1 value: 37.08 - type: precision_at_10 value: 9.386 - type: precision_at_100 value: 1.1480000000000001 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 22.258 - type: precision_at_5 value: 16.025 - type: recall_at_1 value: 33.028999999999996 - type: recall_at_10 value: 78.805 - type: recall_at_100 value: 94.643 - type: recall_at_1000 value: 99.039 - type: recall_at_3 value: 57.602 - type: recall_at_5 value: 68.253 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.122 - type: map_at_10 value: 85.237 - type: map_at_100 value: 85.872 - type: map_at_1000 value: 85.885 - type: map_at_3 value: 82.27499999999999 - type: map_at_5 value: 84.13199999999999 - type: mrr_at_1 value: 81.73 - type: mrr_at_10 value: 87.834 - type: mrr_at_100 value: 
87.92 - type: mrr_at_1000 value: 87.921 - type: mrr_at_3 value: 86.878 - type: mrr_at_5 value: 87.512 - type: ndcg_at_1 value: 81.73 - type: ndcg_at_10 value: 88.85499999999999 - type: ndcg_at_100 value: 89.992 - type: ndcg_at_1000 value: 90.07 - type: ndcg_at_3 value: 85.997 - type: ndcg_at_5 value: 87.55199999999999 - type: precision_at_1 value: 81.73 - type: precision_at_10 value: 13.491 - type: precision_at_100 value: 1.536 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.623 - type: precision_at_5 value: 24.742 - type: recall_at_1 value: 71.122 - type: recall_at_10 value: 95.935 - type: recall_at_100 value: 99.657 - type: recall_at_1000 value: 99.996 - type: recall_at_3 value: 87.80799999999999 - type: recall_at_5 value: 92.161 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 63.490029238193756 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 65.13153408508836 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.202999999999999 - type: map_at_10 value: 10.174 - type: map_at_100 value: 12.138 - type: map_at_1000 value: 12.418 - type: map_at_3 value: 7.379 - type: map_at_5 value: 8.727 - type: mrr_at_1 value: 20.7 - type: mrr_at_10 value: 30.389 - type: mrr_at_100 value: 31.566 - type: mrr_at_1000 value: 31.637999999999998 - type: mrr_at_3 value: 27.133000000000003 - type: mrr_at_5 value: 29.078 - type: ndcg_at_1 value: 20.7 - type: ndcg_at_10 value: 17.355999999999998 - type: ndcg_at_100 value: 25.151 - type: ndcg_at_1000 value: 30.37 - type: ndcg_at_3 value: 16.528000000000002 - type: ndcg_at_5 value: 14.396999999999998 - type: precision_at_1 value: 20.7 - type: precision_at_10 value: 8.98 - type: precision_at_100 value: 2.015 - type: precision_at_1000 value: 0.327 - type: precision_at_3 value: 15.367 - type: precision_at_5 value: 12.559999999999999 - type: recall_at_1 value: 4.202999999999999 - type: recall_at_10 value: 18.197 - type: recall_at_100 value: 40.903 - type: recall_at_1000 value: 66.427 - type: recall_at_3 value: 9.362 - type: recall_at_5 value: 12.747 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_spearman value: 81.69890989765257 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_spearman value: 75.31953790551489 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_spearman value: 87.44050861280759 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_spearman value: 81.86922869270393 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_spearman value: 88.9399170304284 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 
4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_spearman value: 85.38015314088582 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_spearman value: 90.53653527788835 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_spearman value: 68.64526474250209 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_spearman value: 86.56156983963042 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 79.48610254648003 - type: mrr value: 94.02481505422682 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 48.983 - type: map_at_10 value: 59.077999999999996 - type: map_at_100 value: 59.536 - type: map_at_1000 value: 59.575 - type: map_at_3 value: 55.691 - type: map_at_5 value: 57.410000000000004 - type: mrr_at_1 value: 51.666999999999994 - type: mrr_at_10 value: 60.427 - type: mrr_at_100 value: 60.763 - type: mrr_at_1000 value: 60.79900000000001 - type: mrr_at_3 value: 57.556 - type: mrr_at_5 value: 59.089000000000006 - type: ndcg_at_1 value: 51.666999999999994 - type: ndcg_at_10 value: 64.559 - type: ndcg_at_100 value: 66.58 - type: ndcg_at_1000 value: 67.64 - type: ndcg_at_3 value: 58.287 - type: ndcg_at_5 value: 61.001000000000005 - type: precision_at_1 value: 51.666999999999994 - type: precision_at_10 value: 9.067 - type: precision_at_100 value: 1.0170000000000001 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 23.0 - type: precision_at_5 value: 15.6 - type: recall_at_1 value: 48.983 - type: recall_at_10 value: 80.289 - type: recall_at_100 value: 89.43299999999999 - type: recall_at_1000 value: 97.667 - type: recall_at_3 value: 62.978 - type: recall_at_5 value: 69.872 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.79009900990098 - type: cos_sim_ap value: 94.94115052608419 - type: cos_sim_f1 value: 89.1260162601626 - type: cos_sim_precision value: 90.599173553719 - type: cos_sim_recall value: 87.7 - type: dot_accuracy value: 99.79009900990098 - type: dot_ap value: 94.94115052608419 - type: dot_f1 value: 89.1260162601626 - type: dot_precision value: 90.599173553719 - type: dot_recall value: 87.7 - type: euclidean_accuracy value: 99.79009900990098 - type: euclidean_ap value: 94.94115052608419 - type: euclidean_f1 value: 89.1260162601626 - type: euclidean_precision value: 90.599173553719 - type: euclidean_recall value: 87.7 - type: manhattan_accuracy value: 99.7940594059406 - type: manhattan_ap value: 94.95271414642431 - type: manhattan_f1 value: 89.24508790072387 - type: manhattan_precision value: 92.3982869379015 - type: manhattan_recall value: 86.3 - type: max_accuracy value: 99.7940594059406 - type: max_ap value: 94.95271414642431 - type: max_f1 value: 89.24508790072387 - task: type: Clustering dataset: type: 
mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 68.43866571935851 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.16579026551532 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.518952473513934 - type: mrr value: 53.292457134368895 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.12529588316604 - type: cos_sim_spearman value: 32.31662126895294 - type: dot_pearson value: 31.125303796647056 - type: dot_spearman value: 32.31662126895294 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.219 - type: map_at_10 value: 1.7469999999999999 - type: map_at_100 value: 10.177999999999999 - type: map_at_1000 value: 26.108999999999998 - type: map_at_3 value: 0.64 - type: map_at_5 value: 0.968 - type: mrr_at_1 value: 82.0 - type: mrr_at_10 value: 89.067 - type: mrr_at_100 value: 89.067 - type: mrr_at_1000 value: 89.067 - type: mrr_at_3 value: 88.333 - type: mrr_at_5 value: 88.73299999999999 - type: ndcg_at_1 value: 78.0 - type: ndcg_at_10 value: 71.398 - type: ndcg_at_100 value: 55.574999999999996 - type: ndcg_at_1000 value: 51.771 - type: ndcg_at_3 value: 77.765 - type: ndcg_at_5 value: 73.614 - type: precision_at_1 value: 82.0 - type: precision_at_10 value: 75.4 - type: precision_at_100 value: 58.040000000000006 - type: precision_at_1000 value: 23.516000000000002 - type: precision_at_3 value: 84.0 - type: precision_at_5 value: 78.4 - type: recall_at_1 value: 0.219 - type: recall_at_10 value: 1.958 - type: recall_at_100 value: 13.797999999999998 - type: recall_at_1000 value: 49.881 - type: recall_at_3 value: 0.672 - type: recall_at_5 value: 1.0370000000000001 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 1.8610000000000002 - type: map_at_10 value: 8.705 - type: map_at_100 value: 15.164 - type: map_at_1000 value: 16.78 - type: map_at_3 value: 4.346 - type: map_at_5 value: 6.151 - type: mrr_at_1 value: 22.448999999999998 - type: mrr_at_10 value: 41.556 - type: mrr_at_100 value: 42.484 - type: mrr_at_1000 value: 42.494 - type: mrr_at_3 value: 37.755 - type: mrr_at_5 value: 40.102 - type: ndcg_at_1 value: 21.429000000000002 - type: ndcg_at_10 value: 23.439 - type: ndcg_at_100 value: 36.948 - type: ndcg_at_1000 value: 48.408 - type: ndcg_at_3 value: 22.261 - type: ndcg_at_5 value: 23.085 - type: precision_at_1 value: 22.448999999999998 - type: precision_at_10 value: 21.633 - type: precision_at_100 value: 8.02 - type: precision_at_1000 value: 1.5939999999999999 - type: precision_at_3 value: 23.810000000000002 - type: precision_at_5 value: 24.490000000000002 - type: recall_at_1 value: 1.8610000000000002 - type: recall_at_10 value: 15.876000000000001 - type: recall_at_100 value: 50.300999999999995 - type: recall_at_1000 value: 86.098 - type: recall_at_3 value: 5.892 - type: 
recall_at_5 value: 9.443 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.3264 - type: ap value: 13.249577616243794 - type: f1 value: 53.621518367695685 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.57611771363894 - type: f1 value: 61.79797478568639 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 53.38315344479284 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.55438993860642 - type: cos_sim_ap value: 77.98702600017738 - type: cos_sim_f1 value: 71.94971653931476 - type: cos_sim_precision value: 67.50693802035153 - type: cos_sim_recall value: 77.01846965699208 - type: dot_accuracy value: 87.55438993860642 - type: dot_ap value: 77.98702925907986 - type: dot_f1 value: 71.94971653931476 - type: dot_precision value: 67.50693802035153 - type: dot_recall value: 77.01846965699208 - type: euclidean_accuracy value: 87.55438993860642 - type: euclidean_ap value: 77.98702951957925 - type: euclidean_f1 value: 71.94971653931476 - type: euclidean_precision value: 67.50693802035153 - type: euclidean_recall value: 77.01846965699208 - type: manhattan_accuracy value: 87.54246885617214 - type: manhattan_ap value: 77.95531413902947 - type: manhattan_f1 value: 71.93605683836589 - type: manhattan_precision value: 69.28152492668622 - type: manhattan_recall value: 74.80211081794195 - type: max_accuracy value: 87.55438993860642 - type: max_ap value: 77.98702951957925 - type: max_f1 value: 71.94971653931476 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.47296930182016 - type: cos_sim_ap value: 86.92853616302108 - type: cos_sim_f1 value: 79.35138351681047 - type: cos_sim_precision value: 76.74820143884892 - type: cos_sim_recall value: 82.13735756082538 - type: dot_accuracy value: 89.47296930182016 - type: dot_ap value: 86.92854339601595 - type: dot_f1 value: 79.35138351681047 - type: dot_precision value: 76.74820143884892 - type: dot_recall value: 82.13735756082538 - type: euclidean_accuracy value: 89.47296930182016 - type: euclidean_ap value: 86.92854191061649 - type: euclidean_f1 value: 79.35138351681047 - type: euclidean_precision value: 76.74820143884892 - type: euclidean_recall value: 82.13735756082538 - type: manhattan_accuracy value: 89.47685023479644 - type: manhattan_ap value: 86.90063722679578 - type: manhattan_f1 value: 79.30753865502702 - type: manhattan_precision value: 76.32066068631639 - type: manhattan_recall value: 82.53772713273791 - type: max_accuracy value: 89.47685023479644 - type: max_ap value: 86.92854339601595 - type: max_f1 value: 79.35138351681047 --- # hkunlp/instructor-xl We introduce **Instructor**👨‍🏫, an instruction-finetuned text embedding model that can generate 
text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domain (e.g., science, finance, etc.) ***by simply providing the task instruction, without any finetuning***. Instructor👨‍🏫 achieves state-of-the-art results on 70 diverse embedding tasks! The model is easy to use with **our customized** `InstructorEmbedding` library. For more details, check out our paper and project page! **************************** **Updates** **************************** * 01/21: We released a new checkpoint trained with hard negatives, which gives better performance. * 12/21: We released our paper, code, checkpoint and project page! Check them out! ## Quick start
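A minimal quick-start sketch, assuming the customized `InstructorEmbedding` package from the project page:

```python
# pip install InstructorEmbedding sentence-transformers
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-xl")

# Each input is an [instruction, text] pair; the instruction steers the embedding.
sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title:"
embeddings = model.encode([[instruction, sentence]])
print(embeddings.shape)  # e.g. (1, 768)
```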
## Installation Install the customized `InstructorEmbedding` library, as shown in the quick-start sketch above. ## Compute your customized embeddings Then you can use the model to calculate domain-specific and task-aware embeddings by pairing each text with an instruction, as in the use-case sketch below: ## Use cases
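For example, a sketch of comparing two groups of sentences with customized embeddings (the instruction wording here is illustrative):

```python
from InstructorEmbedding import INSTRUCTOR
from sklearn.metrics.pairwise import cosine_similarity

model = INSTRUCTOR("hkunlp/instructor-xl")

instruction = "Represent the Medicine question for retrieving a duplicate question:"
group_a = [[instruction, "What are the symptoms of seasonal flu?"]]
group_b = [[instruction, "How do I know if I have influenza?"]]

# Cosine similarity between the task-aware embeddings of the two groups.
similarities = cosine_similarity(model.encode(group_a), model.encode(group_b))
print(similarities)
```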
## Calculate embeddings for your customized texts If you want to calculate customized embeddings for specific sentences, you may follow the unified template to write instructions: `Represent the [domain] [text_type] for [task_objective]:` * `[domain]` is optional, and it specifies the domain of the text, e.g., science, finance, medicine, etc. * `[text_type]` is required, and it specifies the encoding unit, e.g., sentence, document, paragraph, etc. * `[task_objective]` is optional, and it specifies the objective of embedding, e.g., retrieve a document, classify the sentence, etc. ## Calculate Sentence similarities You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**. ## Information Retrieval You can also use **customized embeddings** for information retrieval. ## Clustering Use **customized embeddings** for clustering texts in groups.", + "model_explanation_gemini": "Generates text embeddings for tasks like sentence similarity, information retrieval, text classification, and clustering across various English datasets." +} \ No newline at end of file diff --git a/data/model_data_json/hkust-nlp_Qwen-2.5-7B-SimpleRL-Zoo.json b/data/model_data_json/hkust-nlp_Qwen-2.5-7B-SimpleRL-Zoo.json new file mode 100644 index 0000000000000000000000000000000000000000..493d3fb6ae1fa1358e724b79b4f41001ff4136bc --- /dev/null +++ b/data/model_data_json/hkust-nlp_Qwen-2.5-7B-SimpleRL-Zoo.json @@ -0,0 +1,12 @@ +{ + "model_id": "hkust-nlp/Qwen-2.5-7B-SimpleRL-Zoo", + "downloads": 401875, + "tags": [ + "safetensors", + "qwen2", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 ---", + "model_explanation_gemini": "Qwen-2.5-7B-SimpleRL-Zoo is a 7-billion-parameter model fine-tuned with simple reinforcement learning for enhanced performance in natural language processing tasks." +} \ No newline at end of file diff --git a/data/model_data_json/hooman650_bge-m3-onnx-o4.json b/data/model_data_json/hooman650_bge-m3-onnx-o4.json new file mode 100644 index 0000000000000000000000000000000000000000..49bc1e755707a3e0b0d9f06302449302eba64844 --- /dev/null +++ b/data/model_data_json/hooman650_bge-m3-onnx-o4.json @@ -0,0 +1,16 @@ +{ + "model_id": "hooman650/bge-m3-onnx-o4", + "downloads": 267051, + "tags": [ + "transformers", + "onnx", + "xlm-roberta", + "feature-extraction", + "license:mit", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit pipeline_tag: feature-extraction --- # bge-m3-onnx-o4 These are the ONNX weights of the original BGE-M3 model. Why is this model cool? - [x] Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of embedding models: dense retrieval, multi-vector retrieval, and sparse retrieval. - [x] Multi-Linguality: It can support more than **100** working languages. - [x] Multi-Granularity: It is able to process inputs of different granularities, spanning from short sentences to long documents of up to **8192** tokens. ## Usage ### IMPORTANT - DOWNLOAD MODEL WEIGHTS Please see the instructions below. 1. **Download** the checkpoint: For some reason you cannot directly load from this online version (you will get an exception). Please download this repo locally first, as in the sketch below. ### Dense Retrieval ### Multi-Vector (ColBERT)", + "model_explanation_gemini": "Performs dense, multi-vector, and sparse retrieval across 100+ languages while handling inputs from short sentences to long documents up to 8192 tokens."
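A hedged sketch of the download-then-load flow for the dense-retrieval path, assuming the ONNX graph can be run through `optimum.onnxruntime` (the loading helpers may need adjusting to the repository's file layout):

```python
import torch
from huggingface_hub import snapshot_download
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Download the repo locally first, as the card recommends.
local_dir = snapshot_download("hooman650/bge-m3-onnx-o4")

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = ORTModelForFeatureExtraction.from_pretrained(local_dir)

inputs = tokenizer(["BGE-M3 supports dense, sparse and multi-vector retrieval."],
                   padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs)
# Dense embedding: the [CLS] vector, L2-normalized (BGE convention).
dense = torch.nn.functional.normalize(outputs.last_hidden_state[:, 0], dim=-1)
print(dense.shape)
```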
+} \ No newline at end of file diff --git a/data/model_data_json/huawei-noah_TinyBERT_General_4L_312D.json b/data/model_data_json/huawei-noah_TinyBERT_General_4L_312D.json new file mode 100644 index 0000000000000000000000000000000000000000..454ba8bda84da336bf5965cf084bb7ca7efb1fba --- /dev/null +++ b/data/model_data_json/huawei-noah_TinyBERT_General_4L_312D.json @@ -0,0 +1,15 @@ +{ + "model_id": "huawei-noah/TinyBERT_General_4L_312D", + "downloads": 155267, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "arxiv:1909.10351", + "endpoints_compatible", + "region:us" + ], + "description": "TinyBERT: Distilling BERT for Natural Language Understanding ======== TinyBERT is 7.5x smaller and 9.4x faster on inference than BERT-base and achieves competitive performance on natural language understanding tasks. It performs a novel transformer distillation at both the pre-training and task-specific learning stages. In general distillation, we use the original BERT-base without fine-tuning as the teacher and a large-scale text corpus as the learning data. By performing the Transformer distillation on text from the general domain, we obtain a general TinyBERT which provides a good initialization for the task-specific distillation. We here provide the general TinyBERT for your tasks at hand. For more details about the techniques of TinyBERT, refer to our paper: TinyBERT: Distilling BERT for Natural Language Understanding Citation ======== If you find TinyBERT useful in your research, please cite the following paper:", + "model_explanation_gemini": "TinyBERT is a distilled version of BERT-base, designed for efficient natural language understanding with competitive performance while being significantly smaller and faster." +} \ No newline at end of file diff --git a/data/model_data_json/hubertsiuzdak_snac_24khz.json b/data/model_data_json/hubertsiuzdak_snac_24khz.json new file mode 100644 index 0000000000000000000000000000000000000000..300cae6a8b42f31ca4ffbf4e2ba9010a0d979a52 --- /dev/null +++ b/data/model_data_json/hubertsiuzdak_snac_24khz.json @@ -0,0 +1,14 @@ +{ + "model_id": "hubertsiuzdak/snac_24khz", + "downloads": 501679, + "tags": [ + "transformers", + "pytorch", + "audio", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - audio --- # SNAC 🍿 Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses audio into discrete codes at a low bitrate. 👉 This model was primarily trained on speech data, and its recommended use case is speech synthesis. See below for other pretrained models. 🔗 GitHub repository: ## Overview SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change where coarse tokens are sampled less frequently, covering a broader time span. This model compresses 24 kHz audio into discrete codes at a 0.98 kbps bitrate. It uses 3 RVQ levels with token rates of 12, 23, and 47 Hz. ## Pretrained models Currently, all models support only single audio channel (mono).
| Model | Bitrate | Sample Rate | Params | Recommended use case | |-----------------------------------------------------------------------------|-----------|-------------|--------|--------------------------| | hubertsiuzdak/snac_24khz (this model) | 0.98 kbps | 24 kHz | 19.8 M | 🗣️ Speech | | hubertsiuzdak/snac_32khz | 1.9 kbps | 32 kHz | 54.5 M | 🎸 Music / Sound Effects | | hubertsiuzdak/snac_44khz | 2.6 kbps | 44 kHz | 54.5 M | 🎸 Music / Sound Effects | ## Usage Install it using `pip install snac`. To encode (and decode) audio with SNAC in Python, use the model's `encode` and `decode` methods; you can also encode and reconstruct in a single call. ⚠️ Note that the encoder output is a list of token sequences of variable lengths, each corresponding to a different temporal resolution. ## Acknowledgements Module definitions are adapted from the Descript Audio Codec.", + "model_explanation_gemini": "Compresses 24 kHz speech audio into discrete codes at 0.98 kbps for efficient synthesis using hierarchical tokens with multi-scale rates." +} \ No newline at end of file diff --git a/data/model_data_json/hugging-quants_Meta-Llama-3.1-8B-Instruct-AWQ-INT4.json b/data/model_data_json/hugging-quants_Meta-Llama-3.1-8B-Instruct-AWQ-INT4.json new file mode 100644 index 0000000000000000000000000000000000000000..8d01560fb25498d704c062a6626bd3b55639befa --- /dev/null +++ b/data/model_data_json/hugging-quants_Meta-Llama-3.1-8B-Instruct-AWQ-INT4.json @@ -0,0 +1,31 @@ +{ + "model_id": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4", + "downloads": 338459, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3.1", + "meta", + "autoawq", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "awq", + "region:us" + ], + "description": "--- license: llama3.1 language: - en - de - fr - it - pt - hi - es - th library_name: transformers pipeline_tag: text-generation tags: - llama-3.1 - meta - autoawq --- > [!IMPORTANT] > This repository is a community-driven quantized version of the original model, which is the BF16 half-precision official version released by Meta AI. ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. This repository contains Meta-Llama-3.1-8B-Instruct quantized using AutoAWQ from FP16 down to INT4 using the GEMM kernels, performing zero-point quantization with a group size of 128. ## Model Usage > [!NOTE] > In order to run the inference with Llama 3.1 8B Instruct AWQ in INT4, around 4 GiB of VRAM are needed only for loading the model checkpoint, without including the KV cache or the CUDA graphs, meaning that there should be a bit over that VRAM available. In order to use the current quantized model, support is offered for different solutions such as `transformers`, `autoawq`, `text-generation-inference`, or `vllm`. ### 🤗 Transformers In order to run the inference with Llama 3.1 8B Instruct AWQ in INT4, you need to install `autoawq` alongside a recent `transformers`. To run the inference on top of Llama 3.1 8B Instruct AWQ in INT4 precision, the AWQ model can be instantiated as any other causal language modeling model via `AutoModelForCausalLM.from_pretrained` and run the inference normally, as in the sketch below.
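A minimal inference sketch, assuming `transformers` with `autoawq` installed:

```python
# pip install transformers autoawq accelerate  (assumed package set)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ INT4 weights compute in fp16
    device_map="auto",
)

messages = [{"role": "user", "content": "What is gravity?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```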
### AutoAWQ In order to run the inference with Llama 3.1 8B Instruct AWQ in INT4, you need to install the `autoawq` package. Alternatively, one may want to run the inference via `autoawq` directly, even though it's built on top of 🤗 `transformers`, which is the recommended approach instead as described above. The AutoAWQ script has been adapted from the AutoAWQ examples. ### 🤗 Text Generation Inference (TGI) To run `text-generation-inference` with Llama 3.1 8B Instruct AWQ in INT4 with Marlin kernels for optimized inference speed, you will need to have Docker installed (see installation notes) and the `huggingface_hub` Python package, as you need to log in to the Hugging Face Hub. Then you just need to run the TGI v2.2.0 (or higher) Docker container. > [!NOTE] > TGI will expose different endpoints; to see all the endpoints available, check the TGI OpenAPI Specification. Requests can be sent to the deployed TGI endpoint via its OpenAI-compatible route (i.e. `/v1/chat/completions`), programmatically via the TGI Python client, or with the OpenAI Python client (see installation notes). ### vLLM To run vLLM with Llama 3.1 8B Instruct AWQ in INT4, you will need to have Docker installed (see installation notes) and run the latest vLLM Docker container. Requests can then be sent to the deployed vLLM endpoint via its OpenAI-compatible route (i.e. `/v1/chat/completions`) or programmatically via the Python client (see installation notes). ## Quantization Reproduction > [!NOTE] > In order to quantize Llama 3.1 8B Instruct using AutoAWQ, you will need to use an instance with at least enough CPU RAM to fit the whole model, i.e. ~8GiB, and an NVIDIA GPU with 16GiB of VRAM to quantize it. In order to quantize Llama 3.1 8B Instruct, first install the `autoawq` package, then run a quantization script adapted from the AutoAWQ examples.", + "model_explanation_gemini": "An INT4-quantized version of Meta's multilingual 8B Llama 3.1 Instruct model optimized for efficient multilingual dialogue generation using AutoAWQ." +} \ No newline at end of file diff --git a/data/model_data_json/hugging-quants_Meta-Llama-3.1-8B-Instruct-GPTQ-INT4.json b/data/model_data_json/hugging-quants_Meta-Llama-3.1-8B-Instruct-GPTQ-INT4.json new file mode 100644 index 0000000000000000000000000000000000000000..250741e53c682353232e4f5ba5dc1f7ed66041b1 --- /dev/null +++ b/data/model_data_json/hugging-quants_Meta-Llama-3.1-8B-Instruct-GPTQ-INT4.json @@ -0,0 +1,31 @@ +{ + "model_id": "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4", + "downloads": 131141, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3.1", + "meta", + "autogptq", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "gptq", + "region:us" + ], + "description": "--- license: llama3.1 language: - en - de - fr - it - pt - hi - es - th library_name: transformers pipeline_tag: text-generation tags: - llama-3.1 - meta - autogptq --- > [!IMPORTANT] > This repository is a community-driven quantized version of the original model, which is the FP16 half-precision official version released by Meta AI. ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out).
The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. This repository contains Meta-Llama-3.1-8B-Instruct quantized using AutoGPTQ from FP16 down to INT4 using the GPTQ kernels, performing zero-point quantization with a group size of 128. ## Model Usage > [!NOTE] > In order to run the inference with Llama 3.1 8B Instruct GPTQ in INT4, around 4 GiB of VRAM are needed only for loading the model checkpoint, without including the KV cache or the CUDA graphs, meaning that there should be a bit over that VRAM available. In order to use the current quantized model, support is offered for different solutions such as `transformers`, `autogptq`, `text-generation-inference`, or `vllm`. ### 🤗 transformers In order to run the inference with Llama 3.1 8B Instruct GPTQ in INT4, you need to install `optimum` and `auto-gptq` alongside `transformers`. To run the inference on top of Llama 3.1 8B Instruct GPTQ in INT4 precision, the GPTQ model can be instantiated as any other causal language modeling model via `AutoModelForCausalLM.from_pretrained` and run the inference normally. ### AutoGPTQ In order to run the inference with Llama 3.1 8B Instruct GPTQ in INT4, you need to install the same packages. Alternatively, one may want to run the inference via `auto-gptq` directly, even though it's built on top of 🤗 `transformers`, which is the recommended approach instead as described above. The AutoGPTQ script has been adapted from the AutoGPTQ examples. ### 🤗 Text Generation Inference (TGI) To run `text-generation-inference` with Llama 3.1 8B Instruct GPTQ in INT4 with Marlin kernels for optimized inference speed, you will need to have Docker installed (see installation notes) and the `huggingface_hub` Python package, as you need to log in to the Hugging Face Hub. Then you just need to run the TGI v2.2.0 (or higher) Docker container. > [!NOTE] > TGI will expose different endpoints; to see all the endpoints available, check the TGI OpenAPI Specification. Requests can be sent to the deployed TGI endpoint via its OpenAI-compatible route (i.e. `/v1/chat/completions`), programmatically via the TGI Python client, or with the OpenAI Python client (see installation notes). ### vLLM To run vLLM with Llama 3.1 8B Instruct GPTQ in INT4, you will need to have Docker installed (see installation notes) and run the latest vLLM Docker container. Requests can then be sent to the deployed vLLM endpoint via its OpenAI-compatible route (i.e. `/v1/chat/completions`) or programmatically via the Python client (see installation notes). ## Quantization Reproduction > [!NOTE] > In order to quantize Llama 3.1 8B Instruct using AutoGPTQ, you will need to use an instance with at least enough CPU RAM to fit the whole model, i.e. ~8GiB, and an NVIDIA GPU with 16GiB of VRAM to quantize it. In order to quantize Llama 3.1 8B Instruct with GPTQ in INT4, install the packages above, then run a quantization script adapted from the AutoGPTQ examples (a sketch follows below).", + "model_explanation_gemini": "An INT4-quantized version of Meta's multilingual 8B Llama 3.1 Instruct model optimized for dialogue tasks, requiring ~4GB VRAM for inference."
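A hedged sketch of the reproduction step via the `GPTQConfig` path in 🤗 `transformers` (the calibration dataset and output path are assumptions; `optimum` and `auto-gptq` must be installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantization with a group size of 128, as described in the card.
gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=gptq_config, device_map="auto"
)
model.save_pretrained("Meta-Llama-3.1-8B-Instruct-GPTQ-INT4")
tokenizer.save_pretrained("Meta-Llama-3.1-8B-Instruct-GPTQ-INT4")
```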
+} \ No newline at end of file diff --git a/data/model_data_json/huggyllama_llama-7b.json b/data/model_data_json/huggyllama_llama-7b.json new file mode 100644 index 0000000000000000000000000000000000000000..af57b14cce9fe16d9613667bcb3b7e37623f8fb3 --- /dev/null +++ b/data/model_data_json/huggyllama_llama-7b.json @@ -0,0 +1,19 @@ +{ + "model_id": "huggyllama/llama-7b", + "downloads": 100810, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "conversational", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other --- This contains the weights for the LLaMA-7b model. This model is under a non-commercial license (see the LICENSE file). You should only use this repository if you have been granted access to the model by filling out this form but either lost your copy of the weights or ran into trouble converting them to the Transformers format.", + "model_explanation_gemini": "Provides the weights for the LLaMA-7b model under a non-commercial license, intended for users who lost their copy or encountered conversion issues." +} \ No newline at end of file diff --git a/data/model_data_json/humarin_chatgpt_paraphraser_on_T5_base.json b/data/model_data_json/humarin_chatgpt_paraphraser_on_T5_base.json new file mode 100644 index 0000000000000000000000000000000000000000..ed43c0d37a6b0c185d14bf45fbded55a454d1fb3 --- /dev/null +++ b/data/model_data_json/humarin_chatgpt_paraphraser_on_T5_base.json @@ -0,0 +1,19 @@ +{ + "model_id": "humarin/chatgpt_paraphraser_on_T5_base", + "downloads": 133639, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "en", + "dataset:humarin/chatgpt-paraphrases", + "license:openrail", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: openrail datasets: - humarin/chatgpt-paraphrases language: - en library_name: transformers inference: parameters: num_beams: 5 num_beam_groups: 5 num_return_sequences: 5 repetition_penalty: 10.01 diversity_penalty: 3.01 no_repeat_ngram_size: 2 temperature: 0.7 max_length: 128 widget: - text: What are the best places to see in New York? example_title: New York tourist attractions - text: When should I go to the doctor? example_title: Doctor's time - text: >- Rammstein's album Mutter was recorded in the south of France in May and June 2000, and mixed in Stockholm in October of that year. example_title: Rammstein's album Mutter pipeline_tag: text2text-generation --- This model was trained on our ChatGPT paraphrase dataset. This dataset is based on Quora paraphrase questions, and on texts from the SQuAD 2.0 and CNN news datasets. This model is based on the T5-base model. We used transfer learning to get our model to generate paraphrases as well as ChatGPT does. Now we can say that this is one of the best paraphrasers on the Hugging Face Hub. Kaggle link, Author 1's LinkedIn link, Author 2's LinkedIn link. ## Deploying example ## Usage examples See the usage sketch below. ## Train parameters ### BibTeX entry and citation info", + "model_explanation_gemini": "Generates diverse paraphrases of input text using a T5-base model trained on ChatGPT-style paraphrasing data."
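A usage sketch wiring in the generation parameters from the YAML header above (the `paraphrase:` task prefix is an assumption, following common T5 fine-tuning practice):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "humarin/chatgpt_paraphraser_on_T5_base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

def paraphrase(text: str) -> list[str]:
    input_ids = tokenizer(f"paraphrase: {text}", return_tensors="pt").input_ids
    # Diverse beam search with the card's inference parameters
    # (temperature 0.7 is also listed, but only applies when sampling).
    outputs = model.generate(
        input_ids,
        num_beams=5,
        num_beam_groups=5,
        num_return_sequences=5,
        repetition_penalty=10.01,
        diversity_penalty=3.01,
        no_repeat_ngram_size=2,
        max_length=128,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(paraphrase("What are the best places to see in New York?"))
```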
+} \ No newline at end of file diff --git a/data/model_data_json/hustvl_vitmatte-small-composition-1k.json b/data/model_data_json/hustvl_vitmatte-small-composition-1k.json new file mode 100644 index 0000000000000000000000000000000000000000..eeebb4177e799e8ecf06eb2150c97fdf7c62b545 --- /dev/null +++ b/data/model_data_json/hustvl_vitmatte-small-composition-1k.json @@ -0,0 +1,17 @@ +{ + "model_id": "hustvl/vitmatte-small-composition-1k", + "downloads": 3451112, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vitmatte", + "vision", + "arxiv:2305.15272", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision --- # ViTMatte model ViTMatte model trained on Composition-1k. It was introduced in the paper ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers by Yao et al. and first released in this repository. Disclaimer: The team releasing ViTMatte did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description ViTMatte is a simple approach to image matting, the task of accurately estimating the foreground object in an image. The model consists of a Vision Transformer (ViT) with a lightweight head on top. ViTMatte high-level overview (figure taken from the original paper). ## Intended uses & limitations You can use the raw model for image matting. See the model hub to look for other fine-tuned versions that may interest you. ### How to use We refer to the docs. ### BibTeX entry and citation info", + "model_explanation_gemini": "Estimates foreground objects in images using a Vision Transformer with a lightweight head for image matting." +} \ No newline at end of file diff --git a/data/model_data_json/hustvl_yolos-small.json b/data/model_data_json/hustvl_yolos-small.json new file mode 100644 index 0000000000000000000000000000000000000000..881fa066812ca8027e72675ce16f55142cd5c8c5 --- /dev/null +++ b/data/model_data_json/hustvl_yolos-small.json @@ -0,0 +1,19 @@ +{ + "model_id": "hustvl/yolos-small", + "downloads": 181887, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "yolos", + "object-detection", + "vision", + "dataset:coco", + "arxiv:2106.00666", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - object-detection - vision datasets: - coco widget: - src: example_title: Savanna - src: example_title: Football Match - src: example_title: Airport --- # YOLOS (small-sized) model YOLOS model fine-tuned on COCO 2017 object detection (118k annotated images). It was introduced in the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Fang et al. and first released in this repository. Disclaimer: The team releasing YOLOS did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO validation 2017 (similar to DETR and more complex frameworks such as Faster R-CNN). The model is trained using a \"bipartite matching loss\": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a \"no object\" as class and \"no bounding box\" as bounding box).
The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model. ## Intended uses & limitations You can use the raw model for object detection. See the model hub to look for all available YOLOS models. ### How to use Here is how to use this model: Currently, both the feature extractor and model support PyTorch. ## Training data The YOLOS model was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. ### Training The model was pre-trained for 200 epochs on ImageNet-1k and fine-tuned for 150 epochs on COCO. ## Evaluation results This model achieves an AP (average precision) of **36.1** on COCO 2017 validation. For more details regarding evaluation results, we refer to table 1 of the original paper. ### BibTeX entry and citation info", + "model_explanation_gemini": "Detects objects in images using a Vision Transformer trained on COCO dataset with a bipartite matching loss for bounding box and class prediction." +} \ No newline at end of file diff --git a/data/model_data_json/hustvl_yolos-tiny.json b/data/model_data_json/hustvl_yolos-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..ead59227c1c02c9476a0f911483f751970c6944c --- /dev/null +++ b/data/model_data_json/hustvl_yolos-tiny.json @@ -0,0 +1,19 @@ +{ + "model_id": "hustvl/yolos-tiny", + "downloads": 134396, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "yolos", + "object-detection", + "vision", + "dataset:coco", + "arxiv:2106.00666", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - object-detection - vision datasets: - coco widget: - src: example_title: Savanna - src: example_title: Football Match - src: example_title: Airport --- # YOLOS (tiny-sized) model YOLOS model fine-tuned on COCO 2017 object detection (118k annotated images). It was introduced in the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Fang et al. and first released in this repository. Disclaimer: The team releasing YOLOS did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO validation 2017 (similar to DETR and more complex frameworks such as Faster R-CNN). The model is trained using a \"bipartite matching loss\": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a \"no object\" as class and \"no bounding box\" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model. ## Intended uses & limitations You can use the raw model for object detection. 
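For instance, a minimal detection sketch (the 0.9 confidence threshold is an arbitrary choice):

```python
import torch
import requests
from PIL import Image
from transformers import YolosImageProcessor, YolosForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Rescale boxes to the original image size and keep confident predictions.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```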
See the model hub to look for all available YOLOS models. ### How to use The sketch above illustrates typical usage. Currently, both the feature extractor and model support PyTorch. ## Training data The YOLOS model was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. ### Training The model was pre-trained for 300 epochs on ImageNet-1k and fine-tuned for 300 epochs on COCO. ## Evaluation results This model achieves an AP (average precision) of **28.7** on COCO 2017 validation. For more details regarding evaluation results, we refer to the original paper. ### BibTeX entry and citation info", + "model_explanation_gemini": "Detects objects in images using a Vision Transformer trained on COCO dataset with a bipartite matching loss." +} \ No newline at end of file diff --git a/data/model_data_json/ianpan_chest-x-ray-basic.json b/data/model_data_json/ianpan_chest-x-ray-basic.json new file mode 100644 index 0000000000000000000000000000000000000000..8ef7bf79848ce0e5eb4fdf4e8b31936f99eb3c47 --- /dev/null +++ b/data/model_data_json/ianpan_chest-x-ray-basic.json @@ -0,0 +1,25 @@ +{ + "model_id": "ianpan/chest-x-ray-basic", + "downloads": 174093, + "tags": [ + "transformers", + "safetensors", + "cxr_basic", + "feature-extraction", + "chest_x_ray", + "x_ray", + "medical_imaging", + "radiology", + "segmentation", + "classification", + "lungs", + "heart", + "image-segmentation", + "custom_code", + "base_model:timm/tf_efficientnetv2_s.in21k_ft_in1k", + "base_model:finetune:timm/tf_efficientnetv2_s.in21k_ft_in1k", + "region:us" + ], + "description": "--- library_name: transformers tags: - chest_x_ray - x_ray - medical_imaging - radiology - segmentation - classification - lungs - heart base_model: - timm/tf_efficientnetv2_s.in21k_ft_in1k pipeline_tag: image-segmentation --- This model performs both segmentation and classification on chest radiographs (X-rays). The model uses an EfficientNetV2-S backbone (per the base model above) with a U-Net decoder for segmentation and a linear layer for classification. For frontal radiographs, the model segments the: 1) right lung, 2) left lung, and 3) heart. The model also predicts the chest X-ray view (AP, PA, lateral), patient age, and patient sex. The CheXpert (small version) and NIH Chest X-ray datasets were used to train the model. Segmentation masks were obtained from the CheXmask dataset. The final dataset comprised 335,516 images from 96,385 patients and was split into 80% training/20% validation. A holdout test set was not used since minimal tuning was performed. The view classifier was trained only on CheXpert images (NIH images excluded from loss function), given that lateral radiographs are only present in CheXpert. This is to avoid unwanted bias in the model, which can occur if one class originates only from a single dataset. Validation performance is reported in the original model card. The model outputs a dictionary which contains 4 keys: * the segmentation output, which has 3 channels containing the segmentation masks; take the argmax over the channel dimension to create a single image mask: 1 = right lung, 2 = left lung, 3 = heart. * the predicted patient age, in years. * the predicted view, with 3 classes for each possible view; take the argmax to select the predicted view: 0 = AP, 1 = PA, 2 = lateral. * the predicted patient sex, which should be binarized with a threshold. You can use the segmentation mask to crop the region containing the lungs from the rest of the X-ray.
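A sketch of both operations, assuming `mask` is the argmax'd segmentation map described above (1 = right lung, 2 = left lung, 3 = heart); the function names and margin parameter are illustrative:

```python
import numpy as np

def crop_lungs(image: np.ndarray, mask: np.ndarray, margin: int = 0) -> np.ndarray:
    """Crop the image to the bounding box of both lungs (labels 1 and 2)."""
    ys, xs = np.where((mask == 1) | (mask == 2))
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, image.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, image.shape[1])
    return image[y0:y1, x0:x1]

def cardiothoracic_ratio(mask: np.ndarray) -> float:
    """CTR = widest horizontal extent of the heart / widest extent of the lungs."""
    heart_xs = np.where(mask == 3)[1]
    lung_xs = np.where((mask == 1) | (mask == 2))[1]
    return float(heart_xs.max() - heart_xs.min()) / float(lung_xs.max() - lung_xs.min())
```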
The cardiothoracic ratio (CTR) can be computed from the same mask, as in the `cardiothoracic_ratio` sketch above. If you have a DICOM reader such as `pydicom` installed, you can also load a DICOM image directly. This model is for demonstration and research purposes only and has NOT been approved by any regulatory agency for clinical use. The user assumes any and all responsibility regarding their own use of this model and its outputs.", + "model_explanation_gemini": "Performs segmentation of lungs and heart along with classification of chest X-ray view, patient age, and sex on frontal radiographs using a U-Net decoder and linear layer." +} \ No newline at end of file diff --git a/data/model_data_json/iarfmoose_bert-base-cased-qa-evaluator.json b/data/model_data_json/iarfmoose_bert-base-cased-qa-evaluator.json new file mode 100644 index 0000000000000000000000000000000000000000..aa03c0c963a8f1a2f27be9ee56edecbaeacaee09 --- /dev/null +++ b/data/model_data_json/iarfmoose_bert-base-cased-qa-evaluator.json @@ -0,0 +1,17 @@ +{ + "model_id": "iarfmoose/bert-base-cased-qa-evaluator", + "downloads": 111690, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "text-classification", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "# BERT-base-cased-qa-evaluator This model takes a question-answer pair as an input and outputs a value representing its prediction about whether the input was a valid question and answer pair or not. The model is a pretrained BERT-base-cased with a sequence classification head. ## Intended uses The QA evaluator was originally designed to be used with the t5-base-question-generator for evaluating the quality of generated questions. The input for the QA evaluator follows the standard BERT sequence-pair format, using the question and answer as the two sequences: `[CLS] question [SEP] answer [SEP]`. ## Limitations and bias The model is trained to evaluate if a question and answer are semantically related, but cannot determine whether an answer is actually true/correct or not. ## Training data The training data was made up of question-answer pairs from the following datasets: - SQuAD - RACE - CoQA - MSMARCO ## Training procedure The question and answer were concatenated 50% of the time. In the other 50% of the time a corruption operation was performed (either swapping the answer for an unrelated answer, or by copying part of the question into the answer). The model was then trained to predict whether the input sequence represented one of the original QA pairs or a corrupted input.", + "model_explanation_gemini": "Evaluates whether a given question-answer pair is semantically valid by predicting its validity, originally designed to assess quality of generated questions." +} \ No newline at end of file diff --git a/data/model_data_json/iarfmoose_t5-base-question-generator.json b/data/model_data_json/iarfmoose_t5-base-question-generator.json new file mode 100644 index 0000000000000000000000000000000000000000..6466f801ad362d8a32e60c7a3183a7839fedf7ca --- /dev/null +++ b/data/model_data_json/iarfmoose_t5-base-question-generator.json @@ -0,0 +1,18 @@ +{ + "model_id": "iarfmoose/t5-base-question-generator", + "downloads": 117657, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "t5", + "text2text-generation", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "# Model name ## Model description This model is a sequence-to-sequence question generator which takes an answer and context as an input, and generates a question as an output.
It is based on a pretrained t5-base model. ## Intended uses & limitations The model is trained to generate reading comprehension-style questions with answers extracted from a text. The model performs best with full sentence answers, but can also be used with single word or short phrase answers. #### How to use The model takes concatenated answers and context as an input sequence, and will generate a full question sentence as an output sequence. The max sequence length is 512 tokens. Inputs should be organised into an answer-plus-context sequence (see the usage sketch below). The input sequence can then be encoded and passed as the `input_ids` argument to the model's `generate()` method. For best results, a large number of questions can be generated, and then filtered using iarfmoose/bert-base-cased-qa-evaluator. For examples, please see the project repository. #### Limitations and bias The model is limited to generating questions in the same style as those found in SQuAD, CoQA, and MSMARCO. The generated questions can potentially be leading or reflect biases that are present in the context. If the context is too short or completely absent, or if the context and answer do not match, the generated question is likely to be incoherent. ## Training data The model was fine-tuned on a dataset made up of several well-known QA datasets (SQuAD, CoQA, and MSMARCO). The datasets were restructured by concatenating the answer and context fields into the previously-mentioned format. The question field from the datasets was used as the target during training. ## Training procedure The model was trained for 20 epochs over the training set with a learning rate of 1e-3. The batch size was only 4 due to GPU memory limitations when training on Google Colab.", + "model_explanation_gemini": "Generates reading comprehension-style questions from given answers and context, based on pretrained sequence-to-sequence architecture."
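A hedged usage sketch; the `<answer>`/`<context>` marker tokens are an assumption about the expected input format:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "iarfmoose/t5-base-question-generator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumed format: marker tokens separating the answer from its context.
answer = "Antoine Lavoisier"
context = "Oxygen was named by the French chemist Antoine Lavoisier in 1777."
inputs = tokenizer(f"<answer> {answer} <context> {context}",
                   return_tensors="pt", max_length=512, truncation=True)

outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```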
+} \ No newline at end of file diff --git a/data/model_data_json/ibm-granite_granite-embedding-125m-english.json b/data/model_data_json/ibm-granite_granite-embedding-125m-english.json new file mode 100644 index 0000000000000000000000000000000000000000..d6b7ce1fb8aed3fbc91509b657f95ad824309b36 --- /dev/null +++ b/data/model_data_json/ibm-granite_granite-embedding-125m-english.json @@ -0,0 +1,27 @@ +{ + "model_id": "ibm-granite/granite-embedding-125m-english", + "downloads": 111443, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "roberta", + "feature-extraction", + "language", + "granite", + "embeddings", + "mteb", + "transformers", + "sentence-similarity", + "en", + "arxiv:0000.00000", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 library_name: sentence-transformers tags: - language - granite - embeddings - mteb - transformers model-index: - name: ibm-granite/granite-embedding-125m-english results: - dataset: config: en-ext name: MTEB AmazonCounterfactualClassification (en-ext) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 67.3613 - type: f1 value: 55.0794 - type: f1_weighted value: 73.55120000000001 - type: ap value: 17.643900000000002 - type: ap_weighted value: 17.643900000000002 - type: main_score value: 67.3613 task: type: Classification - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 63.403 - type: f1 value: 57.4178 - type: f1_weighted value: 66.9704 - type: ap value: 26.892300000000002 - type: ap_weighted value: 26.892300000000002 - type: main_score value: 63.403 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification (default) revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 64.5872 - type: f1 value: 64.33330000000001 - type: f1_weighted value: 64.33330000000001 - type: ap value: 59.602 - type: ap_weighted value: 59.602 - type: main_score value: 64.5872 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 33.534000000000006 - type: f1 value: 32.5389 - type: f1_weighted value: 32.5389 - type: main_score value: 33.534000000000006 task: type: Classification - dataset: config: default name: MTEB AppsRetrieval (default) revision: f22508f96b7a36c2415181ed8bb76f76e04ae2d5 split: test type: CoIR-Retrieval/apps metrics: - type: ndcg_at_1 value: 6.932 - type: ndcg_at_3 value: 9.577 - type: ndcg_at_5 value: 10.597 - type: ndcg_at_10 value: 11.787 - type: ndcg_at_20 value: 12.863 - type: ndcg_at_100 value: 15.573999999999998 - type: ndcg_at_1000 value: 19.772000000000002 - type: map_at_1 value: 6.932 - type: map_at_3 value: 8.938 - type: map_at_5 value: 9.506 - type: map_at_10 value: 10.0 - type: map_at_20 value: 10.296 - type: map_at_100 value: 10.644 - type: map_at_1000 value: 10.771 - type: recall_at_1 value: 6.932 - type: recall_at_3 value: 11.421000000000001 - type: recall_at_5 value: 13.891 - type: recall_at_10 value: 17.556 - type: recall_at_20 value: 21.806 - type: recall_at_100 value: 36.839 - type: 
recall_at_1000 value: 71.71300000000001 - type: precision_at_1 value: 6.932 - type: precision_at_3 value: 3.807 - type: precision_at_5 value: 2.778 - type: precision_at_10 value: 1.756 - type: precision_at_20 value: 1.09 - type: precision_at_100 value: 0.368 - type: precision_at_1000 value: 0.07200000000000001 - type: mrr_at_1 value: 6.9323 - type: mrr_at_3 value: 8.9376 - type: mrr_at_5 value: 9.506 - type: mrr_at_10 value: 9.9999 - type: mrr_at_20 value: 10.2957 - type: mrr_at_100 value: 10.643600000000001 - type: mrr_at_1000 value: 10.7707 - type: nauc_ndcg_at_1_max value: 27.327299999999997 - type: nauc_ndcg_at_1_std value: 9.6266 - type: nauc_ndcg_at_1_diff1 value: 39.4451 - type: nauc_ndcg_at_3_max value: 22.9053 - type: nauc_ndcg_at_3_std value: 10.123 - type: nauc_ndcg_at_3_diff1 value: 27.742099999999997 - type: nauc_ndcg_at_5_max value: 21.7041 - type: nauc_ndcg_at_5_std value: 9.661100000000001 - type: nauc_ndcg_at_5_diff1 value: 25.0689 - type: nauc_ndcg_at_10_max value: 21.0966 - type: nauc_ndcg_at_10_std value: 10.4106 - type: nauc_ndcg_at_10_diff1 value: 23.4219 - type: nauc_ndcg_at_20_max value: 20.0575 - type: nauc_ndcg_at_20_std value: 10.89 - type: nauc_ndcg_at_20_diff1 value: 22.6143 - type: nauc_ndcg_at_100_max value: 19.4243 - type: nauc_ndcg_at_100_std value: 11.5431 - type: nauc_ndcg_at_100_diff1 value: 21.013 - type: nauc_ndcg_at_1000_max value: 20.6057 - type: nauc_ndcg_at_1000_std value: 13.0027 - type: nauc_ndcg_at_1000_diff1 value: 20.988799999999998 - type: nauc_map_at_1_max value: 27.327299999999997 - type: nauc_map_at_1_std value: 9.6266 - type: nauc_map_at_1_diff1 value: 39.4451 - type: nauc_map_at_3_max value: 23.6991 - type: nauc_map_at_3_std value: 9.9287 - type: nauc_map_at_3_diff1 value: 29.909799999999997 - type: nauc_map_at_5_max value: 22.9242 - type: nauc_map_at_5_std value: 9.640600000000001 - type: nauc_map_at_5_diff1 value: 28.228199999999998 - type: nauc_map_at_10_max value: 22.612199999999998 - type: nauc_map_at_10_std value: 10.0051 - type: nauc_map_at_10_diff1 value: 27.3942 - type: nauc_map_at_20_max value: 22.236 - type: nauc_map_at_20_std value: 10.168000000000001 - type: nauc_map_at_20_diff1 value: 27.0258 - type: nauc_map_at_100_max value: 22.1373 - type: nauc_map_at_100_std value: 10.2741 - type: nauc_map_at_100_diff1 value: 26.717800000000004 - type: nauc_map_at_1000_max value: 22.1829 - type: nauc_map_at_1000_std value: 10.3395 - type: nauc_map_at_1000_diff1 value: 26.7158 - type: nauc_recall_at_1_max value: 27.327299999999997 - type: nauc_recall_at_1_std value: 9.6266 - type: nauc_recall_at_1_diff1 value: 39.4451 - type: nauc_recall_at_3_max value: 21.0841 - type: nauc_recall_at_3_std value: 10.6057 - type: nauc_recall_at_3_diff1 value: 22.745 - type: nauc_recall_at_5_max value: 19.0389 - type: nauc_recall_at_5_std value: 9.697899999999999 - type: nauc_recall_at_5_diff1 value: 18.137600000000003 - type: nauc_recall_at_10_max value: 18.0668 - type: nauc_recall_at_10_std value: 11.326799999999999 - type: nauc_recall_at_10_diff1 value: 15.423 - type: nauc_recall_at_20_max value: 15.798100000000002 - type: nauc_recall_at_20_std value: 12.4585 - type: nauc_recall_at_20_diff1 value: 14.509500000000001 - type: nauc_recall_at_100_max value: 14.2836 - type: nauc_recall_at_100_std value: 14.2989 - type: nauc_recall_at_100_diff1 value: 10.7304 - type: nauc_recall_at_1000_max value: 19.728299999999997 - type: nauc_recall_at_1000_std value: 24.5691 - type: nauc_recall_at_1000_diff1 value: 6.1472999999999995 - type: nauc_precision_at_1_max value: 
27.327299999999997 - type: nauc_precision_at_1_std value: 9.6266 - type: nauc_precision_at_1_diff1 value: 39.4451 - type: nauc_precision_at_3_max value: 21.0841 - type: nauc_precision_at_3_std value: 10.6057 - type: nauc_precision_at_3_diff1 value: 22.745 - type: nauc_precision_at_5_max value: 19.0389 - type: nauc_precision_at_5_std value: 9.697899999999999 - type: nauc_precision_at_5_diff1 value: 18.137600000000003 - type: nauc_precision_at_10_max value: 18.0668 - type: nauc_precision_at_10_std value: 11.326799999999999 - type: nauc_precision_at_10_diff1 value: 15.423 - type: nauc_precision_at_20_max value: 15.798100000000002 - type: nauc_precision_at_20_std value: 12.4585 - type: nauc_precision_at_20_diff1 value: 14.509500000000001 - type: nauc_precision_at_100_max value: 14.2836 - type: nauc_precision_at_100_std value: 14.2989 - type: nauc_precision_at_100_diff1 value: 10.7304 - type: nauc_precision_at_1000_max value: 19.728299999999997 - type: nauc_precision_at_1000_std value: 24.5691 - type: nauc_precision_at_1000_diff1 value: 6.1472999999999995 - type: nauc_mrr_at_1_max value: 27.327299999999997 - type: nauc_mrr_at_1_std value: 9.6266 - type: nauc_mrr_at_1_diff1 value: 39.4451 - type: nauc_mrr_at_3_max value: 23.6991 - type: nauc_mrr_at_3_std value: 9.9287 - type: nauc_mrr_at_3_diff1 value: 29.909799999999997 - type: nauc_mrr_at_5_max value: 22.9242 - type: nauc_mrr_at_5_std value: 9.640600000000001 - type: nauc_mrr_at_5_diff1 value: 28.228199999999998 - type: nauc_mrr_at_10_max value: 22.612199999999998 - type: nauc_mrr_at_10_std value: 10.0051 - type: nauc_mrr_at_10_diff1 value: 27.3942 - type: nauc_mrr_at_20_max value: 22.236 - type: nauc_mrr_at_20_std value: 10.168000000000001 - type: nauc_mrr_at_20_diff1 value: 27.0258 - type: nauc_mrr_at_100_max value: 22.1372 - type: nauc_mrr_at_100_std value: 10.2743 - type: nauc_mrr_at_100_diff1 value: 26.7177 - type: nauc_mrr_at_1000_max value: 22.1828 - type: nauc_mrr_at_1000_std value: 10.3397 - type: nauc_mrr_at_1000_diff1 value: 26.7157 - type: main_score value: 11.787 task: type: Retrieval - dataset: config: default name: MTEB ArguAna (default) revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: ndcg_at_1 value: 33.642 - type: ndcg_at_3 value: 48.825 - type: ndcg_at_5 value: 53.689 - type: ndcg_at_10 value: 58.401 - type: ndcg_at_20 value: 60.78 - type: ndcg_at_100 value: 61.57 - type: ndcg_at_1000 value: 61.608 - type: map_at_1 value: 33.642 - type: map_at_3 value: 45.057 - type: map_at_5 value: 47.774 - type: map_at_10 value: 49.716 - type: map_at_20 value: 50.400999999999996 - type: map_at_100 value: 50.519000000000005 - type: map_at_1000 value: 50.52100000000001 - type: recall_at_1 value: 33.642 - type: recall_at_3 value: 59.744 - type: recall_at_5 value: 71.479 - type: recall_at_10 value: 86.06 - type: recall_at_20 value: 95.235 - type: recall_at_100 value: 99.36 - type: recall_at_1000 value: 99.644 - type: precision_at_1 value: 33.642 - type: precision_at_3 value: 19.915 - type: precision_at_5 value: 14.296000000000001 - type: precision_at_10 value: 8.606 - type: precision_at_20 value: 4.7620000000000005 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 34.495 - type: mrr_at_3 value: 45.2821 - type: mrr_at_5 value: 48.1128 - type: mrr_at_10 value: 50.036199999999994 - type: mrr_at_20 value: 50.7172 - type: mrr_at_100 value: 50.83259999999999 - type: mrr_at_1000 value: 50.8343 - type: nauc_ndcg_at_1_max value: 
-11.838999999999999 - type: nauc_ndcg_at_1_std value: -11.8923 - type: nauc_ndcg_at_1_diff1 value: 18.2163 - type: nauc_ndcg_at_3_max value: -11.6655 - type: nauc_ndcg_at_3_std value: -12.2408 - type: nauc_ndcg_at_3_diff1 value: 12.4326 - type: nauc_ndcg_at_5_max value: -11.2332 - type: nauc_ndcg_at_5_std value: -10.99 - type: nauc_ndcg_at_5_diff1 value: 11.4272 - type: nauc_ndcg_at_10_max value: -9.7581 - type: nauc_ndcg_at_10_std value: -10.6279 - type: nauc_ndcg_at_10_diff1 value: 12.3219 - type: nauc_ndcg_at_20_max value: -9.070300000000001 - type: nauc_ndcg_at_20_std value: -10.4367 - type: nauc_ndcg_at_20_diff1 value: 13.5332 - type: nauc_ndcg_at_100_max value: -10.281 - type: nauc_ndcg_at_100_std value: -10.8575 - type: nauc_ndcg_at_100_diff1 value: 13.583899999999998 - type: nauc_ndcg_at_1000_max value: -10.4108 - type: nauc_ndcg_at_1000_std value: -10.9358 - type: nauc_ndcg_at_1000_diff1 value: 13.553200000000002 - type: nauc_map_at_1_max value: -11.838999999999999 - type: nauc_map_at_1_std value: -11.8923 - type: nauc_map_at_1_diff1 value: 18.2163 - type: nauc_map_at_3_max value: -11.6502 - type: nauc_map_at_3_std value: -12.0988 - type: nauc_map_at_3_diff1 value: 13.7581 - type: nauc_map_at_5_max value: -11.345600000000001 - type: nauc_map_at_5_std value: -11.4327 - type: nauc_map_at_5_diff1 value: 13.3246 - type: nauc_map_at_10_max value: -10.8652 - type: nauc_map_at_10_std value: -11.3476 - type: nauc_map_at_10_diff1 value: 13.7353 - type: nauc_map_at_20_max value: -10.7273 - type: nauc_map_at_20_std value: -11.309800000000001 - type: nauc_map_at_20_diff1 value: 14.0429 - type: nauc_map_at_100_max value: -10.8833 - type: nauc_map_at_100_std value: -11.372 - type: nauc_map_at_100_diff1 value: 14.0638 - type: nauc_map_at_1000_max value: -10.8878 - type: nauc_map_at_1000_std value: -11.3746 - type: nauc_map_at_1000_diff1 value: 14.062 - type: nauc_recall_at_1_max value: -11.838999999999999 - type: nauc_recall_at_1_std value: -11.8923 - type: nauc_recall_at_1_diff1 value: 18.2163 - type: nauc_recall_at_3_max value: -11.739099999999999 - type: nauc_recall_at_3_std value: -12.7062 - type: nauc_recall_at_3_diff1 value: 8.3694 - type: nauc_recall_at_5_max value: -10.8863 - type: nauc_recall_at_5_std value: -9.1183 - type: nauc_recall_at_5_diff1 value: 4.1094 - type: nauc_recall_at_10_max value: -0.9124 - type: nauc_recall_at_10_std value: -4.971 - type: nauc_recall_at_10_diff1 value: 3.4779999999999998 - type: nauc_recall_at_20_max value: 29.0035 - type: nauc_recall_at_20_std value: 8.7987 - type: nauc_recall_at_20_diff1 value: 11.932 - type: nauc_recall_at_100_max value: 42.377700000000004 - type: nauc_recall_at_100_std value: 55.2136 - type: nauc_recall_at_100_diff1 value: 3.1033999999999997 - type: nauc_recall_at_1000_max value: 19.053700000000003 - type: nauc_recall_at_1000_std value: 67.9828 - type: nauc_recall_at_1000_diff1 value: -17.644399999999997 - type: nauc_precision_at_1_max value: -11.838999999999999 - type: nauc_precision_at_1_std value: -11.8923 - type: nauc_precision_at_1_diff1 value: 18.2163 - type: nauc_precision_at_3_max value: -11.739099999999999 - type: nauc_precision_at_3_std value: -12.7062 - type: nauc_precision_at_3_diff1 value: 8.3694 - type: nauc_precision_at_5_max value: -10.8863 - type: nauc_precision_at_5_std value: -9.1183 - type: nauc_precision_at_5_diff1 value: 4.1094 - type: nauc_precision_at_10_max value: -0.9124 - type: nauc_precision_at_10_std value: -4.971 - type: nauc_precision_at_10_diff1 value: 3.4779999999999998 - type: 
nauc_precision_at_20_max value: 29.0035 - type: nauc_precision_at_20_std value: 8.7987 - type: nauc_precision_at_20_diff1 value: 11.932 - type: nauc_precision_at_100_max value: 42.377700000000004 - type: nauc_precision_at_100_std value: 55.2136 - type: nauc_precision_at_100_diff1 value: 3.1033999999999997 - type: nauc_precision_at_1000_max value: 19.053700000000003 - type: nauc_precision_at_1000_std value: 67.9828 - type: nauc_precision_at_1000_diff1 value: -17.644399999999997 - type: nauc_mrr_at_1_max value: -12.0053 - type: nauc_mrr_at_1_std value: -11.7296 - type: nauc_mrr_at_1_diff1 value: 15.7249 - type: nauc_mrr_at_3_max value: -12.965399999999999 - type: nauc_mrr_at_3_std value: -12.197099999999999 - type: nauc_mrr_at_3_diff1 value: 11.228200000000001 - type: nauc_mrr_at_5_max value: -12.3171 - type: nauc_mrr_at_5_std value: -11.3562 - type: nauc_mrr_at_5_diff1 value: 11.081900000000001 - type: nauc_mrr_at_10_max value: -11.9397 - type: nauc_mrr_at_10_std value: -11.3157 - type: nauc_mrr_at_10_diff1 value: 11.3887 - type: nauc_mrr_at_20_max value: -11.8344 - type: nauc_mrr_at_20_std value: -11.269 - type: nauc_mrr_at_20_diff1 value: 11.655600000000002 - type: nauc_mrr_at_100_max value: -11.9825 - type: nauc_mrr_at_100_std value: -11.3178 - type: nauc_mrr_at_100_diff1 value: 11.6519 - type: nauc_mrr_at_1000_max value: -11.9871 - type: nauc_mrr_at_1000_std value: -11.3205 - type: nauc_mrr_at_1000_diff1 value: 11.6499 - type: main_score value: 58.401 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P (default) revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: v_measure value: 48.3018 - type: v_measure_std value: 13.845199999999998 - type: main_score value: 48.3018 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S (default) revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: v_measure value: 44.837900000000005 - type: v_measure_std value: 14.089599999999999 - type: main_score value: 44.837900000000005 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions (default) revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: map value: 66.4838 - type: mrr value: 79.3195 - type: nAUC_map_max value: 23.2658 - type: nAUC_map_std value: 17.5795 - type: nAUC_map_diff1 value: 11.5539 - type: nAUC_mrr_max value: 35.565400000000004 - type: nAUC_mrr_std value: 23.7189 - type: nAUC_mrr_diff1 value: 15.962299999999999 - type: main_score value: 66.4838 task: type: Reranking - dataset: config: default name: MTEB BIOSSES (default) revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: pearson value: 90.1203 - type: spearman value: 87.8424 - type: cosine_pearson value: 90.1203 - type: cosine_spearman value: 87.8424 - type: manhattan_pearson value: 88.1164 - type: manhattan_spearman value: 87.752 - type: euclidean_pearson value: 88.3146 - type: euclidean_spearman value: 87.8424 - type: main_score value: 87.8424 task: type: STS - dataset: config: default name: MTEB Banking77Classification (default) revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 77.9156 - type: f1 value: 76.9641 - type: f1_weighted value: 76.9641 - type: main_score value: 77.9156 task: type: Classification - dataset: config: default name: MTEB 
BiorxivClusteringP2P (default) revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: v_measure value: 38.3582 - type: v_measure_std value: 1.1436 - type: main_score value: 38.3582 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S (default) revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: v_measure value: 36.2911 - type: v_measure_std value: 0.44339999999999996 - type: main_score value: 36.2911 task: type: Clustering - dataset: config: python name: MTEB COIRCodeSearchNetRetrieval (python) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 76.351 - type: ndcg_at_3 value: 82.116 - type: ndcg_at_5 value: 83.231 - type: ndcg_at_10 value: 84.301 - type: ndcg_at_20 value: 84.83800000000001 - type: ndcg_at_100 value: 85.462 - type: ndcg_at_1000 value: 85.706 - type: map_at_1 value: 76.351 - type: map_at_3 value: 80.744 - type: map_at_5 value: 81.365 - type: map_at_10 value: 81.812 - type: map_at_20 value: 81.96 - type: map_at_100 value: 82.05 - type: map_at_1000 value: 82.06 - type: recall_at_1 value: 76.351 - type: recall_at_3 value: 86.071 - type: recall_at_5 value: 88.765 - type: recall_at_10 value: 92.04299999999999 - type: recall_at_20 value: 94.16799999999999 - type: recall_at_100 value: 97.466 - type: recall_at_1000 value: 99.383 - type: precision_at_1 value: 76.351 - type: precision_at_3 value: 28.689999999999998 - type: precision_at_5 value: 17.753 - type: precision_at_10 value: 9.203999999999999 - type: precision_at_20 value: 4.707999999999999 - type: precision_at_100 value: 0.975 - type: precision_at_1000 value: 0.099 - type: mrr_at_1 value: 76.3507 - type: mrr_at_3 value: 80.7436 - type: mrr_at_5 value: 81.3647 - type: mrr_at_10 value: 81.8121 - type: mrr_at_20 value: 81.9598 - type: mrr_at_100 value: 82.0504 - type: mrr_at_1000 value: 82.0597 - type: nauc_ndcg_at_1_max value: 73.2541 - type: nauc_ndcg_at_1_std value: -0.8352 - type: nauc_ndcg_at_1_diff1 value: 85.1422 - type: nauc_ndcg_at_3_max value: 75.9862 - type: nauc_ndcg_at_3_std value: 0.14100000000000001 - type: nauc_ndcg_at_3_diff1 value: 82.4674 - type: nauc_ndcg_at_5_max value: 75.7513 - type: nauc_ndcg_at_5_std value: 0.614 - type: nauc_ndcg_at_5_diff1 value: 82.2885 - type: nauc_ndcg_at_10_max value: 75.6282 - type: nauc_ndcg_at_10_std value: 0.6251 - type: nauc_ndcg_at_10_diff1 value: 82.3616 - type: nauc_ndcg_at_20_max value: 75.7286 - type: nauc_ndcg_at_20_std value: 0.9792000000000001 - type: nauc_ndcg_at_20_diff1 value: 82.6106 - type: nauc_ndcg_at_100_max value: 75.58840000000001 - type: nauc_ndcg_at_100_std value: 1.0781 - type: nauc_ndcg_at_100_diff1 value: 82.82969999999999 - type: nauc_ndcg_at_1000_max value: 75.4705 - type: nauc_ndcg_at_1000_std value: 0.8326 - type: nauc_ndcg_at_1000_diff1 value: 82.889 - type: nauc_map_at_1_max value: 73.2541 - type: nauc_map_at_1_std value: -0.8352 - type: nauc_map_at_1_diff1 value: 85.1422 - type: nauc_map_at_3_max value: 75.2756 - type: nauc_map_at_3_std value: -0.145 - type: nauc_map_at_3_diff1 value: 83.15780000000001 - type: nauc_map_at_5_max value: 75.1281 - type: nauc_map_at_5_std value: 0.0837 - type: nauc_map_at_5_diff1 value: 83.08250000000001 - type: nauc_map_at_10_max value: 75.05579999999999 - type: nauc_map_at_10_std value: 0.068 - type: nauc_map_at_10_diff1 value: 83.1206 - type: nauc_map_at_20_max value: 75.0708 - type: 
nauc_map_at_20_std value: 0.13749999999999998 - type: nauc_map_at_20_diff1 value: 83.1861 - type: nauc_map_at_100_max value: 75.0491 - type: nauc_map_at_100_std value: 0.1411 - type: nauc_map_at_100_diff1 value: 83.21539999999999 - type: nauc_map_at_1000_max value: 75.04570000000001 - type: nauc_map_at_1000_std value: 0.1359 - type: nauc_map_at_1000_diff1 value: 83.2179 - type: nauc_recall_at_1_max value: 73.2541 - type: nauc_recall_at_1_std value: -0.8352 - type: nauc_recall_at_1_diff1 value: 85.1422 - type: nauc_recall_at_3_max value: 78.65990000000001 - type: nauc_recall_at_3_std value: 1.2368000000000001 - type: nauc_recall_at_3_diff1 value: 79.8732 - type: nauc_recall_at_5_max value: 78.46 - type: nauc_recall_at_5_std value: 3.1027 - type: nauc_recall_at_5_diff1 value: 78.7509 - type: nauc_recall_at_10_max value: 78.9542 - type: nauc_recall_at_10_std value: 4.2138 - type: nauc_recall_at_10_diff1 value: 77.8697 - type: nauc_recall_at_20_max value: 81.2016 - type: nauc_recall_at_20_std value: 9.092500000000001 - type: nauc_recall_at_20_diff1 value: 78.6045 - type: nauc_recall_at_100_max value: 84.5044 - type: nauc_recall_at_100_std value: 22.6368 - type: nauc_recall_at_100_diff1 value: 79.553 - type: nauc_recall_at_1000_max value: 91.4393 - type: nauc_recall_at_1000_std value: 44.0261 - type: nauc_recall_at_1000_diff1 value: 78.6859 - type: nauc_precision_at_1_max value: 73.2541 - type: nauc_precision_at_1_std value: -0.8352 - type: nauc_precision_at_1_diff1 value: 85.1422 - type: nauc_precision_at_3_max value: 78.65990000000001 - type: nauc_precision_at_3_std value: 1.2368000000000001 - type: nauc_precision_at_3_diff1 value: 79.8732 - type: nauc_precision_at_5_max value: 78.46 - type: nauc_precision_at_5_std value: 3.1027 - type: nauc_precision_at_5_diff1 value: 78.7509 - type: nauc_precision_at_10_max value: 78.9542 - type: nauc_precision_at_10_std value: 4.2138 - type: nauc_precision_at_10_diff1 value: 77.8697 - type: nauc_precision_at_20_max value: 81.2016 - type: nauc_precision_at_20_std value: 9.092500000000001 - type: nauc_precision_at_20_diff1 value: 78.6045 - type: nauc_precision_at_100_max value: 84.5044 - type: nauc_precision_at_100_std value: 22.6368 - type: nauc_precision_at_100_diff1 value: 79.553 - type: nauc_precision_at_1000_max value: 91.4393 - type: nauc_precision_at_1000_std value: 44.0261 - type: nauc_precision_at_1000_diff1 value: 78.6859 - type: nauc_mrr_at_1_max value: 73.2541 - type: nauc_mrr_at_1_std value: -0.8352 - type: nauc_mrr_at_1_diff1 value: 85.1422 - type: nauc_mrr_at_3_max value: 75.2756 - type: nauc_mrr_at_3_std value: -0.145 - type: nauc_mrr_at_3_diff1 value: 83.15780000000001 - type: nauc_mrr_at_5_max value: 75.1281 - type: nauc_mrr_at_5_std value: 0.0837 - type: nauc_mrr_at_5_diff1 value: 83.08250000000001 - type: nauc_mrr_at_10_max value: 75.05579999999999 - type: nauc_mrr_at_10_std value: 0.068 - type: nauc_mrr_at_10_diff1 value: 83.1206 - type: nauc_mrr_at_20_max value: 75.0708 - type: nauc_mrr_at_20_std value: 0.13749999999999998 - type: nauc_mrr_at_20_diff1 value: 83.1861 - type: nauc_mrr_at_100_max value: 75.0491 - type: nauc_mrr_at_100_std value: 0.1411 - type: nauc_mrr_at_100_diff1 value: 83.21539999999999 - type: nauc_mrr_at_1000_max value: 75.04570000000001 - type: nauc_mrr_at_1000_std value: 0.1359 - type: nauc_mrr_at_1000_diff1 value: 83.2179 - type: main_score value: 84.301 task: type: Retrieval - dataset: config: javascript name: MTEB COIRCodeSearchNetRetrieval (javascript) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: 
test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 34.154 - type: ndcg_at_3 value: 41.637 - type: ndcg_at_5 value: 43.775 - type: ndcg_at_10 value: 46.093 - type: ndcg_at_20 value: 47.659 - type: ndcg_at_100 value: 49.975 - type: ndcg_at_1000 value: 51.652 - type: map_at_1 value: 34.154 - type: map_at_3 value: 39.811 - type: map_at_5 value: 40.996 - type: map_at_10 value: 41.945 - type: map_at_20 value: 42.375 - type: map_at_100 value: 42.693999999999996 - type: map_at_1000 value: 42.752 - type: recall_at_1 value: 34.154 - type: recall_at_3 value: 46.916000000000004 - type: recall_at_5 value: 52.112 - type: recall_at_10 value: 59.313 - type: recall_at_20 value: 65.512 - type: recall_at_100 value: 78.001 - type: recall_at_1000 value: 91.49199999999999 - type: precision_at_1 value: 34.154 - type: precision_at_3 value: 15.639 - type: precision_at_5 value: 10.421999999999999 - type: precision_at_10 value: 5.931 - type: precision_at_20 value: 3.276 - type: precision_at_100 value: 0.7799999999999999 - type: precision_at_1000 value: 0.091 - type: mrr_at_1 value: 34.153800000000004 - type: mrr_at_3 value: 39.8106 - type: mrr_at_5 value: 40.995599999999996 - type: mrr_at_10 value: 41.9454 - type: mrr_at_20 value: 42.375099999999996 - type: mrr_at_100 value: 42.6943 - type: mrr_at_1000 value: 42.7521 - type: nauc_ndcg_at_1_max value: 43.9354 - type: nauc_ndcg_at_1_std value: -3.6563 - type: nauc_ndcg_at_1_diff1 value: 63.9034 - type: nauc_ndcg_at_3_max value: 45.9224 - type: nauc_ndcg_at_3_std value: -1.1915 - type: nauc_ndcg_at_3_diff1 value: 56.65599999999999 - type: nauc_ndcg_at_5_max value: 45.7943 - type: nauc_ndcg_at_5_std value: -0.7263000000000001 - type: nauc_ndcg_at_5_diff1 value: 55.4796 - type: nauc_ndcg_at_10_max value: 45.4291 - type: nauc_ndcg_at_10_std value: 0.12290000000000001 - type: nauc_ndcg_at_10_diff1 value: 54.7952 - type: nauc_ndcg_at_20_max value: 45.7072 - type: nauc_ndcg_at_20_std value: 1.3283 - type: nauc_ndcg_at_20_diff1 value: 54.8465 - type: nauc_ndcg_at_100_max value: 45.8073 - type: nauc_ndcg_at_100_std value: 1.8653 - type: nauc_ndcg_at_100_diff1 value: 54.9886 - type: nauc_ndcg_at_1000_max value: 45.5983 - type: nauc_ndcg_at_1000_std value: 1.2590999999999999 - type: nauc_ndcg_at_1000_diff1 value: 55.374500000000005 - type: nauc_map_at_1_max value: 43.9354 - type: nauc_map_at_1_std value: -3.6563 - type: nauc_map_at_1_diff1 value: 63.9034 - type: nauc_map_at_3_max value: 45.4465 - type: nauc_map_at_3_std value: -1.7909000000000002 - type: nauc_map_at_3_diff1 value: 58.3822 - type: nauc_map_at_5_max value: 45.3588 - type: nauc_map_at_5_std value: -1.5449 - type: nauc_map_at_5_diff1 value: 57.737 - type: nauc_map_at_10_max value: 45.2115 - type: nauc_map_at_10_std value: -1.2034 - type: nauc_map_at_10_diff1 value: 57.4859 - type: nauc_map_at_20_max value: 45.29 - type: nauc_map_at_20_std value: -0.8769000000000001 - type: nauc_map_at_20_diff1 value: 57.510099999999994 - type: nauc_map_at_100_max value: 45.2905 - type: nauc_map_at_100_std value: -0.8298 - type: nauc_map_at_100_diff1 value: 57.5373 - type: nauc_map_at_1000_max value: 45.2866 - type: nauc_map_at_1000_std value: -0.8453 - type: nauc_map_at_1000_diff1 value: 57.550000000000004 - type: nauc_recall_at_1_max value: 43.9354 - type: nauc_recall_at_1_std value: -3.6563 - type: nauc_recall_at_1_diff1 value: 63.9034 - type: nauc_recall_at_3_max value: 47.2962 - type: nauc_recall_at_3_std value: 0.542 - type: nauc_recall_at_3_diff1 value: 51.6782 - type: nauc_recall_at_5_max value: 47.0822 
- type: nauc_recall_at_5_std value: 1.7794999999999999 - type: nauc_recall_at_5_diff1 value: 48.634100000000004 - type: nauc_recall_at_10_max value: 45.9453 - type: nauc_recall_at_10_std value: 4.7773 - type: nauc_recall_at_10_diff1 value: 45.778600000000004 - type: nauc_recall_at_20_max value: 47.232400000000005 - type: nauc_recall_at_20_std value: 10.7522 - type: nauc_recall_at_20_diff1 value: 45.029599999999995 - type: nauc_recall_at_100_max value: 48.937799999999996 - type: nauc_recall_at_100_std value: 19.4035 - type: nauc_recall_at_100_diff1 value: 42.388 - type: nauc_recall_at_1000_max value: 46.494099999999996 - type: nauc_recall_at_1000_std value: 24.532 - type: nauc_recall_at_1000_diff1 value: 36.9281 - type: nauc_precision_at_1_max value: 43.9354 - type: nauc_precision_at_1_std value: -3.6563 - type: nauc_precision_at_1_diff1 value: 63.9034 - type: nauc_precision_at_3_max value: 47.2962 - type: nauc_precision_at_3_std value: 0.542 - type: nauc_precision_at_3_diff1 value: 51.6782 - type: nauc_precision_at_5_max value: 47.0822 - type: nauc_precision_at_5_std value: 1.7794999999999999 - type: nauc_precision_at_5_diff1 value: 48.634100000000004 - type: nauc_precision_at_10_max value: 45.9453 - type: nauc_precision_at_10_std value: 4.7773 - type: nauc_precision_at_10_diff1 value: 45.778600000000004 - type: nauc_precision_at_20_max value: 47.232400000000005 - type: nauc_precision_at_20_std value: 10.7522 - type: nauc_precision_at_20_diff1 value: 45.029599999999995 - type: nauc_precision_at_100_max value: 48.937799999999996 - type: nauc_precision_at_100_std value: 19.4035 - type: nauc_precision_at_100_diff1 value: 42.388 - type: nauc_precision_at_1000_max value: 46.494099999999996 - type: nauc_precision_at_1000_std value: 24.532 - type: nauc_precision_at_1000_diff1 value: 36.9281 - type: nauc_mrr_at_1_max value: 43.9354 - type: nauc_mrr_at_1_std value: -3.6563 - type: nauc_mrr_at_1_diff1 value: 63.9034 - type: nauc_mrr_at_3_max value: 45.4465 - type: nauc_mrr_at_3_std value: -1.7909000000000002 - type: nauc_mrr_at_3_diff1 value: 58.3822 - type: nauc_mrr_at_5_max value: 45.3588 - type: nauc_mrr_at_5_std value: -1.5449 - type: nauc_mrr_at_5_diff1 value: 57.737 - type: nauc_mrr_at_10_max value: 45.2115 - type: nauc_mrr_at_10_std value: -1.2034 - type: nauc_mrr_at_10_diff1 value: 57.4859 - type: nauc_mrr_at_20_max value: 45.29 - type: nauc_mrr_at_20_std value: -0.8769000000000001 - type: nauc_mrr_at_20_diff1 value: 57.510099999999994 - type: nauc_mrr_at_100_max value: 45.2906 - type: nauc_mrr_at_100_std value: -0.8297000000000001 - type: nauc_mrr_at_100_diff1 value: 57.5373 - type: nauc_mrr_at_1000_max value: 45.2866 - type: nauc_mrr_at_1000_std value: -0.8452 - type: nauc_mrr_at_1000_diff1 value: 57.550000000000004 - type: main_score value: 46.093 task: type: Retrieval - dataset: config: go name: MTEB COIRCodeSearchNetRetrieval (go) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 43.105 - type: ndcg_at_3 value: 52.758 - type: ndcg_at_5 value: 55.284 - type: ndcg_at_10 value: 57.557 - type: ndcg_at_20 value: 58.885 - type: ndcg_at_100 value: 60.803 - type: ndcg_at_1000 value: 61.855000000000004 - type: map_at_1 value: 43.105 - type: map_at_3 value: 50.38399999999999 - type: map_at_5 value: 51.783 - type: map_at_10 value: 52.727999999999994 - type: map_at_20 value: 53.095000000000006 - type: map_at_100 value: 53.361999999999995 - type: map_at_1000 value: 53.400000000000006 - type: recall_at_1 value: 43.105 - 
type: recall_at_3 value: 59.628 - type: recall_at_5 value: 65.77199999999999 - type: recall_at_10 value: 72.765 - type: recall_at_20 value: 77.998 - type: recall_at_100 value: 88.31599999999999 - type: recall_at_1000 value: 96.71300000000001 - type: precision_at_1 value: 43.105 - type: precision_at_3 value: 19.875999999999998 - type: precision_at_5 value: 13.154 - type: precision_at_10 value: 7.277 - type: precision_at_20 value: 3.9 - type: precision_at_100 value: 0.8829999999999999 - type: precision_at_1000 value: 0.097 - type: mrr_at_1 value: 43.1051 - type: mrr_at_3 value: 50.3837 - type: mrr_at_5 value: 51.783 - type: mrr_at_10 value: 52.727900000000005 - type: mrr_at_20 value: 53.0949 - type: mrr_at_100 value: 53.3622 - type: mrr_at_1000 value: 53.400000000000006 - type: nauc_ndcg_at_1_max value: 37.3169 - type: nauc_ndcg_at_1_std value: -2.3253 - type: nauc_ndcg_at_1_diff1 value: 60.0465 - type: nauc_ndcg_at_3_max value: 38.2665 - type: nauc_ndcg_at_3_std value: -2.7671 - type: nauc_ndcg_at_3_diff1 value: 54.8964 - type: nauc_ndcg_at_5_max value: 38.4714 - type: nauc_ndcg_at_5_std value: -2.7024 - type: nauc_ndcg_at_5_diff1 value: 54.207899999999995 - type: nauc_ndcg_at_10_max value: 38.4099 - type: nauc_ndcg_at_10_std value: -2.5911 - type: nauc_ndcg_at_10_diff1 value: 53.9601 - type: nauc_ndcg_at_20_max value: 38.406400000000005 - type: nauc_ndcg_at_20_std value: -2.3428 - type: nauc_ndcg_at_20_diff1 value: 54.008 - type: nauc_ndcg_at_100_max value: 38.485 - type: nauc_ndcg_at_100_std value: -2.0368 - type: nauc_ndcg_at_100_diff1 value: 54.238299999999995 - type: nauc_ndcg_at_1000_max value: 38.5112 - type: nauc_ndcg_at_1000_std value: -2.1126 - type: nauc_ndcg_at_1000_diff1 value: 54.6965 - type: nauc_map_at_1_max value: 37.3169 - type: nauc_map_at_1_std value: -2.3253 - type: nauc_map_at_1_diff1 value: 60.0465 - type: nauc_map_at_3_max value: 38.0384 - type: nauc_map_at_3_std value: -2.6754 - type: nauc_map_at_3_diff1 value: 56.137899999999995 - type: nauc_map_at_5_max value: 38.1522 - type: nauc_map_at_5_std value: -2.6406 - type: nauc_map_at_5_diff1 value: 55.80310000000001 - type: nauc_map_at_10_max value: 38.128299999999996 - type: nauc_map_at_10_std value: -2.5891 - type: nauc_map_at_10_diff1 value: 55.7289 - type: nauc_map_at_20_max value: 38.128 - type: nauc_map_at_20_std value: -2.5267 - type: nauc_map_at_20_diff1 value: 55.758700000000005 - type: nauc_map_at_100_max value: 38.1402 - type: nauc_map_at_100_std value: -2.4964 - type: nauc_map_at_100_diff1 value: 55.80159999999999 - type: nauc_map_at_1000_max value: 38.1428 - type: nauc_map_at_1000_std value: -2.4949 - type: nauc_map_at_1000_diff1 value: 55.8162 - type: nauc_recall_at_1_max value: 37.3169 - type: nauc_recall_at_1_std value: -2.3253 - type: nauc_recall_at_1_diff1 value: 60.0465 - type: nauc_recall_at_3_max value: 38.9708 - type: nauc_recall_at_3_std value: -3.0438 - type: nauc_recall_at_3_diff1 value: 51.0597 - type: nauc_recall_at_5_max value: 39.5722 - type: nauc_recall_at_5_std value: -2.8886 - type: nauc_recall_at_5_diff1 value: 48.6862 - type: nauc_recall_at_10_max value: 39.494 - type: nauc_recall_at_10_std value: -2.5299 - type: nauc_recall_at_10_diff1 value: 46.75 - type: nauc_recall_at_20_max value: 39.6388 - type: nauc_recall_at_20_std value: -1.0715999999999999 - type: nauc_recall_at_20_diff1 value: 45.6381 - type: nauc_recall_at_100_max value: 41.4357 - type: nauc_recall_at_100_std value: 4.1693 - type: nauc_recall_at_100_diff1 value: 42.2097 - type: nauc_recall_at_1000_max value: 49.2056 - type: 
nauc_recall_at_1000_std value: 12.2387 - type: nauc_recall_at_1000_diff1 value: 42.7371 - type: nauc_precision_at_1_max value: 37.3169 - type: nauc_precision_at_1_std value: -2.3253 - type: nauc_precision_at_1_diff1 value: 60.0465 - type: nauc_precision_at_3_max value: 38.9708 - type: nauc_precision_at_3_std value: -3.0438 - type: nauc_precision_at_3_diff1 value: 51.0597 - type: nauc_precision_at_5_max value: 39.5722 - type: nauc_precision_at_5_std value: -2.8886 - type: nauc_precision_at_5_diff1 value: 48.6862 - type: nauc_precision_at_10_max value: 39.494 - type: nauc_precision_at_10_std value: -2.5299 - type: nauc_precision_at_10_diff1 value: 46.75 - type: nauc_precision_at_20_max value: 39.6388 - type: nauc_precision_at_20_std value: -1.0715999999999999 - type: nauc_precision_at_20_diff1 value: 45.6381 - type: nauc_precision_at_100_max value: 41.4357 - type: nauc_precision_at_100_std value: 4.1693 - type: nauc_precision_at_100_diff1 value: 42.2097 - type: nauc_precision_at_1000_max value: 49.2056 - type: nauc_precision_at_1000_std value: 12.2387 - type: nauc_precision_at_1000_diff1 value: 42.7371 - type: nauc_mrr_at_1_max value: 37.3169 - type: nauc_mrr_at_1_std value: -2.3253 - type: nauc_mrr_at_1_diff1 value: 60.0465 - type: nauc_mrr_at_3_max value: 38.0384 - type: nauc_mrr_at_3_std value: -2.6754 - type: nauc_mrr_at_3_diff1 value: 56.137899999999995 - type: nauc_mrr_at_5_max value: 38.1522 - type: nauc_mrr_at_5_std value: -2.6406 - type: nauc_mrr_at_5_diff1 value: 55.80310000000001 - type: nauc_mrr_at_10_max value: 38.128299999999996 - type: nauc_mrr_at_10_std value: -2.5891 - type: nauc_mrr_at_10_diff1 value: 55.7289 - type: nauc_mrr_at_20_max value: 38.128 - type: nauc_mrr_at_20_std value: -2.5267 - type: nauc_mrr_at_20_diff1 value: 55.758700000000005 - type: nauc_mrr_at_100_max value: 38.1402 - type: nauc_mrr_at_100_std value: -2.4964 - type: nauc_mrr_at_100_diff1 value: 55.80159999999999 - type: nauc_mrr_at_1000_max value: 38.1428 - type: nauc_mrr_at_1000_std value: -2.4949 - type: nauc_mrr_at_1000_diff1 value: 55.8162 - type: main_score value: 57.557 task: type: Retrieval - dataset: config: ruby name: MTEB COIRCodeSearchNetRetrieval (ruby) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 33.466 - type: ndcg_at_3 value: 41.611 - type: ndcg_at_5 value: 44.41 - type: ndcg_at_10 value: 46.878 - type: ndcg_at_20 value: 48.548 - type: ndcg_at_100 value: 51.004000000000005 - type: ndcg_at_1000 value: 52.564 - type: map_at_1 value: 33.466 - type: map_at_3 value: 39.650999999999996 - type: map_at_5 value: 41.217 - type: map_at_10 value: 42.225 - type: map_at_20 value: 42.687000000000005 - type: map_at_100 value: 43.025000000000006 - type: map_at_1000 value: 43.082 - type: recall_at_1 value: 33.466 - type: recall_at_3 value: 47.264 - type: recall_at_5 value: 54.005 - type: recall_at_10 value: 61.697 - type: recall_at_20 value: 68.279 - type: recall_at_100 value: 81.523 - type: recall_at_1000 value: 93.973 - type: precision_at_1 value: 33.466 - type: precision_at_3 value: 15.754999999999999 - type: precision_at_5 value: 10.801 - type: precision_at_10 value: 6.17 - type: precision_at_20 value: 3.4139999999999997 - type: precision_at_100 value: 0.815 - type: precision_at_1000 value: 0.094 - type: mrr_at_1 value: 33.4655 - type: mrr_at_3 value: 39.6511 - type: mrr_at_5 value: 41.2173 - type: mrr_at_10 value: 42.2253 - type: mrr_at_20 value: 42.686800000000005 - type: mrr_at_100 value: 43.025000000000006 - 
type: mrr_at_1000 value: 43.0818 - type: nauc_ndcg_at_1_max value: 45.789699999999996 - type: nauc_ndcg_at_1_std value: -4.9502999999999995 - type: nauc_ndcg_at_1_diff1 value: 54.9067 - type: nauc_ndcg_at_3_max value: 44.473800000000004 - type: nauc_ndcg_at_3_std value: -2.9877000000000002 - type: nauc_ndcg_at_3_diff1 value: 48.611599999999996 - type: nauc_ndcg_at_5_max value: 44.048300000000005 - type: nauc_ndcg_at_5_std value: -2.4233000000000002 - type: nauc_ndcg_at_5_diff1 value: 46.6638 - type: nauc_ndcg_at_10_max value: 42.9816 - type: nauc_ndcg_at_10_std value: -1.8901000000000001 - type: nauc_ndcg_at_10_diff1 value: 45.9046 - type: nauc_ndcg_at_20_max value: 42.7803 - type: nauc_ndcg_at_20_std value: -1.2547000000000001 - type: nauc_ndcg_at_20_diff1 value: 45.305 - type: nauc_ndcg_at_100_max value: 42.918 - type: nauc_ndcg_at_100_std value: -0.6534 - type: nauc_ndcg_at_100_diff1 value: 45.6519 - type: nauc_ndcg_at_1000_max value: 43.0112 - type: nauc_ndcg_at_1000_std value: -1.1447 - type: nauc_ndcg_at_1000_diff1 value: 46.1206 - type: nauc_map_at_1_max value: 45.789699999999996 - type: nauc_map_at_1_std value: -4.9502999999999995 - type: nauc_map_at_1_diff1 value: 54.9067 - type: nauc_map_at_3_max value: 44.6443 - type: nauc_map_at_3_std value: -3.4606 - type: nauc_map_at_3_diff1 value: 49.9067 - type: nauc_map_at_5_max value: 44.3838 - type: nauc_map_at_5_std value: -3.1638 - type: nauc_map_at_5_diff1 value: 48.829899999999995 - type: nauc_map_at_10_max value: 43.9426 - type: nauc_map_at_10_std value: -2.9687 - type: nauc_map_at_10_diff1 value: 48.497 - type: nauc_map_at_20_max value: 43.8915 - type: nauc_map_at_20_std value: -2.8005 - type: nauc_map_at_20_diff1 value: 48.3597 - type: nauc_map_at_100_max value: 43.8943 - type: nauc_map_at_100_std value: -2.7306 - type: nauc_map_at_100_diff1 value: 48.4227 - type: nauc_map_at_1000_max value: 43.8925 - type: nauc_map_at_1000_std value: -2.7446 - type: nauc_map_at_1000_diff1 value: 48.4369 - type: nauc_recall_at_1_max value: 45.789699999999996 - type: nauc_recall_at_1_std value: -4.9502999999999995 - type: nauc_recall_at_1_diff1 value: 54.9067 - type: nauc_recall_at_3_max value: 44.0419 - type: nauc_recall_at_3_std value: -1.6226 - type: nauc_recall_at_3_diff1 value: 44.9647 - type: nauc_recall_at_5_max value: 43.0769 - type: nauc_recall_at_5_std value: -0.1038 - type: nauc_recall_at_5_diff1 value: 39.9873 - type: nauc_recall_at_10_max value: 39.4409 - type: nauc_recall_at_10_std value: 2.0126999999999997 - type: nauc_recall_at_10_diff1 value: 37.0457 - type: nauc_recall_at_20_max value: 38.0436 - type: nauc_recall_at_20_std value: 5.5206 - type: nauc_recall_at_20_diff1 value: 32.9418 - type: nauc_recall_at_100_max value: 37.4262 - type: nauc_recall_at_100_std value: 14.9231 - type: nauc_recall_at_100_diff1 value: 29.651100000000003 - type: nauc_recall_at_1000_max value: 33.1185 - type: nauc_recall_at_1000_std value: 23.4133 - type: nauc_recall_at_1000_diff1 value: 19.6646 - type: nauc_precision_at_1_max value: 45.789699999999996 - type: nauc_precision_at_1_std value: -4.9502999999999995 - type: nauc_precision_at_1_diff1 value: 54.9067 - type: nauc_precision_at_3_max value: 44.0419 - type: nauc_precision_at_3_std value: -1.6226 - type: nauc_precision_at_3_diff1 value: 44.9647 - type: nauc_precision_at_5_max value: 43.0769 - type: nauc_precision_at_5_std value: -0.1038 - type: nauc_precision_at_5_diff1 value: 39.9873 - type: nauc_precision_at_10_max value: 39.4409 - type: nauc_precision_at_10_std value: 2.0126999999999997 - type: 
nauc_precision_at_10_diff1 value: 37.0457 - type: nauc_precision_at_20_max value: 38.0436 - type: nauc_precision_at_20_std value: 5.5206 - type: nauc_precision_at_20_diff1 value: 32.9418 - type: nauc_precision_at_100_max value: 37.4262 - type: nauc_precision_at_100_std value: 14.9231 - type: nauc_precision_at_100_diff1 value: 29.651100000000003 - type: nauc_precision_at_1000_max value: 33.1185 - type: nauc_precision_at_1000_std value: 23.4133 - type: nauc_precision_at_1000_diff1 value: 19.6646 - type: nauc_mrr_at_1_max value: 45.789699999999996 - type: nauc_mrr_at_1_std value: -4.9502999999999995 - type: nauc_mrr_at_1_diff1 value: 54.9067 - type: nauc_mrr_at_3_max value: 44.6443 - type: nauc_mrr_at_3_std value: -3.4606 - type: nauc_mrr_at_3_diff1 value: 49.9067 - type: nauc_mrr_at_5_max value: 44.3838 - type: nauc_mrr_at_5_std value: -3.1638 - type: nauc_mrr_at_5_diff1 value: 48.829899999999995 - type: nauc_mrr_at_10_max value: 43.9426 - type: nauc_mrr_at_10_std value: -2.9687 - type: nauc_mrr_at_10_diff1 value: 48.497 - type: nauc_mrr_at_20_max value: 43.8915 - type: nauc_mrr_at_20_std value: -2.8005 - type: nauc_mrr_at_20_diff1 value: 48.3597 - type: nauc_mrr_at_100_max value: 43.8943 - type: nauc_mrr_at_100_std value: -2.7306 - type: nauc_mrr_at_100_diff1 value: 48.4227 - type: nauc_mrr_at_1000_max value: 43.8925 - type: nauc_mrr_at_1000_std value: -2.7446 - type: nauc_mrr_at_1000_diff1 value: 48.4369 - type: main_score value: 46.878 task: type: Retrieval - dataset: config: java name: MTEB COIRCodeSearchNetRetrieval (java) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 37.91 - type: ndcg_at_3 value: 46.022999999999996 - type: ndcg_at_5 value: 48.345 - type: ndcg_at_10 value: 50.477000000000004 - type: ndcg_at_20 value: 51.900999999999996 - type: ndcg_at_100 value: 54.01899999999999 - type: ndcg_at_1000 value: 55.383 - type: map_at_1 value: 37.91 - type: map_at_3 value: 44.051 - type: map_at_5 value: 45.341 - type: map_at_10 value: 46.221000000000004 - type: map_at_20 value: 46.613 - type: map_at_100 value: 46.902 - type: map_at_1000 value: 46.949999999999996 - type: recall_at_1 value: 37.91 - type: recall_at_3 value: 51.721 - type: recall_at_5 value: 57.353 - type: recall_at_10 value: 63.943000000000005 - type: recall_at_20 value: 69.56599999999999 - type: recall_at_100 value: 81.041 - type: recall_at_1000 value: 91.995 - type: precision_at_1 value: 37.91 - type: precision_at_3 value: 17.24 - type: precision_at_5 value: 11.471 - type: precision_at_10 value: 6.394 - type: precision_at_20 value: 3.4779999999999998 - type: precision_at_100 value: 0.8099999999999999 - type: precision_at_1000 value: 0.092 - type: mrr_at_1 value: 37.9096 - type: mrr_at_3 value: 44.0514 - type: mrr_at_5 value: 45.340799999999994 - type: mrr_at_10 value: 46.221000000000004 - type: mrr_at_20 value: 46.613 - type: mrr_at_100 value: 46.9024 - type: mrr_at_1000 value: 46.9499 - type: nauc_ndcg_at_1_max value: 32.0711 - type: nauc_ndcg_at_1_std value: -6.4620999999999995 - type: nauc_ndcg_at_1_diff1 value: 57.851200000000006 - type: nauc_ndcg_at_3_max value: 33.6415 - type: nauc_ndcg_at_3_std value: -5.2595 - type: nauc_ndcg_at_3_diff1 value: 53.340900000000005 - type: nauc_ndcg_at_5_max value: 33.6962 - type: nauc_ndcg_at_5_std value: -4.3041 - type: nauc_ndcg_at_5_diff1 value: 52.137299999999996 - type: nauc_ndcg_at_10_max value: 33.8843 - type: nauc_ndcg_at_10_std value: -3.2363000000000004 - type: nauc_ndcg_at_10_diff1 value: 
51.5065 - type: nauc_ndcg_at_20_max value: 33.8675 - type: nauc_ndcg_at_20_std value: -2.4443 - type: nauc_ndcg_at_20_diff1 value: 51.31790000000001 - type: nauc_ndcg_at_100_max value: 34.2671 - type: nauc_ndcg_at_100_std value: -1.706 - type: nauc_ndcg_at_100_diff1 value: 51.3801 - type: nauc_ndcg_at_1000_max value: 34.237 - type: nauc_ndcg_at_1000_std value: -2.0292999999999997 - type: nauc_ndcg_at_1000_diff1 value: 51.8196 - type: nauc_map_at_1_max value: 32.0711 - type: nauc_map_at_1_std value: -6.4620999999999995 - type: nauc_map_at_1_diff1 value: 57.851200000000006 - type: nauc_map_at_3_max value: 33.271699999999996 - type: nauc_map_at_3_std value: -5.578799999999999 - type: nauc_map_at_3_diff1 value: 54.427800000000005 - type: nauc_map_at_5_max value: 33.2962 - type: nauc_map_at_5_std value: -5.063 - type: nauc_map_at_5_diff1 value: 53.784 - type: nauc_map_at_10_max value: 33.3553 - type: nauc_map_at_10_std value: -4.6524 - type: nauc_map_at_10_diff1 value: 53.5366 - type: nauc_map_at_20_max value: 33.3544 - type: nauc_map_at_20_std value: -4.4497 - type: nauc_map_at_20_diff1 value: 53.4978 - type: nauc_map_at_100_max value: 33.4027 - type: nauc_map_at_100_std value: -4.3659 - type: nauc_map_at_100_diff1 value: 53.514300000000006 - type: nauc_map_at_1000_max value: 33.4037 - type: nauc_map_at_1000_std value: -4.3740000000000006 - type: nauc_map_at_1000_diff1 value: 53.5313 - type: nauc_recall_at_1_max value: 32.0711 - type: nauc_recall_at_1_std value: -6.4620999999999995 - type: nauc_recall_at_1_diff1 value: 57.851200000000006 - type: nauc_recall_at_3_max value: 34.7301 - type: nauc_recall_at_3_std value: -4.3033 - type: nauc_recall_at_3_diff1 value: 50.129999999999995 - type: nauc_recall_at_5_max value: 34.940599999999996 - type: nauc_recall_at_5_std value: -1.7868 - type: nauc_recall_at_5_diff1 value: 46.848 - type: nauc_recall_at_10_max value: 35.8024 - type: nauc_recall_at_10_std value: 2.271 - type: nauc_recall_at_10_diff1 value: 44.1597 - type: nauc_recall_at_20_max value: 35.881800000000005 - type: nauc_recall_at_20_std value: 6.7608 - type: nauc_recall_at_20_diff1 value: 42.3843 - type: nauc_recall_at_100_max value: 40.5398 - type: nauc_recall_at_100_std value: 17.9288 - type: nauc_recall_at_100_diff1 value: 38.9048 - type: nauc_recall_at_1000_max value: 46.6349 - type: nauc_recall_at_1000_std value: 31.1156 - type: nauc_recall_at_1000_diff1 value: 36.5951 - type: nauc_precision_at_1_max value: 32.0711 - type: nauc_precision_at_1_std value: -6.4620999999999995 - type: nauc_precision_at_1_diff1 value: 57.851200000000006 - type: nauc_precision_at_3_max value: 34.7301 - type: nauc_precision_at_3_std value: -4.3033 - type: nauc_precision_at_3_diff1 value: 50.129999999999995 - type: nauc_precision_at_5_max value: 34.940599999999996 - type: nauc_precision_at_5_std value: -1.7868 - type: nauc_precision_at_5_diff1 value: 46.848 - type: nauc_precision_at_10_max value: 35.8024 - type: nauc_precision_at_10_std value: 2.271 - type: nauc_precision_at_10_diff1 value: 44.1597 - type: nauc_precision_at_20_max value: 35.881800000000005 - type: nauc_precision_at_20_std value: 6.7608 - type: nauc_precision_at_20_diff1 value: 42.3843 - type: nauc_precision_at_100_max value: 40.5398 - type: nauc_precision_at_100_std value: 17.9288 - type: nauc_precision_at_100_diff1 value: 38.9048 - type: nauc_precision_at_1000_max value: 46.6349 - type: nauc_precision_at_1000_std value: 31.1156 - type: nauc_precision_at_1000_diff1 value: 36.5951 - type: nauc_mrr_at_1_max value: 32.0711 - type: nauc_mrr_at_1_std 
value: -6.4620999999999995 - type: nauc_mrr_at_1_diff1 value: 57.851200000000006 - type: nauc_mrr_at_3_max value: 33.271699999999996 - type: nauc_mrr_at_3_std value: -5.578799999999999 - type: nauc_mrr_at_3_diff1 value: 54.427800000000005 - type: nauc_mrr_at_5_max value: 33.2962 - type: nauc_mrr_at_5_std value: -5.063 - type: nauc_mrr_at_5_diff1 value: 53.784 - type: nauc_mrr_at_10_max value: 33.3553 - type: nauc_mrr_at_10_std value: -4.6524 - type: nauc_mrr_at_10_diff1 value: 53.5366 - type: nauc_mrr_at_20_max value: 33.3544 - type: nauc_mrr_at_20_std value: -4.4497 - type: nauc_mrr_at_20_diff1 value: 53.4978 - type: nauc_mrr_at_100_max value: 33.4027 - type: nauc_mrr_at_100_std value: -4.3659 - type: nauc_mrr_at_100_diff1 value: 53.514300000000006 - type: nauc_mrr_at_1000_max value: 33.4037 - type: nauc_mrr_at_1000_std value: -4.3740000000000006 - type: nauc_mrr_at_1000_diff1 value: 53.5313 - type: main_score value: 50.477000000000004 task: type: Retrieval - dataset: config: php name: MTEB COIRCodeSearchNetRetrieval (php) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 32.253 - type: ndcg_at_3 value: 40.355999999999995 - type: ndcg_at_5 value: 42.85 - type: ndcg_at_10 value: 45.217 - type: ndcg_at_20 value: 47.13 - type: ndcg_at_100 value: 49.683 - type: ndcg_at_1000 value: 51.248000000000005 - type: map_at_1 value: 32.253 - type: map_at_3 value: 38.374 - type: map_at_5 value: 39.757999999999996 - type: map_at_10 value: 40.731 - type: map_at_20 value: 41.254999999999995 - type: map_at_100 value: 41.6 - type: map_at_1000 value: 41.654 - type: recall_at_1 value: 32.253 - type: recall_at_3 value: 46.089999999999996 - type: recall_at_5 value: 52.141000000000005 - type: recall_at_10 value: 59.483 - type: recall_at_20 value: 67.054 - type: recall_at_100 value: 80.93299999999999 - type: recall_at_1000 value: 93.499 - type: precision_at_1 value: 32.253 - type: precision_at_3 value: 15.363 - type: precision_at_5 value: 10.427999999999999 - type: precision_at_10 value: 5.9479999999999995 - type: precision_at_20 value: 3.3529999999999998 - type: precision_at_100 value: 0.8089999999999999 - type: precision_at_1000 value: 0.093 - type: mrr_at_1 value: 32.2535 - type: mrr_at_3 value: 38.3735 - type: mrr_at_5 value: 39.7582 - type: mrr_at_10 value: 40.7309 - type: mrr_at_20 value: 41.254999999999995 - type: mrr_at_100 value: 41.6001 - type: mrr_at_1000 value: 41.6545 - type: nauc_ndcg_at_1_max value: 29.5043 - type: nauc_ndcg_at_1_std value: -3.8282999999999996 - type: nauc_ndcg_at_1_diff1 value: 55.538399999999996 - type: nauc_ndcg_at_3_max value: 30.1745 - type: nauc_ndcg_at_3_std value: -2.6322 - type: nauc_ndcg_at_3_diff1 value: 49.4579 - type: nauc_ndcg_at_5_max value: 29.990699999999997 - type: nauc_ndcg_at_5_std value: -2.2249000000000003 - type: nauc_ndcg_at_5_diff1 value: 48.5017 - type: nauc_ndcg_at_10_max value: 29.8609 - type: nauc_ndcg_at_10_std value: -1.6362999999999999 - type: nauc_ndcg_at_10_diff1 value: 47.7191 - type: nauc_ndcg_at_20_max value: 30.1378 - type: nauc_ndcg_at_20_std value: -0.6985 - type: nauc_ndcg_at_20_diff1 value: 47.5359 - type: nauc_ndcg_at_100_max value: 30.5901 - type: nauc_ndcg_at_100_std value: 0.1903 - type: nauc_ndcg_at_100_diff1 value: 47.765299999999996 - type: nauc_ndcg_at_1000_max value: 30.607200000000002 - type: nauc_ndcg_at_1000_std value: -0.1485 - type: nauc_ndcg_at_1000_diff1 value: 48.3165 - type: nauc_map_at_1_max value: 29.5043 - type: nauc_map_at_1_std value: 
-3.8282999999999996 - type: nauc_map_at_1_diff1 value: 55.538399999999996 - type: nauc_map_at_3_max value: 30.0348 - type: nauc_map_at_3_std value: -2.9402 - type: nauc_map_at_3_diff1 value: 50.8128 - type: nauc_map_at_5_max value: 29.9447 - type: nauc_map_at_5_std value: -2.7157 - type: nauc_map_at_5_diff1 value: 50.2953 - type: nauc_map_at_10_max value: 29.8929 - type: nauc_map_at_10_std value: -2.4865000000000004 - type: nauc_map_at_10_diff1 value: 49.9942 - type: nauc_map_at_20_max value: 29.9564 - type: nauc_map_at_20_std value: -2.2576 - type: nauc_map_at_20_diff1 value: 49.961800000000004 - type: nauc_map_at_100_max value: 30.0155 - type: nauc_map_at_100_std value: -2.1527000000000003 - type: nauc_map_at_100_diff1 value: 50.00320000000001 - type: nauc_map_at_1000_max value: 30.0156 - type: nauc_map_at_1000_std value: -2.1597999999999997 - type: nauc_map_at_1000_diff1 value: 50.019000000000005 - type: nauc_recall_at_1_max value: 29.5043 - type: nauc_recall_at_1_std value: -3.8282999999999996 - type: nauc_recall_at_1_diff1 value: 55.538399999999996 - type: nauc_recall_at_3_max value: 30.567 - type: nauc_recall_at_3_std value: -1.7389999999999999 - type: nauc_recall_at_3_diff1 value: 45.6079 - type: nauc_recall_at_5_max value: 30.074499999999997 - type: nauc_recall_at_5_std value: -0.7081 - type: nauc_recall_at_5_diff1 value: 43.1053 - type: nauc_recall_at_10_max value: 29.644 - type: nauc_recall_at_10_std value: 1.4013 - type: nauc_recall_at_10_diff1 value: 40.0676 - type: nauc_recall_at_20_max value: 31.0116 - type: nauc_recall_at_20_std value: 6.3982 - type: nauc_recall_at_20_diff1 value: 38.085 - type: nauc_recall_at_100_max value: 35.6387 - type: nauc_recall_at_100_std value: 18.4894 - type: nauc_recall_at_100_diff1 value: 35.2692 - type: nauc_recall_at_1000_max value: 44.9874 - type: nauc_recall_at_1000_std value: 36.0452 - type: nauc_recall_at_1000_diff1 value: 34.8612 - type: nauc_precision_at_1_max value: 29.5043 - type: nauc_precision_at_1_std value: -3.8282999999999996 - type: nauc_precision_at_1_diff1 value: 55.538399999999996 - type: nauc_precision_at_3_max value: 30.567 - type: nauc_precision_at_3_std value: -1.7389999999999999 - type: nauc_precision_at_3_diff1 value: 45.6079 - type: nauc_precision_at_5_max value: 30.074499999999997 - type: nauc_precision_at_5_std value: -0.7081 - type: nauc_precision_at_5_diff1 value: 43.1053 - type: nauc_precision_at_10_max value: 29.644 - type: nauc_precision_at_10_std value: 1.4013 - type: nauc_precision_at_10_diff1 value: 40.0676 - type: nauc_precision_at_20_max value: 31.0116 - type: nauc_precision_at_20_std value: 6.3982 - type: nauc_precision_at_20_diff1 value: 38.085 - type: nauc_precision_at_100_max value: 35.6387 - type: nauc_precision_at_100_std value: 18.4894 - type: nauc_precision_at_100_diff1 value: 35.2692 - type: nauc_precision_at_1000_max value: 44.9874 - type: nauc_precision_at_1000_std value: 36.0452 - type: nauc_precision_at_1000_diff1 value: 34.8612 - type: nauc_mrr_at_1_max value: 29.5043 - type: nauc_mrr_at_1_std value: -3.8282999999999996 - type: nauc_mrr_at_1_diff1 value: 55.538399999999996 - type: nauc_mrr_at_3_max value: 30.0348 - type: nauc_mrr_at_3_std value: -2.9402 - type: nauc_mrr_at_3_diff1 value: 50.8128 - type: nauc_mrr_at_5_max value: 29.9447 - type: nauc_mrr_at_5_std value: -2.7157 - type: nauc_mrr_at_5_diff1 value: 50.2953 - type: nauc_mrr_at_10_max value: 29.8929 - type: nauc_mrr_at_10_std value: -2.4865000000000004 - type: nauc_mrr_at_10_diff1 value: 49.9942 - type: nauc_mrr_at_20_max value: 
29.9564 - type: nauc_mrr_at_20_std value: -2.2576 - type: nauc_mrr_at_20_diff1 value: 49.961800000000004 - type: nauc_mrr_at_100_max value: 30.0155 - type: nauc_mrr_at_100_std value: -2.1527000000000003 - type: nauc_mrr_at_100_diff1 value: 50.00320000000001 - type: nauc_mrr_at_1000_max value: 30.0156 - type: nauc_mrr_at_1000_std value: -2.1597999999999997 - type: nauc_mrr_at_1000_diff1 value: 50.019000000000005 - type: main_score value: 45.217 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackAndroidRetrieval (default) revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: mteb/cqadupstack-android metrics: - type: ndcg_at_1 value: 45.923 - type: ndcg_at_3 value: 51.842999999999996 - type: ndcg_at_5 value: 54.257 - type: ndcg_at_10 value: 57.667 - type: ndcg_at_20 value: 59.516000000000005 - type: ndcg_at_100 value: 62.373 - type: ndcg_at_1000 value: 63.68000000000001 - type: map_at_1 value: 36.964000000000006 - type: map_at_3 value: 46.001 - type: map_at_5 value: 48.312 - type: map_at_10 value: 50.43 - type: map_at_20 value: 51.371 - type: map_at_100 value: 52.066 - type: map_at_1000 value: 52.175000000000004 - type: recall_at_1 value: 36.964000000000006 - type: recall_at_3 value: 53.654999999999994 - type: recall_at_5 value: 60.995999999999995 - type: recall_at_10 value: 71.234 - type: recall_at_20 value: 77.596 - type: recall_at_100 value: 90.42099999999999 - type: recall_at_1000 value: 98.29599999999999 - type: precision_at_1 value: 45.923 - type: precision_at_3 value: 25.369999999999997 - type: precision_at_5 value: 18.14 - type: precision_at_10 value: 11.315999999999999 - type: precision_at_20 value: 6.651999999999999 - type: precision_at_100 value: 1.7049999999999998 - type: precision_at_1000 value: 0.216 - type: mrr_at_1 value: 45.9227 - type: mrr_at_3 value: 54.053399999999996 - type: mrr_at_5 value: 55.555600000000005 - type: mrr_at_10 value: 56.7326 - type: mrr_at_20 value: 57.0026 - type: mrr_at_100 value: 57.2924 - type: mrr_at_1000 value: 57.321299999999994 - type: nauc_ndcg_at_1_max value: 40.8301 - type: nauc_ndcg_at_1_std value: -4.7965 - type: nauc_ndcg_at_1_diff1 value: 47.0363 - type: nauc_ndcg_at_3_max value: 38.1658 - type: nauc_ndcg_at_3_std value: -5.5431 - type: nauc_ndcg_at_3_diff1 value: 43.236200000000004 - type: nauc_ndcg_at_5_max value: 38.3776 - type: nauc_ndcg_at_5_std value: -6.4315 - type: nauc_ndcg_at_5_diff1 value: 41.906 - type: nauc_ndcg_at_10_max value: 38.246900000000004 - type: nauc_ndcg_at_10_std value: -5.9109 - type: nauc_ndcg_at_10_diff1 value: 42.2073 - type: nauc_ndcg_at_20_max value: 39.1442 - type: nauc_ndcg_at_20_std value: -4.2145 - type: nauc_ndcg_at_20_diff1 value: 42.1173 - type: nauc_ndcg_at_100_max value: 40.2409 - type: nauc_ndcg_at_100_std value: -2.3533999999999997 - type: nauc_ndcg_at_100_diff1 value: 43.08 - type: nauc_ndcg_at_1000_max value: 39.7135 - type: nauc_ndcg_at_1000_std value: -3.2211999999999996 - type: nauc_ndcg_at_1000_diff1 value: 42.9532 - type: nauc_map_at_1_max value: 34.8396 - type: nauc_map_at_1_std value: -7.427200000000001 - type: nauc_map_at_1_diff1 value: 52.3057 - type: nauc_map_at_3_max value: 36.869 - type: nauc_map_at_3_std value: -7.482800000000001 - type: nauc_map_at_3_diff1 value: 46.7357 - type: nauc_map_at_5_max value: 37.7915 - type: nauc_map_at_5_std value: -7.4328 - type: nauc_map_at_5_diff1 value: 45.5111 - type: nauc_map_at_10_max value: 38.1613 - type: nauc_map_at_10_std value: -6.8068 - type: nauc_map_at_10_diff1 value: 45.359899999999996 - type: 
nauc_map_at_20_max value: 38.5576 - type: nauc_map_at_20_std value: -6.051200000000001 - type: nauc_map_at_20_diff1 value: 45.1212 - type: nauc_map_at_100_max value: 38.8156 - type: nauc_map_at_100_std value: -5.5418 - type: nauc_map_at_100_diff1 value: 45.1108 - type: nauc_map_at_1000_max value: 38.746199999999995 - type: nauc_map_at_1000_std value: -5.6205 - type: nauc_map_at_1000_diff1 value: 45.053399999999996 - type: nauc_recall_at_1_max value: 34.8396 - type: nauc_recall_at_1_std value: -7.427200000000001 - type: nauc_recall_at_1_diff1 value: 52.3057 - type: nauc_recall_at_3_max value: 34.3365 - type: nauc_recall_at_3_std value: -6.8784 - type: nauc_recall_at_3_diff1 value: 40.2233 - type: nauc_recall_at_5_max value: 34.4245 - type: nauc_recall_at_5_std value: -8.426300000000001 - type: nauc_recall_at_5_diff1 value: 35.4121 - type: nauc_recall_at_10_max value: 32.2333 - type: nauc_recall_at_10_std value: -5.8829 - type: nauc_recall_at_10_diff1 value: 34.0262 - type: nauc_recall_at_20_max value: 36.256 - type: nauc_recall_at_20_std value: 1.9085999999999999 - type: nauc_recall_at_20_diff1 value: 32.2877 - type: nauc_recall_at_100_max value: 47.3573 - type: nauc_recall_at_100_std value: 24.4303 - type: nauc_recall_at_100_diff1 value: 38.3181 - type: nauc_recall_at_1000_max value: 63.5826 - type: nauc_recall_at_1000_std value: 71.3349 - type: nauc_recall_at_1000_diff1 value: 40.771 - type: nauc_precision_at_1_max value: 40.8301 - type: nauc_precision_at_1_std value: -4.7965 - type: nauc_precision_at_1_diff1 value: 47.0363 - type: nauc_precision_at_3_max value: 30.7605 - type: nauc_precision_at_3_std value: -0.4 - type: nauc_precision_at_3_diff1 value: 17.099800000000002 - type: nauc_precision_at_5_max value: 26.3274 - type: nauc_precision_at_5_std value: 3.1927 - type: nauc_precision_at_5_diff1 value: 5.6719 - type: nauc_precision_at_10_max value: 16.8618 - type: nauc_precision_at_10_std value: 7.0584 - type: nauc_precision_at_10_diff1 value: -4.7258000000000004 - type: nauc_precision_at_20_max value: 10.8993 - type: nauc_precision_at_20_std value: 10.215499999999999 - type: nauc_precision_at_20_diff1 value: -10.8149 - type: nauc_precision_at_100_max value: -0.0973 - type: nauc_precision_at_100_std value: 9.3108 - type: nauc_precision_at_100_diff1 value: -19.0862 - type: nauc_precision_at_1000_max value: -16.488 - type: nauc_precision_at_1000_std value: -6.325 - type: nauc_precision_at_1000_diff1 value: -28.7621 - type: nauc_mrr_at_1_max value: 40.8301 - type: nauc_mrr_at_1_std value: -4.7965 - type: nauc_mrr_at_1_diff1 value: 47.0363 - type: nauc_mrr_at_3_max value: 40.3492 - type: nauc_mrr_at_3_std value: -4.0226 - type: nauc_mrr_at_3_diff1 value: 43.358799999999995 - type: nauc_mrr_at_5_max value: 40.4342 - type: nauc_mrr_at_5_std value: -4.5294 - type: nauc_mrr_at_5_diff1 value: 42.6362 - type: nauc_mrr_at_10_max value: 40.2882 - type: nauc_mrr_at_10_std value: -4.1685 - type: nauc_mrr_at_10_diff1 value: 42.5151 - type: nauc_mrr_at_20_max value: 40.3939 - type: nauc_mrr_at_20_std value: -4.1178 - type: nauc_mrr_at_20_diff1 value: 42.586400000000005 - type: nauc_mrr_at_100_max value: 40.5002 - type: nauc_mrr_at_100_std value: -4.0205 - type: nauc_mrr_at_100_diff1 value: 42.7299 - type: nauc_mrr_at_1000_max value: 40.5002 - type: nauc_mrr_at_1000_std value: -4.0168 - type: nauc_mrr_at_1000_diff1 value: 42.7356 - type: main_score value: 57.667 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval (default) revision: 
ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: mteb/cqadupstack-english metrics: - type: ndcg_at_1 value: 45.478 - type: ndcg_at_3 value: 51.124 - type: ndcg_at_5 value: 53.166000000000004 - type: ndcg_at_10 value: 55.505 - type: ndcg_at_20 value: 57.154 - type: ndcg_at_100 value: 59.606 - type: ndcg_at_1000 value: 61.255 - type: map_at_1 value: 36.198 - type: map_at_3 value: 45.678000000000004 - type: map_at_5 value: 47.605 - type: map_at_10 value: 49.199 - type: map_at_20 value: 49.957 - type: map_at_100 value: 50.602000000000004 - type: map_at_1000 value: 50.736000000000004 - type: recall_at_1 value: 36.198 - type: recall_at_3 value: 53.20700000000001 - type: recall_at_5 value: 59.169000000000004 - type: recall_at_10 value: 66.465 - type: recall_at_20 value: 72.60799999999999 - type: recall_at_100 value: 83.63199999999999 - type: recall_at_1000 value: 93.27600000000001 - type: precision_at_1 value: 45.478 - type: precision_at_3 value: 25.052999999999997 - type: precision_at_5 value: 17.694 - type: precision_at_10 value: 10.752 - type: precision_at_20 value: 6.239 - type: precision_at_100 value: 1.6660000000000001 - type: precision_at_1000 value: 0.211 - type: mrr_at_1 value: 45.4777 - type: mrr_at_3 value: 52.887499999999996 - type: mrr_at_5 value: 54.282399999999996 - type: mrr_at_10 value: 55.0745 - type: mrr_at_20 value: 55.43090000000001 - type: mrr_at_100 value: 55.656000000000006 - type: mrr_at_1000 value: 55.688 - type: nauc_ndcg_at_1_max value: 46.8217 - type: nauc_ndcg_at_1_std value: -2.7794 - type: nauc_ndcg_at_1_diff1 value: 57.0574 - type: nauc_ndcg_at_3_max value: 47.7532 - type: nauc_ndcg_at_3_std value: -1.4668 - type: nauc_ndcg_at_3_diff1 value: 52.8335 - type: nauc_ndcg_at_5_max value: 48.7828 - type: nauc_ndcg_at_5_std value: -1.015 - type: nauc_ndcg_at_5_diff1 value: 51.991699999999994 - type: nauc_ndcg_at_10_max value: 50.114999999999995 - type: nauc_ndcg_at_10_std value: 1.1684 - type: nauc_ndcg_at_10_diff1 value: 51.9116 - type: nauc_ndcg_at_20_max value: 50.006099999999996 - type: nauc_ndcg_at_20_std value: 2.0345 - type: nauc_ndcg_at_20_diff1 value: 51.63870000000001 - type: nauc_ndcg_at_100_max value: 50.478 - type: nauc_ndcg_at_100_std value: 3.8077 - type: nauc_ndcg_at_100_diff1 value: 51.3939 - type: nauc_ndcg_at_1000_max value: 50.0328 - type: nauc_ndcg_at_1000_std value: 3.2628 - type: nauc_ndcg_at_1000_diff1 value: 51.5116 - type: nauc_map_at_1_max value: 35.4528 - type: nauc_map_at_1_std value: -12.8546 - type: nauc_map_at_1_diff1 value: 59.2294 - type: nauc_map_at_3_max value: 42.8209 - type: nauc_map_at_3_std value: -8.1284 - type: nauc_map_at_3_diff1 value: 55.5925 - type: nauc_map_at_5_max value: 44.7278 - type: nauc_map_at_5_std value: -6.311400000000001 - type: nauc_map_at_5_diff1 value: 54.6249 - type: nauc_map_at_10_max value: 46.3085 - type: nauc_map_at_10_std value: -4.2609 - type: nauc_map_at_10_diff1 value: 54.4523 - type: nauc_map_at_20_max value: 46.8259 - type: nauc_map_at_20_std value: -3.3686000000000003 - type: nauc_map_at_20_diff1 value: 54.225100000000005 - type: nauc_map_at_100_max value: 47.4262 - type: nauc_map_at_100_std value: -2.3889 - type: nauc_map_at_100_diff1 value: 54.01669999999999 - type: nauc_map_at_1000_max value: 47.453 - type: nauc_map_at_1000_std value: -2.3062 - type: nauc_map_at_1000_diff1 value: 53.9968 - type: nauc_recall_at_1_max value: 35.4528 - type: nauc_recall_at_1_std value: -12.8546 - type: nauc_recall_at_1_diff1 value: 59.2294 - type: nauc_recall_at_3_max value: 42.7793 - type: 
nauc_recall_at_3_std value: -4.7798 - type: nauc_recall_at_3_diff1 value: 49.741 - type: nauc_recall_at_5_max value: 45.6544 - type: nauc_recall_at_5_std value: -1.6133000000000002 - type: nauc_recall_at_5_diff1 value: 45.7699 - type: nauc_recall_at_10_max value: 50.769 - type: nauc_recall_at_10_std value: 7.4262 - type: nauc_recall_at_10_diff1 value: 43.3808 - type: nauc_recall_at_20_max value: 51.0312 - type: nauc_recall_at_20_std value: 12.7246 - type: nauc_recall_at_20_diff1 value: 40.5477 - type: nauc_recall_at_100_max value: 56.3878 - type: nauc_recall_at_100_std value: 31.893300000000004 - type: nauc_recall_at_100_diff1 value: 34.902699999999996 - type: nauc_recall_at_1000_max value: 55.4185 - type: nauc_recall_at_1000_std value: 48.0244 - type: nauc_recall_at_1000_diff1 value: 27.980300000000003 - type: nauc_precision_at_1_max value: 46.8217 - type: nauc_precision_at_1_std value: -2.7794 - type: nauc_precision_at_1_diff1 value: 57.0574 - type: nauc_precision_at_3_max value: 45.9159 - type: nauc_precision_at_3_std value: 14.8948 - type: nauc_precision_at_3_diff1 value: 25.3519 - type: nauc_precision_at_5_max value: 44.908500000000004 - type: nauc_precision_at_5_std value: 22.3321 - type: nauc_precision_at_5_diff1 value: 14.696600000000002 - type: nauc_precision_at_10_max value: 40.1 - type: nauc_precision_at_10_std value: 29.6731 - type: nauc_precision_at_10_diff1 value: 4.2817 - type: nauc_precision_at_20_max value: 35.2526 - type: nauc_precision_at_20_std value: 34.4698 - type: nauc_precision_at_20_diff1 value: -3.8809000000000005 - type: nauc_precision_at_100_max value: 25.186500000000002 - type: nauc_precision_at_100_std value: 38.684400000000004 - type: nauc_precision_at_100_diff1 value: -15.160599999999999 - type: nauc_precision_at_1000_max value: 11.5275 - type: nauc_precision_at_1000_std value: 29.2055 - type: nauc_precision_at_1000_diff1 value: -19.7629 - type: nauc_mrr_at_1_max value: 46.8217 - type: nauc_mrr_at_1_std value: -2.7794 - type: nauc_mrr_at_1_diff1 value: 57.0574 - type: nauc_mrr_at_3_max value: 49.7145 - type: nauc_mrr_at_3_std value: 0.7482 - type: nauc_mrr_at_3_diff1 value: 54.0562 - type: nauc_mrr_at_5_max value: 50.0393 - type: nauc_mrr_at_5_std value: 0.9629000000000001 - type: nauc_mrr_at_5_diff1 value: 53.41780000000001 - type: nauc_mrr_at_10_max value: 50.325900000000004 - type: nauc_mrr_at_10_std value: 1.6938000000000002 - type: nauc_mrr_at_10_diff1 value: 53.0736 - type: nauc_mrr_at_20_max value: 50.1989 - type: nauc_mrr_at_20_std value: 1.7967 - type: nauc_mrr_at_20_diff1 value: 52.9982 - type: nauc_mrr_at_100_max value: 50.184799999999996 - type: nauc_mrr_at_100_std value: 1.8381999999999998 - type: nauc_mrr_at_100_diff1 value: 53.034099999999995 - type: nauc_mrr_at_1000_max value: 50.1706 - type: nauc_mrr_at_1000_std value: 1.8124999999999998 - type: nauc_mrr_at_1000_diff1 value: 53.0505 - type: main_score value: 55.505 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval (default) revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: mteb/cqadupstack-gaming metrics: - type: ndcg_at_1 value: 50.09400000000001 - type: ndcg_at_3 value: 58.022 - type: ndcg_at_5 value: 60.97 - type: ndcg_at_10 value: 63.641000000000005 - type: ndcg_at_20 value: 65.273 - type: ndcg_at_100 value: 67.05499999999999 - type: ndcg_at_1000 value: 67.855 - type: map_at_1 value: 44.157000000000004 - type: map_at_3 value: 54.223 - type: map_at_5 value: 56.306999999999995 - type: map_at_10 value: 57.753 - type: map_at_20 value: 
58.36900000000001 - type: map_at_100 value: 58.69799999999999 - type: map_at_1000 value: 58.74 - type: recall_at_1 value: 44.157000000000004 - type: recall_at_3 value: 63.087 - type: recall_at_5 value: 70.172 - type: recall_at_10 value: 77.78 - type: recall_at_20 value: 83.699 - type: recall_at_100 value: 92.244 - type: recall_at_1000 value: 97.81 - type: precision_at_1 value: 50.09400000000001 - type: precision_at_3 value: 25.81 - type: precision_at_5 value: 17.755000000000003 - type: precision_at_10 value: 10.181999999999999 - type: precision_at_20 value: 5.627 - type: precision_at_100 value: 1.278 - type: precision_at_1000 value: 0.13799999999999998 - type: mrr_at_1 value: 50.09400000000001 - type: mrr_at_3 value: 58.2654 - type: mrr_at_5 value: 59.8171 - type: mrr_at_10 value: 60.6998 - type: mrr_at_20 value: 61.077000000000005 - type: mrr_at_100 value: 61.2602 - type: mrr_at_1000 value: 61.2803 - type: nauc_ndcg_at_1_max value: 42.0223 - type: nauc_ndcg_at_1_std value: -7.5249999999999995 - type: nauc_ndcg_at_1_diff1 value: 57.545 - type: nauc_ndcg_at_3_max value: 41.4981 - type: nauc_ndcg_at_3_std value: -7.3598 - type: nauc_ndcg_at_3_diff1 value: 53.404399999999995 - type: nauc_ndcg_at_5_max value: 43.1299 - type: nauc_ndcg_at_5_std value: -5.4483999999999995 - type: nauc_ndcg_at_5_diff1 value: 52.86149999999999 - type: nauc_ndcg_at_10_max value: 44.460899999999995 - type: nauc_ndcg_at_10_std value: -3.5878 - type: nauc_ndcg_at_10_diff1 value: 53.24529999999999 - type: nauc_ndcg_at_20_max value: 45.057199999999995 - type: nauc_ndcg_at_20_std value: -2.5892999999999997 - type: nauc_ndcg_at_20_diff1 value: 53.14919999999999 - type: nauc_ndcg_at_100_max value: 45.202 - type: nauc_ndcg_at_100_std value: -1.6291 - type: nauc_ndcg_at_100_diff1 value: 53.226099999999995 - type: nauc_ndcg_at_1000_max value: 44.9773 - type: nauc_ndcg_at_1000_std value: -2.2944 - type: nauc_ndcg_at_1000_diff1 value: 53.5531 - type: nauc_map_at_1_max value: 34.3597 - type: nauc_map_at_1_std value: -8.7494 - type: nauc_map_at_1_diff1 value: 57.288399999999996 - type: nauc_map_at_3_max value: 39.723000000000006 - type: nauc_map_at_3_std value: -8.9697 - type: nauc_map_at_3_diff1 value: 55.0296 - type: nauc_map_at_5_max value: 41.2509 - type: nauc_map_at_5_std value: -7.561 - type: nauc_map_at_5_diff1 value: 54.641799999999996 - type: nauc_map_at_10_max value: 42.2464 - type: nauc_map_at_10_std value: -6.442699999999999 - type: nauc_map_at_10_diff1 value: 54.6922 - type: nauc_map_at_20_max value: 42.6447 - type: nauc_map_at_20_std value: -5.8575 - type: nauc_map_at_20_diff1 value: 54.607099999999996 - type: nauc_map_at_100_max value: 42.801899999999996 - type: nauc_map_at_100_std value: -5.5908 - type: nauc_map_at_100_diff1 value: 54.64 - type: nauc_map_at_1000_max value: 42.8163 - type: nauc_map_at_1000_std value: -5.5892 - type: nauc_map_at_1000_diff1 value: 54.657999999999994 - type: nauc_recall_at_1_max value: 34.3597 - type: nauc_recall_at_1_std value: -8.7494 - type: nauc_recall_at_1_diff1 value: 57.288399999999996 - type: nauc_recall_at_3_max value: 38.2143 - type: nauc_recall_at_3_std value: -8.5053 - type: nauc_recall_at_3_diff1 value: 48.5674 - type: nauc_recall_at_5_max value: 42.4963 - type: nauc_recall_at_5_std value: -3.1975000000000002 - type: nauc_recall_at_5_diff1 value: 46.1409 - type: nauc_recall_at_10_max value: 47.5304 - type: nauc_recall_at_10_std value: 4.2543 - type: nauc_recall_at_10_diff1 value: 46.187400000000004 - type: nauc_recall_at_20_max value: 52.5031 - type: nauc_recall_at_20_std 
value: 12.215 - type: nauc_recall_at_20_diff1 value: 43.959199999999996 - type: nauc_recall_at_100_max value: 59.519800000000004 - type: nauc_recall_at_100_std value: 36.355399999999996 - type: nauc_recall_at_100_diff1 value: 38.1615 - type: nauc_recall_at_1000_max value: 75.7293 - type: nauc_recall_at_1000_std value: 68.0791 - type: nauc_recall_at_1000_diff1 value: 33.4758 - type: nauc_precision_at_1_max value: 42.0223 - type: nauc_precision_at_1_std value: -7.5249999999999995 - type: nauc_precision_at_1_diff1 value: 57.545 - type: nauc_precision_at_3_max value: 40.269800000000004 - type: nauc_precision_at_3_std value: -0.1042 - type: nauc_precision_at_3_diff1 value: 28.7982 - type: nauc_precision_at_5_max value: 37.8177 - type: nauc_precision_at_5_std value: 6.5974 - type: nauc_precision_at_5_diff1 value: 17.729 - type: nauc_precision_at_10_max value: 34.4199 - type: nauc_precision_at_10_std value: 14.8032 - type: nauc_precision_at_10_diff1 value: 7.8933 - type: nauc_precision_at_20_max value: 31.5289 - type: nauc_precision_at_20_std value: 22.1412 - type: nauc_precision_at_20_diff1 value: -0.993 - type: nauc_precision_at_100_max value: 24.3425 - type: nauc_precision_at_100_std value: 27.3469 - type: nauc_precision_at_100_diff1 value: -9.3572 - type: nauc_precision_at_1000_max value: 18.453500000000002 - type: nauc_precision_at_1000_std value: 24.925800000000002 - type: nauc_precision_at_1000_diff1 value: -12.5892 - type: nauc_mrr_at_1_max value: 42.0223 - type: nauc_mrr_at_1_std value: -7.5249999999999995 - type: nauc_mrr_at_1_diff1 value: 57.545 - type: nauc_mrr_at_3_max value: 43.4966 - type: nauc_mrr_at_3_std value: -5.9497 - type: nauc_mrr_at_3_diff1 value: 54.3814 - type: nauc_mrr_at_5_max value: 43.918 - type: nauc_mrr_at_5_std value: -5.048 - type: nauc_mrr_at_5_diff1 value: 53.9473 - type: nauc_mrr_at_10_max value: 43.9711 - type: nauc_mrr_at_10_std value: -4.6621999999999995 - type: nauc_mrr_at_10_diff1 value: 54.231399999999994 - type: nauc_mrr_at_20_max value: 44.0448 - type: nauc_mrr_at_20_std value: -4.564900000000001 - type: nauc_mrr_at_20_diff1 value: 54.2486 - type: nauc_mrr_at_100_max value: 44.0305 - type: nauc_mrr_at_100_std value: -4.5347 - type: nauc_mrr_at_100_diff1 value: 54.2802 - type: nauc_mrr_at_1000_max value: 44.0239 - type: nauc_mrr_at_1000_std value: -4.5523 - type: nauc_mrr_at_1000_diff1 value: 54.2908 - type: main_score value: 63.641000000000005 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval (default) revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: mteb/cqadupstack-gis metrics: - type: ndcg_at_1 value: 32.09 - type: ndcg_at_3 value: 40.149 - type: ndcg_at_5 value: 43.111 - type: ndcg_at_10 value: 46.075 - type: ndcg_at_20 value: 48.17 - type: ndcg_at_100 value: 51.03 - type: ndcg_at_1000 value: 52.668000000000006 - type: map_at_1 value: 29.532000000000004 - type: map_at_3 value: 37.086000000000006 - type: map_at_5 value: 38.889 - type: map_at_10 value: 40.214 - type: map_at_20 value: 40.831 - type: map_at_100 value: 41.289 - type: map_at_1000 value: 41.359 - type: recall_at_1 value: 29.532000000000004 - type: recall_at_3 value: 46.03 - type: recall_at_5 value: 53.089 - type: recall_at_10 value: 62.025 - type: recall_at_20 value: 69.762 - type: recall_at_100 value: 83.829 - type: recall_at_1000 value: 95.99499999999999 - type: precision_at_1 value: 32.09 - type: precision_at_3 value: 17.175 - type: precision_at_5 value: 12.068 - type: precision_at_10 value: 7.141 - type: precision_at_20 value: 
4.079 - type: precision_at_100 value: 1.018 - type: precision_at_1000 value: 0.11800000000000001 - type: mrr_at_1 value: 32.0904 - type: mrr_at_3 value: 39.7363 - type: mrr_at_5 value: 41.307 - type: mrr_at_10 value: 42.4232 - type: mrr_at_20 value: 42.9925 - type: mrr_at_100 value: 43.342000000000006 - type: mrr_at_1000 value: 43.3947 - type: nauc_ndcg_at_1_max value: 28.6057 - type: nauc_ndcg_at_1_std value: -9.5015 - type: nauc_ndcg_at_1_diff1 value: 45.895599999999995 - type: nauc_ndcg_at_3_max value: 27.4486 - type: nauc_ndcg_at_3_std value: -8.3694 - type: nauc_ndcg_at_3_diff1 value: 40.1689 - type: nauc_ndcg_at_5_max value: 29.481299999999997 - type: nauc_ndcg_at_5_std value: -5.382 - type: nauc_ndcg_at_5_diff1 value: 39.5505 - type: nauc_ndcg_at_10_max value: 29.629299999999997 - type: nauc_ndcg_at_10_std value: -3.1249 - type: nauc_ndcg_at_10_diff1 value: 37.953199999999995 - type: nauc_ndcg_at_20_max value: 29.5532 - type: nauc_ndcg_at_20_std value: -2.7831 - type: nauc_ndcg_at_20_diff1 value: 37.2522 - type: nauc_ndcg_at_100_max value: 29.741600000000002 - type: nauc_ndcg_at_100_std value: -3.2703999999999995 - type: nauc_ndcg_at_100_diff1 value: 37.7396 - type: nauc_ndcg_at_1000_max value: 29.9018 - type: nauc_ndcg_at_1000_std value: -3.6946 - type: nauc_ndcg_at_1000_diff1 value: 38.5323 - type: nauc_map_at_1_max value: 25.423299999999998 - type: nauc_map_at_1_std value: -12.3377 - type: nauc_map_at_1_diff1 value: 46.8633 - type: nauc_map_at_3_max value: 26.4335 - type: nauc_map_at_3_std value: -9.871 - type: nauc_map_at_3_diff1 value: 41.9019 - type: nauc_map_at_5_max value: 27.852 - type: nauc_map_at_5_std value: -8.0967 - type: nauc_map_at_5_diff1 value: 41.4142 - type: nauc_map_at_10_max value: 28.163700000000002 - type: nauc_map_at_10_std value: -6.9023 - type: nauc_map_at_10_diff1 value: 40.779399999999995 - type: nauc_map_at_20_max value: 28.1646 - type: nauc_map_at_20_std value: -6.7966999999999995 - type: nauc_map_at_20_diff1 value: 40.625299999999996 - type: nauc_map_at_100_max value: 28.2439 - type: nauc_map_at_100_std value: -6.7998 - type: nauc_map_at_100_diff1 value: 40.7153 - type: nauc_map_at_1000_max value: 28.2633 - type: nauc_map_at_1000_std value: -6.802 - type: nauc_map_at_1000_diff1 value: 40.748 - type: nauc_recall_at_1_max value: 25.423299999999998 - type: nauc_recall_at_1_std value: -12.3377 - type: nauc_recall_at_1_diff1 value: 46.8633 - type: nauc_recall_at_3_max value: 26.378800000000002 - type: nauc_recall_at_3_std value: -6.6701 - type: nauc_recall_at_3_diff1 value: 35.8097 - type: nauc_recall_at_5_max value: 30.9445 - type: nauc_recall_at_5_std value: 0.1917 - type: nauc_recall_at_5_diff1 value: 33.5229 - type: nauc_recall_at_10_max value: 30.995099999999997 - type: nauc_recall_at_10_std value: 7.613200000000001 - type: nauc_recall_at_10_diff1 value: 27.2905 - type: nauc_recall_at_20_max value: 31.244 - type: nauc_recall_at_20_std value: 11.0527 - type: nauc_recall_at_20_diff1 value: 22.5701 - type: nauc_recall_at_100_max value: 33.293 - type: nauc_recall_at_100_std value: 12.4908 - type: nauc_recall_at_100_diff1 value: 19.2291 - type: nauc_recall_at_1000_max value: 52.0915 - type: nauc_recall_at_1000_std value: 32.1464 - type: nauc_recall_at_1000_diff1 value: 14.0362 - type: nauc_precision_at_1_max value: 28.6057 - type: nauc_precision_at_1_std value: -9.5015 - type: nauc_precision_at_1_diff1 value: 45.895599999999995 - type: nauc_precision_at_3_max value: 31.391599999999997 - type: nauc_precision_at_3_std value: -2.6111 - type: 
nauc_precision_at_3_diff1 value: 31.983800000000002 - type: nauc_precision_at_5_max value: 35.9814 - type: nauc_precision_at_5_std value: 6.062 - type: nauc_precision_at_5_diff1 value: 27.8588 - type: nauc_precision_at_10_max value: 34.5678 - type: nauc_precision_at_10_std value: 14.2625 - type: nauc_precision_at_10_diff1 value: 19.7208 - type: nauc_precision_at_20_max value: 31.451600000000003 - type: nauc_precision_at_20_std value: 16.6162 - type: nauc_precision_at_20_diff1 value: 12.421100000000001 - type: nauc_precision_at_100_max value: 22.1049 - type: nauc_precision_at_100_std value: 16.4354 - type: nauc_precision_at_100_diff1 value: 0.5193 - type: nauc_precision_at_1000_max value: 14.682899999999998 - type: nauc_precision_at_1000_std value: 15.5581 - type: nauc_precision_at_1000_diff1 value: -9.7103 - type: nauc_mrr_at_1_max value: 28.6057 - type: nauc_mrr_at_1_std value: -9.5015 - type: nauc_mrr_at_1_diff1 value: 45.895599999999995 - type: nauc_mrr_at_3_max value: 29.082400000000003 - type: nauc_mrr_at_3_std value: -6.9314 - type: nauc_mrr_at_3_diff1 value: 40.9506 - type: nauc_mrr_at_5_max value: 30.152600000000003 - type: nauc_mrr_at_5_std value: -5.455900000000001 - type: nauc_mrr_at_5_diff1 value: 40.7747 - type: nauc_mrr_at_10_max value: 29.9987 - type: nauc_mrr_at_10_std value: -4.839799999999999 - type: nauc_mrr_at_10_diff1 value: 40.2137 - type: nauc_mrr_at_20_max value: 29.842200000000002 - type: nauc_mrr_at_20_std value: -4.864 - type: nauc_mrr_at_20_diff1 value: 39.970800000000004 - type: nauc_mrr_at_100_max value: 29.8359 - type: nauc_mrr_at_100_std value: -4.9491 - type: nauc_mrr_at_100_diff1 value: 40.0495 - type: nauc_mrr_at_1000_max value: 29.837799999999998 - type: nauc_mrr_at_1000_std value: -4.968 - type: nauc_mrr_at_1000_diff1 value: 40.0797 - type: main_score value: 46.075 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval (default) revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: mteb/cqadupstack-mathematica metrics: - type: ndcg_at_1 value: 23.756 - type: ndcg_at_3 value: 29.725 - type: ndcg_at_5 value: 32.879000000000005 - type: ndcg_at_10 value: 36.015 - type: ndcg_at_20 value: 38.753 - type: ndcg_at_100 value: 42.175000000000004 - type: ndcg_at_1000 value: 44.607 - type: map_at_1 value: 18.944 - type: map_at_3 value: 26.098 - type: map_at_5 value: 28.151 - type: map_at_10 value: 29.610999999999997 - type: map_at_20 value: 30.481 - type: map_at_100 value: 31.063000000000002 - type: map_at_1000 value: 31.174000000000003 - type: recall_at_1 value: 18.944 - type: recall_at_3 value: 33.611000000000004 - type: recall_at_5 value: 41.427 - type: recall_at_10 value: 50.690999999999995 - type: recall_at_20 value: 60.437 - type: recall_at_100 value: 76.503 - type: recall_at_1000 value: 93.624 - type: precision_at_1 value: 23.756 - type: precision_at_3 value: 14.635000000000002 - type: precision_at_5 value: 11.07 - type: precision_at_10 value: 6.927999999999999 - type: precision_at_20 value: 4.266 - type: precision_at_100 value: 1.153 - type: precision_at_1000 value: 0.149 - type: mrr_at_1 value: 23.7562 - type: mrr_at_3 value: 31.2604 - type: mrr_at_5 value: 33.1696 - type: mrr_at_10 value: 34.4913 - type: mrr_at_20 value: 35.111399999999996 - type: mrr_at_100 value: 35.457499999999996 - type: mrr_at_1000 value: 35.5125 - type: nauc_ndcg_at_1_max value: 16.369 - type: nauc_ndcg_at_1_std value: -0.2643 - type: nauc_ndcg_at_1_diff1 value: 36.3924 - type: nauc_ndcg_at_3_max value: 16.8313 - type: 
nauc_ndcg_at_3_std value: -2.5591 - type: nauc_ndcg_at_3_diff1 value: 31.2622 - type: nauc_ndcg_at_5_max value: 16.575899999999997 - type: nauc_ndcg_at_5_std value: -1.2212 - type: nauc_ndcg_at_5_diff1 value: 30.4259 - type: nauc_ndcg_at_10_max value: 16.7024 - type: nauc_ndcg_at_10_std value: -0.5341 - type: nauc_ndcg_at_10_diff1 value: 30.1232 - type: nauc_ndcg_at_20_max value: 16.5942 - type: nauc_ndcg_at_20_std value: -0.3493 - type: nauc_ndcg_at_20_diff1 value: 29.1065 - type: nauc_ndcg_at_100_max value: 17.6591 - type: nauc_ndcg_at_100_std value: 1.9944 - type: nauc_ndcg_at_100_diff1 value: 29.332399999999996 - type: nauc_ndcg_at_1000_max value: 17.7443 - type: nauc_ndcg_at_1000_std value: 1.6357 - type: nauc_ndcg_at_1000_diff1 value: 30.1231 - type: nauc_map_at_1_max value: 13.264400000000002 - type: nauc_map_at_1_std value: -2.1641 - type: nauc_map_at_1_diff1 value: 37.446200000000005 - type: nauc_map_at_3_max value: 14.9032 - type: nauc_map_at_3_std value: -2.714 - type: nauc_map_at_3_diff1 value: 32.5923 - type: nauc_map_at_5_max value: 14.932500000000001 - type: nauc_map_at_5_std value: -1.9889000000000001 - type: nauc_map_at_5_diff1 value: 31.879600000000003 - type: nauc_map_at_10_max value: 15.309500000000002 - type: nauc_map_at_10_std value: -1.5512 - type: nauc_map_at_10_diff1 value: 31.694899999999997 - type: nauc_map_at_20_max value: 15.3357 - type: nauc_map_at_20_std value: -1.4588999999999999 - type: nauc_map_at_20_diff1 value: 31.323800000000002 - type: nauc_map_at_100_max value: 15.598 - type: nauc_map_at_100_std value: -0.9811000000000001 - type: nauc_map_at_100_diff1 value: 31.434600000000003 - type: nauc_map_at_1000_max value: 15.6096 - type: nauc_map_at_1000_std value: -0.9884999999999999 - type: nauc_map_at_1000_diff1 value: 31.4697 - type: nauc_recall_at_1_max value: 13.264400000000002 - type: nauc_recall_at_1_std value: -2.1641 - type: nauc_recall_at_1_diff1 value: 37.446200000000005 - type: nauc_recall_at_3_max value: 15.945500000000001 - type: nauc_recall_at_3_std value: -3.4730999999999996 - type: nauc_recall_at_3_diff1 value: 27.0913 - type: nauc_recall_at_5_max value: 15.237800000000002 - type: nauc_recall_at_5_std value: -1.0399 - type: nauc_recall_at_5_diff1 value: 25.2793 - type: nauc_recall_at_10_max value: 15.1746 - type: nauc_recall_at_10_std value: 0.5708000000000001 - type: nauc_recall_at_10_diff1 value: 24.2515 - type: nauc_recall_at_20_max value: 14.3294 - type: nauc_recall_at_20_std value: 0.8943 - type: nauc_recall_at_20_diff1 value: 20.1567 - type: nauc_recall_at_100_max value: 19.405 - type: nauc_recall_at_100_std value: 15.5971 - type: nauc_recall_at_100_diff1 value: 16.8 - type: nauc_recall_at_1000_max value: 27.3117 - type: nauc_recall_at_1000_std value: 36.0277 - type: nauc_recall_at_1000_diff1 value: 15.1497 - type: nauc_precision_at_1_max value: 16.369 - type: nauc_precision_at_1_std value: -0.2643 - type: nauc_precision_at_1_diff1 value: 36.3924 - type: nauc_precision_at_3_max value: 19.78 - type: nauc_precision_at_3_std value: -2.0522 - type: nauc_precision_at_3_diff1 value: 24.3712 - type: nauc_precision_at_5_max value: 19.4882 - type: nauc_precision_at_5_std value: 0.7147 - type: nauc_precision_at_5_diff1 value: 20.2841 - type: nauc_precision_at_10_max value: 20.0931 - type: nauc_precision_at_10_std value: 3.0831 - type: nauc_precision_at_10_diff1 value: 15.928899999999999 - type: nauc_precision_at_20_max value: 17.5823 - type: nauc_precision_at_20_std value: 4.1056 - type: nauc_precision_at_20_diff1 value: 9.211500000000001 - type: 
nauc_precision_at_100_max value: 14.447399999999998 - type: nauc_precision_at_100_std value: 10.1543 - type: nauc_precision_at_100_diff1 value: 3.5811999999999995 - type: nauc_precision_at_1000_max value: 7.829899999999999 - type: nauc_precision_at_1000_std value: 3.4869999999999997 - type: nauc_precision_at_1000_diff1 value: -0.5313 - type: nauc_mrr_at_1_max value: 16.369 - type: nauc_mrr_at_1_std value: -0.2643 - type: nauc_mrr_at_1_diff1 value: 36.3924 - type: nauc_mrr_at_3_max value: 18.8798 - type: nauc_mrr_at_3_std value: -0.7811 - type: nauc_mrr_at_3_diff1 value: 31.7255 - type: nauc_mrr_at_5_max value: 18.840799999999998 - type: nauc_mrr_at_5_std value: -0.0676 - type: nauc_mrr_at_5_diff1 value: 31.6753 - type: nauc_mrr_at_10_max value: 18.8049 - type: nauc_mrr_at_10_std value: 0.2359 - type: nauc_mrr_at_10_diff1 value: 31.729200000000002 - type: nauc_mrr_at_20_max value: 18.709999999999997 - type: nauc_mrr_at_20_std value: 0.2533 - type: nauc_mrr_at_20_diff1 value: 31.556099999999997 - type: nauc_mrr_at_100_max value: 18.7625 - type: nauc_mrr_at_100_std value: 0.411 - type: nauc_mrr_at_100_diff1 value: 31.575599999999998 - type: nauc_mrr_at_1000_max value: 18.7525 - type: nauc_mrr_at_1000_std value: 0.4194 - type: nauc_mrr_at_1000_diff1 value: 31.6052 - type: main_score value: 36.015 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval (default) revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: mteb/cqadupstack-physics metrics: - type: ndcg_at_1 value: 42.348 - type: ndcg_at_3 value: 48.478 - type: ndcg_at_5 value: 50.79 - type: ndcg_at_10 value: 53.504 - type: ndcg_at_20 value: 55.753 - type: ndcg_at_100 value: 58.899 - type: ndcg_at_1000 value: 60.32300000000001 - type: map_at_1 value: 33.824 - type: map_at_3 value: 43.335 - type: map_at_5 value: 45.279 - type: map_at_10 value: 46.867999999999995 - type: map_at_20 value: 47.714 - type: map_at_100 value: 48.306 - type: map_at_1000 value: 48.406 - type: recall_at_1 value: 33.824 - type: recall_at_3 value: 52.305 - type: recall_at_5 value: 58.804 - type: recall_at_10 value: 67.142 - type: recall_at_20 value: 74.694 - type: recall_at_100 value: 89.134 - type: recall_at_1000 value: 97.816 - type: precision_at_1 value: 42.348 - type: precision_at_3 value: 23.741 - type: precision_at_5 value: 16.439 - type: precision_at_10 value: 9.75 - type: precision_at_20 value: 5.702999999999999 - type: precision_at_100 value: 1.466 - type: precision_at_1000 value: 0.17700000000000002 - type: mrr_at_1 value: 42.348400000000005 - type: mrr_at_3 value: 50.721799999999995 - type: mrr_at_5 value: 52.0115 - type: mrr_at_10 value: 52.9721 - type: mrr_at_20 value: 53.3914 - type: mrr_at_100 value: 53.7068 - type: mrr_at_1000 value: 53.734300000000005 - type: nauc_ndcg_at_1_max value: 36.8685 - type: nauc_ndcg_at_1_std value: -1.9057000000000002 - type: nauc_ndcg_at_1_diff1 value: 54.151700000000005 - type: nauc_ndcg_at_3_max value: 36.8356 - type: nauc_ndcg_at_3_std value: -3.5336 - type: nauc_ndcg_at_3_diff1 value: 48.3439 - type: nauc_ndcg_at_5_max value: 35.705999999999996 - type: nauc_ndcg_at_5_std value: -4.5076 - type: nauc_ndcg_at_5_diff1 value: 47.5611 - type: nauc_ndcg_at_10_max value: 36.7768 - type: nauc_ndcg_at_10_std value: -2.459 - type: nauc_ndcg_at_10_diff1 value: 47.254400000000004 - type: nauc_ndcg_at_20_max value: 37.390499999999996 - type: nauc_ndcg_at_20_std value: -2.2398000000000002 - type: nauc_ndcg_at_20_diff1 value: 47.8108 - type: nauc_ndcg_at_100_max value: 38.3272 - type: 
nauc_ndcg_at_100_std value: -0.3307 - type: nauc_ndcg_at_100_diff1 value: 48.4739 - type: nauc_ndcg_at_1000_max value: 38.0766 - type: nauc_ndcg_at_1000_std value: -0.6526 - type: nauc_ndcg_at_1000_diff1 value: 48.6232 - type: nauc_map_at_1_max value: 29.901600000000002 - type: nauc_map_at_1_std value: -7.186299999999999 - type: nauc_map_at_1_diff1 value: 54.2246 - type: nauc_map_at_3_max value: 34.083200000000005 - type: nauc_map_at_3_std value: -5.532 - type: nauc_map_at_3_diff1 value: 49.6089 - type: nauc_map_at_5_max value: 34.2724 - type: nauc_map_at_5_std value: -5.4413 - type: nauc_map_at_5_diff1 value: 49.045 - type: nauc_map_at_10_max value: 35.3456 - type: nauc_map_at_10_std value: -4.0495 - type: nauc_map_at_10_diff1 value: 48.9439 - type: nauc_map_at_20_max value: 35.7489 - type: nauc_map_at_20_std value: -3.769 - type: nauc_map_at_20_diff1 value: 49.205799999999996 - type: nauc_map_at_100_max value: 35.9745 - type: nauc_map_at_100_std value: -3.4292000000000002 - type: nauc_map_at_100_diff1 value: 49.2921 - type: nauc_map_at_1000_max value: 35.9764 - type: nauc_map_at_1000_std value: -3.4297 - type: nauc_map_at_1000_diff1 value: 49.3113 - type: nauc_recall_at_1_max value: 29.901600000000002 - type: nauc_recall_at_1_std value: -7.186299999999999 - type: nauc_recall_at_1_diff1 value: 54.2246 - type: nauc_recall_at_3_max value: 32.3363 - type: nauc_recall_at_3_std value: -6.5791 - type: nauc_recall_at_3_diff1 value: 41.86 - type: nauc_recall_at_5_max value: 30.5954 - type: nauc_recall_at_5_std value: -7.989599999999999 - type: nauc_recall_at_5_diff1 value: 38.5503 - type: nauc_recall_at_10_max value: 34.238800000000005 - type: nauc_recall_at_10_std value: -0.756 - type: nauc_recall_at_10_diff1 value: 36.8704 - type: nauc_recall_at_20_max value: 35.7313 - type: nauc_recall_at_20_std value: -0.7048 - type: nauc_recall_at_20_diff1 value: 37.7093 - type: nauc_recall_at_100_max value: 44.4053 - type: nauc_recall_at_100_std value: 20.2029 - type: nauc_recall_at_100_diff1 value: 38.6378 - type: nauc_recall_at_1000_max value: 49.026399999999995 - type: nauc_recall_at_1000_std value: 52.3613 - type: nauc_recall_at_1000_diff1 value: 27.487299999999998 - type: nauc_precision_at_1_max value: 36.8685 - type: nauc_precision_at_1_std value: -1.9057000000000002 - type: nauc_precision_at_1_diff1 value: 54.151700000000005 - type: nauc_precision_at_3_max value: 36.608000000000004 - type: nauc_precision_at_3_std value: 6.3276 - type: nauc_precision_at_3_diff1 value: 28.842499999999998 - type: nauc_precision_at_5_max value: 32.2883 - type: nauc_precision_at_5_std value: 8.0263 - type: nauc_precision_at_5_diff1 value: 21.2274 - type: nauc_precision_at_10_max value: 30.814700000000002 - type: nauc_precision_at_10_std value: 15.4999 - type: nauc_precision_at_10_diff1 value: 12.3553 - type: nauc_precision_at_20_max value: 25.9789 - type: nauc_precision_at_20_std value: 17.128 - type: nauc_precision_at_20_diff1 value: 7.342 - type: nauc_precision_at_100_max value: 15.9879 - type: nauc_precision_at_100_std value: 21.1499 - type: nauc_precision_at_100_diff1 value: -3.0609 - type: nauc_precision_at_1000_max value: 4.850899999999999 - type: nauc_precision_at_1000_std value: 15.750800000000002 - type: nauc_precision_at_1000_diff1 value: -9.2357 - type: nauc_mrr_at_1_max value: 36.8685 - type: nauc_mrr_at_1_std value: -1.9057000000000002 - type: nauc_mrr_at_1_diff1 value: 54.151700000000005 - type: nauc_mrr_at_3_max value: 38.8422 - type: nauc_mrr_at_3_std value: -1.3892 - type: nauc_mrr_at_3_diff1 value: 
50.258100000000006 - type: nauc_mrr_at_5_max value: 38.404500000000006 - type: nauc_mrr_at_5_std value: -1.7023 - type: nauc_mrr_at_5_diff1 value: 49.7593 - type: nauc_mrr_at_10_max value: 38.8727 - type: nauc_mrr_at_10_std value: -1.0441 - type: nauc_mrr_at_10_diff1 value: 49.9366 - type: nauc_mrr_at_20_max value: 38.8639 - type: nauc_mrr_at_20_std value: -1.1834 - type: nauc_mrr_at_20_diff1 value: 50.004400000000004 - type: nauc_mrr_at_100_max value: 38.8551 - type: nauc_mrr_at_100_std value: -1.098 - type: nauc_mrr_at_100_diff1 value: 50.0522 - type: nauc_mrr_at_1000_max value: 38.844699999999996 - type: nauc_mrr_at_1000_std value: -1.117 - type: nauc_mrr_at_1000_diff1 value: 50.055099999999996 - type: main_score value: 53.504 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval (default) revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: mteb/cqadupstack-programmers metrics: - type: ndcg_at_1 value: 37.557 - type: ndcg_at_3 value: 42.573 - type: ndcg_at_5 value: 45.528 - type: ndcg_at_10 value: 48.742999999999995 - type: ndcg_at_20 value: 51.160000000000004 - type: ndcg_at_100 value: 54.458 - type: ndcg_at_1000 value: 56.076 - type: map_at_1 value: 30.125 - type: map_at_3 value: 38.018 - type: map_at_5 value: 40.367999999999995 - type: map_at_10 value: 42.119 - type: map_at_20 value: 42.970000000000006 - type: map_at_100 value: 43.599 - type: map_at_1000 value: 43.69 - type: recall_at_1 value: 30.125 - type: recall_at_3 value: 45.437 - type: recall_at_5 value: 53.197 - type: recall_at_10 value: 62.619 - type: recall_at_20 value: 71.187 - type: recall_at_100 value: 86.574 - type: recall_at_1000 value: 97.102 - type: precision_at_1 value: 37.557 - type: precision_at_3 value: 20.624000000000002 - type: precision_at_5 value: 15.068000000000001 - type: precision_at_10 value: 9.269 - type: precision_at_20 value: 5.428 - type: precision_at_100 value: 1.401 - type: precision_at_1000 value: 0.16999999999999998 - type: mrr_at_1 value: 37.5571 - type: mrr_at_3 value: 44.6537 - type: mrr_at_5 value: 46.4403 - type: mrr_at_10 value: 47.5732 - type: mrr_at_20 value: 48.126000000000005 - type: mrr_at_100 value: 48.460300000000004 - type: mrr_at_1000 value: 48.4993 - type: nauc_ndcg_at_1_max value: 44.5645 - type: nauc_ndcg_at_1_std value: 4.542800000000001 - type: nauc_ndcg_at_1_diff1 value: 50.2359 - type: nauc_ndcg_at_3_max value: 43.0652 - type: nauc_ndcg_at_3_std value: 4.3627 - type: nauc_ndcg_at_3_diff1 value: 43.4871 - type: nauc_ndcg_at_5_max value: 43.419999999999995 - type: nauc_ndcg_at_5_std value: 6.1539 - type: nauc_ndcg_at_5_diff1 value: 43.6875 - type: nauc_ndcg_at_10_max value: 43.5052 - type: nauc_ndcg_at_10_std value: 8.0707 - type: nauc_ndcg_at_10_diff1 value: 43.7523 - type: nauc_ndcg_at_20_max value: 44.0535 - type: nauc_ndcg_at_20_std value: 8.9662 - type: nauc_ndcg_at_20_diff1 value: 42.869299999999996 - type: nauc_ndcg_at_100_max value: 45.4324 - type: nauc_ndcg_at_100_std value: 10.663400000000001 - type: nauc_ndcg_at_100_diff1 value: 44.3052 - type: nauc_ndcg_at_1000_max value: 44.9238 - type: nauc_ndcg_at_1000_std value: 9.0618 - type: nauc_ndcg_at_1000_diff1 value: 44.472699999999996 - type: nauc_map_at_1_max value: 37.0128 - type: nauc_map_at_1_std value: -1.8889 - type: nauc_map_at_1_diff1 value: 50.125299999999996 - type: nauc_map_at_3_max value: 40.4277 - type: nauc_map_at_3_std value: 1.5571 - type: nauc_map_at_3_diff1 value: 45.5239 - type: nauc_map_at_5_max value: 41.6298 - type: nauc_map_at_5_std value: 
3.4013 - type: nauc_map_at_5_diff1 value: 45.3778 - type: nauc_map_at_10_max value: 42.289300000000004 - type: nauc_map_at_10_std value: 4.6503000000000005 - type: nauc_map_at_10_diff1 value: 45.5387 - type: nauc_map_at_20_max value: 42.642 - type: nauc_map_at_20_std value: 5.0203 - type: nauc_map_at_20_diff1 value: 45.1577 - type: nauc_map_at_100_max value: 42.965199999999996 - type: nauc_map_at_100_std value: 5.335 - type: nauc_map_at_100_diff1 value: 45.406800000000004 - type: nauc_map_at_1000_max value: 42.9348 - type: nauc_map_at_1000_std value: 5.2551 - type: nauc_map_at_1000_diff1 value: 45.408100000000005 - type: nauc_recall_at_1_max value: 37.0128 - type: nauc_recall_at_1_std value: -1.8889 - type: nauc_recall_at_1_diff1 value: 50.125299999999996 - type: nauc_recall_at_3_max value: 38.929 - type: nauc_recall_at_3_std value: 4.077 - type: nauc_recall_at_3_diff1 value: 38.7002 - type: nauc_recall_at_5_max value: 39.6139 - type: nauc_recall_at_5_std value: 8.362 - type: nauc_recall_at_5_diff1 value: 37.585 - type: nauc_recall_at_10_max value: 39.2011 - type: nauc_recall_at_10_std value: 15.155899999999999 - type: nauc_recall_at_10_diff1 value: 36.005199999999995 - type: nauc_recall_at_20_max value: 40.221000000000004 - type: nauc_recall_at_20_std value: 20.6873 - type: nauc_recall_at_20_diff1 value: 30.7941 - type: nauc_recall_at_100_max value: 51.409800000000004 - type: nauc_recall_at_100_std value: 46.4559 - type: nauc_recall_at_100_diff1 value: 35.7367 - type: nauc_recall_at_1000_max value: 58.719500000000004 - type: nauc_recall_at_1000_std value: 72.0053 - type: nauc_recall_at_1000_diff1 value: 36.0514 - type: nauc_precision_at_1_max value: 44.5645 - type: nauc_precision_at_1_std value: 4.542800000000001 - type: nauc_precision_at_1_diff1 value: 50.2359 - type: nauc_precision_at_3_max value: 42.7363 - type: nauc_precision_at_3_std value: 11.9582 - type: nauc_precision_at_3_diff1 value: 28.242800000000003 - type: nauc_precision_at_5_max value: 39.7422 - type: nauc_precision_at_5_std value: 16.2831 - type: nauc_precision_at_5_diff1 value: 21.6264 - type: nauc_precision_at_10_max value: 33.4757 - type: nauc_precision_at_10_std value: 18.8123 - type: nauc_precision_at_10_diff1 value: 14.122000000000002 - type: nauc_precision_at_20_max value: 27.897 - type: nauc_precision_at_20_std value: 17.7175 - type: nauc_precision_at_20_diff1 value: 4.8417 - type: nauc_precision_at_100_max value: 16.4521 - type: nauc_precision_at_100_std value: 15.6333 - type: nauc_precision_at_100_diff1 value: -3.7706999999999997 - type: nauc_precision_at_1000_max value: 1.0215999999999998 - type: nauc_precision_at_1000_std value: 1.7413 - type: nauc_precision_at_1000_diff1 value: -13.7539 - type: nauc_mrr_at_1_max value: 44.5645 - type: nauc_mrr_at_1_std value: 4.542800000000001 - type: nauc_mrr_at_1_diff1 value: 50.2359 - type: nauc_mrr_at_3_max value: 46.611999999999995 - type: nauc_mrr_at_3_std value: 7.647900000000001 - type: nauc_mrr_at_3_diff1 value: 45.3343 - type: nauc_mrr_at_5_max value: 46.3141 - type: nauc_mrr_at_5_std value: 7.9993 - type: nauc_mrr_at_5_diff1 value: 45.252900000000004 - type: nauc_mrr_at_10_max value: 46.1605 - type: nauc_mrr_at_10_std value: 8.6568 - type: nauc_mrr_at_10_diff1 value: 45.1293 - type: nauc_mrr_at_20_max value: 46.1626 - type: nauc_mrr_at_20_std value: 8.6536 - type: nauc_mrr_at_20_diff1 value: 45.0837 - type: nauc_mrr_at_100_max value: 46.2514 - type: nauc_mrr_at_100_std value: 8.731300000000001 - type: nauc_mrr_at_100_diff1 value: 45.2734 - type: nauc_mrr_at_1000_max 
value: 46.2511 - type: nauc_mrr_at_1000_std value: 8.6858 - type: nauc_mrr_at_1000_diff1 value: 45.29 - type: main_score value: 48.742999999999995 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: CQADupstackRetrieval_is_a_combined_dataset metrics: - type: ndcg_at_1 value: 36.5025 - type: ndcg_at_3 value: 42.563833333333335 - type: ndcg_at_5 value: 45.190500000000014 - type: ndcg_at_10 value: 48.15416666666666 - type: ndcg_at_20 value: 50.29141666666666 - type: ndcg_at_100 value: 53.34008333333333 - type: ndcg_at_1000 value: 55.072416666666676 - type: map_at_1 value: 30.718333333333337 - type: map_at_3 value: 38.537166666666664 - type: map_at_5 value: 40.46825 - type: map_at_10 value: 42.020250000000004 - type: map_at_20 value: 42.783 - type: map_at_100 value: 43.36233333333334 - type: map_at_1000 value: 43.46825 - type: recall_at_1 value: 30.718333333333337 - type: recall_at_3 value: 46.2075 - type: recall_at_5 value: 52.98616666666667 - type: recall_at_10 value: 61.78366666666667 - type: recall_at_20 value: 69.50683333333333 - type: recall_at_100 value: 84.0005 - type: recall_at_1000 value: 95.623 - type: precision_at_1 value: 36.5025 - type: precision_at_3 value: 19.820999999999998 - type: precision_at_5 value: 14.119666666666669 - type: precision_at_10 value: 8.606083333333334 - type: precision_at_20 value: 5.0425 - type: precision_at_100 value: 1.3245 - type: precision_at_1000 value: 0.16624999999999998 - type: mrr_at_1 value: 36.50251666666667 - type: mrr_at_3 value: 43.639925000000005 - type: mrr_at_5 value: 45.17450833333333 - type: mrr_at_10 value: 46.29196666666667 - type: mrr_at_20 value: 46.787433333333325 - type: mrr_at_100 value: 47.11775833333334 - type: mrr_at_1000 value: 47.160025 - type: nauc_ndcg_at_1_max value: 35.63543333333333 - type: nauc_ndcg_at_1_std value: -2.5082500000000003 - type: nauc_ndcg_at_1_diff1 value: 49.697575 - type: nauc_ndcg_at_3_max value: 34.4362 - type: nauc_ndcg_at_3_std value: -1.8411749999999998 - type: nauc_ndcg_at_3_diff1 value: 43.73903333333333 - type: nauc_ndcg_at_5_max value: 34.93775 - type: nauc_ndcg_at_5_std value: -0.8254249999999997 - type: nauc_ndcg_at_5_diff1 value: 43.07621666666667 - type: nauc_ndcg_at_10_max value: 35.32053333333333 - type: nauc_ndcg_at_10_std value: 0.5296166666666667 - type: nauc_ndcg_at_10_diff1 value: 42.7897 - type: nauc_ndcg_at_20_max value: 35.781600000000005 - type: nauc_ndcg_at_20_std value: 1.3973583333333335 - type: nauc_ndcg_at_20_diff1 value: 42.563583333333334 - type: nauc_ndcg_at_100_max value: 36.46264166666666 - type: nauc_ndcg_at_100_std value: 2.793141666666667 - type: nauc_ndcg_at_100_diff1 value: 42.913475 - type: nauc_ndcg_at_1000_max value: 36.389716666666665 - type: nauc_ndcg_at_1000_std value: 2.1062499999999997 - type: nauc_ndcg_at_1000_diff1 value: 43.32690000000001 - type: nauc_map_at_1_max value: 30.19065 - type: nauc_map_at_1_std value: -6.136941666666667 - type: nauc_map_at_1_diff1 value: 50.95858333333334 - type: nauc_map_at_3_max value: 32.65271666666666 - type: nauc_map_at_3_std value: -3.927191666666667 - type: nauc_map_at_3_diff1 value: 45.89055 - type: nauc_map_at_5_max value: 33.56583333333334 - type: nauc_map_at_5_std value: -2.8991750000000005 - type: nauc_map_at_5_diff1 value: 45.29093333333334 - type: nauc_map_at_10_max value: 34.177641666666666 - type: nauc_map_at_10_std value: -1.9589083333333333 - type: nauc_map_at_10_diff1 value: 45.126108333333335 - type: 
nauc_map_at_20_max value: 34.461074999999994 - type: nauc_map_at_20_std value: -1.550616666666666 - type: nauc_map_at_20_diff1 value: 45.00503333333333 - type: nauc_map_at_100_max value: 34.69629166666666 - type: nauc_map_at_100_std value: -1.1661166666666671 - type: nauc_map_at_100_diff1 value: 45.009175 - type: nauc_map_at_1000_max value: 34.688108333333325 - type: nauc_map_at_1000_std value: -1.1726583333333331 - type: nauc_map_at_1000_diff1 value: 45.010266666666666 - type: nauc_recall_at_1_max value: 30.19065 - type: nauc_recall_at_1_std value: -6.136941666666667 - type: nauc_recall_at_1_diff1 value: 50.95858333333334 - type: nauc_recall_at_3_max value: 31.18069166666666 - type: nauc_recall_at_3_std value: -2.425375 - type: nauc_recall_at_3_diff1 value: 39.215491666666665 - type: nauc_recall_at_5_max value: 32.40545833333333 - type: nauc_recall_at_5_std value: 0.30784166666666674 - type: nauc_recall_at_5_diff1 value: 36.58546666666667 - type: nauc_recall_at_10_max value: 33.11824166666668 - type: nauc_recall_at_10_std value: 5.099150000000001 - type: nauc_recall_at_10_diff1 value: 34.32635833333333 - type: nauc_recall_at_20_max value: 34.84125 - type: nauc_recall_at_20_std value: 9.744425 - type: nauc_recall_at_20_diff1 value: 32.073550000000004 - type: nauc_recall_at_100_max value: 40.07125 - type: nauc_recall_at_100_std value: 26.520391666666672 - type: nauc_recall_at_100_diff1 value: 29.73679166666667 - type: nauc_recall_at_1000_max value: 52.596025000000004 - type: nauc_recall_at_1000_std value: 53.16131666666667 - type: nauc_recall_at_1000_diff1 value: 27.2596 - type: nauc_precision_at_1_max value: 35.63543333333333 - type: nauc_precision_at_1_std value: -2.5082500000000003 - type: nauc_precision_at_1_diff1 value: 49.697575 - type: nauc_precision_at_3_max value: 34.383424999999995 - type: nauc_precision_at_3_std value: 4.906383333333332 - type: nauc_precision_at_3_diff1 value: 27.956991666666664 - type: nauc_precision_at_5_max value: 33.50664166666667 - type: nauc_precision_at_5_std value: 9.5448 - type: nauc_precision_at_5_diff1 value: 20.584491666666665 - type: nauc_precision_at_10_max value: 30.116449999999993 - type: nauc_precision_at_10_std value: 14.272133333333334 - type: nauc_precision_at_10_diff1 value: 12.496183333333333 - type: nauc_precision_at_20_max value: 26.383483333333334 - type: nauc_precision_at_20_std value: 16.945558333333334 - type: nauc_precision_at_20_diff1 value: 5.616483333333333 - type: nauc_precision_at_100_max value: 17.88254166666667 - type: nauc_precision_at_100_std value: 19.543916666666668 - type: nauc_precision_at_100_diff1 value: -4.408391666666666 - type: nauc_precision_at_1000_max value: 6.492849999999999 - type: nauc_precision_at_1000_std value: 11.98045 - type: nauc_precision_at_1000_diff1 value: -12.374983333333333 - type: nauc_mrr_at_1_max value: 35.63543333333333 - type: nauc_mrr_at_1_std value: -2.5082500000000003 - type: nauc_mrr_at_1_diff1 value: 49.697575 - type: nauc_mrr_at_3_max value: 36.531841666666665 - type: nauc_mrr_at_3_std value: -0.49094999999999983 - type: nauc_mrr_at_3_diff1 value: 45.05095 - type: nauc_mrr_at_5_max value: 36.68914166666667 - type: nauc_mrr_at_5_std value: -0.020883333333333517 - type: nauc_mrr_at_5_diff1 value: 44.59794166666667 - type: nauc_mrr_at_10_max value: 36.71131666666667 - type: nauc_mrr_at_10_std value: 0.42916666666666675 - type: nauc_mrr_at_10_diff1 value: 44.502241666666656 - type: nauc_mrr_at_20_max value: 36.73486666666667 - type: nauc_mrr_at_20_std value: 0.5398083333333334 - type: 
nauc_mrr_at_20_diff1 value: 44.48308333333335 - type: nauc_mrr_at_100_max value: 36.76240833333333 - type: nauc_mrr_at_100_std value: 0.6035583333333332 - type: nauc_mrr_at_100_diff1 value: 44.55041666666667 - type: nauc_mrr_at_1000_max value: 36.76164166666667 - type: nauc_mrr_at_1000_std value: 0.5883499999999998 - type: nauc_mrr_at_1000_diff1 value: 44.56814166666667 - type: main_score value: 48.15416666666666 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: CQADupstackRetrieval_is_a_combined_dataset split: test type: CQADupstackRetrieval_is_a_combined_dataset metrics: - type: main_score value: 48.15416666666667 - type: ndcg_at_10 value: 48.15416666666667 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval (default) revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: mteb/cqadupstack-stats metrics: - type: ndcg_at_1 value: 32.669 - type: ndcg_at_3 value: 37.604 - type: ndcg_at_5 value: 39.682 - type: ndcg_at_10 value: 42.353 - type: ndcg_at_20 value: 44.374 - type: ndcg_at_100 value: 47.424 - type: ndcg_at_1000 value: 49.589 - type: map_at_1 value: 29.193 - type: map_at_3 value: 34.897 - type: map_at_5 value: 36.272999999999996 - type: map_at_10 value: 37.529 - type: map_at_20 value: 38.156 - type: map_at_100 value: 38.614 - type: map_at_1000 value: 38.712999999999994 - type: recall_at_1 value: 29.193 - type: recall_at_3 value: 41.014 - type: recall_at_5 value: 46.248 - type: recall_at_10 value: 54.159 - type: recall_at_20 value: 61.818 - type: recall_at_100 value: 77.267 - type: recall_at_1000 value: 92.805 - type: precision_at_1 value: 32.669 - type: precision_at_3 value: 16.309 - type: precision_at_5 value: 11.288 - type: precision_at_10 value: 6.8709999999999996 - type: precision_at_20 value: 3.9419999999999997 - type: precision_at_100 value: 1.008 - type: precision_at_1000 value: 0.126 - type: mrr_at_1 value: 32.6687 - type: mrr_at_3 value: 38.0368 - type: mrr_at_5 value: 39.1948 - type: mrr_at_10 value: 40.2884 - type: mrr_at_20 value: 40.7986 - type: mrr_at_100 value: 41.1771 - type: mrr_at_1000 value: 41.240700000000004 - type: nauc_ndcg_at_1_max value: 38.765699999999995 - type: nauc_ndcg_at_1_std value: 3.3594 - type: nauc_ndcg_at_1_diff1 value: 54.1068 - type: nauc_ndcg_at_3_max value: 35.987700000000004 - type: nauc_ndcg_at_3_std value: 2.8396999999999997 - type: nauc_ndcg_at_3_diff1 value: 47.2858 - type: nauc_ndcg_at_5_max value: 36.628699999999995 - type: nauc_ndcg_at_5_std value: 3.6117000000000004 - type: nauc_ndcg_at_5_diff1 value: 46.9776 - type: nauc_ndcg_at_10_max value: 36.763200000000005 - type: nauc_ndcg_at_10_std value: 4.7951 - type: nauc_ndcg_at_10_diff1 value: 46.5066 - type: nauc_ndcg_at_20_max value: 36.6793 - type: nauc_ndcg_at_20_std value: 5.6449 - type: nauc_ndcg_at_20_diff1 value: 45.835100000000004 - type: nauc_ndcg_at_100_max value: 37.0064 - type: nauc_ndcg_at_100_std value: 6.6625000000000005 - type: nauc_ndcg_at_100_diff1 value: 45.4937 - type: nauc_ndcg_at_1000_max value: 37.5693 - type: nauc_ndcg_at_1000_std value: 6.5411 - type: nauc_ndcg_at_1000_diff1 value: 46.671800000000005 - type: nauc_map_at_1_max value: 32.7625 - type: nauc_map_at_1_std value: -1.8726 - type: nauc_map_at_1_diff1 value: 53.1931 - type: nauc_map_at_3_max value: 34.7221 - type: nauc_map_at_3_std value: 1.141 - type: nauc_map_at_3_diff1 value: 49.0672 - type: nauc_map_at_5_max value: 35.5173 - type: nauc_map_at_5_std value: 2.2872 - type: nauc_map_at_5_diff1 value: 48.5047 
- type: nauc_map_at_10_max value: 35.7686 - type: nauc_map_at_10_std value: 2.9238 - type: nauc_map_at_10_diff1 value: 48.3548 - type: nauc_map_at_20_max value: 35.7707 - type: nauc_map_at_20_std value: 3.0683 - type: nauc_map_at_20_diff1 value: 48.1708 - type: nauc_map_at_100_max value: 35.8572 - type: nauc_map_at_100_std value: 3.2108999999999996 - type: nauc_map_at_100_diff1 value: 48.0681 - type: nauc_map_at_1000_max value: 35.885600000000004 - type: nauc_map_at_1000_std value: 3.2162 - type: nauc_map_at_1000_diff1 value: 48.1239 - type: nauc_recall_at_1_max value: 32.7625 - type: nauc_recall_at_1_std value: -1.8726 - type: nauc_recall_at_1_diff1 value: 53.1931 - type: nauc_recall_at_3_max value: 32.5847 - type: nauc_recall_at_3_std value: 1.4236 - type: nauc_recall_at_3_diff1 value: 42.8899 - type: nauc_recall_at_5_max value: 35.0441 - type: nauc_recall_at_5_std value: 4.1737 - type: nauc_recall_at_5_diff1 value: 41.8313 - type: nauc_recall_at_10_max value: 35.063100000000006 - type: nauc_recall_at_10_std value: 7.8740000000000006 - type: nauc_recall_at_10_diff1 value: 38.9244 - type: nauc_recall_at_20_max value: 33.6964 - type: nauc_recall_at_20_std value: 12.0632 - type: nauc_recall_at_20_diff1 value: 34.7941 - type: nauc_recall_at_100_max value: 33.928399999999996 - type: nauc_recall_at_100_std value: 23.1451 - type: nauc_recall_at_100_diff1 value: 28.170499999999997 - type: nauc_recall_at_1000_max value: 45.6188 - type: nauc_recall_at_1000_std value: 44.1766 - type: nauc_recall_at_1000_diff1 value: 34.1945 - type: nauc_precision_at_1_max value: 38.765699999999995 - type: nauc_precision_at_1_std value: 3.3594 - type: nauc_precision_at_1_diff1 value: 54.1068 - type: nauc_precision_at_3_max value: 39.3932 - type: nauc_precision_at_3_std value: 11.258600000000001 - type: nauc_precision_at_3_diff1 value: 36.9186 - type: nauc_precision_at_5_max value: 39.0844 - type: nauc_precision_at_5_std value: 14.7369 - type: nauc_precision_at_5_diff1 value: 31.3071 - type: nauc_precision_at_10_max value: 36.3678 - type: nauc_precision_at_10_std value: 17.292099999999998 - type: nauc_precision_at_10_diff1 value: 24.0674 - type: nauc_precision_at_20_max value: 32.5422 - type: nauc_precision_at_20_std value: 17.3521 - type: nauc_precision_at_20_diff1 value: 17.8472 - type: nauc_precision_at_100_max value: 28.439700000000002 - type: nauc_precision_at_100_std value: 21.7441 - type: nauc_precision_at_100_diff1 value: 7.6072 - type: nauc_precision_at_1000_max value: 18.9222 - type: nauc_precision_at_1000_std value: 17.1045 - type: nauc_precision_at_1000_diff1 value: 0.9424 - type: nauc_mrr_at_1_max value: 38.765699999999995 - type: nauc_mrr_at_1_std value: 3.3594 - type: nauc_mrr_at_1_diff1 value: 54.1068 - type: nauc_mrr_at_3_max value: 38.4312 - type: nauc_mrr_at_3_std value: 4.4437999999999995 - type: nauc_mrr_at_3_diff1 value: 49.0981 - type: nauc_mrr_at_5_max value: 38.8429 - type: nauc_mrr_at_5_std value: 4.7834 - type: nauc_mrr_at_5_diff1 value: 49.1564 - type: nauc_mrr_at_10_max value: 39.1657 - type: nauc_mrr_at_10_std value: 5.3785 - type: nauc_mrr_at_10_diff1 value: 49.0301 - type: nauc_mrr_at_20_max value: 39.1254 - type: nauc_mrr_at_20_std value: 5.6123 - type: nauc_mrr_at_20_diff1 value: 48.8663 - type: nauc_mrr_at_100_max value: 39.097 - type: nauc_mrr_at_100_std value: 5.6065 - type: nauc_mrr_at_100_diff1 value: 48.827799999999996 - type: nauc_mrr_at_1000_max value: 39.1157 - type: nauc_mrr_at_1000_std value: 5.6175999999999995 - type: nauc_mrr_at_1000_diff1 value: 48.8575 - type: 
main_score value: 42.353 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackTexRetrieval (default) revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: mteb/cqadupstack-tex metrics: - type: ndcg_at_1 value: 25.946 - type: ndcg_at_3 value: 31.463 - type: ndcg_at_5 value: 33.803 - type: ndcg_at_10 value: 36.55 - type: ndcg_at_20 value: 38.794000000000004 - type: ndcg_at_100 value: 42.327999999999996 - type: ndcg_at_1000 value: 44.783 - type: map_at_1 value: 21.217 - type: map_at_3 value: 27.882 - type: map_at_5 value: 29.537000000000003 - type: map_at_10 value: 30.848 - type: map_at_20 value: 31.574999999999996 - type: map_at_100 value: 32.173 - type: map_at_1000 value: 32.296 - type: recall_at_1 value: 21.217 - type: recall_at_3 value: 34.993 - type: recall_at_5 value: 41.028999999999996 - type: recall_at_10 value: 49.327 - type: recall_at_20 value: 57.50300000000001 - type: recall_at_100 value: 74.72 - type: recall_at_1000 value: 91.637 - type: precision_at_1 value: 25.946 - type: precision_at_3 value: 15.129999999999999 - type: precision_at_5 value: 10.991 - type: precision_at_10 value: 6.793 - type: precision_at_20 value: 4.076 - type: precision_at_100 value: 1.138 - type: precision_at_1000 value: 0.155 - type: mrr_at_1 value: 25.9463 - type: mrr_at_3 value: 32.4845 - type: mrr_at_5 value: 33.9642 - type: mrr_at_10 value: 35.0906 - type: mrr_at_20 value: 35.6346 - type: mrr_at_100 value: 36.0474 - type: mrr_at_1000 value: 36.1106 - type: nauc_ndcg_at_1_max value: 29.3294 - type: nauc_ndcg_at_1_std value: 1.9199000000000002 - type: nauc_ndcg_at_1_diff1 value: 43.9951 - type: nauc_ndcg_at_3_max value: 28.4154 - type: nauc_ndcg_at_3_std value: 2.262 - type: nauc_ndcg_at_3_diff1 value: 37.0416 - type: nauc_ndcg_at_5_max value: 29.0647 - type: nauc_ndcg_at_5_std value: 3.6863 - type: nauc_ndcg_at_5_diff1 value: 36.3715 - type: nauc_ndcg_at_10_max value: 29.0041 - type: nauc_ndcg_at_10_std value: 4.605 - type: nauc_ndcg_at_10_diff1 value: 36.1295 - type: nauc_ndcg_at_20_max value: 29.5425 - type: nauc_ndcg_at_20_std value: 5.5535 - type: nauc_ndcg_at_20_diff1 value: 35.74 - type: nauc_ndcg_at_100_max value: 30.1166 - type: nauc_ndcg_at_100_std value: 7.4285000000000005 - type: nauc_ndcg_at_100_diff1 value: 35.4871 - type: nauc_ndcg_at_1000_max value: 30.198900000000002 - type: nauc_ndcg_at_1000_std value: 6.6549 - type: nauc_ndcg_at_1000_diff1 value: 36.3901 - type: nauc_map_at_1_max value: 26.6761 - type: nauc_map_at_1_std value: -0.4332 - type: nauc_map_at_1_diff1 value: 46.015299999999996 - type: nauc_map_at_3_max value: 27.221 - type: nauc_map_at_3_std value: 1.3299999999999998 - type: nauc_map_at_3_diff1 value: 38.9882 - type: nauc_map_at_5_max value: 27.929900000000004 - type: nauc_map_at_5_std value: 2.1886 - type: nauc_map_at_5_diff1 value: 38.5184 - type: nauc_map_at_10_max value: 28.105599999999995 - type: nauc_map_at_10_std value: 2.6707 - type: nauc_map_at_10_diff1 value: 38.419599999999996 - type: nauc_map_at_20_max value: 28.359499999999997 - type: nauc_map_at_20_std value: 2.9859 - type: nauc_map_at_20_diff1 value: 38.2748 - type: nauc_map_at_100_max value: 28.5493 - type: nauc_map_at_100_std value: 3.3446999999999996 - type: nauc_map_at_100_diff1 value: 38.1789 - type: nauc_map_at_1000_max value: 28.5931 - type: nauc_map_at_1000_std value: 3.3341999999999996 - type: nauc_map_at_1000_diff1 value: 38.2276 - type: nauc_recall_at_1_max value: 26.6761 - type: nauc_recall_at_1_std value: -0.4332 - type: nauc_recall_at_1_diff1 value: 
46.015299999999996 - type: nauc_recall_at_3_max value: 26.0116 - type: nauc_recall_at_3_std value: 2.6044 - type: nauc_recall_at_3_diff1 value: 32.1201 - type: nauc_recall_at_5_max value: 27.361 - type: nauc_recall_at_5_std value: 5.6135 - type: nauc_recall_at_5_diff1 value: 29.807699999999997 - type: nauc_recall_at_10_max value: 26.885399999999997 - type: nauc_recall_at_10_std value: 8.1679 - type: nauc_recall_at_10_diff1 value: 28.283599999999996 - type: nauc_recall_at_20_max value: 28.5827 - type: nauc_recall_at_20_std value: 11.7346 - type: nauc_recall_at_20_diff1 value: 25.965 - type: nauc_recall_at_100_max value: 31.488100000000003 - type: nauc_recall_at_100_std value: 25.9126 - type: nauc_recall_at_100_diff1 value: 20.9561 - type: nauc_recall_at_1000_max value: 37.424 - type: nauc_recall_at_1000_std value: 35.7201 - type: nauc_recall_at_1000_diff1 value: 22.156100000000002 - type: nauc_precision_at_1_max value: 29.3294 - type: nauc_precision_at_1_std value: 1.9199000000000002 - type: nauc_precision_at_1_diff1 value: 43.9951 - type: nauc_precision_at_3_max value: 29.893700000000003 - type: nauc_precision_at_3_std value: 5.0083 - type: nauc_precision_at_3_diff1 value: 28.530499999999996 - type: nauc_precision_at_5_max value: 30.6624 - type: nauc_precision_at_5_std value: 8.098600000000001 - type: nauc_precision_at_5_diff1 value: 23.8478 - type: nauc_precision_at_10_max value: 28.407100000000003 - type: nauc_precision_at_10_std value: 10.852599999999999 - type: nauc_precision_at_10_diff1 value: 19.1175 - type: nauc_precision_at_20_max value: 26.045299999999997 - type: nauc_precision_at_20_std value: 12.898399999999999 - type: nauc_precision_at_20_diff1 value: 13.586599999999999 - type: nauc_precision_at_100_max value: 23.8686 - type: nauc_precision_at_100_std value: 16.558500000000002 - type: nauc_precision_at_100_diff1 value: 4.8838 - type: nauc_precision_at_1000_max value: 18.803900000000002 - type: nauc_precision_at_1000_std value: 8.252600000000001 - type: nauc_precision_at_1000_diff1 value: 3.4761 - type: nauc_mrr_at_1_max value: 29.3294 - type: nauc_mrr_at_1_std value: 1.9199000000000002 - type: nauc_mrr_at_1_diff1 value: 43.9951 - type: nauc_mrr_at_3_max value: 29.7689 - type: nauc_mrr_at_3_std value: 2.9381 - type: nauc_mrr_at_3_diff1 value: 39.0616 - type: nauc_mrr_at_5_max value: 30.0871 - type: nauc_mrr_at_5_std value: 3.7067 - type: nauc_mrr_at_5_diff1 value: 38.2429 - type: nauc_mrr_at_10_max value: 30.0444 - type: nauc_mrr_at_10_std value: 4.086399999999999 - type: nauc_mrr_at_10_diff1 value: 38.0941 - type: nauc_mrr_at_20_max value: 30.134499999999996 - type: nauc_mrr_at_20_std value: 4.288200000000001 - type: nauc_mrr_at_20_diff1 value: 38.048300000000005 - type: nauc_mrr_at_100_max value: 30.1624 - type: nauc_mrr_at_100_std value: 4.4486 - type: nauc_mrr_at_100_diff1 value: 38.067499999999995 - type: nauc_mrr_at_1000_max value: 30.168899999999997 - type: nauc_mrr_at_1000_std value: 4.4265 - type: nauc_mrr_at_1000_diff1 value: 38.0978 - type: main_score value: 36.55 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval (default) revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: mteb/cqadupstack-unix metrics: - type: ndcg_at_1 value: 40.111999999999995 - type: ndcg_at_3 value: 44.91 - type: ndcg_at_5 value: 48.048 - type: ndcg_at_10 value: 51.300000000000004 - type: ndcg_at_20 value: 53.537 - type: ndcg_at_100 value: 56.53399999999999 - type: ndcg_at_1000 value: 58.048 - type: map_at_1 value: 34.303 - type: map_at_3 
value: 41.43 - type: map_at_5 value: 43.633 - type: map_at_10 value: 45.312000000000005 - type: map_at_20 value: 46.04 - type: map_at_100 value: 46.563 - type: map_at_1000 value: 46.64 - type: recall_at_1 value: 34.303 - type: recall_at_3 value: 48.465 - type: recall_at_5 value: 56.374 - type: recall_at_10 value: 65.508 - type: recall_at_20 value: 73.457 - type: recall_at_100 value: 87.53 - type: recall_at_1000 value: 97.42 - type: precision_at_1 value: 40.111999999999995 - type: precision_at_3 value: 20.211000000000002 - type: precision_at_5 value: 14.496 - type: precision_at_10 value: 8.806 - type: precision_at_20 value: 5.047 - type: precision_at_100 value: 1.266 - type: precision_at_1000 value: 0.149 - type: mrr_at_1 value: 40.1119 - type: mrr_at_3 value: 46.1287 - type: mrr_at_5 value: 47.9011 - type: mrr_at_10 value: 49.0974 - type: mrr_at_20 value: 49.6541 - type: mrr_at_100 value: 49.9655 - type: mrr_at_1000 value: 50.0063 - type: nauc_ndcg_at_1_max value: 40.5521 - type: nauc_ndcg_at_1_std value: -7.457700000000001 - type: nauc_ndcg_at_1_diff1 value: 50.6505 - type: nauc_ndcg_at_3_max value: 38.696999999999996 - type: nauc_ndcg_at_3_std value: -4.2286 - type: nauc_ndcg_at_3_diff1 value: 44.289699999999996 - type: nauc_ndcg_at_5_max value: 39.6798 - type: nauc_ndcg_at_5_std value: -2.8316 - type: nauc_ndcg_at_5_diff1 value: 44.0944 - type: nauc_ndcg_at_10_max value: 40.5534 - type: nauc_ndcg_at_10_std value: -2.2217000000000002 - type: nauc_ndcg_at_10_diff1 value: 43.811299999999996 - type: nauc_ndcg_at_20_max value: 41.1096 - type: nauc_ndcg_at_20_std value: -1.5137 - type: nauc_ndcg_at_20_diff1 value: 43.7406 - type: nauc_ndcg_at_100_max value: 40.588 - type: nauc_ndcg_at_100_std value: -1.2616 - type: nauc_ndcg_at_100_diff1 value: 43.553 - type: nauc_ndcg_at_1000_max value: 40.86 - type: nauc_ndcg_at_1000_std value: -1.6507999999999998 - type: nauc_ndcg_at_1000_diff1 value: 44.1305 - type: nauc_map_at_1_max value: 36.9173 - type: nauc_map_at_1_std value: -8.2788 - type: nauc_map_at_1_diff1 value: 52.4203 - type: nauc_map_at_3_max value: 38.006499999999996 - type: nauc_map_at_3_std value: -5.5607 - type: nauc_map_at_3_diff1 value: 46.847 - type: nauc_map_at_5_max value: 39.1588 - type: nauc_map_at_5_std value: -4.6744 - type: nauc_map_at_5_diff1 value: 46.3773 - type: nauc_map_at_10_max value: 39.8953 - type: nauc_map_at_10_std value: -4.3361 - type: nauc_map_at_10_diff1 value: 46.1408 - type: nauc_map_at_20_max value: 40.1053 - type: nauc_map_at_20_std value: -4.1688 - type: nauc_map_at_20_diff1 value: 46.0601 - type: nauc_map_at_100_max value: 40.0756 - type: nauc_map_at_100_std value: -4.0973999999999995 - type: nauc_map_at_100_diff1 value: 46.0325 - type: nauc_map_at_1000_max value: 40.0894 - type: nauc_map_at_1000_std value: -4.0949 - type: nauc_map_at_1000_diff1 value: 46.048899999999996 - type: nauc_recall_at_1_max value: 36.9173 - type: nauc_recall_at_1_std value: -8.2788 - type: nauc_recall_at_1_diff1 value: 52.4203 - type: nauc_recall_at_3_max value: 35.2291 - type: nauc_recall_at_3_std value: -2.4944 - type: nauc_recall_at_3_diff1 value: 39.3066 - type: nauc_recall_at_5_max value: 37.2859 - type: nauc_recall_at_5_std value: 1.2917 - type: nauc_recall_at_5_diff1 value: 37.2158 - type: nauc_recall_at_10_max value: 38.9748 - type: nauc_recall_at_10_std value: 3.8526 - type: nauc_recall_at_10_diff1 value: 35.188 - type: nauc_recall_at_20_max value: 41.1368 - type: nauc_recall_at_20_std value: 8.1788 - type: nauc_recall_at_20_diff1 value: 33.8061 - type: 
nauc_recall_at_100_max value: 36.280499999999996 - type: nauc_recall_at_100_std value: 16.6693 - type: nauc_recall_at_100_diff1 value: 26.466 - type: nauc_recall_at_1000_max value: 57.084999999999994 - type: nauc_recall_at_1000_std value: 56.954499999999996 - type: nauc_recall_at_1000_diff1 value: 25.915300000000002 - type: nauc_precision_at_1_max value: 40.5521 - type: nauc_precision_at_1_std value: -7.457700000000001 - type: nauc_precision_at_1_diff1 value: 50.6505 - type: nauc_precision_at_3_max value: 36.2259 - type: nauc_precision_at_3_std value: 0.8514 - type: nauc_precision_at_3_diff1 value: 27.168300000000002 - type: nauc_precision_at_5_max value: 35.6781 - type: nauc_precision_at_5_std value: 5.119400000000001 - type: nauc_precision_at_5_diff1 value: 19.7828 - type: nauc_precision_at_10_max value: 29.9623 - type: nauc_precision_at_10_std value: 6.7059 - type: nauc_precision_at_10_diff1 value: 9.7104 - type: nauc_precision_at_20_max value: 26.2428 - type: nauc_precision_at_20_std value: 9.854000000000001 - type: nauc_precision_at_20_diff1 value: 2.6679999999999997 - type: nauc_precision_at_100_max value: 9.9456 - type: nauc_precision_at_100_std value: 12.465 - type: nauc_precision_at_100_diff1 value: -11.0348 - type: nauc_precision_at_1000_max value: -3.3062 - type: nauc_precision_at_1000_std value: 5.3786000000000005 - type: nauc_precision_at_1000_diff1 value: -18.712999999999997 - type: nauc_mrr_at_1_max value: 40.5521 - type: nauc_mrr_at_1_std value: -7.457700000000001 - type: nauc_mrr_at_1_diff1 value: 50.6505 - type: nauc_mrr_at_3_max value: 39.994 - type: nauc_mrr_at_3_std value: -4.4112 - type: nauc_mrr_at_3_diff1 value: 45.0963 - type: nauc_mrr_at_5_max value: 40.3926 - type: nauc_mrr_at_5_std value: -3.611 - type: nauc_mrr_at_5_diff1 value: 44.9505 - type: nauc_mrr_at_10_max value: 40.597 - type: nauc_mrr_at_10_std value: -3.5407 - type: nauc_mrr_at_10_diff1 value: 45.0605 - type: nauc_mrr_at_20_max value: 40.6821 - type: nauc_mrr_at_20_std value: -3.4132000000000002 - type: nauc_mrr_at_20_diff1 value: 45.1507 - type: nauc_mrr_at_100_max value: 40.6279 - type: nauc_mrr_at_100_std value: -3.4576000000000002 - type: nauc_mrr_at_100_diff1 value: 45.183299999999996 - type: nauc_mrr_at_1000_max value: 40.6436 - type: nauc_mrr_at_1000_std value: -3.4639 - type: nauc_mrr_at_1000_diff1 value: 45.2065 - type: main_score value: 51.300000000000004 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: mteb/cqadupstack-webmasters metrics: - type: ndcg_at_1 value: 36.364000000000004 - type: ndcg_at_3 value: 41.875 - type: ndcg_at_5 value: 44.316 - type: ndcg_at_10 value: 47.301 - type: ndcg_at_20 value: 50.059 - type: ndcg_at_100 value: 53.698 - type: ndcg_at_1000 value: 55.503 - type: map_at_1 value: 30.312 - type: map_at_3 value: 37.527 - type: map_at_5 value: 39.36 - type: map_at_10 value: 40.931 - type: map_at_20 value: 41.978 - type: map_at_100 value: 42.893 - type: map_at_1000 value: 43.120000000000005 - type: recall_at_1 value: 30.312 - type: recall_at_3 value: 44.251000000000005 - type: recall_at_5 value: 50.456999999999994 - type: recall_at_10 value: 59.418000000000006 - type: recall_at_20 value: 69.791 - type: recall_at_100 value: 86.56 - type: recall_at_1000 value: 97.41199999999999 - type: precision_at_1 value: 36.364000000000004 - type: precision_at_3 value: 19.499 - type: precision_at_5 value: 14.149999999999999 - type: precision_at_10 value: 9.032 - type: 
precision_at_20 value: 5.800000000000001 - type: precision_at_100 value: 1.806 - type: precision_at_1000 value: 0.258 - type: mrr_at_1 value: 36.3636 - type: mrr_at_3 value: 42.918299999999995 - type: mrr_at_5 value: 44.4302 - type: mrr_at_10 value: 45.677299999999995 - type: mrr_at_20 value: 46.372600000000006 - type: mrr_at_100 value: 46.7532 - type: mrr_at_1000 value: 46.786699999999996 - type: nauc_ndcg_at_1_max value: 36.5416 - type: nauc_ndcg_at_1_std value: 1.7398 - type: nauc_ndcg_at_1_diff1 value: 48.6149 - type: nauc_ndcg_at_3_max value: 35.9768 - type: nauc_ndcg_at_3_std value: 4.3271999999999995 - type: nauc_ndcg_at_3_diff1 value: 43.4812 - type: nauc_ndcg_at_5_max value: 34.9136 - type: nauc_ndcg_at_5_std value: 5.291300000000001 - type: nauc_ndcg_at_5_diff1 value: 42.4122 - type: nauc_ndcg_at_10_max value: 35.3659 - type: nauc_ndcg_at_10_std value: 6.8223 - type: nauc_ndcg_at_10_diff1 value: 42.123 - type: nauc_ndcg_at_20_max value: 37.302400000000006 - type: nauc_ndcg_at_20_std value: 7.836600000000001 - type: nauc_ndcg_at_20_diff1 value: 42.9609 - type: nauc_ndcg_at_100_max value: 38.028800000000004 - type: nauc_ndcg_at_100_std value: 9.065900000000001 - type: nauc_ndcg_at_100_diff1 value: 42.8557 - type: nauc_ndcg_at_1000_max value: 37.8805 - type: nauc_ndcg_at_1000_std value: 7.965800000000001 - type: nauc_ndcg_at_1000_diff1 value: 43.331399999999995 - type: nauc_map_at_1_max value: 32.5587 - type: nauc_map_at_1_std value: -2.3119 - type: nauc_map_at_1_diff1 value: 52.2244 - type: nauc_map_at_3_max value: 34.6582 - type: nauc_map_at_3_std value: 1.3005 - type: nauc_map_at_3_diff1 value: 46.774100000000004 - type: nauc_map_at_5_max value: 34.6492 - type: nauc_map_at_5_std value: 2.2614 - type: nauc_map_at_5_diff1 value: 45.9467 - type: nauc_map_at_10_max value: 35.4443 - type: nauc_map_at_10_std value: 3.7047999999999996 - type: nauc_map_at_10_diff1 value: 45.6336 - type: nauc_map_at_20_max value: 36.1327 - type: nauc_map_at_20_std value: 4.3156 - type: nauc_map_at_20_diff1 value: 45.7802 - type: nauc_map_at_100_max value: 36.4952 - type: nauc_map_at_100_std value: 4.9964 - type: nauc_map_at_100_diff1 value: 45.5278 - type: nauc_map_at_1000_max value: 36.3394 - type: nauc_map_at_1000_std value: 5.0168 - type: nauc_map_at_1000_diff1 value: 45.4435 - type: nauc_recall_at_1_max value: 32.5587 - type: nauc_recall_at_1_std value: -2.3119 - type: nauc_recall_at_1_diff1 value: 52.2244 - type: nauc_recall_at_3_max value: 32.2945 - type: nauc_recall_at_3_std value: 3.4591 - type: nauc_recall_at_3_diff1 value: 41.0871 - type: nauc_recall_at_5_max value: 29.422500000000003 - type: nauc_recall_at_5_std value: 5.3527 - type: nauc_recall_at_5_diff1 value: 36.7172 - type: nauc_recall_at_10_max value: 28.7964 - type: nauc_recall_at_10_std value: 10.3203 - type: nauc_recall_at_10_diff1 value: 32.9891 - type: nauc_recall_at_20_max value: 35.9088 - type: nauc_recall_at_20_std value: 17.483999999999998 - type: nauc_recall_at_20_diff1 value: 34.1214 - type: nauc_recall_at_100_max value: 40.5066 - type: nauc_recall_at_100_std value: 36.0042 - type: nauc_recall_at_100_diff1 value: 25.258999999999997 - type: nauc_recall_at_1000_max value: 68.16980000000001 - type: nauc_recall_at_1000_std value: 78.27300000000001 - type: nauc_recall_at_1000_diff1 value: 29.831200000000003 - type: nauc_precision_at_1_max value: 36.5416 - type: nauc_precision_at_1_std value: 1.7398 - type: nauc_precision_at_1_diff1 value: 48.6149 - type: nauc_precision_at_3_max value: 34.5475 - type: nauc_precision_at_3_std value: 
10.731300000000001 - type: nauc_precision_at_3_diff1 value: 26.6094 - type: nauc_precision_at_5_max value: 30.966300000000004 - type: nauc_precision_at_5_std value: 15.614700000000001 - type: nauc_precision_at_5_diff1 value: 16.3821 - type: nauc_precision_at_10_max value: 29.3082 - type: nauc_precision_at_10_std value: 22.2006 - type: nauc_precision_at_10_diff1 value: 6.5281 - type: nauc_precision_at_20_max value: 23.1867 - type: nauc_precision_at_20_std value: 21.5112 - type: nauc_precision_at_20_diff1 value: -2.1949 - type: nauc_precision_at_100_max value: 6.6039 - type: nauc_precision_at_100_std value: 14.7147 - type: nauc_precision_at_100_diff1 value: -14.2814 - type: nauc_precision_at_1000_max value: -7.7318 - type: nauc_precision_at_1000_std value: 8.0856 - type: nauc_precision_at_1000_diff1 value: -18.8738 - type: nauc_mrr_at_1_max value: 36.5416 - type: nauc_mrr_at_1_std value: 1.7398 - type: nauc_mrr_at_1_diff1 value: 48.6149 - type: nauc_mrr_at_3_max value: 37.4645 - type: nauc_mrr_at_3_std value: 4.7265 - type: nauc_mrr_at_3_diff1 value: 44.2832 - type: nauc_mrr_at_5_max value: 36.8872 - type: nauc_mrr_at_5_std value: 5.0895 - type: nauc_mrr_at_5_diff1 value: 43.1113 - type: nauc_mrr_at_10_max value: 37.1021 - type: nauc_mrr_at_10_std value: 5.7218 - type: nauc_mrr_at_10_diff1 value: 43.1786 - type: nauc_mrr_at_20_max value: 37.4827 - type: nauc_mrr_at_20_std value: 5.9467 - type: nauc_mrr_at_20_diff1 value: 43.4032 - type: nauc_mrr_at_100_max value: 37.3957 - type: nauc_mrr_at_100_std value: 5.9523 - type: nauc_mrr_at_100_diff1 value: 43.3725 - type: nauc_mrr_at_1000_max value: 37.3968 - type: nauc_mrr_at_1000_std value: 5.9475 - type: nauc_mrr_at_1000_diff1 value: 43.39 - type: main_score value: 47.301 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval (default) revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack-wordpress metrics: - type: ndcg_at_1 value: 25.692999999999998 - type: ndcg_at_3 value: 33.0 - type: ndcg_at_5 value: 35.736000000000004 - type: ndcg_at_10 value: 39.196 - type: ndcg_at_20 value: 40.954 - type: ndcg_at_100 value: 44.501000000000005 - type: ndcg_at_1000 value: 46.482 - type: map_at_1 value: 23.851 - type: map_at_3 value: 30.270999999999997 - type: map_at_5 value: 31.905 - type: map_at_10 value: 33.428999999999995 - type: map_at_20 value: 33.954 - type: map_at_100 value: 34.482 - type: map_at_1000 value: 34.57 - type: recall_at_1 value: 23.851 - type: recall_at_3 value: 38.435 - type: recall_at_5 value: 44.872 - type: recall_at_10 value: 55.035999999999994 - type: recall_at_20 value: 61.529999999999994 - type: recall_at_100 value: 79.592 - type: recall_at_1000 value: 94.283 - type: precision_at_1 value: 25.692999999999998 - type: precision_at_3 value: 14.295 - type: precision_at_5 value: 10.277 - type: precision_at_10 value: 6.433 - type: precision_at_20 value: 3.6510000000000002 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.128 - type: mrr_at_1 value: 25.6932 - type: mrr_at_3 value: 32.5323 - type: mrr_at_5 value: 34.0203 - type: mrr_at_10 value: 35.383199999999995 - type: mrr_at_20 value: 35.857499999999995 - type: mrr_at_100 value: 36.2947 - type: mrr_at_1000 value: 36.3456 - type: nauc_ndcg_at_1_max value: 26.3546 - type: nauc_ndcg_at_1_std value: -7.4308 - type: nauc_ndcg_at_1_diff1 value: 50.6893 - type: nauc_ndcg_at_3_max value: 22.5597 - type: nauc_ndcg_at_3_std value: -2.8253 - type: nauc_ndcg_at_3_diff1 value: 40.0339 - type: nauc_ndcg_at_5_max 
value: 23.4927 - type: nauc_ndcg_at_5_std value: -1.8110000000000002 - type: nauc_ndcg_at_5_diff1 value: 39.0747 - type: nauc_ndcg_at_10_max value: 22.7233 - type: nauc_ndcg_at_10_std value: -1.2677 - type: nauc_ndcg_at_10_diff1 value: 38.4587 - type: nauc_ndcg_at_20_max value: 22.9465 - type: nauc_ndcg_at_20_std value: 0.4223 - type: nauc_ndcg_at_20_diff1 value: 38.5424 - type: nauc_ndcg_at_100_max value: 24.7307 - type: nauc_ndcg_at_100_std value: 2.7405 - type: nauc_ndcg_at_100_diff1 value: 40.0211 - type: nauc_ndcg_at_1000_max value: 24.7978 - type: nauc_ndcg_at_1000_std value: 1.6664999999999999 - type: nauc_ndcg_at_1000_diff1 value: 39.629799999999996 - type: nauc_map_at_1_max value: 23.119 - type: nauc_map_at_1_std value: -8.1386 - type: nauc_map_at_1_diff1 value: 50.166999999999994 - type: nauc_map_at_3_max value: 21.9643 - type: nauc_map_at_3_std value: -4.1963 - type: nauc_map_at_3_diff1 value: 42.0253 - type: nauc_map_at_5_max value: 23.0779 - type: nauc_map_at_5_std value: -3.4221000000000004 - type: nauc_map_at_5_diff1 value: 41.6497 - type: nauc_map_at_10_max value: 23.0936 - type: nauc_map_at_10_std value: -3.107 - type: nauc_map_at_10_diff1 value: 41.5032 - type: nauc_map_at_20_max value: 23.2453 - type: nauc_map_at_20_std value: -2.5267999999999997 - type: nauc_map_at_20_diff1 value: 41.5085 - type: nauc_map_at_100_max value: 23.552899999999998 - type: nauc_map_at_100_std value: -2.0514 - type: nauc_map_at_100_diff1 value: 41.686499999999995 - type: nauc_map_at_1000_max value: 23.5502 - type: nauc_map_at_1000_std value: -2.0632 - type: nauc_map_at_1000_diff1 value: 41.634 - type: nauc_recall_at_1_max value: 23.119 - type: nauc_recall_at_1_std value: -8.1386 - type: nauc_recall_at_1_diff1 value: 50.166999999999994 - type: nauc_recall_at_3_max value: 19.128700000000002 - type: nauc_recall_at_3_std value: -1.2884 - type: nauc_recall_at_3_diff1 value: 33.1893 - type: nauc_recall_at_5_max value: 20.7852 - type: nauc_recall_at_5_std value: 0.9754 - type: nauc_recall_at_5_diff1 value: 31.193199999999997 - type: nauc_recall_at_10_max value: 17.5569 - type: nauc_recall_at_10_std value: 2.5935 - type: nauc_recall_at_10_diff1 value: 28.5192 - type: nauc_recall_at_20_max value: 17.4543 - type: nauc_recall_at_20_std value: 8.694799999999999 - type: nauc_recall_at_20_diff1 value: 28.171200000000002 - type: nauc_recall_at_100_max value: 26.873399999999997 - type: nauc_recall_at_100_std value: 29.0878 - type: nauc_recall_at_100_diff1 value: 34.204 - type: nauc_recall_at_1000_max value: 40.9752 - type: nauc_recall_at_1000_std value: 42.8325 - type: nauc_recall_at_1000_diff1 value: 20.0664 - type: nauc_precision_at_1_max value: 26.3546 - type: nauc_precision_at_1_std value: -7.4308 - type: nauc_precision_at_1_diff1 value: 50.6893 - type: nauc_precision_at_3_max value: 25.078699999999998 - type: nauc_precision_at_3_std value: 3.0139 - type: nauc_precision_at_3_diff1 value: 31.566899999999997 - type: nauc_precision_at_5_max value: 29.1348 - type: nauc_precision_at_5_std value: 7.7597 - type: nauc_precision_at_5_diff1 value: 26.599899999999998 - type: nauc_precision_at_10_max value: 27.019 - type: nauc_precision_at_10_std value: 11.0219 - type: nauc_precision_at_10_diff1 value: 20.9546 - type: nauc_precision_at_20_max value: 27.994200000000003 - type: nauc_precision_at_20_std value: 19.3372 - type: nauc_precision_at_20_diff1 value: 17.363400000000002 - type: nauc_precision_at_100_max value: 27.3087 - type: nauc_precision_at_100_std value: 30.3297 - type: nauc_precision_at_100_diff1 value: 
6.2596 - type: nauc_precision_at_1000_max value: 9.347800000000001 - type: nauc_precision_at_1000_std value: 20.6006 - type: nauc_precision_at_1000_diff1 value: -20.9861 - type: nauc_mrr_at_1_max value: 26.3546 - type: nauc_mrr_at_1_std value: -7.4308 - type: nauc_mrr_at_1_diff1 value: 50.6893 - type: nauc_mrr_at_3_max value: 25.746799999999997 - type: nauc_mrr_at_3_std value: -2.9107000000000003 - type: nauc_mrr_at_3_diff1 value: 43.0073 - type: nauc_mrr_at_5_max value: 25.956400000000002 - type: nauc_mrr_at_5_std value: -2.3782 - type: nauc_mrr_at_5_diff1 value: 42.2507 - type: nauc_mrr_at_10_max value: 25.2046 - type: nauc_mrr_at_10_std value: -2.3678999999999997 - type: nauc_mrr_at_10_diff1 value: 41.834700000000005 - type: nauc_mrr_at_20_max value: 25.1774 - type: nauc_mrr_at_20_std value: -1.9298 - type: nauc_mrr_at_20_diff1 value: 41.8803 - type: nauc_mrr_at_100_max value: 25.4455 - type: nauc_mrr_at_100_std value: -1.6853 - type: nauc_mrr_at_100_diff1 value: 42.159 - type: nauc_mrr_at_1000_max value: 25.433899999999998 - type: nauc_mrr_at_1000_std value: -1.7311 - type: nauc_mrr_at_1000_diff1 value: 42.159 - type: main_score value: 39.196 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER (default) revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: ndcg_at_1 value: 32.573 - type: ndcg_at_3 value: 27.683000000000003 - type: ndcg_at_5 value: 29.537999999999997 - type: ndcg_at_10 value: 33.15 - type: ndcg_at_20 value: 35.564 - type: ndcg_at_100 value: 39.898 - type: ndcg_at_1000 value: 43.151 - type: map_at_1 value: 14.57 - type: map_at_3 value: 20.346 - type: map_at_5 value: 22.228 - type: map_at_10 value: 24.102 - type: map_at_20 value: 24.992 - type: map_at_100 value: 25.826 - type: map_at_1000 value: 26.021 - type: recall_at_1 value: 14.57 - type: recall_at_3 value: 25.245 - type: recall_at_5 value: 30.820999999999998 - type: recall_at_10 value: 38.824999999999996 - type: recall_at_20 value: 45.553 - type: recall_at_100 value: 62.236999999999995 - type: recall_at_1000 value: 80.22 - type: precision_at_1 value: 32.573 - type: precision_at_3 value: 20.347 - type: precision_at_5 value: 15.504999999999999 - type: precision_at_10 value: 10.176 - type: precision_at_20 value: 6.1339999999999995 - type: precision_at_100 value: 1.754 - type: precision_at_1000 value: 0.23600000000000002 - type: mrr_at_1 value: 32.573299999999996 - type: mrr_at_3 value: 41.259499999999996 - type: mrr_at_5 value: 43.3116 - type: mrr_at_10 value: 44.4113 - type: mrr_at_20 value: 44.8728 - type: mrr_at_100 value: 45.1757 - type: mrr_at_1000 value: 45.2086 - type: nauc_ndcg_at_1_max value: 36.065799999999996 - type: nauc_ndcg_at_1_std value: 17.1124 - type: nauc_ndcg_at_1_diff1 value: 27.985 - type: nauc_ndcg_at_3_max value: 36.5467 - type: nauc_ndcg_at_3_std value: 16.403100000000002 - type: nauc_ndcg_at_3_diff1 value: 22.1601 - type: nauc_ndcg_at_5_max value: 37.223099999999995 - type: nauc_ndcg_at_5_std value: 18.767300000000002 - type: nauc_ndcg_at_5_diff1 value: 20.6143 - type: nauc_ndcg_at_10_max value: 36.8331 - type: nauc_ndcg_at_10_std value: 20.8315 - type: nauc_ndcg_at_10_diff1 value: 19.5716 - type: nauc_ndcg_at_20_max value: 36.5592 - type: nauc_ndcg_at_20_std value: 21.4874 - type: nauc_ndcg_at_20_diff1 value: 18.4099 - type: nauc_ndcg_at_100_max value: 35.6711 - type: nauc_ndcg_at_100_std value: 22.4637 - type: nauc_ndcg_at_100_diff1 value: 18.218500000000002 - type: nauc_ndcg_at_1000_max value: 36.209599999999995 - type: 
nauc_ndcg_at_1000_std value: 23.3913 - type: nauc_ndcg_at_1000_diff1 value: 19.055 - type: nauc_map_at_1_max value: 40.6157 - type: nauc_map_at_1_std value: 13.0776 - type: nauc_map_at_1_diff1 value: 30.4958 - type: nauc_map_at_3_max value: 38.3227 - type: nauc_map_at_3_std value: 14.2807 - type: nauc_map_at_3_diff1 value: 23.7558 - type: nauc_map_at_5_max value: 37.9312 - type: nauc_map_at_5_std value: 16.206899999999997 - type: nauc_map_at_5_diff1 value: 22.4312 - type: nauc_map_at_10_max value: 37.7457 - type: nauc_map_at_10_std value: 17.7945 - type: nauc_map_at_10_diff1 value: 21.607000000000003 - type: nauc_map_at_20_max value: 37.727199999999996 - type: nauc_map_at_20_std value: 18.168100000000003 - type: nauc_map_at_20_diff1 value: 21.1277 - type: nauc_map_at_100_max value: 37.5139 - type: nauc_map_at_100_std value: 18.4244 - type: nauc_map_at_100_diff1 value: 21.082600000000003 - type: nauc_map_at_1000_max value: 37.5088 - type: nauc_map_at_1000_std value: 18.4879 - type: nauc_map_at_1000_diff1 value: 21.1075 - type: nauc_recall_at_1_max value: 40.6157 - type: nauc_recall_at_1_std value: 13.0776 - type: nauc_recall_at_1_diff1 value: 30.4958 - type: nauc_recall_at_3_max value: 34.0823 - type: nauc_recall_at_3_std value: 14.2898 - type: nauc_recall_at_3_diff1 value: 17.8174 - type: nauc_recall_at_5_max value: 33.244099999999996 - type: nauc_recall_at_5_std value: 18.2196 - type: nauc_recall_at_5_diff1 value: 14.2718 - type: nauc_recall_at_10_max value: 30.6448 - type: nauc_recall_at_10_std value: 21.323700000000002 - type: nauc_recall_at_10_diff1 value: 11.6099 - type: nauc_recall_at_20_max value: 28.523 - type: nauc_recall_at_20_std value: 21.9056 - type: nauc_recall_at_20_diff1 value: 8.0707 - type: nauc_recall_at_100_max value: 22.836000000000002 - type: nauc_recall_at_100_std value: 24.8746 - type: nauc_recall_at_100_diff1 value: 5.333600000000001 - type: nauc_recall_at_1000_max value: 26.124000000000002 - type: nauc_recall_at_1000_std value: 35.6489 - type: nauc_recall_at_1000_diff1 value: 8.5269 - type: nauc_precision_at_1_max value: 36.065799999999996 - type: nauc_precision_at_1_std value: 17.1124 - type: nauc_precision_at_1_diff1 value: 27.985 - type: nauc_precision_at_3_max value: 29.9743 - type: nauc_precision_at_3_std value: 19.4935 - type: nauc_precision_at_3_diff1 value: 13.7319 - type: nauc_precision_at_5_max value: 26.3111 - type: nauc_precision_at_5_std value: 23.7512 - type: nauc_precision_at_5_diff1 value: 8.945699999999999 - type: nauc_precision_at_10_max value: 20.5867 - type: nauc_precision_at_10_std value: 24.1781 - type: nauc_precision_at_10_diff1 value: 4.716200000000001 - type: nauc_precision_at_20_max value: 16.9009 - type: nauc_precision_at_20_std value: 23.561799999999998 - type: nauc_precision_at_20_diff1 value: 0.26 - type: nauc_precision_at_100_max value: 5.6875 - type: nauc_precision_at_100_std value: 20.5293 - type: nauc_precision_at_100_diff1 value: -3.4817 - type: nauc_precision_at_1000_max value: -2.25 - type: nauc_precision_at_1000_std value: 17.2366 - type: nauc_precision_at_1000_diff1 value: -4.9703 - type: nauc_mrr_at_1_max value: 36.065799999999996 - type: nauc_mrr_at_1_std value: 17.1124 - type: nauc_mrr_at_1_diff1 value: 27.985 - type: nauc_mrr_at_3_max value: 35.9316 - type: nauc_mrr_at_3_std value: 19.3246 - type: nauc_mrr_at_3_diff1 value: 23.6033 - type: nauc_mrr_at_5_max value: 36.581 - type: nauc_mrr_at_5_std value: 20.3626 - type: nauc_mrr_at_5_diff1 value: 23.1952 - type: nauc_mrr_at_10_max value: 36.5789 - type: nauc_mrr_at_10_std 
value: 20.6594 - type: nauc_mrr_at_10_diff1 value: 23.3078 - type: nauc_mrr_at_20_max value: 36.4621 - type: nauc_mrr_at_20_std value: 20.5731 - type: nauc_mrr_at_20_diff1 value: 23.253899999999998 - type: nauc_mrr_at_100_max value: 36.3788 - type: nauc_mrr_at_100_std value: 20.5076 - type: nauc_mrr_at_100_diff1 value: 23.1904 - type: nauc_mrr_at_1000_max value: 36.383500000000005 - type: nauc_mrr_at_1000_std value: 20.505399999999998 - type: nauc_mrr_at_1000_diff1 value: 23.2106 - type: main_score value: 33.15 task: type: Retrieval - dataset: config: default name: MTEB CodeFeedbackMT (default) revision: b0f12fa0c0dd67f59c95a5c33d02aeeb4c398c5f split: test type: CoIR-Retrieval/codefeedback-mt metrics: - type: ndcg_at_1 value: 30.270000000000003 - type: ndcg_at_3 value: 37.797 - type: ndcg_at_5 value: 40.147 - type: ndcg_at_10 value: 42.136 - type: ndcg_at_20 value: 43.655 - type: ndcg_at_100 value: 45.95 - type: ndcg_at_1000 value: 47.510999999999996 - type: map_at_1 value: 30.270000000000003 - type: map_at_3 value: 35.949 - type: map_at_5 value: 37.254 - type: map_at_10 value: 38.076 - type: map_at_20 value: 38.492 - type: map_at_100 value: 38.805 - type: map_at_1000 value: 38.858 - type: recall_at_1 value: 30.270000000000003 - type: recall_at_3 value: 43.142 - type: recall_at_5 value: 48.844 - type: recall_at_10 value: 54.99000000000001 - type: recall_at_20 value: 61.007999999999996 - type: recall_at_100 value: 73.443 - type: recall_at_1000 value: 86.066 - type: precision_at_1 value: 30.270000000000003 - type: precision_at_3 value: 14.381 - type: precision_at_5 value: 9.769 - type: precision_at_10 value: 5.499 - type: precision_at_20 value: 3.05 - type: precision_at_100 value: 0.734 - type: precision_at_1000 value: 0.086 - type: mrr_at_1 value: 30.2704 - type: mrr_at_3 value: 35.9494 - type: mrr_at_5 value: 37.2539 - type: mrr_at_10 value: 38.0763 - type: mrr_at_20 value: 38.4916 - type: mrr_at_100 value: 38.8047 - type: mrr_at_1000 value: 38.8578 - type: nauc_ndcg_at_1_max value: 13.1327 - type: nauc_ndcg_at_1_std value: -20.450599999999998 - type: nauc_ndcg_at_1_diff1 value: 53.905800000000006 - type: nauc_ndcg_at_3_max value: 15.181000000000001 - type: nauc_ndcg_at_3_std value: -20.877399999999998 - type: nauc_ndcg_at_3_diff1 value: 49.1269 - type: nauc_ndcg_at_5_max value: 15.7972 - type: nauc_ndcg_at_5_std value: -20.6361 - type: nauc_ndcg_at_5_diff1 value: 47.826800000000006 - type: nauc_ndcg_at_10_max value: 16.4268 - type: nauc_ndcg_at_10_std value: -20.0384 - type: nauc_ndcg_at_10_diff1 value: 47.0914 - type: nauc_ndcg_at_20_max value: 17.1004 - type: nauc_ndcg_at_20_std value: -18.9344 - type: nauc_ndcg_at_20_diff1 value: 46.6149 - type: nauc_ndcg_at_100_max value: 17.6904 - type: nauc_ndcg_at_100_std value: -17.1856 - type: nauc_ndcg_at_100_diff1 value: 46.3637 - type: nauc_ndcg_at_1000_max value: 17.5049 - type: nauc_ndcg_at_1000_std value: -16.7834 - type: nauc_ndcg_at_1000_diff1 value: 46.5672 - type: nauc_map_at_1_max value: 13.1327 - type: nauc_map_at_1_std value: -20.450599999999998 - type: nauc_map_at_1_diff1 value: 53.905800000000006 - type: nauc_map_at_3_max value: 14.723500000000001 - type: nauc_map_at_3_std value: -20.7922 - type: nauc_map_at_3_diff1 value: 50.275000000000006 - type: nauc_map_at_5_max value: 15.061399999999999 - type: nauc_map_at_5_std value: -20.6704 - type: nauc_map_at_5_diff1 value: 49.5612 - type: nauc_map_at_10_max value: 15.292900000000001 - type: nauc_map_at_10_std value: -20.4431 - type: nauc_map_at_10_diff1 value: 49.2676 - type: 
nauc_map_at_20_max value: 15.4694 - type: nauc_map_at_20_std value: -20.1497 - type: nauc_map_at_20_diff1 value: 49.1538 - type: nauc_map_at_100_max value: 15.5383 - type: nauc_map_at_100_std value: -19.9266 - type: nauc_map_at_100_diff1 value: 49.1303 - type: nauc_map_at_1000_max value: 15.5348 - type: nauc_map_at_1000_std value: -19.9076 - type: nauc_map_at_1000_diff1 value: 49.138799999999996 - type: nauc_recall_at_1_max value: 13.1327 - type: nauc_recall_at_1_std value: -20.450599999999998 - type: nauc_recall_at_1_diff1 value: 53.905800000000006 - type: nauc_recall_at_3_max value: 16.467599999999997 - type: nauc_recall_at_3_std value: -21.1125 - type: nauc_recall_at_3_diff1 value: 45.8636 - type: nauc_recall_at_5_max value: 17.996699999999997 - type: nauc_recall_at_5_std value: -20.4801 - type: nauc_recall_at_5_diff1 value: 42.6329 - type: nauc_recall_at_10_max value: 20.258100000000002 - type: nauc_recall_at_10_std value: -18.4556 - type: nauc_recall_at_10_diff1 value: 39.9989 - type: nauc_recall_at_20_max value: 23.4684 - type: nauc_recall_at_20_std value: -13.5326 - type: nauc_recall_at_20_diff1 value: 37.3551 - type: nauc_recall_at_100_max value: 29.868499999999997 - type: nauc_recall_at_100_std value: 1.2361 - type: nauc_recall_at_100_diff1 value: 32.6178 - type: nauc_recall_at_1000_max value: 34.7721 - type: nauc_recall_at_1000_std value: 21.076700000000002 - type: nauc_recall_at_1000_diff1 value: 26.4002 - type: nauc_precision_at_1_max value: 13.1327 - type: nauc_precision_at_1_std value: -20.450599999999998 - type: nauc_precision_at_1_diff1 value: 53.905800000000006 - type: nauc_precision_at_3_max value: 16.467599999999997 - type: nauc_precision_at_3_std value: -21.1125 - type: nauc_precision_at_3_diff1 value: 45.8636 - type: nauc_precision_at_5_max value: 17.996699999999997 - type: nauc_precision_at_5_std value: -20.4801 - type: nauc_precision_at_5_diff1 value: 42.6329 - type: nauc_precision_at_10_max value: 20.258100000000002 - type: nauc_precision_at_10_std value: -18.4556 - type: nauc_precision_at_10_diff1 value: 39.9989 - type: nauc_precision_at_20_max value: 23.4684 - type: nauc_precision_at_20_std value: -13.5326 - type: nauc_precision_at_20_diff1 value: 37.3551 - type: nauc_precision_at_100_max value: 29.868499999999997 - type: nauc_precision_at_100_std value: 1.2361 - type: nauc_precision_at_100_diff1 value: 32.6178 - type: nauc_precision_at_1000_max value: 34.7721 - type: nauc_precision_at_1000_std value: 21.076700000000002 - type: nauc_precision_at_1000_diff1 value: 26.4002 - type: nauc_mrr_at_1_max value: 13.1327 - type: nauc_mrr_at_1_std value: -20.450599999999998 - type: nauc_mrr_at_1_diff1 value: 53.905800000000006 - type: nauc_mrr_at_3_max value: 14.723500000000001 - type: nauc_mrr_at_3_std value: -20.7922 - type: nauc_mrr_at_3_diff1 value: 50.275000000000006 - type: nauc_mrr_at_5_max value: 15.061399999999999 - type: nauc_mrr_at_5_std value: -20.6704 - type: nauc_mrr_at_5_diff1 value: 49.5612 - type: nauc_mrr_at_10_max value: 15.292900000000001 - type: nauc_mrr_at_10_std value: -20.4431 - type: nauc_mrr_at_10_diff1 value: 49.2676 - type: nauc_mrr_at_20_max value: 15.4694 - type: nauc_mrr_at_20_std value: -20.1497 - type: nauc_mrr_at_20_diff1 value: 49.1538 - type: nauc_mrr_at_100_max value: 15.5383 - type: nauc_mrr_at_100_std value: -19.9266 - type: nauc_mrr_at_100_diff1 value: 49.1303 - type: nauc_mrr_at_1000_max value: 15.5348 - type: nauc_mrr_at_1000_std value: -19.9076 - type: nauc_mrr_at_1000_diff1 value: 49.138799999999996 - type: main_score value: 42.136 
task: type: Retrieval - dataset: config: default name: MTEB CodeFeedbackST (default) revision: d213819e87aab9010628da8b73ab4eb337c89340 split: test type: CoIR-Retrieval/codefeedback-st metrics: - type: ndcg_at_1 value: 59.621 - type: ndcg_at_3 value: 71.255 - type: ndcg_at_5 value: 73.71 - type: ndcg_at_10 value: 75.276 - type: ndcg_at_20 value: 76.115 - type: ndcg_at_100 value: 76.91900000000001 - type: ndcg_at_1000 value: 77.172 - type: map_at_1 value: 59.621 - type: map_at_3 value: 68.449 - type: map_at_5 value: 69.817 - type: map_at_10 value: 70.474 - type: map_at_20 value: 70.707 - type: map_at_100 value: 70.82300000000001 - type: map_at_1000 value: 70.833 - type: recall_at_1 value: 59.621 - type: recall_at_3 value: 79.352 - type: recall_at_5 value: 85.28999999999999 - type: recall_at_10 value: 90.079 - type: recall_at_20 value: 93.372 - type: recall_at_100 value: 97.649 - type: recall_at_1000 value: 99.604 - type: precision_at_1 value: 59.621 - type: precision_at_3 value: 26.451 - type: precision_at_5 value: 17.058 - type: precision_at_10 value: 9.008 - type: precision_at_20 value: 4.6690000000000005 - type: precision_at_100 value: 0.976 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 59.5796 - type: mrr_at_3 value: 68.42190000000001 - type: mrr_at_5 value: 69.8065 - type: mrr_at_10 value: 70.4563 - type: mrr_at_20 value: 70.69 - type: mrr_at_100 value: 70.80539999999999 - type: mrr_at_1000 value: 70.8155 - type: nauc_ndcg_at_1_max value: 1.0058 - type: nauc_ndcg_at_1_std value: -28.633999999999997 - type: nauc_ndcg_at_1_diff1 value: 74.2731 - type: nauc_ndcg_at_3_max value: 5.9328 - type: nauc_ndcg_at_3_std value: -33.4034 - type: nauc_ndcg_at_3_diff1 value: 69.0612 - type: nauc_ndcg_at_5_max value: 6.3485 - type: nauc_ndcg_at_5_std value: -33.4167 - type: nauc_ndcg_at_5_diff1 value: 68.9449 - type: nauc_ndcg_at_10_max value: 6.0459 - type: nauc_ndcg_at_10_std value: -32.6233 - type: nauc_ndcg_at_10_diff1 value: 69.0512 - type: nauc_ndcg_at_20_max value: 5.8008 - type: nauc_ndcg_at_20_std value: -32.0714 - type: nauc_ndcg_at_20_diff1 value: 69.5449 - type: nauc_ndcg_at_100_max value: 5.5014 - type: nauc_ndcg_at_100_std value: -31.5492 - type: nauc_ndcg_at_100_diff1 value: 69.9543 - type: nauc_ndcg_at_1000_max value: 5.2358 - type: nauc_ndcg_at_1000_std value: -31.638899999999996 - type: nauc_ndcg_at_1000_diff1 value: 70.0955 - type: nauc_map_at_1_max value: 1.0058 - type: nauc_map_at_1_std value: -28.633999999999997 - type: nauc_map_at_1_diff1 value: 74.2731 - type: nauc_map_at_3_max value: 4.5532 - type: nauc_map_at_3_std value: -32.0989 - type: nauc_map_at_3_diff1 value: 70.47879999999999 - type: nauc_map_at_5_max value: 4.7025 - type: nauc_map_at_5_std value: -32.0494 - type: nauc_map_at_5_diff1 value: 70.4832 - type: nauc_map_at_10_max value: 4.5632 - type: nauc_map_at_10_std value: -31.750899999999998 - type: nauc_map_at_10_diff1 value: 70.556 - type: nauc_map_at_20_max value: 4.4907 - type: nauc_map_at_20_std value: -31.6179 - type: nauc_map_at_20_diff1 value: 70.6865 - type: nauc_map_at_100_max value: 4.4536 - type: nauc_map_at_100_std value: -31.5575 - type: nauc_map_at_100_diff1 value: 70.7379 - type: nauc_map_at_1000_max value: 4.4467 - type: nauc_map_at_1000_std value: -31.557000000000002 - type: nauc_map_at_1000_diff1 value: 70.7424 - type: nauc_recall_at_1_max value: 1.0058 - type: nauc_recall_at_1_std value: -28.633999999999997 - type: nauc_recall_at_1_diff1 value: 74.2731 - type: nauc_recall_at_3_max value: 11.3291 - type: nauc_recall_at_3_std value: 
-38.4878 - type: nauc_recall_at_3_diff1 value: 63.5405 - type: nauc_recall_at_5_max value: 14.802499999999998 - type: nauc_recall_at_5_std value: -40.3304 - type: nauc_recall_at_5_diff1 value: 61.142300000000006 - type: nauc_recall_at_10_max value: 16.3095 - type: nauc_recall_at_10_std value: -37.9007 - type: nauc_recall_at_10_diff1 value: 58.5604 - type: nauc_recall_at_20_max value: 18.5464 - type: nauc_recall_at_20_std value: -33.8926 - type: nauc_recall_at_20_diff1 value: 59.15709999999999 - type: nauc_recall_at_100_max value: 28.231499999999997 - type: nauc_recall_at_100_std value: -14.0739 - type: nauc_recall_at_100_diff1 value: 58.1862 - type: nauc_recall_at_1000_max value: 35.3579 - type: nauc_recall_at_1000_std value: 27.673 - type: nauc_recall_at_1000_diff1 value: 53.6523 - type: nauc_precision_at_1_max value: 1.0058 - type: nauc_precision_at_1_std value: -28.633999999999997 - type: nauc_precision_at_1_diff1 value: 74.2731 - type: nauc_precision_at_3_max value: 11.3291 - type: nauc_precision_at_3_std value: -38.4878 - type: nauc_precision_at_3_diff1 value: 63.5405 - type: nauc_precision_at_5_max value: 14.802499999999998 - type: nauc_precision_at_5_std value: -40.3304 - type: nauc_precision_at_5_diff1 value: 61.142300000000006 - type: nauc_precision_at_10_max value: 16.3095 - type: nauc_precision_at_10_std value: -37.9007 - type: nauc_precision_at_10_diff1 value: 58.5604 - type: nauc_precision_at_20_max value: 18.5464 - type: nauc_precision_at_20_std value: -33.8926 - type: nauc_precision_at_20_diff1 value: 59.15709999999999 - type: nauc_precision_at_100_max value: 28.231499999999997 - type: nauc_precision_at_100_std value: -14.0739 - type: nauc_precision_at_100_diff1 value: 58.1862 - type: nauc_precision_at_1000_max value: 35.3579 - type: nauc_precision_at_1000_std value: 27.673 - type: nauc_precision_at_1000_diff1 value: 53.6523 - type: nauc_mrr_at_1_max value: 0.4596 - type: nauc_mrr_at_1_std value: -28.4399 - type: nauc_mrr_at_1_diff1 value: 74.32849999999999 - type: nauc_mrr_at_3_max value: 4.2199 - type: nauc_mrr_at_3_std value: -31.9909 - type: nauc_mrr_at_3_diff1 value: 70.5363 - type: nauc_mrr_at_5_max value: 4.3676 - type: nauc_mrr_at_5_std value: -31.947599999999998 - type: nauc_mrr_at_5_diff1 value: 70.5144 - type: nauc_mrr_at_10_max value: 4.2149 - type: nauc_mrr_at_10_std value: -31.647 - type: nauc_mrr_at_10_diff1 value: 70.598 - type: nauc_mrr_at_20_max value: 4.1426 - type: nauc_mrr_at_20_std value: -31.513799999999996 - type: nauc_mrr_at_20_diff1 value: 70.729 - type: nauc_mrr_at_100_max value: 4.104 - type: nauc_mrr_at_100_std value: -31.451800000000002 - type: nauc_mrr_at_100_diff1 value: 70.7809 - type: nauc_mrr_at_1000_max value: 4.0969999999999995 - type: nauc_mrr_at_1000_std value: -31.4513 - type: nauc_mrr_at_1000_diff1 value: 70.78529999999999 - type: main_score value: 75.276 task: type: Retrieval - dataset: config: python name: MTEB CodeSearchNetCCRetrieval (python) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 36.955 - type: ndcg_at_3 value: 46.436 - type: ndcg_at_5 value: 49.055 - type: ndcg_at_10 value: 51.408 - type: ndcg_at_20 value: 52.93600000000001 - type: ndcg_at_100 value: 55.089999999999996 - type: ndcg_at_1000 value: 56.406 - type: map_at_1 value: 36.955 - type: map_at_3 value: 44.112 - type: map_at_5 value: 45.565 - type: map_at_10 value: 46.538000000000004 - type: map_at_20 value: 46.958 - type: map_at_100 value: 47.253 - type: map_at_1000 value: 
47.298 - type: recall_at_1 value: 36.955 - type: recall_at_3 value: 53.157 - type: recall_at_5 value: 59.519 - type: recall_at_10 value: 66.78500000000001 - type: recall_at_20 value: 72.82499999999999 - type: recall_at_100 value: 84.482 - type: recall_at_1000 value: 95.06599999999999 - type: precision_at_1 value: 36.955 - type: precision_at_3 value: 17.718999999999998 - type: precision_at_5 value: 11.904 - type: precision_at_10 value: 6.679 - type: precision_at_20 value: 3.641 - type: precision_at_100 value: 0.845 - type: precision_at_1000 value: 0.095 - type: mrr_at_1 value: 36.9487 - type: mrr_at_3 value: 44.1044 - type: mrr_at_5 value: 45.556999999999995 - type: mrr_at_10 value: 46.531 - type: mrr_at_20 value: 46.9517 - type: mrr_at_100 value: 47.246300000000005 - type: mrr_at_1000 value: 47.2918 - type: nauc_ndcg_at_1_max value: 30.887500000000003 - type: nauc_ndcg_at_1_std value: -5.4391 - type: nauc_ndcg_at_1_diff1 value: 53.215199999999996 - type: nauc_ndcg_at_3_max value: 31.4697 - type: nauc_ndcg_at_3_std value: -5.3775 - type: nauc_ndcg_at_3_diff1 value: 48.6991 - type: nauc_ndcg_at_5_max value: 31.4647 - type: nauc_ndcg_at_5_std value: -5.022 - type: nauc_ndcg_at_5_diff1 value: 48.0297 - type: nauc_ndcg_at_10_max value: 31.5139 - type: nauc_ndcg_at_10_std value: -4.3081000000000005 - type: nauc_ndcg_at_10_diff1 value: 47.6012 - type: nauc_ndcg_at_20_max value: 31.4083 - type: nauc_ndcg_at_20_std value: -3.7769999999999997 - type: nauc_ndcg_at_20_diff1 value: 47.4673 - type: nauc_ndcg_at_100_max value: 31.432100000000002 - type: nauc_ndcg_at_100_std value: -3.3629 - type: nauc_ndcg_at_100_diff1 value: 47.5608 - type: nauc_ndcg_at_1000_max value: 31.521500000000003 - type: nauc_ndcg_at_1000_std value: -3.4922 - type: nauc_ndcg_at_1000_diff1 value: 47.997299999999996 - type: nauc_map_at_1_max value: 30.887500000000003 - type: nauc_map_at_1_std value: -5.4391 - type: nauc_map_at_1_diff1 value: 53.215199999999996 - type: nauc_map_at_3_max value: 31.3321 - type: nauc_map_at_3_std value: -5.3912 - type: nauc_map_at_3_diff1 value: 49.7525 - type: nauc_map_at_5_max value: 31.324600000000004 - type: nauc_map_at_5_std value: -5.197100000000001 - type: nauc_map_at_5_diff1 value: 49.4028 - type: nauc_map_at_10_max value: 31.3398 - type: nauc_map_at_10_std value: -4.9248 - type: nauc_map_at_10_diff1 value: 49.2583 - type: nauc_map_at_20_max value: 31.309199999999997 - type: nauc_map_at_20_std value: -4.7903 - type: nauc_map_at_20_diff1 value: 49.2312 - type: nauc_map_at_100_max value: 31.305 - type: nauc_map_at_100_std value: -4.7492 - type: nauc_map_at_100_diff1 value: 49.2452 - type: nauc_map_at_1000_max value: 31.3077 - type: nauc_map_at_1000_std value: -4.7505 - type: nauc_map_at_1000_diff1 value: 49.2596 - type: nauc_recall_at_1_max value: 30.887500000000003 - type: nauc_recall_at_1_std value: -5.4391 - type: nauc_recall_at_1_diff1 value: 53.215199999999996 - type: nauc_recall_at_3_max value: 31.877899999999997 - type: nauc_recall_at_3_std value: -5.3372 - type: nauc_recall_at_3_diff1 value: 45.5796 - type: nauc_recall_at_5_max value: 31.9064 - type: nauc_recall_at_5_std value: -4.4158 - type: nauc_recall_at_5_diff1 value: 43.6238 - type: nauc_recall_at_10_max value: 32.1625 - type: nauc_recall_at_10_std value: -1.6879000000000002 - type: nauc_recall_at_10_diff1 value: 41.4155 - type: nauc_recall_at_20_max value: 31.7318 - type: nauc_recall_at_20_std value: 1.4794 - type: nauc_recall_at_20_diff1 value: 39.7822 - type: nauc_recall_at_100_max value: 32.399899999999995 - type: 
nauc_recall_at_100_std value: 9.331299999999999 - type: nauc_recall_at_100_diff1 value: 36.4089 - type: nauc_recall_at_1000_max value: 38.488299999999995 - type: nauc_recall_at_1000_std value: 26.7544 - type: nauc_recall_at_1000_diff1 value: 34.8223 - type: nauc_precision_at_1_max value: 30.887500000000003 - type: nauc_precision_at_1_std value: -5.4391 - type: nauc_precision_at_1_diff1 value: 53.215199999999996 - type: nauc_precision_at_3_max value: 31.877899999999997 - type: nauc_precision_at_3_std value: -5.3372 - type: nauc_precision_at_3_diff1 value: 45.5796 - type: nauc_precision_at_5_max value: 31.9064 - type: nauc_precision_at_5_std value: -4.4158 - type: nauc_precision_at_5_diff1 value: 43.6238 - type: nauc_precision_at_10_max value: 32.1625 - type: nauc_precision_at_10_std value: -1.6879000000000002 - type: nauc_precision_at_10_diff1 value: 41.4155 - type: nauc_precision_at_20_max value: 31.7318 - type: nauc_precision_at_20_std value: 1.4794 - type: nauc_precision_at_20_diff1 value: 39.7822 - type: nauc_precision_at_100_max value: 32.399899999999995 - type: nauc_precision_at_100_std value: 9.331299999999999 - type: nauc_precision_at_100_diff1 value: 36.4089 - type: nauc_precision_at_1000_max value: 38.488299999999995 - type: nauc_precision_at_1000_std value: 26.7544 - type: nauc_precision_at_1000_diff1 value: 34.8223 - type: nauc_mrr_at_1_max value: 30.950899999999997 - type: nauc_mrr_at_1_std value: -5.4719 - type: nauc_mrr_at_1_diff1 value: 53.235699999999994 - type: nauc_mrr_at_3_max value: 31.374000000000002 - type: nauc_mrr_at_3_std value: -5.4241 - type: nauc_mrr_at_3_diff1 value: 49.7741 - type: nauc_mrr_at_5_max value: 31.3677 - type: nauc_mrr_at_5_std value: -5.2233 - type: nauc_mrr_at_5_diff1 value: 49.4223 - type: nauc_mrr_at_10_max value: 31.3811 - type: nauc_mrr_at_10_std value: -4.952100000000001 - type: nauc_mrr_at_10_diff1 value: 49.2782 - type: nauc_mrr_at_20_max value: 31.3498 - type: nauc_mrr_at_20_std value: -4.8186 - type: nauc_mrr_at_20_diff1 value: 49.2501 - type: nauc_mrr_at_100_max value: 31.3459 - type: nauc_mrr_at_100_std value: -4.7777 - type: nauc_mrr_at_100_diff1 value: 49.2643 - type: nauc_mrr_at_1000_max value: 31.3487 - type: nauc_mrr_at_1000_std value: -4.779 - type: nauc_mrr_at_1000_diff1 value: 49.2787 - type: main_score value: 51.408 task: type: Retrieval - dataset: config: javascript name: MTEB CodeSearchNetCCRetrieval (javascript) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 38.833 - type: ndcg_at_3 value: 47.698 - type: ndcg_at_5 value: 49.964999999999996 - type: ndcg_at_10 value: 52.035 - type: ndcg_at_20 value: 53.49 - type: ndcg_at_100 value: 55.696999999999996 - type: ndcg_at_1000 value: 57.037000000000006 - type: map_at_1 value: 38.833 - type: map_at_3 value: 45.559 - type: map_at_5 value: 46.817 - type: map_at_10 value: 47.675 - type: map_at_20 value: 48.079 - type: map_at_100 value: 48.375 - type: map_at_1000 value: 48.42 - type: recall_at_1 value: 38.833 - type: recall_at_3 value: 53.874 - type: recall_at_5 value: 59.374 - type: recall_at_10 value: 65.755 - type: recall_at_20 value: 71.468 - type: recall_at_100 value: 83.5 - type: recall_at_1000 value: 94.348 - type: precision_at_1 value: 38.833 - type: precision_at_3 value: 17.958 - type: precision_at_5 value: 11.875 - type: precision_at_10 value: 6.576 - type: precision_at_20 value: 3.573 - type: precision_at_100 value: 0.835 - type: precision_at_1000 value: 0.094 - type: mrr_at_1 value: 
38.8332 - type: mrr_at_3 value: 45.5485 - type: mrr_at_5 value: 46.814 - type: mrr_at_10 value: 47.6716 - type: mrr_at_20 value: 48.0761 - type: mrr_at_100 value: 48.3716 - type: mrr_at_1000 value: 48.4167 - type: nauc_ndcg_at_1_max value: 26.1449 - type: nauc_ndcg_at_1_std value: -10.991299999999999 - type: nauc_ndcg_at_1_diff1 value: 55.970299999999995 - type: nauc_ndcg_at_3_max value: 29.7447 - type: nauc_ndcg_at_3_std value: -9.610299999999999 - type: nauc_ndcg_at_3_diff1 value: 52.031499999999994 - type: nauc_ndcg_at_5_max value: 29.1562 - type: nauc_ndcg_at_5_std value: -9.288499999999999 - type: nauc_ndcg_at_5_diff1 value: 50.8454 - type: nauc_ndcg_at_10_max value: 28.1795 - type: nauc_ndcg_at_10_std value: -9.5992 - type: nauc_ndcg_at_10_diff1 value: 50.6937 - type: nauc_ndcg_at_20_max value: 27.8613 - type: nauc_ndcg_at_20_std value: -9.425500000000001 - type: nauc_ndcg_at_20_diff1 value: 50.5688 - type: nauc_ndcg_at_100_max value: 27.9792 - type: nauc_ndcg_at_100_std value: -8.792300000000001 - type: nauc_ndcg_at_100_diff1 value: 50.868500000000004 - type: nauc_ndcg_at_1000_max value: 28.0666 - type: nauc_ndcg_at_1000_std value: -8.928899999999999 - type: nauc_ndcg_at_1000_diff1 value: 51.1663 - type: nauc_map_at_1_max value: 26.1449 - type: nauc_map_at_1_std value: -10.991299999999999 - type: nauc_map_at_1_diff1 value: 55.970299999999995 - type: nauc_map_at_3_max value: 28.921799999999998 - type: nauc_map_at_3_std value: -9.9782 - type: nauc_map_at_3_diff1 value: 52.965700000000005 - type: nauc_map_at_5_max value: 28.575899999999997 - type: nauc_map_at_5_std value: -9.822799999999999 - type: nauc_map_at_5_diff1 value: 52.32790000000001 - type: nauc_map_at_10_max value: 28.1738 - type: nauc_map_at_10_std value: -9.933300000000001 - type: nauc_map_at_10_diff1 value: 52.26690000000001 - type: nauc_map_at_20_max value: 28.0844 - type: nauc_map_at_20_std value: -9.8925 - type: nauc_map_at_20_diff1 value: 52.2407 - type: nauc_map_at_100_max value: 28.0938 - type: nauc_map_at_100_std value: -9.8258 - type: nauc_map_at_100_diff1 value: 52.2776 - type: nauc_map_at_1000_max value: 28.092299999999998 - type: nauc_map_at_1000_std value: -9.832 - type: nauc_map_at_1000_diff1 value: 52.2874 - type: nauc_recall_at_1_max value: 26.1449 - type: nauc_recall_at_1_std value: -10.991299999999999 - type: nauc_recall_at_1_diff1 value: 55.970299999999995 - type: nauc_recall_at_3_max value: 32.1929 - type: nauc_recall_at_3_std value: -8.491200000000001 - type: nauc_recall_at_3_diff1 value: 49.2364 - type: nauc_recall_at_5_max value: 30.8852 - type: nauc_recall_at_5_std value: -7.518700000000001 - type: nauc_recall_at_5_diff1 value: 46.004400000000004 - type: nauc_recall_at_10_max value: 27.6397 - type: nauc_recall_at_10_std value: -8.5506 - type: nauc_recall_at_10_diff1 value: 45.012299999999996 - type: nauc_recall_at_20_max value: 26.026300000000003 - type: nauc_recall_at_20_std value: -7.5049 - type: nauc_recall_at_20_diff1 value: 43.6556 - type: nauc_recall_at_100_max value: 26.3742 - type: nauc_recall_at_100_std value: 0.46940000000000004 - type: nauc_recall_at_100_diff1 value: 43.1361 - type: nauc_recall_at_1000_max value: 28.3536 - type: nauc_recall_at_1000_std value: 11.2799 - type: nauc_recall_at_1000_diff1 value: 41.8369 - type: nauc_precision_at_1_max value: 26.1449 - type: nauc_precision_at_1_std value: -10.991299999999999 - type: nauc_precision_at_1_diff1 value: 55.970299999999995 - type: nauc_precision_at_3_max value: 32.1929 - type: nauc_precision_at_3_std value: -8.491200000000001 - type: 
nauc_precision_at_3_diff1 value: 49.2364 - type: nauc_precision_at_5_max value: 30.8852 - type: nauc_precision_at_5_std value: -7.518700000000001 - type: nauc_precision_at_5_diff1 value: 46.004400000000004 - type: nauc_precision_at_10_max value: 27.6397 - type: nauc_precision_at_10_std value: -8.5506 - type: nauc_precision_at_10_diff1 value: 45.012299999999996 - type: nauc_precision_at_20_max value: 26.026300000000003 - type: nauc_precision_at_20_std value: -7.5049 - type: nauc_precision_at_20_diff1 value: 43.6556 - type: nauc_precision_at_100_max value: 26.3742 - type: nauc_precision_at_100_std value: 0.46940000000000004 - type: nauc_precision_at_100_diff1 value: 43.1361 - type: nauc_precision_at_1000_max value: 28.3536 - type: nauc_precision_at_1000_std value: 11.2799 - type: nauc_precision_at_1000_diff1 value: 41.8369 - type: nauc_mrr_at_1_max value: 26.1449 - type: nauc_mrr_at_1_std value: -10.991299999999999 - type: nauc_mrr_at_1_diff1 value: 55.970299999999995 - type: nauc_mrr_at_3_max value: 28.9026 - type: nauc_mrr_at_3_std value: -10.0274 - type: nauc_mrr_at_3_diff1 value: 52.9705 - type: nauc_mrr_at_5_max value: 28.571 - type: nauc_mrr_at_5_std value: -9.8353 - type: nauc_mrr_at_5_diff1 value: 52.3292 - type: nauc_mrr_at_10_max value: 28.169300000000003 - type: nauc_mrr_at_10_std value: -9.945500000000001 - type: nauc_mrr_at_10_diff1 value: 52.2672 - type: nauc_mrr_at_20_max value: 28.079900000000002 - type: nauc_mrr_at_20_std value: -9.9048 - type: nauc_mrr_at_20_diff1 value: 52.24100000000001 - type: nauc_mrr_at_100_max value: 28.0893 - type: nauc_mrr_at_100_std value: -9.8382 - type: nauc_mrr_at_100_diff1 value: 52.2779 - type: nauc_mrr_at_1000_max value: 28.0878 - type: nauc_mrr_at_1000_std value: -9.8445 - type: nauc_mrr_at_1000_diff1 value: 52.2877 - type: main_score value: 52.035 task: type: Retrieval - dataset: config: go name: MTEB CodeSearchNetCCRetrieval (go) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 27.259 - type: ndcg_at_3 value: 34.537 - type: ndcg_at_5 value: 36.658 - type: ndcg_at_10 value: 38.749 - type: ndcg_at_20 value: 40.439 - type: ndcg_at_100 value: 43.021 - type: ndcg_at_1000 value: 44.909 - type: map_at_1 value: 27.259 - type: map_at_3 value: 32.738 - type: map_at_5 value: 33.916000000000004 - type: map_at_10 value: 34.787 - type: map_at_20 value: 35.253 - type: map_at_100 value: 35.597 - type: map_at_1000 value: 35.66 - type: recall_at_1 value: 27.259 - type: recall_at_3 value: 39.744 - type: recall_at_5 value: 44.89 - type: recall_at_10 value: 51.317 - type: recall_at_20 value: 57.99100000000001 - type: recall_at_100 value: 72.088 - type: recall_at_1000 value: 87.368 - type: precision_at_1 value: 27.259 - type: precision_at_3 value: 13.248 - type: precision_at_5 value: 8.978 - type: precision_at_10 value: 5.132 - type: precision_at_20 value: 2.9000000000000004 - type: precision_at_100 value: 0.721 - type: precision_at_1000 value: 0.087 - type: mrr_at_1 value: 27.247 - type: mrr_at_3 value: 32.73 - type: mrr_at_5 value: 33.9188 - type: mrr_at_10 value: 34.7795 - type: mrr_at_20 value: 35.2462 - type: mrr_at_100 value: 35.5904 - type: mrr_at_1000 value: 35.654 - type: nauc_ndcg_at_1_max value: 26.4086 - type: nauc_ndcg_at_1_std value: -2.9711000000000003 - type: nauc_ndcg_at_1_diff1 value: 51.946099999999994 - type: nauc_ndcg_at_3_max value: 25.4155 - type: nauc_ndcg_at_3_std value: -2.8535999999999997 - type: nauc_ndcg_at_3_diff1 value: 46.7669 - type: 
nauc_ndcg_at_5_max value: 25.0238 - type: nauc_ndcg_at_5_std value: -2.5973 - type: nauc_ndcg_at_5_diff1 value: 46.2719 - type: nauc_ndcg_at_10_max value: 24.3719 - type: nauc_ndcg_at_10_std value: -2.4239 - type: nauc_ndcg_at_10_diff1 value: 45.5531 - type: nauc_ndcg_at_20_max value: 24.2915 - type: nauc_ndcg_at_20_std value: -2.0365 - type: nauc_ndcg_at_20_diff1 value: 45.290200000000006 - type: nauc_ndcg_at_100_max value: 23.9849 - type: nauc_ndcg_at_100_std value: -1.1925 - type: nauc_ndcg_at_100_diff1 value: 45.1382 - type: nauc_ndcg_at_1000_max value: 24.3502 - type: nauc_ndcg_at_1000_std value: -0.7086 - type: nauc_ndcg_at_1000_diff1 value: 45.550200000000004 - type: nauc_map_at_1_max value: 26.4086 - type: nauc_map_at_1_std value: -2.9711000000000003 - type: nauc_map_at_1_diff1 value: 51.946099999999994 - type: nauc_map_at_3_max value: 25.6581 - type: nauc_map_at_3_std value: -2.8928 - type: nauc_map_at_3_diff1 value: 47.9103 - type: nauc_map_at_5_max value: 25.438699999999997 - type: nauc_map_at_5_std value: -2.759 - type: nauc_map_at_5_diff1 value: 47.6395 - type: nauc_map_at_10_max value: 25.167299999999997 - type: nauc_map_at_10_std value: -2.6864 - type: nauc_map_at_10_diff1 value: 47.335100000000004 - type: nauc_map_at_20_max value: 25.1492 - type: nauc_map_at_20_std value: -2.5978000000000003 - type: nauc_map_at_20_diff1 value: 47.2833 - type: nauc_map_at_100_max value: 25.094499999999996 - type: nauc_map_at_100_std value: -2.5058000000000002 - type: nauc_map_at_100_diff1 value: 47.2631 - type: nauc_map_at_1000_max value: 25.105100000000004 - type: nauc_map_at_1000_std value: -2.4873 - type: nauc_map_at_1000_diff1 value: 47.279900000000005 - type: nauc_recall_at_1_max value: 26.4086 - type: nauc_recall_at_1_std value: -2.9711000000000003 - type: nauc_recall_at_1_diff1 value: 51.946099999999994 - type: nauc_recall_at_3_max value: 24.743499999999997 - type: nauc_recall_at_3_std value: -2.7411000000000003 - type: nauc_recall_at_3_diff1 value: 43.6461 - type: nauc_recall_at_5_max value: 23.8105 - type: nauc_recall_at_5_std value: -2.0951 - type: nauc_recall_at_5_diff1 value: 42.4182 - type: nauc_recall_at_10_max value: 21.7867 - type: nauc_recall_at_10_std value: -1.5507 - type: nauc_recall_at_10_diff1 value: 40.1507 - type: nauc_recall_at_20_max value: 21.264 - type: nauc_recall_at_20_std value: 0.2463 - type: nauc_recall_at_20_diff1 value: 38.5714 - type: nauc_recall_at_100_max value: 18.4525 - type: nauc_recall_at_100_std value: 7.3066 - type: nauc_recall_at_100_diff1 value: 35.585 - type: nauc_recall_at_1000_max value: 20.769299999999998 - type: nauc_recall_at_1000_std value: 24.6752 - type: nauc_recall_at_1000_diff1 value: 34.4382 - type: nauc_precision_at_1_max value: 26.4086 - type: nauc_precision_at_1_std value: -2.9711000000000003 - type: nauc_precision_at_1_diff1 value: 51.946099999999994 - type: nauc_precision_at_3_max value: 24.743499999999997 - type: nauc_precision_at_3_std value: -2.7411000000000003 - type: nauc_precision_at_3_diff1 value: 43.6461 - type: nauc_precision_at_5_max value: 23.8105 - type: nauc_precision_at_5_std value: -2.0951 - type: nauc_precision_at_5_diff1 value: 42.4182 - type: nauc_precision_at_10_max value: 21.7867 - type: nauc_precision_at_10_std value: -1.5507 - type: nauc_precision_at_10_diff1 value: 40.1507 - type: nauc_precision_at_20_max value: 21.264 - type: nauc_precision_at_20_std value: 0.2463 - type: nauc_precision_at_20_diff1 value: 38.5714 - type: nauc_precision_at_100_max value: 18.4525 - type: nauc_precision_at_100_std value: 
7.3066 - type: nauc_precision_at_100_diff1 value: 35.585 - type: nauc_precision_at_1000_max value: 20.769299999999998 - type: nauc_precision_at_1000_std value: 24.6752 - type: nauc_precision_at_1000_diff1 value: 34.4382 - type: nauc_mrr_at_1_max value: 26.4631 - type: nauc_mrr_at_1_std value: -2.9343999999999997 - type: nauc_mrr_at_1_diff1 value: 51.9943 - type: nauc_mrr_at_3_max value: 25.695 - type: nauc_mrr_at_3_std value: -2.8865 - type: nauc_mrr_at_3_diff1 value: 47.948299999999996 - type: nauc_mrr_at_5_max value: 25.461 - type: nauc_mrr_at_5_std value: -2.7289999999999996 - type: nauc_mrr_at_5_diff1 value: 47.6623 - type: nauc_mrr_at_10_max value: 25.1963 - type: nauc_mrr_at_10_std value: -2.6818999999999997 - type: nauc_mrr_at_10_diff1 value: 47.374500000000005 - type: nauc_mrr_at_20_max value: 25.178800000000003 - type: nauc_mrr_at_20_std value: -2.5887000000000002 - type: nauc_mrr_at_20_diff1 value: 47.3199 - type: nauc_mrr_at_100_max value: 25.1241 - type: nauc_mrr_at_100_std value: -2.4967 - type: nauc_mrr_at_100_diff1 value: 47.2999 - type: nauc_mrr_at_1000_max value: 25.134800000000002 - type: nauc_mrr_at_1000_std value: -2.4783 - type: nauc_mrr_at_1000_diff1 value: 47.3167 - type: main_score value: 38.749 task: type: Retrieval - dataset: config: ruby name: MTEB CodeSearchNetCCRetrieval (ruby) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 40.92 - type: ndcg_at_3 value: 49.364999999999995 - type: ndcg_at_5 value: 51.654999999999994 - type: ndcg_at_10 value: 53.169999999999995 - type: ndcg_at_20 value: 54.64 - type: ndcg_at_100 value: 56.974000000000004 - type: ndcg_at_1000 value: 58.306999999999995 - type: map_at_1 value: 40.92 - type: map_at_3 value: 47.343 - type: map_at_5 value: 48.616 - type: map_at_10 value: 49.242000000000004 - type: map_at_20 value: 49.647999999999996 - type: map_at_100 value: 49.97 - type: map_at_1000 value: 50.017999999999994 - type: recall_at_1 value: 40.92 - type: recall_at_3 value: 55.193999999999996 - type: recall_at_5 value: 60.745000000000005 - type: recall_at_10 value: 65.424 - type: recall_at_20 value: 71.21300000000001 - type: recall_at_100 value: 83.822 - type: recall_at_1000 value: 94.44900000000001 - type: precision_at_1 value: 40.92 - type: precision_at_3 value: 18.398 - type: precision_at_5 value: 12.149000000000001 - type: precision_at_10 value: 6.542000000000001 - type: precision_at_20 value: 3.5610000000000004 - type: precision_at_100 value: 0.8380000000000001 - type: precision_at_1000 value: 0.094 - type: mrr_at_1 value: 40.9199 - type: mrr_at_3 value: 47.3434 - type: mrr_at_5 value: 48.6162 - type: mrr_at_10 value: 49.2421 - type: mrr_at_20 value: 49.6524 - type: mrr_at_100 value: 49.9694 - type: mrr_at_1000 value: 50.017999999999994 - type: nauc_ndcg_at_1_max value: 28.5367 - type: nauc_ndcg_at_1_std value: -8.2024 - type: nauc_ndcg_at_1_diff1 value: 59.920399999999994 - type: nauc_ndcg_at_3_max value: 29.583399999999997 - type: nauc_ndcg_at_3_std value: -10.276499999999999 - type: nauc_ndcg_at_3_diff1 value: 53.3108 - type: nauc_ndcg_at_5_max value: 29.124299999999998 - type: nauc_ndcg_at_5_std value: -9.9282 - type: nauc_ndcg_at_5_diff1 value: 53.1591 - type: nauc_ndcg_at_10_max value: 28.778599999999997 - type: nauc_ndcg_at_10_std value: -10.319799999999999 - type: nauc_ndcg_at_10_diff1 value: 53.244499999999995 - type: nauc_ndcg_at_20_max value: 28.8719 - type: nauc_ndcg_at_20_std value: -9.7272 - type: nauc_ndcg_at_20_diff1 value: 53.3575 
- type: nauc_ndcg_at_100_max value: 28.8624 - type: nauc_ndcg_at_100_std value: -9.3621 - type: nauc_ndcg_at_100_diff1 value: 53.322599999999994 - type: nauc_ndcg_at_1000_max value: 28.876400000000004 - type: nauc_ndcg_at_1000_std value: -9.3757 - type: nauc_ndcg_at_1000_diff1 value: 53.5029 - type: nauc_map_at_1_max value: 28.5367 - type: nauc_map_at_1_std value: -8.2024 - type: nauc_map_at_1_diff1 value: 59.920399999999994 - type: nauc_map_at_3_max value: 29.373500000000003 - type: nauc_map_at_3_std value: -9.7647 - type: nauc_map_at_3_diff1 value: 54.8768 - type: nauc_map_at_5_max value: 29.1429 - type: nauc_map_at_5_std value: -9.5913 - type: nauc_map_at_5_diff1 value: 54.8183 - type: nauc_map_at_10_max value: 29.0079 - type: nauc_map_at_10_std value: -9.7633 - type: nauc_map_at_10_diff1 value: 54.87180000000001 - type: nauc_map_at_20_max value: 29.004 - type: nauc_map_at_20_std value: -9.609399999999999 - type: nauc_map_at_20_diff1 value: 54.8733 - type: nauc_map_at_100_max value: 28.961100000000002 - type: nauc_map_at_100_std value: -9.586500000000001 - type: nauc_map_at_100_diff1 value: 54.85719999999999 - type: nauc_map_at_1000_max value: 28.957 - type: nauc_map_at_1000_std value: -9.5861 - type: nauc_map_at_1000_diff1 value: 54.8685 - type: nauc_recall_at_1_max value: 28.5367 - type: nauc_recall_at_1_std value: -8.2024 - type: nauc_recall_at_1_diff1 value: 59.920399999999994 - type: nauc_recall_at_3_max value: 30.198900000000002 - type: nauc_recall_at_3_std value: -11.8281 - type: nauc_recall_at_3_diff1 value: 48.5911 - type: nauc_recall_at_5_max value: 28.938000000000002 - type: nauc_recall_at_5_std value: -10.9165 - type: nauc_recall_at_5_diff1 value: 47.8612 - type: nauc_recall_at_10_max value: 27.6793 - type: nauc_recall_at_10_std value: -12.281400000000001 - type: nauc_recall_at_10_diff1 value: 47.665400000000005 - type: nauc_recall_at_20_max value: 28.2941 - type: nauc_recall_at_20_std value: -9.5387 - type: nauc_recall_at_20_diff1 value: 47.875 - type: nauc_recall_at_100_max value: 29.1692 - type: nauc_recall_at_100_std value: -4.8877999999999995 - type: nauc_recall_at_100_diff1 value: 44.8146 - type: nauc_recall_at_1000_max value: 32.1351 - type: nauc_recall_at_1000_std value: 2.178 - type: nauc_recall_at_1000_diff1 value: 35.842600000000004 - type: nauc_precision_at_1_max value: 28.5367 - type: nauc_precision_at_1_std value: -8.2024 - type: nauc_precision_at_1_diff1 value: 59.920399999999994 - type: nauc_precision_at_3_max value: 30.198900000000002 - type: nauc_precision_at_3_std value: -11.8281 - type: nauc_precision_at_3_diff1 value: 48.5911 - type: nauc_precision_at_5_max value: 28.938000000000002 - type: nauc_precision_at_5_std value: -10.9165 - type: nauc_precision_at_5_diff1 value: 47.8612 - type: nauc_precision_at_10_max value: 27.6793 - type: nauc_precision_at_10_std value: -12.281400000000001 - type: nauc_precision_at_10_diff1 value: 47.665400000000005 - type: nauc_precision_at_20_max value: 28.2941 - type: nauc_precision_at_20_std value: -9.5387 - type: nauc_precision_at_20_diff1 value: 47.875 - type: nauc_precision_at_100_max value: 29.1692 - type: nauc_precision_at_100_std value: -4.8877999999999995 - type: nauc_precision_at_100_diff1 value: 44.8146 - type: nauc_precision_at_1000_max value: 32.1351 - type: nauc_precision_at_1000_std value: 2.178 - type: nauc_precision_at_1000_diff1 value: 35.842600000000004 - type: nauc_mrr_at_1_max value: 28.6205 - type: nauc_mrr_at_1_std value: -8.180900000000001 - type: nauc_mrr_at_1_diff1 value: 59.920399999999994 - type: 
nauc_mrr_at_3_max value: 29.416900000000002 - type: nauc_mrr_at_3_std value: -9.7536 - type: nauc_mrr_at_3_diff1 value: 54.8768 - type: nauc_mrr_at_5_max value: 29.187 - type: nauc_mrr_at_5_std value: -9.58 - type: nauc_mrr_at_5_diff1 value: 54.8183 - type: nauc_mrr_at_10_max value: 29.0523 - type: nauc_mrr_at_10_std value: -9.7519 - type: nauc_mrr_at_10_diff1 value: 54.87180000000001 - type: nauc_mrr_at_20_max value: 29.0395 - type: nauc_mrr_at_20_std value: -9.5921 - type: nauc_mrr_at_20_diff1 value: 54.8737 - type: nauc_mrr_at_100_max value: 29.0069 - type: nauc_mrr_at_100_std value: -9.5772 - type: nauc_mrr_at_100_diff1 value: 54.8585 - type: nauc_mrr_at_1000_max value: 29.0016 - type: nauc_mrr_at_1000_std value: -9.574399999999999 - type: nauc_mrr_at_1000_diff1 value: 54.8686 - type: main_score value: 53.169999999999995 task: type: Retrieval - dataset: config: java name: MTEB CodeSearchNetCCRetrieval (java) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 38.01 - type: ndcg_at_3 value: 46.611999999999995 - type: ndcg_at_5 value: 48.644999999999996 - type: ndcg_at_10 value: 50.722 - type: ndcg_at_20 value: 52.168000000000006 - type: ndcg_at_100 value: 54.284 - type: ndcg_at_1000 value: 55.64 - type: map_at_1 value: 38.01 - type: map_at_3 value: 44.529 - type: map_at_5 value: 45.657 - type: map_at_10 value: 46.522999999999996 - type: map_at_20 value: 46.921 - type: map_at_100 value: 47.21 - type: map_at_1000 value: 47.257 - type: recall_at_1 value: 38.01 - type: recall_at_3 value: 52.624 - type: recall_at_5 value: 57.562999999999995 - type: recall_at_10 value: 63.943000000000005 - type: recall_at_20 value: 69.649 - type: recall_at_100 value: 81.114 - type: recall_at_1000 value: 92.03099999999999 - type: precision_at_1 value: 38.01 - type: precision_at_3 value: 17.541 - type: precision_at_5 value: 11.513 - type: precision_at_10 value: 6.394 - type: precision_at_20 value: 3.4819999999999998 - type: precision_at_100 value: 0.8109999999999999 - type: precision_at_1000 value: 0.092 - type: mrr_at_1 value: 38.0739 - type: mrr_at_3 value: 44.5626 - type: mrr_at_5 value: 45.6863 - type: mrr_at_10 value: 46.5541 - type: mrr_at_20 value: 46.9528 - type: mrr_at_100 value: 47.2419 - type: mrr_at_1000 value: 47.2883 - type: nauc_ndcg_at_1_max value: 29.1715 - type: nauc_ndcg_at_1_std value: -8.383799999999999 - type: nauc_ndcg_at_1_diff1 value: 56.6392 - type: nauc_ndcg_at_3_max value: 31.600499999999997 - type: nauc_ndcg_at_3_std value: -6.8286 - type: nauc_ndcg_at_3_diff1 value: 51.9436 - type: nauc_ndcg_at_5_max value: 31.446099999999998 - type: nauc_ndcg_at_5_std value: -6.3155 - type: nauc_ndcg_at_5_diff1 value: 51.4265 - type: nauc_ndcg_at_10_max value: 31.484 - type: nauc_ndcg_at_10_std value: -5.7347 - type: nauc_ndcg_at_10_diff1 value: 51.254 - type: nauc_ndcg_at_20_max value: 31.5004 - type: nauc_ndcg_at_20_std value: -5.141 - type: nauc_ndcg_at_20_diff1 value: 50.8621 - type: nauc_ndcg_at_100_max value: 31.4661 - type: nauc_ndcg_at_100_std value: -4.9658 - type: nauc_ndcg_at_100_diff1 value: 50.9602 - type: nauc_ndcg_at_1000_max value: 31.544299999999996 - type: nauc_ndcg_at_1000_std value: -5.0944 - type: nauc_ndcg_at_1000_diff1 value: 51.29559999999999 - type: nauc_map_at_1_max value: 29.1715 - type: nauc_map_at_1_std value: -8.383799999999999 - type: nauc_map_at_1_diff1 value: 56.6392 - type: nauc_map_at_3_max value: 31.0216 - type: nauc_map_at_3_std value: -7.2461 - type: nauc_map_at_3_diff1 value: 
53.0413 - type: nauc_map_at_5_max value: 30.944300000000002 - type: nauc_map_at_5_std value: -6.9658999999999995 - type: nauc_map_at_5_diff1 value: 52.7782 - type: nauc_map_at_10_max value: 30.9525 - type: nauc_map_at_10_std value: -6.7453 - type: nauc_map_at_10_diff1 value: 52.7226 - type: nauc_map_at_20_max value: 30.9542 - type: nauc_map_at_20_std value: -6.5941 - type: nauc_map_at_20_diff1 value: 52.6293 - type: nauc_map_at_100_max value: 30.9493 - type: nauc_map_at_100_std value: -6.5776 - type: nauc_map_at_100_diff1 value: 52.65069999999999 - type: nauc_map_at_1000_max value: 30.9515 - type: nauc_map_at_1000_std value: -6.5804 - type: nauc_map_at_1000_diff1 value: 52.662299999999995 - type: nauc_recall_at_1_max value: 29.1715 - type: nauc_recall_at_1_std value: -8.383799999999999 - type: nauc_recall_at_1_diff1 value: 56.6392 - type: nauc_recall_at_3_max value: 33.317600000000006 - type: nauc_recall_at_3_std value: -5.569500000000001 - type: nauc_recall_at_3_diff1 value: 48.6968 - type: nauc_recall_at_5_max value: 32.9542 - type: nauc_recall_at_5_std value: -4.2065 - type: nauc_recall_at_5_diff1 value: 47.1643 - type: nauc_recall_at_10_max value: 33.253 - type: nauc_recall_at_10_std value: -1.9276000000000002 - type: nauc_recall_at_10_diff1 value: 46.1287 - type: nauc_recall_at_20_max value: 33.5398 - type: nauc_recall_at_20_std value: 1.4168 - type: nauc_recall_at_20_diff1 value: 43.5924 - type: nauc_recall_at_100_max value: 34.0873 - type: nauc_recall_at_100_std value: 6.0484 - type: nauc_recall_at_100_diff1 value: 41.1325 - type: nauc_recall_at_1000_max value: 39.7041 - type: nauc_recall_at_1000_std value: 15.0263 - type: nauc_recall_at_1000_diff1 value: 39.2976 - type: nauc_precision_at_1_max value: 29.1715 - type: nauc_precision_at_1_std value: -8.383799999999999 - type: nauc_precision_at_1_diff1 value: 56.6392 - type: nauc_precision_at_3_max value: 33.317600000000006 - type: nauc_precision_at_3_std value: -5.569500000000001 - type: nauc_precision_at_3_diff1 value: 48.6968 - type: nauc_precision_at_5_max value: 32.9542 - type: nauc_precision_at_5_std value: -4.2065 - type: nauc_precision_at_5_diff1 value: 47.1643 - type: nauc_precision_at_10_max value: 33.253 - type: nauc_precision_at_10_std value: -1.9276000000000002 - type: nauc_precision_at_10_diff1 value: 46.1287 - type: nauc_precision_at_20_max value: 33.5398 - type: nauc_precision_at_20_std value: 1.4168 - type: nauc_precision_at_20_diff1 value: 43.5924 - type: nauc_precision_at_100_max value: 34.0873 - type: nauc_precision_at_100_std value: 6.0484 - type: nauc_precision_at_100_diff1 value: 41.1325 - type: nauc_precision_at_1000_max value: 39.7041 - type: nauc_precision_at_1000_std value: 15.0263 - type: nauc_precision_at_1000_diff1 value: 39.2976 - type: nauc_mrr_at_1_max value: 29.1889 - type: nauc_mrr_at_1_std value: -8.3731 - type: nauc_mrr_at_1_diff1 value: 56.4441 - type: nauc_mrr_at_3_max value: 31.034 - type: nauc_mrr_at_3_std value: -7.2402 - type: nauc_mrr_at_3_diff1 value: 52.9257 - type: nauc_mrr_at_5_max value: 30.9601 - type: nauc_mrr_at_5_std value: -6.969799999999999 - type: nauc_mrr_at_5_diff1 value: 52.6602 - type: nauc_mrr_at_10_max value: 30.965300000000003 - type: nauc_mrr_at_10_std value: -6.741700000000001 - type: nauc_mrr_at_10_diff1 value: 52.6096 - type: nauc_mrr_at_20_max value: 30.9681 - type: nauc_mrr_at_20_std value: -6.5917 - type: nauc_mrr_at_20_diff1 value: 52.518299999999996 - type: nauc_mrr_at_100_max value: 30.9633 - type: nauc_mrr_at_100_std value: -6.575200000000001 - type: 
nauc_mrr_at_100_diff1 value: 52.539 - type: nauc_mrr_at_1000_max value: 30.965500000000002 - type: nauc_mrr_at_1000_std value: -6.578 - type: nauc_mrr_at_1000_diff1 value: 52.550399999999996 - type: main_score value: 50.722 task: type: Retrieval - dataset: config: php name: MTEB CodeSearchNetCCRetrieval (php) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 27.915 - type: ndcg_at_3 value: 35.388 - type: ndcg_at_5 value: 37.406 - type: ndcg_at_10 value: 39.660000000000004 - type: ndcg_at_20 value: 41.202 - type: ndcg_at_100 value: 43.916 - type: ndcg_at_1000 value: 45.867000000000004 - type: map_at_1 value: 27.915 - type: map_at_3 value: 33.545 - type: map_at_5 value: 34.666999999999994 - type: map_at_10 value: 35.606 - type: map_at_20 value: 36.032 - type: map_at_100 value: 36.399 - type: map_at_1000 value: 36.464999999999996 - type: recall_at_1 value: 27.915 - type: recall_at_3 value: 40.724 - type: recall_at_5 value: 45.612 - type: recall_at_10 value: 52.54 - type: recall_at_20 value: 58.61300000000001 - type: recall_at_100 value: 73.369 - type: recall_at_1000 value: 89.14699999999999 - type: precision_at_1 value: 27.915 - type: precision_at_3 value: 13.575000000000001 - type: precision_at_5 value: 9.122 - type: precision_at_10 value: 5.2540000000000004 - type: precision_at_20 value: 2.931 - type: precision_at_100 value: 0.734 - type: precision_at_1000 value: 0.089 - type: mrr_at_1 value: 27.8935 - type: mrr_at_3 value: 33.529599999999995 - type: mrr_at_5 value: 34.6563 - type: mrr_at_10 value: 35.596 - type: mrr_at_20 value: 36.0216 - type: mrr_at_100 value: 36.3884 - type: mrr_at_1000 value: 36.4547 - type: nauc_ndcg_at_1_max value: 23.1709 - type: nauc_ndcg_at_1_std value: -5.9072 - type: nauc_ndcg_at_1_diff1 value: 49.3299 - type: nauc_ndcg_at_3_max value: 22.8661 - type: nauc_ndcg_at_3_std value: -5.095899999999999 - type: nauc_ndcg_at_3_diff1 value: 43.9897 - type: nauc_ndcg_at_5_max value: 22.5328 - type: nauc_ndcg_at_5_std value: -4.7091 - type: nauc_ndcg_at_5_diff1 value: 43.3944 - type: nauc_ndcg_at_10_max value: 21.9501 - type: nauc_ndcg_at_10_std value: -4.162 - type: nauc_ndcg_at_10_diff1 value: 42.3066 - type: nauc_ndcg_at_20_max value: 21.9053 - type: nauc_ndcg_at_20_std value: -3.5355999999999996 - type: nauc_ndcg_at_20_diff1 value: 42.1593 - type: nauc_ndcg_at_100_max value: 21.7083 - type: nauc_ndcg_at_100_std value: -2.9722999999999997 - type: nauc_ndcg_at_100_diff1 value: 41.9229 - type: nauc_ndcg_at_1000_max value: 21.9067 - type: nauc_ndcg_at_1000_std value: -2.984 - type: nauc_ndcg_at_1000_diff1 value: 42.4281 - type: nauc_map_at_1_max value: 23.1709 - type: nauc_map_at_1_std value: -5.9072 - type: nauc_map_at_1_diff1 value: 49.3299 - type: nauc_map_at_3_max value: 22.9725 - type: nauc_map_at_3_std value: -5.292199999999999 - type: nauc_map_at_3_diff1 value: 45.2572 - type: nauc_map_at_5_max value: 22.7878 - type: nauc_map_at_5_std value: -5.0855999999999995 - type: nauc_map_at_5_diff1 value: 44.9362 - type: nauc_map_at_10_max value: 22.554299999999998 - type: nauc_map_at_10_std value: -4.855700000000001 - type: nauc_map_at_10_diff1 value: 44.472899999999996 - type: nauc_map_at_20_max value: 22.5365 - type: nauc_map_at_20_std value: -4.7015 - type: nauc_map_at_20_diff1 value: 44.441900000000004 - type: nauc_map_at_100_max value: 22.5246 - type: nauc_map_at_100_std value: -4.6318 - type: nauc_map_at_100_diff1 value: 44.4182 - type: nauc_map_at_1000_max value: 22.531200000000002 - 
type: nauc_map_at_1000_std value: -4.6294 - type: nauc_map_at_1000_diff1 value: 44.4336 - type: nauc_recall_at_1_max value: 23.1709 - type: nauc_recall_at_1_std value: -5.9072 - type: nauc_recall_at_1_diff1 value: 49.3299 - type: nauc_recall_at_3_max value: 22.5576 - type: nauc_recall_at_3_std value: -4.5496 - type: nauc_recall_at_3_diff1 value: 40.4722 - type: nauc_recall_at_5_max value: 21.755 - type: nauc_recall_at_5_std value: -3.5854 - type: nauc_recall_at_5_diff1 value: 38.9703 - type: nauc_recall_at_10_max value: 19.8814 - type: nauc_recall_at_10_std value: -1.8668 - type: nauc_recall_at_10_diff1 value: 35.5164 - type: nauc_recall_at_20_max value: 19.6191 - type: nauc_recall_at_20_std value: 1.0138 - type: nauc_recall_at_20_diff1 value: 34.443 - type: nauc_recall_at_100_max value: 17.1186 - type: nauc_recall_at_100_std value: 6.7912 - type: nauc_recall_at_100_diff1 value: 30.006100000000004 - type: nauc_recall_at_1000_max value: 16.4494 - type: nauc_recall_at_1000_std value: 17.0286 - type: nauc_recall_at_1000_diff1 value: 28.3205 - type: nauc_precision_at_1_max value: 23.1709 - type: nauc_precision_at_1_std value: -5.9072 - type: nauc_precision_at_1_diff1 value: 49.3299 - type: nauc_precision_at_3_max value: 22.5576 - type: nauc_precision_at_3_std value: -4.5496 - type: nauc_precision_at_3_diff1 value: 40.4722 - type: nauc_precision_at_5_max value: 21.755 - type: nauc_precision_at_5_std value: -3.5854 - type: nauc_precision_at_5_diff1 value: 38.9703 - type: nauc_precision_at_10_max value: 19.8814 - type: nauc_precision_at_10_std value: -1.8668 - type: nauc_precision_at_10_diff1 value: 35.5164 - type: nauc_precision_at_20_max value: 19.6191 - type: nauc_precision_at_20_std value: 1.0138 - type: nauc_precision_at_20_diff1 value: 34.443 - type: nauc_precision_at_100_max value: 17.1186 - type: nauc_precision_at_100_std value: 6.7912 - type: nauc_precision_at_100_diff1 value: 30.006100000000004 - type: nauc_precision_at_1000_max value: 16.4494 - type: nauc_precision_at_1000_std value: 17.0286 - type: nauc_precision_at_1000_diff1 value: 28.3205 - type: nauc_mrr_at_1_max value: 23.1792 - type: nauc_mrr_at_1_std value: -5.8884 - type: nauc_mrr_at_1_diff1 value: 49.411899999999996 - type: nauc_mrr_at_3_max value: 22.9617 - type: nauc_mrr_at_3_std value: -5.2925 - type: nauc_mrr_at_3_diff1 value: 45.2913 - type: nauc_mrr_at_5_max value: 22.7693 - type: nauc_mrr_at_5_std value: -5.0912 - type: nauc_mrr_at_5_diff1 value: 44.966699999999996 - type: nauc_mrr_at_10_max value: 22.5429 - type: nauc_mrr_at_10_std value: -4.8534 - type: nauc_mrr_at_10_diff1 value: 44.5081 - type: nauc_mrr_at_20_max value: 22.5247 - type: nauc_mrr_at_20_std value: -4.7001 - type: nauc_mrr_at_20_diff1 value: 44.4776 - type: nauc_mrr_at_100_max value: 22.5126 - type: nauc_mrr_at_100_std value: -4.6305 - type: nauc_mrr_at_100_diff1 value: 44.453900000000004 - type: nauc_mrr_at_1000_max value: 22.5191 - type: nauc_mrr_at_1000_std value: -4.6281 - type: nauc_mrr_at_1000_diff1 value: 44.469300000000004 - type: main_score value: 39.660000000000004 task: type: Retrieval - dataset: config: python name: MTEB CodeSearchNetRetrieval (python) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 71.3 - type: ndcg_at_3 value: 80.46600000000001 - type: ndcg_at_5 value: 82.657 - type: ndcg_at_10 value: 83.633 - type: ndcg_at_20 value: 84.108 - type: ndcg_at_100 value: 84.532 - type: ndcg_at_1000 value: 84.651 - type: map_at_1 value: 71.3 - type: map_at_3 
value: 78.3 - type: map_at_5 value: 79.52 - type: map_at_10 value: 79.926 - type: map_at_20 value: 80.054 - type: map_at_100 value: 80.119 - type: map_at_1000 value: 80.124 - type: recall_at_1 value: 71.3 - type: recall_at_3 value: 86.7 - type: recall_at_5 value: 92.0 - type: recall_at_10 value: 95.0 - type: recall_at_20 value: 96.89999999999999 - type: recall_at_100 value: 99.1 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 71.3 - type: precision_at_3 value: 28.9 - type: precision_at_5 value: 18.4 - type: precision_at_10 value: 9.5 - type: precision_at_20 value: 4.845 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 71.3 - type: mrr_at_3 value: 78.3 - type: mrr_at_5 value: 79.52 - type: mrr_at_10 value: 79.9264 - type: mrr_at_20 value: 80.0537 - type: mrr_at_100 value: 80.119 - type: mrr_at_1000 value: 80.1241 - type: nauc_ndcg_at_1_max value: 42.5887 - type: nauc_ndcg_at_1_std value: -4.7713 - type: nauc_ndcg_at_1_diff1 value: 71.5211 - type: nauc_ndcg_at_3_max value: 42.682500000000005 - type: nauc_ndcg_at_3_std value: -9.7713 - type: nauc_ndcg_at_3_diff1 value: 70.09450000000001 - type: nauc_ndcg_at_5_max value: 42.8369 - type: nauc_ndcg_at_5_std value: -8.636000000000001 - type: nauc_ndcg_at_5_diff1 value: 70.06569999999999 - type: nauc_ndcg_at_10_max value: 42.0272 - type: nauc_ndcg_at_10_std value: -7.7864 - type: nauc_ndcg_at_10_diff1 value: 69.647 - type: nauc_ndcg_at_20_max value: 42.7338 - type: nauc_ndcg_at_20_std value: -7.842300000000001 - type: nauc_ndcg_at_20_diff1 value: 69.8122 - type: nauc_ndcg_at_100_max value: 42.7575 - type: nauc_ndcg_at_100_std value: -7.330299999999999 - type: nauc_ndcg_at_100_diff1 value: 69.9872 - type: nauc_ndcg_at_1000_max value: 42.6322 - type: nauc_ndcg_at_1000_std value: -7.4643 - type: nauc_ndcg_at_1000_diff1 value: 70.0635 - type: nauc_map_at_1_max value: 42.5887 - type: nauc_map_at_1_std value: -4.7713 - type: nauc_map_at_1_diff1 value: 71.5211 - type: nauc_map_at_3_max value: 42.5893 - type: nauc_map_at_3_std value: -8.2772 - type: nauc_map_at_3_diff1 value: 70.3236 - type: nauc_map_at_5_max value: 42.686099999999996 - type: nauc_map_at_5_std value: -7.6014 - type: nauc_map_at_5_diff1 value: 70.284 - type: nauc_map_at_10_max value: 42.4008 - type: nauc_map_at_10_std value: -7.2528 - type: nauc_map_at_10_diff1 value: 70.1571 - type: nauc_map_at_20_max value: 42.5568 - type: nauc_map_at_20_std value: -7.264900000000001 - type: nauc_map_at_20_diff1 value: 70.2095 - type: nauc_map_at_100_max value: 42.5674 - type: nauc_map_at_100_std value: -7.2189000000000005 - type: nauc_map_at_100_diff1 value: 70.238 - type: nauc_map_at_1000_max value: 42.564600000000006 - type: nauc_map_at_1000_std value: -7.217899999999999 - type: nauc_map_at_1000_diff1 value: 70.2391 - type: nauc_recall_at_1_max value: 42.5887 - type: nauc_recall_at_1_std value: -4.7713 - type: nauc_recall_at_1_diff1 value: 71.5211 - type: nauc_recall_at_3_max value: 43.1314 - type: nauc_recall_at_3_std value: -16.2854 - type: nauc_recall_at_3_diff1 value: 69.22319999999999 - type: nauc_recall_at_5_max value: 43.869 - type: nauc_recall_at_5_std value: -15.228800000000001 - type: nauc_recall_at_5_diff1 value: 68.9332 - type: nauc_recall_at_10_max value: 37.211 - type: nauc_recall_at_10_std value: -12.085899999999999 - type: nauc_recall_at_10_diff1 value: 64.212 - type: nauc_recall_at_20_max value: 47.346500000000006 - type: nauc_recall_at_20_std value: -15.5748 - type: nauc_recall_at_20_diff1 value: 63.3866 - type: 
nauc_recall_at_100_max value: 58.667899999999996 - type: nauc_recall_at_100_std value: 12.8333 - type: nauc_recall_at_100_diff1 value: 60.0633 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 42.5887 - type: nauc_precision_at_1_std value: -4.7713 - type: nauc_precision_at_1_diff1 value: 71.5211 - type: nauc_precision_at_3_max value: 43.1314 - type: nauc_precision_at_3_std value: -16.2854 - type: nauc_precision_at_3_diff1 value: 69.22319999999999 - type: nauc_precision_at_5_max value: 43.869 - type: nauc_precision_at_5_std value: -15.228800000000001 - type: nauc_precision_at_5_diff1 value: 68.9332 - type: nauc_precision_at_10_max value: 37.211 - type: nauc_precision_at_10_std value: -12.085899999999999 - type: nauc_precision_at_10_diff1 value: 64.212 - type: nauc_precision_at_20_max value: 47.346500000000006 - type: nauc_precision_at_20_std value: -15.5748 - type: nauc_precision_at_20_diff1 value: 63.3866 - type: nauc_precision_at_100_max value: 58.667899999999996 - type: nauc_precision_at_100_std value: 12.8333 - type: nauc_precision_at_100_diff1 value: 60.0633 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 42.5887 - type: nauc_mrr_at_1_std value: -4.7713 - type: nauc_mrr_at_1_diff1 value: 71.5211 - type: nauc_mrr_at_3_max value: 42.5893 - type: nauc_mrr_at_3_std value: -8.2772 - type: nauc_mrr_at_3_diff1 value: 70.3236 - type: nauc_mrr_at_5_max value: 42.686099999999996 - type: nauc_mrr_at_5_std value: -7.6014 - type: nauc_mrr_at_5_diff1 value: 70.284 - type: nauc_mrr_at_10_max value: 42.4008 - type: nauc_mrr_at_10_std value: -7.2528 - type: nauc_mrr_at_10_diff1 value: 70.1571 - type: nauc_mrr_at_20_max value: 42.5568 - type: nauc_mrr_at_20_std value: -7.264900000000001 - type: nauc_mrr_at_20_diff1 value: 70.2095 - type: nauc_mrr_at_100_max value: 42.5674 - type: nauc_mrr_at_100_std value: -7.2189000000000005 - type: nauc_mrr_at_100_diff1 value: 70.238 - type: nauc_mrr_at_1000_max value: 42.564600000000006 - type: nauc_mrr_at_1000_std value: -7.217899999999999 - type: nauc_mrr_at_1000_diff1 value: 70.2391 - type: main_score value: 83.633 task: type: Retrieval - dataset: config: javascript name: MTEB CodeSearchNetRetrieval (javascript) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 61.4 - type: ndcg_at_3 value: 69.833 - type: ndcg_at_5 value: 71.675 - type: ndcg_at_10 value: 72.83699999999999 - type: ndcg_at_20 value: 73.56899999999999 - type: ndcg_at_100 value: 74.50099999999999 - type: ndcg_at_1000 value: 75.473 - type: map_at_1 value: 61.4 - type: map_at_3 value: 67.80000000000001 - type: map_at_5 value: 68.815 - type: map_at_10 value: 69.294 - type: map_at_20 value: 69.49499999999999 - type: map_at_100 value: 69.618 - type: map_at_1000 value: 69.645 - type: recall_at_1 value: 61.4 - type: recall_at_3 value: 75.7 - type: recall_at_5 value: 80.2 - type: recall_at_10 value: 83.8 - type: recall_at_20 value: 86.7 - type: recall_at_100 value: 91.8 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 61.4 - type: precision_at_3 value: 25.233 - type: precision_at_5 value: 16.04 - type: precision_at_10 value: 8.38 - type: precision_at_20 value: 4.335 - type: precision_at_100 value: 0.918 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 
61.4 - type: mrr_at_3 value: 67.80000000000001 - type: mrr_at_5 value: 68.815 - type: mrr_at_10 value: 69.294 - type: mrr_at_20 value: 69.4947 - type: mrr_at_100 value: 69.6181 - type: mrr_at_1000 value: 69.645 - type: nauc_ndcg_at_1_max value: 56.7217 - type: nauc_ndcg_at_1_std value: 24.8593 - type: nauc_ndcg_at_1_diff1 value: 71.9101 - type: nauc_ndcg_at_3_max value: 65.2032 - type: nauc_ndcg_at_3_std value: 32.0444 - type: nauc_ndcg_at_3_diff1 value: 70.0416 - type: nauc_ndcg_at_5_max value: 66.5758 - type: nauc_ndcg_at_5_std value: 36.1929 - type: nauc_ndcg_at_5_diff1 value: 70.3931 - type: nauc_ndcg_at_10_max value: 66.5108 - type: nauc_ndcg_at_10_std value: 36.121199999999995 - type: nauc_ndcg_at_10_diff1 value: 70.6475 - type: nauc_ndcg_at_20_max value: 66.7371 - type: nauc_ndcg_at_20_std value: 36.5925 - type: nauc_ndcg_at_20_diff1 value: 70.8488 - type: nauc_ndcg_at_100_max value: 66.2407 - type: nauc_ndcg_at_100_std value: 37.0769 - type: nauc_ndcg_at_100_diff1 value: 70.5349 - type: nauc_ndcg_at_1000_max value: 65.2728 - type: nauc_ndcg_at_1000_std value: 34.956199999999995 - type: nauc_ndcg_at_1000_diff1 value: 70.6395 - type: nauc_map_at_1_max value: 56.7217 - type: nauc_map_at_1_std value: 24.8593 - type: nauc_map_at_1_diff1 value: 71.9101 - type: nauc_map_at_3_max value: 63.0821 - type: nauc_map_at_3_std value: 30.2166 - type: nauc_map_at_3_diff1 value: 70.4667 - type: nauc_map_at_5_max value: 63.7133 - type: nauc_map_at_5_std value: 32.2817 - type: nauc_map_at_5_diff1 value: 70.6826 - type: nauc_map_at_10_max value: 63.6566 - type: nauc_map_at_10_std value: 32.2283 - type: nauc_map_at_10_diff1 value: 70.8001 - type: nauc_map_at_20_max value: 63.7023 - type: nauc_map_at_20_std value: 32.3021 - type: nauc_map_at_20_diff1 value: 70.8584 - type: nauc_map_at_100_max value: 63.645799999999994 - type: nauc_map_at_100_std value: 32.3835 - type: nauc_map_at_100_diff1 value: 70.8164 - type: nauc_map_at_1000_max value: 63.6211 - type: nauc_map_at_1000_std value: 32.334 - type: nauc_map_at_1000_diff1 value: 70.8146 - type: nauc_recall_at_1_max value: 56.7217 - type: nauc_recall_at_1_std value: 24.8593 - type: nauc_recall_at_1_diff1 value: 71.9101 - type: nauc_recall_at_3_max value: 72.6106 - type: nauc_recall_at_3_std value: 38.4448 - type: nauc_recall_at_3_diff1 value: 68.58030000000001 - type: nauc_recall_at_5_max value: 78.35889999999999 - type: nauc_recall_at_5_std value: 52.82829999999999 - type: nauc_recall_at_5_diff1 value: 69.30239999999999 - type: nauc_recall_at_10_max value: 80.32730000000001 - type: nauc_recall_at_10_std value: 55.5612 - type: nauc_recall_at_10_diff1 value: 70.1068 - type: nauc_recall_at_20_max value: 84.4507 - type: nauc_recall_at_20_std value: 62.841100000000004 - type: nauc_recall_at_20_diff1 value: 71.2689 - type: nauc_recall_at_100_max value: 86.8251 - type: nauc_recall_at_100_std value: 82.8944 - type: nauc_recall_at_100_diff1 value: 67.35950000000001 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 56.7217 - type: nauc_precision_at_1_std value: 24.8593 - type: nauc_precision_at_1_diff1 value: 71.9101 - type: nauc_precision_at_3_max value: 72.6106 - type: nauc_precision_at_3_std value: 38.4448 - type: nauc_precision_at_3_diff1 value: 68.58030000000001 - type: nauc_precision_at_5_max value: 78.35889999999999 - type: nauc_precision_at_5_std value: 52.82829999999999 - type: nauc_precision_at_5_diff1 value: 69.30239999999999 - type: 
nauc_precision_at_10_max value: 80.32730000000001 - type: nauc_precision_at_10_std value: 55.5612 - type: nauc_precision_at_10_diff1 value: 70.1068 - type: nauc_precision_at_20_max value: 84.4507 - type: nauc_precision_at_20_std value: 62.841100000000004 - type: nauc_precision_at_20_diff1 value: 71.2689 - type: nauc_precision_at_100_max value: 86.8251 - type: nauc_precision_at_100_std value: 82.8944 - type: nauc_precision_at_100_diff1 value: 67.35950000000001 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 56.7217 - type: nauc_mrr_at_1_std value: 24.8593 - type: nauc_mrr_at_1_diff1 value: 71.9101 - type: nauc_mrr_at_3_max value: 63.0821 - type: nauc_mrr_at_3_std value: 30.2166 - type: nauc_mrr_at_3_diff1 value: 70.4667 - type: nauc_mrr_at_5_max value: 63.7133 - type: nauc_mrr_at_5_std value: 32.2817 - type: nauc_mrr_at_5_diff1 value: 70.6826 - type: nauc_mrr_at_10_max value: 63.6566 - type: nauc_mrr_at_10_std value: 32.2283 - type: nauc_mrr_at_10_diff1 value: 70.8001 - type: nauc_mrr_at_20_max value: 63.7023 - type: nauc_mrr_at_20_std value: 32.3021 - type: nauc_mrr_at_20_diff1 value: 70.8584 - type: nauc_mrr_at_100_max value: 63.645799999999994 - type: nauc_mrr_at_100_std value: 32.3835 - type: nauc_mrr_at_100_diff1 value: 70.8164 - type: nauc_mrr_at_1000_max value: 63.6211 - type: nauc_mrr_at_1000_std value: 32.334 - type: nauc_mrr_at_1000_diff1 value: 70.8146 - type: main_score value: 72.83699999999999 task: type: Retrieval - dataset: config: go name: MTEB CodeSearchNetRetrieval (go) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 71.5 - type: ndcg_at_3 value: 80.566 - type: ndcg_at_5 value: 82.623 - type: ndcg_at_10 value: 83.694 - type: ndcg_at_20 value: 84.153 - type: ndcg_at_100 value: 84.597 - type: ndcg_at_1000 value: 84.73 - type: map_at_1 value: 71.5 - type: map_at_3 value: 78.43299999999999 - type: map_at_5 value: 79.57300000000001 - type: map_at_10 value: 80.037 - type: map_at_20 value: 80.164 - type: map_at_100 value: 80.231 - type: map_at_1000 value: 80.238 - type: recall_at_1 value: 71.5 - type: recall_at_3 value: 86.7 - type: recall_at_5 value: 91.7 - type: recall_at_10 value: 94.89999999999999 - type: recall_at_20 value: 96.7 - type: recall_at_100 value: 99.0 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 71.5 - type: precision_at_3 value: 28.9 - type: precision_at_5 value: 18.34 - type: precision_at_10 value: 9.49 - type: precision_at_20 value: 4.835 - type: precision_at_100 value: 0.9900000000000001 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 71.5 - type: mrr_at_3 value: 78.43329999999999 - type: mrr_at_5 value: 79.5733 - type: mrr_at_10 value: 80.0366 - type: mrr_at_20 value: 80.164 - type: mrr_at_100 value: 80.2314 - type: mrr_at_1000 value: 80.2376 - type: nauc_ndcg_at_1_max value: 46.1044 - type: nauc_ndcg_at_1_std value: -4.7079 - type: nauc_ndcg_at_1_diff1 value: 75.426 - type: nauc_ndcg_at_3_max value: 52.6854 - type: nauc_ndcg_at_3_std value: -5.7088 - type: nauc_ndcg_at_3_diff1 value: 72.5517 - type: nauc_ndcg_at_5_max value: 51.839400000000005 - type: nauc_ndcg_at_5_std value: -6.802700000000001 - type: nauc_ndcg_at_5_diff1 value: 72.17710000000001 - type: nauc_ndcg_at_10_max value: 51.4024 - type: nauc_ndcg_at_10_std value: -7.0518 - type: nauc_ndcg_at_10_diff1 value: 73.0671 - type: nauc_ndcg_at_20_max value: 
51.029 - type: nauc_ndcg_at_20_std value: -6.6751000000000005 - type: nauc_ndcg_at_20_diff1 value: 73.4538 - type: nauc_ndcg_at_100_max value: 50.8548 - type: nauc_ndcg_at_100_std value: -5.9427 - type: nauc_ndcg_at_100_diff1 value: 73.51950000000001 - type: nauc_ndcg_at_1000_max value: 50.672 - type: nauc_ndcg_at_1000_std value: -6.0391 - type: nauc_ndcg_at_1000_diff1 value: 73.5247 - type: nauc_map_at_1_max value: 46.1044 - type: nauc_map_at_1_std value: -4.7079 - type: nauc_map_at_1_diff1 value: 75.426 - type: nauc_map_at_3_max value: 50.939299999999996 - type: nauc_map_at_3_std value: -5.3396 - type: nauc_map_at_3_diff1 value: 73.42490000000001 - type: nauc_map_at_5_max value: 50.4396 - type: nauc_map_at_5_std value: -5.8186 - type: nauc_map_at_5_diff1 value: 73.2819 - type: nauc_map_at_10_max value: 50.27890000000001 - type: nauc_map_at_10_std value: -5.8548 - type: nauc_map_at_10_diff1 value: 73.6528 - type: nauc_map_at_20_max value: 50.2054 - type: nauc_map_at_20_std value: -5.7458 - type: nauc_map_at_20_diff1 value: 73.7524 - type: nauc_map_at_100_max value: 50.1773 - type: nauc_map_at_100_std value: -5.6738 - type: nauc_map_at_100_diff1 value: 73.75460000000001 - type: nauc_map_at_1000_max value: 50.166999999999994 - type: nauc_map_at_1000_std value: -5.6814 - type: nauc_map_at_1000_diff1 value: 73.7542 - type: nauc_recall_at_1_max value: 46.1044 - type: nauc_recall_at_1_std value: -4.7079 - type: nauc_recall_at_1_diff1 value: 75.426 - type: nauc_recall_at_3_max value: 60.1177 - type: nauc_recall_at_3_std value: -7.3551 - type: nauc_recall_at_3_diff1 value: 68.7552 - type: nauc_recall_at_5_max value: 60.249399999999994 - type: nauc_recall_at_5_std value: -13.555600000000002 - type: nauc_recall_at_5_diff1 value: 65.0445 - type: nauc_recall_at_10_max value: 61.167 - type: nauc_recall_at_10_std value: -20.4198 - type: nauc_recall_at_10_diff1 value: 67.8246 - type: nauc_recall_at_20_max value: 59.404999999999994 - type: nauc_recall_at_20_std value: -21.929399999999998 - type: nauc_recall_at_20_diff1 value: 71.1994 - type: nauc_recall_at_100_max value: 66.6713 - type: nauc_recall_at_100_std value: -0.4949 - type: nauc_recall_at_100_diff1 value: 72.409 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 46.1044 - type: nauc_precision_at_1_std value: -4.7079 - type: nauc_precision_at_1_diff1 value: 75.426 - type: nauc_precision_at_3_max value: 60.1177 - type: nauc_precision_at_3_std value: -7.3551 - type: nauc_precision_at_3_diff1 value: 68.7552 - type: nauc_precision_at_5_max value: 60.249399999999994 - type: nauc_precision_at_5_std value: -13.555600000000002 - type: nauc_precision_at_5_diff1 value: 65.0445 - type: nauc_precision_at_10_max value: 61.167 - type: nauc_precision_at_10_std value: -20.4198 - type: nauc_precision_at_10_diff1 value: 67.8246 - type: nauc_precision_at_20_max value: 59.404999999999994 - type: nauc_precision_at_20_std value: -21.929399999999998 - type: nauc_precision_at_20_diff1 value: 71.1994 - type: nauc_precision_at_100_max value: 66.6713 - type: nauc_precision_at_100_std value: -0.4949 - type: nauc_precision_at_100_diff1 value: 72.409 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 46.1044 - type: nauc_mrr_at_1_std value: -4.7079 - type: nauc_mrr_at_1_diff1 value: 75.426 - type: nauc_mrr_at_3_max value: 
50.939299999999996 - type: nauc_mrr_at_3_std value: -5.3396 - type: nauc_mrr_at_3_diff1 value: 73.42490000000001 - type: nauc_mrr_at_5_max value: 50.4396 - type: nauc_mrr_at_5_std value: -5.8186 - type: nauc_mrr_at_5_diff1 value: 73.2819 - type: nauc_mrr_at_10_max value: 50.27890000000001 - type: nauc_mrr_at_10_std value: -5.8548 - type: nauc_mrr_at_10_diff1 value: 73.6528 - type: nauc_mrr_at_20_max value: 50.2054 - type: nauc_mrr_at_20_std value: -5.7458 - type: nauc_mrr_at_20_diff1 value: 73.7524 - type: nauc_mrr_at_100_max value: 50.1773 - type: nauc_mrr_at_100_std value: -5.6738 - type: nauc_mrr_at_100_diff1 value: 73.75460000000001 - type: nauc_mrr_at_1000_max value: 50.166999999999994 - type: nauc_mrr_at_1000_std value: -5.6814 - type: nauc_mrr_at_1000_diff1 value: 73.7542 - type: main_score value: 83.694 task: type: Retrieval - dataset: config: ruby name: MTEB CodeSearchNetRetrieval (ruby) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 63.1 - type: ndcg_at_3 value: 73.48400000000001 - type: ndcg_at_5 value: 75.907 - type: ndcg_at_10 value: 76.81400000000001 - type: ndcg_at_20 value: 77.532 - type: ndcg_at_100 value: 78.25800000000001 - type: ndcg_at_1000 value: 78.739 - type: map_at_1 value: 63.1 - type: map_at_3 value: 70.98299999999999 - type: map_at_5 value: 72.32300000000001 - type: map_at_10 value: 72.7 - type: map_at_20 value: 72.902 - type: map_at_100 value: 73.00999999999999 - type: map_at_1000 value: 73.02499999999999 - type: recall_at_1 value: 63.1 - type: recall_at_3 value: 80.7 - type: recall_at_5 value: 86.6 - type: recall_at_10 value: 89.4 - type: recall_at_20 value: 92.2 - type: recall_at_100 value: 96.0 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 63.1 - type: precision_at_3 value: 26.900000000000002 - type: precision_at_5 value: 17.32 - type: precision_at_10 value: 8.94 - type: precision_at_20 value: 4.61 - type: precision_at_100 value: 0.96 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 63.1 - type: mrr_at_3 value: 70.9833 - type: mrr_at_5 value: 72.3233 - type: mrr_at_10 value: 72.6995 - type: mrr_at_20 value: 72.9017 - type: mrr_at_100 value: 73.0097 - type: mrr_at_1000 value: 73.0247 - type: nauc_ndcg_at_1_max value: 51.397099999999995 - type: nauc_ndcg_at_1_std value: 5.5686 - type: nauc_ndcg_at_1_diff1 value: 67.8159 - type: nauc_ndcg_at_3_max value: 51.7661 - type: nauc_ndcg_at_3_std value: 5.247199999999999 - type: nauc_ndcg_at_3_diff1 value: 62.2276 - type: nauc_ndcg_at_5_max value: 52.45649999999999 - type: nauc_ndcg_at_5_std value: 8.3289 - type: nauc_ndcg_at_5_diff1 value: 61.5048 - type: nauc_ndcg_at_10_max value: 53.376599999999996 - type: nauc_ndcg_at_10_std value: 10.0975 - type: nauc_ndcg_at_10_diff1 value: 61.206 - type: nauc_ndcg_at_20_max value: 53.4219 - type: nauc_ndcg_at_20_std value: 11.3499 - type: nauc_ndcg_at_20_diff1 value: 60.670199999999994 - type: nauc_ndcg_at_100_max value: 53.728699999999996 - type: nauc_ndcg_at_100_std value: 11.754299999999999 - type: nauc_ndcg_at_100_diff1 value: 61.2795 - type: nauc_ndcg_at_1000_max value: 53.1018 - type: nauc_ndcg_at_1000_std value: 9.7542 - type: nauc_ndcg_at_1000_diff1 value: 62.16779999999999 - type: nauc_map_at_1_max value: 51.397099999999995 - type: nauc_map_at_1_std value: 5.5686 - type: nauc_map_at_1_diff1 value: 67.8159 - type: nauc_map_at_3_max value: 51.701600000000006 - type: nauc_map_at_3_std value: 5.346900000000001 - type: nauc_map_at_3_diff1 value: 63.7526 - 
type: nauc_map_at_5_max value: 52.05030000000001 - type: nauc_map_at_5_std value: 6.901 - type: nauc_map_at_5_diff1 value: 63.4742 - type: nauc_map_at_10_max value: 52.3881 - type: nauc_map_at_10_std value: 7.557899999999999 - type: nauc_map_at_10_diff1 value: 63.385000000000005 - type: nauc_map_at_20_max value: 52.3801 - type: nauc_map_at_20_std value: 7.8098 - type: nauc_map_at_20_diff1 value: 63.2662 - type: nauc_map_at_100_max value: 52.440799999999996 - type: nauc_map_at_100_std value: 7.8723 - type: nauc_map_at_100_diff1 value: 63.362399999999994 - type: nauc_map_at_1000_max value: 52.4276 - type: nauc_map_at_1000_std value: 7.8245 - type: nauc_map_at_1000_diff1 value: 63.3886 - type: nauc_recall_at_1_max value: 51.397099999999995 - type: nauc_recall_at_1_std value: 5.5686 - type: nauc_recall_at_1_diff1 value: 67.8159 - type: nauc_recall_at_3_max value: 51.995000000000005 - type: nauc_recall_at_3_std value: 4.853 - type: nauc_recall_at_3_diff1 value: 56.3023 - type: nauc_recall_at_5_max value: 54.692099999999996 - type: nauc_recall_at_5_std value: 16.4925 - type: nauc_recall_at_5_diff1 value: 51.12179999999999 - type: nauc_recall_at_10_max value: 60.454699999999995 - type: nauc_recall_at_10_std value: 28.295900000000003 - type: nauc_recall_at_10_diff1 value: 47.063100000000006 - type: nauc_recall_at_20_max value: 63.59740000000001 - type: nauc_recall_at_20_std value: 47.2928 - type: nauc_recall_at_20_diff1 value: 37.1627 - type: nauc_recall_at_100_max value: 78.4162 - type: nauc_recall_at_100_std value: 88.6099 - type: nauc_recall_at_100_diff1 value: 28.975299999999997 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 51.397099999999995 - type: nauc_precision_at_1_std value: 5.5686 - type: nauc_precision_at_1_diff1 value: 67.8159 - type: nauc_precision_at_3_max value: 51.995000000000005 - type: nauc_precision_at_3_std value: 4.853 - type: nauc_precision_at_3_diff1 value: 56.3023 - type: nauc_precision_at_5_max value: 54.692099999999996 - type: nauc_precision_at_5_std value: 16.4925 - type: nauc_precision_at_5_diff1 value: 51.12179999999999 - type: nauc_precision_at_10_max value: 60.454699999999995 - type: nauc_precision_at_10_std value: 28.295900000000003 - type: nauc_precision_at_10_diff1 value: 47.063100000000006 - type: nauc_precision_at_20_max value: 63.59740000000001 - type: nauc_precision_at_20_std value: 47.2928 - type: nauc_precision_at_20_diff1 value: 37.1627 - type: nauc_precision_at_100_max value: 78.4162 - type: nauc_precision_at_100_std value: 88.6099 - type: nauc_precision_at_100_diff1 value: 28.975299999999997 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 51.397099999999995 - type: nauc_mrr_at_1_std value: 5.5686 - type: nauc_mrr_at_1_diff1 value: 67.8159 - type: nauc_mrr_at_3_max value: 51.701600000000006 - type: nauc_mrr_at_3_std value: 5.346900000000001 - type: nauc_mrr_at_3_diff1 value: 63.7526 - type: nauc_mrr_at_5_max value: 52.05030000000001 - type: nauc_mrr_at_5_std value: 6.901 - type: nauc_mrr_at_5_diff1 value: 63.4742 - type: nauc_mrr_at_10_max value: 52.3881 - type: nauc_mrr_at_10_std value: 7.557899999999999 - type: nauc_mrr_at_10_diff1 value: 63.385000000000005 - type: nauc_mrr_at_20_max value: 52.3801 - type: nauc_mrr_at_20_std value: 7.8098 - type: nauc_mrr_at_20_diff1 value: 63.2662 - type: 
nauc_mrr_at_100_max value: 52.440799999999996 - type: nauc_mrr_at_100_std value: 7.8723 - type: nauc_mrr_at_100_diff1 value: 63.362399999999994 - type: nauc_mrr_at_1000_max value: 52.4276 - type: nauc_mrr_at_1000_std value: 7.8245 - type: nauc_mrr_at_1000_diff1 value: 63.3886 - type: main_score value: 76.81400000000001 task: type: Retrieval - dataset: config: java name: MTEB CodeSearchNetRetrieval (java) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 52.1 - type: ndcg_at_3 value: 64.248 - type: ndcg_at_5 value: 67.213 - type: ndcg_at_10 value: 69.41199999999999 - type: ndcg_at_20 value: 70.43700000000001 - type: ndcg_at_100 value: 71.33800000000001 - type: ndcg_at_1000 value: 71.887 - type: map_at_1 value: 52.1 - type: map_at_3 value: 61.35 - type: map_at_5 value: 62.995000000000005 - type: map_at_10 value: 63.92 - type: map_at_20 value: 64.209 - type: map_at_100 value: 64.338 - type: map_at_1000 value: 64.352 - type: recall_at_1 value: 52.1 - type: recall_at_3 value: 72.6 - type: recall_at_5 value: 79.80000000000001 - type: recall_at_10 value: 86.5 - type: recall_at_20 value: 90.5 - type: recall_at_100 value: 95.3 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 52.1 - type: precision_at_3 value: 24.2 - type: precision_at_5 value: 15.959999999999999 - type: precision_at_10 value: 8.649999999999999 - type: precision_at_20 value: 4.5249999999999995 - type: precision_at_100 value: 0.9530000000000001 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 52.1 - type: mrr_at_3 value: 61.35 - type: mrr_at_5 value: 62.995000000000005 - type: mrr_at_10 value: 63.9199 - type: mrr_at_20 value: 64.209 - type: mrr_at_100 value: 64.338 - type: mrr_at_1000 value: 64.352 - type: nauc_ndcg_at_1_max value: 35.1263 - type: nauc_ndcg_at_1_std value: -12.454600000000001 - type: nauc_ndcg_at_1_diff1 value: 58.824 - type: nauc_ndcg_at_3_max value: 40.6703 - type: nauc_ndcg_at_3_std value: -9.0987 - type: nauc_ndcg_at_3_diff1 value: 52.3502 - type: nauc_ndcg_at_5_max value: 41.3895 - type: nauc_ndcg_at_5_std value: -7.630199999999999 - type: nauc_ndcg_at_5_diff1 value: 51.614599999999996 - type: nauc_ndcg_at_10_max value: 42.345699999999994 - type: nauc_ndcg_at_10_std value: -5.084700000000001 - type: nauc_ndcg_at_10_diff1 value: 53.396 - type: nauc_ndcg_at_20_max value: 42.215399999999995 - type: nauc_ndcg_at_20_std value: -4.825 - type: nauc_ndcg_at_20_diff1 value: 53.296699999999994 - type: nauc_ndcg_at_100_max value: 42.0653 - type: nauc_ndcg_at_100_std value: -4.356 - type: nauc_ndcg_at_100_diff1 value: 53.595099999999995 - type: nauc_ndcg_at_1000_max value: 41.016200000000005 - type: nauc_ndcg_at_1000_std value: -6.2975 - type: nauc_ndcg_at_1000_diff1 value: 53.7728 - type: nauc_map_at_1_max value: 35.1263 - type: nauc_map_at_1_std value: -12.454600000000001 - type: nauc_map_at_1_diff1 value: 58.824 - type: nauc_map_at_3_max value: 38.9371 - type: nauc_map_at_3_std value: -10.1381 - type: nauc_map_at_3_diff1 value: 54.008500000000005 - type: nauc_map_at_5_max value: 39.1816 - type: nauc_map_at_5_std value: -9.4667 - type: nauc_map_at_5_diff1 value: 53.748 - type: nauc_map_at_10_max value: 39.5398 - type: nauc_map_at_10_std value: -8.5131 - type: nauc_map_at_10_diff1 value: 54.433699999999995 - type: nauc_map_at_20_max value: 39.4926 - type: nauc_map_at_20_std value: -8.4859 - type: nauc_map_at_20_diff1 value: 54.4071 - type: nauc_map_at_100_max value: 39.4716 - type: nauc_map_at_100_std value: 
-8.4321 - type: nauc_map_at_100_diff1 value: 54.4382 - type: nauc_map_at_1000_max value: 39.4529 - type: nauc_map_at_1000_std value: -8.468499999999999 - type: nauc_map_at_1000_diff1 value: 54.4425 - type: nauc_recall_at_1_max value: 35.1263 - type: nauc_recall_at_1_std value: -12.454600000000001 - type: nauc_recall_at_1_diff1 value: 58.824 - type: nauc_recall_at_3_max value: 46.9678 - type: nauc_recall_at_3_std value: -5.3263 - type: nauc_recall_at_3_diff1 value: 46.4906 - type: nauc_recall_at_5_max value: 51.4392 - type: nauc_recall_at_5_std value: 0.864 - type: nauc_recall_at_5_diff1 value: 42.1144 - type: nauc_recall_at_10_max value: 60.5469 - type: nauc_recall_at_10_std value: 18.2879 - type: nauc_recall_at_10_diff1 value: 48.3112 - type: nauc_recall_at_20_max value: 65.8794 - type: nauc_recall_at_20_std value: 29.569499999999998 - type: nauc_recall_at_20_diff1 value: 45.7507 - type: nauc_recall_at_100_max value: 85.5603 - type: nauc_recall_at_100_std value: 75.366 - type: nauc_recall_at_100_diff1 value: 46.4102 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 35.1263 - type: nauc_precision_at_1_std value: -12.454600000000001 - type: nauc_precision_at_1_diff1 value: 58.824 - type: nauc_precision_at_3_max value: 46.9678 - type: nauc_precision_at_3_std value: -5.3263 - type: nauc_precision_at_3_diff1 value: 46.4906 - type: nauc_precision_at_5_max value: 51.4392 - type: nauc_precision_at_5_std value: 0.864 - type: nauc_precision_at_5_diff1 value: 42.1144 - type: nauc_precision_at_10_max value: 60.5469 - type: nauc_precision_at_10_std value: 18.2879 - type: nauc_precision_at_10_diff1 value: 48.3112 - type: nauc_precision_at_20_max value: 65.8794 - type: nauc_precision_at_20_std value: 29.569499999999998 - type: nauc_precision_at_20_diff1 value: 45.7507 - type: nauc_precision_at_100_max value: 85.5603 - type: nauc_precision_at_100_std value: 75.366 - type: nauc_precision_at_100_diff1 value: 46.4102 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 35.1263 - type: nauc_mrr_at_1_std value: -12.454600000000001 - type: nauc_mrr_at_1_diff1 value: 58.824 - type: nauc_mrr_at_3_max value: 38.9371 - type: nauc_mrr_at_3_std value: -10.1381 - type: nauc_mrr_at_3_diff1 value: 54.008500000000005 - type: nauc_mrr_at_5_max value: 39.1816 - type: nauc_mrr_at_5_std value: -9.4667 - type: nauc_mrr_at_5_diff1 value: 53.748 - type: nauc_mrr_at_10_max value: 39.5398 - type: nauc_mrr_at_10_std value: -8.5131 - type: nauc_mrr_at_10_diff1 value: 54.433699999999995 - type: nauc_mrr_at_20_max value: 39.4926 - type: nauc_mrr_at_20_std value: -8.4859 - type: nauc_mrr_at_20_diff1 value: 54.4071 - type: nauc_mrr_at_100_max value: 39.4716 - type: nauc_mrr_at_100_std value: -8.4321 - type: nauc_mrr_at_100_diff1 value: 54.4382 - type: nauc_mrr_at_1000_max value: 39.4529 - type: nauc_mrr_at_1000_std value: -8.468499999999999 - type: nauc_mrr_at_1000_diff1 value: 54.4425 - type: main_score value: 69.41199999999999 task: type: Retrieval - dataset: config: php name: MTEB CodeSearchNetRetrieval (php) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 60.3 - type: ndcg_at_3 value: 71.487 - type: ndcg_at_5 value: 73.359 - type: ndcg_at_10 value: 75.13 - type: ndcg_at_20 value: 75.768 - type: 
ndcg_at_100 value: 76.652 - type: ndcg_at_1000 value: 77.061 - type: map_at_1 value: 60.3 - type: map_at_3 value: 68.75 - type: map_at_5 value: 69.8 - type: map_at_10 value: 70.526 - type: map_at_20 value: 70.705 - type: map_at_100 value: 70.838 - type: map_at_1000 value: 70.84899999999999 - type: recall_at_1 value: 60.3 - type: recall_at_3 value: 79.4 - type: recall_at_5 value: 83.89999999999999 - type: recall_at_10 value: 89.4 - type: recall_at_20 value: 91.9 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 60.3 - type: precision_at_3 value: 26.467000000000002 - type: precision_at_5 value: 16.78 - type: precision_at_10 value: 8.94 - type: precision_at_20 value: 4.595 - type: precision_at_100 value: 0.9650000000000001 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 60.3 - type: mrr_at_3 value: 68.75 - type: mrr_at_5 value: 69.8 - type: mrr_at_10 value: 70.52619999999999 - type: mrr_at_20 value: 70.7048 - type: mrr_at_100 value: 70.838 - type: mrr_at_1000 value: 70.8488 - type: nauc_ndcg_at_1_max value: 45.8593 - type: nauc_ndcg_at_1_std value: 13.2893 - type: nauc_ndcg_at_1_diff1 value: 66.718 - type: nauc_ndcg_at_3_max value: 55.4137 - type: nauc_ndcg_at_3_std value: 23.0079 - type: nauc_ndcg_at_3_diff1 value: 63.693200000000004 - type: nauc_ndcg_at_5_max value: 56.2033 - type: nauc_ndcg_at_5_std value: 25.2245 - type: nauc_ndcg_at_5_diff1 value: 65.0071 - type: nauc_ndcg_at_10_max value: 56.540400000000005 - type: nauc_ndcg_at_10_std value: 26.323400000000003 - type: nauc_ndcg_at_10_diff1 value: 65.8486 - type: nauc_ndcg_at_20_max value: 56.2864 - type: nauc_ndcg_at_20_std value: 26.6575 - type: nauc_ndcg_at_20_diff1 value: 65.6045 - type: nauc_ndcg_at_100_max value: 55.2604 - type: nauc_ndcg_at_100_std value: 24.9411 - type: nauc_ndcg_at_100_diff1 value: 65.9764 - type: nauc_ndcg_at_1000_max value: 54.514799999999994 - type: nauc_ndcg_at_1000_std value: 23.7436 - type: nauc_ndcg_at_1000_diff1 value: 65.6415 - type: nauc_map_at_1_max value: 45.8593 - type: nauc_map_at_1_std value: 13.2893 - type: nauc_map_at_1_diff1 value: 66.718 - type: nauc_map_at_3_max value: 52.809799999999996 - type: nauc_map_at_3_std value: 20.2338 - type: nauc_map_at_3_diff1 value: 64.4615 - type: nauc_map_at_5_max value: 53.10080000000001 - type: nauc_map_at_5_std value: 21.2375 - type: nauc_map_at_5_diff1 value: 65.1416 - type: nauc_map_at_10_max value: 53.117000000000004 - type: nauc_map_at_10_std value: 21.512999999999998 - type: nauc_map_at_10_diff1 value: 65.4616 - type: nauc_map_at_20_max value: 53.0434 - type: nauc_map_at_20_std value: 21.5865 - type: nauc_map_at_20_diff1 value: 65.4014 - type: nauc_map_at_100_max value: 52.898199999999996 - type: nauc_map_at_100_std value: 21.357 - type: nauc_map_at_100_diff1 value: 65.4438 - type: nauc_map_at_1000_max value: 52.8844 - type: nauc_map_at_1000_std value: 21.3357 - type: nauc_map_at_1000_diff1 value: 65.4388 - type: nauc_recall_at_1_max value: 45.8593 - type: nauc_recall_at_1_std value: 13.2893 - type: nauc_recall_at_1_diff1 value: 66.718 - type: nauc_recall_at_3_max value: 65.5352 - type: nauc_recall_at_3_std value: 33.8655 - type: nauc_recall_at_3_diff1 value: 60.740300000000005 - type: nauc_recall_at_5_max value: 70.9819 - type: nauc_recall_at_5_std value: 44.5937 - type: nauc_recall_at_5_diff1 value: 64.7568 - type: nauc_recall_at_10_max value: 80.07469999999999 - type: nauc_recall_at_10_std value: 60.3717 - type: nauc_recall_at_10_diff1 value: 69.6608 - type: nauc_recall_at_20_max value: 
84.3633 - type: nauc_recall_at_20_std value: 73.2136 - type: nauc_recall_at_20_diff1 value: 68.3675 - type: nauc_recall_at_100_max value: 91.4499 - type: nauc_recall_at_100_std value: 83.50410000000001 - type: nauc_recall_at_100_diff1 value: 82.91579999999999 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 45.8593 - type: nauc_precision_at_1_std value: 13.2893 - type: nauc_precision_at_1_diff1 value: 66.718 - type: nauc_precision_at_3_max value: 65.5352 - type: nauc_precision_at_3_std value: 33.8655 - type: nauc_precision_at_3_diff1 value: 60.740300000000005 - type: nauc_precision_at_5_max value: 70.9819 - type: nauc_precision_at_5_std value: 44.5937 - type: nauc_precision_at_5_diff1 value: 64.7568 - type: nauc_precision_at_10_max value: 80.07469999999999 - type: nauc_precision_at_10_std value: 60.3717 - type: nauc_precision_at_10_diff1 value: 69.6608 - type: nauc_precision_at_20_max value: 84.3633 - type: nauc_precision_at_20_std value: 73.2136 - type: nauc_precision_at_20_diff1 value: 68.3675 - type: nauc_precision_at_100_max value: 91.4499 - type: nauc_precision_at_100_std value: 83.50410000000001 - type: nauc_precision_at_100_diff1 value: 82.91579999999999 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 45.8593 - type: nauc_mrr_at_1_std value: 13.2893 - type: nauc_mrr_at_1_diff1 value: 66.718 - type: nauc_mrr_at_3_max value: 52.809799999999996 - type: nauc_mrr_at_3_std value: 20.2338 - type: nauc_mrr_at_3_diff1 value: 64.4615 - type: nauc_mrr_at_5_max value: 53.10080000000001 - type: nauc_mrr_at_5_std value: 21.2375 - type: nauc_mrr_at_5_diff1 value: 65.1416 - type: nauc_mrr_at_10_max value: 53.117000000000004 - type: nauc_mrr_at_10_std value: 21.512999999999998 - type: nauc_mrr_at_10_diff1 value: 65.4616 - type: nauc_mrr_at_20_max value: 53.0434 - type: nauc_mrr_at_20_std value: 21.5865 - type: nauc_mrr_at_20_diff1 value: 65.4014 - type: nauc_mrr_at_100_max value: 52.898199999999996 - type: nauc_mrr_at_100_std value: 21.357 - type: nauc_mrr_at_100_diff1 value: 65.4438 - type: nauc_mrr_at_1000_max value: 52.8844 - type: nauc_mrr_at_1000_std value: 21.3357 - type: nauc_mrr_at_1000_diff1 value: 65.4388 - type: main_score value: 75.13 task: type: Retrieval - dataset: config: default name: MTEB CodeTransOceanContest (default) revision: 20da4eb20a4b17300c0986ee148c90867a7f2a4d split: test type: CoIR-Retrieval/codetrans-contest metrics: - type: ndcg_at_1 value: 55.656000000000006 - type: ndcg_at_3 value: 62.497 - type: ndcg_at_5 value: 64.95100000000001 - type: ndcg_at_10 value: 66.733 - type: ndcg_at_20 value: 67.778 - type: ndcg_at_100 value: 69.962 - type: ndcg_at_1000 value: 70.736 - type: map_at_1 value: 55.656000000000006 - type: map_at_3 value: 60.934999999999995 - type: map_at_5 value: 62.315 - type: map_at_10 value: 63.065000000000005 - type: map_at_20 value: 63.36000000000001 - type: map_at_100 value: 63.663000000000004 - type: map_at_1000 value: 63.696 - type: recall_at_1 value: 55.656000000000006 - type: recall_at_3 value: 66.968 - type: recall_at_5 value: 72.851 - type: recall_at_10 value: 78.281 - type: recall_at_20 value: 82.353 - type: recall_at_100 value: 94.118 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 55.656000000000006 - type: precision_at_3 value: 22.323 - type: precision_at_5 value: 14.57 - type: 
precision_at_10 value: 7.828 - type: precision_at_20 value: 4.118 - type: precision_at_100 value: 0.941 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 55.656099999999995 - type: mrr_at_3 value: 60.9351 - type: mrr_at_5 value: 62.315200000000004 - type: mrr_at_10 value: 63.0653 - type: mrr_at_20 value: 63.360099999999996 - type: mrr_at_100 value: 63.6629 - type: mrr_at_1000 value: 63.695800000000006 - type: nauc_ndcg_at_1_max value: 51.957600000000006 - type: nauc_ndcg_at_1_std value: -1.4414 - type: nauc_ndcg_at_1_diff1 value: 73.7269 - type: nauc_ndcg_at_3_max value: 56.2033 - type: nauc_ndcg_at_3_std value: -0.5342 - type: nauc_ndcg_at_3_diff1 value: 71.29339999999999 - type: nauc_ndcg_at_5_max value: 53.2043 - type: nauc_ndcg_at_5_std value: -4.2406 - type: nauc_ndcg_at_5_diff1 value: 71.288 - type: nauc_ndcg_at_10_max value: 53.864999999999995 - type: nauc_ndcg_at_10_std value: -1.7964 - type: nauc_ndcg_at_10_diff1 value: 71.3515 - type: nauc_ndcg_at_20_max value: 53.8995 - type: nauc_ndcg_at_20_std value: -2.3122 - type: nauc_ndcg_at_20_diff1 value: 71.5024 - type: nauc_ndcg_at_100_max value: 53.7574 - type: nauc_ndcg_at_100_std value: -2.1357 - type: nauc_ndcg_at_100_diff1 value: 71.57249999999999 - type: nauc_ndcg_at_1000_max value: 53.7629 - type: nauc_ndcg_at_1000_std value: -2.2336 - type: nauc_ndcg_at_1000_diff1 value: 71.6512 - type: nauc_map_at_1_max value: 51.957600000000006 - type: nauc_map_at_1_std value: -1.4414 - type: nauc_map_at_1_diff1 value: 73.7269 - type: nauc_map_at_3_max value: 55.3725 - type: nauc_map_at_3_std value: -0.7385 - type: nauc_map_at_3_diff1 value: 71.94669999999999 - type: nauc_map_at_5_max value: 53.759100000000004 - type: nauc_map_at_5_std value: -2.6806 - type: nauc_map_at_5_diff1 value: 71.97 - type: nauc_map_at_10_max value: 53.9832 - type: nauc_map_at_10_std value: -1.8215 - type: nauc_map_at_10_diff1 value: 72.0873 - type: nauc_map_at_20_max value: 53.9655 - type: nauc_map_at_20_std value: -1.9612 - type: nauc_map_at_20_diff1 value: 72.1207 - type: nauc_map_at_100_max value: 53.8791 - type: nauc_map_at_100_std value: -1.9848000000000001 - type: nauc_map_at_100_diff1 value: 72.0929 - type: nauc_map_at_1000_max value: 53.8818 - type: nauc_map_at_1000_std value: -1.9868000000000001 - type: nauc_map_at_1000_diff1 value: 72.0883 - type: nauc_recall_at_1_max value: 51.957600000000006 - type: nauc_recall_at_1_std value: -1.4414 - type: nauc_recall_at_1_diff1 value: 73.7269 - type: nauc_recall_at_3_max value: 58.7272 - type: nauc_recall_at_3_std value: 0.10269999999999999 - type: nauc_recall_at_3_diff1 value: 69.2012 - type: nauc_recall_at_5_max value: 50.545700000000004 - type: nauc_recall_at_5_std value: -10.5393 - type: nauc_recall_at_5_diff1 value: 68.8226 - type: nauc_recall_at_10_max value: 53.0698 - type: nauc_recall_at_10_std value: -0.7827000000000001 - type: nauc_recall_at_10_diff1 value: 68.00110000000001 - type: nauc_recall_at_20_max value: 53.4631 - type: nauc_recall_at_20_std value: -3.6452 - type: nauc_recall_at_20_diff1 value: 68.3947 - type: nauc_recall_at_100_max value: 54.212700000000005 - type: nauc_recall_at_100_std value: 1.2398 - type: nauc_recall_at_100_diff1 value: 67.33590000000001 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 51.957600000000006 - type: nauc_precision_at_1_std value: -1.4414 - type: nauc_precision_at_1_diff1 value: 73.7269 - type: nauc_precision_at_3_max value: 58.7272 - 
type: nauc_precision_at_3_std value: 0.10269999999999999 - type: nauc_precision_at_3_diff1 value: 69.2012 - type: nauc_precision_at_5_max value: 50.545700000000004 - type: nauc_precision_at_5_std value: -10.5393 - type: nauc_precision_at_5_diff1 value: 68.8226 - type: nauc_precision_at_10_max value: 53.0698 - type: nauc_precision_at_10_std value: -0.7827000000000001 - type: nauc_precision_at_10_diff1 value: 68.00110000000001 - type: nauc_precision_at_20_max value: 53.4631 - type: nauc_precision_at_20_std value: -3.6452 - type: nauc_precision_at_20_diff1 value: 68.3947 - type: nauc_precision_at_100_max value: 54.212700000000005 - type: nauc_precision_at_100_std value: 1.2398 - type: nauc_precision_at_100_diff1 value: 67.33590000000001 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 100.0 - type: nauc_precision_at_1000_diff1 value: 100.0 - type: nauc_mrr_at_1_max value: 51.957600000000006 - type: nauc_mrr_at_1_std value: -1.4414 - type: nauc_mrr_at_1_diff1 value: 73.7269 - type: nauc_mrr_at_3_max value: 55.3725 - type: nauc_mrr_at_3_std value: -0.7385 - type: nauc_mrr_at_3_diff1 value: 71.94669999999999 - type: nauc_mrr_at_5_max value: 53.759100000000004 - type: nauc_mrr_at_5_std value: -2.6806 - type: nauc_mrr_at_5_diff1 value: 71.97 - type: nauc_mrr_at_10_max value: 53.9832 - type: nauc_mrr_at_10_std value: -1.8215 - type: nauc_mrr_at_10_diff1 value: 72.0873 - type: nauc_mrr_at_20_max value: 53.9655 - type: nauc_mrr_at_20_std value: -1.9612 - type: nauc_mrr_at_20_diff1 value: 72.1207 - type: nauc_mrr_at_100_max value: 53.8791 - type: nauc_mrr_at_100_std value: -1.9848000000000001 - type: nauc_mrr_at_100_diff1 value: 72.0929 - type: nauc_mrr_at_1000_max value: 53.8818 - type: nauc_mrr_at_1000_std value: -1.9868000000000001 - type: nauc_mrr_at_1000_diff1 value: 72.0883 - type: main_score value: 66.733 task: type: Retrieval - dataset: config: default name: MTEB CodeTransOceanDL (default) revision: 281562cb8a1265ab5c0824bfa6ddcd9b0a15618f split: test type: CoIR-Retrieval/codetrans-dl metrics: - type: ndcg_at_1 value: 8.889 - type: ndcg_at_3 value: 9.868 - type: ndcg_at_5 value: 16.543 - type: ndcg_at_10 value: 29.599999999999998 - type: ndcg_at_20 value: 36.004999999999995 - type: ndcg_at_100 value: 37.442 - type: ndcg_at_1000 value: 37.601 - type: map_at_1 value: 8.889 - type: map_at_3 value: 9.629999999999999 - type: map_at_5 value: 13.491 - type: map_at_10 value: 18.733 - type: map_at_20 value: 20.687 - type: map_at_100 value: 20.886 - type: map_at_1000 value: 20.895 - type: recall_at_1 value: 8.889 - type: recall_at_3 value: 10.556000000000001 - type: recall_at_5 value: 26.111 - type: recall_at_10 value: 67.22200000000001 - type: recall_at_20 value: 91.111 - type: recall_at_100 value: 98.88900000000001 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 8.889 - type: precision_at_3 value: 3.519 - type: precision_at_5 value: 5.222 - type: precision_at_10 value: 6.722 - type: precision_at_20 value: 4.556 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 1.6667 - type: mrr_at_3 value: 7.963000000000001 - type: mrr_at_5 value: 9.6296 - type: mrr_at_10 value: 15.607099999999999 - type: mrr_at_20 value: 17.2877 - type: mrr_at_100 value: 17.5377 - type: mrr_at_1000 value: 17.5465 - type: nauc_ndcg_at_1_max value: -41.348600000000005 - type: nauc_ndcg_at_1_std value: -29.3584 - type: nauc_ndcg_at_1_diff1 value: -31.9493 - type: nauc_ndcg_at_3_max value: -42.877700000000004 - type: 
nauc_ndcg_at_3_std value: -31.703599999999998 - type: nauc_ndcg_at_3_diff1 value: -26.914500000000004 - type: nauc_ndcg_at_5_max value: -33.1784 - type: nauc_ndcg_at_5_std value: -24.2625 - type: nauc_ndcg_at_5_diff1 value: -11.164399999999999 - type: nauc_ndcg_at_10_max value: -34.5597 - type: nauc_ndcg_at_10_std value: -28.0239 - type: nauc_ndcg_at_10_diff1 value: -8.6589 - type: nauc_ndcg_at_20_max value: -41.0648 - type: nauc_ndcg_at_20_std value: -28.6854 - type: nauc_ndcg_at_20_diff1 value: -12.1999 - type: nauc_ndcg_at_100_max value: -38.2277 - type: nauc_ndcg_at_100_std value: -30.397999999999996 - type: nauc_ndcg_at_100_diff1 value: -14.3859 - type: nauc_ndcg_at_1000_max value: -38.6002 - type: nauc_ndcg_at_1000_std value: -28.9056 - type: nauc_ndcg_at_1000_diff1 value: -14.619499999999999 - type: nauc_map_at_1_max value: -41.348600000000005 - type: nauc_map_at_1_std value: -29.3584 - type: nauc_map_at_1_diff1 value: -31.9493 - type: nauc_map_at_3_max value: -42.5041 - type: nauc_map_at_3_std value: -31.1456 - type: nauc_map_at_3_diff1 value: -27.8752 - type: nauc_map_at_5_max value: -36.146 - type: nauc_map_at_5_std value: -26.268900000000002 - type: nauc_map_at_5_diff1 value: -17.1717 - type: nauc_map_at_10_max value: -36.594300000000004 - type: nauc_map_at_10_std value: -27.884199999999996 - type: nauc_map_at_10_diff1 value: -15.7719 - type: nauc_map_at_20_max value: -38.9209 - type: nauc_map_at_20_std value: -28.2712 - type: nauc_map_at_20_diff1 value: -17.167199999999998 - type: nauc_map_at_100_max value: -38.5835 - type: nauc_map_at_100_std value: -28.5457 - type: nauc_map_at_100_diff1 value: -17.4205 - type: nauc_map_at_1000_max value: -38.6011 - type: nauc_map_at_1000_std value: -28.4752 - type: nauc_map_at_1000_diff1 value: -17.4332 - type: nauc_recall_at_1_max value: -41.348600000000005 - type: nauc_recall_at_1_std value: -29.3584 - type: nauc_recall_at_1_diff1 value: -31.9493 - type: nauc_recall_at_3_max value: -43.884499999999996 - type: nauc_recall_at_3_std value: -33.202 - type: nauc_recall_at_3_diff1 value: -24.4202 - type: nauc_recall_at_5_max value: -27.2488 - type: nauc_recall_at_5_std value: -20.238999999999997 - type: nauc_recall_at_5_diff1 value: 0.5009 - type: nauc_recall_at_10_max value: -30.416700000000002 - type: nauc_recall_at_10_std value: -29.2207 - type: nauc_recall_at_10_diff1 value: 7.2459 - type: nauc_recall_at_20_max value: -63.0894 - type: nauc_recall_at_20_std value: -33.3975 - type: nauc_recall_at_20_diff1 value: 12.6371 - type: nauc_recall_at_100_max value: -2.4276 - type: nauc_recall_at_100_std value: -173.9963 - type: nauc_recall_at_100_diff1 value: 7.9365000000000006 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: -41.348600000000005 - type: nauc_precision_at_1_std value: -29.3584 - type: nauc_precision_at_1_diff1 value: -31.9493 - type: nauc_precision_at_3_max value: -43.884499999999996 - type: nauc_precision_at_3_std value: -33.202 - type: nauc_precision_at_3_diff1 value: -24.4202 - type: nauc_precision_at_5_max value: -27.2488 - type: nauc_precision_at_5_std value: -20.238999999999997 - type: nauc_precision_at_5_diff1 value: 0.5009 - type: nauc_precision_at_10_max value: -30.416700000000002 - type: nauc_precision_at_10_std value: -29.2207 - type: nauc_precision_at_10_diff1 value: 7.2459 - type: nauc_precision_at_20_max value: -63.0894 - type: nauc_precision_at_20_std value: -33.3975 - type: 
nauc_precision_at_20_diff1 value: 12.6371 - type: nauc_precision_at_100_max value: -2.4276 - type: nauc_precision_at_100_std value: -173.9963 - type: nauc_precision_at_100_diff1 value: 7.9365000000000006 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 100.0 - type: nauc_precision_at_1000_diff1 value: 100.0 - type: nauc_mrr_at_1_max value: -54.9682 - type: nauc_mrr_at_1_std value: -52.464 - type: nauc_mrr_at_1_diff1 value: -14.193700000000002 - type: nauc_mrr_at_3_max value: -26.9762 - type: nauc_mrr_at_3_std value: -21.9893 - type: nauc_mrr_at_3_diff1 value: 22.9584 - type: nauc_mrr_at_5_max value: -26.8118 - type: nauc_mrr_at_5_std value: -25.476300000000002 - type: nauc_mrr_at_5_diff1 value: 16.8933 - type: nauc_mrr_at_10_max value: -32.9675 - type: nauc_mrr_at_10_std value: -29.8253 - type: nauc_mrr_at_10_diff1 value: 23.7632 - type: nauc_mrr_at_20_max value: -32.831700000000005 - type: nauc_mrr_at_20_std value: -27.0541 - type: nauc_mrr_at_20_diff1 value: 21.238599999999998 - type: nauc_mrr_at_100_max value: -32.2085 - type: nauc_mrr_at_100_std value: -27.3913 - type: nauc_mrr_at_100_diff1 value: 21.2347 - type: nauc_mrr_at_1000_max value: -32.230399999999996 - type: nauc_mrr_at_1000_std value: -27.2842 - type: nauc_mrr_at_1000_diff1 value: 21.2439 - type: main_score value: 29.599999999999998 task: type: Retrieval - dataset: config: default name: MTEB CosQA (default) revision: bc5efb7e9d437246ce393ed19d772e08e4a79535 split: test type: CoIR-Retrieval/cosqa metrics: - type: ndcg_at_1 value: 16.0 - type: ndcg_at_3 value: 25.474000000000004 - type: ndcg_at_5 value: 31.291000000000004 - type: ndcg_at_10 value: 36.619 - type: ndcg_at_20 value: 39.513999999999996 - type: ndcg_at_100 value: 43.002 - type: ndcg_at_1000 value: 43.846000000000004 - type: map_at_1 value: 16.0 - type: map_at_3 value: 22.967000000000002 - type: map_at_5 value: 26.177 - type: map_at_10 value: 28.427999999999997 - type: map_at_20 value: 29.229 - type: map_at_100 value: 29.725 - type: map_at_1000 value: 29.761 - type: recall_at_1 value: 16.0 - type: recall_at_3 value: 32.800000000000004 - type: recall_at_5 value: 47.0 - type: recall_at_10 value: 63.2 - type: recall_at_20 value: 74.6 - type: recall_at_100 value: 93.2 - type: recall_at_1000 value: 99.6 - type: precision_at_1 value: 16.0 - type: precision_at_3 value: 10.933 - type: precision_at_5 value: 9.4 - type: precision_at_10 value: 6.32 - type: precision_at_20 value: 3.73 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 16.400000000000002 - type: mrr_at_3 value: 24.1333 - type: mrr_at_5 value: 26.043300000000002 - type: mrr_at_10 value: 28.3194 - type: mrr_at_20 value: 29.2356 - type: mrr_at_100 value: 29.7487 - type: mrr_at_1000 value: 29.786600000000004 - type: nauc_ndcg_at_1_max value: 3.254 - type: nauc_ndcg_at_1_std value: -14.7227 - type: nauc_ndcg_at_1_diff1 value: 37.6337 - type: nauc_ndcg_at_3_max value: 7.615600000000001 - type: nauc_ndcg_at_3_std value: -13.242799999999999 - type: nauc_ndcg_at_3_diff1 value: 22.9354 - type: nauc_ndcg_at_5_max value: 11.186599999999999 - type: nauc_ndcg_at_5_std value: -10.3925 - type: nauc_ndcg_at_5_diff1 value: 17.779600000000002 - type: nauc_ndcg_at_10_max value: 9.4009 - type: nauc_ndcg_at_10_std value: -10.864 - type: nauc_ndcg_at_10_diff1 value: 18.1759 - type: nauc_ndcg_at_20_max value: 9.9435 - type: nauc_ndcg_at_20_std value: -10.5532 - type: nauc_ndcg_at_20_diff1 value: 18.0746 - type: nauc_ndcg_at_100_max value: 9.6817 - 
type: nauc_ndcg_at_100_std value: -9.0056 - type: nauc_ndcg_at_100_diff1 value: 20.5883 - type: nauc_ndcg_at_1000_max value: 9.1859 - type: nauc_ndcg_at_1000_std value: -10.2839 - type: nauc_ndcg_at_1000_diff1 value: 21.3418 - type: nauc_map_at_1_max value: 3.254 - type: nauc_map_at_1_std value: -14.7227 - type: nauc_map_at_1_diff1 value: 37.6337 - type: nauc_map_at_3_max value: 6.641800000000001 - type: nauc_map_at_3_std value: -13.4988 - type: nauc_map_at_3_diff1 value: 26.174999999999997 - type: nauc_map_at_5_max value: 8.6381 - type: nauc_map_at_5_std value: -11.8414 - type: nauc_map_at_5_diff1 value: 23.1285 - type: nauc_map_at_10_max value: 7.8475 - type: nauc_map_at_10_std value: -12.021999999999998 - type: nauc_map_at_10_diff1 value: 23.3678 - type: nauc_map_at_20_max value: 8.0317 - type: nauc_map_at_20_std value: -11.8687 - type: nauc_map_at_20_diff1 value: 23.4456 - type: nauc_map_at_100_max value: 7.9571000000000005 - type: nauc_map_at_100_std value: -11.6699 - type: nauc_map_at_100_diff1 value: 23.7984 - type: nauc_map_at_1000_max value: 7.943 - type: nauc_map_at_1000_std value: -11.7087 - type: nauc_map_at_1000_diff1 value: 23.8186 - type: nauc_recall_at_1_max value: 3.254 - type: nauc_recall_at_1_std value: -14.7227 - type: nauc_recall_at_1_diff1 value: 37.6337 - type: nauc_recall_at_3_max value: 9.9777 - type: nauc_recall_at_3_std value: -12.645100000000001 - type: nauc_recall_at_3_diff1 value: 15.090600000000002 - type: nauc_recall_at_5_max value: 17.8264 - type: nauc_recall_at_5_std value: -6.5932 - type: nauc_recall_at_5_diff1 value: 4.3373 - type: nauc_recall_at_10_max value: 13.5901 - type: nauc_recall_at_10_std value: -7.5634999999999994 - type: nauc_recall_at_10_diff1 value: 3.2628999999999997 - type: nauc_recall_at_20_max value: 16.8637 - type: nauc_recall_at_20_std value: -5.876399999999999 - type: nauc_recall_at_20_diff1 value: -2.0105999999999997 - type: nauc_recall_at_100_max value: 28.4163 - type: nauc_recall_at_100_std value: 32.5479 - type: nauc_recall_at_100_diff1 value: 1.6202999999999999 - type: nauc_recall_at_1000_max value: 86.1111 - type: nauc_recall_at_1000_std value: 93.4641 - type: nauc_recall_at_1000_diff1 value: 63.8189 - type: nauc_precision_at_1_max value: 3.254 - type: nauc_precision_at_1_std value: -14.7227 - type: nauc_precision_at_1_diff1 value: 37.6337 - type: nauc_precision_at_3_max value: 9.9777 - type: nauc_precision_at_3_std value: -12.645100000000001 - type: nauc_precision_at_3_diff1 value: 15.090600000000002 - type: nauc_precision_at_5_max value: 17.8264 - type: nauc_precision_at_5_std value: -6.5932 - type: nauc_precision_at_5_diff1 value: 4.3373 - type: nauc_precision_at_10_max value: 13.5901 - type: nauc_precision_at_10_std value: -7.5634999999999994 - type: nauc_precision_at_10_diff1 value: 3.2628999999999997 - type: nauc_precision_at_20_max value: 16.8637 - type: nauc_precision_at_20_std value: -5.876399999999999 - type: nauc_precision_at_20_diff1 value: -2.0105999999999997 - type: nauc_precision_at_100_max value: 28.4163 - type: nauc_precision_at_100_std value: 32.5479 - type: nauc_precision_at_100_diff1 value: 1.6202999999999999 - type: nauc_precision_at_1000_max value: 86.1111 - type: nauc_precision_at_1000_std value: 93.4641 - type: nauc_precision_at_1000_diff1 value: 63.8189 - type: nauc_mrr_at_1_max value: 7.7073 - type: nauc_mrr_at_1_std value: -15.7727 - type: nauc_mrr_at_1_diff1 value: 36.2605 - type: nauc_mrr_at_3_max value: 7.0968 - type: nauc_mrr_at_3_std value: -13.9735 - type: nauc_mrr_at_3_diff1 value: 25.1765 - 
type: nauc_mrr_at_5_max value: 7.2429 - type: nauc_mrr_at_5_std value: -14.223099999999999 - type: nauc_mrr_at_5_diff1 value: 23.2141 - type: nauc_mrr_at_10_max value: 8.1606 - type: nauc_mrr_at_10_std value: -13.4187 - type: nauc_mrr_at_10_diff1 value: 22.9983 - type: nauc_mrr_at_20_max value: 8.39 - type: nauc_mrr_at_20_std value: -13.28 - type: nauc_mrr_at_20_diff1 value: 22.830000000000002 - type: nauc_mrr_at_100_max value: 8.3666 - type: nauc_mrr_at_100_std value: -13.112599999999999 - type: nauc_mrr_at_100_diff1 value: 23.1988 - type: nauc_mrr_at_1000_max value: 8.3461 - type: nauc_mrr_at_1000_std value: -13.159799999999999 - type: nauc_mrr_at_1000_diff1 value: 23.217499999999998 - type: main_score value: 36.619 task: type: Retrieval - dataset: config: default name: MTEB DBPedia (default) revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: ndcg_at_1 value: 54.37499999999999 - type: ndcg_at_3 value: 44.463 - type: ndcg_at_5 value: 41.276 - type: ndcg_at_10 value: 39.409 - type: ndcg_at_20 value: 38.884 - type: ndcg_at_100 value: 44.382 - type: ndcg_at_1000 value: 52.48500000000001 - type: map_at_1 value: 8.709999999999999 - type: map_at_3 value: 13.974 - type: map_at_5 value: 16.104 - type: map_at_10 value: 19.218 - type: map_at_20 value: 21.966 - type: map_at_100 value: 26.290999999999997 - type: map_at_1000 value: 27.985 - type: recall_at_1 value: 8.709999999999999 - type: recall_at_3 value: 15.516 - type: recall_at_5 value: 18.907 - type: recall_at_10 value: 25.27 - type: recall_at_20 value: 31.968000000000004 - type: recall_at_100 value: 51.849999999999994 - type: recall_at_1000 value: 76.491 - type: precision_at_1 value: 67.25 - type: precision_at_3 value: 48.167 - type: precision_at_5 value: 39.4 - type: precision_at_10 value: 30.55 - type: precision_at_20 value: 22.75 - type: precision_at_100 value: 9.588000000000001 - type: precision_at_1000 value: 2.118 - type: mrr_at_1 value: 67.25 - type: mrr_at_3 value: 73.83330000000001 - type: mrr_at_5 value: 74.3083 - type: mrr_at_10 value: 75.03699999999999 - type: mrr_at_20 value: 75.1468 - type: mrr_at_100 value: 75.3182 - type: mrr_at_1000 value: 75.3253 - type: nauc_ndcg_at_1_max value: 30.7815 - type: nauc_ndcg_at_1_std value: 18.9823 - type: nauc_ndcg_at_1_diff1 value: 38.7185 - type: nauc_ndcg_at_3_max value: 27.3482 - type: nauc_ndcg_at_3_std value: 20.1357 - type: nauc_ndcg_at_3_diff1 value: 24.9478 - type: nauc_ndcg_at_5_max value: 23.8231 - type: nauc_ndcg_at_5_std value: 19.8595 - type: nauc_ndcg_at_5_diff1 value: 20.5147 - type: nauc_ndcg_at_10_max value: 19.8984 - type: nauc_ndcg_at_10_std value: 16.6632 - type: nauc_ndcg_at_10_diff1 value: 18.5195 - type: nauc_ndcg_at_20_max value: 15.437000000000001 - type: nauc_ndcg_at_20_std value: 13.8071 - type: nauc_ndcg_at_20_diff1 value: 18.0289 - type: nauc_ndcg_at_100_max value: 15.042900000000001 - type: nauc_ndcg_at_100_std value: 18.1034 - type: nauc_ndcg_at_100_diff1 value: 16.5884 - type: nauc_ndcg_at_1000_max value: 24.6937 - type: nauc_ndcg_at_1000_std value: 28.625 - type: nauc_ndcg_at_1000_diff1 value: 16.9271 - type: nauc_map_at_1_max value: -7.1981 - type: nauc_map_at_1_std value: -20.8768 - type: nauc_map_at_1_diff1 value: 24.6797 - type: nauc_map_at_3_max value: -4.8358 - type: nauc_map_at_3_std value: -16.6611 - type: nauc_map_at_3_diff1 value: 18.9037 - type: nauc_map_at_5_max value: -3.4354999999999998 - type: nauc_map_at_5_std value: -14.018600000000001 - type: nauc_map_at_5_diff1 value: 17.516499999999997 - type: 
nauc_map_at_10_max value: -0.9939999999999999 - type: nauc_map_at_10_std value: -8.484 - type: nauc_map_at_10_diff1 value: 15.8007 - type: nauc_map_at_20_max value: 3.2260999999999997 - type: nauc_map_at_20_std value: -0.8369 - type: nauc_map_at_20_diff1 value: 15.8524 - type: nauc_map_at_100_max value: 9.8084 - type: nauc_map_at_100_std value: 11.7005 - type: nauc_map_at_100_diff1 value: 16.5458 - type: nauc_map_at_1000_max value: 12.7583 - type: nauc_map_at_1000_std value: 15.331 - type: nauc_map_at_1000_diff1 value: 16.7243 - type: nauc_recall_at_1_max value: -7.1981 - type: nauc_recall_at_1_std value: -20.8768 - type: nauc_recall_at_1_diff1 value: 24.6797 - type: nauc_recall_at_3_max value: -8.7416 - type: nauc_recall_at_3_std value: -18.1497 - type: nauc_recall_at_3_diff1 value: 13.2151 - type: nauc_recall_at_5_max value: -7.7954 - type: nauc_recall_at_5_std value: -16.4247 - type: nauc_recall_at_5_diff1 value: 11.3209 - type: nauc_recall_at_10_max value: -6.8051 - type: nauc_recall_at_10_std value: -11.8753 - type: nauc_recall_at_10_diff1 value: 9.1489 - type: nauc_recall_at_20_max value: -3.7832999999999997 - type: nauc_recall_at_20_std value: -4.0681 - type: nauc_recall_at_20_diff1 value: 7.769299999999999 - type: nauc_recall_at_100_max value: 2.4143000000000003 - type: nauc_recall_at_100_std value: 13.5572 - type: nauc_recall_at_100_diff1 value: 6.3968 - type: nauc_recall_at_1000_max value: 14.8639 - type: nauc_recall_at_1000_std value: 34.389900000000004 - type: nauc_recall_at_1000_diff1 value: 2.3819 - type: nauc_precision_at_1_max value: 39.8074 - type: nauc_precision_at_1_std value: 29.7269 - type: nauc_precision_at_1_diff1 value: 46.7701 - type: nauc_precision_at_3_max value: 32.2757 - type: nauc_precision_at_3_std value: 30.7486 - type: nauc_precision_at_3_diff1 value: 13.880400000000002 - type: nauc_precision_at_5_max value: 31.016 - type: nauc_precision_at_5_std value: 37.9799 - type: nauc_precision_at_5_diff1 value: 7.4082 - type: nauc_precision_at_10_max value: 32.268 - type: nauc_precision_at_10_std value: 43.9588 - type: nauc_precision_at_10_diff1 value: 4.3159 - type: nauc_precision_at_20_max value: 32.264199999999995 - type: nauc_precision_at_20_std value: 48.2933 - type: nauc_precision_at_20_diff1 value: 3.8432 - type: nauc_precision_at_100_max value: 30.725799999999996 - type: nauc_precision_at_100_std value: 49.6683 - type: nauc_precision_at_100_diff1 value: 0.0351 - type: nauc_precision_at_1000_max value: 28.237299999999998 - type: nauc_precision_at_1000_std value: 24.8433 - type: nauc_precision_at_1000_diff1 value: 3.6408000000000005 - type: nauc_mrr_at_1_max value: 39.8074 - type: nauc_mrr_at_1_std value: 29.7269 - type: nauc_mrr_at_1_diff1 value: 46.7701 - type: nauc_mrr_at_3_max value: 42.7825 - type: nauc_mrr_at_3_std value: 32.467800000000004 - type: nauc_mrr_at_3_diff1 value: 43.7056 - type: nauc_mrr_at_5_max value: 43.0631 - type: nauc_mrr_at_5_std value: 32.859 - type: nauc_mrr_at_5_diff1 value: 43.646 - type: nauc_mrr_at_10_max value: 42.8307 - type: nauc_mrr_at_10_std value: 32.8042 - type: nauc_mrr_at_10_diff1 value: 43.3566 - type: nauc_mrr_at_20_max value: 42.9185 - type: nauc_mrr_at_20_std value: 32.723600000000005 - type: nauc_mrr_at_20_diff1 value: 43.6419 - type: nauc_mrr_at_100_max value: 43.006699999999995 - type: nauc_mrr_at_100_std value: 32.628800000000005 - type: nauc_mrr_at_100_diff1 value: 43.935 - type: nauc_mrr_at_1000_max value: 42.9879 - type: nauc_mrr_at_1000_std value: 32.6121 - type: nauc_mrr_at_1000_diff1 value: 43.9284 - type: 
main_score value: 39.409 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 40.949999999999996 - type: f1 value: 37.1674 - type: f1_weighted value: 43.1842 - type: main_score value: 40.949999999999996 task: type: Classification - dataset: config: default name: MTEB FEVER (default) revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: ndcg_at_1 value: 85.179 - type: ndcg_at_3 value: 87.304 - type: ndcg_at_5 value: 87.862 - type: ndcg_at_10 value: 88.229 - type: ndcg_at_20 value: 88.49000000000001 - type: ndcg_at_100 value: 88.84 - type: ndcg_at_1000 value: 89.116 - type: map_at_1 value: 78.993 - type: map_at_3 value: 84.37 - type: map_at_5 value: 84.812 - type: map_at_10 value: 85.02 - type: map_at_20 value: 85.114 - type: map_at_100 value: 85.18599999999999 - type: map_at_1000 value: 85.2 - type: recall_at_1 value: 78.993 - type: recall_at_3 value: 89.96499999999999 - type: recall_at_5 value: 91.562 - type: recall_at_10 value: 92.685 - type: recall_at_20 value: 93.595 - type: recall_at_100 value: 95.16 - type: recall_at_1000 value: 96.943 - type: precision_at_1 value: 85.179 - type: precision_at_3 value: 32.543 - type: precision_at_5 value: 19.930999999999997 - type: precision_at_10 value: 10.129000000000001 - type: precision_at_20 value: 5.140000000000001 - type: precision_at_100 value: 1.06 - type: precision_at_1000 value: 0.11 - type: mrr_at_1 value: 85.1785 - type: mrr_at_3 value: 90.3215 - type: mrr_at_5 value: 90.6223 - type: mrr_at_10 value: 90.74449999999999 - type: mrr_at_20 value: 90.78389999999999 - type: mrr_at_100 value: 90.79899999999999 - type: mrr_at_1000 value: 90.80080000000001 - type: nauc_ndcg_at_1_max value: 42.509 - type: nauc_ndcg_at_1_std value: -14.4135 - type: nauc_ndcg_at_1_diff1 value: 69.351 - type: nauc_ndcg_at_3_max value: 31.848599999999998 - type: nauc_ndcg_at_3_std value: -8.8348 - type: nauc_ndcg_at_3_diff1 value: 43.6934 - type: nauc_ndcg_at_5_max value: 30.5029 - type: nauc_ndcg_at_5_std value: -7.1606000000000005 - type: nauc_ndcg_at_5_diff1 value: 43.1125 - type: nauc_ndcg_at_10_max value: 30.383900000000004 - type: nauc_ndcg_at_10_std value: -6.112299999999999 - type: nauc_ndcg_at_10_diff1 value: 42.9948 - type: nauc_ndcg_at_20_max value: 30.6167 - type: nauc_ndcg_at_20_std value: -5.6432 - type: nauc_ndcg_at_20_diff1 value: 43.247600000000006 - type: nauc_ndcg_at_100_max value: 31.2245 - type: nauc_ndcg_at_100_std value: -5.3287 - type: nauc_ndcg_at_100_diff1 value: 43.5092 - type: nauc_ndcg_at_1000_max value: 31.724999999999998 - type: nauc_ndcg_at_1000_std value: -5.5252 - type: nauc_ndcg_at_1000_diff1 value: 44.1117 - type: nauc_map_at_1_max value: 33.535900000000005 - type: nauc_map_at_1_std value: -7.5043 - type: nauc_map_at_1_diff1 value: 51.1658 - type: nauc_map_at_3_max value: 30.357499999999998 - type: nauc_map_at_3_std value: -7.0673 - type: nauc_map_at_3_diff1 value: 43.169000000000004 - type: nauc_map_at_5_max value: 30.1609 - type: nauc_map_at_5_std value: -6.2828 - type: nauc_map_at_5_diff1 value: 43.22 - type: nauc_map_at_10_max value: 30.2687 - type: nauc_map_at_10_std value: -5.931299999999999 - type: nauc_map_at_10_diff1 value: 43.3113 - type: nauc_map_at_20_max value: 30.3425 - type: nauc_map_at_20_std value: -5.827999999999999 - type: nauc_map_at_20_diff1 value: 43.378 - type: nauc_map_at_100_max value: 30.4597 - type: 
nauc_map_at_100_std value: -5.781 - type: nauc_map_at_100_diff1 value: 43.4338 - type: nauc_map_at_1000_max value: 30.4815 - type: nauc_map_at_1000_std value: -5.7874 - type: nauc_map_at_1000_diff1 value: 43.4604 - type: nauc_recall_at_1_max value: 33.535900000000005 - type: nauc_recall_at_1_std value: -7.5043 - type: nauc_recall_at_1_diff1 value: 51.1658 - type: nauc_recall_at_3_max value: 21.5412 - type: nauc_recall_at_3_std value: -5.3411 - type: nauc_recall_at_3_diff1 value: 22.9753 - type: nauc_recall_at_5_max value: 18.2607 - type: nauc_recall_at_5_std value: 0.4319 - type: nauc_recall_at_5_diff1 value: 18.4494 - type: nauc_recall_at_10_max value: 16.9918 - type: nauc_recall_at_10_std value: 5.6791 - type: nauc_recall_at_10_diff1 value: 14.8096 - type: nauc_recall_at_20_max value: 16.2394 - type: nauc_recall_at_20_std value: 10.014000000000001 - type: nauc_recall_at_20_diff1 value: 12.6674 - type: nauc_recall_at_100_max value: 17.160700000000002 - type: nauc_recall_at_100_std value: 17.7282 - type: nauc_recall_at_100_diff1 value: 6.4750000000000005 - type: nauc_recall_at_1000_max value: 18.7047 - type: nauc_recall_at_1000_std value: 26.4285 - type: nauc_recall_at_1000_diff1 value: -0.4528 - type: nauc_precision_at_1_max value: 42.509 - type: nauc_precision_at_1_std value: -14.4135 - type: nauc_precision_at_1_diff1 value: 69.351 - type: nauc_precision_at_3_max value: 21.5337 - type: nauc_precision_at_3_std value: -18.1489 - type: nauc_precision_at_3_diff1 value: 23.7103 - type: nauc_precision_at_5_max value: 10.8839 - type: nauc_precision_at_5_std value: -8.7334 - type: nauc_precision_at_5_diff1 value: 12.0412 - type: nauc_precision_at_10_max value: 5.632000000000001 - type: nauc_precision_at_10_std value: -1.2274 - type: nauc_precision_at_10_diff1 value: 3.2148000000000003 - type: nauc_precision_at_20_max value: 3.6290999999999998 - type: nauc_precision_at_20_std value: 3.1643 - type: nauc_precision_at_20_diff1 value: -2.106 - type: nauc_precision_at_100_max value: 3.749 - type: nauc_precision_at_100_std value: 5.944599999999999 - type: nauc_precision_at_100_diff1 value: -8.2121 - type: nauc_precision_at_1000_max value: 3.9972 - type: nauc_precision_at_1000_std value: 3.2577000000000003 - type: nauc_precision_at_1000_diff1 value: -8.6116 - type: nauc_mrr_at_1_max value: 42.509 - type: nauc_mrr_at_1_std value: -14.4135 - type: nauc_mrr_at_1_diff1 value: 69.351 - type: nauc_mrr_at_3_max value: 41.805 - type: nauc_mrr_at_3_std value: -17.8756 - type: nauc_mrr_at_3_diff1 value: 65.21050000000001 - type: nauc_mrr_at_5_max value: 41.9114 - type: nauc_mrr_at_5_std value: -17.1294 - type: nauc_mrr_at_5_diff1 value: 65.5444 - type: nauc_mrr_at_10_max value: 42.1507 - type: nauc_mrr_at_10_std value: -16.7196 - type: nauc_mrr_at_10_diff1 value: 65.76480000000001 - type: nauc_mrr_at_20_max value: 42.1918 - type: nauc_mrr_at_20_std value: -16.6012 - type: nauc_mrr_at_20_diff1 value: 65.9105 - type: nauc_mrr_at_100_max value: 42.1853 - type: nauc_mrr_at_100_std value: -16.578799999999998 - type: nauc_mrr_at_100_diff1 value: 65.9277 - type: nauc_mrr_at_1000_max value: 42.1787 - type: nauc_mrr_at_1000_std value: -16.5811 - type: nauc_mrr_at_1000_diff1 value: 65.9297 - type: main_score value: 88.229 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 (default) revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: ndcg_at_1 value: 44.599 - type: ndcg_at_3 value: 41.597 - type: ndcg_at_5 value: 42.611 - type: ndcg_at_10 value: 44.931 - type: 
ndcg_at_20 value: 47.727000000000004 - type: ndcg_at_100 value: 51.914 - type: ndcg_at_1000 value: 54.674 - type: map_at_1 value: 22.586000000000002 - type: map_at_3 value: 32.445 - type: map_at_5 value: 34.951 - type: map_at_10 value: 36.836 - type: map_at_20 value: 37.958 - type: map_at_100 value: 38.863 - type: map_at_1000 value: 39.041 - type: recall_at_1 value: 22.586000000000002 - type: recall_at_3 value: 37.802 - type: recall_at_5 value: 43.86 - type: recall_at_10 value: 51.519999999999996 - type: recall_at_20 value: 60.22 - type: recall_at_100 value: 77.251 - type: recall_at_1000 value: 93.503 - type: precision_at_1 value: 44.599 - type: precision_at_3 value: 27.622999999999998 - type: precision_at_5 value: 20.093 - type: precision_at_10 value: 12.346 - type: precision_at_20 value: 7.353 - type: precision_at_100 value: 1.951 - type: precision_at_1000 value: 0.244 - type: mrr_at_1 value: 44.5988 - type: mrr_at_3 value: 51.157399999999996 - type: mrr_at_5 value: 52.4228 - type: mrr_at_10 value: 53.4708 - type: mrr_at_20 value: 53.898500000000006 - type: mrr_at_100 value: 54.18619999999999 - type: mrr_at_1000 value: 54.2227 - type: nauc_ndcg_at_1_max value: 41.8311 - type: nauc_ndcg_at_1_std value: -1.4024999999999999 - type: nauc_ndcg_at_1_diff1 value: 51.9037 - type: nauc_ndcg_at_3_max value: 35.448299999999996 - type: nauc_ndcg_at_3_std value: -0.3253 - type: nauc_ndcg_at_3_diff1 value: 40.5332 - type: nauc_ndcg_at_5_max value: 34.3939 - type: nauc_ndcg_at_5_std value: 0.5177 - type: nauc_ndcg_at_5_diff1 value: 39.729 - type: nauc_ndcg_at_10_max value: 32.8185 - type: nauc_ndcg_at_10_std value: 1.2571 - type: nauc_ndcg_at_10_diff1 value: 39.358 - type: nauc_ndcg_at_20_max value: 34.4751 - type: nauc_ndcg_at_20_std value: 3.0460000000000003 - type: nauc_ndcg_at_20_diff1 value: 40.474700000000006 - type: nauc_ndcg_at_100_max value: 37.079699999999995 - type: nauc_ndcg_at_100_std value: 6.704400000000001 - type: nauc_ndcg_at_100_diff1 value: 41.145199999999996 - type: nauc_ndcg_at_1000_max value: 37.5561 - type: nauc_ndcg_at_1000_std value: 5.4764 - type: nauc_ndcg_at_1000_diff1 value: 41.104400000000005 - type: nauc_map_at_1_max value: 22.570899999999998 - type: nauc_map_at_1_std value: -4.3153 - type: nauc_map_at_1_diff1 value: 45.949400000000004 - type: nauc_map_at_3_max value: 27.0957 - type: nauc_map_at_3_std value: -2.0714 - type: nauc_map_at_3_diff1 value: 40.2278 - type: nauc_map_at_5_max value: 29.744500000000002 - type: nauc_map_at_5_std value: -0.6752 - type: nauc_map_at_5_diff1 value: 39.44 - type: nauc_map_at_10_max value: 30.2678 - type: nauc_map_at_10_std value: -0.0069 - type: nauc_map_at_10_diff1 value: 38.9648 - type: nauc_map_at_20_max value: 31.381700000000002 - type: nauc_map_at_20_std value: 0.765 - type: nauc_map_at_20_diff1 value: 39.3088 - type: nauc_map_at_100_max value: 32.1076 - type: nauc_map_at_100_std value: 1.4984000000000002 - type: nauc_map_at_100_diff1 value: 39.4675 - type: nauc_map_at_1000_max value: 32.1799 - type: nauc_map_at_1000_std value: 1.4738 - type: nauc_map_at_1000_diff1 value: 39.4786 - type: nauc_recall_at_1_max value: 22.570899999999998 - type: nauc_recall_at_1_std value: -4.3153 - type: nauc_recall_at_1_diff1 value: 45.949400000000004 - type: nauc_recall_at_3_max value: 22.0782 - type: nauc_recall_at_3_std value: -1.7135999999999998 - type: nauc_recall_at_3_diff1 value: 33.5696 - type: nauc_recall_at_5_max value: 24.9421 - type: nauc_recall_at_5_std value: 0.47019999999999995 - type: nauc_recall_at_5_diff1 value: 31.660899999999998 - 
type: nauc_recall_at_10_max value: 22.847 - type: nauc_recall_at_10_std value: 2.1398 - type: nauc_recall_at_10_diff1 value: 27.879199999999997 - type: nauc_recall_at_20_max value: 24.476 - type: nauc_recall_at_20_std value: 7.3819 - type: nauc_recall_at_20_diff1 value: 29.717100000000002 - type: nauc_recall_at_100_max value: 33.1008 - type: nauc_recall_at_100_std value: 32.008900000000004 - type: nauc_recall_at_100_diff1 value: 29.1164 - type: nauc_recall_at_1000_max value: 39.5742 - type: nauc_recall_at_1000_std value: 51.944199999999995 - type: nauc_recall_at_1000_diff1 value: 17.8932 - type: nauc_precision_at_1_max value: 41.8311 - type: nauc_precision_at_1_std value: -1.4024999999999999 - type: nauc_precision_at_1_diff1 value: 51.9037 - type: nauc_precision_at_3_max value: 38.707300000000004 - type: nauc_precision_at_3_std value: 3.3242000000000003 - type: nauc_precision_at_3_diff1 value: 26.32 - type: nauc_precision_at_5_max value: 40.4051 - type: nauc_precision_at_5_std value: 7.2255 - type: nauc_precision_at_5_diff1 value: 20.524 - type: nauc_precision_at_10_max value: 37.024 - type: nauc_precision_at_10_std value: 8.871 - type: nauc_precision_at_10_diff1 value: 14.985100000000001 - type: nauc_precision_at_20_max value: 39.8142 - type: nauc_precision_at_20_std value: 12.9133 - type: nauc_precision_at_20_diff1 value: 13.5855 - type: nauc_precision_at_100_max value: 36.8128 - type: nauc_precision_at_100_std value: 17.273 - type: nauc_precision_at_100_diff1 value: 7.706799999999999 - type: nauc_precision_at_1000_max value: 29.197699999999998 - type: nauc_precision_at_1000_std value: 10.452200000000001 - type: nauc_precision_at_1000_diff1 value: -0.43429999999999996 - type: nauc_mrr_at_1_max value: 41.8311 - type: nauc_mrr_at_1_std value: -1.4024999999999999 - type: nauc_mrr_at_1_diff1 value: 51.9037 - type: nauc_mrr_at_3_max value: 41.5348 - type: nauc_mrr_at_3_std value: 0.47200000000000003 - type: nauc_mrr_at_3_diff1 value: 48.2132 - type: nauc_mrr_at_5_max value: 41.4712 - type: nauc_mrr_at_5_std value: 0.9362 - type: nauc_mrr_at_5_diff1 value: 47.7862 - type: nauc_mrr_at_10_max value: 41.3833 - type: nauc_mrr_at_10_std value: 0.9305000000000001 - type: nauc_mrr_at_10_diff1 value: 47.8177 - type: nauc_mrr_at_20_max value: 41.5143 - type: nauc_mrr_at_20_std value: 1.2017 - type: nauc_mrr_at_20_diff1 value: 48.0106 - type: nauc_mrr_at_100_max value: 41.6027 - type: nauc_mrr_at_100_std value: 1.3906999999999998 - type: nauc_mrr_at_100_diff1 value: 48.0719 - type: nauc_mrr_at_1000_max value: 41.597 - type: nauc_mrr_at_1000_std value: 1.3443 - type: nauc_mrr_at_1000_diff1 value: 48.0767 - type: main_score value: 44.931 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA (default) revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa metrics: - type: ndcg_at_1 value: 76.354 - type: ndcg_at_3 value: 62.900999999999996 - type: ndcg_at_5 value: 65.68 - type: ndcg_at_10 value: 67.776 - type: ndcg_at_20 value: 69.144 - type: ndcg_at_100 value: 70.85000000000001 - type: ndcg_at_1000 value: 72.151 - type: map_at_1 value: 38.177 - type: map_at_3 value: 55.554 - type: map_at_5 value: 57.774 - type: map_at_10 value: 59.022 - type: map_at_20 value: 59.574000000000005 - type: map_at_100 value: 59.925 - type: map_at_1000 value: 59.99 - type: recall_at_1 value: 38.177 - type: recall_at_3 value: 60.169 - type: recall_at_5 value: 65.63799999999999 - type: recall_at_10 value: 70.878 - type: recall_at_20 value: 75.267 - type: recall_at_100 value: 82.822 - 
type: recall_at_1000 value: 91.472 - type: precision_at_1 value: 76.354 - type: precision_at_3 value: 40.113 - type: precision_at_5 value: 26.255 - type: precision_at_10 value: 14.176 - type: precision_at_20 value: 7.527 - type: precision_at_100 value: 1.656 - type: precision_at_1000 value: 0.183 - type: mrr_at_1 value: 76.3538 - type: mrr_at_3 value: 81.7218 - type: mrr_at_5 value: 82.3403 - type: mrr_at_10 value: 82.7021 - type: mrr_at_20 value: 82.8339 - type: mrr_at_100 value: 82.88889999999999 - type: mrr_at_1000 value: 82.8978 - type: nauc_ndcg_at_1_max value: 45.4675 - type: nauc_ndcg_at_1_std value: -8.5846 - type: nauc_ndcg_at_1_diff1 value: 67.2619 - type: nauc_ndcg_at_3_max value: 29.083399999999997 - type: nauc_ndcg_at_3_std value: 0.9821 - type: nauc_ndcg_at_3_diff1 value: 22.708000000000002 - type: nauc_ndcg_at_5_max value: 29.0541 - type: nauc_ndcg_at_5_std value: 3.5778999999999996 - type: nauc_ndcg_at_5_diff1 value: 20.8512 - type: nauc_ndcg_at_10_max value: 28.6135 - type: nauc_ndcg_at_10_std value: 5.3694 - type: nauc_ndcg_at_10_diff1 value: 19.913700000000002 - type: nauc_ndcg_at_20_max value: 28.971000000000004 - type: nauc_ndcg_at_20_std value: 6.6706 - type: nauc_ndcg_at_20_diff1 value: 20.015900000000002 - type: nauc_ndcg_at_100_max value: 29.2235 - type: nauc_ndcg_at_100_std value: 7.5165 - type: nauc_ndcg_at_100_diff1 value: 20.703 - type: nauc_ndcg_at_1000_max value: 29.808 - type: nauc_ndcg_at_1000_std value: 7.0276000000000005 - type: nauc_ndcg_at_1000_diff1 value: 21.8394 - type: nauc_map_at_1_max value: 45.4675 - type: nauc_map_at_1_std value: -8.5846 - type: nauc_map_at_1_diff1 value: 67.2619 - type: nauc_map_at_3_max value: 25.374200000000002 - type: nauc_map_at_3_std value: 1.4205 - type: nauc_map_at_3_diff1 value: 16.7465 - type: nauc_map_at_5_max value: 25.5649 - type: nauc_map_at_5_std value: 3.2438000000000002 - type: nauc_map_at_5_diff1 value: 15.676200000000001 - type: nauc_map_at_10_max value: 25.4328 - type: nauc_map_at_10_std value: 4.198799999999999 - type: nauc_map_at_10_diff1 value: 15.3134 - type: nauc_map_at_20_max value: 25.583299999999998 - type: nauc_map_at_20_std value: 4.6277 - type: nauc_map_at_20_diff1 value: 15.4013 - type: nauc_map_at_100_max value: 25.647100000000002 - type: nauc_map_at_100_std value: 4.7775 - type: nauc_map_at_100_diff1 value: 15.543999999999999 - type: nauc_map_at_1000_max value: 25.672299999999996 - type: nauc_map_at_1000_std value: 4.7689 - type: nauc_map_at_1000_diff1 value: 15.5824 - type: nauc_recall_at_1_max value: 45.4675 - type: nauc_recall_at_1_std value: -8.5846 - type: nauc_recall_at_1_diff1 value: 67.2619 - type: nauc_recall_at_3_max value: 23.5896 - type: nauc_recall_at_3_std value: 4.3086 - type: nauc_recall_at_3_diff1 value: 8.8109 - type: nauc_recall_at_5_max value: 22.2473 - type: nauc_recall_at_5_std value: 9.2394 - type: nauc_recall_at_5_diff1 value: 4.0969 - type: nauc_recall_at_10_max value: 19.930600000000002 - type: nauc_recall_at_10_std value: 14.0805 - type: nauc_recall_at_10_diff1 value: -0.1729 - type: nauc_recall_at_20_max value: 19.938 - type: nauc_recall_at_20_std value: 19.3764 - type: nauc_recall_at_20_diff1 value: -2.1292999999999997 - type: nauc_recall_at_100_max value: 18.3819 - type: nauc_recall_at_100_std value: 27.5254 - type: nauc_recall_at_100_diff1 value: -4.7437 - type: nauc_recall_at_1000_max value: 20.441699999999997 - type: nauc_recall_at_1000_std value: 35.8119 - type: nauc_recall_at_1000_diff1 value: -6.1713 - type: nauc_precision_at_1_max value: 45.4675 - type: 
nauc_precision_at_1_std value: -8.5846 - type: nauc_precision_at_1_diff1 value: 67.2619 - type: nauc_precision_at_3_max value: 23.5896 - type: nauc_precision_at_3_std value: 4.3086 - type: nauc_precision_at_3_diff1 value: 8.8109 - type: nauc_precision_at_5_max value: 22.2473 - type: nauc_precision_at_5_std value: 9.2394 - type: nauc_precision_at_5_diff1 value: 4.0969 - type: nauc_precision_at_10_max value: 19.930600000000002 - type: nauc_precision_at_10_std value: 14.0805 - type: nauc_precision_at_10_diff1 value: -0.1729 - type: nauc_precision_at_20_max value: 19.938 - type: nauc_precision_at_20_std value: 19.3764 - type: nauc_precision_at_20_diff1 value: -2.1292999999999997 - type: nauc_precision_at_100_max value: 18.3819 - type: nauc_precision_at_100_std value: 27.5254 - type: nauc_precision_at_100_diff1 value: -4.7437 - type: nauc_precision_at_1000_max value: 20.441699999999997 - type: nauc_precision_at_1000_std value: 35.8119 - type: nauc_precision_at_1000_diff1 value: -6.1713 - type: nauc_mrr_at_1_max value: 45.4675 - type: nauc_mrr_at_1_std value: -8.5846 - type: nauc_mrr_at_1_diff1 value: 67.2619 - type: nauc_mrr_at_3_max value: 49.182700000000004 - type: nauc_mrr_at_3_std value: -6.6154 - type: nauc_mrr_at_3_diff1 value: 65.8318 - type: nauc_mrr_at_5_max value: 49.1926 - type: nauc_mrr_at_5_std value: -6.059699999999999 - type: nauc_mrr_at_5_diff1 value: 65.819 - type: nauc_mrr_at_10_max value: 49.0188 - type: nauc_mrr_at_10_std value: -5.976 - type: nauc_mrr_at_10_diff1 value: 65.962 - type: nauc_mrr_at_20_max value: 49.0418 - type: nauc_mrr_at_20_std value: -5.9215 - type: nauc_mrr_at_20_diff1 value: 66.0577 - type: nauc_mrr_at_100_max value: 48.9901 - type: nauc_mrr_at_100_std value: -5.9538 - type: nauc_mrr_at_100_diff1 value: 66.0463 - type: nauc_mrr_at_1000_max value: 48.9822 - type: nauc_mrr_at_1000_std value: -5.9649 - type: nauc_mrr_at_1000_diff1 value: 66.0457 - type: main_score value: 67.776 task: type: Retrieval - dataset: config: default name: MTEB ImdbClassification (default) revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 64.4052 - type: f1 value: 64.2124 - type: f1_weighted value: 64.2124 - type: ap value: 59.430899999999994 - type: ap_weighted value: 59.430899999999994 - type: main_score value: 64.4052 task: type: Classification - dataset: config: default name: MTEB MSMARCO (default) revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: - type: ndcg_at_1 value: 15.443999999999999 - type: ndcg_at_3 value: 24.745 - type: ndcg_at_5 value: 28.560000000000002 - type: ndcg_at_10 value: 32.495000000000005 - type: ndcg_at_20 value: 35.226 - type: ndcg_at_100 value: 38.957 - type: ndcg_at_1000 value: 40.684 - type: map_at_1 value: 15.062000000000001 - type: map_at_3 value: 22.236 - type: map_at_5 value: 24.362000000000002 - type: map_at_10 value: 26.008 - type: map_at_20 value: 26.77 - type: map_at_100 value: 27.305 - type: map_at_1000 value: 27.372999999999998 - type: recall_at_1 value: 15.062000000000001 - type: recall_at_3 value: 31.556 - type: recall_at_5 value: 40.705999999999996 - type: recall_at_10 value: 52.72 - type: recall_at_20 value: 63.336000000000006 - type: recall_at_100 value: 83.006 - type: recall_at_1000 value: 96.263 - type: precision_at_1 value: 15.443999999999999 - type: precision_at_3 value: 10.86 - type: precision_at_5 value: 8.441 - type: precision_at_10 value: 5.486 - type: precision_at_20 value: 3.308 - type: precision_at_100 value: 
0.8750000000000001 - type: precision_at_1000 value: 0.10200000000000001 - type: mrr_at_1 value: 15.444099999999999 - type: mrr_at_3 value: 22.7006 - type: mrr_at_5 value: 24.843799999999998 - type: mrr_at_10 value: 26.458199999999998 - type: mrr_at_20 value: 27.2124 - type: mrr_at_100 value: 27.7184 - type: mrr_at_1000 value: 27.7802 - type: nauc_ndcg_at_1_max value: 1.9339 - type: nauc_ndcg_at_1_std value: -13.125200000000001 - type: nauc_ndcg_at_1_diff1 value: 30.440499999999997 - type: nauc_ndcg_at_3_max value: 2.0631 - type: nauc_ndcg_at_3_std value: -15.065600000000002 - type: nauc_ndcg_at_3_diff1 value: 25.459300000000002 - type: nauc_ndcg_at_5_max value: 2.7612 - type: nauc_ndcg_at_5_std value: -15.576400000000001 - type: nauc_ndcg_at_5_diff1 value: 24.861 - type: nauc_ndcg_at_10_max value: 3.5461 - type: nauc_ndcg_at_10_std value: -15.2368 - type: nauc_ndcg_at_10_diff1 value: 25.328699999999998 - type: nauc_ndcg_at_20_max value: 4.4956000000000005 - type: nauc_ndcg_at_20_std value: -13.415099999999999 - type: nauc_ndcg_at_20_diff1 value: 25.401200000000003 - type: nauc_ndcg_at_100_max value: 5.1996 - type: nauc_ndcg_at_100_std value: -10.7691 - type: nauc_ndcg_at_100_diff1 value: 25.4837 - type: nauc_ndcg_at_1000_max value: 4.8437 - type: nauc_ndcg_at_1000_std value: -11.6759 - type: nauc_ndcg_at_1000_diff1 value: 25.6542 - type: nauc_map_at_1_max value: 1.8748999999999998 - type: nauc_map_at_1_std value: -13.203000000000001 - type: nauc_map_at_1_diff1 value: 30.786599999999996 - type: nauc_map_at_3_max value: 1.9382 - type: nauc_map_at_3_std value: -14.772499999999999 - type: nauc_map_at_3_diff1 value: 26.579900000000002 - type: nauc_map_at_5_max value: 2.3708 - type: nauc_map_at_5_std value: -15.093300000000001 - type: nauc_map_at_5_diff1 value: 26.2289 - type: nauc_map_at_10_max value: 2.7201 - type: nauc_map_at_10_std value: -14.9842 - type: nauc_map_at_10_diff1 value: 26.431700000000003 - type: nauc_map_at_20_max value: 2.9757 - type: nauc_map_at_20_std value: -14.4729 - type: nauc_map_at_20_diff1 value: 26.4573 - type: nauc_map_at_100_max value: 3.0642 - type: nauc_map_at_100_std value: -14.1146 - type: nauc_map_at_100_diff1 value: 26.472 - type: nauc_map_at_1000_max value: 3.0554 - type: nauc_map_at_1000_std value: -14.1365 - type: nauc_map_at_1000_diff1 value: 26.477899999999998 - type: nauc_recall_at_1_max value: 1.8748999999999998 - type: nauc_recall_at_1_std value: -13.203000000000001 - type: nauc_recall_at_1_diff1 value: 30.786599999999996 - type: nauc_recall_at_3_max value: 2.2464999999999997 - type: nauc_recall_at_3_std value: -15.7745 - type: nauc_recall_at_3_diff1 value: 22.8494 - type: nauc_recall_at_5_max value: 3.5999999999999996 - type: nauc_recall_at_5_std value: -16.7106 - type: nauc_recall_at_5_diff1 value: 21.6902 - type: nauc_recall_at_10_max value: 5.6766 - type: nauc_recall_at_10_std value: -15.768699999999999 - type: nauc_recall_at_10_diff1 value: 22.658900000000003 - type: nauc_recall_at_20_max value: 9.5641 - type: nauc_recall_at_20_std value: -8.8567 - type: nauc_recall_at_20_diff1 value: 22.6219 - type: nauc_recall_at_100_max value: 19.2898 - type: nauc_recall_at_100_std value: 17.354400000000002 - type: nauc_recall_at_100_diff1 value: 21.6465 - type: nauc_recall_at_1000_max value: 43.4838 - type: nauc_recall_at_1000_std value: 57.456300000000006 - type: nauc_recall_at_1000_diff1 value: 19.6644 - type: nauc_precision_at_1_max value: 1.9339 - type: nauc_precision_at_1_std value: -13.125200000000001 - type: nauc_precision_at_1_diff1 value: 
30.440499999999997 - type: nauc_precision_at_3_max value: 2.1921 - type: nauc_precision_at_3_std value: -15.8918 - type: nauc_precision_at_3_diff1 value: 22.609099999999998 - type: nauc_precision_at_5_max value: 3.8808000000000002 - type: nauc_precision_at_5_std value: -16.6817 - type: nauc_precision_at_5_diff1 value: 21.0081 - type: nauc_precision_at_10_max value: 6.2251 - type: nauc_precision_at_10_std value: -14.9695 - type: nauc_precision_at_10_diff1 value: 21.3706 - type: nauc_precision_at_20_max value: 10.3311 - type: nauc_precision_at_20_std value: -7.5957 - type: nauc_precision_at_20_diff1 value: 20.4241 - type: nauc_precision_at_100_max value: 18.7934 - type: nauc_precision_at_100_std value: 16.6688 - type: nauc_precision_at_100_diff1 value: 13.4334 - type: nauc_precision_at_1000_max value: 22.3609 - type: nauc_precision_at_1000_std value: 22.090799999999998 - type: nauc_precision_at_1000_diff1 value: -1.5147000000000002 - type: nauc_mrr_at_1_max value: 1.9339 - type: nauc_mrr_at_1_std value: -13.125200000000001 - type: nauc_mrr_at_1_diff1 value: 30.440499999999997 - type: nauc_mrr_at_3_max value: 2.0884 - type: nauc_mrr_at_3_std value: -14.5665 - type: nauc_mrr_at_3_diff1 value: 26.270100000000003 - type: nauc_mrr_at_5_max value: 2.5026 - type: nauc_mrr_at_5_std value: -14.8794 - type: nauc_mrr_at_5_diff1 value: 25.8982 - type: nauc_mrr_at_10_max value: 2.8118 - type: nauc_mrr_at_10_std value: -14.7608 - type: nauc_mrr_at_10_diff1 value: 26.1961 - type: nauc_mrr_at_20_max value: 3.0701 - type: nauc_mrr_at_20_std value: -14.2605 - type: nauc_mrr_at_20_diff1 value: 26.206699999999998 - type: nauc_mrr_at_100_max value: 3.1292 - type: nauc_mrr_at_100_std value: -13.9589 - type: nauc_mrr_at_100_diff1 value: 26.227099999999997 - type: nauc_mrr_at_1000_max value: 3.1135 - type: nauc_mrr_at_1000_std value: -13.9831 - type: nauc_mrr_at_1000_diff1 value: 26.234099999999998 - type: main_score value: 32.495000000000005 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 91.31099999999999 - type: f1 value: 90.9331 - type: f1_weighted value: 91.2787 - type: main_score value: 91.31099999999999 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 54.9362 - type: f1 value: 38.364399999999996 - type: f1_weighted value: 57.1133 - type: main_score value: 54.9362 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 64.5461 - type: f1 value: 60.8751 - type: f1_weighted value: 63.248599999999996 - type: main_score value: 64.5461 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 71.6476 - type: f1 value: 71.03110000000001 - type: f1_weighted value: 71.3832 - type: main_score value: 71.6476 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P (default) revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: v_measure value: 32.3037 - type: v_measure_std value: 1.4981 - type: 
main_score value: 32.3037 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S (default) revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: v_measure value: 31.9128 - type: v_measure_std value: 1.4597 - type: main_score value: 31.9128 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking (default) revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7 split: test type: mteb/mind_small metrics: - type: map value: 32.2181 - type: mrr value: 33.4843 - type: nAUC_map_max value: -17.8061 - type: nAUC_map_std value: -1.1424 - type: nAUC_map_diff1 value: 14.106 - type: nAUC_mrr_max value: -12.6864 - type: nAUC_mrr_std value: 0.7633 - type: nAUC_mrr_diff1 value: 13.168099999999999 - type: main_score value: 32.2181 task: type: Reranking - dataset: config: default name: MTEB NFCorpus (default) revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: mteb/nfcorpus metrics: - type: ndcg_at_1 value: 45.356 - type: ndcg_at_3 value: 42.643 - type: ndcg_at_5 value: 40.882000000000005 - type: ndcg_at_10 value: 37.25 - type: ndcg_at_20 value: 34.863 - type: ndcg_at_100 value: 34.496 - type: ndcg_at_1000 value: 43.374 - type: map_at_1 value: 6.126 - type: map_at_3 value: 10.301 - type: map_at_5 value: 12.084999999999999 - type: map_at_10 value: 14.152000000000001 - type: map_at_20 value: 15.796 - type: map_at_100 value: 18.27 - type: map_at_1000 value: 19.88 - type: recall_at_1 value: 6.126 - type: recall_at_3 value: 11.706 - type: recall_at_5 value: 14.419 - type: recall_at_10 value: 18.427 - type: recall_at_20 value: 22.7 - type: recall_at_100 value: 35.018 - type: recall_at_1000 value: 67.66 - type: precision_at_1 value: 47.368 - type: precision_at_3 value: 40.144000000000005 - type: precision_at_5 value: 35.913000000000004 - type: precision_at_10 value: 27.74 - type: precision_at_20 value: 20.619 - type: precision_at_100 value: 9.071 - type: precision_at_1000 value: 2.226 - type: mrr_at_1 value: 47.678 - type: mrr_at_3 value: 55.1084 - type: mrr_at_5 value: 56.145500000000006 - type: mrr_at_10 value: 56.7134 - type: mrr_at_20 value: 57.0095 - type: mrr_at_100 value: 57.2211 - type: mrr_at_1000 value: 57.2755 - type: nauc_ndcg_at_1_max value: 39.442899999999995 - type: nauc_ndcg_at_1_std value: 25.1396 - type: nauc_ndcg_at_1_diff1 value: 35.5228 - type: nauc_ndcg_at_3_max value: 42.536699999999996 - type: nauc_ndcg_at_3_std value: 30.7104 - type: nauc_ndcg_at_3_diff1 value: 26.383699999999997 - type: nauc_ndcg_at_5_max value: 44.2751 - type: nauc_ndcg_at_5_std value: 31.6998 - type: nauc_ndcg_at_5_diff1 value: 24.4678 - type: nauc_ndcg_at_10_max value: 41.806599999999996 - type: nauc_ndcg_at_10_std value: 32.7977 - type: nauc_ndcg_at_10_diff1 value: 20.0545 - type: nauc_ndcg_at_20_max value: 39.0588 - type: nauc_ndcg_at_20_std value: 31.5545 - type: nauc_ndcg_at_20_diff1 value: 18.075499999999998 - type: nauc_ndcg_at_100_max value: 40.562599999999996 - type: nauc_ndcg_at_100_std value: 34.0612 - type: nauc_ndcg_at_100_diff1 value: 21.0169 - type: nauc_ndcg_at_1000_max value: 46.1599 - type: nauc_ndcg_at_1000_std value: 38.1991 - type: nauc_ndcg_at_1000_diff1 value: 21.7529 - type: nauc_map_at_1_max value: 2.822 - type: nauc_map_at_1_std value: -13.824200000000001 - type: nauc_map_at_1_diff1 value: 43.4619 - type: nauc_map_at_3_max value: 10.7749 - type: nauc_map_at_3_std value: -7.7192 - type: nauc_map_at_3_diff1 value: 33.543099999999995 - type: nauc_map_at_5_max value: 15.534 - 
type: nauc_map_at_5_std value: -4.6368 - type: nauc_map_at_5_diff1 value: 31.472499999999997 - type: nauc_map_at_10_max value: 19.6203 - type: nauc_map_at_10_std value: 0.9646 - type: nauc_map_at_10_diff1 value: 26.763199999999998 - type: nauc_map_at_20_max value: 22.9019 - type: nauc_map_at_20_std value: 5.4963999999999995 - type: nauc_map_at_20_diff1 value: 23.5639 - type: nauc_map_at_100_max value: 26.9211 - type: nauc_map_at_100_std value: 13.7679 - type: nauc_map_at_100_diff1 value: 21.4205 - type: nauc_map_at_1000_max value: 27.795199999999998 - type: nauc_map_at_1000_std value: 17.5388 - type: nauc_map_at_1000_diff1 value: 20.6324 - type: nauc_recall_at_1_max value: 2.822 - type: nauc_recall_at_1_std value: -13.824200000000001 - type: nauc_recall_at_1_diff1 value: 43.4619 - type: nauc_recall_at_3_max value: 11.128499999999999 - type: nauc_recall_at_3_std value: -6.583500000000001 - type: nauc_recall_at_3_diff1 value: 31.2104 - type: nauc_recall_at_5_max value: 15.5377 - type: nauc_recall_at_5_std value: -4.0625 - type: nauc_recall_at_5_diff1 value: 28.746199999999998 - type: nauc_recall_at_10_max value: 17.7947 - type: nauc_recall_at_10_std value: 1.9115 - type: nauc_recall_at_10_diff1 value: 20.028000000000002 - type: nauc_recall_at_20_max value: 18.5316 - type: nauc_recall_at_20_std value: 4.5177000000000005 - type: nauc_recall_at_20_diff1 value: 14.4906 - type: nauc_recall_at_100_max value: 27.871299999999998 - type: nauc_recall_at_100_std value: 22.9259 - type: nauc_recall_at_100_diff1 value: 12.8091 - type: nauc_recall_at_1000_max value: 24.782899999999998 - type: nauc_recall_at_1000_std value: 23.6364 - type: nauc_recall_at_1000_diff1 value: 8.318100000000001 - type: nauc_precision_at_1_max value: 41.779500000000006 - type: nauc_precision_at_1_std value: 25.690600000000003 - type: nauc_precision_at_1_diff1 value: 35.6552 - type: nauc_precision_at_3_max value: 46.0167 - type: nauc_precision_at_3_std value: 37.0565 - type: nauc_precision_at_3_diff1 value: 16.6278 - type: nauc_precision_at_5_max value: 47.2631 - type: nauc_precision_at_5_std value: 39.6181 - type: nauc_precision_at_5_diff1 value: 9.3291 - type: nauc_precision_at_10_max value: 42.9477 - type: nauc_precision_at_10_std value: 44.7365 - type: nauc_precision_at_10_diff1 value: -0.2033 - type: nauc_precision_at_20_max value: 37.0473 - type: nauc_precision_at_20_std value: 46.609 - type: nauc_precision_at_20_diff1 value: -5.4761999999999995 - type: nauc_precision_at_100_max value: 24.1237 - type: nauc_precision_at_100_std value: 49.1772 - type: nauc_precision_at_100_diff1 value: -6.9049 - type: nauc_precision_at_1000_max value: 9.0734 - type: nauc_precision_at_1000_std value: 38.4405 - type: nauc_precision_at_1000_diff1 value: -4.3116 - type: nauc_mrr_at_1_max value: 41.5105 - type: nauc_mrr_at_1_std value: 25.404500000000002 - type: nauc_mrr_at_1_diff1 value: 34.8177 - type: nauc_mrr_at_3_max value: 47.332 - type: nauc_mrr_at_3_std value: 33.2771 - type: nauc_mrr_at_3_diff1 value: 34.5929 - type: nauc_mrr_at_5_max value: 48.044799999999995 - type: nauc_mrr_at_5_std value: 33.596 - type: nauc_mrr_at_5_diff1 value: 34.4048 - type: nauc_mrr_at_10_max value: 48.2427 - type: nauc_mrr_at_10_std value: 33.9279 - type: nauc_mrr_at_10_diff1 value: 33.974900000000005 - type: nauc_mrr_at_20_max value: 48.2093 - type: nauc_mrr_at_20_std value: 33.9138 - type: nauc_mrr_at_20_diff1 value: 34.0267 - type: nauc_mrr_at_100_max value: 48.322700000000005 - type: nauc_mrr_at_100_std value: 34.096 - type: nauc_mrr_at_100_diff1 value: 
34.1172 - type: nauc_mrr_at_1000_max value: 48.2719 - type: nauc_mrr_at_1000_std value: 34.034 - type: nauc_mrr_at_1000_diff1 value: 34.0978 - type: main_score value: 37.25 task: type: Retrieval - dataset: config: default name: MTEB NQ (default) revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: ndcg_at_1 value: 37.254 - type: ndcg_at_3 value: 49.219 - type: ndcg_at_5 value: 54.037 - type: ndcg_at_10 value: 58.044 - type: ndcg_at_20 value: 59.947 - type: ndcg_at_100 value: 61.613 - type: ndcg_at_1000 value: 62.046 - type: map_at_1 value: 33.053 - type: map_at_3 value: 44.91 - type: map_at_5 value: 47.83 - type: map_at_10 value: 49.739 - type: map_at_20 value: 50.337 - type: map_at_100 value: 50.626 - type: map_at_1000 value: 50.647 - type: recall_at_1 value: 33.053 - type: recall_at_3 value: 58.157 - type: recall_at_5 value: 69.235 - type: recall_at_10 value: 80.76 - type: recall_at_20 value: 87.756 - type: recall_at_100 value: 95.862 - type: recall_at_1000 value: 99.044 - type: precision_at_1 value: 37.254 - type: precision_at_3 value: 22.538 - type: precision_at_5 value: 16.344 - type: precision_at_10 value: 9.655 - type: precision_at_20 value: 5.291 - type: precision_at_100 value: 1.167 - type: precision_at_1000 value: 0.121 - type: mrr_at_1 value: 37.2538 - type: mrr_at_3 value: 48.4453 - type: mrr_at_5 value: 50.8338 - type: mrr_at_10 value: 52.2217 - type: mrr_at_20 value: 52.6604 - type: mrr_at_100 value: 52.8549 - type: mrr_at_1000 value: 52.8693 - type: nauc_ndcg_at_1_max value: 22.4534 - type: nauc_ndcg_at_1_std value: 1.3625 - type: nauc_ndcg_at_1_diff1 value: 33.4465 - type: nauc_ndcg_at_3_max value: 29.2215 - type: nauc_ndcg_at_3_std value: 1.496 - type: nauc_ndcg_at_3_diff1 value: 28.8816 - type: nauc_ndcg_at_5_max value: 30.8294 - type: nauc_ndcg_at_5_std value: 3.0327 - type: nauc_ndcg_at_5_diff1 value: 27.2679 - type: nauc_ndcg_at_10_max value: 32.5349 - type: nauc_ndcg_at_10_std value: 5.074 - type: nauc_ndcg_at_10_diff1 value: 26.9574 - type: nauc_ndcg_at_20_max value: 32.2817 - type: nauc_ndcg_at_20_std value: 5.8412 - type: nauc_ndcg_at_20_diff1 value: 27.62 - type: nauc_ndcg_at_100_max value: 31.084 - type: nauc_ndcg_at_100_std value: 5.8699 - type: nauc_ndcg_at_100_diff1 value: 28.0961 - type: nauc_ndcg_at_1000_max value: 30.3847 - type: nauc_ndcg_at_1000_std value: 4.9963 - type: nauc_ndcg_at_1000_diff1 value: 28.4336 - type: nauc_map_at_1_max value: 20.5816 - type: nauc_map_at_1_std value: -1.0661 - type: nauc_map_at_1_diff1 value: 33.6828 - type: nauc_map_at_3_max value: 27.4552 - type: nauc_map_at_3_std value: 0.769 - type: nauc_map_at_3_diff1 value: 30.0372 - type: nauc_map_at_5_max value: 28.3151 - type: nauc_map_at_5_std value: 1.6411 - type: nauc_map_at_5_diff1 value: 29.2099 - type: nauc_map_at_10_max value: 28.9693 - type: nauc_map_at_10_std value: 2.5594 - type: nauc_map_at_10_diff1 value: 29.0818 - type: nauc_map_at_20_max value: 28.9023 - type: nauc_map_at_20_std value: 2.788 - type: nauc_map_at_20_diff1 value: 29.2439 - type: nauc_map_at_100_max value: 28.7275 - type: nauc_map_at_100_std value: 2.8171 - type: nauc_map_at_100_diff1 value: 29.3139 - type: nauc_map_at_1000_max value: 28.701 - type: nauc_map_at_1000_std value: 2.7868 - type: nauc_map_at_1000_diff1 value: 29.3304 - type:
nauc_recall_at_1_max value: 20.5816 - type: nauc_recall_at_1_std value: -1.0661 - type: nauc_recall_at_1_diff1 value: 33.6828 - type: nauc_recall_at_3_max value: 33.0999 - type: nauc_recall_at_3_std value: 1.5433 - type: nauc_recall_at_3_diff1 value: 24.7191 - type: nauc_recall_at_5_max value: 38.3028 - type: nauc_recall_at_5_std value: 5.4908 - type: nauc_recall_at_5_diff1 value: 19.3777 - type: nauc_recall_at_10_max value: 49.9754 - type: nauc_recall_at_10_std value: 15.2697 - type: nauc_recall_at_10_diff1 value: 15.3382 - type: nauc_recall_at_20_max value: 57.0007 - type: nauc_recall_at_20_std value: 25.9537 - type: nauc_recall_at_20_diff1 value: 16.1382 - type: nauc_recall_at_100_max value: 70.0766 - type: nauc_recall_at_100_std value: 60.5296 - type: nauc_recall_at_100_diff1 value: 12.1256 - type: nauc_recall_at_1000_max value: 70.6831 - type: nauc_recall_at_1000_std value: 73.876 - type: nauc_recall_at_1000_diff1 value: 18.0994 - type: nauc_precision_at_1_max value: 22.4534 - type: nauc_precision_at_1_std value: 1.3625 - type: nauc_precision_at_1_diff1 value: 33.4465 - type: nauc_precision_at_3_max value: 32.461 - type: nauc_precision_at_3_std value: 6.0438 - type: nauc_precision_at_3_diff1 value: 19.4828 - type: nauc_precision_at_5_max value: 30.8773 - type: nauc_precision_at_5_std value: 9.5136 - type: nauc_precision_at_5_diff1 value: 10.8131 - type: nauc_precision_at_10_max value: 28.0383 - type: nauc_precision_at_10_std value: 15.0419 - type: nauc_precision_at_10_diff1 value: 2.5906 - type: nauc_precision_at_20_max value: 22.5558 - type: nauc_precision_at_20_std value: 18.2138 - type: nauc_precision_at_20_diff1 value: -0.5902 - type: nauc_precision_at_100_max value: 9.1213 - type: nauc_precision_at_100_std value: 18.0878 - type: nauc_precision_at_100_diff1 value: -6.7683 - type: nauc_precision_at_1000_max value: 1.3558 - type: nauc_precision_at_1000_std value: 12.4464 - type: nauc_precision_at_1000_diff1 value: -7.8356 - type: nauc_mrr_at_1_max value: 22.4534 - type: nauc_mrr_at_1_std value: 1.3625 - type: nauc_mrr_at_1_diff1 value: 33.4465 - type: nauc_mrr_at_3_max value: 27.7471 - type: nauc_mrr_at_3_std value: 2.8298 - type: nauc_mrr_at_3_diff1 value: 29.8467 - type: nauc_mrr_at_5_max value: 28.3625 - type: nauc_mrr_at_5_std value: 3.5815 - type: nauc_mrr_at_5_diff1 value: 29.009 - type: nauc_mrr_at_10_max value: 28.7697 - type: nauc_mrr_at_10_std value: 4.1444 - type: nauc_mrr_at_10_diff1 value: 29.0508 - type: nauc_mrr_at_20_max value: 28.6226 - type: nauc_mrr_at_20_std value: 4.2112 - type: nauc_mrr_at_20_diff1 value: 29.2674 - type: nauc_mrr_at_100_max value: 28.4889 - type: nauc_mrr_at_100_std value: 4.1979 - type: nauc_mrr_at_100_diff1 value: 29.3558 - type: nauc_mrr_at_1000_max value: 28.4672 - type: nauc_mrr_at_1000_std value: 4.1723 - type: nauc_mrr_at_1000_diff1 value: 29.3661 - type: main_score value: 58.044 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval (default) revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 split: test type: mteb/quora metrics: - type: ndcg_at_1 value: 80.65 - type: ndcg_at_3 value: 84.897 - type: ndcg_at_5 value: 86.545 - type: ndcg_at_10 value: 87.822 - type: ndcg_at_20 value: 88.513 - type: ndcg_at_100 value: 89.091 - type: ndcg_at_1000 value: 89.203 - type: map_at_1 value: 70.058 - type: map_at_3 value: 81.034 - type: map_at_5 value:
82.922 - type: map_at_10 value: 84.009 - type: map_at_20 value: 84.442 - type: map_at_100 value: 84.661 - type: map_at_1000 value: 84.679 - type: recall_at_1 value: 70.058 - type: recall_at_3 value: 86.763 - type: recall_at_5 value: 91.396 - type: recall_at_10 value: 95.148 - type: recall_at_20 value: 97.34 - type: recall_at_100 value: 99.474 - type: recall_at_1000 value: 99.977 - type: precision_at_1 value: 80.65 - type: precision_at_3 value: 37.15 - type: precision_at_5 value: 24.48 - type: precision_at_10 value: 13.347 - type: precision_at_20 value: 7.095 - type: precision_at_100 value: 1.527 - type: precision_at_1000 value: 0.157 - type: mrr_at_1 value: 80.64 - type: mrr_at_3 value: 85.9483 - type: mrr_at_5 value: 86.6738 - type: mrr_at_10 value: 86.9798 - type: mrr_at_20 value: 87.0601 - type: mrr_at_100 value: 87.0883 - type: mrr_at_1000 value: 87.0893 - type: nauc_ndcg_at_1_max value: 37.1678 - type: nauc_ndcg_at_1_std value: -33.5588 - type: nauc_ndcg_at_1_diff1 value: 77.2101 - type: nauc_ndcg_at_3_max value: 35.085 - type: nauc_ndcg_at_3_std value: -39.8447 - type: nauc_ndcg_at_3_diff1 value: 75.7084 - type: nauc_ndcg_at_5_max value: 36.0947 - type: nauc_ndcg_at_5_std value: -40.3617 - type: nauc_ndcg_at_5_diff1 value: 76.5872 - type: nauc_ndcg_at_10_max value: 36.0919 - type: nauc_ndcg_at_10_std value: -39.8878 - type: nauc_ndcg_at_10_diff1 value: 76.5282 - type: nauc_ndcg_at_20_max value: 36.6226 - type: nauc_ndcg_at_20_std value: -38.3337 - type: nauc_ndcg_at_20_diff1 value: 76.4084 - type: nauc_ndcg_at_100_max value: 36.9855 - type: nauc_ndcg_at_100_std value: -36.561 - type: nauc_ndcg_at_100_diff1 value: 76.2186 - type: nauc_ndcg_at_1000_max value: 37.0213 - type: nauc_ndcg_at_1000_std value: -36.494 - type: nauc_ndcg_at_1000_diff1 value: 76.186 - type: nauc_map_at_1_max value: 26.761 - type: nauc_map_at_1_std value: -36.3749 - type: nauc_map_at_1_diff1 value: 80.0977 - type: nauc_map_at_3_max value: 32.5303 - type: nauc_map_at_3_std value: -42.3896 - type: nauc_map_at_3_diff1 value: 77.1352 - type: nauc_map_at_5_max value: 34.3226 - type: nauc_map_at_5_std value: -41.9927 - type: nauc_map_at_5_diff1 value: 77.1848 - type: nauc_map_at_10_max value: 35.0744 - type: nauc_map_at_10_std value: -40.8511 - type: nauc_map_at_10_diff1 value: 76.8632 - type: nauc_map_at_20_max value: 35.4423 - type: nauc_map_at_20_std value: -39.7228 - type: nauc_map_at_20_diff1 value: 76.6715 - type: nauc_map_at_100_max value: 35.5927 - type: nauc_map_at_100_std value: -38.9448 - type: nauc_map_at_100_diff1 value: 76.5717 - type: nauc_map_at_1000_max value: 35.6121 - type: nauc_map_at_1000_std value: -38.8973 - type: nauc_map_at_1000_diff1 value: 76.5656 - type: nauc_recall_at_1_max value: 26.761 - type: nauc_recall_at_1_std value: -36.3749 - type: nauc_recall_at_1_diff1 value: 80.0977 - type: nauc_recall_at_3_max value: 29.2557 - type: nauc_recall_at_3_std value: -48.3412 - type: nauc_recall_at_3_diff1 value: 73.5986 - type: nauc_recall_at_5_max value: 32.0708 - type: nauc_recall_at_5_std value: -51.9846 - type: nauc_recall_at_5_diff1 value: 74.0073 - type: nauc_recall_at_10_max value: 30.5549 - type: nauc_recall_at_10_std value: -56.8778 - type: nauc_recall_at_10_diff1 value: 73.5398 - type: nauc_recall_at_20_max value: 32.5741 - type: nauc_recall_at_20_std value: -50.3935 - type:
nauc_recall_at_20_diff1 value: 73.6634 - type: nauc_recall_at_100_max value: 40.8872 - type: nauc_recall_at_100_std value: -18.2413 - type: nauc_recall_at_100_diff1 value: 72.1894 - type: nauc_recall_at_1000_max value: 31.5668 - type: nauc_recall_at_1000_std value: 51.0679 - type: nauc_recall_at_1000_diff1 value: 59.4853 - type: nauc_precision_at_1_max value: 37.1678 - type: nauc_precision_at_1_std value: -33.5588 - type: nauc_precision_at_1_diff1 value: 77.2101 - type: nauc_precision_at_3_max value: 9.868 - type: nauc_precision_at_3_std value: 4.8771 - type: nauc_precision_at_3_diff1 value: -16.2165 - type: nauc_precision_at_5_max value: 5.169 - type: nauc_precision_at_5_std value: 15.2237 - type: nauc_precision_at_5_diff1 value: -29.3283 - type: nauc_precision_at_10_max value: 0.3411 - type: nauc_precision_at_10_std value: 24.0866 - type: nauc_precision_at_10_diff1 value: -37.5144 - type: nauc_precision_at_20_max value: -1.981 - type: nauc_precision_at_20_std value: 30.4081 - type: nauc_precision_at_20_diff1 value: -41.1355 - type: nauc_precision_at_100_max value: -4.2999 - type: nauc_precision_at_100_std value: 36.4541 - type: nauc_precision_at_100_diff1 value: -43.7797 - type: nauc_precision_at_1000_max value: -4.4928 - type: nauc_precision_at_1000_std value: 36.9861 - type: nauc_precision_at_1000_diff1 value: -44.182 - type: nauc_mrr_at_1_max value: 37.2354 - type: nauc_mrr_at_1_std value: -33.4342 - type: nauc_mrr_at_1_diff1 value: 77.2283 - type: nauc_mrr_at_3_max value: 38.0003 - type: nauc_mrr_at_3_std value: -34.9304 - type: nauc_mrr_at_3_diff1 value: 76.2028 - type: nauc_mrr_at_5_max value: 38.3135 - type: nauc_mrr_at_5_std value: -34.707 - type: nauc_mrr_at_5_diff1 value: 76.4365 - type: nauc_mrr_at_10_max value: 38.0013 - type: nauc_mrr_at_10_std value: -34.6562 - type: nauc_mrr_at_10_diff1 value: 76.4407 - type: nauc_mrr_at_20_max value: 38.0368 - type: nauc_mrr_at_20_std value: -34.4726 - type: nauc_mrr_at_20_diff1 value: 76.4482 - type: nauc_mrr_at_100_max value: 38.0243 - type: nauc_mrr_at_100_std value: -34.4696 - type: nauc_mrr_at_100_diff1 value: 76.4569 - type: nauc_mrr_at_1000_max value: 38.0227 - type: nauc_mrr_at_1000_std value: -34.4733 - type: nauc_mrr_at_1000_diff1 value: 76.4574 - type: main_score value: 87.822 task: type: Retrieval - dataset: config: default name: MTEB RedditClustering (default) revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: v_measure value: 54.4296 - type: v_measure_std value: 5.0264 - type: main_score value: 54.4296 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P (default) revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: v_measure value: 58.1919 - type: v_measure_std value: 12.6182 - type: main_score value: 58.1919 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS (default) revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 split: test type: mteb/scidocs metrics: - type: ndcg_at_1 value: 28.1 - type: ndcg_at_3 value: 22.721 - type: ndcg_at_5 value: 20.015 - type: ndcg_at_10 value: 24.146 - type: ndcg_at_20 value: 27.74 - type: ndcg_at_100 value: 33.9 - type: ndcg_at_1000 value: 39.728 - type: map_at_1 value: 5.737 - type: map_at_3 value: 10.474 - type: map_at_5 value: 12.656 - type: map_at_10 value: 14.896 - type: map_at_20 value:
16.318 - type: map_at_100 value: 17.646 - type: map_at_1000 value: 18.03 - type: recall_at_1 value: 5.737 - type: recall_at_3 value: 12.897 - type: recall_at_5 value: 17.855 - type: recall_at_10 value: 25.4 - type: recall_at_20 value: 33.817 - type: recall_at_100 value: 53.772 - type: recall_at_1000 value: 82.013 - type: precision_at_1 value: 28.1 - type: precision_at_3 value: 21.2 - type: precision_at_5 value: 17.6 - type: precision_at_10 value: 12.54 - type: precision_at_20 value: 8.34 - type: precision_at_100 value: 2.651 - type: precision_at_1000 value: 0.404 - type: mrr_at_1 value: 28.1 - type: mrr_at_3 value: 35.9167 - type: mrr_at_5 value: 38.0967 - type: mrr_at_10 value: 39.5788 - type: mrr_at_20 value: 40.2541 - type: mrr_at_100 value: 40.687 - type: mrr_at_1000 value: 40.722 - type: nauc_ndcg_at_1_max value: 21.2698 - type: nauc_ndcg_at_1_std value: 8.8522 - type: nauc_ndcg_at_1_diff1 value: 21.6443 - type: nauc_ndcg_at_3_max value: 28.6762 - type: nauc_ndcg_at_3_std value: 13.8129 - type: nauc_ndcg_at_3_diff1 value: 16.4517 - type: nauc_ndcg_at_5_max value: 31.252 - type: nauc_ndcg_at_5_std value: 17.3178 - type: nauc_ndcg_at_5_diff1 value: 16.8954 - type: nauc_ndcg_at_10_max value: 32.5817 - type: nauc_ndcg_at_10_std value: 19.9363 - type: nauc_ndcg_at_10_diff1 value: 17.0865 - type: nauc_ndcg_at_20_max value: 32.3902 - type: nauc_ndcg_at_20_std value: 22.8215 - type: nauc_ndcg_at_20_diff1 value: 14.6836 - type: nauc_ndcg_at_100_max value: 33.2665 - type: nauc_ndcg_at_100_std value: 28.93 - type: nauc_ndcg_at_100_diff1 value: 14.8837 - type: nauc_ndcg_at_1000_max value: 32.9079 - type: nauc_ndcg_at_1000_std value: 28.2289 - type: nauc_ndcg_at_1000_diff1 value: 15.9599 - type: nauc_map_at_1_max value: 20.3725 - type: nauc_map_at_1_std value: 8.7546 - type: nauc_map_at_1_diff1 value: 20.8754 - type: nauc_map_at_3_max value: 27.0845 - type: nauc_map_at_3_std value: 12.6727 - type: nauc_map_at_3_diff1 value: 15.6365 - type: nauc_map_at_5_max value: 29.2312 - type: nauc_map_at_5_std value: 15.8701 - type: nauc_map_at_5_diff1 value: 15.891 - type: nauc_map_at_10_max value: 30.3676 - type: nauc_map_at_10_std value: 18.5848 - type: nauc_map_at_10_diff1 value: 15.1553 - type: nauc_map_at_20_max value: 30.6006 - type: nauc_map_at_20_std value: 20.4984 - type: nauc_map_at_20_diff1 value: 13.8149 - type: nauc_map_at_100_max value: 31.3216 - type: nauc_map_at_100_std value: 22.8546 - type: nauc_map_at_100_diff1 value: 13.9657 - type: nauc_map_at_1000_max value: 31.3095 - type: nauc_map_at_1000_std value: 22.991 - type: nauc_map_at_1000_diff1 value: 13.9995 - type: nauc_recall_at_1_max value: 20.3725 - type: nauc_recall_at_1_std value: 8.7546 - type: nauc_recall_at_1_diff1 value: 20.8754 - type: nauc_recall_at_3_max value: 30.6276 - type: nauc_recall_at_3_std value: 15.5861 - type: nauc_recall_at_3_diff1 value: 13.9652 - type: nauc_recall_at_5_max value: 33.4455 - type: nauc_recall_at_5_std value: 20.4822 - type: nauc_recall_at_5_diff1 value: 14.5668 - type: nauc_recall_at_10_max value: 33.9121 - type: nauc_recall_at_10_std value: 23.4277 - type: nauc_recall_at_10_diff1 value: 14.5769 - type: nauc_recall_at_20_max value: 30.9391 - type: nauc_recall_at_20_std value: 27.6834 - type: nauc_recall_at_20_diff1 value: 8.5193 - type: nauc_recall_at_100_max value: 28.9221 - type: nauc_recall_at_100_std
value: 41.2816 - type: nauc_recall_at_100_diff1 value: 7.3066 - type: nauc_recall_at_1000_max value: 24.2406 - type: nauc_recall_at_1000_std value: 43.2715 - type: nauc_recall_at_1000_diff1 value: 10.2232 - type: nauc_precision_at_1_max value: 21.2698 - type: nauc_precision_at_1_std value: 8.8522 - type: nauc_precision_at_1_diff1 value: 21.6443 - type: nauc_precision_at_3_max value: 31.2776 - type: nauc_precision_at_3_std value: 15.8911 - type: nauc_precision_at_3_diff1 value: 14.3578 - type: nauc_precision_at_5_max value: 34.034 - type: nauc_precision_at_5_std value: 20.6595 - type: nauc_precision_at_5_diff1 value: 15.1316 - type: nauc_precision_at_10_max value: 34.4474 - type: nauc_precision_at_10_std value: 23.5843 - type: nauc_precision_at_10_diff1 value: 14.9385 - type: nauc_precision_at_20_max value: 31.4376 - type: nauc_precision_at_20_std value: 27.7123 - type: nauc_precision_at_20_diff1 value: 8.6083 - type: nauc_precision_at_100_max value: 29.4013 - type: nauc_precision_at_100_std value: 40.5942 - type: nauc_precision_at_100_diff1 value: 7.6172 - type: nauc_precision_at_1000_max value: 25.2832 - type: nauc_precision_at_1000_std value: 40.9653 - type: nauc_precision_at_1000_diff1 value: 10.3534 - type: nauc_mrr_at_1_max value: 21.2698 - type: nauc_mrr_at_1_std value: 8.8522 - type: nauc_mrr_at_1_diff1 value: 21.6443 - type: nauc_mrr_at_3_max value: 26.8557 - type: nauc_mrr_at_3_std value: 12.4826 - type: nauc_mrr_at_3_diff1 value: 19.3542 - type: nauc_mrr_at_5_max value: 28.0333 - type: nauc_mrr_at_5_std value: 13.4664 - type: nauc_mrr_at_5_diff1 value: 20.0372 - type: nauc_mrr_at_10_max value: 28.0659 - type: nauc_mrr_at_10_std value: 13.792 - type: nauc_mrr_at_10_diff1 value: 20.7022 - type: nauc_mrr_at_20_max value: 27.8865 - type: nauc_mrr_at_20_std value: 13.9527 - type: nauc_mrr_at_20_diff1 value: 20.5573 - type: nauc_mrr_at_100_max value: 27.7143 - type: nauc_mrr_at_100_std value: 13.8637 - type: nauc_mrr_at_100_diff1 value: 20.5074 - type: nauc_mrr_at_1000_max value: 27.7006 - type: nauc_mrr_at_1000_std value: 13.8399 - type: nauc_mrr_at_1000_diff1 value: 20.5031 - type: main_score value: 24.146 task: type: Retrieval - dataset: config: default name: MTEB SICK-R (default) revision: 20a6d6f312dd54037fe07a32d58e5e168867909d split: test type: mteb/sickr-sts metrics: - type: pearson value: 78.6926 - type: spearman value: 71.2001 - type: cosine_pearson value: 78.6926 - type: cosine_spearman value: 71.2001 - type: manhattan_pearson value: 75.264 - type: manhattan_spearman value: 71.1303 - type: euclidean_pearson value: 75.3261 - type: euclidean_spearman value: 71.2001 - type: main_score value: 71.2001 task: type: STS - dataset: config: default name: MTEB STS12 (default) revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: pearson value: 71.0057 - type: spearman value: 65.9247 - type: cosine_pearson value: 71.0057 - type: cosine_spearman value: 65.9247 - type: manhattan_pearson value: 67.392 - type: manhattan_spearman value: 65.8026 - type: euclidean_pearson value: 67.5888 - type: euclidean_spearman value: 65.9248 - type: main_score value: 65.9247 task: type: STS - dataset: config: default name: MTEB STS13 (default) revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: pearson value: 81.6765 - type: spearman value: 81.7525 - type: cosine_pearson value:
81.6765 - type: cosine_spearman value: 81.7525 - type: manhattan_pearson value: 81.0327 - type: manhattan_spearman value: 81.6717 - type: euclidean_pearson value: 81.1 - type: euclidean_spearman value: 81.7526 - type: main_score value: 81.7525 task: type: STS - dataset: config: default name: MTEB STS14 (default) revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: pearson value: 79.4758 - type: spearman value: 74.2305 - type: cosine_pearson value: 79.4758 - type: cosine_spearman value: 74.2305 - type: manhattan_pearson value: 77.8846 - type: manhattan_spearman value: 74.1908 - type: euclidean_pearson value: 77.9333 - type: euclidean_spearman value: 74.2305 - type: main_score value: 74.2305 task: type: STS - dataset: config: default name: MTEB STS15 (default) revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: pearson value: 82.9018 - type: spearman value: 84.1271 - type: cosine_pearson value: 82.9018 - type: cosine_spearman value: 84.1271 - type: manhattan_pearson value: 83.6431 - type: manhattan_spearman value: 84.1091 - type: euclidean_pearson value: 83.6388 - type: euclidean_spearman value: 84.127 - type: main_score value: 84.1271 task: type: STS - dataset: config: default name: MTEB STS16 (default) revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: pearson value: 80.1981 - type: spearman value: 81.6627 - type: cosine_pearson value: 80.1981 - type: cosine_spearman value: 81.6627 - type: manhattan_pearson value: 81.4605 - type: manhattan_spearman value: 81.6282 - type: euclidean_pearson value: 81.5043 - type: euclidean_spearman value: 81.6627 - type: main_score value: 81.6627 task: type: STS - dataset: config: en-de name: MTEB STS17 (en-de) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 47.9276 - type: spearman value: 50.0286 - type: cosine_pearson value: 47.9276 - type: cosine_spearman value: 50.0286 - type: manhattan_pearson value: 48.5188 - type: manhattan_spearman value: 50.432 - type: euclidean_pearson value: 48.1655 - type: euclidean_spearman value: 50.0286 - type: main_score value: 50.0286 task: type: STS - dataset: config: en-tr name: MTEB STS17 (en-tr) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 24.4119 - type: spearman value: 22.1195 - type: cosine_pearson value: 24.4119 - type: cosine_spearman value: 22.1195 - type: manhattan_pearson value: 25.8738 - type: manhattan_spearman value: 23.6049 - type: euclidean_pearson value: 24.3693 - type: euclidean_spearman value: 22.1195 - type: main_score value: 22.1195 task: type: STS - dataset: config: en-ar name: MTEB STS17 (en-ar) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 22.6562 - type: spearman value: 22.5445 - type: cosine_pearson value: 22.6562 - type: cosine_spearman value: 22.5445 - type: manhattan_pearson value: 22.414 - type: manhattan_spearman value: 22.1601 - type: euclidean_pearson value: 22.7736 - type: euclidean_spearman value: 22.5445 - type: main_score value: 22.5445 task: type: STS - dataset: config: nl-en name: MTEB STS17 (nl-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type:
mteb/sts17-crosslingual-sts metrics: - type: pearson value: 44.4998 - type: spearman value: 43.1984 - type: cosine_pearson value: 44.4998 - type: cosine_spearman value: 43.1984 - type: manhattan_pearson value: 43.3837 - type: manhattan_spearman value: 43.1122 - type: euclidean_pearson value: 44.1642 - type: euclidean_spearman value: 43.1984 - type: main_score value: 43.1984 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 82.3891 - type: spearman value: 83.9634 - type: cosine_pearson value: 82.3891 - type: cosine_spearman value: 83.9634 - type: manhattan_pearson value: 83.1481 - type: manhattan_spearman value: 83.9743 - type: euclidean_pearson value: 83.2767 - type: euclidean_spearman value: 83.9634 - type: main_score value: 83.9634 task: type: STS - dataset: config: it-en name: MTEB STS17 (it-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 35.3106 - type: spearman value: 30.7572 - type: cosine_pearson value: 35.3106 - type: cosine_spearman value: 30.7572 - type: manhattan_pearson value: 35.6552 - type: manhattan_spearman value: 31.596 - type: euclidean_pearson value: 35.4393 - type: euclidean_spearman value: 30.7572 - type: main_score value: 30.7572 task: type: STS - dataset: config: es-en name: MTEB STS17 (es-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 36.9322 - type: spearman value: 37.7137 - type: cosine_pearson value: 36.9322 - type: cosine_spearman value: 37.7137 - type: manhattan_pearson value: 36.0714 - type: manhattan_spearman value: 36.9979 - type: euclidean_pearson value: 36.7848 - type: euclidean_spearman value: 37.7137 - type: main_score value: 37.7137 task: type: STS - dataset: config: fr-en name: MTEB STS17 (fr-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 39.9633 - type: spearman value: 38.9248 - type: cosine_pearson value: 39.9633 - type: cosine_spearman value: 38.9248 - type: manhattan_pearson value: 39.5397 - type: manhattan_spearman value: 38.1919 - type: euclidean_pearson value: 39.8596 - type: euclidean_spearman value: 38.9248 - type: main_score value: 38.9248 task: type: STS - dataset: config: de-en name: MTEB STS22 (de-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 56.0924 - type: spearman value: 54.1844 - type: cosine_pearson value: 56.0924 - type: cosine_spearman value: 54.1844 - type: manhattan_pearson value: 56.9381 - type: manhattan_spearman value: 53.9407 - type: euclidean_pearson value: 57.9844 - type: euclidean_spearman value: 54.1844 - type: main_score value: 54.1844 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 69.3771 - type: spearman value: 69.3609 - type: cosine_pearson value: 69.3771 - type: cosine_spearman value: 69.3609 - type: manhattan_pearson value: 70.8762 - type: manhattan_spearman value: 69.1889 - type: euclidean_pearson value: 70.9433 - type: euclidean_spearman value: 69.3609 - type: main_score value: 69.3609 task: type: STS - dataset: config:
pl-en name: MTEB STS22 (pl-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 74.1161 - type: spearman value: 71.6334 - type: cosine_pearson value: 74.1161 - type: cosine_spearman value: 71.6334 - type: manhattan_pearson value: 73.2348 - type: manhattan_spearman value: 71.1802 - type: euclidean_pearson value: 73.284 - type: euclidean_spearman value: 71.6334 - type: main_score value: 71.6334 task: type: STS - dataset: config: es-en name: MTEB STS22 (es-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 70.0888 - type: spearman value: 73.79 - type: cosine_pearson value: 70.0888 - type: cosine_spearman value: 73.79 - type: manhattan_pearson value: 71.5415 - type: manhattan_spearman value: 73.6588 - type: euclidean_pearson value: 71.621 - type: euclidean_spearman value: 73.79 - type: main_score value: 73.79 task: type: STS - dataset: config: zh-en name: MTEB STS22 (zh-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 37.5935 - type: spearman value: 39.5919 - type: cosine_pearson value: 37.5935 - type: cosine_spearman value: 39.5919 - type: manhattan_pearson value: 37.1717 - type: manhattan_spearman value: 38.6974 - type: euclidean_pearson value: 37.5632 - type: euclidean_spearman value: 39.5919 - type: main_score value: 39.5919 task: type: STS - dataset: config: default name: MTEB STSBenchmark (default) revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: pearson value: 79.9453 - type: spearman value: 79.6569 - type: cosine_pearson value: 79.9453 - type: cosine_spearman value: 79.6569 - type: manhattan_pearson value: 79.8923 - type: manhattan_spearman value: 79.5837 - type: euclidean_pearson value: 79.9829 - type: euclidean_spearman value: 79.6569 - type: main_score value: 79.6569 task: type: STS - dataset: config: default name: MTEB SciDocsRR (default) revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: map value: 88.0995 - type: mrr value: 96.6455 - type: nAUC_map_max value: 53.3622 - type: nAUC_map_std value: 70.3532 - type: nAUC_map_diff1 value: -0.2142 - type: nAUC_mrr_max value: 88.893 - type: nAUC_mrr_std value: 85.4516 - type: nAUC_mrr_diff1 value: 43.6847 - type: main_score value: 88.0995 task: type: Reranking - dataset: config: default name: MTEB SciFact (default) revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: ndcg_at_1 value: 62.667 - type: ndcg_at_3 value: 69.776 - type: ndcg_at_5 value: 71.964 - type: ndcg_at_10 value: 74.72 - type: ndcg_at_20 value: 76.154 - type: ndcg_at_100 value: 76.961 - type: ndcg_at_1000 value: 77.294 - type: map_at_1 value: 60.011 - type: map_at_3 value: 67.135 - type: map_at_5 value: 68.78 - type: map_at_10 value: 70.101 - type: map_at_20 value: 70.561 - type: map_at_100 value: 70.687 - type: map_at_1000 value: 70.699 - type: recall_at_1 value: 60.011 - type: recall_at_3 value: 74.839 - type: recall_at_5 value: 80.028 - type: recall_at_10 value: 87.8 - type: recall_at_20 value: 93.1 - type: recall_at_100 value: 97.333 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value:
62.667 - type: precision_at_3 value: 27.0 - type: precision_at_5 value: 17.8 - type: precision_at_10 value: 9.933 - type: precision_at_20 value: 5.283 - type: precision_at_100 value: 1.103 - type: precision_at_1000 value: 0.113 - type: mrr_at_1 value: 62.6667 - type: mrr_at_3 value: 68.9444 - type: mrr_at_5 value: 69.9611 - type: mrr_at_10 value: 71.022 - type: mrr_at_20 value: 71.3777 - type: mrr_at_100 value: 71.4841 - type: mrr_at_1000 value: 71.4961 - type: nauc_ndcg_at_1_max value: 55.4562 - type: nauc_ndcg_at_1_std value: -9.3317 - type: nauc_ndcg_at_1_diff1 value: 71.1878 - type: nauc_ndcg_at_3_max value: 55.3473 - type: nauc_ndcg_at_3_std value: -14.3414 - type: nauc_ndcg_at_3_diff1 value: 69.1188 - type: nauc_ndcg_at_5_max value: 55.5531 - type: nauc_ndcg_at_5_std value: -13.4487 - type: nauc_ndcg_at_5_diff1 value: 67.4611 - type: nauc_ndcg_at_10_max value: 59.5974 - type: nauc_ndcg_at_10_std value: -10.262 - type: nauc_ndcg_at_10_diff1 value: 68.3408 - type: nauc_ndcg_at_20_max value: 58.5865 - type: nauc_ndcg_at_20_std value: -9.8438 - type: nauc_ndcg_at_20_diff1 value: 68.4434 - type: nauc_ndcg_at_100_max value: 58.2849 - type: nauc_ndcg_at_100_std value: -8.7782 - type: nauc_ndcg_at_100_diff1 value: 68.585 - type: nauc_ndcg_at_1000_max value: 58.0138 - type: nauc_ndcg_at_1000_std value: -9.4827 - type: nauc_ndcg_at_1000_diff1 value: 69.0467 - type: nauc_map_at_1_max value: 49.434 - type: nauc_map_at_1_std value: -17.0503 - type: nauc_map_at_1_diff1 value: 71.8029 - type: nauc_map_at_3_max value: 52.8035 - type: nauc_map_at_3_std value: -16.2138 - type: nauc_map_at_3_diff1 value: 69.8174 - type: nauc_map_at_5_max value: 54.6444 - type: nauc_map_at_5_std value: -13.9109 - type: nauc_map_at_5_diff1 value: 68.8879 - type: nauc_map_at_10_max value: 56.551 - type: nauc_map_at_10_std value: -12.1269 - type: nauc_map_at_10_diff1 value: 69.2326 - type: nauc_map_at_20_max value: 56.2997 - type: nauc_map_at_20_std value: -11.8978 - type: nauc_map_at_20_diff1 value: 69.3387 - type: nauc_map_at_100_max value: 56.2953 - type: nauc_map_at_100_std value: -11.6546 - type: nauc_map_at_100_diff1 value: 69.3881 - type: nauc_map_at_1000_max value: 56.2905 - type: nauc_map_at_1000_std value: -11.6664 - type: nauc_map_at_1000_diff1 value: 69.4106 - type: nauc_recall_at_1_max value: 49.434 - type: nauc_recall_at_1_std value: -17.0503 - type: nauc_recall_at_1_diff1 value: 71.8029 - type: nauc_recall_at_3_max value: 53.6504 - type: nauc_recall_at_3_std value: -20.3796 - type: nauc_recall_at_3_diff1 value: 66.0397 - type: nauc_recall_at_5_max value: 54.4514 - type: nauc_recall_at_5_std value: -17.8965 - type: nauc_recall_at_5_diff1 value: 60.6996 - type: nauc_recall_at_10_max value: 72.7183 - type: nauc_recall_at_10_std value: -7.3393 - type: nauc_recall_at_10_diff1 value: 62.0422 - type: nauc_recall_at_20_max value: 70.7849 - type: nauc_recall_at_20_std value: -3.1933 - type: nauc_recall_at_20_diff1 value: 58.146 - type: nauc_recall_at_100_max value: 75.4377 - type: nauc_recall_at_100_std value: 36.5488 - type: nauc_recall_at_100_diff1 value: 46.3177 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 55.4562 - type: nauc_precision_at_1_std value: -9.3317 -
type: nauc_precision_at_1_diff1 value: 71.1878 - type: nauc_precision_at_3_max value: 52.5483 - type: nauc_precision_at_3_std value: 6.7199 - type: nauc_precision_at_3_diff1 value: 42.6315 - type: nauc_precision_at_5_max value: 47.9921 - type: nauc_precision_at_5_std value: 21.9242 - type: nauc_precision_at_5_diff1 value: 23.0825 - type: nauc_precision_at_10_max value: 47.5174 - type: nauc_precision_at_10_std value: 44.4913 - type: nauc_precision_at_10_diff1 value: 5.4589 - type: nauc_precision_at_20_max value: 36.0675 - type: nauc_precision_at_20_std value: 53.9269 - type: nauc_precision_at_20_diff1 value: -7.0865 - type: nauc_precision_at_100_max value: 28.0561 - type: nauc_precision_at_100_std value: 66.1792 - type: nauc_precision_at_100_diff1 value: -19.653 - type: nauc_precision_at_1000_max value: 22.4701 - type: nauc_precision_at_1000_std value: 69.6725 - type: nauc_precision_at_1000_diff1 value: -27.4305 - type: nauc_mrr_at_1_max value: 55.4562 - type: nauc_mrr_at_1_std value: -9.3317 - type: nauc_mrr_at_1_diff1 value: 71.1878 - type: nauc_mrr_at_3_max value: 57.4634 - type: nauc_mrr_at_3_std value: -10.6496 - type: nauc_mrr_at_3_diff1 value: 69.881 - type: nauc_mrr_at_5_max value: 56.8667 - type: nauc_mrr_at_5_std value: -10.2421 - type: nauc_mrr_at_5_diff1 value: 69.0777 - type: nauc_mrr_at_10_max value: 58.0629 - type: nauc_mrr_at_10_std value: -9.8724 - type: nauc_mrr_at_10_diff1 value: 69.5505 - type: nauc_mrr_at_20_max value: 57.7407 - type: nauc_mrr_at_20_std value: -10.0261 - type: nauc_mrr_at_20_diff1 value: 69.5455 - type: nauc_mrr_at_100_max value: 57.7355 - type: nauc_mrr_at_100_std value: -9.8413 - type: nauc_mrr_at_100_diff1 value: 69.5846 - type: nauc_mrr_at_1000_max value: 57.7313 - type: nauc_mrr_at_1000_std value: -9.8523 - type: nauc_mrr_at_1000_diff1 value: 69.6076 - type: main_score value: 74.72 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions (default) revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: similarity_accuracy value: 99.798 - type: similarity_accuracy_threshold value: 92.7546 - type: similarity_f1 value: 89.441 - type: similarity_f1_threshold value: 92.7546 - type: similarity_precision value: 92.7039 - type: similarity_recall value: 86.4 - type: similarity_ap value: 95.4073 - type: cosine_accuracy value: 99.798 - type: cosine_accuracy_threshold value: 92.7546 - type: cosine_f1 value: 89.441 - type: cosine_f1_threshold value: 92.7546 - type: cosine_precision value: 92.7039 - type: cosine_recall value: 86.4 - type: cosine_ap value: 95.4073 - type: manhattan_accuracy value: 99.795 - type: manhattan_accuracy_threshold value: 851.3785 - type: manhattan_f1 value: 89.5464 - type: manhattan_f1_threshold value: 902.8006 - type: manhattan_precision value: 88.3268 - type: manhattan_recall value: 90.8 - type: manhattan_ap value: 95.3814 - type: euclidean_accuracy value: 99.798 - type: euclidean_accuracy_threshold value: 38.0669 - type: euclidean_f1 value: 89.441 - type: euclidean_f1_threshold value: 38.0669 - type: euclidean_precision value: 92.7039 - type: euclidean_recall value: 86.4 - type: euclidean_ap value: 95.4074 - type: dot_accuracy value: 99.798 - type: dot_accuracy_threshold value: 92.7546 - type: dot_f1 value: 89.441 - type: dot_f1_threshold value: 92.7546 - type:
dot_precision value: 92.7039 - type: dot_recall value: 86.4 - type: dot_ap value: 95.4074 - type: max_accuracy value: 99.798 - type: max_f1 value: 89.5464 - type: max_precision value: 92.7039 - type: max_recall value: 90.8 - type: max_ap value: 95.4074 - type: main_score value: 95.4074 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering (default) revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: v_measure value: 70.3156 - type: v_measure_std value: 3.9677 - type: main_score value: 70.3156 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P (default) revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: v_measure value: 35.4198 - type: v_measure_std value: 1.5537 - type: main_score value: 35.4198 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions (default) revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: map value: 54.5221 - type: mrr value: 55.5001 - type: nAUC_map_max value: 7.9342 - type: nAUC_map_std value: 6.8542 - type: nAUC_map_diff1 value: 38.7381 - type: nAUC_mrr_max value: 8.862 - type: nAUC_mrr_std value: 7.2187 - type: nAUC_mrr_diff1 value: 38.5236 - type: main_score value: 54.5221 task: type: Reranking - dataset: config: default name: MTEB StackOverflowQA (default) revision: db8f169f3894c14a00251061f957b2063eef2bd5 split: test type: CoIR-Retrieval/stackoverflow-qa metrics: - type: ndcg_at_1 value: 83.2 - type: ndcg_at_3 value: 88.397 - type: ndcg_at_5 value: 89.202 - type: ndcg_at_10 value: 89.846 - type: ndcg_at_20 value: 90.235 - type: ndcg_at_100 value: 90.552 - type: ndcg_at_1000 value: 90.654 - type: map_at_1 value: 83.2 - type: map_at_3 value: 87.17 - type: map_at_5 value: 87.616 - type: map_at_10 value: 87.889 - type: map_at_20 value: 87.994 - type: map_at_100 value: 88.041 - type: map_at_1000 value: 88.045 - type: recall_at_1 value: 83.2 - type: recall_at_3 value: 91.926 - type: recall_at_5 value: 93.882 - type: recall_at_10 value: 95.838 - type: recall_at_20 value: 97.392 - type: recall_at_100 value: 99.047 - type: recall_at_1000 value: 99.85 - type: precision_at_1 value: 83.2 - type: precision_at_3 value: 30.642 - type: precision_at_5 value: 18.776 - type: precision_at_10 value: 9.584 - type: precision_at_20 value: 4.87 - type: precision_at_100 value: 0.99 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 83.1996 - type: mrr_at_3 value: 87.1698 - type: mrr_at_5 value: 87.6162 - type: mrr_at_10 value: 87.8891 - type: mrr_at_20 value: 87.9937 - type: mrr_at_100 value: 88.0412 - type: mrr_at_1000 value: 88.045 - type: nauc_ndcg_at_1_max value: 78.6007 - type: nauc_ndcg_at_1_std value: -0.0095 - type: nauc_ndcg_at_1_diff1 value: 88.7762 - type: nauc_ndcg_at_3_max value: 81.4239 - type: nauc_ndcg_at_3_std value: 1.4683 - type: nauc_ndcg_at_3_diff1 value: 86.5422 - type: nauc_ndcg_at_5_max value: 80.8469 - type: nauc_ndcg_at_5_std value: -0.5089 - type: nauc_ndcg_at_5_diff1 value: 86.7397 - type: nauc_ndcg_at_10_max value: 80.6073 - type: nauc_ndcg_at_10_std value: 1.2302 - type: nauc_ndcg_at_10_diff1 value: 86.5722 - type: nauc_ndcg_at_20_max value: 80.5133 - type:
nauc_ndcg_at_20_std value: 1.0021 - type: nauc_ndcg_at_20_diff1 value: 86.6381 - type: nauc_ndcg_at_100_max value: 80.4389 - type: nauc_ndcg_at_100_std value: 0.33 - type: nauc_ndcg_at_100_diff1 value: 86.993 - type: nauc_ndcg_at_1000_max value: 80.3736 - type: nauc_ndcg_at_1000_std value: 0.582 - type: nauc_ndcg_at_1000_diff1 value: 86.9238 - type: nauc_map_at_1_max value: 78.6007 - type: nauc_map_at_1_std value: -0.0095 - type: nauc_map_at_1_diff1 value: 88.7762 - type: nauc_map_at_3_max value: 80.6167 - type: nauc_map_at_3_std value: 0.8933 - type: nauc_map_at_3_diff1 value: 87.0763 - type: nauc_map_at_5_max value: 80.3056 - type: nauc_map_at_5_std value: -0.1035 - type: nauc_map_at_5_diff1 value: 87.1974 - type: nauc_map_at_10_max value: 80.1979 - type: nauc_map_at_10_std value: 0.4875 - type: nauc_map_at_10_diff1 value: 87.1597 - type: nauc_map_at_20_max value: 80.1758 - type: nauc_map_at_20_std value: 0.4484 - type: nauc_map_at_20_diff1 value: 87.1785 - type: nauc_map_at_100_max value: 80.1598 - type: nauc_map_at_100_std value: 0.3517 - type: nauc_map_at_100_diff1 value: 87.2128 - type: nauc_map_at_1000_max value: 80.1585 - type: nauc_map_at_1000_std value: 0.3646 - type: nauc_map_at_1000_diff1 value: 87.2108 - type: nauc_recall_at_1_max value: 78.6007 - type: nauc_recall_at_1_std value: -0.0095 - type: nauc_recall_at_1_diff1 value: 88.7762 - type: nauc_recall_at_3_max value: 84.951 - type: nauc_recall_at_3_std value: 4.0854 - type: nauc_recall_at_3_diff1 value: 84.2801 - type: nauc_recall_at_5_max value: 83.6834 - type: nauc_recall_at_5_std value: -3.1815 - type: nauc_recall_at_5_diff1 value: 84.3362 - type: nauc_recall_at_10_max value: 83.4402 - type: nauc_recall_at_10_std value: 8.5857 - type: nauc_recall_at_10_diff1 value: 81.8432 - type: nauc_recall_at_20_max value: 83.6935 - type: nauc_recall_at_20_std value: 9.0888 - type: nauc_recall_at_20_diff1 value: 80.01 - type: nauc_recall_at_100_max value: 86.5116 - type: nauc_recall_at_100_std value: -7.6839 - type: nauc_recall_at_100_diff1 value: 88.1354 - type: nauc_recall_at_1000_max value: 86.3848 - type: nauc_recall_at_1000_std value: 52.8467 - type: nauc_recall_at_1000_diff1 value: 61.4995 - type: nauc_precision_at_1_max value: 78.6007 - type: nauc_precision_at_1_std value: -0.0095 - type: nauc_precision_at_1_diff1 value: 88.7762 - type: nauc_precision_at_3_max value: 84.951 - type: nauc_precision_at_3_std value: 4.0854 - type: nauc_precision_at_3_diff1 value: 84.2801 - type: nauc_precision_at_5_max value: 83.6834 - type: nauc_precision_at_5_std value: -3.1815 - type: nauc_precision_at_5_diff1 value: 84.3362 - type: nauc_precision_at_10_max value: 83.4402 - type: nauc_precision_at_10_std value: 8.5857 - type: nauc_precision_at_10_diff1 value: 81.8432 - type: nauc_precision_at_20_max value: 83.6935 - type: nauc_precision_at_20_std value: 9.0888 - type: nauc_precision_at_20_diff1 value: 80.01 - type: nauc_precision_at_100_max value: 86.5116 - type: nauc_precision_at_100_std value: -7.6839 - type: nauc_precision_at_100_diff1 value: 88.1354 - type: nauc_precision_at_1000_max value: 86.3848 - type: nauc_precision_at_1000_std value: 52.8467 - type: nauc_precision_at_1000_diff1 value: 61.4995 - type: nauc_mrr_at_1_max value: 78.6007 - type: nauc_mrr_at_1_std value: -0.0095 - type: nauc_mrr_at_1_diff1 value: 88.7762 - type: nauc_mrr_at_3_max value: 80.6167 - type: nauc_mrr_at_3_std value: 0.8933 - type: nauc_mrr_at_3_diff1
value: 87.0763 - type: nauc_mrr_at_5_max value: 80.3056 - type: nauc_mrr_at_5_std value: -0.1035 - type: nauc_mrr_at_5_diff1 value: 87.1974 - type: nauc_mrr_at_10_max value: 80.1979 - type: nauc_mrr_at_10_std value: 0.4875 - type: nauc_mrr_at_10_diff1 value: 87.1597 - type: nauc_mrr_at_20_max value: 80.1758 - type: nauc_mrr_at_20_std value: 0.4484 - type: nauc_mrr_at_20_diff1 value: 87.1785 - type: nauc_mrr_at_100_max value: 80.1598 - type: nauc_mrr_at_100_std value: 0.3517 - type: nauc_mrr_at_100_diff1 value: 87.2128 - type: nauc_mrr_at_1000_max value: 80.1585 - type: nauc_mrr_at_1000_std value: 0.3646 - type: nauc_mrr_at_1000_diff1 value: 87.2108 - type: main_score value: 89.846 task: type: Retrieval - dataset: config: default name: MTEB SummEval (default) revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: pearson value: 30.71 - type: spearman value: 31.8412 - type: cosine_spearman value: 31.8412 - type: cosine_pearson value: 30.71 - type: dot_spearman value: 31.8412 - type: dot_pearson value: 30.71 - type: main_score value: 31.8412 task: type: Summarization - dataset: config: default name: MTEB SyntheticText2SQL (default) revision: 686b87296c3a0191b5d9415a00526c62db9fce09 split: test type: CoIR-Retrieval/synthetic-text2sql metrics: - type: ndcg_at_1 value: 3.692 - type: ndcg_at_3 value: 42.481 - type: ndcg_at_5 value: 45.909 - type: ndcg_at_10 value: 48.41 - type: ndcg_at_20 value: 49.845 - type: ndcg_at_100 value: 51.358 - type: ndcg_at_1000 value: 51.74 - type: map_at_1 value: 3.692 - type: map_at_3 value: 33.82 - type: map_at_5 value: 35.727 - type: map_at_10 value: 36.768 - type: map_at_20 value: 37.162 - type: map_at_100 value: 37.377 - type: map_at_1000 value: 37.392 - type: recall_at_1 value: 3.692 - type: recall_at_3 value: 67.185 - type: recall_at_5 value: 75.491 - type: recall_at_10 value: 83.182 - type: recall_at_20 value: 88.857 - type: recall_at_100 value: 96.924 - type: recall_at_1000 value: 99.88 - type: precision_at_1 value: 3.692 - type: precision_at_3 value: 22.395 - type: precision_at_5 value: 15.098 - type: precision_at_10 value: 8.318 - type: precision_at_20 value: 4.443 - type: precision_at_100 value: 0.969 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 31.4647 - type: mrr_at_3 value: 49.3391 - type: mrr_at_5 value: 50.9842 - type: mrr_at_10 value: 51.9025 - type: mrr_at_20 value: 52.2801 - type: mrr_at_100 value: 52.4906 - type: mrr_at_1000 value: 52.506 - type: nauc_ndcg_at_1_max value: 5.9474 - type: nauc_ndcg_at_1_std value: -15.6036 - type: nauc_ndcg_at_1_diff1 value: 74.4115 - type: nauc_ndcg_at_3_max value: 24.1744 - type: nauc_ndcg_at_3_std value: -26.2412 - type: nauc_ndcg_at_3_diff1 value: -61.795 - type: nauc_ndcg_at_5_max value: 24.3445 - type: nauc_ndcg_at_5_std value: -26.8005 - type: nauc_ndcg_at_5_diff1 value: -57.8936 - type: nauc_ndcg_at_10_max value: 23.6218 - type: nauc_ndcg_at_10_std value: -26.3784 - type: nauc_ndcg_at_10_diff1 value: -54.4966 - type: nauc_ndcg_at_20_max value: 23.6458 - type: nauc_ndcg_at_20_std value: -26.1137 - type: nauc_ndcg_at_20_diff1 value: -52.7814 - type: nauc_ndcg_at_100_max value: 23.59 - type: nauc_ndcg_at_100_std value: -24.786 - type: nauc_ndcg_at_100_diff1 value: -51.302 - type: nauc_ndcg_at_1000_max value: 23.1129 - type: nauc_ndcg_at_1000_std
value: -25.1389 - type: nauc_ndcg_at_1000_diff1 value: -50.8565 - type: nauc_map_at_1_max value: 5.9474 - type: nauc_map_at_1_std value: -15.6036 - type: nauc_map_at_1_diff1 value: 74.4115 - type: nauc_map_at_3_max value: 22.7683 - type: nauc_map_at_3_std value: -25.0604 - type: nauc_map_at_3_diff1 value: -53.0054 - type: nauc_map_at_5_max value: 22.7781 - type: nauc_map_at_5_std value: -25.3076 - type: nauc_map_at_5_diff1 value: -49.921 - type: nauc_map_at_10_max value: 22.345 - type: nauc_map_at_10_std value: -25.0615 - type: nauc_map_at_10_diff1 value: -48.09 - type: nauc_map_at_20_max value: 22.3361 - type: nauc_map_at_20_std value: -24.9463 - type: nauc_map_at_20_diff1 value: -47.4815 - type: nauc_map_at_100_max value: 22.3039 - type: nauc_map_at_100_std value: -24.7562 - type: nauc_map_at_100_diff1 value: -47.2248 - type: nauc_map_at_1000_max value: 22.287 - type: nauc_map_at_1000_std value: -24.7638 - type: nauc_map_at_1000_diff1 value: -47.2029 - type: nauc_recall_at_1_max value: 5.9474 - type: nauc_recall_at_1_std value: -15.6036 - type: nauc_recall_at_1_diff1 value: 74.4115 - type: nauc_recall_at_3_max value: 26.7488 - type: nauc_recall_at_3_std value: -28.5119 - type: nauc_recall_at_3_diff1 value: -77.3694 - type: nauc_recall_at_5_max value: 27.6945 - type: nauc_recall_at_5_std value: -30.2099 - type: nauc_recall_at_5_diff1 value: -73.6265 - type: nauc_recall_at_10_max value: 26.9417 - type: nauc_recall_at_10_std value: -30.1319 - type: nauc_recall_at_10_diff1 value: -68.8477 - type: nauc_recall_at_20_max value: 28.4328 - type: nauc_recall_at_20_std value: -30.55 - type: nauc_recall_at_20_diff1 value: -66.2201 - type: nauc_recall_at_100_max value: 39.7358 - type: nauc_recall_at_100_std value: -11.5261 - type: nauc_recall_at_100_diff1 value: -66.6477 - type: nauc_recall_at_1000_max value: 34.353 - type: nauc_recall_at_1000_std value: -6.2979 - type: nauc_recall_at_1000_diff1 value: -85.7774 - type: nauc_precision_at_1_max value: 5.9474 - type: nauc_precision_at_1_std value: -15.6036 - type: nauc_precision_at_1_diff1 value: 74.4115 - type: nauc_precision_at_3_max value: 26.7488 - type: nauc_precision_at_3_std value: -28.5119 - type: nauc_precision_at_3_diff1 value: -77.3694 - type: nauc_precision_at_5_max value: 27.6945 - type: nauc_precision_at_5_std value: -30.2099 - type: nauc_precision_at_5_diff1 value: -73.6265 - type: nauc_precision_at_10_max value: 26.9417 - type: nauc_precision_at_10_std value: -30.1319 - type: nauc_precision_at_10_diff1 value: -68.8477 - type: nauc_precision_at_20_max value: 28.4328 - type: nauc_precision_at_20_std value: -30.55 - type: nauc_precision_at_20_diff1 value: -66.2201 - type: nauc_precision_at_100_max value: 39.7358 - type: nauc_precision_at_100_std value: -11.5261 - type: nauc_precision_at_100_diff1 value: -66.6477 - type: nauc_precision_at_1000_max value: 34.353 - type: nauc_precision_at_1000_std value: -6.2979 - type: nauc_precision_at_1000_diff1 value: -85.7774 - type: nauc_mrr_at_1_max value: 14.0059 - type: nauc_mrr_at_1_std value: -13.7382 - type: nauc_mrr_at_1_diff1 value: -36.5675 - type: nauc_mrr_at_3_max value: 19.6693 - type: nauc_mrr_at_3_std value: -19.7679 - type: nauc_mrr_at_3_diff1 value: -54.849 - type: nauc_mrr_at_5_max value: 19.4039 - type: nauc_mrr_at_5_std value: -19.822 - type: nauc_mrr_at_5_diff1 value: -53.7619 - type:
nauc_mrr_at_10_max value: 19.1888 - type: nauc_mrr_at_10_std value: -19.4663 - type: nauc_mrr_at_10_diff1 value: -52.9212 - type: nauc_mrr_at_20_max value: 19.1218 - type: nauc_mrr_at_20_std value: -19.3786 - type: nauc_mrr_at_20_diff1 value: -52.663 - type: nauc_mrr_at_100_max value: 19.0891 - type: nauc_mrr_at_100_std value: -19.2391 - type: nauc_mrr_at_100_diff1 value: -52.5536 - type: nauc_mrr_at_1000_max value: 19.0784 - type: nauc_mrr_at_1000_std value: -19.2401 - type: nauc_mrr_at_1000_diff1 value: -52.5449 - type: main_score value: 48.41 task: type: Retrieval - dataset: config: default name: MTEB TRECCOVID (default) revision: bb9466bac8153a0349341eb1b22e06409e78ef4e split: test type: mteb/trec-covid metrics: - type: ndcg_at_1 value: 66.0 - type: ndcg_at_3 value: 70.654 - type: ndcg_at_5 value: 71.611 - type: ndcg_at_10 value: 69.259 - type: ndcg_at_20 value: 67.02 - type: ndcg_at_100 value: 57.274 - type: ndcg_at_1000 value: 55.459 - type: map_at_1 value: 0.202 - type: map_at_3 value: 0.553 - type: map_at_5 value: 0.924 - type: map_at_10 value: 1.727 - type: map_at_20 value: 3.124 - type: map_at_100 value: 10.906 - type: map_at_1000 value: 28.939 - type: recall_at_1 value: 0.202 - type: recall_at_3 value: 0.609 - type: recall_at_5 value: 1.048 - type: recall_at_10 value: 2.001 - type: recall_at_20 value: 3.749 - type: recall_at_100 value: 14.801 - type: recall_at_1000 value: 53.936 - type: precision_at_1 value: 74.0 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 78.8 - type: precision_at_10 value: 74.8 - type: precision_at_20 value: 72.0 - type: precision_at_100 value: 59.62 - type: precision_at_1000 value: 24.84 - type: mrr_at_1 value: 74.0 - type: mrr_at_3 value: 85.6667 - type: mrr_at_5 value: 85.6667 - type: mrr_at_10 value: 85.6667 - type: mrr_at_20 value: 85.6667 - type: mrr_at_100 value: 85.6667 - type: mrr_at_1000 value: 85.6667 - type: nauc_ndcg_at_1_max value: 36.0347 - type: nauc_ndcg_at_1_std value: 41.7081 - type: nauc_ndcg_at_1_diff1 value: 13.2261 - type: nauc_ndcg_at_3_max value: 45.4255 - type: nauc_ndcg_at_3_std value: 49.8257 - type: nauc_ndcg_at_3_diff1 value: -0.4452 - type: nauc_ndcg_at_5_max value: 49.6908 - type: nauc_ndcg_at_5_std value: 54.221 - type: nauc_ndcg_at_5_diff1 value: 3.5483 - type: nauc_ndcg_at_10_max value: 46.2419 - type: nauc_ndcg_at_10_std value: 59.9826 - type: nauc_ndcg_at_10_diff1 value: -0.436 - type: nauc_ndcg_at_20_max value: 42.3528 - type: nauc_ndcg_at_20_std value: 64.9208 - type: nauc_ndcg_at_20_diff1 value: -15.72 - type: nauc_ndcg_at_100_max value: 38.6688 - type: nauc_ndcg_at_100_std value: 70.2707 - type: nauc_ndcg_at_100_diff1 value: -27.6919 - type: nauc_ndcg_at_1000_max value: 39.3229 - type: nauc_ndcg_at_1000_std value: 71.5958 - type: nauc_ndcg_at_1000_diff1 value: -32.4269 - type: nauc_map_at_1_max value: 24.9717 - type: nauc_map_at_1_std value: 20.3237 - type: nauc_map_at_1_diff1 value: 26.8022 - type: nauc_map_at_3_max value: 36.496 - type: nauc_map_at_3_std value: 32.506 - type: nauc_map_at_3_diff1 value: 17.7469 - type: nauc_map_at_5_max value: 37.802 - type: nauc_map_at_5_std value: 32.5133 - type: nauc_map_at_5_diff1 value: 21.9404 - type: nauc_map_at_10_max value: 36.8446 - type: nauc_map_at_10_std value: 37.3347 - type: nauc_map_at_10_diff1 value: 23.311 - type:
nauc_map_at_20_max value: 35.4845 - type: nauc_map_at_20_std value: 42.1774 - type: nauc_map_at_20_diff1 value: 14.0725 - type: nauc_map_at_100_max value: 38.3755 - type: nauc_map_at_100_std value: 58.4583 - type: nauc_map_at_100_diff1 value: -7.3202 - type: nauc_map_at_1000_max value: 43.0209 - type: nauc_map_at_1000_std value: 72.8673 - type: nauc_map_at_1000_diff1 value: -29.9523 - type: nauc_recall_at_1_max value: 24.9717 - type: nauc_recall_at_1_std value: 20.3237 - type: nauc_recall_at_1_diff1 value: 26.8022 - type: nauc_recall_at_3_max value: 29.1499 - type: nauc_recall_at_3_std value: 27.2806 - type: nauc_recall_at_3_diff1 value: 16.0975 - type: nauc_recall_at_5_max value: 29.3013 - type: nauc_recall_at_5_std value: 26.4035 - type: nauc_recall_at_5_diff1 value: 20.3157 - type: nauc_recall_at_10_max value: 27.3261 - type: nauc_recall_at_10_std value: 30.1061 - type: nauc_recall_at_10_diff1 value: 22.0122 - type: nauc_recall_at_20_max value: 25.1764 - type: nauc_recall_at_20_std value: 33.1536 - type: nauc_recall_at_20_diff1 value: 13.4285 - type: nauc_recall_at_100_max value: 28.2099 - type: nauc_recall_at_100_std value: 45.7222 - type: nauc_recall_at_100_diff1 value: -6.1627 - type: nauc_recall_at_1000_max value: 33.4423 - type: nauc_recall_at_1000_std value: 60.7644 - type: nauc_recall_at_1000_diff1 value: -32.4319 - type: nauc_precision_at_1_max value: 55.0789 - type: nauc_precision_at_1_std value: 42.7355 - type: nauc_precision_at_1_diff1 value: 21.2765 - type: nauc_precision_at_3_max value: 57.5971 - type: nauc_precision_at_3_std value: 54.4791 - type: nauc_precision_at_3_diff1 value: -1.1622 - type: nauc_precision_at_5_max value: 66.6475 - type: nauc_precision_at_5_std value: 57.5585 - type: nauc_precision_at_5_diff1 value: 2.9311 - type: nauc_precision_at_10_max value: 58.7671 - type: nauc_precision_at_10_std value: 63.5528 - type: nauc_precision_at_10_diff1 value: -1.193 - type: nauc_precision_at_20_max value: 47.964 - type: nauc_precision_at_20_std value: 65.3738 - type: nauc_precision_at_20_diff1 value: -17.0707 - type: nauc_precision_at_100_max value: 38.9039 - type: nauc_precision_at_100_std value: 68.9848 - type: nauc_precision_at_100_diff1 value: -31.8167 - type: nauc_precision_at_1000_max value: 24.0907 - type: nauc_precision_at_1000_std value: 36.3251 - type: nauc_precision_at_1000_diff1 value: -30.1565 - type: nauc_mrr_at_1_max value: 55.0789 - type: nauc_mrr_at_1_std value: 42.7355 - type: nauc_mrr_at_1_diff1 value: 21.2765 - type: nauc_mrr_at_3_max value: 57.0157 - type: nauc_mrr_at_3_std value: 44.9613 - type: nauc_mrr_at_3_diff1 value: 18.5485 - type: nauc_mrr_at_5_max value: 57.0157 - type: nauc_mrr_at_5_std value: 44.9613 - type: nauc_mrr_at_5_diff1 value: 18.5485 - type: nauc_mrr_at_10_max value: 57.0157 - type: nauc_mrr_at_10_std value: 44.9613 - type: nauc_mrr_at_10_diff1 value: 18.5485 - type: nauc_mrr_at_20_max value: 57.0157 - type: nauc_mrr_at_20_std value: 44.9613 - type: nauc_mrr_at_20_diff1 value: 18.5485 - type: nauc_mrr_at_100_max value: 57.0157 - type: nauc_mrr_at_100_std value: 44.9613 - type: nauc_mrr_at_100_diff1 value: 18.5485 - type: nauc_mrr_at_1000_max value: 57.0157 - type: nauc_mrr_at_1000_std value: 44.9613 - type: nauc_mrr_at_1000_diff1 value: 18.5485 - type: main_score value: 69.259 task: type: Retrieval - dataset: config: default name: MTEB Touche2020
(default) revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: ndcg_at_1 value: 23.469 - type: ndcg_at_3 value: 22.555 - type: ndcg_at_5 value: 20.97 - type: ndcg_at_10 value: 20.147000000000002 - type: ndcg_at_20 value: 22.56 - type: ndcg_at_100 value: 32.79 - type: ndcg_at_1000 value: 45.324 - type: map_at_1 value: 2.152 - type: map_at_3 value: 4.103 - type: map_at_5 value: 5.482 - type: map_at_10 value: 7.747 - type: map_at_20 value: 10.309 - type: map_at_100 value: 13.639999999999999 - type: map_at_1000 value: 15.235000000000001 - type: recall_at_1 value: 2.152 - type: recall_at_3 value: 5.531 - type: recall_at_5 value: 8.029 - type: recall_at_10 value: 13.331000000000001 - type: recall_at_20 value: 22.195 - type: recall_at_100 value: 45.35 - type: recall_at_1000 value: 83.447 - type: precision_at_1 value: 26.531 - type: precision_at_3 value: 24.490000000000002 - type: precision_at_5 value: 21.633 - type: precision_at_10 value: 17.755000000000003 - type: precision_at_20 value: 15.408 - type: precision_at_100 value: 7.081999999999999 - type: precision_at_1000 value: 1.547 - type: mrr_at_1 value: 26.5306 - type: mrr_at_3 value: 38.7755 - type: mrr_at_5 value: 40.6122 - type: mrr_at_10 value: 41.3994 - type: mrr_at_20 value: 42.7601 - type: mrr_at_100 value: 43.0467 - type: mrr_at_1000 value: 43.0467 - type: nauc_ndcg_at_1_max value: -19.1831 - type: nauc_ndcg_at_1_std value: -13.1044 - type: nauc_ndcg_at_1_diff1 value: -8.6701 - type: nauc_ndcg_at_3_max value: -31.2521 - type: nauc_ndcg_at_3_std value: -9.1974 - type: nauc_ndcg_at_3_diff1 value: -17.0766 - type: nauc_ndcg_at_5_max value: -29.9171 - type: nauc_ndcg_at_5_std value: -2.2094 - type: nauc_ndcg_at_5_diff1 value: -10.8668 - type: nauc_ndcg_at_10_max value: -24.5148 - type: nauc_ndcg_at_10_std value: -0.45909999999999995 - type: nauc_ndcg_at_10_diff1 value: -10.705 - type: nauc_ndcg_at_20_max value: -29.542 - type: nauc_ndcg_at_20_std value: -0.1119 - type: nauc_ndcg_at_20_diff1 value: -6.4151 - type: nauc_ndcg_at_100_max value: -27.276 - type: nauc_ndcg_at_100_std value: 33.380900000000004 - type: nauc_ndcg_at_100_diff1 value: -1.097 - type: nauc_ndcg_at_1000_max value: -28.0856 - type: nauc_ndcg_at_1000_std value: 40.368700000000004 - type: nauc_ndcg_at_1000_diff1 value: -9.5892 - type: nauc_map_at_1_max value: -17.891099999999998 - type: nauc_map_at_1_std value: -20.8139 - type: nauc_map_at_1_diff1 value: 2.1289 - type: nauc_map_at_3_max value: -18.5984 - type: nauc_map_at_3_std value: -16.0226 - type: nauc_map_at_3_diff1 value: -0.681 - type: nauc_map_at_5_max value: -9.8672 - type: nauc_map_at_5_std value: -11.448 - type: nauc_map_at_5_diff1 value: 4.1101 - type: nauc_map_at_10_max value: -5.8905 - type: nauc_map_at_10_std value: -7.7416 - type: nauc_map_at_10_diff1 value: 2.0848999999999998 - type: nauc_map_at_20_max value: -13.9206 - type: nauc_map_at_20_std value: -4.9227 - type: nauc_map_at_20_diff1 value: 1.6968 - type: nauc_map_at_100_max value: -15.116 - type: nauc_map_at_100_std value: 10.9804 - type: nauc_map_at_100_diff1 value: 1.5921999999999998 - type: nauc_map_at_1000_max value: -15.309000000000001 - type: nauc_map_at_1000_std value: 15.207399999999998 - type: nauc_map_at_1000_diff1 value: 0.2635 - type: nauc_recall_at_1_max value: -17.891099999999998 - type: nauc_recall_at_1_std value: -20.8139 - type: nauc_recall_at_1_diff1 value: 2.1289 - type: nauc_recall_at_3_max value: -27.4434 - type: nauc_recall_at_3_std value: -14.4615 - type: nauc_recall_at_3_diff1 
value: -4.6056 - type: nauc_recall_at_5_max value: -17.3993 - type: nauc_recall_at_5_std value: -7.1856 - type: nauc_recall_at_5_diff1 value: 2.468 - type: nauc_recall_at_10_max value: -13.7175 - type: nauc_recall_at_10_std value: -2.9436 - type: nauc_recall_at_10_diff1 value: 0.9384 - type: nauc_recall_at_20_max value: -26.96 - type: nauc_recall_at_20_std value: -1.6922 - type: nauc_recall_at_20_diff1 value: 1.8932999999999998 - type: nauc_recall_at_100_max value: -23.5556 - type: nauc_recall_at_100_std value: 48.9062 - type: nauc_recall_at_100_diff1 value: 7.8596 - type: nauc_recall_at_1000_max value: -19.6066 - type: nauc_recall_at_1000_std value: 80.4306 - type: nauc_recall_at_1000_diff1 value: -8.4789 - type: nauc_precision_at_1_max value: -23.163800000000002 - type: nauc_precision_at_1_std value: -15.9221 - type: nauc_precision_at_1_diff1 value: -1.0075 - type: nauc_precision_at_3_max value: -34.2 - type: nauc_precision_at_3_std value: -5.8114 - type: nauc_precision_at_3_diff1 value: -11.4192 - type: nauc_precision_at_5_max value: -28.3543 - type: nauc_precision_at_5_std value: 3.2409 - type: nauc_precision_at_5_diff1 value: -2.4743 - type: nauc_precision_at_10_max value: -21.8691 - type: nauc_precision_at_10_std value: 12.0827 - type: nauc_precision_at_10_diff1 value: -7.6671000000000005 - type: nauc_precision_at_20_max value: -29.541600000000003 - type: nauc_precision_at_20_std value: 18.4544 - type: nauc_precision_at_20_diff1 value: -4.9384 - type: nauc_precision_at_100_max value: -13.991700000000002 - type: nauc_precision_at_100_std value: 80.9784 - type: nauc_precision_at_100_diff1 value: 0.1001 - type: nauc_precision_at_1000_max value: 18.334 - type: nauc_precision_at_1000_std value: 35.3463 - type: nauc_precision_at_1000_diff1 value: -16.8628 - type: nauc_mrr_at_1_max value: -23.163800000000002 - type: nauc_mrr_at_1_std value: -15.9221 - type: nauc_mrr_at_1_diff1 value: -1.0075 - type: nauc_mrr_at_3_max value: -37.628099999999996 - type: nauc_mrr_at_3_std value: -13.678199999999999 - type: nauc_mrr_at_3_diff1 value: -8.0387 - type: nauc_mrr_at_5_max value: -38.205 - type: nauc_mrr_at_5_std value: -10.0574 - type: nauc_mrr_at_5_diff1 value: -7.273300000000001 - type: nauc_mrr_at_10_max value: -38.2773 - type: nauc_mrr_at_10_std value: -10.5208 - type: nauc_mrr_at_10_diff1 value: -7.556400000000001 - type: nauc_mrr_at_20_max value: -38.8068 - type: nauc_mrr_at_20_std value: -10.7195 - type: nauc_mrr_at_20_diff1 value: -6.7631 - type: nauc_mrr_at_100_max value: -38.318200000000004 - type: nauc_mrr_at_100_std value: -10.854999999999999 - type: nauc_mrr_at_100_diff1 value: -6.843000000000001 - type: nauc_mrr_at_1000_max value: -38.318200000000004 - type: nauc_mrr_at_1000_std value: -10.854999999999999 - type: nauc_mrr_at_1000_diff1 value: -6.843000000000001 - type: main_score value: 20.147000000000002 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification (default) revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 59.7607 - type: f1 value: 45.7266 - type: f1_weighted value: 68.3382 - type: ap value: 9.8682 - type: ap_weighted value: 9.8682 - type: main_score value: 59.7607 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification (default) revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 53.3192 - type: f1 value: 53.505100000000006 - 
type: f1_weighted value: 52.726600000000005 - type: main_score value: 53.3192 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering (default) revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: v_measure value: 48.3133 - type: v_measure_std value: 1.6674000000000002 - type: main_score value: 48.3133 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 (default) revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: similarity_accuracy value: 82.2972 - type: similarity_accuracy_threshold value: 92.5986 - type: similarity_f1 value: 58.2994 - type: similarity_f1_threshold value: 89.689 - type: similarity_precision value: 53.3772 - type: similarity_recall value: 64.2216 - type: similarity_ap value: 60.9374 - type: cosine_accuracy value: 82.2972 - type: cosine_accuracy_threshold value: 92.5986 - type: cosine_f1 value: 58.2994 - type: cosine_f1_threshold value: 89.689 - type: cosine_precision value: 53.3772 - type: cosine_recall value: 64.2216 - type: cosine_ap value: 60.9374 - type: manhattan_accuracy value: 82.2912 - type: manhattan_accuracy_threshold value: 839.1809000000001 - type: manhattan_f1 value: 58.2447 - type: manhattan_f1_threshold value: 996.9049 - type: manhattan_precision value: 53.74830000000001 - type: manhattan_recall value: 63.562 - type: manhattan_ap value: 60.8808 - type: euclidean_accuracy value: 82.2972 - type: euclidean_accuracy_threshold value: 38.4743 - type: euclidean_f1 value: 58.2994 - type: euclidean_f1_threshold value: 45.4114 - type: euclidean_precision value: 53.3772 - type: euclidean_recall value: 64.2216 - type: euclidean_ap value: 60.9374 - type: dot_accuracy value: 82.2972 - type: dot_accuracy_threshold value: 92.5986 - type: dot_f1 value: 58.2994 - type: dot_f1_threshold value: 89.689 - type: dot_precision value: 53.3772 - type: dot_recall value: 64.2216 - type: dot_ap value: 60.9374 - type: max_accuracy value: 82.2972 - type: max_f1 value: 58.2994 - type: max_precision value: 53.74830000000001 - type: max_recall value: 64.2216 - type: max_ap value: 60.9374 - type: main_score value: 60.9374 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus (default) revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: similarity_accuracy value: 87.2162 - type: similarity_accuracy_threshold value: 91.6164 - type: similarity_f1 value: 74.8086 - type: similarity_f1_threshold value: 90.18260000000001 - type: similarity_precision value: 69.3065 - type: similarity_recall value: 81.25959999999999 - type: similarity_ap value: 82.53160000000001 - type: cosine_accuracy value: 87.2162 - type: cosine_accuracy_threshold value: 91.6164 - type: cosine_f1 value: 74.8086 - type: cosine_f1_threshold value: 90.18260000000001 - type: cosine_precision value: 69.3065 - type: cosine_recall value: 81.25959999999999 - type: cosine_ap value: 82.53160000000001 - type: manhattan_accuracy value: 87.21039999999999 - type: manhattan_accuracy_threshold value: 899.2865999999999 - type: manhattan_f1 value: 74.77510000000001 - type: manhattan_f1_threshold value: 962.114 - type: manhattan_precision value: 70.6927 - type: manhattan_recall value: 79.3579 - type: manhattan_ap value: 82.5262 - type: euclidean_accuracy value: 87.2162 - type: euclidean_accuracy_threshold value: 40.9478 - type: euclidean_f1 
value: 74.8086 - type: euclidean_f1_threshold value: 44.3112 - type: euclidean_precision value: 69.3065 - type: euclidean_recall value: 81.25959999999999 - type: euclidean_ap value: 82.53160000000001 - type: dot_accuracy value: 87.2162 - type: dot_accuracy_threshold value: 91.6164 - type: dot_f1 value: 74.8086 - type: dot_f1_threshold value: 90.18260000000001 - type: dot_precision value: 69.3065 - type: dot_recall value: 81.25959999999999 - type: dot_ap value: 82.53160000000001 - type: max_accuracy value: 87.2162 - type: max_f1 value: 74.8086 - type: max_precision value: 70.6927 - type: max_recall value: 81.25959999999999 - type: max_ap value: 82.53160000000001 - type: main_score value: 82.53160000000001 task: type: PairClassification pipeline_tag: sentence-similarity --- # Granite-Embedding-125m-English **Model Summary:** Granite-Embedding-125m-English is a 125M-parameter dense biencoder embedding model from the Granite Embeddings suite that can be used to generate high-quality text embeddings. This model produces embedding vectors of size 768. Compared to most other open-source models, this model was trained only on open-source relevance-pair datasets with permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets. While maintaining competitive scores on academic benchmarks such as BEIR, this model also performs well on many enterprise use cases. This model is developed using retrieval-oriented pretraining, contrastive finetuning, and knowledge distillation. - **Developers:** Granite Embedding Team, IBM - **GitHub Repository:** ibm-granite/granite-embedding-models - **Website**: Granite Docs - **Paper:** Coming Soon - **Release Date**: December 18th, 2024 - **License:** Apache 2.0 **Supported Languages:** English. **Intended use:** The model is designed to produce fixed-length vector representations for a given text, which can be used for text similarity, retrieval, and search applications. **Usage with Sentence Transformers:** The model is compatible with the SentenceTransformers library and is very easy to use. First, install the sentence-transformers library. The model can then be used to encode pairs of text and find the similarity between their representations; a minimal sketch is included after the evaluation table below. **Usage with Huggingface Transformers:** This is a simple example of how to use the Granite-Embedding-125m-English model with the Transformers library and PyTorch. First, install the required libraries. The model can then be used to encode pairs of text, as in the second sketch below. **Evaluation:** The performance of the Granite-Embedding-125M-English model on MTEB Retrieval (i.e., BEIR) and code retrieval (CoIR) benchmarks is reported below. | Model | Parameters (M) | Embedding Dimension | MTEB Retrieval (15) | CoIR (10) | |---------------------------------|:------------:|:-------------------:|:-------------------:|:----------:| | granite-embedding-125m-english | 125 | 768 | 52.3 | 50.3 |
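A minimal sketch of the Sentence Transformers path referenced above, assuming the sentence-transformers package is installed; the example query and passage strings are hypothetical placeholders, and only the model id is taken from this card.

```python
# Minimal sketch: encode texts and score their similarity with sentence-transformers.
# Assumes `pip install sentence_transformers` has been run.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-125m-english")

# Hypothetical inputs, for illustration only.
queries = ["What is a biencoder embedding model?"]
passages = ["A biencoder encodes queries and passages independently into vectors."]

# Each text is encoded into a 768-dimensional vector (the embedding size stated above).
query_emb = model.encode(queries)
passage_emb = model.encode(passages)

# Cosine similarity between every query/passage pair.
print(util.cos_sim(query_emb, passage_emb))
```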
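And a minimal sketch of the plain Transformers + PyTorch path; the CLS-token pooling and L2 normalization used here are assumptions (a common recipe for encoder-only biencoder embedding models), not necessarily this card's exact recipe.

```python
# Minimal sketch: embed texts with transformers + PyTorch.
# Pooling choice (CLS token) and normalization are assumed, not taken from the card.
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "ibm-granite/granite-embedding-125m-english"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path)
model.eval()

texts = ["example sentence one", "example sentence two"]  # hypothetical inputs
batch = tokenizer(texts, padding=True, truncation=True, max_length=512,
                  return_tensors="pt")  # 512 matches the card's context length

with torch.no_grad():
    outputs = model(**batch)
    # Take the first ([CLS]) token's hidden state as the sentence embedding
    # and L2-normalize it (assumed recipe).
    embeddings = outputs.last_hidden_state[:, 0]
    embeddings = torch.nn.functional.normalize(embeddings, dim=1)

# With normalized vectors, the dot product equals cosine similarity.
print(embeddings @ embeddings.T)
```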
**Model Architecture:** Granite-Embedding-125m-English is based on an encoder-only RoBERTa-like transformer architecture, trained internally at IBM Research. | Model | granite-embedding-30m-english | granite-embedding-125m-english | granite-embedding-107m-multilingual | granite-embedding-278m-multilingual | | :--------- | :-------:| :--------: | :-----:| :-----:| | Embedding size | 384 | **768** | 384 | 768 | | Number of layers | 6 | **12** | 6 | 12 | | Number of attention heads | 12 | **12** | 12 | 12 | | Intermediate size | 1536 | **3072** | 1536 | 3072 | | Activation Function | GeLU | **GeLU** | GeLU | GeLU | | Vocabulary Size | 50265 | **50265** | 250002 | 250002 | | Max. Sequence Length | 512 | **512** | 512 | 512 | | # Parameters | 30M | **125M** | 107M | 278M | **Training Data:** Overall, the training data consists of four key sources: (1) unsupervised title-body paired data scraped from the web, (2) publicly available paired data with permissive, enterprise-friendly licenses, (3) IBM-internal paired data targeting specific technical domains, and (4) IBM-generated synthetic data. The data is listed below: | **Dataset** | **Num. Pairs** | |----------------------------------------------------|:---------------:| | SPECTER citation triplets | 684,100 | | Stack Exchange Duplicate questions (titles) | 304,525 | | Stack Exchange Duplicate questions (bodies) | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | 250,460 | | Natural Questions (NQ) | 100,231 | | SQuAD2.0 | 87,599 | | PAQ (Question, Answer) pairs | 64,371,441 | | Stack Exchange (Title, Answer) pairs | 4,067,139 | | Stack Exchange (Title, Body) pairs | 23,978,013 | | Stack Exchange (Title+Body, Answer) pairs | 187,195 | | S2ORC Citation pairs (Titles) | 52,603,982 | | S2ORC (Title, Abstract) | 41,769,185 | | S2ORC (Citations, abstracts) | 52,603,982 | | WikiAnswers Duplicate question pairs | 77,427,422 | | SearchQA | 582,261 | | HotpotQA | 85,000 | | Fever | 109,810 | | Arxiv | 2,358,545 | | Wikipedia | 20,745,403 | | PubMed | 20,000,000 | | Miracl En Pairs | 9,016 | | DBPedia Title-Body Pairs | 4,635,922 | | Synthetic: Query-Wikipedia Passage | 1,879,093 | | Synthetic: Fact Verification | 9,888 | | IBM Internal Triples | 40,290 | | IBM Internal Title-Body Pairs | 1,524,586 | Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus due to its non-commercial license, while other open-source models train on this dataset due to its high quality. **Infrastructure:** We train Granite Embedding Models using IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80GB GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs. **Ethical Considerations and Limitations:** The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-125m-English is trained only on English texts, and has a context length of 512 tokens (longer texts will be truncated to this size). **Resources** - ⭐️ Learn about the latest updates with Granite: - 📄 Get started with tutorials, best practices, and prompt engineering advice: - 💡 Learn about the latest Granite learning resources: ", + "model_explanation_gemini": "Generates English text embeddings for classification and retrieval tasks using sentence-transformers."
+} \ No newline at end of file diff --git a/data/model_data_json/ibm-granite_granite-embedding-30m-english.json b/data/model_data_json/ibm-granite_granite-embedding-30m-english.json new file mode 100644 index 0000000000000000000000000000000000000000..05baa807ce427a2ee6b159d43529297c86bf0ae5 --- /dev/null +++ b/data/model_data_json/ibm-granite_granite-embedding-30m-english.json @@ -0,0 +1,27 @@ +{ + "model_id": "ibm-granite/granite-embedding-30m-english", + "downloads": 87053, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "roberta", + "feature-extraction", + "language", + "granite", + "embeddings", + "mteb", + "transformers", + "sentence-similarity", + "en", + "arxiv:0000.00000", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 library_name: sentence-transformers tags: - language - granite - embeddings - mteb - transformers model-index: - name: ibm-granite/granite-embedding-30m-english results: - dataset: config: en-ext name: MTEB AmazonCounterfactualClassification (en-ext) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 62.856100000000005 - type: f1 value: 51.5046 - type: f1_weighted value: 69.9775 - type: ap value: 15.4995 - type: ap_weighted value: 15.4995 - type: main_score value: 62.856100000000005 task: type: Classification - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 60.925399999999996 - type: f1 value: 55.0092 - type: f1_weighted value: 64.8014 - type: ap value: 25.0517 - type: ap_weighted value: 25.0517 - type: main_score value: 60.925399999999996 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification (default) revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 62.983599999999996 - type: f1 value: 62.553599999999996 - type: f1_weighted value: 62.553599999999996 - type: ap value: 58.3423 - type: ap_weighted value: 58.3423 - type: main_score value: 62.983599999999996 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 32.178000000000004 - type: f1 value: 31.5201 - type: f1_weighted value: 31.5201 - type: main_score value: 32.178000000000004 task: type: Classification - dataset: config: default name: MTEB AppsRetrieval (default) revision: f22508f96b7a36c2415181ed8bb76f76e04ae2d5 split: test type: CoIR-Retrieval/apps metrics: - type: ndcg_at_1 value: 3.5060000000000002 - type: ndcg_at_3 value: 4.789000000000001 - type: ndcg_at_5 value: 5.314 - type: ndcg_at_10 value: 6.203 - type: ndcg_at_20 value: 6.801 - type: ndcg_at_100 value: 8.588 - type: ndcg_at_1000 value: 12.418999999999999 - type: map_at_1 value: 3.5060000000000002 - type: map_at_3 value: 4.471 - type: map_at_5 value: 4.7620000000000005 - type: map_at_10 value: 5.117 - type: map_at_20 value: 5.281000000000001 - type: map_at_100 value: 5.501 - type: map_at_1000 value: 5.611 - type: recall_at_1 value: 3.5060000000000002 - type: recall_at_3 value: 5.71 - type: recall_at_5 value: 6.984999999999999 - type: recall_at_10 value: 9.801 - type: 
recall_at_20 value: 12.165 - type: recall_at_100 value: 22.205 - type: recall_at_1000 value: 54.396 - type: precision_at_1 value: 3.5060000000000002 - type: precision_at_3 value: 1.9029999999999998 - type: precision_at_5 value: 1.397 - type: precision_at_10 value: 0.98 - type: precision_at_20 value: 0.608 - type: precision_at_100 value: 0.22200000000000003 - type: precision_at_1000 value: 0.054 - type: mrr_at_1 value: 3.5060000000000002 - type: mrr_at_3 value: 4.471 - type: mrr_at_5 value: 4.7618 - type: mrr_at_10 value: 5.1166 - type: mrr_at_20 value: 5.2806 - type: mrr_at_100 value: 5.5014 - type: mrr_at_1000 value: 5.6113 - type: nauc_ndcg_at_1_max value: 32.8089 - type: nauc_ndcg_at_1_std value: 13.0518 - type: nauc_ndcg_at_1_diff1 value: 44.3602 - type: nauc_ndcg_at_3_max value: 28.5037 - type: nauc_ndcg_at_3_std value: 12.1308 - type: nauc_ndcg_at_3_diff1 value: 33.0191 - type: nauc_ndcg_at_5_max value: 25.970100000000002 - type: nauc_ndcg_at_5_std value: 12.089500000000001 - type: nauc_ndcg_at_5_diff1 value: 30.098200000000002 - type: nauc_ndcg_at_10_max value: 23.9177 - type: nauc_ndcg_at_10_std value: 12.1279 - type: nauc_ndcg_at_10_diff1 value: 26.3951 - type: nauc_ndcg_at_20_max value: 22.2086 - type: nauc_ndcg_at_20_std value: 11.355 - type: nauc_ndcg_at_20_diff1 value: 24.9668 - type: nauc_ndcg_at_100_max value: 20.1961 - type: nauc_ndcg_at_100_std value: 11.368300000000001 - type: nauc_ndcg_at_100_diff1 value: 21.654200000000003 - type: nauc_ndcg_at_1000_max value: 19.7802 - type: nauc_ndcg_at_1000_std value: 11.9399 - type: nauc_ndcg_at_1000_diff1 value: 19.8429 - type: nauc_map_at_1_max value: 32.8089 - type: nauc_map_at_1_std value: 13.0518 - type: nauc_map_at_1_diff1 value: 44.3602 - type: nauc_map_at_3_max value: 29.285600000000002 - type: nauc_map_at_3_std value: 12.4277 - type: nauc_map_at_3_diff1 value: 35.2678 - type: nauc_map_at_5_max value: 27.6754 - type: nauc_map_at_5_std value: 12.4042 - type: nauc_map_at_5_diff1 value: 33.330799999999996 - type: nauc_map_at_10_max value: 26.571299999999997 - type: nauc_map_at_10_std value: 12.439400000000001 - type: nauc_map_at_10_diff1 value: 31.275399999999998 - type: nauc_map_at_20_max value: 25.8795 - type: nauc_map_at_20_std value: 12.1596 - type: nauc_map_at_20_diff1 value: 30.6354 - type: nauc_map_at_100_max value: 25.3369 - type: nauc_map_at_100_std value: 12.0245 - type: nauc_map_at_100_diff1 value: 29.8703 - type: nauc_map_at_1000_max value: 25.239800000000002 - type: nauc_map_at_1000_std value: 12.0242 - type: nauc_map_at_1000_diff1 value: 29.7235 - type: nauc_recall_at_1_max value: 32.8089 - type: nauc_recall_at_1_std value: 13.0518 - type: nauc_recall_at_1_diff1 value: 44.3602 - type: nauc_recall_at_3_max value: 26.747700000000002 - type: nauc_recall_at_3_std value: 11.4203 - type: nauc_recall_at_3_diff1 value: 27.9047 - type: nauc_recall_at_5_max value: 22.3707 - type: nauc_recall_at_5_std value: 11.4164 - type: nauc_recall_at_5_diff1 value: 23.4182 - type: nauc_recall_at_10_max value: 19.2758 - type: nauc_recall_at_10_std value: 11.578800000000001 - type: nauc_recall_at_10_diff1 value: 18.030099999999997 - type: nauc_recall_at_20_max value: 16.1643 - type: nauc_recall_at_20_std value: 9.9037 - type: nauc_recall_at_20_diff1 value: 16.0833 - type: nauc_recall_at_100_max value: 13.644700000000002 - type: nauc_recall_at_100_std value: 10.986799999999999 - type: nauc_recall_at_100_diff1 value: 11.0515 - type: nauc_recall_at_1000_max value: 13.9712 - type: nauc_recall_at_1000_std value: 13.4048 - type: 
nauc_recall_at_1000_diff1 value: 6.569500000000001 - type: nauc_precision_at_1_max value: 32.8089 - type: nauc_precision_at_1_std value: 13.0518 - type: nauc_precision_at_1_diff1 value: 44.3602 - type: nauc_precision_at_3_max value: 26.747700000000002 - type: nauc_precision_at_3_std value: 11.4203 - type: nauc_precision_at_3_diff1 value: 27.9047 - type: nauc_precision_at_5_max value: 22.3707 - type: nauc_precision_at_5_std value: 11.4164 - type: nauc_precision_at_5_diff1 value: 23.4182 - type: nauc_precision_at_10_max value: 19.2758 - type: nauc_precision_at_10_std value: 11.578800000000001 - type: nauc_precision_at_10_diff1 value: 18.030099999999997 - type: nauc_precision_at_20_max value: 16.1643 - type: nauc_precision_at_20_std value: 9.9037 - type: nauc_precision_at_20_diff1 value: 16.0833 - type: nauc_precision_at_100_max value: 13.644700000000002 - type: nauc_precision_at_100_std value: 10.986799999999999 - type: nauc_precision_at_100_diff1 value: 11.0515 - type: nauc_precision_at_1000_max value: 13.9712 - type: nauc_precision_at_1000_std value: 13.4048 - type: nauc_precision_at_1000_diff1 value: 6.569500000000001 - type: nauc_mrr_at_1_max value: 32.8089 - type: nauc_mrr_at_1_std value: 13.0518 - type: nauc_mrr_at_1_diff1 value: 44.3602 - type: nauc_mrr_at_3_max value: 29.285600000000002 - type: nauc_mrr_at_3_std value: 12.4277 - type: nauc_mrr_at_3_diff1 value: 35.2678 - type: nauc_mrr_at_5_max value: 27.6754 - type: nauc_mrr_at_5_std value: 12.4042 - type: nauc_mrr_at_5_diff1 value: 33.330799999999996 - type: nauc_mrr_at_10_max value: 26.571299999999997 - type: nauc_mrr_at_10_std value: 12.439400000000001 - type: nauc_mrr_at_10_diff1 value: 31.275399999999998 - type: nauc_mrr_at_20_max value: 25.8795 - type: nauc_mrr_at_20_std value: 12.1596 - type: nauc_mrr_at_20_diff1 value: 30.6354 - type: nauc_mrr_at_100_max value: 25.337 - type: nauc_mrr_at_100_std value: 12.0245 - type: nauc_mrr_at_100_diff1 value: 29.870400000000004 - type: nauc_mrr_at_1000_max value: 25.2399 - type: nauc_mrr_at_1000_std value: 12.0242 - type: nauc_mrr_at_1000_diff1 value: 29.7236 - type: main_score value: 6.203 task: type: Retrieval - dataset: config: default name: MTEB ArguAna (default) revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: ndcg_at_1 value: 31.791999999999998 - type: ndcg_at_3 value: 46.453 - type: ndcg_at_5 value: 51.623 - type: ndcg_at_10 value: 56.355999999999995 - type: ndcg_at_20 value: 58.757000000000005 - type: ndcg_at_100 value: 59.789 - type: ndcg_at_1000 value: 59.857000000000006 - type: map_at_1 value: 31.791999999999998 - type: map_at_3 value: 42.757 - type: map_at_5 value: 45.634 - type: map_at_10 value: 47.599000000000004 - type: map_at_20 value: 48.271 - type: map_at_100 value: 48.425000000000004 - type: map_at_1000 value: 48.427 - type: recall_at_1 value: 31.791999999999998 - type: recall_at_3 value: 57.18299999999999 - type: recall_at_5 value: 69.70100000000001 - type: recall_at_10 value: 84.282 - type: recall_at_20 value: 93.67 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.644 - type: precision_at_1 value: 31.791999999999998 - type: precision_at_3 value: 19.061 - type: precision_at_5 value: 13.94 - type: precision_at_10 value: 8.427999999999999 - type: precision_at_20 value: 4.683 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 32.3613 - type: mrr_at_3 value: 42.935 - type: mrr_at_5 value: 45.844 - type: mrr_at_10 value: 47.808099999999996 - type: mrr_at_20 
value: 48.4844 - type: mrr_at_100 value: 48.6345 - type: mrr_at_1000 value: 48.6364 - type: nauc_ndcg_at_1_max value: -8.274099999999999 - type: nauc_ndcg_at_1_std value: -8.1976 - type: nauc_ndcg_at_1_diff1 value: 14.155100000000001 - type: nauc_ndcg_at_3_max value: -4.6223 - type: nauc_ndcg_at_3_std value: -10.198500000000001 - type: nauc_ndcg_at_3_diff1 value: 14.516499999999999 - type: nauc_ndcg_at_5_max value: -4.9834000000000005 - type: nauc_ndcg_at_5_std value: -9.6634 - type: nauc_ndcg_at_5_diff1 value: 12.9298 - type: nauc_ndcg_at_10_max value: -4.3251 - type: nauc_ndcg_at_10_std value: -8.3068 - type: nauc_ndcg_at_10_diff1 value: 12.2939 - type: nauc_ndcg_at_20_max value: -3.8912000000000004 - type: nauc_ndcg_at_20_std value: -8.1821 - type: nauc_ndcg_at_20_diff1 value: 12.673599999999999 - type: nauc_ndcg_at_100_max value: -5.0274 - type: nauc_ndcg_at_100_std value: -8.450000000000001 - type: nauc_ndcg_at_100_diff1 value: 12.787399999999998 - type: nauc_ndcg_at_1000_max value: -5.1416 - type: nauc_ndcg_at_1000_std value: -8.6044 - type: nauc_ndcg_at_1000_diff1 value: 12.858600000000001 - type: nauc_map_at_1_max value: -8.274099999999999 - type: nauc_map_at_1_std value: -8.1976 - type: nauc_map_at_1_diff1 value: 14.155100000000001 - type: nauc_map_at_3_max value: -5.6403 - type: nauc_map_at_3_std value: -9.7092 - type: nauc_map_at_3_diff1 value: 14.0705 - type: nauc_map_at_5_max value: -5.8896999999999995 - type: nauc_map_at_5_std value: -9.3946 - type: nauc_map_at_5_diff1 value: 13.208 - type: nauc_map_at_10_max value: -5.7523 - type: nauc_map_at_10_std value: -8.9262 - type: nauc_map_at_10_diff1 value: 12.961500000000001 - type: nauc_map_at_20_max value: -5.7103 - type: nauc_map_at_20_std value: -8.9336 - type: nauc_map_at_20_diff1 value: 13.0351 - type: nauc_map_at_100_max value: -5.8204 - type: nauc_map_at_100_std value: -8.9441 - type: nauc_map_at_100_diff1 value: 13.0722 - type: nauc_map_at_1000_max value: -5.8239 - type: nauc_map_at_1000_std value: -8.9463 - type: nauc_map_at_1000_diff1 value: 13.0724 - type: nauc_recall_at_1_max value: -8.274099999999999 - type: nauc_recall_at_1_std value: -8.1976 - type: nauc_recall_at_1_diff1 value: 14.155100000000001 - type: nauc_recall_at_3_max value: -1.4792 - type: nauc_recall_at_3_std value: -11.6828 - type: nauc_recall_at_3_diff1 value: 16.026 - type: nauc_recall_at_5_max value: -1.6868999999999998 - type: nauc_recall_at_5_std value: -10.5497 - type: nauc_recall_at_5_diff1 value: 11.826 - type: nauc_recall_at_10_max value: 5.1425 - type: nauc_recall_at_10_std value: -3.1008999999999998 - type: nauc_recall_at_10_diff1 value: 7.6911 - type: nauc_recall_at_20_max value: 25.921499999999998 - type: nauc_recall_at_20_std value: 6.812600000000001 - type: nauc_recall_at_20_diff1 value: 8.311300000000001 - type: nauc_recall_at_100_max value: 28.425299999999996 - type: nauc_recall_at_100_std value: 45.9592 - type: nauc_recall_at_100_diff1 value: -11.801 - type: nauc_recall_at_1000_max value: 21.834500000000002 - type: nauc_recall_at_1000_std value: 38.804 - type: nauc_recall_at_1000_diff1 value: -3.5484 - type: nauc_precision_at_1_max value: -8.274099999999999 - type: nauc_precision_at_1_std value: -8.1976 - type: nauc_precision_at_1_diff1 value: 14.155100000000001 - type: nauc_precision_at_3_max value: -1.4792 - type: nauc_precision_at_3_std value: -11.6828 - type: nauc_precision_at_3_diff1 value: 16.026 - type: nauc_precision_at_5_max value: -1.6868999999999998 - type: nauc_precision_at_5_std value: -10.5497 - type: 
nauc_precision_at_5_diff1 value: 11.826 - type: nauc_precision_at_10_max value: 5.1425 - type: nauc_precision_at_10_std value: -3.1008999999999998 - type: nauc_precision_at_10_diff1 value: 7.6911 - type: nauc_precision_at_20_max value: 25.921499999999998 - type: nauc_precision_at_20_std value: 6.812600000000001 - type: nauc_precision_at_20_diff1 value: 8.311300000000001 - type: nauc_precision_at_100_max value: 28.425299999999996 - type: nauc_precision_at_100_std value: 45.9592 - type: nauc_precision_at_100_diff1 value: -11.801 - type: nauc_precision_at_1000_max value: 21.834500000000002 - type: nauc_precision_at_1000_std value: 38.804 - type: nauc_precision_at_1000_diff1 value: -3.5484 - type: nauc_mrr_at_1_max value: -8.6929 - type: nauc_mrr_at_1_std value: -7.7584 - type: nauc_mrr_at_1_diff1 value: 12.488100000000001 - type: nauc_mrr_at_3_max value: -6.6954 - type: nauc_mrr_at_3_std value: -9.7075 - type: nauc_mrr_at_3_diff1 value: 12.2994 - type: nauc_mrr_at_5_max value: -6.7945 - type: nauc_mrr_at_5_std value: -9.3751 - type: nauc_mrr_at_5_diff1 value: 11.544699999999999 - type: nauc_mrr_at_10_max value: -6.6614 - type: nauc_mrr_at_10_std value: -8.859200000000001 - type: nauc_mrr_at_10_diff1 value: 11.2614 - type: nauc_mrr_at_20_max value: -6.6408 - type: nauc_mrr_at_20_std value: -8.8599 - type: nauc_mrr_at_20_diff1 value: 11.3125 - type: nauc_mrr_at_100_max value: -6.7582 - type: nauc_mrr_at_100_std value: -8.876299999999999 - type: nauc_mrr_at_100_diff1 value: 11.325000000000001 - type: nauc_mrr_at_1000_max value: -6.7619 - type: nauc_mrr_at_1000_std value: -8.878400000000001 - type: nauc_mrr_at_1000_diff1 value: 11.3251 - type: main_score value: 56.355999999999995 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P (default) revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: v_measure value: 46.813 - type: v_measure_std value: 13.830899999999998 - type: main_score value: 46.813 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S (default) revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: v_measure value: 41.9895 - type: v_measure_std value: 14.3004 - type: main_score value: 41.9895 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions (default) revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: map value: 64.1329 - type: mrr value: 76.8303 - type: nAUC_map_max value: 23.5323 - type: nAUC_map_std value: 14.7567 - type: nAUC_map_diff1 value: 11.6783 - type: nAUC_mrr_max value: 32.3309 - type: nAUC_mrr_std value: 19.1617 - type: nAUC_mrr_diff1 value: 23.508699999999997 - type: main_score value: 64.1329 task: type: Reranking - dataset: config: default name: MTEB BIOSSES (default) revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: pearson value: 90.2058 - type: spearman value: 88.1641 - type: cosine_pearson value: 90.2058 - type: cosine_spearman value: 88.1641 - type: manhattan_pearson value: 87.7579 - type: manhattan_spearman value: 87.6249 - type: euclidean_pearson value: 88.3667 - type: euclidean_spearman value: 88.1641 - type: main_score value: 88.1641 task: type: STS - dataset: config: default name: MTEB Banking77Classification (default) revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 
77.3247 - type: f1 value: 76.3532 - type: f1_weighted value: 76.3532 - type: main_score value: 77.3247 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P (default) revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: v_measure value: 39.018 - type: v_measure_std value: 0.7512 - type: main_score value: 39.018 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S (default) revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: v_measure value: 36.8097 - type: v_measure_std value: 0.9368 - type: main_score value: 36.8097 task: type: Clustering - dataset: config: python name: MTEB COIRCodeSearchNetRetrieval (python) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 85.353 - type: ndcg_at_3 value: 89.493 - type: ndcg_at_5 value: 90.347 - type: ndcg_at_10 value: 90.89699999999999 - type: ndcg_at_20 value: 91.20899999999999 - type: ndcg_at_100 value: 91.506 - type: ndcg_at_1000 value: 91.62400000000001 - type: map_at_1 value: 85.353 - type: map_at_3 value: 88.532 - type: map_at_5 value: 89.008 - type: map_at_10 value: 89.238 - type: map_at_20 value: 89.323 - type: map_at_100 value: 89.366 - type: map_at_1000 value: 89.371 - type: recall_at_1 value: 85.353 - type: recall_at_3 value: 92.251 - type: recall_at_5 value: 94.316 - type: recall_at_10 value: 95.998 - type: recall_at_20 value: 97.238 - type: recall_at_100 value: 98.81400000000001 - type: recall_at_1000 value: 99.725 - type: precision_at_1 value: 85.353 - type: precision_at_3 value: 30.75 - type: precision_at_5 value: 18.863 - type: precision_at_10 value: 9.6 - type: precision_at_20 value: 4.862 - type: precision_at_100 value: 0.988 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 85.3533 - type: mrr_at_3 value: 88.5318 - type: mrr_at_5 value: 89.0077 - type: mrr_at_10 value: 89.2381 - type: mrr_at_20 value: 89.3231 - type: mrr_at_100 value: 89.3659 - type: mrr_at_1000 value: 89.3707 - type: nauc_ndcg_at_1_max value: 79.05529999999999 - type: nauc_ndcg_at_1_std value: 6.6982 - type: nauc_ndcg_at_1_diff1 value: 89.6212 - type: nauc_ndcg_at_3_max value: 82.5612 - type: nauc_ndcg_at_3_std value: 10.379199999999999 - type: nauc_ndcg_at_3_diff1 value: 87.809 - type: nauc_ndcg_at_5_max value: 82.4315 - type: nauc_ndcg_at_5_std value: 10.5113 - type: nauc_ndcg_at_5_diff1 value: 88.0763 - type: nauc_ndcg_at_10_max value: 82.4135 - type: nauc_ndcg_at_10_std value: 11.046 - type: nauc_ndcg_at_10_diff1 value: 88.2008 - type: nauc_ndcg_at_20_max value: 82.3276 - type: nauc_ndcg_at_20_std value: 11.4306 - type: nauc_ndcg_at_20_diff1 value: 88.2525 - type: nauc_ndcg_at_100_max value: 82.1023 - type: nauc_ndcg_at_100_std value: 11.2119 - type: nauc_ndcg_at_100_diff1 value: 88.3149 - type: nauc_ndcg_at_1000_max value: 81.91720000000001 - type: nauc_ndcg_at_1000_std value: 10.7203 - type: nauc_ndcg_at_1000_diff1 value: 88.349 - type: nauc_map_at_1_max value: 79.05529999999999 - type: nauc_map_at_1_std value: 6.6982 - type: nauc_map_at_1_diff1 value: 89.6212 - type: nauc_map_at_3_max value: 81.5856 - type: nauc_map_at_3_std value: 9.3626 - type: nauc_map_at_3_diff1 value: 88.2364 - type: nauc_map_at_5_max value: 81.4778 - type: nauc_map_at_5_std value: 9.3662 - type: nauc_map_at_5_diff1 value: 88.3865 - type: nauc_map_at_10_max value: 81.447 - type: nauc_map_at_10_std value: 9.5111 - 
type: nauc_map_at_10_diff1 value: 88.43469999999999 - type: nauc_map_at_20_max value: 81.4196 - type: nauc_map_at_20_std value: 9.593 - type: nauc_map_at_20_diff1 value: 88.4473 - type: nauc_map_at_100_max value: 81.3925 - type: nauc_map_at_100_std value: 9.5683 - type: nauc_map_at_100_diff1 value: 88.4559 - type: nauc_map_at_1000_max value: 81.3865 - type: nauc_map_at_1000_std value: 9.554 - type: nauc_map_at_1000_diff1 value: 88.457 - type: nauc_recall_at_1_max value: 79.05529999999999 - type: nauc_recall_at_1_std value: 6.6982 - type: nauc_recall_at_1_diff1 value: 89.6212 - type: nauc_recall_at_3_max value: 86.56580000000001 - type: nauc_recall_at_3_std value: 14.5464 - type: nauc_recall_at_3_diff1 value: 86.1047 - type: nauc_recall_at_5_max value: 87.5044 - type: nauc_recall_at_5_std value: 16.7155 - type: nauc_recall_at_5_diff1 value: 86.5603 - type: nauc_recall_at_10_max value: 89.5625 - type: nauc_recall_at_10_std value: 23.230700000000002 - type: nauc_recall_at_10_diff1 value: 86.8079 - type: nauc_recall_at_20_max value: 91.7174 - type: nauc_recall_at_20_std value: 33.203700000000005 - type: nauc_recall_at_20_diff1 value: 86.8468 - type: nauc_recall_at_100_max value: 95.55160000000001 - type: nauc_recall_at_100_std value: 53.0169 - type: nauc_recall_at_100_diff1 value: 87.1867 - type: nauc_recall_at_1000_max value: 97.0907 - type: nauc_recall_at_1000_std value: 75.0177 - type: nauc_recall_at_1000_diff1 value: 91.3005 - type: nauc_precision_at_1_max value: 79.05529999999999 - type: nauc_precision_at_1_std value: 6.6982 - type: nauc_precision_at_1_diff1 value: 89.6212 - type: nauc_precision_at_3_max value: 86.56580000000001 - type: nauc_precision_at_3_std value: 14.5464 - type: nauc_precision_at_3_diff1 value: 86.1047 - type: nauc_precision_at_5_max value: 87.5044 - type: nauc_precision_at_5_std value: 16.7155 - type: nauc_precision_at_5_diff1 value: 86.5603 - type: nauc_precision_at_10_max value: 89.5625 - type: nauc_precision_at_10_std value: 23.230700000000002 - type: nauc_precision_at_10_diff1 value: 86.8079 - type: nauc_precision_at_20_max value: 91.7174 - type: nauc_precision_at_20_std value: 33.203700000000005 - type: nauc_precision_at_20_diff1 value: 86.8468 - type: nauc_precision_at_100_max value: 95.55160000000001 - type: nauc_precision_at_100_std value: 53.0169 - type: nauc_precision_at_100_diff1 value: 87.1867 - type: nauc_precision_at_1000_max value: 97.0907 - type: nauc_precision_at_1000_std value: 75.0177 - type: nauc_precision_at_1000_diff1 value: 91.3005 - type: nauc_mrr_at_1_max value: 79.05529999999999 - type: nauc_mrr_at_1_std value: 6.6982 - type: nauc_mrr_at_1_diff1 value: 89.6212 - type: nauc_mrr_at_3_max value: 81.5856 - type: nauc_mrr_at_3_std value: 9.3626 - type: nauc_mrr_at_3_diff1 value: 88.2364 - type: nauc_mrr_at_5_max value: 81.4778 - type: nauc_mrr_at_5_std value: 9.3662 - type: nauc_mrr_at_5_diff1 value: 88.3865 - type: nauc_mrr_at_10_max value: 81.447 - type: nauc_mrr_at_10_std value: 9.5111 - type: nauc_mrr_at_10_diff1 value: 88.43469999999999 - type: nauc_mrr_at_20_max value: 81.4196 - type: nauc_mrr_at_20_std value: 9.593 - type: nauc_mrr_at_20_diff1 value: 88.4473 - type: nauc_mrr_at_100_max value: 81.3925 - type: nauc_mrr_at_100_std value: 9.5683 - type: nauc_mrr_at_100_diff1 value: 88.4559 - type: nauc_mrr_at_1000_max value: 81.3865 - type: nauc_mrr_at_1000_std value: 9.554 - type: nauc_mrr_at_1000_diff1 value: 88.457 - type: main_score value: 90.89699999999999 task: type: Retrieval - dataset: config: javascript name: MTEB 
COIRCodeSearchNetRetrieval (javascript) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 35.46 - type: ndcg_at_3 value: 42.799 - type: ndcg_at_5 value: 44.64 - type: ndcg_at_10 value: 46.54 - type: ndcg_at_20 value: 48.025 - type: ndcg_at_100 value: 50.307 - type: ndcg_at_1000 value: 51.925 - type: map_at_1 value: 35.46 - type: map_at_3 value: 41.016000000000005 - type: map_at_5 value: 42.038 - type: map_at_10 value: 42.825 - type: map_at_20 value: 43.233 - type: map_at_100 value: 43.541999999999994 - type: map_at_1000 value: 43.599 - type: recall_at_1 value: 35.46 - type: recall_at_3 value: 47.949000000000005 - type: recall_at_5 value: 52.416 - type: recall_at_10 value: 58.28 - type: recall_at_20 value: 64.145 - type: recall_at_100 value: 76.542 - type: recall_at_1000 value: 89.547 - type: precision_at_1 value: 35.46 - type: precision_at_3 value: 15.983 - type: precision_at_5 value: 10.483 - type: precision_at_10 value: 5.827999999999999 - type: precision_at_20 value: 3.2070000000000003 - type: precision_at_100 value: 0.765 - type: precision_at_1000 value: 0.09 - type: mrr_at_1 value: 35.460300000000004 - type: mrr_at_3 value: 41.0159 - type: mrr_at_5 value: 42.038399999999996 - type: mrr_at_10 value: 42.8251 - type: mrr_at_20 value: 43.2333 - type: mrr_at_100 value: 43.542199999999994 - type: mrr_at_1000 value: 43.5986 - type: nauc_ndcg_at_1_max value: 48.2915 - type: nauc_ndcg_at_1_std value: 2.4132000000000002 - type: nauc_ndcg_at_1_diff1 value: 64.10810000000001 - type: nauc_ndcg_at_3_max value: 51.357 - type: nauc_ndcg_at_3_std value: 4.9681999999999995 - type: nauc_ndcg_at_3_diff1 value: 58.012600000000006 - type: nauc_ndcg_at_5_max value: 51.8888 - type: nauc_ndcg_at_5_std value: 6.2654000000000005 - type: nauc_ndcg_at_5_diff1 value: 57.103 - type: nauc_ndcg_at_10_max value: 51.9571 - type: nauc_ndcg_at_10_std value: 7.446 - type: nauc_ndcg_at_10_diff1 value: 56.505700000000004 - type: nauc_ndcg_at_20_max value: 51.638799999999996 - type: nauc_ndcg_at_20_std value: 7.7742 - type: nauc_ndcg_at_20_diff1 value: 55.9805 - type: nauc_ndcg_at_100_max value: 51.3786 - type: nauc_ndcg_at_100_std value: 8.1191 - type: nauc_ndcg_at_100_diff1 value: 56.3265 - type: nauc_ndcg_at_1000_max value: 51.162 - type: nauc_ndcg_at_1000_std value: 7.6863 - type: nauc_ndcg_at_1000_diff1 value: 56.6531 - type: nauc_map_at_1_max value: 48.2915 - type: nauc_map_at_1_std value: 2.4132000000000002 - type: nauc_map_at_1_diff1 value: 64.10810000000001 - type: nauc_map_at_3_max value: 50.6599 - type: nauc_map_at_3_std value: 4.3285 - type: nauc_map_at_3_diff1 value: 59.453100000000006 - type: nauc_map_at_5_max value: 50.9502 - type: nauc_map_at_5_std value: 5.0428 - type: nauc_map_at_5_diff1 value: 58.9452 - type: nauc_map_at_10_max value: 50.9749 - type: nauc_map_at_10_std value: 5.5069 - type: nauc_map_at_10_diff1 value: 58.7167 - type: nauc_map_at_20_max value: 50.8815 - type: nauc_map_at_20_std value: 5.5846 - type: nauc_map_at_20_diff1 value: 58.5793 - type: nauc_map_at_100_max value: 50.8454 - type: nauc_map_at_100_std value: 5.6249 - type: nauc_map_at_100_diff1 value: 58.6352 - type: nauc_map_at_1000_max value: 50.8377 - type: nauc_map_at_1000_std value: 5.6119 - type: nauc_map_at_1000_diff1 value: 58.6477 - type: nauc_recall_at_1_max value: 48.2915 - type: nauc_recall_at_1_std value: 2.4132000000000002 - type: nauc_recall_at_1_diff1 value: 64.10810000000001 - type: nauc_recall_at_3_max value: 53.3613 - type: 
nauc_recall_at_3_std value: 6.833699999999999 - type: nauc_recall_at_3_diff1 value: 53.8466 - type: nauc_recall_at_5_max value: 54.7395 - type: nauc_recall_at_5_std value: 10.1014 - type: nauc_recall_at_5_diff1 value: 51.520900000000005 - type: nauc_recall_at_10_max value: 55.125299999999996 - type: nauc_recall_at_10_std value: 14.277899999999999 - type: nauc_recall_at_10_diff1 value: 49.1874 - type: nauc_recall_at_20_max value: 54.0194 - type: nauc_recall_at_20_std value: 16.4329 - type: nauc_recall_at_20_diff1 value: 46.1551 - type: nauc_recall_at_100_max value: 52.7898 - type: nauc_recall_at_100_std value: 22.375600000000002 - type: nauc_recall_at_100_diff1 value: 45.351 - type: nauc_recall_at_1000_max value: 49.0379 - type: nauc_recall_at_1000_std value: 26.0579 - type: nauc_recall_at_1000_diff1 value: 41.7849 - type: nauc_precision_at_1_max value: 48.2915 - type: nauc_precision_at_1_std value: 2.4132000000000002 - type: nauc_precision_at_1_diff1 value: 64.10810000000001 - type: nauc_precision_at_3_max value: 53.3613 - type: nauc_precision_at_3_std value: 6.833699999999999 - type: nauc_precision_at_3_diff1 value: 53.8466 - type: nauc_precision_at_5_max value: 54.7395 - type: nauc_precision_at_5_std value: 10.1014 - type: nauc_precision_at_5_diff1 value: 51.520900000000005 - type: nauc_precision_at_10_max value: 55.125299999999996 - type: nauc_precision_at_10_std value: 14.277899999999999 - type: nauc_precision_at_10_diff1 value: 49.1874 - type: nauc_precision_at_20_max value: 54.0194 - type: nauc_precision_at_20_std value: 16.4329 - type: nauc_precision_at_20_diff1 value: 46.1551 - type: nauc_precision_at_100_max value: 52.7898 - type: nauc_precision_at_100_std value: 22.375600000000002 - type: nauc_precision_at_100_diff1 value: 45.351 - type: nauc_precision_at_1000_max value: 49.0379 - type: nauc_precision_at_1000_std value: 26.0579 - type: nauc_precision_at_1000_diff1 value: 41.7849 - type: nauc_mrr_at_1_max value: 48.2915 - type: nauc_mrr_at_1_std value: 2.4132000000000002 - type: nauc_mrr_at_1_diff1 value: 64.10810000000001 - type: nauc_mrr_at_3_max value: 50.6599 - type: nauc_mrr_at_3_std value: 4.3285 - type: nauc_mrr_at_3_diff1 value: 59.453100000000006 - type: nauc_mrr_at_5_max value: 50.9502 - type: nauc_mrr_at_5_std value: 5.0428 - type: nauc_mrr_at_5_diff1 value: 58.9452 - type: nauc_mrr_at_10_max value: 50.9749 - type: nauc_mrr_at_10_std value: 5.5069 - type: nauc_mrr_at_10_diff1 value: 58.7167 - type: nauc_mrr_at_20_max value: 50.8815 - type: nauc_mrr_at_20_std value: 5.5846 - type: nauc_mrr_at_20_diff1 value: 58.5793 - type: nauc_mrr_at_100_max value: 50.8454 - type: nauc_mrr_at_100_std value: 5.6249 - type: nauc_mrr_at_100_diff1 value: 58.6352 - type: nauc_mrr_at_1000_max value: 50.8377 - type: nauc_mrr_at_1000_std value: 5.6119 - type: nauc_mrr_at_1000_diff1 value: 58.6477 - type: main_score value: 46.54 task: type: Retrieval - dataset: config: go name: MTEB COIRCodeSearchNetRetrieval (go) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 45.728 - type: ndcg_at_3 value: 54.942 - type: ndcg_at_5 value: 57.19499999999999 - type: ndcg_at_10 value: 59.471 - type: ndcg_at_20 value: 60.888 - type: ndcg_at_100 value: 62.67700000000001 - type: ndcg_at_1000 value: 63.654999999999994 - type: map_at_1 value: 45.728 - type: map_at_3 value: 52.717000000000006 - type: map_at_5 value: 53.968 - type: map_at_10 value: 54.921 - type: map_at_20 value: 55.31 - type: map_at_100 value: 55.555 - type: 
map_at_1000 value: 55.589999999999996 - type: recall_at_1 value: 45.728 - type: recall_at_3 value: 61.364 - type: recall_at_5 value: 66.83099999999999 - type: recall_at_10 value: 73.8 - type: recall_at_20 value: 79.402 - type: recall_at_100 value: 89.079 - type: recall_at_1000 value: 96.885 - type: precision_at_1 value: 45.728 - type: precision_at_3 value: 20.455000000000002 - type: precision_at_5 value: 13.366 - type: precision_at_10 value: 7.380000000000001 - type: precision_at_20 value: 3.9699999999999998 - type: precision_at_100 value: 0.8909999999999999 - type: precision_at_1000 value: 0.097 - type: mrr_at_1 value: 45.7277 - type: mrr_at_3 value: 52.7169 - type: mrr_at_5 value: 53.9678 - type: mrr_at_10 value: 54.920500000000004 - type: mrr_at_20 value: 55.3099 - type: mrr_at_100 value: 55.5546 - type: mrr_at_1000 value: 55.5896 - type: nauc_ndcg_at_1_max value: 40.5391 - type: nauc_ndcg_at_1_std value: -2.9052000000000002 - type: nauc_ndcg_at_1_diff1 value: 63.2351 - type: nauc_ndcg_at_3_max value: 43.8365 - type: nauc_ndcg_at_3_std value: -0.6831 - type: nauc_ndcg_at_3_diff1 value: 57.782599999999995 - type: nauc_ndcg_at_5_max value: 43.851600000000005 - type: nauc_ndcg_at_5_std value: -0.3032 - type: nauc_ndcg_at_5_diff1 value: 57.0763 - type: nauc_ndcg_at_10_max value: 44.1492 - type: nauc_ndcg_at_10_std value: 0.6748 - type: nauc_ndcg_at_10_diff1 value: 56.8967 - type: nauc_ndcg_at_20_max value: 44.1367 - type: nauc_ndcg_at_20_std value: 0.8896 - type: nauc_ndcg_at_20_diff1 value: 56.97560000000001 - type: nauc_ndcg_at_100_max value: 43.9934 - type: nauc_ndcg_at_100_std value: 1.0534 - type: nauc_ndcg_at_100_diff1 value: 57.347899999999996 - type: nauc_ndcg_at_1000_max value: 43.8679 - type: nauc_ndcg_at_1000_std value: 0.6431 - type: nauc_ndcg_at_1000_diff1 value: 57.6967 - type: nauc_map_at_1_max value: 40.5391 - type: nauc_map_at_1_std value: -2.9052000000000002 - type: nauc_map_at_1_diff1 value: 63.2351 - type: nauc_map_at_3_max value: 43.0286 - type: nauc_map_at_3_std value: -1.2933 - type: nauc_map_at_3_diff1 value: 59.065 - type: nauc_map_at_5_max value: 43.0224 - type: nauc_map_at_5_std value: -1.1081 - type: nauc_map_at_5_diff1 value: 58.7146 - type: nauc_map_at_10_max value: 43.127500000000005 - type: nauc_map_at_10_std value: -0.7247 - type: nauc_map_at_10_diff1 value: 58.6619 - type: nauc_map_at_20_max value: 43.1213 - type: nauc_map_at_20_std value: -0.6853 - type: nauc_map_at_20_diff1 value: 58.704299999999996 - type: nauc_map_at_100_max value: 43.0908 - type: nauc_map_at_100_std value: -0.6792 - type: nauc_map_at_100_diff1 value: 58.7592 - type: nauc_map_at_1000_max value: 43.085499999999996 - type: nauc_map_at_1000_std value: -0.6897 - type: nauc_map_at_1000_diff1 value: 58.7689 - type: nauc_recall_at_1_max value: 40.5391 - type: nauc_recall_at_1_std value: -2.9052000000000002 - type: nauc_recall_at_1_diff1 value: 63.2351 - type: nauc_recall_at_3_max value: 46.3617 - type: nauc_recall_at_3_std value: 1.2550999999999999 - type: nauc_recall_at_3_diff1 value: 53.7993 - type: nauc_recall_at_5_max value: 46.6666 - type: nauc_recall_at_5_std value: 2.5401 - type: nauc_recall_at_5_diff1 value: 51.413799999999995 - type: nauc_recall_at_10_max value: 48.3645 - type: nauc_recall_at_10_std value: 6.8622000000000005 - type: nauc_recall_at_10_diff1 value: 49.6971 - type: nauc_recall_at_20_max value: 49.1074 - type: nauc_recall_at_20_std value: 9.4846 - type: nauc_recall_at_20_diff1 value: 48.5587 - type: nauc_recall_at_100_max value: 51.2638 - type: nauc_recall_at_100_std 
value: 18.4911 - type: nauc_recall_at_100_diff1 value: 47.2445 - type: nauc_recall_at_1000_max value: 61.0283 - type: nauc_recall_at_1000_std value: 31.5949 - type: nauc_recall_at_1000_diff1 value: 47.239599999999996 - type: nauc_precision_at_1_max value: 40.5391 - type: nauc_precision_at_1_std value: -2.9052000000000002 - type: nauc_precision_at_1_diff1 value: 63.2351 - type: nauc_precision_at_3_max value: 46.3617 - type: nauc_precision_at_3_std value: 1.2550999999999999 - type: nauc_precision_at_3_diff1 value: 53.7993 - type: nauc_precision_at_5_max value: 46.6666 - type: nauc_precision_at_5_std value: 2.5401 - type: nauc_precision_at_5_diff1 value: 51.413799999999995 - type: nauc_precision_at_10_max value: 48.3645 - type: nauc_precision_at_10_std value: 6.8622000000000005 - type: nauc_precision_at_10_diff1 value: 49.6971 - type: nauc_precision_at_20_max value: 49.1074 - type: nauc_precision_at_20_std value: 9.4846 - type: nauc_precision_at_20_diff1 value: 48.5587 - type: nauc_precision_at_100_max value: 51.2638 - type: nauc_precision_at_100_std value: 18.4911 - type: nauc_precision_at_100_diff1 value: 47.2445 - type: nauc_precision_at_1000_max value: 61.0283 - type: nauc_precision_at_1000_std value: 31.5949 - type: nauc_precision_at_1000_diff1 value: 47.239599999999996 - type: nauc_mrr_at_1_max value: 40.5391 - type: nauc_mrr_at_1_std value: -2.9052000000000002 - type: nauc_mrr_at_1_diff1 value: 63.2351 - type: nauc_mrr_at_3_max value: 43.0286 - type: nauc_mrr_at_3_std value: -1.2933 - type: nauc_mrr_at_3_diff1 value: 59.065 - type: nauc_mrr_at_5_max value: 43.0224 - type: nauc_mrr_at_5_std value: -1.1081 - type: nauc_mrr_at_5_diff1 value: 58.7146 - type: nauc_mrr_at_10_max value: 43.127500000000005 - type: nauc_mrr_at_10_std value: -0.7247 - type: nauc_mrr_at_10_diff1 value: 58.6619 - type: nauc_mrr_at_20_max value: 43.1213 - type: nauc_mrr_at_20_std value: -0.6853 - type: nauc_mrr_at_20_diff1 value: 58.704299999999996 - type: nauc_mrr_at_100_max value: 43.0908 - type: nauc_mrr_at_100_std value: -0.6792 - type: nauc_mrr_at_100_diff1 value: 58.7592 - type: nauc_mrr_at_1000_max value: 43.085499999999996 - type: nauc_mrr_at_1000_std value: -0.6897 - type: nauc_mrr_at_1000_diff1 value: 58.7689 - type: main_score value: 59.471 task: type: Retrieval - dataset: config: ruby name: MTEB COIRCodeSearchNetRetrieval (ruby) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 38.144 - type: ndcg_at_3 value: 46.086 - type: ndcg_at_5 value: 48.13 - type: ndcg_at_10 value: 50.166 - type: ndcg_at_20 value: 51.672 - type: ndcg_at_100 value: 53.81 - type: ndcg_at_1000 value: 55.401999999999994 - type: map_at_1 value: 38.144 - type: map_at_3 value: 44.118 - type: map_at_5 value: 45.245000000000005 - type: map_at_10 value: 46.061 - type: map_at_20 value: 46.475 - type: map_at_100 value: 46.761 - type: map_at_1000 value: 46.815 - type: recall_at_1 value: 38.144 - type: recall_at_3 value: 51.784 - type: recall_at_5 value: 56.779999999999994 - type: recall_at_10 value: 63.20400000000001 - type: recall_at_20 value: 69.151 - type: recall_at_100 value: 80.809 - type: recall_at_1000 value: 93.65599999999999 - type: precision_at_1 value: 38.144 - type: precision_at_3 value: 17.261000000000003 - type: precision_at_5 value: 11.356 - type: precision_at_10 value: 6.32 - type: precision_at_20 value: 3.458 - type: precision_at_100 value: 0.808 - type: precision_at_1000 value: 0.094 - type: mrr_at_1 value: 38.1443 - type: mrr_at_3 value: 
44.1184 - type: mrr_at_5 value: 45.2445 - type: mrr_at_10 value: 46.0607 - type: mrr_at_20 value: 46.475 - type: mrr_at_100 value: 46.7611 - type: mrr_at_1000 value: 46.8146 - type: nauc_ndcg_at_1_max value: 49.8526 - type: nauc_ndcg_at_1_std value: 6.944500000000001 - type: nauc_ndcg_at_1_diff1 value: 59.0325 - type: nauc_ndcg_at_3_max value: 48.8152 - type: nauc_ndcg_at_3_std value: 6.2506 - type: nauc_ndcg_at_3_diff1 value: 51.7373 - type: nauc_ndcg_at_5_max value: 48.4399 - type: nauc_ndcg_at_5_std value: 6.687 - type: nauc_ndcg_at_5_diff1 value: 50.569900000000004 - type: nauc_ndcg_at_10_max value: 47.2669 - type: nauc_ndcg_at_10_std value: 6.703 - type: nauc_ndcg_at_10_diff1 value: 49.3867 - type: nauc_ndcg_at_20_max value: 47.1761 - type: nauc_ndcg_at_20_std value: 7.0552 - type: nauc_ndcg_at_20_diff1 value: 49.3528 - type: nauc_ndcg_at_100_max value: 47.196 - type: nauc_ndcg_at_100_std value: 7.697 - type: nauc_ndcg_at_100_diff1 value: 49.9359 - type: nauc_ndcg_at_1000_max value: 47.4306 - type: nauc_ndcg_at_1000_std value: 7.3536 - type: nauc_ndcg_at_1000_diff1 value: 50.365700000000004 - type: nauc_map_at_1_max value: 49.8526 - type: nauc_map_at_1_std value: 6.944500000000001 - type: nauc_map_at_1_diff1 value: 59.0325 - type: nauc_map_at_3_max value: 48.932900000000004 - type: nauc_map_at_3_std value: 6.285499999999999 - type: nauc_map_at_3_diff1 value: 53.4821 - type: nauc_map_at_5_max value: 48.709799999999994 - type: nauc_map_at_5_std value: 6.5305 - type: nauc_map_at_5_diff1 value: 52.8586 - type: nauc_map_at_10_max value: 48.2504 - type: nauc_map_at_10_std value: 6.535299999999999 - type: nauc_map_at_10_diff1 value: 52.410000000000004 - type: nauc_map_at_20_max value: 48.2424 - type: nauc_map_at_20_std value: 6.6425 - type: nauc_map_at_20_diff1 value: 52.4289 - type: nauc_map_at_100_max value: 48.254999999999995 - type: nauc_map_at_100_std value: 6.7272 - type: nauc_map_at_100_diff1 value: 52.517199999999995 - type: nauc_map_at_1000_max value: 48.2618 - type: nauc_map_at_1000_std value: 6.7179 - type: nauc_map_at_1000_diff1 value: 52.5296 - type: nauc_recall_at_1_max value: 49.8526 - type: nauc_recall_at_1_std value: 6.944500000000001 - type: nauc_recall_at_1_diff1 value: 59.0325 - type: nauc_recall_at_3_max value: 48.5241 - type: nauc_recall_at_3_std value: 6.2048 - type: nauc_recall_at_3_diff1 value: 46.5818 - type: nauc_recall_at_5_max value: 47.6347 - type: nauc_recall_at_5_std value: 7.290299999999999 - type: nauc_recall_at_5_diff1 value: 43.3392 - type: nauc_recall_at_10_max value: 43.4268 - type: nauc_recall_at_10_std value: 7.4028 - type: nauc_recall_at_10_diff1 value: 38.508700000000005 - type: nauc_recall_at_20_max value: 42.416199999999996 - type: nauc_recall_at_20_std value: 9.0454 - type: nauc_recall_at_20_diff1 value: 36.9086 - type: nauc_recall_at_100_max value: 40.23 - type: nauc_recall_at_100_std value: 15.776000000000002 - type: nauc_recall_at_100_diff1 value: 36.492599999999996 - type: nauc_recall_at_1000_max value: 36.7611 - type: nauc_recall_at_1000_std value: 16.9938 - type: nauc_recall_at_1000_diff1 value: 29.5398 - type: nauc_precision_at_1_max value: 49.8526 - type: nauc_precision_at_1_std value: 6.944500000000001 - type: nauc_precision_at_1_diff1 value: 59.0325 - type: nauc_precision_at_3_max value: 48.5241 - type: nauc_precision_at_3_std value: 6.2048 - type: nauc_precision_at_3_diff1 value: 46.5818 - type: nauc_precision_at_5_max value: 47.6347 - type: nauc_precision_at_5_std value: 7.290299999999999 - type: nauc_precision_at_5_diff1 value: 
43.3392 - type: nauc_precision_at_10_max value: 43.4268 - type: nauc_precision_at_10_std value: 7.4028 - type: nauc_precision_at_10_diff1 value: 38.508700000000005 - type: nauc_precision_at_20_max value: 42.416199999999996 - type: nauc_precision_at_20_std value: 9.0454 - type: nauc_precision_at_20_diff1 value: 36.9086 - type: nauc_precision_at_100_max value: 40.23 - type: nauc_precision_at_100_std value: 15.776000000000002 - type: nauc_precision_at_100_diff1 value: 36.492599999999996 - type: nauc_precision_at_1000_max value: 36.7611 - type: nauc_precision_at_1000_std value: 16.9938 - type: nauc_precision_at_1000_diff1 value: 29.5398 - type: nauc_mrr_at_1_max value: 49.8526 - type: nauc_mrr_at_1_std value: 6.944500000000001 - type: nauc_mrr_at_1_diff1 value: 59.0325 - type: nauc_mrr_at_3_max value: 48.932900000000004 - type: nauc_mrr_at_3_std value: 6.285499999999999 - type: nauc_mrr_at_3_diff1 value: 53.4821 - type: nauc_mrr_at_5_max value: 48.709799999999994 - type: nauc_mrr_at_5_std value: 6.5305 - type: nauc_mrr_at_5_diff1 value: 52.8586 - type: nauc_mrr_at_10_max value: 48.2504 - type: nauc_mrr_at_10_std value: 6.535299999999999 - type: nauc_mrr_at_10_diff1 value: 52.410000000000004 - type: nauc_mrr_at_20_max value: 48.2424 - type: nauc_mrr_at_20_std value: 6.6425 - type: nauc_mrr_at_20_diff1 value: 52.4289 - type: nauc_mrr_at_100_max value: 48.254999999999995 - type: nauc_mrr_at_100_std value: 6.7272 - type: nauc_mrr_at_100_diff1 value: 52.517199999999995 - type: nauc_mrr_at_1000_max value: 48.2618 - type: nauc_mrr_at_1000_std value: 6.7179 - type: nauc_mrr_at_1000_diff1 value: 52.5296 - type: main_score value: 50.166 task: type: Retrieval - dataset: config: java name: MTEB COIRCodeSearchNetRetrieval (java) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 42.355 - type: ndcg_at_3 value: 50.89 - type: ndcg_at_5 value: 53.089 - type: ndcg_at_10 value: 55.062 - type: ndcg_at_20 value: 56.373 - type: ndcg_at_100 value: 58.268 - type: ndcg_at_1000 value: 59.367999999999995 - type: map_at_1 value: 42.355 - type: map_at_3 value: 48.825 - type: map_at_5 value: 50.05 - type: map_at_10 value: 50.866 - type: map_at_20 value: 51.227999999999994 - type: map_at_100 value: 51.486 - type: map_at_1000 value: 51.525 - type: recall_at_1 value: 42.355 - type: recall_at_3 value: 56.851 - type: recall_at_5 value: 62.173 - type: recall_at_10 value: 68.26100000000001 - type: recall_at_20 value: 73.437 - type: recall_at_100 value: 83.706 - type: recall_at_1000 value: 92.506 - type: precision_at_1 value: 42.355 - type: precision_at_3 value: 18.95 - type: precision_at_5 value: 12.435 - type: precision_at_10 value: 6.8260000000000005 - type: precision_at_20 value: 3.672 - type: precision_at_100 value: 0.8370000000000001 - type: precision_at_1000 value: 0.093 - type: mrr_at_1 value: 42.3551 - type: mrr_at_3 value: 48.8255 - type: mrr_at_5 value: 50.049600000000005 - type: mrr_at_10 value: 50.8665 - type: mrr_at_20 value: 51.227999999999994 - type: mrr_at_100 value: 51.486 - type: mrr_at_1000 value: 51.525200000000005 - type: nauc_ndcg_at_1_max value: 41.261700000000005 - type: nauc_ndcg_at_1_std value: -4.1932 - type: nauc_ndcg_at_1_diff1 value: 62.1792 - type: nauc_ndcg_at_3_max value: 43.6389 - type: nauc_ndcg_at_3_std value: -2.7453000000000003 - type: nauc_ndcg_at_3_diff1 value: 56.621 - type: nauc_ndcg_at_5_max value: 43.5895 - type: nauc_ndcg_at_5_std value: -2.1214 - type: nauc_ndcg_at_5_diff1 value: 55.7216 - type: 
nauc_ndcg_at_10_max value: 43.56 - type: nauc_ndcg_at_10_std value: -1.2124 - type: nauc_ndcg_at_10_diff1 value: 55.1817 - type: nauc_ndcg_at_20_max value: 43.6918 - type: nauc_ndcg_at_20_std value: -0.4332 - type: nauc_ndcg_at_20_diff1 value: 54.9887 - type: nauc_ndcg_at_100_max value: 43.945499999999996 - type: nauc_ndcg_at_100_std value: 0.3674 - type: nauc_ndcg_at_100_diff1 value: 55.237899999999996 - type: nauc_ndcg_at_1000_max value: 43.8498 - type: nauc_ndcg_at_1000_std value: 0.1663 - type: nauc_ndcg_at_1000_diff1 value: 55.6509 - type: nauc_map_at_1_max value: 41.261700000000005 - type: nauc_map_at_1_std value: -4.1932 - type: nauc_map_at_1_diff1 value: 62.1792 - type: nauc_map_at_3_max value: 43.0699 - type: nauc_map_at_3_std value: -3.1619 - type: nauc_map_at_3_diff1 value: 57.961600000000004 - type: nauc_map_at_5_max value: 43.0235 - type: nauc_map_at_5_std value: -2.8471 - type: nauc_map_at_5_diff1 value: 57.492399999999996 - type: nauc_map_at_10_max value: 43.0155 - type: nauc_map_at_10_std value: -2.4906 - type: nauc_map_at_10_diff1 value: 57.308899999999994 - type: nauc_map_at_20_max value: 43.0405 - type: nauc_map_at_20_std value: -2.299 - type: nauc_map_at_20_diff1 value: 57.262 - type: nauc_map_at_100_max value: 43.0606 - type: nauc_map_at_100_std value: -2.2096 - type: nauc_map_at_100_diff1 value: 57.2982 - type: nauc_map_at_1000_max value: 43.0566 - type: nauc_map_at_1000_std value: -2.2155 - type: nauc_map_at_1000_diff1 value: 57.312 - type: nauc_recall_at_1_max value: 41.261700000000005 - type: nauc_recall_at_1_std value: -4.1932 - type: nauc_recall_at_1_diff1 value: 62.1792 - type: nauc_recall_at_3_max value: 45.368199999999995 - type: nauc_recall_at_3_std value: -1.4471 - type: nauc_recall_at_3_diff1 value: 52.5416 - type: nauc_recall_at_5_max value: 45.421299999999995 - type: nauc_recall_at_5_std value: 0.3829 - type: nauc_recall_at_5_diff1 value: 49.8591 - type: nauc_recall_at_10_max value: 45.4698 - type: nauc_recall_at_10_std value: 3.9899999999999998 - type: nauc_recall_at_10_diff1 value: 47.100500000000004 - type: nauc_recall_at_20_max value: 46.4998 - type: nauc_recall_at_20_std value: 8.8468 - type: nauc_recall_at_20_diff1 value: 45.027899999999995 - type: nauc_recall_at_100_max value: 50.79559999999999 - type: nauc_recall_at_100_std value: 21.8125 - type: nauc_recall_at_100_diff1 value: 42.735099999999996 - type: nauc_recall_at_1000_max value: 55.116 - type: nauc_recall_at_1000_std value: 37.5788 - type: nauc_recall_at_1000_diff1 value: 42.2857 - type: nauc_precision_at_1_max value: 41.261700000000005 - type: nauc_precision_at_1_std value: -4.1932 - type: nauc_precision_at_1_diff1 value: 62.1792 - type: nauc_precision_at_3_max value: 45.368199999999995 - type: nauc_precision_at_3_std value: -1.4471 - type: nauc_precision_at_3_diff1 value: 52.5416 - type: nauc_precision_at_5_max value: 45.421299999999995 - type: nauc_precision_at_5_std value: 0.3829 - type: nauc_precision_at_5_diff1 value: 49.8591 - type: nauc_precision_at_10_max value: 45.4698 - type: nauc_precision_at_10_std value: 3.9899999999999998 - type: nauc_precision_at_10_diff1 value: 47.100500000000004 - type: nauc_precision_at_20_max value: 46.4998 - type: nauc_precision_at_20_std value: 8.8468 - type: nauc_precision_at_20_diff1 value: 45.027899999999995 - type: nauc_precision_at_100_max value: 50.79559999999999 - type: nauc_precision_at_100_std value: 21.8125 - type: nauc_precision_at_100_diff1 value: 42.735099999999996 - type: nauc_precision_at_1000_max value: 55.116 - type: 
nauc_precision_at_1000_std value: 37.5788 - type: nauc_precision_at_1000_diff1 value: 42.2857 - type: nauc_mrr_at_1_max value: 41.261700000000005 - type: nauc_mrr_at_1_std value: -4.1932 - type: nauc_mrr_at_1_diff1 value: 62.1792 - type: nauc_mrr_at_3_max value: 43.0699 - type: nauc_mrr_at_3_std value: -3.1619 - type: nauc_mrr_at_3_diff1 value: 57.961600000000004 - type: nauc_mrr_at_5_max value: 43.0235 - type: nauc_mrr_at_5_std value: -2.8471 - type: nauc_mrr_at_5_diff1 value: 57.492399999999996 - type: nauc_mrr_at_10_max value: 43.0155 - type: nauc_mrr_at_10_std value: -2.4906 - type: nauc_mrr_at_10_diff1 value: 57.308899999999994 - type: nauc_mrr_at_20_max value: 43.0405 - type: nauc_mrr_at_20_std value: -2.299 - type: nauc_mrr_at_20_diff1 value: 57.262 - type: nauc_mrr_at_100_max value: 43.0606 - type: nauc_mrr_at_100_std value: -2.2096 - type: nauc_mrr_at_100_diff1 value: 57.2982 - type: nauc_mrr_at_1000_max value: 43.0566 - type: nauc_mrr_at_1000_std value: -2.2155 - type: nauc_mrr_at_1000_diff1 value: 57.312 - type: main_score value: 55.062 task: type: Retrieval - dataset: config: php name: MTEB COIRCodeSearchNetRetrieval (php) revision: 4adc7bc41202b5c13543c9c886a25f340634dab3 split: test type: CoIR-Retrieval/CodeSearchNet metrics: - type: ndcg_at_1 value: 36.835 - type: ndcg_at_3 value: 45.147999999999996 - type: ndcg_at_5 value: 47.497 - type: ndcg_at_10 value: 49.784 - type: ndcg_at_20 value: 51.410999999999994 - type: ndcg_at_100 value: 53.715 - type: ndcg_at_1000 value: 55.102 - type: map_at_1 value: 36.835 - type: map_at_3 value: 43.126 - type: map_at_5 value: 44.429 - type: map_at_10 value: 45.377 - type: map_at_20 value: 45.821 - type: map_at_100 value: 46.139 - type: map_at_1000 value: 46.188 - type: recall_at_1 value: 36.835 - type: recall_at_3 value: 50.992000000000004 - type: recall_at_5 value: 56.693000000000005 - type: recall_at_10 value: 63.743 - type: recall_at_20 value: 70.194 - type: recall_at_100 value: 82.65299999999999 - type: recall_at_1000 value: 93.728 - type: precision_at_1 value: 36.835 - type: precision_at_3 value: 16.997 - type: precision_at_5 value: 11.339 - type: precision_at_10 value: 6.3740000000000006 - type: precision_at_20 value: 3.51 - type: precision_at_100 value: 0.827 - type: precision_at_1000 value: 0.094 - type: mrr_at_1 value: 36.8346 - type: mrr_at_3 value: 43.1259 - type: mrr_at_5 value: 44.4289 - type: mrr_at_10 value: 45.3769 - type: mrr_at_20 value: 45.8215 - type: mrr_at_100 value: 46.138600000000004 - type: mrr_at_1000 value: 46.1881 - type: nauc_ndcg_at_1_max value: 36.9844 - type: nauc_ndcg_at_1_std value: -3.2222 - type: nauc_ndcg_at_1_diff1 value: 58.896 - type: nauc_ndcg_at_3_max value: 37.6355 - type: nauc_ndcg_at_3_std value: -2.2689 - type: nauc_ndcg_at_3_diff1 value: 52.771100000000004 - type: nauc_ndcg_at_5_max value: 38.175599999999996 - type: nauc_ndcg_at_5_std value: -1.5131999999999999 - type: nauc_ndcg_at_5_diff1 value: 52.0101 - type: nauc_ndcg_at_10_max value: 38.2873 - type: nauc_ndcg_at_10_std value: -0.5444 - type: nauc_ndcg_at_10_diff1 value: 51.3992 - type: nauc_ndcg_at_20_max value: 38.324200000000005 - type: nauc_ndcg_at_20_std value: 0.1328 - type: nauc_ndcg_at_20_diff1 value: 51.2346 - type: nauc_ndcg_at_100_max value: 38.6313 - type: nauc_ndcg_at_100_std value: 0.9426 - type: nauc_ndcg_at_100_diff1 value: 51.65729999999999 - type: nauc_ndcg_at_1000_max value: 38.6274 - type: nauc_ndcg_at_1000_std value: 0.69 - type: nauc_ndcg_at_1000_diff1 value: 52.1029 - type: nauc_map_at_1_max value: 36.9844 - type: 
nauc_map_at_1_std value: -3.2222 - type: nauc_map_at_1_diff1 value: 58.896 - type: nauc_map_at_3_max value: 37.523 - type: nauc_map_at_3_std value: -2.5115 - type: nauc_map_at_3_diff1 value: 54.17960000000001 - type: nauc_map_at_5_max value: 37.8191 - type: nauc_map_at_5_std value: -2.1073 - type: nauc_map_at_5_diff1 value: 53.780499999999996 - type: nauc_map_at_10_max value: 37.8581 - type: nauc_map_at_10_std value: -1.7191999999999998 - type: nauc_map_at_10_diff1 value: 53.541700000000006 - type: nauc_map_at_20_max value: 37.8684 - type: nauc_map_at_20_std value: -1.5565 - type: nauc_map_at_20_diff1 value: 53.5155 - type: nauc_map_at_100_max value: 37.9101 - type: nauc_map_at_100_std value: -1.4577 - type: nauc_map_at_100_diff1 value: 53.5894 - type: nauc_map_at_1000_max value: 37.9109 - type: nauc_map_at_1000_std value: -1.4617 - type: nauc_map_at_1000_diff1 value: 53.6044 - type: nauc_recall_at_1_max value: 36.9844 - type: nauc_recall_at_1_std value: -3.2222 - type: nauc_recall_at_1_diff1 value: 58.896 - type: nauc_recall_at_3_max value: 37.9468 - type: nauc_recall_at_3_std value: -1.5512 - type: nauc_recall_at_3_diff1 value: 48.6655 - type: nauc_recall_at_5_max value: 39.3342 - type: nauc_recall_at_5_std value: 0.44739999999999996 - type: nauc_recall_at_5_diff1 value: 46.475100000000005 - type: nauc_recall_at_10_max value: 39.8619 - type: nauc_recall_at_10_std value: 4.0042 - type: nauc_recall_at_10_diff1 value: 43.8251 - type: nauc_recall_at_20_max value: 40.226299999999995 - type: nauc_recall_at_20_std value: 8.052299999999999 - type: nauc_recall_at_20_diff1 value: 41.937400000000004 - type: nauc_recall_at_100_max value: 44.221 - type: nauc_recall_at_100_std value: 20.433699999999998 - type: nauc_recall_at_100_diff1 value: 40.745599999999996 - type: nauc_recall_at_1000_max value: 52.6045 - type: nauc_recall_at_1000_std value: 40.3497 - type: nauc_recall_at_1000_diff1 value: 40.248 - type: nauc_precision_at_1_max value: 36.9844 - type: nauc_precision_at_1_std value: -3.2222 - type: nauc_precision_at_1_diff1 value: 58.896 - type: nauc_precision_at_3_max value: 37.9468 - type: nauc_precision_at_3_std value: -1.5512 - type: nauc_precision_at_3_diff1 value: 48.6655 - type: nauc_precision_at_5_max value: 39.3342 - type: nauc_precision_at_5_std value: 0.44739999999999996 - type: nauc_precision_at_5_diff1 value: 46.475100000000005 - type: nauc_precision_at_10_max value: 39.8619 - type: nauc_precision_at_10_std value: 4.0042 - type: nauc_precision_at_10_diff1 value: 43.8251 - type: nauc_precision_at_20_max value: 40.226299999999995 - type: nauc_precision_at_20_std value: 8.052299999999999 - type: nauc_precision_at_20_diff1 value: 41.937400000000004 - type: nauc_precision_at_100_max value: 44.221 - type: nauc_precision_at_100_std value: 20.433699999999998 - type: nauc_precision_at_100_diff1 value: 40.745599999999996 - type: nauc_precision_at_1000_max value: 52.6045 - type: nauc_precision_at_1000_std value: 40.3497 - type: nauc_precision_at_1000_diff1 value: 40.248 - type: nauc_mrr_at_1_max value: 36.9844 - type: nauc_mrr_at_1_std value: -3.2222 - type: nauc_mrr_at_1_diff1 value: 58.896 - type: nauc_mrr_at_3_max value: 37.523 - type: nauc_mrr_at_3_std value: -2.5115 - type: nauc_mrr_at_3_diff1 value: 54.17960000000001 - type: nauc_mrr_at_5_max value: 37.8191 - type: nauc_mrr_at_5_std value: -2.1073 - type: nauc_mrr_at_5_diff1 value: 53.780499999999996 - type: nauc_mrr_at_10_max value: 37.8581 - type: nauc_mrr_at_10_std value: -1.7191999999999998 - type: nauc_mrr_at_10_diff1 value: 
53.541700000000006 - type: nauc_mrr_at_20_max value: 37.8684 - type: nauc_mrr_at_20_std value: -1.5565 - type: nauc_mrr_at_20_diff1 value: 53.5155 - type: nauc_mrr_at_100_max value: 37.9101 - type: nauc_mrr_at_100_std value: -1.4577 - type: nauc_mrr_at_100_diff1 value: 53.5894 - type: nauc_mrr_at_1000_max value: 37.9109 - type: nauc_mrr_at_1000_std value: -1.4617 - type: nauc_mrr_at_1000_diff1 value: 53.6044 - type: main_score value: 49.784 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackAndroidRetrieval (default) revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: mteb/cqadupstack-android metrics: - type: ndcg_at_1 value: 44.206 - type: ndcg_at_3 value: 49.364999999999995 - type: ndcg_at_5 value: 51.429 - type: ndcg_at_10 value: 54.106 - type: ndcg_at_20 value: 56.271 - type: ndcg_at_100 value: 59.33500000000001 - type: ndcg_at_1000 value: 61.015 - type: map_at_1 value: 35.797000000000004 - type: map_at_3 value: 44.137 - type: map_at_5 value: 46.062999999999995 - type: map_at_10 value: 47.793 - type: map_at_20 value: 48.730000000000004 - type: map_at_100 value: 49.422 - type: map_at_1000 value: 49.546 - type: recall_at_1 value: 35.797000000000004 - type: recall_at_3 value: 51.224000000000004 - type: recall_at_5 value: 57.218999999999994 - type: recall_at_10 value: 65.182 - type: recall_at_20 value: 72.76700000000001 - type: recall_at_100 value: 86.654 - type: recall_at_1000 value: 97.131 - type: precision_at_1 value: 44.206 - type: precision_at_3 value: 23.653 - type: precision_at_5 value: 16.91 - type: precision_at_10 value: 10.443 - type: precision_at_20 value: 6.194999999999999 - type: precision_at_100 value: 1.6310000000000002 - type: precision_at_1000 value: 0.214 - type: mrr_at_1 value: 44.206 - type: mrr_at_3 value: 51.430600000000005 - type: mrr_at_5 value: 52.839800000000004 - type: mrr_at_10 value: 53.808 - type: mrr_at_20 value: 54.2585 - type: mrr_at_100 value: 54.540200000000006 - type: mrr_at_1000 value: 54.577799999999996 - type: nauc_ndcg_at_1_max value: 45.573 - type: nauc_ndcg_at_1_std value: -5.092300000000001 - type: nauc_ndcg_at_1_diff1 value: 50.8011 - type: nauc_ndcg_at_3_max value: 44.7194 - type: nauc_ndcg_at_3_std value: -2.979 - type: nauc_ndcg_at_3_diff1 value: 49.4014 - type: nauc_ndcg_at_5_max value: 45.9838 - type: nauc_ndcg_at_5_std value: -2.4417999999999997 - type: nauc_ndcg_at_5_diff1 value: 48.2985 - type: nauc_ndcg_at_10_max value: 45.6755 - type: nauc_ndcg_at_10_std value: -2.1826000000000003 - type: nauc_ndcg_at_10_diff1 value: 48.443799999999996 - type: nauc_ndcg_at_20_max value: 45.967200000000005 - type: nauc_ndcg_at_20_std value: -0.3553 - type: nauc_ndcg_at_20_diff1 value: 48.0216 - type: nauc_ndcg_at_100_max value: 46.3459 - type: nauc_ndcg_at_100_std value: 0.6947 - type: nauc_ndcg_at_100_diff1 value: 48.3313 - type: nauc_ndcg_at_1000_max value: 46.245599999999996 - type: nauc_ndcg_at_1000_std value: -0.3032 - type: nauc_ndcg_at_1000_diff1 value: 48.3821 - type: nauc_map_at_1_max value: 38.896 - type: nauc_map_at_1_std value: -5.7093 - type: nauc_map_at_1_diff1 value: 54.4608 - type: nauc_map_at_3_max value: 42.6164 - type: nauc_map_at_3_std value: -4.6751000000000005 - type: nauc_map_at_3_diff1 value: 52.23759999999999 - type: nauc_map_at_5_max value: 43.9491 - type: nauc_map_at_5_std value: -3.8674 - type: nauc_map_at_5_diff1 value: 51.03189999999999 - type: nauc_map_at_10_max value: 44.4192 - type: nauc_map_at_10_std value: -3.4564999999999997 - type: nauc_map_at_10_diff1 value: 50.6846 - type: 
nauc_map_at_20_max value: 44.8404 - type: nauc_map_at_20_std value: -2.67 - type: nauc_map_at_20_diff1 value: 50.3892 - type: nauc_map_at_100_max value: 44.9988 - type: nauc_map_at_100_std value: -2.4528000000000003 - type: nauc_map_at_100_diff1 value: 50.2602 - type: nauc_map_at_1000_max value: 45.0043 - type: nauc_map_at_1000_std value: -2.5084 - type: nauc_map_at_1000_diff1 value: 50.2302 - type: nauc_recall_at_1_max value: 38.896 - type: nauc_recall_at_1_std value: -5.7093 - type: nauc_recall_at_1_diff1 value: 54.4608 - type: nauc_recall_at_3_max value: 40.917500000000004 - type: nauc_recall_at_3_std value: -2.9875 - type: nauc_recall_at_3_diff1 value: 47.935 - type: nauc_recall_at_5_max value: 43.578 - type: nauc_recall_at_5_std value: -0.0832 - type: nauc_recall_at_5_diff1 value: 43.924800000000005 - type: nauc_recall_at_10_max value: 42.3348 - type: nauc_recall_at_10_std value: 1.2774 - type: nauc_recall_at_10_diff1 value: 42.5842 - type: nauc_recall_at_20_max value: 43.4429 - type: nauc_recall_at_20_std value: 9.6387 - type: nauc_recall_at_20_diff1 value: 40.1222 - type: nauc_recall_at_100_max value: 47.6245 - type: nauc_recall_at_100_std value: 28.7436 - type: nauc_recall_at_100_diff1 value: 42.3728 - type: nauc_recall_at_1000_max value: 57.4835 - type: nauc_recall_at_1000_std value: 66.6109 - type: nauc_recall_at_1000_diff1 value: 48.025 - type: nauc_precision_at_1_max value: 45.573 - type: nauc_precision_at_1_std value: -5.092300000000001 - type: nauc_precision_at_1_diff1 value: 50.8011 - type: nauc_precision_at_3_max value: 39.7982 - type: nauc_precision_at_3_std value: 1.3032 - type: nauc_precision_at_3_diff1 value: 26.422600000000003 - type: nauc_precision_at_5_max value: 36.86 - type: nauc_precision_at_5_std value: 3.9888 - type: nauc_precision_at_5_diff1 value: 13.4191 - type: nauc_precision_at_10_max value: 26.663199999999996 - type: nauc_precision_at_10_std value: 6.388299999999999 - type: nauc_precision_at_10_diff1 value: 2.1197 - type: nauc_precision_at_20_max value: 19.8196 - type: nauc_precision_at_20_std value: 9.0818 - type: nauc_precision_at_20_diff1 value: -6.483999999999999 - type: nauc_precision_at_100_max value: 5.6951 - type: nauc_precision_at_100_std value: 5.3285 - type: nauc_precision_at_100_diff1 value: -17.9036 - type: nauc_precision_at_1000_max value: -9.107999999999999 - type: nauc_precision_at_1000_std value: -7.5626999999999995 - type: nauc_precision_at_1000_diff1 value: -27.7189 - type: nauc_mrr_at_1_max value: 45.573 - type: nauc_mrr_at_1_std value: -5.092300000000001 - type: nauc_mrr_at_1_diff1 value: 50.8011 - type: nauc_mrr_at_3_max value: 46.394800000000004 - type: nauc_mrr_at_3_std value: -3.6457 - type: nauc_mrr_at_3_diff1 value: 48.8878 - type: nauc_mrr_at_5_max value: 46.7342 - type: nauc_mrr_at_5_std value: -3.2079999999999997 - type: nauc_mrr_at_5_diff1 value: 47.9827 - type: nauc_mrr_at_10_max value: 46.4047 - type: nauc_mrr_at_10_std value: -2.9571 - type: nauc_mrr_at_10_diff1 value: 48.036 - type: nauc_mrr_at_20_max value: 46.3645 - type: nauc_mrr_at_20_std value: -2.6208 - type: nauc_mrr_at_20_diff1 value: 48.030699999999996 - type: nauc_mrr_at_100_max value: 46.3951 - type: nauc_mrr_at_100_std value: -2.693 - type: nauc_mrr_at_100_diff1 value: 48.128 - type: nauc_mrr_at_1000_max value: 46.403299999999994 - type: nauc_mrr_at_1000_std value: -2.7043999999999997 - type: nauc_mrr_at_1000_diff1 value: 48.1413 - type: main_score value: 54.106 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval (default) 
revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: mteb/cqadupstack-english metrics: - type: ndcg_at_1 value: 41.274 - type: ndcg_at_3 value: 46.022999999999996 - type: ndcg_at_5 value: 47.882999999999996 - type: ndcg_at_10 value: 50.251000000000005 - type: ndcg_at_20 value: 51.93 - type: ndcg_at_100 value: 54.725 - type: ndcg_at_1000 value: 56.635000000000005 - type: map_at_1 value: 32.748 - type: map_at_3 value: 40.916000000000004 - type: map_at_5 value: 42.620999999999995 - type: map_at_10 value: 44.138 - type: map_at_20 value: 44.911 - type: map_at_100 value: 45.565 - type: map_at_1000 value: 45.698 - type: recall_at_1 value: 32.748 - type: recall_at_3 value: 47.522999999999996 - type: recall_at_5 value: 52.957 - type: recall_at_10 value: 60.321999999999996 - type: recall_at_20 value: 66.506 - type: recall_at_100 value: 79.669 - type: recall_at_1000 value: 91.73 - type: precision_at_1 value: 41.274 - type: precision_at_3 value: 22.718 - type: precision_at_5 value: 16.064 - type: precision_at_10 value: 9.828000000000001 - type: precision_at_20 value: 5.783 - type: precision_at_100 value: 1.5730000000000002 - type: precision_at_1000 value: 0.202 - type: mrr_at_1 value: 41.273900000000005 - type: mrr_at_3 value: 48.2378 - type: mrr_at_5 value: 49.5626 - type: mrr_at_10 value: 50.459900000000005 - type: mrr_at_20 value: 50.805 - type: mrr_at_100 value: 51.069900000000004 - type: mrr_at_1000 value: 51.1088 - type: nauc_ndcg_at_1_max value: 44.7657 - type: nauc_ndcg_at_1_std value: 3.7028 - type: nauc_ndcg_at_1_diff1 value: 52.017199999999995 - type: nauc_ndcg_at_3_max value: 45.2602 - type: nauc_ndcg_at_3_std value: 3.9891 - type: nauc_ndcg_at_3_diff1 value: 48.9746 - type: nauc_ndcg_at_5_max value: 45.0766 - type: nauc_ndcg_at_5_std value: 4.1764 - type: nauc_ndcg_at_5_diff1 value: 48.5708 - type: nauc_ndcg_at_10_max value: 45.0325 - type: nauc_ndcg_at_10_std value: 4.8281 - type: nauc_ndcg_at_10_diff1 value: 47.6424 - type: nauc_ndcg_at_20_max value: 45.2904 - type: nauc_ndcg_at_20_std value: 5.739 - type: nauc_ndcg_at_20_diff1 value: 47.7781 - type: nauc_ndcg_at_100_max value: 45.6547 - type: nauc_ndcg_at_100_std value: 7.6744 - type: nauc_ndcg_at_100_diff1 value: 47.2483 - type: nauc_ndcg_at_1000_max value: 45.5879 - type: nauc_ndcg_at_1000_std value: 7.919 - type: nauc_ndcg_at_1000_diff1 value: 47.172799999999995 - type: nauc_map_at_1_max value: 35.7481 - type: nauc_map_at_1_std value: -6.451 - type: nauc_map_at_1_diff1 value: 55.3994 - type: nauc_map_at_3_max value: 41.4679 - type: nauc_map_at_3_std value: -2.2265 - type: nauc_map_at_3_diff1 value: 51.9234 - type: nauc_map_at_5_max value: 42.2532 - type: nauc_map_at_5_std value: -0.9950000000000001 - type: nauc_map_at_5_diff1 value: 51.172200000000004 - type: nauc_map_at_10_max value: 43.0496 - type: nauc_map_at_10_std value: 0.3319 - type: nauc_map_at_10_diff1 value: 50.3961 - type: nauc_map_at_20_max value: 43.6286 - type: nauc_map_at_20_std value: 1.2991000000000001 - type: nauc_map_at_20_diff1 value: 50.2938 - type: nauc_map_at_100_max value: 43.906800000000004 - type: nauc_map_at_100_std value: 2.1626 - type: nauc_map_at_100_diff1 value: 50.1124 - type: nauc_map_at_1000_max value: 43.9529 - type: nauc_map_at_1000_std value: 2.309 - type: nauc_map_at_1000_diff1 value: 50.0859 - type: nauc_recall_at_1_max value: 35.7481 - type: nauc_recall_at_1_std value: -6.451 - type: nauc_recall_at_1_diff1 value: 55.3994 - type: nauc_recall_at_3_max value: 40.739 - type: nauc_recall_at_3_std value: -0.9688 - type: 
nauc_recall_at_3_diff1 value: 47.1898 - type: nauc_recall_at_5_max value: 41.494 - type: nauc_recall_at_5_std value: 2.1174 - type: nauc_recall_at_5_diff1 value: 44.5816 - type: nauc_recall_at_10_max value: 41.739 - type: nauc_recall_at_10_std value: 5.7603 - type: nauc_recall_at_10_diff1 value: 39.9929 - type: nauc_recall_at_20_max value: 42.9217 - type: nauc_recall_at_20_std value: 10.6088 - type: nauc_recall_at_20_diff1 value: 39.1455 - type: nauc_recall_at_100_max value: 45.1375 - type: nauc_recall_at_100_std value: 25.986700000000003 - type: nauc_recall_at_100_diff1 value: 33.972 - type: nauc_recall_at_1000_max value: 46.050200000000004 - type: nauc_recall_at_1000_std value: 44.597300000000004 - type: nauc_recall_at_1000_diff1 value: 26.326100000000004 - type: nauc_precision_at_1_max value: 44.7657 - type: nauc_precision_at_1_std value: 3.7028 - type: nauc_precision_at_1_diff1 value: 52.017199999999995 - type: nauc_precision_at_3_max value: 44.291799999999995 - type: nauc_precision_at_3_std value: 18.334500000000002 - type: nauc_precision_at_3_diff1 value: 25.625500000000002 - type: nauc_precision_at_5_max value: 40.8025 - type: nauc_precision_at_5_std value: 23.6687 - type: nauc_precision_at_5_diff1 value: 16.6574 - type: nauc_precision_at_10_max value: 35.7196 - type: nauc_precision_at_10_std value: 29.852099999999997 - type: nauc_precision_at_10_diff1 value: 5.6891 - type: nauc_precision_at_20_max value: 30.119 - type: nauc_precision_at_20_std value: 33.204 - type: nauc_precision_at_20_diff1 value: -0.23509999999999998 - type: nauc_precision_at_100_max value: 18.7797 - type: nauc_precision_at_100_std value: 38.9405 - type: nauc_precision_at_100_diff1 value: -10.8005 - type: nauc_precision_at_1000_max value: 9.0466 - type: nauc_precision_at_1000_std value: 35.3392 - type: nauc_precision_at_1000_diff1 value: -16.3137 - type: nauc_mrr_at_1_max value: 44.7657 - type: nauc_mrr_at_1_std value: 3.7028 - type: nauc_mrr_at_1_diff1 value: 52.017199999999995 - type: nauc_mrr_at_3_max value: 45.8134 - type: nauc_mrr_at_3_std value: 5.6788 - type: nauc_mrr_at_3_diff1 value: 48.666199999999996 - type: nauc_mrr_at_5_max value: 45.8823 - type: nauc_mrr_at_5_std value: 6.4417 - type: nauc_mrr_at_5_diff1 value: 48.1545 - type: nauc_mrr_at_10_max value: 45.813500000000005 - type: nauc_mrr_at_10_std value: 6.7535 - type: nauc_mrr_at_10_diff1 value: 47.726400000000005 - type: nauc_mrr_at_20_max value: 45.792500000000004 - type: nauc_mrr_at_20_std value: 6.8521 - type: nauc_mrr_at_20_diff1 value: 47.7553 - type: nauc_mrr_at_100_max value: 45.8482 - type: nauc_mrr_at_100_std value: 6.979399999999999 - type: nauc_mrr_at_100_diff1 value: 47.7743 - type: nauc_mrr_at_1000_max value: 45.8456 - type: nauc_mrr_at_1000_std value: 6.9712 - type: nauc_mrr_at_1000_diff1 value: 47.7803 - type: main_score value: 50.251000000000005 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval (default) revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: mteb/cqadupstack-gaming metrics: - type: ndcg_at_1 value: 47.147 - type: ndcg_at_3 value: 53.969 - type: ndcg_at_5 value: 56.743 - type: ndcg_at_10 value: 59.318000000000005 - type: ndcg_at_20 value: 60.897999999999996 - type: ndcg_at_100 value: 62.971999999999994 - type: ndcg_at_1000 value: 64.033 - type: map_at_1 value: 41.126000000000005 - type: map_at_3 value: 50.388999999999996 - type: map_at_5 value: 52.286 - type: map_at_10 value: 53.661 - type: map_at_20 value: 54.228 - type: map_at_100 value: 54.588 - type: map_at_1000 
value: 54.638 - type: recall_at_1 value: 41.126000000000005 - type: recall_at_3 value: 58.374 - type: recall_at_5 value: 65.226 - type: recall_at_10 value: 72.69099999999999 - type: recall_at_20 value: 78.62 - type: recall_at_100 value: 88.69200000000001 - type: recall_at_1000 value: 96.232 - type: precision_at_1 value: 47.147 - type: precision_at_3 value: 24.159 - type: precision_at_5 value: 16.577 - type: precision_at_10 value: 9.549000000000001 - type: precision_at_20 value: 5.276 - type: precision_at_100 value: 1.224 - type: precision_at_1000 value: 0.135 - type: mrr_at_1 value: 47.147299999999994 - type: mrr_at_3 value: 54.4305 - type: mrr_at_5 value: 55.95719999999999 - type: mrr_at_10 value: 56.8499 - type: mrr_at_20 value: 57.230000000000004 - type: mrr_at_100 value: 57.4584 - type: mrr_at_1000 value: 57.4867 - type: nauc_ndcg_at_1_max value: 43.5129 - type: nauc_ndcg_at_1_std value: -3.5116 - type: nauc_ndcg_at_1_diff1 value: 52.717000000000006 - type: nauc_ndcg_at_3_max value: 43.6514 - type: nauc_ndcg_at_3_std value: -3.7903 - type: nauc_ndcg_at_3_diff1 value: 48.7913 - type: nauc_ndcg_at_5_max value: 44.465700000000005 - type: nauc_ndcg_at_5_std value: -3.3794999999999997 - type: nauc_ndcg_at_5_diff1 value: 48.8527 - type: nauc_ndcg_at_10_max value: 46.0891 - type: nauc_ndcg_at_10_std value: -0.5534 - type: nauc_ndcg_at_10_diff1 value: 48.857099999999996 - type: nauc_ndcg_at_20_max value: 46.1334 - type: nauc_ndcg_at_20_std value: 0.2072 - type: nauc_ndcg_at_20_diff1 value: 48.8269 - type: nauc_ndcg_at_100_max value: 46.2793 - type: nauc_ndcg_at_100_std value: 1.2965 - type: nauc_ndcg_at_100_diff1 value: 48.6421 - type: nauc_ndcg_at_1000_max value: 46.1606 - type: nauc_ndcg_at_1000_std value: 0.5259 - type: nauc_ndcg_at_1000_diff1 value: 48.9864 - type: nauc_map_at_1_max value: 36.4337 - type: nauc_map_at_1_std value: -5.6848 - type: nauc_map_at_1_diff1 value: 53.42360000000001 - type: nauc_map_at_3_max value: 41.6669 - type: nauc_map_at_3_std value: -5.6545 - type: nauc_map_at_3_diff1 value: 49.6128 - type: nauc_map_at_5_max value: 42.6809 - type: nauc_map_at_5_std value: -4.9988 - type: nauc_map_at_5_diff1 value: 49.645 - type: nauc_map_at_10_max value: 43.7393 - type: nauc_map_at_10_std value: -3.3649 - type: nauc_map_at_10_diff1 value: 49.574 - type: nauc_map_at_20_max value: 43.9855 - type: nauc_map_at_20_std value: -2.8590999999999998 - type: nauc_map_at_20_diff1 value: 49.5139 - type: nauc_map_at_100_max value: 44.0978 - type: nauc_map_at_100_std value: -2.604 - type: nauc_map_at_100_diff1 value: 49.4857 - type: nauc_map_at_1000_max value: 44.114399999999996 - type: nauc_map_at_1000_std value: -2.6081 - type: nauc_map_at_1000_diff1 value: 49.508799999999994 - type: nauc_recall_at_1_max value: 36.4337 - type: nauc_recall_at_1_std value: -5.6848 - type: nauc_recall_at_1_diff1 value: 53.42360000000001 - type: nauc_recall_at_3_max value: 41.320299999999996 - type: nauc_recall_at_3_std value: -5.7135 - type: nauc_recall_at_3_diff1 value: 45.0436 - type: nauc_recall_at_5_max value: 43.1656 - type: nauc_recall_at_5_std value: -3.8888 - type: nauc_recall_at_5_diff1 value: 44.3304 - type: nauc_recall_at_10_max value: 48.9816 - type: nauc_recall_at_10_std value: 5.9506000000000006 - type: nauc_recall_at_10_diff1 value: 43.9217 - type: nauc_recall_at_20_max value: 50.5525 - type: nauc_recall_at_20_std value: 11.8017 - type: nauc_recall_at_20_diff1 value: 43.4987 - type: nauc_recall_at_100_max value: 54.654 - type: nauc_recall_at_100_std value: 31.634800000000002 - type: 
nauc_recall_at_100_diff1 value: 38.7139 - type: nauc_recall_at_1000_max value: 62.253 - type: nauc_recall_at_1000_std value: 42.6522 - type: nauc_recall_at_1000_diff1 value: 38.3715 - type: nauc_precision_at_1_max value: 43.5129 - type: nauc_precision_at_1_std value: -3.5116 - type: nauc_precision_at_1_diff1 value: 52.717000000000006 - type: nauc_precision_at_3_max value: 41.983399999999996 - type: nauc_precision_at_3_std value: 2.4643 - type: nauc_precision_at_3_diff1 value: 28.185 - type: nauc_precision_at_5_max value: 39.8061 - type: nauc_precision_at_5_std value: 6.4715 - type: nauc_precision_at_5_diff1 value: 21.333199999999998 - type: nauc_precision_at_10_max value: 37.914500000000004 - type: nauc_precision_at_10_std value: 17.1485 - type: nauc_precision_at_10_diff1 value: 12.6277 - type: nauc_precision_at_20_max value: 34.0432 - type: nauc_precision_at_20_std value: 23.0425 - type: nauc_precision_at_20_diff1 value: 5.551699999999999 - type: nauc_precision_at_100_max value: 26.0405 - type: nauc_precision_at_100_std value: 28.572599999999998 - type: nauc_precision_at_100_diff1 value: -4.2162 - type: nauc_precision_at_1000_max value: 20.176099999999998 - type: nauc_precision_at_1000_std value: 27.293499999999998 - type: nauc_precision_at_1000_diff1 value: -7.4514 - type: nauc_mrr_at_1_max value: 43.5129 - type: nauc_mrr_at_1_std value: -3.5116 - type: nauc_mrr_at_1_diff1 value: 52.717000000000006 - type: nauc_mrr_at_3_max value: 44.9785 - type: nauc_mrr_at_3_std value: -2.2618 - type: nauc_mrr_at_3_diff1 value: 49.8663 - type: nauc_mrr_at_5_max value: 45.1749 - type: nauc_mrr_at_5_std value: -2.1027 - type: nauc_mrr_at_5_diff1 value: 49.8332 - type: nauc_mrr_at_10_max value: 45.6015 - type: nauc_mrr_at_10_std value: -1.3832 - type: nauc_mrr_at_10_diff1 value: 49.9586 - type: nauc_mrr_at_20_max value: 45.535399999999996 - type: nauc_mrr_at_20_std value: -1.2799 - type: nauc_mrr_at_20_diff1 value: 49.9829 - type: nauc_mrr_at_100_max value: 45.5168 - type: nauc_mrr_at_100_std value: -1.2195 - type: nauc_mrr_at_100_diff1 value: 49.9728 - type: nauc_mrr_at_1000_max value: 45.5076 - type: nauc_mrr_at_1000_std value: -1.2494 - type: nauc_mrr_at_1000_diff1 value: 49.977 - type: main_score value: 59.318000000000005 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval (default) revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: mteb/cqadupstack-gis metrics: - type: ndcg_at_1 value: 30.734 - type: ndcg_at_3 value: 38.672000000000004 - type: ndcg_at_5 value: 40.954 - type: ndcg_at_10 value: 43.564 - type: ndcg_at_20 value: 45.48 - type: ndcg_at_100 value: 48.419000000000004 - type: ndcg_at_1000 value: 50.404 - type: map_at_1 value: 28.464 - type: map_at_3 value: 35.704 - type: map_at_5 value: 37.116 - type: map_at_10 value: 38.279999999999994 - type: map_at_20 value: 38.834 - type: map_at_100 value: 39.277 - type: map_at_1000 value: 39.355000000000004 - type: recall_at_1 value: 28.464 - type: recall_at_3 value: 44.588 - type: recall_at_5 value: 50.031000000000006 - type: recall_at_10 value: 57.621 - type: recall_at_20 value: 64.85499999999999 - type: recall_at_100 value: 79.66 - type: recall_at_1000 value: 94.633 - type: precision_at_1 value: 30.734 - type: precision_at_3 value: 16.497 - type: precision_at_5 value: 11.254 - type: precision_at_10 value: 6.633 - type: precision_at_20 value: 3.757 - type: precision_at_100 value: 0.9560000000000001 - type: precision_at_1000 value: 0.116 - type: mrr_at_1 value: 30.734499999999997 - type: mrr_at_3 value: 
38.1356 - type: mrr_at_5 value: 39.3616 - type: mrr_at_10 value: 40.4225 - type: mrr_at_20 value: 40.9334 - type: mrr_at_100 value: 41.297200000000004 - type: mrr_at_1000 value: 41.354600000000005 - type: nauc_ndcg_at_1_max value: 30.2094 - type: nauc_ndcg_at_1_std value: -6.9741 - type: nauc_ndcg_at_1_diff1 value: 47.5543 - type: nauc_ndcg_at_3_max value: 31.4334 - type: nauc_ndcg_at_3_std value: -4.7826 - type: nauc_ndcg_at_3_diff1 value: 41.1025 - type: nauc_ndcg_at_5_max value: 32.3557 - type: nauc_ndcg_at_5_std value: -4.1379 - type: nauc_ndcg_at_5_diff1 value: 40.81 - type: nauc_ndcg_at_10_max value: 32.3949 - type: nauc_ndcg_at_10_std value: -2.3524 - type: nauc_ndcg_at_10_diff1 value: 39.5175 - type: nauc_ndcg_at_20_max value: 31.680500000000002 - type: nauc_ndcg_at_20_std value: -1.7559000000000002 - type: nauc_ndcg_at_20_diff1 value: 38.1515 - type: nauc_ndcg_at_100_max value: 31.4167 - type: nauc_ndcg_at_100_std value: -1.0329 - type: nauc_ndcg_at_100_diff1 value: 37.8268 - type: nauc_ndcg_at_1000_max value: 31.736900000000002 - type: nauc_ndcg_at_1000_std value: -1.8415000000000001 - type: nauc_ndcg_at_1000_diff1 value: 39.0335 - type: nauc_map_at_1_max value: 28.260099999999998 - type: nauc_map_at_1_std value: -9.0806 - type: nauc_map_at_1_diff1 value: 47.6706 - type: nauc_map_at_3_max value: 30.551000000000002 - type: nauc_map_at_3_std value: -6.0257 - type: nauc_map_at_3_diff1 value: 42.8155 - type: nauc_map_at_5_max value: 31.285800000000002 - type: nauc_map_at_5_std value: -5.671600000000001 - type: nauc_map_at_5_diff1 value: 42.5887 - type: nauc_map_at_10_max value: 31.329800000000002 - type: nauc_map_at_10_std value: -4.8092999999999995 - type: nauc_map_at_10_diff1 value: 41.9856 - type: nauc_map_at_20_max value: 31.2046 - type: nauc_map_at_20_std value: -4.612 - type: nauc_map_at_20_diff1 value: 41.658699999999996 - type: nauc_map_at_100_max value: 31.181399999999996 - type: nauc_map_at_100_std value: -4.4687 - type: nauc_map_at_100_diff1 value: 41.5836 - type: nauc_map_at_1000_max value: 31.1979 - type: nauc_map_at_1000_std value: -4.4772 - type: nauc_map_at_1000_diff1 value: 41.627900000000004 - type: nauc_recall_at_1_max value: 28.260099999999998 - type: nauc_recall_at_1_std value: -9.0806 - type: nauc_recall_at_1_diff1 value: 47.6706 - type: nauc_recall_at_3_max value: 31.129800000000003 - type: nauc_recall_at_3_std value: -3.2782 - type: nauc_recall_at_3_diff1 value: 35.4529 - type: nauc_recall_at_5_max value: 33.6541 - type: nauc_recall_at_5_std value: -1.7704999999999997 - type: nauc_recall_at_5_diff1 value: 34.9944 - type: nauc_recall_at_10_max value: 33.536100000000005 - type: nauc_recall_at_10_std value: 3.4567 - type: nauc_recall_at_10_diff1 value: 30.553599999999996 - type: nauc_recall_at_20_max value: 29.889100000000003 - type: nauc_recall_at_20_std value: 6.5926 - type: nauc_recall_at_20_diff1 value: 23.217 - type: nauc_recall_at_100_max value: 27.4646 - type: nauc_recall_at_100_std value: 15.746199999999998 - type: nauc_recall_at_100_diff1 value: 15.1327 - type: nauc_recall_at_1000_max value: 32.294200000000004 - type: nauc_recall_at_1000_std value: 21.6293 - type: nauc_recall_at_1000_diff1 value: 11.265600000000001 - type: nauc_precision_at_1_max value: 30.2094 - type: nauc_precision_at_1_std value: -6.9741 - type: nauc_precision_at_1_diff1 value: 47.5543 - type: nauc_precision_at_3_max value: 34.3053 - type: nauc_precision_at_3_std value: 0.42760000000000004 - type: nauc_precision_at_3_diff1 value: 33.4827 - type: nauc_precision_at_5_max value: 35.4035 
- type: nauc_precision_at_5_std value: 2.3141 - type: nauc_precision_at_5_diff1 value: 30.8004 - type: nauc_precision_at_10_max value: 33.4042 - type: nauc_precision_at_10_std value: 8.6847 - type: nauc_precision_at_10_diff1 value: 23.558200000000003 - type: nauc_precision_at_20_max value: 29.015200000000004 - type: nauc_precision_at_20_std value: 11.3556 - type: nauc_precision_at_20_diff1 value: 15.774099999999999 - type: nauc_precision_at_100_max value: 16.663700000000002 - type: nauc_precision_at_100_std value: 14.666100000000002 - type: nauc_precision_at_100_diff1 value: 2.1911 - type: nauc_precision_at_1000_max value: 7.348599999999999 - type: nauc_precision_at_1000_std value: 8.8804 - type: nauc_precision_at_1000_diff1 value: -7.026599999999999 - type: nauc_mrr_at_1_max value: 30.2094 - type: nauc_mrr_at_1_std value: -6.9741 - type: nauc_mrr_at_1_diff1 value: 47.5543 - type: nauc_mrr_at_3_max value: 31.831500000000002 - type: nauc_mrr_at_3_std value: -3.6407000000000003 - type: nauc_mrr_at_3_diff1 value: 42.445 - type: nauc_mrr_at_5_max value: 32.273 - type: nauc_mrr_at_5_std value: -3.5416000000000003 - type: nauc_mrr_at_5_diff1 value: 42.5464 - type: nauc_mrr_at_10_max value: 32.3297 - type: nauc_mrr_at_10_std value: -2.9149000000000003 - type: nauc_mrr_at_10_diff1 value: 42.0233 - type: nauc_mrr_at_20_max value: 32.124 - type: nauc_mrr_at_20_std value: -2.7826 - type: nauc_mrr_at_20_diff1 value: 41.652 - type: nauc_mrr_at_100_max value: 32.0994 - type: nauc_mrr_at_100_std value: -2.7182999999999997 - type: nauc_mrr_at_100_diff1 value: 41.6024 - type: nauc_mrr_at_1000_max value: 32.1058 - type: nauc_mrr_at_1000_std value: -2.7332 - type: nauc_mrr_at_1000_diff1 value: 41.652899999999995 - type: main_score value: 43.564 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval (default) revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: mteb/cqadupstack-mathematica metrics: - type: ndcg_at_1 value: 22.886 - type: ndcg_at_3 value: 27.864 - type: ndcg_at_5 value: 30.177 - type: ndcg_at_10 value: 32.749 - type: ndcg_at_20 value: 35.343 - type: ndcg_at_100 value: 39.095 - type: ndcg_at_1000 value: 41.656 - type: map_at_1 value: 18.119 - type: map_at_3 value: 24.340999999999998 - type: map_at_5 value: 25.861 - type: map_at_10 value: 27.055 - type: map_at_20 value: 27.855 - type: map_at_100 value: 28.461 - type: map_at_1000 value: 28.577 - type: recall_at_1 value: 18.119 - type: recall_at_3 value: 31.633 - type: recall_at_5 value: 37.532 - type: recall_at_10 value: 44.983000000000004 - type: recall_at_20 value: 54.234 - type: recall_at_100 value: 72.396 - type: recall_at_1000 value: 90.223 - type: precision_at_1 value: 22.886 - type: precision_at_3 value: 13.682 - type: precision_at_5 value: 9.950000000000001 - type: precision_at_10 value: 6.1690000000000005 - type: precision_at_20 value: 3.8120000000000003 - type: precision_at_100 value: 1.0699999999999998 - type: precision_at_1000 value: 0.14300000000000002 - type: mrr_at_1 value: 22.8856 - type: mrr_at_3 value: 29.6642 - type: mrr_at_5 value: 31.107000000000003 - type: mrr_at_10 value: 32.2342 - type: mrr_at_20 value: 32.8971 - type: mrr_at_100 value: 33.2804 - type: mrr_at_1000 value: 33.3395 - type: nauc_ndcg_at_1_max value: 24.8022 - type: nauc_ndcg_at_1_std value: -0.5363 - type: nauc_ndcg_at_1_diff1 value: 33.1639 - type: nauc_ndcg_at_3_max value: 22.0142 - type: nauc_ndcg_at_3_std value: 0.9467 - type: nauc_ndcg_at_3_diff1 value: 28.9545 - type: nauc_ndcg_at_5_max value: 21.9949 
- type: nauc_ndcg_at_5_std value: 2.2558000000000002 - type: nauc_ndcg_at_5_diff1 value: 27.4516 - type: nauc_ndcg_at_10_max value: 21.5958 - type: nauc_ndcg_at_10_std value: 3.5044 - type: nauc_ndcg_at_10_diff1 value: 26.9835 - type: nauc_ndcg_at_20_max value: 21.940299999999997 - type: nauc_ndcg_at_20_std value: 4.6913 - type: nauc_ndcg_at_20_diff1 value: 26.8386 - type: nauc_ndcg_at_100_max value: 22.4749 - type: nauc_ndcg_at_100_std value: 6.1636999999999995 - type: nauc_ndcg_at_100_diff1 value: 27.4132 - type: nauc_ndcg_at_1000_max value: 23.034299999999998 - type: nauc_ndcg_at_1000_std value: 5.7944 - type: nauc_ndcg_at_1000_diff1 value: 27.3963 - type: nauc_map_at_1_max value: 21.4135 - type: nauc_map_at_1_std value: 0.649 - type: nauc_map_at_1_diff1 value: 32.1954 - type: nauc_map_at_3_max value: 20.8778 - type: nauc_map_at_3_std value: 1.0705 - type: nauc_map_at_3_diff1 value: 28.5319 - type: nauc_map_at_5_max value: 21.0234 - type: nauc_map_at_5_std value: 1.5574 - type: nauc_map_at_5_diff1 value: 27.996399999999998 - type: nauc_map_at_10_max value: 20.9927 - type: nauc_map_at_10_std value: 2.2451 - type: nauc_map_at_10_diff1 value: 27.8283 - type: nauc_map_at_20_max value: 21.16 - type: nauc_map_at_20_std value: 2.6176999999999997 - type: nauc_map_at_20_diff1 value: 27.7722 - type: nauc_map_at_100_max value: 21.3551 - type: nauc_map_at_100_std value: 2.8299000000000003 - type: nauc_map_at_100_diff1 value: 27.8752 - type: nauc_map_at_1000_max value: 21.3871 - type: nauc_map_at_1000_std value: 2.7986 - type: nauc_map_at_1000_diff1 value: 27.8709 - type: nauc_recall_at_1_max value: 21.4135 - type: nauc_recall_at_1_std value: 0.649 - type: nauc_recall_at_1_diff1 value: 32.1954 - type: nauc_recall_at_3_max value: 19.3537 - type: nauc_recall_at_3_std value: 1.4591 - type: nauc_recall_at_3_diff1 value: 25.1911 - type: nauc_recall_at_5_max value: 19.6154 - type: nauc_recall_at_5_std value: 3.5305000000000004 - type: nauc_recall_at_5_diff1 value: 22.6218 - type: nauc_recall_at_10_max value: 18.3048 - type: nauc_recall_at_10_std value: 6.1244 - type: nauc_recall_at_10_diff1 value: 21.6834 - type: nauc_recall_at_20_max value: 18.4913 - type: nauc_recall_at_20_std value: 10.083599999999999 - type: nauc_recall_at_20_diff1 value: 20.502200000000002 - type: nauc_recall_at_100_max value: 19.0212 - type: nauc_recall_at_100_std value: 21.8101 - type: nauc_recall_at_100_diff1 value: 21.2653 - type: nauc_recall_at_1000_max value: 29.3582 - type: nauc_recall_at_1000_std value: 42.8902 - type: nauc_recall_at_1000_diff1 value: 14.060900000000002 - type: nauc_precision_at_1_max value: 24.8022 - type: nauc_precision_at_1_std value: -0.5363 - type: nauc_precision_at_1_diff1 value: 33.1639 - type: nauc_precision_at_3_max value: 23.9746 - type: nauc_precision_at_3_std value: 0.9273999999999999 - type: nauc_precision_at_3_diff1 value: 26.0507 - type: nauc_precision_at_5_max value: 23.5487 - type: nauc_precision_at_5_std value: 2.8788 - type: nauc_precision_at_5_diff1 value: 22.439799999999998 - type: nauc_precision_at_10_max value: 21.826999999999998 - type: nauc_precision_at_10_std value: 5.6201 - type: nauc_precision_at_10_diff1 value: 19.8703 - type: nauc_precision_at_20_max value: 21.199399999999997 - type: nauc_precision_at_20_std value: 8.9305 - type: nauc_precision_at_20_diff1 value: 18.043 - type: nauc_precision_at_100_max value: 17.2345 - type: nauc_precision_at_100_std value: 10.0714 - type: nauc_precision_at_100_diff1 value: 14.521999999999998 - type: nauc_precision_at_1000_max value: 7.5709 - 
type: nauc_precision_at_1000_std value: 0.2689 - type: nauc_precision_at_1000_diff1 value: 4.4733 - type: nauc_mrr_at_1_max value: 24.8022 - type: nauc_mrr_at_1_std value: -0.5363 - type: nauc_mrr_at_1_diff1 value: 33.1639 - type: nauc_mrr_at_3_max value: 24.435499999999998 - type: nauc_mrr_at_3_std value: 0.9502999999999999 - type: nauc_mrr_at_3_diff1 value: 30.7875 - type: nauc_mrr_at_5_max value: 24.7103 - type: nauc_mrr_at_5_std value: 1.8724999999999998 - type: nauc_mrr_at_5_diff1 value: 30.086000000000002 - type: nauc_mrr_at_10_max value: 24.5685 - type: nauc_mrr_at_10_std value: 2.1533 - type: nauc_mrr_at_10_diff1 value: 29.862899999999996 - type: nauc_mrr_at_20_max value: 24.662100000000002 - type: nauc_mrr_at_20_std value: 2.3742 - type: nauc_mrr_at_20_diff1 value: 29.751300000000004 - type: nauc_mrr_at_100_max value: 24.635099999999998 - type: nauc_mrr_at_100_std value: 2.4393000000000002 - type: nauc_mrr_at_100_diff1 value: 29.741 - type: nauc_mrr_at_1000_max value: 24.651699999999998 - type: nauc_mrr_at_1000_std value: 2.4291 - type: nauc_mrr_at_1000_diff1 value: 29.7639 - type: main_score value: 32.749 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval (default) revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: mteb/cqadupstack-physics metrics: - type: ndcg_at_1 value: 38.114 - type: ndcg_at_3 value: 42.986000000000004 - type: ndcg_at_5 value: 45.893 - type: ndcg_at_10 value: 48.339999999999996 - type: ndcg_at_20 value: 50.617000000000004 - type: ndcg_at_100 value: 53.861000000000004 - type: ndcg_at_1000 value: 55.701 - type: map_at_1 value: 30.517 - type: map_at_3 value: 38.443 - type: map_at_5 value: 40.685 - type: map_at_10 value: 42.031 - type: map_at_20 value: 42.79 - type: map_at_100 value: 43.415 - type: map_at_1000 value: 43.525000000000006 - type: recall_at_1 value: 30.517 - type: recall_at_3 value: 46.015 - type: recall_at_5 value: 53.801 - type: recall_at_10 value: 61.332 - type: recall_at_20 value: 69.274 - type: recall_at_100 value: 84.051 - type: recall_at_1000 value: 95.826 - type: precision_at_1 value: 38.114 - type: precision_at_3 value: 20.821 - type: precision_at_5 value: 15.034 - type: precision_at_10 value: 8.892999999999999 - type: precision_at_20 value: 5.231 - type: precision_at_100 value: 1.375 - type: precision_at_1000 value: 0.172 - type: mrr_at_1 value: 38.1136 - type: mrr_at_3 value: 45.1716 - type: mrr_at_5 value: 46.8175 - type: mrr_at_10 value: 47.7831 - type: mrr_at_20 value: 48.329 - type: mrr_at_100 value: 48.6471 - type: mrr_at_1000 value: 48.6877 - type: nauc_ndcg_at_1_max value: 40.1541 - type: nauc_ndcg_at_1_std value: 1.4596 - type: nauc_ndcg_at_1_diff1 value: 56.6442 - type: nauc_ndcg_at_3_max value: 38.9776 - type: nauc_ndcg_at_3_std value: 1.464 - type: nauc_ndcg_at_3_diff1 value: 51.5596 - type: nauc_ndcg_at_5_max value: 38.8678 - type: nauc_ndcg_at_5_std value: 2.5537 - type: nauc_ndcg_at_5_diff1 value: 50.522 - type: nauc_ndcg_at_10_max value: 38.698100000000004 - type: nauc_ndcg_at_10_std value: 2.7959 - type: nauc_ndcg_at_10_diff1 value: 49.8331 - type: nauc_ndcg_at_20_max value: 39.7247 - type: nauc_ndcg_at_20_std value: 4.1737 - type: nauc_ndcg_at_20_diff1 value: 49.5233 - type: nauc_ndcg_at_100_max value: 40.649 - type: nauc_ndcg_at_100_std value: 5.7359 - type: nauc_ndcg_at_100_diff1 value: 50.0626 - type: nauc_ndcg_at_1000_max value: 40.765299999999996 - type: nauc_ndcg_at_1000_std value: 5.5551 - type: nauc_ndcg_at_1000_diff1 value: 50.3599 - type: nauc_map_at_1_max 
value: 35.659 - type: nauc_map_at_1_std value: -3.8913 - type: nauc_map_at_1_diff1 value: 57.7115 - type: nauc_map_at_3_max value: 37.3901 - type: nauc_map_at_3_std value: -0.88 - type: nauc_map_at_3_diff1 value: 52.9203 - type: nauc_map_at_5_max value: 38.0129 - type: nauc_map_at_5_std value: 0.1544 - type: nauc_map_at_5_diff1 value: 52.1596 - type: nauc_map_at_10_max value: 38.3708 - type: nauc_map_at_10_std value: 0.7947 - type: nauc_map_at_10_diff1 value: 51.909000000000006 - type: nauc_map_at_20_max value: 38.690200000000004 - type: nauc_map_at_20_std value: 1.2379 - type: nauc_map_at_20_diff1 value: 51.775000000000006 - type: nauc_map_at_100_max value: 38.9637 - type: nauc_map_at_100_std value: 1.5914000000000001 - type: nauc_map_at_100_diff1 value: 51.90820000000001 - type: nauc_map_at_1000_max value: 38.9784 - type: nauc_map_at_1000_std value: 1.6184 - type: nauc_map_at_1000_diff1 value: 51.909000000000006 - type: nauc_recall_at_1_max value: 35.659 - type: nauc_recall_at_1_std value: -3.8913 - type: nauc_recall_at_1_diff1 value: 57.7115 - type: nauc_recall_at_3_max value: 34.6073 - type: nauc_recall_at_3_std value: 0.0162 - type: nauc_recall_at_3_diff1 value: 47.0539 - type: nauc_recall_at_5_max value: 34.3868 - type: nauc_recall_at_5_std value: 3.1425 - type: nauc_recall_at_5_diff1 value: 43.1625 - type: nauc_recall_at_10_max value: 33.6467 - type: nauc_recall_at_10_std value: 4.1808 - type: nauc_recall_at_10_diff1 value: 39.711600000000004 - type: nauc_recall_at_20_max value: 36.3449 - type: nauc_recall_at_20_std value: 9.7358 - type: nauc_recall_at_20_diff1 value: 36.5764 - type: nauc_recall_at_100_max value: 40.563500000000005 - type: nauc_recall_at_100_std value: 23.5405 - type: nauc_recall_at_100_diff1 value: 34.2152 - type: nauc_recall_at_1000_max value: 57.387699999999995 - type: nauc_recall_at_1000_std value: 50.897999999999996 - type: nauc_recall_at_1000_diff1 value: 32.9321 - type: nauc_precision_at_1_max value: 40.1541 - type: nauc_precision_at_1_std value: 1.4596 - type: nauc_precision_at_1_diff1 value: 56.6442 - type: nauc_precision_at_3_max value: 36.586600000000004 - type: nauc_precision_at_3_std value: 9.7112 - type: nauc_precision_at_3_diff1 value: 33.8758 - type: nauc_precision_at_5_max value: 34.1914 - type: nauc_precision_at_5_std value: 13.7515 - type: nauc_precision_at_5_diff1 value: 24.6272 - type: nauc_precision_at_10_max value: 30.764999999999997 - type: nauc_precision_at_10_std value: 16.9823 - type: nauc_precision_at_10_diff1 value: 15.954799999999999 - type: nauc_precision_at_20_max value: 27.976699999999997 - type: nauc_precision_at_20_std value: 21.465999999999998 - type: nauc_precision_at_20_diff1 value: 7.0363999999999995 - type: nauc_precision_at_100_max value: 17.6394 - type: nauc_precision_at_100_std value: 23.4207 - type: nauc_precision_at_100_diff1 value: -4.0614 - type: nauc_precision_at_1000_max value: 3.8186999999999998 - type: nauc_precision_at_1000_std value: 16.0902 - type: nauc_precision_at_1000_diff1 value: -14.5093 - type: nauc_mrr_at_1_max value: 40.1541 - type: nauc_mrr_at_1_std value: 1.4596 - type: nauc_mrr_at_1_diff1 value: 56.6442 - type: nauc_mrr_at_3_max value: 40.4577 - type: nauc_mrr_at_3_std value: 3.558 - type: nauc_mrr_at_3_diff1 value: 53.0569 - type: nauc_mrr_at_5_max value: 40.6135 - type: nauc_mrr_at_5_std value: 4.3164 - type: nauc_mrr_at_5_diff1 value: 52.3585 - type: nauc_mrr_at_10_max value: 40.6563 - type: nauc_mrr_at_10_std value: 4.3038 - type: nauc_mrr_at_10_diff1 value: 52.2149 - type: nauc_mrr_at_20_max 
value: 40.914 - type: nauc_mrr_at_20_std value: 4.5423 - type: nauc_mrr_at_20_diff1 value: 52.2729 - type: nauc_mrr_at_100_max value: 40.8944 - type: nauc_mrr_at_100_std value: 4.546 - type: nauc_mrr_at_100_diff1 value: 52.315400000000004 - type: nauc_mrr_at_1000_max value: 40.893499999999996 - type: nauc_mrr_at_1000_std value: 4.5310999999999995 - type: nauc_mrr_at_1000_diff1 value: 52.337500000000006 - type: main_score value: 48.339999999999996 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval (default) revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: mteb/cqadupstack-programmers metrics: - type: ndcg_at_1 value: 34.247 - type: ndcg_at_3 value: 38.976 - type: ndcg_at_5 value: 41.332 - type: ndcg_at_10 value: 44.065 - type: ndcg_at_20 value: 46.312999999999995 - type: ndcg_at_100 value: 49.434 - type: ndcg_at_1000 value: 51.681999999999995 - type: map_at_1 value: 27.395999999999997 - type: map_at_3 value: 34.782999999999994 - type: map_at_5 value: 36.63 - type: map_at_10 value: 38.043 - type: map_at_20 value: 38.783 - type: map_at_100 value: 39.341 - type: map_at_1000 value: 39.454 - type: recall_at_1 value: 27.395999999999997 - type: recall_at_3 value: 41.785 - type: recall_at_5 value: 48.303000000000004 - type: recall_at_10 value: 56.481 - type: recall_at_20 value: 64.473 - type: recall_at_100 value: 79.012 - type: recall_at_1000 value: 94.182 - type: precision_at_1 value: 34.247 - type: precision_at_3 value: 18.759999999999998 - type: precision_at_5 value: 13.333 - type: precision_at_10 value: 8.059 - type: precision_at_20 value: 4.766 - type: precision_at_100 value: 1.258 - type: precision_at_1000 value: 0.16199999999999998 - type: mrr_at_1 value: 34.2466 - type: mrr_at_3 value: 41.172 - type: mrr_at_5 value: 42.701699999999995 - type: mrr_at_10 value: 43.6807 - type: mrr_at_20 value: 44.1991 - type: mrr_at_100 value: 44.5097 - type: mrr_at_1000 value: 44.5693 - type: nauc_ndcg_at_1_max value: 38.232 - type: nauc_ndcg_at_1_std value: 3.374 - type: nauc_ndcg_at_1_diff1 value: 51.223200000000006 - type: nauc_ndcg_at_3_max value: 38.839800000000004 - type: nauc_ndcg_at_3_std value: 6.529 - type: nauc_ndcg_at_3_diff1 value: 44.2371 - type: nauc_ndcg_at_5_max value: 39.0094 - type: nauc_ndcg_at_5_std value: 8.2202 - type: nauc_ndcg_at_5_diff1 value: 44.8305 - type: nauc_ndcg_at_10_max value: 40.1918 - type: nauc_ndcg_at_10_std value: 9.9826 - type: nauc_ndcg_at_10_diff1 value: 43.5034 - type: nauc_ndcg_at_20_max value: 40.7846 - type: nauc_ndcg_at_20_std value: 11.0178 - type: nauc_ndcg_at_20_diff1 value: 43.176199999999994 - type: nauc_ndcg_at_100_max value: 40.5507 - type: nauc_ndcg_at_100_std value: 13.0203 - type: nauc_ndcg_at_100_diff1 value: 43.2445 - type: nauc_ndcg_at_1000_max value: 40.8071 - type: nauc_ndcg_at_1000_std value: 11.7945 - type: nauc_ndcg_at_1000_diff1 value: 43.8587 - type: nauc_map_at_1_max value: 33.517599999999995 - type: nauc_map_at_1_std value: -0.7517 - type: nauc_map_at_1_diff1 value: 52.92059999999999 - type: nauc_map_at_3_max value: 36.8937 - type: nauc_map_at_3_std value: 4.0335 - type: nauc_map_at_3_diff1 value: 46.4322 - type: nauc_map_at_5_max value: 37.602000000000004 - type: nauc_map_at_5_std value: 5.3923 - type: nauc_map_at_5_diff1 value: 46.6764 - type: nauc_map_at_10_max value: 38.3082 - type: nauc_map_at_10_std value: 6.483600000000001 - type: nauc_map_at_10_diff1 value: 46.0255 - type: nauc_map_at_20_max value: 38.655899999999995 - type: nauc_map_at_20_std value: 6.8814 - type: 
nauc_map_at_20_diff1 value: 45.8245 - type: nauc_map_at_100_max value: 38.7492 - type: nauc_map_at_100_std value: 7.327100000000001 - type: nauc_map_at_100_diff1 value: 45.8365 - type: nauc_map_at_1000_max value: 38.7584 - type: nauc_map_at_1000_std value: 7.2851 - type: nauc_map_at_1000_diff1 value: 45.8479 - type: nauc_recall_at_1_max value: 33.517599999999995 - type: nauc_recall_at_1_std value: -0.7517 - type: nauc_recall_at_1_diff1 value: 52.92059999999999 - type: nauc_recall_at_3_max value: 37.0749 - type: nauc_recall_at_3_std value: 7.466399999999999 - type: nauc_recall_at_3_diff1 value: 39.454 - type: nauc_recall_at_5_max value: 37.227199999999996 - type: nauc_recall_at_5_std value: 11.7497 - type: nauc_recall_at_5_diff1 value: 39.402 - type: nauc_recall_at_10_max value: 39.901199999999996 - type: nauc_recall_at_10_std value: 16.7381 - type: nauc_recall_at_10_diff1 value: 34.3843 - type: nauc_recall_at_20_max value: 41.0603 - type: nauc_recall_at_20_std value: 20.78 - type: nauc_recall_at_20_diff1 value: 32.2975 - type: nauc_recall_at_100_max value: 38.3499 - type: nauc_recall_at_100_std value: 38.7219 - type: nauc_recall_at_100_diff1 value: 29.078100000000003 - type: nauc_recall_at_1000_max value: 48.2277 - type: nauc_recall_at_1000_std value: 55.4646 - type: nauc_recall_at_1000_diff1 value: 26.919900000000002 - type: nauc_precision_at_1_max value: 38.232 - type: nauc_precision_at_1_std value: 3.374 - type: nauc_precision_at_1_diff1 value: 51.223200000000006 - type: nauc_precision_at_3_max value: 39.8718 - type: nauc_precision_at_3_std value: 14.112 - type: nauc_precision_at_3_diff1 value: 28.971200000000003 - type: nauc_precision_at_5_max value: 38.7064 - type: nauc_precision_at_5_std value: 18.1345 - type: nauc_precision_at_5_diff1 value: 26.5685 - type: nauc_precision_at_10_max value: 36.4352 - type: nauc_precision_at_10_std value: 22.331500000000002 - type: nauc_precision_at_10_diff1 value: 17.163600000000002 - type: nauc_precision_at_20_max value: 33.2221 - type: nauc_precision_at_20_std value: 24.252000000000002 - type: nauc_precision_at_20_diff1 value: 9.0445 - type: nauc_precision_at_100_max value: 16.5544 - type: nauc_precision_at_100_std value: 22.867199999999997 - type: nauc_precision_at_100_diff1 value: -3.8588999999999998 - type: nauc_precision_at_1000_max value: 1.7690000000000001 - type: nauc_precision_at_1000_std value: 8.2609 - type: nauc_precision_at_1000_diff1 value: -13.8927 - type: nauc_mrr_at_1_max value: 38.232 - type: nauc_mrr_at_1_std value: 3.374 - type: nauc_mrr_at_1_diff1 value: 51.223200000000006 - type: nauc_mrr_at_3_max value: 40.2699 - type: nauc_mrr_at_3_std value: 7.6 - type: nauc_mrr_at_3_diff1 value: 45.1804 - type: nauc_mrr_at_5_max value: 40.1434 - type: nauc_mrr_at_5_std value: 8.3698 - type: nauc_mrr_at_5_diff1 value: 45.1772 - type: nauc_mrr_at_10_max value: 40.6102 - type: nauc_mrr_at_10_std value: 8.9793 - type: nauc_mrr_at_10_diff1 value: 44.6458 - type: nauc_mrr_at_20_max value: 40.5002 - type: nauc_mrr_at_20_std value: 9.003 - type: nauc_mrr_at_20_diff1 value: 44.671 - type: nauc_mrr_at_100_max value: 40.4429 - type: nauc_mrr_at_100_std value: 9.131 - type: nauc_mrr_at_100_diff1 value: 44.728899999999996 - type: nauc_mrr_at_1000_max value: 40.4634 - type: nauc_mrr_at_1000_std value: 9.1018 - type: nauc_mrr_at_1000_diff1 value: 44.7656 - type: main_score value: 44.065 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: 
CQADupstackRetrieval_is_a_combined_dataset metrics: - type: ndcg_at_1 value: 33.917750000000005 - type: ndcg_at_3 value: 39.253750000000004 - type: ndcg_at_5 value: 41.62250000000001 - type: ndcg_at_10 value: 44.29191666666667 - type: ndcg_at_20 value: 46.318083333333334 - type: ndcg_at_100 value: 49.489000000000004 - type: ndcg_at_1000 value: 51.534083333333335 - type: map_at_1 value: 28.50841666666667 - type: map_at_3 value: 35.52141666666667 - type: map_at_5 value: 37.228500000000004 - type: map_at_10 value: 38.61175 - type: map_at_20 value: 39.3125 - type: map_at_100 value: 39.882083333333334 - type: map_at_1000 value: 39.995916666666666 - type: recall_at_1 value: 28.50841666666667 - type: recall_at_3 value: 42.46875000000001 - type: recall_at_5 value: 48.59916666666667 - type: recall_at_10 value: 56.56024999999999 - type: recall_at_20 value: 63.96383333333333 - type: recall_at_100 value: 79.2645 - type: recall_at_1000 value: 93.25150000000002 - type: precision_at_1 value: 33.917750000000005 - type: precision_at_3 value: 18.19558333333333 - type: precision_at_5 value: 12.950166666666668 - type: precision_at_10 value: 7.866333333333333 - type: precision_at_20 value: 4.614749999999999 - type: precision_at_100 value: 1.2374166666666666 - type: precision_at_1000 value: 0.16091666666666668 - type: mrr_at_1 value: 33.917699999999996 - type: mrr_at_3 value: 40.448166666666665 - type: mrr_at_5 value: 41.903483333333334 - type: mrr_at_10 value: 42.944941666666665 - type: mrr_at_20 value: 43.43391666666666 - type: mrr_at_100 value: 43.782399999999996 - type: mrr_at_1000 value: 43.832325 - type: nauc_ndcg_at_1_max value: 38.768750000000004 - type: nauc_ndcg_at_1_std value: 0.5314750000000001 - type: nauc_ndcg_at_1_diff1 value: 50.18021666666667 - type: nauc_ndcg_at_3_max value: 37.73569166666667 - type: nauc_ndcg_at_3_std value: 1.9756250000000004 - type: nauc_ndcg_at_3_diff1 value: 45.217191666666665 - type: nauc_ndcg_at_5_max value: 38.19843333333333 - type: nauc_ndcg_at_5_std value: 2.760133333333333 - type: nauc_ndcg_at_5_diff1 value: 44.559908333333325 - type: nauc_ndcg_at_10_max value: 38.34826666666667 - type: nauc_ndcg_at_10_std value: 3.8177249999999994 - type: nauc_ndcg_at_10_diff1 value: 43.772149999999996 - type: nauc_ndcg_at_20_max value: 38.53288333333333 - type: nauc_ndcg_at_20_std value: 4.801466666666668 - type: nauc_ndcg_at_20_diff1 value: 43.312774999999995 - type: nauc_ndcg_at_100_max value: 38.912774999999996 - type: nauc_ndcg_at_100_std value: 6.39795 - type: nauc_ndcg_at_100_diff1 value: 43.38179166666667 - type: nauc_ndcg_at_1000_max value: 39.0197 - type: nauc_ndcg_at_1000_std value: 5.861708333333333 - type: nauc_ndcg_at_1000_diff1 value: 43.78785833333334 - type: nauc_map_at_1_max value: 34.808508333333336 - type: nauc_map_at_1_std value: -2.4239916666666663 - type: nauc_map_at_1_diff1 value: 51.88476666666666 - type: nauc_map_at_3_max value: 36.516549999999995 - type: nauc_map_at_3_std value: 0.008974999999999955 - type: nauc_map_at_3_diff1 value: 47.11013333333332 - type: nauc_map_at_5_max value: 37.17583333333333 - type: nauc_map_at_5_std value: 0.7668083333333334 - type: nauc_map_at_5_diff1 value: 46.496975 - type: nauc_map_at_10_max value: 37.54620833333333 - type: nauc_map_at_10_std value: 1.5577166666666666 - type: nauc_map_at_10_diff1 value: 46.02030833333334 - type: nauc_map_at_20_max value: 37.738058333333335 - type: nauc_map_at_20_std value: 2.0228750000000004 - type: nauc_map_at_20_diff1 value: 45.837608333333336 - type: nauc_map_at_100_max value: 37.864575 
- type: nauc_map_at_100_std value: 2.3781916666666665 - type: nauc_map_at_100_diff1 value: 45.818783333333336 - type: nauc_map_at_1000_max value: 37.8704 - type: nauc_map_at_1000_std value: 2.403341666666667 - type: nauc_map_at_1000_diff1 value: 45.83103333333333 - type: nauc_recall_at_1_max value: 34.808508333333336 - type: nauc_recall_at_1_std value: -2.4239916666666663 - type: nauc_recall_at_1_diff1 value: 51.88476666666666 - type: nauc_recall_at_3_max value: 35.12659166666666 - type: nauc_recall_at_3_std value: 1.5866916666666664 - type: nauc_recall_at_3_diff1 value: 41.56113333333334 - type: nauc_recall_at_5_max value: 36.147058333333334 - type: nauc_recall_at_5_std value: 3.803583333333333 - type: nauc_recall_at_5_diff1 value: 39.051366666666674 - type: nauc_recall_at_10_max value: 36.10466666666667 - type: nauc_recall_at_10_std value: 7.102541666666666 - type: nauc_recall_at_10_diff1 value: 35.79460833333333 - type: nauc_recall_at_20_max value: 36.25878333333333 - type: nauc_recall_at_20_std value: 11.494475000000001 - type: nauc_recall_at_20_diff1 value: 33.06425833333333 - type: nauc_recall_at_100_max value: 38.00966666666667 - type: nauc_recall_at_100_std value: 27.040050000000004 - type: nauc_recall_at_100_diff1 value: 29.968625 - type: nauc_recall_at_1000_max value: 45.32993333333334 - type: nauc_recall_at_1000_std value: 45.327316666666675 - type: nauc_recall_at_1000_diff1 value: 28.088641666666668 - type: nauc_precision_at_1_max value: 38.768750000000004 - type: nauc_precision_at_1_std value: 0.5314750000000001 - type: nauc_precision_at_1_diff1 value: 50.18021666666667 - type: nauc_precision_at_3_max value: 36.52460833333333 - type: nauc_precision_at_3_std value: 7.665850000000001 - type: nauc_precision_at_3_diff1 value: 31.133191666666672 - type: nauc_precision_at_5_max value: 35.20106666666667 - type: nauc_precision_at_5_std value: 10.746766666666666 - type: nauc_precision_at_5_diff1 value: 24.582291666666663 - type: nauc_precision_at_10_max value: 31.465108333333337 - type: nauc_precision_at_10_std value: 15.019074999999999 - type: nauc_precision_at_10_diff1 value: 16.25574166666667 - type: nauc_precision_at_20_max value: 27.589949999999995 - type: nauc_precision_at_20_std value: 18.108775 - type: nauc_precision_at_20_diff1 value: 9.511666666666668 - type: nauc_precision_at_100_max value: 17.18691666666667 - type: nauc_precision_at_100_std value: 21.440466666666666 - type: nauc_precision_at_100_diff1 value: -1.2442166666666667 - type: nauc_precision_at_1000_max value: 5.215425 - type: nauc_precision_at_1000_std value: 13.896516666666663 - type: nauc_precision_at_1000_diff1 value: -10.446258333333335 - type: nauc_mrr_at_1_max value: 38.768750000000004 - type: nauc_mrr_at_1_std value: 0.5314750000000001 - type: nauc_mrr_at_1_diff1 value: 50.18021666666667 - type: nauc_mrr_at_3_max value: 38.979308333333336 - type: nauc_mrr_at_3_std value: 2.755991666666666 - type: nauc_mrr_at_3_diff1 value: 45.991875 - type: nauc_mrr_at_5_max value: 39.26664166666667 - type: nauc_mrr_at_5_std value: 3.2105333333333332 - type: nauc_mrr_at_5_diff1 value: 45.54448333333333 - type: nauc_mrr_at_10_max value: 39.239558333333335 - type: nauc_mrr_at_10_std value: 3.57125 - type: nauc_mrr_at_10_diff1 value: 45.24083333333333 - type: nauc_mrr_at_20_max value: 39.212075 - type: nauc_mrr_at_20_std value: 3.7281833333333334 - type: nauc_mrr_at_20_diff1 value: 45.153083333333335 - type: nauc_mrr_at_100_max value: 39.221091666666666 - type: nauc_mrr_at_100_std value: 3.823533333333333 - type: 
nauc_mrr_at_100_diff1 value: 45.19413333333333 - type: nauc_mrr_at_1000_max value: 39.22478333333333 - type: nauc_mrr_at_1000_std value: 3.8052833333333327 - type: nauc_mrr_at_1000_diff1 value: 45.21384166666667 - type: main_score value: 44.29191666666667 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: CQADupstackRetrieval_is_a_combined_dataset split: test type: CQADupstackRetrieval_is_a_combined_dataset metrics: - type: main_score value: 44.29191666666667 - type: ndcg_at_10 value: 44.29191666666667 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval (default) revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: mteb/cqadupstack-stats metrics: - type: ndcg_at_1 value: 29.141000000000002 - type: ndcg_at_3 value: 33.861000000000004 - type: ndcg_at_5 value: 35.887 - type: ndcg_at_10 value: 38.596000000000004 - type: ndcg_at_20 value: 40.172000000000004 - type: ndcg_at_100 value: 43.375 - type: ndcg_at_1000 value: 45.562000000000005 - type: map_at_1 value: 25.728 - type: map_at_3 value: 31.268 - type: map_at_5 value: 32.596000000000004 - type: map_at_10 value: 33.903 - type: map_at_20 value: 34.392 - type: map_at_100 value: 34.853 - type: map_at_1000 value: 34.943999999999996 - type: recall_at_1 value: 25.728 - type: recall_at_3 value: 36.638 - type: recall_at_5 value: 41.689 - type: recall_at_10 value: 50.121 - type: recall_at_20 value: 56.043 - type: recall_at_100 value: 72.382 - type: recall_at_1000 value: 88.306 - type: precision_at_1 value: 29.141000000000002 - type: precision_at_3 value: 14.826 - type: precision_at_5 value: 10.428999999999998 - type: precision_at_10 value: 6.334 - type: precision_at_20 value: 3.589 - type: precision_at_100 value: 0.9520000000000001 - type: precision_at_1000 value: 0.121 - type: mrr_at_1 value: 29.141099999999998 - type: mrr_at_3 value: 34.407 - type: mrr_at_5 value: 35.68 - type: mrr_at_10 value: 36.739 - type: mrr_at_20 value: 37.1572 - type: mrr_at_100 value: 37.5448 - type: mrr_at_1000 value: 37.607600000000005 - type: nauc_ndcg_at_1_max value: 43.0703 - type: nauc_ndcg_at_1_std value: 7.8586 - type: nauc_ndcg_at_1_diff1 value: 57.5204 - type: nauc_ndcg_at_3_max value: 41.7529 - type: nauc_ndcg_at_3_std value: 8.549800000000001 - type: nauc_ndcg_at_3_diff1 value: 52.7211 - type: nauc_ndcg_at_5_max value: 43.404399999999995 - type: nauc_ndcg_at_5_std value: 9.117799999999999 - type: nauc_ndcg_at_5_diff1 value: 52.607400000000005 - type: nauc_ndcg_at_10_max value: 43.8638 - type: nauc_ndcg_at_10_std value: 10.7135 - type: nauc_ndcg_at_10_diff1 value: 50.7607 - type: nauc_ndcg_at_20_max value: 43.3389 - type: nauc_ndcg_at_20_std value: 11.7901 - type: nauc_ndcg_at_20_diff1 value: 50.056900000000006 - type: nauc_ndcg_at_100_max value: 43.580600000000004 - type: nauc_ndcg_at_100_std value: 13.616900000000001 - type: nauc_ndcg_at_100_diff1 value: 49.359700000000004 - type: nauc_ndcg_at_1000_max value: 43.6164 - type: nauc_ndcg_at_1000_std value: 13.5428 - type: nauc_ndcg_at_1000_diff1 value: 50.0821 - type: nauc_map_at_1_max value: 40.5495 - type: nauc_map_at_1_std value: 3.5229999999999997 - type: nauc_map_at_1_diff1 value: 59.7723 - type: nauc_map_at_3_max value: 41.2977 - type: nauc_map_at_3_std value: 6.9411000000000005 - type: nauc_map_at_3_diff1 value: 54.879999999999995 - type: nauc_map_at_5_max value: 42.5686 - type: nauc_map_at_5_std value: 7.8032 - type: nauc_map_at_5_diff1 value: 54.4624 - type: nauc_map_at_10_max value: 43.1361 - type: 
nauc_map_at_10_std value: 8.8783 - type: nauc_map_at_10_diff1 value: 53.747 - type: nauc_map_at_20_max value: 42.9941 - type: nauc_map_at_20_std value: 9.1777 - type: nauc_map_at_20_diff1 value: 53.5394 - type: nauc_map_at_100_max value: 42.960300000000004 - type: nauc_map_at_100_std value: 9.3584 - type: nauc_map_at_100_diff1 value: 53.3856 - type: nauc_map_at_1000_max value: 42.9595 - type: nauc_map_at_1000_std value: 9.3575 - type: nauc_map_at_1000_diff1 value: 53.4136 - type: nauc_recall_at_1_max value: 40.5495 - type: nauc_recall_at_1_std value: 3.5229999999999997 - type: nauc_recall_at_1_diff1 value: 59.7723 - type: nauc_recall_at_3_max value: 39.5622 - type: nauc_recall_at_3_std value: 7.614 - type: nauc_recall_at_3_diff1 value: 49.469 - type: nauc_recall_at_5_max value: 43.086400000000005 - type: nauc_recall_at_5_std value: 9.1332 - type: nauc_recall_at_5_diff1 value: 47.8829 - type: nauc_recall_at_10_max value: 43.054700000000004 - type: nauc_recall_at_10_std value: 13.116900000000001 - type: nauc_recall_at_10_diff1 value: 40.804 - type: nauc_recall_at_20_max value: 40.8398 - type: nauc_recall_at_20_std value: 17.099600000000002 - type: nauc_recall_at_20_diff1 value: 37.8978 - type: nauc_recall_at_100_max value: 41.8268 - type: nauc_recall_at_100_std value: 31.5507 - type: nauc_recall_at_100_diff1 value: 28.8246 - type: nauc_recall_at_1000_max value: 44.7113 - type: nauc_recall_at_1000_std value: 49.8697 - type: nauc_recall_at_1000_diff1 value: 26.7287 - type: nauc_precision_at_1_max value: 43.0703 - type: nauc_precision_at_1_std value: 7.8586 - type: nauc_precision_at_1_diff1 value: 57.5204 - type: nauc_precision_at_3_max value: 41.098 - type: nauc_precision_at_3_std value: 16.1082 - type: nauc_precision_at_3_diff1 value: 40.5806 - type: nauc_precision_at_5_max value: 43.8705 - type: nauc_precision_at_5_std value: 19.470299999999998 - type: nauc_precision_at_5_diff1 value: 36.9411 - type: nauc_precision_at_10_max value: 41.5225 - type: nauc_precision_at_10_std value: 22.9023 - type: nauc_precision_at_10_diff1 value: 28.0016 - type: nauc_precision_at_20_max value: 36.68 - type: nauc_precision_at_20_std value: 25.5411 - type: nauc_precision_at_20_diff1 value: 22.3414 - type: nauc_precision_at_100_max value: 25.8805 - type: nauc_precision_at_100_std value: 29.0719 - type: nauc_precision_at_100_diff1 value: 7.4353 - type: nauc_precision_at_1000_max value: 12.2406 - type: nauc_precision_at_1000_std value: 22.909 - type: nauc_precision_at_1000_diff1 value: -4.0427 - type: nauc_mrr_at_1_max value: 43.0703 - type: nauc_mrr_at_1_std value: 7.8586 - type: nauc_mrr_at_1_diff1 value: 57.5204 - type: nauc_mrr_at_3_max value: 42.4962 - type: nauc_mrr_at_3_std value: 9.9083 - type: nauc_mrr_at_3_diff1 value: 52.81 - type: nauc_mrr_at_5_max value: 43.7188 - type: nauc_mrr_at_5_std value: 10.2951 - type: nauc_mrr_at_5_diff1 value: 52.9848 - type: nauc_mrr_at_10_max value: 43.6725 - type: nauc_mrr_at_10_std value: 10.8946 - type: nauc_mrr_at_10_diff1 value: 52.037 - type: nauc_mrr_at_20_max value: 43.4857 - type: nauc_mrr_at_20_std value: 11.097700000000001 - type: nauc_mrr_at_20_diff1 value: 51.83560000000001 - type: nauc_mrr_at_100_max value: 43.4906 - type: nauc_mrr_at_100_std value: 11.2695 - type: nauc_mrr_at_100_diff1 value: 51.783500000000004 - type: nauc_mrr_at_1000_max value: 43.490899999999996 - type: nauc_mrr_at_1000_std value: 11.2507 - type: nauc_mrr_at_1000_diff1 value: 51.8107 - type: main_score value: 38.596000000000004 task: type: Retrieval - dataset: config: default name: MTEB 
CQADupstackTexRetrieval (default) revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: mteb/cqadupstack-tex metrics: - type: ndcg_at_1 value: 24.054000000000002 - type: ndcg_at_3 value: 29.115999999999996 - type: ndcg_at_5 value: 31.286 - type: ndcg_at_10 value: 33.722 - type: ndcg_at_20 value: 35.844 - type: ndcg_at_100 value: 39.361000000000004 - type: ndcg_at_1000 value: 42.064 - type: map_at_1 value: 19.911 - type: map_at_3 value: 25.874999999999996 - type: map_at_5 value: 27.403 - type: map_at_10 value: 28.559 - type: map_at_20 value: 29.213 - type: map_at_100 value: 29.784 - type: map_at_1000 value: 29.909999999999997 - type: recall_at_1 value: 19.911 - type: recall_at_3 value: 32.195 - type: recall_at_5 value: 37.818000000000005 - type: recall_at_10 value: 45.183 - type: recall_at_20 value: 53.081999999999994 - type: recall_at_100 value: 70.25 - type: recall_at_1000 value: 89.22200000000001 - type: precision_at_1 value: 24.054000000000002 - type: precision_at_3 value: 13.914000000000001 - type: precision_at_5 value: 10.069 - type: precision_at_10 value: 6.194 - type: precision_at_20 value: 3.7060000000000004 - type: precision_at_100 value: 1.058 - type: precision_at_1000 value: 0.148 - type: mrr_at_1 value: 24.0537 - type: mrr_at_3 value: 30.161700000000003 - type: mrr_at_5 value: 31.505499999999998 - type: mrr_at_10 value: 32.4828 - type: mrr_at_20 value: 33.054899999999996 - type: mrr_at_100 value: 33.4643 - type: mrr_at_1000 value: 33.534000000000006 - type: nauc_ndcg_at_1_max value: 30.663200000000003 - type: nauc_ndcg_at_1_std value: 1.6019999999999999 - type: nauc_ndcg_at_1_diff1 value: 45.730199999999996 - type: nauc_ndcg_at_3_max value: 28.5124 - type: nauc_ndcg_at_3_std value: 3.4572 - type: nauc_ndcg_at_3_diff1 value: 37.109500000000004 - type: nauc_ndcg_at_5_max value: 28.8788 - type: nauc_ndcg_at_5_std value: 4.5551 - type: nauc_ndcg_at_5_diff1 value: 36.1603 - type: nauc_ndcg_at_10_max value: 28.4392 - type: nauc_ndcg_at_10_std value: 5.1365 - type: nauc_ndcg_at_10_diff1 value: 34.6232 - type: nauc_ndcg_at_20_max value: 28.4854 - type: nauc_ndcg_at_20_std value: 6.6366 - type: nauc_ndcg_at_20_diff1 value: 34.5488 - type: nauc_ndcg_at_100_max value: 29.17 - type: nauc_ndcg_at_100_std value: 7.904 - type: nauc_ndcg_at_100_diff1 value: 34.7771 - type: nauc_ndcg_at_1000_max value: 29.437 - type: nauc_ndcg_at_1000_std value: 7.5479 - type: nauc_ndcg_at_1000_diff1 value: 35.605399999999996 - type: nauc_map_at_1_max value: 28.6015 - type: nauc_map_at_1_std value: 1.6265 - type: nauc_map_at_1_diff1 value: 46.170899999999996 - type: nauc_map_at_3_max value: 27.931099999999997 - type: nauc_map_at_3_std value: 3.3492 - type: nauc_map_at_3_diff1 value: 39.2592 - type: nauc_map_at_5_max value: 28.268700000000003 - type: nauc_map_at_5_std value: 3.9050000000000002 - type: nauc_map_at_5_diff1 value: 38.488299999999995 - type: nauc_map_at_10_max value: 28.197400000000002 - type: nauc_map_at_10_std value: 4.1464 - type: nauc_map_at_10_diff1 value: 37.7547 - type: nauc_map_at_20_max value: 28.27 - type: nauc_map_at_20_std value: 4.5844000000000005 - type: nauc_map_at_20_diff1 value: 37.7547 - type: nauc_map_at_100_max value: 28.458 - type: nauc_map_at_100_std value: 4.786300000000001 - type: nauc_map_at_100_diff1 value: 37.782199999999996 - type: nauc_map_at_1000_max value: 28.4996 - type: nauc_map_at_1000_std value: 4.7852 - type: nauc_map_at_1000_diff1 value: 37.816300000000005 - type: nauc_recall_at_1_max value: 28.6015 - type: nauc_recall_at_1_std value: 1.6265 - 
type: nauc_recall_at_1_diff1 value: 46.170899999999996 - type: nauc_recall_at_3_max value: 25.9988 - type: nauc_recall_at_3_std value: 4.1643 - type: nauc_recall_at_3_diff1 value: 31.9357 - type: nauc_recall_at_5_max value: 26.6721 - type: nauc_recall_at_5_std value: 6.1122000000000005 - type: nauc_recall_at_5_diff1 value: 29.1941 - type: nauc_recall_at_10_max value: 24.9394 - type: nauc_recall_at_10_std value: 7.313 - type: nauc_recall_at_10_diff1 value: 24.283099999999997 - type: nauc_recall_at_20_max value: 24.3242 - type: nauc_recall_at_20_std value: 12.6805 - type: nauc_recall_at_20_diff1 value: 22.8247 - type: nauc_recall_at_100_max value: 26.917799999999996 - type: nauc_recall_at_100_std value: 21.5069 - type: nauc_recall_at_100_diff1 value: 21.205 - type: nauc_recall_at_1000_max value: 29.8594 - type: nauc_recall_at_1000_std value: 31.4363 - type: nauc_recall_at_1000_diff1 value: 23.8707 - type: nauc_precision_at_1_max value: 30.663200000000003 - type: nauc_precision_at_1_std value: 1.6019999999999999 - type: nauc_precision_at_1_diff1 value: 45.730199999999996 - type: nauc_precision_at_3_max value: 28.3435 - type: nauc_precision_at_3_std value: 4.1368 - type: nauc_precision_at_3_diff1 value: 28.5551 - type: nauc_precision_at_5_max value: 28.49 - type: nauc_precision_at_5_std value: 5.8044 - type: nauc_precision_at_5_diff1 value: 24.5061 - type: nauc_precision_at_10_max value: 26.255699999999997 - type: nauc_precision_at_10_std value: 6.998799999999999 - type: nauc_precision_at_10_diff1 value: 18.3038 - type: nauc_precision_at_20_max value: 25.217699999999997 - type: nauc_precision_at_20_std value: 9.9304 - type: nauc_precision_at_20_diff1 value: 15.4876 - type: nauc_precision_at_100_max value: 21.865499999999997 - type: nauc_precision_at_100_std value: 10.746500000000001 - type: nauc_precision_at_100_diff1 value: 7.4687 - type: nauc_precision_at_1000_max value: 18.4782 - type: nauc_precision_at_1000_std value: 3.0096000000000003 - type: nauc_precision_at_1000_diff1 value: 3.3539 - type: nauc_mrr_at_1_max value: 30.663200000000003 - type: nauc_mrr_at_1_std value: 1.6019999999999999 - type: nauc_mrr_at_1_diff1 value: 45.730199999999996 - type: nauc_mrr_at_3_max value: 29.9128 - type: nauc_mrr_at_3_std value: 3.4235 - type: nauc_mrr_at_3_diff1 value: 39.1412 - type: nauc_mrr_at_5_max value: 30.3311 - type: nauc_mrr_at_5_std value: 4.0177 - type: nauc_mrr_at_5_diff1 value: 38.7065 - type: nauc_mrr_at_10_max value: 30.144399999999997 - type: nauc_mrr_at_10_std value: 4.2534 - type: nauc_mrr_at_10_diff1 value: 38.0266 - type: nauc_mrr_at_20_max value: 30.1249 - type: nauc_mrr_at_20_std value: 4.6181 - type: nauc_mrr_at_20_diff1 value: 38.002 - type: nauc_mrr_at_100_max value: 30.1948 - type: nauc_mrr_at_100_std value: 4.7099 - type: nauc_mrr_at_100_diff1 value: 38.0455 - type: nauc_mrr_at_1000_max value: 30.1966 - type: nauc_mrr_at_1000_std value: 4.6948 - type: nauc_mrr_at_1000_diff1 value: 38.0747 - type: main_score value: 33.722 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval (default) revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: mteb/cqadupstack-unix metrics: - type: ndcg_at_1 value: 35.168 - type: ndcg_at_3 value: 39.972 - type: ndcg_at_5 value: 42.586 - type: ndcg_at_10 value: 46.071 - type: ndcg_at_20 value: 48.028999999999996 - type: ndcg_at_100 value: 51.351 - type: ndcg_at_1000 value: 53.169999999999995 - type: map_at_1 value: 29.819000000000003 - type: map_at_3 value: 36.571999999999996 - type: map_at_5 value: 
38.385999999999996 - type: map_at_10 value: 40.073 - type: map_at_20 value: 40.72 - type: map_at_100 value: 41.289 - type: map_at_1000 value: 41.375 - type: recall_at_1 value: 29.819000000000003 - type: recall_at_3 value: 43.245 - type: recall_at_5 value: 49.931 - type: recall_at_10 value: 60.075 - type: recall_at_20 value: 67.118 - type: recall_at_100 value: 82.771 - type: recall_at_1000 value: 95.219 - type: precision_at_1 value: 35.168 - type: precision_at_3 value: 18.221 - type: precision_at_5 value: 12.892000000000001 - type: precision_at_10 value: 7.985 - type: precision_at_20 value: 4.529 - type: precision_at_100 value: 1.185 - type: precision_at_1000 value: 0.14400000000000002 - type: mrr_at_1 value: 35.1679 - type: mrr_at_3 value: 41.4024 - type: mrr_at_5 value: 43.039500000000004 - type: mrr_at_10 value: 44.3808 - type: mrr_at_20 value: 44.823299999999996 - type: mrr_at_100 value: 45.1914 - type: mrr_at_1000 value: 45.2339 - type: nauc_ndcg_at_1_max value: 43.9321 - type: nauc_ndcg_at_1_std value: -6.0145 - type: nauc_ndcg_at_1_diff1 value: 53.6293 - type: nauc_ndcg_at_3_max value: 42.0025 - type: nauc_ndcg_at_3_std value: -5.6881 - type: nauc_ndcg_at_3_diff1 value: 47.9461 - type: nauc_ndcg_at_5_max value: 42.916900000000005 - type: nauc_ndcg_at_5_std value: -4.2002999999999995 - type: nauc_ndcg_at_5_diff1 value: 48.0738 - type: nauc_ndcg_at_10_max value: 42.6014 - type: nauc_ndcg_at_10_std value: -2.8179 - type: nauc_ndcg_at_10_diff1 value: 46.792899999999996 - type: nauc_ndcg_at_20_max value: 41.9182 - type: nauc_ndcg_at_20_std value: -2.6714 - type: nauc_ndcg_at_20_diff1 value: 46.111000000000004 - type: nauc_ndcg_at_100_max value: 42.6218 - type: nauc_ndcg_at_100_std value: -1.6882000000000001 - type: nauc_ndcg_at_100_diff1 value: 46.3204 - type: nauc_ndcg_at_1000_max value: 42.6413 - type: nauc_ndcg_at_1000_std value: -2.2983 - type: nauc_ndcg_at_1000_diff1 value: 46.840399999999995 - type: nauc_map_at_1_max value: 41.256 - type: nauc_map_at_1_std value: -7.5877 - type: nauc_map_at_1_diff1 value: 56.383300000000006 - type: nauc_map_at_3_max value: 41.904 - type: nauc_map_at_3_std value: -6.548 - type: nauc_map_at_3_diff1 value: 50.7949 - type: nauc_map_at_5_max value: 42.568400000000004 - type: nauc_map_at_5_std value: -5.3873999999999995 - type: nauc_map_at_5_diff1 value: 50.3791 - type: nauc_map_at_10_max value: 42.6619 - type: nauc_map_at_10_std value: -4.8052 - type: nauc_map_at_10_diff1 value: 49.5933 - type: nauc_map_at_20_max value: 42.4985 - type: nauc_map_at_20_std value: -4.7620000000000005 - type: nauc_map_at_20_diff1 value: 49.3214 - type: nauc_map_at_100_max value: 42.6165 - type: nauc_map_at_100_std value: -4.595599999999999 - type: nauc_map_at_100_diff1 value: 49.277100000000004 - type: nauc_map_at_1000_max value: 42.6146 - type: nauc_map_at_1000_std value: -4.5920000000000005 - type: nauc_map_at_1000_diff1 value: 49.2815 - type: nauc_recall_at_1_max value: 41.256 - type: nauc_recall_at_1_std value: -7.5877 - type: nauc_recall_at_1_diff1 value: 56.383300000000006 - type: nauc_recall_at_3_max value: 39.626099999999994 - type: nauc_recall_at_3_std value: -5.973 - type: nauc_recall_at_3_diff1 value: 44.651 - type: nauc_recall_at_5_max value: 41.4392 - type: nauc_recall_at_5_std value: -1.8328 - type: nauc_recall_at_5_diff1 value: 42.928399999999996 - type: nauc_recall_at_10_max value: 38.807 - type: nauc_recall_at_10_std value: 2.863 - type: nauc_recall_at_10_diff1 value: 37.6663 - type: nauc_recall_at_20_max value: 34.9705 - type: nauc_recall_at_20_std value: 
4.1407 - type: nauc_recall_at_20_diff1 value: 33.6156 - type: nauc_recall_at_100_max value: 38.4049 - type: nauc_recall_at_100_std value: 16.7735 - type: nauc_recall_at_100_diff1 value: 30.724800000000002 - type: nauc_recall_at_1000_max value: 42.9152 - type: nauc_recall_at_1000_std value: 32.1176 - type: nauc_recall_at_1000_diff1 value: 33.2582 - type: nauc_precision_at_1_max value: 43.9321 - type: nauc_precision_at_1_std value: -6.0145 - type: nauc_precision_at_1_diff1 value: 53.6293 - type: nauc_precision_at_3_max value: 38.1748 - type: nauc_precision_at_3_std value: -2.3163 - type: nauc_precision_at_3_diff1 value: 31.2502 - type: nauc_precision_at_5_max value: 36.503 - type: nauc_precision_at_5_std value: 2.0892 - type: nauc_precision_at_5_diff1 value: 25.249100000000002 - type: nauc_precision_at_10_max value: 30.2104 - type: nauc_precision_at_10_std value: 6.6937999999999995 - type: nauc_precision_at_10_diff1 value: 14.0684 - type: nauc_precision_at_20_max value: 23.6494 - type: nauc_precision_at_20_std value: 7.216500000000001 - type: nauc_precision_at_20_diff1 value: 6.7953 - type: nauc_precision_at_100_max value: 11.2361 - type: nauc_precision_at_100_std value: 11.824 - type: nauc_precision_at_100_diff1 value: -7.6405 - type: nauc_precision_at_1000_max value: -3.8651 - type: nauc_precision_at_1000_std value: 5.367999999999999 - type: nauc_precision_at_1000_diff1 value: -17.473 - type: nauc_mrr_at_1_max value: 43.9321 - type: nauc_mrr_at_1_std value: -6.0145 - type: nauc_mrr_at_1_diff1 value: 53.6293 - type: nauc_mrr_at_3_max value: 42.8188 - type: nauc_mrr_at_3_std value: -5.1393 - type: nauc_mrr_at_3_diff1 value: 48.3128 - type: nauc_mrr_at_5_max value: 43.5383 - type: nauc_mrr_at_5_std value: -4.2538 - type: nauc_mrr_at_5_diff1 value: 48.0319 - type: nauc_mrr_at_10_max value: 43.121700000000004 - type: nauc_mrr_at_10_std value: -3.7823 - type: nauc_mrr_at_10_diff1 value: 47.6064 - type: nauc_mrr_at_20_max value: 42.8886 - type: nauc_mrr_at_20_std value: -3.8175 - type: nauc_mrr_at_20_diff1 value: 47.5437 - type: nauc_mrr_at_100_max value: 42.9514 - type: nauc_mrr_at_100_std value: -3.8205000000000005 - type: nauc_mrr_at_100_diff1 value: 47.6513 - type: nauc_mrr_at_1000_max value: 42.9567 - type: nauc_mrr_at_1000_std value: -3.8327 - type: nauc_mrr_at_1000_diff1 value: 47.6603 - type: main_score value: 46.071 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: mteb/cqadupstack-webmasters metrics: - type: ndcg_at_1 value: 33.794000000000004 - type: ndcg_at_3 value: 38.442 - type: ndcg_at_5 value: 40.737 - type: ndcg_at_10 value: 43.832 - type: ndcg_at_20 value: 45.589 - type: ndcg_at_100 value: 49.514 - type: ndcg_at_1000 value: 51.742 - type: map_at_1 value: 28.409000000000002 - type: map_at_3 value: 34.337 - type: map_at_5 value: 35.985 - type: map_at_10 value: 37.621 - type: map_at_20 value: 38.391 - type: map_at_100 value: 39.233000000000004 - type: map_at_1000 value: 39.471000000000004 - type: recall_at_1 value: 28.409000000000002 - type: recall_at_3 value: 40.133 - type: recall_at_5 value: 45.913 - type: recall_at_10 value: 55.388000000000005 - type: recall_at_20 value: 62.134 - type: recall_at_100 value: 81.517 - type: recall_at_1000 value: 95.038 - type: precision_at_1 value: 33.794000000000004 - type: precision_at_3 value: 17.787 - type: precision_at_5 value: 13.241 - type: precision_at_10 value: 8.597000000000001 - type: precision_at_20 value: 5.267 - 
type: precision_at_100 value: 1.652 - type: precision_at_1000 value: 0.251 - type: mrr_at_1 value: 33.7945 - type: mrr_at_3 value: 39.5257 - type: mrr_at_5 value: 41.087 - type: mrr_at_10 value: 42.3491 - type: mrr_at_20 value: 42.7479 - type: mrr_at_100 value: 43.1961 - type: mrr_at_1000 value: 43.2373 - type: nauc_ndcg_at_1_max value: 43.9886 - type: nauc_ndcg_at_1_std value: 9.8923 - type: nauc_ndcg_at_1_diff1 value: 50.394000000000005 - type: nauc_ndcg_at_3_max value: 43.074200000000005 - type: nauc_ndcg_at_3_std value: 13.5108 - type: nauc_ndcg_at_3_diff1 value: 47.0674 - type: nauc_ndcg_at_5_max value: 42.810700000000004 - type: nauc_ndcg_at_5_std value: 14.119499999999999 - type: nauc_ndcg_at_5_diff1 value: 46.822 - type: nauc_ndcg_at_10_max value: 43.533699999999996 - type: nauc_ndcg_at_10_std value: 14.009599999999999 - type: nauc_ndcg_at_10_diff1 value: 47.3163 - type: nauc_ndcg_at_20_max value: 44.4973 - type: nauc_ndcg_at_20_std value: 14.5044 - type: nauc_ndcg_at_20_diff1 value: 47.2833 - type: nauc_ndcg_at_100_max value: 44.7593 - type: nauc_ndcg_at_100_std value: 16.833000000000002 - type: nauc_ndcg_at_100_diff1 value: 47.251599999999996 - type: nauc_ndcg_at_1000_max value: 44.790600000000005 - type: nauc_ndcg_at_1000_std value: 15.987199999999998 - type: nauc_ndcg_at_1000_diff1 value: 47.4071 - type: nauc_map_at_1_max value: 43.4155 - type: nauc_map_at_1_std value: 6.3514 - type: nauc_map_at_1_diff1 value: 54.8257 - type: nauc_map_at_3_max value: 43.1906 - type: nauc_map_at_3_std value: 9.823 - type: nauc_map_at_3_diff1 value: 49.5974 - type: nauc_map_at_5_max value: 43.1564 - type: nauc_map_at_5_std value: 10.3498 - type: nauc_map_at_5_diff1 value: 48.7876 - type: nauc_map_at_10_max value: 43.6805 - type: nauc_map_at_10_std value: 10.844199999999999 - type: nauc_map_at_10_diff1 value: 48.5759 - type: nauc_map_at_20_max value: 44.121700000000004 - type: nauc_map_at_20_std value: 11.6161 - type: nauc_map_at_20_diff1 value: 48.4631 - type: nauc_map_at_100_max value: 44.1124 - type: nauc_map_at_100_std value: 12.439 - type: nauc_map_at_100_diff1 value: 48.4742 - type: nauc_map_at_1000_max value: 44.0146 - type: nauc_map_at_1000_std value: 12.708 - type: nauc_map_at_1000_diff1 value: 48.5587 - type: nauc_recall_at_1_max value: 43.4155 - type: nauc_recall_at_1_std value: 6.3514 - type: nauc_recall_at_1_diff1 value: 54.8257 - type: nauc_recall_at_3_max value: 40.941300000000005 - type: nauc_recall_at_3_std value: 12.864700000000001 - type: nauc_recall_at_3_diff1 value: 44.642900000000004 - type: nauc_recall_at_5_max value: 39.6961 - type: nauc_recall_at_5_std value: 13.6938 - type: nauc_recall_at_5_diff1 value: 42.142 - type: nauc_recall_at_10_max value: 40.2068 - type: nauc_recall_at_10_std value: 14.1258 - type: nauc_recall_at_10_diff1 value: 42.244 - type: nauc_recall_at_20_max value: 42.7956 - type: nauc_recall_at_20_std value: 17.518 - type: nauc_recall_at_20_diff1 value: 42.3104 - type: nauc_recall_at_100_max value: 43.4746 - type: nauc_recall_at_100_std value: 39.7613 - type: nauc_recall_at_100_diff1 value: 40.5005 - type: nauc_recall_at_1000_max value: 58.044 - type: nauc_recall_at_1000_std value: 56.4975 - type: nauc_recall_at_1000_diff1 value: 40.238600000000005 - type: nauc_precision_at_1_max value: 43.9886 - type: nauc_precision_at_1_std value: 9.8923 - type: nauc_precision_at_1_diff1 value: 50.394000000000005 - type: nauc_precision_at_3_max value: 37.436 - type: nauc_precision_at_3_std value: 19.9652 - type: nauc_precision_at_3_diff1 value: 31.1933 - type: 
nauc_precision_at_5_max value: 32.124900000000004 - type: nauc_precision_at_5_std value: 22.8439 - type: nauc_precision_at_5_diff1 value: 23.325699999999998 - type: nauc_precision_at_10_max value: 26.956200000000003 - type: nauc_precision_at_10_std value: 24.7414 - type: nauc_precision_at_10_diff1 value: 15.1951 - type: nauc_precision_at_20_max value: 20.924799999999998 - type: nauc_precision_at_20_std value: 27.1802 - type: nauc_precision_at_20_diff1 value: 8.575800000000001 - type: nauc_precision_at_100_max value: 3.8554 - type: nauc_precision_at_100_std value: 32.46 - type: nauc_precision_at_100_diff1 value: 1.1094 - type: nauc_precision_at_1000_max value: -4.0572 - type: nauc_precision_at_1000_std value: 29.813499999999998 - type: nauc_precision_at_1000_diff1 value: 0.7384 - type: nauc_mrr_at_1_max value: 43.9886 - type: nauc_mrr_at_1_std value: 9.8923 - type: nauc_mrr_at_1_diff1 value: 50.394000000000005 - type: nauc_mrr_at_3_max value: 43.5962 - type: nauc_mrr_at_3_std value: 13.738 - type: nauc_mrr_at_3_diff1 value: 46.9918 - type: nauc_mrr_at_5_max value: 43.6259 - type: nauc_mrr_at_5_std value: 13.3696 - type: nauc_mrr_at_5_diff1 value: 46.7241 - type: nauc_mrr_at_10_max value: 43.7969 - type: nauc_mrr_at_10_std value: 13.477500000000001 - type: nauc_mrr_at_10_diff1 value: 47.125499999999995 - type: nauc_mrr_at_20_max value: 43.8469 - type: nauc_mrr_at_20_std value: 13.5156 - type: nauc_mrr_at_20_diff1 value: 47.088 - type: nauc_mrr_at_100_max value: 43.8068 - type: nauc_mrr_at_100_std value: 13.7051 - type: nauc_mrr_at_100_diff1 value: 47.153600000000004 - type: nauc_mrr_at_1000_max value: 43.8016 - type: nauc_mrr_at_1000_std value: 13.661999999999999 - type: nauc_mrr_at_1000_diff1 value: 47.1571 - type: main_score value: 43.832 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval (default) revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack-wordpress metrics: - type: ndcg_at_1 value: 26.247999999999998 - type: ndcg_at_3 value: 31.799 - type: ndcg_at_5 value: 34.563 - type: ndcg_at_10 value: 36.889 - type: ndcg_at_20 value: 39.330999999999996 - type: ndcg_at_100 value: 42.426 - type: ndcg_at_1000 value: 44.745000000000005 - type: map_at_1 value: 24.067 - type: map_at_3 value: 29.492 - type: map_at_5 value: 31.11 - type: map_at_10 value: 32.184000000000005 - type: map_at_20 value: 32.903 - type: map_at_100 value: 33.357 - type: map_at_1000 value: 33.458 - type: recall_at_1 value: 24.067 - type: recall_at_3 value: 36.272 - type: recall_at_5 value: 42.77 - type: recall_at_10 value: 49.344 - type: recall_at_20 value: 58.46 - type: recall_at_100 value: 74.11999999999999 - type: recall_at_1000 value: 91.276 - type: precision_at_1 value: 26.247999999999998 - type: precision_at_3 value: 13.309000000000001 - type: precision_at_5 value: 9.649000000000001 - type: precision_at_10 value: 5.712 - type: precision_at_20 value: 3.466 - type: precision_at_100 value: 0.915 - type: precision_at_1000 value: 0.123 - type: mrr_at_1 value: 26.247700000000002 - type: mrr_at_3 value: 31.638899999999996 - type: mrr_at_5 value: 33.1824 - type: mrr_at_10 value: 34.1493 - type: mrr_at_20 value: 34.7716 - type: mrr_at_100 value: 35.1893 - type: mrr_at_1000 value: 35.2507 - type: nauc_ndcg_at_1_max value: 36.3215 - type: nauc_ndcg_at_1_std value: 0.6172000000000001 - type: nauc_ndcg_at_1_diff1 value: 50.767799999999994 - type: nauc_ndcg_at_3_max value: 32.5903 - type: nauc_ndcg_at_3_std value: 2.5009 - type: nauc_ndcg_at_3_diff1 value: 
44.7412 - type: nauc_ndcg_at_5_max value: 32.616499999999995 - type: nauc_ndcg_at_5_std value: 2.2826 - type: nauc_ndcg_at_5_diff1 value: 41.7193 - type: nauc_ndcg_at_10_max value: 32.063399999999994 - type: nauc_ndcg_at_10_std value: 2.7484 - type: nauc_ndcg_at_10_diff1 value: 40.9919 - type: nauc_ndcg_at_20_max value: 32.6337 - type: nauc_ndcg_at_20_std value: 3.6401000000000003 - type: nauc_ndcg_at_20_diff1 value: 39.4371 - type: nauc_ndcg_at_100_max value: 33.4504 - type: nauc_ndcg_at_100_std value: 6.5571 - type: nauc_ndcg_at_100_diff1 value: 40.103899999999996 - type: nauc_ndcg_at_1000_max value: 33.413399999999996 - type: nauc_ndcg_at_1000_std value: 6.1167 - type: nauc_ndcg_at_1000_diff1 value: 40.3296 - type: nauc_map_at_1_max value: 33.9516 - type: nauc_map_at_1_std value: -2.0814 - type: nauc_map_at_1_diff1 value: 51.6831 - type: nauc_map_at_3_max value: 32.4114 - type: nauc_map_at_3_std value: 0.9002 - type: nauc_map_at_3_diff1 value: 46.3164 - type: nauc_map_at_5_max value: 32.7406 - type: nauc_map_at_5_std value: 0.9598000000000001 - type: nauc_map_at_5_diff1 value: 44.576100000000004 - type: nauc_map_at_10_max value: 32.669 - type: nauc_map_at_10_std value: 1.4043 - type: nauc_map_at_10_diff1 value: 44.1697 - type: nauc_map_at_20_max value: 32.807199999999995 - type: nauc_map_at_20_std value: 1.7632999999999999 - type: nauc_map_at_20_diff1 value: 43.745400000000004 - type: nauc_map_at_100_max value: 32.9749 - type: nauc_map_at_100_std value: 2.1647 - type: nauc_map_at_100_diff1 value: 43.8445 - type: nauc_map_at_1000_max value: 32.9631 - type: nauc_map_at_1000_std value: 2.164 - type: nauc_map_at_1000_diff1 value: 43.8217 - type: nauc_recall_at_1_max value: 33.9516 - type: nauc_recall_at_1_std value: -2.0814 - type: nauc_recall_at_1_diff1 value: 51.6831 - type: nauc_recall_at_3_max value: 30.248199999999997 - type: nauc_recall_at_3_std value: 4.3766 - type: nauc_recall_at_3_diff1 value: 40.7147 - type: nauc_recall_at_5_max value: 29.749799999999997 - type: nauc_recall_at_5_std value: 3.739 - type: nauc_recall_at_5_diff1 value: 33.4515 - type: nauc_recall_at_10_max value: 27.8039 - type: nauc_recall_at_10_std value: 4.3235 - type: nauc_recall_at_10_diff1 value: 31.706200000000003 - type: nauc_recall_at_20_max value: 29.4726 - type: nauc_recall_at_20_std value: 7.2537 - type: nauc_recall_at_20_diff1 value: 24.763099999999998 - type: nauc_recall_at_100_max value: 32.6767 - type: nauc_recall_at_100_std value: 28.704400000000003 - type: nauc_recall_at_100_diff1 value: 23.6186 - type: nauc_recall_at_1000_max value: 35.3748 - type: nauc_recall_at_1000_std value: 49.2642 - type: nauc_recall_at_1000_diff1 value: 15.0664 - type: nauc_precision_at_1_max value: 36.3215 - type: nauc_precision_at_1_std value: 0.6172000000000001 - type: nauc_precision_at_1_diff1 value: 50.767799999999994 - type: nauc_precision_at_3_max value: 32.4313 - type: nauc_precision_at_3_std value: 6.8161 - type: nauc_precision_at_3_diff1 value: 39.4056 - type: nauc_precision_at_5_max value: 32.1058 - type: nauc_precision_at_5_std value: 7.5455 - type: nauc_precision_at_5_diff1 value: 29.119899999999998 - type: nauc_precision_at_10_max value: 29.9078 - type: nauc_precision_at_10_std value: 11.8851 - type: nauc_precision_at_10_diff1 value: 22.5166 - type: nauc_precision_at_20_max value: 29.212300000000003 - type: nauc_precision_at_20_std value: 16.1047 - type: nauc_precision_at_20_diff1 value: 12.209299999999999 - type: nauc_precision_at_100_max value: 24.7982 - type: nauc_precision_at_100_std value: 29.3162 - type: 
nauc_precision_at_100_diff1 value: 0.8240000000000001 - type: nauc_precision_at_1000_max value: -0.8333 - type: nauc_precision_at_1000_std value: 17.0877 - type: nauc_precision_at_1000_diff1 value: -25.4924 - type: nauc_mrr_at_1_max value: 36.3215 - type: nauc_mrr_at_1_std value: 0.6172000000000001 - type: nauc_mrr_at_1_diff1 value: 50.767799999999994 - type: nauc_mrr_at_3_max value: 34.7464 - type: nauc_mrr_at_3_std value: 2.9025 - type: nauc_mrr_at_3_diff1 value: 45.7566 - type: nauc_mrr_at_5_max value: 34.454 - type: nauc_mrr_at_5_std value: 2.9497 - type: nauc_mrr_at_5_diff1 value: 43.948 - type: nauc_mrr_at_10_max value: 34.1548 - type: nauc_mrr_at_10_std value: 3.0771 - type: nauc_mrr_at_10_diff1 value: 43.626599999999996 - type: nauc_mrr_at_20_max value: 34.3061 - type: nauc_mrr_at_20_std value: 3.2359999999999998 - type: nauc_mrr_at_20_diff1 value: 43.2516 - type: nauc_mrr_at_100_max value: 34.3776 - type: nauc_mrr_at_100_std value: 3.5534999999999997 - type: nauc_mrr_at_100_diff1 value: 43.432900000000004 - type: nauc_mrr_at_1000_max value: 34.3807 - type: nauc_mrr_at_1000_std value: 3.5423999999999998 - type: nauc_mrr_at_1000_diff1 value: 43.4448 - type: main_score value: 36.889 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER (default) revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: ndcg_at_1 value: 29.837000000000003 - type: ndcg_at_3 value: 25.392 - type: ndcg_at_5 value: 27.153 - type: ndcg_at_10 value: 30.263 - type: ndcg_at_20 value: 33.073 - type: ndcg_at_100 value: 37.228 - type: ndcg_at_1000 value: 40.677 - type: map_at_1 value: 13.189 - type: map_at_3 value: 18.512999999999998 - type: map_at_5 value: 20.212 - type: map_at_10 value: 21.789 - type: map_at_20 value: 22.787 - type: map_at_100 value: 23.580000000000002 - type: map_at_1000 value: 23.772 - type: recall_at_1 value: 13.189 - type: recall_at_3 value: 23.255 - type: recall_at_5 value: 28.445999999999998 - type: recall_at_10 value: 35.355 - type: recall_at_20 value: 43.187999999999995 - type: recall_at_100 value: 59.255 - type: recall_at_1000 value: 78.637 - type: precision_at_1 value: 29.837000000000003 - type: precision_at_3 value: 18.545 - type: precision_at_5 value: 14.241000000000001 - type: precision_at_10 value: 9.179 - type: precision_at_20 value: 5.808 - type: precision_at_100 value: 1.659 - type: precision_at_1000 value: 0.22999999999999998 - type: mrr_at_1 value: 29.8371 - type: mrr_at_3 value: 38.2845 - type: mrr_at_5 value: 40.300799999999995 - type: mrr_at_10 value: 41.3765 - type: mrr_at_20 value: 41.958400000000005 - type: mrr_at_100 value: 42.281600000000005 - type: mrr_at_1000 value: 42.3193 - type: nauc_ndcg_at_1_max value: 29.676000000000002 - type: nauc_ndcg_at_1_std value: 20.4771 - type: nauc_ndcg_at_1_diff1 value: 22.0866 - type: nauc_ndcg_at_3_max value: 34.3256 - type: nauc_ndcg_at_3_std value: 18.886400000000002 - type: nauc_ndcg_at_3_diff1 value: 19.692999999999998 - type: nauc_ndcg_at_5_max value: 36.709599999999995 - type: nauc_ndcg_at_5_std value: 21.857 - type: nauc_ndcg_at_5_diff1 value: 20.2605 - type: nauc_ndcg_at_10_max value: 36.951699999999995 - type: nauc_ndcg_at_10_std value: 24.1201 - type: nauc_ndcg_at_10_diff1 value: 19.5268 - type: nauc_ndcg_at_20_max value: 37.2598 - type: nauc_ndcg_at_20_std value: 26.072699999999998 - type: nauc_ndcg_at_20_diff1 value: 18.5947 - type: nauc_ndcg_at_100_max value: 37.5131 - type: nauc_ndcg_at_100_std value: 27.3519 - type: nauc_ndcg_at_100_diff1 value: 
18.7028 - type: nauc_ndcg_at_1000_max value: 37.4262 - type: nauc_ndcg_at_1000_std value: 27.158700000000003 - type: nauc_ndcg_at_1000_diff1 value: 19.2395 - type: nauc_map_at_1_max value: 32.2132 - type: nauc_map_at_1_std value: 15.244 - type: nauc_map_at_1_diff1 value: 26.2965 - type: nauc_map_at_3_max value: 35.157 - type: nauc_map_at_3_std value: 16.8008 - type: nauc_map_at_3_diff1 value: 21.7011 - type: nauc_map_at_5_max value: 36.0907 - type: nauc_map_at_5_std value: 19.0433 - type: nauc_map_at_5_diff1 value: 21.5595 - type: nauc_map_at_10_max value: 36.1498 - type: nauc_map_at_10_std value: 20.7259 - type: nauc_map_at_10_diff1 value: 20.816599999999998 - type: nauc_map_at_20_max value: 36.365199999999994 - type: nauc_map_at_20_std value: 21.6367 - type: nauc_map_at_20_diff1 value: 20.4563 - type: nauc_map_at_100_max value: 36.503600000000006 - type: nauc_map_at_100_std value: 22.020200000000003 - type: nauc_map_at_100_diff1 value: 20.5135 - type: nauc_map_at_1000_max value: 36.4843 - type: nauc_map_at_1000_std value: 22.0155 - type: nauc_map_at_1000_diff1 value: 20.5659 - type: nauc_recall_at_1_max value: 32.2132 - type: nauc_recall_at_1_std value: 15.244 - type: nauc_recall_at_1_diff1 value: 26.2965 - type: nauc_recall_at_3_max value: 34.6294 - type: nauc_recall_at_3_std value: 16.517200000000003 - type: nauc_recall_at_3_diff1 value: 16.6413 - type: nauc_recall_at_5_max value: 35.938700000000004 - type: nauc_recall_at_5_std value: 21.1943 - type: nauc_recall_at_5_diff1 value: 16.702 - type: nauc_recall_at_10_max value: 34.956900000000005 - type: nauc_recall_at_10_std value: 24.6739 - type: nauc_recall_at_10_diff1 value: 14.4465 - type: nauc_recall_at_20_max value: 33.873799999999996 - type: nauc_recall_at_20_std value: 27.9903 - type: nauc_recall_at_20_diff1 value: 11.1114 - type: nauc_recall_at_100_max value: 33.123799999999996 - type: nauc_recall_at_100_std value: 31.4933 - type: nauc_recall_at_100_diff1 value: 10.3246 - type: nauc_recall_at_1000_max value: 32.9304 - type: nauc_recall_at_1000_std value: 33.5144 - type: nauc_recall_at_1000_diff1 value: 10.810699999999999 - type: nauc_precision_at_1_max value: 29.676000000000002 - type: nauc_precision_at_1_std value: 20.4771 - type: nauc_precision_at_1_diff1 value: 22.0866 - type: nauc_precision_at_3_max value: 32.0765 - type: nauc_precision_at_3_std value: 20.6039 - type: nauc_precision_at_3_diff1 value: 13.585700000000001 - type: nauc_precision_at_5_max value: 33.5445 - type: nauc_precision_at_5_std value: 26.567400000000003 - type: nauc_precision_at_5_diff1 value: 14.421700000000001 - type: nauc_precision_at_10_max value: 29.520200000000003 - type: nauc_precision_at_10_std value: 28.8453 - type: nauc_precision_at_10_diff1 value: 11.2529 - type: nauc_precision_at_20_max value: 25.610300000000002 - type: nauc_precision_at_20_std value: 30.6799 - type: nauc_precision_at_20_diff1 value: 6.8877 - type: nauc_precision_at_100_max value: 18.3639 - type: nauc_precision_at_100_std value: 28.2568 - type: nauc_precision_at_100_diff1 value: 3.8568 - type: nauc_precision_at_1000_max value: 6.9706 - type: nauc_precision_at_1000_std value: 18.9339 - type: nauc_precision_at_1000_diff1 value: 0.6999 - type: nauc_mrr_at_1_max value: 29.676000000000002 - type: nauc_mrr_at_1_std value: 20.4771 - type: nauc_mrr_at_1_diff1 value: 22.0866 - type: nauc_mrr_at_3_max value: 32.559900000000006 - type: nauc_mrr_at_3_std value: 22.1817 - type: nauc_mrr_at_3_diff1 value: 19.1362 - type: nauc_mrr_at_5_max value: 33.692299999999996 - type: nauc_mrr_at_5_std 
value: 23.5179 - type: nauc_mrr_at_5_diff1 value: 19.9908 - type: nauc_mrr_at_10_max value: 33.6748 - type: nauc_mrr_at_10_std value: 23.624200000000002 - type: nauc_mrr_at_10_diff1 value: 19.969 - type: nauc_mrr_at_20_max value: 33.562599999999996 - type: nauc_mrr_at_20_std value: 23.776 - type: nauc_mrr_at_20_diff1 value: 19.8259 - type: nauc_mrr_at_100_max value: 33.4998 - type: nauc_mrr_at_100_std value: 23.7432 - type: nauc_mrr_at_100_diff1 value: 19.8137 - type: nauc_mrr_at_1000_max value: 33.4876 - type: nauc_mrr_at_1000_std value: 23.719199999999997 - type: nauc_mrr_at_1000_diff1 value: 19.817 - type: main_score value: 30.263 task: type: Retrieval - dataset: config: default name: MTEB CodeFeedbackMT (default) revision: b0f12fa0c0dd67f59c95a5c33d02aeeb4c398c5f split: test type: CoIR-Retrieval/codefeedback-mt metrics: - type: ndcg_at_1 value: 27.002 - type: ndcg_at_3 value: 33.597 - type: ndcg_at_5 value: 35.75 - type: ndcg_at_10 value: 37.757000000000005 - type: ndcg_at_20 value: 39.36 - type: ndcg_at_100 value: 41.806 - type: ndcg_at_1000 value: 43.675000000000004 - type: map_at_1 value: 27.002 - type: map_at_3 value: 31.964 - type: map_at_5 value: 33.158 - type: map_at_10 value: 33.988 - type: map_at_20 value: 34.43 - type: map_at_100 value: 34.760000000000005 - type: map_at_1000 value: 34.821999999999996 - type: recall_at_1 value: 27.002 - type: recall_at_3 value: 38.329 - type: recall_at_5 value: 43.557 - type: recall_at_10 value: 49.755 - type: recall_at_20 value: 56.082 - type: recall_at_100 value: 69.376 - type: recall_at_1000 value: 84.56 - type: precision_at_1 value: 27.002 - type: precision_at_3 value: 12.776000000000002 - type: precision_at_5 value: 8.711 - type: precision_at_10 value: 4.976 - type: precision_at_20 value: 2.804 - type: precision_at_100 value: 0.694 - type: precision_at_1000 value: 0.08499999999999999 - type: mrr_at_1 value: 27.001599999999996 - type: mrr_at_3 value: 31.9638 - type: mrr_at_5 value: 33.158300000000004 - type: mrr_at_10 value: 33.9877 - type: mrr_at_20 value: 34.429700000000004 - type: mrr_at_100 value: 34.760200000000005 - type: mrr_at_1000 value: 34.822399999999995 - type: nauc_ndcg_at_1_max value: 14.691199999999998 - type: nauc_ndcg_at_1_std value: -18.2481 - type: nauc_ndcg_at_1_diff1 value: 51.82940000000001 - type: nauc_ndcg_at_3_max value: 15.9155 - type: nauc_ndcg_at_3_std value: -18.21 - type: nauc_ndcg_at_3_diff1 value: 46.4667 - type: nauc_ndcg_at_5_max value: 16.2958 - type: nauc_ndcg_at_5_std value: -17.8939 - type: nauc_ndcg_at_5_diff1 value: 45.4591 - type: nauc_ndcg_at_10_max value: 16.6542 - type: nauc_ndcg_at_10_std value: -17.121 - type: nauc_ndcg_at_10_diff1 value: 44.5803 - type: nauc_ndcg_at_20_max value: 17.210800000000003 - type: nauc_ndcg_at_20_std value: -16.3918 - type: nauc_ndcg_at_20_diff1 value: 44.0927 - type: nauc_ndcg_at_100_max value: 17.8597 - type: nauc_ndcg_at_100_std value: -14.35 - type: nauc_ndcg_at_100_diff1 value: 43.561 - type: nauc_ndcg_at_1000_max value: 18.0753 - type: nauc_ndcg_at_1000_std value: -13.827300000000001 - type: nauc_ndcg_at_1000_diff1 value: 43.9433 - type: nauc_map_at_1_max value: 14.691199999999998 - type: nauc_map_at_1_std value: -18.2481 - type: nauc_map_at_1_diff1 value: 51.82940000000001 - type: nauc_map_at_3_max value: 15.657099999999998 - type: nauc_map_at_3_std value: -18.253700000000002 - type: nauc_map_at_3_diff1 value: 47.749399999999994 - type: nauc_map_at_5_max value: 15.8683 - type: nauc_map_at_5_std value: -18.0718 - type: nauc_map_at_5_diff1 value: 
47.176899999999996 - type: nauc_map_at_10_max value: 16.0118 - type: nauc_map_at_10_std value: -17.7494 - type: nauc_map_at_10_diff1 value: 46.818799999999996 - type: nauc_map_at_20_max value: 16.1658 - type: nauc_map_at_20_std value: -17.552400000000002 - type: nauc_map_at_20_diff1 value: 46.694 - type: nauc_map_at_100_max value: 16.2407 - type: nauc_map_at_100_std value: -17.289099999999998 - type: nauc_map_at_100_diff1 value: 46.6325 - type: nauc_map_at_1000_max value: 16.2491 - type: nauc_map_at_1000_std value: -17.2655 - type: nauc_map_at_1000_diff1 value: 46.646300000000004 - type: nauc_recall_at_1_max value: 14.691199999999998 - type: nauc_recall_at_1_std value: -18.2481 - type: nauc_recall_at_1_diff1 value: 51.82940000000001 - type: nauc_recall_at_3_max value: 16.6167 - type: nauc_recall_at_3_std value: -18.0762 - type: nauc_recall_at_3_diff1 value: 42.9204 - type: nauc_recall_at_5_max value: 17.522299999999998 - type: nauc_recall_at_5_std value: -17.349899999999998 - type: nauc_recall_at_5_diff1 value: 40.5682 - type: nauc_recall_at_10_max value: 18.6573 - type: nauc_recall_at_10_std value: -14.9976 - type: nauc_recall_at_10_diff1 value: 37.7799 - type: nauc_recall_at_20_max value: 21.0226 - type: nauc_recall_at_20_std value: -11.8854 - type: nauc_recall_at_20_diff1 value: 35.3475 - type: nauc_recall_at_100_max value: 26.442300000000003 - type: nauc_recall_at_100_std value: 2.9998 - type: nauc_recall_at_100_diff1 value: 29.618699999999997 - type: nauc_recall_at_1000_max value: 36.3607 - type: nauc_recall_at_1000_std value: 24.0336 - type: nauc_recall_at_1000_diff1 value: 25.6114 - type: nauc_precision_at_1_max value: 14.691199999999998 - type: nauc_precision_at_1_std value: -18.2481 - type: nauc_precision_at_1_diff1 value: 51.82940000000001 - type: nauc_precision_at_3_max value: 16.6167 - type: nauc_precision_at_3_std value: -18.0762 - type: nauc_precision_at_3_diff1 value: 42.9204 - type: nauc_precision_at_5_max value: 17.522299999999998 - type: nauc_precision_at_5_std value: -17.349899999999998 - type: nauc_precision_at_5_diff1 value: 40.5682 - type: nauc_precision_at_10_max value: 18.6573 - type: nauc_precision_at_10_std value: -14.9976 - type: nauc_precision_at_10_diff1 value: 37.7799 - type: nauc_precision_at_20_max value: 21.0226 - type: nauc_precision_at_20_std value: -11.8854 - type: nauc_precision_at_20_diff1 value: 35.3475 - type: nauc_precision_at_100_max value: 26.442300000000003 - type: nauc_precision_at_100_std value: 2.9998 - type: nauc_precision_at_100_diff1 value: 29.618699999999997 - type: nauc_precision_at_1000_max value: 36.3607 - type: nauc_precision_at_1000_std value: 24.0336 - type: nauc_precision_at_1000_diff1 value: 25.6114 - type: nauc_mrr_at_1_max value: 14.691199999999998 - type: nauc_mrr_at_1_std value: -18.2481 - type: nauc_mrr_at_1_diff1 value: 51.82940000000001 - type: nauc_mrr_at_3_max value: 15.657099999999998 - type: nauc_mrr_at_3_std value: -18.253700000000002 - type: nauc_mrr_at_3_diff1 value: 47.749399999999994 - type: nauc_mrr_at_5_max value: 15.8683 - type: nauc_mrr_at_5_std value: -18.0718 - type: nauc_mrr_at_5_diff1 value: 47.176899999999996 - type: nauc_mrr_at_10_max value: 16.0118 - type: nauc_mrr_at_10_std value: -17.7494 - type: nauc_mrr_at_10_diff1 value: 46.818799999999996 - type: nauc_mrr_at_20_max value: 16.1658 - type: nauc_mrr_at_20_std value: -17.552400000000002 - type: nauc_mrr_at_20_diff1 value: 46.694 - type: nauc_mrr_at_100_max value: 16.2407 - type: nauc_mrr_at_100_std value: -17.289099999999998 - type: 
nauc_mrr_at_100_diff1 value: 46.6325 - type: nauc_mrr_at_1000_max value: 16.2491 - type: nauc_mrr_at_1000_std value: -17.2655 - type: nauc_mrr_at_1000_diff1 value: 46.646300000000004 - type: main_score value: 37.757000000000005 task: type: Retrieval - dataset: config: default name: MTEB CodeFeedbackST (default) revision: d213819e87aab9010628da8b73ab4eb337c89340 split: test type: CoIR-Retrieval/codefeedback-st metrics: - type: ndcg_at_1 value: 53.335 - type: ndcg_at_3 value: 64.78399999999999 - type: ndcg_at_5 value: 67.418 - type: ndcg_at_10 value: 69.425 - type: ndcg_at_20 value: 70.513 - type: ndcg_at_100 value: 71.709 - type: ndcg_at_1000 value: 72.139 - type: map_at_1 value: 53.335 - type: map_at_3 value: 62.0 - type: map_at_5 value: 63.467 - type: map_at_10 value: 64.306 - type: map_at_20 value: 64.608 - type: map_at_100 value: 64.776 - type: map_at_1000 value: 64.793 - type: recall_at_1 value: 53.335 - type: recall_at_3 value: 72.82600000000001 - type: recall_at_5 value: 79.199 - type: recall_at_10 value: 85.354 - type: recall_at_20 value: 89.628 - type: recall_at_100 value: 96.039 - type: recall_at_1000 value: 99.368 - type: precision_at_1 value: 53.335 - type: precision_at_3 value: 24.275 - type: precision_at_5 value: 15.840000000000002 - type: precision_at_10 value: 8.535 - type: precision_at_20 value: 4.481 - type: precision_at_100 value: 0.96 - type: precision_at_1000 value: 0.099 - type: mrr_at_1 value: 53.31249999999999 - type: mrr_at_3 value: 62.0217 - type: mrr_at_5 value: 63.489700000000006 - type: mrr_at_10 value: 64.3214 - type: mrr_at_20 value: 64.6232 - type: mrr_at_100 value: 64.7915 - type: mrr_at_1000 value: 64.8086 - type: nauc_ndcg_at_1_max value: 4.5411 - type: nauc_ndcg_at_1_std value: -27.4357 - type: nauc_ndcg_at_1_diff1 value: 70.331 - type: nauc_ndcg_at_3_max value: 9.293899999999999 - type: nauc_ndcg_at_3_std value: -30.4201 - type: nauc_ndcg_at_3_diff1 value: 64.90599999999999 - type: nauc_ndcg_at_5_max value: 9.725 - type: nauc_ndcg_at_5_std value: -30.8448 - type: nauc_ndcg_at_5_diff1 value: 64.2796 - type: nauc_ndcg_at_10_max value: 9.4302 - type: nauc_ndcg_at_10_std value: -30.5425 - type: nauc_ndcg_at_10_diff1 value: 64.5211 - type: nauc_ndcg_at_20_max value: 9.019 - type: nauc_ndcg_at_20_std value: -29.986800000000002 - type: nauc_ndcg_at_20_diff1 value: 64.7995 - type: nauc_ndcg_at_100_max value: 8.780100000000001 - type: nauc_ndcg_at_100_std value: -29.4587 - type: nauc_ndcg_at_100_diff1 value: 65.3485 - type: nauc_ndcg_at_1000_max value: 8.5933 - type: nauc_ndcg_at_1000_std value: -29.462300000000003 - type: nauc_ndcg_at_1000_diff1 value: 65.5513 - type: nauc_map_at_1_max value: 4.5411 - type: nauc_map_at_1_std value: -27.4357 - type: nauc_map_at_1_diff1 value: 70.331 - type: nauc_map_at_3_max value: 7.9982 - type: nauc_map_at_3_std value: -29.5826 - type: nauc_map_at_3_diff1 value: 66.2961 - type: nauc_map_at_5_max value: 8.1756 - type: nauc_map_at_5_std value: -29.765900000000002 - type: nauc_map_at_5_diff1 value: 66.0248 - type: nauc_map_at_10_max value: 8.0296 - type: nauc_map_at_10_std value: -29.6458 - type: nauc_map_at_10_diff1 value: 66.158 - type: nauc_map_at_20_max value: 7.919099999999999 - type: nauc_map_at_20_std value: -29.505799999999997 - type: nauc_map_at_20_diff1 value: 66.24029999999999 - type: nauc_map_at_100_max value: 7.8803 - type: nauc_map_at_100_std value: -29.442600000000002 - type: nauc_map_at_100_diff1 value: 66.3125 - type: nauc_map_at_1000_max value: 7.8752 - type: nauc_map_at_1000_std value: -29.438399999999998 - 
type: nauc_map_at_1000_diff1 value: 66.3195 - type: nauc_recall_at_1_max value: 4.5411 - type: nauc_recall_at_1_std value: -27.4357 - type: nauc_recall_at_1_diff1 value: 70.331 - type: nauc_recall_at_3_max value: 13.911000000000001 - type: nauc_recall_at_3_std value: -33.4167 - type: nauc_recall_at_3_diff1 value: 59.9986 - type: nauc_recall_at_5_max value: 16.401 - type: nauc_recall_at_5_std value: -35.5473 - type: nauc_recall_at_5_diff1 value: 56.781000000000006 - type: nauc_recall_at_10_max value: 17.2917 - type: nauc_recall_at_10_std value: -35.4908 - type: nauc_recall_at_10_diff1 value: 55.279199999999996 - type: nauc_recall_at_20_max value: 16.4243 - type: nauc_recall_at_20_std value: -32.2776 - type: nauc_recall_at_20_diff1 value: 54.4386 - type: nauc_recall_at_100_max value: 21.5949 - type: nauc_recall_at_100_std value: -19.9444 - type: nauc_recall_at_100_diff1 value: 54.3502 - type: nauc_recall_at_1000_max value: 35.8557 - type: nauc_recall_at_1000_std value: 18.242 - type: nauc_recall_at_1000_diff1 value: 50.969699999999996 - type: nauc_precision_at_1_max value: 4.5411 - type: nauc_precision_at_1_std value: -27.4357 - type: nauc_precision_at_1_diff1 value: 70.331 - type: nauc_precision_at_3_max value: 13.911000000000001 - type: nauc_precision_at_3_std value: -33.4167 - type: nauc_precision_at_3_diff1 value: 59.9986 - type: nauc_precision_at_5_max value: 16.401 - type: nauc_precision_at_5_std value: -35.5473 - type: nauc_precision_at_5_diff1 value: 56.781000000000006 - type: nauc_precision_at_10_max value: 17.2917 - type: nauc_precision_at_10_std value: -35.4908 - type: nauc_precision_at_10_diff1 value: 55.279199999999996 - type: nauc_precision_at_20_max value: 16.4243 - type: nauc_precision_at_20_std value: -32.2776 - type: nauc_precision_at_20_diff1 value: 54.4386 - type: nauc_precision_at_100_max value: 21.5949 - type: nauc_precision_at_100_std value: -19.9444 - type: nauc_precision_at_100_diff1 value: 54.3502 - type: nauc_precision_at_1000_max value: 35.8557 - type: nauc_precision_at_1000_std value: 18.242 - type: nauc_precision_at_1000_diff1 value: 50.969699999999996 - type: nauc_mrr_at_1_max value: 4.045 - type: nauc_mrr_at_1_std value: -27.371299999999998 - type: nauc_mrr_at_1_diff1 value: 70.3681 - type: nauc_mrr_at_3_max value: 7.7906 - type: nauc_mrr_at_3_std value: -29.488999999999997 - type: nauc_mrr_at_3_diff1 value: 66.2574 - type: nauc_mrr_at_5_max value: 7.8858999999999995 - type: nauc_mrr_at_5_std value: -29.7336 - type: nauc_mrr_at_5_diff1 value: 66.0274 - type: nauc_mrr_at_10_max value: 7.7456 - type: nauc_mrr_at_10_std value: -29.5912 - type: nauc_mrr_at_10_diff1 value: 66.1546 - type: nauc_mrr_at_20_max value: 7.6305 - type: nauc_mrr_at_20_std value: -29.4551 - type: nauc_mrr_at_20_diff1 value: 66.2342 - type: nauc_mrr_at_100_max value: 7.589799999999999 - type: nauc_mrr_at_100_std value: -29.392400000000002 - type: nauc_mrr_at_100_diff1 value: 66.3072 - type: nauc_mrr_at_1000_max value: 7.584499999999999 - type: nauc_mrr_at_1000_std value: -29.3881 - type: nauc_mrr_at_1000_diff1 value: 66.3142 - type: main_score value: 69.425 task: type: Retrieval - dataset: config: python name: MTEB CodeSearchNetCCRetrieval (python) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 39.395 - type: ndcg_at_3 value: 49.038 - type: ndcg_at_5 value: 51.398999999999994 - type: ndcg_at_10 value: 53.593999999999994 - type: ndcg_at_20 value: 55.013 - type: ndcg_at_100 value: 56.940999999999995 - 
type: ndcg_at_1000 value: 58.126999999999995 - type: map_at_1 value: 39.395 - type: map_at_3 value: 46.687 - type: map_at_5 value: 48.003 - type: map_at_10 value: 48.911 - type: map_at_20 value: 49.305 - type: map_at_100 value: 49.571 - type: map_at_1000 value: 49.612 - type: recall_at_1 value: 39.395 - type: recall_at_3 value: 55.832 - type: recall_at_5 value: 61.543000000000006 - type: recall_at_10 value: 68.313 - type: recall_at_20 value: 73.897 - type: recall_at_100 value: 84.308 - type: recall_at_1000 value: 93.866 - type: precision_at_1 value: 39.395 - type: precision_at_3 value: 18.611 - type: precision_at_5 value: 12.309000000000001 - type: precision_at_10 value: 6.8309999999999995 - type: precision_at_20 value: 3.695 - type: precision_at_100 value: 0.843 - type: precision_at_1000 value: 0.094 - type: mrr_at_1 value: 39.402100000000004 - type: mrr_at_3 value: 46.690799999999996 - type: mrr_at_5 value: 48.0073 - type: mrr_at_10 value: 48.9156 - type: mrr_at_20 value: 49.3097 - type: mrr_at_100 value: 49.5752 - type: mrr_at_1000 value: 49.6159 - type: nauc_ndcg_at_1_max value: 29.945899999999998 - type: nauc_ndcg_at_1_std value: -7.957 - type: nauc_ndcg_at_1_diff1 value: 55.8451 - type: nauc_ndcg_at_3_max value: 31.5415 - type: nauc_ndcg_at_3_std value: -8.2198 - type: nauc_ndcg_at_3_diff1 value: 51.75959999999999 - type: nauc_ndcg_at_5_max value: 31.6664 - type: nauc_ndcg_at_5_std value: -7.1463 - type: nauc_ndcg_at_5_diff1 value: 51.0188 - type: nauc_ndcg_at_10_max value: 31.616 - type: nauc_ndcg_at_10_std value: -6.575699999999999 - type: nauc_ndcg_at_10_diff1 value: 50.7344 - type: nauc_ndcg_at_20_max value: 31.626199999999997 - type: nauc_ndcg_at_20_std value: -6.0725 - type: nauc_ndcg_at_20_diff1 value: 50.77159999999999 - type: nauc_ndcg_at_100_max value: 31.6639 - type: nauc_ndcg_at_100_std value: -5.4948999999999995 - type: nauc_ndcg_at_100_diff1 value: 50.790800000000004 - type: nauc_ndcg_at_1000_max value: 31.5161 - type: nauc_ndcg_at_1000_std value: -5.748600000000001 - type: nauc_ndcg_at_1000_diff1 value: 51.062799999999996 - type: nauc_map_at_1_max value: 29.945899999999998 - type: nauc_map_at_1_std value: -7.957 - type: nauc_map_at_1_diff1 value: 55.8451 - type: nauc_map_at_3_max value: 31.1851 - type: nauc_map_at_3_std value: -8.1706 - type: nauc_map_at_3_diff1 value: 52.7057 - type: nauc_map_at_5_max value: 31.2519 - type: nauc_map_at_5_std value: -7.580299999999999 - type: nauc_map_at_5_diff1 value: 52.3165 - type: nauc_map_at_10_max value: 31.231399999999997 - type: nauc_map_at_10_std value: -7.360800000000001 - type: nauc_map_at_10_diff1 value: 52.23 - type: nauc_map_at_20_max value: 31.2307 - type: nauc_map_at_20_std value: -7.2384 - type: nauc_map_at_20_diff1 value: 52.2532 - type: nauc_map_at_100_max value: 31.2368 - type: nauc_map_at_100_std value: -7.1598 - type: nauc_map_at_100_diff1 value: 52.260600000000004 - type: nauc_map_at_1000_max value: 31.230900000000002 - type: nauc_map_at_1000_std value: -7.1662 - type: nauc_map_at_1000_diff1 value: 52.267300000000006 - type: nauc_recall_at_1_max value: 29.945899999999998 - type: nauc_recall_at_1_std value: -7.957 - type: nauc_recall_at_1_diff1 value: 55.8451 - type: nauc_recall_at_3_max value: 32.6121 - type: nauc_recall_at_3_std value: -8.363 - type: nauc_recall_at_3_diff1 value: 48.9016 - type: nauc_recall_at_5_max value: 33.0025 - type: nauc_recall_at_5_std value: -5.5725 - type: nauc_recall_at_5_diff1 value: 46.7352 - type: nauc_recall_at_10_max value: 32.9683 - type: nauc_recall_at_10_std value: 
-3.2460999999999998 - type: nauc_recall_at_10_diff1 value: 45.0443 - type: nauc_recall_at_20_max value: 33.2455 - type: nauc_recall_at_20_std value: -0.0093 - type: nauc_recall_at_20_diff1 value: 44.294200000000004 - type: nauc_recall_at_100_max value: 34.4004 - type: nauc_recall_at_100_std value: 8.996500000000001 - type: nauc_recall_at_100_diff1 value: 41.0779 - type: nauc_recall_at_1000_max value: 33.096399999999996 - type: nauc_recall_at_1000_std value: 19.266 - type: nauc_recall_at_1000_diff1 value: 38.2966 - type: nauc_precision_at_1_max value: 29.945899999999998 - type: nauc_precision_at_1_std value: -7.957 - type: nauc_precision_at_1_diff1 value: 55.8451 - type: nauc_precision_at_3_max value: 32.6121 - type: nauc_precision_at_3_std value: -8.363 - type: nauc_precision_at_3_diff1 value: 48.9016 - type: nauc_precision_at_5_max value: 33.0025 - type: nauc_precision_at_5_std value: -5.5725 - type: nauc_precision_at_5_diff1 value: 46.7352 - type: nauc_precision_at_10_max value: 32.9683 - type: nauc_precision_at_10_std value: -3.2460999999999998 - type: nauc_precision_at_10_diff1 value: 45.0443 - type: nauc_precision_at_20_max value: 33.2455 - type: nauc_precision_at_20_std value: -0.0093 - type: nauc_precision_at_20_diff1 value: 44.294200000000004 - type: nauc_precision_at_100_max value: 34.4004 - type: nauc_precision_at_100_std value: 8.996500000000001 - type: nauc_precision_at_100_diff1 value: 41.0779 - type: nauc_precision_at_1000_max value: 33.096399999999996 - type: nauc_precision_at_1000_std value: 19.266 - type: nauc_precision_at_1000_diff1 value: 38.2966 - type: nauc_mrr_at_1_max value: 29.9427 - type: nauc_mrr_at_1_std value: -7.9670000000000005 - type: nauc_mrr_at_1_diff1 value: 55.824799999999996 - type: nauc_mrr_at_3_max value: 31.1834 - type: nauc_mrr_at_3_std value: -8.175799999999999 - type: nauc_mrr_at_3_diff1 value: 52.6952 - type: nauc_mrr_at_5_max value: 31.2515 - type: nauc_mrr_at_5_std value: -7.5835 - type: nauc_mrr_at_5_diff1 value: 52.303599999999996 - type: nauc_mrr_at_10_max value: 31.2284 - type: nauc_mrr_at_10_std value: -7.3647 - type: nauc_mrr_at_10_diff1 value: 52.2177 - type: nauc_mrr_at_20_max value: 31.2274 - type: nauc_mrr_at_20_std value: -7.243399999999999 - type: nauc_mrr_at_20_diff1 value: 52.2417 - type: nauc_mrr_at_100_max value: 31.2336 - type: nauc_mrr_at_100_std value: -7.1640999999999995 - type: nauc_mrr_at_100_diff1 value: 52.2482 - type: nauc_mrr_at_1000_max value: 31.227700000000002 - type: nauc_mrr_at_1000_std value: -7.1705000000000005 - type: nauc_mrr_at_1000_diff1 value: 52.254900000000006 - type: main_score value: 53.593999999999994 task: type: Retrieval - dataset: config: javascript name: MTEB CodeSearchNetCCRetrieval (javascript) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 39.593 - type: ndcg_at_3 value: 48.759 - type: ndcg_at_5 value: 51.073 - type: ndcg_at_10 value: 53.1 - type: ndcg_at_20 value: 54.230999999999995 - type: ndcg_at_100 value: 56.289 - type: ndcg_at_1000 value: 57.67400000000001 - type: map_at_1 value: 39.593 - type: map_at_3 value: 46.536 - type: map_at_5 value: 47.826 - type: map_at_10 value: 48.676 - type: map_at_20 value: 48.983 - type: map_at_100 value: 49.268 - type: map_at_1000 value: 49.313 - type: recall_at_1 value: 39.593 - type: recall_at_3 value: 55.181000000000004 - type: recall_at_5 value: 60.772000000000006 - type: recall_at_10 value: 66.971 - type: recall_at_20 value: 71.468 - type: recall_at_100 value: 
82.55799999999999 - type: recall_at_1000 value: 93.83200000000001 - type: precision_at_1 value: 39.593 - type: precision_at_3 value: 18.394 - type: precision_at_5 value: 12.154 - type: precision_at_10 value: 6.697 - type: precision_at_20 value: 3.573 - type: precision_at_100 value: 0.826 - type: precision_at_1000 value: 0.094 - type: mrr_at_1 value: 39.5624 - type: mrr_at_3 value: 46.5158 - type: mrr_at_5 value: 47.8056 - type: mrr_at_10 value: 48.654799999999994 - type: mrr_at_20 value: 48.9616 - type: mrr_at_100 value: 49.2469 - type: mrr_at_1000 value: 49.2923 - type: nauc_ndcg_at_1_max value: 26.582099999999997 - type: nauc_ndcg_at_1_std value: -14.751900000000001 - type: nauc_ndcg_at_1_diff1 value: 54.9795 - type: nauc_ndcg_at_3_max value: 30.000700000000002 - type: nauc_ndcg_at_3_std value: -13.107299999999999 - type: nauc_ndcg_at_3_diff1 value: 51.7972 - type: nauc_ndcg_at_5_max value: 29.4468 - type: nauc_ndcg_at_5_std value: -13.3189 - type: nauc_ndcg_at_5_diff1 value: 51.0062 - type: nauc_ndcg_at_10_max value: 28.6629 - type: nauc_ndcg_at_10_std value: -13.900000000000002 - type: nauc_ndcg_at_10_diff1 value: 50.4771 - type: nauc_ndcg_at_20_max value: 28.558600000000002 - type: nauc_ndcg_at_20_std value: -13.793 - type: nauc_ndcg_at_20_diff1 value: 50.720299999999995 - type: nauc_ndcg_at_100_max value: 28.7124 - type: nauc_ndcg_at_100_std value: -13.133000000000001 - type: nauc_ndcg_at_100_diff1 value: 50.7983 - type: nauc_ndcg_at_1000_max value: 28.4906 - type: nauc_ndcg_at_1000_std value: -13.5678 - type: nauc_ndcg_at_1000_diff1 value: 51.1172 - type: nauc_map_at_1_max value: 26.582099999999997 - type: nauc_map_at_1_std value: -14.751900000000001 - type: nauc_map_at_1_diff1 value: 54.9795 - type: nauc_map_at_3_max value: 29.191899999999997 - type: nauc_map_at_3_std value: -13.565299999999999 - type: nauc_map_at_3_diff1 value: 52.5372 - type: nauc_map_at_5_max value: 28.865099999999998 - type: nauc_map_at_5_std value: -13.6911 - type: nauc_map_at_5_diff1 value: 52.12520000000001 - type: nauc_map_at_10_max value: 28.5526 - type: nauc_map_at_10_std value: -13.9255 - type: nauc_map_at_10_diff1 value: 51.931400000000004 - type: nauc_map_at_20_max value: 28.520200000000003 - type: nauc_map_at_20_std value: -13.8934 - type: nauc_map_at_20_diff1 value: 51.991299999999995 - type: nauc_map_at_100_max value: 28.5184 - type: nauc_map_at_100_std value: -13.8399 - type: nauc_map_at_100_diff1 value: 52.0024 - type: nauc_map_at_1000_max value: 28.512500000000003 - type: nauc_map_at_1000_std value: -13.851700000000001 - type: nauc_map_at_1000_diff1 value: 52.0139 - type: nauc_recall_at_1_max value: 26.582099999999997 - type: nauc_recall_at_1_std value: -14.751900000000001 - type: nauc_recall_at_1_diff1 value: 54.9795 - type: nauc_recall_at_3_max value: 32.443 - type: nauc_recall_at_3_std value: -11.6927 - type: nauc_recall_at_3_diff1 value: 49.568400000000004 - type: nauc_recall_at_5_max value: 31.2258 - type: nauc_recall_at_5_std value: -12.1296 - type: nauc_recall_at_5_diff1 value: 47.3057 - type: nauc_recall_at_10_max value: 28.561999999999998 - type: nauc_recall_at_10_std value: -14.103499999999999 - type: nauc_recall_at_10_diff1 value: 44.9228 - type: nauc_recall_at_20_max value: 28.0738 - type: nauc_recall_at_20_std value: -13.632 - type: nauc_recall_at_20_diff1 value: 45.6569 - type: nauc_recall_at_100_max value: 29.9618 - type: nauc_recall_at_100_std value: -6.2382 - type: nauc_recall_at_100_diff1 value: 44.1378 - type: nauc_recall_at_1000_max value: 23.4062 - type: 
nauc_recall_at_1000_std value: -11.6326 - type: nauc_recall_at_1000_diff1 value: 45.130199999999995 - type: nauc_precision_at_1_max value: 26.582099999999997 - type: nauc_precision_at_1_std value: -14.751900000000001 - type: nauc_precision_at_1_diff1 value: 54.9795 - type: nauc_precision_at_3_max value: 32.443 - type: nauc_precision_at_3_std value: -11.6927 - type: nauc_precision_at_3_diff1 value: 49.568400000000004 - type: nauc_precision_at_5_max value: 31.2258 - type: nauc_precision_at_5_std value: -12.1296 - type: nauc_precision_at_5_diff1 value: 47.3057 - type: nauc_precision_at_10_max value: 28.561999999999998 - type: nauc_precision_at_10_std value: -14.103499999999999 - type: nauc_precision_at_10_diff1 value: 44.9228 - type: nauc_precision_at_20_max value: 28.0738 - type: nauc_precision_at_20_std value: -13.632 - type: nauc_precision_at_20_diff1 value: 45.6569 - type: nauc_precision_at_100_max value: 29.9618 - type: nauc_precision_at_100_std value: -6.2382 - type: nauc_precision_at_100_diff1 value: 44.1378 - type: nauc_precision_at_1000_max value: 23.4062 - type: nauc_precision_at_1000_std value: -11.6326 - type: nauc_precision_at_1000_diff1 value: 45.130199999999995 - type: nauc_mrr_at_1_max value: 26.571499999999997 - type: nauc_mrr_at_1_std value: -14.9002 - type: nauc_mrr_at_1_diff1 value: 55.071400000000004 - type: nauc_mrr_at_3_max value: 29.1956 - type: nauc_mrr_at_3_std value: -13.6331 - type: nauc_mrr_at_3_diff1 value: 52.59439999999999 - type: nauc_mrr_at_5_max value: 28.8688 - type: nauc_mrr_at_5_std value: -13.7599 - type: nauc_mrr_at_5_diff1 value: 52.1832 - type: nauc_mrr_at_10_max value: 28.556199999999997 - type: nauc_mrr_at_10_std value: -13.9924 - type: nauc_mrr_at_10_diff1 value: 51.9865 - type: nauc_mrr_at_20_max value: 28.523799999999998 - type: nauc_mrr_at_20_std value: -13.960700000000001 - type: nauc_mrr_at_20_diff1 value: 52.0466 - type: nauc_mrr_at_100_max value: 28.522 - type: nauc_mrr_at_100_std value: -13.9076 - type: nauc_mrr_at_100_diff1 value: 52.058099999999996 - type: nauc_mrr_at_1000_max value: 28.5161 - type: nauc_mrr_at_1000_std value: -13.919500000000001 - type: nauc_mrr_at_1000_diff1 value: 52.0697 - type: main_score value: 53.1 task: type: Retrieval - dataset: config: go name: MTEB CodeSearchNetCCRetrieval (go) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 30.459999999999997 - type: ndcg_at_3 value: 37.88 - type: ndcg_at_5 value: 40.11 - type: ndcg_at_10 value: 42.094 - type: ndcg_at_20 value: 43.683 - type: ndcg_at_100 value: 45.998 - type: ndcg_at_1000 value: 47.723 - type: map_at_1 value: 30.459999999999997 - type: map_at_3 value: 36.046 - type: map_at_5 value: 37.285000000000004 - type: map_at_10 value: 38.108 - type: map_at_20 value: 38.546 - type: map_at_100 value: 38.859 - type: map_at_1000 value: 38.917 - type: recall_at_1 value: 30.459999999999997 - type: recall_at_3 value: 43.191 - type: recall_at_5 value: 48.596000000000004 - type: recall_at_10 value: 54.716 - type: recall_at_20 value: 60.983 - type: recall_at_100 value: 73.566 - type: recall_at_1000 value: 87.515 - type: precision_at_1 value: 30.459999999999997 - type: precision_at_3 value: 14.396999999999998 - type: precision_at_5 value: 9.719 - type: precision_at_10 value: 5.4719999999999995 - type: precision_at_20 value: 3.049 - type: precision_at_100 value: 0.736 - type: precision_at_1000 value: 0.08800000000000001 - type: mrr_at_1 value: 30.448199999999996 - type: mrr_at_3 value: 36.042 - 
type: mrr_at_5 value: 37.2763 - type: mrr_at_10 value: 38.1013 - type: mrr_at_20 value: 38.5373 - type: mrr_at_100 value: 38.8506 - type: mrr_at_1000 value: 38.9093 - type: nauc_ndcg_at_1_max value: 27.284999999999997 - type: nauc_ndcg_at_1_std value: -6.6476999999999995 - type: nauc_ndcg_at_1_diff1 value: 50.871500000000005 - type: nauc_ndcg_at_3_max value: 26.6017 - type: nauc_ndcg_at_3_std value: -7.6026 - type: nauc_ndcg_at_3_diff1 value: 46.768 - type: nauc_ndcg_at_5_max value: 26.2865 - type: nauc_ndcg_at_5_std value: -7.3601 - type: nauc_ndcg_at_5_diff1 value: 45.7969 - type: nauc_ndcg_at_10_max value: 25.746599999999997 - type: nauc_ndcg_at_10_std value: -7.4333 - type: nauc_ndcg_at_10_diff1 value: 45.4115 - type: nauc_ndcg_at_20_max value: 25.5118 - type: nauc_ndcg_at_20_std value: -6.9322 - type: nauc_ndcg_at_20_diff1 value: 45.0598 - type: nauc_ndcg_at_100_max value: 25.309900000000003 - type: nauc_ndcg_at_100_std value: -6.0600000000000005 - type: nauc_ndcg_at_100_diff1 value: 44.8825 - type: nauc_ndcg_at_1000_max value: 25.521700000000003 - type: nauc_ndcg_at_1000_std value: -5.9789 - type: nauc_ndcg_at_1000_diff1 value: 45.2513 - type: nauc_map_at_1_max value: 27.284999999999997 - type: nauc_map_at_1_std value: -6.6476999999999995 - type: nauc_map_at_1_diff1 value: 50.871500000000005 - type: nauc_map_at_3_max value: 26.7721 - type: nauc_map_at_3_std value: -7.452300000000001 - type: nauc_map_at_3_diff1 value: 47.7211 - type: nauc_map_at_5_max value: 26.600600000000004 - type: nauc_map_at_5_std value: -7.3378 - type: nauc_map_at_5_diff1 value: 47.1879 - type: nauc_map_at_10_max value: 26.372 - type: nauc_map_at_10_std value: -7.3735 - type: nauc_map_at_10_diff1 value: 47.0298 - type: nauc_map_at_20_max value: 26.3071 - type: nauc_map_at_20_std value: -7.2452000000000005 - type: nauc_map_at_20_diff1 value: 46.9294 - type: nauc_map_at_100_max value: 26.281100000000002 - type: nauc_map_at_100_std value: -7.1155 - type: nauc_map_at_100_diff1 value: 46.9054 - type: nauc_map_at_1000_max value: 26.2903 - type: nauc_map_at_1000_std value: -7.1089 - type: nauc_map_at_1000_diff1 value: 46.9182 - type: nauc_recall_at_1_max value: 27.284999999999997 - type: nauc_recall_at_1_std value: -6.6476999999999995 - type: nauc_recall_at_1_diff1 value: 50.871500000000005 - type: nauc_recall_at_3_max value: 26.1146 - type: nauc_recall_at_3_std value: -7.9985 - type: nauc_recall_at_3_diff1 value: 44.0707 - type: nauc_recall_at_5_max value: 25.3292 - type: nauc_recall_at_5_std value: -7.331799999999999 - type: nauc_recall_at_5_diff1 value: 41.6571 - type: nauc_recall_at_10_max value: 23.6012 - type: nauc_recall_at_10_std value: -7.5294 - type: nauc_recall_at_10_diff1 value: 40.244099999999996 - type: nauc_recall_at_20_max value: 22.453300000000002 - type: nauc_recall_at_20_std value: -5.3024000000000004 - type: nauc_recall_at_20_diff1 value: 38.4242 - type: nauc_recall_at_100_max value: 20.069100000000002 - type: nauc_recall_at_100_std value: 1.4581 - type: nauc_recall_at_100_diff1 value: 35.1775 - type: nauc_recall_at_1000_max value: 19.4385 - type: nauc_recall_at_1000_std value: 9.0112 - type: nauc_recall_at_1000_diff1 value: 34.138000000000005 - type: nauc_precision_at_1_max value: 27.284999999999997 - type: nauc_precision_at_1_std value: -6.6476999999999995 - type: nauc_precision_at_1_diff1 value: 50.871500000000005 - type: nauc_precision_at_3_max value: 26.1146 - type: nauc_precision_at_3_std value: -7.9985 - type: nauc_precision_at_3_diff1 value: 44.0707 - type: nauc_precision_at_5_max value: 
25.3292 - type: nauc_precision_at_5_std value: -7.331799999999999 - type: nauc_precision_at_5_diff1 value: 41.6571 - type: nauc_precision_at_10_max value: 23.6012 - type: nauc_precision_at_10_std value: -7.5294 - type: nauc_precision_at_10_diff1 value: 40.244099999999996 - type: nauc_precision_at_20_max value: 22.453300000000002 - type: nauc_precision_at_20_std value: -5.3024000000000004 - type: nauc_precision_at_20_diff1 value: 38.4242 - type: nauc_precision_at_100_max value: 20.069100000000002 - type: nauc_precision_at_100_std value: 1.4581 - type: nauc_precision_at_100_diff1 value: 35.1775 - type: nauc_precision_at_1000_max value: 19.4385 - type: nauc_precision_at_1000_std value: 9.0112 - type: nauc_precision_at_1000_diff1 value: 34.138000000000005 - type: nauc_mrr_at_1_max value: 27.334000000000003 - type: nauc_mrr_at_1_std value: -6.5517 - type: nauc_mrr_at_1_diff1 value: 50.9102 - type: nauc_mrr_at_3_max value: 26.807199999999998 - type: nauc_mrr_at_3_std value: -7.436800000000001 - type: nauc_mrr_at_3_diff1 value: 47.7425 - type: nauc_mrr_at_5_max value: 26.6194 - type: nauc_mrr_at_5_std value: -7.3031 - type: nauc_mrr_at_5_diff1 value: 47.2053 - type: nauc_mrr_at_10_max value: 26.3924 - type: nauc_mrr_at_10_std value: -7.324700000000001 - type: nauc_mrr_at_10_diff1 value: 47.051500000000004 - type: nauc_mrr_at_20_max value: 26.3274 - type: nauc_mrr_at_20_std value: -7.209899999999999 - type: nauc_mrr_at_20_diff1 value: 46.953 - type: nauc_mrr_at_100_max value: 26.3019 - type: nauc_mrr_at_100_std value: -7.0785 - type: nauc_mrr_at_100_diff1 value: 46.9298 - type: nauc_mrr_at_1000_max value: 26.311 - type: nauc_mrr_at_1000_std value: -7.0719 - type: nauc_mrr_at_1000_diff1 value: 46.942499999999995 - type: main_score value: 42.094 task: type: Retrieval - dataset: config: ruby name: MTEB CodeSearchNetCCRetrieval (ruby) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 37.827 - type: ndcg_at_3 value: 47.599000000000004 - type: ndcg_at_5 value: 49.687 - type: ndcg_at_10 value: 51.686 - type: ndcg_at_20 value: 53.018 - type: ndcg_at_100 value: 54.75600000000001 - type: ndcg_at_1000 value: 56.196 - type: map_at_1 value: 37.827 - type: map_at_3 value: 45.242 - type: map_at_5 value: 46.400000000000006 - type: map_at_10 value: 47.223 - type: map_at_20 value: 47.593 - type: map_at_100 value: 47.824 - type: map_at_1000 value: 47.878 - type: recall_at_1 value: 37.827 - type: recall_at_3 value: 54.400999999999996 - type: recall_at_5 value: 59.477000000000004 - type: recall_at_10 value: 65.66199999999999 - type: recall_at_20 value: 70.896 - type: recall_at_100 value: 80.41199999999999 - type: recall_at_1000 value: 91.753 - type: precision_at_1 value: 37.827 - type: precision_at_3 value: 18.134 - type: precision_at_5 value: 11.895 - type: precision_at_10 value: 6.566 - type: precision_at_20 value: 3.5450000000000004 - type: precision_at_100 value: 0.804 - type: precision_at_1000 value: 0.092 - type: mrr_at_1 value: 37.8271 - type: mrr_at_3 value: 45.2154 - type: mrr_at_5 value: 46.3931 - type: mrr_at_10 value: 47.2166 - type: mrr_at_20 value: 47.5869 - type: mrr_at_100 value: 47.8167 - type: mrr_at_1000 value: 47.8715 - type: nauc_ndcg_at_1_max value: 34.1998 - type: nauc_ndcg_at_1_std value: -15.7415 - type: nauc_ndcg_at_1_diff1 value: 61.8572 - type: nauc_ndcg_at_3_max value: 33.566 - type: nauc_ndcg_at_3_std value: -18.0058 - type: nauc_ndcg_at_3_diff1 value: 54.5929 - type: nauc_ndcg_at_5_max value: 34.0447 - 
type: nauc_ndcg_at_5_std value: -17.3914 - type: nauc_ndcg_at_5_diff1 value: 53.980399999999996 - type: nauc_ndcg_at_10_max value: 34.0521 - type: nauc_ndcg_at_10_std value: -17.298099999999998 - type: nauc_ndcg_at_10_diff1 value: 53.63830000000001 - type: nauc_ndcg_at_20_max value: 34.076499999999996 - type: nauc_ndcg_at_20_std value: -17.1978 - type: nauc_ndcg_at_20_diff1 value: 53.3739 - type: nauc_ndcg_at_100_max value: 33.9961 - type: nauc_ndcg_at_100_std value: -17.0232 - type: nauc_ndcg_at_100_diff1 value: 53.8714 - type: nauc_ndcg_at_1000_max value: 34.0269 - type: nauc_ndcg_at_1000_std value: -16.6124 - type: nauc_ndcg_at_1000_diff1 value: 54.286199999999994 - type: nauc_map_at_1_max value: 34.1998 - type: nauc_map_at_1_std value: -15.7415 - type: nauc_map_at_1_diff1 value: 61.8572 - type: nauc_map_at_3_max value: 33.8395 - type: nauc_map_at_3_std value: -17.529 - type: nauc_map_at_3_diff1 value: 56.4065 - type: nauc_map_at_5_max value: 34.1343 - type: nauc_map_at_5_std value: -17.1732 - type: nauc_map_at_5_diff1 value: 56.1246 - type: nauc_map_at_10_max value: 34.1717 - type: nauc_map_at_10_std value: -17.1179 - type: nauc_map_at_10_diff1 value: 56.041399999999996 - type: nauc_map_at_20_max value: 34.1895 - type: nauc_map_at_20_std value: -17.077 - type: nauc_map_at_20_diff1 value: 55.96489999999999 - type: nauc_map_at_100_max value: 34.1922 - type: nauc_map_at_100_std value: -17.0664 - type: nauc_map_at_100_diff1 value: 56.0487 - type: nauc_map_at_1000_max value: 34.186 - type: nauc_map_at_1000_std value: -17.0498 - type: nauc_map_at_1000_diff1 value: 56.0623 - type: nauc_recall_at_1_max value: 34.1998 - type: nauc_recall_at_1_std value: -15.7415 - type: nauc_recall_at_1_diff1 value: 61.8572 - type: nauc_recall_at_3_max value: 32.6911 - type: nauc_recall_at_3_std value: -19.4073 - type: nauc_recall_at_3_diff1 value: 49.1188 - type: nauc_recall_at_5_max value: 33.7416 - type: nauc_recall_at_5_std value: -17.965700000000002 - type: nauc_recall_at_5_diff1 value: 47.0821 - type: nauc_recall_at_10_max value: 33.5209 - type: nauc_recall_at_10_std value: -17.7965 - type: nauc_recall_at_10_diff1 value: 44.8874 - type: nauc_recall_at_20_max value: 33.4757 - type: nauc_recall_at_20_std value: -17.4921 - type: nauc_recall_at_20_diff1 value: 42.747 - type: nauc_recall_at_100_max value: 32.2069 - type: nauc_recall_at_100_std value: -15.6244 - type: nauc_recall_at_100_diff1 value: 43.0441 - type: nauc_recall_at_1000_max value: 32.428000000000004 - type: nauc_recall_at_1000_std value: -2.6172 - type: nauc_recall_at_1000_diff1 value: 42.1384 - type: nauc_precision_at_1_max value: 34.1998 - type: nauc_precision_at_1_std value: -15.7415 - type: nauc_precision_at_1_diff1 value: 61.8572 - type: nauc_precision_at_3_max value: 32.6911 - type: nauc_precision_at_3_std value: -19.4073 - type: nauc_precision_at_3_diff1 value: 49.1188 - type: nauc_precision_at_5_max value: 33.7416 - type: nauc_precision_at_5_std value: -17.965700000000002 - type: nauc_precision_at_5_diff1 value: 47.0821 - type: nauc_precision_at_10_max value: 33.5209 - type: nauc_precision_at_10_std value: -17.7965 - type: nauc_precision_at_10_diff1 value: 44.8874 - type: nauc_precision_at_20_max value: 33.4757 - type: nauc_precision_at_20_std value: -17.4921 - type: nauc_precision_at_20_diff1 value: 42.747 - type: nauc_precision_at_100_max value: 32.2069 - type: nauc_precision_at_100_std value: -15.6244 - type: nauc_precision_at_100_diff1 value: 43.0441 - type: nauc_precision_at_1000_max value: 32.428000000000004 - type: 
nauc_precision_at_1000_std value: -2.6172 - type: nauc_precision_at_1000_diff1 value: 42.1384 - type: nauc_mrr_at_1_max value: 34.5467 - type: nauc_mrr_at_1_std value: -15.676499999999999 - type: nauc_mrr_at_1_diff1 value: 61.8572 - type: nauc_mrr_at_3_max value: 34.0355 - type: nauc_mrr_at_3_std value: -17.448900000000002 - type: nauc_mrr_at_3_diff1 value: 56.4005 - type: nauc_mrr_at_5_max value: 34.319100000000006 - type: nauc_mrr_at_5_std value: -17.1276 - type: nauc_mrr_at_5_diff1 value: 56.1231 - type: nauc_mrr_at_10_max value: 34.3588 - type: nauc_mrr_at_10_std value: -17.0717 - type: nauc_mrr_at_10_diff1 value: 56.03979999999999 - type: nauc_mrr_at_20_max value: 34.3778 - type: nauc_mrr_at_20_std value: -17.0305 - type: nauc_mrr_at_20_diff1 value: 55.96339999999999 - type: nauc_mrr_at_100_max value: 34.3812 - type: nauc_mrr_at_100_std value: -17.022599999999997 - type: nauc_mrr_at_100_diff1 value: 56.0469 - type: nauc_mrr_at_1000_max value: 34.375 - type: nauc_mrr_at_1000_std value: -17.0037 - type: nauc_mrr_at_1000_diff1 value: 56.0608 - type: main_score value: 51.686 task: type: Retrieval - dataset: config: java name: MTEB CodeSearchNetCCRetrieval (java) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 39.744 - type: ndcg_at_3 value: 48.465 - type: ndcg_at_5 value: 50.615 - type: ndcg_at_10 value: 52.544000000000004 - type: ndcg_at_20 value: 53.864999999999995 - type: ndcg_at_100 value: 55.806 - type: ndcg_at_1000 value: 57.082 - type: map_at_1 value: 39.744 - type: map_at_3 value: 46.346 - type: map_at_5 value: 47.538000000000004 - type: map_at_10 value: 48.333999999999996 - type: map_at_20 value: 48.699999999999996 - type: map_at_100 value: 48.97 - type: map_at_1000 value: 49.014 - type: recall_at_1 value: 39.744 - type: recall_at_3 value: 54.586999999999996 - type: recall_at_5 value: 59.80799999999999 - type: recall_at_10 value: 65.778 - type: recall_at_20 value: 70.97200000000001 - type: recall_at_100 value: 81.415 - type: recall_at_1000 value: 91.702 - type: precision_at_1 value: 39.744 - type: precision_at_3 value: 18.196 - type: precision_at_5 value: 11.962 - type: precision_at_10 value: 6.578 - type: precision_at_20 value: 3.549 - type: precision_at_100 value: 0.814 - type: precision_at_1000 value: 0.092 - type: mrr_at_1 value: 39.7901 - type: mrr_at_3 value: 46.367000000000004 - type: mrr_at_5 value: 47.556799999999996 - type: mrr_at_10 value: 48.3531 - type: mrr_at_20 value: 48.7206 - type: mrr_at_100 value: 48.9901 - type: mrr_at_1000 value: 49.034 - type: nauc_ndcg_at_1_max value: 31.1431 - type: nauc_ndcg_at_1_std value: -10.407399999999999 - type: nauc_ndcg_at_1_diff1 value: 56.6466 - type: nauc_ndcg_at_3_max value: 33.022800000000004 - type: nauc_ndcg_at_3_std value: -9.5046 - type: nauc_ndcg_at_3_diff1 value: 52.7916 - type: nauc_ndcg_at_5_max value: 33.1721 - type: nauc_ndcg_at_5_std value: -9.0365 - type: nauc_ndcg_at_5_diff1 value: 52.317400000000006 - type: nauc_ndcg_at_10_max value: 33.1837 - type: nauc_ndcg_at_10_std value: -8.4008 - type: nauc_ndcg_at_10_diff1 value: 52.007999999999996 - type: nauc_ndcg_at_20_max value: 33.024 - type: nauc_ndcg_at_20_std value: -7.9246 - type: nauc_ndcg_at_20_diff1 value: 51.9078 - type: nauc_ndcg_at_100_max value: 32.962599999999995 - type: nauc_ndcg_at_100_std value: -7.4719 - type: nauc_ndcg_at_100_diff1 value: 51.94180000000001 - type: nauc_ndcg_at_1000_max value: 33.1905 - type: nauc_ndcg_at_1000_std value: -7.295599999999999 - type: 
nauc_ndcg_at_1000_diff1 value: 52.351099999999995 - type: nauc_map_at_1_max value: 31.1431 - type: nauc_map_at_1_std value: -10.407399999999999 - type: nauc_map_at_1_diff1 value: 56.6466 - type: nauc_map_at_3_max value: 32.5713 - type: nauc_map_at_3_std value: -9.734 - type: nauc_map_at_3_diff1 value: 53.703599999999994 - type: nauc_map_at_5_max value: 32.6494 - type: nauc_map_at_5_std value: -9.4813 - type: nauc_map_at_5_diff1 value: 53.4567 - type: nauc_map_at_10_max value: 32.664100000000005 - type: nauc_map_at_10_std value: -9.225999999999999 - type: nauc_map_at_10_diff1 value: 53.3589 - type: nauc_map_at_20_max value: 32.6136 - type: nauc_map_at_20_std value: -9.107899999999999 - type: nauc_map_at_20_diff1 value: 53.337 - type: nauc_map_at_100_max value: 32.6036 - type: nauc_map_at_100_std value: -9.0547 - type: nauc_map_at_100_diff1 value: 53.35339999999999 - type: nauc_map_at_1000_max value: 32.610299999999995 - type: nauc_map_at_1000_std value: -9.0493 - type: nauc_map_at_1000_diff1 value: 53.3656 - type: nauc_recall_at_1_max value: 31.1431 - type: nauc_recall_at_1_std value: -10.407399999999999 - type: nauc_recall_at_1_diff1 value: 56.6466 - type: nauc_recall_at_3_max value: 34.3846 - type: nauc_recall_at_3_std value: -8.8071 - type: nauc_recall_at_3_diff1 value: 50.047 - type: nauc_recall_at_5_max value: 34.8431 - type: nauc_recall_at_5_std value: -7.550999999999999 - type: nauc_recall_at_5_diff1 value: 48.6504 - type: nauc_recall_at_10_max value: 34.9686 - type: nauc_recall_at_10_std value: -5.1544 - type: nauc_recall_at_10_diff1 value: 47.0462 - type: nauc_recall_at_20_max value: 34.441300000000005 - type: nauc_recall_at_20_std value: -2.3698 - type: nauc_recall_at_20_diff1 value: 45.9903 - type: nauc_recall_at_100_max value: 34.4855 - type: nauc_recall_at_100_std value: 4.2675 - type: nauc_recall_at_100_diff1 value: 43.5966 - type: nauc_recall_at_1000_max value: 42.692600000000006 - type: nauc_recall_at_1000_std value: 21.8632 - type: nauc_recall_at_1000_diff1 value: 46.5143 - type: nauc_precision_at_1_max value: 31.1431 - type: nauc_precision_at_1_std value: -10.407399999999999 - type: nauc_precision_at_1_diff1 value: 56.6466 - type: nauc_precision_at_3_max value: 34.3846 - type: nauc_precision_at_3_std value: -8.8071 - type: nauc_precision_at_3_diff1 value: 50.047 - type: nauc_precision_at_5_max value: 34.8431 - type: nauc_precision_at_5_std value: -7.550999999999999 - type: nauc_precision_at_5_diff1 value: 48.6504 - type: nauc_precision_at_10_max value: 34.9686 - type: nauc_precision_at_10_std value: -5.1544 - type: nauc_precision_at_10_diff1 value: 47.0462 - type: nauc_precision_at_20_max value: 34.441300000000005 - type: nauc_precision_at_20_std value: -2.3698 - type: nauc_precision_at_20_diff1 value: 45.9903 - type: nauc_precision_at_100_max value: 34.4855 - type: nauc_precision_at_100_std value: 4.2675 - type: nauc_precision_at_100_diff1 value: 43.5966 - type: nauc_precision_at_1000_max value: 42.692600000000006 - type: nauc_precision_at_1000_std value: 21.8632 - type: nauc_precision_at_1000_diff1 value: 46.5143 - type: nauc_mrr_at_1_max value: 31.1816 - type: nauc_mrr_at_1_std value: -10.2945 - type: nauc_mrr_at_1_diff1 value: 56.5084 - type: nauc_mrr_at_3_max value: 32.609300000000005 - type: nauc_mrr_at_3_std value: -9.6538 - type: nauc_mrr_at_3_diff1 value: 53.6187 - type: nauc_mrr_at_5_max value: 32.6863 - type: nauc_mrr_at_5_std value: -9.3972 - type: nauc_mrr_at_5_diff1 value: 53.378400000000006 - type: nauc_mrr_at_10_max value: 32.697700000000005 - type: 
nauc_mrr_at_10_std value: -9.1456 - type: nauc_mrr_at_10_diff1 value: 53.2796 - type: nauc_mrr_at_20_max value: 32.6496 - type: nauc_mrr_at_20_std value: -9.0244 - type: nauc_mrr_at_20_diff1 value: 53.257600000000004 - type: nauc_mrr_at_100_max value: 32.6402 - type: nauc_mrr_at_100_std value: -8.970799999999999 - type: nauc_mrr_at_100_diff1 value: 53.274100000000004 - type: nauc_mrr_at_1000_max value: 32.647 - type: nauc_mrr_at_1000_std value: -8.9653 - type: nauc_mrr_at_1000_diff1 value: 53.286100000000005 - type: main_score value: 52.544000000000004 task: type: Retrieval - dataset: config: php name: MTEB CodeSearchNetCCRetrieval (php) revision: 6e1effa2c03723c5fde48ee912b5ee08d4f211e8 split: test type: CoIR-Retrieval/CodeSearchNet-ccr metrics: - type: ndcg_at_1 value: 29.685 - type: ndcg_at_3 value: 37.448 - type: ndcg_at_5 value: 39.781 - type: ndcg_at_10 value: 41.814 - type: ndcg_at_20 value: 43.333 - type: ndcg_at_100 value: 45.664 - type: ndcg_at_1000 value: 47.536 - type: map_at_1 value: 29.685 - type: map_at_3 value: 35.545 - type: map_at_5 value: 36.839 - type: map_at_10 value: 37.682 - type: map_at_20 value: 38.099 - type: map_at_100 value: 38.415 - type: map_at_1000 value: 38.478 - type: recall_at_1 value: 29.685 - type: recall_at_3 value: 42.95 - type: recall_at_5 value: 48.616 - type: recall_at_10 value: 54.888000000000005 - type: recall_at_20 value: 60.895999999999994 - type: recall_at_100 value: 73.548 - type: recall_at_1000 value: 88.697 - type: precision_at_1 value: 29.685 - type: precision_at_3 value: 14.316999999999998 - type: precision_at_5 value: 9.722999999999999 - type: precision_at_10 value: 5.489 - type: precision_at_20 value: 3.045 - type: precision_at_100 value: 0.735 - type: precision_at_1000 value: 0.089 - type: mrr_at_1 value: 29.6489 - type: mrr_at_3 value: 35.5299 - type: mrr_at_5 value: 36.8133 - type: mrr_at_10 value: 37.6632 - type: mrr_at_20 value: 38.079299999999996 - type: mrr_at_100 value: 38.3951 - type: mrr_at_1000 value: 38.4584 - type: nauc_ndcg_at_1_max value: 23.1966 - type: nauc_ndcg_at_1_std value: -9.4926 - type: nauc_ndcg_at_1_diff1 value: 50.2664 - type: nauc_ndcg_at_3_max value: 22.9114 - type: nauc_ndcg_at_3_std value: -9.3945 - type: nauc_ndcg_at_3_diff1 value: 45.266400000000004 - type: nauc_ndcg_at_5_max value: 22.2736 - type: nauc_ndcg_at_5_std value: -9.1173 - type: nauc_ndcg_at_5_diff1 value: 44.1003 - type: nauc_ndcg_at_10_max value: 22.0212 - type: nauc_ndcg_at_10_std value: -8.5559 - type: nauc_ndcg_at_10_diff1 value: 43.5542 - type: nauc_ndcg_at_20_max value: 21.5977 - type: nauc_ndcg_at_20_std value: -8.236400000000001 - type: nauc_ndcg_at_20_diff1 value: 43.1564 - type: nauc_ndcg_at_100_max value: 21.4543 - type: nauc_ndcg_at_100_std value: -7.5462 - type: nauc_ndcg_at_100_diff1 value: 43.1768 - type: nauc_ndcg_at_1000_max value: 21.6202 - type: nauc_ndcg_at_1000_std value: -7.5571 - type: nauc_ndcg_at_1000_diff1 value: 43.5388 - type: nauc_map_at_1_max value: 23.1966 - type: nauc_map_at_1_std value: -9.4926 - type: nauc_map_at_1_diff1 value: 50.2664 - type: nauc_map_at_3_max value: 23.0018 - type: nauc_map_at_3_std value: -9.4391 - type: nauc_map_at_3_diff1 value: 46.428000000000004 - type: nauc_map_at_5_max value: 22.642300000000002 - type: nauc_map_at_5_std value: -9.2849 - type: nauc_map_at_5_diff1 value: 45.776 - type: nauc_map_at_10_max value: 22.551099999999998 - type: nauc_map_at_10_std value: -9.045300000000001 - type: nauc_map_at_10_diff1 value: 45.5645 - type: nauc_map_at_20_max value: 22.4407 - type: 
nauc_map_at_20_std value: -8.9542 - type: nauc_map_at_20_diff1 value: 45.4588 - type: nauc_map_at_100_max value: 22.4247 - type: nauc_map_at_100_std value: -8.869299999999999 - type: nauc_map_at_100_diff1 value: 45.467200000000005 - type: nauc_map_at_1000_max value: 22.429299999999998 - type: nauc_map_at_1000_std value: -8.8653 - type: nauc_map_at_1000_diff1 value: 45.479 - type: nauc_recall_at_1_max value: 23.1966 - type: nauc_recall_at_1_std value: -9.4926 - type: nauc_recall_at_1_diff1 value: 50.2664 - type: nauc_recall_at_3_max value: 22.6466 - type: nauc_recall_at_3_std value: -9.259599999999999 - type: nauc_recall_at_3_diff1 value: 41.9917 - type: nauc_recall_at_5_max value: 21.121100000000002 - type: nauc_recall_at_5_std value: -8.5882 - type: nauc_recall_at_5_diff1 value: 39.1445 - type: nauc_recall_at_10_max value: 20.191200000000002 - type: nauc_recall_at_10_std value: -6.824 - type: nauc_recall_at_10_diff1 value: 37.107 - type: nauc_recall_at_20_max value: 18.2104 - type: nauc_recall_at_20_std value: -5.3749 - type: nauc_recall_at_20_diff1 value: 34.9673 - type: nauc_recall_at_100_max value: 16.0859 - type: nauc_recall_at_100_std value: 0.7539 - type: nauc_recall_at_100_diff1 value: 32.603500000000004 - type: nauc_recall_at_1000_max value: 14.1642 - type: nauc_recall_at_1000_std value: 8.5463 - type: nauc_recall_at_1000_diff1 value: 29.5927 - type: nauc_precision_at_1_max value: 23.1966 - type: nauc_precision_at_1_std value: -9.4926 - type: nauc_precision_at_1_diff1 value: 50.2664 - type: nauc_precision_at_3_max value: 22.6466 - type: nauc_precision_at_3_std value: -9.259599999999999 - type: nauc_precision_at_3_diff1 value: 41.9917 - type: nauc_precision_at_5_max value: 21.121100000000002 - type: nauc_precision_at_5_std value: -8.5882 - type: nauc_precision_at_5_diff1 value: 39.1445 - type: nauc_precision_at_10_max value: 20.191200000000002 - type: nauc_precision_at_10_std value: -6.824 - type: nauc_precision_at_10_diff1 value: 37.107 - type: nauc_precision_at_20_max value: 18.2104 - type: nauc_precision_at_20_std value: -5.3749 - type: nauc_precision_at_20_diff1 value: 34.9673 - type: nauc_precision_at_100_max value: 16.0859 - type: nauc_precision_at_100_std value: 0.7539 - type: nauc_precision_at_100_diff1 value: 32.603500000000004 - type: nauc_precision_at_1000_max value: 14.1642 - type: nauc_precision_at_1000_std value: 8.5463 - type: nauc_precision_at_1000_diff1 value: 29.5927 - type: nauc_mrr_at_1_max value: 23.2502 - type: nauc_mrr_at_1_std value: -9.507 - type: nauc_mrr_at_1_diff1 value: 50.3997 - type: nauc_mrr_at_3_max value: 23.009 - type: nauc_mrr_at_3_std value: -9.4541 - type: nauc_mrr_at_3_diff1 value: 46.4733 - type: nauc_mrr_at_5_max value: 22.656000000000002 - type: nauc_mrr_at_5_std value: -9.2987 - type: nauc_mrr_at_5_diff1 value: 45.839999999999996 - type: nauc_mrr_at_10_max value: 22.5697 - type: nauc_mrr_at_10_std value: -9.0543 - type: nauc_mrr_at_10_diff1 value: 45.618700000000004 - type: nauc_mrr_at_20_max value: 22.461000000000002 - type: nauc_mrr_at_20_std value: -8.9628 - type: nauc_mrr_at_20_diff1 value: 45.5146 - type: nauc_mrr_at_100_max value: 22.4449 - type: nauc_mrr_at_100_std value: -8.877699999999999 - type: nauc_mrr_at_100_diff1 value: 45.5229 - type: nauc_mrr_at_1000_max value: 22.4498 - type: nauc_mrr_at_1000_std value: -8.873899999999999 - type: nauc_mrr_at_1000_diff1 value: 45.535199999999996 - type: main_score value: 41.814 task: type: Retrieval - dataset: config: python name: MTEB CodeSearchNetRetrieval (python) revision: 
fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 73.5 - type: ndcg_at_3 value: 82.35900000000001 - type: ndcg_at_5 value: 83.543 - type: ndcg_at_10 value: 84.357 - type: ndcg_at_20 value: 84.973 - type: ndcg_at_100 value: 85.449 - type: ndcg_at_1000 value: 85.591 - type: map_at_1 value: 73.5 - type: map_at_3 value: 80.2 - type: map_at_5 value: 80.85 - type: map_at_10 value: 81.189 - type: map_at_20 value: 81.364 - type: map_at_100 value: 81.434 - type: map_at_1000 value: 81.44 - type: recall_at_1 value: 73.5 - type: recall_at_3 value: 88.6 - type: recall_at_5 value: 91.5 - type: recall_at_10 value: 94.0 - type: recall_at_20 value: 96.39999999999999 - type: recall_at_100 value: 98.9 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 73.5 - type: precision_at_3 value: 29.532999999999998 - type: precision_at_5 value: 18.3 - type: precision_at_10 value: 9.4 - type: precision_at_20 value: 4.82 - type: precision_at_100 value: 0.989 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 73.5 - type: mrr_at_3 value: 80.2 - type: mrr_at_5 value: 80.85 - type: mrr_at_10 value: 81.1894 - type: mrr_at_20 value: 81.3638 - type: mrr_at_100 value: 81.43430000000001 - type: mrr_at_1000 value: 81.44 - type: nauc_ndcg_at_1_max value: 45.553 - type: nauc_ndcg_at_1_std value: -3.8149 - type: nauc_ndcg_at_1_diff1 value: 72.4638 - type: nauc_ndcg_at_3_max value: 47.8454 - type: nauc_ndcg_at_3_std value: -3.2174 - type: nauc_ndcg_at_3_diff1 value: 69.05059999999999 - type: nauc_ndcg_at_5_max value: 48.105599999999995 - type: nauc_ndcg_at_5_std value: -3.0107 - type: nauc_ndcg_at_5_diff1 value: 70.2436 - type: nauc_ndcg_at_10_max value: 48.871900000000004 - type: nauc_ndcg_at_10_std value: -2.7289 - type: nauc_ndcg_at_10_diff1 value: 70.87440000000001 - type: nauc_ndcg_at_20_max value: 49.1441 - type: nauc_ndcg_at_20_std value: -2.2193 - type: nauc_ndcg_at_20_diff1 value: 70.9602 - type: nauc_ndcg_at_100_max value: 48.2597 - type: nauc_ndcg_at_100_std value: -2.8648 - type: nauc_ndcg_at_100_diff1 value: 70.5487 - type: nauc_ndcg_at_1000_max value: 48.0576 - type: nauc_ndcg_at_1000_std value: -3.0315000000000003 - type: nauc_ndcg_at_1000_diff1 value: 70.8214 - type: nauc_map_at_1_max value: 45.553 - type: nauc_map_at_1_std value: -3.8149 - type: nauc_map_at_1_diff1 value: 72.4638 - type: nauc_map_at_3_max value: 47.143 - type: nauc_map_at_3_std value: -3.4511 - type: nauc_map_at_3_diff1 value: 70.2411 - type: nauc_map_at_5_max value: 47.2524 - type: nauc_map_at_5_std value: -3.3834999999999997 - type: nauc_map_at_5_diff1 value: 70.8691 - type: nauc_map_at_10_max value: 47.5215 - type: nauc_map_at_10_std value: -3.3042000000000002 - type: nauc_map_at_10_diff1 value: 71.1041 - type: nauc_map_at_20_max value: 47.5871 - type: nauc_map_at_20_std value: -3.1888 - type: nauc_map_at_20_diff1 value: 71.1157 - type: nauc_map_at_100_max value: 47.4746 - type: nauc_map_at_100_std value: -3.3092 - type: nauc_map_at_100_diff1 value: 71.0626 - type: nauc_map_at_1000_max value: 47.4686 - type: nauc_map_at_1000_std value: -3.3099000000000003 - type: nauc_map_at_1000_diff1 value: 71.0712 - type: nauc_recall_at_1_max value: 45.553 - type: nauc_recall_at_1_std value: -3.8149 - type: nauc_recall_at_1_diff1 value: 72.4638 - type: nauc_recall_at_3_max value: 51.09590000000001 - type: nauc_recall_at_3_std value: -2.1018 - type: nauc_recall_at_3_diff1 value: 63.4433 - type: nauc_recall_at_5_max value: 53.195499999999996 - type: 
nauc_recall_at_5_std value: -0.6421 - type: nauc_recall_at_5_diff1 value: 66.7381 - type: nauc_recall_at_10_max value: 60.660599999999995 - type: nauc_recall_at_10_std value: 2.5576000000000003 - type: nauc_recall_at_10_diff1 value: 69.8771 - type: nauc_recall_at_20_max value: 72.0082 - type: nauc_recall_at_20_std value: 13.519300000000001 - type: nauc_recall_at_20_diff1 value: 70.8774 - type: nauc_recall_at_100_max value: 67.6683 - type: nauc_recall_at_100_std value: 16.4757 - type: nauc_recall_at_100_diff1 value: 45.535199999999996 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 45.553 - type: nauc_precision_at_1_std value: -3.8149 - type: nauc_precision_at_1_diff1 value: 72.4638 - type: nauc_precision_at_3_max value: 51.09590000000001 - type: nauc_precision_at_3_std value: -2.1018 - type: nauc_precision_at_3_diff1 value: 63.4433 - type: nauc_precision_at_5_max value: 53.195499999999996 - type: nauc_precision_at_5_std value: -0.6421 - type: nauc_precision_at_5_diff1 value: 66.7381 - type: nauc_precision_at_10_max value: 60.660599999999995 - type: nauc_precision_at_10_std value: 2.5576000000000003 - type: nauc_precision_at_10_diff1 value: 69.8771 - type: nauc_precision_at_20_max value: 72.0082 - type: nauc_precision_at_20_std value: 13.519300000000001 - type: nauc_precision_at_20_diff1 value: 70.8774 - type: nauc_precision_at_100_max value: 67.6683 - type: nauc_precision_at_100_std value: 16.4757 - type: nauc_precision_at_100_diff1 value: 45.535199999999996 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 45.553 - type: nauc_mrr_at_1_std value: -3.8149 - type: nauc_mrr_at_1_diff1 value: 72.4638 - type: nauc_mrr_at_3_max value: 47.143 - type: nauc_mrr_at_3_std value: -3.4511 - type: nauc_mrr_at_3_diff1 value: 70.2411 - type: nauc_mrr_at_5_max value: 47.2524 - type: nauc_mrr_at_5_std value: -3.3834999999999997 - type: nauc_mrr_at_5_diff1 value: 70.8691 - type: nauc_mrr_at_10_max value: 47.5215 - type: nauc_mrr_at_10_std value: -3.3042000000000002 - type: nauc_mrr_at_10_diff1 value: 71.1041 - type: nauc_mrr_at_20_max value: 47.5871 - type: nauc_mrr_at_20_std value: -3.1888 - type: nauc_mrr_at_20_diff1 value: 71.1157 - type: nauc_mrr_at_100_max value: 47.4746 - type: nauc_mrr_at_100_std value: -3.3092 - type: nauc_mrr_at_100_diff1 value: 71.0626 - type: nauc_mrr_at_1000_max value: 47.4686 - type: nauc_mrr_at_1000_std value: -3.3099000000000003 - type: nauc_mrr_at_1000_diff1 value: 71.0712 - type: main_score value: 84.357 task: type: Retrieval - dataset: config: javascript name: MTEB CodeSearchNetRetrieval (javascript) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 59.4 - type: ndcg_at_3 value: 68.58800000000001 - type: ndcg_at_5 value: 70.0 - type: ndcg_at_10 value: 71.384 - type: ndcg_at_20 value: 72.505 - type: ndcg_at_100 value: 73.532 - type: ndcg_at_1000 value: 74.414 - type: map_at_1 value: 59.4 - type: map_at_3 value: 66.367 - type: map_at_5 value: 67.157 - type: map_at_10 value: 67.72399999999999 - type: map_at_20 value: 68.036 - type: map_at_100 value: 68.182 - type: map_at_1000 value: 68.208 - type: recall_at_1 value: 59.4 - type: recall_at_3 value: 75.0 - type: recall_at_5 value: 78.4 - type: recall_at_10 value: 82.69999999999999 - type: 
recall_at_20 value: 87.1 - type: recall_at_100 value: 92.60000000000001 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 59.4 - type: precision_at_3 value: 25.0 - type: precision_at_5 value: 15.68 - type: precision_at_10 value: 8.27 - type: precision_at_20 value: 4.3549999999999995 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 59.4 - type: mrr_at_3 value: 66.3667 - type: mrr_at_5 value: 67.1567 - type: mrr_at_10 value: 67.72399999999999 - type: mrr_at_20 value: 68.036 - type: mrr_at_100 value: 68.1821 - type: mrr_at_1000 value: 68.20779999999999 - type: nauc_ndcg_at_1_max value: 55.2077 - type: nauc_ndcg_at_1_std value: 23.8385 - type: nauc_ndcg_at_1_diff1 value: 72.8827 - type: nauc_ndcg_at_3_max value: 62.495 - type: nauc_ndcg_at_3_std value: 31.867800000000003 - type: nauc_ndcg_at_3_diff1 value: 69.8148 - type: nauc_ndcg_at_5_max value: 63.132999999999996 - type: nauc_ndcg_at_5_std value: 33.3486 - type: nauc_ndcg_at_5_diff1 value: 69.8501 - type: nauc_ndcg_at_10_max value: 64.3507 - type: nauc_ndcg_at_10_std value: 36.4767 - type: nauc_ndcg_at_10_diff1 value: 69.5995 - type: nauc_ndcg_at_20_max value: 63.930299999999995 - type: nauc_ndcg_at_20_std value: 36.8457 - type: nauc_ndcg_at_20_diff1 value: 70.0822 - type: nauc_ndcg_at_100_max value: 63.10249999999999 - type: nauc_ndcg_at_100_std value: 36.4228 - type: nauc_ndcg_at_100_diff1 value: 70.0219 - type: nauc_ndcg_at_1000_max value: 62.3826 - type: nauc_ndcg_at_1000_std value: 34.2464 - type: nauc_ndcg_at_1000_diff1 value: 70.2371 - type: nauc_map_at_1_max value: 55.2077 - type: nauc_map_at_1_std value: 23.8385 - type: nauc_map_at_1_diff1 value: 72.8827 - type: nauc_map_at_3_max value: 60.4208 - type: nauc_map_at_3_std value: 29.6445 - type: nauc_map_at_3_diff1 value: 70.58630000000001 - type: nauc_map_at_5_max value: 60.709900000000005 - type: nauc_map_at_5_std value: 30.400899999999996 - type: nauc_map_at_5_diff1 value: 70.6255 - type: nauc_map_at_10_max value: 61.152499999999996 - type: nauc_map_at_10_std value: 31.550800000000002 - type: nauc_map_at_10_diff1 value: 70.56099999999999 - type: nauc_map_at_20_max value: 61.0075 - type: nauc_map_at_20_std value: 31.585600000000003 - type: nauc_map_at_20_diff1 value: 70.6649 - type: nauc_map_at_100_max value: 60.90370000000001 - type: nauc_map_at_100_std value: 31.510700000000003 - type: nauc_map_at_100_diff1 value: 70.66839999999999 - type: nauc_map_at_1000_max value: 60.8865 - type: nauc_map_at_1000_std value: 31.4572 - type: nauc_map_at_1000_diff1 value: 70.6705 - type: nauc_recall_at_1_max value: 55.2077 - type: nauc_recall_at_1_std value: 23.8385 - type: nauc_recall_at_1_diff1 value: 72.8827 - type: nauc_recall_at_3_max value: 69.92819999999999 - type: nauc_recall_at_3_std value: 39.8045 - type: nauc_recall_at_3_diff1 value: 67.10040000000001 - type: nauc_recall_at_5_max value: 72.8013 - type: nauc_recall_at_5_std value: 45.1476 - type: nauc_recall_at_5_diff1 value: 66.84790000000001 - type: nauc_recall_at_10_max value: 80.1828 - type: nauc_recall_at_10_std value: 61.6781 - type: nauc_recall_at_10_diff1 value: 64.9272 - type: nauc_recall_at_20_max value: 82.11840000000001 - type: nauc_recall_at_20_std value: 72.1146 - type: nauc_recall_at_20_diff1 value: 67.3756 - type: nauc_recall_at_100_max value: 80.8836 - type: nauc_recall_at_100_std value: 89.47810000000001 - type: nauc_recall_at_100_diff1 value: 64.169 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - 
type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 55.2077 - type: nauc_precision_at_1_std value: 23.8385 - type: nauc_precision_at_1_diff1 value: 72.8827 - type: nauc_precision_at_3_max value: 69.92819999999999 - type: nauc_precision_at_3_std value: 39.8045 - type: nauc_precision_at_3_diff1 value: 67.10040000000001 - type: nauc_precision_at_5_max value: 72.8013 - type: nauc_precision_at_5_std value: 45.1476 - type: nauc_precision_at_5_diff1 value: 66.84790000000001 - type: nauc_precision_at_10_max value: 80.1828 - type: nauc_precision_at_10_std value: 61.6781 - type: nauc_precision_at_10_diff1 value: 64.9272 - type: nauc_precision_at_20_max value: 82.11840000000001 - type: nauc_precision_at_20_std value: 72.1146 - type: nauc_precision_at_20_diff1 value: 67.3756 - type: nauc_precision_at_100_max value: 80.8836 - type: nauc_precision_at_100_std value: 89.47810000000001 - type: nauc_precision_at_100_diff1 value: 64.169 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 55.2077 - type: nauc_mrr_at_1_std value: 23.8385 - type: nauc_mrr_at_1_diff1 value: 72.8827 - type: nauc_mrr_at_3_max value: 60.4208 - type: nauc_mrr_at_3_std value: 29.6445 - type: nauc_mrr_at_3_diff1 value: 70.58630000000001 - type: nauc_mrr_at_5_max value: 60.709900000000005 - type: nauc_mrr_at_5_std value: 30.400899999999996 - type: nauc_mrr_at_5_diff1 value: 70.6255 - type: nauc_mrr_at_10_max value: 61.152499999999996 - type: nauc_mrr_at_10_std value: 31.550800000000002 - type: nauc_mrr_at_10_diff1 value: 70.56099999999999 - type: nauc_mrr_at_20_max value: 61.0075 - type: nauc_mrr_at_20_std value: 31.585600000000003 - type: nauc_mrr_at_20_diff1 value: 70.6649 - type: nauc_mrr_at_100_max value: 60.90370000000001 - type: nauc_mrr_at_100_std value: 31.510700000000003 - type: nauc_mrr_at_100_diff1 value: 70.66839999999999 - type: nauc_mrr_at_1000_max value: 60.8865 - type: nauc_mrr_at_1000_std value: 31.4572 - type: nauc_mrr_at_1000_diff1 value: 70.6705 - type: main_score value: 71.384 task: type: Retrieval - dataset: config: go name: MTEB CodeSearchNetRetrieval (go) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 71.39999999999999 - type: ndcg_at_3 value: 82.32000000000001 - type: ndcg_at_5 value: 84.22699999999999 - type: ndcg_at_10 value: 84.922 - type: ndcg_at_20 value: 85.226 - type: ndcg_at_100 value: 85.563 - type: ndcg_at_1000 value: 85.66 - type: map_at_1 value: 71.39999999999999 - type: map_at_3 value: 79.783 - type: map_at_5 value: 80.848 - type: map_at_10 value: 81.145 - type: map_at_20 value: 81.229 - type: map_at_100 value: 81.284 - type: map_at_1000 value: 81.286 - type: recall_at_1 value: 71.39999999999999 - type: recall_at_3 value: 89.60000000000001 - type: recall_at_5 value: 94.19999999999999 - type: recall_at_10 value: 96.3 - type: recall_at_20 value: 97.5 - type: recall_at_100 value: 99.2 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 71.39999999999999 - type: precision_at_3 value: 29.866999999999997 - type: precision_at_5 value: 18.84 - type: precision_at_10 value: 9.629999999999999 - type: precision_at_20 value: 4.875 - type: precision_at_100 value: 0.992 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 71.39999999999999 - type: mrr_at_3 value: 79.7833 - type: mrr_at_5 value: 80.8483 - type: mrr_at_10 value: 81.14489999999999 - type: 
mrr_at_20 value: 81.22890000000001 - type: mrr_at_100 value: 81.2836 - type: mrr_at_1000 value: 81.28649999999999 - type: nauc_ndcg_at_1_max value: 46.2744 - type: nauc_ndcg_at_1_std value: -2.9863 - type: nauc_ndcg_at_1_diff1 value: 74.0857 - type: nauc_ndcg_at_3_max value: 54.4012 - type: nauc_ndcg_at_3_std value: -3.3299000000000003 - type: nauc_ndcg_at_3_diff1 value: 70.891 - type: nauc_ndcg_at_5_max value: 54.3223 - type: nauc_ndcg_at_5_std value: -1.6239 - type: nauc_ndcg_at_5_diff1 value: 71.7397 - type: nauc_ndcg_at_10_max value: 53.629099999999994 - type: nauc_ndcg_at_10_std value: -1.8041999999999998 - type: nauc_ndcg_at_10_diff1 value: 72.8108 - type: nauc_ndcg_at_20_max value: 52.8247 - type: nauc_ndcg_at_20_std value: -2.6823 - type: nauc_ndcg_at_20_diff1 value: 72.7573 - type: nauc_ndcg_at_100_max value: 52.359 - type: nauc_ndcg_at_100_std value: -2.8805 - type: nauc_ndcg_at_100_diff1 value: 72.8282 - type: nauc_ndcg_at_1000_max value: 52.1323 - type: nauc_ndcg_at_1000_std value: -2.8353 - type: nauc_ndcg_at_1000_diff1 value: 72.6771 - type: nauc_map_at_1_max value: 46.2744 - type: nauc_map_at_1_std value: -2.9863 - type: nauc_map_at_1_diff1 value: 74.0857 - type: nauc_map_at_3_max value: 52.0957 - type: nauc_map_at_3_std value: -3.5077999999999996 - type: nauc_map_at_3_diff1 value: 71.90530000000001 - type: nauc_map_at_5_max value: 51.9209 - type: nauc_map_at_5_std value: -2.7184 - type: nauc_map_at_5_diff1 value: 72.3474 - type: nauc_map_at_10_max value: 51.642900000000004 - type: nauc_map_at_10_std value: -2.8069 - type: nauc_map_at_10_diff1 value: 72.74589999999999 - type: nauc_map_at_20_max value: 51.451800000000006 - type: nauc_map_at_20_std value: -2.9922 - type: nauc_map_at_20_diff1 value: 72.7222 - type: nauc_map_at_100_max value: 51.3795 - type: nauc_map_at_100_std value: -3.0112 - type: nauc_map_at_100_diff1 value: 72.723 - type: nauc_map_at_1000_max value: 51.3724 - type: nauc_map_at_1000_std value: -3.009 - type: nauc_map_at_1000_diff1 value: 72.7192 - type: nauc_recall_at_1_max value: 46.2744 - type: nauc_recall_at_1_std value: -2.9863 - type: nauc_recall_at_1_diff1 value: 74.0857 - type: nauc_recall_at_3_max value: 65.8657 - type: nauc_recall_at_3_std value: -2.2125 - type: nauc_recall_at_3_diff1 value: 65.75649999999999 - type: nauc_recall_at_5_max value: 74.348 - type: nauc_recall_at_5_std value: 8.7503 - type: nauc_recall_at_5_diff1 value: 66.9693 - type: nauc_recall_at_10_max value: 77.9494 - type: nauc_recall_at_10_std value: 12.8688 - type: nauc_recall_at_10_diff1 value: 75.7287 - type: nauc_recall_at_20_max value: 72.9655 - type: nauc_recall_at_20_std value: 0.8702 - type: nauc_recall_at_20_diff1 value: 76.5864 - type: nauc_recall_at_100_max value: 80.4563 - type: nauc_recall_at_100_std value: -9.278699999999999 - type: nauc_recall_at_100_diff1 value: 92.793 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 46.2744 - type: nauc_precision_at_1_std value: -2.9863 - type: nauc_precision_at_1_diff1 value: 74.0857 - type: nauc_precision_at_3_max value: 65.8657 - type: nauc_precision_at_3_std value: -2.2125 - type: nauc_precision_at_3_diff1 value: 65.75649999999999 - type: nauc_precision_at_5_max value: 74.348 - type: nauc_precision_at_5_std value: 8.7503 - type: nauc_precision_at_5_diff1 value: 66.9693 - type: nauc_precision_at_10_max value: 77.9494 - type: nauc_precision_at_10_std value: 12.8688 - type: nauc_precision_at_10_diff1 value: 
75.7287 - type: nauc_precision_at_20_max value: 72.9655 - type: nauc_precision_at_20_std value: 0.8702 - type: nauc_precision_at_20_diff1 value: 76.5864 - type: nauc_precision_at_100_max value: 80.4563 - type: nauc_precision_at_100_std value: -9.278699999999999 - type: nauc_precision_at_100_diff1 value: 92.793 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 46.2744 - type: nauc_mrr_at_1_std value: -2.9863 - type: nauc_mrr_at_1_diff1 value: 74.0857 - type: nauc_mrr_at_3_max value: 52.0957 - type: nauc_mrr_at_3_std value: -3.5077999999999996 - type: nauc_mrr_at_3_diff1 value: 71.90530000000001 - type: nauc_mrr_at_5_max value: 51.9209 - type: nauc_mrr_at_5_std value: -2.7184 - type: nauc_mrr_at_5_diff1 value: 72.3474 - type: nauc_mrr_at_10_max value: 51.642900000000004 - type: nauc_mrr_at_10_std value: -2.8069 - type: nauc_mrr_at_10_diff1 value: 72.74589999999999 - type: nauc_mrr_at_20_max value: 51.451800000000006 - type: nauc_mrr_at_20_std value: -2.9922 - type: nauc_mrr_at_20_diff1 value: 72.7222 - type: nauc_mrr_at_100_max value: 51.3795 - type: nauc_mrr_at_100_std value: -3.0112 - type: nauc_mrr_at_100_diff1 value: 72.723 - type: nauc_mrr_at_1000_max value: 51.3724 - type: nauc_mrr_at_1000_std value: -3.009 - type: nauc_mrr_at_1000_diff1 value: 72.7192 - type: main_score value: 84.922 task: type: Retrieval - dataset: config: ruby name: MTEB CodeSearchNetRetrieval (ruby) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 61.9 - type: ndcg_at_3 value: 71.91 - type: ndcg_at_5 value: 74.11 - type: ndcg_at_10 value: 75.274 - type: ndcg_at_20 value: 75.97 - type: ndcg_at_100 value: 77.021 - type: ndcg_at_1000 value: 77.511 - type: map_at_1 value: 61.9 - type: map_at_3 value: 69.55 - type: map_at_5 value: 70.78 - type: map_at_10 value: 71.26 - type: map_at_20 value: 71.45899999999999 - type: map_at_100 value: 71.609 - type: map_at_1000 value: 71.624 - type: recall_at_1 value: 61.9 - type: recall_at_3 value: 78.7 - type: recall_at_5 value: 84.0 - type: recall_at_10 value: 87.6 - type: recall_at_20 value: 90.3 - type: recall_at_100 value: 95.89999999999999 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 61.9 - type: precision_at_3 value: 26.233 - type: precision_at_5 value: 16.8 - type: precision_at_10 value: 8.76 - type: precision_at_20 value: 4.515000000000001 - type: precision_at_100 value: 0.959 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 61.9 - type: mrr_at_3 value: 69.55 - type: mrr_at_5 value: 70.78 - type: mrr_at_10 value: 71.2604 - type: mrr_at_20 value: 71.4589 - type: mrr_at_100 value: 71.609 - type: mrr_at_1000 value: 71.6242 - type: nauc_ndcg_at_1_max value: 51.8333 - type: nauc_ndcg_at_1_std value: 8.4163 - type: nauc_ndcg_at_1_diff1 value: 72.37700000000001 - type: nauc_ndcg_at_3_max value: 56.0395 - type: nauc_ndcg_at_3_std value: 12.583 - type: nauc_ndcg_at_3_diff1 value: 67.5758 - type: nauc_ndcg_at_5_max value: 56.35289999999999 - type: nauc_ndcg_at_5_std value: 13.9102 - type: nauc_ndcg_at_5_diff1 value: 68.36179999999999 - type: nauc_ndcg_at_10_max value: 55.954499999999996 - type: nauc_ndcg_at_10_std value: 14.8003 - type: nauc_ndcg_at_10_diff1 value: 68.3755 - type: nauc_ndcg_at_20_max value: 56.2808 - type: nauc_ndcg_at_20_std value: 16.0875 - type: nauc_ndcg_at_20_diff1 value: 68.3962 - type: nauc_ndcg_at_100_max value: 56.3164 - 
type: nauc_ndcg_at_100_std value: 15.8916 - type: nauc_ndcg_at_100_diff1 value: 69.00699999999999 - type: nauc_ndcg_at_1000_max value: 55.785700000000006 - type: nauc_ndcg_at_1000_std value: 14.3348 - type: nauc_ndcg_at_1000_diff1 value: 69.0698 - type: nauc_map_at_1_max value: 51.8333 - type: nauc_map_at_1_std value: 8.4163 - type: nauc_map_at_1_diff1 value: 72.37700000000001 - type: nauc_map_at_3_max value: 54.942800000000005 - type: nauc_map_at_3_std value: 11.2973 - type: nauc_map_at_3_diff1 value: 68.9311 - type: nauc_map_at_5_max value: 55.0587 - type: nauc_map_at_5_std value: 11.9547 - type: nauc_map_at_5_diff1 value: 69.3713 - type: nauc_map_at_10_max value: 54.9098 - type: nauc_map_at_10_std value: 12.2453 - type: nauc_map_at_10_diff1 value: 69.3958 - type: nauc_map_at_20_max value: 54.9689 - type: nauc_map_at_20_std value: 12.524799999999999 - type: nauc_map_at_20_diff1 value: 69.4109 - type: nauc_map_at_100_max value: 54.9906 - type: nauc_map_at_100_std value: 12.500300000000001 - type: nauc_map_at_100_diff1 value: 69.50319999999999 - type: nauc_map_at_1000_max value: 54.97840000000001 - type: nauc_map_at_1000_std value: 12.4639 - type: nauc_map_at_1000_diff1 value: 69.50460000000001 - type: nauc_recall_at_1_max value: 51.8333 - type: nauc_recall_at_1_std value: 8.4163 - type: nauc_recall_at_1_diff1 value: 72.37700000000001 - type: nauc_recall_at_3_max value: 60.100699999999996 - type: nauc_recall_at_3_std value: 17.4623 - type: nauc_recall_at_3_diff1 value: 62.495599999999996 - type: nauc_recall_at_5_max value: 62.3622 - type: nauc_recall_at_5_std value: 23.282700000000002 - type: nauc_recall_at_5_diff1 value: 63.8786 - type: nauc_recall_at_10_max value: 61.567899999999995 - type: nauc_recall_at_10_std value: 30.543300000000002 - type: nauc_recall_at_10_diff1 value: 62.765800000000006 - type: nauc_recall_at_20_max value: 65.8648 - type: nauc_recall_at_20_std value: 45.2891 - type: nauc_recall_at_20_diff1 value: 61.5048 - type: nauc_recall_at_100_max value: 77.73790000000001 - type: nauc_recall_at_100_std value: 78.3004 - type: nauc_recall_at_100_diff1 value: 66.54820000000001 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 51.8333 - type: nauc_precision_at_1_std value: 8.4163 - type: nauc_precision_at_1_diff1 value: 72.37700000000001 - type: nauc_precision_at_3_max value: 60.100699999999996 - type: nauc_precision_at_3_std value: 17.4623 - type: nauc_precision_at_3_diff1 value: 62.495599999999996 - type: nauc_precision_at_5_max value: 62.3622 - type: nauc_precision_at_5_std value: 23.282700000000002 - type: nauc_precision_at_5_diff1 value: 63.8786 - type: nauc_precision_at_10_max value: 61.567899999999995 - type: nauc_precision_at_10_std value: 30.543300000000002 - type: nauc_precision_at_10_diff1 value: 62.765800000000006 - type: nauc_precision_at_20_max value: 65.8648 - type: nauc_precision_at_20_std value: 45.2891 - type: nauc_precision_at_20_diff1 value: 61.5048 - type: nauc_precision_at_100_max value: 77.73790000000001 - type: nauc_precision_at_100_std value: 78.3004 - type: nauc_precision_at_100_diff1 value: 66.54820000000001 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 51.8333 - type: nauc_mrr_at_1_std value: 8.4163 - type: nauc_mrr_at_1_diff1 value: 72.37700000000001 - type: nauc_mrr_at_3_max value: 54.942800000000005 - 
type: nauc_mrr_at_3_std value: 11.2973 - type: nauc_mrr_at_3_diff1 value: 68.9311 - type: nauc_mrr_at_5_max value: 55.0587 - type: nauc_mrr_at_5_std value: 11.9547 - type: nauc_mrr_at_5_diff1 value: 69.3713 - type: nauc_mrr_at_10_max value: 54.9098 - type: nauc_mrr_at_10_std value: 12.2453 - type: nauc_mrr_at_10_diff1 value: 69.3958 - type: nauc_mrr_at_20_max value: 54.9689 - type: nauc_mrr_at_20_std value: 12.524799999999999 - type: nauc_mrr_at_20_diff1 value: 69.4109 - type: nauc_mrr_at_100_max value: 54.9906 - type: nauc_mrr_at_100_std value: 12.500300000000001 - type: nauc_mrr_at_100_diff1 value: 69.50319999999999 - type: nauc_mrr_at_1000_max value: 54.97840000000001 - type: nauc_mrr_at_1000_std value: 12.4639 - type: nauc_mrr_at_1000_diff1 value: 69.50460000000001 - type: main_score value: 75.274 task: type: Retrieval - dataset: config: java name: MTEB CodeSearchNetRetrieval (java) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 52.6 - type: ndcg_at_3 value: 64.044 - type: ndcg_at_5 value: 67.202 - type: ndcg_at_10 value: 69.447 - type: ndcg_at_20 value: 70.488 - type: ndcg_at_100 value: 71.481 - type: ndcg_at_1000 value: 71.995 - type: map_at_1 value: 52.6 - type: map_at_3 value: 61.317 - type: map_at_5 value: 63.062 - type: map_at_10 value: 64.01400000000001 - type: map_at_20 value: 64.302 - type: map_at_100 value: 64.443 - type: map_at_1000 value: 64.459 - type: recall_at_1 value: 52.6 - type: recall_at_3 value: 71.89999999999999 - type: recall_at_5 value: 79.60000000000001 - type: recall_at_10 value: 86.4 - type: recall_at_20 value: 90.5 - type: recall_at_100 value: 95.8 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 52.6 - type: precision_at_3 value: 23.967 - type: precision_at_5 value: 15.920000000000002 - type: precision_at_10 value: 8.64 - type: precision_at_20 value: 4.5249999999999995 - type: precision_at_100 value: 0.958 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 52.6 - type: mrr_at_3 value: 61.316700000000004 - type: mrr_at_5 value: 63.0617 - type: mrr_at_10 value: 64.01400000000001 - type: mrr_at_20 value: 64.3022 - type: mrr_at_100 value: 64.443 - type: mrr_at_1000 value: 64.4595 - type: nauc_ndcg_at_1_max value: 38.4317 - type: nauc_ndcg_at_1_std value: -18.9677 - type: nauc_ndcg_at_1_diff1 value: 62.74570000000001 - type: nauc_ndcg_at_3_max value: 43.612 - type: nauc_ndcg_at_3_std value: -14.6587 - type: nauc_ndcg_at_3_diff1 value: 56.92230000000001 - type: nauc_ndcg_at_5_max value: 44.840999999999994 - type: nauc_ndcg_at_5_std value: -12.328600000000002 - type: nauc_ndcg_at_5_diff1 value: 56.998000000000005 - type: nauc_ndcg_at_10_max value: 45.5768 - type: nauc_ndcg_at_10_std value: -10.871 - type: nauc_ndcg_at_10_diff1 value: 57.36130000000001 - type: nauc_ndcg_at_20_max value: 45.1125 - type: nauc_ndcg_at_20_std value: -10.575 - type: nauc_ndcg_at_20_diff1 value: 57.2132 - type: nauc_ndcg_at_100_max value: 45.4087 - type: nauc_ndcg_at_100_std value: -10.356300000000001 - type: nauc_ndcg_at_100_diff1 value: 57.607 - type: nauc_ndcg_at_1000_max value: 44.2686 - type: nauc_ndcg_at_1000_std value: -12.2661 - type: nauc_ndcg_at_1000_diff1 value: 58.0082 - type: nauc_map_at_1_max value: 38.4317 - type: nauc_map_at_1_std value: -18.9677 - type: nauc_map_at_1_diff1 value: 62.74570000000001 - type: nauc_map_at_3_max value: 42.278 - type: nauc_map_at_3_std value: -15.937499999999998 - type: nauc_map_at_3_diff1 value: 58.4671 - type: nauc_map_at_5_max 
value: 42.8414 - type: nauc_map_at_5_std value: -14.7742 - type: nauc_map_at_5_diff1 value: 58.582100000000004 - type: nauc_map_at_10_max value: 43.0236 - type: nauc_map_at_10_std value: -14.3595 - type: nauc_map_at_10_diff1 value: 58.765100000000004 - type: nauc_map_at_20_max value: 42.8918 - type: nauc_map_at_20_std value: -14.335500000000001 - type: nauc_map_at_20_diff1 value: 58.746500000000005 - type: nauc_map_at_100_max value: 42.9383 - type: nauc_map_at_100_std value: -14.296600000000002 - type: nauc_map_at_100_diff1 value: 58.796099999999996 - type: nauc_map_at_1000_max value: 42.9079 - type: nauc_map_at_1000_std value: -14.3452 - type: nauc_map_at_1000_diff1 value: 58.8048 - type: nauc_recall_at_1_max value: 38.4317 - type: nauc_recall_at_1_std value: -18.9677 - type: nauc_recall_at_1_diff1 value: 62.74570000000001 - type: nauc_recall_at_3_max value: 48.255199999999995 - type: nauc_recall_at_3_std value: -10.116999999999999 - type: nauc_recall_at_3_diff1 value: 51.5211 - type: nauc_recall_at_5_max value: 53.7581 - type: nauc_recall_at_5_std value: -1.1828 - type: nauc_recall_at_5_diff1 value: 50.139199999999995 - type: nauc_recall_at_10_max value: 62.2138 - type: nauc_recall_at_10_std value: 12.5761 - type: nauc_recall_at_10_diff1 value: 49.091499999999996 - type: nauc_recall_at_20_max value: 64.05619999999999 - type: nauc_recall_at_20_std value: 24.6892 - type: nauc_recall_at_20_diff1 value: 44.4292 - type: nauc_recall_at_100_max value: 94.1543 - type: nauc_recall_at_100_std value: 72.2889 - type: nauc_recall_at_100_diff1 value: 39.8115 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 38.4317 - type: nauc_precision_at_1_std value: -18.9677 - type: nauc_precision_at_1_diff1 value: 62.74570000000001 - type: nauc_precision_at_3_max value: 48.255199999999995 - type: nauc_precision_at_3_std value: -10.116999999999999 - type: nauc_precision_at_3_diff1 value: 51.5211 - type: nauc_precision_at_5_max value: 53.7581 - type: nauc_precision_at_5_std value: -1.1828 - type: nauc_precision_at_5_diff1 value: 50.139199999999995 - type: nauc_precision_at_10_max value: 62.2138 - type: nauc_precision_at_10_std value: 12.5761 - type: nauc_precision_at_10_diff1 value: 49.091499999999996 - type: nauc_precision_at_20_max value: 64.05619999999999 - type: nauc_precision_at_20_std value: 24.6892 - type: nauc_precision_at_20_diff1 value: 44.4292 - type: nauc_precision_at_100_max value: 94.1543 - type: nauc_precision_at_100_std value: 72.2889 - type: nauc_precision_at_100_diff1 value: 39.8115 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 38.4317 - type: nauc_mrr_at_1_std value: -18.9677 - type: nauc_mrr_at_1_diff1 value: 62.74570000000001 - type: nauc_mrr_at_3_max value: 42.278 - type: nauc_mrr_at_3_std value: -15.937499999999998 - type: nauc_mrr_at_3_diff1 value: 58.4671 - type: nauc_mrr_at_5_max value: 42.8414 - type: nauc_mrr_at_5_std value: -14.7742 - type: nauc_mrr_at_5_diff1 value: 58.582100000000004 - type: nauc_mrr_at_10_max value: 43.0236 - type: nauc_mrr_at_10_std value: -14.3595 - type: nauc_mrr_at_10_diff1 value: 58.765100000000004 - type: nauc_mrr_at_20_max value: 42.8918 - type: nauc_mrr_at_20_std value: -14.335500000000001 - type: nauc_mrr_at_20_diff1 value: 58.746500000000005 - type: nauc_mrr_at_100_max value: 42.9383 - type: 
nauc_mrr_at_100_std value: -14.296600000000002 - type: nauc_mrr_at_100_diff1 value: 58.796099999999996 - type: nauc_mrr_at_1000_max value: 42.9079 - type: nauc_mrr_at_1000_std value: -14.3452 - type: nauc_mrr_at_1000_diff1 value: 58.8048 - type: main_score value: 69.447 task: type: Retrieval - dataset: config: php name: MTEB CodeSearchNetRetrieval (php) revision: fdc6a9e39575768c27eb8a2a5f702bf846eb4759 split: test type: code-search-net/code_search_net metrics: - type: ndcg_at_1 value: 57.699999999999996 - type: ndcg_at_3 value: 69.071 - type: ndcg_at_5 value: 71.331 - type: ndcg_at_10 value: 73.455 - type: ndcg_at_20 value: 74.298 - type: ndcg_at_100 value: 74.842 - type: ndcg_at_1000 value: 75.411 - type: map_at_1 value: 57.699999999999996 - type: map_at_3 value: 66.233 - type: map_at_5 value: 67.508 - type: map_at_10 value: 68.398 - type: map_at_20 value: 68.634 - type: map_at_100 value: 68.718 - type: map_at_1000 value: 68.735 - type: recall_at_1 value: 57.699999999999996 - type: recall_at_3 value: 77.3 - type: recall_at_5 value: 82.69999999999999 - type: recall_at_10 value: 89.2 - type: recall_at_20 value: 92.5 - type: recall_at_100 value: 95.3 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 57.699999999999996 - type: precision_at_3 value: 25.767 - type: precision_at_5 value: 16.54 - type: precision_at_10 value: 8.92 - type: precision_at_20 value: 4.625 - type: precision_at_100 value: 0.9530000000000001 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 57.699999999999996 - type: mrr_at_3 value: 66.2333 - type: mrr_at_5 value: 67.5083 - type: mrr_at_10 value: 68.398 - type: mrr_at_20 value: 68.6345 - type: mrr_at_100 value: 68.71770000000001 - type: mrr_at_1000 value: 68.7351 - type: nauc_ndcg_at_1_max value: 47.0017 - type: nauc_ndcg_at_1_std value: 7.702000000000001 - type: nauc_ndcg_at_1_diff1 value: 65.5265 - type: nauc_ndcg_at_3_max value: 53.1223 - type: nauc_ndcg_at_3_std value: 14.5277 - type: nauc_ndcg_at_3_diff1 value: 60.5267 - type: nauc_ndcg_at_5_max value: 55.99570000000001 - type: nauc_ndcg_at_5_std value: 17.467 - type: nauc_ndcg_at_5_diff1 value: 63.1188 - type: nauc_ndcg_at_10_max value: 55.7826 - type: nauc_ndcg_at_10_std value: 19.1279 - type: nauc_ndcg_at_10_diff1 value: 63.463 - type: nauc_ndcg_at_20_max value: 55.2338 - type: nauc_ndcg_at_20_std value: 19.5684 - type: nauc_ndcg_at_20_diff1 value: 63.7312 - type: nauc_ndcg_at_100_max value: 54.898199999999996 - type: nauc_ndcg_at_100_std value: 19.1172 - type: nauc_ndcg_at_100_diff1 value: 63.7935 - type: nauc_ndcg_at_1000_max value: 53.9486 - type: nauc_ndcg_at_1000_std value: 17.0841 - type: nauc_ndcg_at_1000_diff1 value: 63.5189 - type: nauc_map_at_1_max value: 47.0017 - type: nauc_map_at_1_std value: 7.702000000000001 - type: nauc_map_at_1_diff1 value: 65.5265 - type: nauc_map_at_3_max value: 51.3811 - type: nauc_map_at_3_std value: 12.6201 - type: nauc_map_at_3_diff1 value: 61.781299999999995 - type: nauc_map_at_5_max value: 52.788599999999995 - type: nauc_map_at_5_std value: 13.9926 - type: nauc_map_at_5_diff1 value: 63.155300000000004 - type: nauc_map_at_10_max value: 52.630900000000004 - type: nauc_map_at_10_std value: 14.5419 - type: nauc_map_at_10_diff1 value: 63.299499999999995 - type: nauc_map_at_20_max value: 52.4779 - type: nauc_map_at_20_std value: 14.615300000000001 - type: nauc_map_at_20_diff1 value: 63.360099999999996 - type: nauc_map_at_100_max value: 52.434999999999995 - type: nauc_map_at_100_std value: 14.5613 - type: nauc_map_at_100_diff1 value: 63.362700000000004 
- type: nauc_map_at_1000_max value: 52.412000000000006 - type: nauc_map_at_1000_std value: 14.5121 - type: nauc_map_at_1000_diff1 value: 63.361000000000004 - type: nauc_recall_at_1_max value: 47.0017 - type: nauc_recall_at_1_std value: 7.702000000000001 - type: nauc_recall_at_1_diff1 value: 65.5265 - type: nauc_recall_at_3_max value: 59.7842 - type: nauc_recall_at_3_std value: 21.8077 - type: nauc_recall_at_3_diff1 value: 55.81850000000001 - type: nauc_recall_at_5_max value: 71.5097 - type: nauc_recall_at_5_std value: 34.341899999999995 - type: nauc_recall_at_5_diff1 value: 63.604000000000006 - type: nauc_recall_at_10_max value: 78.1568 - type: nauc_recall_at_10_std value: 53.016600000000004 - type: nauc_recall_at_10_diff1 value: 65.779 - type: nauc_recall_at_20_max value: 81.5145 - type: nauc_recall_at_20_std value: 72.038 - type: nauc_recall_at_20_diff1 value: 69.7603 - type: nauc_recall_at_100_max value: 89.0587 - type: nauc_recall_at_100_std value: 91.89070000000001 - type: nauc_recall_at_100_diff1 value: 75.1088 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 47.0017 - type: nauc_precision_at_1_std value: 7.702000000000001 - type: nauc_precision_at_1_diff1 value: 65.5265 - type: nauc_precision_at_3_max value: 59.7842 - type: nauc_precision_at_3_std value: 21.8077 - type: nauc_precision_at_3_diff1 value: 55.81850000000001 - type: nauc_precision_at_5_max value: 71.5097 - type: nauc_precision_at_5_std value: 34.341899999999995 - type: nauc_precision_at_5_diff1 value: 63.604000000000006 - type: nauc_precision_at_10_max value: 78.1568 - type: nauc_precision_at_10_std value: 53.016600000000004 - type: nauc_precision_at_10_diff1 value: 65.779 - type: nauc_precision_at_20_max value: 81.5145 - type: nauc_precision_at_20_std value: 72.038 - type: nauc_precision_at_20_diff1 value: 69.7603 - type: nauc_precision_at_100_max value: 89.0587 - type: nauc_precision_at_100_std value: 91.89070000000001 - type: nauc_precision_at_100_diff1 value: 75.1088 - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_mrr_at_1_max value: 47.0017 - type: nauc_mrr_at_1_std value: 7.702000000000001 - type: nauc_mrr_at_1_diff1 value: 65.5265 - type: nauc_mrr_at_3_max value: 51.3811 - type: nauc_mrr_at_3_std value: 12.6201 - type: nauc_mrr_at_3_diff1 value: 61.781299999999995 - type: nauc_mrr_at_5_max value: 52.788599999999995 - type: nauc_mrr_at_5_std value: 13.9926 - type: nauc_mrr_at_5_diff1 value: 63.155300000000004 - type: nauc_mrr_at_10_max value: 52.630900000000004 - type: nauc_mrr_at_10_std value: 14.5419 - type: nauc_mrr_at_10_diff1 value: 63.299499999999995 - type: nauc_mrr_at_20_max value: 52.4779 - type: nauc_mrr_at_20_std value: 14.615300000000001 - type: nauc_mrr_at_20_diff1 value: 63.360099999999996 - type: nauc_mrr_at_100_max value: 52.434999999999995 - type: nauc_mrr_at_100_std value: 14.5613 - type: nauc_mrr_at_100_diff1 value: 63.362700000000004 - type: nauc_mrr_at_1000_max value: 52.412000000000006 - type: nauc_mrr_at_1000_std value: 14.5121 - type: nauc_mrr_at_1000_diff1 value: 63.361000000000004 - type: main_score value: 73.455 task: type: Retrieval - dataset: config: default name: MTEB CodeTransOceanContest (default) revision: 20da4eb20a4b17300c0986ee148c90867a7f2a4d split: test type: CoIR-Retrieval/codetrans-contest metrics: - type: ndcg_at_1 value: 46.154 - type: ndcg_at_3 
value: 52.019999999999996 - type: ndcg_at_5 value: 53.929 - type: ndcg_at_10 value: 57.475 - type: ndcg_at_20 value: 59.861 - type: ndcg_at_100 value: 61.577000000000005 - type: ndcg_at_1000 value: 62.755 - type: map_at_1 value: 46.154 - type: map_at_3 value: 50.602999999999994 - type: map_at_5 value: 51.68899999999999 - type: map_at_10 value: 53.174 - type: map_at_20 value: 53.818 - type: map_at_100 value: 54.041 - type: map_at_1000 value: 54.081 - type: recall_at_1 value: 46.154 - type: recall_at_3 value: 56.108999999999995 - type: recall_at_5 value: 60.633 - type: recall_at_10 value: 71.493 - type: recall_at_20 value: 80.99499999999999 - type: recall_at_100 value: 90.498 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 46.154 - type: precision_at_3 value: 18.703 - type: precision_at_5 value: 12.127 - type: precision_at_10 value: 7.149 - type: precision_at_20 value: 4.05 - type: precision_at_100 value: 0.905 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 46.153800000000004 - type: mrr_at_3 value: 50.6033 - type: mrr_at_5 value: 51.6893 - type: mrr_at_10 value: 53.173899999999996 - type: mrr_at_20 value: 53.8181 - type: mrr_at_100 value: 54.0405 - type: mrr_at_1000 value: 54.081199999999995 - type: nauc_ndcg_at_1_max value: 59.032 - type: nauc_ndcg_at_1_std value: 8.2815 - type: nauc_ndcg_at_1_diff1 value: 80.5428 - type: nauc_ndcg_at_3_max value: 55.47410000000001 - type: nauc_ndcg_at_3_std value: 4.4284 - type: nauc_ndcg_at_3_diff1 value: 77.2405 - type: nauc_ndcg_at_5_max value: 54.6337 - type: nauc_ndcg_at_5_std value: 5.3048 - type: nauc_ndcg_at_5_diff1 value: 76.5969 - type: nauc_ndcg_at_10_max value: 51.8584 - type: nauc_ndcg_at_10_std value: 3.5628 - type: nauc_ndcg_at_10_diff1 value: 74.6966 - type: nauc_ndcg_at_20_max value: 54.3478 - type: nauc_ndcg_at_20_std value: 4.3697 - type: nauc_ndcg_at_20_diff1 value: 75.6032 - type: nauc_ndcg_at_100_max value: 55.488400000000006 - type: nauc_ndcg_at_100_std value: 6.101 - type: nauc_ndcg_at_100_diff1 value: 76.0249 - type: nauc_ndcg_at_1000_max value: 55.1091 - type: nauc_ndcg_at_1000_std value: 5.5951 - type: nauc_ndcg_at_1000_diff1 value: 76.3907 - type: nauc_map_at_1_max value: 59.032 - type: nauc_map_at_1_std value: 8.2815 - type: nauc_map_at_1_diff1 value: 80.5428 - type: nauc_map_at_3_max value: 56.261700000000005 - type: nauc_map_at_3_std value: 5.3123 - type: nauc_map_at_3_diff1 value: 77.823 - type: nauc_map_at_5_max value: 55.7926 - type: nauc_map_at_5_std value: 5.8055 - type: nauc_map_at_5_diff1 value: 77.4779 - type: nauc_map_at_10_max value: 54.77459999999999 - type: nauc_map_at_10_std value: 5.1733 - type: nauc_map_at_10_diff1 value: 76.79249999999999 - type: nauc_map_at_20_max value: 55.4426 - type: nauc_map_at_20_std value: 5.4346 - type: nauc_map_at_20_diff1 value: 77.0378 - type: nauc_map_at_100_max value: 55.6049 - type: nauc_map_at_100_std value: 5.7131 - type: nauc_map_at_100_diff1 value: 77.0756 - type: nauc_map_at_1000_max value: 55.5915 - type: nauc_map_at_1000_std value: 5.7007 - type: nauc_map_at_1000_diff1 value: 77.0939 - type: nauc_recall_at_1_max value: 59.032 - type: nauc_recall_at_1_std value: 8.2815 - type: nauc_recall_at_1_diff1 value: 80.5428 - type: nauc_recall_at_3_max value: 53.1398 - type: nauc_recall_at_3_std value: 1.7934999999999999 - type: nauc_recall_at_3_diff1 value: 75.5862 - type: nauc_recall_at_5_max value: 50.9304 - type: nauc_recall_at_5_std value: 3.8924 - type: nauc_recall_at_5_diff1 value: 73.8369 - type: nauc_recall_at_10_max value: 38.9905 - type: 
nauc_recall_at_10_std value: -3.4564999999999997 - type: nauc_recall_at_10_diff1 value: 65.5567 - type: nauc_recall_at_20_max value: 50.0429 - type: nauc_recall_at_20_std value: -1.4551 - type: nauc_recall_at_20_diff1 value: 67.9871 - type: nauc_recall_at_100_max value: 63.44030000000001 - type: nauc_recall_at_100_std value: 17.8876 - type: nauc_recall_at_100_diff1 value: 68.9388 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: 59.032 - type: nauc_precision_at_1_std value: 8.2815 - type: nauc_precision_at_1_diff1 value: 80.5428 - type: nauc_precision_at_3_max value: 53.1398 - type: nauc_precision_at_3_std value: 1.7934999999999999 - type: nauc_precision_at_3_diff1 value: 75.5862 - type: nauc_precision_at_5_max value: 50.9304 - type: nauc_precision_at_5_std value: 3.8924 - type: nauc_precision_at_5_diff1 value: 73.8369 - type: nauc_precision_at_10_max value: 38.9905 - type: nauc_precision_at_10_std value: -3.4564999999999997 - type: nauc_precision_at_10_diff1 value: 65.5567 - type: nauc_precision_at_20_max value: 50.0429 - type: nauc_precision_at_20_std value: -1.4551 - type: nauc_precision_at_20_diff1 value: 67.9871 - type: nauc_precision_at_100_max value: 63.44030000000001 - type: nauc_precision_at_100_std value: 17.8876 - type: nauc_precision_at_100_diff1 value: 68.9388 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 100.0 - type: nauc_precision_at_1000_diff1 value: 100.0 - type: nauc_mrr_at_1_max value: 59.032 - type: nauc_mrr_at_1_std value: 8.2815 - type: nauc_mrr_at_1_diff1 value: 80.5428 - type: nauc_mrr_at_3_max value: 56.261700000000005 - type: nauc_mrr_at_3_std value: 5.3123 - type: nauc_mrr_at_3_diff1 value: 77.823 - type: nauc_mrr_at_5_max value: 55.7926 - type: nauc_mrr_at_5_std value: 5.8055 - type: nauc_mrr_at_5_diff1 value: 77.4779 - type: nauc_mrr_at_10_max value: 54.77459999999999 - type: nauc_mrr_at_10_std value: 5.1733 - type: nauc_mrr_at_10_diff1 value: 76.79249999999999 - type: nauc_mrr_at_20_max value: 55.4426 - type: nauc_mrr_at_20_std value: 5.4346 - type: nauc_mrr_at_20_diff1 value: 77.0378 - type: nauc_mrr_at_100_max value: 55.6049 - type: nauc_mrr_at_100_std value: 5.7131 - type: nauc_mrr_at_100_diff1 value: 77.0756 - type: nauc_mrr_at_1000_max value: 55.5915 - type: nauc_mrr_at_1000_std value: 5.7007 - type: nauc_mrr_at_1000_diff1 value: 77.0939 - type: main_score value: 57.475 task: type: Retrieval - dataset: config: default name: MTEB CodeTransOceanDL (default) revision: 281562cb8a1265ab5c0824bfa6ddcd9b0a15618f split: test type: CoIR-Retrieval/codetrans-dl metrics: - type: ndcg_at_1 value: 8.889 - type: ndcg_at_3 value: 10.700999999999999 - type: ndcg_at_5 value: 16.082 - type: ndcg_at_10 value: 26.888 - type: ndcg_at_20 value: 35.608000000000004 - type: ndcg_at_100 value: 36.459 - type: ndcg_at_1000 value: 36.775999999999996 - type: map_at_1 value: 8.889 - type: map_at_3 value: 10.184999999999999 - type: map_at_5 value: 13.241 - type: map_at_10 value: 17.502000000000002 - type: map_at_20 value: 19.978 - type: map_at_100 value: 20.108 - type: map_at_1000 value: 20.125 - type: recall_at_1 value: 8.889 - type: recall_at_3 value: 12.222 - type: recall_at_5 value: 25.0 - type: recall_at_10 value: 59.443999999999996 - type: recall_at_20 value: 93.333 - type: recall_at_100 value: 97.77799999999999 - type: recall_at_1000 value: 100.0 - type: precision_at_1 value: 8.889 - type: precision_at_3 value: 
4.074 - type: precision_at_5 value: 5.0 - type: precision_at_10 value: 5.944 - type: precision_at_20 value: 4.667000000000001 - type: precision_at_100 value: 0.9780000000000001 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 3.8889 - type: mrr_at_3 value: 8.9815 - type: mrr_at_5 value: 10.2593 - type: mrr_at_10 value: 15.263399999999999 - type: mrr_at_20 value: 17.711 - type: mrr_at_100 value: 17.8421 - type: mrr_at_1000 value: 17.8596 - type: nauc_ndcg_at_1_max value: -40.8791 - type: nauc_ndcg_at_1_std value: -22.7629 - type: nauc_ndcg_at_1_diff1 value: -23.105 - type: nauc_ndcg_at_3_max value: -43.187599999999996 - type: nauc_ndcg_at_3_std value: -26.9994 - type: nauc_ndcg_at_3_diff1 value: -15.4181 - type: nauc_ndcg_at_5_max value: -37.2549 - type: nauc_ndcg_at_5_std value: -24.4115 - type: nauc_ndcg_at_5_diff1 value: -5.7322999999999995 - type: nauc_ndcg_at_10_max value: -36.3471 - type: nauc_ndcg_at_10_std value: -22.8065 - type: nauc_ndcg_at_10_diff1 value: -5.3767000000000005 - type: nauc_ndcg_at_20_max value: -35.829100000000004 - type: nauc_ndcg_at_20_std value: -20.787300000000002 - type: nauc_ndcg_at_20_diff1 value: -9.6038 - type: nauc_ndcg_at_100_max value: -36.5805 - type: nauc_ndcg_at_100_std value: -20.1283 - type: nauc_ndcg_at_100_diff1 value: -8.9448 - type: nauc_ndcg_at_1000_max value: -38.1158 - type: nauc_ndcg_at_1000_std value: -22.2744 - type: nauc_ndcg_at_1000_diff1 value: -9.8704 - type: nauc_map_at_1_max value: -40.8791 - type: nauc_map_at_1_std value: -22.7629 - type: nauc_map_at_1_diff1 value: -23.105 - type: nauc_map_at_3_max value: -42.559200000000004 - type: nauc_map_at_3_std value: -25.8594 - type: nauc_map_at_3_diff1 value: -17.2362 - type: nauc_map_at_5_max value: -38.595800000000004 - type: nauc_map_at_5_std value: -24.1339 - type: nauc_map_at_5_diff1 value: -10.4452 - type: nauc_map_at_10_max value: -38.2389 - type: nauc_map_at_10_std value: -23.453599999999998 - type: nauc_map_at_10_diff1 value: -10.2748 - type: nauc_map_at_20_max value: -38.8856 - type: nauc_map_at_20_std value: -23.095499999999998 - type: nauc_map_at_20_diff1 value: -11.695500000000001 - type: nauc_map_at_100_max value: -38.9696 - type: nauc_map_at_100_std value: -23.0057 - type: nauc_map_at_100_diff1 value: -11.635900000000001 - type: nauc_map_at_1000_max value: -39.035399999999996 - type: nauc_map_at_1000_std value: -23.1075 - type: nauc_map_at_1000_diff1 value: -11.6855 - type: nauc_recall_at_1_max value: -40.8791 - type: nauc_recall_at_1_std value: -22.7629 - type: nauc_recall_at_1_diff1 value: -23.105 - type: nauc_recall_at_3_max value: -44.8047 - type: nauc_recall_at_3_std value: -29.9296 - type: nauc_recall_at_3_diff1 value: -10.8169 - type: nauc_recall_at_5_max value: -34.5699 - type: nauc_recall_at_5_std value: -24.9544 - type: nauc_recall_at_5_diff1 value: 3.4269000000000003 - type: nauc_recall_at_10_max value: -32.149699999999996 - type: nauc_recall_at_10_std value: -21.0142 - type: nauc_recall_at_10_diff1 value: 4.358 - type: nauc_recall_at_20_max value: 0.7547 - type: nauc_recall_at_20_std value: 7.1739999999999995 - type: nauc_recall_at_20_diff1 value: -3.2252 - type: nauc_recall_at_100_max value: 41.4332 - type: nauc_recall_at_100_std value: 86.1111 - type: nauc_recall_at_100_diff1 value: 35.7143 - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_precision_at_1_max value: -40.8791 - type: nauc_precision_at_1_std value: -22.7629 - type: nauc_precision_at_1_diff1 
value: -23.105 - type: nauc_precision_at_3_max value: -44.8047 - type: nauc_precision_at_3_std value: -29.9296 - type: nauc_precision_at_3_diff1 value: -10.8169 - type: nauc_precision_at_5_max value: -34.5699 - type: nauc_precision_at_5_std value: -24.9544 - type: nauc_precision_at_5_diff1 value: 3.4269000000000003 - type: nauc_precision_at_10_max value: -32.149699999999996 - type: nauc_precision_at_10_std value: -21.0142 - type: nauc_precision_at_10_diff1 value: 4.358 - type: nauc_precision_at_20_max value: 0.7547 - type: nauc_precision_at_20_std value: 7.1739999999999995 - type: nauc_precision_at_20_diff1 value: -3.2252 - type: nauc_precision_at_100_max value: 41.4332 - type: nauc_precision_at_100_std value: 86.1111 - type: nauc_precision_at_100_diff1 value: 35.7143 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 100.0 - type: nauc_precision_at_1000_diff1 value: 100.0 - type: nauc_mrr_at_1_max value: -42.7345 - type: nauc_mrr_at_1_std value: -35.9194 - type: nauc_mrr_at_1_diff1 value: -3.8369 - type: nauc_mrr_at_3_max value: -35.497099999999996 - type: nauc_mrr_at_3_std value: -28.1283 - type: nauc_mrr_at_3_diff1 value: 22.5336 - type: nauc_mrr_at_5_max value: -34.9895 - type: nauc_mrr_at_5_std value: -26.9499 - type: nauc_mrr_at_5_diff1 value: 16.9652 - type: nauc_mrr_at_10_max value: -36.7778 - type: nauc_mrr_at_10_std value: -28.069 - type: nauc_mrr_at_10_diff1 value: 18.806700000000003 - type: nauc_mrr_at_20_max value: -36.2726 - type: nauc_mrr_at_20_std value: -26.359500000000004 - type: nauc_mrr_at_20_diff1 value: 18.1655 - type: nauc_mrr_at_100_max value: -36.361 - type: nauc_mrr_at_100_std value: -26.280900000000003 - type: nauc_mrr_at_100_diff1 value: 18.5228 - type: nauc_mrr_at_1000_max value: -36.4424 - type: nauc_mrr_at_1000_std value: -26.415699999999998 - type: nauc_mrr_at_1000_diff1 value: 18.496499999999997 - type: main_score value: 26.888 task: type: Retrieval - dataset: config: default name: MTEB CosQA (default) revision: bc5efb7e9d437246ce393ed19d772e08e4a79535 split: test type: CoIR-Retrieval/cosqa metrics: - type: ndcg_at_1 value: 15.4 - type: ndcg_at_3 value: 23.59 - type: ndcg_at_5 value: 29.779 - type: ndcg_at_10 value: 35.449999999999996 - type: ndcg_at_20 value: 38.309 - type: ndcg_at_100 value: 41.980000000000004 - type: ndcg_at_1000 value: 42.917 - type: map_at_1 value: 15.4 - type: map_at_3 value: 21.4 - type: map_at_5 value: 24.84 - type: map_at_10 value: 27.245 - type: map_at_20 value: 28.043000000000003 - type: map_at_100 value: 28.592000000000002 - type: map_at_1000 value: 28.63 - type: recall_at_1 value: 15.4 - type: recall_at_3 value: 30.0 - type: recall_at_5 value: 45.0 - type: recall_at_10 value: 62.2 - type: recall_at_20 value: 73.4 - type: recall_at_100 value: 92.60000000000001 - type: recall_at_1000 value: 99.8 - type: precision_at_1 value: 15.4 - type: precision_at_3 value: 10.0 - type: precision_at_5 value: 9.0 - type: precision_at_10 value: 6.22 - type: precision_at_20 value: 3.6700000000000004 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 13.600000000000001 - type: mrr_at_3 value: 19.666700000000002 - type: mrr_at_5 value: 22.0867 - type: mrr_at_10 value: 25.020799999999998 - type: mrr_at_20 value: 25.8896 - type: mrr_at_100 value: 26.434400000000004 - type: mrr_at_1000 value: 26.4729 - type: nauc_ndcg_at_1_max value: 7.9282 - type: nauc_ndcg_at_1_std value: -14.053299999999998 - type: nauc_ndcg_at_1_diff1 value: 36.687799999999996 - 
type: nauc_ndcg_at_3_max value: 11.969899999999999 - type: nauc_ndcg_at_3_std value: -13.7404 - type: nauc_ndcg_at_3_diff1 value: 22.2386 - type: nauc_ndcg_at_5_max value: 13.4812 - type: nauc_ndcg_at_5_std value: -13.2079 - type: nauc_ndcg_at_5_diff1 value: 15.8384 - type: nauc_ndcg_at_10_max value: 12.061399999999999 - type: nauc_ndcg_at_10_std value: -15.1337 - type: nauc_ndcg_at_10_diff1 value: 18.804399999999998 - type: nauc_ndcg_at_20_max value: 14.027000000000001 - type: nauc_ndcg_at_20_std value: -13.123899999999999 - type: nauc_ndcg_at_20_diff1 value: 18.546499999999998 - type: nauc_ndcg_at_100_max value: 15.4228 - type: nauc_ndcg_at_100_std value: -9.7982 - type: nauc_ndcg_at_100_diff1 value: 20.637900000000002 - type: nauc_ndcg_at_1000_max value: 13.3878 - type: nauc_ndcg_at_1000_std value: -12.3766 - type: nauc_ndcg_at_1000_diff1 value: 21.2979 - type: nauc_map_at_1_max value: 7.9282 - type: nauc_map_at_1_std value: -14.053299999999998 - type: nauc_map_at_1_diff1 value: 36.687799999999996 - type: nauc_map_at_3_max value: 11.2376 - type: nauc_map_at_3_std value: -13.882800000000001 - type: nauc_map_at_3_diff1 value: 25.4638 - type: nauc_map_at_5_max value: 12.0973 - type: nauc_map_at_5_std value: -13.581399999999999 - type: nauc_map_at_5_diff1 value: 21.6642 - type: nauc_map_at_10_max value: 11.4818 - type: nauc_map_at_10_std value: -14.3841 - type: nauc_map_at_10_diff1 value: 23.0484 - type: nauc_map_at_20_max value: 11.9802 - type: nauc_map_at_20_std value: -13.8687 - type: nauc_map_at_20_diff1 value: 23.0349 - type: nauc_map_at_100_max value: 12.112 - type: nauc_map_at_100_std value: -13.423099999999998 - type: nauc_map_at_100_diff1 value: 23.385 - type: nauc_map_at_1000_max value: 12.034 - type: nauc_map_at_1000_std value: -13.5156 - type: nauc_map_at_1000_diff1 value: 23.4084 - type: nauc_recall_at_1_max value: 7.9282 - type: nauc_recall_at_1_std value: -14.053299999999998 - type: nauc_recall_at_1_diff1 value: 36.687799999999996 - type: nauc_recall_at_3_max value: 13.6773 - type: nauc_recall_at_3_std value: -13.376299999999999 - type: nauc_recall_at_3_diff1 value: 14.4918 - type: nauc_recall_at_5_max value: 16.8852 - type: nauc_recall_at_5_std value: -12.237499999999999 - type: nauc_recall_at_5_diff1 value: 1.4449 - type: nauc_recall_at_10_max value: 13.234499999999999 - type: nauc_recall_at_10_std value: -17.8241 - type: nauc_recall_at_10_diff1 value: 7.6404 - type: nauc_recall_at_20_max value: 22.708000000000002 - type: nauc_recall_at_20_std value: -9.111600000000001 - type: nauc_recall_at_20_diff1 value: 3.4109 - type: nauc_recall_at_100_max value: 66.1165 - type: nauc_recall_at_100_std value: 55.2477 - type: nauc_recall_at_100_diff1 value: 5.7612 - type: nauc_recall_at_1000_max value: 100.0 - type: nauc_recall_at_1000_std value: 86.9281 - type: nauc_recall_at_1000_diff1 value: 72.2222 - type: nauc_precision_at_1_max value: 7.9282 - type: nauc_precision_at_1_std value: -14.053299999999998 - type: nauc_precision_at_1_diff1 value: 36.687799999999996 - type: nauc_precision_at_3_max value: 13.6773 - type: nauc_precision_at_3_std value: -13.376299999999999 - type: nauc_precision_at_3_diff1 value: 14.4918 - type: nauc_precision_at_5_max value: 16.8852 - type: nauc_precision_at_5_std value: -12.237499999999999 - type: nauc_precision_at_5_diff1 value: 1.4449 - type: nauc_precision_at_10_max value: 13.234499999999999 - type: nauc_precision_at_10_std value: -17.8241 - type: nauc_precision_at_10_diff1 value: 7.6404 - type: nauc_precision_at_20_max value: 22.708000000000002 - type: 
nauc_precision_at_20_std value: -9.111600000000001 - type: nauc_precision_at_20_diff1 value: 3.4109 - type: nauc_precision_at_100_max value: 66.1165 - type: nauc_precision_at_100_std value: 55.2477 - type: nauc_precision_at_100_diff1 value: 5.7612 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 86.9281 - type: nauc_precision_at_1000_diff1 value: 72.2222 - type: nauc_mrr_at_1_max value: 13.238199999999999 - type: nauc_mrr_at_1_std value: -21.1942 - type: nauc_mrr_at_1_diff1 value: 47.1481 - type: nauc_mrr_at_3_max value: 13.370999999999999 - type: nauc_mrr_at_3_std value: -18.0171 - type: nauc_mrr_at_3_diff1 value: 31.3232 - type: nauc_mrr_at_5_max value: 12.646099999999999 - type: nauc_mrr_at_5_std value: -18.5601 - type: nauc_mrr_at_5_diff1 value: 28.8561 - type: nauc_mrr_at_10_max value: 13.1101 - type: nauc_mrr_at_10_std value: -18.915000000000003 - type: nauc_mrr_at_10_diff1 value: 28.9512 - type: nauc_mrr_at_20_max value: 13.0191 - type: nauc_mrr_at_20_std value: -18.501 - type: nauc_mrr_at_20_diff1 value: 29.102299999999996 - type: nauc_mrr_at_100_max value: 13.475699999999998 - type: nauc_mrr_at_100_std value: -17.9907 - type: nauc_mrr_at_100_diff1 value: 29.549999999999997 - type: nauc_mrr_at_1000_max value: 13.3963 - type: nauc_mrr_at_1000_std value: -18.093999999999998 - type: nauc_mrr_at_1000_diff1 value: 29.583 - type: main_score value: 35.449999999999996 task: type: Retrieval - dataset: config: default name: MTEB DBPedia (default) revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: ndcg_at_1 value: 51.37500000000001 - type: ndcg_at_3 value: 41.275 - type: ndcg_at_5 value: 38.297 - type: ndcg_at_10 value: 35.96 - type: ndcg_at_20 value: 35.117 - type: ndcg_at_100 value: 39.878 - type: ndcg_at_1000 value: 47.931000000000004 - type: map_at_1 value: 8.651 - type: map_at_3 value: 13.51 - type: map_at_5 value: 15.468000000000002 - type: map_at_10 value: 17.628 - type: map_at_20 value: 19.786 - type: map_at_100 value: 23.354 - type: map_at_1000 value: 24.826 - type: recall_at_1 value: 8.651 - type: recall_at_3 value: 14.847 - type: recall_at_5 value: 18.04 - type: recall_at_10 value: 22.416 - type: recall_at_20 value: 28.136 - type: recall_at_100 value: 46.381 - type: recall_at_1000 value: 71.557 - type: precision_at_1 value: 64.5 - type: precision_at_3 value: 44.417 - type: precision_at_5 value: 36.6 - type: precision_at_10 value: 27.450000000000003 - type: precision_at_20 value: 19.811999999999998 - type: precision_at_100 value: 8.405 - type: precision_at_1000 value: 1.923 - type: mrr_at_1 value: 64.5 - type: mrr_at_3 value: 70.25 - type: mrr_at_5 value: 71.275 - type: mrr_at_10 value: 71.9889 - type: mrr_at_20 value: 72.207 - type: mrr_at_100 value: 72.33239999999999 - type: mrr_at_1000 value: 72.3461 - type: nauc_ndcg_at_1_max value: 31.932100000000002 - type: nauc_ndcg_at_1_std value: 10.2841 - type: nauc_ndcg_at_1_diff1 value: 36.07 - type: nauc_ndcg_at_3_max value: 29.2531 - type: nauc_ndcg_at_3_std value: 11.178799999999999 - type: nauc_ndcg_at_3_diff1 value: 25.764799999999997 - type: nauc_ndcg_at_5_max value: 27.1826 - type: nauc_ndcg_at_5_std value: 12.5 - type: nauc_ndcg_at_5_diff1 value: 24.9511 - type: nauc_ndcg_at_10_max value: 24.1388 - type: nauc_ndcg_at_10_std value: 11.350200000000001 - type: nauc_ndcg_at_10_diff1 value: 23.7319 - type: nauc_ndcg_at_20_max value: 19.1396 - type: nauc_ndcg_at_20_std value: 9.464699999999999 - type: nauc_ndcg_at_20_diff1 value: 20.9192 - type: 
nauc_ndcg_at_100_max value: 20.1158 - type: nauc_ndcg_at_100_std value: 13.2815 - type: nauc_ndcg_at_100_diff1 value: 21.221400000000003 - type: nauc_ndcg_at_1000_max value: 26.648899999999998 - type: nauc_ndcg_at_1000_std value: 22.5347 - type: nauc_ndcg_at_1000_diff1 value: 19.6168 - type: nauc_map_at_1_max value: -4.3177 - type: nauc_map_at_1_std value: -24.5562 - type: nauc_map_at_1_diff1 value: 29.4423 - type: nauc_map_at_3_max value: -3.3966000000000003 - type: nauc_map_at_3_std value: -21.9222 - type: nauc_map_at_3_diff1 value: 21.2481 - type: nauc_map_at_5_max value: -1.1166 - type: nauc_map_at_5_std value: -17.1077 - type: nauc_map_at_5_diff1 value: 19.9608 - type: nauc_map_at_10_max value: 2.8669000000000002 - type: nauc_map_at_10_std value: -11.6119 - type: nauc_map_at_10_diff1 value: 19.6247 - type: nauc_map_at_20_max value: 6.4855 - type: nauc_map_at_20_std value: -4.1277 - type: nauc_map_at_20_diff1 value: 18.1824 - type: nauc_map_at_100_max value: 12.971499999999999 - type: nauc_map_at_100_std value: 7.603400000000001 - type: nauc_map_at_100_diff1 value: 17.5644 - type: nauc_map_at_1000_max value: 15.277299999999999 - type: nauc_map_at_1000_std value: 10.5578 - type: nauc_map_at_1000_diff1 value: 17.1155 - type: nauc_recall_at_1_max value: -4.3177 - type: nauc_recall_at_1_std value: -24.5562 - type: nauc_recall_at_1_diff1 value: 29.4423 - type: nauc_recall_at_3_max value: -6.2376000000000005 - type: nauc_recall_at_3_std value: -23.4233 - type: nauc_recall_at_3_diff1 value: 17.329800000000002 - type: nauc_recall_at_5_max value: -3.4825000000000004 - type: nauc_recall_at_5_std value: -17.4895 - type: nauc_recall_at_5_diff1 value: 16.2379 - type: nauc_recall_at_10_max value: 0.9988 - type: nauc_recall_at_10_std value: -11.1992 - type: nauc_recall_at_10_diff1 value: 16.225 - type: nauc_recall_at_20_max value: 4.693300000000001 - type: nauc_recall_at_20_std value: -1.8259999999999998 - type: nauc_recall_at_20_diff1 value: 12.612400000000001 - type: nauc_recall_at_100_max value: 13.420599999999999 - type: nauc_recall_at_100_std value: 14.4476 - type: nauc_recall_at_100_diff1 value: 14.5736 - type: nauc_recall_at_1000_max value: 18.4052 - type: nauc_recall_at_1000_std value: 32.6262 - type: nauc_recall_at_1000_diff1 value: 6.2448 - type: nauc_precision_at_1_max value: 44.2395 - type: nauc_precision_at_1_std value: 16.9766 - type: nauc_precision_at_1_diff1 value: 42.981 - type: nauc_precision_at_3_max value: 37.5078 - type: nauc_precision_at_3_std value: 24.46 - type: nauc_precision_at_3_diff1 value: 16.700799999999997 - type: nauc_precision_at_5_max value: 39.9766 - type: nauc_precision_at_5_std value: 35.1485 - type: nauc_precision_at_5_diff1 value: 13.0716 - type: nauc_precision_at_10_max value: 39.642500000000005 - type: nauc_precision_at_10_std value: 41.8067 - type: nauc_precision_at_10_diff1 value: 8.864700000000001 - type: nauc_precision_at_20_max value: 36.7342 - type: nauc_precision_at_20_std value: 47.144200000000005 - type: nauc_precision_at_20_diff1 value: 3.6226000000000003 - type: nauc_precision_at_100_max value: 35.3062 - type: nauc_precision_at_100_std value: 47.2687 - type: nauc_precision_at_100_diff1 value: 0.0039 - type: nauc_precision_at_1000_max value: 27.387099999999997 - type: nauc_precision_at_1000_std value: 24.4162 - type: nauc_precision_at_1000_diff1 value: -13.5 - type: nauc_mrr_at_1_max value: 44.2395 - type: nauc_mrr_at_1_std value: 16.9766 - type: nauc_mrr_at_1_diff1 value: 42.981 - type: nauc_mrr_at_3_max value: 45.9027 - type: nauc_mrr_at_3_std 
value: 16.3998 - type: nauc_mrr_at_3_diff1 value: 42.7201 - type: nauc_mrr_at_5_max value: 46.7905 - type: nauc_mrr_at_5_std value: 17.921599999999998 - type: nauc_mrr_at_5_diff1 value: 42.4334 - type: nauc_mrr_at_10_max value: 46.775 - type: nauc_mrr_at_10_std value: 18.282899999999998 - type: nauc_mrr_at_10_diff1 value: 42.4501 - type: nauc_mrr_at_20_max value: 46.671600000000005 - type: nauc_mrr_at_20_std value: 18.064700000000002 - type: nauc_mrr_at_20_diff1 value: 42.4331 - type: nauc_mrr_at_100_max value: 46.7118 - type: nauc_mrr_at_100_std value: 18.2135 - type: nauc_mrr_at_100_diff1 value: 42.4809 - type: nauc_mrr_at_1000_max value: 46.6966 - type: nauc_mrr_at_1000_std value: 18.185200000000002 - type: nauc_mrr_at_1000_diff1 value: 42.4844 - type: main_score value: 35.96 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 38.795 - type: f1 value: 35.2399 - type: f1_weighted value: 40.7945 - type: main_score value: 38.795 task: type: Classification - dataset: config: default name: MTEB FEVER (default) revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: ndcg_at_1 value: 79.08800000000001 - type: ndcg_at_3 value: 83.943 - type: ndcg_at_5 value: 84.878 - type: ndcg_at_10 value: 85.528 - type: ndcg_at_20 value: 85.842 - type: ndcg_at_100 value: 86.134 - type: ndcg_at_1000 value: 86.367 - type: map_at_1 value: 73.211 - type: map_at_3 value: 80.5 - type: map_at_5 value: 81.134 - type: map_at_10 value: 81.463 - type: map_at_20 value: 81.566 - type: map_at_100 value: 81.622 - type: map_at_1000 value: 81.634 - type: recall_at_1 value: 73.211 - type: recall_at_3 value: 88.32799999999999 - type: recall_at_5 value: 90.821 - type: recall_at_10 value: 92.797 - type: recall_at_20 value: 93.932 - type: recall_at_100 value: 95.26299999999999 - type: recall_at_1000 value: 96.738 - type: precision_at_1 value: 79.08800000000001 - type: precision_at_3 value: 31.963 - type: precision_at_5 value: 19.769000000000002 - type: precision_at_10 value: 10.132 - type: precision_at_20 value: 5.149 - type: precision_at_100 value: 1.055 - type: precision_at_1000 value: 0.109 - type: mrr_at_1 value: 79.0879 - type: mrr_at_3 value: 86.1536 - type: mrr_at_5 value: 86.7004 - type: mrr_at_10 value: 86.9425 - type: mrr_at_20 value: 87.00099999999999 - type: mrr_at_100 value: 87.01719999999999 - type: mrr_at_1000 value: 87.01769999999999 - type: nauc_ndcg_at_1_max value: 28.2184 - type: nauc_ndcg_at_1_std value: -20.374200000000002 - type: nauc_ndcg_at_1_diff1 value: 64.4185 - type: nauc_ndcg_at_3_max value: 22.014 - type: nauc_ndcg_at_3_std value: -15.221699999999998 - type: nauc_ndcg_at_3_diff1 value: 47.511700000000005 - type: nauc_ndcg_at_5_max value: 21.381700000000002 - type: nauc_ndcg_at_5_std value: -14.3711 - type: nauc_ndcg_at_5_diff1 value: 46.6271 - type: nauc_ndcg_at_10_max value: 20.4251 - type: nauc_ndcg_at_10_std value: -13.3096 - type: nauc_ndcg_at_10_diff1 value: 46.1205 - type: nauc_ndcg_at_20_max value: 20.686 - type: nauc_ndcg_at_20_std value: -12.6058 - type: nauc_ndcg_at_20_diff1 value: 46.14 - type: nauc_ndcg_at_100_max value: 20.657700000000002 - type: nauc_ndcg_at_100_std value: -12.5531 - type: nauc_ndcg_at_100_diff1 value: 46.3788 - type: nauc_ndcg_at_1000_max value: 21.0177 - type: nauc_ndcg_at_1000_std value: -12.8318 - type: nauc_ndcg_at_1000_diff1 value: 46.8648 - type: nauc_map_at_1_max value: 
21.4975 - type: nauc_map_at_1_std value: -14.5207 - type: nauc_map_at_1_diff1 value: 51.53959999999999 - type: nauc_map_at_3_max value: 20.322699999999998 - type: nauc_map_at_3_std value: -13.8986 - type: nauc_map_at_3_diff1 value: 46.3932 - type: nauc_map_at_5_max value: 20.3296 - type: nauc_map_at_5_std value: -13.5416 - type: nauc_map_at_5_diff1 value: 46.1518 - type: nauc_map_at_10_max value: 20.0385 - type: nauc_map_at_10_std value: -13.239999999999998 - type: nauc_map_at_10_diff1 value: 46.061800000000005 - type: nauc_map_at_20_max value: 20.113300000000002 - type: nauc_map_at_20_std value: -13.0931 - type: nauc_map_at_20_diff1 value: 46.091 - type: nauc_map_at_100_max value: 20.1262 - type: nauc_map_at_100_std value: -13.0646 - type: nauc_map_at_100_diff1 value: 46.1321 - type: nauc_map_at_1000_max value: 20.1391 - type: nauc_map_at_1000_std value: -13.069600000000001 - type: nauc_map_at_1000_diff1 value: 46.1501 - type: nauc_recall_at_1_max value: 21.4975 - type: nauc_recall_at_1_std value: -14.5207 - type: nauc_recall_at_1_diff1 value: 51.53959999999999 - type: nauc_recall_at_3_max value: 15.379399999999999 - type: nauc_recall_at_3_std value: -9.9735 - type: nauc_recall_at_3_diff1 value: 30.6769 - type: nauc_recall_at_5_max value: 13.104099999999999 - type: nauc_recall_at_5_std value: -6.2273000000000005 - type: nauc_recall_at_5_diff1 value: 24.4602 - type: nauc_recall_at_10_max value: 6.4093 - type: nauc_recall_at_10_std value: 0.9238 - type: nauc_recall_at_10_diff1 value: 16.2715 - type: nauc_recall_at_20_max value: 5.5285 - type: nauc_recall_at_20_std value: 9.1474 - type: nauc_recall_at_20_diff1 value: 10.8034 - type: nauc_recall_at_100_max value: -0.116 - type: nauc_recall_at_100_std value: 14.4612 - type: nauc_recall_at_100_diff1 value: 4.6372 - type: nauc_recall_at_1000_max value: -1.595 - type: nauc_recall_at_1000_std value: 18.1495 - type: nauc_recall_at_1000_diff1 value: -0.022000000000000002 - type: nauc_precision_at_1_max value: 28.2184 - type: nauc_precision_at_1_std value: -20.374200000000002 - type: nauc_precision_at_1_diff1 value: 64.4185 - type: nauc_precision_at_3_max value: 24.238799999999998 - type: nauc_precision_at_3_std value: -19.7064 - type: nauc_precision_at_3_diff1 value: 37.7498 - type: nauc_precision_at_5_max value: 20.8308 - type: nauc_precision_at_5_std value: -13.6486 - type: nauc_precision_at_5_diff1 value: 23.3404 - type: nauc_precision_at_10_max value: 9.4386 - type: nauc_precision_at_10_std value: -4.8239 - type: nauc_precision_at_10_diff1 value: 6.8594 - type: nauc_precision_at_20_max value: 9.0063 - type: nauc_precision_at_20_std value: 4.0311 - type: nauc_precision_at_20_diff1 value: -2.9298 - type: nauc_precision_at_100_max value: 5.1057 - type: nauc_precision_at_100_std value: 7.3903 - type: nauc_precision_at_100_diff1 value: -8.7148 - type: nauc_precision_at_1000_max value: 6.3359 - type: nauc_precision_at_1000_std value: 3.9797 - type: nauc_precision_at_1000_diff1 value: -8.3131 - type: nauc_mrr_at_1_max value: 28.2184 - type: nauc_mrr_at_1_std value: -20.374200000000002 - type: nauc_mrr_at_1_diff1 value: 64.4185 - type: nauc_mrr_at_3_max value: 29.7481 - type: nauc_mrr_at_3_std value: -21.9924 - type: nauc_mrr_at_3_diff1 value: 62.5737 - type: nauc_mrr_at_5_max value: 29.8062 - type: nauc_mrr_at_5_std value: -22.078 - type: nauc_mrr_at_5_diff1 value: 62.9 - type: nauc_mrr_at_10_max value: 29.641000000000002 - type: nauc_mrr_at_10_std value: -21.6827 - type: nauc_mrr_at_10_diff1 value: 62.944599999999994 - type: nauc_mrr_at_20_max value: 
29.6535 - type: nauc_mrr_at_20_std value: -21.520400000000002 - type: nauc_mrr_at_20_diff1 value: 62.9583 - type: nauc_mrr_at_100_max value: 29.622799999999998 - type: nauc_mrr_at_100_std value: -21.5393 - type: nauc_mrr_at_100_diff1 value: 62.9658 - type: nauc_mrr_at_1000_max value: 29.619400000000002 - type: nauc_mrr_at_1000_std value: -21.5417 - type: nauc_mrr_at_1000_diff1 value: 62.96469999999999 - type: main_score value: 85.528 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 (default) revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: ndcg_at_1 value: 35.494 - type: ndcg_at_3 value: 32.305 - type: ndcg_at_5 value: 34.332 - type: ndcg_at_10 value: 36.851 - type: ndcg_at_20 value: 39.31 - type: ndcg_at_100 value: 43.462 - type: ndcg_at_1000 value: 46.766000000000005 - type: map_at_1 value: 18.311 - type: map_at_3 value: 24.778 - type: map_at_5 value: 27.453 - type: map_at_10 value: 29.198 - type: map_at_20 value: 30.118000000000002 - type: map_at_100 value: 30.930000000000003 - type: map_at_1000 value: 31.115 - type: recall_at_1 value: 18.311 - type: recall_at_3 value: 28.823999999999998 - type: recall_at_5 value: 36.178 - type: recall_at_10 value: 43.842 - type: recall_at_20 value: 51.370000000000005 - type: recall_at_100 value: 68.593 - type: recall_at_1000 value: 88.55 - type: precision_at_1 value: 35.494 - type: precision_at_3 value: 21.142 - type: precision_at_5 value: 16.326999999999998 - type: precision_at_10 value: 10.309 - type: precision_at_20 value: 6.211 - type: precision_at_100 value: 1.7069999999999999 - type: precision_at_1000 value: 0.22899999999999998 - type: mrr_at_1 value: 35.4938 - type: mrr_at_3 value: 41.6667 - type: mrr_at_5 value: 43.4182 - type: mrr_at_10 value: 44.4732 - type: mrr_at_20 value: 44.969 - type: mrr_at_100 value: 45.318599999999996 - type: mrr_at_1000 value: 45.3674 - type: nauc_ndcg_at_1_max value: 33.946799999999996 - type: nauc_ndcg_at_1_std value: -5.282 - type: nauc_ndcg_at_1_diff1 value: 47.413 - type: nauc_ndcg_at_3_max value: 30.9073 - type: nauc_ndcg_at_3_std value: -2.2498 - type: nauc_ndcg_at_3_diff1 value: 38.548500000000004 - type: nauc_ndcg_at_5_max value: 30.2537 - type: nauc_ndcg_at_5_std value: -0.9919000000000001 - type: nauc_ndcg_at_5_diff1 value: 37.988499999999995 - type: nauc_ndcg_at_10_max value: 30.5224 - type: nauc_ndcg_at_10_std value: 0.0762 - type: nauc_ndcg_at_10_diff1 value: 38.2531 - type: nauc_ndcg_at_20_max value: 32.173 - type: nauc_ndcg_at_20_std value: 3.3266999999999998 - type: nauc_ndcg_at_20_diff1 value: 37.5071 - type: nauc_ndcg_at_100_max value: 33.551700000000004 - type: nauc_ndcg_at_100_std value: 5.8902 - type: nauc_ndcg_at_100_diff1 value: 37.3363 - type: nauc_ndcg_at_1000_max value: 34.1671 - type: nauc_ndcg_at_1000_std value: 5.4682 - type: nauc_ndcg_at_1000_diff1 value: 37.5779 - type: nauc_map_at_1_max value: 20.0425 - type: nauc_map_at_1_std value: -7.41 - type: nauc_map_at_1_diff1 value: 40.725699999999996 - type: nauc_map_at_3_max value: 25.380799999999997 - type: nauc_map_at_3_std value: -4.5524000000000004 - type: nauc_map_at_3_diff1 value: 38.960699999999996 - type: nauc_map_at_5_max value: 27.208900000000003 - type: nauc_map_at_5_std value: -3.034 - type: nauc_map_at_5_diff1 value: 38.475500000000004 - type: nauc_map_at_10_max value: 28.6066 - type: nauc_map_at_10_std value: -2.1042 - type: nauc_map_at_10_diff1 value: 38.4411 - type: nauc_map_at_20_max value: 29.3931 - type: nauc_map_at_20_std value: -0.8289 - type: 
nauc_map_at_20_diff1 value: 38.137 - type: nauc_map_at_100_max value: 29.8041 - type: nauc_map_at_100_std value: -0.1992 - type: nauc_map_at_100_diff1 value: 38.0546 - type: nauc_map_at_1000_max value: 29.886400000000002 - type: nauc_map_at_1000_std value: -0.1638 - type: nauc_map_at_1000_diff1 value: 38.0646 - type: nauc_recall_at_1_max value: 20.0425 - type: nauc_recall_at_1_std value: -7.41 - type: nauc_recall_at_1_diff1 value: 40.725699999999996 - type: nauc_recall_at_3_max value: 20.8038 - type: nauc_recall_at_3_std value: -4.1075 - type: nauc_recall_at_3_diff1 value: 33.0009 - type: nauc_recall_at_5_max value: 23.1816 - type: nauc_recall_at_5_std value: 0.2681 - type: nauc_recall_at_5_diff1 value: 30.1663 - type: nauc_recall_at_10_max value: 23.754 - type: nauc_recall_at_10_std value: 2.4185000000000003 - type: nauc_recall_at_10_diff1 value: 28.475499999999997 - type: nauc_recall_at_20_max value: 27.711599999999997 - type: nauc_recall_at_20_std value: 12.509700000000002 - type: nauc_recall_at_20_diff1 value: 25.172299999999996 - type: nauc_recall_at_100_max value: 29.3806 - type: nauc_recall_at_100_std value: 25.1963 - type: nauc_recall_at_100_diff1 value: 21.849 - type: nauc_recall_at_1000_max value: 34.1492 - type: nauc_recall_at_1000_std value: 40.4872 - type: nauc_recall_at_1000_diff1 value: 17.0167 - type: nauc_precision_at_1_max value: 33.946799999999996 - type: nauc_precision_at_1_std value: -5.282 - type: nauc_precision_at_1_diff1 value: 47.413 - type: nauc_precision_at_3_max value: 36.6837 - type: nauc_precision_at_3_std value: 3.7282 - type: nauc_precision_at_3_diff1 value: 31.0152 - type: nauc_precision_at_5_max value: 37.6087 - type: nauc_precision_at_5_std value: 7.3439000000000005 - type: nauc_precision_at_5_diff1 value: 27.2321 - type: nauc_precision_at_10_max value: 38.2792 - type: nauc_precision_at_10_std value: 11.3814 - type: nauc_precision_at_10_diff1 value: 22.6494 - type: nauc_precision_at_20_max value: 38.455 - type: nauc_precision_at_20_std value: 17.4053 - type: nauc_precision_at_20_diff1 value: 16.8265 - type: nauc_precision_at_100_max value: 36.203 - type: nauc_precision_at_100_std value: 22.2758 - type: nauc_precision_at_100_diff1 value: 8.3908 - type: nauc_precision_at_1000_max value: 29.599700000000002 - type: nauc_precision_at_1000_std value: 17.186899999999998 - type: nauc_precision_at_1000_diff1 value: 0.0332 - type: nauc_mrr_at_1_max value: 33.946799999999996 - type: nauc_mrr_at_1_std value: -5.282 - type: nauc_mrr_at_1_diff1 value: 47.413 - type: nauc_mrr_at_3_max value: 34.0785 - type: nauc_mrr_at_3_std value: -2.1323000000000003 - type: nauc_mrr_at_3_diff1 value: 43.8661 - type: nauc_mrr_at_5_max value: 34.244 - type: nauc_mrr_at_5_std value: -1.5425 - type: nauc_mrr_at_5_diff1 value: 43.7631 - type: nauc_mrr_at_10_max value: 34.265299999999996 - type: nauc_mrr_at_10_std value: -1.1494 - type: nauc_mrr_at_10_diff1 value: 43.639 - type: nauc_mrr_at_20_max value: 34.5648 - type: nauc_mrr_at_20_std value: -0.6076 - type: nauc_mrr_at_20_diff1 value: 43.431 - type: nauc_mrr_at_100_max value: 34.571400000000004 - type: nauc_mrr_at_100_std value: -0.5074000000000001 - type: nauc_mrr_at_100_diff1 value: 43.4003 - type: nauc_mrr_at_1000_max value: 34.5576 - type: nauc_mrr_at_1000_std value: -0.534 - type: nauc_mrr_at_1000_diff1 value: 43.4086 - type: main_score value: 36.851 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA (default) revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa metrics: - type: 
ndcg_at_1 value: 73.531 - type: ndcg_at_3 value: 58.24700000000001 - type: ndcg_at_5 value: 60.905 - type: ndcg_at_10 value: 62.918 - type: ndcg_at_20 value: 64.297 - type: ndcg_at_100 value: 66.056 - type: ndcg_at_1000 value: 67.554 - type: map_at_1 value: 36.766 - type: map_at_3 value: 50.427 - type: map_at_5 value: 52.449999999999996 - type: map_at_10 value: 53.639 - type: map_at_20 value: 54.17999999999999 - type: map_at_100 value: 54.532000000000004 - type: map_at_1000 value: 54.608000000000004 - type: recall_at_1 value: 36.766 - type: recall_at_3 value: 54.835 - type: recall_at_5 value: 60.080999999999996 - type: recall_at_10 value: 65.098 - type: recall_at_20 value: 69.541 - type: recall_at_100 value: 77.306 - type: recall_at_1000 value: 87.252 - type: precision_at_1 value: 73.531 - type: precision_at_3 value: 36.556 - type: precision_at_5 value: 24.032 - type: precision_at_10 value: 13.020000000000001 - type: precision_at_20 value: 6.954000000000001 - type: precision_at_100 value: 1.546 - type: precision_at_1000 value: 0.17500000000000002 - type: mrr_at_1 value: 73.5314 - type: mrr_at_3 value: 78.9489 - type: mrr_at_5 value: 79.7288 - type: mrr_at_10 value: 80.1036 - type: mrr_at_20 value: 80.2602 - type: mrr_at_100 value: 80.3412 - type: mrr_at_1000 value: 80.3512 - type: nauc_ndcg_at_1_max value: 49.4087 - type: nauc_ndcg_at_1_std value: -8.233 - type: nauc_ndcg_at_1_diff1 value: 69.19380000000001 - type: nauc_ndcg_at_3_max value: 29.407899999999998 - type: nauc_ndcg_at_3_std value: -2.1144 - type: nauc_ndcg_at_3_diff1 value: 27.245599999999996 - type: nauc_ndcg_at_5_max value: 27.483 - type: nauc_ndcg_at_5_std value: -0.7036 - type: nauc_ndcg_at_5_diff1 value: 24.2534 - type: nauc_ndcg_at_10_max value: 26.766499999999997 - type: nauc_ndcg_at_10_std value: 0.5583 - type: nauc_ndcg_at_10_diff1 value: 22.822300000000002 - type: nauc_ndcg_at_20_max value: 26.339800000000004 - type: nauc_ndcg_at_20_std value: 1.3486 - type: nauc_ndcg_at_20_diff1 value: 22.3499 - type: nauc_ndcg_at_100_max value: 26.436799999999998 - type: nauc_ndcg_at_100_std value: 2.5304 - type: nauc_ndcg_at_100_diff1 value: 22.372700000000002 - type: nauc_ndcg_at_1000_max value: 26.9472 - type: nauc_ndcg_at_1000_std value: 2.3277 - type: nauc_ndcg_at_1000_diff1 value: 23.3345 - type: nauc_map_at_1_max value: 49.4087 - type: nauc_map_at_1_std value: -8.233 - type: nauc_map_at_1_diff1 value: 69.19380000000001 - type: nauc_map_at_3_max value: 25.2676 - type: nauc_map_at_3_std value: -1.8659999999999999 - type: nauc_map_at_3_diff1 value: 21.0961 - type: nauc_map_at_5_max value: 24.0651 - type: nauc_map_at_5_std value: -0.8111 - type: nauc_map_at_5_diff1 value: 19.237099999999998 - type: nauc_map_at_10_max value: 23.785 - type: nauc_map_at_10_std value: -0.1037 - type: nauc_map_at_10_diff1 value: 18.5973 - type: nauc_map_at_20_max value: 23.6813 - type: nauc_map_at_20_std value: 0.1708 - type: nauc_map_at_20_diff1 value: 18.499299999999998 - type: nauc_map_at_100_max value: 23.7276 - type: nauc_map_at_100_std value: 0.3879 - type: nauc_map_at_100_diff1 value: 18.5423 - type: nauc_map_at_1000_max value: 23.7501 - type: nauc_map_at_1000_std value: 0.3886 - type: nauc_map_at_1000_diff1 value: 18.578500000000002 - type: nauc_recall_at_1_max value: 49.4087 - type: nauc_recall_at_1_std value: -8.233 - type: nauc_recall_at_1_diff1 value: 69.19380000000001 - type: nauc_recall_at_3_max value: 21.7043 - type: nauc_recall_at_3_std value: 0.24320000000000003 - type: nauc_recall_at_3_diff1 value: 12.102599999999999 - type: 
nauc_recall_at_5_max value: 16.923 - type: nauc_recall_at_5_std value: 2.9763 - type: nauc_recall_at_5_diff1 value: 5.5262 - type: nauc_recall_at_10_max value: 13.8286 - type: nauc_recall_at_10_std value: 6.1254 - type: nauc_recall_at_10_diff1 value: 0.6326 - type: nauc_recall_at_20_max value: 11.307300000000001 - type: nauc_recall_at_20_std value: 8.9861 - type: nauc_recall_at_20_diff1 value: -2.5909 - type: nauc_recall_at_100_max value: 8.2009 - type: nauc_recall_at_100_std value: 16.051199999999998 - type: nauc_recall_at_100_diff1 value: -7.757699999999999 - type: nauc_recall_at_1000_max value: 5.4062 - type: nauc_recall_at_1000_std value: 20.6122 - type: nauc_recall_at_1000_diff1 value: -11.931700000000001 - type: nauc_precision_at_1_max value: 49.4087 - type: nauc_precision_at_1_std value: -8.233 - type: nauc_precision_at_1_diff1 value: 69.19380000000001 - type: nauc_precision_at_3_max value: 21.7043 - type: nauc_precision_at_3_std value: 0.24320000000000003 - type: nauc_precision_at_3_diff1 value: 12.102599999999999 - type: nauc_precision_at_5_max value: 16.923 - type: nauc_precision_at_5_std value: 2.9763 - type: nauc_precision_at_5_diff1 value: 5.5262 - type: nauc_precision_at_10_max value: 13.8286 - type: nauc_precision_at_10_std value: 6.1254 - type: nauc_precision_at_10_diff1 value: 0.6326 - type: nauc_precision_at_20_max value: 11.307300000000001 - type: nauc_precision_at_20_std value: 8.9861 - type: nauc_precision_at_20_diff1 value: -2.5909 - type: nauc_precision_at_100_max value: 8.2009 - type: nauc_precision_at_100_std value: 16.051199999999998 - type: nauc_precision_at_100_diff1 value: -7.757699999999999 - type: nauc_precision_at_1000_max value: 5.4062 - type: nauc_precision_at_1000_std value: 20.6122 - type: nauc_precision_at_1000_diff1 value: -11.931700000000001 - type: nauc_mrr_at_1_max value: 49.4087 - type: nauc_mrr_at_1_std value: -8.233 - type: nauc_mrr_at_1_diff1 value: 69.19380000000001 - type: nauc_mrr_at_3_max value: 51.004099999999994 - type: nauc_mrr_at_3_std value: -6.4677 - type: nauc_mrr_at_3_diff1 value: 66.1969 - type: nauc_mrr_at_5_max value: 50.880199999999995 - type: nauc_mrr_at_5_std value: -6.3541 - type: nauc_mrr_at_5_diff1 value: 66.0764 - type: nauc_mrr_at_10_max value: 50.924899999999994 - type: nauc_mrr_at_10_std value: -6.2945 - type: nauc_mrr_at_10_diff1 value: 66.2079 - type: nauc_mrr_at_20_max value: 50.907199999999996 - type: nauc_mrr_at_20_std value: -6.253 - type: nauc_mrr_at_20_diff1 value: 66.28450000000001 - type: nauc_mrr_at_100_max value: 50.8991 - type: nauc_mrr_at_100_std value: -6.2459 - type: nauc_mrr_at_100_diff1 value: 66.3257 - type: nauc_mrr_at_1000_max value: 50.8934 - type: nauc_mrr_at_1000_std value: -6.2602 - type: nauc_mrr_at_1000_diff1 value: 66.328 - type: main_score value: 62.918 task: type: Retrieval - dataset: config: default name: MTEB ImdbClassification (default) revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 62.2348 - type: f1 value: 62.0977 - type: f1_weighted value: 62.0977 - type: ap value: 57.750800000000005 - type: ap_weighted value: 57.750800000000005 - type: main_score value: 62.2348 task: type: Classification - dataset: config: default name: MTEB MSMARCO (default) revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: - type: ndcg_at_1 value: 15.085999999999999 - type: ndcg_at_3 value: 23.567 - type: ndcg_at_5 value: 27.066000000000003 - type: ndcg_at_10 value: 30.711 - type: ndcg_at_20 value: 
33.251999999999995 - type: ndcg_at_100 value: 37.221 - type: ndcg_at_1000 value: 39.133 - type: map_at_1 value: 14.654 - type: map_at_3 value: 21.234 - type: map_at_5 value: 23.189999999999998 - type: map_at_10 value: 24.72 - type: map_at_20 value: 25.433 - type: map_at_100 value: 25.994 - type: map_at_1000 value: 26.067 - type: recall_at_1 value: 14.654 - type: recall_at_3 value: 29.862 - type: recall_at_5 value: 38.274 - type: recall_at_10 value: 49.341 - type: recall_at_20 value: 59.206 - type: recall_at_100 value: 80.22399999999999 - type: recall_at_1000 value: 95.037 - type: precision_at_1 value: 15.085999999999999 - type: precision_at_3 value: 10.277 - type: precision_at_5 value: 7.922999999999999 - type: precision_at_10 value: 5.132 - type: precision_at_20 value: 3.0949999999999998 - type: precision_at_100 value: 0.845 - type: precision_at_1000 value: 0.101 - type: mrr_at_1 value: 15.085999999999999 - type: mrr_at_3 value: 21.7311 - type: mrr_at_5 value: 23.6738 - type: mrr_at_10 value: 25.184099999999997 - type: mrr_at_20 value: 25.878899999999998 - type: mrr_at_100 value: 26.4216 - type: mrr_at_1000 value: 26.4886 - type: nauc_ndcg_at_1_max value: 3.3686000000000003 - type: nauc_ndcg_at_1_std value: -14.960799999999999 - type: nauc_ndcg_at_1_diff1 value: 30.0257 - type: nauc_ndcg_at_3_max value: 4.3222 - type: nauc_ndcg_at_3_std value: -15.8473 - type: nauc_ndcg_at_3_diff1 value: 26.935399999999998 - type: nauc_ndcg_at_5_max value: 4.8392 - type: nauc_ndcg_at_5_std value: -15.7197 - type: nauc_ndcg_at_5_diff1 value: 26.1067 - type: nauc_ndcg_at_10_max value: 4.8289 - type: nauc_ndcg_at_10_std value: -14.713300000000002 - type: nauc_ndcg_at_10_diff1 value: 25.3576 - type: nauc_ndcg_at_20_max value: 5.2264 - type: nauc_ndcg_at_20_std value: -13.5723 - type: nauc_ndcg_at_20_diff1 value: 25.7189 - type: nauc_ndcg_at_100_max value: 6.2197000000000005 - type: nauc_ndcg_at_100_std value: -10.5613 - type: nauc_ndcg_at_100_diff1 value: 25.407200000000003 - type: nauc_ndcg_at_1000_max value: 6.336899999999999 - type: nauc_ndcg_at_1000_std value: -11.2538 - type: nauc_ndcg_at_1000_diff1 value: 25.8353 - type: nauc_map_at_1_max value: 3.4762 - type: nauc_map_at_1_std value: -14.829899999999999 - type: nauc_map_at_1_diff1 value: 30.220200000000002 - type: nauc_map_at_3_max value: 4.1498 - type: nauc_map_at_3_std value: -15.659699999999999 - type: nauc_map_at_3_diff1 value: 27.6738 - type: nauc_map_at_5_max value: 4.457599999999999 - type: nauc_map_at_5_std value: -15.593599999999999 - type: nauc_map_at_5_diff1 value: 27.147399999999998 - type: nauc_map_at_10_max value: 4.4191 - type: nauc_map_at_10_std value: -15.199599999999998 - type: nauc_map_at_10_diff1 value: 26.8024 - type: nauc_map_at_20_max value: 4.559699999999999 - type: nauc_map_at_20_std value: -14.8687 - type: nauc_map_at_20_diff1 value: 26.929799999999997 - type: nauc_map_at_100_max value: 4.709300000000001 - type: nauc_map_at_100_std value: -14.430599999999998 - type: nauc_map_at_100_diff1 value: 26.895200000000003 - type: nauc_map_at_1000_max value: 4.7146 - type: nauc_map_at_1000_std value: -14.4381 - type: nauc_map_at_1000_diff1 value: 26.9071 - type: nauc_recall_at_1_max value: 3.4762 - type: nauc_recall_at_1_std value: -14.829899999999999 - type: nauc_recall_at_1_diff1 value: 30.220200000000002 - type: nauc_recall_at_3_max value: 4.8518 - type: nauc_recall_at_3_std value: -16.215 - type: nauc_recall_at_3_diff1 value: 25.1628 - type: nauc_recall_at_5_max value: 5.8279 - type: nauc_recall_at_5_std value: -15.9303 - type: 
nauc_recall_at_5_diff1 value: 23.544999999999998 - type: nauc_recall_at_10_max value: 5.7948 - type: nauc_recall_at_10_std value: -13.1624 - type: nauc_recall_at_10_diff1 value: 21.5447 - type: nauc_recall_at_20_max value: 7.0539000000000005 - type: nauc_recall_at_20_std value: -8.9408 - type: nauc_recall_at_20_diff1 value: 22.4027 - type: nauc_recall_at_100_max value: 15.1651 - type: nauc_recall_at_100_std value: 16.419 - type: nauc_recall_at_100_diff1 value: 17.897299999999998 - type: nauc_recall_at_1000_max value: 41.646300000000004 - type: nauc_recall_at_1000_std value: 54.791000000000004 - type: nauc_recall_at_1000_diff1 value: 16.4922 - type: nauc_precision_at_1_max value: 3.3686000000000003 - type: nauc_precision_at_1_std value: -14.960799999999999 - type: nauc_precision_at_1_diff1 value: 30.0257 - type: nauc_precision_at_3_max value: 4.8638 - type: nauc_precision_at_3_std value: -16.3 - type: nauc_precision_at_3_diff1 value: 25.1213 - type: nauc_precision_at_5_max value: 5.8399 - type: nauc_precision_at_5_std value: -16.1007 - type: nauc_precision_at_5_diff1 value: 23.4288 - type: nauc_precision_at_10_max value: 6.042 - type: nauc_precision_at_10_std value: -13.0782 - type: nauc_precision_at_10_diff1 value: 20.8509 - type: nauc_precision_at_20_max value: 7.9528 - type: nauc_precision_at_20_std value: -8.2321 - type: nauc_precision_at_20_diff1 value: 21.0746 - type: nauc_precision_at_100_max value: 16.026699999999998 - type: nauc_precision_at_100_std value: 15.112200000000001 - type: nauc_precision_at_100_diff1 value: 13.2433 - type: nauc_precision_at_1000_max value: 24.8965 - type: nauc_precision_at_1000_std value: 24.741 - type: nauc_precision_at_1000_diff1 value: 2.8078 - type: nauc_mrr_at_1_max value: 3.3686000000000003 - type: nauc_mrr_at_1_std value: -14.960799999999999 - type: nauc_mrr_at_1_diff1 value: 30.0257 - type: nauc_mrr_at_3_max value: 3.9521 - type: nauc_mrr_at_3_std value: -15.6591 - type: nauc_mrr_at_3_diff1 value: 27.511799999999997 - type: nauc_mrr_at_5_max value: 4.3118 - type: nauc_mrr_at_5_std value: -15.5244 - type: nauc_mrr_at_5_diff1 value: 27.024199999999997 - type: nauc_mrr_at_10_max value: 4.3529 - type: nauc_mrr_at_10_std value: -15.065100000000001 - type: nauc_mrr_at_10_diff1 value: 26.7106 - type: nauc_mrr_at_20_max value: 4.4593 - type: nauc_mrr_at_20_std value: -14.7683 - type: nauc_mrr_at_20_diff1 value: 26.815099999999997 - type: nauc_mrr_at_100_max value: 4.5908999999999995 - type: nauc_mrr_at_100_std value: -14.361099999999999 - type: nauc_mrr_at_100_diff1 value: 26.7866 - type: nauc_mrr_at_1000_max value: 4.5903 - type: nauc_mrr_at_1000_std value: -14.3764 - type: nauc_mrr_at_1000_diff1 value: 26.801000000000002 - type: main_score value: 30.711 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 89.4505 - type: f1 value: 89.00200000000001 - type: f1_weighted value: 89.442 - type: main_score value: 89.4505 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 56.846799999999995 - type: f1 value: 39.2152 - type: f1_weighted value: 58.797999999999995 - type: main_score value: 56.846799999999995 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: 
test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 64.768 - type: f1 value: 61.9285 - type: f1_weighted value: 63.67 - type: main_score value: 64.768 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 71.3416 - type: f1 value: 69.9576 - type: f1_weighted value: 71.19680000000001 - type: main_score value: 71.3416 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P (default) revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: v_measure value: 32.5684 - type: v_measure_std value: 1.6362999999999999 - type: main_score value: 32.5684 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S (default) revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: v_measure value: 31.551299999999998 - type: v_measure_std value: 1.7208999999999999 - type: main_score value: 31.551299999999998 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking (default) revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7 split: test type: mteb/mind_small metrics: - type: map value: 30.883 - type: mrr value: 31.923299999999998 - type: nAUC_map_max value: -20.072000000000003 - type: nAUC_map_std value: -4.8503 - type: nAUC_map_diff1 value: 14.178099999999999 - type: nAUC_mrr_max value: -14.7901 - type: nAUC_mrr_std value: -2.8666 - type: nAUC_mrr_diff1 value: 13.2767 - type: main_score value: 30.883 task: type: Reranking - dataset: config: default name: MTEB NFCorpus (default) revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: mteb/nfcorpus metrics: - type: ndcg_at_1 value: 41.486000000000004 - type: ndcg_at_3 value: 39.324 - type: ndcg_at_5 value: 36.949 - type: ndcg_at_10 value: 33.737 - type: ndcg_at_20 value: 31.320999999999998 - type: ndcg_at_100 value: 30.886000000000003 - type: ndcg_at_1000 value: 40.018 - type: map_at_1 value: 5.452 - type: map_at_3 value: 9.45 - type: map_at_5 value: 10.92 - type: map_at_10 value: 12.758 - type: map_at_20 value: 14.036999999999999 - type: map_at_100 value: 15.93 - type: map_at_1000 value: 17.422 - type: recall_at_1 value: 5.452 - type: recall_at_3 value: 10.732999999999999 - type: recall_at_5 value: 13.553 - type: recall_at_10 value: 17.119999999999997 - type: recall_at_20 value: 20.459 - type: recall_at_100 value: 30.719 - type: recall_at_1000 value: 62.766 - type: precision_at_1 value: 43.344 - type: precision_at_3 value: 37.152 - type: precision_at_5 value: 31.703 - type: precision_at_10 value: 24.799 - type: precision_at_20 value: 18.142 - type: precision_at_100 value: 7.8950000000000005 - type: precision_at_1000 value: 2.091 - type: mrr_at_1 value: 43.3437 - type: mrr_at_3 value: 51.135200000000005 - type: mrr_at_5 value: 52.15689999999999 - type: mrr_at_10 value: 52.9277 - type: mrr_at_20 value: 53.2931 - type: mrr_at_100 value: 53.467200000000005 - type: mrr_at_1000 value: 53.5122 - type: nauc_ndcg_at_1_max value: 33.6844 - type: nauc_ndcg_at_1_std value: 17.6117 - type: nauc_ndcg_at_1_diff1 value: 37.641999999999996 - type: nauc_ndcg_at_3_max value: 36.6302 - type: nauc_ndcg_at_3_std value: 25.738 - type: nauc_ndcg_at_3_diff1 value: 29.8566 - type: nauc_ndcg_at_5_max value: 39.043099999999995 - type: nauc_ndcg_at_5_std value: 28.904999999999998 - type: 
nauc_ndcg_at_5_diff1 value: 26.129400000000004 - type: nauc_ndcg_at_10_max value: 38.935199999999995 - type: nauc_ndcg_at_10_std value: 30.338700000000003 - type: nauc_ndcg_at_10_diff1 value: 23.594 - type: nauc_ndcg_at_20_max value: 38.2138 - type: nauc_ndcg_at_20_std value: 31.8994 - type: nauc_ndcg_at_20_diff1 value: 21.583 - type: nauc_ndcg_at_100_max value: 39.869 - type: nauc_ndcg_at_100_std value: 33.591300000000004 - type: nauc_ndcg_at_100_diff1 value: 23.0398 - type: nauc_ndcg_at_1000_max value: 44.9572 - type: nauc_ndcg_at_1000_std value: 38.222 - type: nauc_ndcg_at_1000_diff1 value: 23.7314 - type: nauc_map_at_1_max value: 8.0309 - type: nauc_map_at_1_std value: -12.6861 - type: nauc_map_at_1_diff1 value: 45.5924 - type: nauc_map_at_3_max value: 11.8264 - type: nauc_map_at_3_std value: -7.3325000000000005 - type: nauc_map_at_3_diff1 value: 35.5714 - type: nauc_map_at_5_max value: 15.7483 - type: nauc_map_at_5_std value: -2.9122 - type: nauc_map_at_5_diff1 value: 32.2211 - type: nauc_map_at_10_max value: 19.9795 - type: nauc_map_at_10_std value: 2.6611 - type: nauc_map_at_10_diff1 value: 29.047099999999997 - type: nauc_map_at_20_max value: 23.1754 - type: nauc_map_at_20_std value: 8.0668 - type: nauc_map_at_20_diff1 value: 27.7477 - type: nauc_map_at_100_max value: 26.4818 - type: nauc_map_at_100_std value: 15.723 - type: nauc_map_at_100_diff1 value: 26.5443 - type: nauc_map_at_1000_max value: 27.929100000000002 - type: nauc_map_at_1000_std value: 19.81 - type: nauc_map_at_1000_diff1 value: 25.0603 - type: nauc_recall_at_1_max value: 8.0309 - type: nauc_recall_at_1_std value: -12.6861 - type: nauc_recall_at_1_diff1 value: 45.5924 - type: nauc_recall_at_3_max value: 10.9894 - type: nauc_recall_at_3_std value: -7.4279 - type: nauc_recall_at_3_diff1 value: 29.917899999999996 - type: nauc_recall_at_5_max value: 15.7163 - type: nauc_recall_at_5_std value: -0.8366 - type: nauc_recall_at_5_diff1 value: 22.8634 - type: nauc_recall_at_10_max value: 19.5902 - type: nauc_recall_at_10_std value: 5.3492 - type: nauc_recall_at_10_diff1 value: 19.4157 - type: nauc_recall_at_20_max value: 23.1894 - type: nauc_recall_at_20_std value: 12.8919 - type: nauc_recall_at_20_diff1 value: 17.8387 - type: nauc_recall_at_100_max value: 30.150399999999998 - type: nauc_recall_at_100_std value: 27.5036 - type: nauc_recall_at_100_diff1 value: 15.4935 - type: nauc_recall_at_1000_max value: 32.404500000000006 - type: nauc_recall_at_1000_std value: 30.7325 - type: nauc_recall_at_1000_diff1 value: 13.9299 - type: nauc_precision_at_1_max value: 34.747699999999995 - type: nauc_precision_at_1_std value: 17.5475 - type: nauc_precision_at_1_diff1 value: 36.0582 - type: nauc_precision_at_3_max value: 39.8251 - type: nauc_precision_at_3_std value: 34.3835 - type: nauc_precision_at_3_diff1 value: 19.651699999999998 - type: nauc_precision_at_5_max value: 42.796800000000005 - type: nauc_precision_at_5_std value: 40.083999999999996 - type: nauc_precision_at_5_diff1 value: 12.4069 - type: nauc_precision_at_10_max value: 41.562599999999996 - type: nauc_precision_at_10_std value: 44.7888 - type: nauc_precision_at_10_diff1 value: 5.587000000000001 - type: nauc_precision_at_20_max value: 37.000499999999995 - type: nauc_precision_at_20_std value: 50.4486 - type: nauc_precision_at_20_diff1 value: -0.1011 - type: nauc_precision_at_100_max value: 24.7635 - type: nauc_precision_at_100_std value: 51.001200000000004 - type: nauc_precision_at_100_diff1 value: -7.7414 - type: nauc_precision_at_1000_max value: 10.837900000000001 - type: 
nauc_precision_at_1000_std value: 37.2421 - type: nauc_precision_at_1000_diff1 value: -14.086599999999999 - type: nauc_mrr_at_1_max value: 34.747699999999995 - type: nauc_mrr_at_1_std value: 17.5475 - type: nauc_mrr_at_1_diff1 value: 36.0582 - type: nauc_mrr_at_3_max value: 40.8392 - type: nauc_mrr_at_3_std value: 24.9403 - type: nauc_mrr_at_3_diff1 value: 33.9575 - type: nauc_mrr_at_5_max value: 42.2108 - type: nauc_mrr_at_5_std value: 26.374799999999997 - type: nauc_mrr_at_5_diff1 value: 33.8034 - type: nauc_mrr_at_10_max value: 42.180800000000005 - type: nauc_mrr_at_10_std value: 26.6843 - type: nauc_mrr_at_10_diff1 value: 33.151 - type: nauc_mrr_at_20_max value: 42.4685 - type: nauc_mrr_at_20_std value: 27.1065 - type: nauc_mrr_at_20_diff1 value: 33.0052 - type: nauc_mrr_at_100_max value: 42.417 - type: nauc_mrr_at_100_std value: 27.069300000000002 - type: nauc_mrr_at_100_diff1 value: 33.1211 - type: nauc_mrr_at_1000_max value: 42.3902 - type: nauc_mrr_at_1000_std value: 27.019 - type: nauc_mrr_at_1000_diff1 value: 33.1177 - type: main_score value: 33.737 task: type: Retrieval - dataset: config: default name: MTEB NQ (default) revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: ndcg_at_1 value: 32.793 - type: ndcg_at_3 value: 42.782 - type: ndcg_at_5 value: 47.554 - type: ndcg_at_10 value: 51.63100000000001 - type: ndcg_at_20 value: 54.005 - type: ndcg_at_100 value: 56.287 - type: ndcg_at_1000 value: 56.949000000000005 - type: map_at_1 value: 29.022 - type: map_at_3 value: 39.045 - type: map_at_5 value: 41.86 - type: map_at_10 value: 43.730000000000004 - type: map_at_20 value: 44.478 - type: map_at_100 value: 44.849 - type: map_at_1000 value: 44.877 - type: recall_at_1 value: 29.022 - type: recall_at_3 value: 50.40599999999999 - type: recall_at_5 value: 61.45 - type: recall_at_10 value: 73.32499999999999 - type: recall_at_20 value: 82.06099999999999 - type: recall_at_100 value: 93.455 - type: recall_at_1000 value: 98.414 - type: precision_at_1 value: 32.793 - type: precision_at_3 value: 19.583000000000002 - type: precision_at_5 value: 14.484 - type: precision_at_10 value: 8.737 - type: precision_at_20 value: 4.928 - type: precision_at_100 value: 1.134 - type: precision_at_1000 value: 0.12 - type: mrr_at_1 value: 32.821600000000004 - type: mrr_at_3 value: 42.275 - type: mrr_at_5 value: 44.7895 - type: mrr_at_10 value: 46.2574 - type: mrr_at_20 value: 46.8249 - type: mrr_at_100 value: 47.0971 - type: mrr_at_1000 value: 47.1157 - type: nauc_ndcg_at_1_max value: 23.167299999999997 - type: nauc_ndcg_at_1_std value: -4.5794 - type: nauc_ndcg_at_1_diff1 value: 31.1021 - type: nauc_ndcg_at_3_max value: 27.1071 - type: nauc_ndcg_at_3_std value: -4.8229 - type: nauc_ndcg_at_3_diff1 value: 26.442 - type: nauc_ndcg_at_5_max value: 29.579 - type: nauc_ndcg_at_5_std value: -3.9125 - type: nauc_ndcg_at_5_diff1 value: 26.1946 - type: nauc_ndcg_at_10_max value: 30.6847 - type: nauc_ndcg_at_10_std value: -2.3781 - type: nauc_ndcg_at_10_diff1 value: 25.9597 - type: nauc_ndcg_at_20_max value: 31.4414 - type: nauc_ndcg_at_20_std value: -0.6708000000000001 - type: nauc_ndcg_at_20_diff1 value: 25.886300000000002 - type: nauc_ndcg_at_100_max value: 30.5333 - type: nauc_ndcg_at_100_std value: -0.605 - type: nauc_ndcg_at_100_diff1 value: 26.3173 - type: nauc_ndcg_at_1000_max value: 29.6714 - type: nauc_ndcg_at_1000_std value: -1.4797 - type: nauc_ndcg_at_1000_diff1 value: 26.4662 - type: nauc_map_at_1_max value: 22.0826 - type: nauc_map_at_1_std value: -7.1051 - type: 
nauc_map_at_1_diff1 value: 31.398 - type: nauc_map_at_3_max value: 26.0631 - type: nauc_map_at_3_std value: -5.564100000000001 - type: nauc_map_at_3_diff1 value: 27.4542 - type: nauc_map_at_5_max value: 27.4859 - type: nauc_map_at_5_std value: -5.1595 - type: nauc_map_at_5_diff1 value: 27.4557 - type: nauc_map_at_10_max value: 27.9754 - type: nauc_map_at_10_std value: -4.4186000000000005 - type: nauc_map_at_10_diff1 value: 27.3476 - type: nauc_map_at_20_max value: 28.168 - type: nauc_map_at_20_std value: -3.8931 - type: nauc_map_at_20_diff1 value: 27.333800000000004 - type: nauc_map_at_100_max value: 28.020899999999997 - type: nauc_map_at_100_std value: -3.8826 - type: nauc_map_at_100_diff1 value: 27.411099999999998 - type: nauc_map_at_1000_max value: 27.9917 - type: nauc_map_at_1000_std value: -3.9068 - type: nauc_map_at_1000_diff1 value: 27.4158 - type: nauc_recall_at_1_max value: 22.0826 - type: nauc_recall_at_1_std value: -7.1051 - type: nauc_recall_at_1_diff1 value: 31.398 - type: nauc_recall_at_3_max value: 29.145500000000002 - type: nauc_recall_at_3_std value: -4.3699 - type: nauc_recall_at_3_diff1 value: 22.868 - type: nauc_recall_at_5_max value: 35.4075 - type: nauc_recall_at_5_std value: -2.0428 - type: nauc_recall_at_5_diff1 value: 21.4863 - type: nauc_recall_at_10_max value: 41.0673 - type: nauc_recall_at_10_std value: 3.6994 - type: nauc_recall_at_10_diff1 value: 19.2556 - type: nauc_recall_at_20_max value: 50.6702 - type: nauc_recall_at_20_std value: 16.162399999999998 - type: nauc_recall_at_20_diff1 value: 16.9676 - type: nauc_recall_at_100_max value: 64.5925 - type: nauc_recall_at_100_std value: 42.2234 - type: nauc_recall_at_100_diff1 value: 12.741 - type: nauc_recall_at_1000_max value: 66.29310000000001 - type: nauc_recall_at_1000_std value: 61.5236 - type: nauc_recall_at_1000_diff1 value: -6.1148 - type: nauc_precision_at_1_max value: 23.167299999999997 - type: nauc_precision_at_1_std value: -4.5794 - type: nauc_precision_at_1_diff1 value: 31.1021 - type: nauc_precision_at_3_max value: 28.3464 - type: nauc_precision_at_3_std value: -0.0571 - type: nauc_precision_at_3_diff1 value: 18.987399999999997 - type: nauc_precision_at_5_max value: 30.9637 - type: nauc_precision_at_5_std value: 2.3625 - type: nauc_precision_at_5_diff1 value: 15.912299999999998 - type: nauc_precision_at_10_max value: 28.3203 - type: nauc_precision_at_10_std value: 8.2947 - type: nauc_precision_at_10_diff1 value: 10.066899999999999 - type: nauc_precision_at_20_max value: 26.2198 - type: nauc_precision_at_20_std value: 15.4182 - type: nauc_precision_at_20_diff1 value: 5.0011 - type: nauc_precision_at_100_max value: 12.721599999999999 - type: nauc_precision_at_100_std value: 18.2616 - type: nauc_precision_at_100_diff1 value: -1.5249000000000001 - type: nauc_precision_at_1000_max value: 1.514 - type: nauc_precision_at_1000_std value: 12.6332 - type: nauc_precision_at_1000_diff1 value: -4.8346 - type: nauc_mrr_at_1_max value: 23.3079 - type: nauc_mrr_at_1_std value: -4.6507 - type: nauc_mrr_at_1_diff1 value: 31.014999999999997 - type: nauc_mrr_at_3_max value: 26.371299999999998 - type: nauc_mrr_at_3_std value: -3.6183 - type: nauc_mrr_at_3_diff1 value: 27.5342 - type: nauc_mrr_at_5_max value: 27.4604 - type: nauc_mrr_at_5_std value: -2.9482 - type: nauc_mrr_at_5_diff1 value: 27.308100000000003 - type: nauc_mrr_at_10_max value: 27.6781 - type: nauc_mrr_at_10_std value: -2.5515 - type: nauc_mrr_at_10_diff1 value: 27.338 - type: nauc_mrr_at_20_max value: 27.760099999999998 - type: nauc_mrr_at_20_std value: 
-2.2787 - type: nauc_mrr_at_20_diff1 value: 27.372200000000003 - type: nauc_mrr_at_100_max value: 27.6611 - type: nauc_mrr_at_100_std value: -2.3218 - type: nauc_mrr_at_100_diff1 value: 27.444000000000003 - type: nauc_mrr_at_1000_max value: 27.6393 - type: nauc_mrr_at_1000_std value: -2.3404000000000003 - type: nauc_mrr_at_1000_diff1 value: 27.4444 - type: main_score value: 51.63100000000001 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval (default) revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 split: test type: mteb/quora metrics: - type: ndcg_at_1 value: 79.36999999999999 - type: ndcg_at_3 value: 83.545 - type: ndcg_at_5 value: 85.32 - type: ndcg_at_10 value: 86.696 - type: ndcg_at_20 value: 87.46199999999999 - type: ndcg_at_100 value: 88.103 - type: ndcg_at_1000 value: 88.252 - type: map_at_1 value: 68.961 - type: map_at_3 value: 79.616 - type: map_at_5 value: 81.54 - type: map_at_10 value: 82.65400000000001 - type: map_at_20 value: 83.098 - type: map_at_100 value: 83.33 - type: map_at_1000 value: 83.34899999999999 - type: recall_at_1 value: 68.961 - type: recall_at_3 value: 85.501 - type: recall_at_5 value: 90.379 - type: recall_at_10 value: 94.407 - type: recall_at_20 value: 96.86399999999999 - type: recall_at_100 value: 99.226 - type: recall_at_1000 value: 99.958 - type: precision_at_1 value: 79.36999999999999 - type: precision_at_3 value: 36.35 - type: precision_at_5 value: 24.048 - type: precision_at_10 value: 13.145000000000001 - type: precision_at_20 value: 7.007 - type: precision_at_100 value: 1.517 - type: precision_at_1000 value: 0.156 - type: mrr_at_1 value: 79.3 - type: mrr_at_3 value: 84.82169999999999 - type: mrr_at_5 value: 85.6047 - type: mrr_at_10 value: 85.94500000000001 - type: mrr_at_20 value: 86.0381 - type: mrr_at_100 value: 86.0694 - type: mrr_at_1000 value: 86.0712 - type: nauc_ndcg_at_1_max value: 37.962 - type: nauc_ndcg_at_1_std value: -32.129999999999995 - type: nauc_ndcg_at_1_diff1 value: 76.2543 - type: nauc_ndcg_at_3_max value: 36.5568 - type: nauc_ndcg_at_3_std value: -36.9639 - type: nauc_ndcg_at_3_diff1 value: 74.33229999999999 - type: nauc_ndcg_at_5_max value: 36.6236 - type: nauc_ndcg_at_5_std value: -38.3823 - type: nauc_ndcg_at_5_diff1 value: 74.8725 - type: nauc_ndcg_at_10_max value: 37.2726 - type: nauc_ndcg_at_10_std value: -37.6889 - type: nauc_ndcg_at_10_diff1 value: 75.437 - type: nauc_ndcg_at_20_max value: 37.3643 - type: nauc_ndcg_at_20_std value: -36.4545 - type: nauc_ndcg_at_20_diff1 value: 75.3032 - type: nauc_ndcg_at_100_max value: 37.701 - type: nauc_ndcg_at_100_std value: -34.6794 - type: nauc_ndcg_at_100_diff1 value: 75.1545 - type: nauc_ndcg_at_1000_max value: 37.7386 - type: nauc_ndcg_at_1000_std value: -34.659099999999995 - type: nauc_ndcg_at_1000_diff1 value: 75.1303 - type: nauc_map_at_1_max value: 28.3786 - type: nauc_map_at_1_std value: -34.4402 - type: nauc_map_at_1_diff1 value: 78.58579999999999 - type: nauc_map_at_3_max value: 34.1617 - type: nauc_map_at_3_std value: -39.0191 - type: nauc_map_at_3_diff1 value: 75.551 - type: nauc_map_at_5_max value: 35.2348 - type: nauc_map_at_5_std value: -39.352399999999996 - type: nauc_map_at_5_diff1 value: 75.45530000000001 - type: nauc_map_at_10_max value: 36.0009 - type: nauc_map_at_10_std value: -38.389 - type: nauc_map_at_10_diff1 value: 75.523 - type: nauc_map_at_20_max value: 36.167300000000004 - type: nauc_map_at_20_std value: -37.5191 - type: nauc_map_at_20_diff1 value: 75.3798 - type: nauc_map_at_100_max value: 36.2928 - type: nauc_map_at_100_std 
value: -36.8001 - type: nauc_map_at_100_diff1 value: 75.2957 - type: nauc_map_at_1000_max value: 36.3027 - type: nauc_map_at_1000_std value: -36.7641 - type: nauc_map_at_1000_diff1 value: 75.29090000000001 - type: nauc_recall_at_1_max value: 28.3786 - type: nauc_recall_at_1_std value: -34.4402 - type: nauc_recall_at_1_diff1 value: 78.58579999999999 - type: nauc_recall_at_3_max value: 32.1082 - type: nauc_recall_at_3_std value: -43.2936 - type: nauc_recall_at_3_diff1 value: 71.4939 - type: nauc_recall_at_5_max value: 32.590599999999995 - type: nauc_recall_at_5_std value: -48.7416 - type: nauc_recall_at_5_diff1 value: 70.7945 - type: nauc_recall_at_10_max value: 34.755 - type: nauc_recall_at_10_std value: -49.398599999999995 - type: nauc_recall_at_10_diff1 value: 71.87219999999999 - type: nauc_recall_at_20_max value: 33.879999999999995 - type: nauc_recall_at_20_std value: -45.1325 - type: nauc_recall_at_20_diff1 value: 71.3805 - type: nauc_recall_at_100_max value: 37.4684 - type: nauc_recall_at_100_std value: -13.0134 - type: nauc_recall_at_100_diff1 value: 69.963 - type: nauc_recall_at_1000_max value: 31.6199 - type: nauc_recall_at_1000_std value: 59.0228 - type: nauc_recall_at_1000_diff1 value: 60.9687 - type: nauc_precision_at_1_max value: 37.962 - type: nauc_precision_at_1_std value: -32.129999999999995 - type: nauc_precision_at_1_diff1 value: 76.2543 - type: nauc_precision_at_3_max value: 11.419799999999999 - type: nauc_precision_at_3_std value: 2.5604999999999998 - type: nauc_precision_at_3_diff1 value: -11.505799999999999 - type: nauc_precision_at_5_max value: 4.454700000000001 - type: nauc_precision_at_5_std value: 11.6986 - type: nauc_precision_at_5_diff1 value: -26.2868 - type: nauc_precision_at_10_max value: -0.4261 - type: nauc_precision_at_10_std value: 20.7877 - type: nauc_precision_at_10_diff1 value: -34.5624 - type: nauc_precision_at_20_max value: -3.7817000000000003 - type: nauc_precision_at_20_std value: 27.056599999999996 - type: nauc_precision_at_20_diff1 value: -39.0052 - type: nauc_precision_at_100_max value: -6.4321 - type: nauc_precision_at_100_std value: 33.1245 - type: nauc_precision_at_100_diff1 value: -41.9135 - type: nauc_precision_at_1000_max value: -7.100199999999999 - type: nauc_precision_at_1000_std value: 34.0081 - type: nauc_precision_at_1000_diff1 value: -42.556 - type: nauc_mrr_at_1_max value: 37.754 - type: nauc_mrr_at_1_std value: -32.2644 - type: nauc_mrr_at_1_diff1 value: 76.4182 - type: nauc_mrr_at_3_max value: 38.7583 - type: nauc_mrr_at_3_std value: -33.631699999999995 - type: nauc_mrr_at_3_diff1 value: 75.30369999999999 - type: nauc_mrr_at_5_max value: 38.675399999999996 - type: nauc_mrr_at_5_std value: -33.873 - type: nauc_mrr_at_5_diff1 value: 75.58890000000001 - type: nauc_mrr_at_10_max value: 38.7962 - type: nauc_mrr_at_10_std value: -33.5451 - type: nauc_mrr_at_10_diff1 value: 75.7153 - type: nauc_mrr_at_20_max value: 38.7213 - type: nauc_mrr_at_20_std value: -33.433600000000006 - type: nauc_mrr_at_20_diff1 value: 75.6934 - type: nauc_mrr_at_100_max value: 38.6943 - type: nauc_mrr_at_100_std value: -33.4013 - type: nauc_mrr_at_100_diff1 value: 75.6932 - type: nauc_mrr_at_1000_max value: 38.6928 - type: nauc_mrr_at_1000_std value: -33.4051 - type: nauc_mrr_at_1000_diff1 value: 75.69369999999999 - type: main_score value: 86.696 task: type: Retrieval - dataset: config: default name: MTEB RedditClustering (default) revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: v_measure value: 
50.019999999999996 - type: v_measure_std value: 4.5914 - type: main_score value: 50.019999999999996 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P (default) revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: v_measure value: 53.9756 - type: v_measure_std value: 11.6573 - type: main_score value: 53.9756 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS (default) revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 split: test type: mteb/scidocs metrics: - type: ndcg_at_1 value: 24.6 - type: ndcg_at_3 value: 20.896 - type: ndcg_at_5 value: 18.497 - type: ndcg_at_10 value: 22.542 - type: ndcg_at_20 value: 25.812 - type: ndcg_at_100 value: 32.326 - type: ndcg_at_1000 value: 38.279999999999994 - type: map_at_1 value: 4.988 - type: map_at_3 value: 9.439 - type: map_at_5 value: 11.459999999999999 - type: map_at_10 value: 13.553 - type: map_at_20 value: 14.767 - type: map_at_100 value: 16.136 - type: map_at_1000 value: 16.512 - type: recall_at_1 value: 4.988 - type: recall_at_3 value: 12.046999999999999 - type: recall_at_5 value: 16.777 - type: recall_at_10 value: 24.212 - type: recall_at_20 value: 31.885 - type: recall_at_100 value: 53.105000000000004 - type: recall_at_1000 value: 82.02199999999999 - type: precision_at_1 value: 24.6 - type: precision_at_3 value: 19.8 - type: precision_at_5 value: 16.54 - type: precision_at_10 value: 11.940000000000001 - type: precision_at_20 value: 7.865 - type: precision_at_100 value: 2.616 - type: precision_at_1000 value: 0.404 - type: mrr_at_1 value: 24.6 - type: mrr_at_3 value: 33.1167 - type: mrr_at_5 value: 35.1717 - type: mrr_at_10 value: 36.7925 - type: mrr_at_20 value: 37.5284 - type: mrr_at_100 value: 37.9725 - type: mrr_at_1000 value: 38.0112 - type: nauc_ndcg_at_1_max value: 17.8923 - type: nauc_ndcg_at_1_std value: 9.1225 - type: nauc_ndcg_at_1_diff1 value: 22.665399999999998 - type: nauc_ndcg_at_3_max value: 23.6866 - type: nauc_ndcg_at_3_std value: 15.3093 - type: nauc_ndcg_at_3_diff1 value: 17.589299999999998 - type: nauc_ndcg_at_5_max value: 25.3398 - type: nauc_ndcg_at_5_std value: 18.002299999999998 - type: nauc_ndcg_at_5_diff1 value: 16.8155 - type: nauc_ndcg_at_10_max value: 28.057399999999998 - type: nauc_ndcg_at_10_std value: 22.7388 - type: nauc_ndcg_at_10_diff1 value: 16.0553 - type: nauc_ndcg_at_20_max value: 28.9134 - type: nauc_ndcg_at_20_std value: 25.389 - type: nauc_ndcg_at_20_diff1 value: 15.7728 - type: nauc_ndcg_at_100_max value: 29.9553 - type: nauc_ndcg_at_100_std value: 29.8607 - type: nauc_ndcg_at_100_diff1 value: 15.526100000000001 - type: nauc_ndcg_at_1000_max value: 29.088399999999996 - type: nauc_ndcg_at_1000_std value: 29.2896 - type: nauc_ndcg_at_1000_diff1 value: 15.2143 - type: nauc_map_at_1_max value: 17.9628 - type: nauc_map_at_1_std value: 8.9923 - type: nauc_map_at_1_diff1 value: 22.7227 - type: nauc_map_at_3_max value: 24.012700000000002 - type: nauc_map_at_3_std value: 15.1908 - type: nauc_map_at_3_diff1 value: 17.7637 - type: nauc_map_at_5_max value: 25.0497 - type: nauc_map_at_5_std value: 17.366300000000003 - type: nauc_map_at_5_diff1 value: 16.1512 - type: nauc_map_at_10_max value: 26.777299999999997 - type: nauc_map_at_10_std value: 21.0365 - type: nauc_map_at_10_diff1 value: 15.0999 - type: nauc_map_at_20_max value: 27.6561 - type: nauc_map_at_20_std value: 23.031399999999998 - type: nauc_map_at_20_diff1 value: 14.935300000000002 - type: nauc_map_at_100_max value: 28.015800000000002 - type: 
nauc_map_at_100_std value: 24.840899999999998 - type: nauc_map_at_100_diff1 value: 14.9355 - type: nauc_map_at_1000_max value: 27.9646 - type: nauc_map_at_1000_std value: 24.9601 - type: nauc_map_at_1000_diff1 value: 14.886 - type: nauc_recall_at_1_max value: 17.9628 - type: nauc_recall_at_1_std value: 8.9923 - type: nauc_recall_at_1_diff1 value: 22.7227 - type: nauc_recall_at_3_max value: 25.008399999999998 - type: nauc_recall_at_3_std value: 17.1697 - type: nauc_recall_at_3_diff1 value: 15.1082 - type: nauc_recall_at_5_max value: 26.4345 - type: nauc_recall_at_5_std value: 20.7923 - type: nauc_recall_at_5_diff1 value: 13.58 - type: nauc_recall_at_10_max value: 29.5057 - type: nauc_recall_at_10_std value: 27.8646 - type: nauc_recall_at_10_diff1 value: 11.8098 - type: nauc_recall_at_20_max value: 29.3419 - type: nauc_recall_at_20_std value: 31.6086 - type: nauc_recall_at_20_diff1 value: 10.6491 - type: nauc_recall_at_100_max value: 28.8421 - type: nauc_recall_at_100_std value: 40.2696 - type: nauc_recall_at_100_diff1 value: 8.1461 - type: nauc_recall_at_1000_max value: 22.8234 - type: nauc_recall_at_1000_std value: 41.6117 - type: nauc_recall_at_1000_diff1 value: 1.8689999999999998 - type: nauc_precision_at_1_max value: 17.8923 - type: nauc_precision_at_1_std value: 9.1225 - type: nauc_precision_at_1_diff1 value: 22.665399999999998 - type: nauc_precision_at_3_max value: 25.1067 - type: nauc_precision_at_3_std value: 17.4066 - type: nauc_precision_at_3_diff1 value: 15.0583 - type: nauc_precision_at_5_max value: 26.6005 - type: nauc_precision_at_5_std value: 20.9158 - type: nauc_precision_at_5_diff1 value: 13.591700000000001 - type: nauc_precision_at_10_max value: 29.8091 - type: nauc_precision_at_10_std value: 28.0069 - type: nauc_precision_at_10_diff1 value: 11.675699999999999 - type: nauc_precision_at_20_max value: 29.5651 - type: nauc_precision_at_20_std value: 31.439899999999998 - type: nauc_precision_at_20_diff1 value: 10.4784 - type: nauc_precision_at_100_max value: 28.853299999999997 - type: nauc_precision_at_100_std value: 39.3115 - type: nauc_precision_at_100_diff1 value: 7.6562 - type: nauc_precision_at_1000_max value: 23.025599999999997 - type: nauc_precision_at_1000_std value: 38.554300000000005 - type: nauc_precision_at_1000_diff1 value: 1.3502999999999998 - type: nauc_mrr_at_1_max value: 17.8923 - type: nauc_mrr_at_1_std value: 9.1225 - type: nauc_mrr_at_1_diff1 value: 22.665399999999998 - type: nauc_mrr_at_3_max value: 21.2588 - type: nauc_mrr_at_3_std value: 12.7528 - type: nauc_mrr_at_3_diff1 value: 19.808999999999997 - type: nauc_mrr_at_5_max value: 22.572200000000002 - type: nauc_mrr_at_5_std value: 14.210500000000001 - type: nauc_mrr_at_5_diff1 value: 20.502000000000002 - type: nauc_mrr_at_10_max value: 23.372799999999998 - type: nauc_mrr_at_10_std value: 15.1215 - type: nauc_mrr_at_10_diff1 value: 20.8449 - type: nauc_mrr_at_20_max value: 23.017599999999998 - type: nauc_mrr_at_20_std value: 15.0391 - type: nauc_mrr_at_20_diff1 value: 20.8233 - type: nauc_mrr_at_100_max value: 22.8993 - type: nauc_mrr_at_100_std value: 14.8474 - type: nauc_mrr_at_100_diff1 value: 20.8759 - type: nauc_mrr_at_1000_max value: 22.8744 - type: nauc_mrr_at_1000_std value: 14.8178 - type: nauc_mrr_at_1000_diff1 value: 20.8635 - type: main_score value: 22.542 task: type: Retrieval - dataset: config: default name: MTEB SICK-R (default) revision: 20a6d6f312dd54037fe07a32d58e5e168867909d split: test type: mteb/sickr-sts metrics: - type: pearson value: 77.4874 - type: spearman value: 
68.79809999999999 - type: cosine_pearson value: 77.4874 - type: cosine_spearman value: 68.79809999999999 - type: manhattan_pearson value: 73.3583 - type: manhattan_spearman value: 68.6911 - type: euclidean_pearson value: 73.82039999999999 - type: euclidean_spearman value: 68.79809999999999 - type: main_score value: 68.79809999999999 task: type: STS - dataset: config: default name: MTEB STS12 (default) revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: pearson value: 67.8391 - type: spearman value: 64.77380000000001 - type: cosine_pearson value: 67.8391 - type: cosine_spearman value: 64.77380000000001 - type: manhattan_pearson value: 64.7258 - type: manhattan_spearman value: 64.1558 - type: euclidean_pearson value: 65.68469999999999 - type: euclidean_spearman value: 64.7722 - type: main_score value: 64.77380000000001 task: type: STS - dataset: config: default name: MTEB STS13 (default) revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: pearson value: 78.8177 - type: spearman value: 79.3253 - type: cosine_pearson value: 78.8177 - type: cosine_spearman value: 79.3253 - type: manhattan_pearson value: 78.6048 - type: manhattan_spearman value: 79.1874 - type: euclidean_pearson value: 78.71010000000001 - type: euclidean_spearman value: 79.3253 - type: main_score value: 79.3253 task: type: STS - dataset: config: default name: MTEB STS14 (default) revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: pearson value: 75.6791 - type: spearman value: 70.1701 - type: cosine_pearson value: 75.6791 - type: cosine_spearman value: 70.1701 - type: manhattan_pearson value: 73.85239999999999 - type: manhattan_spearman value: 69.9223 - type: euclidean_pearson value: 74.143 - type: euclidean_spearman value: 70.1701 - type: main_score value: 70.1701 task: type: STS - dataset: config: default name: MTEB STS15 (default) revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: pearson value: 80.4413 - type: spearman value: 82.0343 - type: cosine_pearson value: 80.4413 - type: cosine_spearman value: 82.0343 - type: manhattan_pearson value: 81.3627 - type: manhattan_spearman value: 81.8838 - type: euclidean_pearson value: 81.47569999999999 - type: euclidean_spearman value: 82.0343 - type: main_score value: 82.0343 task: type: STS - dataset: config: default name: MTEB STS16 (default) revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: pearson value: 77.172 - type: spearman value: 78.9633 - type: cosine_pearson value: 77.172 - type: cosine_spearman value: 78.9633 - type: manhattan_pearson value: 78.35849999999999 - type: manhattan_spearman value: 78.7975 - type: euclidean_pearson value: 78.5236 - type: euclidean_spearman value: 78.9633 - type: main_score value: 78.9633 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 83.5117 - type: spearman value: 84.64970000000001 - type: cosine_pearson value: 83.5117 - type: cosine_spearman value: 84.64970000000001 - type: manhattan_pearson value: 84.5137 - type: manhattan_spearman value: 84.7848 - type: euclidean_pearson value: 84.531 - type: euclidean_spearman value: 84.64970000000001 - type: main_score value: 84.64970000000001 task: type: STS - dataset: config: es-en name: MTEB STS17 (es-en) revision: 
faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 29.0052 - type: spearman value: 30.640299999999996 - type: cosine_pearson value: 29.0052 - type: cosine_spearman value: 30.640299999999996 - type: manhattan_pearson value: 25.988099999999996 - type: manhattan_spearman value: 26.935399999999998 - type: euclidean_pearson value: 28.5366 - type: euclidean_spearman value: 30.640299999999996 - type: main_score value: 30.640299999999996 task: type: STS - dataset: config: nl-en name: MTEB STS17 (nl-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 42.0755 - type: spearman value: 39.763999999999996 - type: cosine_pearson value: 42.0755 - type: cosine_spearman value: 39.763999999999996 - type: manhattan_pearson value: 40.872 - type: manhattan_spearman value: 38.4749 - type: euclidean_pearson value: 42.051500000000004 - type: euclidean_spearman value: 39.7565 - type: main_score value: 39.763999999999996 task: type: STS - dataset: config: en-de name: MTEB STS17 (en-de) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 44.2318 - type: spearman value: 46.5518 - type: cosine_pearson value: 44.2318 - type: cosine_spearman value: 46.5518 - type: manhattan_pearson value: 43.396699999999996 - type: manhattan_spearman value: 46.1132 - type: euclidean_pearson value: 43.993500000000004 - type: euclidean_spearman value: 46.5518 - type: main_score value: 46.5518 task: type: STS - dataset: config: fr-en name: MTEB STS17 (fr-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 36.716100000000004 - type: spearman value: 34.6968 - type: cosine_pearson value: 36.716100000000004 - type: cosine_spearman value: 34.6968 - type: manhattan_pearson value: 35.1918 - type: manhattan_spearman value: 33.3692 - type: euclidean_pearson value: 36.3921 - type: euclidean_spearman value: 34.6968 - type: main_score value: 34.6968 task: type: STS - dataset: config: en-ar name: MTEB STS17 (en-ar) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 21.2825 - type: spearman value: 17.6922 - type: cosine_pearson value: 21.2825 - type: cosine_spearman value: 17.6922 - type: manhattan_pearson value: 19.491 - type: manhattan_spearman value: 15.989700000000001 - type: euclidean_pearson value: 21.583 - type: euclidean_spearman value: 17.6922 - type: main_score value: 17.6922 task: type: STS - dataset: config: it-en name: MTEB STS17 (it-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 32.1584 - type: spearman value: 27.9254 - type: cosine_pearson value: 32.1584 - type: cosine_spearman value: 27.9254 - type: manhattan_pearson value: 34.2047 - type: manhattan_spearman value: 31.1955 - type: euclidean_pearson value: 32.4369 - type: euclidean_spearman value: 27.9254 - type: main_score value: 27.9254 task: type: STS - dataset: config: en-tr name: MTEB STS17 (en-tr) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: pearson value: 21.0842 - type: spearman value: 18.5115 - type: cosine_pearson value: 21.0842 - type: cosine_spearman value: 18.5115 - type: manhattan_pearson value: 23.5904 - type: manhattan_spearman value: 
21.032400000000003 - type: euclidean_pearson value: 21.2805 - type: euclidean_spearman value: 18.5115 - type: main_score value: 18.5115 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 66.9563 - type: spearman value: 67.4747 - type: cosine_pearson value: 66.9563 - type: cosine_spearman value: 67.4747 - type: manhattan_pearson value: 68.32629999999999 - type: manhattan_spearman value: 66.8163 - type: euclidean_pearson value: 68.731 - type: euclidean_spearman value: 67.4747 - type: main_score value: 67.4747 task: type: STS - dataset: config: de-en name: MTEB STS22 (de-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 56.3095 - type: spearman value: 54.1005 - type: cosine_pearson value: 56.3095 - type: cosine_spearman value: 54.1005 - type: manhattan_pearson value: 59.4023 - type: manhattan_spearman value: 52.6259 - type: euclidean_pearson value: 58.6527 - type: euclidean_spearman value: 54.1005 - type: main_score value: 54.1005 task: type: STS - dataset: config: es-en name: MTEB STS22 (es-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 62.0575 - type: spearman value: 66.9527 - type: cosine_pearson value: 62.0575 - type: cosine_spearman value: 66.9527 - type: manhattan_pearson value: 62.648700000000005 - type: manhattan_spearman value: 65.6446 - type: euclidean_pearson value: 63.546800000000005 - type: euclidean_spearman value: 66.9527 - type: main_score value: 66.9527 task: type: STS - dataset: config: pl-en name: MTEB STS22 (pl-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 68.42439999999999 - type: spearman value: 69.0444 - type: cosine_pearson value: 68.42439999999999 - type: cosine_spearman value: 69.0444 - type: manhattan_pearson value: 65.1492 - type: manhattan_spearman value: 65.2364 - type: euclidean_pearson value: 68.4923 - type: euclidean_spearman value: 69.0444 - type: main_score value: 69.0444 task: type: STS - dataset: config: zh-en name: MTEB STS22 (zh-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: pearson value: 34.164699999999996 - type: spearman value: 36.1776 - type: cosine_pearson value: 34.164699999999996 - type: cosine_spearman value: 36.1776 - type: manhattan_pearson value: 33.0685 - type: manhattan_spearman value: 34.4054 - type: euclidean_pearson value: 34.1002 - type: euclidean_spearman value: 36.1776 - type: main_score value: 36.1776 task: type: STS - dataset: config: default name: MTEB STSBenchmark (default) revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: pearson value: 78.0802 - type: spearman value: 78.0444 - type: cosine_pearson value: 78.0802 - type: cosine_spearman value: 78.0444 - type: manhattan_pearson value: 78.0703 - type: manhattan_spearman value: 77.681 - type: euclidean_pearson value: 78.4998 - type: euclidean_spearman value: 78.0444 - type: main_score value: 78.0444 task: type: STS - dataset: config: default name: MTEB SciDocsRR (default) revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: map value: 86.4489 - type: mrr value: 96.0178 - type: nAUC_map_max value: 49.2333 - type: 
nAUC_map_std value: 63.6541 - type: nAUC_map_diff1 value: 0.40959999999999996 - type: nAUC_mrr_max value: 83.6216 - type: nAUC_mrr_std value: 76.7559 - type: nAUC_mrr_diff1 value: 42.9429 - type: main_score value: 86.4489 task: type: Reranking - dataset: config: default name: MTEB SciFact (default) revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: ndcg_at_1 value: 59.333000000000006 - type: ndcg_at_3 value: 65.793 - type: ndcg_at_5 value: 69.429 - type: ndcg_at_10 value: 71.27 - type: ndcg_at_20 value: 72.929 - type: ndcg_at_100 value: 73.88900000000001 - type: ndcg_at_1000 value: 74.41 - type: map_at_1 value: 56.577999999999996 - type: map_at_3 value: 63.416 - type: map_at_5 value: 65.77 - type: map_at_10 value: 66.725 - type: map_at_20 value: 67.24799999999999 - type: map_at_100 value: 67.379 - type: map_at_1000 value: 67.4 - type: recall_at_1 value: 56.577999999999996 - type: recall_at_3 value: 70.072 - type: recall_at_5 value: 79.011 - type: recall_at_10 value: 84.2 - type: recall_at_20 value: 90.5 - type: recall_at_100 value: 95.667 - type: recall_at_1000 value: 99.667 - type: precision_at_1 value: 59.333000000000006 - type: precision_at_3 value: 25.556 - type: precision_at_5 value: 17.666999999999998 - type: precision_at_10 value: 9.6 - type: precision_at_20 value: 5.167 - type: precision_at_100 value: 1.087 - type: precision_at_1000 value: 0.11299999999999999 - type: mrr_at_1 value: 59.3333 - type: mrr_at_3 value: 64.9444 - type: mrr_at_5 value: 66.9278 - type: mrr_at_10 value: 67.5327 - type: mrr_at_20 value: 67.9354 - type: mrr_at_100 value: 68.0616 - type: mrr_at_1000 value: 68.08239999999999 - type: nauc_ndcg_at_1_max value: 62.536199999999994 - type: nauc_ndcg_at_1_std value: 4.3275 - type: nauc_ndcg_at_1_diff1 value: 78.2294 - type: nauc_ndcg_at_3_max value: 63.0626 - type: nauc_ndcg_at_3_std value: 6.0584 - type: nauc_ndcg_at_3_diff1 value: 74.4931 - type: nauc_ndcg_at_5_max value: 64.73989999999999 - type: nauc_ndcg_at_5_std value: 5.6514 - type: nauc_ndcg_at_5_diff1 value: 73.5498 - type: nauc_ndcg_at_10_max value: 65.43090000000001 - type: nauc_ndcg_at_10_std value: 9.1274 - type: nauc_ndcg_at_10_diff1 value: 72.4814 - type: nauc_ndcg_at_20_max value: 65.7156 - type: nauc_ndcg_at_20_std value: 9.9385 - type: nauc_ndcg_at_20_diff1 value: 73.0996 - type: nauc_ndcg_at_100_max value: 65.5687 - type: nauc_ndcg_at_100_std value: 8.818299999999999 - type: nauc_ndcg_at_100_diff1 value: 73.6361 - type: nauc_ndcg_at_1000_max value: 65.1956 - type: nauc_ndcg_at_1000_std value: 8.4772 - type: nauc_ndcg_at_1000_diff1 value: 74.0393 - type: nauc_map_at_1_max value: 58.2314 - type: nauc_map_at_1_std value: -2.7946 - type: nauc_map_at_1_diff1 value: 78.24940000000001 - type: nauc_map_at_3_max value: 61.364200000000004 - type: nauc_map_at_3_std value: 2.7072 - type: nauc_map_at_3_diff1 value: 75.4798 - type: nauc_map_at_5_max value: 63.1297 - type: nauc_map_at_5_std value: 3.9505 - type: nauc_map_at_5_diff1 value: 74.9693 - type: nauc_map_at_10_max value: 63.6643 - type: nauc_map_at_10_std value: 5.8328999999999995 - type: nauc_map_at_10_diff1 value: 74.5464 - type: nauc_map_at_20_max value: 63.8666 - type: nauc_map_at_20_std value: 6.1967 - type: nauc_map_at_20_diff1 value: 74.7224 - type: nauc_map_at_100_max value: 63.8254 - type: nauc_map_at_100_std value: 6.0627 - type: nauc_map_at_100_diff1 value: 74.791 - type: nauc_map_at_1000_max value: 63.811499999999995 - type: nauc_map_at_1000_std value: 6.0484 - type: nauc_map_at_1000_diff1 
value: 74.807 - type: nauc_recall_at_1_max value: 58.2314 - type: nauc_recall_at_1_std value: -2.7946 - type: nauc_recall_at_1_diff1 value: 78.24940000000001 - type: nauc_recall_at_3_max value: 61.132299999999994 - type: nauc_recall_at_3_std value: 6.1988 - type: nauc_recall_at_3_diff1 value: 70.7273 - type: nauc_recall_at_5_max value: 66.542 - type: nauc_recall_at_5_std value: 5.7653 - type: nauc_recall_at_5_diff1 value: 66.4586 - type: nauc_recall_at_10_max value: 69.3605 - type: nauc_recall_at_10_std value: 19.6237 - type: nauc_recall_at_10_diff1 value: 60.2814 - type: nauc_recall_at_20_max value: 72.6154 - type: nauc_recall_at_20_std value: 31.3504 - type: nauc_recall_at_20_diff1 value: 58.8899 - type: nauc_recall_at_100_max value: 78.6002 - type: nauc_recall_at_100_std value: 26.484999999999996 - type: nauc_recall_at_100_diff1 value: 56.4605 - type: nauc_recall_at_1000_max value: 55.415499999999994 - type: nauc_recall_at_1000_std value: 72.2222 - type: nauc_recall_at_1000_diff1 value: 35.8077 - type: nauc_precision_at_1_max value: 62.536199999999994 - type: nauc_precision_at_1_std value: 4.3275 - type: nauc_precision_at_1_diff1 value: 78.2294 - type: nauc_precision_at_3_max value: 53.5524 - type: nauc_precision_at_3_std value: 23.5724 - type: nauc_precision_at_3_diff1 value: 47.5389 - type: nauc_precision_at_5_max value: 49.1594 - type: nauc_precision_at_5_std value: 32.3563 - type: nauc_precision_at_5_diff1 value: 28.2105 - type: nauc_precision_at_10_max value: 41.955799999999996 - type: nauc_precision_at_10_std value: 44.039699999999996 - type: nauc_precision_at_10_diff1 value: 12.0187 - type: nauc_precision_at_20_max value: 34.2442 - type: nauc_precision_at_20_std value: 50.204899999999995 - type: nauc_precision_at_20_diff1 value: -0.1954 - type: nauc_precision_at_100_max value: 26.8264 - type: nauc_precision_at_100_std value: 51.4247 - type: nauc_precision_at_100_diff1 value: -11.9827 - type: nauc_precision_at_1000_max value: 17.467 - type: nauc_precision_at_1000_std value: 56.435100000000006 - type: nauc_precision_at_1000_diff1 value: -24.2103 - type: nauc_mrr_at_1_max value: 62.536199999999994 - type: nauc_mrr_at_1_std value: 4.3275 - type: nauc_mrr_at_1_diff1 value: 78.2294 - type: nauc_mrr_at_3_max value: 64.5911 - type: nauc_mrr_at_3_std value: 7.8005 - type: nauc_mrr_at_3_diff1 value: 75.82140000000001 - type: nauc_mrr_at_5_max value: 65.1643 - type: nauc_mrr_at_5_std value: 7.258100000000001 - type: nauc_mrr_at_5_diff1 value: 75.2062 - type: nauc_mrr_at_10_max value: 65.3198 - type: nauc_mrr_at_10_std value: 8.2173 - type: nauc_mrr_at_10_diff1 value: 74.9449 - type: nauc_mrr_at_20_max value: 65.2169 - type: nauc_mrr_at_20_std value: 8.115400000000001 - type: nauc_mrr_at_20_diff1 value: 75.1765 - type: nauc_mrr_at_100_max value: 65.1744 - type: nauc_mrr_at_100_std value: 7.994700000000001 - type: nauc_mrr_at_100_diff1 value: 75.2388 - type: nauc_mrr_at_1000_max value: 65.1615 - type: nauc_mrr_at_1000_std value: 7.9817 - type: nauc_mrr_at_1000_diff1 value: 75.2553 - type: main_score value: 71.27 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions (default) revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: similarity_accuracy value: 99.7604 - type: similarity_accuracy_threshold value: 84.88210000000001 - type: similarity_f1 value: 87.86359999999999 - type: similarity_f1_threshold value: 84.88210000000001 - type: similarity_precision value: 88.1288 - type: 
similarity_recall value: 87.6 - type: similarity_ap value: 94.07140000000001 - type: cosine_accuracy value: 99.7604 - type: cosine_accuracy_threshold value: 84.88210000000001 - type: cosine_f1 value: 87.86359999999999 - type: cosine_f1_threshold value: 84.88210000000001 - type: cosine_precision value: 88.1288 - type: cosine_recall value: 87.6 - type: cosine_ap value: 94.07140000000001 - type: manhattan_accuracy value: 99.7644 - type: manhattan_accuracy_threshold value: 829.5789 - type: manhattan_f1 value: 87.92320000000001 - type: manhattan_f1_threshold value: 840.6424 - type: manhattan_precision value: 88.86619999999999 - type: manhattan_recall value: 87.0 - type: manhattan_ap value: 94.17 - type: euclidean_accuracy value: 99.7604 - type: euclidean_accuracy_threshold value: 54.986999999999995 - type: euclidean_f1 value: 87.86359999999999 - type: euclidean_f1_threshold value: 54.986999999999995 - type: euclidean_precision value: 88.1288 - type: euclidean_recall value: 87.6 - type: euclidean_ap value: 94.07140000000001 - type: dot_accuracy value: 99.7604 - type: dot_accuracy_threshold value: 84.88210000000001 - type: dot_f1 value: 87.86359999999999 - type: dot_f1_threshold value: 84.88210000000001 - type: dot_precision value: 88.1288 - type: dot_recall value: 87.6 - type: dot_ap value: 94.07140000000001 - type: max_accuracy value: 99.7644 - type: max_f1 value: 87.92320000000001 - type: max_precision value: 88.86619999999999 - type: max_recall value: 87.6 - type: max_ap value: 94.17 - type: main_score value: 94.17 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering (default) revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: v_measure value: 64.6589 - type: v_measure_std value: 4.734 - type: main_score value: 64.6589 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P (default) revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: v_measure value: 32.9388 - type: v_measure_std value: 1.6312 - type: main_score value: 32.9388 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions (default) revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: map value: 52.645399999999995 - type: mrr value: 53.5346 - type: nAUC_map_max value: 12.8874 - type: nAUC_map_std value: 9.2781 - type: nAUC_map_diff1 value: 39.864 - type: nAUC_mrr_max value: 13.278 - type: nAUC_mrr_std value: 9.501999999999999 - type: nAUC_mrr_diff1 value: 39.409499999999994 - type: main_score value: 52.645399999999995 task: type: Reranking - dataset: config: default name: MTEB StackOverflowQA (default) revision: db8f169f3894c14a00251061f957b2063eef2bd5 split: test type: CoIR-Retrieval/stackoverflow-qa metrics: - type: ndcg_at_1 value: 74.97500000000001 - type: ndcg_at_3 value: 81.247 - type: ndcg_at_5 value: 82.921 - type: ndcg_at_10 value: 83.92699999999999 - type: ndcg_at_20 value: 84.57000000000001 - type: ndcg_at_100 value: 85.095 - type: ndcg_at_1000 value: 85.33800000000001 - type: map_at_1 value: 74.97500000000001 - type: map_at_3 value: 79.781 - type: map_at_5 value: 80.711 - type: map_at_10 value: 81.126 - type: map_at_20 value: 81.308 - type: map_at_100 value: 81.389 - type: map_at_1000 value: 81.39699999999999 - type: recall_at_1 value: 74.97500000000001 - type: recall_at_3 value: 85.456 - type: recall_at_5 value: 
89.519 - type: recall_at_10 value: 92.628 - type: recall_at_20 value: 95.135 - type: recall_at_100 value: 97.844 - type: recall_at_1000 value: 99.799 - type: precision_at_1 value: 74.97500000000001 - type: precision_at_3 value: 28.485 - type: precision_at_5 value: 17.904 - type: precision_at_10 value: 9.263 - type: precision_at_20 value: 4.757 - type: precision_at_100 value: 0.9780000000000001 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 74.9749 - type: mrr_at_3 value: 79.781 - type: mrr_at_5 value: 80.7113 - type: mrr_at_10 value: 81.12610000000001 - type: mrr_at_20 value: 81.30760000000001 - type: mrr_at_100 value: 81.38889999999999 - type: mrr_at_1000 value: 81.3974 - type: nauc_ndcg_at_1_max value: 76.1721 - type: nauc_ndcg_at_1_std value: -5.5159 - type: nauc_ndcg_at_1_diff1 value: 84.6697 - type: nauc_ndcg_at_3_max value: 78.27629999999999 - type: nauc_ndcg_at_3_std value: -1.2 - type: nauc_ndcg_at_3_diff1 value: 81.1214 - type: nauc_ndcg_at_5_max value: 77.7687 - type: nauc_ndcg_at_5_std value: -1.8698 - type: nauc_ndcg_at_5_diff1 value: 80.9252 - type: nauc_ndcg_at_10_max value: 77.8029 - type: nauc_ndcg_at_10_std value: -1.5579 - type: nauc_ndcg_at_10_diff1 value: 81.1043 - type: nauc_ndcg_at_20_max value: 77.79310000000001 - type: nauc_ndcg_at_20_std value: -1.7669000000000001 - type: nauc_ndcg_at_20_diff1 value: 81.4121 - type: nauc_ndcg_at_100_max value: 77.7522 - type: nauc_ndcg_at_100_std value: -1.4502 - type: nauc_ndcg_at_100_diff1 value: 81.684 - type: nauc_ndcg_at_1000_max value: 77.6032 - type: nauc_ndcg_at_1000_std value: -2.0256 - type: nauc_ndcg_at_1000_diff1 value: 81.7641 - type: nauc_map_at_1_max value: 76.1721 - type: nauc_map_at_1_std value: -5.5159 - type: nauc_map_at_1_diff1 value: 84.6697 - type: nauc_map_at_3_max value: 77.6991 - type: nauc_map_at_3_std value: -2.3189 - type: nauc_map_at_3_diff1 value: 82.0708 - type: nauc_map_at_5_max value: 77.4286 - type: nauc_map_at_5_std value: -2.721 - type: nauc_map_at_5_diff1 value: 82.0265 - type: nauc_map_at_10_max value: 77.4212 - type: nauc_map_at_10_std value: -2.633 - type: nauc_map_at_10_diff1 value: 82.109 - type: nauc_map_at_20_max value: 77.4188 - type: nauc_map_at_20_std value: -2.6752000000000002 - type: nauc_map_at_20_diff1 value: 82.19340000000001 - type: nauc_map_at_100_max value: 77.4169 - type: nauc_map_at_100_std value: -2.6487 - type: nauc_map_at_100_diff1 value: 82.2353 - type: nauc_map_at_1000_max value: 77.413 - type: nauc_map_at_1000_std value: -2.6639 - type: nauc_map_at_1000_diff1 value: 82.238 - type: nauc_recall_at_1_max value: 76.1721 - type: nauc_recall_at_1_std value: -5.5159 - type: nauc_recall_at_1_diff1 value: 84.6697 - type: nauc_recall_at_3_max value: 80.4678 - type: nauc_recall_at_3_std value: 3.0113000000000003 - type: nauc_recall_at_3_diff1 value: 77.5303 - type: nauc_recall_at_5_max value: 79.2732 - type: nauc_recall_at_5_std value: 2.0842 - type: nauc_recall_at_5_diff1 value: 75.5155 - type: nauc_recall_at_10_max value: 80.2527 - type: nauc_recall_at_10_std value: 5.7078 - type: nauc_recall_at_10_diff1 value: 74.4861 - type: nauc_recall_at_20_max value: 81.29950000000001 - type: nauc_recall_at_20_std value: 6.5553 - type: nauc_recall_at_20_diff1 value: 74.5628 - type: nauc_recall_at_100_max value: 83.8742 - type: nauc_recall_at_100_std value: 28.4213 - type: nauc_recall_at_100_diff1 value: 74.4027 - type: nauc_recall_at_1000_max value: 60.9178 - type: nauc_recall_at_1000_std value: -2.6599 - type: nauc_recall_at_1000_diff1 value: 47.6074 - type: 
nauc_precision_at_1_max value: 76.1721 - type: nauc_precision_at_1_std value: -5.5159 - type: nauc_precision_at_1_diff1 value: 84.6697 - type: nauc_precision_at_3_max value: 80.4678 - type: nauc_precision_at_3_std value: 3.0113000000000003 - type: nauc_precision_at_3_diff1 value: 77.5303 - type: nauc_precision_at_5_max value: 79.2732 - type: nauc_precision_at_5_std value: 2.0842 - type: nauc_precision_at_5_diff1 value: 75.5155 - type: nauc_precision_at_10_max value: 80.2527 - type: nauc_precision_at_10_std value: 5.7078 - type: nauc_precision_at_10_diff1 value: 74.4861 - type: nauc_precision_at_20_max value: 81.29950000000001 - type: nauc_precision_at_20_std value: 6.5553 - type: nauc_precision_at_20_diff1 value: 74.5628 - type: nauc_precision_at_100_max value: 83.8742 - type: nauc_precision_at_100_std value: 28.4213 - type: nauc_precision_at_100_diff1 value: 74.4027 - type: nauc_precision_at_1000_max value: 60.9178 - type: nauc_precision_at_1000_std value: -2.6599 - type: nauc_precision_at_1000_diff1 value: 47.6074 - type: nauc_mrr_at_1_max value: 76.1721 - type: nauc_mrr_at_1_std value: -5.5159 - type: nauc_mrr_at_1_diff1 value: 84.6697 - type: nauc_mrr_at_3_max value: 77.6991 - type: nauc_mrr_at_3_std value: -2.3189 - type: nauc_mrr_at_3_diff1 value: 82.0708 - type: nauc_mrr_at_5_max value: 77.4286 - type: nauc_mrr_at_5_std value: -2.721 - type: nauc_mrr_at_5_diff1 value: 82.0265 - type: nauc_mrr_at_10_max value: 77.4212 - type: nauc_mrr_at_10_std value: -2.633 - type: nauc_mrr_at_10_diff1 value: 82.109 - type: nauc_mrr_at_20_max value: 77.4188 - type: nauc_mrr_at_20_std value: -2.6752000000000002 - type: nauc_mrr_at_20_diff1 value: 82.19340000000001 - type: nauc_mrr_at_100_max value: 77.4169 - type: nauc_mrr_at_100_std value: -2.6487 - type: nauc_mrr_at_100_diff1 value: 82.2353 - type: nauc_mrr_at_1000_max value: 77.413 - type: nauc_mrr_at_1000_std value: -2.6639 - type: nauc_mrr_at_1000_diff1 value: 82.238 - type: main_score value: 83.92699999999999 task: type: Retrieval - dataset: config: default name: MTEB SummEval (default) revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: pearson value: 29.8395 - type: spearman value: 29.383 - type: cosine_spearman value: 29.383 - type: cosine_pearson value: 29.8395 - type: dot_spearman value: 29.383 - type: dot_pearson value: 29.8395 - type: main_score value: 29.383 task: type: Summarization - dataset: config: default name: MTEB SyntheticText2SQL (default) revision: 686b87296c3a0191b5d9415a00526c62db9fce09 split: test type: CoIR-Retrieval/synthetic-text2sql metrics: - type: ndcg_at_1 value: 4.222 - type: ndcg_at_3 value: 38.329 - type: ndcg_at_5 value: 42.076 - type: ndcg_at_10 value: 44.775 - type: ndcg_at_20 value: 46.528999999999996 - type: ndcg_at_100 value: 48.554 - type: ndcg_at_1000 value: 49.143 - type: map_at_1 value: 4.222 - type: map_at_3 value: 30.676 - type: map_at_5 value: 32.76 - type: map_at_10 value: 33.898 - type: map_at_20 value: 34.386 - type: map_at_100 value: 34.677 - type: map_at_1000 value: 34.701 - type: recall_at_1 value: 4.222 - type: recall_at_3 value: 60.178 - type: recall_at_5 value: 69.253 - type: recall_at_10 value: 77.474 - type: recall_at_20 value: 84.36200000000001 - type: recall_at_100 value: 95.12899999999999 - type: recall_at_1000 value: 99.675 - type: precision_at_1 value: 4.222 - type: precision_at_3 value: 20.058999999999997 - type: precision_at_5 value: 13.850999999999999 - type: precision_at_10 value: 7.747 - type: precision_at_20 value: 4.218 - type: 
precision_at_100 value: 0.951 - type: precision_at_1000 value: 0.1 - type: mrr_at_1 value: 27.3287 - type: mrr_at_3 value: 43.8956 - type: mrr_at_5 value: 45.656 - type: mrr_at_10 value: 46.6697 - type: mrr_at_20 value: 47.1331 - type: mrr_at_100 value: 47.4153 - type: mrr_at_1000 value: 47.4391 - type: nauc_ndcg_at_1_max value: 16.045 - type: nauc_ndcg_at_1_std value: -8.7715 - type: nauc_ndcg_at_1_diff1 value: 48.4886 - type: nauc_ndcg_at_3_max value: 30.771500000000003 - type: nauc_ndcg_at_3_std value: -16.2537 - type: nauc_ndcg_at_3_diff1 value: -59.0158 - type: nauc_ndcg_at_5_max value: 30.354 - type: nauc_ndcg_at_5_std value: -16.576 - type: nauc_ndcg_at_5_diff1 value: -55.0555 - type: nauc_ndcg_at_10_max value: 30.0579 - type: nauc_ndcg_at_10_std value: -16.3765 - type: nauc_ndcg_at_10_diff1 value: -52.5829 - type: nauc_ndcg_at_20_max value: 29.8131 - type: nauc_ndcg_at_20_std value: -15.7493 - type: nauc_ndcg_at_20_diff1 value: -51.1605 - type: nauc_ndcg_at_100_max value: 29.9313 - type: nauc_ndcg_at_100_std value: -14.9786 - type: nauc_ndcg_at_100_diff1 value: -49.6997 - type: nauc_ndcg_at_1000_max value: 29.7154 - type: nauc_ndcg_at_1000_std value: -15.2567 - type: nauc_ndcg_at_1000_diff1 value: -49.660399999999996 - type: nauc_map_at_1_max value: 16.045 - type: nauc_map_at_1_std value: -8.7715 - type: nauc_map_at_1_diff1 value: 48.4886 - type: nauc_map_at_3_max value: 29.6122 - type: nauc_map_at_3_std value: -15.509500000000001 - type: nauc_map_at_3_diff1 value: -52.033300000000004 - type: nauc_map_at_5_max value: 29.3076 - type: nauc_map_at_5_std value: -15.7 - type: nauc_map_at_5_diff1 value: -49.1839 - type: nauc_map_at_10_max value: 29.1468 - type: nauc_map_at_10_std value: -15.564400000000001 - type: nauc_map_at_10_diff1 value: -47.7791 - type: nauc_map_at_20_max value: 29.0578 - type: nauc_map_at_20_std value: -15.3635 - type: nauc_map_at_20_diff1 value: -47.2635 - type: nauc_map_at_100_max value: 29.0523 - type: nauc_map_at_100_std value: -15.2602 - type: nauc_map_at_100_diff1 value: -46.9875 - type: nauc_map_at_1000_max value: 29.048299999999998 - type: nauc_map_at_1000_std value: -15.2626 - type: nauc_map_at_1000_diff1 value: -46.98 - type: nauc_recall_at_1_max value: 16.045 - type: nauc_recall_at_1_std value: -8.7715 - type: nauc_recall_at_1_diff1 value: 48.4886 - type: nauc_recall_at_3_max value: 32.8552 - type: nauc_recall_at_3_std value: -17.6374 - type: nauc_recall_at_3_diff1 value: -71.1273 - type: nauc_recall_at_5_max value: 32.378299999999996 - type: nauc_recall_at_5_std value: -18.411 - type: nauc_recall_at_5_diff1 value: -65.7517 - type: nauc_recall_at_10_max value: 32.041799999999995 - type: nauc_recall_at_10_std value: -18.4057 - type: nauc_recall_at_10_diff1 value: -62.019999999999996 - type: nauc_recall_at_20_max value: 31.663999999999998 - type: nauc_recall_at_20_std value: -16.352800000000002 - type: nauc_recall_at_20_diff1 value: -59.1186 - type: nauc_recall_at_100_max value: 37.872499999999995 - type: nauc_recall_at_100_std value: -4.3914 - type: nauc_recall_at_100_diff1 value: -51.8363 - type: nauc_recall_at_1000_max value: 59.5105 - type: nauc_recall_at_1000_std value: 23.3375 - type: nauc_recall_at_1000_diff1 value: -73.9075 - type: nauc_precision_at_1_max value: 16.045 - type: nauc_precision_at_1_std value: -8.7715 - type: nauc_precision_at_1_diff1 value: 48.4886 - type: nauc_precision_at_3_max value: 32.8552 - type: nauc_precision_at_3_std value: -17.6374 - type: nauc_precision_at_3_diff1 value: -71.1273 - type: nauc_precision_at_5_max value: 
32.378299999999996 - type: nauc_precision_at_5_std value: -18.411 - type: nauc_precision_at_5_diff1 value: -65.7517 - type: nauc_precision_at_10_max value: 32.041799999999995 - type: nauc_precision_at_10_std value: -18.4057 - type: nauc_precision_at_10_diff1 value: -62.019999999999996 - type: nauc_precision_at_20_max value: 31.663999999999998 - type: nauc_precision_at_20_std value: -16.352800000000002 - type: nauc_precision_at_20_diff1 value: -59.1186 - type: nauc_precision_at_100_max value: 37.872499999999995 - type: nauc_precision_at_100_std value: -4.3914 - type: nauc_precision_at_100_diff1 value: -51.8363 - type: nauc_precision_at_1000_max value: 59.5105 - type: nauc_precision_at_1000_std value: 23.3375 - type: nauc_precision_at_1000_diff1 value: -73.9075 - type: nauc_mrr_at_1_max value: 15.1452 - type: nauc_mrr_at_1_std value: -9.760399999999999 - type: nauc_mrr_at_1_diff1 value: -39.2235 - type: nauc_mrr_at_3_max value: 23.6826 - type: nauc_mrr_at_3_std value: -13.300899999999999 - type: nauc_mrr_at_3_diff1 value: -55.17809999999999 - type: nauc_mrr_at_5_max value: 23.3754 - type: nauc_mrr_at_5_std value: -13.306299999999998 - type: nauc_mrr_at_5_diff1 value: -53.744499999999995 - type: nauc_mrr_at_10_max value: 23.0703 - type: nauc_mrr_at_10_std value: -13.1632 - type: nauc_mrr_at_10_diff1 value: -53.2374 - type: nauc_mrr_at_20_max value: 22.9496 - type: nauc_mrr_at_20_std value: -13.031 - type: nauc_mrr_at_20_diff1 value: -53.016 - type: nauc_mrr_at_100_max value: 22.9044 - type: nauc_mrr_at_100_std value: -12.9409 - type: nauc_mrr_at_100_diff1 value: -52.9092 - type: nauc_mrr_at_1000_max value: 22.897100000000002 - type: nauc_mrr_at_1000_std value: -12.940399999999999 - type: nauc_mrr_at_1000_diff1 value: -52.9095 - type: main_score value: 44.775 task: type: Retrieval - dataset: config: default name: MTEB TRECCOVID (default) revision: bb9466bac8153a0349341eb1b22e06409e78ef4e split: test type: mteb/trec-covid metrics: - type: ndcg_at_1 value: 70.0 - type: ndcg_at_3 value: 68.704 - type: ndcg_at_5 value: 67.533 - type: ndcg_at_10 value: 63.098 - type: ndcg_at_20 value: 60.507999999999996 - type: ndcg_at_100 value: 49.847 - type: ndcg_at_1000 value: 48.394999999999996 - type: map_at_1 value: 0.211 - type: map_at_3 value: 0.555 - type: map_at_5 value: 0.873 - type: map_at_10 value: 1.526 - type: map_at_20 value: 2.731 - type: map_at_100 value: 8.863 - type: map_at_1000 value: 23.162 - type: recall_at_1 value: 0.211 - type: recall_at_3 value: 0.5930000000000001 - type: recall_at_5 value: 0.962 - type: recall_at_10 value: 1.748 - type: recall_at_20 value: 3.318 - type: recall_at_100 value: 12.447999999999999 - type: recall_at_1000 value: 46.794999999999995 - type: precision_at_1 value: 76.0 - type: precision_at_3 value: 72.667 - type: precision_at_5 value: 71.6 - type: precision_at_10 value: 66.0 - type: precision_at_20 value: 63.6 - type: precision_at_100 value: 51.339999999999996 - type: precision_at_1000 value: 21.68 - type: mrr_at_1 value: 76.0 - type: mrr_at_3 value: 84.0 - type: mrr_at_5 value: 84.39999999999999 - type: mrr_at_10 value: 84.85000000000001 - type: mrr_at_20 value: 84.85000000000001 - type: mrr_at_100 value: 84.85000000000001 - type: mrr_at_1000 value: 84.85000000000001 - type: nauc_ndcg_at_1_max value: 48.710300000000004 - type: nauc_ndcg_at_1_std value: 72.6125 - type: nauc_ndcg_at_1_diff1 value: -19.9816 - type: nauc_ndcg_at_3_max value: 44.8032 - type: nauc_ndcg_at_3_std value: 64.7227 - type: nauc_ndcg_at_3_diff1 value: -25.933899999999998 - type: 
nauc_ndcg_at_5_max value: 44.7004 - type: nauc_ndcg_at_5_std value: 65.05330000000001 - type: nauc_ndcg_at_5_diff1 value: -26.0531 - type: nauc_ndcg_at_10_max value: 49.5716 - type: nauc_ndcg_at_10_std value: 66.18730000000001 - type: nauc_ndcg_at_10_diff1 value: -22.3525 - type: nauc_ndcg_at_20_max value: 49.0212 - type: nauc_ndcg_at_20_std value: 71.2387 - type: nauc_ndcg_at_20_diff1 value: -21.6522 - type: nauc_ndcg_at_100_max value: 47.3029 - type: nauc_ndcg_at_100_std value: 82.31819999999999 - type: nauc_ndcg_at_100_diff1 value: -27.5265 - type: nauc_ndcg_at_1000_max value: 38.8474 - type: nauc_ndcg_at_1000_std value: 77.1578 - type: nauc_ndcg_at_1000_diff1 value: -29.350700000000003 - type: nauc_map_at_1_max value: 16.4698 - type: nauc_map_at_1_std value: 9.657300000000001 - type: nauc_map_at_1_diff1 value: -4.3484 - type: nauc_map_at_3_max value: 25.183299999999996 - type: nauc_map_at_3_std value: 16.8245 - type: nauc_map_at_3_diff1 value: -7.1254 - type: nauc_map_at_5_max value: 24.5899 - type: nauc_map_at_5_std value: 19.8027 - type: nauc_map_at_5_diff1 value: -9.8547 - type: nauc_map_at_10_max value: 34.9032 - type: nauc_map_at_10_std value: 26.435599999999997 - type: nauc_map_at_10_diff1 value: -8.833499999999999 - type: nauc_map_at_20_max value: 40.551700000000004 - type: nauc_map_at_20_std value: 34.6141 - type: nauc_map_at_20_diff1 value: -8.578199999999999 - type: nauc_map_at_100_max value: 51.403299999999994 - type: nauc_map_at_100_std value: 68.4083 - type: nauc_map_at_100_diff1 value: -17.7135 - type: nauc_map_at_1000_max value: 48.9955 - type: nauc_map_at_1000_std value: 82.9784 - type: nauc_map_at_1000_diff1 value: -26.473000000000003 - type: nauc_recall_at_1_max value: 16.4698 - type: nauc_recall_at_1_std value: 9.657300000000001 - type: nauc_recall_at_1_diff1 value: -4.3484 - type: nauc_recall_at_3_max value: 21.4136 - type: nauc_recall_at_3_std value: 11.4801 - type: nauc_recall_at_3_diff1 value: -7.1396 - type: nauc_recall_at_5_max value: 18.0314 - type: nauc_recall_at_5_std value: 12.7486 - type: nauc_recall_at_5_diff1 value: -9.7349 - type: nauc_recall_at_10_max value: 27.8032 - type: nauc_recall_at_10_std value: 18.7061 - type: nauc_recall_at_10_diff1 value: -9.2739 - type: nauc_recall_at_20_max value: 30.878299999999996 - type: nauc_recall_at_20_std value: 26.0295 - type: nauc_recall_at_20_diff1 value: -7.8001000000000005 - type: nauc_recall_at_100_max value: 39.4065 - type: nauc_recall_at_100_std value: 56.112399999999994 - type: nauc_recall_at_100_diff1 value: -17.8753 - type: nauc_recall_at_1000_max value: 31.571199999999997 - type: nauc_recall_at_1000_std value: 65.3181 - type: nauc_recall_at_1000_diff1 value: -26.398899999999998 - type: nauc_precision_at_1_max value: 59.8382 - type: nauc_precision_at_1_std value: 66.9075 - type: nauc_precision_at_1_diff1 value: -5.1873000000000005 - type: nauc_precision_at_3_max value: 55.787600000000005 - type: nauc_precision_at_3_std value: 64.1127 - type: nauc_precision_at_3_diff1 value: -24.3791 - type: nauc_precision_at_5_max value: 50.0544 - type: nauc_precision_at_5_std value: 61.812599999999996 - type: nauc_precision_at_5_diff1 value: -24.5456 - type: nauc_precision_at_10_max value: 57.4695 - type: nauc_precision_at_10_std value: 63.7448 - type: nauc_precision_at_10_diff1 value: -22.6982 - type: nauc_precision_at_20_max value: 57.3052 - type: nauc_precision_at_20_std value: 72.00619999999999 - type: nauc_precision_at_20_diff1 value: -18.2329 - type: nauc_precision_at_100_max value: 50.0873 - type: 
nauc_precision_at_100_std value: 84.9689 - type: nauc_precision_at_100_diff1 value: -27.625300000000003 - type: nauc_precision_at_1000_max value: 29.3103 - type: nauc_precision_at_1000_std value: 57.898700000000005 - type: nauc_precision_at_1000_diff1 value: -28.8765 - type: nauc_mrr_at_1_max value: 59.8382 - type: nauc_mrr_at_1_std value: 66.9075 - type: nauc_mrr_at_1_diff1 value: -5.1873000000000005 - type: nauc_mrr_at_3_max value: 58.4682 - type: nauc_mrr_at_3_std value: 64.6751 - type: nauc_mrr_at_3_diff1 value: -5.9737 - type: nauc_mrr_at_5_max value: 59.099999999999994 - type: nauc_mrr_at_5_std value: 63.6902 - type: nauc_mrr_at_5_diff1 value: -6.482499999999999 - type: nauc_mrr_at_10_max value: 57.9638 - type: nauc_mrr_at_10_std value: 63.716300000000004 - type: nauc_mrr_at_10_diff1 value: -5.6598999999999995 - type: nauc_mrr_at_20_max value: 57.9638 - type: nauc_mrr_at_20_std value: 63.716300000000004 - type: nauc_mrr_at_20_diff1 value: -5.6598999999999995 - type: nauc_mrr_at_100_max value: 57.9638 - type: nauc_mrr_at_100_std value: 63.716300000000004 - type: nauc_mrr_at_100_diff1 value: -5.6598999999999995 - type: nauc_mrr_at_1000_max value: 57.9638 - type: nauc_mrr_at_1000_std value: 63.716300000000004 - type: nauc_mrr_at_1000_diff1 value: -5.6598999999999995 - type: main_score value: 63.098 task: type: Retrieval - dataset: config: default name: MTEB Touche2020 (default) revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: ndcg_at_1 value: 23.469 - type: ndcg_at_3 value: 25.522 - type: ndcg_at_5 value: 24.333 - type: ndcg_at_10 value: 24.029 - type: ndcg_at_20 value: 24.573 - type: ndcg_at_100 value: 34.425 - type: ndcg_at_1000 value: 46.907 - type: map_at_1 value: 1.976 - type: map_at_3 value: 4.589 - type: map_at_5 value: 6.555999999999999 - type: map_at_10 value: 9.687999999999999 - type: map_at_20 value: 11.926 - type: map_at_100 value: 15.116999999999999 - type: map_at_1000 value: 16.769000000000002 - type: recall_at_1 value: 1.976 - type: recall_at_3 value: 6.101 - type: recall_at_5 value: 9.68 - type: recall_at_10 value: 16.633 - type: recall_at_20 value: 23.589 - type: recall_at_100 value: 45.61 - type: recall_at_1000 value: 82.48100000000001 - type: precision_at_1 value: 26.531 - type: precision_at_3 value: 27.891 - type: precision_at_5 value: 25.714 - type: precision_at_10 value: 22.448999999999998 - type: precision_at_20 value: 16.837 - type: precision_at_100 value: 7.122000000000001 - type: precision_at_1000 value: 1.5270000000000001 - type: mrr_at_1 value: 26.5306 - type: mrr_at_3 value: 39.1156 - type: mrr_at_5 value: 41.1565 - type: mrr_at_10 value: 43.863 - type: mrr_at_20 value: 44.5963 - type: mrr_at_100 value: 44.766600000000004 - type: mrr_at_1000 value: 44.766600000000004 - type: nauc_ndcg_at_1_max value: -31.661099999999998 - type: nauc_ndcg_at_1_std value: 2.8871 - type: nauc_ndcg_at_1_diff1 value: 3.4787 - type: nauc_ndcg_at_3_max value: -34.6673 - type: nauc_ndcg_at_3_std value: -3.8882 - type: nauc_ndcg_at_3_diff1 value: 0.6512 - type: nauc_ndcg_at_5_max value: -33.815 - type: nauc_ndcg_at_5_std value: 0.20209999999999997 - type: nauc_ndcg_at_5_diff1 value: -6.4072000000000005 - type: nauc_ndcg_at_10_max value: -26.9953 - type: nauc_ndcg_at_10_std value: -3.6511 - type: nauc_ndcg_at_10_diff1 value: -3.8763 - type: nauc_ndcg_at_20_max value: -30.218600000000002 - type: nauc_ndcg_at_20_std value: -1.4384 - type: nauc_ndcg_at_20_diff1 value: -8.5927 - type: nauc_ndcg_at_100_max value: -32.1409 - type: 
nauc_ndcg_at_100_std value: 20.1662 - type: nauc_ndcg_at_100_diff1 value: -12.0591 - type: nauc_ndcg_at_1000_max value: -31.6892 - type: nauc_ndcg_at_1000_std value: 32.1464 - type: nauc_ndcg_at_1000_diff1 value: -8.3651 - type: nauc_map_at_1_max value: -41.9612 - type: nauc_map_at_1_std value: -11.0332 - type: nauc_map_at_1_diff1 value: -5.2508 - type: nauc_map_at_3_max value: -30.4968 - type: nauc_map_at_3_std value: -11.138 - type: nauc_map_at_3_diff1 value: -0.8447 - type: nauc_map_at_5_max value: -24.7543 - type: nauc_map_at_5_std value: -10.302 - type: nauc_map_at_5_diff1 value: -10.0762 - type: nauc_map_at_10_max value: -20.420099999999998 - type: nauc_map_at_10_std value: -10.485 - type: nauc_map_at_10_diff1 value: -10.3134 - type: nauc_map_at_20_max value: -20.8606 - type: nauc_map_at_20_std value: -6.3984 - type: nauc_map_at_20_diff1 value: -10.8605 - type: nauc_map_at_100_max value: -22.6385 - type: nauc_map_at_100_std value: 3.8738 - type: nauc_map_at_100_diff1 value: -12.9055 - type: nauc_map_at_1000_max value: -23.0823 - type: nauc_map_at_1000_std value: 8.6942 - type: nauc_map_at_1000_diff1 value: -13.1715 - type: nauc_recall_at_1_max value: -41.9612 - type: nauc_recall_at_1_std value: -11.0332 - type: nauc_recall_at_1_diff1 value: -5.2508 - type: nauc_recall_at_3_max value: -25.9715 - type: nauc_recall_at_3_std value: -14.9623 - type: nauc_recall_at_3_diff1 value: -4.2583 - type: nauc_recall_at_5_max value: -24.5848 - type: nauc_recall_at_5_std value: -14.258299999999998 - type: nauc_recall_at_5_diff1 value: -13.1162 - type: nauc_recall_at_10_max value: -22.3834 - type: nauc_recall_at_10_std value: -15.274199999999999 - type: nauc_recall_at_10_diff1 value: -10.8836 - type: nauc_recall_at_20_max value: -22.8634 - type: nauc_recall_at_20_std value: -4.8215 - type: nauc_recall_at_20_diff1 value: -11.1747 - type: nauc_recall_at_100_max value: -25.9537 - type: nauc_recall_at_100_std value: 29.75 - type: nauc_recall_at_100_diff1 value: -15.512799999999999 - type: nauc_recall_at_1000_max value: -18.9449 - type: nauc_recall_at_1000_std value: 69.619 - type: nauc_recall_at_1000_diff1 value: -5.629300000000001 - type: nauc_precision_at_1_max value: -33.7627 - type: nauc_precision_at_1_std value: 1.8065000000000002 - type: nauc_precision_at_1_diff1 value: 5.3592 - type: nauc_precision_at_3_max value: -30.7992 - type: nauc_precision_at_3_std value: -6.285399999999999 - type: nauc_precision_at_3_diff1 value: 1.1098000000000001 - type: nauc_precision_at_5_max value: -27.8949 - type: nauc_precision_at_5_std value: -1.8754 - type: nauc_precision_at_5_diff1 value: -8.0528 - type: nauc_precision_at_10_max value: -19.659299999999998 - type: nauc_precision_at_10_std value: -0.9809999999999999 - type: nauc_precision_at_10_diff1 value: -2.0972999999999997 - type: nauc_precision_at_20_max value: -25.810899999999997 - type: nauc_precision_at_20_std value: 19.5577 - type: nauc_precision_at_20_diff1 value: -8.879199999999999 - type: nauc_precision_at_100_max value: -21.1488 - type: nauc_precision_at_100_std value: 65.00200000000001 - type: nauc_precision_at_100_diff1 value: -11.740499999999999 - type: nauc_precision_at_1000_max value: 20.7392 - type: nauc_precision_at_1000_std value: 38.2851 - type: nauc_precision_at_1000_diff1 value: 17.4954 - type: nauc_mrr_at_1_max value: -33.7627 - type: nauc_mrr_at_1_std value: 1.8065000000000002 - type: nauc_mrr_at_1_diff1 value: 5.3592 - type: nauc_mrr_at_3_max value: -39.837 - type: nauc_mrr_at_3_std value: -5.3861 - type: nauc_mrr_at_3_diff1 value: -4.1776 
- type: nauc_mrr_at_5_max value: -39.756099999999996 - type: nauc_mrr_at_5_std value: -5.3674 - type: nauc_mrr_at_5_diff1 value: -2.4693 - type: nauc_mrr_at_10_max value: -37.7379 - type: nauc_mrr_at_10_std value: -6.2844 - type: nauc_mrr_at_10_diff1 value: -0.6525000000000001 - type: nauc_mrr_at_20_max value: -38.4522 - type: nauc_mrr_at_20_std value: -5.0927 - type: nauc_mrr_at_20_diff1 value: -0.2814 - type: nauc_mrr_at_100_max value: -38.1599 - type: nauc_mrr_at_100_std value: -5.2147 - type: nauc_mrr_at_100_diff1 value: -0.7001000000000001 - type: nauc_mrr_at_1000_max value: -38.1599 - type: nauc_mrr_at_1000_std value: -5.2147 - type: nauc_mrr_at_1000_diff1 value: -0.7001000000000001 - type: main_score value: 24.029 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification (default) revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 62.9395 - type: f1 value: 47.7133 - type: f1_weighted value: 71.0525 - type: ap value: 10.306600000000001 - type: ap_weighted value: 10.306600000000001 - type: main_score value: 62.9395 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification (default) revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 52.8721 - type: f1 value: 53.034800000000004 - type: f1_weighted value: 52.4319 - type: main_score value: 52.8721 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering (default) revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: v_measure value: 44.9227 - type: v_measure_std value: 1.1638000000000002 - type: main_score value: 44.9227 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 (default) revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: similarity_accuracy value: 82.04090000000001 - type: similarity_accuracy_threshold value: 86.6147 - type: similarity_f1 value: 57.258399999999995 - type: similarity_f1_threshold value: 82.9233 - type: similarity_precision value: 52.1456 - type: similarity_recall value: 63.4828 - type: similarity_ap value: 60.0317 - type: cosine_accuracy value: 82.04090000000001 - type: cosine_accuracy_threshold value: 86.6147 - type: cosine_f1 value: 57.258399999999995 - type: cosine_f1_threshold value: 82.9233 - type: cosine_precision value: 52.1456 - type: cosine_recall value: 63.4828 - type: cosine_ap value: 60.0317 - type: manhattan_accuracy value: 81.9574 - type: manhattan_accuracy_threshold value: 794.4433 - type: manhattan_f1 value: 57.1936 - type: manhattan_f1_threshold value: 898.9445 - type: manhattan_precision value: 51.91480000000001 - type: manhattan_recall value: 63.6675 - type: manhattan_ap value: 59.9255 - type: euclidean_accuracy value: 82.04090000000001 - type: euclidean_accuracy_threshold value: 51.7403 - type: euclidean_f1 value: 57.258399999999995 - type: euclidean_f1_threshold value: 58.440999999999995 - type: euclidean_precision value: 52.1456 - type: euclidean_recall value: 63.4828 - type: euclidean_ap value: 60.0317 - type: dot_accuracy value: 82.04090000000001 - type: dot_accuracy_threshold value: 86.6147 - type: dot_f1 value: 57.258399999999995 - type: dot_f1_threshold value: 82.9233 - type: dot_precision value: 52.1456 - type: dot_recall value: 63.4828 - 
type: dot_ap value: 60.0317 - type: max_accuracy value: 82.04090000000001 - type: max_f1 value: 57.258399999999995 - type: max_precision value: 52.1456 - type: max_recall value: 63.6675 - type: max_ap value: 60.0317 - type: main_score value: 60.0317 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus (default) revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: similarity_accuracy value: 87.3035 - type: similarity_accuracy_threshold value: 85.4123 - type: similarity_f1 value: 74.5555 - type: similarity_f1_threshold value: 83.7581 - type: similarity_precision value: 72.55369999999999 - type: similarity_recall value: 76.6708 - type: similarity_ap value: 82.42930000000001 - type: cosine_accuracy value: 87.3035 - type: cosine_accuracy_threshold value: 85.4123 - type: cosine_f1 value: 74.5555 - type: cosine_f1_threshold value: 83.7581 - type: cosine_precision value: 72.55369999999999 - type: cosine_recall value: 76.6708 - type: cosine_ap value: 82.42930000000001 - type: manhattan_accuracy value: 87.3249 - type: manhattan_accuracy_threshold value: 831.9304999999999 - type: manhattan_f1 value: 74.8665 - type: manhattan_f1_threshold value: 893.9980999999999 - type: manhattan_precision value: 70.8502 - type: manhattan_recall value: 79.3656 - type: manhattan_ap value: 82.5792 - type: euclidean_accuracy value: 87.3035 - type: euclidean_accuracy_threshold value: 54.014300000000006 - type: euclidean_f1 value: 74.5555 - type: euclidean_f1_threshold value: 56.9946 - type: euclidean_precision value: 72.55369999999999 - type: euclidean_recall value: 76.6708 - type: euclidean_ap value: 82.42920000000001 - type: dot_accuracy value: 87.3035 - type: dot_accuracy_threshold value: 85.4123 - type: dot_f1 value: 74.5555 - type: dot_f1_threshold value: 83.7581 - type: dot_precision value: 72.55369999999999 - type: dot_recall value: 76.6708 - type: dot_ap value: 82.42920000000001 - type: max_accuracy value: 87.3249 - type: max_f1 value: 74.8665 - type: max_precision value: 72.55369999999999 - type: max_recall value: 79.3656 - type: max_ap value: 82.5792 - type: main_score value: 82.5792 task: type: PairClassification pipeline_tag: sentence-similarity --- # Granite-Embedding-30m-English **Model Summary:** Granite-Embedding-30m-English is a 30M parameter dense biencoder embedding model from the Granite Embeddings suite that can be used to generate high quality text embeddings. This model produces embedding vectors of size 384 and is trained using a combination of open source relevance-pair datasets with permissive, enterprise-friendly license, and IBM collected and generated datasets. While maintaining competitive scores on academic benchmarks such as BEIR, this model also performs well on many enterprise use cases. This model is developed using retrieval oriented pretraining, contrastive finetuning, knowledge distillation and model merging for improved performance. - **Developers:** Granite Embedding Team, IBM - **GitHub Repository:** ibm-granite/granite-embedding-models - **Website**: Granite Docs - **Paper:** Coming Soon - **Release Date**: December 18th, 2024 - **License:** Apache 2.0 **Supported Languages:** English. **Intended use:** The model is designed to produce fixed length vector representations for a given text, which can be used for text similarity, retrieval, and search applications. 
**Usage with Sentence Transformers:** The model is compatible with the SentenceTransformers library and is very easy to use. First, install the sentence-transformers library; the model can then be used to encode pairs of text and find the similarity between their representations (a reconstructed snippet appears after the training data table below). **Usage with Huggingface Transformers:** This is a simple example of how to use the Granite-Embedding-30m-English model with the Transformers library and PyTorch. First, install the required libraries; the model can then be used to encode pairs of text. **Evaluation:** Granite-Embedding-30M-English is twice as fast as other models with similar embedding dimensions, while maintaining competitive performance. The performance of the Granite-Embedding-30M-English model on MTEB Retrieval (i.e., BEIR) and code retrieval (CoIR) benchmarks is reported below. | Model | Parameters (M)| Embedding Dimension | MTEB Retrieval (15) | CoIR (10) | |---------------------------------|:------------:|:-------------------:|:-------------------: |:----------:| |granite-embedding-30m-english |30 |384 |49.1 |47.0 | **Model Architecture:** Granite-Embedding-30m-English is based on an encoder-only RoBERTa-like transformer architecture, trained internally at IBM Research. | Model | granite-embedding-30m-english | granite-embedding-125m-english | granite-embedding-107m-multilingual | granite-embedding-278m-multilingual | | :--------- | :-------:| :--------: | :-----:| :-----:| | Embedding size | **384** | 768 | 384 | 768 | | Number of layers | **6** | 12 | 6 | 12 | | Number of attention heads | **12** | 12 | 12 | 12 | | Intermediate size | **1536** | 3072 | 1536 | 3072 | | Activation Function | **GeLU** | GeLU | GeLU | GeLU | | Vocabulary Size | **50265**| 50265 | 250002 | 250002 | | Max. Sequence Length | **512** | 512 | 512 | 512 | | # Parameters | **30M** | 125M | 107M | 278M | **Training Data:** Overall, the training data consists of four key sources: (1) unsupervised title-body paired data scraped from the web, (2) publicly available paired data with permissive, enterprise-friendly licenses, (3) IBM-internal paired data targeting specific technical domains, and (4) IBM-generated synthetic data. The data is listed below: | **Dataset** | **Num. Pairs** | |----------------------------------------------------|:---------------:| | SPECTER citation triplets | 684,100 | | Stack Exchange Duplicate questions (titles) | 304,525 | | Stack Exchange Duplicate questions (bodies) | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | 250,460 | | Natural Questions (NQ) | 100,231 | | SQuAD2.0 | 87,599 | | PAQ (Question, Answer) pairs | 64,371,441 | | Stack Exchange (Title, Answer) pairs | 4,067,139 | | Stack Exchange (Title, Body) pairs | 23,978,013 | | Stack Exchange (Title+Body, Answer) pairs | 187,195 | | S2ORC Citation pairs (Titles) | 52,603,982 | | S2ORC (Title, Abstract) | 41,769,185 | | S2ORC (Citations, abstracts) | 52,603,982 | | WikiAnswers Duplicate question pairs | 77,427,422 | | SearchQA | 582,261 | | HotpotQA | 85,000 | | Fever | 109,810 | | Arxiv | 2,358,545 | | Wikipedia | 20,745,403 | | PubMed | 20,000,000 | | Miracl En Pairs | 9,016 | | DBPedia Title-Body Pairs | 4,635,922 | | Synthetic: Query-Wikipedia Passage | 1,879,093 | | Synthetic: Fact Verification | 9,888 | | IBM Internal Triples | 40,290 | | IBM Internal Title-Body Pairs | 1,524,586 | Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus due to its non-commercial license, while other open-source models train on this dataset due to its high quality.
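The install and encoding snippets referenced in the usage sections above did not survive scraping. A minimal reconstructed sketch, assuming the Hugging Face model ID ibm-granite/granite-embedding-30m-english and the public sentence-transformers API (not the card's official snippet):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Assumed model ID for this card; embeddings are 384-dimensional.
model = SentenceTransformer('ibm-granite/granite-embedding-30m-english')

input_queries = ['what is the capital of France?']
input_passages = [
    'Paris is the capital and most populous city of France.',
    'Granite Embedding models are developed by IBM.',
]

query_embeddings = model.encode(input_queries)
passage_embeddings = model.encode(input_passages)

# Cosine similarity between every (query, passage) pair.
print(util.cos_sim(query_embeddings, passage_embeddings))
```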
**Infrastructure:** We train Granite Embedding Models using IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80GB GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs. **Ethical Considerations and Limitations:** The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-30m-English is trained only on English texts and has a context length of 512 tokens (longer texts will be truncated to this size). **Resources** - ⭐️ Learn about the latest updates with Granite: - 📄 Get started with tutorials, best practices, and prompt engineering advice: - 💡 Learn about the latest Granite learning resources: ", + "model_explanation_gemini": "Generates English text embeddings for tasks like classification and retrieval using sentence-transformers." +} \ No newline at end of file diff --git a/data/model_data_json/ibm-granite_granite-timeseries-ttm-r1.json b/data/model_data_json/ibm-granite_granite-timeseries-ttm-r1.json new file mode 100644 index 0000000000000000000000000000000000000000..9dbdf37b456e8eb1daa0813a11e9de735d2a5bf6 --- /dev/null +++ b/data/model_data_json/ibm-granite_granite-timeseries-ttm-r1.json @@ -0,0 +1,21 @@ +{ + "model_id": "ibm-granite/granite-timeseries-ttm-r1", + "downloads": 1278053, + "tags": [ + "granite-tsfm", + "safetensors", + "tinytimemixer", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2401.03955", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series library_name: granite-tsfm new_version: ibm-granite/granite-timeseries-ttm-r2 --- # Granite-TimeSeries-TTM-R1 Model Card

TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research. **With less than 1 million parameters, TTM (accepted at NeurIPS 2024) introduces the notion of the first-ever “tiny” pre-trained models for Time-Series Forecasting.** TTM outperforms several popular models demanding billions of parameters in zero-shot and few-shot forecasting. TTMs are lightweight forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be fine-tuned for multi-variate forecasts with just 5% of the training data while remaining competitive. Refer to our paper for more details. **The current open-source version supports point forecasting use-cases specifically ranging from minutely to hourly resolutions (e.g., 10 min, 15 min, 1 hour).** **Note that zero-shot, fine-tuning and inference tasks using TTM can easily be executed on a single-GPU machine or even on a laptop.** **New updates:** TTM-R1 comprises TTM variants pre-trained on 250M public training samples. We have another set of TTM models released recently under TTM-R2, trained on a much larger pretraining dataset (~700M samples), which can be accessed from here. In general, TTM-R2 models perform better than TTM-R1 models as they are trained on a larger pretraining dataset. However, the choice of R1 vs. R2 depends on your target data distribution, so we encourage users to try both variants and pick the one that works best for their data. ## Model Description TTM falls under the category of “focused pre-trained models”, wherein each pre-trained TTM is tailored for a particular forecasting setting (governed by the context length and forecast length). Instead of building one massive model supporting all forecasting settings, we opt for the approach of constructing smaller pre-trained models, each focusing on a specific forecasting setting, thereby yielding more accurate results. Furthermore, this approach ensures that our models remain extremely small and exceptionally fast, facilitating easy deployment without demanding a ton of resources. Hence, in this model card, we plan to release several pre-trained TTMs that can cater to many common forecasting settings in practice. Additionally, we have released our source code along with our pretraining scripts that users can utilize to pretrain models on their own. Pretraining TTMs is very easy and fast, taking only 3-6 hours using 6 A100 GPUs, as opposed to several days or weeks in traditional approaches. Each pre-trained model will be released in a different branch name in this model card. Kindly access the required model using our getting started notebook, mentioning the branch name. ## Model Releases (along with the branch name where the models are stored): - **512-96:** Given the last 512 time-points (i.e. context length), this model can forecast up to the next 96 time-points (i.e. forecast length) in future. This model is targeted towards a forecasting setting of context length 512 and forecast length 96 and recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour). This model refers to the TTM-Q variant used in the paper. (branch name: main) [[Benchmark Scripts]]( - **1024-96:** Given the last 1024 time-points (i.e. context length), this model can forecast up to the next 96 time-points (i.e. forecast length) in future.
This model is targeted towards a long forecasting setting of context length 1024 and forecast length 96 and recommended for hourly and minutely resolutions (e.g., 10 min, 15 min, 1 hour). (branch name: 1024-96-v1) [[Benchmark Scripts]]( We can also use the [[get_model]]( utility to automatically select the required model based on your input context length and forecast length requirement. For more variants (up to forecast length 720), refer to our new model card here ## Model Capabilities with example scripts The model scripts below can be used for any of the above TTM models; a minimal zero-shot sketch is also included at the end of the Benchmarks notes below. Please update the HF model URL and branch name in the call appropriately to pick the model of your choice. - Getting Started [[colab]]( - Zeroshot Multivariate Forecasting [[Example]]( - Finetuned Multivariate Forecasting: - Channel-Independent Finetuning [[Example 1]]( [[Example 2]]( - Channel-Mix Finetuning [[Example]]( - **New Releases (extended features released in October 2024)** - Finetuning and Forecasting with Exogenous/Control Variables [[Example]]( - Finetuning and Forecasting with static categorical features [Example: To be added soon] - Rolling Forecasts - Extend forecast lengths beyond 96 via rolling capability [[Example]]( - Helper scripts for optimal Learning Rate suggestions for Finetuning [[Example]]( ## Benchmarks TTM outperforms popular models such as TimesFM, Moirai, Chronos, Lag-Llama, Moment, GPT4TS, TimeLLM, and LLMTime on zero/few-shot forecasting benchmarks while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. For more details, refer to our paper. TTM-Q, as referred to in the paper, maps to the model uploaded in the main branch. For other variants (TTM-B, TTM-E and TTM-A) please refer here. For more details, refer to the paper.
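As an illustration of the zero-shot mode mentioned above, here is a minimal sketch, not the official notebook. It assumes the granite-tsfm package with import path tsfm_public and class TinyTimeMixerForPrediction (per the project repository), plus the 512-96 checkpoint on the main branch; treat the exact names and output attributes as assumptions and prefer the getting-started notebook:

```python
# pip install granite-tsfm   (assumed PyPI name; see the project repository)
import torch
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Load the 512-96 variant; 'main' is the branch holding that checkpoint.
model = TinyTimeMixerForPrediction.from_pretrained(
    'ibm-granite/granite-timeseries-ttm-r1', revision='main'
)
model.eval()

# Toy batch: 1 series, 512 past time points, 3 channels.
# Real data must be standard-scaled independently per channel first
# (see Recommended Use below).
past_values = torch.randn(1, 512, 3)

with torch.no_grad():
    out = model(past_values=past_values)

# Zero-shot point forecasts for the next 96 steps of each channel.
print(out.prediction_outputs.shape)  # expected: (1, 96, 3)
```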

## Recommended Use 1. Users must externally standard-scale their data independently for every channel before feeding it to the model (refer to TSP, our data processing utility, for data scaling). 2. The current open-source version supports only minutely and hourly resolutions (e.g., 10 min, 15 min, 1 hour). Other lower resolutions (e.g., weekly or monthly) are currently not supported in this version, as the model needs a minimum context length of 512 or 1024. 3. Enabling any upsampling or prepending zeros to virtually increase the context length for shorter-length datasets is not recommended and will impact the model performance. ## Model Details For more details on TTM architecture and benchmarks, refer to our paper. TTM-1 currently supports two modes: - **Zeroshot forecasting**: Directly apply the pre-trained model on your target data to get an initial forecast (with no training). - **Finetuned forecasting**: Finetune the pre-trained model with a subset of your target data to further improve the forecast. **Since TTM models are extremely small and fast, it is practical to finetune the model with your available target data in a few minutes to get more accurate forecasts.** The current release supports multivariate forecasting via both channel independence and channel-mixing approaches. Decoder Channel-Mixing can be enabled during fine-tuning for capturing strong channel-correlation patterns across time-series variates, a critical capability lacking in existing counterparts. In addition, TTM also supports exogenous infusion and categorical data infusion. ### Model Sources - **Repository:** - **Paper:** ### Blogs and articles on TTM: - Refer to our wiki ## Training Data The original r1 TTM models were trained on a collection of datasets from the Monash Time Series Forecasting repository. The datasets used include: - Australian Electricity Demand: - Australian Weather: - Bitcoin dataset: - KDD Cup 2018 dataset: - London Smart Meters: - Saugeen River Flow: - Solar Power: - Sunspots: - Solar: - US Births: - Wind Farms Production data: - Wind Power: ## Citation Kindly cite the following paper if you intend to use our model or its associated architectures/approaches in your work. **BibTeX:** ## Model Card Authors Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Wesley M. Gifford, Sumanta Mukherjee, Chandra Reddy and Jayant Kalagnanam ## IBM Public Repository Disclosure: All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.", + "model_explanation_gemini": "Granite-TimeSeries-TTM-R1 is a compact, pretrained multivariate time-series forecasting model optimized for zero-shot and few-shot predictions at minutely to hourly resolutions, requiring minimal computational resources."
+} \ No newline at end of file diff --git a/data/model_data_json/ibm-granite_granite-timeseries-ttm-r2.json b/data/model_data_json/ibm-granite_granite-timeseries-ttm-r2.json new file mode 100644 index 0000000000000000000000000000000000000000..0132a0412085932cbf49c00cdfcd997203e8f494 --- /dev/null +++ b/data/model_data_json/ibm-granite_granite-timeseries-ttm-r2.json @@ -0,0 +1,21 @@ +{ + "model_id": "ibm-granite/granite-timeseries-ttm-r2", + "downloads": 236095, + "tags": [ + "granite-tsfm", + "safetensors", + "tinytimemixer", + "time series", + "forecasting", + "pretrained models", + "foundation models", + "time series foundation models", + "time-series", + "time-series-forecasting", + "arxiv:2401.03955", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series library_name: granite-tsfm --- # Granite-TimeSeries-TTM-R2 Model Card

TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research. **With model sizes starting from 1M params, TTM introduces the notion of the first-ever “tiny” pre-trained models for Time-Series Forecasting. The paper describing TTM was accepted at NeurIPS 2024.** TTM outperforms other models demanding billions of parameters in several popular zero-shot and few-shot forecasting benchmarks. TTMs are lightweight forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be fine-tuned for multi-variate forecasts with just 5% of the training data while remaining competitive. **Note that zero-shot, fine-tuning and inference tasks using TTM can easily be executed on 1 GPU or on laptops.** TTM r2 comprises TTM variants pre-trained on larger pretraining datasets (\~700M samples). The TTM r2.1 release increases the pretraining dataset size to approximately 1B samples. The prior model releases, TTM r1, were trained on \~250M samples and can be accessed here. In general, TTM r2 models perform better than TTM r1 models as they are trained on a larger pretraining dataset. In standard benchmarks, TTM r2 outperforms TTM r1 by over 15%. However, the choice of r1 vs. r2 depends on your target data distribution, and hence users should try both variants and pick the best model for their data. The TTM r2 releases support point forecasting use-cases specifically ranging from minutely to hourly resolutions (e.g., 10 min, 15 min, 1 hour). With the TTM r2.1 release, we add support for daily and weekly resolutions. ### Links - **Paper:** NeurIPS 2024, ArXiV - **Repository:** - **PyPI project:** - **Model architecture:** - **Time Series Cookbook:** ## Model Description TTM falls under the category of “focused pre-trained models”, wherein each pre-trained TTM is tailored for a particular forecasting setting (governed by the context length and forecast length). Instead of building one massive model supporting all forecasting settings, we opt for the approach of constructing smaller pre-trained models, each focusing on a specific forecasting setting, thereby yielding more accurate results. Furthermore, this approach ensures that our models remain extremely small and exceptionally fast, facilitating easy deployment without demanding a ton of resources. Hence, in this model card, we release several pre-trained TTMs that can cater to many common forecasting settings in practice. Each pre-trained model will be released in a different branch name in this model card. Given the variety of models included, we recommend the use of the []( utility to automatically select the required model based on your input context length, forecast length, and other requirements. You can also directly access a specific model using our getting started notebook, mentioning the branch name. ## Model Releases There are several models available in different branches of this model card. The branch naming scheme uses the following format: - context length: The historical data used as input to the TTM model. - prediction length: The number of time points predicted by the model (i.e., the forecast length) - frequency tuning indicator (\"ft\" or missing): \"ft\" is used to indicate use of frequency prefix tuning. When enabled, an extra embedding vector indicating the frequency of the data is added to the input of the model. If missing, only the context window is used by the model.
- pretraining metric (\"mae\" or missing): \"mae\" indicates pretraining with mean absolute error loss, while missing indicates using mean squared error. - release number (\"r2\" or \"r2.1\"): Indicates the model release; the release indicates which data was used to train the model. See \"training data\" below for more details on the data included in the particular training datasets. ### Example recipes and notebooks The scripts below can be used for any of the above TTM models. Please update the HF model URL and branch name in the call appropriately to pick the model of your choice. Please note that a few of the notebooks directly use the []( utility to select the model. - Getting started [[Recipe]]( [[colab]]( - Getting started with IBM watsonx [[Recipe]]( - Zeroshot Multivariate Forecasting [[Example]]( - Finetuned Multivariate Forecasting: - Channel-Independent Finetuning [[Example 1]]( [[Example 2]]( - Channel-Mix Finetuning [[Example]]( - TTM r2 release (extended features released in October 2024): - Finetuning and Forecasting with Exogenous/Control Variables [[Recipe 1]]( [[Recipe 2]]( - Finetuning and Forecasting with static categorical features [Example: To be added soon] - Rolling Forecasts - Extend forecast lengths via rolling capability. Rolling beyond 2*forecast_length is not recommended. [[Example]]( - Helper scripts for optimal Learning Rate suggestions for Finetuning [[Example]]( - TTM r2.1 release: - GIFT-Eval benchmark [[notebook]]( ### Usage guidelines 1. Users must externally standard-scale their data independently for every channel before feeding it to the model (refer to []( our data processing utility for data scaling). 2. The current open-source version supports only minutely and hourly resolutions (e.g., 10 min, 15 min, 1 hour). Other lower resolutions (e.g., monthly or yearly) are currently not supported in this version, as the model needs a minimum context length of 512 or 1024. With the r2.1 release, we now also support daily and weekly resolutions. 3. Enabling any upsampling or prepending zeros to virtually increase the context length for shorter-length datasets is not recommended and will impact the model performance. ### Automatic model selection Automatic model selection based on context length, prediction length, and other requirements can be done through use of the get_model function. For reference, the signature of the function is provided below:
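The signature itself did not survive extraction. A hedged usage sketch follows instead (import path per the granite-tsfm toolkit; the keyword arguments shown are assumptions reconstructed from context, not a verified signature):

```python
from tsfm_public.toolkit.get_model import get_model

# Pick the TTM branch matching a desired forecasting setting.
# Argument names are assumed from context; check the toolkit for the
# authoritative signature and extra options (e.g., frequency tuning).
model = get_model(
    model_path='ibm-granite/granite-timeseries-ttm-r2',
    context_length=512,      # historical points fed to the model
    prediction_length=96,    # points to forecast
)
```

## Benchmarks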

TTM outperforms popular models such as TimesFM, Moirai, Chronos, Lag-Llama, Moment, GPT4TS, TimeLLM, and LLMTime on zero/few-shot forecasting benchmarks while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. For more details, refer to our paper. - TTM-B, as referred to in the paper, maps to the 512 context models. - TTM-E, as referred to in the paper, maps to the 1024 context models. - TTM-A, as referred to in the paper, maps to the 1536 context models. The pre-training dataset used in this release differs slightly from the one used in the research paper, which may lead to minor variations in model performance as compared to the published results. Please refer to our paper for more details. Benchmarking scripts can be found here. ## Model Details For more details on TTM architecture and benchmarks, refer to our paper. TTM currently supports two modes: - **Zeroshot forecasting**: Directly apply the pre-trained model on your target data to get an initial forecast (with no training). - **Finetuned forecasting**: Finetune the pre-trained model with a subset of your target data to further improve the forecast. Since TTM models are extremely small and fast, it is practical to finetune the model with your available target data in a few minutes to get more accurate forecasts. The current release supports multivariate forecasting via both channel independence and channel-mixing approaches. Decoder Channel-Mixing can be enabled during fine-tuning for capturing strong channel-correlation patterns across time-series variates, a critical capability lacking in existing counterparts. In addition, TTM also supports exogenous infusion and static categorical data infusion. The r2.1 release builds upon the above, adding improved accuracy for shorter context lengths and daily/weekly resolutions, combined with a larger pre-training dataset. ## Training Data The r2 TTM models were trained on a collection of datasets as follows: - Australian Electricity Demand: - Australian Weather: - Bitcoin: - KDD Cup 2018: - London Smart Meters: - Saugeen River Flow: - Solar Power: - Sunspots: - Solar: - US Births: - Wind Farms Production: - Wind Power: - PEMSD3, PEMSD4, PEMSD7, PEMSD8, PEMS_BAY: - LOS_LOOP: The r2.1 TTM models (denoted by branches with suffix r2.1) were trained on the above collection, in addition to the following datasets: - Weather: - Covid Deaths: - Covid Mobility: - Extended Wikipedia Web Traffic: - NN5: - Temperature Rain: - Vehicle Trips: - Kaggle Web Traffic: - Hierarchical Sales: - Project Tycho: - Subseasonal: - Subseasonal Precipitation: - Uber TLC: - Wiki Rolling: - CDC FluView ILINet: - CDC FluView WHO/NREVSS: ## Citation Please cite the following paper if you intend to use our model or its associated architectures/approaches in your work. **BibTeX:** ## Model Card Authors Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Wesley M. Gifford, Tomoya Sakai, Sumanta Mukherjee, Chandra Reddy and Jayant Kalagnanam ## IBM Public Repository Disclosure All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support.
IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.", + "model_explanation_gemini": "Granite-TimeSeries-TTM-R2 is a compact, pretrained multivariate time-series forecasting model designed for efficient zero-shot and few-shot predictions, excelling in lightweight deployment with minimal data requirements." +} \ No newline at end of file diff --git a/data/model_data_json/ibm-research_MoLFormer-XL-both-10pct.json b/data/model_data_json/ibm-research_MoLFormer-XL-both-10pct.json new file mode 100644 index 0000000000000000000000000000000000000000..d5ae2d0908b58b182e81300f8ab0585bd8a10036 --- /dev/null +++ b/data/model_data_json/ibm-research_MoLFormer-XL-both-10pct.json @@ -0,0 +1,20 @@ +{ + "model_id": "ibm-research/MoLFormer-XL-both-10pct", + "downloads": 355621, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "molformer", + "fill-mask", + "chemistry", + "feature-extraction", + "custom_code", + "arxiv:2106.09553", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: transformers pipeline_tag: feature-extraction tags: - chemistry --- # MoLFormer-XL-both-10% MoLFormer is a class of models pretrained on SMILES string representations of up to 1.1B molecules from ZINC and PubChem. This repository is for the model pretrained on 10% of both datasets. It was introduced in the paper Large-Scale Chemical Language Representations Capture Molecular Structure and Properties by Ross et al. and first released in this repository. ## Model Details ### Model Description MoLFormer is a large-scale chemical language model designed with the intention of learning a model of small molecules, which are represented as SMILES strings. MoLFormer leverages masked language modeling and employs a linear attention Transformer combined with rotary embeddings. (Figure: overview of the MoLFormer pipeline) An overview of the MoLFormer pipeline is seen in the figure above. One can see that the transformer-based neural network model is trained on a large collection of chemical molecules represented by SMILES sequences from two public chemical datasets, PubChem and ZINC, in a self-supervised fashion. The MoLFormer architecture was designed with an efficient linear attention mechanism and relative positional embeddings with the goal of learning a meaningful and compressed representation of chemical molecules. After training, the MoLFormer foundation model was then adapted to different downstream molecular property prediction tasks via fine-tuning on task-specific data. To further test the representative power of MoLFormer, the MoLFormer encodings were used to recover molecular similarity, and an analysis of the correspondence between the interatomic spatial distance and attention value for a given molecule was performed. ## Intended use and limitations You can use the model for masked language modeling, but it is mainly intended to be used as a feature extractor or to be fine-tuned for a prediction task. The \"frozen\" model embeddings may be used for similarity measurements, visualization, or training predictor models. The model may also be fine-tuned for sequence classification tasks (e.g., solubility, toxicity, etc.). This model is not intended for molecule generation. It is also not tested for molecules larger than ~200 atoms (i.e., macromolecules).
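The snippet referenced by the \"Example code\" section below was stripped during scraping. A hedged reconstruction follows: trust_remote_code is required because the architecture ships as custom code (see the custom_code tag), while deterministic_eval is an assumed option carried over from the upstream card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    'ibm-research/MoLFormer-XL-both-10pct',
    deterministic_eval=True,   # assumed kwarg from the upstream card
    trust_remote_code=True,    # required: custom model code
)
tokenizer = AutoTokenizer.from_pretrained(
    'ibm-research/MoLFormer-XL-both-10pct', trust_remote_code=True
)

# Canonical SMILES for caffeine and aspirin.
smiles = ['Cn1c(=O)c2c(ncn2C)n(C)c1=O', 'CC(=O)Oc1ccccc1C(=O)O']
inputs = tokenizer(smiles, padding=True, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# Fixed-size molecule embeddings, usable as 'frozen' features.
print(outputs.pooler_output.shape)
```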
## Training Details ### Data We trained MoLFormer-XL on a combination of molecules from the ZINC15 and PubChem datasets. This repository contains the version trained on 10% ZINC + 10% PubChem. Molecules were canonicalized with RDKit prior to training and isomeric information was removed. Also, molecules longer than 202 tokens were dropped. ### Hardware - 16 x NVIDIA V100 GPUs ## Evaluation We evaluated MoLFormer by fine-tuning on 11 benchmark tasks from MoleculeNet. The tables below show the performance of different MoLFormer variants: | | BBBP | HIV | BACE | SIDER | ClinTox | Tox21 | |-------------------------|----------|----------|----------|----------|----------|----------| | 10% ZINC + 10% PubChem | 91.5 | 81.3 | 86.6 | 68.9 | 94.6 | 84.5 | | 10% ZINC + 100% PubChem | 92.2 | 79.2 | 86.3 | 69.0 | 94.7 | 84.5 | | 100% ZINC | 89.9 | 78.4 | 87.7 | 66.8 | 82.2 | 83.2 | | MoLFormer-Base | 90.9 | 77.7 | 82.8 | 64.8 | 61.3 | 43.1 | | MoLFormer-XL | **93.7** | **82.2** | **88.2** | **69.0** | **94.8** | **84.7** | | | QM9 | QM8 | ESOL | FreeSolv | Lipophilicity | |-------------------------|------------|------------|--------|------------|---------------| | 10% ZINC + 10% PubChem | 1.7754 | 0.0108 | 0.3295 | 0.2221 | 0.5472 | | 10% ZINC + 100% PubChem | 1.9093 | **0.0102** | 0.2775 | **0.2050** | 0.5331 | | 100% ZINC | 1.9403 | 0.0124 | 0.3023 | 0.2981 | 0.5440 | | MoLFormer-Base | 2.2500 | 0.0111 | 0.2798 | 0.2596 | 0.6492 | | MoLFormer-XL | **1.5984** | **0.0102** | 0.2787 | 0.2308 | **0.5298** | We report AUROC for all classification tasks, average MAE for QM9/8, and RMSE for the remaining regression tasks. ## Citation", + "model_explanation_gemini": "A pretrained chemical language model using SMILES strings to extract molecular features or fine-tune for property prediction tasks like solubility and toxicity." +} \ No newline at end of file diff --git a/data/model_data_json/iiiorg_piiranha-v1-detect-personal-information.json b/data/model_data_json/iiiorg_piiranha-v1-detect-personal-information.json new file mode 100644 index 0000000000000000000000000000000000000000..c689208980b155b0cb8c2cf71398bb91335a7402 --- /dev/null +++ b/data/model_data_json/iiiorg_piiranha-v1-detect-personal-information.json @@ -0,0 +1,31 @@ +{ + "model_id": "iiiorg/piiranha-v1-detect-personal-information", + "downloads": 116992, + "tags": [ + "transformers", + "safetensors", + "deberta-v2", + "token-classification", + "generated_from_trainer", + "pii", + "privacy", + "personaldata", + "redaction", + "piidetection", + "en", + "it", + "fr", + "de", + "nl", + "es", + "dataset:ai4privacy/pii-masking-400k", + "base_model:microsoft/mdeberta-v3-base", + "base_model:finetune:microsoft/mdeberta-v3-base", + "license:cc-by-nc-nd-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: cc-by-nc-nd-4.0 base_model: microsoft/mdeberta-v3-base tags: - generated_from_trainer - pii - privacy - personaldata - redaction - piidetection metrics: - precision - recall - f1 - accuracy model-index: - name: piiranha-1 results: [] datasets: - ai4privacy/pii-masking-400k language: - en - it - fr - de - nl - es pipeline_tag: token-classification --- # Piiranha-v1: Protect your personal information!
Piiranha (cc-by-nc-nd-4.0 license) is trained to **detect 17 types** of Personally Identifiable Information (PII) across six languages. It successfully **catches 98.27% of PII** tokens, with an overall classification **accuracy of 99.44%**. Piiranha is especially accurate at detecting passwords, emails (100%), phone numbers, and usernames. Performance on the PII vs. Non PII classification task: - **Precision: 98.48%** (98.48% of tokens classified as PII are actually PII) - **Recall: 98.27%** (correctly identifies 98.27% of PII tokens) - **Specificity: 99.84%** (correctly identifies 99.84% of Non PII tokens) Piiranha was trained on H100 GPUs generously sponsored by the Akash Network. ## Model Description Piiranha is a fine-tuned version of microsoft/mdeberta-v3-base. The context length is 256 DeBERTa tokens; if your text is longer than that, just split it up. Supported languages: English, Spanish, French, German, Italian, Dutch. Supported PII types: Account Number, Building Number, City, Credit Card Number, Date of Birth, Driver's License, Email, First Name, Last Name, ID Card, Password, Social Security Number, Street Address, Tax Number, Phone Number, Username, Zipcode. It achieves the following results on a test set of ~73,000 sentences containing PII: - Accuracy: 99.44% - Loss: 0.0173 - Precision: 93.16% - Recall: 93.08% - F1: 93.12% Note that the above metrics factor in the eighteen possible categories (17 PII and 1 Non PII), so they are lower than the metrics for the binary PII vs. Non PII classification. ## Performance by PII type Reported per-type metrics are lower than the overall accuracy of 99.44% due to class imbalance (most tokens are not PII). However, the model is more useful than the results below suggest, given the intent behind PII detection: it sometimes misclassifies one PII type as another, but it still recognizes the token as PII. For instance, the model often confuses first names with last names, but that is fine because it still flags the name as PII. | Entity | Precision | Recall | F1-Score | Support | |---------------------|-----------|--------|----------|---------| | ACCOUNTNUM | 0.84 | 0.87 | 0.85 | 3575 | | BUILDINGNUM | 0.92 | 0.90 | 0.91 | 3252 | | CITY | 0.95 | 0.97 | 0.96 | 7270 | | CREDITCARDNUMBER | 0.94 | 0.96 | 0.95 | 2308 | | DATEOFBIRTH | 0.93 | 0.85 | 0.89 | 3389 | | DRIVERLICENSENUM | 0.96 | 0.96 | 0.96 | 2244 | | EMAIL | 1.00 | 1.00 | 1.00 | 6892 | | GIVENNAME | 0.87 | 0.93 | 0.90 | 12150 | | IDCARDNUM | 0.89 | 0.94 | 0.91 | 3700 | | PASSWORD | 0.98 | 0.98 | 0.98 | 2387 | | SOCIALNUM | 0.93 | 0.94 | 0.93 | 2709 | | STREET | 0.97 | 0.95 | 0.96 | 3331 | | SURNAME | 0.89 | 0.78 | 0.83 | 8267 | | TAXNUM | 0.97 | 0.89 | 0.93 | 2322 | | TELEPHONENUM | 0.99 | 1.00 | 0.99 | 5039 | | USERNAME | 0.98 | 0.98 | 0.98 | 7680 | | ZIPCODE | 0.94 | 0.97 | 0.95 | 3191 | | **micro avg** | 0.93 | 0.93 | 0.93 | 79706 | | **macro avg** | 0.94 | 0.93 | 0.93 | 79706 | | **weighted avg** | 0.93 | 0.93 | 0.93 | 79706 | ## Intended uses & limitations Piiranha can be used to assist with redacting PII from texts. Use at your own risk. We do not accept responsibility for any incorrect model predictions.
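To ground the redaction use case, here is a minimal sketch (not part of the original card) using the standard transformers token-classification pipeline; entity labels follow the PII types listed above, and the sample text is hypothetical:

```python
# Minimal sketch: flagging PII spans with Piiranha.
# For inputs longer than the 256-token context, split the text into chunks first.
from transformers import pipeline

detector = pipeline(
    "token-classification",
    model="iiiorg/piiranha-v1-detect-personal-information",
    aggregation_strategy="simple",  # merge sub-word tokens into whole entities
)

text = "Hi, I'm Jane Doe. Reach me at jane.doe@example.com or +1 555 0100."
for entity in detector(text):
    print(entity["entity_group"], repr(entity["word"]), round(entity["score"], 3))
```

The start/end offsets in each returned entity can be used to mask the matched spans for redaction.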
## Training and evaluation data ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 5e-05 - train_batch_size: 128 - eval_batch_size: 128 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_ratio: 0.05 - num_epochs: 5 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy | |:-------------:|:------:|:----:|:---------------:|:---------:|:------:|:------:|:--------:| | 0.2984 | 0.0983 | 250 | 0.1005 | 0.5446 | 0.6111 | 0.5759 | 0.9702 | | 0.0568 | 0.1965 | 500 | 0.0464 | 0.7895 | 0.8459 | 0.8167 | 0.9849 | | 0.0441 | 0.2948 | 750 | 0.0400 | 0.8346 | 0.8669 | 0.8504 | 0.9869 | | 0.0368 | 0.3931 | 1000 | 0.0320 | 0.8531 | 0.8784 | 0.8656 | 0.9891 | | 0.0323 | 0.4914 | 1250 | 0.0293 | 0.8779 | 0.8889 | 0.8834 | 0.9903 | | 0.0287 | 0.5896 | 1500 | 0.0269 | 0.8919 | 0.8836 | 0.8877 | 0.9907 | | 0.0282 | 0.6879 | 1750 | 0.0276 | 0.8724 | 0.9012 | 0.8866 | 0.9903 | | 0.0268 | 0.7862 | 2000 | 0.0254 | 0.8890 | 0.9041 | 0.8965 | 0.9914 | | 0.0264 | 0.8844 | 2250 | 0.0236 | 0.8886 | 0.9040 | 0.8962 | 0.9915 | | 0.0243 | 0.9827 | 2500 | 0.0232 | 0.8998 | 0.9033 | 0.9015 | 0.9917 | | 0.0213 | 1.0810 | 2750 | 0.0237 | 0.9115 | 0.9040 | 0.9077 | 0.9923 | | 0.0213 | 1.1792 | 3000 | 0.0222 | 0.9123 | 0.9143 | 0.9133 | 0.9925 | | 0.0217 | 1.2775 | 3250 | 0.0222 | 0.8999 | 0.9169 | 0.9083 | 0.9924 | | 0.0209 | 1.3758 | 3500 | 0.0212 | 0.9111 | 0.9133 | 0.9122 | 0.9928 | | 0.0204 | 1.4741 | 3750 | 0.0206 | 0.9054 | 0.9203 | 0.9128 | 0.9926 | | 0.0183 | 1.5723 | 4000 | 0.0212 | 0.9126 | 0.9160 | 0.9143 | 0.9927 | | 0.0191 | 1.6706 | 4250 | 0.0192 | 0.9122 | 0.9192 | 0.9157 | 0.9929 | | 0.0185 | 1.7689 | 4500 | 0.0195 | 0.9200 | 0.9191 | 0.9196 | 0.9932 | | 0.018 | 1.8671 | 4750 | 0.0188 | 0.9136 | 0.9215 | 0.9176 | 0.9933 | | 0.0183 | 1.9654 | 5000 | 0.0191 | 0.9179 | 0.9212 | 0.9196 | 0.9934 | | 0.0147 | 2.0637 | 5250 | 0.0188 | 0.9246 | 0.9242 | 0.9244 | 0.9937 | | 0.0149 | 2.1619 | 5500 | 0.0184 | 0.9188 | 0.9254 | 0.9221 | 0.9937 | | 0.0143 | 2.2602 | 5750 | 0.0193 | 0.9187 | 0.9224 | 0.9205 | 0.9932 | | 0.014 | 2.3585 | 6000 | 0.0190 | 0.9246 | 0.9280 | 0.9263 | 0.9936 | | 0.0146 | 2.4568 | 6250 | 0.0190 | 0.9225 | 0.9277 | 0.9251 | 0.9936 | | 0.0148 | 2.5550 | 6500 | 0.0175 | 0.9297 | 0.9306 | 0.9301 | 0.9942 | | 0.0136 | 2.6533 | 6750 | 0.0172 | 0.9191 | 0.9329 | 0.9259 | 0.9938 | | 0.0137 | 2.7516 | 7000 | 0.0166 | 0.9299 | 0.9312 | 0.9306 | 0.9942 | | 0.014 | 2.8498 | 7250 | 0.0167 | 0.9285 | 0.9313 | 0.9299 | 0.9942 | | 0.0128 | 2.9481 | 7500 | 0.0166 | 0.9271 | 0.9326 | 0.9298 | 0.9943 | | 0.0113 | 3.0464 | 7750 | 0.0171 | 0.9286 | 0.9347 | 0.9316 | 0.9946 | | 0.0103 | 3.1447 | 8000 | 0.0172 | 0.9284 | 0.9383 | 0.9334 | 0.9945 | | 0.0104 | 3.2429 | 8250 | 0.0169 | 0.9312 | 0.9406 | 0.9359 | 0.9947 | | 0.0094 | 3.3412 | 8500 | 0.0166 | 0.9368 | 0.9359 | 0.9364 | 0.9948 | | 0.01 | 3.4395 | 8750 | 0.0166 | 0.9289 | 0.9387 | 0.9337 | 0.9944 | | 0.0099 | 3.5377 | 9000 | 0.0162 | 0.9335 | 0.9332 | 0.9334 | 0.9947 | | 0.0099 | 3.6360 | 9250 | 0.0160 | 0.9321 | 0.9380 | 0.9350 | 0.9947 | | 0.01 | 3.7343 | 9500 | 0.0168 | 0.9306 | 0.9389 | 0.9347 | 0.9947 | | 0.0101 | 3.8325 | 9750 | 0.0159 | 0.9339 | 0.9350 | 0.9344 | 0.9947 | ### Contact william (at) integrinet [dot] org ### Framework versions - Transformers 4.44.2 - Pytorch 2.4.1+cu121 - Datasets 3.0.0 - Tokenizers 0.19.1", + 
"model_explanation_gemini": "Detects and classifies 17 types of personally identifiable information (PII) across six languages with high accuracy for privacy protection and redaction tasks." +} \ No newline at end of file diff --git a/data/model_data_json/ilsilfverskiold_classify-news-category-iptc.json b/data/model_data_json/ilsilfverskiold_classify-news-category-iptc.json new file mode 100644 index 0000000000000000000000000000000000000000..7802542aee66708cb9ea406b4feb47125e757704 --- /dev/null +++ b/data/model_data_json/ilsilfverskiold_classify-news-category-iptc.json @@ -0,0 +1,19 @@ +{ + "model_id": "ilsilfverskiold/classify-news-category-iptc", + "downloads": 236664, + "tags": [ + "transformers", + "tensorboard", + "safetensors", + "bert", + "text-classification", + "generated_from_trainer", + "base_model:KB/bert-base-swedish-cased", + "base_model:finetune:KB/bert-base-swedish-cased", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: KB/bert-base-swedish-cased tags: - generated_from_trainer metrics: - accuracy - f1 - precision - recall model-index: - name: news_category_classification results: [] --- # News Category Classification for IPTC NewsCodes This model is a fine-tuned version of KB/bert-base-swedish-cased on a private dataset. Built from a limited set of English, Swedish and Norwegian titles to classify news content within 16 categories as specified by the IPTC NewsCodes. The model has been fine-tuned on a dataset that is greatly skewed, but has been slightly augmented to stabilize it. ## Model description The model is intended to categorize Norwegian, Swedish and English news content within the specified 16 categories but is a test model for demonstration purposes. It needs more data within several categories to provide 100% value but it will outperform Claude Haiku and GPT-3.5 on this use case. ## Intended uses & limitations Use it to categorize news texts. Only set the category if the value is at least 60% for the label, otherwise the model is uncertain. # Test examples **Input:** Mann siktet for drapsforsøk på Slovakias statsministeren **Output:** politics **Input:** Tre døde i kioskbrann i Tyskland **Output:** disaster, accident, and emergency incident **Input:** Kultfilm får Netflix-oppfølger. Kultfilmen «Happy Gilmore» fra 1996 får en oppfølger på Netflix. Det røper strømmetjenesten selv på X, tidligere Twitter. –Happy Gilmore er tilbake! **Output:** arts, culture, entertainment and media # Performance It achieves the following results on the evaluation set: - Loss: 0.8030 - Accuracy: 0.7431 - F1: 0.7474 - Precision: 0.7695 - Recall: 0.7431 See the performance (accuracy) for each label below: - Arts, culture, entertainment and media: 0.6842 - Conflict, war and peace: 0.7351 - Crime, law and justice: 0.8918 - Disaster, accident, and emergency incident: 0.8699 - Economy, business, and finance: 0.6893 - Environment: 0.4483 - Health: 0.7222 - Human interest: 0.3182 - Labour: 0.5 - Lifestyle and leisure: 0.5556 - Politics: 0.7909 - Science and technology: 0.4583 - Society: 0.3538 - Sport: 0.9615 - Weather: 1.0 - Religion: 0.0 ## Training and evaluation data Trained with the trainer, setting a learning rate of 2e-05 and batch size of 16 for 3 epochs. 
## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 16 - eval_batch_size: 16 - seed: 42 - gradient_accumulation_steps: 2 - total_train_batch_size: 32 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 500 - num_epochs: 3 ### Training results | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | Accuracy Label Arts, culture, entertainment and media | Accuracy Label Conflict, war and peace | Accuracy Label Crime, law and justice | Accuracy Label Disaster, accident, and emergency incident | Accuracy Label Economy, business, and finance | Accuracy Label Environment | Accuracy Label Health | Accuracy Label Human interest | Accuracy Label Labour | Accuracy Label Lifestyle and leisure | Accuracy Label Politics | Accuracy Label Religion | Accuracy Label Science and technology | Accuracy Label Society | Accuracy Label Sport | Accuracy Label Weather | |:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|:-----------------------------------------------------:|:--------------------------------------:|:-------------------------------------:|:---------------------------------------------------------:|:---------------------------------------------:|:--------------------------:|:---------------------:|:-----------------------------:|:---------------------:|:------------------------------------:|:-----------------------:|:-----------------------:|:-------------------------------------:|:----------------------:|:--------------------:|:----------------------:| | 1.9761 | 0.2907 | 200 | 1.4046 | 0.6462 | 0.6164 | 0.6057 | 0.6462 | 0.3158 | 0.8315 | 0.7629 | 0.7055 | 0.5437 | 0.0 | 0.5 | 0.0 | 0.0 | 0.3333 | 0.4843 | 0.0 | 0.0833 | 0.0 | 0.9615 | 0.0 | | 1.2153 | 0.5814 | 400 | 1.0225 | 0.6894 | 0.6868 | 0.7652 | 0.6894 | 0.7895 | 0.6554 | 0.8196 | 0.8562 | 0.6408 | 0.2414 | 0.8333 | 0.1364 | 0.0 | 0.6667 | 0.8467 | 0.0 | 0.375 | 0.0154 | 0.9615 | 1.0 | | 0.954 | 0.8721 | 600 | 0.8858 | 0.7231 | 0.7138 | 0.7309 | 0.7231 | 0.7368 | 0.7795 | 0.8918 | 0.8699 | 0.6214 | 0.3448 | 0.8889 | 0.1818 | 1.0 | 0.5556 | 0.6899 | 0.0 | 0.25 | 0.0462 | 0.9615 | 1.0 | | 0.6662 | 1.1628 | 800 | 0.9381 | 0.6881 | 0.7009 | 0.7618 | 0.6881 | 0.7895 | 0.6126 | 0.8454 | 0.8630 | 0.6505 | 0.4483 | 0.7222 | 0.2273 | 1.0 | 0.4444 | 0.8293 | 0.0 | 0.5417 | 0.2308 | 0.9615 | 1.0 | | 0.5554 | 1.4535 | 1000 | 0.8791 | 0.7025 | 0.7124 | 0.7628 | 0.7025 | 0.7368 | 0.6478 | 0.9021 | 0.8562 | 0.6602 | 0.3103 | 0.7778 | 0.3636 | 0.5 | 0.5556 | 0.8084 | 0.0 | 0.5 | 0.1846 | 0.9615 | 1.0 | | 0.4396 | 1.7442 | 1200 | 0.8275 | 0.7175 | 0.7280 | 0.7686 | 0.7175 | 0.7895 | 0.6631 | 0.8196 | 0.8836 | 0.6893 | 0.3793 | 0.8333 | 0.4091 | 0.5 | 0.5556 | 0.8362 | 0.0 | 0.4167 | 0.3692 | 0.9615 | 1.0 | | 0.383 | 2.0349 | 1400 | 0.7929 | 0.745 | 0.7501 | 0.7653 | 0.745 | 0.6842 | 0.7841 | 0.8866 | 0.8767 | 0.7087 | 0.4483 | 0.7778 | 0.4091 | 0.5 | 0.5556 | 0.6899 | 0.0 | 0.4167 | 0.2923 | 0.9615 | 0.0 | | 0.3418 | 2.3256 | 1600 | 0.8042 | 0.7438 | 0.7440 | 0.7686 | 0.7438 | 0.7895 | 0.7351 | 0.9072 | 0.8493 | 0.7864 | 0.4483 | 0.7778 | 0.3182 | 0.5 | 0.5556 | 0.7909 | 0.0 | 0.4167 | 0.1846 | 0.9615 | 0.0 | | 0.248 | 2.6163 | 1800 | 0.8387 | 0.7275 | 0.7325 | 0.7610 | 0.7275 | 0.6842 | 0.6891 | 0.8814 | 0.8699 | 0.7573 | 0.4138 | 0.8333 | 0.4091 | 0.5 | 0.5556 | 0.8014 | 0.0 | 0.4167 | 0.2769 | 0.9615 | 0.0 | | 0.2525 | 
2.9070 | 2000 | 0.8137 | 0.735 | 0.7413 | 0.7697 | 0.735 | 0.6842 | 0.7106 | 0.8763 | 0.8699 | 0.6796 | 0.4483 | 0.7222 | 0.3636 | 0.5 | 0.5556 | 0.8153 | 0.0 | 0.4583 | 0.3385 | 0.9615 | 0.0 | ### Framework versions - Transformers 4.40.2 - Pytorch 2.2.1+cu121 - Datasets 2.19.1 - Tokenizers 0.19.1", + "model_explanation_gemini": "Classifies news content in Norwegian, Swedish, and English into 16 IPTC NewsCodes categories based on titles, with a confidence threshold of 60%." +} \ No newline at end of file diff --git a/data/model_data_json/immich-app_ViT-B-32__openai.json b/data/model_data_json/immich-app_ViT-B-32__openai.json new file mode 100644 index 0000000000000000000000000000000000000000..d3a7e987d5381163bc20d0cc03bc06fc28b93973 --- /dev/null +++ b/data/model_data_json/immich-app_ViT-B-32__openai.json @@ -0,0 +1,14 @@ +{ + "model_id": "immich-app/ViT-B-32__openai", + "downloads": 122888, + "tags": [ + "transformers", + "onnx", + "immich", + "clip", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - immich - clip --- # Model Description This repo contains ONNX exports for the associated CLIP model by OpenCLIP. See the OpenCLIP repo for more info. This repo is specifically intended for use with Immich, a self-hosted photo library.", + "model_explanation_gemini": "Exports CLIP model to ONNX format for use with Immich's self-hosted photo library." +} \ No newline at end of file diff --git a/data/model_data_json/imvladikon_wav2vec2-xls-r-300m-hebrew.json b/data/model_data_json/imvladikon_wav2vec2-xls-r-300m-hebrew.json new file mode 100644 index 0000000000000000000000000000000000000000..3d9fad1f9dd48d14c3735d7c0636dc53e19bbdc7 --- /dev/null +++ b/data/model_data_json/imvladikon_wav2vec2-xls-r-300m-hebrew.json @@ -0,0 +1,22 @@ +{ + "model_id": "imvladikon/wav2vec2-xls-r-300m-hebrew", + "downloads": 1272025, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "generated_from_trainer", + "he", + "hf-asr-leaderboard", + "robust-speech-event", + "base_model:facebook/wav2vec2-xls-r-300m", + "base_model:finetune:facebook/wav2vec2-xls-r-300m", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - he tags: - automatic-speech-recognition - generated_from_trainer - he - hf-asr-leaderboard - robust-speech-event base_model: facebook/wav2vec2-xls-r-300m model-index: - name: wav2vec2-xls-r-300m-hebrew results: - task: type: automatic-speech-recognition name: Automatic Speech Recognition dataset: name: Custom Dataset type: custom args: he metrics: - type: wer value: 23.18 name: Test WER --- # wav2vec2-xls-r-300m-hebrew This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, trained on private datasets in two stages: it was first fine-tuned on a small dataset of good samples; the obtained model was then fine-tuned on a large dataset (together with the small, good dataset) containing various samples from different sources, plus an unlabeled dataset that was weakly labeled using the previously trained model.
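Before the dataset details, a minimal inference sketch (not part of the original card; it assumes 16 kHz mono input, the standard Wav2Vec2 CTC API, and a hypothetical file name):

```python
# Minimal sketch: transcribing a Hebrew clip with this checkpoint.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "imvladikon/wav2vec2-xls-r-300m-hebrew"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

waveform, sample_rate = torchaudio.load("hebrew_sample.wav")  # hypothetical file
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```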
Small dataset: | split |size(gb) | n_samples | duration(hrs)| | |---|---|---|---|---| |train|4.19| 20306 | 28 | | |dev |1.05| 5076 | 7 | | Large dataset: | split |size(gb) | n_samples | duration(hrs)| | |---|---|---|---|---| |train|12.3| 90777 | 69 | | |dev |2.39| 20246 | 14* | | (*weakly labeled data wasn't used in the validation set) After the first training it achieves: on the small dataset - Loss: 0.5438 - WER: 0.1773; on the large dataset - WER: 0.3811. After the second training: on the small dataset - WER: 0.1697; on the large dataset - Loss: 0.4502 - WER: 0.2318. ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters #### First training The following hyperparameters were used during training: - learning_rate: 0.0003 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - distributed_type: multi-GPU - num_devices: 2 - gradient_accumulation_steps: 4 - total_train_batch_size: 64 - total_eval_batch_size: 16 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 1000 - num_epochs: 100.0 - mixed_precision_training: Native AMP Training results | Training Loss | Epoch | Step | Validation Loss | Wer | |:-------------:|:-----:|:-----:|:---------------:|:------:| | No log | 3.15 | 1000 | 0.5203 | 0.4333 | | 1.4284 | 6.31 | 2000 | 0.4816 | 0.3951 | | 1.4284 | 9.46 | 3000 | 0.4315 | 0.3546 | | 1.283 | 12.62 | 4000 | 0.4278 | 0.3404 | | 1.283 | 15.77 | 5000 | 0.4090 | 0.3054 | | 1.1777 | 18.93 | 6000 | 0.3893 | 0.3006 | | 1.1777 | 22.08 | 7000 | 0.3968 | 0.2857 | | 1.0994 | 25.24 | 8000 | 0.3892 | 0.2751 | | 1.0994 | 28.39 | 9000 | 0.4061 | 0.2690 | | 1.0323 | 31.54 | 10000 | 0.4114 | 0.2507 | | 1.0323 | 34.7 | 11000 | 0.4021 | 0.2508 | | 0.9623 | 37.85 | 12000 | 0.4032 | 0.2378 | | 0.9623 | 41.01 | 13000 | 0.4148 | 0.2374 | | 0.9077 | 44.16 | 14000 | 0.4350 | 0.2323 | | 0.9077 | 47.32 | 15000 | 0.4515 | 0.2246 | | 0.8573 | 50.47 | 16000 | 0.4474 | 0.2180 | | 0.8573 | 53.63 | 17000 | 0.4649 | 0.2171 | | 0.8083 | 56.78 | 18000 | 0.4455 | 0.2102 | | 0.8083 | 59.94 | 19000 | 0.4587 | 0.2092 | | 0.769 | 63.09 | 20000 | 0.4794 | 0.2012 | | 0.769 | 66.25 | 21000 | 0.4845 | 0.2007 | | 0.7308 | 69.4 | 22000 | 0.4937 | 0.2008 | | 0.7308 | 72.55 | 23000 | 0.4920 | 0.1895 | | 0.6927 | 75.71 | 24000 | 0.5179 | 0.1911 | | 0.6927 | 78.86 | 25000 | 0.5202 | 0.1877 | | 0.6622 | 82.02 | 26000 | 0.5266 | 0.1840 | | 0.6622 | 85.17 | 27000 | 0.5351 | 0.1854 | | 0.6315 | 88.33 | 28000 | 0.5373 | 0.1811 | | 0.6315 | 91.48 | 29000 | 0.5331 | 0.1792 | | 0.6075 | 94.64 | 30000 | 0.5390 | 0.1779 | | 0.6075 | 97.79 | 31000 | 0.5459 | 0.1773 | #### Second training The following hyperparameters were used during training: - learning_rate: 0.0003 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - distributed_type: multi-GPU - num_devices: 2 - gradient_accumulation_steps: 4 - total_train_batch_size: 64 - total_eval_batch_size: 16 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 1000 - num_epochs: 60.0 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | Wer | |:-------------:|:-----:|:-----:|:---------------:|:------:| | No log | 0.7 | 1000 | 0.5371 | 0.3811 | | 1.3606 | 1.41 | 2000 | 0.5247 | 0.3902 | | 1.3606 | 2.12 | 3000 | 0.5126 | 0.3859 | | 1.3671 | 2.82 | 4000 | 0.5062 | 0.3828 | | 1.3671 | 3.53 | 5000 | 0.4979 |
0.3672 | | 1.3421 | 4.23 | 6000 | 0.4906 | 0.3816 | | 1.3421 | 4.94 | 7000 | 0.4784 | 0.3651 | | 1.328 | 5.64 | 8000 | 0.4810 | 0.3669 | | 1.328 | 6.35 | 9000 | 0.4747 | 0.3597 | | 1.3109 | 7.05 | 10000 | 0.4813 | 0.3808 | | 1.3109 | 7.76 | 11000 | 0.4631 | 0.3561 | | 1.2873 | 8.46 | 12000 | 0.4603 | 0.3431 | | 1.2873 | 9.17 | 13000 | 0.4579 | 0.3533 | | 1.2661 | 9.87 | 14000 | 0.4471 | 0.3365 | | 1.2661 | 10.58 | 15000 | 0.4584 | 0.3437 | | 1.249 | 11.28 | 16000 | 0.4461 | 0.3454 | | 1.249 | 11.99 | 17000 | 0.4482 | 0.3367 | | 1.2322 | 12.69 | 18000 | 0.4464 | 0.3335 | | 1.2322 | 13.4 | 19000 | 0.4427 | 0.3454 | | 1.22 | 14.1 | 20000 | 0.4440 | 0.3395 | | 1.22 | 14.81 | 21000 | 0.4459 | 0.3378 | | 1.2044 | 15.51 | 22000 | 0.4406 | 0.3199 | | 1.2044 | 16.22 | 23000 | 0.4398 | 0.3155 | | 1.1913 | 16.92 | 24000 | 0.4237 | 0.3150 | | 1.1913 | 17.63 | 25000 | 0.4287 | 0.3279 | | 1.1705 | 18.34 | 26000 | 0.4253 | 0.3103 | | 1.1705 | 19.04 | 27000 | 0.4234 | 0.3098 | | 1.1564 | 19.75 | 28000 | 0.4174 | 0.3076 | | 1.1564 | 20.45 | 29000 | 0.4260 | 0.3160 | | 1.1461 | 21.16 | 30000 | 0.4235 | 0.3036 | | 1.1461 | 21.86 | 31000 | 0.4309 | 0.3055 | | 1.1285 | 22.57 | 32000 | 0.4264 | 0.3006 | | 1.1285 | 23.27 | 33000 | 0.4201 | 0.2880 | | 1.1135 | 23.98 | 34000 | 0.4131 | 0.2975 | | 1.1135 | 24.68 | 35000 | 0.4202 | 0.2849 | | 1.0968 | 25.39 | 36000 | 0.4105 | 0.2888 | | 1.0968 | 26.09 | 37000 | 0.4210 | 0.2834 | | 1.087 | 26.8 | 38000 | 0.4123 | 0.2843 | | 1.087 | 27.5 | 39000 | 0.4216 | 0.2803 | | 1.0707 | 28.21 | 40000 | 0.4161 | 0.2787 | | 1.0707 | 28.91 | 41000 | 0.4186 | 0.2740 | | 1.0575 | 29.62 | 42000 | 0.4118 | 0.2845 | | 1.0575 | 30.32 | 43000 | 0.4243 | 0.2773 | | 1.0474 | 31.03 | 44000 | 0.4221 | 0.2707 | | 1.0474 | 31.73 | 45000 | 0.4138 | 0.2700 | | 1.0333 | 32.44 | 46000 | 0.4102 | 0.2638 | | 1.0333 | 33.15 | 47000 | 0.4162 | 0.2650 | | 1.0191 | 33.85 | 48000 | 0.4155 | 0.2636 | | 1.0191 | 34.56 | 49000 | 0.4129 | 0.2656 | | 1.0087 | 35.26 | 50000 | 0.4157 | 0.2632 | | 1.0087 | 35.97 | 51000 | 0.4090 | 0.2654 | | 0.9901 | 36.67 | 52000 | 0.4183 | 0.2587 | | 0.9901 | 37.38 | 53000 | 0.4251 | 0.2648 | | 0.9795 | 38.08 | 54000 | 0.4229 | 0.2555 | | 0.9795 | 38.79 | 55000 | 0.4176 | 0.2546 | | 0.9644 | 39.49 | 56000 | 0.4223 | 0.2513 | | 0.9644 | 40.2 | 57000 | 0.4244 | 0.2530 | | 0.9534 | 40.9 | 58000 | 0.4175 | 0.2538 | | 0.9534 | 41.61 | 59000 | 0.4213 | 0.2505 | | 0.9397 | 42.31 | 60000 | 0.4275 | 0.2565 | | 0.9397 | 43.02 | 61000 | 0.4315 | 0.2528 | | 0.9269 | 43.72 | 62000 | 0.4316 | 0.2501 | | 0.9269 | 44.43 | 63000 | 0.4247 | 0.2471 | | 0.9175 | 45.13 | 64000 | 0.4376 | 0.2469 | | 0.9175 | 45.84 | 65000 | 0.4335 | 0.2450 | | 0.9026 | 46.54 | 66000 | 0.4336 | 0.2452 | | 0.9026 | 47.25 | 67000 | 0.4400 | 0.2427 | | 0.8929 | 47.95 | 68000 | 0.4382 | 0.2429 | | 0.8929 | 48.66 | 69000 | 0.4361 | 0.2415 | | 0.8786 | 49.37 | 70000 | 0.4413 | 0.2398 | | 0.8786 | 50.07 | 71000 | 0.4392 | 0.2415 | | 0.8714 | 50.78 | 72000 | 0.4345 | 0.2406 | | 0.8714 | 51.48 | 73000 | 0.4475 | 0.2402 | | 0.8589 | 52.19 | 74000 | 0.4473 | 0.2374 | | 0.8589 | 52.89 | 75000 | 0.4457 | 0.2357 | | 0.8493 | 53.6 | 76000 | 0.4462 | 0.2366 | | 0.8493 | 54.3 | 77000 | 0.4494 | 0.2356 | | 0.8395 | 55.01 | 78000 | 0.4472 | 0.2352 | | 0.8395 | 55.71 | 79000 | 0.4490 | 0.2339 | | 0.8295 | 56.42 | 80000 | 0.4489 | 0.2318 | | 0.8295 | 57.12 | 81000 | 0.4469 | 0.2320 | | 0.8225 | 57.83 | 82000 | 0.4478 | 0.2321 | | 0.8225 | 58.53 | 83000 | 0.4525 | 0.2326 | | 0.816 | 59.24 | 84000 | 0.4532 | 0.2316 | | 0.816 | 59.94 | 
85000 | 0.4502 | 0.2318 | ### Framework versions - Transformers 4.17.0.dev0 - Pytorch 1.10.2+cu102 - Datasets 1.18.2.dev0 - Tokenizers 0.11.0", + "model_explanation_gemini": "A fine-tuned Hebrew automatic speech recognition model based on Facebook's wav2vec2-xls-r-300m, achieving a 23.18% word error rate on test data." +} \ No newline at end of file diff --git a/data/model_data_json/indobenchmark_indobert-base-p1.json b/data/model_data_json/indobenchmark_indobert-base-p1.json new file mode 100644 index 0000000000000000000000000000000000000000..c8de633dbfaef75d2745600fe1eac4c1ce2eb470 --- /dev/null +++ b/data/model_data_json/indobenchmark_indobert-base-p1.json @@ -0,0 +1,22 @@ +{ + "model_id": "indobenchmark/indobert-base-p1", + "downloads": 254744, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "feature-extraction", + "indobert", + "indobenchmark", + "indonlu", + "id", + "dataset:Indo4B", + "arxiv:2009.05387", + "license:mit", + "region:us" + ], + "description": "--- language: id tags: - indobert - indobenchmark - indonlu license: mit inference: false datasets: - Indo4B --- # IndoBERT Base Model (phase1 - uncased) IndoBERT is a state-of-the-art language model for Indonesian based on the BERT model. The pretrained model is trained using a masked language modeling (MLM) objective and a next sentence prediction (NSP) objective. ## All Pre-trained Models | Model | #params | Arch. | Training data | |--------------------------------|--------------------------------|-------|-----------------------------------| | | 124.5M | Base | Indo4B (23.43 GB of text) | | | 124.5M | Base | Indo4B (23.43 GB of text) | | | 335.2M | Large | Indo4B (23.43 GB of text) | | | 335.2M | Large | Indo4B (23.43 GB of text) | | | 11.7M | Base | Indo4B (23.43 GB of text) | | | 11.7M | Base | Indo4B (23.43 GB of text) | | | 17.7M | Large | Indo4B (23.43 GB of text) | | | 17.7M | Large | Indo4B (23.43 GB of text) | ## How to use ### Load model and tokenizer ### Extract contextual representation
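The card's snippets for these two steps were not preserved in this export; here is a minimal sketch, assuming the standard transformers Auto classes and a hypothetical Indonesian sentence:

```python
# Minimal sketch: load IndoBERT and extract contextual representations.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
model = AutoModel.from_pretrained("indobenchmark/indobert-base-p1")

inputs = tokenizer("aku suka makan nasi goreng", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token; pool them (e.g., mean) for a sentence embedding.
print(outputs.last_hidden_state.shape)
```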
## Authors IndoBERT was trained and evaluated by Bryan Wilie\\*, Karissa Vincentio\\*, Genta Indra Winata\\*, Samuel Cahyawijaya\\*, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, Ayu Purwarianti. ## Citation If you use our work, please cite:", + "model_explanation_gemini": "Pretrained Indonesian BERT model for masked language modeling and next sentence prediction tasks, trained on the Indo4B dataset." +} \ No newline at end of file diff --git a/data/model_data_json/indonesian-nlp_wav2vec2-large-xlsr-indonesian.json b/data/model_data_json/indonesian-nlp_wav2vec2-large-xlsr-indonesian.json new file mode 100644 index 0000000000000000000000000000000000000000..2c8b5f52771d3cc49e1087ff4b9d75705fa688ce --- /dev/null +++ b/data/model_data_json/indonesian-nlp_wav2vec2-large-xlsr-indonesian.json @@ -0,0 +1,22 @@ +{ + "model_id": "indonesian-nlp/wav2vec2-large-xlsr-indonesian", + "downloads": 100443, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "id", + "dataset:common_voice", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: id datasets: - common_voice metrics: - wer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Indonesian by Indonesian NLP results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice id type: common_voice args: id metrics: - name: Test WER type: wer value: 14.29 --- # Wav2Vec2-Large-XLSR-Indonesian This is the model for Wav2Vec2-Large-XLSR-Indonesian, a facebook/wav2vec2-large-xlsr-53 model fine-tuned on the Indonesian Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. ## Usage The model can be used directly (without a language model) as follows (the Wav2Vec2ForCTC pattern in the Hebrew entry above applies here as well): ## Evaluation The model can be evaluated as follows on the Indonesian test data of Common Voice. **Test Result**: 14.29 % ## Training The Common Voice and synthetic voice datasets were used for training. The script used for training can be found here", + "model_explanation_gemini": "Fine-tuned for Indonesian speech recognition, this model converts 16kHz audio input into text, achieving a 14.29% word error rate on the Common Voice dataset." +} \ No newline at end of file diff --git a/data/model_data_json/infly_OpenCoder-8B-Instruct.json b/data/model_data_json/infly_OpenCoder-8B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..7895e97a3665cacdbd5557a5bc540f6b988b6d4e --- /dev/null +++ b/data/model_data_json/infly_OpenCoder-8B-Instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "infly/OpenCoder-8B-Instruct", + "downloads": 606740, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "conversational", + "en", + "zh", + "dataset:OpenCoder-LLM/opencoder-sft-stage1", + "dataset:OpenCoder-LLM/opencoder-sft-stage2", + "arxiv:2411.04905", + "base_model:infly/OpenCoder-8B-Base", + "base_model:finetune:infly/OpenCoder-8B-Base", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: inf license_link: language: - en - zh base_model: - infly/OpenCoder-8B-Base pipeline_tag: text-generation library_name: transformers datasets: - OpenCoder-LLM/opencoder-sft-stage1 - OpenCoder-LLM/opencoder-sft-stage2 ---
\"OpenCoder-Icon\"

## 1. Introduction **OpenCoder** is an open and reproducible code LLM family that includes 1.5B and 8B base and chat models, supporting both English and Chinese. Starting from scratch, OpenCoder is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised fine-tuned on over 4.5M high-quality SFT examples, ultimately reaching the performance of top-tier code LLMs. We provide not only model weights and inference code, but also the reproducible training data, the complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols. Empowering researchers to build and innovate, OpenCoder is your open foundation for advancing code AI. - **Complete Open Source**: OpenCoder ensures full transparency by releasing not only the model weights and forthcoming inference code but also the complete data-cleaning code for training. This release includes high-quality synthetic data, an extensive set of checkpoints, and a dataset of over 4.5 million supervised fine-tuning (SFT) entries, making OpenCoder one of the most comprehensively open-sourced models available. - **Comprehensive Experimental Analysis**: OpenCoder is rigorously tested through extensive ablation studies on various data-cleaning strategies and training processes, including file-level and repository-level deduplication experiments, ensuring thorough exploration and validation of the model's performance. - **High-Quality Synthetic Data**: OpenCoder provides a fully developed synthetic data generation process and over 4.5 million SFT data entries, establishing a robust data foundation for model training and evaluation. - **Exceptional Performance**: OpenCoder achieves high performance across multiple language model benchmarks, positioning it among the leading open-source models for code. ## 2. Models | Model | Sequence Length | Download | |:---------------------:|:---------------:|:-----------------------------------------------------------------------:| | OpenCoder-1.5B-Base | 4K | 🤗 HuggingFace | | OpenCoder-8B-Base | 8K | 🤗 HuggingFace | | OpenCoder-1.5B-Instruct | 4K | 🤗 HuggingFace | | OpenCoder-8B-Instruct | 8K | 🤗 HuggingFace | ## 3. Datasets ### Pre-training | Dataset | Size | Download | |:---------------------:|:---------------:|:-----------------------------------------------------------------------:| | fineweb-code-corpus | 148 GB | 🤗 HuggingFace | | fineweb-math-corpus | 10 GB | 🤗 HuggingFace | ### Post-training | Dataset | Num | Download | |:---------------------:|:---------------:|:-----------------------------------------------------------------------:| | opencoder-sft-stage1 | 4.21 M | 🤗 HuggingFace | | opencoder-sft-stage2 | 375 K | 🤗 HuggingFace | **This is not the end; we are organizing the remaining data and uploading it progressively.** ## 4. Benchmarks **Note:** For the detailed evaluation results, please refer to our paper. | model | OpenCoder-1.5B-Instruct | OpenCoder-8B-Instruct | |:---------------:|:-------------:|:------------:| | HumanEval(+) | 72.5 (67.7) | 83.5 (78.7) | | MBPP(+) | 72.7 (61.9) | 79.1 (69.0) | | BigCodeBench | 33.3 | 40.3 | | BigCodeBench-Hard | 11.5 | 16.9 | | LiveCodeBench | 12.8 | 23.2 | | MultiPL-E (AVG) | 57.5 | 71.0 | ## 5. Inference ### Inference with Hugging Face's Transformers (the original snippet was not preserved; see the sketch after the License section below) ## 6. License The OpenCoder series (including Base and Chat) supports commercial applications under a permissive license.
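Restoring the gist of the empty §5 above: a minimal chat-style generation sketch, assuming the standard transformers chat-template API (the prompt is hypothetical):

```python
# Minimal sketch: instruction-following code generation with OpenCoder-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "infly/OpenCoder-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```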
Citation", + "model_explanation_gemini": "An 8B-parameter instruction-tuned code generation model supporting English and Chinese, trained on extensive code and SFT data for high-performance programming tasks." +} \ No newline at end of file diff --git a/data/model_data_json/intfloat_e5-base-v2.json b/data/model_data_json/intfloat_e5-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..22ec85cd92a34ba88b02ca85ebf53e5214f123f3 --- /dev/null +++ b/data/model_data_json/intfloat_e5-base-v2.json @@ -0,0 +1,27 @@ +{ + "model_id": "intfloat/e5-base-v2", + "downloads": 468108, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "bert", + "mteb", + "Sentence Transformers", + "sentence-similarity", + "en", + "arxiv:2212.03533", + "arxiv:2104.08663", + "arxiv:2210.07316", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - Sentence Transformers - sentence-similarity - sentence-transformers model-index: - name: e5-base-v2 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 77.77611940298506 - type: ap value: 42.052710266606056 - type: f1 value: 72.12040628266567 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.81012500000001 - type: ap value: 89.4213700757244 - type: f1 value: 92.8039091197065 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.711999999999996 - type: f1 value: 46.11544975436018 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 23.186 - type: map_at_10 value: 36.632999999999996 - type: map_at_100 value: 37.842 - type: map_at_1000 value: 37.865 - type: map_at_3 value: 32.278 - type: map_at_5 value: 34.760999999999996 - type: mrr_at_1 value: 23.400000000000002 - type: mrr_at_10 value: 36.721 - type: mrr_at_100 value: 37.937 - type: mrr_at_1000 value: 37.96 - type: mrr_at_3 value: 32.302 - type: mrr_at_5 value: 34.894 - type: ndcg_at_1 value: 23.186 - type: ndcg_at_10 value: 44.49 - type: ndcg_at_100 value: 50.065000000000005 - type: ndcg_at_1000 value: 50.629999999999995 - type: ndcg_at_3 value: 35.461 - type: ndcg_at_5 value: 39.969 - type: precision_at_1 value: 23.186 - type: precision_at_10 value: 6.97 - type: precision_at_100 value: 0.951 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 14.912 - type: precision_at_5 value: 11.152 - type: recall_at_1 value: 23.186 - type: recall_at_10 value: 69.70100000000001 - type: recall_at_100 value: 95.092 - type: recall_at_1000 value: 99.431 - type: recall_at_3 value: 44.737 - type: recall_at_5 value: 55.761 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 46.10312401440185 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S 
config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 39.67275326095384 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 58.97793816337376 - type: mrr value: 72.76832431957087 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 83.11646947018187 - type: cos_sim_spearman value: 81.40064994975234 - type: euclidean_pearson value: 82.37355689019232 - type: euclidean_spearman value: 81.6777646977348 - type: manhattan_pearson value: 82.61101422716945 - type: manhattan_spearman value: 81.80427360442245 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 83.52922077922076 - type: f1 value: 83.45298679360866 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.495115019668496 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 32.724792944166765 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.361000000000004 - type: map_at_10 value: 43.765 - type: map_at_100 value: 45.224 - type: map_at_1000 value: 45.35 - type: map_at_3 value: 40.353 - type: map_at_5 value: 42.195 - type: mrr_at_1 value: 40.629 - type: mrr_at_10 value: 50.458000000000006 - type: mrr_at_100 value: 51.06699999999999 - type: mrr_at_1000 value: 51.12 - type: mrr_at_3 value: 47.902 - type: mrr_at_5 value: 49.447 - type: ndcg_at_1 value: 40.629 - type: ndcg_at_10 value: 50.376 - type: ndcg_at_100 value: 55.065 - type: ndcg_at_1000 value: 57.196000000000005 - type: ndcg_at_3 value: 45.616 - type: ndcg_at_5 value: 47.646 - type: precision_at_1 value: 40.629 - type: precision_at_10 value: 9.785 - type: precision_at_100 value: 1.562 - type: precision_at_1000 value: 0.2 - type: precision_at_3 value: 22.031 - type: precision_at_5 value: 15.737000000000002 - type: recall_at_1 value: 32.361000000000004 - type: recall_at_10 value: 62.214000000000006 - type: recall_at_100 value: 81.464 - type: recall_at_1000 value: 95.905 - type: recall_at_3 value: 47.5 - type: recall_at_5 value: 53.69500000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.971 - type: map_at_10 value: 37.444 - type: map_at_100 value: 38.607 - type: map_at_1000 value: 38.737 - type: map_at_3 value: 34.504000000000005 - type: map_at_5 value: 36.234 - type: mrr_at_1 value: 35.35 - type: mrr_at_10 value: 43.441 - type: mrr_at_100 value: 44.147999999999996 - type: mrr_at_1000 value: 44.196000000000005 - type: mrr_at_3 value: 41.285 - type: mrr_at_5 value: 42.552 - type: ndcg_at_1 value: 35.35 - type: ndcg_at_10 value: 42.903999999999996 - type: ndcg_at_100 value: 47.406 - type: ndcg_at_1000 value: 49.588 - 
type: ndcg_at_3 value: 38.778 - type: ndcg_at_5 value: 40.788000000000004 - type: precision_at_1 value: 35.35 - type: precision_at_10 value: 8.083 - type: precision_at_100 value: 1.313 - type: precision_at_1000 value: 0.18 - type: precision_at_3 value: 18.769 - type: precision_at_5 value: 13.439 - type: recall_at_1 value: 27.971 - type: recall_at_10 value: 52.492000000000004 - type: recall_at_100 value: 71.642 - type: recall_at_1000 value: 85.488 - type: recall_at_3 value: 40.1 - type: recall_at_5 value: 45.800000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 39.898 - type: map_at_10 value: 51.819 - type: map_at_100 value: 52.886 - type: map_at_1000 value: 52.941 - type: map_at_3 value: 48.619 - type: map_at_5 value: 50.493 - type: mrr_at_1 value: 45.391999999999996 - type: mrr_at_10 value: 55.230000000000004 - type: mrr_at_100 value: 55.887 - type: mrr_at_1000 value: 55.916 - type: mrr_at_3 value: 52.717000000000006 - type: mrr_at_5 value: 54.222 - type: ndcg_at_1 value: 45.391999999999996 - type: ndcg_at_10 value: 57.586999999999996 - type: ndcg_at_100 value: 61.745000000000005 - type: ndcg_at_1000 value: 62.83800000000001 - type: ndcg_at_3 value: 52.207 - type: ndcg_at_5 value: 54.925999999999995 - type: precision_at_1 value: 45.391999999999996 - type: precision_at_10 value: 9.21 - type: precision_at_100 value: 1.226 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 23.177 - type: precision_at_5 value: 16.038 - type: recall_at_1 value: 39.898 - type: recall_at_10 value: 71.18900000000001 - type: recall_at_100 value: 89.082 - type: recall_at_1000 value: 96.865 - type: recall_at_3 value: 56.907 - type: recall_at_5 value: 63.397999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.706 - type: map_at_10 value: 30.818 - type: map_at_100 value: 32.038 - type: map_at_1000 value: 32.123000000000005 - type: map_at_3 value: 28.077 - type: map_at_5 value: 29.709999999999997 - type: mrr_at_1 value: 24.407 - type: mrr_at_10 value: 32.555 - type: mrr_at_100 value: 33.692 - type: mrr_at_1000 value: 33.751 - type: mrr_at_3 value: 29.848999999999997 - type: mrr_at_5 value: 31.509999999999998 - type: ndcg_at_1 value: 24.407 - type: ndcg_at_10 value: 35.624 - type: ndcg_at_100 value: 41.454 - type: ndcg_at_1000 value: 43.556 - type: ndcg_at_3 value: 30.217 - type: ndcg_at_5 value: 33.111000000000004 - type: precision_at_1 value: 24.407 - type: precision_at_10 value: 5.548 - type: precision_at_100 value: 0.8869999999999999 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 12.731 - type: precision_at_5 value: 9.22 - type: recall_at_1 value: 22.706 - type: recall_at_10 value: 48.772 - type: recall_at_100 value: 75.053 - type: recall_at_1000 value: 90.731 - type: recall_at_3 value: 34.421 - type: recall_at_5 value: 41.427 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 13.424 - type: map_at_10 value: 21.09 - type: map_at_100 value: 22.264999999999997 - type: map_at_1000 value: 22.402 - type: map_at_3 value: 18.312 - type: map_at_5 value: 19.874 - type: mrr_at_1 value: 16.915 - type: mrr_at_10 value: 25.258000000000003 - type: mrr_at_100 value: 26.228 - type: mrr_at_1000 value: 
26.31 - type: mrr_at_3 value: 22.492 - type: mrr_at_5 value: 24.04 - type: ndcg_at_1 value: 16.915 - type: ndcg_at_10 value: 26.266000000000002 - type: ndcg_at_100 value: 32.08 - type: ndcg_at_1000 value: 35.086 - type: ndcg_at_3 value: 21.049 - type: ndcg_at_5 value: 23.508000000000003 - type: precision_at_1 value: 16.915 - type: precision_at_10 value: 5.1 - type: precision_at_100 value: 0.9329999999999999 - type: precision_at_1000 value: 0.131 - type: precision_at_3 value: 10.282 - type: precision_at_5 value: 7.836 - type: recall_at_1 value: 13.424 - type: recall_at_10 value: 38.179 - type: recall_at_100 value: 63.906 - type: recall_at_1000 value: 84.933 - type: recall_at_3 value: 23.878 - type: recall_at_5 value: 30.037999999999997 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.154 - type: map_at_10 value: 35.912 - type: map_at_100 value: 37.211 - type: map_at_1000 value: 37.327 - type: map_at_3 value: 32.684999999999995 - type: map_at_5 value: 34.562 - type: mrr_at_1 value: 32.435 - type: mrr_at_10 value: 41.411 - type: mrr_at_100 value: 42.297000000000004 - type: mrr_at_1000 value: 42.345 - type: mrr_at_3 value: 38.771 - type: mrr_at_5 value: 40.33 - type: ndcg_at_1 value: 32.435 - type: ndcg_at_10 value: 41.785 - type: ndcg_at_100 value: 47.469 - type: ndcg_at_1000 value: 49.685 - type: ndcg_at_3 value: 36.618 - type: ndcg_at_5 value: 39.101 - type: precision_at_1 value: 32.435 - type: precision_at_10 value: 7.642 - type: precision_at_100 value: 1.244 - type: precision_at_1000 value: 0.163 - type: precision_at_3 value: 17.485 - type: precision_at_5 value: 12.57 - type: recall_at_1 value: 26.154 - type: recall_at_10 value: 54.111 - type: recall_at_100 value: 78.348 - type: recall_at_1000 value: 92.996 - type: recall_at_3 value: 39.189 - type: recall_at_5 value: 45.852 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.308999999999997 - type: map_at_10 value: 35.524 - type: map_at_100 value: 36.774 - type: map_at_1000 value: 36.891 - type: map_at_3 value: 32.561 - type: map_at_5 value: 34.034 - type: mrr_at_1 value: 31.735000000000003 - type: mrr_at_10 value: 40.391 - type: mrr_at_100 value: 41.227000000000004 - type: mrr_at_1000 value: 41.288000000000004 - type: mrr_at_3 value: 37.938 - type: mrr_at_5 value: 39.193 - type: ndcg_at_1 value: 31.735000000000003 - type: ndcg_at_10 value: 41.166000000000004 - type: ndcg_at_100 value: 46.702 - type: ndcg_at_1000 value: 49.157000000000004 - type: ndcg_at_3 value: 36.274 - type: ndcg_at_5 value: 38.177 - type: precision_at_1 value: 31.735000000000003 - type: precision_at_10 value: 7.5569999999999995 - type: precision_at_100 value: 1.2109999999999999 - type: precision_at_1000 value: 0.16 - type: precision_at_3 value: 17.199 - type: precision_at_5 value: 12.123000000000001 - type: recall_at_1 value: 26.308999999999997 - type: recall_at_10 value: 53.083000000000006 - type: recall_at_100 value: 76.922 - type: recall_at_1000 value: 93.767 - type: recall_at_3 value: 39.262 - type: recall_at_5 value: 44.413000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.391250000000003 - type: map_at_10 value: 33.280166666666666 - type: map_at_100 value: 34.49566666666667 - type: 
map_at_1000 value: 34.61533333333333 - type: map_at_3 value: 30.52183333333333 - type: map_at_5 value: 32.06608333333333 - type: mrr_at_1 value: 29.105083333333337 - type: mrr_at_10 value: 37.44766666666666 - type: mrr_at_100 value: 38.32491666666667 - type: mrr_at_1000 value: 38.385666666666665 - type: mrr_at_3 value: 35.06883333333333 - type: mrr_at_5 value: 36.42066666666667 - type: ndcg_at_1 value: 29.105083333333337 - type: ndcg_at_10 value: 38.54358333333333 - type: ndcg_at_100 value: 43.833583333333344 - type: ndcg_at_1000 value: 46.215333333333334 - type: ndcg_at_3 value: 33.876 - type: ndcg_at_5 value: 36.05208333333333 - type: precision_at_1 value: 29.105083333333337 - type: precision_at_10 value: 6.823416666666665 - type: precision_at_100 value: 1.1270833333333334 - type: precision_at_1000 value: 0.15208333333333332 - type: precision_at_3 value: 15.696750000000002 - type: precision_at_5 value: 11.193499999999998 - type: recall_at_1 value: 24.391250000000003 - type: recall_at_10 value: 49.98808333333333 - type: recall_at_100 value: 73.31616666666666 - type: recall_at_1000 value: 89.96291666666667 - type: recall_at_3 value: 36.86666666666667 - type: recall_at_5 value: 42.54350000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.995 - type: map_at_10 value: 28.807 - type: map_at_100 value: 29.813000000000002 - type: map_at_1000 value: 29.903000000000002 - type: map_at_3 value: 26.636 - type: map_at_5 value: 27.912 - type: mrr_at_1 value: 24.847 - type: mrr_at_10 value: 31.494 - type: mrr_at_100 value: 32.381 - type: mrr_at_1000 value: 32.446999999999996 - type: mrr_at_3 value: 29.473 - type: mrr_at_5 value: 30.7 - type: ndcg_at_1 value: 24.847 - type: ndcg_at_10 value: 32.818999999999996 - type: ndcg_at_100 value: 37.835 - type: ndcg_at_1000 value: 40.226 - type: ndcg_at_3 value: 28.811999999999998 - type: ndcg_at_5 value: 30.875999999999998 - type: precision_at_1 value: 24.847 - type: precision_at_10 value: 5.244999999999999 - type: precision_at_100 value: 0.856 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 12.577 - type: precision_at_5 value: 8.895999999999999 - type: recall_at_1 value: 21.995 - type: recall_at_10 value: 42.479 - type: recall_at_100 value: 65.337 - type: recall_at_1000 value: 83.23700000000001 - type: recall_at_3 value: 31.573 - type: recall_at_5 value: 36.684 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.751000000000001 - type: map_at_10 value: 21.909 - type: map_at_100 value: 23.064 - type: map_at_1000 value: 23.205000000000002 - type: map_at_3 value: 20.138 - type: map_at_5 value: 20.973 - type: mrr_at_1 value: 19.305 - type: mrr_at_10 value: 25.647 - type: mrr_at_100 value: 26.659 - type: mrr_at_1000 value: 26.748 - type: mrr_at_3 value: 23.933 - type: mrr_at_5 value: 24.754 - type: ndcg_at_1 value: 19.305 - type: ndcg_at_10 value: 25.886 - type: ndcg_at_100 value: 31.56 - type: ndcg_at_1000 value: 34.799 - type: ndcg_at_3 value: 22.708000000000002 - type: ndcg_at_5 value: 23.838 - type: precision_at_1 value: 19.305 - type: precision_at_10 value: 4.677 - type: precision_at_100 value: 0.895 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 10.771 - type: precision_at_5 value: 7.46 - type: recall_at_1 value: 15.751000000000001 - type: recall_at_10 value: 
34.156 - type: recall_at_100 value: 59.899 - type: recall_at_1000 value: 83.08 - type: recall_at_3 value: 24.772 - type: recall_at_5 value: 28.009 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.34 - type: map_at_10 value: 32.383 - type: map_at_100 value: 33.629999999999995 - type: map_at_1000 value: 33.735 - type: map_at_3 value: 29.68 - type: map_at_5 value: 31.270999999999997 - type: mrr_at_1 value: 27.612 - type: mrr_at_10 value: 36.381 - type: mrr_at_100 value: 37.351 - type: mrr_at_1000 value: 37.411 - type: mrr_at_3 value: 33.893 - type: mrr_at_5 value: 35.353 - type: ndcg_at_1 value: 27.612 - type: ndcg_at_10 value: 37.714999999999996 - type: ndcg_at_100 value: 43.525000000000006 - type: ndcg_at_1000 value: 45.812999999999995 - type: ndcg_at_3 value: 32.796 - type: ndcg_at_5 value: 35.243 - type: precision_at_1 value: 27.612 - type: precision_at_10 value: 6.465 - type: precision_at_100 value: 1.0619999999999998 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 15.049999999999999 - type: precision_at_5 value: 10.764999999999999 - type: recall_at_1 value: 23.34 - type: recall_at_10 value: 49.856 - type: recall_at_100 value: 75.334 - type: recall_at_1000 value: 91.156 - type: recall_at_3 value: 36.497 - type: recall_at_5 value: 42.769 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.097 - type: map_at_10 value: 34.599999999999994 - type: map_at_100 value: 36.174 - type: map_at_1000 value: 36.398 - type: map_at_3 value: 31.781 - type: map_at_5 value: 33.22 - type: mrr_at_1 value: 31.225 - type: mrr_at_10 value: 39.873 - type: mrr_at_100 value: 40.853 - type: mrr_at_1000 value: 40.904 - type: mrr_at_3 value: 37.681 - type: mrr_at_5 value: 38.669 - type: ndcg_at_1 value: 31.225 - type: ndcg_at_10 value: 40.586 - type: ndcg_at_100 value: 46.226 - type: ndcg_at_1000 value: 48.788 - type: ndcg_at_3 value: 36.258 - type: ndcg_at_5 value: 37.848 - type: precision_at_1 value: 31.225 - type: precision_at_10 value: 7.707999999999999 - type: precision_at_100 value: 1.536 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 17.26 - type: precision_at_5 value: 12.253 - type: recall_at_1 value: 25.097 - type: recall_at_10 value: 51.602000000000004 - type: recall_at_100 value: 76.854 - type: recall_at_1000 value: 93.303 - type: recall_at_3 value: 38.68 - type: recall_at_5 value: 43.258 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.689 - type: map_at_10 value: 25.291000000000004 - type: map_at_100 value: 26.262 - type: map_at_1000 value: 26.372 - type: map_at_3 value: 22.916 - type: map_at_5 value: 24.315 - type: mrr_at_1 value: 19.409000000000002 - type: mrr_at_10 value: 27.233 - type: mrr_at_100 value: 28.109 - type: mrr_at_1000 value: 28.192 - type: mrr_at_3 value: 24.892 - type: mrr_at_5 value: 26.278000000000002 - type: ndcg_at_1 value: 19.409000000000002 - type: ndcg_at_10 value: 29.809 - type: ndcg_at_100 value: 34.936 - type: ndcg_at_1000 value: 37.852000000000004 - type: ndcg_at_3 value: 25.179000000000002 - type: ndcg_at_5 value: 27.563 - type: precision_at_1 value: 19.409000000000002 - type: precision_at_10 value: 4.861 - type: precision_at_100 value: 0.8 - type: 
precision_at_1000 value: 0.116 - type: precision_at_3 value: 11.029 - type: precision_at_5 value: 7.985 - type: recall_at_1 value: 17.689 - type: recall_at_10 value: 41.724 - type: recall_at_100 value: 65.95299999999999 - type: recall_at_1000 value: 88.094 - type: recall_at_3 value: 29.621 - type: recall_at_5 value: 35.179 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.581 - type: map_at_10 value: 18.944 - type: map_at_100 value: 20.812 - type: map_at_1000 value: 21.002000000000002 - type: map_at_3 value: 15.661 - type: map_at_5 value: 17.502000000000002 - type: mrr_at_1 value: 23.388 - type: mrr_at_10 value: 34.263 - type: mrr_at_100 value: 35.364000000000004 - type: mrr_at_1000 value: 35.409 - type: mrr_at_3 value: 30.586000000000002 - type: mrr_at_5 value: 32.928000000000004 - type: ndcg_at_1 value: 23.388 - type: ndcg_at_10 value: 26.56 - type: ndcg_at_100 value: 34.248 - type: ndcg_at_1000 value: 37.779 - type: ndcg_at_3 value: 21.179000000000002 - type: ndcg_at_5 value: 23.504 - type: precision_at_1 value: 23.388 - type: precision_at_10 value: 8.476 - type: precision_at_100 value: 1.672 - type: precision_at_1000 value: 0.233 - type: precision_at_3 value: 15.852 - type: precision_at_5 value: 12.73 - type: recall_at_1 value: 10.581 - type: recall_at_10 value: 32.512 - type: recall_at_100 value: 59.313 - type: recall_at_1000 value: 79.25 - type: recall_at_3 value: 19.912 - type: recall_at_5 value: 25.832 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.35 - type: map_at_10 value: 20.134 - type: map_at_100 value: 28.975 - type: map_at_1000 value: 30.709999999999997 - type: map_at_3 value: 14.513000000000002 - type: map_at_5 value: 16.671 - type: mrr_at_1 value: 69.75 - type: mrr_at_10 value: 77.67699999999999 - type: mrr_at_100 value: 77.97500000000001 - type: mrr_at_1000 value: 77.985 - type: mrr_at_3 value: 76.292 - type: mrr_at_5 value: 77.179 - type: ndcg_at_1 value: 56.49999999999999 - type: ndcg_at_10 value: 42.226 - type: ndcg_at_100 value: 47.562 - type: ndcg_at_1000 value: 54.923 - type: ndcg_at_3 value: 46.564 - type: ndcg_at_5 value: 43.830000000000005 - type: precision_at_1 value: 69.75 - type: precision_at_10 value: 33.525 - type: precision_at_100 value: 11.035 - type: precision_at_1000 value: 2.206 - type: precision_at_3 value: 49.75 - type: precision_at_5 value: 42 - type: recall_at_1 value: 9.35 - type: recall_at_10 value: 25.793 - type: recall_at_100 value: 54.186 - type: recall_at_1000 value: 77.81 - type: recall_at_3 value: 15.770000000000001 - type: recall_at_5 value: 19.09 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.945 - type: f1 value: 42.07407842992542 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 71.04599999999999 - type: map_at_10 value: 80.718 - type: map_at_100 value: 80.961 - type: map_at_1000 value: 80.974 - type: map_at_3 value: 79.49199999999999 - type: map_at_5 value: 80.32000000000001 - type: mrr_at_1 value: 76.388 - type: mrr_at_10 value: 85.214 - type: mrr_at_100 value: 85.302 - type: mrr_at_1000 value: 85.302 - type: mrr_at_3 value: 84.373 - type: mrr_at_5 value: 84.979 - type: ndcg_at_1 value: 
76.388 - type: ndcg_at_10 value: 84.987 - type: ndcg_at_100 value: 85.835 - type: ndcg_at_1000 value: 86.04899999999999 - type: ndcg_at_3 value: 83.04 - type: ndcg_at_5 value: 84.22500000000001 - type: precision_at_1 value: 76.388 - type: precision_at_10 value: 10.35 - type: precision_at_100 value: 1.099 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 32.108 - type: precision_at_5 value: 20.033 - type: recall_at_1 value: 71.04599999999999 - type: recall_at_10 value: 93.547 - type: recall_at_100 value: 96.887 - type: recall_at_1000 value: 98.158 - type: recall_at_3 value: 88.346 - type: recall_at_5 value: 91.321 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 19.8 - type: map_at_10 value: 31.979999999999997 - type: map_at_100 value: 33.876 - type: map_at_1000 value: 34.056999999999995 - type: map_at_3 value: 28.067999999999998 - type: map_at_5 value: 30.066 - type: mrr_at_1 value: 38.735 - type: mrr_at_10 value: 47.749 - type: mrr_at_100 value: 48.605 - type: mrr_at_1000 value: 48.644999999999996 - type: mrr_at_3 value: 45.165 - type: mrr_at_5 value: 46.646 - type: ndcg_at_1 value: 38.735 - type: ndcg_at_10 value: 39.883 - type: ndcg_at_100 value: 46.983000000000004 - type: ndcg_at_1000 value: 50.043000000000006 - type: ndcg_at_3 value: 35.943000000000005 - type: ndcg_at_5 value: 37.119 - type: precision_at_1 value: 38.735 - type: precision_at_10 value: 10.940999999999999 - type: precision_at_100 value: 1.836 - type: precision_at_1000 value: 0.23900000000000002 - type: precision_at_3 value: 23.817 - type: precision_at_5 value: 17.346 - type: recall_at_1 value: 19.8 - type: recall_at_10 value: 47.082 - type: recall_at_100 value: 73.247 - type: recall_at_1000 value: 91.633 - type: recall_at_3 value: 33.201 - type: recall_at_5 value: 38.81 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 38.102999999999994 - type: map_at_10 value: 60.547 - type: map_at_100 value: 61.466 - type: map_at_1000 value: 61.526 - type: map_at_3 value: 56.973 - type: map_at_5 value: 59.244 - type: mrr_at_1 value: 76.205 - type: mrr_at_10 value: 82.816 - type: mrr_at_100 value: 83.002 - type: mrr_at_1000 value: 83.009 - type: mrr_at_3 value: 81.747 - type: mrr_at_5 value: 82.467 - type: ndcg_at_1 value: 76.205 - type: ndcg_at_10 value: 69.15 - type: ndcg_at_100 value: 72.297 - type: ndcg_at_1000 value: 73.443 - type: ndcg_at_3 value: 64.07000000000001 - type: ndcg_at_5 value: 66.96600000000001 - type: precision_at_1 value: 76.205 - type: precision_at_10 value: 14.601 - type: precision_at_100 value: 1.7049999999999998 - type: precision_at_1000 value: 0.186 - type: precision_at_3 value: 41.202 - type: precision_at_5 value: 27.006000000000004 - type: recall_at_1 value: 38.102999999999994 - type: recall_at_10 value: 73.005 - type: recall_at_100 value: 85.253 - type: recall_at_1000 value: 92.795 - type: recall_at_3 value: 61.803 - type: recall_at_5 value: 67.515 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 86.15 - type: ap value: 80.36282825265391 - type: f1 value: 86.07368510726472 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.6 - type: map_at_10 value: 34.887 - 
type: map_at_100 value: 36.069 - type: map_at_1000 value: 36.115 - type: map_at_3 value: 31.067 - type: map_at_5 value: 33.300000000000004 - type: mrr_at_1 value: 23.238 - type: mrr_at_10 value: 35.47 - type: mrr_at_100 value: 36.599 - type: mrr_at_1000 value: 36.64 - type: mrr_at_3 value: 31.735999999999997 - type: mrr_at_5 value: 33.939 - type: ndcg_at_1 value: 23.252 - type: ndcg_at_10 value: 41.765 - type: ndcg_at_100 value: 47.402 - type: ndcg_at_1000 value: 48.562 - type: ndcg_at_3 value: 34.016999999999996 - type: ndcg_at_5 value: 38.016 - type: precision_at_1 value: 23.252 - type: precision_at_10 value: 6.569 - type: precision_at_100 value: 0.938 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.479000000000001 - type: precision_at_5 value: 10.722 - type: recall_at_1 value: 22.6 - type: recall_at_10 value: 62.919000000000004 - type: recall_at_100 value: 88.82 - type: recall_at_1000 value: 97.71600000000001 - type: recall_at_3 value: 41.896 - type: recall_at_5 value: 51.537 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.69357045143639 - type: f1 value: 93.55489858177597 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 75.31235750114 - type: f1 value: 57.891491963121155 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.04303967720243 - type: f1 value: 70.51516022297616 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.65299260255549 - type: f1 value: 77.49059766538576 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.458906115906597 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 28.9851513122443 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.2916268497217 - type: mrr value: 32.328276715593816 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.3740000000000006 - type: map_at_10 value: 13.089999999999998 - type: map_at_100 value: 16.512 - type: map_at_1000 value: 18.014 - type: map_at_3 value: 9.671000000000001 - type: map_at_5 value: 11.199 - type: mrr_at_1 value: 46.749 - type: mrr_at_10 value: 55.367 - type: mrr_at_100 value: 56.021 - type: mrr_at_1000 value: 56.058 - type: mrr_at_3 value: 53.30200000000001 - type: mrr_at_5 value: 54.773 - type: ndcg_at_1 value: 45.046 - type: ndcg_at_10 value: 35.388999999999996 - type: ndcg_at_100 value: 32.175 - type: ndcg_at_1000 value: 41.018 - type: ndcg_at_3 value: 40.244 - type: ndcg_at_5 
value: 38.267 - type: precision_at_1 value: 46.749 - type: precision_at_10 value: 26.563 - type: precision_at_100 value: 8.074 - type: precision_at_1000 value: 2.099 - type: precision_at_3 value: 37.358000000000004 - type: precision_at_5 value: 33.003 - type: recall_at_1 value: 6.3740000000000006 - type: recall_at_10 value: 16.805999999999997 - type: recall_at_100 value: 31.871 - type: recall_at_1000 value: 64.098 - type: recall_at_3 value: 10.383000000000001 - type: recall_at_5 value: 13.166 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 34.847 - type: map_at_10 value: 50.532 - type: map_at_100 value: 51.504000000000005 - type: map_at_1000 value: 51.528 - type: map_at_3 value: 46.219 - type: map_at_5 value: 48.868 - type: mrr_at_1 value: 39.137 - type: mrr_at_10 value: 53.157 - type: mrr_at_100 value: 53.839999999999996 - type: mrr_at_1000 value: 53.857 - type: mrr_at_3 value: 49.667 - type: mrr_at_5 value: 51.847 - type: ndcg_at_1 value: 39.108 - type: ndcg_at_10 value: 58.221000000000004 - type: ndcg_at_100 value: 62.021 - type: ndcg_at_1000 value: 62.57 - type: ndcg_at_3 value: 50.27199999999999 - type: ndcg_at_5 value: 54.623999999999995 - type: precision_at_1 value: 39.108 - type: precision_at_10 value: 9.397 - type: precision_at_100 value: 1.1520000000000001 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 22.644000000000002 - type: precision_at_5 value: 16.141 - type: recall_at_1 value: 34.847 - type: recall_at_10 value: 78.945 - type: recall_at_100 value: 94.793 - type: recall_at_1000 value: 98.904 - type: recall_at_3 value: 58.56 - type: recall_at_5 value: 68.535 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 68.728 - type: map_at_10 value: 82.537 - type: map_at_100 value: 83.218 - type: map_at_1000 value: 83.238 - type: map_at_3 value: 79.586 - type: map_at_5 value: 81.416 - type: mrr_at_1 value: 79.17999999999999 - type: mrr_at_10 value: 85.79299999999999 - type: mrr_at_100 value: 85.937 - type: mrr_at_1000 value: 85.938 - type: mrr_at_3 value: 84.748 - type: mrr_at_5 value: 85.431 - type: ndcg_at_1 value: 79.17 - type: ndcg_at_10 value: 86.555 - type: ndcg_at_100 value: 88.005 - type: ndcg_at_1000 value: 88.146 - type: ndcg_at_3 value: 83.557 - type: ndcg_at_5 value: 85.152 - type: precision_at_1 value: 79.17 - type: precision_at_10 value: 13.163 - type: precision_at_100 value: 1.52 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 36.53 - type: precision_at_5 value: 24.046 - type: recall_at_1 value: 68.728 - type: recall_at_10 value: 94.217 - type: recall_at_100 value: 99.295 - type: recall_at_1000 value: 99.964 - type: recall_at_3 value: 85.646 - type: recall_at_5 value: 90.113 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.15680266226348 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.4318549229047 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.353 - type: map_at_10 value: 10.956000000000001 - type: map_at_100 value: 12.873999999999999 - 
type: map_at_1000 value: 13.177 - type: map_at_3 value: 7.854 - type: map_at_5 value: 9.327 - type: mrr_at_1 value: 21.4 - type: mrr_at_10 value: 31.948999999999998 - type: mrr_at_100 value: 33.039 - type: mrr_at_1000 value: 33.106 - type: mrr_at_3 value: 28.449999999999996 - type: mrr_at_5 value: 30.535 - type: ndcg_at_1 value: 21.4 - type: ndcg_at_10 value: 18.694 - type: ndcg_at_100 value: 26.275 - type: ndcg_at_1000 value: 31.836 - type: ndcg_at_3 value: 17.559 - type: ndcg_at_5 value: 15.372 - type: precision_at_1 value: 21.4 - type: precision_at_10 value: 9.790000000000001 - type: precision_at_100 value: 2.0709999999999997 - type: precision_at_1000 value: 0.34099999999999997 - type: precision_at_3 value: 16.467000000000002 - type: precision_at_5 value: 13.54 - type: recall_at_1 value: 4.353 - type: recall_at_10 value: 19.892000000000003 - type: recall_at_100 value: 42.067 - type: recall_at_1000 value: 69.268 - type: recall_at_3 value: 10.042 - type: recall_at_5 value: 13.741999999999999 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.75433886279843 - type: cos_sim_spearman value: 78.29727771767095 - type: euclidean_pearson value: 80.83057828506621 - type: euclidean_spearman value: 78.35203149750356 - type: manhattan_pearson value: 80.7403553891142 - type: manhattan_spearman value: 78.33670488531051 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 84.59999465280839 - type: cos_sim_spearman value: 75.79279003980383 - type: euclidean_pearson value: 82.29895375956758 - type: euclidean_spearman value: 77.33856514102094 - type: manhattan_pearson value: 82.22694214534756 - type: manhattan_spearman value: 77.3028993008695 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 83.09296929691297 - type: cos_sim_spearman value: 83.58056936846941 - type: euclidean_pearson value: 83.84067483060005 - type: euclidean_spearman value: 84.45155680480985 - type: manhattan_pearson value: 83.82353052971942 - type: manhattan_spearman value: 84.43030567861112 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.74616852320915 - type: cos_sim_spearman value: 79.948683747966 - type: euclidean_pearson value: 81.55702283757084 - type: euclidean_spearman value: 80.1721505114231 - type: manhattan_pearson value: 81.52251518619441 - type: manhattan_spearman value: 80.1469800135577 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 87.97170104226318 - type: cos_sim_spearman value: 88.82021731518206 - type: euclidean_pearson value: 87.92950547187615 - type: euclidean_spearman value: 88.67043634645866 - type: manhattan_pearson value: 87.90668112827639 - type: manhattan_spearman value: 88.64471082785317 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.02790375770599 - type: cos_sim_spearman value: 84.46308496590792 - type: 
euclidean_pearson value: 84.29430000414911 - type: euclidean_spearman value: 84.77298303589936 - type: manhattan_pearson value: 84.23919291368665 - type: manhattan_spearman value: 84.75272234871308 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.62885108477064 - type: cos_sim_spearman value: 87.58456196391622 - type: euclidean_pearson value: 88.2602775281007 - type: euclidean_spearman value: 87.51556278299846 - type: manhattan_pearson value: 88.11224053672842 - type: manhattan_spearman value: 87.4336094383095 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.98187965128411 - type: cos_sim_spearman value: 64.0653163219731 - type: euclidean_pearson value: 62.30616725924099 - type: euclidean_spearman value: 61.556971332295916 - type: manhattan_pearson value: 62.07642330128549 - type: manhattan_spearman value: 61.155494129828 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 85.6089703921826 - type: cos_sim_spearman value: 86.52303197250791 - type: euclidean_pearson value: 85.95801955963246 - type: euclidean_spearman value: 86.25242424112962 - type: manhattan_pearson value: 85.88829100470312 - type: manhattan_spearman value: 86.18742955805165 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 83.02282098487036 - type: mrr value: 95.05126409538174 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 55.928 - type: map_at_10 value: 67.308 - type: map_at_100 value: 67.89500000000001 - type: map_at_1000 value: 67.91199999999999 - type: map_at_3 value: 65.091 - type: map_at_5 value: 66.412 - type: mrr_at_1 value: 58.667 - type: mrr_at_10 value: 68.401 - type: mrr_at_100 value: 68.804 - type: mrr_at_1000 value: 68.819 - type: mrr_at_3 value: 66.72200000000001 - type: mrr_at_5 value: 67.72200000000001 - type: ndcg_at_1 value: 58.667 - type: ndcg_at_10 value: 71.944 - type: ndcg_at_100 value: 74.464 - type: ndcg_at_1000 value: 74.82799999999999 - type: ndcg_at_3 value: 68.257 - type: ndcg_at_5 value: 70.10300000000001 - type: precision_at_1 value: 58.667 - type: precision_at_10 value: 9.533 - type: precision_at_100 value: 1.09 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 27.222 - type: precision_at_5 value: 17.533 - type: recall_at_1 value: 55.928 - type: recall_at_10 value: 84.65 - type: recall_at_100 value: 96.267 - type: recall_at_1000 value: 99 - type: recall_at_3 value: 74.656 - type: recall_at_5 value: 79.489 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.79009900990098 - type: cos_sim_ap value: 94.5795129511524 - type: cos_sim_f1 value: 89.34673366834171 - type: cos_sim_precision value: 89.79797979797979 - type: cos_sim_recall value: 88.9 - type: dot_accuracy value: 
99.53465346534654 - type: dot_ap value: 81.56492504352725 - type: dot_f1 value: 76.33816908454227 - type: dot_precision value: 76.37637637637637 - type: dot_recall value: 76.3 - type: euclidean_accuracy value: 99.78514851485149 - type: euclidean_ap value: 94.59134620408962 - type: euclidean_f1 value: 88.96484375 - type: euclidean_precision value: 86.92748091603053 - type: euclidean_recall value: 91.10000000000001 - type: manhattan_accuracy value: 99.78415841584159 - type: manhattan_ap value: 94.5190197328845 - type: manhattan_f1 value: 88.84462151394423 - type: manhattan_precision value: 88.4920634920635 - type: manhattan_recall value: 89.2 - type: max_accuracy value: 99.79009900990098 - type: max_ap value: 94.59134620408962 - type: max_f1 value: 89.34673366834171 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.1487505617497 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 32.502518166001856 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 50.33775480236701 - type: mrr value: 51.17302223919871 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.561111309808208 - type: cos_sim_spearman value: 30.2839254379273 - type: dot_pearson value: 29.560242291401973 - type: dot_spearman value: 30.51527274679116 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.215 - type: map_at_10 value: 1.752 - type: map_at_100 value: 9.258 - type: map_at_1000 value: 23.438 - type: map_at_3 value: 0.6 - type: map_at_5 value: 0.968 - type: mrr_at_1 value: 84 - type: mrr_at_10 value: 91.333 - type: mrr_at_100 value: 91.333 - type: mrr_at_1000 value: 91.333 - type: mrr_at_3 value: 91.333 - type: mrr_at_5 value: 91.333 - type: ndcg_at_1 value: 75 - type: ndcg_at_10 value: 69.596 - type: ndcg_at_100 value: 51.970000000000006 - type: ndcg_at_1000 value: 48.864999999999995 - type: ndcg_at_3 value: 73.92699999999999 - type: ndcg_at_5 value: 73.175 - type: precision_at_1 value: 84 - type: precision_at_10 value: 74 - type: precision_at_100 value: 53.2 - type: precision_at_1000 value: 21.836 - type: precision_at_3 value: 79.333 - type: precision_at_5 value: 78.4 - type: recall_at_1 value: 0.215 - type: recall_at_10 value: 1.9609999999999999 - type: recall_at_100 value: 12.809999999999999 - type: recall_at_1000 value: 46.418 - type: recall_at_3 value: 0.6479999999999999 - type: recall_at_5 value: 1.057 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 3.066 - type: map_at_10 value: 10.508000000000001 - type: map_at_100 value: 16.258 - type: map_at_1000 value: 17.705000000000002 - type: map_at_3 value: 6.157 - type: map_at_5 value: 7.510999999999999 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 48.786 - type: mrr_at_100 value: 49.619 - type: mrr_at_1000 value: 
49.619 - type: mrr_at_3 value: 45.918 - type: mrr_at_5 value: 46.837 - type: ndcg_at_1 value: 31.633 - type: ndcg_at_10 value: 26.401999999999997 - type: ndcg_at_100 value: 37.139 - type: ndcg_at_1000 value: 48.012 - type: ndcg_at_3 value: 31.875999999999998 - type: ndcg_at_5 value: 27.383000000000003 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 22.857 - type: precision_at_100 value: 7.611999999999999 - type: precision_at_1000 value: 1.492 - type: precision_at_3 value: 33.333 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 3.066 - type: recall_at_10 value: 16.239 - type: recall_at_100 value: 47.29 - type: recall_at_1000 value: 81.137 - type: recall_at_3 value: 7.069 - type: recall_at_5 value: 9.483 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 72.1126 - type: ap value: 14.710862719285753 - type: f1 value: 55.437808972378846 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.39049235993209 - type: f1 value: 60.69810537250234 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 48.15576640316866 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.52917684925792 - type: cos_sim_ap value: 75.97497873817315 - type: cos_sim_f1 value: 70.01151926276718 - type: cos_sim_precision value: 67.98409147402435 - type: cos_sim_recall value: 72.16358839050132 - type: dot_accuracy value: 82.47004828038385 - type: dot_ap value: 62.48739894974198 - type: dot_f1 value: 59.13107511045656 - type: dot_precision value: 55.27765029830197 - type: dot_recall value: 63.562005277044854 - type: euclidean_accuracy value: 86.46361089586935 - type: euclidean_ap value: 75.59282886839452 - type: euclidean_f1 value: 69.6465443945099 - type: euclidean_precision value: 64.52847175331982 - type: euclidean_recall value: 75.64643799472296 - type: manhattan_accuracy value: 86.43380818978363 - type: manhattan_ap value: 75.5742420974403 - type: manhattan_f1 value: 69.8636926889715 - type: manhattan_precision value: 65.8644859813084 - type: manhattan_recall value: 74.37994722955145 - type: max_accuracy value: 86.52917684925792 - type: max_ap value: 75.97497873817315 - type: max_f1 value: 70.01151926276718 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.29056545193464 - type: cos_sim_ap value: 86.63028865482376 - type: cos_sim_f1 value: 79.18166458532285 - type: cos_sim_precision value: 75.70585756426465 - type: cos_sim_recall value: 82.99199260856174 - type: dot_accuracy value: 85.23305002522606 - type: dot_ap value: 76.0482687263196 - type: dot_f1 value: 70.80484330484332 - type: dot_precision value: 65.86933474688577 - type: dot_recall value: 76.53988296889437 - type: 
euclidean_accuracy value: 89.26145845461248 - type: euclidean_ap value: 86.54073288416006 - type: euclidean_f1 value: 78.9721371479794 - type: euclidean_precision value: 76.68649354417525 - type: euclidean_recall value: 81.39821373575609 - type: manhattan_accuracy value: 89.22847052431405 - type: manhattan_ap value: 86.51250729037905 - type: manhattan_f1 value: 78.94601825044894 - type: manhattan_precision value: 75.32694594027555 - type: manhattan_recall value: 82.93039728980598 - type: max_accuracy value: 89.29056545193464 - type: max_ap value: 86.63028865482376 - type: max_f1 value: 79.18166458532285 language: - en license: mit --- # E5-base-v2 Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022 This model has 12 layers and the embedding size is 768. ## Usage Below is an example of how to encode queries and passages from the MS-MARCO passage ranking dataset; see the first sketch below. ## Training Details Please refer to our paper at arxiv.org/abs/2212.03533. ## Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## Support for Sentence Transformers Below is an example of usage with sentence_transformers; the second sketch below includes the package requirements. Contributors: michaelfeil ## FAQ **1. Do I need to add the prefix \"query: \" and \"passage: \" to input texts?** Yes, this is how the model is trained; otherwise you will see a performance degradation. Here are some rules of thumb: - Use \"query: \" and \"passage: \" correspondingly for asymmetric tasks such as passage retrieval in open QA, ad-hoc information retrieval. - Use \"query: \" prefix for symmetric tasks such as semantic similarity, paraphrase retrieval. - Use \"query: \" prefix if you want to use embeddings as features, such as linear probing classification, clustering. **2. Why are my reproduced results slightly different from those reported in the model card?** Different versions of transformers and pytorch could cause negligible but non-zero performance differences. **3. Why do the cosine similarity scores distribute around 0.7 to 1.0?** This is a known and expected behavior as we use a low temperature of 0.01 for InfoNCE contrastive loss. For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores instead of the absolute values, so this should not be an issue. ## Citation If you find our paper or models helpful, please consider citing as follows: ## Limitations This model only works for English texts. Long texts will be truncated to at most 512 tokens.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity, classification, clustering, and retrieval across diverse datasets." 
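The usage snippets referenced above were stripped from the flattened description, so here is a minimal sketch of the first one, assuming the standard E5 recipe (average pooling over the last hidden state, L2 normalization, and the "query: "/"passage: " prefixes); the sample texts are illustrative placeholders, not the exact MS-MARCO passages from the original card:

```python
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Zero out padding positions, then average the remaining token embeddings.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


# Each input text must start with "query: " or "passage: " (see FAQ 1).
input_texts = [
    "query: how much protein should a female eat",
    "query: summit define",
    "passage: As a general guideline, the average protein requirement for adult women is about 46 grams per day.",  # placeholder passage
    "passage: Definition of summit: the highest point of a hill or mountain.",  # placeholder passage
]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base-v2")
model = AutoModel.from_pretrained("intfloat/e5-base-v2")

# Long texts are truncated to at most 512 tokens (see Limitations).
batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch_dict)
embeddings = average_pool(outputs.last_hidden_state, batch_dict["attention_mask"])

# L2-normalize so the dot product equals cosine similarity.
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T) * 100
print(scores.tolist())
```

And a sketch of the second snippet, the sentence_transformers usage; it assumes only that the package is installed, and the prefixes are still required:

```python
from sentence_transformers import SentenceTransformer

# Package requirement: pip install sentence-transformers
model = SentenceTransformer("intfloat/e5-base-v2")

input_texts = [
    "query: how much protein should a female eat",
    "passage: As a general guideline, the average protein requirement for adult women is about 46 grams per day.",  # placeholder passage
]
embeddings = model.encode(input_texts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768) — matches the 768-dim embedding size noted above
```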
+} \ No newline at end of file diff --git a/data/model_data_json/intfloat_e5-large-v2.json b/data/model_data_json/intfloat_e5-large-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..d191f400d47626ae32c7f5c4314af900c224315b --- /dev/null +++ b/data/model_data_json/intfloat_e5-large-v2.json @@ -0,0 +1,27 @@ +{ + "model_id": "intfloat/e5-large-v2", + "downloads": 1449217, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "bert", + "mteb", + "Sentence Transformers", + "sentence-similarity", + "en", + "arxiv:2212.03533", + "arxiv:2104.08663", + "arxiv:2210.07316", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - Sentence Transformers - sentence-similarity - sentence-transformers model-index: - name: e5-large-v2 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 79.22388059701493 - type: ap value: 43.20816505595132 - type: f1 value: 73.27811303522058 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.748325 - type: ap value: 90.72534979701297 - type: f1 value: 93.73895874282185 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.612 - type: f1 value: 47.61157345898393 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 23.541999999999998 - type: map_at_10 value: 38.208 - type: map_at_100 value: 39.417 - type: map_at_1000 value: 39.428999999999995 - type: map_at_3 value: 33.95 - type: map_at_5 value: 36.329 - type: mrr_at_1 value: 23.755000000000003 - type: mrr_at_10 value: 38.288 - type: mrr_at_100 value: 39.511 - type: mrr_at_1000 value: 39.523 - type: mrr_at_3 value: 34.009 - type: mrr_at_5 value: 36.434 - type: ndcg_at_1 value: 23.541999999999998 - type: ndcg_at_10 value: 46.417 - type: ndcg_at_100 value: 51.812000000000005 - type: ndcg_at_1000 value: 52.137 - type: ndcg_at_3 value: 37.528 - type: ndcg_at_5 value: 41.81 - type: precision_at_1 value: 23.541999999999998 - type: precision_at_10 value: 7.269 - type: precision_at_100 value: 0.9690000000000001 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 15.979 - type: precision_at_5 value: 11.664 - type: recall_at_1 value: 23.541999999999998 - type: recall_at_10 value: 72.688 - type: recall_at_100 value: 96.871 - type: recall_at_1000 value: 99.431 - type: recall_at_3 value: 47.937000000000005 - type: recall_at_5 value: 58.321 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 45.546499570522094 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 41.01607489943561 - task: type: Reranking dataset: type: 
mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 59.616107510107774 - type: mrr value: 72.75106626214661 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 84.33018094733868 - type: cos_sim_spearman value: 83.60190492611737 - type: euclidean_pearson value: 82.1492450218961 - type: euclidean_spearman value: 82.70308926526991 - type: manhattan_pearson value: 81.93959600076842 - type: manhattan_spearman value: 82.73260801016369 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 84.54545454545455 - type: f1 value: 84.49582530928923 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.362725540120096 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 34.849509608178145 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.502999999999997 - type: map_at_10 value: 43.323 - type: map_at_100 value: 44.708999999999996 - type: map_at_1000 value: 44.838 - type: map_at_3 value: 38.987 - type: map_at_5 value: 41.516999999999996 - type: mrr_at_1 value: 38.769999999999996 - type: mrr_at_10 value: 49.13 - type: mrr_at_100 value: 49.697 - type: mrr_at_1000 value: 49.741 - type: mrr_at_3 value: 45.804 - type: mrr_at_5 value: 47.842 - type: ndcg_at_1 value: 38.769999999999996 - type: ndcg_at_10 value: 50.266999999999996 - type: ndcg_at_100 value: 54.967 - type: ndcg_at_1000 value: 56.976000000000006 - type: ndcg_at_3 value: 43.823 - type: ndcg_at_5 value: 47.12 - type: precision_at_1 value: 38.769999999999996 - type: precision_at_10 value: 10.057 - type: precision_at_100 value: 1.554 - type: precision_at_1000 value: 0.202 - type: precision_at_3 value: 21.125 - type: precision_at_5 value: 15.851 - type: recall_at_1 value: 31.502999999999997 - type: recall_at_10 value: 63.715999999999994 - type: recall_at_100 value: 83.61800000000001 - type: recall_at_1000 value: 96.63199999999999 - type: recall_at_3 value: 45.403 - type: recall_at_5 value: 54.481 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.833000000000002 - type: map_at_10 value: 37.330999999999996 - type: map_at_100 value: 38.580999999999996 - type: map_at_1000 value: 38.708 - type: map_at_3 value: 34.713 - type: map_at_5 value: 36.104 - type: mrr_at_1 value: 35.223 - type: mrr_at_10 value: 43.419000000000004 - type: mrr_at_100 value: 44.198 - type: mrr_at_1000 value: 44.249 - type: mrr_at_3 value: 41.614000000000004 - type: mrr_at_5 value: 42.553000000000004 - type: ndcg_at_1 value: 35.223 - type: ndcg_at_10 value: 42.687999999999995 - type: ndcg_at_100 value: 47.447 - type: ndcg_at_1000 value: 49.701 - type: ndcg_at_3 value: 39.162 - type: ndcg_at_5 value: 40.557 - type: 
precision_at_1 value: 35.223 - type: precision_at_10 value: 7.962 - type: precision_at_100 value: 1.304 - type: precision_at_1000 value: 0.18 - type: precision_at_3 value: 19.023 - type: precision_at_5 value: 13.184999999999999 - type: recall_at_1 value: 27.833000000000002 - type: recall_at_10 value: 51.881 - type: recall_at_100 value: 72.04 - type: recall_at_1000 value: 86.644 - type: recall_at_3 value: 40.778 - type: recall_at_5 value: 45.176 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.175 - type: map_at_10 value: 51.174 - type: map_at_100 value: 52.26499999999999 - type: map_at_1000 value: 52.315999999999995 - type: map_at_3 value: 47.897 - type: map_at_5 value: 49.703 - type: mrr_at_1 value: 43.448 - type: mrr_at_10 value: 54.505 - type: mrr_at_100 value: 55.216 - type: mrr_at_1000 value: 55.242000000000004 - type: mrr_at_3 value: 51.98500000000001 - type: mrr_at_5 value: 53.434000000000005 - type: ndcg_at_1 value: 43.448 - type: ndcg_at_10 value: 57.282 - type: ndcg_at_100 value: 61.537 - type: ndcg_at_1000 value: 62.546 - type: ndcg_at_3 value: 51.73799999999999 - type: ndcg_at_5 value: 54.324 - type: precision_at_1 value: 43.448 - type: precision_at_10 value: 9.292 - type: precision_at_100 value: 1.233 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 23.218 - type: precision_at_5 value: 15.887 - type: recall_at_1 value: 38.175 - type: recall_at_10 value: 72.00999999999999 - type: recall_at_100 value: 90.155 - type: recall_at_1000 value: 97.257 - type: recall_at_3 value: 57.133 - type: recall_at_5 value: 63.424 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.405 - type: map_at_10 value: 30.043 - type: map_at_100 value: 31.191000000000003 - type: map_at_1000 value: 31.275 - type: map_at_3 value: 27.034000000000002 - type: map_at_5 value: 28.688000000000002 - type: mrr_at_1 value: 24.068 - type: mrr_at_10 value: 31.993 - type: mrr_at_100 value: 32.992 - type: mrr_at_1000 value: 33.050000000000004 - type: mrr_at_3 value: 28.964000000000002 - type: mrr_at_5 value: 30.653000000000002 - type: ndcg_at_1 value: 24.068 - type: ndcg_at_10 value: 35.198 - type: ndcg_at_100 value: 40.709 - type: ndcg_at_1000 value: 42.855 - type: ndcg_at_3 value: 29.139 - type: ndcg_at_5 value: 32.045 - type: precision_at_1 value: 24.068 - type: precision_at_10 value: 5.65 - type: precision_at_100 value: 0.885 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 12.279 - type: precision_at_5 value: 8.994 - type: recall_at_1 value: 22.405 - type: recall_at_10 value: 49.391 - type: recall_at_100 value: 74.53699999999999 - type: recall_at_1000 value: 90.605 - type: recall_at_3 value: 33.126 - type: recall_at_5 value: 40.073 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 13.309999999999999 - type: map_at_10 value: 20.688000000000002 - type: map_at_100 value: 22.022 - type: map_at_1000 value: 22.152 - type: map_at_3 value: 17.954 - type: map_at_5 value: 19.439 - type: mrr_at_1 value: 16.294 - type: mrr_at_10 value: 24.479 - type: mrr_at_100 value: 25.515 - type: mrr_at_1000 value: 25.593 - type: mrr_at_3 value: 21.642 - type: mrr_at_5 value: 23.189999999999998 - type: ndcg_at_1 value: 16.294 - 
type: ndcg_at_10 value: 25.833000000000002 - type: ndcg_at_100 value: 32.074999999999996 - type: ndcg_at_1000 value: 35.083 - type: ndcg_at_3 value: 20.493 - type: ndcg_at_5 value: 22.949 - type: precision_at_1 value: 16.294 - type: precision_at_10 value: 5.112 - type: precision_at_100 value: 0.96 - type: precision_at_1000 value: 0.134 - type: precision_at_3 value: 9.908999999999999 - type: precision_at_5 value: 7.587000000000001 - type: recall_at_1 value: 13.309999999999999 - type: recall_at_10 value: 37.851 - type: recall_at_100 value: 64.835 - type: recall_at_1000 value: 86.334 - type: recall_at_3 value: 23.493 - type: recall_at_5 value: 29.528 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.857999999999997 - type: map_at_10 value: 35.503 - type: map_at_100 value: 36.957 - type: map_at_1000 value: 37.065 - type: map_at_3 value: 32.275999999999996 - type: map_at_5 value: 34.119 - type: mrr_at_1 value: 31.954 - type: mrr_at_10 value: 40.851 - type: mrr_at_100 value: 41.863 - type: mrr_at_1000 value: 41.900999999999996 - type: mrr_at_3 value: 38.129999999999995 - type: mrr_at_5 value: 39.737 - type: ndcg_at_1 value: 31.954 - type: ndcg_at_10 value: 41.343999999999994 - type: ndcg_at_100 value: 47.397 - type: ndcg_at_1000 value: 49.501 - type: ndcg_at_3 value: 36.047000000000004 - type: ndcg_at_5 value: 38.639 - type: precision_at_1 value: 31.954 - type: precision_at_10 value: 7.68 - type: precision_at_100 value: 1.247 - type: precision_at_1000 value: 0.16199999999999998 - type: precision_at_3 value: 17.132 - type: precision_at_5 value: 12.589 - type: recall_at_1 value: 25.857999999999997 - type: recall_at_10 value: 53.43599999999999 - type: recall_at_100 value: 78.82400000000001 - type: recall_at_1000 value: 92.78999999999999 - type: recall_at_3 value: 38.655 - type: recall_at_5 value: 45.216 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.709 - type: map_at_10 value: 34.318 - type: map_at_100 value: 35.657 - type: map_at_1000 value: 35.783 - type: map_at_3 value: 31.326999999999998 - type: map_at_5 value: 33.021 - type: mrr_at_1 value: 30.137000000000004 - type: mrr_at_10 value: 39.093 - type: mrr_at_100 value: 39.992 - type: mrr_at_1000 value: 40.056999999999995 - type: mrr_at_3 value: 36.606 - type: mrr_at_5 value: 37.861 - type: ndcg_at_1 value: 30.137000000000004 - type: ndcg_at_10 value: 39.974 - type: ndcg_at_100 value: 45.647999999999996 - type: ndcg_at_1000 value: 48.259 - type: ndcg_at_3 value: 35.028 - type: ndcg_at_5 value: 37.175999999999995 - type: precision_at_1 value: 30.137000000000004 - type: precision_at_10 value: 7.363 - type: precision_at_100 value: 1.184 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 16.857 - type: precision_at_5 value: 11.963 - type: recall_at_1 value: 24.709 - type: recall_at_10 value: 52.087 - type: recall_at_100 value: 76.125 - type: recall_at_1000 value: 93.82300000000001 - type: recall_at_3 value: 38.149 - type: recall_at_5 value: 43.984 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.40791666666667 - type: map_at_10 value: 32.458083333333335 - type: map_at_100 value: 33.691916666666664 - type: map_at_1000 value: 33.81191666666666 - type: map_at_3 value: 
29.51625 - type: map_at_5 value: 31.168083333333335 - type: mrr_at_1 value: 27.96591666666666 - type: mrr_at_10 value: 36.528583333333344 - type: mrr_at_100 value: 37.404 - type: mrr_at_1000 value: 37.464333333333336 - type: mrr_at_3 value: 33.92883333333333 - type: mrr_at_5 value: 35.41933333333333 - type: ndcg_at_1 value: 27.96591666666666 - type: ndcg_at_10 value: 37.89141666666666 - type: ndcg_at_100 value: 43.23066666666666 - type: ndcg_at_1000 value: 45.63258333333333 - type: ndcg_at_3 value: 32.811249999999994 - type: ndcg_at_5 value: 35.22566666666667 - type: precision_at_1 value: 27.96591666666666 - type: precision_at_10 value: 6.834083333333332 - type: precision_at_100 value: 1.12225 - type: precision_at_1000 value: 0.15241666666666667 - type: precision_at_3 value: 15.264333333333335 - type: precision_at_5 value: 11.039416666666666 - type: recall_at_1 value: 23.40791666666667 - type: recall_at_10 value: 49.927083333333336 - type: recall_at_100 value: 73.44641666666668 - type: recall_at_1000 value: 90.19950000000001 - type: recall_at_3 value: 35.88341666666667 - type: recall_at_5 value: 42.061249999999994 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.592000000000002 - type: map_at_10 value: 26.895999999999997 - type: map_at_100 value: 27.921000000000003 - type: map_at_1000 value: 28.02 - type: map_at_3 value: 24.883 - type: map_at_5 value: 25.812 - type: mrr_at_1 value: 22.698999999999998 - type: mrr_at_10 value: 29.520999999999997 - type: mrr_at_100 value: 30.458000000000002 - type: mrr_at_1000 value: 30.526999999999997 - type: mrr_at_3 value: 27.633000000000003 - type: mrr_at_5 value: 28.483999999999998 - type: ndcg_at_1 value: 22.698999999999998 - type: ndcg_at_10 value: 31.061 - type: ndcg_at_100 value: 36.398 - type: ndcg_at_1000 value: 38.89 - type: ndcg_at_3 value: 27.149 - type: ndcg_at_5 value: 28.627000000000002 - type: precision_at_1 value: 22.698999999999998 - type: precision_at_10 value: 5.106999999999999 - type: precision_at_100 value: 0.857 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 11.963 - type: precision_at_5 value: 8.221 - type: recall_at_1 value: 19.592000000000002 - type: recall_at_10 value: 41.329 - type: recall_at_100 value: 66.094 - type: recall_at_1000 value: 84.511 - type: recall_at_3 value: 30.61 - type: recall_at_5 value: 34.213 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 14.71 - type: map_at_10 value: 20.965 - type: map_at_100 value: 21.994 - type: map_at_1000 value: 22.133 - type: map_at_3 value: 18.741 - type: map_at_5 value: 19.951 - type: mrr_at_1 value: 18.307000000000002 - type: mrr_at_10 value: 24.66 - type: mrr_at_100 value: 25.540000000000003 - type: mrr_at_1000 value: 25.629 - type: mrr_at_3 value: 22.511 - type: mrr_at_5 value: 23.72 - type: ndcg_at_1 value: 18.307000000000002 - type: ndcg_at_10 value: 25.153 - type: ndcg_at_100 value: 30.229 - type: ndcg_at_1000 value: 33.623 - type: ndcg_at_3 value: 21.203 - type: ndcg_at_5 value: 23.006999999999998 - type: precision_at_1 value: 18.307000000000002 - type: precision_at_10 value: 4.725 - type: precision_at_100 value: 0.8659999999999999 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 10.14 - type: precision_at_5 value: 7.481 - type: recall_at_1 value: 14.71 - type: recall_at_10 value: 
34.087 - type: recall_at_100 value: 57.147999999999996 - type: recall_at_1000 value: 81.777 - type: recall_at_3 value: 22.996 - type: recall_at_5 value: 27.73 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.472 - type: map_at_10 value: 32.699 - type: map_at_100 value: 33.867000000000004 - type: map_at_1000 value: 33.967000000000006 - type: map_at_3 value: 29.718 - type: map_at_5 value: 31.345 - type: mrr_at_1 value: 28.265 - type: mrr_at_10 value: 36.945 - type: mrr_at_100 value: 37.794 - type: mrr_at_1000 value: 37.857 - type: mrr_at_3 value: 34.266000000000005 - type: mrr_at_5 value: 35.768 - type: ndcg_at_1 value: 28.265 - type: ndcg_at_10 value: 38.35 - type: ndcg_at_100 value: 43.739 - type: ndcg_at_1000 value: 46.087 - type: ndcg_at_3 value: 33.004 - type: ndcg_at_5 value: 35.411 - type: precision_at_1 value: 28.265 - type: precision_at_10 value: 6.715999999999999 - type: precision_at_100 value: 1.059 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 15.299 - type: precision_at_5 value: 10.951 - type: recall_at_1 value: 23.472 - type: recall_at_10 value: 51.413 - type: recall_at_100 value: 75.17 - type: recall_at_1000 value: 91.577 - type: recall_at_3 value: 36.651 - type: recall_at_5 value: 42.814 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.666 - type: map_at_10 value: 32.963 - type: map_at_100 value: 34.544999999999995 - type: map_at_1000 value: 34.792 - type: map_at_3 value: 29.74 - type: map_at_5 value: 31.5 - type: mrr_at_1 value: 29.051 - type: mrr_at_10 value: 38.013000000000005 - type: mrr_at_100 value: 38.997 - type: mrr_at_1000 value: 39.055 - type: mrr_at_3 value: 34.947 - type: mrr_at_5 value: 36.815 - type: ndcg_at_1 value: 29.051 - type: ndcg_at_10 value: 39.361000000000004 - type: ndcg_at_100 value: 45.186 - type: ndcg_at_1000 value: 47.867 - type: ndcg_at_3 value: 33.797 - type: ndcg_at_5 value: 36.456 - type: precision_at_1 value: 29.051 - type: precision_at_10 value: 7.668 - type: precision_at_100 value: 1.532 - type: precision_at_1000 value: 0.247 - type: precision_at_3 value: 15.876000000000001 - type: precision_at_5 value: 11.779 - type: recall_at_1 value: 23.666 - type: recall_at_10 value: 51.858000000000004 - type: recall_at_100 value: 77.805 - type: recall_at_1000 value: 94.504 - type: recall_at_3 value: 36.207 - type: recall_at_5 value: 43.094 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.662 - type: map_at_10 value: 23.594 - type: map_at_100 value: 24.593999999999998 - type: map_at_1000 value: 24.694 - type: map_at_3 value: 20.925 - type: map_at_5 value: 22.817999999999998 - type: mrr_at_1 value: 17.375 - type: mrr_at_10 value: 25.734 - type: mrr_at_100 value: 26.586 - type: mrr_at_1000 value: 26.671 - type: mrr_at_3 value: 23.044 - type: mrr_at_5 value: 24.975 - type: ndcg_at_1 value: 17.375 - type: ndcg_at_10 value: 28.186 - type: ndcg_at_100 value: 33.436 - type: ndcg_at_1000 value: 36.203 - type: ndcg_at_3 value: 23.152 - type: ndcg_at_5 value: 26.397 - type: precision_at_1 value: 17.375 - type: precision_at_10 value: 4.677 - type: precision_at_100 value: 0.786 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 10.351 - type: 
precision_at_5 value: 7.985 - type: recall_at_1 value: 15.662 - type: recall_at_10 value: 40.066 - type: recall_at_100 value: 65.006 - type: recall_at_1000 value: 85.94000000000001 - type: recall_at_3 value: 27.400000000000002 - type: recall_at_5 value: 35.002 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 8.853 - type: map_at_10 value: 15.568000000000001 - type: map_at_100 value: 17.383000000000003 - type: map_at_1000 value: 17.584 - type: map_at_3 value: 12.561 - type: map_at_5 value: 14.056 - type: mrr_at_1 value: 18.958 - type: mrr_at_10 value: 28.288000000000004 - type: mrr_at_100 value: 29.432000000000002 - type: mrr_at_1000 value: 29.498 - type: mrr_at_3 value: 25.049 - type: mrr_at_5 value: 26.857 - type: ndcg_at_1 value: 18.958 - type: ndcg_at_10 value: 22.21 - type: ndcg_at_100 value: 29.596 - type: ndcg_at_1000 value: 33.583 - type: ndcg_at_3 value: 16.994999999999997 - type: ndcg_at_5 value: 18.95 - type: precision_at_1 value: 18.958 - type: precision_at_10 value: 7.192 - type: precision_at_100 value: 1.5 - type: precision_at_1000 value: 0.22399999999999998 - type: precision_at_3 value: 12.573 - type: precision_at_5 value: 10.202 - type: recall_at_1 value: 8.853 - type: recall_at_10 value: 28.087 - type: recall_at_100 value: 53.701 - type: recall_at_1000 value: 76.29899999999999 - type: recall_at_3 value: 15.913 - type: recall_at_5 value: 20.658 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.077 - type: map_at_10 value: 20.788999999999998 - type: map_at_100 value: 30.429000000000002 - type: map_at_1000 value: 32.143 - type: map_at_3 value: 14.692 - type: map_at_5 value: 17.139 - type: mrr_at_1 value: 70.75 - type: mrr_at_10 value: 78.036 - type: mrr_at_100 value: 78.401 - type: mrr_at_1000 value: 78.404 - type: mrr_at_3 value: 76.75 - type: mrr_at_5 value: 77.47500000000001 - type: ndcg_at_1 value: 58.12500000000001 - type: ndcg_at_10 value: 44.015 - type: ndcg_at_100 value: 49.247 - type: ndcg_at_1000 value: 56.211999999999996 - type: ndcg_at_3 value: 49.151 - type: ndcg_at_5 value: 46.195 - type: precision_at_1 value: 70.75 - type: precision_at_10 value: 35.5 - type: precision_at_100 value: 11.355 - type: precision_at_1000 value: 2.1950000000000003 - type: precision_at_3 value: 53.083000000000006 - type: precision_at_5 value: 44.800000000000004 - type: recall_at_1 value: 9.077 - type: recall_at_10 value: 26.259 - type: recall_at_100 value: 56.547000000000004 - type: recall_at_1000 value: 78.551 - type: recall_at_3 value: 16.162000000000003 - type: recall_at_5 value: 19.753999999999998 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 49.44500000000001 - type: f1 value: 44.67067691783401 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 68.182 - type: map_at_10 value: 78.223 - type: map_at_100 value: 78.498 - type: map_at_1000 value: 78.512 - type: map_at_3 value: 76.71 - type: map_at_5 value: 77.725 - type: mrr_at_1 value: 73.177 - type: mrr_at_10 value: 82.513 - type: mrr_at_100 value: 82.633 - type: mrr_at_1000 value: 82.635 - type: mrr_at_3 value: 81.376 - type: mrr_at_5 value: 82.182 - type: ndcg_at_1 value: 73.177 - type: 
ndcg_at_10 value: 82.829 - type: ndcg_at_100 value: 83.84 - type: ndcg_at_1000 value: 84.07900000000001 - type: ndcg_at_3 value: 80.303 - type: ndcg_at_5 value: 81.846 - type: precision_at_1 value: 73.177 - type: precision_at_10 value: 10.241999999999999 - type: precision_at_100 value: 1.099 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 31.247999999999998 - type: precision_at_5 value: 19.697 - type: recall_at_1 value: 68.182 - type: recall_at_10 value: 92.657 - type: recall_at_100 value: 96.709 - type: recall_at_1000 value: 98.184 - type: recall_at_3 value: 85.9 - type: recall_at_5 value: 89.755 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 21.108 - type: map_at_10 value: 33.342 - type: map_at_100 value: 35.281 - type: map_at_1000 value: 35.478 - type: map_at_3 value: 29.067 - type: map_at_5 value: 31.563000000000002 - type: mrr_at_1 value: 41.667 - type: mrr_at_10 value: 49.913000000000004 - type: mrr_at_100 value: 50.724000000000004 - type: mrr_at_1000 value: 50.766 - type: mrr_at_3 value: 47.504999999999995 - type: mrr_at_5 value: 49.033 - type: ndcg_at_1 value: 41.667 - type: ndcg_at_10 value: 41.144 - type: ndcg_at_100 value: 48.326 - type: ndcg_at_1000 value: 51.486 - type: ndcg_at_3 value: 37.486999999999995 - type: ndcg_at_5 value: 38.78 - type: precision_at_1 value: 41.667 - type: precision_at_10 value: 11.358 - type: precision_at_100 value: 1.873 - type: precision_at_1000 value: 0.244 - type: precision_at_3 value: 25 - type: precision_at_5 value: 18.519 - type: recall_at_1 value: 21.108 - type: recall_at_10 value: 47.249 - type: recall_at_100 value: 74.52 - type: recall_at_1000 value: 93.31 - type: recall_at_3 value: 33.271 - type: recall_at_5 value: 39.723000000000006 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 40.317 - type: map_at_10 value: 64.861 - type: map_at_100 value: 65.697 - type: map_at_1000 value: 65.755 - type: map_at_3 value: 61.258 - type: map_at_5 value: 63.590999999999994 - type: mrr_at_1 value: 80.635 - type: mrr_at_10 value: 86.528 - type: mrr_at_100 value: 86.66199999999999 - type: mrr_at_1000 value: 86.666 - type: mrr_at_3 value: 85.744 - type: mrr_at_5 value: 86.24300000000001 - type: ndcg_at_1 value: 80.635 - type: ndcg_at_10 value: 73.13199999999999 - type: ndcg_at_100 value: 75.927 - type: ndcg_at_1000 value: 76.976 - type: ndcg_at_3 value: 68.241 - type: ndcg_at_5 value: 71.071 - type: precision_at_1 value: 80.635 - type: precision_at_10 value: 15.326 - type: precision_at_100 value: 1.7500000000000002 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 43.961 - type: precision_at_5 value: 28.599999999999998 - type: recall_at_1 value: 40.317 - type: recall_at_10 value: 76.631 - type: recall_at_100 value: 87.495 - type: recall_at_1000 value: 94.362 - type: recall_at_3 value: 65.94200000000001 - type: recall_at_5 value: 71.499 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 91.686 - type: ap value: 87.5577120393173 - type: f1 value: 91.6629447355139 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.702 - type: map_at_10 value: 36.414 - type: map_at_100 value: 37.561 - type: 
map_at_1000 value: 37.605 - type: map_at_3 value: 32.456 - type: map_at_5 value: 34.827000000000005 - type: mrr_at_1 value: 24.355 - type: mrr_at_10 value: 37.01 - type: mrr_at_100 value: 38.085 - type: mrr_at_1000 value: 38.123000000000005 - type: mrr_at_3 value: 33.117999999999995 - type: mrr_at_5 value: 35.452 - type: ndcg_at_1 value: 24.384 - type: ndcg_at_10 value: 43.456 - type: ndcg_at_100 value: 48.892 - type: ndcg_at_1000 value: 49.964 - type: ndcg_at_3 value: 35.475 - type: ndcg_at_5 value: 39.711 - type: precision_at_1 value: 24.384 - type: precision_at_10 value: 6.7940000000000005 - type: precision_at_100 value: 0.951 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 15.052999999999999 - type: precision_at_5 value: 11.189 - type: recall_at_1 value: 23.702 - type: recall_at_10 value: 65.057 - type: recall_at_100 value: 90.021 - type: recall_at_1000 value: 98.142 - type: recall_at_3 value: 43.551 - type: recall_at_5 value: 53.738 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 94.62380300957591 - type: f1 value: 94.49871222100734 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.14090287277702 - type: f1 value: 60.32101258220515 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.84330867518494 - type: f1 value: 71.92248688515255 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.10692669804976 - type: f1 value: 77.9904839122866 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.822988923078444 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 30.38394880253403 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.82504612539082 - type: mrr value: 32.84462298174977 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.029 - type: map_at_10 value: 14.088999999999999 - type: map_at_100 value: 17.601 - type: map_at_1000 value: 19.144 - type: map_at_3 value: 10.156 - type: map_at_5 value: 11.892 - type: mrr_at_1 value: 46.44 - type: mrr_at_10 value: 56.596999999999994 - type: mrr_at_100 value: 57.11000000000001 - type: mrr_at_1000 value: 57.14 - type: mrr_at_3 value: 54.334 - type: mrr_at_5 value: 55.774 - type: ndcg_at_1 value: 44.891999999999996 - type: ndcg_at_10 value: 37.134 - type: ndcg_at_100 value: 33.652 - type: ndcg_at_1000 value: 42.548 - type: ndcg_at_3 value: 41.851 - type: ndcg_at_5 value: 39.842 - type: precision_at_1 value: 46.44 - type: 
precision_at_10 value: 27.647 - type: precision_at_100 value: 8.309999999999999 - type: precision_at_1000 value: 2.146 - type: precision_at_3 value: 39.422000000000004 - type: precision_at_5 value: 34.675 - type: recall_at_1 value: 6.029 - type: recall_at_10 value: 18.907 - type: recall_at_100 value: 33.76 - type: recall_at_1000 value: 65.14999999999999 - type: recall_at_3 value: 11.584999999999999 - type: recall_at_5 value: 14.626 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 39.373000000000005 - type: map_at_10 value: 55.836 - type: map_at_100 value: 56.611999999999995 - type: map_at_1000 value: 56.63 - type: map_at_3 value: 51.747 - type: map_at_5 value: 54.337999999999994 - type: mrr_at_1 value: 44.147999999999996 - type: mrr_at_10 value: 58.42699999999999 - type: mrr_at_100 value: 58.902 - type: mrr_at_1000 value: 58.914 - type: mrr_at_3 value: 55.156000000000006 - type: mrr_at_5 value: 57.291000000000004 - type: ndcg_at_1 value: 44.119 - type: ndcg_at_10 value: 63.444 - type: ndcg_at_100 value: 66.40599999999999 - type: ndcg_at_1000 value: 66.822 - type: ndcg_at_3 value: 55.962 - type: ndcg_at_5 value: 60.228 - type: precision_at_1 value: 44.119 - type: precision_at_10 value: 10.006 - type: precision_at_100 value: 1.17 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 25.135 - type: precision_at_5 value: 17.59 - type: recall_at_1 value: 39.373000000000005 - type: recall_at_10 value: 83.78999999999999 - type: recall_at_100 value: 96.246 - type: recall_at_1000 value: 99.324 - type: recall_at_3 value: 64.71900000000001 - type: recall_at_5 value: 74.508 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 69.199 - type: map_at_10 value: 82.892 - type: map_at_100 value: 83.578 - type: map_at_1000 value: 83.598 - type: map_at_3 value: 79.948 - type: map_at_5 value: 81.779 - type: mrr_at_1 value: 79.67 - type: mrr_at_10 value: 86.115 - type: mrr_at_100 value: 86.249 - type: mrr_at_1000 value: 86.251 - type: mrr_at_3 value: 85.08200000000001 - type: mrr_at_5 value: 85.783 - type: ndcg_at_1 value: 79.67 - type: ndcg_at_10 value: 86.839 - type: ndcg_at_100 value: 88.252 - type: ndcg_at_1000 value: 88.401 - type: ndcg_at_3 value: 83.86200000000001 - type: ndcg_at_5 value: 85.473 - type: precision_at_1 value: 79.67 - type: precision_at_10 value: 13.19 - type: precision_at_100 value: 1.521 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 36.677 - type: precision_at_5 value: 24.118000000000002 - type: recall_at_1 value: 69.199 - type: recall_at_10 value: 94.321 - type: recall_at_100 value: 99.20400000000001 - type: recall_at_1000 value: 99.947 - type: recall_at_3 value: 85.787 - type: recall_at_5 value: 90.365 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 55.82810046856353 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.38132611783628 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.127000000000001 - type: map_at_10 value: 12.235 - type: map_at_100 value: 14.417 - type: 
map_at_1000 value: 14.75 - type: map_at_3 value: 8.906 - type: map_at_5 value: 10.591000000000001 - type: mrr_at_1 value: 25.2 - type: mrr_at_10 value: 35.879 - type: mrr_at_100 value: 36.935 - type: mrr_at_1000 value: 36.997 - type: mrr_at_3 value: 32.783 - type: mrr_at_5 value: 34.367999999999995 - type: ndcg_at_1 value: 25.2 - type: ndcg_at_10 value: 20.509 - type: ndcg_at_100 value: 28.67 - type: ndcg_at_1000 value: 34.42 - type: ndcg_at_3 value: 19.948 - type: ndcg_at_5 value: 17.166 - type: precision_at_1 value: 25.2 - type: precision_at_10 value: 10.440000000000001 - type: precision_at_100 value: 2.214 - type: precision_at_1000 value: 0.359 - type: precision_at_3 value: 18.533 - type: precision_at_5 value: 14.860000000000001 - type: recall_at_1 value: 5.127000000000001 - type: recall_at_10 value: 21.147 - type: recall_at_100 value: 44.946999999999996 - type: recall_at_1000 value: 72.89 - type: recall_at_3 value: 11.277 - type: recall_at_5 value: 15.042 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.0373011786213 - type: cos_sim_spearman value: 79.27889560856613 - type: euclidean_pearson value: 80.31186315495655 - type: euclidean_spearman value: 79.41630415280811 - type: manhattan_pearson value: 80.31755140442013 - type: manhattan_spearman value: 79.43069870027611 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 84.8659751342045 - type: cos_sim_spearman value: 76.95377612997667 - type: euclidean_pearson value: 81.24552945497848 - type: euclidean_spearman value: 77.18236963555253 - type: manhattan_pearson value: 81.26477607759037 - type: manhattan_spearman value: 77.13821753062756 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 83.34597139044875 - type: cos_sim_spearman value: 84.124169425592 - type: euclidean_pearson value: 83.68590721511401 - type: euclidean_spearman value: 84.18846190846398 - type: manhattan_pearson value: 83.57630235061498 - type: manhattan_spearman value: 84.10244043726902 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.67641885599572 - type: cos_sim_spearman value: 80.46450725650428 - type: euclidean_pearson value: 81.61645042715865 - type: euclidean_spearman value: 80.61418394236874 - type: manhattan_pearson value: 81.55712034928871 - type: manhattan_spearman value: 80.57905670523951 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.86650310886782 - type: cos_sim_spearman value: 89.76081629222328 - type: euclidean_pearson value: 89.1530747029954 - type: euclidean_spearman value: 89.80990657280248 - type: manhattan_pearson value: 89.10640563278132 - type: manhattan_spearman value: 89.76282108434047 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.93864027911118 - type: cos_sim_spearman value: 85.47096193999023 - type: euclidean_pearson value: 
85.03141840870533 - type: euclidean_spearman value: 85.43124029598181 - type: manhattan_pearson value: 84.99002664393512 - type: manhattan_spearman value: 85.39169195120834 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.7045343749832 - type: cos_sim_spearman value: 89.03262221146677 - type: euclidean_pearson value: 89.56078218264365 - type: euclidean_spearman value: 89.17827006466868 - type: manhattan_pearson value: 89.52717595468582 - type: manhattan_spearman value: 89.15878115952923 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 64.20191302875551 - type: cos_sim_spearman value: 64.11446552557646 - type: euclidean_pearson value: 64.6918197393619 - type: euclidean_spearman value: 63.440182631197764 - type: manhattan_pearson value: 64.55692904121835 - type: manhattan_spearman value: 63.424877742756266 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 86.37793104662344 - type: cos_sim_spearman value: 87.7357802629067 - type: euclidean_pearson value: 87.4286301545109 - type: euclidean_spearman value: 87.78452920777421 - type: manhattan_pearson value: 87.42445169331255 - type: manhattan_spearman value: 87.78537677249598 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 84.31465405081792 - type: mrr value: 95.7173781193389 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.760999999999996 - type: map_at_10 value: 67.904 - type: map_at_100 value: 68.539 - type: map_at_1000 value: 68.562 - type: map_at_3 value: 65.415 - type: map_at_5 value: 66.788 - type: mrr_at_1 value: 60.333000000000006 - type: mrr_at_10 value: 68.797 - type: mrr_at_100 value: 69.236 - type: mrr_at_1000 value: 69.257 - type: mrr_at_3 value: 66.667 - type: mrr_at_5 value: 67.967 - type: ndcg_at_1 value: 60.333000000000006 - type: ndcg_at_10 value: 72.24199999999999 - type: ndcg_at_100 value: 74.86 - type: ndcg_at_1000 value: 75.354 - type: ndcg_at_3 value: 67.93400000000001 - type: ndcg_at_5 value: 70.02199999999999 - type: precision_at_1 value: 60.333000000000006 - type: precision_at_10 value: 9.533 - type: precision_at_100 value: 1.09 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 26.778000000000002 - type: precision_at_5 value: 17.467 - type: recall_at_1 value: 57.760999999999996 - type: recall_at_10 value: 84.383 - type: recall_at_100 value: 96.267 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 72.628 - type: recall_at_5 value: 78.094 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.8029702970297 - type: cos_sim_ap value: 94.9210324173411 - type: cos_sim_f1 value: 89.8521162672106 - type: cos_sim_precision value: 91.67533818938605 - type: cos_sim_recall value: 88.1 - type: dot_accuracy value: 
99.69504950495049 - type: dot_ap value: 90.4919719146181 - type: dot_f1 value: 84.72289156626506 - type: dot_precision value: 81.76744186046511 - type: dot_recall value: 87.9 - type: euclidean_accuracy value: 99.79702970297029 - type: euclidean_ap value: 94.87827463795753 - type: euclidean_f1 value: 89.55680081507896 - type: euclidean_precision value: 91.27725856697819 - type: euclidean_recall value: 87.9 - type: manhattan_accuracy value: 99.7990099009901 - type: manhattan_ap value: 94.87587025149682 - type: manhattan_f1 value: 89.76298537569339 - type: manhattan_precision value: 90.53916581892166 - type: manhattan_recall value: 89 - type: max_accuracy value: 99.8029702970297 - type: max_ap value: 94.9210324173411 - type: max_f1 value: 89.8521162672106 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.92385753948724 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 33.671756975431144 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 50.677928036739004 - type: mrr value: 51.56413133435193 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.523589340819683 - type: cos_sim_spearman value: 30.187407518823235 - type: dot_pearson value: 29.039713969699015 - type: dot_spearman value: 29.114740651155508 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.211 - type: map_at_10 value: 1.6199999999999999 - type: map_at_100 value: 8.658000000000001 - type: map_at_1000 value: 21.538 - type: map_at_3 value: 0.575 - type: map_at_5 value: 0.919 - type: mrr_at_1 value: 78 - type: mrr_at_10 value: 86.18599999999999 - type: mrr_at_100 value: 86.18599999999999 - type: mrr_at_1000 value: 86.18599999999999 - type: mrr_at_3 value: 85 - type: mrr_at_5 value: 85.9 - type: ndcg_at_1 value: 74 - type: ndcg_at_10 value: 66.542 - type: ndcg_at_100 value: 50.163999999999994 - type: ndcg_at_1000 value: 45.696999999999996 - type: ndcg_at_3 value: 71.531 - type: ndcg_at_5 value: 70.45 - type: precision_at_1 value: 78 - type: precision_at_10 value: 69.39999999999999 - type: precision_at_100 value: 51.06 - type: precision_at_1000 value: 20.022000000000002 - type: precision_at_3 value: 76 - type: precision_at_5 value: 74.8 - type: recall_at_1 value: 0.211 - type: recall_at_10 value: 1.813 - type: recall_at_100 value: 12.098 - type: recall_at_1000 value: 42.618 - type: recall_at_3 value: 0.603 - type: recall_at_5 value: 0.987 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.2079999999999997 - type: map_at_10 value: 7.777000000000001 - type: map_at_100 value: 12.825000000000001 - type: map_at_1000 value: 14.196 - type: map_at_3 value: 4.285 - type: map_at_5 value: 6.177 - type: mrr_at_1 value: 30.612000000000002 - type: mrr_at_10 value: 42.635 - type: mrr_at_100 value: 
43.955 - type: mrr_at_1000 value: 43.955 - type: mrr_at_3 value: 38.435 - type: mrr_at_5 value: 41.088 - type: ndcg_at_1 value: 28.571 - type: ndcg_at_10 value: 20.666999999999998 - type: ndcg_at_100 value: 31.840000000000003 - type: ndcg_at_1000 value: 43.191 - type: ndcg_at_3 value: 23.45 - type: ndcg_at_5 value: 22.994 - type: precision_at_1 value: 30.612000000000002 - type: precision_at_10 value: 17.959 - type: precision_at_100 value: 6.755 - type: precision_at_1000 value: 1.4200000000000002 - type: precision_at_3 value: 23.810000000000002 - type: precision_at_5 value: 23.673 - type: recall_at_1 value: 2.2079999999999997 - type: recall_at_10 value: 13.144 - type: recall_at_100 value: 42.491 - type: recall_at_1000 value: 77.04299999999999 - type: recall_at_3 value: 5.3469999999999995 - type: recall_at_5 value: 9.139 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.9044 - type: ap value: 14.625783489340755 - type: f1 value: 54.814936562590546 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.94227504244483 - type: f1 value: 61.22516038508854 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 49.602409155145864 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.94641473445789 - type: cos_sim_ap value: 76.91572747061197 - type: cos_sim_f1 value: 70.14348097317529 - type: cos_sim_precision value: 66.53254437869822 - type: cos_sim_recall value: 74.1688654353562 - type: dot_accuracy value: 84.80061989628658 - type: dot_ap value: 70.7952548895177 - type: dot_f1 value: 65.44780728844965 - type: dot_precision value: 61.53310104529617 - type: dot_recall value: 69.89445910290237 - type: euclidean_accuracy value: 86.94641473445789 - type: euclidean_ap value: 76.80774009393652 - type: euclidean_f1 value: 70.30522503879979 - type: euclidean_precision value: 68.94977168949772 - type: euclidean_recall value: 71.71503957783642 - type: manhattan_accuracy value: 86.8629671574179 - type: manhattan_ap value: 76.76518632600317 - type: manhattan_f1 value: 70.16056518946692 - type: manhattan_precision value: 68.360450563204 - type: manhattan_recall value: 72.0580474934037 - type: max_accuracy value: 86.94641473445789 - type: max_ap value: 76.91572747061197 - type: max_f1 value: 70.30522503879979 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.10428066907285 - type: cos_sim_ap value: 86.25114759921435 - type: cos_sim_f1 value: 78.37857884586856 - type: cos_sim_precision value: 75.60818546078993 - type: cos_sim_recall value: 81.35971666153372 - type: dot_accuracy value: 87.41995575736406 - type: dot_ap value: 81.51838010086782 - type: dot_f1 value: 74.77398015435503 - type: dot_precision value: 
71.53002390662354 - type: dot_recall value: 78.32614721281182 - type: euclidean_accuracy value: 89.12368533395428 - type: euclidean_ap value: 86.33456799874504 - type: euclidean_f1 value: 78.45496750232127 - type: euclidean_precision value: 75.78388462366364 - type: euclidean_recall value: 81.32121958731136 - type: manhattan_accuracy value: 89.10622113556099 - type: manhattan_ap value: 86.31215061745333 - type: manhattan_f1 value: 78.40684906011539 - type: manhattan_precision value: 75.89536643366722 - type: manhattan_recall value: 81.09023714197721 - type: max_accuracy value: 89.12368533395428 - type: max_ap value: 86.33456799874504 - type: max_f1 value: 78.45496750232127 language: - en license: mit --- # E5-large-v2 Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022. This model has 24 layers and the embedding size is 1024. ## Usage Below is an example to encode queries and passages from the MS-MARCO passage ranking dataset (see the sketch below). ## Training Details Please refer to our paper, Text Embeddings by Weakly-Supervised Contrastive Pre-training (arXiv:2212.03533). ## Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## Support for Sentence Transformers Below is an example for usage with sentence_transformers. Package requirements: sentence_transformers. Contributors: michaelfeil ## FAQ **1. Do I need to add the prefix \"query: \" and \"passage: \" to input texts?** Yes, this is how the model is trained; otherwise you will see a performance degradation. Here are some rules of thumb: - Use \"query: \" and \"passage: \" correspondingly for asymmetric tasks such as passage retrieval in open QA and ad-hoc information retrieval. - Use the \"query: \" prefix for symmetric tasks such as semantic similarity and paraphrase retrieval. - Use the \"query: \" prefix if you want to use embeddings as features, such as for linear probing classification or clustering. **2. Why are my reproduced results slightly different from those reported in the model card?** Different versions of transformers and pytorch could cause negligible but non-zero performance differences. **3. Why do the cosine similarity scores distribute around 0.7 to 1.0?** This is a known and expected behavior, as we use a low temperature of 0.01 for the InfoNCE contrastive loss. For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores rather than their absolute values, so this should not be an issue. ## Citation If you find our paper or models helpful, please consider citing it as follows: ## Limitations This model only works for English texts. Long texts will be truncated to at most 512 tokens.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like classification, retrieval, clustering, and similarity measurement."
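The card above references usage snippets that were stripped when it was flattened into JSON. Below is a minimal sketch of the encoding recipe it describes, assuming the standard transformers API, average pooling over token embeddings, and the "query: "/"passage: " prefix convention from FAQ 1; the example texts are illustrative, not taken from the original card.

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Zero out padding positions, then average the remaining token embeddings.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


# Every input needs a "query: " or "passage: " prefix (see FAQ 1 above).
input_texts = [
    "query: how much protein should a female eat",
    "passage: As a general guideline, adult women need roughly 46 grams of protein per day.",
]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-large-v2")
model = AutoModel.from_pretrained("intfloat/e5-large-v2")

# Long texts are truncated to the model's 512-token limit.
batch = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

embeddings = F.normalize(average_pool(outputs.last_hidden_state, batch["attention_mask"]), p=2, dim=1)

# Cosine similarity between the query and the passage; scores cluster around 0.7-1.0.
print((embeddings[:1] @ embeddings[1:].T).item())
```

For the "Support for Sentence Transformers" section, an equivalent sketch (assuming the sentence_transformers package is installed) would be:

```python
from sentence_transformers import SentenceTransformer

# Prefixes are still required; the library does not add them automatically.
st_model = SentenceTransformer("intfloat/e5-large-v2")
st_embeddings = st_model.encode(input_texts, normalize_embeddings=True)
```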
+} \ No newline at end of file diff --git a/data/model_data_json/intfloat_e5-mistral-7b-instruct.json b/data/model_data_json/intfloat_e5-mistral-7b-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..3c07ce8276f2c76d8ffbd6365aa9571cc813a5d3 --- /dev/null +++ b/data/model_data_json/intfloat_e5-mistral-7b-instruct.json @@ -0,0 +1,27 @@ +{ + "model_id": "intfloat/e5-mistral-7b-instruct", + "downloads": 273550, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "mistral", + "feature-extraction", + "mteb", + "transformers", + "en", + "arxiv:2401.00368", + "arxiv:2104.08663", + "arxiv:2210.07316", + "arxiv:2212.03533", + "license:mit", + "model-index", + "autotrain_compatible", + "text-generation-inference", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-transformers - transformers model-index: - name: e5-mistral-7b-instruct results: - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: None metrics: - type: cos_sim_pearson value: 37.863226091673866 - type: cos_sim_spearman value: 38.98733013335281 - type: euclidean_pearson value: 37.51783380497874 - type: euclidean_spearman value: 38.98733012753365 - type: manhattan_pearson value: 37.26706888081721 - type: manhattan_spearman value: 38.709750161903834 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 43.33924583134623 - type: cos_sim_spearman value: 42.84316155158754 - type: euclidean_pearson value: 45.62709879515238 - type: euclidean_spearman value: 42.843155921732404 - type: manhattan_pearson value: 45.4786950991229 - type: manhattan_spearman value: 42.657334751855984 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 78.68656716417911 - type: ap value: 41.71522322900398 - type: f1 value: 72.37207703532552 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (de) config: de split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.04710920770879 - type: ap value: 83.42622221864045 - type: f1 value: 72.14388257905772 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en-ext) config: en-ext split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 77.93103448275862 - type: ap value: 26.039284760509513 - type: f1 value: 64.81092954450712 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (ja) config: ja split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 77.21627408993577 - type: ap value: 24.876490553983036 - type: f1 value: 63.8773359684989 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 95.90679999999999 - type: ap value: 94.32357863164454 - type: f1 value: 95.90485634708557 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 
1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 55.786 - type: f1 value: 55.31211995815146 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 53.26 - type: f1 value: 52.156230111544986 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 50.33 - type: f1 value: 49.195023008878145 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 49.3 - type: f1 value: 48.434470184108 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.68599999999999 - type: f1 value: 47.62681775202072 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 46.238 - type: f1 value: 45.014030559653705 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 36.486000000000004 - type: map_at_10 value: 53.076 - type: map_at_100 value: 53.657999999999994 - type: map_at_1000 value: 53.659 - type: map_at_3 value: 48.234 - type: map_at_5 value: 51.121 - type: mrr_at_1 value: 37.269000000000005 - type: mrr_at_10 value: 53.335 - type: mrr_at_100 value: 53.916 - type: mrr_at_1000 value: 53.918 - type: mrr_at_3 value: 48.518 - type: mrr_at_5 value: 51.406 - type: ndcg_at_1 value: 36.486000000000004 - type: ndcg_at_10 value: 61.882000000000005 - type: ndcg_at_100 value: 64.165 - type: ndcg_at_1000 value: 64.203 - type: ndcg_at_3 value: 52.049 - type: ndcg_at_5 value: 57.199 - type: precision_at_1 value: 36.486000000000004 - type: precision_at_10 value: 8.982999999999999 - type: precision_at_100 value: 0.9939999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 21.029 - type: precision_at_5 value: 15.092 - type: recall_at_1 value: 36.486000000000004 - type: recall_at_10 value: 89.82900000000001 - type: recall_at_100 value: 99.36 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 63.087 - type: recall_at_5 value: 75.46199999999999 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 50.45119266859667 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 45.4958298992051 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 66.98177472838887 - type: mrr value: 79.91854636591478 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: 
d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 87.67086498650698 - type: cos_sim_spearman value: 85.54773239564638 - type: euclidean_pearson value: 86.48229161588425 - type: euclidean_spearman value: 85.54773239564638 - type: manhattan_pearson value: 86.67533327742343 - type: manhattan_spearman value: 85.76099026691983 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: None metrics: - type: cos_sim_pearson value: 50.31998888922809 - type: cos_sim_spearman value: 50.6369940530675 - type: euclidean_pearson value: 50.055544636296055 - type: euclidean_spearman value: 50.63699405154838 - type: manhattan_pearson value: 50.00739378036807 - type: manhattan_spearman value: 50.607237418676945 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.5615866388309 - type: f1 value: 99.49895615866389 - type: precision value: 99.46764091858039 - type: recall value: 99.5615866388309 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.19656614571869 - type: f1 value: 99.08650671362535 - type: precision value: 99.0314769975787 - type: recall value: 99.19656614571869 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 98.0256321440942 - type: f1 value: 97.83743216718624 - type: precision value: 97.74390947927492 - type: recall value: 98.0256321440942 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.26276987888363 - type: f1 value: 99.22766368264 - type: precision value: 99.21011058451816 - type: recall value: 99.26276987888363 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 88.22727272727272 - type: f1 value: 88.17411732496673 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 43.530637846246975 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 40.23505728593893 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: None metrics: - type: v_measure value: 44.419028279451275 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: None metrics: - type: v_measure value: 42.5820277929776 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 77.67811726152972 - type: mrr value: 80.99003968253969 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: 
- type: map value: 78.66055354534922 - type: mrr value: 81.66119047619047 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.162333333333333 - type: map_at_10 value: 37.22291666666667 - type: map_at_100 value: 38.56733333333333 - type: map_at_1000 value: 38.684250000000006 - type: map_at_3 value: 34.22858333333333 - type: map_at_5 value: 35.852500000000006 - type: mrr_at_1 value: 32.459833333333336 - type: mrr_at_10 value: 41.65358333333333 - type: mrr_at_100 value: 42.566916666666664 - type: mrr_at_1000 value: 42.61766666666667 - type: mrr_at_3 value: 39.210499999999996 - type: mrr_at_5 value: 40.582166666666666 - type: ndcg_at_1 value: 32.459833333333336 - type: ndcg_at_10 value: 42.96758333333333 - type: ndcg_at_100 value: 48.5065 - type: ndcg_at_1000 value: 50.556583333333336 - type: ndcg_at_3 value: 38.004416666666664 - type: ndcg_at_5 value: 40.25916666666667 - type: precision_at_1 value: 32.459833333333336 - type: precision_at_10 value: 7.664583333333333 - type: precision_at_100 value: 1.2349999999999999 - type: precision_at_1000 value: 0.15966666666666668 - type: precision_at_3 value: 17.731166666666663 - type: precision_at_5 value: 12.575333333333335 - type: recall_at_1 value: 27.162333333333333 - type: recall_at_10 value: 55.44158333333334 - type: recall_at_100 value: 79.56966666666666 - type: recall_at_1000 value: 93.45224999999999 - type: recall_at_3 value: 41.433083333333336 - type: recall_at_5 value: 47.31108333333333 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 16.539 - type: map_at_10 value: 28.494999999999997 - type: map_at_100 value: 30.568 - type: map_at_1000 value: 30.741000000000003 - type: map_at_3 value: 23.846999999999998 - type: map_at_5 value: 26.275 - type: mrr_at_1 value: 37.394 - type: mrr_at_10 value: 50.068 - type: mrr_at_100 value: 50.727 - type: mrr_at_1000 value: 50.751000000000005 - type: mrr_at_3 value: 46.938 - type: mrr_at_5 value: 48.818 - type: ndcg_at_1 value: 37.394 - type: ndcg_at_10 value: 38.349 - type: ndcg_at_100 value: 45.512 - type: ndcg_at_1000 value: 48.321 - type: ndcg_at_3 value: 32.172 - type: ndcg_at_5 value: 34.265 - type: precision_at_1 value: 37.394 - type: precision_at_10 value: 11.927999999999999 - type: precision_at_100 value: 1.966 - type: precision_at_1000 value: 0.25 - type: precision_at_3 value: 24.126 - type: precision_at_5 value: 18.306 - type: recall_at_1 value: 16.539 - type: recall_at_10 value: 44.504 - type: recall_at_100 value: 68.605 - type: recall_at_1000 value: 84.1 - type: recall_at_3 value: 29.008 - type: recall_at_5 value: 35.58 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 19.482 - type: map_at_10 value: 28.622999999999998 - type: map_at_100 value: 30.262 - type: map_at_1000 value: 30.432 - type: map_at_3 value: 25.647 - type: map_at_5 value: 27.128000000000004 - type: mrr_at_1 value: 30.408 - type: mrr_at_10 value: 37.188 - type: mrr_at_100 value: 38.196000000000005 - type: mrr_at_1000 value: 38.273 - type: mrr_at_3 value: 35.067 - type: mrr_at_5 value: 36.124 - type: ndcg_at_1 value: 30.408 - type: ndcg_at_10 value: 34.215 - type: ndcg_at_100 value: 41.349999999999994 - type: ndcg_at_1000 value: 44.689 - type: ndcg_at_3 value: 30.264999999999997 - type: ndcg_at_5 value: 
31.572 - type: precision_at_1 value: 30.408 - type: precision_at_10 value: 7.6770000000000005 - type: precision_at_100 value: 1.352 - type: precision_at_1000 value: 0.178 - type: precision_at_3 value: 17.213 - type: precision_at_5 value: 12.198 - type: recall_at_1 value: 19.482 - type: recall_at_10 value: 42.368 - type: recall_at_100 value: 72.694 - type: recall_at_1000 value: 95.602 - type: recall_at_3 value: 30.101 - type: recall_at_5 value: 34.708 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: None metrics: - type: cos_sim_accuracy value: 71.16055321707758 - type: cos_sim_ap value: 80.21073839711723 - type: cos_sim_f1 value: 72.9740932642487 - type: cos_sim_precision value: 65.53136050623488 - type: cos_sim_recall value: 82.3240589198036 - type: dot_accuracy value: 71.16055321707758 - type: dot_ap value: 80.212299264122 - type: dot_f1 value: 72.9740932642487 - type: dot_precision value: 65.53136050623488 - type: dot_recall value: 82.3240589198036 - type: euclidean_accuracy value: 71.16055321707758 - type: euclidean_ap value: 80.21076298680417 - type: euclidean_f1 value: 72.9740932642487 - type: euclidean_precision value: 65.53136050623488 - type: euclidean_recall value: 82.3240589198036 - type: manhattan_accuracy value: 70.71557426337944 - type: manhattan_ap value: 79.93448977199749 - type: manhattan_f1 value: 72.83962726826877 - type: manhattan_precision value: 62.7407908077053 - type: manhattan_recall value: 86.81318681318682 - type: max_accuracy value: 71.16055321707758 - type: max_ap value: 80.212299264122 - type: max_f1 value: 72.9740932642487 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 60.643 - type: map_at_10 value: 69.011 - type: map_at_100 value: 69.533 - type: map_at_1000 value: 69.545 - type: map_at_3 value: 67.167 - type: map_at_5 value: 68.12700000000001 - type: mrr_at_1 value: 60.801 - type: mrr_at_10 value: 69.111 - type: mrr_at_100 value: 69.6 - type: mrr_at_1000 value: 69.611 - type: mrr_at_3 value: 67.229 - type: mrr_at_5 value: 68.214 - type: ndcg_at_1 value: 60.801 - type: ndcg_at_10 value: 73.128 - type: ndcg_at_100 value: 75.614 - type: ndcg_at_1000 value: 75.92 - type: ndcg_at_3 value: 69.261 - type: ndcg_at_5 value: 70.973 - type: precision_at_1 value: 60.801 - type: precision_at_10 value: 8.662 - type: precision_at_100 value: 0.9860000000000001 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 25.149 - type: precision_at_5 value: 15.953999999999999 - type: recall_at_1 value: 60.643 - type: recall_at_10 value: 85.959 - type: recall_at_100 value: 97.576 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 75.184 - type: recall_at_5 value: 79.32000000000001 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 10.183 - type: map_at_10 value: 23.958 - type: map_at_100 value: 34.354 - type: map_at_1000 value: 36.442 - type: map_at_3 value: 16.345000000000002 - type: map_at_5 value: 19.647000000000002 - type: mrr_at_1 value: 74.25 - type: mrr_at_10 value: 80.976 - type: mrr_at_100 value: 81.256 - type: mrr_at_1000 value: 81.262 - type: mrr_at_3 value: 79.958 - type: mrr_at_5 value: 80.37100000000001 - type: ndcg_at_1 value: 62.0 - type: ndcg_at_10 value: 48.894999999999996 - type: ndcg_at_100 value: 53.867 - type: ndcg_at_1000 value: 61.304 - type: 
ndcg_at_3 value: 53.688 - type: ndcg_at_5 value: 50.900999999999996 - type: precision_at_1 value: 74.25 - type: precision_at_10 value: 39.525 - type: precision_at_100 value: 12.323 - type: precision_at_1000 value: 2.539 - type: precision_at_3 value: 57.49999999999999 - type: precision_at_5 value: 49.1 - type: recall_at_1 value: 10.183 - type: recall_at_10 value: 29.296 - type: recall_at_100 value: 60.394999999999996 - type: recall_at_1000 value: 83.12 - type: recall_at_3 value: 17.495 - type: recall_at_5 value: 22.235 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 26.613999999999997 - type: map_at_10 value: 79.77300000000001 - type: map_at_100 value: 82.71 - type: map_at_1000 value: 82.75 - type: map_at_3 value: 55.92700000000001 - type: map_at_5 value: 70.085 - type: mrr_at_1 value: 90.7 - type: mrr_at_10 value: 93.438 - type: mrr_at_100 value: 93.504 - type: mrr_at_1000 value: 93.50699999999999 - type: mrr_at_3 value: 93.125 - type: mrr_at_5 value: 93.34 - type: ndcg_at_1 value: 90.7 - type: ndcg_at_10 value: 87.023 - type: ndcg_at_100 value: 90.068 - type: ndcg_at_1000 value: 90.43299999999999 - type: ndcg_at_3 value: 86.339 - type: ndcg_at_5 value: 85.013 - type: precision_at_1 value: 90.7 - type: precision_at_10 value: 41.339999999999996 - type: precision_at_100 value: 4.806 - type: precision_at_1000 value: 0.48900000000000005 - type: precision_at_3 value: 76.983 - type: precision_at_5 value: 64.69 - type: recall_at_1 value: 26.613999999999997 - type: recall_at_10 value: 87.681 - type: recall_at_100 value: 97.44699999999999 - type: recall_at_1000 value: 99.348 - type: recall_at_3 value: 57.809999999999995 - type: recall_at_5 value: 74.258 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 30.9 - type: map_at_10 value: 40.467 - type: map_at_100 value: 41.423 - type: map_at_1000 value: 41.463 - type: map_at_3 value: 37.25 - type: map_at_5 value: 39.31 - type: mrr_at_1 value: 30.9 - type: mrr_at_10 value: 40.467 - type: mrr_at_100 value: 41.423 - type: mrr_at_1000 value: 41.463 - type: mrr_at_3 value: 37.25 - type: mrr_at_5 value: 39.31 - type: ndcg_at_1 value: 30.9 - type: ndcg_at_10 value: 45.957 - type: ndcg_at_100 value: 50.735 - type: ndcg_at_1000 value: 51.861999999999995 - type: ndcg_at_3 value: 39.437 - type: ndcg_at_5 value: 43.146 - type: precision_at_1 value: 30.9 - type: precision_at_10 value: 6.35 - type: precision_at_100 value: 0.861 - type: precision_at_1000 value: 0.095 - type: precision_at_3 value: 15.267 - type: precision_at_5 value: 10.96 - type: recall_at_1 value: 30.9 - type: recall_at_10 value: 63.5 - type: recall_at_100 value: 86.1 - type: recall_at_1000 value: 95.1 - type: recall_at_3 value: 45.800000000000004 - type: recall_at_5 value: 54.800000000000004 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 49.765 - type: f1 value: 45.93242203574485 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 75.138 - type: map_at_10 value: 84.21300000000001 - type: map_at_100 value: 84.43 - type: map_at_1000 value: 84.441 - type: map_at_3 value: 83.071 - type: map_at_5 value: 83.853 - type: mrr_at_1 value: 80.948 - type: 
mrr_at_10 value: 88.175 - type: mrr_at_100 value: 88.24 - type: mrr_at_1000 value: 88.241 - type: mrr_at_3 value: 87.516 - type: mrr_at_5 value: 87.997 - type: ndcg_at_1 value: 80.948 - type: ndcg_at_10 value: 87.84100000000001 - type: ndcg_at_100 value: 88.576 - type: ndcg_at_1000 value: 88.75699999999999 - type: ndcg_at_3 value: 86.176 - type: ndcg_at_5 value: 87.214 - type: precision_at_1 value: 80.948 - type: precision_at_10 value: 10.632 - type: precision_at_100 value: 1.123 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 33.193 - type: precision_at_5 value: 20.663 - type: recall_at_1 value: 75.138 - type: recall_at_10 value: 94.89699999999999 - type: recall_at_100 value: 97.751 - type: recall_at_1000 value: 98.833 - type: recall_at_3 value: 90.455 - type: recall_at_5 value: 93.085 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 29.45 - type: map_at_10 value: 48.596000000000004 - type: map_at_100 value: 50.70400000000001 - type: map_at_1000 value: 50.83800000000001 - type: map_at_3 value: 42.795 - type: map_at_5 value: 46.085 - type: mrr_at_1 value: 56.172999999999995 - type: mrr_at_10 value: 64.35300000000001 - type: mrr_at_100 value: 64.947 - type: mrr_at_1000 value: 64.967 - type: mrr_at_3 value: 62.653999999999996 - type: mrr_at_5 value: 63.534 - type: ndcg_at_1 value: 56.172999999999995 - type: ndcg_at_10 value: 56.593 - type: ndcg_at_100 value: 62.942 - type: ndcg_at_1000 value: 64.801 - type: ndcg_at_3 value: 53.024 - type: ndcg_at_5 value: 53.986999999999995 - type: precision_at_1 value: 56.172999999999995 - type: precision_at_10 value: 15.494 - type: precision_at_100 value: 2.222 - type: precision_at_1000 value: 0.254 - type: precision_at_3 value: 35.185 - type: precision_at_5 value: 25.556 - type: recall_at_1 value: 29.45 - type: recall_at_10 value: 62.882000000000005 - type: recall_at_100 value: 85.56099999999999 - type: recall_at_1000 value: 96.539 - type: recall_at_3 value: 47.911 - type: recall_at_5 value: 54.52 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.581 - type: map_at_10 value: 68.401 - type: map_at_100 value: 69.207 - type: map_at_1000 value: 69.25200000000001 - type: map_at_3 value: 64.689 - type: map_at_5 value: 67.158 - type: mrr_at_1 value: 79.163 - type: mrr_at_10 value: 85.22999999999999 - type: mrr_at_100 value: 85.386 - type: mrr_at_1000 value: 85.39099999999999 - type: mrr_at_3 value: 84.432 - type: mrr_at_5 value: 84.952 - type: ndcg_at_1 value: 79.163 - type: ndcg_at_10 value: 75.721 - type: ndcg_at_100 value: 78.411 - type: ndcg_at_1000 value: 79.23599999999999 - type: ndcg_at_3 value: 70.68799999999999 - type: ndcg_at_5 value: 73.694 - type: precision_at_1 value: 79.163 - type: precision_at_10 value: 16.134 - type: precision_at_100 value: 1.821 - type: precision_at_1000 value: 0.193 - type: precision_at_3 value: 46.446 - type: precision_at_5 value: 30.242 - type: recall_at_1 value: 39.581 - type: recall_at_10 value: 80.66799999999999 - type: recall_at_100 value: 91.033 - type: recall_at_1000 value: 96.408 - type: recall_at_3 value: 69.669 - type: recall_at_5 value: 75.604 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: None metrics: - type: accuracy value: 45.04809542131589 - type: f1 value: 37.01181779071118 - task: type: Classification 
dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 94.78120000000001 - type: ap value: 92.52931921594387 - type: f1 value: 94.77902110732532 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: None metrics: - type: accuracy value: 85.81613508442777 - type: ap value: 52.430320593468394 - type: f1 value: 79.95467268178068 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 71.05801751913393 - type: cos_sim_spearman value: 75.47954644971965 - type: euclidean_pearson value: 74.27472296759713 - type: euclidean_spearman value: 75.47954201369866 - type: manhattan_pearson value: 74.30508190186474 - type: manhattan_spearman value: 75.51326518159436 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 24.21110921666315 - type: mrr value: 22.863492063492064 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 61.38400000000001 - type: map_at_10 value: 70.895 - type: map_at_100 value: 71.314 - type: map_at_1000 value: 71.331 - type: map_at_3 value: 69.016 - type: map_at_5 value: 70.179 - type: mrr_at_1 value: 63.481 - type: mrr_at_10 value: 71.543 - type: mrr_at_100 value: 71.91300000000001 - type: mrr_at_1000 value: 71.928 - type: mrr_at_3 value: 69.90899999999999 - type: mrr_at_5 value: 70.907 - type: ndcg_at_1 value: 63.481 - type: ndcg_at_10 value: 74.833 - type: ndcg_at_100 value: 76.705 - type: ndcg_at_1000 value: 77.13600000000001 - type: ndcg_at_3 value: 71.236 - type: ndcg_at_5 value: 73.199 - type: precision_at_1 value: 63.481 - type: precision_at_10 value: 9.179 - type: precision_at_100 value: 1.011 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 27.044 - type: precision_at_5 value: 17.272000000000002 - type: recall_at_1 value: 61.38400000000001 - type: recall_at_10 value: 86.318 - type: recall_at_100 value: 94.786 - type: recall_at_1000 value: 98.14500000000001 - type: recall_at_3 value: 76.717 - type: recall_at_5 value: 81.416 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.363999999999997 - type: map_at_10 value: 36.022 - type: map_at_100 value: 37.229 - type: map_at_1000 value: 37.274 - type: map_at_3 value: 32.131 - type: map_at_5 value: 34.391 - type: mrr_at_1 value: 24.069 - type: mrr_at_10 value: 36.620000000000005 - type: mrr_at_100 value: 37.769999999999996 - type: mrr_at_1000 value: 37.809 - type: mrr_at_3 value: 32.846 - type: mrr_at_5 value: 35.02 - type: ndcg_at_1 value: 24.069 - type: ndcg_at_10 value: 43.056 - type: ndcg_at_100 value: 48.754 - type: ndcg_at_1000 value: 49.829 - type: ndcg_at_3 value: 35.167 - type: ndcg_at_5 value: 39.168 - type: precision_at_1 value: 24.069 - type: precision_at_10 value: 6.762 - type: precision_at_100 value: 0.96 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 14.957 - type: precision_at_5 value: 11.023 - type: recall_at_1 value: 23.363999999999997 - type: recall_at_10 value: 64.696 - type: recall_at_100 value: 90.795 - type: recall_at_1000 value: 98.892 - type: recall_at_3 value: 43.247 - type: 
recall_at_5 value: 52.86300000000001 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 96.11947104423166 - type: f1 value: 95.89561841159332 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.97548605240912 - type: f1 value: 92.17133696717212 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.37224816544364 - type: f1 value: 93.19978829237863 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 91.28719072972127 - type: f1 value: 91.28448045979604 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.8131946934385 - type: f1 value: 88.27883019362747 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 85.52260397830018 - type: f1 value: 85.15528226728568 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 86.10807113543093 - type: f1 value: 70.88498219072167 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.77120315581854 - type: f1 value: 57.97153920153224 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 79.93995997331554 - type: f1 value: 58.839203810064866 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.801440651425 - type: f1 value: 58.68009647839332 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.90785227680172 - type: f1 value: 49.83760954655788 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 73.24050632911391 - type: f1 value: 52.0562553541082 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.47948890383321 - type: f1 value: 63.334877563135485 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB 
MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 44.2871553463349 - type: f1 value: 43.17658050605427 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.174176193678555 - type: f1 value: 59.236659587042425 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.226630800269 - type: f1 value: 60.951842696956184 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.94283792871555 - type: f1 value: 61.40057652844215 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.480833893745796 - type: f1 value: 52.5298332072816 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.52858103564223 - type: f1 value: 69.3770851919204 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.09213180901143 - type: f1 value: 71.13518469365879 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.31203765971756 - type: f1 value: 66.05906970865144 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 80.57162071284465 - type: f1 value: 77.7866172598823 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 75.09414929388029 - type: f1 value: 72.5712594833695 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.20914593140553 - type: f1 value: 68.90619124909186 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.74243443174176 - type: f1 value: 64.72743141749955 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 75.11096166778749 - type: f1 value: 72.61849933064694 - task: type: Classification dataset: type: 
mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.22394082044384 - type: f1 value: 62.43648797607235 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.44855413584399 - type: f1 value: 66.56851670913659 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.4149293880296 - type: f1 value: 66.12960877904776 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.916610625420304 - type: f1 value: 54.02534600927991 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.71351714862138 - type: f1 value: 69.70227985126316 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.91257565568257 - type: f1 value: 57.06811572144974 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 75.25218560860793 - type: f1 value: 72.48057563104247 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.35507733691998 - type: f1 value: 73.03024649541128 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.918628110289184 - type: f1 value: 54.75590124456177 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 52.548755884330866 - type: f1 value: 51.5356975360209 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 46.44922663080027 - type: f1 value: 44.561114416830975 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.95763281775386 - type: f1 value: 50.68367245122476 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.20645595158035 - type: f1 value: 71.78450093258185 - task: 
type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.226630800269 - type: f1 value: 57.53988988993337 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.44922663080027 - type: f1 value: 48.58809018065056 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.3752521856086 - type: f1 value: 49.91373941436425 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.85205110961668 - type: f1 value: 67.05660019588582 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 49.1492938802959 - type: f1 value: 46.717578025393195 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.93140551445865 - type: f1 value: 67.45406609372205 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.82851378614662 - type: f1 value: 71.15951964393868 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.84868863483524 - type: f1 value: 71.76056802364877 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 75.27236045729657 - type: f1 value: 72.48733090101163 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.63012777404168 - type: f1 value: 66.56444015346203 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.62743779421655 - type: f1 value: 73.82720656992142 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.15198386012105 - type: f1 value: 64.41418309797744 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.8399462004035 - type: f1 
value: 56.050989519693886 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.86684599865501 - type: f1 value: 70.80682480844303 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.36718224613316 - type: f1 value: 54.998746471013774 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.150638870208475 - type: f1 value: 49.79179342620099 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.50638870208473 - type: f1 value: 49.778960742003555 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.906523201076 - type: f1 value: 66.75784022138245 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.73234700739744 - type: f1 value: 65.75016141148413 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.06792199058508 - type: f1 value: 67.90334782594083 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.09145931405515 - type: f1 value: 58.88703095210731 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.17014122394083 - type: f1 value: 68.43676277921544 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.99327505043712 - type: f1 value: 72.26813373392943 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.13987895090787 - type: f1 value: 70.29309514467575 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.37256220578345 - type: f1 value: 72.56456170538992 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 
metrics: - type: accuracy value: 47.205783456624076 - type: f1 value: 45.905999859074434 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.8352387357095 - type: f1 value: 69.43553987525273 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.00403496973773 - type: f1 value: 65.97477215779143 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.04976462676531 - type: f1 value: 67.24581993778398 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.882985877605925 - type: f1 value: 59.995293199988794 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.75857431069267 - type: f1 value: 76.52031675299841 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.03496973772697 - type: f1 value: 79.25548063175344 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.96570275722931 - type: f1 value: 72.19110435289122 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 82.38735709482178 - type: f1 value: 82.34495627619785 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.83994620040352 - type: f1 value: 78.91526355393667 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.7350369872226 - type: f1 value: 75.919437344927 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.21721587088096 - type: f1 value: 70.82973286243262 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.59784801613988 - type: f1 value: 78.47383161087423 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB 
MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.64021519838602 - type: f1 value: 68.45118053027653 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.51042367182245 - type: f1 value: 72.90013022879003 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.0551445864156 - type: f1 value: 73.45871761713292 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.54606590450571 - type: f1 value: 57.72711794953869 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.40753194351042 - type: f1 value: 76.8157455506521 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.58372562205783 - type: f1 value: 65.2654868709758 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.39273705447208 - type: f1 value: 78.3592956594837 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.62004034969739 - type: f1 value: 79.78673754501855 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.29051782111634 - type: f1 value: 63.12502587609454 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.51849361129791 - type: f1 value: 56.32320906403241 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 52.41761936785474 - type: f1 value: 49.113762010098306 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.547410894418284 - type: f1 value: 56.87580674198118 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.89038332212507 - type: f1 value: 79.09210140529848 
- task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.503698722259585 - type: f1 value: 61.45718858568352 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 54.02824478816408 - type: f1 value: 52.732738981386504 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 54.23671822461331 - type: f1 value: 52.688080372545286 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.5312710154674 - type: f1 value: 74.59368478550698 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 52.192333557498316 - type: f1 value: 50.18302290152229 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.6960322797579 - type: f1 value: 75.25331182714856 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.47679892400808 - type: f1 value: 78.24044732352424 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.36718224613315 - type: f1 value: 77.2714452985389 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.96234028244788 - type: f1 value: 78.21282127011372 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.19435104236717 - type: f1 value: 73.1963711292812 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.52118359112306 - type: f1 value: 80.4179964390288 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.65837256220577 - type: f1 value: 73.07156989634905 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 
metrics: - type: accuracy value: 64.02824478816409 - type: f1 value: 62.972399027713664 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.87020847343645 - type: f1 value: 78.224240866849 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.6570275722932 - type: f1 value: 63.274871811412545 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.760591795561524 - type: f1 value: 56.73711528075771 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.26967047747142 - type: f1 value: 55.74735330863165 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.46133154001345 - type: f1 value: 71.9644168952811 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.70880968392737 - type: f1 value: 73.61543141070884 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.0437121721587 - type: f1 value: 74.83359868879921 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.05110961667788 - type: f1 value: 66.25869819274315 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.52118359112306 - type: f1 value: 75.92098546052303 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.92938802958977 - type: f1 value: 79.79833572573796 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.86617350369872 - type: f1 value: 77.42645654909516 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 44.6 - type: map_at_10 value: 50.019000000000005 - type: map_at_100 value: 50.611 - type: map_at_1000 value: 50.67 - type: map_at_3 value: 48.699999999999996 - type: map_at_5 value: 49.455 - type: mrr_at_1 
value: 44.800000000000004 - type: mrr_at_10 value: 50.119 - type: mrr_at_100 value: 50.711 - type: mrr_at_1000 value: 50.77 - type: mrr_at_3 value: 48.8 - type: mrr_at_5 value: 49.555 - type: ndcg_at_1 value: 44.6 - type: ndcg_at_10 value: 52.754 - type: ndcg_at_100 value: 55.935 - type: ndcg_at_1000 value: 57.607 - type: ndcg_at_3 value: 50.012 - type: ndcg_at_5 value: 51.393 - type: precision_at_1 value: 44.6 - type: precision_at_10 value: 6.140000000000001 - type: precision_at_100 value: 0.77 - type: precision_at_1000 value: 0.09 - type: precision_at_3 value: 17.933 - type: precision_at_5 value: 11.44 - type: recall_at_1 value: 44.6 - type: recall_at_10 value: 61.4 - type: recall_at_100 value: 77.0 - type: recall_at_1000 value: 90.4 - type: recall_at_3 value: 53.800000000000004 - type: recall_at_5 value: 57.199999999999996 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 38.192667527616315 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 37.44738902946689 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.59661273103955 - type: mrr value: 33.82024242497473 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: None metrics: - type: accuracy value: 73.31333333333335 - type: f1 value: 73.0873466527602 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.471 - type: map_at_10 value: 14.142 - type: map_at_100 value: 18.179000000000002 - type: map_at_1000 value: 19.772000000000002 - type: map_at_3 value: 9.716 - type: map_at_5 value: 11.763 - type: mrr_at_1 value: 51.393 - type: mrr_at_10 value: 58.814 - type: mrr_at_100 value: 59.330000000000005 - type: mrr_at_1000 value: 59.35 - type: mrr_at_3 value: 56.398 - type: mrr_at_5 value: 58.038999999999994 - type: ndcg_at_1 value: 49.69 - type: ndcg_at_10 value: 38.615 - type: ndcg_at_100 value: 35.268 - type: ndcg_at_1000 value: 43.745 - type: ndcg_at_3 value: 43.187 - type: ndcg_at_5 value: 41.528999999999996 - type: precision_at_1 value: 51.083999999999996 - type: precision_at_10 value: 29.474 - type: precision_at_100 value: 9.167 - type: precision_at_1000 value: 2.2089999999999996 - type: precision_at_3 value: 40.351 - type: precision_at_5 value: 36.285000000000004 - type: recall_at_1 value: 5.471 - type: recall_at_10 value: 19.242 - type: recall_at_100 value: 37.14 - type: recall_at_1000 value: 68.35900000000001 - type: recall_at_3 value: 10.896 - type: recall_at_5 value: 14.75 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 39.499 - type: map_at_10 value: 55.862 - type: map_at_100 value: 56.667 - type: map_at_1000 value: 56.684999999999995 - type: map_at_3 value: 51.534 - type: map_at_5 value: 54.2 - type: mrr_at_1 value: 44.351 - type: mrr_at_10 value: 58.567 - type: mrr_at_100 value: 59.099000000000004 - type: mrr_at_1000 value: 59.109 - type: mrr_at_3 value: 55.218999999999994 - type: mrr_at_5 
value: 57.391999999999996 - type: ndcg_at_1 value: 44.322 - type: ndcg_at_10 value: 63.535 - type: ndcg_at_100 value: 66.654 - type: ndcg_at_1000 value: 66.991 - type: ndcg_at_3 value: 55.701 - type: ndcg_at_5 value: 60.06700000000001 - type: precision_at_1 value: 44.322 - type: precision_at_10 value: 10.026 - type: precision_at_100 value: 1.18 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 24.865000000000002 - type: precision_at_5 value: 17.48 - type: recall_at_1 value: 39.499 - type: recall_at_10 value: 84.053 - type: recall_at_100 value: 97.11 - type: recall_at_1000 value: 99.493 - type: recall_at_3 value: 64.091 - type: recall_at_5 value: 74.063 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: None metrics: - type: cos_sim_accuracy value: 61.18029236599891 - type: cos_sim_ap value: 64.18398769398412 - type: cos_sim_f1 value: 67.96347757046446 - type: cos_sim_precision value: 54.4529262086514 - type: cos_sim_recall value: 90.3907074973601 - type: dot_accuracy value: 61.18029236599891 - type: dot_ap value: 64.18393484706077 - type: dot_f1 value: 67.96347757046446 - type: dot_precision value: 54.4529262086514 - type: dot_recall value: 90.3907074973601 - type: euclidean_accuracy value: 61.18029236599891 - type: euclidean_ap value: 64.18395024821486 - type: euclidean_f1 value: 67.96347757046446 - type: euclidean_precision value: 54.4529262086514 - type: euclidean_recall value: 90.3907074973601 - type: manhattan_accuracy value: 61.451001624255554 - type: manhattan_ap value: 64.38232708763513 - type: manhattan_f1 value: 68.05860805860804 - type: manhattan_precision value: 52.10319685922602 - type: manhattan_recall value: 98.09926082365365 - type: max_accuracy value: 61.451001624255554 - type: max_ap value: 64.38232708763513 - type: max_f1 value: 68.05860805860804 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: None metrics: - type: accuracy value: 92.19000000000001 - type: ap value: 89.73918431886767 - type: f1 value: 92.17175032574507 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: None metrics: - type: cos_sim_pearson value: 15.079320253752224 - type: cos_sim_spearman value: 16.813772504404263 - type: euclidean_pearson value: 19.476541162041762 - type: euclidean_spearman value: 16.813772498098782 - type: manhattan_pearson value: 19.497429832915277 - type: manhattan_spearman value: 16.869600674180607 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 30.36139599797913 - type: cos_sim_spearman value: 31.80296402851347 - type: euclidean_pearson value: 30.10387888252793 - type: euclidean_spearman value: 31.80297780103808 - type: manhattan_pearson value: 30.86720382849436 - type: manhattan_spearman value: 32.70491131366606 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.911 - type: map_at_10 value: 86.087 - type: map_at_100 value: 86.701 - type: map_at_1000 value: 86.715 - type: map_at_3 value: 83.231 - type: map_at_5 value: 85.051 - type: mrr_at_1 value: 82.75 - type: mrr_at_10 value: 88.759 - type: mrr_at_100 value: 88.844 - type: mrr_at_1000 value: 88.844 - type: mrr_at_3 value: 87.935 - type: mrr_at_5 value: 88.504 - type: ndcg_at_1 value: 82.75 - type: 
ndcg_at_10 value: 89.605 - type: ndcg_at_100 value: 90.664 - type: ndcg_at_1000 value: 90.733 - type: ndcg_at_3 value: 87.03 - type: ndcg_at_5 value: 88.473 - type: precision_at_1 value: 82.75 - type: precision_at_10 value: 13.575000000000001 - type: precision_at_100 value: 1.539 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 38.153 - type: precision_at_5 value: 25.008000000000003 - type: recall_at_1 value: 71.911 - type: recall_at_10 value: 96.261 - type: recall_at_100 value: 99.72800000000001 - type: recall_at_1000 value: 99.993 - type: recall_at_3 value: 88.762 - type: recall_at_5 value: 92.949 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 57.711581165572376 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 66.48938885750297 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 3.7379999999999995 - type: map_at_10 value: 9.261 - type: map_at_100 value: 11.001 - type: map_at_1000 value: 11.262 - type: map_at_3 value: 6.816 - type: map_at_5 value: 8.0 - type: mrr_at_1 value: 18.4 - type: mrr_at_10 value: 28.755999999999997 - type: mrr_at_100 value: 29.892000000000003 - type: mrr_at_1000 value: 29.961 - type: mrr_at_3 value: 25.467000000000002 - type: mrr_at_5 value: 27.332 - type: ndcg_at_1 value: 18.4 - type: ndcg_at_10 value: 16.296 - type: ndcg_at_100 value: 23.52 - type: ndcg_at_1000 value: 28.504 - type: ndcg_at_3 value: 15.485 - type: ndcg_at_5 value: 13.471 - type: precision_at_1 value: 18.4 - type: precision_at_10 value: 8.469999999999999 - type: precision_at_100 value: 1.8950000000000002 - type: precision_at_1000 value: 0.309 - type: precision_at_3 value: 14.6 - type: precision_at_5 value: 11.84 - type: recall_at_1 value: 3.7379999999999995 - type: recall_at_10 value: 17.185 - type: recall_at_100 value: 38.397 - type: recall_at_1000 value: 62.798 - type: recall_at_3 value: 8.896999999999998 - type: recall_at_5 value: 12.021999999999998 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 86.43977757480083 - type: cos_sim_spearman value: 82.64182475199533 - type: euclidean_pearson value: 83.71756009999591 - type: euclidean_spearman value: 82.64182331395057 - type: manhattan_pearson value: 83.8028936913025 - type: manhattan_spearman value: 82.71024597804252 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.85653060698912 - type: cos_sim_spearman value: 79.65598885228324 - type: euclidean_pearson value: 83.1205137628455 - type: euclidean_spearman value: 79.65629387709038 - type: manhattan_pearson value: 83.71108853545837 - type: manhattan_spearman value: 80.25617619716708 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 88.22921688565664 - type: cos_sim_spearman value: 88.42662103041957 - type: euclidean_pearson value: 87.91679798473325 - type: euclidean_spearman 
value: 88.42662103041957 - type: manhattan_pearson value: 88.16927537961303 - type: manhattan_spearman value: 88.81581680062541 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 86.77261424554293 - type: cos_sim_spearman value: 84.53930146434155 - type: euclidean_pearson value: 85.67420491389697 - type: euclidean_spearman value: 84.53929771783851 - type: manhattan_pearson value: 85.74306784515618 - type: manhattan_spearman value: 84.7399304675314 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 89.86138395166455 - type: cos_sim_spearman value: 90.42577823022054 - type: euclidean_pearson value: 89.8787763797515 - type: euclidean_spearman value: 90.42577823022054 - type: manhattan_pearson value: 89.9592937492158 - type: manhattan_spearman value: 90.63535505335524 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 86.5176674585941 - type: cos_sim_spearman value: 87.6842917085397 - type: euclidean_pearson value: 86.70213081520711 - type: euclidean_spearman value: 87.6842917085397 - type: manhattan_pearson value: 86.83702628983627 - type: manhattan_spearman value: 87.87791000374443 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 83.86395454805867 - type: cos_sim_spearman value: 83.69454595252267 - type: euclidean_pearson value: 83.04743892608313 - type: euclidean_spearman value: 83.69454026433006 - type: manhattan_pearson value: 83.4032095553322 - type: manhattan_spearman value: 84.11527379013802 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 81.80249894729546 - type: cos_sim_spearman value: 81.87004960533409 - type: euclidean_pearson value: 80.0392760044179 - type: euclidean_spearman value: 81.87004960533409 - type: manhattan_pearson value: 80.38096542355912 - type: manhattan_spearman value: 82.40774679630341 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 77.6158201787172 - type: cos_sim_spearman value: 77.934651044009 - type: euclidean_pearson value: 77.7874683895269 - type: euclidean_spearman value: 77.934651044009 - type: manhattan_pearson value: 78.36151849193052 - type: manhattan_spearman value: 78.52439586349938 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.04363311392207 - type: cos_sim_spearman value: 87.30483659369973 - type: euclidean_pearson value: 87.62634489502616 - type: euclidean_spearman value: 87.30483659369973 - type: manhattan_pearson value: 88.02340837141445 - type: manhattan_spearman value: 87.55012003294 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: 
af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 91.69172851958248 - type: cos_sim_spearman value: 91.7546879482416 - type: euclidean_pearson value: 91.84843039183963 - type: euclidean_spearman value: 91.7546879482416 - type: manhattan_pearson value: 91.72325753804357 - type: manhattan_spearman value: 91.55330259513397 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 73.95572901084864 - type: cos_sim_spearman value: 72.56217821552626 - type: euclidean_pearson value: 74.24242980323574 - type: euclidean_spearman value: 72.56217821552626 - type: manhattan_pearson value: 74.57473362519922 - type: manhattan_spearman value: 72.76048826648497 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.93329396008296 - type: cos_sim_spearman value: 88.2406635486219 - type: euclidean_pearson value: 87.49687343908533 - type: euclidean_spearman value: 88.2406635486219 - type: manhattan_pearson value: 88.14088309231084 - type: manhattan_spearman value: 88.93314020908534 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.70124451546057 - type: cos_sim_spearman value: 87.45988160052252 - type: euclidean_pearson value: 88.44395505247728 - type: euclidean_spearman value: 87.45988160052252 - type: manhattan_pearson value: 88.69269783495425 - type: manhattan_spearman value: 87.65383425621 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.64109149761346 - type: cos_sim_spearman value: 88.06459637689733 - type: euclidean_pearson value: 88.02313315797703 - type: euclidean_spearman value: 88.06459637689733 - type: manhattan_pearson value: 88.28328539133253 - type: manhattan_spearman value: 88.06605708379142 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.9040028177525 - type: cos_sim_spearman value: 89.68152202933464 - type: euclidean_pearson value: 89.23684469601253 - type: euclidean_spearman value: 89.68152202933464 - type: manhattan_pearson value: 89.59504307277454 - type: manhattan_spearman value: 89.88060100313582 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.69891585325125 - type: cos_sim_spearman value: 88.25252785071736 - type: euclidean_pearson value: 87.99932873748662 - type: euclidean_spearman value: 88.25252785071736 - type: manhattan_pearson value: 88.26959683009446 - type: manhattan_spearman value: 88.32583227300715 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.53235909794135 - type: cos_sim_spearman value: 66.97521740529574 - type: euclidean_pearson value: 68.19502223613912 - type: euclidean_spearman 
value: 66.97521740529574 - type: manhattan_pearson value: 68.39070714774539 - type: manhattan_spearman value: 67.1072812364868 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 43.715742021204775 - type: cos_sim_spearman value: 49.12255971271453 - type: euclidean_pearson value: 40.76848562610837 - type: euclidean_spearman value: 49.12255971271453 - type: manhattan_pearson value: 40.92204625614112 - type: manhattan_spearman value: 49.23333793661129 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.35268345563588 - type: cos_sim_spearman value: 66.99661626042061 - type: euclidean_pearson value: 65.85589122857066 - type: euclidean_spearman value: 66.99661626042061 - type: manhattan_pearson value: 66.78454301512294 - type: manhattan_spearman value: 67.17570330149233 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 33.36599908204445 - type: cos_sim_spearman value: 39.20768331939503 - type: euclidean_pearson value: 22.16066769530468 - type: euclidean_spearman value: 39.20768331939503 - type: manhattan_pearson value: 22.386053195546022 - type: manhattan_spearman value: 39.70172817465986 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.06813956986753 - type: cos_sim_spearman value: 68.72065117995668 - type: euclidean_pearson value: 66.97373456344194 - type: euclidean_spearman value: 68.72065117995668 - type: manhattan_pearson value: 67.34907265771595 - type: manhattan_spearman value: 68.73705769957843 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 47.17664865207108 - type: cos_sim_spearman value: 54.115568323148864 - type: euclidean_pearson value: 48.56418162879182 - type: euclidean_spearman value: 54.115568323148864 - type: manhattan_pearson value: 48.85951643453165 - type: manhattan_spearman value: 54.13599784169052 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 55.87514136275987 - type: cos_sim_spearman value: 60.82923573674973 - type: euclidean_pearson value: 53.724183308215615 - type: euclidean_spearman value: 60.82923573674973 - type: manhattan_pearson value: 53.954305573102445 - type: manhattan_spearman value: 60.957483900644526 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 59.55001413648593 - type: cos_sim_spearman value: 63.395777040381276 - type: euclidean_pearson value: 59.869972550293305 - type: euclidean_spearman value: 63.395777040381276 - type: manhattan_pearson value: 61.16195496847885 - type: manhattan_spearman value: 63.41968682525581 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: 
6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 79.13334972675852 - type: cos_sim_spearman value: 79.86263136371802 - type: euclidean_pearson value: 78.2433603592541 - type: euclidean_spearman value: 79.86263136371802 - type: manhattan_pearson value: 78.87337106318412 - type: manhattan_spearman value: 80.31230584758441 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.559700748242356 - type: cos_sim_spearman value: 60.92342109509558 - type: euclidean_pearson value: 66.07256437521119 - type: euclidean_spearman value: 60.92342109509558 - type: manhattan_pearson value: 67.72769744612663 - type: manhattan_spearman value: 59.64714507774168 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 73.93491616145891 - type: cos_sim_spearman value: 75.84242594400156 - type: euclidean_pearson value: 74.87279745626121 - type: euclidean_spearman value: 75.84242594400156 - type: manhattan_pearson value: 76.47764144677505 - type: manhattan_spearman value: 77.08411157845183 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 72.75624124540954 - type: cos_sim_spearman value: 75.8667941654703 - type: euclidean_pearson value: 73.74314588451925 - type: euclidean_spearman value: 75.8667941654703 - type: manhattan_pearson value: 73.99641425871518 - type: manhattan_spearman value: 76.1982840205817 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 75.20898141298767 - type: cos_sim_spearman value: 73.18060375331436 - type: euclidean_pearson value: 75.44489280944619 - type: euclidean_spearman value: 73.18060375331436 - type: manhattan_pearson value: 75.65451039552286 - type: manhattan_spearman value: 72.97744006123156 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 72.04278252247816 - type: cos_sim_spearman value: 71.8846446821539 - type: euclidean_pearson value: 73.16043307050612 - type: euclidean_spearman value: 71.8846446821539 - type: manhattan_pearson value: 74.76905116839777 - type: manhattan_spearman value: 72.66237093518471 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 71.71033173838558 - type: cos_sim_spearman value: 75.043122881885 - type: euclidean_pearson value: 72.77579680345087 - type: euclidean_spearman value: 75.043122881885 - type: manhattan_pearson value: 72.99901534854922 - type: manhattan_spearman value: 75.15418335015957 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 55.75733447190482 - type: cos_sim_spearman value: 61.38968334176681 - type: euclidean_pearson value: 55.479231520643744 - type: euclidean_spearman 
value: 61.38968334176681 - type: manhattan_pearson value: 56.05230571465244 - type: manhattan_spearman value: 62.69383054007398 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 41.72244325050302 - type: cos_sim_spearman value: 54.47476909084119 - type: euclidean_pearson value: 43.94629756436873 - type: euclidean_spearman value: 54.47476909084119 - type: manhattan_pearson value: 46.36533046394657 - type: manhattan_spearman value: 54.87509243633636 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: fr-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 70.75183711835146 - type: cos_sim_spearman value: 84.51542547285167 - type: euclidean_pearson value: 71.84188960126669 - type: euclidean_spearman value: 84.51542547285167 - type: manhattan_pearson value: 73.94847166379994 - type: manhattan_spearman value: 84.51542547285167 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: None metrics: - type: cos_sim_pearson value: 81.78690149086131 - type: cos_sim_spearman value: 81.81202616916873 - type: euclidean_pearson value: 80.98792254251062 - type: euclidean_spearman value: 81.81202616916873 - type: manhattan_pearson value: 81.46953021346732 - type: manhattan_spearman value: 82.34259562492315 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 87.68273341294419 - type: cos_sim_spearman value: 88.59927164210958 - type: euclidean_pearson value: 88.10745681818025 - type: euclidean_spearman value: 88.59927164210958 - type: manhattan_pearson value: 88.25166703784649 - type: manhattan_spearman value: 88.85343247873482 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 86.3340463345719 - type: mrr value: 96.5182611506141 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 60.967000000000006 - type: map_at_10 value: 71.873 - type: map_at_100 value: 72.271 - type: map_at_1000 value: 72.292 - type: map_at_3 value: 69.006 - type: map_at_5 value: 70.856 - type: mrr_at_1 value: 63.666999999999994 - type: mrr_at_10 value: 72.929 - type: mrr_at_100 value: 73.26 - type: mrr_at_1000 value: 73.282 - type: mrr_at_3 value: 71.111 - type: mrr_at_5 value: 72.328 - type: ndcg_at_1 value: 63.666999999999994 - type: ndcg_at_10 value: 76.414 - type: ndcg_at_100 value: 78.152 - type: ndcg_at_1000 value: 78.604 - type: ndcg_at_3 value: 71.841 - type: ndcg_at_5 value: 74.435 - type: precision_at_1 value: 63.666999999999994 - type: precision_at_10 value: 10.067 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.667 - type: precision_at_5 value: 18.467 - type: recall_at_1 value: 60.967000000000006 - type: recall_at_10 value: 88.922 - type: recall_at_100 value: 96.667 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 77.228 - type: recall_at_5 value: 83.428 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB 
SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.82277227722773 - type: cos_sim_ap value: 95.66279851444406 - type: cos_sim_f1 value: 90.9367088607595 - type: cos_sim_precision value: 92.1025641025641 - type: cos_sim_recall value: 89.8 - type: dot_accuracy value: 99.82277227722773 - type: dot_ap value: 95.66279851444406 - type: dot_f1 value: 90.9367088607595 - type: dot_precision value: 92.1025641025641 - type: dot_recall value: 89.8 - type: euclidean_accuracy value: 99.82277227722773 - type: euclidean_ap value: 95.66279851444406 - type: euclidean_f1 value: 90.9367088607595 - type: euclidean_precision value: 92.1025641025641 - type: euclidean_recall value: 89.8 - type: manhattan_accuracy value: 99.82673267326733 - type: manhattan_ap value: 95.86094873177069 - type: manhattan_f1 value: 91.26788357178096 - type: manhattan_precision value: 90.06815968841285 - type: manhattan_recall value: 92.5 - type: max_accuracy value: 99.82673267326733 - type: max_ap value: 95.86094873177069 - type: max_f1 value: 91.26788357178096 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 73.09533925852372 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 45.90745648090035 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 54.91147686504404 - type: mrr value: 56.03900082760377 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.46908662038217 - type: cos_sim_spearman value: 31.40325730367437 - type: dot_pearson value: 31.469083969291894 - type: dot_spearman value: 31.40325730367437 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 66.90300783402137 - type: mrr value: 77.06451972574179 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 25.82 - type: map_at_10 value: 72.32300000000001 - type: map_at_100 value: 76.198 - type: map_at_1000 value: 76.281 - type: map_at_3 value: 50.719 - type: map_at_5 value: 62.326 - type: mrr_at_1 value: 86.599 - type: mrr_at_10 value: 89.751 - type: mrr_at_100 value: 89.876 - type: mrr_at_1000 value: 89.88000000000001 - type: mrr_at_3 value: 89.151 - type: mrr_at_5 value: 89.519 - type: ndcg_at_1 value: 86.599 - type: ndcg_at_10 value: 80.676 - type: ndcg_at_100 value: 85.03 - type: ndcg_at_1000 value: 85.854 - type: ndcg_at_3 value: 82.057 - type: ndcg_at_5 value: 80.537 - type: precision_at_1 value: 86.599 - type: precision_at_10 value: 40.373 - type: precision_at_100 value: 4.95 - type: precision_at_1000 value: 0.514 - type: precision_at_3 value: 71.918 - type: precision_at_5 value: 60.246 - type: recall_at_1 value: 25.82 - type: recall_at_10 value: 79.905 - type: recall_at_100 value: 93.88499999999999 - type: recall_at_1000 value: 
98.073 - type: recall_at_3 value: 52.623 - type: recall_at_5 value: 66.233 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: None metrics: - type: accuracy value: 47.050000000000004 - type: f1 value: 45.704071498353294 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.243 - type: map_at_10 value: 2.278 - type: map_at_100 value: 14.221 - type: map_at_1000 value: 33.474 - type: map_at_3 value: 0.7270000000000001 - type: map_at_5 value: 1.183 - type: mrr_at_1 value: 94.0 - type: mrr_at_10 value: 97.0 - type: mrr_at_100 value: 97.0 - type: mrr_at_1000 value: 97.0 - type: mrr_at_3 value: 97.0 - type: mrr_at_5 value: 97.0 - type: ndcg_at_1 value: 90.0 - type: ndcg_at_10 value: 87.249 - type: ndcg_at_100 value: 67.876 - type: ndcg_at_1000 value: 59.205 - type: ndcg_at_3 value: 90.12299999999999 - type: ndcg_at_5 value: 89.126 - type: precision_at_1 value: 94.0 - type: precision_at_10 value: 90.8 - type: precision_at_100 value: 69.28 - type: precision_at_1000 value: 25.85 - type: precision_at_3 value: 94.667 - type: precision_at_5 value: 92.80000000000001 - type: recall_at_1 value: 0.243 - type: recall_at_10 value: 2.392 - type: recall_at_100 value: 16.982 - type: recall_at_1000 value: 55.214 - type: recall_at_3 value: 0.745 - type: recall_at_5 value: 1.2229999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 70.5 - type: f1 value: 67.05501804646966 - type: precision value: 65.73261904761904 - type: recall value: 70.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 75.14450867052022 - type: f1 value: 70.98265895953759 - type: precision value: 69.26782273603082 - type: recall value: 75.14450867052022 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 33.170731707317074 - type: f1 value: 29.92876500193573 - type: precision value: 28.669145894755648 - type: recall value: 33.170731707317074 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.5 - type: f1 value: 94.13333333333333 - type: precision value: 93.46666666666667 - type: recall value: 95.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 99.6 - type: f1 value: 99.46666666666665 - type: precision value: 99.4 - type: recall value: 99.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.2 - type: f1 value: 96.39999999999999 - type: precision value: 96.0 - type: recall value: 97.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test 
revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.5 - type: f1 value: 92.99666666666667 - type: precision value: 92.31666666666666 - type: recall value: 94.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.82089552238806 - type: f1 value: 81.59203980099502 - type: precision value: 79.60199004975124 - type: recall value: 85.82089552238806 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 79.5 - type: f1 value: 75.11246031746032 - type: precision value: 73.38734126984127 - type: recall value: 79.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 44.390243902439025 - type: f1 value: 38.48896631823461 - type: precision value: 36.57220286488579 - type: recall value: 44.390243902439025 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.2 - type: f1 value: 87.57333333333334 - type: precision value: 86.34166666666665 - type: recall value: 90.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.82138517618469 - type: f1 value: 85.98651854423423 - type: precision value: 84.79257073424753 - type: recall value: 88.82138517618469 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.04347826086956 - type: f1 value: 72.32108147606868 - type: precision value: 70.37207357859532 - type: recall value: 77.04347826086956 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 53.04347826086957 - type: f1 value: 46.88868184955141 - type: precision value: 44.71730105643149 - type: recall value: 53.04347826086957 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 68.0 - type: f1 value: 62.891813186813195 - type: precision value: 61.037906162464985 - type: recall value: 68.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.3 - type: f1 value: 82.82000000000001 - type: precision value: 81.25690476190475 - type: recall value: 86.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 68.87816646562122 - type: f1 value: 63.53054933272062 - type: precision value: 61.47807816331196 - type: recall value: 
68.87816646562122 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 74.4 - type: f1 value: 68.99388888888889 - type: precision value: 66.81035714285713 - type: recall value: 74.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.5 - type: f1 value: 87.93666666666667 - type: precision value: 86.825 - type: recall value: 90.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.7 - type: f1 value: 88.09 - type: precision value: 86.85833333333333 - type: recall value: 90.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 67.61904761904762 - type: f1 value: 62.30239247214037 - type: precision value: 60.340702947845806 - type: recall value: 67.61904761904762 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.9 - type: f1 value: 73.81285714285714 - type: precision value: 72.21570818070818 - type: recall value: 77.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.8 - type: f1 value: 89.66666666666667 - type: precision value: 88.66666666666666 - type: recall value: 91.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.6 - type: f1 value: 96.85666666666665 - type: precision value: 96.50833333333333 - type: recall value: 97.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.39999999999999 - type: f1 value: 93.98333333333333 - type: precision value: 93.30000000000001 - type: recall value: 95.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.0 - type: f1 value: 81.31538461538462 - type: precision value: 79.70666666666666 - type: recall value: 85.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.60000000000001 - type: f1 value: 89.81888888888888 - type: precision value: 89.08583333333333 - type: recall value: 91.60000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 44.3 - type: f1 value: 38.8623088023088 - type: precision 
value: 37.03755623461505 - type: recall value: 44.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.19999999999999 - type: f1 value: 93.75 - type: precision value: 93.05 - type: recall value: 95.19999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 99.1 - type: f1 value: 98.8 - type: precision value: 98.65 - type: recall value: 99.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 69.6765498652291 - type: f1 value: 63.991785393402644 - type: precision value: 61.7343729944808 - type: recall value: 69.6765498652291 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 50.0 - type: f1 value: 42.79341029341029 - type: precision value: 40.25098358431692 - type: recall value: 50.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.7 - type: f1 value: 87.19023809523809 - type: precision value: 86.12595238095237 - type: recall value: 89.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 42.72727272727273 - type: f1 value: 37.78789518562245 - type: precision value: 36.24208471267295 - type: recall value: 42.72727272727273 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 75.26205450733752 - type: f1 value: 70.72842833849123 - type: precision value: 68.93256464011182 - type: recall value: 75.26205450733752 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.19999999999999 - type: f1 value: 93.96666666666668 - type: precision value: 93.42 - type: recall value: 95.19999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 76.26459143968872 - type: f1 value: 72.40190419178747 - type: precision value: 70.84954604409856 - type: recall value: 76.26459143968872 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 59.82905982905983 - type: f1 value: 52.2100122100122 - type: precision value: 49.52516619183286 - type: recall value: 59.82905982905983 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 81.69999999999999 - type: f1 value: 77.41714285714286 - type: precision value: 75.64833333333334 - type: recall value: 81.69999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.5 - type: f1 value: 94.45 - type: precision value: 93.93333333333334 - type: recall value: 95.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 58.41121495327103 - type: f1 value: 52.73495974430554 - type: precision value: 50.717067200712066 - type: recall value: 58.41121495327103 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 73.3 - type: f1 value: 69.20371794871795 - type: precision value: 67.6597557997558 - type: recall value: 73.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.5 - type: f1 value: 95.51666666666667 - type: precision value: 95.05 - type: recall value: 96.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 78.4 - type: f1 value: 73.88856643356644 - type: precision value: 72.01373015873016 - type: recall value: 78.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.3 - type: f1 value: 94.09666666666668 - type: precision value: 93.53333333333332 - type: recall value: 95.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.7 - type: f1 value: 91.94 - type: precision value: 91.10833333333333 - type: recall value: 93.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.8 - type: f1 value: 95.89999999999999 - type: precision value: 95.46666666666668 - type: recall value: 96.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 70.5 - type: f1 value: 66.00635642135641 - type: precision value: 64.36345238095238 - type: recall value: 70.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.4 - type: f1 value: 90.44388888888889 - type: precision value: 89.5767857142857 - type: recall value: 92.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 48.0 - type: f1 value: 43.15372775372776 - type: precision value: 41.53152510162313 - type: recall value: 48.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 16.7 - type: f1 value: 14.198431372549017 - type: precision value: 13.411765873015872 - type: recall value: 16.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.7 - type: f1 value: 81.81666666666666 - type: precision value: 80.10833333333332 - type: recall value: 85.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 69.64285714285714 - type: f1 value: 64.745670995671 - type: precision value: 62.916666666666664 - type: recall value: 69.64285714285714 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 54.665203073545555 - type: f1 value: 48.55366630916923 - type: precision value: 46.35683318998357 - type: recall value: 54.665203073545555 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 4.8 - type: f1 value: 3.808587223587223 - type: precision value: 3.5653174603174604 - type: recall value: 4.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.6 - type: f1 value: 95.77333333333333 - type: precision value: 95.39166666666667 - type: recall value: 96.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.39999999999999 - type: f1 value: 94.44 - type: precision value: 93.975 - type: recall value: 95.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 42.0 - type: f1 value: 37.024908424908425 - type: precision value: 35.365992063492065 - type: recall value: 42.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 66.7 - type: f1 value: 62.20460835058661 - type: precision value: 60.590134587634594 - type: recall value: 66.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.3 - type: f1 value: 96.46666666666667 - type: precision value: 96.06666666666668 - type: recall value: 97.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) 
config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 47.3 - type: f1 value: 41.96905408317173 - type: precision value: 40.18741402116402 - type: recall value: 47.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 80.2 - type: f1 value: 76.22690476190476 - type: precision value: 74.63539682539682 - type: recall value: 80.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.0 - type: f1 value: 94.83333333333333 - type: precision value: 94.26666666666668 - type: recall value: 96.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.7 - type: f1 value: 87.24333333333334 - type: precision value: 86.17 - type: recall value: 89.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 50.36496350364964 - type: f1 value: 44.795520780922246 - type: precision value: 43.09002433090024 - type: recall value: 50.36496350364964 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 18.8 - type: f1 value: 16.242864357864356 - type: precision value: 15.466596638655464 - type: recall value: 18.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.19999999999999 - type: f1 value: 93.92333333333333 - type: precision value: 93.30833333333332 - type: recall value: 95.19999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.4 - type: f1 value: 91.42333333333333 - type: precision value: 90.50833333333334 - type: recall value: 93.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 26.190476190476193 - type: f1 value: 22.05208151636723 - type: precision value: 21.09292328042328 - type: recall value: 26.190476190476193 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 17.2 - type: f1 value: 14.021009731460952 - type: precision value: 13.1389886698243 - type: recall value: 17.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 78.67494824016563 - type: f1 value: 74.24430641821947 - type: precision value: 72.50747642051991 - type: recall value: 78.67494824016563 - task: type: 
BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.19999999999999 - type: f1 value: 92.54 - type: precision value: 91.75833333333334 - type: recall value: 94.19999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.2 - type: f1 value: 87.78666666666666 - type: precision value: 86.69833333333334 - type: recall value: 90.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 14.7 - type: f1 value: 12.19206214842218 - type: precision value: 11.526261904761904 - type: recall value: 14.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 73.16017316017316 - type: f1 value: 67.44858316286889 - type: precision value: 65.23809523809523 - type: recall value: 73.16017316017316 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 75.19083969465649 - type: f1 value: 70.33078880407125 - type: precision value: 68.3969465648855 - type: recall value: 75.19083969465649 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 62.154294032023294 - type: f1 value: 55.86030821838681 - type: precision value: 53.53509623160277 - type: recall value: 62.154294032023294 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.8 - type: f1 value: 83.9652380952381 - type: precision value: 82.84242424242424 - type: recall value: 86.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.50282485875707 - type: f1 value: 91.54425612052731 - type: precision value: 90.65442561205272 - type: recall value: 93.50282485875707 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 11.4 - type: f1 value: 9.189775870222714 - type: precision value: 8.66189886502811 - type: recall value: 11.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.4 - type: f1 value: 91.88666666666666 - type: precision value: 91.21444444444444 - type: recall value: 93.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 46.0 - type: f1 value: 
40.51069226095542 - type: precision value: 38.57804926010808 - type: recall value: 46.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.0 - type: f1 value: 89.11333333333333 - type: precision value: 88.27000000000001 - type: recall value: 91.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.39999999999999 - type: f1 value: 92.95 - type: precision value: 92.27000000000001 - type: recall value: 94.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 14.2 - type: f1 value: 11.73701698770113 - type: precision value: 11.079207014736676 - type: recall value: 14.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 65.14745308310992 - type: f1 value: 59.665707393589415 - type: precision value: 57.560853653346946 - type: recall value: 65.14745308310992 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.39999999999999 - type: f1 value: 94.0 - type: precision value: 93.33333333333333 - type: recall value: 95.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 69.56521739130434 - type: f1 value: 62.92490118577074 - type: precision value: 60.27009222661397 - type: recall value: 69.56521739130434 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 40.140845070422536 - type: f1 value: 35.96411804158283 - type: precision value: 34.89075869357559 - type: recall value: 40.140845070422536 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 65.86826347305389 - type: f1 value: 59.646248628284546 - type: precision value: 57.22982606216139 - type: recall value: 65.86826347305389 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.89999999999999 - type: f1 value: 93.48333333333333 - type: precision value: 92.83666666666667 - type: recall value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 47.783251231527096 - type: f1 value: 42.006447302013804 - type: precision value: 40.12747105111637 - type: recall value: 47.783251231527096 - task: type: BitextMining dataset: type: 
mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 69.71830985915493 - type: f1 value: 64.80266212660578 - type: precision value: 63.08098591549296 - type: recall value: 69.71830985915493 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 67.94871794871796 - type: f1 value: 61.59912309912309 - type: precision value: 59.17338217338218 - type: recall value: 67.94871794871796 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.39999999999999 - type: f1 value: 95.28333333333335 - type: precision value: 94.75 - type: recall value: 96.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 70.14613778705638 - type: f1 value: 65.4349338900487 - type: precision value: 63.57599255302805 - type: recall value: 70.14613778705638 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 9.2 - type: f1 value: 7.622184434339607 - type: precision value: 7.287048159682417 - type: recall value: 9.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.85016286644951 - type: f1 value: 72.83387622149837 - type: precision value: 70.58450959102424 - type: recall value: 77.85016286644951 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.8 - type: f1 value: 88.84333333333333 - type: precision value: 87.96666666666665 - type: recall value: 90.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.6 - type: f1 value: 93.14 - type: precision value: 92.49833333333333 - type: recall value: 94.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 84.25196850393701 - type: f1 value: 80.94488188976378 - type: precision value: 79.65879265091863 - type: recall value: 84.25196850393701 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.5 - type: f1 value: 86.89666666666666 - type: precision value: 85.7 - type: recall value: 89.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 42.797783933518005 - type: f1 value: 37.30617360155193 - 
type: precision value: 35.34933825792552 - type: recall value: 42.797783933518005 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.1 - type: f1 value: 94.93333333333332 - type: precision value: 94.38333333333333 - type: recall value: 96.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 54.807692307692314 - type: f1 value: 49.506903353057204 - type: precision value: 47.54807692307693 - type: recall value: 54.807692307692314 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.1 - type: f1 value: 83.61857142857143 - type: precision value: 81.975 - type: recall value: 87.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.10000000000001 - type: f1 value: 88.76333333333332 - type: precision value: 87.67 - type: recall value: 91.10000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.10000000000001 - type: f1 value: 91.28999999999999 - type: precision value: 90.44500000000001 - type: recall value: 93.10000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 39.97641509433962 - type: f1 value: 33.12271889998028 - type: precision value: 30.95185381542554 - type: recall value: 39.97641509433962 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.60000000000001 - type: f1 value: 90.69 - type: precision value: 89.84500000000001 - type: recall value: 92.60000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.07299270072993 - type: f1 value: 93.64355231143554 - type: precision value: 92.94403892944038 - type: recall value: 95.07299270072993 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.9 - type: f1 value: 89.61333333333333 - type: precision value: 88.53333333333333 - type: recall value: 91.9 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: None metrics: - type: v_measure value: 64.68478289806511 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: None metrics: - type: v_measure value: 57.53010296184097 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB 
Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.519 - type: map_at_10 value: 10.31 - type: map_at_100 value: 16.027 - type: map_at_1000 value: 17.827 - type: map_at_3 value: 5.721 - type: map_at_5 value: 7.7829999999999995 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 52.642999999999994 - type: mrr_at_100 value: 53.366 - type: mrr_at_1000 value: 53.366 - type: mrr_at_3 value: 48.638999999999996 - type: mrr_at_5 value: 50.578 - type: ndcg_at_1 value: 31.633 - type: ndcg_at_10 value: 26.394000000000002 - type: ndcg_at_100 value: 36.41 - type: ndcg_at_1000 value: 49.206 - type: ndcg_at_3 value: 31.694 - type: ndcg_at_5 value: 29.529 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 23.469 - type: precision_at_100 value: 7.286 - type: precision_at_1000 value: 1.5610000000000002 - type: precision_at_3 value: 34.014 - type: precision_at_5 value: 29.796 - type: recall_at_1 value: 2.519 - type: recall_at_10 value: 17.091 - type: recall_at_100 value: 45.429 - type: recall_at_1000 value: 84.621 - type: recall_at_3 value: 7.208 - type: recall_at_5 value: 10.523 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 69.58659999999999 - type: ap value: 14.735696532619 - type: f1 value: 54.23517220069903 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 63.723825693265425 - type: f1 value: 64.02405729449103 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 54.310161547491006 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 88.77630088812064 - type: cos_sim_ap value: 81.61725457333809 - type: cos_sim_f1 value: 74.91373801916932 - type: cos_sim_precision value: 72.63940520446097 - type: cos_sim_recall value: 77.33509234828496 - type: dot_accuracy value: 88.77630088812064 - type: dot_ap value: 81.61725317476251 - type: dot_f1 value: 74.91373801916932 - type: dot_precision value: 72.63940520446097 - type: dot_recall value: 77.33509234828496 - type: euclidean_accuracy value: 88.77630088812064 - type: euclidean_ap value: 81.61724596869566 - type: euclidean_f1 value: 74.91373801916932 - type: euclidean_precision value: 72.63940520446097 - type: euclidean_recall value: 77.33509234828496 - type: manhattan_accuracy value: 88.67497168742922 - type: manhattan_ap value: 81.430251048948 - type: manhattan_f1 value: 74.79593118171543 - type: manhattan_precision value: 71.3635274382938 - type: manhattan_recall value: 78.57519788918206 - type: max_accuracy value: 88.77630088812064 - type: max_ap value: 81.61725457333809 - type: max_f1 value: 74.91373801916932 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.85136026700819 - type: cos_sim_ap value: 
87.74656687446567 - type: cos_sim_f1 value: 80.3221673073403 - type: cos_sim_precision value: 76.56871640957633 - type: cos_sim_recall value: 84.46258084385587 - type: dot_accuracy value: 89.85136026700819 - type: dot_ap value: 87.74656471395072 - type: dot_f1 value: 80.3221673073403 - type: dot_precision value: 76.56871640957633 - type: dot_recall value: 84.46258084385587 - type: euclidean_accuracy value: 89.85136026700819 - type: euclidean_ap value: 87.74656885754466 - type: euclidean_f1 value: 80.3221673073403 - type: euclidean_precision value: 76.56871640957633 - type: euclidean_recall value: 84.46258084385587 - type: manhattan_accuracy value: 89.86300306593705 - type: manhattan_ap value: 87.78807479093082 - type: manhattan_f1 value: 80.31663429471911 - type: manhattan_precision value: 76.63472970137772 - type: manhattan_recall value: 84.3701878657222 - type: max_accuracy value: 89.86300306593705 - type: max_ap value: 87.78807479093082 - type: max_f1 value: 80.3221673073403 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 32.4 - type: map_at_10 value: 40.961999999999996 - type: map_at_100 value: 41.660000000000004 - type: map_at_1000 value: 41.721000000000004 - type: map_at_3 value: 38.550000000000004 - type: map_at_5 value: 40.06 - type: mrr_at_1 value: 32.4 - type: mrr_at_10 value: 40.961999999999996 - type: mrr_at_100 value: 41.660000000000004 - type: mrr_at_1000 value: 41.721000000000004 - type: mrr_at_3 value: 38.550000000000004 - type: mrr_at_5 value: 40.06 - type: ndcg_at_1 value: 32.4 - type: ndcg_at_10 value: 45.388 - type: ndcg_at_100 value: 49.012 - type: ndcg_at_1000 value: 50.659 - type: ndcg_at_3 value: 40.47 - type: ndcg_at_5 value: 43.232 - type: precision_at_1 value: 32.4 - type: precision_at_10 value: 5.94 - type: precision_at_100 value: 0.769 - type: precision_at_1000 value: 0.09 - type: precision_at_3 value: 15.333 - type: precision_at_5 value: 10.56 - type: recall_at_1 value: 32.4 - type: recall_at_10 value: 59.4 - type: recall_at_100 value: 76.9 - type: recall_at_1000 value: 90.0 - type: recall_at_3 value: 46.0 - type: recall_at_5 value: 52.800000000000004 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: None metrics: - type: accuracy value: 86.94000000000001 - type: ap value: 70.57373468481975 - type: f1 value: 85.26264784928323 language: - en license: mit --- ## E5-mistral-7b-instruct Improving Text Embeddings with Large Language Models. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei. arXiv 2024. This model has 32 layers and an embedding size of 4096. ## Usage Below is an example of encoding queries and passages from the MS-MARCO passage ranking dataset. ### Sentence Transformers Have a look at config_sentence_transformers.json for the prompts that are pre-configured, and check out unilm/e5/utils.py for the prompts we used for evaluation. You can select one of these prompts when encoding a query; a sketch is given after the Supported Languages section below. ### Transformers A plain-transformers sketch is likewise given after the Supported Languages section below. ## Supported Languages This model is initialized from Mistral-7B-v0.1 and fine-tuned on a mixture of multilingual datasets. As a result, it has some multilingual capability. However, since Mistral-7B-v0.1 is mainly trained on English data, we recommend using this model for English only. For multilingual use cases, please refer to multilingual-e5-large.
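A minimal Sentence Transformers sketch for the Usage section above. This is an illustrative sketch rather than the card's original snippet: the prompt name web_search_query and the sample query/passage are assumptions, and prompt_name/similarity require a recent sentence-transformers release; check config_sentence_transformers.json for the prompt names that actually ship with the model.

```python
from sentence_transformers import SentenceTransformer

# Assumption: "web_search_query" is one of the prompts pre-configured in
# config_sentence_transformers.json; verify the name before relying on it.
model = SentenceTransformer("intfloat/e5-mistral-7b-instruct")

queries = ["how much protein should a female eat"]
passages = [
    "As a general guideline, the CDC's average requirement of protein "
    "for women ages 19 to 70 is 46 grams per day."
]

# Queries are encoded with a task prompt; documents need none (see FAQ 1 below).
query_embeddings = model.encode(queries, prompt_name="web_search_query")
passage_embeddings = model.encode(passages)

# Cosine-similarity matrix between queries and passages
# (model.similarity is available in sentence-transformers >= 3.0).
scores = model.similarity(query_embeddings, passage_embeddings)
print(scores)
```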
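For the Transformers subsection, a sketch under the following assumptions: the E5-family "Instruct: {task}\nQuery: {query}" query format (consistent with FAQ 1 below) and last-token pooling with an appended EOS token. Treat the exact prompt strings and pooling as assumptions to be checked against unilm/e5/utils.py.

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoModel, AutoTokenizer


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    """Pool by taking the hidden state of each sequence's last real token."""
    left_padded = attention_mask[:, -1].sum() == attention_mask.shape[0]
    if left_padded:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(last_hidden_states.shape[0],
                                 device=last_hidden_states.device)
    return last_hidden_states[batch_indices, sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    # A one-sentence task instruction is added on the query side only (FAQ 1).
    return f"Instruct: {task_description}\nQuery: {query}"


task = "Given a web search query, retrieve relevant passages that answer the query"
input_texts = [
    get_detailed_instruct(task, "how much protein should a female eat"),
    # Documents are encoded as-is, without an instruction.
    "As a general guideline, the CDC's average requirement of protein "
    "for women ages 19 to 70 is 46 grams per day.",
]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
model = AutoModel.from_pretrained("intfloat/e5-mistral-7b-instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers may lack a pad token

# Tokenize, append EOS so last-token pooling sees a consistent final token,
# and stay within the 4096-token limit noted under Limitations.
batch = tokenizer(input_texts, max_length=4095, truncation=True,
                  padding=False, return_attention_mask=False)
batch["input_ids"] = [ids + [tokenizer.eos_token_id] for ids in batch["input_ids"]]
batch = tokenizer.pad(batch, padding=True, return_attention_mask=True,
                      return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-normalize for cosine scores

scores = embeddings[:1] @ embeddings[1:].T  # query vs. passage similarity
print(scores)
```

Whichever path you take, keep the instruction on the query side only; both sketches should produce comparable embeddings up to the prompt strings used.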
## MTEB Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## FAQ **1. Do I need to add instructions to the query?** Yes, this is how the model is trained; otherwise you will see a performance degradation. The task definition should be a one-sentence instruction that describes the task. This is a way to customize text embeddings for different scenarios through natural language instructions. Please check out unilm/e5/utils.py for the instructions we used for evaluation. On the other hand, there is no need to add instructions to the document side. **2. Why are my reproduced results slightly different from those reported in the model card?** Different versions of transformers and pytorch could cause negligible but non-zero performance differences. **3. Where are the LoRA-only weights?** You can find the LoRA-only weights at ## Citation If you find our paper or models helpful, please consider citing as follows: ## Limitations Using this model for inputs longer than 4096 tokens is not recommended. This model's multilingual capability is still inferior to multilingual-e5-large in some cases.", + "model_explanation_gemini": "Performs sentence similarity scoring (STS) and text classification tasks across multiple languages and datasets." +} \ No newline at end of file diff --git a/data/model_data_json/intfloat_e5-small-v2.json b/data/model_data_json/intfloat_e5-small-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..b2cb027ceb291404e6903ce99427f0aaf75abc7e --- /dev/null +++ b/data/model_data_json/intfloat_e5-small-v2.json @@ -0,0 +1,28 @@ +{ + "model_id": "intfloat/e5-small-v2", + "downloads": 226805, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "bert", + "mteb", + "Sentence Transformers", + "sentence-similarity", + "en", + "arxiv:2212.03533", + "arxiv:2104.08663", + "arxiv:2210.07316", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - Sentence Transformers - sentence-similarity - sentence-transformers model-index: - name: e5-small-v2 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 77.59701492537313 - type: ap value: 41.67064885731708 - type: f1 value: 71.86465946398573 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.265875 - type: ap value: 87.67633085349644 - type: f1 value: 91.24297521425744 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 45.882000000000005 - type: f1 value: 45.08058870381236 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 20.697 - type: map_at_10 value: 33.975 - type: map_at_100 value: 35.223 - type: map_at_1000 value: 35.260000000000005 - type: map_at_3 value: 29.776999999999997 - type: map_at_5 value: 32.035000000000004 - type: mrr_at_1 value: 20.982 - type: mrr_at_10 value: 34.094 - type: mrr_at_100 value: 35.343 - type: mrr_at_1000
value: 35.38 - type: mrr_at_3 value: 29.884 - type: mrr_at_5 value: 32.141999999999996 - type: ndcg_at_1 value: 20.697 - type: ndcg_at_10 value: 41.668 - type: ndcg_at_100 value: 47.397 - type: ndcg_at_1000 value: 48.305 - type: ndcg_at_3 value: 32.928000000000004 - type: ndcg_at_5 value: 36.998999999999995 - type: precision_at_1 value: 20.697 - type: precision_at_10 value: 6.636 - type: precision_at_100 value: 0.924 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 14.035 - type: precision_at_5 value: 10.398 - type: recall_at_1 value: 20.697 - type: recall_at_10 value: 66.35799999999999 - type: recall_at_100 value: 92.39 - type: recall_at_1000 value: 99.36 - type: recall_at_3 value: 42.105 - type: recall_at_5 value: 51.991 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 42.1169517447068 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 34.79553720107097 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 58.10811337308168 - type: mrr value: 71.56410763751482 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 78.46834918248696 - type: cos_sim_spearman value: 79.4289182755206 - type: euclidean_pearson value: 76.26662973727008 - type: euclidean_spearman value: 78.11744260952536 - type: manhattan_pearson value: 76.08175262609434 - type: manhattan_spearman value: 78.29395265552289 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 81.63636363636364 - type: f1 value: 81.55779952376953 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 35.88541137137571 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 30.05205685274407 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.293999999999997 - type: map_at_10 value: 39.876 - type: map_at_100 value: 41.315000000000005 - type: map_at_1000 value: 41.451 - type: map_at_3 value: 37.194 - type: map_at_5 value: 38.728 - type: mrr_at_1 value: 37.053000000000004 - type: mrr_at_10 value: 45.281 - type: mrr_at_100 value: 46.188 - type: mrr_at_1000 value: 46.245999999999995 - type: mrr_at_3 value: 43.228 - type: mrr_at_5 value: 44.366 - type: ndcg_at_1 value: 37.053000000000004 - type: ndcg_at_10 value: 45.086 - type: ndcg_at_100 value: 50.756 - type: ndcg_at_1000 value: 53.123 - type: ndcg_at_3 value: 41.416 - type: ndcg_at_5 value: 43.098 - type: precision_at_1 value: 37.053000000000004 - type: precision_at_10 value: 8.34 - type: 
precision_at_100 value: 1.346 - type: precision_at_1000 value: 0.186 - type: precision_at_3 value: 19.647000000000002 - type: precision_at_5 value: 13.877 - type: recall_at_1 value: 30.293999999999997 - type: recall_at_10 value: 54.309 - type: recall_at_100 value: 78.59 - type: recall_at_1000 value: 93.82300000000001 - type: recall_at_3 value: 43.168 - type: recall_at_5 value: 48.192 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.738000000000003 - type: map_at_10 value: 36.925999999999995 - type: map_at_100 value: 38.017 - type: map_at_1000 value: 38.144 - type: map_at_3 value: 34.446 - type: map_at_5 value: 35.704 - type: mrr_at_1 value: 35.478 - type: mrr_at_10 value: 42.786 - type: mrr_at_100 value: 43.458999999999996 - type: mrr_at_1000 value: 43.507 - type: mrr_at_3 value: 40.648 - type: mrr_at_5 value: 41.804 - type: ndcg_at_1 value: 35.478 - type: ndcg_at_10 value: 42.044 - type: ndcg_at_100 value: 46.249 - type: ndcg_at_1000 value: 48.44 - type: ndcg_at_3 value: 38.314 - type: ndcg_at_5 value: 39.798 - type: precision_at_1 value: 35.478 - type: precision_at_10 value: 7.764 - type: precision_at_100 value: 1.253 - type: precision_at_1000 value: 0.174 - type: precision_at_3 value: 18.047 - type: precision_at_5 value: 12.637 - type: recall_at_1 value: 28.738000000000003 - type: recall_at_10 value: 50.659 - type: recall_at_100 value: 68.76299999999999 - type: recall_at_1000 value: 82.811 - type: recall_at_3 value: 39.536 - type: recall_at_5 value: 43.763999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.565 - type: map_at_10 value: 50.168 - type: map_at_100 value: 51.11 - type: map_at_1000 value: 51.173 - type: map_at_3 value: 47.044000000000004 - type: map_at_5 value: 48.838 - type: mrr_at_1 value: 44.201 - type: mrr_at_10 value: 53.596999999999994 - type: mrr_at_100 value: 54.211 - type: mrr_at_1000 value: 54.247 - type: mrr_at_3 value: 51.202000000000005 - type: mrr_at_5 value: 52.608999999999995 - type: ndcg_at_1 value: 44.201 - type: ndcg_at_10 value: 55.694 - type: ndcg_at_100 value: 59.518 - type: ndcg_at_1000 value: 60.907 - type: ndcg_at_3 value: 50.395999999999994 - type: ndcg_at_5 value: 53.022999999999996 - type: precision_at_1 value: 44.201 - type: precision_at_10 value: 8.84 - type: precision_at_100 value: 1.162 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 22.153 - type: precision_at_5 value: 15.260000000000002 - type: recall_at_1 value: 38.565 - type: recall_at_10 value: 68.65 - type: recall_at_100 value: 85.37400000000001 - type: recall_at_1000 value: 95.37400000000001 - type: recall_at_3 value: 54.645999999999994 - type: recall_at_5 value: 60.958 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.945 - type: map_at_10 value: 30.641000000000002 - type: map_at_100 value: 31.599 - type: map_at_1000 value: 31.691000000000003 - type: map_at_3 value: 28.405 - type: map_at_5 value: 29.704000000000004 - type: mrr_at_1 value: 25.537 - type: mrr_at_10 value: 32.22 - type: mrr_at_100 value: 33.138 - type: mrr_at_1000 value: 33.214 - type: mrr_at_3 value: 30.151 - type: mrr_at_5 value: 31.298 - type: ndcg_at_1 value: 25.537 - type: ndcg_at_10 value: 34.638000000000005 - type: 
ndcg_at_100 value: 39.486 - type: ndcg_at_1000 value: 41.936 - type: ndcg_at_3 value: 30.333 - type: ndcg_at_5 value: 32.482 - type: precision_at_1 value: 25.537 - type: precision_at_10 value: 5.153 - type: precision_at_100 value: 0.7929999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 12.429 - type: precision_at_5 value: 8.723 - type: recall_at_1 value: 23.945 - type: recall_at_10 value: 45.412 - type: recall_at_100 value: 67.836 - type: recall_at_1000 value: 86.467 - type: recall_at_3 value: 34.031 - type: recall_at_5 value: 39.039 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 14.419 - type: map_at_10 value: 20.858999999999998 - type: map_at_100 value: 22.067999999999998 - type: map_at_1000 value: 22.192 - type: map_at_3 value: 18.673000000000002 - type: map_at_5 value: 19.968 - type: mrr_at_1 value: 17.785999999999998 - type: mrr_at_10 value: 24.878 - type: mrr_at_100 value: 26.021 - type: mrr_at_1000 value: 26.095000000000002 - type: mrr_at_3 value: 22.616 - type: mrr_at_5 value: 23.785 - type: ndcg_at_1 value: 17.785999999999998 - type: ndcg_at_10 value: 25.153 - type: ndcg_at_100 value: 31.05 - type: ndcg_at_1000 value: 34.052 - type: ndcg_at_3 value: 21.117 - type: ndcg_at_5 value: 23.048 - type: precision_at_1 value: 17.785999999999998 - type: precision_at_10 value: 4.590000000000001 - type: precision_at_100 value: 0.864 - type: precision_at_1000 value: 0.125 - type: precision_at_3 value: 9.908999999999999 - type: precision_at_5 value: 7.313 - type: recall_at_1 value: 14.419 - type: recall_at_10 value: 34.477999999999994 - type: recall_at_100 value: 60.02499999999999 - type: recall_at_1000 value: 81.646 - type: recall_at_3 value: 23.515 - type: recall_at_5 value: 28.266999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.268 - type: map_at_10 value: 35.114000000000004 - type: map_at_100 value: 36.212 - type: map_at_1000 value: 36.333 - type: map_at_3 value: 32.436 - type: map_at_5 value: 33.992 - type: mrr_at_1 value: 31.761 - type: mrr_at_10 value: 40.355999999999995 - type: mrr_at_100 value: 41.125 - type: mrr_at_1000 value: 41.186 - type: mrr_at_3 value: 37.937 - type: mrr_at_5 value: 39.463 - type: ndcg_at_1 value: 31.761 - type: ndcg_at_10 value: 40.422000000000004 - type: ndcg_at_100 value: 45.458999999999996 - type: ndcg_at_1000 value: 47.951 - type: ndcg_at_3 value: 35.972 - type: ndcg_at_5 value: 38.272 - type: precision_at_1 value: 31.761 - type: precision_at_10 value: 7.103 - type: precision_at_100 value: 1.133 - type: precision_at_1000 value: 0.152 - type: precision_at_3 value: 16.779 - type: precision_at_5 value: 11.877 - type: recall_at_1 value: 26.268 - type: recall_at_10 value: 51.053000000000004 - type: recall_at_100 value: 72.702 - type: recall_at_1000 value: 89.521 - type: recall_at_3 value: 38.619 - type: recall_at_5 value: 44.671 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.230999999999998 - type: map_at_10 value: 34.227000000000004 - type: map_at_100 value: 35.370000000000005 - type: map_at_1000 value: 35.488 - type: map_at_3 value: 31.496000000000002 - type: map_at_5 value: 33.034 - type: mrr_at_1 value: 30.822 - type: mrr_at_10 value: 
39.045 - type: mrr_at_100 value: 39.809 - type: mrr_at_1000 value: 39.873 - type: mrr_at_3 value: 36.663000000000004 - type: mrr_at_5 value: 37.964 - type: ndcg_at_1 value: 30.822 - type: ndcg_at_10 value: 39.472 - type: ndcg_at_100 value: 44.574999999999996 - type: ndcg_at_1000 value: 47.162 - type: ndcg_at_3 value: 34.929 - type: ndcg_at_5 value: 37.002 - type: precision_at_1 value: 30.822 - type: precision_at_10 value: 7.055 - type: precision_at_100 value: 1.124 - type: precision_at_1000 value: 0.152 - type: precision_at_3 value: 16.591 - type: precision_at_5 value: 11.667 - type: recall_at_1 value: 25.230999999999998 - type: recall_at_10 value: 50.42100000000001 - type: recall_at_100 value: 72.685 - type: recall_at_1000 value: 90.469 - type: recall_at_3 value: 37.503 - type: recall_at_5 value: 43.123 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.604166666666664 - type: map_at_10 value: 32.427166666666665 - type: map_at_100 value: 33.51474999999999 - type: map_at_1000 value: 33.6345 - type: map_at_3 value: 30.02366666666667 - type: map_at_5 value: 31.382333333333328 - type: mrr_at_1 value: 29.001166666666666 - type: mrr_at_10 value: 36.3315 - type: mrr_at_100 value: 37.16683333333333 - type: mrr_at_1000 value: 37.23341666666668 - type: mrr_at_3 value: 34.19916666666667 - type: mrr_at_5 value: 35.40458333333334 - type: ndcg_at_1 value: 29.001166666666666 - type: ndcg_at_10 value: 37.06883333333334 - type: ndcg_at_100 value: 41.95816666666666 - type: ndcg_at_1000 value: 44.501583333333336 - type: ndcg_at_3 value: 32.973499999999994 - type: ndcg_at_5 value: 34.90833333333334 - type: precision_at_1 value: 29.001166666666666 - type: precision_at_10 value: 6.336 - type: precision_at_100 value: 1.0282499999999999 - type: precision_at_1000 value: 0.14391666666666664 - type: precision_at_3 value: 14.932499999999996 - type: precision_at_5 value: 10.50825 - type: recall_at_1 value: 24.604166666666664 - type: recall_at_10 value: 46.9525 - type: recall_at_100 value: 68.67816666666667 - type: recall_at_1000 value: 86.59783333333334 - type: recall_at_3 value: 35.49783333333333 - type: recall_at_5 value: 40.52525000000001 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.559 - type: map_at_10 value: 29.023 - type: map_at_100 value: 29.818 - type: map_at_1000 value: 29.909000000000002 - type: map_at_3 value: 27.037 - type: map_at_5 value: 28.225 - type: mrr_at_1 value: 26.994 - type: mrr_at_10 value: 31.962000000000003 - type: mrr_at_100 value: 32.726 - type: mrr_at_1000 value: 32.800000000000004 - type: mrr_at_3 value: 30.266 - type: mrr_at_5 value: 31.208999999999996 - type: ndcg_at_1 value: 26.994 - type: ndcg_at_10 value: 32.53 - type: ndcg_at_100 value: 36.758 - type: ndcg_at_1000 value: 39.362 - type: ndcg_at_3 value: 28.985 - type: ndcg_at_5 value: 30.757 - type: precision_at_1 value: 26.994 - type: precision_at_10 value: 4.968999999999999 - type: precision_at_100 value: 0.759 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 12.219 - type: precision_at_5 value: 8.527999999999999 - type: recall_at_1 value: 23.559 - type: recall_at_10 value: 40.585 - type: recall_at_100 value: 60.306000000000004 - type: recall_at_1000 value: 80.11 - type: recall_at_3 value: 30.794 - type: recall_at_5 value: 35.186 - task: type: Retrieval dataset: type: 
BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.384999999999998 - type: map_at_10 value: 22.142 - type: map_at_100 value: 23.057 - type: map_at_1000 value: 23.177 - type: map_at_3 value: 20.29 - type: map_at_5 value: 21.332 - type: mrr_at_1 value: 19.89 - type: mrr_at_10 value: 25.771 - type: mrr_at_100 value: 26.599 - type: mrr_at_1000 value: 26.680999999999997 - type: mrr_at_3 value: 23.962 - type: mrr_at_5 value: 24.934 - type: ndcg_at_1 value: 19.89 - type: ndcg_at_10 value: 25.97 - type: ndcg_at_100 value: 30.605 - type: ndcg_at_1000 value: 33.619 - type: ndcg_at_3 value: 22.704 - type: ndcg_at_5 value: 24.199 - type: precision_at_1 value: 19.89 - type: precision_at_10 value: 4.553 - type: precision_at_100 value: 0.8049999999999999 - type: precision_at_1000 value: 0.122 - type: precision_at_3 value: 10.541 - type: precision_at_5 value: 7.46 - type: recall_at_1 value: 16.384999999999998 - type: recall_at_10 value: 34.001 - type: recall_at_100 value: 55.17100000000001 - type: recall_at_1000 value: 77.125 - type: recall_at_3 value: 24.618000000000002 - type: recall_at_5 value: 28.695999999999998 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.726 - type: map_at_10 value: 31.227 - type: map_at_100 value: 32.311 - type: map_at_1000 value: 32.419 - type: map_at_3 value: 28.765 - type: map_at_5 value: 30.229 - type: mrr_at_1 value: 27.705000000000002 - type: mrr_at_10 value: 35.085 - type: mrr_at_100 value: 35.931000000000004 - type: mrr_at_1000 value: 36 - type: mrr_at_3 value: 32.603 - type: mrr_at_5 value: 34.117999999999995 - type: ndcg_at_1 value: 27.705000000000002 - type: ndcg_at_10 value: 35.968 - type: ndcg_at_100 value: 41.197 - type: ndcg_at_1000 value: 43.76 - type: ndcg_at_3 value: 31.304 - type: ndcg_at_5 value: 33.661 - type: precision_at_1 value: 27.705000000000002 - type: precision_at_10 value: 5.942 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 13.868 - type: precision_at_5 value: 9.944 - type: recall_at_1 value: 23.726 - type: recall_at_10 value: 46.786 - type: recall_at_100 value: 70.072 - type: recall_at_1000 value: 88.2 - type: recall_at_3 value: 33.981 - type: recall_at_5 value: 39.893 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.344 - type: map_at_10 value: 31.636999999999997 - type: map_at_100 value: 33.065 - type: map_at_1000 value: 33.300000000000004 - type: map_at_3 value: 29.351 - type: map_at_5 value: 30.432 - type: mrr_at_1 value: 27.866000000000003 - type: mrr_at_10 value: 35.587 - type: mrr_at_100 value: 36.52 - type: mrr_at_1000 value: 36.597 - type: mrr_at_3 value: 33.696 - type: mrr_at_5 value: 34.713 - type: ndcg_at_1 value: 27.866000000000003 - type: ndcg_at_10 value: 36.61 - type: ndcg_at_100 value: 41.88 - type: ndcg_at_1000 value: 45.105000000000004 - type: ndcg_at_3 value: 33.038000000000004 - type: ndcg_at_5 value: 34.331 - type: precision_at_1 value: 27.866000000000003 - type: precision_at_10 value: 6.917 - type: precision_at_100 value: 1.3599999999999999 - type: precision_at_1000 value: 0.233 - type: precision_at_3 value: 15.547 - type: precision_at_5 value: 10.791 - type: recall_at_1 value: 23.344 - type: recall_at_10 value: 45.782000000000004 - type: 
recall_at_100 value: 69.503 - type: recall_at_1000 value: 90.742 - type: recall_at_3 value: 35.160000000000004 - type: recall_at_5 value: 39.058 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 20.776 - type: map_at_10 value: 27.285999999999998 - type: map_at_100 value: 28.235 - type: map_at_1000 value: 28.337 - type: map_at_3 value: 25.147000000000002 - type: map_at_5 value: 26.401999999999997 - type: mrr_at_1 value: 22.921 - type: mrr_at_10 value: 29.409999999999997 - type: mrr_at_100 value: 30.275000000000002 - type: mrr_at_1000 value: 30.354999999999997 - type: mrr_at_3 value: 27.418 - type: mrr_at_5 value: 28.592000000000002 - type: ndcg_at_1 value: 22.921 - type: ndcg_at_10 value: 31.239 - type: ndcg_at_100 value: 35.965 - type: ndcg_at_1000 value: 38.602 - type: ndcg_at_3 value: 27.174 - type: ndcg_at_5 value: 29.229 - type: precision_at_1 value: 22.921 - type: precision_at_10 value: 4.806 - type: precision_at_100 value: 0.776 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 11.459999999999999 - type: precision_at_5 value: 8.022 - type: recall_at_1 value: 20.776 - type: recall_at_10 value: 41.294 - type: recall_at_100 value: 63.111 - type: recall_at_1000 value: 82.88600000000001 - type: recall_at_3 value: 30.403000000000002 - type: recall_at_5 value: 35.455999999999996 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 9.376 - type: map_at_10 value: 15.926000000000002 - type: map_at_100 value: 17.585 - type: map_at_1000 value: 17.776 - type: map_at_3 value: 13.014000000000001 - type: map_at_5 value: 14.417 - type: mrr_at_1 value: 20.195 - type: mrr_at_10 value: 29.95 - type: mrr_at_100 value: 31.052000000000003 - type: mrr_at_1000 value: 31.108000000000004 - type: mrr_at_3 value: 26.667 - type: mrr_at_5 value: 28.458 - type: ndcg_at_1 value: 20.195 - type: ndcg_at_10 value: 22.871 - type: ndcg_at_100 value: 29.921999999999997 - type: ndcg_at_1000 value: 33.672999999999995 - type: ndcg_at_3 value: 17.782999999999998 - type: ndcg_at_5 value: 19.544 - type: precision_at_1 value: 20.195 - type: precision_at_10 value: 7.394 - type: precision_at_100 value: 1.493 - type: precision_at_1000 value: 0.218 - type: precision_at_3 value: 13.073 - type: precision_at_5 value: 10.436 - type: recall_at_1 value: 9.376 - type: recall_at_10 value: 28.544999999999998 - type: recall_at_100 value: 53.147999999999996 - type: recall_at_1000 value: 74.62 - type: recall_at_3 value: 16.464000000000002 - type: recall_at_5 value: 21.004 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.415000000000001 - type: map_at_10 value: 18.738 - type: map_at_100 value: 27.291999999999998 - type: map_at_1000 value: 28.992 - type: map_at_3 value: 13.196 - type: map_at_5 value: 15.539 - type: mrr_at_1 value: 66.5 - type: mrr_at_10 value: 74.518 - type: mrr_at_100 value: 74.86 - type: mrr_at_1000 value: 74.87 - type: mrr_at_3 value: 72.375 - type: mrr_at_5 value: 73.86200000000001 - type: ndcg_at_1 value: 54.37499999999999 - type: ndcg_at_10 value: 41.317 - type: ndcg_at_100 value: 45.845 - type: ndcg_at_1000 value: 52.92 - type: ndcg_at_3 value: 44.983000000000004 - type: ndcg_at_5 value: 42.989 - type: precision_at_1 value: 66.5 - type: precision_at_10 value: 33.6 - type: 
precision_at_100 value: 10.972999999999999 - type: precision_at_1000 value: 2.214 - type: precision_at_3 value: 48.583 - type: precision_at_5 value: 42.15 - type: recall_at_1 value: 8.415000000000001 - type: recall_at_10 value: 24.953 - type: recall_at_100 value: 52.48199999999999 - type: recall_at_1000 value: 75.093 - type: recall_at_3 value: 14.341000000000001 - type: recall_at_5 value: 18.468 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.06499999999999 - type: f1 value: 41.439327599975385 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 66.02 - type: map_at_10 value: 76.68599999999999 - type: map_at_100 value: 76.959 - type: map_at_1000 value: 76.972 - type: map_at_3 value: 75.024 - type: map_at_5 value: 76.153 - type: mrr_at_1 value: 71.197 - type: mrr_at_10 value: 81.105 - type: mrr_at_100 value: 81.232 - type: mrr_at_1000 value: 81.233 - type: mrr_at_3 value: 79.758 - type: mrr_at_5 value: 80.69 - type: ndcg_at_1 value: 71.197 - type: ndcg_at_10 value: 81.644 - type: ndcg_at_100 value: 82.645 - type: ndcg_at_1000 value: 82.879 - type: ndcg_at_3 value: 78.792 - type: ndcg_at_5 value: 80.528 - type: precision_at_1 value: 71.197 - type: precision_at_10 value: 10.206999999999999 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 30.868000000000002 - type: precision_at_5 value: 19.559 - type: recall_at_1 value: 66.02 - type: recall_at_10 value: 92.50699999999999 - type: recall_at_100 value: 96.497 - type: recall_at_1000 value: 97.956 - type: recall_at_3 value: 84.866 - type: recall_at_5 value: 89.16199999999999 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 17.948 - type: map_at_10 value: 29.833 - type: map_at_100 value: 31.487 - type: map_at_1000 value: 31.674000000000003 - type: map_at_3 value: 26.029999999999998 - type: map_at_5 value: 28.038999999999998 - type: mrr_at_1 value: 34.721999999999994 - type: mrr_at_10 value: 44.214999999999996 - type: mrr_at_100 value: 44.994 - type: mrr_at_1000 value: 45.051 - type: mrr_at_3 value: 41.667 - type: mrr_at_5 value: 43.032 - type: ndcg_at_1 value: 34.721999999999994 - type: ndcg_at_10 value: 37.434 - type: ndcg_at_100 value: 43.702000000000005 - type: ndcg_at_1000 value: 46.993 - type: ndcg_at_3 value: 33.56 - type: ndcg_at_5 value: 34.687 - type: precision_at_1 value: 34.721999999999994 - type: precision_at_10 value: 10.401 - type: precision_at_100 value: 1.7049999999999998 - type: precision_at_1000 value: 0.22799999999999998 - type: precision_at_3 value: 22.531000000000002 - type: precision_at_5 value: 16.42 - type: recall_at_1 value: 17.948 - type: recall_at_10 value: 45.062999999999995 - type: recall_at_100 value: 68.191 - type: recall_at_1000 value: 87.954 - type: recall_at_3 value: 31.112000000000002 - type: recall_at_5 value: 36.823 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 36.644 - type: map_at_10 value: 57.658 - type: map_at_100 value: 58.562000000000005 - type: map_at_1000 value: 58.62500000000001 - type: map_at_3 value: 54.022999999999996 - type: map_at_5 value: 56.293000000000006 - type: mrr_at_1 value: 73.288 - type: mrr_at_10 
value: 80.51700000000001 - type: mrr_at_100 value: 80.72 - type: mrr_at_1000 value: 80.728 - type: mrr_at_3 value: 79.33200000000001 - type: mrr_at_5 value: 80.085 - type: ndcg_at_1 value: 73.288 - type: ndcg_at_10 value: 66.61 - type: ndcg_at_100 value: 69.723 - type: ndcg_at_1000 value: 70.96000000000001 - type: ndcg_at_3 value: 61.358999999999995 - type: ndcg_at_5 value: 64.277 - type: precision_at_1 value: 73.288 - type: precision_at_10 value: 14.17 - type: precision_at_100 value: 1.659 - type: precision_at_1000 value: 0.182 - type: precision_at_3 value: 39.487 - type: precision_at_5 value: 25.999 - type: recall_at_1 value: 36.644 - type: recall_at_10 value: 70.851 - type: recall_at_100 value: 82.94399999999999 - type: recall_at_1000 value: 91.134 - type: recall_at_3 value: 59.230000000000004 - type: recall_at_5 value: 64.997 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 86.00280000000001 - type: ap value: 80.46302061021223 - type: f1 value: 85.9592921596419 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.541 - type: map_at_10 value: 34.625 - type: map_at_100 value: 35.785 - type: map_at_1000 value: 35.831 - type: map_at_3 value: 30.823 - type: map_at_5 value: 32.967999999999996 - type: mrr_at_1 value: 23.180999999999997 - type: mrr_at_10 value: 35.207 - type: mrr_at_100 value: 36.315 - type: mrr_at_1000 value: 36.355 - type: mrr_at_3 value: 31.483 - type: mrr_at_5 value: 33.589999999999996 - type: ndcg_at_1 value: 23.195 - type: ndcg_at_10 value: 41.461 - type: ndcg_at_100 value: 47.032000000000004 - type: ndcg_at_1000 value: 48.199999999999996 - type: ndcg_at_3 value: 33.702 - type: ndcg_at_5 value: 37.522 - type: precision_at_1 value: 23.195 - type: precision_at_10 value: 6.526999999999999 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.10300000000000001 - type: precision_at_3 value: 14.308000000000002 - type: precision_at_5 value: 10.507 - type: recall_at_1 value: 22.541 - type: recall_at_10 value: 62.524 - type: recall_at_100 value: 88.228 - type: recall_at_1000 value: 97.243 - type: recall_at_3 value: 41.38 - type: recall_at_5 value: 50.55 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.69949840401279 - type: f1 value: 92.54141471311786 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.56041951664386 - type: f1 value: 55.88499977508287 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.62071284465365 - type: f1 value: 69.36717546572152 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.35843981170142 - type: f1 value: 76.15496453538884 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default 
split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.33664956793118 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 27.883839621715524 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.096874986740758 - type: mrr value: 30.97300481932132 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.4 - type: map_at_10 value: 11.852 - type: map_at_100 value: 14.758 - type: map_at_1000 value: 16.134 - type: map_at_3 value: 8.558 - type: map_at_5 value: 10.087 - type: mrr_at_1 value: 44.272 - type: mrr_at_10 value: 52.05800000000001 - type: mrr_at_100 value: 52.689 - type: mrr_at_1000 value: 52.742999999999995 - type: mrr_at_3 value: 50.205999999999996 - type: mrr_at_5 value: 51.367 - type: ndcg_at_1 value: 42.57 - type: ndcg_at_10 value: 32.449 - type: ndcg_at_100 value: 29.596 - type: ndcg_at_1000 value: 38.351 - type: ndcg_at_3 value: 37.044 - type: ndcg_at_5 value: 35.275 - type: precision_at_1 value: 44.272 - type: precision_at_10 value: 23.87 - type: precision_at_100 value: 7.625 - type: precision_at_1000 value: 2.045 - type: precision_at_3 value: 34.365 - type: precision_at_5 value: 30.341 - type: recall_at_1 value: 5.4 - type: recall_at_10 value: 15.943999999999999 - type: recall_at_100 value: 29.805 - type: recall_at_1000 value: 61.695 - type: recall_at_3 value: 9.539 - type: recall_at_5 value: 12.127 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 36.047000000000004 - type: map_at_10 value: 51.6 - type: map_at_100 value: 52.449999999999996 - type: map_at_1000 value: 52.476 - type: map_at_3 value: 47.452 - type: map_at_5 value: 49.964 - type: mrr_at_1 value: 40.382 - type: mrr_at_10 value: 54.273 - type: mrr_at_100 value: 54.859 - type: mrr_at_1000 value: 54.876000000000005 - type: mrr_at_3 value: 51.014 - type: mrr_at_5 value: 52.983999999999995 - type: ndcg_at_1 value: 40.353 - type: ndcg_at_10 value: 59.11300000000001 - type: ndcg_at_100 value: 62.604000000000006 - type: ndcg_at_1000 value: 63.187000000000005 - type: ndcg_at_3 value: 51.513 - type: ndcg_at_5 value: 55.576 - type: precision_at_1 value: 40.353 - type: precision_at_10 value: 9.418 - type: precision_at_100 value: 1.1440000000000001 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 23.078000000000003 - type: precision_at_5 value: 16.250999999999998 - type: recall_at_1 value: 36.047000000000004 - type: recall_at_10 value: 79.22200000000001 - type: recall_at_100 value: 94.23 - type: recall_at_1000 value: 98.51100000000001 - type: recall_at_3 value: 59.678 - type: recall_at_5 value: 68.967 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 68.232 - type: map_at_10 value: 81.674 - type: map_at_100 value: 82.338 - type: map_at_1000 value: 82.36099999999999 - type: map_at_3 value: 78.833 - type: map_at_5 value: 80.58 - type: mrr_at_1 value: 78.64 - type: mrr_at_10 value: 85.164 - type: mrr_at_100 value: 85.317 - type: mrr_at_1000 value: 85.319 - type: mrr_at_3 value: 84.127 - type: mrr_at_5 value: 
84.789 - type: ndcg_at_1 value: 78.63 - type: ndcg_at_10 value: 85.711 - type: ndcg_at_100 value: 87.238 - type: ndcg_at_1000 value: 87.444 - type: ndcg_at_3 value: 82.788 - type: ndcg_at_5 value: 84.313 - type: precision_at_1 value: 78.63 - type: precision_at_10 value: 12.977 - type: precision_at_100 value: 1.503 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 36.113 - type: precision_at_5 value: 23.71 - type: recall_at_1 value: 68.232 - type: recall_at_10 value: 93.30199999999999 - type: recall_at_100 value: 98.799 - type: recall_at_1000 value: 99.885 - type: recall_at_3 value: 84.827 - type: recall_at_5 value: 89.188 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 45.71879170816294 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 59.65866311751794 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.218 - type: map_at_10 value: 10.337 - type: map_at_100 value: 12.131 - type: map_at_1000 value: 12.411 - type: map_at_3 value: 7.4270000000000005 - type: map_at_5 value: 8.913 - type: mrr_at_1 value: 20.8 - type: mrr_at_10 value: 30.868000000000002 - type: mrr_at_100 value: 31.903 - type: mrr_at_1000 value: 31.972 - type: mrr_at_3 value: 27.367 - type: mrr_at_5 value: 29.372 - type: ndcg_at_1 value: 20.8 - type: ndcg_at_10 value: 17.765 - type: ndcg_at_100 value: 24.914 - type: ndcg_at_1000 value: 30.206 - type: ndcg_at_3 value: 16.64 - type: ndcg_at_5 value: 14.712 - type: precision_at_1 value: 20.8 - type: precision_at_10 value: 9.24 - type: precision_at_100 value: 1.9560000000000002 - type: precision_at_1000 value: 0.32299999999999995 - type: precision_at_3 value: 15.467 - type: precision_at_5 value: 12.94 - type: recall_at_1 value: 4.218 - type: recall_at_10 value: 18.752 - type: recall_at_100 value: 39.7 - type: recall_at_1000 value: 65.57300000000001 - type: recall_at_3 value: 9.428 - type: recall_at_5 value: 13.133000000000001 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.04338850207233 - type: cos_sim_spearman value: 78.5054651430423 - type: euclidean_pearson value: 80.30739451228612 - type: euclidean_spearman value: 78.48377464299097 - type: manhattan_pearson value: 80.40795049052781 - type: manhattan_spearman value: 78.49506205443114 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 84.11596224442962 - type: cos_sim_spearman value: 76.20997388935461 - type: euclidean_pearson value: 80.56858451349109 - type: euclidean_spearman value: 75.92659183871186 - type: manhattan_pearson value: 80.60246102203844 - type: manhattan_spearman value: 76.03018971432664 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 81.34691640755737 - type: cos_sim_spearman value: 82.4018369631579 - type: euclidean_pearson value: 81.87673092245366 - type: euclidean_spearman value: 
82.3671489960678 - type: manhattan_pearson value: 81.88222387719948 - type: manhattan_spearman value: 82.3816590344736 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 81.2836092579524 - type: cos_sim_spearman value: 78.99982781772064 - type: euclidean_pearson value: 80.5184271010527 - type: euclidean_spearman value: 78.89777392101904 - type: manhattan_pearson value: 80.53585705018664 - type: manhattan_spearman value: 78.92898405472994 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.7349907750784 - type: cos_sim_spearman value: 87.7611234446225 - type: euclidean_pearson value: 86.98759326731624 - type: euclidean_spearman value: 87.58321319424618 - type: manhattan_pearson value: 87.03483090370842 - type: manhattan_spearman value: 87.63278333060288 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 81.75873694924825 - type: cos_sim_spearman value: 83.80237999094724 - type: euclidean_pearson value: 83.55023725861537 - type: euclidean_spearman value: 84.12744338577744 - type: manhattan_pearson value: 83.58816983036232 - type: manhattan_spearman value: 84.18520748676501 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.21630882940174 - type: cos_sim_spearman value: 87.72382883437031 - type: euclidean_pearson value: 88.69933350930333 - type: euclidean_spearman value: 88.24660814383081 - type: manhattan_pearson value: 88.77331018833499 - type: manhattan_spearman value: 88.26109989380632 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 61.11854063060489 - type: cos_sim_spearman value: 63.14678634195072 - type: euclidean_pearson value: 61.679090067000864 - type: euclidean_spearman value: 62.28876589509653 - type: manhattan_pearson value: 62.082324165511004 - type: manhattan_spearman value: 62.56030932816679 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.00319882832645 - type: cos_sim_spearman value: 85.94529772647257 - type: euclidean_pearson value: 85.6661390122756 - type: euclidean_spearman value: 85.97747815545827 - type: manhattan_pearson value: 85.58422770541893 - type: manhattan_spearman value: 85.9237139181532 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 79.16198731863916 - type: mrr value: 94.25202702163487 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 54.761 - type: map_at_10 value: 64.396 - type: map_at_100 value: 65.07 - type: map_at_1000 value: 65.09899999999999 - type: map_at_3 value: 61.846000000000004 - type: map_at_5 value: 63.284 - type: mrr_at_1 value: 57.667 - type: mrr_at_10 value: 
65.83099999999999 - type: mrr_at_100 value: 66.36800000000001 - type: mrr_at_1000 value: 66.39399999999999 - type: mrr_at_3 value: 64.056 - type: mrr_at_5 value: 65.206 - type: ndcg_at_1 value: 57.667 - type: ndcg_at_10 value: 68.854 - type: ndcg_at_100 value: 71.59100000000001 - type: ndcg_at_1000 value: 72.383 - type: ndcg_at_3 value: 64.671 - type: ndcg_at_5 value: 66.796 - type: precision_at_1 value: 57.667 - type: precision_at_10 value: 9.167 - type: precision_at_100 value: 1.053 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 25.444 - type: precision_at_5 value: 16.667 - type: recall_at_1 value: 54.761 - type: recall_at_10 value: 80.9 - type: recall_at_100 value: 92.767 - type: recall_at_1000 value: 99 - type: recall_at_3 value: 69.672 - type: recall_at_5 value: 75.083 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.8079207920792 - type: cos_sim_ap value: 94.88470927617445 - type: cos_sim_f1 value: 90.08179959100204 - type: cos_sim_precision value: 92.15481171548117 - type: cos_sim_recall value: 88.1 - type: dot_accuracy value: 99.58613861386138 - type: dot_ap value: 82.94822578881316 - type: dot_f1 value: 77.33333333333333 - type: dot_precision value: 79.36842105263158 - type: dot_recall value: 75.4 - type: euclidean_accuracy value: 99.8069306930693 - type: euclidean_ap value: 94.81367858031837 - type: euclidean_f1 value: 90.01009081735621 - type: euclidean_precision value: 90.83503054989816 - type: euclidean_recall value: 89.2 - type: manhattan_accuracy value: 99.81188118811882 - type: manhattan_ap value: 94.91405337220161 - type: manhattan_f1 value: 90.2763561924258 - type: manhattan_precision value: 92.45283018867924 - type: manhattan_recall value: 88.2 - type: max_accuracy value: 99.81188118811882 - type: max_ap value: 94.91405337220161 - type: max_f1 value: 90.2763561924258 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 58.511599500053094 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 31.984728147814707 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.93428193939015 - type: mrr value: 50.916557911043206 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.562500894537145 - type: cos_sim_spearman value: 31.162587976726307 - type: dot_pearson value: 22.633662187735762 - type: dot_spearman value: 22.723000282378962 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.219 - type: map_at_10 value: 1.871 - type: map_at_100 value: 10.487 - type: map_at_1000 value: 25.122 - type: map_at_3 value: 0.657 - type: map_at_5 value: 1.0699999999999998 - type: mrr_at_1 value: 84 - type: mrr_at_10 
value: 89.567 - type: mrr_at_100 value: 89.748 - type: mrr_at_1000 value: 89.748 - type: mrr_at_3 value: 88.667 - type: mrr_at_5 value: 89.567 - type: ndcg_at_1 value: 80 - type: ndcg_at_10 value: 74.533 - type: ndcg_at_100 value: 55.839000000000006 - type: ndcg_at_1000 value: 49.748 - type: ndcg_at_3 value: 79.53099999999999 - type: ndcg_at_5 value: 78.245 - type: precision_at_1 value: 84 - type: precision_at_10 value: 78.4 - type: precision_at_100 value: 56.99999999999999 - type: precision_at_1000 value: 21.98 - type: precision_at_3 value: 85.333 - type: precision_at_5 value: 84.8 - type: recall_at_1 value: 0.219 - type: recall_at_10 value: 2.02 - type: recall_at_100 value: 13.555 - type: recall_at_1000 value: 46.739999999999995 - type: recall_at_3 value: 0.685 - type: recall_at_5 value: 1.13 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 3.5029999999999997 - type: map_at_10 value: 11.042 - type: map_at_100 value: 16.326999999999998 - type: map_at_1000 value: 17.836 - type: map_at_3 value: 6.174 - type: map_at_5 value: 7.979 - type: mrr_at_1 value: 42.857 - type: mrr_at_10 value: 52.617000000000004 - type: mrr_at_100 value: 53.351000000000006 - type: mrr_at_1000 value: 53.351000000000006 - type: mrr_at_3 value: 46.939 - type: mrr_at_5 value: 50.714000000000006 - type: ndcg_at_1 value: 38.775999999999996 - type: ndcg_at_10 value: 27.125 - type: ndcg_at_100 value: 35.845 - type: ndcg_at_1000 value: 47.377 - type: ndcg_at_3 value: 29.633 - type: ndcg_at_5 value: 28.378999999999998 - type: precision_at_1 value: 42.857 - type: precision_at_10 value: 24.082 - type: precision_at_100 value: 6.877999999999999 - type: precision_at_1000 value: 1.463 - type: precision_at_3 value: 29.932 - type: precision_at_5 value: 28.571 - type: recall_at_1 value: 3.5029999999999997 - type: recall_at_10 value: 17.068 - type: recall_at_100 value: 43.361 - type: recall_at_1000 value: 78.835 - type: recall_at_3 value: 6.821000000000001 - type: recall_at_5 value: 10.357 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.0954 - type: ap value: 14.216844153511959 - type: f1 value: 54.63687418565117 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.46293152235427 - type: f1 value: 61.744177921638645 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 41.12708617788644 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.75430649102938 - type: cos_sim_ap value: 73.34252536948081 - type: cos_sim_f1 value: 67.53758935173774 - type: cos_sim_precision value: 63.3672525439408 - type: cos_sim_recall value: 72.29551451187335 - type: dot_accuracy value: 81.71305954580676 - type: dot_ap value: 59.5532209082386 - type: dot_f1 value: 56.18466898954705 - type: dot_precision value: 47.830923248053395 - type: 
dot_recall value: 68.07387862796834 - type: euclidean_accuracy value: 85.81987244441795 - type: euclidean_ap value: 73.34325409809446 - type: euclidean_f1 value: 67.83451360417443 - type: euclidean_precision value: 64.09955388588871 - type: euclidean_recall value: 72.0316622691293 - type: manhattan_accuracy value: 85.68277999642368 - type: manhattan_ap value: 73.1535450121903 - type: manhattan_f1 value: 67.928237896289 - type: manhattan_precision value: 63.56945722171113 - type: manhattan_recall value: 72.9287598944591 - type: max_accuracy value: 85.81987244441795 - type: max_ap value: 73.34325409809446 - type: max_f1 value: 67.928237896289 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.90441262079403 - type: cos_sim_ap value: 85.79331880741438 - type: cos_sim_f1 value: 78.31563529842548 - type: cos_sim_precision value: 74.6683424102779 - type: cos_sim_recall value: 82.33754234678165 - type: dot_accuracy value: 84.89928978926534 - type: dot_ap value: 75.25819218316 - type: dot_f1 value: 69.88730119720536 - type: dot_precision value: 64.23362374959665 - type: dot_recall value: 76.63227594702803 - type: euclidean_accuracy value: 89.01695967710637 - type: euclidean_ap value: 85.98986606038852 - type: euclidean_f1 value: 78.5277880014722 - type: euclidean_precision value: 75.22211253701876 - type: euclidean_recall value: 82.13735756082538 - type: manhattan_accuracy value: 88.99561454573679 - type: manhattan_ap value: 85.92262421793953 - type: manhattan_f1 value: 78.38866094740769 - type: manhattan_precision value: 76.02373028505282 - type: manhattan_recall value: 80.9054511857099 - type: max_accuracy value: 89.01695967710637 - type: max_ap value: 85.98986606038852 - type: max_f1 value: 78.5277880014722 language: - en license: mit --- # E5-small-v2 Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022 This model has 12 layers and the embedding size is 384. ## Usage An example of encoding queries and passages from the MS-MARCO passage ranking dataset is shown in the sketch after the FAQ below. ## Training Details Please refer to our paper. ## Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## Support for Sentence Transformers An example of usage with sentence_transformers is shown in the second sketch after the FAQ below. Package requirements: the sentence_transformers package. Contributors: michaelfeil ## FAQ **1. Do I need to add the prefix \"query: \" and \"passage: \" to input texts?** Yes, this is how the model was trained; otherwise you will see a performance degradation. Here are some rules of thumb: - Use \"query: \" and \"passage: \" respectively for asymmetric tasks such as passage retrieval in open QA and ad-hoc information retrieval. - Use the \"query: \" prefix for symmetric tasks such as semantic similarity and paraphrase retrieval. - Use the \"query: \" prefix if you want to use embeddings as features, such as for linear probing classification or clustering. **2. Why are my reproduced results slightly different from those reported in the model card?** Different versions of `transformers` and `pytorch` could cause negligible but non-zero performance differences. **3. Why do the cosine similarity scores distribute around 0.7 to 1.0?** This is a known and expected behavior, as we use a low temperature of 0.01 for the InfoNCE contrastive loss. For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores rather than their absolute values, so this should not be an issue.
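To make the Usage section above concrete, here is a minimal sketch of encoding queries and passages with `transformers`, assuming the average-pooling recipe commonly used with E5 models. This is a hedged reconstruction rather than the card's original sample, and the input texts are purely illustrative:

```python
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Zero out padding positions, then mean-pool over the sequence dimension.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

# Every input must carry a 'query: ' or 'passage: ' prefix (see FAQ 1 above).
input_texts = [
    'query: how much protein should a female eat',
    'passage: As a general guideline, adult women need roughly 46 grams of protein per day.',
]

tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-small-v2')
model = AutoModel.from_pretrained('intfloat/e5-small-v2')

# Inputs beyond the model's 512-token limit are truncated (see Limitations below).
batch = tokenizer(input_texts, max_length=512, padding=True,
                  truncation=True, return_tensors='pt')
outputs = model(**batch)
embeddings = average_pool(outputs.last_hidden_state, batch['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit length, 384 dimensions each

# Cosine similarity between query and passage; per FAQ 3, expect values around 0.7-1.0.
print((embeddings[0] @ embeddings[1]).item())
```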
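For the Sentence Transformers route mentioned above, a similarly hedged sketch, assuming the model is loaded directly by its Hub id; note that the prefixes still have to be added by hand:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('intfloat/e5-small-v2')

# The 'query: '/'passage: ' prefixes are not added automatically by the library.
embeddings = model.encode(
    [
        'query: how much protein should a female eat',
        'passage: As a general guideline, adult women need roughly 46 grams of protein per day.',
    ],
    normalize_embeddings=True,  # normalized output, so dot product equals cosine similarity
)
print(embeddings.shape)  # (2, 384)
```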
## Citation If you find our paper or models helpful, please consider citing them as follows: ## Limitations This model only works for English texts. Long texts will be truncated to at most 512 tokens.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity, classification, clustering, and retrieval." +} \ No newline at end of file diff --git a/data/model_data_json/intfloat_multilingual-e5-base.json b/data/model_data_json/intfloat_multilingual-e5-base.json new file mode 100644 index 0000000000000000000000000000000000000000..a0d8bff236d4fb583df28ed0bc6deebe3226e7f6 --- /dev/null +++ b/data/model_data_json/intfloat_multilingual-e5-base.json @@ -0,0 +1,121 @@ +{ + "model_id": "intfloat/multilingual-e5-base", + "downloads": 556387, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "xlm-roberta", + "mteb", + "Sentence Transformers", + "sentence-similarity", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:2402.05672", + "arxiv:2108.08787", + "arxiv:2104.08663", + "arxiv:2210.07316", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - Sentence Transformers - sentence-similarity - sentence-transformers model-index: - name: multilingual-e5-base results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 78.97014925373135 - type: ap value: 43.69351129103008 - type: f1 value: 73.38075030070492 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (de) config: de split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 71.7237687366167 - type: ap value: 82.22089859962671 - type: f1 value: 69.95532758884401 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en-ext) config: en-ext split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 79.65517241379312 - type: ap value: 28.507918657094738 - type: f1 value: 66.84516013726119 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (ja) config: ja split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.32976445396146 - type: ap value: 20.720481637566014 - type: f1 value: 59.78002763416003 - task:
type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 90.63775 - type: ap value: 87.22277903861716 - type: f1 value: 90.60378636386807 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 44.546 - type: f1 value: 44.05666638370923 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 41.828 - type: f1 value: 41.2710255644252 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.534 - type: f1 value: 39.820743174270326 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 39.684 - type: f1 value: 39.11052682815307 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 37.436 - type: f1 value: 37.07082931930871 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 37.226000000000006 - type: f1 value: 36.65372077739185 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 22.831000000000003 - type: map_at_10 value: 36.42 - type: map_at_100 value: 37.699 - type: map_at_1000 value: 37.724000000000004 - type: map_at_3 value: 32.207 - type: map_at_5 value: 34.312 - type: mrr_at_1 value: 23.257 - type: mrr_at_10 value: 36.574 - type: mrr_at_100 value: 37.854 - type: mrr_at_1000 value: 37.878 - type: mrr_at_3 value: 32.385000000000005 - type: mrr_at_5 value: 34.48 - type: ndcg_at_1 value: 22.831000000000003 - type: ndcg_at_10 value: 44.230000000000004 - type: ndcg_at_100 value: 49.974000000000004 - type: ndcg_at_1000 value: 50.522999999999996 - type: ndcg_at_3 value: 35.363 - type: ndcg_at_5 value: 39.164 - type: precision_at_1 value: 22.831000000000003 - type: precision_at_10 value: 6.935 - type: precision_at_100 value: 0.9520000000000001 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 14.841 - type: precision_at_5 value: 10.754 - type: recall_at_1 value: 22.831000000000003 - type: recall_at_10 value: 69.346 - type: recall_at_100 value: 95.235 - type: recall_at_1000 value: 99.36 - type: recall_at_3 value: 44.523 - type: recall_at_5 value: 53.769999999999996 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 40.27789869854063 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 
metrics: - type: v_measure value: 35.41979463347428 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 58.22752045109304 - type: mrr value: 71.51112430198303 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 84.71147646622866 - type: cos_sim_spearman value: 85.059167046486 - type: euclidean_pearson value: 75.88421613600647 - type: euclidean_spearman value: 75.12821787150585 - type: manhattan_pearson value: 75.22005646957604 - type: manhattan_spearman value: 74.42880434453272 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.23799582463465 - type: f1 value: 99.12665274878218 - type: precision value: 99.07098121085595 - type: recall value: 99.23799582463465 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 97.88685890380806 - type: f1 value: 97.59336708489249 - type: precision value: 97.44662117543473 - type: recall value: 97.88685890380806 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 97.47142362313821 - type: f1 value: 97.1989377670015 - type: precision value: 97.06384944001847 - type: recall value: 97.47142362313821 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 98.4728804634018 - type: f1 value: 98.2973494821836 - type: precision value: 98.2095839915745 - type: recall value: 98.4728804634018 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 82.74025974025975 - type: f1 value: 82.67420447730439 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 35.0380848063507 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 29.45956405670166 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.122 - type: map_at_10 value: 42.03 - type: map_at_100 value: 43.364000000000004 - type: map_at_1000 value: 43.474000000000004 - type: map_at_3 value: 38.804 - type: map_at_5 value: 40.585 - type: mrr_at_1 value: 39.914 - type: mrr_at_10 value: 48.227 - type: mrr_at_100 value: 49.018 - type: mrr_at_1000 value: 49.064 - type: mrr_at_3 value: 45.994 - type: mrr_at_5 value: 47.396 - type: ndcg_at_1 value: 39.914 - type: ndcg_at_10 value: 47.825 - type: ndcg_at_100 value: 52.852 - type: ndcg_at_1000 value: 54.891 - type: ndcg_at_3 
value: 43.517 - type: ndcg_at_5 value: 45.493 - type: precision_at_1 value: 39.914 - type: precision_at_10 value: 8.956 - type: precision_at_100 value: 1.388 - type: precision_at_1000 value: 0.182 - type: precision_at_3 value: 20.791999999999998 - type: precision_at_5 value: 14.821000000000002 - type: recall_at_1 value: 32.122 - type: recall_at_10 value: 58.294999999999995 - type: recall_at_100 value: 79.726 - type: recall_at_1000 value: 93.099 - type: recall_at_3 value: 45.017 - type: recall_at_5 value: 51.002 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.677999999999997 - type: map_at_10 value: 38.684000000000005 - type: map_at_100 value: 39.812999999999995 - type: map_at_1000 value: 39.945 - type: map_at_3 value: 35.831 - type: map_at_5 value: 37.446 - type: mrr_at_1 value: 37.771 - type: mrr_at_10 value: 44.936 - type: mrr_at_100 value: 45.583 - type: mrr_at_1000 value: 45.634 - type: mrr_at_3 value: 42.771 - type: mrr_at_5 value: 43.994 - type: ndcg_at_1 value: 37.771 - type: ndcg_at_10 value: 44.059 - type: ndcg_at_100 value: 48.192 - type: ndcg_at_1000 value: 50.375 - type: ndcg_at_3 value: 40.172000000000004 - type: ndcg_at_5 value: 41.899 - type: precision_at_1 value: 37.771 - type: precision_at_10 value: 8.286999999999999 - type: precision_at_100 value: 1.322 - type: precision_at_1000 value: 0.178 - type: precision_at_3 value: 19.406000000000002 - type: precision_at_5 value: 13.745 - type: recall_at_1 value: 29.677999999999997 - type: recall_at_10 value: 53.071 - type: recall_at_100 value: 70.812 - type: recall_at_1000 value: 84.841 - type: recall_at_3 value: 41.016000000000005 - type: recall_at_5 value: 46.22 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 42.675000000000004 - type: map_at_10 value: 53.93599999999999 - type: map_at_100 value: 54.806999999999995 - type: map_at_1000 value: 54.867 - type: map_at_3 value: 50.934000000000005 - type: map_at_5 value: 52.583 - type: mrr_at_1 value: 48.339 - type: mrr_at_10 value: 57.265 - type: mrr_at_100 value: 57.873 - type: mrr_at_1000 value: 57.906 - type: mrr_at_3 value: 55.193000000000005 - type: mrr_at_5 value: 56.303000000000004 - type: ndcg_at_1 value: 48.339 - type: ndcg_at_10 value: 59.19799999999999 - type: ndcg_at_100 value: 62.743 - type: ndcg_at_1000 value: 63.99399999999999 - type: ndcg_at_3 value: 54.367 - type: ndcg_at_5 value: 56.548 - type: precision_at_1 value: 48.339 - type: precision_at_10 value: 9.216000000000001 - type: precision_at_100 value: 1.1809999999999998 - type: precision_at_1000 value: 0.134 - type: precision_at_3 value: 23.72 - type: precision_at_5 value: 16.025 - type: recall_at_1 value: 42.675000000000004 - type: recall_at_10 value: 71.437 - type: recall_at_100 value: 86.803 - type: recall_at_1000 value: 95.581 - type: recall_at_3 value: 58.434 - type: recall_at_5 value: 63.754 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.518 - type: map_at_10 value: 30.648999999999997 - type: map_at_100 value: 31.508999999999997 - type: map_at_1000 value: 31.604 - type: map_at_3 value: 28.247 - type: map_at_5 value: 29.65 - type: mrr_at_1 value: 25.650000000000002 - type: mrr_at_10 value: 32.771 - type: mrr_at_100 value: 33.554 - type: mrr_at_1000 
value: 33.629999999999995 - type: mrr_at_3 value: 30.433 - type: mrr_at_5 value: 31.812 - type: ndcg_at_1 value: 25.650000000000002 - type: ndcg_at_10 value: 34.929 - type: ndcg_at_100 value: 39.382 - type: ndcg_at_1000 value: 41.913 - type: ndcg_at_3 value: 30.292 - type: ndcg_at_5 value: 32.629999999999995 - type: precision_at_1 value: 25.650000000000002 - type: precision_at_10 value: 5.311 - type: precision_at_100 value: 0.792 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 12.58 - type: precision_at_5 value: 8.994 - type: recall_at_1 value: 23.518 - type: recall_at_10 value: 46.19 - type: recall_at_100 value: 67.123 - type: recall_at_1000 value: 86.442 - type: recall_at_3 value: 33.678000000000004 - type: recall_at_5 value: 39.244 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.891 - type: map_at_10 value: 22.464000000000002 - type: map_at_100 value: 23.483 - type: map_at_1000 value: 23.613 - type: map_at_3 value: 20.080000000000002 - type: map_at_5 value: 21.526 - type: mrr_at_1 value: 20.025000000000002 - type: mrr_at_10 value: 26.712999999999997 - type: mrr_at_100 value: 27.650000000000002 - type: mrr_at_1000 value: 27.737000000000002 - type: mrr_at_3 value: 24.274 - type: mrr_at_5 value: 25.711000000000002 - type: ndcg_at_1 value: 20.025000000000002 - type: ndcg_at_10 value: 27.028999999999996 - type: ndcg_at_100 value: 32.064 - type: ndcg_at_1000 value: 35.188 - type: ndcg_at_3 value: 22.512999999999998 - type: ndcg_at_5 value: 24.89 - type: precision_at_1 value: 20.025000000000002 - type: precision_at_10 value: 4.776 - type: precision_at_100 value: 0.8500000000000001 - type: precision_at_1000 value: 0.125 - type: precision_at_3 value: 10.531 - type: precision_at_5 value: 7.811 - type: recall_at_1 value: 15.891 - type: recall_at_10 value: 37.261 - type: recall_at_100 value: 59.12 - type: recall_at_1000 value: 81.356 - type: recall_at_3 value: 24.741 - type: recall_at_5 value: 30.753999999999998 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.544 - type: map_at_10 value: 36.283 - type: map_at_100 value: 37.467 - type: map_at_1000 value: 37.574000000000005 - type: map_at_3 value: 33.528999999999996 - type: map_at_5 value: 35.028999999999996 - type: mrr_at_1 value: 34.166999999999994 - type: mrr_at_10 value: 41.866 - type: mrr_at_100 value: 42.666 - type: mrr_at_1000 value: 42.716 - type: mrr_at_3 value: 39.541 - type: mrr_at_5 value: 40.768 - type: ndcg_at_1 value: 34.166999999999994 - type: ndcg_at_10 value: 41.577 - type: ndcg_at_100 value: 46.687 - type: ndcg_at_1000 value: 48.967 - type: ndcg_at_3 value: 37.177 - type: ndcg_at_5 value: 39.097 - type: precision_at_1 value: 34.166999999999994 - type: precision_at_10 value: 7.420999999999999 - type: precision_at_100 value: 1.165 - type: precision_at_1000 value: 0.154 - type: precision_at_3 value: 17.291999999999998 - type: precision_at_5 value: 12.166 - type: recall_at_1 value: 27.544 - type: recall_at_10 value: 51.99399999999999 - type: recall_at_100 value: 73.738 - type: recall_at_1000 value: 89.33 - type: recall_at_3 value: 39.179 - type: recall_at_5 value: 44.385999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.661 - 
type: map_at_10 value: 35.475 - type: map_at_100 value: 36.626999999999995 - type: map_at_1000 value: 36.741 - type: map_at_3 value: 32.818000000000005 - type: map_at_5 value: 34.397 - type: mrr_at_1 value: 32.647999999999996 - type: mrr_at_10 value: 40.784 - type: mrr_at_100 value: 41.602 - type: mrr_at_1000 value: 41.661 - type: mrr_at_3 value: 38.68 - type: mrr_at_5 value: 39.838 - type: ndcg_at_1 value: 32.647999999999996 - type: ndcg_at_10 value: 40.697 - type: ndcg_at_100 value: 45.799 - type: ndcg_at_1000 value: 48.235 - type: ndcg_at_3 value: 36.516 - type: ndcg_at_5 value: 38.515 - type: precision_at_1 value: 32.647999999999996 - type: precision_at_10 value: 7.202999999999999 - type: precision_at_100 value: 1.1360000000000001 - type: precision_at_1000 value: 0.151 - type: precision_at_3 value: 17.314 - type: precision_at_5 value: 12.145999999999999 - type: recall_at_1 value: 26.661 - type: recall_at_10 value: 50.995000000000005 - type: recall_at_100 value: 73.065 - type: recall_at_1000 value: 89.781 - type: recall_at_3 value: 39.073 - type: recall_at_5 value: 44.395 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.946583333333333 - type: map_at_10 value: 33.79725 - type: map_at_100 value: 34.86408333333333 - type: map_at_1000 value: 34.9795 - type: map_at_3 value: 31.259999999999998 - type: map_at_5 value: 32.71541666666666 - type: mrr_at_1 value: 30.863749999999996 - type: mrr_at_10 value: 37.99183333333333 - type: mrr_at_100 value: 38.790499999999994 - type: mrr_at_1000 value: 38.85575000000001 - type: mrr_at_3 value: 35.82083333333333 - type: mrr_at_5 value: 37.07533333333333 - type: ndcg_at_1 value: 30.863749999999996 - type: ndcg_at_10 value: 38.52141666666667 - type: ndcg_at_100 value: 43.17966666666667 - type: ndcg_at_1000 value: 45.64608333333333 - type: ndcg_at_3 value: 34.333000000000006 - type: ndcg_at_5 value: 36.34975 - type: precision_at_1 value: 30.863749999999996 - type: precision_at_10 value: 6.598999999999999 - type: precision_at_100 value: 1.0502500000000001 - type: precision_at_1000 value: 0.14400000000000002 - type: precision_at_3 value: 15.557583333333334 - type: precision_at_5 value: 11.020000000000001 - type: recall_at_1 value: 25.946583333333333 - type: recall_at_10 value: 48.36991666666666 - type: recall_at_100 value: 69.02408333333334 - type: recall_at_1000 value: 86.43858333333331 - type: recall_at_3 value: 36.4965 - type: recall_at_5 value: 41.76258333333334 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.431 - type: map_at_10 value: 28.889 - type: map_at_100 value: 29.642000000000003 - type: map_at_1000 value: 29.742 - type: map_at_3 value: 26.998 - type: map_at_5 value: 28.172000000000004 - type: mrr_at_1 value: 25.307000000000002 - type: mrr_at_10 value: 31.763 - type: mrr_at_100 value: 32.443 - type: mrr_at_1000 value: 32.531 - type: mrr_at_3 value: 29.959000000000003 - type: mrr_at_5 value: 31.063000000000002 - type: ndcg_at_1 value: 25.307000000000002 - type: ndcg_at_10 value: 32.586999999999996 - type: ndcg_at_100 value: 36.5 - type: ndcg_at_1000 value: 39.133 - type: ndcg_at_3 value: 29.25 - type: ndcg_at_5 value: 31.023 - type: precision_at_1 value: 25.307000000000002 - type: precision_at_10 value: 4.954 - type: precision_at_100 value: 0.747 - type: precision_at_1000 value: 0.104 - type: precision_at_3 
value: 12.577 - type: precision_at_5 value: 8.741999999999999 - type: recall_at_1 value: 22.431 - type: recall_at_10 value: 41.134 - type: recall_at_100 value: 59.28600000000001 - type: recall_at_1000 value: 78.857 - type: recall_at_3 value: 31.926 - type: recall_at_5 value: 36.335 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.586 - type: map_at_10 value: 23.304 - type: map_at_100 value: 24.159 - type: map_at_1000 value: 24.281 - type: map_at_3 value: 21.316 - type: map_at_5 value: 22.383 - type: mrr_at_1 value: 21.645 - type: mrr_at_10 value: 27.365000000000002 - type: mrr_at_100 value: 28.108 - type: mrr_at_1000 value: 28.192 - type: mrr_at_3 value: 25.482 - type: mrr_at_5 value: 26.479999999999997 - type: ndcg_at_1 value: 21.645 - type: ndcg_at_10 value: 27.306 - type: ndcg_at_100 value: 31.496000000000002 - type: ndcg_at_1000 value: 34.53 - type: ndcg_at_3 value: 23.73 - type: ndcg_at_5 value: 25.294 - type: precision_at_1 value: 21.645 - type: precision_at_10 value: 4.797 - type: precision_at_100 value: 0.8059999999999999 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 10.850999999999999 - type: precision_at_5 value: 7.736 - type: recall_at_1 value: 17.586 - type: recall_at_10 value: 35.481 - type: recall_at_100 value: 54.534000000000006 - type: recall_at_1000 value: 76.456 - type: recall_at_3 value: 25.335 - type: recall_at_5 value: 29.473 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.095 - type: map_at_10 value: 32.374 - type: map_at_100 value: 33.537 - type: map_at_1000 value: 33.634 - type: map_at_3 value: 30.089 - type: map_at_5 value: 31.433 - type: mrr_at_1 value: 29.198 - type: mrr_at_10 value: 36.01 - type: mrr_at_100 value: 37.022 - type: mrr_at_1000 value: 37.083 - type: mrr_at_3 value: 33.94 - type: mrr_at_5 value: 35.148 - type: ndcg_at_1 value: 29.198 - type: ndcg_at_10 value: 36.729 - type: ndcg_at_100 value: 42.114000000000004 - type: ndcg_at_1000 value: 44.592 - type: ndcg_at_3 value: 32.644 - type: ndcg_at_5 value: 34.652 - type: precision_at_1 value: 29.198 - type: precision_at_10 value: 5.970000000000001 - type: precision_at_100 value: 0.967 - type: precision_at_1000 value: 0.129 - type: precision_at_3 value: 14.396999999999998 - type: precision_at_5 value: 10.093 - type: recall_at_1 value: 25.095 - type: recall_at_10 value: 46.392 - type: recall_at_100 value: 69.706 - type: recall_at_1000 value: 87.738 - type: recall_at_3 value: 35.303000000000004 - type: recall_at_5 value: 40.441 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.857999999999997 - type: map_at_10 value: 34.066 - type: map_at_100 value: 35.671 - type: map_at_1000 value: 35.881 - type: map_at_3 value: 31.304 - type: map_at_5 value: 32.885 - type: mrr_at_1 value: 32.411 - type: mrr_at_10 value: 38.987 - type: mrr_at_100 value: 39.894 - type: mrr_at_1000 value: 39.959 - type: mrr_at_3 value: 36.626999999999995 - type: mrr_at_5 value: 38.011 - type: ndcg_at_1 value: 32.411 - type: ndcg_at_10 value: 39.208 - type: ndcg_at_100 value: 44.626 - type: ndcg_at_1000 value: 47.43 - type: ndcg_at_3 value: 35.091 - type: ndcg_at_5 value: 37.119 - type: precision_at_1 value: 32.411 - type: precision_at_10 value: 7.51 - 
type: precision_at_100 value: 1.486 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 16.14 - type: precision_at_5 value: 11.976 - type: recall_at_1 value: 26.857999999999997 - type: recall_at_10 value: 47.407 - type: recall_at_100 value: 72.236 - type: recall_at_1000 value: 90.77 - type: recall_at_3 value: 35.125 - type: recall_at_5 value: 40.522999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.3 - type: map_at_10 value: 27.412999999999997 - type: map_at_100 value: 28.29 - type: map_at_1000 value: 28.398 - type: map_at_3 value: 25.169999999999998 - type: map_at_5 value: 26.496 - type: mrr_at_1 value: 23.29 - type: mrr_at_10 value: 29.215000000000003 - type: mrr_at_100 value: 30.073 - type: mrr_at_1000 value: 30.156 - type: mrr_at_3 value: 26.956000000000003 - type: mrr_at_5 value: 28.38 - type: ndcg_at_1 value: 23.29 - type: ndcg_at_10 value: 31.113000000000003 - type: ndcg_at_100 value: 35.701 - type: ndcg_at_1000 value: 38.505 - type: ndcg_at_3 value: 26.727 - type: ndcg_at_5 value: 29.037000000000003 - type: precision_at_1 value: 23.29 - type: precision_at_10 value: 4.787 - type: precision_at_100 value: 0.763 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 11.091 - type: precision_at_5 value: 7.985 - type: recall_at_1 value: 21.3 - type: recall_at_10 value: 40.782000000000004 - type: recall_at_100 value: 62.13999999999999 - type: recall_at_1000 value: 83.012 - type: recall_at_3 value: 29.131 - type: recall_at_5 value: 34.624 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 9.631 - type: map_at_10 value: 16.634999999999998 - type: map_at_100 value: 18.23 - type: map_at_1000 value: 18.419 - type: map_at_3 value: 13.66 - type: map_at_5 value: 15.173 - type: mrr_at_1 value: 21.368000000000002 - type: mrr_at_10 value: 31.56 - type: mrr_at_100 value: 32.58 - type: mrr_at_1000 value: 32.633 - type: mrr_at_3 value: 28.241 - type: mrr_at_5 value: 30.225 - type: ndcg_at_1 value: 21.368000000000002 - type: ndcg_at_10 value: 23.855999999999998 - type: ndcg_at_100 value: 30.686999999999998 - type: ndcg_at_1000 value: 34.327000000000005 - type: ndcg_at_3 value: 18.781 - type: ndcg_at_5 value: 20.73 - type: precision_at_1 value: 21.368000000000002 - type: precision_at_10 value: 7.564 - type: precision_at_100 value: 1.496 - type: precision_at_1000 value: 0.217 - type: precision_at_3 value: 13.876 - type: precision_at_5 value: 11.062 - type: recall_at_1 value: 9.631 - type: recall_at_10 value: 29.517 - type: recall_at_100 value: 53.452 - type: recall_at_1000 value: 74.115 - type: recall_at_3 value: 17.605999999999998 - type: recall_at_5 value: 22.505 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.885 - type: map_at_10 value: 18.798000000000002 - type: map_at_100 value: 26.316 - type: map_at_1000 value: 27.869 - type: map_at_3 value: 13.719000000000001 - type: map_at_5 value: 15.716 - type: mrr_at_1 value: 66 - type: mrr_at_10 value: 74.263 - type: mrr_at_100 value: 74.519 - type: mrr_at_1000 value: 74.531 - type: mrr_at_3 value: 72.458 - type: mrr_at_5 value: 73.321 - type: ndcg_at_1 value: 53.87499999999999 - type: ndcg_at_10 value: 40.355999999999995 - type: ndcg_at_100 value: 44.366 - type: ndcg_at_1000 
value: 51.771 - type: ndcg_at_3 value: 45.195 - type: ndcg_at_5 value: 42.187000000000005 - type: precision_at_1 value: 66 - type: precision_at_10 value: 31.75 - type: precision_at_100 value: 10.11 - type: precision_at_1000 value: 1.9800000000000002 - type: precision_at_3 value: 48.167 - type: precision_at_5 value: 40.050000000000004 - type: recall_at_1 value: 8.885 - type: recall_at_10 value: 24.471999999999998 - type: recall_at_100 value: 49.669000000000004 - type: recall_at_1000 value: 73.383 - type: recall_at_3 value: 14.872 - type: recall_at_5 value: 18.262999999999998 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 45.18 - type: f1 value: 40.26878691789978 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 62.751999999999995 - type: map_at_10 value: 74.131 - type: map_at_100 value: 74.407 - type: map_at_1000 value: 74.423 - type: map_at_3 value: 72.329 - type: map_at_5 value: 73.555 - type: mrr_at_1 value: 67.282 - type: mrr_at_10 value: 78.292 - type: mrr_at_100 value: 78.455 - type: mrr_at_1000 value: 78.458 - type: mrr_at_3 value: 76.755 - type: mrr_at_5 value: 77.839 - type: ndcg_at_1 value: 67.282 - type: ndcg_at_10 value: 79.443 - type: ndcg_at_100 value: 80.529 - type: ndcg_at_1000 value: 80.812 - type: ndcg_at_3 value: 76.281 - type: ndcg_at_5 value: 78.235 - type: precision_at_1 value: 67.282 - type: precision_at_10 value: 10.078 - type: precision_at_100 value: 1.082 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 30.178 - type: precision_at_5 value: 19.232 - type: recall_at_1 value: 62.751999999999995 - type: recall_at_10 value: 91.521 - type: recall_at_100 value: 95.997 - type: recall_at_1000 value: 97.775 - type: recall_at_3 value: 83.131 - type: recall_at_5 value: 87.93299999999999 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 18.861 - type: map_at_10 value: 30.252000000000002 - type: map_at_100 value: 32.082 - type: map_at_1000 value: 32.261 - type: map_at_3 value: 25.909 - type: map_at_5 value: 28.296 - type: mrr_at_1 value: 37.346000000000004 - type: mrr_at_10 value: 45.802 - type: mrr_at_100 value: 46.611999999999995 - type: mrr_at_1000 value: 46.659 - type: mrr_at_3 value: 43.056 - type: mrr_at_5 value: 44.637 - type: ndcg_at_1 value: 37.346000000000004 - type: ndcg_at_10 value: 38.169 - type: ndcg_at_100 value: 44.864 - type: ndcg_at_1000 value: 47.974 - type: ndcg_at_3 value: 33.619 - type: ndcg_at_5 value: 35.317 - type: precision_at_1 value: 37.346000000000004 - type: precision_at_10 value: 10.693999999999999 - type: precision_at_100 value: 1.775 - type: precision_at_1000 value: 0.231 - type: precision_at_3 value: 22.325 - type: precision_at_5 value: 16.852 - type: recall_at_1 value: 18.861 - type: recall_at_10 value: 45.672000000000004 - type: recall_at_100 value: 70.60499999999999 - type: recall_at_1000 value: 89.216 - type: recall_at_3 value: 30.361 - type: recall_at_5 value: 36.998999999999995 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 37.852999999999994 - type: map_at_10 value: 59.961 - type: map_at_100 value: 60.78 - type: map_at_1000 value: 60.843 - type: map_at_3 value: 56.39999999999999 - 
type: map_at_5 value: 58.646 - type: mrr_at_1 value: 75.70599999999999 - type: mrr_at_10 value: 82.321 - type: mrr_at_100 value: 82.516 - type: mrr_at_1000 value: 82.525 - type: mrr_at_3 value: 81.317 - type: mrr_at_5 value: 81.922 - type: ndcg_at_1 value: 75.70599999999999 - type: ndcg_at_10 value: 68.557 - type: ndcg_at_100 value: 71.485 - type: ndcg_at_1000 value: 72.71600000000001 - type: ndcg_at_3 value: 63.524 - type: ndcg_at_5 value: 66.338 - type: precision_at_1 value: 75.70599999999999 - type: precision_at_10 value: 14.463000000000001 - type: precision_at_100 value: 1.677 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 40.806 - type: precision_at_5 value: 26.709 - type: recall_at_1 value: 37.852999999999994 - type: recall_at_10 value: 72.316 - type: recall_at_100 value: 83.842 - type: recall_at_1000 value: 91.999 - type: recall_at_3 value: 61.209 - type: recall_at_5 value: 66.77199999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 85.46039999999999 - type: ap value: 79.9812521351881 - type: f1 value: 85.31722909702084 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 22.704 - type: map_at_10 value: 35.329 - type: map_at_100 value: 36.494 - type: map_at_1000 value: 36.541000000000004 - type: map_at_3 value: 31.476 - type: map_at_5 value: 33.731 - type: mrr_at_1 value: 23.294999999999998 - type: mrr_at_10 value: 35.859 - type: mrr_at_100 value: 36.968 - type: mrr_at_1000 value: 37.008 - type: mrr_at_3 value: 32.085 - type: mrr_at_5 value: 34.299 - type: ndcg_at_1 value: 23.324 - type: ndcg_at_10 value: 42.274 - type: ndcg_at_100 value: 47.839999999999996 - type: ndcg_at_1000 value: 48.971 - type: ndcg_at_3 value: 34.454 - type: ndcg_at_5 value: 38.464 - type: precision_at_1 value: 23.324 - type: precision_at_10 value: 6.648 - type: precision_at_100 value: 0.9440000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.674999999999999 - type: precision_at_5 value: 10.850999999999999 - type: recall_at_1 value: 22.704 - type: recall_at_10 value: 63.660000000000004 - type: recall_at_100 value: 89.29899999999999 - type: recall_at_1000 value: 97.88900000000001 - type: recall_at_3 value: 42.441 - type: recall_at_5 value: 52.04 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.1326949384405 - type: f1 value: 92.89743579612082 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.62524654832347 - type: f1 value: 88.65106082263151 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.59039359573046 - type: f1 value: 90.31532892105662 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 86.21046038208581 - type: f1 value: 86.41459529813113 - task: type: Classification dataset: type: 
mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 87.3180351380423 - type: f1 value: 86.71383078226444 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 86.24231464737792 - type: f1 value: 86.31845567592403 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 75.27131782945736 - type: f1 value: 57.52079940417103 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 71.2341504649197 - type: f1 value: 51.349951558039244 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 71.27418278852569 - type: f1 value: 50.1714985749095 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 67.68243031631694 - type: f1 value: 50.1066160836192 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 69.2362854069559 - type: f1 value: 48.821279948766424 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 71.71428571428571 - type: f1 value: 53.94611389496195 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.97646267652992 - type: f1 value: 57.26797883561521 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 53.65501008742435 - type: f1 value: 50.416258382177034 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.45796906523201 - type: f1 value: 53.306690547422185 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.59246805648957 - type: f1 value: 59.818381969051494 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.126429051782104 - type: f1 value: 58.25993593933026 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification 
(cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 50.057162071284466 - type: f1 value: 46.96095728790911 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.64425016812375 - type: f1 value: 62.858291698755764 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.08944182918628 - type: f1 value: 62.44639030604241 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.68056489576328 - type: f1 value: 61.775326758789504 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.11163416274377 - type: f1 value: 69.70789096927015 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.40282447881641 - type: f1 value: 66.38492065671895 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.24613315400134 - type: f1 value: 64.3348019501336 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.78345662407531 - type: f1 value: 62.21279452354622 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.9455279085407 - type: f1 value: 65.48193124964094 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.05110961667788 - type: f1 value: 58.097856564684534 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.95292535305985 - type: f1 value: 62.09182174767901 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.97310020174848 - type: f1 value: 61.14252567730396 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.08069939475453 - type: f1 value: 57.044041742492034 - task: type: Classification dataset: type: mteb/amazon_massive_intent 
name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.63752521856085 - type: f1 value: 63.889340907205316 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.385339609952936 - type: f1 value: 53.449033750088304 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.93073301950234 - type: f1 value: 65.9884357824104 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.94418291862812 - type: f1 value: 66.48740222583132 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 54.26025554808339 - type: f1 value: 50.19562815100793 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 48.98789509078682 - type: f1 value: 46.65788438676836 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 44.68728984532616 - type: f1 value: 41.642419349541996 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.19300605245461 - type: f1 value: 55.8626492442437 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.33826496301278 - type: f1 value: 63.89499791648792 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.33960995292536 - type: f1 value: 57.15242464180892 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.09347679892402 - type: f1 value: 59.64733214063841 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.75924680564896 - type: f1 value: 55.96585692366827 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.48486886348352 - type: f1 value: 59.45143559032946 - task: type: Classification 
dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.56422326832549 - type: f1 value: 54.96368702901926 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.18022864828512 - type: f1 value: 63.05369805040634 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.30329522528581 - type: f1 value: 64.06084612020727 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.36919973100201 - type: f1 value: 65.12154124788887 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.98117014122394 - type: f1 value: 66.41847559806962 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.53799596503026 - type: f1 value: 62.17067330740817 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.01815736381977 - type: f1 value: 66.24988369607843 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.34700739744452 - type: f1 value: 59.957933424941636 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.23402824478815 - type: f1 value: 57.98836976018471 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.54068594485541 - type: f1 value: 65.43849680666855 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 55.998655010087425 - type: f1 value: 52.83737515406804 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.71217215870882 - type: f1 value: 55.051794977833026 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.724277067921996 - type: f1 value: 
56.33485571838306 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.59515803631473 - type: f1 value: 64.96772366193588 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.860793544048406 - type: f1 value: 58.148845819115394 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.40753194351043 - type: f1 value: 63.18903778054698 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.52320107599194 - type: f1 value: 58.356144563398516 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.17014122394083 - type: f1 value: 63.919964062638925 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.15601882985878 - type: f1 value: 67.01451905761371 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.65030262273034 - type: f1 value: 64.14420425129063 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.08742434431743 - type: f1 value: 63.044060042311756 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.52387357094821 - type: f1 value: 56.82398588814534 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.239408204438476 - type: f1 value: 61.92570286170469 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.74915938130463 - type: f1 value: 62.130740689396276 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.00336247478144 - type: f1 value: 63.71080635228055 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 
7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 52.837928715534645 - type: f1 value: 50.390741680320836 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.42098184263618 - type: f1 value: 71.41355113538995 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.95359784801613 - type: f1 value: 71.42699340156742 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.18157363819772 - type: f1 value: 69.74836113037671 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.08137188971082 - type: f1 value: 76.78000685068261 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.5030262273033 - type: f1 value: 71.71620130425673 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.24546065904505 - type: f1 value: 69.07638311730359 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.12911903160726 - type: f1 value: 68.32651736539815 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.89307330195025 - type: f1 value: 71.33986549860187 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.44451916610626 - type: f1 value: 66.90192664503866 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.16274377942166 - type: f1 value: 68.01090953775066 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.75319435104237 - type: f1 value: 70.18035309201403 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.14391392064559 - type: f1 value: 61.48286540778145 - task: type: Classification dataset: type: 
mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.70275722932078 - type: f1 value: 70.26164779846495 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.93813046402153 - type: f1 value: 58.8852862116525 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.320107599193 - type: f1 value: 72.19836409602924 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.65366509751176 - type: f1 value: 74.55188288799579 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.694014794889036 - type: f1 value: 58.11353311721067 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 54.37457969065231 - type: f1 value: 52.81306134311697 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 48.3086751849361 - type: f1 value: 45.396449765419376 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.151983860121064 - type: f1 value: 60.31762544281696 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.44788164088769 - type: f1 value: 71.68150151736367 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.81439139206455 - type: f1 value: 62.06735559105593 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.04303967720242 - type: f1 value: 66.68298851670133 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.43913920645595 - type: f1 value: 60.25605977560783 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 
66.90316072629456 - type: f1 value: 65.1325924692381 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.63752521856086 - type: f1 value: 59.14284778039585 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.63080026899797 - type: f1 value: 70.89771864626877 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.10827168796234 - type: f1 value: 71.71954219691159 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.59515803631471 - type: f1 value: 70.05040128099003 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.83389374579691 - type: f1 value: 70.84877936562735 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.18628110289173 - type: f1 value: 68.97232927921841 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.99260255548083 - type: f1 value: 72.85139492157732 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.26227303295225 - type: f1 value: 65.08833655469431 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.48621385339611 - type: f1 value: 64.43483199071298 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.14391392064559 - type: f1 value: 72.2580822579741 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.88567585743107 - type: f1 value: 58.3073765932569 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.38399462004034 - type: f1 value: 60.82139544252606 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test 
revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.58574310692671 - type: f1 value: 60.71443370385374 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.61398789509079 - type: f1 value: 70.99761812049401 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.73705447209146 - type: f1 value: 61.680849331794796 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.66778749159381 - type: f1 value: 71.17320646080115 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.640215198386 - type: f1 value: 63.301805157015444 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.00672494956288 - type: f1 value: 70.26005548582106 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.42030934767989 - type: f1 value: 75.2074842882598 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.69266980497646 - type: f1 value: 70.94103167391192 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 28.91697191169135 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 28.434000079573313 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.96683513343383 - type: mrr value: 31.967364078714834 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.5280000000000005 - type: map_at_10 value: 11.793 - type: map_at_100 value: 14.496999999999998 - type: map_at_1000 value: 15.783 - type: map_at_3 value: 8.838 - type: map_at_5 value: 10.07 - type: mrr_at_1 value: 43.653 - type: mrr_at_10 value: 51.531000000000006 - type: mrr_at_100 value: 52.205 - type: mrr_at_1000 value: 52.242999999999995 - type: mrr_at_3 value: 49.431999999999995 - type: mrr_at_5 value: 50.470000000000006 - type: ndcg_at_1 value: 42.415000000000006 - type: ndcg_at_10 value: 32.464999999999996 - type: ndcg_at_100 value: 28.927999999999997 - type: ndcg_at_1000 
value: 37.629000000000005 - type: ndcg_at_3 value: 37.845 - type: ndcg_at_5 value: 35.147 - type: precision_at_1 value: 43.653 - type: precision_at_10 value: 23.932000000000002 - type: precision_at_100 value: 7.17 - type: precision_at_1000 value: 1.967 - type: precision_at_3 value: 35.397 - type: precision_at_5 value: 29.907 - type: recall_at_1 value: 5.5280000000000005 - type: recall_at_10 value: 15.568000000000001 - type: recall_at_100 value: 28.54 - type: recall_at_1000 value: 59.864 - type: recall_at_3 value: 9.822000000000001 - type: recall_at_5 value: 11.726 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 37.041000000000004 - type: map_at_10 value: 52.664 - type: map_at_100 value: 53.477 - type: map_at_1000 value: 53.505 - type: map_at_3 value: 48.510999999999996 - type: map_at_5 value: 51.036 - type: mrr_at_1 value: 41.338 - type: mrr_at_10 value: 55.071000000000005 - type: mrr_at_100 value: 55.672 - type: mrr_at_1000 value: 55.689 - type: mrr_at_3 value: 51.82 - type: mrr_at_5 value: 53.852 - type: ndcg_at_1 value: 41.338 - type: ndcg_at_10 value: 60.01800000000001 - type: ndcg_at_100 value: 63.409000000000006 - type: ndcg_at_1000 value: 64.017 - type: ndcg_at_3 value: 52.44799999999999 - type: ndcg_at_5 value: 56.571000000000005 - type: precision_at_1 value: 41.338 - type: precision_at_10 value: 9.531 - type: precision_at_100 value: 1.145 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 23.416 - type: precision_at_5 value: 16.46 - type: recall_at_1 value: 37.041000000000004 - type: recall_at_10 value: 79.76299999999999 - type: recall_at_100 value: 94.39 - type: recall_at_1000 value: 98.851 - type: recall_at_3 value: 60.465 - type: recall_at_5 value: 69.906 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 69.952 - type: map_at_10 value: 83.758 - type: map_at_100 value: 84.406 - type: map_at_1000 value: 84.425 - type: map_at_3 value: 80.839 - type: map_at_5 value: 82.646 - type: mrr_at_1 value: 80.62 - type: mrr_at_10 value: 86.947 - type: mrr_at_100 value: 87.063 - type: mrr_at_1000 value: 87.064 - type: mrr_at_3 value: 85.96000000000001 - type: mrr_at_5 value: 86.619 - type: ndcg_at_1 value: 80.63 - type: ndcg_at_10 value: 87.64800000000001 - type: ndcg_at_100 value: 88.929 - type: ndcg_at_1000 value: 89.054 - type: ndcg_at_3 value: 84.765 - type: ndcg_at_5 value: 86.291 - type: precision_at_1 value: 80.63 - type: precision_at_10 value: 13.314 - type: precision_at_100 value: 1.525 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.1 - type: precision_at_5 value: 24.372 - type: recall_at_1 value: 69.952 - type: recall_at_10 value: 94.955 - type: recall_at_100 value: 99.38 - type: recall_at_1000 value: 99.96000000000001 - type: recall_at_3 value: 86.60600000000001 - type: recall_at_5 value: 90.997 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 42.41329517878427 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 55.171278362748666 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: 
map_at_1 value: 4.213 - type: map_at_10 value: 9.895 - type: map_at_100 value: 11.776 - type: map_at_1000 value: 12.084 - type: map_at_3 value: 7.2669999999999995 - type: map_at_5 value: 8.620999999999999 - type: mrr_at_1 value: 20.8 - type: mrr_at_10 value: 31.112000000000002 - type: mrr_at_100 value: 32.274 - type: mrr_at_1000 value: 32.35 - type: mrr_at_3 value: 28.133000000000003 - type: mrr_at_5 value: 29.892999999999997 - type: ndcg_at_1 value: 20.8 - type: ndcg_at_10 value: 17.163999999999998 - type: ndcg_at_100 value: 24.738 - type: ndcg_at_1000 value: 30.316 - type: ndcg_at_3 value: 16.665 - type: ndcg_at_5 value: 14.478 - type: precision_at_1 value: 20.8 - type: precision_at_10 value: 8.74 - type: precision_at_100 value: 1.963 - type: precision_at_1000 value: 0.33 - type: precision_at_3 value: 15.467 - type: precision_at_5 value: 12.6 - type: recall_at_1 value: 4.213 - type: recall_at_10 value: 17.698 - type: recall_at_100 value: 39.838 - type: recall_at_1000 value: 66.893 - type: recall_at_3 value: 9.418 - type: recall_at_5 value: 12.773000000000001 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.90453315738294 - type: cos_sim_spearman value: 78.51197850080254 - type: euclidean_pearson value: 80.09647123597748 - type: euclidean_spearman value: 78.63548011514061 - type: manhattan_pearson value: 80.10645285675231 - type: manhattan_spearman value: 78.57861806068901 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 84.2616156846401 - type: cos_sim_spearman value: 76.69713867850156 - type: euclidean_pearson value: 77.97948563800394 - type: euclidean_spearman value: 74.2371211567807 - type: manhattan_pearson value: 77.69697879669705 - type: manhattan_spearman value: 73.86529778022278 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 77.0293269315045 - type: cos_sim_spearman value: 78.02555120584198 - type: euclidean_pearson value: 78.25398100379078 - type: euclidean_spearman value: 78.66963870599464 - type: manhattan_pearson value: 78.14314682167348 - type: manhattan_spearman value: 78.57692322969135 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 79.16989925136942 - type: cos_sim_spearman value: 76.5996225327091 - type: euclidean_pearson value: 77.8319003279786 - type: euclidean_spearman value: 76.42824009468998 - type: manhattan_pearson value: 77.69118862737736 - type: manhattan_spearman value: 76.25568104762812 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 87.42012286935325 - type: cos_sim_spearman value: 88.15654297884122 - type: euclidean_pearson value: 87.34082819427852 - type: euclidean_spearman value: 88.06333589547084 - type: manhattan_pearson value: 87.25115596784842 - type: manhattan_spearman value: 87.9559927695203 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 
82.88222044996712 - type: cos_sim_spearman value: 84.28476589061077 - type: euclidean_pearson value: 83.17399758058309 - type: euclidean_spearman value: 83.85497357244542 - type: manhattan_pearson value: 83.0308397703786 - type: manhattan_spearman value: 83.71554539935046 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 80.20682986257339 - type: cos_sim_spearman value: 79.94567120362092 - type: euclidean_pearson value: 79.43122480368902 - type: euclidean_spearman value: 79.94802077264987 - type: manhattan_pearson value: 79.32653021527081 - type: manhattan_spearman value: 79.80961146709178 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 74.46578144394383 - type: cos_sim_spearman value: 74.52496637472179 - type: euclidean_pearson value: 72.2903807076809 - type: euclidean_spearman value: 73.55549359771645 - type: manhattan_pearson value: 72.09324837709393 - type: manhattan_spearman value: 73.36743103606581 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 71.37272335116 - type: cos_sim_spearman value: 71.26702117766037 - type: euclidean_pearson value: 67.114829954434 - type: euclidean_spearman value: 66.37938893947761 - type: manhattan_pearson value: 66.79688574095246 - type: manhattan_spearman value: 66.17292828079667 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 80.61016770129092 - type: cos_sim_spearman value: 82.08515426632214 - type: euclidean_pearson value: 80.557340361131 - type: euclidean_spearman value: 80.37585812266175 - type: manhattan_pearson value: 80.6782873404285 - type: manhattan_spearman value: 80.6678073032024 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.00150745350108 - type: cos_sim_spearman value: 87.83441972211425 - type: euclidean_pearson value: 87.94826702308792 - type: euclidean_spearman value: 87.46143974860725 - type: manhattan_pearson value: 87.97560344306105 - type: manhattan_spearman value: 87.5267102829796 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 64.76325252267235 - type: cos_sim_spearman value: 63.32615095463905 - type: euclidean_pearson value: 64.07920669155716 - type: euclidean_spearman value: 61.21409893072176 - type: manhattan_pearson value: 64.26308625680016 - type: manhattan_spearman value: 61.2438185254079 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 75.82644463022595 - type: cos_sim_spearman value: 76.50381269945073 - type: euclidean_pearson value: 75.1328548315934 - type: euclidean_spearman value: 75.63761139408453 - type: manhattan_pearson value: 75.18610101241407 - type: 
manhattan_spearman value: 75.30669266354164 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.49994164686832 - type: cos_sim_spearman value: 86.73743986245549 - type: euclidean_pearson value: 86.8272894387145 - type: euclidean_spearman value: 85.97608491000507 - type: manhattan_pearson value: 86.74960140396779 - type: manhattan_spearman value: 85.79285984190273 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 79.58172210788469 - type: cos_sim_spearman value: 80.17516468334607 - type: euclidean_pearson value: 77.56537843470504 - type: euclidean_spearman value: 77.57264627395521 - type: manhattan_pearson value: 78.09703521695943 - type: manhattan_spearman value: 78.15942760916954 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 79.7589932931751 - type: cos_sim_spearman value: 80.15210089028162 - type: euclidean_pearson value: 77.54135223516057 - type: euclidean_spearman value: 77.52697996368764 - type: manhattan_pearson value: 77.65734439572518 - type: manhattan_spearman value: 77.77702992016121 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 79.16682365511267 - type: cos_sim_spearman value: 79.25311267628506 - type: euclidean_pearson value: 77.54882036762244 - type: euclidean_spearman value: 77.33212935194827 - type: manhattan_pearson value: 77.98405516064015 - type: manhattan_spearman value: 77.85075717865719 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 59.10473294775917 - type: cos_sim_spearman value: 61.82780474476838 - type: euclidean_pearson value: 45.885111672377256 - type: euclidean_spearman value: 56.88306351932454 - type: manhattan_pearson value: 46.101218127323186 - type: manhattan_spearman value: 56.80953694186333 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 45.781923079584146 - type: cos_sim_spearman value: 55.95098449691107 - type: euclidean_pearson value: 25.4571031323205 - type: euclidean_spearman value: 49.859978118078935 - type: manhattan_pearson value: 25.624938455041384 - type: manhattan_spearman value: 49.99546185049401 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 60.00618133997907 - type: cos_sim_spearman value: 66.57896677718321 - type: euclidean_pearson value: 42.60118466388821 - type: euclidean_spearman value: 62.8210759715209 - type: manhattan_pearson value: 42.63446860604094 - type: manhattan_spearman value: 62.73803068925271 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: 
cos_sim_pearson value: 28.460759121626943 - type: cos_sim_spearman value: 34.13459007469131 - type: euclidean_pearson value: 6.0917739325525195 - type: euclidean_spearman value: 27.9947262664867 - type: manhattan_pearson value: 6.16877864169911 - type: manhattan_spearman value: 28.00664163971514 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 57.42546621771696 - type: cos_sim_spearman value: 63.699663168970474 - type: euclidean_pearson value: 38.12085278789738 - type: euclidean_spearman value: 58.12329140741536 - type: manhattan_pearson value: 37.97364549443335 - type: manhattan_spearman value: 57.81545502318733 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 46.82241380954213 - type: cos_sim_spearman value: 57.86569456006391 - type: euclidean_pearson value: 31.80480070178813 - type: euclidean_spearman value: 52.484000620130104 - type: manhattan_pearson value: 31.952708554646097 - type: manhattan_spearman value: 52.8560972356195 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 52.00447170498087 - type: cos_sim_spearman value: 60.664116225735164 - type: euclidean_pearson value: 33.87382555421702 - type: euclidean_spearman value: 55.74649067458667 - type: manhattan_pearson value: 33.99117246759437 - type: manhattan_spearman value: 55.98749034923899 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 58.06497233105448 - type: cos_sim_spearman value: 65.62968801135676 - type: euclidean_pearson value: 47.482076613243905 - type: euclidean_spearman value: 62.65137791498299 - type: manhattan_pearson value: 47.57052626104093 - type: manhattan_spearman value: 62.436916516613294 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 70.49397298562575 - type: cos_sim_spearman value: 74.79604041187868 - type: euclidean_pearson value: 49.661891561317795 - type: euclidean_spearman value: 70.31535537621006 - type: manhattan_pearson value: 49.553715741850006 - type: manhattan_spearman value: 70.24779344636806 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 55.640574515348696 - type: cos_sim_spearman value: 54.927959317689 - type: euclidean_pearson value: 29.00139666967476 - type: euclidean_spearman value: 41.86386566971605 - type: manhattan_pearson value: 29.47411067730344 - type: manhattan_spearman value: 42.337438424952786 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 68.14095292259312 - type: cos_sim_spearman value: 73.99017581234789 - type: euclidean_pearson value: 46.46304297872084 - type: euclidean_spearman value: 60.91834114800041 - type: manhattan_pearson value: 
47.07072666338692 - type: manhattan_spearman value: 61.70415727977926 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 73.27184653359575 - type: cos_sim_spearman value: 77.76070252418626 - type: euclidean_pearson value: 62.30586577544778 - type: euclidean_spearman value: 75.14246629110978 - type: manhattan_pearson value: 62.328196884927046 - type: manhattan_spearman value: 75.1282792981433 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 71.59448528829957 - type: cos_sim_spearman value: 70.37277734222123 - type: euclidean_pearson value: 57.63145565721123 - type: euclidean_spearman value: 66.10113048304427 - type: manhattan_pearson value: 57.18897811586808 - type: manhattan_spearman value: 66.5595511215901 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 66.37520607720838 - type: cos_sim_spearman value: 69.92282148997948 - type: euclidean_pearson value: 40.55768770125291 - type: euclidean_spearman value: 55.189128944669605 - type: manhattan_pearson value: 41.03566433468883 - type: manhattan_spearman value: 55.61251893174558 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 57.791929533771835 - type: cos_sim_spearman value: 66.45819707662093 - type: euclidean_pearson value: 39.03686018511092 - type: euclidean_spearman value: 56.01282695640428 - type: manhattan_pearson value: 38.91586623619632 - type: manhattan_spearman value: 56.69394943612747 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 47.82224468473866 - type: cos_sim_spearman value: 59.467307194781164 - type: euclidean_pearson value: 27.428459190256145 - type: euclidean_spearman value: 60.83463107397519 - type: manhattan_pearson value: 27.487391578496638 - type: manhattan_spearman value: 61.281380460246496 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 16.306666792752644 - type: cos_sim_spearman value: 39.35486427252405 - type: euclidean_pearson value: -2.7887154897955435 - type: euclidean_spearman value: 27.1296051831719 - type: manhattan_pearson value: -3.202291270581297 - type: manhattan_spearman value: 26.32895849218158 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: fr-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 59.67006803805076 - type: cos_sim_spearman value: 73.24670207647144 - type: euclidean_pearson value: 46.91884681500483 - type: euclidean_spearman value: 16.903085094570333 - type: manhattan_pearson value: 46.88391675325812 - type: manhattan_spearman value: 28.17180849095055 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: 
b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.79555591223837 - type: cos_sim_spearman value: 85.63658602085185 - type: euclidean_pearson value: 85.22080894037671 - type: euclidean_spearman value: 85.54113580167038 - type: manhattan_pearson value: 85.1639505960118 - type: manhattan_spearman value: 85.43502665436196 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 80.73900991689766 - type: mrr value: 94.81624131133934 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 55.678000000000004 - type: map_at_10 value: 65.135 - type: map_at_100 value: 65.824 - type: map_at_1000 value: 65.852 - type: map_at_3 value: 62.736000000000004 - type: map_at_5 value: 64.411 - type: mrr_at_1 value: 58.333 - type: mrr_at_10 value: 66.5 - type: mrr_at_100 value: 67.053 - type: mrr_at_1000 value: 67.08 - type: mrr_at_3 value: 64.944 - type: mrr_at_5 value: 65.89399999999999 - type: ndcg_at_1 value: 58.333 - type: ndcg_at_10 value: 69.34700000000001 - type: ndcg_at_100 value: 72.32 - type: ndcg_at_1000 value: 73.014 - type: ndcg_at_3 value: 65.578 - type: ndcg_at_5 value: 67.738 - type: precision_at_1 value: 58.333 - type: precision_at_10 value: 9.033 - type: precision_at_100 value: 1.0670000000000002 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 25.444 - type: precision_at_5 value: 16.933 - type: recall_at_1 value: 55.678000000000004 - type: recall_at_10 value: 80.72200000000001 - type: recall_at_100 value: 93.93299999999999 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 70.783 - type: recall_at_5 value: 75.978 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.74653465346535 - type: cos_sim_ap value: 93.01476369929063 - type: cos_sim_f1 value: 86.93009118541033 - type: cos_sim_precision value: 88.09034907597535 - type: cos_sim_recall value: 85.8 - type: dot_accuracy value: 99.22970297029703 - type: dot_ap value: 51.58725659485144 - type: dot_f1 value: 53.51351351351352 - type: dot_precision value: 58.235294117647065 - type: dot_recall value: 49.5 - type: euclidean_accuracy value: 99.74356435643564 - type: euclidean_ap value: 92.40332894384368 - type: euclidean_f1 value: 86.97838109602817 - type: euclidean_precision value: 87.46208291203236 - type: euclidean_recall value: 86.5 - type: manhattan_accuracy value: 99.73069306930694 - type: manhattan_ap value: 92.01320815721121 - type: manhattan_f1 value: 86.4135864135864 - type: manhattan_precision value: 86.32734530938124 - type: manhattan_recall value: 86.5 - type: max_accuracy value: 99.74653465346535 - type: max_ap value: 93.01476369929063 - type: max_f1 value: 86.97838109602817 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 55.2660514302523 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 
30.4637783572547 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.41377758357637 - type: mrr value: 50.138451213818854 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 28.887846011166594 - type: cos_sim_spearman value: 30.10823258355903 - type: dot_pearson value: 12.888049550236385 - type: dot_spearman value: 12.827495903098123 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.21 - type: map_at_10 value: 1.667 - type: map_at_100 value: 9.15 - type: map_at_1000 value: 22.927 - type: map_at_3 value: 0.573 - type: map_at_5 value: 0.915 - type: mrr_at_1 value: 80 - type: mrr_at_10 value: 87.167 - type: mrr_at_100 value: 87.167 - type: mrr_at_1000 value: 87.167 - type: mrr_at_3 value: 85.667 - type: mrr_at_5 value: 87.167 - type: ndcg_at_1 value: 76 - type: ndcg_at_10 value: 69.757 - type: ndcg_at_100 value: 52.402 - type: ndcg_at_1000 value: 47.737 - type: ndcg_at_3 value: 71.866 - type: ndcg_at_5 value: 72.225 - type: precision_at_1 value: 80 - type: precision_at_10 value: 75 - type: precision_at_100 value: 53.959999999999994 - type: precision_at_1000 value: 21.568 - type: precision_at_3 value: 76.667 - type: precision_at_5 value: 78 - type: recall_at_1 value: 0.21 - type: recall_at_10 value: 1.9189999999999998 - type: recall_at_100 value: 12.589 - type: recall_at_1000 value: 45.312000000000005 - type: recall_at_3 value: 0.61 - type: recall_at_5 value: 1.019 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.10000000000001 - type: f1 value: 90.06 - type: precision value: 89.17333333333333 - type: recall value: 92.10000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 56.06936416184971 - type: f1 value: 50.87508028259473 - type: precision value: 48.97398843930635 - type: recall value: 56.06936416184971 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 57.3170731707317 - type: f1 value: 52.96080139372822 - type: precision value: 51.67861124382864 - type: recall value: 57.3170731707317 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.3 - type: f1 value: 92.67333333333333 - type: precision value: 91.90833333333333 - type: recall value: 94.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.7 - type: f1 value: 97.07333333333332 - type: precision value: 96.79500000000002 - type: recall value: 97.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB 
Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.69999999999999 - type: f1 value: 93.2 - type: precision value: 92.48333333333333 - type: recall value: 94.69999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.9 - type: f1 value: 91.26666666666667 - type: precision value: 90.59444444444445 - type: recall value: 92.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 34.32835820895522 - type: f1 value: 29.074180380150533 - type: precision value: 28.068207322920596 - type: recall value: 34.32835820895522 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 78.5 - type: f1 value: 74.3945115995116 - type: precision value: 72.82967843459222 - type: recall value: 78.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 66.34146341463415 - type: f1 value: 61.2469400518181 - type: precision value: 59.63977756660683 - type: recall value: 66.34146341463415 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 80.9 - type: f1 value: 76.90349206349207 - type: precision value: 75.32921568627451 - type: recall value: 80.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 84.93317132442284 - type: f1 value: 81.92519105034295 - type: precision value: 80.71283920615635 - type: recall value: 84.93317132442284 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 71.1304347826087 - type: f1 value: 65.22394755003451 - type: precision value: 62.912422360248435 - type: recall value: 71.1304347826087 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 79.82608695652173 - type: f1 value: 75.55693581780538 - type: precision value: 73.79420289855072 - type: recall value: 79.82608695652173 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 74 - type: f1 value: 70.51022222222223 - type: precision value: 69.29673599347512 - type: recall value: 74 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 78.7 - type: f1 value: 74.14238095238095 - type: precision value: 
72.27214285714285 - type: recall value: 78.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 48.97466827503016 - type: f1 value: 43.080330405420874 - type: precision value: 41.36505499593557 - type: recall value: 48.97466827503016 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.60000000000001 - type: f1 value: 86.62333333333333 - type: precision value: 85.225 - type: recall value: 89.60000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 45.2 - type: f1 value: 39.5761253006253 - type: precision value: 37.991358436312 - type: recall value: 45.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.5 - type: f1 value: 86.70333333333333 - type: precision value: 85.53166666666667 - type: recall value: 89.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 50.095238095238095 - type: f1 value: 44.60650460650461 - type: precision value: 42.774116796477045 - type: recall value: 50.095238095238095 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 63.4 - type: f1 value: 58.35967261904762 - type: precision value: 56.54857142857143 - type: recall value: 63.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.2 - type: f1 value: 87.075 - type: precision value: 86.12095238095239 - type: recall value: 89.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.8 - type: f1 value: 95.90333333333334 - type: precision value: 95.50833333333333 - type: recall value: 96.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.9 - type: f1 value: 88.6288888888889 - type: precision value: 87.61607142857142 - type: recall value: 90.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 65.2 - type: f1 value: 60.54377630539395 - type: precision value: 58.89434482711381 - type: recall value: 65.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87 - type: f1 value: 
84.32412698412699 - type: precision value: 83.25527777777778 - type: recall value: 87 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 68.7 - type: f1 value: 63.07883541295306 - type: precision value: 61.06117424242426 - type: recall value: 68.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.7 - type: f1 value: 91.78333333333335 - type: precision value: 90.86666666666667 - type: recall value: 93.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.7 - type: f1 value: 96.96666666666667 - type: precision value: 96.61666666666667 - type: recall value: 97.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.27493261455525 - type: f1 value: 85.90745732255168 - type: precision value: 84.91389637616052 - type: recall value: 88.27493261455525 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.5982905982906 - type: f1 value: 88.4900284900285 - type: precision value: 87.57122507122507 - type: recall value: 90.5982905982906 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.5 - type: f1 value: 86.90769841269842 - type: precision value: 85.80178571428571 - type: recall value: 89.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 82.5 - type: f1 value: 78.36796536796538 - type: precision value: 76.82196969696969 - type: recall value: 82.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 71.48846960167715 - type: f1 value: 66.78771089148448 - type: precision value: 64.98302885095339 - type: recall value: 71.48846960167715 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.1 - type: f1 value: 92.50333333333333 - type: precision value: 91.77499999999999 - type: recall value: 94.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 71.20622568093385 - type: f1 value: 66.83278891450098 - type: precision value: 65.35065777283677 - type: recall value: 71.20622568093385 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 48.717948717948715 - type: f1 value: 43.53146853146853 - type: precision value: 42.04721204721204 - type: recall value: 48.717948717948715 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 58.5 - type: f1 value: 53.8564991863928 - type: precision value: 52.40329436122275 - type: recall value: 58.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.8 - type: f1 value: 88.29 - type: precision value: 87.09166666666667 - type: recall value: 90.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 67.28971962616822 - type: f1 value: 62.63425307817832 - type: precision value: 60.98065939771546 - type: recall value: 67.28971962616822 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 78.7 - type: f1 value: 75.5264472455649 - type: precision value: 74.38205086580086 - type: recall value: 78.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.7 - type: f1 value: 86.10809523809525 - type: precision value: 85.07602564102565 - type: recall value: 88.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 56.99999999999999 - type: f1 value: 52.85487521402737 - type: precision value: 51.53985162713104 - type: recall value: 56.99999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94 - type: f1 value: 92.45333333333333 - type: precision value: 91.79166666666667 - type: recall value: 94 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.30000000000001 - type: f1 value: 90.61333333333333 - type: precision value: 89.83333333333331 - type: recall value: 92.30000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.69999999999999 - type: f1 value: 93.34555555555555 - type: precision value: 92.75416666666668 - type: recall value: 94.69999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 80.2 - type: f1 value: 76.6563035113035 - type: precision value: 75.3014652014652 - type: recall value: 80.2 - task: type: BitextMining dataset: type: 
mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 84.7 - type: f1 value: 82.78689263765207 - type: precision value: 82.06705086580087 - type: recall value: 84.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 50.33333333333333 - type: f1 value: 45.461523661523664 - type: precision value: 43.93545574795575 - type: recall value: 50.33333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.6000000000000005 - type: f1 value: 5.442121400446441 - type: precision value: 5.146630385487529 - type: recall value: 6.6000000000000005 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85 - type: f1 value: 81.04666666666667 - type: precision value: 79.25 - type: recall value: 85 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 47.32142857142857 - type: f1 value: 42.333333333333336 - type: precision value: 40.69196428571429 - type: recall value: 47.32142857142857 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 30.735455543358945 - type: f1 value: 26.73616790022338 - type: precision value: 25.397823220451283 - type: recall value: 30.735455543358945 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 25.1 - type: f1 value: 21.975989896371022 - type: precision value: 21.059885632257203 - type: recall value: 25.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.3 - type: f1 value: 92.75666666666666 - type: precision value: 92.06166666666665 - type: recall value: 94.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.1 - type: f1 value: 92.74 - type: precision value: 92.09166666666667 - type: recall value: 94.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 71.3 - type: f1 value: 66.922442002442 - type: precision value: 65.38249567099568 - type: recall value: 71.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 40.300000000000004 - type: f1 value: 35.78682789299971 - type: precision value: 
34.66425128716588 - type: recall value: 40.300000000000004 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96 - type: f1 value: 94.82333333333334 - type: precision value: 94.27833333333334 - type: recall value: 96 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 51.1 - type: f1 value: 47.179074753133584 - type: precision value: 46.06461044702424 - type: recall value: 51.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.7 - type: f1 value: 84.71 - type: precision value: 83.46166666666667 - type: recall value: 87.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.8 - type: f1 value: 94.68333333333334 - type: precision value: 94.13333333333334 - type: recall value: 95.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.39999999999999 - type: f1 value: 82.5577380952381 - type: precision value: 81.36833333333334 - type: recall value: 85.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 21.16788321167883 - type: f1 value: 16.948865627297987 - type: precision value: 15.971932568647897 - type: recall value: 21.16788321167883 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.9 - type: f1 value: 5.515526831658907 - type: precision value: 5.141966366966367 - type: recall value: 6.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.2 - type: f1 value: 91.39666666666668 - type: precision value: 90.58666666666667 - type: recall value: 93.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.2 - type: f1 value: 89.95666666666666 - type: precision value: 88.92833333333333 - type: recall value: 92.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 79.76190476190477 - type: f1 value: 74.93386243386244 - type: precision value: 73.11011904761904 - type: recall value: 79.76190476190477 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 
8.799999999999999 - type: f1 value: 6.921439712248537 - type: precision value: 6.489885109680683 - type: recall value: 8.799999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 45.75569358178054 - type: f1 value: 40.34699501312631 - type: precision value: 38.57886764719063 - type: recall value: 45.75569358178054 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.4 - type: f1 value: 89.08333333333333 - type: precision value: 88.01666666666668 - type: recall value: 91.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.60000000000001 - type: f1 value: 92.06690476190477 - type: precision value: 91.45095238095239 - type: recall value: 93.60000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 7.5 - type: f1 value: 6.200363129378736 - type: precision value: 5.89115314822466 - type: recall value: 7.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 73.59307359307358 - type: f1 value: 68.38933553219267 - type: precision value: 66.62698412698413 - type: recall value: 73.59307359307358 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 69.8473282442748 - type: f1 value: 64.72373682297346 - type: precision value: 62.82834214131924 - type: recall value: 69.8473282442748 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.5254730713246 - type: f1 value: 96.72489082969432 - type: precision value: 96.33672974284326 - type: recall value: 97.5254730713246 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 75.6 - type: f1 value: 72.42746031746033 - type: precision value: 71.14036630036631 - type: recall value: 75.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.24293785310734 - type: f1 value: 88.86064030131826 - type: precision value: 87.73540489642184 - type: recall value: 91.24293785310734 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.2 - type: f1 value: 4.383083659794954 - type: precision value: 4.027861324289673 - type: recall value: 6.2 - task: type: BitextMining dataset: type: 
mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.8 - type: f1 value: 84.09428571428572 - type: precision value: 83.00333333333333 - type: recall value: 86.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 60.699999999999996 - type: f1 value: 56.1584972394755 - type: precision value: 54.713456330903135 - type: recall value: 60.699999999999996 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 84.2 - type: f1 value: 80.66190476190475 - type: precision value: 79.19690476190476 - type: recall value: 84.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.2 - type: f1 value: 91.33 - type: precision value: 90.45 - type: recall value: 93.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 6.3 - type: f1 value: 5.126828976748276 - type: precision value: 4.853614328966668 - type: recall value: 6.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 81.76943699731903 - type: f1 value: 77.82873739308057 - type: precision value: 76.27622452019234 - type: recall value: 81.76943699731903 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.30000000000001 - type: f1 value: 90.29666666666665 - type: precision value: 89.40333333333334 - type: recall value: 92.30000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 29.249011857707508 - type: f1 value: 24.561866096392947 - type: precision value: 23.356583740215456 - type: recall value: 29.249011857707508 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.46478873239437 - type: f1 value: 73.23943661971832 - type: precision value: 71.66666666666667 - type: recall value: 77.46478873239437 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 20.35928143712575 - type: f1 value: 15.997867865075824 - type: precision value: 14.882104658301346 - type: recall value: 20.35928143712575 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.2 - type: f1 value: 
90.25999999999999 - type: precision value: 89.45333333333335 - type: recall value: 92.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 23.15270935960591 - type: f1 value: 19.65673625772148 - type: precision value: 18.793705293464992 - type: recall value: 23.15270935960591 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 59.154929577464785 - type: f1 value: 52.3868463305083 - type: precision value: 50.14938113529662 - type: recall value: 59.154929577464785 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 70.51282051282051 - type: f1 value: 66.8089133089133 - type: precision value: 65.37645687645687 - type: recall value: 70.51282051282051 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.6 - type: f1 value: 93 - type: precision value: 92.23333333333333 - type: recall value: 94.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 38.62212943632568 - type: f1 value: 34.3278276962583 - type: precision value: 33.07646935732408 - type: recall value: 38.62212943632568 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 28.1 - type: f1 value: 23.579609223054604 - type: precision value: 22.39622774921555 - type: recall value: 28.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.27361563517914 - type: f1 value: 85.12486427795874 - type: precision value: 83.71335504885994 - type: recall value: 88.27361563517914 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.6 - type: f1 value: 86.39928571428571 - type: precision value: 85.4947557997558 - type: recall value: 88.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.5 - type: f1 value: 83.77952380952381 - type: precision value: 82.67602564102565 - type: recall value: 86.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 79.52755905511812 - type: f1 value: 75.3055868016498 - type: precision value: 73.81889763779527 - type: recall value: 79.52755905511812 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng 
split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.9 - type: f1 value: 73.76261904761905 - type: precision value: 72.11670995670995 - type: recall value: 77.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 53.8781163434903 - type: f1 value: 47.25804051288816 - type: precision value: 45.0603482390186 - type: recall value: 53.8781163434903 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.10000000000001 - type: f1 value: 88.88 - type: precision value: 87.96333333333334 - type: recall value: 91.10000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 38.46153846153847 - type: f1 value: 34.43978243978244 - type: precision value: 33.429487179487175 - type: recall value: 38.46153846153847 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.9 - type: f1 value: 86.19888888888887 - type: precision value: 85.07440476190476 - type: recall value: 88.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.9 - type: f1 value: 82.58857142857143 - type: precision value: 81.15666666666667 - type: recall value: 85.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.8 - type: f1 value: 83.36999999999999 - type: precision value: 81.86833333333333 - type: recall value: 86.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 68.51415094339622 - type: f1 value: 63.195000099481234 - type: precision value: 61.394033442972116 - type: recall value: 68.51415094339622 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.5 - type: f1 value: 86.14603174603175 - type: precision value: 85.1162037037037 - type: recall value: 88.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.62043795620438 - type: f1 value: 94.40389294403892 - type: precision value: 93.7956204379562 - type: recall value: 95.62043795620438 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 81.8 - type: f1 value: 78.6532178932179 - type: precision value: 77.46348795840176 - type: recall value: 81.8 - task: type: 
Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.603 - type: map_at_10 value: 8.5 - type: map_at_100 value: 12.985 - type: map_at_1000 value: 14.466999999999999 - type: map_at_3 value: 4.859999999999999 - type: map_at_5 value: 5.817 - type: mrr_at_1 value: 28.571 - type: mrr_at_10 value: 42.331 - type: mrr_at_100 value: 43.592999999999996 - type: mrr_at_1000 value: 43.592999999999996 - type: mrr_at_3 value: 38.435 - type: mrr_at_5 value: 39.966 - type: ndcg_at_1 value: 26.531 - type: ndcg_at_10 value: 21.353 - type: ndcg_at_100 value: 31.087999999999997 - type: ndcg_at_1000 value: 43.163000000000004 - type: ndcg_at_3 value: 22.999 - type: ndcg_at_5 value: 21.451 - type: precision_at_1 value: 28.571 - type: precision_at_10 value: 19.387999999999998 - type: precision_at_100 value: 6.265 - type: precision_at_1000 value: 1.4160000000000001 - type: precision_at_3 value: 24.490000000000002 - type: precision_at_5 value: 21.224 - type: recall_at_1 value: 2.603 - type: recall_at_10 value: 14.474 - type: recall_at_100 value: 40.287 - type: recall_at_1000 value: 76.606 - type: recall_at_3 value: 5.978 - type: recall_at_5 value: 7.819 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 69.7848 - type: ap value: 13.661023167088224 - type: f1 value: 53.61686134460943 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.28183361629882 - type: f1 value: 61.55481034919965 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 35.972128420092396 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.59933241938367 - type: cos_sim_ap value: 72.20760361208136 - type: cos_sim_f1 value: 66.4447731755424 - type: cos_sim_precision value: 62.35539102267469 - type: cos_sim_recall value: 71.10817941952506 - type: dot_accuracy value: 78.98313166835548 - type: dot_ap value: 44.492521645493795 - type: dot_f1 value: 45.814889336016094 - type: dot_precision value: 37.02439024390244 - type: dot_recall value: 60.07915567282321 - type: euclidean_accuracy value: 85.3907134767837 - type: euclidean_ap value: 71.53847289080343 - type: euclidean_f1 value: 65.95952206778834 - type: euclidean_precision value: 61.31006346328196 - type: euclidean_recall value: 71.37203166226914 - type: manhattan_accuracy value: 85.40859510043511 - type: manhattan_ap value: 71.49664104395515 - type: manhattan_f1 value: 65.98569969356485 - type: manhattan_precision value: 63.928748144482924 - type: manhattan_recall value: 68.17941952506597 - type: max_accuracy value: 85.59933241938367 - type: max_ap value: 72.20760361208136 - type: max_f1 value: 66.4447731755424 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 
8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.83261536073273 - type: cos_sim_ap value: 85.48178133644264 - type: cos_sim_f1 value: 77.87816307403935 - type: cos_sim_precision value: 75.88953021114926 - type: cos_sim_recall value: 79.97382198952879 - type: dot_accuracy value: 79.76287499514883 - type: dot_ap value: 59.17438838475084 - type: dot_f1 value: 56.34566667855996 - type: dot_precision value: 52.50349092359864 - type: dot_recall value: 60.794579611949494 - type: euclidean_accuracy value: 88.76857996662397 - type: euclidean_ap value: 85.22764834359887 - type: euclidean_f1 value: 77.65379751543554 - type: euclidean_precision value: 75.11152683839401 - type: euclidean_recall value: 80.37419156144134 - type: manhattan_accuracy value: 88.6987231730508 - type: manhattan_ap value: 85.18907981724007 - type: manhattan_f1 value: 77.51967028849757 - type: manhattan_precision value: 75.49992701795358 - type: manhattan_recall value: 79.65044656606098 - type: max_accuracy value: 88.83261536073273 - type: max_ap value: 85.48178133644264 - type: max_f1 value: 77.87816307403935 language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - 'no' - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh license: mit --- ## Multilingual-E5-base Multilingual E5 Text Embeddings: A Technical Report. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei, arXiv 2024 This model has 12 layers and the embedding size is 768. ## Usage Below is an example to encode queries and passages from the MS-MARCO passage ranking dataset. ## Supported Languages This model is initialized from xlm-roberta-base and continually trained on a mixture of multilingual datasets. It supports 100 languages from xlm-roberta, but low-resource languages may see performance degradation. ## Training Details **Initialization**: xlm-roberta-base **First stage**: contrastive pre-training with weak supervision | Dataset | Weak supervision | # of text pairs | |--------------------------------------------------------------------------------------------------------|---------------------------------------|-----------------| | Filtered mC4 | (title, page content) | 1B | | CC News | (title, news content) | 400M | | NLLB | translation pairs | 2.4B | | Wikipedia | (hierarchical section title, passage) | 150M | | Filtered Reddit | (comment, response) | 800M | | S2ORC | (title, abstract) and citation pairs | 100M | | Stackexchange | (question, answer) | 50M | | xP3 | (input prompt, response) | 80M | | Miscellaneous unsupervised SBERT data | - | 10M | **Second stage**: supervised fine-tuning | Dataset | Language | # of text pairs | |----------------------------------------------------------------------------------------|--------------|-----------------| | MS MARCO | English | 500k | | NQ | English | 70k | | Trivia QA | English | 60k | | NLI from SimCSE | English | <300k | | ELI5 | English | 500k | | DuReader Retrieval | Chinese | 86k | | KILT Fever | English | 70k | | KILT HotpotQA | English | 70k | | SQuAD | English | 87k | | Quora | English | 150k | | Mr. 
TyDi | 11 languages | 50k | | MIRACL | 16 languages | 40k | For all labeled datasets, we only use their training sets for fine-tuning. For other training details, please refer to our paper. ## Benchmark Results on Mr. TyDi | Model | Avg MRR@10 | | ar | bn | en | fi | id | ja | ko | ru | sw | te | th | |-----------------------|------------|-------|------| --- | --- | --- | --- | --- | --- | --- |------| --- | --- | | BM25 | 33.3 | | 36.7 | 41.3 | 15.1 | 28.8 | 38.2 | 21.7 | 28.1 | 32.9 | 39.6 | 42.4 | 41.7 | | mDPR | 16.7 | | 26.0 | 25.8 | 16.2 | 11.3 | 14.6 | 18.1 | 21.9 | 18.5 | 7.3 | 10.6 | 13.5 | | BM25 + mDPR | 41.7 | | 49.1 | 53.5 | 28.4 | 36.5 | 45.5 | 35.5 | 36.2 | 42.7 | 40.5 | 42.0 | 49.2 | | | | | multilingual-e5-small | 64.4 | | 71.5 | 66.3 | 54.5 | 57.7 | 63.2 | 55.4 | 54.3 | 60.8 | 65.4 | 89.1 | 70.1 | | multilingual-e5-base | 65.9 | | 72.3 | 65.0 | 58.5 | 60.8 | 64.9 | 56.6 | 55.8 | 62.7 | 69.0 | 86.6 | 72.7 | | multilingual-e5-large | **70.5** | | 77.5 | 73.2 | 60.8 | 66.8 | 68.5 | 62.5 | 61.6 | 65.8 | 72.7 | 90.2 | 76.2 | ## MTEB Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## Support for Sentence Transformers Below is an example of usage with sentence_transformers. Package requirements: sentence_transformers. Contributors: michaelfeil ## FAQ **1. Do I need to add the prefix \"query: \" and \"passage: \" to input texts?** Yes, this is how the model is trained; otherwise you will see a performance degradation. Here are some rules of thumb: - Use \"query: \" and \"passage: \" correspondingly for asymmetric tasks such as passage retrieval in open QA and ad-hoc information retrieval. - Use the \"query: \" prefix for symmetric tasks such as semantic similarity, bitext mining, and paraphrase retrieval. - Use the \"query: \" prefix if you want to use embeddings as features, such as for linear probing classification or clustering. **2. Why are my reproduced results slightly different from those reported in the model card?** Different versions of transformers and pytorch could cause negligible but non-zero performance differences. **3. Why do the cosine similarity scores distribute around 0.7 to 1.0?** This is a known and expected behavior, as we use a low temperature of 0.01 for the InfoNCE contrastive loss. For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores rather than their absolute values, so this should not be an issue. ## Citation If you find our paper or models helpful, please consider citing as follows: ## Limitations Long texts will be truncated to at most 512 tokens.", + "model_explanation_gemini": "A multilingual sentence embedding model designed for text classification and retrieval tasks across multiple languages."
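The card's "Usage" section refers to an example for encoding queries and passages, but the snippet was stripped when the description was flattened into JSON. Below is a minimal sketch of such an example, assuming the standard transformers AutoModel API with average pooling; the input strings are illustrative placeholders, not the card's originals.

```python
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Zero out padding positions, then average the token embeddings per sequence.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


# Each input must carry a "query: " or "passage: " prefix (see FAQ 1);
# these particular texts are placeholders for illustration.
input_texts = [
    "query: how much protein should a female eat",
    "query: summit define",
    "passage: As a general guideline, adult women need about 46 grams of protein per day.",
    "passage: A summit is the highest point of a mountain.",
]

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
model = AutoModel.from_pretrained("intfloat/multilingual-e5-base")

# Tokenize, truncating to the model's 512-token limit (see Limitations).
batch_dict = tokenizer(input_texts, max_length=512, padding=True,
                       truncation=True, return_tensors="pt")
outputs = model(**batch_dict)

# Pool and L2-normalize so that dot products equal cosine similarities.
embeddings = average_pool(outputs.last_hidden_state, batch_dict["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)

# Similarity scores between the two queries and the two passages.
scores = (embeddings[:2] @ embeddings[2:].T) * 100
print(scores.tolist())
```

Consistent with FAQ 3, the resulting cosine similarities tend to cluster between 0.7 and 1.0; only their relative order matters for ranking.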
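The "Support for Sentence Transformers" section likewise lost its snippet. A sketch of the equivalent usage, assuming the sentence_transformers SentenceTransformer API; the texts are again placeholders, and the prefixes remain required:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")

# The "query: " / "passage: " prefixes are still required here.
input_texts = [
    "query: how much protein should a female eat",
    "passage: As a general guideline, adult women need about 46 grams of protein per day.",
]

# normalize_embeddings=True makes dot products equal cosine similarities.
embeddings = model.encode(input_texts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768) -- the card states an embedding size of 768
```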
+} \ No newline at end of file diff --git a/data/model_data_json/intfloat_multilingual-e5-large-instruct.json b/data/model_data_json/intfloat_multilingual-e5-large-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..3841375f558592054efb5e44aa93b6fa5848d659 --- /dev/null +++ b/data/model_data_json/intfloat_multilingual-e5-large-instruct.json @@ -0,0 +1,119 @@ +{ + "model_id": "intfloat/multilingual-e5-large-instruct", + "downloads": 818048, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "xlm-roberta", + "feature-extraction", + "mteb", + "transformers", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:2402.05672", + "arxiv:2401.00368", + "arxiv:2104.08663", + "arxiv:2210.07316", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-transformers - transformers model-index: - name: multilingual-e5-large-instruct results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.23880597014924 - type: ap value: 39.07351965022687 - type: f1 value: 70.04836733862683 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (de) config: de split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 66.71306209850107 - type: ap value: 79.01499914759529 - type: f1 value: 64.81951817560703 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en-ext) config: en-ext split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.85307346326837 - type: ap value: 22.447519885878737 - type: f1 value: 61.0162730745633 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (ja) config: ja split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.04925053533191 - type: ap value: 23.44983217128922 - type: f1 value: 62.5723230907759 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 96.28742500000001 - type: ap value: 94.8449918887462 - type: f1 value: 96.28680923610432 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 56.716 - 
type: f1 value: 55.76510398266401 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 52.99999999999999 - type: f1 value: 52.00829994765178 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.806000000000004 - type: f1 value: 48.082345914983634 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.507999999999996 - type: f1 value: 47.68752844642045 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 47.709999999999994 - type: f1 value: 47.05870376637181 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 44.662000000000006 - type: f1 value: 43.42371965372771 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 31.721 - type: map_at_10 value: 49.221 - type: map_at_100 value: 49.884 - type: map_at_1000 value: 49.888 - type: map_at_3 value: 44.31 - type: map_at_5 value: 47.276 - type: mrr_at_1 value: 32.432 - type: mrr_at_10 value: 49.5 - type: mrr_at_100 value: 50.163000000000004 - type: mrr_at_1000 value: 50.166 - type: mrr_at_3 value: 44.618 - type: mrr_at_5 value: 47.541 - type: ndcg_at_1 value: 31.721 - type: ndcg_at_10 value: 58.384 - type: ndcg_at_100 value: 61.111000000000004 - type: ndcg_at_1000 value: 61.187999999999995 - type: ndcg_at_3 value: 48.386 - type: ndcg_at_5 value: 53.708999999999996 - type: precision_at_1 value: 31.721 - type: precision_at_10 value: 8.741 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.057 - type: precision_at_5 value: 14.609 - type: recall_at_1 value: 31.721 - type: recall_at_10 value: 87.411 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 60.171 - type: recall_at_5 value: 73.044 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 46.40419580759799 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 40.48593255007969 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.889179122289995 - type: mrr value: 77.61146286769556 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 88.15075203727929 - type: cos_sim_spearman 
value: 86.9622224570873 - type: euclidean_pearson value: 86.70473853624121 - type: euclidean_spearman value: 86.9622224570873 - type: manhattan_pearson value: 86.21089380980065 - type: manhattan_spearman value: 86.75318154937008 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.65553235908142 - type: f1 value: 99.60681976339595 - type: precision value: 99.58246346555325 - type: recall value: 99.65553235908142 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.26260180497468 - type: f1 value: 99.14520507740848 - type: precision value: 99.08650671362535 - type: recall value: 99.26260180497468 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 98.07412538967787 - type: f1 value: 97.86629719431936 - type: precision value: 97.76238309664012 - type: recall value: 98.07412538967787 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.42074776197998 - type: f1 value: 99.38564156573635 - type: precision value: 99.36808846761454 - type: recall value: 99.42074776197998 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.73376623376623 - type: f1 value: 85.68480707214599 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 40.935218072113855 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.276389017675264 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.764166666666668 - type: map_at_10 value: 37.298166666666674 - type: map_at_100 value: 38.530166666666666 - type: map_at_1000 value: 38.64416666666667 - type: map_at_3 value: 34.484833333333334 - type: map_at_5 value: 36.0385 - type: mrr_at_1 value: 32.93558333333333 - type: mrr_at_10 value: 41.589749999999995 - type: mrr_at_100 value: 42.425333333333334 - type: mrr_at_1000 value: 42.476333333333336 - type: mrr_at_3 value: 39.26825 - type: mrr_at_5 value: 40.567083333333336 - type: ndcg_at_1 value: 32.93558333333333 - type: ndcg_at_10 value: 42.706583333333334 - type: ndcg_at_100 value: 47.82483333333333 - type: ndcg_at_1000 value: 49.95733333333334 - type: ndcg_at_3 value: 38.064750000000004 - type: ndcg_at_5 value: 40.18158333333333 - type: precision_at_1 value: 32.93558333333333 - type: precision_at_10 value: 7.459833333333334 - type: precision_at_100 value: 1.1830833333333335 - type: precision_at_1000 value: 0.15608333333333332 - type: precision_at_3 value: 17.5235 - type: precision_at_5 value: 12.349833333333333 - type: recall_at_1 value: 27.764166666666668 - type: 
recall_at_10 value: 54.31775 - type: recall_at_100 value: 76.74350000000001 - type: recall_at_1000 value: 91.45208333333332 - type: recall_at_3 value: 41.23425 - type: recall_at_5 value: 46.73983333333334 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 12.969 - type: map_at_10 value: 21.584999999999997 - type: map_at_100 value: 23.3 - type: map_at_1000 value: 23.5 - type: map_at_3 value: 18.218999999999998 - type: map_at_5 value: 19.983 - type: mrr_at_1 value: 29.316 - type: mrr_at_10 value: 40.033 - type: mrr_at_100 value: 40.96 - type: mrr_at_1000 value: 41.001 - type: mrr_at_3 value: 37.123 - type: mrr_at_5 value: 38.757999999999996 - type: ndcg_at_1 value: 29.316 - type: ndcg_at_10 value: 29.858 - type: ndcg_at_100 value: 36.756 - type: ndcg_at_1000 value: 40.245999999999995 - type: ndcg_at_3 value: 24.822 - type: ndcg_at_5 value: 26.565 - type: precision_at_1 value: 29.316 - type: precision_at_10 value: 9.186 - type: precision_at_100 value: 1.6549999999999998 - type: precision_at_1000 value: 0.22999999999999998 - type: precision_at_3 value: 18.436 - type: precision_at_5 value: 13.876 - type: recall_at_1 value: 12.969 - type: recall_at_10 value: 35.142 - type: recall_at_100 value: 59.143 - type: recall_at_1000 value: 78.594 - type: recall_at_3 value: 22.604 - type: recall_at_5 value: 27.883000000000003 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.527999999999999 - type: map_at_10 value: 17.974999999999998 - type: map_at_100 value: 25.665 - type: map_at_1000 value: 27.406000000000002 - type: map_at_3 value: 13.017999999999999 - type: map_at_5 value: 15.137 - type: mrr_at_1 value: 62.5 - type: mrr_at_10 value: 71.891 - type: mrr_at_100 value: 72.294 - type: mrr_at_1000 value: 72.296 - type: mrr_at_3 value: 69.958 - type: mrr_at_5 value: 71.121 - type: ndcg_at_1 value: 50.875 - type: ndcg_at_10 value: 38.36 - type: ndcg_at_100 value: 44.235 - type: ndcg_at_1000 value: 52.154 - type: ndcg_at_3 value: 43.008 - type: ndcg_at_5 value: 40.083999999999996 - type: precision_at_1 value: 62.5 - type: precision_at_10 value: 30.0 - type: precision_at_100 value: 10.038 - type: precision_at_1000 value: 2.0869999999999997 - type: precision_at_3 value: 46.833000000000006 - type: precision_at_5 value: 38.800000000000004 - type: recall_at_1 value: 8.527999999999999 - type: recall_at_10 value: 23.828 - type: recall_at_100 value: 52.322 - type: recall_at_1000 value: 77.143 - type: recall_at_3 value: 14.136000000000001 - type: recall_at_5 value: 17.761 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 51.51 - type: f1 value: 47.632159862049896 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 60.734 - type: map_at_10 value: 72.442 - type: map_at_100 value: 72.735 - type: map_at_1000 value: 72.75 - type: map_at_3 value: 70.41199999999999 - type: map_at_5 value: 71.80499999999999 - type: mrr_at_1 value: 65.212 - type: mrr_at_10 value: 76.613 - type: mrr_at_100 value: 76.79899999999999 - type: mrr_at_1000 value: 76.801 - type: mrr_at_3 value: 74.8 - type: mrr_at_5 value: 76.12400000000001 - type: ndcg_at_1 value: 65.212 - type: ndcg_at_10 value: 77.988 - type: 
ndcg_at_100 value: 79.167 - type: ndcg_at_1000 value: 79.452 - type: ndcg_at_3 value: 74.362 - type: ndcg_at_5 value: 76.666 - type: precision_at_1 value: 65.212 - type: precision_at_10 value: 10.003 - type: precision_at_100 value: 1.077 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 29.518 - type: precision_at_5 value: 19.016 - type: recall_at_1 value: 60.734 - type: recall_at_10 value: 90.824 - type: recall_at_100 value: 95.71600000000001 - type: recall_at_1000 value: 97.577 - type: recall_at_3 value: 81.243 - type: recall_at_5 value: 86.90299999999999 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 23.845 - type: map_at_10 value: 39.281 - type: map_at_100 value: 41.422 - type: map_at_1000 value: 41.593 - type: map_at_3 value: 34.467 - type: map_at_5 value: 37.017 - type: mrr_at_1 value: 47.531 - type: mrr_at_10 value: 56.204 - type: mrr_at_100 value: 56.928999999999995 - type: mrr_at_1000 value: 56.962999999999994 - type: mrr_at_3 value: 54.115 - type: mrr_at_5 value: 55.373000000000005 - type: ndcg_at_1 value: 47.531 - type: ndcg_at_10 value: 47.711999999999996 - type: ndcg_at_100 value: 54.510999999999996 - type: ndcg_at_1000 value: 57.103 - type: ndcg_at_3 value: 44.145 - type: ndcg_at_5 value: 45.032 - type: precision_at_1 value: 47.531 - type: precision_at_10 value: 13.194 - type: precision_at_100 value: 2.045 - type: precision_at_1000 value: 0.249 - type: precision_at_3 value: 29.424 - type: precision_at_5 value: 21.451 - type: recall_at_1 value: 23.845 - type: recall_at_10 value: 54.967 - type: recall_at_100 value: 79.11399999999999 - type: recall_at_1000 value: 94.56700000000001 - type: recall_at_3 value: 40.256 - type: recall_at_5 value: 46.215 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 37.819 - type: map_at_10 value: 60.889 - type: map_at_100 value: 61.717999999999996 - type: map_at_1000 value: 61.778 - type: map_at_3 value: 57.254000000000005 - type: map_at_5 value: 59.541 - type: mrr_at_1 value: 75.638 - type: mrr_at_10 value: 82.173 - type: mrr_at_100 value: 82.362 - type: mrr_at_1000 value: 82.37 - type: mrr_at_3 value: 81.089 - type: mrr_at_5 value: 81.827 - type: ndcg_at_1 value: 75.638 - type: ndcg_at_10 value: 69.317 - type: ndcg_at_100 value: 72.221 - type: ndcg_at_1000 value: 73.382 - type: ndcg_at_3 value: 64.14 - type: ndcg_at_5 value: 67.07600000000001 - type: precision_at_1 value: 75.638 - type: precision_at_10 value: 14.704999999999998 - type: precision_at_100 value: 1.698 - type: precision_at_1000 value: 0.185 - type: precision_at_3 value: 41.394999999999996 - type: precision_at_5 value: 27.162999999999997 - type: recall_at_1 value: 37.819 - type: recall_at_10 value: 73.52499999999999 - type: recall_at_100 value: 84.875 - type: recall_at_1000 value: 92.559 - type: recall_at_3 value: 62.092999999999996 - type: recall_at_5 value: 67.907 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 94.60079999999999 - type: ap value: 92.67396345347356 - type: f1 value: 94.5988098167121 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.285 - type: map_at_10 value: 33.436 - type: map_at_100 value: 34.63 - type: 
map_at_1000 value: 34.681 - type: map_at_3 value: 29.412 - type: map_at_5 value: 31.715 - type: mrr_at_1 value: 21.848 - type: mrr_at_10 value: 33.979 - type: mrr_at_100 value: 35.118 - type: mrr_at_1000 value: 35.162 - type: mrr_at_3 value: 30.036 - type: mrr_at_5 value: 32.298 - type: ndcg_at_1 value: 21.862000000000002 - type: ndcg_at_10 value: 40.43 - type: ndcg_at_100 value: 46.17 - type: ndcg_at_1000 value: 47.412 - type: ndcg_at_3 value: 32.221 - type: ndcg_at_5 value: 36.332 - type: precision_at_1 value: 21.862000000000002 - type: precision_at_10 value: 6.491 - type: precision_at_100 value: 0.935 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 13.744 - type: precision_at_5 value: 10.331999999999999 - type: recall_at_1 value: 21.285 - type: recall_at_10 value: 62.083 - type: recall_at_100 value: 88.576 - type: recall_at_1000 value: 98.006 - type: recall_at_3 value: 39.729 - type: recall_at_5 value: 49.608000000000004 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.92612859097127 - type: f1 value: 93.82370333372853 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.67681036911807 - type: f1 value: 92.14191382411472 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.26817878585723 - type: f1 value: 91.92824250337878 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.96554963983714 - type: f1 value: 90.02859329630792 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 90.02509860164935 - type: f1 value: 89.30665159182062 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 87.55515370705244 - type: f1 value: 87.94449232331907 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 82.4623803009576 - type: f1 value: 66.06738378772725 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 79.3716539870386 - type: f1 value: 60.37614033396853 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 80.34022681787857 - type: f1 value: 58.302008026952 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.72095208268087 - 
type: f1 value: 59.64524724009049 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.87020437432773 - type: f1 value: 57.80202694670567 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.73598553345387 - type: f1 value: 58.19628250675031 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.6630800268998 - type: f1 value: 65.00996668051691 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.7128446536651 - type: f1 value: 57.95860594874963 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.61129791526563 - type: f1 value: 59.75328290206483 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.00134498991257 - type: f1 value: 67.0230483991802 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.54068594485541 - type: f1 value: 65.54604628946976 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.032952252858095 - type: f1 value: 58.715741857057104 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.80901143241427 - type: f1 value: 68.33963989243877 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.47141896435777 - type: f1 value: 69.56765020308262 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.2373907195696 - type: f1 value: 69.04529836036467 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 77.05783456624076 - type: f1 value: 74.69430584708174 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 
72.82111634162744 - type: f1 value: 70.77228952803762 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.25353059852051 - type: f1 value: 71.05310103416411 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.28648285137861 - type: f1 value: 69.08020473732226 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.31540013449899 - type: f1 value: 70.9426355465791 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.2151983860121 - type: f1 value: 67.52541755908858 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.58372562205784 - type: f1 value: 69.49769064229827 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.9233355749832 - type: f1 value: 69.36311548259593 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.07330195023538 - type: f1 value: 64.99882022345572 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.62273032952253 - type: f1 value: 70.6394885471001 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.77000672494957 - type: f1 value: 62.9368944815065 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.453261600538 - type: f1 value: 70.85069934666681 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.6906523201076 - type: f1 value: 72.03249740074217 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.03631472763953 - type: f1 value: 59.3165215571852 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: 
- type: accuracy value: 58.913920645595155 - type: f1 value: 57.367337711611285 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 54.42837928715535 - type: f1 value: 52.60527294970906 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.33490248823135 - type: f1 value: 63.213340969404065 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.58507061197041 - type: f1 value: 68.40256628040486 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.11230665770006 - type: f1 value: 66.44863577842305 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.70073974445192 - type: f1 value: 67.21291337273702 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.43913920645595 - type: f1 value: 64.09838087422806 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.80026899798251 - type: f1 value: 68.76986742962444 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.78816408876934 - type: f1 value: 62.18781873428972 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.6577000672495 - type: f1 value: 68.75171511133003 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.42501681237391 - type: f1 value: 71.18434963451544 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.64828513786146 - type: f1 value: 70.67741914007422 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.62811028917284 - type: f1 value: 71.36402039740959 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 
31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.88634835238736 - type: f1 value: 69.23701923480677 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.15938130464022 - type: f1 value: 71.87792218993388 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.96301277740416 - type: f1 value: 67.29584200202983 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.49562878278412 - type: f1 value: 66.91716685679431 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 74.6805648957633 - type: f1 value: 72.02723592594374 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.00605245460659 - type: f1 value: 60.16716669482932 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.90988567585742 - type: f1 value: 63.99405488777784 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.62273032952253 - type: f1 value: 65.17213906909481 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.50907868190988 - type: f1 value: 69.15165697194853 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.30733019502352 - type: f1 value: 66.69024007380474 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.24277067921989 - type: f1 value: 68.80515408492947 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.49831876260929 - type: f1 value: 64.83778567111116 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.28782784129119 - type: f1 value: 69.3294186700733 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification 
(zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.315400134499 - type: f1 value: 71.22674385243207 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.37794216543377 - type: f1 value: 68.96962492838232 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.33557498318764 - type: f1 value: 72.28949738478356 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.84398117014123 - type: f1 value: 64.71026362091463 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.76462676529925 - type: f1 value: 69.8229667407667 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.02420981842636 - type: f1 value: 71.76576384895898 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.7572293207801 - type: f1 value: 72.76840765295256 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.02286482851379 - type: f1 value: 66.17237947327872 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.60928043039678 - type: f1 value: 77.27094731234773 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.68325487558843 - type: f1 value: 77.97530399082261 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.13315400134498 - type: f1 value: 75.97558584796424 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 80.47410894418292 - type: f1 value: 80.52244841473792 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.9670477471419 - type: f1 value: 77.37318805793146 - task: type: 
Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.09683927370544 - type: f1 value: 77.69773737430847 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.20847343644922 - type: f1 value: 75.17071738727348 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.07464694014796 - type: f1 value: 77.16136207698571 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.53396099529255 - type: f1 value: 73.58296404484122 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.75319435104237 - type: f1 value: 75.24674707850833 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.0948217888366 - type: f1 value: 76.47559490205028 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.07599193006052 - type: f1 value: 70.76028043093511 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.10490921318089 - type: f1 value: 77.01215275283272 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.25756556825824 - type: f1 value: 70.20605314648762 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.08137188971082 - type: f1 value: 77.3899269057439 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.35440484196369 - type: f1 value: 79.58964690002772 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.42299932750504 - type: f1 value: 68.07844356925413 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: 
accuracy value: 66.15669132481507 - type: f1 value: 65.89383352608513 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.11432414256894 - type: f1 value: 57.69910594559806 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.24747814391392 - type: f1 value: 70.42455553830918 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.46267652992603 - type: f1 value: 76.8854559308316 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.24815063887021 - type: f1 value: 72.77805034658074 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.11566913248151 - type: f1 value: 73.86147988001356 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.0168123739072 - type: f1 value: 69.38515920054571 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.41156691324814 - type: f1 value: 73.43474953408237 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.39609952925353 - type: f1 value: 67.29731681109291 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.20914593140552 - type: f1 value: 77.07066497935367 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.52387357094821 - type: f1 value: 78.5259569473291 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.6913248150639 - type: f1 value: 76.91201656350455 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.1217215870881 - type: f1 value: 77.41179937912504 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro 
split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.25891055817083 - type: f1 value: 75.8089244542887 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.70679219905851 - type: f1 value: 78.21459594517711 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.83523873570948 - type: f1 value: 74.86847028401978 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.71755211835911 - type: f1 value: 74.0214326485662 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.06523201075991 - type: f1 value: 79.10545620325138 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.91862811028918 - type: f1 value: 66.50386121217983 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.93140551445865 - type: f1 value: 70.755435928495 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.40753194351042 - type: f1 value: 71.61816115782923 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.1815736381977 - type: f1 value: 75.08016717887205 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.86482851378614 - type: f1 value: 72.39521180006291 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.46940147948891 - type: f1 value: 76.70044085362349 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.89307330195024 - type: f1 value: 71.5721825332298 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.7511768661735 - type: f1 value: 75.17918654541515 - task: type: Classification dataset: type: 
mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 78.69535978480162 - type: f1 value: 78.90019070153316 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.45729657027572 - type: f1 value: 76.19578371794672 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 36.92715354123554 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 35.53536244162518 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 33.08507884504006 - type: mrr value: 34.32436977159129 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.935 - type: map_at_10 value: 13.297 - type: map_at_100 value: 16.907 - type: map_at_1000 value: 18.391 - type: map_at_3 value: 9.626999999999999 - type: map_at_5 value: 11.190999999999999 - type: mrr_at_1 value: 46.129999999999995 - type: mrr_at_10 value: 54.346000000000004 - type: mrr_at_100 value: 55.067 - type: mrr_at_1000 value: 55.1 - type: mrr_at_3 value: 51.961 - type: mrr_at_5 value: 53.246 - type: ndcg_at_1 value: 44.118 - type: ndcg_at_10 value: 35.534 - type: ndcg_at_100 value: 32.946999999999996 - type: ndcg_at_1000 value: 41.599000000000004 - type: ndcg_at_3 value: 40.25 - type: ndcg_at_5 value: 37.978 - type: precision_at_1 value: 46.129999999999995 - type: precision_at_10 value: 26.842 - type: precision_at_100 value: 8.427 - type: precision_at_1000 value: 2.128 - type: precision_at_3 value: 37.977 - type: precision_at_5 value: 32.879000000000005 - type: recall_at_1 value: 5.935 - type: recall_at_10 value: 17.211000000000002 - type: recall_at_100 value: 34.33 - type: recall_at_1000 value: 65.551 - type: recall_at_3 value: 10.483 - type: recall_at_5 value: 13.078999999999999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 35.231 - type: map_at_10 value: 50.202000000000005 - type: map_at_100 value: 51.154999999999994 - type: map_at_1000 value: 51.181 - type: map_at_3 value: 45.774 - type: map_at_5 value: 48.522 - type: mrr_at_1 value: 39.687 - type: mrr_at_10 value: 52.88 - type: mrr_at_100 value: 53.569 - type: mrr_at_1000 value: 53.58500000000001 - type: mrr_at_3 value: 49.228 - type: mrr_at_5 value: 51.525 - type: ndcg_at_1 value: 39.687 - type: ndcg_at_10 value: 57.754000000000005 - type: ndcg_at_100 value: 61.597 - type: ndcg_at_1000 value: 62.18900000000001 - type: ndcg_at_3 value: 49.55 - type: ndcg_at_5 value: 54.11899999999999 - type: precision_at_1 value: 39.687 - type: precision_at_10 value: 9.313 - type: precision_at_100 value: 1.146 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 22.229 - type: precision_at_5 value: 15.939 - type: recall_at_1 value: 35.231 - type: recall_at_10 value: 78.083 - type: 
recall_at_100 value: 94.42099999999999 - type: recall_at_1000 value: 98.81 - type: recall_at_3 value: 57.047000000000004 - type: recall_at_5 value: 67.637 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.241 - type: map_at_10 value: 85.462 - type: map_at_100 value: 86.083 - type: map_at_1000 value: 86.09700000000001 - type: map_at_3 value: 82.49499999999999 - type: map_at_5 value: 84.392 - type: mrr_at_1 value: 82.09 - type: mrr_at_10 value: 88.301 - type: mrr_at_100 value: 88.383 - type: mrr_at_1000 value: 88.384 - type: mrr_at_3 value: 87.37 - type: mrr_at_5 value: 88.035 - type: ndcg_at_1 value: 82.12 - type: ndcg_at_10 value: 89.149 - type: ndcg_at_100 value: 90.235 - type: ndcg_at_1000 value: 90.307 - type: ndcg_at_3 value: 86.37599999999999 - type: ndcg_at_5 value: 87.964 - type: precision_at_1 value: 82.12 - type: precision_at_10 value: 13.56 - type: precision_at_100 value: 1.539 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.88 - type: precision_at_5 value: 24.92 - type: recall_at_1 value: 71.241 - type: recall_at_10 value: 96.128 - type: recall_at_100 value: 99.696 - type: recall_at_1000 value: 99.994 - type: recall_at_3 value: 88.181 - type: recall_at_5 value: 92.694 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.59757799655151 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 64.27391998854624 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.243 - type: map_at_10 value: 10.965 - type: map_at_100 value: 12.934999999999999 - type: map_at_1000 value: 13.256 - type: map_at_3 value: 7.907 - type: map_at_5 value: 9.435 - type: mrr_at_1 value: 20.9 - type: mrr_at_10 value: 31.849 - type: mrr_at_100 value: 32.964 - type: mrr_at_1000 value: 33.024 - type: mrr_at_3 value: 28.517 - type: mrr_at_5 value: 30.381999999999998 - type: ndcg_at_1 value: 20.9 - type: ndcg_at_10 value: 18.723 - type: ndcg_at_100 value: 26.384999999999998 - type: ndcg_at_1000 value: 32.114 - type: ndcg_at_3 value: 17.753 - type: ndcg_at_5 value: 15.558 - type: precision_at_1 value: 20.9 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 2.078 - type: precision_at_1000 value: 0.345 - type: precision_at_3 value: 16.900000000000002 - type: precision_at_5 value: 13.88 - type: recall_at_1 value: 4.243 - type: recall_at_10 value: 19.885 - type: recall_at_100 value: 42.17 - type: recall_at_1000 value: 70.12 - type: recall_at_3 value: 10.288 - type: recall_at_5 value: 14.072000000000001 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 85.84209174935282 - type: cos_sim_spearman value: 81.73248048438833 - type: euclidean_pearson value: 83.02810070308149 - type: euclidean_spearman value: 81.73248295679514 - type: manhattan_pearson value: 82.95368060376002 - type: manhattan_spearman value: 81.60277910998718 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: 
a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 88.52628804556943 - type: cos_sim_spearman value: 82.5713913555672 - type: euclidean_pearson value: 85.8796774746988 - type: euclidean_spearman value: 82.57137506803424 - type: manhattan_pearson value: 85.79671002960058 - type: manhattan_spearman value: 82.49445981618027 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 86.23682503505542 - type: cos_sim_spearman value: 87.15008956711806 - type: euclidean_pearson value: 86.79805401524959 - type: euclidean_spearman value: 87.15008956711806 - type: manhattan_pearson value: 86.65298502699244 - type: manhattan_spearman value: 86.97677821948562 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 85.63370304677802 - type: cos_sim_spearman value: 84.97105553540318 - type: euclidean_pearson value: 85.28896108687721 - type: euclidean_spearman value: 84.97105553540318 - type: manhattan_pearson value: 85.09663190337331 - type: manhattan_spearman value: 84.79126831644619 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 90.2614838800733 - type: cos_sim_spearman value: 91.0509162991835 - type: euclidean_pearson value: 90.33098317533373 - type: euclidean_spearman value: 91.05091625871644 - type: manhattan_pearson value: 90.26250435151107 - type: manhattan_spearman value: 90.97999594417519 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 85.80480973335091 - type: cos_sim_spearman value: 87.313695492969 - type: euclidean_pearson value: 86.49267251576939 - type: euclidean_spearman value: 87.313695492969 - type: manhattan_pearson value: 86.44019901831935 - type: manhattan_spearman value: 87.24205395460392 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 90.05662789380672 - type: cos_sim_spearman value: 90.02759424426651 - type: euclidean_pearson value: 90.4042483422981 - type: euclidean_spearman value: 90.02759424426651 - type: manhattan_pearson value: 90.51446975000226 - type: manhattan_spearman value: 90.08832889933616 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.5975528273532 - type: cos_sim_spearman value: 67.62969861411354 - type: euclidean_pearson value: 69.224275734323 - type: euclidean_spearman value: 67.62969861411354 - type: manhattan_pearson value: 69.3761447059927 - type: manhattan_spearman value: 67.90921005611467 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 87.11244327231684 - type: cos_sim_spearman value: 88.37902438979035 - type: euclidean_pearson value: 87.86054279847336 - type: euclidean_spearman value: 88.37902438979035 - type: manhattan_pearson value: 87.77257757320378 - type: 
manhattan_spearman value: 88.25208966098123 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 85.87174608143563 - type: mrr value: 96.12836872640794 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.760999999999996 - type: map_at_10 value: 67.258 - type: map_at_100 value: 67.757 - type: map_at_1000 value: 67.78800000000001 - type: map_at_3 value: 64.602 - type: map_at_5 value: 65.64 - type: mrr_at_1 value: 60.667 - type: mrr_at_10 value: 68.441 - type: mrr_at_100 value: 68.825 - type: mrr_at_1000 value: 68.853 - type: mrr_at_3 value: 66.444 - type: mrr_at_5 value: 67.26100000000001 - type: ndcg_at_1 value: 60.667 - type: ndcg_at_10 value: 71.852 - type: ndcg_at_100 value: 73.9 - type: ndcg_at_1000 value: 74.628 - type: ndcg_at_3 value: 67.093 - type: ndcg_at_5 value: 68.58 - type: precision_at_1 value: 60.667 - type: precision_at_10 value: 9.6 - type: precision_at_100 value: 1.0670000000000002 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 26.111 - type: precision_at_5 value: 16.733 - type: recall_at_1 value: 57.760999999999996 - type: recall_at_10 value: 84.967 - type: recall_at_100 value: 93.833 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 71.589 - type: recall_at_5 value: 75.483 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.66633663366336 - type: cos_sim_ap value: 91.17685358899108 - type: cos_sim_f1 value: 82.16818642350559 - type: cos_sim_precision value: 83.26488706365504 - type: cos_sim_recall value: 81.10000000000001 - type: dot_accuracy value: 99.66633663366336 - type: dot_ap value: 91.17663411119032 - type: dot_f1 value: 82.16818642350559 - type: dot_precision value: 83.26488706365504 - type: dot_recall value: 81.10000000000001 - type: euclidean_accuracy value: 99.66633663366336 - type: euclidean_ap value: 91.17685189882275 - type: euclidean_f1 value: 82.16818642350559 - type: euclidean_precision value: 83.26488706365504 - type: euclidean_recall value: 81.10000000000001 - type: manhattan_accuracy value: 99.66633663366336 - type: manhattan_ap value: 91.2241619496737 - type: manhattan_f1 value: 82.20472440944883 - type: manhattan_precision value: 86.51933701657458 - type: manhattan_recall value: 78.3 - type: max_accuracy value: 99.66633663366336 - type: max_ap value: 91.2241619496737 - type: max_f1 value: 82.20472440944883 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 66.85101268897951 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 42.461184054706905 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 51.44542568873886 - type: mrr value: 52.33656151854681 - task: type: 
Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.75982974997539 - type: cos_sim_spearman value: 30.385405026539914 - type: dot_pearson value: 30.75982433546523 - type: dot_spearman value: 30.385405026539914 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22799999999999998 - type: map_at_10 value: 2.064 - type: map_at_100 value: 13.056000000000001 - type: map_at_1000 value: 31.747999999999998 - type: map_at_3 value: 0.67 - type: map_at_5 value: 1.097 - type: mrr_at_1 value: 90.0 - type: mrr_at_10 value: 94.667 - type: mrr_at_100 value: 94.667 - type: mrr_at_1000 value: 94.667 - type: mrr_at_3 value: 94.667 - type: mrr_at_5 value: 94.667 - type: ndcg_at_1 value: 86.0 - type: ndcg_at_10 value: 82.0 - type: ndcg_at_100 value: 64.307 - type: ndcg_at_1000 value: 57.023999999999994 - type: ndcg_at_3 value: 85.816 - type: ndcg_at_5 value: 84.904 - type: precision_at_1 value: 90.0 - type: precision_at_10 value: 85.8 - type: precision_at_100 value: 66.46 - type: precision_at_1000 value: 25.202 - type: precision_at_3 value: 90.0 - type: precision_at_5 value: 89.2 - type: recall_at_1 value: 0.22799999999999998 - type: recall_at_10 value: 2.235 - type: recall_at_100 value: 16.185 - type: recall_at_1000 value: 53.620999999999995 - type: recall_at_3 value: 0.7040000000000001 - type: recall_at_5 value: 1.172 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.39999999999999 - type: f1 value: 96.75 - type: precision value: 96.45 - type: recall value: 97.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.54913294797689 - type: f1 value: 82.46628131021194 - type: precision value: 81.1175337186898 - type: recall value: 85.54913294797689 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 81.21951219512195 - type: f1 value: 77.33333333333334 - type: precision value: 75.54878048780488 - type: recall value: 81.21951219512195 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 98.6 - type: f1 value: 98.26666666666665 - type: precision value: 98.1 - type: recall value: 98.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 99.5 - type: f1 value: 99.33333333333333 - type: precision value: 99.25 - type: recall value: 99.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.8 - type: f1 value: 97.2 - type: precision value: 96.89999999999999 - type: recall value: 97.8 - task: type: BitextMining dataset: type: 
mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.8 - type: f1 value: 97.18333333333334 - type: precision value: 96.88333333333333 - type: recall value: 97.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.61194029850746 - type: f1 value: 72.81094527363183 - type: precision value: 70.83333333333333 - type: recall value: 77.61194029850746 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.7 - type: f1 value: 91.91666666666667 - type: precision value: 91.08333333333334 - type: recall value: 93.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.29268292682927 - type: f1 value: 85.27642276422765 - type: precision value: 84.01277584204414 - type: recall value: 88.29268292682927 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.1 - type: f1 value: 95.0 - type: precision value: 94.46666666666668 - type: recall value: 96.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.681652490887 - type: f1 value: 91.90765492102065 - type: precision value: 91.05913325232888 - type: recall value: 93.681652490887 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.17391304347827 - type: f1 value: 89.97101449275361 - type: precision value: 88.96811594202899 - type: recall value: 92.17391304347827 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.43478260869566 - type: f1 value: 87.72173913043478 - type: precision value: 86.42028985507245 - type: recall value: 90.43478260869566 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.4 - type: f1 value: 88.03 - type: precision value: 86.95 - type: recall value: 90.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.4 - type: f1 value: 91.45666666666666 - type: precision value: 90.525 - type: recall value: 93.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 81.9059107358263 - type: f1 value: 78.32557872364869 - type: precision value: 76.78260286824823 - 
type: recall value: 81.9059107358263 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.3 - type: f1 value: 92.58333333333333 - type: precision value: 91.73333333333332 - type: recall value: 94.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 79.10000000000001 - type: f1 value: 74.50500000000001 - type: precision value: 72.58928571428571 - type: recall value: 79.10000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.6 - type: f1 value: 95.55 - type: precision value: 95.05 - type: recall value: 96.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 82.0952380952381 - type: f1 value: 77.98458049886621 - type: precision value: 76.1968253968254 - type: recall value: 82.0952380952381 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.9 - type: f1 value: 84.99190476190476 - type: precision value: 83.65 - type: recall value: 87.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.7 - type: f1 value: 94.56666666666666 - type: precision value: 94.01666666666667 - type: recall value: 95.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 98.6 - type: f1 value: 98.2 - type: precision value: 98.0 - type: recall value: 98.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.6 - type: f1 value: 94.38333333333334 - type: precision value: 93.78333333333335 - type: recall value: 95.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.4 - type: f1 value: 84.10380952380952 - type: precision value: 82.67 - type: recall value: 87.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.5 - type: f1 value: 94.33333333333334 - type: precision value: 93.78333333333333 - type: recall value: 95.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.4 - type: f1 value: 86.82000000000001 - type: precision value: 85.64500000000001 - type: recall value: 89.4 - task: 
type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.1 - type: f1 value: 93.56666666666668 - type: precision value: 92.81666666666666 - type: recall value: 95.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 98.9 - type: f1 value: 98.6 - type: precision value: 98.45 - type: recall value: 98.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.01347708894879 - type: f1 value: 93.51752021563343 - type: precision value: 92.82794249775381 - type: recall value: 95.01347708894879 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.00854700854701 - type: f1 value: 96.08262108262107 - type: precision value: 95.65527065527067 - type: recall value: 97.00854700854701 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.5 - type: f1 value: 95.39999999999999 - type: precision value: 94.88333333333333 - type: recall value: 96.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.5909090909091 - type: f1 value: 95.49242424242425 - type: precision value: 94.9621212121212 - type: recall value: 96.5909090909091 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 84.90566037735849 - type: f1 value: 81.85883997204752 - type: precision value: 80.54507337526205 - type: recall value: 84.90566037735849 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.5 - type: f1 value: 96.75 - type: precision value: 96.38333333333333 - type: recall value: 97.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.7704280155642 - type: f1 value: 82.99610894941635 - type: precision value: 81.32295719844358 - type: recall value: 86.7704280155642 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 67.52136752136752 - type: f1 value: 61.89662189662191 - type: precision value: 59.68660968660969 - type: recall value: 67.52136752136752 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.2 - type: f1 value: 86.32 
- type: precision value: 85.015 - type: recall value: 89.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.0 - type: f1 value: 94.78333333333333 - type: precision value: 94.18333333333334 - type: recall value: 96.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 83.8785046728972 - type: f1 value: 80.54517133956385 - type: precision value: 79.154984423676 - type: recall value: 83.8785046728972 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.60000000000001 - type: f1 value: 92.01333333333334 - type: precision value: 91.28333333333333 - type: recall value: 93.60000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.1 - type: f1 value: 96.26666666666667 - type: precision value: 95.85000000000001 - type: recall value: 97.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 84.3 - type: f1 value: 80.67833333333333 - type: precision value: 79.03928571428571 - type: recall value: 84.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.3 - type: f1 value: 96.48333333333332 - type: precision value: 96.08333333333331 - type: recall value: 97.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.7 - type: f1 value: 94.66666666666667 - type: precision value: 94.16666666666667 - type: recall value: 95.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.2 - type: f1 value: 96.36666666666667 - type: precision value: 95.96666666666668 - type: recall value: 97.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.3 - type: f1 value: 92.80666666666667 - type: precision value: 92.12833333333333 - type: recall value: 94.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.0 - type: f1 value: 96.22333333333334 - type: precision value: 95.875 - type: recall value: 97.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 74.33333333333333 - type: f1 value: 
70.78174603174602 - type: precision value: 69.28333333333332 - type: recall value: 74.33333333333333 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 37.6 - type: f1 value: 32.938348952090365 - type: precision value: 31.2811038961039 - type: recall value: 37.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.5 - type: f1 value: 89.13333333333333 - type: precision value: 88.03333333333333 - type: recall value: 91.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 82.14285714285714 - type: f1 value: 77.67857142857143 - type: precision value: 75.59523809523809 - type: recall value: 82.14285714285714 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 69.0450054884742 - type: f1 value: 63.070409283362075 - type: precision value: 60.58992781824835 - type: recall value: 69.0450054884742 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 63.1 - type: f1 value: 57.848333333333336 - type: precision value: 55.69500000000001 - type: recall value: 63.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.1 - type: f1 value: 95.01666666666667 - type: precision value: 94.5 - type: recall value: 96.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.89999999999999 - type: f1 value: 94.90666666666667 - type: precision value: 94.425 - type: recall value: 95.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.6 - type: f1 value: 84.61333333333333 - type: precision value: 83.27 - type: recall value: 87.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 76.4 - type: f1 value: 71.90746031746032 - type: precision value: 70.07027777777778 - type: recall value: 76.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.89999999999999 - type: f1 value: 97.26666666666667 - type: precision value: 96.95 - type: recall value: 97.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - 
type: accuracy value: 78.8 - type: f1 value: 74.39555555555555 - type: precision value: 72.59416666666667 - type: recall value: 78.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.19999999999999 - type: f1 value: 93.78999999999999 - type: precision value: 93.125 - type: recall value: 95.19999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.8 - type: f1 value: 97.1 - type: precision value: 96.75 - type: recall value: 97.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.6 - type: f1 value: 94.25666666666666 - type: precision value: 93.64166666666668 - type: recall value: 95.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 56.934306569343065 - type: f1 value: 51.461591936044485 - type: precision value: 49.37434827945776 - type: recall value: 56.934306569343065 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 20.200000000000003 - type: f1 value: 16.91799284049284 - type: precision value: 15.791855158730158 - type: recall value: 20.200000000000003 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.2 - type: f1 value: 95.3 - type: precision value: 94.85 - type: recall value: 96.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.3 - type: f1 value: 95.11666666666667 - type: precision value: 94.53333333333333 - type: recall value: 96.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.88095238095238 - type: f1 value: 87.14285714285714 - type: precision value: 85.96230158730161 - type: recall value: 89.88095238095238 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 24.099999999999998 - type: f1 value: 19.630969083349783 - type: precision value: 18.275094905094907 - type: recall value: 24.099999999999998 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 83.4368530020704 - type: f1 value: 79.45183870649709 - type: precision value: 77.7432712215321 - type: recall value: 83.4368530020704 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: 
srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.8 - type: f1 value: 94.53333333333333 - type: precision value: 93.91666666666666 - type: recall value: 95.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 98.8 - type: f1 value: 98.48333333333332 - type: precision value: 98.33333333333334 - type: recall value: 98.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 17.5 - type: f1 value: 14.979285714285714 - type: precision value: 14.23235060690943 - type: recall value: 17.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.93939393939394 - type: f1 value: 91.991341991342 - type: precision value: 91.05339105339105 - type: recall value: 93.93939393939394 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.31297709923665 - type: f1 value: 86.76844783715012 - type: precision value: 85.63613231552164 - type: recall value: 89.31297709923665 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 99.12663755458514 - type: f1 value: 98.93255701115964 - type: precision value: 98.83551673944687 - type: recall value: 99.12663755458514 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.0 - type: f1 value: 89.77999999999999 - type: precision value: 88.78333333333333 - type: recall value: 92.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.89265536723164 - type: f1 value: 95.85687382297553 - type: precision value: 95.33898305084746 - type: recall value: 96.89265536723164 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 14.6 - type: f1 value: 11.820611790170615 - type: precision value: 11.022616224355355 - type: recall value: 14.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.89999999999999 - type: f1 value: 94.93333333333334 - type: precision value: 94.48666666666666 - type: recall value: 95.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.6 - type: f1 value: 84.72333333333334 - type: precision value: 83.44166666666666 - type: recall value: 
87.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.8 - type: f1 value: 93.47333333333333 - type: precision value: 92.875 - type: recall value: 94.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.6 - type: f1 value: 95.71666666666665 - type: precision value: 95.28333333333335 - type: recall value: 96.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 17.8 - type: f1 value: 14.511074040901628 - type: precision value: 13.503791000666002 - type: recall value: 17.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.10187667560321 - type: f1 value: 92.46648793565683 - type: precision value: 91.71134941912423 - type: recall value: 94.10187667560321 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.0 - type: f1 value: 96.11666666666666 - type: precision value: 95.68333333333334 - type: recall value: 97.0 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 72.72727272727273 - type: f1 value: 66.58949745906267 - type: precision value: 63.86693017127799 - type: recall value: 72.72727272727273 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.14084507042254 - type: f1 value: 88.26291079812206 - type: precision value: 87.32394366197182 - type: recall value: 90.14084507042254 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 64.67065868263472 - type: f1 value: 58.2876627696987 - type: precision value: 55.79255774165953 - type: recall value: 64.67065868263472 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.6 - type: f1 value: 94.41666666666667 - type: precision value: 93.85 - type: recall value: 95.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 55.172413793103445 - type: f1 value: 49.63992493549144 - type: precision value: 47.71405113769646 - type: recall value: 55.172413793103445 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.46478873239437 - 
type: f1 value: 73.4417616811983 - type: precision value: 71.91607981220658 - type: recall value: 77.46478873239437 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 84.61538461538461 - type: f1 value: 80.91452991452994 - type: precision value: 79.33760683760683 - type: recall value: 84.61538461538461 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 98.2 - type: f1 value: 97.6 - type: precision value: 97.3 - type: recall value: 98.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 75.5741127348643 - type: f1 value: 72.00417536534445 - type: precision value: 70.53467872883321 - type: recall value: 75.5741127348643 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 62.2 - type: f1 value: 55.577460317460314 - type: precision value: 52.98583333333333 - type: recall value: 62.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.18241042345277 - type: f1 value: 90.6468124709167 - type: precision value: 89.95656894679696 - type: recall value: 92.18241042345277 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.1 - type: f1 value: 95.13333333333333 - type: precision value: 94.66666666666667 - type: recall value: 96.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.8 - type: f1 value: 95.85000000000001 - type: precision value: 95.39999999999999 - type: recall value: 96.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.1259842519685 - type: f1 value: 89.76377952755905 - type: precision value: 88.71391076115485 - type: recall value: 92.1259842519685 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.1 - type: f1 value: 92.49 - type: precision value: 91.725 - type: recall value: 94.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.5623268698061 - type: f1 value: 73.27364463791058 - type: precision value: 71.51947852086357 - type: recall value: 77.5623268698061 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.39999999999999 - type: f1 value: 96.56666666666666 - type: precision value: 96.16666666666667 - type: recall value: 97.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 66.34615384615384 - type: f1 value: 61.092032967032964 - type: precision value: 59.27197802197802 - type: recall value: 66.34615384615384 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.89999999999999 - type: f1 value: 93.41190476190476 - type: precision value: 92.7 - type: recall value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.10000000000001 - type: f1 value: 91.10000000000001 - type: precision value: 90.13333333333333 - type: recall value: 93.10000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.7 - type: f1 value: 91.97333333333334 - type: precision value: 91.14166666666667 - type: recall value: 93.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.21698113207547 - type: f1 value: 90.3796046720575 - type: precision value: 89.56367924528303 - type: recall value: 92.21698113207547 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.6 - type: f1 value: 96.91666666666667 - type: precision value: 96.6 - type: recall value: 97.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.44525547445255 - type: f1 value: 96.71532846715328 - type: precision value: 96.35036496350365 - type: recall value: 97.44525547445255 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.1 - type: f1 value: 92.34000000000002 - type: precision value: 91.49166666666667 - type: recall value: 94.1 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 3.2910000000000004 - type: map_at_10 value: 10.373000000000001 - type: map_at_100 value: 15.612 - type: map_at_1000 value: 17.06 - type: map_at_3 value: 6.119 - type: map_at_5 value: 7.917000000000001 - type: mrr_at_1 value: 44.897999999999996 - type: mrr_at_10 value: 56.054 - type: mrr_at_100 value: 56.82000000000001 - type: mrr_at_1000 value: 56.82000000000001 - type: mrr_at_3 value: 52.381 - type: mrr_at_5 value: 53.81 - type: ndcg_at_1 value: 42.857 - type: ndcg_at_10 value: 27.249000000000002 - type: 
ndcg_at_100 value: 36.529 - type: ndcg_at_1000 value: 48.136 - type: ndcg_at_3 value: 33.938 - type: ndcg_at_5 value: 29.951 - type: precision_at_1 value: 44.897999999999996 - type: precision_at_10 value: 22.653000000000002 - type: precision_at_100 value: 7.000000000000001 - type: precision_at_1000 value: 1.48 - type: precision_at_3 value: 32.653 - type: precision_at_5 value: 27.755000000000003 - type: recall_at_1 value: 3.2910000000000004 - type: recall_at_10 value: 16.16 - type: recall_at_100 value: 43.908 - type: recall_at_1000 value: 79.823 - type: recall_at_3 value: 7.156 - type: recall_at_5 value: 10.204 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.05879999999999 - type: ap value: 14.609748142799111 - type: f1 value: 54.878956295843096 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 64.61799660441426 - type: f1 value: 64.8698191961434 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 51.32860036611885 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 88.34714192048638 - type: cos_sim_ap value: 80.26732975975634 - type: cos_sim_f1 value: 73.53415148134374 - type: cos_sim_precision value: 69.34767360299276 - type: cos_sim_recall value: 78.25857519788919 - type: dot_accuracy value: 88.34714192048638 - type: dot_ap value: 80.26733698491206 - type: dot_f1 value: 73.53415148134374 - type: dot_precision value: 69.34767360299276 - type: dot_recall value: 78.25857519788919 - type: euclidean_accuracy value: 88.34714192048638 - type: euclidean_ap value: 80.26734337771738 - type: euclidean_f1 value: 73.53415148134374 - type: euclidean_precision value: 69.34767360299276 - type: euclidean_recall value: 78.25857519788919 - type: manhattan_accuracy value: 88.30541813196639 - type: manhattan_ap value: 80.19415808104145 - type: manhattan_f1 value: 73.55143870713441 - type: manhattan_precision value: 73.25307511122743 - type: manhattan_recall value: 73.85224274406332 - type: max_accuracy value: 88.34714192048638 - type: max_ap value: 80.26734337771738 - type: max_f1 value: 73.55143870713441 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.81061047075717 - type: cos_sim_ap value: 87.11747055081017 - type: cos_sim_f1 value: 80.04355498817256 - type: cos_sim_precision value: 78.1165262000733 - type: cos_sim_recall value: 82.06806282722513 - type: dot_accuracy value: 89.81061047075717 - type: dot_ap value: 87.11746902745236 - type: dot_f1 value: 80.04355498817256 - type: dot_precision value: 78.1165262000733 - type: dot_recall value: 82.06806282722513 - type: euclidean_accuracy value: 89.81061047075717 - type: euclidean_ap value: 87.11746919324248 - type: euclidean_f1 value: 
80.04355498817256 - type: euclidean_precision value: 78.1165262000733 - type: euclidean_recall value: 82.06806282722513 - type: manhattan_accuracy value: 89.79508673885202 - type: manhattan_ap value: 87.11074390832218 - type: manhattan_f1 value: 80.13002540726349 - type: manhattan_precision value: 77.83826945412311 - type: manhattan_recall value: 82.56082537727133 - type: max_accuracy value: 89.81061047075717 - type: max_ap value: 87.11747055081017 - type: max_f1 value: 80.13002540726349 language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - 'no' - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh license: mit --- ## Multilingual-E5-large-instruct Multilingual E5 Text Embeddings: A Technical Report. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei, arXiv 2024 This model has 24 layers and the embedding size is 1024. ## Usage Below are examples to encode queries and passages from the MS-MARCO passage ranking dataset (a minimal sketch is given below). ### Transformers ### Sentence Transformers ### Infinity Usage with Infinity: ## Supported Languages This model is initialized from xlm-roberta-large and continually trained on a mixture of multilingual datasets. It supports 100 languages from xlm-roberta, but low-resource languages may see performance degradation. ## Training Details **Initialization**: xlm-roberta-large **First stage**: contrastive pre-training with 1 billion weakly supervised text pairs. **Second stage**: fine-tuning on datasets from the E5-mistral paper. ## MTEB Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## FAQ **1. Do I need to add instructions to the query?** Yes, this is how the model is trained; otherwise you will see a performance degradation. The task definition should be a one-sentence instruction that describes the task. This is a way to customize text embeddings for different scenarios through natural language instructions. Please check out unilm/e5/utils.py for the instructions we used for evaluation. On the other hand, there is no need to add instructions to the document side. **2. Why are my reproduced results slightly different from those reported in the model card?** Different versions of `transformers` and `pytorch` could cause negligible but non-zero performance differences. **3. Why do the cosine similarity scores distribute around 0.7 to 1.0?** This is a known and expected behavior, as we use a low temperature of 0.01 for the InfoNCE contrastive loss. For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores rather than their absolute values, so this should not be an issue. ## Citation If you find our paper or models helpful, please consider citing as follows: ## Limitations Long texts will be truncated to at most 512 tokens.", + "model_explanation_gemini": "A multilingual model designed for text classification and retrieval tasks across various languages, optimized for performance on datasets like MTEB AmazonCounterfactual and AmazonReviews."
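To make the Usage section above concrete, here is a minimal sentence-transformers sketch of the instructed query/passage encoding flow it describes; the task wording, sample texts, and score computation are illustrative assumptions, not taken from the original card:

```python
# Minimal sketch (illustrative): encode an instructed query and plain passages,
# then rank passages by cosine similarity. Assumes the sentence-transformers
# package is installed; the task wording and sample texts are made up.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

def get_detailed_instruct(task_description: str, query: str) -> str:
    # Queries carry a one-sentence task instruction, as FAQ 1 above requires.
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [get_detailed_instruct(task, "how much protein should a female eat")]
# No instruction is added on the document side (see FAQ 1).
passages = [
    "As a general guideline, the CDC's average requirement of protein for women "
    "ages 19 to 70 is 46 grams per day.",
]

# normalize_embeddings=True makes the dot product equal to cosine similarity.
query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)
scores = query_emb @ passage_emb.T  # shape: (num_queries, num_passages)
print(scores)
```

Consistent with FAQ 3, the resulting cosine scores tend to cluster between roughly 0.7 and 1.0, so only their relative order should be used for ranking.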
+} \ No newline at end of file diff --git a/data/model_data_json/intfloat_multilingual-e5-large.json b/data/model_data_json/intfloat_multilingual-e5-large.json new file mode 100644 index 0000000000000000000000000000000000000000..7d8edea40f2de155f5e84822ad7ac890a125548a --- /dev/null +++ b/data/model_data_json/intfloat_multilingual-e5-large.json @@ -0,0 +1,122 @@ +{ + "model_id": "intfloat/multilingual-e5-large", + "downloads": 1697377, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "xlm-roberta", + "mteb", + "Sentence Transformers", + "sentence-similarity", + "feature-extraction", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:2402.05672", + "arxiv:2108.08787", + "arxiv:2104.08663", + "arxiv:2210.07316", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - Sentence Transformers - sentence-similarity - feature-extraction - sentence-transformers model-index: - name: multilingual-e5-large results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 79.05970149253731 - type: ap value: 43.486574390835635 - type: f1 value: 73.32700092140148 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (de) config: de split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 71.22055674518201 - type: ap value: 81.55756710830498 - type: f1 value: 69.28271787752661 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en-ext) config: en-ext split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 80.41979010494754 - type: ap value: 29.34879922376344 - type: f1 value: 67.62475449011278 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (ja) config: ja split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 77.8372591006424 - type: ap value: 26.557560591210738 - type: f1 value: 64.96619417368707 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.489875 - type: ap value: 90.98758636917603 - type: f1 value: 93.48554819717332 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 
1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 47.564 - type: f1 value: 46.75122173518047 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 45.400000000000006 - type: f1 value: 44.17195682400632 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 43.068 - type: f1 value: 42.38155696855596 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 41.89 - type: f1 value: 40.84407321682663 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.120000000000005 - type: f1 value: 39.522976223819114 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 38.832 - type: f1 value: 38.0392533394713 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 30.725 - type: map_at_10 value: 46.055 - type: map_at_100 value: 46.900999999999996 - type: map_at_1000 value: 46.911 - type: map_at_3 value: 41.548 - type: map_at_5 value: 44.297 - type: mrr_at_1 value: 31.152 - type: mrr_at_10 value: 46.231 - type: mrr_at_100 value: 47.07 - type: mrr_at_1000 value: 47.08 - type: mrr_at_3 value: 41.738 - type: mrr_at_5 value: 44.468999999999994 - type: ndcg_at_1 value: 30.725 - type: ndcg_at_10 value: 54.379999999999995 - type: ndcg_at_100 value: 58.138 - type: ndcg_at_1000 value: 58.389 - type: ndcg_at_3 value: 45.156 - type: ndcg_at_5 value: 50.123 - type: precision_at_1 value: 30.725 - type: precision_at_10 value: 8.087 - type: precision_at_100 value: 0.9769999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 18.54 - type: precision_at_5 value: 13.542000000000002 - type: recall_at_1 value: 30.725 - type: recall_at_10 value: 80.868 - type: recall_at_100 value: 97.653 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 55.619 - type: recall_at_5 value: 67.71000000000001 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 44.30960650674069 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 38.427074197498996 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 60.28270056031872 - type: mrr value: 74.38332673789738 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a 
metrics: - type: cos_sim_pearson value: 84.05942144105269 - type: cos_sim_spearman value: 82.51212105850809 - type: euclidean_pearson value: 81.95639829909122 - type: euclidean_spearman value: 82.3717564144213 - type: manhattan_pearson value: 81.79273425468256 - type: manhattan_spearman value: 82.20066817871039 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.46764091858039 - type: f1 value: 99.37717466945023 - type: precision value: 99.33194154488518 - type: recall value: 99.46764091858039 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (fr-en) config: fr-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 98.29407880255337 - type: f1 value: 98.11248073959938 - type: precision value: 98.02443319392472 - type: recall value: 98.29407880255337 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (ru-en) config: ru-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 97.79009352268791 - type: f1 value: 97.5176076665512 - type: precision value: 97.38136473848286 - type: recall value: 97.79009352268791 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (zh-en) config: zh-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 99.26276987888363 - type: f1 value: 99.20133403545726 - type: precision value: 99.17500438827453 - type: recall value: 99.26276987888363 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 84.72727272727273 - type: f1 value: 84.67672206031433 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 35.34220182511161 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 33.4987096128766 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.558249999999997 - type: map_at_10 value: 34.44425000000001 - type: map_at_100 value: 35.59833333333333 - type: map_at_1000 value: 35.706916666666665 - type: map_at_3 value: 31.691749999999995 - type: map_at_5 value: 33.252916666666664 - type: mrr_at_1 value: 30.252666666666666 - type: mrr_at_10 value: 38.60675 - type: mrr_at_100 value: 39.42666666666666 - type: mrr_at_1000 value: 39.48408333333334 - type: mrr_at_3 value: 36.17441666666665 - type: mrr_at_5 value: 37.56275 - type: ndcg_at_1 value: 30.252666666666666 - type: ndcg_at_10 value: 39.683 - type: ndcg_at_100 value: 44.68541666666667 - type: ndcg_at_1000 value: 46.94316666666668 - type: ndcg_at_3 value: 34.961749999999995 - type: ndcg_at_5 value: 37.215666666666664 - type: precision_at_1 value: 30.252666666666666 - type: precision_at_10 value: 6.904166666666667 - type: precision_at_100 value: 1.0989999999999995 - type: precision_at_1000 value: 0.14733333333333334 - type: precision_at_3 value: 16.037666666666667 - type: precision_at_5 
value: 11.413583333333333 - type: recall_at_1 value: 25.558249999999997 - type: recall_at_10 value: 51.13341666666666 - type: recall_at_100 value: 73.08366666666667 - type: recall_at_1000 value: 88.79483333333334 - type: recall_at_3 value: 37.989083333333326 - type: recall_at_5 value: 43.787833333333325 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.338 - type: map_at_10 value: 18.360000000000003 - type: map_at_100 value: 19.942 - type: map_at_1000 value: 20.134 - type: map_at_3 value: 15.174000000000001 - type: map_at_5 value: 16.830000000000002 - type: mrr_at_1 value: 23.257 - type: mrr_at_10 value: 33.768 - type: mrr_at_100 value: 34.707 - type: mrr_at_1000 value: 34.766000000000005 - type: mrr_at_3 value: 30.977 - type: mrr_at_5 value: 32.528 - type: ndcg_at_1 value: 23.257 - type: ndcg_at_10 value: 25.733 - type: ndcg_at_100 value: 32.288 - type: ndcg_at_1000 value: 35.992000000000004 - type: ndcg_at_3 value: 20.866 - type: ndcg_at_5 value: 22.612 - type: precision_at_1 value: 23.257 - type: precision_at_10 value: 8.124 - type: precision_at_100 value: 1.518 - type: precision_at_1000 value: 0.219 - type: precision_at_3 value: 15.679000000000002 - type: precision_at_5 value: 12.117 - type: recall_at_1 value: 10.338 - type: recall_at_10 value: 31.154 - type: recall_at_100 value: 54.161 - type: recall_at_1000 value: 75.21900000000001 - type: recall_at_3 value: 19.427 - type: recall_at_5 value: 24.214 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.498 - type: map_at_10 value: 19.103 - type: map_at_100 value: 27.375 - type: map_at_1000 value: 28.981 - type: map_at_3 value: 13.764999999999999 - type: map_at_5 value: 15.950000000000001 - type: mrr_at_1 value: 65.5 - type: mrr_at_10 value: 74.53800000000001 - type: mrr_at_100 value: 74.71799999999999 - type: mrr_at_1000 value: 74.725 - type: mrr_at_3 value: 72.792 - type: mrr_at_5 value: 73.554 - type: ndcg_at_1 value: 53.37499999999999 - type: ndcg_at_10 value: 41.286 - type: ndcg_at_100 value: 45.972 - type: ndcg_at_1000 value: 53.123 - type: ndcg_at_3 value: 46.172999999999995 - type: ndcg_at_5 value: 43.033 - type: precision_at_1 value: 65.5 - type: precision_at_10 value: 32.725 - type: precision_at_100 value: 10.683 - type: precision_at_1000 value: 1.978 - type: precision_at_3 value: 50 - type: precision_at_5 value: 41.349999999999994 - type: recall_at_1 value: 8.498 - type: recall_at_10 value: 25.070999999999998 - type: recall_at_100 value: 52.383 - type: recall_at_1000 value: 74.91499999999999 - type: recall_at_3 value: 15.207999999999998 - type: recall_at_5 value: 18.563 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.5 - type: f1 value: 41.93833713984145 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 67.914 - type: map_at_10 value: 78.10000000000001 - type: map_at_100 value: 78.333 - type: map_at_1000 value: 78.346 - type: map_at_3 value: 76.626 - type: map_at_5 value: 77.627 - type: mrr_at_1 value: 72.74199999999999 - type: mrr_at_10 value: 82.414 - type: mrr_at_100 value: 82.511 - type: mrr_at_1000 value: 82.513 - type: mrr_at_3 value: 81.231 - type: mrr_at_5 value: 82.065 - type: 
ndcg_at_1 value: 72.74199999999999 - type: ndcg_at_10 value: 82.806 - type: ndcg_at_100 value: 83.677 - type: ndcg_at_1000 value: 83.917 - type: ndcg_at_3 value: 80.305 - type: ndcg_at_5 value: 81.843 - type: precision_at_1 value: 72.74199999999999 - type: precision_at_10 value: 10.24 - type: precision_at_100 value: 1.089 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 31.268 - type: precision_at_5 value: 19.706000000000003 - type: recall_at_1 value: 67.914 - type: recall_at_10 value: 92.889 - type: recall_at_100 value: 96.42699999999999 - type: recall_at_1000 value: 97.92 - type: recall_at_3 value: 86.21 - type: recall_at_5 value: 90.036 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.166 - type: map_at_10 value: 35.57 - type: map_at_100 value: 37.405 - type: map_at_1000 value: 37.564 - type: map_at_3 value: 30.379 - type: map_at_5 value: 33.324 - type: mrr_at_1 value: 43.519000000000005 - type: mrr_at_10 value: 51.556000000000004 - type: mrr_at_100 value: 52.344 - type: mrr_at_1000 value: 52.373999999999995 - type: mrr_at_3 value: 48.868 - type: mrr_at_5 value: 50.319 - type: ndcg_at_1 value: 43.519000000000005 - type: ndcg_at_10 value: 43.803 - type: ndcg_at_100 value: 50.468999999999994 - type: ndcg_at_1000 value: 53.111 - type: ndcg_at_3 value: 38.893 - type: ndcg_at_5 value: 40.653 - type: precision_at_1 value: 43.519000000000005 - type: precision_at_10 value: 12.253 - type: precision_at_100 value: 1.931 - type: precision_at_1000 value: 0.242 - type: precision_at_3 value: 25.617 - type: precision_at_5 value: 19.383 - type: recall_at_1 value: 22.166 - type: recall_at_10 value: 51.6 - type: recall_at_100 value: 76.574 - type: recall_at_1000 value: 92.192 - type: recall_at_3 value: 34.477999999999994 - type: recall_at_5 value: 41.835 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.041 - type: map_at_10 value: 62.961999999999996 - type: map_at_100 value: 63.79899999999999 - type: map_at_1000 value: 63.854 - type: map_at_3 value: 59.399 - type: map_at_5 value: 61.669 - type: mrr_at_1 value: 78.082 - type: mrr_at_10 value: 84.321 - type: mrr_at_100 value: 84.49600000000001 - type: mrr_at_1000 value: 84.502 - type: mrr_at_3 value: 83.421 - type: mrr_at_5 value: 83.977 - type: ndcg_at_1 value: 78.082 - type: ndcg_at_10 value: 71.229 - type: ndcg_at_100 value: 74.10900000000001 - type: ndcg_at_1000 value: 75.169 - type: ndcg_at_3 value: 66.28699999999999 - type: ndcg_at_5 value: 69.084 - type: precision_at_1 value: 78.082 - type: precision_at_10 value: 14.993 - type: precision_at_100 value: 1.7239999999999998 - type: precision_at_1000 value: 0.186 - type: precision_at_3 value: 42.737 - type: precision_at_5 value: 27.843 - type: recall_at_1 value: 39.041 - type: recall_at_10 value: 74.96300000000001 - type: recall_at_100 value: 86.199 - type: recall_at_1000 value: 93.228 - type: recall_at_3 value: 64.105 - type: recall_at_5 value: 69.608 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 90.23160000000001 - type: ap value: 85.5674856808308 - type: f1 value: 90.18033354786317 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 24.091 - 
type: map_at_10 value: 36.753 - type: map_at_100 value: 37.913000000000004 - type: map_at_1000 value: 37.958999999999996 - type: map_at_3 value: 32.818999999999996 - type: map_at_5 value: 35.171 - type: mrr_at_1 value: 24.742 - type: mrr_at_10 value: 37.285000000000004 - type: mrr_at_100 value: 38.391999999999996 - type: mrr_at_1000 value: 38.431 - type: mrr_at_3 value: 33.440999999999995 - type: mrr_at_5 value: 35.75 - type: ndcg_at_1 value: 24.742 - type: ndcg_at_10 value: 43.698 - type: ndcg_at_100 value: 49.145 - type: ndcg_at_1000 value: 50.23800000000001 - type: ndcg_at_3 value: 35.769 - type: ndcg_at_5 value: 39.961999999999996 - type: precision_at_1 value: 24.742 - type: precision_at_10 value: 6.7989999999999995 - type: precision_at_100 value: 0.95 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 15.096000000000002 - type: precision_at_5 value: 11.183 - type: recall_at_1 value: 24.091 - type: recall_at_10 value: 65.068 - type: recall_at_100 value: 89.899 - type: recall_at_1000 value: 98.16 - type: recall_at_3 value: 43.68 - type: recall_at_5 value: 53.754999999999995 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.66621067031465 - type: f1 value: 93.49622853272142 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 91.94702733164272 - type: f1 value: 91.17043441745282 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.20146764509674 - type: f1 value: 91.98359080555608 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.99780770435328 - type: f1 value: 89.19746342724068 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.78486912871998 - type: f1 value: 89.24578823628642 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.74502712477394 - type: f1 value: 89.00297573881542 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.9046967624259 - type: f1 value: 59.36787125785957 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 74.5280360664976 - type: f1 value: 57.17723440888718 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 75.44029352901934 - type: f1 value: 54.052855531072964 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB 
MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 70.5606013153774 - type: f1 value: 52.62215934386531 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 73.11581211903908 - type: f1 value: 52.341291845645465 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 74.28933092224233 - type: f1 value: 57.07918745504911 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.38063214525892 - type: f1 value: 59.46463723443009 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.06926698049766 - type: f1 value: 52.49084283283562 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.74983187626093 - type: f1 value: 56.960640620165904 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.86550100874243 - type: f1 value: 62.47370548140688 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.971082716879636 - type: f1 value: 61.03812421957381 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 54.98318762609282 - type: f1 value: 51.51207916008392 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.45527908540686 - type: f1 value: 66.16631905400318 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.32750504371216 - type: f1 value: 66.16755288646591 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.09213180901143 - type: f1 value: 66.95654394661507 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.75588433086752 - type: f1 value: 71.79973779656923 - task: type: Classification dataset: type: mteb/amazon_massive_intent 
name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.49428379287154 - type: f1 value: 68.37494379215734 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.90921318090115 - type: f1 value: 66.79517376481645 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.12104909213181 - type: f1 value: 67.29448842879584 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.34095494283793 - type: f1 value: 67.01134288992947 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.61264290517822 - type: f1 value: 64.68730512660757 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.79757901815738 - type: f1 value: 65.24938539425598 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.68728984532616 - type: f1 value: 67.0487169762553 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.07464694014795 - type: f1 value: 59.183532276789286 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.04707464694015 - type: f1 value: 67.66829629003848 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.42434431741762 - type: f1 value: 59.01617226544757 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.53127101546738 - type: f1 value: 68.10033760906255 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.50504371217215 - type: f1 value: 69.74931103158923 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.91190316072628 - type: f1 value: 54.05551136648796 - task: type: Classification dataset: 
type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.78211163416275 - type: f1 value: 49.874888544058535 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 47.017484868863484 - type: f1 value: 44.53364263352014 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.16207128446537 - type: f1 value: 59.01185692320829 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.42501681237391 - type: f1 value: 67.13169450166086 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.0780094149294 - type: f1 value: 64.41720167850707 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.57162071284466 - type: f1 value: 62.414138683804424 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 61.71149966375252 - type: f1 value: 58.594805125087234 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.03900470746471 - type: f1 value: 63.87937257883887 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.8776059179556 - type: f1 value: 57.48587618059131 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.87895090786819 - type: f1 value: 66.8141299430347 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.45057162071285 - type: f1 value: 67.46444039673516 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.546738399462 - type: f1 value: 68.63640876702655 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.72965702757229 - type: f1 value: 68.54119560379115 - 
task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.35574983187625 - type: f1 value: 65.88844917691927 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.70477471418964 - type: f1 value: 69.19665697061978 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.0880968392737 - type: f1 value: 64.76962317666086 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 65.18493611297916 - type: f1 value: 62.49984559035371 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.75857431069265 - type: f1 value: 69.20053687623418 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.500336247478145 - type: f1 value: 55.2972398687929 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 62.68997982515132 - type: f1 value: 59.36848202755348 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.01950235373235 - type: f1 value: 60.09351954625423 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.29186281102892 - type: f1 value: 67.57860496703447 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.77471418964357 - type: f1 value: 61.913983147713836 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.87222595830532 - type: f1 value: 66.03679033708141 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 64.04505716207127 - type: f1 value: 61.28569169817908 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.38466711499663 - 
type: f1 value: 67.20532357036844 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.12306657700067 - type: f1 value: 68.91251226588182 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 66.20040349697378 - type: f1 value: 66.02657347714175 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.73907195696032 - type: f1 value: 66.98484521791418 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.58843308675185 - type: f1 value: 58.95591723092005 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.22730329522528 - type: f1 value: 66.0894499712115 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.48285137861465 - type: f1 value: 65.21963176785157 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.74714189643578 - type: f1 value: 66.8212192745412 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.09213180901143 - type: f1 value: 56.70735546356339 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.05716207128448 - type: f1 value: 74.8413712365364 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.69737726967047 - type: f1 value: 74.7664341963 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.90383322125084 - type: f1 value: 73.59201554448323 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.51176866173503 - type: f1 value: 77.46104434577758 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 
7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.31069266980496 - type: f1 value: 74.61048660675635 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.95225285810356 - type: f1 value: 72.33160006574627 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.12373907195696 - type: f1 value: 73.20921012557481 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.86684599865501 - type: f1 value: 73.82348774610831 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.40215198386012 - type: f1 value: 71.11945183971858 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 72.12844653665098 - type: f1 value: 71.34450495911766 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.52252858103566 - type: f1 value: 73.98878711342999 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.93611297915265 - type: f1 value: 63.723200467653385 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.11903160726295 - type: f1 value: 73.82138439467096 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.15198386012105 - type: f1 value: 66.02172193802167 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.32414256893072 - type: f1 value: 74.30943421170574 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.46805648957633 - type: f1 value: 77.62808409298209 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.318762609280434 - type: f1 value: 62.094284066075076 - task: type: Classification dataset: type: 
mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.34902488231338 - type: f1 value: 57.12893860987984 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 50.88433086751849 - type: f1 value: 48.2272350802058 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.4425016812374 - type: f1 value: 64.61463095996173 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.04707464694015 - type: f1 value: 75.05099199098998 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.50437121721586 - type: f1 value: 69.83397721096314 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.94283792871553 - type: f1 value: 68.8704663703913 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.79488903833222 - type: f1 value: 63.615424063345436 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.88231338264963 - type: f1 value: 68.57892302593237 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.248150638870214 - type: f1 value: 61.06680605338809 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.84196368527236 - type: f1 value: 74.52566464968763 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.8285137861466 - type: f1 value: 74.8853197608802 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 74.13248150638869 - type: f1 value: 74.3982040999179 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 
73.49024882313383 - type: f1 value: 73.82153848368573 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.72158708809684 - type: f1 value: 71.85049433180541 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.137861466039 - type: f1 value: 75.37628348188467 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.86953597848016 - type: f1 value: 71.87537624521661 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 70.27572293207801 - type: f1 value: 68.80017302344231 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.09952925353059 - type: f1 value: 76.07992707688408 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.140551445864155 - type: f1 value: 61.73855010331415 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.27774041694687 - type: f1 value: 64.83664868894539 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.69468728984533 - type: f1 value: 64.76239666920868 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.44653665097512 - type: f1 value: 73.14646052013873 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 67.71351714862139 - type: f1 value: 66.67212180163382 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.9946200403497 - type: f1 value: 73.87348793725525 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 68.15400134498992 - type: f1 value: 67.09433241421094 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test 
revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.11365164761264 - type: f1 value: 73.59502539433753 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.82582380632145 - type: f1 value: 76.89992945316313 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.81237390719569 - type: f1 value: 72.36499770986265 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.480506569594695 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 29.71252128004552 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 31.421396787056548 - type: mrr value: 32.48155274872267 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.595 - type: map_at_10 value: 12.642000000000001 - type: map_at_100 value: 15.726 - type: map_at_1000 value: 17.061999999999998 - type: map_at_3 value: 9.125 - type: map_at_5 value: 10.866000000000001 - type: mrr_at_1 value: 43.344 - type: mrr_at_10 value: 52.227999999999994 - type: mrr_at_100 value: 52.898999999999994 - type: mrr_at_1000 value: 52.944 - type: mrr_at_3 value: 49.845 - type: mrr_at_5 value: 51.115 - type: ndcg_at_1 value: 41.949999999999996 - type: ndcg_at_10 value: 33.995 - type: ndcg_at_100 value: 30.869999999999997 - type: ndcg_at_1000 value: 39.487 - type: ndcg_at_3 value: 38.903999999999996 - type: ndcg_at_5 value: 37.236999999999995 - type: precision_at_1 value: 43.344 - type: precision_at_10 value: 25.480000000000004 - type: precision_at_100 value: 7.672 - type: precision_at_1000 value: 2.028 - type: precision_at_3 value: 36.636 - type: precision_at_5 value: 32.632 - type: recall_at_1 value: 5.595 - type: recall_at_10 value: 16.466 - type: recall_at_100 value: 31.226 - type: recall_at_1000 value: 62.778999999999996 - type: recall_at_3 value: 9.931 - type: recall_at_5 value: 12.884 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 40.414 - type: map_at_10 value: 56.754000000000005 - type: map_at_100 value: 57.457 - type: map_at_1000 value: 57.477999999999994 - type: map_at_3 value: 52.873999999999995 - type: map_at_5 value: 55.175 - type: mrr_at_1 value: 45.278 - type: mrr_at_10 value: 59.192 - type: mrr_at_100 value: 59.650000000000006 - type: mrr_at_1000 value: 59.665 - type: mrr_at_3 value: 56.141 - type: mrr_at_5 value: 57.998000000000005 - type: ndcg_at_1 value: 45.278 - type: ndcg_at_10 value: 64.056 - type: ndcg_at_100 value: 66.89 - type: ndcg_at_1000 value: 67.364 - type: ndcg_at_3 value: 56.97 - type: ndcg_at_5 value: 60.719 - type: precision_at_1 value: 45.278 - type: precision_at_10 value: 9.994 - type: precision_at_100 value: 1.165 - type: 
precision_at_1000 value: 0.121 - type: precision_at_3 value: 25.512 - type: precision_at_5 value: 17.509 - type: recall_at_1 value: 40.414 - type: recall_at_10 value: 83.596 - type: recall_at_100 value: 95.72 - type: recall_at_1000 value: 99.24 - type: recall_at_3 value: 65.472 - type: recall_at_5 value: 74.039 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.352 - type: map_at_10 value: 84.369 - type: map_at_100 value: 85.02499999999999 - type: map_at_1000 value: 85.04 - type: map_at_3 value: 81.42399999999999 - type: map_at_5 value: 83.279 - type: mrr_at_1 value: 81.05 - type: mrr_at_10 value: 87.401 - type: mrr_at_100 value: 87.504 - type: mrr_at_1000 value: 87.505 - type: mrr_at_3 value: 86.443 - type: mrr_at_5 value: 87.10799999999999 - type: ndcg_at_1 value: 81.04 - type: ndcg_at_10 value: 88.181 - type: ndcg_at_100 value: 89.411 - type: ndcg_at_1000 value: 89.507 - type: ndcg_at_3 value: 85.28099999999999 - type: ndcg_at_5 value: 86.888 - type: precision_at_1 value: 81.04 - type: precision_at_10 value: 13.406 - type: precision_at_100 value: 1.5350000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.31 - type: precision_at_5 value: 24.54 - type: recall_at_1 value: 70.352 - type: recall_at_10 value: 95.358 - type: recall_at_100 value: 99.541 - type: recall_at_1000 value: 99.984 - type: recall_at_3 value: 87.111 - type: recall_at_5 value: 91.643 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 46.54068723291946 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 63.216287629895994 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.023000000000001 - type: map_at_10 value: 10.071 - type: map_at_100 value: 11.892 - type: map_at_1000 value: 12.196 - type: map_at_3 value: 7.234 - type: map_at_5 value: 8.613999999999999 - type: mrr_at_1 value: 19.900000000000002 - type: mrr_at_10 value: 30.516 - type: mrr_at_100 value: 31.656000000000002 - type: mrr_at_1000 value: 31.723000000000003 - type: mrr_at_3 value: 27.400000000000002 - type: mrr_at_5 value: 29.270000000000003 - type: ndcg_at_1 value: 19.900000000000002 - type: ndcg_at_10 value: 17.474 - type: ndcg_at_100 value: 25.020999999999997 - type: ndcg_at_1000 value: 30.728 - type: ndcg_at_3 value: 16.588 - type: ndcg_at_5 value: 14.498 - type: precision_at_1 value: 19.900000000000002 - type: precision_at_10 value: 9.139999999999999 - type: precision_at_100 value: 2.011 - type: precision_at_1000 value: 0.33899999999999997 - type: precision_at_3 value: 15.667 - type: precision_at_5 value: 12.839999999999998 - type: recall_at_1 value: 4.023000000000001 - type: recall_at_10 value: 18.497 - type: recall_at_100 value: 40.8 - type: recall_at_1000 value: 68.812 - type: recall_at_3 value: 9.508 - type: recall_at_5 value: 12.983 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.967008785134 - type: cos_sim_spearman value: 80.23142141101837 - type: euclidean_pearson value: 
81.20166064704539 - type: euclidean_spearman value: 80.18961335654585 - type: manhattan_pearson value: 81.13925443187625 - type: manhattan_spearman value: 80.07948723044424 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.94262461316023 - type: cos_sim_spearman value: 80.01596278563865 - type: euclidean_pearson value: 83.80799622922581 - type: euclidean_spearman value: 79.94984954947103 - type: manhattan_pearson value: 83.68473841756281 - type: manhattan_spearman value: 79.84990707951822 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 80.57346443146068 - type: cos_sim_spearman value: 81.54689837570866 - type: euclidean_pearson value: 81.10909881516007 - type: euclidean_spearman value: 81.56746243261762 - type: manhattan_pearson value: 80.87076036186582 - type: manhattan_spearman value: 81.33074987964402 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 79.54733787179849 - type: cos_sim_spearman value: 77.72202105610411 - type: euclidean_pearson value: 78.9043595478849 - type: euclidean_spearman value: 77.93422804309435 - type: manhattan_pearson value: 78.58115121621368 - type: manhattan_spearman value: 77.62508135122033 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.59880017237558 - type: cos_sim_spearman value: 89.31088630824758 - type: euclidean_pearson value: 88.47069261564656 - type: euclidean_spearman value: 89.33581971465233 - type: manhattan_pearson value: 88.40774264100956 - type: manhattan_spearman value: 89.28657485627835 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.08055117917084 - type: cos_sim_spearman value: 85.78491813080304 - type: euclidean_pearson value: 84.99329155500392 - type: euclidean_spearman value: 85.76728064677287 - type: manhattan_pearson value: 84.87947428989587 - type: manhattan_spearman value: 85.62429454917464 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 82.14190939287384 - type: cos_sim_spearman value: 82.27331573306041 - type: euclidean_pearson value: 81.891896953716 - type: euclidean_spearman value: 82.37695542955998 - type: manhattan_pearson value: 81.73123869460504 - type: manhattan_spearman value: 82.19989168441421 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 76.84695301843362 - type: cos_sim_spearman value: 77.87790986014461 - type: euclidean_pearson value: 76.91981583106315 - type: euclidean_spearman value: 77.88154772749589 - type: manhattan_pearson value: 76.94953277451093 - type: manhattan_spearman value: 77.80499230728604 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: 
af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 75.44657840482016 - type: cos_sim_spearman value: 75.05531095119674 - type: euclidean_pearson value: 75.88161755829299 - type: euclidean_spearman value: 74.73176238219332 - type: manhattan_pearson value: 75.63984765635362 - type: manhattan_spearman value: 74.86476440770737 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 85.64700140524133 - type: cos_sim_spearman value: 86.16014210425672 - type: euclidean_pearson value: 86.49086860843221 - type: euclidean_spearman value: 86.09729326815614 - type: manhattan_pearson value: 86.43406265125513 - type: manhattan_spearman value: 86.17740150939994 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.91170098764921 - type: cos_sim_spearman value: 88.12437004058931 - type: euclidean_pearson value: 88.81828254494437 - type: euclidean_spearman value: 88.14831794572122 - type: manhattan_pearson value: 88.93442183448961 - type: manhattan_spearman value: 88.15254630778304 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 72.91390577997292 - type: cos_sim_spearman value: 71.22979457536074 - type: euclidean_pearson value: 74.40314008106749 - type: euclidean_spearman value: 72.54972136083246 - type: manhattan_pearson value: 73.85687539530218 - type: manhattan_spearman value: 72.09500771742637 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 80.9301067983089 - type: cos_sim_spearman value: 80.74989828346473 - type: euclidean_pearson value: 81.36781301814257 - type: euclidean_spearman value: 80.9448819964426 - type: manhattan_pearson value: 81.0351322685609 - type: manhattan_spearman value: 80.70192121844177 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.13820465980005 - type: cos_sim_spearman value: 86.73532498758757 - type: euclidean_pearson value: 87.21329451846637 - type: euclidean_spearman value: 86.57863198601002 - type: manhattan_pearson value: 87.06973713818554 - type: manhattan_spearman value: 86.47534918791499 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 85.48720108904415 - type: cos_sim_spearman value: 85.62221757068387 - type: euclidean_pearson value: 86.1010129512749 - type: euclidean_spearman value: 85.86580966509942 - type: manhattan_pearson value: 86.26800938808971 - type: manhattan_spearman value: 85.88902721678429 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 83.98021347333516 - type: cos_sim_spearman value: 84.53806553803501 - type: euclidean_pearson value: 84.61483347248364 - type: 
euclidean_spearman value: 85.14191408011702 - type: manhattan_pearson value: 84.75297588825967 - type: manhattan_spearman value: 85.33176753669242 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 84.51856644893233 - type: cos_sim_spearman value: 85.27510748506413 - type: euclidean_pearson value: 85.09886861540977 - type: euclidean_spearman value: 85.62579245860887 - type: manhattan_pearson value: 84.93017860464607 - type: manhattan_spearman value: 85.5063988898453 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 62.581573200584195 - type: cos_sim_spearman value: 63.05503590247928 - type: euclidean_pearson value: 63.652564812602094 - type: euclidean_spearman value: 62.64811520876156 - type: manhattan_pearson value: 63.506842893061076 - type: manhattan_spearman value: 62.51289573046917 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 48.2248801729127 - type: cos_sim_spearman value: 56.5936604678561 - type: euclidean_pearson value: 43.98149464089 - type: euclidean_spearman value: 56.108561882423615 - type: manhattan_pearson value: 43.86880305903564 - type: manhattan_spearman value: 56.04671150510166 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 55.17564527009831 - type: cos_sim_spearman value: 64.57978560979488 - type: euclidean_pearson value: 58.8818330154583 - type: euclidean_spearman value: 64.99214839071281 - type: manhattan_pearson value: 58.72671436121381 - type: manhattan_spearman value: 65.10713416616109 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 26.772131864023297 - type: cos_sim_spearman value: 34.68200792408681 - type: euclidean_pearson value: 16.68082419005441 - type: euclidean_spearman value: 34.83099932652166 - type: manhattan_pearson value: 16.52605949659529 - type: manhattan_spearman value: 34.82075801399475 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 54.42415189043831 - type: cos_sim_spearman value: 63.54594264576758 - type: euclidean_pearson value: 57.36577498297745 - type: euclidean_spearman value: 63.111466379158074 - type: manhattan_pearson value: 57.584543715873885 - type: manhattan_spearman value: 63.22361054139183 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 47.55216762405518 - type: cos_sim_spearman value: 56.98670142896412 - type: euclidean_pearson value: 50.15318757562699 - type: euclidean_spearman value: 56.524941926541906 - type: manhattan_pearson value: 49.955618528674904 - type: manhattan_spearman value: 56.37102209240117 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: 
test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 49.20540980338571 - type: cos_sim_spearman value: 59.9009453504406 - type: euclidean_pearson value: 49.557749853620535 - type: euclidean_spearman value: 59.76631621172456 - type: manhattan_pearson value: 49.62340591181147 - type: manhattan_spearman value: 59.94224880322436 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 51.508169956576985 - type: cos_sim_spearman value: 66.82461565306046 - type: euclidean_pearson value: 56.2274426480083 - type: euclidean_spearman value: 66.6775323848333 - type: manhattan_pearson value: 55.98277796300661 - type: manhattan_spearman value: 66.63669848497175 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 72.86478788045507 - type: cos_sim_spearman value: 76.7946552053193 - type: euclidean_pearson value: 75.01598530490269 - type: euclidean_spearman value: 76.83618917858281 - type: manhattan_pearson value: 74.68337628304332 - type: manhattan_spearman value: 76.57480204017773 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 55.922619099401984 - type: cos_sim_spearman value: 56.599362477240774 - type: euclidean_pearson value: 56.68307052369783 - type: euclidean_spearman value: 54.28760436777401 - type: manhattan_pearson value: 56.67763566500681 - type: manhattan_spearman value: 53.94619541711359 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 66.74357206710913 - type: cos_sim_spearman value: 72.5208244925311 - type: euclidean_pearson value: 67.49254562186032 - type: euclidean_spearman value: 72.02469076238683 - type: manhattan_pearson value: 67.45251772238085 - type: manhattan_spearman value: 72.05538819984538 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 71.25734330033191 - type: cos_sim_spearman value: 76.98349083946823 - type: euclidean_pearson value: 73.71642838667736 - type: euclidean_spearman value: 77.01715504651384 - type: manhattan_pearson value: 73.61712711868105 - type: manhattan_spearman value: 77.01392571153896 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.18215462781212 - type: cos_sim_spearman value: 65.54373266117607 - type: euclidean_pearson value: 64.54126095439005 - type: euclidean_spearman value: 65.30410369102711 - type: manhattan_pearson value: 63.50332221148234 - type: manhattan_spearman value: 64.3455878104313 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 62.30509221440029 - type: cos_sim_spearman value: 65.99582704642478 - type: euclidean_pearson value: 63.43818859884195 - type: 
euclidean_spearman value: 66.83172582815764 - type: manhattan_pearson value: 63.055779168508764 - type: manhattan_spearman value: 65.49585020501449 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 59.587830825340404 - type: cos_sim_spearman value: 68.93467614588089 - type: euclidean_pearson value: 62.3073527367404 - type: euclidean_spearman value: 69.69758171553175 - type: manhattan_pearson value: 61.9074580815789 - type: manhattan_spearman value: 69.57696375597865 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 57.143220125577066 - type: cos_sim_spearman value: 67.78857859159226 - type: euclidean_pearson value: 55.58225107923733 - type: euclidean_spearman value: 67.80662907184563 - type: manhattan_pearson value: 56.24953502726514 - type: manhattan_spearman value: 67.98262125431616 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 21.826928900322066 - type: cos_sim_spearman value: 49.578506634400405 - type: euclidean_pearson value: 27.939890138843214 - type: euclidean_spearman value: 52.71950519136242 - type: manhattan_pearson value: 26.39878683847546 - type: manhattan_spearman value: 47.54609580342499 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: fr-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 57.27603854632001 - type: cos_sim_spearman value: 50.709255283710995 - type: euclidean_pearson value: 59.5419024445929 - type: euclidean_spearman value: 50.709255283710995 - type: manhattan_pearson value: 59.03256832438492 - type: manhattan_spearman value: 61.97797868009122 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 85.00757054859712 - type: cos_sim_spearman value: 87.29283629622222 - type: euclidean_pearson value: 86.54824171775536 - type: euclidean_spearman value: 87.24364730491402 - type: manhattan_pearson value: 86.5062156915074 - type: manhattan_spearman value: 87.15052170378574 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 82.03549357197389 - type: mrr value: 95.05437645143527 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 57.260999999999996 - type: map_at_10 value: 66.259 - type: map_at_100 value: 66.884 - type: map_at_1000 value: 66.912 - type: map_at_3 value: 63.685 - type: map_at_5 value: 65.35499999999999 - type: mrr_at_1 value: 60.333000000000006 - type: mrr_at_10 value: 67.5 - type: mrr_at_100 value: 68.013 - type: mrr_at_1000 value: 68.038 - type: mrr_at_3 value: 65.61099999999999 - type: mrr_at_5 value: 66.861 - type: ndcg_at_1 value: 60.333000000000006 - type: ndcg_at_10 value: 70.41 - type: ndcg_at_100 value: 73.10600000000001 - type: ndcg_at_1000 value: 73.846 - type: ndcg_at_3 value: 66.133 - type: ndcg_at_5 value: 68.499 - type: 
precision_at_1 value: 60.333000000000006 - type: precision_at_10 value: 9.232999999999999 - type: precision_at_100 value: 1.0630000000000002 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 25.667 - type: precision_at_5 value: 17.067 - type: recall_at_1 value: 57.260999999999996 - type: recall_at_10 value: 81.94399999999999 - type: recall_at_100 value: 93.867 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 70.339 - type: recall_at_5 value: 76.25 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.74356435643564 - type: cos_sim_ap value: 93.13411948212683 - type: cos_sim_f1 value: 86.80521991300147 - type: cos_sim_precision value: 84.00374181478017 - type: cos_sim_recall value: 89.8 - type: dot_accuracy value: 99.67920792079208 - type: dot_ap value: 89.27277565444479 - type: dot_f1 value: 83.9276990718124 - type: dot_precision value: 82.04393505253104 - type: dot_recall value: 85.9 - type: euclidean_accuracy value: 99.74257425742574 - type: euclidean_ap value: 93.17993008259062 - type: euclidean_f1 value: 86.69396110542476 - type: euclidean_precision value: 88.78406708595388 - type: euclidean_recall value: 84.7 - type: manhattan_accuracy value: 99.74257425742574 - type: manhattan_ap value: 93.14413755550099 - type: manhattan_f1 value: 86.82483594144371 - type: manhattan_precision value: 87.66564729867483 - type: manhattan_recall value: 86 - type: max_accuracy value: 99.74356435643564 - type: max_ap value: 93.17993008259062 - type: max_f1 value: 86.82483594144371 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 57.525863806168566 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 32.68850574423839 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.71580650644033 - type: mrr value: 50.50971903913081 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 29.152190498799484 - type: cos_sim_spearman value: 29.686180371952727 - type: dot_pearson value: 27.248664793816342 - type: dot_spearman value: 28.37748983721745 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.20400000000000001 - type: map_at_10 value: 1.6209999999999998 - type: map_at_100 value: 9.690999999999999 - type: map_at_1000 value: 23.733 - type: map_at_3 value: 0.575 - type: map_at_5 value: 0.885 - type: mrr_at_1 value: 78 - type: mrr_at_10 value: 86.56700000000001 - type: mrr_at_100 value: 86.56700000000001 - type: mrr_at_1000 value: 86.56700000000001 - type: mrr_at_3 value: 85.667 - type: mrr_at_5 value: 86.56700000000001 - type: ndcg_at_1 value: 76 - type: ndcg_at_10 value: 71.326 - type: ndcg_at_100 value: 54.208999999999996 - type: 
ndcg_at_1000 value: 49.252 - type: ndcg_at_3 value: 74.235 - type: ndcg_at_5 value: 73.833 - type: precision_at_1 value: 78 - type: precision_at_10 value: 74.8 - type: precision_at_100 value: 55.50000000000001 - type: precision_at_1000 value: 21.836 - type: precision_at_3 value: 78 - type: precision_at_5 value: 78 - type: recall_at_1 value: 0.20400000000000001 - type: recall_at_10 value: 1.894 - type: recall_at_100 value: 13.245999999999999 - type: recall_at_1000 value: 46.373 - type: recall_at_3 value: 0.613 - type: recall_at_5 value: 0.991 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (sqi-eng) config: sqi-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.89999999999999 - type: f1 value: 94.69999999999999 - type: precision value: 94.11666666666667 - type: recall value: 95.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fry-eng) config: fry-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 68.20809248554913 - type: f1 value: 63.431048720066066 - type: precision value: 61.69143958161298 - type: recall value: 68.20809248554913 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kur-eng) config: kur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 71.21951219512195 - type: f1 value: 66.82926829268293 - type: precision value: 65.1260162601626 - type: recall value: 71.21951219512195 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tur-eng) config: tur-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.2 - type: f1 value: 96.26666666666667 - type: precision value: 95.8 - type: recall value: 97.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (deu-eng) config: deu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 99.3 - type: f1 value: 99.06666666666666 - type: precision value: 98.95 - type: recall value: 99.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nld-eng) config: nld-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.39999999999999 - type: f1 value: 96.63333333333333 - type: precision value: 96.26666666666668 - type: recall value: 97.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ron-eng) config: ron-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96 - type: f1 value: 94.86666666666666 - type: precision value: 94.31666666666668 - type: recall value: 96 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ang-eng) config: ang-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 47.01492537313433 - type: f1 value: 40.178867566927266 - type: precision value: 38.179295828549556 - type: recall value: 47.01492537313433 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ido-eng) config: ido-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.5 - type: f1 value: 83.62537480063796 - type: precision value: 82.44555555555554 - type: recall value: 86.5 - task: type: 
BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jav-eng) config: jav-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 80.48780487804879 - type: f1 value: 75.45644599303138 - type: precision value: 73.37398373983739 - type: recall value: 80.48780487804879 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (isl-eng) config: isl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.7 - type: f1 value: 91.95666666666666 - type: precision value: 91.125 - type: recall value: 93.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slv-eng) config: slv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.73754556500607 - type: f1 value: 89.65168084244632 - type: precision value: 88.73025516403402 - type: recall value: 91.73754556500607 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cym-eng) config: cym-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 81.04347826086956 - type: f1 value: 76.2128364389234 - type: precision value: 74.2 - type: recall value: 81.04347826086956 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kaz-eng) config: kaz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 83.65217391304348 - type: f1 value: 79.4376811594203 - type: precision value: 77.65797101449274 - type: recall value: 83.65217391304348 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (est-eng) config: est-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.5 - type: f1 value: 85.02690476190476 - type: precision value: 83.96261904761904 - type: recall value: 87.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (heb-eng) config: heb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89.3 - type: f1 value: 86.52333333333333 - type: precision value: 85.22833333333332 - type: recall value: 89.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gla-eng) config: gla-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 65.01809408926418 - type: f1 value: 59.00594446432805 - type: precision value: 56.827215807915444 - type: recall value: 65.01809408926418 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mar-eng) config: mar-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.2 - type: f1 value: 88.58 - type: precision value: 87.33333333333334 - type: recall value: 91.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lat-eng) config: lat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 59.199999999999996 - type: f1 value: 53.299166276284915 - type: precision value: 51.3383908045977 - type: recall value: 59.199999999999996 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bel-eng) config: bel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.2 - type: f1 value: 91.2 
- type: precision value: 90.25 - type: recall value: 93.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pms-eng) config: pms-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 64.76190476190476 - type: f1 value: 59.867110667110666 - type: precision value: 58.07390192653351 - type: recall value: 64.76190476190476 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gle-eng) config: gle-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 76.2 - type: f1 value: 71.48147546897547 - type: precision value: 69.65409090909091 - type: recall value: 76.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pes-eng) config: pes-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.8 - type: f1 value: 92.14 - type: precision value: 91.35833333333333 - type: recall value: 93.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nob-eng) config: nob-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.89999999999999 - type: f1 value: 97.2 - type: precision value: 96.85000000000001 - type: recall value: 97.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bul-eng) config: bul-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.6 - type: f1 value: 92.93333333333334 - type: precision value: 92.13333333333333 - type: recall value: 94.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cbk-eng) config: cbk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 74.1 - type: f1 value: 69.14817460317461 - type: precision value: 67.2515873015873 - type: recall value: 74.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hun-eng) config: hun-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.19999999999999 - type: f1 value: 94.01333333333335 - type: precision value: 93.46666666666667 - type: recall value: 95.19999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uig-eng) config: uig-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 76.9 - type: f1 value: 72.07523809523809 - type: precision value: 70.19777777777779 - type: recall value: 76.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (rus-eng) config: rus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.1 - type: f1 value: 92.31666666666666 - type: precision value: 91.43333333333332 - type: recall value: 94.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (spa-eng) config: spa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.8 - type: f1 value: 97.1 - type: precision value: 96.76666666666668 - type: recall value: 97.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hye-eng) config: hye-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.85714285714286 - type: f1 value: 
90.92093441150045 - type: precision value: 90.00449236298293 - type: recall value: 92.85714285714286 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tel-eng) config: tel-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.16239316239316 - type: f1 value: 91.33903133903132 - type: precision value: 90.56267806267806 - type: recall value: 93.16239316239316 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (afr-eng) config: afr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.4 - type: f1 value: 90.25666666666666 - type: precision value: 89.25833333333334 - type: recall value: 92.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mon-eng) config: mon-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.22727272727272 - type: f1 value: 87.53030303030303 - type: precision value: 86.37121212121211 - type: recall value: 90.22727272727272 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arz-eng) config: arz-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 79.03563941299791 - type: f1 value: 74.7349505840072 - type: precision value: 72.9035639412998 - type: recall value: 79.03563941299791 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hrv-eng) config: hrv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97 - type: f1 value: 96.15 - type: precision value: 95.76666666666668 - type: recall value: 97 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nov-eng) config: nov-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 76.26459143968872 - type: f1 value: 71.55642023346303 - type: precision value: 69.7544932369835 - type: recall value: 76.26459143968872 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (gsw-eng) config: gsw-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 58.119658119658126 - type: f1 value: 51.65242165242165 - type: precision value: 49.41768108434775 - type: recall value: 58.119658119658126 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nds-eng) config: nds-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 74.3 - type: f1 value: 69.52055555555555 - type: precision value: 67.7574938949939 - type: recall value: 74.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ukr-eng) config: ukr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.8 - type: f1 value: 93.31666666666666 - type: precision value: 92.60000000000001 - type: recall value: 94.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (uzb-eng) config: uzb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 76.63551401869158 - type: f1 value: 72.35202492211837 - type: precision value: 70.60358255451713 - type: recall value: 76.63551401869158 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lit-eng) config: 
lit-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.4 - type: f1 value: 88.4811111111111 - type: precision value: 87.7452380952381 - type: recall value: 90.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ina-eng) config: ina-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95 - type: f1 value: 93.60666666666667 - type: precision value: 92.975 - type: recall value: 95 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lfn-eng) config: lfn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 67.2 - type: f1 value: 63.01595782872099 - type: precision value: 61.596587301587306 - type: recall value: 67.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (zsm-eng) config: zsm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.7 - type: f1 value: 94.52999999999999 - type: precision value: 94 - type: recall value: 95.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ita-eng) config: ita-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.6 - type: f1 value: 93.28999999999999 - type: precision value: 92.675 - type: recall value: 94.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cmn-eng) config: cmn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.39999999999999 - type: f1 value: 95.28333333333333 - type: precision value: 94.75 - type: recall value: 96.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (lvs-eng) config: lvs-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.9 - type: f1 value: 89.83 - type: precision value: 88.92 - type: recall value: 91.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (glg-eng) config: glg-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.69999999999999 - type: f1 value: 93.34222222222223 - type: precision value: 92.75416666666668 - type: recall value: 94.69999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ceb-eng) config: ceb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 60.333333333333336 - type: f1 value: 55.31203703703703 - type: precision value: 53.39971108326371 - type: recall value: 60.333333333333336 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bre-eng) config: bre-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 12.9 - type: f1 value: 11.099861903031458 - type: precision value: 10.589187932631877 - type: recall value: 12.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ben-eng) config: ben-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 86.7 - type: f1 value: 83.0152380952381 - type: precision value: 81.37833333333333 - type: recall value: 86.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swg-eng) config: swg-eng split: test 
revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 63.39285714285714 - type: f1 value: 56.832482993197274 - type: precision value: 54.56845238095237 - type: recall value: 63.39285714285714 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (arq-eng) config: arq-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 48.73765093304062 - type: f1 value: 41.555736920720456 - type: precision value: 39.06874531737319 - type: recall value: 48.73765093304062 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kab-eng) config: kab-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 41.099999999999994 - type: f1 value: 36.540165945165946 - type: precision value: 35.05175685425686 - type: recall value: 41.099999999999994 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fra-eng) config: fra-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.89999999999999 - type: f1 value: 93.42333333333333 - type: precision value: 92.75833333333333 - type: recall value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (por-eng) config: por-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.89999999999999 - type: f1 value: 93.63333333333334 - type: precision value: 93.01666666666665 - type: recall value: 94.89999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tat-eng) config: tat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.9 - type: f1 value: 73.64833333333334 - type: precision value: 71.90282106782105 - type: recall value: 77.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (oci-eng) config: oci-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 59.4 - type: f1 value: 54.90521367521367 - type: precision value: 53.432840025471606 - type: recall value: 59.4 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pol-eng) config: pol-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.39999999999999 - type: f1 value: 96.6 - type: precision value: 96.2 - type: recall value: 97.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (war-eng) config: war-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 67.2 - type: f1 value: 62.25926129426129 - type: precision value: 60.408376623376626 - type: recall value: 67.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (aze-eng) config: aze-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.2 - type: f1 value: 87.60666666666667 - type: precision value: 86.45277777777778 - type: recall value: 90.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (vie-eng) config: vie-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 97.7 - type: f1 value: 97 - type: precision value: 96.65 - type: recall value: 97.7 - task: type: BitextMining dataset: 
type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (nno-eng) config: nno-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.2 - type: f1 value: 91.39746031746031 - type: precision value: 90.6125 - type: recall value: 93.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cha-eng) config: cha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 32.11678832116788 - type: f1 value: 27.210415386260234 - type: precision value: 26.20408990846947 - type: recall value: 32.11678832116788 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mhr-eng) config: mhr-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.5 - type: f1 value: 6.787319277832475 - type: precision value: 6.3452094433344435 - type: recall value: 8.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dan-eng) config: dan-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.1 - type: f1 value: 95.08 - type: precision value: 94.61666666666667 - type: recall value: 96.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ell-eng) config: ell-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.3 - type: f1 value: 93.88333333333333 - type: precision value: 93.18333333333332 - type: recall value: 95.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (amh-eng) config: amh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.11904761904762 - type: f1 value: 80.69444444444444 - type: precision value: 78.72023809523809 - type: recall value: 85.11904761904762 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (pam-eng) config: pam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 11.1 - type: f1 value: 9.276381801735853 - type: precision value: 8.798174603174601 - type: recall value: 11.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hsb-eng) config: hsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 63.56107660455487 - type: f1 value: 58.70433569191332 - type: precision value: 56.896926581464015 - type: recall value: 63.56107660455487 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (srp-eng) config: srp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.69999999999999 - type: f1 value: 93.10000000000001 - type: precision value: 92.35 - type: recall value: 94.69999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (epo-eng) config: epo-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.8 - type: f1 value: 96.01222222222222 - type: precision value: 95.67083333333332 - type: recall value: 96.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kzj-eng) config: kzj-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 9.2 - type: f1 value: 7.911555250305249 - type: precision value: 7.631246556216846 - type: recall 
value: 9.2 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (awa-eng) config: awa-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.48917748917748 - type: f1 value: 72.27375798804371 - type: precision value: 70.14430014430013 - type: recall value: 77.48917748917748 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fao-eng) config: fao-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 77.09923664122137 - type: f1 value: 72.61541257724463 - type: precision value: 70.8998380754106 - type: recall value: 77.09923664122137 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mal-eng) config: mal-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 98.2532751091703 - type: f1 value: 97.69529354682193 - type: precision value: 97.42843279961184 - type: recall value: 98.2532751091703 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ile-eng) config: ile-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 82.8 - type: f1 value: 79.14672619047619 - type: precision value: 77.59489247311828 - type: recall value: 82.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (bos-eng) config: bos-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.35028248587571 - type: f1 value: 92.86252354048965 - type: precision value: 92.2080979284369 - type: recall value: 94.35028248587571 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cor-eng) config: cor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.5 - type: f1 value: 6.282429263935621 - type: precision value: 5.783274240739785 - type: recall value: 8.5 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (cat-eng) config: cat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.7 - type: f1 value: 91.025 - type: precision value: 90.30428571428571 - type: recall value: 92.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (eus-eng) config: eus-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 81 - type: f1 value: 77.8232380952381 - type: precision value: 76.60194444444444 - type: recall value: 81 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yue-eng) config: yue-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91 - type: f1 value: 88.70857142857142 - type: precision value: 87.7 - type: recall value: 91 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swe-eng) config: swe-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.39999999999999 - type: f1 value: 95.3 - type: precision value: 94.76666666666667 - type: recall value: 96.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (dtp-eng) config: dtp-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 8.1 - type: f1 value: 7.001008218834307 - 
type: precision value: 6.708329562594269 - type: recall value: 8.1 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kat-eng) config: kat-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 87.1313672922252 - type: f1 value: 84.09070598748882 - type: precision value: 82.79171454104429 - type: recall value: 87.1313672922252 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (jpn-eng) config: jpn-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.39999999999999 - type: f1 value: 95.28333333333333 - type: precision value: 94.73333333333332 - type: recall value: 96.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (csb-eng) config: csb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 42.29249011857708 - type: f1 value: 36.981018542283365 - type: precision value: 35.415877813576024 - type: recall value: 42.29249011857708 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (xho-eng) config: xho-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 83.80281690140845 - type: f1 value: 80.86854460093896 - type: precision value: 79.60093896713614 - type: recall value: 83.80281690140845 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (orv-eng) config: orv-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 45.26946107784431 - type: f1 value: 39.80235464678088 - type: precision value: 38.14342660001342 - type: recall value: 45.26946107784431 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ind-eng) config: ind-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.3 - type: f1 value: 92.9 - type: precision value: 92.26666666666668 - type: recall value: 94.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tuk-eng) config: tuk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 37.93103448275862 - type: f1 value: 33.15192743764172 - type: precision value: 31.57456528146183 - type: recall value: 37.93103448275862 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (max-eng) config: max-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 69.01408450704226 - type: f1 value: 63.41549295774648 - type: precision value: 61.342778895595806 - type: recall value: 69.01408450704226 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (swh-eng) config: swh-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 76.66666666666667 - type: f1 value: 71.60705960705961 - type: precision value: 69.60683760683762 - type: recall value: 76.66666666666667 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (hin-eng) config: hin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 95.8 - type: f1 value: 94.48333333333333 - type: precision value: 93.83333333333333 - type: recall value: 95.8 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB 
Tatoeba (dsb-eng) config: dsb-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 52.81837160751566 - type: f1 value: 48.435977731384824 - type: precision value: 47.11291973845539 - type: recall value: 52.81837160751566 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ber-eng) config: ber-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 44.9 - type: f1 value: 38.88962621607783 - type: precision value: 36.95936507936508 - type: recall value: 44.9 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tam-eng) config: tam-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 90.55374592833876 - type: f1 value: 88.22553125484721 - type: precision value: 87.26927252985884 - type: recall value: 90.55374592833876 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (slk-eng) config: slk-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 94.6 - type: f1 value: 93.13333333333333 - type: precision value: 92.45333333333333 - type: recall value: 94.6 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tgl-eng) config: tgl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 93.7 - type: f1 value: 91.99666666666667 - type: precision value: 91.26666666666668 - type: recall value: 93.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ast-eng) config: ast-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 85.03937007874016 - type: f1 value: 81.75853018372703 - type: precision value: 80.34120734908137 - type: recall value: 85.03937007874016 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (mkd-eng) config: mkd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88.3 - type: f1 value: 85.5 - type: precision value: 84.25833333333334 - type: recall value: 88.3 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (khm-eng) config: khm-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 65.51246537396122 - type: f1 value: 60.02297410192148 - type: precision value: 58.133467727289236 - type: recall value: 65.51246537396122 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ces-eng) config: ces-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96 - type: f1 value: 94.89 - type: precision value: 94.39166666666667 - type: recall value: 96 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tzl-eng) config: tzl-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 57.692307692307686 - type: f1 value: 53.162393162393165 - type: precision value: 51.70673076923077 - type: recall value: 57.692307692307686 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (urd-eng) config: urd-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 91.60000000000001 - type: f1 value: 89.21190476190475 - type: precision value: 88.08666666666667 - type: 
recall value: 91.60000000000001 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (ara-eng) config: ara-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 88 - type: f1 value: 85.47 - type: precision value: 84.43266233766234 - type: recall value: 88 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (kor-eng) config: kor-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 92.7 - type: f1 value: 90.64999999999999 - type: precision value: 89.68333333333332 - type: recall value: 92.7 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (yid-eng) config: yid-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 80.30660377358491 - type: f1 value: 76.33044137466307 - type: precision value: 74.78970125786164 - type: recall value: 80.30660377358491 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (fin-eng) config: fin-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.39999999999999 - type: f1 value: 95.44 - type: precision value: 94.99166666666666 - type: recall value: 96.39999999999999 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (tha-eng) config: tha-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 96.53284671532847 - type: f1 value: 95.37712895377129 - type: precision value: 94.7992700729927 - type: recall value: 96.53284671532847 - task: type: BitextMining dataset: type: mteb/tatoeba-bitext-mining name: MTEB Tatoeba (wuu-eng) config: wuu-eng split: test revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 metrics: - type: accuracy value: 89 - type: f1 value: 86.23190476190476 - type: precision value: 85.035 - type: recall value: 89 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.585 - type: map_at_10 value: 9.012 - type: map_at_100 value: 14.027000000000001 - type: map_at_1000 value: 15.565000000000001 - type: map_at_3 value: 5.032 - type: map_at_5 value: 6.657 - type: mrr_at_1 value: 28.571 - type: mrr_at_10 value: 45.377 - type: mrr_at_100 value: 46.119 - type: mrr_at_1000 value: 46.127 - type: mrr_at_3 value: 41.156 - type: mrr_at_5 value: 42.585 - type: ndcg_at_1 value: 27.551 - type: ndcg_at_10 value: 23.395 - type: ndcg_at_100 value: 33.342 - type: ndcg_at_1000 value: 45.523 - type: ndcg_at_3 value: 25.158 - type: ndcg_at_5 value: 23.427 - type: precision_at_1 value: 28.571 - type: precision_at_10 value: 21.429000000000002 - type: precision_at_100 value: 6.714 - type: precision_at_1000 value: 1.473 - type: precision_at_3 value: 27.211000000000002 - type: precision_at_5 value: 24.490000000000002 - type: recall_at_1 value: 2.585 - type: recall_at_10 value: 15.418999999999999 - type: recall_at_100 value: 42.485 - type: recall_at_1000 value: 79.536 - type: recall_at_3 value: 6.239999999999999 - type: recall_at_5 value: 8.996 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.3234 - type: ap value: 14.361688653847423 - type: f1 value: 54.819068624319044 - task: type: 
Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.97792869269949 - type: f1 value: 62.28965628513728 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 38.90540145385218 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.53513739047506 - type: cos_sim_ap value: 75.27741586677557 - type: cos_sim_f1 value: 69.18792902473774 - type: cos_sim_precision value: 67.94708725515136 - type: cos_sim_recall value: 70.47493403693932 - type: dot_accuracy value: 84.7052512368123 - type: dot_ap value: 69.36075482849378 - type: dot_f1 value: 64.44688376631296 - type: dot_precision value: 59.92288500793831 - type: dot_recall value: 69.70976253298153 - type: euclidean_accuracy value: 86.60666388508076 - type: euclidean_ap value: 75.47512772621097 - type: euclidean_f1 value: 69.413872536473 - type: euclidean_precision value: 67.39562624254472 - type: euclidean_recall value: 71.55672823218997 - type: manhattan_accuracy value: 86.52917684925792 - type: manhattan_ap value: 75.34000110496703 - type: manhattan_f1 value: 69.28489190226429 - type: manhattan_precision value: 67.24608889992551 - type: manhattan_recall value: 71.45118733509234 - type: max_accuracy value: 86.60666388508076 - type: max_ap value: 75.47512772621097 - type: max_f1 value: 69.413872536473 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.01695967710637 - type: cos_sim_ap value: 85.8298270742901 - type: cos_sim_f1 value: 78.46988128389272 - type: cos_sim_precision value: 74.86017897091722 - type: cos_sim_recall value: 82.44533415460425 - type: dot_accuracy value: 88.19420188613343 - type: dot_ap value: 83.82679165901324 - type: dot_f1 value: 76.55833777304208 - type: dot_precision value: 75.6884875846501 - type: dot_recall value: 77.44841392054204 - type: euclidean_accuracy value: 89.03054294252338 - type: euclidean_ap value: 85.89089555185325 - type: euclidean_f1 value: 78.62997658079624 - type: euclidean_precision value: 74.92329149232914 - type: euclidean_recall value: 82.72251308900523 - type: manhattan_accuracy value: 89.0266620095471 - type: manhattan_ap value: 85.86458997929147 - type: manhattan_f1 value: 78.50685331000291 - type: manhattan_precision value: 74.5499861534201 - type: manhattan_recall value: 82.90729904527257 - type: max_accuracy value: 89.03054294252338 - type: max_ap value: 85.89089555185325 - type: max_f1 value: 78.62997658079624 language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - 'no' - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh 
- yi - zh license: mit --- ## Multilingual-E5-large Multilingual E5 Text Embeddings: A Technical Report. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei, arXiv 2024. This model has 24 layers and the embedding size is 1024. ## Usage Below is an example to encode queries and passages from the MS-MARCO passage ranking dataset; a minimal sketch is given after the prefix rules in the FAQ below. ## Supported Languages This model is initialized from xlm-roberta-large and continually trained on a mixture of multilingual datasets. It supports the 100 languages inherited from xlm-roberta, but low-resource languages may see performance degradation. ## Training Details **Initialization**: xlm-roberta-large **First stage**: contrastive pre-training with weak supervision | Dataset | Weak supervision | # of text pairs | |---|---|---| | Filtered mC4 | (title, page content) | 1B | | CC News | (title, news content) | 400M | | NLLB | translation pairs | 2.4B | | Wikipedia | (hierarchical section title, passage) | 150M | | Filtered Reddit | (comment, response) | 800M | | S2ORC | (title, abstract) and citation pairs | 100M | | Stackexchange | (question, answer) | 50M | | xP3 | (input prompt, response) | 80M | | Miscellaneous unsupervised SBERT data | - | 10M | **Second stage**: supervised fine-tuning | Dataset | Language | # of text pairs | |---|---|---| | MS MARCO | English | 500k | | NQ | English | 70k | | Trivia QA | English | 60k | | NLI from SimCSE | English | <300k | | ELI5 | English | 500k | | DuReader Retrieval | Chinese | 86k | | KILT Fever | English | 70k | | KILT HotpotQA | English | 70k | | SQuAD | English | 87k | | Quora | English | 150k | | Mr. TyDi | 11 languages | 50k | | MIRACL | 16 languages | 40k | For all labeled datasets, we only use their training sets for fine-tuning. For other training details, please refer to our paper. ## Benchmark Results on Mr. TyDi | Model | Avg MRR@10 | | ar | bn | en | fi | id | ja | ko | ru | sw | te | th | |---|---|---|---|---|---|---|---|---|---|---|---|---|---| | BM25 | 33.3 | | 36.7 | 41.3 | 15.1 | 28.8 | 38.2 | 21.7 | 28.1 | 32.9 | 39.6 | 42.4 | 41.7 | | mDPR | 16.7 | | 26.0 | 25.8 | 16.2 | 11.3 | 14.6 | 18.1 | 21.9 | 18.5 | 7.3 | 10.6 | 13.5 | | BM25 + mDPR | 41.7 | | 49.1 | 53.5 | 28.4 | 36.5 | 45.5 | 35.5 | 36.2 | 42.7 | 40.5 | 42.0 | 49.2 | | | | | multilingual-e5-small | 64.4 | | 71.5 | 66.3 | 54.5 | 57.7 | 63.2 | 55.4 | 54.3 | 60.8 | 65.4 | 89.1 | 70.1 | | multilingual-e5-base | 65.9 | | 72.3 | 65.0 | 58.5 | 60.8 | 64.9 | 56.6 | 55.8 | 62.7 | 69.0 | 86.6 | 72.7 | | multilingual-e5-large | **70.5** | | 77.5 | 73.2 | 60.8 | 66.8 | 68.5 | 62.5 | 61.6 | 65.8 | 72.7 | 90.2 | 76.2 | ## MTEB Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## Support for Sentence Transformers Below is an example for usage with sentence_transformers; see the note after the encoding sketch in the FAQ below. Package requirements: sentence_transformers. Contributors: michaelfeil ## FAQ **1. Do I need to add the prefix \"query: \" and \"passage: \" to input texts?** Yes, this is how the model is trained; otherwise you will see a performance degradation. Here are some rules of thumb: - Use \"query: \" and \"passage: \" correspondingly for asymmetric tasks such as passage retrieval in open QA and ad-hoc information retrieval. - Use the \"query: \" prefix for symmetric tasks such as semantic similarity, bitext mining, and paraphrase retrieval. - Use the \"query: \" prefix if you want to use embeddings as features, such as for linear probing classification or clustering.
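A minimal encoding sketch to make the prefix rules concrete. This is a reconstruction under stated assumptions, not the authors' exact snippet: it assumes the standard Hugging Face transformers API (AutoTokenizer/AutoModel) and the mean pooling commonly paired with E5 models, and the query/passage strings are purely illustrative.

```python
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Zero out padded positions, then average the remaining token states.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

# Asymmetric retrieval: 'query: ' for the question, 'passage: ' for candidates.
input_texts = [
    'query: how much protein should a female eat',  # illustrative query
    'passage: Protein requirements for adult women are roughly 46 grams per day.',
]

tokenizer = AutoTokenizer.from_pretrained('intfloat/multilingual-e5-large')
model = AutoModel.from_pretrained('intfloat/multilingual-e5-large')

# Inputs longer than 512 tokens are truncated (see Limitations below).
batch = tokenizer(input_texts, max_length=512, padding=True,
                  truncation=True, return_tensors='pt')
outputs = model(**batch)
embeddings = average_pool(outputs.last_hidden_state, batch['attention_mask'])

# L2-normalize so that a plain dot product equals cosine similarity.
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = embeddings[:1] @ embeddings[1:].T
print(scores)  # relative ordering matters more than absolute values (see FAQ 3)
```

With sentence_transformers, the equivalent flow is roughly SentenceTransformer('intfloat/multilingual-e5-large').encode(input_texts, normalize_embeddings=True); the 'query: ' and 'passage: ' prefixes still have to be applied to the raw strings before encoding.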
- Use \"query: \" prefix for symmetric tasks such as semantic similarity, bitext mining, paraphrase retrieval. - Use \"query: \" prefix if you want to use embeddings as features, such as linear probing classification, clustering. **2. Why are my reproduced results slightly different from reported in the model card?** Different versions of and could cause negligible but non-zero performance differences. **3. Why does the cosine similarity scores distribute around 0.7 to 1.0?** This is a known and expected behavior as we use a low temperature 0.01 for InfoNCE contrastive loss. For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores instead of the absolute values, so this should not be an issue. ## Citation If you find our paper or models helpful, please consider cite as follows: ## Limitations Long texts will be truncated to at most 512 tokens.", + "model_explanation_gemini": "A multilingual sentence embedding model designed for tasks like sentence similarity, feature extraction, and classification across multiple languages." +} \ No newline at end of file diff --git a/data/model_data_json/intfloat_multilingual-e5-small.json b/data/model_data_json/intfloat_multilingual-e5-small.json new file mode 100644 index 0000000000000000000000000000000000000000..88f856fcb00d2594491efb3190fb0856679f0ee8 --- /dev/null +++ b/data/model_data_json/intfloat_multilingual-e5-small.json @@ -0,0 +1,121 @@ +{ + "model_id": "intfloat/multilingual-e5-small", + "downloads": 4144123, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "bert", + "mteb", + "Sentence Transformers", + "sentence-similarity", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:2402.05672", + "arxiv:2108.08787", + "arxiv:2104.08663", + "arxiv:2210.07316", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - 'no' - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh license: mit model-index: - name: intfloat/multilingual-e5-small results: - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 73.79104477611939 - type: ap value: 36.9996434842022 - type: f1 value: 
67.95453679103099 task: type: Classification - dataset: config: de name: MTEB AmazonCounterfactualClassification (de) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 71.64882226980728 - type: ap value: 82.11942130026586 - type: f1 value: 69.87963421606715 task: type: Classification - dataset: config: en-ext name: MTEB AmazonCounterfactualClassification (en-ext) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 75.8095952023988 - type: ap value: 24.46869495579561 - type: f1 value: 63.00108480037597 task: type: Classification - dataset: config: ja name: MTEB AmazonCounterfactualClassification (ja) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 64.186295503212 - type: ap value: 15.496804690197042 - type: f1 value: 52.07153895475031 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 88.699325 - type: ap value: 85.27039559917269 - type: f1 value: 88.65556295032513 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 44.69799999999999 - type: f1 value: 43.73187348654165 task: type: Classification - dataset: config: de name: MTEB AmazonReviewsClassification (de) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 40.245999999999995 - type: f1 value: 39.3863530637684 task: type: Classification - dataset: config: es name: MTEB AmazonReviewsClassification (es) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 40.394 - type: f1 value: 39.301223469483446 task: type: Classification - dataset: config: fr name: MTEB AmazonReviewsClassification (fr) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 38.864 - type: f1 value: 37.97974261868003 task: type: Classification - dataset: config: ja name: MTEB AmazonReviewsClassification (ja) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 37.682 - type: f1 value: 37.07399369768313 task: type: Classification - dataset: config: zh name: MTEB AmazonReviewsClassification (zh) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 37.504 - type: f1 value: 36.62317273874278 task: type: Classification - dataset: config: default name: MTEB ArguAna revision: None split: test type: arguana metrics: - type: map_at_1 value: 19.061 - type: map_at_10 value: 31.703 - type: map_at_100 value: 32.967 - type: map_at_1000 value: 33.001000000000005 - type: map_at_3 value: 27.466 - type: map_at_5 value: 29.564 - type: mrr_at_1 value: 19.559 - type: mrr_at_10 value: 31.874999999999996 - type: mrr_at_100 value: 33.146 - type: mrr_at_1000 value: 33.18 - type: mrr_at_3 value: 27.667 - type: mrr_at_5 value: 29.74 - type: ndcg_at_1 value: 19.061 - type: ndcg_at_10 value: 39.062999999999995 - type: ndcg_at_100 value: 45.184000000000005 - type: 
ndcg_at_1000 value: 46.115 - type: ndcg_at_3 value: 30.203000000000003 - type: ndcg_at_5 value: 33.953 - type: precision_at_1 value: 19.061 - type: precision_at_10 value: 6.279999999999999 - type: precision_at_100 value: 0.9129999999999999 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 12.706999999999999 - type: precision_at_5 value: 9.431000000000001 - type: recall_at_1 value: 19.061 - type: recall_at_10 value: 62.802 - type: recall_at_100 value: 91.323 - type: recall_at_1000 value: 98.72 - type: recall_at_3 value: 38.122 - type: recall_at_5 value: 47.155 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: v_measure value: 39.22266660528253 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: v_measure value: 30.79980849482483 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: map value: 57.8790068352054 - type: mrr value: 71.78791276436706 task: type: Reranking - dataset: config: default name: MTEB BIOSSES revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: cos_sim_pearson value: 82.36328364043163 - type: cos_sim_spearman value: 82.26211536195868 - type: euclidean_pearson value: 80.3183865039173 - type: euclidean_spearman value: 79.88495276296132 - type: manhattan_pearson value: 80.14484480692127 - type: manhattan_spearman value: 80.39279565980743 task: type: STS - dataset: config: de-en name: MTEB BUCC (de-en) revision: d51519689f32196a32af33b075a01d0e7c51e252 split: test type: mteb/bucc-bitext-mining metrics: - type: accuracy value: 98.0375782881002 - type: f1 value: 97.86012526096033 - type: precision value: 97.77139874739039 - type: recall value: 98.0375782881002 task: type: BitextMining - dataset: config: fr-en name: MTEB BUCC (fr-en) revision: d51519689f32196a32af33b075a01d0e7c51e252 split: test type: mteb/bucc-bitext-mining metrics: - type: accuracy value: 93.35241030156286 - type: f1 value: 92.66050333846944 - type: precision value: 92.3306919069631 - type: recall value: 93.35241030156286 task: type: BitextMining - dataset: config: ru-en name: MTEB BUCC (ru-en) revision: d51519689f32196a32af33b075a01d0e7c51e252 split: test type: mteb/bucc-bitext-mining metrics: - type: accuracy value: 94.0699688257707 - type: f1 value: 93.50236693222492 - type: precision value: 93.22791825424315 - type: recall value: 94.0699688257707 task: type: BitextMining - dataset: config: zh-en name: MTEB BUCC (zh-en) revision: d51519689f32196a32af33b075a01d0e7c51e252 split: test type: mteb/bucc-bitext-mining metrics: - type: accuracy value: 89.25750394944708 - type: f1 value: 88.79234684921889 - type: precision value: 88.57293312269616 - type: recall value: 89.25750394944708 task: type: BitextMining - dataset: config: default name: MTEB Banking77Classification revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 79.41558441558442 - type: f1 value: 79.25886487487219 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p 
metrics: - type: v_measure value: 35.747820820329736 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: v_measure value: 27.045143830596146 task: type: Clustering - dataset: config: default name: MTEB CQADupstackRetrieval revision: None split: test type: BeIR/cqadupstack metrics: - type: map_at_1 value: 24.252999999999997 - type: map_at_10 value: 31.655916666666666 - type: map_at_100 value: 32.680749999999996 - type: map_at_1000 value: 32.79483333333334 - type: map_at_3 value: 29.43691666666666 - type: map_at_5 value: 30.717416666666665 - type: mrr_at_1 value: 28.602750000000004 - type: mrr_at_10 value: 35.56875 - type: mrr_at_100 value: 36.3595 - type: mrr_at_1000 value: 36.427749999999996 - type: mrr_at_3 value: 33.586166666666664 - type: mrr_at_5 value: 34.73641666666666 - type: ndcg_at_1 value: 28.602750000000004 - type: ndcg_at_10 value: 36.06933333333334 - type: ndcg_at_100 value: 40.70141666666667 - type: ndcg_at_1000 value: 43.24341666666667 - type: ndcg_at_3 value: 32.307916666666664 - type: ndcg_at_5 value: 34.129999999999995 - type: precision_at_1 value: 28.602750000000004 - type: precision_at_10 value: 6.097666666666667 - type: precision_at_100 value: 0.9809166666666668 - type: precision_at_1000 value: 0.13766666666666663 - type: precision_at_3 value: 14.628166666666667 - type: precision_at_5 value: 10.266916666666667 - type: recall_at_1 value: 24.252999999999997 - type: recall_at_10 value: 45.31916666666667 - type: recall_at_100 value: 66.03575000000001 - type: recall_at_1000 value: 83.94708333333334 - type: recall_at_3 value: 34.71941666666666 - type: recall_at_5 value: 39.46358333333333 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER revision: None split: test type: climate-fever metrics: - type: map_at_1 value: 9.024000000000001 - type: map_at_10 value: 15.644 - type: map_at_100 value: 17.154 - type: map_at_1000 value: 17.345 - type: map_at_3 value: 13.028 - type: map_at_5 value: 14.251 - type: mrr_at_1 value: 19.674 - type: mrr_at_10 value: 29.826999999999998 - type: mrr_at_100 value: 30.935000000000002 - type: mrr_at_1000 value: 30.987 - type: mrr_at_3 value: 26.645000000000003 - type: mrr_at_5 value: 28.29 - type: ndcg_at_1 value: 19.674 - type: ndcg_at_10 value: 22.545 - type: ndcg_at_100 value: 29.207 - type: ndcg_at_1000 value: 32.912 - type: ndcg_at_3 value: 17.952 - type: ndcg_at_5 value: 19.363 - type: precision_at_1 value: 19.674 - type: precision_at_10 value: 7.212000000000001 - type: precision_at_100 value: 1.435 - type: precision_at_1000 value: 0.212 - type: precision_at_3 value: 13.507 - type: precision_at_5 value: 10.397 - type: recall_at_1 value: 9.024000000000001 - type: recall_at_10 value: 28.077999999999996 - type: recall_at_100 value: 51.403 - type: recall_at_1000 value: 72.406 - type: recall_at_3 value: 16.768 - type: recall_at_5 value: 20.737 task: type: Retrieval - dataset: config: default name: MTEB DBPedia revision: None split: test type: dbpedia-entity metrics: - type: map_at_1 value: 8.012 - type: map_at_10 value: 17.138 - type: map_at_100 value: 24.146 - type: map_at_1000 value: 25.622 - type: map_at_3 value: 12.552 - type: map_at_5 value: 14.435 - type: mrr_at_1 value: 62.25000000000001 - type: mrr_at_10 value: 71.186 - type: mrr_at_100 value: 71.504 - type: mrr_at_1000 value: 71.514 - type: mrr_at_3 value: 69.333 - type: mrr_at_5 value: 70.408 - type: ndcg_at_1 value: 49.75 - 
type: ndcg_at_10 value: 37.76 - type: ndcg_at_100 value: 42.071 - type: ndcg_at_1000 value: 49.309 - type: ndcg_at_3 value: 41.644 - type: ndcg_at_5 value: 39.812999999999995 - type: precision_at_1 value: 62.25000000000001 - type: precision_at_10 value: 30.15 - type: precision_at_100 value: 9.753 - type: precision_at_1000 value: 1.9189999999999998 - type: precision_at_3 value: 45.667 - type: precision_at_5 value: 39.15 - type: recall_at_1 value: 8.012 - type: recall_at_10 value: 22.599 - type: recall_at_100 value: 48.068 - type: recall_at_1000 value: 71.328 - type: recall_at_3 value: 14.043 - type: recall_at_5 value: 17.124 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 42.455 - type: f1 value: 37.59462649781862 task: type: Classification - dataset: config: default name: MTEB FEVER revision: None split: test type: fever metrics: - type: map_at_1 value: 58.092 - type: map_at_10 value: 69.586 - type: map_at_100 value: 69.968 - type: map_at_1000 value: 69.982 - type: map_at_3 value: 67.48100000000001 - type: map_at_5 value: 68.915 - type: mrr_at_1 value: 62.166 - type: mrr_at_10 value: 73.588 - type: mrr_at_100 value: 73.86399999999999 - type: mrr_at_1000 value: 73.868 - type: mrr_at_3 value: 71.6 - type: mrr_at_5 value: 72.99 - type: ndcg_at_1 value: 62.166 - type: ndcg_at_10 value: 75.27199999999999 - type: ndcg_at_100 value: 76.816 - type: ndcg_at_1000 value: 77.09700000000001 - type: ndcg_at_3 value: 71.36 - type: ndcg_at_5 value: 73.785 - type: precision_at_1 value: 62.166 - type: precision_at_10 value: 9.716 - type: precision_at_100 value: 1.065 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 28.278 - type: precision_at_5 value: 18.343999999999998 - type: recall_at_1 value: 58.092 - type: recall_at_10 value: 88.73400000000001 - type: recall_at_100 value: 95.195 - type: recall_at_1000 value: 97.04599999999999 - type: recall_at_3 value: 78.45 - type: recall_at_5 value: 84.316 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 revision: None split: test type: fiqa metrics: - type: map_at_1 value: 16.649 - type: map_at_10 value: 26.457000000000004 - type: map_at_100 value: 28.169 - type: map_at_1000 value: 28.352 - type: map_at_3 value: 23.305 - type: map_at_5 value: 25.169000000000004 - type: mrr_at_1 value: 32.407000000000004 - type: mrr_at_10 value: 40.922 - type: mrr_at_100 value: 41.931000000000004 - type: mrr_at_1000 value: 41.983 - type: mrr_at_3 value: 38.786 - type: mrr_at_5 value: 40.205999999999996 - type: ndcg_at_1 value: 32.407000000000004 - type: ndcg_at_10 value: 33.314 - type: ndcg_at_100 value: 40.312 - type: ndcg_at_1000 value: 43.685 - type: ndcg_at_3 value: 30.391000000000002 - type: ndcg_at_5 value: 31.525 - type: precision_at_1 value: 32.407000000000004 - type: precision_at_10 value: 8.966000000000001 - type: precision_at_100 value: 1.6019999999999999 - type: precision_at_1000 value: 0.22200000000000003 - type: precision_at_3 value: 20.165 - type: precision_at_5 value: 14.722 - type: recall_at_1 value: 16.649 - type: recall_at_10 value: 39.117000000000004 - type: recall_at_100 value: 65.726 - type: recall_at_1000 value: 85.784 - type: recall_at_3 value: 27.914 - type: recall_at_5 value: 33.289 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA revision: None split: test type: hotpotqa metrics: - type: map_at_1 value: 36.253 - type: map_at_10 value: 56.16799999999999 
- type: map_at_100 value: 57.06099999999999 - type: map_at_1000 value: 57.126 - type: map_at_3 value: 52.644999999999996 - type: map_at_5 value: 54.909 - type: mrr_at_1 value: 72.505 - type: mrr_at_10 value: 79.66 - type: mrr_at_100 value: 79.869 - type: mrr_at_1000 value: 79.88 - type: mrr_at_3 value: 78.411 - type: mrr_at_5 value: 79.19800000000001 - type: ndcg_at_1 value: 72.505 - type: ndcg_at_10 value: 65.094 - type: ndcg_at_100 value: 68.219 - type: ndcg_at_1000 value: 69.515 - type: ndcg_at_3 value: 59.99 - type: ndcg_at_5 value: 62.909000000000006 - type: precision_at_1 value: 72.505 - type: precision_at_10 value: 13.749 - type: precision_at_100 value: 1.619 - type: precision_at_1000 value: 0.179 - type: precision_at_3 value: 38.357 - type: precision_at_5 value: 25.313000000000002 - type: recall_at_1 value: 36.253 - type: recall_at_10 value: 68.744 - type: recall_at_100 value: 80.925 - type: recall_at_1000 value: 89.534 - type: recall_at_3 value: 57.535000000000004 - type: recall_at_5 value: 63.282000000000004 task: type: Retrieval - dataset: config: default name: MTEB ImdbClassification revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 80.82239999999999 - type: ap value: 75.65895781725314 - type: f1 value: 80.75880969095746 task: type: Classification - dataset: config: default name: MTEB MSMARCO revision: None split: dev type: msmarco metrics: - type: map_at_1 value: 21.624 - type: map_at_10 value: 34.075 - type: map_at_100 value: 35.229 - type: map_at_1000 value: 35.276999999999994 - type: map_at_3 value: 30.245 - type: map_at_5 value: 32.42 - type: mrr_at_1 value: 22.264 - type: mrr_at_10 value: 34.638000000000005 - type: mrr_at_100 value: 35.744 - type: mrr_at_1000 value: 35.787 - type: mrr_at_3 value: 30.891000000000002 - type: mrr_at_5 value: 33.042 - type: ndcg_at_1 value: 22.264 - type: ndcg_at_10 value: 40.991 - type: ndcg_at_100 value: 46.563 - type: ndcg_at_1000 value: 47.743 - type: ndcg_at_3 value: 33.198 - type: ndcg_at_5 value: 37.069 - type: precision_at_1 value: 22.264 - type: precision_at_10 value: 6.5089999999999995 - type: precision_at_100 value: 0.9299999999999999 - type: precision_at_1000 value: 0.10300000000000001 - type: precision_at_3 value: 14.216999999999999 - type: precision_at_5 value: 10.487 - type: recall_at_1 value: 21.624 - type: recall_at_10 value: 62.303 - type: recall_at_100 value: 88.124 - type: recall_at_1000 value: 97.08 - type: recall_at_3 value: 41.099999999999994 - type: recall_at_5 value: 50.381 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 91.06703146374831 - type: f1 value: 90.86867815863172 task: type: Classification - dataset: config: de name: MTEB MTOPDomainClassification (de) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 87.46970977740209 - type: f1 value: 86.36832872036588 task: type: Classification - dataset: config: es name: MTEB MTOPDomainClassification (es) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 89.26951300867245 - type: f1 value: 88.93561193959502 task: type: Classification - dataset: config: fr name: MTEB MTOPDomainClassification (fr) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 
84.22799874725963 - type: f1 value: 84.30490069236556 task: type: Classification - dataset: config: hi name: MTEB MTOPDomainClassification (hi) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 86.02007888131948 - type: f1 value: 85.39376041027991 task: type: Classification - dataset: config: th name: MTEB MTOPDomainClassification (th) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 85.34900542495481 - type: f1 value: 85.39859673336713 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 71.078431372549 - type: f1 value: 53.45071102002276 task: type: Classification - dataset: config: de name: MTEB MTOPIntentClassification (de) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 65.85798816568047 - type: f1 value: 46.53112748993529 task: type: Classification - dataset: config: es name: MTEB MTOPIntentClassification (es) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 67.96864576384256 - type: f1 value: 45.966703022829506 task: type: Classification - dataset: config: fr name: MTEB MTOPIntentClassification (fr) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 61.31537738803633 - type: f1 value: 45.52601712835461 task: type: Classification - dataset: config: hi name: MTEB MTOPIntentClassification (hi) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 66.29616349946218 - type: f1 value: 47.24166485726613 task: type: Classification - dataset: config: th name: MTEB MTOPIntentClassification (th) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 67.51537070524412 - type: f1 value: 49.463476319014276 task: type: Classification - dataset: config: af name: MTEB MassiveIntentClassification (af) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.06792199058508 - type: f1 value: 54.094921857502285 task: type: Classification - dataset: config: am name: MTEB MassiveIntentClassification (am) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 51.960322797579025 - type: f1 value: 48.547371223370945 task: type: Classification - dataset: config: ar name: MTEB MassiveIntentClassification (ar) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 54.425016812373904 - type: f1 value: 50.47069202054312 task: type: Classification - dataset: config: az name: MTEB MassiveIntentClassification (az) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 59.798251513113655 - type: f1 value: 57.05013069086648 task: type: Classification - dataset: config: bn name: MTEB MassiveIntentClassification (bn) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 59.37794216543376 - type: f1 value: 56.3607992649805 task: type: 
Classification - dataset: config: cy name: MTEB MassiveIntentClassification (cy) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 46.56018829858777 - type: f1 value: 43.87319715715134 task: type: Classification - dataset: config: da name: MTEB MassiveIntentClassification (da) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 62.9724277067922 - type: f1 value: 59.36480066245562 task: type: Classification - dataset: config: de name: MTEB MassiveIntentClassification (de) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 62.72696704774715 - type: f1 value: 59.143595966615855 task: type: Classification - dataset: config: el name: MTEB MassiveIntentClassification (el) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 61.5971755211836 - type: f1 value: 59.169445724946726 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 70.29589778076665 - type: f1 value: 67.7577001808977 task: type: Classification - dataset: config: es name: MTEB MassiveIntentClassification (es) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 66.31136516476126 - type: f1 value: 64.52032955983242 task: type: Classification - dataset: config: fa name: MTEB MassiveIntentClassification (fa) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 65.54472091459314 - type: f1 value: 61.47903120066317 task: type: Classification - dataset: config: fi name: MTEB MassiveIntentClassification (fi) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 61.45595158036314 - type: f1 value: 58.0891846024637 task: type: Classification - dataset: config: fr name: MTEB MassiveIntentClassification (fr) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 65.47074646940149 - type: f1 value: 62.84830858877575 task: type: Classification - dataset: config: he name: MTEB MassiveIntentClassification (he) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 58.046402151983855 - type: f1 value: 55.269074430533195 task: type: Classification - dataset: config: hi name: MTEB MassiveIntentClassification (hi) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 64.06523201075991 - type: f1 value: 61.35339643021369 task: type: Classification - dataset: config: hu name: MTEB MassiveIntentClassification (hu) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 60.954942837928726 - type: f1 value: 57.07035922704846 task: type: Classification - dataset: config: hy name: MTEB MassiveIntentClassification (hy) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.404169468728995 - type: f1 
value: 53.94259011839138 task: type: Classification - dataset: config: id name: MTEB MassiveIntentClassification (id) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 64.16610625420309 - type: f1 value: 61.337103431499365 task: type: Classification - dataset: config: is name: MTEB MassiveIntentClassification (is) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 52.262945527908535 - type: f1 value: 49.7610691598921 task: type: Classification - dataset: config: it name: MTEB MassiveIntentClassification (it) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 65.54472091459314 - type: f1 value: 63.469099018440154 task: type: Classification - dataset: config: ja name: MTEB MassiveIntentClassification (ja) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 68.22797579018157 - type: f1 value: 64.89098471083001 task: type: Classification - dataset: config: jv name: MTEB MassiveIntentClassification (jv) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 50.847343644922674 - type: f1 value: 47.8536963168393 task: type: Classification - dataset: config: ka name: MTEB MassiveIntentClassification (ka) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 48.45326160053799 - type: f1 value: 46.370078045805556 task: type: Classification - dataset: config: km name: MTEB MassiveIntentClassification (km) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 42.83120376597175 - type: f1 value: 39.68948521599982 task: type: Classification - dataset: config: kn name: MTEB MassiveIntentClassification (kn) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.5084061869536 - type: f1 value: 53.961876160401545 task: type: Classification - dataset: config: ko name: MTEB MassiveIntentClassification (ko) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 63.7895090786819 - type: f1 value: 61.134223684676 task: type: Classification - dataset: config: lv name: MTEB MassiveIntentClassification (lv) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 54.98991257565569 - type: f1 value: 52.579862862826296 task: type: Classification - dataset: config: ml name: MTEB MassiveIntentClassification (ml) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 61.90316072629456 - type: f1 value: 58.203024538290336 task: type: Classification - dataset: config: mn name: MTEB MassiveIntentClassification (mn) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.09818426361802 - type: f1 value: 54.22718458445455 task: type: Classification - dataset: config: ms name: MTEB MassiveIntentClassification (ms) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy 
value: 58.991257565568255 - type: f1 value: 55.84892781767421 task: type: Classification - dataset: config: my name: MTEB MassiveIntentClassification (my) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 55.901143241425686 - type: f1 value: 52.25264332199797 task: type: Classification - dataset: config: nb name: MTEB MassiveIntentClassification (nb) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 61.96368527236047 - type: f1 value: 58.927243876153454 task: type: Classification - dataset: config: nl name: MTEB MassiveIntentClassification (nl) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 65.64223268325489 - type: f1 value: 62.340453718379706 task: type: Classification - dataset: config: pl name: MTEB MassiveIntentClassification (pl) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 64.52589105581708 - type: f1 value: 61.661113187022174 task: type: Classification - dataset: config: pt name: MTEB MassiveIntentClassification (pt) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 66.84599865501009 - type: f1 value: 64.59342572873005 task: type: Classification - dataset: config: ro name: MTEB MassiveIntentClassification (ro) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 60.81035642232684 - type: f1 value: 57.5169089806797 task: type: Classification - dataset: config: ru name: MTEB MassiveIntentClassification (ru) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 58.652238071815056 - type: f1 value: 53.22732406426353 - type: f1_weighted value: 57.585586737209546 - type: main_score value: 58.652238071815056 task: type: Classification - dataset: config: sl name: MTEB MassiveIntentClassification (sl) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 56.51647612642906 - type: f1 value: 54.33154780100043 task: type: Classification - dataset: config: sq name: MTEB MassiveIntentClassification (sq) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.985877605917956 - type: f1 value: 54.46187524463802 task: type: Classification - dataset: config: sv name: MTEB MassiveIntentClassification (sv) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 65.03026227303296 - type: f1 value: 62.34377392877748 task: type: Classification - dataset: config: sw name: MTEB MassiveIntentClassification (sw) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 53.567585743106925 - type: f1 value: 50.73770655983206 task: type: Classification - dataset: config: ta name: MTEB MassiveIntentClassification (ta) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.2595830531271 - type: f1 value: 53.657327291708626 task: type: Classification - dataset: config: te name: MTEB 
MassiveIntentClassification (te) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.82784129119032 - type: f1 value: 54.82518072665301 task: type: Classification - dataset: config: th name: MTEB MassiveIntentClassification (th) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 64.06859448554137 - type: f1 value: 63.00185280500495 task: type: Classification - dataset: config: tl name: MTEB MassiveIntentClassification (tl) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 58.91055817081371 - type: f1 value: 55.54116301224262 task: type: Classification - dataset: config: tr name: MTEB MassiveIntentClassification (tr) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 63.54404841963686 - type: f1 value: 59.57650946030184 task: type: Classification - dataset: config: ur name: MTEB MassiveIntentClassification (ur) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 59.27706792199059 - type: f1 value: 56.50010066083435 task: type: Classification - dataset: config: vi name: MTEB MassiveIntentClassification (vi) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 64.0719569603228 - type: f1 value: 61.817075925647956 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveIntentClassification (zh-CN) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 68.23806321452591 - type: f1 value: 65.24917026029749 task: type: Classification - dataset: config: zh-TW name: MTEB MassiveIntentClassification (zh-TW) revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 62.53530598520511 - type: f1 value: 61.71131132295768 task: type: Classification - dataset: config: af name: MTEB MassiveScenarioClassification (af) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 63.04303967720243 - type: f1 value: 60.3950085685985 task: type: Classification - dataset: config: am name: MTEB MassiveScenarioClassification (am) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 56.83591123066578 - type: f1 value: 54.95059828830849 task: type: Classification - dataset: config: ar name: MTEB MassiveScenarioClassification (ar) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 59.62340282447881 - type: f1 value: 59.525159996498225 task: type: Classification - dataset: config: az name: MTEB MassiveScenarioClassification (az) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 60.85406859448555 - type: f1 value: 59.129299095681276 task: type: Classification - dataset: config: bn name: MTEB MassiveScenarioClassification (bn) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 62.76731674512441 - type: f1 value: 
61.159560612627715 task: type: Classification - dataset: config: cy name: MTEB MassiveScenarioClassification (cy) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 50.181573638197705 - type: f1 value: 46.98422176289957 task: type: Classification - dataset: config: da name: MTEB MassiveScenarioClassification (da) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 68.92737054472092 - type: f1 value: 67.69135611952979 task: type: Classification - dataset: config: de name: MTEB MassiveScenarioClassification (de) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 69.18964357767318 - type: f1 value: 68.46106138186214 task: type: Classification - dataset: config: el name: MTEB MassiveScenarioClassification (el) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 67.0712844653665 - type: f1 value: 66.75545422473901 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 74.4754539340955 - type: f1 value: 74.38427146553252 task: type: Classification - dataset: config: es name: MTEB MassiveScenarioClassification (es) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 69.82515131136518 - type: f1 value: 69.63516462173847 task: type: Classification - dataset: config: fa name: MTEB MassiveScenarioClassification (fa) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 68.70880968392737 - type: f1 value: 67.45420662567926 task: type: Classification - dataset: config: fi name: MTEB MassiveScenarioClassification (fi) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 65.95494283792871 - type: f1 value: 65.06191009049222 task: type: Classification - dataset: config: fr name: MTEB MassiveScenarioClassification (fr) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 68.75924680564896 - type: f1 value: 68.30833379585945 task: type: Classification - dataset: config: he name: MTEB MassiveScenarioClassification (he) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 63.806321452589096 - type: f1 value: 63.273048243765054 task: type: Classification - dataset: config: hi name: MTEB MassiveScenarioClassification (hi) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 67.68997982515133 - type: f1 value: 66.54703855381324 task: type: Classification - dataset: config: hu name: MTEB MassiveScenarioClassification (hu) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 66.46940147948891 - type: f1 value: 65.91017343463396 task: type: Classification - dataset: config: hy name: MTEB MassiveScenarioClassification (hy) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: 
mteb/amazon_massive_scenario metrics: - type: accuracy value: 59.49899125756556 - type: f1 value: 57.90333469917769 task: type: Classification - dataset: config: id name: MTEB MassiveScenarioClassification (id) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 67.9219905850706 - type: f1 value: 67.23169403762938 task: type: Classification - dataset: config: is name: MTEB MassiveScenarioClassification (is) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 56.486213853396094 - type: f1 value: 54.85282355583758 task: type: Classification - dataset: config: it name: MTEB MassiveScenarioClassification (it) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 69.04169468728985 - type: f1 value: 68.83833333320462 task: type: Classification - dataset: config: ja name: MTEB MassiveScenarioClassification (ja) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 73.88702084734365 - type: f1 value: 74.04474735232299 task: type: Classification - dataset: config: jv name: MTEB MassiveScenarioClassification (jv) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 56.63416274377943 - type: f1 value: 55.11332211687954 task: type: Classification - dataset: config: ka name: MTEB MassiveScenarioClassification (ka) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 52.23604572965702 - type: f1 value: 50.86529813991055 task: type: Classification - dataset: config: km name: MTEB MassiveScenarioClassification (km) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 46.62407531943511 - type: f1 value: 43.63485467164535 task: type: Classification - dataset: config: kn name: MTEB MassiveScenarioClassification (kn) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 59.15601882985878 - type: f1 value: 57.522837510959924 task: type: Classification - dataset: config: ko name: MTEB MassiveScenarioClassification (ko) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 69.84532616005382 - type: f1 value: 69.60021127179697 task: type: Classification - dataset: config: lv name: MTEB MassiveScenarioClassification (lv) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 56.65770006724949 - type: f1 value: 55.84219135523227 task: type: Classification - dataset: config: ml name: MTEB MassiveScenarioClassification (ml) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 66.53665097511768 - type: f1 value: 65.09087787792639 task: type: Classification - dataset: config: mn name: MTEB MassiveScenarioClassification (mn) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 59.31405514458642 - type: f1 value: 58.06135303831491 task: type: Classification - dataset: config: ms name: MTEB 
MassiveScenarioClassification (ms) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 64.88231338264964 - type: f1 value: 62.751099407787926 task: type: Classification - dataset: config: my name: MTEB MassiveScenarioClassification (my) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 58.86012104909213 - type: f1 value: 56.29118323058282 task: type: Classification - dataset: config: nb name: MTEB MassiveScenarioClassification (nb) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 67.37390719569602 - type: f1 value: 66.27922244885102 task: type: Classification - dataset: config: nl name: MTEB MassiveScenarioClassification (nl) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 70.8675184936113 - type: f1 value: 70.22146529932019 task: type: Classification - dataset: config: pl name: MTEB MassiveScenarioClassification (pl) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 68.2212508406187 - type: f1 value: 67.77454802056282 task: type: Classification - dataset: config: pt name: MTEB MassiveScenarioClassification (pt) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 68.18090114324143 - type: f1 value: 68.03737625431621 task: type: Classification - dataset: config: ro name: MTEB MassiveScenarioClassification (ro) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 64.65030262273034 - type: f1 value: 63.792945486912856 task: type: Classification - dataset: config: ru name: MTEB MassiveScenarioClassification (ru) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 63.772749631087066 - type: f1 value: 63.4539101720024 - type: f1_weighted value: 62.778603897469566 - type: main_score value: 63.772749631087066 task: type: Classification - dataset: config: sl name: MTEB MassiveScenarioClassification (sl) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 60.17821116341627 - type: f1 value: 59.3935969827171 task: type: Classification - dataset: config: sq name: MTEB MassiveScenarioClassification (sq) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 62.86146603900471 - type: f1 value: 60.133692735032376 task: type: Classification - dataset: config: sv name: MTEB MassiveScenarioClassification (sv) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 70.89441829186282 - type: f1 value: 70.03064076194089 task: type: Classification - dataset: config: sw name: MTEB MassiveScenarioClassification (sw) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 58.15063887020847 - type: f1 value: 56.23326278499678 task: type: Classification - dataset: config: ta name: MTEB MassiveScenarioClassification (ta) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: 
mteb/amazon_massive_scenario metrics: - type: accuracy value: 59.43846671149966 - type: f1 value: 57.70440450281974 task: type: Classification - dataset: config: te name: MTEB MassiveScenarioClassification (te) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 60.8507061197041 - type: f1 value: 59.22916396061171 task: type: Classification - dataset: config: th name: MTEB MassiveScenarioClassification (th) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 70.65568258238063 - type: f1 value: 69.90736239440633 task: type: Classification - dataset: config: tl name: MTEB MassiveScenarioClassification (tl) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 60.8843308675185 - type: f1 value: 59.30332663713599 task: type: Classification - dataset: config: tr name: MTEB MassiveScenarioClassification (tr) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 68.05312710154674 - type: f1 value: 67.44024062594775 task: type: Classification - dataset: config: ur name: MTEB MassiveScenarioClassification (ur) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 62.111634162743776 - type: f1 value: 60.89083013084519 task: type: Classification - dataset: config: vi name: MTEB MassiveScenarioClassification (vi) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 67.44115669132482 - type: f1 value: 67.92227541674552 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveScenarioClassification (zh-CN) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 74.4687289845326 - type: f1 value: 74.16376793486025 task: type: Classification - dataset: config: zh-TW name: MTEB MassiveScenarioClassification (zh-TW) revision: 7d571f92784cd94a019292a1f45445077d0ef634 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 68.31876260928043 - type: f1 value: 68.5246745215607 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: v_measure value: 30.90431696479766 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: v_measure value: 27.259158476693774 task: type: Clustering - dataset: config: default name: MTEB MindSmallReranking revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 split: test type: mteb/mind_small metrics: - type: map value: 30.28445330838555 - type: mrr value: 31.15758529581164 task: type: Reranking - dataset: config: default name: MTEB NFCorpus revision: None split: test type: nfcorpus metrics: - type: map_at_1 value: 5.353 - type: map_at_10 value: 11.565 - type: map_at_100 value: 14.097000000000001 - type: map_at_1000 value: 15.354999999999999 - type: map_at_3 value: 8.749 - type: map_at_5 value: 9.974 - type: mrr_at_1 value: 42.105 - type: mrr_at_10 value: 50.589 - type: mrr_at_100 value: 51.187000000000005 - type: mrr_at_1000 value: 51.233 
- type: mrr_at_3 value: 48.246 - type: mrr_at_5 value: 49.546 - type: ndcg_at_1 value: 40.402 - type: ndcg_at_10 value: 31.009999999999998 - type: ndcg_at_100 value: 28.026 - type: ndcg_at_1000 value: 36.905 - type: ndcg_at_3 value: 35.983 - type: ndcg_at_5 value: 33.764 - type: precision_at_1 value: 42.105 - type: precision_at_10 value: 22.786 - type: precision_at_100 value: 6.916 - type: precision_at_1000 value: 1.981 - type: precision_at_3 value: 33.333 - type: precision_at_5 value: 28.731 - type: recall_at_1 value: 5.353 - type: recall_at_10 value: 15.039 - type: recall_at_100 value: 27.348 - type: recall_at_1000 value: 59.453 - type: recall_at_3 value: 9.792 - type: recall_at_5 value: 11.882 task: type: Retrieval - dataset: config: default name: MTEB NQ revision: None split: test type: nq metrics: - type: map_at_1 value: 33.852 - type: map_at_10 value: 48.924 - type: map_at_100 value: 49.854 - type: map_at_1000 value: 49.886 - type: map_at_3 value: 44.9 - type: map_at_5 value: 47.387 - type: mrr_at_1 value: 38.035999999999994 - type: mrr_at_10 value: 51.644 - type: mrr_at_100 value: 52.339 - type: mrr_at_1000 value: 52.35999999999999 - type: mrr_at_3 value: 48.421 - type: mrr_at_5 value: 50.468999999999994 - type: ndcg_at_1 value: 38.007000000000005 - type: ndcg_at_10 value: 56.293000000000006 - type: ndcg_at_100 value: 60.167 - type: ndcg_at_1000 value: 60.916000000000004 - type: ndcg_at_3 value: 48.903999999999996 - type: ndcg_at_5 value: 52.978 - type: precision_at_1 value: 38.007000000000005 - type: precision_at_10 value: 9.041 - type: precision_at_100 value: 1.1199999999999999 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 22.084 - type: precision_at_5 value: 15.608 - type: recall_at_1 value: 33.852 - type: recall_at_10 value: 75.893 - type: recall_at_100 value: 92.589 - type: recall_at_1000 value: 98.153 - type: recall_at_3 value: 56.969 - type: recall_at_5 value: 66.283 task: type: Retrieval - dataset: config: default name: MTEB QuoraRetrieval revision: None split: test type: quora metrics: - type: map_at_1 value: 69.174 - type: map_at_10 value: 82.891 - type: map_at_100 value: 83.545 - type: map_at_1000 value: 83.56700000000001 - type: map_at_3 value: 79.944 - type: map_at_5 value: 81.812 - type: mrr_at_1 value: 79.67999999999999 - type: mrr_at_10 value: 86.279 - type: mrr_at_100 value: 86.39 - type: mrr_at_1000 value: 86.392 - type: mrr_at_3 value: 85.21 - type: mrr_at_5 value: 85.92999999999999 - type: ndcg_at_1 value: 79.69000000000001 - type: ndcg_at_10 value: 86.929 - type: ndcg_at_100 value: 88.266 - type: ndcg_at_1000 value: 88.428 - type: ndcg_at_3 value: 83.899 - type: ndcg_at_5 value: 85.56700000000001 - type: precision_at_1 value: 79.69000000000001 - type: precision_at_10 value: 13.161000000000001 - type: precision_at_100 value: 1.513 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 36.603 - type: precision_at_5 value: 24.138 - type: recall_at_1 value: 69.174 - type: recall_at_10 value: 94.529 - type: recall_at_100 value: 99.15 - type: recall_at_1000 value: 99.925 - type: recall_at_3 value: 85.86200000000001 - type: recall_at_5 value: 90.501 task: type: Retrieval - dataset: config: default name: MTEB RedditClustering revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: v_measure value: 39.13064340585255 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P revision: 282350215ef01743dc01b456c7f5241fa8937f16 split: test type: 
mteb/reddit-clustering-p2p metrics: - type: v_measure value: 58.97884249325877 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS revision: None split: test type: scidocs metrics: - type: map_at_1 value: 3.4680000000000004 - type: map_at_10 value: 7.865 - type: map_at_100 value: 9.332 - type: map_at_1000 value: 9.587 - type: map_at_3 value: 5.800000000000001 - type: map_at_5 value: 6.8790000000000004 - type: mrr_at_1 value: 17.0 - type: mrr_at_10 value: 25.629 - type: mrr_at_100 value: 26.806 - type: mrr_at_1000 value: 26.889000000000003 - type: mrr_at_3 value: 22.8 - type: mrr_at_5 value: 24.26 - type: ndcg_at_1 value: 17.0 - type: ndcg_at_10 value: 13.895 - type: ndcg_at_100 value: 20.491999999999997 - type: ndcg_at_1000 value: 25.759999999999998 - type: ndcg_at_3 value: 13.347999999999999 - type: ndcg_at_5 value: 11.61 - type: precision_at_1 value: 17.0 - type: precision_at_10 value: 7.090000000000001 - type: precision_at_100 value: 1.669 - type: precision_at_1000 value: 0.294 - type: precision_at_3 value: 12.3 - type: precision_at_5 value: 10.02 - type: recall_at_1 value: 3.4680000000000004 - type: recall_at_10 value: 14.363000000000001 - type: recall_at_100 value: 33.875 - type: recall_at_1000 value: 59.711999999999996 - type: recall_at_3 value: 7.483 - type: recall_at_5 value: 10.173 task: type: Retrieval - dataset: config: default name: MTEB SICK-R revision: a6ea5a8cab320b040a23452cc28066d9beae2cee split: test type: mteb/sickr-sts metrics: - type: cos_sim_pearson value: 83.04084311714061 - type: cos_sim_spearman value: 77.51342467443078 - type: euclidean_pearson value: 80.0321166028479 - type: euclidean_spearman value: 77.29249114733226 - type: manhattan_pearson value: 80.03105964262431 - type: manhattan_spearman value: 77.22373689514794 task: type: STS - dataset: config: default name: MTEB STS12 revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: cos_sim_pearson value: 84.1680158034387 - type: cos_sim_spearman value: 76.55983344071117 - type: euclidean_pearson value: 79.75266678300143 - type: euclidean_spearman value: 75.34516823467025 - type: manhattan_pearson value: 79.75959151517357 - type: manhattan_spearman value: 75.42330344141912 task: type: STS - dataset: config: default name: MTEB STS13 revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: cos_sim_pearson value: 76.48898993209346 - type: cos_sim_spearman value: 76.96954120323366 - type: euclidean_pearson value: 76.94139109279668 - type: euclidean_spearman value: 76.85860283201711 - type: manhattan_pearson value: 76.6944095091912 - type: manhattan_spearman value: 76.61096912972553 task: type: STS - dataset: config: default name: MTEB STS14 revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: cos_sim_pearson value: 77.85082366246944 - type: cos_sim_spearman value: 75.52053350101731 - type: euclidean_pearson value: 77.1165845070926 - type: euclidean_spearman value: 75.31216065884388 - type: manhattan_pearson value: 77.06193941833494 - type: manhattan_spearman value: 75.31003701700112 task: type: STS - dataset: config: default name: MTEB STS15 revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: cos_sim_pearson value: 86.36305246526497 - type: cos_sim_spearman value: 87.11704613927415 - type: euclidean_pearson value: 86.04199125810939 - type: euclidean_spearman value: 86.51117572414263 - type: manhattan_pearson value: 
86.0805106816633 - type: manhattan_spearman value: 86.52798366512229 task: type: STS - dataset: config: default name: MTEB STS16 revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: cos_sim_pearson value: 82.18536255599724 - type: cos_sim_spearman value: 83.63377151025418 - type: euclidean_pearson value: 83.24657467993141 - type: euclidean_spearman value: 84.02751481993825 - type: manhattan_pearson value: 83.11941806582371 - type: manhattan_spearman value: 83.84251281019304 task: type: STS - dataset: config: ko-ko name: MTEB STS17 (ko-ko) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 78.95816528475514 - type: cos_sim_spearman value: 78.86607380120462 - type: euclidean_pearson value: 78.51268699230545 - type: euclidean_spearman value: 79.11649316502229 - type: manhattan_pearson value: 78.32367302808157 - type: manhattan_spearman value: 78.90277699624637 task: type: STS - dataset: config: ar-ar name: MTEB STS17 (ar-ar) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 72.89126914997624 - type: cos_sim_spearman value: 73.0296921832678 - type: euclidean_pearson value: 71.50385903677738 - type: euclidean_spearman value: 73.13368899716289 - type: manhattan_pearson value: 71.47421463379519 - type: manhattan_spearman value: 73.03383242946575 task: type: STS - dataset: config: en-ar name: MTEB STS17 (en-ar) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 59.22923684492637 - type: cos_sim_spearman value: 57.41013211368396 - type: euclidean_pearson value: 61.21107388080905 - type: euclidean_spearman value: 60.07620768697254 - type: manhattan_pearson value: 59.60157142786555 - type: manhattan_spearman value: 59.14069604103739 task: type: STS - dataset: config: en-de name: MTEB STS17 (en-de) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 76.24345978774299 - type: cos_sim_spearman value: 77.24225743830719 - type: euclidean_pearson value: 76.66226095469165 - type: euclidean_spearman value: 77.60708820493146 - type: manhattan_pearson value: 76.05303324760429 - type: manhattan_spearman value: 76.96353149912348 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 85.50879160160852 - type: cos_sim_spearman value: 86.43594662965224 - type: euclidean_pearson value: 86.06846012826577 - type: euclidean_spearman value: 86.02041395794136 - type: manhattan_pearson value: 86.10916255616904 - type: manhattan_spearman value: 86.07346068198953 task: type: STS - dataset: config: en-tr name: MTEB STS17 (en-tr) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 58.39803698977196 - type: cos_sim_spearman value: 55.96910950423142 - type: euclidean_pearson value: 58.17941175613059 - type: euclidean_spearman value: 55.03019330522745 - type: manhattan_pearson value: 57.333358138183286 - type: manhattan_spearman value: 54.04614023149965 task: type: STS - dataset: config: es-en name: MTEB STS17 (es-en) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts 
metrics: - type: cos_sim_pearson value: 70.98304089637197 - type: cos_sim_spearman value: 72.44071656215888 - type: euclidean_pearson value: 72.19224359033983 - type: euclidean_spearman value: 73.89871188913025 - type: manhattan_pearson value: 71.21098311547406 - type: manhattan_spearman value: 72.93405764824821 task: type: STS - dataset: config: es-es name: MTEB STS17 (es-es) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 85.99792397466308 - type: cos_sim_spearman value: 84.83824377879495 - type: euclidean_pearson value: 85.70043288694438 - type: euclidean_spearman value: 84.70627558703686 - type: manhattan_pearson value: 85.89570850150801 - type: manhattan_spearman value: 84.95806105313007 task: type: STS - dataset: config: fr-en name: MTEB STS17 (fr-en) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 72.21850322994712 - type: cos_sim_spearman value: 72.28669398117248 - type: euclidean_pearson value: 73.40082510412948 - type: euclidean_spearman value: 73.0326539281865 - type: manhattan_pearson value: 71.8659633964841 - type: manhattan_spearman value: 71.57817425823303 task: type: STS - dataset: config: it-en name: MTEB STS17 (it-en) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 75.80921368595645 - type: cos_sim_spearman value: 77.33209091229315 - type: euclidean_pearson value: 76.53159540154829 - type: euclidean_spearman value: 78.17960842810093 - type: manhattan_pearson value: 76.13530186637601 - type: manhattan_spearman value: 78.00701437666875 task: type: STS - dataset: config: nl-en name: MTEB STS17 (nl-en) revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d split: test type: mteb/sts17-crosslingual-sts metrics: - type: cos_sim_pearson value: 74.74980608267349 - type: cos_sim_spearman value: 75.37597374318821 - type: euclidean_pearson value: 74.90506081911661 - type: euclidean_spearman value: 75.30151613124521 - type: manhattan_pearson value: 74.62642745918002 - type: manhattan_spearman value: 75.18619716592303 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 59.632662289205584 - type: cos_sim_spearman value: 60.938543391610914 - type: euclidean_pearson value: 62.113200529767056 - type: euclidean_spearman value: 61.410312633261164 - type: manhattan_pearson value: 61.75494698945686 - type: manhattan_spearman value: 60.92726195322362 task: type: STS - dataset: config: de name: MTEB STS22 (de) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 45.283470551557244 - type: cos_sim_spearman value: 53.44833015864201 - type: euclidean_pearson value: 41.17892011120893 - type: euclidean_spearman value: 53.81441383126767 - type: manhattan_pearson value: 41.17482200420659 - type: manhattan_spearman value: 53.82180269276363 task: type: STS - dataset: config: es name: MTEB STS22 (es) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 60.5069165306236 - type: cos_sim_spearman value: 66.87803259033826 - type: euclidean_pearson value: 63.5428979418236 - type: euclidean_spearman value: 66.9293576586897 - type: 
manhattan_pearson value: 63.59789526178922 - type: manhattan_spearman value: 66.86555009875066 task: type: STS - dataset: config: pl name: MTEB STS22 (pl) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 28.23026196280264 - type: cos_sim_spearman value: 35.79397812652861 - type: euclidean_pearson value: 17.828102102767353 - type: euclidean_spearman value: 35.721501145568894 - type: manhattan_pearson value: 17.77134274219677 - type: manhattan_spearman value: 35.98107902846267 task: type: STS - dataset: config: tr name: MTEB STS22 (tr) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 56.51946541393812 - type: cos_sim_spearman value: 63.714686006214485 - type: euclidean_pearson value: 58.32104651305898 - type: euclidean_spearman value: 62.237110895702216 - type: manhattan_pearson value: 58.579416468759185 - type: manhattan_spearman value: 62.459738981727 task: type: STS - dataset: config: ar name: MTEB STS22 (ar) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 48.76009839569795 - type: cos_sim_spearman value: 56.65188431953149 - type: euclidean_pearson value: 50.997682160915595 - type: euclidean_spearman value: 55.99910008818135 - type: manhattan_pearson value: 50.76220659606342 - type: manhattan_spearman value: 55.517347595391456 task: type: STS - dataset: config: ru name: MTEB STS22 (ru) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 50.724322379215934 - type: cosine_spearman value: 59.90449732164651 - type: euclidean_pearson value: 50.227545226784024 - type: euclidean_spearman value: 59.898906527601085 - type: main_score value: 59.90449732164651 - type: manhattan_pearson value: 50.21762139819405 - type: manhattan_spearman value: 59.761039813759 - type: pearson value: 50.724322379215934 - type: spearman value: 59.90449732164651 task: type: STS - dataset: config: zh name: MTEB STS22 (zh) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 54.717524559088005 - type: cos_sim_spearman value: 66.83570886252286 - type: euclidean_pearson value: 58.41338625505467 - type: euclidean_spearman value: 66.68991427704938 - type: manhattan_pearson value: 58.78638572916807 - type: manhattan_spearman value: 66.58684161046335 task: type: STS - dataset: config: fr name: MTEB STS22 (fr) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 73.2962042954962 - type: cos_sim_spearman value: 76.58255504852025 - type: euclidean_pearson value: 75.70983192778257 - type: euclidean_spearman value: 77.4547684870542 - type: manhattan_pearson value: 75.75565853870485 - type: manhattan_spearman value: 76.90208974949428 task: type: STS - dataset: config: de-en name: MTEB STS22 (de-en) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 54.47396266924846 - type: cos_sim_spearman value: 56.492267162048606 - type: euclidean_pearson value: 55.998505203070195 - type: euclidean_spearman value: 56.46447012960222 - type: manhattan_pearson value: 54.873172394430995 - type: manhattan_spearman value: 56.58111534551218 task: type: STS - 
dataset: config: es-en name: MTEB STS22 (es-en) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 69.87177267688686 - type: cos_sim_spearman value: 74.57160943395763 - type: euclidean_pearson value: 70.88330406826788 - type: euclidean_spearman value: 74.29767636038422 - type: manhattan_pearson value: 71.38245248369536 - type: manhattan_spearman value: 74.53102232732175 task: type: STS - dataset: config: it name: MTEB STS22 (it) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 72.80225656959544 - type: cos_sim_spearman value: 76.52646173725735 - type: euclidean_pearson value: 73.95710720200799 - type: euclidean_spearman value: 76.54040031984111 - type: manhattan_pearson value: 73.89679971946774 - type: manhattan_spearman value: 76.60886958161574 task: type: STS - dataset: config: pl-en name: MTEB STS22 (pl-en) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 70.70844249898789 - type: cos_sim_spearman value: 72.68571783670241 - type: euclidean_pearson value: 72.38800772441031 - type: euclidean_spearman value: 72.86804422703312 - type: manhattan_pearson value: 71.29840508203515 - type: manhattan_spearman value: 71.86264441749513 task: type: STS - dataset: config: zh-en name: MTEB STS22 (zh-en) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 58.647478923935694 - type: cos_sim_spearman value: 63.74453623540931 - type: euclidean_pearson value: 59.60138032437505 - type: euclidean_spearman value: 63.947930832166065 - type: manhattan_pearson value: 58.59735509491861 - type: manhattan_spearman value: 62.082503844627404 task: type: STS - dataset: config: es-it name: MTEB STS22 (es-it) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 65.8722516867162 - type: cos_sim_spearman value: 71.81208592523012 - type: euclidean_pearson value: 67.95315252165956 - type: euclidean_spearman value: 73.00749822046009 - type: manhattan_pearson value: 68.07884688638924 - type: manhattan_spearman value: 72.34210325803069 task: type: STS - dataset: config: de-fr name: MTEB STS22 (de-fr) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 54.5405814240949 - type: cos_sim_spearman value: 60.56838649023775 - type: euclidean_pearson value: 53.011731611314104 - type: euclidean_spearman value: 58.533194841668426 - type: manhattan_pearson value: 53.623067729338494 - type: manhattan_spearman value: 58.018756154446926 task: type: STS - dataset: config: de-pl name: MTEB STS22 (de-pl) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 13.611046866216112 - type: cos_sim_spearman value: 28.238192909158492 - type: euclidean_pearson value: 22.16189199885129 - type: euclidean_spearman value: 35.012895679076564 - type: manhattan_pearson value: 21.969771178698387 - type: manhattan_spearman value: 32.456985088607475 task: type: STS - dataset: config: fr-pl name: MTEB STS22 (fr-pl) revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cos_sim_pearson value: 74.58077407011655 - type: 
cos_sim_spearman value: 84.51542547285167 - type: euclidean_pearson value: 74.64613843596234 - type: euclidean_spearman value: 84.51542547285167 - type: manhattan_pearson value: 75.15335973101396 - type: manhattan_spearman value: 84.51542547285167 task: type: STS - dataset: config: default name: MTEB STSBenchmark revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: cos_sim_pearson value: 82.0739825531578 - type: cos_sim_spearman value: 84.01057479311115 - type: euclidean_pearson value: 83.85453227433344 - type: euclidean_spearman value: 84.01630226898655 - type: manhattan_pearson value: 83.75323603028978 - type: manhattan_spearman value: 83.89677983727685 task: type: STS - dataset: config: default name: MTEB SciDocsRR revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: map value: 78.12945623123957 - type: mrr value: 93.87738713719106 task: type: Reranking - dataset: config: default name: MTEB SciFact revision: None split: test type: scifact metrics: - type: map_at_1 value: 52.983000000000004 - type: map_at_10 value: 62.946000000000005 - type: map_at_100 value: 63.514 - type: map_at_1000 value: 63.554 - type: map_at_3 value: 60.183 - type: map_at_5 value: 61.672000000000004 - type: mrr_at_1 value: 55.667 - type: mrr_at_10 value: 64.522 - type: mrr_at_100 value: 64.957 - type: mrr_at_1000 value: 64.995 - type: mrr_at_3 value: 62.388999999999996 - type: mrr_at_5 value: 63.639 - type: ndcg_at_1 value: 55.667 - type: ndcg_at_10 value: 67.704 - type: ndcg_at_100 value: 70.299 - type: ndcg_at_1000 value: 71.241 - type: ndcg_at_3 value: 62.866 - type: ndcg_at_5 value: 65.16999999999999 - type: precision_at_1 value: 55.667 - type: precision_at_10 value: 9.033 - type: precision_at_100 value: 1.053 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 24.444 - type: precision_at_5 value: 16.133 - type: recall_at_1 value: 52.983000000000004 - type: recall_at_10 value: 80.656 - type: recall_at_100 value: 92.5 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 67.744 - type: recall_at_5 value: 73.433 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: cos_sim_accuracy value: 99.72772277227723 - type: cos_sim_ap value: 92.17845897992215 - type: cos_sim_f1 value: 85.9746835443038 - type: cos_sim_precision value: 87.07692307692308 - type: cos_sim_recall value: 84.89999999999999 - type: dot_accuracy value: 99.3039603960396 - type: dot_ap value: 60.70244020124878 - type: dot_f1 value: 59.92742353551063 - type: dot_precision value: 62.21743810548978 - type: dot_recall value: 57.8 - type: euclidean_accuracy value: 99.71683168316832 - type: euclidean_ap value: 91.53997039964659 - type: euclidean_f1 value: 84.88372093023257 - type: euclidean_precision value: 90.02242152466367 - type: euclidean_recall value: 80.30000000000001 - type: manhattan_accuracy value: 99.72376237623763 - type: manhattan_ap value: 91.80756777790289 - type: manhattan_f1 value: 85.48468106479157 - type: manhattan_precision value: 85.8728557013118 - type: manhattan_recall value: 85.1 - type: max_accuracy value: 99.72772277227723 - type: max_ap value: 92.17845897992215 - type: max_f1 value: 85.9746835443038 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering revision: 
6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: v_measure value: 53.52464042600003 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: v_measure value: 32.071631948736 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: map value: 49.19552407604654 - type: mrr value: 49.95269130379425 task: type: Reranking - dataset: config: default name: MTEB SummEval revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: cos_sim_pearson value: 29.345293033095427 - type: cos_sim_spearman value: 29.976931423258403 - type: dot_pearson value: 27.047078008958408 - type: dot_spearman value: 27.75894368380218 task: type: Summarization - dataset: config: default name: MTEB TRECCOVID revision: None split: test type: trec-covid metrics: - type: map_at_1 value: 0.22 - type: map_at_10 value: 1.706 - type: map_at_100 value: 9.634 - type: map_at_1000 value: 23.665 - type: map_at_3 value: 0.5950000000000001 - type: map_at_5 value: 0.95 - type: mrr_at_1 value: 86.0 - type: mrr_at_10 value: 91.8 - type: mrr_at_100 value: 91.8 - type: mrr_at_1000 value: 91.8 - type: mrr_at_3 value: 91.0 - type: mrr_at_5 value: 91.8 - type: ndcg_at_1 value: 80.0 - type: ndcg_at_10 value: 72.573 - type: ndcg_at_100 value: 53.954 - type: ndcg_at_1000 value: 47.760999999999996 - type: ndcg_at_3 value: 76.173 - type: ndcg_at_5 value: 75.264 - type: precision_at_1 value: 86.0 - type: precision_at_10 value: 76.4 - type: precision_at_100 value: 55.50000000000001 - type: precision_at_1000 value: 21.802 - type: precision_at_3 value: 81.333 - type: precision_at_5 value: 80.4 - type: recall_at_1 value: 0.22 - type: recall_at_10 value: 1.925 - type: recall_at_100 value: 12.762 - type: recall_at_1000 value: 44.946000000000005 - type: recall_at_3 value: 0.634 - type: recall_at_5 value: 1.051 task: type: Retrieval - dataset: config: sqi-eng name: MTEB Tatoeba (sqi-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 91.0 - type: f1 value: 88.55666666666666 - type: precision value: 87.46166666666667 - type: recall value: 91.0 task: type: BitextMining - dataset: config: fry-eng name: MTEB Tatoeba (fry-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 57.22543352601156 - type: f1 value: 51.03220478943021 - type: precision value: 48.8150289017341 - type: recall value: 57.22543352601156 task: type: BitextMining - dataset: config: kur-eng name: MTEB Tatoeba (kur-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 46.58536585365854 - type: f1 value: 39.66870798578116 - type: precision value: 37.416085946573745 - type: recall value: 46.58536585365854 task: type: BitextMining - dataset: config: tur-eng name: MTEB Tatoeba (tur-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 89.7 - type: f1 value: 86.77999999999999 - type: precision value: 85.45333333333332 - type: recall value: 89.7 task: type: BitextMining - dataset: 
config: deu-eng name: MTEB Tatoeba (deu-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 97.39999999999999 - type: f1 value: 96.58333333333331 - type: precision value: 96.2 - type: recall value: 97.39999999999999 task: type: BitextMining - dataset: config: nld-eng name: MTEB Tatoeba (nld-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 92.4 - type: f1 value: 90.3 - type: precision value: 89.31666666666668 - type: recall value: 92.4 task: type: BitextMining - dataset: config: ron-eng name: MTEB Tatoeba (ron-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 86.9 - type: f1 value: 83.67190476190476 - type: precision value: 82.23333333333332 - type: recall value: 86.9 task: type: BitextMining - dataset: config: ang-eng name: MTEB Tatoeba (ang-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 50.0 - type: f1 value: 42.23229092632078 - type: precision value: 39.851634683724235 - type: recall value: 50.0 task: type: BitextMining - dataset: config: ido-eng name: MTEB Tatoeba (ido-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 76.3 - type: f1 value: 70.86190476190477 - type: precision value: 68.68777777777777 - type: recall value: 76.3 task: type: BitextMining - dataset: config: jav-eng name: MTEB Tatoeba (jav-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 57.073170731707314 - type: f1 value: 50.658958927251604 - type: precision value: 48.26480836236933 - type: recall value: 57.073170731707314 task: type: BitextMining - dataset: config: isl-eng name: MTEB Tatoeba (isl-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 68.2 - type: f1 value: 62.156507936507936 - type: precision value: 59.84964285714286 - type: recall value: 68.2 task: type: BitextMining - dataset: config: slv-eng name: MTEB Tatoeba (slv-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 77.52126366950182 - type: f1 value: 72.8496210148701 - type: precision value: 70.92171498003819 - type: recall value: 77.52126366950182 task: type: BitextMining - dataset: config: cym-eng name: MTEB Tatoeba (cym-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 70.78260869565217 - type: f1 value: 65.32422360248447 - type: precision value: 63.063067367415194 - type: recall value: 70.78260869565217 task: type: BitextMining - dataset: config: kaz-eng name: MTEB Tatoeba (kaz-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 78.43478260869566 - type: f1 value: 73.02608695652172 - type: precision value: 70.63768115942028 - type: recall value: 78.43478260869566 task: type: BitextMining - dataset: config: est-eng name: MTEB Tatoeba (est-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 60.9 - type: f1 value: 55.309753694581275 - type: precision 
value: 53.130476190476195 - type: recall value: 60.9 task: type: BitextMining - dataset: config: heb-eng name: MTEB Tatoeba (heb-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 72.89999999999999 - type: f1 value: 67.92023809523809 - type: precision value: 65.82595238095237 - type: recall value: 72.89999999999999 task: type: BitextMining - dataset: config: gla-eng name: MTEB Tatoeba (gla-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 46.80337756332931 - type: f1 value: 39.42174900558496 - type: precision value: 36.97101116280851 - type: recall value: 46.80337756332931 task: type: BitextMining - dataset: config: mar-eng name: MTEB Tatoeba (mar-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 89.8 - type: f1 value: 86.79 - type: precision value: 85.375 - type: recall value: 89.8 task: type: BitextMining - dataset: config: lat-eng name: MTEB Tatoeba (lat-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 47.199999999999996 - type: f1 value: 39.95484348984349 - type: precision value: 37.561071428571424 - type: recall value: 47.199999999999996 task: type: BitextMining - dataset: config: bel-eng name: MTEB Tatoeba (bel-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 87.8 - type: f1 value: 84.68190476190475 - type: precision value: 83.275 - type: recall value: 87.8 task: type: BitextMining - dataset: config: pms-eng name: MTEB Tatoeba (pms-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 48.76190476190476 - type: f1 value: 42.14965986394558 - type: precision value: 39.96743626743626 - type: recall value: 48.76190476190476 task: type: BitextMining - dataset: config: gle-eng name: MTEB Tatoeba (gle-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 66.10000000000001 - type: f1 value: 59.58580086580086 - type: precision value: 57.150238095238095 - type: recall value: 66.10000000000001 task: type: BitextMining - dataset: config: pes-eng name: MTEB Tatoeba (pes-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 87.3 - type: f1 value: 84.0 - type: precision value: 82.48666666666666 - type: recall value: 87.3 task: type: BitextMining - dataset: config: nob-eng name: MTEB Tatoeba (nob-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 90.4 - type: f1 value: 87.79523809523809 - type: precision value: 86.6 - type: recall value: 90.4 task: type: BitextMining - dataset: config: bul-eng name: MTEB Tatoeba (bul-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 87.0 - type: f1 value: 83.81 - type: precision value: 82.36666666666666 - type: recall value: 87.0 task: type: BitextMining - dataset: config: cbk-eng name: MTEB Tatoeba (cbk-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 63.9 - type: f1 
value: 57.76533189033189 - type: precision value: 55.50595238095239 - type: recall value: 63.9 task: type: BitextMining - dataset: config: hun-eng name: MTEB Tatoeba (hun-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 76.1 - type: f1 value: 71.83690476190478 - type: precision value: 70.04928571428573 - type: recall value: 76.1 task: type: BitextMining - dataset: config: uig-eng name: MTEB Tatoeba (uig-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 66.3 - type: f1 value: 59.32626984126984 - type: precision value: 56.62535714285713 - type: recall value: 66.3 task: type: BitextMining - dataset: config: rus-eng name: MTEB Tatoeba (rus-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 92.10000000000001 - type: f1 value: 89.76666666666667 - type: main_score value: 89.76666666666667 - type: precision value: 88.64999999999999 - type: recall value: 92.10000000000001 task: type: BitextMining - dataset: config: spa-eng name: MTEB Tatoeba (spa-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 93.10000000000001 - type: f1 value: 91.10000000000001 - type: precision value: 90.16666666666666 - type: recall value: 93.10000000000001 task: type: BitextMining - dataset: config: hye-eng name: MTEB Tatoeba (hye-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 85.71428571428571 - type: f1 value: 82.29142600436403 - type: precision value: 80.8076626877166 - type: recall value: 85.71428571428571 task: type: BitextMining - dataset: config: tel-eng name: MTEB Tatoeba (tel-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 88.88888888888889 - type: f1 value: 85.7834757834758 - type: precision value: 84.43732193732193 - type: recall value: 88.88888888888889 task: type: BitextMining - dataset: config: afr-eng name: MTEB Tatoeba (afr-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 88.5 - type: f1 value: 85.67190476190476 - type: precision value: 84.43333333333332 - type: recall value: 88.5 task: type: BitextMining - dataset: config: mon-eng name: MTEB Tatoeba (mon-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 82.72727272727273 - type: f1 value: 78.21969696969695 - type: precision value: 76.18181818181819 - type: recall value: 82.72727272727273 task: type: BitextMining - dataset: config: arz-eng name: MTEB Tatoeba (arz-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 61.0062893081761 - type: f1 value: 55.13976240391334 - type: precision value: 52.92112499659669 - type: recall value: 61.0062893081761 task: type: BitextMining - dataset: config: hrv-eng name: MTEB Tatoeba (hrv-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 89.5 - type: f1 value: 86.86666666666666 - type: precision value: 85.69166666666668 - type: recall value: 89.5 task: type: BitextMining - dataset: config: nov-eng 
name: MTEB Tatoeba (nov-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 73.54085603112841 - type: f1 value: 68.56031128404669 - type: precision value: 66.53047989623866 - type: recall value: 73.54085603112841 task: type: BitextMining - dataset: config: gsw-eng name: MTEB Tatoeba (gsw-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 43.58974358974359 - type: f1 value: 36.45299145299145 - type: precision value: 33.81155881155882 - type: recall value: 43.58974358974359 task: type: BitextMining - dataset: config: nds-eng name: MTEB Tatoeba (nds-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 59.599999999999994 - type: f1 value: 53.264689754689755 - type: precision value: 50.869166666666665 - type: recall value: 59.599999999999994 task: type: BitextMining - dataset: config: ukr-eng name: MTEB Tatoeba (ukr-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 85.2 - type: f1 value: 81.61666666666665 - type: precision value: 80.02833333333335 - type: recall value: 85.2 task: type: BitextMining - dataset: config: uzb-eng name: MTEB Tatoeba (uzb-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 63.78504672897196 - type: f1 value: 58.00029669188548 - type: precision value: 55.815809968847354 - type: recall value: 63.78504672897196 task: type: BitextMining - dataset: config: lit-eng name: MTEB Tatoeba (lit-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 66.5 - type: f1 value: 61.518333333333345 - type: precision value: 59.622363699102834 - type: recall value: 66.5 task: type: BitextMining - dataset: config: ina-eng name: MTEB Tatoeba (ina-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 88.6 - type: f1 value: 85.60222222222221 - type: precision value: 84.27916666666665 - type: recall value: 88.6 task: type: BitextMining - dataset: config: lfn-eng name: MTEB Tatoeba (lfn-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 58.699999999999996 - type: f1 value: 52.732375957375965 - type: precision value: 50.63214035964035 - type: recall value: 58.699999999999996 task: type: BitextMining - dataset: config: zsm-eng name: MTEB Tatoeba (zsm-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 92.10000000000001 - type: f1 value: 89.99666666666667 - type: precision value: 89.03333333333333 - type: recall value: 92.10000000000001 task: type: BitextMining - dataset: config: ita-eng name: MTEB Tatoeba (ita-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 90.10000000000001 - type: f1 value: 87.55666666666667 - type: precision value: 86.36166666666668 - type: recall value: 90.10000000000001 task: type: BitextMining - dataset: config: cmn-eng name: MTEB Tatoeba (cmn-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy 
value: 91.4 - type: f1 value: 88.89000000000001 - type: precision value: 87.71166666666666 - type: recall value: 91.4 task: type: BitextMining - dataset: config: lvs-eng name: MTEB Tatoeba (lvs-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 65.7 - type: f1 value: 60.67427750410509 - type: precision value: 58.71785714285714 - type: recall value: 65.7 task: type: BitextMining - dataset: config: glg-eng name: MTEB Tatoeba (glg-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 85.39999999999999 - type: f1 value: 81.93190476190475 - type: precision value: 80.37833333333333 - type: recall value: 85.39999999999999 task: type: BitextMining - dataset: config: ceb-eng name: MTEB Tatoeba (ceb-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 47.833333333333336 - type: f1 value: 42.006625781625786 - type: precision value: 40.077380952380956 - type: recall value: 47.833333333333336 task: type: BitextMining - dataset: config: bre-eng name: MTEB Tatoeba (bre-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 10.4 - type: f1 value: 8.24465007215007 - type: precision value: 7.664597069597071 - type: recall value: 10.4 task: type: BitextMining - dataset: config: ben-eng name: MTEB Tatoeba (ben-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 82.6 - type: f1 value: 77.76333333333334 - type: precision value: 75.57833333333332 - type: recall value: 82.6 task: type: BitextMining - dataset: config: swg-eng name: MTEB Tatoeba (swg-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 52.67857142857143 - type: f1 value: 44.302721088435376 - type: precision value: 41.49801587301587 - type: recall value: 52.67857142857143 task: type: BitextMining - dataset: config: arq-eng name: MTEB Tatoeba (arq-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 28.3205268935236 - type: f1 value: 22.426666605171157 - type: precision value: 20.685900116470915 - type: recall value: 28.3205268935236 task: type: BitextMining - dataset: config: kab-eng name: MTEB Tatoeba (kab-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 22.7 - type: f1 value: 17.833970473970474 - type: precision value: 16.407335164835164 - type: recall value: 22.7 task: type: BitextMining - dataset: config: fra-eng name: MTEB Tatoeba (fra-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 92.2 - type: f1 value: 89.92999999999999 - type: precision value: 88.87 - type: recall value: 92.2 task: type: BitextMining - dataset: config: por-eng name: MTEB Tatoeba (por-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 91.4 - type: f1 value: 89.25 - type: precision value: 88.21666666666667 - type: recall value: 91.4 task: type: BitextMining - dataset: config: tat-eng name: MTEB Tatoeba (tat-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: 
test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 69.19999999999999 - type: f1 value: 63.38269841269841 - type: precision value: 61.14773809523809 - type: recall value: 69.19999999999999 task: type: BitextMining - dataset: config: oci-eng name: MTEB Tatoeba (oci-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 48.8 - type: f1 value: 42.839915639915645 - type: precision value: 40.770287114845935 - type: recall value: 48.8 task: type: BitextMining - dataset: config: pol-eng name: MTEB Tatoeba (pol-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 88.8 - type: f1 value: 85.90666666666668 - type: precision value: 84.54166666666666 - type: recall value: 88.8 task: type: BitextMining - dataset: config: war-eng name: MTEB Tatoeba (war-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 46.6 - type: f1 value: 40.85892920804686 - type: precision value: 38.838223114604695 - type: recall value: 46.6 task: type: BitextMining - dataset: config: aze-eng name: MTEB Tatoeba (aze-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 84.0 - type: f1 value: 80.14190476190475 - type: precision value: 78.45333333333333 - type: recall value: 84.0 task: type: BitextMining - dataset: config: vie-eng name: MTEB Tatoeba (vie-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 90.5 - type: f1 value: 87.78333333333333 - type: precision value: 86.5 - type: recall value: 90.5 task: type: BitextMining - dataset: config: nno-eng name: MTEB Tatoeba (nno-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 74.5 - type: f1 value: 69.48397546897547 - type: precision value: 67.51869047619049 - type: recall value: 74.5 task: type: BitextMining - dataset: config: cha-eng name: MTEB Tatoeba (cha-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 32.846715328467155 - type: f1 value: 27.828177499710343 - type: precision value: 26.63451511991658 - type: recall value: 32.846715328467155 task: type: BitextMining - dataset: config: mhr-eng name: MTEB Tatoeba (mhr-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 8.0 - type: f1 value: 6.07664116764988 - type: precision value: 5.544177607179943 - type: recall value: 8.0 task: type: BitextMining - dataset: config: dan-eng name: MTEB Tatoeba (dan-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 87.6 - type: f1 value: 84.38555555555554 - type: precision value: 82.91583333333334 - type: recall value: 87.6 task: type: BitextMining - dataset: config: ell-eng name: MTEB Tatoeba (ell-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 87.5 - type: f1 value: 84.08333333333331 - type: precision value: 82.47333333333333 - type: recall value: 87.5 task: type: BitextMining - dataset: config: amh-eng name: MTEB Tatoeba (amh-eng) revision: 
9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 80.95238095238095 - type: f1 value: 76.13095238095238 - type: precision value: 74.05753968253967 - type: recall value: 80.95238095238095 task: type: BitextMining - dataset: config: pam-eng name: MTEB Tatoeba (pam-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 8.799999999999999 - type: f1 value: 6.971422975172975 - type: precision value: 6.557814916172301 - type: recall value: 8.799999999999999 task: type: BitextMining - dataset: config: hsb-eng name: MTEB Tatoeba (hsb-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 44.099378881987576 - type: f1 value: 37.01649742022413 - type: precision value: 34.69420618488942 - type: recall value: 44.099378881987576 task: type: BitextMining - dataset: config: srp-eng name: MTEB Tatoeba (srp-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 84.3 - type: f1 value: 80.32666666666667 - type: precision value: 78.60666666666665 - type: recall value: 84.3 task: type: BitextMining - dataset: config: epo-eng name: MTEB Tatoeba (epo-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 92.5 - type: f1 value: 90.49666666666666 - type: precision value: 89.56666666666668 - type: recall value: 92.5 task: type: BitextMining - dataset: config: kzj-eng name: MTEB Tatoeba (kzj-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 10.0 - type: f1 value: 8.268423529875141 - type: precision value: 7.878118605532398 - type: recall value: 10.0 task: type: BitextMining - dataset: config: awa-eng name: MTEB Tatoeba (awa-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 79.22077922077922 - type: f1 value: 74.27128427128426 - type: precision value: 72.28715728715729 - type: recall value: 79.22077922077922 task: type: BitextMining - dataset: config: fao-eng name: MTEB Tatoeba (fao-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 65.64885496183206 - type: f1 value: 58.87495456197747 - type: precision value: 55.992366412213734 - type: recall value: 65.64885496183206 task: type: BitextMining - dataset: config: mal-eng name: MTEB Tatoeba (mal-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 96.06986899563319 - type: f1 value: 94.78408539543909 - type: precision value: 94.15332362930616 - type: recall value: 96.06986899563319 task: type: BitextMining - dataset: config: ile-eng name: MTEB Tatoeba (ile-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 77.2 - type: f1 value: 71.72571428571428 - type: precision value: 69.41000000000001 - type: recall value: 77.2 task: type: BitextMining - dataset: config: bos-eng name: MTEB Tatoeba (bos-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 86.4406779661017 - type: f1 value: 83.2391713747646 - type: 
precision value: 81.74199623352166 - type: recall value: 86.4406779661017 task: type: BitextMining - dataset: config: cor-eng name: MTEB Tatoeba (cor-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 8.4 - type: f1 value: 6.017828743398003 - type: precision value: 5.4829865484756795 - type: recall value: 8.4 task: type: BitextMining - dataset: config: cat-eng name: MTEB Tatoeba (cat-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 83.5 - type: f1 value: 79.74833333333333 - type: precision value: 78.04837662337664 - type: recall value: 83.5 task: type: BitextMining - dataset: config: eus-eng name: MTEB Tatoeba (eus-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 60.4 - type: f1 value: 54.467301587301584 - type: precision value: 52.23242424242424 - type: recall value: 60.4 task: type: BitextMining - dataset: config: yue-eng name: MTEB Tatoeba (yue-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 74.9 - type: f1 value: 69.68699134199134 - type: precision value: 67.59873015873016 - type: recall value: 74.9 task: type: BitextMining - dataset: config: swe-eng name: MTEB Tatoeba (swe-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 88.0 - type: f1 value: 84.9652380952381 - type: precision value: 83.66166666666666 - type: recall value: 88.0 task: type: BitextMining - dataset: config: dtp-eng name: MTEB Tatoeba (dtp-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 9.1 - type: f1 value: 7.681244588744588 - type: precision value: 7.370043290043291 - type: recall value: 9.1 task: type: BitextMining - dataset: config: kat-eng name: MTEB Tatoeba (kat-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 80.9651474530831 - type: f1 value: 76.84220605132133 - type: precision value: 75.19606398962966 - type: recall value: 80.9651474530831 task: type: BitextMining - dataset: config: jpn-eng name: MTEB Tatoeba (jpn-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 86.9 - type: f1 value: 83.705 - type: precision value: 82.3120634920635 - type: recall value: 86.9 task: type: BitextMining - dataset: config: csb-eng name: MTEB Tatoeba (csb-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 29.64426877470356 - type: f1 value: 23.98763072676116 - type: precision value: 22.506399397703746 - type: recall value: 29.64426877470356 task: type: BitextMining - dataset: config: xho-eng name: MTEB Tatoeba (xho-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 70.4225352112676 - type: f1 value: 62.84037558685445 - type: precision value: 59.56572769953053 - type: recall value: 70.4225352112676 task: type: BitextMining - dataset: config: orv-eng name: MTEB Tatoeba (orv-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 
19.64071856287425 - type: f1 value: 15.125271011207756 - type: precision value: 13.865019261197494 - type: recall value: 19.64071856287425 task: type: BitextMining - dataset: config: ind-eng name: MTEB Tatoeba (ind-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 90.2 - type: f1 value: 87.80666666666666 - type: precision value: 86.70833333333331 - type: recall value: 90.2 task: type: BitextMining - dataset: config: tuk-eng name: MTEB Tatoeba (tuk-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 23.15270935960591 - type: f1 value: 18.407224958949097 - type: precision value: 16.982385430661292 - type: recall value: 23.15270935960591 task: type: BitextMining - dataset: config: max-eng name: MTEB Tatoeba (max-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 55.98591549295775 - type: f1 value: 49.94718309859154 - type: precision value: 47.77864154624717 - type: recall value: 55.98591549295775 task: type: BitextMining - dataset: config: swh-eng name: MTEB Tatoeba (swh-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 73.07692307692307 - type: f1 value: 66.74358974358974 - type: precision value: 64.06837606837607 - type: recall value: 73.07692307692307 task: type: BitextMining - dataset: config: hin-eng name: MTEB Tatoeba (hin-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 94.89999999999999 - type: f1 value: 93.25 - type: precision value: 92.43333333333332 - type: recall value: 94.89999999999999 task: type: BitextMining - dataset: config: dsb-eng name: MTEB Tatoeba (dsb-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 37.78705636743215 - type: f1 value: 31.63899658680452 - type: precision value: 29.72264397629742 - type: recall value: 37.78705636743215 task: type: BitextMining - dataset: config: ber-eng name: MTEB Tatoeba (ber-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 21.6 - type: f1 value: 16.91697302697303 - type: precision value: 15.71225147075147 - type: recall value: 21.6 task: type: BitextMining - dataset: config: tam-eng name: MTEB Tatoeba (tam-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 85.01628664495115 - type: f1 value: 81.38514037536838 - type: precision value: 79.83170466883823 - type: recall value: 85.01628664495115 task: type: BitextMining - dataset: config: slk-eng name: MTEB Tatoeba (slk-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 83.39999999999999 - type: f1 value: 79.96380952380952 - type: precision value: 78.48333333333333 - type: recall value: 83.39999999999999 task: type: BitextMining - dataset: config: tgl-eng name: MTEB Tatoeba (tgl-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 83.2 - type: f1 value: 79.26190476190476 - type: precision value: 77.58833333333334 - type: recall value: 83.2 task: type: BitextMining - 
dataset: config: ast-eng name: MTEB Tatoeba (ast-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 75.59055118110236 - type: f1 value: 71.66854143232096 - type: precision value: 70.30183727034121 - type: recall value: 75.59055118110236 task: type: BitextMining - dataset: config: mkd-eng name: MTEB Tatoeba (mkd-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 65.5 - type: f1 value: 59.26095238095238 - type: precision value: 56.81909090909092 - type: recall value: 65.5 task: type: BitextMining - dataset: config: khm-eng name: MTEB Tatoeba (khm-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 55.26315789473685 - type: f1 value: 47.986523325858506 - type: precision value: 45.33950006595436 - type: recall value: 55.26315789473685 task: type: BitextMining - dataset: config: ces-eng name: MTEB Tatoeba (ces-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 82.89999999999999 - type: f1 value: 78.835 - type: precision value: 77.04761904761905 - type: recall value: 82.89999999999999 task: type: BitextMining - dataset: config: tzl-eng name: MTEB Tatoeba (tzl-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 43.269230769230774 - type: f1 value: 36.20421245421245 - type: precision value: 33.57371794871795 - type: recall value: 43.269230769230774 task: type: BitextMining - dataset: config: urd-eng name: MTEB Tatoeba (urd-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 88.0 - type: f1 value: 84.70666666666666 - type: precision value: 83.23166666666665 - type: recall value: 88.0 task: type: BitextMining - dataset: config: ara-eng name: MTEB Tatoeba (ara-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 77.4 - type: f1 value: 72.54666666666667 - type: precision value: 70.54318181818181 - type: recall value: 77.4 task: type: BitextMining - dataset: config: kor-eng name: MTEB Tatoeba (kor-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 78.60000000000001 - type: f1 value: 74.1588888888889 - type: precision value: 72.30250000000001 - type: recall value: 78.60000000000001 task: type: BitextMining - dataset: config: yid-eng name: MTEB Tatoeba (yid-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 72.40566037735849 - type: f1 value: 66.82587328813744 - type: precision value: 64.75039308176099 - type: recall value: 72.40566037735849 task: type: BitextMining - dataset: config: fin-eng name: MTEB Tatoeba (fin-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 73.8 - type: f1 value: 68.56357142857144 - type: precision value: 66.3178822055138 - type: recall value: 73.8 task: type: BitextMining - dataset: config: tha-eng name: MTEB Tatoeba (tha-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 
91.78832116788321 - type: f1 value: 89.3552311435523 - type: precision value: 88.20559610705597 - type: recall value: 91.78832116788321 task: type: BitextMining - dataset: config: wuu-eng name: MTEB Tatoeba (wuu-eng) revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553 split: test type: mteb/tatoeba-bitext-mining metrics: - type: accuracy value: 74.3 - type: f1 value: 69.05085581085581 - type: precision value: 66.955 - type: recall value: 74.3 task: type: BitextMining - dataset: config: default name: MTEB Touche2020 revision: None split: test type: webis-touche2020 metrics: - type: map_at_1 value: 2.896 - type: map_at_10 value: 8.993 - type: map_at_100 value: 14.133999999999999 - type: map_at_1000 value: 15.668000000000001 - type: map_at_3 value: 5.862 - type: map_at_5 value: 7.17 - type: mrr_at_1 value: 34.694 - type: mrr_at_10 value: 42.931000000000004 - type: mrr_at_100 value: 44.81 - type: mrr_at_1000 value: 44.81 - type: mrr_at_3 value: 38.435 - type: mrr_at_5 value: 41.701 - type: ndcg_at_1 value: 31.633 - type: ndcg_at_10 value: 21.163 - type: ndcg_at_100 value: 33.306000000000004 - type: ndcg_at_1000 value: 45.275999999999996 - type: ndcg_at_3 value: 25.685999999999996 - type: ndcg_at_5 value: 23.732 - type: precision_at_1 value: 34.694 - type: precision_at_10 value: 17.755000000000003 - type: precision_at_100 value: 6.938999999999999 - type: precision_at_1000 value: 1.48 - type: precision_at_3 value: 25.85 - type: precision_at_5 value: 23.265 - type: recall_at_1 value: 2.896 - type: recall_at_10 value: 13.333999999999998 - type: recall_at_100 value: 43.517 - type: recall_at_1000 value: 79.836 - type: recall_at_3 value: 6.306000000000001 - type: recall_at_5 value: 8.825 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 69.3874 - type: ap value: 13.829909072469423 - type: f1 value: 53.54534203543492 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 62.62026032823995 - type: f1 value: 62.85251350485221 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: v_measure value: 33.21527881409797 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: cos_sim_accuracy value: 84.97943613280086 - type: cos_sim_ap value: 70.75454316885921 - type: cos_sim_f1 value: 65.38274012676743 - type: cos_sim_precision value: 60.761214318078835 - type: cos_sim_recall value: 70.76517150395777 - type: dot_accuracy value: 79.0546581629612 - type: dot_ap value: 47.3197121792147 - type: dot_f1 value: 49.20106524633821 - type: dot_precision value: 42.45499808502489 - type: dot_recall value: 58.49604221635884 - type: euclidean_accuracy value: 85.08076533349228 - type: euclidean_ap value: 70.95016106374474 - type: euclidean_f1 value: 65.43987900176455 - type: euclidean_precision value: 62.64478764478765 - type: euclidean_recall value: 68.49604221635884 - type: manhattan_accuracy value: 84.93771234428085 - type: manhattan_ap value: 70.63668388755362 - type: 
manhattan_f1 value: 65.23895401262398 - type: manhattan_precision value: 56.946084218811485 - type: manhattan_recall value: 76.35883905013192 - type: max_accuracy value: 85.08076533349228 - type: max_ap value: 70.95016106374474 - type: max_f1 value: 65.43987900176455 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: cos_sim_accuracy value: 88.69096130709822 - type: cos_sim_ap value: 84.82526278228542 - type: cos_sim_f1 value: 77.65485060585536 - type: cos_sim_precision value: 75.94582658619167 - type: cos_sim_recall value: 79.44256236526024 - type: dot_accuracy value: 80.97954748321496 - type: dot_ap value: 64.81642914145866 - type: dot_f1 value: 60.631996987229975 - type: dot_precision value: 54.5897293631712 - type: dot_recall value: 68.17831844779796 - type: euclidean_accuracy value: 88.6987231730508 - type: euclidean_ap value: 84.80003825477253 - type: euclidean_f1 value: 77.67194179854496 - type: euclidean_precision value: 75.7128235122094 - type: euclidean_recall value: 79.73514012935017 - type: manhattan_accuracy value: 88.62692591298949 - type: manhattan_ap value: 84.80451408255276 - type: manhattan_f1 value: 77.69888949572183 - type: manhattan_precision value: 73.70311528631622 - type: manhattan_recall value: 82.15275639051433 - type: max_accuracy value: 88.6987231730508 - type: max_ap value: 84.82526278228542 - type: max_f1 value: 77.69888949572183 task: type: PairClassification - dataset: config: ru-en name: MTEB BUCC.v2 (ru-en) revision: 1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677 split: test type: mteb/bucc-bitext-mining metrics: - type: accuracy value: 95.72566678212678 - type: f1 value: 94.42443135896548 - type: main_score value: 94.42443135896548 - type: precision value: 93.80868260016165 - type: recall value: 95.72566678212678 task: type: BitextMining - dataset: config: rus_Cyrl-rus_Cyrl name: MTEB BelebeleRetrieval (rus_Cyrl-rus_Cyrl) revision: 75b399394a9803252cfec289d103de462763db7c split: test type: facebook/belebele metrics: - type: main_score value: 92.23599999999999 - type: map_at_1 value: 87.111 - type: map_at_10 value: 90.717 - type: map_at_100 value: 90.879 - type: map_at_1000 value: 90.881 - type: map_at_20 value: 90.849 - type: map_at_3 value: 90.074 - type: map_at_5 value: 90.535 - type: mrr_at_1 value: 87.1111111111111 - type: mrr_at_10 value: 90.7173721340388 - type: mrr_at_100 value: 90.87859682638407 - type: mrr_at_1000 value: 90.88093553612326 - type: mrr_at_20 value: 90.84863516113515 - type: mrr_at_3 value: 90.07407407407409 - type: mrr_at_5 value: 90.53518518518521 - type: nauc_map_at_1000_diff1 value: 92.37373187280554 - type: nauc_map_at_1000_max value: 79.90465445423249 - type: nauc_map_at_1000_std value: -0.6220290556185463 - type: nauc_map_at_100_diff1 value: 92.37386697345335 - type: nauc_map_at_100_max value: 79.90991577223959 - type: nauc_map_at_100_std value: -0.602247514642845 - type: nauc_map_at_10_diff1 value: 92.30907447072467 - type: nauc_map_at_10_max value: 79.86831935337598 - type: nauc_map_at_10_std value: -0.7455191860719699 - type: nauc_map_at_1_diff1 value: 93.29828518358822 - type: nauc_map_at_1_max value: 78.69539619887887 - type: nauc_map_at_1_std value: -4.097150817605763 - type: nauc_map_at_20_diff1 value: 92.38414149703077 - type: nauc_map_at_20_max value: 79.94789814504661 - type: nauc_map_at_20_std value: -0.3928031130400773 - type: nauc_map_at_3_diff1 value: 
92.21688899306734 - type: nauc_map_at_3_max value: 80.34586671780885 - type: nauc_map_at_3_std value: 0.24088319695435909 - type: nauc_map_at_5_diff1 value: 92.27931726042982 - type: nauc_map_at_5_max value: 79.99198834003367 - type: nauc_map_at_5_std value: -0.6296366922840796 - type: nauc_mrr_at_1000_diff1 value: 92.37373187280554 - type: nauc_mrr_at_1000_max value: 79.90465445423249 - type: nauc_mrr_at_1000_std value: -0.6220290556185463 - type: nauc_mrr_at_100_diff1 value: 92.37386697345335 - type: nauc_mrr_at_100_max value: 79.90991577223959 - type: nauc_mrr_at_100_std value: -0.602247514642845 - type: nauc_mrr_at_10_diff1 value: 92.30907447072467 - type: nauc_mrr_at_10_max value: 79.86831935337598 - type: nauc_mrr_at_10_std value: -0.7455191860719699 - type: nauc_mrr_at_1_diff1 value: 93.29828518358822 - type: nauc_mrr_at_1_max value: 78.69539619887887 - type: nauc_mrr_at_1_std value: -4.097150817605763 - type: nauc_mrr_at_20_diff1 value: 92.38414149703077 - type: nauc_mrr_at_20_max value: 79.94789814504661 - type: nauc_mrr_at_20_std value: -0.3928031130400773 - type: nauc_mrr_at_3_diff1 value: 92.21688899306734 - type: nauc_mrr_at_3_max value: 80.34586671780885 - type: nauc_mrr_at_3_std value: 0.24088319695435909 - type: nauc_mrr_at_5_diff1 value: 92.27931726042982 - type: nauc_mrr_at_5_max value: 79.99198834003367 - type: nauc_mrr_at_5_std value: -0.6296366922840796 - type: nauc_ndcg_at_1000_diff1 value: 92.30526497646306 - type: nauc_ndcg_at_1000_max value: 80.12734537480418 - type: nauc_ndcg_at_1000_std value: 0.22849408935578744 - type: nauc_ndcg_at_100_diff1 value: 92.31347123202318 - type: nauc_ndcg_at_100_max value: 80.29207038703142 - type: nauc_ndcg_at_100_std value: 0.816825944406239 - type: nauc_ndcg_at_10_diff1 value: 92.05430189845808 - type: nauc_ndcg_at_10_max value: 80.16515667442968 - type: nauc_ndcg_at_10_std value: 0.7486447532544893 - type: nauc_ndcg_at_1_diff1 value: 93.29828518358822 - type: nauc_ndcg_at_1_max value: 78.69539619887887 - type: nauc_ndcg_at_1_std value: -4.097150817605763 - type: nauc_ndcg_at_20_diff1 value: 92.40147868825079 - type: nauc_ndcg_at_20_max value: 80.5117307181802 - type: nauc_ndcg_at_20_std value: 2.0431351539517033 - type: nauc_ndcg_at_3_diff1 value: 91.88894444422789 - type: nauc_ndcg_at_3_max value: 81.09256084196045 - type: nauc_ndcg_at_3_std value: 2.422705909643621 - type: nauc_ndcg_at_5_diff1 value: 91.99711052955728 - type: nauc_ndcg_at_5_max value: 80.46996334573979 - type: nauc_ndcg_at_5_std value: 0.9086986899040708 - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_100_diff1 value: 93.46405228758012 - type: nauc_precision_at_100_max value: 100.0 - type: nauc_precision_at_100_std value: 70.71661998132774 - type: nauc_precision_at_10_diff1 value: 90.13938908896874 - type: nauc_precision_at_10_max value: 82.21121782046167 - type: nauc_precision_at_10_std value: 13.075230092036083 - type: nauc_precision_at_1_diff1 value: 93.29828518358822 - type: nauc_precision_at_1_max value: 78.69539619887887 - type: nauc_precision_at_1_std value: -4.097150817605763 - type: nauc_precision_at_20_diff1 value: 94.9723479135242 - type: nauc_precision_at_20_max value: 91.04000574588684 - type: nauc_precision_at_20_std value: 48.764634058749586 - type: nauc_precision_at_3_diff1 value: 90.52690041533852 - type: nauc_precision_at_3_max value: 84.35075179497126 - type: nauc_precision_at_3_std value: 12.036768730480507 - type: 
nauc_precision_at_5_diff1 value: 90.44234360410769 - type: nauc_precision_at_5_max value: 83.21895424836558 - type: nauc_precision_at_5_std value: 9.974323062558037 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 93.46405228758294 - type: nauc_recall_at_100_max value: 100.0 - type: nauc_recall_at_100_std value: 70.71661998132666 - type: nauc_recall_at_10_diff1 value: 90.13938908896864 - type: nauc_recall_at_10_max value: 82.21121782046124 - type: nauc_recall_at_10_std value: 13.075230092036506 - type: nauc_recall_at_1_diff1 value: 93.29828518358822 - type: nauc_recall_at_1_max value: 78.69539619887887 - type: nauc_recall_at_1_std value: -4.097150817605763 - type: nauc_recall_at_20_diff1 value: 94.97234791352489 - type: nauc_recall_at_20_max value: 91.04000574588774 - type: nauc_recall_at_20_std value: 48.764634058752065 - type: nauc_recall_at_3_diff1 value: 90.52690041533845 - type: nauc_recall_at_3_max value: 84.35075179497079 - type: nauc_recall_at_3_std value: 12.036768730480583 - type: nauc_recall_at_5_diff1 value: 90.44234360410861 - type: nauc_recall_at_5_max value: 83.21895424836595 - type: nauc_recall_at_5_std value: 9.974323062558147 - type: ndcg_at_1 value: 87.111 - type: ndcg_at_10 value: 92.23599999999999 - type: ndcg_at_100 value: 92.87100000000001 - type: ndcg_at_1000 value: 92.928 - type: ndcg_at_20 value: 92.67699999999999 - type: ndcg_at_3 value: 90.973 - type: ndcg_at_5 value: 91.801 - type: precision_at_1 value: 87.111 - type: precision_at_10 value: 9.689 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.928 - type: precision_at_3 value: 31.185000000000002 - type: precision_at_5 value: 19.111 - type: recall_at_1 value: 87.111 - type: recall_at_10 value: 96.88900000000001 - type: recall_at_100 value: 99.556 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 98.556 - type: recall_at_3 value: 93.556 - type: recall_at_5 value: 95.556 task: type: Retrieval - dataset: config: rus_Cyrl-eng_Latn name: MTEB BelebeleRetrieval (rus_Cyrl-eng_Latn) revision: 75b399394a9803252cfec289d103de462763db7c split: test type: facebook/belebele metrics: - type: main_score value: 86.615 - type: map_at_1 value: 78.0 - type: map_at_10 value: 83.822 - type: map_at_100 value: 84.033 - type: map_at_1000 value: 84.03500000000001 - type: map_at_20 value: 83.967 - type: map_at_3 value: 82.315 - type: map_at_5 value: 83.337 - type: mrr_at_1 value: 78.0 - type: mrr_at_10 value: 83.82213403880073 - type: mrr_at_100 value: 84.03281327810801 - type: mrr_at_1000 value: 84.03460051000452 - type: mrr_at_20 value: 83.9673773122303 - type: mrr_at_3 value: 82.31481481481484 - type: mrr_at_5 value: 83.33703703703708 - type: nauc_map_at_1000_diff1 value: 80.78467576987832 - type: nauc_map_at_1000_max value: 51.41718334647604 - type: nauc_map_at_1000_std value: -16.23873782768812 - type: nauc_map_at_100_diff1 value: 80.78490931240695 - type: nauc_map_at_100_max value: 51.41504597713061 - type: nauc_map_at_100_std value: -16.23538559475366 - type: nauc_map_at_10_diff1 value: 80.73989245374868 - type: nauc_map_at_10_max value: 51.43026079433827 - type: nauc_map_at_10_std value: -16.13414330905897 - type: nauc_map_at_1_diff1 value: 82.36966971144186 - type: nauc_map_at_1_max value: 52.988877039509916 - type: nauc_map_at_1_std value: -15.145824639495546 - type: nauc_map_at_20_diff1 value: 80.75923781626145 - type: 
nauc_map_at_20_max value: 51.40181079374639 - type: nauc_map_at_20_std value: -16.260566097377165 - type: nauc_map_at_3_diff1 value: 80.65242627065471 - type: nauc_map_at_3_max value: 50.623980338841214 - type: nauc_map_at_3_std value: -16.818343442794294 - type: nauc_map_at_5_diff1 value: 80.45976387021862 - type: nauc_map_at_5_max value: 51.533621728445866 - type: nauc_map_at_5_std value: -16.279891536945815 - type: nauc_mrr_at_1000_diff1 value: 80.78467576987832 - type: nauc_mrr_at_1000_max value: 51.41718334647604 - type: nauc_mrr_at_1000_std value: -16.23873782768812 - type: nauc_mrr_at_100_diff1 value: 80.78490931240695 - type: nauc_mrr_at_100_max value: 51.41504597713061 - type: nauc_mrr_at_100_std value: -16.23538559475366 - type: nauc_mrr_at_10_diff1 value: 80.73989245374868 - type: nauc_mrr_at_10_max value: 51.43026079433827 - type: nauc_mrr_at_10_std value: -16.13414330905897 - type: nauc_mrr_at_1_diff1 value: 82.36966971144186 - type: nauc_mrr_at_1_max value: 52.988877039509916 - type: nauc_mrr_at_1_std value: -15.145824639495546 - type: nauc_mrr_at_20_diff1 value: 80.75923781626145 - type: nauc_mrr_at_20_max value: 51.40181079374639 - type: nauc_mrr_at_20_std value: -16.260566097377165 - type: nauc_mrr_at_3_diff1 value: 80.65242627065471 - type: nauc_mrr_at_3_max value: 50.623980338841214 - type: nauc_mrr_at_3_std value: -16.818343442794294 - type: nauc_mrr_at_5_diff1 value: 80.45976387021862 - type: nauc_mrr_at_5_max value: 51.533621728445866 - type: nauc_mrr_at_5_std value: -16.279891536945815 - type: nauc_ndcg_at_1000_diff1 value: 80.60009446938174 - type: nauc_ndcg_at_1000_max value: 51.381708043594166 - type: nauc_ndcg_at_1000_std value: -16.054256944160848 - type: nauc_ndcg_at_100_diff1 value: 80.58971462930421 - type: nauc_ndcg_at_100_max value: 51.25436917735444 - type: nauc_ndcg_at_100_std value: -15.862944972269894 - type: nauc_ndcg_at_10_diff1 value: 80.37967179454489 - type: nauc_ndcg_at_10_max value: 51.590394257251006 - type: nauc_ndcg_at_10_std value: -15.489799384799591 - type: nauc_ndcg_at_1_diff1 value: 82.36966971144186 - type: nauc_ndcg_at_1_max value: 52.988877039509916 - type: nauc_ndcg_at_1_std value: -15.145824639495546 - type: nauc_ndcg_at_20_diff1 value: 80.40299527470081 - type: nauc_ndcg_at_20_max value: 51.395132284307074 - type: nauc_ndcg_at_20_std value: -15.906165526937203 - type: nauc_ndcg_at_3_diff1 value: 80.10347913649302 - type: nauc_ndcg_at_3_max value: 50.018431855573844 - type: nauc_ndcg_at_3_std value: -17.12743750163884 - type: nauc_ndcg_at_5_diff1 value: 79.65918647776613 - type: nauc_ndcg_at_5_max value: 51.76710880330806 - type: nauc_ndcg_at_5_std value: -16.071901882035945 - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_100_diff1 value: 77.41596638655459 - type: nauc_precision_at_100_max value: 22.572362278246565 - type: nauc_precision_at_100_std value: 26.890756302525716 - type: nauc_precision_at_10_diff1 value: 77.82112845138009 - type: nauc_precision_at_10_max value: 54.2550353474723 - type: nauc_precision_at_10_std value: -7.492997198879646 - type: nauc_precision_at_1_diff1 value: 82.36966971144186 - type: nauc_precision_at_1_max value: 52.988877039509916 - type: nauc_precision_at_1_std value: -15.145824639495546 - type: nauc_precision_at_20_diff1 value: 75.89091192032318 - type: nauc_precision_at_20_max value: 52.03275754746293 - type: nauc_precision_at_20_std value: -7.8411920323686175 - type: 
nauc_precision_at_3_diff1 value: 78.0256020644638 - type: nauc_precision_at_3_max value: 47.80353641248523 - type: nauc_precision_at_3_std value: -18.181625255723503 - type: nauc_precision_at_5_diff1 value: 75.21583976056174 - type: nauc_precision_at_5_max value: 53.716281032960765 - type: nauc_precision_at_5_std value: -14.411700753360812 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 77.4159663865523 - type: nauc_recall_at_100_max value: 22.57236227824646 - type: nauc_recall_at_100_std value: 26.89075630252133 - type: nauc_recall_at_10_diff1 value: 77.82112845138037 - type: nauc_recall_at_10_max value: 54.25503534747204 - type: nauc_recall_at_10_std value: -7.492997198879666 - type: nauc_recall_at_1_diff1 value: 82.36966971144186 - type: nauc_recall_at_1_max value: 52.988877039509916 - type: nauc_recall_at_1_std value: -15.145824639495546 - type: nauc_recall_at_20_diff1 value: 75.89091192032362 - type: nauc_recall_at_20_max value: 52.032757547463184 - type: nauc_recall_at_20_std value: -7.84119203236888 - type: nauc_recall_at_3_diff1 value: 78.02560206446354 - type: nauc_recall_at_3_max value: 47.80353641248526 - type: nauc_recall_at_3_std value: -18.181625255723656 - type: nauc_recall_at_5_diff1 value: 75.21583976056185 - type: nauc_recall_at_5_max value: 53.71628103296118 - type: nauc_recall_at_5_std value: -14.411700753360634 - type: ndcg_at_1 value: 78.0 - type: ndcg_at_10 value: 86.615 - type: ndcg_at_100 value: 87.558 - type: ndcg_at_1000 value: 87.613 - type: ndcg_at_20 value: 87.128 - type: ndcg_at_3 value: 83.639 - type: ndcg_at_5 value: 85.475 - type: precision_at_1 value: 78.0 - type: precision_at_10 value: 9.533 - type: precision_at_100 value: 0.996 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.867 - type: precision_at_3 value: 29.148000000000003 - type: precision_at_5 value: 18.378 - type: recall_at_1 value: 78.0 - type: recall_at_10 value: 95.333 - type: recall_at_100 value: 99.556 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 97.333 - type: recall_at_3 value: 87.444 - type: recall_at_5 value: 91.889 task: type: Retrieval - dataset: config: eng_Latn-rus_Cyrl name: MTEB BelebeleRetrieval (eng_Latn-rus_Cyrl) revision: 75b399394a9803252cfec289d103de462763db7c split: test type: facebook/belebele metrics: - type: main_score value: 82.748 - type: map_at_1 value: 73.444 - type: map_at_10 value: 79.857 - type: map_at_100 value: 80.219 - type: map_at_1000 value: 80.22500000000001 - type: map_at_20 value: 80.10300000000001 - type: map_at_3 value: 78.593 - type: map_at_5 value: 79.515 - type: mrr_at_1 value: 73.44444444444444 - type: mrr_at_10 value: 79.85705467372136 - type: mrr_at_100 value: 80.21942320422542 - type: mrr_at_1000 value: 80.2245364027152 - type: mrr_at_20 value: 80.10273201266493 - type: mrr_at_3 value: 78.59259259259258 - type: mrr_at_5 value: 79.51481481481483 - type: nauc_map_at_1000_diff1 value: 83.69682652271125 - type: nauc_map_at_1000_max value: 61.70131708044767 - type: nauc_map_at_1000_std value: 9.345825405274955 - type: nauc_map_at_100_diff1 value: 83.68924820523492 - type: nauc_map_at_100_max value: 61.6965735573098 - type: nauc_map_at_100_std value: 9.366132859525775 - type: nauc_map_at_10_diff1 value: 83.61802964269985 - type: nauc_map_at_10_max value: 61.74274476167882 - type: nauc_map_at_10_std value: 9.504060995819101 - type: nauc_map_at_1_diff1 value: 86.37079221403225 - type: 
nauc_map_at_1_max value: 61.856861655370686 - type: nauc_map_at_1_std value: 4.708911881992707 - type: nauc_map_at_20_diff1 value: 83.62920965453047 - type: nauc_map_at_20_max value: 61.761029350326965 - type: nauc_map_at_20_std value: 9.572978651118351 - type: nauc_map_at_3_diff1 value: 83.66665673154306 - type: nauc_map_at_3_max value: 61.13597610587937 - type: nauc_map_at_3_std value: 9.309596395240598 - type: nauc_map_at_5_diff1 value: 83.52307226455358 - type: nauc_map_at_5_max value: 61.59405758027573 - type: nauc_map_at_5_std value: 9.320025423287671 - type: nauc_mrr_at_1000_diff1 value: 83.69682652271125 - type: nauc_mrr_at_1000_max value: 61.70131708044767 - type: nauc_mrr_at_1000_std value: 9.345825405274955 - type: nauc_mrr_at_100_diff1 value: 83.68924820523492 - type: nauc_mrr_at_100_max value: 61.6965735573098 - type: nauc_mrr_at_100_std value: 9.366132859525775 - type: nauc_mrr_at_10_diff1 value: 83.61802964269985 - type: nauc_mrr_at_10_max value: 61.74274476167882 - type: nauc_mrr_at_10_std value: 9.504060995819101 - type: nauc_mrr_at_1_diff1 value: 86.37079221403225 - type: nauc_mrr_at_1_max value: 61.856861655370686 - type: nauc_mrr_at_1_std value: 4.708911881992707 - type: nauc_mrr_at_20_diff1 value: 83.62920965453047 - type: nauc_mrr_at_20_max value: 61.761029350326965 - type: nauc_mrr_at_20_std value: 9.572978651118351 - type: nauc_mrr_at_3_diff1 value: 83.66665673154306 - type: nauc_mrr_at_3_max value: 61.13597610587937 - type: nauc_mrr_at_3_std value: 9.309596395240598 - type: nauc_mrr_at_5_diff1 value: 83.52307226455358 - type: nauc_mrr_at_5_max value: 61.59405758027573 - type: nauc_mrr_at_5_std value: 9.320025423287671 - type: nauc_ndcg_at_1000_diff1 value: 83.24213186482201 - type: nauc_ndcg_at_1000_max value: 61.77629841787496 - type: nauc_ndcg_at_1000_std value: 10.332527869705851 - type: nauc_ndcg_at_100_diff1 value: 83.06815820441027 - type: nauc_ndcg_at_100_max value: 61.6947181864579 - type: nauc_ndcg_at_100_std value: 10.888922975877316 - type: nauc_ndcg_at_10_diff1 value: 82.58238431386295 - type: nauc_ndcg_at_10_max value: 62.10333663935709 - type: nauc_ndcg_at_10_std value: 11.746030330958174 - type: nauc_ndcg_at_1_diff1 value: 86.37079221403225 - type: nauc_ndcg_at_1_max value: 61.856861655370686 - type: nauc_ndcg_at_1_std value: 4.708911881992707 - type: nauc_ndcg_at_20_diff1 value: 82.67888324480154 - type: nauc_ndcg_at_20_max value: 62.28124917486516 - type: nauc_ndcg_at_20_std value: 12.343058917563914 - type: nauc_ndcg_at_3_diff1 value: 82.71277373710663 - type: nauc_ndcg_at_3_max value: 60.66677922989939 - type: nauc_ndcg_at_3_std value: 10.843633736296528 - type: nauc_ndcg_at_5_diff1 value: 82.34691124846786 - type: nauc_ndcg_at_5_max value: 61.605961382062716 - type: nauc_ndcg_at_5_std value: 11.129011077702602 - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_100_diff1 value: 60.93103908230194 - type: nauc_precision_at_100_max value: 52.621048419370695 - type: nauc_precision_at_100_std value: 85.60090702947922 - type: nauc_precision_at_10_diff1 value: 76.26517273576093 - type: nauc_precision_at_10_max value: 65.2013694366636 - type: nauc_precision_at_10_std value: 26.50357920946173 - type: nauc_precision_at_1_diff1 value: 86.37079221403225 - type: nauc_precision_at_1_max value: 61.856861655370686 - type: nauc_precision_at_1_std value: 4.708911881992707 - type: nauc_precision_at_20_diff1 value: 73.47946930710295 - type: 
nauc_precision_at_20_max value: 70.19520986689217 - type: nauc_precision_at_20_std value: 45.93186111653967 - type: nauc_precision_at_3_diff1 value: 79.02026879450186 - type: nauc_precision_at_3_max value: 58.75074624692399 - type: nauc_precision_at_3_std value: 16.740684654251037 - type: nauc_precision_at_5_diff1 value: 76.47585662281637 - type: nauc_precision_at_5_max value: 61.86270922013127 - type: nauc_precision_at_5_std value: 20.1833625455035 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 60.93103908229921 - type: nauc_recall_at_100_max value: 52.62104841936668 - type: nauc_recall_at_100_std value: 85.60090702947748 - type: nauc_recall_at_10_diff1 value: 76.26517273576097 - type: nauc_recall_at_10_max value: 65.20136943666347 - type: nauc_recall_at_10_std value: 26.50357920946174 - type: nauc_recall_at_1_diff1 value: 86.37079221403225 - type: nauc_recall_at_1_max value: 61.856861655370686 - type: nauc_recall_at_1_std value: 4.708911881992707 - type: nauc_recall_at_20_diff1 value: 73.47946930710269 - type: nauc_recall_at_20_max value: 70.19520986689254 - type: nauc_recall_at_20_std value: 45.93186111653943 - type: nauc_recall_at_3_diff1 value: 79.02026879450173 - type: nauc_recall_at_3_max value: 58.750746246923924 - type: nauc_recall_at_3_std value: 16.740684654251076 - type: nauc_recall_at_5_diff1 value: 76.4758566228162 - type: nauc_recall_at_5_max value: 61.862709220131386 - type: nauc_recall_at_5_std value: 20.18336254550361 - type: ndcg_at_1 value: 73.444 - type: ndcg_at_10 value: 82.748 - type: ndcg_at_100 value: 84.416 - type: ndcg_at_1000 value: 84.52300000000001 - type: ndcg_at_20 value: 83.646 - type: ndcg_at_3 value: 80.267 - type: ndcg_at_5 value: 81.922 - type: precision_at_1 value: 73.444 - type: precision_at_10 value: 9.167 - type: precision_at_100 value: 0.992 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.761 - type: precision_at_3 value: 28.37 - type: precision_at_5 value: 17.822 - type: recall_at_1 value: 73.444 - type: recall_at_10 value: 91.667 - type: recall_at_100 value: 99.222 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 95.222 - type: recall_at_3 value: 85.111 - type: recall_at_5 value: 89.11099999999999 task: type: Retrieval - dataset: config: eng_Latn-rus_Cyrl name: MTEB BibleNLPBitextMining (eng_Latn-rus_Cyrl) revision: 264a18480c529d9e922483839b4b9758e690b762 split: train type: davidstap/biblenlp-corpus-mmteb metrics: - type: accuracy value: 96.875 - type: f1 value: 95.83333333333333 - type: main_score value: 95.83333333333333 - type: precision value: 95.3125 - type: recall value: 96.875 task: type: BitextMining - dataset: config: rus_Cyrl-eng_Latn name: MTEB BibleNLPBitextMining (rus_Cyrl-eng_Latn) revision: 264a18480c529d9e922483839b4b9758e690b762 split: train type: davidstap/biblenlp-corpus-mmteb metrics: - type: accuracy value: 88.671875 - type: f1 value: 85.3515625 - type: main_score value: 85.3515625 - type: precision value: 83.85416666666667 - type: recall value: 88.671875 task: type: BitextMining - dataset: config: default name: MTEB CEDRClassification (default) revision: c0ba03d058e3e1b2f3fd20518875a4563dd12db4 split: test type: ai-forever/cedr-classification metrics: - type: accuracy value: 40.06907545164719 - type: f1 value: 26.285000550712407 - type: lrap value: 64.4280021253997 - type: main_score value: 40.06907545164719 task: type: MultilabelClassification - dataset: config: 
default name: MTEB CyrillicTurkicLangClassification (default) revision: e42d330f33d65b7b72dfd408883daf1661f06f18 split: test type: tatiana-merz/cyrillic_turkic_langs metrics: - type: accuracy value: 43.3447265625 - type: f1 value: 40.08400146827895 - type: f1_weighted value: 40.08499428040896 - type: main_score value: 43.3447265625 task: type: Classification - dataset: config: ace_Arab-rus_Cyrl name: MTEB FloresBitextMining (ace_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 6.225296442687747 - type: f1 value: 5.5190958860075 - type: main_score value: 5.5190958860075 - type: precision value: 5.3752643758000005 - type: recall value: 6.225296442687747 task: type: BitextMining - dataset: config: bam_Latn-rus_Cyrl name: MTEB FloresBitextMining (bam_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 68.37944664031622 - type: f1 value: 64.54819836666252 - type: main_score value: 64.54819836666252 - type: precision value: 63.07479233454916 - type: recall value: 68.37944664031622 task: type: BitextMining - dataset: config: dzo_Tibt-rus_Cyrl name: MTEB FloresBitextMining (dzo_Tibt-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 0.09881422924901186 - type: f1 value: 0.00019509225912934226 - type: main_score value: 0.00019509225912934226 - type: precision value: 9.76425190207627e-05 - type: recall value: 0.09881422924901186 task: type: BitextMining - dataset: config: hin_Deva-rus_Cyrl name: MTEB FloresBitextMining (hin_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.60474308300395 - type: f1 value: 99.47299077733861 - type: main_score value: 99.47299077733861 - type: precision value: 99.40711462450594 - type: recall value: 99.60474308300395 task: type: BitextMining - dataset: config: khm_Khmr-rus_Cyrl name: MTEB FloresBitextMining (khm_Khmr-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 88.83399209486166 - type: f1 value: 87.71151056318254 - type: main_score value: 87.71151056318254 - type: precision value: 87.32012500709193 - type: recall value: 88.83399209486166 task: type: BitextMining - dataset: config: mag_Deva-rus_Cyrl name: MTEB FloresBitextMining (mag_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.7239789196311 - type: main_score value: 97.7239789196311 - type: precision value: 97.61904761904762 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: pap_Latn-rus_Cyrl name: MTEB FloresBitextMining (pap_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.0711462450593 - type: f1 value: 93.68187806922984 - type: main_score value: 93.68187806922984 - type: precision value: 93.58925452707051 - type: recall value: 94.0711462450593 task: type: BitextMining - dataset: config: sot_Latn-rus_Cyrl name: MTEB FloresBitextMining (sot_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 90.9090909090909 - type: f1 value: 89.23171936758892 - type: main_score value: 89.23171936758892 - type: 
precision value: 88.51790014083866 - type: recall value: 90.9090909090909 task: type: BitextMining - dataset: config: tur_Latn-rus_Cyrl name: MTEB FloresBitextMining (tur_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.9459815546772 - type: main_score value: 98.9459815546772 - type: precision value: 98.81422924901186 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: ace_Latn-rus_Cyrl name: MTEB FloresBitextMining (ace_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 66.10671936758892 - type: f1 value: 63.81888256297873 - type: main_score value: 63.81888256297873 - type: precision value: 63.01614067933451 - type: recall value: 66.10671936758892 task: type: BitextMining - dataset: config: ban_Latn-rus_Cyrl name: MTEB FloresBitextMining (ban_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 79.44664031620553 - type: f1 value: 77.6311962082713 - type: main_score value: 77.6311962082713 - type: precision value: 76.93977931929739 - type: recall value: 79.44664031620553 task: type: BitextMining - dataset: config: ell_Grek-rus_Cyrl name: MTEB FloresBitextMining (ell_Grek-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.2094861660079 - type: main_score value: 99.2094861660079 - type: precision value: 99.1106719367589 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: hne_Deva-rus_Cyrl name: MTEB FloresBitextMining (hne_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.83794466403161 - type: f1 value: 96.25352907961603 - type: main_score value: 96.25352907961603 - type: precision value: 96.02155091285526 - type: recall value: 96.83794466403161 task: type: BitextMining - dataset: config: kik_Latn-rus_Cyrl name: MTEB FloresBitextMining (kik_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 76.28458498023716 - type: f1 value: 73.5596919895859 - type: main_score value: 73.5596919895859 - type: precision value: 72.40900759055246 - type: recall value: 76.28458498023716 task: type: BitextMining - dataset: config: mai_Deva-rus_Cyrl name: MTEB FloresBitextMining (mai_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.72727272727273 - type: f1 value: 97.37812911725956 - type: main_score value: 97.37812911725956 - type: precision value: 97.26002258610953 - type: recall value: 97.72727272727273 task: type: BitextMining - dataset: config: pbt_Arab-rus_Cyrl name: MTEB FloresBitextMining (pbt_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.0711462450593 - type: f1 value: 93.34700387331966 - type: main_score value: 93.34700387331966 - type: precision value: 93.06920556920556 - type: recall value: 94.0711462450593 task: type: BitextMining - dataset: config: spa_Latn-rus_Cyrl name: MTEB FloresBitextMining (spa_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: 
accuracy value: 99.2094861660079 - type: f1 value: 98.9459815546772 - type: main_score value: 98.9459815546772 - type: precision value: 98.81422924901186 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: twi_Latn-rus_Cyrl name: MTEB FloresBitextMining (twi_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 80.73122529644269 - type: f1 value: 77.77434363246721 - type: main_score value: 77.77434363246721 - type: precision value: 76.54444287596462 - type: recall value: 80.73122529644269 task: type: BitextMining - dataset: config: acm_Arab-rus_Cyrl name: MTEB FloresBitextMining (acm_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.56521739130434 - type: f1 value: 92.92490118577075 - type: main_score value: 92.92490118577075 - type: precision value: 92.16897233201581 - type: recall value: 94.56521739130434 task: type: BitextMining - dataset: config: bel_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (bel_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.98550724637681 - type: main_score value: 98.98550724637681 - type: precision value: 98.88833992094862 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: eng_Latn-rus_Cyrl name: MTEB FloresBitextMining (eng_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.60474308300395 - type: f1 value: 99.4729907773386 - type: main_score value: 99.4729907773386 - type: precision value: 99.40711462450594 - type: recall value: 99.60474308300395 task: type: BitextMining - dataset: config: hrv_Latn-rus_Cyrl name: MTEB FloresBitextMining (hrv_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 99.05138339920948 - type: main_score value: 99.05138339920948 - type: precision value: 99.00691699604744 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: kin_Latn-rus_Cyrl name: MTEB FloresBitextMining (kin_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 88.2411067193676 - type: f1 value: 86.5485246227658 - type: main_score value: 86.5485246227658 - type: precision value: 85.90652101521667 - type: recall value: 88.2411067193676 task: type: BitextMining - dataset: config: mal_Mlym-rus_Cyrl name: MTEB FloresBitextMining (mal_Mlym-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.51778656126481 - type: f1 value: 98.07971014492753 - type: main_score value: 98.07971014492753 - type: precision value: 97.88372859025033 - type: recall value: 98.51778656126481 task: type: BitextMining - dataset: config: pes_Arab-rus_Cyrl name: MTEB FloresBitextMining (pes_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.51778656126481 - type: f1 value: 98.0566534914361 - type: main_score value: 98.0566534914361 - type: precision value: 97.82608695652173 - type: recall value: 98.51778656126481 task: type: BitextMining - dataset: config: srd_Latn-rus_Cyrl name: MTEB FloresBitextMining 
(srd_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 82.6086956521739 - type: f1 value: 80.9173470979821 - type: main_score value: 80.9173470979821 - type: precision value: 80.24468672882627 - type: recall value: 82.6086956521739 task: type: BitextMining - dataset: config: tzm_Tfng-rus_Cyrl name: MTEB FloresBitextMining (tzm_Tfng-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 7.41106719367589 - type: f1 value: 6.363562740945329 - type: main_score value: 6.363562740945329 - type: precision value: 6.090373175353411 - type: recall value: 7.41106719367589 task: type: BitextMining - dataset: config: acq_Arab-rus_Cyrl name: MTEB FloresBitextMining (acq_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.25691699604744 - type: f1 value: 93.81422924901187 - type: main_score value: 93.81422924901187 - type: precision value: 93.14064558629775 - type: recall value: 95.25691699604744 task: type: BitextMining - dataset: config: bem_Latn-rus_Cyrl name: MTEB FloresBitextMining (bem_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 68.08300395256917 - type: f1 value: 65.01368772860867 - type: main_score value: 65.01368772860867 - type: precision value: 63.91052337510628 - type: recall value: 68.08300395256917 task: type: BitextMining - dataset: config: epo_Latn-rus_Cyrl name: MTEB FloresBitextMining (epo_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.41897233201581 - type: f1 value: 98.17193675889328 - type: main_score value: 98.17193675889328 - type: precision value: 98.08210564139418 - type: recall value: 98.41897233201581 task: type: BitextMining - dataset: config: hun_Latn-rus_Cyrl name: MTEB FloresBitextMining (hun_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.1106719367589 - type: main_score value: 99.1106719367589 - type: precision value: 99.01185770750988 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: kir_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (kir_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.5296442687747 - type: f1 value: 97.07549806364035 - type: main_score value: 97.07549806364035 - type: precision value: 96.90958498023716 - type: recall value: 97.5296442687747 task: type: BitextMining - dataset: config: mar_Deva-rus_Cyrl name: MTEB FloresBitextMining (mar_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.82608695652173 - type: f1 value: 97.44400527009222 - type: main_score value: 97.44400527009222 - type: precision value: 97.28966685488425 - type: recall value: 97.82608695652173 task: type: BitextMining - dataset: config: plt_Latn-rus_Cyrl name: MTEB FloresBitextMining (plt_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 79.9407114624506 - type: f1 value: 78.3154177760691 - type: main_score value: 78.3154177760691 - type: precision value: 77.69877344877344 - type: 
recall value: 79.9407114624506 task: type: BitextMining - dataset: config: srp_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (srp_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.70355731225297 - type: f1 value: 99.60474308300395 - type: main_score value: 99.60474308300395 - type: precision value: 99.55533596837944 - type: recall value: 99.70355731225297 task: type: BitextMining - dataset: config: uig_Arab-rus_Cyrl name: MTEB FloresBitextMining (uig_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 83.20158102766798 - type: f1 value: 81.44381923034585 - type: main_score value: 81.44381923034585 - type: precision value: 80.78813411582477 - type: recall value: 83.20158102766798 task: type: BitextMining - dataset: config: aeb_Arab-rus_Cyrl name: MTEB FloresBitextMining (aeb_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.20553359683794 - type: f1 value: 88.75352907961603 - type: main_score value: 88.75352907961603 - type: precision value: 87.64328063241106 - type: recall value: 91.20553359683794 task: type: BitextMining - dataset: config: ben_Beng-rus_Cyrl name: MTEB FloresBitextMining (ben_Beng-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.60671936758894 - type: main_score value: 98.60671936758894 - type: precision value: 98.4766139657444 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: est_Latn-rus_Cyrl name: MTEB FloresBitextMining (est_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.24505928853755 - type: f1 value: 95.27417027417027 - type: main_score value: 95.27417027417027 - type: precision value: 94.84107378129117 - type: recall value: 96.24505928853755 task: type: BitextMining - dataset: config: hye_Armn-rus_Cyrl name: MTEB FloresBitextMining (hye_Armn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.67786561264822 - type: main_score value: 97.67786561264822 - type: precision value: 97.55839022637441 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: kmb_Latn-rus_Cyrl name: MTEB FloresBitextMining (kmb_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 46.047430830039524 - type: f1 value: 42.94464804804471 - type: main_score value: 42.94464804804471 - type: precision value: 41.9851895607238 - type: recall value: 46.047430830039524 task: type: BitextMining - dataset: config: min_Arab-rus_Cyrl name: MTEB FloresBitextMining (min_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 3.9525691699604746 - type: f1 value: 3.402665192725756 - type: main_score value: 3.402665192725756 - type: precision value: 3.303787557740127 - type: recall value: 3.9525691699604746 task: type: BitextMining - dataset: config: pol_Latn-rus_Cyrl name: MTEB FloresBitextMining (pol_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 
99.60474308300395 - type: f1 value: 99.4729907773386 - type: main_score value: 99.4729907773386 - type: precision value: 99.40711462450594 - type: recall value: 99.60474308300395 task: type: BitextMining - dataset: config: ssw_Latn-rus_Cyrl name: MTEB FloresBitextMining (ssw_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 73.22134387351778 - type: f1 value: 70.43086049508975 - type: main_score value: 70.43086049508975 - type: precision value: 69.35312022355656 - type: recall value: 73.22134387351778 task: type: BitextMining - dataset: config: ukr_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (ukr_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.90118577075098 - type: f1 value: 99.86824769433464 - type: main_score value: 99.86824769433464 - type: precision value: 99.85177865612648 - type: recall value: 99.90118577075098 task: type: BitextMining - dataset: config: afr_Latn-rus_Cyrl name: MTEB FloresBitextMining (afr_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.9459815546772 - type: main_score value: 98.9459815546772 - type: precision value: 98.81422924901186 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: bho_Deva-rus_Cyrl name: MTEB FloresBitextMining (bho_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.0711462450593 - type: f1 value: 93.12182382834557 - type: main_score value: 93.12182382834557 - type: precision value: 92.7523453232338 - type: recall value: 94.0711462450593 task: type: BitextMining - dataset: config: eus_Latn-rus_Cyrl name: MTEB FloresBitextMining (eus_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.19367588932806 - type: f1 value: 91.23604975587072 - type: main_score value: 91.23604975587072 - type: precision value: 90.86697443588663 - type: recall value: 92.19367588932806 task: type: BitextMining - dataset: config: ibo_Latn-rus_Cyrl name: MTEB FloresBitextMining (ibo_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 82.21343873517787 - type: f1 value: 80.17901604858126 - type: main_score value: 80.17901604858126 - type: precision value: 79.3792284780028 - type: recall value: 82.21343873517787 task: type: BitextMining - dataset: config: kmr_Latn-rus_Cyrl name: MTEB FloresBitextMining (kmr_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 68.67588932806325 - type: f1 value: 66.72311714750278 - type: main_score value: 66.72311714750278 - type: precision value: 66.00178401554004 - type: recall value: 68.67588932806325 task: type: BitextMining - dataset: config: min_Latn-rus_Cyrl name: MTEB FloresBitextMining (min_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 78.65612648221344 - type: f1 value: 76.26592719972166 - type: main_score value: 76.26592719972166 - type: precision value: 75.39980459997484 - type: recall value: 78.65612648221344 task: type: BitextMining - dataset: config: por_Latn-rus_Cyrl name: MTEB FloresBitextMining 
(por_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.83794466403161 - type: f1 value: 95.9669678147939 - type: main_score value: 95.9669678147939 - type: precision value: 95.59453227931488 - type: recall value: 96.83794466403161 task: type: BitextMining - dataset: config: sun_Latn-rus_Cyrl name: MTEB FloresBitextMining (sun_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.4901185770751 - type: f1 value: 91.66553983773662 - type: main_score value: 91.66553983773662 - type: precision value: 91.34530928009188 - type: recall value: 92.4901185770751 task: type: BitextMining - dataset: config: umb_Latn-rus_Cyrl name: MTEB FloresBitextMining (umb_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 41.00790513833992 - type: f1 value: 38.21319326004483 - type: main_score value: 38.21319326004483 - type: precision value: 37.200655467675546 - type: recall value: 41.00790513833992 task: type: BitextMining - dataset: config: ajp_Arab-rus_Cyrl name: MTEB FloresBitextMining (ajp_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.35573122529645 - type: f1 value: 93.97233201581028 - type: main_score value: 93.97233201581028 - type: precision value: 93.33333333333333 - type: recall value: 95.35573122529645 task: type: BitextMining - dataset: config: bjn_Arab-rus_Cyrl name: MTEB FloresBitextMining (bjn_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 3.6561264822134385 - type: f1 value: 3.1071978056336484 - type: main_score value: 3.1071978056336484 - type: precision value: 3.0039741229718215 - type: recall value: 3.6561264822134385 task: type: BitextMining - dataset: config: ewe_Latn-rus_Cyrl name: MTEB FloresBitextMining (ewe_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 62.845849802371546 - type: f1 value: 59.82201175670472 - type: main_score value: 59.82201175670472 - type: precision value: 58.72629236362003 - type: recall value: 62.845849802371546 task: type: BitextMining - dataset: config: ilo_Latn-rus_Cyrl name: MTEB FloresBitextMining (ilo_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 83.10276679841897 - type: f1 value: 80.75065288987582 - type: main_score value: 80.75065288987582 - type: precision value: 79.80726451662179 - type: recall value: 83.10276679841897 task: type: BitextMining - dataset: config: knc_Arab-rus_Cyrl name: MTEB FloresBitextMining (knc_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 10.079051383399209 - type: f1 value: 8.759282456080921 - type: main_score value: 8.759282456080921 - type: precision value: 8.474735138956142 - type: recall value: 10.079051383399209 task: type: BitextMining - dataset: config: mkd_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (mkd_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.55072463768116 - type: main_score value: 98.55072463768116 - type: precision value: 
98.36956521739131 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: prs_Arab-rus_Cyrl name: MTEB FloresBitextMining (prs_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.68247694334651 - type: main_score value: 98.68247694334651 - type: precision value: 98.51778656126481 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: swe_Latn-rus_Cyrl name: MTEB FloresBitextMining (swe_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.22595520421606 - type: main_score value: 99.22595520421606 - type: precision value: 99.14361001317523 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: urd_Arab-rus_Cyrl name: MTEB FloresBitextMining (urd_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.82608695652173 - type: f1 value: 97.25625823451911 - type: main_score value: 97.25625823451911 - type: precision value: 97.03063241106719 - type: recall value: 97.82608695652173 task: type: BitextMining - dataset: config: aka_Latn-rus_Cyrl name: MTEB FloresBitextMining (aka_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.22529644268775 - type: f1 value: 77.94307687941227 - type: main_score value: 77.94307687941227 - type: precision value: 76.58782793293665 - type: recall value: 81.22529644268775 task: type: BitextMining - dataset: config: bjn_Latn-rus_Cyrl name: MTEB FloresBitextMining (bjn_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 85.27667984189723 - type: f1 value: 83.6869192829922 - type: main_score value: 83.6869192829922 - type: precision value: 83.08670670691656 - type: recall value: 85.27667984189723 task: type: BitextMining - dataset: config: fao_Latn-rus_Cyrl name: MTEB FloresBitextMining (fao_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 80.9288537549407 - type: f1 value: 79.29806087454745 - type: main_score value: 79.29806087454745 - type: precision value: 78.71445871526987 - type: recall value: 80.9288537549407 task: type: BitextMining - dataset: config: ind_Latn-rus_Cyrl name: MTEB FloresBitextMining (ind_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.12252964426878 - type: f1 value: 97.5296442687747 - type: main_score value: 97.5296442687747 - type: precision value: 97.23320158102767 - type: recall value: 98.12252964426878 task: type: BitextMining - dataset: config: knc_Latn-rus_Cyrl name: MTEB FloresBitextMining (knc_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 33.49802371541502 - type: f1 value: 32.02378215033989 - type: main_score value: 32.02378215033989 - type: precision value: 31.511356103747406 - type: recall value: 33.49802371541502 task: type: BitextMining - dataset: config: mlt_Latn-rus_Cyrl name: MTEB FloresBitextMining (mlt_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy 
value: 91.40316205533597 - type: f1 value: 90.35317684386006 - type: main_score value: 90.35317684386006 - type: precision value: 89.94845939633488 - type: recall value: 91.40316205533597 task: type: BitextMining - dataset: config: quy_Latn-rus_Cyrl name: MTEB FloresBitextMining (quy_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 40.612648221343875 - type: f1 value: 38.74337544712602 - type: main_score value: 38.74337544712602 - type: precision value: 38.133716022178575 - type: recall value: 40.612648221343875 task: type: BitextMining - dataset: config: swh_Latn-rus_Cyrl name: MTEB FloresBitextMining (swh_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.13438735177866 - type: f1 value: 96.47435897435898 - type: main_score value: 96.47435897435898 - type: precision value: 96.18741765480895 - type: recall value: 97.13438735177866 task: type: BitextMining - dataset: config: uzn_Latn-rus_Cyrl name: MTEB FloresBitextMining (uzn_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.83794466403161 - type: f1 value: 96.26355528529442 - type: main_score value: 96.26355528529442 - type: precision value: 96.0501756697409 - type: recall value: 96.83794466403161 task: type: BitextMining - dataset: config: als_Latn-rus_Cyrl name: MTEB FloresBitextMining (als_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.6907114624506 - type: main_score value: 98.6907114624506 - type: precision value: 98.6142480707698 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: bod_Tibt-rus_Cyrl name: MTEB FloresBitextMining (bod_Tibt-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 1.0869565217391304 - type: f1 value: 0.9224649610442628 - type: main_score value: 0.9224649610442628 - type: precision value: 0.8894275740459898 - type: recall value: 1.0869565217391304 task: type: BitextMining - dataset: config: fij_Latn-rus_Cyrl name: MTEB FloresBitextMining (fij_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 63.24110671936759 - type: f1 value: 60.373189068189525 - type: main_score value: 60.373189068189525 - type: precision value: 59.32326368115546 - type: recall value: 63.24110671936759 task: type: BitextMining - dataset: config: isl_Latn-rus_Cyrl name: MTEB FloresBitextMining (isl_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.03162055335969 - type: f1 value: 87.3102634715907 - type: main_score value: 87.3102634715907 - type: precision value: 86.65991814698712 - type: recall value: 89.03162055335969 task: type: BitextMining - dataset: config: kon_Latn-rus_Cyrl name: MTEB FloresBitextMining (kon_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 73.91304347826086 - type: f1 value: 71.518235523573 - type: main_score value: 71.518235523573 - type: precision value: 70.58714102449801 - type: recall value: 73.91304347826086 task: type: BitextMining - dataset: config: mni_Beng-rus_Cyrl name: MTEB 
FloresBitextMining (mni_Beng-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 29.545454545454547 - type: f1 value: 27.59513619889114 - type: main_score value: 27.59513619889114 - type: precision value: 26.983849851025344 - type: recall value: 29.545454545454547 task: type: BitextMining - dataset: config: ron_Latn-rus_Cyrl name: MTEB FloresBitextMining (ron_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.2094861660079 - type: main_score value: 99.2094861660079 - type: precision value: 99.1106719367589 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: szl_Latn-rus_Cyrl name: MTEB FloresBitextMining (szl_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 86.26482213438736 - type: f1 value: 85.18912031587512 - type: main_score value: 85.18912031587512 - type: precision value: 84.77199409959775 - type: recall value: 86.26482213438736 task: type: BitextMining - dataset: config: vec_Latn-rus_Cyrl name: MTEB FloresBitextMining (vec_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 85.67193675889328 - type: f1 value: 84.62529734716581 - type: main_score value: 84.62529734716581 - type: precision value: 84.2611422440705 - type: recall value: 85.67193675889328 task: type: BitextMining - dataset: config: amh_Ethi-rus_Cyrl name: MTEB FloresBitextMining (amh_Ethi-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.76284584980237 - type: f1 value: 93.91735076517685 - type: main_score value: 93.91735076517685 - type: precision value: 93.57553798858147 - type: recall value: 94.76284584980237 task: type: BitextMining - dataset: config: bos_Latn-rus_Cyrl name: MTEB FloresBitextMining (bos_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 99.05655938264634 - type: main_score value: 99.05655938264634 - type: precision value: 99.01185770750988 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: fin_Latn-rus_Cyrl name: MTEB FloresBitextMining (fin_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.43741765480895 - type: main_score value: 97.43741765480895 - type: precision value: 97.1590909090909 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: ita_Latn-rus_Cyrl name: MTEB FloresBitextMining (ita_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.70355731225297 - type: f1 value: 99.60474308300395 - type: main_score value: 99.60474308300395 - type: precision value: 99.55533596837944 - type: recall value: 99.70355731225297 task: type: BitextMining - dataset: config: kor_Hang-rus_Cyrl name: MTEB FloresBitextMining (kor_Hang-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.33201581027669 - type: f1 value: 96.49868247694334 - type: main_score value: 96.49868247694334 - type: precision value: 
96.10507246376811 - type: recall value: 97.33201581027669 task: type: BitextMining - dataset: config: mos_Latn-rus_Cyrl name: MTEB FloresBitextMining (mos_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 34.683794466403164 - type: f1 value: 32.766819308009076 - type: main_score value: 32.766819308009076 - type: precision value: 32.1637493670237 - type: recall value: 34.683794466403164 task: type: BitextMining - dataset: config: run_Latn-rus_Cyrl name: MTEB FloresBitextMining (run_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 83.399209486166 - type: f1 value: 81.10578750604326 - type: main_score value: 81.10578750604326 - type: precision value: 80.16763162673529 - type: recall value: 83.399209486166 task: type: BitextMining - dataset: config: tam_Taml-rus_Cyrl name: MTEB FloresBitextMining (tam_Taml-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.41897233201581 - type: f1 value: 98.01548089591567 - type: main_score value: 98.01548089591567 - type: precision value: 97.84020327498588 - type: recall value: 98.41897233201581 task: type: BitextMining - dataset: config: vie_Latn-rus_Cyrl name: MTEB FloresBitextMining (vie_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.1106719367589 - type: f1 value: 98.81422924901186 - type: main_score value: 98.81422924901186 - type: precision value: 98.66600790513834 - type: recall value: 99.1106719367589 task: type: BitextMining - dataset: config: apc_Arab-rus_Cyrl name: MTEB FloresBitextMining (apc_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.87351778656127 - type: f1 value: 92.10803689064558 - type: main_score value: 92.10803689064558 - type: precision value: 91.30434782608695 - type: recall value: 93.87351778656127 task: type: BitextMining - dataset: config: bug_Latn-rus_Cyrl name: MTEB FloresBitextMining (bug_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 57.608695652173914 - type: f1 value: 54.95878654927162 - type: main_score value: 54.95878654927162 - type: precision value: 54.067987427805654 - type: recall value: 57.608695652173914 task: type: BitextMining - dataset: config: fon_Latn-rus_Cyrl name: MTEB FloresBitextMining (fon_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 61.95652173913043 - type: f1 value: 58.06537275812945 - type: main_score value: 58.06537275812945 - type: precision value: 56.554057596959204 - type: recall value: 61.95652173913043 task: type: BitextMining - dataset: config: jav_Latn-rus_Cyrl name: MTEB FloresBitextMining (jav_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.47826086956522 - type: f1 value: 92.4784405318002 - type: main_score value: 92.4784405318002 - type: precision value: 92.09168143201127 - type: recall value: 93.47826086956522 task: type: BitextMining - dataset: config: lao_Laoo-rus_Cyrl name: MTEB FloresBitextMining (lao_Laoo-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: 
accuracy value: 91.10671936758892 - type: f1 value: 89.76104922745239 - type: main_score value: 89.76104922745239 - type: precision value: 89.24754593232855 - type: recall value: 91.10671936758892 task: type: BitextMining - dataset: config: mri_Latn-rus_Cyrl name: MTEB FloresBitextMining (mri_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 71.14624505928853 - type: f1 value: 68.26947125119062 - type: main_score value: 68.26947125119062 - type: precision value: 67.15942311051006 - type: recall value: 71.14624505928853 task: type: BitextMining - dataset: config: rus_Cyrl-ace_Arab name: MTEB FloresBitextMining (rus_Cyrl-ace_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 19.565217391304348 - type: f1 value: 16.321465000323805 - type: main_score value: 16.321465000323805 - type: precision value: 15.478527409347508 - type: recall value: 19.565217391304348 task: type: BitextMining - dataset: config: rus_Cyrl-bam_Latn name: MTEB FloresBitextMining (rus_Cyrl-bam_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 73.41897233201581 - type: f1 value: 68.77366228182746 - type: main_score value: 68.77366228182746 - type: precision value: 66.96012924273795 - type: recall value: 73.41897233201581 task: type: BitextMining - dataset: config: rus_Cyrl-dzo_Tibt name: MTEB FloresBitextMining (rus_Cyrl-dzo_Tibt) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 0.592885375494071 - type: f1 value: 0.02458062426370458 - type: main_score value: 0.02458062426370458 - type: precision value: 0.012824114724683876 - type: recall value: 0.592885375494071 task: type: BitextMining - dataset: config: rus_Cyrl-hin_Deva name: MTEB FloresBitextMining (rus_Cyrl-hin_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.90118577075098 - type: f1 value: 99.86824769433464 - type: main_score value: 99.86824769433464 - type: precision value: 99.85177865612648 - type: recall value: 99.90118577075098 task: type: BitextMining - dataset: config: rus_Cyrl-khm_Khmr name: MTEB FloresBitextMining (rus_Cyrl-khm_Khmr) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.13438735177866 - type: f1 value: 96.24505928853755 - type: main_score value: 96.24505928853755 - type: precision value: 95.81686429512516 - type: recall value: 97.13438735177866 task: type: BitextMining - dataset: config: rus_Cyrl-mag_Deva name: MTEB FloresBitextMining (rus_Cyrl-mag_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.50592885375494 - type: f1 value: 99.35770750988142 - type: main_score value: 99.35770750988142 - type: precision value: 99.29183135704875 - type: recall value: 99.50592885375494 task: type: BitextMining - dataset: config: rus_Cyrl-pap_Latn name: MTEB FloresBitextMining (rus_Cyrl-pap_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.93675889328063 - type: f1 value: 96.05072463768116 - type: main_score value: 96.05072463768116 - type: precision value: 95.66040843214758 - type: recall value: 96.93675889328063 task: type: BitextMining - dataset: config: rus_Cyrl-sot_Latn 
name: MTEB FloresBitextMining (rus_Cyrl-sot_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.67588932806325 - type: f1 value: 91.7786561264822 - type: main_score value: 91.7786561264822 - type: precision value: 90.91238471673255 - type: recall value: 93.67588932806325 task: type: BitextMining - dataset: config: rus_Cyrl-tur_Latn name: MTEB FloresBitextMining (rus_Cyrl-tur_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.68247694334651 - type: main_score value: 98.68247694334651 - type: precision value: 98.51778656126481 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: rus_Cyrl-ace_Latn name: MTEB FloresBitextMining (rus_Cyrl-ace_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 74.1106719367589 - type: f1 value: 70.21737923911836 - type: main_score value: 70.21737923911836 - type: precision value: 68.7068791410511 - type: recall value: 74.1106719367589 task: type: BitextMining - dataset: config: rus_Cyrl-ban_Latn name: MTEB FloresBitextMining (rus_Cyrl-ban_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.7193675889328 - type: f1 value: 78.76470334510617 - type: main_score value: 78.76470334510617 - type: precision value: 77.76208475761422 - type: recall value: 81.7193675889328 task: type: BitextMining - dataset: config: rus_Cyrl-ell_Grek name: MTEB FloresBitextMining (rus_Cyrl-ell_Grek) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.3201581027668 - type: f1 value: 97.76021080368908 - type: main_score value: 97.76021080368908 - type: precision value: 97.48023715415019 - type: recall value: 98.3201581027668 task: type: BitextMining - dataset: config: rus_Cyrl-hne_Deva name: MTEB FloresBitextMining (rus_Cyrl-hne_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.51778656126481 - type: f1 value: 98.0566534914361 - type: main_score value: 98.0566534914361 - type: precision value: 97.82608695652173 - type: recall value: 98.51778656126481 task: type: BitextMining - dataset: config: rus_Cyrl-kik_Latn name: MTEB FloresBitextMining (rus_Cyrl-kik_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 80.73122529644269 - type: f1 value: 76.42689244220864 - type: main_score value: 76.42689244220864 - type: precision value: 74.63877909530083 - type: recall value: 80.73122529644269 task: type: BitextMining - dataset: config: rus_Cyrl-mai_Deva name: MTEB FloresBitextMining (rus_Cyrl-mai_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.56719367588933 - type: main_score value: 98.56719367588933 - type: precision value: 98.40250329380763 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: rus_Cyrl-pbt_Arab name: MTEB FloresBitextMining (rus_Cyrl-pbt_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.5296442687747 - type: f1 value: 96.73913043478261 - type: main_score value: 96.73913043478261 - type: precision 
value: 96.36034255599473 - type: recall value: 97.5296442687747 task: type: BitextMining - dataset: config: rus_Cyrl-spa_Latn name: MTEB FloresBitextMining (rus_Cyrl-spa_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.20948616600789 - type: main_score value: 99.20948616600789 - type: precision value: 99.1106719367589 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: rus_Cyrl-twi_Latn name: MTEB FloresBitextMining (rus_Cyrl-twi_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 82.01581027667984 - type: f1 value: 78.064787822953 - type: main_score value: 78.064787822953 - type: precision value: 76.43272186750448 - type: recall value: 82.01581027667984 task: type: BitextMining - dataset: config: rus_Cyrl-acm_Arab name: MTEB FloresBitextMining (rus_Cyrl-acm_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.3201581027668 - type: f1 value: 97.76021080368908 - type: main_score value: 97.76021080368908 - type: precision value: 97.48023715415019 - type: recall value: 98.3201581027668 task: type: BitextMining - dataset: config: rus_Cyrl-bel_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-bel_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.22134387351778 - type: f1 value: 97.67786561264822 - type: main_score value: 97.67786561264822 - type: precision value: 97.4308300395257 - type: recall value: 98.22134387351778 task: type: BitextMining - dataset: config: rus_Cyrl-eng_Latn name: MTEB FloresBitextMining (rus_Cyrl-eng_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.70355731225297 - type: f1 value: 99.60474308300395 - type: main_score value: 99.60474308300395 - type: precision value: 99.55533596837944 - type: recall value: 99.70355731225297 task: type: BitextMining - dataset: config: rus_Cyrl-hrv_Latn name: MTEB FloresBitextMining (rus_Cyrl-hrv_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.1106719367589 - type: f1 value: 98.83069828722002 - type: main_score value: 98.83069828722002 - type: precision value: 98.69894598155466 - type: recall value: 99.1106719367589 task: type: BitextMining - dataset: config: rus_Cyrl-kin_Latn name: MTEB FloresBitextMining (rus_Cyrl-kin_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.37944664031622 - type: f1 value: 91.53162055335969 - type: main_score value: 91.53162055335969 - type: precision value: 90.71475625823452 - type: recall value: 93.37944664031622 task: type: BitextMining - dataset: config: rus_Cyrl-mal_Mlym name: MTEB FloresBitextMining (rus_Cyrl-mal_Mlym) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.07773386034255 - type: main_score value: 99.07773386034255 - type: precision value: 98.96245059288538 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: rus_Cyrl-pes_Arab name: MTEB FloresBitextMining (rus_Cyrl-pes_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: 
accuracy value: 98.71541501976284 - type: f1 value: 98.30368906455863 - type: main_score value: 98.30368906455863 - type: precision value: 98.10606060606061 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-srd_Latn name: MTEB FloresBitextMining (rus_Cyrl-srd_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.03162055335969 - type: f1 value: 86.11048371917937 - type: main_score value: 86.11048371917937 - type: precision value: 84.86001317523056 - type: recall value: 89.03162055335969 task: type: BitextMining - dataset: config: rus_Cyrl-tzm_Tfng name: MTEB FloresBitextMining (rus_Cyrl-tzm_Tfng) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 12.351778656126482 - type: f1 value: 10.112177999067715 - type: main_score value: 10.112177999067715 - type: precision value: 9.53495885438645 - type: recall value: 12.351778656126482 task: type: BitextMining - dataset: config: rus_Cyrl-acq_Arab name: MTEB FloresBitextMining (rus_Cyrl-acq_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.55072463768116 - type: main_score value: 98.55072463768116 - type: precision value: 98.36956521739131 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: rus_Cyrl-bem_Latn name: MTEB FloresBitextMining (rus_Cyrl-bem_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 73.22134387351778 - type: f1 value: 68.30479412989295 - type: main_score value: 68.30479412989295 - type: precision value: 66.40073447632736 - type: recall value: 73.22134387351778 task: type: BitextMining - dataset: config: rus_Cyrl-epo_Latn name: MTEB FloresBitextMining (rus_Cyrl-epo_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.1106719367589 - type: f1 value: 98.81422924901186 - type: main_score value: 98.81422924901186 - type: precision value: 98.66600790513834 - type: recall value: 99.1106719367589 task: type: BitextMining - dataset: config: rus_Cyrl-hun_Latn name: MTEB FloresBitextMining (rus_Cyrl-hun_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.83794466403161 - type: f1 value: 95.88274044795784 - type: main_score value: 95.88274044795784 - type: precision value: 95.45454545454545 - type: recall value: 96.83794466403161 task: type: BitextMining - dataset: config: rus_Cyrl-kir_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-kir_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.34387351778656 - type: f1 value: 95.49280429715212 - type: main_score value: 95.49280429715212 - type: precision value: 95.14163372859026 - type: recall value: 96.34387351778656 task: type: BitextMining - dataset: config: rus_Cyrl-mar_Deva name: MTEB FloresBitextMining (rus_Cyrl-mar_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.28722002635047 - type: main_score value: 98.28722002635047 - type: precision value: 98.07312252964427 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-plt_Latn name: MTEB 
FloresBitextMining (rus_Cyrl-plt_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 88.04347826086956 - type: f1 value: 85.14328063241106 - type: main_score value: 85.14328063241106 - type: precision value: 83.96339168078298 - type: recall value: 88.04347826086956 task: type: BitextMining - dataset: config: rus_Cyrl-srp_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-srp_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.2094861660079 - type: main_score value: 99.2094861660079 - type: precision value: 99.1106719367589 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: rus_Cyrl-uig_Arab name: MTEB FloresBitextMining (rus_Cyrl-uig_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.19367588932806 - type: f1 value: 89.98541313758706 - type: main_score value: 89.98541313758706 - type: precision value: 89.01021080368906 - type: recall value: 92.19367588932806 task: type: BitextMining - dataset: config: rus_Cyrl-aeb_Arab name: MTEB FloresBitextMining (rus_Cyrl-aeb_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.8498023715415 - type: f1 value: 94.63109354413703 - type: main_score value: 94.63109354413703 - type: precision value: 94.05467720685111 - type: recall value: 95.8498023715415 task: type: BitextMining - dataset: config: rus_Cyrl-ben_Beng name: MTEB FloresBitextMining (rus_Cyrl-ben_Beng) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.2094861660079 - type: main_score value: 99.2094861660079 - type: precision value: 99.1106719367589 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: rus_Cyrl-est_Latn name: MTEB FloresBitextMining (rus_Cyrl-est_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.55335968379447 - type: f1 value: 94.2588932806324 - type: main_score value: 94.2588932806324 - type: precision value: 93.65118577075098 - type: recall value: 95.55335968379447 task: type: BitextMining - dataset: config: rus_Cyrl-hye_Armn name: MTEB FloresBitextMining (rus_Cyrl-hye_Armn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.28722002635045 - type: main_score value: 98.28722002635045 - type: precision value: 98.07312252964427 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-kmb_Latn name: MTEB FloresBitextMining (rus_Cyrl-kmb_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 54.24901185770751 - type: f1 value: 49.46146674116913 - type: main_score value: 49.46146674116913 - type: precision value: 47.81033799314432 - type: recall value: 54.24901185770751 task: type: BitextMining - dataset: config: rus_Cyrl-min_Arab name: MTEB FloresBitextMining (rus_Cyrl-min_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 15.810276679841898 - type: f1 value: 13.271207641419332 - type: main_score value: 13.271207641419332 - type: precision value: 
12.510673148766033 - type: recall value: 15.810276679841898 task: type: BitextMining - dataset: config: rus_Cyrl-pol_Latn name: MTEB FloresBitextMining (rus_Cyrl-pol_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.32674571805006 - type: main_score value: 98.32674571805006 - type: precision value: 98.14723320158103 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-ssw_Latn name: MTEB FloresBitextMining (rus_Cyrl-ssw_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 80.8300395256917 - type: f1 value: 76.51717847370023 - type: main_score value: 76.51717847370023 - type: precision value: 74.74143610013175 - type: recall value: 80.8300395256917 task: type: BitextMining - dataset: config: rus_Cyrl-ukr_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-ukr_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.60474308300395 - type: f1 value: 99.4729907773386 - type: main_score value: 99.4729907773386 - type: precision value: 99.40711462450594 - type: recall value: 99.60474308300395 task: type: BitextMining - dataset: config: rus_Cyrl-afr_Latn name: MTEB FloresBitextMining (rus_Cyrl-afr_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.1106719367589 - type: f1 value: 98.81422924901186 - type: main_score value: 98.81422924901186 - type: precision value: 98.66600790513834 - type: recall value: 99.1106719367589 task: type: BitextMining - dataset: config: rus_Cyrl-bho_Deva name: MTEB FloresBitextMining (rus_Cyrl-bho_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.6403162055336 - type: f1 value: 95.56982872200265 - type: main_score value: 95.56982872200265 - type: precision value: 95.0592885375494 - type: recall value: 96.6403162055336 task: type: BitextMining - dataset: config: rus_Cyrl-eus_Latn name: MTEB FloresBitextMining (rus_Cyrl-eus_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.62845849802372 - type: f1 value: 96.9038208168643 - type: main_score value: 96.9038208168643 - type: precision value: 96.55797101449275 - type: recall value: 97.62845849802372 task: type: BitextMining - dataset: config: rus_Cyrl-ibo_Latn name: MTEB FloresBitextMining (rus_Cyrl-ibo_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.2292490118577 - type: f1 value: 86.35234330886506 - type: main_score value: 86.35234330886506 - type: precision value: 85.09881422924902 - type: recall value: 89.2292490118577 task: type: BitextMining - dataset: config: rus_Cyrl-kmr_Latn name: MTEB FloresBitextMining (rus_Cyrl-kmr_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 83.49802371541502 - type: f1 value: 79.23630717108978 - type: main_score value: 79.23630717108978 - type: precision value: 77.48188405797102 - type: recall value: 83.49802371541502 task: type: BitextMining - dataset: config: rus_Cyrl-min_Latn name: MTEB FloresBitextMining (rus_Cyrl-min_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy 
value: 79.34782608695652 - type: f1 value: 75.31689928429059 - type: main_score value: 75.31689928429059 - type: precision value: 73.91519410541149 - type: recall value: 79.34782608695652 task: type: BitextMining - dataset: config: rus_Cyrl-por_Latn name: MTEB FloresBitextMining (rus_Cyrl-por_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.54150197628458 - type: f1 value: 95.53218520609825 - type: main_score value: 95.53218520609825 - type: precision value: 95.07575757575756 - type: recall value: 96.54150197628458 task: type: BitextMining - dataset: config: rus_Cyrl-sun_Latn name: MTEB FloresBitextMining (rus_Cyrl-sun_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.2806324110672 - type: f1 value: 91.56973461321287 - type: main_score value: 91.56973461321287 - type: precision value: 90.84396334890405 - type: recall value: 93.2806324110672 task: type: BitextMining - dataset: config: rus_Cyrl-umb_Latn name: MTEB FloresBitextMining (rus_Cyrl-umb_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 51.87747035573123 - type: f1 value: 46.36591778884269 - type: main_score value: 46.36591778884269 - type: precision value: 44.57730391234227 - type: recall value: 51.87747035573123 task: type: BitextMining - dataset: config: rus_Cyrl-ajp_Arab name: MTEB FloresBitextMining (rus_Cyrl-ajp_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.30368906455863 - type: main_score value: 98.30368906455863 - type: precision value: 98.10606060606061 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-bjn_Arab name: MTEB FloresBitextMining (rus_Cyrl-bjn_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 14.82213438735178 - type: f1 value: 12.365434276616856 - type: main_score value: 12.365434276616856 - type: precision value: 11.802079517180589 - type: recall value: 14.82213438735178 task: type: BitextMining - dataset: config: rus_Cyrl-ewe_Latn name: MTEB FloresBitextMining (rus_Cyrl-ewe_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 71.44268774703558 - type: f1 value: 66.74603174603175 - type: main_score value: 66.74603174603175 - type: precision value: 64.99933339607253 - type: recall value: 71.44268774703558 task: type: BitextMining - dataset: config: rus_Cyrl-ilo_Latn name: MTEB FloresBitextMining (rus_Cyrl-ilo_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 85.86956521739131 - type: f1 value: 83.00139015960917 - type: main_score value: 83.00139015960917 - type: precision value: 81.91411396574439 - type: recall value: 85.86956521739131 task: type: BitextMining - dataset: config: rus_Cyrl-knc_Arab name: MTEB FloresBitextMining (rus_Cyrl-knc_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 14.525691699604742 - type: f1 value: 12.618283715726806 - type: main_score value: 12.618283715726806 - type: precision value: 12.048458493742352 - type: recall value: 14.525691699604742 task: type: BitextMining - dataset: config: rus_Cyrl-mkd_Cyrl name: MTEB 
FloresBitextMining (rus_Cyrl-mkd_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.22595520421606 - type: main_score value: 99.22595520421606 - type: precision value: 99.14361001317523 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: rus_Cyrl-prs_Arab name: MTEB FloresBitextMining (rus_Cyrl-prs_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.07773386034255 - type: main_score value: 99.07773386034255 - type: precision value: 98.96245059288538 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: rus_Cyrl-swe_Latn name: MTEB FloresBitextMining (rus_Cyrl-swe_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.07773386034256 - type: main_score value: 99.07773386034256 - type: precision value: 98.96245059288538 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: rus_Cyrl-urd_Arab name: MTEB FloresBitextMining (rus_Cyrl-urd_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.61660079051383 - type: f1 value: 98.15546772068511 - type: main_score value: 98.15546772068511 - type: precision value: 97.92490118577075 - type: recall value: 98.61660079051383 task: type: BitextMining - dataset: config: rus_Cyrl-aka_Latn name: MTEB FloresBitextMining (rus_Cyrl-aka_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.02766798418972 - type: f1 value: 76.73277809147375 - type: main_score value: 76.73277809147375 - type: precision value: 74.97404165882426 - type: recall value: 81.02766798418972 task: type: BitextMining - dataset: config: rus_Cyrl-bjn_Latn name: MTEB FloresBitextMining (rus_Cyrl-bjn_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 86.7588932806324 - type: f1 value: 83.92064566965753 - type: main_score value: 83.92064566965753 - type: precision value: 82.83734079929732 - type: recall value: 86.7588932806324 task: type: BitextMining - dataset: config: rus_Cyrl-fao_Latn name: MTEB FloresBitextMining (rus_Cyrl-fao_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 88.43873517786561 - type: f1 value: 85.48136645962732 - type: main_score value: 85.48136645962732 - type: precision value: 84.23418972332016 - type: recall value: 88.43873517786561 task: type: BitextMining - dataset: config: rus_Cyrl-ind_Latn name: MTEB FloresBitextMining (rus_Cyrl-ind_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.68247694334651 - type: main_score value: 98.68247694334651 - type: precision value: 98.51778656126481 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: rus_Cyrl-knc_Latn name: MTEB FloresBitextMining (rus_Cyrl-knc_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 45.8498023715415 - type: f1 value: 40.112030865489366 - type: main_score value: 40.112030865489366 - type: precision 
value: 38.28262440050776 - type: recall value: 45.8498023715415 task: type: BitextMining - dataset: config: rus_Cyrl-mlt_Latn name: MTEB FloresBitextMining (rus_Cyrl-mlt_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.18181818181817 - type: f1 value: 91.30787690570298 - type: main_score value: 91.30787690570298 - type: precision value: 90.4983060417843 - type: recall value: 93.18181818181817 task: type: BitextMining - dataset: config: rus_Cyrl-quy_Latn name: MTEB FloresBitextMining (rus_Cyrl-quy_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 62.450592885375485 - type: f1 value: 57.28742975628178 - type: main_score value: 57.28742975628178 - type: precision value: 55.56854987623269 - type: recall value: 62.450592885375485 task: type: BitextMining - dataset: config: rus_Cyrl-swh_Latn name: MTEB FloresBitextMining (rus_Cyrl-swh_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.3201581027668 - type: f1 value: 97.77667984189723 - type: main_score value: 97.77667984189723 - type: precision value: 97.51317523056655 - type: recall value: 98.3201581027668 task: type: BitextMining - dataset: config: rus_Cyrl-uzn_Latn name: MTEB FloresBitextMining (rus_Cyrl-uzn_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.12252964426878 - type: f1 value: 97.59081498211933 - type: main_score value: 97.59081498211933 - type: precision value: 97.34848484848484 - type: recall value: 98.12252964426878 task: type: BitextMining - dataset: config: rus_Cyrl-als_Latn name: MTEB FloresBitextMining (rus_Cyrl-als_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.09420289855073 - type: main_score value: 99.09420289855073 - type: precision value: 98.99538866930172 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: rus_Cyrl-bod_Tibt name: MTEB FloresBitextMining (rus_Cyrl-bod_Tibt) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 11.561264822134387 - type: f1 value: 8.121312045385636 - type: main_score value: 8.121312045385636 - type: precision value: 7.350577020893972 - type: recall value: 11.561264822134387 task: type: BitextMining - dataset: config: rus_Cyrl-fij_Latn name: MTEB FloresBitextMining (rus_Cyrl-fij_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 72.23320158102767 - type: f1 value: 67.21000233846082 - type: main_score value: 67.21000233846082 - type: precision value: 65.3869439739005 - type: recall value: 72.23320158102767 task: type: BitextMining - dataset: config: rus_Cyrl-isl_Latn name: MTEB FloresBitextMining (rus_Cyrl-isl_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.99604743083005 - type: f1 value: 89.75955204216073 - type: main_score value: 89.75955204216073 - type: precision value: 88.7598814229249 - type: recall value: 91.99604743083005 task: type: BitextMining - dataset: config: rus_Cyrl-kon_Latn name: MTEB FloresBitextMining (rus_Cyrl-kon_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - 
type: accuracy value: 81.81818181818183 - type: f1 value: 77.77800098452272 - type: main_score value: 77.77800098452272 - type: precision value: 76.1521268586486 - type: recall value: 81.81818181818183 task: type: BitextMining - dataset: config: rus_Cyrl-mni_Beng name: MTEB FloresBitextMining (rus_Cyrl-mni_Beng) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 54.74308300395256 - type: f1 value: 48.97285299254615 - type: main_score value: 48.97285299254615 - type: precision value: 46.95125742968299 - type: recall value: 54.74308300395256 task: type: BitextMining - dataset: config: rus_Cyrl-ron_Latn name: MTEB FloresBitextMining (rus_Cyrl-ron_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.22134387351778 - type: f1 value: 97.64492753623189 - type: main_score value: 97.64492753623189 - type: precision value: 97.36495388669302 - type: recall value: 98.22134387351778 task: type: BitextMining - dataset: config: rus_Cyrl-szl_Latn name: MTEB FloresBitextMining (rus_Cyrl-szl_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.09486166007905 - type: f1 value: 90.10375494071147 - type: main_score value: 90.10375494071147 - type: precision value: 89.29606625258798 - type: recall value: 92.09486166007905 task: type: BitextMining - dataset: config: rus_Cyrl-vec_Latn name: MTEB FloresBitextMining (rus_Cyrl-vec_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.4901185770751 - type: f1 value: 90.51430453604365 - type: main_score value: 90.51430453604365 - type: precision value: 89.69367588932808 - type: recall value: 92.4901185770751 task: type: BitextMining - dataset: config: rus_Cyrl-amh_Ethi name: MTEB FloresBitextMining (rus_Cyrl-amh_Ethi) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.82608695652173 - type: f1 value: 97.11791831357048 - type: main_score value: 97.11791831357048 - type: precision value: 96.77206851119894 - type: recall value: 97.82608695652173 task: type: BitextMining - dataset: config: rus_Cyrl-bos_Latn name: MTEB FloresBitextMining (rus_Cyrl-bos_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.55072463768116 - type: main_score value: 98.55072463768116 - type: precision value: 98.36956521739131 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: rus_Cyrl-fin_Latn name: MTEB FloresBitextMining (rus_Cyrl-fin_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.65217391304348 - type: f1 value: 94.4235836627141 - type: main_score value: 94.4235836627141 - type: precision value: 93.84881422924902 - type: recall value: 95.65217391304348 task: type: BitextMining - dataset: config: rus_Cyrl-ita_Latn name: MTEB FloresBitextMining (rus_Cyrl-ita_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.55072463768117 - type: main_score value: 98.55072463768117 - type: precision value: 98.36956521739131 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: rus_Cyrl-kor_Hang name: MTEB 
FloresBitextMining (rus_Cyrl-kor_Hang) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.55335968379447 - type: f1 value: 94.15349143610013 - type: main_score value: 94.15349143610013 - type: precision value: 93.49472990777339 - type: recall value: 95.55335968379447 task: type: BitextMining - dataset: config: rus_Cyrl-mos_Latn name: MTEB FloresBitextMining (rus_Cyrl-mos_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 43.67588932806324 - type: f1 value: 38.84849721190082 - type: main_score value: 38.84849721190082 - type: precision value: 37.43294462099682 - type: recall value: 43.67588932806324 task: type: BitextMining - dataset: config: rus_Cyrl-run_Latn name: MTEB FloresBitextMining (rus_Cyrl-run_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 90.21739130434783 - type: f1 value: 87.37483530961792 - type: main_score value: 87.37483530961792 - type: precision value: 86.07872200263506 - type: recall value: 90.21739130434783 task: type: BitextMining - dataset: config: rus_Cyrl-tam_Taml name: MTEB FloresBitextMining (rus_Cyrl-tam_Taml) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.2094861660079 - type: main_score value: 99.2094861660079 - type: precision value: 99.1106719367589 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: rus_Cyrl-vie_Latn name: MTEB FloresBitextMining (rus_Cyrl-vie_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.03557312252964 - type: f1 value: 96.13636363636364 - type: main_score value: 96.13636363636364 - type: precision value: 95.70981554677206 - type: recall value: 97.03557312252964 task: type: BitextMining - dataset: config: rus_Cyrl-apc_Arab name: MTEB FloresBitextMining (rus_Cyrl-apc_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.12252964426878 - type: f1 value: 97.49670619235836 - type: main_score value: 97.49670619235836 - type: precision value: 97.18379446640316 - type: recall value: 98.12252964426878 task: type: BitextMining - dataset: config: rus_Cyrl-bug_Latn name: MTEB FloresBitextMining (rus_Cyrl-bug_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 67.29249011857708 - type: f1 value: 62.09268717667927 - type: main_score value: 62.09268717667927 - type: precision value: 60.28554009748714 - type: recall value: 67.29249011857708 task: type: BitextMining - dataset: config: rus_Cyrl-fon_Latn name: MTEB FloresBitextMining (rus_Cyrl-fon_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 63.43873517786561 - type: f1 value: 57.66660107569199 - type: main_score value: 57.66660107569199 - type: precision value: 55.66676396919363 - type: recall value: 63.43873517786561 task: type: BitextMining - dataset: config: rus_Cyrl-jav_Latn name: MTEB FloresBitextMining (rus_Cyrl-jav_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.46640316205533 - type: f1 value: 92.89384528514964 - type: main_score value: 92.89384528514964 - type: precision 
value: 92.19367588932806 - type: recall value: 94.46640316205533 task: type: BitextMining - dataset: config: rus_Cyrl-lao_Laoo name: MTEB FloresBitextMining (rus_Cyrl-lao_Laoo) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.23320158102767 - type: f1 value: 96.40974967061922 - type: main_score value: 96.40974967061922 - type: precision value: 96.034255599473 - type: recall value: 97.23320158102767 task: type: BitextMining - dataset: config: rus_Cyrl-mri_Latn name: MTEB FloresBitextMining (rus_Cyrl-mri_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 76.77865612648222 - type: f1 value: 73.11286539547409 - type: main_score value: 73.11286539547409 - type: precision value: 71.78177214337046 - type: recall value: 76.77865612648222 task: type: BitextMining - dataset: config: rus_Cyrl-taq_Latn name: MTEB FloresBitextMining (rus_Cyrl-taq_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 41.99604743083004 - type: f1 value: 37.25127063318763 - type: main_score value: 37.25127063318763 - type: precision value: 35.718929186985726 - type: recall value: 41.99604743083004 task: type: BitextMining - dataset: config: rus_Cyrl-war_Latn name: MTEB FloresBitextMining (rus_Cyrl-war_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.55335968379447 - type: f1 value: 94.1699604743083 - type: main_score value: 94.1699604743083 - type: precision value: 93.52766798418972 - type: recall value: 95.55335968379447 task: type: BitextMining - dataset: config: rus_Cyrl-arb_Arab name: MTEB FloresBitextMining (rus_Cyrl-arb_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.60474308300395 - type: f1 value: 99.4729907773386 - type: main_score value: 99.4729907773386 - type: precision value: 99.40711462450594 - type: recall value: 99.60474308300395 task: type: BitextMining - dataset: config: rus_Cyrl-bul_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-bul_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.70355731225297 - type: f1 value: 99.60474308300395 - type: main_score value: 99.60474308300395 - type: precision value: 99.55533596837944 - type: recall value: 99.70355731225297 task: type: BitextMining - dataset: config: rus_Cyrl-fra_Latn name: MTEB FloresBitextMining (rus_Cyrl-fra_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.60474308300395 - type: f1 value: 99.47299077733861 - type: main_score value: 99.47299077733861 - type: precision value: 99.40711462450594 - type: recall value: 99.60474308300395 task: type: BitextMining - dataset: config: rus_Cyrl-jpn_Jpan name: MTEB FloresBitextMining (rus_Cyrl-jpn_Jpan) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.44268774703558 - type: f1 value: 95.30632411067194 - type: main_score value: 95.30632411067194 - type: precision value: 94.76284584980237 - type: recall value: 96.44268774703558 task: type: BitextMining - dataset: config: rus_Cyrl-lij_Latn name: MTEB FloresBitextMining (rus_Cyrl-lij_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: 
accuracy value: 90.21739130434783 - type: f1 value: 87.4703557312253 - type: main_score value: 87.4703557312253 - type: precision value: 86.29611330698287 - type: recall value: 90.21739130434783 task: type: BitextMining - dataset: config: rus_Cyrl-mya_Mymr name: MTEB FloresBitextMining (rus_Cyrl-mya_Mymr) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.364953886693 - type: main_score value: 97.364953886693 - type: precision value: 97.03557312252964 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: rus_Cyrl-sag_Latn name: MTEB FloresBitextMining (rus_Cyrl-sag_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 54.841897233201585 - type: f1 value: 49.61882037503349 - type: main_score value: 49.61882037503349 - type: precision value: 47.831968755881796 - type: recall value: 54.841897233201585 task: type: BitextMining - dataset: config: rus_Cyrl-taq_Tfng name: MTEB FloresBitextMining (rus_Cyrl-taq_Tfng) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 15.316205533596838 - type: f1 value: 11.614836360389717 - type: main_score value: 11.614836360389717 - type: precision value: 10.741446193235223 - type: recall value: 15.316205533596838 task: type: BitextMining - dataset: config: rus_Cyrl-wol_Latn name: MTEB FloresBitextMining (rus_Cyrl-wol_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 67.88537549407114 - type: f1 value: 62.2536417249856 - type: main_score value: 62.2536417249856 - type: precision value: 60.27629128666678 - type: recall value: 67.88537549407114 task: type: BitextMining - dataset: config: rus_Cyrl-arb_Latn name: MTEB FloresBitextMining (rus_Cyrl-arb_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 27.766798418972332 - type: f1 value: 23.39674889624077 - type: main_score value: 23.39674889624077 - type: precision value: 22.28521155585345 - type: recall value: 27.766798418972332 task: type: BitextMining - dataset: config: rus_Cyrl-cat_Latn name: MTEB FloresBitextMining (rus_Cyrl-cat_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.23320158102767 - type: f1 value: 96.42151326933936 - type: main_score value: 96.42151326933936 - type: precision value: 96.04743083003953 - type: recall value: 97.23320158102767 task: type: BitextMining - dataset: config: rus_Cyrl-fur_Latn name: MTEB FloresBitextMining (rus_Cyrl-fur_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 88.63636363636364 - type: f1 value: 85.80792396009788 - type: main_score value: 85.80792396009788 - type: precision value: 84.61508901726293 - type: recall value: 88.63636363636364 task: type: BitextMining - dataset: config: rus_Cyrl-kab_Latn name: MTEB FloresBitextMining (rus_Cyrl-kab_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 48.12252964426877 - type: f1 value: 43.05387582971066 - type: main_score value: 43.05387582971066 - type: precision value: 41.44165117538212 - type: recall value: 48.12252964426877 task: type: BitextMining - dataset: config: rus_Cyrl-lim_Latn name: MTEB 
FloresBitextMining (rus_Cyrl-lim_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.81818181818183 - type: f1 value: 77.81676163099087 - type: main_score value: 77.81676163099087 - type: precision value: 76.19565217391305 - type: recall value: 81.81818181818183 task: type: BitextMining - dataset: config: rus_Cyrl-nld_Latn name: MTEB FloresBitextMining (rus_Cyrl-nld_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.33201581027669 - type: f1 value: 96.4756258234519 - type: main_score value: 96.4756258234519 - type: precision value: 96.06389986824769 - type: recall value: 97.33201581027669 task: type: BitextMining - dataset: config: rus_Cyrl-san_Deva name: MTEB FloresBitextMining (rus_Cyrl-san_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.47826086956522 - type: f1 value: 91.70289855072463 - type: main_score value: 91.70289855072463 - type: precision value: 90.9370882740448 - type: recall value: 93.47826086956522 task: type: BitextMining - dataset: config: rus_Cyrl-tat_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-tat_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.72727272727273 - type: f1 value: 97.00263504611331 - type: main_score value: 97.00263504611331 - type: precision value: 96.65678524374177 - type: recall value: 97.72727272727273 task: type: BitextMining - dataset: config: rus_Cyrl-xho_Latn name: MTEB FloresBitextMining (rus_Cyrl-xho_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.08300395256917 - type: f1 value: 91.12977602108036 - type: main_score value: 91.12977602108036 - type: precision value: 90.22562582345192 - type: recall value: 93.08300395256917 task: type: BitextMining - dataset: config: rus_Cyrl-ars_Arab name: MTEB FloresBitextMining (rus_Cyrl-ars_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.40711462450594 - type: f1 value: 99.2094861660079 - type: main_score value: 99.2094861660079 - type: precision value: 99.1106719367589 - type: recall value: 99.40711462450594 task: type: BitextMining - dataset: config: rus_Cyrl-ceb_Latn name: MTEB FloresBitextMining (rus_Cyrl-ceb_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.65217391304348 - type: f1 value: 94.3544137022398 - type: main_score value: 94.3544137022398 - type: precision value: 93.76646903820817 - type: recall value: 95.65217391304348 task: type: BitextMining - dataset: config: rus_Cyrl-fuv_Latn name: MTEB FloresBitextMining (rus_Cyrl-fuv_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 51.18577075098815 - type: f1 value: 44.5990252610806 - type: main_score value: 44.5990252610806 - type: precision value: 42.34331599450177 - type: recall value: 51.18577075098815 task: type: BitextMining - dataset: config: rus_Cyrl-kac_Latn name: MTEB FloresBitextMining (rus_Cyrl-kac_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 46.93675889328063 - type: f1 value: 41.79004018701787 - type: main_score value: 41.79004018701787 - type: precision value: 
40.243355662392624 - type: recall value: 46.93675889328063 task: type: BitextMining - dataset: config: rus_Cyrl-lin_Latn name: MTEB FloresBitextMining (rus_Cyrl-lin_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.50197628458498 - type: f1 value: 89.1205533596838 - type: main_score value: 89.1205533596838 - type: precision value: 88.07147562582345 - type: recall value: 91.50197628458498 task: type: BitextMining - dataset: config: rus_Cyrl-nno_Latn name: MTEB FloresBitextMining (rus_Cyrl-nno_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.81422924901186 - type: f1 value: 98.41897233201581 - type: main_score value: 98.41897233201581 - type: precision value: 98.22134387351778 - type: recall value: 98.81422924901186 task: type: BitextMining - dataset: config: rus_Cyrl-sat_Olck name: MTEB FloresBitextMining (rus_Cyrl-sat_Olck) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 2.371541501976284 - type: f1 value: 1.0726274943087382 - type: main_score value: 1.0726274943087382 - type: precision value: 0.875279634748803 - type: recall value: 2.371541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-tel_Telu name: MTEB FloresBitextMining (rus_Cyrl-tel_Telu) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.68247694334651 - type: main_score value: 98.68247694334651 - type: precision value: 98.51778656126481 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: rus_Cyrl-ydd_Hebr name: MTEB FloresBitextMining (rus_Cyrl-ydd_Hebr) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.42687747035573 - type: f1 value: 86.47609636740073 - type: main_score value: 86.47609636740073 - type: precision value: 85.13669301712781 - type: recall value: 89.42687747035573 task: type: BitextMining - dataset: config: rus_Cyrl-ary_Arab name: MTEB FloresBitextMining (rus_Cyrl-ary_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.82213438735178 - type: f1 value: 87.04545454545456 - type: main_score value: 87.04545454545456 - type: precision value: 85.76910408432148 - type: recall value: 89.82213438735178 task: type: BitextMining - dataset: config: rus_Cyrl-ces_Latn name: MTEB FloresBitextMining (rus_Cyrl-ces_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.9459815546772 - type: main_score value: 98.9459815546772 - type: precision value: 98.81422924901186 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: rus_Cyrl-gaz_Latn name: MTEB FloresBitextMining (rus_Cyrl-gaz_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 64.9209486166008 - type: f1 value: 58.697458119394874 - type: main_score value: 58.697458119394874 - type: precision value: 56.43402189597842 - type: recall value: 64.9209486166008 task: type: BitextMining - dataset: config: rus_Cyrl-kam_Latn name: MTEB FloresBitextMining (rus_Cyrl-kam_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: 
accuracy value: 59.18972332015811 - type: f1 value: 53.19031511966295 - type: main_score value: 53.19031511966295 - type: precision value: 51.08128357343655 - type: recall value: 59.18972332015811 task: type: BitextMining - dataset: config: rus_Cyrl-lit_Latn name: MTEB FloresBitextMining (rus_Cyrl-lit_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.54150197628458 - type: f1 value: 95.5368906455863 - type: main_score value: 95.5368906455863 - type: precision value: 95.0592885375494 - type: recall value: 96.54150197628458 task: type: BitextMining - dataset: config: rus_Cyrl-nob_Latn name: MTEB FloresBitextMining (rus_Cyrl-nob_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.12252964426878 - type: f1 value: 97.51317523056655 - type: main_score value: 97.51317523056655 - type: precision value: 97.2167325428195 - type: recall value: 98.12252964426878 task: type: BitextMining - dataset: config: rus_Cyrl-scn_Latn name: MTEB FloresBitextMining (rus_Cyrl-scn_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 84.0909090909091 - type: f1 value: 80.37000439174352 - type: main_score value: 80.37000439174352 - type: precision value: 78.83994628559846 - type: recall value: 84.0909090909091 task: type: BitextMining - dataset: config: rus_Cyrl-tgk_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-tgk_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.68774703557312 - type: f1 value: 90.86344814605684 - type: main_score value: 90.86344814605684 - type: precision value: 90.12516469038208 - type: recall value: 92.68774703557312 task: type: BitextMining - dataset: config: rus_Cyrl-yor_Latn name: MTEB FloresBitextMining (rus_Cyrl-yor_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 72.13438735177866 - type: f1 value: 66.78759646150951 - type: main_score value: 66.78759646150951 - type: precision value: 64.85080192096002 - type: recall value: 72.13438735177866 task: type: BitextMining - dataset: config: rus_Cyrl-arz_Arab name: MTEB FloresBitextMining (rus_Cyrl-arz_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.364953886693 - type: main_score value: 97.364953886693 - type: precision value: 97.03557312252964 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: rus_Cyrl-cjk_Latn name: MTEB FloresBitextMining (rus_Cyrl-cjk_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 51.976284584980235 - type: f1 value: 46.468762353149714 - type: main_score value: 46.468762353149714 - type: precision value: 44.64073366247278 - type: recall value: 51.976284584980235 task: type: BitextMining - dataset: config: rus_Cyrl-gla_Latn name: MTEB FloresBitextMining (rus_Cyrl-gla_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 79.74308300395256 - type: f1 value: 75.55611165294958 - type: main_score value: 75.55611165294958 - type: precision value: 73.95033408620365 - type: recall value: 79.74308300395256 task: type: BitextMining - dataset: config: rus_Cyrl-kan_Knda name: MTEB 
FloresBitextMining (rus_Cyrl-kan_Knda) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.96245059288538 - type: main_score value: 98.96245059288538 - type: precision value: 98.84716732542819 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: rus_Cyrl-lmo_Latn name: MTEB FloresBitextMining (rus_Cyrl-lmo_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 82.41106719367589 - type: f1 value: 78.56413514022209 - type: main_score value: 78.56413514022209 - type: precision value: 77.15313068573938 - type: recall value: 82.41106719367589 task: type: BitextMining - dataset: config: rus_Cyrl-npi_Deva name: MTEB FloresBitextMining (rus_Cyrl-npi_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.3201581027668 - type: main_score value: 98.3201581027668 - type: precision value: 98.12252964426878 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-shn_Mymr name: MTEB FloresBitextMining (rus_Cyrl-shn_Mymr) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 57.11462450592886 - type: f1 value: 51.51361369197337 - type: main_score value: 51.51361369197337 - type: precision value: 49.71860043649573 - type: recall value: 57.11462450592886 task: type: BitextMining - dataset: config: rus_Cyrl-tgl_Latn name: MTEB FloresBitextMining (rus_Cyrl-tgl_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.82608695652173 - type: f1 value: 97.18379446640316 - type: main_score value: 97.18379446640316 - type: precision value: 96.88735177865613 - type: recall value: 97.82608695652173 task: type: BitextMining - dataset: config: rus_Cyrl-yue_Hant name: MTEB FloresBitextMining (rus_Cyrl-yue_Hant) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.09420289855072 - type: main_score value: 99.09420289855072 - type: precision value: 98.9953886693017 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: rus_Cyrl-asm_Beng name: MTEB FloresBitextMining (rus_Cyrl-asm_Beng) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.55335968379447 - type: f1 value: 94.16007905138339 - type: main_score value: 94.16007905138339 - type: precision value: 93.50296442687747 - type: recall value: 95.55335968379447 task: type: BitextMining - dataset: config: rus_Cyrl-ckb_Arab name: MTEB FloresBitextMining (rus_Cyrl-ckb_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.88537549407114 - type: f1 value: 90.76745718050066 - type: main_score value: 90.76745718050066 - type: precision value: 89.80072463768116 - type: recall value: 92.88537549407114 task: type: BitextMining - dataset: config: rus_Cyrl-gle_Latn name: MTEB FloresBitextMining (rus_Cyrl-gle_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.699604743083 - type: f1 value: 89.40899680030115 - type: main_score value: 89.40899680030115 - type: precision value: 
88.40085638998683 - type: recall value: 91.699604743083 task: type: BitextMining - dataset: config: rus_Cyrl-kas_Arab name: MTEB FloresBitextMining (rus_Cyrl-kas_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 88.3399209486166 - type: f1 value: 85.14351590438548 - type: main_score value: 85.14351590438548 - type: precision value: 83.72364953886692 - type: recall value: 88.3399209486166 task: type: BitextMining - dataset: config: rus_Cyrl-ltg_Latn name: MTEB FloresBitextMining (rus_Cyrl-ltg_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 83.399209486166 - type: f1 value: 79.88408934061107 - type: main_score value: 79.88408934061107 - type: precision value: 78.53794509179885 - type: recall value: 83.399209486166 task: type: BitextMining - dataset: config: rus_Cyrl-nso_Latn name: MTEB FloresBitextMining (rus_Cyrl-nso_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.20553359683794 - type: f1 value: 88.95406635525212 - type: main_score value: 88.95406635525212 - type: precision value: 88.01548089591567 - type: recall value: 91.20553359683794 task: type: BitextMining - dataset: config: rus_Cyrl-sin_Sinh name: MTEB FloresBitextMining (rus_Cyrl-sin_Sinh) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.56719367588933 - type: main_score value: 98.56719367588933 - type: precision value: 98.40250329380763 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: rus_Cyrl-tha_Thai name: MTEB FloresBitextMining (rus_Cyrl-tha_Thai) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.94861660079052 - type: f1 value: 94.66403162055336 - type: main_score value: 94.66403162055336 - type: precision value: 94.03820816864295 - type: recall value: 95.94861660079052 task: type: BitextMining - dataset: config: rus_Cyrl-zho_Hans name: MTEB FloresBitextMining (rus_Cyrl-zho_Hans) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.4308300395257 - type: f1 value: 96.5909090909091 - type: main_score value: 96.5909090909091 - type: precision value: 96.17918313570487 - type: recall value: 97.4308300395257 task: type: BitextMining - dataset: config: rus_Cyrl-ast_Latn name: MTEB FloresBitextMining (rus_Cyrl-ast_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.46640316205533 - type: f1 value: 92.86890645586297 - type: main_score value: 92.86890645586297 - type: precision value: 92.14756258234519 - type: recall value: 94.46640316205533 task: type: BitextMining - dataset: config: rus_Cyrl-crh_Latn name: MTEB FloresBitextMining (rus_Cyrl-crh_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.66403162055336 - type: f1 value: 93.2663592446201 - type: main_score value: 93.2663592446201 - type: precision value: 92.66716073781292 - type: recall value: 94.66403162055336 task: type: BitextMining - dataset: config: rus_Cyrl-glg_Latn name: MTEB FloresBitextMining (rus_Cyrl-glg_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 
98.81422924901186 - type: f1 value: 98.46837944664031 - type: main_score value: 98.46837944664031 - type: precision value: 98.3201581027668 - type: recall value: 98.81422924901186 task: type: BitextMining - dataset: config: rus_Cyrl-kas_Deva name: MTEB FloresBitextMining (rus_Cyrl-kas_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 69.1699604743083 - type: f1 value: 63.05505292906477 - type: main_score value: 63.05505292906477 - type: precision value: 60.62594108789761 - type: recall value: 69.1699604743083 task: type: BitextMining - dataset: config: rus_Cyrl-ltz_Latn name: MTEB FloresBitextMining (rus_Cyrl-ltz_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.40316205533597 - type: f1 value: 89.26571616789009 - type: main_score value: 89.26571616789009 - type: precision value: 88.40179747788443 - type: recall value: 91.40316205533597 task: type: BitextMining - dataset: config: rus_Cyrl-nus_Latn name: MTEB FloresBitextMining (rus_Cyrl-nus_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 38.93280632411067 - type: f1 value: 33.98513032905371 - type: main_score value: 33.98513032905371 - type: precision value: 32.56257884802308 - type: recall value: 38.93280632411067 task: type: BitextMining - dataset: config: rus_Cyrl-slk_Latn name: MTEB FloresBitextMining (rus_Cyrl-slk_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.42094861660078 - type: main_score value: 97.42094861660078 - type: precision value: 97.14262187088273 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: rus_Cyrl-tir_Ethi name: MTEB FloresBitextMining (rus_Cyrl-tir_Ethi) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.30434782608695 - type: f1 value: 88.78129117259552 - type: main_score value: 88.78129117259552 - type: precision value: 87.61528326745717 - type: recall value: 91.30434782608695 task: type: BitextMining - dataset: config: rus_Cyrl-zho_Hant name: MTEB FloresBitextMining (rus_Cyrl-zho_Hant) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.1106719367589 - type: f1 value: 98.81422924901186 - type: main_score value: 98.81422924901186 - type: precision value: 98.66600790513834 - type: recall value: 99.1106719367589 task: type: BitextMining - dataset: config: rus_Cyrl-awa_Deva name: MTEB FloresBitextMining (rus_Cyrl-awa_Deva) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.12252964426878 - type: f1 value: 97.70092226613966 - type: main_score value: 97.70092226613966 - type: precision value: 97.50494071146245 - type: recall value: 98.12252964426878 task: type: BitextMining - dataset: config: rus_Cyrl-cym_Latn name: MTEB FloresBitextMining (rus_Cyrl-cym_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.94861660079052 - type: f1 value: 94.74308300395256 - type: main_score value: 94.74308300395256 - type: precision value: 94.20289855072464 - type: recall value: 95.94861660079052 task: type: BitextMining - dataset: config: rus_Cyrl-grn_Latn name: MTEB FloresBitextMining 
(rus_Cyrl-grn_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 77.96442687747036 - type: f1 value: 73.64286789187975 - type: main_score value: 73.64286789187975 - type: precision value: 71.99324893260821 - type: recall value: 77.96442687747036 task: type: BitextMining - dataset: config: rus_Cyrl-kat_Geor name: MTEB FloresBitextMining (rus_Cyrl-kat_Geor) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.56719367588933 - type: main_score value: 98.56719367588933 - type: precision value: 98.40250329380764 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: rus_Cyrl-lua_Latn name: MTEB FloresBitextMining (rus_Cyrl-lua_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 72.03557312252964 - type: f1 value: 67.23928163404449 - type: main_score value: 67.23928163404449 - type: precision value: 65.30797101449275 - type: recall value: 72.03557312252964 task: type: BitextMining - dataset: config: rus_Cyrl-nya_Latn name: MTEB FloresBitextMining (rus_Cyrl-nya_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.29249011857708 - type: f1 value: 90.0494071146245 - type: main_score value: 90.0494071146245 - type: precision value: 89.04808959156786 - type: recall value: 92.29249011857708 task: type: BitextMining - dataset: config: rus_Cyrl-slv_Latn name: MTEB FloresBitextMining (rus_Cyrl-slv_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.30368906455863 - type: main_score value: 98.30368906455863 - type: precision value: 98.10606060606061 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: rus_Cyrl-tpi_Latn name: MTEB FloresBitextMining (rus_Cyrl-tpi_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 80.53359683794467 - type: f1 value: 76.59481822525301 - type: main_score value: 76.59481822525301 - type: precision value: 75.12913223140497 - type: recall value: 80.53359683794467 task: type: BitextMining - dataset: config: rus_Cyrl-zsm_Latn name: MTEB FloresBitextMining (rus_Cyrl-zsm_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.33201581027669 - type: f1 value: 96.58620365142104 - type: main_score value: 96.58620365142104 - type: precision value: 96.26152832674572 - type: recall value: 97.33201581027669 task: type: BitextMining - dataset: config: rus_Cyrl-ayr_Latn name: MTEB FloresBitextMining (rus_Cyrl-ayr_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 45.55335968379446 - type: f1 value: 40.13076578531388 - type: main_score value: 40.13076578531388 - type: precision value: 38.398064362362355 - type: recall value: 45.55335968379446 task: type: BitextMining - dataset: config: rus_Cyrl-dan_Latn name: MTEB FloresBitextMining (rus_Cyrl-dan_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.68247694334651 - type: main_score value: 98.68247694334651 - type: precision value: 
98.51778656126481 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: rus_Cyrl-guj_Gujr name: MTEB FloresBitextMining (rus_Cyrl-guj_Gujr) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.68247694334651 - type: main_score value: 98.68247694334651 - type: precision value: 98.51778656126481 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: rus_Cyrl-kaz_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-kaz_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.81422924901186 - type: f1 value: 98.43544137022398 - type: main_score value: 98.43544137022398 - type: precision value: 98.25428194993412 - type: recall value: 98.81422924901186 task: type: BitextMining - dataset: config: rus_Cyrl-lug_Latn name: MTEB FloresBitextMining (rus_Cyrl-lug_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 82.21343873517787 - type: f1 value: 77.97485726833554 - type: main_score value: 77.97485726833554 - type: precision value: 76.22376717485415 - type: recall value: 82.21343873517787 task: type: BitextMining - dataset: config: rus_Cyrl-oci_Latn name: MTEB FloresBitextMining (rus_Cyrl-oci_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.87351778656127 - type: f1 value: 92.25319969885187 - type: main_score value: 92.25319969885187 - type: precision value: 91.5638528138528 - type: recall value: 93.87351778656127 task: type: BitextMining - dataset: config: rus_Cyrl-smo_Latn name: MTEB FloresBitextMining (rus_Cyrl-smo_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 84.88142292490119 - type: f1 value: 81.24364765669114 - type: main_score value: 81.24364765669114 - type: precision value: 79.69991416137661 - type: recall value: 84.88142292490119 task: type: BitextMining - dataset: config: rus_Cyrl-tsn_Latn name: MTEB FloresBitextMining (rus_Cyrl-tsn_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 87.05533596837944 - type: f1 value: 83.90645586297761 - type: main_score value: 83.90645586297761 - type: precision value: 82.56752305665349 - type: recall value: 87.05533596837944 task: type: BitextMining - dataset: config: rus_Cyrl-zul_Latn name: MTEB FloresBitextMining (rus_Cyrl-zul_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.15810276679841 - type: f1 value: 93.77140974967062 - type: main_score value: 93.77140974967062 - type: precision value: 93.16534914361002 - type: recall value: 95.15810276679841 task: type: BitextMining - dataset: config: rus_Cyrl-azb_Arab name: MTEB FloresBitextMining (rus_Cyrl-azb_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.91699604743083 - type: f1 value: 77.18050065876152 - type: main_score value: 77.18050065876152 - type: precision value: 75.21519543258673 - type: recall value: 81.91699604743083 task: type: BitextMining - dataset: config: rus_Cyrl-deu_Latn name: MTEB FloresBitextMining (rus_Cyrl-deu_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: 
accuracy value: 99.50592885375494 - type: f1 value: 99.34123847167325 - type: main_score value: 99.34123847167325 - type: precision value: 99.2588932806324 - type: recall value: 99.50592885375494 task: type: BitextMining - dataset: config: rus_Cyrl-hat_Latn name: MTEB FloresBitextMining (rus_Cyrl-hat_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.00790513833992 - type: f1 value: 88.69126043039086 - type: main_score value: 88.69126043039086 - type: precision value: 87.75774044795784 - type: recall value: 91.00790513833992 task: type: BitextMining - dataset: config: rus_Cyrl-kbp_Latn name: MTEB FloresBitextMining (rus_Cyrl-kbp_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 47.233201581027664 - type: f1 value: 43.01118618096943 - type: main_score value: 43.01118618096943 - type: precision value: 41.739069205043556 - type: recall value: 47.233201581027664 task: type: BitextMining - dataset: config: rus_Cyrl-luo_Latn name: MTEB FloresBitextMining (rus_Cyrl-luo_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 60.47430830039525 - type: f1 value: 54.83210565429816 - type: main_score value: 54.83210565429816 - type: precision value: 52.81630744284779 - type: recall value: 60.47430830039525 task: type: BitextMining - dataset: config: rus_Cyrl-ory_Orya name: MTEB FloresBitextMining (rus_Cyrl-ory_Orya) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.1106719367589 - type: f1 value: 98.83069828722003 - type: main_score value: 98.83069828722003 - type: precision value: 98.69894598155467 - type: recall value: 99.1106719367589 task: type: BitextMining - dataset: config: rus_Cyrl-sna_Latn name: MTEB FloresBitextMining (rus_Cyrl-sna_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.72332015810277 - type: f1 value: 87.30013645774514 - type: main_score value: 87.30013645774514 - type: precision value: 86.25329380764163 - type: recall value: 89.72332015810277 task: type: BitextMining - dataset: config: rus_Cyrl-tso_Latn name: MTEB FloresBitextMining (rus_Cyrl-tso_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 84.38735177865613 - type: f1 value: 80.70424744337788 - type: main_score value: 80.70424744337788 - type: precision value: 79.18560606060606 - type: recall value: 84.38735177865613 task: type: BitextMining - dataset: config: rus_Cyrl-azj_Latn name: MTEB FloresBitextMining (rus_Cyrl-azj_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.33201581027669 - type: f1 value: 96.56455862977602 - type: main_score value: 96.56455862977602 - type: precision value: 96.23682476943345 - type: recall value: 97.33201581027669 task: type: BitextMining - dataset: config: rus_Cyrl-dik_Latn name: MTEB FloresBitextMining (rus_Cyrl-dik_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 46.047430830039524 - type: f1 value: 40.05513069495283 - type: main_score value: 40.05513069495283 - type: precision value: 38.072590197096126 - type: recall value: 46.047430830039524 task: type: BitextMining - dataset: config: rus_Cyrl-hau_Latn name: MTEB 
FloresBitextMining (rus_Cyrl-hau_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 87.94466403162056 - type: f1 value: 84.76943346508563 - type: main_score value: 84.76943346508563 - type: precision value: 83.34486166007905 - type: recall value: 87.94466403162056 task: type: BitextMining - dataset: config: rus_Cyrl-kea_Latn name: MTEB FloresBitextMining (rus_Cyrl-kea_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.42687747035573 - type: f1 value: 86.83803021747684 - type: main_score value: 86.83803021747684 - type: precision value: 85.78416149068323 - type: recall value: 89.42687747035573 task: type: BitextMining - dataset: config: rus_Cyrl-lus_Latn name: MTEB FloresBitextMining (rus_Cyrl-lus_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 68.97233201581028 - type: f1 value: 64.05480726292745 - type: main_score value: 64.05480726292745 - type: precision value: 62.42670749487858 - type: recall value: 68.97233201581028 task: type: BitextMining - dataset: config: rus_Cyrl-pag_Latn name: MTEB FloresBitextMining (rus_Cyrl-pag_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 78.75494071146245 - type: f1 value: 74.58573558401933 - type: main_score value: 74.58573558401933 - type: precision value: 73.05532028358115 - type: recall value: 78.75494071146245 task: type: BitextMining - dataset: config: rus_Cyrl-snd_Arab name: MTEB FloresBitextMining (rus_Cyrl-snd_Arab) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.8498023715415 - type: f1 value: 94.56521739130434 - type: main_score value: 94.56521739130434 - type: precision value: 93.97233201581028 - type: recall value: 95.8498023715415 task: type: BitextMining - dataset: config: rus_Cyrl-tuk_Latn name: MTEB FloresBitextMining (rus_Cyrl-tuk_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 68.08300395256917 - type: f1 value: 62.93565240205557 - type: main_score value: 62.93565240205557 - type: precision value: 61.191590257043934 - type: recall value: 68.08300395256917 task: type: BitextMining - dataset: config: rus_Cyrl-bak_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-bak_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.04743083003953 - type: f1 value: 94.86824769433464 - type: main_score value: 94.86824769433464 - type: precision value: 94.34288537549406 - type: recall value: 96.04743083003953 task: type: BitextMining - dataset: config: rus_Cyrl-dyu_Latn name: MTEB FloresBitextMining (rus_Cyrl-dyu_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 37.45059288537549 - type: f1 value: 31.670482312800807 - type: main_score value: 31.670482312800807 - type: precision value: 29.99928568357422 - type: recall value: 37.45059288537549 task: type: BitextMining - dataset: config: rus_Cyrl-heb_Hebr name: MTEB FloresBitextMining (rus_Cyrl-heb_Hebr) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.23320158102767 - type: f1 value: 96.38998682476942 - type: main_score value: 96.38998682476942 - type: precision 
value: 95.99802371541502 - type: recall value: 97.23320158102767 task: type: BitextMining - dataset: config: rus_Cyrl-khk_Cyrl name: MTEB FloresBitextMining (rus_Cyrl-khk_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.41897233201581 - type: f1 value: 98.00724637681158 - type: main_score value: 98.00724637681158 - type: precision value: 97.82938076416336 - type: recall value: 98.41897233201581 task: type: BitextMining - dataset: config: rus_Cyrl-lvs_Latn name: MTEB FloresBitextMining (rus_Cyrl-lvs_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.4308300395257 - type: f1 value: 96.61396574440053 - type: main_score value: 96.61396574440053 - type: precision value: 96.2203557312253 - type: recall value: 97.4308300395257 task: type: BitextMining - dataset: config: rus_Cyrl-pan_Guru name: MTEB FloresBitextMining (rus_Cyrl-pan_Guru) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.07773386034256 - type: main_score value: 99.07773386034256 - type: precision value: 98.96245059288538 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: rus_Cyrl-som_Latn name: MTEB FloresBitextMining (rus_Cyrl-som_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 87.74703557312253 - type: f1 value: 84.52898550724638 - type: main_score value: 84.52898550724638 - type: precision value: 83.09288537549409 - type: recall value: 87.74703557312253 task: type: BitextMining - dataset: config: rus_Cyrl-tum_Latn name: MTEB FloresBitextMining (rus_Cyrl-tum_Latn) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 87.15415019762845 - type: f1 value: 83.85069640504425 - type: main_score value: 83.85069640504425 - type: precision value: 82.43671183888576 - type: recall value: 87.15415019762845 task: type: BitextMining - dataset: config: taq_Latn-rus_Cyrl name: MTEB FloresBitextMining (taq_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 28.55731225296443 - type: f1 value: 26.810726360049568 - type: main_score value: 26.810726360049568 - type: precision value: 26.260342858265577 - type: recall value: 28.55731225296443 task: type: BitextMining - dataset: config: war_Latn-rus_Cyrl name: MTEB FloresBitextMining (war_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.86166007905138 - type: f1 value: 94.03147083483051 - type: main_score value: 94.03147083483051 - type: precision value: 93.70653606003322 - type: recall value: 94.86166007905138 task: type: BitextMining - dataset: config: arb_Arab-rus_Cyrl name: MTEB FloresBitextMining (arb_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.34387351778656 - type: f1 value: 95.23056653491436 - type: main_score value: 95.23056653491436 - type: precision value: 94.70520421607378 - type: recall value: 96.34387351778656 task: type: BitextMining - dataset: config: bul_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (bul_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - 
type: accuracy value: 99.90118577075098 - type: f1 value: 99.86824769433464 - type: main_score value: 99.86824769433464 - type: precision value: 99.85177865612648 - type: recall value: 99.90118577075098 task: type: BitextMining - dataset: config: fra_Latn-rus_Cyrl name: MTEB FloresBitextMining (fra_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.9459815546772 - type: main_score value: 98.9459815546772 - type: precision value: 98.81422924901186 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: jpn_Jpan-rus_Cyrl name: MTEB FloresBitextMining (jpn_Jpan-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.3201581027668 - type: f1 value: 97.76021080368905 - type: main_score value: 97.76021080368905 - type: precision value: 97.48023715415019 - type: recall value: 98.3201581027668 task: type: BitextMining - dataset: config: lij_Latn-rus_Cyrl name: MTEB FloresBitextMining (lij_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 83.49802371541502 - type: f1 value: 81.64800059239636 - type: main_score value: 81.64800059239636 - type: precision value: 80.9443055878478 - type: recall value: 83.49802371541502 task: type: BitextMining - dataset: config: mya_Mymr-rus_Cyrl name: MTEB FloresBitextMining (mya_Mymr-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 90.21739130434783 - type: f1 value: 88.76776366313682 - type: main_score value: 88.76776366313682 - type: precision value: 88.18370446119435 - type: recall value: 90.21739130434783 task: type: BitextMining - dataset: config: sag_Latn-rus_Cyrl name: MTEB FloresBitextMining (sag_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 41.699604743083 - type: f1 value: 39.53066322643847 - type: main_score value: 39.53066322643847 - type: precision value: 38.822876239229274 - type: recall value: 41.699604743083 task: type: BitextMining - dataset: config: taq_Tfng-rus_Cyrl name: MTEB FloresBitextMining (taq_Tfng-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 10.67193675889328 - type: f1 value: 9.205744965817951 - type: main_score value: 9.205744965817951 - type: precision value: 8.85195219073817 - type: recall value: 10.67193675889328 task: type: BitextMining - dataset: config: wol_Latn-rus_Cyrl name: MTEB FloresBitextMining (wol_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 63.537549407114625 - type: f1 value: 60.65190727391827 - type: main_score value: 60.65190727391827 - type: precision value: 59.61144833427442 - type: recall value: 63.537549407114625 task: type: BitextMining - dataset: config: arb_Latn-rus_Cyrl name: MTEB FloresBitextMining (arb_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 13.142292490118576 - type: f1 value: 12.372910318176764 - type: main_score value: 12.372910318176764 - type: precision value: 12.197580895919188 - type: recall value: 13.142292490118576 task: type: BitextMining - dataset: config: cat_Latn-rus_Cyrl name: MTEB 
FloresBitextMining (cat_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.80599472990777 - type: main_score value: 98.80599472990777 - type: precision value: 98.72953133822698 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: fur_Latn-rus_Cyrl name: MTEB FloresBitextMining (fur_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.02766798418972 - type: f1 value: 79.36184294084613 - type: main_score value: 79.36184294084613 - type: precision value: 78.69187826527705 - type: recall value: 81.02766798418972 task: type: BitextMining - dataset: config: kab_Latn-rus_Cyrl name: MTEB FloresBitextMining (kab_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 34.387351778656125 - type: f1 value: 32.02306921576947 - type: main_score value: 32.02306921576947 - type: precision value: 31.246670347137467 - type: recall value: 34.387351778656125 task: type: BitextMining - dataset: config: lim_Latn-rus_Cyrl name: MTEB FloresBitextMining (lim_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 78.26086956521739 - type: f1 value: 75.90239449214359 - type: main_score value: 75.90239449214359 - type: precision value: 75.02211430745493 - type: recall value: 78.26086956521739 task: type: BitextMining - dataset: config: nld_Latn-rus_Cyrl name: MTEB FloresBitextMining (nld_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.9459815546772 - type: main_score value: 98.9459815546772 - type: precision value: 98.81422924901186 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: san_Deva-rus_Cyrl name: MTEB FloresBitextMining (san_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 87.94466403162056 - type: f1 value: 86.68928897189767 - type: main_score value: 86.68928897189767 - type: precision value: 86.23822997079216 - type: recall value: 87.94466403162056 task: type: BitextMining - dataset: config: tat_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (tat_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.03557312252964 - type: f1 value: 96.4167365353136 - type: main_score value: 96.4167365353136 - type: precision value: 96.16847826086958 - type: recall value: 97.03557312252964 task: type: BitextMining - dataset: config: xho_Latn-rus_Cyrl name: MTEB FloresBitextMining (xho_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 86.95652173913044 - type: f1 value: 85.5506497283435 - type: main_score value: 85.5506497283435 - type: precision value: 84.95270479733395 - type: recall value: 86.95652173913044 task: type: BitextMining - dataset: config: ars_Arab-rus_Cyrl name: MTEB FloresBitextMining (ars_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 96.6403162055336 - type: f1 value: 95.60935441370223 - type: main_score value: 95.60935441370223 - type: precision value: 
95.13339920948617 - type: recall value: 96.6403162055336 task: type: BitextMining - dataset: config: ceb_Latn-rus_Cyrl name: MTEB FloresBitextMining (ceb_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.7509881422925 - type: f1 value: 95.05209198303827 - type: main_score value: 95.05209198303827 - type: precision value: 94.77662283368805 - type: recall value: 95.7509881422925 task: type: BitextMining - dataset: config: fuv_Latn-rus_Cyrl name: MTEB FloresBitextMining (fuv_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 45.25691699604743 - type: f1 value: 42.285666666742365 - type: main_score value: 42.285666666742365 - type: precision value: 41.21979853402283 - type: recall value: 45.25691699604743 task: type: BitextMining - dataset: config: kac_Latn-rus_Cyrl name: MTEB FloresBitextMining (kac_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 34.683794466403164 - type: f1 value: 33.3235346229031 - type: main_score value: 33.3235346229031 - type: precision value: 32.94673924616852 - type: recall value: 34.683794466403164 task: type: BitextMining - dataset: config: lin_Latn-rus_Cyrl name: MTEB FloresBitextMining (lin_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 86.85770750988142 - type: f1 value: 85.1867110799439 - type: main_score value: 85.1867110799439 - type: precision value: 84.53038212173273 - type: recall value: 86.85770750988142 task: type: BitextMining - dataset: config: nno_Latn-rus_Cyrl name: MTEB FloresBitextMining (nno_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.4308300395257 - type: f1 value: 96.78383210991906 - type: main_score value: 96.78383210991906 - type: precision value: 96.51185770750989 - type: recall value: 97.4308300395257 task: type: BitextMining - dataset: config: sat_Olck-rus_Cyrl name: MTEB FloresBitextMining (sat_Olck-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 1.185770750988142 - type: f1 value: 1.0279253129117258 - type: main_score value: 1.0279253129117258 - type: precision value: 1.0129746819135175 - type: recall value: 1.185770750988142 task: type: BitextMining - dataset: config: tel_Telu-rus_Cyrl name: MTEB FloresBitextMining (tel_Telu-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.12252964426878 - type: f1 value: 97.61198945981555 - type: main_score value: 97.61198945981555 - type: precision value: 97.401185770751 - type: recall value: 98.12252964426878 task: type: BitextMining - dataset: config: ydd_Hebr-rus_Cyrl name: MTEB FloresBitextMining (ydd_Hebr-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 75.8893280632411 - type: f1 value: 74.00244008018511 - type: main_score value: 74.00244008018511 - type: precision value: 73.25683020960382 - type: recall value: 75.8893280632411 task: type: BitextMining - dataset: config: ary_Arab-rus_Cyrl name: MTEB FloresBitextMining (ary_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy 
value: 86.56126482213439 - type: f1 value: 83.72796285839765 - type: main_score value: 83.72796285839765 - type: precision value: 82.65014273166447 - type: recall value: 86.56126482213439 task: type: BitextMining - dataset: config: ces_Latn-rus_Cyrl name: MTEB FloresBitextMining (ces_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.60474308300395 - type: f1 value: 99.4729907773386 - type: main_score value: 99.4729907773386 - type: precision value: 99.40711462450594 - type: recall value: 99.60474308300395 task: type: BitextMining - dataset: config: gaz_Latn-rus_Cyrl name: MTEB FloresBitextMining (gaz_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 42.58893280632411 - type: f1 value: 40.75832866805978 - type: main_score value: 40.75832866805978 - type: precision value: 40.14285046917723 - type: recall value: 42.58893280632411 task: type: BitextMining - dataset: config: kam_Latn-rus_Cyrl name: MTEB FloresBitextMining (kam_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 45.25691699604743 - type: f1 value: 42.6975518029456 - type: main_score value: 42.6975518029456 - type: precision value: 41.87472710984596 - type: recall value: 45.25691699604743 task: type: BitextMining - dataset: config: lit_Latn-rus_Cyrl name: MTEB FloresBitextMining (lit_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.33201581027669 - type: f1 value: 96.62384716732542 - type: main_score value: 96.62384716732542 - type: precision value: 96.3175230566535 - type: recall value: 97.33201581027669 task: type: BitextMining - dataset: config: nob_Latn-rus_Cyrl name: MTEB FloresBitextMining (nob_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.30368906455863 - type: main_score value: 98.30368906455863 - type: precision value: 98.10606060606061 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: scn_Latn-rus_Cyrl name: MTEB FloresBitextMining (scn_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 70.45454545454545 - type: f1 value: 68.62561022640075 - type: main_score value: 68.62561022640075 - type: precision value: 67.95229103411222 - type: recall value: 70.45454545454545 task: type: BitextMining - dataset: config: tgk_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (tgk_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.4901185770751 - type: f1 value: 91.58514492753623 - type: main_score value: 91.58514492753623 - type: precision value: 91.24759298672342 - type: recall value: 92.4901185770751 task: type: BitextMining - dataset: config: yor_Latn-rus_Cyrl name: MTEB FloresBitextMining (yor_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 67.98418972332016 - type: f1 value: 64.72874247330768 - type: main_score value: 64.72874247330768 - type: precision value: 63.450823399938685 - type: recall value: 67.98418972332016 task: type: BitextMining - dataset: config: arz_Arab-rus_Cyrl name: MTEB FloresBitextMining 
(arz_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 94.56521739130434 - type: f1 value: 93.07971014492755 - type: main_score value: 93.07971014492755 - type: precision value: 92.42753623188406 - type: recall value: 94.56521739130434 task: type: BitextMining - dataset: config: cjk_Latn-rus_Cyrl name: MTEB FloresBitextMining (cjk_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 38.63636363636363 - type: f1 value: 36.25747140862938 - type: main_score value: 36.25747140862938 - type: precision value: 35.49101355074723 - type: recall value: 38.63636363636363 task: type: BitextMining - dataset: config: gla_Latn-rus_Cyrl name: MTEB FloresBitextMining (gla_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 69.26877470355731 - type: f1 value: 66.11797423328613 - type: main_score value: 66.11797423328613 - type: precision value: 64.89369649409694 - type: recall value: 69.26877470355731 task: type: BitextMining - dataset: config: kan_Knda-rus_Cyrl name: MTEB FloresBitextMining (kan_Knda-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.51505740636176 - type: main_score value: 97.51505740636176 - type: precision value: 97.30731225296442 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: lmo_Latn-rus_Cyrl name: MTEB FloresBitextMining (lmo_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 73.3201581027668 - type: f1 value: 71.06371608677273 - type: main_score value: 71.06371608677273 - type: precision value: 70.26320288266223 - type: recall value: 73.3201581027668 task: type: BitextMining - dataset: config: npi_Deva-rus_Cyrl name: MTEB FloresBitextMining (npi_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.82608695652173 - type: f1 value: 97.36645107198466 - type: main_score value: 97.36645107198466 - type: precision value: 97.1772068511199 - type: recall value: 97.82608695652173 task: type: BitextMining - dataset: config: shn_Mymr-rus_Cyrl name: MTEB FloresBitextMining (shn_Mymr-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 39.426877470355734 - type: f1 value: 37.16728785513024 - type: main_score value: 37.16728785513024 - type: precision value: 36.56918548278505 - type: recall value: 39.426877470355734 task: type: BitextMining - dataset: config: tgl_Latn-rus_Cyrl name: MTEB FloresBitextMining (tgl_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.92490118577075 - type: f1 value: 97.6378693769998 - type: main_score value: 97.6378693769998 - type: precision value: 97.55371440154047 - type: recall value: 97.92490118577075 task: type: BitextMining - dataset: config: yue_Hant-rus_Cyrl name: MTEB FloresBitextMining (yue_Hant-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.92490118577075 - type: f1 value: 97.3833051006964 - type: main_score value: 97.3833051006964 - type: precision value: 97.1590909090909 - 
type: recall value: 97.92490118577075 task: type: BitextMining - dataset: config: asm_Beng-rus_Cyrl name: MTEB FloresBitextMining (asm_Beng-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.78656126482213 - type: f1 value: 91.76917395296842 - type: main_score value: 91.76917395296842 - type: precision value: 91.38292866553736 - type: recall value: 92.78656126482213 task: type: BitextMining - dataset: config: ckb_Arab-rus_Cyrl name: MTEB FloresBitextMining (ckb_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 80.8300395256917 - type: f1 value: 79.17664345468799 - type: main_score value: 79.17664345468799 - type: precision value: 78.5622171683459 - type: recall value: 80.8300395256917 task: type: BitextMining - dataset: config: gle_Latn-rus_Cyrl name: MTEB FloresBitextMining (gle_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 85.86956521739131 - type: f1 value: 84.45408265372492 - type: main_score value: 84.45408265372492 - type: precision value: 83.8774340026703 - type: recall value: 85.86956521739131 task: type: BitextMining - dataset: config: kas_Arab-rus_Cyrl name: MTEB FloresBitextMining (kas_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 76.28458498023716 - type: f1 value: 74.11216313578267 - type: main_score value: 74.11216313578267 - type: precision value: 73.2491277759584 - type: recall value: 76.28458498023716 task: type: BitextMining - dataset: config: ltg_Latn-rus_Cyrl name: MTEB FloresBitextMining (ltg_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 71.14624505928853 - type: f1 value: 68.69245357723618 - type: main_score value: 68.69245357723618 - type: precision value: 67.8135329666459 - type: recall value: 71.14624505928853 task: type: BitextMining - dataset: config: nso_Latn-rus_Cyrl name: MTEB FloresBitextMining (nso_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 87.64822134387352 - type: f1 value: 85.98419219986725 - type: main_score value: 85.98419219986725 - type: precision value: 85.32513873917036 - type: recall value: 87.64822134387352 task: type: BitextMining - dataset: config: sin_Sinh-rus_Cyrl name: MTEB FloresBitextMining (sin_Sinh-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.62845849802372 - type: f1 value: 97.10144927536231 - type: main_score value: 97.10144927536231 - type: precision value: 96.87986585219788 - type: recall value: 97.62845849802372 task: type: BitextMining - dataset: config: tha_Thai-rus_Cyrl name: MTEB FloresBitextMining (tha_Thai-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.71541501976284 - type: f1 value: 98.28722002635045 - type: main_score value: 98.28722002635045 - type: precision value: 98.07312252964427 - type: recall value: 98.71541501976284 task: type: BitextMining - dataset: config: zho_Hans-rus_Cyrl name: MTEB FloresBitextMining (zho_Hans-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 
99.01185770750988 - type: f1 value: 98.68247694334651 - type: main_score value: 98.68247694334651 - type: precision value: 98.51778656126481 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: ast_Latn-rus_Cyrl name: MTEB FloresBitextMining (ast_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.65217391304348 - type: f1 value: 94.90649683857505 - type: main_score value: 94.90649683857505 - type: precision value: 94.61352657004831 - type: recall value: 95.65217391304348 task: type: BitextMining - dataset: config: crh_Latn-rus_Cyrl name: MTEB FloresBitextMining (crh_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 93.08300395256917 - type: f1 value: 92.20988998886428 - type: main_score value: 92.20988998886428 - type: precision value: 91.85631013694254 - type: recall value: 93.08300395256917 task: type: BitextMining - dataset: config: glg_Latn-rus_Cyrl name: MTEB FloresBitextMining (glg_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.55335968379447 - type: f1 value: 95.18006148440931 - type: main_score value: 95.18006148440931 - type: precision value: 95.06540560888386 - type: recall value: 95.55335968379447 task: type: BitextMining - dataset: config: kas_Deva-rus_Cyrl name: MTEB FloresBitextMining (kas_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 55.03952569169961 - type: f1 value: 52.19871938895554 - type: main_score value: 52.19871938895554 - type: precision value: 51.17660971469557 - type: recall value: 55.03952569169961 task: type: BitextMining - dataset: config: ltz_Latn-rus_Cyrl name: MTEB FloresBitextMining (ltz_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 87.64822134387352 - type: f1 value: 86.64179841897234 - type: main_score value: 86.64179841897234 - type: precision value: 86.30023235431587 - type: recall value: 87.64822134387352 task: type: BitextMining - dataset: config: nus_Latn-rus_Cyrl name: MTEB FloresBitextMining (nus_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 27.4703557312253 - type: f1 value: 25.703014277858088 - type: main_score value: 25.703014277858088 - type: precision value: 25.194105476917315 - type: recall value: 27.4703557312253 task: type: BitextMining - dataset: config: slk_Latn-rus_Cyrl name: MTEB FloresBitextMining (slk_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.1106719367589 - type: main_score value: 99.1106719367589 - type: precision value: 99.02832674571805 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: tir_Ethi-rus_Cyrl name: MTEB FloresBitextMining (tir_Ethi-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 80.73122529644269 - type: f1 value: 78.66903754775608 - type: main_score value: 78.66903754775608 - type: precision value: 77.86431694163612 - type: recall value: 80.73122529644269 task: type: BitextMining - dataset: config: zho_Hant-rus_Cyrl name: MTEB FloresBitextMining 
(zho_Hant-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.22134387351778 - type: f1 value: 97.66798418972333 - type: main_score value: 97.66798418972333 - type: precision value: 97.40612648221344 - type: recall value: 98.22134387351778 task: type: BitextMining - dataset: config: awa_Deva-rus_Cyrl name: MTEB FloresBitextMining (awa_Deva-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.5296442687747 - type: f1 value: 96.94224857268335 - type: main_score value: 96.94224857268335 - type: precision value: 96.68560606060606 - type: recall value: 97.5296442687747 task: type: BitextMining - dataset: config: cym_Latn-rus_Cyrl name: MTEB FloresBitextMining (cym_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 92.68774703557312 - type: f1 value: 91.69854302097961 - type: main_score value: 91.69854302097961 - type: precision value: 91.31236846157795 - type: recall value: 92.68774703557312 task: type: BitextMining - dataset: config: grn_Latn-rus_Cyrl name: MTEB FloresBitextMining (grn_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 64.13043478260869 - type: f1 value: 61.850586118740004 - type: main_score value: 61.850586118740004 - type: precision value: 61.0049495186209 - type: recall value: 64.13043478260869 task: type: BitextMining - dataset: config: kat_Geor-rus_Cyrl name: MTEB FloresBitextMining (kat_Geor-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.59881422924902 - type: main_score value: 97.59881422924902 - type: precision value: 97.42534036012296 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: lua_Latn-rus_Cyrl name: MTEB FloresBitextMining (lua_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 63.63636363636363 - type: f1 value: 60.9709122526128 - type: main_score value: 60.9709122526128 - type: precision value: 60.03915902282226 - type: recall value: 63.63636363636363 task: type: BitextMining - dataset: config: nya_Latn-rus_Cyrl name: MTEB FloresBitextMining (nya_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 89.2292490118577 - type: f1 value: 87.59723824473149 - type: main_score value: 87.59723824473149 - type: precision value: 86.90172707867349 - type: recall value: 89.2292490118577 task: type: BitextMining - dataset: config: slv_Latn-rus_Cyrl name: MTEB FloresBitextMining (slv_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.01185770750988 - type: f1 value: 98.74835309617917 - type: main_score value: 98.74835309617917 - type: precision value: 98.63636363636364 - type: recall value: 99.01185770750988 task: type: BitextMining - dataset: config: tpi_Latn-rus_Cyrl name: MTEB FloresBitextMining (tpi_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 77.37154150197628 - type: f1 value: 75.44251611276084 - type: main_score value: 75.44251611276084 - type: precision value: 74.78103665109595 - 
type: recall value: 77.37154150197628 task: type: BitextMining - dataset: config: zsm_Latn-rus_Cyrl name: MTEB FloresBitextMining (zsm_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.2094861660079 - type: f1 value: 98.96245059288538 - type: main_score value: 98.96245059288538 - type: precision value: 98.8471673254282 - type: recall value: 99.2094861660079 task: type: BitextMining - dataset: config: ayr_Latn-rus_Cyrl name: MTEB FloresBitextMining (ayr_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 27.766798418972332 - type: f1 value: 26.439103195281312 - type: main_score value: 26.439103195281312 - type: precision value: 26.052655604573964 - type: recall value: 27.766798418972332 task: type: BitextMining - dataset: config: dan_Latn-rus_Cyrl name: MTEB FloresBitextMining (dan_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.30830039525692 - type: f1 value: 99.07773386034255 - type: main_score value: 99.07773386034255 - type: precision value: 98.96245059288538 - type: recall value: 99.30830039525692 task: type: BitextMining - dataset: config: guj_Gujr-rus_Cyrl name: MTEB FloresBitextMining (guj_Gujr-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.82608695652173 - type: f1 value: 97.26449275362317 - type: main_score value: 97.26449275362317 - type: precision value: 97.02498588368154 - type: recall value: 97.82608695652173 task: type: BitextMining - dataset: config: kaz_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (kaz_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.5296442687747 - type: f1 value: 97.03557312252964 - type: main_score value: 97.03557312252964 - type: precision value: 96.85022158342316 - type: recall value: 97.5296442687747 task: type: BitextMining - dataset: config: lug_Latn-rus_Cyrl name: MTEB FloresBitextMining (lug_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 68.57707509881423 - type: f1 value: 65.93361605820395 - type: main_score value: 65.93361605820395 - type: precision value: 64.90348248593789 - type: recall value: 68.57707509881423 task: type: BitextMining - dataset: config: oci_Latn-rus_Cyrl name: MTEB FloresBitextMining (oci_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 86.26482213438736 - type: f1 value: 85.33176417155623 - type: main_score value: 85.33176417155623 - type: precision value: 85.00208833384637 - type: recall value: 86.26482213438736 task: type: BitextMining - dataset: config: smo_Latn-rus_Cyrl name: MTEB FloresBitextMining (smo_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 77.96442687747036 - type: f1 value: 75.70960450188885 - type: main_score value: 75.70960450188885 - type: precision value: 74.8312632736777 - type: recall value: 77.96442687747036 task: type: BitextMining - dataset: config: tsn_Latn-rus_Cyrl name: MTEB FloresBitextMining (tsn_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 
84.38735177865613 - type: f1 value: 82.13656376349225 - type: main_score value: 82.13656376349225 - type: precision value: 81.16794543904518 - type: recall value: 84.38735177865613 task: type: BitextMining - dataset: config: zul_Latn-rus_Cyrl name: MTEB FloresBitextMining (zul_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 90.21739130434783 - type: f1 value: 88.77570602050753 - type: main_score value: 88.77570602050753 - type: precision value: 88.15978104021582 - type: recall value: 90.21739130434783 task: type: BitextMining - dataset: config: azb_Arab-rus_Cyrl name: MTEB FloresBitextMining (azb_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 65.71146245059289 - type: f1 value: 64.18825390221271 - type: main_score value: 64.18825390221271 - type: precision value: 63.66811154793568 - type: recall value: 65.71146245059289 task: type: BitextMining - dataset: config: deu_Latn-rus_Cyrl name: MTEB FloresBitextMining (deu_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 99.70355731225297 - type: f1 value: 99.60474308300395 - type: main_score value: 99.60474308300395 - type: precision value: 99.55533596837944 - type: recall value: 99.70355731225297 task: type: BitextMining - dataset: config: hat_Latn-rus_Cyrl name: MTEB FloresBitextMining (hat_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 86.7588932806324 - type: f1 value: 85.86738623695146 - type: main_score value: 85.86738623695146 - type: precision value: 85.55235467420822 - type: recall value: 86.7588932806324 task: type: BitextMining - dataset: config: kbp_Latn-rus_Cyrl name: MTEB FloresBitextMining (kbp_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 34.88142292490119 - type: f1 value: 32.16511669463015 - type: main_score value: 32.16511669463015 - type: precision value: 31.432098549546318 - type: recall value: 34.88142292490119 task: type: BitextMining - dataset: config: luo_Latn-rus_Cyrl name: MTEB FloresBitextMining (luo_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 52.27272727272727 - type: f1 value: 49.60489626836975 - type: main_score value: 49.60489626836975 - type: precision value: 48.69639631803339 - type: recall value: 52.27272727272727 task: type: BitextMining - dataset: config: ory_Orya-rus_Cyrl name: MTEB FloresBitextMining (ory_Orya-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.82608695652173 - type: f1 value: 97.27437417654808 - type: main_score value: 97.27437417654808 - type: precision value: 97.04968944099377 - type: recall value: 97.82608695652173 task: type: BitextMining - dataset: config: sna_Latn-rus_Cyrl name: MTEB FloresBitextMining (sna_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 85.37549407114624 - type: f1 value: 83.09911316305177 - type: main_score value: 83.09911316305177 - type: precision value: 82.1284950958864 - type: recall value: 85.37549407114624 task: type: BitextMining - dataset: config: tso_Latn-rus_Cyrl name: MTEB FloresBitextMining 
(tso_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 82.90513833992095 - type: f1 value: 80.28290385503824 - type: main_score value: 80.28290385503824 - type: precision value: 79.23672543237761 - type: recall value: 82.90513833992095 task: type: BitextMining - dataset: config: azj_Latn-rus_Cyrl name: MTEB FloresBitextMining (azj_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.02371541501977 - type: f1 value: 97.49200075287031 - type: main_score value: 97.49200075287031 - type: precision value: 97.266139657444 - type: recall value: 98.02371541501977 task: type: BitextMining - dataset: config: dik_Latn-rus_Cyrl name: MTEB FloresBitextMining (dik_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 38.43873517786561 - type: f1 value: 35.78152442955223 - type: main_score value: 35.78152442955223 - type: precision value: 34.82424325078237 - type: recall value: 38.43873517786561 task: type: BitextMining - dataset: config: hau_Latn-rus_Cyrl name: MTEB FloresBitextMining (hau_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.42292490118577 - type: f1 value: 79.24612283124593 - type: main_score value: 79.24612283124593 - type: precision value: 78.34736070751448 - type: recall value: 81.42292490118577 task: type: BitextMining - dataset: config: kea_Latn-rus_Cyrl name: MTEB FloresBitextMining (kea_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 81.62055335968378 - type: f1 value: 80.47015182884748 - type: main_score value: 80.47015182884748 - type: precision value: 80.02671028885862 - type: recall value: 81.62055335968378 task: type: BitextMining - dataset: config: lus_Latn-rus_Cyrl name: MTEB FloresBitextMining (lus_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 62.74703557312253 - type: f1 value: 60.53900079111122 - type: main_score value: 60.53900079111122 - type: precision value: 59.80024202850289 - type: recall value: 62.74703557312253 task: type: BitextMining - dataset: config: pag_Latn-rus_Cyrl name: MTEB FloresBitextMining (pag_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 74.01185770750988 - type: f1 value: 72.57280648279529 - type: main_score value: 72.57280648279529 - type: precision value: 71.99952968456789 - type: recall value: 74.01185770750988 task: type: BitextMining - dataset: config: snd_Arab-rus_Cyrl name: MTEB FloresBitextMining (snd_Arab-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 91.30434782608695 - type: f1 value: 90.24653499445358 - type: main_score value: 90.24653499445358 - type: precision value: 89.83134068200232 - type: recall value: 91.30434782608695 task: type: BitextMining - dataset: config: tuk_Latn-rus_Cyrl name: MTEB FloresBitextMining (tuk_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 47.62845849802372 - type: f1 value: 45.812928836644254 - type: main_score value: 45.812928836644254 - type: precision value: 
45.23713833170355 - type: recall value: 47.62845849802372 task: type: BitextMining - dataset: config: bak_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (bak_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.8498023715415 - type: f1 value: 95.18904459615922 - type: main_score value: 95.18904459615922 - type: precision value: 94.92812441182006 - type: recall value: 95.8498023715415 task: type: BitextMining - dataset: config: dyu_Latn-rus_Cyrl name: MTEB FloresBitextMining (dyu_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 29.64426877470356 - type: f1 value: 27.287335193938166 - type: main_score value: 27.287335193938166 - type: precision value: 26.583996026587492 - type: recall value: 29.64426877470356 task: type: BitextMining - dataset: config: heb_Hebr-rus_Cyrl name: MTEB FloresBitextMining (heb_Hebr-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 98.91304347826086 - type: f1 value: 98.55072463768116 - type: main_score value: 98.55072463768116 - type: precision value: 98.36956521739131 - type: recall value: 98.91304347826086 task: type: BitextMining - dataset: config: khk_Cyrl-rus_Cyrl name: MTEB FloresBitextMining (khk_Cyrl-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 95.15810276679841 - type: f1 value: 94.44009547764487 - type: main_score value: 94.44009547764487 - type: precision value: 94.16579797014579 - type: recall value: 95.15810276679841 task: type: BitextMining - dataset: config: lvs_Latn-rus_Cyrl name: MTEB FloresBitextMining (lvs_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.92490118577075 - type: f1 value: 97.51467241585817 - type: main_score value: 97.51467241585817 - type: precision value: 97.36166007905138 - type: recall value: 97.92490118577075 task: type: BitextMining - dataset: config: pan_Guru-rus_Cyrl name: MTEB FloresBitextMining (pan_Guru-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 97.92490118577075 - type: f1 value: 97.42918313570486 - type: main_score value: 97.42918313570486 - type: precision value: 97.22261434217955 - type: recall value: 97.92490118577075 task: type: BitextMining - dataset: config: som_Latn-rus_Cyrl name: MTEB FloresBitextMining (som_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 75.69169960474308 - type: f1 value: 73.7211667065916 - type: main_score value: 73.7211667065916 - type: precision value: 72.95842401892384 - type: recall value: 75.69169960474308 task: type: BitextMining - dataset: config: tum_Latn-rus_Cyrl name: MTEB FloresBitextMining (tum_Latn-rus_Cyrl) revision: e6b647fcb6299a2f686f742f4d4c023e553ea67e split: devtest type: mteb/flores metrics: - type: accuracy value: 85.67193675889328 - type: f1 value: 82.9296066252588 - type: main_score value: 82.9296066252588 - type: precision value: 81.77330225447936 - type: recall value: 85.67193675889328 task: type: BitextMining - dataset: config: default name: MTEB GeoreviewClassification (default) revision: 3765c0d1de6b7d264bc459433c45e5a75513839c split: test type: ai-forever/georeview-classification metrics: - type: 
accuracy value: 44.6630859375 - type: f1 value: 42.607425073610536 - type: f1_weighted value: 42.60639474586065 - type: main_score value: 44.6630859375 task: type: Classification - dataset: config: default name: MTEB GeoreviewClusteringP2P (default) revision: 97a313c8fc85b47f13f33e7e9a95c1ad888c7fec split: test type: ai-forever/georeview-clustering-p2p metrics: - type: main_score value: 58.15951247070825 - type: v_measure value: 58.15951247070825 - type: v_measure_std value: 0.6739615788288809 task: type: Clustering - dataset: config: default name: MTEB HeadlineClassification (default) revision: 2fe05ee6b5832cda29f2ef7aaad7b7fe6a3609eb split: test type: ai-forever/headline-classification metrics: - type: accuracy value: 73.935546875 - type: f1 value: 73.8654872186846 - type: f1_weighted value: 73.86733122685095 - type: main_score value: 73.935546875 task: type: Classification - dataset: config: default name: MTEB InappropriatenessClassification (default) revision: 601651fdc45ef243751676e62dd7a19f491c0285 split: test type: ai-forever/inappropriateness-classification metrics: - type: accuracy value: 59.16015624999999 - type: ap value: 55.52276605836938 - type: ap_weighted value: 55.52276605836938 - type: f1 value: 58.614248199637956 - type: f1_weighted value: 58.614248199637956 - type: main_score value: 59.16015624999999 task: type: Classification - dataset: config: default name: MTEB KinopoiskClassification (default) revision: 5911f26666ac11af46cb9c6849d0dc80a378af24 split: test type: ai-forever/kinopoisk-sentiment-classification metrics: - type: accuracy value: 49.959999999999994 - type: f1 value: 48.4900332316098 - type: f1_weighted value: 48.4900332316098 - type: main_score value: 49.959999999999994 task: type: Classification - dataset: config: default name: MTEB LanguageClassification (default) revision: aa56583bf2bc52b0565770607d6fc3faebecf9e2 split: test type: papluca/language-identification metrics: - type: accuracy value: 71.005859375 - type: f1 value: 69.63481100303348 - type: f1_weighted value: 69.64640413409529 - type: main_score value: 71.005859375 task: type: Classification - dataset: config: ru name: MTEB MLSUMClusteringP2P (ru) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 42.11280087032343 - type: v_measure value: 42.11280087032343 - type: v_measure_std value: 6.7619971723605135 task: type: Clustering - dataset: config: ru name: MTEB MLSUMClusteringP2P.v2 (ru) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 43.00112546945811 - type: v_measure value: 43.00112546945811 - type: v_measure_std value: 1.4740560414835675 task: type: Clustering - dataset: config: ru name: MTEB MLSUMClusteringS2S (ru) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 39.81446080575161 - type: v_measure value: 39.81446080575161 - type: v_measure_std value: 7.125661320308298 task: type: Clustering - dataset: config: ru name: MTEB MLSUMClusteringS2S.v2 (ru) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 39.29659668980239 - type: v_measure value: 39.29659668980239 - type: v_measure_std value: 2.6570502923023094 task: type: Clustering - dataset: config: ru name: MTEB MultiLongDocRetrieval (ru) revision: d67138e705d963e346253a80e59676ddb418810a split: dev type: Shitao/MLDR metrics: - type: main_score value: 38.671 - type: map_at_1 
value: 30.0 - type: map_at_10 value: 36.123 - type: map_at_100 value: 36.754999999999995 - type: map_at_1000 value: 36.806 - type: map_at_20 value: 36.464 - type: map_at_3 value: 35.25 - type: map_at_5 value: 35.8 - type: mrr_at_1 value: 30.0 - type: mrr_at_10 value: 36.122817460317464 - type: mrr_at_100 value: 36.75467016625293 - type: mrr_at_1000 value: 36.80612724920882 - type: mrr_at_20 value: 36.46359681984682 - type: mrr_at_3 value: 35.25 - type: mrr_at_5 value: 35.800000000000004 - type: nauc_map_at_1000_diff1 value: 55.61987610843598 - type: nauc_map_at_1000_max value: 52.506795017152186 - type: nauc_map_at_1000_std value: 2.95487192066911 - type: nauc_map_at_100_diff1 value: 55.598419532054734 - type: nauc_map_at_100_max value: 52.48192017040307 - type: nauc_map_at_100_std value: 2.930120252521189 - type: nauc_map_at_10_diff1 value: 56.02309155375198 - type: nauc_map_at_10_max value: 52.739573233234424 - type: nauc_map_at_10_std value: 2.4073432421641545 - type: nauc_map_at_1_diff1 value: 52.57059856776112 - type: nauc_map_at_1_max value: 50.55668152952304 - type: nauc_map_at_1_std value: 1.6572084853398048 - type: nauc_map_at_20_diff1 value: 55.75769029917031 - type: nauc_map_at_20_max value: 52.53663737242853 - type: nauc_map_at_20_std value: 2.8489192879814 - type: nauc_map_at_3_diff1 value: 56.90294128342709 - type: nauc_map_at_3_max value: 53.10608389782041 - type: nauc_map_at_3_std value: 1.4909731657889491 - type: nauc_map_at_5_diff1 value: 56.1258315436073 - type: nauc_map_at_5_max value: 52.398078357541564 - type: nauc_map_at_5_std value: 1.8256862015101467 - type: nauc_mrr_at_1000_diff1 value: 55.61987610843598 - type: nauc_mrr_at_1000_max value: 52.506795017152186 - type: nauc_mrr_at_1000_std value: 2.95487192066911 - type: nauc_mrr_at_100_diff1 value: 55.598419532054734 - type: nauc_mrr_at_100_max value: 52.48192017040307 - type: nauc_mrr_at_100_std value: 2.930120252521189 - type: nauc_mrr_at_10_diff1 value: 56.02309155375198 - type: nauc_mrr_at_10_max value: 52.739573233234424 - type: nauc_mrr_at_10_std value: 2.4073432421641545 - type: nauc_mrr_at_1_diff1 value: 52.57059856776112 - type: nauc_mrr_at_1_max value: 50.55668152952304 - type: nauc_mrr_at_1_std value: 1.6572084853398048 - type: nauc_mrr_at_20_diff1 value: 55.75769029917031 - type: nauc_mrr_at_20_max value: 52.53663737242853 - type: nauc_mrr_at_20_std value: 2.8489192879814 - type: nauc_mrr_at_3_diff1 value: 56.90294128342709 - type: nauc_mrr_at_3_max value: 53.10608389782041 - type: nauc_mrr_at_3_std value: 1.4909731657889491 - type: nauc_mrr_at_5_diff1 value: 56.1258315436073 - type: nauc_mrr_at_5_max value: 52.398078357541564 - type: nauc_mrr_at_5_std value: 1.8256862015101467 - type: nauc_ndcg_at_1000_diff1 value: 55.30733548408918 - type: nauc_ndcg_at_1000_max value: 53.51143366189318 - type: nauc_ndcg_at_1000_std value: 7.133789405525702 - type: nauc_ndcg_at_100_diff1 value: 54.32209039488095 - type: nauc_ndcg_at_100_max value: 52.67499334461009 - type: nauc_ndcg_at_100_std value: 6.878823275077807 - type: nauc_ndcg_at_10_diff1 value: 56.266780806997716 - type: nauc_ndcg_at_10_max value: 53.52837255793743 - type: nauc_ndcg_at_10_std value: 3.756832592964262 - type: nauc_ndcg_at_1_diff1 value: 52.57059856776112 - type: nauc_ndcg_at_1_max value: 50.55668152952304 - type: nauc_ndcg_at_1_std value: 1.6572084853398048 - type: nauc_ndcg_at_20_diff1 value: 55.39255420432796 - type: nauc_ndcg_at_20_max value: 52.946114684072235 - type: nauc_ndcg_at_20_std value: 5.414933414031693 - type: nauc_ndcg_at_3_diff1 
value: 57.92826624996289 - type: nauc_ndcg_at_3_max value: 53.89907760306972 - type: nauc_ndcg_at_3_std value: 1.6661401245309218 - type: nauc_ndcg_at_5_diff1 value: 56.47508936029308 - type: nauc_ndcg_at_5_max value: 52.66800998045517 - type: nauc_ndcg_at_5_std value: 2.4127296184140423 - type: nauc_precision_at_1000_diff1 value: 57.25924020238401 - type: nauc_precision_at_1000_max value: 65.1132590931922 - type: nauc_precision_at_1000_std value: 40.60788709618145 - type: nauc_precision_at_100_diff1 value: 46.49620002554606 - type: nauc_precision_at_100_max value: 53.02960148167071 - type: nauc_precision_at_100_std value: 28.206028867032863 - type: nauc_precision_at_10_diff1 value: 56.562744749606765 - type: nauc_precision_at_10_max value: 56.00594967783547 - type: nauc_precision_at_10_std value: 8.368379831645163 - type: nauc_precision_at_1_diff1 value: 52.57059856776112 - type: nauc_precision_at_1_max value: 50.55668152952304 - type: nauc_precision_at_1_std value: 1.6572084853398048 - type: nauc_precision_at_20_diff1 value: 53.25915754614111 - type: nauc_precision_at_20_max value: 54.03255118937036 - type: nauc_precision_at_20_std value: 15.161611674272718 - type: nauc_precision_at_3_diff1 value: 60.726785748943854 - type: nauc_precision_at_3_max value: 56.139896875869354 - type: nauc_precision_at_3_std value: 2.2306901035769893 - type: nauc_precision_at_5_diff1 value: 57.1201127525187 - type: nauc_precision_at_5_max value: 53.28665761862506 - type: nauc_precision_at_5_std value: 4.358720050112237 - type: nauc_recall_at_1000_diff1 value: 57.259240202383964 - type: nauc_recall_at_1000_max value: 65.11325909319218 - type: nauc_recall_at_1000_std value: 40.60788709618142 - type: nauc_recall_at_100_diff1 value: 46.49620002554603 - type: nauc_recall_at_100_max value: 53.02960148167071 - type: nauc_recall_at_100_std value: 28.206028867032835 - type: nauc_recall_at_10_diff1 value: 56.562744749606765 - type: nauc_recall_at_10_max value: 56.00594967783549 - type: nauc_recall_at_10_std value: 8.368379831645147 - type: nauc_recall_at_1_diff1 value: 52.57059856776112 - type: nauc_recall_at_1_max value: 50.55668152952304 - type: nauc_recall_at_1_std value: 1.6572084853398048 - type: nauc_recall_at_20_diff1 value: 53.259157546141154 - type: nauc_recall_at_20_max value: 54.03255118937038 - type: nauc_recall_at_20_std value: 15.16161167427274 - type: nauc_recall_at_3_diff1 value: 60.72678574894387 - type: nauc_recall_at_3_max value: 56.13989687586933 - type: nauc_recall_at_3_std value: 2.2306901035770066 - type: nauc_recall_at_5_diff1 value: 57.12011275251864 - type: nauc_recall_at_5_max value: 53.28665761862502 - type: nauc_recall_at_5_std value: 4.3587200501122245 - type: ndcg_at_1 value: 30.0 - type: ndcg_at_10 value: 38.671 - type: ndcg_at_100 value: 42.173 - type: ndcg_at_1000 value: 44.016 - type: ndcg_at_20 value: 39.845000000000006 - type: ndcg_at_3 value: 36.863 - type: ndcg_at_5 value: 37.874 - type: precision_at_1 value: 30.0 - type: precision_at_10 value: 4.65 - type: precision_at_100 value: 0.64 - type: precision_at_1000 value: 0.08 - type: precision_at_20 value: 2.55 - type: precision_at_3 value: 13.833 - type: precision_at_5 value: 8.799999999999999 - type: recall_at_1 value: 30.0 - type: recall_at_10 value: 46.5 - type: recall_at_100 value: 64.0 - type: recall_at_1000 value: 79.5 - type: recall_at_20 value: 51.0 - type: recall_at_3 value: 41.5 - type: recall_at_5 value: 44.0 task: type: Retrieval - dataset: config: rus name: MTEB MultilingualSentimentClassification (rus) revision: 
2b9b4d10fc589af67794141fe8cbd3739de1eb33 split: test type: mteb/multilingual-sentiment-classification metrics: - type: accuracy value: 79.52710495963092 - type: ap value: 84.5713457178972 - type: ap_weighted value: 84.5713457178972 - type: f1 value: 77.88661181524105 - type: f1_weighted value: 79.87563079922718 - type: main_score value: 79.52710495963092 task: type: Classification - dataset: config: arb_Arab-rus_Cyrl name: MTEB NTREXBitextMining (arb_Arab-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 86.47971957936905 - type: f1 value: 82.79864240805654 - type: main_score value: 82.79864240805654 - type: precision value: 81.21485800128767 - type: recall value: 86.47971957936905 task: type: BitextMining - dataset: config: bel_Cyrl-rus_Cyrl name: MTEB NTREXBitextMining (bel_Cyrl-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.84226339509264 - type: f1 value: 93.56399067465667 - type: main_score value: 93.56399067465667 - type: precision value: 93.01619095309631 - type: recall value: 94.84226339509264 task: type: BitextMining - dataset: config: ben_Beng-rus_Cyrl name: MTEB NTREXBitextMining (ben_Beng-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 92.18828242363544 - type: f1 value: 90.42393889620612 - type: main_score value: 90.42393889620612 - type: precision value: 89.67904925153297 - type: recall value: 92.18828242363544 task: type: BitextMining - dataset: config: bos_Latn-rus_Cyrl name: MTEB NTREXBitextMining (bos_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.69203805708563 - type: f1 value: 93.37172425304624 - type: main_score value: 93.37172425304624 - type: precision value: 92.79204521067315 - type: recall value: 94.69203805708563 task: type: BitextMining - dataset: config: bul_Cyrl-rus_Cyrl name: MTEB NTREXBitextMining (bul_Cyrl-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.99549323985978 - type: f1 value: 96.13086296110833 - type: main_score value: 96.13086296110833 - type: precision value: 95.72441996327827 - type: recall value: 96.99549323985978 task: type: BitextMining - dataset: config: ces_Latn-rus_Cyrl name: MTEB NTREXBitextMining (ces_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.94391587381071 - type: f1 value: 94.90680465142157 - type: main_score value: 94.90680465142157 - type: precision value: 94.44541812719079 - type: recall value: 95.94391587381071 task: type: BitextMining - dataset: config: deu_Latn-rus_Cyrl name: MTEB NTREXBitextMining (deu_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.09414121181773 - type: f1 value: 94.94408279085295 - type: main_score value: 94.94408279085295 - type: precision value: 94.41245201135037 - type: recall value: 96.09414121181773 task: type: BitextMining - dataset: config: ell_Grek-rus_Cyrl name: MTEB NTREXBitextMining (ell_Grek-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.19429143715573 - type: f1 value: 95.12101485561676 - type: main_score value: 95.12101485561676 - type: precision value: 
94.60440660991488 - type: recall value: 96.19429143715573 task: type: BitextMining - dataset: config: eng_Latn-rus_Cyrl name: MTEB NTREXBitextMining (eng_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.49474211316975 - type: f1 value: 95.46581777428045 - type: main_score value: 95.46581777428045 - type: precision value: 94.98414288098814 - type: recall value: 96.49474211316975 task: type: BitextMining - dataset: config: fas_Arab-rus_Cyrl name: MTEB NTREXBitextMining (fas_Arab-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.44166249374061 - type: f1 value: 92.92383018972905 - type: main_score value: 92.92383018972905 - type: precision value: 92.21957936905358 - type: recall value: 94.44166249374061 task: type: BitextMining - dataset: config: fin_Latn-rus_Cyrl name: MTEB NTREXBitextMining (fin_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 92.18828242363544 - type: f1 value: 90.2980661468393 - type: main_score value: 90.2980661468393 - type: precision value: 89.42580537472877 - type: recall value: 92.18828242363544 task: type: BitextMining - dataset: config: fra_Latn-rus_Cyrl name: MTEB NTREXBitextMining (fra_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.84376564847271 - type: f1 value: 94.81054915706895 - type: main_score value: 94.81054915706895 - type: precision value: 94.31369276136427 - type: recall value: 95.84376564847271 task: type: BitextMining - dataset: config: heb_Hebr-rus_Cyrl name: MTEB NTREXBitextMining (heb_Hebr-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.89233850776164 - type: f1 value: 93.42513770655985 - type: main_score value: 93.42513770655985 - type: precision value: 92.73493573693875 - type: recall value: 94.89233850776164 task: type: BitextMining - dataset: config: hin_Deva-rus_Cyrl name: MTEB NTREXBitextMining (hin_Deva-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.23985978968453 - type: f1 value: 91.52816526376867 - type: main_score value: 91.52816526376867 - type: precision value: 90.76745946425466 - type: recall value: 93.23985978968453 task: type: BitextMining - dataset: config: hrv_Latn-rus_Cyrl name: MTEB NTREXBitextMining (hrv_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.99098647971958 - type: f1 value: 92.36354531797697 - type: main_score value: 92.36354531797697 - type: precision value: 91.63228970439788 - type: recall value: 93.99098647971958 task: type: BitextMining - dataset: config: hun_Latn-rus_Cyrl name: MTEB NTREXBitextMining (hun_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.64046069103655 - type: f1 value: 92.05224503421799 - type: main_score value: 92.05224503421799 - type: precision value: 91.33998616973079 - type: recall value: 93.64046069103655 task: type: BitextMining - dataset: config: ind_Latn-rus_Cyrl name: MTEB NTREXBitextMining (ind_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 91.68753129694541 - type: f1 
value: 89.26222667334335 - type: main_score value: 89.26222667334335 - type: precision value: 88.14638624603572 - type: recall value: 91.68753129694541 task: type: BitextMining - dataset: config: jpn_Jpan-rus_Cyrl name: MTEB NTREXBitextMining (jpn_Jpan-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 91.28693039559339 - type: f1 value: 89.21161763348957 - type: main_score value: 89.21161763348957 - type: precision value: 88.31188340952988 - type: recall value: 91.28693039559339 task: type: BitextMining - dataset: config: kor_Hang-rus_Cyrl name: MTEB NTREXBitextMining (kor_Hang-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 89.53430145217827 - type: f1 value: 86.88322165788365 - type: main_score value: 86.88322165788365 - type: precision value: 85.73950211030831 - type: recall value: 89.53430145217827 task: type: BitextMining - dataset: config: lit_Latn-rus_Cyrl name: MTEB NTREXBitextMining (lit_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 90.28542814221332 - type: f1 value: 88.10249103814452 - type: main_score value: 88.10249103814452 - type: precision value: 87.17689323973752 - type: recall value: 90.28542814221332 task: type: BitextMining - dataset: config: mkd_Cyrl-rus_Cyrl name: MTEB NTREXBitextMining (mkd_Cyrl-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.04256384576865 - type: f1 value: 93.65643703650713 - type: main_score value: 93.65643703650713 - type: precision value: 93.02036387915207 - type: recall value: 95.04256384576865 task: type: BitextMining - dataset: config: nld_Latn-rus_Cyrl name: MTEB NTREXBitextMining (nld_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.39308963445168 - type: f1 value: 94.16207644800535 - type: main_score value: 94.16207644800535 - type: precision value: 93.582516632091 - type: recall value: 95.39308963445168 task: type: BitextMining - dataset: config: pol_Latn-rus_Cyrl name: MTEB NTREXBitextMining (pol_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.7436154231347 - type: f1 value: 94.5067601402103 - type: main_score value: 94.5067601402103 - type: precision value: 93.91587381071608 - type: recall value: 95.7436154231347 task: type: BitextMining - dataset: config: por_Latn-rus_Cyrl name: MTEB NTREXBitextMining (por_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 65.89884827240861 - type: f1 value: 64.61805459419219 - type: main_score value: 64.61805459419219 - type: precision value: 64.07119451106485 - type: recall value: 65.89884827240861 task: type: BitextMining - dataset: config: rus_Cyrl-arb_Arab name: MTEB NTREXBitextMining (rus_Cyrl-arb_Arab) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.2413620430646 - type: f1 value: 92.67663399861698 - type: main_score value: 92.67663399861698 - type: precision value: 91.94625271240193 - type: recall value: 94.2413620430646 task: type: BitextMining - dataset: config: rus_Cyrl-bel_Cyrl name: MTEB NTREXBitextMining (rus_Cyrl-bel_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: 
test type: mteb/NTREX metrics: - type: accuracy value: 94.89233850776164 - type: f1 value: 93.40343849106993 - type: main_score value: 93.40343849106993 - type: precision value: 92.74077783341679 - type: recall value: 94.89233850776164 task: type: BitextMining - dataset: config: rus_Cyrl-ben_Beng name: MTEB NTREXBitextMining (rus_Cyrl-ben_Beng) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.2914371557336 - type: f1 value: 92.62226673343348 - type: main_score value: 92.62226673343348 - type: precision value: 91.84610248706393 - type: recall value: 94.2914371557336 task: type: BitextMining - dataset: config: rus_Cyrl-bos_Latn name: MTEB NTREXBitextMining (rus_Cyrl-bos_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.69354031046569 - type: f1 value: 94.50418051319403 - type: main_score value: 94.50418051319403 - type: precision value: 93.95843765648473 - type: recall value: 95.69354031046569 task: type: BitextMining - dataset: config: rus_Cyrl-bul_Cyrl name: MTEB NTREXBitextMining (rus_Cyrl-bul_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.89384076114172 - type: f1 value: 94.66199298948423 - type: main_score value: 94.66199298948423 - type: precision value: 94.08028709731263 - type: recall value: 95.89384076114172 task: type: BitextMining - dataset: config: rus_Cyrl-ces_Latn name: MTEB NTREXBitextMining (rus_Cyrl-ces_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.94091136705057 - type: f1 value: 92.3746731207923 - type: main_score value: 92.3746731207923 - type: precision value: 91.66207644800535 - type: recall value: 93.94091136705057 task: type: BitextMining - dataset: config: rus_Cyrl-deu_Latn name: MTEB NTREXBitextMining (rus_Cyrl-deu_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.94391587381071 - type: f1 value: 94.76214321482223 - type: main_score value: 94.76214321482223 - type: precision value: 94.20380570856285 - type: recall value: 95.94391587381071 task: type: BitextMining - dataset: config: rus_Cyrl-ell_Grek name: MTEB NTREXBitextMining (rus_Cyrl-ell_Grek) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.44316474712068 - type: f1 value: 94.14788849941579 - type: main_score value: 94.14788849941579 - type: precision value: 93.54197963612084 - type: recall value: 95.44316474712068 task: type: BitextMining - dataset: config: rus_Cyrl-eng_Latn name: MTEB NTREXBitextMining (rus_Cyrl-eng_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 98.14722083124687 - type: f1 value: 97.57135703555333 - type: main_score value: 97.57135703555333 - type: precision value: 97.2959439158738 - type: recall value: 98.14722083124687 task: type: BitextMining - dataset: config: rus_Cyrl-fas_Arab name: MTEB NTREXBitextMining (rus_Cyrl-fas_Arab) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.64196294441662 - type: f1 value: 93.24653647137372 - type: main_score value: 93.24653647137372 - type: precision value: 92.60724419963279 - type: recall value: 94.64196294441662 task: type: BitextMining - dataset: config: rus_Cyrl-fin_Latn name: MTEB 
NTREXBitextMining (rus_Cyrl-fin_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 87.98197295943916 - type: f1 value: 85.23368385912201 - type: main_score value: 85.23368385912201 - type: precision value: 84.08159858835873 - type: recall value: 87.98197295943916 task: type: BitextMining - dataset: config: rus_Cyrl-fra_Latn name: MTEB NTREXBitextMining (rus_Cyrl-fra_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.24436654982473 - type: f1 value: 95.07093974294774 - type: main_score value: 95.07093974294774 - type: precision value: 94.49591053246536 - type: recall value: 96.24436654982473 task: type: BitextMining - dataset: config: rus_Cyrl-heb_Hebr name: MTEB NTREXBitextMining (rus_Cyrl-heb_Hebr) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 91.08662994491738 - type: f1 value: 88.5161074945752 - type: main_score value: 88.5161074945752 - type: precision value: 87.36187614755467 - type: recall value: 91.08662994491738 task: type: BitextMining - dataset: config: rus_Cyrl-hin_Deva name: MTEB NTREXBitextMining (rus_Cyrl-hin_Deva) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.04256384576865 - type: f1 value: 93.66382907694876 - type: main_score value: 93.66382907694876 - type: precision value: 93.05291270238692 - type: recall value: 95.04256384576865 task: type: BitextMining - dataset: config: rus_Cyrl-hrv_Latn name: MTEB NTREXBitextMining (rus_Cyrl-hrv_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.14271407110667 - type: f1 value: 93.7481221832749 - type: main_score value: 93.7481221832749 - type: precision value: 93.10930681736892 - type: recall value: 95.14271407110667 task: type: BitextMining - dataset: config: rus_Cyrl-hun_Latn name: MTEB NTREXBitextMining (rus_Cyrl-hun_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 90.18527791687532 - type: f1 value: 87.61415933423946 - type: main_score value: 87.61415933423946 - type: precision value: 86.5166400394242 - type: recall value: 90.18527791687532 task: type: BitextMining - dataset: config: rus_Cyrl-ind_Latn name: MTEB NTREXBitextMining (rus_Cyrl-ind_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.69053580370556 - type: f1 value: 91.83608746453012 - type: main_score value: 91.83608746453012 - type: precision value: 90.97145718577868 - type: recall value: 93.69053580370556 task: type: BitextMining - dataset: config: rus_Cyrl-jpn_Jpan name: MTEB NTREXBitextMining (rus_Cyrl-jpn_Jpan) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 89.48422633950926 - type: f1 value: 86.91271033534429 - type: main_score value: 86.91271033534429 - type: precision value: 85.82671626487351 - type: recall value: 89.48422633950926 task: type: BitextMining - dataset: config: rus_Cyrl-kor_Hang name: MTEB NTREXBitextMining (rus_Cyrl-kor_Hang) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 88.4827240861292 - type: f1 value: 85.35080398375342 - type: main_score value: 85.35080398375342 - type: precision value: 83.9588549490903 - type: recall value: 
88.4827240861292 task: type: BitextMining - dataset: config: rus_Cyrl-lit_Latn name: MTEB NTREXBitextMining (rus_Cyrl-lit_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 90.33550325488233 - type: f1 value: 87.68831819157307 - type: main_score value: 87.68831819157307 - type: precision value: 86.51524906407231 - type: recall value: 90.33550325488233 task: type: BitextMining - dataset: config: rus_Cyrl-mkd_Cyrl name: MTEB NTREXBitextMining (rus_Cyrl-mkd_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.94391587381071 - type: f1 value: 94.90402270071775 - type: main_score value: 94.90402270071775 - type: precision value: 94.43915873810715 - type: recall value: 95.94391587381071 task: type: BitextMining - dataset: config: rus_Cyrl-nld_Latn name: MTEB NTREXBitextMining (rus_Cyrl-nld_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 92.98948422633951 - type: f1 value: 91.04323151393756 - type: main_score value: 91.04323151393756 - type: precision value: 90.14688699716241 - type: recall value: 92.98948422633951 task: type: BitextMining - dataset: config: rus_Cyrl-pol_Latn name: MTEB NTREXBitextMining (rus_Cyrl-pol_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.34151226840261 - type: f1 value: 92.8726422967785 - type: main_score value: 92.8726422967785 - type: precision value: 92.19829744616925 - type: recall value: 94.34151226840261 task: type: BitextMining - dataset: config: rus_Cyrl-por_Latn name: MTEB NTREXBitextMining (rus_Cyrl-por_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 86.17926890335504 - type: f1 value: 82.7304882287356 - type: main_score value: 82.7304882287356 - type: precision value: 81.28162481817964 - type: recall value: 86.17926890335504 task: type: BitextMining - dataset: config: rus_Cyrl-slk_Latn name: MTEB NTREXBitextMining (rus_Cyrl-slk_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 92.7391086629945 - type: f1 value: 90.75112669003506 - type: main_score value: 90.75112669003506 - type: precision value: 89.8564513436822 - type: recall value: 92.7391086629945 task: type: BitextMining - dataset: config: rus_Cyrl-slv_Latn name: MTEB NTREXBitextMining (rus_Cyrl-slv_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 92.8893340010015 - type: f1 value: 91.05992321816058 - type: main_score value: 91.05992321816058 - type: precision value: 90.22589439715128 - type: recall value: 92.8893340010015 task: type: BitextMining - dataset: config: rus_Cyrl-spa_Latn name: MTEB NTREXBitextMining (rus_Cyrl-spa_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.49474211316975 - type: f1 value: 95.4715406442998 - type: main_score value: 95.4715406442998 - type: precision value: 94.9799699549324 - type: recall value: 96.49474211316975 task: type: BitextMining - dataset: config: rus_Cyrl-srp_Cyrl name: MTEB NTREXBitextMining (rus_Cyrl-srp_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 81.07160741111667 - type: f1 value: 76.55687285507015 - type: main_score value: 
76.55687285507015 - type: precision value: 74.71886401030116 - type: recall value: 81.07160741111667 task: type: BitextMining - dataset: config: rus_Cyrl-srp_Latn name: MTEB NTREXBitextMining (rus_Cyrl-srp_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.14271407110667 - type: f1 value: 93.73302377809138 - type: main_score value: 93.73302377809138 - type: precision value: 93.06960440660991 - type: recall value: 95.14271407110667 task: type: BitextMining - dataset: config: rus_Cyrl-swa_Latn name: MTEB NTREXBitextMining (rus_Cyrl-swa_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.79218828242364 - type: f1 value: 93.25988983475212 - type: main_score value: 93.25988983475212 - type: precision value: 92.53463528626273 - type: recall value: 94.79218828242364 task: type: BitextMining - dataset: config: rus_Cyrl-swe_Latn name: MTEB NTREXBitextMining (rus_Cyrl-swe_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.04256384576865 - type: f1 value: 93.58704723752295 - type: main_score value: 93.58704723752295 - type: precision value: 92.91437155733601 - type: recall value: 95.04256384576865 task: type: BitextMining - dataset: config: rus_Cyrl-tam_Taml name: MTEB NTREXBitextMining (rus_Cyrl-tam_Taml) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.28993490235354 - type: f1 value: 91.63912535469872 - type: main_score value: 91.63912535469872 - type: precision value: 90.87738750983617 - type: recall value: 93.28993490235354 task: type: BitextMining - dataset: config: rus_Cyrl-tur_Latn name: MTEB NTREXBitextMining (rus_Cyrl-tur_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.74061091637456 - type: f1 value: 91.96628275746953 - type: main_score value: 91.96628275746953 - type: precision value: 91.15923885828742 - type: recall value: 93.74061091637456 task: type: BitextMining - dataset: config: rus_Cyrl-ukr_Cyrl name: MTEB NTREXBitextMining (rus_Cyrl-ukr_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.99399098647972 - type: f1 value: 94.89567684860624 - type: main_score value: 94.89567684860624 - type: precision value: 94.37072275079286 - type: recall value: 95.99399098647972 task: type: BitextMining - dataset: config: rus_Cyrl-vie_Latn name: MTEB NTREXBitextMining (rus_Cyrl-vie_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 91.4371557336004 - type: f1 value: 88.98681355366382 - type: main_score value: 88.98681355366382 - type: precision value: 87.89183775663496 - type: recall value: 91.4371557336004 task: type: BitextMining - dataset: config: rus_Cyrl-zho_Hant name: MTEB NTREXBitextMining (rus_Cyrl-zho_Hant) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 92.7891837756635 - type: f1 value: 90.79047142141783 - type: main_score value: 90.79047142141783 - type: precision value: 89.86980470706058 - type: recall value: 92.7891837756635 task: type: BitextMining - dataset: config: rus_Cyrl-zul_Latn name: MTEB NTREXBitextMining (rus_Cyrl-zul_Latn) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy 
value: 87.43114672008012 - type: f1 value: 84.04618833011422 - type: main_score value: 84.04618833011422 - type: precision value: 82.52259341393041 - type: recall value: 87.43114672008012 task: type: BitextMining - dataset: config: slk_Latn-rus_Cyrl name: MTEB NTREXBitextMining (slk_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.34301452178268 - type: f1 value: 94.20392493502158 - type: main_score value: 94.20392493502158 - type: precision value: 93.67384409948257 - type: recall value: 95.34301452178268 task: type: BitextMining - dataset: config: slv_Latn-rus_Cyrl name: MTEB NTREXBitextMining (slv_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 92.23835753630446 - type: f1 value: 90.5061759305625 - type: main_score value: 90.5061759305625 - type: precision value: 89.74231188051918 - type: recall value: 92.23835753630446 task: type: BitextMining - dataset: config: spa_Latn-rus_Cyrl name: MTEB NTREXBitextMining (spa_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.54481722583876 - type: f1 value: 95.54665331330328 - type: main_score value: 95.54665331330328 - type: precision value: 95.06342847604739 - type: recall value: 96.54481722583876 task: type: BitextMining - dataset: config: srp_Cyrl-rus_Cyrl name: MTEB NTREXBitextMining (srp_Cyrl-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 83.62543815723585 - type: f1 value: 80.77095672699816 - type: main_score value: 80.77095672699816 - type: precision value: 79.74674313056886 - type: recall value: 83.62543815723585 task: type: BitextMining - dataset: config: srp_Latn-rus_Cyrl name: MTEB NTREXBitextMining (srp_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 94.44166249374061 - type: f1 value: 93.00733206591994 - type: main_score value: 93.00733206591994 - type: precision value: 92.37203026762366 - type: recall value: 94.44166249374061 task: type: BitextMining - dataset: config: swa_Latn-rus_Cyrl name: MTEB NTREXBitextMining (swa_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 90.23535302954431 - type: f1 value: 87.89596482636041 - type: main_score value: 87.89596482636041 - type: precision value: 86.87060227370694 - type: recall value: 90.23535302954431 task: type: BitextMining - dataset: config: swe_Latn-rus_Cyrl name: MTEB NTREXBitextMining (swe_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 95.44316474712068 - type: f1 value: 94.1896177599733 - type: main_score value: 94.1896177599733 - type: precision value: 93.61542313470206 - type: recall value: 95.44316474712068 task: type: BitextMining - dataset: config: tam_Taml-rus_Cyrl name: MTEB NTREXBitextMining (tam_Taml-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 89.68452679018529 - type: f1 value: 87.37341160650037 - type: main_score value: 87.37341160650037 - type: precision value: 86.38389402285247 - type: recall value: 89.68452679018529 task: type: BitextMining - dataset: config: tur_Latn-rus_Cyrl name: MTEB NTREXBitextMining (tur_Latn-rus_Cyrl) revision: 
ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.89083625438157 - type: f1 value: 92.33892505424804 - type: main_score value: 92.33892505424804 - type: precision value: 91.63125640842216 - type: recall value: 93.89083625438157 task: type: BitextMining - dataset: config: ukr_Cyrl-rus_Cyrl name: MTEB NTREXBitextMining (ukr_Cyrl-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 96.14421632448673 - type: f1 value: 95.11028447433054 - type: main_score value: 95.11028447433054 - type: precision value: 94.62944416624937 - type: recall value: 96.14421632448673 task: type: BitextMining - dataset: config: vie_Latn-rus_Cyrl name: MTEB NTREXBitextMining (vie_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 93.79068602904357 - type: f1 value: 92.14989150392256 - type: main_score value: 92.14989150392256 - type: precision value: 91.39292271740945 - type: recall value: 93.79068602904357 task: type: BitextMining - dataset: config: zho_Hant-rus_Cyrl name: MTEB NTREXBitextMining (zho_Hant-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 89.13370055082625 - type: f1 value: 86.51514618639217 - type: main_score value: 86.51514618639217 - type: precision value: 85.383920035898 - type: recall value: 89.13370055082625 task: type: BitextMining - dataset: config: zul_Latn-rus_Cyrl name: MTEB NTREXBitextMining (zul_Latn-rus_Cyrl) revision: ed9a4403ed4adbfaf4aab56d5b2709e9f6c3ba33 split: test type: mteb/NTREX metrics: - type: accuracy value: 81.17175763645467 - type: f1 value: 77.72331766047338 - type: main_score value: 77.72331766047338 - type: precision value: 76.24629555848075 - type: recall value: 81.17175763645467 task: type: BitextMining - dataset: config: ru name: MTEB OpusparcusPC (ru) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test.full type: GEM/opusparcus metrics: - type: cosine_accuracy value: 73.09136420525657 - type: cosine_accuracy_threshold value: 87.70400881767273 - type: cosine_ap value: 86.51938550599533 - type: cosine_f1 value: 80.84358523725834 - type: cosine_f1_threshold value: 86.90648078918457 - type: cosine_precision value: 73.24840764331209 - type: cosine_recall value: 90.19607843137256 - type: dot_accuracy value: 73.09136420525657 - type: dot_accuracy_threshold value: 87.7040147781372 - type: dot_ap value: 86.51934769946833 - type: dot_f1 value: 80.84358523725834 - type: dot_f1_threshold value: 86.90648078918457 - type: dot_precision value: 73.24840764331209 - type: dot_recall value: 90.19607843137256 - type: euclidean_accuracy value: 73.09136420525657 - type: euclidean_accuracy_threshold value: 49.590304493904114 - type: euclidean_ap value: 86.51934769946833 - type: euclidean_f1 value: 80.84358523725834 - type: euclidean_f1_threshold value: 51.173269748687744 - type: euclidean_precision value: 73.24840764331209 - type: euclidean_recall value: 90.19607843137256 - type: main_score value: 86.51976811057995 - type: manhattan_accuracy value: 73.40425531914893 - type: manhattan_accuracy_threshold value: 757.8278541564941 - type: manhattan_ap value: 86.51976811057995 - type: manhattan_f1 value: 80.92898615453328 - type: manhattan_f1_threshold value: 778.3821105957031 - type: manhattan_precision value: 74.32321575061526 - type: manhattan_recall value: 88.8235294117647 - type: max_ap value: 
86.51976811057995 - type: max_f1 value: 80.92898615453328 - type: max_precision value: 74.32321575061526 - type: max_recall value: 90.19607843137256 - type: similarity_accuracy value: 73.09136420525657 - type: similarity_accuracy_threshold value: 87.70400881767273 - type: similarity_ap value: 86.51938550599533 - type: similarity_f1 value: 80.84358523725834 - type: similarity_f1_threshold value: 86.90648078918457 - type: similarity_precision value: 73.24840764331209 - type: similarity_recall value: 90.19607843137256 task: type: PairClassification - dataset: config: russian name: MTEB PublicHealthQA (russian) revision: main split: test type: xhluca/publichealth-qa metrics: - type: main_score value: 79.303 - type: map_at_1 value: 61.538000000000004 - type: map_at_10 value: 74.449 - type: map_at_100 value: 74.687 - type: map_at_1000 value: 74.687 - type: map_at_20 value: 74.589 - type: map_at_3 value: 73.333 - type: map_at_5 value: 74.256 - type: mrr_at_1 value: 61.53846153846154 - type: mrr_at_10 value: 74.44871794871794 - type: mrr_at_100 value: 74.68730304304074 - type: mrr_at_1000 value: 74.68730304304074 - type: mrr_at_20 value: 74.58857808857809 - type: mrr_at_3 value: 73.33333333333333 - type: mrr_at_5 value: 74.25641025641025 - type: nauc_map_at_1000_diff1 value: 61.375798048778506 - type: nauc_map_at_1000_max value: 51.37093181241067 - type: nauc_map_at_1000_std value: 41.735794471409015 - type: nauc_map_at_100_diff1 value: 61.375798048778506 - type: nauc_map_at_100_max value: 51.37093181241067 - type: nauc_map_at_100_std value: 41.735794471409015 - type: nauc_map_at_10_diff1 value: 61.12796039757213 - type: nauc_map_at_10_max value: 51.843445267118014 - type: nauc_map_at_10_std value: 42.243121474939365 - type: nauc_map_at_1_diff1 value: 66.39100974909151 - type: nauc_map_at_1_max value: 44.77165601342703 - type: nauc_map_at_1_std value: 32.38542979413408 - type: nauc_map_at_20_diff1 value: 61.16611123434347 - type: nauc_map_at_20_max value: 51.52605092407306 - type: nauc_map_at_20_std value: 41.94787773313971 - type: nauc_map_at_3_diff1 value: 61.40157474408937 - type: nauc_map_at_3_max value: 51.47230077853947 - type: nauc_map_at_3_std value: 42.63540269440141 - type: nauc_map_at_5_diff1 value: 61.07631147583098 - type: nauc_map_at_5_max value: 52.02626939341523 - type: nauc_map_at_5_std value: 42.511607332150334 - type: nauc_mrr_at_1000_diff1 value: 61.375798048778506 - type: nauc_mrr_at_1000_max value: 51.37093181241067 - type: nauc_mrr_at_1000_std value: 41.735794471409015 - type: nauc_mrr_at_100_diff1 value: 61.375798048778506 - type: nauc_mrr_at_100_max value: 51.37093181241067 - type: nauc_mrr_at_100_std value: 41.735794471409015 - type: nauc_mrr_at_10_diff1 value: 61.12796039757213 - type: nauc_mrr_at_10_max value: 51.843445267118014 - type: nauc_mrr_at_10_std value: 42.243121474939365 - type: nauc_mrr_at_1_diff1 value: 66.39100974909151 - type: nauc_mrr_at_1_max value: 44.77165601342703 - type: nauc_mrr_at_1_std value: 32.38542979413408 - type: nauc_mrr_at_20_diff1 value: 61.16611123434347 - type: nauc_mrr_at_20_max value: 51.52605092407306 - type: nauc_mrr_at_20_std value: 41.94787773313971 - type: nauc_mrr_at_3_diff1 value: 61.40157474408937 - type: nauc_mrr_at_3_max value: 51.47230077853947 - type: nauc_mrr_at_3_std value: 42.63540269440141 - type: nauc_mrr_at_5_diff1 value: 61.07631147583098 - type: nauc_mrr_at_5_max value: 52.02626939341523 - type: nauc_mrr_at_5_std value: 42.511607332150334 - type: nauc_ndcg_at_1000_diff1 value: 60.54821630436157 - type: 
nauc_ndcg_at_1000_max value: 52.584328363863634 - type: nauc_ndcg_at_1000_std value: 43.306961101645946 - type: nauc_ndcg_at_100_diff1 value: 60.54821630436157 - type: nauc_ndcg_at_100_max value: 52.584328363863634 - type: nauc_ndcg_at_100_std value: 43.306961101645946 - type: nauc_ndcg_at_10_diff1 value: 58.800340278109886 - type: nauc_ndcg_at_10_max value: 55.31050771670664 - type: nauc_ndcg_at_10_std value: 46.40931672942848 - type: nauc_ndcg_at_1_diff1 value: 66.39100974909151 - type: nauc_ndcg_at_1_max value: 44.77165601342703 - type: nauc_ndcg_at_1_std value: 32.38542979413408 - type: nauc_ndcg_at_20_diff1 value: 58.88690479697946 - type: nauc_ndcg_at_20_max value: 54.19269661177923 - type: nauc_ndcg_at_20_std value: 45.39305589413174 - type: nauc_ndcg_at_3_diff1 value: 59.61866351451574 - type: nauc_ndcg_at_3_max value: 54.23992718744033 - type: nauc_ndcg_at_3_std value: 46.997379274101 - type: nauc_ndcg_at_5_diff1 value: 58.70739588066225 - type: nauc_ndcg_at_5_max value: 55.76766902539152 - type: nauc_ndcg_at_5_std value: 47.10553115762958 - type: nauc_precision_at_1000_diff1 value: 100.0 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 100.0 - type: nauc_precision_at_100_diff1 value: .nan - type: nauc_precision_at_100_max value: .nan - type: nauc_precision_at_100_std value: .nan - type: nauc_precision_at_10_diff1 value: 35.72622112397501 - type: nauc_precision_at_10_max value: 89.84297108673948 - type: nauc_precision_at_10_std value: 86.60269192422707 - type: nauc_precision_at_1_diff1 value: 66.39100974909151 - type: nauc_precision_at_1_max value: 44.77165601342703 - type: nauc_precision_at_1_std value: 32.38542979413408 - type: nauc_precision_at_20_diff1 value: 29.188449183726433 - type: nauc_precision_at_20_max value: 86.45729478231968 - type: nauc_precision_at_20_std value: 86.45729478231968 - type: nauc_precision_at_3_diff1 value: 50.294126629236224 - type: nauc_precision_at_3_max value: 68.98223127174579 - type: nauc_precision_at_3_std value: 70.31195520376356 - type: nauc_precision_at_5_diff1 value: 39.648884288124385 - type: nauc_precision_at_5_max value: 86.3409770687935 - type: nauc_precision_at_5_std value: 83.74875373878356 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: .nan - type: nauc_recall_at_100_max value: .nan - type: nauc_recall_at_100_std value: .nan - type: nauc_recall_at_10_diff1 value: 35.72622112397516 - type: nauc_recall_at_10_max value: 89.84297108673968 - type: nauc_recall_at_10_std value: 86.60269192422749 - type: nauc_recall_at_1_diff1 value: 66.39100974909151 - type: nauc_recall_at_1_max value: 44.77165601342703 - type: nauc_recall_at_1_std value: 32.38542979413408 - type: nauc_recall_at_20_diff1 value: 29.188449183726323 - type: nauc_recall_at_20_max value: 86.45729478231985 - type: nauc_recall_at_20_std value: 86.45729478231985 - type: nauc_recall_at_3_diff1 value: 50.29412662923603 - type: nauc_recall_at_3_max value: 68.98223127174562 - type: nauc_recall_at_3_std value: 70.31195520376346 - type: nauc_recall_at_5_diff1 value: 39.64888428812445 - type: nauc_recall_at_5_max value: 86.34097706879359 - type: nauc_recall_at_5_std value: 83.74875373878366 - type: ndcg_at_1 value: 61.538000000000004 - type: ndcg_at_10 value: 79.303 - type: ndcg_at_100 value: 80.557 - type: ndcg_at_1000 value: 80.557 - type: ndcg_at_20 value: 79.732 - type: ndcg_at_3 value: 77.033 - type: ndcg_at_5 value: 
78.818 - type: precision_at_1 value: 61.538000000000004 - type: precision_at_10 value: 9.385 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.769 - type: precision_at_3 value: 29.231 - type: precision_at_5 value: 18.462 - type: recall_at_1 value: 61.538000000000004 - type: recall_at_10 value: 93.84599999999999 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 95.38499999999999 - type: recall_at_3 value: 87.69200000000001 - type: recall_at_5 value: 92.308 task: type: Retrieval - dataset: config: default name: MTEB RUParaPhraserSTS (default) revision: 43265056790b8f7c59e0139acb4be0a8dad2c8f4 split: test type: merionum/ru_paraphraser metrics: - type: cosine_pearson value: 64.73554596215753 - type: cosine_spearman value: 70.45849652271855 - type: euclidean_pearson value: 68.08069844834267 - type: euclidean_spearman value: 70.45854872959124 - type: main_score value: 70.45849652271855 - type: manhattan_pearson value: 67.88325986519624 - type: manhattan_spearman value: 70.21131896834542 - type: pearson value: 64.73554596215753 - type: spearman value: 70.45849652271855 task: type: STS - dataset: config: default name: MTEB RiaNewsRetrieval (default) revision: 82374b0bbacda6114f39ff9c5b925fa1512ca5d7 split: test type: ai-forever/ria-news-retrieval metrics: - type: main_score value: 70.00999999999999 - type: map_at_1 value: 55.97 - type: map_at_10 value: 65.59700000000001 - type: map_at_100 value: 66.057 - type: map_at_1000 value: 66.074 - type: map_at_20 value: 65.892 - type: map_at_3 value: 63.74999999999999 - type: map_at_5 value: 64.84299999999999 - type: mrr_at_1 value: 55.88999999999999 - type: mrr_at_10 value: 65.55873015872977 - type: mrr_at_100 value: 66.01891495129716 - type: mrr_at_1000 value: 66.03538391493299 - type: mrr_at_20 value: 65.85351193431555 - type: mrr_at_3 value: 63.7133333333329 - type: mrr_at_5 value: 64.80483333333268 - type: nauc_map_at_1000_diff1 value: 65.95332946436318 - type: nauc_map_at_1000_max value: 28.21204156197811 - type: nauc_map_at_1000_std value: -13.139245767083743 - type: nauc_map_at_100_diff1 value: 65.94763105024367 - type: nauc_map_at_100_max value: 28.212832170078205 - type: nauc_map_at_100_std value: -13.131425849370665 - type: nauc_map_at_10_diff1 value: 65.88455089448388 - type: nauc_map_at_10_max value: 28.13555838776792 - type: nauc_map_at_10_std value: -13.326989827081023 - type: nauc_map_at_1_diff1 value: 69.31275711813979 - type: nauc_map_at_1_max value: 26.386708520283758 - type: nauc_map_at_1_std value: -14.434616447245464 - type: nauc_map_at_20_diff1 value: 65.91227032605677 - type: nauc_map_at_20_max value: 28.20538655600886 - type: nauc_map_at_20_std value: -13.191148834410274 - type: nauc_map_at_3_diff1 value: 66.0051677952641 - type: nauc_map_at_3_max value: 28.25443420019022 - type: nauc_map_at_3_std value: -13.893284109029558 - type: nauc_map_at_5_diff1 value: 65.89784348297898 - type: nauc_map_at_5_max value: 28.26449765184183 - type: nauc_map_at_5_std value: -13.506692912805008 - type: nauc_mrr_at_1000_diff1 value: 66.06599513750889 - type: nauc_mrr_at_1000_max value: 28.191556650722287 - type: nauc_mrr_at_1000_std value: -13.098487982930276 - type: nauc_mrr_at_100_diff1 value: 66.0602307977725 - type: nauc_mrr_at_100_max value: 28.19235936624514 - type: nauc_mrr_at_100_std value: -13.09069677716269 - type: nauc_mrr_at_10_diff1 value: 65.99546819079403 - type: nauc_mrr_at_10_max value: 28.11556170120022 - type: nauc_mrr_at_10_std 
value: -13.286711073897553 - type: nauc_mrr_at_1_diff1 value: 69.49541040517995 - type: nauc_mrr_at_1_max value: 26.354622707276153 - type: nauc_mrr_at_1_std value: -14.358839778104695 - type: nauc_mrr_at_20_diff1 value: 66.02427154257936 - type: nauc_mrr_at_20_max value: 28.18509383563462 - type: nauc_mrr_at_20_std value: -13.150543398429 - type: nauc_mrr_at_3_diff1 value: 66.11258119082618 - type: nauc_mrr_at_3_max value: 28.239510722224004 - type: nauc_mrr_at_3_std value: -13.857249251136269 - type: nauc_mrr_at_5_diff1 value: 66.00633786765626 - type: nauc_mrr_at_5_max value: 28.244875152193032 - type: nauc_mrr_at_5_std value: -13.467206028704434 - type: nauc_ndcg_at_1000_diff1 value: 65.02876183314446 - type: nauc_ndcg_at_1000_max value: 29.109368390197194 - type: nauc_ndcg_at_1000_std value: -11.56514359821697 - type: nauc_ndcg_at_100_diff1 value: 64.85837726893713 - type: nauc_ndcg_at_100_max value: 29.19990133137256 - type: nauc_ndcg_at_100_std value: -11.17450348161257 - type: nauc_ndcg_at_10_diff1 value: 64.53842705024796 - type: nauc_ndcg_at_10_max value: 28.748734006088526 - type: nauc_ndcg_at_10_std value: -12.331395505957063 - type: nauc_ndcg_at_1_diff1 value: 69.31275711813979 - type: nauc_ndcg_at_1_max value: 26.386708520283758 - type: nauc_ndcg_at_1_std value: -14.434616447245464 - type: nauc_ndcg_at_20_diff1 value: 64.59017606740504 - type: nauc_ndcg_at_20_max value: 29.047332048898017 - type: nauc_ndcg_at_20_std value: -11.746548770195954 - type: nauc_ndcg_at_3_diff1 value: 64.87900935713822 - type: nauc_ndcg_at_3_max value: 28.953157521204403 - type: nauc_ndcg_at_3_std value: -13.639947228880942 - type: nauc_ndcg_at_5_diff1 value: 64.61466953479034 - type: nauc_ndcg_at_5_max value: 29.01899321868392 - type: nauc_ndcg_at_5_std value: -12.85356404799802 - type: nauc_precision_at_1000_diff1 value: 48.85481417002382 - type: nauc_precision_at_1000_max value: 57.129837326696375 - type: nauc_precision_at_1000_std value: 37.889524999906435 - type: nauc_precision_at_100_diff1 value: 53.374672326788264 - type: nauc_precision_at_100_max value: 43.819333062207974 - type: nauc_precision_at_100_std value: 21.387064885769362 - type: nauc_precision_at_10_diff1 value: 57.66571169774445 - type: nauc_precision_at_10_max value: 31.779694837242033 - type: nauc_precision_at_10_std value: -6.6248399147180255 - type: nauc_precision_at_1_diff1 value: 69.31275711813979 - type: nauc_precision_at_1_max value: 26.386708520283758 - type: nauc_precision_at_1_std value: -14.434616447245464 - type: nauc_precision_at_20_diff1 value: 55.93570036001682 - type: nauc_precision_at_20_max value: 34.98640173388743 - type: nauc_precision_at_20_std value: -0.36518465159326174 - type: nauc_precision_at_3_diff1 value: 60.94100093991508 - type: nauc_precision_at_3_max value: 31.422239034357673 - type: nauc_precision_at_3_std value: -12.72576556537896 - type: nauc_precision_at_5_diff1 value: 59.450505195434054 - type: nauc_precision_at_5_max value: 32.07638712418377 - type: nauc_precision_at_5_std value: -10.024459103498598 - type: nauc_recall_at_1000_diff1 value: 48.854814170024184 - type: nauc_recall_at_1000_max value: 57.129837326697164 - type: nauc_recall_at_1000_std value: 37.88952499990672 - type: nauc_recall_at_100_diff1 value: 53.37467232678822 - type: nauc_recall_at_100_max value: 43.8193330622079 - type: nauc_recall_at_100_std value: 21.387064885769398 - type: nauc_recall_at_10_diff1 value: 57.66571169774447 - type: nauc_recall_at_10_max value: 31.779694837242133 - type: nauc_recall_at_10_std value: 
-6.62483991471789 - type: nauc_recall_at_1_diff1 value: 69.31275711813979 - type: nauc_recall_at_1_max value: 26.386708520283758 - type: nauc_recall_at_1_std value: -14.434616447245464 - type: nauc_recall_at_20_diff1 value: 55.93570036001682 - type: nauc_recall_at_20_max value: 34.986401733887554 - type: nauc_recall_at_20_std value: -0.3651846515931506 - type: nauc_recall_at_3_diff1 value: 60.94100093991499 - type: nauc_recall_at_3_max value: 31.422239034357606 - type: nauc_recall_at_3_std value: -12.725765565378966 - type: nauc_recall_at_5_diff1 value: 59.450505195434125 - type: nauc_recall_at_5_max value: 32.07638712418387 - type: nauc_recall_at_5_std value: -10.024459103498472 - type: ndcg_at_1 value: 55.97 - type: ndcg_at_10 value: 70.00999999999999 - type: ndcg_at_100 value: 72.20100000000001 - type: ndcg_at_1000 value: 72.65599999999999 - type: ndcg_at_20 value: 71.068 - type: ndcg_at_3 value: 66.228 - type: ndcg_at_5 value: 68.191 - type: precision_at_1 value: 55.97 - type: precision_at_10 value: 8.373999999999999 - type: precision_at_100 value: 0.9390000000000001 - type: precision_at_1000 value: 0.097 - type: precision_at_20 value: 4.3950000000000005 - type: precision_at_3 value: 24.46 - type: precision_at_5 value: 15.626000000000001 - type: recall_at_1 value: 55.97 - type: recall_at_10 value: 83.74000000000001 - type: recall_at_100 value: 93.87 - type: recall_at_1000 value: 97.49 - type: recall_at_20 value: 87.89 - type: recall_at_3 value: 73.38 - type: recall_at_5 value: 78.13 task: type: Retrieval - dataset: config: default name: MTEB RuBQReranking (default) revision: 2e96b8f098fa4b0950fc58eacadeb31c0d0c7fa2 split: test type: ai-forever/rubq-reranking metrics: - type: main_score value: 71.44929565043827 - type: map value: 71.44929565043827 - type: mrr value: 77.78391820945014 - type: nAUC_map_diff1 value: 38.140840668080244 - type: nAUC_map_max value: 27.54328688105381 - type: nAUC_map_std value: 16.81572082284672 - type: nAUC_mrr_diff1 value: 44.51350415961509 - type: nAUC_mrr_max value: 36.491182016669754 - type: nAUC_mrr_std value: 22.47139593052269 task: type: Reranking - dataset: config: default name: MTEB RuBQRetrieval (default) revision: e19b6ffa60b3bc248e0b41f4cc37c26a55c2a67b split: test type: ai-forever/rubq-retrieval metrics: - type: main_score value: 68.529 - type: map_at_1 value: 42.529 - type: map_at_10 value: 60.864 - type: map_at_100 value: 61.868 - type: map_at_1000 value: 61.907000000000004 - type: map_at_20 value: 61.596 - type: map_at_3 value: 55.701 - type: map_at_5 value: 58.78 - type: mrr_at_1 value: 60.57919621749409 - type: mrr_at_10 value: 70.55614188149649 - type: mrr_at_100 value: 70.88383816664494 - type: mrr_at_1000 value: 70.89719252668833 - type: mrr_at_20 value: 70.79839750105347 - type: mrr_at_3 value: 68.4594168636722 - type: mrr_at_5 value: 69.67100078802214 - type: nauc_map_at_1000_diff1 value: 40.67438785660885 - type: nauc_map_at_1000_max value: 32.79981738507424 - type: nauc_map_at_1000_std value: -6.873402600044831 - type: nauc_map_at_100_diff1 value: 40.65643664443284 - type: nauc_map_at_100_max value: 32.81594799919249 - type: nauc_map_at_100_std value: -6.8473246794498195 - type: nauc_map_at_10_diff1 value: 40.39048268484908 - type: nauc_map_at_10_max value: 32.403242161479525 - type: nauc_map_at_10_std value: -7.344413799841244 - type: nauc_map_at_1_diff1 value: 44.36306892906905 - type: nauc_map_at_1_max value: 25.61348630699028 - type: nauc_map_at_1_std value: -8.713074613333902 - type: nauc_map_at_20_diff1 value: 40.530326570124615 
- type: nauc_map_at_20_max value: 32.74028319323205 - type: nauc_map_at_20_std value: -7.008180779820569 - type: nauc_map_at_3_diff1 value: 40.764924859364044 - type: nauc_map_at_3_max value: 29.809671682025336 - type: nauc_map_at_3_std value: -9.205620202725564 - type: nauc_map_at_5_diff1 value: 40.88599496021476 - type: nauc_map_at_5_max value: 32.1701894666848 - type: nauc_map_at_5_std value: -7.801251849010623 - type: nauc_mrr_at_1000_diff1 value: 48.64181373540728 - type: nauc_mrr_at_1000_max value: 40.136947990653546 - type: nauc_mrr_at_1000_std value: -7.250260497468805 - type: nauc_mrr_at_100_diff1 value: 48.63349902496212 - type: nauc_mrr_at_100_max value: 40.14510559704008 - type: nauc_mrr_at_100_std value: -7.228702374801103 - type: nauc_mrr_at_10_diff1 value: 48.58580560194813 - type: nauc_mrr_at_10_max value: 40.15075599433366 - type: nauc_mrr_at_10_std value: -7.267928771548688 - type: nauc_mrr_at_1_diff1 value: 51.47535097164919 - type: nauc_mrr_at_1_max value: 38.23579750430856 - type: nauc_mrr_at_1_std value: -9.187785187137633 - type: nauc_mrr_at_20_diff1 value: 48.58688378336222 - type: nauc_mrr_at_20_max value: 40.13408744088299 - type: nauc_mrr_at_20_std value: -7.283132775160146 - type: nauc_mrr_at_3_diff1 value: 48.66833005454742 - type: nauc_mrr_at_3_max value: 40.07987333638038 - type: nauc_mrr_at_3_std value: -7.738819947521418 - type: nauc_mrr_at_5_diff1 value: 48.76536305941537 - type: nauc_mrr_at_5_max value: 40.381929739522185 - type: nauc_mrr_at_5_std value: -7.592858318378928 - type: nauc_ndcg_at_1000_diff1 value: 41.67304442004693 - type: nauc_ndcg_at_1000_max value: 35.84126926253235 - type: nauc_ndcg_at_1000_std value: -4.78971011604655 - type: nauc_ndcg_at_100_diff1 value: 41.16918850185783 - type: nauc_ndcg_at_100_max value: 36.082461962326505 - type: nauc_ndcg_at_100_std value: -4.092442251697269 - type: nauc_ndcg_at_10_diff1 value: 40.300065598615205 - type: nauc_ndcg_at_10_max value: 34.87866296788365 - type: nauc_ndcg_at_10_std value: -5.866529277842453 - type: nauc_ndcg_at_1_diff1 value: 51.74612915209495 - type: nauc_ndcg_at_1_max value: 37.71907067970078 - type: nauc_ndcg_at_1_std value: -9.064124266098696 - type: nauc_ndcg_at_20_diff1 value: 40.493949850214584 - type: nauc_ndcg_at_20_max value: 35.69331503650286 - type: nauc_ndcg_at_20_std value: -4.995310342975443 - type: nauc_ndcg_at_3_diff1 value: 41.269443212112364 - type: nauc_ndcg_at_3_max value: 32.572844460953334 - type: nauc_ndcg_at_3_std value: -9.063015396458791 - type: nauc_ndcg_at_5_diff1 value: 41.37039652522888 - type: nauc_ndcg_at_5_max value: 34.67416011393571 - type: nauc_ndcg_at_5_std value: -7.106845569862319 - type: nauc_precision_at_1000_diff1 value: -9.571769961090155 - type: nauc_precision_at_1000_max value: 5.574782583417188 - type: nauc_precision_at_1000_std value: 7.28333847923847 - type: nauc_precision_at_100_diff1 value: -7.7405012003383735 - type: nauc_precision_at_100_max value: 9.67745355070353 - type: nauc_precision_at_100_std value: 9.327890294080992 - type: nauc_precision_at_10_diff1 value: -1.006879647532931 - type: nauc_precision_at_10_max value: 15.899825481231064 - type: nauc_precision_at_10_std value: 4.2284084852153105 - type: nauc_precision_at_1_diff1 value: 51.74612915209495 - type: nauc_precision_at_1_max value: 37.71907067970078 - type: nauc_precision_at_1_std value: -9.064124266098696 - type: nauc_precision_at_20_diff1 value: -4.982301544401409 - type: nauc_precision_at_20_max value: 13.241674471380568 - type: nauc_precision_at_20_std value: 
7.052280133821539 - type: nauc_precision_at_3_diff1 value: 15.442614376387374 - type: nauc_precision_at_3_max value: 25.12695418083 - type: nauc_precision_at_3_std value: -3.1150066697920638 - type: nauc_precision_at_5_diff1 value: 8.381026072692444 - type: nauc_precision_at_5_max value: 22.839056540604822 - type: nauc_precision_at_5_std value: 1.5126905486524331 - type: nauc_recall_at_1000_diff1 value: -0.8869709920433502 - type: nauc_recall_at_1000_max value: 45.092324433377264 - type: nauc_recall_at_1000_std value: 62.21264093315108 - type: nauc_recall_at_100_diff1 value: 16.036715011075714 - type: nauc_recall_at_100_max value: 39.79963411771158 - type: nauc_recall_at_100_std value: 28.41850069503361 - type: nauc_recall_at_10_diff1 value: 25.189622794479998 - type: nauc_recall_at_10_max value: 30.82355277039427 - type: nauc_recall_at_10_std value: 0.0964544736531047 - type: nauc_recall_at_1_diff1 value: 44.36306892906905 - type: nauc_recall_at_1_max value: 25.61348630699028 - type: nauc_recall_at_1_std value: -8.713074613333902 - type: nauc_recall_at_20_diff1 value: 20.43424504746087 - type: nauc_recall_at_20_max value: 33.96010554649377 - type: nauc_recall_at_20_std value: 6.900984030301936 - type: nauc_recall_at_3_diff1 value: 33.86531858793492 - type: nauc_recall_at_3_max value: 27.725692256711188 - type: nauc_recall_at_3_std value: -8.533124289305709 - type: nauc_recall_at_5_diff1 value: 32.006964557701686 - type: nauc_recall_at_5_max value: 31.493370659289806 - type: nauc_recall_at_5_std value: -4.8639793547793255 - type: ndcg_at_1 value: 60.461 - type: ndcg_at_10 value: 68.529 - type: ndcg_at_100 value: 71.664 - type: ndcg_at_1000 value: 72.396 - type: ndcg_at_20 value: 70.344 - type: ndcg_at_3 value: 61.550000000000004 - type: ndcg_at_5 value: 64.948 - type: precision_at_1 value: 60.461 - type: precision_at_10 value: 13.28 - type: precision_at_100 value: 1.555 - type: precision_at_1000 value: 0.164 - type: precision_at_20 value: 7.216 - type: precision_at_3 value: 33.077 - type: precision_at_5 value: 23.014000000000003 - type: recall_at_1 value: 42.529 - type: recall_at_10 value: 81.169 - type: recall_at_100 value: 93.154 - type: recall_at_1000 value: 98.18299999999999 - type: recall_at_20 value: 87.132 - type: recall_at_3 value: 63.905 - type: recall_at_5 value: 71.967 task: type: Retrieval - dataset: config: default name: MTEB RuReviewsClassification (default) revision: f6d2c31f4dc6b88f468552750bfec05b4b41b05a split: test type: ai-forever/ru-reviews-classification metrics: - type: accuracy value: 61.17675781250001 - type: f1 value: 60.354535346041374 - type: f1_weighted value: 60.35437313166116 - type: main_score value: 61.17675781250001 task: type: Classification - dataset: config: default name: MTEB RuSTSBenchmarkSTS (default) revision: 7cf24f325c6da6195df55bef3d86b5e0616f3018 split: test type: ai-forever/ru-stsbenchmark-sts metrics: - type: cosine_pearson value: 78.1301041727274 - type: cosine_spearman value: 78.08238025421747 - type: euclidean_pearson value: 77.35224254583635 - type: euclidean_spearman value: 78.08235336582496 - type: main_score value: 78.08238025421747 - type: manhattan_pearson value: 77.24138550052075 - type: manhattan_spearman value: 77.98199107904142 - type: pearson value: 78.1301041727274 - type: spearman value: 78.08238025421747 task: type: STS - dataset: config: default name: MTEB RuSciBenchGRNTIClassification (default) revision: 673a610d6d3dd91a547a0d57ae1b56f37ebbf6a1 split: test type: ai-forever/ru-scibench-grnti-classification metrics: - type: 
accuracy value: 54.990234375 - type: f1 value: 53.537019057131374 - type: f1_weighted value: 53.552745354520766 - type: main_score value: 54.990234375 task: type: Classification - dataset: config: default name: MTEB RuSciBenchGRNTIClusteringP2P (default) revision: 673a610d6d3dd91a547a0d57ae1b56f37ebbf6a1 split: test type: ai-forever/ru-scibench-grnti-classification metrics: - type: main_score value: 50.775228895355106 - type: v_measure value: 50.775228895355106 - type: v_measure_std value: 0.9533571150165796 task: type: Clustering - dataset: config: default name: MTEB RuSciBenchOECDClassification (default) revision: 26c88e99dcaba32bb45d0e1bfc21902337f6d471 split: test type: ai-forever/ru-scibench-oecd-classification metrics: - type: accuracy value: 41.71875 - type: f1 value: 39.289100975858304 - type: f1_weighted value: 39.29257829217775 - type: main_score value: 41.71875 task: type: Classification - dataset: config: default name: MTEB RuSciBenchOECDClusteringP2P (default) revision: 26c88e99dcaba32bb45d0e1bfc21902337f6d471 split: test type: ai-forever/ru-scibench-oecd-classification metrics: - type: main_score value: 45.10904808834516 - type: v_measure value: 45.10904808834516 - type: v_measure_std value: 1.0572643410157534 task: type: Clustering - dataset: config: rus_Cyrl name: MTEB SIB200Classification (rus_Cyrl) revision: a74d7350ea12af010cfb1c21e34f1f81fd2e615b split: test type: mteb/sib200 metrics: - type: accuracy value: 66.36363636363637 - type: f1 value: 64.6940336621617 - type: f1_weighted value: 66.43317771876966 - type: main_score value: 66.36363636363637 task: type: Classification - dataset: config: rus_Cyrl name: MTEB SIB200ClusteringS2S (rus_Cyrl) revision: a74d7350ea12af010cfb1c21e34f1f81fd2e615b split: test type: mteb/sib200 metrics: - type: main_score value: 33.99178497314711 - type: v_measure value: 33.99178497314711 - type: v_measure_std value: 4.036337464043786 task: type: Clustering - dataset: config: ru name: MTEB STS22.v2 (ru) revision: d31f33a128469b20e357535c39b82fb3c3f6f2bd split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 50.724322379215934 - type: cosine_spearman value: 59.90449732164651 - type: euclidean_pearson value: 50.227545226784024 - type: euclidean_spearman value: 59.898906527601085 - type: main_score value: 59.90449732164651 - type: manhattan_pearson value: 50.21762139819405 - type: manhattan_spearman value: 59.761039813759 - type: pearson value: 50.724322379215934 - type: spearman value: 59.90449732164651 task: type: STS - dataset: config: ru name: MTEB STSBenchmarkMultilingualSTS (ru) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: dev type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 78.43928769569945 - type: cosine_spearman value: 78.23961768018884 - type: euclidean_pearson value: 77.4718694027985 - type: euclidean_spearman value: 78.23887044760475 - type: main_score value: 78.23961768018884 - type: manhattan_pearson value: 77.34517128089547 - type: manhattan_spearman value: 78.1146477340426 - type: pearson value: 78.43928769569945 - type: spearman value: 78.23961768018884 task: type: STS - dataset: config: default name: MTEB SensitiveTopicsClassification (default) revision: 416b34a802308eac30e4192afc0ff99bb8dcc7f2 split: test type: ai-forever/sensitive-topics-classification metrics: - type: accuracy value: 22.8125 - type: f1 value: 17.31969589593409 - type: lrap value: 33.82412380642287 - type: main_score value: 22.8125 task: type: MultilabelClassification - dataset: config: default name: 
MTEB TERRa (default) revision: 7b58f24536063837d644aab9a023c62199b2a612 split: dev type: ai-forever/terra-pairclassification metrics: - type: cosine_accuracy value: 57.32899022801303 - type: cosine_accuracy_threshold value: 85.32201051712036 - type: cosine_ap value: 55.14264553720072 - type: cosine_f1 value: 66.83544303797468 - type: cosine_f1_threshold value: 85.32201051712036 - type: cosine_precision value: 54.54545454545454 - type: cosine_recall value: 86.27450980392157 - type: dot_accuracy value: 57.32899022801303 - type: dot_accuracy_threshold value: 85.32201051712036 - type: dot_ap value: 55.14264553720072 - type: dot_f1 value: 66.83544303797468 - type: dot_f1_threshold value: 85.32201051712036 - type: dot_precision value: 54.54545454545454 - type: dot_recall value: 86.27450980392157 - type: euclidean_accuracy value: 57.32899022801303 - type: euclidean_accuracy_threshold value: 54.18117046356201 - type: euclidean_ap value: 55.14264553720072 - type: euclidean_f1 value: 66.83544303797468 - type: euclidean_f1_threshold value: 54.18117046356201 - type: euclidean_precision value: 54.54545454545454 - type: euclidean_recall value: 86.27450980392157 - type: main_score value: 55.14264553720072 - type: manhattan_accuracy value: 57.32899022801303 - type: manhattan_accuracy_threshold value: 828.8480758666992 - type: manhattan_ap value: 55.077974053622555 - type: manhattan_f1 value: 66.82352941176471 - type: manhattan_f1_threshold value: 885.6784820556641 - type: manhattan_precision value: 52.20588235294118 - type: manhattan_recall value: 92.81045751633987 - type: max_ap value: 55.14264553720072 - type: max_f1 value: 66.83544303797468 - type: max_precision value: 54.54545454545454 - type: max_recall value: 92.81045751633987 - type: similarity_accuracy value: 57.32899022801303 - type: similarity_accuracy_threshold value: 85.32201051712036 - type: similarity_ap value: 55.14264553720072 - type: similarity_f1 value: 66.83544303797468 - type: similarity_f1_threshold value: 85.32201051712036 - type: similarity_precision value: 54.54545454545454 - type: similarity_recall value: 86.27450980392157 task: type: PairClassification - dataset: config: ru name: MTEB XNLI (ru) revision: 09698e0180d87dc247ca447d3a1248b931ac0cdb split: test type: mteb/xnli metrics: - type: cosine_accuracy value: 67.6923076923077 - type: cosine_accuracy_threshold value: 87.6681923866272 - type: cosine_ap value: 73.18693800863593 - type: cosine_f1 value: 70.40641099026904 - type: cosine_f1_threshold value: 85.09706258773804 - type: cosine_precision value: 57.74647887323944 - type: cosine_recall value: 90.17595307917888 - type: dot_accuracy value: 67.6923076923077 - type: dot_accuracy_threshold value: 87.66818642616272 - type: dot_ap value: 73.18693800863593 - type: dot_f1 value: 70.40641099026904 - type: dot_f1_threshold value: 85.09706258773804 - type: dot_precision value: 57.74647887323944 - type: dot_recall value: 90.17595307917888 - type: euclidean_accuracy value: 67.6923076923077 - type: euclidean_accuracy_threshold value: 49.662476778030396 - type: euclidean_ap value: 73.18693800863593 - type: euclidean_f1 value: 70.40641099026904 - type: euclidean_f1_threshold value: 54.59475517272949 - type: euclidean_precision value: 57.74647887323944 - type: euclidean_recall value: 90.17595307917888 - type: main_score value: 73.18693800863593 - type: manhattan_accuracy value: 67.54578754578755 - type: manhattan_accuracy_threshold value: 777.1001815795898 - type: manhattan_ap value: 72.98861474758783 - type: manhattan_f1 value: 
70.6842435655995 - type: manhattan_f1_threshold value: 810.3782653808594 - type: manhattan_precision value: 61.80021953896817 - type: manhattan_recall value: 82.55131964809385 - type: max_ap value: 73.18693800863593 - type: max_f1 value: 70.6842435655995 - type: max_precision value: 61.80021953896817 - type: max_recall value: 90.17595307917888 - type: similarity_accuracy value: 67.6923076923077 - type: similarity_accuracy_threshold value: 87.6681923866272 - type: similarity_ap value: 73.18693800863593 - type: similarity_f1 value: 70.40641099026904 - type: similarity_f1_threshold value: 85.09706258773804 - type: similarity_precision value: 57.74647887323944 - type: similarity_recall value: 90.17595307917888 task: type: PairClassification - dataset: config: russian name: MTEB XNLIV2 (russian) revision: 5b7d477a8c62cdd18e2fed7e015497c20b4371ad split: test type: mteb/xnli2.0-multi-pair metrics: - type: cosine_accuracy value: 68.35164835164835 - type: cosine_accuracy_threshold value: 88.48621845245361 - type: cosine_ap value: 73.10205506215699 - type: cosine_f1 value: 71.28712871287128 - type: cosine_f1_threshold value: 87.00399398803711 - type: cosine_precision value: 61.67023554603854 - type: cosine_recall value: 84.4574780058651 - type: dot_accuracy value: 68.35164835164835 - type: dot_accuracy_threshold value: 88.48622441291809 - type: dot_ap value: 73.10191110714706 - type: dot_f1 value: 71.28712871287128 - type: dot_f1_threshold value: 87.00399398803711 - type: dot_precision value: 61.67023554603854 - type: dot_recall value: 84.4574780058651 - type: euclidean_accuracy value: 68.35164835164835 - type: euclidean_accuracy_threshold value: 47.98704385757446 - type: euclidean_ap value: 73.10205506215699 - type: euclidean_f1 value: 71.28712871287128 - type: euclidean_f1_threshold value: 50.982362031936646 - type: euclidean_precision value: 61.67023554603854 - type: euclidean_recall value: 84.4574780058651 - type: main_score value: 73.10205506215699 - type: manhattan_accuracy value: 67.91208791208791 - type: manhattan_accuracy_threshold value: 746.1360931396484 - type: manhattan_ap value: 72.8954736175069 - type: manhattan_f1 value: 71.1297071129707 - type: manhattan_f1_threshold value: 808.0789566040039 - type: manhattan_precision value: 60.04036326942482 - type: manhattan_recall value: 87.2434017595308 - type: max_ap value: 73.10205506215699 - type: max_f1 value: 71.28712871287128 - type: max_precision value: 61.67023554603854 - type: max_recall value: 87.2434017595308 - type: similarity_accuracy value: 68.35164835164835 - type: similarity_accuracy_threshold value: 88.48621845245361 - type: similarity_ap value: 73.10205506215699 - type: similarity_f1 value: 71.28712871287128 - type: similarity_f1_threshold value: 87.00399398803711 - type: similarity_precision value: 61.67023554603854 - type: similarity_recall value: 84.4574780058651 task: type: PairClassification - dataset: config: ru name: MTEB XQuADRetrieval (ru) revision: 51adfef1c1287aab1d2d91b5bead9bcfb9c68583 split: validation type: google/xquad metrics: - type: main_score value: 95.705 - type: map_at_1 value: 90.802 - type: map_at_10 value: 94.427 - type: map_at_100 value: 94.451 - type: map_at_1000 value: 94.451 - type: map_at_20 value: 94.446 - type: map_at_3 value: 94.121 - type: map_at_5 value: 94.34 - type: mrr_at_1 value: 90.80168776371308 - type: mrr_at_10 value: 94.42659567343111 - type: mrr_at_100 value: 94.45099347521871 - type: mrr_at_1000 value: 94.45099347521871 - type: mrr_at_20 value: 94.44574530017569 - type: mrr_at_3 
value: 94.12095639943743 - type: mrr_at_5 value: 94.34036568213786 - type: nauc_map_at_1000_diff1 value: 87.40573202946949 - type: nauc_map_at_1000_max value: 65.56220344468791 - type: nauc_map_at_1000_std value: 8.865583291735863 - type: nauc_map_at_100_diff1 value: 87.40573202946949 - type: nauc_map_at_100_max value: 65.56220344468791 - type: nauc_map_at_100_std value: 8.865583291735863 - type: nauc_map_at_10_diff1 value: 87.43657080570291 - type: nauc_map_at_10_max value: 65.71295628534446 - type: nauc_map_at_10_std value: 9.055399339099655 - type: nauc_map_at_1_diff1 value: 88.08395824560428 - type: nauc_map_at_1_max value: 62.92813192908893 - type: nauc_map_at_1_std value: 6.738987385482432 - type: nauc_map_at_20_diff1 value: 87.40979818966589 - type: nauc_map_at_20_max value: 65.59474346926105 - type: nauc_map_at_20_std value: 8.944420599300914 - type: nauc_map_at_3_diff1 value: 86.97771892161035 - type: nauc_map_at_3_max value: 66.14330030122467 - type: nauc_map_at_3_std value: 8.62516327793521 - type: nauc_map_at_5_diff1 value: 87.30273362211798 - type: nauc_map_at_5_max value: 66.1522476584607 - type: nauc_map_at_5_std value: 9.780940862679724 - type: nauc_mrr_at_1000_diff1 value: 87.40573202946949 - type: nauc_mrr_at_1000_max value: 65.56220344468791 - type: nauc_mrr_at_1000_std value: 8.865583291735863 - type: nauc_mrr_at_100_diff1 value: 87.40573202946949 - type: nauc_mrr_at_100_max value: 65.56220344468791 - type: nauc_mrr_at_100_std value: 8.865583291735863 - type: nauc_mrr_at_10_diff1 value: 87.43657080570291 - type: nauc_mrr_at_10_max value: 65.71295628534446 - type: nauc_mrr_at_10_std value: 9.055399339099655 - type: nauc_mrr_at_1_diff1 value: 88.08395824560428 - type: nauc_mrr_at_1_max value: 62.92813192908893 - type: nauc_mrr_at_1_std value: 6.738987385482432 - type: nauc_mrr_at_20_diff1 value: 87.40979818966589 - type: nauc_mrr_at_20_max value: 65.59474346926105 - type: nauc_mrr_at_20_std value: 8.944420599300914 - type: nauc_mrr_at_3_diff1 value: 86.97771892161035 - type: nauc_mrr_at_3_max value: 66.14330030122467 - type: nauc_mrr_at_3_std value: 8.62516327793521 - type: nauc_mrr_at_5_diff1 value: 87.30273362211798 - type: nauc_mrr_at_5_max value: 66.1522476584607 - type: nauc_mrr_at_5_std value: 9.780940862679724 - type: nauc_ndcg_at_1000_diff1 value: 87.37823158814116 - type: nauc_ndcg_at_1000_max value: 66.00874244792789 - type: nauc_ndcg_at_1000_std value: 9.479929342875067 - type: nauc_ndcg_at_100_diff1 value: 87.37823158814116 - type: nauc_ndcg_at_100_max value: 66.00874244792789 - type: nauc_ndcg_at_100_std value: 9.479929342875067 - type: nauc_ndcg_at_10_diff1 value: 87.54508467181488 - type: nauc_ndcg_at_10_max value: 66.88756470312894 - type: nauc_ndcg_at_10_std value: 10.812624405397022 - type: nauc_ndcg_at_1_diff1 value: 88.08395824560428 - type: nauc_ndcg_at_1_max value: 62.92813192908893 - type: nauc_ndcg_at_1_std value: 6.738987385482432 - type: nauc_ndcg_at_20_diff1 value: 87.42097894104597 - type: nauc_ndcg_at_20_max value: 66.37031898778943 - type: nauc_ndcg_at_20_std value: 10.34862538094813 - type: nauc_ndcg_at_3_diff1 value: 86.50039907157999 - type: nauc_ndcg_at_3_max value: 67.97798288917929 - type: nauc_ndcg_at_3_std value: 10.162410286746852 - type: nauc_ndcg_at_5_diff1 value: 87.13322094568531 - type: nauc_ndcg_at_5_max value: 68.08576118683821 - type: nauc_ndcg_at_5_std value: 12.639637379592855 - type: nauc_precision_at_1000_diff1 value: 100.0 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 100.0 - 
type: nauc_precision_at_100_diff1 value: 100.0 - type: nauc_precision_at_100_max value: 100.0 - type: nauc_precision_at_100_std value: 100.0 - type: nauc_precision_at_10_diff1 value: 93.46711505595813 - type: nauc_precision_at_10_max value: 100.0 - type: nauc_precision_at_10_std value: 65.42573557179935 - type: nauc_precision_at_1_diff1 value: 88.08395824560428 - type: nauc_precision_at_1_max value: 62.92813192908893 - type: nauc_precision_at_1_std value: 6.738987385482432 - type: nauc_precision_at_20_diff1 value: 91.28948674127133 - type: nauc_precision_at_20_max value: 100.0 - type: nauc_precision_at_20_std value: 90.74278258632364 - type: nauc_precision_at_3_diff1 value: 82.64606115071832 - type: nauc_precision_at_3_max value: 83.26201582412921 - type: nauc_precision_at_3_std value: 23.334013491433762 - type: nauc_precision_at_5_diff1 value: 85.0867539350284 - type: nauc_precision_at_5_max value: 96.57011448655484 - type: nauc_precision_at_5_std value: 56.46869543426768 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: .nan - type: nauc_recall_at_100_max value: .nan - type: nauc_recall_at_100_std value: .nan - type: nauc_recall_at_10_diff1 value: 93.46711505595623 - type: nauc_recall_at_10_max value: 100.0 - type: nauc_recall_at_10_std value: 65.42573557180279 - type: nauc_recall_at_1_diff1 value: 88.08395824560428 - type: nauc_recall_at_1_max value: 62.92813192908893 - type: nauc_recall_at_1_std value: 6.738987385482432 - type: nauc_recall_at_20_diff1 value: 91.28948674127474 - type: nauc_recall_at_20_max value: 100.0 - type: nauc_recall_at_20_std value: 90.74278258632704 - type: nauc_recall_at_3_diff1 value: 82.64606115071967 - type: nauc_recall_at_3_max value: 83.26201582413023 - type: nauc_recall_at_3_std value: 23.334013491434007 - type: nauc_recall_at_5_diff1 value: 85.08675393502854 - type: nauc_recall_at_5_max value: 96.57011448655487 - type: nauc_recall_at_5_std value: 56.46869543426658 - type: ndcg_at_1 value: 90.802 - type: ndcg_at_10 value: 95.705 - type: ndcg_at_100 value: 95.816 - type: ndcg_at_1000 value: 95.816 - type: ndcg_at_20 value: 95.771 - type: ndcg_at_3 value: 95.11699999999999 - type: ndcg_at_5 value: 95.506 - type: precision_at_1 value: 90.802 - type: precision_at_10 value: 9.949 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.987 - type: precision_at_3 value: 32.658 - type: precision_at_5 value: 19.781000000000002 - type: recall_at_1 value: 90.802 - type: recall_at_10 value: 99.494 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 99.747 - type: recall_at_3 value: 97.975 - type: recall_at_5 value: 98.90299999999999 task: type: Retrieval tags: - mteb - Sentence Transformers - sentence-similarity - sentence-transformers --- ## Multilingual-E5-small Multilingual E5 Text Embeddings: A Technical Report. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei, arXiv 2024 This model has 12 layers and the embedding size is 384. ## Usage Below is an example to encode queries and passages from the MS-MARCO passage ranking dataset. ## Supported Languages This model is initialized from microsoft/Multilingual-MiniLM-L12-H384 and continually trained on a mixture of multilingual datasets. It supports 100 languages from xlm-roberta, but low-resource languages may see performance degradation. 
## Training Details **Initialization**: microsoft/Multilingual-MiniLM-L12-H384 **First stage**: contrastive pre-training with weak supervision | Dataset | Weak supervision | # of text pairs | |--------------------------------------------------------------------------------------------------------|---------------------------------------|-----------------| | Filtered mC4 | (title, page content) | 1B | | CC News | (title, news content) | 400M | | NLLB | translation pairs | 2.4B | | Wikipedia | (hierarchical section title, passage) | 150M | | Filtered Reddit | (comment, response) | 800M | | S2ORC | (title, abstract) and citation pairs | 100M | | Stackexchange | (question, answer) | 50M | | xP3 | (input prompt, response) | 80M | | Miscellaneous unsupervised SBERT data | - | 10M | **Second stage**: supervised fine-tuning | Dataset | Language | # of text pairs | |----------------------------------------------------------------------------------------|--------------|-----------------| | MS MARCO | English | 500k | | NQ | English | 70k | | Trivia QA | English | 60k | | NLI from SimCSE | English | <300k | | ELI5 | English | 500k | | DuReader Retrieval | Chinese | 86k | | KILT Fever | English | 70k | | KILT HotpotQA | English | 70k | | SQuAD | English | 87k | | Quora | English | 150k | | Mr. TyDi | 11 languages | 50k | | MIRACL | 16 languages | 40k | For all labeled datasets, we only use their training sets for fine-tuning. For other training details, please refer to our paper. ## Benchmark Results on Mr. TyDi | Model | Avg MRR@10 | | ar | bn | en | fi | id | ja | ko | ru | sw | te | th | |-----------------------|------------|-------|------| --- | --- | --- | --- | --- | --- | --- |------| --- | --- | | BM25 | 33.3 | | 36.7 | 41.3 | 15.1 | 28.8 | 38.2 | 21.7 | 28.1 | 32.9 | 39.6 | 42.4 | 41.7 | | mDPR | 16.7 | | 26.0 | 25.8 | 16.2 | 11.3 | 14.6 | 18.1 | 21.9 | 18.5 | 7.3 | 10.6 | 13.5 | | BM25 + mDPR | 41.7 | | 49.1 | 53.5 | 28.4 | 36.5 | 45.5 | 35.5 | 36.2 | 42.7 | 40.5 | 42.0 | 49.2 | | | | | multilingual-e5-small | 64.4 | | 71.5 | 66.3 | 54.5 | 57.7 | 63.2 | 55.4 | 54.3 | 60.8 | 65.4 | 89.1 | 70.1 | | multilingual-e5-base | 65.9 | | 72.3 | 65.0 | 58.5 | 60.8 | 64.9 | 56.6 | 55.8 | 62.7 | 69.0 | 86.6 | 72.7 | | multilingual-e5-large | **70.5** | | 77.5 | 73.2 | 60.8 | 66.8 | 68.5 | 62.5 | 61.6 | 65.8 | 72.7 | 90.2 | 76.2 | ## MTEB Benchmark Evaluation Check out unilm/e5 to reproduce evaluation results on the BEIR and MTEB benchmarks. ## Support for Sentence Transformers Below is an example for usage with sentence_transformers (requires the sentence-transformers package). Contributors: michaelfeil ## FAQ **1. Do I need to add the prefix \"query: \" and \"passage: \" to input texts?** Yes, this is how the model is trained, otherwise you will see a performance degradation. Here are some rules of thumb: - Use \"query: \" and \"passage: \" correspondingly for asymmetric tasks such as passage retrieval in open QA, ad-hoc information retrieval. - Use \"query: \" prefix for symmetric tasks such as semantic similarity, bitext mining, paraphrase retrieval. - Use \"query: \" prefix if you want to use embeddings as features, such as linear probing classification, clustering. **2. Why are my reproduced results slightly different from those reported in the model card?** Different versions of transformers and pytorch could cause negligible but non-zero performance differences. **3. Why do the cosine similarity scores distribute around 0.7 to 1.0?** This is a known and expected behavior as we use a low temperature 0.01 for InfoNCE contrastive loss.
For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores instead of the absolute values, so this should not be an issue. ## Citation If you find our paper or models helpful, please consider citing as follows: ## Limitations Long texts will be truncated to at most 512 tokens.", + "model_explanation_gemini": "A multilingual text embedding model optimized for classification tasks across numerous languages, including English, German, Spanish, and Japanese, with demonstrated performance on Amazon review and counterfactual datasets." +} \ No newline at end of file diff --git a/data/model_data_json/ionet-official_bc8-alpha.json b/data/model_data_json/ionet-official_bc8-alpha.json new file mode 100644 index 0000000000000000000000000000000000000000..8804ea1ef83987b8f012d5de1d489ba3229644f1 --- /dev/null +++ b/data/model_data_json/ionet-official_bc8-alpha.json @@ -0,0 +1,15 @@ +{ + "model_id": "ionet-official/bc8-alpha", + "downloads": 770651, + "tags": [ + "diffusers", + "clip", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: apache-2.0 ---", + "model_explanation_gemini": "Performs binary classification tasks under the Apache 2.0 license." +} \ No newline at end of file diff --git a/data/model_data_json/j-hartmann_emotion-english-distilroberta-base.json b/data/model_data_json/j-hartmann_emotion-english-distilroberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..08094d0a52c8da323d2bfd5b992b76c22791dda0 --- /dev/null +++ b/data/model_data_json/j-hartmann_emotion-english-distilroberta-base.json @@ -0,0 +1,23 @@ +{ + "model_id": "j-hartmann/emotion-english-distilroberta-base", + "downloads": 1458009, + "tags": [ + "transformers", + "pytorch", + "tf", + "roberta", + "text-classification", + "distilroberta", + "sentiment", + "emotion", + "twitter", + "reddit", + "en", + "arxiv:2210.00434", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: \"en\" tags: - distilroberta - sentiment - emotion - twitter - reddit widget: - text: \"Oh wow. I didn't know that.\" - text: \"This movie always makes me cry..\" - text: \"Oh Happy Day\" --- # Emotion English DistilRoBERTa-base # Description ℹ With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class: 1) anger 🤬 2) disgust 🤢 3) fear 😨 4) joy 😀 5) neutral 😐 6) sadness 😭 7) surprise 😲 The model is a fine-tuned checkpoint of DistilRoBERTa-base. For a 'non-distilled' emotion model, please refer to the model card of the RoBERTa-large version. # Application 🚀 a) Run the emotion model with 3 lines of code on a single text example, using Hugging Face's pipeline command, on Google Colab. b) Run the emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab. The training datasets are listed below; note that because MELD extends the popular EmotionLines dataset, EmotionLines itself is not included here. |Name|anger|disgust|fear|joy|neutral|sadness|surprise| |---|---|---|---|---|---|---|---| |Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes| |Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes| |GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes| |ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-| |MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes| |SemEval-2018, EI-reg, Mohammad et al.
(2018) |Yes|-|Yes|Yes|-|Yes|-| The model is trained on a balanced subset from the datasets listed above (2,811 observations per emotion, i.e., nearly 20k observations in total). 80% of this balanced subset is used for training and 20% for evaluation. The evaluation accuracy is 66% (vs. the random-chance baseline of 1/7 = 14%). # Scientific Applications 📖 Below you can find a list of papers using \"Emotion English DistilRoBERTa-base\". If you would like your paper to be added to the list, please send me an email. - Butt, S., Sharma, S., Sharma, R., Sidorov, G., & Gelbukh, A. (2022). What goes on inside rumour and non-rumour tweets and their reactions: A Psycholinguistic Analyses. Computers in Human Behavior, 107345. - Kuang, Z., Zong, S., Zhang, J., Chen, J., & Liu, H. (2022). Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings. arXiv preprint arXiv:2210.00434. - Rozado, D., Hughes, R., & Halberstadt, J. (2022). Longitudinal analysis of sentiment and emotion in news media headlines using automated labelling with Transformer language models. Plos one, 17(10), e0276367.", + "model_explanation_gemini": "Classifies English text into Ekman's 6 basic emotions (anger, disgust, fear, joy, sadness, surprise) plus neutral." +} \ No newline at end of file diff --git a/data/model_data_json/jbetker_wav2vec2-large-robust-ft-libritts-voxpopuli.json b/data/model_data_json/jbetker_wav2vec2-large-robust-ft-libritts-voxpopuli.json new file mode 100644 index 0000000000000000000000000000000000000000..277137681ac6cee6ea31676f1f3b374c448f05d1 --- /dev/null +++ b/data/model_data_json/jbetker_wav2vec2-large-robust-ft-libritts-voxpopuli.json @@ -0,0 +1,14 @@ +{ + "model_id": "jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli", + "downloads": 332892, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "endpoints_compatible", + "region:us" + ], + "description": "This checkpoint is a wav2vec2-large model that is useful for generating transcriptions with punctuation. It is intended for use in building transcriptions for TTS models, where punctuation is very important for prosody. This model was created by fine-tuning the wav2vec2-large-robust checkpoint on the libritts and voxpopuli datasets with a new vocabulary that includes punctuation. The model gets a respectable WER of 4.45% on the librispeech validation set; the baseline checkpoint got 4.3%. Since the model was fine-tuned on clean audio, it is not well-suited for noisy audio like CommonVoice (though I may upload a checkpoint for that soon too). It still does pretty well, though. The vocabulary is uploaded to the model hub as well. Check out my speech transcription script repo, ocotillo, for usage examples.", + "model_explanation_gemini": "Generates transcriptions with punctuation from clean audio, primarily for improving prosody in text-to-speech models."
+} \ No newline at end of file diff --git a/data/model_data_json/jbochi_madlad400-3b-mt.json b/data/model_data_json/jbochi_madlad400-3b-mt.json new file mode 100644 index 0000000000000000000000000000000000000000..a80bf5ac7b1ec72fc321dc90fa2a03b6f4cfd2a9 --- /dev/null +++ b/data/model_data_json/jbochi_madlad400-3b-mt.json @@ -0,0 +1,439 @@ +{ + "model_id": "jbochi/madlad400-3b-mt", + "downloads": 119133, + "tags": [ + "transformers", + "safetensors", + "gguf", + "t5", + "text2text-generation", + "text-generation-inference", + "translation", + "multilingual", + "en", + "ru", + "es", + "fr", + "de", + "it", + "pt", + "pl", + "nl", + "vi", + "tr", + "sv", + "id", + "ro", + "cs", + "zh", + "hu", + "ja", + "th", + "fi", + "fa", + "uk", + "da", + "el", + "no", + "bg", + "sk", + "ko", + "ar", + "lt", + "ca", + "sl", + "he", + "et", + "lv", + "hi", + "sq", + "ms", + "az", + "sr", + "ta", + "hr", + "kk", + "is", + "ml", + "mr", + "te", + "af", + "gl", + "fil", + "be", + "mk", + "eu", + "bn", + "ka", + "mn", + "bs", + "uz", + "ur", + "sw", + "yue", + "ne", + "kn", + "kaa", + "gu", + "si", + "cy", + "eo", + "la", + "hy", + "ky", + "tg", + "ga", + "mt", + "my", + "km", + "tt", + "so", + "ku", + "ps", + "pa", + "rw", + "lo", + "ha", + "dv", + "fy", + "lb", + "ckb", + "mg", + "gd", + "am", + "ug", + "ht", + "grc", + "hmn", + "sd", + "jv", + "mi", + "tk", + "ceb", + "yi", + "ba", + "fo", + "or", + "xh", + "su", + "kl", + "ny", + "sm", + "sn", + "co", + "zu", + "ig", + "yo", + "pap", + "st", + "haw", + "as", + "oc", + "cv", + "lus", + "tet", + "gsw", + "sah", + "br", + "rm", + "sa", + "bo", + "om", + "se", + "ce", + "cnh", + "ilo", + "hil", + "udm", + "os", + "lg", + "ti", + "vec", + "ts", + "tyv", + "kbd", + "ee", + "iba", + "av", + "kha", + "to", + "tn", + "nso", + "fj", + "zza", + "ak", + "ada", + "otq", + "dz", + "bua", + "cfm", + "ln", + "chm", + "gn", + "krc", + "wa", + "hif", + "yua", + "srn", + "war", + "rom", + "bik", + "pam", + "sg", + "lu", + "ady", + "kbp", + "syr", + "ltg", + "myv", + "iso", + "kac", + "bho", + "ay", + "kum", + "qu", + "za", + "pag", + "ngu", + "ve", + "pck", + "zap", + "tyz", + "hui", + "bbc", + "tzo", + "tiv", + "ksd", + "gom", + "min", + "ang", + "nhe", + "bgp", + "nzi", + "nnb", + "nv", + "zxx", + "bci", + "kv", + "new", + "mps", + "alt", + "meu", + "bew", + "fon", + "iu", + "abt", + "mgh", + "mnw", + "tvl", + "dov", + "tlh", + "ho", + "kw", + "mrj", + "meo", + "crh", + "mbt", + "emp", + "ace", + "ium", + "mam", + "gym", + "mai", + "crs", + "pon", + "ubu", + "fip", + "quc", + "gv", + "kj", + "btx", + "ape", + "chk", + "rcf", + "shn", + "tzh", + "mdf", + "ppk", + "ss", + "gag", + "cab", + "kri", + "seh", + "ibb", + "tbz", + "bru", + "enq", + "ach", + "cuk", + "kmb", + "wo", + "kek", + "qub", + "tab", + "bts", + "kos", + "rwo", + "cak", + "tuc", + "bum", + "cjk", + "gil", + "stq", + "tsg", + "quh", + "mak", + "arn", + "ban", + "jiv", + "sja", + "yap", + "tcy", + "toj", + "twu", + "xal", + "amu", + "rmc", + "hus", + "nia", + "kjh", + "bm", + "guh", + "mas", + "acf", + "dtp", + "ksw", + "bzj", + "din", + "zne", + "mad", + "msi", + "mag", + "mkn", + "kg", + "lhu", + "ch", + "qvi", + "mh", + "djk", + "sus", + "mfe", + "srm", + "dyu", + "ctu", + "gui", + "pau", + "inb", + "bi", + "mni", + "guc", + "jam", + "wal", + "jac", + "bas", + "gor", + "skr", + "nyu", + "noa", + "sda", + "gub", + "nog", + "cni", + "teo", + "tdx", + "sxn", + "rki", + "nr", + "frp", + "alz", + "taj", + "lrc", + "cce", + "rn", + "jvn", + "hvn", + "nij", + "dwr", + "izz", + "msm", + "bus", + "ktu", + "chr", + 
"maz", + "tzj", + "suz", + "knj", + "bim", + "gvl", + "bqc", + "tca", + "pis", + "prk", + "laj", + "mel", + "qxr", + "niq", + "ahk", + "shp", + "hne", + "spp", + "koi", + "krj", + "quf", + "luz", + "agr", + "tsc", + "mqy", + "gof", + "gbm", + "miq", + "dje", + "awa", + "bjj", + "qvz", + "sjp", + "tll", + "raj", + "kjg", + "bgz", + "quy", + "cbk", + "akb", + "oj", + "ify", + "mey", + "ks", + "cac", + "brx", + "qup", + "syl", + "jax", + "ff", + "ber", + "tks", + "trp", + "mrw", + "adh", + "smt", + "srr", + "ffm", + "qvc", + "mtr", + "ann", + "aa", + "noe", + "nut", + "gyn", + "kwi", + "xmm", + "msb", + "dataset:allenai/MADLAD-400", + "arxiv:2309.04662", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - multilingual - en - ru - es - fr - de - it - pt - pl - nl - vi - tr - sv - id - ro - cs - zh - hu - ja - th - fi - fa - uk - da - el - \"no\" - bg - sk - ko - ar - lt - ca - sl - he - et - lv - hi - sq - ms - az - sr - ta - hr - kk - is - ml - mr - te - af - gl - fil - be - mk - eu - bn - ka - mn - bs - uz - ur - sw - yue - ne - kn - kaa - gu - si - cy - eo - la - hy - ky - tg - ga - mt - my - km - tt - so - ku - ps - pa - rw - lo - ha - dv - fy - lb - ckb - mg - gd - am - ug - ht - grc - hmn - sd - jv - mi - tk - ceb - yi - ba - fo - or - xh - su - kl - ny - sm - sn - co - zu - ig - yo - pap - st - haw - as - oc - cv - lus - tet - gsw - sah - br - rm - sa - bo - om - se - ce - cnh - ilo - hil - udm - os - lg - ti - vec - ts - tyv - kbd - ee - iba - av - kha - to - tn - nso - fj - zza - ak - ada - otq - dz - bua - cfm - ln - chm - gn - krc - wa - hif - yua - srn - war - rom - bik - pam - sg - lu - ady - kbp - syr - ltg - myv - iso - kac - bho - ay - kum - qu - za - pag - ngu - ve - pck - zap - tyz - hui - bbc - tzo - tiv - ksd - gom - min - ang - nhe - bgp - nzi - nnb - nv - zxx - bci - kv - new - mps - alt - meu - bew - fon - iu - abt - mgh - mnw - tvl - dov - tlh - ho - kw - mrj - meo - crh - mbt - emp - ace - ium - mam - gym - mai - crs - pon - ubu - fip - quc - gv - kj - btx - ape - chk - rcf - shn - tzh - mdf - ppk - ss - gag - cab - kri - seh - ibb - tbz - bru - enq - ach - cuk - kmb - wo - kek - qub - tab - bts - kos - rwo - cak - tuc - bum - cjk - gil - stq - tsg - quh - mak - arn - ban - jiv - sja - yap - tcy - toj - twu - xal - amu - rmc - hus - nia - kjh - bm - guh - mas - acf - dtp - ksw - bzj - din - zne - mad - msi - mag - mkn - kg - lhu - ch - qvi - mh - djk - sus - mfe - srm - dyu - ctu - gui - pau - inb - bi - mni - guc - jam - wal - jac - bas - gor - skr - nyu - noa - sda - gub - nog - cni - teo - tdx - sxn - rki - nr - frp - alz - taj - lrc - cce - rn - jvn - hvn - nij - dwr - izz - msm - bus - ktu - chr - maz - tzj - suz - knj - bim - gvl - bqc - tca - pis - prk - laj - mel - qxr - niq - ahk - shp - hne - spp - koi - krj - quf - luz - agr - tsc - mqy - gof - gbm - miq - dje - awa - bjj - qvz - sjp - tll - raj - kjg - bgz - quy - cbk - akb - oj - ify - mey - ks - cac - brx - qup - syl - jax - ff - ber - tks - trp - mrw - adh - smt - srr - ffm - qvc - mtr - ann - kaa - aa - noe - nut - gyn - kwi - xmm - msb library_name: transformers tags: - text2text-generation - text-generation-inference datasets: - allenai/MADLAD-400 pipeline_tag: translation widget: - text: \"<2en> Como vai, amigo?\" example_title: \"Translation to English\" - text: \"<2de> Do you speak German?\" example_title: \"Translation to German\" --- # Model Card for MADLAD-400-3B-MT # Table of Contents 0. TL;DR 1. 
Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7. Environmental Impact 8. Citation # TL;DR MADLAD-400-3B-MT is a multilingual machine translation model based on the T5 architecture that was trained on 1 trillion tokens covering over 450 languages using publicly available data. It is competitive with models that are significantly larger. **Disclaimer**: Juarez Bochi, who was not involved in this research, converted the original weights and wrote the contents of this model card based on the original paper and Flan-T5. # Model Details ## Model Description - **Model type:** Language model - **Language(s) (NLP):** Multilingual (400+ languages) - **License:** Apache 2.0 - **Related Models:** All MADLAD-400 Checkpoints - **Original Checkpoints:** All Original MADLAD-400 Checkpoints - **Resources for more information:** - Research paper - GitHub Repo - Hugging Face MADLAD-400 Docs (Similar to T5) - Pending PR # Usage Find below some example scripts on how to use the model: ## Using the Pytorch model with transformers ### Running the model on a CPU or GPU

First, install the Python packages that are required:
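A minimal sketch of the transformers route follows (the package list and example sentence are assumptions consistent with the card's widget examples, not the card's original snippet verbatim):

```python
# pip install transformers accelerate sentencepiece
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("jbochi/madlad400-3b-mt", device_map="auto")
tokenizer = T5Tokenizer.from_pretrained("jbochi/madlad400-3b-mt")

# Prepend the <2xx> token for the target language, as described under Training Details
text = "<2pt> I love pizza!"
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "Eu adoro pizza!"
```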
## Running the model with Candle
Usage with candle: we also provide a quantized model (1.65 GB vs the original 11.8 GB file).
# Uses ## Direct Use and Downstream Use > Primary intended uses: Machine Translation and multilingual NLP tasks on over 400 languages. > Primary intended users: Research community. ## Out-of-Scope Use > These models are trained on general domain data and are therefore not meant to > work on domain-specific models out-of-the-box. Moreover, these research models have not been assessed > for production use cases. # Bias, Risks, and Limitations > We note that we evaluate on only 204 of the languages supported by these models and on machine translation > and few-shot machine translation tasks. Users must consider use of this model carefully for their own > use case. ## Ethical considerations and risks > We trained these models with MADLAD-400 and publicly available data to create baseline models that > support NLP for over 400 languages, with a focus on languages underrepresented in large-scale corpora. > Given that these models were trained with web-crawled datasets that may contain sensitive, offensive or > otherwise low-quality content despite extensive preprocessing, it is still possible that these issues in the > underlying training data may cause differences in model performance and toxic (or otherwise problematic) > output for certain domains. Moreover, large models are dual use technologies that have specific risks > associated with their use and development. We point the reader to surveys such as those written by > Weidinger et al. or Bommasani et al. for a more detailed discussion of these risks, and to Liebling > et al. for a thorough discussion of the risks of machine translation systems. ## Known Limitations More information needed ## Sensitive Use: More information needed # Training Details > We train models of various sizes: a 3B, 32-layer parameter model, > a 7.2B 48-layer parameter model and a 10.7B 32-layer parameter model. > We share all parameters of the model across language pairs, > and use a Sentence Piece Model with 256k tokens shared on both the encoder and decoder > side. Each input sentence has a <2xx> token prepended to the source sentence to indicate the target > language. See the research paper for further details. ## Training Data > For both the machine translation and language model, MADLAD-400 is used. For the machine translation > model, a combination of parallel datasources covering 157 languages is also used. Further details are > described in the paper. ## Training Procedure See the research paper for further details. # Evaluation ## Testing Data, Factors & Metrics > For evaluation, we used WMT, NTREX, Flores-200 and Gatones datasets as described in Section 4.3 in the paper. > The translation quality of this model varies based on language, as seen in the paper, and likely varies on > domain, though we have not assessed this. ## Results See the research paper for further details. # Environmental Impact More information needed # Citation **BibTeX:**", + "model_explanation_gemini": "A multilingual machine translation model supporting over 400 languages, trained on 1 trillion tokens using the T5 architecture for text-to-text generation tasks."
+} \ No newline at end of file diff --git a/data/model_data_json/jhgan_ko-sbert-sts.json b/data/model_data_json/jhgan_ko-sbert-sts.json new file mode 100644 index 0000000000000000000000000000000000000000..319f97a3324257b9b7762ab21b23527c33686c12 --- /dev/null +++ b/data/model_data_json/jhgan_ko-sbert-sts.json @@ -0,0 +1,20 @@ +{ + "model_id": "jhgan/ko-sbert-sts", + "downloads": 168883, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:2004.03289", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers --- # ko-sbert-sts This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed (pip install -U sentence-transformers). Then you can use the model like this (see the combined sketch below): ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. ## Evaluation Results Results after training on the KorSTS training dataset and evaluating on the KorSTS evaluation dataset: - Cosine Pearson: 81.55 - Cosine Spearman: 81.23 - Euclidean Pearson: 79.94 - Euclidean Spearman: 79.79 - Manhattan Pearson: 79.90 - Manhattan Spearman: 79.75 - Dot Pearson: 76.02 - Dot Spearman: 75.31 ## Training The model was trained with the parameters: **DataLoader**: of length 719 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors - Ham, J., Choe, Y. J., Park, K., Choi, I., & Soh, H. (2020). Kornli and korsts: New benchmark datasets for korean natural language understanding. arXiv preprint arXiv:2004.03289 - Reimers, Nils and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.” ArXiv abs/1908.10084 (2019) - Reimers, Nils and Iryna Gurevych. “Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation.” EMNLP (2020)", + "model_explanation_gemini": "Maps Korean sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search, optimized for sentence similarity." +} \ No newline at end of file diff --git a/data/model_data_json/jhgan_ko-sroberta-multitask.json b/data/model_data_json/jhgan_ko-sroberta-multitask.json new file mode 100644 index 0000000000000000000000000000000000000000..f39178648f2bae0b9c2109e45f80fc15c4298832 --- /dev/null +++ b/data/model_data_json/jhgan_ko-sroberta-multitask.json @@ -0,0 +1,21 @@ +{ + "model_id": "jhgan/ko-sroberta-multitask", + "downloads": 139755, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "ko", + "arxiv:2004.03289", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers language: ko --- # ko-sroberta-multitask This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
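Both jhgan models above share the standard sentence-transformers workflow; ahead of the detailed Usage sections below, here is a minimal sketch (the example sentences are placeholders):

```python
# pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jhgan/ko-sroberta-multitask")  # or "jhgan/ko-sbert-sts"
sentences = ["안녕하세요?", "한국어 문장 임베딩 모델입니다."]  # "Hello?", "This is a Korean sentence embedding model."
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
```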
## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed (pip install -U sentence-transformers). Then you can use the model as in the sketch above: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. ## Evaluation Results Results after multi-task training on the KorSTS and KorNLI training datasets and evaluating on the KorSTS evaluation dataset: - Cosine Pearson: 84.77 - Cosine Spearman: 85.60 - Euclidean Pearson: 83.71 - Euclidean Spearman: 84.40 - Manhattan Pearson: 83.70 - Manhattan Spearman: 84.38 - Dot Pearson: 82.42 - Dot Spearman: 82.33 ## Training The model was trained with the parameters: **DataLoader**: of length 8885 with parameters: **Loss**: with parameters: **DataLoader**: of length 719 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors - Ham, J., Choe, Y. J., Park, K., Choi, I., & Soh, H. (2020). Kornli and korsts: New benchmark datasets for korean natural language understanding. arXiv preprint arXiv:2004.03289 - Reimers, Nils and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.” ArXiv abs/1908.10084 (2019) - Reimers, Nils and Iryna Gurevych. “Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation.” EMNLP (2020).", + "model_explanation_gemini": "Maps Korean sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search, trained on KorSTS and KorNLI datasets." +} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-bert-flash-implementation.json b/data/model_data_json/jinaai_jina-bert-flash-implementation.json new file mode 100644 index 0000000000000000000000000000000000000000..05bd8c3d792d739a9e2a08bed570331e51612dd7 --- /dev/null +++ b/data/model_data_json/jinaai_jina-bert-flash-implementation.json @@ -0,0 +1,13 @@ +{ + "model_id": "jinaai/jina-bert-flash-implementation", + "downloads": 453028, + "tags": [ + "transformers", + "bert", + "custom_code", + "endpoints_compatible", + "region:eu" + ], + "description": "# BERT with Flash-Attention ### Installing dependencies To run the model on GPU, you need to install Flash Attention. You may either install from pypi (which may not work with fused-dense), or from source. To install from source, clone the GitHub repository: The code provided here should work with commit . Change to the cloned repo and install: This will compile the flash-attention kernel, which will take some time. If you would like to use fused MLPs (e.g. to use activation checkpointing), you may install fused-dense also from source: ### Configuration The config adds some new parameters: - : If , always use flash attention. If , use flash attention when a GPU is available. If , never use flash attention (works on CPU). - : Size (left and right) of the local attention window. If , use global attention. - : If true, we only need to pass the hidden states for the masked-out tokens (around 15%) to the classifier heads. I set this to true for pretraining. - : Whether to use fused-dense. Useful to reduce VRAM in combination with activation checkpointing. - : One of . Increasing this increases the amount of activation checkpointing within the MLP. Keep this at 0 for pretraining and use gradient accumulation instead. For embedding training, increase this as much as needed. - : If true, we only need to compute the last layer for a subset of tokens.
I left this to false. - : Whether or not to use QK-normalization - : Number of LoRAs to use when initializing a model. Has no effect on other models.", + "model_explanation_gemini": "Implements BERT with Flash-Attention for efficient GPU processing, featuring configurable attention mechanisms, fused MLPs, and optional LoRAs." +} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-clip-v1.json b/data/model_data_json/jinaai_jina-clip-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..1f6126fb804a58a145d6a33183db31bd882d9285 --- /dev/null +++ b/data/model_data_json/jinaai_jina-clip-v1.json @@ -0,0 +1,24 @@ +{ + "model_id": "jinaai/jina-clip-v1", + "downloads": 108965, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "jina_clip", + "feature-extraction", + "sentence-similarity", + "mteb", + "clip", + "vision", + "transformers.js", + "custom_code", + "en", + "arxiv:2405.20204", + "license:apache-2.0", + "region:eu" + ], + "description": "--- tags: - feature-extraction - sentence-similarity - mteb - clip - vision - transformers.js language: en inference: false license: apache-2.0 library_name: transformers ---

The embedding set trained by Jina AI.

Jina CLIP: your CLIP model is also your text retriever!
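Ahead of the details below, a minimal usage sketch of the transformers route described under Usage (encode_text/encode_image come from Jina's custom code loaded via trust_remote_code; the image file name here is hypothetical):

```python
# pip install transformers einops timm pillow
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

text_embeddings = model.encode_text(["A blue cat", "A red cat"])
image_embeddings = model.encode_image(["blue_cat.jpg"])  # hypothetical local file; URLs also work

# Cosine similarity between the first sentence and the image
a, b = text_embeddings[0], image_embeddings[0]
print(float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))))
```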

## Intended Usage & Model Info is a state-of-the-art English **multimodal (text-image) embedding model**. Traditional text embedding models, such as jina-embeddings-v2-base-en, excel in text-to-text retrieval but incapable of cross-modal tasks. Models like openai/clip-vit-base-patch32 effectively align image and text embeddings but are not optimized for text-to-text retrieval due to their training methodologies and context limitations. bridges this gap by offering robust performance in both domains. Its text component matches the retrieval efficiency of , while its overall architecture sets a new benchmark for cross-modal retrieval. This dual capability makes it an excellent tool for multimodal retrieval-augmented generation (MuRAG) applications, enabling seamless text-to-text and text-to-image searches within a single model. ## Data & Parameters Check out our paper ## Usage 1. The easiest way to starting using jina-clip-v1-en is to use Jina AI's Embeddings API. 2. Alternatively, you can use Jina CLIP directly via transformers/sentence-transformers package. or sentence-transformers: 3. JavaScript developers can use Jina CLIP via the Transformers.js library. Note that to use this model, you need to install Transformers.js v3 from source using . ## Performance ### Text-Image Retrieval | Name | Flickr Image Retr. R@1 | Flickr Image Retr. R@5 | Flickr Text Retr. R@1 | Flickr Text Retr. R@5 | |------------------|-------------------------|-------------------------|-----------------------|-----------------------| | ViT-B-32 | 0.597 | 0.8398 | 0.781 | 0.938 | | ViT-B-16 | 0.6216 | 0.8572 | 0.822 | 0.966 | | jina-clip | 0.6748 | 0.8902 | 0.811 | 0.965 | | Name | MSCOCO Image Retr. R@1 | MSCOCO Image Retr. R@5 | MSCOCO Text Retr. R@1 | MSCOCO Text Retr. R@5 | |------------------|-------------------------|-------------------------|-----------------------|-----------------------| | ViT-B-32 | 0.342 | 0.6001 | 0.5234 | 0.7634 | | ViT-B-16 | 0.3309 | 0.5842 | 0.5242 | 0.767 | | jina-clip | 0.4111 | 0.6644 | 0.5544 | 0.7904 | ### Text-Text Retrieval | Name | STS12 | STS15 | STS17 | STS13 | STS14 | STS16 | STS22 | STSBenchmark | SummEval | |-----------------------|--------|--------|--------|--------|--------|--------|--------|--------------|----------| | jina-embeddings-v2 | 0.7427 | 0.8755 | 0.8888 | 0.833 | 0.7917 | 0.836 | 0.6346 | 0.8404 | 0.3056 | | jina-clip | 0.7352 | 0.8746 | 0.8976 | 0.8323 | 0.7868 | 0.8377 | 0.6583 | 0.8493 | 0.3048 | | Name | ArguAna | FiQA2018 | NFCorpus | Quora | SCIDOCS | SciFact | TRECCOVID | |--------------------|---------|----------|----------|-------|---------|---------|-----------| | jina-embeddings-v2 | 0.4418 | 0.4158 | 0.3245 | 0.882 | 0.1986 | 0.6668 | 0.6591 | | jina-clip | 0.4933 | 0.3827 | 0.3352 | 0.8789| 0.2024 | 0.6734 | 0.7161 | ## Contact Join our Discord community and chat with other community members about ideas. ## Citation If you find useful in your research, please cite the following paper: ## FAQ ### I encounter this problem, what should I do? There was a bug in Transformers library between 4.40.x to 4.41.1. You can update transformers to >4.41.2 or <=4.40.0 ### Given one query, how can I merge its text-text and text-image cosine similarity? Our emperical study shows that text-text cosine similarity is normally larger than text-image cosine similarity! If you want to merge two scores, we recommended 2 ways: 1. weighted average of text-text sim and text-image sim: 2. 
apply z-score normalization before merging scores:", + "model_explanation_gemini": "Jina CLIP is a multimodal embedding model excelling in both text-to-text and text-to-image retrieval tasks, enabling seamless cross-modal searches within a single model." +} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-embeddings-v2-base-code.json b/data/model_data_json/jinaai_jina-embeddings-v2-base-code.json new file mode 100644 index 0000000000000000000000000000000000000000..794d41da362ab308194db46b4c17de4d6886c7c6 --- /dev/null +++ b/data/model_data_json/jinaai_jina-embeddings-v2-base-code.json @@ -0,0 +1,28 @@ +{ + "model_id": "jinaai/jina-embeddings-v2-base-code", + "downloads": 115524, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "mteb", + "transformers", + "transformers.js", + "custom_code", + "en", + "dataset:allenai/c4", + "arxiv:2108.12409", + "arxiv:2310.19923", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "region:eu" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - transformers - transformers.js datasets: - allenai/c4 language: en inference: false license: apache-2.0 ---
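As a minimal, hedged sketch of the two merging recipes from the jina-clip FAQ above: it assumes the custom modeling code loaded via `trust_remote_code=True` exposes `encode_text` and `encode_image` helpers returning L2-normalized numpy embeddings, and the example URLs and the 0.2/0.8 weights are illustrative placeholders, not values from the original card.

```python
# Hedged sketch: merging text-text and text-image cosine similarities
# for one query, per the jina-clip-v1 FAQ (weighted average / z-score).
import numpy as np
from transformers import AutoModel

# trust_remote_code is needed because Jina CLIP ships custom modeling code.
model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

query_emb = model.encode_text(["a corgi running on the beach"])[0]

captions = ["A dog sprints across the sand.", "Quarterly revenue grew by 4%."]
images = ["https://example.com/corgi.jpg", "https://example.com/chart.png"]  # placeholders

# Assumed helpers from the custom code; embeddings are L2-normalized,
# so the dot products below are cosine similarities.
text_sims = model.encode_text(captions) @ query_emb
image_sims = model.encode_image(images) @ query_emb

# Recipe 1: per-document weighted average of its caption and image scores,
# down-weighting text-text similarity, which the FAQ notes runs higher.
merged_weighted = 0.2 * text_sims + 0.8 * image_sims

# Recipe 2: z-score-normalize each score family before ranking them jointly.
def zscore(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / (x.std() + 1e-8)

merged_z = np.concatenate([zscore(text_sims), zscore(image_sims)])
print(merged_weighted, merged_z)
```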

\"Jina

The text embedding set trained by Jina AI.

## Quick Start The easiest way to start using jina-embeddings-v2-base-code is to use Jina AI's Embedding API. ## Intended Usage & Model Info jina-embeddings-v2-base-code is a multilingual **embedding model** that speaks **English and 30 widely used programming languages**. As with the other models in the jina-embeddings-v2 series, it supports an **8192**-token sequence length. It is based on a BERT architecture (JinaBert) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. The backbone is pretrained on the github-code dataset. The model is further trained on Jina AI's collection of more than 150 million coding question-answer and docstring/source-code pairs. These pairs were obtained from various domains and were carefully selected through a thorough cleaning process. The embedding model was trained with a 512-token sequence length, but extrapolates to an 8k sequence length (or even longer) thanks to ALiBi. This makes our model useful for a range of use cases, especially when processing long documents is needed, including technical question answering and code search. This model has 161 million parameters, which enables fast and memory-efficient inference, while delivering impressive performance. Additionally, we provide the following embedding models: - jina-embeddings-v2-small-en: 33 million parameters. - jina-embeddings-v2-base-en: 137 million parameters. - jina-embeddings-v2-base-zh: Chinese-English bilingual embeddings. - jina-embeddings-v2-base-de: German-English bilingual embeddings. - jina-embeddings-v2-base-es: Spanish-English bilingual embeddings (soon). - jina-embeddings-v2-base-code: 161 million parameters, code embeddings. **
Supported (Programming) Languages**

- English - Assembly - Batchfile - C - C# - C++ - CMake - CSS - Dockerfile - FORTRAN - GO - Haskell - HTML - Java - JavaScript - Julia - Lua - Makefile - Markdown - PHP - Perl - PowerShell - Python - Ruby - Rust - SQL - Scala - Shell - TypeScript - TeX - Visual Basic

## Data & Parameters See the Jina Embeddings V2 technical report. ## Usage **
Please apply mean pooling when integrating the model.**

### Why mean pooling? Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has been proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function to deal with this. However, if you would like to do it without using the default function:
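A minimal sketch of manual mean pooling with the plain transformers API; the example sentences and the 2k `max_length` cap are illustrative choices, and the bundled `encode` helper performs the same pooling for you:

```python
# Hedged sketch: manual mean pooling over token embeddings, masking padding.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "jinaai/jina-embeddings-v2-base-code"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average the remaining token embeddings.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

sentences = [
    "How do I access the index while iterating over a sequence with a for loop?",
    "for idx, x in enumerate(xs): print(idx, x)",
]
# Cap the sequence length if you only need shorter inputs (e.g. 2k instead of 8k).
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=2048, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs)[0]  # last hidden state

embeddings = F.normalize(mean_pooling(token_embeddings, inputs["attention_mask"]), p=2, dim=1)
print(embeddings @ embeddings.T)  # cosine similarities
```

With sentence-transformers v2.3.0+, `SentenceTransformer("jinaai/jina-embeddings-v2-base-code", trust_remote_code=True).encode(sentences)` should yield equivalent pooled embeddings.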

You can use Jina Embedding models directly from the transformers package: If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function: As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged into Hugging Face as well): You can also use the Transformers.js library to compute embeddings in JavaScript. ## Plans 1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese. 2. Multimodal embedding models to enable multimodal RAG applications. 3. High-performance rerankers. ## Contact Join our Discord community and chat with other community members about ideas.", + "model_explanation_gemini": "Generates multilingual embeddings for English and 30 programming languages, optimized for long sequences up to 8192 tokens, primarily for code search and technical question answering." +} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-embeddings-v2-base-de.json b/data/model_data_json/jinaai_jina-embeddings-v2-base-de.json new file mode 100644 index 0000000000000000000000000000000000000000..7c1df11612ba10a754d19c85a37ed2644e6f23fc --- /dev/null +++ b/data/model_data_json/jinaai_jina-embeddings-v2-base-de.json @@ -0,0 +1,29 @@ +{ + "model_id": "jinaai/jina-embeddings-v2-base-de", + "downloads": 102243, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "mteb", + "transformers", + "transformers.js", + "custom_code", + "de", + "en", + "arxiv:2108.12409", + "arxiv:2402.17016", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "region:eu" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - transformers - transformers.js language: - de - en inference: false license: apache-2.0 model-index: - name: jina-embeddings-v2-base-de results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.76119402985076 - type: ap value: 35.99577188521176 - type: f1 value: 67.50397431543269 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (de) config: de split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 68.9186295503212 - type: ap value: 79.73307115840507 - type: f1 value: 66.66245744831339 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 77.52215 - type: ap value: 71.85051037177416 - type: f1 value: 77.4171096157774 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 38.498 - type: f1 value: 38.058193386555956 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 37.717999999999996 - type: f1 value: 37.22674371574757 - task: type: Retrieval dataset: type: arguana
name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 25.319999999999997 - type: map_at_10 value: 40.351 - type: map_at_100 value: 41.435 - type: map_at_1000 value: 41.443000000000005 - type: map_at_3 value: 35.266 - type: map_at_5 value: 37.99 - type: mrr_at_1 value: 25.746999999999996 - type: mrr_at_10 value: 40.515 - type: mrr_at_100 value: 41.606 - type: mrr_at_1000 value: 41.614000000000004 - type: mrr_at_3 value: 35.42 - type: mrr_at_5 value: 38.112 - type: ndcg_at_1 value: 25.319999999999997 - type: ndcg_at_10 value: 49.332 - type: ndcg_at_100 value: 53.909 - type: ndcg_at_1000 value: 54.089 - type: ndcg_at_3 value: 38.705 - type: ndcg_at_5 value: 43.606 - type: precision_at_1 value: 25.319999999999997 - type: precision_at_10 value: 7.831 - type: precision_at_100 value: 0.9820000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 16.24 - type: precision_at_5 value: 12.119 - type: recall_at_1 value: 25.319999999999997 - type: recall_at_10 value: 78.307 - type: recall_at_100 value: 98.222 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 48.72 - type: recall_at_5 value: 60.597 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 41.43100588255654 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 32.08988904593667 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 60.55514765595906 - type: mrr value: 73.51393835465858 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 79.6723823121172 - type: cos_sim_spearman value: 76.90596922214986 - type: euclidean_pearson value: 77.87910737957918 - type: euclidean_spearman value: 76.66319260598262 - type: manhattan_pearson value: 77.37039493457965 - type: manhattan_spearman value: 76.09872191280964 - task: type: BitextMining dataset: type: mteb/bucc-bitext-mining name: MTEB BUCC (de-en) config: de-en split: test revision: d51519689f32196a32af33b075a01d0e7c51e252 metrics: - type: accuracy value: 98.97703549060543 - type: f1 value: 98.86569241475296 - type: precision value: 98.81002087682673 - type: recall value: 98.97703549060543 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 83.93506493506493 - type: f1 value: 83.91014949949302 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 34.970675877585144 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 28.779230269190954 - task: type: Clustering dataset: type: slvnwhrl/blurbs-clustering-p2p name: MTEB BlurbsClusteringP2P config: default 
split: test revision: a2dd5b02a77de3466a3eaa98ae586b5610314496 metrics: - type: v_measure value: 35.490175601567216 - task: type: Clustering dataset: type: slvnwhrl/blurbs-clustering-s2s name: MTEB BlurbsClusteringS2S config: default split: test revision: 9bfff9a7f8f6dc6ffc9da71c48dd48b68696471d metrics: - type: v_measure value: 16.16638280560168 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.830999999999996 - type: map_at_10 value: 41.355 - type: map_at_100 value: 42.791000000000004 - type: map_at_1000 value: 42.918 - type: map_at_3 value: 38.237 - type: map_at_5 value: 40.066 - type: mrr_at_1 value: 38.484 - type: mrr_at_10 value: 47.593 - type: mrr_at_100 value: 48.388 - type: mrr_at_1000 value: 48.439 - type: mrr_at_3 value: 45.279 - type: mrr_at_5 value: 46.724 - type: ndcg_at_1 value: 38.484 - type: ndcg_at_10 value: 47.27 - type: ndcg_at_100 value: 52.568000000000005 - type: ndcg_at_1000 value: 54.729000000000006 - type: ndcg_at_3 value: 43.061 - type: ndcg_at_5 value: 45.083 - type: precision_at_1 value: 38.484 - type: precision_at_10 value: 8.927 - type: precision_at_100 value: 1.425 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 20.791999999999998 - type: precision_at_5 value: 14.85 - type: recall_at_1 value: 30.830999999999996 - type: recall_at_10 value: 57.87799999999999 - type: recall_at_100 value: 80.124 - type: recall_at_1000 value: 94.208 - type: recall_at_3 value: 45.083 - type: recall_at_5 value: 51.154999999999994 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.782 - type: map_at_10 value: 34.492 - type: map_at_100 value: 35.521 - type: map_at_1000 value: 35.638 - type: map_at_3 value: 31.735999999999997 - type: map_at_5 value: 33.339 - type: mrr_at_1 value: 32.357 - type: mrr_at_10 value: 39.965 - type: mrr_at_100 value: 40.644000000000005 - type: mrr_at_1000 value: 40.695 - type: mrr_at_3 value: 37.739 - type: mrr_at_5 value: 39.061 - type: ndcg_at_1 value: 32.357 - type: ndcg_at_10 value: 39.644 - type: ndcg_at_100 value: 43.851 - type: ndcg_at_1000 value: 46.211999999999996 - type: ndcg_at_3 value: 35.675000000000004 - type: ndcg_at_5 value: 37.564 - type: precision_at_1 value: 32.357 - type: precision_at_10 value: 7.344 - type: precision_at_100 value: 1.201 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 17.155 - type: precision_at_5 value: 12.166 - type: recall_at_1 value: 25.782 - type: recall_at_10 value: 49.132999999999996 - type: recall_at_100 value: 67.24 - type: recall_at_1000 value: 83.045 - type: recall_at_3 value: 37.021 - type: recall_at_5 value: 42.548 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 35.778999999999996 - type: map_at_10 value: 47.038000000000004 - type: map_at_100 value: 48.064 - type: map_at_1000 value: 48.128 - type: map_at_3 value: 44.186 - type: map_at_5 value: 45.788000000000004 - type: mrr_at_1 value: 41.254000000000005 - type: mrr_at_10 value: 50.556999999999995 - type: mrr_at_100 value: 51.296 - type: mrr_at_1000 value: 51.331 - type: mrr_at_3 value: 48.318 - type: mrr_at_5 value: 49.619 - type: ndcg_at_1 value: 41.254000000000005 - type: ndcg_at_10 value: 52.454 - type: ndcg_at_100 value: 56.776 - type: ndcg_at_1000 
value: 58.181000000000004 - type: ndcg_at_3 value: 47.713 - type: ndcg_at_5 value: 49.997 - type: precision_at_1 value: 41.254000000000005 - type: precision_at_10 value: 8.464 - type: precision_at_100 value: 1.157 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 21.526 - type: precision_at_5 value: 14.696000000000002 - type: recall_at_1 value: 35.778999999999996 - type: recall_at_10 value: 64.85300000000001 - type: recall_at_100 value: 83.98400000000001 - type: recall_at_1000 value: 94.18299999999999 - type: recall_at_3 value: 51.929 - type: recall_at_5 value: 57.666 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.719 - type: map_at_10 value: 29.326999999999998 - type: map_at_100 value: 30.314000000000004 - type: map_at_1000 value: 30.397000000000002 - type: map_at_3 value: 27.101 - type: map_at_5 value: 28.141 - type: mrr_at_1 value: 23.503 - type: mrr_at_10 value: 31.225 - type: mrr_at_100 value: 32.096000000000004 - type: mrr_at_1000 value: 32.159 - type: mrr_at_3 value: 29.076999999999998 - type: mrr_at_5 value: 30.083 - type: ndcg_at_1 value: 23.503 - type: ndcg_at_10 value: 33.842 - type: ndcg_at_100 value: 39.038000000000004 - type: ndcg_at_1000 value: 41.214 - type: ndcg_at_3 value: 29.347 - type: ndcg_at_5 value: 31.121 - type: precision_at_1 value: 23.503 - type: precision_at_10 value: 5.266 - type: precision_at_100 value: 0.831 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 12.504999999999999 - type: precision_at_5 value: 8.565000000000001 - type: recall_at_1 value: 21.719 - type: recall_at_10 value: 46.024 - type: recall_at_100 value: 70.78999999999999 - type: recall_at_1000 value: 87.022 - type: recall_at_3 value: 33.64 - type: recall_at_5 value: 37.992 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.601 - type: map_at_10 value: 22.054000000000002 - type: map_at_100 value: 23.177 - type: map_at_1000 value: 23.308 - type: map_at_3 value: 19.772000000000002 - type: map_at_5 value: 21.055 - type: mrr_at_1 value: 19.403000000000002 - type: mrr_at_10 value: 26.409 - type: mrr_at_100 value: 27.356 - type: mrr_at_1000 value: 27.441 - type: mrr_at_3 value: 24.108999999999998 - type: mrr_at_5 value: 25.427 - type: ndcg_at_1 value: 19.403000000000002 - type: ndcg_at_10 value: 26.474999999999998 - type: ndcg_at_100 value: 32.086 - type: ndcg_at_1000 value: 35.231 - type: ndcg_at_3 value: 22.289 - type: ndcg_at_5 value: 24.271 - type: precision_at_1 value: 19.403000000000002 - type: precision_at_10 value: 4.813 - type: precision_at_100 value: 0.8869999999999999 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 10.531 - type: precision_at_5 value: 7.710999999999999 - type: recall_at_1 value: 15.601 - type: recall_at_10 value: 35.916 - type: recall_at_100 value: 60.8 - type: recall_at_1000 value: 83.245 - type: recall_at_3 value: 24.321 - type: recall_at_5 value: 29.372999999999998 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.522 - type: map_at_10 value: 34.854 - type: map_at_100 value: 36.269 - type: map_at_1000 value: 36.387 - type: map_at_3 value: 32.187 - type: map_at_5 value: 33.692 - type: mrr_at_1 value: 31.375999999999998 - type: mrr_at_10 value: 
40.471000000000004 - type: mrr_at_100 value: 41.481 - type: mrr_at_1000 value: 41.533 - type: mrr_at_3 value: 38.274 - type: mrr_at_5 value: 39.612 - type: ndcg_at_1 value: 31.375999999999998 - type: ndcg_at_10 value: 40.298 - type: ndcg_at_100 value: 46.255 - type: ndcg_at_1000 value: 48.522 - type: ndcg_at_3 value: 36.049 - type: ndcg_at_5 value: 38.095 - type: precision_at_1 value: 31.375999999999998 - type: precision_at_10 value: 7.305000000000001 - type: precision_at_100 value: 1.201 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 17.132 - type: precision_at_5 value: 12.107999999999999 - type: recall_at_1 value: 25.522 - type: recall_at_10 value: 50.988 - type: recall_at_100 value: 76.005 - type: recall_at_1000 value: 91.11200000000001 - type: recall_at_3 value: 38.808 - type: recall_at_5 value: 44.279 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.615000000000002 - type: map_at_10 value: 32.843 - type: map_at_100 value: 34.172999999999995 - type: map_at_1000 value: 34.286 - type: map_at_3 value: 30.125 - type: map_at_5 value: 31.495 - type: mrr_at_1 value: 30.023 - type: mrr_at_10 value: 38.106 - type: mrr_at_100 value: 39.01 - type: mrr_at_1000 value: 39.071 - type: mrr_at_3 value: 35.674 - type: mrr_at_5 value: 36.924 - type: ndcg_at_1 value: 30.023 - type: ndcg_at_10 value: 38.091 - type: ndcg_at_100 value: 43.771 - type: ndcg_at_1000 value: 46.315 - type: ndcg_at_3 value: 33.507 - type: ndcg_at_5 value: 35.304 - type: precision_at_1 value: 30.023 - type: precision_at_10 value: 6.837999999999999 - type: precision_at_100 value: 1.124 - type: precision_at_1000 value: 0.152 - type: precision_at_3 value: 15.562999999999999 - type: precision_at_5 value: 10.936 - type: recall_at_1 value: 24.615000000000002 - type: recall_at_10 value: 48.691 - type: recall_at_100 value: 72.884 - type: recall_at_1000 value: 90.387 - type: recall_at_3 value: 35.659 - type: recall_at_5 value: 40.602 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.223666666666666 - type: map_at_10 value: 31.338166666666673 - type: map_at_100 value: 32.47358333333333 - type: map_at_1000 value: 32.5955 - type: map_at_3 value: 28.84133333333333 - type: map_at_5 value: 30.20808333333333 - type: mrr_at_1 value: 27.62483333333333 - type: mrr_at_10 value: 35.385916666666674 - type: mrr_at_100 value: 36.23325 - type: mrr_at_1000 value: 36.29966666666667 - type: mrr_at_3 value: 33.16583333333333 - type: mrr_at_5 value: 34.41983333333334 - type: ndcg_at_1 value: 27.62483333333333 - type: ndcg_at_10 value: 36.222 - type: ndcg_at_100 value: 41.29491666666666 - type: ndcg_at_1000 value: 43.85508333333333 - type: ndcg_at_3 value: 31.95116666666667 - type: ndcg_at_5 value: 33.88541666666667 - type: precision_at_1 value: 27.62483333333333 - type: precision_at_10 value: 6.339916666666667 - type: precision_at_100 value: 1.0483333333333333 - type: precision_at_1000 value: 0.14608333333333334 - type: precision_at_3 value: 14.726500000000003 - type: precision_at_5 value: 10.395 - type: recall_at_1 value: 23.223666666666666 - type: recall_at_10 value: 46.778999999999996 - type: recall_at_100 value: 69.27141666666667 - type: recall_at_1000 value: 87.27383333333334 - type: recall_at_3 value: 34.678749999999994 - type: recall_at_5 value: 39.79900000000001 - task: type: Retrieval 
dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.677 - type: map_at_10 value: 27.828000000000003 - type: map_at_100 value: 28.538999999999998 - type: map_at_1000 value: 28.64 - type: map_at_3 value: 26.105 - type: map_at_5 value: 27.009 - type: mrr_at_1 value: 24.387 - type: mrr_at_10 value: 30.209999999999997 - type: mrr_at_100 value: 30.953000000000003 - type: mrr_at_1000 value: 31.029 - type: mrr_at_3 value: 28.707 - type: mrr_at_5 value: 29.610999999999997 - type: ndcg_at_1 value: 24.387 - type: ndcg_at_10 value: 31.378 - type: ndcg_at_100 value: 35.249 - type: ndcg_at_1000 value: 37.923 - type: ndcg_at_3 value: 28.213 - type: ndcg_at_5 value: 29.658 - type: precision_at_1 value: 24.387 - type: precision_at_10 value: 4.8309999999999995 - type: precision_at_100 value: 0.73 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 12.168 - type: precision_at_5 value: 8.251999999999999 - type: recall_at_1 value: 21.677 - type: recall_at_10 value: 40.069 - type: recall_at_100 value: 58.077 - type: recall_at_1000 value: 77.97 - type: recall_at_3 value: 31.03 - type: recall_at_5 value: 34.838 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 14.484 - type: map_at_10 value: 20.355 - type: map_at_100 value: 21.382 - type: map_at_1000 value: 21.511 - type: map_at_3 value: 18.448 - type: map_at_5 value: 19.451999999999998 - type: mrr_at_1 value: 17.584 - type: mrr_at_10 value: 23.825 - type: mrr_at_100 value: 24.704 - type: mrr_at_1000 value: 24.793000000000003 - type: mrr_at_3 value: 21.92 - type: mrr_at_5 value: 22.97 - type: ndcg_at_1 value: 17.584 - type: ndcg_at_10 value: 24.315 - type: ndcg_at_100 value: 29.354999999999997 - type: ndcg_at_1000 value: 32.641999999999996 - type: ndcg_at_3 value: 20.802 - type: ndcg_at_5 value: 22.335 - type: precision_at_1 value: 17.584 - type: precision_at_10 value: 4.443 - type: precision_at_100 value: 0.8160000000000001 - type: precision_at_1000 value: 0.128 - type: precision_at_3 value: 9.807 - type: precision_at_5 value: 7.0889999999999995 - type: recall_at_1 value: 14.484 - type: recall_at_10 value: 32.804 - type: recall_at_100 value: 55.679 - type: recall_at_1000 value: 79.63 - type: recall_at_3 value: 22.976 - type: recall_at_5 value: 26.939 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.983999999999998 - type: map_at_10 value: 30.812 - type: map_at_100 value: 31.938 - type: map_at_1000 value: 32.056000000000004 - type: map_at_3 value: 28.449999999999996 - type: map_at_5 value: 29.542 - type: mrr_at_1 value: 27.145999999999997 - type: mrr_at_10 value: 34.782999999999994 - type: mrr_at_100 value: 35.699 - type: mrr_at_1000 value: 35.768 - type: mrr_at_3 value: 32.572 - type: mrr_at_5 value: 33.607 - type: ndcg_at_1 value: 27.145999999999997 - type: ndcg_at_10 value: 35.722 - type: ndcg_at_100 value: 40.964 - type: ndcg_at_1000 value: 43.598 - type: ndcg_at_3 value: 31.379 - type: ndcg_at_5 value: 32.924 - type: precision_at_1 value: 27.145999999999997 - type: precision_at_10 value: 6.063000000000001 - type: precision_at_100 value: 0.9730000000000001 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 14.366000000000001 - type: precision_at_5 value: 9.776 - type: recall_at_1 value: 
22.983999999999998 - type: recall_at_10 value: 46.876 - type: recall_at_100 value: 69.646 - type: recall_at_1000 value: 88.305 - type: recall_at_3 value: 34.471000000000004 - type: recall_at_5 value: 38.76 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.017000000000003 - type: map_at_10 value: 31.049 - type: map_at_100 value: 32.582 - type: map_at_1000 value: 32.817 - type: map_at_3 value: 28.303 - type: map_at_5 value: 29.854000000000003 - type: mrr_at_1 value: 27.866000000000003 - type: mrr_at_10 value: 35.56 - type: mrr_at_100 value: 36.453 - type: mrr_at_1000 value: 36.519 - type: mrr_at_3 value: 32.938 - type: mrr_at_5 value: 34.391 - type: ndcg_at_1 value: 27.866000000000003 - type: ndcg_at_10 value: 36.506 - type: ndcg_at_100 value: 42.344 - type: ndcg_at_1000 value: 45.213 - type: ndcg_at_3 value: 31.805 - type: ndcg_at_5 value: 33.933 - type: precision_at_1 value: 27.866000000000003 - type: precision_at_10 value: 7.016 - type: precision_at_100 value: 1.468 - type: precision_at_1000 value: 0.23900000000000002 - type: precision_at_3 value: 14.822 - type: precision_at_5 value: 10.791 - type: recall_at_1 value: 23.017000000000003 - type: recall_at_10 value: 47.053 - type: recall_at_100 value: 73.177 - type: recall_at_1000 value: 91.47800000000001 - type: recall_at_3 value: 33.675 - type: recall_at_5 value: 39.36 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.673 - type: map_at_10 value: 24.051000000000002 - type: map_at_100 value: 24.933 - type: map_at_1000 value: 25.06 - type: map_at_3 value: 21.446 - type: map_at_5 value: 23.064 - type: mrr_at_1 value: 18.115000000000002 - type: mrr_at_10 value: 25.927 - type: mrr_at_100 value: 26.718999999999998 - type: mrr_at_1000 value: 26.817999999999998 - type: mrr_at_3 value: 23.383000000000003 - type: mrr_at_5 value: 25.008999999999997 - type: ndcg_at_1 value: 18.115000000000002 - type: ndcg_at_10 value: 28.669 - type: ndcg_at_100 value: 33.282000000000004 - type: ndcg_at_1000 value: 36.481 - type: ndcg_at_3 value: 23.574 - type: ndcg_at_5 value: 26.340000000000003 - type: precision_at_1 value: 18.115000000000002 - type: precision_at_10 value: 4.769 - type: precision_at_100 value: 0.767 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 10.351 - type: precision_at_5 value: 7.8 - type: recall_at_1 value: 16.673 - type: recall_at_10 value: 41.063 - type: recall_at_100 value: 62.851 - type: recall_at_1000 value: 86.701 - type: recall_at_3 value: 27.532 - type: recall_at_5 value: 34.076 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 8.752 - type: map_at_10 value: 15.120000000000001 - type: map_at_100 value: 16.678 - type: map_at_1000 value: 16.854 - type: map_at_3 value: 12.603 - type: map_at_5 value: 13.918 - type: mrr_at_1 value: 19.283 - type: mrr_at_10 value: 29.145 - type: mrr_at_100 value: 30.281000000000002 - type: mrr_at_1000 value: 30.339 - type: mrr_at_3 value: 26.069 - type: mrr_at_5 value: 27.864 - type: ndcg_at_1 value: 19.283 - type: ndcg_at_10 value: 21.804000000000002 - type: ndcg_at_100 value: 28.576 - type: ndcg_at_1000 value: 32.063 - type: ndcg_at_3 value: 17.511 - type: ndcg_at_5 value: 19.112000000000002 - type: precision_at_1 value: 19.283 - type: 
precision_at_10 value: 6.873 - type: precision_at_100 value: 1.405 - type: precision_at_1000 value: 0.20500000000000002 - type: precision_at_3 value: 13.16 - type: precision_at_5 value: 10.189 - type: recall_at_1 value: 8.752 - type: recall_at_10 value: 27.004 - type: recall_at_100 value: 50.648 - type: recall_at_1000 value: 70.458 - type: recall_at_3 value: 16.461000000000002 - type: recall_at_5 value: 20.973 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 6.81 - type: map_at_10 value: 14.056 - type: map_at_100 value: 18.961 - type: map_at_1000 value: 20.169 - type: map_at_3 value: 10.496 - type: map_at_5 value: 11.952 - type: mrr_at_1 value: 53.5 - type: mrr_at_10 value: 63.479 - type: mrr_at_100 value: 63.971999999999994 - type: mrr_at_1000 value: 63.993 - type: mrr_at_3 value: 61.541999999999994 - type: mrr_at_5 value: 62.778999999999996 - type: ndcg_at_1 value: 42.25 - type: ndcg_at_10 value: 31.471 - type: ndcg_at_100 value: 35.115 - type: ndcg_at_1000 value: 42.408 - type: ndcg_at_3 value: 35.458 - type: ndcg_at_5 value: 32.973 - type: precision_at_1 value: 53.5 - type: precision_at_10 value: 24.85 - type: precision_at_100 value: 7.79 - type: precision_at_1000 value: 1.599 - type: precision_at_3 value: 38.667 - type: precision_at_5 value: 31.55 - type: recall_at_1 value: 6.81 - type: recall_at_10 value: 19.344 - type: recall_at_100 value: 40.837 - type: recall_at_1000 value: 64.661 - type: recall_at_3 value: 11.942 - type: recall_at_5 value: 14.646 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 44.64499999999999 - type: f1 value: 39.39106911352714 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 48.196 - type: map_at_10 value: 61.404 - type: map_at_100 value: 61.846000000000004 - type: map_at_1000 value: 61.866 - type: map_at_3 value: 58.975 - type: map_at_5 value: 60.525 - type: mrr_at_1 value: 52.025 - type: mrr_at_10 value: 65.43299999999999 - type: mrr_at_100 value: 65.80799999999999 - type: mrr_at_1000 value: 65.818 - type: mrr_at_3 value: 63.146 - type: mrr_at_5 value: 64.64 - type: ndcg_at_1 value: 52.025 - type: ndcg_at_10 value: 67.889 - type: ndcg_at_100 value: 69.864 - type: ndcg_at_1000 value: 70.337 - type: ndcg_at_3 value: 63.315 - type: ndcg_at_5 value: 65.91799999999999 - type: precision_at_1 value: 52.025 - type: precision_at_10 value: 9.182 - type: precision_at_100 value: 1.027 - type: precision_at_1000 value: 0.108 - type: precision_at_3 value: 25.968000000000004 - type: precision_at_5 value: 17.006 - type: recall_at_1 value: 48.196 - type: recall_at_10 value: 83.885 - type: recall_at_100 value: 92.671 - type: recall_at_1000 value: 96.018 - type: recall_at_3 value: 71.59 - type: recall_at_5 value: 77.946 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 15.193000000000001 - type: map_at_10 value: 25.168000000000003 - type: map_at_100 value: 27.017000000000003 - type: map_at_1000 value: 27.205000000000002 - type: map_at_3 value: 21.746 - type: map_at_5 value: 23.579 - type: mrr_at_1 value: 31.635999999999996 - type: mrr_at_10 value: 40.077 - type: mrr_at_100 value: 41.112 - type: mrr_at_1000 value: 41.160999999999994 - type: mrr_at_3 
value: 37.937 - type: mrr_at_5 value: 39.18 - type: ndcg_at_1 value: 31.635999999999996 - type: ndcg_at_10 value: 32.298 - type: ndcg_at_100 value: 39.546 - type: ndcg_at_1000 value: 42.88 - type: ndcg_at_3 value: 29.221999999999998 - type: ndcg_at_5 value: 30.069000000000003 - type: precision_at_1 value: 31.635999999999996 - type: precision_at_10 value: 9.367 - type: precision_at_100 value: 1.645 - type: precision_at_1000 value: 0.22399999999999998 - type: precision_at_3 value: 20.01 - type: precision_at_5 value: 14.753 - type: recall_at_1 value: 15.193000000000001 - type: recall_at_10 value: 38.214999999999996 - type: recall_at_100 value: 65.95 - type: recall_at_1000 value: 85.85300000000001 - type: recall_at_3 value: 26.357000000000003 - type: recall_at_5 value: 31.319999999999997 - task: type: Retrieval dataset: type: jinaai/ger_da_lir name: MTEB GerDaLIR config: default split: test revision: None metrics: - type: map_at_1 value: 10.363 - type: map_at_10 value: 16.222 - type: map_at_100 value: 17.28 - type: map_at_1000 value: 17.380000000000003 - type: map_at_3 value: 14.054 - type: map_at_5 value: 15.203 - type: mrr_at_1 value: 11.644 - type: mrr_at_10 value: 17.625 - type: mrr_at_100 value: 18.608 - type: mrr_at_1000 value: 18.695999999999998 - type: mrr_at_3 value: 15.481 - type: mrr_at_5 value: 16.659 - type: ndcg_at_1 value: 11.628 - type: ndcg_at_10 value: 20.028000000000002 - type: ndcg_at_100 value: 25.505 - type: ndcg_at_1000 value: 28.288000000000004 - type: ndcg_at_3 value: 15.603 - type: ndcg_at_5 value: 17.642 - type: precision_at_1 value: 11.628 - type: precision_at_10 value: 3.5589999999999997 - type: precision_at_100 value: 0.664 - type: precision_at_1000 value: 0.092 - type: precision_at_3 value: 7.109999999999999 - type: precision_at_5 value: 5.401 - type: recall_at_1 value: 10.363 - type: recall_at_10 value: 30.586000000000002 - type: recall_at_100 value: 56.43 - type: recall_at_1000 value: 78.142 - type: recall_at_3 value: 18.651 - type: recall_at_5 value: 23.493 - task: type: Retrieval dataset: type: deepset/germandpr name: MTEB GermanDPR config: default split: test revision: 5129d02422a66be600ac89cd3e8531b4f97d347d metrics: - type: map_at_1 value: 60.78 - type: map_at_10 value: 73.91499999999999 - type: map_at_100 value: 74.089 - type: map_at_1000 value: 74.09400000000001 - type: map_at_3 value: 71.87 - type: map_at_5 value: 73.37700000000001 - type: mrr_at_1 value: 60.78 - type: mrr_at_10 value: 73.91499999999999 - type: mrr_at_100 value: 74.089 - type: mrr_at_1000 value: 74.09400000000001 - type: mrr_at_3 value: 71.87 - type: mrr_at_5 value: 73.37700000000001 - type: ndcg_at_1 value: 60.78 - type: ndcg_at_10 value: 79.35600000000001 - type: ndcg_at_100 value: 80.077 - type: ndcg_at_1000 value: 80.203 - type: ndcg_at_3 value: 75.393 - type: ndcg_at_5 value: 78.077 - type: precision_at_1 value: 60.78 - type: precision_at_10 value: 9.59 - type: precision_at_100 value: 0.9900000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 28.52 - type: precision_at_5 value: 18.4 - type: recall_at_1 value: 60.78 - type: recall_at_10 value: 95.902 - type: recall_at_100 value: 99.024 - type: recall_at_1000 value: 100.0 - type: recall_at_3 value: 85.56099999999999 - type: recall_at_5 value: 92.0 - task: type: STS dataset: type: jinaai/german-STSbenchmark name: MTEB GermanSTSBenchmark config: default split: test revision: 49d9b423b996fea62b483f9ee6dfb5ec233515ca metrics: - type: cos_sim_pearson value: 88.49524420894356 - type: cos_sim_spearman value: 
88.32407839427714 - type: euclidean_pearson value: 87.25098779877104 - type: euclidean_spearman value: 88.22738098593608 - type: manhattan_pearson value: 87.23872691839607 - type: manhattan_spearman value: 88.2002968380165 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 31.81 - type: map_at_10 value: 46.238 - type: map_at_100 value: 47.141 - type: map_at_1000 value: 47.213 - type: map_at_3 value: 43.248999999999995 - type: map_at_5 value: 45.078 - type: mrr_at_1 value: 63.619 - type: mrr_at_10 value: 71.279 - type: mrr_at_100 value: 71.648 - type: mrr_at_1000 value: 71.665 - type: mrr_at_3 value: 69.76599999999999 - type: mrr_at_5 value: 70.743 - type: ndcg_at_1 value: 63.619 - type: ndcg_at_10 value: 55.38999999999999 - type: ndcg_at_100 value: 58.80800000000001 - type: ndcg_at_1000 value: 60.331999999999994 - type: ndcg_at_3 value: 50.727 - type: ndcg_at_5 value: 53.284 - type: precision_at_1 value: 63.619 - type: precision_at_10 value: 11.668000000000001 - type: precision_at_100 value: 1.434 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 32.001000000000005 - type: precision_at_5 value: 21.223 - type: recall_at_1 value: 31.81 - type: recall_at_10 value: 58.339 - type: recall_at_100 value: 71.708 - type: recall_at_1000 value: 81.85 - type: recall_at_3 value: 48.001 - type: recall_at_5 value: 53.059 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 68.60640000000001 - type: ap value: 62.84296904042086 - type: f1 value: 68.50643633327537 - task: type: Reranking dataset: type: jinaai/miracl name: MTEB MIRACL config: default split: test revision: 8741c3b61cd36ed9ca1b3d4203543a41793239e2 metrics: - type: map value: 64.29704335389768 - type: mrr value: 72.11962197159565 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 89.3844049247606 - type: f1 value: 89.2124328528015 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.36855452240067 - type: f1 value: 87.35458822097442 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 66.48654810761514 - type: f1 value: 50.07229882504409 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 63.832065370526905 - type: f1 value: 46.283579383385806 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.89038332212509 - type: f1 value: 61.86279849685129 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 69.11230665770006 - type: f1 value: 
67.44780095350535 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.25084061869536 - type: f1 value: 71.43965023016408 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 73.73907195696032 - type: f1 value: 73.69920814839061 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.32577306498249 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 28.759349326367783 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.401342674703425 - type: mrr value: 31.384379585660987 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 4.855 - type: map_at_10 value: 10.01 - type: map_at_100 value: 12.461 - type: map_at_1000 value: 13.776 - type: map_at_3 value: 7.252 - type: map_at_5 value: 8.679 - type: mrr_at_1 value: 41.176 - type: mrr_at_10 value: 49.323 - type: mrr_at_100 value: 49.954 - type: mrr_at_1000 value: 49.997 - type: mrr_at_3 value: 46.904 - type: mrr_at_5 value: 48.375 - type: ndcg_at_1 value: 39.318999999999996 - type: ndcg_at_10 value: 28.607 - type: ndcg_at_100 value: 26.554 - type: ndcg_at_1000 value: 35.731 - type: ndcg_at_3 value: 32.897999999999996 - type: ndcg_at_5 value: 31.53 - type: precision_at_1 value: 41.176 - type: precision_at_10 value: 20.867 - type: precision_at_100 value: 6.796 - type: precision_at_1000 value: 1.983 - type: precision_at_3 value: 30.547 - type: precision_at_5 value: 27.245 - type: recall_at_1 value: 4.855 - type: recall_at_10 value: 14.08 - type: recall_at_100 value: 28.188000000000002 - type: recall_at_1000 value: 60.07900000000001 - type: recall_at_3 value: 7.947 - type: recall_at_5 value: 10.786 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 26.906999999999996 - type: map_at_10 value: 41.147 - type: map_at_100 value: 42.269 - type: map_at_1000 value: 42.308 - type: map_at_3 value: 36.638999999999996 - type: map_at_5 value: 39.285 - type: mrr_at_1 value: 30.359 - type: mrr_at_10 value: 43.607 - type: mrr_at_100 value: 44.454 - type: mrr_at_1000 value: 44.481 - type: mrr_at_3 value: 39.644 - type: mrr_at_5 value: 42.061 - type: ndcg_at_1 value: 30.330000000000002 - type: ndcg_at_10 value: 48.899 - type: ndcg_at_100 value: 53.612 - type: ndcg_at_1000 value: 54.51200000000001 - type: ndcg_at_3 value: 40.262 - type: ndcg_at_5 value: 44.787 - type: precision_at_1 value: 30.330000000000002 - type: precision_at_10 value: 8.323 - type: precision_at_100 value: 1.0959999999999999 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 18.395 - type: precision_at_5 value: 13.627 - type: recall_at_1 value: 26.906999999999996 - type: recall_at_10 value: 70.215 - 
type: recall_at_100 value: 90.61200000000001 - type: recall_at_1000 value: 97.294 - type: recall_at_3 value: 47.784 - type: recall_at_5 value: 58.251 - task: type: PairClassification dataset: type: paws-x name: MTEB PawsX config: default split: test revision: 8a04d940a42cd40658986fdd8e3da561533a3646 metrics: - type: cos_sim_accuracy value: 60.5 - type: cos_sim_ap value: 57.606096528877494 - type: cos_sim_f1 value: 62.24240307369892 - type: cos_sim_precision value: 45.27439024390244 - type: cos_sim_recall value: 99.55307262569832 - type: dot_accuracy value: 57.699999999999996 - type: dot_ap value: 51.289351057160616 - type: dot_f1 value: 62.25953130465197 - type: dot_precision value: 45.31568228105906 - type: dot_recall value: 99.4413407821229 - type: euclidean_accuracy value: 60.45 - type: euclidean_ap value: 57.616461421424034 - type: euclidean_f1 value: 62.313697657913416 - type: euclidean_precision value: 45.657826313052524 - type: euclidean_recall value: 98.10055865921787 - type: manhattan_accuracy value: 60.3 - type: manhattan_ap value: 57.580565271667325 - type: manhattan_f1 value: 62.24240307369892 - type: manhattan_precision value: 45.27439024390244 - type: manhattan_recall value: 99.55307262569832 - type: max_accuracy value: 60.5 - type: max_ap value: 57.616461421424034 - type: max_f1 value: 62.313697657913416 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.21300000000001 - type: map_at_10 value: 84.136 - type: map_at_100 value: 84.796 - type: map_at_1000 value: 84.812 - type: map_at_3 value: 81.182 - type: map_at_5 value: 83.027 - type: mrr_at_1 value: 80.91000000000001 - type: mrr_at_10 value: 87.155 - type: mrr_at_100 value: 87.27000000000001 - type: mrr_at_1000 value: 87.271 - type: mrr_at_3 value: 86.158 - type: mrr_at_5 value: 86.828 - type: ndcg_at_1 value: 80.88 - type: ndcg_at_10 value: 87.926 - type: ndcg_at_100 value: 89.223 - type: ndcg_at_1000 value: 89.321 - type: ndcg_at_3 value: 85.036 - type: ndcg_at_5 value: 86.614 - type: precision_at_1 value: 80.88 - type: precision_at_10 value: 13.350000000000001 - type: precision_at_100 value: 1.5310000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.173 - type: precision_at_5 value: 24.476 - type: recall_at_1 value: 70.21300000000001 - type: recall_at_10 value: 95.12 - type: recall_at_100 value: 99.535 - type: recall_at_1000 value: 99.977 - type: recall_at_3 value: 86.833 - type: recall_at_5 value: 91.26100000000001 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 47.754688783184875 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 54.875736374329364 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 3.773 - type: map_at_10 value: 9.447 - type: map_at_100 value: 11.1 - type: map_at_1000 value: 11.37 - type: map_at_3 value: 6.787 - type: map_at_5 value: 8.077 - type: mrr_at_1 value: 18.5 - type: mrr_at_10 value: 28.227000000000004 - type: mrr_at_100 value: 29.445 - type: mrr_at_1000 value: 29.515 - type: mrr_at_3 value: 25.2 - type: mrr_at_5 value: 27.055 - type: ndcg_at_1 value: 18.5 - type: 
ndcg_at_10 value: 16.29 - type: ndcg_at_100 value: 23.250999999999998 - type: ndcg_at_1000 value: 28.445999999999998 - type: ndcg_at_3 value: 15.376000000000001 - type: ndcg_at_5 value: 13.528 - type: precision_at_1 value: 18.5 - type: precision_at_10 value: 8.51 - type: precision_at_100 value: 1.855 - type: precision_at_1000 value: 0.311 - type: precision_at_3 value: 14.533 - type: precision_at_5 value: 12.0 - type: recall_at_1 value: 3.773 - type: recall_at_10 value: 17.282 - type: recall_at_100 value: 37.645 - type: recall_at_1000 value: 63.138000000000005 - type: recall_at_3 value: 8.853 - type: recall_at_5 value: 12.168 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 85.32789517976525 - type: cos_sim_spearman value: 80.32750384145629 - type: euclidean_pearson value: 81.5025131452508 - type: euclidean_spearman value: 80.24797115147175 - type: manhattan_pearson value: 81.51634463412002 - type: manhattan_spearman value: 80.24614721495055 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 88.47050448992432 - type: cos_sim_spearman value: 80.58919997743621 - type: euclidean_pearson value: 85.83258918113664 - type: euclidean_spearman value: 80.97441389240902 - type: manhattan_pearson value: 85.7798262013878 - type: manhattan_spearman value: 80.97208703064196 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 85.95341439711532 - type: cos_sim_spearman value: 86.59127484634989 - type: euclidean_pearson value: 85.57850603454227 - type: euclidean_spearman value: 86.47130477363419 - type: manhattan_pearson value: 85.59387925447652 - type: manhattan_spearman value: 86.50665427391583 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 85.39810909161844 - type: cos_sim_spearman value: 82.98595295546008 - type: euclidean_pearson value: 84.04681129969951 - type: euclidean_spearman value: 82.98197460689866 - type: manhattan_pearson value: 83.9918798171185 - type: manhattan_spearman value: 82.91148131768082 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.02072712147692 - type: cos_sim_spearman value: 88.78821332623012 - type: euclidean_pearson value: 88.12132045572747 - type: euclidean_spearman value: 88.74273451067364 - type: manhattan_pearson value: 88.05431550059166 - type: manhattan_spearman value: 88.67610233020723 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 82.96134704624787 - type: cos_sim_spearman value: 84.44062976314666 - type: euclidean_pearson value: 84.03642536310323 - type: euclidean_spearman value: 84.4535014579785 - type: manhattan_pearson value: 83.92874228901483 - type: manhattan_spearman value: 84.33634314951631 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - 
type: cos_sim_pearson value: 87.3154168064887 - type: cos_sim_spearman value: 86.72393652571682 - type: euclidean_pearson value: 86.04193246174164 - type: euclidean_spearman value: 86.30482896608093 - type: manhattan_pearson value: 85.95524084651859 - type: manhattan_spearman value: 86.06031431994282 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 89.91079682750804 - type: cos_sim_spearman value: 89.30961836617064 - type: euclidean_pearson value: 88.86249564158628 - type: euclidean_spearman value: 89.04772899592396 - type: manhattan_pearson value: 88.85579791315043 - type: manhattan_spearman value: 88.94190462541333 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.00558145551088 - type: cos_sim_spearman value: 67.96601170393878 - type: euclidean_pearson value: 67.87627043214336 - type: euclidean_spearman value: 66.76402572303859 - type: manhattan_pearson value: 67.88306560555452 - type: manhattan_spearman value: 66.6273862035506 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 50.83759332748726 - type: cos_sim_spearman value: 59.066344562858006 - type: euclidean_pearson value: 50.08955848154131 - type: euclidean_spearman value: 58.36517305855221 - type: manhattan_pearson value: 50.05257267223111 - type: manhattan_spearman value: 58.37570252804986 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 59.22749007956492 - type: cos_sim_spearman value: 55.97282077657827 - type: euclidean_pearson value: 62.10661533695752 - type: euclidean_spearman value: 53.62780854854067 - type: manhattan_pearson value: 62.37138085709719 - type: manhattan_spearman value: 54.17556356828155 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 87.91145397065878 - type: cos_sim_spearman value: 88.13960018389005 - type: euclidean_pearson value: 87.67618876224006 - type: euclidean_spearman value: 87.99119480810556 - type: manhattan_pearson value: 87.67920297334753 - type: manhattan_spearman value: 87.99113250064492 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 78.09133563707582 - type: mrr value: 93.2415288052543 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 47.760999999999996 - type: map_at_10 value: 56.424 - type: map_at_100 value: 57.24399999999999 - type: map_at_1000 value: 57.278 - type: map_at_3 value: 53.68000000000001 - type: map_at_5 value: 55.442 - type: mrr_at_1 value: 50.666999999999994 - type: mrr_at_10 value: 58.012 - type: mrr_at_100 value: 58.736 - type: mrr_at_1000 value: 58.769000000000005 - type: mrr_at_3 value: 56.056 - type: mrr_at_5 value: 57.321999999999996 - type: ndcg_at_1 value: 50.666999999999994 - type: ndcg_at_10 
value: 60.67700000000001 - type: ndcg_at_100 value: 64.513 - type: ndcg_at_1000 value: 65.62400000000001 - type: ndcg_at_3 value: 56.186 - type: ndcg_at_5 value: 58.692 - type: precision_at_1 value: 50.666999999999994 - type: precision_at_10 value: 8.200000000000001 - type: precision_at_100 value: 1.023 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 21.889 - type: precision_at_5 value: 14.866999999999999 - type: recall_at_1 value: 47.760999999999996 - type: recall_at_10 value: 72.006 - type: recall_at_100 value: 89.767 - type: recall_at_1000 value: 98.833 - type: recall_at_3 value: 60.211000000000006 - type: recall_at_5 value: 66.3 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.79009900990098 - type: cos_sim_ap value: 94.86690691995835 - type: cos_sim_f1 value: 89.37875751503007 - type: cos_sim_precision value: 89.5582329317269 - type: cos_sim_recall value: 89.2 - type: dot_accuracy value: 99.76336633663367 - type: dot_ap value: 94.26453740761586 - type: dot_f1 value: 88.00783162016641 - type: dot_precision value: 86.19367209971237 - type: dot_recall value: 89.9 - type: euclidean_accuracy value: 99.7940594059406 - type: euclidean_ap value: 94.85459757524379 - type: euclidean_f1 value: 89.62779156327544 - type: euclidean_precision value: 88.96551724137932 - type: euclidean_recall value: 90.3 - type: manhattan_accuracy value: 99.79009900990098 - type: manhattan_ap value: 94.76971336654465 - type: manhattan_f1 value: 89.35323383084577 - type: manhattan_precision value: 88.91089108910892 - type: manhattan_recall value: 89.8 - type: max_accuracy value: 99.7940594059406 - type: max_ap value: 94.86690691995835 - type: max_f1 value: 89.62779156327544 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 55.38197670064987 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 33.08330158937971 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.50367079063226 - type: mrr value: 50.30444943128768 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.37739520909561 - type: cos_sim_spearman value: 31.548500943973913 - type: dot_pearson value: 29.983610104303 - type: dot_spearman value: 29.90185869098618 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.198 - type: map_at_10 value: 1.5810000000000002 - type: map_at_100 value: 9.064 - type: map_at_1000 value: 22.161 - type: map_at_3 value: 0.536 - type: map_at_5 value: 0.8370000000000001 - type: mrr_at_1 value: 80.0 - type: mrr_at_10 value: 86.75 - type: mrr_at_100 value: 86.799 - type: mrr_at_1000 value: 86.799 - type: mrr_at_3 value: 85.0 - type: mrr_at_5 value: 86.5 - 
type: ndcg_at_1 value: 73.0 - type: ndcg_at_10 value: 65.122 - type: ndcg_at_100 value: 51.853 - type: ndcg_at_1000 value: 47.275 - type: ndcg_at_3 value: 66.274 - type: ndcg_at_5 value: 64.826 - type: precision_at_1 value: 80.0 - type: precision_at_10 value: 70.19999999999999 - type: precision_at_100 value: 53.480000000000004 - type: precision_at_1000 value: 20.946 - type: precision_at_3 value: 71.333 - type: precision_at_5 value: 70.0 - type: recall_at_1 value: 0.198 - type: recall_at_10 value: 1.884 - type: recall_at_100 value: 12.57 - type: recall_at_1000 value: 44.208999999999996 - type: recall_at_3 value: 0.5890000000000001 - type: recall_at_5 value: 0.95 - task: type: Clustering dataset: type: slvnwhrl/tenkgnad-clustering-p2p name: MTEB TenKGnadClusteringP2P config: default split: test revision: 5c59e41555244b7e45c9a6be2d720ab4bafae558 metrics: - type: v_measure value: 42.84199261133083 - task: type: Clustering dataset: type: slvnwhrl/tenkgnad-clustering-s2s name: MTEB TenKGnadClusteringS2S config: default split: test revision: 6cddbe003f12b9b140aec477b583ac4191f01786 metrics: - type: v_measure value: 23.689557114798838 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 1.941 - type: map_at_10 value: 8.222 - type: map_at_100 value: 14.277999999999999 - type: map_at_1000 value: 15.790000000000001 - type: map_at_3 value: 4.4670000000000005 - type: map_at_5 value: 5.762 - type: mrr_at_1 value: 24.490000000000002 - type: mrr_at_10 value: 38.784 - type: mrr_at_100 value: 39.724 - type: mrr_at_1000 value: 39.724 - type: mrr_at_3 value: 33.333 - type: mrr_at_5 value: 37.415 - type: ndcg_at_1 value: 22.448999999999998 - type: ndcg_at_10 value: 21.026 - type: ndcg_at_100 value: 33.721000000000004 - type: ndcg_at_1000 value: 45.045 - type: ndcg_at_3 value: 20.053 - type: ndcg_at_5 value: 20.09 - type: precision_at_1 value: 24.490000000000002 - type: precision_at_10 value: 19.796 - type: precision_at_100 value: 7.469 - type: precision_at_1000 value: 1.48 - type: precision_at_3 value: 21.769 - type: precision_at_5 value: 21.224 - type: recall_at_1 value: 1.941 - type: recall_at_10 value: 14.915999999999999 - type: recall_at_100 value: 46.155 - type: recall_at_1000 value: 80.664 - type: recall_at_3 value: 5.629 - type: recall_at_5 value: 8.437 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 69.64800000000001 - type: ap value: 12.914826731261094 - type: f1 value: 53.05213503422915 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.427277872099594 - type: f1 value: 60.78292007556828 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 40.48134168406559 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 84.79465935506944 - type: cos_sim_ap value: 70.24589055290592 - type: 
cos_sim_f1 value: 65.0994575045208 - type: cos_sim_precision value: 63.76518218623482 - type: cos_sim_recall value: 66.49076517150397 - type: dot_accuracy value: 84.63968528342374 - type: dot_ap value: 69.84683095084355 - type: dot_f1 value: 64.50606169727523 - type: dot_precision value: 59.1719885487778 - type: dot_recall value: 70.89709762532982 - type: euclidean_accuracy value: 84.76485664898374 - type: euclidean_ap value: 70.20556438685551 - type: euclidean_f1 value: 65.06796614516543 - type: euclidean_precision value: 63.29840319361277 - type: euclidean_recall value: 66.93931398416886 - type: manhattan_accuracy value: 84.72313286046374 - type: manhattan_ap value: 70.17151475534308 - type: manhattan_f1 value: 65.31379180759113 - type: manhattan_precision value: 62.17505366086334 - type: manhattan_recall value: 68.7862796833773 - type: max_accuracy value: 84.79465935506944 - type: max_ap value: 70.24589055290592 - type: max_f1 value: 65.31379180759113 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.95874568246207 - type: cos_sim_ap value: 85.82517548264127 - type: cos_sim_f1 value: 78.22288041466125 - type: cos_sim_precision value: 75.33875338753387 - type: cos_sim_recall value: 81.33661841700031 - type: dot_accuracy value: 88.836496293709 - type: dot_ap value: 85.53430720252186 - type: dot_f1 value: 78.10616085869725 - type: dot_precision value: 74.73269555430501 - type: dot_recall value: 81.79858330766862 - type: euclidean_accuracy value: 88.92769821865176 - type: euclidean_ap value: 85.65904346964223 - type: euclidean_f1 value: 77.98774074208407 - type: euclidean_precision value: 73.72282795035315 - type: euclidean_recall value: 82.77640899291654 - type: manhattan_accuracy value: 88.86366282454303 - type: manhattan_ap value: 85.61599642231819 - type: manhattan_f1 value: 78.01480509061737 - type: manhattan_precision value: 74.10460685833044 - type: manhattan_recall value: 82.36064059131506 - type: max_accuracy value: 88.95874568246207 - type: max_ap value: 85.82517548264127 - type: max_f1 value: 78.22288041466125 - task: type: Retrieval dataset: type: None name: MTEB WikiCLIR config: default split: test revision: None metrics: - type: map_at_1 value: 3.9539999999999997 - type: map_at_10 value: 7.407 - type: map_at_100 value: 8.677999999999999 - type: map_at_1000 value: 9.077 - type: map_at_3 value: 5.987 - type: map_at_5 value: 6.6979999999999995 - type: mrr_at_1 value: 35.65 - type: mrr_at_10 value: 45.097 - type: mrr_at_100 value: 45.83 - type: mrr_at_1000 value: 45.871 - type: mrr_at_3 value: 42.63 - type: mrr_at_5 value: 44.104 - type: ndcg_at_1 value: 29.215000000000003 - type: ndcg_at_10 value: 22.694 - type: ndcg_at_100 value: 22.242 - type: ndcg_at_1000 value: 27.069 - type: ndcg_at_3 value: 27.641 - type: ndcg_at_5 value: 25.503999999999998 - type: precision_at_1 value: 35.65 - type: precision_at_10 value: 12.795000000000002 - type: precision_at_100 value: 3.354 - type: precision_at_1000 value: 0.743 - type: precision_at_3 value: 23.403 - type: precision_at_5 value: 18.474 - type: recall_at_1 value: 3.9539999999999997 - type: recall_at_10 value: 11.301 - type: recall_at_100 value: 22.919999999999998 - type: recall_at_1000 value: 40.146 - type: recall_at_3 value: 7.146 - type: recall_at_5 value: 8.844000000000001 - task: type: Retrieval dataset: type: jinaai/xmarket_de name: MTEB XMarket 
config: default split: test revision: 2336818db4c06570fcdf263e1bcb9993b786f67a metrics: - type: map_at_1 value: 4.872 - type: map_at_10 value: 10.658 - type: map_at_100 value: 13.422999999999998 - type: map_at_1000 value: 14.245 - type: map_at_3 value: 7.857 - type: map_at_5 value: 9.142999999999999 - type: mrr_at_1 value: 16.744999999999997 - type: mrr_at_10 value: 24.416 - type: mrr_at_100 value: 25.432 - type: mrr_at_1000 value: 25.502999999999997 - type: mrr_at_3 value: 22.096 - type: mrr_at_5 value: 23.421 - type: ndcg_at_1 value: 16.695999999999998 - type: ndcg_at_10 value: 18.66 - type: ndcg_at_100 value: 24.314 - type: ndcg_at_1000 value: 29.846 - type: ndcg_at_3 value: 17.041999999999998 - type: ndcg_at_5 value: 17.585 - type: precision_at_1 value: 16.695999999999998 - type: precision_at_10 value: 10.374 - type: precision_at_100 value: 3.988 - type: precision_at_1000 value: 1.1860000000000002 - type: precision_at_3 value: 14.21 - type: precision_at_5 value: 12.623000000000001 - type: recall_at_1 value: 4.872 - type: recall_at_10 value: 18.624 - type: recall_at_100 value: 40.988 - type: recall_at_1000 value: 65.33 - type: recall_at_3 value: 10.162 - type: recall_at_5 value: 13.517999999999999 ---

\"Jina

The text embedding set trained by Jina AI.

## Quick Start

The easiest way to start using `jina-embeddings-v2-base-de` is Jina AI's Embedding API.

## Intended Usage & Model Info

`jina-embeddings-v2-base-de` is a German/English bilingual text **embedding model** supporting **8192 sequence length**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. We have designed it for high performance in monolingual & cross-lingual applications and trained it specifically to support mixed German-English input without bias. Additionally, we provide the following embedding models:

- `jina-embeddings-v2-small-en`: 33 million parameters.
- `jina-embeddings-v2-base-en`: 137 million parameters.
- `jina-embeddings-v2-base-zh`: 161 million parameters, Chinese-English bilingual embeddings.
- `jina-embeddings-v2-base-de`: 161 million parameters, German-English bilingual embeddings **(you are here)**.
- `jina-embeddings-v2-base-es`: Spanish-English bilingual embeddings (soon).
- `jina-embeddings-v2-base-code`: 161 million parameters, code embeddings.

## Data & Parameters

The data and training details are described in this technical report.

## Usage

**Please apply mean pooling when integrating the model.**

### Why mean pooling?

Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function to deal with this. However, if you would like to do it without using the default `encode` function:
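A minimal sketch of manual mean pooling with `transformers`, assuming the checkpoint id `jinaai/jina-embeddings-v2-base-de` and that the repository's remote code is trusted; treat the exact ids and flags as assumptions rather than a definitive recipe:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, masking out padding positions.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ["How is the weather today?", "Wie ist das Wetter heute?"]

# Assumed checkpoint id; trust_remote_code loads the JinaBERT implementation.
tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-de")
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-de", trust_remote_code=True)

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit vectors, ready for cosine similarity
```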

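Complementing the manual route above, a sketch of the default path described in the next paragraph: the model's own `encode` helper (shipped via remote code, so its `max_length` keyword is an assumption based on Jina's published usage) and the sentence-transformers integration:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-de", trust_remote_code=True)

# Cap the sequence length if you only need shorter inputs (e.g. 2k tokens).
embeddings = model.encode(
    ["How is the weather today?", "Wie ist das Wetter heute?"],
    max_length=2048,
)
```

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-de", trust_remote_code=True)
embeddings = model.encode(["How is the weather today?", "Wie ist das Wetter heute?"])
```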
You can use Jina Embedding models directly from the transformers package, as sketched above. If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function. As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face as well).

## Alternatives to Using Transformers Package

1. _Managed SaaS_: Get started with a free key on Jina AI's Embedding API.
2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on AWS SageMaker.

## Benchmark Results

We evaluated our bilingual model on all German and English evaluation tasks available on the MTEB benchmark. In addition, we evaluated the models against a couple of other German, English, and multilingual models on additional German evaluation tasks:

## Use Jina Embeddings for RAG

According to the latest blog post from LlamaIndex,

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

## Contact

Join our Discord community and chat with other community members about ideas.

## Citation

If you find Jina Embeddings useful in your research, please cite the following paper:", + "model_explanation_gemini": "Generates embeddings for German and English text to support tasks like sentence similarity, classification, retrieval, clustering, and reranking." +} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-embeddings-v2-base-en.json b/data/model_data_json/jinaai_jina-embeddings-v2-base-en.json new file mode 100644 index 0000000000000000000000000000000000000000..1046b462be1bed0ef01478053ba6b40c9d445d67 --- /dev/null +++ b/data/model_data_json/jinaai_jina-embeddings-v2-base-en.json @@ -0,0 +1,27 @@ +{ + "model_id": "jinaai/jina-embeddings-v2-base-en", + "downloads": 320364, + "tags": [ + "sentence-transformers", + "pytorch", + "coreml", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "custom_code", + "en", + "dataset:allenai/c4", + "arxiv:2108.12409", + "arxiv:2310.19923", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "region:us" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb datasets: - allenai/c4 language: en inference: false license: apache-2.0 model-index: - name: jina-embedding-b-en-v2 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.73134328358209 - type: ap value: 37.765427081831035 - type: f1 value: 68.79367444339518 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 88.544275 - type: ap value: 84.61328675662887 - type: f1 value: 88.51879035862375 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 45.263999999999996 - type: f1 value: 43.778759656699435 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1
value: 21.693 - type: map_at_10 value: 35.487 - type: map_at_100 value: 36.862 - type: map_at_1000 value: 36.872 - type: map_at_3 value: 30.049999999999997 - type: map_at_5 value: 32.966 - type: mrr_at_1 value: 21.977 - type: mrr_at_10 value: 35.565999999999995 - type: mrr_at_100 value: 36.948 - type: mrr_at_1000 value: 36.958 - type: mrr_at_3 value: 30.121 - type: mrr_at_5 value: 33.051 - type: ndcg_at_1 value: 21.693 - type: ndcg_at_10 value: 44.181 - type: ndcg_at_100 value: 49.982 - type: ndcg_at_1000 value: 50.233000000000004 - type: ndcg_at_3 value: 32.830999999999996 - type: ndcg_at_5 value: 38.080000000000005 - type: precision_at_1 value: 21.693 - type: precision_at_10 value: 7.248 - type: precision_at_100 value: 0.9769999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 13.632 - type: precision_at_5 value: 10.725 - type: recall_at_1 value: 21.693 - type: recall_at_10 value: 72.475 - type: recall_at_100 value: 97.653 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 40.896 - type: recall_at_5 value: 53.627 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 45.39242428696777 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 36.675626784714 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.247725694904034 - type: mrr value: 74.91359978894604 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 82.68003802970496 - type: cos_sim_spearman value: 81.23438110096286 - type: euclidean_pearson value: 81.87462986142582 - type: euclidean_spearman value: 81.23438110096286 - type: manhattan_pearson value: 81.61162566600755 - type: manhattan_spearman value: 81.11329400456184 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 84.01298701298701 - type: f1 value: 83.31690714969382 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 37.050108150972086 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 30.15731442819715 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.391999999999996 - type: map_at_10 value: 42.597 - type: map_at_100 value: 44.07 - type: map_at_1000 value: 44.198 - type: map_at_3 value: 38.957 - type: map_at_5 value: 40.961 - type: mrr_at_1 value: 37.196 - type: mrr_at_10 value: 48.152 - type: mrr_at_100 value: 48.928 - type: mrr_at_1000 value: 48.964999999999996 - type: mrr_at_3 value: 45.446 - type: mrr_at_5 value: 
47.205999999999996 - type: ndcg_at_1 value: 37.196 - type: ndcg_at_10 value: 49.089 - type: ndcg_at_100 value: 54.471000000000004 - type: ndcg_at_1000 value: 56.385 - type: ndcg_at_3 value: 43.699 - type: ndcg_at_5 value: 46.22 - type: precision_at_1 value: 37.196 - type: precision_at_10 value: 9.313 - type: precision_at_100 value: 1.478 - type: precision_at_1000 value: 0.198 - type: precision_at_3 value: 20.839 - type: precision_at_5 value: 14.936 - type: recall_at_1 value: 31.391999999999996 - type: recall_at_10 value: 61.876 - type: recall_at_100 value: 84.214 - type: recall_at_1000 value: 95.985 - type: recall_at_3 value: 46.6 - type: recall_at_5 value: 53.588 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.083 - type: map_at_10 value: 38.812999999999995 - type: map_at_100 value: 40.053 - type: map_at_1000 value: 40.188 - type: map_at_3 value: 36.111 - type: map_at_5 value: 37.519000000000005 - type: mrr_at_1 value: 36.497 - type: mrr_at_10 value: 44.85 - type: mrr_at_100 value: 45.546 - type: mrr_at_1000 value: 45.593 - type: mrr_at_3 value: 42.686 - type: mrr_at_5 value: 43.909 - type: ndcg_at_1 value: 36.497 - type: ndcg_at_10 value: 44.443 - type: ndcg_at_100 value: 48.979 - type: ndcg_at_1000 value: 51.154999999999994 - type: ndcg_at_3 value: 40.660000000000004 - type: ndcg_at_5 value: 42.193000000000005 - type: precision_at_1 value: 36.497 - type: precision_at_10 value: 8.433 - type: precision_at_100 value: 1.369 - type: precision_at_1000 value: 0.185 - type: precision_at_3 value: 19.894000000000002 - type: precision_at_5 value: 13.873 - type: recall_at_1 value: 29.083 - type: recall_at_10 value: 54.313 - type: recall_at_100 value: 73.792 - type: recall_at_1000 value: 87.629 - type: recall_at_3 value: 42.257 - type: recall_at_5 value: 47.066 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.556000000000004 - type: map_at_10 value: 50.698 - type: map_at_100 value: 51.705 - type: map_at_1000 value: 51.768 - type: map_at_3 value: 47.848 - type: map_at_5 value: 49.358000000000004 - type: mrr_at_1 value: 43.95 - type: mrr_at_10 value: 54.191 - type: mrr_at_100 value: 54.852999999999994 - type: mrr_at_1000 value: 54.885 - type: mrr_at_3 value: 51.954 - type: mrr_at_5 value: 53.13 - type: ndcg_at_1 value: 43.95 - type: ndcg_at_10 value: 56.516 - type: ndcg_at_100 value: 60.477000000000004 - type: ndcg_at_1000 value: 61.746 - type: ndcg_at_3 value: 51.601 - type: ndcg_at_5 value: 53.795 - type: precision_at_1 value: 43.95 - type: precision_at_10 value: 9.009 - type: precision_at_100 value: 1.189 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 22.989 - type: precision_at_5 value: 15.473 - type: recall_at_1 value: 38.556000000000004 - type: recall_at_10 value: 70.159 - type: recall_at_100 value: 87.132 - type: recall_at_1000 value: 96.16 - type: recall_at_3 value: 56.906 - type: recall_at_5 value: 62.332 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.238 - type: map_at_10 value: 32.5 - type: map_at_100 value: 33.637 - type: map_at_1000 value: 33.719 - type: map_at_3 value: 30.026999999999997 - type: map_at_5 value: 31.555 - type: mrr_at_1 value: 26.328000000000003 - type: mrr_at_10 value: 34.44 - type: 
mrr_at_100 value: 35.455999999999996 - type: mrr_at_1000 value: 35.521 - type: mrr_at_3 value: 32.034 - type: mrr_at_5 value: 33.565 - type: ndcg_at_1 value: 26.328000000000003 - type: ndcg_at_10 value: 37.202 - type: ndcg_at_100 value: 42.728 - type: ndcg_at_1000 value: 44.792 - type: ndcg_at_3 value: 32.368 - type: ndcg_at_5 value: 35.008 - type: precision_at_1 value: 26.328000000000003 - type: precision_at_10 value: 5.7059999999999995 - type: precision_at_100 value: 0.8880000000000001 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 13.672 - type: precision_at_5 value: 9.74 - type: recall_at_1 value: 24.238 - type: recall_at_10 value: 49.829 - type: recall_at_100 value: 75.21 - type: recall_at_1000 value: 90.521 - type: recall_at_3 value: 36.867 - type: recall_at_5 value: 43.241 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 15.378 - type: map_at_10 value: 22.817999999999998 - type: map_at_100 value: 23.977999999999998 - type: map_at_1000 value: 24.108 - type: map_at_3 value: 20.719 - type: map_at_5 value: 21.889 - type: mrr_at_1 value: 19.03 - type: mrr_at_10 value: 27.022000000000002 - type: mrr_at_100 value: 28.011999999999997 - type: mrr_at_1000 value: 28.096 - type: mrr_at_3 value: 24.855 - type: mrr_at_5 value: 26.029999999999998 - type: ndcg_at_1 value: 19.03 - type: ndcg_at_10 value: 27.526 - type: ndcg_at_100 value: 33.040000000000006 - type: ndcg_at_1000 value: 36.187000000000005 - type: ndcg_at_3 value: 23.497 - type: ndcg_at_5 value: 25.334 - type: precision_at_1 value: 19.03 - type: precision_at_10 value: 4.963 - type: precision_at_100 value: 0.893 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 11.360000000000001 - type: precision_at_5 value: 8.134 - type: recall_at_1 value: 15.378 - type: recall_at_10 value: 38.061 - type: recall_at_100 value: 61.754 - type: recall_at_1000 value: 84.259 - type: recall_at_3 value: 26.788 - type: recall_at_5 value: 31.326999999999998 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.511999999999997 - type: map_at_10 value: 37.429 - type: map_at_100 value: 38.818000000000005 - type: map_at_1000 value: 38.924 - type: map_at_3 value: 34.625 - type: map_at_5 value: 36.064 - type: mrr_at_1 value: 33.300999999999995 - type: mrr_at_10 value: 43.036 - type: mrr_at_100 value: 43.894 - type: mrr_at_1000 value: 43.936 - type: mrr_at_3 value: 40.825 - type: mrr_at_5 value: 42.028 - type: ndcg_at_1 value: 33.300999999999995 - type: ndcg_at_10 value: 43.229 - type: ndcg_at_100 value: 48.992000000000004 - type: ndcg_at_1000 value: 51.02100000000001 - type: ndcg_at_3 value: 38.794000000000004 - type: ndcg_at_5 value: 40.65 - type: precision_at_1 value: 33.300999999999995 - type: precision_at_10 value: 7.777000000000001 - type: precision_at_100 value: 1.269 - type: precision_at_1000 value: 0.163 - type: precision_at_3 value: 18.351 - type: precision_at_5 value: 12.762 - type: recall_at_1 value: 27.511999999999997 - type: recall_at_10 value: 54.788000000000004 - type: recall_at_100 value: 79.105 - type: recall_at_1000 value: 92.49199999999999 - type: recall_at_3 value: 41.924 - type: recall_at_5 value: 47.026 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: 
- type: map_at_1 value: 24.117 - type: map_at_10 value: 33.32 - type: map_at_100 value: 34.677 - type: map_at_1000 value: 34.78 - type: map_at_3 value: 30.233999999999998 - type: map_at_5 value: 31.668000000000003 - type: mrr_at_1 value: 29.566 - type: mrr_at_10 value: 38.244 - type: mrr_at_100 value: 39.245000000000005 - type: mrr_at_1000 value: 39.296 - type: mrr_at_3 value: 35.864000000000004 - type: mrr_at_5 value: 36.919999999999995 - type: ndcg_at_1 value: 29.566 - type: ndcg_at_10 value: 39.127 - type: ndcg_at_100 value: 44.989000000000004 - type: ndcg_at_1000 value: 47.189 - type: ndcg_at_3 value: 34.039 - type: ndcg_at_5 value: 35.744 - type: precision_at_1 value: 29.566 - type: precision_at_10 value: 7.385999999999999 - type: precision_at_100 value: 1.204 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 16.286 - type: precision_at_5 value: 11.484 - type: recall_at_1 value: 24.117 - type: recall_at_10 value: 51.559999999999995 - type: recall_at_100 value: 77.104 - type: recall_at_1000 value: 91.79899999999999 - type: recall_at_3 value: 36.82 - type: recall_at_5 value: 41.453 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.17625 - type: map_at_10 value: 34.063916666666664 - type: map_at_100 value: 35.255500000000005 - type: map_at_1000 value: 35.37275 - type: map_at_3 value: 31.351666666666667 - type: map_at_5 value: 32.80608333333333 - type: mrr_at_1 value: 29.59783333333333 - type: mrr_at_10 value: 38.0925 - type: mrr_at_100 value: 38.957249999999995 - type: mrr_at_1000 value: 39.01608333333333 - type: mrr_at_3 value: 35.77625 - type: mrr_at_5 value: 37.04991666666667 - type: ndcg_at_1 value: 29.59783333333333 - type: ndcg_at_10 value: 39.343666666666664 - type: ndcg_at_100 value: 44.488249999999994 - type: ndcg_at_1000 value: 46.83358333333334 - type: ndcg_at_3 value: 34.69708333333333 - type: ndcg_at_5 value: 36.75075 - type: precision_at_1 value: 29.59783333333333 - type: precision_at_10 value: 6.884083333333332 - type: precision_at_100 value: 1.114 - type: precision_at_1000 value: 0.15108333333333332 - type: precision_at_3 value: 15.965250000000003 - type: precision_at_5 value: 11.246500000000001 - type: recall_at_1 value: 25.17625 - type: recall_at_10 value: 51.015999999999984 - type: recall_at_100 value: 73.60174999999998 - type: recall_at_1000 value: 89.849 - type: recall_at_3 value: 37.88399999999999 - type: recall_at_5 value: 43.24541666666666 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.537 - type: map_at_10 value: 31.081999999999997 - type: map_at_100 value: 32.042 - type: map_at_1000 value: 32.141 - type: map_at_3 value: 29.137 - type: map_at_5 value: 30.079 - type: mrr_at_1 value: 27.454 - type: mrr_at_10 value: 33.694 - type: mrr_at_100 value: 34.579 - type: mrr_at_1000 value: 34.649 - type: mrr_at_3 value: 32.004 - type: mrr_at_5 value: 32.794000000000004 - type: ndcg_at_1 value: 27.454 - type: ndcg_at_10 value: 34.915 - type: ndcg_at_100 value: 39.641 - type: ndcg_at_1000 value: 42.105 - type: ndcg_at_3 value: 31.276 - type: ndcg_at_5 value: 32.65 - type: precision_at_1 value: 27.454 - type: precision_at_10 value: 5.337 - type: precision_at_100 value: 0.8250000000000001 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 13.241 - type: precision_at_5 value: 
8.895999999999999 - type: recall_at_1 value: 24.537 - type: recall_at_10 value: 44.324999999999996 - type: recall_at_100 value: 65.949 - type: recall_at_1000 value: 84.017 - type: recall_at_3 value: 33.857 - type: recall_at_5 value: 37.316 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.122 - type: map_at_10 value: 24.32 - type: map_at_100 value: 25.338 - type: map_at_1000 value: 25.462 - type: map_at_3 value: 22.064 - type: map_at_5 value: 23.322000000000003 - type: mrr_at_1 value: 20.647 - type: mrr_at_10 value: 27.858 - type: mrr_at_100 value: 28.743999999999996 - type: mrr_at_1000 value: 28.819 - type: mrr_at_3 value: 25.769 - type: mrr_at_5 value: 26.964 - type: ndcg_at_1 value: 20.647 - type: ndcg_at_10 value: 28.849999999999998 - type: ndcg_at_100 value: 33.849000000000004 - type: ndcg_at_1000 value: 36.802 - type: ndcg_at_3 value: 24.799 - type: ndcg_at_5 value: 26.682 - type: precision_at_1 value: 20.647 - type: precision_at_10 value: 5.2170000000000005 - type: precision_at_100 value: 0.906 - type: precision_at_1000 value: 0.134 - type: precision_at_3 value: 11.769 - type: precision_at_5 value: 8.486 - type: recall_at_1 value: 17.122 - type: recall_at_10 value: 38.999 - type: recall_at_100 value: 61.467000000000006 - type: recall_at_1000 value: 82.716 - type: recall_at_3 value: 27.601 - type: recall_at_5 value: 32.471 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.396 - type: map_at_10 value: 33.415 - type: map_at_100 value: 34.521 - type: map_at_1000 value: 34.631 - type: map_at_3 value: 30.703999999999997 - type: map_at_5 value: 32.166 - type: mrr_at_1 value: 28.825 - type: mrr_at_10 value: 37.397000000000006 - type: mrr_at_100 value: 38.286 - type: mrr_at_1000 value: 38.346000000000004 - type: mrr_at_3 value: 35.028 - type: mrr_at_5 value: 36.32 - type: ndcg_at_1 value: 28.825 - type: ndcg_at_10 value: 38.656 - type: ndcg_at_100 value: 43.856 - type: ndcg_at_1000 value: 46.31 - type: ndcg_at_3 value: 33.793 - type: ndcg_at_5 value: 35.909 - type: precision_at_1 value: 28.825 - type: precision_at_10 value: 6.567 - type: precision_at_100 value: 1.0330000000000001 - type: precision_at_1000 value: 0.135 - type: precision_at_3 value: 15.516 - type: precision_at_5 value: 10.914 - type: recall_at_1 value: 24.396 - type: recall_at_10 value: 50.747 - type: recall_at_100 value: 73.477 - type: recall_at_1000 value: 90.801 - type: recall_at_3 value: 37.1 - type: recall_at_5 value: 42.589 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.072 - type: map_at_10 value: 34.307 - type: map_at_100 value: 35.725 - type: map_at_1000 value: 35.943999999999996 - type: map_at_3 value: 30.906 - type: map_at_5 value: 32.818000000000005 - type: mrr_at_1 value: 29.644 - type: mrr_at_10 value: 38.673 - type: mrr_at_100 value: 39.459 - type: mrr_at_1000 value: 39.527 - type: mrr_at_3 value: 35.771 - type: mrr_at_5 value: 37.332 - type: ndcg_at_1 value: 29.644 - type: ndcg_at_10 value: 40.548 - type: ndcg_at_100 value: 45.678999999999995 - type: ndcg_at_1000 value: 48.488 - type: ndcg_at_3 value: 34.887 - type: ndcg_at_5 value: 37.543 - type: precision_at_1 value: 29.644 - type: precision_at_10 value: 7.688000000000001 - type: 
precision_at_100 value: 1.482 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 16.206 - type: precision_at_5 value: 12.016 - type: recall_at_1 value: 25.072 - type: recall_at_10 value: 53.478 - type: recall_at_100 value: 76.07300000000001 - type: recall_at_1000 value: 93.884 - type: recall_at_3 value: 37.583 - type: recall_at_5 value: 44.464 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 20.712 - type: map_at_10 value: 27.467999999999996 - type: map_at_100 value: 28.502 - type: map_at_1000 value: 28.610000000000003 - type: map_at_3 value: 24.887999999999998 - type: map_at_5 value: 26.273999999999997 - type: mrr_at_1 value: 22.736 - type: mrr_at_10 value: 29.553 - type: mrr_at_100 value: 30.485 - type: mrr_at_1000 value: 30.56 - type: mrr_at_3 value: 27.078999999999997 - type: mrr_at_5 value: 28.401 - type: ndcg_at_1 value: 22.736 - type: ndcg_at_10 value: 32.023 - type: ndcg_at_100 value: 37.158 - type: ndcg_at_1000 value: 39.823 - type: ndcg_at_3 value: 26.951999999999998 - type: ndcg_at_5 value: 29.281000000000002 - type: precision_at_1 value: 22.736 - type: precision_at_10 value: 5.213 - type: precision_at_100 value: 0.832 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 11.459999999999999 - type: precision_at_5 value: 8.244 - type: recall_at_1 value: 20.712 - type: recall_at_10 value: 44.057 - type: recall_at_100 value: 67.944 - type: recall_at_1000 value: 87.925 - type: recall_at_3 value: 30.305 - type: recall_at_5 value: 36.071999999999996 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.181999999999999 - type: map_at_10 value: 16.66 - type: map_at_100 value: 18.273 - type: map_at_1000 value: 18.45 - type: map_at_3 value: 14.141 - type: map_at_5 value: 15.455 - type: mrr_at_1 value: 22.15 - type: mrr_at_10 value: 32.062000000000005 - type: mrr_at_100 value: 33.116 - type: mrr_at_1000 value: 33.168 - type: mrr_at_3 value: 28.827 - type: mrr_at_5 value: 30.892999999999997 - type: ndcg_at_1 value: 22.15 - type: ndcg_at_10 value: 23.532 - type: ndcg_at_100 value: 30.358 - type: ndcg_at_1000 value: 33.783 - type: ndcg_at_3 value: 19.222 - type: ndcg_at_5 value: 20.919999999999998 - type: precision_at_1 value: 22.15 - type: precision_at_10 value: 7.185999999999999 - type: precision_at_100 value: 1.433 - type: precision_at_1000 value: 0.207 - type: precision_at_3 value: 13.941 - type: precision_at_5 value: 10.906 - type: recall_at_1 value: 10.181999999999999 - type: recall_at_10 value: 28.104000000000003 - type: recall_at_100 value: 51.998999999999995 - type: recall_at_1000 value: 71.311 - type: recall_at_3 value: 17.698 - type: recall_at_5 value: 22.262999999999998 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 6.669 - type: map_at_10 value: 15.552 - type: map_at_100 value: 21.865000000000002 - type: map_at_1000 value: 23.268 - type: map_at_3 value: 11.309 - type: map_at_5 value: 13.084000000000001 - type: mrr_at_1 value: 55.50000000000001 - type: mrr_at_10 value: 66.46600000000001 - type: mrr_at_100 value: 66.944 - type: mrr_at_1000 value: 66.956 - type: mrr_at_3 value: 64.542 - type: mrr_at_5 value: 65.717 - type: ndcg_at_1 value: 44.75 - type: ndcg_at_10 value: 35.049 - type: ndcg_at_100 value: 39.073 - 
type: ndcg_at_1000 value: 46.208 - type: ndcg_at_3 value: 39.525 - type: ndcg_at_5 value: 37.156 - type: precision_at_1 value: 55.50000000000001 - type: precision_at_10 value: 27.800000000000004 - type: precision_at_100 value: 9.013 - type: precision_at_1000 value: 1.8800000000000001 - type: precision_at_3 value: 42.667 - type: precision_at_5 value: 36.0 - type: recall_at_1 value: 6.669 - type: recall_at_10 value: 21.811 - type: recall_at_100 value: 45.112 - type: recall_at_1000 value: 67.806 - type: recall_at_3 value: 13.373 - type: recall_at_5 value: 16.615 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 48.769999999999996 - type: f1 value: 42.91448356376592 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 54.013 - type: map_at_10 value: 66.239 - type: map_at_100 value: 66.62599999999999 - type: map_at_1000 value: 66.644 - type: map_at_3 value: 63.965 - type: map_at_5 value: 65.45400000000001 - type: mrr_at_1 value: 58.221000000000004 - type: mrr_at_10 value: 70.43700000000001 - type: mrr_at_100 value: 70.744 - type: mrr_at_1000 value: 70.75099999999999 - type: mrr_at_3 value: 68.284 - type: mrr_at_5 value: 69.721 - type: ndcg_at_1 value: 58.221000000000004 - type: ndcg_at_10 value: 72.327 - type: ndcg_at_100 value: 73.953 - type: ndcg_at_1000 value: 74.312 - type: ndcg_at_3 value: 68.062 - type: ndcg_at_5 value: 70.56400000000001 - type: precision_at_1 value: 58.221000000000004 - type: precision_at_10 value: 9.521 - type: precision_at_100 value: 1.045 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 27.348 - type: precision_at_5 value: 17.794999999999998 - type: recall_at_1 value: 54.013 - type: recall_at_10 value: 86.957 - type: recall_at_100 value: 93.911 - type: recall_at_1000 value: 96.38 - type: recall_at_3 value: 75.555 - type: recall_at_5 value: 81.671 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 21.254 - type: map_at_10 value: 33.723 - type: map_at_100 value: 35.574 - type: map_at_1000 value: 35.730000000000004 - type: map_at_3 value: 29.473 - type: map_at_5 value: 31.543 - type: mrr_at_1 value: 41.358 - type: mrr_at_10 value: 49.498 - type: mrr_at_100 value: 50.275999999999996 - type: mrr_at_1000 value: 50.308 - type: mrr_at_3 value: 47.016000000000005 - type: mrr_at_5 value: 48.336 - type: ndcg_at_1 value: 41.358 - type: ndcg_at_10 value: 41.579 - type: ndcg_at_100 value: 48.455 - type: ndcg_at_1000 value: 51.165000000000006 - type: ndcg_at_3 value: 37.681 - type: ndcg_at_5 value: 38.49 - type: precision_at_1 value: 41.358 - type: precision_at_10 value: 11.543000000000001 - type: precision_at_100 value: 1.87 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 24.743000000000002 - type: precision_at_5 value: 17.994 - type: recall_at_1 value: 21.254 - type: recall_at_10 value: 48.698 - type: recall_at_100 value: 74.588 - type: recall_at_1000 value: 91.00200000000001 - type: recall_at_3 value: 33.939 - type: recall_at_5 value: 39.367000000000004 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 35.922 - type: map_at_10 value: 52.32599999999999 - type: map_at_100 value: 53.18000000000001 - type: map_at_1000 
value: 53.245 - type: map_at_3 value: 49.294 - type: map_at_5 value: 51.202999999999996 - type: mrr_at_1 value: 71.843 - type: mrr_at_10 value: 78.24600000000001 - type: mrr_at_100 value: 78.515 - type: mrr_at_1000 value: 78.527 - type: mrr_at_3 value: 77.17500000000001 - type: mrr_at_5 value: 77.852 - type: ndcg_at_1 value: 71.843 - type: ndcg_at_10 value: 61.379 - type: ndcg_at_100 value: 64.535 - type: ndcg_at_1000 value: 65.888 - type: ndcg_at_3 value: 56.958 - type: ndcg_at_5 value: 59.434 - type: precision_at_1 value: 71.843 - type: precision_at_10 value: 12.686 - type: precision_at_100 value: 1.517 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_3 value: 35.778 - type: precision_at_5 value: 23.422 - type: recall_at_1 value: 35.922 - type: recall_at_10 value: 63.43 - type: recall_at_100 value: 75.868 - type: recall_at_1000 value: 84.88900000000001 - type: recall_at_3 value: 53.666000000000004 - type: recall_at_5 value: 58.555 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 79.4408 - type: ap value: 73.52820871620366 - type: f1 value: 79.36240238685001 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.826999999999998 - type: map_at_10 value: 34.04 - type: map_at_100 value: 35.226 - type: map_at_1000 value: 35.275 - type: map_at_3 value: 30.165999999999997 - type: map_at_5 value: 32.318000000000005 - type: mrr_at_1 value: 22.464000000000002 - type: mrr_at_10 value: 34.631 - type: mrr_at_100 value: 35.752 - type: mrr_at_1000 value: 35.795 - type: mrr_at_3 value: 30.798 - type: mrr_at_5 value: 32.946999999999996 - type: ndcg_at_1 value: 22.464000000000002 - type: ndcg_at_10 value: 40.919 - type: ndcg_at_100 value: 46.632 - type: ndcg_at_1000 value: 47.833 - type: ndcg_at_3 value: 32.992 - type: ndcg_at_5 value: 36.834 - type: precision_at_1 value: 22.464000000000002 - type: precision_at_10 value: 6.494 - type: precision_at_100 value: 0.9369999999999999 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.021 - type: precision_at_5 value: 10.347000000000001 - type: recall_at_1 value: 21.826999999999998 - type: recall_at_10 value: 62.132 - type: recall_at_100 value: 88.55199999999999 - type: recall_at_1000 value: 97.707 - type: recall_at_3 value: 40.541 - type: recall_at_5 value: 49.739 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 95.68399452804377 - type: f1 value: 95.25490609832268 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 83.15321477428182 - type: f1 value: 60.35476439087966 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.92669804976462 - type: f1 value: 69.22815107207565 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 
74.4855413584398 - type: f1 value: 72.92107516103387 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 32.412679360205544 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 28.09211869875204 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.540919056982545 - type: mrr value: 31.529904607063536 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.745 - type: map_at_10 value: 12.013 - type: map_at_100 value: 15.040000000000001 - type: map_at_1000 value: 16.427 - type: map_at_3 value: 8.841000000000001 - type: map_at_5 value: 10.289 - type: mrr_at_1 value: 45.201 - type: mrr_at_10 value: 53.483999999999995 - type: mrr_at_100 value: 54.20700000000001 - type: mrr_at_1000 value: 54.252 - type: mrr_at_3 value: 51.29 - type: mrr_at_5 value: 52.73 - type: ndcg_at_1 value: 43.808 - type: ndcg_at_10 value: 32.445 - type: ndcg_at_100 value: 30.031000000000002 - type: ndcg_at_1000 value: 39.007 - type: ndcg_at_3 value: 37.204 - type: ndcg_at_5 value: 35.07 - type: precision_at_1 value: 45.201 - type: precision_at_10 value: 23.684 - type: precision_at_100 value: 7.600999999999999 - type: precision_at_1000 value: 2.043 - type: precision_at_3 value: 33.953 - type: precision_at_5 value: 29.412 - type: recall_at_1 value: 5.745 - type: recall_at_10 value: 16.168 - type: recall_at_100 value: 30.875999999999998 - type: recall_at_1000 value: 62.686 - type: recall_at_3 value: 9.75 - type: recall_at_5 value: 12.413 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 37.828 - type: map_at_10 value: 53.239000000000004 - type: map_at_100 value: 54.035999999999994 - type: map_at_1000 value: 54.067 - type: map_at_3 value: 49.289 - type: map_at_5 value: 51.784 - type: mrr_at_1 value: 42.497 - type: mrr_at_10 value: 55.916999999999994 - type: mrr_at_100 value: 56.495 - type: mrr_at_1000 value: 56.516999999999996 - type: mrr_at_3 value: 52.800000000000004 - type: mrr_at_5 value: 54.722 - type: ndcg_at_1 value: 42.468 - type: ndcg_at_10 value: 60.437 - type: ndcg_at_100 value: 63.731 - type: ndcg_at_1000 value: 64.41799999999999 - type: ndcg_at_3 value: 53.230999999999995 - type: ndcg_at_5 value: 57.26 - type: precision_at_1 value: 42.468 - type: precision_at_10 value: 9.47 - type: precision_at_100 value: 1.1360000000000001 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 23.724999999999998 - type: precision_at_5 value: 16.593 - type: recall_at_1 value: 37.828 - type: recall_at_10 value: 79.538 - type: recall_at_100 value: 93.646 - type: recall_at_1000 value: 98.72999999999999 - type: recall_at_3 value: 61.134 - type: recall_at_5 value: 70.377 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.548 - type: map_at_10 value: 84.466 - type: map_at_100 value: 85.10600000000001 - type: map_at_1000 value: 85.123 - type: map_at_3 value: 81.57600000000001 - type: map_at_5 value: 
83.399 - type: mrr_at_1 value: 81.24 - type: mrr_at_10 value: 87.457 - type: mrr_at_100 value: 87.574 - type: mrr_at_1000 value: 87.575 - type: mrr_at_3 value: 86.507 - type: mrr_at_5 value: 87.205 - type: ndcg_at_1 value: 81.25 - type: ndcg_at_10 value: 88.203 - type: ndcg_at_100 value: 89.457 - type: ndcg_at_1000 value: 89.563 - type: ndcg_at_3 value: 85.465 - type: ndcg_at_5 value: 87.007 - type: precision_at_1 value: 81.25 - type: precision_at_10 value: 13.373 - type: precision_at_100 value: 1.5270000000000001 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.417 - type: precision_at_5 value: 24.556 - type: recall_at_1 value: 70.548 - type: recall_at_10 value: 95.208 - type: recall_at_100 value: 99.514 - type: recall_at_1000 value: 99.988 - type: recall_at_3 value: 87.214 - type: recall_at_5 value: 91.696 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 53.04822095496839 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 60.30778476474675 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.692 - type: map_at_10 value: 11.766 - type: map_at_100 value: 13.904 - type: map_at_1000 value: 14.216999999999999 - type: map_at_3 value: 8.245 - type: map_at_5 value: 9.92 - type: mrr_at_1 value: 23.0 - type: mrr_at_10 value: 33.78 - type: mrr_at_100 value: 34.922 - type: mrr_at_1000 value: 34.973 - type: mrr_at_3 value: 30.2 - type: mrr_at_5 value: 32.565 - type: ndcg_at_1 value: 23.0 - type: ndcg_at_10 value: 19.863 - type: ndcg_at_100 value: 28.141 - type: ndcg_at_1000 value: 33.549 - type: ndcg_at_3 value: 18.434 - type: ndcg_at_5 value: 16.384 - type: precision_at_1 value: 23.0 - type: precision_at_10 value: 10.39 - type: precision_at_100 value: 2.235 - type: precision_at_1000 value: 0.35300000000000004 - type: precision_at_3 value: 17.133000000000003 - type: precision_at_5 value: 14.44 - type: recall_at_1 value: 4.692 - type: recall_at_10 value: 21.025 - type: recall_at_100 value: 45.324999999999996 - type: recall_at_1000 value: 71.675 - type: recall_at_3 value: 10.440000000000001 - type: recall_at_5 value: 14.64 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.96178184892842 - type: cos_sim_spearman value: 79.6487740813199 - type: euclidean_pearson value: 82.06661161625023 - type: euclidean_spearman value: 79.64876769031183 - type: manhattan_pearson value: 82.07061164575131 - type: manhattan_spearman value: 79.65197039464537 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 84.15305604100027 - type: cos_sim_spearman value: 74.27447427941591 - type: euclidean_pearson value: 80.52737337565307 - type: euclidean_spearman value: 74.27416077132192 - type: manhattan_pearson value: 80.53728571140387 - type: manhattan_spearman value: 74.28853605753457 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: 
cos_sim_pearson value: 83.44386080639279 - type: cos_sim_spearman value: 84.17947648159536 - type: euclidean_pearson value: 83.34145388129387 - type: euclidean_spearman value: 84.17947648159536 - type: manhattan_pearson value: 83.30699061927966 - type: manhattan_spearman value: 84.18125737380451 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 81.57392220985612 - type: cos_sim_spearman value: 78.80745014464101 - type: euclidean_pearson value: 80.01660371487199 - type: euclidean_spearman value: 78.80741240102256 - type: manhattan_pearson value: 79.96810779507953 - type: manhattan_spearman value: 78.75600400119448 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.85421063026625 - type: cos_sim_spearman value: 87.55320285299192 - type: euclidean_pearson value: 86.69750143323517 - type: euclidean_spearman value: 87.55320284326378 - type: manhattan_pearson value: 86.63379169960379 - type: manhattan_spearman value: 87.4815029877984 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 84.31314130411842 - type: cos_sim_spearman value: 85.3489588181433 - type: euclidean_pearson value: 84.13240933463535 - type: euclidean_spearman value: 85.34902871403281 - type: manhattan_pearson value: 84.01183086503559 - type: manhattan_spearman value: 85.19316703166102 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 89.09979781689536 - type: cos_sim_spearman value: 88.87813323759015 - type: euclidean_pearson value: 88.65413031123792 - type: euclidean_spearman value: 88.87813323759015 - type: manhattan_pearson value: 88.61818758256024 - type: manhattan_spearman value: 88.81044100494604 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 62.30693258111531 - type: cos_sim_spearman value: 62.195516523251946 - type: euclidean_pearson value: 62.951283701049476 - type: euclidean_spearman value: 62.195516523251946 - type: manhattan_pearson value: 63.068322281439535 - type: manhattan_spearman value: 62.10621171028406 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.27092833763909 - type: cos_sim_spearman value: 84.84429717949759 - type: euclidean_pearson value: 84.8516966060792 - type: euclidean_spearman value: 84.84429717949759 - type: manhattan_pearson value: 84.82203139242881 - type: manhattan_spearman value: 84.8358503952945 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 83.10290863981409 - type: mrr value: 95.31168450286097 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 52.161 - type: map_at_10 value: 62.138000000000005 - type: map_at_100 value: 
62.769 - type: map_at_1000 value: 62.812 - type: map_at_3 value: 59.111000000000004 - type: map_at_5 value: 60.995999999999995 - type: mrr_at_1 value: 55.333 - type: mrr_at_10 value: 63.504000000000005 - type: mrr_at_100 value: 64.036 - type: mrr_at_1000 value: 64.08 - type: mrr_at_3 value: 61.278 - type: mrr_at_5 value: 62.778 - type: ndcg_at_1 value: 55.333 - type: ndcg_at_10 value: 66.678 - type: ndcg_at_100 value: 69.415 - type: ndcg_at_1000 value: 70.453 - type: ndcg_at_3 value: 61.755 - type: ndcg_at_5 value: 64.546 - type: precision_at_1 value: 55.333 - type: precision_at_10 value: 9.033 - type: precision_at_100 value: 1.043 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 24.221999999999998 - type: precision_at_5 value: 16.333000000000002 - type: recall_at_1 value: 52.161 - type: recall_at_10 value: 79.156 - type: recall_at_100 value: 91.333 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 66.43299999999999 - type: recall_at_5 value: 73.272 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.81287128712871 - type: cos_sim_ap value: 95.30034785910676 - type: cos_sim_f1 value: 90.28629856850716 - type: cos_sim_precision value: 92.36401673640168 - type: cos_sim_recall value: 88.3 - type: dot_accuracy value: 99.81287128712871 - type: dot_ap value: 95.30034785910676 - type: dot_f1 value: 90.28629856850716 - type: dot_precision value: 92.36401673640168 - type: dot_recall value: 88.3 - type: euclidean_accuracy value: 99.81287128712871 - type: euclidean_ap value: 95.30034785910676 - type: euclidean_f1 value: 90.28629856850716 - type: euclidean_precision value: 92.36401673640168 - type: euclidean_recall value: 88.3 - type: manhattan_accuracy value: 99.80990099009901 - type: manhattan_ap value: 95.26880751950654 - type: manhattan_f1 value: 90.22177419354838 - type: manhattan_precision value: 90.95528455284553 - type: manhattan_recall value: 89.5 - type: max_accuracy value: 99.81287128712871 - type: max_ap value: 95.30034785910676 - type: max_f1 value: 90.28629856850716 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 58.518662504351184 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.96168178378587 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.04862593471896 - type: mrr value: 52.97238402936932 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.092545236479946 - type: cos_sim_spearman value: 31.599851000175498 - type: dot_pearson value: 30.092542723901676 - type: dot_spearman value: 31.599851000175498 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.189 - type: map_at_10 value: 
1.662 - type: map_at_100 value: 9.384 - type: map_at_1000 value: 22.669 - type: map_at_3 value: 0.5559999999999999 - type: map_at_5 value: 0.9039999999999999 - type: mrr_at_1 value: 68.0 - type: mrr_at_10 value: 81.01899999999999 - type: mrr_at_100 value: 81.01899999999999 - type: mrr_at_1000 value: 81.01899999999999 - type: mrr_at_3 value: 79.333 - type: mrr_at_5 value: 80.733 - type: ndcg_at_1 value: 63.0 - type: ndcg_at_10 value: 65.913 - type: ndcg_at_100 value: 51.895 - type: ndcg_at_1000 value: 46.967 - type: ndcg_at_3 value: 65.49199999999999 - type: ndcg_at_5 value: 66.69699999999999 - type: precision_at_1 value: 68.0 - type: precision_at_10 value: 71.6 - type: precision_at_100 value: 53.66 - type: precision_at_1000 value: 21.124000000000002 - type: precision_at_3 value: 72.667 - type: precision_at_5 value: 74.0 - type: recall_at_1 value: 0.189 - type: recall_at_10 value: 1.913 - type: recall_at_100 value: 12.601999999999999 - type: recall_at_1000 value: 44.296 - type: recall_at_3 value: 0.605 - type: recall_at_5 value: 1.018 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.701 - type: map_at_10 value: 10.445 - type: map_at_100 value: 17.324 - type: map_at_1000 value: 19.161 - type: map_at_3 value: 5.497 - type: map_at_5 value: 7.278 - type: mrr_at_1 value: 30.612000000000002 - type: mrr_at_10 value: 45.534 - type: mrr_at_100 value: 45.792 - type: mrr_at_1000 value: 45.806999999999995 - type: mrr_at_3 value: 37.755 - type: mrr_at_5 value: 43.469 - type: ndcg_at_1 value: 26.531 - type: ndcg_at_10 value: 26.235000000000003 - type: ndcg_at_100 value: 39.17 - type: ndcg_at_1000 value: 51.038 - type: ndcg_at_3 value: 23.625 - type: ndcg_at_5 value: 24.338 - type: precision_at_1 value: 30.612000000000002 - type: precision_at_10 value: 24.285999999999998 - type: precision_at_100 value: 8.224 - type: precision_at_1000 value: 1.6179999999999999 - type: precision_at_3 value: 24.490000000000002 - type: precision_at_5 value: 24.898 - type: recall_at_1 value: 2.701 - type: recall_at_10 value: 17.997 - type: recall_at_100 value: 51.766999999999996 - type: recall_at_1000 value: 87.863 - type: recall_at_3 value: 6.295000000000001 - type: recall_at_5 value: 9.993 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 73.3474 - type: ap value: 15.393431414459924 - type: f1 value: 56.466681887882416 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 62.062818336163 - type: f1 value: 62.11230840463252 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 42.464892820845115 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.15962329379508 - type: cos_sim_ap value: 74.73674057919256 - type: cos_sim_f1 value: 68.81245642574947 - type: cos_sim_precision value: 61.48255813953488 - type: 
cos_sim_recall value: 78.12664907651715 - type: dot_accuracy value: 86.15962329379508 - type: dot_ap value: 74.7367634988281 - type: dot_f1 value: 68.81245642574947 - type: dot_precision value: 61.48255813953488 - type: dot_recall value: 78.12664907651715 - type: euclidean_accuracy value: 86.15962329379508 - type: euclidean_ap value: 74.7367761466634 - type: euclidean_f1 value: 68.81245642574947 - type: euclidean_precision value: 61.48255813953488 - type: euclidean_recall value: 78.12664907651715 - type: manhattan_accuracy value: 86.21326816474935 - type: manhattan_ap value: 74.64416473733951 - type: manhattan_f1 value: 68.80924855491331 - type: manhattan_precision value: 61.23456790123457 - type: manhattan_recall value: 78.52242744063325 - type: max_accuracy value: 86.21326816474935 - type: max_ap value: 74.7367761466634 - type: max_f1 value: 68.81245642574947 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.97620988085536 - type: cos_sim_ap value: 86.08680845745758 - type: cos_sim_f1 value: 78.02793637114438 - type: cos_sim_precision value: 73.11082699683736 - type: cos_sim_recall value: 83.65414228518632 - type: dot_accuracy value: 88.97620988085536 - type: dot_ap value: 86.08681149437946 - type: dot_f1 value: 78.02793637114438 - type: dot_precision value: 73.11082699683736 - type: dot_recall value: 83.65414228518632 - type: euclidean_accuracy value: 88.97620988085536 - type: euclidean_ap value: 86.08681215460771 - type: euclidean_f1 value: 78.02793637114438 - type: euclidean_precision value: 73.11082699683736 - type: euclidean_recall value: 83.65414228518632 - type: manhattan_accuracy value: 88.88888888888889 - type: manhattan_ap value: 86.02916327562438 - type: manhattan_f1 value: 78.02063045516843 - type: manhattan_precision value: 73.38851947346994 - type: manhattan_recall value: 83.2768709578072 - type: max_accuracy value: 88.97620988085536 - type: max_ap value: 86.08681215460771 - type: max_f1 value: 78.02793637114438 ---

\"Jina

The text embedding set trained by Jina AI.

## Quick Start

The easiest way to start using `jina-embeddings-v2-base-en` is Jina AI's Embedding API.

## Intended Usage & Model Info

`jina-embeddings-v2-base-en` is an English, monolingual **embedding model** supporting an **8192-token sequence length**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. The backbone is pretrained on the C4 dataset. The model is further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives. These pairs were obtained from various domains and were carefully selected through a thorough cleaning process. The embedding model was trained with a 512-token sequence length, but extrapolates to an 8k sequence length (or even longer) thanks to ALiBi. This makes our model useful for a range of use cases, especially where processing long documents is needed, including long-document retrieval, semantic textual similarity, text reranking, recommendation, RAG and LLM-based generative search. With a standard size of 137 million parameters, the model enables fast inference while delivering better performance than our small model. It is recommended to use a single GPU for inference.

Additionally, we provide the following embedding models:

- `jina-embeddings-v2-small-en`: 33 million parameters.
- `jina-embeddings-v2-base-en`: 137 million parameters **(you are here)**.
- `jina-embeddings-v2-base-zh`: Chinese-English bilingual embeddings.
- `jina-embeddings-v2-base-de`: German-English bilingual embeddings.
- `jina-embeddings-v2-base-es`: Spanish-English bilingual embeddings.

## Data & Parameters

The data and training details are described in the Jina Embeddings V2 technical report.

## Usage

**Please apply mean pooling when integrating the model.**

### Why mean pooling?

Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function that handles this for you. However, if you would like to do it without the default `encode` function, a sketch follows below.
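As a hedged illustration (not the card's verbatim snippet), manual mean pooling with the standard transformers `AutoModel`/`AutoTokenizer` API might look like this; the input sentences are illustrative:

```python
# Minimal sketch of manual mean pooling; a sketch under the assumptions above.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, masking out padding positions.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ["How is the weather today?", "What is the current weather like today?"]

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")
# trust_remote_code=True is required so the custom JinaBERT code is used
# instead of a default BERT model.
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
```

Normalizing at the end makes cosine similarity a plain dot product between embeddings.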

You can use Jina Embedding models directly from the transformers package. If you only want to handle shorter sequences, such as 2k tokens, pass the `max_length` parameter to the `encode` function. As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face as well); see the sketch after this section.

## Alternatives to Using the Transformers (or SentenceTransformers) Package

1. _Managed SaaS_: Get started with a free key on Jina AI's Embedding API.
2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploying them on AWS SageMaker.

## Use Jina Embeddings for RAG

According to the latest blog post from LlamaIndex,

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

## Plans

1. Bilingual embedding models supporting more European and Asian languages, including Spanish, French, Italian and Japanese.
2. Multimodal embedding models to enable multimodal RAG applications.
3. High-performance rerankers.

## Troubleshooting

**Loading of Model Code failed**

If you forgot to pass the `trust_remote_code=True` flag when calling `from_pretrained` or initializing the model via the `AutoModel` class, you will receive an error that the model weights could not be initialized. This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model.

**User is not logged into Huggingface**

The model is only available under gated access. This means you need to be logged in to Hugging Face to load it. If you receive the following error, you need to provide an access token, either by using the huggingface-cli or by providing the token via an environment variable as described above:

## Contact

Join our Discord community and chat with other community members about ideas.

## Citation

If you find Jina Embeddings useful in your research, please cite the following paper:", + "model_explanation_gemini": "Generates sentence embeddings for English text to enable tasks like similarity comparison, classification, clustering, and retrieval."
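For the sentence-transformers path mentioned above, a hedged sketch (assuming v2.3.0+, which forwards `trust_remote_code` to the underlying transformers loader; the 2048-token cap is illustrative) might look like:

```python
# Sketch, not verbatim from the card: load the model via sentence-transformers.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)
# Optional: cap the sequence length (e.g. 2k tokens) for faster inference.
model.max_seq_length = 2048

embeddings = model.encode(
    ["How is the weather today?", "What is the current weather like today?"]
)
print(cos_sim(embeddings[0], embeddings[1]))
```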
+} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-embeddings-v2-base-zh.json b/data/model_data_json/jinaai_jina-embeddings-v2-base-zh.json new file mode 100644 index 0000000000000000000000000000000000000000..5650f1cd1e4f93eafe0450a31fe425708b3cb23b --- /dev/null +++ b/data/model_data_json/jinaai_jina-embeddings-v2-base-zh.json @@ -0,0 +1,28 @@ +{ + "model_id": "jinaai/jina-embeddings-v2-base-zh", + "downloads": 90615, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "transformers", + "transformers.js", + "custom_code", + "en", + "zh", + "arxiv:2108.12409", + "arxiv:2402.17016", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "region:us" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - transformers - transformers.js inference: false license: apache-2.0 language: - en - zh model-index: - name: jina-embeddings-v2-base-zh results: - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: None metrics: - type: cos_sim_pearson value: 48.51403119231363 - type: cos_sim_spearman value: 50.5928547846445 - type: euclidean_pearson value: 48.750436310559074 - type: euclidean_spearman value: 50.50950238691385 - type: manhattan_pearson value: 48.7866189440328 - type: manhattan_spearman value: 50.58692402017165 - task: type: STS dataset: type: C-MTEB/ATEC name: MTEB ATEC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 50.25985700105725 - type: cos_sim_spearman value: 51.28815934593989 - type: euclidean_pearson value: 52.70329248799904 - type: euclidean_spearman value: 50.94101139559258 - type: manhattan_pearson value: 52.6647237400892 - type: manhattan_spearman value: 50.922441325406176 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 34.944 - type: f1 value: 34.06478860660109 - task: type: STS dataset: type: C-MTEB/BQ name: MTEB BQ config: default split: test revision: None metrics: - type: cos_sim_pearson value: 65.15667035488342 - type: cos_sim_spearman value: 66.07110142081 - type: euclidean_pearson value: 60.447598102249714 - type: euclidean_spearman value: 61.826575796578766 - type: manhattan_pearson value: 60.39364279354984 - type: manhattan_spearman value: 61.78743491223281 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringP2P name: MTEB CLSClusteringP2P config: default split: test revision: None metrics: - type: v_measure value: 39.96714175391701 - task: type: Clustering dataset: type: C-MTEB/CLSClusteringS2S name: MTEB CLSClusteringS2S config: default split: test revision: None metrics: - type: v_measure value: 38.39863566717934 - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 83.63680381780644 - type: mrr value: 86.16476190476192 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: - type: map value: 83.74350667859487 - type: mrr value: 86.10388888888889 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: None metrics: - type: map_at_1 
value: 22.072 - type: map_at_10 value: 32.942 - type: map_at_100 value: 34.768 - type: map_at_1000 value: 34.902 - type: map_at_3 value: 29.357 - type: map_at_5 value: 31.236000000000004 - type: mrr_at_1 value: 34.259 - type: mrr_at_10 value: 41.957 - type: mrr_at_100 value: 42.982 - type: mrr_at_1000 value: 43.042 - type: mrr_at_3 value: 39.722 - type: mrr_at_5 value: 40.898 - type: ndcg_at_1 value: 34.259 - type: ndcg_at_10 value: 39.153 - type: ndcg_at_100 value: 46.493 - type: ndcg_at_1000 value: 49.01 - type: ndcg_at_3 value: 34.636 - type: ndcg_at_5 value: 36.278 - type: precision_at_1 value: 34.259 - type: precision_at_10 value: 8.815000000000001 - type: precision_at_100 value: 1.474 - type: precision_at_1000 value: 0.179 - type: precision_at_3 value: 19.73 - type: precision_at_5 value: 14.174000000000001 - type: recall_at_1 value: 22.072 - type: recall_at_10 value: 48.484 - type: recall_at_100 value: 79.035 - type: recall_at_1000 value: 96.15 - type: recall_at_3 value: 34.607 - type: recall_at_5 value: 40.064 - task: type: PairClassification dataset: type: C-MTEB/CMNLI name: MTEB Cmnli config: default split: validation revision: None metrics: - type: cos_sim_accuracy value: 76.7047504509922 - type: cos_sim_ap value: 85.26649874800871 - type: cos_sim_f1 value: 78.13528724646915 - type: cos_sim_precision value: 71.57587548638132 - type: cos_sim_recall value: 86.01823708206688 - type: dot_accuracy value: 70.13830426939266 - type: dot_ap value: 77.01510412382171 - type: dot_f1 value: 73.56710042713817 - type: dot_precision value: 63.955094991364426 - type: dot_recall value: 86.57937806873977 - type: euclidean_accuracy value: 75.53818400481059 - type: euclidean_ap value: 84.34668448241264 - type: euclidean_f1 value: 77.51741608613047 - type: euclidean_precision value: 70.65614777756399 - type: euclidean_recall value: 85.85457096095394 - type: manhattan_accuracy value: 75.49007817197835 - type: manhattan_ap value: 84.40297506704299 - type: manhattan_f1 value: 77.63185324160932 - type: manhattan_precision value: 70.03949595636637 - type: manhattan_recall value: 87.07037643207856 - type: max_accuracy value: 76.7047504509922 - type: max_ap value: 85.26649874800871 - type: max_f1 value: 78.13528724646915 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 69.178 - type: map_at_10 value: 77.523 - type: map_at_100 value: 77.793 - type: map_at_1000 value: 77.79899999999999 - type: map_at_3 value: 75.878 - type: map_at_5 value: 76.849 - type: mrr_at_1 value: 69.44200000000001 - type: mrr_at_10 value: 77.55 - type: mrr_at_100 value: 77.819 - type: mrr_at_1000 value: 77.826 - type: mrr_at_3 value: 75.957 - type: mrr_at_5 value: 76.916 - type: ndcg_at_1 value: 69.44200000000001 - type: ndcg_at_10 value: 81.217 - type: ndcg_at_100 value: 82.45 - type: ndcg_at_1000 value: 82.636 - type: ndcg_at_3 value: 77.931 - type: ndcg_at_5 value: 79.655 - type: precision_at_1 value: 69.44200000000001 - type: precision_at_10 value: 9.357 - type: precision_at_100 value: 0.993 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 28.1 - type: precision_at_5 value: 17.724 - type: recall_at_1 value: 69.178 - type: recall_at_10 value: 92.624 - type: recall_at_100 value: 98.209 - type: recall_at_1000 value: 99.684 - type: recall_at_3 value: 83.772 - type: recall_at_5 value: 87.882 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: 
None metrics: - type: map_at_1 value: 25.163999999999998 - type: map_at_10 value: 76.386 - type: map_at_100 value: 79.339 - type: map_at_1000 value: 79.39500000000001 - type: map_at_3 value: 52.959 - type: map_at_5 value: 66.59 - type: mrr_at_1 value: 87.9 - type: mrr_at_10 value: 91.682 - type: mrr_at_100 value: 91.747 - type: mrr_at_1000 value: 91.751 - type: mrr_at_3 value: 91.267 - type: mrr_at_5 value: 91.527 - type: ndcg_at_1 value: 87.9 - type: ndcg_at_10 value: 84.569 - type: ndcg_at_100 value: 87.83800000000001 - type: ndcg_at_1000 value: 88.322 - type: ndcg_at_3 value: 83.473 - type: ndcg_at_5 value: 82.178 - type: precision_at_1 value: 87.9 - type: precision_at_10 value: 40.605000000000004 - type: precision_at_100 value: 4.752 - type: precision_at_1000 value: 0.488 - type: precision_at_3 value: 74.9 - type: precision_at_5 value: 62.96000000000001 - type: recall_at_1 value: 25.163999999999998 - type: recall_at_10 value: 85.97399999999999 - type: recall_at_100 value: 96.63000000000001 - type: recall_at_1000 value: 99.016 - type: recall_at_3 value: 55.611999999999995 - type: recall_at_5 value: 71.936 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 48.6 - type: map_at_10 value: 58.831 - type: map_at_100 value: 59.427 - type: map_at_1000 value: 59.44199999999999 - type: map_at_3 value: 56.383 - type: map_at_5 value: 57.753 - type: mrr_at_1 value: 48.6 - type: mrr_at_10 value: 58.831 - type: mrr_at_100 value: 59.427 - type: mrr_at_1000 value: 59.44199999999999 - type: mrr_at_3 value: 56.383 - type: mrr_at_5 value: 57.753 - type: ndcg_at_1 value: 48.6 - type: ndcg_at_10 value: 63.951 - type: ndcg_at_100 value: 66.72200000000001 - type: ndcg_at_1000 value: 67.13900000000001 - type: ndcg_at_3 value: 58.882 - type: ndcg_at_5 value: 61.373 - type: precision_at_1 value: 48.6 - type: precision_at_10 value: 8.01 - type: precision_at_100 value: 0.928 - type: precision_at_1000 value: 0.096 - type: precision_at_3 value: 22.033 - type: precision_at_5 value: 14.44 - type: recall_at_1 value: 48.6 - type: recall_at_10 value: 80.10000000000001 - type: recall_at_100 value: 92.80000000000001 - type: recall_at_1000 value: 96.1 - type: recall_at_3 value: 66.10000000000001 - type: recall_at_5 value: 72.2 - task: type: Classification dataset: type: C-MTEB/IFlyTek-classification name: MTEB IFlyTek config: default split: validation revision: None metrics: - type: accuracy value: 47.36437091188918 - type: f1 value: 36.60946954228577 - task: type: Classification dataset: type: C-MTEB/JDReview-classification name: MTEB JDReview config: default split: test revision: None metrics: - type: accuracy value: 79.5684803001876 - type: ap value: 42.671935929201524 - type: f1 value: 73.31912729103752 - task: type: STS dataset: type: C-MTEB/LCQMC name: MTEB LCQMC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 68.62670112113864 - type: cos_sim_spearman value: 75.74009123170768 - type: euclidean_pearson value: 73.93002595958237 - type: euclidean_spearman value: 75.35222935003587 - type: manhattan_pearson value: 73.89870445158144 - type: manhattan_spearman value: 75.31714936339398 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 31.5372713650176 - type: mrr value: 30.163095238095238 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval 
config: default split: dev revision: None metrics: - type: map_at_1 value: 65.054 - type: map_at_10 value: 74.156 - type: map_at_100 value: 74.523 - type: map_at_1000 value: 74.535 - type: map_at_3 value: 72.269 - type: map_at_5 value: 73.41 - type: mrr_at_1 value: 67.24900000000001 - type: mrr_at_10 value: 74.78399999999999 - type: mrr_at_100 value: 75.107 - type: mrr_at_1000 value: 75.117 - type: mrr_at_3 value: 73.13499999999999 - type: mrr_at_5 value: 74.13499999999999 - type: ndcg_at_1 value: 67.24900000000001 - type: ndcg_at_10 value: 77.96300000000001 - type: ndcg_at_100 value: 79.584 - type: ndcg_at_1000 value: 79.884 - type: ndcg_at_3 value: 74.342 - type: ndcg_at_5 value: 76.278 - type: precision_at_1 value: 67.24900000000001 - type: precision_at_10 value: 9.466 - type: precision_at_100 value: 1.027 - type: precision_at_1000 value: 0.105 - type: precision_at_3 value: 27.955999999999996 - type: precision_at_5 value: 17.817 - type: recall_at_1 value: 65.054 - type: recall_at_10 value: 89.113 - type: recall_at_100 value: 96.369 - type: recall_at_1000 value: 98.714 - type: recall_at_3 value: 79.45400000000001 - type: recall_at_5 value: 84.06 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 68.1977135171486 - type: f1 value: 67.23114308718404 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 71.92669804976462 - type: f1 value: 72.90628475628779 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 49.2 - type: map_at_10 value: 54.539 - type: map_at_100 value: 55.135 - type: map_at_1000 value: 55.19199999999999 - type: map_at_3 value: 53.383 - type: map_at_5 value: 54.142999999999994 - type: mrr_at_1 value: 49.2 - type: mrr_at_10 value: 54.539 - type: mrr_at_100 value: 55.135999999999996 - type: mrr_at_1000 value: 55.19199999999999 - type: mrr_at_3 value: 53.383 - type: mrr_at_5 value: 54.142999999999994 - type: ndcg_at_1 value: 49.2 - type: ndcg_at_10 value: 57.123000000000005 - type: ndcg_at_100 value: 60.21300000000001 - type: ndcg_at_1000 value: 61.915 - type: ndcg_at_3 value: 54.772 - type: ndcg_at_5 value: 56.157999999999994 - type: precision_at_1 value: 49.2 - type: precision_at_10 value: 6.52 - type: precision_at_100 value: 0.8009999999999999 - type: precision_at_1000 value: 0.094 - type: precision_at_3 value: 19.6 - type: precision_at_5 value: 12.44 - type: recall_at_1 value: 49.2 - type: recall_at_10 value: 65.2 - type: recall_at_100 value: 80.10000000000001 - type: recall_at_1000 value: 93.89999999999999 - type: recall_at_3 value: 58.8 - type: recall_at_5 value: 62.2 - task: type: Classification dataset: type: C-MTEB/MultilingualSentiment-classification name: MTEB MultilingualSentiment config: default split: validation revision: None metrics: - type: accuracy value: 63.29333333333334 - type: f1 value: 63.03293854259612 - task: type: PairClassification dataset: type: C-MTEB/OCNLI name: MTEB Ocnli config: default split: validation revision: None metrics: - type: cos_sim_accuracy value: 75.69030860855442 - type: cos_sim_ap value: 80.6157833772759 - type: cos_sim_f1 value: 77.87524366471735 - type: 
cos_sim_precision value: 72.3076923076923 - type: cos_sim_recall value: 84.37170010559663 - type: dot_accuracy value: 67.78559826746074 - type: dot_ap value: 72.00871467527499 - type: dot_f1 value: 72.58722247394654 - type: dot_precision value: 63.57142857142857 - type: dot_recall value: 84.58289334741288 - type: euclidean_accuracy value: 75.20303194369248 - type: euclidean_ap value: 80.98587256415605 - type: euclidean_f1 value: 77.26396917148362 - type: euclidean_precision value: 71.03631532329496 - type: euclidean_recall value: 84.68848996832101 - type: manhattan_accuracy value: 75.20303194369248 - type: manhattan_ap value: 80.93460699513219 - type: manhattan_f1 value: 77.124773960217 - type: manhattan_precision value: 67.43083003952569 - type: manhattan_recall value: 90.07391763463569 - type: max_accuracy value: 75.69030860855442 - type: max_ap value: 80.98587256415605 - type: max_f1 value: 77.87524366471735 - task: type: Classification dataset: type: C-MTEB/OnlineShopping-classification name: MTEB OnlineShopping config: default split: test revision: None metrics: - type: accuracy value: 87.00000000000001 - type: ap value: 83.24372135949511 - type: f1 value: 86.95554191530607 - task: type: STS dataset: type: C-MTEB/PAWSX name: MTEB PAWSX config: default split: test revision: None metrics: - type: cos_sim_pearson value: 37.57616811591219 - type: cos_sim_spearman value: 41.490259084930045 - type: euclidean_pearson value: 38.9155043692188 - type: euclidean_spearman value: 39.16056534305623 - type: manhattan_pearson value: 38.76569892264335 - type: manhattan_spearman value: 38.99891685590743 - task: type: STS dataset: type: C-MTEB/QBQTC name: MTEB QBQTC config: default split: test revision: None metrics: - type: cos_sim_pearson value: 35.44858610359665 - type: cos_sim_spearman value: 38.11128146262466 - type: euclidean_pearson value: 31.928644189822457 - type: euclidean_spearman value: 34.384936631696554 - type: manhattan_pearson value: 31.90586687414376 - type: manhattan_spearman value: 34.35770153777186 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 66.54931957553592 - type: cos_sim_spearman value: 69.25068863016632 - type: euclidean_pearson value: 50.26525596106869 - type: euclidean_spearman value: 63.83352741910006 - type: manhattan_pearson value: 49.98798282198196 - type: manhattan_spearman value: 63.87649521907841 - task: type: STS dataset: type: C-MTEB/STSB name: MTEB STSB config: default split: test revision: None metrics: - type: cos_sim_pearson value: 82.52782476625825 - type: cos_sim_spearman value: 82.55618986168398 - type: euclidean_pearson value: 78.48190631687673 - type: euclidean_spearman value: 78.39479731354655 - type: manhattan_pearson value: 78.51176592165885 - type: manhattan_spearman value: 78.42363787303265 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 67.36693873615643 - type: mrr value: 77.83847701797939 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 25.795 - type: map_at_10 value: 72.258 - type: map_at_100 value: 76.049 - type: map_at_1000 value: 76.134 - type: map_at_3 value: 50.697 - type: map_at_5 value: 62.324999999999996 - type: mrr_at_1 value: 86.634 - type: mrr_at_10 value: 89.792 - type: mrr_at_100 
value: 89.91900000000001 - type: mrr_at_1000 value: 89.923 - type: mrr_at_3 value: 89.224 - type: mrr_at_5 value: 89.608 - type: ndcg_at_1 value: 86.634 - type: ndcg_at_10 value: 80.589 - type: ndcg_at_100 value: 84.812 - type: ndcg_at_1000 value: 85.662 - type: ndcg_at_3 value: 82.169 - type: ndcg_at_5 value: 80.619 - type: precision_at_1 value: 86.634 - type: precision_at_10 value: 40.389 - type: precision_at_100 value: 4.93 - type: precision_at_1000 value: 0.513 - type: precision_at_3 value: 72.104 - type: precision_at_5 value: 60.425 - type: recall_at_1 value: 25.795 - type: recall_at_10 value: 79.565 - type: recall_at_100 value: 93.24799999999999 - type: recall_at_1000 value: 97.595 - type: recall_at_3 value: 52.583999999999996 - type: recall_at_5 value: 66.175 - task: type: Classification dataset: type: C-MTEB/TNews-classification name: MTEB TNews config: default split: validation revision: None metrics: - type: accuracy value: 47.648999999999994 - type: f1 value: 46.28925837008413 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringP2P name: MTEB ThuNewsClusteringP2P config: default split: test revision: None metrics: - type: v_measure value: 54.07641891287953 - task: type: Clustering dataset: type: C-MTEB/ThuNewsClusteringS2S name: MTEB ThuNewsClusteringS2S config: default split: test revision: None metrics: - type: v_measure value: 53.423702062353954 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 55.7 - type: map_at_10 value: 65.923 - type: map_at_100 value: 66.42 - type: map_at_1000 value: 66.431 - type: map_at_3 value: 63.9 - type: map_at_5 value: 65.225 - type: mrr_at_1 value: 55.60000000000001 - type: mrr_at_10 value: 65.873 - type: mrr_at_100 value: 66.36999999999999 - type: mrr_at_1000 value: 66.381 - type: mrr_at_3 value: 63.849999999999994 - type: mrr_at_5 value: 65.17500000000001 - type: ndcg_at_1 value: 55.7 - type: ndcg_at_10 value: 70.621 - type: ndcg_at_100 value: 72.944 - type: ndcg_at_1000 value: 73.25399999999999 - type: ndcg_at_3 value: 66.547 - type: ndcg_at_5 value: 68.93599999999999 - type: precision_at_1 value: 55.7 - type: precision_at_10 value: 8.52 - type: precision_at_100 value: 0.958 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 24.733 - type: precision_at_5 value: 16 - type: recall_at_1 value: 55.7 - type: recall_at_10 value: 85.2 - type: recall_at_100 value: 95.8 - type: recall_at_1000 value: 98.3 - type: recall_at_3 value: 74.2 - type: recall_at_5 value: 80 - task: type: Classification dataset: type: C-MTEB/waimai-classification name: MTEB Waimai config: default split: test revision: None metrics: - type: accuracy value: 84.54 - type: ap value: 66.13603199670062 - type: f1 value: 82.61420654584116 ---

\"Jina

The text embedding set trained by Jina AI.

## Quick Start

The easiest way to start using `jina-embeddings-v2-base-zh` is Jina AI's Embedding API.

## Intended Usage & Model Info

`jina-embeddings-v2-base-zh` is a Chinese/English bilingual text **embedding model** supporting an **8192-token sequence length**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths; this is the first application of ALiBi to an encoder architecture. We have designed it for high performance in monolingual and cross-lingual applications and trained it specifically to support mixed Chinese-English input without bias. Unlike previous monolingual/multilingual embedding models, this bilingual model is built to better support monolingual (Chinese-to-Chinese) as well as cross-lingual (Chinese-to-English) document retrieval.

Additionally, we provide the following embedding models:

- `jina-embeddings-v2-small-en`: 33 million parameters.
- `jina-embeddings-v2-base-en`: 137 million parameters.
- `jina-embeddings-v2-base-zh`: 161 million parameters, Chinese-English bilingual embeddings **(you are here)**.
- `jina-embeddings-v2-base-de`: 161 million parameters, German-English bilingual embeddings.
- `jina-embeddings-v2-base-es`: Spanish-English bilingual embeddings (soon).
- `jina-embeddings-v2-base-code`: 161 million parameters, code embeddings.

## Data & Parameters

The data and training details are described in this technical report.

## Usage

**Please apply mean pooling when integrating the model.**

### Why mean pooling?

Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function that handles this for you. However, if you would like to do it without the default `encode` function, a sketch follows below.
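The same manual pooling shown for the English model applies to the bilingual checkpoint; a condensed, hedged sketch (the mixed-language inputs are illustrative):

```python
# Condensed manual mean pooling for the bilingual model; a sketch only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-zh")
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

batch = tokenizer(["今天天气怎么样?", "How is the weather today?"],
                  padding=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**batch)[0]  # per-token embeddings
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
```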

You can use Jina Embedding models directly from the transformers package. If you only want to handle shorter sequences, such as 2k tokens, pass the `max_length` parameter to the `encode` function (see the sketch after this section). If you want to use the model together with the sentence-transformers package, make sure that you have installed the latest release and set `trust_remote_code=True` as well. As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face as well).

## Alternatives to Using the Transformers Package

1. _Managed SaaS_: Get started with a free key on Jina AI's Embedding API.
2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploying them on AWS SageMaker.

## Use Jina Embeddings for RAG

According to the latest blog post from LlamaIndex,

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

## Troubleshooting

**Loading of Model Code failed**

If you forgot to pass the `trust_remote_code=True` flag when calling `from_pretrained` or initializing the model via the `AutoModel` class, you will receive an error that the model weights could not be initialized. This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model.

**User is not logged into Huggingface**

The model is only available under gated access. This means you need to be logged in to Hugging Face to load it. If you receive the following error, you need to provide an access token, either by using the huggingface-cli or by providing the token via an environment variable as described above:

## Contact

Join our Discord community and chat with other community members about ideas.

## Citation

If you find Jina Embeddings useful in your research, please cite the following paper:", + "model_explanation_gemini": "Generates Chinese and English text embeddings for tasks like sentence similarity, classification, clustering, and retrieval."
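For the `encode` helper described above, a hedged sketch: the helper is attached by the model's remote code, and the 2048-token cap shown here is an illustrative value, not a requirement.

```python
# Sketch: the remote code attaches an encode() helper to the loaded model;
# max_length caps the sequence length for faster handling of shorter inputs.
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

embeddings = model.encode(
    ["How is the weather today?", "今天天气怎么样?"],
    max_length=2048,
)
```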
+} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-embeddings-v2-small-en.json b/data/model_data_json/jinaai_jina-embeddings-v2-small-en.json new file mode 100644 index 0000000000000000000000000000000000000000..3a0b881af1f99883d26c28b772ffcdaea9992b2c --- /dev/null +++ b/data/model_data_json/jinaai_jina-embeddings-v2-small-en.json @@ -0,0 +1,26 @@ +{ + "model_id": "jinaai/jina-embeddings-v2-small-en", + "downloads": 75783, + "tags": [ + "sentence-transformers", + "pytorch", + "coreml", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "custom_code", + "en", + "dataset:jinaai/negation-dataset", + "arxiv:2108.12409", + "arxiv:2310.19923", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "region:us" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb datasets: - jinaai/negation-dataset language: en inference: false license: apache-2.0 model-index: - name: jina-embedding-s-en-v2 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 71.35820895522387 - type: ap value: 33.99931933598115 - type: f1 value: 65.3853685535555 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 82.90140000000001 - type: ap value: 78.01434597815617 - type: f1 value: 82.83357802722676 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 40.88999999999999 - type: f1 value: 39.209432767163456 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 23.257 - type: map_at_10 value: 37.946000000000005 - type: map_at_100 value: 39.17 - type: map_at_1000 value: 39.181 - type: map_at_3 value: 32.99 - type: map_at_5 value: 35.467999999999996 - type: mrr_at_1 value: 23.541999999999998 - type: mrr_at_10 value: 38.057 - type: mrr_at_100 value: 39.289 - type: mrr_at_1000 value: 39.299 - type: mrr_at_3 value: 33.096 - type: mrr_at_5 value: 35.628 - type: ndcg_at_1 value: 23.257 - type: ndcg_at_10 value: 46.729 - type: ndcg_at_100 value: 51.900999999999996 - type: ndcg_at_1000 value: 52.16 - type: ndcg_at_3 value: 36.323 - type: ndcg_at_5 value: 40.766999999999996 - type: precision_at_1 value: 23.257 - type: precision_at_10 value: 7.510999999999999 - type: precision_at_100 value: 0.976 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 15.339 - type: precision_at_5 value: 11.350999999999999 - type: recall_at_1 value: 23.257 - type: recall_at_10 value: 75.107 - type: recall_at_100 value: 97.58200000000001 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 46.017 - type: recall_at_5 value: 56.757000000000005 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 44.02420878391967 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S 
config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 35.16136856000258 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 59.61809790513646 - type: mrr value: 73.07215406938397 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 82.0167350090749 - type: cos_sim_spearman value: 80.51569002630401 - type: euclidean_pearson value: 81.46820525099726 - type: euclidean_spearman value: 80.51569002630401 - type: manhattan_pearson value: 81.35596555056757 - type: manhattan_spearman value: 80.12592210903303 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 78.25 - type: f1 value: 77.34950913540605 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 35.57238596005698 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 29.066444306196683 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.891000000000002 - type: map_at_10 value: 42.772 - type: map_at_100 value: 44.108999999999995 - type: map_at_1000 value: 44.236 - type: map_at_3 value: 39.289 - type: map_at_5 value: 41.113 - type: mrr_at_1 value: 39.342 - type: mrr_at_10 value: 48.852000000000004 - type: mrr_at_100 value: 49.534 - type: mrr_at_1000 value: 49.582 - type: mrr_at_3 value: 46.089999999999996 - type: mrr_at_5 value: 47.685 - type: ndcg_at_1 value: 39.342 - type: ndcg_at_10 value: 48.988 - type: ndcg_at_100 value: 53.854 - type: ndcg_at_1000 value: 55.955 - type: ndcg_at_3 value: 43.877 - type: ndcg_at_5 value: 46.027 - type: precision_at_1 value: 39.342 - type: precision_at_10 value: 9.285 - type: precision_at_100 value: 1.488 - type: precision_at_1000 value: 0.194 - type: precision_at_3 value: 20.696 - type: precision_at_5 value: 14.878 - type: recall_at_1 value: 31.891000000000002 - type: recall_at_10 value: 60.608 - type: recall_at_100 value: 81.025 - type: recall_at_1000 value: 94.883 - type: recall_at_3 value: 45.694 - type: recall_at_5 value: 51.684 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.778 - type: map_at_10 value: 37.632 - type: map_at_100 value: 38.800000000000004 - type: map_at_1000 value: 38.934999999999995 - type: map_at_3 value: 35.293 - type: map_at_5 value: 36.547000000000004 - type: mrr_at_1 value: 35.35 - type: mrr_at_10 value: 42.936 - type: mrr_at_100 value: 43.69 - type: mrr_at_1000 value: 43.739 - type: mrr_at_3 value: 41.062 - type: mrr_at_5 value: 42.097 - type: ndcg_at_1 value: 35.35 - type: ndcg_at_10 value: 42.528 - type: ndcg_at_100 value: 46.983000000000004 - type: ndcg_at_1000 value: 49.187999999999995 - type: ndcg_at_3 value: 39.271 - 
type: ndcg_at_5 value: 40.654 - type: precision_at_1 value: 35.35 - type: precision_at_10 value: 7.828 - type: precision_at_100 value: 1.3010000000000002 - type: precision_at_1000 value: 0.17700000000000002 - type: precision_at_3 value: 18.96 - type: precision_at_5 value: 13.120999999999999 - type: recall_at_1 value: 28.778 - type: recall_at_10 value: 50.775000000000006 - type: recall_at_100 value: 69.66799999999999 - type: recall_at_1000 value: 83.638 - type: recall_at_3 value: 40.757 - type: recall_at_5 value: 44.86 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 37.584 - type: map_at_10 value: 49.69 - type: map_at_100 value: 50.639 - type: map_at_1000 value: 50.702999999999996 - type: map_at_3 value: 46.61 - type: map_at_5 value: 48.486000000000004 - type: mrr_at_1 value: 43.009 - type: mrr_at_10 value: 52.949999999999996 - type: mrr_at_100 value: 53.618 - type: mrr_at_1000 value: 53.65299999999999 - type: mrr_at_3 value: 50.605999999999995 - type: mrr_at_5 value: 52.095 - type: ndcg_at_1 value: 43.009 - type: ndcg_at_10 value: 55.278000000000006 - type: ndcg_at_100 value: 59.134 - type: ndcg_at_1000 value: 60.528999999999996 - type: ndcg_at_3 value: 50.184 - type: ndcg_at_5 value: 52.919000000000004 - type: precision_at_1 value: 43.009 - type: precision_at_10 value: 8.821 - type: precision_at_100 value: 1.161 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 22.424 - type: precision_at_5 value: 15.436 - type: recall_at_1 value: 37.584 - type: recall_at_10 value: 68.514 - type: recall_at_100 value: 85.099 - type: recall_at_1000 value: 95.123 - type: recall_at_3 value: 55.007 - type: recall_at_5 value: 61.714999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.7 - type: map_at_10 value: 32.804 - type: map_at_100 value: 33.738 - type: map_at_1000 value: 33.825 - type: map_at_3 value: 30.639 - type: map_at_5 value: 31.781 - type: mrr_at_1 value: 26.328000000000003 - type: mrr_at_10 value: 34.679 - type: mrr_at_100 value: 35.510000000000005 - type: mrr_at_1000 value: 35.577999999999996 - type: mrr_at_3 value: 32.58 - type: mrr_at_5 value: 33.687 - type: ndcg_at_1 value: 26.328000000000003 - type: ndcg_at_10 value: 37.313 - type: ndcg_at_100 value: 42.004000000000005 - type: ndcg_at_1000 value: 44.232 - type: ndcg_at_3 value: 33.076 - type: ndcg_at_5 value: 34.966 - type: precision_at_1 value: 26.328000000000003 - type: precision_at_10 value: 5.627 - type: precision_at_100 value: 0.8410000000000001 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 14.011000000000001 - type: precision_at_5 value: 9.582 - type: recall_at_1 value: 24.7 - type: recall_at_10 value: 49.324 - type: recall_at_100 value: 71.018 - type: recall_at_1000 value: 87.905 - type: recall_at_3 value: 37.7 - type: recall_at_5 value: 42.281 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 14.350999999999999 - type: map_at_10 value: 21.745 - type: map_at_100 value: 22.731 - type: map_at_1000 value: 22.852 - type: map_at_3 value: 19.245 - type: map_at_5 value: 20.788 - type: mrr_at_1 value: 18.159 - type: mrr_at_10 value: 25.833000000000002 - type: mrr_at_100 value: 26.728 - type: mrr_at_1000 value: 26.802 - type: mrr_at_3 
value: 23.383000000000003 - type: mrr_at_5 value: 24.887999999999998 - type: ndcg_at_1 value: 18.159 - type: ndcg_at_10 value: 26.518000000000004 - type: ndcg_at_100 value: 31.473000000000003 - type: ndcg_at_1000 value: 34.576 - type: ndcg_at_3 value: 21.907 - type: ndcg_at_5 value: 24.39 - type: precision_at_1 value: 18.159 - type: precision_at_10 value: 4.938 - type: precision_at_100 value: 0.853 - type: precision_at_1000 value: 0.125 - type: precision_at_3 value: 10.655000000000001 - type: precision_at_5 value: 7.985 - type: recall_at_1 value: 14.350999999999999 - type: recall_at_10 value: 37.284 - type: recall_at_100 value: 59.11300000000001 - type: recall_at_1000 value: 81.634 - type: recall_at_3 value: 24.753 - type: recall_at_5 value: 30.979 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.978 - type: map_at_10 value: 36.276 - type: map_at_100 value: 37.547000000000004 - type: map_at_1000 value: 37.678 - type: map_at_3 value: 33.674 - type: map_at_5 value: 35.119 - type: mrr_at_1 value: 32.916000000000004 - type: mrr_at_10 value: 41.798 - type: mrr_at_100 value: 42.72 - type: mrr_at_1000 value: 42.778 - type: mrr_at_3 value: 39.493 - type: mrr_at_5 value: 40.927 - type: ndcg_at_1 value: 32.916000000000004 - type: ndcg_at_10 value: 41.81 - type: ndcg_at_100 value: 47.284 - type: ndcg_at_1000 value: 49.702 - type: ndcg_at_3 value: 37.486999999999995 - type: ndcg_at_5 value: 39.597 - type: precision_at_1 value: 32.916000000000004 - type: precision_at_10 value: 7.411 - type: precision_at_100 value: 1.189 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 17.581 - type: precision_at_5 value: 12.397 - type: recall_at_1 value: 26.978 - type: recall_at_10 value: 52.869 - type: recall_at_100 value: 75.78399999999999 - type: recall_at_1000 value: 91.545 - type: recall_at_3 value: 40.717 - type: recall_at_5 value: 46.168 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.641 - type: map_at_10 value: 32.916000000000004 - type: map_at_100 value: 34.165 - type: map_at_1000 value: 34.286 - type: map_at_3 value: 30.335 - type: map_at_5 value: 31.569000000000003 - type: mrr_at_1 value: 30.593999999999998 - type: mrr_at_10 value: 38.448 - type: mrr_at_100 value: 39.299 - type: mrr_at_1000 value: 39.362 - type: mrr_at_3 value: 36.244 - type: mrr_at_5 value: 37.232 - type: ndcg_at_1 value: 30.593999999999998 - type: ndcg_at_10 value: 38.2 - type: ndcg_at_100 value: 43.742 - type: ndcg_at_1000 value: 46.217000000000006 - type: ndcg_at_3 value: 33.925 - type: ndcg_at_5 value: 35.394 - type: precision_at_1 value: 30.593999999999998 - type: precision_at_10 value: 6.895 - type: precision_at_100 value: 1.1320000000000001 - type: precision_at_1000 value: 0.153 - type: precision_at_3 value: 16.096 - type: precision_at_5 value: 11.05 - type: recall_at_1 value: 24.641 - type: recall_at_10 value: 48.588 - type: recall_at_100 value: 72.841 - type: recall_at_1000 value: 89.535 - type: recall_at_3 value: 36.087 - type: recall_at_5 value: 40.346 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.79425 - type: map_at_10 value: 33.12033333333333 - type: map_at_100 value: 34.221333333333334 - type: map_at_1000 value: 34.3435 - type: map_at_3 
value: 30.636583333333338 - type: map_at_5 value: 31.974083333333326 - type: mrr_at_1 value: 29.242416666666664 - type: mrr_at_10 value: 37.11675 - type: mrr_at_100 value: 37.93783333333334 - type: mrr_at_1000 value: 38.003083333333336 - type: mrr_at_3 value: 34.904666666666664 - type: mrr_at_5 value: 36.12916666666667 - type: ndcg_at_1 value: 29.242416666666664 - type: ndcg_at_10 value: 38.03416666666667 - type: ndcg_at_100 value: 42.86674999999999 - type: ndcg_at_1000 value: 45.34550000000001 - type: ndcg_at_3 value: 33.76466666666666 - type: ndcg_at_5 value: 35.668666666666674 - type: precision_at_1 value: 29.242416666666664 - type: precision_at_10 value: 6.589833333333334 - type: precision_at_100 value: 1.0693333333333332 - type: precision_at_1000 value: 0.14641666666666667 - type: precision_at_3 value: 15.430749999999998 - type: precision_at_5 value: 10.833833333333333 - type: recall_at_1 value: 24.79425 - type: recall_at_10 value: 48.582916666666655 - type: recall_at_100 value: 69.88499999999999 - type: recall_at_1000 value: 87.211 - type: recall_at_3 value: 36.625499999999995 - type: recall_at_5 value: 41.553999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.767 - type: map_at_10 value: 28.450999999999997 - type: map_at_100 value: 29.332 - type: map_at_1000 value: 29.426000000000002 - type: map_at_3 value: 26.379 - type: map_at_5 value: 27.584999999999997 - type: mrr_at_1 value: 25.46 - type: mrr_at_10 value: 30.974 - type: mrr_at_100 value: 31.784000000000002 - type: mrr_at_1000 value: 31.857999999999997 - type: mrr_at_3 value: 28.962 - type: mrr_at_5 value: 30.066 - type: ndcg_at_1 value: 25.46 - type: ndcg_at_10 value: 32.041 - type: ndcg_at_100 value: 36.522 - type: ndcg_at_1000 value: 39.101 - type: ndcg_at_3 value: 28.152 - type: ndcg_at_5 value: 30.03 - type: precision_at_1 value: 25.46 - type: precision_at_10 value: 4.893 - type: precision_at_100 value: 0.77 - type: precision_at_1000 value: 0.107 - type: precision_at_3 value: 11.605 - type: precision_at_5 value: 8.19 - type: recall_at_1 value: 22.767 - type: recall_at_10 value: 40.71 - type: recall_at_100 value: 61.334999999999994 - type: recall_at_1000 value: 80.567 - type: recall_at_3 value: 30.198000000000004 - type: recall_at_5 value: 34.803 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.722 - type: map_at_10 value: 22.794 - type: map_at_100 value: 23.7 - type: map_at_1000 value: 23.822 - type: map_at_3 value: 20.781 - type: map_at_5 value: 22.024 - type: mrr_at_1 value: 20.061999999999998 - type: mrr_at_10 value: 26.346999999999998 - type: mrr_at_100 value: 27.153 - type: mrr_at_1000 value: 27.233 - type: mrr_at_3 value: 24.375 - type: mrr_at_5 value: 25.593 - type: ndcg_at_1 value: 20.061999999999998 - type: ndcg_at_10 value: 26.785999999999998 - type: ndcg_at_100 value: 31.319999999999997 - type: ndcg_at_1000 value: 34.346 - type: ndcg_at_3 value: 23.219 - type: ndcg_at_5 value: 25.107000000000003 - type: precision_at_1 value: 20.061999999999998 - type: precision_at_10 value: 4.78 - type: precision_at_100 value: 0.83 - type: precision_at_1000 value: 0.125 - type: precision_at_3 value: 10.874 - type: precision_at_5 value: 7.956 - type: recall_at_1 value: 16.722 - type: recall_at_10 value: 35.204 - type: recall_at_100 value: 55.797 - type: recall_at_1000 value: 77.689 
- type: recall_at_3 value: 25.245 - type: recall_at_5 value: 30.115 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.842 - type: map_at_10 value: 32.917 - type: map_at_100 value: 33.961000000000006 - type: map_at_1000 value: 34.069 - type: map_at_3 value: 30.595 - type: map_at_5 value: 31.837 - type: mrr_at_1 value: 29.011 - type: mrr_at_10 value: 36.977 - type: mrr_at_100 value: 37.814 - type: mrr_at_1000 value: 37.885999999999996 - type: mrr_at_3 value: 34.966 - type: mrr_at_5 value: 36.043 - type: ndcg_at_1 value: 29.011 - type: ndcg_at_10 value: 37.735 - type: ndcg_at_100 value: 42.683 - type: ndcg_at_1000 value: 45.198 - type: ndcg_at_3 value: 33.650000000000006 - type: ndcg_at_5 value: 35.386 - type: precision_at_1 value: 29.011 - type: precision_at_10 value: 6.259 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 15.329999999999998 - type: precision_at_5 value: 10.541 - type: recall_at_1 value: 24.842 - type: recall_at_10 value: 48.304 - type: recall_at_100 value: 70.04899999999999 - type: recall_at_1000 value: 87.82600000000001 - type: recall_at_3 value: 36.922 - type: recall_at_5 value: 41.449999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.252000000000002 - type: map_at_10 value: 32.293 - type: map_at_100 value: 33.816 - type: map_at_1000 value: 34.053 - type: map_at_3 value: 29.781999999999996 - type: map_at_5 value: 31.008000000000003 - type: mrr_at_1 value: 29.051 - type: mrr_at_10 value: 36.722 - type: mrr_at_100 value: 37.663000000000004 - type: mrr_at_1000 value: 37.734 - type: mrr_at_3 value: 34.354 - type: mrr_at_5 value: 35.609 - type: ndcg_at_1 value: 29.051 - type: ndcg_at_10 value: 37.775999999999996 - type: ndcg_at_100 value: 43.221 - type: ndcg_at_1000 value: 46.116 - type: ndcg_at_3 value: 33.403 - type: ndcg_at_5 value: 35.118 - type: precision_at_1 value: 29.051 - type: precision_at_10 value: 7.332 - type: precision_at_100 value: 1.49 - type: precision_at_1000 value: 0.23600000000000002 - type: precision_at_3 value: 15.415000000000001 - type: precision_at_5 value: 11.107 - type: recall_at_1 value: 24.252000000000002 - type: recall_at_10 value: 47.861 - type: recall_at_100 value: 72.21600000000001 - type: recall_at_1000 value: 90.886 - type: recall_at_3 value: 35.533 - type: recall_at_5 value: 39.959 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 20.025000000000002 - type: map_at_10 value: 27.154 - type: map_at_100 value: 28.118 - type: map_at_1000 value: 28.237000000000002 - type: map_at_3 value: 25.017 - type: map_at_5 value: 25.832 - type: mrr_at_1 value: 21.627 - type: mrr_at_10 value: 28.884999999999998 - type: mrr_at_100 value: 29.741 - type: mrr_at_1000 value: 29.831999999999997 - type: mrr_at_3 value: 26.741 - type: mrr_at_5 value: 27.628000000000004 - type: ndcg_at_1 value: 21.627 - type: ndcg_at_10 value: 31.436999999999998 - type: ndcg_at_100 value: 36.181000000000004 - type: ndcg_at_1000 value: 38.986 - type: ndcg_at_3 value: 27.025 - type: ndcg_at_5 value: 28.436 - type: precision_at_1 value: 21.627 - type: precision_at_10 value: 5.009 - type: precision_at_100 value: 0.7929999999999999 - type: precision_at_1000 
value: 0.11299999999999999 - type: precision_at_3 value: 11.522 - type: precision_at_5 value: 7.763000000000001 - type: recall_at_1 value: 20.025000000000002 - type: recall_at_10 value: 42.954 - type: recall_at_100 value: 64.67500000000001 - type: recall_at_1000 value: 85.301 - type: recall_at_3 value: 30.892999999999997 - type: recall_at_5 value: 34.288000000000004 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 10.079 - type: map_at_10 value: 16.930999999999997 - type: map_at_100 value: 18.398999999999997 - type: map_at_1000 value: 18.561 - type: map_at_3 value: 14.294 - type: map_at_5 value: 15.579 - type: mrr_at_1 value: 22.606 - type: mrr_at_10 value: 32.513 - type: mrr_at_100 value: 33.463 - type: mrr_at_1000 value: 33.513999999999996 - type: mrr_at_3 value: 29.479 - type: mrr_at_5 value: 31.3 - type: ndcg_at_1 value: 22.606 - type: ndcg_at_10 value: 24.053 - type: ndcg_at_100 value: 30.258000000000003 - type: ndcg_at_1000 value: 33.516 - type: ndcg_at_3 value: 19.721 - type: ndcg_at_5 value: 21.144 - type: precision_at_1 value: 22.606 - type: precision_at_10 value: 7.55 - type: precision_at_100 value: 1.399 - type: precision_at_1000 value: 0.2 - type: precision_at_3 value: 14.701 - type: precision_at_5 value: 11.192 - type: recall_at_1 value: 10.079 - type: recall_at_10 value: 28.970000000000002 - type: recall_at_100 value: 50.805 - type: recall_at_1000 value: 69.378 - type: recall_at_3 value: 18.199 - type: recall_at_5 value: 22.442 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 7.794 - type: map_at_10 value: 15.165999999999999 - type: map_at_100 value: 20.508000000000003 - type: map_at_1000 value: 21.809 - type: map_at_3 value: 11.568000000000001 - type: map_at_5 value: 13.059000000000001 - type: mrr_at_1 value: 56.49999999999999 - type: mrr_at_10 value: 65.90899999999999 - type: mrr_at_100 value: 66.352 - type: mrr_at_1000 value: 66.369 - type: mrr_at_3 value: 64.0 - type: mrr_at_5 value: 65.10000000000001 - type: ndcg_at_1 value: 44.25 - type: ndcg_at_10 value: 32.649 - type: ndcg_at_100 value: 36.668 - type: ndcg_at_1000 value: 43.918 - type: ndcg_at_3 value: 37.096000000000004 - type: ndcg_at_5 value: 34.048 - type: precision_at_1 value: 56.49999999999999 - type: precision_at_10 value: 25.45 - type: precision_at_100 value: 8.055 - type: precision_at_1000 value: 1.7489999999999999 - type: precision_at_3 value: 41.0 - type: precision_at_5 value: 32.85 - type: recall_at_1 value: 7.794 - type: recall_at_10 value: 20.101 - type: recall_at_100 value: 42.448 - type: recall_at_1000 value: 65.88000000000001 - type: recall_at_3 value: 12.753 - type: recall_at_5 value: 15.307 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 44.01 - type: f1 value: 38.659680951114964 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 49.713 - type: map_at_10 value: 61.79 - type: map_at_100 value: 62.28 - type: map_at_1000 value: 62.297000000000004 - type: map_at_3 value: 59.361 - type: map_at_5 value: 60.92100000000001 - type: mrr_at_1 value: 53.405 - type: mrr_at_10 value: 65.79899999999999 - type: mrr_at_100 value: 66.219 - type: mrr_at_1000 value: 66.227 - type: 
mrr_at_3 value: 63.431000000000004 - type: mrr_at_5 value: 64.98 - type: ndcg_at_1 value: 53.405 - type: ndcg_at_10 value: 68.01899999999999 - type: ndcg_at_100 value: 70.197 - type: ndcg_at_1000 value: 70.571 - type: ndcg_at_3 value: 63.352 - type: ndcg_at_5 value: 66.018 - type: precision_at_1 value: 53.405 - type: precision_at_10 value: 9.119 - type: precision_at_100 value: 1.03 - type: precision_at_1000 value: 0.107 - type: precision_at_3 value: 25.602999999999998 - type: precision_at_5 value: 16.835 - type: recall_at_1 value: 49.713 - type: recall_at_10 value: 83.306 - type: recall_at_100 value: 92.92 - type: recall_at_1000 value: 95.577 - type: recall_at_3 value: 70.798 - type: recall_at_5 value: 77.254 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 15.310000000000002 - type: map_at_10 value: 26.204 - type: map_at_100 value: 27.932000000000002 - type: map_at_1000 value: 28.121000000000002 - type: map_at_3 value: 22.481 - type: map_at_5 value: 24.678 - type: mrr_at_1 value: 29.784 - type: mrr_at_10 value: 39.582 - type: mrr_at_100 value: 40.52 - type: mrr_at_1000 value: 40.568 - type: mrr_at_3 value: 37.114000000000004 - type: mrr_at_5 value: 38.596000000000004 - type: ndcg_at_1 value: 29.784 - type: ndcg_at_10 value: 33.432 - type: ndcg_at_100 value: 40.281 - type: ndcg_at_1000 value: 43.653999999999996 - type: ndcg_at_3 value: 29.612 - type: ndcg_at_5 value: 31.223 - type: precision_at_1 value: 29.784 - type: precision_at_10 value: 9.645 - type: precision_at_100 value: 1.645 - type: precision_at_1000 value: 0.22499999999999998 - type: precision_at_3 value: 20.165 - type: precision_at_5 value: 15.401000000000002 - type: recall_at_1 value: 15.310000000000002 - type: recall_at_10 value: 40.499 - type: recall_at_100 value: 66.643 - type: recall_at_1000 value: 87.059 - type: recall_at_3 value: 27.492 - type: recall_at_5 value: 33.748 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 33.599000000000004 - type: map_at_10 value: 47.347 - type: map_at_100 value: 48.191 - type: map_at_1000 value: 48.263 - type: map_at_3 value: 44.698 - type: map_at_5 value: 46.278999999999996 - type: mrr_at_1 value: 67.19800000000001 - type: mrr_at_10 value: 74.054 - type: mrr_at_100 value: 74.376 - type: mrr_at_1000 value: 74.392 - type: mrr_at_3 value: 72.849 - type: mrr_at_5 value: 73.643 - type: ndcg_at_1 value: 67.19800000000001 - type: ndcg_at_10 value: 56.482 - type: ndcg_at_100 value: 59.694 - type: ndcg_at_1000 value: 61.204 - type: ndcg_at_3 value: 52.43299999999999 - type: ndcg_at_5 value: 54.608000000000004 - type: precision_at_1 value: 67.19800000000001 - type: precision_at_10 value: 11.613999999999999 - type: precision_at_100 value: 1.415 - type: precision_at_1000 value: 0.16199999999999998 - type: precision_at_3 value: 32.726 - type: precision_at_5 value: 21.349999999999998 - type: recall_at_1 value: 33.599000000000004 - type: recall_at_10 value: 58.069 - type: recall_at_100 value: 70.736 - type: recall_at_1000 value: 80.804 - type: recall_at_3 value: 49.088 - type: recall_at_5 value: 53.376000000000005 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 73.64359999999999 - type: ap value: 67.54685976014599 - type: f1 value: 73.55148707559482 - task: type: Retrieval 
dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 19.502 - type: map_at_10 value: 30.816 - type: map_at_100 value: 32.007999999999996 - type: map_at_1000 value: 32.067 - type: map_at_3 value: 27.215 - type: map_at_5 value: 29.304000000000002 - type: mrr_at_1 value: 20.072000000000003 - type: mrr_at_10 value: 31.406 - type: mrr_at_100 value: 32.549 - type: mrr_at_1000 value: 32.602 - type: mrr_at_3 value: 27.839000000000002 - type: mrr_at_5 value: 29.926000000000002 - type: ndcg_at_1 value: 20.086000000000002 - type: ndcg_at_10 value: 37.282 - type: ndcg_at_100 value: 43.206 - type: ndcg_at_1000 value: 44.690000000000005 - type: ndcg_at_3 value: 29.932 - type: ndcg_at_5 value: 33.668 - type: precision_at_1 value: 20.086000000000002 - type: precision_at_10 value: 5.961 - type: precision_at_100 value: 0.898 - type: precision_at_1000 value: 0.10200000000000001 - type: precision_at_3 value: 12.856000000000002 - type: precision_at_5 value: 9.596 - type: recall_at_1 value: 19.502 - type: recall_at_10 value: 57.182 - type: recall_at_100 value: 84.952 - type: recall_at_1000 value: 96.34700000000001 - type: recall_at_3 value: 37.193 - type: recall_at_5 value: 46.157 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.96488828089375 - type: f1 value: 93.32119260543482 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.4965800273598 - type: f1 value: 49.34896217536082 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 67.60928043039678 - type: f1 value: 64.34244712074538 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 69.75453934095493 - type: f1 value: 68.39224867489249 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 31.862573504920082 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 27.511123551196803 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.99145104942086 - type: mrr value: 32.03606480418627 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.015 - type: map_at_10 value: 11.054 - type: map_at_100 value: 13.773 - type: map_at_1000 value: 15.082999999999998 - type: map_at_3 value: 8.253 - type: map_at_5 value: 9.508999999999999 - type: mrr_at_1 value: 42.105 - type: mrr_at_10 value: 50.44499999999999 - type: mrr_at_100 value: 51.080000000000005 - type: mrr_at_1000 value: 51.129999999999995 - type: mrr_at_3 
value: 48.555 - type: mrr_at_5 value: 49.84 - type: ndcg_at_1 value: 40.402 - type: ndcg_at_10 value: 30.403000000000002 - type: ndcg_at_100 value: 28.216 - type: ndcg_at_1000 value: 37.021 - type: ndcg_at_3 value: 35.53 - type: ndcg_at_5 value: 33.202999999999996 - type: precision_at_1 value: 42.105 - type: precision_at_10 value: 22.353 - type: precision_at_100 value: 7.266 - type: precision_at_1000 value: 2.011 - type: precision_at_3 value: 32.921 - type: precision_at_5 value: 28.297 - type: recall_at_1 value: 5.015 - type: recall_at_10 value: 14.393 - type: recall_at_100 value: 28.893 - type: recall_at_1000 value: 60.18 - type: recall_at_3 value: 9.184000000000001 - type: recall_at_5 value: 11.39 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 29.524 - type: map_at_10 value: 44.182 - type: map_at_100 value: 45.228 - type: map_at_1000 value: 45.265 - type: map_at_3 value: 39.978 - type: map_at_5 value: 42.482 - type: mrr_at_1 value: 33.256 - type: mrr_at_10 value: 46.661 - type: mrr_at_100 value: 47.47 - type: mrr_at_1000 value: 47.496 - type: mrr_at_3 value: 43.187999999999995 - type: mrr_at_5 value: 45.330999999999996 - type: ndcg_at_1 value: 33.227000000000004 - type: ndcg_at_10 value: 51.589 - type: ndcg_at_100 value: 56.043 - type: ndcg_at_1000 value: 56.937000000000005 - type: ndcg_at_3 value: 43.751 - type: ndcg_at_5 value: 47.937000000000005 - type: precision_at_1 value: 33.227000000000004 - type: precision_at_10 value: 8.556999999999999 - type: precision_at_100 value: 1.103 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 19.921 - type: precision_at_5 value: 14.396999999999998 - type: recall_at_1 value: 29.524 - type: recall_at_10 value: 71.615 - type: recall_at_100 value: 91.056 - type: recall_at_1000 value: 97.72800000000001 - type: recall_at_3 value: 51.451 - type: recall_at_5 value: 61.119 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 69.596 - type: map_at_10 value: 83.281 - type: map_at_100 value: 83.952 - type: map_at_1000 value: 83.97200000000001 - type: map_at_3 value: 80.315 - type: map_at_5 value: 82.223 - type: mrr_at_1 value: 80.17 - type: mrr_at_10 value: 86.522 - type: mrr_at_100 value: 86.644 - type: mrr_at_1000 value: 86.64500000000001 - type: mrr_at_3 value: 85.438 - type: mrr_at_5 value: 86.21799999999999 - type: ndcg_at_1 value: 80.19 - type: ndcg_at_10 value: 87.19 - type: ndcg_at_100 value: 88.567 - type: ndcg_at_1000 value: 88.70400000000001 - type: ndcg_at_3 value: 84.17999999999999 - type: ndcg_at_5 value: 85.931 - type: precision_at_1 value: 80.19 - type: precision_at_10 value: 13.209000000000001 - type: precision_at_100 value: 1.518 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 36.717 - type: precision_at_5 value: 24.248 - type: recall_at_1 value: 69.596 - type: recall_at_10 value: 94.533 - type: recall_at_100 value: 99.322 - type: recall_at_1000 value: 99.965 - type: recall_at_3 value: 85.911 - type: recall_at_5 value: 90.809 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 49.27650627571912 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 
282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 57.08550946534183 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.568 - type: map_at_10 value: 10.862 - type: map_at_100 value: 12.757 - type: map_at_1000 value: 13.031 - type: map_at_3 value: 7.960000000000001 - type: map_at_5 value: 9.337 - type: mrr_at_1 value: 22.5 - type: mrr_at_10 value: 32.6 - type: mrr_at_100 value: 33.603 - type: mrr_at_1000 value: 33.672000000000004 - type: mrr_at_3 value: 29.299999999999997 - type: mrr_at_5 value: 31.25 - type: ndcg_at_1 value: 22.5 - type: ndcg_at_10 value: 18.605 - type: ndcg_at_100 value: 26.029999999999998 - type: ndcg_at_1000 value: 31.256 - type: ndcg_at_3 value: 17.873 - type: ndcg_at_5 value: 15.511 - type: precision_at_1 value: 22.5 - type: precision_at_10 value: 9.58 - type: precision_at_100 value: 2.033 - type: precision_at_1000 value: 0.33 - type: precision_at_3 value: 16.633 - type: precision_at_5 value: 13.54 - type: recall_at_1 value: 4.568 - type: recall_at_10 value: 19.402 - type: recall_at_100 value: 41.277 - type: recall_at_1000 value: 66.963 - type: recall_at_3 value: 10.112 - type: recall_at_5 value: 13.712 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.31992291680787 - type: cos_sim_spearman value: 76.7212346922664 - type: euclidean_pearson value: 80.42189271706478 - type: euclidean_spearman value: 76.7212342532493 - type: manhattan_pearson value: 80.33171093031578 - type: manhattan_spearman value: 76.63192883074694 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.16654278886763 - type: cos_sim_spearman value: 73.66390263429565 - type: euclidean_pearson value: 79.7485360086639 - type: euclidean_spearman value: 73.66389870373436 - type: manhattan_pearson value: 79.73652237443706 - type: manhattan_spearman value: 73.65296117151647 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.40389689929246 - type: cos_sim_spearman value: 83.29727595993955 - type: euclidean_pearson value: 82.23970587854079 - type: euclidean_spearman value: 83.29727595993955 - type: manhattan_pearson value: 82.18823600831897 - type: manhattan_spearman value: 83.20746192209594 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 81.73505246913413 - type: cos_sim_spearman value: 79.1686548248754 - type: euclidean_pearson value: 80.48889135993412 - type: euclidean_spearman value: 79.16864112930354 - type: manhattan_pearson value: 80.40720651057302 - type: manhattan_spearman value: 79.0640155089286 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.3953512879065 - type: cos_sim_spearman value: 87.29947322714338 - type: euclidean_pearson value: 86.59759438529645 - type: euclidean_spearman value: 87.29947511092824 - type: manhattan_pearson value: 86.52097806169155 - type: manhattan_spearman value: 87.22987242146534 - task: 
type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 82.48565753792056 - type: cos_sim_spearman value: 83.6049720319893 - type: euclidean_pearson value: 82.56452023172913 - type: euclidean_spearman value: 83.60490168191697 - type: manhattan_pearson value: 82.58079941137872 - type: manhattan_spearman value: 83.60975807374051 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.18239976618212 - type: cos_sim_spearman value: 88.23061724730616 - type: euclidean_pearson value: 87.78482472776658 - type: euclidean_spearman value: 88.23061724730616 - type: manhattan_pearson value: 87.75059641730239 - type: manhattan_spearman value: 88.22527413524622 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.42816418706765 - type: cos_sim_spearman value: 63.4569864520124 - type: euclidean_pearson value: 64.35405409953853 - type: euclidean_spearman value: 63.4569864520124 - type: manhattan_pearson value: 63.96649236073056 - type: manhattan_spearman value: 63.01448583722708 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 83.41659638047614 - type: cos_sim_spearman value: 84.03893866106175 - type: euclidean_pearson value: 84.2251203953798 - type: euclidean_spearman value: 84.03893866106175 - type: manhattan_pearson value: 84.22733643205514 - type: manhattan_spearman value: 84.06504411263612 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 79.75608022582414 - type: mrr value: 94.0947732369301 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 50.161 - type: map_at_10 value: 59.458999999999996 - type: map_at_100 value: 60.156 - type: map_at_1000 value: 60.194 - type: map_at_3 value: 56.45400000000001 - type: map_at_5 value: 58.165 - type: mrr_at_1 value: 53.333 - type: mrr_at_10 value: 61.050000000000004 - type: mrr_at_100 value: 61.586 - type: mrr_at_1000 value: 61.624 - type: mrr_at_3 value: 58.889 - type: mrr_at_5 value: 60.122 - type: ndcg_at_1 value: 53.333 - type: ndcg_at_10 value: 63.888999999999996 - type: ndcg_at_100 value: 66.963 - type: ndcg_at_1000 value: 68.062 - type: ndcg_at_3 value: 59.01 - type: ndcg_at_5 value: 61.373999999999995 - type: precision_at_1 value: 53.333 - type: precision_at_10 value: 8.633000000000001 - type: precision_at_100 value: 1.027 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 23.111 - type: precision_at_5 value: 15.467 - type: recall_at_1 value: 50.161 - type: recall_at_10 value: 75.922 - type: recall_at_100 value: 90.0 - type: recall_at_1000 value: 98.667 - type: recall_at_3 value: 62.90599999999999 - type: recall_at_5 value: 68.828 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 
metrics: - type: cos_sim_accuracy value: 99.81188118811882 - type: cos_sim_ap value: 95.11619225962413 - type: cos_sim_f1 value: 90.35840484603736 - type: cos_sim_precision value: 91.23343527013252 - type: cos_sim_recall value: 89.5 - type: dot_accuracy value: 99.81188118811882 - type: dot_ap value: 95.11619225962413 - type: dot_f1 value: 90.35840484603736 - type: dot_precision value: 91.23343527013252 - type: dot_recall value: 89.5 - type: euclidean_accuracy value: 99.81188118811882 - type: euclidean_ap value: 95.11619225962413 - type: euclidean_f1 value: 90.35840484603736 - type: euclidean_precision value: 91.23343527013252 - type: euclidean_recall value: 89.5 - type: manhattan_accuracy value: 99.80891089108911 - type: manhattan_ap value: 95.07294266220966 - type: manhattan_f1 value: 90.21794221996959 - type: manhattan_precision value: 91.46968139773895 - type: manhattan_recall value: 89.0 - type: max_accuracy value: 99.81188118811882 - type: max_ap value: 95.11619225962413 - type: max_f1 value: 90.35840484603736 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 55.3481874105239 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.421291695525 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.98746633276634 - type: mrr value: 50.63143249724133 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.009961979844036 - type: cos_sim_spearman value: 30.558416108881044 - type: dot_pearson value: 31.009964941134253 - type: dot_spearman value: 30.545760761761393 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.207 - type: map_at_10 value: 1.6 - type: map_at_100 value: 8.594 - type: map_at_1000 value: 20.213 - type: map_at_3 value: 0.585 - type: map_at_5 value: 0.9039999999999999 - type: mrr_at_1 value: 78.0 - type: mrr_at_10 value: 87.4 - type: mrr_at_100 value: 87.4 - type: mrr_at_1000 value: 87.4 - type: mrr_at_3 value: 86.667 - type: mrr_at_5 value: 87.06700000000001 - type: ndcg_at_1 value: 73.0 - type: ndcg_at_10 value: 65.18 - type: ndcg_at_100 value: 49.631 - type: ndcg_at_1000 value: 43.498999999999995 - type: ndcg_at_3 value: 71.83800000000001 - type: ndcg_at_5 value: 69.271 - type: precision_at_1 value: 78.0 - type: precision_at_10 value: 69.19999999999999 - type: precision_at_100 value: 50.980000000000004 - type: precision_at_1000 value: 19.426 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 74.0 - type: recall_at_1 value: 0.207 - type: recall_at_10 value: 1.822 - type: recall_at_100 value: 11.849 - type: recall_at_1000 value: 40.492 - type: recall_at_3 value: 0.622 - type: recall_at_5 value: 0.9809999999999999 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.001 - type: map_at_10 value: 10.376000000000001 - 
type: map_at_100 value: 16.936999999999998 - type: map_at_1000 value: 18.615000000000002 - type: map_at_3 value: 5.335999999999999 - type: map_at_5 value: 7.374 - type: mrr_at_1 value: 20.408 - type: mrr_at_10 value: 38.29 - type: mrr_at_100 value: 39.33 - type: mrr_at_1000 value: 39.347 - type: mrr_at_3 value: 32.993 - type: mrr_at_5 value: 36.973 - type: ndcg_at_1 value: 17.347 - type: ndcg_at_10 value: 23.515 - type: ndcg_at_100 value: 37.457 - type: ndcg_at_1000 value: 49.439 - type: ndcg_at_3 value: 22.762999999999998 - type: ndcg_at_5 value: 22.622 - type: precision_at_1 value: 20.408 - type: precision_at_10 value: 22.448999999999998 - type: precision_at_100 value: 8.184 - type: precision_at_1000 value: 1.608 - type: precision_at_3 value: 25.85 - type: precision_at_5 value: 25.306 - type: recall_at_1 value: 2.001 - type: recall_at_10 value: 17.422 - type: recall_at_100 value: 51.532999999999994 - type: recall_at_1000 value: 87.466 - type: recall_at_3 value: 6.861000000000001 - type: recall_at_5 value: 10.502 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.54419999999999 - type: ap value: 14.372170450843907 - type: f1 value: 54.94420257390529 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.402942840973395 - type: f1 value: 59.4166538875571 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 41.569064336457906 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.31322644096085 - type: cos_sim_ap value: 72.14518894837381 - type: cos_sim_f1 value: 66.67489813557229 - type: cos_sim_precision value: 62.65954977953121 - type: cos_sim_recall value: 71.2401055408971 - type: dot_accuracy value: 85.31322644096085 - type: dot_ap value: 72.14521480685293 - type: dot_f1 value: 66.67489813557229 - type: dot_precision value: 62.65954977953121 - type: dot_recall value: 71.2401055408971 - type: euclidean_accuracy value: 85.31322644096085 - type: euclidean_ap value: 72.14520820485349 - type: euclidean_f1 value: 66.67489813557229 - type: euclidean_precision value: 62.65954977953121 - type: euclidean_recall value: 71.2401055408971 - type: manhattan_accuracy value: 85.21785778148656 - type: manhattan_ap value: 72.01177147657364 - type: manhattan_f1 value: 66.62594673833374 - type: manhattan_precision value: 62.0336669699727 - type: manhattan_recall value: 71.95250659630607 - type: max_accuracy value: 85.31322644096085 - type: max_ap value: 72.14521480685293 - type: max_f1 value: 66.67489813557229 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.12756626693057 - type: cos_sim_ap value: 86.05430786440826 - type: cos_sim_f1 value: 78.27759692216631 - type: cos_sim_precision value: 
75.33466248931929 - type: cos_sim_recall value: 81.45980905451185 - type: dot_accuracy value: 89.12950673341872 - type: dot_ap value: 86.05431161145492 - type: dot_f1 value: 78.27759692216631 - type: dot_precision value: 75.33466248931929 - type: dot_recall value: 81.45980905451185 - type: euclidean_accuracy value: 89.12756626693057 - type: euclidean_ap value: 86.05431303247397 - type: euclidean_f1 value: 78.27759692216631 - type: euclidean_precision value: 75.33466248931929 - type: euclidean_recall value: 81.45980905451185 - type: manhattan_accuracy value: 89.04994760740482 - type: manhattan_ap value: 86.00860610892074 - type: manhattan_f1 value: 78.1846776005392 - type: manhattan_precision value: 76.10438839480975 - type: manhattan_recall value: 80.3818909762858 - type: max_accuracy value: 89.12950673341872 - type: max_ap value: 86.05431303247397 - type: max_f1 value: 78.27759692216631 ---


The text embedding set trained by Jina AI.

## Quick Start The easiest way to start using `jina-embeddings-v2-small-en` is to use Jina AI's Embedding API. ## Intended Usage & Model Info `jina-embeddings-v2-small-en` is an English, monolingual **embedding model** supporting an **8192 sequence length**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. The backbone is pretrained on the C4 dataset. The model is further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives. These pairs were obtained from various domains and were carefully selected through a thorough cleaning process. The embedding model was trained with a 512 sequence length, but it extrapolates to an 8k sequence length (or even longer) thanks to ALiBi. This makes our model useful for a range of use cases, especially when processing long documents is needed, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG and LLM-based generative search, etc. This model has 33 million parameters, which enables lightning-fast and memory-efficient inference while still delivering impressive performance. Additionally, we provide the following embedding models: - `jina-embeddings-v2-small-en`: 33 million parameters **(you are here)**. - `jina-embeddings-v2-base-en`: 137 million parameters. - `jina-embeddings-v2-base-zh`: 161 million parameters, Chinese-English bilingual embeddings. - `jina-embeddings-v2-base-de`: 161 million parameters, German-English bilingual embeddings. - `jina-embeddings-v2-base-es`: Spanish-English bilingual embeddings (soon). ## Data & Parameters See the Jina Embeddings V2 technical report. ## Usage **Please apply mean pooling when integrating the model (see the usage sketch below).**
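The card's original usage snippets were stripped during extraction. A minimal sketch of the intended pattern, assuming the pooling-aware `encode` helper that ships with the model's remote code (the argument names here are illustrative, not authoritative):

```python
from transformers import AutoModel

# trust_remote_code=True loads the model's custom code, which
# provides an encode() helper that applies mean pooling for you.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-small-en", trust_remote_code=True
)

embeddings = model.encode(
    ["How is the weather today?", "What is the current weather like today?"]
)

# To handle only shorter sequences (e.g. 2k tokens), cap the length:
# embeddings = model.encode(sentences, max_length=2048)
```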

### Why mean pooling? Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has been proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function to deal with this. However, if you would like to do it without using the default `encode` function:
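The manual-pooling snippet was also stripped; below is a sketch with plain `transformers`, following the standard mean-pooling pattern for sentence-embedding models:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, masking out padding positions.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ["How is the weather today?", "What is the current weather like today?"]

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-small-en")
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-small-en", trust_remote_code=True
)

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-length vectors for cosine similarity
```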

You can use Jina Embedding models directly from the `transformers` package. If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function: The latest sentence-transformers also supports Jina embeddings: ## Alternatives to Using Transformers Package 1. _Managed SaaS_: Get started with a free key on Jina AI's Embedding API. 2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on AWS SageMaker. ## RAG Performance According to the latest blog post from LlamaIndex, > In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out. ## Plans 1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese. 2. Multimodal embedding models enabling multimodal RAG applications. 3. High-performance rerankers. ## Troubleshooting **Loading of Model Code failed** If you forgot to pass the `trust_remote_code=True` flag when calling `from_pretrained` or when initializing the model via the `AutoModel` class, you will receive an error that the model weights could not be initialized. This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model: ## Contact Join our Discord community and chat with other community members about ideas. ## Citation If you find Jina Embeddings useful in your research, please cite the following paper:" +} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-embeddings-v3.json b/data/model_data_json/jinaai_jina-embeddings-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..c5195fecb4fb46d2e0af85fa6012a09c68d239bf --- /dev/null +++ b/data/model_data_json/jinaai_jina-embeddings-v3.json @@ -0,0 +1,115 @@ +{ + "model_id": "jinaai/jina-embeddings-v3", + "downloads": 3823059, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "feature-extraction", + "sentence-similarity", + "mteb", + "sentence-transformers", + "custom_code", + "multilingual", + "af", + "am", + "ar", + "as", + "az", + "be", + "bg", + "bn", + "br", + "bs", + "ca", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lo", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "my", + "ne", + "nl", + "no", + "om", + "or", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sd", + "si", + "sk", + "sl", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "th", + "tl", + "tr", + "ug", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "zh", + "arxiv:2409.10173", + "license:cc-by-nc-4.0", + "model-index", + "region:eu" + ], + "description": "--- license: cc-by-nc-4.0 tags: - feature-extraction - sentence-similarity - mteb - sentence-transformers language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - no - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th - tl - tr - ug - uk - ur - uz - vi - xh - yi - zh inference: false
library_name: transformers model-index: - name: jina-embeddings-v3 results: - dataset: config: default name: MTEB AFQMC (default) revision: b44c3b011063adb25877c13823db83bb193913c4 split: validation type: C-MTEB/AFQMC metrics: - type: cosine_pearson value: 41.74237700998808 - type: cosine_spearman value: 43.4726782647566 - type: euclidean_pearson value: 42.244585459479964 - type: euclidean_spearman value: 43.525070045169606 - type: main_score value: 43.4726782647566 - type: manhattan_pearson value: 42.04616728224863 - type: manhattan_spearman value: 43.308828270754645 - type: pearson value: 41.74237700998808 - type: spearman value: 43.4726782647566 task: type: STS - dataset: config: default name: MTEB ArguAna-PL (default) revision: 63fc86750af76253e8c760fc9e534bbf24d260a2 split: test type: clarin-knext/arguana-pl metrics: - type: main_score value: 50.117999999999995 - type: map_at_1 value: 24.253 - type: map_at_10 value: 40.725 - type: map_at_100 value: 41.699999999999996 - type: map_at_1000 value: 41.707 - type: map_at_20 value: 41.467999999999996 - type: map_at_3 value: 35.467 - type: map_at_5 value: 38.291 - type: mrr_at_1 value: 24.751066856330013 - type: mrr_at_10 value: 40.91063808169072 - type: mrr_at_100 value: 41.885497923928675 - type: mrr_at_1000 value: 41.89301098419842 - type: mrr_at_20 value: 41.653552355442514 - type: mrr_at_3 value: 35.656709340919775 - type: mrr_at_5 value: 38.466097676623946 - type: nauc_map_at_1000_diff1 value: 7.503000359807567 - type: nauc_map_at_1000_max value: -11.030405164830546 - type: nauc_map_at_1000_std value: -8.902792782585117 - type: nauc_map_at_100_diff1 value: 7.509899249593199 - type: nauc_map_at_100_max value: -11.023581259404406 - type: nauc_map_at_100_std value: -8.892241185067272 - type: nauc_map_at_10_diff1 value: 7.24369711881512 - type: nauc_map_at_10_max value: -10.810000200433278 - type: nauc_map_at_10_std value: -8.987230542165776 - type: nauc_map_at_1_diff1 value: 11.37175831832417 - type: nauc_map_at_1_max value: -13.315221903223055 - type: nauc_map_at_1_std value: -9.398199605510275 - type: nauc_map_at_20_diff1 value: 7.477364530860648 - type: nauc_map_at_20_max value: -10.901251218105566 - type: nauc_map_at_20_std value: -8.868148116405925 - type: nauc_map_at_3_diff1 value: 6.555548802174882 - type: nauc_map_at_3_max value: -12.247274800542934 - type: nauc_map_at_3_std value: -9.879475250984811 - type: nauc_map_at_5_diff1 value: 7.426588563355882 - type: nauc_map_at_5_max value: -11.347695686001805 - type: nauc_map_at_5_std value: -9.34441892203972 - type: nauc_mrr_at_1000_diff1 value: 5.99737552143614 - type: nauc_mrr_at_1000_max value: -11.327205136505727 - type: nauc_mrr_at_1000_std value: -8.791079115519503 - type: nauc_mrr_at_100_diff1 value: 6.004622525255784 - type: nauc_mrr_at_100_max value: -11.320336759899723 - type: nauc_mrr_at_100_std value: -8.780602249831777 - type: nauc_mrr_at_10_diff1 value: 5.783623516930227 - type: nauc_mrr_at_10_max value: -11.095971693467078 - type: nauc_mrr_at_10_std value: -8.877242032013582 - type: nauc_mrr_at_1_diff1 value: 9.694937537703797 - type: nauc_mrr_at_1_max value: -12.531905083727912 - type: nauc_mrr_at_1_std value: -8.903992940100146 - type: nauc_mrr_at_20_diff1 value: 5.984841206233873 - type: nauc_mrr_at_20_max value: -11.195236951048969 - type: nauc_mrr_at_20_std value: -8.757266039186018 - type: nauc_mrr_at_3_diff1 value: 5.114333824261379 - type: nauc_mrr_at_3_max value: -12.64809799843464 - type: nauc_mrr_at_3_std value: -9.791146138025184 - type: nauc_mrr_at_5_diff1 
value: 5.88941606224512 - type: nauc_mrr_at_5_max value: -11.763903418071918 - type: nauc_mrr_at_5_std value: -9.279175712709446 - type: nauc_ndcg_at_1000_diff1 value: 7.076950652226086 - type: nauc_ndcg_at_1000_max value: -10.386482092087371 - type: nauc_ndcg_at_1000_std value: -8.309190917074046 - type: nauc_ndcg_at_100_diff1 value: 7.2329220284865245 - type: nauc_ndcg_at_100_max value: -10.208048403220337 - type: nauc_ndcg_at_100_std value: -7.997975874274613 - type: nauc_ndcg_at_10_diff1 value: 6.065391100006953 - type: nauc_ndcg_at_10_max value: -9.046164377601153 - type: nauc_ndcg_at_10_std value: -8.34724889697153 - type: nauc_ndcg_at_1_diff1 value: 11.37175831832417 - type: nauc_ndcg_at_1_max value: -13.315221903223055 - type: nauc_ndcg_at_1_std value: -9.398199605510275 - type: nauc_ndcg_at_20_diff1 value: 6.949389989202601 - type: nauc_ndcg_at_20_max value: -9.35740451760307 - type: nauc_ndcg_at_20_std value: -7.761295171828212 - type: nauc_ndcg_at_3_diff1 value: 5.051471796151364 - type: nauc_ndcg_at_3_max value: -12.158763333711653 - type: nauc_ndcg_at_3_std value: -10.078902544421926 - type: nauc_ndcg_at_5_diff1 value: 6.527454512611454 - type: nauc_ndcg_at_5_max value: -10.525118233848586 - type: nauc_ndcg_at_5_std value: -9.120055125584031 - type: nauc_precision_at_1000_diff1 value: -10.6495668199151 - type: nauc_precision_at_1000_max value: 12.070656425217841 - type: nauc_precision_at_1000_std value: 55.844551709649004 - type: nauc_precision_at_100_diff1 value: 19.206967129266285 - type: nauc_precision_at_100_max value: 16.296851020813456 - type: nauc_precision_at_100_std value: 45.60378984257811 - type: nauc_precision_at_10_diff1 value: 0.6490335354304879 - type: nauc_precision_at_10_max value: 0.5757198255366447 - type: nauc_precision_at_10_std value: -4.875847131691451 - type: nauc_precision_at_1_diff1 value: 11.37175831832417 - type: nauc_precision_at_1_max value: -13.315221903223055 - type: nauc_precision_at_1_std value: -9.398199605510275 - type: nauc_precision_at_20_diff1 value: 4.899369866929203 - type: nauc_precision_at_20_max value: 5.988537297189552 - type: nauc_precision_at_20_std value: 4.830900387582837 - type: nauc_precision_at_3_diff1 value: 0.8791156910997744 - type: nauc_precision_at_3_max value: -11.983373635905993 - type: nauc_precision_at_3_std value: -10.646185111581257 - type: nauc_precision_at_5_diff1 value: 3.9314486166548432 - type: nauc_precision_at_5_max value: -7.798591396895839 - type: nauc_precision_at_5_std value: -8.293043407234125 - type: nauc_recall_at_1000_diff1 value: -10.649566819918673 - type: nauc_recall_at_1000_max value: 12.070656425214647 - type: nauc_recall_at_1000_std value: 55.84455170965023 - type: nauc_recall_at_100_diff1 value: 19.206967129265127 - type: nauc_recall_at_100_max value: 16.296851020813722 - type: nauc_recall_at_100_std value: 45.60378984257728 - type: nauc_recall_at_10_diff1 value: 0.6490335354304176 - type: nauc_recall_at_10_max value: 0.5757198255366095 - type: nauc_recall_at_10_std value: -4.875847131691468 - type: nauc_recall_at_1_diff1 value: 11.37175831832417 - type: nauc_recall_at_1_max value: -13.315221903223055 - type: nauc_recall_at_1_std value: -9.398199605510275 - type: nauc_recall_at_20_diff1 value: 4.899369866929402 - type: nauc_recall_at_20_max value: 5.98853729718968 - type: nauc_recall_at_20_std value: 4.830900387582967 - type: nauc_recall_at_3_diff1 value: 0.8791156910997652 - type: nauc_recall_at_3_max value: -11.983373635905997 - type: nauc_recall_at_3_std value: -10.64618511158124 - type: 
nauc_recall_at_5_diff1 value: 3.9314486166548472 - type: nauc_recall_at_5_max value: -7.7985913968958585 - type: nauc_recall_at_5_std value: -8.293043407234132 - type: ndcg_at_1 value: 24.253 - type: ndcg_at_10 value: 50.117999999999995 - type: ndcg_at_100 value: 54.291999999999994 - type: ndcg_at_1000 value: 54.44799999999999 - type: ndcg_at_20 value: 52.771 - type: ndcg_at_3 value: 39.296 - type: ndcg_at_5 value: 44.373000000000005 - type: precision_at_1 value: 24.253 - type: precision_at_10 value: 8.016 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.527 - type: precision_at_3 value: 16.808999999999997 - type: precision_at_5 value: 12.546 - type: recall_at_1 value: 24.253 - type: recall_at_10 value: 80.156 - type: recall_at_100 value: 98.43499999999999 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_20 value: 90.54100000000001 - type: recall_at_3 value: 50.427 - type: recall_at_5 value: 62.731 task: type: Retrieval - dataset: config: default name: MTEB DBPedia-PL (default) revision: 76afe41d9af165cc40999fcaa92312b8b012064a split: test type: clarin-knext/dbpedia-pl metrics: - type: main_score value: 34.827000000000005 - type: map_at_1 value: 7.049999999999999 - type: map_at_10 value: 14.982999999999999 - type: map_at_100 value: 20.816000000000003 - type: map_at_1000 value: 22.33 - type: map_at_20 value: 17.272000000000002 - type: map_at_3 value: 10.661 - type: map_at_5 value: 12.498 - type: mrr_at_1 value: 57.25 - type: mrr_at_10 value: 65.81934523809524 - type: mrr_at_100 value: 66.2564203928212 - type: mrr_at_1000 value: 66.27993662923856 - type: mrr_at_20 value: 66.0732139130649 - type: mrr_at_3 value: 64.08333333333333 - type: mrr_at_5 value: 65.27083333333333 - type: nauc_map_at_1000_diff1 value: 16.41780871174038 - type: nauc_map_at_1000_max value: 30.193946325654654 - type: nauc_map_at_1000_std value: 31.46095497039037 - type: nauc_map_at_100_diff1 value: 18.57903165498531 - type: nauc_map_at_100_max value: 29.541476938623262 - type: nauc_map_at_100_std value: 28.228604103301052 - type: nauc_map_at_10_diff1 value: 24.109434489748946 - type: nauc_map_at_10_max value: 21.475954208048968 - type: nauc_map_at_10_std value: 9.964464537806988 - type: nauc_map_at_1_diff1 value: 38.67437644802124 - type: nauc_map_at_1_max value: 14.52136658726491 - type: nauc_map_at_1_std value: -2.8981666782088755 - type: nauc_map_at_20_diff1 value: 21.42547228801935 - type: nauc_map_at_20_max value: 25.04510402960458 - type: nauc_map_at_20_std value: 16.533079346431155 - type: nauc_map_at_3_diff1 value: 26.63648858245477 - type: nauc_map_at_3_max value: 13.632235789780415 - type: nauc_map_at_3_std value: -0.40129174577700716 - type: nauc_map_at_5_diff1 value: 24.513861031197933 - type: nauc_map_at_5_max value: 16.599888813946688 - type: nauc_map_at_5_std value: 3.4448514739556346 - type: nauc_mrr_at_1000_diff1 value: 36.57353464537154 - type: nauc_mrr_at_1000_max value: 55.34763483979515 - type: nauc_mrr_at_1000_std value: 40.3722796438533 - type: nauc_mrr_at_100_diff1 value: 36.555989566513134 - type: nauc_mrr_at_100_max value: 55.347805216808396 - type: nauc_mrr_at_100_std value: 40.38465945075711 - type: nauc_mrr_at_10_diff1 value: 36.771572999261984 - type: nauc_mrr_at_10_max value: 55.41239897909165 - type: nauc_mrr_at_10_std value: 40.52058934624793 - type: nauc_mrr_at_1_diff1 value: 38.2472828531032 - type: nauc_mrr_at_1_max value: 51.528473828685705 - type: nauc_mrr_at_1_std value: 33.03676467942882 - type: 
nauc_mrr_at_20_diff1 value: 36.642602571889036 - type: nauc_mrr_at_20_max value: 55.3763342076553 - type: nauc_mrr_at_20_std value: 40.41520090500838 - type: nauc_mrr_at_3_diff1 value: 36.79451847426628 - type: nauc_mrr_at_3_max value: 54.59778581826193 - type: nauc_mrr_at_3_std value: 39.48392075873095 - type: nauc_mrr_at_5_diff1 value: 36.92150807529304 - type: nauc_mrr_at_5_max value: 55.03553978718272 - type: nauc_mrr_at_5_std value: 40.20147745489917 - type: nauc_ndcg_at_1000_diff1 value: 21.843092744321268 - type: nauc_ndcg_at_1000_max value: 44.93275990394279 - type: nauc_ndcg_at_1000_std value: 47.09186225236347 - type: nauc_ndcg_at_100_diff1 value: 25.180282568979095 - type: nauc_ndcg_at_100_max value: 41.737709709508394 - type: nauc_ndcg_at_100_std value: 38.80950644139446 - type: nauc_ndcg_at_10_diff1 value: 24.108368037214046 - type: nauc_ndcg_at_10_max value: 41.29298370689967 - type: nauc_ndcg_at_10_std value: 35.06450769738732 - type: nauc_ndcg_at_1_diff1 value: 35.51010679525079 - type: nauc_ndcg_at_1_max value: 42.40790024212412 - type: nauc_ndcg_at_1_std value: 26.696412036243157 - type: nauc_ndcg_at_20_diff1 value: 23.909989673256195 - type: nauc_ndcg_at_20_max value: 39.78444647091927 - type: nauc_ndcg_at_20_std value: 33.39544470364529 - type: nauc_ndcg_at_3_diff1 value: 22.50484297956035 - type: nauc_ndcg_at_3_max value: 39.14551926034168 - type: nauc_ndcg_at_3_std value: 30.330135925392014 - type: nauc_ndcg_at_5_diff1 value: 21.7798872028265 - type: nauc_ndcg_at_5_max value: 40.23856975248015 - type: nauc_ndcg_at_5_std value: 32.438381067440396 - type: nauc_precision_at_1000_diff1 value: -21.62692442272279 - type: nauc_precision_at_1000_max value: 0.9689046974430882 - type: nauc_precision_at_1000_std value: 18.54001058230465 - type: nauc_precision_at_100_diff1 value: -10.132258779856192 - type: nauc_precision_at_100_max value: 23.74516110444681 - type: nauc_precision_at_100_std value: 47.03416663319965 - type: nauc_precision_at_10_diff1 value: 1.543656509571949 - type: nauc_precision_at_10_max value: 36.98864812757555 - type: nauc_precision_at_10_std value: 46.56427199077426 - type: nauc_precision_at_1_diff1 value: 38.2472828531032 - type: nauc_precision_at_1_max value: 51.528473828685705 - type: nauc_precision_at_1_std value: 33.03676467942882 - type: nauc_precision_at_20_diff1 value: -4.612864872734335 - type: nauc_precision_at_20_max value: 34.03565449182125 - type: nauc_precision_at_20_std value: 48.880727648349534 - type: nauc_precision_at_3_diff1 value: 6.360850444467829 - type: nauc_precision_at_3_max value: 36.25816942368427 - type: nauc_precision_at_3_std value: 34.48882647419187 - type: nauc_precision_at_5_diff1 value: 2.6445596936740037 - type: nauc_precision_at_5_max value: 37.174463388899056 - type: nauc_precision_at_5_std value: 40.25254370626113 - type: nauc_recall_at_1000_diff1 value: 13.041227176748077 - type: nauc_recall_at_1000_max value: 39.722336427072094 - type: nauc_recall_at_1000_std value: 52.04032890059214 - type: nauc_recall_at_100_diff1 value: 18.286096899139153 - type: nauc_recall_at_100_max value: 34.072389201930314 - type: nauc_recall_at_100_std value: 37.73637623416653 - type: nauc_recall_at_10_diff1 value: 22.35560419280504 - type: nauc_recall_at_10_max value: 19.727247199595197 - type: nauc_recall_at_10_std value: 8.58498575109203 - type: nauc_recall_at_1_diff1 value: 38.67437644802124 - type: nauc_recall_at_1_max value: 14.52136658726491 - type: nauc_recall_at_1_std value: -2.8981666782088755 - type: nauc_recall_at_20_diff1 value: 
19.026320886902916 - type: nauc_recall_at_20_max value: 22.753562309469867 - type: nauc_recall_at_20_std value: 14.89994263882445 - type: nauc_recall_at_3_diff1 value: 23.428129702129684 - type: nauc_recall_at_3_max value: 10.549153954790542 - type: nauc_recall_at_3_std value: -1.7590608997055206 - type: nauc_recall_at_5_diff1 value: 21.27448645803921 - type: nauc_recall_at_5_max value: 13.620279707461677 - type: nauc_recall_at_5_std value: 2.0577962208292675 - type: ndcg_at_1 value: 46.75 - type: ndcg_at_10 value: 34.827000000000005 - type: ndcg_at_100 value: 38.157999999999994 - type: ndcg_at_1000 value: 44.816 - type: ndcg_at_20 value: 34.152 - type: ndcg_at_3 value: 39.009 - type: ndcg_at_5 value: 36.826 - type: precision_at_1 value: 57.25 - type: precision_at_10 value: 27.575 - type: precision_at_100 value: 8.84 - type: precision_at_1000 value: 1.949 - type: precision_at_20 value: 20.724999999999998 - type: precision_at_3 value: 41.167 - type: precision_at_5 value: 35.199999999999996 - type: recall_at_1 value: 7.049999999999999 - type: recall_at_10 value: 19.817999999999998 - type: recall_at_100 value: 42.559999999999995 - type: recall_at_1000 value: 63.744 - type: recall_at_20 value: 25.968000000000004 - type: recall_at_3 value: 11.959 - type: recall_at_5 value: 14.939 task: type: Retrieval - dataset: config: default name: MTEB FiQA-PL (default) revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e split: test type: clarin-knext/fiqa-pl metrics: - type: main_score value: 38.828 - type: map_at_1 value: 19.126 - type: map_at_10 value: 31.002000000000002 - type: map_at_100 value: 32.736 - type: map_at_1000 value: 32.933 - type: map_at_20 value: 31.894 - type: map_at_3 value: 26.583000000000002 - type: map_at_5 value: 28.904000000000003 - type: mrr_at_1 value: 37.808641975308646 - type: mrr_at_10 value: 46.36745541838134 - type: mrr_at_100 value: 47.14140915794908 - type: mrr_at_1000 value: 47.190701435388846 - type: mrr_at_20 value: 46.81387776440309 - type: mrr_at_3 value: 43.750000000000014 - type: mrr_at_5 value: 45.23919753086418 - type: nauc_map_at_1000_diff1 value: 38.5532285881503 - type: nauc_map_at_1000_max value: 34.44383884813453 - type: nauc_map_at_1000_std value: -1.3963497949476722 - type: nauc_map_at_100_diff1 value: 38.49292464176943 - type: nauc_map_at_100_max value: 34.33752755618645 - type: nauc_map_at_100_std value: -1.4794032905848582 - type: nauc_map_at_10_diff1 value: 38.26061536370962 - type: nauc_map_at_10_max value: 33.16977912721411 - type: nauc_map_at_10_std value: -2.3853370604730393 - type: nauc_map_at_1_diff1 value: 46.288767289528344 - type: nauc_map_at_1_max value: 25.67706785013364 - type: nauc_map_at_1_std value: -6.989769609924645 - type: nauc_map_at_20_diff1 value: 38.507270129330685 - type: nauc_map_at_20_max value: 33.70963328055982 - type: nauc_map_at_20_std value: -1.9835510011554272 - type: nauc_map_at_3_diff1 value: 39.81061518646884 - type: nauc_map_at_3_max value: 30.101186374147748 - type: nauc_map_at_3_std value: -4.027120247237715 - type: nauc_map_at_5_diff1 value: 38.55602589746512 - type: nauc_map_at_5_max value: 31.515174267015983 - type: nauc_map_at_5_std value: -3.4064239358570303 - type: nauc_mrr_at_1000_diff1 value: 45.030514454725726 - type: nauc_mrr_at_1000_max value: 43.878919881666164 - type: nauc_mrr_at_1000_std value: 2.517594250297626 - type: nauc_mrr_at_100_diff1 value: 45.00868212878687 - type: nauc_mrr_at_100_max value: 43.87437011120001 - type: nauc_mrr_at_100_std value: 2.5257874265014966 - type: nauc_mrr_at_10_diff1 value: 
44.855044606754056 - type: nauc_mrr_at_10_max value: 43.946617058785186 - type: nauc_mrr_at_10_std value: 2.5173751662794044 - type: nauc_mrr_at_1_diff1 value: 49.441510997817346 - type: nauc_mrr_at_1_max value: 43.08547383044357 - type: nauc_mrr_at_1_std value: -1.8747770703324347 - type: nauc_mrr_at_20_diff1 value: 45.019880416584215 - type: nauc_mrr_at_20_max value: 43.85691473662242 - type: nauc_mrr_at_20_std value: 2.4625487605091303 - type: nauc_mrr_at_3_diff1 value: 45.322041658604036 - type: nauc_mrr_at_3_max value: 43.95079293074395 - type: nauc_mrr_at_3_std value: 2.4644274393435737 - type: nauc_mrr_at_5_diff1 value: 44.99461837803437 - type: nauc_mrr_at_5_max value: 43.97934275090601 - type: nauc_mrr_at_5_std value: 2.5353091695125096 - type: nauc_ndcg_at_1000_diff1 value: 39.38449023275524 - type: nauc_ndcg_at_1000_max value: 39.48382767312788 - type: nauc_ndcg_at_1000_std value: 3.414789408343409 - type: nauc_ndcg_at_100_diff1 value: 38.29675861135578 - type: nauc_ndcg_at_100_max value: 38.2674786507297 - type: nauc_ndcg_at_100_std value: 2.7094055381218207 - type: nauc_ndcg_at_10_diff1 value: 38.09514955708717 - type: nauc_ndcg_at_10_max value: 36.664923238906525 - type: nauc_ndcg_at_10_std value: 0.6901410544967921 - type: nauc_ndcg_at_1_diff1 value: 49.441510997817346 - type: nauc_ndcg_at_1_max value: 43.08547383044357 - type: nauc_ndcg_at_1_std value: -1.8747770703324347 - type: nauc_ndcg_at_20_diff1 value: 38.44967736231759 - type: nauc_ndcg_at_20_max value: 36.871179313622584 - type: nauc_ndcg_at_20_std value: 1.157560360065234 - type: nauc_ndcg_at_3_diff1 value: 39.02419271805571 - type: nauc_ndcg_at_3_max value: 37.447669442586324 - type: nauc_ndcg_at_3_std value: 0.41502589779297794 - type: nauc_ndcg_at_5_diff1 value: 38.10233452742001 - type: nauc_ndcg_at_5_max value: 35.816381905465676 - type: nauc_ndcg_at_5_std value: -0.3704499913387088 - type: nauc_precision_at_1000_diff1 value: 2.451267097838658 - type: nauc_precision_at_1000_max value: 29.116394969085306 - type: nauc_precision_at_1000_std value: 14.85900786538363 - type: nauc_precision_at_100_diff1 value: 8.10919082251277 - type: nauc_precision_at_100_max value: 36.28388256191417 - type: nauc_precision_at_100_std value: 14.830039904317657 - type: nauc_precision_at_10_diff1 value: 15.02446609920477 - type: nauc_precision_at_10_max value: 41.008463775454054 - type: nauc_precision_at_10_std value: 10.431403152334486 - type: nauc_precision_at_1_diff1 value: 49.441510997817346 - type: nauc_precision_at_1_max value: 43.08547383044357 - type: nauc_precision_at_1_std value: -1.8747770703324347 - type: nauc_precision_at_20_diff1 value: 14.222022201169926 - type: nauc_precision_at_20_max value: 40.10189643835305 - type: nauc_precision_at_20_std value: 12.204443815975527 - type: nauc_precision_at_3_diff1 value: 25.41905395341234 - type: nauc_precision_at_3_max value: 41.56133905339819 - type: nauc_precision_at_3_std value: 5.575516915590082 - type: nauc_precision_at_5_diff1 value: 20.20081221089351 - type: nauc_precision_at_5_max value: 40.95218555916681 - type: nauc_precision_at_5_std value: 7.2040745500708745 - type: nauc_recall_at_1000_diff1 value: 28.021198234033395 - type: nauc_recall_at_1000_max value: 36.165148684597504 - type: nauc_recall_at_1000_std value: 28.28852356008973 - type: nauc_recall_at_100_diff1 value: 21.882447802741897 - type: nauc_recall_at_100_max value: 26.979684607567222 - type: nauc_recall_at_100_std value: 9.783658817010082 - type: nauc_recall_at_10_diff1 value: 28.493097951178818 - type: 
nauc_recall_at_10_max value: 29.40937476550134 - type: nauc_recall_at_10_std value: 2.7593763576979353 - type: nauc_recall_at_1_diff1 value: 46.288767289528344 - type: nauc_recall_at_1_max value: 25.67706785013364 - type: nauc_recall_at_1_std value: -6.989769609924645 - type: nauc_recall_at_20_diff1 value: 27.638381299425234 - type: nauc_recall_at_20_max value: 27.942035836106328 - type: nauc_recall_at_20_std value: 3.489835161380808 - type: nauc_recall_at_3_diff1 value: 33.90054781392646 - type: nauc_recall_at_3_max value: 27.778812533030322 - type: nauc_recall_at_3_std value: -0.03054068020022706 - type: nauc_recall_at_5_diff1 value: 30.279060732221346 - type: nauc_recall_at_5_max value: 27.49854749597931 - type: nauc_recall_at_5_std value: 0.5434664581939099 - type: ndcg_at_1 value: 37.809 - type: ndcg_at_10 value: 38.828 - type: ndcg_at_100 value: 45.218 - type: ndcg_at_1000 value: 48.510999999999996 - type: ndcg_at_20 value: 41.11 - type: ndcg_at_3 value: 34.466 - type: ndcg_at_5 value: 35.843 - type: precision_at_1 value: 37.809 - type: precision_at_10 value: 11.157 - type: precision_at_100 value: 1.762 - type: precision_at_1000 value: 0.233 - type: precision_at_20 value: 6.497 - type: precision_at_3 value: 23.044999999999998 - type: precision_at_5 value: 17.284 - type: recall_at_1 value: 19.126 - type: recall_at_10 value: 46.062 - type: recall_at_100 value: 70.22800000000001 - type: recall_at_1000 value: 89.803 - type: recall_at_20 value: 53.217999999999996 - type: recall_at_3 value: 30.847 - type: recall_at_5 value: 37.11 task: type: Retrieval - dataset: config: default name: MTEB HotpotQA-PL (default) revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907 split: test type: clarin-knext/hotpotqa-pl metrics: - type: main_score value: 60.27 - type: map_at_1 value: 35.199000000000005 - type: map_at_10 value: 51.369 - type: map_at_100 value: 52.212 - type: map_at_1000 value: 52.28 - type: map_at_20 value: 51.864 - type: map_at_3 value: 48.446 - type: map_at_5 value: 50.302 - type: mrr_at_1 value: 70.39837947332883 - type: mrr_at_10 value: 76.8346141067273 - type: mrr_at_100 value: 77.10724392048137 - type: mrr_at_1000 value: 77.12037412892865 - type: mrr_at_20 value: 77.01061532947222 - type: mrr_at_3 value: 75.5908170155299 - type: mrr_at_5 value: 76.39095205941899 - type: nauc_map_at_1000_diff1 value: 24.701387884989117 - type: nauc_map_at_1000_max value: 23.25553235642178 - type: nauc_map_at_1000_std value: 7.1803506915661774 - type: nauc_map_at_100_diff1 value: 24.674498622483103 - type: nauc_map_at_100_max value: 23.234948525052175 - type: nauc_map_at_100_std value: 7.168677997105447 - type: nauc_map_at_10_diff1 value: 24.676025039755626 - type: nauc_map_at_10_max value: 23.171971872726964 - type: nauc_map_at_10_std value: 6.485610909852058 - type: nauc_map_at_1_diff1 value: 68.90178464319715 - type: nauc_map_at_1_max value: 46.05537868917558 - type: nauc_map_at_1_std value: 1.7658552480698708 - type: nauc_map_at_20_diff1 value: 24.69297151842494 - type: nauc_map_at_20_max value: 23.213064691673637 - type: nauc_map_at_20_std value: 6.9357946556849 - type: nauc_map_at_3_diff1 value: 26.279128947950507 - type: nauc_map_at_3_max value: 23.929537354117922 - type: nauc_map_at_3_std value: 4.625061565714759 - type: nauc_map_at_5_diff1 value: 25.04448959482816 - type: nauc_map_at_5_max value: 23.432012857899338 - type: nauc_map_at_5_std value: 5.845744681998008 - type: nauc_mrr_at_1000_diff1 value: 66.7503918108276 - type: nauc_mrr_at_1000_max value: 48.42897342336844 - type: 
nauc_mrr_at_1000_std value: 5.3097517971144415 - type: nauc_mrr_at_100_diff1 value: 66.74645215862695 - type: nauc_mrr_at_100_max value: 48.4368663009989 - type: nauc_mrr_at_100_std value: 5.322297898555188 - type: nauc_mrr_at_10_diff1 value: 66.69310166180729 - type: nauc_mrr_at_10_max value: 48.475437698330225 - type: nauc_mrr_at_10_std value: 5.258183461631702 - type: nauc_mrr_at_1_diff1 value: 68.90178464319715 - type: nauc_mrr_at_1_max value: 46.05537868917558 - type: nauc_mrr_at_1_std value: 1.7658552480698708 - type: nauc_mrr_at_20_diff1 value: 66.72000262431975 - type: nauc_mrr_at_20_max value: 48.45593642981319 - type: nauc_mrr_at_20_std value: 5.353665929072101 - type: nauc_mrr_at_3_diff1 value: 66.84936676396276 - type: nauc_mrr_at_3_max value: 48.466611276778295 - type: nauc_mrr_at_3_std value: 4.485810398557475 - type: nauc_mrr_at_5_diff1 value: 66.62362565394174 - type: nauc_mrr_at_5_max value: 48.456431835482014 - type: nauc_mrr_at_5_std value: 5.08482458391903 - type: nauc_ndcg_at_1000_diff1 value: 29.984825173719443 - type: nauc_ndcg_at_1000_max value: 27.289179238639893 - type: nauc_ndcg_at_1000_std value: 10.661480455527526 - type: nauc_ndcg_at_100_diff1 value: 29.322074257047877 - type: nauc_ndcg_at_100_max value: 26.850650276220605 - type: nauc_ndcg_at_100_std value: 10.599247982501902 - type: nauc_ndcg_at_10_diff1 value: 29.659909113886094 - type: nauc_ndcg_at_10_max value: 26.836139599331005 - type: nauc_ndcg_at_10_std value: 8.12844399452719 - type: nauc_ndcg_at_1_diff1 value: 68.90178464319715 - type: nauc_ndcg_at_1_max value: 46.05537868917558 - type: nauc_ndcg_at_1_std value: 1.7658552480698708 - type: nauc_ndcg_at_20_diff1 value: 29.510802214854294 - type: nauc_ndcg_at_20_max value: 26.775562637730722 - type: nauc_ndcg_at_20_std value: 9.341342661702363 - type: nauc_ndcg_at_3_diff1 value: 32.741885846292966 - type: nauc_ndcg_at_3_max value: 28.44225108761343 - type: nauc_ndcg_at_3_std value: 5.204440768465042 - type: nauc_ndcg_at_5_diff1 value: 30.57856348635919 - type: nauc_ndcg_at_5_max value: 27.475007474301698 - type: nauc_ndcg_at_5_std value: 6.961546044312487 - type: nauc_precision_at_1000_diff1 value: 0.002113156309413332 - type: nauc_precision_at_1000_max value: 11.198242419541286 - type: nauc_precision_at_1000_std value: 28.69676419166541 - type: nauc_precision_at_100_diff1 value: 3.6049575557782627 - type: nauc_precision_at_100_max value: 12.499173524574791 - type: nauc_precision_at_100_std value: 23.3755281004721 - type: nauc_precision_at_10_diff1 value: 10.922574784853193 - type: nauc_precision_at_10_max value: 16.23221529562036 - type: nauc_precision_at_10_std value: 12.45014808813857 - type: nauc_precision_at_1_diff1 value: 68.90178464319715 - type: nauc_precision_at_1_max value: 46.05537868917558 - type: nauc_precision_at_1_std value: 1.7658552480698708 - type: nauc_precision_at_20_diff1 value: 8.840710781302827 - type: nauc_precision_at_20_max value: 14.804644554205524 - type: nauc_precision_at_20_std value: 16.245009770815237 - type: nauc_precision_at_3_diff1 value: 19.447291487137573 - type: nauc_precision_at_3_max value: 21.47123471597057 - type: nauc_precision_at_3_std value: 6.441862800128802 - type: nauc_precision_at_5_diff1 value: 14.078545719721108 - type: nauc_precision_at_5_max value: 18.468288046016387 - type: nauc_precision_at_5_std value: 9.58650641691393 - type: nauc_recall_at_1000_diff1 value: 0.0021131563095336584 - type: nauc_recall_at_1000_max value: 11.198242419541558 - type: nauc_recall_at_1000_std value: 28.6967641916655 - 
type: nauc_recall_at_100_diff1 value: 3.6049575557781393 - type: nauc_recall_at_100_max value: 12.499173524574765 - type: nauc_recall_at_100_std value: 23.375528100472074 - type: nauc_recall_at_10_diff1 value: 10.922574784853168 - type: nauc_recall_at_10_max value: 16.2322152956203 - type: nauc_recall_at_10_std value: 12.450148088138535 - type: nauc_recall_at_1_diff1 value: 68.90178464319715 - type: nauc_recall_at_1_max value: 46.05537868917558 - type: nauc_recall_at_1_std value: 1.7658552480698708 - type: nauc_recall_at_20_diff1 value: 8.840710781302905 - type: nauc_recall_at_20_max value: 14.804644554205515 - type: nauc_recall_at_20_std value: 16.245009770815273 - type: nauc_recall_at_3_diff1 value: 19.447291487137498 - type: nauc_recall_at_3_max value: 21.47123471597054 - type: nauc_recall_at_3_std value: 6.441862800128763 - type: nauc_recall_at_5_diff1 value: 14.07854571972115 - type: nauc_recall_at_5_max value: 18.468288046016337 - type: nauc_recall_at_5_std value: 9.586506416913904 - type: ndcg_at_1 value: 70.39800000000001 - type: ndcg_at_10 value: 60.27 - type: ndcg_at_100 value: 63.400999999999996 - type: ndcg_at_1000 value: 64.847 - type: ndcg_at_20 value: 61.571 - type: ndcg_at_3 value: 55.875 - type: ndcg_at_5 value: 58.36599999999999 - type: precision_at_1 value: 70.39800000000001 - type: precision_at_10 value: 12.46 - type: precision_at_100 value: 1.493 - type: precision_at_1000 value: 0.169 - type: precision_at_20 value: 6.65 - type: precision_at_3 value: 35.062 - type: precision_at_5 value: 23.009 - type: recall_at_1 value: 35.199000000000005 - type: recall_at_10 value: 62.302 - type: recall_at_100 value: 74.666 - type: recall_at_1000 value: 84.355 - type: recall_at_20 value: 66.496 - type: recall_at_3 value: 52.593 - type: recall_at_5 value: 57.522 task: type: Retrieval - dataset: config: default name: MTEB MSMARCO-PL (default) revision: 8634c07806d5cce3a6138e260e59b81760a0a640 split: test type: clarin-knext/msmarco-pl metrics: - type: main_score value: 64.886 - type: map_at_1 value: 1.644 - type: map_at_10 value: 12.24 - type: map_at_100 value: 28.248 - type: map_at_1000 value: 33.506 - type: map_at_20 value: 17.497 - type: map_at_3 value: 4.9399999999999995 - type: map_at_5 value: 8.272 - type: mrr_at_1 value: 83.72093023255815 - type: mrr_at_10 value: 91.08527131782945 - type: mrr_at_100 value: 91.08527131782945 - type: mrr_at_1000 value: 91.08527131782945 - type: mrr_at_20 value: 91.08527131782945 - type: mrr_at_3 value: 91.08527131782945 - type: mrr_at_5 value: 91.08527131782945 - type: nauc_map_at_1000_diff1 value: -36.428271627303424 - type: nauc_map_at_1000_max value: 44.87615127218638 - type: nauc_map_at_1000_std value: 67.92696808824724 - type: nauc_map_at_100_diff1 value: -28.11674206786188 - type: nauc_map_at_100_max value: 36.422779766334955 - type: nauc_map_at_100_std value: 49.99876313755116 - type: nauc_map_at_10_diff1 value: -5.838593619806058 - type: nauc_map_at_10_max value: 11.026519190509742 - type: nauc_map_at_10_std value: 2.5268752263522045 - type: nauc_map_at_1_diff1 value: 17.897907271073016 - type: nauc_map_at_1_max value: 12.229062762540844 - type: nauc_map_at_1_std value: -4.088830895573149 - type: nauc_map_at_20_diff1 value: -13.871097716255626 - type: nauc_map_at_20_max value: 19.291271635609533 - type: nauc_map_at_20_std value: 16.745335606507826 - type: nauc_map_at_3_diff1 value: 4.425238457033843 - type: nauc_map_at_3_max value: 4.611864744680824 - type: nauc_map_at_3_std value: -8.986916608582863 - type: nauc_map_at_5_diff1 value: 
-6.254849256920095 - type: nauc_map_at_5_max value: 2.729437079919823 - type: nauc_map_at_5_std value: -7.235906279913092 - type: nauc_mrr_at_1000_diff1 value: 52.18669104947672 - type: nauc_mrr_at_1000_max value: 68.26259125411818 - type: nauc_mrr_at_1000_std value: 56.345086428353575 - type: nauc_mrr_at_100_diff1 value: 52.18669104947672 - type: nauc_mrr_at_100_max value: 68.26259125411818 - type: nauc_mrr_at_100_std value: 56.345086428353575 - type: nauc_mrr_at_10_diff1 value: 52.18669104947672 - type: nauc_mrr_at_10_max value: 68.26259125411818 - type: nauc_mrr_at_10_std value: 56.345086428353575 - type: nauc_mrr_at_1_diff1 value: 56.55126663944154 - type: nauc_mrr_at_1_max value: 66.37014285522565 - type: nauc_mrr_at_1_std value: 53.2508271389779 - type: nauc_mrr_at_20_diff1 value: 52.18669104947672 - type: nauc_mrr_at_20_max value: 68.26259125411818 - type: nauc_mrr_at_20_std value: 56.345086428353575 - type: nauc_mrr_at_3_diff1 value: 52.18669104947672 - type: nauc_mrr_at_3_max value: 68.26259125411818 - type: nauc_mrr_at_3_std value: 56.345086428353575 - type: nauc_mrr_at_5_diff1 value: 52.18669104947672 - type: nauc_mrr_at_5_max value: 68.26259125411818 - type: nauc_mrr_at_5_std value: 56.345086428353575 - type: nauc_ndcg_at_1000_diff1 value: -19.06422926483731 - type: nauc_ndcg_at_1000_max value: 56.30853514590265 - type: nauc_ndcg_at_1000_std value: 70.30810947505557 - type: nauc_ndcg_at_100_diff1 value: -25.72587586459692 - type: nauc_ndcg_at_100_max value: 51.433781241604194 - type: nauc_ndcg_at_100_std value: 68.37678512652792 - type: nauc_ndcg_at_10_diff1 value: -23.21198108212602 - type: nauc_ndcg_at_10_max value: 43.5450720846516 - type: nauc_ndcg_at_10_std value: 48.78307907005605 - type: nauc_ndcg_at_1_diff1 value: 44.00179301267447 - type: nauc_ndcg_at_1_max value: 48.202370455680395 - type: nauc_ndcg_at_1_std value: 25.69655992704088 - type: nauc_ndcg_at_20_diff1 value: -33.88168753446507 - type: nauc_ndcg_at_20_max value: 45.16199742613164 - type: nauc_ndcg_at_20_std value: 61.87098383164902 - type: nauc_ndcg_at_3_diff1 value: 11.19174449544048 - type: nauc_ndcg_at_3_max value: 44.34069860560555 - type: nauc_ndcg_at_3_std value: 27.451258369798115 - type: nauc_ndcg_at_5_diff1 value: -7.186520929432436 - type: nauc_ndcg_at_5_max value: 43.41869981139378 - type: nauc_ndcg_at_5_std value: 34.89898115995178 - type: nauc_precision_at_1000_diff1 value: -34.43998154563451 - type: nauc_precision_at_1000_max value: 29.172655907480372 - type: nauc_precision_at_1000_std value: 65.15824469614837 - type: nauc_precision_at_100_diff1 value: -37.82409643259692 - type: nauc_precision_at_100_max value: 38.24986991317909 - type: nauc_precision_at_100_std value: 72.74768183105327 - type: nauc_precision_at_10_diff1 value: -32.21556182780535 - type: nauc_precision_at_10_max value: 34.27170432382651 - type: nauc_precision_at_10_std value: 58.358255004394664 - type: nauc_precision_at_1_diff1 value: 56.55126663944154 - type: nauc_precision_at_1_max value: 66.37014285522565 - type: nauc_precision_at_1_std value: 53.2508271389779 - type: nauc_precision_at_20_diff1 value: -40.18751579026395 - type: nauc_precision_at_20_max value: 33.960783153758896 - type: nauc_precision_at_20_std value: 65.42918390184195 - type: nauc_precision_at_3_diff1 value: -7.073870209006578 - type: nauc_precision_at_3_max value: 50.81535269862325 - type: nauc_precision_at_3_std value: 59.248681565955685 - type: nauc_precision_at_5_diff1 value: -31.136580596983876 - type: nauc_precision_at_5_max value: 45.88147792380426 - 
type: nauc_precision_at_5_std value: 67.46814230928243 - type: nauc_recall_at_1000_diff1 value: -23.15699999594577 - type: nauc_recall_at_1000_max value: 39.77277799761876 - type: nauc_recall_at_1000_std value: 60.326168012901114 - type: nauc_recall_at_100_diff1 value: -21.636664823598498 - type: nauc_recall_at_100_max value: 31.104969346131583 - type: nauc_recall_at_100_std value: 38.811686891592096 - type: nauc_recall_at_10_diff1 value: -10.542765625053569 - type: nauc_recall_at_10_max value: 2.043876058107446 - type: nauc_recall_at_10_std value: -5.578449908984766 - type: nauc_recall_at_1_diff1 value: 17.897907271073016 - type: nauc_recall_at_1_max value: 12.229062762540844 - type: nauc_recall_at_1_std value: -4.088830895573149 - type: nauc_recall_at_20_diff1 value: -15.132909355710103 - type: nauc_recall_at_20_max value: 12.659765287241065 - type: nauc_recall_at_20_std value: 8.277887800815819 - type: nauc_recall_at_3_diff1 value: -3.1975017812715016 - type: nauc_recall_at_3_max value: -3.5539857085038538 - type: nauc_recall_at_3_std value: -14.712102851318118 - type: nauc_recall_at_5_diff1 value: -14.040507717380743 - type: nauc_recall_at_5_max value: -6.126912150131701 - type: nauc_recall_at_5_std value: -13.821624015640355 - type: ndcg_at_1 value: 71.318 - type: ndcg_at_10 value: 64.886 - type: ndcg_at_100 value: 53.187 - type: ndcg_at_1000 value: 59.897999999999996 - type: ndcg_at_20 value: 58.96 - type: ndcg_at_3 value: 69.736 - type: ndcg_at_5 value: 70.14099999999999 - type: precision_at_1 value: 83.721 - type: precision_at_10 value: 71.163 - type: precision_at_100 value: 29.465000000000003 - type: precision_at_1000 value: 5.665 - type: precision_at_20 value: 57.791000000000004 - type: precision_at_3 value: 82.171 - type: precision_at_5 value: 81.86 - type: recall_at_1 value: 1.644 - type: recall_at_10 value: 14.238000000000001 - type: recall_at_100 value: 39.831 - type: recall_at_1000 value: 64.057 - type: recall_at_20 value: 21.021 - type: recall_at_3 value: 5.53 - type: recall_at_5 value: 9.623 task: type: Retrieval - dataset: config: default name: MTEB NFCorpus-PL (default) revision: 9a6f9567fda928260afed2de480d79c98bf0bec0 split: test type: clarin-knext/nfcorpus-pl metrics: - type: main_score value: 31.391000000000002 - type: map_at_1 value: 4.163 - type: map_at_10 value: 10.744 - type: map_at_100 value: 14.038999999999998 - type: map_at_1000 value: 15.434999999999999 - type: map_at_20 value: 12.16 - type: map_at_3 value: 7.614999999999999 - type: map_at_5 value: 9.027000000000001 - type: mrr_at_1 value: 39.0092879256966 - type: mrr_at_10 value: 48.69809327239668 - type: mrr_at_100 value: 49.20788148442068 - type: mrr_at_1000 value: 49.25509336494706 - type: mrr_at_20 value: 48.99606551850896 - type: mrr_at_3 value: 46.284829721362236 - type: mrr_at_5 value: 47.77089783281735 - type: nauc_map_at_1000_diff1 value: 22.75421477116417 - type: nauc_map_at_1000_max value: 49.242283787799046 - type: nauc_map_at_1000_std value: 29.056888272331832 - type: nauc_map_at_100_diff1 value: 23.585977398585594 - type: nauc_map_at_100_max value: 48.25845199409498 - type: nauc_map_at_100_std value: 24.944264511223693 - type: nauc_map_at_10_diff1 value: 27.386613094780255 - type: nauc_map_at_10_max value: 41.52415346691586 - type: nauc_map_at_10_std value: 12.93872448563755 - type: nauc_map_at_1_diff1 value: 46.78688143865053 - type: nauc_map_at_1_max value: 37.20408843995871 - type: nauc_map_at_1_std value: 4.383444959401098 - type: nauc_map_at_20_diff1 value: 25.590969047740288 - type: 
nauc_map_at_20_max value: 44.57109307999418 - type: nauc_map_at_20_std value: 16.45855141821407 - type: nauc_map_at_3_diff1 value: 36.30017108362863 - type: nauc_map_at_3_max value: 34.66149613991648 - type: nauc_map_at_3_std value: 5.67985905078467 - type: nauc_map_at_5_diff1 value: 31.157644795417223 - type: nauc_map_at_5_max value: 37.274738661636825 - type: nauc_map_at_5_std value: 8.70088872394168 - type: nauc_mrr_at_1000_diff1 value: 25.638564218157384 - type: nauc_mrr_at_1000_max value: 57.77788270285353 - type: nauc_mrr_at_1000_std value: 43.507586592911274 - type: nauc_mrr_at_100_diff1 value: 25.662002580561584 - type: nauc_mrr_at_100_max value: 57.80578394278584 - type: nauc_mrr_at_100_std value: 43.543905743986635 - type: nauc_mrr_at_10_diff1 value: 25.426034796339835 - type: nauc_mrr_at_10_max value: 57.68443186258669 - type: nauc_mrr_at_10_std value: 43.438009108331215 - type: nauc_mrr_at_1_diff1 value: 26.073028156311075 - type: nauc_mrr_at_1_max value: 52.11817916720053 - type: nauc_mrr_at_1_std value: 37.41073893153695 - type: nauc_mrr_at_20_diff1 value: 25.548645553336147 - type: nauc_mrr_at_20_max value: 57.78552760401915 - type: nauc_mrr_at_20_std value: 43.521687428822325 - type: nauc_mrr_at_3_diff1 value: 25.72662577397805 - type: nauc_mrr_at_3_max value: 56.891263536265605 - type: nauc_mrr_at_3_std value: 41.384872305390104 - type: nauc_mrr_at_5_diff1 value: 25.552211551655386 - type: nauc_mrr_at_5_max value: 57.976813828353926 - type: nauc_mrr_at_5_std value: 43.504564461855544 - type: nauc_ndcg_at_1000_diff1 value: 23.456158044182757 - type: nauc_ndcg_at_1000_max value: 60.05411773552709 - type: nauc_ndcg_at_1000_std value: 47.857510017262584 - type: nauc_ndcg_at_100_diff1 value: 19.711635700390772 - type: nauc_ndcg_at_100_max value: 56.178746740470665 - type: nauc_ndcg_at_100_std value: 42.36829180286942 - type: nauc_ndcg_at_10_diff1 value: 18.364428967788413 - type: nauc_ndcg_at_10_max value: 54.38372506578223 - type: nauc_ndcg_at_10_std value: 41.75765411340369 - type: nauc_ndcg_at_1_diff1 value: 26.571093272640773 - type: nauc_ndcg_at_1_max value: 51.061788341958284 - type: nauc_ndcg_at_1_std value: 36.514987974075986 - type: nauc_ndcg_at_20_diff1 value: 18.345487193027697 - type: nauc_ndcg_at_20_max value: 54.62621882656994 - type: nauc_ndcg_at_20_std value: 41.42835554714241 - type: nauc_ndcg_at_3_diff1 value: 23.260105658139025 - type: nauc_ndcg_at_3_max value: 52.07747385334546 - type: nauc_ndcg_at_3_std value: 36.91985577837284 - type: nauc_ndcg_at_5_diff1 value: 20.40428109665566 - type: nauc_ndcg_at_5_max value: 53.52015347884604 - type: nauc_ndcg_at_5_std value: 39.46008849580017 - type: nauc_precision_at_1000_diff1 value: -7.3487344916380035 - type: nauc_precision_at_1000_max value: 16.58045221394852 - type: nauc_precision_at_1000_std value: 38.94030932397075 - type: nauc_precision_at_100_diff1 value: -5.257743986683922 - type: nauc_precision_at_100_max value: 34.43071687475306 - type: nauc_precision_at_100_std value: 53.499519170670474 - type: nauc_precision_at_10_diff1 value: 2.385136433119139 - type: nauc_precision_at_10_max value: 47.210743878631064 - type: nauc_precision_at_10_std value: 47.22767704186548 - type: nauc_precision_at_1_diff1 value: 26.073028156311075 - type: nauc_precision_at_1_max value: 52.11817916720053 - type: nauc_precision_at_1_std value: 37.41073893153695 - type: nauc_precision_at_20_diff1 value: -0.3531531127238474 - type: nauc_precision_at_20_max value: 44.78044604856974 - type: nauc_precision_at_20_std value: 
49.532804150743615 - type: nauc_precision_at_3_diff1 value: 15.350050569991447 - type: nauc_precision_at_3_max value: 51.01572315596549 - type: nauc_precision_at_3_std value: 38.801125728413155 - type: nauc_precision_at_5_diff1 value: 9.109003666144694 - type: nauc_precision_at_5_max value: 50.935269774898494 - type: nauc_precision_at_5_std value: 43.323548180559676 - type: nauc_recall_at_1000_diff1 value: 16.64743647648886 - type: nauc_recall_at_1000_max value: 38.46012283772285 - type: nauc_recall_at_1000_std value: 36.02016164796441 - type: nauc_recall_at_100_diff1 value: 14.005834785186744 - type: nauc_recall_at_100_max value: 37.70026105513647 - type: nauc_recall_at_100_std value: 27.085222642129697 - type: nauc_recall_at_10_diff1 value: 21.204106627422632 - type: nauc_recall_at_10_max value: 36.737624881893424 - type: nauc_recall_at_10_std value: 13.755054514272702 - type: nauc_recall_at_1_diff1 value: 46.78688143865053 - type: nauc_recall_at_1_max value: 37.20408843995871 - type: nauc_recall_at_1_std value: 4.383444959401098 - type: nauc_recall_at_20_diff1 value: 19.740977611421933 - type: nauc_recall_at_20_max value: 39.21908969539783 - type: nauc_recall_at_20_std value: 16.560269670318494 - type: nauc_recall_at_3_diff1 value: 32.189359545367815 - type: nauc_recall_at_3_max value: 31.693634445562758 - type: nauc_recall_at_3_std value: 6.246326281543587 - type: nauc_recall_at_5_diff1 value: 25.51586860499901 - type: nauc_recall_at_5_max value: 33.15934725342885 - type: nauc_recall_at_5_std value: 9.677778511696705 - type: ndcg_at_1 value: 37.307 - type: ndcg_at_10 value: 31.391000000000002 - type: ndcg_at_100 value: 28.877999999999997 - type: ndcg_at_1000 value: 37.16 - type: ndcg_at_20 value: 29.314 - type: ndcg_at_3 value: 35.405 - type: ndcg_at_5 value: 33.922999999999995 - type: precision_at_1 value: 39.009 - type: precision_at_10 value: 24.52 - type: precision_at_100 value: 7.703 - type: precision_at_1000 value: 2.04 - type: precision_at_20 value: 18.08 - type: precision_at_3 value: 34.469 - type: precision_at_5 value: 30.712 - type: recall_at_1 value: 4.163 - type: recall_at_10 value: 15.015999999999998 - type: recall_at_100 value: 30.606 - type: recall_at_1000 value: 59.606 - type: recall_at_20 value: 19.09 - type: recall_at_3 value: 9.139 - type: recall_at_5 value: 11.477 task: type: Retrieval - dataset: config: default name: MTEB NQ-PL (default) revision: f171245712cf85dd4700b06bef18001578d0ca8d split: test type: clarin-knext/nq-pl metrics: - type: main_score value: 54.017 - type: map_at_1 value: 34.193 - type: map_at_10 value: 47.497 - type: map_at_100 value: 48.441 - type: map_at_1000 value: 48.481 - type: map_at_20 value: 48.093 - type: map_at_3 value: 44.017 - type: map_at_5 value: 46.111000000000004 - type: mrr_at_1 value: 37.949015063731174 - type: mrr_at_10 value: 49.915772315105954 - type: mrr_at_100 value: 50.62841255829997 - type: mrr_at_1000 value: 50.656773027666745 - type: mrr_at_20 value: 50.37785276657083 - type: mrr_at_3 value: 46.98725376593267 - type: mrr_at_5 value: 48.763035921205066 - type: nauc_map_at_1000_diff1 value: 39.5632191792873 - type: nauc_map_at_1000_max value: 37.4728247053629 - type: nauc_map_at_1000_std value: 5.742498414663762 - type: nauc_map_at_100_diff1 value: 39.555570352061906 - type: nauc_map_at_100_max value: 37.497880976847334 - type: nauc_map_at_100_std value: 5.7798021019465375 - type: nauc_map_at_10_diff1 value: 39.5423723444454 - type: nauc_map_at_10_max value: 37.41661971723365 - type: nauc_map_at_10_std value: 
5.2378002164144695 - type: nauc_map_at_1_diff1 value: 41.52697034146981 - type: nauc_map_at_1_max value: 28.558995576942863 - type: nauc_map_at_1_std value: 0.13094542859192052 - type: nauc_map_at_20_diff1 value: 39.55484628943701 - type: nauc_map_at_20_max value: 37.5247794933719 - type: nauc_map_at_20_std value: 5.702881342279231 - type: nauc_map_at_3_diff1 value: 39.949323925425325 - type: nauc_map_at_3_max value: 35.770298168901924 - type: nauc_map_at_3_std value: 2.9127112432479874 - type: nauc_map_at_5_diff1 value: 39.768310617004545 - type: nauc_map_at_5_max value: 37.1549191664796 - type: nauc_map_at_5_std value: 4.4681285748269515 - type: nauc_mrr_at_1000_diff1 value: 39.14001746706457 - type: nauc_mrr_at_1000_max value: 37.477376518267775 - type: nauc_mrr_at_1000_std value: 6.8088891531621565 - type: nauc_mrr_at_100_diff1 value: 39.13054707413684 - type: nauc_mrr_at_100_max value: 37.498126443766274 - type: nauc_mrr_at_100_std value: 6.839411380129971 - type: nauc_mrr_at_10_diff1 value: 39.09764730048156 - type: nauc_mrr_at_10_max value: 37.58593798217306 - type: nauc_mrr_at_10_std value: 6.713795164982413 - type: nauc_mrr_at_1_diff1 value: 41.581599918664075 - type: nauc_mrr_at_1_max value: 31.500589231378722 - type: nauc_mrr_at_1_std value: 2.059116370339438 - type: nauc_mrr_at_20_diff1 value: 39.09011023988447 - type: nauc_mrr_at_20_max value: 37.55856008791344 - type: nauc_mrr_at_20_std value: 6.847165397615844 - type: nauc_mrr_at_3_diff1 value: 39.382542043738 - type: nauc_mrr_at_3_max value: 36.49265363659468 - type: nauc_mrr_at_3_std value: 4.759157976438336 - type: nauc_mrr_at_5_diff1 value: 39.304826333759976 - type: nauc_mrr_at_5_max value: 37.46326016736024 - type: nauc_mrr_at_5_std value: 6.122608305766621 - type: nauc_ndcg_at_1000_diff1 value: 38.568500038453266 - type: nauc_ndcg_at_1000_max value: 39.799710882413166 - type: nauc_ndcg_at_1000_std value: 9.357010223096639 - type: nauc_ndcg_at_100_diff1 value: 38.38026091343228 - type: nauc_ndcg_at_100_max value: 40.48398173542486 - type: nauc_ndcg_at_100_std value: 10.373054013302214 - type: nauc_ndcg_at_10_diff1 value: 38.27340980909964 - type: nauc_ndcg_at_10_max value: 40.35241649744093 - type: nauc_ndcg_at_10_std value: 8.579139930345168 - type: nauc_ndcg_at_1_diff1 value: 41.581599918664075 - type: nauc_ndcg_at_1_max value: 31.500589231378722 - type: nauc_ndcg_at_1_std value: 2.059116370339438 - type: nauc_ndcg_at_20_diff1 value: 38.26453028884807 - type: nauc_ndcg_at_20_max value: 40.70517858426641 - type: nauc_ndcg_at_20_std value: 9.987693876137905 - type: nauc_ndcg_at_3_diff1 value: 39.2078971733273 - type: nauc_ndcg_at_3_max value: 37.48672195565316 - type: nauc_ndcg_at_3_std value: 4.051464994659221 - type: nauc_ndcg_at_5_diff1 value: 38.883693595665285 - type: nauc_ndcg_at_5_max value: 39.763115634437135 - type: nauc_ndcg_at_5_std value: 6.738980451582073 - type: nauc_precision_at_1000_diff1 value: -7.223215910619012 - type: nauc_precision_at_1000_max value: 13.075844604892161 - type: nauc_precision_at_1000_std value: 19.864336920890107 - type: nauc_precision_at_100_diff1 value: 1.3305994810812418 - type: nauc_precision_at_100_max value: 25.9219108557104 - type: nauc_precision_at_100_std value: 27.5076605928207 - type: nauc_precision_at_10_diff1 value: 18.441551484970326 - type: nauc_precision_at_10_max value: 39.85995330437054 - type: nauc_precision_at_10_std value: 20.561269077428914 - type: nauc_precision_at_1_diff1 value: 41.581599918664075 - type: nauc_precision_at_1_max value: 31.500589231378722 - 
type: nauc_precision_at_1_std value: 2.059116370339438 - type: nauc_precision_at_20_diff1 value: 12.579593891480531 - type: nauc_precision_at_20_max value: 36.620221830588775 - type: nauc_precision_at_20_std value: 26.40364876775059 - type: nauc_precision_at_3_diff1 value: 30.158859294487073 - type: nauc_precision_at_3_max value: 41.168215766389174 - type: nauc_precision_at_3_std value: 9.44345004450809 - type: nauc_precision_at_5_diff1 value: 25.438624678672785 - type: nauc_precision_at_5_max value: 42.72802023518524 - type: nauc_precision_at_5_std value: 15.357657388511099 - type: nauc_recall_at_1000_diff1 value: 24.987564782718003 - type: nauc_recall_at_1000_max value: 70.508416373353 - type: nauc_recall_at_1000_std value: 69.75092280398808 - type: nauc_recall_at_100_diff1 value: 29.504202856421397 - type: nauc_recall_at_100_max value: 63.41356585545318 - type: nauc_recall_at_100_std value: 50.09250954437847 - type: nauc_recall_at_10_diff1 value: 32.355776022971774 - type: nauc_recall_at_10_max value: 49.47121901667283 - type: nauc_recall_at_10_std value: 19.418439406631244 - type: nauc_recall_at_1_diff1 value: 41.52697034146981 - type: nauc_recall_at_1_max value: 28.558995576942863 - type: nauc_recall_at_1_std value: 0.13094542859192052 - type: nauc_recall_at_20_diff1 value: 31.57334731023589 - type: nauc_recall_at_20_max value: 54.06567225197383 - type: nauc_recall_at_20_std value: 29.222029720570468 - type: nauc_recall_at_3_diff1 value: 36.45033533275773 - type: nauc_recall_at_3_max value: 40.39529713780803 - type: nauc_recall_at_3_std value: 5.21893897772794 - type: nauc_recall_at_5_diff1 value: 35.18471678478859 - type: nauc_recall_at_5_max value: 46.20100816867823 - type: nauc_recall_at_5_std value: 11.94481894633221 - type: ndcg_at_1 value: 37.949 - type: ndcg_at_10 value: 54.017 - type: ndcg_at_100 value: 58.126 - type: ndcg_at_1000 value: 59.073 - type: ndcg_at_20 value: 55.928 - type: ndcg_at_3 value: 47.494 - type: ndcg_at_5 value: 50.975 - type: precision_at_1 value: 37.949 - type: precision_at_10 value: 8.450000000000001 - type: precision_at_100 value: 1.083 - type: precision_at_1000 value: 0.117 - type: precision_at_20 value: 4.689 - type: precision_at_3 value: 21.051000000000002 - type: precision_at_5 value: 14.664 - type: recall_at_1 value: 34.193 - type: recall_at_10 value: 71.357 - type: recall_at_100 value: 89.434 - type: recall_at_1000 value: 96.536 - type: recall_at_20 value: 78.363 - type: recall_at_3 value: 54.551 - type: recall_at_5 value: 62.543000000000006 task: type: Retrieval - dataset: config: default name: MTEB Quora-PL (default) revision: 0be27e93455051e531182b85e85e425aba12e9d4 split: test type: clarin-knext/quora-pl metrics: - type: main_score value: 84.114 - type: map_at_1 value: 65.848 - type: map_at_10 value: 79.85900000000001 - type: map_at_100 value: 80.582 - type: map_at_1000 value: 80.60300000000001 - type: map_at_20 value: 80.321 - type: map_at_3 value: 76.741 - type: map_at_5 value: 78.72200000000001 - type: mrr_at_1 value: 75.97 - type: mrr_at_10 value: 83.04630158730119 - type: mrr_at_100 value: 83.22785731032968 - type: mrr_at_1000 value: 83.23123717623899 - type: mrr_at_20 value: 83.17412021320565 - type: mrr_at_3 value: 81.83333333333287 - type: mrr_at_5 value: 82.61933333333275 - type: nauc_map_at_1000_diff1 value: 73.26316553371083 - type: nauc_map_at_1000_max value: 27.92567859085245 - type: nauc_map_at_1000_std value: -47.477909533360446 - type: nauc_map_at_100_diff1 value: 73.2690602807223 - type: nauc_map_at_100_max value: 
27.915868327849996 - type: nauc_map_at_100_std value: -47.525777766107595 - type: nauc_map_at_10_diff1 value: 73.45464428464894 - type: nauc_map_at_10_max value: 27.451611487246296 - type: nauc_map_at_10_std value: -49.35818715843809 - type: nauc_map_at_1_diff1 value: 77.29690208952982 - type: nauc_map_at_1_max value: 19.839875762282293 - type: nauc_map_at_1_std value: -45.355684654708284 - type: nauc_map_at_20_diff1 value: 73.35102731979796 - type: nauc_map_at_20_max value: 27.741506490134583 - type: nauc_map_at_20_std value: -48.22006207310331 - type: nauc_map_at_3_diff1 value: 73.94878241064137 - type: nauc_map_at_3_max value: 24.761321386766728 - type: nauc_map_at_3_std value: -51.20638883618126 - type: nauc_map_at_5_diff1 value: 73.66143558047698 - type: nauc_map_at_5_max value: 26.53483405013543 - type: nauc_map_at_5_std value: -50.697541279640056 - type: nauc_mrr_at_1000_diff1 value: 73.84632320009759 - type: nauc_mrr_at_1000_max value: 30.50182733610048 - type: nauc_mrr_at_1000_std value: -44.3021647995251 - type: nauc_mrr_at_100_diff1 value: 73.84480792662302 - type: nauc_mrr_at_100_max value: 30.50749424571614 - type: nauc_mrr_at_100_std value: -44.29615086388113 - type: nauc_mrr_at_10_diff1 value: 73.79442772949346 - type: nauc_mrr_at_10_max value: 30.55724252219984 - type: nauc_mrr_at_10_std value: -44.50997069462057 - type: nauc_mrr_at_1_diff1 value: 75.23369827945945 - type: nauc_mrr_at_1_max value: 29.20073967447664 - type: nauc_mrr_at_1_std value: -43.1920147658285 - type: nauc_mrr_at_20_diff1 value: 73.82731678072307 - type: nauc_mrr_at_20_max value: 30.566328605497667 - type: nauc_mrr_at_20_std value: -44.24683607643705 - type: nauc_mrr_at_3_diff1 value: 73.61997576749954 - type: nauc_mrr_at_3_max value: 30.150393853381917 - type: nauc_mrr_at_3_std value: -44.96847297506626 - type: nauc_mrr_at_5_diff1 value: 73.69084310616132 - type: nauc_mrr_at_5_max value: 30.578033703441125 - type: nauc_mrr_at_5_std value: -44.74920746066566 - type: nauc_ndcg_at_1000_diff1 value: 72.89349862557452 - type: nauc_ndcg_at_1000_max value: 29.824725190462086 - type: nauc_ndcg_at_1000_std value: -44.96284395063211 - type: nauc_ndcg_at_100_diff1 value: 72.85212753715273 - type: nauc_ndcg_at_100_max value: 29.933114207845605 - type: nauc_ndcg_at_100_std value: -44.944225570663754 - type: nauc_ndcg_at_10_diff1 value: 72.80576740454528 - type: nauc_ndcg_at_10_max value: 29.16829118320828 - type: nauc_ndcg_at_10_std value: -48.149473740079614 - type: nauc_ndcg_at_1_diff1 value: 75.00032534968587 - type: nauc_ndcg_at_1_max value: 29.61849062038547 - type: nauc_ndcg_at_1_std value: -42.560207043864054 - type: nauc_ndcg_at_20_diff1 value: 72.88440406302502 - type: nauc_ndcg_at_20_max value: 29.65496676092656 - type: nauc_ndcg_at_20_std value: -46.21238462167732 - type: nauc_ndcg_at_3_diff1 value: 72.37916962766987 - type: nauc_ndcg_at_3_max value: 27.125094834547586 - type: nauc_ndcg_at_3_std value: -48.62942991399391 - type: nauc_ndcg_at_5_diff1 value: 72.57017330527658 - type: nauc_ndcg_at_5_max value: 28.470485561757254 - type: nauc_ndcg_at_5_std value: -49.07593345591059 - type: nauc_precision_at_1000_diff1 value: -41.67915575853946 - type: nauc_precision_at_1000_max value: 1.2012264478568844 - type: nauc_precision_at_1000_std value: 44.723834559400466 - type: nauc_precision_at_100_diff1 value: -40.45196679236971 - type: nauc_precision_at_100_max value: 2.3525450401714894 - type: nauc_precision_at_100_std value: 43.7092529413952 - type: nauc_precision_at_10_diff1 value: -30.256026923068767 - 
type: nauc_precision_at_10_max value: 8.313422052132559 - type: nauc_precision_at_10_std value: 25.929372356449694 - type: nauc_precision_at_1_diff1 value: 75.00032534968587 - type: nauc_precision_at_1_max value: 29.61849062038547 - type: nauc_precision_at_1_std value: -42.560207043864054 - type: nauc_precision_at_20_diff1 value: -35.61971069986584 - type: nauc_precision_at_20_max value: 5.4664303079116765 - type: nauc_precision_at_20_std value: 34.992352471692826 - type: nauc_precision_at_3_diff1 value: -5.691231842471157 - type: nauc_precision_at_3_max value: 14.797949087742444 - type: nauc_precision_at_3_std value: -0.1930317395644928 - type: nauc_precision_at_5_diff1 value: -20.03913781462645 - type: nauc_precision_at_5_max value: 11.956771408712749 - type: nauc_precision_at_5_std value: 13.179251389859731 - type: nauc_recall_at_1000_diff1 value: 64.03509042729674 - type: nauc_recall_at_1000_max value: 40.91691485428493 - type: nauc_recall_at_1000_std value: 16.12968625875372 - type: nauc_recall_at_100_diff1 value: 63.83116179628575 - type: nauc_recall_at_100_max value: 43.72908117676382 - type: nauc_recall_at_100_std value: -20.50966716852155 - type: nauc_recall_at_10_diff1 value: 66.42071960186394 - type: nauc_recall_at_10_max value: 28.983207818687205 - type: nauc_recall_at_10_std value: -56.61417798753744 - type: nauc_recall_at_1_diff1 value: 77.29690208952982 - type: nauc_recall_at_1_max value: 19.839875762282293 - type: nauc_recall_at_1_std value: -45.355684654708284 - type: nauc_recall_at_20_diff1 value: 66.32360705219874 - type: nauc_recall_at_20_max value: 33.30698111822631 - type: nauc_recall_at_20_std value: -43.89233781737452 - type: nauc_recall_at_3_diff1 value: 69.67029394927077 - type: nauc_recall_at_3_max value: 22.67803039327696 - type: nauc_recall_at_3_std value: -56.43327209861502 - type: nauc_recall_at_5_diff1 value: 68.05622143936131 - type: nauc_recall_at_5_max value: 26.67795559040675 - type: nauc_recall_at_5_std value: -58.158231198510954 - type: ndcg_at_1 value: 76.08 - type: ndcg_at_10 value: 84.114 - type: ndcg_at_100 value: 85.784 - type: ndcg_at_1000 value: 85.992 - type: ndcg_at_20 value: 84.976 - type: ndcg_at_3 value: 80.74799999999999 - type: ndcg_at_5 value: 82.626 - type: precision_at_1 value: 76.08 - type: precision_at_10 value: 12.926000000000002 - type: precision_at_100 value: 1.509 - type: precision_at_1000 value: 0.156 - type: precision_at_20 value: 6.912999999999999 - type: precision_at_3 value: 35.5 - type: precision_at_5 value: 23.541999999999998 - type: recall_at_1 value: 65.848 - type: recall_at_10 value: 92.611 - type: recall_at_100 value: 98.69 - type: recall_at_1000 value: 99.83999999999999 - type: recall_at_20 value: 95.47200000000001 - type: recall_at_3 value: 83.122 - type: recall_at_5 value: 88.23 task: type: Retrieval - dataset: config: default name: MTEB SCIDOCS-PL (default) revision: 45452b03f05560207ef19149545f168e596c9337 split: test type: clarin-knext/scidocs-pl metrics: - type: main_score value: 15.379999999999999 - type: map_at_1 value: 3.6029999999999998 - type: map_at_10 value: 8.843 - type: map_at_100 value: 10.433 - type: map_at_1000 value: 10.689 - type: map_at_20 value: 9.597 - type: map_at_3 value: 6.363 - type: map_at_5 value: 7.603 - type: mrr_at_1 value: 17.7 - type: mrr_at_10 value: 26.58900793650793 - type: mrr_at_100 value: 27.699652322890987 - type: mrr_at_1000 value: 27.78065313118353 - type: mrr_at_20 value: 27.215020950411816 - type: mrr_at_3 value: 23.36666666666668 - type: mrr_at_5 value: 25.211666666666666 
- type: nauc_map_at_1000_diff1 value: 21.92235143827129 - type: nauc_map_at_1000_max value: 37.50300940750989 - type: nauc_map_at_1000_std value: 20.872586122198552 - type: nauc_map_at_100_diff1 value: 21.917408170465833 - type: nauc_map_at_100_max value: 37.4654466815513 - type: nauc_map_at_100_std value: 20.621643878648534 - type: nauc_map_at_10_diff1 value: 22.914388723621183 - type: nauc_map_at_10_max value: 36.468131213468794 - type: nauc_map_at_10_std value: 16.760980140791492 - type: nauc_map_at_1_diff1 value: 29.00799502838457 - type: nauc_map_at_1_max value: 26.64926291797503 - type: nauc_map_at_1_std value: 8.167291261637361 - type: nauc_map_at_20_diff1 value: 22.46580947804047 - type: nauc_map_at_20_max value: 36.656294842562275 - type: nauc_map_at_20_std value: 18.099232417722078 - type: nauc_map_at_3_diff1 value: 23.436009032045934 - type: nauc_map_at_3_max value: 31.325807212280914 - type: nauc_map_at_3_std value: 9.780905232048852 - type: nauc_map_at_5_diff1 value: 22.891704394665528 - type: nauc_map_at_5_max value: 35.40584466642894 - type: nauc_map_at_5_std value: 13.476986099394656 - type: nauc_mrr_at_1000_diff1 value: 25.052937655397866 - type: nauc_mrr_at_1000_max value: 29.64431912670108 - type: nauc_mrr_at_1000_std value: 14.549744963988044 - type: nauc_mrr_at_100_diff1 value: 25.070871266969224 - type: nauc_mrr_at_100_max value: 29.68743604652336 - type: nauc_mrr_at_100_std value: 14.582010154574432 - type: nauc_mrr_at_10_diff1 value: 24.88881466938897 - type: nauc_mrr_at_10_max value: 29.488430770768144 - type: nauc_mrr_at_10_std value: 14.269241073852266 - type: nauc_mrr_at_1_diff1 value: 29.220540327267503 - type: nauc_mrr_at_1_max value: 26.81908580507911 - type: nauc_mrr_at_1_std value: 8.00840295809718 - type: nauc_mrr_at_20_diff1 value: 25.067912695721944 - type: nauc_mrr_at_20_max value: 29.759227563849628 - type: nauc_mrr_at_20_std value: 14.685076859257357 - type: nauc_mrr_at_3_diff1 value: 24.645848739182696 - type: nauc_mrr_at_3_max value: 27.73368549660351 - type: nauc_mrr_at_3_std value: 11.475742805586943 - type: nauc_mrr_at_5_diff1 value: 24.895295760909946 - type: nauc_mrr_at_5_max value: 29.130755033240423 - type: nauc_mrr_at_5_std value: 12.955802929145404 - type: nauc_ndcg_at_1000_diff1 value: 20.68434434777729 - type: nauc_ndcg_at_1000_max value: 37.67055146424174 - type: nauc_ndcg_at_1000_std value: 29.57493715069776 - type: nauc_ndcg_at_100_diff1 value: 20.396834816492383 - type: nauc_ndcg_at_100_max value: 37.460575228670514 - type: nauc_ndcg_at_100_std value: 27.826534756761944 - type: nauc_ndcg_at_10_diff1 value: 22.640844106236027 - type: nauc_ndcg_at_10_max value: 35.21291764462327 - type: nauc_ndcg_at_10_std value: 19.53289455984506 - type: nauc_ndcg_at_1_diff1 value: 29.220540327267503 - type: nauc_ndcg_at_1_max value: 26.81908580507911 - type: nauc_ndcg_at_1_std value: 8.00840295809718 - type: nauc_ndcg_at_20_diff1 value: 22.117126657768623 - type: nauc_ndcg_at_20_max value: 35.79395781940806 - type: nauc_ndcg_at_20_std value: 22.242748346260786 - type: nauc_ndcg_at_3_diff1 value: 23.00596063212187 - type: nauc_ndcg_at_3_max value: 30.149013627580523 - type: nauc_ndcg_at_3_std value: 11.07904064662722 - type: nauc_ndcg_at_5_diff1 value: 22.81875419630523 - type: nauc_ndcg_at_5_max value: 34.24267468356626 - type: nauc_ndcg_at_5_std value: 15.307780280752088 - type: nauc_precision_at_1000_diff1 value: 9.606677689029972 - type: nauc_precision_at_1000_max value: 32.74855550489271 - type: nauc_precision_at_1000_std value: 42.65372585937895 
- type: nauc_precision_at_100_diff1 value: 11.528981313529545 - type: nauc_precision_at_100_max value: 35.642529490132404 - type: nauc_precision_at_100_std value: 38.146151426052306 - type: nauc_precision_at_10_diff1 value: 18.783957183811836 - type: nauc_precision_at_10_max value: 36.1982008334257 - type: nauc_precision_at_10_std value: 25.09349473195891 - type: nauc_precision_at_1_diff1 value: 29.220540327267503 - type: nauc_precision_at_1_max value: 26.81908580507911 - type: nauc_precision_at_1_std value: 8.00840295809718 - type: nauc_precision_at_20_diff1 value: 17.458766320828214 - type: nauc_precision_at_20_max value: 36.000404903025235 - type: nauc_precision_at_20_std value: 29.1608044138323 - type: nauc_precision_at_3_diff1 value: 20.213669462067166 - type: nauc_precision_at_3_max value: 31.120650847205912 - type: nauc_precision_at_3_std value: 12.390972418818118 - type: nauc_precision_at_5_diff1 value: 20.114245715785678 - type: nauc_precision_at_5_max value: 37.30360111495823 - type: nauc_precision_at_5_std value: 19.053109037822853 - type: nauc_recall_at_1000_diff1 value: 9.85800049032612 - type: nauc_recall_at_1000_max value: 32.48319160802687 - type: nauc_recall_at_1000_std value: 43.79941601741161 - type: nauc_recall_at_100_diff1 value: 11.375255270968337 - type: nauc_recall_at_100_max value: 35.1868784124497 - type: nauc_recall_at_100_std value: 38.422680583482666 - type: nauc_recall_at_10_diff1 value: 18.445783123521938 - type: nauc_recall_at_10_max value: 35.633267936276766 - type: nauc_recall_at_10_std value: 24.94469506254716 - type: nauc_recall_at_1_diff1 value: 29.00799502838457 - type: nauc_recall_at_1_max value: 26.64926291797503 - type: nauc_recall_at_1_std value: 8.167291261637361 - type: nauc_recall_at_20_diff1 value: 17.314906604151936 - type: nauc_recall_at_20_max value: 35.66067699203996 - type: nauc_recall_at_20_std value: 29.400137012506082 - type: nauc_recall_at_3_diff1 value: 19.873710875648698 - type: nauc_recall_at_3_max value: 30.92404718742849 - type: nauc_recall_at_3_std value: 12.400871018075199 - type: nauc_recall_at_5_diff1 value: 19.869948324233192 - type: nauc_recall_at_5_max value: 37.06832511687574 - type: nauc_recall_at_5_std value: 19.0798814966156 - type: ndcg_at_1 value: 17.7 - type: ndcg_at_10 value: 15.379999999999999 - type: ndcg_at_100 value: 22.09 - type: ndcg_at_1000 value: 27.151999999999997 - type: ndcg_at_20 value: 17.576 - type: ndcg_at_3 value: 14.219999999999999 - type: ndcg_at_5 value: 12.579 - type: precision_at_1 value: 17.7 - type: precision_at_10 value: 8.08 - type: precision_at_100 value: 1.7840000000000003 - type: precision_at_1000 value: 0.3 - type: precision_at_20 value: 5.305 - type: precision_at_3 value: 13.167000000000002 - type: precision_at_5 value: 11.06 - type: recall_at_1 value: 3.6029999999999998 - type: recall_at_10 value: 16.413 - type: recall_at_100 value: 36.263 - type: recall_at_1000 value: 61.016999999999996 - type: recall_at_20 value: 21.587999999999997 - type: recall_at_3 value: 8.013 - type: recall_at_5 value: 11.198 task: type: Retrieval - dataset: config: default name: MTEB SciFact-PL (default) revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e split: test type: clarin-knext/scifact-pl metrics: - type: main_score value: 64.764 - type: map_at_1 value: 49.778 - type: map_at_10 value: 59.88 - type: map_at_100 value: 60.707 - type: map_at_1000 value: 60.729 - type: map_at_20 value: 60.419999999999995 - type: map_at_3 value: 57.45400000000001 - type: map_at_5 value: 58.729 - type: mrr_at_1 value: 
52.33333333333333 - type: mrr_at_10 value: 61.29193121693122 - type: mrr_at_100 value: 61.95817765126313 - type: mrr_at_1000 value: 61.97583284368782 - type: mrr_at_20 value: 61.72469949641003 - type: mrr_at_3 value: 59.44444444444444 - type: mrr_at_5 value: 60.494444444444454 - type: nauc_map_at_1000_diff1 value: 62.21235294015774 - type: nauc_map_at_1000_max value: 48.83996609100249 - type: nauc_map_at_1000_std value: 5.23892781043174 - type: nauc_map_at_100_diff1 value: 62.20170226789429 - type: nauc_map_at_100_max value: 48.8391766453537 - type: nauc_map_at_100_std value: 5.2664077457917715 - type: nauc_map_at_10_diff1 value: 61.961975488329024 - type: nauc_map_at_10_max value: 48.397109987625186 - type: nauc_map_at_10_std value: 4.314859710827481 - type: nauc_map_at_1_diff1 value: 65.0865197011516 - type: nauc_map_at_1_max value: 41.38862781954889 - type: nauc_map_at_1_std value: -0.9182122632530586 - type: nauc_map_at_20_diff1 value: 61.99173935851292 - type: nauc_map_at_20_max value: 48.79961814179307 - type: nauc_map_at_20_std value: 5.262181845825118 - type: nauc_map_at_3_diff1 value: 62.37910539880477 - type: nauc_map_at_3_max value: 47.13627890977091 - type: nauc_map_at_3_std value: 2.327897198087264 - type: nauc_map_at_5_diff1 value: 61.60080757149592 - type: nauc_map_at_5_max value: 47.60052458345962 - type: nauc_map_at_5_std value: 3.1770196981231047 - type: nauc_mrr_at_1000_diff1 value: 62.86810952814966 - type: nauc_mrr_at_1000_max value: 52.13248094447774 - type: nauc_mrr_at_1000_std value: 10.100485746570733 - type: nauc_mrr_at_100_diff1 value: 62.85364829491874 - type: nauc_mrr_at_100_max value: 52.134528010631854 - type: nauc_mrr_at_100_std value: 10.120945685447369 - type: nauc_mrr_at_10_diff1 value: 62.65679301829915 - type: nauc_mrr_at_10_max value: 52.09270719182349 - type: nauc_mrr_at_10_std value: 9.913834434725441 - type: nauc_mrr_at_1_diff1 value: 66.84108271415636 - type: nauc_mrr_at_1_max value: 46.67646429855176 - type: nauc_mrr_at_1_std value: 5.5505252956352304 - type: nauc_mrr_at_20_diff1 value: 62.72473227039611 - type: nauc_mrr_at_20_max value: 52.13479097802757 - type: nauc_mrr_at_20_std value: 10.188278833464084 - type: nauc_mrr_at_3_diff1 value: 63.797429185518496 - type: nauc_mrr_at_3_max value: 52.16486999573481 - type: nauc_mrr_at_3_std value: 9.094360767062762 - type: nauc_mrr_at_5_diff1 value: 62.592917975475494 - type: nauc_mrr_at_5_max value: 52.330741486107414 - type: nauc_mrr_at_5_std value: 9.742175534421389 - type: nauc_ndcg_at_1000_diff1 value: 61.38859337672476 - type: nauc_ndcg_at_1000_max value: 51.48380058339184 - type: nauc_ndcg_at_1000_std value: 9.670547660897673 - type: nauc_ndcg_at_100_diff1 value: 61.02438489641434 - type: nauc_ndcg_at_100_max value: 51.781246646780865 - type: nauc_ndcg_at_100_std value: 10.592961553245187 - type: nauc_ndcg_at_10_diff1 value: 60.03678353308358 - type: nauc_ndcg_at_10_max value: 50.70725688848762 - type: nauc_ndcg_at_10_std value: 7.9472446491016315 - type: nauc_ndcg_at_1_diff1 value: 66.84108271415636 - type: nauc_ndcg_at_1_max value: 46.67646429855176 - type: nauc_ndcg_at_1_std value: 5.5505252956352304 - type: nauc_ndcg_at_20_diff1 value: 59.828482718480224 - type: nauc_ndcg_at_20_max value: 51.45831789601284 - type: nauc_ndcg_at_20_std value: 10.722673683272049 - type: nauc_ndcg_at_3_diff1 value: 61.68982937524109 - type: nauc_ndcg_at_3_max value: 49.745326748604775 - type: nauc_ndcg_at_3_std value: 4.948298621202247 - type: nauc_ndcg_at_5_diff1 value: 59.67396171973207 - type: 
nauc_ndcg_at_5_max value: 49.87855139298281 - type: nauc_ndcg_at_5_std value: 6.08990428055584 - type: nauc_precision_at_1000_diff1 value: -1.594227972036865 - type: nauc_precision_at_1000_max value: 32.48431723086185 - type: nauc_precision_at_1000_std value: 53.84748466965268 - type: nauc_precision_at_100_diff1 value: 8.06411455192293 - type: nauc_precision_at_100_max value: 39.91003601878948 - type: nauc_precision_at_100_std value: 55.52979711075091 - type: nauc_precision_at_10_diff1 value: 26.610514456014066 - type: nauc_precision_at_10_max value: 47.09062494321172 - type: nauc_precision_at_10_std value: 33.91984226498748 - type: nauc_precision_at_1_diff1 value: 66.84108271415636 - type: nauc_precision_at_1_max value: 46.67646429855176 - type: nauc_precision_at_1_std value: 5.5505252956352304 - type: nauc_precision_at_20_diff1 value: 16.947688843085583 - type: nauc_precision_at_20_max value: 45.40488186572008 - type: nauc_precision_at_20_std value: 48.354421924500905 - type: nauc_precision_at_3_diff1 value: 49.11263981720622 - type: nauc_precision_at_3_max value: 52.7084625111683 - type: nauc_precision_at_3_std value: 16.734612173556453 - type: nauc_precision_at_5_diff1 value: 39.06503705015792 - type: nauc_precision_at_5_max value: 52.21710506893391 - type: nauc_precision_at_5_std value: 23.350948149460233 - type: nauc_recall_at_1000_diff1 value: 43.1559290382817 - type: nauc_recall_at_1000_max value: 83.66013071895456 - type: nauc_recall_at_1000_std value: 86.27450980392177 - type: nauc_recall_at_100_diff1 value: 46.016860850620375 - type: nauc_recall_at_100_max value: 69.3944888744547 - type: nauc_recall_at_100_std value: 55.286945696152735 - type: nauc_recall_at_10_diff1 value: 49.65877895350921 - type: nauc_recall_at_10_max value: 53.02636695700889 - type: nauc_recall_at_10_std value: 13.967608945823828 - type: nauc_recall_at_1_diff1 value: 65.0865197011516 - type: nauc_recall_at_1_max value: 41.38862781954889 - type: nauc_recall_at_1_std value: -0.9182122632530586 - type: nauc_recall_at_20_diff1 value: 43.355308229973524 - type: nauc_recall_at_20_max value: 57.04187909533764 - type: nauc_recall_at_20_std value: 33.578720846660524 - type: nauc_recall_at_3_diff1 value: 56.922996057428165 - type: nauc_recall_at_3_max value: 50.74417041895424 - type: nauc_recall_at_3_std value: 5.623890124328387 - type: nauc_recall_at_5_diff1 value: 50.55620076865238 - type: nauc_recall_at_5_max value: 51.3316854622085 - type: nauc_recall_at_5_std value: 8.995457887269255 - type: ndcg_at_1 value: 52.333 - type: ndcg_at_10 value: 64.764 - type: ndcg_at_100 value: 68.167 - type: ndcg_at_1000 value: 68.816 - type: ndcg_at_20 value: 66.457 - type: ndcg_at_3 value: 60.346 - type: ndcg_at_5 value: 62.365 - type: precision_at_1 value: 52.333 - type: precision_at_10 value: 8.799999999999999 - type: precision_at_100 value: 1.057 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_20 value: 4.8 - type: precision_at_3 value: 23.889 - type: precision_at_5 value: 15.6 - type: recall_at_1 value: 49.778 - type: recall_at_10 value: 78.206 - type: recall_at_100 value: 93.10000000000001 - type: recall_at_1000 value: 98.333 - type: recall_at_20 value: 84.467 - type: recall_at_3 value: 66.367 - type: recall_at_5 value: 71.35000000000001 task: type: Retrieval - dataset: config: default name: MTEB TRECCOVID-PL (default) revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd split: test type: clarin-knext/trec-covid-pl metrics: - type: main_score value: 72.18900000000001 - type: map_at_1 value: 0.214 - 
type: map_at_10 value: 1.755 - type: map_at_100 value: 9.944 - type: map_at_1000 value: 24.205 - type: map_at_20 value: 3.1510000000000002 - type: map_at_3 value: 0.6 - type: map_at_5 value: 0.9560000000000001 - type: mrr_at_1 value: 82.0 - type: mrr_at_10 value: 89.06666666666666 - type: mrr_at_100 value: 89.06666666666666 - type: mrr_at_1000 value: 89.06666666666666 - type: mrr_at_20 value: 89.06666666666666 - type: mrr_at_3 value: 87.66666666666666 - type: mrr_at_5 value: 89.06666666666666 - type: nauc_map_at_1000_diff1 value: -9.342037623635543 - type: nauc_map_at_1000_max value: 45.71499810252398 - type: nauc_map_at_1000_std value: 76.86482845196852 - type: nauc_map_at_100_diff1 value: -6.932395299866198 - type: nauc_map_at_100_max value: 36.097801891181604 - type: nauc_map_at_100_std value: 65.6085215411685 - type: nauc_map_at_10_diff1 value: -6.3654843824342775 - type: nauc_map_at_10_max value: 9.564437521432714 - type: nauc_map_at_10_std value: 21.8377319336476 - type: nauc_map_at_1_diff1 value: 8.269590874255034 - type: nauc_map_at_1_max value: 3.482498491294516 - type: nauc_map_at_1_std value: 8.985226819412189 - type: nauc_map_at_20_diff1 value: -4.971435767877232 - type: nauc_map_at_20_max value: 22.88801858567121 - type: nauc_map_at_20_std value: 32.38492618534027 - type: nauc_map_at_3_diff1 value: 1.1615973694623123 - type: nauc_map_at_3_max value: 1.935417800315643 - type: nauc_map_at_3_std value: 10.289328305818698 - type: nauc_map_at_5_diff1 value: -2.4675967231444105 - type: nauc_map_at_5_max value: 2.4611483736622373 - type: nauc_map_at_5_std value: 15.082324305750811 - type: nauc_mrr_at_1000_diff1 value: 13.098526703499063 - type: nauc_mrr_at_1000_max value: 56.37362177417431 - type: nauc_mrr_at_1000_std value: 73.2456769749587 - type: nauc_mrr_at_100_diff1 value: 13.098526703499063 - type: nauc_mrr_at_100_max value: 56.37362177417431 - type: nauc_mrr_at_100_std value: 73.2456769749587 - type: nauc_mrr_at_10_diff1 value: 13.098526703499063 - type: nauc_mrr_at_10_max value: 56.37362177417431 - type: nauc_mrr_at_10_std value: 73.2456769749587 - type: nauc_mrr_at_1_diff1 value: 12.099350148694809 - type: nauc_mrr_at_1_max value: 53.75041304108387 - type: nauc_mrr_at_1_std value: 68.84018063663402 - type: nauc_mrr_at_20_diff1 value: 13.098526703499063 - type: nauc_mrr_at_20_max value: 56.37362177417431 - type: nauc_mrr_at_20_std value: 73.2456769749587 - type: nauc_mrr_at_3_diff1 value: 12.173557857011161 - type: nauc_mrr_at_3_max value: 57.540780562363395 - type: nauc_mrr_at_3_std value: 75.42098189580211 - type: nauc_mrr_at_5_diff1 value: 13.098526703499063 - type: nauc_mrr_at_5_max value: 56.37362177417431 - type: nauc_mrr_at_5_std value: 73.2456769749587 - type: nauc_ndcg_at_1000_diff1 value: -8.951471847310401 - type: nauc_ndcg_at_1000_max value: 43.86942237288822 - type: nauc_ndcg_at_1000_std value: 74.61077735148591 - type: nauc_ndcg_at_100_diff1 value: -17.754559361083817 - type: nauc_ndcg_at_100_max value: 53.97187119773482 - type: nauc_ndcg_at_100_std value: 80.7944136146514 - type: nauc_ndcg_at_10_diff1 value: -26.637734697836414 - type: nauc_ndcg_at_10_max value: 47.70102699133149 - type: nauc_ndcg_at_10_std value: 70.26909560828646 - type: nauc_ndcg_at_1_diff1 value: -1.2250530785563207 - type: nauc_ndcg_at_1_max value: 46.60509554140131 - type: nauc_ndcg_at_1_std value: 62.63906581740976 - type: nauc_ndcg_at_20_diff1 value: -22.44286466550908 - type: nauc_ndcg_at_20_max value: 55.40492058090103 - type: nauc_ndcg_at_20_std value: 72.11813912145738 - type: 
nauc_ndcg_at_3_diff1 value: -14.8152721896563 - type: nauc_ndcg_at_3_max value: 38.952259383027595 - type: nauc_ndcg_at_3_std value: 59.819750166537766 - type: nauc_ndcg_at_5_diff1 value: -19.150105688904375 - type: nauc_ndcg_at_5_max value: 42.311180547775315 - type: nauc_ndcg_at_5_std value: 66.6632229321094 - type: nauc_precision_at_1000_diff1 value: -11.555591477978941 - type: nauc_precision_at_1000_max value: 43.7311644834851 - type: nauc_precision_at_1000_std value: 52.10644767999648 - type: nauc_precision_at_100_diff1 value: -16.94803099801117 - type: nauc_precision_at_100_max value: 54.08281631067633 - type: nauc_precision_at_100_std value: 82.77237347891331 - type: nauc_precision_at_10_diff1 value: -27.351332814863355 - type: nauc_precision_at_10_max value: 48.08237549065846 - type: nauc_precision_at_10_std value: 69.37250843534329 - type: nauc_precision_at_1_diff1 value: 12.099350148694809 - type: nauc_precision_at_1_max value: 53.75041304108387 - type: nauc_precision_at_1_std value: 68.84018063663402 - type: nauc_precision_at_20_diff1 value: -18.2422222283388 - type: nauc_precision_at_20_max value: 59.517328129343696 - type: nauc_precision_at_20_std value: 72.05149307342747 - type: nauc_precision_at_3_diff1 value: -10.226547543075897 - type: nauc_precision_at_3_max value: 43.14684818832875 - type: nauc_precision_at_3_std value: 57.31936467418288 - type: nauc_precision_at_5_diff1 value: -14.28521589468673 - type: nauc_precision_at_5_max value: 41.633426753962596 - type: nauc_precision_at_5_std value: 64.94400576804541 - type: nauc_recall_at_1000_diff1 value: -0.9648831207497152 - type: nauc_recall_at_1000_max value: 31.70832946085005 - type: nauc_recall_at_1000_std value: 63.21471613968869 - type: nauc_recall_at_100_diff1 value: -1.360254380933586 - type: nauc_recall_at_100_max value: 25.960597782099605 - type: nauc_recall_at_100_std value: 51.52757589609674 - type: nauc_recall_at_10_diff1 value: -0.3899439424189566 - type: nauc_recall_at_10_max value: 5.094341897886072 - type: nauc_recall_at_10_std value: 11.266045616925698 - type: nauc_recall_at_1_diff1 value: 8.269590874255034 - type: nauc_recall_at_1_max value: 3.482498491294516 - type: nauc_recall_at_1_std value: 8.985226819412189 - type: nauc_recall_at_20_diff1 value: 6.4797098359254175 - type: nauc_recall_at_20_max value: 15.663700985336124 - type: nauc_recall_at_20_std value: 17.154099587904913 - type: nauc_recall_at_3_diff1 value: 3.7245972450393507 - type: nauc_recall_at_3_max value: 0.4063857187240345 - type: nauc_recall_at_3_std value: 6.641948062821941 - type: nauc_recall_at_5_diff1 value: 4.013879477591466 - type: nauc_recall_at_5_max value: -1.4266586618013566 - type: nauc_recall_at_5_std value: 7.311601874411205 - type: ndcg_at_1 value: 75.0 - type: ndcg_at_10 value: 72.18900000000001 - type: ndcg_at_100 value: 54.022999999999996 - type: ndcg_at_1000 value: 49.492000000000004 - type: ndcg_at_20 value: 68.51 - type: ndcg_at_3 value: 73.184 - type: ndcg_at_5 value: 72.811 - type: precision_at_1 value: 82.0 - type: precision_at_10 value: 77.4 - type: precision_at_100 value: 55.24 - type: precision_at_1000 value: 21.822 - type: precision_at_20 value: 73.0 - type: precision_at_3 value: 79.333 - type: precision_at_5 value: 79.2 - type: recall_at_1 value: 0.214 - type: recall_at_10 value: 1.9980000000000002 - type: recall_at_100 value: 13.328999999999999 - type: recall_at_1000 value: 47.204 - type: recall_at_20 value: 3.7310000000000003 - type: recall_at_3 value: 0.628 - type: recall_at_5 value: 1.049 task: type: 
Retrieval - dataset: config: default name: MTEB CEDRClassification (default) revision: c0ba03d058e3e1b2f3fd20518875a4563dd12db4 split: test type: ai-forever/cedr-classification metrics: - type: accuracy value: 47.30605738575983 - type: f1 value: 41.26091043925065 - type: lrap value: 72.89452709883206 - type: main_score value: 47.30605738575983 task: type: MultilabelClassification - dataset: config: ru name: MTEB MIRACLReranking (ru) revision: 6d1962c527217f8927fca80f890f14f36b2802af split: dev type: miracl/mmteb-miracl-reranking metrics: - type: MAP@1(MIRACL) value: 20.721999999999998 - type: MAP@10(MIRACL) value: 33.900999999999996 - type: MAP@100(MIRACL) value: 36.813 - type: MAP@1000(MIRACL) value: 36.813 - type: MAP@20(MIRACL) value: 35.684 - type: MAP@3(MIRACL) value: 28.141 - type: MAP@5(MIRACL) value: 31.075000000000003 - type: NDCG@1(MIRACL) value: 32.799 - type: NDCG@10(MIRACL) value: 42.065000000000005 - type: NDCG@100(MIRACL) value: 49.730999999999995 - type: NDCG@1000(MIRACL) value: 49.730999999999995 - type: NDCG@20(MIRACL) value: 46.0 - type: NDCG@3(MIRACL) value: 34.481 - type: NDCG@5(MIRACL) value: 37.452999999999996 - type: P@1(MIRACL) value: 32.799 - type: P@10(MIRACL) value: 11.668000000000001 - type: P@100(MIRACL) value: 1.9529999999999998 - type: P@1000(MIRACL) value: 0.19499999999999998 - type: P@20(MIRACL) value: 7.51 - type: P@3(MIRACL) value: 20.823 - type: P@5(MIRACL) value: 16.728 - type: Recall@1(MIRACL) value: 20.721999999999998 - type: Recall@10(MIRACL) value: 54.762 - type: Recall@100(MIRACL) value: 79.952 - type: Recall@1000(MIRACL) value: 79.952 - type: Recall@20(MIRACL) value: 66.26100000000001 - type: Recall@3(MIRACL) value: 34.410000000000004 - type: Recall@5(MIRACL) value: 42.659000000000006 - type: main_score value: 42.065000000000005 - type: nAUC_MAP@1000_diff1(MIRACL) value: 14.33534992502818 - type: nAUC_MAP@1000_max(MIRACL) value: 12.367998764646115 - type: nAUC_MAP@1000_std(MIRACL) value: 4.569686002935006 - type: nAUC_MAP@100_diff1(MIRACL) value: 14.33534992502818 - type: nAUC_MAP@100_max(MIRACL) value: 12.367998764646115 - type: nAUC_MAP@100_std(MIRACL) value: 4.569686002935006 - type: nAUC_MAP@10_diff1(MIRACL) value: 16.920323975680027 - type: nAUC_MAP@10_max(MIRACL) value: 9.327171297204082 - type: nAUC_MAP@10_std(MIRACL) value: 3.2039133783079015 - type: nAUC_MAP@1_diff1(MIRACL) value: 28.698973487482206 - type: nAUC_MAP@1_max(MIRACL) value: 2.9217687660885034 - type: nAUC_MAP@1_std(MIRACL) value: -1.1247408800976524 - type: nAUC_MAP@20_diff1(MIRACL) value: 15.359083081640476 - type: nAUC_MAP@20_max(MIRACL) value: 11.310494233946345 - type: nAUC_MAP@20_std(MIRACL) value: 4.4171898386022885 - type: nAUC_MAP@3_diff1(MIRACL) value: 22.27430591851617 - type: nAUC_MAP@3_max(MIRACL) value: 6.407438291284658 - type: nAUC_MAP@3_std(MIRACL) value: 0.9799184530397409 - type: nAUC_MAP@5_diff1(MIRACL) value: 19.20571689941054 - type: nAUC_MAP@5_max(MIRACL) value: 7.987468654026893 - type: nAUC_MAP@5_std(MIRACL) value: 1.8324246565938962 - type: nAUC_NDCG@1000_diff1(MIRACL) value: 3.7537669018914768 - type: nAUC_NDCG@1000_max(MIRACL) value: 20.7944707840533 - type: nAUC_NDCG@1000_std(MIRACL) value: 8.444837055303063 - type: nAUC_NDCG@100_diff1(MIRACL) value: 3.7537669018914768 - type: nAUC_NDCG@100_max(MIRACL) value: 20.7944707840533 - type: nAUC_NDCG@100_std(MIRACL) value: 8.444837055303063 - type: nAUC_NDCG@10_diff1(MIRACL) value: 10.829575656103888 - type: nAUC_NDCG@10_max(MIRACL) value: 13.0445496498929 - type: nAUC_NDCG@10_std(MIRACL) value: 
6.050412212625362 - type: nAUC_NDCG@1_diff1(MIRACL) value: 19.1388712233292 - type: nAUC_NDCG@1_max(MIRACL) value: 10.871900994781642 - type: nAUC_NDCG@1_std(MIRACL) value: 3.218568248751811 - type: nAUC_NDCG@20_diff1(MIRACL) value: 7.093172181746442 - type: nAUC_NDCG@20_max(MIRACL) value: 16.955238078958836 - type: nAUC_NDCG@20_std(MIRACL) value: 8.325656379573035 - type: nAUC_NDCG@3_diff1(MIRACL) value: 17.134437303330802 - type: nAUC_NDCG@3_max(MIRACL) value: 10.235328822955793 - type: nAUC_NDCG@3_std(MIRACL) value: 3.2341358691084814 - type: nAUC_NDCG@5_diff1(MIRACL) value: 14.733664618337636 - type: nAUC_NDCG@5_max(MIRACL) value: 11.181897412035282 - type: nAUC_NDCG@5_std(MIRACL) value: 3.642277088791985 - type: nAUC_P@1000_diff1(MIRACL) value: -26.330038284867573 - type: nAUC_P@1000_max(MIRACL) value: 28.450694137240458 - type: nAUC_P@1000_std(MIRACL) value: 9.892993775474912 - type: nAUC_P@100_diff1(MIRACL) value: -26.330038284867552 - type: nAUC_P@100_max(MIRACL) value: 28.45069413724051 - type: nAUC_P@100_std(MIRACL) value: 9.892993775474928 - type: nAUC_P@10_diff1(MIRACL) value: -17.436937353231112 - type: nAUC_P@10_max(MIRACL) value: 24.327018012947857 - type: nAUC_P@10_std(MIRACL) value: 11.78803527706634 - type: nAUC_P@1_diff1(MIRACL) value: 19.1388712233292 - type: nAUC_P@1_max(MIRACL) value: 10.871900994781642 - type: nAUC_P@1_std(MIRACL) value: 3.218568248751811 - type: nAUC_P@20_diff1(MIRACL) value: -22.947528755272426 - type: nAUC_P@20_max(MIRACL) value: 27.773093471902538 - type: nAUC_P@20_std(MIRACL) value: 14.898619107087221 - type: nAUC_P@3_diff1(MIRACL) value: 1.4100426412400944 - type: nAUC_P@3_max(MIRACL) value: 17.397472872058845 - type: nAUC_P@3_std(MIRACL) value: 8.240008229861875 - type: nAUC_P@5_diff1(MIRACL) value: -7.971349332207021 - type: nAUC_P@5_max(MIRACL) value: 22.198441167940963 - type: nAUC_P@5_std(MIRACL) value: 9.00265164460082 - type: nAUC_Recall@1000_diff1(MIRACL) value: -38.69835271863148 - type: nAUC_Recall@1000_max(MIRACL) value: 50.9545152809108 - type: nAUC_Recall@1000_std(MIRACL) value: 20.44270887092116 - type: nAUC_Recall@100_diff1(MIRACL) value: -38.69835271863148 - type: nAUC_Recall@100_max(MIRACL) value: 50.9545152809108 - type: nAUC_Recall@100_std(MIRACL) value: 20.44270887092116 - type: nAUC_Recall@10_diff1(MIRACL) value: -0.08109036309433801 - type: nAUC_Recall@10_max(MIRACL) value: 12.696619907773568 - type: nAUC_Recall@10_std(MIRACL) value: 8.791982704261589 - type: nAUC_Recall@1_diff1(MIRACL) value: 28.698973487482206 - type: nAUC_Recall@1_max(MIRACL) value: 2.9217687660885034 - type: nAUC_Recall@1_std(MIRACL) value: -1.1247408800976524 - type: nAUC_Recall@20_diff1(MIRACL) value: -13.312171017942623 - type: nAUC_Recall@20_max(MIRACL) value: 24.19847346821666 - type: nAUC_Recall@20_std(MIRACL) value: 15.8157702609797 - type: nAUC_Recall@3_diff1(MIRACL) value: 16.909128321353343 - type: nAUC_Recall@3_max(MIRACL) value: 6.552122731902991 - type: nAUC_Recall@3_std(MIRACL) value: 1.9963898223457228 - type: nAUC_Recall@5_diff1(MIRACL) value: 9.990292655247721 - type: nAUC_Recall@5_max(MIRACL) value: 9.361722273507574 - type: nAUC_Recall@5_std(MIRACL) value: 3.270918827854495 task: type: Reranking - dataset: config: default name: MTEB SensitiveTopicsClassification (default) revision: 416b34a802308eac30e4192afc0ff99bb8dcc7f2 split: test type: ai-forever/sensitive-topics-classification metrics: - type: accuracy value: 30.634765625 - type: f1 value: 32.647559808678665 - type: lrap value: 45.94319661458259 - type: main_score value: 
30.634765625 task: type: MultilabelClassification - dataset: config: default name: MTEB ATEC (default) revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865 split: test type: C-MTEB/ATEC metrics: - type: cosine_pearson value: 47.541497334563296 - type: cosine_spearman value: 49.06268944206629 - type: euclidean_pearson value: 51.838926748581635 - type: euclidean_spearman value: 48.930697157135356 - type: main_score value: 49.06268944206629 - type: manhattan_pearson value: 51.835306769406365 - type: manhattan_spearman value: 48.86135493444834 - type: pearson value: 47.541497334563296 - type: spearman value: 49.06268944206629 task: type: STS - dataset: config: default name: MTEB AllegroReviews (default) revision: b89853e6de927b0e3bfa8ecc0e56fe4e02ceafc6 split: test type: PL-MTEB/allegro-reviews metrics: - type: accuracy value: 49.51292246520874 - type: f1 value: 44.14350234332397 - type: f1_weighted value: 51.65508998354552 - type: main_score value: 49.51292246520874 task: type: Classification - dataset: config: default name: MTEB AlloProfClusteringP2P (default) revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: main_score value: 63.883383458621665 - type: v_measure value: 63.883383458621665 - type: v_measure_std value: 2.693666879958465 task: type: Clustering - dataset: config: default name: MTEB 8TagsClustering revision: None split: test type: PL-MTEB/8tags-clustering metrics: - type: v_measure value: 43.657212124525546 task: type: Clustering - dataset: config: default name: MTEB AlloProfClusteringS2S (default) revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b split: test type: lyon-nlp/alloprof metrics: - type: main_score value: 46.85924588755251 - type: v_measure value: 46.85924588755251 - type: v_measure_std value: 2.1918258880872377 task: type: Clustering - dataset: config: default name: MTEB AlloprofReranking (default) revision: e40c8a63ce02da43200eccb5b0846fcaa888f562 split: test type: lyon-nlp/mteb-fr-reranking-alloprof-s2p metrics: - type: map value: 66.39013753839347 - type: mrr value: 67.68045617786551 - type: main_score value: 66.39013753839347 task: type: Reranking - dataset: config: default name: MTEB AlloprofRetrieval (default) revision: fcf295ea64c750f41fadbaa37b9b861558e1bfbd split: test type: lyon-nlp/alloprof metrics: - type: main_score value: 54.284 - type: map_at_1 value: 37.047000000000004 - type: map_at_10 value: 48.53 - type: map_at_100 value: 49.357 - type: map_at_1000 value: 49.39 - type: map_at_20 value: 49.064 - type: map_at_3 value: 45.675 - type: map_at_5 value: 47.441 - type: mrr_at_1 value: 37.04663212435233 - type: mrr_at_10 value: 48.5300326232969 - type: mrr_at_100 value: 49.35708199037581 - type: mrr_at_1000 value: 49.39005824603193 - type: mrr_at_20 value: 49.06417416464799 - type: mrr_at_3 value: 45.67501439263105 - type: mrr_at_5 value: 47.44099021301103 - type: nauc_map_at_1000_diff1 value: 43.32474221868009 - type: nauc_map_at_1000_max value: 39.407334029058575 - type: nauc_map_at_1000_std value: -2.3728154448932606 - type: nauc_map_at_100_diff1 value: 43.32336300929909 - type: nauc_map_at_100_max value: 39.432174777554835 - type: nauc_map_at_100_std value: -2.356396922384349 - type: nauc_map_at_10_diff1 value: 43.1606520154482 - type: nauc_map_at_10_max value: 39.33734650558226 - type: nauc_map_at_10_std value: -2.5156222475075256 - type: nauc_map_at_1_diff1 value: 46.2178975214499 - type: nauc_map_at_1_max value: 36.26173199049361 - type: nauc_map_at_1_std value: -3.0897555582816443 - type: 
nauc_map_at_20_diff1 value: 43.272980702916456 - type: nauc_map_at_20_max value: 39.4896977052276 - type: nauc_map_at_20_std value: -2.3305501742917043 - type: nauc_map_at_3_diff1 value: 43.49525042967079 - type: nauc_map_at_3_max value: 38.66352501824728 - type: nauc_map_at_3_std value: -3.202794391620473 - type: nauc_map_at_5_diff1 value: 43.2266692546611 - type: nauc_map_at_5_max value: 38.77368661115743 - type: nauc_map_at_5_std value: -3.0897532130127954 - type: nauc_mrr_at_1000_diff1 value: 43.32474221868009 - type: nauc_mrr_at_1000_max value: 39.407334029058575 - type: nauc_mrr_at_1000_std value: -2.3728154448932606 - type: nauc_mrr_at_100_diff1 value: 43.32336300929909 - type: nauc_mrr_at_100_max value: 39.432174777554835 - type: nauc_mrr_at_100_std value: -2.356396922384349 - type: nauc_mrr_at_10_diff1 value: 43.1606520154482 - type: nauc_mrr_at_10_max value: 39.33734650558226 - type: nauc_mrr_at_10_std value: -2.5156222475075256 - type: nauc_mrr_at_1_diff1 value: 46.2178975214499 - type: nauc_mrr_at_1_max value: 36.26173199049361 - type: nauc_mrr_at_1_std value: -3.0897555582816443 - type: nauc_mrr_at_20_diff1 value: 43.272980702916456 - type: nauc_mrr_at_20_max value: 39.4896977052276 - type: nauc_mrr_at_20_std value: -2.3305501742917043 - type: nauc_mrr_at_3_diff1 value: 43.49525042967079 - type: nauc_mrr_at_3_max value: 38.66352501824728 - type: nauc_mrr_at_3_std value: -3.202794391620473 - type: nauc_mrr_at_5_diff1 value: 43.2266692546611 - type: nauc_mrr_at_5_max value: 38.77368661115743 - type: nauc_mrr_at_5_std value: -3.0897532130127954 - type: nauc_ndcg_at_1000_diff1 value: 43.01903168202974 - type: nauc_ndcg_at_1000_max value: 40.75496622942232 - type: nauc_ndcg_at_1000_std value: -1.3150412981845496 - type: nauc_ndcg_at_100_diff1 value: 42.98016493758145 - type: nauc_ndcg_at_100_max value: 41.55869635162325 - type: nauc_ndcg_at_100_std value: -0.5355252976886055 - type: nauc_ndcg_at_10_diff1 value: 42.218755211347506 - type: nauc_ndcg_at_10_max value: 41.305042275175765 - type: nauc_ndcg_at_10_std value: -1.4034484444573714 - type: nauc_ndcg_at_1_diff1 value: 46.2178975214499 - type: nauc_ndcg_at_1_max value: 36.26173199049361 - type: nauc_ndcg_at_1_std value: -3.0897555582816443 - type: nauc_ndcg_at_20_diff1 value: 42.66574440095576 - type: nauc_ndcg_at_20_max value: 42.014620115124515 - type: nauc_ndcg_at_20_std value: -0.5176162553751498 - type: nauc_ndcg_at_3_diff1 value: 42.837450505106055 - type: nauc_ndcg_at_3_max value: 39.525369733082414 - type: nauc_ndcg_at_3_std value: -3.1605948245795155 - type: nauc_ndcg_at_5_diff1 value: 42.37951815451173 - type: nauc_ndcg_at_5_max value: 39.78840132935179 - type: nauc_ndcg_at_5_std value: -2.936898430768135 - type: nauc_precision_at_1000_diff1 value: 49.69224988612385 - type: nauc_precision_at_1000_max value: 79.57897547128005 - type: nauc_precision_at_1000_std value: 45.040371354764645 - type: nauc_precision_at_100_diff1 value: 42.70597486048422 - type: nauc_precision_at_100_max value: 65.74628759606188 - type: nauc_precision_at_100_std value: 25.49157745244855 - type: nauc_precision_at_10_diff1 value: 38.565609931689345 - type: nauc_precision_at_10_max value: 50.0239696180852 - type: nauc_precision_at_10_std value: 3.976354829503967 - type: nauc_precision_at_1_diff1 value: 46.2178975214499 - type: nauc_precision_at_1_max value: 36.26173199049361 - type: nauc_precision_at_1_std value: -3.0897555582816443 - type: nauc_precision_at_20_diff1 value: 40.4134718566864 - type: nauc_precision_at_20_max value: 57.121778108665374 
- type: nauc_precision_at_20_std value: 11.46021975428544 - type: nauc_precision_at_3_diff1 value: 40.90538379461529 - type: nauc_precision_at_3_max value: 42.18393248057992 - type: nauc_precision_at_3_std value: -3.005249943837297 - type: nauc_precision_at_5_diff1 value: 39.60162965860782 - type: nauc_precision_at_5_max value: 43.28317158174058 - type: nauc_precision_at_5_std value: -2.3469094487738054 - type: nauc_recall_at_1000_diff1 value: 49.69224988612252 - type: nauc_recall_at_1000_max value: 79.57897547127862 - type: nauc_recall_at_1000_std value: 45.04037135476256 - type: nauc_recall_at_100_diff1 value: 42.70597486048432 - type: nauc_recall_at_100_max value: 65.74628759606213 - type: nauc_recall_at_100_std value: 25.491577452448727 - type: nauc_recall_at_10_diff1 value: 38.56560993168935 - type: nauc_recall_at_10_max value: 50.02396961808522 - type: nauc_recall_at_10_std value: 3.9763548295040314 - type: nauc_recall_at_1_diff1 value: 46.2178975214499 - type: nauc_recall_at_1_max value: 36.26173199049361 - type: nauc_recall_at_1_std value: -3.0897555582816443 - type: nauc_recall_at_20_diff1 value: 40.41347185668637 - type: nauc_recall_at_20_max value: 57.12177810866533 - type: nauc_recall_at_20_std value: 11.460219754285431 - type: nauc_recall_at_3_diff1 value: 40.90538379461527 - type: nauc_recall_at_3_max value: 42.18393248057989 - type: nauc_recall_at_3_std value: -3.005249943837297 - type: nauc_recall_at_5_diff1 value: 39.601629658607784 - type: nauc_recall_at_5_max value: 43.28317158174053 - type: nauc_recall_at_5_std value: -2.3469094487738054 - type: ndcg_at_1 value: 37.047000000000004 - type: ndcg_at_10 value: 54.284 - type: ndcg_at_100 value: 58.34 - type: ndcg_at_1000 value: 59.303 - type: ndcg_at_20 value: 56.235 - type: ndcg_at_3 value: 48.503 - type: ndcg_at_5 value: 51.686 - type: precision_at_1 value: 37.047000000000004 - type: precision_at_10 value: 7.237 - type: precision_at_100 value: 0.914 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 4.005 - type: precision_at_3 value: 18.898 - type: precision_at_5 value: 12.884 - type: recall_at_1 value: 37.047000000000004 - type: recall_at_10 value: 72.366 - type: recall_at_100 value: 91.408 - type: recall_at_1000 value: 99.136 - type: recall_at_20 value: 80.095 - type: recall_at_3 value: 56.693000000000005 - type: recall_at_5 value: 64.42099999999999 task: type: Retrieval - dataset: config: en name: MTEB AmazonCounterfactualClassification (en) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 89.49253731343283 - type: ap value: 61.88098616359918 - type: ap_weighted value: 61.88098616359918 - type: f1 value: 84.76516623679144 - type: f1_weighted value: 89.92745276292968 - type: main_score value: 89.49253731343283 task: type: Classification - dataset: config: de name: MTEB AmazonCounterfactualClassification (de) revision: e8379541af4e31359cca9fbcf4b00f2671dba205 split: test type: mteb/amazon_counterfactual metrics: - type: accuracy value: 89.61456102783727 - type: ap value: 93.11816566733742 - type: ap_weighted value: 93.11816566733742 - type: f1 value: 88.27635757733722 - type: f1_weighted value: 89.82581568285453 - type: main_score value: 89.61456102783727 task: type: Classification - dataset: config: default name: MTEB AmazonPolarityClassification (default) revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 95.3825 - type: ap value: 93.393033869502 - 
type: ap_weighted value: 93.393033869502 - type: f1 value: 95.38109007966307 - type: f1_weighted value: 95.38109007966305 - type: main_score value: 95.3825 task: type: Classification - dataset: config: en name: MTEB AmazonReviewsClassification (en) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 49.768 - type: f1 value: 48.95084821944411 - type: f1_weighted value: 48.9508482194441 - type: main_score value: 49.768 task: type: Classification - dataset: config: de name: MTEB AmazonReviewsClassification (de) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 48.071999999999996 - type: f1 value: 47.24171107487612 - type: f1_weighted value: 47.24171107487612 - type: main_score value: 48.071999999999996 task: type: Classification - dataset: config: es name: MTEB AmazonReviewsClassification (es) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 48.102000000000004 - type: f1 value: 47.27193805278696 - type: f1_weighted value: 47.27193805278696 - type: main_score value: 48.102000000000004 task: type: Classification - dataset: config: fr name: MTEB AmazonReviewsClassification (fr) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 47.30800000000001 - type: f1 value: 46.41683358017851 - type: f1_weighted value: 46.41683358017851 - type: main_score value: 47.30800000000001 task: type: Classification - dataset: config: zh name: MTEB AmazonReviewsClassification (zh) revision: 1399c76144fd37290681b995c656ef9b2e06e26d split: test type: mteb/amazon_reviews_multi metrics: - type: accuracy value: 44.944 - type: f1 value: 44.223824487744395 - type: f1_weighted value: 44.22382448774439 - type: main_score value: 44.944 task: type: Classification - dataset: config: default name: MTEB ArguAna (default) revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: map_at_1 value: 29.232000000000003 - type: map_at_10 value: 45.117000000000004 - type: map_at_100 value: 45.977000000000004 - type: map_at_1000 value: 45.98 - type: map_at_20 value: 45.815 - type: map_at_3 value: 39.912 - type: map_at_5 value: 42.693 - type: mrr_at_1 value: 29.659000000000002 - type: mrr_at_10 value: 45.253 - type: mrr_at_100 value: 46.125 - type: mrr_at_1000 value: 46.129 - type: mrr_at_20 value: 45.964 - type: mrr_at_3 value: 40.043 - type: mrr_at_5 value: 42.870000000000005 - type: ndcg_at_1 value: 29.232000000000003 - type: ndcg_at_10 value: 54.327999999999996 - type: ndcg_at_100 value: 57.86 - type: ndcg_at_1000 value: 57.935 - type: ndcg_at_20 value: 56.794 - type: ndcg_at_3 value: 43.516 - type: ndcg_at_5 value: 48.512 - type: precision_at_1 value: 29.232000000000003 - type: precision_at_10 value: 8.393 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.676 - type: precision_at_3 value: 17.994 - type: precision_at_5 value: 13.215 - type: recall_at_1 value: 29.232000000000003 - type: recall_at_10 value: 83.926 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.644 - type: recall_at_20 value: 93.528 - type: recall_at_3 value: 53.983000000000004 - type: recall_at_5 value: 66.074 - type: main_score value: 54.327999999999996 task: type: Retrieval - dataset: config: default name: MTEB ArxivClusteringP2P (default) 
revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: main_score value: 46.6636824632419 - type: v_measure value: 46.6636824632419 - type: v_measure_std value: 13.817129140714963 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S (default) revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: main_score value: 39.271141892800024 - type: v_measure value: 39.271141892800024 - type: v_measure_std value: 14.276782483454827 task: type: Clustering - dataset: config: default name: MTEB AskUbuntuDupQuestions (default) revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 split: test type: mteb/askubuntudupquestions-reranking metrics: - type: map value: 65.04363277324629 - type: mrr value: 78.2372598162072 - type: main_score value: 65.04363277324629 task: type: Reranking - dataset: config: default name: MTEB MindSmallReranking (default) revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 split: test type: mteb/mind_small metrics: - type: map value: 30.83 - type: main_score value: 30.83 task: type: Reranking - dataset: config: default name: MTEB BIOSSES (default) revision: d3fb88f8f02e40887cd149695127462bbcf29b4a split: test type: mteb/biosses-sts metrics: - type: cosine_pearson value: 88.80382082011027 - type: cosine_spearman value: 88.68876782169106 - type: euclidean_pearson value: 87.00802890147176 - type: euclidean_spearman value: 87.43211268192712 - type: main_score value: 88.68876782169106 - type: manhattan_pearson value: 87.14062537179474 - type: manhattan_spearman value: 87.59115245033443 - type: pearson value: 88.80382082011027 - type: spearman value: 88.68876782169106 task: type: STS - dataset: config: default name: MTEB BQ (default) revision: e3dda5e115e487b39ec7e618c0c6a29137052a55 split: test type: C-MTEB/BQ metrics: - type: cosine_pearson value: 61.588006604878196 - type: cosine_spearman value: 63.20615427154465 - type: euclidean_pearson value: 61.818547092516496 - type: euclidean_spearman value: 63.21558009151778 - type: main_score value: 63.20615427154465 - type: manhattan_pearson value: 61.665588158487616 - type: manhattan_spearman value: 63.051544488238584 - type: pearson value: 61.588006604878196 - type: spearman value: 63.20615427154465 task: type: STS - dataset: config: default name: MTEB BSARDRetrieval (default) revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59 split: test type: maastrichtlawtech/bsard metrics: - type: main_score value: 64.414 - type: map_at_1 value: 14.865 - type: map_at_10 value: 21.605 - type: map_at_100 value: 22.762 - type: map_at_1000 value: 22.854 - type: map_at_20 value: 22.259999999999998 - type: map_at_3 value: 20.119999999999997 - type: map_at_5 value: 20.931 - type: mrr_at_1 value: 14.864864864864865 - type: mrr_at_10 value: 21.605176605176606 - type: mrr_at_100 value: 22.7622306460065 - type: mrr_at_1000 value: 22.85383406410312 - type: mrr_at_20 value: 22.259528463088845 - type: mrr_at_3 value: 20.12012012012012 - type: mrr_at_5 value: 20.930930930930934 - type: nauc_map_at_1000_diff1 value: 17.486265968689338 - type: nauc_map_at_1000_max value: 22.736799291688836 - type: nauc_map_at_1000_std value: 9.831687441977147 - type: nauc_map_at_100_diff1 value: 17.50754492049086 - type: nauc_map_at_100_max value: 22.77693662806787 - type: nauc_map_at_100_std value: 9.853899509675395 - type: nauc_map_at_10_diff1 value: 17.42133968580952 - type: nauc_map_at_10_max value: 22.45861793882279 - type: nauc_map_at_10_std 
value: 8.964888472915938 - type: nauc_map_at_1_diff1 value: 19.433947086968093 - type: nauc_map_at_1_max value: 24.75657047550517 - type: nauc_map_at_1_std value: 15.122329157218505 - type: nauc_map_at_20_diff1 value: 17.429856756008785 - type: nauc_map_at_20_max value: 22.438850987431017 - type: nauc_map_at_20_std value: 9.172746012213558 - type: nauc_map_at_3_diff1 value: 18.218182689678475 - type: nauc_map_at_3_max value: 23.57169444088667 - type: nauc_map_at_3_std value: 10.464473559366356 - type: nauc_map_at_5_diff1 value: 18.6075342519133 - type: nauc_map_at_5_max value: 23.308845973576673 - type: nauc_map_at_5_std value: 9.364009996445652 - type: nauc_mrr_at_1000_diff1 value: 17.486265968689338 - type: nauc_mrr_at_1000_max value: 22.736799291688836 - type: nauc_mrr_at_1000_std value: 9.831687441977147 - type: nauc_mrr_at_100_diff1 value: 17.50754492049086 - type: nauc_mrr_at_100_max value: 22.77693662806787 - type: nauc_mrr_at_100_std value: 9.853899509675395 - type: nauc_mrr_at_10_diff1 value: 17.42133968580952 - type: nauc_mrr_at_10_max value: 22.45861793882279 - type: nauc_mrr_at_10_std value: 8.964888472915938 - type: nauc_mrr_at_1_diff1 value: 19.433947086968093 - type: nauc_mrr_at_1_max value: 24.75657047550517 - type: nauc_mrr_at_1_std value: 15.122329157218505 - type: nauc_mrr_at_20_diff1 value: 17.429856756008785 - type: nauc_mrr_at_20_max value: 22.438850987431017 - type: nauc_mrr_at_20_std value: 9.172746012213558 - type: nauc_mrr_at_3_diff1 value: 18.218182689678475 - type: nauc_mrr_at_3_max value: 23.57169444088667 - type: nauc_mrr_at_3_std value: 10.464473559366356 - type: nauc_mrr_at_5_diff1 value: 18.6075342519133 - type: nauc_mrr_at_5_max value: 23.308845973576673 - type: nauc_mrr_at_5_std value: 9.364009996445652 - type: nauc_ndcg_at_1000_diff1 value: 16.327871824135745 - type: nauc_ndcg_at_1000_max value: 23.308241052911495 - type: nauc_ndcg_at_1000_std value: 11.50905911184097 - type: nauc_ndcg_at_100_diff1 value: 16.676226744692773 - type: nauc_ndcg_at_100_max value: 24.323253721240974 - type: nauc_ndcg_at_100_std value: 11.952612443651557 - type: nauc_ndcg_at_10_diff1 value: 16.030325121764594 - type: nauc_ndcg_at_10_max value: 21.306799242079542 - type: nauc_ndcg_at_10_std value: 6.63359364302513 - type: nauc_ndcg_at_1_diff1 value: 19.433947086968093 - type: nauc_ndcg_at_1_max value: 24.75657047550517 - type: nauc_ndcg_at_1_std value: 15.122329157218505 - type: nauc_ndcg_at_20_diff1 value: 16.013173605999857 - type: nauc_ndcg_at_20_max value: 21.607217260736576 - type: nauc_ndcg_at_20_std value: 7.319482417138996 - type: nauc_ndcg_at_3_diff1 value: 17.97958548328493 - type: nauc_ndcg_at_3_max value: 23.58346522810145 - type: nauc_ndcg_at_3_std value: 9.392582854708314 - type: nauc_ndcg_at_5_diff1 value: 18.734733324685287 - type: nauc_ndcg_at_5_max value: 23.273244317623742 - type: nauc_ndcg_at_5_std value: 7.638611545253834 - type: nauc_precision_at_1000_diff1 value: 7.919843339380295 - type: nauc_precision_at_1000_max value: 31.575386234270486 - type: nauc_precision_at_1000_std value: 39.332224386769404 - type: nauc_precision_at_100_diff1 value: 15.018050960000052 - type: nauc_precision_at_100_max value: 34.98209513759861 - type: nauc_precision_at_100_std value: 26.970034484359022 - type: nauc_precision_at_10_diff1 value: 12.102191084210922 - type: nauc_precision_at_10_max value: 18.112541150340675 - type: nauc_precision_at_10_std value: 0.7358784689406018 - type: nauc_precision_at_1_diff1 value: 19.433947086968093 - type: nauc_precision_at_1_max value: 
24.75657047550517 - type: nauc_precision_at_1_std value: 15.122329157218505 - type: nauc_precision_at_20_diff1 value: 12.018814361204328 - type: nauc_precision_at_20_max value: 19.75123746049928 - type: nauc_precision_at_20_std value: 3.012204650582264 - type: nauc_precision_at_3_diff1 value: 17.41375604940955 - type: nauc_precision_at_3_max value: 23.699834627021037 - type: nauc_precision_at_3_std value: 6.793486779050103 - type: nauc_precision_at_5_diff1 value: 19.194631963780257 - type: nauc_precision_at_5_max value: 23.31708702442155 - type: nauc_precision_at_5_std value: 3.4591358279667332 - type: nauc_recall_at_1000_diff1 value: 7.919843339380378 - type: nauc_recall_at_1000_max value: 31.57538623427063 - type: nauc_recall_at_1000_std value: 39.332224386769546 - type: nauc_recall_at_100_diff1 value: 15.018050960000085 - type: nauc_recall_at_100_max value: 34.9820951375986 - type: nauc_recall_at_100_std value: 26.97003448435901 - type: nauc_recall_at_10_diff1 value: 12.102191084210837 - type: nauc_recall_at_10_max value: 18.112541150340594 - type: nauc_recall_at_10_std value: 0.7358784689405188 - type: nauc_recall_at_1_diff1 value: 19.433947086968093 - type: nauc_recall_at_1_max value: 24.75657047550517 - type: nauc_recall_at_1_std value: 15.122329157218505 - type: nauc_recall_at_20_diff1 value: 12.01881436120429 - type: nauc_recall_at_20_max value: 19.751237460499222 - type: nauc_recall_at_20_std value: 3.0122046505822135 - type: nauc_recall_at_3_diff1 value: 17.413756049409503 - type: nauc_recall_at_3_max value: 23.699834627020998 - type: nauc_recall_at_3_std value: 6.793486779050083 - type: nauc_recall_at_5_diff1 value: 19.194631963780203 - type: nauc_recall_at_5_max value: 23.3170870244215 - type: nauc_recall_at_5_std value: 3.459135827966664 - type: ndcg_at_1 value: 14.865 - type: ndcg_at_10 value: 24.764 - type: ndcg_at_100 value: 30.861 - type: ndcg_at_1000 value: 33.628 - type: ndcg_at_20 value: 27.078000000000003 - type: ndcg_at_3 value: 21.675 - type: ndcg_at_5 value: 23.148 - type: precision_at_1 value: 14.865 - type: precision_at_10 value: 3.4680000000000004 - type: precision_at_100 value: 0.644 - type: precision_at_1000 value: 0.087 - type: precision_at_20 value: 2.185 - type: precision_at_3 value: 8.709 - type: precision_at_5 value: 5.946 - type: recall_at_1 value: 14.865 - type: recall_at_10 value: 34.685 - type: recall_at_100 value: 64.414 - type: recall_at_1000 value: 86.937 - type: recall_at_20 value: 43.694 - type: recall_at_3 value: 26.125999999999998 - type: recall_at_5 value: 29.73 task: type: Retrieval - dataset: config: default name: MTEB Banking77Classification (default) revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 84.08116883116882 - type: f1 value: 84.05587055990273 - type: f1_weighted value: 84.05587055990274 - type: main_score value: 84.08116883116882 task: type: Classification - dataset: config: default name: MTEB BiorxivClusteringP2P (default) revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: main_score value: 38.1941007822277 - type: v_measure value: 38.1941007822277 - type: v_measure_std value: 0.7502113547288178 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S (default) revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: main_score value: 34.42075599178318 - type: v_measure value: 34.42075599178318 - type: v_measure_std 
value: 0.600256720497283 task: type: Clustering - dataset: config: default name: MTEB BlurbsClusteringP2P (default) revision: a2dd5b02a77de3466a3eaa98ae586b5610314496 split: test type: slvnwhrl/blurbs-clustering-p2p metrics: - type: main_score value: 41.634627363047265 - type: v_measure value: 41.634627363047265 - type: v_measure_std value: 9.726923191225307 task: type: Clustering - dataset: config: default name: MTEB BlurbsClusteringS2S (default) revision: 22793b6a6465bf00120ad525e38c51210858132c split: test type: slvnwhrl/blurbs-clustering-s2s metrics: - type: main_score value: 20.996468295584197 - type: v_measure value: 20.996468295584197 - type: v_measure_std value: 9.225766688272197 task: type: Clustering - dataset: config: default name: MTEB CBD (default) revision: 36ddb419bcffe6a5374c3891957912892916f28d split: test type: PL-MTEB/cbd metrics: - type: accuracy value: 69.99 - type: ap value: 22.57826353116948 - type: ap_weighted value: 22.57826353116948 - type: f1 value: 59.04574955548393 - type: f1_weighted value: 74.36235022309789 - type: main_score value: 69.99 task: type: Classification - dataset: config: default name: MTEB CDSC-E (default) revision: 0a3d4aa409b22f80eb22cbf59b492637637b536d split: test type: PL-MTEB/cdsce-pairclassification metrics: - type: cosine_accuracy value: 88.7 - type: cosine_accuracy_threshold value: 97.37848043441772 - type: cosine_ap value: 73.0405088928302 - type: cosine_f1 value: 63.52201257861635 - type: cosine_f1_threshold value: 96.98888063430786 - type: cosine_precision value: 78.90625 - type: cosine_recall value: 53.1578947368421 - type: dot_accuracy value: 84.89999999999999 - type: dot_accuracy_threshold value: 43603.09753417969 - type: dot_ap value: 56.98157569085279 - type: dot_f1 value: 57.606490872210955 - type: dot_f1_threshold value: 40406.23779296875 - type: dot_precision value: 46.864686468646866 - type: dot_recall value: 74.73684210526315 - type: euclidean_accuracy value: 88.5 - type: euclidean_accuracy_threshold value: 498.0483055114746 - type: euclidean_ap value: 72.97328234816734 - type: euclidean_f1 value: 63.722397476340696 - type: euclidean_f1_threshold value: 508.6186408996582 - type: euclidean_precision value: 79.52755905511812 - type: euclidean_recall value: 53.1578947368421 - type: main_score value: 73.0405088928302 - type: manhattan_accuracy value: 88.6 - type: manhattan_accuracy_threshold value: 12233.079528808594 - type: manhattan_ap value: 72.92148503992615 - type: manhattan_f1 value: 63.69426751592356 - type: manhattan_f1_threshold value: 12392.754364013672 - type: manhattan_precision value: 80.64516129032258 - type: manhattan_recall value: 52.63157894736842 - type: max_accuracy value: 88.7 - type: max_ap value: 73.0405088928302 - type: max_f1 value: 63.722397476340696 - type: max_precision value: 80.64516129032258 - type: max_recall value: 74.73684210526315 - type: similarity_accuracy value: 88.7 - type: similarity_accuracy_threshold value: 97.37848043441772 - type: similarity_ap value: 73.0405088928302 - type: similarity_f1 value: 63.52201257861635 - type: similarity_f1_threshold value: 96.98888063430786 - type: similarity_precision value: 78.90625 - type: similarity_recall value: 53.1578947368421 task: type: PairClassification - dataset: config: default name: MTEB CDSC-R (default) revision: 1cd6abbb00df7d14be3dbd76a7dcc64b3a79a7cd split: test type: PL-MTEB/cdscr-sts metrics: - type: cosine_pearson value: 92.97492495289738 - type: cosine_spearman value: 92.63248098608472 - type: euclidean_pearson value: 92.04712487782031 
- type: euclidean_spearman value: 92.19679486755008 - type: main_score value: 92.63248098608472 - type: manhattan_pearson value: 92.0101187740438 - type: manhattan_spearman value: 92.20926859332754 - type: pearson value: 92.97492495289738 - type: spearman value: 92.63248098608472 task: type: STS - dataset: config: default name: MTEB CLSClusteringP2P (default) revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476 split: test type: C-MTEB/CLSClusteringP2P metrics: - type: main_score value: 39.96377851800628 - type: v_measure value: 39.96377851800628 - type: v_measure_std value: 0.9793033243093288 task: type: Clustering - dataset: config: default name: MTEB CLSClusteringS2S (default) revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f split: test type: C-MTEB/CLSClusteringS2S metrics: - type: main_score value: 38.788850224595784 - type: v_measure value: 38.788850224595784 - type: v_measure_std value: 1.0712604145916924 task: type: Clustering - dataset: config: default name: MTEB CMedQAv1 revision: 8d7f1e942507dac42dc58017c1a001c3717da7df split: test type: C-MTEB/CMedQAv1-reranking metrics: - type: map value: 77.95952507806115 - type: mrr value: 80.8643253968254 - type: main_score value: 77.95952507806115 task: type: Reranking - dataset: config: default name: MTEB CMedQAv2 revision: 23d186750531a14a0357ca22cd92d712fd512ea0 split: test type: C-MTEB/CMedQAv2-reranking metrics: - type: map value: 78.21522500165045 - type: mrr value: 81.28194444444443 - type: main_score value: 78.21522500165045 task: type: Reranking - dataset: config: default name: MTEB CQADupstackAndroidRetrieval (default) revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: mteb/cqadupstack-android metrics: - type: map_at_1 value: 33.377 - type: map_at_10 value: 46.371 - type: map_at_100 value: 47.829 - type: map_at_1000 value: 47.94 - type: map_at_20 value: 47.205000000000005 - type: map_at_3 value: 42.782 - type: map_at_5 value: 44.86 - type: mrr_at_1 value: 41.345 - type: mrr_at_10 value: 52.187 - type: mrr_at_100 value: 52.893 - type: mrr_at_1000 value: 52.929 - type: mrr_at_20 value: 52.637 - type: mrr_at_3 value: 49.714000000000006 - type: mrr_at_5 value: 51.373000000000005 - type: ndcg_at_1 value: 41.345 - type: ndcg_at_10 value: 52.946000000000005 - type: ndcg_at_100 value: 57.92699999999999 - type: ndcg_at_1000 value: 59.609 - type: ndcg_at_20 value: 54.900999999999996 - type: ndcg_at_3 value: 48.357 - type: ndcg_at_5 value: 50.739000000000004 - type: precision_at_1 value: 41.345 - type: precision_at_10 value: 10.186 - type: precision_at_100 value: 1.554 - type: precision_at_1000 value: 0.2 - type: precision_at_20 value: 5.959 - type: precision_at_3 value: 23.796 - type: precision_at_5 value: 17.024 - type: recall_at_1 value: 33.377 - type: recall_at_10 value: 65.067 - type: recall_at_100 value: 86.04899999999999 - type: recall_at_1000 value: 96.54899999999999 - type: recall_at_20 value: 72.071 - type: recall_at_3 value: 51.349999999999994 - type: recall_at_5 value: 58.41 - type: main_score value: 52.946000000000005 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval (default) revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: mteb/cqadupstack-english metrics: - type: map_at_1 value: 31.097 - type: map_at_10 value: 42.183 - type: map_at_100 value: 43.580999999999996 - type: map_at_1000 value: 43.718 - type: map_at_20 value: 42.921 - type: map_at_3 value: 38.963 - type: map_at_5 value: 40.815 - type: mrr_at_1 value: 39.745000000000005 - type: mrr_at_10 
value: 48.736000000000004 - type: mrr_at_100 value: 49.405 - type: mrr_at_1000 value: 49.452 - type: mrr_at_20 value: 49.118 - type: mrr_at_3 value: 46.497 - type: mrr_at_5 value: 47.827999999999996 - type: ndcg_at_1 value: 39.745000000000005 - type: ndcg_at_10 value: 48.248000000000005 - type: ndcg_at_100 value: 52.956 - type: ndcg_at_1000 value: 54.99699999999999 - type: ndcg_at_20 value: 50.01 - type: ndcg_at_3 value: 43.946000000000005 - type: ndcg_at_5 value: 46.038000000000004 - type: precision_at_1 value: 39.745000000000005 - type: precision_at_10 value: 9.229 - type: precision_at_100 value: 1.5070000000000001 - type: precision_at_1000 value: 0.199 - type: precision_at_20 value: 5.489999999999999 - type: precision_at_3 value: 21.38 - type: precision_at_5 value: 15.274 - type: recall_at_1 value: 31.097 - type: recall_at_10 value: 58.617 - type: recall_at_100 value: 78.55199999999999 - type: recall_at_1000 value: 91.13900000000001 - type: recall_at_20 value: 64.92 - type: recall_at_3 value: 45.672000000000004 - type: recall_at_5 value: 51.669 - type: main_score value: 48.248000000000005 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval (default) revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: mteb/cqadupstack-gaming metrics: - type: map_at_1 value: 39.745000000000005 - type: map_at_10 value: 52.063 - type: map_at_100 value: 53.077 - type: map_at_1000 value: 53.13 - type: map_at_20 value: 52.66 - type: map_at_3 value: 48.662 - type: map_at_5 value: 50.507000000000005 - type: mrr_at_1 value: 45.391999999999996 - type: mrr_at_10 value: 55.528 - type: mrr_at_100 value: 56.16100000000001 - type: mrr_at_1000 value: 56.192 - type: mrr_at_20 value: 55.923 - type: mrr_at_3 value: 52.93600000000001 - type: mrr_at_5 value: 54.435 - type: ndcg_at_1 value: 45.391999999999996 - type: ndcg_at_10 value: 58.019 - type: ndcg_at_100 value: 61.936 - type: ndcg_at_1000 value: 63.015 - type: ndcg_at_20 value: 59.691 - type: ndcg_at_3 value: 52.294 - type: ndcg_at_5 value: 55.017 - type: precision_at_1 value: 45.391999999999996 - type: precision_at_10 value: 9.386 - type: precision_at_100 value: 1.232 - type: precision_at_1000 value: 0.136 - type: precision_at_20 value: 5.223 - type: precision_at_3 value: 23.177 - type: precision_at_5 value: 15.9 - type: recall_at_1 value: 39.745000000000005 - type: recall_at_10 value: 72.08099999999999 - type: recall_at_100 value: 88.85300000000001 - type: recall_at_1000 value: 96.569 - type: recall_at_20 value: 78.203 - type: recall_at_3 value: 56.957 - type: recall_at_5 value: 63.63100000000001 - type: main_score value: 58.019 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval (default) revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: mteb/cqadupstack-gis metrics: - type: map_at_1 value: 26.651999999999997 - type: map_at_10 value: 35.799 - type: map_at_100 value: 36.846000000000004 - type: map_at_1000 value: 36.931000000000004 - type: map_at_20 value: 36.341 - type: map_at_3 value: 32.999 - type: map_at_5 value: 34.597 - type: mrr_at_1 value: 28.814 - type: mrr_at_10 value: 37.869 - type: mrr_at_100 value: 38.728 - type: mrr_at_1000 value: 38.795 - type: mrr_at_20 value: 38.317 - type: mrr_at_3 value: 35.235 - type: mrr_at_5 value: 36.738 - type: ndcg_at_1 value: 28.814 - type: ndcg_at_10 value: 41.028 - type: ndcg_at_100 value: 46.162 - type: ndcg_at_1000 value: 48.15 - type: ndcg_at_20 value: 42.824 - type: ndcg_at_3 value: 35.621 - type: ndcg_at_5 value: 
38.277 - type: precision_at_1 value: 28.814 - type: precision_at_10 value: 6.361999999999999 - type: precision_at_100 value: 0.9450000000000001 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_20 value: 3.6159999999999997 - type: precision_at_3 value: 15.140999999999998 - type: precision_at_5 value: 10.712000000000002 - type: recall_at_1 value: 26.651999999999997 - type: recall_at_10 value: 55.038 - type: recall_at_100 value: 78.806 - type: recall_at_1000 value: 93.485 - type: recall_at_20 value: 61.742 - type: recall_at_3 value: 40.682 - type: recall_at_5 value: 46.855000000000004 - type: main_score value: 41.028 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval (default) revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: mteb/cqadupstack-mathematica metrics: - type: map_at_1 value: 17.627000000000002 - type: map_at_10 value: 26.436999999999998 - type: map_at_100 value: 27.85 - type: map_at_1000 value: 27.955999999999996 - type: map_at_20 value: 27.233 - type: map_at_3 value: 23.777 - type: map_at_5 value: 25.122 - type: mrr_at_1 value: 22.387999999999998 - type: mrr_at_10 value: 31.589 - type: mrr_at_100 value: 32.641999999999996 - type: mrr_at_1000 value: 32.696999999999996 - type: mrr_at_20 value: 32.201 - type: mrr_at_3 value: 28.98 - type: mrr_at_5 value: 30.342000000000002 - type: ndcg_at_1 value: 22.387999999999998 - type: ndcg_at_10 value: 32.129999999999995 - type: ndcg_at_100 value: 38.562999999999995 - type: ndcg_at_1000 value: 40.903 - type: ndcg_at_20 value: 34.652 - type: ndcg_at_3 value: 27.26 - type: ndcg_at_5 value: 29.235 - type: precision_at_1 value: 22.387999999999998 - type: precision_at_10 value: 5.970000000000001 - type: precision_at_100 value: 1.068 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_20 value: 3.6999999999999997 - type: precision_at_3 value: 13.267000000000001 - type: precision_at_5 value: 9.403 - type: recall_at_1 value: 17.627000000000002 - type: recall_at_10 value: 44.71 - type: recall_at_100 value: 72.426 - type: recall_at_1000 value: 88.64699999999999 - type: recall_at_20 value: 53.65 - type: recall_at_3 value: 30.989 - type: recall_at_5 value: 36.237 - type: main_score value: 32.129999999999995 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval (default) revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: mteb/cqadupstack-physics metrics: - type: map_at_1 value: 30.891000000000002 - type: map_at_10 value: 41.519 - type: map_at_100 value: 42.896 - type: map_at_1000 value: 42.992999999999995 - type: map_at_20 value: 42.287 - type: map_at_3 value: 37.822 - type: map_at_5 value: 39.976 - type: mrr_at_1 value: 37.921 - type: mrr_at_10 value: 47.260999999999996 - type: mrr_at_100 value: 48.044 - type: mrr_at_1000 value: 48.08 - type: mrr_at_20 value: 47.699999999999996 - type: mrr_at_3 value: 44.513999999999996 - type: mrr_at_5 value: 46.064 - type: ndcg_at_1 value: 37.921 - type: ndcg_at_10 value: 47.806 - type: ndcg_at_100 value: 53.274 - type: ndcg_at_1000 value: 55.021 - type: ndcg_at_20 value: 49.973 - type: ndcg_at_3 value: 42.046 - type: ndcg_at_5 value: 44.835 - type: precision_at_1 value: 37.921 - type: precision_at_10 value: 8.767999999999999 - type: precision_at_100 value: 1.353 - type: precision_at_1000 value: 0.168 - type: precision_at_20 value: 5.135 - type: precision_at_3 value: 20.051 - type: precision_at_5 value: 14.398 - type: recall_at_1 value: 30.891000000000002 - type: 
recall_at_10 value: 60.897999999999996 - type: recall_at_100 value: 83.541 - type: recall_at_1000 value: 94.825 - type: recall_at_20 value: 68.356 - type: recall_at_3 value: 44.65 - type: recall_at_5 value: 51.919000000000004 - type: main_score value: 47.806 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval (default) revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: mteb/cqadupstack-programmers metrics: - type: map_at_1 value: 27.654 - type: map_at_10 value: 38.025999999999996 - type: map_at_100 value: 39.425 - type: map_at_1000 value: 39.528 - type: map_at_20 value: 38.838 - type: map_at_3 value: 34.745 - type: map_at_5 value: 36.537 - type: mrr_at_1 value: 34.018 - type: mrr_at_10 value: 43.314 - type: mrr_at_100 value: 44.283 - type: mrr_at_1000 value: 44.327 - type: mrr_at_20 value: 43.929 - type: mrr_at_3 value: 40.868 - type: mrr_at_5 value: 42.317 - type: ndcg_at_1 value: 34.018 - type: ndcg_at_10 value: 43.887 - type: ndcg_at_100 value: 49.791000000000004 - type: ndcg_at_1000 value: 51.834 - type: ndcg_at_20 value: 46.376 - type: ndcg_at_3 value: 38.769999999999996 - type: ndcg_at_5 value: 41.144 - type: precision_at_1 value: 34.018 - type: precision_at_10 value: 8.001999999999999 - type: precision_at_100 value: 1.2630000000000001 - type: precision_at_1000 value: 0.16 - type: precision_at_20 value: 4.737 - type: precision_at_3 value: 18.417 - type: precision_at_5 value: 13.150999999999998 - type: recall_at_1 value: 27.654 - type: recall_at_10 value: 56.111 - type: recall_at_100 value: 81.136 - type: recall_at_1000 value: 94.788 - type: recall_at_20 value: 65.068 - type: recall_at_3 value: 41.713 - type: recall_at_5 value: 48.106 - type: main_score value: 43.887 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: CQADupstackRetrieval_is_a_combined_dataset split: test type: CQADupstackRetrieval_is_a_combined_dataset metrics: - type: main_score value: 42.58858333333333 - type: ndcg_at_10 value: 42.58858333333333 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval (default) revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: mteb/cqadupstack-stats metrics: - type: map_at_1 value: 24.501 - type: map_at_10 value: 32.814 - type: map_at_100 value: 33.754 - type: map_at_1000 value: 33.859 - type: map_at_20 value: 33.324 - type: map_at_3 value: 30.758000000000003 - type: map_at_5 value: 31.936999999999998 - type: mrr_at_1 value: 27.761000000000003 - type: mrr_at_10 value: 35.662 - type: mrr_at_100 value: 36.443999999999996 - type: mrr_at_1000 value: 36.516999999999996 - type: mrr_at_20 value: 36.085 - type: mrr_at_3 value: 33.742 - type: mrr_at_5 value: 34.931 - type: ndcg_at_1 value: 27.761000000000003 - type: ndcg_at_10 value: 37.208000000000006 - type: ndcg_at_100 value: 41.839 - type: ndcg_at_1000 value: 44.421 - type: ndcg_at_20 value: 38.917 - type: ndcg_at_3 value: 33.544000000000004 - type: ndcg_at_5 value: 35.374 - type: precision_at_1 value: 27.761000000000003 - type: precision_at_10 value: 5.92 - type: precision_at_100 value: 0.899 - type: precision_at_1000 value: 0.12 - type: precision_at_20 value: 3.4130000000000003 - type: precision_at_3 value: 15.031 - type: precision_at_5 value: 10.306999999999999 - type: recall_at_1 value: 24.501 - type: recall_at_10 value: 47.579 - type: recall_at_100 value: 69.045 - type: recall_at_1000 value: 88.032 - type: recall_at_20 value: 54.125 - type: recall_at_3 value: 37.202 - type: 
recall_at_5 value: 41.927 - type: main_score value: 37.208000000000006 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackTexRetrieval (default) revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: mteb/cqadupstack-tex metrics: - type: map_at_1 value: 18.29 - type: map_at_10 value: 26.183 - type: map_at_100 value: 27.351999999999997 - type: map_at_1000 value: 27.483999999999998 - type: map_at_20 value: 26.798 - type: map_at_3 value: 23.629 - type: map_at_5 value: 24.937 - type: mrr_at_1 value: 22.299 - type: mrr_at_10 value: 30.189 - type: mrr_at_100 value: 31.098 - type: mrr_at_1000 value: 31.177 - type: mrr_at_20 value: 30.697000000000003 - type: mrr_at_3 value: 27.862 - type: mrr_at_5 value: 29.066 - type: ndcg_at_1 value: 22.299 - type: ndcg_at_10 value: 31.202 - type: ndcg_at_100 value: 36.617 - type: ndcg_at_1000 value: 39.544000000000004 - type: ndcg_at_20 value: 33.177 - type: ndcg_at_3 value: 26.639000000000003 - type: ndcg_at_5 value: 28.526 - type: precision_at_1 value: 22.299 - type: precision_at_10 value: 5.8020000000000005 - type: precision_at_100 value: 1.0070000000000001 - type: precision_at_1000 value: 0.14400000000000002 - type: precision_at_20 value: 3.505 - type: precision_at_3 value: 12.698 - type: precision_at_5 value: 9.174 - type: recall_at_1 value: 18.29 - type: recall_at_10 value: 42.254999999999995 - type: recall_at_100 value: 66.60000000000001 - type: recall_at_1000 value: 87.31400000000001 - type: recall_at_20 value: 49.572 - type: recall_at_3 value: 29.342000000000002 - type: recall_at_5 value: 34.221000000000004 - type: main_score value: 31.202 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval (default) revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: mteb/cqadupstack-unix metrics: - type: map_at_1 value: 27.722 - type: map_at_10 value: 37.698 - type: map_at_100 value: 38.899 - type: map_at_1000 value: 38.998 - type: map_at_20 value: 38.381 - type: map_at_3 value: 34.244 - type: map_at_5 value: 36.295 - type: mrr_at_1 value: 32.183 - type: mrr_at_10 value: 41.429 - type: mrr_at_100 value: 42.308 - type: mrr_at_1000 value: 42.358000000000004 - type: mrr_at_20 value: 41.957 - type: mrr_at_3 value: 38.401999999999994 - type: mrr_at_5 value: 40.294999999999995 - type: ndcg_at_1 value: 32.183 - type: ndcg_at_10 value: 43.519000000000005 - type: ndcg_at_100 value: 48.786 - type: ndcg_at_1000 value: 50.861999999999995 - type: ndcg_at_20 value: 45.654 - type: ndcg_at_3 value: 37.521 - type: ndcg_at_5 value: 40.615 - type: precision_at_1 value: 32.183 - type: precision_at_10 value: 7.603 - type: precision_at_100 value: 1.135 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_20 value: 4.408 - type: precision_at_3 value: 17.071 - type: precision_at_5 value: 12.668 - type: recall_at_1 value: 27.722 - type: recall_at_10 value: 57.230000000000004 - type: recall_at_100 value: 79.97999999999999 - type: recall_at_1000 value: 94.217 - type: recall_at_20 value: 64.864 - type: recall_at_3 value: 41.215 - type: recall_at_5 value: 48.774 - type: main_score value: 43.519000000000005 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: mteb/cqadupstack-webmasters metrics: - type: map_at_1 value: 25.852999999999998 - type: map_at_10 value: 35.394999999999996 - type: map_at_100 value: 37.291999999999994 - type: map_at_1000 value: 37.495 - type: map_at_20 
value: 36.372 - type: map_at_3 value: 32.336 - type: map_at_5 value: 34.159 - type: mrr_at_1 value: 31.818 - type: mrr_at_10 value: 40.677 - type: mrr_at_100 value: 41.728 - type: mrr_at_1000 value: 41.778 - type: mrr_at_20 value: 41.301 - type: mrr_at_3 value: 38.208 - type: mrr_at_5 value: 39.592 - type: ndcg_at_1 value: 31.818 - type: ndcg_at_10 value: 41.559000000000005 - type: ndcg_at_100 value: 48.012 - type: ndcg_at_1000 value: 50.234 - type: ndcg_at_20 value: 44.15 - type: ndcg_at_3 value: 36.918 - type: ndcg_at_5 value: 39.227000000000004 - type: precision_at_1 value: 31.818 - type: precision_at_10 value: 8.043 - type: precision_at_100 value: 1.625 - type: precision_at_1000 value: 0.245 - type: precision_at_20 value: 5.2170000000000005 - type: precision_at_3 value: 17.655 - type: precision_at_5 value: 12.845999999999998 - type: recall_at_1 value: 25.852999999999998 - type: recall_at_10 value: 53.093 - type: recall_at_100 value: 81.05799999999999 - type: recall_at_1000 value: 94.657 - type: recall_at_20 value: 62.748000000000005 - type: recall_at_3 value: 39.300000000000004 - type: recall_at_5 value: 45.754 - type: main_score value: 41.559000000000005 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval (default) revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack-wordpress metrics: - type: map_at_1 value: 19.23 - type: map_at_10 value: 28.128999999999998 - type: map_at_100 value: 29.195 - type: map_at_1000 value: 29.310000000000002 - type: map_at_20 value: 28.713 - type: map_at_3 value: 25.191000000000003 - type: map_at_5 value: 26.69 - type: mrr_at_1 value: 21.257 - type: mrr_at_10 value: 30.253999999999998 - type: mrr_at_100 value: 31.195 - type: mrr_at_1000 value: 31.270999999999997 - type: mrr_at_20 value: 30.747999999999998 - type: mrr_at_3 value: 27.633999999999997 - type: mrr_at_5 value: 28.937 - type: ndcg_at_1 value: 21.257 - type: ndcg_at_10 value: 33.511 - type: ndcg_at_100 value: 38.733000000000004 - type: ndcg_at_1000 value: 41.489 - type: ndcg_at_20 value: 35.476 - type: ndcg_at_3 value: 27.845 - type: ndcg_at_5 value: 30.264999999999997 - type: precision_at_1 value: 21.257 - type: precision_at_10 value: 5.619 - type: precision_at_100 value: 0.893 - type: precision_at_1000 value: 0.124 - type: precision_at_20 value: 3.29 - type: precision_at_3 value: 12.508 - type: precision_at_5 value: 8.946 - type: recall_at_1 value: 19.23 - type: recall_at_10 value: 48.185 - type: recall_at_100 value: 71.932 - type: recall_at_1000 value: 92.587 - type: recall_at_20 value: 55.533 - type: recall_at_3 value: 32.865 - type: recall_at_5 value: 38.577 - type: main_score value: 33.511 task: type: Retrieval - dataset: config: default name: MTEB ClimateFEVER (default) revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: map_at_1 value: 19.594 - type: map_at_10 value: 32.519 - type: map_at_100 value: 34.1 - type: map_at_1000 value: 34.263 - type: map_at_20 value: 33.353 - type: map_at_3 value: 27.898 - type: map_at_5 value: 30.524 - type: mrr_at_1 value: 46.515 - type: mrr_at_10 value: 56.958 - type: mrr_at_100 value: 57.54899999999999 - type: mrr_at_1000 value: 57.574999999999996 - type: mrr_at_20 value: 57.315000000000005 - type: mrr_at_3 value: 54.852999999999994 - type: mrr_at_5 value: 56.153 - type: ndcg_at_1 value: 46.515 - type: ndcg_at_10 value: 42.363 - type: ndcg_at_100 value: 48.233 - type: ndcg_at_1000 value: 50.993 - type: ndcg_at_20 value: 44.533 - type: 
ndcg_at_3 value: 37.297000000000004 - type: ndcg_at_5 value: 38.911 - type: precision_at_1 value: 46.515 - type: precision_at_10 value: 12.520999999999999 - type: precision_at_100 value: 1.8980000000000001 - type: precision_at_1000 value: 0.242 - type: precision_at_20 value: 7.212000000000001 - type: precision_at_3 value: 27.752 - type: precision_at_5 value: 20.391000000000002 - type: recall_at_1 value: 19.594 - type: recall_at_10 value: 46.539 - type: recall_at_100 value: 66.782 - type: recall_at_1000 value: 82.049 - type: recall_at_20 value: 52.611 - type: recall_at_3 value: 32.528 - type: recall_at_5 value: 38.933 - type: main_score value: 42.363 task: type: Retrieval - dataset: config: default name: MTEB CmedqaRetrieval (default) revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301 split: dev type: C-MTEB/CmedqaRetrieval metrics: - type: main_score value: 35.927 - type: map_at_1 value: 20.144000000000002 - type: map_at_10 value: 29.94 - type: map_at_100 value: 31.630000000000003 - type: map_at_1000 value: 31.778000000000002 - type: map_at_20 value: 30.798 - type: map_at_3 value: 26.534999999999997 - type: map_at_5 value: 28.33 - type: mrr_at_1 value: 31.23280820205051 - type: mrr_at_10 value: 38.66781179421835 - type: mrr_at_100 value: 39.656936166081785 - type: mrr_at_1000 value: 39.724602893117414 - type: mrr_at_20 value: 39.21272461558451 - type: mrr_at_3 value: 36.30907726931729 - type: mrr_at_5 value: 37.59814953738436 - type: nauc_map_at_1000_diff1 value: 44.5755334437146 - type: nauc_map_at_1000_max value: 40.726916781400746 - type: nauc_map_at_1000_std value: -19.591835061497367 - type: nauc_map_at_100_diff1 value: 44.54542899921038 - type: nauc_map_at_100_max value: 40.68305902532837 - type: nauc_map_at_100_std value: -19.658902089283487 - type: nauc_map_at_10_diff1 value: 44.56110529630953 - type: nauc_map_at_10_max value: 39.89826167846008 - type: nauc_map_at_10_std value: -20.62910633667902 - type: nauc_map_at_1_diff1 value: 50.82120107004449 - type: nauc_map_at_1_max value: 33.208851367861584 - type: nauc_map_at_1_std value: -20.29409730258174 - type: nauc_map_at_20_diff1 value: 44.51171242433788 - type: nauc_map_at_20_max value: 40.30431132782945 - type: nauc_map_at_20_std value: -20.290524142792417 - type: nauc_map_at_3_diff1 value: 45.80394138665133 - type: nauc_map_at_3_max value: 37.766191281426956 - type: nauc_map_at_3_std value: -21.223601997333876 - type: nauc_map_at_5_diff1 value: 45.00457218474283 - type: nauc_map_at_5_max value: 38.901044576388365 - type: nauc_map_at_5_std value: -20.893069613941634 - type: nauc_mrr_at_1000_diff1 value: 50.09855359231429 - type: nauc_mrr_at_1000_max value: 46.481000170008826 - type: nauc_mrr_at_1000_std value: -16.053461377096102 - type: nauc_mrr_at_100_diff1 value: 50.08205026347746 - type: nauc_mrr_at_100_max value: 46.47262126963331 - type: nauc_mrr_at_100_std value: -16.049112778748693 - type: nauc_mrr_at_10_diff1 value: 50.02363239081706 - type: nauc_mrr_at_10_max value: 46.39287859062042 - type: nauc_mrr_at_10_std value: -16.280866744769657 - type: nauc_mrr_at_1_diff1 value: 55.692503735317445 - type: nauc_mrr_at_1_max value: 47.334834529801014 - type: nauc_mrr_at_1_std value: -16.985483585693512 - type: nauc_mrr_at_20_diff1 value: 50.07725225722074 - type: nauc_mrr_at_20_max value: 46.47279295070193 - type: nauc_mrr_at_20_std value: -16.15168364678318 - type: nauc_mrr_at_3_diff1 value: 51.18685337274134 - type: nauc_mrr_at_3_max value: 46.7286365021621 - type: nauc_mrr_at_3_std value: -16.708451287313718 - type: 
nauc_mrr_at_5_diff1 value: 50.46777237893576 - type: nauc_mrr_at_5_max value: 46.5352076502249 - type: nauc_mrr_at_5_std value: -16.557413659905034 - type: nauc_ndcg_at_1000_diff1 value: 43.974299434438066 - type: nauc_ndcg_at_1000_max value: 43.44628675071857 - type: nauc_ndcg_at_1000_std value: -15.3495102005021 - type: nauc_ndcg_at_100_diff1 value: 43.336365081508504 - type: nauc_ndcg_at_100_max value: 43.11345604460776 - type: nauc_ndcg_at_100_std value: -15.571128070860615 - type: nauc_ndcg_at_10_diff1 value: 43.41266214720136 - type: nauc_ndcg_at_10_max value: 41.519676787851914 - type: nauc_ndcg_at_10_std value: -19.217175017223568 - type: nauc_ndcg_at_1_diff1 value: 55.692503735317445 - type: nauc_ndcg_at_1_max value: 47.334834529801014 - type: nauc_ndcg_at_1_std value: -16.985483585693512 - type: nauc_ndcg_at_20_diff1 value: 43.351653862834496 - type: nauc_ndcg_at_20_max value: 42.11608469750499 - type: nauc_ndcg_at_20_std value: -18.485363540641664 - type: nauc_ndcg_at_3_diff1 value: 45.64193888236677 - type: nauc_ndcg_at_3_max value: 42.497135099009995 - type: nauc_ndcg_at_3_std value: -18.764012041130094 - type: nauc_ndcg_at_5_diff1 value: 44.523392133895186 - type: nauc_ndcg_at_5_max value: 41.564242030096345 - type: nauc_ndcg_at_5_std value: -19.31080790984941 - type: nauc_precision_at_1000_diff1 value: 6.383464615714393 - type: nauc_precision_at_1000_max value: 27.439930931284657 - type: nauc_precision_at_1000_std value: 19.070716188143034 - type: nauc_precision_at_100_diff1 value: 12.599136754501284 - type: nauc_precision_at_100_max value: 35.886310962337795 - type: nauc_precision_at_100_std value: 14.06587592659196 - type: nauc_precision_at_10_diff1 value: 25.388891173150206 - type: nauc_precision_at_10_max value: 46.10269270777384 - type: nauc_precision_at_10_std value: -5.993803607158499 - type: nauc_precision_at_1_diff1 value: 55.692503735317445 - type: nauc_precision_at_1_max value: 47.334834529801014 - type: nauc_precision_at_1_std value: -16.985483585693512 - type: nauc_precision_at_20_diff1 value: 20.984013463099707 - type: nauc_precision_at_20_max value: 42.9471854616888 - type: nauc_precision_at_20_std value: -0.8045549929346024 - type: nauc_precision_at_3_diff1 value: 36.191850547148356 - type: nauc_precision_at_3_max value: 48.09923832376049 - type: nauc_precision_at_3_std value: -13.159407051271321 - type: nauc_precision_at_5_diff1 value: 31.04967966700407 - type: nauc_precision_at_5_max value: 47.62867673349624 - type: nauc_precision_at_5_std value: -10.345790325137353 - type: nauc_recall_at_1000_diff1 value: 11.03436839065707 - type: nauc_recall_at_1000_max value: 42.32265076651575 - type: nauc_recall_at_1000_std value: 30.478521053399206 - type: nauc_recall_at_100_diff1 value: 24.788349084510806 - type: nauc_recall_at_100_max value: 36.72097184821956 - type: nauc_recall_at_100_std value: -0.2241144179522076 - type: nauc_recall_at_10_diff1 value: 31.613053567704885 - type: nauc_recall_at_10_max value: 34.4597322828833 - type: nauc_recall_at_10_std value: -18.00022912690819 - type: nauc_recall_at_1_diff1 value: 50.82120107004449 - type: nauc_recall_at_1_max value: 33.208851367861584 - type: nauc_recall_at_1_std value: -20.29409730258174 - type: nauc_recall_at_20_diff1 value: 30.277002670708384 - type: nauc_recall_at_20_max value: 35.212475675060375 - type: nauc_recall_at_20_std value: -15.822788854733687 - type: nauc_recall_at_3_diff1 value: 38.87844958322257 - type: nauc_recall_at_3_max value: 34.66914910044104 - type: nauc_recall_at_3_std value: 
-20.234707300209127 - type: nauc_recall_at_5_diff1 value: 35.551139991687776 - type: nauc_recall_at_5_max value: 34.61009958820695 - type: nauc_recall_at_5_std value: -19.519180149293444 - type: ndcg_at_1 value: 31.233 - type: ndcg_at_10 value: 35.927 - type: ndcg_at_100 value: 43.037 - type: ndcg_at_1000 value: 45.900999999999996 - type: ndcg_at_20 value: 38.39 - type: ndcg_at_3 value: 31.366 - type: ndcg_at_5 value: 33.108 - type: precision_at_1 value: 31.233 - type: precision_at_10 value: 8.15 - type: precision_at_100 value: 1.402 - type: precision_at_1000 value: 0.17700000000000002 - type: precision_at_20 value: 4.91 - type: precision_at_3 value: 17.871000000000002 - type: precision_at_5 value: 12.948 - type: recall_at_1 value: 20.144000000000002 - type: recall_at_10 value: 44.985 - type: recall_at_100 value: 74.866 - type: recall_at_1000 value: 94.477 - type: recall_at_20 value: 53.37 - type: recall_at_3 value: 31.141000000000002 - type: recall_at_5 value: 36.721 task: type: Retrieval - dataset: config: default name: MTEB Cmnli (default) revision: None split: validation type: C-MTEB/CMNLI metrics: - type: cos_sim_accuracy value: 71.25676488274203 - type: cos_sim_accuracy_threshold value: 78.11152935028076 - type: cos_sim_ap value: 79.10444825556077 - type: cos_sim_f1 value: 74.10750923266312 - type: cos_sim_f1_threshold value: 75.2312421798706 - type: cos_sim_precision value: 66.02083714129044 - type: cos_sim_recall value: 84.45171849427169 - type: dot_accuracy value: 68.11785929043896 - type: dot_accuracy_threshold value: 34783.23974609375 - type: dot_ap value: 75.80201827987712 - type: dot_f1 value: 72.31670990679349 - type: dot_f1_threshold value: 31978.036499023438 - type: dot_precision value: 61.386623164763456 - type: dot_recall value: 87.98223053542202 - type: euclidean_accuracy value: 71.41310883944678 - type: euclidean_accuracy_threshold value: 1374.9353408813477 - type: euclidean_ap value: 79.23359768836457 - type: euclidean_f1 value: 74.38512297540491 - type: euclidean_f1_threshold value: 1512.6035690307617 - type: euclidean_precision value: 64.97816593886463 - type: euclidean_recall value: 86.97685293429974 - type: manhattan_accuracy value: 71.32892363199038 - type: manhattan_accuracy_threshold value: 33340.49072265625 - type: manhattan_ap value: 79.11973684118587 - type: manhattan_f1 value: 74.29401993355481 - type: manhattan_f1_threshold value: 36012.52746582031 - type: manhattan_precision value: 66.81605975723622 - type: manhattan_recall value: 83.65676876315175 - type: max_accuracy value: 71.41310883944678 - type: max_ap value: 79.23359768836457 - type: max_f1 value: 74.38512297540491 task: type: PairClassification - dataset: config: default name: MTEB CovidRetrieval (default) revision: 1271c7809071a13532e05f25fb53511ffce77117 split: dev type: C-MTEB/CovidRetrieval metrics: - type: main_score value: 78.917 - type: map_at_1 value: 67.281 - type: map_at_10 value: 75.262 - type: map_at_100 value: 75.60900000000001 - type: map_at_1000 value: 75.618 - type: map_at_20 value: 75.50200000000001 - type: map_at_3 value: 73.455 - type: map_at_5 value: 74.657 - type: mrr_at_1 value: 67.43940990516333 - type: mrr_at_10 value: 75.27367989696756 - type: mrr_at_100 value: 75.62029353306437 - type: mrr_at_1000 value: 75.62934741874726 - type: mrr_at_20 value: 75.51356607409173 - type: mrr_at_3 value: 73.5159817351598 - type: mrr_at_5 value: 74.73832103969093 - type: nauc_map_at_1000_diff1 value: 77.26666391867634 - type: nauc_map_at_1000_max value: 49.928541012203496 - type: 
nauc_map_at_1000_std value: -40.494469470474456 - type: nauc_map_at_100_diff1 value: 77.26087423162396 - type: nauc_map_at_100_max value: 49.944275615664424 - type: nauc_map_at_100_std value: -40.48299992715398 - type: nauc_map_at_10_diff1 value: 76.97400113500906 - type: nauc_map_at_10_max value: 49.84177029115674 - type: nauc_map_at_10_std value: -40.829250876511445 - type: nauc_map_at_1_diff1 value: 81.44050620630395 - type: nauc_map_at_1_max value: 48.97711944070578 - type: nauc_map_at_1_std value: -38.963689457570254 - type: nauc_map_at_20_diff1 value: 77.21791353089375 - type: nauc_map_at_20_max value: 49.958206759079424 - type: nauc_map_at_20_std value: -40.53067571658996 - type: nauc_map_at_3_diff1 value: 77.3555925208868 - type: nauc_map_at_3_max value: 49.32158146451256 - type: nauc_map_at_3_std value: -41.93552426981978 - type: nauc_map_at_5_diff1 value: 77.07099950431504 - type: nauc_map_at_5_max value: 49.54190504495002 - type: nauc_map_at_5_std value: -41.814968130918096 - type: nauc_mrr_at_1000_diff1 value: 77.31388774540477 - type: nauc_mrr_at_1000_max value: 49.96779699175759 - type: nauc_mrr_at_1000_std value: -40.43739645160277 - type: nauc_mrr_at_100_diff1 value: 77.30817786449413 - type: nauc_mrr_at_100_max value: 49.982514428937655 - type: nauc_mrr_at_100_std value: -40.42876582797744 - type: nauc_mrr_at_10_diff1 value: 77.02048060465756 - type: nauc_mrr_at_10_max value: 49.87937207270602 - type: nauc_mrr_at_10_std value: -40.77596560333177 - type: nauc_mrr_at_1_diff1 value: 81.27219599516599 - type: nauc_mrr_at_1_max value: 49.3083394026327 - type: nauc_mrr_at_1_std value: -38.31023037552026 - type: nauc_mrr_at_20_diff1 value: 77.26497089316055 - type: nauc_mrr_at_20_max value: 49.996257597621415 - type: nauc_mrr_at_20_std value: -40.476723608868014 - type: nauc_mrr_at_3_diff1 value: 77.38971294099257 - type: nauc_mrr_at_3_max value: 49.38110328987404 - type: nauc_mrr_at_3_std value: -41.7118646715979 - type: nauc_mrr_at_5_diff1 value: 77.08286142519952 - type: nauc_mrr_at_5_max value: 49.655249374588685 - type: nauc_mrr_at_5_std value: -41.48173039989406 - type: nauc_ndcg_at_1000_diff1 value: 76.47399204021758 - type: nauc_ndcg_at_1000_max value: 50.55770139961048 - type: nauc_ndcg_at_1000_std value: -39.55650430279072 - type: nauc_ndcg_at_100_diff1 value: 76.29355616618253 - type: nauc_ndcg_at_100_max value: 51.003608112592936 - type: nauc_ndcg_at_100_std value: -39.24769744605206 - type: nauc_ndcg_at_10_diff1 value: 74.88697528447634 - type: nauc_ndcg_at_10_max value: 50.398416372815234 - type: nauc_ndcg_at_10_std value: -40.76526585772833 - type: nauc_ndcg_at_1_diff1 value: 81.27219599516599 - type: nauc_ndcg_at_1_max value: 49.3083394026327 - type: nauc_ndcg_at_1_std value: -38.31023037552026 - type: nauc_ndcg_at_20_diff1 value: 75.85463512091866 - type: nauc_ndcg_at_20_max value: 50.97338683654334 - type: nauc_ndcg_at_20_std value: -39.353128774903404 - type: nauc_ndcg_at_3_diff1 value: 75.94015726123543 - type: nauc_ndcg_at_3_max value: 49.22194251063148 - type: nauc_ndcg_at_3_std value: -43.040457030630435 - type: nauc_ndcg_at_5_diff1 value: 75.19166189770303 - type: nauc_ndcg_at_5_max value: 49.65696229797189 - type: nauc_ndcg_at_5_std value: -42.81534909184424 - type: nauc_precision_at_1000_diff1 value: -14.830901395815788 - type: nauc_precision_at_1000_max value: 19.686297136854623 - type: nauc_precision_at_1000_std value: 61.19310360166978 - type: nauc_precision_at_100_diff1 value: 20.55469986751769 - type: nauc_precision_at_100_max value: 
50.78431835075583 - type: nauc_precision_at_100_std value: 31.54986568374813 - type: nauc_precision_at_10_diff1 value: 45.991938532558656 - type: nauc_precision_at_10_max value: 46.386318595630385 - type: nauc_precision_at_10_std value: -23.463011435224608 - type: nauc_precision_at_1_diff1 value: 81.27219599516599 - type: nauc_precision_at_1_max value: 49.3083394026327 - type: nauc_precision_at_1_std value: -38.31023037552026 - type: nauc_precision_at_20_diff1 value: 41.53180472410822 - type: nauc_precision_at_20_max value: 49.89800247204318 - type: nauc_precision_at_20_std value: -2.4192847331537095 - type: nauc_precision_at_3_diff1 value: 67.37504651209993 - type: nauc_precision_at_3_max value: 47.893537208629496 - type: nauc_precision_at_3_std value: -43.2362212382819 - type: nauc_precision_at_5_diff1 value: 60.03438883791718 - type: nauc_precision_at_5_max value: 48.29770502354206 - type: nauc_precision_at_5_std value: -40.39588448271546 - type: nauc_recall_at_1000_diff1 value: 71.04741174480844 - type: nauc_recall_at_1000_max value: 93.19056506596002 - type: nauc_recall_at_1000_std value: 62.96994797650912 - type: nauc_recall_at_100_diff1 value: 65.00418176852641 - type: nauc_recall_at_100_max value: 85.27352708427193 - type: nauc_recall_at_100_std value: 2.8812005546518886 - type: nauc_recall_at_10_diff1 value: 61.263254794998865 - type: nauc_recall_at_10_max value: 54.17618329507141 - type: nauc_recall_at_10_std value: -39.80603966142593 - type: nauc_recall_at_1_diff1 value: 81.44050620630395 - type: nauc_recall_at_1_max value: 48.97711944070578 - type: nauc_recall_at_1_std value: -38.963689457570254 - type: nauc_recall_at_20_diff1 value: 64.42106091745396 - type: nauc_recall_at_20_max value: 63.10796640821887 - type: nauc_recall_at_20_std value: -22.60117424572222 - type: nauc_recall_at_3_diff1 value: 70.66311436592945 - type: nauc_recall_at_3_max value: 48.69498944323469 - type: nauc_recall_at_3_std value: -47.37847524874532 - type: nauc_recall_at_5_diff1 value: 66.12701111728848 - type: nauc_recall_at_5_max value: 49.91763957934711 - type: nauc_recall_at_5_std value: -48.173252920584126 - type: ndcg_at_1 value: 67.43900000000001 - type: ndcg_at_10 value: 78.917 - type: ndcg_at_100 value: 80.53399999999999 - type: ndcg_at_1000 value: 80.768 - type: ndcg_at_20 value: 79.813 - type: ndcg_at_3 value: 75.37 - type: ndcg_at_5 value: 77.551 - type: precision_at_1 value: 67.43900000000001 - type: precision_at_10 value: 9.115 - type: precision_at_100 value: 0.985 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.737 - type: precision_at_3 value: 27.081 - type: precision_at_5 value: 17.345 - type: recall_at_1 value: 67.281 - type: recall_at_10 value: 90.2 - type: recall_at_100 value: 97.576 - type: recall_at_1000 value: 99.368 - type: recall_at_20 value: 93.783 - type: recall_at_3 value: 80.822 - type: recall_at_5 value: 86.091 task: type: Retrieval - dataset: config: default name: MTEB DBPedia (default) revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: map_at_1 value: 9.041 - type: map_at_10 value: 18.662 - type: map_at_100 value: 26.054 - type: map_at_1000 value: 27.769 - type: map_at_20 value: 21.499 - type: map_at_3 value: 13.628000000000002 - type: map_at_5 value: 15.617 - type: mrr_at_1 value: 67.25 - type: mrr_at_10 value: 74.673 - type: mrr_at_100 value: 75.022 - type: mrr_at_1000 value: 75.031 - type: mrr_at_20 value: 74.895 - type: mrr_at_3 value: 73.042 - type: mrr_at_5 value: 74.179 - type: ndcg_at_1 value: 
55.75 - type: ndcg_at_10 value: 41.004000000000005 - type: ndcg_at_100 value: 44.912 - type: ndcg_at_1000 value: 51.946000000000005 - type: ndcg_at_20 value: 40.195 - type: ndcg_at_3 value: 45.803 - type: ndcg_at_5 value: 42.976 - type: precision_at_1 value: 67.25 - type: precision_at_10 value: 31.874999999999996 - type: precision_at_100 value: 10.37 - type: precision_at_1000 value: 2.1430000000000002 - type: precision_at_20 value: 24.275 - type: precision_at_3 value: 48.417 - type: precision_at_5 value: 40.2 - type: recall_at_1 value: 9.041 - type: recall_at_10 value: 23.592 - type: recall_at_100 value: 49.476 - type: recall_at_1000 value: 71.677 - type: recall_at_20 value: 30.153000000000002 - type: recall_at_3 value: 14.777000000000001 - type: recall_at_5 value: 17.829 - type: main_score value: 41.004000000000005 task: type: Retrieval - dataset: config: default name: MTEB DuRetrieval (default) revision: a1a333e290fe30b10f3f56498e3a0d911a693ced split: dev type: C-MTEB/DuRetrieval metrics: - type: main_score value: 83.134 - type: map_at_1 value: 23.907999999999998 - type: map_at_10 value: 74.566 - type: map_at_100 value: 77.706 - type: map_at_1000 value: 77.762 - type: map_at_20 value: 76.943 - type: map_at_3 value: 50.971999999999994 - type: map_at_5 value: 64.429 - type: mrr_at_1 value: 84.8 - type: mrr_at_10 value: 89.73218253968246 - type: mrr_at_100 value: 89.82853630655774 - type: mrr_at_1000 value: 89.83170411703153 - type: mrr_at_20 value: 89.79582030091501 - type: mrr_at_3 value: 89.32499999999992 - type: mrr_at_5 value: 89.58749999999992 - type: nauc_map_at_1000_diff1 value: -2.2736020650163717 - type: nauc_map_at_1000_max value: 45.3937519555142 - type: nauc_map_at_1000_std value: 10.824778228268581 - type: nauc_map_at_100_diff1 value: -2.2662939752750066 - type: nauc_map_at_100_max value: 45.423960626031366 - type: nauc_map_at_100_std value: 10.804239351738717 - type: nauc_map_at_10_diff1 value: 0.9395752585654343 - type: nauc_map_at_10_max value: 42.53814836940551 - type: nauc_map_at_10_std value: 0.7199313235265218 - type: nauc_map_at_1_diff1 value: 45.19415865267676 - type: nauc_map_at_1_max value: -1.7261947382471912 - type: nauc_map_at_1_std value: -32.16144291613605 - type: nauc_map_at_20_diff1 value: -1.884514152147472 - type: nauc_map_at_20_max value: 44.830401115927174 - type: nauc_map_at_20_std value: 8.118530414377219 - type: nauc_map_at_3_diff1 value: 25.678881127059967 - type: nauc_map_at_3_max value: 12.191400431839758 - type: nauc_map_at_3_std value: -27.201740587642327 - type: nauc_map_at_5_diff1 value: 13.227128780829572 - type: nauc_map_at_5_max value: 26.978282739708977 - type: nauc_map_at_5_std value: -17.555610348070584 - type: nauc_mrr_at_1000_diff1 value: 21.073512437502178 - type: nauc_mrr_at_1000_max value: 64.9680257861005 - type: nauc_mrr_at_1000_std value: 19.626288754404293 - type: nauc_mrr_at_100_diff1 value: 21.074637426957732 - type: nauc_mrr_at_100_max value: 64.97612675661915 - type: nauc_mrr_at_100_std value: 19.649504127800878 - type: nauc_mrr_at_10_diff1 value: 21.12003267626651 - type: nauc_mrr_at_10_max value: 65.24362289059766 - type: nauc_mrr_at_10_std value: 19.92351276180984 - type: nauc_mrr_at_1_diff1 value: 22.711430629147635 - type: nauc_mrr_at_1_max value: 58.4059429497403 - type: nauc_mrr_at_1_std value: 11.967886722567973 - type: nauc_mrr_at_20_diff1 value: 20.98220830510272 - type: nauc_mrr_at_20_max value: 65.05737535197835 - type: nauc_mrr_at_20_std value: 19.66672900782771 - type: nauc_mrr_at_3_diff1 value: 
20.924796220048528 - type: nauc_mrr_at_3_max value: 65.71388669932584 - type: nauc_mrr_at_3_std value: 20.05912197134477 - type: nauc_mrr_at_5_diff1 value: 20.61978649468208 - type: nauc_mrr_at_5_max value: 65.50709154526211 - type: nauc_mrr_at_5_std value: 20.241434276181838 - type: nauc_ndcg_at_1000_diff1 value: 0.25363171946133656 - type: nauc_ndcg_at_1000_max value: 54.12840465309885 - type: nauc_ndcg_at_1000_std value: 20.749184325412546 - type: nauc_ndcg_at_100_diff1 value: 0.15649430250272792 - type: nauc_ndcg_at_100_max value: 54.47995322413234 - type: nauc_ndcg_at_100_std value: 21.266786634233267 - type: nauc_ndcg_at_10_diff1 value: 0.14579250840386346 - type: nauc_ndcg_at_10_max value: 49.8643037948353 - type: nauc_ndcg_at_10_std value: 12.960701643914216 - type: nauc_ndcg_at_1_diff1 value: 22.711430629147635 - type: nauc_ndcg_at_1_max value: 58.4059429497403 - type: nauc_ndcg_at_1_std value: 11.967886722567973 - type: nauc_ndcg_at_20_diff1 value: -0.6701559981776763 - type: nauc_ndcg_at_20_max value: 52.95443437012488 - type: nauc_ndcg_at_20_std value: 16.708883972005758 - type: nauc_ndcg_at_3_diff1 value: -0.19084922341962388 - type: nauc_ndcg_at_3_max value: 46.2110230886874 - type: nauc_ndcg_at_3_std value: 13.363250229683038 - type: nauc_ndcg_at_5_diff1 value: 0.9840019268192548 - type: nauc_ndcg_at_5_max value: 43.56594891798146 - type: nauc_ndcg_at_5_std value: 8.577017104088146 - type: nauc_precision_at_1000_diff1 value: -30.779179091501145 - type: nauc_precision_at_1000_max value: 16.056094258615673 - type: nauc_precision_at_1000_std value: 49.96303902363283 - type: nauc_precision_at_100_diff1 value: -31.583236638899585 - type: nauc_precision_at_100_max value: 19.16571713603373 - type: nauc_precision_at_100_std value: 51.870647903980036 - type: nauc_precision_at_10_diff1 value: -35.62134572732597 - type: nauc_precision_at_10_max value: 31.6935186494612 - type: nauc_precision_at_10_std value: 46.68659723766723 - type: nauc_precision_at_1_diff1 value: 22.711430629147635 - type: nauc_precision_at_1_max value: 58.4059429497403 - type: nauc_precision_at_1_std value: 11.967886722567973 - type: nauc_precision_at_20_diff1 value: -33.875460046920495 - type: nauc_precision_at_20_max value: 24.188420133566442 - type: nauc_precision_at_20_std value: 50.02387762958483 - type: nauc_precision_at_3_diff1 value: -28.875998450906827 - type: nauc_precision_at_3_max value: 44.77058831167941 - type: nauc_precision_at_3_std value: 31.77993710437207 - type: nauc_precision_at_5_diff1 value: -34.92525440306491 - type: nauc_precision_at_5_max value: 39.855219917077086 - type: nauc_precision_at_5_std value: 37.95432046169299 - type: nauc_recall_at_1000_diff1 value: -14.293309371874733 - type: nauc_recall_at_1000_max value: 59.06948692482579 - type: nauc_recall_at_1000_std value: 62.586254868312686 - type: nauc_recall_at_100_diff1 value: -4.344100947212704 - type: nauc_recall_at_100_max value: 58.42120421043602 - type: nauc_recall_at_100_std value: 46.48562009316997 - type: nauc_recall_at_10_diff1 value: 0.04948662912161709 - type: nauc_recall_at_10_max value: 42.42809687119093 - type: nauc_recall_at_10_std value: 0.6892504250411409 - type: nauc_recall_at_1_diff1 value: 45.19415865267676 - type: nauc_recall_at_1_max value: -1.7261947382471912 - type: nauc_recall_at_1_std value: -32.16144291613605 - type: nauc_recall_at_20_diff1 value: -7.634587864605111 - type: nauc_recall_at_20_max value: 49.21327187174134 - type: nauc_recall_at_20_std value: 16.408481068336346 - type: nauc_recall_at_3_diff1 
value: 24.72546591038644 - type: nauc_recall_at_3_max value: 6.620763400972902 - type: nauc_recall_at_3_std value: -29.994703323331684 - type: nauc_recall_at_5_diff1 value: 12.65527364845842 - type: nauc_recall_at_5_max value: 20.400121385794694 - type: nauc_recall_at_5_std value: -22.34284568447213 - type: ndcg_at_1 value: 84.8 - type: ndcg_at_10 value: 83.134 - type: ndcg_at_100 value: 86.628 - type: ndcg_at_1000 value: 87.151 - type: ndcg_at_20 value: 85.092 - type: ndcg_at_3 value: 81.228 - type: ndcg_at_5 value: 80.2 - type: precision_at_1 value: 84.8 - type: precision_at_10 value: 40.394999999999996 - type: precision_at_100 value: 4.745 - type: precision_at_1000 value: 0.488 - type: precision_at_20 value: 22.245 - type: precision_at_3 value: 73.25 - type: precision_at_5 value: 61.86000000000001 - type: recall_at_1 value: 23.907999999999998 - type: recall_at_10 value: 85.346 - type: recall_at_100 value: 96.515 - type: recall_at_1000 value: 99.156 - type: recall_at_20 value: 91.377 - type: recall_at_3 value: 54.135 - type: recall_at_5 value: 70.488 task: type: Retrieval - dataset: config: default name: MTEB EcomRetrieval (default) revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9 split: dev type: C-MTEB/EcomRetrieval metrics: - type: main_score value: 60.887 - type: map_at_1 value: 46.6 - type: map_at_10 value: 56.035000000000004 - type: map_at_100 value: 56.741 - type: map_at_1000 value: 56.764 - type: map_at_20 value: 56.513999999999996 - type: map_at_3 value: 53.733 - type: map_at_5 value: 54.913000000000004 - type: mrr_at_1 value: 46.6 - type: mrr_at_10 value: 56.034523809523776 - type: mrr_at_100 value: 56.74056360434383 - type: mrr_at_1000 value: 56.76373487222486 - type: mrr_at_20 value: 56.51374873879128 - type: mrr_at_3 value: 53.73333333333328 - type: mrr_at_5 value: 54.91333333333327 - type: nauc_map_at_1000_diff1 value: 65.13546939953387 - type: nauc_map_at_1000_max value: 43.358890946774494 - type: nauc_map_at_1000_std value: -9.973282105235036 - type: nauc_map_at_100_diff1 value: 65.12449309472493 - type: nauc_map_at_100_max value: 43.377100882923145 - type: nauc_map_at_100_std value: -9.971781228240555 - type: nauc_map_at_10_diff1 value: 64.83020018537475 - type: nauc_map_at_10_max value: 43.25969482323034 - type: nauc_map_at_10_std value: -10.120272176001547 - type: nauc_map_at_1_diff1 value: 69.58727592100516 - type: nauc_map_at_1_max value: 38.236494689522026 - type: nauc_map_at_1_std value: -14.833390831689597 - type: nauc_map_at_20_diff1 value: 65.01159809914586 - type: nauc_map_at_20_max value: 43.33440319829618 - type: nauc_map_at_20_std value: -10.039958228659726 - type: nauc_map_at_3_diff1 value: 65.2396323885909 - type: nauc_map_at_3_max value: 42.26904017378952 - type: nauc_map_at_3_std value: -11.793017036934044 - type: nauc_map_at_5_diff1 value: 64.96397227898036 - type: nauc_map_at_5_max value: 43.231333789145424 - type: nauc_map_at_5_std value: -10.349933732151372 - type: nauc_mrr_at_1000_diff1 value: 65.13546939953387 - type: nauc_mrr_at_1000_max value: 43.358890946774494 - type: nauc_mrr_at_1000_std value: -9.973282105235036 - type: nauc_mrr_at_100_diff1 value: 65.12449309472493 - type: nauc_mrr_at_100_max value: 43.377100882923145 - type: nauc_mrr_at_100_std value: -9.971781228240555 - type: nauc_mrr_at_10_diff1 value: 64.83020018537475 - type: nauc_mrr_at_10_max value: 43.25969482323034 - type: nauc_mrr_at_10_std value: -10.120272176001547 - type: nauc_mrr_at_1_diff1 value: 69.58727592100516 - type: nauc_mrr_at_1_max value: 38.236494689522026 - type: 
nauc_mrr_at_1_std value: -14.833390831689597 - type: nauc_mrr_at_20_diff1 value: 65.01159809914586 - type: nauc_mrr_at_20_max value: 43.33440319829618 - type: nauc_mrr_at_20_std value: -10.039958228659726 - type: nauc_mrr_at_3_diff1 value: 65.2396323885909 - type: nauc_mrr_at_3_max value: 42.26904017378952 - type: nauc_mrr_at_3_std value: -11.793017036934044 - type: nauc_mrr_at_5_diff1 value: 64.96397227898036 - type: nauc_mrr_at_5_max value: 43.231333789145424 - type: nauc_mrr_at_5_std value: -10.349933732151372 - type: nauc_ndcg_at_1000_diff1 value: 64.26802655199876 - type: nauc_ndcg_at_1000_max value: 45.854310744745185 - type: nauc_ndcg_at_1000_std value: -6.184417305204082 - type: nauc_ndcg_at_100_diff1 value: 63.99268329609827 - type: nauc_ndcg_at_100_max value: 46.31270128748375 - type: nauc_ndcg_at_100_std value: -6.1393433180558965 - type: nauc_ndcg_at_10_diff1 value: 62.6735104141137 - type: nauc_ndcg_at_10_max value: 45.54954799462398 - type: nauc_ndcg_at_10_std value: -7.348851199024871 - type: nauc_ndcg_at_1_diff1 value: 69.58727592100516 - type: nauc_ndcg_at_1_max value: 38.236494689522026 - type: nauc_ndcg_at_1_std value: -14.833390831689597 - type: nauc_ndcg_at_20_diff1 value: 63.25899651677274 - type: nauc_ndcg_at_20_max value: 45.952196968886014 - type: nauc_ndcg_at_20_std value: -6.807607465125713 - type: nauc_ndcg_at_3_diff1 value: 63.65618337476822 - type: nauc_ndcg_at_3_max value: 43.507890965228945 - type: nauc_ndcg_at_3_std value: -10.73845622217601 - type: nauc_ndcg_at_5_diff1 value: 63.079162432921855 - type: nauc_ndcg_at_5_max value: 45.38303443868148 - type: nauc_ndcg_at_5_std value: -8.063657824835534 - type: nauc_precision_at_1000_diff1 value: 63.01459977930557 - type: nauc_precision_at_1000_max value: 92.4253034547151 - type: nauc_precision_at_1000_std value: 84.4845513963158 - type: nauc_precision_at_100_diff1 value: 57.17217119405878 - type: nauc_precision_at_100_max value: 80.70049725316484 - type: nauc_precision_at_100_std value: 41.78392287147403 - type: nauc_precision_at_10_diff1 value: 53.115665404390725 - type: nauc_precision_at_10_max value: 55.73825657341263 - type: nauc_precision_at_10_std value: 5.406226305013257 - type: nauc_precision_at_1_diff1 value: 69.58727592100516 - type: nauc_precision_at_1_max value: 38.236494689522026 - type: nauc_precision_at_1_std value: -14.833390831689597 - type: nauc_precision_at_20_diff1 value: 53.77730697622828 - type: nauc_precision_at_20_max value: 61.88170819253054 - type: nauc_precision_at_20_std value: 13.678730470003856 - type: nauc_precision_at_3_diff1 value: 58.580196992291455 - type: nauc_precision_at_3_max value: 47.404834585376626 - type: nauc_precision_at_3_std value: -7.374978769024051 - type: nauc_precision_at_5_diff1 value: 56.44564652606437 - type: nauc_precision_at_5_max value: 53.08973975162324 - type: nauc_precision_at_5_std value: 0.22762700141423803 - type: nauc_recall_at_1000_diff1 value: 63.01459977930565 - type: nauc_recall_at_1000_max value: 92.42530345471532 - type: nauc_recall_at_1000_std value: 84.48455139631602 - type: nauc_recall_at_100_diff1 value: 57.17217119405904 - type: nauc_recall_at_100_max value: 80.70049725316468 - type: nauc_recall_at_100_std value: 41.783922871474275 - type: nauc_recall_at_10_diff1 value: 53.11566540439087 - type: nauc_recall_at_10_max value: 55.738256573412656 - type: nauc_recall_at_10_std value: 5.406226305013377 - type: nauc_recall_at_1_diff1 value: 69.58727592100516 - type: nauc_recall_at_1_max value: 38.236494689522026 - type: nauc_recall_at_1_std 
value: -14.833390831689597 - type: nauc_recall_at_20_diff1 value: 53.77730697622846 - type: nauc_recall_at_20_max value: 61.881708192530525 - type: nauc_recall_at_20_std value: 13.678730470003947 - type: nauc_recall_at_3_diff1 value: 58.5801969922914 - type: nauc_recall_at_3_max value: 47.40483458537654 - type: nauc_recall_at_3_std value: -7.37497876902413 - type: nauc_recall_at_5_diff1 value: 56.445646526064394 - type: nauc_recall_at_5_max value: 53.08973975162332 - type: nauc_recall_at_5_std value: 0.22762700141428024 - type: ndcg_at_1 value: 46.6 - type: ndcg_at_10 value: 60.887 - type: ndcg_at_100 value: 64.18199999999999 - type: ndcg_at_1000 value: 64.726 - type: ndcg_at_20 value: 62.614999999999995 - type: ndcg_at_3 value: 56.038 - type: ndcg_at_5 value: 58.150999999999996 - type: precision_at_1 value: 46.6 - type: precision_at_10 value: 7.630000000000001 - type: precision_at_100 value: 0.914 - type: precision_at_1000 value: 0.096 - type: precision_at_20 value: 4.154999999999999 - type: precision_at_3 value: 20.9 - type: precision_at_5 value: 13.56 - type: recall_at_1 value: 46.6 - type: recall_at_10 value: 76.3 - type: recall_at_100 value: 91.4 - type: recall_at_1000 value: 95.6 - type: recall_at_20 value: 83.1 - type: recall_at_3 value: 62.7 - type: recall_at_5 value: 67.80000000000001 task: type: Retrieval - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 73.29999999999998 - type: f1 value: 67.71473706580302 - type: f1_weighted value: 74.83537255312045 - type: main_score value: 73.29999999999998 task: type: Classification - dataset: config: default name: MTEB FEVER (default) revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 split: test type: mteb/fever metrics: - type: map_at_1 value: 78.371 - type: map_at_10 value: 85.762 - type: map_at_100 value: 85.954 - type: map_at_1000 value: 85.966 - type: map_at_20 value: 85.887 - type: map_at_3 value: 84.854 - type: map_at_5 value: 85.408 - type: mrr_at_1 value: 84.443 - type: mrr_at_10 value: 90.432 - type: mrr_at_100 value: 90.483 - type: mrr_at_1000 value: 90.484 - type: mrr_at_20 value: 90.473 - type: mrr_at_3 value: 89.89399999999999 - type: mrr_at_5 value: 90.244 - type: ndcg_at_1 value: 84.443 - type: ndcg_at_10 value: 89.05499999999999 - type: ndcg_at_100 value: 89.68 - type: ndcg_at_1000 value: 89.87899999999999 - type: ndcg_at_20 value: 89.381 - type: ndcg_at_3 value: 87.73100000000001 - type: ndcg_at_5 value: 88.425 - type: precision_at_1 value: 84.443 - type: precision_at_10 value: 10.520999999999999 - type: precision_at_100 value: 1.103 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_20 value: 5.362 - type: precision_at_3 value: 33.198 - type: precision_at_5 value: 20.441000000000003 - type: recall_at_1 value: 78.371 - type: recall_at_10 value: 94.594 - type: recall_at_100 value: 96.97099999999999 - type: recall_at_1000 value: 98.18 - type: recall_at_20 value: 95.707 - type: recall_at_3 value: 90.853 - type: recall_at_5 value: 92.74799999999999 - type: main_score value: 89.05499999999999 task: type: Retrieval - dataset: config: default name: MTEB FiQA2018 (default) revision: 27a168819829fe9bcd655c2df245fb19452e8e06 split: test type: mteb/fiqa metrics: - type: map_at_1 value: 23.810000000000002 - type: map_at_10 value: 39.051 - type: map_at_100 value: 41.231 - type: map_at_1000 value: 41.376000000000005 - type: map_at_20 value: 40.227000000000004 - type: map_at_3 
value: 33.915 - type: map_at_5 value: 36.459 - type: mrr_at_1 value: 48.148 - type: mrr_at_10 value: 55.765 - type: mrr_at_100 value: 56.495 - type: mrr_at_1000 value: 56.525999999999996 - type: mrr_at_20 value: 56.213 - type: mrr_at_3 value: 53.086 - type: mrr_at_5 value: 54.513999999999996 - type: ndcg_at_1 value: 48.148 - type: ndcg_at_10 value: 47.349999999999994 - type: ndcg_at_100 value: 54.61899999999999 - type: ndcg_at_1000 value: 56.830000000000005 - type: ndcg_at_20 value: 50.143 - type: ndcg_at_3 value: 43.108000000000004 - type: ndcg_at_5 value: 44.023 - type: precision_at_1 value: 48.148 - type: precision_at_10 value: 13.441 - type: precision_at_100 value: 2.085 - type: precision_at_1000 value: 0.248 - type: precision_at_20 value: 7.870000000000001 - type: precision_at_3 value: 28.909000000000002 - type: precision_at_5 value: 20.957 - type: recall_at_1 value: 23.810000000000002 - type: recall_at_10 value: 54.303000000000004 - type: recall_at_100 value: 81.363 - type: recall_at_1000 value: 94.391 - type: recall_at_20 value: 63.056999999999995 - type: recall_at_3 value: 38.098 - type: recall_at_5 value: 44.414 - type: main_score value: 47.349999999999994 task: type: Retrieval - dataset: config: default name: MTEB GeoreviewClassification (default) revision: 3765c0d1de6b7d264bc459433c45e5a75513839c split: test type: ai-forever/georeview-classification metrics: - type: accuracy value: 48.0126953125 - type: f1 value: 47.65764016160488 - type: f1_weighted value: 47.65701659482088 - type: main_score value: 48.0126953125 task: type: Classification - dataset: config: default name: MTEB GeoreviewClusteringP2P (default) revision: 97a313c8fc85b47f13f33e7e9a95c1ad888c7fec split: test type: ai-forever/georeview-clustering-p2p metrics: - type: main_score value: 73.62357853672266 - type: v_measure value: 73.62357853672266 - type: v_measure_std value: 0.5942247545535766 task: type: Clustering - dataset: config: default name: MTEB GerDaLIR (default) revision: 0bb47f1d73827e96964edb84dfe552f62f4fd5eb split: test type: jinaai/ger_da_lir metrics: - type: main_score value: 16.227 - type: map_at_1 value: 8.082 - type: map_at_10 value: 12.959999999999999 - type: map_at_100 value: 13.923 - type: map_at_1000 value: 14.030999999999999 - type: map_at_20 value: 13.453000000000001 - type: map_at_3 value: 11.018 - type: map_at_5 value: 12.056000000000001 - type: mrr_at_1 value: 8.993332249146203 - type: mrr_at_10 value: 13.994013092850247 - type: mrr_at_100 value: 14.913737673149308 - type: mrr_at_1000 value: 15.00843809934407 - type: mrr_at_20 value: 14.470268462334007 - type: mrr_at_3 value: 12.000596302921846 - type: mrr_at_5 value: 13.070689000921561 - type: nauc_map_at_1000_diff1 value: 28.559639584013286 - type: nauc_map_at_1000_max value: 25.533800126086714 - type: nauc_map_at_1000_std value: 9.826551026628666 - type: nauc_map_at_100_diff1 value: 28.544724499331696 - type: nauc_map_at_100_max value: 25.46734324526386 - type: nauc_map_at_100_std value: 9.739314481785591 - type: nauc_map_at_10_diff1 value: 28.77447517718118 - type: nauc_map_at_10_max value: 24.7431615237795 - type: nauc_map_at_10_std value: 8.349878188033646 - type: nauc_map_at_1_diff1 value: 37.405452629895514 - type: nauc_map_at_1_max value: 24.444208978394023 - type: nauc_map_at_1_std value: 4.043820373810528 - type: nauc_map_at_20_diff1 value: 28.69764217789062 - type: nauc_map_at_20_max value: 25.111848355996496 - type: nauc_map_at_20_std value: 9.034829905305918 - type: nauc_map_at_3_diff1 value: 30.89053285076882 - type: 
nauc_map_at_3_max value: 24.862886115911152 - type: nauc_map_at_3_std value: 6.654260832396586 - type: nauc_map_at_5_diff1 value: 29.230629676604263 - type: nauc_map_at_5_max value: 24.374302288018583 - type: nauc_map_at_5_std value: 7.341846952319046 - type: nauc_mrr_at_1000_diff1 value: 28.086147932781426 - type: nauc_mrr_at_1000_max value: 25.98698528264653 - type: nauc_mrr_at_1000_std value: 9.917554348624545 - type: nauc_mrr_at_100_diff1 value: 28.069163279791336 - type: nauc_mrr_at_100_max value: 25.949440010886804 - type: nauc_mrr_at_100_std value: 9.874340979732578 - type: nauc_mrr_at_10_diff1 value: 28.239920869530046 - type: nauc_mrr_at_10_max value: 25.351271409498576 - type: nauc_mrr_at_10_std value: 8.669862759875162 - type: nauc_mrr_at_1_diff1 value: 35.96543040207856 - type: nauc_mrr_at_1_max value: 25.488936487231967 - type: nauc_mrr_at_1_std value: 4.76439131038345 - type: nauc_mrr_at_20_diff1 value: 28.18865871284607 - type: nauc_mrr_at_20_max value: 25.67121763344746 - type: nauc_mrr_at_20_std value: 9.297910707519472 - type: nauc_mrr_at_3_diff1 value: 30.166714199740717 - type: nauc_mrr_at_3_max value: 25.541792491964877 - type: nauc_mrr_at_3_std value: 7.083090296398472 - type: nauc_mrr_at_5_diff1 value: 28.68475284656478 - type: nauc_mrr_at_5_max value: 24.994071363482835 - type: nauc_mrr_at_5_std value: 7.687507254902365 - type: nauc_ndcg_at_1000_diff1 value: 25.292792613586467 - type: nauc_ndcg_at_1000_max value: 29.211905289377178 - type: nauc_ndcg_at_1000_std value: 18.088867467320355 - type: nauc_ndcg_at_100_diff1 value: 25.026905011089152 - type: nauc_ndcg_at_100_max value: 27.98822281254431 - type: nauc_ndcg_at_100_std value: 16.69456904301902 - type: nauc_ndcg_at_10_diff1 value: 25.972279051109503 - type: nauc_ndcg_at_10_max value: 24.86486482734957 - type: nauc_ndcg_at_10_std value: 10.398605822106353 - type: nauc_ndcg_at_1_diff1 value: 36.134710485184826 - type: nauc_ndcg_at_1_max value: 25.384572790326025 - type: nauc_ndcg_at_1_std value: 4.591863033771824 - type: nauc_ndcg_at_20_diff1 value: 25.850033660205536 - type: nauc_ndcg_at_20_max value: 25.944243193140515 - type: nauc_ndcg_at_20_std value: 12.392409721204892 - type: nauc_ndcg_at_3_diff1 value: 29.1966056380018 - type: nauc_ndcg_at_3_max value: 24.978843156259913 - type: nauc_ndcg_at_3_std value: 7.353914459205087 - type: nauc_ndcg_at_5_diff1 value: 26.795315295756282 - type: nauc_ndcg_at_5_max value: 24.1196789150412 - type: nauc_ndcg_at_5_std value: 8.311970988265172 - type: nauc_precision_at_1000_diff1 value: 9.128270550217984 - type: nauc_precision_at_1000_max value: 35.79286915973607 - type: nauc_precision_at_1000_std value: 39.15669472887154 - type: nauc_precision_at_100_diff1 value: 14.770289799034384 - type: nauc_precision_at_100_max value: 34.58262232264337 - type: nauc_precision_at_100_std value: 34.101148102981384 - type: nauc_precision_at_10_diff1 value: 19.899104673118178 - type: nauc_precision_at_10_max value: 26.636940338985625 - type: nauc_precision_at_10_std value: 15.73871357255849 - type: nauc_precision_at_1_diff1 value: 36.134710485184826 - type: nauc_precision_at_1_max value: 25.384572790326025 - type: nauc_precision_at_1_std value: 4.591863033771824 - type: nauc_precision_at_20_diff1 value: 19.423457975148942 - type: nauc_precision_at_20_max value: 29.58123490878582 - type: nauc_precision_at_20_std value: 20.847850110821618 - type: nauc_precision_at_3_diff1 value: 24.986416623492918 - type: nauc_precision_at_3_max value: 25.973548400472975 - type: nauc_precision_at_3_std value: 
9.486410455972823 - type: nauc_precision_at_5_diff1 value: 21.237741424923332 - type: nauc_precision_at_5_max value: 24.647141028200164 - type: nauc_precision_at_5_std value: 11.102785032334147 - type: nauc_recall_at_1000_diff1 value: 15.999714888817829 - type: nauc_recall_at_1000_max value: 44.34701908906545 - type: nauc_recall_at_1000_std value: 51.13471291594717 - type: nauc_recall_at_100_diff1 value: 17.401714890483706 - type: nauc_recall_at_100_max value: 33.39042631654808 - type: nauc_recall_at_100_std value: 33.944446168451584 - type: nauc_recall_at_10_diff1 value: 20.30036232399894 - type: nauc_recall_at_10_max value: 24.006718284396786 - type: nauc_recall_at_10_std value: 14.049375108518669 - type: nauc_recall_at_1_diff1 value: 37.405452629895514 - type: nauc_recall_at_1_max value: 24.444208978394023 - type: nauc_recall_at_1_std value: 4.043820373810528 - type: nauc_recall_at_20_diff1 value: 20.23582802609045 - type: nauc_recall_at_20_max value: 26.408063410785243 - type: nauc_recall_at_20_std value: 18.617479515468112 - type: nauc_recall_at_3_diff1 value: 25.53221830103098 - type: nauc_recall_at_3_max value: 24.283712329152678 - type: nauc_recall_at_3_std value: 8.428947805841867 - type: nauc_recall_at_5_diff1 value: 21.741499601020823 - type: nauc_recall_at_5_max value: 22.754924586295296 - type: nauc_recall_at_5_std value: 9.966736688169814 - type: ndcg_at_1 value: 8.977 - type: ndcg_at_10 value: 16.227 - type: ndcg_at_100 value: 21.417 - type: ndcg_at_1000 value: 24.451 - type: ndcg_at_20 value: 17.982 - type: ndcg_at_3 value: 12.206999999999999 - type: ndcg_at_5 value: 14.059 - type: precision_at_1 value: 8.977 - type: precision_at_10 value: 2.933 - type: precision_at_100 value: 0.59 - type: precision_at_1000 value: 0.087 - type: precision_at_20 value: 1.8599999999999999 - type: precision_at_3 value: 5.550999999999999 - type: precision_at_5 value: 4.340999999999999 - type: recall_at_1 value: 8.082 - type: recall_at_10 value: 25.52 - type: recall_at_100 value: 50.32 - type: recall_at_1000 value: 74.021 - type: recall_at_20 value: 32.229 - type: recall_at_3 value: 14.66 - type: recall_at_5 value: 19.062 task: type: Retrieval - dataset: config: default name: MTEB GermanDPR (default) revision: 5129d02422a66be600ac89cd3e8531b4f97d347d split: test type: deepset/germandpr metrics: - type: main_score value: 82.422 - type: map_at_1 value: 64.39 - type: map_at_10 value: 77.273 - type: map_at_100 value: 77.375 - type: map_at_1000 value: 77.376 - type: map_at_20 value: 77.351 - type: map_at_3 value: 75.46300000000001 - type: map_at_5 value: 76.878 - type: mrr_at_1 value: 64.19512195121952 - type: mrr_at_10 value: 77.15842044134736 - type: mrr_at_100 value: 77.2604854308704 - type: mrr_at_1000 value: 77.26087882190109 - type: mrr_at_20 value: 77.23572154560611 - type: mrr_at_3 value: 75.34959349593504 - type: mrr_at_5 value: 76.76422764227652 - type: nauc_map_at_1000_diff1 value: 49.73135253389972 - type: nauc_map_at_1000_max value: 8.665570717396145 - type: nauc_map_at_1000_std value: -25.920927572114522 - type: nauc_map_at_100_diff1 value: 49.729170775336605 - type: nauc_map_at_100_max value: 8.66717979705074 - type: nauc_map_at_100_std value: -25.918338868918596 - type: nauc_map_at_10_diff1 value: 49.708681691445925 - type: nauc_map_at_10_max value: 8.830640635692113 - type: nauc_map_at_10_std value: -25.843238986304858 - type: nauc_map_at_1_diff1 value: 51.750022350988914 - type: nauc_map_at_1_max value: 3.599863010364626 - type: nauc_map_at_1_std value: -27.670122127567314 - type: 
nauc_map_at_20_diff1 value: 49.72609185887161 - type: nauc_map_at_20_max value: 8.766556053409218 - type: nauc_map_at_20_std value: -25.85975887517904 - type: nauc_map_at_3_diff1 value: 49.328512536255595 - type: nauc_map_at_3_max value: 9.475682028996795 - type: nauc_map_at_3_std value: -26.277349632171017 - type: nauc_map_at_5_diff1 value: 49.42801822186142 - type: nauc_map_at_5_max value: 8.788822474357252 - type: nauc_map_at_5_std value: -25.959260882028573 - type: nauc_mrr_at_1000_diff1 value: 50.13038598302397 - type: nauc_mrr_at_1000_max value: 8.734338637484832 - type: nauc_mrr_at_1000_std value: -26.653343549855908 - type: nauc_mrr_at_100_diff1 value: 50.12820392111392 - type: nauc_mrr_at_100_max value: 8.735940503917966 - type: nauc_mrr_at_100_std value: -26.65074918231251 - type: nauc_mrr_at_10_diff1 value: 50.10567888458267 - type: nauc_mrr_at_10_max value: 8.898451291748575 - type: nauc_mrr_at_10_std value: -26.572046921975655 - type: nauc_mrr_at_1_diff1 value: 52.22769994409465 - type: nauc_mrr_at_1_max value: 3.6490820146062015 - type: nauc_mrr_at_1_std value: -28.535100562320498 - type: nauc_mrr_at_20_diff1 value: 50.12462222100699 - type: nauc_mrr_at_20_max value: 8.83487018268756 - type: nauc_mrr_at_20_std value: -26.591437036958332 - type: nauc_mrr_at_3_diff1 value: 49.6987353700016 - type: nauc_mrr_at_3_max value: 9.531003760756258 - type: nauc_mrr_at_3_std value: -26.949799063124818 - type: nauc_mrr_at_5_diff1 value: 49.823881656376585 - type: nauc_mrr_at_5_max value: 8.850404667985085 - type: nauc_mrr_at_5_std value: -26.680008966088582 - type: nauc_ndcg_at_1000_diff1 value: 49.41721203361181 - type: nauc_ndcg_at_1000_max value: 9.41093067609825 - type: nauc_ndcg_at_1000_std value: -25.499543637737567 - type: nauc_ndcg_at_100_diff1 value: 49.32810419509252 - type: nauc_ndcg_at_100_max value: 9.476216458766897 - type: nauc_ndcg_at_100_std value: -25.393856250990414 - type: nauc_ndcg_at_10_diff1 value: 49.181984436623694 - type: nauc_ndcg_at_10_max value: 10.65234732763274 - type: nauc_ndcg_at_10_std value: -24.737669349012297 - type: nauc_ndcg_at_1_diff1 value: 51.750022350988914 - type: nauc_ndcg_at_1_max value: 3.599863010364626 - type: nauc_ndcg_at_1_std value: -27.670122127567314 - type: nauc_ndcg_at_20_diff1 value: 49.275394594995056 - type: nauc_ndcg_at_20_max value: 10.402059796651923 - type: nauc_ndcg_at_20_std value: -24.82329915806705 - type: nauc_ndcg_at_3_diff1 value: 48.22614352152889 - type: nauc_ndcg_at_3_max value: 11.67464280791404 - type: nauc_ndcg_at_3_std value: -25.867824868234095 - type: nauc_ndcg_at_5_diff1 value: 48.35583502987241 - type: nauc_ndcg_at_5_max value: 10.494278750448451 - type: nauc_ndcg_at_5_std value: -25.11599634172764 - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_100_diff1 value: -56.39478136433852 - type: nauc_precision_at_100_max value: 86.93518577529493 - type: nauc_precision_at_100_std value: 100.0 - type: nauc_precision_at_10_diff1 value: 38.662829729133094 - type: nauc_precision_at_10_max value: 56.38018435740605 - type: nauc_precision_at_10_std value: 6.288091897081105 - type: nauc_precision_at_1_diff1 value: 51.750022350988914 - type: nauc_precision_at_1_max value: 3.599863010364626 - type: nauc_precision_at_1_std value: -27.670122127567314 - type: nauc_precision_at_20_diff1 value: 34.739153182429085 - type: nauc_precision_at_20_max value: 84.86908403000989 - type: nauc_precision_at_20_std value: 
29.156199421219455 - type: nauc_precision_at_3_diff1 value: 42.09287362529135 - type: nauc_precision_at_3_max value: 23.629152759287074 - type: nauc_precision_at_3_std value: -23.721376911302492 - type: nauc_precision_at_5_diff1 value: 36.03866171924644 - type: nauc_precision_at_5_max value: 29.166173558775327 - type: nauc_precision_at_5_std value: -15.096374563068448 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: -56.39478136433541 - type: nauc_recall_at_100_max value: 86.93518577528111 - type: nauc_recall_at_100_std value: 100.0 - type: nauc_recall_at_10_diff1 value: 38.66282972913384 - type: nauc_recall_at_10_max value: 56.3801843574071 - type: nauc_recall_at_10_std value: 6.288091897082639 - type: nauc_recall_at_1_diff1 value: 51.750022350988914 - type: nauc_recall_at_1_max value: 3.599863010364626 - type: nauc_recall_at_1_std value: -27.670122127567314 - type: nauc_recall_at_20_diff1 value: 34.7391531824321 - type: nauc_recall_at_20_max value: 84.86908403001016 - type: nauc_recall_at_20_std value: 29.156199421220748 - type: nauc_recall_at_3_diff1 value: 42.09287362529107 - type: nauc_recall_at_3_max value: 23.629152759286946 - type: nauc_recall_at_3_std value: -23.72137691130291 - type: nauc_recall_at_5_diff1 value: 36.0386617192469 - type: nauc_recall_at_5_max value: 29.1661735587759 - type: nauc_recall_at_5_std value: -15.09637456306774 - type: ndcg_at_1 value: 64.39 - type: ndcg_at_10 value: 82.422 - type: ndcg_at_100 value: 82.86099999999999 - type: ndcg_at_1000 value: 82.87299999999999 - type: ndcg_at_20 value: 82.67999999999999 - type: ndcg_at_3 value: 78.967 - type: ndcg_at_5 value: 81.50699999999999 - type: precision_at_1 value: 64.39 - type: precision_at_10 value: 9.795 - type: precision_at_100 value: 0.9990000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.946 - type: precision_at_3 value: 29.691000000000003 - type: precision_at_5 value: 19.044 - type: recall_at_1 value: 64.39 - type: recall_at_10 value: 97.951 - type: recall_at_100 value: 99.902 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 98.92699999999999 - type: recall_at_3 value: 89.07300000000001 - type: recall_at_5 value: 95.22 task: type: Retrieval - dataset: config: default name: MTEB GermanQuAD-Retrieval (default) revision: f5c87ae5a2e7a5106606314eef45255f03151bb3 split: test type: mteb/germanquad-retrieval metrics: - type: main_score value: 94.15532365396247 - type: map_at_1 value: 90.789 - type: map_at_10 value: 94.24 - type: map_at_100 value: 94.283 - type: map_at_1000 value: 94.284 - type: map_at_20 value: 94.272 - type: map_at_3 value: 93.913 - type: map_at_5 value: 94.155 - type: mrr_at_1 value: 90.78947368421053 - type: mrr_at_10 value: 94.23987411056376 - type: mrr_at_100 value: 94.28320936825 - type: mrr_at_1000 value: 94.28350209115848 - type: mrr_at_20 value: 94.271919092559 - type: mrr_at_3 value: 93.91258318209313 - type: mrr_at_5 value: 94.15532365396247 - type: nauc_map_at_1000_diff1 value: 89.29089310650436 - type: nauc_map_at_1000_max value: 73.83868784032414 - type: nauc_map_at_1000_std value: -11.635778561889989 - type: nauc_map_at_100_diff1 value: 89.29077225707755 - type: nauc_map_at_100_max value: 73.84002740580378 - type: nauc_map_at_100_std value: -11.644096256165092 - type: nauc_map_at_10_diff1 value: 89.29117612292366 - type: nauc_map_at_10_max value: 73.97487984981221 - type: nauc_map_at_10_std value: 
-11.35191794373827 - type: nauc_map_at_1_diff1 value: 89.35436544117584 - type: nauc_map_at_1_max value: 70.35936815057701 - type: nauc_map_at_1_std value: -13.598996360976903 - type: nauc_map_at_20_diff1 value: 89.2530394052653 - type: nauc_map_at_20_max value: 73.83537529419839 - type: nauc_map_at_20_std value: -11.628272822028478 - type: nauc_map_at_3_diff1 value: 89.375111893546 - type: nauc_map_at_3_max value: 74.78900366026112 - type: nauc_map_at_3_std value: -12.720905253503274 - type: nauc_map_at_5_diff1 value: 89.35358300820893 - type: nauc_map_at_5_max value: 74.31996219723239 - type: nauc_map_at_5_std value: -10.768642638210867 - type: nauc_mrr_at_1000_diff1 value: 89.29089310650436 - type: nauc_mrr_at_1000_max value: 73.83868784032414 - type: nauc_mrr_at_1000_std value: -11.635778561889989 - type: nauc_mrr_at_100_diff1 value: 89.29077225707755 - type: nauc_mrr_at_100_max value: 73.84002740580378 - type: nauc_mrr_at_100_std value: -11.644096256165092 - type: nauc_mrr_at_10_diff1 value: 89.29117612292366 - type: nauc_mrr_at_10_max value: 73.97487984981221 - type: nauc_mrr_at_10_std value: -11.35191794373827 - type: nauc_mrr_at_1_diff1 value: 89.35436544117584 - type: nauc_mrr_at_1_max value: 70.35936815057701 - type: nauc_mrr_at_1_std value: -13.598996360976903 - type: nauc_mrr_at_20_diff1 value: 89.2530394052653 - type: nauc_mrr_at_20_max value: 73.83537529419839 - type: nauc_mrr_at_20_std value: -11.628272822028478 - type: nauc_mrr_at_3_diff1 value: 89.375111893546 - type: nauc_mrr_at_3_max value: 74.78900366026112 - type: nauc_mrr_at_3_std value: -12.720905253503274 - type: nauc_mrr_at_5_diff1 value: 89.35358300820893 - type: nauc_mrr_at_5_max value: 74.31996219723239 - type: nauc_mrr_at_5_std value: -10.768642638210867 - type: nauc_ndcg_at_1000_diff1 value: 89.27620775856863 - type: nauc_ndcg_at_1000_max value: 74.2985757362615 - type: nauc_ndcg_at_1000_std value: -11.236142819703023 - type: nauc_ndcg_at_100_diff1 value: 89.27284787540731 - type: nauc_ndcg_at_100_max value: 74.33539303365968 - type: nauc_ndcg_at_100_std value: -11.469413615851936 - type: nauc_ndcg_at_10_diff1 value: 89.21496710661724 - type: nauc_ndcg_at_10_max value: 75.02035398490516 - type: nauc_ndcg_at_10_std value: -9.903255803665814 - type: nauc_ndcg_at_1_diff1 value: 89.35436544117584 - type: nauc_ndcg_at_1_max value: 70.35936815057701 - type: nauc_ndcg_at_1_std value: -13.598996360976903 - type: nauc_ndcg_at_20_diff1 value: 89.03561289544179 - type: nauc_ndcg_at_20_max value: 74.4006766600049 - type: nauc_ndcg_at_20_std value: -11.129237862587743 - type: nauc_ndcg_at_3_diff1 value: 89.46540193201693 - type: nauc_ndcg_at_3_max value: 76.87093548368378 - type: nauc_ndcg_at_3_std value: -12.484902872086767 - type: nauc_ndcg_at_5_diff1 value: 89.39924941584766 - type: nauc_ndcg_at_5_max value: 75.96975269092722 - type: nauc_ndcg_at_5_std value: -8.180295581144833 - type: nauc_precision_at_1000_diff1 value: 100.0 - type: nauc_precision_at_1000_max value: 100.0 - type: nauc_precision_at_1000_std value: 100.0 - type: nauc_precision_at_100_diff1 value: 86.93074003795302 - type: nauc_precision_at_100_max value: 100.0 - type: nauc_precision_at_100_std value: -174.07785375176616 - type: nauc_precision_at_10_diff1 value: 87.43064119412082 - type: nauc_precision_at_10_max value: 90.60785783417448 - type: nauc_precision_at_10_std value: 15.378710059645906 - type: nauc_precision_at_1_diff1 value: 89.35436544117584 - type: nauc_precision_at_1_max value: 70.35936815057701 - type: nauc_precision_at_1_std value: 
-13.598996360976903 - type: nauc_precision_at_20_diff1 value: 78.78206037685919 - type: nauc_precision_at_20_max value: 82.52264166455923 - type: nauc_precision_at_20_std value: -5.95806599216658 - type: nauc_precision_at_3_diff1 value: 90.12709256456401 - type: nauc_precision_at_3_max value: 90.72678805838154 - type: nauc_precision_at_3_std value: -11.047599315631993 - type: nauc_precision_at_5_diff1 value: 89.9066873566561 - type: nauc_precision_at_5_max value: 93.51571626543664 - type: nauc_precision_at_5_std value: 22.632403279126162 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 86.93074003793416 - type: nauc_recall_at_100_max value: 100.0 - type: nauc_recall_at_100_std value: -174.07785375175723 - type: nauc_recall_at_10_diff1 value: 87.43064119411991 - type: nauc_recall_at_10_max value: 90.60785783417579 - type: nauc_recall_at_10_std value: 15.378710059643607 - type: nauc_recall_at_1_diff1 value: 89.35436544117584 - type: nauc_recall_at_1_max value: 70.35936815057701 - type: nauc_recall_at_1_std value: -13.598996360976903 - type: nauc_recall_at_20_diff1 value: 78.78206037685645 - type: nauc_recall_at_20_max value: 82.52264166455791 - type: nauc_recall_at_20_std value: -5.958065992168697 - type: nauc_recall_at_3_diff1 value: 90.12709256456463 - type: nauc_recall_at_3_max value: 90.7267880583832 - type: nauc_recall_at_3_std value: -11.047599315631881 - type: nauc_recall_at_5_diff1 value: 89.90668735665676 - type: nauc_recall_at_5_max value: 93.51571626543753 - type: nauc_recall_at_5_std value: 22.632403279126112 - type: ndcg_at_1 value: 90.789 - type: ndcg_at_10 value: 95.46 - type: ndcg_at_100 value: 95.652 - type: ndcg_at_1000 value: 95.659 - type: ndcg_at_20 value: 95.575 - type: ndcg_at_3 value: 94.82000000000001 - type: ndcg_at_5 value: 95.26400000000001 - type: precision_at_1 value: 90.789 - type: precision_at_10 value: 9.908999999999999 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.977 - type: precision_at_3 value: 32.471 - type: precision_at_5 value: 19.701 - type: recall_at_1 value: 90.789 - type: recall_at_10 value: 99.093 - type: recall_at_100 value: 99.955 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 99.546 - type: recall_at_3 value: 97.414 - type: recall_at_5 value: 98.503 task: type: Retrieval - dataset: config: default name: MTEB GermanSTSBenchmark (default) revision: e36907544d44c3a247898ed81540310442329e20 split: test type: jinaai/german-STSbenchmark metrics: - type: cosine_pearson value: 86.55319003300265 - type: cosine_spearman value: 87.50267373081324 - type: euclidean_pearson value: 87.41630636501863 - type: euclidean_spearman value: 88.02170803409365 - type: main_score value: 87.50267373081324 - type: manhattan_pearson value: 87.33703179056744 - type: manhattan_spearman value: 87.99192826922514 - type: pearson value: 86.55319003300265 - type: spearman value: 87.50267373081324 task: type: STS - dataset: config: default name: MTEB HALClusteringS2S (default) revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915 split: test type: lyon-nlp/clustering-hal-s2s metrics: - type: main_score value: 27.477557517301303 - type: v_measure value: 27.477557517301303 - type: v_measure_std value: 3.3525736581861336 task: type: Clustering - dataset: config: default name: MTEB HeadlineClassification (default) revision: 2fe05ee6b5832cda29f2ef7aaad7b7fe6a3609eb split: test type: 
ai-forever/headline-classification metrics: - type: accuracy value: 75.0830078125 - type: f1 value: 75.08863209267814 - type: f1_weighted value: 75.08895979060917 - type: main_score value: 75.0830078125 task: type: Classification - dataset: config: default name: MTEB HotpotQA (default) revision: ab518f4d6fcca38d87c25209f94beba119d02014 split: test type: mteb/hotpotqa metrics: - type: map_at_1 value: 38.143 - type: map_at_10 value: 55.916999999999994 - type: map_at_100 value: 56.706 - type: map_at_1000 value: 56.77100000000001 - type: map_at_20 value: 56.367 - type: map_at_3 value: 53.111 - type: map_at_5 value: 54.839000000000006 - type: mrr_at_1 value: 76.286 - type: mrr_at_10 value: 81.879 - type: mrr_at_100 value: 82.09100000000001 - type: mrr_at_1000 value: 82.101 - type: mrr_at_20 value: 82.01 - type: mrr_at_3 value: 80.972 - type: mrr_at_5 value: 81.537 - type: ndcg_at_1 value: 76.286 - type: ndcg_at_10 value: 64.673 - type: ndcg_at_100 value: 67.527 - type: ndcg_at_1000 value: 68.857 - type: ndcg_at_20 value: 65.822 - type: ndcg_at_3 value: 60.616 - type: ndcg_at_5 value: 62.827999999999996 - type: precision_at_1 value: 76.286 - type: precision_at_10 value: 13.196 - type: precision_at_100 value: 1.544 - type: precision_at_1000 value: 0.172 - type: precision_at_20 value: 6.968000000000001 - type: precision_at_3 value: 37.992 - type: precision_at_5 value: 24.54 - type: recall_at_1 value: 38.143 - type: recall_at_10 value: 65.982 - type: recall_at_100 value: 77.225 - type: recall_at_1000 value: 86.077 - type: recall_at_20 value: 69.68299999999999 - type: recall_at_3 value: 56.989000000000004 - type: recall_at_5 value: 61.35 - type: main_score value: 64.673 task: type: Retrieval - dataset: config: default name: MTEB IFlyTek (default) revision: 421605374b29664c5fc098418fe20ada9bd55f8a split: validation type: C-MTEB/IFlyTek-classification metrics: - type: accuracy value: 41.67756829549827 - type: f1 value: 33.929325579581636 - type: f1_weighted value: 43.03952025643197 - type: main_score value: 41.67756829549827 task: type: Classification - dataset: config: default name: MTEB ImdbClassification (default) revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 91.90440000000001 - type: ap value: 88.78663714603425 - type: ap_weighted value: 88.78663714603425 - type: f1 value: 91.89564361975891 - type: f1_weighted value: 91.89564361975891 - type: main_score value: 91.90440000000001 task: type: Classification - dataset: config: default name: MTEB InappropriatenessClassification (default) revision: 601651fdc45ef243751676e62dd7a19f491c0285 split: test type: ai-forever/inappropriateness-classification metrics: - type: accuracy value: 61.0498046875 - type: ap value: 57.04240566648215 - type: ap_weighted value: 57.04240566648215 - type: f1 value: 60.867630038606954 - type: f1_weighted value: 60.867630038606954 - type: main_score value: 61.0498046875 task: type: Classification - dataset: config: default name: MTEB JDReview (default) revision: b7c64bd89eb87f8ded463478346f76731f07bf8b split: test type: C-MTEB/JDReview-classification metrics: - type: accuracy value: 83.50844277673546 - type: ap value: 48.46732380712268 - type: ap_weighted value: 48.46732380712268 - type: f1 value: 77.43967451387445 - type: f1_weighted value: 84.78462929014114 - type: main_score value: 83.50844277673546 task: type: Classification - dataset: config: default name: MTEB KinopoiskClassification (default) revision: 5911f26666ac11af46cb9c6849d0dc80a378af24 split: test 
type: ai-forever/kinopoisk-sentiment-classification metrics: - type: accuracy value: 62.393333333333324 - type: f1 value: 61.35940129568015 - type: f1_weighted value: 61.35940129568015 - type: main_score value: 62.393333333333324 task: type: Classification - dataset: config: default name: MTEB LCQMC (default) revision: 17f9b096f80380fce5ed12a9be8be7784b337daf split: test type: C-MTEB/LCQMC metrics: - type: cosine_pearson value: 67.74375505907872 - type: cosine_spearman value: 75.94582231399434 - type: euclidean_pearson value: 74.52501692443582 - type: euclidean_spearman value: 75.88428434746646 - type: main_score value: 75.94582231399434 - type: manhattan_pearson value: 74.55015441749529 - type: manhattan_spearman value: 75.83288262176175 - type: pearson value: 67.74375505907872 - type: spearman value: 75.94582231399434 task: type: STS - dataset: config: default name: MTEB LEMBNarrativeQARetrieval (default) revision: 6e346642246bfb4928c560ee08640dc84d074e8c split: test type: dwzhu/LongEmbed metrics: - type: map_at_1 value: 23.093 - type: map_at_10 value: 30.227999999999998 - type: map_at_100 value: 31.423000000000002 - type: map_at_1000 value: 31.533 - type: map_at_20 value: 30.835 - type: map_at_3 value: 27.983999999999998 - type: map_at_5 value: 29.253 - type: mrr_at_1 value: 23.093 - type: mrr_at_10 value: 30.227999999999998 - type: mrr_at_100 value: 31.423000000000002 - type: mrr_at_1000 value: 31.533 - type: mrr_at_20 value: 30.835 - type: mrr_at_3 value: 27.983999999999998 - type: mrr_at_5 value: 29.253 - type: ndcg_at_1 value: 23.093 - type: ndcg_at_10 value: 34.297 - type: ndcg_at_100 value: 41.049 - type: ndcg_at_1000 value: 43.566 - type: ndcg_at_20 value: 36.52 - type: ndcg_at_3 value: 29.629 - type: ndcg_at_5 value: 31.926 - type: precision_at_1 value: 23.093 - type: precision_at_10 value: 4.735 - type: precision_at_100 value: 0.8109999999999999 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 2.8080000000000003 - type: precision_at_3 value: 11.468 - type: precision_at_5 value: 8.001 - type: recall_at_1 value: 23.093 - type: recall_at_10 value: 47.354 - type: recall_at_100 value: 81.147 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 56.16799999999999 - type: recall_at_3 value: 34.405 - type: recall_at_5 value: 40.004 - type: main_score value: 34.297 task: type: Retrieval - dataset: config: default name: MTEB LEMBNeedleRetrieval (default) revision: 6e346642246bfb4928c560ee08640dc84d074e8c split: test_256 type: dwzhu/LongEmbed metrics: - type: map_at_1 value: 64.0 - type: map_at_10 value: 77.083 - type: map_at_100 value: 77.265 - type: map_at_1000 value: 77.265 - type: map_at_20 value: 77.265 - type: map_at_3 value: 76.333 - type: map_at_5 value: 76.833 - type: mrr_at_1 value: 64.0 - type: mrr_at_10 value: 77.083 - type: mrr_at_100 value: 77.265 - type: mrr_at_1000 value: 77.265 - type: mrr_at_20 value: 77.265 - type: mrr_at_3 value: 76.333 - type: mrr_at_5 value: 76.833 - type: ndcg_at_1 value: 64.0 - type: ndcg_at_10 value: 82.325 - type: ndcg_at_100 value: 82.883 - type: ndcg_at_1000 value: 82.883 - type: ndcg_at_20 value: 82.883 - type: ndcg_at_3 value: 80.833 - type: ndcg_at_5 value: 81.694 - type: precision_at_1 value: 64.0 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 5.0 - type: precision_at_3 value: 31.333 - type: precision_at_5 value: 19.2 - type: recall_at_1 value: 64.0 - type: recall_at_10 value: 98.0 - type: recall_at_100 value: 100.0 - 
type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 100.0 - type: recall_at_3 value: 94.0 - type: recall_at_5 value: 96.0 - type: main_score value: 64.0 task: type: Retrieval - dataset: config: default name: MTEB LEMBPasskeyRetrieval (default) revision: 6e346642246bfb4928c560ee08640dc84d074e8c split: test_256 type: dwzhu/LongEmbed metrics: - type: map_at_1 value: 100.0 - type: map_at_10 value: 100.0 - type: map_at_100 value: 100.0 - type: map_at_1000 value: 100.0 - type: map_at_20 value: 100.0 - type: map_at_3 value: 100.0 - type: map_at_5 value: 100.0 - type: mrr_at_1 value: 100.0 - type: mrr_at_10 value: 100.0 - type: mrr_at_100 value: 100.0 - type: mrr_at_1000 value: 100.0 - type: mrr_at_20 value: 100.0 - type: mrr_at_3 value: 100.0 - type: mrr_at_5 value: 100.0 - type: ndcg_at_1 value: 100.0 - type: ndcg_at_10 value: 100.0 - type: ndcg_at_100 value: 100.0 - type: ndcg_at_1000 value: 100.0 - type: ndcg_at_20 value: 100.0 - type: ndcg_at_3 value: 100.0 - type: ndcg_at_5 value: 100.0 - type: precision_at_1 value: 100.0 - type: precision_at_10 value: 10.0 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 5.0 - type: precision_at_3 value: 33.333 - type: precision_at_5 value: 20.0 - type: recall_at_1 value: 100.0 - type: recall_at_10 value: 100.0 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 100.0 - type: recall_at_3 value: 100.0 - type: recall_at_5 value: 100.0 - type: main_score value: 100.0 task: type: Retrieval - dataset: config: default name: MTEB LEMBQMSumRetrieval (default) revision: 6e346642246bfb4928c560ee08640dc84d074e8c split: test type: dwzhu/LongEmbed metrics: - type: map_at_1 value: 24.361 - type: map_at_10 value: 33.641 - type: map_at_100 value: 35.104 - type: map_at_1000 value: 35.127 - type: map_at_20 value: 34.388999999999996 - type: map_at_3 value: 30.255 - type: map_at_5 value: 32.079 - type: mrr_at_1 value: 24.361 - type: mrr_at_10 value: 33.641 - type: mrr_at_100 value: 35.104 - type: mrr_at_1000 value: 35.127 - type: mrr_at_20 value: 34.388999999999996 - type: mrr_at_3 value: 30.255 - type: mrr_at_5 value: 32.079 - type: ndcg_at_1 value: 24.361 - type: ndcg_at_10 value: 39.337 - type: ndcg_at_100 value: 47.384 - type: ndcg_at_1000 value: 47.75 - type: ndcg_at_20 value: 42.077999999999996 - type: ndcg_at_3 value: 32.235 - type: ndcg_at_5 value: 35.524 - type: precision_at_1 value: 24.361 - type: precision_at_10 value: 5.783 - type: precision_at_100 value: 0.975 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 3.435 - type: precision_at_3 value: 12.661 - type: precision_at_5 value: 9.193999999999999 - type: recall_at_1 value: 24.361 - type: recall_at_10 value: 57.826 - type: recall_at_100 value: 97.51100000000001 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 68.697 - type: recall_at_3 value: 37.983 - type: recall_at_5 value: 45.972 - type: main_score value: 39.337 task: type: Retrieval - dataset: config: default name: MTEB LEMBSummScreenFDRetrieval (default) revision: 6e346642246bfb4928c560ee08640dc84d074e8c split: validation type: dwzhu/LongEmbed metrics: - type: map_at_1 value: 84.821 - type: map_at_10 value: 90.11200000000001 - type: map_at_100 value: 90.158 - type: map_at_1000 value: 90.158 - type: map_at_20 value: 90.137 - type: map_at_3 value: 89.385 - type: map_at_5 value: 89.876 - type: mrr_at_1 value: 84.821 - type: mrr_at_10 value: 90.11200000000001 - type: mrr_at_100 value: 90.158 - type: mrr_at_1000 value: 
90.158 - type: mrr_at_20 value: 90.137 - type: mrr_at_3 value: 89.385 - type: mrr_at_5 value: 89.876 - type: ndcg_at_1 value: 84.821 - type: ndcg_at_10 value: 92.334 - type: ndcg_at_100 value: 92.535 - type: ndcg_at_1000 value: 92.535 - type: ndcg_at_20 value: 92.414 - type: ndcg_at_3 value: 90.887 - type: ndcg_at_5 value: 91.758 - type: precision_at_1 value: 84.821 - type: precision_at_10 value: 9.911 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.97 - type: precision_at_3 value: 31.746000000000002 - type: precision_at_5 value: 19.464000000000002 - type: recall_at_1 value: 84.821 - type: recall_at_10 value: 99.107 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 99.405 - type: recall_at_3 value: 95.238 - type: recall_at_5 value: 97.321 - type: main_score value: 92.334 task: type: Retrieval - dataset: config: default name: MTEB LEMBWikimQARetrieval (default) revision: 6e346642246bfb4928c560ee08640dc84d074e8c split: test type: dwzhu/LongEmbed metrics: - type: map_at_1 value: 53.667 - type: map_at_10 value: 61.719 - type: map_at_100 value: 62.471 - type: map_at_1000 value: 62.492000000000004 - type: map_at_20 value: 62.153000000000006 - type: map_at_3 value: 59.167 - type: map_at_5 value: 60.95 - type: mrr_at_1 value: 53.667 - type: mrr_at_10 value: 61.719 - type: mrr_at_100 value: 62.471 - type: mrr_at_1000 value: 62.492000000000004 - type: mrr_at_20 value: 62.153000000000006 - type: mrr_at_3 value: 59.167 - type: mrr_at_5 value: 60.95 - type: ndcg_at_1 value: 53.667 - type: ndcg_at_10 value: 66.018 - type: ndcg_at_100 value: 69.726 - type: ndcg_at_1000 value: 70.143 - type: ndcg_at_20 value: 67.61399999999999 - type: ndcg_at_3 value: 60.924 - type: ndcg_at_5 value: 64.10900000000001 - type: precision_at_1 value: 53.667 - type: precision_at_10 value: 7.9670000000000005 - type: precision_at_100 value: 0.97 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.3 - type: precision_at_3 value: 22.0 - type: precision_at_5 value: 14.732999999999999 - type: recall_at_1 value: 53.667 - type: recall_at_10 value: 79.667 - type: recall_at_100 value: 97.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 86.0 - type: recall_at_3 value: 66.0 - type: recall_at_5 value: 73.667 - type: main_score value: 66.018 task: type: Retrieval - dataset: config: deu-deu name: MTEB MLQARetrieval (deu-deu) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 67.548 - type: map_at_1 value: 56.559000000000005 - type: map_at_10 value: 63.867 - type: map_at_100 value: 64.429 - type: map_at_1000 value: 64.457 - type: map_at_20 value: 64.215 - type: map_at_3 value: 62.109 - type: map_at_5 value: 63.101 - type: mrr_at_1 value: 56.56990915134057 - type: mrr_at_10 value: 63.86820789324668 - type: mrr_at_100 value: 64.42973602152581 - type: mrr_at_1000 value: 64.45818598090155 - type: mrr_at_20 value: 64.2163052263868 - type: mrr_at_3 value: 62.10946155550634 - type: mrr_at_5 value: 63.10104143585199 - type: nauc_map_at_1000_diff1 value: 73.78440163370111 - type: nauc_map_at_1000_max value: 66.37875518052162 - type: nauc_map_at_1000_std value: -17.063915098135396 - type: nauc_map_at_100_diff1 value: 73.77180802985815 - type: nauc_map_at_100_max value: 66.38365998362033 - type: nauc_map_at_100_std value: -17.053345109661972 - type: nauc_map_at_10_diff1 value: 73.70041876696037 - type: nauc_map_at_10_max value: 66.33213342705997 - 
type: nauc_map_at_10_std value: -17.40657791273925 - type: nauc_map_at_1_diff1 value: 76.8784374396948 - type: nauc_map_at_1_max value: 64.07170606935357 - type: nauc_map_at_1_std value: -18.464213686790654 - type: nauc_map_at_20_diff1 value: 73.72371377231813 - type: nauc_map_at_20_max value: 66.42108121059451 - type: nauc_map_at_20_std value: -17.05384923889036 - type: nauc_map_at_3_diff1 value: 74.08287018839246 - type: nauc_map_at_3_max value: 66.42422337760333 - type: nauc_map_at_3_std value: -17.79503404131652 - type: nauc_map_at_5_diff1 value: 73.9294779027339 - type: nauc_map_at_5_max value: 66.51752041065726 - type: nauc_map_at_5_std value: -17.67309805113804 - type: nauc_mrr_at_1000_diff1 value: 73.78389736923545 - type: nauc_mrr_at_1000_max value: 66.37929720858341 - type: nauc_mrr_at_1000_std value: -17.058591711291278 - type: nauc_mrr_at_100_diff1 value: 73.77126451253136 - type: nauc_mrr_at_100_max value: 66.38405917246607 - type: nauc_mrr_at_100_std value: -17.047251035212863 - type: nauc_mrr_at_10_diff1 value: 73.69960470665124 - type: nauc_mrr_at_10_max value: 66.33265194210313 - type: nauc_mrr_at_10_std value: -17.399659076827998 - type: nauc_mrr_at_1_diff1 value: 76.8689850260726 - type: nauc_mrr_at_1_max value: 64.09858188287487 - type: nauc_mrr_at_1_std value: -18.46064784201847 - type: nauc_mrr_at_20_diff1 value: 73.72312682063128 - type: nauc_mrr_at_20_max value: 66.42181932858745 - type: nauc_mrr_at_20_std value: -17.04690257511092 - type: nauc_mrr_at_3_diff1 value: 74.08287018839246 - type: nauc_mrr_at_3_max value: 66.42422337760333 - type: nauc_mrr_at_3_std value: -17.79503404131652 - type: nauc_mrr_at_5_diff1 value: 73.9294779027339 - type: nauc_mrr_at_5_max value: 66.51752041065726 - type: nauc_mrr_at_5_std value: -17.67309805113804 - type: nauc_ndcg_at_1000_diff1 value: 72.97825548342801 - type: nauc_ndcg_at_1000_max value: 66.96275437178257 - type: nauc_ndcg_at_1000_std value: -15.611902299641587 - type: nauc_ndcg_at_100_diff1 value: 72.58724738936613 - type: nauc_ndcg_at_100_max value: 67.16774012704182 - type: nauc_ndcg_at_100_std value: -14.945088654796812 - type: nauc_ndcg_at_10_diff1 value: 72.16253640477947 - type: nauc_ndcg_at_10_max value: 67.01746849484621 - type: nauc_ndcg_at_10_std value: -16.46102507270809 - type: nauc_ndcg_at_1_diff1 value: 76.8689850260726 - type: nauc_ndcg_at_1_max value: 64.09858188287487 - type: nauc_ndcg_at_1_std value: -18.46064784201847 - type: nauc_ndcg_at_20_diff1 value: 72.19995325129975 - type: nauc_ndcg_at_20_max value: 67.39639713797962 - type: nauc_ndcg_at_20_std value: -15.091689370748531 - type: nauc_ndcg_at_3_diff1 value: 73.13123604206514 - type: nauc_ndcg_at_3_max value: 67.23123167871547 - type: nauc_ndcg_at_3_std value: -17.492755234009156 - type: nauc_ndcg_at_5_diff1 value: 72.8154718929895 - type: nauc_ndcg_at_5_max value: 67.44578008373777 - type: nauc_ndcg_at_5_std value: -17.251840358751362 - type: nauc_precision_at_1000_diff1 value: 47.89748325983604 - type: nauc_precision_at_1000_max value: 70.47466197804906 - type: nauc_precision_at_1000_std value: 72.66193512114775 - type: nauc_precision_at_100_diff1 value: 59.493743734005356 - type: nauc_precision_at_100_max value: 74.02140147220713 - type: nauc_precision_at_100_std value: 17.26664098026236 - type: nauc_precision_at_10_diff1 value: 64.94415011040277 - type: nauc_precision_at_10_max value: 69.6963814950747 - type: nauc_precision_at_10_std value: -11.663043657012954 - type: nauc_precision_at_1_diff1 value: 76.8689850260726 - type: 
nauc_precision_at_1_max value: 64.09858188287487 - type: nauc_precision_at_1_std value: -18.46064784201847 - type: nauc_precision_at_20_diff1 value: 63.145886909986416 - type: nauc_precision_at_20_max value: 72.95708033630744 - type: nauc_precision_at_20_std value: -1.5039593629280323 - type: nauc_precision_at_3_diff1 value: 69.88902201644449 - type: nauc_precision_at_3_max value: 69.80499971089935 - type: nauc_precision_at_3_std value: -16.444680766676647 - type: nauc_precision_at_5_diff1 value: 68.60869967062919 - type: nauc_precision_at_5_max value: 70.75998207564281 - type: nauc_precision_at_5_std value: -15.62613396998262 - type: nauc_recall_at_1000_diff1 value: 62.6646436338833 - type: nauc_recall_at_1000_max value: 86.17801636476078 - type: nauc_recall_at_1000_std value: 71.84718775540334 - type: nauc_recall_at_100_diff1 value: 61.110492191439505 - type: nauc_recall_at_100_max value: 75.45730686603042 - type: nauc_recall_at_100_std value: 16.202465011589428 - type: nauc_recall_at_10_diff1 value: 65.1522196516815 - type: nauc_recall_at_10_max value: 69.7626435962161 - type: nauc_recall_at_10_std value: -11.801178474770449 - type: nauc_recall_at_1_diff1 value: 76.8784374396948 - type: nauc_recall_at_1_max value: 64.07170606935357 - type: nauc_recall_at_1_std value: -18.464213686790654 - type: nauc_recall_at_20_diff1 value: 63.40332739504143 - type: nauc_recall_at_20_max value: 73.04113661090965 - type: nauc_recall_at_20_std value: -1.6609741140266947 - type: nauc_recall_at_3_diff1 value: 70.03728086098866 - type: nauc_recall_at_3_max value: 69.85953774320521 - type: nauc_recall_at_3_std value: -16.482993123411706 - type: nauc_recall_at_5_diff1 value: 68.77396121765933 - type: nauc_recall_at_5_max value: 70.8231205493519 - type: nauc_recall_at_5_std value: -15.668037770700863 - type: ndcg_at_1 value: 56.57 - type: ndcg_at_10 value: 67.548 - type: ndcg_at_100 value: 70.421 - type: ndcg_at_1000 value: 71.198 - type: ndcg_at_20 value: 68.829 - type: ndcg_at_3 value: 63.88700000000001 - type: ndcg_at_5 value: 65.689 - type: precision_at_1 value: 56.57 - type: precision_at_10 value: 7.922 - type: precision_at_100 value: 0.9299999999999999 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 4.216 - type: precision_at_3 value: 23.015 - type: precision_at_5 value: 14.691 - type: recall_at_1 value: 56.559000000000005 - type: recall_at_10 value: 79.182 - type: recall_at_100 value: 92.946 - type: recall_at_1000 value: 99.092 - type: recall_at_20 value: 84.27900000000001 - type: recall_at_3 value: 69.023 - type: recall_at_5 value: 73.432 task: type: Retrieval - dataset: config: deu-spa name: MTEB MLQARetrieval (deu-spa) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 70.645 - type: map_at_1 value: 58.423 - type: map_at_10 value: 66.613 - type: map_at_100 value: 67.14099999999999 - type: map_at_1000 value: 67.161 - type: map_at_20 value: 66.965 - type: map_at_3 value: 64.714 - type: map_at_5 value: 65.835 - type: mrr_at_1 value: 58.4225352112676 - type: mrr_at_10 value: 66.61321260898735 - type: mrr_at_100 value: 67.13991570812132 - type: mrr_at_1000 value: 67.1598532168174 - type: mrr_at_20 value: 66.96384710024888 - type: mrr_at_3 value: 64.71361502347425 - type: mrr_at_5 value: 65.83474178403769 - type: nauc_map_at_1000_diff1 value: 73.9485117118935 - type: nauc_map_at_1000_max value: 65.74479869396299 - type: nauc_map_at_1000_std value: -20.300269749495563 - type: nauc_map_at_100_diff1 value: 
73.93900406302829 - type: nauc_map_at_100_max value: 65.75508449194885 - type: nauc_map_at_100_std value: -20.265330791570175 - type: nauc_map_at_10_diff1 value: 73.84863233472605 - type: nauc_map_at_10_max value: 65.89377317378211 - type: nauc_map_at_10_std value: -20.404123131964695 - type: nauc_map_at_1_diff1 value: 76.73627284218519 - type: nauc_map_at_1_max value: 62.94957512510876 - type: nauc_map_at_1_std value: -20.99649749330682 - type: nauc_map_at_20_diff1 value: 73.88712006109598 - type: nauc_map_at_20_max value: 65.82057018162664 - type: nauc_map_at_20_std value: -20.269476512431915 - type: nauc_map_at_3_diff1 value: 74.21419190161502 - type: nauc_map_at_3_max value: 65.64993368062119 - type: nauc_map_at_3_std value: -21.34641749007071 - type: nauc_map_at_5_diff1 value: 74.0119419385777 - type: nauc_map_at_5_max value: 65.69809416369732 - type: nauc_map_at_5_std value: -21.16901556082261 - type: nauc_mrr_at_1000_diff1 value: 73.94915184134923 - type: nauc_mrr_at_1000_max value: 65.74522469633418 - type: nauc_mrr_at_1000_std value: -20.303028367132246 - type: nauc_mrr_at_100_diff1 value: 73.93964394728808 - type: nauc_mrr_at_100_max value: 65.75550992323707 - type: nauc_mrr_at_100_std value: -20.26808820438918 - type: nauc_mrr_at_10_diff1 value: 73.84863233472605 - type: nauc_mrr_at_10_max value: 65.89377317378211 - type: nauc_mrr_at_10_std value: -20.404123131964695 - type: nauc_mrr_at_1_diff1 value: 76.73627284218519 - type: nauc_mrr_at_1_max value: 62.94957512510876 - type: nauc_mrr_at_1_std value: -20.99649749330682 - type: nauc_mrr_at_20_diff1 value: 73.88775721128745 - type: nauc_mrr_at_20_max value: 65.820991355628 - type: nauc_mrr_at_20_std value: -20.272216587019734 - type: nauc_mrr_at_3_diff1 value: 74.21419190161502 - type: nauc_mrr_at_3_max value: 65.64993368062119 - type: nauc_mrr_at_3_std value: -21.34641749007071 - type: nauc_mrr_at_5_diff1 value: 74.0119419385777 - type: nauc_mrr_at_5_max value: 65.69809416369732 - type: nauc_mrr_at_5_std value: -21.16901556082261 - type: nauc_ndcg_at_1000_diff1 value: 73.29396365944277 - type: nauc_ndcg_at_1000_max value: 66.44879592109541 - type: nauc_ndcg_at_1000_std value: -19.285991058788195 - type: nauc_ndcg_at_100_diff1 value: 73.0159172721162 - type: nauc_ndcg_at_100_max value: 66.76216389231388 - type: nauc_ndcg_at_100_std value: -18.27931368094887 - type: nauc_ndcg_at_10_diff1 value: 72.42096650774693 - type: nauc_ndcg_at_10_max value: 67.48592688463306 - type: nauc_ndcg_at_10_std value: -18.91453756077581 - type: nauc_ndcg_at_1_diff1 value: 76.73627284218519 - type: nauc_ndcg_at_1_max value: 62.94957512510876 - type: nauc_ndcg_at_1_std value: -20.99649749330682 - type: nauc_ndcg_at_20_diff1 value: 72.53699362385684 - type: nauc_ndcg_at_20_max value: 67.22763976357872 - type: nauc_ndcg_at_20_std value: -18.299910635008338 - type: nauc_ndcg_at_3_diff1 value: 73.3698453761989 - type: nauc_ndcg_at_3_max value: 66.71056987289383 - type: nauc_ndcg_at_3_std value: -21.405154376652803 - type: nauc_ndcg_at_5_diff1 value: 72.9491030712935 - type: nauc_ndcg_at_5_max value: 66.85786103137077 - type: nauc_ndcg_at_5_std value: -21.04005053344073 - type: nauc_precision_at_1000_diff1 value: 17.02462370967451 - type: nauc_precision_at_1000_max value: 48.03260752496052 - type: nauc_precision_at_1000_std value: 87.56077915079334 - type: nauc_precision_at_100_diff1 value: 58.590352501194985 - type: nauc_precision_at_100_max value: 78.2649015433222 - type: nauc_precision_at_100_std value: 28.05030453158992 - type: 
nauc_precision_at_10_diff1 value: 64.89497928764766 - type: nauc_precision_at_10_max value: 75.93257124951242 - type: nauc_precision_at_10_std value: -9.825306994117462 - type: nauc_precision_at_1_diff1 value: 76.73627284218519 - type: nauc_precision_at_1_max value: 62.94957512510876 - type: nauc_precision_at_1_std value: -20.99649749330682 - type: nauc_precision_at_20_diff1 value: 62.11366204321558 - type: nauc_precision_at_20_max value: 75.9571427846493 - type: nauc_precision_at_20_std value: -0.94585212808191 - type: nauc_precision_at_3_diff1 value: 70.52940972112398 - type: nauc_precision_at_3_max value: 70.3402053170779 - type: nauc_precision_at_3_std value: -21.579778424241304 - type: nauc_precision_at_5_diff1 value: 68.78962580223575 - type: nauc_precision_at_5_max value: 71.41410894398376 - type: nauc_precision_at_5_std value: -20.415603405161956 - type: nauc_recall_at_1000_diff1 value: 55.88625447348128 - type: nauc_recall_at_1000_max value: 100.0 - type: nauc_recall_at_1000_std value: 100.0 - type: nauc_recall_at_100_diff1 value: 61.17942268389525 - type: nauc_recall_at_100_max value: 81.12207841563487 - type: nauc_recall_at_100_std value: 27.141215257528113 - type: nauc_recall_at_10_diff1 value: 64.8949792876478 - type: nauc_recall_at_10_max value: 75.93257124951249 - type: nauc_recall_at_10_std value: -9.825306994117323 - type: nauc_recall_at_1_diff1 value: 76.73627284218519 - type: nauc_recall_at_1_max value: 62.94957512510876 - type: nauc_recall_at_1_std value: -20.99649749330682 - type: nauc_recall_at_20_diff1 value: 63.07808719241162 - type: nauc_recall_at_20_max value: 76.96808746317542 - type: nauc_recall_at_20_std value: -1.5235053258631275 - type: nauc_recall_at_3_diff1 value: 70.52940972112405 - type: nauc_recall_at_3_max value: 70.3402053170779 - type: nauc_recall_at_3_std value: -21.57977842424124 - type: nauc_recall_at_5_diff1 value: 68.78962580223575 - type: nauc_recall_at_5_max value: 71.41410894398392 - type: nauc_recall_at_5_std value: -20.415603405161793 - type: ndcg_at_1 value: 58.423 - type: ndcg_at_10 value: 70.645 - type: ndcg_at_100 value: 73.277 - type: ndcg_at_1000 value: 73.785 - type: ndcg_at_20 value: 71.918 - type: ndcg_at_3 value: 66.679 - type: ndcg_at_5 value: 68.72200000000001 - type: precision_at_1 value: 58.423 - type: precision_at_10 value: 8.338 - type: precision_at_100 value: 0.959 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.423 - type: precision_at_3 value: 24.113 - type: precision_at_5 value: 15.47 - type: recall_at_1 value: 58.423 - type: recall_at_10 value: 83.38 - type: recall_at_100 value: 95.887 - type: recall_at_1000 value: 99.831 - type: recall_at_20 value: 88.39399999999999 - type: recall_at_3 value: 72.33800000000001 - type: recall_at_5 value: 77.352 task: type: Retrieval - dataset: config: deu-eng name: MTEB MLQARetrieval (deu-eng) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 67.067 - type: map_at_1 value: 55.861000000000004 - type: map_at_10 value: 63.42100000000001 - type: map_at_100 value: 64.03 - type: map_at_1000 value: 64.05999999999999 - type: map_at_20 value: 63.819 - type: map_at_3 value: 61.773 - type: map_at_5 value: 62.736999999999995 - type: mrr_at_1 value: 55.88300465322402 - type: mrr_at_10 value: 63.43111082973707 - type: mrr_at_100 value: 64.03962373590272 - type: mrr_at_1000 value: 64.0698259866376 - type: mrr_at_20 value: 63.82871766489112 - type: mrr_at_3 value: 61.78447448112865 - type: mrr_at_5 value: 
62.74835659945346 - type: nauc_map_at_1000_diff1 value: 74.58505763417352 - type: nauc_map_at_1000_max value: 66.26060764852198 - type: nauc_map_at_1000_std value: -16.896178230873897 - type: nauc_map_at_100_diff1 value: 74.57057487892857 - type: nauc_map_at_100_max value: 66.26600433283826 - type: nauc_map_at_100_std value: -16.87596113104189 - type: nauc_map_at_10_diff1 value: 74.53453636322749 - type: nauc_map_at_10_max value: 66.27501737773804 - type: nauc_map_at_10_std value: -17.178743257781775 - type: nauc_map_at_1_diff1 value: 77.63067209375254 - type: nauc_map_at_1_max value: 64.17718675702672 - type: nauc_map_at_1_std value: -17.639521106853717 - type: nauc_map_at_20_diff1 value: 74.52007402431164 - type: nauc_map_at_20_max value: 66.28276291359268 - type: nauc_map_at_20_std value: -16.939292897754758 - type: nauc_map_at_3_diff1 value: 74.79187974631951 - type: nauc_map_at_3_max value: 66.23256568210611 - type: nauc_map_at_3_std value: -17.894889918934112 - type: nauc_map_at_5_diff1 value: 74.63011328882517 - type: nauc_map_at_5_max value: 66.35411054978499 - type: nauc_map_at_5_std value: -17.50140342194211 - type: nauc_mrr_at_1000_diff1 value: 74.57520089771667 - type: nauc_mrr_at_1000_max value: 66.27270912845914 - type: nauc_mrr_at_1000_std value: -16.84012675362397 - type: nauc_mrr_at_100_diff1 value: 74.56070964572156 - type: nauc_mrr_at_100_max value: 66.2780701126926 - type: nauc_mrr_at_100_std value: -16.820035083069865 - type: nauc_mrr_at_10_diff1 value: 74.52455978435117 - type: nauc_mrr_at_10_max value: 66.28697244023137 - type: nauc_mrr_at_10_std value: -17.122477723330523 - type: nauc_mrr_at_1_diff1 value: 77.60643512422061 - type: nauc_mrr_at_1_max value: 64.21736966061896 - type: nauc_mrr_at_1_std value: -17.56627338275146 - type: nauc_mrr_at_20_diff1 value: 74.5099814266373 - type: nauc_mrr_at_20_max value: 66.29485560556576 - type: nauc_mrr_at_20_std value: -16.882350027335306 - type: nauc_mrr_at_3_diff1 value: 74.78132817375507 - type: nauc_mrr_at_3_max value: 66.24761860047623 - type: nauc_mrr_at_3_std value: -17.833128575678998 - type: nauc_mrr_at_5_diff1 value: 74.6193031207433 - type: nauc_mrr_at_5_max value: 66.36951764432901 - type: nauc_mrr_at_5_std value: -17.438203106324227 - type: nauc_ndcg_at_1000_diff1 value: 73.79386161629151 - type: nauc_ndcg_at_1000_max value: 66.84013038018082 - type: nauc_ndcg_at_1000_std value: -15.387358822700667 - type: nauc_ndcg_at_100_diff1 value: 73.36132885277745 - type: nauc_ndcg_at_100_max value: 67.04416926901568 - type: nauc_ndcg_at_100_std value: -14.503256942521972 - type: nauc_ndcg_at_10_diff1 value: 73.11847332785027 - type: nauc_ndcg_at_10_max value: 67.02149621303091 - type: nauc_ndcg_at_10_std value: -16.142234662067782 - type: nauc_ndcg_at_1_diff1 value: 77.60643512422061 - type: nauc_ndcg_at_1_max value: 64.21736966061896 - type: nauc_ndcg_at_1_std value: -17.56627338275146 - type: nauc_ndcg_at_20_diff1 value: 72.97961452569768 - type: nauc_ndcg_at_20_max value: 67.12369127081152 - type: nauc_ndcg_at_20_std value: -15.11921773223936 - type: nauc_ndcg_at_3_diff1 value: 73.77769312598772 - type: nauc_ndcg_at_3_max value: 66.94438755852309 - type: nauc_ndcg_at_3_std value: -17.75960443830741 - type: nauc_ndcg_at_5_diff1 value: 73.43991209562891 - type: nauc_ndcg_at_5_max value: 67.21682951737418 - type: nauc_ndcg_at_5_std value: -17.013510008231805 - type: nauc_precision_at_1000_diff1 value: 51.30633281948362 - type: nauc_precision_at_1000_max value: 76.78675288883846 - type: nauc_precision_at_1000_std value: 
71.70041985304397 - type: nauc_precision_at_100_diff1 value: 59.86656455853326 - type: nauc_precision_at_100_max value: 74.41958422732161 - type: nauc_precision_at_100_std value: 22.098920296069124 - type: nauc_precision_at_10_diff1 value: 66.4696166928741 - type: nauc_precision_at_10_max value: 69.88463108697104 - type: nauc_precision_at_10_std value: -10.707950954702742 - type: nauc_precision_at_1_diff1 value: 77.60643512422061 - type: nauc_precision_at_1_max value: 64.21736966061896 - type: nauc_precision_at_1_std value: -17.56627338275146 - type: nauc_precision_at_20_diff1 value: 63.45094585276983 - type: nauc_precision_at_20_max value: 71.57741245347195 - type: nauc_precision_at_20_std value: -2.2211545419051744 - type: nauc_precision_at_3_diff1 value: 70.28060818081384 - type: nauc_precision_at_3_max value: 69.22652927816439 - type: nauc_precision_at_3_std value: -17.158576243559434 - type: nauc_precision_at_5_diff1 value: 68.90765418427162 - type: nauc_precision_at_5_max value: 70.32585273389111 - type: nauc_precision_at_5_std value: -14.950363729664524 - type: nauc_recall_at_1000_diff1 value: 65.11255117927331 - type: nauc_recall_at_1000_max value: 88.35641213283338 - type: nauc_recall_at_1000_std value: 69.89792573640547 - type: nauc_recall_at_100_diff1 value: 61.46376457272238 - type: nauc_recall_at_100_max value: 75.48265142243015 - type: nauc_recall_at_100_std value: 21.223182712042178 - type: nauc_recall_at_10_diff1 value: 66.89353375308997 - type: nauc_recall_at_10_max value: 70.06655416883785 - type: nauc_recall_at_10_std value: -11.100871879439435 - type: nauc_recall_at_1_diff1 value: 77.63067209375254 - type: nauc_recall_at_1_max value: 64.17718675702672 - type: nauc_recall_at_1_std value: -17.639521106853717 - type: nauc_recall_at_20_diff1 value: 63.98532276331878 - type: nauc_recall_at_20_max value: 71.81562599791899 - type: nauc_recall_at_20_std value: -2.696537977147695 - type: nauc_recall_at_3_diff1 value: 70.4507655865698 - type: nauc_recall_at_3_max value: 69.25705030141037 - type: nauc_recall_at_3_std value: -17.299948348202836 - type: nauc_recall_at_5_diff1 value: 69.09152857901888 - type: nauc_recall_at_5_max value: 70.35609636026405 - type: nauc_recall_at_5_std value: -15.105012139255896 - type: ndcg_at_1 value: 55.883 - type: ndcg_at_10 value: 67.067 - type: ndcg_at_100 value: 70.07 - type: ndcg_at_1000 value: 70.875 - type: ndcg_at_20 value: 68.498 - type: ndcg_at_3 value: 63.666 - type: ndcg_at_5 value: 65.40599999999999 - type: precision_at_1 value: 55.883 - type: precision_at_10 value: 7.8549999999999995 - type: precision_at_100 value: 0.928 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 4.2090000000000005 - type: precision_at_3 value: 23.052 - type: precision_at_5 value: 14.677999999999999 - type: recall_at_1 value: 55.861000000000004 - type: recall_at_10 value: 78.495 - type: recall_at_100 value: 92.688 - type: recall_at_1000 value: 99.02499999999999 - type: recall_at_20 value: 84.124 - type: recall_at_3 value: 69.123 - type: recall_at_5 value: 73.355 task: type: Retrieval - dataset: config: spa-deu name: MTEB MLQARetrieval (spa-deu) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 73.90299999999999 - type: map_at_1 value: 61.236000000000004 - type: map_at_10 value: 69.88799999999999 - type: map_at_100 value: 70.319 - type: map_at_1000 value: 70.341 - type: map_at_20 value: 70.16799999999999 - type: map_at_3 value: 68.104 - type: map_at_5 value: 69.164 - type: 
mrr_at_1 value: 61.2739571589628 - type: mrr_at_10 value: 69.92589162684993 - type: mrr_at_100 value: 70.35245455509234 - type: mrr_at_1000 value: 70.37438351396742 - type: mrr_at_20 value: 70.20247469915404 - type: mrr_at_3 value: 68.14167606163099 - type: mrr_at_5 value: 69.20142803457354 - type: nauc_map_at_1000_diff1 value: 74.70416754842327 - type: nauc_map_at_1000_max value: 65.86915994583384 - type: nauc_map_at_1000_std value: -19.04437483534443 - type: nauc_map_at_100_diff1 value: 74.70011798058674 - type: nauc_map_at_100_max value: 65.88507779167188 - type: nauc_map_at_100_std value: -19.018670970643786 - type: nauc_map_at_10_diff1 value: 74.6362126804427 - type: nauc_map_at_10_max value: 66.05733054427198 - type: nauc_map_at_10_std value: -19.034317737897354 - type: nauc_map_at_1_diff1 value: 77.24970536833601 - type: nauc_map_at_1_max value: 62.07820573048406 - type: nauc_map_at_1_std value: -20.917086586335078 - type: nauc_map_at_20_diff1 value: 74.64113920401083 - type: nauc_map_at_20_max value: 65.89991740166793 - type: nauc_map_at_20_std value: -19.09987515041243 - type: nauc_map_at_3_diff1 value: 74.6518162332119 - type: nauc_map_at_3_max value: 66.10312348194024 - type: nauc_map_at_3_std value: -18.95881457716116 - type: nauc_map_at_5_diff1 value: 74.55141020670321 - type: nauc_map_at_5_max value: 65.94345752979342 - type: nauc_map_at_5_std value: -19.453976877992304 - type: nauc_mrr_at_1000_diff1 value: 74.64458488344088 - type: nauc_mrr_at_1000_max value: 65.84575328456057 - type: nauc_mrr_at_1000_std value: -18.901614615119904 - type: nauc_mrr_at_100_diff1 value: 74.64058497924627 - type: nauc_mrr_at_100_max value: 65.86170461767928 - type: nauc_mrr_at_100_std value: -18.87601697091505 - type: nauc_mrr_at_10_diff1 value: 74.57266634464752 - type: nauc_mrr_at_10_max value: 66.03331587645152 - type: nauc_mrr_at_10_std value: -18.87888060105393 - type: nauc_mrr_at_1_diff1 value: 77.19578272647183 - type: nauc_mrr_at_1_max value: 62.05252035478773 - type: nauc_mrr_at_1_std value: -20.790530940625267 - type: nauc_mrr_at_20_diff1 value: 74.5808171250021 - type: nauc_mrr_at_20_max value: 65.87643606587798 - type: nauc_mrr_at_20_std value: -18.95476583474199 - type: nauc_mrr_at_3_diff1 value: 74.5917053289191 - type: nauc_mrr_at_3_max value: 66.08044079438714 - type: nauc_mrr_at_3_std value: -18.81168463163586 - type: nauc_mrr_at_5_diff1 value: 74.48934579694608 - type: nauc_mrr_at_5_max value: 65.91993162383771 - type: nauc_mrr_at_5_std value: -19.302710791338797 - type: nauc_ndcg_at_1000_diff1 value: 74.20191283992186 - type: nauc_ndcg_at_1000_max value: 66.60831175771229 - type: nauc_ndcg_at_1000_std value: -18.175208725175484 - type: nauc_ndcg_at_100_diff1 value: 74.07713451642955 - type: nauc_ndcg_at_100_max value: 67.02028626335476 - type: nauc_ndcg_at_100_std value: -17.36560972181693 - type: nauc_ndcg_at_10_diff1 value: 73.63235521598476 - type: nauc_ndcg_at_10_max value: 67.8118473312638 - type: nauc_ndcg_at_10_std value: -17.647560577355915 - type: nauc_ndcg_at_1_diff1 value: 77.19578272647183 - type: nauc_ndcg_at_1_max value: 62.05252035478773 - type: nauc_ndcg_at_1_std value: -20.790530940625267 - type: nauc_ndcg_at_20_diff1 value: 73.65300308228291 - type: nauc_ndcg_at_20_max value: 67.18353402731985 - type: nauc_ndcg_at_20_std value: -17.9240756389792 - type: nauc_ndcg_at_3_diff1 value: 73.73764900202292 - type: nauc_ndcg_at_3_max value: 67.60840957876889 - type: nauc_ndcg_at_3_std value: -17.962667543518933 - type: nauc_ndcg_at_5_diff1 value: 73.49040500302092 - 
type: nauc_ndcg_at_5_max value: 67.41251918514402 - type: nauc_ndcg_at_5_std value: -18.851877225955523 - type: nauc_precision_at_1000_diff1 value: -18.652906102973922 - type: nauc_precision_at_1000_max value: 2.1701672475574885 - type: nauc_precision_at_1000_std value: 61.713411950188835 - type: nauc_precision_at_100_diff1 value: 62.37565302288498 - type: nauc_precision_at_100_max value: 76.96921843049006 - type: nauc_precision_at_100_std value: 19.152009040219678 - type: nauc_precision_at_10_diff1 value: 68.14047344105212 - type: nauc_precision_at_10_max value: 77.7177273849099 - type: nauc_precision_at_10_std value: -9.124325941493698 - type: nauc_precision_at_1_diff1 value: 77.19578272647183 - type: nauc_precision_at_1_max value: 62.05252035478773 - type: nauc_precision_at_1_std value: -20.790530940625267 - type: nauc_precision_at_20_diff1 value: 65.38487456362745 - type: nauc_precision_at_20_max value: 74.61122933443669 - type: nauc_precision_at_20_std value: -8.129775929648341 - type: nauc_precision_at_3_diff1 value: 70.45937744142297 - type: nauc_precision_at_3_max value: 73.03004233073901 - type: nauc_precision_at_3_std value: -14.246554579025158 - type: nauc_precision_at_5_diff1 value: 69.02821772428955 - type: nauc_precision_at_5_max value: 73.52949774726446 - type: nauc_precision_at_5_std value: -16.355747231517757 - type: nauc_recall_at_1000_diff1 value: 35.804192824985755 - type: nauc_recall_at_1000_max value: 61.367785756485894 - type: nauc_recall_at_1000_std value: 54.01380822466869 - type: nauc_recall_at_100_diff1 value: 67.96210883597479 - type: nauc_recall_at_100_max value: 82.38124823732169 - type: nauc_recall_at_100_std value: 16.814922595309966 - type: nauc_recall_at_10_diff1 value: 68.21964459634341 - type: nauc_recall_at_10_max value: 77.68301934858845 - type: nauc_recall_at_10_std value: -9.430792913885066 - type: nauc_recall_at_1_diff1 value: 77.24970536833601 - type: nauc_recall_at_1_max value: 62.07820573048406 - type: nauc_recall_at_1_std value: -20.917086586335078 - type: nauc_recall_at_20_diff1 value: 66.60569906579487 - type: nauc_recall_at_20_max value: 75.66163186604354 - type: nauc_recall_at_20_std value: -9.09826205489828 - type: nauc_recall_at_3_diff1 value: 70.52323701841641 - type: nauc_recall_at_3_max value: 73.03478107411232 - type: nauc_recall_at_3_std value: -14.432325989967962 - type: nauc_recall_at_5_diff1 value: 69.08521261524373 - type: nauc_recall_at_5_max value: 73.51150270382094 - type: nauc_recall_at_5_std value: -16.569387503524368 - type: ndcg_at_1 value: 61.273999999999994 - type: ndcg_at_10 value: 73.90299999999999 - type: ndcg_at_100 value: 75.983 - type: ndcg_at_1000 value: 76.488 - type: ndcg_at_20 value: 74.921 - type: ndcg_at_3 value: 70.277 - type: ndcg_at_5 value: 72.172 - type: precision_at_1 value: 61.273999999999994 - type: precision_at_10 value: 8.641 - type: precision_at_100 value: 0.962 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 4.524 - type: precision_at_3 value: 25.517 - type: precision_at_5 value: 16.223000000000003 - type: recall_at_1 value: 61.236000000000004 - type: recall_at_10 value: 86.37700000000001 - type: recall_at_100 value: 96.054 - type: recall_at_1000 value: 99.887 - type: recall_at_20 value: 90.398 - type: recall_at_3 value: 76.51299999999999 - type: recall_at_5 value: 81.07900000000001 task: type: Retrieval - dataset: config: spa-spa name: MTEB MLQARetrieval (spa-spa) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score 
value: 68.632 - type: map_at_1 value: 57.046 - type: map_at_10 value: 64.869 - type: map_at_100 value: 65.384 - type: map_at_1000 value: 65.413 - type: map_at_20 value: 65.185 - type: map_at_3 value: 63.178 - type: map_at_5 value: 64.12 - type: mrr_at_1 value: 57.05579889544848 - type: mrr_at_10 value: 64.8806425382317 - type: mrr_at_100 value: 65.39469233244084 - type: mrr_at_1000 value: 65.42342199403159 - type: mrr_at_20 value: 65.19634815919534 - type: mrr_at_3 value: 63.18796419729591 - type: mrr_at_5 value: 64.13159398209874 - type: nauc_map_at_1000_diff1 value: 73.23803038674018 - type: nauc_map_at_1000_max value: 67.44156201421714 - type: nauc_map_at_1000_std value: -8.60143026450049 - type: nauc_map_at_100_diff1 value: 73.22575613034235 - type: nauc_map_at_100_max value: 67.44735143420195 - type: nauc_map_at_100_std value: -8.576905069492895 - type: nauc_map_at_10_diff1 value: 73.11950129610865 - type: nauc_map_at_10_max value: 67.45107232305055 - type: nauc_map_at_10_std value: -8.799837857015392 - type: nauc_map_at_1_diff1 value: 76.18354072047988 - type: nauc_map_at_1_max value: 65.03342186728786 - type: nauc_map_at_1_std value: -10.867650288695796 - type: nauc_map_at_20_diff1 value: 73.21570748770948 - type: nauc_map_at_20_max value: 67.50340321088724 - type: nauc_map_at_20_std value: -8.594057184944676 - type: nauc_map_at_3_diff1 value: 73.17239276163892 - type: nauc_map_at_3_max value: 67.06319504819103 - type: nauc_map_at_3_std value: -9.883216310270528 - type: nauc_map_at_5_diff1 value: 73.11913507367727 - type: nauc_map_at_5_max value: 67.27497019567078 - type: nauc_map_at_5_std value: -9.497714822103118 - type: nauc_mrr_at_1000_diff1 value: 73.22971233311306 - type: nauc_mrr_at_1000_max value: 67.42977229057223 - type: nauc_mrr_at_1000_std value: -8.550068702273297 - type: nauc_mrr_at_100_diff1 value: 73.21744467317815 - type: nauc_mrr_at_100_max value: 67.43557491068093 - type: nauc_mrr_at_100_std value: -8.52559275190607 - type: nauc_mrr_at_10_diff1 value: 73.11075619726137 - type: nauc_mrr_at_10_max value: 67.43889760205286 - type: nauc_mrr_at_10_std value: -8.74617232559183 - type: nauc_mrr_at_1_diff1 value: 76.17529975949547 - type: nauc_mrr_at_1_max value: 65.02401127001608 - type: nauc_mrr_at_1_std value: -10.817814457633952 - type: nauc_mrr_at_20_diff1 value: 73.20689275225138 - type: nauc_mrr_at_20_max value: 67.49111752272192 - type: nauc_mrr_at_20_std value: -8.539827528410353 - type: nauc_mrr_at_3_diff1 value: 73.16291729623958 - type: nauc_mrr_at_3_max value: 67.05300993427998 - type: nauc_mrr_at_3_std value: -9.827915885680811 - type: nauc_mrr_at_5_diff1 value: 73.11055686484109 - type: nauc_mrr_at_5_max value: 67.26299851089122 - type: nauc_mrr_at_5_std value: -9.445190276650903 - type: nauc_ndcg_at_1000_diff1 value: 72.58833638407177 - type: nauc_ndcg_at_1000_max value: 68.10447506371374 - type: nauc_ndcg_at_1000_std value: -6.910306241546282 - type: nauc_ndcg_at_100_diff1 value: 72.24524849631476 - type: nauc_ndcg_at_100_max value: 68.30659210081238 - type: nauc_ndcg_at_100_std value: -6.04305364268931 - type: nauc_ndcg_at_10_diff1 value: 71.87363502582961 - type: nauc_ndcg_at_10_max value: 68.5010009653693 - type: nauc_ndcg_at_10_std value: -7.021281296450588 - type: nauc_ndcg_at_1_diff1 value: 76.17529975949547 - type: nauc_ndcg_at_1_max value: 65.02401127001608 - type: nauc_ndcg_at_1_std value: -10.817814457633952 - type: nauc_ndcg_at_20_diff1 value: 72.21241010439327 - type: nauc_ndcg_at_20_max value: 68.71743274030551 - type: nauc_ndcg_at_20_std 
value: -6.186629577195946 - type: nauc_ndcg_at_3_diff1 value: 72.08204674794459 - type: nauc_ndcg_at_3_max value: 67.5958365046156 - type: nauc_ndcg_at_3_std value: -9.576418336610345 - type: nauc_ndcg_at_5_diff1 value: 71.93179095844508 - type: nauc_ndcg_at_5_max value: 68.01914639754217 - type: nauc_ndcg_at_5_std value: -8.833768332910777 - type: nauc_precision_at_1000_diff1 value: 63.0051360227489 - type: nauc_precision_at_1000_max value: 79.93532442313229 - type: nauc_precision_at_1000_std value: 52.869517607133254 - type: nauc_precision_at_100_diff1 value: 62.43301501857154 - type: nauc_precision_at_100_max value: 75.57280416668183 - type: nauc_precision_at_100_std value: 26.758300486132747 - type: nauc_precision_at_10_diff1 value: 66.29806375971134 - type: nauc_precision_at_10_max value: 73.40301413754797 - type: nauc_precision_at_10_std value: 1.9858547295235462 - type: nauc_precision_at_1_diff1 value: 76.17529975949547 - type: nauc_precision_at_1_max value: 65.02401127001608 - type: nauc_precision_at_1_std value: -10.817814457633952 - type: nauc_precision_at_20_diff1 value: 67.05111836051105 - type: nauc_precision_at_20_max value: 76.09783190824155 - type: nauc_precision_at_20_std value: 9.906010659515564 - type: nauc_precision_at_3_diff1 value: 68.44186679250453 - type: nauc_precision_at_3_max value: 69.30301351119388 - type: nauc_precision_at_3_std value: -8.566522518882348 - type: nauc_precision_at_5_diff1 value: 67.51737199297388 - type: nauc_precision_at_5_max value: 70.75887601590472 - type: nauc_precision_at_5_std value: -6.278983102710238 - type: nauc_recall_at_1000_diff1 value: 65.12360093170948 - type: nauc_recall_at_1000_max value: 82.60209843191132 - type: nauc_recall_at_1000_std value: 51.740179583368636 - type: nauc_recall_at_100_diff1 value: 62.82007697326819 - type: nauc_recall_at_100_max value: 76.04844844677562 - type: nauc_recall_at_100_std value: 26.4678415019248 - type: nauc_recall_at_10_diff1 value: 66.28557566848767 - type: nauc_recall_at_10_max value: 73.40302709828738 - type: nauc_recall_at_10_std value: 1.9224272854613582 - type: nauc_recall_at_1_diff1 value: 76.18354072047988 - type: nauc_recall_at_1_max value: 65.03342186728786 - type: nauc_recall_at_1_std value: -10.867650288695796 - type: nauc_recall_at_20_diff1 value: 67.03430451094992 - type: nauc_recall_at_20_max value: 76.09474005171319 - type: nauc_recall_at_20_std value: 9.815888637851074 - type: nauc_recall_at_3_diff1 value: 68.44411411344718 - type: nauc_recall_at_3_max value: 69.30502737137265 - type: nauc_recall_at_3_std value: -8.629526329714132 - type: nauc_recall_at_5_diff1 value: 67.51469265953514 - type: nauc_recall_at_5_max value: 70.76969893818111 - type: nauc_recall_at_5_std value: -6.325600167105444 - type: ndcg_at_1 value: 57.056 - type: ndcg_at_10 value: 68.632 - type: ndcg_at_100 value: 71.202 - type: ndcg_at_1000 value: 71.97099999999999 - type: ndcg_at_20 value: 69.785 - type: ndcg_at_3 value: 65.131 - type: ndcg_at_5 value: 66.834 - type: precision_at_1 value: 57.056 - type: precision_at_10 value: 8.044 - type: precision_at_100 value: 0.9259999999999999 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 4.251 - type: precision_at_3 value: 23.589 - type: precision_at_5 value: 14.984 - type: recall_at_1 value: 57.046 - type: recall_at_10 value: 80.423 - type: recall_at_100 value: 92.582 - type: recall_at_1000 value: 98.638 - type: recall_at_20 value: 84.993 - type: recall_at_3 value: 70.758 - type: recall_at_5 value: 74.9 task: type: Retrieval - dataset: 
config: spa-eng name: MTEB MLQARetrieval (spa-eng) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 68.765 - type: map_at_1 value: 56.538999999999994 - type: map_at_10 value: 64.816 - type: map_at_100 value: 65.325 - type: map_at_1000 value: 65.352 - type: map_at_20 value: 65.113 - type: map_at_3 value: 62.934999999999995 - type: map_at_5 value: 64.063 - type: mrr_at_1 value: 56.539120502569965 - type: mrr_at_10 value: 64.81561556661505 - type: mrr_at_100 value: 65.32464238613954 - type: mrr_at_1000 value: 65.35206516602133 - type: mrr_at_20 value: 65.11270445292227 - type: mrr_at_3 value: 62.935465448315384 - type: mrr_at_5 value: 64.06339234723022 - type: nauc_map_at_1000_diff1 value: 73.20701050428072 - type: nauc_map_at_1000_max value: 67.32797480614404 - type: nauc_map_at_1000_std value: -6.211540626528362 - type: nauc_map_at_100_diff1 value: 73.19497683923063 - type: nauc_map_at_100_max value: 67.33392646467817 - type: nauc_map_at_100_std value: -6.196671563900051 - type: nauc_map_at_10_diff1 value: 73.16010547612956 - type: nauc_map_at_10_max value: 67.37793741307372 - type: nauc_map_at_10_std value: -6.3443240322521675 - type: nauc_map_at_1_diff1 value: 76.63696578575964 - type: nauc_map_at_1_max value: 65.08189618178105 - type: nauc_map_at_1_std value: -8.594195451782733 - type: nauc_map_at_20_diff1 value: 73.15233479381568 - type: nauc_map_at_20_max value: 67.3679607256072 - type: nauc_map_at_20_std value: -6.175928265286352 - type: nauc_map_at_3_diff1 value: 73.14853380980746 - type: nauc_map_at_3_max value: 67.10354198073468 - type: nauc_map_at_3_std value: -7.409679815529866 - type: nauc_map_at_5_diff1 value: 73.13425961877715 - type: nauc_map_at_5_max value: 67.22452899371224 - type: nauc_map_at_5_std value: -6.895257774506354 - type: nauc_mrr_at_1000_diff1 value: 73.20701050428072 - type: nauc_mrr_at_1000_max value: 67.32797480614404 - type: nauc_mrr_at_1000_std value: -6.211540626528362 - type: nauc_mrr_at_100_diff1 value: 73.19497683923063 - type: nauc_mrr_at_100_max value: 67.33392646467817 - type: nauc_mrr_at_100_std value: -6.196671563900051 - type: nauc_mrr_at_10_diff1 value: 73.16010547612956 - type: nauc_mrr_at_10_max value: 67.37793741307372 - type: nauc_mrr_at_10_std value: -6.3443240322521675 - type: nauc_mrr_at_1_diff1 value: 76.63696578575964 - type: nauc_mrr_at_1_max value: 65.08189618178105 - type: nauc_mrr_at_1_std value: -8.594195451782733 - type: nauc_mrr_at_20_diff1 value: 73.15233479381568 - type: nauc_mrr_at_20_max value: 67.3679607256072 - type: nauc_mrr_at_20_std value: -6.175928265286352 - type: nauc_mrr_at_3_diff1 value: 73.14853380980746 - type: nauc_mrr_at_3_max value: 67.10354198073468 - type: nauc_mrr_at_3_std value: -7.409679815529866 - type: nauc_mrr_at_5_diff1 value: 73.13425961877715 - type: nauc_mrr_at_5_max value: 67.22452899371224 - type: nauc_mrr_at_5_std value: -6.895257774506354 - type: nauc_ndcg_at_1000_diff1 value: 72.44364625096874 - type: nauc_ndcg_at_1000_max value: 67.93635761141552 - type: nauc_ndcg_at_1000_std value: -4.616429464350954 - type: nauc_ndcg_at_100_diff1 value: 72.11352383758482 - type: nauc_ndcg_at_100_max value: 68.1627312575955 - type: nauc_ndcg_at_100_std value: -3.894213672131282 - type: nauc_ndcg_at_10_diff1 value: 71.8526850770812 - type: nauc_ndcg_at_10_max value: 68.41366561888562 - type: nauc_ndcg_at_10_std value: -4.472146861145989 - type: nauc_ndcg_at_1_diff1 value: 76.63696578575964 - type: nauc_ndcg_at_1_max value: 65.08189618178105 - 
type: nauc_ndcg_at_1_std value: -8.594195451782733 - type: nauc_ndcg_at_20_diff1 value: 71.76464418138866 - type: nauc_ndcg_at_20_max value: 68.41174963313698 - type: nauc_ndcg_at_20_std value: -3.7449762037540157 - type: nauc_ndcg_at_3_diff1 value: 71.93808990683131 - type: nauc_ndcg_at_3_max value: 67.7010029507334 - type: nauc_ndcg_at_3_std value: -6.971858419379321 - type: nauc_ndcg_at_5_diff1 value: 71.8505224811326 - type: nauc_ndcg_at_5_max value: 67.97139549500251 - type: nauc_ndcg_at_5_std value: -5.958491308070017 - type: nauc_precision_at_1000_diff1 value: 62.20956180320043 - type: nauc_precision_at_1000_max value: 82.53412670611299 - type: nauc_precision_at_1000_std value: 55.57278124999575 - type: nauc_precision_at_100_diff1 value: 62.03792857023201 - type: nauc_precision_at_100_max value: 76.77130713424538 - type: nauc_precision_at_100_std value: 26.674102719959564 - type: nauc_precision_at_10_diff1 value: 65.89798055049931 - type: nauc_precision_at_10_max value: 73.41908620140674 - type: nauc_precision_at_10_std value: 5.21818573283179 - type: nauc_precision_at_1_diff1 value: 76.63696578575964 - type: nauc_precision_at_1_max value: 65.08189618178105 - type: nauc_precision_at_1_std value: -8.594195451782733 - type: nauc_precision_at_20_diff1 value: 63.734308542647355 - type: nauc_precision_at_20_max value: 74.69578825096144 - type: nauc_precision_at_20_std value: 12.627842502659162 - type: nauc_precision_at_3_diff1 value: 67.91189666671904 - type: nauc_precision_at_3_max value: 69.64986036783209 - type: nauc_precision_at_3_std value: -5.505669087429055 - type: nauc_precision_at_5_diff1 value: 67.01880006360248 - type: nauc_precision_at_5_max value: 70.78916423358686 - type: nauc_precision_at_5_std value: -2.2273742736401045 - type: nauc_recall_at_1000_diff1 value: 62.20956180319936 - type: nauc_recall_at_1000_max value: 82.53412670611287 - type: nauc_recall_at_1000_std value: 55.57278124999549 - type: nauc_recall_at_100_diff1 value: 62.03792857023208 - type: nauc_recall_at_100_max value: 76.77130713424577 - type: nauc_recall_at_100_std value: 26.67410271995973 - type: nauc_recall_at_10_diff1 value: 65.8979805504994 - type: nauc_recall_at_10_max value: 73.41908620140678 - type: nauc_recall_at_10_std value: 5.2181857328318655 - type: nauc_recall_at_1_diff1 value: 76.63696578575964 - type: nauc_recall_at_1_max value: 65.08189618178105 - type: nauc_recall_at_1_std value: -8.594195451782733 - type: nauc_recall_at_20_diff1 value: 63.734308542647334 - type: nauc_recall_at_20_max value: 74.69578825096123 - type: nauc_recall_at_20_std value: 12.627842502658982 - type: nauc_recall_at_3_diff1 value: 67.91189666671897 - type: nauc_recall_at_3_max value: 69.64986036783203 - type: nauc_recall_at_3_std value: -5.505669087428989 - type: nauc_recall_at_5_diff1 value: 67.01880006360243 - type: nauc_recall_at_5_max value: 70.78916423358686 - type: nauc_recall_at_5_std value: -2.227374273640135 - type: ndcg_at_1 value: 56.538999999999994 - type: ndcg_at_10 value: 68.765 - type: ndcg_at_100 value: 71.314 - type: ndcg_at_1000 value: 72.038 - type: ndcg_at_20 value: 69.828 - type: ndcg_at_3 value: 64.937 - type: ndcg_at_5 value: 66.956 - type: precision_at_1 value: 56.538999999999994 - type: precision_at_10 value: 8.113 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 4.265 - type: precision_at_3 value: 23.567 - type: precision_at_5 value: 15.115 - type: recall_at_1 value: 56.538999999999994 - type: recall_at_10 value: 81.135 - type: 
recall_at_100 value: 93.223 - type: recall_at_1000 value: 98.896 - type: recall_at_20 value: 85.304 - type: recall_at_3 value: 70.702 - type: recall_at_5 value: 75.576 task: type: Retrieval - dataset: config: eng-deu name: MTEB MLQARetrieval (eng-deu) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 69.298 - type: map_at_1 value: 58.553 - type: map_at_10 value: 65.769 - type: map_at_100 value: 66.298 - type: map_at_1000 value: 66.328 - type: map_at_20 value: 66.101 - type: map_at_3 value: 64.048 - type: map_at_5 value: 65.09 - type: mrr_at_1 value: 58.564148016840235 - type: mrr_at_10 value: 65.7685997066675 - type: mrr_at_100 value: 66.29874034432214 - type: mrr_at_1000 value: 66.32844979939088 - type: mrr_at_20 value: 66.10120513957821 - type: mrr_at_3 value: 64.04830489696437 - type: mrr_at_5 value: 65.08974074894746 - type: nauc_map_at_1000_diff1 value: 76.8409650183994 - type: nauc_map_at_1000_max value: 71.86367015521367 - type: nauc_map_at_1000_std value: -14.464881539957256 - type: nauc_map_at_100_diff1 value: 76.82536521842064 - type: nauc_map_at_100_max value: 71.86811127965429 - type: nauc_map_at_100_std value: -14.441105539722244 - type: nauc_map_at_10_diff1 value: 76.75522453447859 - type: nauc_map_at_10_max value: 71.87677500176706 - type: nauc_map_at_10_std value: -14.741331625103559 - type: nauc_map_at_1_diff1 value: 79.64060747740989 - type: nauc_map_at_1_max value: 69.84278563569617 - type: nauc_map_at_1_std value: -15.936904929655832 - type: nauc_map_at_20_diff1 value: 76.78894776059715 - type: nauc_map_at_20_max value: 71.89637938044827 - type: nauc_map_at_20_std value: -14.500564106990769 - type: nauc_map_at_3_diff1 value: 77.20562577450342 - type: nauc_map_at_3_max value: 71.80578229361525 - type: nauc_map_at_3_std value: -15.344134588512201 - type: nauc_map_at_5_diff1 value: 77.00480147367867 - type: nauc_map_at_5_max value: 71.98335924076163 - type: nauc_map_at_5_std value: -15.16537653041026 - type: nauc_mrr_at_1000_diff1 value: 76.84165367691193 - type: nauc_mrr_at_1000_max value: 71.8642679499795 - type: nauc_mrr_at_1000_std value: -14.461717954593158 - type: nauc_mrr_at_100_diff1 value: 76.8263363557998 - type: nauc_mrr_at_100_max value: 71.86874522368626 - type: nauc_mrr_at_100_std value: -14.437105168707426 - type: nauc_mrr_at_10_diff1 value: 76.75522453447859 - type: nauc_mrr_at_10_max value: 71.87677500176706 - type: nauc_mrr_at_10_std value: -14.741331625103559 - type: nauc_mrr_at_1_diff1 value: 79.65642669321981 - type: nauc_mrr_at_1_max value: 69.89135358784799 - type: nauc_mrr_at_1_std value: -15.919357002229589 - type: nauc_mrr_at_20_diff1 value: 76.78883171270601 - type: nauc_mrr_at_20_max value: 71.89806887245291 - type: nauc_mrr_at_20_std value: -14.497139746907905 - type: nauc_mrr_at_3_diff1 value: 77.20562577450342 - type: nauc_mrr_at_3_max value: 71.80578229361525 - type: nauc_mrr_at_3_std value: -15.344134588512201 - type: nauc_mrr_at_5_diff1 value: 77.00480147367867 - type: nauc_mrr_at_5_max value: 71.98335924076163 - type: nauc_mrr_at_5_std value: -15.16537653041026 - type: nauc_ndcg_at_1000_diff1 value: 76.07802417817047 - type: nauc_ndcg_at_1000_max value: 72.31792804426776 - type: nauc_ndcg_at_1000_std value: -13.049160715132244 - type: nauc_ndcg_at_100_diff1 value: 75.63343849116544 - type: nauc_ndcg_at_100_max value: 72.48362076101817 - type: nauc_ndcg_at_100_std value: -12.089600993516777 - type: nauc_ndcg_at_10_diff1 value: 75.23387929929208 - type: nauc_ndcg_at_10_max 
value: 72.51436288271807 - type: nauc_ndcg_at_10_std value: -13.624132103038104 - type: nauc_ndcg_at_1_diff1 value: 79.65642669321981 - type: nauc_ndcg_at_1_max value: 69.89135358784799 - type: nauc_ndcg_at_1_std value: -15.919357002229589 - type: nauc_ndcg_at_20_diff1 value: 75.32926047656296 - type: nauc_ndcg_at_20_max value: 72.61254165918145 - type: nauc_ndcg_at_20_std value: -12.683157599238701 - type: nauc_ndcg_at_3_diff1 value: 76.3089337665469 - type: nauc_ndcg_at_3_max value: 72.40014674426054 - type: nauc_ndcg_at_3_std value: -15.08624226353458 - type: nauc_ndcg_at_5_diff1 value: 75.88857331641834 - type: nauc_ndcg_at_5_max value: 72.7719386827224 - type: nauc_ndcg_at_5_std value: -14.70546521089236 - type: nauc_precision_at_1000_diff1 value: 59.66563879069911 - type: nauc_precision_at_1000_max value: 74.57123562956772 - type: nauc_precision_at_1000_std value: 58.61396866718965 - type: nauc_precision_at_100_diff1 value: 62.8695896550042 - type: nauc_precision_at_100_max value: 77.81408796785 - type: nauc_precision_at_100_std value: 23.819735672317826 - type: nauc_precision_at_10_diff1 value: 68.08051625224569 - type: nauc_precision_at_10_max value: 75.14432336036869 - type: nauc_precision_at_10_std value: -7.97602345252735 - type: nauc_precision_at_1_diff1 value: 79.65642669321981 - type: nauc_precision_at_1_max value: 69.89135358784799 - type: nauc_precision_at_1_std value: -15.919357002229589 - type: nauc_precision_at_20_diff1 value: 66.7168005185165 - type: nauc_precision_at_20_max value: 76.58522761697147 - type: nauc_precision_at_20_std value: -0.17923428317323292 - type: nauc_precision_at_3_diff1 value: 73.23394851561207 - type: nauc_precision_at_3_max value: 74.32517846819215 - type: nauc_precision_at_3_std value: -14.142301336188348 - type: nauc_precision_at_5_diff1 value: 71.5666882547012 - type: nauc_precision_at_5_max value: 75.71098205440033 - type: nauc_precision_at_5_std value: -12.808362513638052 - type: nauc_recall_at_1000_diff1 value: 71.73736112325805 - type: nauc_recall_at_1000_max value: 86.70743436225898 - type: nauc_recall_at_1000_std value: 54.45802578371167 - type: nauc_recall_at_100_diff1 value: 64.07053861428128 - type: nauc_recall_at_100_max value: 78.8348308099261 - type: nauc_recall_at_100_std value: 22.72263677785103 - type: nauc_recall_at_10_diff1 value: 68.20272901407903 - type: nauc_recall_at_10_max value: 75.16315335381938 - type: nauc_recall_at_10_std value: -8.060716748913386 - type: nauc_recall_at_1_diff1 value: 79.64060747740989 - type: nauc_recall_at_1_max value: 69.84278563569617 - type: nauc_recall_at_1_std value: -15.936904929655832 - type: nauc_recall_at_20_diff1 value: 66.88206981973654 - type: nauc_recall_at_20_max value: 76.54824917595687 - type: nauc_recall_at_20_std value: -0.40294589316962287 - type: nauc_recall_at_3_diff1 value: 73.33076087258938 - type: nauc_recall_at_3_max value: 74.33763112508771 - type: nauc_recall_at_3_std value: -14.213355414905399 - type: nauc_recall_at_5_diff1 value: 71.67487623469464 - type: nauc_recall_at_5_max value: 75.72770292516316 - type: nauc_recall_at_5_std value: -12.887572274644818 - type: ndcg_at_1 value: 58.56400000000001 - type: ndcg_at_10 value: 69.298 - type: ndcg_at_100 value: 71.95899999999999 - type: ndcg_at_1000 value: 72.735 - type: ndcg_at_20 value: 70.50699999999999 - type: ndcg_at_3 value: 65.81700000000001 - type: ndcg_at_5 value: 67.681 - type: precision_at_1 value: 58.56400000000001 - type: precision_at_10 value: 8.039 - type: precision_at_100 value: 0.931 - type: 
precision_at_1000 value: 0.099 - type: precision_at_20 value: 4.259 - type: precision_at_3 value: 23.65 - type: precision_at_5 value: 15.09 - type: recall_at_1 value: 58.553 - type: recall_at_10 value: 80.368 - type: recall_at_100 value: 93.013 - type: recall_at_1000 value: 99.092 - type: recall_at_20 value: 85.143 - type: recall_at_3 value: 70.928 - type: recall_at_5 value: 75.42699999999999 task: type: Retrieval - dataset: config: eng-spa name: MTEB MLQARetrieval (eng-spa) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 66.374 - type: map_at_1 value: 55.494 - type: map_at_10 value: 62.763999999999996 - type: map_at_100 value: 63.33 - type: map_at_1000 value: 63.36000000000001 - type: map_at_20 value: 63.104000000000006 - type: map_at_3 value: 61.065000000000005 - type: map_at_5 value: 62.053000000000004 - type: mrr_at_1 value: 55.49419158255571 - type: mrr_at_10 value: 62.765195140457095 - type: mrr_at_100 value: 63.33083349354529 - type: mrr_at_1000 value: 63.3611897014839 - type: mrr_at_20 value: 63.10543590095977 - type: mrr_at_3 value: 61.06455913159412 - type: mrr_at_5 value: 62.052942296705474 - type: nauc_map_at_1000_diff1 value: 75.04200018088618 - type: nauc_map_at_1000_max value: 70.49937782771909 - type: nauc_map_at_1000_std value: -5.257206317083184 - type: nauc_map_at_100_diff1 value: 75.02786834256312 - type: nauc_map_at_100_max value: 70.5016476500189 - type: nauc_map_at_100_std value: -5.228770832077681 - type: nauc_map_at_10_diff1 value: 74.9626552701647 - type: nauc_map_at_10_max value: 70.56253732243214 - type: nauc_map_at_10_std value: -5.359037281768563 - type: nauc_map_at_1_diff1 value: 78.46858307815857 - type: nauc_map_at_1_max value: 69.03908373759435 - type: nauc_map_at_1_std value: -7.479412070736642 - type: nauc_map_at_20_diff1 value: 74.98121458084796 - type: nauc_map_at_20_max value: 70.51885366822565 - type: nauc_map_at_20_std value: -5.286051287133815 - type: nauc_map_at_3_diff1 value: 75.36078454383373 - type: nauc_map_at_3_max value: 70.34997144546014 - type: nauc_map_at_3_std value: -6.663517224039184 - type: nauc_map_at_5_diff1 value: 75.0274512828238 - type: nauc_map_at_5_max value: 70.45292551591874 - type: nauc_map_at_5_std value: -6.029224488640147 - type: nauc_mrr_at_1000_diff1 value: 75.04018768469983 - type: nauc_mrr_at_1000_max value: 70.49855509132635 - type: nauc_mrr_at_1000_std value: -5.258929961409948 - type: nauc_mrr_at_100_diff1 value: 75.02605732810112 - type: nauc_mrr_at_100_max value: 70.50082584929103 - type: nauc_mrr_at_100_std value: -5.2304917988542154 - type: nauc_mrr_at_10_diff1 value: 74.96079080525713 - type: nauc_mrr_at_10_max value: 70.56167294920391 - type: nauc_mrr_at_10_std value: -5.360650630655072 - type: nauc_mrr_at_1_diff1 value: 78.46858307815857 - type: nauc_mrr_at_1_max value: 69.03908373759435 - type: nauc_mrr_at_1_std value: -7.479412070736642 - type: nauc_mrr_at_20_diff1 value: 74.97939804960517 - type: nauc_mrr_at_20_max value: 70.51804078965411 - type: nauc_mrr_at_20_std value: -5.287681954889177 - type: nauc_mrr_at_3_diff1 value: 75.36078454383373 - type: nauc_mrr_at_3_max value: 70.34997144546014 - type: nauc_mrr_at_3_std value: -6.663517224039184 - type: nauc_mrr_at_5_diff1 value: 75.0274512828238 - type: nauc_mrr_at_5_max value: 70.45292551591874 - type: nauc_mrr_at_5_std value: -6.029224488640147 - type: nauc_ndcg_at_1000_diff1 value: 74.22106834748942 - type: nauc_ndcg_at_1000_max value: 70.93625922934912 - type: nauc_ndcg_at_1000_std 
value: -3.4878399005946017 - type: nauc_ndcg_at_100_diff1 value: 73.74068883646733 - type: nauc_ndcg_at_100_max value: 71.02357018347472 - type: nauc_ndcg_at_100_std value: -2.462293184201324 - type: nauc_ndcg_at_10_diff1 value: 73.40967965536565 - type: nauc_ndcg_at_10_max value: 71.29379828672067 - type: nauc_ndcg_at_10_std value: -3.295547756383108 - type: nauc_ndcg_at_1_diff1 value: 78.46858307815857 - type: nauc_ndcg_at_1_max value: 69.03908373759435 - type: nauc_ndcg_at_1_std value: -7.479412070736642 - type: nauc_ndcg_at_20_diff1 value: 73.45790057693699 - type: nauc_ndcg_at_20_max value: 71.16598432419126 - type: nauc_ndcg_at_20_std value: -2.962877157646097 - type: nauc_ndcg_at_3_diff1 value: 74.30696173964847 - type: nauc_ndcg_at_3_max value: 70.79878978459556 - type: nauc_ndcg_at_3_std value: -6.297286578628299 - type: nauc_ndcg_at_5_diff1 value: 73.65858211199816 - type: nauc_ndcg_at_5_max value: 71.01122417463776 - type: nauc_ndcg_at_5_std value: -5.075990882646765 - type: nauc_precision_at_1000_diff1 value: 68.71065091972568 - type: nauc_precision_at_1000_max value: 81.38173585624777 - type: nauc_precision_at_1000_std value: 58.035497889797895 - type: nauc_precision_at_100_diff1 value: 61.93634256957017 - type: nauc_precision_at_100_max value: 74.84191770203093 - type: nauc_precision_at_100_std value: 31.3325983123831 - type: nauc_precision_at_10_diff1 value: 66.68247010944937 - type: nauc_precision_at_10_max value: 74.48773524654571 - type: nauc_precision_at_10_std value: 6.560421880785153 - type: nauc_precision_at_1_diff1 value: 78.46858307815857 - type: nauc_precision_at_1_max value: 69.03908373759435 - type: nauc_precision_at_1_std value: -7.479412070736642 - type: nauc_precision_at_20_diff1 value: 65.51592872758067 - type: nauc_precision_at_20_max value: 74.50684066823096 - type: nauc_precision_at_20_std value: 10.830479877698208 - type: nauc_precision_at_3_diff1 value: 70.89587884861588 - type: nauc_precision_at_3_max value: 72.25310558370424 - type: nauc_precision_at_3_std value: -5.0796100900749765 - type: nauc_precision_at_5_diff1 value: 68.71885719845497 - type: nauc_precision_at_5_max value: 73.02601751485672 - type: nauc_precision_at_5_std value: -1.4382681421626857 - type: nauc_recall_at_1000_diff1 value: 71.95510299834734 - type: nauc_recall_at_1000_max value: 84.03647166092985 - type: nauc_recall_at_1000_std value: 56.87490604776847 - type: nauc_recall_at_100_diff1 value: 62.446624924715955 - type: nauc_recall_at_100_max value: 75.25666892464507 - type: nauc_recall_at_100_std value: 31.068789794554686 - type: nauc_recall_at_10_diff1 value: 66.70676336328988 - type: nauc_recall_at_10_max value: 74.4963699656397 - type: nauc_recall_at_10_std value: 6.57498399706916 - type: nauc_recall_at_1_diff1 value: 78.46858307815857 - type: nauc_recall_at_1_max value: 69.03908373759435 - type: nauc_recall_at_1_std value: -7.479412070736642 - type: nauc_recall_at_20_diff1 value: 65.54082767974772 - type: nauc_recall_at_20_max value: 74.5111529838772 - type: nauc_recall_at_20_std value: 10.84574829707354 - type: nauc_recall_at_3_diff1 value: 70.89587884861584 - type: nauc_recall_at_3_max value: 72.25310558370421 - type: nauc_recall_at_3_std value: -5.07961009007491 - type: nauc_recall_at_5_diff1 value: 68.71885719845501 - type: nauc_recall_at_5_max value: 73.02601751485666 - type: nauc_recall_at_5_std value: -1.4382681421626995 - type: ndcg_at_1 value: 55.494 - type: ndcg_at_10 value: 66.374 - type: ndcg_at_100 value: 69.254 - type: ndcg_at_1000 value: 70.136 - type: ndcg_at_20 
value: 67.599 - type: ndcg_at_3 value: 62.863 - type: ndcg_at_5 value: 64.644 - type: precision_at_1 value: 55.494 - type: precision_at_10 value: 7.776 - type: precision_at_100 value: 0.9159999999999999 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 4.1290000000000004 - type: precision_at_3 value: 22.688 - type: precision_at_5 value: 14.477 - type: recall_at_1 value: 55.494 - type: recall_at_10 value: 77.747 - type: recall_at_100 value: 91.535 - type: recall_at_1000 value: 98.619 - type: recall_at_20 value: 82.565 - type: recall_at_3 value: 68.063 - type: recall_at_5 value: 72.386 task: type: Retrieval - dataset: config: eng-eng name: MTEB MLQARetrieval (eng-eng) revision: 397ed406c1a7902140303e7faf60fff35b58d285 split: test type: facebook/mlqa metrics: - type: main_score value: 64.723 - type: map_at_1 value: 54.308 - type: map_at_10 value: 61.26200000000001 - type: map_at_100 value: 61.82299999999999 - type: map_at_1000 value: 61.856 - type: map_at_20 value: 61.575 - type: map_at_3 value: 59.565 - type: map_at_5 value: 60.561 - type: mrr_at_1 value: 54.31704368848212 - type: mrr_at_10 value: 61.26520216098834 - type: mrr_at_100 value: 61.82588321127103 - type: mrr_at_1000 value: 61.859333030574334 - type: mrr_at_20 value: 61.57780339921337 - type: mrr_at_3 value: 59.569446842801646 - type: mrr_at_5 value: 60.56323029989004 - type: nauc_map_at_1000_diff1 value: 74.21413722468635 - type: nauc_map_at_1000_max value: 70.41741227882316 - type: nauc_map_at_1000_std value: -2.5438707209848506 - type: nauc_map_at_100_diff1 value: 74.19812315947975 - type: nauc_map_at_100_max value: 70.41589146728445 - type: nauc_map_at_100_std value: -2.5336117059429553 - type: nauc_map_at_10_diff1 value: 74.21810561152937 - type: nauc_map_at_10_max value: 70.48816115200171 - type: nauc_map_at_10_std value: -2.7443834681406734 - type: nauc_map_at_1_diff1 value: 77.69378738778958 - type: nauc_map_at_1_max value: 68.64652310701173 - type: nauc_map_at_1_std value: -4.667071946448379 - type: nauc_map_at_20_diff1 value: 74.16105697562438 - type: nauc_map_at_20_max value: 70.42491994631179 - type: nauc_map_at_20_std value: -2.6070416022440472 - type: nauc_map_at_3_diff1 value: 74.60449392878863 - type: nauc_map_at_3_max value: 70.39888609914269 - type: nauc_map_at_3_std value: -3.5401151125723986 - type: nauc_map_at_5_diff1 value: 74.2423420992663 - type: nauc_map_at_5_max value: 70.36574501826757 - type: nauc_map_at_5_std value: -3.2707393116898964 - type: nauc_mrr_at_1000_diff1 value: 74.21029843731323 - type: nauc_mrr_at_1000_max value: 70.43020492688913 - type: nauc_mrr_at_1000_std value: -2.526895582202081 - type: nauc_mrr_at_100_diff1 value: 74.19440960479243 - type: nauc_mrr_at_100_max value: 70.4288998824232 - type: nauc_mrr_at_100_std value: -2.5160929945118107 - type: nauc_mrr_at_10_diff1 value: 74.2141357266166 - type: nauc_mrr_at_10_max value: 70.5005683347807 - type: nauc_mrr_at_10_std value: -2.727154557882168 - type: nauc_mrr_at_1_diff1 value: 77.69891248239793 - type: nauc_mrr_at_1_max value: 68.68255231164922 - type: nauc_mrr_at_1_std value: -4.630226727154317 - type: nauc_mrr_at_20_diff1 value: 74.15705434409723 - type: nauc_mrr_at_20_max value: 70.43741835972747 - type: nauc_mrr_at_20_std value: -2.5896756472464495 - type: nauc_mrr_at_3_diff1 value: 74.5981844349412 - type: nauc_mrr_at_3_max value: 70.41834937080564 - type: nauc_mrr_at_3_std value: -3.5161656408031163 - type: nauc_mrr_at_5_diff1 value: 74.23847535424844 - type: nauc_mrr_at_5_max value: 70.37763810013656 - type: 
nauc_mrr_at_5_std value: -3.2560955164581733 - type: nauc_ndcg_at_1000_diff1 value: 73.20994496725493 - type: nauc_ndcg_at_1000_max value: 70.8903016277125 - type: nauc_ndcg_at_1000_std value: -0.625772298462309 - type: nauc_ndcg_at_100_diff1 value: 72.6847141682645 - type: nauc_ndcg_at_100_max value: 70.86564422034162 - type: nauc_ndcg_at_100_std value: -0.07195786766326141 - type: nauc_ndcg_at_10_diff1 value: 72.78806493754281 - type: nauc_ndcg_at_10_max value: 71.21957067926769 - type: nauc_ndcg_at_10_std value: -1.2760418313382227 - type: nauc_ndcg_at_1_diff1 value: 77.69891248239793 - type: nauc_ndcg_at_1_max value: 68.68255231164922 - type: nauc_ndcg_at_1_std value: -4.630226727154317 - type: nauc_ndcg_at_20_diff1 value: 72.52082440882546 - type: nauc_ndcg_at_20_max value: 70.98185004796734 - type: nauc_ndcg_at_20_std value: -0.6908280874815464 - type: nauc_ndcg_at_3_diff1 value: 73.59870660843939 - type: nauc_ndcg_at_3_max value: 70.94391957288654 - type: nauc_ndcg_at_3_std value: -3.147723179140428 - type: nauc_ndcg_at_5_diff1 value: 72.90122868193457 - type: nauc_ndcg_at_5_max value: 70.89376368965165 - type: nauc_ndcg_at_5_std value: -2.6451807385626744 - type: nauc_precision_at_1000_diff1 value: 58.14737201864067 - type: nauc_precision_at_1000_max value: 78.79011251144826 - type: nauc_precision_at_1000_std value: 59.98985420476577 - type: nauc_precision_at_100_diff1 value: 59.21069121644552 - type: nauc_precision_at_100_max value: 73.00557835912306 - type: nauc_precision_at_100_std value: 26.85027406282173 - type: nauc_precision_at_10_diff1 value: 66.8760831023675 - type: nauc_precision_at_10_max value: 74.21167950452596 - type: nauc_precision_at_10_std value: 5.453652499335947 - type: nauc_precision_at_1_diff1 value: 77.69891248239793 - type: nauc_precision_at_1_max value: 68.68255231164922 - type: nauc_precision_at_1_std value: -4.630226727154317 - type: nauc_precision_at_20_diff1 value: 64.3118559132602 - type: nauc_precision_at_20_max value: 73.33078184673825 - type: nauc_precision_at_20_std value: 9.993299523049402 - type: nauc_precision_at_3_diff1 value: 70.38667185155593 - type: nauc_precision_at_3_max value: 72.66495006030951 - type: nauc_precision_at_3_std value: -1.8532839591326276 - type: nauc_precision_at_5_diff1 value: 68.12161337583686 - type: nauc_precision_at_5_max value: 72.65644960375046 - type: nauc_precision_at_5_std value: -0.33317164167012875 - type: nauc_recall_at_1000_diff1 value: 61.63204394739985 - type: nauc_recall_at_1000_max value: 81.77241537319897 - type: nauc_recall_at_1000_std value: 58.44841544062308 - type: nauc_recall_at_100_diff1 value: 59.72072697224705 - type: nauc_recall_at_100_max value: 73.28519507061553 - type: nauc_recall_at_100_std value: 26.27318390763456 - type: nauc_recall_at_10_diff1 value: 66.9757135465418 - type: nauc_recall_at_10_max value: 74.21919493374149 - type: nauc_recall_at_10_std value: 5.323369605377166 - type: nauc_recall_at_1_diff1 value: 77.69378738778958 - type: nauc_recall_at_1_max value: 68.64652310701173 - type: nauc_recall_at_1_std value: -4.667071946448379 - type: nauc_recall_at_20_diff1 value: 64.42290081731899 - type: nauc_recall_at_20_max value: 73.3358289439033 - type: nauc_recall_at_20_std value: 9.846598361586073 - type: nauc_recall_at_3_diff1 value: 70.41211290964785 - type: nauc_recall_at_3_max value: 72.64451776775402 - type: nauc_recall_at_3_std value: -1.916280959835826 - type: nauc_recall_at_5_diff1 value: 68.20695272727916 - type: nauc_recall_at_5_max value: 72.66404224006101 - type: 
nauc_recall_at_5_std value: -0.431125323007886 - type: ndcg_at_1 value: 54.31700000000001 - type: ndcg_at_10 value: 64.723 - type: ndcg_at_100 value: 67.648 - type: ndcg_at_1000 value: 68.619 - type: ndcg_at_20 value: 65.85499999999999 - type: ndcg_at_3 value: 61.244 - type: ndcg_at_5 value: 63.038000000000004 - type: precision_at_1 value: 54.31700000000001 - type: precision_at_10 value: 7.564 - type: precision_at_100 value: 0.898 - type: precision_at_1000 value: 0.098 - type: precision_at_20 value: 4.005 - type: precision_at_3 value: 22.034000000000002 - type: precision_at_5 value: 14.093 - type: recall_at_1 value: 54.308 - type: recall_at_10 value: 75.622 - type: recall_at_100 value: 89.744 - type: recall_at_1000 value: 97.539 - type: recall_at_20 value: 80.085 - type: recall_at_3 value: 66.09 - type: recall_at_5 value: 70.446 task: type: Retrieval - dataset: config: de name: MTEB MLSUMClusteringP2P (de) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 41.267647761702854 - type: v_measure value: 41.267647761702854 - type: v_measure_std value: 10.93390895077248 task: type: Clustering - dataset: config: fr name: MTEB MLSUMClusteringP2P (fr) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 44.68714862333979 - type: v_measure value: 44.68714862333979 - type: v_measure_std value: 1.811036989797814 task: type: Clustering - dataset: config: ru name: MTEB MLSUMClusteringP2P (ru) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 41.92518785753813 - type: v_measure value: 41.92518785753813 - type: v_measure_std value: 5.9356661900220775 task: type: Clustering - dataset: config: es name: MTEB MLSUMClusteringP2P (es) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 48.69875719812033 - type: v_measure value: 48.69875719812033 - type: v_measure_std value: 1.204253881950113 task: type: Clustering - dataset: config: de name: MTEB MLSUMClusteringS2S (de) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 40.07927325071353 - type: v_measure value: 40.07927325071353 - type: v_measure_std value: 9.296680835266145 task: type: Clustering - dataset: config: fr name: MTEB MLSUMClusteringS2S (fr) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 44.88484854069901 - type: v_measure value: 44.88484854069901 - type: v_measure_std value: 2.3704247819781843 task: type: Clustering - dataset: config: ru name: MTEB MLSUMClusteringS2S (ru) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 43.97657450929179 - type: v_measure value: 43.97657450929179 - type: v_measure_std value: 6.087547931333613 task: type: Clustering - dataset: config: es name: MTEB MLSUMClusteringS2S (es) revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7 split: test type: reciTAL/mlsum metrics: - type: main_score value: 48.41108671948728 - type: v_measure value: 48.41108671948728 - type: v_measure_std value: 1.3848320630151243 task: type: Clustering - dataset: config: default name: MTEB MMarcoReranking (default) revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6 split: dev type: C-MTEB/Mmarco-reranking metrics: - type: map value: 21.050447576170395 - type: mrr value: 
20.201984126984126 - type: main_score value: 21.050447576170395 task: type: Reranking - dataset: config: default name: MTEB MMarcoRetrieval (default) revision: 539bbde593d947e2a124ba72651aafc09eb33fc2 split: dev type: C-MTEB/MMarcoRetrieval metrics: - type: main_score value: 79.687 - type: map_at_1 value: 66.872 - type: map_at_10 value: 75.949 - type: map_at_100 value: 76.25 - type: map_at_1000 value: 76.259 - type: map_at_20 value: 76.145 - type: map_at_3 value: 74.01299999999999 - type: map_at_5 value: 75.232 - type: mrr_at_1 value: 69.18338108882521 - type: mrr_at_10 value: 76.5424227952881 - type: mrr_at_100 value: 76.8019342792628 - type: mrr_at_1000 value: 76.81002278342808 - type: mrr_at_20 value: 76.7115234815896 - type: mrr_at_3 value: 74.83046800382044 - type: mrr_at_5 value: 75.88490926456515 - type: nauc_map_at_1000_diff1 value: 78.06933310424179 - type: nauc_map_at_1000_max value: 49.392948209665896 - type: nauc_map_at_1000_std value: -15.126109322591166 - type: nauc_map_at_100_diff1 value: 78.06612779298378 - type: nauc_map_at_100_max value: 49.40761618630397 - type: nauc_map_at_100_std value: -15.099282408159349 - type: nauc_map_at_10_diff1 value: 77.94565685470538 - type: nauc_map_at_10_max value: 49.50559610363201 - type: nauc_map_at_10_std value: -15.182130695916355 - type: nauc_map_at_1_diff1 value: 79.84814509858211 - type: nauc_map_at_1_max value: 40.78978466656547 - type: nauc_map_at_1_std value: -19.96189264026715 - type: nauc_map_at_20_diff1 value: 78.03597839981245 - type: nauc_map_at_20_max value: 49.49477427223376 - type: nauc_map_at_20_std value: -15.084990000838378 - type: nauc_map_at_3_diff1 value: 78.0637014655507 - type: nauc_map_at_3_max value: 48.63214001973341 - type: nauc_map_at_3_std value: -17.093950563306596 - type: nauc_map_at_5_diff1 value: 77.94068229240348 - type: nauc_map_at_5_max value: 49.38930719689204 - type: nauc_map_at_5_std value: -15.9919454201954 - type: nauc_mrr_at_1000_diff1 value: 78.34582398092816 - type: nauc_mrr_at_1000_max value: 49.623566992784156 - type: nauc_mrr_at_1000_std value: -14.381347765493265 - type: nauc_mrr_at_100_diff1 value: 78.3429966714221 - type: nauc_mrr_at_100_max value: 49.63684922240546 - type: nauc_mrr_at_100_std value: -14.354914066301236 - type: nauc_mrr_at_10_diff1 value: 78.2208070219624 - type: nauc_mrr_at_10_max value: 49.77720536573364 - type: nauc_mrr_at_10_std value: -14.316233764741812 - type: nauc_mrr_at_1_diff1 value: 80.22305496572142 - type: nauc_mrr_at_1_max value: 44.30231210192536 - type: nauc_mrr_at_1_std value: -18.942549914934492 - type: nauc_mrr_at_20_diff1 value: 78.31006724240147 - type: nauc_mrr_at_20_max value: 49.72338465276142 - type: nauc_mrr_at_20_std value: -14.30722621948953 - type: nauc_mrr_at_3_diff1 value: 78.39832634634523 - type: nauc_mrr_at_3_max value: 49.24985961036677 - type: nauc_mrr_at_3_std value: -15.966286866763191 - type: nauc_mrr_at_5_diff1 value: 78.2406507247798 - type: nauc_mrr_at_5_max value: 49.71276359754787 - type: nauc_mrr_at_5_std value: -14.979526226149698 - type: nauc_ndcg_at_1000_diff1 value: 77.74892471071016 - type: nauc_ndcg_at_1000_max value: 51.11543344053061 - type: nauc_ndcg_at_1000_std value: -12.208878737005096 - type: nauc_ndcg_at_100_diff1 value: 77.67462502211228 - type: nauc_ndcg_at_100_max value: 51.593977338939034 - type: nauc_ndcg_at_100_std value: -11.312126179513802 - type: nauc_ndcg_at_10_diff1 value: 77.0571291760012 - type: nauc_ndcg_at_10_max value: 52.35435572808972 - type: nauc_ndcg_at_10_std value: -11.33242546164059 - type: 
nauc_ndcg_at_1_diff1 value: 80.22305496572142 - type: nauc_ndcg_at_1_max value: 44.30231210192536 - type: nauc_ndcg_at_1_std value: -18.942549914934492 - type: nauc_ndcg_at_20_diff1 value: 77.4141216117471 - type: nauc_ndcg_at_20_max value: 52.340600871365375 - type: nauc_ndcg_at_20_std value: -10.989010161550912 - type: nauc_ndcg_at_3_diff1 value: 77.43971989259062 - type: nauc_ndcg_at_3_max value: 50.59251358320663 - type: nauc_ndcg_at_3_std value: -15.59337960636058 - type: nauc_ndcg_at_5_diff1 value: 77.12174287031847 - type: nauc_ndcg_at_5_max value: 51.97108510288907 - type: nauc_ndcg_at_5_std value: -13.474902612427167 - type: nauc_precision_at_1000_diff1 value: -19.36793534929367 - type: nauc_precision_at_1000_max value: 11.803383262344036 - type: nauc_precision_at_1000_std value: 24.304436015177046 - type: nauc_precision_at_100_diff1 value: -6.273790806909921 - type: nauc_precision_at_100_max value: 23.372606271300747 - type: nauc_precision_at_100_std value: 29.085768971612342 - type: nauc_precision_at_10_diff1 value: 21.67045907336595 - type: nauc_precision_at_10_max value: 41.68948432407223 - type: nauc_precision_at_10_std value: 17.837055074458092 - type: nauc_precision_at_1_diff1 value: 80.22305496572142 - type: nauc_precision_at_1_max value: 44.30231210192536 - type: nauc_precision_at_1_std value: -18.942549914934492 - type: nauc_precision_at_20_diff1 value: 12.577671896684803 - type: nauc_precision_at_20_max value: 37.44944702246691 - type: nauc_precision_at_20_std value: 23.635897665206087 - type: nauc_precision_at_3_diff1 value: 47.165335112814056 - type: nauc_precision_at_3_max value: 47.0458691263379 - type: nauc_precision_at_3_std value: -3.3181861146890217 - type: nauc_precision_at_5_diff1 value: 35.406205343514806 - type: nauc_precision_at_5_max value: 45.56549449285401 - type: nauc_precision_at_5_std value: 5.612378074562386 - type: nauc_recall_at_1000_diff1 value: 72.32762520815842 - type: nauc_recall_at_1000_max value: 85.64979256307343 - type: nauc_recall_at_1000_std value: 73.61925297037476 - type: nauc_recall_at_100_diff1 value: 72.31946328709962 - type: nauc_recall_at_100_max value: 83.76576070068353 - type: nauc_recall_at_100_std value: 57.39376538662535 - type: nauc_recall_at_10_diff1 value: 69.51307788072499 - type: nauc_recall_at_10_max value: 69.60124733654142 - type: nauc_recall_at_10_std value: 13.483540424716892 - type: nauc_recall_at_1_diff1 value: 79.84814509858211 - type: nauc_recall_at_1_max value: 40.78978466656547 - type: nauc_recall_at_1_std value: -19.96189264026715 - type: nauc_recall_at_20_diff1 value: 70.92168324710599 - type: nauc_recall_at_20_max value: 76.09106252420084 - type: nauc_recall_at_20_std value: 25.406842300761447 - type: nauc_recall_at_3_diff1 value: 74.1212680517145 - type: nauc_recall_at_3_max value: 56.24921832879403 - type: nauc_recall_at_3_std value: -11.55542913578436 - type: nauc_recall_at_5_diff1 value: 72.31262959872993 - type: nauc_recall_at_5_max value: 62.761214896697915 - type: nauc_recall_at_5_std value: -3.280167584070396 - type: ndcg_at_1 value: 69.18299999999999 - type: ndcg_at_10 value: 79.687 - type: ndcg_at_100 value: 81.062 - type: ndcg_at_1000 value: 81.312 - type: ndcg_at_20 value: 80.34599999999999 - type: ndcg_at_3 value: 75.98700000000001 - type: ndcg_at_5 value: 78.039 - type: precision_at_1 value: 69.18299999999999 - type: precision_at_10 value: 9.636 - type: precision_at_100 value: 1.0330000000000001 - type: precision_at_1000 value: 0.105 - type: precision_at_20 value: 4.958 - type: precision_at_3 
value: 28.515 - type: precision_at_5 value: 18.201 - type: recall_at_1 value: 66.872 - type: recall_at_10 value: 90.688 - type: recall_at_100 value: 96.99 - type: recall_at_1000 value: 98.958 - type: recall_at_20 value: 93.21199999999999 - type: recall_at_3 value: 80.84599999999999 - type: recall_at_5 value: 85.732 task: type: Retrieval - dataset: config: default name: MTEB MSMARCO (default) revision: c5a29a104738b98a9e76336939199e264163d4a0 split: dev type: mteb/msmarco metrics: - type: map_at_1 value: 21.861 - type: map_at_10 value: 34.008 - type: map_at_100 value: 35.174 - type: map_at_1000 value: 35.224 - type: map_at_20 value: 34.705999999999996 - type: map_at_3 value: 30.209000000000003 - type: map_at_5 value: 32.351 - type: mrr_at_1 value: 22.493 - type: mrr_at_10 value: 34.583999999999996 - type: mrr_at_100 value: 35.691 - type: mrr_at_1000 value: 35.736000000000004 - type: mrr_at_20 value: 35.257 - type: mrr_at_3 value: 30.85 - type: mrr_at_5 value: 32.962 - type: ndcg_at_1 value: 22.493 - type: ndcg_at_10 value: 40.815 - type: ndcg_at_100 value: 46.483999999999995 - type: ndcg_at_1000 value: 47.73 - type: ndcg_at_20 value: 43.302 - type: ndcg_at_3 value: 33.056000000000004 - type: ndcg_at_5 value: 36.879 - type: precision_at_1 value: 22.493 - type: precision_at_10 value: 6.465999999999999 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.104 - type: precision_at_20 value: 3.752 - type: precision_at_3 value: 14.069 - type: precision_at_5 value: 10.384 - type: recall_at_1 value: 21.861 - type: recall_at_10 value: 61.781 - type: recall_at_100 value: 88.095 - type: recall_at_1000 value: 97.625 - type: recall_at_20 value: 71.44500000000001 - type: recall_at_3 value: 40.653 - type: recall_at_5 value: 49.841 - type: main_score value: 40.815 task: type: Retrieval - dataset: config: en name: MTEB MTOPDomainClassification (en) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 97.4874601003192 - type: f1 value: 97.19067544931094 - type: f1_weighted value: 97.49331776181019 - type: main_score value: 97.4874601003192 task: type: Classification - dataset: config: de name: MTEB MTOPDomainClassification (de) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 96.89489997182305 - type: f1 value: 96.51138586512977 - type: f1_weighted value: 96.89723065967186 - type: main_score value: 96.89489997182305 task: type: Classification - dataset: config: es name: MTEB MTOPDomainClassification (es) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 97.17144763175452 - type: f1 value: 96.81785681878274 - type: f1_weighted value: 97.1778974586874 - type: main_score value: 97.17144763175452 task: type: Classification - dataset: config: fr name: MTEB MTOPDomainClassification (fr) revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf split: test type: mteb/mtop_domain metrics: - type: accuracy value: 96.30128405887879 - type: f1 value: 95.94555923088487 - type: f1_weighted value: 96.30399416794926 - type: main_score value: 96.30128405887879 task: type: Classification - dataset: config: en name: MTEB MTOPIntentClassification (en) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 84.53488372093022 - type: f1 value: 61.77995074251401 - type: f1_weighted value: 86.8005170485101 - type: main_score value: 84.53488372093022 task: 
type: Classification - dataset: config: de name: MTEB MTOPIntentClassification (de) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 80.79459002535924 - type: f1 value: 56.08938302001448 - type: f1_weighted value: 83.66582131948252 - type: main_score value: 80.79459002535924 task: type: Classification - dataset: config: es name: MTEB MTOPIntentClassification (es) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 84.7765176784523 - type: f1 value: 61.39860057885528 - type: f1_weighted value: 86.94881745670745 - type: main_score value: 84.7765176784523 task: type: Classification - dataset: config: fr name: MTEB MTOPIntentClassification (fr) revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba split: test type: mteb/mtop_intent metrics: - type: accuracy value: 82.2079549013467 - type: f1 value: 59.90260478749016 - type: f1_weighted value: 84.36861708593257 - type: main_score value: 82.2079549013467 task: type: Classification - dataset: config: eng name: MTEB MasakhaNEWSClassification (eng) revision: 18193f187b92da67168c655c9973a165ed9593dd split: test type: mteb/masakhanews metrics: - type: accuracy value: 74.98945147679325 - type: f1 value: 74.3157483560261 - type: f1_weighted value: 75.01179008904884 - type: main_score value: 74.98945147679325 task: type: Classification - dataset: config: fra name: MTEB MasakhaNEWSClassification (fra) revision: 18193f187b92da67168c655c9973a165ed9593dd split: test type: mteb/masakhanews metrics: - type: accuracy value: 74.02843601895735 - type: f1 value: 70.40326349620732 - type: f1_weighted value: 74.6596277063484 - type: main_score value: 74.02843601895735 task: type: Classification - dataset: config: amh name: MTEB MasakhaNEWSClusteringP2P (amh) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 69.45780291725053 - type: v_measure value: 69.45780291725053 - type: v_measure_std value: 36.54340055904091 task: type: Clustering - dataset: config: eng name: MTEB MasakhaNEWSClusteringP2P (eng) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 64.88996119332239 - type: v_measure value: 64.88996119332239 - type: v_measure_std value: 30.017223408197268 task: type: Clustering - dataset: config: fra name: MTEB MasakhaNEWSClusteringP2P (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 42.362383958691666 - type: v_measure value: 42.362383958691666 - type: v_measure_std value: 37.61076788039063 task: type: Clustering - dataset: config: hau name: MTEB MasakhaNEWSClusteringP2P (hau) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 43.29201252405562 - type: v_measure value: 43.29201252405562 - type: v_measure_std value: 34.31987945146255 task: type: Clustering - dataset: config: ibo name: MTEB MasakhaNEWSClusteringP2P (ibo) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 33.59926542995238 - type: v_measure value: 33.59926542995238 - type: v_measure_std value: 35.70048601084112 task: type: Clustering - dataset: config: lin name: MTEB MasakhaNEWSClusteringP2P (lin) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews 
metrics: - type: main_score value: 67.58487601893106 - type: v_measure value: 67.58487601893106 - type: v_measure_std value: 35.16784970777931 task: type: Clustering - dataset: config: lug name: MTEB MasakhaNEWSClusteringP2P (lug) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 50.01220872023533 - type: v_measure value: 50.01220872023533 - type: v_measure_std value: 41.87411574676182 task: type: Clustering - dataset: config: orm name: MTEB MasakhaNEWSClusteringP2P (orm) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 29.007847502598317 - type: v_measure value: 29.007847502598317 - type: v_measure_std value: 38.374997395079994 task: type: Clustering - dataset: config: pcm name: MTEB MasakhaNEWSClusteringP2P (pcm) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 79.13520228554611 - type: v_measure value: 79.13520228554611 - type: v_measure_std value: 18.501843848275183 task: type: Clustering - dataset: config: run name: MTEB MasakhaNEWSClusteringP2P (run) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 60.317213909746656 - type: v_measure value: 60.317213909746656 - type: v_measure_std value: 36.500281823747386 task: type: Clustering - dataset: config: sna name: MTEB MasakhaNEWSClusteringP2P (sna) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 59.395277358240946 - type: v_measure value: 59.395277358240946 - type: v_measure_std value: 37.500916816164654 task: type: Clustering - dataset: config: som name: MTEB MasakhaNEWSClusteringP2P (som) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 38.18638688704302 - type: v_measure value: 38.18638688704302 - type: v_measure_std value: 35.453681137564466 task: type: Clustering - dataset: config: swa name: MTEB MasakhaNEWSClusteringP2P (swa) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 29.49230755729658 - type: v_measure value: 29.49230755729658 - type: v_measure_std value: 28.284313285264645 task: type: Clustering - dataset: config: tir name: MTEB MasakhaNEWSClusteringP2P (tir) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 60.632258622750115 - type: v_measure value: 60.632258622750115 - type: v_measure_std value: 34.429711214740564 task: type: Clustering - dataset: config: xho name: MTEB MasakhaNEWSClusteringP2P (xho) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 41.76322918806381 - type: v_measure value: 41.76322918806381 - type: v_measure_std value: 36.43245296200775 task: type: Clustering - dataset: config: yor name: MTEB MasakhaNEWSClusteringP2P (yor) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 33.17083910808645 - type: v_measure value: 33.17083910808645 - type: v_measure_std value: 34.87547994284835 task: type: Clustering - dataset: config: amh name: MTEB MasakhaNEWSClusteringS2S (amh) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: 
masakhane/masakhanews metrics: - type: main_score value: 60.95132147787602 - type: v_measure value: 60.95132147787602 - type: v_measure_std value: 37.330148394033365 task: type: Clustering - dataset: config: eng name: MTEB MasakhaNEWSClusteringS2S (eng) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 60.974810831426595 - type: v_measure value: 60.974810831426595 - type: v_measure_std value: 24.934675467507827 task: type: Clustering - dataset: config: fra name: MTEB MasakhaNEWSClusteringS2S (fra) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 44.479206673553335 - type: v_measure value: 44.479206673553335 - type: v_measure_std value: 32.58254804499339 task: type: Clustering - dataset: config: hau name: MTEB MasakhaNEWSClusteringS2S (hau) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 26.4742082741682 - type: v_measure value: 26.4742082741682 - type: v_measure_std value: 22.344929192323097 task: type: Clustering - dataset: config: ibo name: MTEB MasakhaNEWSClusteringS2S (ibo) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 38.906129911741985 - type: v_measure value: 38.906129911741985 - type: v_measure_std value: 34.785601792668444 task: type: Clustering - dataset: config: lin name: MTEB MasakhaNEWSClusteringS2S (lin) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 62.60982020876592 - type: v_measure value: 62.60982020876592 - type: v_measure_std value: 40.7368955715045 task: type: Clustering - dataset: config: lug name: MTEB MasakhaNEWSClusteringS2S (lug) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 42.70424106365967 - type: v_measure value: 42.70424106365967 - type: v_measure_std value: 46.80946241135087 task: type: Clustering - dataset: config: orm name: MTEB MasakhaNEWSClusteringS2S (orm) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 28.609942199922322 - type: v_measure value: 28.609942199922322 - type: v_measure_std value: 38.46685040191088 task: type: Clustering - dataset: config: pcm name: MTEB MasakhaNEWSClusteringS2S (pcm) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 76.83901348810822 - type: v_measure value: 76.83901348810822 - type: v_measure_std value: 17.57617141269189 task: type: Clustering - dataset: config: run name: MTEB MasakhaNEWSClusteringS2S (run) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 46.89757547846193 - type: v_measure value: 46.89757547846193 - type: v_measure_std value: 44.58903590203438 task: type: Clustering - dataset: config: sna name: MTEB MasakhaNEWSClusteringS2S (sna) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 55.37185207068829 - type: v_measure value: 55.37185207068829 - type: v_measure_std value: 36.944574863543004 task: type: Clustering - dataset: config: som name: MTEB MasakhaNEWSClusteringS2S (som) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: 
masakhane/masakhanews metrics: - type: main_score value: 37.44211021681754 - type: v_measure value: 37.44211021681754 - type: v_measure_std value: 33.41469994463241 task: type: Clustering - dataset: config: swa name: MTEB MasakhaNEWSClusteringS2S (swa) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 26.020680621216062 - type: v_measure value: 26.020680621216062 - type: v_measure_std value: 25.480037522570413 task: type: Clustering - dataset: config: tir name: MTEB MasakhaNEWSClusteringS2S (tir) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 63.74306846771303 - type: v_measure value: 63.74306846771303 - type: v_measure_std value: 32.19119631078685 task: type: Clustering - dataset: config: xho name: MTEB MasakhaNEWSClusteringS2S (xho) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 24.580890519243777 - type: v_measure value: 24.580890519243777 - type: v_measure_std value: 37.941836363967106 task: type: Clustering - dataset: config: yor name: MTEB MasakhaNEWSClusteringS2S (yor) revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60 split: test type: masakhane/masakhanews metrics: - type: main_score value: 43.63458888828314 - type: v_measure value: 43.63458888828314 - type: v_measure_std value: 31.28169350649098 task: type: Clustering - dataset: config: pl name: MTEB MassiveIntentClassification (pl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 75.37323470073974 - type: f1 value: 71.1836877753734 - type: f1_weighted value: 75.72073213955457 - type: main_score value: 75.37323470073974 task: type: Classification - dataset: config: de name: MTEB MassiveIntentClassification (de) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 74.83523873570948 - type: f1 value: 70.72375821116886 - type: f1_weighted value: 75.20800490010755 - type: main_score value: 74.83523873570948 task: type: Classification - dataset: config: es name: MTEB MassiveIntentClassification (es) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 75.31607262945528 - type: f1 value: 72.06063554897662 - type: f1_weighted value: 75.72438161355252 - type: main_score value: 75.31607262945528 task: type: Classification - dataset: config: ru name: MTEB MassiveIntentClassification (ru) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 76.7955615332885 - type: f1 value: 73.08099648499756 - type: f1_weighted value: 77.18482068239668 - type: main_score value: 76.7955615332885 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 77.60591795561534 - type: f1 value: 74.46676705370395 - type: f1_weighted value: 77.69888062336614 - type: main_score value: 77.60591795561534 task: type: Classification - dataset: config: fr name: MTEB MassiveIntentClassification (fr) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 76.32145258910558 - type: f1 value: 72.89824154178328 - 
type: f1_weighted value: 76.6539327979472 - type: main_score value: 76.32145258910558 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveIntentClassification (zh-CN) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 73.21788836583724 - type: f1 value: 70.45594512246377 - type: f1_weighted value: 73.67862536499393 - type: main_score value: 73.21788836583724 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveScenarioClassification (zh-CN) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 80.82044384667114 - type: f1 value: 80.53217664465089 - type: f1_weighted value: 80.94535087010512 - type: main_score value: 80.82044384667114 task: type: Classification - dataset: config: pl name: MTEB MassiveScenarioClassification (pl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 82.1049092131809 - type: f1 value: 81.55343463694733 - type: f1_weighted value: 82.33509098770782 - type: main_score value: 82.1049092131809 task: type: Classification - dataset: config: es name: MTEB MassiveScenarioClassification (es) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 82.58238063214526 - type: f1 value: 82.27974449333072 - type: f1_weighted value: 82.81337569618209 - type: main_score value: 82.58238063214526 task: type: Classification - dataset: config: de name: MTEB MassiveScenarioClassification (de) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 83.97108271687962 - type: f1 value: 83.56285606936076 - type: f1_weighted value: 84.10198745390771 - type: main_score value: 83.97108271687962 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 84.71082716879623 - type: f1 value: 84.09447062371402 - type: f1_weighted value: 84.73765765551342 - type: main_score value: 84.71082716879623 task: type: Classification - dataset: config: fr name: MTEB MassiveScenarioClassification (fr) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 83.093476798924 - type: f1 value: 82.72656900752943 - type: f1_weighted value: 83.26606516503364 - type: main_score value: 83.093476798924 task: type: Classification - dataset: config: ru name: MTEB MassiveScenarioClassification (ru) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 84.05850706119705 - type: f1 value: 83.64234048881222 - type: f1_weighted value: 84.17315768381876 - type: main_score value: 84.05850706119705 task: type: Classification - dataset: config: default name: MTEB MedicalRetrieval (default) revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6 split: dev type: C-MTEB/MedicalRetrieval metrics: - type: main_score value: 56.635999999999996 - type: map_at_1 value: 48.699999999999996 - type: map_at_10 value: 53.991 - type: map_at_100 value: 54.449999999999996 - type: map_at_1000 value: 54.515 - type: map_at_20 value: 54.212 - type: map_at_3 value: 52.833 - type: map_at_5 value: 53.503 - type: mrr_at_1 value: 
48.699999999999996 - type: mrr_at_10 value: 53.991309523809505 - type: mrr_at_100 value: 54.45008993448266 - type: mrr_at_1000 value: 54.515253990549795 - type: mrr_at_20 value: 54.21201762247036 - type: mrr_at_3 value: 52.8333333333333 - type: mrr_at_5 value: 53.50333333333328 - type: nauc_map_at_1000_diff1 value: 79.96867989401643 - type: nauc_map_at_1000_max value: 69.75230895599029 - type: nauc_map_at_1000_std value: 2.6418738289740213 - type: nauc_map_at_100_diff1 value: 79.95343709599133 - type: nauc_map_at_100_max value: 69.751282671507 - type: nauc_map_at_100_std value: 2.621719966106279 - type: nauc_map_at_10_diff1 value: 80.02875864565634 - type: nauc_map_at_10_max value: 69.80948662290187 - type: nauc_map_at_10_std value: 2.329151604733765 - type: nauc_map_at_1_diff1 value: 83.616940281383 - type: nauc_map_at_1_max value: 69.08142651929452 - type: nauc_map_at_1_std value: 1.9687791394035643 - type: nauc_map_at_20_diff1 value: 79.95555601275339 - type: nauc_map_at_20_max value: 69.76604695002925 - type: nauc_map_at_20_std value: 2.556184141901367 - type: nauc_map_at_3_diff1 value: 80.74790131023668 - type: nauc_map_at_3_max value: 70.57797991892402 - type: nauc_map_at_3_std value: 2.7115149849964117 - type: nauc_map_at_5_diff1 value: 80.31796539878381 - type: nauc_map_at_5_max value: 69.93573796420061 - type: nauc_map_at_5_std value: 2.0731614029506606 - type: nauc_mrr_at_1000_diff1 value: 79.96867999907981 - type: nauc_mrr_at_1000_max value: 69.57395578976896 - type: nauc_mrr_at_1000_std value: 2.46351945887829 - type: nauc_mrr_at_100_diff1 value: 79.95343709599133 - type: nauc_mrr_at_100_max value: 69.57322054130803 - type: nauc_mrr_at_100_std value: 2.4436578359073433 - type: nauc_mrr_at_10_diff1 value: 80.02875864565634 - type: nauc_mrr_at_10_max value: 69.63292630937411 - type: nauc_mrr_at_10_std value: 2.1525912912060012 - type: nauc_mrr_at_1_diff1 value: 83.616940281383 - type: nauc_mrr_at_1_max value: 68.74717310480305 - type: nauc_mrr_at_1_std value: 1.6345257249120868 - type: nauc_mrr_at_20_diff1 value: 79.95555601275339 - type: nauc_mrr_at_20_max value: 69.58883608470444 - type: nauc_mrr_at_20_std value: 2.378973276576547 - type: nauc_mrr_at_3_diff1 value: 80.74790131023668 - type: nauc_mrr_at_3_max value: 70.40430475488604 - type: nauc_mrr_at_3_std value: 2.5378398209583817 - type: nauc_mrr_at_5_diff1 value: 80.31796539878381 - type: nauc_mrr_at_5_max value: 69.7605991748183 - type: nauc_mrr_at_5_std value: 1.898022613568352 - type: nauc_ndcg_at_1000_diff1 value: 78.35504059321225 - type: nauc_ndcg_at_1000_max value: 69.06752522437093 - type: nauc_ndcg_at_1000_std value: 3.9624036886099265 - type: nauc_ndcg_at_100_diff1 value: 77.79729140249833 - type: nauc_ndcg_at_100_max value: 68.93113791506029 - type: nauc_ndcg_at_100_std value: 3.642178826886181 - type: nauc_ndcg_at_10_diff1 value: 78.160158293918 - type: nauc_ndcg_at_10_max value: 69.28122202281361 - type: nauc_ndcg_at_10_std value: 2.438976810940962 - type: nauc_ndcg_at_1_diff1 value: 83.616940281383 - type: nauc_ndcg_at_1_max value: 69.08142651929452 - type: nauc_ndcg_at_1_std value: 1.9687791394035643 - type: nauc_ndcg_at_20_diff1 value: 77.88514432874997 - type: nauc_ndcg_at_20_max value: 69.06148818508873 - type: nauc_ndcg_at_20_std value: 3.1800249272363676 - type: nauc_ndcg_at_3_diff1 value: 79.73510384405803 - type: nauc_ndcg_at_3_max value: 70.78000695123832 - type: nauc_ndcg_at_3_std value: 2.9041415468363274 - type: nauc_ndcg_at_5_diff1 value: 78.91872808866195 - type: nauc_ndcg_at_5_max value: 
69.61478429620091 - type: nauc_ndcg_at_5_std value: 1.734699636301054 - type: nauc_precision_at_1000_diff1 value: 66.37858395390673 - type: nauc_precision_at_1000_max value: 60.651659037598534 - type: nauc_precision_at_1000_std value: 27.388353715469798 - type: nauc_precision_at_100_diff1 value: 66.34325807776025 - type: nauc_precision_at_100_max value: 63.63855305621111 - type: nauc_precision_at_100_std value: 10.641748149575351 - type: nauc_precision_at_10_diff1 value: 71.3784685491089 - type: nauc_precision_at_10_max value: 67.05313695174542 - type: nauc_precision_at_10_std value: 3.000406867930561 - type: nauc_precision_at_1_diff1 value: 83.616940281383 - type: nauc_precision_at_1_max value: 69.08142651929452 - type: nauc_precision_at_1_std value: 1.9687791394035643 - type: nauc_precision_at_20_diff1 value: 69.73407910977694 - type: nauc_precision_at_20_max value: 65.77426240320742 - type: nauc_precision_at_20_std value: 6.204416838482586 - type: nauc_precision_at_3_diff1 value: 76.63737537643107 - type: nauc_precision_at_3_max value: 71.29710200719668 - type: nauc_precision_at_3_std value: 3.47180961484546 - type: nauc_precision_at_5_diff1 value: 74.36945983536717 - type: nauc_precision_at_5_max value: 68.33292218003061 - type: nauc_precision_at_5_std value: 0.47128762620258075 - type: nauc_recall_at_1000_diff1 value: 66.37858395390681 - type: nauc_recall_at_1000_max value: 60.65165903759889 - type: nauc_recall_at_1000_std value: 27.388353715469822 - type: nauc_recall_at_100_diff1 value: 66.34325807776025 - type: nauc_recall_at_100_max value: 63.63855305621116 - type: nauc_recall_at_100_std value: 10.641748149575351 - type: nauc_recall_at_10_diff1 value: 71.37846854910892 - type: nauc_recall_at_10_max value: 67.05313695174546 - type: nauc_recall_at_10_std value: 3.000406867930663 - type: nauc_recall_at_1_diff1 value: 83.616940281383 - type: nauc_recall_at_1_max value: 69.08142651929452 - type: nauc_recall_at_1_std value: 1.9687791394035643 - type: nauc_recall_at_20_diff1 value: 69.73407910977691 - type: nauc_recall_at_20_max value: 65.77426240320746 - type: nauc_recall_at_20_std value: 6.204416838482536 - type: nauc_recall_at_3_diff1 value: 76.63737537643112 - type: nauc_recall_at_3_max value: 71.29710200719668 - type: nauc_recall_at_3_std value: 3.471809614845442 - type: nauc_recall_at_5_diff1 value: 74.36945983536715 - type: nauc_recall_at_5_max value: 68.33292218003065 - type: nauc_recall_at_5_std value: 0.4712876262026442 - type: ndcg_at_1 value: 48.699999999999996 - type: ndcg_at_10 value: 56.635999999999996 - type: ndcg_at_100 value: 59.193 - type: ndcg_at_1000 value: 60.97 - type: ndcg_at_20 value: 57.426 - type: ndcg_at_3 value: 54.186 - type: ndcg_at_5 value: 55.407 - type: precision_at_1 value: 48.699999999999996 - type: precision_at_10 value: 6.5 - type: precision_at_100 value: 0.777 - type: precision_at_1000 value: 0.092 - type: precision_at_20 value: 3.405 - type: precision_at_3 value: 19.367 - type: precision_at_5 value: 12.22 - type: recall_at_1 value: 48.699999999999996 - type: recall_at_10 value: 65.0 - type: recall_at_100 value: 77.7 - type: recall_at_1000 value: 91.8 - type: recall_at_20 value: 68.10000000000001 - type: recall_at_3 value: 58.099999999999994 - type: recall_at_5 value: 61.1 task: type: Retrieval - dataset: config: default name: MTEB MedrxivClusteringP2P (default) revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: main_score value: 34.80188561439236 - type: v_measure value: 
34.80188561439236 - type: v_measure_std value: 1.5703148841573102 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S (default) revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: main_score value: 32.42285513996236 - type: v_measure value: 32.42285513996236 - type: v_measure_std value: 1.3769867487457566 task: type: Clustering - dataset: config: de name: MTEB MintakaRetrieval (de) revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e split: test type: jinaai/mintakaqa metrics: - type: main_score value: 27.025 - type: map_at_1 value: 14.532 - type: map_at_10 value: 22.612 - type: map_at_100 value: 23.802 - type: map_at_1000 value: 23.9 - type: map_at_20 value: 23.275000000000002 - type: map_at_3 value: 20.226 - type: map_at_5 value: 21.490000000000002 - type: mrr_at_1 value: 14.532434709351305 - type: mrr_at_10 value: 22.612077265615575 - type: mrr_at_100 value: 23.801523356874675 - type: mrr_at_1000 value: 23.900118499340238 - type: mrr_at_20 value: 23.275466430108995 - type: mrr_at_3 value: 20.22606009547877 - type: mrr_at_5 value: 21.489750070204945 - type: nauc_map_at_1000_diff1 value: 14.148987799763596 - type: nauc_map_at_1000_max value: 44.70338461387784 - type: nauc_map_at_1000_std value: 15.868006767707637 - type: nauc_map_at_100_diff1 value: 14.11371769080442 - type: nauc_map_at_100_max value: 44.67995540936296 - type: nauc_map_at_100_std value: 15.890796502029076 - type: nauc_map_at_10_diff1 value: 14.29066834165688 - type: nauc_map_at_10_max value: 45.10997111765282 - type: nauc_map_at_10_std value: 15.508568918629864 - type: nauc_map_at_1_diff1 value: 23.473291302576396 - type: nauc_map_at_1_max value: 44.68942599764586 - type: nauc_map_at_1_std value: 12.424377262427253 - type: nauc_map_at_20_diff1 value: 14.112652046087831 - type: nauc_map_at_20_max value: 44.82014861413682 - type: nauc_map_at_20_std value: 15.739350613646385 - type: nauc_map_at_3_diff1 value: 16.119659221396347 - type: nauc_map_at_3_max value: 46.04766378953525 - type: nauc_map_at_3_std value: 13.969878046315925 - type: nauc_map_at_5_diff1 value: 15.095453434076184 - type: nauc_map_at_5_max value: 45.802128149314406 - type: nauc_map_at_5_std value: 14.957442173319949 - type: nauc_mrr_at_1000_diff1 value: 14.148987799763596 - type: nauc_mrr_at_1000_max value: 44.70338461387784 - type: nauc_mrr_at_1000_std value: 15.868006767707637 - type: nauc_mrr_at_100_diff1 value: 14.11371769080442 - type: nauc_mrr_at_100_max value: 44.67995540936296 - type: nauc_mrr_at_100_std value: 15.890796502029076 - type: nauc_mrr_at_10_diff1 value: 14.29066834165688 - type: nauc_mrr_at_10_max value: 45.10997111765282 - type: nauc_mrr_at_10_std value: 15.508568918629864 - type: nauc_mrr_at_1_diff1 value: 23.473291302576396 - type: nauc_mrr_at_1_max value: 44.68942599764586 - type: nauc_mrr_at_1_std value: 12.424377262427253 - type: nauc_mrr_at_20_diff1 value: 14.112652046087831 - type: nauc_mrr_at_20_max value: 44.82014861413682 - type: nauc_mrr_at_20_std value: 15.739350613646385 - type: nauc_mrr_at_3_diff1 value: 16.119659221396347 - type: nauc_mrr_at_3_max value: 46.04766378953525 - type: nauc_mrr_at_3_std value: 13.969878046315925 - type: nauc_mrr_at_5_diff1 value: 15.095453434076184 - type: nauc_mrr_at_5_max value: 45.802128149314406 - type: nauc_mrr_at_5_std value: 14.957442173319949 - type: nauc_ndcg_at_1000_diff1 value: 11.626606894574028 - type: nauc_ndcg_at_1000_max value: 43.328592841065536 - type: nauc_ndcg_at_1000_std value: 
18.049446272245547 - type: nauc_ndcg_at_100_diff1 value: 10.485720606660239 - type: nauc_ndcg_at_100_max value: 42.405317674170966 - type: nauc_ndcg_at_100_std value: 19.107151641936987 - type: nauc_ndcg_at_10_diff1 value: 11.029351078162982 - type: nauc_ndcg_at_10_max value: 44.36855031964681 - type: nauc_ndcg_at_10_std value: 17.302796171409305 - type: nauc_ndcg_at_1_diff1 value: 23.473291302576396 - type: nauc_ndcg_at_1_max value: 44.68942599764586 - type: nauc_ndcg_at_1_std value: 12.424377262427253 - type: nauc_ndcg_at_20_diff1 value: 10.356662718168412 - type: nauc_ndcg_at_20_max value: 43.31602680430083 - type: nauc_ndcg_at_20_std value: 18.162891267850316 - type: nauc_ndcg_at_3_diff1 value: 14.42844952297869 - type: nauc_ndcg_at_3_max value: 46.26603339466543 - type: nauc_ndcg_at_3_std value: 14.449362723887857 - type: nauc_ndcg_at_5_diff1 value: 12.783416563486396 - type: nauc_ndcg_at_5_max value: 45.852176479124424 - type: nauc_ndcg_at_5_std value: 16.11775016428085 - type: nauc_precision_at_1000_diff1 value: -8.045361059399795 - type: nauc_precision_at_1000_max value: 21.970273281738777 - type: nauc_precision_at_1000_std value: 49.564650488193266 - type: nauc_precision_at_100_diff1 value: -2.118628861593353 - type: nauc_precision_at_100_max value: 31.32498977104778 - type: nauc_precision_at_100_std value: 32.96087731883451 - type: nauc_precision_at_10_diff1 value: 3.0335517475367615 - type: nauc_precision_at_10_max value: 42.21620215030219 - type: nauc_precision_at_10_std value: 21.90159732315962 - type: nauc_precision_at_1_diff1 value: 23.473291302576396 - type: nauc_precision_at_1_max value: 44.68942599764586 - type: nauc_precision_at_1_std value: 12.424377262427253 - type: nauc_precision_at_20_diff1 value: 0.4087201843719047 - type: nauc_precision_at_20_max value: 38.485034773895734 - type: nauc_precision_at_20_std value: 25.077397979916682 - type: nauc_precision_at_3_diff1 value: 10.408327736589833 - type: nauc_precision_at_3_max value: 46.757216289175076 - type: nauc_precision_at_3_std value: 15.62594354926867 - type: nauc_precision_at_5_diff1 value: 7.326752744229544 - type: nauc_precision_at_5_max value: 45.89190518573553 - type: nauc_precision_at_5_std value: 19.01717163438957 - type: nauc_recall_at_1000_diff1 value: -8.045361059400387 - type: nauc_recall_at_1000_max value: 21.97027328173812 - type: nauc_recall_at_1000_std value: 49.56465048819266 - type: nauc_recall_at_100_diff1 value: -2.118628861593277 - type: nauc_recall_at_100_max value: 31.324989771047818 - type: nauc_recall_at_100_std value: 32.96087731883457 - type: nauc_recall_at_10_diff1 value: 3.0335517475367166 - type: nauc_recall_at_10_max value: 42.21620215030217 - type: nauc_recall_at_10_std value: 21.901597323159606 - type: nauc_recall_at_1_diff1 value: 23.473291302576396 - type: nauc_recall_at_1_max value: 44.68942599764586 - type: nauc_recall_at_1_std value: 12.424377262427253 - type: nauc_recall_at_20_diff1 value: 0.40872018437190905 - type: nauc_recall_at_20_max value: 38.485034773895734 - type: nauc_recall_at_20_std value: 25.077397979916693 - type: nauc_recall_at_3_diff1 value: 10.408327736589843 - type: nauc_recall_at_3_max value: 46.75721628917507 - type: nauc_recall_at_3_std value: 15.625943549268664 - type: nauc_recall_at_5_diff1 value: 7.326752744229548 - type: nauc_recall_at_5_max value: 45.89190518573557 - type: nauc_recall_at_5_std value: 19.01717163438958 - type: ndcg_at_1 value: 14.532 - type: ndcg_at_10 value: 27.025 - type: ndcg_at_100 value: 33.305 - type: ndcg_at_1000 value: 36.38 - 
type: ndcg_at_20 value: 29.443 - type: ndcg_at_3 value: 22.035 - type: ndcg_at_5 value: 24.319 - type: precision_at_1 value: 14.532 - type: precision_at_10 value: 4.115 - type: precision_at_100 value: 0.717 - type: precision_at_1000 value: 0.097 - type: precision_at_20 value: 2.536 - type: precision_at_3 value: 9.085 - type: precision_at_5 value: 6.563 - type: recall_at_1 value: 14.532 - type: recall_at_10 value: 41.154 - type: recall_at_100 value: 71.651 - type: recall_at_1000 value: 96.841 - type: recall_at_20 value: 50.71600000000001 - type: recall_at_3 value: 27.254 - type: recall_at_5 value: 32.814 task: type: Retrieval - dataset: config: es name: MTEB MintakaRetrieval (es) revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e split: test type: jinaai/mintakaqa metrics: - type: main_score value: 26.912000000000003 - type: map_at_1 value: 14.686 - type: map_at_10 value: 22.569 - type: map_at_100 value: 23.679 - type: map_at_1000 value: 23.777 - type: map_at_20 value: 23.169 - type: map_at_3 value: 20.201 - type: map_at_5 value: 21.566 - type: mrr_at_1 value: 14.686468646864686 - type: mrr_at_10 value: 22.569346220336296 - type: mrr_at_100 value: 23.678819125817146 - type: mrr_at_1000 value: 23.77713511338264 - type: mrr_at_20 value: 23.16850858443442 - type: mrr_at_3 value: 20.200770077007665 - type: mrr_at_5 value: 21.56628162816276 - type: nauc_map_at_1000_diff1 value: 14.129007578838381 - type: nauc_map_at_1000_max value: 44.4255501141499 - type: nauc_map_at_1000_std value: 19.95906154868176 - type: nauc_map_at_100_diff1 value: 14.09071870575231 - type: nauc_map_at_100_max value: 44.403179928955566 - type: nauc_map_at_100_std value: 20.00413657519976 - type: nauc_map_at_10_diff1 value: 14.149535953153688 - type: nauc_map_at_10_max value: 44.66529917634685 - type: nauc_map_at_10_std value: 19.580235989479394 - type: nauc_map_at_1_diff1 value: 23.489813522176636 - type: nauc_map_at_1_max value: 46.54578639925787 - type: nauc_map_at_1_std value: 16.39083721709994 - type: nauc_map_at_20_diff1 value: 14.021560420656181 - type: nauc_map_at_20_max value: 44.4825455452467 - type: nauc_map_at_20_std value: 19.886927750826878 - type: nauc_map_at_3_diff1 value: 16.182977890477723 - type: nauc_map_at_3_max value: 46.1840554029258 - type: nauc_map_at_3_std value: 18.735671900228958 - type: nauc_map_at_5_diff1 value: 14.779126395472833 - type: nauc_map_at_5_max value: 45.23237213817556 - type: nauc_map_at_5_std value: 19.348508580412872 - type: nauc_mrr_at_1000_diff1 value: 14.129007578838381 - type: nauc_mrr_at_1000_max value: 44.4255501141499 - type: nauc_mrr_at_1000_std value: 19.95906154868176 - type: nauc_mrr_at_100_diff1 value: 14.09071870575231 - type: nauc_mrr_at_100_max value: 44.403179928955566 - type: nauc_mrr_at_100_std value: 20.00413657519976 - type: nauc_mrr_at_10_diff1 value: 14.149535953153688 - type: nauc_mrr_at_10_max value: 44.66529917634685 - type: nauc_mrr_at_10_std value: 19.580235989479394 - type: nauc_mrr_at_1_diff1 value: 23.489813522176636 - type: nauc_mrr_at_1_max value: 46.54578639925787 - type: nauc_mrr_at_1_std value: 16.39083721709994 - type: nauc_mrr_at_20_diff1 value: 14.021560420656181 - type: nauc_mrr_at_20_max value: 44.4825455452467 - type: nauc_mrr_at_20_std value: 19.886927750826878 - type: nauc_mrr_at_3_diff1 value: 16.182977890477723 - type: nauc_mrr_at_3_max value: 46.1840554029258 - type: nauc_mrr_at_3_std value: 18.735671900228958 - type: nauc_mrr_at_5_diff1 value: 14.779126395472833 - type: nauc_mrr_at_5_max value: 45.23237213817556 - type: 
nauc_mrr_at_5_std value: 19.348508580412872 - type: nauc_ndcg_at_1000_diff1 value: 11.762470380481101 - type: nauc_ndcg_at_1000_max value: 42.8233203033089 - type: nauc_ndcg_at_1000_std value: 21.78503705117719 - type: nauc_ndcg_at_100_diff1 value: 10.45886076220022 - type: nauc_ndcg_at_100_max value: 41.85472899256818 - type: nauc_ndcg_at_100_std value: 23.20955486335138 - type: nauc_ndcg_at_10_diff1 value: 10.605912468659469 - type: nauc_ndcg_at_10_max value: 43.150942448104715 - type: nauc_ndcg_at_10_std value: 21.120035764826085 - type: nauc_ndcg_at_1_diff1 value: 23.489813522176636 - type: nauc_ndcg_at_1_max value: 46.54578639925787 - type: nauc_ndcg_at_1_std value: 16.39083721709994 - type: nauc_ndcg_at_20_diff1 value: 10.11291783888644 - type: nauc_ndcg_at_20_max value: 42.51260678842788 - type: nauc_ndcg_at_20_std value: 22.1744949382252 - type: nauc_ndcg_at_3_diff1 value: 14.25625326760802 - type: nauc_ndcg_at_3_max value: 45.96162916377383 - type: nauc_ndcg_at_3_std value: 19.557832728215523 - type: nauc_ndcg_at_5_diff1 value: 11.956317653823053 - type: nauc_ndcg_at_5_max value: 44.35971268886807 - type: nauc_ndcg_at_5_std value: 20.581696730374233 - type: nauc_precision_at_1000_diff1 value: 5.132291843566577 - type: nauc_precision_at_1000_max value: 25.293354576835263 - type: nauc_precision_at_1000_std value: 40.36005126087624 - type: nauc_precision_at_100_diff1 value: -1.5252854375008238 - type: nauc_precision_at_100_max value: 31.007586474495984 - type: nauc_precision_at_100_std value: 37.297552993548386 - type: nauc_precision_at_10_diff1 value: 1.9663657370770737 - type: nauc_precision_at_10_max value: 39.194092293625125 - type: nauc_precision_at_10_std value: 24.956542621999542 - type: nauc_precision_at_1_diff1 value: 23.489813522176636 - type: nauc_precision_at_1_max value: 46.54578639925787 - type: nauc_precision_at_1_std value: 16.39083721709994 - type: nauc_precision_at_20_diff1 value: 0.011112090390932373 - type: nauc_precision_at_20_max value: 36.9357074392519 - type: nauc_precision_at_20_std value: 28.611387115093876 - type: nauc_precision_at_3_diff1 value: 9.596831091013703 - type: nauc_precision_at_3_max value: 45.3905541893809 - type: nauc_precision_at_3_std value: 21.599314388526945 - type: nauc_precision_at_5_diff1 value: 5.175887949900142 - type: nauc_precision_at_5_max value: 42.129467510414464 - type: nauc_precision_at_5_std value: 23.607251548776677 - type: nauc_recall_at_1000_diff1 value: 5.132291843566257 - type: nauc_recall_at_1000_max value: 25.29335457683396 - type: nauc_recall_at_1000_std value: 40.36005126087638 - type: nauc_recall_at_100_diff1 value: -1.5252854375008988 - type: nauc_recall_at_100_max value: 31.00758647449594 - type: nauc_recall_at_100_std value: 37.29755299354834 - type: nauc_recall_at_10_diff1 value: 1.9663657370770793 - type: nauc_recall_at_10_max value: 39.19409229362512 - type: nauc_recall_at_10_std value: 24.956542621999546 - type: nauc_recall_at_1_diff1 value: 23.489813522176636 - type: nauc_recall_at_1_max value: 46.54578639925787 - type: nauc_recall_at_1_std value: 16.39083721709994 - type: nauc_recall_at_20_diff1 value: 0.011112090390923075 - type: nauc_recall_at_20_max value: 36.93570743925189 - type: nauc_recall_at_20_std value: 28.611387115093883 - type: nauc_recall_at_3_diff1 value: 9.596831091013714 - type: nauc_recall_at_3_max value: 45.39055418938087 - type: nauc_recall_at_3_std value: 21.599314388526956 - type: nauc_recall_at_5_diff1 value: 5.17588794990012 - type: nauc_recall_at_5_max value: 42.12946751041448 - type: 
nauc_recall_at_5_std value: 23.607251548776695 - type: ndcg_at_1 value: 14.686 - type: ndcg_at_10 value: 26.912000000000003 - type: ndcg_at_100 value: 32.919 - type: ndcg_at_1000 value: 36.119 - type: ndcg_at_20 value: 29.079 - type: ndcg_at_3 value: 21.995 - type: ndcg_at_5 value: 24.474999999999998 - type: precision_at_1 value: 14.686 - type: precision_at_10 value: 4.08 - type: precision_at_100 value: 0.703 - type: precision_at_1000 value: 0.097 - type: precision_at_20 value: 2.467 - type: precision_at_3 value: 9.062000000000001 - type: precision_at_5 value: 6.65 - type: recall_at_1 value: 14.686 - type: recall_at_10 value: 40.8 - type: recall_at_100 value: 70.338 - type: recall_at_1000 value: 96.82300000000001 - type: recall_at_20 value: 49.34 - type: recall_at_3 value: 27.186 - type: recall_at_5 value: 33.251 task: type: Retrieval - dataset: config: fr name: MTEB MintakaRetrieval (fr) revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e split: test type: jinaai/mintakaqa metrics: - type: main_score value: 26.909 - type: map_at_1 value: 14.701 - type: map_at_10 value: 22.613 - type: map_at_100 value: 23.729 - type: map_at_1000 value: 23.837 - type: map_at_20 value: 23.262 - type: map_at_3 value: 20.236 - type: map_at_5 value: 21.673000000000002 - type: mrr_at_1 value: 14.7010647010647 - type: mrr_at_10 value: 22.613165113165113 - type: mrr_at_100 value: 23.72877605989423 - type: mrr_at_1000 value: 23.837150802746805 - type: mrr_at_20 value: 23.261627081110596 - type: mrr_at_3 value: 20.2361452361452 - type: mrr_at_5 value: 21.673491673491625 - type: nauc_map_at_1000_diff1 value: 17.08927788889635 - type: nauc_map_at_1000_max value: 47.240929150603336 - type: nauc_map_at_1000_std value: 20.559244258100275 - type: nauc_map_at_100_diff1 value: 17.029461792796777 - type: nauc_map_at_100_max value: 47.207381115550696 - type: nauc_map_at_100_std value: 20.581498156895265 - type: nauc_map_at_10_diff1 value: 17.351456007804536 - type: nauc_map_at_10_max value: 47.815880040221344 - type: nauc_map_at_10_std value: 20.292999107555794 - type: nauc_map_at_1_diff1 value: 27.297525357600776 - type: nauc_map_at_1_max value: 47.18835074959486 - type: nauc_map_at_1_std value: 18.304203168281834 - type: nauc_map_at_20_diff1 value: 17.157460199542136 - type: nauc_map_at_20_max value: 47.4776610667456 - type: nauc_map_at_20_std value: 20.499186342964478 - type: nauc_map_at_3_diff1 value: 19.393119961356277 - type: nauc_map_at_3_max value: 49.02841822452882 - type: nauc_map_at_3_std value: 19.293122796321292 - type: nauc_map_at_5_diff1 value: 17.76275044752008 - type: nauc_map_at_5_max value: 48.01292548040298 - type: nauc_map_at_5_std value: 19.928449977400504 - type: nauc_mrr_at_1000_diff1 value: 17.08927788889635 - type: nauc_mrr_at_1000_max value: 47.240929150603336 - type: nauc_mrr_at_1000_std value: 20.559244258100275 - type: nauc_mrr_at_100_diff1 value: 17.029461792796777 - type: nauc_mrr_at_100_max value: 47.207381115550696 - type: nauc_mrr_at_100_std value: 20.581498156895265 - type: nauc_mrr_at_10_diff1 value: 17.351456007804536 - type: nauc_mrr_at_10_max value: 47.815880040221344 - type: nauc_mrr_at_10_std value: 20.292999107555794 - type: nauc_mrr_at_1_diff1 value: 27.297525357600776 - type: nauc_mrr_at_1_max value: 47.18835074959486 - type: nauc_mrr_at_1_std value: 18.304203168281834 - type: nauc_mrr_at_20_diff1 value: 17.157460199542136 - type: nauc_mrr_at_20_max value: 47.4776610667456 - type: nauc_mrr_at_20_std value: 20.499186342964478 - type: nauc_mrr_at_3_diff1 value: 19.393119961356277 - 
type: nauc_mrr_at_3_max value: 49.02841822452882 - type: nauc_mrr_at_3_std value: 19.293122796321292 - type: nauc_mrr_at_5_diff1 value: 17.76275044752008 - type: nauc_mrr_at_5_max value: 48.01292548040298 - type: nauc_mrr_at_5_std value: 19.928449977400504 - type: nauc_ndcg_at_1000_diff1 value: 13.989496006047975 - type: nauc_ndcg_at_1000_max value: 45.626323944336114 - type: nauc_ndcg_at_1000_std value: 22.125600410796515 - type: nauc_ndcg_at_100_diff1 value: 12.302204843705244 - type: nauc_ndcg_at_100_max value: 44.46856314559079 - type: nauc_ndcg_at_100_std value: 23.084984546328677 - type: nauc_ndcg_at_10_diff1 value: 14.001226213368275 - type: nauc_ndcg_at_10_max value: 47.37780636546918 - type: nauc_ndcg_at_10_std value: 21.702709032840637 - type: nauc_ndcg_at_1_diff1 value: 27.297525357600776 - type: nauc_ndcg_at_1_max value: 47.18835074959486 - type: nauc_ndcg_at_1_std value: 18.304203168281834 - type: nauc_ndcg_at_20_diff1 value: 13.317759910171056 - type: nauc_ndcg_at_20_max value: 46.25171251043813 - type: nauc_ndcg_at_20_std value: 22.309331575402595 - type: nauc_ndcg_at_3_diff1 value: 17.555381234893872 - type: nauc_ndcg_at_3_max value: 49.48635590260059 - type: nauc_ndcg_at_3_std value: 19.734570962933674 - type: nauc_ndcg_at_5_diff1 value: 14.844841165765061 - type: nauc_ndcg_at_5_max value: 47.76437065028708 - type: nauc_ndcg_at_5_std value: 20.816034479453954 - type: nauc_precision_at_1000_diff1 value: -15.591898698252546 - type: nauc_precision_at_1000_max value: 20.545984285353892 - type: nauc_precision_at_1000_std value: 38.9013414992826 - type: nauc_precision_at_100_diff1 value: -5.290395978742176 - type: nauc_precision_at_100_max value: 31.340480360546845 - type: nauc_precision_at_100_std value: 33.6897935720505 - type: nauc_precision_at_10_diff1 value: 5.965001997926562 - type: nauc_precision_at_10_max value: 46.12515296162247 - type: nauc_precision_at_10_std value: 25.409433135253558 - type: nauc_precision_at_1_diff1 value: 27.297525357600776 - type: nauc_precision_at_1_max value: 47.18835074959486 - type: nauc_precision_at_1_std value: 18.304203168281834 - type: nauc_precision_at_20_diff1 value: 3.4438127279827744 - type: nauc_precision_at_20_max value: 42.36095587714494 - type: nauc_precision_at_20_std value: 27.367900512797906 - type: nauc_precision_at_3_diff1 value: 13.165017224718916 - type: nauc_precision_at_3_max value: 50.58931825484506 - type: nauc_precision_at_3_std value: 20.852009214609442 - type: nauc_precision_at_5_diff1 value: 7.840087177549876 - type: nauc_precision_at_5_max value: 46.99388755575109 - type: nauc_precision_at_5_std value: 23.048702393099834 - type: nauc_recall_at_1000_diff1 value: -15.591898698252932 - type: nauc_recall_at_1000_max value: 20.5459842853537 - type: nauc_recall_at_1000_std value: 38.901341499282395 - type: nauc_recall_at_100_diff1 value: -5.290395978742165 - type: nauc_recall_at_100_max value: 31.340480360546863 - type: nauc_recall_at_100_std value: 33.68979357205046 - type: nauc_recall_at_10_diff1 value: 5.96500199792656 - type: nauc_recall_at_10_max value: 46.1251529616225 - type: nauc_recall_at_10_std value: 25.409433135253543 - type: nauc_recall_at_1_diff1 value: 27.297525357600776 - type: nauc_recall_at_1_max value: 47.18835074959486 - type: nauc_recall_at_1_std value: 18.304203168281834 - type: nauc_recall_at_20_diff1 value: 3.4438127279827833 - type: nauc_recall_at_20_max value: 42.36095587714498 - type: nauc_recall_at_20_std value: 27.36790051279787 - type: nauc_recall_at_3_diff1 value: 13.165017224718916 - type: 
nauc_recall_at_3_max value: 50.589318254845054 - type: nauc_recall_at_3_std value: 20.852009214609435 - type: nauc_recall_at_5_diff1 value: 7.840087177549891 - type: nauc_recall_at_5_max value: 46.99388755575112 - type: nauc_recall_at_5_std value: 23.048702393099845 - type: ndcg_at_1 value: 14.701 - type: ndcg_at_10 value: 26.909 - type: ndcg_at_100 value: 32.727000000000004 - type: ndcg_at_1000 value: 36.086 - type: ndcg_at_20 value: 29.236 - type: ndcg_at_3 value: 22.004 - type: ndcg_at_5 value: 24.615000000000002 - type: precision_at_1 value: 14.701 - type: precision_at_10 value: 4.062 - type: precision_at_100 value: 0.688 - type: precision_at_1000 value: 0.096 - type: precision_at_20 value: 2.488 - type: precision_at_3 value: 9.036 - type: precision_at_5 value: 6.699 - type: recall_at_1 value: 14.701 - type: recall_at_10 value: 40.622 - type: recall_at_100 value: 68.796 - type: recall_at_1000 value: 96.314 - type: recall_at_20 value: 49.754 - type: recall_at_3 value: 27.108999999999998 - type: recall_at_5 value: 33.497 task: type: Retrieval - dataset: config: default name: MTEB MultilingualSentiment (default) revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a split: test type: C-MTEB/MultilingualSentiment-classification metrics: - type: accuracy value: 73.20999999999998 - type: f1 value: 73.18755986777474 - type: f1_weighted value: 73.18755986777475 - type: main_score value: 73.20999999999998 task: type: Classification - dataset: config: default name: MTEB NFCorpus (default) revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 split: test type: mteb/nfcorpus metrics: - type: map_at_1 value: 4.822 - type: map_at_10 value: 13.144 - type: map_at_100 value: 17.254 - type: map_at_1000 value: 18.931 - type: map_at_20 value: 14.834 - type: map_at_3 value: 8.975 - type: map_at_5 value: 10.922 - type: mrr_at_1 value: 47.059 - type: mrr_at_10 value: 55.806999999999995 - type: mrr_at_100 value: 56.286 - type: mrr_at_1000 value: 56.327000000000005 - type: mrr_at_20 value: 56.00000000000001 - type: mrr_at_3 value: 54.17999999999999 - type: mrr_at_5 value: 55.155 - type: ndcg_at_1 value: 44.427 - type: ndcg_at_10 value: 36.623 - type: ndcg_at_100 value: 33.664 - type: ndcg_at_1000 value: 42.538 - type: ndcg_at_20 value: 34.066 - type: ndcg_at_3 value: 41.118 - type: ndcg_at_5 value: 39.455 - type: precision_at_1 value: 46.44 - type: precision_at_10 value: 28.607 - type: precision_at_100 value: 9.189 - type: precision_at_1000 value: 2.261 - type: precision_at_20 value: 21.238 - type: precision_at_3 value: 39.628 - type: precision_at_5 value: 35.604 - type: recall_at_1 value: 4.822 - type: recall_at_10 value: 17.488999999999997 - type: recall_at_100 value: 35.052 - type: recall_at_1000 value: 66.67999999999999 - type: recall_at_20 value: 21.343999999999998 - type: recall_at_3 value: 10.259 - type: recall_at_5 value: 13.406 - type: main_score value: 36.623 task: type: Retrieval - dataset: config: default name: MTEB NQ (default) revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 split: test type: mteb/nq metrics: - type: map_at_1 value: 41.411 - type: map_at_10 value: 57.179 - type: map_at_100 value: 57.945 - type: map_at_1000 value: 57.967999999999996 - type: map_at_20 value: 57.687 - type: map_at_3 value: 53.46300000000001 - type: map_at_5 value: 55.696999999999996 - type: mrr_at_1 value: 46.233999999999995 - type: mrr_at_10 value: 59.831999999999994 - type: mrr_at_100 value: 60.33500000000001 - type: mrr_at_1000 value: 60.348 - type: mrr_at_20 value: 60.167 - type: mrr_at_3 value: 56.972 - type: 
mrr_at_5 value: 58.74 - type: ndcg_at_1 value: 46.205 - type: ndcg_at_10 value: 64.23100000000001 - type: ndcg_at_100 value: 67.242 - type: ndcg_at_1000 value: 67.72500000000001 - type: ndcg_at_20 value: 65.77300000000001 - type: ndcg_at_3 value: 57.516 - type: ndcg_at_5 value: 61.11600000000001 - type: precision_at_1 value: 46.205 - type: precision_at_10 value: 9.873 - type: precision_at_100 value: 1.158 - type: precision_at_1000 value: 0.12 - type: precision_at_20 value: 5.319 - type: precision_at_3 value: 25.424999999999997 - type: precision_at_5 value: 17.375 - type: recall_at_1 value: 41.411 - type: recall_at_10 value: 82.761 - type: recall_at_100 value: 95.52199999999999 - type: recall_at_1000 value: 99.02499999999999 - type: recall_at_20 value: 88.34 - type: recall_at_3 value: 65.73 - type: recall_at_5 value: 73.894 - type: main_score value: 64.23100000000001 task: type: Retrieval - dataset: config: default name: MTEB Ocnli (default) revision: 66e76a618a34d6d565d5538088562851e6daa7ec split: validation type: C-MTEB/OCNLI metrics: - type: cosine_accuracy value: 62.3714131023281 - type: cosine_accuracy_threshold value: 79.70921993255615 - type: cosine_ap value: 66.41380155495659 - type: cosine_f1 value: 68.89547185780786 - type: cosine_f1_threshold value: 72.91591167449951 - type: cosine_precision value: 57.485875706214685 - type: cosine_recall value: 85.95564941921859 - type: dot_accuracy value: 60.47644829453167 - type: dot_accuracy_threshold value: 36627.362060546875 - type: dot_ap value: 63.696303449293204 - type: dot_f1 value: 68.3986041101202 - type: dot_f1_threshold value: 30452.72216796875 - type: dot_precision value: 54.04411764705882 - type: dot_recall value: 93.13621964097149 - type: euclidean_accuracy value: 63.02111532214402 - type: euclidean_accuracy_threshold value: 1392.76762008667 - type: euclidean_ap value: 66.65907089443218 - type: euclidean_f1 value: 69.05036524413688 - type: euclidean_f1_threshold value: 1711.5310668945312 - type: euclidean_precision value: 54.29262394195889 - type: euclidean_recall value: 94.82576557550159 - type: main_score value: 63.02111532214402 - type: manhattan_accuracy value: 62.75040606388739 - type: manhattan_accuracy_threshold value: 32475.347900390625 - type: manhattan_ap value: 66.50943585125434 - type: manhattan_f1 value: 69.08382066276802 - type: manhattan_f1_threshold value: 41238.470458984375 - type: manhattan_precision value: 54.75896168108776 - type: manhattan_recall value: 93.55860612460401 - type: max_accuracy value: 63.02111532214402 - type: max_ap value: 66.65907089443218 - type: max_f1 value: 69.08382066276802 - type: max_precision value: 57.485875706214685 - type: max_recall value: 94.82576557550159 - type: similarity_accuracy value: 62.3714131023281 - type: similarity_accuracy_threshold value: 79.70921993255615 - type: similarity_ap value: 66.41380155495659 - type: similarity_f1 value: 68.89547185780786 - type: similarity_f1_threshold value: 72.91591167449951 - type: similarity_precision value: 57.485875706214685 - type: similarity_recall value: 85.95564941921859 task: type: PairClassification - dataset: config: default name: MTEB OnlineShopping (default) revision: e610f2ebd179a8fda30ae534c3878750a96db120 split: test type: C-MTEB/OnlineShopping-classification metrics: - type: accuracy value: 91.88000000000001 - type: ap value: 89.52463684448476 - type: ap_weighted value: 89.52463684448476 - type: f1 value: 91.86313022306673 - type: f1_weighted value: 91.87806318146912 - type: main_score value: 91.88000000000001 task: type: 
Classification - dataset: config: en name: MTEB OpusparcusPC (en) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test.full type: GEM/opusparcus metrics: - type: cosine_accuracy value: 92.65578635014838 - type: cosine_accuracy_threshold value: 74.02530312538147 - type: cosine_ap value: 98.3834226153613 - type: cosine_f1 value: 94.92567913890312 - type: cosine_f1_threshold value: 74.02530312538147 - type: cosine_precision value: 95.562435500516 - type: cosine_recall value: 94.29735234215886 - type: dot_accuracy value: 91.54302670623146 - type: dot_accuracy_threshold value: 34452.29187011719 - type: dot_ap value: 98.1237257754439 - type: dot_f1 value: 94.22400803616273 - type: dot_f1_threshold value: 33670.41931152344 - type: dot_precision value: 92.9633300297324 - type: dot_recall value: 95.5193482688391 - type: euclidean_accuracy value: 92.28486646884274 - type: euclidean_accuracy_threshold value: 1602.8022766113281 - type: euclidean_ap value: 98.3099021504706 - type: euclidean_f1 value: 94.75277497477296 - type: euclidean_f1_threshold value: 1604.7462463378906 - type: euclidean_precision value: 93.89999999999999 - type: euclidean_recall value: 95.62118126272912 - type: main_score value: 98.3834226153613 - type: manhattan_accuracy value: 92.2106824925816 - type: manhattan_accuracy_threshold value: 38872.90954589844 - type: manhattan_ap value: 98.28694101230218 - type: manhattan_f1 value: 94.67815509376584 - type: manhattan_f1_threshold value: 38872.90954589844 - type: manhattan_precision value: 94.24823410696267 - type: manhattan_recall value: 95.11201629327903 - type: max_accuracy value: 92.65578635014838 - type: max_ap value: 98.3834226153613 - type: max_f1 value: 94.92567913890312 - type: max_precision value: 95.562435500516 - type: max_recall value: 95.62118126272912 - type: similarity_accuracy value: 92.65578635014838 - type: similarity_accuracy_threshold value: 74.02530312538147 - type: similarity_ap value: 98.3834226153613 - type: similarity_f1 value: 94.92567913890312 - type: similarity_f1_threshold value: 74.02530312538147 - type: similarity_precision value: 95.562435500516 - type: similarity_recall value: 94.29735234215886 task: type: PairClassification - dataset: config: de name: MTEB OpusparcusPC (de) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test.full type: GEM/opusparcus metrics: - type: cosine_accuracy value: 87.72178850248403 - type: cosine_accuracy_threshold value: 73.33863377571106 - type: cosine_ap value: 96.98901408834976 - type: cosine_f1 value: 91.89944134078212 - type: cosine_f1_threshold value: 71.45810127258301 - type: cosine_precision value: 89.64577656675749 - type: cosine_recall value: 94.26934097421203 - type: dot_accuracy value: 86.30234208658624 - type: dot_accuracy_threshold value: 32027.130126953125 - type: dot_ap value: 96.12260574893256 - type: dot_f1 value: 91.31602506714414 - type: dot_f1_threshold value: 30804.376220703125 - type: dot_precision value: 85.93091828138164 - type: dot_recall value: 97.42120343839542 - type: euclidean_accuracy value: 87.9347054648687 - type: euclidean_accuracy_threshold value: 1609.6670150756836 - type: euclidean_ap value: 97.00238860358252 - type: euclidean_f1 value: 92.1089063221043 - type: euclidean_f1_threshold value: 1641.8487548828125 - type: euclidean_precision value: 89.10714285714286 - type: euclidean_recall value: 95.31996179560649 - type: main_score value: 97.00238860358252 - type: manhattan_accuracy value: 87.72178850248403 - type: manhattan_accuracy_threshold value: 40137.060546875 - 
type: manhattan_ap value: 96.98653728159941 - type: manhattan_f1 value: 92.03865623561896 - type: manhattan_f1_threshold value: 40137.060546875 - type: manhattan_precision value: 88.80994671403198 - type: manhattan_recall value: 95.51098376313276 - type: max_accuracy value: 87.9347054648687 - type: max_ap value: 97.00238860358252 - type: max_f1 value: 92.1089063221043 - type: max_precision value: 89.64577656675749 - type: max_recall value: 97.42120343839542 - type: similarity_accuracy value: 87.72178850248403 - type: similarity_accuracy_threshold value: 73.33863377571106 - type: similarity_ap value: 96.98901408834976 - type: similarity_f1 value: 91.89944134078212 - type: similarity_f1_threshold value: 71.45810127258301 - type: similarity_precision value: 89.64577656675749 - type: similarity_recall value: 94.26934097421203 task: type: PairClassification - dataset: config: fr name: MTEB OpusparcusPC (fr) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test.full type: GEM/opusparcus metrics: - type: cosine_accuracy value: 80.92643051771117 - type: cosine_accuracy_threshold value: 76.68856382369995 - type: cosine_ap value: 93.74622381534307 - type: cosine_f1 value: 87.12328767123287 - type: cosine_f1_threshold value: 71.64022922515869 - type: cosine_precision value: 80.64243448858834 - type: cosine_recall value: 94.73684210526315 - type: dot_accuracy value: 80.858310626703 - type: dot_accuracy_threshold value: 34028.3935546875 - type: dot_ap value: 91.18448457633308 - type: dot_f1 value: 86.82606657290202 - type: dot_f1_threshold value: 34028.3935546875 - type: dot_precision value: 82.2380106571936 - type: dot_recall value: 91.9563058589871 - type: euclidean_accuracy value: 80.858310626703 - type: euclidean_accuracy_threshold value: 1595.7651138305664 - type: euclidean_ap value: 93.8182717829648 - type: euclidean_f1 value: 87.04044117647058 - type: euclidean_f1_threshold value: 1609.2475891113281 - type: euclidean_precision value: 81.00940975192472 - type: euclidean_recall value: 94.04170804369414 - type: main_score value: 93.8182717829648 - type: manhattan_accuracy value: 80.99455040871935 - type: manhattan_accuracy_threshold value: 38092.132568359375 - type: manhattan_ap value: 93.77563401151711 - type: manhattan_f1 value: 86.91983122362869 - type: manhattan_f1_threshold value: 38092.132568359375 - type: manhattan_precision value: 82.32682060390763 - type: manhattan_recall value: 92.05561072492551 - type: max_accuracy value: 80.99455040871935 - type: max_ap value: 93.8182717829648 - type: max_f1 value: 87.12328767123287 - type: max_precision value: 82.32682060390763 - type: max_recall value: 94.73684210526315 - type: similarity_accuracy value: 80.92643051771117 - type: similarity_accuracy_threshold value: 76.68856382369995 - type: similarity_ap value: 93.74622381534307 - type: similarity_f1 value: 87.12328767123287 - type: similarity_f1_threshold value: 71.64022922515869 - type: similarity_precision value: 80.64243448858834 - type: similarity_recall value: 94.73684210526315 task: type: PairClassification - dataset: config: ru name: MTEB OpusparcusPC (ru) revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a split: test.full type: GEM/opusparcus metrics: - type: cosine_accuracy value: 76.83823529411765 - type: cosine_accuracy_threshold value: 72.70769476890564 - type: cosine_ap value: 89.56692049908222 - type: cosine_f1 value: 83.99832003359934 - type: cosine_f1_threshold value: 70.9052324295044 - type: cosine_precision value: 76.16146230007617 - type: cosine_recall value: 
93.63295880149812 - type: dot_accuracy value: 76.28676470588235 - type: dot_accuracy_threshold value: 33740.68908691406 - type: dot_ap value: 87.77185177141567 - type: dot_f1 value: 83.62251375370292 - type: dot_f1_threshold value: 32726.611328125 - type: dot_precision value: 76.29343629343629 - type: dot_recall value: 92.50936329588015 - type: euclidean_accuracy value: 77.32843137254902 - type: euclidean_accuracy_threshold value: 1566.510009765625 - type: euclidean_ap value: 89.60605626791111 - type: euclidean_f1 value: 84.06546080964686 - type: euclidean_f1_threshold value: 1576.4202117919922 - type: euclidean_precision value: 77.83094098883574 - type: euclidean_recall value: 91.38576779026218 - type: main_score value: 89.60605626791111 - type: manhattan_accuracy value: 76.89950980392157 - type: manhattan_accuracy_threshold value: 38202.215576171875 - type: manhattan_ap value: 89.55766894104868 - type: manhattan_f1 value: 83.80462724935732 - type: manhattan_f1_threshold value: 38934.375 - type: manhattan_precision value: 77.25118483412322 - type: manhattan_recall value: 91.57303370786516 - type: max_accuracy value: 77.32843137254902 - type: max_ap value: 89.60605626791111 - type: max_f1 value: 84.06546080964686 - type: max_precision value: 77.83094098883574 - type: max_recall value: 93.63295880149812 - type: similarity_accuracy value: 76.83823529411765 - type: similarity_accuracy_threshold value: 72.70769476890564 - type: similarity_ap value: 89.56692049908222 - type: similarity_f1 value: 83.99832003359934 - type: similarity_f1_threshold value: 70.9052324295044 - type: similarity_precision value: 76.16146230007617 - type: similarity_recall value: 93.63295880149812 task: type: PairClassification - dataset: config: default name: MTEB PAC (default) revision: fc69d1c153a8ccdcf1eef52f4e2a27f88782f543 split: test type: laugustyniak/abusive-clauses-pl metrics: - type: accuracy value: 68.39559803069794 - type: ap value: 77.68074206719457 - type: ap_weighted value: 77.68074206719457 - type: f1 value: 66.23485605467732 - type: f1_weighted value: 69.03201442129347 - type: main_score value: 68.39559803069794 task: type: Classification - dataset: config: default name: MTEB PAWSX (default) revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1 split: test type: C-MTEB/PAWSX metrics: - type: cosine_pearson value: 13.161523266433587 - type: cosine_spearman value: 15.557333873773386 - type: euclidean_pearson value: 17.147508431907525 - type: euclidean_spearman value: 15.664112857732146 - type: main_score value: 15.557333873773386 - type: manhattan_pearson value: 17.130875906264386 - type: manhattan_spearman value: 15.624397342229637 - type: pearson value: 13.161523266433587 - type: spearman value: 15.557333873773386 task: type: STS - dataset: config: default name: MTEB PSC (default) revision: d05a294af9e1d3ff2bfb6b714e08a24a6cabc669 split: test type: PL-MTEB/psc-pairclassification metrics: - type: cosine_accuracy value: 97.86641929499072 - type: cosine_accuracy_threshold value: 79.0391206741333 - type: cosine_ap value: 99.19403807771533 - type: cosine_f1 value: 96.45608628659475 - type: cosine_f1_threshold value: 79.0391206741333 - type: cosine_precision value: 97.50778816199377 - type: cosine_recall value: 95.42682926829268 - type: dot_accuracy value: 98.14471243042672 - type: dot_accuracy_threshold value: 29808.1787109375 - type: dot_ap value: 99.331999859971 - type: dot_f1 value: 97.01492537313433 - type: dot_f1_threshold value: 29808.1787109375 - type: dot_precision value: 95.02923976608187 - type: 
dot_recall value: 99.08536585365853 - type: euclidean_accuracy value: 97.49536178107606 - type: euclidean_accuracy_threshold value: 1276.227855682373 - type: euclidean_ap value: 98.91056467717377 - type: euclidean_f1 value: 95.83975346687212 - type: euclidean_f1_threshold value: 1276.227855682373 - type: euclidean_precision value: 96.88473520249221 - type: euclidean_recall value: 94.8170731707317 - type: main_score value: 99.331999859971 - type: manhattan_accuracy value: 97.49536178107606 - type: manhattan_accuracy_threshold value: 31097.674560546875 - type: manhattan_ap value: 98.95694691792707 - type: manhattan_f1 value: 95.83975346687212 - type: manhattan_f1_threshold value: 31097.674560546875 - type: manhattan_precision value: 96.88473520249221 - type: manhattan_recall value: 94.8170731707317 - type: max_accuracy value: 98.14471243042672 - type: max_ap value: 99.331999859971 - type: max_f1 value: 97.01492537313433 - type: max_precision value: 97.50778816199377 - type: max_recall value: 99.08536585365853 - type: similarity_accuracy value: 97.86641929499072 - type: similarity_accuracy_threshold value: 79.0391206741333 - type: similarity_ap value: 99.19403807771533 - type: similarity_f1 value: 96.45608628659475 - type: similarity_f1_threshold value: 79.0391206741333 - type: similarity_precision value: 97.50778816199377 - type: similarity_recall value: 95.42682926829268 task: type: PairClassification - dataset: config: en name: MTEB PawsXPairClassification (en) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: google-research-datasets/paws-x metrics: - type: cosine_accuracy value: 61.8 - type: cosine_accuracy_threshold value: 99.5664119720459 - type: cosine_ap value: 60.679317786040585 - type: cosine_f1 value: 63.17354143441101 - type: cosine_f1_threshold value: 97.22164869308472 - type: cosine_precision value: 47.6457399103139 - type: cosine_recall value: 93.71554575523705 - type: dot_accuracy value: 55.7 - type: dot_accuracy_threshold value: 48353.62548828125 - type: dot_ap value: 48.53805970536875 - type: dot_f1 value: 62.42214532871972 - type: dot_f1_threshold value: 38215.53955078125 - type: dot_precision value: 45.48663640948058 - type: dot_recall value: 99.44873208379272 - type: euclidean_accuracy value: 61.75000000000001 - type: euclidean_accuracy_threshold value: 189.0761137008667 - type: euclidean_ap value: 60.55517418691518 - type: euclidean_f1 value: 63.07977736549165 - type: euclidean_f1_threshold value: 504.3168067932129 - type: euclidean_precision value: 47.53914988814318 - type: euclidean_recall value: 93.71554575523705 - type: main_score value: 60.679317786040585 - type: manhattan_accuracy value: 61.9 - type: manhattan_accuracy_threshold value: 4695.778274536133 - type: manhattan_ap value: 60.48686620413608 - type: manhattan_f1 value: 62.92880855772778 - type: manhattan_f1_threshold value: 12542.36831665039 - type: manhattan_precision value: 47.28381374722838 - type: manhattan_recall value: 94.04630650496141 - type: max_accuracy value: 61.9 - type: max_ap value: 60.679317786040585 - type: max_f1 value: 63.17354143441101 - type: max_precision value: 47.6457399103139 - type: max_recall value: 99.44873208379272 - type: similarity_accuracy value: 61.8 - type: similarity_accuracy_threshold value: 99.5664119720459 - type: similarity_ap value: 60.679317786040585 - type: similarity_f1 value: 63.17354143441101 - type: similarity_f1_threshold value: 97.22164869308472 - type: similarity_precision value: 47.6457399103139 - type: similarity_recall value: 
93.71554575523705 task: type: PairClassification - dataset: config: de name: MTEB PawsXPairClassification (de) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: google-research-datasets/paws-x metrics: - type: cosine_accuracy value: 60.25 - type: cosine_accuracy_threshold value: 99.54338073730469 - type: cosine_ap value: 56.7863613689054 - type: cosine_f1 value: 62.23499820337766 - type: cosine_f1_threshold value: 89.95014429092407 - type: cosine_precision value: 45.86864406779661 - type: cosine_recall value: 96.75977653631284 - type: dot_accuracy value: 56.8 - type: dot_accuracy_threshold value: 47349.78332519531 - type: dot_ap value: 49.7857806061729 - type: dot_f1 value: 62.31225986727209 - type: dot_f1_threshold value: 30143.206787109375 - type: dot_precision value: 45.32520325203252 - type: dot_recall value: 99.66480446927373 - type: euclidean_accuracy value: 60.3 - type: euclidean_accuracy_threshold value: 219.78106498718262 - type: euclidean_ap value: 56.731544327179606 - type: euclidean_f1 value: 62.19895287958115 - type: euclidean_f1_threshold value: 1792.1623229980469 - type: euclidean_precision value: 45.22842639593909 - type: euclidean_recall value: 99.55307262569832 - type: main_score value: 56.7863613689054 - type: manhattan_accuracy value: 60.150000000000006 - type: manhattan_accuracy_threshold value: 5104.503631591797 - type: manhattan_ap value: 56.70304479768734 - type: manhattan_f1 value: 62.22067039106145 - type: manhattan_f1_threshold value: 42839.471435546875 - type: manhattan_precision value: 45.2513966480447 - type: manhattan_recall value: 99.55307262569832 - type: max_accuracy value: 60.3 - type: max_ap value: 56.7863613689054 - type: max_f1 value: 62.31225986727209 - type: max_precision value: 45.86864406779661 - type: max_recall value: 99.66480446927373 - type: similarity_accuracy value: 60.25 - type: similarity_accuracy_threshold value: 99.54338073730469 - type: similarity_ap value: 56.7863613689054 - type: similarity_f1 value: 62.23499820337766 - type: similarity_f1_threshold value: 89.95014429092407 - type: similarity_precision value: 45.86864406779661 - type: similarity_recall value: 96.75977653631284 task: type: PairClassification - dataset: config: es name: MTEB PawsXPairClassification (es) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: google-research-datasets/paws-x metrics: - type: cosine_accuracy value: 59.699999999999996 - type: cosine_accuracy_threshold value: 99.55930709838867 - type: cosine_ap value: 57.31662248806265 - type: cosine_f1 value: 62.444061962134256 - type: cosine_f1_threshold value: 74.75898265838623 - type: cosine_precision value: 45.3953953953954 - type: cosine_recall value: 100.0 - type: dot_accuracy value: 55.900000000000006 - type: dot_accuracy_threshold value: 47512.90283203125 - type: dot_ap value: 49.39339147787568 - type: dot_f1 value: 62.487082328625554 - type: dot_f1_threshold value: 34989.03503417969 - type: dot_precision value: 45.44088176352705 - type: dot_recall value: 100.0 - type: euclidean_accuracy value: 59.599999999999994 - type: euclidean_accuracy_threshold value: 200.82547664642334 - type: euclidean_ap value: 57.19737488445163 - type: euclidean_f1 value: 62.444061962134256 - type: euclidean_f1_threshold value: 1538.8837814331055 - type: euclidean_precision value: 45.3953953953954 - type: euclidean_recall value: 100.0 - type: main_score value: 57.31662248806265 - type: manhattan_accuracy value: 59.550000000000004 - type: manhattan_accuracy_threshold value: 5016.501617431641 - 
type: manhattan_ap value: 57.089959907945065 - type: manhattan_f1 value: 62.444061962134256 - type: manhattan_f1_threshold value: 37523.53515625 - type: manhattan_precision value: 45.3953953953954 - type: manhattan_recall value: 100.0 - type: max_accuracy value: 59.699999999999996 - type: max_ap value: 57.31662248806265 - type: max_f1 value: 62.487082328625554 - type: max_precision value: 45.44088176352705 - type: max_recall value: 100.0 - type: similarity_accuracy value: 59.699999999999996 - type: similarity_accuracy_threshold value: 99.55930709838867 - type: similarity_ap value: 57.31662248806265 - type: similarity_f1 value: 62.444061962134256 - type: similarity_f1_threshold value: 74.75898265838623 - type: similarity_precision value: 45.3953953953954 - type: similarity_recall value: 100.0 task: type: PairClassification - dataset: config: fr name: MTEB PawsXPairClassification (fr) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: google-research-datasets/paws-x metrics: - type: cosine_accuracy value: 61.150000000000006 - type: cosine_accuracy_threshold value: 99.36153888702393 - type: cosine_ap value: 59.43845317938599 - type: cosine_f1 value: 62.51298026998961 - type: cosine_f1_threshold value: 76.77866220474243 - type: cosine_precision value: 45.468277945619334 - type: cosine_recall value: 100.0 - type: dot_accuracy value: 55.75 - type: dot_accuracy_threshold value: 48931.55212402344 - type: dot_ap value: 50.15949290538757 - type: dot_f1 value: 62.53462603878117 - type: dot_f1_threshold value: 34415.7958984375 - type: dot_precision value: 45.4911838790932 - type: dot_recall value: 100.0 - type: euclidean_accuracy value: 61.050000000000004 - type: euclidean_accuracy_threshold value: 240.8097267150879 - type: euclidean_ap value: 59.367971294226216 - type: euclidean_f1 value: 62.51298026998961 - type: euclidean_f1_threshold value: 1444.132423400879 - type: euclidean_precision value: 45.468277945619334 - type: euclidean_recall value: 100.0 - type: main_score value: 59.43845317938599 - type: manhattan_accuracy value: 60.95 - type: manhattan_accuracy_threshold value: 5701.206207275391 - type: manhattan_ap value: 59.30094096378774 - type: manhattan_f1 value: 62.53462603878117 - type: manhattan_f1_threshold value: 33445.672607421875 - type: manhattan_precision value: 45.4911838790932 - type: manhattan_recall value: 100.0 - type: max_accuracy value: 61.150000000000006 - type: max_ap value: 59.43845317938599 - type: max_f1 value: 62.53462603878117 - type: max_precision value: 45.4911838790932 - type: max_recall value: 100.0 - type: similarity_accuracy value: 61.150000000000006 - type: similarity_accuracy_threshold value: 99.36153888702393 - type: similarity_ap value: 59.43845317938599 - type: similarity_f1 value: 62.51298026998961 - type: similarity_f1_threshold value: 76.77866220474243 - type: similarity_precision value: 45.468277945619334 - type: similarity_recall value: 100.0 task: type: PairClassification - dataset: config: zh name: MTEB PawsXPairClassification (zh) revision: 8a04d940a42cd40658986fdd8e3da561533a3646 split: test type: google-research-datasets/paws-x metrics: - type: cosine_accuracy value: 58.85 - type: cosine_accuracy_threshold value: 99.73838329315186 - type: cosine_ap value: 54.66913160570546 - type: cosine_f1 value: 62.32136632973162 - type: cosine_f1_threshold value: 76.4499306678772 - type: cosine_precision value: 45.265822784810126 - type: cosine_recall value: 100.0 - type: dot_accuracy value: 56.25 - type: dot_accuracy_threshold value: 
47351.9287109375 - type: dot_ap value: 48.5266232989438 - type: dot_f1 value: 62.277951933124356 - type: dot_f1_threshold value: 31325.28076171875 - type: dot_precision value: 45.220030349013655 - type: dot_recall value: 100.0 - type: euclidean_accuracy value: 58.9 - type: euclidean_accuracy_threshold value: 144.24468278884888 - type: euclidean_ap value: 54.66981490353506 - type: euclidean_f1 value: 62.32136632973162 - type: euclidean_f1_threshold value: 1484.908676147461 - type: euclidean_precision value: 45.265822784810126 - type: euclidean_recall value: 100.0 - type: main_score value: 54.66981490353506 - type: manhattan_accuracy value: 58.9 - type: manhattan_accuracy_threshold value: 3586.785125732422 - type: manhattan_ap value: 54.668355260247736 - type: manhattan_f1 value: 62.32136632973162 - type: manhattan_f1_threshold value: 36031.22863769531 - type: manhattan_precision value: 45.265822784810126 - type: manhattan_recall value: 100.0 - type: max_accuracy value: 58.9 - type: max_ap value: 54.66981490353506 - type: max_f1 value: 62.32136632973162 - type: max_precision value: 45.265822784810126 - type: max_recall value: 100.0 - type: similarity_accuracy value: 58.85 - type: similarity_accuracy_threshold value: 99.73838329315186 - type: similarity_ap value: 54.66913160570546 - type: similarity_f1 value: 62.32136632973162 - type: similarity_f1_threshold value: 76.4499306678772 - type: similarity_precision value: 45.265822784810126 - type: similarity_recall value: 100.0 task: type: PairClassification - dataset: config: default name: MTEB PolEmo2.0-IN (default) revision: d90724373c70959f17d2331ad51fb60c71176b03 split: test type: PL-MTEB/polemo2_in metrics: - type: accuracy value: 83.75346260387812 - type: f1 value: 81.98304891214909 - type: f1_weighted value: 84.29623200830078 - type: main_score value: 83.75346260387812 task: type: Classification - dataset: config: default name: MTEB PolEmo2.0-OUT (default) revision: 6a21ab8716e255ab1867265f8b396105e8aa63d4 split: test type: PL-MTEB/polemo2_out metrics: - type: accuracy value: 66.53846153846153 - type: f1 value: 52.71826064368638 - type: f1_weighted value: 69.10010124630334 - type: main_score value: 66.53846153846153 task: type: Classification - dataset: config: default name: MTEB PPC revision: None split: test type: PL-MTEB/ppc-pairclassification metrics: - type: cosine_accuracy value: 81.8 - type: cosine_accuracy_threshold value: 90.47793745994568 - type: cosine_ap value: 91.42490266080884 - type: cosine_f1 value: 85.4632587859425 - type: cosine_f1_threshold value: 90.47793745994568 - type: cosine_precision value: 82.56172839506173 - type: cosine_recall value: 88.57615894039735 - type: dot_accuracy value: 74.6 - type: dot_accuracy_threshold value: 42102.23693847656 - type: dot_ap value: 86.20060009096979 - type: dot_f1 value: 80.02842928216063 - type: dot_f1_threshold value: 38970.16906738281 - type: dot_precision value: 70.1120797011208 - type: dot_recall value: 93.21192052980133 - type: euclidean_accuracy value: 81.5 - type: euclidean_accuracy_threshold value: 880.433464050293 - type: euclidean_ap value: 91.33143477982087 - type: euclidean_f1 value: 85.44600938967135 - type: euclidean_f1_threshold value: 964.0384674072266 - type: euclidean_precision value: 81.00890207715133 - type: euclidean_recall value: 90.39735099337747 - type: main_score value: 91.42490266080884 - type: manhattan_accuracy value: 81.3 - type: manhattan_accuracy_threshold value: 22100.830078125 - type: manhattan_ap value: 91.25996158651282 - type: manhattan_f1 value: 
85.38102643856921 - type: manhattan_f1_threshold value: 24043.515014648438 - type: manhattan_precision value: 80.49853372434018 - type: manhattan_recall value: 90.89403973509934 - type: max_accuracy value: 81.8 - type: max_ap value: 91.42490266080884 - type: max_f1 value: 85.4632587859425 - type: max_precision value: 82.56172839506173 - type: max_recall value: 93.21192052980133 - type: similarity_accuracy value: 81.8 - type: similarity_accuracy_threshold value: 90.47793745994568 - type: similarity_ap value: 91.42490266080884 - type: similarity_f1 value: 85.4632587859425 - type: similarity_f1_threshold value: 90.47793745994568 - type: similarity_precision value: 82.56172839506173 - type: similarity_recall value: 88.57615894039735 task: type: PairClassification - dataset: config: default name: MTEB QuoraRetrieval (default) revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 split: test type: mteb/quora metrics: - type: map_at_1 value: 71.419 - type: map_at_10 value: 85.542 - type: map_at_100 value: 86.161 - type: map_at_1000 value: 86.175 - type: map_at_20 value: 85.949 - type: map_at_3 value: 82.623 - type: map_at_5 value: 84.5 - type: mrr_at_1 value: 82.27 - type: mrr_at_10 value: 88.21900000000001 - type: mrr_at_100 value: 88.313 - type: mrr_at_1000 value: 88.31400000000001 - type: mrr_at_20 value: 88.286 - type: mrr_at_3 value: 87.325 - type: mrr_at_5 value: 87.97500000000001 - type: ndcg_at_1 value: 82.3 - type: ndcg_at_10 value: 89.088 - type: ndcg_at_100 value: 90.217 - type: ndcg_at_1000 value: 90.29700000000001 - type: ndcg_at_20 value: 89.697 - type: ndcg_at_3 value: 86.435 - type: ndcg_at_5 value: 87.966 - type: precision_at_1 value: 82.3 - type: precision_at_10 value: 13.527000000000001 - type: precision_at_100 value: 1.537 - type: precision_at_1000 value: 0.157 - type: precision_at_20 value: 7.165000000000001 - type: precision_at_3 value: 37.92 - type: precision_at_5 value: 24.914 - type: recall_at_1 value: 71.419 - type: recall_at_10 value: 95.831 - type: recall_at_100 value: 99.64 - type: recall_at_1000 value: 99.988 - type: recall_at_20 value: 97.76599999999999 - type: recall_at_3 value: 88.081 - type: recall_at_5 value: 92.50500000000001 - type: main_score value: 89.088 task: type: Retrieval - dataset: config: default name: MTEB RUParaPhraserSTS (default) revision: 43265056790b8f7c59e0139acb4be0a8dad2c8f4 split: test type: merionum/ru_paraphraser metrics: - type: cosine_pearson value: 67.91177744712421 - type: cosine_spearman value: 76.77113726753656 - type: euclidean_pearson value: 73.81454206068638 - type: euclidean_spearman value: 76.92529493599028 - type: main_score value: 76.77113726753656 - type: manhattan_pearson value: 73.81690454439168 - type: manhattan_spearman value: 76.87333776705002 - type: pearson value: 67.91177744712421 - type: spearman value: 76.77113726753656 task: type: STS - dataset: config: default name: MTEB RedditClustering (default) revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: main_score value: 55.39924225216962 - type: v_measure value: 55.39924225216962 - type: v_measure_std value: 4.723802279292467 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P (default) revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: main_score value: 62.87465161304012 - type: v_measure value: 62.87465161304012 - type: v_measure_std value: 12.082670914488473 task: type: Clustering - dataset: config: default name: MTEB 
RiaNewsRetrieval (default) revision: 82374b0bbacda6114f39ff9c5b925fa1512ca5d7 split: test type: ai-forever/ria-news-retrieval metrics: - type: main_score value: 79.209 - type: map_at_1 value: 67.33 - type: map_at_10 value: 75.633 - type: map_at_100 value: 75.897 - type: map_at_1000 value: 75.907 - type: map_at_20 value: 75.804 - type: map_at_3 value: 74.2 - type: map_at_5 value: 75.13300000000001 - type: mrr_at_1 value: 67.31 - type: mrr_at_10 value: 75.62709126984095 - type: mrr_at_100 value: 75.89105697041113 - type: mrr_at_1000 value: 75.90115653883124 - type: mrr_at_20 value: 75.79802332308172 - type: mrr_at_3 value: 74.19499999999961 - type: mrr_at_5 value: 75.12849999999939 - type: nauc_map_at_1000_diff1 value: 74.30304869630591 - type: nauc_map_at_1000_max value: 36.477146725784046 - type: nauc_map_at_1000_std value: -20.862772498461723 - type: nauc_map_at_100_diff1 value: 74.29833058090355 - type: nauc_map_at_100_max value: 36.483678619667884 - type: nauc_map_at_100_std value: -20.856274849980135 - type: nauc_map_at_10_diff1 value: 74.20729220697967 - type: nauc_map_at_10_max value: 36.56543146170092 - type: nauc_map_at_10_std value: -20.991081015484728 - type: nauc_map_at_1_diff1 value: 77.38899022125185 - type: nauc_map_at_1_max value: 32.45918619669731 - type: nauc_map_at_1_std value: -22.149586336167324 - type: nauc_map_at_20_diff1 value: 74.2447573558587 - type: nauc_map_at_20_max value: 36.50383130240387 - type: nauc_map_at_20_std value: -20.87013743041831 - type: nauc_map_at_3_diff1 value: 74.3054577294586 - type: nauc_map_at_3_max value: 36.484530586652724 - type: nauc_map_at_3_std value: -21.90543024607988 - type: nauc_map_at_5_diff1 value: 74.21062368961503 - type: nauc_map_at_5_max value: 36.55670532498779 - type: nauc_map_at_5_std value: -21.488786900676942 - type: nauc_mrr_at_1000_diff1 value: 74.31619177956684 - type: nauc_mrr_at_1000_max value: 36.53498918453189 - type: nauc_mrr_at_1000_std value: -20.75986704931237 - type: nauc_mrr_at_100_diff1 value: 74.31146790382356 - type: nauc_mrr_at_100_max value: 36.54149252857106 - type: nauc_mrr_at_100_std value: -20.75341959250079 - type: nauc_mrr_at_10_diff1 value: 74.22027806145095 - type: nauc_mrr_at_10_max value: 36.622542969971725 - type: nauc_mrr_at_10_std value: -20.889417384064117 - type: nauc_mrr_at_1_diff1 value: 77.4306709551449 - type: nauc_mrr_at_1_max value: 32.57259463438259 - type: nauc_mrr_at_1_std value: -21.964402859613937 - type: nauc_mrr_at_20_diff1 value: 74.25784396230718 - type: nauc_mrr_at_20_max value: 36.561412224507336 - type: nauc_mrr_at_20_std value: -20.767665000065723 - type: nauc_mrr_at_3_diff1 value: 74.31423253547214 - type: nauc_mrr_at_3_max value: 36.537745749488906 - type: nauc_mrr_at_3_std value: -21.81259529019546 - type: nauc_mrr_at_5_diff1 value: 74.22404613312771 - type: nauc_mrr_at_5_max value: 36.60743768455219 - type: nauc_mrr_at_5_std value: -21.39479216331971 - type: nauc_ndcg_at_1000_diff1 value: 73.48182819705742 - type: nauc_ndcg_at_1000_max value: 37.86991608461793 - type: nauc_ndcg_at_1000_std value: -19.021499322688904 - type: nauc_ndcg_at_100_diff1 value: 73.34941250585759 - type: nauc_ndcg_at_100_max value: 38.11150275625829 - type: nauc_ndcg_at_100_std value: -18.70624087206104 - type: nauc_ndcg_at_10_diff1 value: 72.82520265115987 - type: nauc_ndcg_at_10_max value: 38.43323357650525 - type: nauc_ndcg_at_10_std value: -19.410953792830878 - type: nauc_ndcg_at_1_diff1 value: 77.38899022125185 - type: nauc_ndcg_at_1_max value: 32.45918619669731 - type: 
nauc_ndcg_at_1_std value: -22.149586336167324 - type: nauc_ndcg_at_20_diff1 value: 72.93309285256507 - type: nauc_ndcg_at_20_max value: 38.217372819067755 - type: nauc_ndcg_at_20_std value: -18.864113576359333 - type: nauc_ndcg_at_3_diff1 value: 73.18253776744112 - type: nauc_ndcg_at_3_max value: 38.008109328364 - type: nauc_ndcg_at_3_std value: -21.68785687594153 - type: nauc_ndcg_at_5_diff1 value: 72.90474739784793 - type: nauc_ndcg_at_5_max value: 38.29483039202184 - type: nauc_ndcg_at_5_std value: -20.833049811453474 - type: nauc_precision_at_1000_diff1 value: 59.306217613750334 - type: nauc_precision_at_1000_max value: 72.20747948302262 - type: nauc_precision_at_1000_std value: 45.58837180096227 - type: nauc_precision_at_100_diff1 value: 62.87286844562389 - type: nauc_precision_at_100_max value: 61.33108214045868 - type: nauc_precision_at_100_std value: 20.67481963545654 - type: nauc_precision_at_10_diff1 value: 64.11222984256685 - type: nauc_precision_at_10_max value: 50.323697746037496 - type: nauc_precision_at_10_std value: -7.9994544634332625 - type: nauc_precision_at_1_diff1 value: 77.38899022125185 - type: nauc_precision_at_1_max value: 32.45918619669731 - type: nauc_precision_at_1_std value: -22.149586336167324 - type: nauc_precision_at_20_diff1 value: 62.30228127286973 - type: nauc_precision_at_20_max value: 52.02090746208407 - type: nauc_precision_at_20_std value: 0.7629898806370331 - type: nauc_precision_at_3_diff1 value: 68.82856645994157 - type: nauc_precision_at_3_max value: 43.94171571306625 - type: nauc_precision_at_3_std value: -20.78595255410148 - type: nauc_precision_at_5_diff1 value: 66.62157622497887 - type: nauc_precision_at_5_max value: 46.69398173603811 - type: nauc_precision_at_5_std value: -17.412423571163057 - type: nauc_recall_at_1000_diff1 value: 59.30621761375148 - type: nauc_recall_at_1000_max value: 72.20747948302191 - type: nauc_recall_at_1000_std value: 45.588371800962655 - type: nauc_recall_at_100_diff1 value: 62.872868445623894 - type: nauc_recall_at_100_max value: 61.33108214045813 - type: nauc_recall_at_100_std value: 20.67481963545666 - type: nauc_recall_at_10_diff1 value: 64.11222984256698 - type: nauc_recall_at_10_max value: 50.32369774603755 - type: nauc_recall_at_10_std value: -7.999454463433321 - type: nauc_recall_at_1_diff1 value: 77.38899022125185 - type: nauc_recall_at_1_max value: 32.45918619669731 - type: nauc_recall_at_1_std value: -22.149586336167324 - type: nauc_recall_at_20_diff1 value: 62.3022812728695 - type: nauc_recall_at_20_max value: 52.02090746208397 - type: nauc_recall_at_20_std value: 0.7629898806369458 - type: nauc_recall_at_3_diff1 value: 68.82856645994157 - type: nauc_recall_at_3_max value: 43.94171571306612 - type: nauc_recall_at_3_std value: -20.78595255410157 - type: nauc_recall_at_5_diff1 value: 66.62157622497897 - type: nauc_recall_at_5_max value: 46.693981736038246 - type: nauc_recall_at_5_std value: -17.412423571162954 - type: ndcg_at_1 value: 67.33 - type: ndcg_at_10 value: 79.209 - type: ndcg_at_100 value: 80.463 - type: ndcg_at_1000 value: 80.74799999999999 - type: ndcg_at_20 value: 79.81899999999999 - type: ndcg_at_3 value: 76.335 - type: ndcg_at_5 value: 78.011 - type: precision_at_1 value: 67.33 - type: precision_at_10 value: 9.020999999999999 - type: precision_at_100 value: 0.96 - type: precision_at_1000 value: 0.098 - type: precision_at_20 value: 4.63 - type: precision_at_3 value: 27.493000000000002 - type: precision_at_5 value: 17.308 - type: recall_at_1 value: 67.33 - type: recall_at_10 value: 
90.21000000000001 - type: recall_at_100 value: 96.00999999999999 - type: recall_at_1000 value: 98.29 - type: recall_at_20 value: 92.60000000000001 - type: recall_at_3 value: 82.48 - type: recall_at_5 value: 86.53999999999999 task: type: Retrieval - dataset: config: default name: MTEB RuBQReranking (default) revision: 2e96b8f098fa4b0950fc58eacadeb31c0d0c7fa2 split: test type: ai-forever/rubq-reranking metrics: - type: main_score value: 65.57453932493252 - type: map value: 65.57453932493252 - type: mrr value: 70.51408205663526 - type: nAUC_map_diff1 value: 26.69583260609023 - type: nAUC_map_max value: 12.928262749610663 - type: nAUC_map_std value: 11.702468857903128 - type: nAUC_mrr_diff1 value: 28.5206955462174 - type: nAUC_mrr_max value: 14.207162454694227 - type: nAUC_mrr_std value: 10.725721001555296 task: type: Reranking - dataset: config: default name: MTEB RuBQRetrieval (default) revision: e19b6ffa60b3bc248e0b41f4cc37c26a55c2a67b split: test type: ai-forever/rubq-retrieval metrics: - type: main_score value: 72.306 - type: map_at_1 value: 44.187 - type: map_at_10 value: 64.836 - type: map_at_100 value: 65.771 - type: map_at_1000 value: 65.8 - type: map_at_20 value: 65.497 - type: map_at_3 value: 59.692 - type: map_at_5 value: 63.105 - type: mrr_at_1 value: 62.23404255319149 - type: mrr_at_10 value: 73.40810161732159 - type: mrr_at_100 value: 73.67949305473395 - type: mrr_at_1000 value: 73.68707852294746 - type: mrr_at_20 value: 73.60429051697479 - type: mrr_at_3 value: 71.47360126083535 - type: mrr_at_5 value: 72.8447596532704 - type: nauc_map_at_1000_diff1 value: 39.838449035736886 - type: nauc_map_at_1000_max value: 32.29962306877408 - type: nauc_map_at_1000_std value: -6.324859592714388 - type: nauc_map_at_100_diff1 value: 39.824361938745426 - type: nauc_map_at_100_max value: 32.32055222704763 - type: nauc_map_at_100_std value: -6.301641111869559 - type: nauc_map_at_10_diff1 value: 39.50155328718487 - type: nauc_map_at_10_max value: 31.745730244960672 - type: nauc_map_at_10_std value: -6.867215137329693 - type: nauc_map_at_1_diff1 value: 47.66181128677822 - type: nauc_map_at_1_max value: 21.75204233166764 - type: nauc_map_at_1_std value: -8.06951079061697 - type: nauc_map_at_20_diff1 value: 39.78364637902108 - type: nauc_map_at_20_max value: 32.39065528029405 - type: nauc_map_at_20_std value: -6.368994332729006 - type: nauc_map_at_3_diff1 value: 39.51829474433183 - type: nauc_map_at_3_max value: 28.633292697821673 - type: nauc_map_at_3_std value: -7.2561170814963925 - type: nauc_map_at_5_diff1 value: 39.288433237676266 - type: nauc_map_at_5_max value: 31.007702201615515 - type: nauc_map_at_5_std value: -7.235131195162474 - type: nauc_mrr_at_1000_diff1 value: 49.599102391215226 - type: nauc_mrr_at_1000_max value: 38.25521825911133 - type: nauc_mrr_at_1000_std value: -10.448180939809435 - type: nauc_mrr_at_100_diff1 value: 49.5957067716212 - type: nauc_mrr_at_100_max value: 38.26760703964535 - type: nauc_mrr_at_100_std value: -10.438443051971081 - type: nauc_mrr_at_10_diff1 value: 49.35269710190271 - type: nauc_mrr_at_10_max value: 38.43782589127069 - type: nauc_mrr_at_10_std value: -10.404402063509815 - type: nauc_mrr_at_1_diff1 value: 53.32206103688421 - type: nauc_mrr_at_1_max value: 33.52402390241035 - type: nauc_mrr_at_1_std value: -12.73473393949936 - type: nauc_mrr_at_20_diff1 value: 49.550630850826636 - type: nauc_mrr_at_20_max value: 38.35964703941151 - type: nauc_mrr_at_20_std value: -10.444577766284766 - type: nauc_mrr_at_3_diff1 value: 49.12029127633829 - type: 
nauc_mrr_at_3_max value: 38.01631275124067 - type: nauc_mrr_at_3_std value: -10.523724301481309 - type: nauc_mrr_at_5_diff1 value: 49.04606949432458 - type: nauc_mrr_at_5_max value: 38.33647550077891 - type: nauc_mrr_at_5_std value: -10.47076409263114 - type: nauc_ndcg_at_1000_diff1 value: 41.342785916264226 - type: nauc_ndcg_at_1000_max value: 35.75731064862711 - type: nauc_ndcg_at_1000_std value: -5.45573422899229 - type: nauc_ndcg_at_100_diff1 value: 40.972974559636086 - type: nauc_ndcg_at_100_max value: 36.32938573321036 - type: nauc_ndcg_at_100_std value: -4.749631537590004 - type: nauc_ndcg_at_10_diff1 value: 39.67813474464166 - type: nauc_ndcg_at_10_max value: 35.480200504848966 - type: nauc_ndcg_at_10_std value: -6.318561293935512 - type: nauc_ndcg_at_1_diff1 value: 53.45970160222764 - type: nauc_ndcg_at_1_max value: 33.14759013278075 - type: nauc_ndcg_at_1_std value: -12.579833891774847 - type: nauc_ndcg_at_20_diff1 value: 40.67492861219249 - type: nauc_ndcg_at_20_max value: 36.84960799838019 - type: nauc_ndcg_at_20_std value: -5.202530835850179 - type: nauc_ndcg_at_3_diff1 value: 39.574906207408844 - type: nauc_ndcg_at_3_max value: 31.76512164509258 - type: nauc_ndcg_at_3_std value: -7.656143208565999 - type: nauc_ndcg_at_5_diff1 value: 39.096348529742095 - type: nauc_ndcg_at_5_max value: 34.075926475544165 - type: nauc_ndcg_at_5_std value: -7.238045445366631 - type: nauc_precision_at_1000_diff1 value: -14.283799754212609 - type: nauc_precision_at_1000_max value: 6.449741756717101 - type: nauc_precision_at_1000_std value: 4.862828679759048 - type: nauc_precision_at_100_diff1 value: -13.23173132700258 - type: nauc_precision_at_100_max value: 11.058898534529195 - type: nauc_precision_at_100_std value: 7.343683941814956 - type: nauc_precision_at_10_diff1 value: -7.202951643546464 - type: nauc_precision_at_10_max value: 17.499446869433278 - type: nauc_precision_at_10_std value: 2.8367985220406307 - type: nauc_precision_at_1_diff1 value: 53.45970160222764 - type: nauc_precision_at_1_max value: 33.14759013278075 - type: nauc_precision_at_1_std value: -12.579833891774847 - type: nauc_precision_at_20_diff1 value: -9.477122699154124 - type: nauc_precision_at_20_max value: 16.80556031564312 - type: nauc_precision_at_20_std value: 6.420218284416923 - type: nauc_precision_at_3_diff1 value: 5.5276143574150245 - type: nauc_precision_at_3_max value: 23.65952688481666 - type: nauc_precision_at_3_std value: -1.8730348729295785 - type: nauc_precision_at_5_diff1 value: -2.4537029093721308 - type: nauc_precision_at_5_max value: 21.41469327545133 - type: nauc_precision_at_5_std value: 0.1543890645722277 - type: nauc_recall_at_1000_diff1 value: -1.7474947956413491 - type: nauc_recall_at_1000_max value: 46.22670991970479 - type: nauc_recall_at_1000_std value: 62.582840705588794 - type: nauc_recall_at_100_diff1 value: 16.116089801097345 - type: nauc_recall_at_100_max value: 52.54794580975103 - type: nauc_recall_at_100_std value: 33.720245696003246 - type: nauc_recall_at_10_diff1 value: 23.134924318655482 - type: nauc_recall_at_10_max value: 38.73754275649077 - type: nauc_recall_at_10_std value: 0.6137471711639239 - type: nauc_recall_at_1_diff1 value: 47.66181128677822 - type: nauc_recall_at_1_max value: 21.75204233166764 - type: nauc_recall_at_1_std value: -8.06951079061697 - type: nauc_recall_at_20_diff1 value: 24.130616271355017 - type: nauc_recall_at_20_max value: 48.306178640146136 - type: nauc_recall_at_20_std value: 9.290819557000022 - type: nauc_recall_at_3_diff1 value: 29.767415016250226 - type: 
nauc_recall_at_3_max value: 28.54289782140701 - type: nauc_recall_at_3_std value: -5.1395675072005576 - type: nauc_recall_at_5_diff1 value: 25.410613126870174 - type: nauc_recall_at_5_max value: 33.24658754857624 - type: nauc_recall_at_5_std value: -4.211226036746632 - type: ndcg_at_1 value: 62.175000000000004 - type: ndcg_at_10 value: 72.306 - type: ndcg_at_100 value: 75.074 - type: ndcg_at_1000 value: 75.581 - type: ndcg_at_20 value: 73.875 - type: ndcg_at_3 value: 65.641 - type: ndcg_at_5 value: 69.48299999999999 - type: precision_at_1 value: 62.175000000000004 - type: precision_at_10 value: 13.907 - type: precision_at_100 value: 1.591 - type: precision_at_1000 value: 0.166 - type: precision_at_20 value: 7.446999999999999 - type: precision_at_3 value: 35.619 - type: precision_at_5 value: 24.917 - type: recall_at_1 value: 44.187 - type: recall_at_10 value: 85.10600000000001 - type: recall_at_100 value: 95.488 - type: recall_at_1000 value: 98.831 - type: recall_at_20 value: 90.22200000000001 - type: recall_at_3 value: 68.789 - type: recall_at_5 value: 77.85499999999999 task: type: Retrieval - dataset: config: default name: MTEB RuReviewsClassification (default) revision: f6d2c31f4dc6b88f468552750bfec05b4b41b05a split: test type: ai-forever/ru-reviews-classification metrics: - type: accuracy value: 67.5830078125 - type: f1 value: 67.56931936632446 - type: f1_weighted value: 67.57137733752779 - type: main_score value: 67.5830078125 task: type: Classification - dataset: config: default name: MTEB RuSTSBenchmarkSTS (default) revision: 7cf24f325c6da6195df55bef3d86b5e0616f3018 split: test type: ai-forever/ru-stsbenchmark-sts metrics: - type: cosine_pearson value: 85.90493484626788 - type: cosine_spearman value: 86.21965691667411 - type: euclidean_pearson value: 86.07499842984909 - type: euclidean_spearman value: 86.55506818735688 - type: main_score value: 86.21965691667411 - type: manhattan_pearson value: 85.95976420231729 - type: manhattan_spearman value: 86.48604243661234 - type: pearson value: 85.90493484626788 - type: spearman value: 86.21965691667411 task: type: STS - dataset: config: default name: MTEB RuSciBenchGRNTIClassification (default) revision: 673a610d6d3dd91a547a0d57ae1b56f37ebbf6a1 split: test type: ai-forever/ru-scibench-grnti-classification metrics: - type: accuracy value: 59.1943359375 - type: f1 value: 58.894480861440414 - type: f1_weighted value: 58.903615560240866 - type: main_score value: 59.1943359375 task: type: Classification - dataset: config: default name: MTEB RuSciBenchGRNTIClusteringP2P (default) revision: 673a610d6d3dd91a547a0d57ae1b56f37ebbf6a1 split: test type: ai-forever/ru-scibench-grnti-classification metrics: - type: main_score value: 57.99209448663228 - type: v_measure value: 57.99209448663228 - type: v_measure_std value: 1.0381163861993816 task: type: Clustering - dataset: config: default name: MTEB RuSciBenchOECDClassification (default) revision: 26c88e99dcaba32bb45d0e1bfc21902337f6d471 split: test type: ai-forever/ru-scibench-oecd-classification metrics: - type: accuracy value: 45.556640625 - type: f1 value: 45.159163104085906 - type: f1_weighted value: 45.16098316398626 - type: main_score value: 45.556640625 task: type: Classification - dataset: config: default name: MTEB RuSciBenchOECDClusteringP2P (default) revision: 26c88e99dcaba32bb45d0e1bfc21902337f6d471 split: test type: ai-forever/ru-scibench-oecd-classification metrics: - type: main_score value: 50.787548070488974 - type: v_measure value: 50.787548070488974 - type: v_measure_std value: 
0.8569958168946827 task: type: Clustering - dataset: config: default name: MTEB SCIDOCS (default) revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 split: test type: mteb/scidocs metrics: - type: map_at_1 value: 4.843 - type: map_at_10 value: 11.752 - type: map_at_100 value: 13.919 - type: map_at_1000 value: 14.198 - type: map_at_20 value: 12.898000000000001 - type: map_at_3 value: 8.603 - type: map_at_5 value: 10.069 - type: mrr_at_1 value: 23.799999999999997 - type: mrr_at_10 value: 34.449999999999996 - type: mrr_at_100 value: 35.64 - type: mrr_at_1000 value: 35.691 - type: mrr_at_20 value: 35.213 - type: mrr_at_3 value: 31.383 - type: mrr_at_5 value: 33.062999999999995 - type: ndcg_at_1 value: 23.799999999999997 - type: ndcg_at_10 value: 19.811 - type: ndcg_at_100 value: 28.108 - type: ndcg_at_1000 value: 33.1 - type: ndcg_at_20 value: 22.980999999999998 - type: ndcg_at_3 value: 19.153000000000002 - type: ndcg_at_5 value: 16.408 - type: precision_at_1 value: 23.799999999999997 - type: precision_at_10 value: 10.16 - type: precision_at_100 value: 2.1999999999999997 - type: precision_at_1000 value: 0.34099999999999997 - type: precision_at_20 value: 6.915 - type: precision_at_3 value: 17.8 - type: precision_at_5 value: 14.14 - type: recall_at_1 value: 4.843 - type: recall_at_10 value: 20.595 - type: recall_at_100 value: 44.66 - type: recall_at_1000 value: 69.152 - type: recall_at_20 value: 28.04 - type: recall_at_3 value: 10.833 - type: recall_at_5 value: 14.346999999999998 - type: main_score value: 19.811 task: type: Retrieval - dataset: config: default name: MTEB SICK-E-PL (default) revision: 71bba34b0ece6c56dfcf46d9758a27f7a90f17e9 split: test type: PL-MTEB/sicke-pl-pairclassification metrics: - type: cosine_accuracy value: 80.90093762739502 - type: cosine_accuracy_threshold value: 94.40930485725403 - type: cosine_ap value: 71.15400909912427 - type: cosine_f1 value: 66.8213457076566 - type: cosine_f1_threshold value: 91.53673648834229 - type: cosine_precision value: 62.4922504649721 - type: cosine_recall value: 71.7948717948718 - type: dot_accuracy value: 78.41418671015083 - type: dot_accuracy_threshold value: 42924.45068359375 - type: dot_ap value: 63.34003025365763 - type: dot_f1 value: 62.518258837277244 - type: dot_f1_threshold value: 40900.738525390625 - type: dot_precision value: 52.99653293709758 - type: dot_recall value: 76.21082621082621 - type: euclidean_accuracy value: 80.67672238075826 - type: euclidean_accuracy_threshold value: 696.0524559020996 - type: euclidean_ap value: 70.88762835990224 - type: euclidean_f1 value: 66.711051930759 - type: euclidean_f1_threshold value: 878.5581588745117 - type: euclidean_precision value: 62.625 - type: euclidean_recall value: 71.36752136752136 - type: main_score value: 71.15400909912427 - type: manhattan_accuracy value: 80.65633917651854 - type: manhattan_accuracy_threshold value: 17277.72674560547 - type: manhattan_ap value: 70.67105336611716 - type: manhattan_f1 value: 66.51346027577151 - type: manhattan_f1_threshold value: 21687.957763671875 - type: manhattan_precision value: 61.69305724725944 - type: manhattan_recall value: 72.15099715099716 - type: max_accuracy value: 80.90093762739502 - type: max_ap value: 71.15400909912427 - type: max_f1 value: 66.8213457076566 - type: max_precision value: 62.625 - type: max_recall value: 76.21082621082621 - type: similarity_accuracy value: 80.90093762739502 - type: similarity_accuracy_threshold value: 94.40930485725403 - type: similarity_ap value: 71.15400909912427 - type: similarity_f1 value: 
66.8213457076566 - type: similarity_f1_threshold value: 91.53673648834229 - type: similarity_precision value: 62.4922504649721 - type: similarity_recall value: 71.7948717948718 task: type: PairClassification - dataset: config: default name: MTEB SICK-R (default) revision: 20a6d6f312dd54037fe07a32d58e5e168867909d split: test type: mteb/sickr-sts metrics: - type: cosine_pearson value: 92.3339946866199 - type: cosine_spearman value: 89.61697355115497 - type: euclidean_pearson value: 90.3264916449669 - type: euclidean_spearman value: 89.36270451308866 - type: main_score value: 89.61697355115497 - type: manhattan_pearson value: 90.18909339052534 - type: manhattan_spearman value: 89.28337093097377 - type: pearson value: 92.3339946866199 - type: spearman value: 89.61697355115497 task: type: STS - dataset: config: default name: MTEB SICK-R-PL (default) revision: fd5c2441b7eeff8676768036142af4cfa42c1339 split: test type: PL-MTEB/sickr-pl-sts metrics: - type: cosine_pearson value: 85.27883048457821 - type: cosine_spearman value: 80.53204892678619 - type: euclidean_pearson value: 82.78520705216168 - type: euclidean_spearman value: 80.27848359873212 - type: main_score value: 80.53204892678619 - type: manhattan_pearson value: 82.63270640583454 - type: manhattan_spearman value: 80.21507977473146 - type: pearson value: 85.27883048457821 - type: spearman value: 80.53204892678619 task: type: STS - dataset: config: default name: MTEB SICKFr (default) revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a split: test type: Lajavaness/SICK-fr metrics: - type: cosine_pearson value: 88.77029361817212 - type: cosine_spearman value: 83.9453600346894 - type: euclidean_pearson value: 85.85331086208573 - type: euclidean_spearman value: 83.70852031985308 - type: main_score value: 83.9453600346894 - type: manhattan_pearson value: 85.66222265885914 - type: manhattan_spearman value: 83.60833111525962 - type: pearson value: 88.77029361817212 - type: spearman value: 83.9453600346894 task: type: STS - dataset: config: default name: MTEB STS12 (default) revision: a0d554a64d88156834ff5ae9920b964011b16384 split: test type: mteb/sts12-sts metrics: - type: cosine_pearson value: 88.76435859522375 - type: cosine_spearman value: 82.43768167804375 - type: euclidean_pearson value: 87.43566183874832 - type: euclidean_spearman value: 82.82166873757507 - type: main_score value: 82.43768167804375 - type: manhattan_pearson value: 87.39450871380951 - type: manhattan_spearman value: 82.89253043430163 - type: pearson value: 88.76435859522375 - type: spearman value: 82.43768167804375 task: type: STS - dataset: config: default name: MTEB STS13 (default) revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca split: test type: mteb/sts13-sts metrics: - type: cosine_pearson value: 88.86627241652141 - type: cosine_spearman value: 89.49011599120688 - type: euclidean_pearson value: 89.3314120073772 - type: euclidean_spearman value: 89.8226502776963 - type: main_score value: 89.49011599120688 - type: manhattan_pearson value: 89.2252179076963 - type: manhattan_spearman value: 89.74573844021225 - type: pearson value: 88.86627241652141 - type: spearman value: 89.49011599120688 task: type: STS - dataset: config: default name: MTEB STS14 (default) revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 split: test type: mteb/sts14-sts metrics: - type: cosine_pearson value: 87.22891405215968 - type: cosine_spearman value: 84.9467188157614 - type: euclidean_pearson value: 87.20330004726237 - type: euclidean_spearman value: 85.34806059461808 - type: main_score value: 
84.9467188157614 - type: manhattan_pearson value: 87.15224666107623 - type: manhattan_spearman value: 85.34596898699708 - type: pearson value: 87.22891405215968 - type: spearman value: 84.9467188157614 task: type: STS - dataset: config: default name: MTEB STS15 (default) revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 split: test type: mteb/sts15-sts metrics: - type: cosine_pearson value: 88.14066430111033 - type: cosine_spearman value: 89.31337445552545 - type: euclidean_pearson value: 89.08039335366983 - type: euclidean_spearman value: 89.6658762856415 - type: main_score value: 89.31337445552545 - type: manhattan_pearson value: 89.08057438154486 - type: manhattan_spearman value: 89.68673984203022 - type: pearson value: 88.14066430111033 - type: spearman value: 89.31337445552545 task: type: STS - dataset: config: default name: MTEB STS16 (default) revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 split: test type: mteb/sts16-sts metrics: - type: cosine_pearson value: 85.14908856657084 - type: cosine_spearman value: 86.84648320786727 - type: euclidean_pearson value: 86.11454713131947 - type: euclidean_spearman value: 86.77738862047961 - type: main_score value: 86.84648320786727 - type: manhattan_pearson value: 86.07804821916372 - type: manhattan_spearman value: 86.78676064310474 - type: pearson value: 85.14908856657084 - type: spearman value: 86.84648320786727 task: type: STS - dataset: config: en-en name: MTEB STS17 (en-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 89.61633502468356 - type: cosine_spearman value: 89.99772663224805 - type: euclidean_pearson value: 90.14056501501044 - type: euclidean_spearman value: 90.04496896837503 - type: main_score value: 89.99772663224805 - type: manhattan_pearson value: 90.08964860311801 - type: manhattan_spearman value: 90.00091712362196 - type: pearson value: 89.61633502468356 - type: spearman value: 89.99772663224805 task: type: STS - dataset: config: es-en name: MTEB STS17 (es-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 86.44548026840202 - type: cosine_spearman value: 87.26263108768539 - type: euclidean_pearson value: 86.42844593583838 - type: euclidean_spearman value: 86.89388428664364 - type: main_score value: 87.26263108768539 - type: manhattan_pearson value: 86.47186940800881 - type: manhattan_spearman value: 87.02163091089946 - type: pearson value: 86.44548026840202 - type: spearman value: 87.26263108768539 task: type: STS - dataset: config: en-de name: MTEB STS17 (en-de) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 87.89345132532758 - type: cosine_spearman value: 87.96246221327699 - type: euclidean_pearson value: 88.49013032701419 - type: euclidean_spearman value: 87.81981265317344 - type: main_score value: 87.96246221327699 - type: manhattan_pearson value: 88.31360914178538 - type: manhattan_spearman value: 87.62734530005075 - type: pearson value: 87.89345132532758 - type: spearman value: 87.96246221327699 task: type: STS - dataset: config: es-es name: MTEB STS17 (es-es) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 88.4084678497171 - type: cosine_spearman value: 88.77640638748285 - type: euclidean_pearson value: 89.60124312475843 - type: euclidean_spearman value: 
88.4321442688528 - type: main_score value: 88.77640638748285 - type: manhattan_pearson value: 89.62375118021299 - type: manhattan_spearman value: 88.46998118661577 - type: pearson value: 88.4084678497171 - type: spearman value: 88.77640638748285 task: type: STS - dataset: config: fr-en name: MTEB STS17 (fr-en) revision: faeb762787bd10488a50c8b5be4a3b82e411949c split: test type: mteb/sts17-crosslingual-sts metrics: - type: cosine_pearson value: 87.30688801326498 - type: cosine_spearman value: 87.55684697258378 - type: euclidean_pearson value: 87.89672951056794 - type: euclidean_spearman value: 87.28050429201674 - type: main_score value: 87.55684697258378 - type: manhattan_pearson value: 87.74292745320572 - type: manhattan_spearman value: 87.16383993876582 - type: pearson value: 87.30688801326498 - type: spearman value: 87.55684697258378 task: type: STS - dataset: config: zh-en name: MTEB STS22 (zh-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 73.46180375170147 - type: cosine_spearman value: 73.39559590127081 - type: euclidean_pearson value: 73.72613901293681 - type: euclidean_spearman value: 71.85465165176795 - type: main_score value: 73.39559590127081 - type: manhattan_pearson value: 73.07859140869076 - type: manhattan_spearman value: 71.22047343718893 - type: pearson value: 73.46180375170147 - type: spearman value: 73.39559590127081 task: type: STS - dataset: config: zh name: MTEB STS22 (zh) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 62.47531620842637 - type: cosine_spearman value: 66.22504667157702 - type: euclidean_pearson value: 66.76201254783692 - type: euclidean_spearman value: 66.86115760269463 - type: main_score value: 66.22504667157702 - type: manhattan_pearson value: 66.73847836793489 - type: manhattan_spearman value: 66.7677116377695 - type: pearson value: 62.47531620842637 - type: spearman value: 66.22504667157702 task: type: STS - dataset: config: es name: MTEB STS22 (es) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 69.89707002436481 - type: cosine_spearman value: 72.2054865735116 - type: euclidean_pearson value: 71.81856615570756 - type: euclidean_spearman value: 72.72593304629407 - type: main_score value: 72.2054865735116 - type: manhattan_pearson value: 72.00362684700072 - type: manhattan_spearman value: 72.62783534769964 - type: pearson value: 69.89707002436481 - type: spearman value: 72.2054865735116 task: type: STS - dataset: config: fr name: MTEB STS22 (fr) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 81.59623734395916 - type: cosine_spearman value: 83.28946105111358 - type: euclidean_pearson value: 79.377330171466 - type: euclidean_spearman value: 81.81029781662205 - type: main_score value: 83.28946105111358 - type: manhattan_pearson value: 78.96970881689698 - type: manhattan_spearman value: 81.91773236079703 - type: pearson value: 81.59623734395916 - type: spearman value: 83.28946105111358 task: type: STS - dataset: config: de-fr name: MTEB STS22 (de-fr) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 55.03825643126142 - type: cosine_spearman value: 58.25792501780429 - type: euclidean_pearson value: 50.38007603973409 - 
type: euclidean_spearman value: 59.39961789383097 - type: main_score value: 58.25792501780429 - type: manhattan_pearson value: 50.518568927999155 - type: manhattan_spearman value: 59.84185466003894 - type: pearson value: 55.03825643126142 - type: spearman value: 58.25792501780429 task: type: STS - dataset: config: pl-en name: MTEB STS22 (pl-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 77.77233721490776 - type: cosine_spearman value: 76.17596588017625 - type: euclidean_pearson value: 74.47600468156611 - type: euclidean_spearman value: 72.61278728057012 - type: main_score value: 76.17596588017625 - type: manhattan_pearson value: 74.48118910099699 - type: manhattan_spearman value: 73.33167419101696 - type: pearson value: 77.77233721490776 - type: spearman value: 76.17596588017625 task: type: STS - dataset: config: pl name: MTEB STS22 (pl) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 42.87453608131507 - type: cosine_spearman value: 45.137849894401185 - type: euclidean_pearson value: 31.66964197694796 - type: euclidean_spearman value: 44.1014900837869 - type: main_score value: 45.137849894401185 - type: manhattan_pearson value: 31.007199259384745 - type: manhattan_spearman value: 43.48181523288926 - type: pearson value: 42.87453608131507 - type: spearman value: 45.137849894401185 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 66.87400150638176 - type: cosine_spearman value: 67.27861354834066 - type: euclidean_pearson value: 66.81789582140216 - type: euclidean_spearman value: 66.44220479858708 - type: main_score value: 67.27861354834066 - type: manhattan_pearson value: 66.92509859033235 - type: manhattan_spearman value: 66.46841124185076 - type: pearson value: 66.87400150638176 - type: spearman value: 67.27861354834066 task: type: STS - dataset: config: ru name: MTEB STS22 (ru) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 61.819804551576084 - type: cosine_spearman value: 65.0864146772135 - type: euclidean_pearson value: 62.518151090361876 - type: euclidean_spearman value: 65.13608138548017 - type: main_score value: 65.0864146772135 - type: manhattan_pearson value: 62.51413246915267 - type: manhattan_spearman value: 65.19077543064323 - type: pearson value: 61.819804551576084 - type: spearman value: 65.0864146772135 task: type: STS - dataset: config: de name: MTEB STS22 (de) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 54.85728696035389 - type: cosine_spearman value: 61.60906359227576 - type: euclidean_pearson value: 52.57582587901851 - type: euclidean_spearman value: 61.41823097598308 - type: main_score value: 61.60906359227576 - type: manhattan_pearson value: 52.500978361080506 - type: manhattan_spearman value: 61.30365596659758 - type: pearson value: 54.85728696035389 - type: spearman value: 61.60906359227576 task: type: STS - dataset: config: fr-pl name: MTEB STS22 (fr-pl) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 67.68016005631422 - type: cosine_spearman value: 84.51542547285167 - type: 
euclidean_pearson value: 66.19871164667245 - type: euclidean_spearman value: 73.24670207647144 - type: main_score value: 84.51542547285167 - type: manhattan_pearson value: 67.0443525268974 - type: manhattan_spearman value: 73.24670207647144 - type: pearson value: 67.68016005631422 - type: spearman value: 84.51542547285167 task: type: STS - dataset: config: de-pl name: MTEB STS22 (de-pl) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 47.49467414030747 - type: cosine_spearman value: 56.81512095681289 - type: euclidean_pearson value: 48.42860221765214 - type: euclidean_spearman value: 58.63197306329092 - type: main_score value: 56.81512095681289 - type: manhattan_pearson value: 48.39594959260441 - type: manhattan_spearman value: 58.63197306329092 - type: pearson value: 47.49467414030747 - type: spearman value: 56.81512095681289 task: type: STS - dataset: config: es-en name: MTEB STS22 (es-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 76.8364678896155 - type: cosine_spearman value: 78.45516413087114 - type: euclidean_pearson value: 78.62779318576634 - type: euclidean_spearman value: 78.88760695649488 - type: main_score value: 78.45516413087114 - type: manhattan_pearson value: 78.62131335760031 - type: manhattan_spearman value: 78.81861844200388 - type: pearson value: 76.8364678896155 - type: spearman value: 78.45516413087114 task: type: STS - dataset: config: de-en name: MTEB STS22 (de-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 65.16640313911604 - type: cosine_spearman value: 60.887608967403914 - type: euclidean_pearson value: 67.49902244990913 - type: euclidean_spearman value: 59.2458787136538 - type: main_score value: 60.887608967403914 - type: manhattan_pearson value: 67.34313506388378 - type: manhattan_spearman value: 59.05283429200166 - type: pearson value: 65.16640313911604 - type: spearman value: 60.887608967403914 task: type: STS - dataset: config: default name: MTEB QBQTC (default) revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7 split: test type: C-MTEB/QBQTC metrics: - type: cosine_pearson value: 34.20049144526891 - type: cosine_spearman value: 36.41802814113771 - type: euclidean_pearson value: 34.569942139590626 - type: euclidean_spearman value: 36.06141660786936 - type: main_score value: 36.41802814113771 - type: manhattan_pearson value: 34.537041543916003 - type: manhattan_spearman value: 36.033418927773825 - type: pearson value: 34.20049144526891 - type: spearman value: 36.41802814113771 task: type: STS - dataset: config: default name: MTEB STSB (default) revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0 split: test type: C-MTEB/STSB metrics: - type: cosine_pearson value: 81.5092853013241 - type: cosine_spearman value: 83.54005474244292 - type: euclidean_pearson value: 83.7246578378554 - type: euclidean_spearman value: 84.46767551087716 - type: main_score value: 83.54005474244292 - type: manhattan_pearson value: 83.65922665594636 - type: manhattan_spearman value: 84.42431449101848 - type: pearson value: 81.5092853013241 - type: spearman value: 83.54005474244292 task: type: STS - dataset: config: default name: MTEB STSBenchmark (default) revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 split: test type: mteb/stsbenchmark-sts metrics: - type: cosine_pearson value: 87.70246866744966 - type: 
cosine_spearman value: 89.44070045346106 - type: euclidean_pearson value: 89.56956519641007 - type: euclidean_spearman value: 89.95830112784283 - type: main_score value: 89.44070045346106 - type: manhattan_pearson value: 89.48264471425145 - type: manhattan_spearman value: 89.87900732483114 - type: pearson value: 87.70246866744966 - type: spearman value: 89.44070045346106 task: type: STS - dataset: config: de name: MTEB STSBenchmarkMultilingualSTS (de) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 86.83701990805217 - type: cosine_spearman value: 87.80280785492258 - type: euclidean_pearson value: 87.77325330043514 - type: euclidean_spearman value: 88.3564607283144 - type: main_score value: 87.80280785492258 - type: manhattan_pearson value: 87.6745449945946 - type: manhattan_spearman value: 88.30660465978795 - type: pearson value: 86.83701990805217 - type: spearman value: 87.80280785492258 task: type: STS - dataset: config: zh name: MTEB STSBenchmarkMultilingualSTS (zh) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 84.27751020600267 - type: cosine_spearman value: 85.63500407412486 - type: euclidean_pearson value: 85.21829891649696 - type: euclidean_spearman value: 85.9384575715382 - type: main_score value: 85.63500407412486 - type: manhattan_pearson value: 85.10797194089801 - type: manhattan_spearman value: 85.8770162042784 - type: pearson value: 84.27751020600267 - type: spearman value: 85.63500407412486 task: type: STS - dataset: config: fr name: MTEB STSBenchmarkMultilingualSTS (fr) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 86.56833656723254 - type: cosine_spearman value: 87.4393978501382 - type: euclidean_pearson value: 87.45171512751267 - type: euclidean_spearman value: 88.13106516566947 - type: main_score value: 87.4393978501382 - type: manhattan_pearson value: 87.33010961793333 - type: manhattan_spearman value: 88.06707425102182 - type: pearson value: 86.56833656723254 - type: spearman value: 87.4393978501382 task: type: STS - dataset: config: pl name: MTEB STSBenchmarkMultilingualSTS (pl) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 85.45065540325523 - type: cosine_spearman value: 85.47881076789359 - type: euclidean_pearson value: 85.1999493863155 - type: euclidean_spearman value: 85.7874947669187 - type: main_score value: 85.47881076789359 - type: manhattan_pearson value: 85.06075305990376 - type: manhattan_spearman value: 85.71563015639558 - type: pearson value: 85.45065540325523 - type: spearman value: 85.47881076789359 task: type: STS - dataset: config: es name: MTEB STSBenchmarkMultilingualSTS (es) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 87.11952824079832 - type: cosine_spearman value: 87.9643473573153 - type: euclidean_pearson value: 88.11750364639971 - type: euclidean_spearman value: 88.63695109016498 - type: main_score value: 87.9643473573153 - type: manhattan_pearson value: 88.00294453126699 - type: manhattan_spearman value: 88.53750241758391 - type: pearson value: 87.11952824079832 - type: spearman value: 87.9643473573153 task: type: STS - dataset: config: ru name: MTEB STSBenchmarkMultilingualSTS (ru) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test 
type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 85.99804354414991 - type: cosine_spearman value: 86.30252111551002 - type: euclidean_pearson value: 86.1880652037762 - type: euclidean_spearman value: 86.69556223944502 - type: main_score value: 86.30252111551002 - type: manhattan_pearson value: 86.0736400320898 - type: manhattan_spearman value: 86.61747927593393 - type: pearson value: 85.99804354414991 - type: spearman value: 86.30252111551002 task: type: STS - dataset: config: en name: MTEB STSBenchmarkMultilingualSTS (en) revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c split: test type: mteb/stsb_multi_mt metrics: - type: cosine_pearson value: 87.70246861738103 - type: cosine_spearman value: 89.44070045346106 - type: euclidean_pearson value: 89.56956518833663 - type: euclidean_spearman value: 89.95830112784283 - type: main_score value: 89.44070045346106 - type: manhattan_pearson value: 89.48264470792915 - type: manhattan_spearman value: 89.87900732483114 - type: pearson value: 87.70246861738103 - type: spearman value: 89.44070045346106 task: type: STS - dataset: config: default name: MTEB SciDocsRR (default) revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab split: test type: mteb/scidocs-reranking metrics: - type: map value: 84.88064122814694 - type: mrr value: 95.84832651009123 - type: main_score value: 84.88064122814694 task: type: Reranking - dataset: config: default name: MTEB SciFact (default) revision: 0228b52cf27578f30900b9e5271d331663a030d7 split: test type: mteb/scifact metrics: - type: map_at_1 value: 57.289 - type: map_at_10 value: 67.88499999999999 - type: map_at_100 value: 68.477 - type: map_at_1000 value: 68.50500000000001 - type: map_at_20 value: 68.33500000000001 - type: map_at_3 value: 65.08 - type: map_at_5 value: 67.001 - type: mrr_at_1 value: 59.667 - type: mrr_at_10 value: 68.626 - type: mrr_at_100 value: 69.082 - type: mrr_at_1000 value: 69.108 - type: mrr_at_20 value: 68.958 - type: mrr_at_3 value: 66.667 - type: mrr_at_5 value: 67.983 - type: ndcg_at_1 value: 59.667 - type: ndcg_at_10 value: 72.309 - type: ndcg_at_100 value: 74.58399999999999 - type: ndcg_at_1000 value: 75.25500000000001 - type: ndcg_at_20 value: 73.656 - type: ndcg_at_3 value: 67.791 - type: ndcg_at_5 value: 70.45 - type: precision_at_1 value: 59.667 - type: precision_at_10 value: 9.567 - type: precision_at_100 value: 1.073 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_20 value: 5.083 - type: precision_at_3 value: 26.333000000000002 - type: precision_at_5 value: 17.666999999999998 - type: recall_at_1 value: 57.289 - type: recall_at_10 value: 84.756 - type: recall_at_100 value: 94.5 - type: recall_at_1000 value: 99.667 - type: recall_at_20 value: 89.7 - type: recall_at_3 value: 73.22800000000001 - type: recall_at_5 value: 79.444 - type: main_score value: 72.309 task: type: Retrieval - dataset: config: default name: MTEB SpanishNewsClusteringP2P (default) revision: bf8ca8ddc5b7da4f7004720ddf99bbe0483480e6 split: test type: jinaai/spanish_news_clustering metrics: - type: main_score value: 45.04477709795154 - type: v_measure value: 45.04477709795154 - type: v_measure_std value: 0.0 task: type: Clustering - dataset: config: default name: MTEB SpanishPassageRetrievalS2S (default) revision: 9cddf2ce5209ade52c2115ccfa00eb22c6d3a837 split: test type: jinaai/spanish_passage_retrieval metrics: - type: main_score value: 69.83 - type: map_at_1 value: 15.736 - type: map_at_10 value: 52.027 - type: map_at_100 value: 65.08800000000001 - type: map_at_1000 value: 
65.08800000000001 - type: map_at_20 value: 60.79900000000001 - type: map_at_3 value: 32.869 - type: map_at_5 value: 41.436 - type: mrr_at_1 value: 75.44910179640718 - type: mrr_at_10 value: 84.43446440452426 - type: mrr_at_100 value: 84.48052612723271 - type: mrr_at_1000 value: 84.48052612723271 - type: mrr_at_20 value: 84.48052612723271 - type: mrr_at_3 value: 83.13373253493013 - type: mrr_at_5 value: 84.3013972055888 - type: nauc_map_at_1000_diff1 value: 50.611540149694356 - type: nauc_map_at_1000_max value: 2.1102430434260238 - type: nauc_map_at_1000_std value: -18.88993521335793 - type: nauc_map_at_100_diff1 value: 50.611540149694356 - type: nauc_map_at_100_max value: 2.1102430434260238 - type: nauc_map_at_100_std value: -18.88993521335793 - type: nauc_map_at_10_diff1 value: 59.13518981755268 - type: nauc_map_at_10_max value: -9.810386627392807 - type: nauc_map_at_10_std value: -38.31810152345078 - type: nauc_map_at_1_diff1 value: 74.96782567287174 - type: nauc_map_at_1_max value: -29.648279252607875 - type: nauc_map_at_1_std value: -54.017459339141595 - type: nauc_map_at_20_diff1 value: 55.26694458629849 - type: nauc_map_at_20_max value: -1.9490244535020729 - type: nauc_map_at_20_std value: -25.22211659104076 - type: nauc_map_at_3_diff1 value: 71.67607885031732 - type: nauc_map_at_3_max value: -25.078101661694507 - type: nauc_map_at_3_std value: -50.55408861920259 - type: nauc_map_at_5_diff1 value: 61.50111515417668 - type: nauc_map_at_5_max value: -16.4114670513168 - type: nauc_map_at_5_std value: -44.391416134859135 - type: nauc_mrr_at_1000_diff1 value: 74.18848063283234 - type: nauc_mrr_at_1000_max value: 21.929205946778005 - type: nauc_mrr_at_1000_std value: -36.27399268489433 - type: nauc_mrr_at_100_diff1 value: 74.18848063283234 - type: nauc_mrr_at_100_max value: 21.929205946778005 - type: nauc_mrr_at_100_std value: -36.27399268489433 - type: nauc_mrr_at_10_diff1 value: 74.27231582268745 - type: nauc_mrr_at_10_max value: 21.481133301135337 - type: nauc_mrr_at_10_std value: -36.72070854872902 - type: nauc_mrr_at_1_diff1 value: 76.54855950439561 - type: nauc_mrr_at_1_max value: 26.99938321212366 - type: nauc_mrr_at_1_std value: -33.098742603429635 - type: nauc_mrr_at_20_diff1 value: 74.18848063283234 - type: nauc_mrr_at_20_max value: 21.929205946778005 - type: nauc_mrr_at_20_std value: -36.27399268489433 - type: nauc_mrr_at_3_diff1 value: 72.05379526740143 - type: nauc_mrr_at_3_max value: 18.875831185752528 - type: nauc_mrr_at_3_std value: -37.27302006456391 - type: nauc_mrr_at_5_diff1 value: 74.25342356682029 - type: nauc_mrr_at_5_max value: 20.756340085088738 - type: nauc_mrr_at_5_std value: -37.99507208540703 - type: nauc_ndcg_at_1000_diff1 value: 53.259363764380275 - type: nauc_ndcg_at_1000_max value: 12.936954959423218 - type: nauc_ndcg_at_1000_std value: -16.953898675672153 - type: nauc_ndcg_at_100_diff1 value: 53.259363764380275 - type: nauc_ndcg_at_100_max value: 12.936954959423218 - type: nauc_ndcg_at_100_std value: -16.953898675672153 - type: nauc_ndcg_at_10_diff1 value: 53.70942345413554 - type: nauc_ndcg_at_10_max value: -3.8465093347016186 - type: nauc_ndcg_at_10_std value: -31.208127919994755 - type: nauc_ndcg_at_1_diff1 value: 75.30551289259554 - type: nauc_ndcg_at_1_max value: 25.53292054129834 - type: nauc_ndcg_at_1_std value: -33.285498788395145 - type: nauc_ndcg_at_20_diff1 value: 57.62409278278133 - type: nauc_ndcg_at_20_max value: 2.8040586426056233 - type: nauc_ndcg_at_20_std value: -26.270875776221704 - type: nauc_ndcg_at_3_diff1 value: 48.42294834754225 - 
type: nauc_ndcg_at_3_max value: 16.912467881065822 - type: nauc_ndcg_at_3_std value: -13.324841189277873 - type: nauc_ndcg_at_5_diff1 value: 47.512819802794596 - type: nauc_ndcg_at_5_max value: 14.645518203506594 - type: nauc_ndcg_at_5_std value: -17.641450435599275 - type: nauc_precision_at_1000_diff1 value: -34.43320975829637 - type: nauc_precision_at_1000_max value: 29.08585622578186 - type: nauc_precision_at_1000_std value: 46.55117940162061 - type: nauc_precision_at_100_diff1 value: -34.433209758296364 - type: nauc_precision_at_100_max value: 29.085856225781885 - type: nauc_precision_at_100_std value: 46.55117940162065 - type: nauc_precision_at_10_diff1 value: -21.895306304096902 - type: nauc_precision_at_10_max value: 33.190476527593745 - type: nauc_precision_at_10_std value: 37.64916268614298 - type: nauc_precision_at_1_diff1 value: 75.30551289259554 - type: nauc_precision_at_1_max value: 25.53292054129834 - type: nauc_precision_at_1_std value: -33.285498788395145 - type: nauc_precision_at_20_diff1 value: -27.63076748060466 - type: nauc_precision_at_20_max value: 30.689810416086154 - type: nauc_precision_at_20_std value: 46.164191636131626 - type: nauc_precision_at_3_diff1 value: 20.547345067837288 - type: nauc_precision_at_3_max value: 26.177050942827528 - type: nauc_precision_at_3_std value: 5.960466052973099 - type: nauc_precision_at_5_diff1 value: -8.928755534002669 - type: nauc_precision_at_5_max value: 40.83262650073459 - type: nauc_precision_at_5_std value: 26.158537031161494 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: .nan - type: nauc_recall_at_100_max value: .nan - type: nauc_recall_at_100_std value: .nan - type: nauc_recall_at_10_diff1 value: 53.08654386169444 - type: nauc_recall_at_10_max value: -23.276269379519356 - type: nauc_recall_at_10_std value: -50.80707792706157 - type: nauc_recall_at_1_diff1 value: 74.96782567287174 - type: nauc_recall_at_1_max value: -29.648279252607875 - type: nauc_recall_at_1_std value: -54.017459339141595 - type: nauc_recall_at_20_diff1 value: 51.60121897059633 - type: nauc_recall_at_20_max value: -14.241779530735387 - type: nauc_recall_at_20_std value: -37.877451525215456 - type: nauc_recall_at_3_diff1 value: 66.99474984329694 - type: nauc_recall_at_3_max value: -30.802787353187966 - type: nauc_recall_at_3_std value: -53.58737792129713 - type: nauc_recall_at_5_diff1 value: 54.64214444958567 - type: nauc_recall_at_5_max value: -23.341309362104703 - type: nauc_recall_at_5_std value: -51.381363923145265 - type: ndcg_at_1 value: 76.048 - type: ndcg_at_10 value: 69.83 - type: ndcg_at_100 value: 82.11500000000001 - type: ndcg_at_1000 value: 82.11500000000001 - type: ndcg_at_20 value: 75.995 - type: ndcg_at_3 value: 69.587 - type: ndcg_at_5 value: 69.062 - type: precision_at_1 value: 76.048 - type: precision_at_10 value: 43.653 - type: precision_at_100 value: 7.718999999999999 - type: precision_at_1000 value: 0.772 - type: precision_at_20 value: 31.108000000000004 - type: precision_at_3 value: 63.87199999999999 - type: precision_at_5 value: 56.407 - type: recall_at_1 value: 15.736 - type: recall_at_10 value: 66.873 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 85.01100000000001 - type: recall_at_3 value: 36.441 - type: recall_at_5 value: 49.109 task: type: Retrieval - dataset: config: default name: MTEB SprintDuplicateQuestions (default) revision: 
d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 split: test type: mteb/sprintduplicatequestions-pairclassification metrics: - type: cosine_accuracy value: 99.87326732673267 - type: cosine_accuracy_threshold value: 86.0752820968628 - type: cosine_ap value: 96.98758090713252 - type: cosine_f1 value: 93.52881698685542 - type: cosine_f1_threshold value: 86.0752820968628 - type: cosine_precision value: 94.58077709611452 - type: cosine_recall value: 92.5 - type: dot_accuracy value: 99.82574257425742 - type: dot_accuracy_threshold value: 40484.73815917969 - type: dot_ap value: 95.68959907254845 - type: dot_f1 value: 91.31293188548865 - type: dot_f1_threshold value: 40336.810302734375 - type: dot_precision value: 90.15594541910332 - type: dot_recall value: 92.5 - type: euclidean_accuracy value: 99.87128712871286 - type: euclidean_accuracy_threshold value: 1162.5749588012695 - type: euclidean_ap value: 96.92640435656577 - type: euclidean_f1 value: 93.4475806451613 - type: euclidean_f1_threshold value: 1162.5749588012695 - type: euclidean_precision value: 94.20731707317073 - type: euclidean_recall value: 92.7 - type: main_score value: 96.98758090713252 - type: manhattan_accuracy value: 99.86930693069307 - type: manhattan_accuracy_threshold value: 28348.71826171875 - type: manhattan_ap value: 96.93832673967925 - type: manhattan_f1 value: 93.33333333333333 - type: manhattan_f1_threshold value: 28348.71826171875 - type: manhattan_precision value: 94.28571428571428 - type: manhattan_recall value: 92.4 - type: max_accuracy value: 99.87326732673267 - type: max_ap value: 96.98758090713252 - type: max_f1 value: 93.52881698685542 - type: max_precision value: 94.58077709611452 - type: max_recall value: 92.7 - type: similarity_accuracy value: 99.87326732673267 - type: similarity_accuracy_threshold value: 86.0752820968628 - type: similarity_ap value: 96.98758090713252 - type: similarity_f1 value: 93.52881698685542 - type: similarity_f1_threshold value: 86.0752820968628 - type: similarity_precision value: 94.58077709611452 - type: similarity_recall value: 92.5 task: type: PairClassification - dataset: config: default name: MTEB StackExchangeClustering (default) revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: main_score value: 65.6560129719848 - type: v_measure value: 65.6560129719848 - type: v_measure_std value: 4.781229811487539 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P (default) revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: main_score value: 35.07546243853692 - type: v_measure value: 35.07546243853692 - type: v_measure_std value: 1.1978740356240998 task: type: Clustering - dataset: config: default name: MTEB StackOverflowDupQuestions (default) revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 split: test type: mteb/stackoverflowdupquestions-reranking metrics: - type: map value: 51.771005199508835 - type: mrr value: 52.65443298531534 - type: main_score value: 51.771005199508835 task: type: Reranking - dataset: config: default name: MTEB SummEval (default) revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c split: test type: mteb/summeval metrics: - type: cosine_pearson value: 29.48686238342228 - type: cosine_spearman value: 29.706543509170054 - type: dot_pearson value: 27.95853155597859 - type: dot_spearman value: 27.604287986935162 - type: main_score value: 29.706543509170054 - type: pearson value: 29.48686238342228 - type: spearman 
value: 29.706543509170054 task: type: Summarization - dataset: config: default name: MTEB SummEvalFr (default) revision: b385812de6a9577b6f4d0f88c6a6e35395a94054 split: test type: lyon-nlp/summarization-summeval-fr-p2p metrics: - type: cosine_pearson value: 31.551301434917868 - type: cosine_spearman value: 30.709049789175186 - type: dot_pearson value: 27.77050901756549 - type: dot_spearman value: 26.715505953561795 - type: main_score value: 30.709049789175186 - type: pearson value: 31.551301434917868 - type: spearman value: 30.709049789175186 task: type: Summarization - dataset: config: default name: MTEB SyntecReranking (default) revision: b205c5084a0934ce8af14338bf03feb19499c84d split: test type: lyon-nlp/mteb-fr-reranking-syntec-s2p metrics: - type: map value: 73.31666666666666 - type: mrr value: 73.31666666666666 - type: main_score value: 73.31666666666666 task: type: Reranking - dataset: config: default name: MTEB SyntecRetrieval (default) revision: 19661ccdca4dfc2d15122d776b61685f48c68ca9 split: test type: lyon-nlp/mteb-fr-retrieval-syntec-s2p metrics: - type: main_score value: 83.851 - type: map_at_1 value: 68.0 - type: map_at_10 value: 79.187 - type: map_at_100 value: 79.32900000000001 - type: map_at_1000 value: 79.32900000000001 - type: map_at_20 value: 79.32900000000001 - type: map_at_3 value: 77.333 - type: map_at_5 value: 78.93299999999999 - type: mrr_at_1 value: 68.0 - type: mrr_at_10 value: 79.18730158730159 - type: mrr_at_100 value: 79.32945845004669 - type: mrr_at_1000 value: 79.32945845004669 - type: mrr_at_20 value: 79.32945845004669 - type: mrr_at_3 value: 77.33333333333333 - type: mrr_at_5 value: 78.93333333333332 - type: nauc_map_at_1000_diff1 value: 63.31103256935259 - type: nauc_map_at_1000_max value: 11.073749121365623 - type: nauc_map_at_1000_std value: 7.4973309839738 - type: nauc_map_at_100_diff1 value: 63.31103256935259 - type: nauc_map_at_100_max value: 11.073749121365623 - type: nauc_map_at_100_std value: 7.4973309839738 - type: nauc_map_at_10_diff1 value: 62.91585737195978 - type: nauc_map_at_10_max value: 11.770664508983133 - type: nauc_map_at_10_std value: 8.179883948527962 - type: nauc_map_at_1_diff1 value: 66.1236265634718 - type: nauc_map_at_1_max value: 7.000207311173955 - type: nauc_map_at_1_std value: 6.54412272821497 - type: nauc_map_at_20_diff1 value: 63.31103256935259 - type: nauc_map_at_20_max value: 11.073749121365623 - type: nauc_map_at_20_std value: 7.4973309839738 - type: nauc_map_at_3_diff1 value: 62.14039574010254 - type: nauc_map_at_3_max value: 11.06996398110187 - type: nauc_map_at_3_std value: 7.288759297085769 - type: nauc_map_at_5_diff1 value: 63.0401271126211 - type: nauc_map_at_5_max value: 10.779317801858609 - type: nauc_map_at_5_std value: 6.476660484760681 - type: nauc_mrr_at_1000_diff1 value: 63.31103256935259 - type: nauc_mrr_at_1000_max value: 11.073749121365623 - type: nauc_mrr_at_1000_std value: 7.4973309839738 - type: nauc_mrr_at_100_diff1 value: 63.31103256935259 - type: nauc_mrr_at_100_max value: 11.073749121365623 - type: nauc_mrr_at_100_std value: 7.4973309839738 - type: nauc_mrr_at_10_diff1 value: 62.91585737195978 - type: nauc_mrr_at_10_max value: 11.770664508983133 - type: nauc_mrr_at_10_std value: 8.179883948527962 - type: nauc_mrr_at_1_diff1 value: 66.1236265634718 - type: nauc_mrr_at_1_max value: 7.000207311173955 - type: nauc_mrr_at_1_std value: 6.54412272821497 - type: nauc_mrr_at_20_diff1 value: 63.31103256935259 - type: nauc_mrr_at_20_max value: 11.073749121365623 - type: nauc_mrr_at_20_std value: 
7.4973309839738 - type: nauc_mrr_at_3_diff1 value: 62.14039574010254 - type: nauc_mrr_at_3_max value: 11.06996398110187 - type: nauc_mrr_at_3_std value: 7.288759297085769 - type: nauc_mrr_at_5_diff1 value: 63.0401271126211 - type: nauc_mrr_at_5_max value: 10.779317801858609 - type: nauc_mrr_at_5_std value: 6.476660484760681 - type: nauc_ndcg_at_1000_diff1 value: 62.9544299483241 - type: nauc_ndcg_at_1000_max value: 11.577079766964538 - type: nauc_ndcg_at_1000_std value: 7.703856790100716 - type: nauc_ndcg_at_100_diff1 value: 62.9544299483241 - type: nauc_ndcg_at_100_max value: 11.577079766964538 - type: nauc_ndcg_at_100_std value: 7.703856790100716 - type: nauc_ndcg_at_10_diff1 value: 61.29907952217381 - type: nauc_ndcg_at_10_max value: 14.760627422715425 - type: nauc_ndcg_at_10_std value: 10.805573898143368 - type: nauc_ndcg_at_1_diff1 value: 66.1236265634718 - type: nauc_ndcg_at_1_max value: 7.000207311173955 - type: nauc_ndcg_at_1_std value: 6.54412272821497 - type: nauc_ndcg_at_20_diff1 value: 62.9544299483241 - type: nauc_ndcg_at_20_max value: 11.577079766964538 - type: nauc_ndcg_at_20_std value: 7.703856790100716 - type: nauc_ndcg_at_3_diff1 value: 60.25643527856101 - type: nauc_ndcg_at_3_max value: 12.236302709487546 - type: nauc_ndcg_at_3_std value: 7.36883189112067 - type: nauc_ndcg_at_5_diff1 value: 61.65220590318238 - type: nauc_ndcg_at_5_max value: 11.39969101913945 - type: nauc_ndcg_at_5_std value: 5.406207922379402 - type: nauc_precision_at_1000_diff1 value: .nan - type: nauc_precision_at_1000_max value: .nan - type: nauc_precision_at_1000_std value: .nan - type: nauc_precision_at_100_diff1 value: .nan - type: nauc_precision_at_100_max value: .nan - type: nauc_precision_at_100_std value: .nan - type: nauc_precision_at_10_diff1 value: 19.14098972922579 - type: nauc_precision_at_10_max value: 100.0 - type: nauc_precision_at_10_std value: 93.46405228758135 - type: nauc_precision_at_1_diff1 value: 66.1236265634718 - type: nauc_precision_at_1_max value: 7.000207311173955 - type: nauc_precision_at_1_std value: 6.54412272821497 - type: nauc_precision_at_20_diff1 value: 100.0 - type: nauc_precision_at_20_max value: 100.0 - type: nauc_precision_at_20_std value: 100.0 - type: nauc_precision_at_3_diff1 value: 50.29636629155561 - type: nauc_precision_at_3_max value: 18.00532600292076 - type: nauc_precision_at_3_std value: 7.649686453053768 - type: nauc_precision_at_5_diff1 value: 43.522408963585356 - type: nauc_precision_at_5_max value: 16.923436041082983 - type: nauc_precision_at_5_std value: -10.854341736694092 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: .nan - type: nauc_recall_at_100_max value: .nan - type: nauc_recall_at_100_std value: .nan - type: nauc_recall_at_10_diff1 value: 19.1409897292252 - type: nauc_recall_at_10_max value: 100.0 - type: nauc_recall_at_10_std value: 93.46405228758134 - type: nauc_recall_at_1_diff1 value: 66.1236265634718 - type: nauc_recall_at_1_max value: 7.000207311173955 - type: nauc_recall_at_1_std value: 6.54412272821497 - type: nauc_recall_at_20_diff1 value: .nan - type: nauc_recall_at_20_max value: .nan - type: nauc_recall_at_20_std value: .nan - type: nauc_recall_at_3_diff1 value: 50.29636629155569 - type: nauc_recall_at_3_max value: 18.005326002920754 - type: nauc_recall_at_3_std value: 7.649686453053851 - type: nauc_recall_at_5_diff1 value: 43.5224089635856 - type: nauc_recall_at_5_max value: 16.92343604108335 - type: 
nauc_recall_at_5_std value: -10.854341736694499 - type: ndcg_at_1 value: 68.0 - type: ndcg_at_10 value: 83.851 - type: ndcg_at_100 value: 84.36099999999999 - type: ndcg_at_1000 value: 84.36099999999999 - type: ndcg_at_20 value: 84.36099999999999 - type: ndcg_at_3 value: 80.333 - type: ndcg_at_5 value: 83.21600000000001 - type: precision_at_1 value: 68.0 - type: precision_at_10 value: 9.8 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 5.0 - type: precision_at_3 value: 29.666999999999998 - type: precision_at_5 value: 19.2 - type: recall_at_1 value: 68.0 - type: recall_at_10 value: 98.0 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 100.0 - type: recall_at_3 value: 89.0 - type: recall_at_5 value: 96.0 task: type: Retrieval - dataset: config: default name: MTEB T2Reranking (default) revision: 76631901a18387f85eaa53e5450019b87ad58ef9 split: dev type: C-MTEB/T2Reranking metrics: - type: map value: 65.3088203970324 - type: mrr value: 74.79505862376546 - type: main_score value: 65.3088203970324 task: type: Reranking - dataset: config: default name: MTEB T2Retrieval (default) revision: 8731a845f1bf500a4f111cf1070785c793d10e64 split: dev type: C-MTEB/T2Retrieval metrics: - type: main_score value: 83.163 - type: map_at_1 value: 26.875 - type: map_at_10 value: 75.454 - type: map_at_100 value: 79.036 - type: map_at_1000 value: 79.111 - type: map_at_20 value: 78.145 - type: map_at_3 value: 53.181 - type: map_at_5 value: 65.362 - type: mrr_at_1 value: 88.90057864281957 - type: mrr_at_10 value: 91.53186397301344 - type: mrr_at_100 value: 91.62809075510003 - type: mrr_at_1000 value: 91.63198173030787 - type: mrr_at_20 value: 91.59414668799909 - type: mrr_at_3 value: 91.0792565316499 - type: mrr_at_5 value: 91.35718043135199 - type: nauc_map_at_1000_diff1 value: 12.364843957982409 - type: nauc_map_at_1000_max value: 52.07043464458799 - type: nauc_map_at_1000_std value: 16.040095055100494 - type: nauc_map_at_100_diff1 value: 12.370621073823022 - type: nauc_map_at_100_max value: 51.960738727635636 - type: nauc_map_at_100_std value: 15.935832440430747 - type: nauc_map_at_10_diff1 value: 16.852819486606585 - type: nauc_map_at_10_max value: 40.11184760756059 - type: nauc_map_at_10_std value: 0.9306648364102376 - type: nauc_map_at_1_diff1 value: 52.87356542654683 - type: nauc_map_at_1_max value: -22.210039746171255 - type: nauc_map_at_1_std value: -38.11345358035342 - type: nauc_map_at_20_diff1 value: 13.045089059562837 - type: nauc_map_at_20_max value: 49.591383082160036 - type: nauc_map_at_20_std value: 12.54330050352008 - type: nauc_map_at_3_diff1 value: 38.08172234377615 - type: nauc_map_at_3_max value: -6.868621684867697 - type: nauc_map_at_3_std value: -35.4712388845996 - type: nauc_map_at_5_diff1 value: 29.665551705577474 - type: nauc_map_at_5_max value: 10.958628576519045 - type: nauc_map_at_5_std value: -25.113120842097057 - type: nauc_mrr_at_1000_diff1 value: 47.39372999496945 - type: nauc_mrr_at_1000_max value: 83.11274997493808 - type: nauc_mrr_at_1000_std value: 39.74195374546631 - type: nauc_mrr_at_100_diff1 value: 47.396678946057676 - type: nauc_mrr_at_100_max value: 83.1192584274415 - type: nauc_mrr_at_100_std value: 39.75840860374685 - type: nauc_mrr_at_10_diff1 value: 47.35365644138715 - type: nauc_mrr_at_10_max value: 83.189165639531 - type: nauc_mrr_at_10_std value: 39.83653157887758 - type: nauc_mrr_at_1_diff1 value: 47.98740362820094 - type: nauc_mrr_at_1_max value: 80.32340034580369 
- type: nauc_mrr_at_1_std value: 34.57857131423388 - type: nauc_mrr_at_20_diff1 value: 47.399132055537194 - type: nauc_mrr_at_20_max value: 83.16329919869686 - type: nauc_mrr_at_20_std value: 39.84204692042734 - type: nauc_mrr_at_3_diff1 value: 47.09295580511751 - type: nauc_mrr_at_3_max value: 82.95831045602642 - type: nauc_mrr_at_3_std value: 38.98036804692351 - type: nauc_mrr_at_5_diff1 value: 47.20100268549764 - type: nauc_mrr_at_5_max value: 83.16652480381642 - type: nauc_mrr_at_5_std value: 39.55690491560902 - type: nauc_ndcg_at_1000_diff1 value: 17.201962509184547 - type: nauc_ndcg_at_1000_max value: 63.75820559259539 - type: nauc_ndcg_at_1000_std value: 29.28676096486067 - type: nauc_ndcg_at_100_diff1 value: 16.76847216096811 - type: nauc_ndcg_at_100_max value: 62.646517934470744 - type: nauc_ndcg_at_100_std value: 28.7441617667637 - type: nauc_ndcg_at_10_diff1 value: 16.559511980751886 - type: nauc_ndcg_at_10_max value: 54.35027464277944 - type: nauc_ndcg_at_10_std value: 16.98089333577716 - type: nauc_ndcg_at_1_diff1 value: 47.98740362820094 - type: nauc_ndcg_at_1_max value: 80.32340034580369 - type: nauc_ndcg_at_1_std value: 34.57857131423388 - type: nauc_ndcg_at_20_diff1 value: 16.721525245428243 - type: nauc_ndcg_at_20_max value: 57.683661870555724 - type: nauc_ndcg_at_20_std value: 21.736044200026853 - type: nauc_ndcg_at_3_diff1 value: 12.488009696556192 - type: nauc_ndcg_at_3_max value: 69.2365575305502 - type: nauc_ndcg_at_3_std value: 30.622418945055323 - type: nauc_ndcg_at_5_diff1 value: 12.364114556230609 - type: nauc_ndcg_at_5_max value: 62.33360746285387 - type: nauc_ndcg_at_5_std value: 24.898000803570227 - type: nauc_precision_at_1000_diff1 value: -35.14745130154524 - type: nauc_precision_at_1000_max value: 48.811507982849065 - type: nauc_precision_at_1000_std value: 62.43036496029399 - type: nauc_precision_at_100_diff1 value: -35.15276411320076 - type: nauc_precision_at_100_max value: 50.87010333741109 - type: nauc_precision_at_100_std value: 63.418221030407175 - type: nauc_precision_at_10_diff1 value: -34.84255710936113 - type: nauc_precision_at_10_max value: 56.588401051428825 - type: nauc_precision_at_10_std value: 57.4763370653757 - type: nauc_precision_at_1_diff1 value: 47.98740362820094 - type: nauc_precision_at_1_max value: 80.32340034580369 - type: nauc_precision_at_1_std value: 34.57857131423388 - type: nauc_precision_at_20_diff1 value: -35.165762365233505 - type: nauc_precision_at_20_max value: 54.148762449660424 - type: nauc_precision_at_20_std value: 61.569719669368716 - type: nauc_precision_at_3_diff1 value: -28.63023175340299 - type: nauc_precision_at_3_max value: 68.69825987618499 - type: nauc_precision_at_3_std value: 48.15479495755423 - type: nauc_precision_at_5_diff1 value: -34.13811355456687 - type: nauc_precision_at_5_max value: 62.369363941490604 - type: nauc_precision_at_5_std value: 52.282904411187914 - type: nauc_recall_at_1000_diff1 value: 8.686444579162663 - type: nauc_recall_at_1000_max value: 59.58864478011338 - type: nauc_recall_at_1000_std value: 56.692774954297455 - type: nauc_recall_at_100_diff1 value: 8.820596225758342 - type: nauc_recall_at_100_max value: 53.15048885657892 - type: nauc_recall_at_100_std value: 39.78931159236714 - type: nauc_recall_at_10_diff1 value: 16.022301106315027 - type: nauc_recall_at_10_max value: 29.83242342459543 - type: nauc_recall_at_10_std value: -4.805965555875844 - type: nauc_recall_at_1_diff1 value: 52.87356542654683 - type: nauc_recall_at_1_max value: -22.210039746171255 - type: nauc_recall_at_1_std 
value: -38.11345358035342 - type: nauc_recall_at_20_diff1 value: 10.35772828627265 - type: nauc_recall_at_20_max value: 43.06420839754062 - type: nauc_recall_at_20_std value: 15.040522218235692 - type: nauc_recall_at_3_diff1 value: 36.23953684770224 - type: nauc_recall_at_3_max value: -11.709269151700374 - type: nauc_recall_at_3_std value: -38.13943178150384 - type: nauc_recall_at_5_diff1 value: 28.644872415763384 - type: nauc_recall_at_5_max value: 2.062151266111129 - type: nauc_recall_at_5_std value: -30.81114034774277 - type: ndcg_at_1 value: 88.901 - type: ndcg_at_10 value: 83.163 - type: ndcg_at_100 value: 86.854 - type: ndcg_at_1000 value: 87.602 - type: ndcg_at_20 value: 84.908 - type: ndcg_at_3 value: 84.848 - type: ndcg_at_5 value: 83.372 - type: precision_at_1 value: 88.901 - type: precision_at_10 value: 41.343 - type: precision_at_100 value: 4.957000000000001 - type: precision_at_1000 value: 0.513 - type: precision_at_20 value: 22.955000000000002 - type: precision_at_3 value: 74.29599999999999 - type: precision_at_5 value: 62.251999999999995 - type: recall_at_1 value: 26.875 - type: recall_at_10 value: 81.902 - type: recall_at_100 value: 93.988 - type: recall_at_1000 value: 97.801 - type: recall_at_20 value: 87.809 - type: recall_at_3 value: 54.869 - type: recall_at_5 value: 68.728 task: type: Retrieval - dataset: config: default name: MTEB TERRa (default) revision: 7b58f24536063837d644aab9a023c62199b2a612 split: dev type: ai-forever/terra-pairclassification metrics: - type: cosine_accuracy value: 60.586319218241044 - type: cosine_accuracy_threshold value: 82.49806761741638 - type: cosine_ap value: 58.73198048427448 - type: cosine_f1 value: 67.37967914438502 - type: cosine_f1_threshold value: 77.46461033821106 - type: cosine_precision value: 57.01357466063348 - type: cosine_recall value: 82.35294117647058 - type: dot_accuracy value: 60.26058631921825 - type: dot_accuracy_threshold value: 35627.020263671875 - type: dot_ap value: 57.418783612898224 - type: dot_f1 value: 66.51982378854623 - type: dot_f1_threshold value: 27620.843505859375 - type: dot_precision value: 50.16611295681063 - type: dot_recall value: 98.69281045751634 - type: euclidean_accuracy value: 60.26058631921825 - type: euclidean_accuracy_threshold value: 1255.4466247558594 - type: euclidean_ap value: 58.748656145387955 - type: euclidean_f1 value: 66.99029126213591 - type: euclidean_f1_threshold value: 1565.1330947875977 - type: euclidean_precision value: 53.28185328185329 - type: euclidean_recall value: 90.19607843137256 - type: main_score value: 58.8479126365766 - type: manhattan_accuracy value: 59.934853420195445 - type: manhattan_accuracy_threshold value: 29897.271728515625 - type: manhattan_ap value: 58.8479126365766 - type: manhattan_f1 value: 66.81318681318683 - type: manhattan_f1_threshold value: 46291.802978515625 - type: manhattan_precision value: 50.331125827814574 - type: manhattan_recall value: 99.34640522875817 - type: max_accuracy value: 60.586319218241044 - type: max_ap value: 58.8479126365766 - type: max_f1 value: 67.37967914438502 - type: max_precision value: 57.01357466063348 - type: max_recall value: 99.34640522875817 - type: similarity_accuracy value: 60.586319218241044 - type: similarity_accuracy_threshold value: 82.49806761741638 - type: similarity_ap value: 58.73198048427448 - type: similarity_f1 value: 67.37967914438502 - type: similarity_f1_threshold value: 77.46461033821106 - type: similarity_precision value: 57.01357466063348 - type: similarity_recall value: 82.35294117647058 task: type: 
PairClassification - dataset: config: default name: MTEB TNews (default) revision: 317f262bf1e6126357bbe89e875451e4b0938fe4 split: validation type: C-MTEB/TNews-classification metrics: - type: accuracy value: 45.967999999999996 - type: f1 value: 44.699306100915706 - type: f1_weighted value: 46.03730319014832 - type: main_score value: 45.967999999999996 task: type: Classification - dataset: config: default name: MTEB TRECCOVID (default) revision: bb9466bac8153a0349341eb1b22e06409e78ef4e split: test type: mteb/trec-covid metrics: - type: map_at_1 value: 0.251 - type: map_at_10 value: 1.9480000000000002 - type: map_at_100 value: 11.082 - type: map_at_1000 value: 26.700000000000003 - type: map_at_20 value: 3.3529999999999998 - type: map_at_3 value: 0.679 - type: map_at_5 value: 1.079 - type: mrr_at_1 value: 94.0 - type: mrr_at_10 value: 95.786 - type: mrr_at_100 value: 95.786 - type: mrr_at_1000 value: 95.786 - type: mrr_at_20 value: 95.786 - type: mrr_at_3 value: 95.0 - type: mrr_at_5 value: 95.5 - type: ndcg_at_1 value: 91.0 - type: ndcg_at_10 value: 77.71900000000001 - type: ndcg_at_100 value: 57.726 - type: ndcg_at_1000 value: 52.737 - type: ndcg_at_20 value: 72.54 - type: ndcg_at_3 value: 83.397 - type: ndcg_at_5 value: 80.806 - type: precision_at_1 value: 94.0 - type: precision_at_10 value: 81.0 - type: precision_at_100 value: 59.199999999999996 - type: precision_at_1000 value: 23.244 - type: precision_at_20 value: 75.2 - type: precision_at_3 value: 88.0 - type: precision_at_5 value: 84.8 - type: recall_at_1 value: 0.251 - type: recall_at_10 value: 2.1229999999999998 - type: recall_at_100 value: 14.496999999999998 - type: recall_at_1000 value: 50.09 - type: recall_at_20 value: 3.8309999999999995 - type: recall_at_3 value: 0.696 - type: recall_at_5 value: 1.1400000000000001 - type: main_score value: 77.71900000000001 task: type: Retrieval - dataset: config: default name: MTEB TenKGnadClusteringP2P (default) revision: 5c59e41555244b7e45c9a6be2d720ab4bafae558 split: test type: slvnwhrl/tenkgnad-clustering-p2p metrics: - type: main_score value: 43.763609722295215 - type: v_measure value: 43.763609722295215 - type: v_measure_std value: 2.8751199473862457 task: type: Clustering - dataset: config: default name: MTEB TenKGnadClusteringS2S (default) revision: 6cddbe003f12b9b140aec477b583ac4191f01786 split: test type: slvnwhrl/tenkgnad-clustering-s2s metrics: - type: main_score value: 39.762424448504355 - type: v_measure value: 39.762424448504355 - type: v_measure_std value: 3.30146124979502 task: type: Clustering - dataset: config: default name: MTEB ThuNewsClusteringP2P (default) revision: 5798586b105c0434e4f0fe5e767abe619442cf93 split: test type: C-MTEB/ThuNewsClusteringP2P metrics: - type: main_score value: 63.133819258289456 - type: v_measure value: 63.133819258289456 - type: v_measure_std value: 1.8854253356479695 task: type: Clustering - dataset: config: default name: MTEB ThuNewsClusteringS2S (default) revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d split: test type: C-MTEB/ThuNewsClusteringS2S metrics: - type: main_score value: 58.98195851785808 - type: v_measure value: 58.98195851785808 - type: v_measure_std value: 1.6237600076393737 task: type: Clustering - dataset: config: default name: MTEB Touche2020 (default) revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f split: test type: mteb/touche2020 metrics: - type: map_at_1 value: 3.3550000000000004 - type: map_at_10 value: 10.08 - type: map_at_100 value: 16.136 - type: map_at_1000 value: 17.605 - type: map_at_20 value: 12.561 - type: 
map_at_3 value: 5.641 - type: map_at_5 value: 7.3260000000000005 - type: mrr_at_1 value: 46.939 - type: mrr_at_10 value: 58.152 - type: mrr_at_100 value: 58.594 - type: mrr_at_1000 value: 58.601000000000006 - type: mrr_at_20 value: 58.279 - type: mrr_at_3 value: 55.102 - type: mrr_at_5 value: 56.531 - type: ndcg_at_1 value: 44.897999999999996 - type: ndcg_at_10 value: 26.298 - type: ndcg_at_100 value: 37.596000000000004 - type: ndcg_at_1000 value: 49.424 - type: ndcg_at_20 value: 27.066000000000003 - type: ndcg_at_3 value: 31.528 - type: ndcg_at_5 value: 28.219 - type: precision_at_1 value: 46.939 - type: precision_at_10 value: 22.245 - type: precision_at_100 value: 7.531000000000001 - type: precision_at_1000 value: 1.5350000000000001 - type: precision_at_20 value: 17.041 - type: precision_at_3 value: 30.612000000000002 - type: precision_at_5 value: 26.122 - type: recall_at_1 value: 3.3550000000000004 - type: recall_at_10 value: 16.41 - type: recall_at_100 value: 47.272 - type: recall_at_1000 value: 83.584 - type: recall_at_20 value: 24.091 - type: recall_at_3 value: 6.8180000000000005 - type: recall_at_5 value: 9.677 - type: main_score value: 26.298 task: type: Retrieval - dataset: config: default name: MTEB ToxicConversationsClassification (default) revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 91.2890625 - type: ap value: 33.95547153875715 - type: ap_weighted value: 33.95547153875715 - type: f1 value: 75.10768597556462 - type: f1_weighted value: 92.00161208992606 - type: main_score value: 91.2890625 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification (default) revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 71.3978494623656 - type: f1 value: 71.7194818511814 - type: f1_weighted value: 71.13860187349744 - type: main_score value: 71.3978494623656 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering (default) revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: main_score value: 52.4921688720602 - type: v_measure value: 52.4921688720602 - type: v_measure_std value: 0.992768152658908 task: type: Clustering - dataset: config: default name: MTEB TwitterSemEval2015 (default) revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 split: test type: mteb/twittersemeval2015-pairclassification metrics: - type: cosine_accuracy value: 85.11652858079513 - type: cosine_accuracy_threshold value: 87.90839910507202 - type: cosine_ap value: 70.90459908851724 - type: cosine_f1 value: 65.66581227877457 - type: cosine_f1_threshold value: 85.13308763504028 - type: cosine_precision value: 61.094708153531684 - type: cosine_recall value: 70.97625329815304 - type: dot_accuracy value: 83.41181379269239 - type: dot_accuracy_threshold value: 43110.113525390625 - type: dot_ap value: 65.64869491143095 - type: dot_f1 value: 62.05308447460914 - type: dot_f1_threshold value: 41412.542724609375 - type: dot_precision value: 57.38623626989464 - type: dot_recall value: 67.54617414248021 - type: euclidean_accuracy value: 85.15229182809799 - type: euclidean_accuracy_threshold value: 1043.08500289917 - type: euclidean_ap value: 70.71204383269375 - type: euclidean_f1 value: 65.20304568527919 - type: euclidean_f1_threshold value: 1179.2595863342285 - type: euclidean_precision value: 62.81173594132029 - type: 
euclidean_recall value: 67.78364116094987 - type: main_score value: 70.90459908851724 - type: manhattan_accuracy value: 85.1820945341837 - type: manhattan_accuracy_threshold value: 26115.0390625 - type: manhattan_ap value: 70.66113937117431 - type: manhattan_f1 value: 65.33383628819313 - type: manhattan_f1_threshold value: 29105.181884765625 - type: manhattan_precision value: 62.40691808791736 - type: manhattan_recall value: 68.54881266490766 - type: max_accuracy value: 85.1820945341837 - type: max_ap value: 70.90459908851724 - type: max_f1 value: 65.66581227877457 - type: max_precision value: 62.81173594132029 - type: max_recall value: 70.97625329815304 - type: similarity_accuracy value: 85.11652858079513 - type: similarity_accuracy_threshold value: 87.90839910507202 - type: similarity_ap value: 70.90459908851724 - type: similarity_f1 value: 65.66581227877457 - type: similarity_f1_threshold value: 85.13308763504028 - type: similarity_precision value: 61.094708153531684 - type: similarity_recall value: 70.97625329815304 task: type: PairClassification - dataset: config: default name: MTEB TwitterURLCorpus (default) revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf split: test type: mteb/twitterurlcorpus-pairclassification metrics: - type: cosine_accuracy value: 88.10299996119068 - type: cosine_accuracy_threshold value: 84.34982895851135 - type: cosine_ap value: 84.13755787769226 - type: cosine_f1 value: 76.0967548076923 - type: cosine_f1_threshold value: 82.8936219215393 - type: cosine_precision value: 74.28864769727193 - type: cosine_recall value: 77.99507237449954 - type: dot_accuracy value: 86.64182869561843 - type: dot_accuracy_threshold value: 38794.677734375 - type: dot_ap value: 80.20301567411457 - type: dot_f1 value: 73.50650291634967 - type: dot_f1_threshold value: 37447.23205566406 - type: dot_precision value: 69.41498460485802 - type: dot_recall value: 78.11056359716662 - type: euclidean_accuracy value: 87.9361198432103 - type: euclidean_accuracy_threshold value: 1184.421157836914 - type: euclidean_ap value: 83.79582690117218 - type: euclidean_f1 value: 75.81431709042175 - type: euclidean_f1_threshold value: 1258.2727432250977 - type: euclidean_precision value: 73.39099099099099 - type: euclidean_recall value: 78.40314136125654 - type: main_score value: 84.13755787769226 - type: manhattan_accuracy value: 87.96134590755618 - type: manhattan_accuracy_threshold value: 29077.291870117188 - type: manhattan_ap value: 83.79487172269923 - type: manhattan_f1 value: 75.82421603424935 - type: manhattan_f1_threshold value: 31224.124145507812 - type: manhattan_precision value: 72.24740255212329 - type: manhattan_recall value: 79.77363720357253 - type: max_accuracy value: 88.10299996119068 - type: max_ap value: 84.13755787769226 - type: max_f1 value: 76.0967548076923 - type: max_precision value: 74.28864769727193 - type: max_recall value: 79.77363720357253 - type: similarity_accuracy value: 88.10299996119068 - type: similarity_accuracy_threshold value: 84.34982895851135 - type: similarity_ap value: 84.13755787769226 - type: similarity_f1 value: 76.0967548076923 - type: similarity_f1_threshold value: 82.8936219215393 - type: similarity_precision value: 74.28864769727193 - type: similarity_recall value: 77.99507237449954 task: type: PairClassification - dataset: config: default name: MTEB VideoRetrieval (default) revision: 58c2597a5943a2ba48f4668c3b90d796283c5639 split: dev type: C-MTEB/VideoRetrieval metrics: - type: main_score value: 70.433 - type: map_at_1 value: 55.7 - type: map_at_10 
value: 66.013 - type: map_at_100 value: 66.534 - type: map_at_1000 value: 66.547 - type: map_at_20 value: 66.334 - type: map_at_3 value: 64.2 - type: map_at_5 value: 65.445 - type: mrr_at_1 value: 55.7 - type: mrr_at_10 value: 66.01329365079364 - type: mrr_at_100 value: 66.53350061744233 - type: mrr_at_1000 value: 66.54744831962995 - type: mrr_at_20 value: 66.3335147364675 - type: mrr_at_3 value: 64.2 - type: mrr_at_5 value: 65.44500000000002 - type: nauc_map_at_1000_diff1 value: 76.26428836976245 - type: nauc_map_at_1000_max value: 35.41847367373575 - type: nauc_map_at_1000_std value: -33.04639860831992 - type: nauc_map_at_100_diff1 value: 76.25793229023193 - type: nauc_map_at_100_max value: 35.43663260110076 - type: nauc_map_at_100_std value: -33.04238139882945 - type: nauc_map_at_10_diff1 value: 76.2108281297711 - type: nauc_map_at_10_max value: 35.59442419423183 - type: nauc_map_at_10_std value: -33.32346518997277 - type: nauc_map_at_1_diff1 value: 79.17728405262736 - type: nauc_map_at_1_max value: 31.880738163589527 - type: nauc_map_at_1_std value: -30.891888718004584 - type: nauc_map_at_20_diff1 value: 76.2181333410193 - type: nauc_map_at_20_max value: 35.43448818430876 - type: nauc_map_at_20_std value: -33.35682442863193 - type: nauc_map_at_3_diff1 value: 76.10046541433466 - type: nauc_map_at_3_max value: 34.6831278555291 - type: nauc_map_at_3_std value: -34.030826044831116 - type: nauc_map_at_5_diff1 value: 75.96513023582064 - type: nauc_map_at_5_max value: 34.66920832438069 - type: nauc_map_at_5_std value: -33.79799777830796 - type: nauc_mrr_at_1000_diff1 value: 76.26428836976245 - type: nauc_mrr_at_1000_max value: 35.41847367373575 - type: nauc_mrr_at_1000_std value: -33.04639860831992 - type: nauc_mrr_at_100_diff1 value: 76.25793229023193 - type: nauc_mrr_at_100_max value: 35.43663260110076 - type: nauc_mrr_at_100_std value: -33.04238139882945 - type: nauc_mrr_at_10_diff1 value: 76.2108281297711 - type: nauc_mrr_at_10_max value: 35.59442419423183 - type: nauc_mrr_at_10_std value: -33.32346518997277 - type: nauc_mrr_at_1_diff1 value: 79.17728405262736 - type: nauc_mrr_at_1_max value: 31.880738163589527 - type: nauc_mrr_at_1_std value: -30.891888718004584 - type: nauc_mrr_at_20_diff1 value: 76.2181333410193 - type: nauc_mrr_at_20_max value: 35.43448818430876 - type: nauc_mrr_at_20_std value: -33.35682442863193 - type: nauc_mrr_at_3_diff1 value: 76.10046541433466 - type: nauc_mrr_at_3_max value: 34.6831278555291 - type: nauc_mrr_at_3_std value: -34.030826044831116 - type: nauc_mrr_at_5_diff1 value: 75.96513023582064 - type: nauc_mrr_at_5_max value: 34.66920832438069 - type: nauc_mrr_at_5_std value: -33.79799777830796 - type: nauc_ndcg_at_1000_diff1 value: 75.68118206798317 - type: nauc_ndcg_at_1000_max value: 37.12252980787349 - type: nauc_ndcg_at_1000_std value: -31.457578337430505 - type: nauc_ndcg_at_100_diff1 value: 75.46730761564156 - type: nauc_ndcg_at_100_max value: 37.549890025544265 - type: nauc_ndcg_at_100_std value: -31.35066985945112 - type: nauc_ndcg_at_10_diff1 value: 75.09890404887037 - type: nauc_ndcg_at_10_max value: 38.024147790014204 - type: nauc_ndcg_at_10_std value: -33.67408368593356 - type: nauc_ndcg_at_1_diff1 value: 79.17728405262736 - type: nauc_ndcg_at_1_max value: 31.880738163589527 - type: nauc_ndcg_at_1_std value: -30.891888718004584 - type: nauc_ndcg_at_20_diff1 value: 75.12977548171354 - type: nauc_ndcg_at_20_max value: 37.524926748917956 - type: nauc_ndcg_at_20_std value: -33.771344674947485 - type: nauc_ndcg_at_3_diff1 value: 74.94037476984154 - 
type: nauc_ndcg_at_3_max value: 35.60345554050552 - type: nauc_ndcg_at_3_std value: -35.256991346321854 - type: nauc_ndcg_at_5_diff1 value: 74.54265907753783 - type: nauc_ndcg_at_5_max value: 35.57662819978585 - type: nauc_ndcg_at_5_std value: -34.879794448418465 - type: nauc_precision_at_1000_diff1 value: 74.52277207179142 - type: nauc_precision_at_1000_max value: 94.25510945118707 - type: nauc_precision_at_1000_std value: 91.6874157070222 - type: nauc_precision_at_100_diff1 value: 65.98346655735419 - type: nauc_precision_at_100_max value: 78.81168727653687 - type: nauc_precision_at_100_std value: 27.241465691967708 - type: nauc_precision_at_10_diff1 value: 69.55050319096688 - type: nauc_precision_at_10_max value: 51.827749140893374 - type: nauc_precision_at_10_std value: -34.60818605792837 - type: nauc_precision_at_1_diff1 value: 79.17728405262736 - type: nauc_precision_at_1_max value: 31.880738163589527 - type: nauc_precision_at_1_std value: -30.891888718004584 - type: nauc_precision_at_20_diff1 value: 68.08078305042736 - type: nauc_precision_at_20_max value: 52.83318878288501 - type: nauc_precision_at_20_std value: -35.46070292817927 - type: nauc_precision_at_3_diff1 value: 70.76249609881901 - type: nauc_precision_at_3_max value: 38.86561868624655 - type: nauc_precision_at_3_std value: -39.68917853446992 - type: nauc_precision_at_5_diff1 value: 68.39110629013278 - type: nauc_precision_at_5_max value: 39.28677163904683 - type: nauc_precision_at_5_std value: -39.39101423819562 - type: nauc_recall_at_1000_diff1 value: 74.52277207179175 - type: nauc_recall_at_1000_max value: 94.25510945118776 - type: nauc_recall_at_1000_std value: 91.68741570702382 - type: nauc_recall_at_100_diff1 value: 65.9834665573548 - type: nauc_recall_at_100_max value: 78.81168727653679 - type: nauc_recall_at_100_std value: 27.241465691967598 - type: nauc_recall_at_10_diff1 value: 69.55050319096708 - type: nauc_recall_at_10_max value: 51.82774914089347 - type: nauc_recall_at_10_std value: -34.6081860579283 - type: nauc_recall_at_1_diff1 value: 79.17728405262736 - type: nauc_recall_at_1_max value: 31.880738163589527 - type: nauc_recall_at_1_std value: -30.891888718004584 - type: nauc_recall_at_20_diff1 value: 68.08078305042746 - type: nauc_recall_at_20_max value: 52.833188782885244 - type: nauc_recall_at_20_std value: -35.46070292817895 - type: nauc_recall_at_3_diff1 value: 70.76249609881896 - type: nauc_recall_at_3_max value: 38.865618686246464 - type: nauc_recall_at_3_std value: -39.68917853446999 - type: nauc_recall_at_5_diff1 value: 68.39110629013274 - type: nauc_recall_at_5_max value: 39.28677163904688 - type: nauc_recall_at_5_std value: -39.39101423819562 - type: ndcg_at_1 value: 55.7 - type: ndcg_at_10 value: 70.433 - type: ndcg_at_100 value: 72.975 - type: ndcg_at_1000 value: 73.283 - type: ndcg_at_20 value: 71.58 - type: ndcg_at_3 value: 66.83099999999999 - type: ndcg_at_5 value: 69.085 - type: precision_at_1 value: 55.7 - type: precision_at_10 value: 8.4 - type: precision_at_100 value: 0.959 - type: precision_at_1000 value: 0.098 - type: precision_at_20 value: 4.425 - type: precision_at_3 value: 24.8 - type: precision_at_5 value: 15.98 - type: recall_at_1 value: 55.7 - type: recall_at_10 value: 84.0 - type: recall_at_100 value: 95.89999999999999 - type: recall_at_1000 value: 98.2 - type: recall_at_20 value: 88.5 - type: recall_at_3 value: 74.4 - type: recall_at_5 value: 79.9 task: type: Retrieval - dataset: config: default name: MTEB Waimai (default) revision: 339287def212450dcaa9df8c22bf93e9980c7023 split: 
test type: C-MTEB/waimai-classification metrics: - type: accuracy value: 86.58999999999999 - type: ap value: 70.02619249927523 - type: ap_weighted value: 70.02619249927523 - type: f1 value: 84.97572770889423 - type: f1_weighted value: 86.6865713531272 - type: main_score value: 86.58999999999999 task: type: Classification - dataset: config: en name: MTEB XMarket (en) revision: dfe57acff5b62c23732a7b7d3e3fb84ff501708b split: test type: jinaai/xmarket_ml metrics: - type: main_score value: 34.772999999999996 - type: map_at_1 value: 7.2620000000000005 - type: map_at_10 value: 17.98 - type: map_at_100 value: 24.828 - type: map_at_1000 value: 26.633000000000003 - type: map_at_20 value: 20.699 - type: map_at_3 value: 12.383 - type: map_at_5 value: 14.871 - type: mrr_at_1 value: 34.718100890207715 - type: mrr_at_10 value: 43.9336827525092 - type: mrr_at_100 value: 44.66474011066837 - type: mrr_at_1000 value: 44.7075592197356 - type: mrr_at_20 value: 44.35984436569346 - type: mrr_at_3 value: 41.73901893981052 - type: mrr_at_5 value: 43.025973550207134 - type: nauc_map_at_1000_diff1 value: 13.899869081196364 - type: nauc_map_at_1000_max value: 46.60452816386231 - type: nauc_map_at_1000_std value: 24.87925799401773 - type: nauc_map_at_100_diff1 value: 16.164805650871084 - type: nauc_map_at_100_max value: 44.720912958558095 - type: nauc_map_at_100_std value: 20.236734536210477 - type: nauc_map_at_10_diff1 value: 23.58580520913581 - type: nauc_map_at_10_max value: 31.276151869914216 - type: nauc_map_at_10_std value: -0.1833326246041355 - type: nauc_map_at_1_diff1 value: 37.02663305598722 - type: nauc_map_at_1_max value: 14.931071531116528 - type: nauc_map_at_1_std value: -12.478790028708453 - type: nauc_map_at_20_diff1 value: 20.718297881540593 - type: nauc_map_at_20_max value: 36.62264094841859 - type: nauc_map_at_20_std value: 6.658514770057742 - type: nauc_map_at_3_diff1 value: 29.379034581120006 - type: nauc_map_at_3_max value: 21.387214269548803 - type: nauc_map_at_3_std value: -9.3404121914247 - type: nauc_map_at_5_diff1 value: 26.627169792839485 - type: nauc_map_at_5_max value: 25.393331109666388 - type: nauc_map_at_5_std value: -6.023485287246353 - type: nauc_mrr_at_1000_diff1 value: 12.047232036652295 - type: nauc_mrr_at_1000_max value: 46.611862580860645 - type: nauc_mrr_at_1000_std value: 27.89146066442305 - type: nauc_mrr_at_100_diff1 value: 12.05261747449997 - type: nauc_mrr_at_100_max value: 46.61328535381203 - type: nauc_mrr_at_100_std value: 27.886145596874535 - type: nauc_mrr_at_10_diff1 value: 12.006935553036941 - type: nauc_mrr_at_10_max value: 46.53351686240496 - type: nauc_mrr_at_10_std value: 27.708742470257462 - type: nauc_mrr_at_1_diff1 value: 13.323408127738782 - type: nauc_mrr_at_1_max value: 43.78884661002012 - type: nauc_mrr_at_1_std value: 25.164417588165673 - type: nauc_mrr_at_20_diff1 value: 12.036022973968011 - type: nauc_mrr_at_20_max value: 46.56537838037131 - type: nauc_mrr_at_20_std value: 27.78189157249635 - type: nauc_mrr_at_3_diff1 value: 11.943896700976381 - type: nauc_mrr_at_3_max value: 46.33644663073225 - type: nauc_mrr_at_3_std value: 27.523915405053845 - type: nauc_mrr_at_5_diff1 value: 12.03108009033769 - type: nauc_mrr_at_5_max value: 46.49103616896692 - type: nauc_mrr_at_5_std value: 27.630879129863366 - type: nauc_ndcg_at_1000_diff1 value: 9.766823796017324 - type: nauc_ndcg_at_1000_max value: 52.85844801910602 - type: nauc_ndcg_at_1000_std value: 36.43271437761207 - type: nauc_ndcg_at_100_diff1 value: 12.035059298282036 - type: nauc_ndcg_at_100_max 
value: 50.05520240705682 - type: nauc_ndcg_at_100_std value: 29.87678724506636 - type: nauc_ndcg_at_10_diff1 value: 10.281893031139424 - type: nauc_ndcg_at_10_max value: 47.02153679426017 - type: nauc_ndcg_at_10_std value: 26.624948330369126 - type: nauc_ndcg_at_1_diff1 value: 13.323408127738782 - type: nauc_ndcg_at_1_max value: 43.78884661002012 - type: nauc_ndcg_at_1_std value: 25.164417588165673 - type: nauc_ndcg_at_20_diff1 value: 11.463524849646598 - type: nauc_ndcg_at_20_max value: 47.415073186019704 - type: nauc_ndcg_at_20_std value: 26.359019620164307 - type: nauc_ndcg_at_3_diff1 value: 9.689199913805394 - type: nauc_ndcg_at_3_max value: 45.68151849572808 - type: nauc_ndcg_at_3_std value: 26.559193219799486 - type: nauc_ndcg_at_5_diff1 value: 9.448823370356575 - type: nauc_ndcg_at_5_max value: 46.19999662690141 - type: nauc_ndcg_at_5_std value: 26.8411706726069 - type: nauc_precision_at_1000_diff1 value: -20.379065598727024 - type: nauc_precision_at_1000_max value: 13.162562437268427 - type: nauc_precision_at_1000_std value: 22.658226157785812 - type: nauc_precision_at_100_diff1 value: -16.458155977309282 - type: nauc_precision_at_100_max value: 35.97956789169889 - type: nauc_precision_at_100_std value: 48.878375009979194 - type: nauc_precision_at_10_diff1 value: -7.810992317607771 - type: nauc_precision_at_10_max value: 49.307339277444754 - type: nauc_precision_at_10_std value: 42.82533951854582 - type: nauc_precision_at_1_diff1 value: 13.323408127738782 - type: nauc_precision_at_1_max value: 43.78884661002012 - type: nauc_precision_at_1_std value: 25.164417588165673 - type: nauc_precision_at_20_diff1 value: -11.43933465149542 - type: nauc_precision_at_20_max value: 46.93722753460038 - type: nauc_precision_at_20_std value: 47.36223769029678 - type: nauc_precision_at_3_diff1 value: 1.3230178593599737 - type: nauc_precision_at_3_max value: 48.49039534395576 - type: nauc_precision_at_3_std value: 33.161384183129194 - type: nauc_precision_at_5_diff1 value: -3.185516457926519 - type: nauc_precision_at_5_max value: 49.5814309394308 - type: nauc_precision_at_5_std value: 37.57637865900281 - type: nauc_recall_at_1000_diff1 value: 7.839499443984168 - type: nauc_recall_at_1000_max value: 52.67165467640894 - type: nauc_recall_at_1000_std value: 48.85318316702583 - type: nauc_recall_at_100_diff1 value: 14.117557049589418 - type: nauc_recall_at_100_max value: 40.59046301348715 - type: nauc_recall_at_100_std value: 24.379680901739505 - type: nauc_recall_at_10_diff1 value: 20.04536052614054 - type: nauc_recall_at_10_max value: 25.54148839721574 - type: nauc_recall_at_10_std value: -1.938182527562211 - type: nauc_recall_at_1_diff1 value: 37.02663305598722 - type: nauc_recall_at_1_max value: 14.931071531116528 - type: nauc_recall_at_1_std value: -12.478790028708453 - type: nauc_recall_at_20_diff1 value: 17.959977483235566 - type: nauc_recall_at_20_max value: 29.88502687870809 - type: nauc_recall_at_20_std value: 4.26527395196852 - type: nauc_recall_at_3_diff1 value: 26.297810954500456 - type: nauc_recall_at_3_max value: 18.819406079307402 - type: nauc_recall_at_3_std value: -10.002237229729081 - type: nauc_recall_at_5_diff1 value: 22.739080899568485 - type: nauc_recall_at_5_max value: 21.0322968243985 - type: nauc_recall_at_5_std value: -6.927749435306422 - type: ndcg_at_1 value: 34.717999999999996 - type: ndcg_at_10 value: 34.772999999999996 - type: ndcg_at_100 value: 39.407 - type: ndcg_at_1000 value: 44.830999999999996 - type: ndcg_at_20 value: 35.667 - type: ndcg_at_3 value: 34.332 - type: 
ndcg_at_5 value: 34.408 - type: precision_at_1 value: 34.717999999999996 - type: precision_at_10 value: 23.430999999999997 - type: precision_at_100 value: 9.31 - type: precision_at_1000 value: 2.259 - type: precision_at_20 value: 18.826999999999998 - type: precision_at_3 value: 30.553 - type: precision_at_5 value: 27.792 - type: recall_at_1 value: 7.2620000000000005 - type: recall_at_10 value: 26.384 - type: recall_at_100 value: 52.506 - type: recall_at_1000 value: 73.38 - type: recall_at_20 value: 34.032000000000004 - type: recall_at_3 value: 14.821000000000002 - type: recall_at_5 value: 19.481 task: type: Retrieval - dataset: config: de name: MTEB XMarket (de) revision: dfe57acff5b62c23732a7b7d3e3fb84ff501708b split: test type: jinaai/xmarket_ml metrics: - type: main_score value: 28.316000000000003 - type: map_at_1 value: 8.667 - type: map_at_10 value: 17.351 - type: map_at_100 value: 21.02 - type: map_at_1000 value: 21.951 - type: map_at_20 value: 18.994 - type: map_at_3 value: 13.23 - type: map_at_5 value: 15.17 - type: mrr_at_1 value: 27.27272727272727 - type: mrr_at_10 value: 36.10858487561485 - type: mrr_at_100 value: 36.92033814316568 - type: mrr_at_1000 value: 36.972226653870365 - type: mrr_at_20 value: 36.58914906427944 - type: mrr_at_3 value: 33.642969201552305 - type: mrr_at_5 value: 35.13417554289494 - type: nauc_map_at_1000_diff1 value: 23.345116790998063 - type: nauc_map_at_1000_max value: 44.447240670835725 - type: nauc_map_at_1000_std value: 18.34636500680144 - type: nauc_map_at_100_diff1 value: 24.458120909292347 - type: nauc_map_at_100_max value: 43.31851431140378 - type: nauc_map_at_100_std value: 15.654778355549965 - type: nauc_map_at_10_diff1 value: 29.376508937265044 - type: nauc_map_at_10_max value: 36.650196725140795 - type: nauc_map_at_10_std value: 4.682465435374843 - type: nauc_map_at_1_diff1 value: 40.382365672683214 - type: nauc_map_at_1_max value: 22.894341150096785 - type: nauc_map_at_1_std value: -5.610725673968323 - type: nauc_map_at_20_diff1 value: 27.197033425732908 - type: nauc_map_at_20_max value: 39.71672400647207 - type: nauc_map_at_20_std value: 8.944436813309933 - type: nauc_map_at_3_diff1 value: 34.49739294661502 - type: nauc_map_at_3_max value: 29.006972420735284 - type: nauc_map_at_3_std value: -3.0372650571243986 - type: nauc_map_at_5_diff1 value: 32.764901537277105 - type: nauc_map_at_5_max value: 32.658533295918154 - type: nauc_map_at_5_std value: 0.029626452286996906 - type: nauc_mrr_at_1000_diff1 value: 19.521229956280603 - type: nauc_mrr_at_1000_max value: 44.39409866211472 - type: nauc_mrr_at_1000_std value: 23.580697307036058 - type: nauc_mrr_at_100_diff1 value: 19.51312676591073 - type: nauc_mrr_at_100_max value: 44.39559153963895 - type: nauc_mrr_at_100_std value: 23.57913711397437 - type: nauc_mrr_at_10_diff1 value: 19.584635617935145 - type: nauc_mrr_at_10_max value: 44.44842226236198 - type: nauc_mrr_at_10_std value: 23.382684909390434 - type: nauc_mrr_at_1_diff1 value: 20.92594790923806 - type: nauc_mrr_at_1_max value: 40.593939625252816 - type: nauc_mrr_at_1_std value: 20.37467598073644 - type: nauc_mrr_at_20_diff1 value: 19.590641822115725 - type: nauc_mrr_at_20_max value: 44.42512299604718 - type: nauc_mrr_at_20_std value: 23.45564260800024 - type: nauc_mrr_at_3_diff1 value: 20.005307129527232 - type: nauc_mrr_at_3_max value: 43.68300366192776 - type: nauc_mrr_at_3_std value: 22.297190480842005 - type: nauc_mrr_at_5_diff1 value: 19.852896386271716 - type: nauc_mrr_at_5_max value: 44.20641808920062 - type: nauc_mrr_at_5_std 
value: 22.966517330852895 - type: nauc_ndcg_at_1000_diff1 value: 17.800116251376103 - type: nauc_ndcg_at_1000_max value: 50.98332718061365 - type: nauc_ndcg_at_1000_std value: 31.464484658102577 - type: nauc_ndcg_at_100_diff1 value: 19.555159680541088 - type: nauc_ndcg_at_100_max value: 48.56377130899141 - type: nauc_ndcg_at_100_std value: 25.77572748714817 - type: nauc_ndcg_at_10_diff1 value: 20.003008726679415 - type: nauc_ndcg_at_10_max value: 45.1293725480628 - type: nauc_ndcg_at_10_std value: 21.149213260765872 - type: nauc_ndcg_at_1_diff1 value: 21.00986278773023 - type: nauc_ndcg_at_1_max value: 40.524637076774894 - type: nauc_ndcg_at_1_std value: 20.29682194006685 - type: nauc_ndcg_at_20_diff1 value: 20.659734137312284 - type: nauc_ndcg_at_20_max value: 45.73108736599869 - type: nauc_ndcg_at_20_std value: 21.200736170346133 - type: nauc_ndcg_at_3_diff1 value: 19.200120542882544 - type: nauc_ndcg_at_3_max value: 42.89772612963168 - type: nauc_ndcg_at_3_std value: 20.713292754978983 - type: nauc_ndcg_at_5_diff1 value: 19.96329647992544 - type: nauc_ndcg_at_5_max value: 44.296627037787324 - type: nauc_ndcg_at_5_std value: 21.200135784971973 - type: nauc_precision_at_1000_diff1 value: -11.543221249009427 - type: nauc_precision_at_1000_max value: 9.132801614448221 - type: nauc_precision_at_1000_std value: 21.203720655381055 - type: nauc_precision_at_100_diff1 value: -12.510945425786039 - type: nauc_precision_at_100_max value: 31.42530963666252 - type: nauc_precision_at_100_std value: 44.99672783467617 - type: nauc_precision_at_10_diff1 value: -4.025802651746804 - type: nauc_precision_at_10_max value: 47.50967924227793 - type: nauc_precision_at_10_std value: 41.1558559268985 - type: nauc_precision_at_1_diff1 value: 21.00986278773023 - type: nauc_precision_at_1_max value: 40.524637076774894 - type: nauc_precision_at_1_std value: 20.29682194006685 - type: nauc_precision_at_20_diff1 value: -8.059482951110002 - type: nauc_precision_at_20_max value: 44.28832115946278 - type: nauc_precision_at_20_std value: 45.2005585353651 - type: nauc_precision_at_3_diff1 value: 8.53530005716248 - type: nauc_precision_at_3_max value: 46.48353678905102 - type: nauc_precision_at_3_std value: 28.868791323881972 - type: nauc_precision_at_5_diff1 value: 3.093619954821814 - type: nauc_precision_at_5_max value: 48.43294475817019 - type: nauc_precision_at_5_std value: 34.83430452745434 - type: nauc_recall_at_1000_diff1 value: 9.93680206699751 - type: nauc_recall_at_1000_max value: 52.97840222394363 - type: nauc_recall_at_1000_std value: 46.370023604436255 - type: nauc_recall_at_100_diff1 value: 14.100542445524972 - type: nauc_recall_at_100_max value: 42.853775131475224 - type: nauc_recall_at_100_std value: 26.93029971231028 - type: nauc_recall_at_10_diff1 value: 22.774547475714716 - type: nauc_recall_at_10_max value: 33.984586405015044 - type: nauc_recall_at_10_std value: 5.332325172373655 - type: nauc_recall_at_1_diff1 value: 40.382365672683214 - type: nauc_recall_at_1_max value: 22.894341150096785 - type: nauc_recall_at_1_std value: -5.610725673968323 - type: nauc_recall_at_20_diff1 value: 19.751060483835936 - type: nauc_recall_at_20_max value: 36.18774034635102 - type: nauc_recall_at_20_std value: 10.362242090308577 - type: nauc_recall_at_3_diff1 value: 30.29462372902671 - type: nauc_recall_at_3_max value: 27.377175450099635 - type: nauc_recall_at_3_std value: -3.015752705993425 - type: nauc_recall_at_5_diff1 value: 28.096893312615723 - type: nauc_recall_at_5_max value: 30.485075571512425 - type: 
nauc_recall_at_5_std value: 0.09106417003502826 - type: ndcg_at_1 value: 27.248 - type: ndcg_at_10 value: 28.316000000000003 - type: ndcg_at_100 value: 33.419 - type: ndcg_at_1000 value: 38.134 - type: ndcg_at_20 value: 29.707 - type: ndcg_at_3 value: 26.93 - type: ndcg_at_5 value: 27.363 - type: precision_at_1 value: 27.248 - type: precision_at_10 value: 15.073 - type: precision_at_100 value: 5.061 - type: precision_at_1000 value: 1.325 - type: precision_at_20 value: 11.407 - type: precision_at_3 value: 21.823 - type: precision_at_5 value: 18.984 - type: recall_at_1 value: 8.667 - type: recall_at_10 value: 26.984 - type: recall_at_100 value: 49.753 - type: recall_at_1000 value: 70.354 - type: recall_at_20 value: 33.955999999999996 - type: recall_at_3 value: 16.086 - type: recall_at_5 value: 20.544999999999998 task: type: Retrieval - dataset: config: es name: MTEB XMarket (es) revision: dfe57acff5b62c23732a7b7d3e3fb84ff501708b split: test type: jinaai/xmarket_ml metrics: - type: main_score value: 26.592 - type: map_at_1 value: 8.081000000000001 - type: map_at_10 value: 16.486 - type: map_at_100 value: 19.996 - type: map_at_1000 value: 20.889 - type: map_at_20 value: 18.088 - type: map_at_3 value: 12.864 - type: map_at_5 value: 14.515 - type: mrr_at_1 value: 24.643356643356643 - type: mrr_at_10 value: 33.755599955599926 - type: mrr_at_100 value: 34.55914769326114 - type: mrr_at_1000 value: 34.614384237219745 - type: mrr_at_20 value: 34.228909650276194 - type: mrr_at_3 value: 31.445221445221456 - type: mrr_at_5 value: 32.71375291375297 - type: nauc_map_at_1000_diff1 value: 19.17751654240679 - type: nauc_map_at_1000_max value: 43.493743561136434 - type: nauc_map_at_1000_std value: 21.14477911550252 - type: nauc_map_at_100_diff1 value: 20.259227234415395 - type: nauc_map_at_100_max value: 42.510860292169106 - type: nauc_map_at_100_std value: 18.63085160442346 - type: nauc_map_at_10_diff1 value: 24.12419385640694 - type: nauc_map_at_10_max value: 35.99892932069915 - type: nauc_map_at_10_std value: 8.488520124325058 - type: nauc_map_at_1_diff1 value: 35.09239143996649 - type: nauc_map_at_1_max value: 23.72498533914286 - type: nauc_map_at_1_std value: -4.164387883546102 - type: nauc_map_at_20_diff1 value: 22.411418237320817 - type: nauc_map_at_20_max value: 39.12496266094892 - type: nauc_map_at_20_std value: 12.371656353894227 - type: nauc_map_at_3_diff1 value: 28.106972376813506 - type: nauc_map_at_3_max value: 29.57824316865409 - type: nauc_map_at_3_std value: 1.8928791254813127 - type: nauc_map_at_5_diff1 value: 26.4958239149419 - type: nauc_map_at_5_max value: 32.45906016649239 - type: nauc_map_at_5_std value: 4.612735963224018 - type: nauc_mrr_at_1000_diff1 value: 17.614812607094446 - type: nauc_mrr_at_1000_max value: 41.13031556228715 - type: nauc_mrr_at_1000_std value: 22.564112871230318 - type: nauc_mrr_at_100_diff1 value: 17.614044568011085 - type: nauc_mrr_at_100_max value: 41.129436273086796 - type: nauc_mrr_at_100_std value: 22.566763500658766 - type: nauc_mrr_at_10_diff1 value: 17.61869494452089 - type: nauc_mrr_at_10_max value: 41.091542329381426 - type: nauc_mrr_at_10_std value: 22.370473458633594 - type: nauc_mrr_at_1_diff1 value: 20.321421442201913 - type: nauc_mrr_at_1_max value: 38.36531448180009 - type: nauc_mrr_at_1_std value: 18.422203207777688 - type: nauc_mrr_at_20_diff1 value: 17.614767736091625 - type: nauc_mrr_at_20_max value: 41.11221420736687 - type: nauc_mrr_at_20_std value: 22.44271891522012 - type: nauc_mrr_at_3_diff1 value: 17.98184651584625 - type: 
nauc_mrr_at_3_max value: 40.424293610470144 - type: nauc_mrr_at_3_std value: 21.554750947206706 - type: nauc_mrr_at_5_diff1 value: 17.72088314927416 - type: nauc_mrr_at_5_max value: 40.662724739072694 - type: nauc_mrr_at_5_std value: 21.822957528431928 - type: nauc_ndcg_at_1000_diff1 value: 15.310699428328398 - type: nauc_ndcg_at_1000_max value: 48.83921393349997 - type: nauc_ndcg_at_1000_std value: 32.22600294110774 - type: nauc_ndcg_at_100_diff1 value: 16.62672763977423 - type: nauc_ndcg_at_100_max value: 47.36060653537392 - type: nauc_ndcg_at_100_std value: 27.879865162871575 - type: nauc_ndcg_at_10_diff1 value: 16.436684176028116 - type: nauc_ndcg_at_10_max value: 43.00026520872974 - type: nauc_ndcg_at_10_std value: 22.507354939162806 - type: nauc_ndcg_at_1_diff1 value: 20.321421442201913 - type: nauc_ndcg_at_1_max value: 38.36531448180009 - type: nauc_ndcg_at_1_std value: 18.422203207777688 - type: nauc_ndcg_at_20_diff1 value: 17.127747123248835 - type: nauc_ndcg_at_20_max value: 44.57322943752733 - type: nauc_ndcg_at_20_std value: 23.146541187377036 - type: nauc_ndcg_at_3_diff1 value: 16.372742984728514 - type: nauc_ndcg_at_3_max value: 40.91938017883993 - type: nauc_ndcg_at_3_std value: 21.50917089194154 - type: nauc_ndcg_at_5_diff1 value: 16.40486505525073 - type: nauc_ndcg_at_5_max value: 41.94597203181329 - type: nauc_ndcg_at_5_std value: 22.068260809047562 - type: nauc_precision_at_1000_diff1 value: -15.9415313729527 - type: nauc_precision_at_1000_max value: 12.653329948983643 - type: nauc_precision_at_1000_std value: 26.371820703256173 - type: nauc_precision_at_100_diff1 value: -11.851070166675289 - type: nauc_precision_at_100_max value: 32.164365923950115 - type: nauc_precision_at_100_std value: 45.930226426725426 - type: nauc_precision_at_10_diff1 value: -3.1352660378259163 - type: nauc_precision_at_10_max value: 45.48359878733272 - type: nauc_precision_at_10_std value: 40.2917038044196 - type: nauc_precision_at_1_diff1 value: 20.321421442201913 - type: nauc_precision_at_1_max value: 38.36531448180009 - type: nauc_precision_at_1_std value: 18.422203207777688 - type: nauc_precision_at_20_diff1 value: -7.087513342144751 - type: nauc_precision_at_20_max value: 43.66272019058357 - type: nauc_precision_at_20_std value: 44.22863351071686 - type: nauc_precision_at_3_diff1 value: 7.836185032609045 - type: nauc_precision_at_3_max value: 44.85412904097269 - type: nauc_precision_at_3_std value: 30.209139149500057 - type: nauc_precision_at_5_diff1 value: 3.028150537253791 - type: nauc_precision_at_5_max value: 45.73661708882973 - type: nauc_precision_at_5_std value: 34.65500311185052 - type: nauc_recall_at_1000_diff1 value: 9.526124668370704 - type: nauc_recall_at_1000_max value: 51.4190208452196 - type: nauc_recall_at_1000_std value: 45.694891695646426 - type: nauc_recall_at_100_diff1 value: 12.68466215400009 - type: nauc_recall_at_100_max value: 42.79112054268112 - type: nauc_recall_at_100_std value: 28.61954251400998 - type: nauc_recall_at_10_diff1 value: 17.95124413416829 - type: nauc_recall_at_10_max value: 33.1192036755167 - type: nauc_recall_at_10_std value: 9.3588175959525 - type: nauc_recall_at_1_diff1 value: 35.09239143996649 - type: nauc_recall_at_1_max value: 23.72498533914286 - type: nauc_recall_at_1_std value: -4.164387883546102 - type: nauc_recall_at_20_diff1 value: 16.24916980445646 - type: nauc_recall_at_20_max value: 36.51316122236076 - type: nauc_recall_at_20_std value: 13.641588062425736 - type: nauc_recall_at_3_diff1 value: 23.263199724138786 - type: 
nauc_recall_at_3_max value: 27.67354561610614 - type: nauc_recall_at_3_std value: 3.103127242654415 - type: nauc_recall_at_5_diff1 value: 20.719704839229635 - type: nauc_recall_at_5_max value: 29.66480839111333 - type: nauc_recall_at_5_std value: 5.514884455797986 - type: ndcg_at_1 value: 24.643 - type: ndcg_at_10 value: 26.592 - type: ndcg_at_100 value: 31.887 - type: ndcg_at_1000 value: 36.695 - type: ndcg_at_20 value: 28.166000000000004 - type: ndcg_at_3 value: 25.238 - type: ndcg_at_5 value: 25.545 - type: precision_at_1 value: 24.643 - type: precision_at_10 value: 13.730999999999998 - type: precision_at_100 value: 4.744000000000001 - type: precision_at_1000 value: 1.167 - type: precision_at_20 value: 10.562000000000001 - type: precision_at_3 value: 20.288999999999998 - type: precision_at_5 value: 17.337 - type: recall_at_1 value: 8.081000000000001 - type: recall_at_10 value: 25.911 - type: recall_at_100 value: 48.176 - type: recall_at_1000 value: 69.655 - type: recall_at_20 value: 32.924 - type: recall_at_3 value: 16.125 - type: recall_at_5 value: 19.988 task: type: Retrieval - dataset: config: deu-deu name: MTEB XPQARetrieval (deu-deu) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 84.552 - type: map_at_1 value: 59.023 - type: map_at_10 value: 81.051 - type: map_at_100 value: 81.539 - type: map_at_1000 value: 81.54299999999999 - type: map_at_20 value: 81.401 - type: map_at_3 value: 76.969 - type: map_at_5 value: 80.07600000000001 - type: mrr_at_1 value: 77.67624020887729 - type: mrr_at_10 value: 83.30509967259314 - type: mrr_at_100 value: 83.58599391639456 - type: mrr_at_1000 value: 83.58970114722587 - type: mrr_at_20 value: 83.50275980440317 - type: mrr_at_3 value: 82.07136640557006 - type: mrr_at_5 value: 82.94604003481287 - type: nauc_map_at_1000_diff1 value: 63.12885104269942 - type: nauc_map_at_1000_max value: 57.7017996674959 - type: nauc_map_at_1000_std value: -24.951068985070513 - type: nauc_map_at_100_diff1 value: 63.12866509393162 - type: nauc_map_at_100_max value: 57.70176426013332 - type: nauc_map_at_100_std value: -24.96012290790273 - type: nauc_map_at_10_diff1 value: 62.847709436211204 - type: nauc_map_at_10_max value: 57.408873624779524 - type: nauc_map_at_10_std value: -25.635130363219062 - type: nauc_map_at_1_diff1 value: 71.89683981857102 - type: nauc_map_at_1_max value: 20.204460967432645 - type: nauc_map_at_1_std value: -23.07894656629493 - type: nauc_map_at_20_diff1 value: 63.00504457011043 - type: nauc_map_at_20_max value: 57.66009512514262 - type: nauc_map_at_20_std value: -25.100138593754885 - type: nauc_map_at_3_diff1 value: 63.199874607788274 - type: nauc_map_at_3_max value: 47.54482033763308 - type: nauc_map_at_3_std value: -27.714557098916963 - type: nauc_map_at_5_diff1 value: 63.01006523518669 - type: nauc_map_at_5_max value: 56.501965964288495 - type: nauc_map_at_5_std value: -25.367825762790925 - type: nauc_mrr_at_1000_diff1 value: 66.24988063948112 - type: nauc_mrr_at_1000_max value: 63.56921667744273 - type: nauc_mrr_at_1000_std value: -22.073973768031863 - type: nauc_mrr_at_100_diff1 value: 66.24919554296275 - type: nauc_mrr_at_100_max value: 63.57382447608361 - type: nauc_mrr_at_100_std value: -22.084627248538187 - type: nauc_mrr_at_10_diff1 value: 66.0143885124066 - type: nauc_mrr_at_10_max value: 63.51277586011898 - type: nauc_mrr_at_10_std value: -22.477523960705454 - type: nauc_mrr_at_1_diff1 value: 68.25415199323474 - type: nauc_mrr_at_1_max value: 63.069019003272416 - type: 
nauc_mrr_at_1_std value: -18.77085924093244 - type: nauc_mrr_at_20_diff1 value: 66.16203167351055 - type: nauc_mrr_at_20_max value: 63.607477776215845 - type: nauc_mrr_at_20_std value: -22.15083176017266 - type: nauc_mrr_at_3_diff1 value: 66.39368842782302 - type: nauc_mrr_at_3_max value: 63.11411066585295 - type: nauc_mrr_at_3_std value: -22.63174342814071 - type: nauc_mrr_at_5_diff1 value: 66.17932562332354 - type: nauc_mrr_at_5_max value: 63.70434825329594 - type: nauc_mrr_at_5_std value: -21.704012812430438 - type: nauc_ndcg_at_1000_diff1 value: 63.958010361549356 - type: nauc_ndcg_at_1000_max value: 60.516445000134624 - type: nauc_ndcg_at_1000_std value: -24.264672248289923 - type: nauc_ndcg_at_100_diff1 value: 63.97654644758022 - type: nauc_ndcg_at_100_max value: 60.62187552803407 - type: nauc_ndcg_at_100_std value: -24.317149225778312 - type: nauc_ndcg_at_10_diff1 value: 62.505321221321566 - type: nauc_ndcg_at_10_max value: 59.77891112351258 - type: nauc_ndcg_at_10_std value: -26.90910005589911 - type: nauc_ndcg_at_1_diff1 value: 68.25415199323474 - type: nauc_ndcg_at_1_max value: 63.069019003272416 - type: nauc_ndcg_at_1_std value: -18.77085924093244 - type: nauc_ndcg_at_20_diff1 value: 63.04281805056225 - type: nauc_ndcg_at_20_max value: 60.600957307444226 - type: nauc_ndcg_at_20_std value: -24.954862079889203 - type: nauc_ndcg_at_3_diff1 value: 62.970441139740316 - type: nauc_ndcg_at_3_max value: 57.543715669055295 - type: nauc_ndcg_at_3_std value: -25.659388431714703 - type: nauc_ndcg_at_5_diff1 value: 62.82652127664541 - type: nauc_ndcg_at_5_max value: 58.6970443258532 - type: nauc_ndcg_at_5_std value: -25.66329354851023 - type: nauc_precision_at_1000_diff1 value: -33.38530947486223 - type: nauc_precision_at_1000_max value: 25.972468024345414 - type: nauc_precision_at_1000_std value: 17.460222955117978 - type: nauc_precision_at_100_diff1 value: -32.45175999251703 - type: nauc_precision_at_100_max value: 26.367996120487337 - type: nauc_precision_at_100_std value: 17.097957946391208 - type: nauc_precision_at_10_diff1 value: -26.97411235289487 - type: nauc_precision_at_10_max value: 31.504961687240762 - type: nauc_precision_at_10_std value: 11.125341183874687 - type: nauc_precision_at_1_diff1 value: 68.25415199323474 - type: nauc_precision_at_1_max value: 63.069019003272416 - type: nauc_precision_at_1_std value: -18.77085924093244 - type: nauc_precision_at_20_diff1 value: -29.8678078736273 - type: nauc_precision_at_20_max value: 29.031222186584504 - type: nauc_precision_at_20_std value: 14.943600563087928 - type: nauc_precision_at_3_diff1 value: -15.92947221299854 - type: nauc_precision_at_3_max value: 37.73833494235097 - type: nauc_precision_at_3_std value: 3.1573228443500847 - type: nauc_precision_at_5_diff1 value: -22.269156821101642 - type: nauc_precision_at_5_max value: 35.65821838116355 - type: nauc_precision_at_5_std value: 9.265930386198972 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 66.17058859539249 - type: nauc_recall_at_100_max value: 78.066942935192 - type: nauc_recall_at_100_std value: -22.213377762074686 - type: nauc_recall_at_10_diff1 value: 50.82149700700275 - type: nauc_recall_at_10_max value: 56.68053325008221 - type: nauc_recall_at_10_std value: -41.81657941433277 - type: nauc_recall_at_1_diff1 value: 71.89683981857102 - type: nauc_recall_at_1_max value: 20.204460967432645 - type: nauc_recall_at_1_std value: -23.07894656629493 - 
type: nauc_recall_at_20_diff1 value: 48.28076011857885 - type: nauc_recall_at_20_max value: 63.29641555519295 - type: nauc_recall_at_20_std value: -32.953559708819405 - type: nauc_recall_at_3_diff1 value: 58.15516956312558 - type: nauc_recall_at_3_max value: 42.66315890283056 - type: nauc_recall_at_3_std value: -32.16572530544806 - type: nauc_recall_at_5_diff1 value: 55.900844052439766 - type: nauc_recall_at_5_max value: 55.23702018862884 - type: nauc_recall_at_5_std value: -30.105929528165 - type: ndcg_at_1 value: 77.676 - type: ndcg_at_10 value: 84.552 - type: ndcg_at_100 value: 86.232 - type: ndcg_at_1000 value: 86.33800000000001 - type: ndcg_at_20 value: 85.515 - type: ndcg_at_3 value: 81.112 - type: ndcg_at_5 value: 82.943 - type: precision_at_1 value: 77.676 - type: precision_at_10 value: 15.17 - type: precision_at_100 value: 1.6230000000000002 - type: precision_at_1000 value: 0.163 - type: precision_at_20 value: 7.858999999999999 - type: precision_at_3 value: 42.994 - type: precision_at_5 value: 28.747 - type: recall_at_1 value: 59.023 - type: recall_at_10 value: 92.465 - type: recall_at_100 value: 99.18400000000001 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 95.844 - type: recall_at_3 value: 81.826 - type: recall_at_5 value: 88.22 task: type: Retrieval - dataset: config: deu-eng name: MTEB XPQARetrieval (deu-eng) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 82.149 - type: map_at_1 value: 56.277 - type: map_at_10 value: 78.36999999999999 - type: map_at_100 value: 78.94 - type: map_at_1000 value: 78.95 - type: map_at_20 value: 78.818 - type: map_at_3 value: 74.25 - type: map_at_5 value: 77.11099999999999 - type: mrr_at_1 value: 74.28198433420366 - type: mrr_at_10 value: 80.57487877657589 - type: mrr_at_100 value: 80.94025764149008 - type: mrr_at_1000 value: 80.94608738871234 - type: mrr_at_20 value: 80.86240675885023 - type: mrr_at_3 value: 79.4604003481288 - type: mrr_at_5 value: 80.10008703220191 - type: nauc_map_at_1000_diff1 value: 60.44369249057189 - type: nauc_map_at_1000_max value: 49.822240441830246 - type: nauc_map_at_1000_std value: -27.34026380762817 - type: nauc_map_at_100_diff1 value: 60.44635668050401 - type: nauc_map_at_100_max value: 49.838675926660684 - type: nauc_map_at_100_std value: -27.310365556055583 - type: nauc_map_at_10_diff1 value: 60.18546951726522 - type: nauc_map_at_10_max value: 49.72075398096832 - type: nauc_map_at_10_std value: -27.86056102461558 - type: nauc_map_at_1_diff1 value: 71.2906657099758 - type: nauc_map_at_1_max value: 18.970399251589 - type: nauc_map_at_1_std value: -27.260776614286602 - type: nauc_map_at_20_diff1 value: 60.3525975566164 - type: nauc_map_at_20_max value: 49.852487866710646 - type: nauc_map_at_20_std value: -27.305173830170332 - type: nauc_map_at_3_diff1 value: 60.66803500571236 - type: nauc_map_at_3_max value: 41.18191941521972 - type: nauc_map_at_3_std value: -28.71383593401732 - type: nauc_map_at_5_diff1 value: 60.57216514504887 - type: nauc_map_at_5_max value: 47.99837400446299 - type: nauc_map_at_5_std value: -28.756183015949986 - type: nauc_mrr_at_1000_diff1 value: 63.77031955602516 - type: nauc_mrr_at_1000_max value: 54.26907383811417 - type: nauc_mrr_at_1000_std value: -26.227442087164714 - type: nauc_mrr_at_100_diff1 value: 63.77196650108669 - type: nauc_mrr_at_100_max value: 54.281801457913126 - type: nauc_mrr_at_100_std value: -26.216077891830793 - type: nauc_mrr_at_10_diff1 value: 63.50095284903051 - type: 
nauc_mrr_at_10_max value: 54.3186301730016 - type: nauc_mrr_at_10_std value: -26.29570241722173 - type: nauc_mrr_at_1_diff1 value: 65.15855770999057 - type: nauc_mrr_at_1_max value: 53.213286738515066 - type: nauc_mrr_at_1_std value: -24.683178252901943 - type: nauc_mrr_at_20_diff1 value: 63.74936550280859 - type: nauc_mrr_at_20_max value: 54.355343751439065 - type: nauc_mrr_at_20_std value: -26.197316900009817 - type: nauc_mrr_at_3_diff1 value: 63.912612979082695 - type: nauc_mrr_at_3_max value: 53.75399024225975 - type: nauc_mrr_at_3_std value: -27.194143264554675 - type: nauc_mrr_at_5_diff1 value: 63.72491059053639 - type: nauc_mrr_at_5_max value: 53.66107604019352 - type: nauc_mrr_at_5_std value: -26.92281560584754 - type: nauc_ndcg_at_1000_diff1 value: 61.304218998714354 - type: nauc_ndcg_at_1000_max value: 52.409135743660386 - type: nauc_ndcg_at_1000_std value: -26.539796489464056 - type: nauc_ndcg_at_100_diff1 value: 61.40355045085304 - type: nauc_ndcg_at_100_max value: 52.79402259608008 - type: nauc_ndcg_at_100_std value: -25.927273456979965 - type: nauc_ndcg_at_10_diff1 value: 59.93675608684116 - type: nauc_ndcg_at_10_max value: 52.617848197542706 - type: nauc_ndcg_at_10_std value: -27.314820020095887 - type: nauc_ndcg_at_1_diff1 value: 65.15855770999057 - type: nauc_ndcg_at_1_max value: 53.213286738515066 - type: nauc_ndcg_at_1_std value: -24.683178252901943 - type: nauc_ndcg_at_20_diff1 value: 60.85093704358376 - type: nauc_ndcg_at_20_max value: 53.14529242671602 - type: nauc_ndcg_at_20_std value: -25.93187916231906 - type: nauc_ndcg_at_3_diff1 value: 60.42301123518882 - type: nauc_ndcg_at_3_max value: 49.59021992975956 - type: nauc_ndcg_at_3_std value: -27.397117967810363 - type: nauc_ndcg_at_5_diff1 value: 60.78655153154219 - type: nauc_ndcg_at_5_max value: 49.54194799556953 - type: nauc_ndcg_at_5_std value: -29.467910172913413 - type: nauc_precision_at_1000_diff1 value: -34.35027108027456 - type: nauc_precision_at_1000_max value: 23.762671066858815 - type: nauc_precision_at_1000_std value: 16.1704780298982 - type: nauc_precision_at_100_diff1 value: -32.66610016754961 - type: nauc_precision_at_100_max value: 25.504044603109588 - type: nauc_precision_at_100_std value: 16.932402988816786 - type: nauc_precision_at_10_diff1 value: -25.720903145017342 - type: nauc_precision_at_10_max value: 30.37029690599926 - type: nauc_precision_at_10_std value: 10.560753160200314 - type: nauc_precision_at_1_diff1 value: 65.15855770999057 - type: nauc_precision_at_1_max value: 53.213286738515066 - type: nauc_precision_at_1_std value: -24.683178252901943 - type: nauc_precision_at_20_diff1 value: -29.577582332619084 - type: nauc_precision_at_20_max value: 27.984145595920417 - type: nauc_precision_at_20_std value: 15.083711704044727 - type: nauc_precision_at_3_diff1 value: -14.736267532892697 - type: nauc_precision_at_3_max value: 36.12211021824307 - type: nauc_precision_at_3_std value: 3.068643876519412 - type: nauc_precision_at_5_diff1 value: -19.846707283120825 - type: nauc_precision_at_5_max value: 33.573804532177896 - type: nauc_precision_at_5_std value: 5.700545622744924 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 68.24749796604452 - type: nauc_recall_at_100_max value: 83.30024864929815 - type: nauc_recall_at_100_std value: 21.23763053711522 - type: nauc_recall_at_10_diff1 value: 50.704049683241436 - type: nauc_recall_at_10_max value: 57.64578984555556 - type: 
nauc_recall_at_10_std value: -26.632759037746073 - type: nauc_recall_at_1_diff1 value: 71.2906657099758 - type: nauc_recall_at_1_max value: 18.970399251589 - type: nauc_recall_at_1_std value: -27.260776614286602 - type: nauc_recall_at_20_diff1 value: 54.124480837579505 - type: nauc_recall_at_20_max value: 66.4641515433479 - type: nauc_recall_at_20_std value: -14.615911455379393 - type: nauc_recall_at_3_diff1 value: 56.54358788321059 - type: nauc_recall_at_3_max value: 37.765735322465744 - type: nauc_recall_at_3_std value: -30.824147408598574 - type: nauc_recall_at_5_diff1 value: 56.392894535029214 - type: nauc_recall_at_5_max value: 45.959268387521554 - type: nauc_recall_at_5_std value: -33.58175576925282 - type: ndcg_at_1 value: 74.28200000000001 - type: ndcg_at_10 value: 82.149 - type: ndcg_at_100 value: 84.129 - type: ndcg_at_1000 value: 84.307 - type: ndcg_at_20 value: 83.39999999999999 - type: ndcg_at_3 value: 78.583 - type: ndcg_at_5 value: 80.13900000000001 - type: precision_at_1 value: 74.28200000000001 - type: precision_at_10 value: 14.960999999999999 - type: precision_at_100 value: 1.6119999999999999 - type: precision_at_1000 value: 0.163 - type: precision_at_20 value: 7.813000000000001 - type: precision_at_3 value: 41.819 - type: precision_at_5 value: 27.911 - type: recall_at_1 value: 56.277 - type: recall_at_10 value: 90.729 - type: recall_at_100 value: 98.792 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 95.148 - type: recall_at_3 value: 79.989 - type: recall_at_5 value: 85.603 task: type: Retrieval - dataset: config: eng-deu name: MTEB XPQARetrieval (eng-deu) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 60.428000000000004 - type: map_at_1 value: 33.453 - type: map_at_10 value: 54.217000000000006 - type: map_at_100 value: 55.832 - type: map_at_1000 value: 55.884 - type: map_at_20 value: 55.236 - type: map_at_3 value: 48.302 - type: map_at_5 value: 51.902 - type: mrr_at_1 value: 53.916449086161876 - type: mrr_at_10 value: 61.4685647975465 - type: mrr_at_100 value: 62.13718159287348 - type: mrr_at_1000 value: 62.15799113826325 - type: mrr_at_20 value: 61.885388764243544 - type: mrr_at_3 value: 59.44299390774582 - type: mrr_at_5 value: 60.26544821583981 - type: nauc_map_at_1000_diff1 value: 39.824412602121804 - type: nauc_map_at_1000_max value: 39.49332709959374 - type: nauc_map_at_1000_std value: -17.27462623749702 - type: nauc_map_at_100_diff1 value: 39.80528910003463 - type: nauc_map_at_100_max value: 39.51471609156093 - type: nauc_map_at_100_std value: -17.275536933094937 - type: nauc_map_at_10_diff1 value: 39.28558292349772 - type: nauc_map_at_10_max value: 38.13220294838968 - type: nauc_map_at_10_std value: -18.235985574392863 - type: nauc_map_at_1_diff1 value: 43.68892397816937 - type: nauc_map_at_1_max value: 14.478978190224353 - type: nauc_map_at_1_std value: -18.435031919225477 - type: nauc_map_at_20_diff1 value: 39.8733530971344 - type: nauc_map_at_20_max value: 39.30513202591992 - type: nauc_map_at_20_std value: -17.62362848144766 - type: nauc_map_at_3_diff1 value: 40.31116611188815 - type: nauc_map_at_3_max value: 31.107314675202165 - type: nauc_map_at_3_std value: -19.52930881946966 - type: nauc_map_at_5_diff1 value: 39.1241499095765 - type: nauc_map_at_5_max value: 37.330543901034055 - type: nauc_map_at_5_std value: -17.893862772447548 - type: nauc_mrr_at_1000_diff1 value: 43.07490530140024 - type: nauc_mrr_at_1000_max value: 42.28469195779226 - type: nauc_mrr_at_1000_std 
value: -15.583217110180737 - type: nauc_mrr_at_100_diff1 value: 43.068836494603886 - type: nauc_mrr_at_100_max value: 42.29612450479168 - type: nauc_mrr_at_100_std value: -15.57218089438229 - type: nauc_mrr_at_10_diff1 value: 42.88685919151777 - type: nauc_mrr_at_10_max value: 41.89944452003811 - type: nauc_mrr_at_10_std value: -15.909673572763165 - type: nauc_mrr_at_1_diff1 value: 45.67646898532131 - type: nauc_mrr_at_1_max value: 43.0541870425035 - type: nauc_mrr_at_1_std value: -15.597124291613563 - type: nauc_mrr_at_20_diff1 value: 43.14141873150977 - type: nauc_mrr_at_20_max value: 42.33063543184022 - type: nauc_mrr_at_20_std value: -15.607612016107304 - type: nauc_mrr_at_3_diff1 value: 43.18370928261982 - type: nauc_mrr_at_3_max value: 42.18529980773961 - type: nauc_mrr_at_3_std value: -15.900151400673629 - type: nauc_mrr_at_5_diff1 value: 42.43443044877765 - type: nauc_mrr_at_5_max value: 42.05818605278972 - type: nauc_mrr_at_5_std value: -15.436502733299893 - type: nauc_ndcg_at_1000_diff1 value: 40.60606676178781 - type: nauc_ndcg_at_1000_max value: 41.71923393878376 - type: nauc_ndcg_at_1000_std value: -15.694740326899556 - type: nauc_ndcg_at_100_diff1 value: 40.15270376312309 - type: nauc_ndcg_at_100_max value: 42.234126305709225 - type: nauc_ndcg_at_100_std value: -15.436051984708952 - type: nauc_ndcg_at_10_diff1 value: 39.142259831299455 - type: nauc_ndcg_at_10_max value: 38.61470104273746 - type: nauc_ndcg_at_10_std value: -18.577452829132742 - type: nauc_ndcg_at_1_diff1 value: 45.67646898532131 - type: nauc_ndcg_at_1_max value: 43.0541870425035 - type: nauc_ndcg_at_1_std value: -15.597124291613563 - type: nauc_ndcg_at_20_diff1 value: 40.805159395901306 - type: nauc_ndcg_at_20_max value: 41.58685629374952 - type: nauc_ndcg_at_20_std value: -16.862408156222592 - type: nauc_ndcg_at_3_diff1 value: 39.12028215488432 - type: nauc_ndcg_at_3_max value: 39.70580596343164 - type: nauc_ndcg_at_3_std value: -16.705546903936213 - type: nauc_ndcg_at_5_diff1 value: 38.42075404927361 - type: nauc_ndcg_at_5_max value: 38.064219879504385 - type: nauc_ndcg_at_5_std value: -17.20282111665876 - type: nauc_precision_at_1000_diff1 value: -4.419224540552891 - type: nauc_precision_at_1000_max value: 35.686022591225246 - type: nauc_precision_at_1000_std value: 15.023520191032972 - type: nauc_precision_at_100_diff1 value: -2.9027602601603895 - type: nauc_precision_at_100_max value: 39.99864013028808 - type: nauc_precision_at_100_std value: 13.863497117255525 - type: nauc_precision_at_10_diff1 value: 5.539104839809501 - type: nauc_precision_at_10_max value: 42.41625740557432 - type: nauc_precision_at_10_std value: 1.0894693748662556 - type: nauc_precision_at_1_diff1 value: 45.67646898532131 - type: nauc_precision_at_1_max value: 43.0541870425035 - type: nauc_precision_at_1_std value: -15.597124291613563 - type: nauc_precision_at_20_diff1 value: 4.734562571681868 - type: nauc_precision_at_20_max value: 44.35081213316202 - type: nauc_precision_at_20_std value: 6.642891478284595 - type: nauc_precision_at_3_diff1 value: 13.936559341472101 - type: nauc_precision_at_3_max value: 45.426668552497524 - type: nauc_precision_at_3_std value: -5.219785419247125 - type: nauc_precision_at_5_diff1 value: 8.366706789546015 - type: nauc_precision_at_5_max value: 46.161942989326896 - type: nauc_precision_at_5_std value: -0.193140343545876 - type: nauc_recall_at_1000_diff1 value: 45.61785312444842 - type: nauc_recall_at_1000_max value: 75.68258976531774 - type: nauc_recall_at_1000_std value: 37.469059422121575 - type: 
nauc_recall_at_100_diff1 value: 26.798748531805096 - type: nauc_recall_at_100_max value: 54.72134095197765 - type: nauc_recall_at_100_std value: -1.5967608233799417 - type: nauc_recall_at_10_diff1 value: 32.13211696200521 - type: nauc_recall_at_10_max value: 31.13866254975895 - type: nauc_recall_at_10_std value: -22.31404161136118 - type: nauc_recall_at_1_diff1 value: 43.68892397816937 - type: nauc_recall_at_1_max value: 14.478978190224353 - type: nauc_recall_at_1_std value: -18.435031919225477 - type: nauc_recall_at_20_diff1 value: 38.597996930461385 - type: nauc_recall_at_20_max value: 42.49849027366794 - type: nauc_recall_at_20_std value: -16.536471900752154 - type: nauc_recall_at_3_diff1 value: 35.343730012759266 - type: nauc_recall_at_3_max value: 26.898722085043392 - type: nauc_recall_at_3_std value: -19.4459792273884 - type: nauc_recall_at_5_diff1 value: 31.8310298012186 - type: nauc_recall_at_5_max value: 32.67800489655844 - type: nauc_recall_at_5_std value: -16.800929103347283 - type: ndcg_at_1 value: 53.916 - type: ndcg_at_10 value: 60.428000000000004 - type: ndcg_at_100 value: 65.95 - type: ndcg_at_1000 value: 66.88 - type: ndcg_at_20 value: 62.989 - type: ndcg_at_3 value: 55.204 - type: ndcg_at_5 value: 56.42700000000001 - type: precision_at_1 value: 53.916 - type: precision_at_10 value: 14.346999999999998 - type: precision_at_100 value: 1.849 - type: precision_at_1000 value: 0.196 - type: precision_at_20 value: 8.022 - type: precision_at_3 value: 34.552 - type: precision_at_5 value: 24.569 - type: recall_at_1 value: 33.453 - type: recall_at_10 value: 71.07900000000001 - type: recall_at_100 value: 93.207 - type: recall_at_1000 value: 99.60799999999999 - type: recall_at_20 value: 79.482 - type: recall_at_3 value: 53.98 - type: recall_at_5 value: 60.781 task: type: Retrieval - dataset: config: eng-pol name: MTEB XPQARetrieval (eng-pol) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 34.042 - type: map_at_1 value: 13.236 - type: map_at_10 value: 27.839999999999996 - type: map_at_100 value: 30.171999999999997 - type: map_at_1000 value: 30.349999999999998 - type: map_at_20 value: 29.044999999999998 - type: map_at_3 value: 22.58 - type: map_at_5 value: 25.83 - type: mrr_at_1 value: 30.318471337579616 - type: mrr_at_10 value: 37.4983823678091 - type: mrr_at_100 value: 38.5784523175009 - type: mrr_at_1000 value: 38.63608698968148 - type: mrr_at_20 value: 38.02996157871825 - type: mrr_at_3 value: 34.798301486199584 - type: mrr_at_5 value: 36.39702760084925 - type: nauc_map_at_1000_diff1 value: 21.07199789609177 - type: nauc_map_at_1000_max value: 25.959233507893277 - type: nauc_map_at_1000_std value: -28.011925372852826 - type: nauc_map_at_100_diff1 value: 21.086788412737548 - type: nauc_map_at_100_max value: 25.8611620203686 - type: nauc_map_at_100_std value: -28.179239912057515 - type: nauc_map_at_10_diff1 value: 21.23841745922078 - type: nauc_map_at_10_max value: 25.44290342378288 - type: nauc_map_at_10_std value: -28.75578689110275 - type: nauc_map_at_1_diff1 value: 28.87454015638211 - type: nauc_map_at_1_max value: 17.50681123879997 - type: nauc_map_at_1_std value: -30.382831850562432 - type: nauc_map_at_20_diff1 value: 21.076559713540455 - type: nauc_map_at_20_max value: 25.538154202494535 - type: nauc_map_at_20_std value: -28.518764617658555 - type: nauc_map_at_3_diff1 value: 22.159185358766468 - type: nauc_map_at_3_max value: 23.01652660927249 - type: nauc_map_at_3_std value: -29.567722713221862 - type: 
nauc_map_at_5_diff1 value: 21.35578810370897 - type: nauc_map_at_5_max value: 25.550550437767395 - type: nauc_map_at_5_std value: -28.7889035461355 - type: nauc_mrr_at_1000_diff1 value: 22.28633009221923 - type: nauc_mrr_at_1000_max value: 26.920205393136392 - type: nauc_mrr_at_1000_std value: -25.887791634977642 - type: nauc_mrr_at_100_diff1 value: 22.2754975739755 - type: nauc_mrr_at_100_max value: 26.90235716615346 - type: nauc_mrr_at_100_std value: -25.891596020584345 - type: nauc_mrr_at_10_diff1 value: 22.415076305593534 - type: nauc_mrr_at_10_max value: 26.504643796222222 - type: nauc_mrr_at_10_std value: -26.6046081215833 - type: nauc_mrr_at_1_diff1 value: 23.406748619244368 - type: nauc_mrr_at_1_max value: 29.058228240823553 - type: nauc_mrr_at_1_std value: -26.450169820901078 - type: nauc_mrr_at_20_diff1 value: 22.29233141817678 - type: nauc_mrr_at_20_max value: 26.69021351064081 - type: nauc_mrr_at_20_std value: -26.086596227376656 - type: nauc_mrr_at_3_diff1 value: 22.20746187500145 - type: nauc_mrr_at_3_max value: 27.143725946169457 - type: nauc_mrr_at_3_std value: -26.7017708594376 - type: nauc_mrr_at_5_diff1 value: 22.71898965233195 - type: nauc_mrr_at_5_max value: 26.932386658571662 - type: nauc_mrr_at_5_std value: -26.725541058780234 - type: nauc_ndcg_at_1000_diff1 value: 20.541734305148466 - type: nauc_ndcg_at_1000_max value: 27.180534238090758 - type: nauc_ndcg_at_1000_std value: -23.74197745177845 - type: nauc_ndcg_at_100_diff1 value: 20.570052839937468 - type: nauc_ndcg_at_100_max value: 26.21605034405486 - type: nauc_ndcg_at_100_std value: -25.359817188805028 - type: nauc_ndcg_at_10_diff1 value: 21.241423075073467 - type: nauc_ndcg_at_10_max value: 24.599199195239475 - type: nauc_ndcg_at_10_std value: -28.404540333309008 - type: nauc_ndcg_at_1_diff1 value: 23.406748619244368 - type: nauc_ndcg_at_1_max value: 29.058228240823553 - type: nauc_ndcg_at_1_std value: -26.450169820901078 - type: nauc_ndcg_at_20_diff1 value: 20.740460046196873 - type: nauc_ndcg_at_20_max value: 24.82380195169634 - type: nauc_ndcg_at_20_std value: -27.376298834244313 - type: nauc_ndcg_at_3_diff1 value: 19.994948682426504 - type: nauc_ndcg_at_3_max value: 26.153790759405105 - type: nauc_ndcg_at_3_std value: -27.194548404540885 - type: nauc_ndcg_at_5_diff1 value: 21.48414272096384 - type: nauc_ndcg_at_5_max value: 25.239652015076373 - type: nauc_ndcg_at_5_std value: -28.2620160957961 - type: nauc_precision_at_1000_diff1 value: -0.7557639926687744 - type: nauc_precision_at_1000_max value: 24.265591636994436 - type: nauc_precision_at_1000_std value: 16.833104654292654 - type: nauc_precision_at_100_diff1 value: 4.647847665941115 - type: nauc_precision_at_100_max value: 24.42192644844434 - type: nauc_precision_at_100_std value: 0.2718848568876648 - type: nauc_precision_at_10_diff1 value: 9.465969286722654 - type: nauc_precision_at_10_max value: 27.448993150448043 - type: nauc_precision_at_10_std value: -16.519099596502212 - type: nauc_precision_at_1_diff1 value: 23.406748619244368 - type: nauc_precision_at_1_max value: 29.058228240823553 - type: nauc_precision_at_1_std value: -26.450169820901078 - type: nauc_precision_at_20_diff1 value: 8.021421615668114 - type: nauc_precision_at_20_max value: 26.18556481398635 - type: nauc_precision_at_20_std value: -12.207152108668367 - type: nauc_precision_at_3_diff1 value: 11.783572803634241 - type: nauc_precision_at_3_max value: 29.259715774978893 - type: nauc_precision_at_3_std value: -20.407524967717425 - type: nauc_precision_at_5_diff1 value: 
10.371728615220821 - type: nauc_precision_at_5_max value: 30.270642833482864 - type: nauc_precision_at_5_std value: -18.407334880575494 - type: nauc_recall_at_1000_diff1 value: 6.008969959111555 - type: nauc_recall_at_1000_max value: 39.79691734058127 - type: nauc_recall_at_1000_std value: 32.43591825510109 - type: nauc_recall_at_100_diff1 value: 15.2374566058917 - type: nauc_recall_at_100_max value: 23.058785539503717 - type: nauc_recall_at_100_std value: -15.962888794058165 - type: nauc_recall_at_10_diff1 value: 19.46184821807753 - type: nauc_recall_at_10_max value: 19.001003513986866 - type: nauc_recall_at_10_std value: -27.753332786663876 - type: nauc_recall_at_1_diff1 value: 28.87454015638211 - type: nauc_recall_at_1_max value: 17.50681123879997 - type: nauc_recall_at_1_std value: -30.382831850562432 - type: nauc_recall_at_20_diff1 value: 17.237090858517405 - type: nauc_recall_at_20_max value: 18.42118474134871 - type: nauc_recall_at_20_std value: -24.862787724031957 - type: nauc_recall_at_3_diff1 value: 18.813019521758577 - type: nauc_recall_at_3_max value: 19.198572333053544 - type: nauc_recall_at_3_std value: -28.5644958605618 - type: nauc_recall_at_5_diff1 value: 20.247501986329482 - type: nauc_recall_at_5_max value: 21.121526202170358 - type: nauc_recall_at_5_std value: -27.220378617864853 - type: ndcg_at_1 value: 30.318 - type: ndcg_at_10 value: 34.042 - type: ndcg_at_100 value: 42.733 - type: ndcg_at_1000 value: 46.015 - type: ndcg_at_20 value: 37.053999999999995 - type: ndcg_at_3 value: 29.254 - type: ndcg_at_5 value: 30.514000000000003 - type: precision_at_1 value: 30.318 - type: precision_at_10 value: 10.981 - type: precision_at_100 value: 1.889 - type: precision_at_1000 value: 0.234 - type: precision_at_20 value: 6.643000000000001 - type: precision_at_3 value: 22.166 - type: precision_at_5 value: 17.477999999999998 - type: recall_at_1 value: 13.236 - type: recall_at_10 value: 41.461 - type: recall_at_100 value: 75.008 - type: recall_at_1000 value: 96.775 - type: recall_at_20 value: 50.754 - type: recall_at_3 value: 26.081 - type: recall_at_5 value: 33.168 task: type: Retrieval - dataset: config: eng-cmn name: MTEB XPQARetrieval (eng-cmn) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 37.504 - type: map_at_1 value: 16.019 - type: map_at_10 value: 30.794 - type: map_at_100 value: 33.157 - type: map_at_1000 value: 33.324999999999996 - type: map_at_20 value: 32.161 - type: map_at_3 value: 25.372 - type: map_at_5 value: 28.246 - type: mrr_at_1 value: 30.461165048543688 - type: mrr_at_10 value: 39.393107566651224 - type: mrr_at_100 value: 40.570039540602295 - type: mrr_at_1000 value: 40.6306116407744 - type: mrr_at_20 value: 40.09428159978876 - type: mrr_at_3 value: 37.176375404530745 - type: mrr_at_5 value: 38.09870550161812 - type: nauc_map_at_1000_diff1 value: 30.82306881892873 - type: nauc_map_at_1000_max value: 5.877636000666466 - type: nauc_map_at_1000_std value: -30.7140513386797 - type: nauc_map_at_100_diff1 value: 30.85192449151961 - type: nauc_map_at_100_max value: 5.809195131550909 - type: nauc_map_at_100_std value: -30.838556702972063 - type: nauc_map_at_10_diff1 value: 30.50359163635058 - type: nauc_map_at_10_max value: 6.373491595869303 - type: nauc_map_at_10_std value: -29.89368007827676 - type: nauc_map_at_1_diff1 value: 38.60240510083884 - type: nauc_map_at_1_max value: 10.407392664609139 - type: nauc_map_at_1_std value: -17.76327278732833 - type: nauc_map_at_20_diff1 value: 30.897489125753598 - 
type: nauc_map_at_20_max value: 5.9303381898248 - type: nauc_map_at_20_std value: -30.863345188760515 - type: nauc_map_at_3_diff1 value: 32.8150951852729 - type: nauc_map_at_3_max value: 7.671931402215177 - type: nauc_map_at_3_std value: -25.654809758216533 - type: nauc_map_at_5_diff1 value: 31.19558194781019 - type: nauc_map_at_5_max value: 6.426885613116939 - type: nauc_map_at_5_std value: -28.609027858850016 - type: nauc_mrr_at_1000_diff1 value: 30.7596332048733 - type: nauc_mrr_at_1000_max value: 1.1970748115580212 - type: nauc_mrr_at_1000_std value: -34.647570668150216 - type: nauc_mrr_at_100_diff1 value: 30.74693370788581 - type: nauc_mrr_at_100_max value: 1.1673272262754841 - type: nauc_mrr_at_100_std value: -34.67761028542745 - type: nauc_mrr_at_10_diff1 value: 30.537820575183076 - type: nauc_mrr_at_10_max value: 1.0261868725502707 - type: nauc_mrr_at_10_std value: -34.999990560631204 - type: nauc_mrr_at_1_diff1 value: 35.51868580113285 - type: nauc_mrr_at_1_max value: 5.117103773147307 - type: nauc_mrr_at_1_std value: -30.633913466736956 - type: nauc_mrr_at_20_diff1 value: 30.67318175430903 - type: nauc_mrr_at_20_max value: 1.0979983974981327 - type: nauc_mrr_at_20_std value: -34.8388339739997 - type: nauc_mrr_at_3_diff1 value: 30.884642006045702 - type: nauc_mrr_at_3_max value: 1.7970996544095983 - type: nauc_mrr_at_3_std value: -34.290172894906085 - type: nauc_mrr_at_5_diff1 value: 30.89687518368571 - type: nauc_mrr_at_5_max value: 1.2123714988495347 - type: nauc_mrr_at_5_std value: -35.01704580471926 - type: nauc_ndcg_at_1000_diff1 value: 29.214476799077342 - type: nauc_ndcg_at_1000_max value: 3.6379035546112872 - type: nauc_ndcg_at_1000_std value: -32.35757522049194 - type: nauc_ndcg_at_100_diff1 value: 29.130004541376298 - type: nauc_ndcg_at_100_max value: 2.9580589185293045 - type: nauc_ndcg_at_100_std value: -33.26884643871724 - type: nauc_ndcg_at_10_diff1 value: 28.521001084366393 - type: nauc_ndcg_at_10_max value: 3.630223957267483 - type: nauc_ndcg_at_10_std value: -33.14524140940815 - type: nauc_ndcg_at_1_diff1 value: 35.51868580113285 - type: nauc_ndcg_at_1_max value: 5.117103773147307 - type: nauc_ndcg_at_1_std value: -30.633913466736956 - type: nauc_ndcg_at_20_diff1 value: 29.194462756848782 - type: nauc_ndcg_at_20_max value: 2.61162903136461 - type: nauc_ndcg_at_20_std value: -34.59161403211834 - type: nauc_ndcg_at_3_diff1 value: 30.183555327135203 - type: nauc_ndcg_at_3_max value: 5.61949040917093 - type: nauc_ndcg_at_3_std value: -30.350117794058175 - type: nauc_ndcg_at_5_diff1 value: 29.74420394139971 - type: nauc_ndcg_at_5_max value: 3.952183813937688 - type: nauc_ndcg_at_5_std value: -31.807833795302038 - type: nauc_precision_at_1000_diff1 value: -5.467049121617333 - type: nauc_precision_at_1000_max value: -3.993986884198271 - type: nauc_precision_at_1000_std value: -13.703967324212224 - type: nauc_precision_at_100_diff1 value: 1.5585428307943647 - type: nauc_precision_at_100_max value: -4.250455723613214 - type: nauc_precision_at_100_std value: -22.294689856776493 - type: nauc_precision_at_10_diff1 value: 11.076036917255259 - type: nauc_precision_at_10_max value: -1.5859394644365377 - type: nauc_precision_at_10_std value: -34.94912594413202 - type: nauc_precision_at_1_diff1 value: 35.51868580113285 - type: nauc_precision_at_1_max value: 5.117103773147307 - type: nauc_precision_at_1_std value: -30.633913466736956 - type: nauc_precision_at_20_diff1 value: 9.311484455773828 - type: nauc_precision_at_20_max value: -3.678383428592432 - type: 
nauc_precision_at_20_std value: -33.700002761401635 - type: nauc_precision_at_3_diff1 value: 19.2787260874381 - type: nauc_precision_at_3_max value: 0.18292109396940018 - type: nauc_precision_at_3_std value: -35.23939824276542 - type: nauc_precision_at_5_diff1 value: 14.97930592298584 - type: nauc_precision_at_5_max value: -1.63540635880963 - type: nauc_precision_at_5_std value: -35.908283558321315 - type: nauc_recall_at_1000_diff1 value: 26.63056473607804 - type: nauc_recall_at_1000_max value: 62.7304558520689 - type: nauc_recall_at_1000_std value: 58.12421701377561 - type: nauc_recall_at_100_diff1 value: 21.42127379898579 - type: nauc_recall_at_100_max value: 1.4748203516921914 - type: nauc_recall_at_100_std value: -27.56467339041136 - type: nauc_recall_at_10_diff1 value: 21.20479652609812 - type: nauc_recall_at_10_max value: 1.7394881489709888 - type: nauc_recall_at_10_std value: -32.15116902585072 - type: nauc_recall_at_1_diff1 value: 38.60240510083884 - type: nauc_recall_at_1_max value: 10.407392664609139 - type: nauc_recall_at_1_std value: -17.76327278732833 - type: nauc_recall_at_20_diff1 value: 23.049652721582632 - type: nauc_recall_at_20_max value: -1.7715787106286838 - type: nauc_recall_at_20_std value: -36.14203686002867 - type: nauc_recall_at_3_diff1 value: 26.522179829461873 - type: nauc_recall_at_3_max value: 6.078208732431124 - type: nauc_recall_at_3_std value: -25.02625711226274 - type: nauc_recall_at_5_diff1 value: 24.19538553561693 - type: nauc_recall_at_5_max value: 2.4963810785503524 - type: nauc_recall_at_5_std value: -30.449635496921257 - type: ndcg_at_1 value: 30.461 - type: ndcg_at_10 value: 37.504 - type: ndcg_at_100 value: 46.156000000000006 - type: ndcg_at_1000 value: 48.985 - type: ndcg_at_20 value: 41.025 - type: ndcg_at_3 value: 32.165 - type: ndcg_at_5 value: 33.072 - type: precision_at_1 value: 30.461 - type: precision_at_10 value: 11.032 - type: precision_at_100 value: 1.8870000000000002 - type: precision_at_1000 value: 0.22499999999999998 - type: precision_at_20 value: 6.833 - type: precision_at_3 value: 22.532 - type: precision_at_5 value: 16.966 - type: recall_at_1 value: 16.019 - type: recall_at_10 value: 47.557 - type: recall_at_100 value: 80.376 - type: recall_at_1000 value: 98.904 - type: recall_at_20 value: 58.48100000000001 - type: recall_at_3 value: 30.682 - type: recall_at_5 value: 36.714999999999996 task: type: Retrieval - dataset: config: eng-spa name: MTEB XPQARetrieval (eng-spa) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 53.359 - type: map_at_1 value: 22.892000000000003 - type: map_at_10 value: 45.773 - type: map_at_100 value: 47.778999999999996 - type: map_at_1000 value: 47.882999999999996 - type: map_at_20 value: 46.869 - type: map_at_3 value: 37.643 - type: map_at_5 value: 43.120999999999995 - type: mrr_at_1 value: 47.28877679697352 - type: mrr_at_10 value: 56.95890630316857 - type: mrr_at_100 value: 57.71103367009639 - type: mrr_at_1000 value: 57.73661441948852 - type: mrr_at_20 value: 57.37701091311334 - type: mrr_at_3 value: 54.74989491382929 - type: mrr_at_5 value: 56.08659100462372 - type: nauc_map_at_1000_diff1 value: 27.8347129954991 - type: nauc_map_at_1000_max value: 38.04300600762859 - type: nauc_map_at_1000_std value: -18.294653328262868 - type: nauc_map_at_100_diff1 value: 27.818449297770858 - type: nauc_map_at_100_max value: 38.03533462156633 - type: nauc_map_at_100_std value: -18.332989980880644 - type: nauc_map_at_10_diff1 value: 27.520664180018358 - 
type: nauc_map_at_10_max value: 37.67109855753314 - type: nauc_map_at_10_std value: -18.496721673888683 - type: nauc_map_at_1_diff1 value: 37.56020148060502 - type: nauc_map_at_1_max value: 10.298394230150745 - type: nauc_map_at_1_std value: -20.41359936101547 - type: nauc_map_at_20_diff1 value: 27.615023038189722 - type: nauc_map_at_20_max value: 37.808525116320254 - type: nauc_map_at_20_std value: -18.49235775420803 - type: nauc_map_at_3_diff1 value: 30.797347567428424 - type: nauc_map_at_3_max value: 29.374407828869497 - type: nauc_map_at_3_std value: -19.75905772914969 - type: nauc_map_at_5_diff1 value: 28.431802888884803 - type: nauc_map_at_5_max value: 35.57723911610521 - type: nauc_map_at_5_std value: -19.093588845366824 - type: nauc_mrr_at_1000_diff1 value: 33.263611009054586 - type: nauc_mrr_at_1000_max value: 40.620639901613664 - type: nauc_mrr_at_1000_std value: -17.083016011032036 - type: nauc_mrr_at_100_diff1 value: 33.25375012559163 - type: nauc_mrr_at_100_max value: 40.62376205172005 - type: nauc_mrr_at_100_std value: -17.091930575226684 - type: nauc_mrr_at_10_diff1 value: 33.05787202690095 - type: nauc_mrr_at_10_max value: 40.4516362611674 - type: nauc_mrr_at_10_std value: -17.088910666499892 - type: nauc_mrr_at_1_diff1 value: 36.424151087824555 - type: nauc_mrr_at_1_max value: 40.955715626650445 - type: nauc_mrr_at_1_std value: -16.56636409111209 - type: nauc_mrr_at_20_diff1 value: 33.12029456858138 - type: nauc_mrr_at_20_max value: 40.56409347292635 - type: nauc_mrr_at_20_std value: -17.102034817242068 - type: nauc_mrr_at_3_diff1 value: 33.52377926814156 - type: nauc_mrr_at_3_max value: 40.824911575046876 - type: nauc_mrr_at_3_std value: -16.855935748811092 - type: nauc_mrr_at_5_diff1 value: 33.08646471768442 - type: nauc_mrr_at_5_max value: 40.59323589955881 - type: nauc_mrr_at_5_std value: -16.77829710500156 - type: nauc_ndcg_at_1000_diff1 value: 28.741186244590207 - type: nauc_ndcg_at_1000_max value: 40.0113825410539 - type: nauc_ndcg_at_1000_std value: -17.15655081742458 - type: nauc_ndcg_at_100_diff1 value: 28.680521359782972 - type: nauc_ndcg_at_100_max value: 39.94751899984445 - type: nauc_ndcg_at_100_std value: -17.82813814043932 - type: nauc_ndcg_at_10_diff1 value: 27.22858072673168 - type: nauc_ndcg_at_10_max value: 38.600188968554725 - type: nauc_ndcg_at_10_std value: -18.517203924893614 - type: nauc_ndcg_at_1_diff1 value: 36.424151087824555 - type: nauc_ndcg_at_1_max value: 40.955715626650445 - type: nauc_ndcg_at_1_std value: -16.56636409111209 - type: nauc_ndcg_at_20_diff1 value: 27.56875900623774 - type: nauc_ndcg_at_20_max value: 38.95264310199067 - type: nauc_ndcg_at_20_std value: -18.709973965688445 - type: nauc_ndcg_at_3_diff1 value: 28.682842749851574 - type: nauc_ndcg_at_3_max value: 38.361215408395964 - type: nauc_ndcg_at_3_std value: -16.800291231827515 - type: nauc_ndcg_at_5_diff1 value: 28.178239259093484 - type: nauc_ndcg_at_5_max value: 36.77096292606479 - type: nauc_ndcg_at_5_std value: -18.718861696641145 - type: nauc_precision_at_1000_diff1 value: -7.3686253252869305 - type: nauc_precision_at_1000_max value: 31.98896996987639 - type: nauc_precision_at_1000_std value: 13.125659676392267 - type: nauc_precision_at_100_diff1 value: -2.8239113056969156 - type: nauc_precision_at_100_max value: 36.95062472971812 - type: nauc_precision_at_100_std value: 7.230228733647562 - type: nauc_precision_at_10_diff1 value: 2.5515545798843555 - type: nauc_precision_at_10_max value: 45.46146019314904 - type: nauc_precision_at_10_std value: -1.3249340536211553 - 
type: nauc_precision_at_1_diff1 value: 36.424151087824555 - type: nauc_precision_at_1_max value: 40.955715626650445 - type: nauc_precision_at_1_std value: -16.56636409111209 - type: nauc_precision_at_20_diff1 value: 0.7202861770489576 - type: nauc_precision_at_20_max value: 41.9937596214609 - type: nauc_precision_at_20_std value: 0.2756400069730064 - type: nauc_precision_at_3_diff1 value: 12.89221206929447 - type: nauc_precision_at_3_max value: 48.57775126381142 - type: nauc_precision_at_3_std value: -8.042242254131068 - type: nauc_precision_at_5_diff1 value: 7.063616193387763 - type: nauc_precision_at_5_max value: 47.26496887331675 - type: nauc_precision_at_5_std value: -4.735805200913049 - type: nauc_recall_at_1000_diff1 value: 2.6650052980682224 - type: nauc_recall_at_1000_max value: 81.94826279951472 - type: nauc_recall_at_1000_std value: 48.46012388224573 - type: nauc_recall_at_100_diff1 value: 24.516371948375827 - type: nauc_recall_at_100_max value: 39.17639620389552 - type: nauc_recall_at_100_std value: -17.884197602579533 - type: nauc_recall_at_10_diff1 value: 19.93892097640112 - type: nauc_recall_at_10_max value: 33.079079440022106 - type: nauc_recall_at_10_std value: -20.22227622801884 - type: nauc_recall_at_1_diff1 value: 37.56020148060502 - type: nauc_recall_at_1_max value: 10.298394230150745 - type: nauc_recall_at_1_std value: -20.41359936101547 - type: nauc_recall_at_20_diff1 value: 20.363784035670633 - type: nauc_recall_at_20_max value: 33.39352971625336 - type: nauc_recall_at_20_std value: -21.712050932168875 - type: nauc_recall_at_3_diff1 value: 26.220072121604655 - type: nauc_recall_at_3_max value: 25.853218030218507 - type: nauc_recall_at_3_std value: -17.830613372910907 - type: nauc_recall_at_5_diff1 value: 22.25850162680252 - type: nauc_recall_at_5_max value: 30.89620539042785 - type: nauc_recall_at_5_std value: -19.16786434439169 - type: ndcg_at_1 value: 47.288999999999994 - type: ndcg_at_10 value: 53.359 - type: ndcg_at_100 value: 60.25899999999999 - type: ndcg_at_1000 value: 61.902 - type: ndcg_at_20 value: 56.025000000000006 - type: ndcg_at_3 value: 47.221999999999994 - type: ndcg_at_5 value: 49.333 - type: precision_at_1 value: 47.288999999999994 - type: precision_at_10 value: 16.003 - type: precision_at_100 value: 2.221 - type: precision_at_1000 value: 0.246 - type: precision_at_20 value: 8.985 - type: precision_at_3 value: 34.510000000000005 - type: precision_at_5 value: 26.961000000000002 - type: recall_at_1 value: 22.892000000000003 - type: recall_at_10 value: 62.928 - type: recall_at_100 value: 89.105 - type: recall_at_1000 value: 99.319 - type: recall_at_20 value: 71.387 - type: recall_at_3 value: 43.492999999999995 - type: recall_at_5 value: 53.529 task: type: Retrieval - dataset: config: eng-fra name: MTEB XPQARetrieval (eng-fra) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 54.888000000000005 - type: map_at_1 value: 26.079 - type: map_at_10 value: 47.434 - type: map_at_100 value: 49.376 - type: map_at_1000 value: 49.461 - type: map_at_20 value: 48.634 - type: map_at_3 value: 40.409 - type: map_at_5 value: 44.531 - type: mrr_at_1 value: 46.86248331108144 - type: mrr_at_10 value: 56.45506177548896 - type: mrr_at_100 value: 57.20360629445577 - type: mrr_at_1000 value: 57.227004696897986 - type: mrr_at_20 value: 56.905302765737865 - type: mrr_at_3 value: 54.09434801958164 - type: mrr_at_5 value: 55.40943480195811 - type: nauc_map_at_1000_diff1 value: 37.739936045535885 - type: 
nauc_map_at_1000_max value: 35.92625003516368 - type: nauc_map_at_1000_std value: -15.825119611638398 - type: nauc_map_at_100_diff1 value: 37.71697833661983 - type: nauc_map_at_100_max value: 35.91174068136317 - type: nauc_map_at_100_std value: -15.838841891589006 - type: nauc_map_at_10_diff1 value: 37.52309268219689 - type: nauc_map_at_10_max value: 35.4887130483351 - type: nauc_map_at_10_std value: -16.61132378136234 - type: nauc_map_at_1_diff1 value: 42.705087329207984 - type: nauc_map_at_1_max value: 12.047671550242974 - type: nauc_map_at_1_std value: -17.156030827065834 - type: nauc_map_at_20_diff1 value: 37.59446680137666 - type: nauc_map_at_20_max value: 35.80559546695052 - type: nauc_map_at_20_std value: -16.158338316249786 - type: nauc_map_at_3_diff1 value: 38.618415267131816 - type: nauc_map_at_3_max value: 27.030227996183925 - type: nauc_map_at_3_std value: -18.962500694157857 - type: nauc_map_at_5_diff1 value: 37.980845601534256 - type: nauc_map_at_5_max value: 32.82374761283266 - type: nauc_map_at_5_std value: -17.856875825229565 - type: nauc_mrr_at_1000_diff1 value: 40.26059509279346 - type: nauc_mrr_at_1000_max value: 39.28453752990871 - type: nauc_mrr_at_1000_std value: -13.306217279524212 - type: nauc_mrr_at_100_diff1 value: 40.23390833398881 - type: nauc_mrr_at_100_max value: 39.26041461025653 - type: nauc_mrr_at_100_std value: -13.317700798873153 - type: nauc_mrr_at_10_diff1 value: 40.163737640180145 - type: nauc_mrr_at_10_max value: 39.27138538165913 - type: nauc_mrr_at_10_std value: -13.472971360323038 - type: nauc_mrr_at_1_diff1 value: 42.95339241383707 - type: nauc_mrr_at_1_max value: 40.62982307619158 - type: nauc_mrr_at_1_std value: -10.429597045942748 - type: nauc_mrr_at_20_diff1 value: 40.23703505923782 - type: nauc_mrr_at_20_max value: 39.27051308063652 - type: nauc_mrr_at_20_std value: -13.390197643922038 - type: nauc_mrr_at_3_diff1 value: 40.5721313555661 - type: nauc_mrr_at_3_max value: 39.254774354468594 - type: nauc_mrr_at_3_std value: -13.773803807863827 - type: nauc_mrr_at_5_diff1 value: 40.41081287079734 - type: nauc_mrr_at_5_max value: 39.515241132077335 - type: nauc_mrr_at_5_std value: -13.306544090087336 - type: nauc_ndcg_at_1000_diff1 value: 38.04772268296103 - type: nauc_ndcg_at_1000_max value: 38.03364565521176 - type: nauc_ndcg_at_1000_std value: -14.203182726102263 - type: nauc_ndcg_at_100_diff1 value: 37.51752795463643 - type: nauc_ndcg_at_100_max value: 37.809671511710604 - type: nauc_ndcg_at_100_std value: -13.880578225081408 - type: nauc_ndcg_at_10_diff1 value: 36.78438984005559 - type: nauc_ndcg_at_10_max value: 36.98105155993232 - type: nauc_ndcg_at_10_std value: -16.886308645939113 - type: nauc_ndcg_at_1_diff1 value: 42.95339241383707 - type: nauc_ndcg_at_1_max value: 40.62982307619158 - type: nauc_ndcg_at_1_std value: -10.429597045942748 - type: nauc_ndcg_at_20_diff1 value: 36.94164323893683 - type: nauc_ndcg_at_20_max value: 37.333583379288285 - type: nauc_ndcg_at_20_std value: -15.853318071434716 - type: nauc_ndcg_at_3_diff1 value: 36.905604845477384 - type: nauc_ndcg_at_3_max value: 35.10252586688781 - type: nauc_ndcg_at_3_std value: -17.128435988977742 - type: nauc_ndcg_at_5_diff1 value: 37.96742463612705 - type: nauc_ndcg_at_5_max value: 34.65945109443365 - type: nauc_ndcg_at_5_std value: -17.916428667861183 - type: nauc_precision_at_1000_diff1 value: -3.740861894117653 - type: nauc_precision_at_1000_max value: 31.993854396874177 - type: nauc_precision_at_1000_std value: 17.445629474196448 - type: nauc_precision_at_100_diff1 value: 
-0.4825948747911606 - type: nauc_precision_at_100_max value: 35.834638448782954 - type: nauc_precision_at_100_std value: 16.82718796079511 - type: nauc_precision_at_10_diff1 value: 8.285949866268147 - type: nauc_precision_at_10_max value: 45.3292519726866 - type: nauc_precision_at_10_std value: 4.5574850748441555 - type: nauc_precision_at_1_diff1 value: 42.95339241383707 - type: nauc_precision_at_1_max value: 40.62982307619158 - type: nauc_precision_at_1_std value: -10.429597045942748 - type: nauc_precision_at_20_diff1 value: 4.890590733611442 - type: nauc_precision_at_20_max value: 41.83051757078859 - type: nauc_precision_at_20_std value: 9.197347125630467 - type: nauc_precision_at_3_diff1 value: 17.79940075411976 - type: nauc_precision_at_3_max value: 45.224103632426946 - type: nauc_precision_at_3_std value: -5.017203435609909 - type: nauc_precision_at_5_diff1 value: 13.548063145911929 - type: nauc_precision_at_5_max value: 46.84837547409909 - type: nauc_precision_at_5_std value: -0.8925939386354484 - type: nauc_recall_at_1000_diff1 value: 74.48441717138078 - type: nauc_recall_at_1000_max value: 74.66717137705027 - type: nauc_recall_at_1000_std value: 0.24030117471512125 - type: nauc_recall_at_100_diff1 value: 22.553777341988656 - type: nauc_recall_at_100_max value: 31.67861029246527 - type: nauc_recall_at_100_std value: 0.2707450517253687 - type: nauc_recall_at_10_diff1 value: 28.490866614443235 - type: nauc_recall_at_10_max value: 31.722970141434352 - type: nauc_recall_at_10_std value: -21.97893365028007 - type: nauc_recall_at_1_diff1 value: 42.705087329207984 - type: nauc_recall_at_1_max value: 12.047671550242974 - type: nauc_recall_at_1_std value: -17.156030827065834 - type: nauc_recall_at_20_diff1 value: 27.44043454173112 - type: nauc_recall_at_20_max value: 31.454281772040716 - type: nauc_recall_at_20_std value: -20.1735695305415 - type: nauc_recall_at_3_diff1 value: 34.08447534706394 - type: nauc_recall_at_3_max value: 21.793973773840865 - type: nauc_recall_at_3_std value: -22.753978372378906 - type: nauc_recall_at_5_diff1 value: 33.59686526199479 - type: nauc_recall_at_5_max value: 29.188889073761302 - type: nauc_recall_at_5_std value: -21.96156333744562 - type: ndcg_at_1 value: 46.861999999999995 - type: ndcg_at_10 value: 54.888000000000005 - type: ndcg_at_100 value: 61.477000000000004 - type: ndcg_at_1000 value: 62.768 - type: ndcg_at_20 value: 57.812 - type: ndcg_at_3 value: 48.721 - type: ndcg_at_5 value: 50.282000000000004 - type: precision_at_1 value: 46.861999999999995 - type: precision_at_10 value: 15.167 - type: precision_at_100 value: 2.072 - type: precision_at_1000 value: 0.22499999999999998 - type: precision_at_20 value: 8.672 - type: precision_at_3 value: 33.066 - type: precision_at_5 value: 24.726 - type: recall_at_1 value: 26.079 - type: recall_at_10 value: 66.095 - type: recall_at_100 value: 91.65299999999999 - type: recall_at_1000 value: 99.83999999999999 - type: recall_at_20 value: 75.28 - type: recall_at_3 value: 46.874 - type: recall_at_5 value: 55.062 task: type: Retrieval - dataset: config: pol-eng name: MTEB XPQARetrieval (pol-eng) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 50.831 - type: map_at_1 value: 25.549 - type: map_at_10 value: 44.432 - type: map_at_100 value: 46.431 - type: map_at_1000 value: 46.525 - type: map_at_20 value: 45.595 - type: map_at_3 value: 38.574000000000005 - type: map_at_5 value: 42.266999999999996 - type: mrr_at_1 value: 43.5006435006435 - type: mrr_at_10 
value: 51.561255132683684 - type: mrr_at_100 value: 52.59912482635216 - type: mrr_at_1000 value: 52.631337587043056 - type: mrr_at_20 value: 52.23234440063273 - type: mrr_at_3 value: 48.97039897039895 - type: mrr_at_5 value: 50.31531531531527 - type: nauc_map_at_1000_diff1 value: 35.907901295900174 - type: nauc_map_at_1000_max value: 24.573763602041687 - type: nauc_map_at_1000_std value: -29.524077960309313 - type: nauc_map_at_100_diff1 value: 35.86869121827827 - type: nauc_map_at_100_max value: 24.532343818487494 - type: nauc_map_at_100_std value: -29.613979124488864 - type: nauc_map_at_10_diff1 value: 35.90171794022391 - type: nauc_map_at_10_max value: 23.90914892943268 - type: nauc_map_at_10_std value: -30.43698820061533 - type: nauc_map_at_1_diff1 value: 50.80313333312038 - type: nauc_map_at_1_max value: 16.649890421888156 - type: nauc_map_at_1_std value: -22.323989416471683 - type: nauc_map_at_20_diff1 value: 35.77755470212964 - type: nauc_map_at_20_max value: 24.199895270297034 - type: nauc_map_at_20_std value: -30.223411960170647 - type: nauc_map_at_3_diff1 value: 38.964124882315936 - type: nauc_map_at_3_max value: 21.187432510177167 - type: nauc_map_at_3_std value: -28.976663506389887 - type: nauc_map_at_5_diff1 value: 36.04644236616672 - type: nauc_map_at_5_max value: 23.501186429317094 - type: nauc_map_at_5_std value: -30.068144596060748 - type: nauc_mrr_at_1000_diff1 value: 41.36555452105447 - type: nauc_mrr_at_1000_max value: 26.376799280402867 - type: nauc_mrr_at_1000_std value: -30.008603028757424 - type: nauc_mrr_at_100_diff1 value: 41.35523965220727 - type: nauc_mrr_at_100_max value: 26.402612115967706 - type: nauc_mrr_at_100_std value: -29.991754627128024 - type: nauc_mrr_at_10_diff1 value: 41.001395127259315 - type: nauc_mrr_at_10_max value: 26.104860505051384 - type: nauc_mrr_at_10_std value: -30.38420449487516 - type: nauc_mrr_at_1_diff1 value: 44.882846373248206 - type: nauc_mrr_at_1_max value: 26.61905322890808 - type: nauc_mrr_at_1_std value: -28.724565662206153 - type: nauc_mrr_at_20_diff1 value: 41.278009142648834 - type: nauc_mrr_at_20_max value: 26.284565529087295 - type: nauc_mrr_at_20_std value: -30.19549140549242 - type: nauc_mrr_at_3_diff1 value: 41.74663893951077 - type: nauc_mrr_at_3_max value: 26.263048464325884 - type: nauc_mrr_at_3_std value: -30.676733442965688 - type: nauc_mrr_at_5_diff1 value: 41.11461477846568 - type: nauc_mrr_at_5_max value: 25.94713927964926 - type: nauc_mrr_at_5_std value: -30.317066480767817 - type: nauc_ndcg_at_1000_diff1 value: 36.34161052445199 - type: nauc_ndcg_at_1000_max value: 26.321036033696206 - type: nauc_ndcg_at_1000_std value: -27.59146917115399 - type: nauc_ndcg_at_100_diff1 value: 35.66557800007035 - type: nauc_ndcg_at_100_max value: 26.282211208336136 - type: nauc_ndcg_at_100_std value: -27.905634124461333 - type: nauc_ndcg_at_10_diff1 value: 35.34872687407275 - type: nauc_ndcg_at_10_max value: 24.018561915792272 - type: nauc_ndcg_at_10_std value: -31.57712772869015 - type: nauc_ndcg_at_1_diff1 value: 44.882846373248206 - type: nauc_ndcg_at_1_max value: 26.865602442152554 - type: nauc_ndcg_at_1_std value: -28.509295454329152 - type: nauc_ndcg_at_20_diff1 value: 35.46177768045546 - type: nauc_ndcg_at_20_max value: 24.921273675141542 - type: nauc_ndcg_at_20_std value: -30.84348812979793 - type: nauc_ndcg_at_3_diff1 value: 36.84688489063923 - type: nauc_ndcg_at_3_max value: 24.088513229463736 - type: nauc_ndcg_at_3_std value: -30.05640995379297 - type: nauc_ndcg_at_5_diff1 value: 35.623143276796185 - type: 
nauc_ndcg_at_5_max value: 23.76654250474061 - type: nauc_ndcg_at_5_std value: -30.87847710074466 - type: nauc_precision_at_1000_diff1 value: -16.270532533886932 - type: nauc_precision_at_1000_max value: 17.37365042394671 - type: nauc_precision_at_1000_std value: 16.27166715693082 - type: nauc_precision_at_100_diff1 value: -13.175264889436313 - type: nauc_precision_at_100_max value: 19.488571046893963 - type: nauc_precision_at_100_std value: 9.055429698007798 - type: nauc_precision_at_10_diff1 value: 0.6806938753592942 - type: nauc_precision_at_10_max value: 21.933083960522616 - type: nauc_precision_at_10_std value: -18.2147036942157 - type: nauc_precision_at_1_diff1 value: 44.882846373248206 - type: nauc_precision_at_1_max value: 26.865602442152554 - type: nauc_precision_at_1_std value: -28.509295454329152 - type: nauc_precision_at_20_diff1 value: -4.318119150162302 - type: nauc_precision_at_20_max value: 21.089702301041687 - type: nauc_precision_at_20_std value: -10.333077681479546 - type: nauc_precision_at_3_diff1 value: 11.496076462671107 - type: nauc_precision_at_3_max value: 23.018301549827008 - type: nauc_precision_at_3_std value: -23.98652995416454 - type: nauc_precision_at_5_diff1 value: 4.271050668117355 - type: nauc_precision_at_5_max value: 23.61051327966779 - type: nauc_precision_at_5_std value: -21.557618503107847 - type: nauc_recall_at_1000_diff1 value: 62.23955911850697 - type: nauc_recall_at_1000_max value: 83.20491723365542 - type: nauc_recall_at_1000_std value: 66.5173462601958 - type: nauc_recall_at_100_diff1 value: 20.503778602988177 - type: nauc_recall_at_100_max value: 29.379026288767506 - type: nauc_recall_at_100_std value: -16.139120874540573 - type: nauc_recall_at_10_diff1 value: 27.659110249896557 - type: nauc_recall_at_10_max value: 19.69557968026332 - type: nauc_recall_at_10_std value: -33.95657132767551 - type: nauc_recall_at_1_diff1 value: 50.80313333312038 - type: nauc_recall_at_1_max value: 16.649890421888156 - type: nauc_recall_at_1_std value: -22.323989416471683 - type: nauc_recall_at_20_diff1 value: 27.084453724565176 - type: nauc_recall_at_20_max value: 21.40080632474994 - type: nauc_recall_at_20_std value: -32.83683639340239 - type: nauc_recall_at_3_diff1 value: 34.32950941333572 - type: nauc_recall_at_3_max value: 18.55616615958199 - type: nauc_recall_at_3_std value: -30.375983327454076 - type: nauc_recall_at_5_diff1 value: 29.44516734974564 - type: nauc_recall_at_5_max value: 20.630543534300312 - type: nauc_recall_at_5_std value: -31.30763062499127 - type: ndcg_at_1 value: 43.501 - type: ndcg_at_10 value: 50.831 - type: ndcg_at_100 value: 58.17099999999999 - type: ndcg_at_1000 value: 59.705 - type: ndcg_at_20 value: 54.047999999999995 - type: ndcg_at_3 value: 44.549 - type: ndcg_at_5 value: 46.861000000000004 - type: precision_at_1 value: 43.501 - type: precision_at_10 value: 12.895999999999999 - type: precision_at_100 value: 1.9 - type: precision_at_1000 value: 0.21 - type: precision_at_20 value: 7.593 - type: precision_at_3 value: 29.215000000000003 - type: precision_at_5 value: 21.57 - type: recall_at_1 value: 25.549 - type: recall_at_10 value: 61.795 - type: recall_at_100 value: 90.019 - type: recall_at_1000 value: 99.807 - type: recall_at_20 value: 72.096 - type: recall_at_3 value: 43.836999999999996 - type: recall_at_5 value: 51.714000000000006 task: type: Retrieval - dataset: config: pol-pol name: MTEB XPQARetrieval (pol-pol) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 
53.70399999999999 - type: map_at_1 value: 27.739000000000004 - type: map_at_10 value: 47.469 - type: map_at_100 value: 49.392 - type: map_at_1000 value: 49.483 - type: map_at_20 value: 48.646 - type: map_at_3 value: 41.467 - type: map_at_5 value: 45.467 - type: mrr_at_1 value: 47.00636942675159 - type: mrr_at_10 value: 54.63699322616519 - type: mrr_at_100 value: 55.54525182833755 - type: mrr_at_1000 value: 55.581331515356155 - type: mrr_at_20 value: 55.22918377451415 - type: mrr_at_3 value: 52.03821656050952 - type: mrr_at_5 value: 53.38216560509549 - type: nauc_map_at_1000_diff1 value: 45.03530825034854 - type: nauc_map_at_1000_max value: 34.22740272603397 - type: nauc_map_at_1000_std value: -30.428880484199244 - type: nauc_map_at_100_diff1 value: 44.978704455592805 - type: nauc_map_at_100_max value: 34.20908357964765 - type: nauc_map_at_100_std value: -30.47325365059666 - type: nauc_map_at_10_diff1 value: 44.9560579177672 - type: nauc_map_at_10_max value: 33.70097588985278 - type: nauc_map_at_10_std value: -31.205563222357885 - type: nauc_map_at_1_diff1 value: 57.94711780881773 - type: nauc_map_at_1_max value: 21.60278071836319 - type: nauc_map_at_1_std value: -23.273741268035923 - type: nauc_map_at_20_diff1 value: 44.97859054699532 - type: nauc_map_at_20_max value: 34.153729150181846 - type: nauc_map_at_20_std value: -30.97482545902907 - type: nauc_map_at_3_diff1 value: 47.52016138686765 - type: nauc_map_at_3_max value: 30.176197065298417 - type: nauc_map_at_3_std value: -29.90628984041898 - type: nauc_map_at_5_diff1 value: 45.36581638257985 - type: nauc_map_at_5_max value: 33.697200263698036 - type: nauc_map_at_5_std value: -31.165331120088453 - type: nauc_mrr_at_1000_diff1 value: 53.32889526818364 - type: nauc_mrr_at_1000_max value: 36.104118340589736 - type: nauc_mrr_at_1000_std value: -31.321132494516984 - type: nauc_mrr_at_100_diff1 value: 53.30695875258367 - type: nauc_mrr_at_100_max value: 36.114890079024455 - type: nauc_mrr_at_100_std value: -31.291749322117447 - type: nauc_mrr_at_10_diff1 value: 53.189084772141435 - type: nauc_mrr_at_10_max value: 35.939061062282484 - type: nauc_mrr_at_10_std value: -31.502185884653645 - type: nauc_mrr_at_1_diff1 value: 56.89368291041337 - type: nauc_mrr_at_1_max value: 36.07581125496313 - type: nauc_mrr_at_1_std value: -29.703764232519475 - type: nauc_mrr_at_20_diff1 value: 53.23955737199497 - type: nauc_mrr_at_20_max value: 36.068824838215676 - type: nauc_mrr_at_20_std value: -31.420039428197594 - type: nauc_mrr_at_3_diff1 value: 53.74385074861207 - type: nauc_mrr_at_3_max value: 35.57054587735015 - type: nauc_mrr_at_3_std value: -32.356894834537684 - type: nauc_mrr_at_5_diff1 value: 53.66669556981826 - type: nauc_mrr_at_5_max value: 36.02102289605049 - type: nauc_mrr_at_5_std value: -32.030437067359124 - type: nauc_ndcg_at_1000_diff1 value: 46.34900536768847 - type: nauc_ndcg_at_1000_max value: 35.6314995837715 - type: nauc_ndcg_at_1000_std value: -28.965103958822624 - type: nauc_ndcg_at_100_diff1 value: 45.1587893788861 - type: nauc_ndcg_at_100_max value: 35.62430753595297 - type: nauc_ndcg_at_100_std value: -28.77303405812772 - type: nauc_ndcg_at_10_diff1 value: 44.928781590765965 - type: nauc_ndcg_at_10_max value: 34.315200006430366 - type: nauc_ndcg_at_10_std value: -32.05164097076614 - type: nauc_ndcg_at_1_diff1 value: 57.228262350455125 - type: nauc_ndcg_at_1_max value: 35.645285703387366 - type: nauc_ndcg_at_1_std value: -29.893553821348718 - type: nauc_ndcg_at_20_diff1 value: 44.959903633039865 - type: nauc_ndcg_at_20_max value: 
35.493022926282755 - type: nauc_ndcg_at_20_std value: -31.54989291850644 - type: nauc_ndcg_at_3_diff1 value: 46.65266185996905 - type: nauc_ndcg_at_3_max value: 33.74458119579594 - type: nauc_ndcg_at_3_std value: -31.493683304534176 - type: nauc_ndcg_at_5_diff1 value: 46.08707037187612 - type: nauc_ndcg_at_5_max value: 34.7401426055243 - type: nauc_ndcg_at_5_std value: -32.44390676345172 - type: nauc_precision_at_1000_diff1 value: -12.11355300492561 - type: nauc_precision_at_1000_max value: 14.490738062121233 - type: nauc_precision_at_1000_std value: 14.448811005059097 - type: nauc_precision_at_100_diff1 value: -9.742085657181239 - type: nauc_precision_at_100_max value: 18.030305489251223 - type: nauc_precision_at_100_std value: 8.213089709529765 - type: nauc_precision_at_10_diff1 value: 5.153466672774969 - type: nauc_precision_at_10_max value: 27.29412644661678 - type: nauc_precision_at_10_std value: -15.505053884112355 - type: nauc_precision_at_1_diff1 value: 57.228262350455125 - type: nauc_precision_at_1_max value: 35.645285703387366 - type: nauc_precision_at_1_std value: -29.893553821348718 - type: nauc_precision_at_20_diff1 value: -0.6812430761066635 - type: nauc_precision_at_20_max value: 25.81911286466295 - type: nauc_precision_at_20_std value: -8.388506222482595 - type: nauc_precision_at_3_diff1 value: 18.263873866510576 - type: nauc_precision_at_3_max value: 30.879576105862345 - type: nauc_precision_at_3_std value: -24.0342929870108 - type: nauc_precision_at_5_diff1 value: 10.9905804265327 - type: nauc_precision_at_5_max value: 30.88468087429045 - type: nauc_precision_at_5_std value: -20.458684056213507 - type: nauc_recall_at_1000_diff1 value: -64.887668417171 - type: nauc_recall_at_1000_max value: 52.25501730358092 - type: nauc_recall_at_1000_std value: 85.13647916200132 - type: nauc_recall_at_100_diff1 value: 18.956777346127655 - type: nauc_recall_at_100_max value: 36.10473493564588 - type: nauc_recall_at_100_std value: -10.007474558899949 - type: nauc_recall_at_10_diff1 value: 33.810344497568046 - type: nauc_recall_at_10_max value: 31.395430183214245 - type: nauc_recall_at_10_std value: -33.12920524433795 - type: nauc_recall_at_1_diff1 value: 57.94711780881773 - type: nauc_recall_at_1_max value: 21.60278071836319 - type: nauc_recall_at_1_std value: -23.273741268035923 - type: nauc_recall_at_20_diff1 value: 31.449657437065397 - type: nauc_recall_at_20_max value: 34.519574934321945 - type: nauc_recall_at_20_std value: -33.43406862055647 - type: nauc_recall_at_3_diff1 value: 42.07841848382365 - type: nauc_recall_at_3_max value: 28.7648772833266 - type: nauc_recall_at_3_std value: -31.56367736320086 - type: nauc_recall_at_5_diff1 value: 39.21392858246301 - type: nauc_recall_at_5_max value: 34.28338202081927 - type: nauc_recall_at_5_std value: -33.725680523721906 - type: ndcg_at_1 value: 46.879 - type: ndcg_at_10 value: 53.70399999999999 - type: ndcg_at_100 value: 60.532 - type: ndcg_at_1000 value: 61.997 - type: ndcg_at_20 value: 56.818999999999996 - type: ndcg_at_3 value: 47.441 - type: ndcg_at_5 value: 49.936 - type: precision_at_1 value: 46.879 - type: precision_at_10 value: 13.376 - type: precision_at_100 value: 1.8980000000000001 - type: precision_at_1000 value: 0.208 - type: precision_at_20 value: 7.771 - type: precision_at_3 value: 30.658 - type: precision_at_5 value: 22.828 - type: recall_at_1 value: 27.739000000000004 - type: recall_at_10 value: 64.197 - type: recall_at_100 value: 90.54100000000001 - type: recall_at_1000 value: 99.90400000000001 - type: recall_at_20 value: 
74.178 - type: recall_at_3 value: 46.312 - type: recall_at_5 value: 54.581999999999994 task: type: Retrieval - dataset: config: cmn-eng name: MTEB XPQARetrieval (cmn-eng) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 64.64 - type: map_at_1 value: 35.858000000000004 - type: map_at_10 value: 58.547000000000004 - type: map_at_100 value: 60.108 - type: map_at_1000 value: 60.153999999999996 - type: map_at_20 value: 59.528000000000006 - type: map_at_3 value: 51.578 - type: map_at_5 value: 56.206999999999994 - type: mrr_at_1 value: 56.95121951219512 - type: mrr_at_10 value: 64.93975029036001 - type: mrr_at_100 value: 65.63357055718294 - type: mrr_at_1000 value: 65.64844109026834 - type: mrr_at_20 value: 65.41280668715439 - type: mrr_at_3 value: 62.68292682926826 - type: mrr_at_5 value: 64.1585365853658 - type: nauc_map_at_1000_diff1 value: 45.82740870907091 - type: nauc_map_at_1000_max value: 21.9696540066807 - type: nauc_map_at_1000_std value: -32.028262356639495 - type: nauc_map_at_100_diff1 value: 45.802053117616396 - type: nauc_map_at_100_max value: 21.946002070290966 - type: nauc_map_at_100_std value: -32.06190418866229 - type: nauc_map_at_10_diff1 value: 46.017774155748945 - type: nauc_map_at_10_max value: 21.876909086095544 - type: nauc_map_at_10_std value: -32.13913568843985 - type: nauc_map_at_1_diff1 value: 56.34671160956164 - type: nauc_map_at_1_max value: 17.6796949796236 - type: nauc_map_at_1_std value: -13.741140688066045 - type: nauc_map_at_20_diff1 value: 46.027469176858716 - type: nauc_map_at_20_max value: 21.80738432042703 - type: nauc_map_at_20_std value: -32.430379634015395 - type: nauc_map_at_3_diff1 value: 48.40096725254027 - type: nauc_map_at_3_max value: 21.15442803574233 - type: nauc_map_at_3_std value: -26.205850292181417 - type: nauc_map_at_5_diff1 value: 45.77800041356389 - type: nauc_map_at_5_max value: 22.11718771798752 - type: nauc_map_at_5_std value: -30.32876338031471 - type: nauc_mrr_at_1000_diff1 value: 49.748274798877944 - type: nauc_mrr_at_1000_max value: 24.547774167219906 - type: nauc_mrr_at_1000_std value: -32.728447209433504 - type: nauc_mrr_at_100_diff1 value: 49.734549290377856 - type: nauc_mrr_at_100_max value: 24.536933315055222 - type: nauc_mrr_at_100_std value: -32.74076335880697 - type: nauc_mrr_at_10_diff1 value: 49.82827711456392 - type: nauc_mrr_at_10_max value: 24.536773657485075 - type: nauc_mrr_at_10_std value: -33.05707547166962 - type: nauc_mrr_at_1_diff1 value: 51.954289992321044 - type: nauc_mrr_at_1_max value: 26.336255074856886 - type: nauc_mrr_at_1_std value: -29.042962019692446 - type: nauc_mrr_at_20_diff1 value: 49.70938465628863 - type: nauc_mrr_at_20_max value: 24.433219849576947 - type: nauc_mrr_at_20_std value: -32.94123791846049 - type: nauc_mrr_at_3_diff1 value: 50.289486880347134 - type: nauc_mrr_at_3_max value: 24.978796972860142 - type: nauc_mrr_at_3_std value: -32.11305594784892 - type: nauc_mrr_at_5_diff1 value: 49.95013396316144 - type: nauc_mrr_at_5_max value: 24.514452761198303 - type: nauc_mrr_at_5_std value: -32.865859962984146 - type: nauc_ndcg_at_1000_diff1 value: 45.73806489233998 - type: nauc_ndcg_at_1000_max value: 22.404941391043867 - type: nauc_ndcg_at_1000_std value: -33.063445720849685 - type: nauc_ndcg_at_100_diff1 value: 45.1046206923062 - type: nauc_ndcg_at_100_max value: 22.081133719684658 - type: nauc_ndcg_at_100_std value: -33.299291459450146 - type: nauc_ndcg_at_10_diff1 value: 46.140608688357496 - type: nauc_ndcg_at_10_max value: 
21.442489279388916 - type: nauc_ndcg_at_10_std value: -35.115870342856006 - type: nauc_ndcg_at_1_diff1 value: 51.954289992321044 - type: nauc_ndcg_at_1_max value: 26.336255074856886 - type: nauc_ndcg_at_1_std value: -29.042962019692446 - type: nauc_ndcg_at_20_diff1 value: 45.966784725457046 - type: nauc_ndcg_at_20_max value: 21.166632858613145 - type: nauc_ndcg_at_20_std value: -35.65112890375392 - type: nauc_ndcg_at_3_diff1 value: 46.7404863978999 - type: nauc_ndcg_at_3_max value: 22.701743709129456 - type: nauc_ndcg_at_3_std value: -30.907633466983192 - type: nauc_ndcg_at_5_diff1 value: 45.86487199083486 - type: nauc_ndcg_at_5_max value: 22.088804840002513 - type: nauc_ndcg_at_5_std value: -32.3853481632832 - type: nauc_precision_at_1000_diff1 value: -25.69710612774455 - type: nauc_precision_at_1000_max value: 1.3964400247388091 - type: nauc_precision_at_1000_std value: -8.873947511634814 - type: nauc_precision_at_100_diff1 value: -24.013497191077978 - type: nauc_precision_at_100_max value: 2.0197725715909343 - type: nauc_precision_at_100_std value: -11.387423148770633 - type: nauc_precision_at_10_diff1 value: -6.47728645242781 - type: nauc_precision_at_10_max value: 6.815261443768304 - type: nauc_precision_at_10_std value: -26.825062292855943 - type: nauc_precision_at_1_diff1 value: 51.954289992321044 - type: nauc_precision_at_1_max value: 26.336255074856886 - type: nauc_precision_at_1_std value: -29.042962019692446 - type: nauc_precision_at_20_diff1 value: -12.355232044747511 - type: nauc_precision_at_20_max value: 4.022126850949725 - type: nauc_precision_at_20_std value: -23.688935769326772 - type: nauc_precision_at_3_diff1 value: 7.662671665835864 - type: nauc_precision_at_3_max value: 14.372394760986248 - type: nauc_precision_at_3_std value: -28.635125665532453 - type: nauc_precision_at_5_diff1 value: -1.4592476425511611 - type: nauc_precision_at_5_max value: 11.124310161474174 - type: nauc_precision_at_5_std value: -27.89526669318053 - type: nauc_recall_at_1000_diff1 value: -19.58450046684932 - type: nauc_recall_at_1000_max value: 70.71661998133165 - type: nauc_recall_at_1000_std value: 93.05555555556315 - type: nauc_recall_at_100_diff1 value: 15.06356457571853 - type: nauc_recall_at_100_max value: 14.051414749344806 - type: nauc_recall_at_100_std value: -29.461874235153008 - type: nauc_recall_at_10_diff1 value: 41.29842726117901 - type: nauc_recall_at_10_max value: 15.768699673830898 - type: nauc_recall_at_10_std value: -42.11585661287712 - type: nauc_recall_at_1_diff1 value: 56.34671160956164 - type: nauc_recall_at_1_max value: 17.6796949796236 - type: nauc_recall_at_1_std value: -13.741140688066045 - type: nauc_recall_at_20_diff1 value: 38.8078283585263 - type: nauc_recall_at_20_max value: 12.06816084005326 - type: nauc_recall_at_20_std value: -48.20956170056591 - type: nauc_recall_at_3_diff1 value: 44.71028758038993 - type: nauc_recall_at_3_max value: 19.1059093689162 - type: nauc_recall_at_3_std value: -26.795164453784253 - type: nauc_recall_at_5_diff1 value: 41.06320797773054 - type: nauc_recall_at_5_max value: 19.117028272530998 - type: nauc_recall_at_5_std value: -33.985747504612156 - type: ndcg_at_1 value: 56.95099999999999 - type: ndcg_at_10 value: 64.64 - type: ndcg_at_100 value: 70.017 - type: ndcg_at_1000 value: 70.662 - type: ndcg_at_20 value: 67.256 - type: ndcg_at_3 value: 58.269000000000005 - type: ndcg_at_5 value: 60.94199999999999 - type: precision_at_1 value: 56.95099999999999 - type: precision_at_10 value: 15.671 - type: precision_at_100 value: 2.002 - type: 
precision_at_1000 value: 0.208 - type: precision_at_20 value: 8.689 - type: precision_at_3 value: 36.341 - type: precision_at_5 value: 26.854 - type: recall_at_1 value: 35.858000000000004 - type: recall_at_10 value: 75.02 - type: recall_at_100 value: 95.76 - type: recall_at_1000 value: 99.837 - type: recall_at_20 value: 83.732 - type: recall_at_3 value: 57.093 - type: recall_at_5 value: 66.193 task: type: Retrieval - dataset: config: cmn-cmn name: MTEB XPQARetrieval (cmn-cmn) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 69.446 - type: map_at_1 value: 39.995999999999995 - type: map_at_10 value: 64.033 - type: map_at_100 value: 65.51599999999999 - type: map_at_1000 value: 65.545 - type: map_at_20 value: 64.958 - type: map_at_3 value: 57.767 - type: map_at_5 value: 61.998 - type: mrr_at_1 value: 63.3495145631068 - type: mrr_at_10 value: 70.21146363075978 - type: mrr_at_100 value: 70.82810974202124 - type: mrr_at_1000 value: 70.83816803303915 - type: mrr_at_20 value: 70.60140248428802 - type: mrr_at_3 value: 68.66909385113267 - type: mrr_at_5 value: 69.56108414239482 - type: nauc_map_at_1000_diff1 value: 51.649897072831465 - type: nauc_map_at_1000_max value: 38.25222728655331 - type: nauc_map_at_1000_std value: -39.10327919949334 - type: nauc_map_at_100_diff1 value: 51.644205886401465 - type: nauc_map_at_100_max value: 38.23611154355255 - type: nauc_map_at_100_std value: -39.1677073977285 - type: nauc_map_at_10_diff1 value: 51.81444145636039 - type: nauc_map_at_10_max value: 38.03382104326485 - type: nauc_map_at_10_std value: -38.999395639812015 - type: nauc_map_at_1_diff1 value: 59.785298201044704 - type: nauc_map_at_1_max value: 23.273537759937785 - type: nauc_map_at_1_std value: -17.838712689290194 - type: nauc_map_at_20_diff1 value: 51.680208795601004 - type: nauc_map_at_20_max value: 38.23334583518634 - type: nauc_map_at_20_std value: -39.24344495939061 - type: nauc_map_at_3_diff1 value: 52.180913298194056 - type: nauc_map_at_3_max value: 33.45482478000481 - type: nauc_map_at_3_std value: -31.682911030586297 - type: nauc_map_at_5_diff1 value: 50.804900676175436 - type: nauc_map_at_5_max value: 37.68924816012326 - type: nauc_map_at_5_std value: -36.85016896616712 - type: nauc_mrr_at_1000_diff1 value: 56.371477471577535 - type: nauc_mrr_at_1000_max value: 42.773877962050086 - type: nauc_mrr_at_1000_std value: -40.41765081873682 - type: nauc_mrr_at_100_diff1 value: 56.3619751528192 - type: nauc_mrr_at_100_max value: 42.76298794859916 - type: nauc_mrr_at_100_std value: -40.44070582448831 - type: nauc_mrr_at_10_diff1 value: 56.33810523477712 - type: nauc_mrr_at_10_max value: 42.76591937795783 - type: nauc_mrr_at_10_std value: -40.69339583030244 - type: nauc_mrr_at_1_diff1 value: 58.90399906884378 - type: nauc_mrr_at_1_max value: 43.38806571165292 - type: nauc_mrr_at_1_std value: -38.224015285584 - type: nauc_mrr_at_20_diff1 value: 56.32629070537032 - type: nauc_mrr_at_20_max value: 42.79615263472604 - type: nauc_mrr_at_20_std value: -40.496777397603076 - type: nauc_mrr_at_3_diff1 value: 55.96989454480743 - type: nauc_mrr_at_3_max value: 42.49832220744744 - type: nauc_mrr_at_3_std value: -39.883799467132384 - type: nauc_mrr_at_5_diff1 value: 56.003080766475755 - type: nauc_mrr_at_5_max value: 42.73308051011805 - type: nauc_mrr_at_5_std value: -39.87179511166683 - type: nauc_ndcg_at_1000_diff1 value: 52.49054229225255 - type: nauc_ndcg_at_1000_max value: 39.61644750719859 - type: nauc_ndcg_at_1000_std value: 
-40.89845763194674 - type: nauc_ndcg_at_100_diff1 value: 52.33511250864434 - type: nauc_ndcg_at_100_max value: 39.25530146124452 - type: nauc_ndcg_at_100_std value: -41.92444498004374 - type: nauc_ndcg_at_10_diff1 value: 52.62031505931842 - type: nauc_ndcg_at_10_max value: 38.667195545396766 - type: nauc_ndcg_at_10_std value: -42.59503924641507 - type: nauc_ndcg_at_1_diff1 value: 58.90399906884378 - type: nauc_ndcg_at_1_max value: 43.38806571165292 - type: nauc_ndcg_at_1_std value: -38.224015285584 - type: nauc_ndcg_at_20_diff1 value: 52.15061629809436 - type: nauc_ndcg_at_20_max value: 39.09332400054708 - type: nauc_ndcg_at_20_std value: -42.80018671618001 - type: nauc_ndcg_at_3_diff1 value: 51.04210728138207 - type: nauc_ndcg_at_3_max value: 38.19034802567046 - type: nauc_ndcg_at_3_std value: -38.179821090765216 - type: nauc_ndcg_at_5_diff1 value: 51.04399574045204 - type: nauc_ndcg_at_5_max value: 38.42492210204548 - type: nauc_ndcg_at_5_std value: -38.868073241617715 - type: nauc_precision_at_1000_diff1 value: -25.151369907213734 - type: nauc_precision_at_1000_max value: 9.012549147054989 - type: nauc_precision_at_1000_std value: -9.319786589947698 - type: nauc_precision_at_100_diff1 value: -23.20945211843088 - type: nauc_precision_at_100_max value: 9.860701593969862 - type: nauc_precision_at_100_std value: -13.073877818347231 - type: nauc_precision_at_10_diff1 value: -6.970781124246847 - type: nauc_precision_at_10_max value: 19.392675322254487 - type: nauc_precision_at_10_std value: -26.74943490717657 - type: nauc_precision_at_1_diff1 value: 58.90399906884378 - type: nauc_precision_at_1_max value: 43.38806571165292 - type: nauc_precision_at_1_std value: -38.224015285584 - type: nauc_precision_at_20_diff1 value: -13.046456108081102 - type: nauc_precision_at_20_max value: 15.69439950383875 - type: nauc_precision_at_20_std value: -23.836004512018093 - type: nauc_precision_at_3_diff1 value: 3.5444232965528846 - type: nauc_precision_at_3_max value: 27.08858445453865 - type: nauc_precision_at_3_std value: -29.12757283665593 - type: nauc_precision_at_5_diff1 value: -3.6853986353320267 - type: nauc_precision_at_5_max value: 24.32059689571271 - type: nauc_precision_at_5_std value: -27.46188072134163 - type: nauc_recall_at_1000_diff1 value: 86.93515141907919 - type: nauc_recall_at_1000_max value: 100.0 - type: nauc_recall_at_1000_std value: 100.0 - type: nauc_recall_at_100_diff1 value: 39.7052887613879 - type: nauc_recall_at_100_max value: 18.40943977796887 - type: nauc_recall_at_100_std value: -88.74014854144974 - type: nauc_recall_at_10_diff1 value: 48.85342500870892 - type: nauc_recall_at_10_max value: 32.69617204234419 - type: nauc_recall_at_10_std value: -51.9937231860804 - type: nauc_recall_at_1_diff1 value: 59.785298201044704 - type: nauc_recall_at_1_max value: 23.273537759937785 - type: nauc_recall_at_1_std value: -17.838712689290194 - type: nauc_recall_at_20_diff1 value: 45.40839773314378 - type: nauc_recall_at_20_max value: 33.02458321493215 - type: nauc_recall_at_20_std value: -55.97800739448166 - type: nauc_recall_at_3_diff1 value: 47.05565693416531 - type: nauc_recall_at_3_max value: 28.743850400344297 - type: nauc_recall_at_3_std value: -32.436470486397475 - type: nauc_recall_at_5_diff1 value: 45.30223758669577 - type: nauc_recall_at_5_max value: 33.6567274747059 - type: nauc_recall_at_5_std value: -39.946712017948514 - type: ndcg_at_1 value: 63.349999999999994 - type: ndcg_at_10 value: 69.446 - type: ndcg_at_100 value: 74.439 - type: ndcg_at_1000 value: 74.834 - type: ndcg_at_20 
value: 71.763 - type: ndcg_at_3 value: 64.752 - type: ndcg_at_5 value: 66.316 - type: precision_at_1 value: 63.349999999999994 - type: precision_at_10 value: 16.286 - type: precision_at_100 value: 2.024 - type: precision_at_1000 value: 0.207 - type: precision_at_20 value: 8.908000000000001 - type: precision_at_3 value: 40.655 - type: precision_at_5 value: 28.859 - type: recall_at_1 value: 39.995999999999995 - type: recall_at_10 value: 78.107 - type: recall_at_100 value: 97.538 - type: recall_at_1000 value: 99.96000000000001 - type: recall_at_20 value: 85.72 - type: recall_at_3 value: 63.291 - type: recall_at_5 value: 70.625 task: type: Retrieval - dataset: config: spa-eng name: MTEB XPQARetrieval (spa-eng) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 68.258 - type: map_at_1 value: 33.06 - type: map_at_10 value: 61.590999999999994 - type: map_at_100 value: 63.341 - type: map_at_1000 value: 63.385999999999996 - type: map_at_20 value: 62.77700000000001 - type: map_at_3 value: 52.547999999999995 - type: map_at_5 value: 58.824 - type: mrr_at_1 value: 63.80832282471627 - type: mrr_at_10 value: 70.76848015372607 - type: mrr_at_100 value: 71.33996704518061 - type: mrr_at_1000 value: 71.35368444388072 - type: mrr_at_20 value: 71.18191741103522 - type: mrr_at_3 value: 68.83144178226142 - type: mrr_at_5 value: 69.88440521227405 - type: nauc_map_at_1000_diff1 value: 41.59255746310511 - type: nauc_map_at_1000_max value: 42.064075373358065 - type: nauc_map_at_1000_std value: -25.130730194381723 - type: nauc_map_at_100_diff1 value: 41.56447648820406 - type: nauc_map_at_100_max value: 42.06711634651607 - type: nauc_map_at_100_std value: -25.14871585556968 - type: nauc_map_at_10_diff1 value: 41.28968387107058 - type: nauc_map_at_10_max value: 41.511538272139774 - type: nauc_map_at_10_std value: -25.99906440164276 - type: nauc_map_at_1_diff1 value: 51.09859596320021 - type: nauc_map_at_1_max value: 12.406789321338222 - type: nauc_map_at_1_std value: -18.227486548655076 - type: nauc_map_at_20_diff1 value: 41.39469672947315 - type: nauc_map_at_20_max value: 41.98309315808902 - type: nauc_map_at_20_std value: -25.44704720985219 - type: nauc_map_at_3_diff1 value: 43.16164995512842 - type: nauc_map_at_3_max value: 30.935400935562818 - type: nauc_map_at_3_std value: -23.53095555148866 - type: nauc_map_at_5_diff1 value: 41.23474352142375 - type: nauc_map_at_5_max value: 39.03088859147947 - type: nauc_map_at_5_std value: -26.046526443708366 - type: nauc_mrr_at_1000_diff1 value: 51.79649678213789 - type: nauc_mrr_at_1000_max value: 50.50340748045259 - type: nauc_mrr_at_1000_std value: -24.777183703493407 - type: nauc_mrr_at_100_diff1 value: 51.78609028166551 - type: nauc_mrr_at_100_max value: 50.51732896833555 - type: nauc_mrr_at_100_std value: -24.760054686874717 - type: nauc_mrr_at_10_diff1 value: 51.705268395036995 - type: nauc_mrr_at_10_max value: 50.35818415293149 - type: nauc_mrr_at_10_std value: -25.170367120250404 - type: nauc_mrr_at_1_diff1 value: 53.91475115581825 - type: nauc_mrr_at_1_max value: 49.122529616282016 - type: nauc_mrr_at_1_std value: -22.377647552937155 - type: nauc_mrr_at_20_diff1 value: 51.778984221197774 - type: nauc_mrr_at_20_max value: 50.5070957827813 - type: nauc_mrr_at_20_std value: -24.908935023607285 - type: nauc_mrr_at_3_diff1 value: 51.82683773090423 - type: nauc_mrr_at_3_max value: 50.77993196421369 - type: nauc_mrr_at_3_std value: -24.3925832021831 - type: nauc_mrr_at_5_diff1 value: 51.722232683543034 - type: 
nauc_mrr_at_5_max value: 50.334865493961864 - type: nauc_mrr_at_5_std value: -25.513593495703297 - type: nauc_ndcg_at_1000_diff1 value: 44.21851582991263 - type: nauc_ndcg_at_1000_max value: 45.73539068637836 - type: nauc_ndcg_at_1000_std value: -24.716522467580397 - type: nauc_ndcg_at_100_diff1 value: 43.8002401615357 - type: nauc_ndcg_at_100_max value: 45.801409410061915 - type: nauc_ndcg_at_100_std value: -24.73171742499903 - type: nauc_ndcg_at_10_diff1 value: 42.540922778755885 - type: nauc_ndcg_at_10_max value: 44.348836943874595 - type: nauc_ndcg_at_10_std value: -28.05403666494785 - type: nauc_ndcg_at_1_diff1 value: 53.91475115581825 - type: nauc_ndcg_at_1_max value: 49.122529616282016 - type: nauc_ndcg_at_1_std value: -22.377647552937155 - type: nauc_ndcg_at_20_diff1 value: 43.10347921163421 - type: nauc_ndcg_at_20_max value: 45.53253270265022 - type: nauc_ndcg_at_20_std value: -26.63902791862846 - type: nauc_ndcg_at_3_diff1 value: 42.41720274782384 - type: nauc_ndcg_at_3_max value: 42.91778219334943 - type: nauc_ndcg_at_3_std value: -24.793252033594076 - type: nauc_ndcg_at_5_diff1 value: 42.51515034945093 - type: nauc_ndcg_at_5_max value: 41.62080576508792 - type: nauc_ndcg_at_5_std value: -28.209669314955065 - type: nauc_precision_at_1000_diff1 value: -14.89794075433148 - type: nauc_precision_at_1000_max value: 27.85387929356412 - type: nauc_precision_at_1000_std value: 10.728618597190849 - type: nauc_precision_at_100_diff1 value: -13.075270046295856 - type: nauc_precision_at_100_max value: 29.77208946756632 - type: nauc_precision_at_100_std value: 8.491662697326039 - type: nauc_precision_at_10_diff1 value: -4.0826025188781205 - type: nauc_precision_at_10_max value: 39.04278085180075 - type: nauc_precision_at_10_std value: -5.925408651372333 - type: nauc_precision_at_1_diff1 value: 53.91475115581825 - type: nauc_precision_at_1_max value: 49.122529616282016 - type: nauc_precision_at_1_std value: -22.377647552937155 - type: nauc_precision_at_20_diff1 value: -7.93186440645135 - type: nauc_precision_at_20_max value: 35.81281308891365 - type: nauc_precision_at_20_std value: 0.1241277857515697 - type: nauc_precision_at_3_diff1 value: 7.563562511484409 - type: nauc_precision_at_3_max value: 43.43738862378524 - type: nauc_precision_at_3_std value: -11.958059731912615 - type: nauc_precision_at_5_diff1 value: -0.1801152449011624 - type: nauc_precision_at_5_max value: 41.32486715619513 - type: nauc_precision_at_5_std value: -10.088699021919552 - type: nauc_recall_at_1000_diff1 value: 86.93359696819986 - type: nauc_recall_at_1000_max value: 100.0 - type: nauc_recall_at_1000_std value: 72.21843645604022 - type: nauc_recall_at_100_diff1 value: 29.86050842714198 - type: nauc_recall_at_100_max value: 48.106658251136245 - type: nauc_recall_at_100_std value: -14.981886214880035 - type: nauc_recall_at_10_diff1 value: 33.67119240737528 - type: nauc_recall_at_10_max value: 39.271984859561414 - type: nauc_recall_at_10_std value: -35.6434883839217 - type: nauc_recall_at_1_diff1 value: 51.09859596320021 - type: nauc_recall_at_1_max value: 12.406789321338222 - type: nauc_recall_at_1_std value: -18.227486548655076 - type: nauc_recall_at_20_diff1 value: 33.211979983240724 - type: nauc_recall_at_20_max value: 43.47676074743184 - type: nauc_recall_at_20_std value: -33.88107138395349 - type: nauc_recall_at_3_diff1 value: 39.22513750146998 - type: nauc_recall_at_3_max value: 27.066674083840166 - type: nauc_recall_at_3_std value: -26.963282529629893 - type: nauc_recall_at_5_diff1 value: 36.53718917129459 - 
type: nauc_recall_at_5_max value: 35.40550013169686 - type: nauc_recall_at_5_std value: -34.209159379410806 - type: ndcg_at_1 value: 63.808 - type: ndcg_at_10 value: 68.258 - type: ndcg_at_100 value: 73.38799999999999 - type: ndcg_at_1000 value: 74.03 - type: ndcg_at_20 value: 70.968 - type: ndcg_at_3 value: 62.33 - type: ndcg_at_5 value: 64.096 - type: precision_at_1 value: 63.808 - type: precision_at_10 value: 19.243 - type: precision_at_100 value: 2.367 - type: precision_at_1000 value: 0.245 - type: precision_at_20 value: 10.599 - type: precision_at_3 value: 44.515 - type: precision_at_5 value: 33.467999999999996 - type: recall_at_1 value: 33.06 - type: recall_at_10 value: 77.423 - type: recall_at_100 value: 95.923 - type: recall_at_1000 value: 99.874 - type: recall_at_20 value: 85.782 - type: recall_at_3 value: 57.098000000000006 - type: recall_at_5 value: 67.472 task: type: Retrieval - dataset: config: spa-spa name: MTEB XPQARetrieval (spa-spa) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 72.004 - type: map_at_1 value: 36.248000000000005 - type: map_at_10 value: 65.679 - type: map_at_100 value: 67.22399999999999 - type: map_at_1000 value: 67.264 - type: map_at_20 value: 66.705 - type: map_at_3 value: 56.455 - type: map_at_5 value: 62.997 - type: mrr_at_1 value: 67.71752837326608 - type: mrr_at_10 value: 74.59782021257429 - type: mrr_at_100 value: 75.0640960767943 - type: mrr_at_1000 value: 75.07324799466076 - type: mrr_at_20 value: 74.9323963386884 - type: mrr_at_3 value: 72.95081967213115 - type: mrr_at_5 value: 73.82723833543506 - type: nauc_map_at_1000_diff1 value: 43.111810717567714 - type: nauc_map_at_1000_max value: 44.835247208972476 - type: nauc_map_at_1000_std value: -32.798405973931985 - type: nauc_map_at_100_diff1 value: 43.090223482932764 - type: nauc_map_at_100_max value: 44.83392441557943 - type: nauc_map_at_100_std value: -32.81149166676563 - type: nauc_map_at_10_diff1 value: 42.87841934951979 - type: nauc_map_at_10_max value: 43.9838653389494 - type: nauc_map_at_10_std value: -33.588084643627084 - type: nauc_map_at_1_diff1 value: 54.509245848379095 - type: nauc_map_at_1_max value: 10.05921648322742 - type: nauc_map_at_1_std value: -24.652326014826762 - type: nauc_map_at_20_diff1 value: 43.07468612984794 - type: nauc_map_at_20_max value: 44.75663122615032 - type: nauc_map_at_20_std value: -33.11788887878321 - type: nauc_map_at_3_diff1 value: 44.63272828938906 - type: nauc_map_at_3_max value: 32.1584369869227 - type: nauc_map_at_3_std value: -30.761662210142944 - type: nauc_map_at_5_diff1 value: 42.77296997803048 - type: nauc_map_at_5_max value: 41.78894616737652 - type: nauc_map_at_5_std value: -33.56459774477362 - type: nauc_mrr_at_1000_diff1 value: 53.097544131833494 - type: nauc_mrr_at_1000_max value: 50.61134979184588 - type: nauc_mrr_at_1000_std value: -35.6221191487669 - type: nauc_mrr_at_100_diff1 value: 53.096609856182106 - type: nauc_mrr_at_100_max value: 50.61951585642645 - type: nauc_mrr_at_100_std value: -35.62396157508327 - type: nauc_mrr_at_10_diff1 value: 52.771534471912304 - type: nauc_mrr_at_10_max value: 50.430863224435726 - type: nauc_mrr_at_10_std value: -36.027992076620365 - type: nauc_mrr_at_1_diff1 value: 55.05316238884337 - type: nauc_mrr_at_1_max value: 49.461858515275196 - type: nauc_mrr_at_1_std value: -31.87492636319712 - type: nauc_mrr_at_20_diff1 value: 53.083253469629746 - type: nauc_mrr_at_20_max value: 50.62156424256193 - type: nauc_mrr_at_20_std value: 
-35.879153692447154 - type: nauc_mrr_at_3_diff1 value: 52.98283109188415 - type: nauc_mrr_at_3_max value: 50.83561260429378 - type: nauc_mrr_at_3_std value: -35.30839538038797 - type: nauc_mrr_at_5_diff1 value: 52.93270510879709 - type: nauc_mrr_at_5_max value: 50.54595596761199 - type: nauc_mrr_at_5_std value: -35.84059376434395 - type: nauc_ndcg_at_1000_diff1 value: 45.343685089209416 - type: nauc_ndcg_at_1000_max value: 47.801141576669465 - type: nauc_ndcg_at_1000_std value: -33.512958862879195 - type: nauc_ndcg_at_100_diff1 value: 45.255590461515894 - type: nauc_ndcg_at_100_max value: 47.99240031881967 - type: nauc_ndcg_at_100_std value: -33.614465006695205 - type: nauc_ndcg_at_10_diff1 value: 43.93472511731019 - type: nauc_ndcg_at_10_max value: 45.92599752897053 - type: nauc_ndcg_at_10_std value: -36.43629114491574 - type: nauc_ndcg_at_1_diff1 value: 55.05316238884337 - type: nauc_ndcg_at_1_max value: 49.461858515275196 - type: nauc_ndcg_at_1_std value: -31.87492636319712 - type: nauc_ndcg_at_20_diff1 value: 44.93534591273201 - type: nauc_ndcg_at_20_max value: 47.55153940713458 - type: nauc_ndcg_at_20_std value: -35.56392448745206 - type: nauc_ndcg_at_3_diff1 value: 43.17916122133396 - type: nauc_ndcg_at_3_max value: 45.603634205103276 - type: nauc_ndcg_at_3_std value: -32.473227507181214 - type: nauc_ndcg_at_5_diff1 value: 44.10242961669216 - type: nauc_ndcg_at_5_max value: 43.61666669031808 - type: nauc_ndcg_at_5_std value: -35.98808321497782 - type: nauc_precision_at_1000_diff1 value: -23.264714449991146 - type: nauc_precision_at_1000_max value: 28.505729576735465 - type: nauc_precision_at_1000_std value: 11.987379232920926 - type: nauc_precision_at_100_diff1 value: -21.156119174614627 - type: nauc_precision_at_100_max value: 30.711646221646255 - type: nauc_precision_at_100_std value: 9.650486536340322 - type: nauc_precision_at_10_diff1 value: -10.98001328477502 - type: nauc_precision_at_10_max value: 39.25638073760597 - type: nauc_precision_at_10_std value: -4.3456859257488 - type: nauc_precision_at_1_diff1 value: 55.05316238884337 - type: nauc_precision_at_1_max value: 49.461858515275196 - type: nauc_precision_at_1_std value: -31.87492636319712 - type: nauc_precision_at_20_diff1 value: -14.97565390664424 - type: nauc_precision_at_20_max value: 36.383835295942355 - type: nauc_precision_at_20_std value: 1.525158880381114 - type: nauc_precision_at_3_diff1 value: 1.0448345623903483 - type: nauc_precision_at_3_max value: 45.69772060667404 - type: nauc_precision_at_3_std value: -13.002685018948293 - type: nauc_precision_at_5_diff1 value: -5.434185597628904 - type: nauc_precision_at_5_max value: 42.99162431099203 - type: nauc_precision_at_5_std value: -9.789308817624534 - type: nauc_recall_at_1000_diff1 value: 12.309303236094845 - type: nauc_recall_at_1000_max value: 100.0 - type: nauc_recall_at_1000_std value: 86.93359696819986 - type: nauc_recall_at_100_diff1 value: 39.093544920901415 - type: nauc_recall_at_100_max value: 55.62814395062938 - type: nauc_recall_at_100_std value: -22.6919033301514 - type: nauc_recall_at_10_diff1 value: 35.50100141633622 - type: nauc_recall_at_10_max value: 39.25750019586647 - type: nauc_recall_at_10_std value: -43.01273078031791 - type: nauc_recall_at_1_diff1 value: 54.509245848379095 - type: nauc_recall_at_1_max value: 10.05921648322742 - type: nauc_recall_at_1_std value: -24.652326014826762 - type: nauc_recall_at_20_diff1 value: 38.1281707132327 - type: nauc_recall_at_20_max value: 43.97950642900301 - type: nauc_recall_at_20_std value: 
-44.049952771307574 - type: nauc_recall_at_3_diff1 value: 40.01986938242728 - type: nauc_recall_at_3_max value: 27.517114421061173 - type: nauc_recall_at_3_std value: -32.99056780232045 - type: nauc_recall_at_5_diff1 value: 38.52035606499483 - type: nauc_recall_at_5_max value: 37.05834604678859 - type: nauc_recall_at_5_std value: -39.86196378897912 - type: ndcg_at_1 value: 67.718 - type: ndcg_at_10 value: 72.004 - type: ndcg_at_100 value: 76.554 - type: ndcg_at_1000 value: 77.07300000000001 - type: ndcg_at_20 value: 74.37899999999999 - type: ndcg_at_3 value: 66.379 - type: ndcg_at_5 value: 68.082 - type: precision_at_1 value: 67.718 - type: precision_at_10 value: 19.849 - type: precision_at_100 value: 2.3800000000000003 - type: precision_at_1000 value: 0.245 - type: precision_at_20 value: 10.813 - type: precision_at_3 value: 46.574 - type: precision_at_5 value: 34.83 - type: recall_at_1 value: 36.248000000000005 - type: recall_at_10 value: 80.252 - type: recall_at_100 value: 96.73 - type: recall_at_1000 value: 99.874 - type: recall_at_20 value: 87.703 - type: recall_at_3 value: 60.815 - type: recall_at_5 value: 71.16 task: type: Retrieval - dataset: config: fra-eng name: MTEB XPQARetrieval (fra-eng) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 73.729 - type: map_at_1 value: 43.964999999999996 - type: map_at_10 value: 67.803 - type: map_at_100 value: 69.188 - type: map_at_1000 value: 69.21000000000001 - type: map_at_20 value: 68.747 - type: map_at_3 value: 60.972 - type: map_at_5 value: 65.39399999999999 - type: mrr_at_1 value: 68.4913217623498 - type: mrr_at_10 value: 75.2600822260368 - type: mrr_at_100 value: 75.6599169808848 - type: mrr_at_1000 value: 75.66720883727534 - type: mrr_at_20 value: 75.52375865860405 - type: mrr_at_3 value: 73.54250111259452 - type: mrr_at_5 value: 74.51713395638626 - type: nauc_map_at_1000_diff1 value: 46.81533703002097 - type: nauc_map_at_1000_max value: 46.30794757084772 - type: nauc_map_at_1000_std value: -14.953470500312335 - type: nauc_map_at_100_diff1 value: 46.82464740277745 - type: nauc_map_at_100_max value: 46.32852879948254 - type: nauc_map_at_100_std value: -14.950035098066172 - type: nauc_map_at_10_diff1 value: 46.31406143369831 - type: nauc_map_at_10_max value: 45.337593270786634 - type: nauc_map_at_10_std value: -16.011789445907876 - type: nauc_map_at_1_diff1 value: 57.097134715065835 - type: nauc_map_at_1_max value: 21.93931500350721 - type: nauc_map_at_1_std value: -15.134457251301637 - type: nauc_map_at_20_diff1 value: 46.47030891134173 - type: nauc_map_at_20_max value: 46.29169960276292 - type: nauc_map_at_20_std value: -15.14241106541829 - type: nauc_map_at_3_diff1 value: 50.27064228648596 - type: nauc_map_at_3_max value: 39.43058773971639 - type: nauc_map_at_3_std value: -16.16545993089126 - type: nauc_map_at_5_diff1 value: 46.974867679747426 - type: nauc_map_at_5_max value: 44.31091104855002 - type: nauc_map_at_5_std value: -16.50175337658926 - type: nauc_mrr_at_1000_diff1 value: 55.20294005110399 - type: nauc_mrr_at_1000_max value: 51.947725719119966 - type: nauc_mrr_at_1000_std value: -14.586112939597232 - type: nauc_mrr_at_100_diff1 value: 55.20426251109304 - type: nauc_mrr_at_100_max value: 51.95648725402534 - type: nauc_mrr_at_100_std value: -14.579769236539143 - type: nauc_mrr_at_10_diff1 value: 54.93870506205835 - type: nauc_mrr_at_10_max value: 51.89312772900638 - type: nauc_mrr_at_10_std value: -14.692635010092939 - type: nauc_mrr_at_1_diff1 value: 
56.54945935175171 - type: nauc_mrr_at_1_max value: 51.28134504197991 - type: nauc_mrr_at_1_std value: -12.909042186563061 - type: nauc_mrr_at_20_diff1 value: 55.10667018041461 - type: nauc_mrr_at_20_max value: 51.98236870783707 - type: nauc_mrr_at_20_std value: -14.599377575198025 - type: nauc_mrr_at_3_diff1 value: 55.67124311746892 - type: nauc_mrr_at_3_max value: 51.77903236246767 - type: nauc_mrr_at_3_std value: -14.94452633860763 - type: nauc_mrr_at_5_diff1 value: 55.42849172366371 - type: nauc_mrr_at_5_max value: 51.76902965753959 - type: nauc_mrr_at_5_std value: -15.357993534727072 - type: nauc_ndcg_at_1000_diff1 value: 48.736844959280326 - type: nauc_ndcg_at_1000_max value: 48.92891159935398 - type: nauc_ndcg_at_1000_std value: -13.983968675611056 - type: nauc_ndcg_at_100_diff1 value: 48.73859328503975 - type: nauc_ndcg_at_100_max value: 49.31867149556439 - type: nauc_ndcg_at_100_std value: -13.72387564912742 - type: nauc_ndcg_at_10_diff1 value: 46.50313862975287 - type: nauc_ndcg_at_10_max value: 47.13599793554596 - type: nauc_ndcg_at_10_std value: -16.317919977400113 - type: nauc_ndcg_at_1_diff1 value: 56.54945935175171 - type: nauc_ndcg_at_1_max value: 51.28134504197991 - type: nauc_ndcg_at_1_std value: -12.909042186563061 - type: nauc_ndcg_at_20_diff1 value: 47.01727117133912 - type: nauc_ndcg_at_20_max value: 49.121366036709105 - type: nauc_ndcg_at_20_std value: -14.411078677638775 - type: nauc_ndcg_at_3_diff1 value: 49.229581145458276 - type: nauc_ndcg_at_3_max value: 47.427609717032 - type: nauc_ndcg_at_3_std value: -16.52066627289908 - type: nauc_ndcg_at_5_diff1 value: 48.0152514127505 - type: nauc_ndcg_at_5_max value: 46.12152407850816 - type: nauc_ndcg_at_5_std value: -17.613295491954656 - type: nauc_precision_at_1000_diff1 value: -25.959006032642463 - type: nauc_precision_at_1000_max value: 12.81002362947137 - type: nauc_precision_at_1000_std value: 12.575312826061513 - type: nauc_precision_at_100_diff1 value: -24.35413527283394 - type: nauc_precision_at_100_max value: 14.878359236477303 - type: nauc_precision_at_100_std value: 12.384426050018428 - type: nauc_precision_at_10_diff1 value: -17.93220761770618 - type: nauc_precision_at_10_max value: 23.523485811847294 - type: nauc_precision_at_10_std value: 4.424456968716939 - type: nauc_precision_at_1_diff1 value: 56.54945935175171 - type: nauc_precision_at_1_max value: 51.28134504197991 - type: nauc_precision_at_1_std value: -12.909042186563061 - type: nauc_precision_at_20_diff1 value: -21.776871398686936 - type: nauc_precision_at_20_max value: 21.18436338264366 - type: nauc_precision_at_20_std value: 9.937274986573321 - type: nauc_precision_at_3_diff1 value: -1.2411845580934435 - type: nauc_precision_at_3_max value: 34.962281941875 - type: nauc_precision_at_3_std value: -2.447892908501237 - type: nauc_precision_at_5_diff1 value: -11.134164534114085 - type: nauc_precision_at_5_max value: 30.22079740070525 - type: nauc_precision_at_5_std value: -0.24232594421765946 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: 43.3647412452869 - type: nauc_recall_at_100_max value: 63.50094950500327 - type: nauc_recall_at_100_std value: 2.3911909633714044 - type: nauc_recall_at_10_diff1 value: 33.993445071666855 - type: nauc_recall_at_10_max value: 41.38694129134144 - type: nauc_recall_at_10_std value: -19.308698266099096 - type: nauc_recall_at_1_diff1 value: 57.097134715065835 - type: nauc_recall_at_1_max value: 
21.93931500350721 - type: nauc_recall_at_1_std value: -15.134457251301637 - type: nauc_recall_at_20_diff1 value: 32.03888531880772 - type: nauc_recall_at_20_max value: 49.660787482562085 - type: nauc_recall_at_20_std value: -12.641456758778382 - type: nauc_recall_at_3_diff1 value: 47.94527082900579 - type: nauc_recall_at_3_max value: 36.51733131437679 - type: nauc_recall_at_3_std value: -18.65511713247495 - type: nauc_recall_at_5_diff1 value: 42.04545772092305 - type: nauc_recall_at_5_max value: 41.21440912972303 - type: nauc_recall_at_5_std value: -21.47386527081128 - type: ndcg_at_1 value: 68.491 - type: ndcg_at_10 value: 73.729 - type: ndcg_at_100 value: 77.684 - type: ndcg_at_1000 value: 78.084 - type: ndcg_at_20 value: 75.795 - type: ndcg_at_3 value: 68.568 - type: ndcg_at_5 value: 70.128 - type: precision_at_1 value: 68.491 - type: precision_at_10 value: 16.996 - type: precision_at_100 value: 2.023 - type: precision_at_1000 value: 0.207 - type: precision_at_20 value: 9.246 - type: precision_at_3 value: 41.923 - type: precision_at_5 value: 29.826000000000004 - type: recall_at_1 value: 43.964999999999996 - type: recall_at_10 value: 82.777 - type: recall_at_100 value: 97.287 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 89.183 - type: recall_at_3 value: 65.803 - type: recall_at_5 value: 74.119 task: type: Retrieval - dataset: config: fra-fra name: MTEB XPQARetrieval (fr) revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f split: test type: jinaai/xpqa metrics: - type: main_score value: 77.581 - type: map_at_1 value: 46.444 - type: map_at_10 value: 72.084 - type: map_at_100 value: 73.175 - type: map_at_1000 value: 73.193 - type: map_at_20 value: 72.77799999999999 - type: map_at_3 value: 65.242 - type: map_at_5 value: 69.926 - type: mrr_at_1 value: 71.82910547396529 - type: mrr_at_10 value: 78.66594612923046 - type: mrr_at_100 value: 78.97334934049613 - type: mrr_at_1000 value: 78.97687021803557 - type: mrr_at_20 value: 78.85701141744282 - type: mrr_at_3 value: 76.96929238985311 - type: mrr_at_5 value: 77.99732977303067 - type: nauc_map_at_1000_diff1 value: 49.090956807097804 - type: nauc_map_at_1000_max value: 52.01095354889508 - type: nauc_map_at_1000_std value: -12.182870421711026 - type: nauc_map_at_100_diff1 value: 49.091664766684566 - type: nauc_map_at_100_max value: 52.017499797253755 - type: nauc_map_at_100_std value: -12.188342487271528 - type: nauc_map_at_10_diff1 value: 48.6619338205362 - type: nauc_map_at_10_max value: 50.93591260329888 - type: nauc_map_at_10_std value: -12.899399261673365 - type: nauc_map_at_1_diff1 value: 61.89699552471587 - type: nauc_map_at_1_max value: 22.387748207421946 - type: nauc_map_at_1_std value: -17.139518194308437 - type: nauc_map_at_20_diff1 value: 48.72828404686453 - type: nauc_map_at_20_max value: 51.781074586075434 - type: nauc_map_at_20_std value: -12.174270605093136 - type: nauc_map_at_3_diff1 value: 53.11509580126934 - type: nauc_map_at_3_max value: 42.1768380145106 - type: nauc_map_at_3_std value: -14.98340833032363 - type: nauc_map_at_5_diff1 value: 49.60521390803235 - type: nauc_map_at_5_max value: 49.80360562029127 - type: nauc_map_at_5_std value: -13.900652140457618 - type: nauc_mrr_at_1000_diff1 value: 58.10782478654255 - type: nauc_mrr_at_1000_max value: 61.31083013535486 - type: nauc_mrr_at_1000_std value: -9.624904298545921 - type: nauc_mrr_at_100_diff1 value: 58.11041683306092 - type: nauc_mrr_at_100_max value: 61.31590199755797 - type: nauc_mrr_at_100_std value: -9.625991053580865 - type: nauc_mrr_at_10_diff1 
value: 57.883701815695375 - type: nauc_mrr_at_10_max value: 61.36276126424689 - type: nauc_mrr_at_10_std value: -9.495072468420386 - type: nauc_mrr_at_1_diff1 value: 60.18176977079093 - type: nauc_mrr_at_1_max value: 59.697615236642555 - type: nauc_mrr_at_1_std value: -9.396133077966779 - type: nauc_mrr_at_20_diff1 value: 57.964817434006754 - type: nauc_mrr_at_20_max value: 61.34073539502932 - type: nauc_mrr_at_20_std value: -9.602378876645131 - type: nauc_mrr_at_3_diff1 value: 58.44338049427257 - type: nauc_mrr_at_3_max value: 60.92272989411293 - type: nauc_mrr_at_3_std value: -9.928970439416162 - type: nauc_mrr_at_5_diff1 value: 58.01513016866578 - type: nauc_mrr_at_5_max value: 61.46805302986586 - type: nauc_mrr_at_5_std value: -9.842227002440984 - type: nauc_ndcg_at_1000_diff1 value: 50.99293152828167 - type: nauc_ndcg_at_1000_max value: 56.14232784664811 - type: nauc_ndcg_at_1000_std value: -10.529213072410288 - type: nauc_ndcg_at_100_diff1 value: 50.99385944312529 - type: nauc_ndcg_at_100_max value: 56.34825518954588 - type: nauc_ndcg_at_100_std value: -10.398943874846047 - type: nauc_ndcg_at_10_diff1 value: 48.51273364357823 - type: nauc_ndcg_at_10_max value: 53.77871849486298 - type: nauc_ndcg_at_10_std value: -11.82105972112472 - type: nauc_ndcg_at_1_diff1 value: 60.18176977079093 - type: nauc_ndcg_at_1_max value: 59.697615236642555 - type: nauc_ndcg_at_1_std value: -9.396133077966779 - type: nauc_ndcg_at_20_diff1 value: 49.04268319033412 - type: nauc_ndcg_at_20_max value: 55.47011381097071 - type: nauc_ndcg_at_20_std value: -10.486452945493042 - type: nauc_ndcg_at_3_diff1 value: 50.95112745400584 - type: nauc_ndcg_at_3_max value: 53.45473828705577 - type: nauc_ndcg_at_3_std value: -13.420699384045728 - type: nauc_ndcg_at_5_diff1 value: 50.313156212000074 - type: nauc_ndcg_at_5_max value: 52.78539129309866 - type: nauc_ndcg_at_5_std value: -13.586274096509122 - type: nauc_precision_at_1000_diff1 value: -31.13772049254778 - type: nauc_precision_at_1000_max value: 17.2847598361294 - type: nauc_precision_at_1000_std value: 15.497531773816887 - type: nauc_precision_at_100_diff1 value: -29.98812263553739 - type: nauc_precision_at_100_max value: 19.048620003227654 - type: nauc_precision_at_100_std value: 15.38499952171958 - type: nauc_precision_at_10_diff1 value: -25.33028097412579 - type: nauc_precision_at_10_max value: 26.077919168306853 - type: nauc_precision_at_10_std value: 11.35352933466097 - type: nauc_precision_at_1_diff1 value: 60.18176977079093 - type: nauc_precision_at_1_max value: 59.697615236642555 - type: nauc_precision_at_1_std value: -9.396133077966779 - type: nauc_precision_at_20_diff1 value: -28.417606311068905 - type: nauc_precision_at_20_max value: 23.958679828637692 - type: nauc_precision_at_20_std value: 14.442021499194205 - type: nauc_precision_at_3_diff1 value: -8.127396049790482 - type: nauc_precision_at_3_max value: 37.348067982957076 - type: nauc_precision_at_3_std value: 4.747913619596849 - type: nauc_precision_at_5_diff1 value: -16.902418446058395 - type: nauc_precision_at_5_max value: 32.73583852552014 - type: nauc_precision_at_5_std value: 7.031446423850052 - type: nauc_recall_at_1000_diff1 value: -14.485978369112514 - type: nauc_recall_at_1000_max value: 78.59123887333172 - type: nauc_recall_at_1000_std value: 90.7384575424963 - type: nauc_recall_at_100_diff1 value: 41.47842281590715 - type: nauc_recall_at_100_max value: 67.47271545727422 - type: nauc_recall_at_100_std value: 14.555561992253999 - type: nauc_recall_at_10_diff1 value: 33.05308907973924 - 
type: nauc_recall_at_10_max value: 45.49878918493155 - type: nauc_recall_at_10_std value: -11.560069806810926 - type: nauc_recall_at_1_diff1 value: 61.89699552471587 - type: nauc_recall_at_1_max value: 22.387748207421946 - type: nauc_recall_at_1_std value: -17.139518194308437 - type: nauc_recall_at_20_diff1 value: 31.305721376453754 - type: nauc_recall_at_20_max value: 51.24817763724019 - type: nauc_recall_at_20_std value: -5.0809908162023145 - type: nauc_recall_at_3_diff1 value: 49.27109038342917 - type: nauc_recall_at_3_max value: 37.69188317998447 - type: nauc_recall_at_3_std value: -17.119900758664336 - type: nauc_recall_at_5_diff1 value: 42.74501803377967 - type: nauc_recall_at_5_max value: 46.877008503354844 - type: nauc_recall_at_5_std value: -15.704892082115975 - type: ndcg_at_1 value: 71.829 - type: ndcg_at_10 value: 77.581 - type: ndcg_at_100 value: 80.75 - type: ndcg_at_1000 value: 81.026 - type: ndcg_at_20 value: 79.092 - type: ndcg_at_3 value: 72.81 - type: ndcg_at_5 value: 74.22999999999999 - type: precision_at_1 value: 71.829 - type: precision_at_10 value: 17.717 - type: precision_at_100 value: 2.031 - type: precision_at_1000 value: 0.207 - type: precision_at_20 value: 9.399000000000001 - type: precision_at_3 value: 44.458999999999996 - type: precision_at_5 value: 31.535000000000004 - type: recall_at_1 value: 46.444 - type: recall_at_10 value: 86.275 - type: recall_at_100 value: 98.017 - type: recall_at_1000 value: 99.8 - type: recall_at_20 value: 90.935 - type: recall_at_3 value: 70.167 - type: recall_at_5 value: 78.2 task: type: Retrieval ---

\"Jina

The embedding model trained by Jina AI.

# jina-embeddings-v3: Multilingual Embeddings With Task LoRA

## Quick Start Blog | Azure | AWS SageMaker | API ## Intended Usage & Model Info jina-embeddings-v3 is a **multilingual multi-task text embedding model** designed for a variety of NLP applications. Based on the Jina-XLM-RoBERTa architecture, this model supports Rotary Position Embeddings to handle long input sequences up to **8192 tokens**. Additionally, it features 5 LoRA adapters to generate task-specific embeddings efficiently. ### Key Features: - **Extended Sequence Length:** Supports up to 8192 tokens with RoPE. - **Task-Specific Embedding:** Customize embeddings through the `task` argument with the following options: - `retrieval.query`: Used for query embeddings in asymmetric retrieval tasks - `retrieval.passage`: Used for passage embeddings in asymmetric retrieval tasks - `separation`: Used for embeddings in clustering and re-ranking applications - `classification`: Used for embeddings in classification tasks - `text-matching`: Used for embeddings in tasks that quantify similarity between two texts, such as STS or symmetric retrieval tasks - **Matryoshka Embeddings**: Supports flexible embedding sizes (truncating the full 1024 dimensions down to as few as 32), allowing you to truncate embeddings to fit your application. ### Supported Languages: While the foundation model supports 100 languages, we've focused our tuning efforts on the following 30 languages: **Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.** > **⚠️ Important Notice:** > We fixed a bug in the `encode` function (#60) where **Matryoshka embedding truncation** occurred *after normalization*, leading to non-normalized truncated embeddings. This issue has been resolved in the latest code revision. > > If you have encoded data using the previous version and wish to maintain consistency, please use the specific code revision when loading the model: ## Usage
**Apply mean pooling when integrating the model.**

### Why Use Mean Pooling? Mean pooling takes all token embeddings from the model's output and averages them at the sentence or paragraph level. This approach has been shown to produce high-quality sentence embeddings. We provide an `encode` function that handles this for you automatically. However, if you're working with the model directly, outside of `encode`, you'll need to apply mean pooling manually. Here's how you can do it:
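A minimal sketch of manual mean pooling (the model id comes from this card; if the custom code returns a different output structure than the usual first-element token embeddings, adapt the indexing accordingly):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element: per-token embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum the token embeddings and divide by the number of real (non-padding) tokens
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

sentences = ["How is the weather today?", "What is the current weather like today?"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    model_output = model(**encoded)

embeddings = mean_pooling(model_output, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-normalize for cosine similarity
```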

The easiest way to start using jina-embeddings-v3 is with the Jina Embedding API. Alternatively, you can use the model directly via the Transformers package. FlashAttention-2 can be used if you run it on a supported GPU; as of 2024-09-12, that means Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). By default, the model supports a maximum sequence length of 8192 tokens. However, if you want to truncate your input texts to a shorter length, you can pass the `max_length` parameter to the `encode` function. In case you want to use **Matryoshka embeddings** and switch to a different dimension, you can adjust it by passing the `truncate_dim` parameter to `encode`. The latest version (3.1.0) of SentenceTransformers also supports jina-embeddings-v3. You can fine-tune jina-embeddings-v3 using SentenceTransformerTrainer. To fine-tune for a specific task, you should set the task before passing the model to the ST Trainer, either during initialization or afterwards. This way you can fine-tune the LoRA adapter for the chosen task. However, if you want to fine-tune the entire model, make sure the main parameters are set as trainable when loading the model. This will allow fine-tuning the whole model instead of just the LoRA adapters.
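A short sketch of the paths described above; the parameter names (`task`, `max_length`, `truncate_dim`) mirror the prose on this card, so verify them against the repository's packaged custom code:

```python
from transformers import AutoModel

# trust_remote_code=True pulls in the custom encode() implementation
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

texts = ["Follow the white rabbit.", "Sigue al conejo blanco."]

# Task-specific embeddings via the LoRA adapters
query_embeddings = model.encode(texts, task="retrieval.query")

# Truncate long inputs and request a smaller Matryoshka dimension
compact_embeddings = model.encode(texts, task="text-matching", max_length=2048, truncate_dim=256)
print(compact_embeddings.shape)  # expected: (2, 256)
```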
**ONNX Inference.**

You can use ONNX for efficient inference with jina-embeddings-v3:
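A sketch of ONNX inference with onnxruntime; the `onnx/model.onnx` path and the input names are assumptions about the repository layout, so inspect the repo files and `session.get_inputs()` to confirm them:

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Assumed export location inside the model repository;
# large exports may ship external weight files next to this one.
onnx_path = hf_hub_download("jinaai/jina-embeddings-v3", "onnx/model.onnx")
session = ort.InferenceSession(onnx_path)

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
encoded = tokenizer(["How is the weather today?"], return_tensors="np")

# Input names are assumptions; check session.get_inputs() for the real ones
outputs = session.run(None, {
    "input_ids": encoded["input_ids"].astype(np.int64),
    "attention_mask": encoded["attention_mask"].astype(np.int64),
})
token_embeddings = outputs[0]  # then apply the mean pooling shown earlier
```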

## Contact Join our Discord community and chat with other community members about ideas. ## License jina-embeddings-v3 is listed on AWS & Azure. If you need to use it beyond those platforms or on-premises within your company, note that the model is licensed under CC BY-NC 4.0. For commercial usage inquiries, feel free to contact us. ## Citation If you find jina-embeddings-v3 useful in your research, please cite the following paper:", + "model_explanation_gemini": "Generates multilingual sentence embeddings for tasks like feature extraction and similarity measurement across diverse languages." +} \ No newline at end of file diff --git a/data/model_data_json/jinaai_jina-reranker-v2-base-multilingual.json b/data/model_data_json/jinaai_jina-reranker-v2-base-multilingual.json new file mode 100644 index 0000000000000000000000000000000000000000..566e4a6c4dfdcb2a4c0e9a04a0db5636c5f6915d --- /dev/null +++ b/data/model_data_json/jinaai_jina-reranker-v2-base-multilingual.json @@ -0,0 +1,21 @@ +{ + "model_id": "jinaai/jina-reranker-v2-base-multilingual", + "downloads": 702180, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "text-classification", + "reranker", + "cross-encoder", + "transformers.js", + "custom_code", + "multilingual", + "license:cc-by-nc-4.0", + "autotrain_compatible", + "region:eu" + ], + "description": "--- pipeline_tag: text-classification tags: - transformers - reranker - cross-encoder - transformers.js language: - multilingual inference: false license: cc-by-nc-4.0 library_name: transformers ---

\"Jina

Trained by Jina AI.

# jina-reranker-v2-base-multilingual ## Intended Usage & Model Info The **Jina Reranker v2** (jina-reranker-v2-base-multilingual) is a transformer-based model that has been fine-tuned for the text reranking task, which is a crucial component in many information retrieval systems. It is a cross-encoder model that takes a query and a document pair as input and outputs a score indicating the relevance of the document to the query. The model is trained on a large dataset of query-document pairs and is capable of reranking documents in multiple languages with high accuracy. Compared with state-of-the-art reranker models, including the previously released jina-reranker-v1-base-en, the **Jina Reranker v2** model has demonstrated competitiveness across a series of benchmarks targeting text retrieval, multilingual capability, function-calling-aware and text-to-SQL-aware reranking, and code retrieval tasks. The model is capable of handling long texts with a context length of up to 1024 tokens, enabling the processing of extensive inputs. To enable the model to handle long texts that exceed 1024 tokens, the model uses a sliding window approach to chunk the input text into smaller pieces and rerank each chunk separately. The model is also equipped with a flash attention mechanism, which significantly improves the model's performance. # Usage _This model repository is licensed for research and evaluation purposes under CC-BY-NC-4.0. For commercial usage, please refer to Jina AI's APIs, AWS Sagemaker or Azure Marketplace offerings. Please contact us for any further clarifications._ 1. The easiest way to use the model is to call Jina AI's Reranker API. 2. You can also use the transformers library to interact with the model programmatically. Before you start, install the transformers and einops libraries: And then: The scores will be a list of floats, where each float represents the relevance score of the corresponding document to the query. Higher scores indicate higher relevance. For instance, the returned scores in this case will be: The model gives high relevance scores to the documents that are most relevant to the query regardless of the language of the document. Note that by default, the model uses flash attention, which requires certain types of GPU hardware to run. If you encounter any issues, you can disable flash attention when loading the model; this will use the standard attention mechanism instead. If you want to use flash attention for fast inference, you need to install the following packages: Enjoy the 3x-6x speedup with flash attention! ⚡️⚡️⚡️ 3. You can also use the Transformers.js library to run the model directly in JavaScript (in-browser, Node.js, Deno, etc.)! If you haven't already, you can install the Transformers.js JavaScript library (v3) using: Then, you can use the following code to interact with the model: That's it! You can now use the model in your projects. In addition to the `compute_score` function, the model also provides a `rerank` function that can be used to rerank documents based on a query. You can use it as follows: Inside the returned object, you will find the reranked documents along with their scores. You can use this information to further process the documents as needed. The `rerank` function will automatically chunk the input documents into smaller pieces if they exceed the model's maximum input length. This allows you to rerank long documents without running into memory issues. Specifically, the function will split the documents into chunks of up to 1024 tokens and rerank each chunk separately. The scores from all the chunks are then combined to produce the final reranking results.
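A sketch tying the steps above together; `compute_score` and `rerank` are the helper names this card alludes to (shipped via the repo's custom code), and the keyword arguments are assumptions to verify against that code:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "jinaai/jina-reranker-v2-base-multilingual",
    torch_dtype="auto",
    trust_remote_code=True,  # loads the custom compute_score / rerank helpers
)
model.eval()

query = "Organic skincare products for sensitive skin"
documents = [
    "Organic skincare for sensitive skin with aloe vera and chamomile",
    "New makeup trends focus on bold colors and innovative techniques",
    "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille",
]

# Pairwise relevance scores (higher = more relevant), assumed max_length kwarg
scores = model.compute_score([[query, doc] for doc in documents], max_length=1024)

# Or let the model chunk and rerank long documents for you
results = model.rerank(query, documents, max_length=1024)
```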
You can control the query length and document length in each chunk by setting the corresponding length parameters. The `rerank` function also supports an overlap parameter, which determines how much overlap there is between adjacent chunks. This can be useful when reranking long documents to ensure that the model has enough context to make accurate predictions. 4. Alternatively, the model has been integrated with `CrossEncoder` from the `sentence-transformers` library. Before you start, install the required libraries: The `CrossEncoder` class supports a `predict` method to get query-document relevance scores, and a `rank` method to rank all documents given your query. # Evaluation We evaluated Jina Reranker v2 on multiple benchmarks to ensure top-tier performance and search relevance. | Model Name | Model Size | MKQA(nDCG@10, 26 langs) | BEIR(nDCG@10, 17 datasets) | MLDR(recall@10, 13 langs) | CodeSearchNet (MRR@10, 3 tasks) | AirBench (nDCG@10, zh/en) | ToolBench (recall@3, 3 tasks) | TableSearch (recall@3) | | :-----------------------------: | :----------: | ------------------------- | ---------------------------- | --------------------------- | --------------------------------- | --------------------------- | ------------------------------- | ------------------------ | | jina-reranker-v2-multilingual | 278M | 54.83 | 53.17 | 68.95 | 71.36 | 61.33 | 77.75 | 93.31 | | bge-reranker-v2-m3 | 568M | 54.17 | 53.65 | 59.73 | 62.86 | 61.28 | 78.46 | 74.86 | | mmarco-mMiniLMv2-L12-H384-v1 | 118M | 53.37 | 45.40 | 28.91 | 51.78 | 56.46 | 58.39 | 53.60 | | jina-reranker-v1-base-en | 137M | - | 52.45 | - | - | - | 74.13 | 72.89 | Note: - NDCG@10 and MRR@10 measure ranking quality, with higher scores indicating better search results - recall@3 measures the proportion of relevant documents retrieved, with higher scores indicating better search results", + "model_explanation_gemini": "Reranks documents based on their relevance to a query across multiple languages using a cross-encoder transformer model with multilingual capability and long-text handling." +} \ No newline at end of file diff --git a/data/model_data_json/jitesh_emotion-english.json b/data/model_data_json/jitesh_emotion-english.json new file mode 100644 index 0000000000000000000000000000000000000000..507097b4b185838ba5c60899d0bff261e7ca4fd6 --- /dev/null +++ b/data/model_data_json/jitesh_emotion-english.json @@ -0,0 +1,22 @@ +{ + "model_id": "jitesh/emotion-english", + "downloads": 196385, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "roberta", + "text-classification", + "emotion", + "20 classes", + "code", + "emotions", + "en", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - en pipeline_tag: text-classification tags: - emotion - 20 classes - code - emotions widget: - text: I'm so angry right now. I can't believe he did that to me. example_title: anger - text: I'm feeling disgusted by the smell of this food. example_title: disgust - text: I'm feeling very afraid of what might happen next. example_title: fear - text: I'm so joyful right now! This is the best day of my life. example_title: joy - text: >- I'm feeling neutral about this situation. I don't really care one way or another. example_title: neutral - text: I'm feeling really sad today after my dog passed away. example_title: sadness - text: I'm so surprised by what just happened! I never saw that coming. example_title: surprise - text: I'm feeling cheeky today. I'm going to play a little prank on my friend.
example_title: cheeky - text: I'm feeling confused about what to do next. I need some guidance. example_title: confuse - text: I'm feeling curious about the world around me. There's so much to learn! example_title: curious - text: I'm feeling empathetic towards my friend who is going through a tough time. example_title: empathetic - text: I'm feeling grumpy today. Everything is annoying me! example_title: grumpy - text: I'm feeling guilty about what I did. I wish I could take it back. example_title: guilty - text: I'm feeling very energetic today. I'm ready to take on the world! example_title: energetic - text: I'm feeling impatient waiting for this movie to start. example_title: impatient - text: >- I'm feeling so much love for my family right now. They mean everything to me. example_title: love - text: I'm thinking about my future and what I want to achieve. example_title: think - text: >- I'm feeling serious about this issue. It's important and needs to be addressed. example_title: serious - text: >- I'm feeling suspicious of what he's telling me. I think he's hiding something. example_title: suspicious - text: I'm feeling whiny today. Everything is bothering me! example_title: whiny - text: I love football so much example_title: love 2 - text: I'm reflecting on my experiences to gain insights example_title: think 2 - text: >- I borrowed money from a friend and haven't paid it back yet. Now I feel ashamed. example_title: guilty 2 - text: I'm starting to think that he's up to something. example_title: suspicious 2 - text: We need to approach this matter with a sense of purpose example_title: serious 2 --- # Emotion classification from 20 classes ## 20 Emotion labels | id | label | | --- | ---------- | | 0 | anger | | 1 | cheeky | | 2 | confuse | | 3 | curious | | 4 | disgust | | 5 | empathetic | | 6 | energetic | | 7 | fear | | 8 | grumpy | | 9 | guilty | | 10 | impatient | | 11 | joy | | 12 | love | | 13 | neutral | | 14 | sadness | | 15 | serious | | 16 | surprise | | 17 | suspicious | | 18 | think | | 19 | whiny | ## How to use Here is how to use this model to get the emotion label of a given text: The above code outputs the predicted emotion label.", + "model_explanation_gemini": "Classifies text into one of 20 emotion categories including anger, joy, fear, and neutral." +} \ No newline at end of file diff --git a/data/model_data_json/joeddav_bart-large-mnli-yahoo-answers.json b/data/model_data_json/joeddav_bart-large-mnli-yahoo-answers.json new file mode 100644 index 0000000000000000000000000000000000000000..324673fdec98aaf6b9911104cf2c6ce07d5c95af --- /dev/null +++ b/data/model_data_json/joeddav_bart-large-mnli-yahoo-answers.json @@ -0,0 +1,20 @@ +{ + "model_id": "joeddav/bart-large-mnli-yahoo-answers", + "downloads": 179031, + "tags": [ + "transformers", + "pytorch", + "jax", + "bart", + "text-classification", + "zero-shot-classification", + "en", + "dataset:yahoo-answers", + "arxiv:1909.00161", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - text-classification - pytorch datasets: - yahoo-answers pipeline_tag: zero-shot-classification --- # bart-large-mnli-yahoo-answers ## Model Description This model takes facebook/bart-large-mnli and fine-tunes it on Yahoo Answers topic classification. It can be used to predict whether a topic label can be assigned to a given sequence, whether or not the label has been seen before.
You can play with an interactive demo of this zero-shot technique with this model, as well as the non-finetuned facebook/bart-large-mnli, here. ## Intended Usage This model was fine-tuned on topic classification and will perform best at zero-shot topic classification. Use the hypothesis template "This text is about {}." as this is the template used during fine-tuning. For settings other than topic classification, you can use any model pre-trained on MNLI such as facebook/bart-large-mnli or roberta-large-mnli with the same code as written below. #### With the zero-shot classification pipeline The model can be used with the pipeline like so: #### With manual PyTorch ## Training The model is a pre-trained MNLI classifier further fine-tuned on Yahoo Answers topic classification in the manner originally described in Yin et al. 2019 and this blog post. That is, each sequence is fed to the pre-trained NLI model in place of the premise and each candidate label as the hypothesis, formatted like so: For each example in the training set, a true and a randomly-selected false label hypothesis are fed to the model which must predict which labels are valid and which are false. Since this method studies the ability to classify unseen labels after being trained on a different set of labels, the model is only trained on 5 out of the 10 labels in Yahoo Answers. These are \"Society & Culture\", \"Health\", \"Computers & Internet\", \"Business & Finance\", and \"Family & Relationships\". ## Evaluation Results This model was evaluated with the label-weighted F1 of the _seen_ and _unseen_ labels. That is, for each example the model must predict from one of the 10 corpus labels. The F1 is reported for the labels seen during training as well as the labels unseen during training. We found an F1 score of and for the unseen and seen labels, respectively. In order to adjust for the in-vs-out of distribution labels, we subtract a fixed amount of 30% from the normalized probabilities of the _seen_ labels, as described in Yin et al. 2019 and our blog post.", + "model_explanation_gemini": "Fine-tuned for zero-shot topic classification on Yahoo Answers, this model predicts relevant topic labels for text sequences, including unseen labels." +} \ No newline at end of file diff --git a/data/model_data_json/joeddav_distilbert-base-uncased-go-emotions-student.json b/data/model_data_json/joeddav_distilbert-base-uncased-go-emotions-student.json new file mode 100644 index 0000000000000000000000000000000000000000..caa5ca4827cad93f0ffe1e15b8eb5602f9df4552 --- /dev/null +++ b/data/model_data_json/joeddav_distilbert-base-uncased-go-emotions-student.json @@ -0,0 +1,20 @@ +{ + "model_id": "joeddav/distilbert-base-uncased-go-emotions-student", + "downloads": 138362, + "tags": [ + "transformers", + "pytorch", + "tf", + "distilbert", + "text-classification", + "tensorflow", + "en", + "dataset:go_emotions", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - text-classification - pytorch - tensorflow datasets: - go_emotions license: mit widget: - text: \"I feel lucky to be here.\" --- # distilbert-base-uncased-go-emotions-student ## Model Description This model is distilled from the zero-shot classification pipeline on the unlabeled GoEmotions dataset using this script. It was trained with mixed precision for 10 epochs and otherwise used the default script arguments.
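As a quick sanity check, the distilled student loads with the standard text-classification pipeline (a minimal sketch; `top_k=None` simply requests scores for every emotion label):

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="joeddav/distilbert-base-uncased-go-emotions-student",
    top_k=None,  # return a score for each GoEmotions label
)
print(classifier("I feel lucky to be here."))
```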
## Intended Usage The model can be used like any other model trained on GoEmotions, but will likely not perform as well as a model trained with full supervision. It is primarily intended as a demo of how an expensive NLI-based zero-shot model can be distilled to a more efficient student, allowing a classifier to be trained with only unlabeled data. Note that although the GoEmotions dataset allows multiple labels per instance, the teacher used single-label classification to create pseudo-labels.", + "model_explanation_gemini": "Classifies text into emotions by distilling a zero-shot model trained on unlabeled GoEmotions data, primarily demonstrating efficient training without full supervision." +} \ No newline at end of file diff --git a/data/model_data_json/joeddav_xlm-roberta-large-xnli.json b/data/model_data_json/joeddav_xlm-roberta-large-xnli.json new file mode 100644 index 0000000000000000000000000000000000000000..fa4a856cf6a69ac8fbc3a985588a9ecc092ea125 --- /dev/null +++ b/data/model_data_json/joeddav_xlm-roberta-large-xnli.json @@ -0,0 +1,39 @@ +{ + "model_id": "joeddav/xlm-roberta-large-xnli", + "downloads": 133451, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "xlm-roberta", + "text-classification", + "tensorflow", + "zero-shot-classification", + "multilingual", + "en", + "fr", + "es", + "de", + "el", + "bg", + "ru", + "tr", + "ar", + "vi", + "th", + "zh", + "hi", + "sw", + "ur", + "dataset:multi_nli", + "dataset:xnli", + "arxiv:1911.02116", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - en - fr - es - de - el - bg - ru - tr - ar - vi - th - zh - hi - sw - ur tags: - text-classification - pytorch - tensorflow datasets: - multi_nli - xnli license: mit pipeline_tag: zero-shot-classification widget: - text: \"За кого вы голосуете в 2020 году?\" candidate_labels: \"politique étrangère, Europe, élections, affaires, politique\" multi_class: true - text: \"لمن تصوت في 2020؟\" candidate_labels: \"السياسة الخارجية, أوروبا, الانتخابات, الأعمال, السياسة\" multi_class: true - text: \"2020'de kime oy vereceksiniz?\" candidate_labels: \"dış politika, Avrupa, seçimler, ticaret, siyaset\" multi_class: true --- # xlm-roberta-large-xnli ## Model Description This model takes xlm-roberta-large and fine-tunes it on a combination of NLI data in 15 languages. It is intended to be used for zero-shot text classification, such as with the Hugging Face ZeroShotClassificationPipeline. ## Intended Usage This model is intended to be used for zero-shot text classification, especially in languages other than English. It is fine-tuned on XNLI, which is a multilingual NLI dataset. The model can therefore be used with any of the languages in the XNLI corpus: - English - French - Spanish - German - Greek - Bulgarian - Russian - Turkish - Arabic - Vietnamese - Thai - Chinese - Hindi - Swahili - Urdu Since the base model was pre-trained on 100 different languages, the model has shown some effectiveness in languages beyond those listed above as well. See the full list of pre-trained languages in appendix A of the XLM-RoBERTa paper. For English-only classification, it is recommended to use bart-large-mnli or a distilled bart MNLI model. #### With the zero-shot classification pipeline The model can be loaded with the pipeline like so: You can then classify in any of the above languages.
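For example, a minimal sketch with the zero-shot pipeline (the Russian sequence and Russian candidate labels here are illustrative choices):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli")

sequence = "За кого вы голосуете в 2020 году?"  # "Who are you voting for in 2020?"
candidate_labels = ["политика", "экономика", "развлечения"]  # politics, economy, entertainment
print(classifier(sequence, candidate_labels))
```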
You can even pass the labels in one language and the sequence to classify in another: The default hypothesis template is the English "This example is {}." If you are working strictly within one language, it may be worthwhile to translate this to the language you are working with: #### With manual PyTorch ## Training This model was pre-trained on a set of 100 languages, as described in the original paper. It was then fine-tuned on the task of NLI on the concatenated MNLI train set and the XNLI validation and test sets. Finally, it was trained for one additional epoch on only XNLI data where the translations for the premise and hypothesis are shuffled such that the premise and hypothesis for each example come from the same original English example but the premise and hypothesis are of different languages.", + "model_explanation_gemini": "Performs multilingual zero-shot text classification across 15 languages by leveraging fine-tuned XLM-RoBERTa on NLI data." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-arabic.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-arabic.json new file mode 100644 index 0000000000000000000000000000000000000000..98f6900c5ec7247a08d25f440205b18ac057b53e --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-arabic.json @@ -0,0 +1,24 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic", + "downloads": 1359424, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "ar", + "dataset:common_voice", + "dataset:arabic_speech_corpus", + "doi:10.57967/hf/3573", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ar datasets: - common_voice - arabic_speech_corpus metrics: - wer - cer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Arabic by Jonatas Grosman results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice ar type: common_voice args: ar metrics: - name: Test WER type: wer value: 39.59 - name: Test CER type: cer value: 18.18 --- # Fine-tuned XLSR-53 large model for speech recognition in Arabic Fine-tuned facebook/wav2vec2-large-xlsr-53 on Arabic using the train and validation splits of Common Voice 6.1 and Arabic Speech Corpus. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | ألديك قلم ؟ | ألديك قلم | | ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست نالك مسافة على هذه الأرض أبعد من يوم الأمس م | | إنك تكبر المشكلة. | إنك تكبر المشكلة | | يرغب أن يلتقي بك. | يرغب أن يلتقي بك | | إنهم لا يعرفون لماذا حتى. | إنهم لا يعرفون لماذا حتى | | سيسعدني مساعدتك أي وقت تحب. | سيسئدنيمساعدتك أي وقد تحب | | أَحَبُّ نظريّة علمية إليّ هي أن حلقات زحل مكونة بالكامل من الأمتعة المفقودة. | أحب نظرية علمية إلي هي أن حل قتزح المكوينا بالكامل من الأمت عن المفقودة | | سأشتري له قلماً.
| سأشتري له قلما | | أين المشكلة ؟ | أين المشكل | | وَلِلَّهِ يَسْجُدُ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضِ مِنْ دَابَّةٍ وَالْمَلَائِكَةُ وَهُمْ لَا يَسْتَكْبِرُونَ | ولله يسجد ما في السماوات وما في الأرض من دابة والملائكة وهم لا يستكبرون | ## Evaluation The model can be evaluated as follows on the Arabic test data of Common Voice. **Test Result**: In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-14). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used. | Model | WER | CER | | ------------- | ------------- | ------------- | | jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **39.59%** | **18.18%** | | bakrianoo/sinai-voice-ar-stt | 45.30% | 21.84% | | othrif/wav2vec2-large-xlsr-arabic | 45.93% | 20.51% | | kmfoda/wav2vec2-large-xlsr-arabic | 54.14% | 26.07% | | mohammed/wav2vec2-large-xlsr-arabic | 56.11% | 26.79% | | anas/wav2vec2-large-xlsr-arabic | 62.02% | 27.09% | | elgeish/wav2vec2-large-xlsr-53-arabic | 100.00% | 100.56% | ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Arabic speech recognition, converting spoken Arabic audio at 16kHz into text with a 39.59% word error rate." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-chinese-zh-cn.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-chinese-zh-cn.json new file mode 100644 index 0000000000000000000000000000000000000000..2323a596bf21984a7247060c930540471c57031c --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-chinese-zh-cn.json @@ -0,0 +1,23 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn", + "downloads": 4425174, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "zh", + "dataset:common_voice", + "doi:10.57967/hf/3570", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: zh datasets: - common_voice metrics: - wer - cer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Chinese (zh-CN) by Jonatas Grosman results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice zh-CN type: common_voice args: zh-CN metrics: - name: Test WER type: wer value: 82.37 - name: Test CER type: cer value: 19.03 --- # Fine-tuned XLSR-53 large model for speech recognition in Chinese Fine-tuned facebook/wav2vec2-large-xlsr-53 on Chinese using the train and validation splits of Common Voice 6.1, CSS10 and ST-CMDS. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... 
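A sketch of a plain inference script for this checkpoint (the HuggingSound path mentioned next wraps the same steps; the audio file name is a hypothetical placeholder for any 16 kHz mono recording):

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn"
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# The model expects 16 kHz mono audio
speech, _ = librosa.load("sample.wav", sr=16_000)  # hypothetical input file
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```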
Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | 宋朝末年年间定居粉岭围。 | 宋朝末年年间定居分定为 | | 渐渐行动不便 | 建境行动不片 | | 二十一年去世。 | 二十一年去世 | | 他们自称恰哈拉。 | 他们自称家哈 | | 局部干涩的例子包括有口干、眼睛干燥、及阴道干燥。 | 菊物干寺的例子包括有口肝眼睛干照以及阴到干 | | 嘉靖三十八年,登进士第三甲第二名。 | 嘉靖三十八年登进士第三甲第二名 | | 这一名称一直沿用至今。 | 这一名称一直沿用是心 | | 同时乔凡尼还得到包税合同和许多明矾矿的经营权。 | 同时桥凡妮还得到包税合同和许多民繁矿的经营权 | | 为了惩罚西扎城和塞尔柱的结盟,盟军在抵达后将外城烧毁。 | 为了曾罚西扎城和塞尔素的节盟盟军在抵达后将外曾烧毁 | | 河内盛产黄色无鱼鳞的鳍射鱼。 | 合类生场环色无鱼林的骑射鱼 | ## Evaluation The model can be evaluated as follows on the Chinese (zh-CN) test data of Common Voice. **Test Result**: In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-13). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used. | Model | WER | CER | | ------------- | ------------- | ------------- | | jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn | **82.37%** | **19.03%** | | ydshieh/wav2vec2-large-xlsr-53-chinese-zh-cn-gpt | 84.01% | 20.95% | ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Chinese speech recognition, converting 16kHz audio to text with word and character error rates of 82.37% and 19.03% respectively." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-dutch.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-dutch.json new file mode 100644 index 0000000000000000000000000000000000000000..f94a482fb1e227276ce249bc32a0b01032b584b5 --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-dutch.json @@ -0,0 +1,27 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch", + "downloads": 2492032, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_6_0", + "nl", + "robust-speech-event", + "speech", + "xlsr-fine-tuning-week", + "dataset:common_voice", + "dataset:mozilla-foundation/common_voice_6_0", + "doi:10.57967/hf/0203", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: nl license: apache-2.0 datasets: - common_voice - mozilla-foundation/common_voice_6_0 metrics: - wer - cer tags: - audio - automatic-speech-recognition - hf-asr-leaderboard - mozilla-foundation/common_voice_6_0 - nl - robust-speech-event - speech - xlsr-fine-tuning-week model-index: - name: XLSR Wav2Vec2 Dutch by Jonatas Grosman results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice nl type: common_voice args: nl metrics: - name: Test WER type: wer value: 15.72 - name: Test CER type: cer value: 5.35 - name: Test WER (+LM) type: wer value: 12.84 - name: Test CER (+LM) type: cer value: 4.64 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: nl metrics: - name: Dev WER type: wer value: 35.79 - name: Dev CER type: cer value: 17.67 - name: Dev WER (+LM) type: wer value: 31.54 - name: Dev CER (+LM) type: cer value: 16.37 --- # Fine-tuned XLSR-53 large model for speech recognition in Dutch Fine-tuned facebook/wav2vec2-large-xlsr-53 on Dutch using the train and 
validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | DE ABORIGINALS ZIJN DE OORSPRONKELIJKE BEWONERS VAN AUSTRALIË. | DE ABBORIGENALS ZIJN DE OORSPRONKELIJKE BEWONERS VAN AUSTRALIË | | MIJN TOETSENBORD ZIT VOL STOF. | MIJN TOETSENBORD ZIT VOL STOF | | ZE HAD DE BANK BESCHADIGD MET HAAR SKATEBOARD. | ZE HAD DE BANK BESCHADIGD MET HAAR SCHEETBOORD | | WAAR LAAT JIJ JE ONDERHOUD DOEN? | WAAR LAAT JIJ HET ONDERHOUD DOEN | | NA HET LEZEN VAN VELE BEOORDELINGEN HAD ZE EINDELIJK HAAR OOG LATEN VALLEN OP EEN LAPTOP MET EEN QWERTY TOETSENBORD. | NA HET LEZEN VAN VELE BEOORDELINGEN HAD ZE EINDELIJK HAAR OOG LATEN VALLEN OP EEN LAPTOP MET EEN QUERTITOETSEMBORD | | DE TAMPONS ZIJN OP. | DE TAPONT ZIJN OP | | MARIJKE KENT OLIVIER NU AL MEER DAN TWEE JAAR. | MAARRIJKEN KENT OLIEVIER NU AL MEER DAN TWEE JAAR | | HET VOEREN VAN BROOD AAN EENDEN IS EIGENLIJK ONGEZOND VOOR DE BEESTEN. | HET VOEREN VAN BEUROT AAN EINDEN IS EIGENLIJK ONGEZOND VOOR DE BEESTEN | | PARKET MOET JE STOFZUIGEN, TEGELS MOET JE DWEILEN. | PARKET MOET JE STOF ZUIGEN MAAR TEGELS MOET JE DWEILEN | | IN ONZE BUURT KENT IEDEREEN ELKAAR. | IN ONZE BUURT KENT IEDEREEN ELKAAR | ## Evaluation 1. To evaluate on mozilla-foundation/common_voice_6_0 with split test 2. To evaluate on speech-recognition-community-v2/dev_data ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Dutch speech recognition, converting 16kHz audio inputs to text with improved accuracy using the XLSR-53 large model."
+} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-english.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-english.json new file mode 100644 index 0000000000000000000000000000000000000000..3a3828d81998fcb2b0047bbbfb8b5720b3f3896c --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-english.json @@ -0,0 +1,28 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-english", + "downloads": 269501, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "en", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_6_0", + "robust-speech-event", + "speech", + "xlsr-fine-tuning-week", + "dataset:common_voice", + "dataset:mozilla-foundation/common_voice_6_0", + "doi:10.57967/hf/3569", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en datasets: - common_voice - mozilla-foundation/common_voice_6_0 metrics: - wer - cer tags: - audio - automatic-speech-recognition - en - hf-asr-leaderboard - mozilla-foundation/common_voice_6_0 - robust-speech-event - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 English by Jonatas Grosman results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice en type: common_voice args: en metrics: - name: Test WER type: wer value: 19.06 - name: Test CER type: cer value: 7.69 - name: Test WER (+LM) type: wer value: 14.81 - name: Test CER (+LM) type: cer value: 6.84 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: en metrics: - name: Dev WER type: wer value: 27.72 - name: Dev CER type: cer value: 11.65 - name: Dev WER (+LM) type: wer value: 20.85 - name: Dev CER (+LM) type: cer value: 11.01 --- # Fine-tuned XLSR-53 large model for speech recognition in English Fine-tuned facebook/wav2vec2-large-xlsr-53 on English using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | \"SHE'LL BE ALL RIGHT.\" | SHE'LL BE ALL RIGHT | | SIX | SIX | | \"ALL'S WELL THAT ENDS WELL.\" | ALL AS WELL THAT ENDS WELL | | DO YOU MEAN IT? | DO YOU MEAN IT | | THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE, BUT STILL CAUSES REGRESSIONS. | THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE BUT STILL CAUSES REGRESSION | | HOW IS MOZILLA GOING TO HANDLE AMBIGUITIES LIKE QUEUE AND CUE? | HOW IS MOSLILLAR GOING TO HANDLE ANDBEWOOTH HIS LIKE Q AND Q | | \"I GUESS YOU MUST THINK I'M KINDA BATTY.\" | RUSTIAN WASTIN PAN ONTE BATTLY | | NO ONE NEAR THE REMOTE MACHINE YOU COULD RING? | NO ONE NEAR THE REMOTE MACHINE YOU COULD RING | | SAUCE FOR THE GOOSE IS SAUCE FOR THE GANDER. | SAUCE FOR THE GUICE IS SAUCE FOR THE GONDER | | GROVES STARTED WRITING SONGS WHEN SHE WAS FOUR YEARS OLD. | GRAFS STARTED WRITING SONGS WHEN SHE WAS FOUR YEARS OLD | ## Evaluation 1. To evaluate on mozilla-foundation/common_voice_6_0 with split test 2.
To evaluate on ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for English speech recognition, converting 16kHz audio input to text with improved accuracy using the XLSR-53 large model." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-finnish.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-finnish.json new file mode 100644 index 0000000000000000000000000000000000000000..c50c18650f241c3fa9ba4910ae9b4deec6a5d1d5 --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-finnish.json @@ -0,0 +1,23 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-finnish", + "downloads": 91673, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "fi", + "dataset:common_voice", + "doi:10.57967/hf/3578", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: fi datasets: - common_voice metrics: - wer - cer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Finnish by Jonatas Grosman results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice fi type: common_voice args: fi metrics: - name: Test WER type: wer value: 41.60 - name: Test CER type: cer value: 8.23 --- # Fine-tuned XLSR-53 large model for speech recognition in Finnish Fine-tuned facebook/wav2vec2-large-xlsr-53 on Finnish using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | MYSTEERIMIES OLI OPPINUT MORAALINSA TARUISTA, ELOKUVISTA JA PELEISTÄ. | MYSTEERIMIES OLI OPPINUT MORALINSA TARUISTA ELOKUVISTA JA PELEISTÄ | | ÄÄNESTIN MIETINNÖN PUOLESTA! | ÄÄNESTIN MIETINNÖN PUOLESTA | | VAIN TUNTIA AIKAISEMMIN OLIMME MIEHENI KANSSA TUNTENEET SUURINTA ILOA. | PAIN TUNTIA AIKAISEMMIN OLIN MIEHENI KANSSA TUNTENEET SUURINTA ILAA | | ENSIMMÄISELLE MIEHELLE SAI KOLME LASTA. | ENSIMMÄISELLE MIEHELLE SAI KOLME LASTA | | ÄÄNESTIN MIETINNÖN PUOLESTA, SILLÄ POHJIMMILTAAN SIINÄ VASTUSTETAAN TÄTÄ SUUNTAUSTA. | ÄÄNESTIN MIETINNÖN PUOLESTA SILLÄ POHJIMMILTAAN SIINÄ VASTOTTETAAN TÄTÄ SUUNTAUSTA | | TÄHDENLENTOJENKO VARALTA MINÄ SEN OLISIN TÄNNE KUSKANNUT? | TÄHDEN LENTOJENKO VARALTA MINÄ SEN OLISIN TÄNNE KUSKANNUT | | SIITÄ SE TULEE. | SIITA SE TULEE | | NIIN, KUULUU KIROUS, JA KAUHEA KARJAISU. | NIIN KUULUU KIROUS JA KAUHEA KARJAISU | | ARKIT KUN OVAT NÄES ELEMENTTIRAKENTEISIA. | ARKIT KUN OVAT MÄISS' ELÄMÄTTEROKENTEISIÄ | | JÄIN ALUKSEN SISÄÄN, MUTTA KUULIN OVEN LÄPI, ETTÄ ULKOPUOLELLA ALKOI TAPAHTUA. | JAKALOKSEHÄN SISÄL MUTTA KUULIN OVENLAPI ETTÄ ULKA KUOLLALLA ALKOI TAPAHTUA | ## Evaluation The model can be evaluated as follows on the Finnish test data of Common Voice. **Test Result**: In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-04-21). 
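The evaluation script itself is only linked, not reproduced, in the card text; as a rough sketch of the WER/CER computation it performs, using the jiwer package (an illustrative choice, not necessarily what the author used):

```python
from jiwer import cer, wer

# Toy reference/prediction pair taken from the examples table above
references = ["MYSTEERIMIES OLI OPPINUT MORAALINSA TARUISTA"]
predictions = ["MYSTEERIMIES OLI OPPINUT MORALINSA TARUISTA"]

print(f"WER: {wer(references, predictions):.2%}")
print(f"CER: {cer(references, predictions):.2%}")
```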
Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used. | Model | WER | CER | | ------------- | ------------- | ------------- | | aapot/wav2vec2-large-xlsr-53-finnish | **32.51%** | **5.34%** | | Tommi/wav2vec2-large-xlsr-53-finnish | 35.22% | 5.81% | | vasilis/wav2vec2-large-xlsr-53-finnish | 38.24% | 6.49% | | jonatasgrosman/wav2vec2-large-xlsr-53-finnish | 41.60% | 8.23% | | birgermoell/wav2vec2-large-xlsr-finnish | 53.51% | 9.18% | ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Finnish speech recognition, converting 16kHz audio input to text with a word error rate of 41.60%." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-greek.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-greek.json new file mode 100644 index 0000000000000000000000000000000000000000..e1e8a618abbb1211dc002ce078ef98ae09ae0dfe --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-greek.json @@ -0,0 +1,23 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-greek", + "downloads": 157003, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "el", + "dataset:common_voice", + "doi:10.57967/hf/3579", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: el datasets: - common_voice metrics: - wer - cer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Greek by Jonatas Grosman results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice el type: common_voice args: el metrics: - name: Test WER type: wer value: 11.62 - name: Test CER type: cer value: 3.36 --- # Fine-tuned XLSR-53 large model for speech recognition in Greek Fine-tuned facebook/wav2vec2-large-xlsr-53 on Greek using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | ΤΟ ΒΑΣΙΛΌΠΟΥΛΟ, ΠΟΥ ΜΟΙΆΖΕΙ ΛΕΟΝΤΑΡΆΚΙ ΚΑΙ ΑΕΤΟΥΔΆΚΙ | ΤΟ ΒΑΣΙΛΌΠΟΥΛΟ ΠΟΥ ΜΙΑΣΕ ΛΙΟΝΤΑΡΑΚΉ ΚΑΙ ΑΪΤΟΥΔΆΚΙ | | ΣΥΝΆΜΑ ΞΕΠΡΌΒΑΛΑΝ ΑΠΌ ΜΈΣΑ ΑΠΌ ΤΑ ΔΈΝΤΡΑ, ΔΕΞΙΆ, ΑΡΜΑΤΩΜΈΝΟΙ ΚΑΒΑΛΑΡΈΟΙ. | ΣΥΝΆΜΑ ΚΑΙ ΤΡΌΒΑΛΑΝ ΑΠΌ ΜΈΣΑ ΑΠΌ ΤΑ ΔΈΝΤΡΑ ΔΕΞΙΆ ΑΡΜΑΤΩΜΈΝΟΙ ΚΑΒΑΛΑΡΈΟΙ | | ΤΑ ΣΥΣΚΕΥΑΣΜΈΝΑ ΒΙΟΛΟΓΙΚΆ ΛΑΧΑΝΙΚΆ ΔΕΝ ΠΕΡΙΈΧΟΥΝ ΣΥΝΤΗΡΗΤΙΚΆ ΚΑΙ ΟΡΜΌΝΕΣ | ΤΑ ΣΥΣΚΕΦΑΣΜΈΝΑ ΒΙΟΛΟΓΙΚΆ ΛΑΧΑΝΙΚΆ ΔΕΝ ΠΕΡΙΈΧΟΥΝ ΣΙΔΗΡΗΤΙΚΆ ΚΑΙ ΟΡΜΌΝΕΣ | | ΑΚΟΛΟΥΘΉΣΕΤΕ ΜΕ! | ΑΚΟΛΟΥΘΉΣΤΕ ΜΕ | | ΚΑΙ ΠΟΎ ΜΠΟΡΏ ΝΑ ΤΟΝ ΒΡΩ; | Ε ΠΟΎ ΜΠΟΡΏ ΝΑ ΤΙ ΕΒΡΩ | | ΝΑΙ! ΑΠΟΚΡΊΘΗΚΕ ΤΟ ΠΑΙΔΊ | ΝΑΙ ΑΠΟΚΡΊΘΗΚΕ ΤΟ ΠΑΙΔΊ | | ΤΟ ΠΑΛΆΤΙ ΜΟΥ ΤΟ ΠΡΟΜΉΘΕΥΕ. | ΤΟ ΠΑΛΆΤΙ ΜΟΥ ΤΟ ΠΡΟΜΉΘΕΥΕ | | ΉΛΘΕ ΜΉΝΥΜΑ ΑΠΌ ΤΟ ΘΕΊΟ ΒΑΣΙΛΙΆ; | ΉΛΘΑ ΜΕΊΝΕΙ ΜΕ ΑΠΌ ΤΟ ΘΕΊΟ ΒΑΣΊΛΙΑ | | ΠΑΡΑΚΆΤΩ, ΈΝΑ ΡΥΆΚΙ ΜΟΥΡΜΟΎΡΙΖΕ ΓΛΥΚΆ, ΚΥΛΏΝΤΑΣ ΤΑ ΚΡΥΣΤΑΛΛΈΝΙΑ ΝΕΡΆ ΤΟΥ ΑΝΆΜΕΣΑ ΣΤΑ ΠΥΚΝΆ ΧΑΜΌΔΕΝΤΡΑ. 
| ΠΑΡΑΚΆΤΩ ΈΝΑ ΡΥΆΚΙ ΜΟΥΡΜΟΎΡΙΖΕ ΓΛΥΚΆ ΚΥΛΏΝΤΑΣ ΤΑ ΚΡΥΣΤΑΛΛΈΝΙΑ ΝΕΡΆ ΤΟΥ ΑΝΆΜΕΣΑ ΣΤΑ ΠΥΚΡΆ ΧΑΜΌΔΕΝΤΡΑ | | ΠΡΆΓΜΑΤΙ, ΕΊΝΑΙ ΑΣΤΕΊΟ ΝΑ ΠΆΡΕΙ Ο ΔΙΆΒΟΛΟΣ | ΠΡΆΓΜΑΤΗ ΕΊΝΑΙ ΑΣΤΕΊΟ ΝΑ ΠΆΡΕΙ Ο ΔΙΆΒΟΛΟΣ | ## Evaluation The model can be evaluated as follows on the Greek test data of Common Voice. **Test Result**: In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-04-22). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used. | Model | WER | CER | | ------------- | ------------- | ------------- | | lighteternal/wav2vec2-large-xlsr-53-greek | **10.13%** | **2.66%** | | jonatasgrosman/wav2vec2-large-xlsr-53-greek | 11.62% | 3.36% | | vasilis/wav2vec2-large-xlsr-53-greek | 19.09% | 5.88% | | PereLluis13/wav2vec2-large-xlsr-53-greek | 20.16% | 5.71% | ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Greek speech recognition, converting 16kHz audio input to text with a word error rate of 11.62% and character error rate of 3.36% on Common Voice data." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-hungarian.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-hungarian.json new file mode 100644 index 0000000000000000000000000000000000000000..26a48c9b55db332b592f802bd7232898c0ab2f0d --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-hungarian.json @@ -0,0 +1,23 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian", + "downloads": 172395, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "hu", + "dataset:common_voice", + "doi:10.57967/hf/3577", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: hu datasets: - common_voice metrics: - wer - cer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Hungarian by Jonatas Grosman results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice hu type: common_voice args: hu metrics: - name: Test WER type: wer value: 31.40 - name: Test CER type: cer value: 6.20 --- # Fine-tuned XLSR-53 large model for speech recognition in Hungarian Fine-tuned facebook/wav2vec2-large-xlsr-53 on Hungarian using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | BÜSZKÉK VAGYUNK A MAGYAR EMBEREK NAGYSZERŰ SZELLEMI ALKOTÁSAIRA. | BÜSZKÉK VAGYUNK A MAGYAR EMBEREK NAGYSZERŰ SZELLEMI ALKOTÁSAIRE | | A NEMZETSÉG TAGJAI KÖZÜL EZT TERMESZTIK A LEGSZÉLESEBB KÖRBEN ÍZLETES TERMÉSÉÉRT. | A NEMZETSÉG TAGJAI KÖZÜL ESZSZERMESZTIK A LEGSZELESEBB KÖRBEN IZLETES TERMÉSSÉÉRT | | A VÁROSBA VÁGYÓDOTT A LEGJOBBAN, ÉPPEN MERT ODA NEM JUTHATOTT EL SOHA. 
| A VÁROSBA VÁGYÓDOTT A LEGJOBBAN ÉPPEN MERT ODA NEM JUTHATOTT EL SOHA | | SÍRJA MÁRA MEGSEMMISÜLT. | SIMGI A MANDO MEG SEMMICSEN | | MINDEN ZENESZÁMOT DRÁGAKŐNEK NEVEZETT. | MINDEN ZENA SZÁMODRAGAKŐNEK NEVEZETT | | ÍGY MÚLT EL A DÉLELŐTT. | ÍGY MÚLT EL A DÍN ELŐTT | | REMEK POFA! | A REMEG PUFO | | SZEMET SZEMÉRT, FOGAT FOGÉRT. | SZEMET SZEMÉRT FOGADD FOGÉRT | | BIZTOSAN LAKIK ITT NÉHÁNY ATYÁMFIA. | BIZTOSAN LAKIKÉT NÉHANY ATYAMFIA | | A SOROK KÖZÖTT OLVAS. | A SOROG KÖZÖTT OLVAS | ## Evaluation The model can be evaluated as follows on the Hungarian test data of Common Voice. **Test Result**: In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-04-22). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used. | Model | WER | CER | | ------------- | ------------- | ------------- | | jonatasgrosman/wav2vec2-large-xlsr-53-hungarian | **31.40%** | **6.20%** | | anton-l/wav2vec2-large-xlsr-53-hungarian | 42.39% | 9.39% | | gchhablani/wav2vec2-large-xlsr-hu | 46.42% | 10.04% | | birgermoell/wav2vec2-large-xlsr-hungarian | 46.93% | 10.31% | ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Hungarian speech recognition, converting 16kHz audio input to text with a 31.40% word error rate." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-japanese.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-japanese.json new file mode 100644 index 0000000000000000000000000000000000000000..68eb6eb585b4f89e7b27ca9302bad83458bd5fdb --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-japanese.json @@ -0,0 +1,23 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese", + "downloads": 1396376, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "ja", + "dataset:common_voice", + "doi:10.57967/hf/3568", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja datasets: - common_voice metrics: - wer - cer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Japanese by Jonatas Grosman results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice ja type: common_voice args: ja metrics: - name: Test WER type: wer value: 81.80 - name: Test CER type: cer value: 20.16 --- # Fine-tuned XLSR-53 large model for speech recognition in Japanese Fine-tuned facebook/wav2vec2-large-xlsr-53 on Japanese using the train and validation splits of Common Voice 6.1, CSS10 and JSUT. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... 
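A minimal sketch of the direct (no language model) usage described here, using the standard transformers pipeline API with the model id given above:

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="jonatasgrosman/wav2vec2-large-xlsr-53-japanese",
)
result = asr("/path/to/audio.wav")  # the card requires 16 kHz input
print(result["text"])
```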
Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | 祖母は、おおむね機嫌よく、サイコロをころがしている。 | 人母は重にきね起くさいがしている | | 財布をなくしたので、交番へ行きます。 | 財布をなく手端ので勾番へ行きます | | 飲み屋のおやじ、旅館の主人、医者をはじめ、交際のある人にきいてまわったら、みんな、私より収入が多いはずなのに、税金は安い。 | ノ宮屋のお親じ旅館の主に医者をはじめ交際のアル人トに聞いて回ったらみんな私より収入が多いはなうに税金は安い | | 新しい靴をはいて出かけます。 | だらしい靴をはいて出かけます | | このためプラズマ中のイオンや電子の持つ平均運動エネルギーを温度で表現することがある | このためプラズマ中のイオンや電子の持つ平均運動エネルギーを温度で表弁することがある | | 松井さんはサッカーより野球のほうが上手です。 | 松井さんはサッカーより野球のほうが上手です | | 新しいお皿を使います。 | 新しいお皿を使います | | 結婚以来三年半ぶりの東京も、旧友とのお酒も、夜行列車も、駅で寝て、朝を待つのも久しぶりだ。 | 結婚ル二来三年半降りの東京も吸とのお酒も野越者も駅で寝て朝を待つの久しぶりた | | これまで、少年野球、ママさんバレーなど、地域スポーツを支え、市民に密着してきたのは、無数のボランティアだった。 | これまで少年野球三バレーなど地域スポーツを支え市民に満着してきたのは娘数のボランティアだった | | 靴を脱いで、スリッパをはきます。 | 靴を脱いでスイパーをはきます | ## Evaluation The model can be evaluated as follows on the Japanese test data of Common Voice. **Test Result**: In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-10). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used. | Model | WER | CER | | ------------- | ------------- | ------------- | | jonatasgrosman/wav2vec2-large-xlsr-53-japanese | **81.80%** | **20.16%** | | vumichien/wav2vec2-large-xlsr-japanese | 1108.86% | 23.40% | | qqhann/w2v_hf_jsut_xlsr53 | 1012.18% | 70.77% | ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Japanese speech recognition, converting 16kHz audio input to text using the XLSR-53 large model trained on Common Voice, CSS10, and JSUT datasets." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-persian.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-persian.json new file mode 100644 index 0000000000000000000000000000000000000000..0ab765b956f0d67679c08c1f373372101e49668d --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-persian.json @@ -0,0 +1,23 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-persian", + "downloads": 305945, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "fa", + "dataset:common_voice", + "doi:10.57967/hf/3576", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: fa datasets: - common_voice metrics: - wer - cer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: XLSR Wav2Vec2 Persian by Jonatas Grosman results: - task: name: Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice fa type: common_voice args: fa metrics: - name: Test WER type: wer value: 30.12 - name: Test CER type: cer value: 7.37 --- # Fine-tuned XLSR-53 large model for speech recognition in Persian Fine-tuned facebook/wav2vec2-large-xlsr-53 on Persian using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... 
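For the "writing your own inference script" route named next, a minimal sketch (librosa is assumed here for loading; any 16 kHz loader works):

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-persian"
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

speech, _ = librosa.load("/path/to/audio.wav", sr=16_000)  # card requires 16 kHz
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding, no language model
print(processor.batch_decode(predicted_ids)[0])
```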
Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | از مهمونداری کنار بکشم | از مهمانداری کنار بکشم | | برو از مهرداد بپرس. | برو از ماقدعاد به پرس | | خب ، تو چیكار می كنی؟ | خوب تو چیکار می کنی | | مسقط پایتخت عمان در عربی به معنای محل سقوط است | مسقط پایتخت عمان در عربی به بعنای محل سقوط است | | آه، نه اصلاُ! | اهنه اصلا | | توانست | توانست | | قصیده فن شعر میگوید ای دوستان | قصیده فن شعر میگوید ایدوستون | | دو استایل متفاوت دارین | دوبوست داریل و متفاوت بری | | دو روز قبل از کریسمس ؟ | اون مفتود پش پشش | | ساعت های کاری چیست؟ | این توری که موشیکل خب | ## Evaluation The model can be evaluated as follows on the Persian test data of Common Voice. **Test Result**: In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-04-22). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used. | Model | WER | CER | | ------------- | ------------- | ------------- | | jonatasgrosman/wav2vec2-large-xlsr-53-persian | **30.12%** | **7.37%** | | m3hrdadfi/wav2vec2-large-xlsr-persian-v2 | 33.85% | 8.79% | | m3hrdadfi/wav2vec2-large-xlsr-persian | 34.37% | 8.98% | ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Persian speech recognition, achieving 30.12% WER and 7.37% CER on Common Voice data." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-polish.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-polish.json new file mode 100644 index 0000000000000000000000000000000000000000..2134b599c3746428c0ac499ca6690cb114c09875 --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-polish.json @@ -0,0 +1,27 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-polish", + "downloads": 489434, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_6_0", + "pl", + "robust-speech-event", + "speech", + "xlsr-fine-tuning-week", + "dataset:common_voice", + "dataset:mozilla-foundation/common_voice_6_0", + "doi:10.57967/hf/3574", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: pl license: apache-2.0 datasets: - common_voice - mozilla-foundation/common_voice_6_0 metrics: - wer - cer tags: - audio - automatic-speech-recognition - hf-asr-leaderboard - mozilla-foundation/common_voice_6_0 - pl - robust-speech-event - speech - xlsr-fine-tuning-week model-index: - name: XLSR Wav2Vec2 Polish by Jonatas Grosman results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice pl type: common_voice args: pl metrics: - name: Test WER type: wer value: 14.21 - name: Test CER type: cer value: 3.49 - name: Test WER (+LM) type: wer value: 10.98 - name: Test CER (+LM) type: cer value: 2.93 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: pl metrics: - name: Dev WER type: wer value: 33.18 - name: Dev CER type: cer value: 15.92 - name: Dev WER (+LM) type: wer value: 29.31 - name: Dev CER (+LM) type: cer value: 15.17 --- # 
Fine-tuned XLSR-53 large model for speech recognition in Polish Fine-tuned facebook/wav2vec2-large-xlsr-53 on Polish using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | \"\"\"CZY DRZWI BYŁY ZAMKNIĘTE?\"\"\" | PRZY DRZWI BYŁY ZAMKNIĘTE | | GDZIEŻ TU POWÓD DO WYRZUTÓW? | WGDZIEŻ TO POM DO WYRYDÓ | | \"\"\"O TEM JEDNAK NIE BYŁO MOWY.\"\"\" | O TEM JEDNAK NIE BYŁO MOWY | | LUBIĘ GO. | LUBIĄ GO | | — TO MI NIE POMAGA. | TO MNIE NIE POMAGA | | WCIĄŻ LUDZIE WYSIADAJĄ PRZED ZAMKIEM, Z MIASTA, Z PRAGI. | WCIĄŻ LUDZIE WYSIADAJĄ PRZED ZAMKIEM Z MIASTA Z PRAGI | | ALE ON WCALE INACZEJ NIE MYŚLAŁ. | ONY MONITCENIE PONACZUŁA NA MASU | | A WY, CO TAK STOICIE? | A WY CO TAK STOICIE | | A TEN PRZYRZĄD DO CZEGO SŁUŻY? | A TEN PRZYRZĄD DO CZEGO SŁUŻY | | NA JUTRZEJSZYM KOLOKWIUM BĘDZIE PIĘĆ PYTAŃ OTWARTYCH I TEST WIELOKROTNEGO WYBORU. | NAJUTRZEJSZYM KOLOKWIUM BĘDZIE PIĘĆ PYTAŃ OTWARTYCH I TEST WIELOKROTNEGO WYBORU | ## Evaluation 1. To evaluate on with split 2. To evaluate on ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Polish speech recognition, converting 16kHz audio input to text with improved accuracy using the XLSR-53 large model." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-portuguese.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-portuguese.json new file mode 100644 index 0000000000000000000000000000000000000000..5c4fdf1e735f68d8abd6818daf099bd29b805ea1 --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-portuguese.json @@ -0,0 +1,27 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese", + "downloads": 5104977, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_6_0", + "pt", + "robust-speech-event", + "speech", + "xlsr-fine-tuning-week", + "dataset:common_voice", + "dataset:mozilla-foundation/common_voice_6_0", + "doi:10.57967/hf/3572", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: pt license: apache-2.0 datasets: - common_voice - mozilla-foundation/common_voice_6_0 metrics: - wer - cer tags: - audio - automatic-speech-recognition - hf-asr-leaderboard - mozilla-foundation/common_voice_6_0 - pt - robust-speech-event - speech - xlsr-fine-tuning-week model-index: - name: XLSR Wav2Vec2 Portuguese by Jonatas Grosman results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice pt type: common_voice args: pt metrics: - name: Test WER type: wer value: 11.31 - name: Test CER type: cer value: 3.74 - name: Test WER (+LM) type: wer value: 9.01 - name: Test CER (+LM) type: cer value: 3.21 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: pt metrics: - name: Dev WER type: wer value: 42.1 - name: Dev CER type: cer value: 17.93 - name: Dev 
WER (+LM) type: wer value: 36.92 - name: Dev CER (+LM) type: cer value: 16.88 --- # Fine-tuned XLSR-53 large model for speech recognition in Portuguese Fine-tuned facebook/wav2vec2-large-xlsr-53 on Portuguese using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | NEM O RADAR NEM OS OUTROS INSTRUMENTOS DETECTARAM O BOMBARDEIRO STEALTH. | NEMHUM VADAN OS OLTWES INSTRUMENTOS DE TTÉÃN UM BOMBERDEIRO OSTER | | PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA | E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA | | OITO | OITO | | TRANCÁ-LOS | TRANCAUVOS | | REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA | REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA | | O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS. | YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS | | MENINA E MENINO BEIJANDO NAS SOMBRAS | MENINA E MENINO BEIJANDO NAS SOMBRAS | | EU SOU O SENHOR | EU SOU O SENHOR | | DUAS MULHERES QUE SENTAM-SE PARA BAIXO LENDO JORNAIS. | DUAS MIERES QUE SENTAM-SE PARA BAICLANE JODNÓI | | EU ORIGINALMENTE ESPERAVA | EU ORIGINALMENTE ESPERAVA | ## Evaluation 1. To evaluate on with split 2. To evaluate on ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Portuguese speech recognition, converting 16kHz audio input to text with improved accuracy using the XLSR-53 large model." +} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-russian.json b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-russian.json new file mode 100644 index 0000000000000000000000000000000000000000..62d0edf79a3696eea546025ca62d4faa03cf692a --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-large-xlsr-53-russian.json @@ -0,0 +1,27 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-large-xlsr-53-russian", + "downloads": 3613855, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_6_0", + "robust-speech-event", + "ru", + "speech", + "xlsr-fine-tuning-week", + "dataset:common_voice", + "dataset:mozilla-foundation/common_voice_6_0", + "doi:10.57967/hf/3571", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ru license: apache-2.0 datasets: - common_voice - mozilla-foundation/common_voice_6_0 metrics: - wer - cer tags: - audio - automatic-speech-recognition - hf-asr-leaderboard - mozilla-foundation/common_voice_6_0 - robust-speech-event - ru - speech - xlsr-fine-tuning-week model-index: - name: XLSR Wav2Vec2 Russian by Jonatas Grosman results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice ru type: common_voice args: ru metrics: - name: Test WER type: wer value: 13.3 - name: Test CER type: cer value: 2.88 - name: Test WER (+LM) type: wer value: 9.57 - name: Test CER (+LM) type: cer value: 2.24 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: 
speech-recognition-community-v2/dev_data args: ru metrics: - name: Dev WER type: wer value: 40.22 - name: Dev CER type: cer value: 14.8 - name: Dev WER (+LM) type: wer value: 33.61 - name: Dev CER (+LM) type: cer value: 13.5 --- # Fine-tuned XLSR-53 large model for speech recognition in Russian Fine-tuned facebook/wav2vec2-large-xlsr-53 on Russian using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :) The script used for training can be found here: ## Usage The model can be used directly (without a language model) as follows... Using the HuggingSound library: Writing your own inference script: | Reference | Prediction | | ------------- | ------------- | | ОН РАБОТАТЬ, А ЕЕ НЕ УДЕРЖАТЬ НИКАК — БЕГАЕТ ЗА КЛЁШЕМ КАЖДОГО БУЛЬВАРНИКА. | ОН РАБОТАТЬ А ЕЕ НЕ УДЕРЖАТ НИКАК БЕГАЕТ ЗА КЛЕШОМ КАЖДОГО БУЛЬБАРНИКА | | ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ, Я БУДУ СЧИТАТЬ, ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ. | ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ Я БУДУ СЧИТАТЬ ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ | | ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ МИР С ИЗРАИЛЕМ, А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕННОСТИ. | ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ С НИ МИР ФЕЗРЕЛЕМ А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕНСКИ | | У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО, ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРИБАВЛЯЮ. | У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРЕДБАВЛЯЕТ | | ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ. | ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ | | ВРОНСКИЙ, СЛУШАЯ ОДНИМ УХОМ, ПЕРЕВОДИЛ БИНОКЛЬ С БЕНУАРА НА БЕЛЬ-ЭТАЖ И ОГЛЯДЫВАЛ ЛОЖИ. | ЗЛАЗКИ СЛУШАЮ ОТ ОДНИМ УХАМ ТЫ ВОТИ В ВИНОКОТ СПИЛА НА ПЕРЕТАЧ И ОКЛЯДЫВАЛ БОСУ | | К СОЖАЛЕНИЮ, СИТУАЦИЯ ПРОДОЛЖАЕТ УХУДШАТЬСЯ. | К СОЖАЛЕНИЮ СИТУАЦИИ ПРОДОЛЖАЕТ УХУЖАТЬСЯ | | ВСЁ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕПЕРЕВОДИВШИХСЯ ДОЛГОВ. | ВСЕ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕ ПЕРЕВОДИВШИХСЯ ДОЛГОВ | | ТЕПЕРЬ ДЕЛО, КОНЕЧНО, ЗА ТЕМ, ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА. | ТЕПЕРЬ ДЕЛАЮ КОНЕЧНО ЗАТЕМ ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА | | ДЕВЯТЬ | ЛЕВЕТЬ | ## Evaluation 1. To evaluate on with split 2. To evaluate on ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "Fine-tuned for Russian speech recognition, converting 16kHz audio input to text with improved accuracy using the XLSR-53 large model." 
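The "(+LM)" rows in the metrics above add an n-gram language model at decode time; a hedged sketch of that idea with pyctcdecode (the library choice and the lm.arpa path are assumptions for illustration, not taken from the card):

```python
import numpy as np
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-russian")
# Vocabulary tokens ordered by their ids, as the decoder expects
vocab = [tok for tok, _ in sorted(processor.tokenizer.get_vocab().items(), key=lambda kv: kv[1])]

# "lm.arpa" is a hypothetical local n-gram LM, not shipped with the model
decoder = build_ctcdecoder(vocab, kenlm_model_path="lm.arpa")
logits = np.load("logits.npy")  # (time, vocab) CTC output from the acoustic model
print(decoder.decode(logits))
```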
+} \ No newline at end of file diff --git a/data/model_data_json/jonatasgrosman_wav2vec2-xls-r-1b-portuguese.json b/data/model_data_json/jonatasgrosman_wav2vec2-xls-r-1b-portuguese.json new file mode 100644 index 0000000000000000000000000000000000000000..aa47c1cb8a2b68e53dffe02995122bcc836124f5 --- /dev/null +++ b/data/model_data_json/jonatasgrosman_wav2vec2-xls-r-1b-portuguese.json @@ -0,0 +1,22 @@ +{ + "model_id": "jonatasgrosman/wav2vec2-xls-r-1b-portuguese", + "downloads": 633283, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_8_0", + "pt", + "robust-speech-event", + "dataset:mozilla-foundation/common_voice_8_0", + "doi:10.57967/hf/3575", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - pt license: apache-2.0 tags: - automatic-speech-recognition - hf-asr-leaderboard - mozilla-foundation/common_voice_8_0 - pt - robust-speech-event datasets: - mozilla-foundation/common_voice_8_0 model-index: - name: XLS-R Wav2Vec2 Portuguese by Jonatas Grosman results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 8 type: mozilla-foundation/common_voice_8_0 args: pt metrics: - name: Test WER type: wer value: 8.7 - name: Test CER type: cer value: 2.55 - name: Test WER (+LM) type: wer value: 6.04 - name: Test CER (+LM) type: cer value: 1.98 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: pt metrics: - name: Dev WER type: wer value: 24.23 - name: Dev CER type: cer value: 11.3 - name: Dev WER (+LM) type: wer value: 19.41 - name: Dev CER (+LM) type: cer value: 10.19 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Test Data type: speech-recognition-community-v2/eval_data args: pt metrics: - name: Test WER type: wer value: 18.8 --- # Fine-tuned XLS-R 1B model for speech recognition in Portuguese Fine-tuned facebook/wav2vec2-xls-r-1b on Portuguese using the train and validation splits of Common Voice 8.0, CORAA, Multilingual TEDx, and Multilingual LibriSpeech. When using this model, make sure that your speech input is sampled at 16kHz. This model has been fine-tuned by the HuggingSound tool, and thanks to the GPU credits generously given by the OVHcloud :) ## Usage Using the HuggingSound library: Writing your own inference script: ## Evaluation Commands 1. To evaluate on with split 2. To evaluate on ## Citation If you want to cite this model you can use this:", + "model_explanation_gemini": "A fine-tuned Portuguese speech recognition model based on XLS-R Wav2Vec2, optimized for converting spoken Portuguese to text with high accuracy." 
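Every card in this batch stresses 16 kHz input; a small sketch of resampling arbitrary audio first (torchaudio is assumed here as the loader):

```python
import torchaudio
import torchaudio.functional as F

waveform, sample_rate = torchaudio.load("/path/to/any_audio.wav")
if sample_rate != 16_000:
    # Bring the audio to the 16 kHz rate these wav2vec2/XLS-R models expect
    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=16_000)
```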
+} \ No newline at end of file diff --git a/data/model_data_json/jonathandinu_face-parsing.json b/data/model_data_json/jonathandinu_face-parsing.json new file mode 100644 index 0000000000000000000000000000000000000000..0467966a53b58ee5a5a414f16eaa2aeef26d2a56 --- /dev/null +++ b/data/model_data_json/jonathandinu_face-parsing.json @@ -0,0 +1,22 @@ +{ + "model_id": "jonathandinu/face-parsing", + "downloads": 319779, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "segformer", + "vision", + "image-segmentation", + "nvidia/mit-b5", + "transformers.js", + "en", + "dataset:celebamaskhq", + "arxiv:2105.15203", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en library_name: transformers tags: - vision - image-segmentation - nvidia/mit-b5 - transformers.js - onnx datasets: - celebamaskhq --- # Face Parsing !example image and output Semantic segmentation model fine-tuned from nvidia/mit-b5 with CelebAMask-HQ for face parsing. For additional options, see the Transformers Segformer docs. > ONNX model for web inference contributed by Xenova. ## Usage in Python Exhaustive list of labels can be extracted from config.json. | id | label | note | | :-: | :--------- | :---------------- | | 0 | background | | | 1 | skin | | | 2 | nose | | | 3 | eye_g | eyeglasses | | 4 | l_eye | left eye | | 5 | r_eye | right eye | | 6 | l_brow | left eyebrow | | 7 | r_brow | right eyebrow | | 8 | l_ear | left ear | | 9 | r_ear | right ear | | 10 | mouth | area between lips | | 11 | u_lip | upper lip | | 12 | l_lip | lower lip | | 13 | hair | | | 14 | hat | | | 15 | ear_r | earring | | 16 | neck_l | necklace | | 17 | neck | | | 18 | cloth | clothing | ## Usage in the browser (Transformers.js) ### p5.js Since p5.js uses an animation loop abstraction, we need to take care loading the model and making predictions. full p5.js example ### Model Description - **Developed by:** Jonathan Dinu - **Model type:** Transformer-based semantic segmentation image model - **License:** non-commercial research and educational purposes - **Resources for more information:** Transformers docs on Segformer and/or the original research paper. ## Limitations and Bias ### Bias While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The CelebAMask-HQ dataset used for fine-tuning is large but not necessarily perfectly diverse or representative. Also, they are images of.... just celebrities.", + "model_explanation_gemini": "Semantic segmentation model fine-tuned for face parsing to label facial components like eyes, lips, and hair in images." +} \ No newline at end of file diff --git a/data/model_data_json/jondurbin_unstuffer-v0.2.json b/data/model_data_json/jondurbin_unstuffer-v0.2.json new file mode 100644 index 0000000000000000000000000000000000000000..76a8ef41ca1a9eef5e2b15be4e9d00338a403593 --- /dev/null +++ b/data/model_data_json/jondurbin_unstuffer-v0.2.json @@ -0,0 +1,12 @@ +{ + "model_id": "jondurbin/unstuffer-v0.2", + "downloads": 129924, + "tags": [ + "safetensors", + "roberta", + "license:mit", + "region:us" + ], + "description": "--- license: mit ---", + "model_explanation_gemini": "Detects and removes unnecessary filler words or phrases from text to improve clarity and conciseness." 
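The jonathandinu/face-parsing entry above references a "Usage in Python" section whose code is not included; a minimal sketch with the standard transformers Segformer classes (the upsampling step is a common pattern, not quoted from the card):

```python
import torch
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")

image = Image.open("face.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_labels, h/4, w/4)

# Upsample to the input resolution and take the per-pixel argmax label
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
labels = upsampled.argmax(dim=1)[0]  # e.g. 1 = skin, 13 = hair, per the label table above
```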
+} \ No newline at end of file diff --git a/data/model_data_json/jonfd_electra-small-nordic.json b/data/model_data_json/jonfd_electra-small-nordic.json new file mode 100644 index 0000000000000000000000000000000000000000..ed1299e7fff0018400987f9a384fd7434fd0d30f --- /dev/null +++ b/data/model_data_json/jonfd_electra-small-nordic.json @@ -0,0 +1,24 @@ +{ + "model_id": "jonfd/electra-small-nordic", + "downloads": 135259, + "tags": [ + "transformers", + "pytorch", + "tf", + "electra", + "pretraining", + "is", + "no", + "sv", + "da", + "dataset:igc", + "dataset:ic3", + "dataset:jonfd/ICC", + "dataset:mc4", + "license:cc-by-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - is - no - sv - da license: cc-by-4.0 datasets: - igc - ic3 - jonfd/ICC - mc4 --- # Nordic ELECTRA-Small This model was pretrained on the following corpora: * The Icelandic Gigaword Corpus (IGC) * The Icelandic Common Crawl Corpus (IC3) * The Icelandic Crawled Corpus (ICC) * The Multilingual Colossal Clean Crawled Corpus (mC4) - Icelandic, Norwegian, Swedish and Danish text obtained from .is, .no, .se and .dk domains, respectively The total size of the corpus after document-level deduplication and filtering was 14.82B tokens, split equally between the four languages. The model was trained using a WordPiece tokenizer with a vocabulary size of 96,105 for one million steps with a batch size of 256, and otherwise with default settings. # Acknowledgments This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). This project was funded by the Language Technology Programme for Icelandic 2019-2023. The programme, which is managed and coordinated by Almannarómur, is funded by the Icelandic Ministry of Education, Science and Culture.", + "model_explanation_gemini": "Pretrained on Nordic language text (Icelandic, Norwegian, Swedish, Danish) from multiple corpora for general language understanding tasks." +} \ No newline at end of file diff --git a/data/model_data_json/jtlicardo_bpmn-information-extraction-v2.json b/data/model_data_json/jtlicardo_bpmn-information-extraction-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..ba0fe0f8f4eb638e62a98eff71542dfe86929a97 --- /dev/null +++ b/data/model_data_json/jtlicardo_bpmn-information-extraction-v2.json @@ -0,0 +1,21 @@ +{ + "model_id": "jtlicardo/bpmn-information-extraction-v2", + "downloads": 127736, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "safetensors", + "bert", + "token-classification", + "generated_from_trainer", + "base_model:google-bert/bert-base-cased", + "base_model:finetune:google-bert/bert-base-cased", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - generated_from_trainer metrics: - precision - recall - f1 - accuracy widget: - text: The process starts when the customer enters the shop. The customer then takes the product from the shelf. The customer then pays for the product and leaves the store. example_title: Example 1 - text: The process begins when the HR department hires the new employee. Next, the new employee completes necessary paperwork and provides documentation to the HR department. After the initial task, the HR department performs a decision to determine the employee's role and department assignment. The employee is trained by the Sales department. After the training, the Sales department assigns the employee a sales quota and performance goals. 
Finally, the process ends with an 'End' event, when the employee begins their role in the Sales department. example_title: Example 2 - text: A customer places an order for a product on the company's website. Next, the customer service department checks the availability of the product and confirms the order with the customer. After the initial task, the warehouse processes the order. If the order is eligible for same-day shipping, the warehouse staff picks and packs the order, and it is sent to the shipping department. After the order is packed, the shipping department delivers the order to the customer. Finally, the process ends with an 'End' event, when the customer receives their order. example_title: Example 3 base_model: bert-base-cased model-index: - name: bpmn-information-extraction-v2 results: [] --- # bpmn-information-extraction-v2 This model is a fine-tuned version of bert-base-cased on a dataset containing 104 textual process descriptions. The dataset and the training scripts can be found here: The dataset contains 5 target labels: * * * * * It achieves the following results on the evaluation set: - Loss: 0.2179 - Precision: 0.8826 - Recall: 0.9246 - F1: 0.9031 - Accuracy: 0.9516 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 15 ### Training results | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy | |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:| | 1.9945 | 1.0 | 12 | 1.5128 | 0.2534 | 0.3730 | 0.3018 | 0.5147 | | 1.2161 | 2.0 | 24 | 0.8859 | 0.2977 | 0.4524 | 0.3591 | 0.7256 | | 0.6755 | 3.0 | 36 | 0.4876 | 0.5562 | 0.7262 | 0.6299 | 0.8604 | | 0.372 | 4.0 | 48 | 0.3091 | 0.7260 | 0.8413 | 0.7794 | 0.9128 | | 0.2412 | 5.0 | 60 | 0.2247 | 0.7526 | 0.8571 | 0.8015 | 0.9342 | | 0.1636 | 6.0 | 72 | 0.2102 | 0.8043 | 0.8968 | 0.8480 | 0.9413 | | 0.1325 | 7.0 | 84 | 0.1910 | 0.8667 | 0.9286 | 0.8966 | 0.9500 | | 0.11 | 8.0 | 96 | 0.2352 | 0.8456 | 0.9127 | 0.8779 | 0.9389 | | 0.0945 | 9.0 | 108 | 0.2179 | 0.8550 | 0.9127 | 0.8829 | 0.9429 | | 0.0788 | 10.0 | 120 | 0.2203 | 0.8830 | 0.9286 | 0.9052 | 0.9445 | | 0.0721 | 11.0 | 132 | 0.2079 | 0.8902 | 0.9325 | 0.9109 | 0.9516 | | 0.0617 | 12.0 | 144 | 0.2367 | 0.8797 | 0.9286 | 0.9035 | 0.9445 | | 0.0615 | 13.0 | 156 | 0.2183 | 0.8859 | 0.9246 | 0.9049 | 0.9492 | | 0.0526 | 14.0 | 168 | 0.2179 | 0.8826 | 0.9246 | 0.9031 | 0.9516 | ### Framework versions - Transformers 4.26.1 - Pytorch 1.13.1+cu116 - Datasets 2.10.0 - Tokenizers 0.13.2", + "model_explanation_gemini": "Extracts structured BPMN process information from textual descriptions by classifying elements into five target labels." 
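A minimal sketch of running the bpmn-information-extraction-v2 model above as a token-classification pipeline (aggregation_strategy is a standard transformers option, chosen here to get readable spans):

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="jtlicardo/bpmn-information-extraction-v2",
    aggregation_strategy="simple",
)
text = "The customer places an order and the warehouse ships the product."
for entity in ner(text):
    print(entity["entity_group"], "->", entity["word"])
```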
+} \ No newline at end of file diff --git a/data/model_data_json/jukofyork_creative-writing-control-vectors-v3.0.json b/data/model_data_json/jukofyork_creative-writing-control-vectors-v3.0.json new file mode 100644 index 0000000000000000000000000000000000000000..625921465f14c09fc1007b0ab99c8887d2427b0b --- /dev/null +++ b/data/model_data_json/jukofyork_creative-writing-control-vectors-v3.0.json @@ -0,0 +1,13 @@ +{ + "model_id": "jukofyork/creative-writing-control-vectors-v3.0", + "downloads": 91449, + "tags": [ + "gguf", + "control-vector", + "creative-writing", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - control-vector - creative-writing --- !image/png This repo contains pre-generated control vectors in GGUF format for use with llama.cpp: - **IMPORTANT**: These **new control vectors** must use their **respective de-bias control vector(s)**. - The code used to generate these can now be found at github.com/jukofyork/control-vectors. - All were generated with set to the model's hidden state dimension. Control vectors allow fine-tuned control over LLMs, enabling more precise/targeted text generation. --- ## Table of Contents - Applying Control Vectors - Command Line Generator - Direct Links - Algorithm Details - Changelog --- ## Applying Control Vectors ### To \"de-bias\" the model only: Use the option as follows: Alternatively for server mode: This will apply the \"language\" de-bias control vector to the model. You can apply multiple de-bias control vectors simultaneously like so: This will apply all 3 of the \"writing style\" de-bias control vectors. ### To fully apply a positive or negative axis control vector with the default scale-factor: Use the option as follows: This will fully apply (ie: with a scale-factor of ) the (positive-axis) \"ornate language\" control vector. **IMPORTANT: The positive and negative axis control vectors must be used along with the relevant de-bias control vector - they cannot be used on their own!** You can fully apply multiple positive or negative axis control vectors like so: This will fully apply (ie: with a scale-factor of ) all 3 of the (positive-axis) \"writing style\" control vectors. **NOTE**: Fully applying too many positive or negative axis control vector simultaneously may damage the model's output. ### To partially apply a positive or negative axis control vector using a custom scale-factor: This will partially apply the (positive-axis) \"ornate language\" control vector with a scale-factor of (ie: half the full effect). **IMPORTANT: The positive and negative axis control vectors must be used along with the relevant de-bias control vector - they cannot be used on their own!** You can partially apply multiple positive or negative axis control vectors like so: This will partially apply all 3 of the (positive-axis) \"writing style\" control vectors with varying weights. The theoretical upper bound value for equal weights is between and depending on how correlated the control vector directions are, eg: - For use the default scale-factor of for comparison with the values below. - For is between and . - For is between and . - For is between and . - For is between and . and so on. The way the positive and negative axis control vectors are calibrated means you can negate the scale-factors too, eg: is equivalent to: **NOTE**: It is possible to use scale-factors greater than , but if too large it will eventually damage the model's output. ### Important Notes 1. 
**Always** include the relevant \"de-bias\" control vector as well as the positive-axis/negative-axis control vector - they cannot be used on their own! 2. **Do not** mix both sides of a positive/negative axis at the same time (eg: and will just cancel out and have no effect...). 3. Ensure your version is up to date (multi-vector support added 27/06/24 in #8137). --- ## Command Line Generator Courtesy of gghfez, a utility to easily generate command line options for llama.cpp: !image/png You can run this tool directly on GitHub Pages. --- # Direct Links ## Very Large Models - c4ai-command-r-plus - c4ai-command-r-plus-08-2024 - Eurux-8x22b-nca - Lumimaid-v0.2-123B - magnum-v2-123b - Mistral-Large-Instruct-2407 - Mixtral-8x22B-Instruct-v0.1 - Qwen1.5-110B-Chat - WizardLM-2-8x22B ## Large Models - Athene-70B - aurelian-alpha0.1-70b-rope8-32K-fp16 - aurelian-v0.5-70b-rope8-32K-fp16 - daybreak-miqu-1-70b-v1.0-hf - deepseek-llm-67b-chat - dolphin-2.9.2-qwen2-72b - Hermes-3-Llama-3.1-70B - L3-70B-Euryale-v2.1 - L3.1-70B-Euryale-v2.2 - Llama-3-70B-Instruct-Storywriter - Llama-3-Lumimaid-70B-v0.1 - Llama-3.1-70B-ArliAI-RPMax-v1.1 - Lumimaid-v0.2-70B - magnum-72b-v1 - magnum-v2-72b - Meta-Llama-3-70B-Instruct - Meta-Llama-3.1-70B-Instruct - miqu-1-70b - Qwen1.5-72B-Chat - Qwen2-72B-Instruct - Qwen2.5-72B-Instruct - turbcat-instruct-72b ## Medium Models - 35b-beta-long - aya-23-35B - c4ai-command-r-v01 - c4ai-command-r-08-2024 (\\*\\*\\*READ THIS FIRST\\*\\*\\*) - Divergence-33B - gemma-2-27b-it - gemma-2-27b-it-SimPO-37K - gemma2-gutenberg-27B - internlm2_5-20b-chat - magnum-v1-32b - magnum-v2-32b - magnum-v3-27b-kto - magnum-v3-34b - Mistral-Small-Instruct-2409 - Mixtral-8x7B-Instruct-v0.1 - Nous-Capybara-34B - Qwen1.5-32B-Chat - Qwen2.5-32B-Instruct - Yi-34B-Chat - Yi-1.5-34B-Chat - Yi-1.5-34B-Chat-16K ## Small Models - aya-23-8B - gemma-2-9b-it - gemma-2-9b-it-SimPO - Gemma-2-9B-It-SPPO-Iter3 - gemma-2-Ifable-9B - Llama-3-Instruct-8B-SPPO-Iter3 - Llama-3.1-8B-ArliAI-RPMax-v1.1 - Meta-Llama-3-8B-Instruct - Meta-Llama-3.1-8B-Instruct - Mistral-7B-Instruct-v0.2 - Mistral-7B-Instruct-v0.3 - Mistral7B-PairRM-SPPO-Iter3 - Mistral-Nemo-12B-ArliAI-RPMax-v1.1 - mistral-nemo-gutenberg-12B - mistral-nemo-gutenberg-12B-v2 - Mistral-Nemo-Instruct-2407 - romulus-mistral-nemo-12b-simpo - Qwen1.5-14B-Chat - Qwen2-7B-Instruct - Qwen2.5-7B-Instruct - Qwen2.5-14B-Instruct - WizardLM-2-7B --- ## Algorithm Details ### 1. First we create a set of pre/post \"prompt stems\":
'prompt_stems.json'
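The stem file's contents are collapsed above; purely as an illustration of the 50 x 50 construction described next, a hypothetical sketch (the example stems are invented, not taken from prompt_stems.json):

```python
import itertools

# Hypothetical stand-ins for the collapsed pre/post stem lists (50 each in the real file)
pre_stems = ["You are", "You're", "Act as"]
post_stems = ["an author", "a writer", "a novelist"]

prompt_stems = [f"{pre} {post}" for pre, post in itertools.product(pre_stems, post_stems)]
# With the real 50 x 50 lists this yields the 2500 sentences mentioned below.
```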
The Cartesian product of these gives us 2500 (ie: 50 x 50) different \"You are an author\" type sentences. ### 2. Then we create \"continuations\" along several different creative-writing axes: **A set of 3 different \"writing style\" axes:**
\"Language\" (click to expand)
\"Storytelling (click to expand)\"
\"Character Focus (click to expand)\"
**The 4 elements of the Dark Tetrad**:
\"Empathy vs Sociopathy (click to expand)\"
\"Honesty vs Machiavellianism (click to expand)\"
\"Humility vs Narcissism (click to expand)\"
\"Compassion vs Sadism (click to expand)\"
**An \"Optimism vs Nihilism\" axis to compliment the Dark Tetrad axis:**
\"Optimism vs Nihilism (click to expand)\"
### 3. Then we collect a large number of creative-writing prompts: - I used Sao10K/Short-Storygen-v2 and a couple of other sources to get 11835 creative-writing prompts in total (see the file). - The jq command is very useful for extracting the prompts only from these datasets. ### 4. Run the model on a random sample of (prompt-stem, continuation, creative-writing prompts) combinations: The Cartesian product of: 2500 prompt-stem sentences x 10 continuation sentences x 11835 story prompts ≈ 300M possible combinations. - It is important that the same prompt-stem sample sentence be used with each (, , ) triplet. - It is also important that the same (prompt-stem, continuation) sample sentence be used with the and members of the same triplet. - The suggested value of for the option is because the theory regarding estimation of covariance matrices shows we need at the ***very least*** a minimum of one sample per feature (this may be overkill due to us only retaining the top Eigenvectors though...). ### 5. Create a pair of \"differenced datasets\" by subtracting the corresponding class's sample from both of the other 2 classes' samples: - The reason for this is so that we \"centre\" the data around the \"baseline\" (i.e., set the \"baseline\" as the origin and look for vector directions that point away from it). - This is in contrast to assuming the difference of the means is the \"centre\" for a 2-class version of this using PCA on the covariance matrix of the differences (i.e., the \"standard\" method of creating control vectors). ### 6. Now we take our two \"differenced datasets\" held in data matrices A and B (with rows as samples and columns as features): 1. Create the cross-covariance matrix, . 2. Next we symmetrise, . 3. Perform an eigendecomposition, . 4. Since we symmetrised the matrix, the **eigenvectors** () and **eigenvalues** () will all be real-valued. 5. Arrange the **eigenvectors** in descending order based on their corresponding **eigenvalues**. 6. Once the **eigenvectors** are sorted, discard the **eigenvalues** as they won't be needed again. The reason for using the cross-covariance matrix instead of the covariance matrix: - The **covariance matrix** of a differenced dataset exemplifies directions in **A or B** (ie: think about the expansion of ). - The **cross-covariance matrix** of a differenced dataset exemplifies directions in **A and B** (ie: akin to , with no or terms). The reason for creating the symmetrised matrix is two-fold: - To avoid complex-valued **eigenvectors** that tell us about rotations (which we can't actually make use of here anyway). - To specifically try to find opposing/balanced \"axis\" for our different traits (i.e., we don't want to find positively correlated directions nor unbalanced directions). ### 7. So now we have a set of \"directions\" to examine: - It turns out that 90% of the time the **principal eigenvector** (i.e., the **eigenvector** with the largest corresponding **eigenvalue**) is the one you want. - In the ~10% of cases where it is not the **principal eigenvector** or split between a couple of different **eigenvectors**, we (greedily) create a \"compound direction\" by examining the discriminant ratio of each direction. ### 8. 
Finally, we project the \"direction\" to reorient and scale as necessary: - There is no reason the **eigenvectors** point in the direction we want, so 50% of the time we have to flip all the signs by projecting our (differenced) \"desired\" dataset on to the (unit norm) direction and then test the sign of the mean. - Due to the way the LLMs work via the \"residual stream\", the hidden states tend to get larger and larger as the layers progress, so to normalize this we also scale by the magnitude of the mean of the same projection as above. - To better separate the \"bias\" effect from the positive/negative axis (and to make the positive/negative end equidistant from the model's \"baseline\" behaviour) we store the mid point of these means in the de-bias control vector and then subtract the midpoint from both the positive and negative axis' control vectors. **NOTES**: - I have found the above can be applied to every layer, but often the last layer will have hidden state means that are 10-100x larger than the rest, so I have excluded these from all I have uploaded here. - I have tried many other different eigendecompositions: PCA on the 2-class differenced datasets, PCA on the joined 2-class/3-class datasets, solving generalized eigensystems similar to CCA, and so on. - The \"balanced\" directions / \"axis\" this method finds are the ***exact opposite*** of those needed for the Refusal in LLMs is mediated by a single direction paper. --- ## Changelog - *28/08/24 - Added Qwen2-72B-Instruct.* - *29/08/24 - Added Qwen1.5-72B-Chat, Mistral-7B-Instruct-v0.2, Mistral-7B-Instruct-v0.3, miqu-1-70b, Mixtral-8x7B-Instruct-v0.1 and Yi-1.5-34B-Chat-16K.* - *30/08/24 - Added Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct, Meta-Llama-3.1-8B-Instruct and Meta-Llama-3.1-70B-Instruct.* - *31/08/24 - Added aya-23-35B, Gemma-2-9B-It-SPPO-Iter3 and Qwen1.5-14B-Chat.* - *01/09/24 - Added Mixtral-8x22B-Instruct-v0.1 and Qwen1.5-110B-Chat.* - *02/09/24 - Added c4ai-command-r-plus-08-2024.* - *03/09/24 - Added c4ai-command-r-08-2024 (\\*\\*\\*READ THIS FIRST\\*\\*\\*), Yi-1.5-34B-Chat, gemma-2-27b-it-SimPO-37K, aya-23-8B, gemma-2-9b-it-SimPO, Qwen2-7B-Instruct and Yi-34B-Chat.* - *04/09/24 - Added deepseek-llm-67b-chat, internlm2_5-20b-chat, Athene-70B, Llama-3-Instruct-8B-SPPO-Iter3, magnum-v2-32b, Mistral7B-PairRM-SPPO-Iter3 and Nous-Capybara-34B.* - *05/09/24 - Added Llama-3-70B-Instruct-Storywriter, 35b-beta-long and magnum-v3-34b.* - *06/09/24 - Added Hermes-3-Llama-3.1-70B, magnum-v2-72b, magnum-v1-32b and L3.1-70B-Euryale-v2.2.* - *08/09/24 - Added aurelian-v0.5-70b-rope8-32K-fp16, aurelian-alpha0.1-70b-rope8-32K-fp16, L3-70B-Euryale-v2.1, Llama-3-Lumimaid-70B-v0.1, magnum-72b-v1 and turbcat-instruct-72b.* - *09/09/24 - Added daybreak-miqu-1-70b-v1.0-hf, dolphin-2.9.2-qwen2-72b and Lumimaid-v0.2-70B.* - *11/09/24 - Added Lumimaid-v0.2-123B.* - *12/09/24 - Added magnum-v2-123b.* - *13/09/24 - Added Eurux-8x22b-nca.* - *14/09/24 - Added Divergence-33B, gemma2-gutenberg-27B, gemma-2-Ifable-9B, mistral-nemo-gutenberg-12B, mistral-nemo-gutenberg-12B-v2, romulus-mistral-nemo-12b-simpo, Llama-3.1-8B-ArliAI-RPMax-v1.1, Mistral-Nemo-12B-ArliAI-RPMax-v1.1 and Llama-3.1-70B-ArliAI-RPMax-v1.1.* - *20/09/24 - Added Qwen2.5-7B-Instruct, Qwen2.5-14B-Instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct, magnum-v3-27b-kto and Mistral-Small-Instruct-2409.*", + "model_explanation_gemini": "Provides pre-generated control vectors for llama.cpp to enable fine-tuned, targeted text generation in creative writing by applying 
positive/negative axis vectors with de-biasing." +} \ No newline at end of file diff --git a/data/model_data_json/julien-c_dummy-unknown.json b/data/model_data_json/julien-c_dummy-unknown.json new file mode 100644 index 0000000000000000000000000000000000000000..5bfa18cd9db91a2ee48f340a902b0d42fb39c81d --- /dev/null +++ b/data/model_data_json/julien-c_dummy-unknown.json @@ -0,0 +1,18 @@ +{ + "model_id": "julien-c/dummy-unknown", + "downloads": 84245, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "roberta", + "fill-mask", + "ci", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - ci --- ## Dummy model used for unit testing and CI", + "model_explanation_gemini": "Dummy model for unit testing and CI purposes." +} \ No newline at end of file diff --git a/data/model_data_json/katuni4ka_tiny-random-llava.json b/data/model_data_json/katuni4ka_tiny-random-llava.json new file mode 100644 index 0000000000000000000000000000000000000000..c721d12314cdb8b2068557a581427f6c17229844 --- /dev/null +++ b/data/model_data_json/katuni4ka_tiny-random-llava.json @@ -0,0 +1,16 @@ +{ + "model_id": "katuni4ka/tiny-random-llava", + "downloads": 101784, + "tags": [ + "transformers", + "safetensors", + "llava", + "image-text-to-text", + "conversational", + "arxiv:1910.09700", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). 
- **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]", + "model_explanation_gemini": "A randomly initialized, minimal LLaVA model with unspecified functionality due to incomplete documentation." +} \ No newline at end of file diff --git a/data/model_data_json/katuni4ka_tiny-random-nanollava.json b/data/model_data_json/katuni4ka_tiny-random-nanollava.json new file mode 100644 index 0000000000000000000000000000000000000000..f9bf40a57e8b77343b19dded240a41bd5de1ec97 --- /dev/null +++ b/data/model_data_json/katuni4ka_tiny-random-nanollava.json @@ -0,0 +1,17 @@ +{ + "model_id": "katuni4ka/tiny-random-nanollava", + "downloads": 98679, + "tags": [ + "safetensors", + "llava-qwen2", + "llava", + "multimodal", + "qwen", + "custom_code", + "en", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: - en tags: - llava - multimodal - qwen license: apache-2.0 --- # nanoLLaVA - Sub 1B Vision-Language Model
## Description nanoLLaVA is a \"small but mighty\" 1B vision-language model designed to run efficiently on edge devices. - **Base LLM**: Quyen-SE-v0.1 (Qwen1.5-0.5B) - **Vision Encoder**: google/siglip-so400m-patch14-384 | Model | **VQA v2** | **TextVQA** | **ScienceQA** | **POPE** | **MMMU (Test)** | **MMMU (Eval)** | **GQA** | **MM-VET** | |---------|--------|---------|-----------|------|-------------|-------------|------|--------| | Score | 70.84 | 46.71 | 58.97 | 84.1 | 28.6 | 30.4 | 54.79| 23.9 | ## Training Data Training Data will be released later as I am still writing a paper on this. Expect the final model to be much more powerful than the current one. ## Finetuning Code Coming Soon!!! ## Usage You can use it with the following script: ## Prompt Format The model follows the ChatML standard, however without a trailing newline after the end-of-turn token: --- | Image | Example | |--------------------------------------|---------------------------------------------------------------------------------------------| | !small | **What is the text saying?**
\"Small but mighty\".
**How does the text correlate to the context of the image?**
The text seems to be a playful or humorous representation of a small but mighty figure, possibly a mouse or a mouse toy, holding a weightlifting bar. | --- The model is trained using a modified version of Bunny", + "model_explanation_gemini": "nanoLLaVA is a compact, under 1B parameter vision-language model optimized for edge devices, combining a Qwen-based LLM with a SigLIP vision encoder for multimodal tasks like visual question answering." +} \ No newline at end of file diff --git a/data/model_data_json/keremberke_yolov5n-license-plate.json b/data/model_data_json/keremberke_yolov5n-license-plate.json new file mode 100644 index 0000000000000000000000000000000000000000..c6583c1a83b20bc403f3eb924e59676a175a3ef5 --- /dev/null +++ b/data/model_data_json/keremberke_yolov5n-license-plate.json @@ -0,0 +1,17 @@ +{ + "model_id": "keremberke/yolov5n-license-plate", + "downloads": 100489, + "tags": [ + "yolov5", + "tensorboard", + "yolo", + "vision", + "object-detection", + "pytorch", + "dataset:keremberke/license-plate-object-detection", + "model-index", + "region:us" + ], + "description": "--- tags: - yolov5 - yolo - vision - object-detection - pytorch library_name: yolov5 library_version: 7.0.6 inference: false datasets: - keremberke/license-plate-object-detection model-index: - name: keremberke/yolov5n-license-plate results: - task: type: object-detection dataset: type: keremberke/license-plate-object-detection name: keremberke/license-plate-object-detection split: validation metrics: - type: precision # since mAP@0.5 is not available on hf.co/metrics value: 0.9783431294995892 # min: 0.0 - max: 1.0 name: mAP@0.5 ---
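The usage script referenced in the nanoLLaVA card above is not reproduced in this dump. As a rough loading sketch only, assuming the custom-code path that the repo's tags advertise (and noting katuni4ka/tiny-random-nanollava is a tiny randomly initialized clone, so outputs are not meaningful):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch: nanoLLaVA ships custom modeling code, hence
# trust_remote_code=True; image preprocessing also goes through that code.
model_id = "katuni4ka/tiny-random-nanollava"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# ChatML-style prompt, per the card's stated prompt format; the <image>
# placeholder is an assumption about where the vision tokens are spliced in.
prompt = (
    "<|im_start|>system\nAnswer the question.<|im_end|>"
    "<|im_start|>user\n<image>\nWhat is the text saying?<|im_end|>"
    "<|im_start|>assistant"
)
inputs = tokenizer(prompt, return_tensors="pt")
```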
\"keremberke/yolov5n-license-plate\" ### How to use - Install yolov5: - Load model and perform prediction: - Finetune the model on your custom dataset: **More models available at: awesome-yolov5-models**", + "model_explanation_gemini": "Detects license plates in images using YOLOv5 object detection." +} \ No newline at end of file diff --git a/data/model_data_json/kha-white_manga-ocr-base.json b/data/model_data_json/kha-white_manga-ocr-base.json new file mode 100644 index 0000000000000000000000000000000000000000..e260e0c8391d4e6ad099b392597db1725ae5a3c5 --- /dev/null +++ b/data/model_data_json/kha-white_manga-ocr-base.json @@ -0,0 +1,18 @@ +{ + "model_id": "kha-white/manga-ocr-base", + "downloads": 128903, + "tags": [ + "transformers", + "pytorch", + "vision-encoder-decoder", + "image-text-to-text", + "image-to-text", + "ja", + "dataset:manga109s", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja tags: - image-to-text license: apache-2.0 datasets: - manga109s --- # Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses Vision Encoder Decoder framework. Manga OCR can be used as a general purpose printed Japanese OCR, but its main goal was to provide a high quality text recognition, robust against various scenarios specific to manga: - both vertical and horizontal text - text with furigana - text overlaid on images - wide variety of fonts and font styles - low quality images Code is available here.", + "model_explanation_gemini": "Performs optical character recognition (OCR) on Japanese manga text, handling vertical/horizontal text, furigana, image overlays, diverse fonts, and low-quality images." +} \ No newline at end of file diff --git a/data/model_data_json/kingabzpro_wav2vec2-large-xls-r-300m-Urdu.json b/data/model_data_json/kingabzpro_wav2vec2-large-xls-r-300m-Urdu.json new file mode 100644 index 0000000000000000000000000000000000000000..a3feb9356bd2cdb525432b07698de0944f4b667f --- /dev/null +++ b/data/model_data_json/kingabzpro_wav2vec2-large-xls-r-300m-Urdu.json @@ -0,0 +1,24 @@ +{ + "model_id": "kingabzpro/wav2vec2-large-xls-r-300m-Urdu", + "downloads": 104704, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "generated_from_trainer", + "hf-asr-leaderboard", + "robust-speech-event", + "ur", + "dataset:mozilla-foundation/common_voice_8_0", + "base_model:facebook/wav2vec2-xls-r-300m", + "base_model:finetune:facebook/wav2vec2-xls-r-300m", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ur license: apache-2.0 tags: - generated_from_trainer - hf-asr-leaderboard - robust-speech-event datasets: - mozilla-foundation/common_voice_8_0 metrics: - wer base_model: facebook/wav2vec2-xls-r-300m model-index: - name: wav2vec2-large-xls-r-300m-Urdu results: - task: type: automatic-speech-recognition name: Speech Recognition dataset: name: Common Voice 8 type: mozilla-foundation/common_voice_8_0 args: ur metrics: - type: wer value: 39.89 name: Test WER - type: cer value: 16.7 name: Test CER --- --- # wav2vec2-large-xls-r-300m-Urdu This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset. It achieves the following results on the evaluation set: - Loss: 0.9889 - Wer: 0.5607 - Cer: 0.2370 #### Evaluation Commands 1. 
To evaluate on with split ### Inference With LM ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.0001 - train_batch_size: 32 - eval_batch_size: 8 - seed: 42 - gradient_accumulation_steps: 2 - total_train_batch_size: 64 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 1000 - num_epochs: 200 ### Training results | Training Loss | Epoch | Step | Validation Loss | Wer | Cer | |:-------------:|:------:|:----:|:---------------:|:------:|:------:| | 3.6398 | 30.77 | 400 | 3.3517 | 1.0 | 1.0 | | 2.9225 | 61.54 | 800 | 2.5123 | 1.0 | 0.8310 | | 1.2568 | 92.31 | 1200 | 0.9699 | 0.6273 | 0.2575 | | 0.8974 | 123.08 | 1600 | 0.9715 | 0.5888 | 0.2457 | | 0.7151 | 153.85 | 2000 | 0.9984 | 0.5588 | 0.2353 | | 0.6416 | 184.62 | 2400 | 0.9889 | 0.5607 | 0.2370 | ### Framework versions - Transformers 4.17.0.dev0 - Pytorch 1.10.2+cu102 - Datasets 1.18.2.dev0 - Tokenizers 0.11.0 ### Eval results on Common Voice 8 \"test\" (WER): | Without LM | With LM (run ) | |---|---| | 52.03 | 39.89 |", + "model_explanation_gemini": "A fine-tuned Urdu speech recognition model based on wav2vec2-xls-r-300m, achieving 39.89 WER on Common Voice 8 test data." +} \ No newline at end of file diff --git a/data/model_data_json/klue_bert-base.json b/data/model_data_json/klue_bert-base.json new file mode 100644 index 0000000000000000000000000000000000000000..5d5db9ff5c9840d66dd89f92e8503fa9066cbf77 --- /dev/null +++ b/data/model_data_json/klue_bert-base.json @@ -0,0 +1,22 @@ +{ + "model_id": "klue/bert-base", + "downloads": 127506, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "fill-mask", + "korean", + "klue", + "ko", + "arxiv:2105.09680", + "arxiv:1910.09700", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko license: cc-by-sa-4.0 tags: - korean - klue mask_token: \"[MASK]\" widget: - text: 대한민국의 수도는 [MASK] 입니다. --- # KLUE BERT base ## Table of Contents - Model Details - How to Get Started With the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact - Technical Specifications - Citation Information - Model Card Authors ## Model Details **Model Description:** KLUE BERT base is a pre-trained BERT Model on Korean Language. The developers of KLUE BERT base developed the model in the context of the development of the Korean Language Understanding Evaluation (KLUE) Benchmark. - **Developed by:** See GitHub Repo for model developers - **Model Type:** Transformer-based language model - **Language(s):** Korean - **License:** cc-by-sa-4.0 - **Parent Model:** See the BERT base uncased model for more information about the BERT base model. - **Resources for more information:** - Research Paper - GitHub Repo ## How to Get Started With the Model ## Uses #### Direct Use The model can be used for tasks including topic classification, semantic textual similarity, natural language inference, named entity recognition, and other tasks outlined in the KLUE Benchmark. #### Misuse and Out-of-scope Use The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. 
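For the kingabzpro/wav2vec2-large-xls-r-300m-Urdu card above, whose evaluation and inference snippets were stripped, a minimal transcription sketch: the checkpoint is a standard wav2vec2 CTC model, so the generic ASR pipeline should apply. The file name is a placeholder for a 16 kHz mono recording.

```python
from transformers import pipeline

# Generic ASR pipeline sketch; "sample_ur.wav" is a hypothetical input file.
asr = pipeline("automatic-speech-recognition",
               model="kingabzpro/wav2vec2-large-xls-r-300m-Urdu")
print(asr("sample_ur.wav")["text"])
```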
## Risks, Limitations and Biases Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). The model developers discuss several ethical considerations related to the model in the paper, including: - Bias issues with the publicly available data used in the pretraining corpora (and considerations related to filtering) - PII in the data used in the pretraining corpora (and efforts to pseudonymize the data) For ethical considerations related to the KLUE Benchmark, also see the paper. ## Training #### Training Data The authors use the following pretraining corpora for the model, described in the associated paper: > We gather the following five publicly available Korean corpora from diverse sources to cover a broad set of topics and many different styles. We combine these corpora to build the final pretraining corpus of size approximately 62GB. > > - **MODU:** Modu Corpus is a collection of Korean corpora distributed by National Institute of Korean Languages. It includes both formal articles (news and books) and colloquial text (dialogues). > - **CC-100-Kor:** CC-100 is the large-scale multilingual web crawled corpora by using CC-Net (Wenzek et al., 2020). This is used for training XLM-R (Conneau et al., 2020). We use the Korean portion from this corpora. > - **NAMUWIKI:** NAMUWIKI is a Korean web-based encyclopedia, similar to Wikipedia, but known to be less formal. Specifically, we download the dump created on March 2nd, 2020. > - **NEWSCRAWL:** NEWSCRAWL consists of 12,800,000 news articles published from 2011 to 2020, collected from a news aggregation platform. > - **PETITION:** Petition is a collection of public petitions posted to the Blue House asking for administrative actions on social issues. We use the articles in the Blue House National Petition published from August 2017 to March 2019. The authors also describe ethical considerations related to the pretraining corpora in the associated paper. #### Training Procedure ##### Preprocessing The authors describe their preprocessing procedure in the associated paper: > We filter noisy text and non-Korean text using the same methods from Section 2.3 (of the paper). Each document in the corpus is split into sentences using C++ implementation (v1.3.1.) of rule-based Korean Sentence Splitter (KSS). For CC-100-Kor and NEWSCRAWL, we keep sentences of length greater than equal to 200 characters, as a heuristics to keep well-formed sentences. We then remove sentences included in our benchmark task datasets, using BM25 as a sentence similarity metric (reference). ###### Tokenization The authors describe their tokenization procedure in the associated paper: > We design and use a new tokenization method, morpheme-based subword tokenization. When building a vocabulary, we pre-tokenize a raw text into morphemes using a morphological analyzer, and then we apply byte pair encoding (BPE) (Senrich et al., 2016) to get the final vocabulary. For morpheme segmentation, we use Mecab-ko, MeCab (Kudo, 2006) adapted for Korean, and for BPE segmentation, we use the wordpiece tokenizer from Huggingface Tokenizers library. We specify the vocabulary size to 32k. After building the vocabulary, we only use the BPE model during inference, which allows us to tokenize a word sequence by reflecting morphemes without a morphological analyzer. This improves both usability and speed. The training configurations are further described in the paper. 
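A minimal fill-mask sketch for klue/bert-base, using the card's own widget sentence:

```python
from transformers import pipeline

# Predict the masked token in the card's widget example.
fill = pipeline("fill-mask", model="klue/bert-base")
for pred in fill("대한민국의 수도는 [MASK] 입니다."):
    print(pred["token_str"], pred["score"])
```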
## Evaluation #### Testing Data, Factors and Metrics The model was evaluated on the KLUE Benchmark. The tasks and metrics from the KLUE Benchmark that were used to evaluate this model are described briefly below. For more information about the KLUE Benchmark, see the data card, Github Repository, and associated paper. - **Task:** Topic Classification (TC) - Yonhap News Agency Topic Classification (YNAT), **Metrics:** Macro F1 score, defined as the mean of topic-wise F1 scores, giving the same importance to each topic. - **Task:** Semantic Textual Similarity (STS), **Metrics:** Pearson's correlation coefficient (Pearson's r) and F1 score - **Task:** Natural Language Inference (NLI), **Metrics:** Accuracy - **Task:** Named Entity Recognition (NER), **Metrics:** Entity-level macro F1 (Entity F1) and character-level macro F1 (Char F1) scores - **Task:** Relation Extraction (RE), **Metrics:** Micro F1 score on relation existing cases and area under the precision-recall curve (AUPRC) on all classes - **Task:** Dependency Parsing (DP), **Metrics:** Unlabeled attachment score (UAS) and labeled attachment score (LAS) - **Task:** Machine Reading Comprehension (MRC), **Metrics:** Exact match (EM) and character-level ROUGE-W (ROUGE), which can be viewed as longest common consecutive subsequence (LCCS)-based F1 score. - **Task:** Dialogue State Tracking (DST), **Metrics:** Joint goal accuracy (JGA) and slot micro F1 score (Slot F1) #### Results | Task | TC | STS | | NLI | NER | | RE | | DP | | MRC | | DST | | | :---: |:---: | :---: | :---: |:---:| :---: | :---: |:---:| :---:| :---: |:---: | :---: | :---:| :---: | :---: | | Metric | F1 | Pearson's r| F1 | ACC | Entity F1 | Char F1 | F1 | AUPRC| UAS | LAS | EM | ROUGE| JGA |Slot F1 | | | 85.73| 90.85 | 82.84 |81.63| 83.97 | 91.39 |66.44| 66.17| 89.96 |88.05 | 62.32 | 68.51| 46.64 | 91.61 | ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We present the hardware type based on the associated paper. - **Hardware Type:** TPU v3-8 - **Hours used:** Unknown - **Cloud Provider:** Unknown - **Compute Region:** Unknown - **Carbon Emitted:** Unknown ## Technical Specifications See the associated paper for details on the modeling architecture (BERT), objective, compute infrastructure, and training details. ## Citation Information", + "model_explanation_gemini": "KLUE BERT base is a pre-trained Korean BERT model designed for tasks like classification, semantic similarity, NLI, and NER as part of the KLUE Benchmark." +} \ No newline at end of file diff --git a/data/model_data_json/klue_roberta-base.json b/data/model_data_json/klue_roberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..107c8e196abe49ca149203feb5c28d8931f46fef --- /dev/null +++ b/data/model_data_json/klue_roberta-base.json @@ -0,0 +1,20 @@ +{ + "model_id": "klue/roberta-base", + "downloads": 828132, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "roberta", + "fill-mask", + "korean", + "klue", + "ko", + "arxiv:2105.09680", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko tags: - korean - klue mask_token: \"[MASK]\" widget: - text: 대한민국의 수도는 [MASK] 입니다. --- # KLUE RoBERTa base Pretrained RoBERTa Model on Korean Language. See Github and Paper for more details. ## How to use _NOTE:_ Use BertTokenizer instead of RobertaTokenizer.
(AutoTokenizer will load BertTokenizer) ## BibTeX entry and citation info", + "model_explanation_gemini": "Pretrained RoBERTa model for Korean language tasks, optimized for masked language modeling and compatible with a specific tokenizer." +} \ No newline at end of file diff --git a/data/model_data_json/klue_roberta-large.json b/data/model_data_json/klue_roberta-large.json new file mode 100644 index 0000000000000000000000000000000000000000..79f9d59c688f4718a0a904997036f9baed31d330 --- /dev/null +++ b/data/model_data_json/klue_roberta-large.json @@ -0,0 +1,20 @@ +{ + "model_id": "klue/roberta-large", + "downloads": 133863, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "roberta", + "fill-mask", + "korean", + "klue", + "ko", + "arxiv:2105.09680", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko tags: - korean - klue mask_token: \"[MASK]\" widget: - text: 대한민국의 수도는 [MASK] 입니다. --- # KLUE RoBERTa large Pretrained RoBERTa Model on Korean Language. See Github and Paper for more details. ## How to use _NOTE:_ Use BertTokenizer instead of RobertaTokenizer. (AutoTokenizer will load BertTokenizer) ## BibTeX entry and citation info", + "model_explanation_gemini": "Pretrained RoBERTa model for Korean language tasks, optimized for masked language modeling and understanding Korean text." +} \ No newline at end of file diff --git a/data/model_data_json/kosbu_Llama-3.3-70B-Instruct-AWQ.json b/data/model_data_json/kosbu_Llama-3.3-70B-Instruct-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..8da241a03144b77a80d0c4abf16c6dfa5657ed6a --- /dev/null +++ b/data/model_data_json/kosbu_Llama-3.3-70B-Instruct-AWQ.json @@ -0,0 +1,33 @@ +{ + "model_id": "kosbu/Llama-3.3-70B-Instruct-AWQ", + "downloads": 335536, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "llama-3", + "awq", + "conversational", + "en", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "de", + "base_model:meta-llama/Llama-3.3-70B-Instruct", + "base_model:quantized:meta-llama/Llama-3.3-70B-Instruct", + "license:llama3.3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "region:us" + ], + "description": "--- license: llama3.3 base_model: - meta-llama/Llama-3.3-70B-Instruct language: - en - fr - it - pt - hi - es - th - de tags: - facebook - meta - llama - llama-3 - awq pipeline_tag: text-generation library_name: transformers --- # Llama-3.3-70B-Instruct AWQ 4-Bit Quantized Version This repository provides the AWQ 4-bit quantized version of meta-llama/Llama-3.3-70B-Instruct, originally developed by Meta AI.", + "model_explanation_gemini": "An AWQ 4-bit quantized version of Meta's Llama-3.3-70B-Instruct model optimized for efficient text generation in multiple languages." +} \ No newline at end of file diff --git a/data/model_data_json/kredor_punctuate-all.json b/data/model_data_json/kredor_punctuate-all.json new file mode 100644 index 0000000000000000000000000000000000000000..842bc4165775ab5493e34aeb9d799a5eba60a423 --- /dev/null +++ b/data/model_data_json/kredor_punctuate-all.json @@ -0,0 +1,17 @@ +{ + "model_id": "kredor/punctuate-all", + "downloads": 687830, + "tags": [ + "transformers", + "pytorch", + "xlm-roberta", + "token-classification", + "dataset:wmt/europarl", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit datasets: - wmt/europarl metrics: - f1 - recall - precision --- This is based on Oliver Guhr's work.
The difference is that it is a finetuned xlm-roberta-base instead of an xlm-roberta-large and on twelve languages instead of four. The languages are: English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, Slovenian. ----- report -----

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.99 | 0.99 | 0.99 | 73317475 |
| . | 0.94 | 0.95 | 0.95 | 4484845 |
| , | 0.86 | 0.86 | 0.86 | 6100650 |
| ? | 0.88 | 0.85 | 0.86 | 136479 |
| - | 0.60 | 0.29 | 0.39 | 233630 |
| : | 0.71 | 0.49 | 0.58 | 152424 |
| accuracy | | | 0.98 | 84425503 |
| macro avg | 0.83 | 0.74 | 0.77 | 84425503 |
| weighted avg | 0.98 | 0.98 | 0.98 | 84425503 |

----- confusion matrix -----

| t/p | 0 | . | , | ? | - | : |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| . | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| , | 0.1 | 0.0 | 0.9 | 0.0 | 0.0 | 0.0 |
| ? | 0.0 | 0.1 | 0.0 | 0.8 | 0.0 | 0.0 |
| - | 0.1 | 0.1 | 0.5 | 0.0 | 0.3 | 0.0 |
| : | 0.0 | 0.3 | 0.1 | 0.0 | 0.0 | 0.5 |
", + "model_explanation_gemini": "Finetuned xlm-roberta-base model for punctuation restoration across twelve languages, achieving high precision and recall for common punctuation marks." +} \ No newline at end of file diff --git a/data/model_data_json/kresnik_wav2vec2-large-xlsr-korean.json b/data/model_data_json/kresnik_wav2vec2-large-xlsr-korean.json new file mode 100644 index 0000000000000000000000000000000000000000..69327fa34ff927bd392edddf1f67da88522b493f --- /dev/null +++ b/data/model_data_json/kresnik_wav2vec2-large-xlsr-korean.json @@ -0,0 +1,21 @@ +{ + "model_id": "kresnik/wav2vec2-large-xlsr-korean", + "downloads": 665768, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "speech", + "audio", + "ko", + "dataset:kresnik/zeroth_korean", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko datasets: - kresnik/zeroth_korean tags: - speech - audio - automatic-speech-recognition license: apache-2.0 model-index: - name: 'Wav2Vec2 XLSR Korean' results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Zeroth Korean type: kresnik/zeroth_korean args: clean metrics: - name: Test WER type: wer value: 4.74 - name: Test CER type: cer value: 1.78 --- ## Evaluation on Zeroth-Korean ASR corpus Google Colab notebook (Korean) ### Expected WER: 4.74% ### Expected CER: 1.78%", + "model_explanation_gemini": "Converts Korean speech to text using automatic speech recognition, achieving 4.74% WER and 1.78% CER on the Zeroth Korean dataset." +} \ No newline at end of file diff --git a/data/model_data_json/ku-nlp_deberta-v2-tiny-japanese.json b/data/model_data_json/ku-nlp_deberta-v2-tiny-japanese.json new file mode 100644 index 0000000000000000000000000000000000000000..7a1aea63cc7f9c3ee1eb5c8a497b377deffac213 --- /dev/null +++ b/data/model_data_json/ku-nlp_deberta-v2-tiny-japanese.json @@ -0,0 +1,21 @@ +{ + "model_id": "ku-nlp/deberta-v2-tiny-japanese", + "downloads": 80838, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "deberta-v2", + "fill-mask", + "deberta", + "ja", + "dataset:wikipedia", + "dataset:cc100", + "dataset:oscar", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja license: cc-by-sa-4.0 library_name: transformers tags: - deberta - deberta-v2 - fill-mask datasets: - wikipedia - cc100 - oscar metrics: - accuracy mask_token: \"[MASK]\" widget: - text: \"京都 大学 で 自然 言語 処理 を [MASK] する 。\" --- # Model Card for Japanese DeBERTa V2 tiny ## Model description This is a Japanese DeBERTa V2 tiny model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.
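For kredor/punctuate-all above, whose usage snippet is not included here, a minimal sketch: the checkpoint is an XLM-RoBERTa token classifier, so the generic transformers token-classification pipeline should surface the predicted punctuation label per token (the upstream project also ships its own wrapper).

```python
from transformers import pipeline

# Generic token-classification sketch; labels correspond to the punctuation
# classes in the report above (0 . , ? - :).
punct = pipeline("token-classification", model="kredor/punctuate-all")
print(punct("my name is clara and i live in berkeley california"))
```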
## How to use You can use this model for masked language modeling as follows: You can also fine-tune this model on downstream tasks. ## Tokenization The input text should be segmented into words by Juman++ in advance. Juman++ 2.0.0-rc3 was used for pre-training. Each word is tokenized into subwords by sentencepiece. ## Training data We used the following corpora for pre-training: - Japanese Wikipedia (as of 20221020, 3.2GB, 27M sentences, 1.3M documents) - Japanese portion of CC-100 (85GB, 619M sentences, 66M documents) - Japanese portion of OSCAR (54GB, 326M sentences, 25M documents) Note that we filtered out documents annotated with \"header\", \"footer\", or \"noisy\" tags in OSCAR. Also note that Japanese Wikipedia was duplicated 10 times to make the total size of the corpus comparable to that of CC-100 and OSCAR. As a result, the total size of the training data is 171GB. ## Training procedure We first segmented texts in the corpora into words using Juman++. Then, we built a sentencepiece model with 32000 tokens including words (JumanDIC) and subwords induced by the unigram language model of sentencepiece. We tokenized the segmented corpora into subwords using the sentencepiece model and trained the Japanese DeBERTa model using transformers library. The training took 33 hours using 8 NVIDIA A100-SXM4-40GB GPUs. The following hyperparameters were used during pre-training: - learning_rate: 1e-3 - per_device_train_batch_size: 128 - distributed_type: multi-GPU - num_devices: 8 - gradient_accumulation_steps: 6 - total_train_batch_size: 6,144 - max_seq_length: 512 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06 - lr_scheduler_type: linear schedule with warmup - training_steps: 100,000 - warmup_steps: 10,000 The accuracy of the trained model on the masked language modeling task was 0.593. The evaluation set consists of 5,000 randomly sampled documents from each of the training corpora. ## Acknowledgments This work was supported by Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) through General Collaboration Project no. jh221004, \"Developing a Platform for Constructing and Sharing of Large-Scale Japanese Language Models\". For training models, we used the mdx: a platform for the data-driven future." +} \ No newline at end of file diff --git a/data/model_data_json/kuelumbus_polyBERT.json b/data/model_data_json/kuelumbus_polyBERT.json new file mode 100644 index 0000000000000000000000000000000000000000..d1b83584d9e69847d4f876e08ad2f6a95640b6b1 --- /dev/null +++ b/data/model_data_json/kuelumbus_polyBERT.json @@ -0,0 +1,17 @@ +{ + "model_id": "kuelumbus/polyBERT", + "downloads": 988721, + "tags": [ + "sentence-transformers", + "pytorch", + "deberta-v2", + "feature-extraction", + "sentence-similarity", + "transformers", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers widget: - source_sentence: \"[*]CC[*]\" sentences: - \"[*]COC[*]\" - \"[*]CC(C)C[*]\" --- # kuelumbus/polyBERT This is polyBERT: A chemical language model to enable fully machine-driven ultrafast polymer informatics. polyBERT maps PSMILES strings to 600 dimensional dense fingerprints. The fingerprints numerically represent polymer chemical structures. Please see the license agreement in the LICENSE file. 
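A minimal masked-language-modeling sketch for ku-nlp/deberta-v2-tiny-japanese, using the card's widget text; per the card, the input must already be segmented into words with Juman++ (the widget sentence is).

```python
from transformers import pipeline

# Fill-mask sketch; note the pre-segmented (Juman++) input required by the card.
fill = pipeline("fill-mask", model="ku-nlp/deberta-v2-tiny-japanese")
print(fill("京都 大学 で 自然 言語 処理 を [MASK] する 。"))
```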
## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. ## Evaluation Results See the GitHub repo and the paper on arXiv. ## Full Model Architecture ## Citing & Authors Kuenneth, C., Ramprasad, R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat Commun 14, 4099 (2023).", + "model_explanation_gemini": "Generates 600-dimensional chemical structure fingerprints from PSMILES strings for ultrafast polymer informatics." +} \ No newline at end of file diff --git a/data/model_data_json/kykim_bert-kor-base.json b/data/model_data_json/kykim_bert-kor-base.json new file mode 100644 index 0000000000000000000000000000000000000000..869960ac822fb15326f06d5bee596236e6c99f35 --- /dev/null +++ b/data/model_data_json/kykim_bert-kor-base.json @@ -0,0 +1,18 @@ +{ + "model_id": "kykim/bert-kor-base", + "downloads": 96982, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "ko", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko --- # Bert base model for Korean * 70GB Korean text dataset and 42000 lower-cased subwords are used * Check the model performance and other language models for Korean in github", + "model_explanation_gemini": "Korean BERT base model trained on 70GB of Korean text with 42,000 lower-cased subwords for Korean language tasks." +} \ No newline at end of file diff --git a/data/model_data_json/kyutai_mimi.json b/data/model_data_json/kyutai_mimi.json new file mode 100644 index 0000000000000000000000000000000000000000..56875e9a1df14c835cc3e5a43a56960fe298b42f --- /dev/null +++ b/data/model_data_json/kyutai_mimi.json @@ -0,0 +1,16 @@ +{ + "model_id": "kyutai/mimi", + "downloads": 414620, + "tags": [ + "transformers", + "safetensors", + "mimi", + "feature-extraction", + "audio", + "license:cc-by-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-4.0 library_name: transformers tags: - mimi - audio --- # Model Card for Mimi Mimi codec is a state-of-the-art audio neural codec, developed by Kyutai, that combines semantic and acoustic information into audio tokens running at 12Hz and a bitrate of 1.1kbps. ## Model Details ### Model Description Mimi is a high-fidelity audio codec leveraging neural networks. It introduces a streaming encoder-decoder architecture with quantized latent space, trained in an end-to-end fashion. It was trained on speech data, which makes it particularly adapted to train speech language models or text-to-speech systems. - **Developed by:** Kyutai - **Model type:** Audio codec - **Audio types:** Speech - **License:** CC-BY ### Model Sources - **Repository:** repo - **Paper:** paper - **Demo:** demo ## Uses ## How to Get Started with the Model ### Usage with Transformers Use the following code to get started with the Mimi model using a dummy example from the LibriSpeech dataset (~9MB). First, install the required Python packages: Then load an audio sample, and run a forward pass of the model: ### Usage with Moshi See the main README file. ### Direct Use Mimi can be used directly as an audio codec for real-time compression and decompression of speech signals.
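A minimal sentence-transformers sketch for kuelumbus/polyBERT, using the PSMILES strings from the card's widget:

```python
from sentence_transformers import SentenceTransformer

# Encode PSMILES strings into dense polymer fingerprints.
polyBERT = SentenceTransformer("kuelumbus/polyBERT")
fingerprints = polyBERT.encode(["[*]CC[*]", "[*]COC[*]", "[*]CC(C)C[*]"])
print(fingerprints.shape)  # (3, 600): one 600-dim fingerprint per polymer
```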
It provides high-quality audio compression and efficient decoding. ### Out-of-Scope Use The model is not intended to be used to impersonate other people or for any malicious use of any kind. ## Bias, Risks, and Limitations The model has been trained with a few safeguards to try to limit potential toxic usages, however our toxicity analysis shows that it behaves in the middle of existing models with respect to textual generation. It has some bias towards certain domains and topics that are over-represented in the training data. Its capabilities are relatively limited so far and it is trained to produce only one voice to avoid impersonation. Yet, we need the perspective in time to establish the sociotechnical limitations. ## Training Details ### Training Data The training data is detailed in the paper. ### Training procedure and hyper-parameters The different stages of the training procedure are detailed in the paper along with the hyper-parameters. ## Citation ## Model Card Authors Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, Neil Zeghidour, Yoach Lacombe", + "model_explanation_gemini": "Kyutai's Mimi is a neural audio codec designed for high-fidelity speech compression and decompression at 12Hz with 1.1kbps bitrate, optimized for training speech language models and text-to-speech systems." +} \ No newline at end of file diff --git a/data/model_data_json/kyutai_moshiko-pytorch-bf16.json b/data/model_data_json/kyutai_moshiko-pytorch-bf16.json new file mode 100644 index 0000000000000000000000000000000000000000..c7d76e12954c163bfc232b08cfea924dda33fa50 --- /dev/null +++ b/data/model_data_json/kyutai_moshiko-pytorch-bf16.json @@ -0,0 +1,13 @@ +{ + "model_id": "kyutai/moshiko-pytorch-bf16", + "downloads": 165018, + "tags": [ + "moshi", + "safetensors", + "en", + "license:cc-by-4.0", + "region:us" + ], + "description": "--- # For reference on model card metadata, see the spec: # Doc / guide: license: cc-by-4.0 language: - en library_name: moshi --- # Model Card for Moshi Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. ## Model Details PyTorch version quantized in bf16 precision. ### Model Description Moshi is a speech-text foundation model that casts spoken dialogue as speech-to-speech generation. Starting from a text language model backbone, Moshi generates speech as tokens from the residual quantizer of a neural audio codec, while modeling separately its own speech and that of the user into parallel streams. This allows for the removal of explicit speaker turns, and the modeling of arbitrary conversational dynamics. Moshi also predicts time-aligned text tokens as a prefix to audio tokens. This “Inner Monologue” method significantly improves the linguistic quality of generated speech and provides streaming speech recognition and text-to-speech. As a result, Moshi is the first real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice. - **Developed by:** Kyutai - **Model type:** Multimodal speech-text foundation model - **Language(s) (NLP):** English - **License:** CC-BY ### Model Sources - **Repository:** repo - **Paper:** paper - **Demo:** demo ## Uses ### Direct Use The model can be used as a conversational agent for casual conversations, basic facts and advice (e.g. recipes, trivia), roleplay, etc. However, the model has limited abilities for complex tasks and cannot access tools, but rather focuses on natural, low-latency interactions.
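A sketch of the Mimi encode/decode round trip described in the kyutai/mimi card above; the dummy LibriSpeech split used here is an assumption based on the "~9MB dummy example" the card mentions.

```python
from datasets import load_dataset
from transformers import AutoFeatureExtractor, MimiModel

# Assumed dummy dataset, matching the small LibriSpeech sample the card cites.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean",
                  split="validation")
model = MimiModel.from_pretrained("kyutai/mimi")
feature_extractor = AutoFeatureExtractor.from_pretrained("kyutai/mimi")

audio = ds[0]["audio"]["array"]
inputs = feature_extractor(raw_audio=audio,
                           sampling_rate=feature_extractor.sampling_rate,
                           return_tensors="pt")
codes = model.encode(inputs["input_values"]).audio_codes  # 12Hz token streams
reconstruction = model.decode(codes)[0]                   # back to a waveform
```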
### Downstream Use Some components of the model can be used independently or repurposed relatively easily. For instance the Mimi codec is a state-of-the-art audio neural codec that combines semantic and acoustic information into audio tokens running at 12Hz and a bitrate of 1.1kbps, which makes it particularly adapted to train speech language models or text-to-speech systems. Regarding the main Moshi architecture, other downstream use cases would require some finetuning / domain adaptation. ### Out-of-Scope Use The model is not intended to be used to impersonate other people or for any malicious use of any kind. This model is for research only and we do not recommend it for providing advice or performing any professional duty. ## Bias, Risks, and Limitations The model has been trained with a few safeguards to try to limit potential toxic usages, however our toxicity analysis shows that it behaves in the middle of existing models with respect to textual generation. It has some bias towards certain domains and topics that are over-represented in the training data. Its capabilities are relatively limited so far and it is trained to produce only one voice to avoid impersonation. Yet, we need the perspective in time to establish the sociotechnical limitations. ## How to Get Started with the Model See the main README file. ## Training Details ### Training Data - Textual data: The underlying Helium model is trained on a mix of data, more precisely: - 12.5% is high-quality data sources from the following curated sources: Wikipedia, Wikibooks, Wikisource, Wikinews, StackExchange and the collection of scientific articles pes2o. For Wikipedia, we use five different dumps from 2017, 2018, 2019, 2021 and 2022. - 87.5% is filtered web data from CommonCrawl, using the following crawls: 2018-30, 2019-04, 2019-30, 2020-05, 2020-34, 2021-04, 2021-31, 2022-05, 2022-33, 2023-40. - Audio data - **Unsupervised audio dataset:** used for pre-training, this is a collection of 7 million hours of readily available audio content, which consists mostly of English speech. This training set is transcribed with Whisper (large v3 model). - **The Fisher dataset:** used to enable multi-stream. It consists of 2000 hours of phone conversations at 8kHz from Fisher, which we upsample to 24kHz using AudioSR. - **Supervised multi-stream dataset:** A dataset of 170 hours of natural and scripted conversation between multiple pairs of participants, collected by Kyutai. This dataset is used to train the TTS system used to create synthetic data. - **Synthetic data:** 20,000 hours of synthetic data generated by our TTS system, and simulating a dialogue between Moshi and a user. ### Training procedure and hyper-parameters The different stages of the training procedure are detailed in the paper along with the hyper-parameters. ### Compute Infrastructure The training was performed on 127 DGX nodes provided by Scaleway, accounting for 1016 H100 Nvidia GPUs. ## Citation ## Model Card Authors Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, Neil Zeghidour", + "model_explanation_gemini": "Kyutai's Moshi is a real-time, full-duplex spoken dialogue model that generates speech and text simultaneously for low-latency conversational interactions."
+} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-ViT-B-16-laion2B-s34B-b88K.json b/data/model_data_json/laion_CLIP-ViT-B-16-laion2B-s34B-b88K.json new file mode 100644 index 0000000000000000000000000000000000000000..a08c91db88510beb91707d2fa89de78527a3c20e --- /dev/null +++ b/data/model_data_json/laion_CLIP-ViT-B-16-laion2B-s34B-b88K.json @@ -0,0 +1,14 @@ +{ + "model_id": "laion/CLIP-ViT-B-16-laion2B-s34B-b88K", + "downloads": 1994941, + "tags": [ + "open_clip", + "safetensors", + "zero-shot-image-classification", + "arxiv:1910.04867", + "license:mit", + "region:us" + ], + "description": "--- license: mit pipeline_tag: zero-shot-image-classification library_name: open_clip --- # Model Card for CLIP ViT-B/16 - LAION-2B # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation # Model Details ## Model Description A CLIP ViT-B/16 model trained with the LAION-2B English subset of LAION-5B ( using OpenCLIP ( Model training done by Mehdi Cherti on the JUWELS Booster supercomputer. See acknowledgements below. # Uses As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog ( and upcoming paper include additional discussion as it relates specifically to the training dataset. ## Direct Use Zero-shot image classification, image and text retrieval, among others. ## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. ## Out-of-Scope Use As per the OpenAI models, **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task specific testing especially given the variability of CLIP's performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases. Further to the above notice, the LAION-5B dataset used in training of these models has additional considerations, see below. # Training Details ## Training Data This model was trained with the 2 Billion sample English subset of LAION-5B ( **IMPORTANT NOTE:** The motivation behind dataset creation is to democratize research and experimentation around large-scale multi-modal model training and handling of uncurated, large-scale datasets crawled from publicly available internet.
Our recommendation is therefore to use the dataset for research purposes. Be aware that this large-scale dataset is uncurated. Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer. Therefore, please use the demo links with caution and at your own risk. It is possible to extract a “safe” subset by filtering out samples based on the safety tags (using a customized trained NSFW classifier that we built). While this strongly reduces the chance for encountering potentially harmful content when viewing, we cannot entirely exclude the possibility for harmful content being still present in safe mode, so that the warning holds also there. We think that providing the dataset openly to broad research and other interested communities will allow for transparent investigation of benefits that come along with training large-scale models as well as pitfalls and dangers that may stay unreported or unnoticed when working with closed large datasets that remain restricted to a small community. Providing our dataset openly, we however do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress. ## Training Procedure TODO # Evaluation Evaluation done with code in the LAION CLIP Benchmark suite. ## Testing Data, Factors & Metrics ### Testing Data The testing is performed with VTAB+ (A combination of VTAB ( w/ additional robustness datasets) for classification and COCO and Flickr for retrieval. ## Results The model achieves a 70.2 zero-shot top-1 accuracy on ImageNet-1k. An initial round of benchmarks have been performed on a wider range of datasets, currently viewable at # Acknowledgements Acknowledging the Gauss Centre for Supercomputing e.V. ( for funding this part of work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC). # Citation **BibTeX:** LAION-5B OpenAI CLIP paper OpenCLIP software", + "model_explanation_gemini": "Performs zero-shot image classification and text-image retrieval using a CLIP ViT-B/16 architecture trained on the LAION-2B English dataset." +} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-ViT-B-32-laion2B-s34B-b79K.json b/data/model_data_json/laion_CLIP-ViT-B-32-laion2B-s34B-b79K.json new file mode 100644 index 0000000000000000000000000000000000000000..52e4c2820399b5968067c2a3f3cd7a5014652a66 --- /dev/null +++ b/data/model_data_json/laion_CLIP-ViT-B-32-laion2B-s34B-b79K.json @@ -0,0 +1,16 @@ +{ + "model_id": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "downloads": 1030278, + "tags": [ + "open_clip", + "pytorch", + "safetensors", + "clip", + "zero-shot-image-classification", + "arxiv:1910.04867", + "license:mit", + "region:us" + ], + "description": "--- license: mit widget: - src: >- candidate_labels: playing music, playing sports example_title: Cat & Dog pipeline_tag: zero-shot-image-classification --- # Model Card for CLIP ViT-B/32 - LAION-2B # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation 7. 
How To Get Started With the Model # Model Details ## Model Description A CLIP ViT-B/32 model trained with the LAION-2B English subset of LAION-5B ( using OpenCLIP ( Model training done by Romain Beaumont on the stability.ai cluster. # Uses As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog ( and upcoming paper include additional discussion as it relates specifically to the training dataset. ## Direct Use Zero-shot image classification, image and text retrieval, among others. ## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. ## Out-of-Scope Use As per the OpenAI models, **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task specific testing especially given the variability of CLIP's performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases. Further to the above notice, the LAION-5B dataset used in training of these models has additional considerations, see below. # Training Details ## Training Data This model was trained with the 2 Billion sample English subset of LAION-5B ( **IMPORTANT NOTE:** The motivation behind dataset creation is to democratize research and experimentation around large-scale multi-modal model training and handling of uncurated, large-scale datasets crawled from publicly available internet. Our recommendation is therefore to use the dataset for research purposes. Be aware that this large-scale dataset is uncurated. Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer. Therefore, please use the demo links with caution and at your own risk. It is possible to extract a “safe” subset by filtering out samples based on the safety tags (using a customized trained NSFW classifier that we built). While this strongly reduces the chance for encountering potentially harmful content when viewing, we cannot entirely exclude the possibility for harmful content being still present in safe mode, so that the warning holds also there.
We think that providing the dataset openly to broad research and other interested communities will allow for transparent investigation of benefits that come along with training large-scale models as well as pitfalls and dangers that may stay unreported or unnoticed when working with closed large datasets that remain restricted to a small community. Providing our dataset openly, we however do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress. ## Training Procedure Please see training notes and wandb logs. # Evaluation Evaluation done with code in the LAION CLIP Benchmark suite. ## Testing Data, Factors & Metrics ### Testing Data The testing is performed with VTAB+ (A combination of VTAB ( w/ additional robustness datasets) for classification and COCO and Flickr for retrieval. **TODO** - more detail ## Results The model achieves a 66.6 zero-shot top-1 accuracy on ImageNet-1k. An initial round of benchmarks have been performed on a wider range of datasets, currently viewable at **TODO** - create table for just this model's metrics. # Acknowledgements Acknowledging stability.ai for the compute used to train this model. # Citation **BibTeX:** In addition to forthcoming LAION-5B ( paper, please cite: OpenAI CLIP paper OpenCLIP software # How to Get Started with the Model Use the code below to get started with the model. ** TODO ** - Hugging Face transformers, OpenCLIP, and timm getting started snippets", + "model_explanation_gemini": "Performs zero-shot image classification and text-to-image retrieval by matching images with text descriptions without task-specific training." +} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-ViT-B-32-roberta-base-laion2B-s12B-b32k.json b/data/model_data_json/laion_CLIP-ViT-B-32-roberta-base-laion2B-s12B-b32k.json new file mode 100644 index 0000000000000000000000000000000000000000..21550a830328bc47a3341ea571089fbf2a65645f --- /dev/null +++ b/data/model_data_json/laion_CLIP-ViT-B-32-roberta-base-laion2B-s12B-b32k.json @@ -0,0 +1,12 @@ +{ + "model_id": "laion/CLIP-ViT-B-32-roberta-base-laion2B-s12B-b32k", + "downloads": 152942, + "tags": [ + "open_clip", + "arxiv:1910.04867", + "license:mit", + "region:us" + ], + "description": "--- license: mit widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # Model Card for CLIP ViT-B/32 roberta base - LAION-2B # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation 7. How To Get Started With the Model # Model Details ## Model Description A CLIP ViT-B/32 roberta base model trained with the LAION-2B English subset of LAION-5B ( using OpenCLIP ( Model training done by Romain Beaumont on the stability.ai cluster. # Uses ## Direct Use Zero-shot image classification, image and text retrieval, among others. ## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. # Training Details ## Training Data This model was trained with the 2 Billion sample English subset of LAION-5B ( ## Training Procedure Training with batch size 32k for 12B sample of laion2B-en, see Model is B/32 on visual side, roberta base initialized with pretrained weights on text side. # Evaluation Evaluation done with code in the LAION CLIP Benchmark suite. 
## Testing Data, Factors & Metrics ### Testing Data The testing is performed with VTAB+ (A combination of VTAB ( w/ additional robustness datasets) for classification and COCO and Flickr for retrieval. ## Results The model achieves * imagenet 1k 61.7% (vs 62.9% for baseline) * mscoco 63% (vs 60.8% for baseline) * flickr30k 86.7% (vs 85.4% for baseline) !metrics # Acknowledgements Acknowledging stability.ai for the compute used to train this model. # Citation **BibTeX:** In addition to forthcoming LAION-5B ( paper, please cite: OpenAI CLIP paper OpenCLIP software # How To Get Started With the Model", + "model_explanation_gemini": "Performs zero-shot image classification and text-image retrieval by matching visual and textual representations using a CLIP ViT-B/32 and roberta base architecture trained on LAION-2B data." +} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-ViT-H-14-laion2B-s32B-b79K.json b/data/model_data_json/laion_CLIP-ViT-H-14-laion2B-s32B-b79K.json new file mode 100644 index 0000000000000000000000000000000000000000..48cc7b7c5a4a7bf17757b97a05c79a7a625a9719 --- /dev/null +++ b/data/model_data_json/laion_CLIP-ViT-H-14-laion2B-s32B-b79K.json @@ -0,0 +1,16 @@ +{ + "model_id": "laion/CLIP-ViT-H-14-laion2B-s32B-b79K", + "downloads": 1795020, + "tags": [ + "open_clip", + "pytorch", + "safetensors", + "clip", + "zero-shot-image-classification", + "arxiv:1910.04867", + "license:mit", + "region:us" + ], + "description": "--- license: mit widget: - src: >- candidate_labels: playing music, playing sports example_title: Cat & Dog library_name: open_clip pipeline_tag: zero-shot-image-classification --- # Model Card for CLIP ViT-H/14 - LAION-2B # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation 7. How To Get Started With the Model # Model Details ## Model Description A CLIP ViT-H/14 model trained with the LAION-2B English subset of LAION-5B ( using OpenCLIP ( Model training done by Romain Beaumont on the stability.ai cluster. # Uses As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such model. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog ( and upcoming paper include additional discussion as it relates specifically to the training dataset. ## Direct Use Zero-shot image classification, image and text retrieval, among others. ## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. ## Out-of-Scope Use As per the OpenAI models, **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task specific testing especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. 
Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases. Further to the above notice, the LAION-5B dataset used in training of these models has additional considerations, see below. # Training Details ## Training Data This model was trained with the 2 Billion sample English subset of LAION-5B ( **IMPORTANT NOTE:** The motivation behind dataset creation is to democratize research and experimentation around large-scale multi-modal model training and handling of uncurated, large-scale datasets crawled from publicly available internet. Our recommendation is therefore to use the dataset for research purposes. Be aware that this large-scale dataset is uncurated. Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer. Therefore, please use the demo links with caution and at your own risk. It is possible to extract a “safe” subset by filtering out samples based on the safety tags (using a customized trained NSFW classifier that we built). While this strongly reduces the chance for encountering potentially harmful content when viewing, we cannot entirely exclude the possibility for harmful content being still present in safe mode, so that the warning holds also there. We think that providing the dataset openly to broad research and other interested communities will allow for transparent investigation of benefits that come along with training large-scale models as well as pitfalls and dangers that may stay unreported or unnoticed when working with closed large datasets that remain restricted to a small community. Providing our dataset openly, we however do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress. ## Training Procedure Please see training notes and wandb logs. # Evaluation Evaluation done with code in the LAION CLIP Benchmark suite. ## Testing Data, Factors & Metrics ### Testing Data The testing is performed with VTAB+ (A combination of VTAB ( w/ additional robustness datasets) for classification and COCO and Flickr for retrieval. **TODO** - more detail ## Results The model achieves a 78.0 zero-shot top-1 accuracy on ImageNet-1k. An initial round of benchmarks has been performed on a wider range of datasets, currently viewable at **TODO** - create table for just this model's metrics. # Acknowledgements Acknowledging stability.ai for the compute used to train this model. # Citation **BibTeX:** LAION-5B OpenAI CLIP paper OpenCLIP software # How to Get Started with the Model Use the code below to get started with the model. ** TODO ** - Hugging Face transformers, OpenCLIP, and timm getting started snippets", + "model_explanation_gemini": "Performs zero-shot image classification and text-to-image retrieval by matching images with text descriptions without task-specific training."
+} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.json b/data/model_data_json/laion_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.json new file mode 100644 index 0000000000000000000000000000000000000000..6610a30862cadd761fdf191fcfa051bda40afd36 --- /dev/null +++ b/data/model_data_json/laion_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.json @@ -0,0 +1,16 @@ +{ + "model_id": "laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K", + "downloads": 619050, + "tags": [ + "open_clip", + "pytorch", + "clip", + "zero-shot-image-classification", + "dataset:mlfoundations/datacomp_pools", + "arxiv:2304.14108", + "license:mit", + "region:us" + ], + "description": "--- license: mit widget: - src: >- candidate_labels: playing music, playing sports example_title: Cat & Dog library_name: open_clip datasets: - mlfoundations/datacomp_pools pipeline_tag: zero-shot-image-classification --- # Model card for CLIP ViT-L-14 trained on DataComp-1B # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation 7. How To Get Started With the Model # Model Details ## Model Description A CLIP ViT-L/14 model trained with DataComp-1B, using OpenCLIP. Model training done on the stability.ai cluster. # Uses As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the DataComp paper includes additional discussion as it relates specifically to the training dataset. ## Direct Use Zero-shot image classification, image and text retrieval, among others. ## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. ## Out-of-Scope Use As per the OpenAI models, **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task-specific testing, especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of the performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently, given the lack of testing norms and checks to ensure its fair use. # Training Details ## Training Data This model was trained with the 1.4 billion samples of the DataComp-1B dataset. **IMPORTANT NOTE:** The motivation behind dataset creation is to democratize research and experimentation around large-scale multi-modal model training and handling of uncurated, large-scale datasets crawled from the publicly available internet. Our recommendation is therefore to use the dataset for research purposes. Be aware that this large-scale dataset is uncurated.
Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer. Therefore, please use the demo links with caution and at your own risk. It is possible to extract a “safe” subset by filtering out samples based on the safety tags (using a customized trained NSFW classifier that we built). While this strongly reduces the chance of encountering potentially harmful content when viewing, we cannot entirely exclude the possibility of harmful content still being present in safe mode, so the warning holds there as well. We think that providing the dataset openly to broad research and other interested communities will allow for transparent investigation of the benefits that come along with training large-scale models, as well as of pitfalls and dangers that may stay unreported or unnoticed when working with closed large datasets that remain restricted to a small community. While we provide our dataset openly, we do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress. ## Training Procedure Please see # Evaluation Evaluation done on 38 datasets, using the DataComp repo and the LAION CLIP Benchmark. ## Testing Data, Factors & Metrics ### Testing Data The testing is performed on a suite of 38 datasets. See our paper for more details. ## Results The model achieves a 79.2% zero-shot top-1 accuracy on ImageNet-1k. See our paper for more details and results. # Acknowledgements Acknowledging stability.ai for the compute used to train this model. # Citation **BibTeX:** DataComp OpenAI CLIP paper OpenCLIP software # How to Get Started with the Model See", + "model_explanation_gemini": "Performs zero-shot image classification and text-to-image retrieval using a CLIP ViT-L/14 architecture trained on the DataComp-1B dataset." +} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-ViT-bigG-14-laion2B-39B-b160k.json b/data/model_data_json/laion_CLIP-ViT-bigG-14-laion2B-39B-b160k.json new file mode 100644 index 0000000000000000000000000000000000000000..b6290a3a5d1c1e3bd5028c6ffbc33d5c40ac0590 --- /dev/null +++ b/data/model_data_json/laion_CLIP-ViT-bigG-14-laion2B-39B-b160k.json @@ -0,0 +1,17 @@ +{ + "model_id": "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k", + "downloads": 1780499, + "tags": [ + "open_clip", + "pytorch", + "safetensors", + "clip", + "zero-shot-image-classification", + "arxiv:1910.04867", + "arxiv:2212.07143", + "license:mit", + "region:us" + ], + "description": "--- license: mit widget: - src: >- candidate_labels: playing music, playing sports example_title: Cat & Dog library_name: open_clip pipeline_tag: zero-shot-image-classification --- # Model Card for CLIP ViT-bigG/14 - LAION-2B # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation 7. How To Get Started With the Model # Model Details ## Model Description A CLIP ViT-bigG/14 model trained with the LAION-2B English subset of LAION-5B, using OpenCLIP. Model training done by Mitchell Wortsman on the stability.ai cluster. The license for this model is MIT. # Uses As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification.
We also hope it can be used for interdisciplinary studies of the potential impact of such models. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog and upcoming paper include additional discussion as it relates specifically to the training dataset. ## Direct Use Zero-shot image classification, image and text retrieval, among others. ## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. ## Out-of-Scope Use As per the OpenAI models, **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task-specific testing, especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of the performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently, given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases. Beyond the above notice, the LAION-5B dataset used in the training of these models has additional considerations, see below. # Training Details ## Training Data This model was trained with the 2 billion sample English subset of LAION-5B. Fine-tuning was also partially done on LAION-A, a 900M subset of LAION-2B filtered with aesthetic V2 4.5+ and pHash deduplicated. **IMPORTANT NOTE:** The motivation behind dataset creation is to democratize research and experimentation around large-scale multi-modal model training and handling of uncurated, large-scale datasets crawled from the publicly available internet. Our recommendation is therefore to use the dataset for research purposes. Be aware that this large-scale dataset is uncurated. Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer. Therefore, please use the demo links with caution and at your own risk. It is possible to extract a “safe” subset by filtering out samples based on the safety tags (using a customized trained NSFW classifier that we built). While this strongly reduces the chance of encountering potentially harmful content when viewing, we cannot entirely exclude the possibility of harmful content still being present in safe mode, so the warning holds there as well. We think that providing the dataset openly to broad research and other interested communities will allow for transparent investigation of the benefits that come along with training large-scale models, as well as of pitfalls and dangers that may stay unreported or unnoticed when working with closed large datasets that remain restricted to a small community.
While we provide our dataset openly, we do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress. ## Training Procedure The training procedure will soon be discussed in a blog post on laion.ai. # Evaluation Evaluation done with code in the LAION CLIP Benchmark suite. ## Testing Data, Factors & Metrics ### Testing Data The testing is performed with VTAB+ (a combination of VTAB with additional robustness datasets) for classification, and COCO and Flickr for retrieval. **TODO** - more detail ## Results The model achieves an 80.1% zero-shot top-1 accuracy on ImageNet-1k. An initial round of benchmarks has been performed on a wider range of datasets, and will soon be visible at **TODO** - create table for just this model's metrics. # Acknowledgements Acknowledging stability.ai for the compute used to train this model. # Citation **BibTeX:** LAION-5B OpenAI CLIP paper OpenCLIP software Scaling OpenCLIP paper # How to Get Started with the Model Use the code below to get started with the model. ** TODO ** - Hugging Face transformers, OpenCLIP, and timm getting started snippets", + "model_explanation_gemini": "Performs zero-shot image classification and text-to-image retrieval by understanding and matching visual and textual content." +} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-ViT-g-14-laion2B-s34B-b88K.json b/data/model_data_json/laion_CLIP-ViT-g-14-laion2B-s34B-b88K.json new file mode 100644 index 0000000000000000000000000000000000000000..e498998a7aedb0b79d26067bc732077025bbf8a4 --- /dev/null +++ b/data/model_data_json/laion_CLIP-ViT-g-14-laion2B-s34B-b88K.json @@ -0,0 +1,15 @@ +{ + "model_id": "laion/CLIP-ViT-g-14-laion2B-s34B-b88K", + "downloads": 93417, + "tags": [ + "open_clip", + "safetensors", + "zero-shot-image-classification", + "clip", + "arxiv:1910.04867", + "license:mit", + "region:us" + ], + "description": "--- tags: - zero-shot-image-classification - clip library_tag: open_clip license: mit pipeline_tag: zero-shot-image-classification --- # Model card for CLIP-ViT-g-14-laion2B-s34B-b88K # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation 7. How To Get Started With the Model # Model Details ## Model Description A CLIP ViT-g/14 model trained with the LAION-2B English subset of LAION-5B, using OpenCLIP. Model training done by Jenia Jitsev on JUWELS Booster at Juelich Supercomputing Center and on the stability.ai AWS HPC cluster. Training was performed as part of reproducible scaling-law studies, published as a research paper at CVPR 2023. See also the research repository. # Uses As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog and LAION-5B NeurIPS paper include additional discussion as it relates specifically to the training dataset. ## Direct Use Zero-shot image classification, image and text retrieval, among others.
## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. ## Out-of-Scope Use As per the OpenAI models, **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task-specific testing, especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of the performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently, given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases. Beyond the above notice, the LAION-5B dataset used in the training of these models has additional considerations, see below. # Training Details ## Training Data This model was trained with the 2 billion sample English subset of LAION-5B. **IMPORTANT NOTE:** The motivation behind dataset creation is to democratize research and experimentation around large-scale multi-modal model training and handling of uncurated, large-scale datasets crawled from the publicly available internet. Our recommendation is therefore to use the dataset for research purposes. Be aware that this large-scale dataset is uncurated. Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer. Therefore, please use the demo links with caution and at your own risk. It is possible to extract a “safe” subset by filtering out samples based on the safety tags (using a customized trained NSFW classifier that we built). While this strongly reduces the chance of encountering potentially harmful content when viewing, we cannot entirely exclude the possibility of harmful content still being present in safe mode, so the warning holds there as well. We think that providing the dataset openly to broad research and other interested communities will allow for transparent investigation of the benefits that come along with training large-scale models, as well as of pitfalls and dangers that may stay unreported or unnoticed when working with closed large datasets that remain restricted to a small community. While we provide our dataset openly, we do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress. ## Training Procedure The OpenCLIP ViT-g/14 model was trained on 34.5B samples (135M * 256 checkpoints) from the laion2b-en (part of LAION-5B) dataset. Warmup = 13.5k steps, learning rate = 1e-3, cosine annealing schedule, weight decay = 0.2. Global batch size = 88800, number of GPUs = 1480, local batch size = 60. # Evaluation Evaluation done with code in the LAION CLIP Benchmark suite.
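As a quick consistency check (not part of the source card), the training-procedure numbers quoted above line up arithmetically:

```python
# Sanity check of the ViT-g/14 figures above: 135M samples/checkpoint x 256
# checkpoints, and 1480 GPUs x local batch 60. The ~389k optimizer-step count
# is derived here, not stated in the card.
samples = 135_000_000 * 256      # 34_560_000_000, i.e. the quoted "34.5B samples"
global_batch = 1480 * 60         # 88_800, matches the stated global batch size
steps = samples // global_batch  # ~389_189 optimizer steps
print(f"{samples / 1e9:.2f}B samples, global batch {global_batch}, ~{steps:,} steps")
```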
## Testing Data, Factors & Metrics ### Testing Data The testing is performed with VTAB+ (a combination of VTAB with additional robustness datasets) for classification, and COCO and Flickr for retrieval. **TODO** - more detail ## Results The model achieves a 78.4% zero-shot top-1 accuracy on ImageNet-1k. An initial round of benchmarks has been performed on a wider range of datasets, currently viewable at **TODO** - create table for just this model's metrics. # Acknowledgements We gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding the work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC). We also acknowledge storage resources on JUST granted and operated by JSC, as well as computing resources from the Helmholtz Data Federation (HDF). We further acknowledge stability.ai for providing additional compute used to train this model. # Citation **BibTeX:** Please cite: LAION-5B paper Reproducible scaling laws for openCLIP paper OpenAI CLIP paper OpenCLIP software # How to Get Started with the Model Use the code below to get started with the model. ** TODO ** - Hugging Face transformers, OpenCLIP, and timm getting started snippets", + "model_explanation_gemini": "Performs zero-shot image classification and text-image retrieval by leveraging CLIP architecture trained on the LAION-2B English dataset." +} \ No newline at end of file diff --git a/data/model_data_json/laion_CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup.json b/data/model_data_json/laion_CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup.json new file mode 100644 index 0000000000000000000000000000000000000000..1333cf57571a46323b2ff62aa5ab66681a42b399 --- /dev/null +++ b/data/model_data_json/laion_CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup.json @@ -0,0 +1,17 @@ +{ + "model_id": "laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup", + "downloads": 81153, + "tags": [ + "open_clip", + "tensorboard", + "safetensors", + "zero-shot-image-classification", + "clip", + "arxiv:2201.03545", + "arxiv:2210.08402", + "arxiv:1910.04867", + "license:mit", + "region:us" + ], + "description": "--- tags: - zero-shot-image-classification - clip license: mit library_name: open_clip pipeline_tag: zero-shot-image-classification --- # Model card for CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup # Table of Contents 1. Model Details 2. Uses 3. Training Details 4. Evaluation 5. Acknowledgements 6. Citation # Model Details ## Model Description A series of CLIP ConvNeXt-Large (w/ extra text depth, vision MLP head) models trained on the LAION-2B (English) subset of LAION-5B using OpenCLIP. The models utilize: * the timm ConvNeXt-Large model as the image tower * an MLP head in the vision tower instead of the single projection of other CLIP models * a text tower with the same width but 4 more layers of depth than the ViT-L / RN50x16 models (depth 16, embed dim 768). This 320x320 resolution model is a soup (weight average) of 3 fine-tunes of CLIP-convnext_large_d.laion2B-s26B-b102K-augreg at a higher resolution. It is an average of 3 fine-tunes from the final checkpoint of the original 256x256 training run, with an additional ~2-3B samples for each fine-tune and a lower learning rate. Each fine-tune used a different learning rate (1e-4, 6e-5, 5e-5) and a different number of samples (3.2B, 2B, 2.5B).
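The soup construction just described is plain uniform weight averaging; a minimal sketch under that assumption (the checkpoint file names are hypothetical placeholders, not the released artifacts):

```python
# Uniform "model soup": average the weights of the three fine-tunes described
# above. File names are placeholders for local checkpoints saved as state dicts.
import torch

paths = ["ft_lr1e-4.pt", "ft_lr6e-5.pt", "ft_lr5e-5.pt"]
state_dicts = [torch.load(p, map_location="cpu") for p in paths]

soup = {
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}
torch.save(soup, "convnext_large_d_320_ft_soup.pt")
```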
At 320x320, the ConvNeXt-Large-D is significantly more efficient than the L/14 model at 336x336 that OpenAI fine-tuned: the L/14-336 model uses 2.5x the GMACs and 2.8x the activations, and has 1.22x more parameters. | Model | Dataset | Resolution | AugReg | Top-1 ImageNet Zero-Shot (%) | | ----- | ------- | ---------- | ------------ | --------- | | convnext_large_d.laion2b_s26b_b102k-augreg | LAION-2B | 256x256 | RRC (0.33, 1.0), RE (0.35), SD (0.1), D(0.1) | 75.9 | | convnext_large_d_320.laion2b_s29b_b131k-ft | LAION-2B | 320x320 | RRC (0.5, 1.0), RE (0.4), SD (0.1), D(0.0) | 76.6 | | convnext_large_d_320.laion2b_s29b_b131k-ft-soup | LAION-2B | 320x320 | RRC (0.5, 1.0), RE (0.4), SD (0.1), D(0.0) | 76.9 | RRC = Random Resize Crop (crop pcts), RE = Random Erasing (prob), SD = Stochastic Depth (prob) -- image tower only, D = Dropout (prob) -- image tower head only. LAION-A = LAION Aesthetic, an ~900M sample subset of LAION-2B with pHash dedupe and aesthetic score filtering. Model training done by Ross Wightman on the stability.ai cluster. # Uses As per the original OpenAI CLIP model card, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models. The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog and upcoming paper include additional discussion as it relates specifically to the training dataset. ## Direct Use Zero-shot image classification, image and text retrieval, among others. ## Downstream Use Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others. ## Out-of-Scope Use As per the OpenAI models, **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task-specific testing, especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of the performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently, given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases. Beyond the above notice, the LAION-5B dataset used in the training of these models has additional considerations, see below. # Training Details ## Training Data This model was trained with LAION-2B -- a 2 billion sample English subset of LAION-5B. **IMPORTANT NOTE:** The motivation behind dataset creation is to democratize research and experimentation around large-scale multi-modal model training and handling of uncurated, large-scale datasets crawled from the publicly available internet.
Our recommendation is therefore to use the dataset for research purposes. Be aware that this large-scale dataset is uncurated. Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer. Therefore, please use the demo links with caution and at your own risk. It is possible to extract a “safe” subset by filtering out samples based on the safety tags (using a customized trained NSFW classifier that we built). While this strongly reduces the chance of encountering potentially harmful content when viewing, we cannot entirely exclude the possibility of harmful content still being present in safe mode, so the warning holds there as well. We think that providing the dataset openly to broad research and other interested communities will allow for transparent investigation of the benefits that come along with training large-scale models, as well as of pitfalls and dangers that may stay unreported or unnoticed when working with closed large datasets that remain restricted to a small community. While we provide our dataset openly, we do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress. ## Training Procedure All 320x320 model fine-tunes were trained with a global batch size of 131072 for 10-16 checkpoint intervals of 203.7M samples, for a total of ~2-3B samples seen per fine-tune. For 320x320 models, a slurm script w/ srun below was used on 64 8-GPU (A100 40GB) nodes (Stability). # Evaluation Evaluation done with code in the LAION CLIP Benchmark suite. ## Testing Data, Factors & Metrics ### Testing Data The testing is performed with VTAB+ (a combination of VTAB with additional robustness datasets) for classification, and COCO and Flickr for retrieval. ## Results The models achieve between 75.9% and 76.9% top-1 zero-shot accuracy on ImageNet-1k. Zero-shot curve of the original from-scratch 256x256 training: An initial round of benchmarks has been performed on a wider range of datasets, to be viewable at # Acknowledgements Acknowledging stability.ai for the compute used to train this model. # Citation **BibTeX:** LAION-5B OpenCLIP software OpenAI CLIP paper" +} \ No newline at end of file diff --git a/data/model_data_json/laion_clap-htsat-fused.json b/data/model_data_json/laion_clap-htsat-fused.json new file mode 100644 index 0000000000000000000000000000000000000000..f07a0973271c8b051237eb0b0c2a2f8d51dfffc2 --- /dev/null +++ b/data/model_data_json/laion_clap-htsat-fused.json @@ -0,0 +1,16 @@ +{ + "model_id": "laion/clap-htsat-fused", + "downloads": 81868, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "clap", + "feature-extraction", + "arxiv:2211.06687", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 --- # Model card for CLAP: Contrastive Language-Audio Pretraining # Table of Contents 0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Citation # TL;DR The abstract of the paper states that: > Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions.
To accomplish this target, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different data sources. Second, we construct a contrastive language-audio pretraining model by considering different audio encoders and text encoders. We incorporate the feature fusion mechanism and keyword-to-caption augmentation into the model design to further enable the model to process audio inputs of variable lengths and enhance the performance. Third, we perform comprehensive experiments to evaluate our model across three tasks: text-to-audio retrieval, zero-shot audio classification, and supervised audio classification. The results demonstrate that our model achieves superior performance in the text-to-audio retrieval task. In audio classification tasks, the model achieves state-of-the-art performance in the zero-shot setting and is able to obtain performance comparable to models' results in the non-zero-shot setting. LAION-Audio-630K and the proposed model are both available to the public. # Usage You can use this model for zero-shot audio classification or for extracting audio and/or textual features. # Uses ## Perform zero-shot audio classification ### Using ## Run the model: You can also get the audio and text embeddings using ### Run the model on CPU: ### Run the model on GPU: # Citation If you are using this model for your work, please consider citing the original paper:" +} \ No newline at end of file diff --git a/data/model_data_json/laion_larger_clap_general.json b/data/model_data_json/laion_larger_clap_general.json new file mode 100644 index 0000000000000000000000000000000000000000..387f49d87103e22d466aa7097071d57520142c66 --- /dev/null +++ b/data/model_data_json/laion_larger_clap_general.json @@ -0,0 +1,16 @@ +{ + "model_id": "laion/larger_clap_general", + "downloads": 380733, + "tags": [ + "transformers", + "pytorch", + "clap", + "feature-extraction", + "arxiv:2211.06687", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 --- # Model ## TL;DR CLAP is to audio what CLIP is to image. This is an improved CLAP checkpoint, specifically trained on general audio, music and speech. ## Description CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on a variety of (audio, text) pairs. It can be instructed to predict the most relevant text snippet, given an audio clip, without directly optimizing for the task. The CLAP model uses a SWINTransformer to get audio features from a log-Mel spectrogram input, and a RoBERTa model to get text features. Both the text and audio features are then projected into a latent space of identical dimension. The dot product between the projected audio and text features is then used as a similarity score. # Usage You can use this model for zero-shot audio classification or for extracting audio and/or textual features. # Uses ## Perform zero-shot audio classification ### Using ## Run the model: You can also get the audio and text embeddings using ### Run the model on CPU: ### Run the model on GPU: # Citation If you are using this model for your work, please consider citing the original paper:", + "model_explanation_gemini": "LAION's larger CLAP model predicts relevant text snippets from audio inputs and enables zero-shot audio classification by projecting audio and text features into a shared latent space."
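Both CLAP cards above lost their usage snippets in this export; a hedged sketch of zero-shot audio classification with the transformers pipeline (the audio file name and candidate labels are placeholders):

```python
# Zero-shot audio classification with laion/larger_clap_general via the
# transformers pipeline; "dog_bark.wav" and the labels are placeholders.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-audio-classification", model="laion/larger_clap_general"
)
result = classifier(
    "dog_bark.wav",
    candidate_labels=["a dog barking", "a vacuum cleaner", "piano music"],
)
print(result)  # list of {"label": ..., "score": ...} dicts, highest score first
```

The same call works for laion/clap-htsat-fused by swapping the model id.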
+} \ No newline at end of file diff --git a/data/model_data_json/latent-consistency_lcm-lora-sdv1-5.json b/data/model_data_json/latent-consistency_lcm-lora-sdv1-5.json new file mode 100644 index 0000000000000000000000000000000000000000..4ed50153876db0931a35d6416eda12c0aca0ddbf --- /dev/null +++ b/data/model_data_json/latent-consistency_lcm-lora-sdv1-5.json @@ -0,0 +1,16 @@ +{ + "model_id": "latent-consistency/lcm-lora-sdv1-5", + "downloads": 106777, + "tags": [ + "diffusers", + "lora", + "text-to-image", + "arxiv:2311.05556", + "base_model:runwayml/stable-diffusion-v1-5", + "base_model:adapter:runwayml/stable-diffusion-v1-5", + "license:openrail++", + "region:us" + ], + "description": "--- library_name: diffusers base_model: runwayml/stable-diffusion-v1-5 tags: - lora - text-to-image license: openrail++ inference: false --- # Latent Consistency Model (LCM) LoRA: SDv1-5 Latent Consistency Model (LCM) LoRA was proposed in LCM-LoRA: A Universal Stable-Diffusion Acceleration Module by *Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu et al.* It is a distilled consistency adapter for runwayml/stable-diffusion-v1-5 that allows the number of inference steps to be reduced to only between **2 - 8 steps**. | Model | Params / M | |----------------------------------------------------------------------------|------------| | **lcm-lora-sdv1-5** | **67.5** | | lcm-lora-ssd-1b | 105 | | lcm-lora-sdxl | 197 | ## Usage LCM-LoRA is supported in the 🤗 Hugging Face Diffusers library from version v0.23.0 onwards. To run the model, first install the latest version of the Diffusers library as well as , and . ***Note: For detailed usage examples we recommend checking out the official LCM-LoRA docs*** ### Text-to-Image The adapter can be loaded with SDv1-5 or derivatives. Next, the scheduler needs to be changed, and we can reduce the number of inference steps to just 2 to 8 steps. Please make sure to either disable or use values between 1.0 and 2.0. ### Image-to-Image LCM-LoRA can be applied to image-to-image tasks too. Let's look at how we can perform image-to-image generation with LCMs. For this example we'll use the dreamshaper-7 model and the LCM-LoRA for . ### Inpainting LCM-LoRA can be used for inpainting as well. ### ControlNet For this example, we'll use the SD-v1-5 model and the LCM-LoRA for SD-v1-5 with canny ControlNet. ## Speed Benchmark TODO ## Training TODO", + "model_explanation_gemini": "Accelerates Stable Diffusion v1.5 text-to-image generation by reducing required inference steps to 2-8 using a distilled consistency adapter (LoRA)." +} \ No newline at end of file diff --git a/data/model_data_json/law-ai_InLegalBERT.json b/data/model_data_json/law-ai_InLegalBERT.json new file mode 100644 index 0000000000000000000000000000000000000000..da96df2474a1969ef5a682f8ff356d288f41ca56 --- /dev/null +++ b/data/model_data_json/law-ai_InLegalBERT.json @@ -0,0 +1,22 @@ +{ + "model_id": "law-ai/InLegalBERT", + "downloads": 773935, + "tags": [ + "transformers", + "pytorch", + "bert", + "pretraining", + "legal", + "fill-mask", + "en", + "arxiv:2209.06049", + "arxiv:2112.14731", + "arxiv:1911.05405", + "arxiv:2105.13562", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en pipeline_tag: fill-mask tags: - legal license: mit --- ### InLegalBERT Model and tokenizer files for the InLegalBERT model from the paper Pre-training Transformers on Indian Legal Text.
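For the LCM-LoRA card closed just above, a minimal text-to-image sketch consistent with its described usage; assuming LCMScheduler as the swapped-in scheduler class, peft installed for LoRA loading, and a placeholder prompt:

```python
# Text-to-image with the LCM-LoRA adapter: swap in LCMScheduler, load the
# LoRA weights, and sample in a handful of steps (2-8 per the card).
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

image = pipe(
    "a cup of coffee on a wooden table",  # placeholder prompt
    num_inference_steps=4,                # 2-8 steps per the card
    guidance_scale=1.0,                   # keep guidance in the 1.0-2.0 range
).images[0]
image.save("coffee.png")
```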
### Training Data For building the pre-training corpus of Indian legal text, we collected a large corpus of case documents from the Indian Supreme Court and many High Courts of India. The court cases in our dataset range from 1950 to 2019, and belong to all legal domains, such as Civil, Criminal, Constitutional, and so on. In total, our dataset contains around 5.4 million Indian legal documents (all in the English language). The raw text corpus size is around 27 GB. ### Training Setup This model is initialized with the LEGAL-BERT-SC model from the paper LEGAL-BERT: The Muppets straight out of Law School. In our work, we refer to this model as LegalBERT, and our re-trained model as InLegalBERT. We further train this model on our data for 300K steps on the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks. ### Model Overview This model uses the same tokenizer as LegalBERT. This model has the same configuration as the bert-base-uncased model: 12 hidden layers, hidden dimensionality of 768, 12 attention heads, ~110M parameters. ### Usage Using the model to get embeddings/representations for a piece of text ### Fine-tuning Results We have fine-tuned all pre-trained models on 3 legal tasks with Indian datasets: * Legal Statute Identification (ILSI Dataset)[Multi-label Text Classification]: Identifying relevant statutes (law articles) based on the facts of a court case * Semantic Segmentation (ISS Dataset)[Sentence Tagging]: Segmenting the document into 7 functional parts (semantic segments) such as Facts, Arguments, etc. * Court Judgment Prediction (ILDC Dataset)[Binary Text Classification]: Predicting whether the claims/petitions of a court case will be accepted/rejected InLegalBERT beats LegalBERT as well as all other baselines/variants we have used, across all three tasks. For details, see our paper. ### Citation ### About Us We are a group of researchers from the Department of Computer Science and Technology, Indian Institute of Technology, Kharagpur. Our research interests are primarily ML and NLP applications for the legal domain, with a special focus on the challenges and opportunities for the Indian legal scenario. We have worked, and are currently working, on several legal tasks, such as: * named entity recognition, summarization of legal documents * semantic segmentation of legal documents * legal statute identification from facts, court judgment prediction * legal document matching You can find our publicly available codes and datasets here.", + "model_explanation_gemini": "Pre-trained on Indian legal texts for tasks like legal statute identification, semantic segmentation, and court judgment prediction, outperforming other legal language models." +} \ No newline at end of file diff --git a/data/model_data_json/lengyue233_content-vec-best.json b/data/model_data_json/lengyue233_content-vec-best.json new file mode 100644 index 0000000000000000000000000000000000000000..1f98ea80ae15b37627cc7f5bfc4324858f0c3a7f --- /dev/null +++ b/data/model_data_json/lengyue233_content-vec-best.json @@ -0,0 +1,15 @@ +{ + "model_id": "lengyue233/content-vec-best", + "downloads": 1829878, + "tags": [ + "transformers", + "pytorch", + "hubert", + "doi:10.57967/hf/0479", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- # Content Vec Best Official Repo: ContentVec This repo brings the fairseq ContentVec model to HuggingFace Transformers.
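Before the ContentVec usage notes below, here is a hedged reconstruction of the InLegalBERT embedding snippet referenced in its Usage section above (the example sentence is a placeholder):

```python
# Get contextual embeddings from law-ai/InLegalBERT with the standard
# transformers API; the example sentence is a placeholder.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("law-ai/InLegalBERT")
model = AutoModel.from_pretrained("law-ai/InLegalBERT")

text = "The appellant filed a petition under Article 32 of the Constitution."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # shape (1, seq_len, 768)
```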
## How to use To use this model, you need to define and then load the model with ## How to convert You need to download the ContentVec_legacy model from the official repo, and then run", + "model_explanation_gemini": "Converts fairseq ContentVec models to HuggingFace Transformers format for embedding generation." +} \ No newline at end of file diff --git a/data/model_data_json/liuhaotian_llava-v1.5-13b.json b/data/model_data_json/liuhaotian_llava-v1.5-13b.json new file mode 100644 index 0000000000000000000000000000000000000000..5026a8082227015dd0eaa4b2921e9e4fd8649f35 --- /dev/null +++ b/data/model_data_json/liuhaotian_llava-v1.5-13b.json @@ -0,0 +1,15 @@ +{ + "model_id": "liuhaotian/llava-v1.5-13b", + "downloads": 96555, + "tags": [ + "transformers", + "pytorch", + "llava", + "text-generation", + "image-text-to-text", + "autotrain_compatible", + "region:us" + ], + "description": "--- inference: false pipeline_tag: image-text-to-text ---

# LLaVA Model Card ## Model details **Model type:** LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. **Model date:** LLaVA-v1.5-13B was trained in September 2023. **Paper or resources for more information:** ## License Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved. **Where to send questions or comments about the model:** ## Intended use **Primary intended uses:** The primary use of LLaVA is research on large multimodal models and chatbots. **Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence. ## Training dataset - 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP. - 158K GPT-generated multimodal instruction-following data. - 450K academic-task-oriented VQA data mixture. - 40K ShareGPT data. ## Evaluation dataset A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.", + "model_explanation_gemini": "LLaVA-v1.5-13B is an open-source multimodal chatbot trained to follow image-text instructions for research on large multimodal models." +} \ No newline at end of file diff --git a/data/model_data_json/liuhaotian_llava-v1.5-7b.json b/data/model_data_json/liuhaotian_llava-v1.5-7b.json new file mode 100644 index 0000000000000000000000000000000000000000..3eb76d7fa9b6586b337a6ee75c2b245ca04f7e06 --- /dev/null +++ b/data/model_data_json/liuhaotian_llava-v1.5-7b.json @@ -0,0 +1,15 @@ +{ + "model_id": "liuhaotian/llava-v1.5-7b", + "downloads": 1124875, + "tags": [ + "transformers", + "pytorch", + "llava", + "text-generation", + "image-text-to-text", + "autotrain_compatible", + "region:us" + ], + "description": "--- inference: false pipeline_tag: image-text-to-text ---

# LLaVA Model Card ## Model details **Model type:** LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. **Model date:** LLaVA-v1.5-7B was trained in September 2023. **Paper or resources for more information:** ## License Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved. **Where to send questions or comments about the model:** ## Intended use **Primary intended uses:** The primary use of LLaVA is research on large multimodal models and chatbots. **Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence. ## Training dataset - 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP. - 158K GPT-generated multimodal instruction-following data. - 450K academic-task-oriented VQA data mixture. - 40K ShareGPT data. ## Evaluation dataset A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.", + "model_explanation_gemini": "An open-source multimodal chatbot model fine-tuned on GPT-generated instruction-following data for research on large multimodal models and chatbots." +} \ No newline at end of file diff --git a/data/model_data_json/llamafactory_tiny-random-Llama-3.json b/data/model_data_json/llamafactory_tiny-random-Llama-3.json new file mode 100644 index 0000000000000000000000000000000000000000..b3a47bf9acc4b53f825d60a4892a969465e8b5c1 --- /dev/null +++ b/data/model_data_json/llamafactory_tiny-random-Llama-3.json @@ -0,0 +1,17 @@ +{ + "model_id": "llamafactory/tiny-random-Llama-3", + "downloads": 206181, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "conversational", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: transformers inference: false --- A tiny version of", + "model_explanation_gemini": "A tiny version of a model with unspecified core functionality, licensed under Apache-2.0 and part of the transformers library." +} \ No newline at end of file diff --git a/data/model_data_json/llava-hf_llava-1.5-7b-hf.json b/data/model_data_json/llava-hf_llava-1.5-7b-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..d3f077991fede54460faedbc757d596f8bc5464e --- /dev/null +++ b/data/model_data_json/llava-hf_llava-1.5-7b-hf.json @@ -0,0 +1,19 @@ +{ + "model_id": "llava-hf/llava-1.5-7b-hf", + "downloads": 653761, + "tags": [ + "transformers", + "safetensors", + "llava", + "image-text-to-text", + "vision", + "conversational", + "en", + "dataset:liuhaotian/LLaVA-Instruct-150K", + "license:llama2", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en datasets: - liuhaotian/LLaVA-Instruct-150K pipeline_tag: image-text-to-text arxiv: 2304.08485 license: llama2 tags: - vision - image-text-to-text --- # LLaVA Model Card !image/png Below is the model card of Llava model 7b, which is copied from the original Llava model card that you can find here. Check out also the Google Colab demo to run Llava on a free-tier Google Colab instance: and add the token to the location where you want to query images: ### Using : Below we used []( checkpoint. 
### Using pure : Below is an example script to run generation in precision on a GPU device: ----------- From transformers>=v4.48, you can also pass image url or local path to the conversation history, and let the chat template handle the rest. Chat template will load the image for you and return inputs in which you can pass directly to ### Model optimization #### 4-bit quantization through library First make sure to install , and make sure to have access to a CUDA compatible GPU device. Simply change the snippet above with: #### Use Flash-Attention 2 to further speed-up generation First make sure to install . Refer to the original repository of Flash Attention regarding that package installation. Simply change the snippet above with: ## License Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.", + "model_explanation_gemini": "Generates text responses from input images and text prompts, combining vision and language processing." +} \ No newline at end of file diff --git a/data/model_data_json/llava-hf_llava-onevision-qwen2-0.5b-ov-hf.json b/data/model_data_json/llava-hf_llava-onevision-qwen2-0.5b-ov-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..06adc1a1d4fd9b6f54a4c6abc499adfa4577ae17 --- /dev/null +++ b/data/model_data_json/llava-hf_llava-onevision-qwen2-0.5b-ov-hf.json @@ -0,0 +1,23 @@ +{ + "model_id": "llava-hf/llava-onevision-qwen2-0.5b-ov-hf", + "downloads": 367907, + "tags": [ + "transformers", + "onnx", + "safetensors", + "llava_onevision", + "image-text-to-text", + "vision", + "transformers.js", + "conversational", + "en", + "zh", + "dataset:lmms-lab/LLaVA-OneVision-Data", + "arxiv:2408.03326", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh license: apache-2.0 tags: - vision - image-text-to-text - transformers.js datasets: - lmms-lab/LLaVA-OneVision-Data pipeline_tag: image-text-to-text arxiv: 2408.03326 library_name: transformers --- # LLaVA-Onevision Model Card !image/png Check out also the Google Colab demo to run Llava on a free-tier Google Colab instance: ![Open In Colab]( Below is the model card of 0.5B LLaVA-Onevision model which is copied from the original LLaVA-Onevision model card that you can find here. ## Model details **Model type:** LLaVA-Onevision is an open-source multimodal LLM trained by fine-tuning Qwen2 on GPT-generated multimodal instruction-following data. LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos. **Model date:** LLaVA-Onevision-0.5-ov was added in August 2024. **Paper or resources for more information:** - **Architecture:** SO400M + Qwen2 - **Pretraining Stage:** LCS-558K, 1 epoch, projector - **Mid Stage:** A mixture of 4.7M high-quality synthetic data, 1 epoch, full model - **Final-Image Stage:** A mixture of 3.6M single-image data, 1 epoch, full model - **OneVision Stage:** A mixture of 1.6M single-image/multi-image/video data, 1 epoch, full model - **Precision:** bfloat16 ## How to use the model First, make sure to have installed from branch or . 
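For the llava-1.5-7b-hf card closed above, whose snippets were stripped, a hedged pipeline sketch (the image URL is a placeholder, and the "image-to-text" task string and USER/ASSISTANT template follow that card's convention):

```python
# Image-conditioned generation with llava-hf/llava-1.5-7b-hf via the
# pipeline API; the image URL is a placeholder.
from transformers import pipeline

pipe = pipeline("image-to-text", model="llava-hf/llava-1.5-7b-hf")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
out = pipe(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    prompt=prompt,
    generate_kwargs={"max_new_tokens": 64},
)
print(out[0]["generated_text"])
```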
The model supports multi-image and multi-prompt generation. Meaning that you can pass multiple images in your prompt. Make sure also to follow the correct prompt template by applying chat template: ### Using : Below we used []( checkpoint. ### Using pure : Below is an example script to run generation in precision on a GPU device: ----------- From transformers>=v4.48, you can also pass image/video url or local path to the conversation history, and let the chat template handle the rest. Chat template will load the image for you and return inputs in which you can pass directly to ### Model optimization #### 4-bit quantization through library First make sure to install , and make sure to have access to a CUDA compatible GPU device. Simply change the snippet above with: #### Use Flash-Attention 2 to further speed-up generation First make sure to install . Refer to the original repository of Flash Attention regarding that package installation. Simply change the snippet above with: ### Usage w/ Transformers.js If you haven't already, you can install the Transformers.js JavaScript library from NPM using: **Example:** Multi-round conversations w/ PKV caching # Citation", + "model_explanation_gemini": "Multimodal LLM for image, multi-image, and video understanding, generating text responses from visual inputs through instruction-following tasks." +} \ No newline at end of file diff --git a/data/model_data_json/llava-hf_llava-onevision-qwen2-7b-ov-hf.json b/data/model_data_json/llava-hf_llava-onevision-qwen2-7b-ov-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..c81360fa9a44a63f7962575572c9c481a0b65222 --- /dev/null +++ b/data/model_data_json/llava-hf_llava-onevision-qwen2-7b-ov-hf.json @@ -0,0 +1,21 @@ +{ + "model_id": "llava-hf/llava-onevision-qwen2-7b-ov-hf", + "downloads": 2273949, + "tags": [ + "transformers", + "safetensors", + "llava_onevision", + "image-text-to-text", + "vision", + "conversational", + "en", + "zh", + "dataset:lmms-lab/LLaVA-OneVision-Data", + "arxiv:2408.03326", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh license: apache-2.0 tags: - vision - image-text-to-text datasets: - lmms-lab/LLaVA-OneVision-Data library_name: transformers pipeline_tag: image-text-to-text arxiv: 2408.03326 --- # LLaVA-Onevision Model Card !image/png Check out also the Google Colab demo to run Llava on a free-tier Google Colab instance: ![Open In Colab]( Below is the model card of 7B LLaVA-Onevision model which is copied from the original LLaVA-Onevision model card that you can find here. ## Model details **Model type:** LLaVA-Onevision is an open-source multimodal LLM trained by fine-tuning Qwen2 on GPT-generated multimodal instruction-following data. LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos. **Model date:** LLaVA-Onevision-7b-si was added in August 2024. 
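The 4-bit quantization route mentioned in the OneVision cards can be sketched as follows for the 0.5b checkpoint; bitsandbytes and a CUDA GPU are assumed:

```python
# Load llava-hf/llava-onevision-qwen2-0.5b-ov-hf in 4-bit with bitsandbytes
# (requires `pip install bitsandbytes` and a CUDA-compatible GPU).
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    LlavaOnevisionForConditionalGeneration,
)

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
```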
**Paper or resources for more information:** - **Architecture:** SO400M + Qwen2 - **Pretraining Stage:** LCS-558K, 1 epoch, projector - **Mid Stage:** A mixture of 4.7M high-quality synthetic data, 1 epoch, full model - **Final-Image Stage:** A mixture of 3.6M single-image data, 1 epoch, full model - **OneVision Stage:** A mixture of 1.6M single-image/multi-image/video data, 1 epoch, full model - **Precision:** bfloat16 ## How to use the model First, make sure to have installed from branch or . The model supports multi-image and multi-prompt generation, meaning that you can pass multiple images in your prompt. Make sure also to follow the correct prompt template by applying the chat template: ### Using : Below we used []( checkpoint. ### Using pure : Below is an example script to run generation in precision on a GPU device: ### Model optimization #### 4-bit quantization through library First make sure to install , and make sure to have access to a CUDA compatible GPU device. Simply change the snippet above with: #### Use Flash-Attention 2 to further speed-up generation First make sure to install . Refer to the original repository of Flash Attention regarding that package installation. Simply change the snippet above with: # Citation", + "model_explanation_gemini": "An open-source multimodal LLM that processes and generates text from single/multiple images and videos, demonstrating strong transfer learning across visual scenarios." +} \ No newline at end of file diff --git a/data/model_data_json/llava-hf_llava-v1.6-mistral-7b-hf.json b/data/model_data_json/llava-hf_llava-v1.6-mistral-7b-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..86fb126234694a35d0ba65ac4a4e8b921613d44b --- /dev/null +++ b/data/model_data_json/llava-hf_llava-v1.6-mistral-7b-hf.json @@ -0,0 +1,20 @@ +{ + "model_id": "llava-hf/llava-v1.6-mistral-7b-hf", + "downloads": 219702, + "tags": [ + "transformers", + "safetensors", + "llava_next", + "image-text-to-text", + "vision", + "conversational", + "en", + "arxiv:2310.03744", + "license:apache-2.0", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-text-to-text language: - en pipeline_tag: image-text-to-text inference: true --- # LLaVa-Next, leveraging mistralai/Mistral-7B-Instruct-v0.2 as LLM The LLaVA-NeXT model was proposed in LLaVA-NeXT: Improved reasoning, OCR, and world knowledge by Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, Yong Jae Lee. LLaVa-NeXT (also called LLaVa-1.6) improves upon LLaVa-1.5 by increasing the input image resolution and training on an improved visual instruction tuning dataset to improve OCR and common sense reasoning. Disclaimer: The team releasing LLaVa-NeXT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description LLaVa combines a pre-trained large language model with a pre-trained vision encoder for multimodal chatbot use cases. LLaVA 1.6 improves on LLaVA 1.5 by: - Using Mistral-7B (for this checkpoint) and Nous-Hermes-2-Yi-34B, which have better commercial licenses and bilingual support - A more diverse and high-quality data mixture - Dynamic high resolution ## Intended uses & limitations You can use the raw model for tasks like image captioning, visual question answering, and multimodal chatbot use cases. See the model hub to look for other versions on a task that interests you.
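For the OneVision 7b card closed above, a hedged generation sketch via the chat-template path it describes (transformers >= 4.48 for URL entries in the conversation; the image URL is a placeholder):

```python
# Single-image chat with llava-hf/llava-onevision-qwen2-7b-ov-hf via
# apply_chat_template; the image URL is a placeholder.
import torch
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-7b-ov-hf"
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, torch.float16)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(out[0], skip_special_tokens=True))
```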
### How to use Here's the prompt template for this model, but we recommend using the chat templates to format the prompt. That will apply the correct template for a given checkpoint for you. To run the model with the , see the below example: You can also load and use the model like the following: ----------- From transformers>=v4.48, you can also pass an image URL or local path to the conversation history, and let the chat template handle the rest. The chat template will load the image for you and return inputs which you can pass directly to ### Model optimization #### 4-bit quantization through library First make sure to install , and make sure to have access to a CUDA compatible GPU device. Simply change the snippet above with: #### Use Flash-Attention 2 to further speed-up generation First make sure to install . Refer to the original repository of Flash Attention regarding that package installation. Simply change the snippet above with: ### BibTeX entry and citation info", + "model_explanation_gemini": "LLaVa-NeXT (LLaVa-1.6) combines Mistral-7B's language capabilities with a vision encoder for multimodal tasks like image captioning, visual question answering, and chatbot interactions, featuring improved OCR, reasoning, and higher-resolution image processing." +} \ No newline at end of file diff --git a/data/model_data_json/llava-hf_vip-llava-7b-hf.json b/data/model_data_json/llava-hf_vip-llava-7b-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..e9a26bc752dbdc55c32ae5680d1d9a2e89685d30 --- /dev/null +++ b/data/model_data_json/llava-hf_vip-llava-7b-hf.json @@ -0,0 +1,18 @@ +{ + "model_id": "llava-hf/vip-llava-7b-hf", + "downloads": 12287, + "tags": [ + "transformers", + "safetensors", + "vipllava", + "image-text-to-text", + "vision", + "conversational", + "en", + "arxiv:2312.00784", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: image-text-to-text arxiv: 2312.00784 tags: - vision - image-text-to-text --- # VipLLaVA Model Card Below is the model card of the VipLlava model 7b, which is copied from the original Llava model card that you can find here. Check out also the Google Colab demo to run Llava on a free-tier Google Colab instance (the model works similarly to Llava): Meta Platforms, Inc. All Rights Reserved. ## Citation To cite this work please use", + "model_explanation_gemini": "Generates text outputs from image-text inputs for vision-language tasks." +} \ No newline at end of file diff --git a/data/model_data_json/lllyasviel_FramePackI2V_HY.json b/data/model_data_json/lllyasviel_FramePackI2V_HY.json new file mode 100644 index 0000000000000000000000000000000000000000..3b8ab078fe8ad69dbf691b04f69d7de33704c707 --- /dev/null +++ b/data/model_data_json/lllyasviel_FramePackI2V_HY.json @@ -0,0 +1,11 @@ +{ + "model_id": "lllyasviel/FramePackI2V_HY", + "downloads": 337438, + "tags": [ + "diffusers", + "safetensors", + "region:us" + ], + "description": "--- library_name: diffusers --- This is the FramePack for HY", + "model_explanation_gemini": "Packs frames for HY using diffusers."
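For the llava-v1.6-mistral-7b-hf card above, a hedged sketch with the dedicated LlavaNext classes; the [INST] prompt template follows the Mistral convention documented for this checkpoint, and the image URL is a placeholder:

```python
# Visual question answering with llava-hf/llava-v1.6-mistral-7b-hf;
# the image URL is a placeholder.
import requests
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(out[0], skip_special_tokens=True))
```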
+} \ No newline at end of file diff --git a/data/model_data_json/lllyasviel_sd-controlnet-canny.json b/data/model_data_json/lllyasviel_sd-controlnet-canny.json new file mode 100644 index 0000000000000000000000000000000000000000..66e7ebcee9a80e9a9d5219573c170accc3549b49 --- /dev/null +++ b/data/model_data_json/lllyasviel_sd-controlnet-canny.json @@ -0,0 +1,19 @@ +{ + "model_id": "lllyasviel/sd-controlnet-canny", + "downloads": 183768, + "tags": [ + "diffusers", + "safetensors", + "art", + "controlnet", + "stable-diffusion", + "image-to-image", + "arxiv:2302.05543", + "base_model:runwayml/stable-diffusion-v1-5", + "base_model:adapter:runwayml/stable-diffusion-v1-5", + "license:openrail", + "region:us" + ], + "description": "--- license: openrail base_model: runwayml/stable-diffusion-v1-5 tags: - art - controlnet - stable-diffusion - image-to-image widget: - src: prompt: Girl with Pearl Earring --- # Controlnet - *Canny Version* ControlNet is a neural network structure to control diffusion models by adding extra conditions. This checkpoint corresponds to the ControlNet conditioned on **Canny edges**. It can be used in combination with Stable Diffusion. ## Model Details - **Developed by:** Lvmin Zhang, Maneesh Agrawala - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based. - **Resources for more information:** GitHub Repository, Paper. - **Cite as:** @misc{zhang2023adding, title={Adding Conditional Control to Text-to-Image Diffusion Models}, author={Lvmin Zhang and Maneesh Agrawala}, year={2023}, eprint={2302.05543}, archivePrefix={arXiv}, primaryClass={cs.CV} } ## Introduction Controlnet was proposed in *Adding Conditional Control to Text-to-Image Diffusion Models* by Lvmin Zhang, Maneesh Agrawala. The abstract reads as follows: *We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.* ## Released Checkpoints The authors released 8 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning (the example images in the original table are not reproduced here): | Model Name | Control Image Overview | |---|---| | lllyasviel/sd-controlnet-canny | *Trained with canny edge detection.* A monochrome image with white edges on a black background. | | lllyasviel/sd-controlnet-depth | *Trained with Midas depth estimation.* A grayscale image with black representing deep areas and white representing shallow areas. | | lllyasviel/sd-controlnet-hed | *Trained with HED edge detection (soft edge).* A monochrome image with white soft edges on a black background. | | lllyasviel/sd-controlnet-mlsd | *Trained with M-LSD line detection.* A monochrome image composed only of white straight lines on a black background. | | lllyasviel/sd-controlnet-normal | *Trained with normal map.* A normal-mapped image. | | lllyasviel/sd-controlnet-openpose | *Trained with OpenPose bone image.* An OpenPose bone image. | | lllyasviel/sd-controlnet-scribble | *Trained with human scribbles.* A hand-drawn monochrome image with white outlines on a black background. | | lllyasviel/sd-controlnet-seg | *Trained with semantic segmentation.* An ADE20K segmentation-protocol image. | ## 💫 Community Model> DeepSeek R1 Distill Qwen 7B by Deepseek-Ai *👾 LM Studio Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on Discord*. **Model creator:** deepseek-ai
**Original model**: DeepSeek-R1-Distill-Qwen-7B
**GGUF quantization:** provided by bartowski based on release b4514
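(Stepping back to the sd-controlnet-canny card above, whose usage snippet was elided: a minimal, hedged diffusers sketch follows; the input/output file names are placeholders.)

```python
# Hedged sketch of driving the Canny ControlNet with Stable Diffusion v1-5.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Turn an input photo into the Canny edge map this checkpoint was trained on.
image = np.array(Image.open("input.png"))       # placeholder input file
edges = cv2.Canny(image, 100, 200)
edges = np.stack([edges] * 3, axis=-1)          # single channel -> RGB
control_image = Image.fromarray(edges)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe("Girl with Pearl Earring", image=control_image).images[0]
result.save("output.png")
```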
## Technical Details Supports context length of 128k. Distilled from DeepSeek's R1 reasoning model. Tuned for reasoning and chain-of-thought. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible. ## Disclaimers LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.", + "model_explanation_gemini": "A distilled 7B-parameter text-generation model optimized for reasoning and chain-of-thought tasks, supporting 128k context length." +} \ No newline at end of file diff --git a/data/model_data_json/lmstudio-community_Meta-Llama-3.1-8B-Instruct-GGUF.json b/data/model_data_json/lmstudio-community_Meta-Llama-3.1-8B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..f5574b2b4a671729eb1c5384a5cc8c2348ebaa43 --- /dev/null +++ b/data/model_data_json/lmstudio-community_Meta-Llama-3.1-8B-Instruct-GGUF.json @@ -0,0 +1,28 @@ +{ + "model_id": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF", + "downloads": 103873, + "tags": [ + "gguf", + "facebook", + "meta", + "pytorch", + "llama", + "llama-3", + "text-generation", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "base_model:meta-llama/Meta-Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Meta-Llama-3.1-8B-Instruct", + "license:llama3.1", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.1 lm_studio: param_count: 8b use_case: chat release_date: 23-07-2024 model_creator: Meta prompt_template: Llama 3 base_model: llama original_repo: meta-llama/Meta-Llama-3.1-8B-Instruct base_model: meta-llama/Meta-Llama-3.1-8B-Instruct --- ## 💫 Community Model> Llama 3.1 8B Instruct by Meta *👾 LM Studio Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on Discord*. **Model creator:** meta-llama
**Original model**: Meta-Llama-3.1-8B-Instruct
**GGUF quantization:** provided by bartowski based on release b3472
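The card below describes running this GGUF build in LM Studio; as a hedged alternative sketch, the same kind of quant file can be loaded with the llama-cpp-python bindings (the file name is a placeholder, and this workflow is not part of the card itself):

```python
# Hedged sketch, not from the card: loading a local GGUF quant with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",  # assumed local file
    n_ctx=8192,  # the model supports up to 128k context, memory permitting
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```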
Important: Requires LM Studio version 0.2.29, available now here! ## Model Summary: Llama 3.1 is an update to the previously released family of Llama 3 models. It has improved performance across the board, especially in multilingual tasks.
It is the current state of the art for open source and can be used for basically any task you throw at it. ## Prompt Template: Choose the 'Llama 3' preset in your LM Studio. Under the hood, the model will see a prompt that's formatted like so: ## Technical Details Llama 3.1 features an improved 128k context window. It has been trained on 15T tokens, including 25 million synthetically generated samples. For more details, check their blog post here ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible. 🙏 Special thanks to Kalomaze for his dataset (linked here) that was used for calculating the imatrix for these quants, which improves the overall quality! ## Disclaimers LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.", + "model_explanation_gemini": "An 8-billion-parameter multilingual text-generation model optimized for chat/instruct tasks with improved performance and a 128k context window." +} \ No newline at end of file diff --git a/data/model_data_json/lmstudio-community_gemma-3-12b-it-GGUF.json b/data/model_data_json/lmstudio-community_gemma-3-12b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..0b67ddec53e6a8d6daf42e7fca7293da311d8fb5 --- /dev/null +++ b/data/model_data_json/lmstudio-community_gemma-3-12b-it-GGUF.json @@ -0,0 +1,15 @@ +{ + "model_id": "lmstudio-community/gemma-3-12b-it-GGUF", + "downloads": 78493, + "tags": [ + "gguf", + "image-text-to-text", + "base_model:google/gemma-3-12b-it", + "base_model:quantized:google/gemma-3-12b-it", + "license:gemma", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- quantized_by: bartowski pipeline_tag: image-text-to-text extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license license: gemma extra_gated_heading: Access Gemma on Hugging Face base_model: google/gemma-3-12b-it --- ## 💫 Community Model> gemma 3 12b it by Google *👾 LM Studio Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on Discord*. **Model creator:** google
**Original model**: gemma-3-12b-it
**GGUF quantization:** provided by bartowski based on release b4877
Requires llama.cpp runtime v1.19.0 ## Technical Details Supports a context length of 128k tokens, with a max output of 8192. Multimodal supporting images normalized to 896 x 896 resolution. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Requires latest (currently beta) llama.cpp runtime. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible. ## Disclaimers LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio." +} \ No newline at end of file diff --git a/data/model_data_json/lmstudio-community_gemma-3-27b-it-GGUF.json b/data/model_data_json/lmstudio-community_gemma-3-27b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..224db72f7ab3e9bbe4745823b78b5195c801d2ab --- /dev/null +++ b/data/model_data_json/lmstudio-community_gemma-3-27b-it-GGUF.json @@ -0,0 +1,16 @@ +{ + "model_id": "lmstudio-community/gemma-3-27b-it-GGUF", + "downloads": 93444, + "tags": [ + "gguf", + "image-text-to-text", + "base_model:google/gemma-3-27b-it", + "base_model:quantized:google/gemma-3-27b-it", + "license:gemma", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- quantized_by: bartowski pipeline_tag: image-text-to-text extra_gated_prompt: >- To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click below. Requests are processed immediately. extra_gated_button_content: Acknowledge license license: gemma extra_gated_heading: Access Gemma on Hugging Face base_model: google/gemma-3-27b-it --- ## 💫 Community Model> gemma 3 27b it by Google *👾 LM Studio Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on Discord*. **Model creator:** google
**Original model**: gemma-3-27b-it
**GGUF quantization:** provided by bartowski based on release b4877
Requires llama.cpp runtime v1.19.0 ## Technical Details Supports a context length of 128k tokens, with a max output of 8192. Multimodal supporting images normalized to 896 x 896 resolution. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Requires latest (currently beta) llama.cpp runtime. ## Special thanks 🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible. ## Disclaimers LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.", + "model_explanation_gemini": "Quantized version of Google's Gemma 3 27B model for multimodal text generation and image understanding tasks with 128k token context support." +} \ No newline at end of file diff --git a/data/model_data_json/lmsys_vicuna-7b-v1.5.json b/data/model_data_json/lmsys_vicuna-7b-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..f461c4e1bd6954eada74012d0eaa3b844680f038 --- /dev/null +++ b/data/model_data_json/lmsys_vicuna-7b-v1.5.json @@ -0,0 +1,18 @@ +{ + "model_id": "lmsys/vicuna-7b-v1.5", + "downloads": 229378, + "tags": [ + "transformers", + "pytorch", + "llama", + "text-generation", + "arxiv:2307.09288", + "arxiv:2306.05685", + "license:llama2", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- inference: false license: llama2 --- # Vicuna Model Card ## Model Details Vicuna is a chat assistant trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT. - **Developed by:** LMSYS - **Model type:** An auto-regressive language model based on the transformer architecture - **License:** Llama 2 Community License Agreement - **Finetuned from model:** Llama 2 ### Model Sources - **Repository:** - **Blog:** - **Paper:** - **Demo:** ## Uses The primary use of Vicuna is research on large language models and chatbots. The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence. ## How to Get Started with the Model - Command line interface: - APIs (OpenAI API, Huggingface API): ## Training Details Vicuna v1.5 is fine-tuned from Llama 2 with supervised instruction fine-tuning. 
The training data is around 125K conversations collected from ShareGPT.com. See more details in the \"Training Details of Vicuna Models\" section in the appendix of this paper. ## Evaluation !Evaluation Results Vicuna is evaluated with standard benchmarks, human preference, and LLM-as-a-judge. See more details in this paper and leaderboard. ## Difference between different versions of Vicuna See vicuna_weights_version.md", + "model_explanation_gemini": "Vicuna is a fine-tuned Llama 2-based chat assistant designed for research on large language models and chatbots, trained on user-shared conversations from ShareGPT." +} \ No newline at end of file diff --git a/data/model_data_json/lpiccinelli_unidepth-v2-vitl14.json b/data/model_data_json/lpiccinelli_unidepth-v2-vitl14.json new file mode 100644 index 0000000000000000000000000000000000000000..b2e19a86c63b6b7bdd71ffaff8786b9e7ebe06b7 --- /dev/null +++ b/data/model_data_json/lpiccinelli_unidepth-v2-vitl14.json @@ -0,0 +1,14 @@ +{ + "model_id": "lpiccinelli/unidepth-v2-vitl14", + "downloads": 78452, + "tags": [ + "UniDepth", + "pytorch", + "safetensors", + "model_hub_mixin", + "monocular-metric-depth-estimation", + "pytorch_model_hub_mixin", + "region:us" + ], + "description": "--- library_name: UniDepth tags: - model_hub_mixin - monocular-metric-depth-estimation - pytorch_model_hub_mixin --- This model has been pushed to the Hub using the PytorchModelHubMixin integration: - Library: - Docs: [More Information Needed]" +} \ No newline at end of file diff --git a/data/model_data_json/lpiccinelli_unik3d-vitl.json b/data/model_data_json/lpiccinelli_unik3d-vitl.json new file mode 100644 index 0000000000000000000000000000000000000000..248c48a59c2e344c529d33d5ffa695113524c6c7 --- /dev/null +++ b/data/model_data_json/lpiccinelli_unik3d-vitl.json @@ -0,0 +1,15 @@ +{ + "model_id": "lpiccinelli/unik3d-vitl", + "downloads": 2666944, + "tags": [ + "UniK3D", + "pytorch", + "safetensors", + "model_hub_mixin", + "monocular-metric-3D-estimation", + "pytorch_model_hub_mixin", + "region:us" + ], + "description": "--- library_name: UniK3D tags: - model_hub_mixin - monocular-metric-3D-estimation - pytorch_model_hub_mixin --- This model has been pushed to the Hub using the PytorchModelHubMixin integration: - Library: - Docs: [More Information Needed]", + "model_explanation_gemini": "Performs monocular metric 3D estimation using a UniK3D framework with PyTorch integration." +} \ No newline at end of file diff --git a/data/model_data_json/lucadiliello_BLEURT-20-D12.json b/data/model_data_json/lucadiliello_BLEURT-20-D12.json new file mode 100644 index 0000000000000000000000000000000000000000..29ad17f47efeda74ddf7de4fe8616eceeaae974b --- /dev/null +++ b/data/model_data_json/lucadiliello_BLEURT-20-D12.json @@ -0,0 +1,15 @@ +{ + "model_id": "lucadiliello/BLEURT-20-D12", + "downloads": 906612, + "tags": [ + "transformers", + "pytorch", + "bleurt", + "text-classification", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "This model is based on a custom Transformer model that can be installed with: Now load the model and make predictions with: Take a look at this repository for the definition of , and in PyTorch.", + "model_explanation_gemini": "Custom Transformer model for making predictions, requiring installation and loading as specified in the linked repository." 
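Returning to the Vicuna card above, whose CLI and API snippets were elided: a hedged transformers sketch, with the v1.5 conversation format assumed from FastChat's documented template rather than taken from the card:

```python
# Hedged sketch of chatting with Vicuna v1.5 through plain transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Assumed Vicuna v1.5 prompt format (system preamble + USER/ASSISTANT turns).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: What is the capital of France? ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```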
+} \ No newline at end of file diff --git a/data/model_data_json/lxyuan_distilbert-base-multilingual-cased-sentiments-student.json b/data/model_data_json/lxyuan_distilbert-base-multilingual-cased-sentiments-student.json new file mode 100644 index 0000000000000000000000000000000000000000..e35eab9dab096d84db9ebf6347b803041b81cd8f --- /dev/null +++ b/data/model_data_json/lxyuan_distilbert-base-multilingual-cased-sentiments-student.json @@ -0,0 +1,37 @@ +{ + "model_id": "lxyuan/distilbert-base-multilingual-cased-sentiments-student", + "downloads": 478991, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "distilbert", + "text-classification", + "sentiment-analysis", + "zero-shot-distillation", + "distillation", + "zero-shot-classification", + "debarta-v3", + "en", + "ar", + "de", + "es", + "fr", + "ja", + "zh", + "id", + "hi", + "it", + "ms", + "pt", + "dataset:tyqiangz/multilingual-sentiments", + "doi:10.57967/hf/1422", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - sentiment-analysis - text-classification - zero-shot-distillation - distillation - zero-shot-classification - debarta-v3 model-index: - name: distilbert-base-multilingual-cased-sentiments-student results: [] datasets: - tyqiangz/multilingual-sentiments language: - en - ar - de - es - fr - ja - zh - id - hi - it - ms - pt --- # distilbert-base-multilingual-cased-sentiments-student This model is distilled from the zero-shot classification pipeline on the Multilingual Sentiment dataset using this script. In reality the multilingual-sentiment dataset is annotated of course, but we'll pretend and ignore the annotations for the sake of example. Teacher model: MoritzLaurer/mDeBERTa-v3-base-mnli-xnli Teacher hypothesis template: \"The sentiment of this text is {}.\" Student model: distilbert-base-multilingual-cased ## Inference example ## Training procedure Notebook link: here ### Training hyperparameters Result can be reproduce using the following commands: If you are training this model on Colab, make the following code changes to avoid Out-of-memory error message: ### Training log ### Framework versions - Transformers 4.28.1 - Pytorch 2.0.0+cu118 - Datasets 2.11.0 - Tokenizers 0.13.3", + "model_explanation_gemini": "A distilled multilingual sentiment analysis model trained to classify text sentiment across multiple languages using zero-shot distillation from a teacher model." +} \ No newline at end of file diff --git a/data/model_data_json/m-a-p_MERT-v1-95M.json b/data/model_data_json/m-a-p_MERT-v1-95M.json new file mode 100644 index 0000000000000000000000000000000000000000..1a40aebc9c8a487c1c83547e0cf8197156d36d9b --- /dev/null +++ b/data/model_data_json/m-a-p_MERT-v1-95M.json @@ -0,0 +1,17 @@ +{ + "model_id": "m-a-p/MERT-v1-95M", + "downloads": 74062, + "tags": [ + "transformers", + "pytorch", + "mert_model", + "feature-extraction", + "music", + "audio-classification", + "custom_code", + "arxiv:2306.00107", + "license:cc-by-nc-4.0", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 inference: false tags: - music pipeline_tag: audio-classification --- # Introduction to our series work The development log of our Music Audio Pre-training (m-a-p) model family: - 02/06/2023: arxiv pre-print and training codes released. - 17/03/2023: we release two advanced music understanding models, MERT-v1-95M and MERT-v1-330M , trained with new paradigm and dataset. 
They outperform the previous models and can better generalize to more tasks. - 14/03/2023: we retrained the MERT-v0 model with the open-source-only music dataset MERT-v0-public - 29/12/2022: a music understanding model MERT-v0 trained with the **MLM** paradigm, which performs better at downstream tasks. - 29/10/2022: a pre-trained MIR model music2vec trained with the **BYOL** paradigm. Here is a table for quick model pick-up: | Name | Pre-train Paradigm | Training Data (hour) | Pre-train Context (second) | Model Size | Transformer Layer-Dimension | Feature Rate | Sample Rate | Release Date | | ------------------------------------------------------------ | ------------------ | -------------------- | ---------------------------- | ---------- | --------------------------- | ------------ | ----------- | ------------ | | MERT-v1-330M | MLM | 160K | 5 | 330M | 24-1024 | 75 Hz | 24K Hz | 17/03/2023 | | MERT-v1-95M | MLM | 20K | 5 | 95M | 12-768 | 75 Hz | 24K Hz | 17/03/2023 | | MERT-v0-public | MLM | 900 | 5 | 95M | 12-768 | 50 Hz | 16K Hz | 14/03/2023 | | MERT-v0 | MLM | 1000 | 5 | 95M | 12-768 | 50 Hz | 16K Hz | 29/12/2022 | | music2vec-v1 | BYOL | 1000 | 30 | 95M | 12-768 | 50 Hz | 16K Hz | 30/10/2022 | ## Explanation The m-a-p models share a similar model architecture; the most notable difference is the pre-training paradigm used. Other than that, there are several nuanced technical configurations to know before use: - **Model Size**: the number of parameters that will be loaded into memory. Please select the appropriate size fitting your hardware. - **Transformer Layer-Dimension**: The number of transformer layers and the corresponding feature dimensions that can be output by our model. This is called out because features extracted by **different layers could have various performance depending on tasks**. - **Feature Rate**: Given a 1-second audio input, the number of features output by the model. - **Sample Rate**: The frequency of audio that the model is trained with. # Introduction to MERT-v1 Compared to MERT-v0, we introduce multiple new things in the MERT-v1 pre-training: - Changed the pseudo labels to 8 codebooks from EnCodec, which potentially have higher quality and empower our model to support music generation. - MLM prediction with in-batch noise mixture. - Train with higher audio frequency (24K Hz). - Train with more audio data (up to 160 thousand hours). - More available model sizes: 95M and 330M. More details will be written in our coming-soon paper. # Model Usage # Citation" +} \ No newline at end of file diff --git a/data/model_data_json/madebyollin_sdxl-vae-fp16-fix.json b/data/model_data_json/madebyollin_sdxl-vae-fp16-fix.json new file mode 100644 index 0000000000000000000000000000000000000000..61f6229124de2744058629c422cdab9c3dec2138 --- /dev/null +++ b/data/model_data_json/madebyollin_sdxl-vae-fp16-fix.json @@ -0,0 +1,14 @@ +{ + "model_id": "madebyollin/sdxl-vae-fp16-fix", + "downloads": 356218, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "license:mit", + "region:us" + ], + "description": "--- license: mit tags: - stable-diffusion - stable-diffusion-diffusers inference: false --- # SDXL-VAE-FP16-Fix SDXL-VAE-FP16-Fix is the SDXL VAE*, but modified to run in fp16 precision without generating NaNs. 
| VAE | Decoding in fp32/bf16 precision | Decoding in fp16 precision | | --------------------- | -------------------------------------------- | ------------------------------- | | SDXL-VAE | ✅ | ⚠️ | | SDXL-VAE-FP16-Fix | ✅ | ✅ | ## 🧨 Diffusers Usage Just load this checkpoint via : . If you were using a half-precision-VAE workaround before, you can now remove it (Disclaimer - I haven't tested this, just aggregating various instructions I've seen elsewhere :P PRs to improve these instructions are welcomed!) ## Details SDXL-VAE generates NaNs in fp16 because the internal activation values are too big: SDXL-VAE-FP16-Fix was created by finetuning the SDXL-VAE to: 1. keep the final output the same, but 2. make the internal activation values smaller, by 3. scaling down weights and biases within the network There are slight discrepancies between the output of SDXL-VAE-FP16-Fix and SDXL-VAE, but the decoded images should be close enough for most purposes. --- \\* is specifically based on SDXL-VAE (0.9), but it works with SDXL 1.0 too", + "model_explanation_gemini": "Modifies the SDXL VAE to enable stable fp16 precision decoding without generating NaNs by scaling internal weights and biases while maintaining similar output quality." +} \ No newline at end of file diff --git a/data/model_data_json/madhurjindal_autonlp-Gibberish-Detector-492513457.json b/data/model_data_json/madhurjindal_autonlp-Gibberish-Detector-492513457.json new file mode 100644 index 0000000000000000000000000000000000000000..5e7e91fbde29e7665b140ea44cac38f9ef59d64c --- /dev/null +++ b/data/model_data_json/madhurjindal_autonlp-Gibberish-Detector-492513457.json @@ -0,0 +1,23 @@ +{ + "model_id": "madhurjindal/autonlp-Gibberish-Detector-492513457", + "downloads": 146116, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "distilbert", + "text-classification", + "autonlp", + "en", + "dataset:madhurjindal/autonlp-data-Gibberish-Detector", + "doi:10.57967/hf/2664", + "license:mit", + "co2_eq_emissions", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - autonlp language: en widget: - text: I love Machine Learning! datasets: - madhurjindal/autonlp-data-Gibberish-Detector co2_eq_emissions: 5.527544460835904 license: mit --- # Problem Description The ability to process and understand user input is crucial for various applications, such as chatbots or downstream tasks. However, a common challenge faced in such systems is the presence of gibberish or nonsensical input. To address this problem, we present a project focused on developing a gibberish detector for the English language. The primary goal of this project is to classify user input as either **gibberish** or **non-gibberish**, enabling more accurate and meaningful interactions with the system. We also aim to enhance the overall performance and user experience of chatbots and other systems that rely on user input. >## What is Gibberish? Gibberish refers to **nonsensical or meaningless language or text** that lacks coherence or any discernible meaning. It can be characterized by a combination of random words, nonsensical phrases, grammatical errors, or syntactical abnormalities that prevent the communication from conveying a clear and understandable message. Gibberish can vary in intensity, ranging from simple noise with no meaningful words to sentences that may appear superficially correct but lack coherence or logical structure when examined closely. 
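(For the SDXL-VAE-FP16-Fix card above, whose loading snippet was elided: a minimal hedged diffusers sketch; the base-pipeline id and prompt are illustrative.)

```python
# Hedged sketch: swapping the fp16-safe VAE into an SDXL pipeline.
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base pipeline
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("a watercolor painting of a lighthouse").images[0]
```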
Detecting and identifying gibberish is essential in various contexts, such as **natural language processing**, **chatbot systems**, **spam filtering**, and **language-based security measures**, to ensure effective communication and accurate processing of user inputs. ## Label Description Thus, we break down the problem into 4 categories: 1. **Noise:** Gibberish at the zero level, where even the different constituents of the input phrase (words) do not hold any meaning independently. *For example: * 2. **Word Salad:** Gibberish at level 1, where words make sense independently, but looking at the bigger picture (the phrase) no meaning is conveyed. *For example: * 3. **Mild gibberish:** Gibberish at level 2, where a part of the sentence has grammatical errors, word sense errors, or other syntactical abnormalities that cause the sentence to miss a coherent meaning. *For example: * 4. **Clean:** This category represents a set of words that form a complete and meaningful sentence on its own. *For example: * > **Tip:** To facilitate gibberish detection, you can combine the labels based on the desired level of detection. For instance, if you need to detect gibberish at level 1, you can group Noise and Word Salad together as \"Gibberish,\" while considering Mild gibberish and Clean separately as \"NotGibberish.\" This approach allows for flexibility in detecting and categorizing different levels of gibberish based on specific requirements. # Model Trained Using AutoNLP - Problem type: Multi-class Classification - Model ID: 492513457 - CO2 Emissions (in grams): 5.527544460835904 ## Validation Metrics - Loss: 0.07609463483095169 - Accuracy: 0.9735624586913417 - Macro F1: 0.9736173135739408 - Micro F1: 0.9735624586913417 - Weighted F1: 0.9736173135739408 - Macro Precision: 0.9737771415197378 - Micro Precision: 0.9735624586913417 - Weighted Precision: 0.9737771415197378 - Macro Recall: 0.9735624586913417 - Micro Recall: 0.9735624586913417 - Weighted Recall: 0.9735624586913417 ## Usage You can use cURL to access this model: Or Python API: Another simplified solution with the transformers pipeline:", + "model_explanation_gemini": "Classifies English text as gibberish or non-gibberish to improve input processing in applications like chatbots and NLP systems." +} \ No newline at end of file diff --git a/data/model_data_json/martin-ha_toxic-comment-model.json b/data/model_data_json/martin-ha_toxic-comment-model.json new file mode 100644 index 0000000000000000000000000000000000000000..56678d47c770c83850400c9ef41e40e971717ea5 --- /dev/null +++ b/data/model_data_json/martin-ha_toxic-comment-model.json @@ -0,0 +1,16 @@ +{ + "model_id": "martin-ha/toxic-comment-model", + "downloads": 269971, + "tags": [ + "transformers", + "pytorch", + "distilbert", + "text-classification", + "en", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en --- ## Model description This model is a fine-tuned version of the DistilBERT model to classify toxic comments. ## How to use You can use the model with the following code. ## Limitations and Bias This model is intended for classifying toxic online comments. However, one limitation of the model is that it performs poorly for some comments that mention a specific identity subgroup, like Muslim. The following table shows evaluation scores for different identity groups. You can learn the specific meaning of these metrics here. 
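For the gibberish detector's elided usage snippets above, a minimal hedged sketch with the transformers pipeline (the example inputs are illustrative):

```python
# Hedged usage sketch for madhurjindal/autonlp-Gibberish-Detector-492513457,
# standing in for the card's elided cURL/Python snippets.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="madhurjindal/autonlp-Gibberish-Detector-492513457",
)
print(detector("I love Machine Learning!"))  # expected: the "clean" class
print(detector("sdgsajd hfjds qweiury"))     # expected: a gibberish-level class
```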
In short, these metrics show how well the model performs for a specific group; the larger the number, the better. | **subgroup** | **subgroup_size** | **subgroup_auc** | **bpsn_auc** | **bnsp_auc** | | ----------------------------- | ----------------- | ---------------- | ------------ | ------------ | | muslim | 108 | 0.689 | 0.811 | 0.88 | | jewish | 40 | 0.749 | 0.86 | 0.825 | | homosexual_gay_or_lesbian | 56 | 0.795 | 0.706 | 0.972 | | black | 84 | 0.866 | 0.758 | 0.975 | | white | 112 | 0.876 | 0.784 | 0.97 | | female | 306 | 0.898 | 0.887 | 0.948 | | christian | 231 | 0.904 | 0.917 | 0.93 | | male | 225 | 0.922 | 0.862 | 0.967 | | psychiatric_or_mental_illness | 26 | 0.924 | 0.907 | 0.95 | The table above shows that the model performs poorly for the Muslim and Jewish groups. In fact, if you pass the sentence \"Muslims are people who follow or practice Islam, an Abrahamic monotheistic religion.\" into the model, it will classify it as toxic. Be mindful of this type of potential bias. ## Training data The training data comes from this Kaggle competition. We use 10% of the data to train the model. ## Training procedure See this documentation and code for how we trained the model. Training takes about 3 hours on a P100 GPU. ## Evaluation results The model achieves 94% accuracy and a 0.59 F1-score on a 10,000-row held-out test set.", + "model_explanation_gemini": "Classifies toxic online comments using a fine-tuned DistilBERT model, with noted biases in handling certain identity subgroups like Muslims." +} \ No newline at end of file diff --git a/data/model_data_json/mattmdjaga_segformer_b2_clothes.json b/data/model_data_json/mattmdjaga_segformer_b2_clothes.json new file mode 100644 index 0000000000000000000000000000000000000000..35f4ed9033589b3a6b1f60b4d9de1880214c1ab5 --- /dev/null +++ b/data/model_data_json/mattmdjaga_segformer_b2_clothes.json @@ -0,0 +1,20 @@ +{ + "model_id": "mattmdjaga/segformer_b2_clothes", + "downloads": 508697, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "segformer", + "vision", + "image-segmentation", + "dataset:mattmdjaga/human_parsing_dataset", + "arxiv:2105.15203", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - vision - image-segmentation widget: - src: example_title: Person - src: example_title: Person datasets: - mattmdjaga/human_parsing_dataset --- # Segformer B2 fine-tuned for clothes segmentation SegFormer model fine-tuned on the ATR dataset for clothes segmentation; it can also be used for human segmentation. The dataset on Hugging Face is called \"mattmdjaga/human_parsing_dataset\". **Training code**. 
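A hedged usage sketch for this SegFormer checkpoint, standing in for the card's elided example (the processor class and input file name are assumptions; older transformers versions expose the same weights via SegformerFeatureExtractor):

```python
# Hedged sketch: per-pixel clothes segmentation with the checkpoint above.
import torch
from PIL import Image
from transformers import AutoModelForSemanticSegmentation, SegformerImageProcessor

processor = SegformerImageProcessor.from_pretrained("mattmdjaga/segformer_b2_clothes")
model = AutoModelForSemanticSegmentation.from_pretrained("mattmdjaga/segformer_b2_clothes")

image = Image.open("person.jpg")  # placeholder input file
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 18, H/4, W/4)

# Upsample to the input resolution, then take the per-pixel argmax
# to get one of the 18 label indices listed below.
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
mask = upsampled.argmax(dim=1)[0]
```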
Labels: 0: \"Background\", 1: \"Hat\", 2: \"Hair\", 3: \"Sunglasses\", 4: \"Upper-clothes\", 5: \"Skirt\", 6: \"Pants\", 7: \"Dress\", 8: \"Belt\", 9: \"Left-shoe\", 10: \"Right-shoe\", 11: \"Face\", 12: \"Left-leg\", 13: \"Right-leg\", 14: \"Left-arm\", 15: \"Right-arm\", 16: \"Bag\", 17: \"Scarf\" ### Evaluation | Label Index | Label Name | Category Accuracy | Category IoU | |:-------------:|:----------------:|:-----------------:|:------------:| | 0 | Background | 0.99 | 0.99 | | 1 | Hat | 0.73 | 0.68 | | 2 | Hair | 0.91 | 0.82 | | 3 | Sunglasses | 0.73 | 0.63 | | 4 | Upper-clothes | 0.87 | 0.78 | | 5 | Skirt | 0.76 | 0.65 | | 6 | Pants | 0.90 | 0.84 | | 7 | Dress | 0.74 | 0.55 | | 8 | Belt | 0.35 | 0.30 | | 9 | Left-shoe | 0.74 | 0.58 | | 10 | Right-shoe | 0.75 | 0.60 | | 11 | Face | 0.92 | 0.85 | | 12 | Left-leg | 0.90 | 0.82 | | 13 | Right-leg | 0.90 | 0.81 | | 14 | Left-arm | 0.86 | 0.74 | | 15 | Right-arm | 0.82 | 0.73 | | 16 | Bag | 0.91 | 0.84 | | 17 | Scarf | 0.63 | 0.29 | Overall Evaluation Metrics: - Evaluation Loss: 0.15 - Mean Accuracy: 0.80 - Mean IoU: 0.69 ### License The license for this model can be found here. ### BibTeX entry and citation info", + "model_explanation_gemini": "Segments clothing items and human body parts in images, trained on a dataset with 18 labeled categories including hats, upper-clothes, pants, and accessories." +} \ No newline at end of file diff --git a/data/model_data_json/medicalai_ClinicalBERT.json b/data/model_data_json/medicalai_ClinicalBERT.json new file mode 100644 index 0000000000000000000000000000000000000000..edba28e4692f5d3608cc7cecad0ad3a44d8ccf9b --- /dev/null +++ b/data/model_data_json/medicalai_ClinicalBERT.json @@ -0,0 +1,16 @@ +{ + "model_id": "medicalai/ClinicalBERT", + "downloads": 180656, + "tags": [ + "transformers", + "pytorch", + "distilbert", + "fill-mask", + "medical", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - medical --- # ClinicalBERT This model card describes the ClinicalBERT model, which was trained on a large multicenter dataset with a large corpus of 1.2B words of diverse diseases we constructed. We then utilized a large-scale corpus of EHRs from over 3 million patient records to fine tune the base language model. ## Pretraining Data The ClinicalBERT model was trained on a large multicenter dataset with a large corpus of 1.2B words of diverse diseases we constructed. ## Model Pretraining ### Pretraining Procedures The ClinicalBERT was initialized from BERT. Then the training followed the principle of masked language model, in which given a piece of text, we randomly replace some tokens by MASKs, special tokens for masking, and then require the model to predict the original tokens via contextual text. ### Pretraining Hyperparameters We used a batch size of 32, a maximum sequence length of 256, and a learning rate of 5e-5 for pre-training our models. ## How to use the model Load the model via the transformers library: ## Citation Please cite these articles: 1. Wang, G., Liu, X., Ying, Z. et al. Optimized glycemic control of type 2 diabetes with reinforcement learning: a proof-of-concept trial. Nat Med (2023). 2. Wang, G., Liu, X., Liu, H., Yang, G. et al. A Generalist Medical Language Model for Disease Diagnosis Assistance. 
Nat Med (2025).", + "model_explanation_gemini": "ClinicalBERT is a medical language model pretrained on 1.2B words of diverse disease data and fine-tuned on EHRs from 3 million patient records for clinical text understanding and prediction tasks." +} \ No newline at end of file diff --git a/data/model_data_json/mesolitica_llama2-embedding-1b-8k.json b/data/model_data_json/mesolitica_llama2-embedding-1b-8k.json new file mode 100644 index 0000000000000000000000000000000000000000..b525c9a9c73317bb22e86c648808a59fe558b2e0 --- /dev/null +++ b/data/model_data_json/mesolitica_llama2-embedding-1b-8k.json @@ -0,0 +1,17 @@ +{ + "model_id": "mesolitica/llama2-embedding-1b-8k", + "downloads": 129620, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "custom_code", + "ms", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ms --- # 1B 32768 context length Llama2 on Malaysian text embedding task Trained on a truncated 8k context length, but at inference able to scale up to a 32k context length. README at WandB. ## how-to", + "model_explanation_gemini": "Generates embeddings for Malaysian text with a 1B-parameter Llama2 model trained on 8k context but scalable to 32k for inference." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-2-13b-chat-hf.json b/data/model_data_json/meta-llama_Llama-2-13b-chat-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..354abce86f596dc0e98ce94868b29d212f61c47e --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-2-13b-chat-hf.json @@ -0,0 +1,23 @@ +{ + "model_id": "meta-llama/Llama-2-13b-chat-hf", + "downloads": 131738, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "llama-2", + "conversational", + "en", + "arxiv:2307.09288", + "license:llama2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- extra_gated_heading: You need to share contact information with Meta to access this model extra_gated_prompt: >- ### LLAMA 2 COMMUNITY LICENSE AGREEMENT \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Llama 2\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/. \"Llama Materials\" means, collectively, Meta's proprietary Llama 2 and documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. 
(if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non- transferable and royalty-free limited license under Meta's intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof). 2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials. b. 
Subject to Meta's ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy. #### Prohibited Uses We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 5. 
Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 2 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 2 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: github.com/facebookresearch/llama * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit language: - en pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-2 license: llama2 --- # **Llama 2** Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. 
This is the repository for the 13B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom. ## Model Details *Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the website and accept our License before requesting access here.* Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. **Model Developers** Meta **Variations** Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations. **Input** Models input text only. **Output** Models generate text only. **Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. ||Training Data|Params|Content Length|GQA|Tokens|LR| |---|---|---|---|---|---|---| |Llama 2|*A new mix of publicly available online data*|7B|4k|✗|2.0T|3.0 x 10-4| |Llama 2|*A new mix of publicly available online data*|13B|4k|✗|2.0T|3.0 x 10-4| |Llama 2|*A new mix of publicly available online data*|70B|4k|✔|2.0T|1.5 x 10-4| *Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens. Bigger models - 70B -- use Grouped-Query Attention (GQA) for improved inference scalability. **Model Dates** Llama 2 was trained between January 2023 and July 2023. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at: **Research Paper** \"Llama-2: Open Foundation and Fine-tuned Chat Models\" ## Intended Use **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. To get the expected features and performance for the chat versions, a specific formatting needs to be followed, including the and tags, and tokens, and the whitespaces and breaklines in between (we recommend calling on inputs to avoid double-spaces). See our reference code in github for details: []( **Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws).Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). 
Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program. ||Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO2eq)| |---|---|---|---| |Llama 2 7B|184320|400|31.22| |Llama 2 13B|368640|400|62.44| |Llama 2 70B|1720320|400|291.42| |Total|3311616||539.00| **CO2 emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023. ## Evaluation Results In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library. |Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval| |---|---|---|---|---|---|---|---|---|---| |Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9| |Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9| |Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7| |Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6| |Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3| |Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1| |Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**| **Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *Math:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1. |||TruthfulQA|Toxigen| |---|---|---|---| |Llama 1|7B|27.42|23.00| |Llama 1|13B|41.74|23.08| |Llama 1|33B|44.19|22.57| |Llama 1|65B|48.71|21.77| |Llama 2|7B|33.29|**21.25**| |Llama 2|13B|41.86|26.10| |Llama 2|70B|**50.18**|24.60| **Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better). |||TruthfulQA|Toxigen| |---|---|---|---| |Llama-2-Chat|7B|57.04|**0.00**| |Llama-2-Chat|13B|62.18|**0.00**| |Llama-2-Chat|70B|**64.14**|0.01| **Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above. ## Ethical Considerations and Limitations Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios.
For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see the Responsible Use Guide available at ## Reporting Issues Please report any software “bug,” or other problems with the models through one of the following means: - Reporting issues with the model: github.com/facebookresearch/llama - Reporting problematic content generated by the model: developers.facebook.com/llama_output_feedback - Reporting bugs and security concerns: facebook.com/whitehat/info ## Llama Model Index |Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf| |---|---|---|---|---| |7B| Link | Link | Link | Link| |13B| Link | Link | Link | Link| |70B| Link | Link | Link | Link|" +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-2-7b-chat-hf.json b/data/model_data_json/meta-llama_Llama-2-7b-chat-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..101267aea12fc347197d96a693361f341e7f04aa --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-2-7b-chat-hf.json @@ -0,0 +1,23 @@ +{ + "model_id": "meta-llama/Llama-2-7b-chat-hf", + "downloads": 1110112, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "llama-2", + "conversational", + "en", + "arxiv:2307.09288", + "license:llama2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- extra_gated_heading: You need to share contact information with Meta to access this model extra_gated_prompt: >- ### LLAMA 2 COMMUNITY LICENSE AGREEMENT \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Llama 2\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/. \"Llama Materials\" means, collectively, Meta's proprietary Llama 2 and documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. 
You are granted a non-exclusive, worldwide, non- transferable and royalty-free limited license under Meta's intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof). 2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials. b. Subject to Meta's ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy. #### Prohibited Uses We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 6. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 2 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 2 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: github.com/facebookresearch/llama * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit language: - en pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-2 license: llama2 --- # **Llama 2** Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom. 
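For orientation before the details below, here is a minimal sketch of running this chat checkpoint with the `transformers` pipeline, using the `INST`/`<<SYS>>` prompt format described under Intended Use; the system prompt, sampling settings, and hardware mapping are illustrative assumptions, not values from this card.

```python
# Minimal sketch (assumes gated access to the weights has been granted).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,  # assumption: fp16 on a single GPU
    device_map="auto",
)

# Llama-2-Chat expects the INST/<<SYS>> tags; the tokenizer adds the BOS
# token itself, so it is not written out here. strip() avoids double
# spaces, as the card recommends.
system = "You are a helpful, respectful and honest assistant."
user = "What is the capital of France?"
prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user.strip()} [/INST]"

out = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.6, top_p=0.9)
print(out[0]["generated_text"])
```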
## Model Details *Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the website and accept our License before requesting access here.* Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. **Model Developers** Meta **Variations** Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations. **Input** Models input text only. **Output** Models generate text only. **Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. ||Training Data|Params|Content Length|GQA|Tokens|LR| |---|---|---|---|---|---|---| |Llama 2|*A new mix of publicly available online data*|7B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|13B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|70B|4k|✔|2.0T|1.5 x 10^-4| *Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch size of 4M tokens. Bigger models (70B) use Grouped-Query Attention (GQA) for improved inference scalability. **Model Dates** Llama 2 was trained between January 2023 and July 2023. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at ai.meta.com/resources/models-and-libraries/llama-downloads/. **Research Paper** \"Llama 2: Open Foundation and Fine-Tuned Chat Models\" ## Intended Use **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. To get the expected features and performance for the chat versions, a specific formatting needs to be followed, including the `INST` and `<<SYS>>` tags, `BOS` and `EOS` tokens, and the whitespace and line breaks in between (we recommend calling `strip()` on inputs to avoid double spaces). See our reference code on GitHub for details: github.com/facebookresearch/llama. **Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program.
||Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO2eq)| |---|---|---|---| |Llama 2 7B|184320|400|31.22| |Llama 2 13B|368640|400|62.44| |Llama 2 70B|1720320|400|291.42| |Total|3311616||539.00| **CO2 emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023. ## Evaluation Results In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library. |Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval| |---|---|---|---|---|---|---|---|---|---| |Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9| |Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9| |Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7| |Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6| |Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3| |Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1| |Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**| **Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *Math:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1. |||TruthfulQA|Toxigen| |---|---|---|---| |Llama 1|7B|27.42|23.00| |Llama 1|13B|41.74|23.08| |Llama 1|33B|44.19|22.57| |Llama 1|65B|48.71|21.77| |Llama 2|7B|33.29|**21.25**| |Llama 2|13B|41.86|26.10| |Llama 2|70B|**50.18**|24.60| **Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better). |||TruthfulQA|Toxigen| |---|---|---|---| |Llama-2-Chat|7B|57.04|**0.00**| |Llama-2-Chat|13B|62.18|**0.00**| |Llama-2-Chat|70B|**64.14**|0.01| **Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above. ## Ethical Considerations and Limitations Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios.
For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see the Responsible Use Guide available at ## Reporting Issues Please report any software “bug,” or other problems with the models through one of the following means: - Reporting issues with the model: github.com/facebookresearch/llama - Reporting problematic content generated by the model: developers.facebook.com/llama_output_feedback - Reporting bugs and security concerns: facebook.com/whitehat/info ## Llama Model Index |Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf| |---|---|---|---|---| |7B| Link | Link | Link | Link| |13B| Link | Link | Link | Link| |70B| Link | Link | Link | Link|" +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-2-7b-hf.json b/data/model_data_json/meta-llama_Llama-2-7b-hf.json new file mode 100644 index 0000000000000000000000000000000000000000..b0db17a20b88f2323162c771ad26462103319401 --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-2-7b-hf.json @@ -0,0 +1,22 @@ +{ + "model_id": "meta-llama/Llama-2-7b-hf", + "downloads": 899278, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "llama-2", + "en", + "arxiv:2307.09288", + "license:llama2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- extra_gated_heading: You need to share contact information with Meta to access this model extra_gated_prompt: >- ### LLAMA 2 COMMUNITY LICENSE AGREEMENT \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Llama 2\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/. \"Llama Materials\" means, collectively, Meta's proprietary Llama 2 and documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. 
You are granted a non-exclusive, worldwide, non- transferable and royalty-free limited license under Meta's intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof). 2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials. b. Subject to Meta's ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy. #### Prohibited Uses We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 6. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 2 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 2 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: github.com/facebookresearch/llama * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit language: - en pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-2 license: llama2 --- # **Llama 2** Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B pretrained model, converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom. 
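Before the details below, a minimal sketch of plain text completion with this pretrained checkpoint; unlike the chat variants, the base model needs no special tags, and the prompt and generation settings here are illustrative assumptions.

```python
# Minimal sketch (assumes gated access to the weights has been granted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 on a single GPU
    device_map="auto",
)

# Base models simply continue the given text.
inputs = tokenizer("The theory of relativity states that", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```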
## Model Details *Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the website and accept our License before requesting access here.* Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. **Model Developers** Meta **Variations** Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations. **Input** Models input text only. **Output** Models generate text only. **Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. ||Training Data|Params|Content Length|GQA|Tokens|LR| |---|---|---|---|---|---|---| |Llama 2|*A new mix of publicly available online data*|7B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|13B|4k|✗|2.0T|3.0 x 10^-4| |Llama 2|*A new mix of publicly available online data*|70B|4k|✔|2.0T|1.5 x 10^-4| *Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch size of 4M tokens. Bigger models (70B) use Grouped-Query Attention (GQA) for improved inference scalability. **Model Dates** Llama 2 was trained between January 2023 and July 2023. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at ai.meta.com/resources/models-and-libraries/llama-downloads/. **Research Paper** \"Llama 2: Open Foundation and Fine-Tuned Chat Models\" ## Intended Use **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. To get the expected features and performance for the chat versions, a specific formatting needs to be followed, including the `INST` and `<<SYS>>` tags, `BOS` and `EOS` tokens, and the whitespace and line breaks in between (we recommend calling `strip()` on inputs to avoid double spaces). See our reference code on GitHub for details: github.com/facebookresearch/llama. **Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program.
||Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO2eq)| |---|---|---|---| |Llama 2 7B|184320|400|31.22| |Llama 2 13B|368640|400|62.44| |Llama 2 70B|1720320|400|291.42| |Total|3311616||539.00| **CO2 emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023. ## Evaluation Results In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library. |Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval| |---|---|---|---|---|---|---|---|---|---| |Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9| |Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9| |Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7| |Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6| |Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3| |Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1| |Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**| **Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *Math:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1. |||TruthfulQA|Toxigen| |---|---|---|---| |Llama 1|7B|27.42|23.00| |Llama 1|13B|41.74|23.08| |Llama 1|33B|44.19|22.57| |Llama 1|65B|48.71|21.77| |Llama 2|7B|33.29|**21.25**| |Llama 2|13B|41.86|26.10| |Llama 2|70B|**50.18**|24.60| **Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better). |||TruthfulQA|Toxigen| |---|---|---|---| |Llama-2-Chat|7B|57.04|**0.00**| |Llama-2-Chat|13B|62.18|**0.00**| |Llama-2-Chat|70B|**64.14**|0.01| **Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above. ## Ethical Considerations and Limitations Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios.
For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model. Please see the Responsible Use Guide available at ## Reporting Issues Please report any software “bug,” or other problems with the models through one of the following means: - Reporting issues with the model: github.com/facebookresearch/llama - Reporting problematic content generated by the model: developers.facebook.com/llama_output_feedback - Reporting bugs and security concerns: facebook.com/whitehat/info ## Llama Model Index |Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf| |---|---|---|---|---| |7B| Link | Link | Link | Link| |13B| Link | Link | Link | Link| |70B| Link | Link | Link | Link|" +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.1-70B-Instruct.json b/data/model_data_json/meta-llama_Llama-3.1-70B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..7e880815aacfe2bd7c14add222eefb01c5ed69b2 --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.1-70B-Instruct.json @@ -0,0 +1,32 @@ +{ + "model_id": "meta-llama/Llama-3.1-70B-Instruct", + "downloads": 1114152, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-70B", + "base_model:finetune:meta-llama/Llama-3.1-70B", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th library_name: transformers base_model: meta-llama/Meta-Llama-3.1-70B new_version: meta-llama/Llama-3.3-70B-Instruct license: llama3.1 pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 extra_gated_prompt: \"### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT\\nLlama 3.1 Version\\ \\ Release Date: July 23, 2024\\n\\\"Agreement\\\" means the terms and conditions for\\ \\ use, reproduction, distribution and modification of the Llama Materials set forth\\ \\ herein.\\n\\\"Documentation\\\" means the specifications, manuals and documentation\\ \\ accompanying Llama 3.1 distributed by Meta at \\\"Licensee\\\" or \\\"you\\\" means you, or your employer or any other person or entity\\ \\ (if you are entering into this Agreement on such person or entity’s behalf), of\\ \\ the age required under applicable laws, rules or regulations to provide legal\\ \\ consent and that has legal authority to bind your employer or such other person\\ \\ or entity if you are entering in this Agreement on their behalf.\\n\\\"Llama 3.1\\\"\\ \\ means the foundational large language models and software and algorithms, including\\ \\ machine-learning model code, trained model weights, inference-enabling code, training-enabling\\ \\ code, fine-tuning enabling code and other elements of the foregoing distributed\\ \\ by Meta at Materials\\\" means,\\ \\ collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion\\ \\ thereof) made available under this Agreement.\\n\\\"Meta\\\" or \\\"we\\\" means Meta Platforms\\ \\ Ireland Limited (if you are located in or, if you are 
an entity, your principal\\ \\ place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you\\ \\ are located outside of the EEA or Switzerland).\\n \\n1. License Rights and Redistribution.\\n\\ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable\\ \\ and royalty-free limited license under Meta’s intellectual property or other rights\\ \\ owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy,\\ \\ create derivative works of, and make modifications to the Llama Materials.\\nb.\\ \\ Redistribution and Use.\\ni. If you distribute or make available the Llama Materials\\ \\ (or any derivative works thereof), or a product or service (including another\\ \\ AI model) that contains any of them, you shall (A) provide a copy of this Agreement\\ \\ with any such Llama Materials; and (B) prominently display “Built with Llama”\\ \\ on a related website, user interface, blogpost, about page, or product documentation.\\ \\ If you use the Llama Materials or any outputs or results of the Llama Materials\\ \\ to create, train, fine tune, or otherwise improve an AI model, which is distributed\\ \\ or made available, you shall also include “Llama” at the beginning of any such\\ \\ AI model name.\\nii. If you receive Llama Materials, or any derivative works thereof,\\ \\ from a Licensee as part of an integrated end user product, then Section 2 of\\ \\ this Agreement will not apply to you.\\niii. You must retain in all copies of the\\ \\ Llama Materials that you distribute the following attribution notice within a\\ \\ “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed\\ \\ under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights\\ \\ Reserved.”\\niv. Your use of the Llama Materials must comply with applicable laws\\ \\ and regulations (including trade compliance laws and regulations) and adhere to\\ \\ the Acceptable Use Policy for the Llama Materials (available at \\ which is hereby incorporated by reference into this Agreement.\\n2. Additional\\ \\ Commercial Terms. If, on the Llama 3.1 version release date, the monthly active\\ \\ users of the products or services made available by or for Licensee, or Licensee’s\\ \\ affiliates, is greater than 700 million monthly active users in the preceding\\ \\ calendar month, you must request a license from Meta, which Meta may grant to\\ \\ you in its sole discretion, and you are not authorized to exercise any of the\\ \\ rights under this Agreement unless or until Meta otherwise expressly grants you\\ \\ such rights.\\n3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE\\ \\ LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS”\\ \\ BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY\\ \\ KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES\\ \\ OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.\\ \\ YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING\\ \\ THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA\\ \\ MATERIALS AND ANY OUTPUT AND RESULTS.\\n4. Limitation of Liability. 
IN NO EVENT\\ \\ WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN\\ \\ CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS\\ \\ AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL,\\ \\ EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED\\ \\ OF THE POSSIBILITY OF ANY OF THE FOREGOING.\\n5. Intellectual Property.\\na. No\\ \\ trademark licenses are granted under this Agreement, and in connection with the\\ \\ Llama Materials, neither Meta nor Licensee may use any name or mark owned by or\\ \\ associated with the other or any of its affiliates, except as required for reasonable\\ \\ and customary use in describing and redistributing the Llama Materials or as set\\ \\ forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the\\ \\ “Mark”) solely as required to comply with the last sentence of Section 1.b.i.\\ \\ You will comply with Meta’s brand guidelines (currently accessible at \\ ). All goodwill arising out of your use of the Mark will inure to the benefit\\ \\ of Meta.\\nb. Subject to Meta’s ownership of Llama Materials and derivatives made\\ \\ by or for Meta, with respect to any derivative works and modifications of the\\ \\ Llama Materials that are made by you, as between you and Meta, you are and will\\ \\ be the owner of such derivative works and modifications.\\nc. If you institute\\ \\ litigation or other proceedings against Meta or any entity (including a cross-claim\\ \\ or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs\\ \\ or results, or any portion of any of the foregoing, constitutes infringement of\\ \\ intellectual property or other rights owned or licensable by you, then any licenses\\ \\ granted to you under this Agreement shall terminate as of the date such litigation\\ \\ or claim is filed or instituted. You will indemnify and hold harmless Meta from\\ \\ and against any claim by any third party arising out of or related to your use\\ \\ or distribution of the Llama Materials.\\n6. Term and Termination. The term of\\ \\ this Agreement will commence upon your acceptance of this Agreement or access\\ \\ to the Llama Materials and will continue in full force and effect until terminated\\ \\ in accordance with the terms and conditions herein. Meta may terminate this Agreement\\ \\ if you are in breach of any term or condition of this Agreement. Upon termination\\ \\ of this Agreement, you shall delete and cease use of the Llama Materials. Sections\\ \\ 3, 4 and 7 shall survive the termination of this Agreement.\\n7. Governing Law\\ \\ and Jurisdiction. This Agreement will be governed and construed under the laws\\ \\ of the State of California without regard to choice of law principles, and the\\ \\ UN Convention on Contracts for the International Sale of Goods does not apply\\ \\ to this Agreement. The courts of California shall have exclusive jurisdiction\\ \\ of any dispute arising out of this Agreement.\\n### Llama 3.1 Acceptable Use Policy\\n\\ Meta is committed to promoting safe and fair use of its tools and features, including\\ \\ Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy\\ \\ (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses\\nWe want everyone to use Llama 3.1 safely and responsibly.\\ \\ You agree you will not use, or allow others to use, Llama 3.1 to:\\n 1. 
Violate\\ \\ the law or others’ rights, including to:\\n 1. Engage in, promote, generate,\\ \\ contribute to, encourage, plan, incite, or further illegal or unlawful activity\\ \\ or content, such as:\\n 1. Violence or terrorism\\n 2. Exploitation\\ \\ or harm to children, including the solicitation, creation, acquisition, or dissemination\\ \\ of child exploitative content or failure to report Child Sexual Abuse Material\\n\\ \\ 3. Human trafficking, exploitation, and sexual violence\\n 4. The\\ \\ illegal distribution of information or materials to minors, including obscene\\ \\ materials, or failure to employ legally required age-gating in connection with\\ \\ such information or materials.\\n 5. Sexual solicitation\\n 6. Any\\ \\ other criminal activity\\n 3. Engage in, promote, incite, or facilitate the\\ \\ harassment, abuse, threatening, or bullying of individuals or groups of individuals\\n\\ \\ 4. Engage in, promote, incite, or facilitate discrimination or other unlawful\\ \\ or harmful conduct in the provision of employment, employment benefits, credit,\\ \\ housing, other economic benefits, or other essential goods and services\\n 5.\\ \\ Engage in the unauthorized or unlicensed practice of any profession including,\\ \\ but not limited to, financial, legal, medical/health, or related professional\\ \\ practices\\n 6. Collect, process, disclose, generate, or infer health, demographic,\\ \\ or other sensitive personal or private information about individuals without rights\\ \\ and consents required by applicable laws\\n 7. Engage in or facilitate any action\\ \\ or generate any content that infringes, misappropriates, or otherwise violates\\ \\ any third-party rights, including the outputs or results of any products or services\\ \\ using the Llama Materials\\n 8. Create, generate, or facilitate the creation\\ \\ of malicious code, malware, computer viruses or do anything else that could disable,\\ \\ overburden, interfere with or impair the proper working, integrity, operation\\ \\ or appearance of a website or computer system\\n2. Engage in, promote, incite,\\ \\ facilitate, or assist in the planning or development of activities that present\\ \\ a risk of death or bodily harm to individuals, including use of Llama 3.1 related\\ \\ to the following:\\n 1. Military, warfare, nuclear industries or applications,\\ \\ espionage, use for materials or activities that are subject to the International\\ \\ Traffic Arms Regulations (ITAR) maintained by the United States Department of\\ \\ State\\n 2. Guns and illegal weapons (including weapon development)\\n 3.\\ \\ Illegal drugs and regulated/controlled substances\\n 4. Operation of critical\\ \\ infrastructure, transportation technologies, or heavy machinery\\n 5. Self-harm\\ \\ or harm to others, including suicide, cutting, and eating disorders\\n 6. Any\\ \\ content intended to incite or promote violence, abuse, or any infliction of bodily\\ \\ harm to an individual\\n3. Intentionally deceive or mislead others, including use\\ \\ of Llama 3.1 related to the following:\\n 1. Generating, promoting, or furthering\\ \\ fraud or the creation or promotion of disinformation\\n 2. Generating, promoting,\\ \\ or furthering defamatory content, including the creation of defamatory statements,\\ \\ images, or other content\\n 3. Generating, promoting, or further distributing\\ \\ spam\\n 4. Impersonating another individual without consent, authorization,\\ \\ or legal right\\n 5. 
Representing that the use of Llama 3.1 or outputs are human-generated\\n\\ \\ 6. Generating or facilitating false online engagement, including fake reviews\\ \\ and other means of fake online engagement\\n4. Fail to appropriately disclose to\\ \\ end users any known dangers of your AI system\\nPlease report any violation of\\ \\ this Policy, software “bug,” or other problems that could lead to a violation\\ \\ of this Policy through one of the following means:\\n * Reporting issues with\\ \\ the model: \\ * Reporting risky content generated by the model:\\n developers.facebook.com/llama_output_feedback\\n\\ \\ * Reporting bugs and security concerns: facebook.com/whitehat/info\\n * Reporting\\ \\ violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com\" extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location ? By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy : checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card. **Note:** Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages are done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-70B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . See the snippet below for usage with Transformers: ### Tool use with transformers LLaMA-3.1 supports multiple tool use formats. You can see a full guide to prompt formatting here. Tool use is also supported through chat templates in Transformers. Here is a quick example showing a single simple tool: You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so: and then call the tool and append the result, with the role, like so: After that, you can again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information, see the LLaMA prompt format docs and the Transformers tool use documentation. ### Use with The model checkpoints can be used in and for further memory optimisations using and . See the snippet below for usage. To load in 4-bit, simply pass . ### Use with Please follow the instructions in the repository.
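The inline code snippets referenced in the "Use with transformers", "Tool use with transformers", and 4-bit loading paragraphs above were not captured in this card. As a stand-in, here is a minimal sketch of conversational inference via the transformers pipeline, assuming a recent transformers release (>= 4.43.0 per the original card) and access to the gated meta-llama/Llama-3.1-70B-Instruct checkpoint:

```python
import torch
import transformers

model_id = "meta-llama/Llama-3.1-70B-Instruct"

# Chat-style generation through the high-level pipeline API.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])  # last message = the assistant reply
```

For the single-tool chat-template flow described above, a hedged sketch follows; `get_current_temperature` is a hypothetical example tool, and passing Python functions through the `tools` argument assumes a transformers version with chat-template tool support:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country".
    """
    return 22.0  # hypothetical stub; a real tool would call a weather API

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]

# The tokenizer renders the function signature and docstring into the
# Llama 3.1 tool-calling prompt format.
inputs = tokenizer.apply_chat_template(
    messages, tools=[get_current_temperature], add_generation_prompt=True
)
```

And a sketch of the 4-bit loading path via bitsandbytes, assuming the bitsandbytes package is installed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize weights to 4-bit on load to cut memory use roughly 4x vs. fp16.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
)
```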
To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative total of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
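The figures above follow from the stated methodology: GPU-hours times peak per-GPU power, adjusted for power usage efficiency, then multiplied by a grid carbon intensity. A small illustrative check, with the PUE adjustment and exact grid intensities treated as unknowns:

```python
# Illustrative back-of-the-envelope check of the table above; Meta's exact
# methodology (including the power-usage-efficiency adjustment) is described
# in the paper referenced by this card.
gpu_hours = {"8B": 1.46e6, "70B": 7.0e6, "405B": 30.84e6}
tdp_kw = 0.700  # 700 W peak per H100-80GB GPU, from the table

for name, hours in gpu_hours.items():
    energy_gwh = hours * tdp_kw / 1e6  # peak-power energy, ignoring PUE
    print(f"Llama 3.1 {name}: ~{energy_gwh:.2f} GWh")

# Implied location-based intensity for the 8B run:
# 420 tCO2eq / (1.46M GPU-h * 0.7 kW) ~ 0.41 kgCO2eq per kWh
print(420_000 / (1.46e6 * 0.7))
```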
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 46.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases; examples of how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models, enabling the world to benefit from the technology’s power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide; you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented, please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment, as well as to mitigating safety and security risks inherent to the system and to any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs, and possible integrations by developers with third-party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases.
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing fine-tuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks, including long context, multilingual use, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development.
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking into account market-specific nuances and experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open-sourced for the community to use, and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.1-70B.json b/data/model_data_json/meta-llama_Llama-3.1-70B.json new file mode 100644 index 0000000000000000000000000000000000000000..9620fb14e5e9577cb44fc91eb6a0631a1b1de47f --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.1-70B.json @@ -0,0 +1,30 @@ +{ + "model_id": "meta-llama/Llama-3.1-70B", + "downloads": 87342, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.1 extra_gated_prompt: >- ### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT Llama 3.1 Version Release Date: July 23, 2024 \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 3.1 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Llama 3.1\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"Llama Materials\" means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.1 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.1 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.1 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 3. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 4. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 5. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 6. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 7. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 8. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.1 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.1 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 3.1 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). 
The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card. **Note:** Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages are done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-70B, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . A usage sketch is shown after the emissions table below. ### Use with Please follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative total of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
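As noted in the "Use with transformers" paragraph above, the original snippet was not captured in this card. A minimal sketch of plain text completion for the base (non-instruct) model, assuming transformers >= 4.43.0 and access to the gated checkpoint:

```python
import torch
import transformers

model_id = "meta-llama/Llama-3.1-70B"

# The base model is not instruction tuned: prompt it with text to
# continue rather than with chat messages.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

print(pipeline("Hey, how are you doing today?"))
```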
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 46.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases; examples of how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models, enabling the world to benefit from the technology’s power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide; you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented, please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment, as well as to mitigating safety and security risks inherent to the system and to any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs, and possible integrations by developers with third-party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases.
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing fine-tuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks, including long context, multilingual use, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development.
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking into account market-specific nuances and experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open-sourced for the community to use, and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.", + "model_explanation_gemini": "A multilingual text-generation model supporting several languages, designed for creating and modifying text outputs under Meta's Llama 3.1 Community License." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.1-8B-Instruct.json b/data/model_data_json/meta-llama_Llama-3.1-8B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..6f5982b8434ca14fce06e04d18f656279263558e --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.1-8B-Instruct.json @@ -0,0 +1,32 @@ +{ + "model_id": "meta-llama/Llama-3.1-8B-Instruct", + "downloads": 5237076, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-8B", + "base_model:finetune:meta-llama/Llama-3.1-8B", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th license: llama3.1 base_model: meta-llama/Meta-Llama-3.1-8B pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 extra_gated_prompt: \"### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT\\nLlama 3.1 Version\\ \\ Release Date: July 23, 2024\\n\\\"Agreement\\\" means the terms and conditions for\\ \\ use, reproduction, distribution and modification of the Llama Materials set forth\\ \\ herein.\\n\\\"Documentation\\\" means the specifications, manuals and documentation\\ \\ accompanying Llama 3.1 distributed by Meta at \\\"Licensee\\\" or \\\"you\\\" means you, or your employer or any other person or entity\\ \\ (if you are entering into this Agreement on such person or entity’s behalf), of\\ \\ the age required under applicable laws, rules or regulations to provide legal\\ \\ consent and that has legal authority to bind your employer or such other person\\ \\ or entity if you are entering in this Agreement on their behalf.\\n\\\"Llama 3.1\\\"\\ \\ means the foundational large language models and software and algorithms, including\\ \\ machine-learning model code, trained model weights, inference-enabling code, training-enabling\\ \\ code, fine-tuning enabling code and other elements of the foregoing distributed\\ \\ by Meta at Materials\\\" means,\\ \\ collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion\\ \\ thereof) made available under this Agreement.\\n\\\"Meta\\\" or \\\"we\\\" means Meta Platforms\\ \\ Ireland Limited (if you are located in or, if you are an entity, your principal\\ \\ place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you\\ \\ are located outside of the EEA or Switzerland).\\n \\n1. License Rights and Redistribution.\\n\\ a. Grant of Rights. 
You are granted a non-exclusive, worldwide, non-transferable\\ \\ and royalty-free limited license under Meta’s intellectual property or other rights\\ \\ owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy,\\ \\ create derivative works of, and make modifications to the Llama Materials.\\nb.\\ \\ Redistribution and Use.\\ni. If you distribute or make available the Llama Materials\\ \\ (or any derivative works thereof), or a product or service (including another\\ \\ AI model) that contains any of them, you shall (A) provide a copy of this Agreement\\ \\ with any such Llama Materials; and (B) prominently display “Built with Llama”\\ \\ on a related website, user interface, blogpost, about page, or product documentation.\\ \\ If you use the Llama Materials or any outputs or results of the Llama Materials\\ \\ to create, train, fine tune, or otherwise improve an AI model, which is distributed\\ \\ or made available, you shall also include “Llama” at the beginning of any such\\ \\ AI model name.\\nii. If you receive Llama Materials, or any derivative works thereof,\\ \\ from a Licensee as part of an integrated end user product, then Section 2 of\\ \\ this Agreement will not apply to you.\\niii. You must retain in all copies of the\\ \\ Llama Materials that you distribute the following attribution notice within a\\ \\ “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed\\ \\ under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights\\ \\ Reserved.”\\niv. Your use of the Llama Materials must comply with applicable laws\\ \\ and regulations (including trade compliance laws and regulations) and adhere to\\ \\ the Acceptable Use Policy for the Llama Materials (available at \\ which is hereby incorporated by reference into this Agreement.\\n2. Additional\\ \\ Commercial Terms. If, on the Llama 3.1 version release date, the monthly active\\ \\ users of the products or services made available by or for Licensee, or Licensee’s\\ \\ affiliates, is greater than 700 million monthly active users in the preceding\\ \\ calendar month, you must request a license from Meta, which Meta may grant to\\ \\ you in its sole discretion, and you are not authorized to exercise any of the\\ \\ rights under this Agreement unless or until Meta otherwise expressly grants you\\ \\ such rights.\\n3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE\\ \\ LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS”\\ \\ BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY\\ \\ KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES\\ \\ OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.\\ \\ YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING\\ \\ THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA\\ \\ MATERIALS AND ANY OUTPUT AND RESULTS.\\n4. Limitation of Liability. IN NO EVENT\\ \\ WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN\\ \\ CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS\\ \\ AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL,\\ \\ EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED\\ \\ OF THE POSSIBILITY OF ANY OF THE FOREGOING.\\n5. Intellectual Property.\\na. 
No\\ \\ trademark licenses are granted under this Agreement, and in connection with the\\ \\ Llama Materials, neither Meta nor Licensee may use any name or mark owned by or\\ \\ associated with the other or any of its affiliates, except as required for reasonable\\ \\ and customary use in describing and redistributing the Llama Materials or as set\\ \\ forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the\\ \\ “Mark”) solely as required to comply with the last sentence of Section 1.b.i.\\ \\ You will comply with Meta’s brand guidelines (currently accessible at \\ ). All goodwill arising out of your use of the Mark will inure to the benefit\\ \\ of Meta.\\nb. Subject to Meta’s ownership of Llama Materials and derivatives made\\ \\ by or for Meta, with respect to any derivative works and modifications of the\\ \\ Llama Materials that are made by you, as between you and Meta, you are and will\\ \\ be the owner of such derivative works and modifications.\\nc. If you institute\\ \\ litigation or other proceedings against Meta or any entity (including a cross-claim\\ \\ or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs\\ \\ or results, or any portion of any of the foregoing, constitutes infringement of\\ \\ intellectual property or other rights owned or licensable by you, then any licenses\\ \\ granted to you under this Agreement shall terminate as of the date such litigation\\ \\ or claim is filed or instituted. You will indemnify and hold harmless Meta from\\ \\ and against any claim by any third party arising out of or related to your use\\ \\ or distribution of the Llama Materials.\\n6. Term and Termination. The term of\\ \\ this Agreement will commence upon your acceptance of this Agreement or access\\ \\ to the Llama Materials and will continue in full force and effect until terminated\\ \\ in accordance with the terms and conditions herein. Meta may terminate this Agreement\\ \\ if you are in breach of any term or condition of this Agreement. Upon termination\\ \\ of this Agreement, you shall delete and cease use of the Llama Materials. Sections\\ \\ 3, 4 and 7 shall survive the termination of this Agreement.\\n7. Governing Law\\ \\ and Jurisdiction. This Agreement will be governed and construed under the laws\\ \\ of the State of California without regard to choice of law principles, and the\\ \\ UN Convention on Contracts for the International Sale of Goods does not apply\\ \\ to this Agreement. The courts of California shall have exclusive jurisdiction\\ \\ of any dispute arising out of this Agreement.\\n### Llama 3.1 Acceptable Use Policy\\n\\ Meta is committed to promoting safe and fair use of its tools and features, including\\ \\ Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy\\ \\ (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses\\nWe want everyone to use Llama 3.1 safely and responsibly.\\ \\ You agree you will not use, or allow others to use, Llama 3.1 to:\\n 1. Violate\\ \\ the law or others’ rights, including to:\\n 1. Engage in, promote, generate,\\ \\ contribute to, encourage, plan, incite, or further illegal or unlawful activity\\ \\ or content, such as:\\n 1. Violence or terrorism\\n 2. Exploitation\\ \\ or harm to children, including the solicitation, creation, acquisition, or dissemination\\ \\ of child exploitative content or failure to report Child Sexual Abuse Material\\n\\ \\ 3. Human trafficking, exploitation, and sexual violence\\n 4. 
The\\ \\ illegal distribution of information or materials to minors, including obscene\\ \\ materials, or failure to employ legally required age-gating in connection with\\ \\ such information or materials.\\n 5. Sexual solicitation\\n 6. Any\\ \\ other criminal activity\\n 3. Engage in, promote, incite, or facilitate the\\ \\ harassment, abuse, threatening, or bullying of individuals or groups of individuals\\n\\ \\ 4. Engage in, promote, incite, or facilitate discrimination or other unlawful\\ \\ or harmful conduct in the provision of employment, employment benefits, credit,\\ \\ housing, other economic benefits, or other essential goods and services\\n 5.\\ \\ Engage in the unauthorized or unlicensed practice of any profession including,\\ \\ but not limited to, financial, legal, medical/health, or related professional\\ \\ practices\\n 6. Collect, process, disclose, generate, or infer health, demographic,\\ \\ or other sensitive personal or private information about individuals without rights\\ \\ and consents required by applicable laws\\n 7. Engage in or facilitate any action\\ \\ or generate any content that infringes, misappropriates, or otherwise violates\\ \\ any third-party rights, including the outputs or results of any products or services\\ \\ using the Llama Materials\\n 8. Create, generate, or facilitate the creation\\ \\ of malicious code, malware, computer viruses or do anything else that could disable,\\ \\ overburden, interfere with or impair the proper working, integrity, operation\\ \\ or appearance of a website or computer system\\n2. Engage in, promote, incite,\\ \\ facilitate, or assist in the planning or development of activities that present\\ \\ a risk of death or bodily harm to individuals, including use of Llama 3.1 related\\ \\ to the following:\\n 1. Military, warfare, nuclear industries or applications,\\ \\ espionage, use for materials or activities that are subject to the International\\ \\ Traffic Arms Regulations (ITAR) maintained by the United States Department of\\ \\ State\\n 2. Guns and illegal weapons (including weapon development)\\n 3.\\ \\ Illegal drugs and regulated/controlled substances\\n 4. Operation of critical\\ \\ infrastructure, transportation technologies, or heavy machinery\\n 5. Self-harm\\ \\ or harm to others, including suicide, cutting, and eating disorders\\n 6. Any\\ \\ content intended to incite or promote violence, abuse, or any infliction of bodily\\ \\ harm to an individual\\n3. Intentionally deceive or mislead others, including use\\ \\ of Llama 3.1 related to the following:\\n 1. Generating, promoting, or furthering\\ \\ fraud or the creation or promotion of disinformation\\n 2. Generating, promoting,\\ \\ or furthering defamatory content, including the creation of defamatory statements,\\ \\ images, or other content\\n 3. Generating, promoting, or further distributing\\ \\ spam\\n 4. Impersonating another individual without consent, authorization,\\ \\ or legal right\\n 5. Representing that the use of Llama 3.1 or outputs are human-generated\\n\\ \\ 6. Generating or facilitating false online engagement, including fake reviews\\ \\ and other means of fake online engagement\\n4. 
Fail to appropriately disclose to\\ \\ end users any known dangers of your AI system\\nPlease report any violation of\\ \\ this Policy, software “bug,” or other problems that could lead to a violation\\ \\ of this Policy through one of the following means:\\n * Reporting issues with\\ \\ the model: \\ * Reporting risky content generated by the model:\\n developers.facebook.com/llama_output_feedback\\n\\ \\ * Reporting bugs and security concerns: facebook.com/whitehat/info\\n * Reporting\\ \\ violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com\" extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location ? By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy : checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available. **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card.** **Note**: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with transformers >= 4.43.0 onward, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function (see the sketch below). Make sure to update your transformers installation via pip install --upgrade transformers. Note: You can also find detailed recipes on how to use the model locally, with assisted generation, quantisation and more. ### Tool use with transformers LLaMA-3.1 supports multiple tool use formats. You can see a full guide to prompt formatting here. Tool use is also supported through chat templates in Transformers; a quick example showing a single simple tool is sketched below. You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat, then call the tool and append the result with the tool role. After that, you can call generate() again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information, see the LLaMA prompt format docs and the Transformers tool use documentation.
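The code snippets the card refers to are not reproduced in this dump. Below is a minimal sketch of the pipeline-based conversational inference described above, assuming transformers >= 4.43.0 and authenticated access to the gated checkpoint:

```python
import torch
from transformers import pipeline

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# Load the model through the high-level text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# The pipeline applies the chat template internally and returns the full chat.
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
```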
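And here is a sketch of the single-tool example the tool-use section describes, using the Transformers chat-template tool-calling convention. The `get_current_temperature` function is a hypothetical stub introduced for illustration; a real tool would query an actual service:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0  # hypothetical stub; a real tool would call a weather API

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]

# Render the prompt with the tool schema injected via the chat template.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False
)

# If the model emits a tool call, append it to the chat...
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

# ...then call the tool and append its result with the "tool" role,
# before generating again so the model can use the result.
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```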
### Use with the original codebase Please follow the instructions in the repository to download the original checkpoints. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative total of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
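As a rough consistency check on the table above, the reported GPU hours and per-device power can be related to the location-based emissions figure. This back-of-envelope sketch is not Meta's methodology (which is linked from the card and includes a power-usage-effectiveness adjustment); the implied grid intensity is an inference, not a reported number:

```python
# Illustrative arithmetic only; assumes emissions scale directly with GPU
# energy and ignores the PUE adjustment the card mentions.
gpu_hours = 1.46e6   # Llama 3.1 8B training time (GPU hours)
tdp_watts = 700      # peak power per H100-80GB device (W)

energy_kwh = gpu_hours * tdp_watts / 1000   # ~1.02 GWh before PUE adjustment
emissions_kg = 420 * 1000                   # reported location-based tCO2eq -> kg

print(energy_kwh / 1e6, "GWh")                 # ~1.02
print(emissions_kg / energy_kwh, "kgCO2eq/kWh")  # ~0.41 implied intensity
```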
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 46.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, so they are aware of the safety and security limitations of this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing fine-tuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building dedicated evaluation datasets for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks, including long context, multilinguality, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development.
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking into account market-specific nuances and experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open-sourced for the community to use and widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.1-8B.json b/data/model_data_json/meta-llama_Llama-3.1-8B.json new file mode 100644 index 0000000000000000000000000000000000000000..2c73146fa87db0fdcb485ca02f5f9d35e2a00091 --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.1-8B.json @@ -0,0 +1,29 @@ +{ + "model_id": "meta-llama/Llama-3.1-8B", + "downloads": 1087264, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.1 extra_gated_prompt: >- ### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT Llama 3.1 Version Release Date: July 23, 2024 \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 3.1 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Llama 3.1\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"Llama Materials\" means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.1 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.1 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.1 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 3. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 4. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 5. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 6. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 7. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 8. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.1 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.1 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 3.1 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit library_name: transformers --- ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). 
The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available. **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card.** **Note**: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta's Llama-3.1-8B, for use with transformers and with the original codebase. ### Use with transformers Starting with transformers >= 4.43.0 onward, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function. Make sure to update your transformers installation via pip install --upgrade transformers. ### Use with the original codebase Please follow the instructions in the repository to download the original checkpoints. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative total of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training.
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 46.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, so they are aware of the safety and security limitations of this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing fine-tuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building dedicated evaluation datasets for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks, including long context, multilinguality, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development.
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking into account market-specific nuances and experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open-sourced for the community to use and widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.2-11B-Vision-Instruct.json b/data/model_data_json/meta-llama_Llama-3.2-11B-Vision-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..72231f5a827c8a2ecfde7fba1f8bc1af0e44cd2a --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.2-11B-Vision-Instruct.json @@ -0,0 +1,30 @@ +{ + "model_id": "meta-llama/Llama-3.2-11B-Vision-Instruct", + "downloads": 706191, + "tags": [ + "transformers", + "safetensors", + "mllama", + "image-text-to-text", + "facebook", + "meta", + "pytorch", + "llama", + "llama-3", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "license:llama3.2", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th library_name: transformers pipeline_tag: image-text-to-text tags: - facebook - meta - pytorch - llama - llama-3 widget: - example_title: rococo art messages: - role: user content: - type: text text: Which era does this piece belong to? Give details about the era. - type: image_url image_url: url: - role: assistant content: | The piece is a painting of a woman in a red dress, surrounded by gold and white ornate details. The woman is depicted in mid-air, with her arms outstretched and her legs bent at the knees. She is holding a bouquet of flowers in her right hand and a fruit in her left hand. The background of the painting is a light blue sky with pink clouds, and there are also some pink flowers and green leaves surrounding the woman. The overall atmosphere of the painting is one of joy and celebration, as if the woman is dancing or celebrating something. This piece belongs to the Rococo era, which was a style of art and architecture that emerged in Europe in the 18th century. The Rococo style is characterized by its use of pastel colors, curved lines, and ornate details. It was popularized during the reign of King Louis XV of France, who ruled from 1715 to 1774. - example_title: diagram understanding messages: - role: user content: - type: text text: Where do the droughts happen according to this diagram? - type: image_url image_url: url: - role: assistant content: | # Drought Regions in 2016 According to the diagram, the regions that experienced severe droughts in 2016 are: - **Eastern and Southern Africa**: This region was severely affected by droughts, with significant impacts on agriculture and water resources. - **Western and Southern Africa**: Droughts also occurred in this region, leading to food insecurity and other challenges. These regions were particularly vulnerable to droughts in 2016, highlighting the need for effective water management strategies and climate resilience measures. - example_title: document qa messages: - role: user content: - type: text text: How long does it take from invoice date to due date? Be short and concise. - type: image_url image_url: url: - role: assistant content: | To determine the time gap between the invoice date and the due date, we need to calculate the difference between these two dates. 
Invoice Date: February 11, 2019 Due Date: February 26, 2019 Step 1: Calculate the Difference in Days February 26 - February 11 = 15 days Therefore, the time gap between the invoice date and the due date is 15 days. license: llama3.2 extra_gated_prompt: >- ### LLAMA 3.2 COMMUNITY LICENSE AGREEMENT Llama 3.2 Version Release Date: September 25, 2024 “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. “Documentation” means the specifications, manuals and documentation accompanying Llama 3.2 distributed by Meta at “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. “Llama 3.2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at “Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and any portion thereof) made available under this Agreement. “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. 
Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. 
The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.2. If you access or use Llama 3.2, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.2 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 1. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 2. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 3. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 4. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law 5. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 6. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 7. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta 2. 
Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following: 8. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997 9. Guns and illegal weapons (including weapon development) 10. Illegal drugs and regulated/controlled substances 11. Operation of critical infrastructure, transportation technologies, or heavy machinery 12. Self-harm or harm to others, including suicide, cutting, and eating disorders 13. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following: 14. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 15. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 16. Generating, promoting, or further distributing spam 17. Impersonating another individual without consent, authorization, or legal right 18. Representing that the use of Llama 3.2 or outputs are human-generated 19. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system 5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2 With respect to any multimodal models included in Llama 3.2, the rights granted under Section 1(a) of the Llama 3.2 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. 
extra_gated_button_content: Submit extra_gated_eu_disallowed: true --- ## Model Information The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text \\+ images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks. **Model Developer**: Meta **Model Architecture:** Llama 3.2-Vision is built on top of the Llama 3.1 text-only model, which is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. To support image recognition tasks, the Llama 3.2-Vision model uses a separately trained vision adapter that integrates with the pre-trained Llama 3.1 language model. The adapter consists of a series of cross-attention layers that feed image encoder representations into the core LLM. | | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Data volume | Knowledge cutoff | | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | | Llama 3.2-Vision | (Image, text) pairs | 11B (10.6) | Text \\+ Image | Text | 128k | Yes | 6B (image, text) pairs | December 2023 | | Llama 3.2-Vision | (Image, text) pairs | 90B (88.8) | Text \\+ Image | Text | 128k | Yes | 6B (image, text) pairs | December 2023 | **Supported Languages:** For text-only tasks, English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Note that for image+text applications, English is the only supported language. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). **Feedback:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.2-Vision in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 3.2-Vision is intended for commercial and research use. Instruction-tuned models are intended for visual recognition, image reasoning, captioning, and assistant-like chat with images, whereas pretrained models can be adapted for a variety of image reasoning tasks. Additionally, because of Llama 3.2-Vision’s ability to take images and text as inputs, use cases could also include: 1.
Visual Question Answering (VQA) and Visual Reasoning: Imagine a machine that looks at a picture and understands your questions about it. 2. Document Visual Question Answering (DocVQA): Imagine a computer understanding both the text and layout of a document, like a map or contract, and then answering questions about it directly from the image. 3. Image Captioning: Image captioning bridges the gap between vision and language, extracting details, understanding the scene, and then crafting a sentence or two that tells the story. 4. Image-Text Retrieval: Image-text retrieval is like a matchmaker for images and their descriptions, similar to a search engine but one that understands both pictures and words. 5. Visual Grounding: Visual grounding is like connecting the dots between what we see and say. It’s about understanding how language references specific parts of an image, allowing AI models to pinpoint objects or regions based on natural language descriptions. The Llama 3.2 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.2 Community License allows for these use cases. **Out of Scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card. ## How to use This repository contains two versions of Llama-3.2-11B-Vision-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with transformers >= 4.45.0, you can run inference using conversational messages that may include an image you can query about; make sure to update your transformers installation first (a minimal usage sketch appears further below, after the benchmark tables). ### Use with the original codebase Please follow the instructions in the repository, which also cover downloading the original checkpoints. ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Training utilized a cumulative **2.02M** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **584** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | :---: | :---: | :---: | | Llama 3.2-vision 11B | Stage 1 pretraining: 147K H100 hours Stage 2 annealing: 98K H100 hours SFT: 896 H100 hours RLHF: 224 H100 hours | 700 | 71 | 0 | | Llama 3.2-vision 90B | Stage 1 pretraining: 885K H100 hours Stage 2 annealing: 885K H100 hours SFT: 3072 H100 hours RLHF: 2048 H100 hours | 700 | 513 | 0 | | Total | 2.02M | | 584 | 0 | The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.2-Vision was pretrained on 6B image and text pairs. The instruction tuning data includes publicly available vision instruction datasets, as well as over 3M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023\\. ## Benchmarks \\- Image Reasoning In this section, we report the results for Llama 3.2-Vision models on standard automatic benchmarks. For all these evaluations, we used our internal evaluations library. ### Base Pretrained Models | Category | Benchmark | \\# Shots | Metric | Llama 3.2 11B | Llama 3.2 90B | | ----- | ----- | ----- | ----- | ----- | ----- | | Image Understanding | VQAv2 (val) | 0 | Accuracy | 66.8 | 73.6 | | | Text VQA (val) | 0 | Relaxed accuracy | 73.1 | 73.5 | | | DocVQA (val, unseen) | 0 | ANLS | 62.3 | 70.7 | | Visual Reasoning | MMMU (val, 0-shot) | 0 | Micro average accuracy | 41.7 | 49.3 | | | ChartQA (test) | 0 | Accuracy | 39.4 | 54.2 | | | InfographicsQA (val, unseen) | 0 | ANLS | 43.2 | 56.8 | | | AI2 Diagram (test) | 0 | Accuracy | 62.4 | 75.3 | ### Instruction Tuned Models | Modality | Capability | Benchmark | \\# Shots | Metric | Llama 3.2 11B | Llama 3.2 90B | | ----- | :---: | ----- | :---: | :---: | ----- | ----- | | Image | College-level Problems and Mathematical Reasoning | MMMU (val, CoT) | 0 | Micro average accuracy | 50.7 | 60.3 | | | | MMMU-Pro, Standard (10 opts, test) | 0 | Accuracy | 33.0 | 45.2 | | | | MMMU-Pro, Vision (test) | 0 | Accuracy | 23.7 | 33.8 | | | | MathVista (testmini) | 0 | Accuracy | 51.5 | 57.3 | | | Charts and Diagram Understanding | ChartQA (test, CoT) | 0 | Relaxed accuracy | 83.4 | 85.5 | | | | AI2 Diagram (test) | 0 | Accuracy | 91.1 | 92.3 | | | | DocVQA (test) | 0 | ANLS | 88.4 | 90.1 | | | General Visual Question Answering | VQAv2 (test) | 0 | Accuracy | 75.2 | 78.1 | | | | | | | | | | Text | General | MMLU (CoT) | 0 | Macro\\_avg/acc | 73.0 | 86.0 | | | Math | MATH (CoT) | 0 | Final\\_em | 51.9 | 68.0 | | | Reasoning | GPQA | 0 | Accuracy | 32.8 | 46.7 | | | Multilingual | MGSM (CoT) | 0 | em | 68.9 | 86.9 | ## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: 1. Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. 2. Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. 3. Provide protections for the community to help prevent the misuse of our models. 
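As referenced in the "How to use" section above, the following is a minimal sketch of conversational, image-in inference with transformers >= 4.45.0. The image URL is a placeholder, and the prompt and generation settings are illustrative rather than prescriptive:

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Load the model and its processor; the processor handles both image and text inputs.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image URL -- substitute any local or remote image.
url = "https://example.com/some_image.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# A conversational message that includes an image, as described above.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```

The processor applies the chat template and prepares both modalities, so the messages structure mirrors the text-only Llama models with an added image slot.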
### Responsible Deployment **Approach:** Llama is a foundational technology designed to be used in a variety of use cases; examples of how Meta’s Llama models have been responsibly deployed can be found on our Community Stories webpage. Our approach is to build the most helpful models, enabling the world to benefit from the technology’s power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.2 was developed following the best practices outlined in our Responsible Use Guide; refer to it to learn more. #### Llama 3.2 Instruct **Objective:** Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the workload for developers deploying safe AI systems. We implemented the same set of safety mitigations as in Llama 3, and you can learn more about these in the Llama 3 paper. **Fine-Tuning Data:** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone:** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.2 Systems **Safety as a System:** Large language models, including Llama 3.2, **are not designed to be deployed in isolation** but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment, as well as to mitigating safety and security risks inherent to the system and to any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default, so developers can benefit from system-level safety out of the box. ### New Capabilities and Use Cases **Technological Advancement:** Llama releases usually introduce new capabilities that require specific considerations in addition to the best practices that generally apply across all Generative AI use cases. For prior release capabilities also supported by Llama 3.2, see the Llama 3.1 Model Card, as the same considerations apply here as well. **Image Reasoning:** Llama 3.2-Vision models come with multimodal (text and image) input capabilities enabling image reasoning applications. As part of our responsible release process, we took dedicated measures, including evaluations and mitigations, to address the risk of the models uniquely identifying individuals in images.
As with other LLM risks, models may not always be robust to adversarial prompts, and developers should evaluate identification and other applicable risks in the context of their applications, as well as consider deploying Llama Guard 3-11B-Vision as part of their system, or other mitigations as appropriate, to detect and mitigate such risks. ### Evaluations **Scaled Evaluations:** We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Purple Llama safeguards to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building dedicated evaluation datasets for your use case. **Red teaming:** We conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks In addition to our safety work above, we took extra care in measuring and/or mitigating the following critical risk areas: **1\\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive Weapons):** For Llama 3.1, to assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. For Llama 3.2-Vision models, we conducted additional targeted evaluations and found that Llama 3.2 was unlikely to present an increase in scientific capabilities, due to its added image understanding capability, as compared to Llama 3.1. **2\\. Child Safety:** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks, and to inform on any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking into account market-specific nuances or experiences. **3\\. Cyber Attacks:** For Llama 3.1 405B, our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants.
The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Because Llama 3.2’s vision capabilities are not generally germane to cyber uplift, we believe that the testing conducted for Llama 3.1 also applies to Llama 3.2. ### Community **Industry Partnerships:** Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. **Grants:** We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. **Reporting:** Finally, we put in place a set of resources, including an output reporting mechanism and bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations **Values:** The core values of Llama 3.2 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.2 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. **Testing:** Llama 3.2 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.2 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development."
+} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.2-1B-Instruct.json b/data/model_data_json/meta-llama_Llama-3.2-1B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..9c0a095e380c8236a81ee6c9e1f5a75655670a4b --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.2-1B-Instruct.json @@ -0,0 +1,31 @@ +{ + "model_id": "meta-llama/Llama-3.2-1B-Instruct", + "downloads": 2482128, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "arxiv:2405.16406", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th library_name: transformers pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.2 extra_gated_prompt: >- ### LLAMA 3.2 COMMUNITY LICENSE AGREEMENT Llama 3.2 Version Release Date: September 25, 2024 “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. “Documentation” means the specifications, manuals and documentation accompanying Llama 3.2 distributed by Meta at “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. “Llama 3.2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at “Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and any portion thereof) made available under this Agreement. “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.2. If you access or use Llama 3.2, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.2 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 1. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 2. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 3. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 4. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law 5. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 6. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 7. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following: 8. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997 9. Guns and illegal weapons (including weapon development) 10. Illegal drugs and regulated/controlled substances 11. Operation of critical infrastructure, transportation technologies, or heavy machinery 12. Self-harm or harm to others, including suicide, cutting, and eating disorders 13. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following: 14. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 15. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 16. Generating, promoting, or further distributing spam 17. Impersonating another individual without consent, authorization, or legal right 18. Representing that the use of Llama 3.2 or outputs are human-generated 19. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system 5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2 With respect to any multimodal models included in Llama 3.2, the rights granted under Section 1(a) of the Llama 3.2 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. 
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model Developer:** Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. | | Training Data | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count | Knowledge cutoff | | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | | Llama 3.2 (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | | Llama 3.2 Quantized (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 8k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | **Supported Languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. 
**License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). **Feedback:** Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 3.2 is intended for commercial and research use in multiple languages. Instruction-tuned text-only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use cases with limited compute resources. **Out of Scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card. ## How to use This repository contains two versions of Llama-3.2-1B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with a sufficiently recent transformers release, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function. Make sure to update your transformers installation first. Note: you can also find detailed recipes on how to use the model locally, with assisted generation, quantization, and more. ### Use with the original codebase Please follow the instructions in the repository, which also cover downloading the original checkpoints. ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Training utilized a cumulative **916k** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **240** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Logit Generation Time (GPU Hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | ----- | :---: | :---: | :---: | | Llama 3.2 1B | 370k | \\- | 700 | 107 | 0 | | Llama 3.2 3B | 460k | \\- | 700 | 133 | 0 | | Llama 3.2 1B SpinQuant | 1.7 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 3B SpinQuant | 2.4 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 1B QLoRA | 1.3k | 0 | 700 | 0.381 | 0 | | Llama 3.2 3B QLoRA | 1.6k | 0 | 700 | 0.461 | 0 | | Total | 833k | 86k | | 240 | 0 | \\*\\* The location-based CO2e emissions of Llama 3.2 1B SpinQuant and Llama 3.2 3B SpinQuant are less than 0.001 metric tonnes each. This is due to the minimal training GPU hours that are required. The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.2 was pretrained on up to 9 trillion tokens of data from publicly available sources. For the 1B and 3B Llama 3.2 models, we incorporated logits from the Llama 3.1 8B and 70B models into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. Knowledge distillation was used after pruning to recover performance. In post-training, we used a similar recipe to Llama 3.1 and produced final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involved Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). **Data Freshness:** The pretraining data has a cutoff of December 2023\\. ## Quantization ### Quantization Scheme We designed the current quantization scheme with PyTorch’s ExecuTorch inference framework and the Arm CPU backend in mind, taking into account metrics including model quality, prefill/decoding speed, and memory footprint. Our quantization scheme involves three parts: - All linear layers in all transformer blocks are quantized to a 4-bit groupwise scheme (with a group size of 32) for weights and 8-bit per-token dynamic quantization for activations. - The classification layer is quantized to 8-bit per-channel for weights, with 8-bit per-token dynamic quantization for activations. - Similar to the classification layer, 8-bit per-channel quantization is used for the embedding layer. (An illustrative sketch of this scheme appears after the benchmark tables below.) ### Quantization-Aware Training and LoRA The quantization-aware training (QAT) with low-rank adaptation (LoRA) models went through only post-training stages, using the same data as the full precision models. To initialize QAT, we utilize BF16 Llama 3.2 model checkpoints obtained after supervised fine-tuning (SFT) and perform an additional full round of SFT training with QAT. We then freeze the backbone of the QAT model and perform another round of SFT with LoRA adaptors applied to all layers within the transformer block. Meanwhile, the LoRA adaptors' weights and activations are maintained in BF16. Because our approach is similar to the QLoRA of Dettmers et al. (2023) (i.e., quantization followed by LoRA adapters), we refer to this method as QLoRA. Finally, we fine-tune the resulting model (both backbone and LoRA adaptors) using direct preference optimization (DPO). ### SpinQuant SpinQuant was applied, together with generative post-training quantization (GPTQ).
### Quantization-Aware Training and LoRA The quantization-aware training (QAT) with low-rank adaptation (LoRA) models went through post-training stages only, using the same data as the full-precision models. To initialize QAT, we utilize BF16 Llama 3.2 model checkpoints obtained after supervised fine-tuning (SFT) and perform an additional full round of SFT training with QAT. We then freeze the backbone of the QAT model and perform another round of SFT with LoRA adaptors applied to all layers within the transformer block. Meanwhile, the LoRA adaptors' weights and activations are maintained in BF16. Because our approach is similar to QLoRA of Dettmers et al. (2023) (i.e., quantization followed by LoRA adapters), we refer to this method as QLoRA. Finally, we fine-tune the resulting model (both backbone and LoRA adaptors) using direct preference optimization (DPO). ### SpinQuant SpinQuant was applied, together with generative post-training quantization (GPTQ). For the SpinQuant rotation matrix fine-tuning, we optimized for 100 iterations, using 800 samples with sequence length 2048 from the WikiText 2 dataset. For GPTQ, we used 128 samples from the same dataset with the same sequence length. ## Benchmarks \\- English Text In this section, we report the results for Llama 3.2 models on standard automatic benchmarks. For all these evaluations, we used our internal evaluations library. ### Base Pretrained Models | Category | Benchmark | \\# Shots | Metric | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B | | ----- | ----- | :---: | :---: | :---: | :---: | :---: | | General | MMLU | 5 | macro\\_avg/acc\\_char | 32.2 | 58 | 66.7 | | | AGIEval English | 3-5 | average/acc\\_char | 23.3 | 39.2 | 47.8 | | | ARC-Challenge | 25 | acc\\_char | 32.8 | 69.1 | 79.7 | | Reading comprehension | SQuAD | 1 | em | 49.2 | 67.7 | 77 | | | QuAC (F1) | 1 | f1 | 37.9 | 42.9 | 44.9 | | | DROP (F1) | 3 | f1 | 28.0 | 45.2 | 59.5 | | Long Context | Needle in Haystack | 0 | em | 96.8 | 1 | 1 | ### Instruction Tuned Models | Capability | | Benchmark | \\# Shots | Metric | Llama 3.2 1B bf16 | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B bf16 | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B | | :---: | ----- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | General | | MMLU | 5 | macro\\_avg/acc | 49.3 | 43.3 | 47.3 | 49.0 | 63.4 | 60.5 | 62 | 62.4 | 69.4 | | Re-writing | | Open-rewrite eval | 0 | micro\\_avg/rougeL | 41.6 | 39.2 | 40.9 | 41.2 | 40.1 | 40.3 | 40.8 | 40.7 | 40.9 | | Summarization | | TLDR9+ (test) | 1 | rougeL | 16.8 | 14.9 | 16.7 | 16.8 | 19.0 | 19.1 | 19.2 | 19.1 | 17.2 | | Instruction following | | IFEval | 0 | Avg(Prompt/Instruction acc Loose/Strict) | 59.5 | 51.5 | 58.4 | 55.6 | 77.4 | 73.9 | 73.5 | 75.9 | 80.4 | | Math | | GSM8K (CoT) | 8 | em\\_maj1@1 | 44.4 | 33.1 | 40.6 | 46.5 | 77.7 | 72.9 | 75.7 | 77.9 | 84.5 | | | | MATH (CoT) | 0 | final\\_em | 30.6 | 20.5 | 25.3 | 31.0 | 48.0 | 44.2 | 45.3 | 49.2 | 51.9 | | Reasoning | | ARC-C | 0 | acc | 59.4 | 54.3 | 57 | 60.7 | 78.6 | 75.6 | 77.6 | 77.6 | 83.4 | | | | GPQA | 0 | acc | 27.2 | 25.9 | 26.3 | 25.9 | 32.8 | 32.8 | 31.7 | 33.9 | 32.8 | | | | Hellaswag | 0 | acc | 41.2 | 38.1 | 41.3 | 41.5 | 69.8 | 66.3 | 68 | 66.3 | 78.7 | | Tool Use | | BFCL V2 | 0 | acc | 25.7 | 14.3 | 15.9 | 23.7 | 67.0 | 53.4 | 60.1 | 63.5 | 67.1 | | | | Nexus | 0 | macro\\_avg/acc | 13.5 | 5.2 | 9.6 | 12.5 | 34.3 | 32.4 | 31.5 | 30.1 | 38.5 | | Long Context | | InfiniteBench/En.QA | 0 | longbook\\_qa/f1 | 20.3 | N/A | N/A | N/A | 19.8 | N/A | N/A | N/A | 27.3 | | | | InfiniteBench/En.MC | 0 | longbook\\_choice/acc | 38.0 | N/A | N/A | N/A | 63.3 | N/A | N/A | N/A | 72.2 | | | | NIH/Multi-needle | 0 | recall | 75.0 | N/A | N/A | N/A | 84.7 | N/A | N/A | N/A | 98.8 | | Multilingual | | MGSM (CoT) | 0 | em | 24.5 | 13.7 | 18.2 | 24.4 | 58.2 | 48.9 | 54.3 | 56.8 | 68.9 | \\*\\*for comparison purposes only. Model not released.
### Multilingual Benchmarks | Category | Benchmark | Language | Llama 3.2 1B | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | General | MMLU (5-shot, macro_avg/acc) | Portuguese | 39.8 | 34.9 | 38.9 | 40.2 | 54.5 | 50.9 | 53.3 | 53.4 | 62.1 | | | | Spanish | 41.5 | 36.0 | 39.8 | 41.8 | 55.1 | 51.9 | 53.6 | 53.6 | 62.5 | | | | Italian | 39.8 | 34.9 | 38.1 | 40.6 | 53.8 | 49.9 | 52.1 | 51.7 | 61.6 | | | | German | 39.2 | 34.9 | 37.5 | 39.6 | 53.3 | 50.0 | 52.2 | 51.3 | 60.6 | | | | French | 40.5 | 34.8 | 39.2 | 40.8 | 54.6 | 51.2 | 53.3 | 53.3 | 62.3 | | | | Hindi | 33.5 | 30.0 | 32.1 | 34.0 | 43.3 | 40.4 | 42.0 | 42.1 | 50.9 | | | | Thai | 34.7 | 31.2 | 32.4 | 34.9 | 44.5 | 41.3 | 44.0 | 42.2 | 50.3 | \\*\\*for comparison purposes only. Model not released. ## Inference time In the table below, we compare the performance metrics of different quantization methods (SpinQuant and QAT \\+ LoRA) with the BF16 baseline. The evaluation was done using the ExecuTorch framework as the inference engine, with the ARM CPU as a backend, on an Android OnePlus 12 device. | Category | Decode (tokens/sec) | Time-to-first-token (sec) | Prefill (tokens/sec) | Model size (PTE file size in MB) | Memory size (RSS in MB) | | :---- | ----- | ----- | ----- | ----- | ----- | | 1B BF16 (baseline) | 19.2 | 1.0 | 60.3 | 2358 | 3,185 | | 1B SpinQuant | 50.2 (2.6x) | 0.3 (-76.9%) | 260.5 (4.3x) | 1083 (-54.1%) | 1,921 (-39.7%) | | 1B QLoRA | 45.8 (2.4x) | 0.3 (-76.0%) | 252.0 (4.2x) | 1127 (-52.2%) | 2,255 (-29.2%) | | 3B BF16 (baseline) | 7.6 | 3.0 | 21.2 | 6129 | 7,419 | | 3B SpinQuant | 19.7 (2.6x) | 0.7 (-76.4%) | 89.7 (4.2x) | 2435 (-60.3%) | 3,726 (-49.8%) | | 3B QLoRA | 18.5 (2.4x) | 0.7 (-76.1%) | 88.8 (4.2x) | 2529 (-58.7%) | 4,060 (-45.3%) | (\\*) The performance measurement is done using an adb binary-based approach. (\\*\\*) Measurements were taken on an Android OnePlus 12 device. (\\*\\*\\*) Time-to-first-token (TTFT) is measured with a prompt length of 64 tokens. *Footnote:* - *Decode (tokens/second) measures how quickly the model keeps generating. Higher is better.* - *Time-to-first-token (TTFT) measures how fast the model produces the first token for a given prompt. Lower is better.* - *Prefill is the inverse of TTFT (i.e., 1/TTFT) in tokens/second. Higher is better.* - *Model size: how big the model is, measured by the PTE file, a binary file format for ExecuTorch.* - *RSS size: memory usage in resident set size (RSS).* ## Responsibility & Safety As part of our responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: 1. Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama 2. Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm 3. Provide protections for the community to help prevent the misuse of our models ### Responsible Deployment **Approach:** Llama is a foundational technology designed to be used in a variety of use cases. Examples of how Meta’s Llama models have been responsibly deployed can be found on our Community Stories webpage.
Our approach is to build the most helpful models, enabling the world to benefit from the technology's power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use cases, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.2 was developed following the best practices outlined in our Responsible Use Guide. #### Llama 3.2 Instruct **Objective:** Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload of deploying safe AI systems. We implemented the same set of safety mitigations as in Llama 3, and you can learn more about these in the Llama 3 paper. **Fine-Tuning Data:** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone:** Building on the work we started with Llama 3, we put great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.2 Systems **Safety as a System:** Large language models, including Llama 3.2, **are not designed to be deployed in isolation** but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### New Capabilities and Use Cases **Technological Advancement:** Llama releases usually introduce new capabilities that require specific considerations in addition to the best practices that generally apply across all Generative AI use cases. For prior release capabilities also supported by Llama 3.2, see the Llama 3.1 Model Card, as the same considerations apply here as well. **Constrained Environments:** Llama 3.2 1B and 3B models are expected to be deployed in highly constrained environments, such as mobile devices. LLM systems using smaller models will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems. Developers should ensure the safety of their system meets the requirements of their use case. We recommend using lighter system safeguards for such use cases, like Llama Guard 3-1B or its mobile-optimized version.
### Evaluations **Scaled Evaluations:** We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Purple Llama safeguards to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. **Red Teaming:** We conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks In addition to our safety work above, we took extra care in measuring and/or mitigating the following critical risk areas: **1\\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive Weapons):** Llama 3.2 1B and 3B models are smaller and less capable derivatives of Llama 3.1. For Llama 3.1 70B and 405B, to assess risks related to the proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons, and we have determined that such testing also applies to the smaller 1B and 3B models. **2\\. Child Safety:** Child safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in child safety risks and to inform on any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. **3\\. Cyber Attacks:** For Llama 3.1 405B, our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Because Llama 3.2’s 1B and 3B models are smaller and less capable than Llama 3.1 405B, we broadly believe that the testing conducted for the 405B model also applies to Llama 3.2 models.
### Community **Industry Partnerships:** Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI, and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. **Grants:** We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. **Reporting:** Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations **Values:** The core values of Llama 3.2 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.2 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. **Testing:** Llama 3.2 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or otherwise objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.2 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development."
+} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.2-1B.json b/data/model_data_json/meta-llama_Llama-3.2-1B.json new file mode 100644 index 0000000000000000000000000000000000000000..09909dd941eec053e8d180642048a3de0c4ff996 --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.2-1B.json @@ -0,0 +1,30 @@ +{ + "model_id": "meta-llama/Llama-3.2-1B", + "downloads": 1990093, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "arxiv:2405.16406", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th library_name: transformers pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.2 extra_gated_prompt: >- ### LLAMA 3.2 COMMUNITY LICENSE AGREEMENT Llama 3.2 Version Release Date: September 25, 2024 “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. “Documentation” means the specifications, manuals and documentation accompanying Llama 3.2 distributed by Meta at “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. “Llama 3.2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at “Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and any portion thereof) made available under this Agreement. “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.2. If you access or use Llama 3.2, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.2 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 1. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 2. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 3. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 4. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law 5. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 6. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 7. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following: 8. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997 9. Guns and illegal weapons (including weapon development) 10. Illegal drugs and regulated/controlled substances 11. Operation of critical infrastructure, transportation technologies, or heavy machinery 12. Self-harm or harm to others, including suicide, cutting, and eating disorders 13. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following: 14. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 15. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 16. Generating, promoting, or further distributing spam 17. Impersonating another individual without consent, authorization, or legal right 18. Representing that the use of Llama 3.2 or outputs are human-generated 19. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system 5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2 With respect to any multimodal models included in Llama 3.2, the rights granted under Section 1(a) of the Llama 3.2 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. 
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model Developer:** Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. | | Training Data | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count | Knowledge cutoff | | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | | Llama 3.2 (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | | Llama 3.2 Quantized (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 8k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | **Supported Languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. 
**License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). **Feedback:** Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 3.2 is intended for commercial and research use in multiple languages. Instruction-tuned text-only models are intended for assistant-like chat and agentic applications such as knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use cases with limited compute resources. **Out of Scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and the Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card. ## How to use This repository contains two versions of Llama-3.2-1B, for use with transformers and with the original codebase. ### Use with transformers Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function. Make sure to update your transformers installation via pip install --upgrade transformers. ### Use with the original codebase Please follow the instructions in the repository. To download the original checkpoints, see the example command below.
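As an illustration, here is a minimal sketch of plain text completion with this pretrained (non-instruct) model via the pipeline abstraction; the prompt and generation settings are illustrative assumptions:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,  # bf16 matches the released weights
    device_map="auto",
)
# The base model is a plain completion model, so we pass a raw prompt string
# rather than a chat-formatted message list.
print(pipe("The key to open models is", max_new_tokens=64)[0]["generated_text"])
```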
## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Training utilized a cumulative total of **916k** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **240** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq. | | Training Time (GPU hours) | Logit Generation Time (GPU Hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | ----- | :---: | :---: | :---: | | Llama 3.2 1B | 370k | \\- | 700 | 107 | 0 | | Llama 3.2 3B | 460k | \\- | 700 | 133 | 0 | | Llama 3.2 1B SpinQuant | 1.7 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 3B SpinQuant | 2.4 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 1B QLora | 1.3k | 0 | 700 | 0.381 | 0 | | Llama 3.2 3B QLora | 1.6k | 0 | 700 | 0.461 | 0 | | Total | 833k | 86k | | 240 | 0 | \\*\\* The location-based CO2e emissions of Llama 3.2 1B SpinQuant and Llama 3.2 3B SpinQuant are less than 0.001 metric tonnes each. This is due to the minimal training GPU hours required. The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.2 was pretrained on up to 9 trillion tokens of data from publicly available sources. For the 1B and 3B Llama 3.2 models, we incorporated logits from the Llama 3.1 8B and 70B models into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. Knowledge distillation was used after pruning to recover performance. In post-training we used a similar recipe to Llama 3.1 and produced final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involved Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). **Data Freshness:** The pretraining data has a cutoff of December 2023\\. ## Quantization ### Quantization Scheme We designed the current quantization scheme with PyTorch's ExecuTorch inference framework and Arm CPU backend in mind, taking into account metrics including model quality, prefill/decoding speed, and memory footprint. Our quantization scheme involves three parts: - All linear layers in all transformer blocks are quantized to a 4-bit groupwise scheme (with a group size of 32) for weights and 8-bit per-token dynamic quantization for activations. - The classification layer is quantized to 8-bit per-channel for weights and 8-bit per-token dynamic quantization for activations. - Similarly to the classification layer, 8-bit per-channel quantization is used for the embedding layer. ### Quantization-Aware Training and LoRA The quantization-aware training (QAT) with low-rank adaptation (LoRA) models went through post-training stages only, using the same data as the full-precision models. To initialize QAT, we utilize BF16 Llama 3.2 model checkpoints obtained after supervised fine-tuning (SFT) and perform an additional full round of SFT training with QAT. We then freeze the backbone of the QAT model and perform another round of SFT with LoRA adaptors applied to all layers within the transformer block. Meanwhile, the LoRA adaptors' weights and activations are maintained in BF16. Because our approach is similar to QLoRA of Dettmers et al. (2023) (i.e., quantization followed by LoRA adapters), we refer to this method as QLoRA. Finally, we fine-tune the resulting model (both backbone and LoRA adaptors) using direct preference optimization (DPO). ### SpinQuant SpinQuant was applied, together with generative post-training quantization (GPTQ). For the SpinQuant rotation matrix fine-tuning, we optimized for 100 iterations, using 800 samples with sequence length 2048 from the WikiText 2 dataset. For GPTQ, we used 128 samples from the same dataset with the same sequence length. ## Benchmarks \\- English Text In this section, we report the results for Llama 3.2 models on standard automatic benchmarks. For all these evaluations, we used our internal evaluations library.
### Base Pretrained Models | Category | Benchmark | \\# Shots | Metric | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B | | ----- | ----- | :---: | :---: | :---: | :---: | :---: | | General | MMLU | 5 | macro\\_avg/acc\\_char | 32.2 | 58 | 66.7 | | | AGIEval English | 3-5 | average/acc\\_char | 23.3 | 39.2 | 47.8 | | | ARC-Challenge | 25 | acc\\_char | 32.8 | 69.1 | 79.7 | | Reading comprehension | SQuAD | 1 | em | 49.2 | 67.7 | 77 | | | QuAC (F1) | 1 | f1 | 37.9 | 42.9 | 44.9 | | | DROP (F1) | 3 | f1 | 28.0 | 45.2 | 59.5 | | Long Context | Needle in Haystack | 0 | em | 96.8 | 1 | 1 | ### Instruction Tuned Models | Capability | | Benchmark | \\# Shots | Metric | Llama 3.2 1B bf16 | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B bf16 | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B | | :---: | ----- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | General | | MMLU | 5 | macro\\_avg/acc | 49.3 | 43.3 | 47.3 | 49.0 | 63.4 | 60.5 | 62 | 62.4 | 69.4 | | Re-writing | | Open-rewrite eval | 0 | micro\\_avg/rougeL | 41.6 | 39.2 | 40.9 | 41.2 | 40.1 | 40.3 | 40.8 | 40.7 | 40.9 | | Summarization | | TLDR9+ (test) | 1 | rougeL | 16.8 | 14.9 | 16.7 | 16.8 | 19.0 | 19.1 | 19.2 | 19.1 | 17.2 | | Instruction following | | IFEval | 0 | Avg(Prompt/Instruction acc Loose/Strict) | 59.5 | 51.5 | 58.4 | 55.6 | 77.4 | 73.9 | 73.5 | 75.9 | 80.4 | | Math | | GSM8K (CoT) | 8 | em\\_maj1@1 | 44.4 | 33.1 | 40.6 | 46.5 | 77.7 | 72.9 | 75.7 | 77.9 | 84.5 | | | | MATH (CoT) | 0 | final\\_em | 30.6 | 20.5 | 25.3 | 31.0 | 48.0 | 44.2 | 45.3 | 49.2 | 51.9 | | Reasoning | | ARC-C | 0 | acc | 59.4 | 54.3 | 57 | 60.7 | 78.6 | 75.6 | 77.6 | 77.6 | 83.4 | | | | GPQA | 0 | acc | 27.2 | 25.9 | 26.3 | 25.9 | 32.8 | 32.8 | 31.7 | 33.9 | 32.8 | | | | Hellaswag | 0 | acc | 41.2 | 38.1 | 41.3 | 41.5 | 69.8 | 66.3 | 68 | 66.3 | 78.7 | | Tool Use | | BFCL V2 | 0 | acc | 25.7 | 14.3 | 15.9 | 23.7 | 67.0 | 53.4 | 60.1 | 63.5 | 67.1 | | | | Nexus | 0 | macro\\_avg/acc | 13.5 | 5.2 | 9.6 | 12.5 | 34.3 | 32.4 | 31.5 | 30.1 | 38.5 | | Long Context | | InfiniteBench/En.QA | 0 | longbook\\_qa/f1 | 20.3 | N/A | N/A | N/A | 19.8 | N/A | N/A | N/A | 27.3 | | | | InfiniteBench/En.MC | 0 | longbook\\_choice/acc | 38.0 | N/A | N/A | N/A | 63.3 | N/A | N/A | N/A | 72.2 | | | | NIH/Multi-needle | 0 | recall | 75.0 | N/A | N/A | N/A | 84.7 | N/A | N/A | N/A | 98.8 | | Multilingual | | MGSM (CoT) | 0 | em | 24.5 | 13.7 | 18.2 | 24.4 | 58.2 | 48.9 | 54.3 | 56.8 | 68.9 | \\*\\*for comparison purposes only. Model not released. 
### Multilingual Benchmarks | Category | Benchmark | Language | Llama 3.2 1B | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | General | MMLU (5-shot, macro_avg/acc) | Portuguese | 39.8 | 34.9 | 38.9 | 40.2 | 54.5 | 50.9 | 53.3 | 53.4 | 62.1 | | | | Spanish | 41.5 | 36.0 | 39.8 | 41.8 | 55.1 | 51.9 | 53.6 | 53.6 | 62.5 | | | | Italian | 39.8 | 34.9 | 38.1 | 40.6 | 53.8 | 49.9 | 52.1 | 51.7 | 61.6 | | | | German | 39.2 | 34.9 | 37.5 | 39.6 | 53.3 | 50.0 | 52.2 | 51.3 | 60.6 | | | | French | 40.5 | 34.8 | 39.2 | 40.8 | 54.6 | 51.2 | 53.3 | 53.3 | 62.3 | | | | Hindi | 33.5 | 30.0 | 32.1 | 34.0 | 43.3 | 40.4 | 42.0 | 42.1 | 50.9 | | | | Thai | 34.7 | 31.2 | 32.4 | 34.9 | 44.5 | 41.3 | 44.0 | 42.2 | 50.3 | \\*\\*for comparison purposes only. Model not released. ## Inference time In the table below, we compare the performance metrics of different quantization methods (SpinQuant and QAT \\+ LoRA) with the BF16 baseline. The evaluation was done using the ExecuTorch framework as the inference engine, with the ARM CPU as a backend, on an Android OnePlus 12 device. | Category | Decode (tokens/sec) | Time-to-first-token (sec) | Prefill (tokens/sec) | Model size (PTE file size in MB) | Memory size (RSS in MB) | | :---- | ----- | ----- | ----- | ----- | ----- | | 1B BF16 (baseline) | 19.2 | 1.0 | 60.3 | 2358 | 3,185 | | 1B SpinQuant | 50.2 (2.6x) | 0.3 (-76.9%) | 260.5 (4.3x) | 1083 (-54.1%) | 1,921 (-39.7%) | | 1B QLoRA | 45.8 (2.4x) | 0.3 (-76.0%) | 252.0 (4.2x) | 1127 (-52.2%) | 2,255 (-29.2%) | | 3B BF16 (baseline) | 7.6 | 3.0 | 21.2 | 6129 | 7,419 | | 3B SpinQuant | 19.7 (2.6x) | 0.7 (-76.4%) | 89.7 (4.2x) | 2435 (-60.3%) | 3,726 (-49.8%) | | 3B QLoRA | 18.5 (2.4x) | 0.7 (-76.1%) | 88.8 (4.2x) | 2529 (-58.7%) | 4,060 (-45.3%) | (\\*) The performance measurement is done using an adb binary-based approach. (\\*\\*) Measurements were taken on an Android OnePlus 12 device. (\\*\\*\\*) Time-to-first-token (TTFT) is measured with a prompt length of 64 tokens. *Footnote:* - *Decode (tokens/second) measures how quickly the model keeps generating. Higher is better.* - *Time-to-first-token (TTFT) measures how fast the model produces the first token for a given prompt. Lower is better.* - *Prefill is the inverse of TTFT (i.e., 1/TTFT) in tokens/second. Higher is better.* - *Model size: how big the model is, measured by the PTE file, a binary file format for ExecuTorch.* - *RSS size: memory usage in resident set size (RSS).* ## Responsibility & Safety As part of our responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: 1. Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama 2. Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm 3. Provide protections for the community to help prevent the misuse of our models ### Responsible Deployment **Approach:** Llama is a foundational technology designed to be used in a variety of use cases. Examples of how Meta’s Llama models have been responsibly deployed can be found on our Community Stories webpage.
Our approach is to build the most helpful models, enabling the world to benefit from the technology's power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use cases, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.2 was developed following the best practices outlined in our Responsible Use Guide. #### Llama 3.2 Instruct **Objective:** Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload of deploying safe AI systems. We implemented the same set of safety mitigations as in Llama 3, and you can learn more about these in the Llama 3 paper. **Fine-Tuning Data:** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone:** Building on the work we started with Llama 3, we put great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.2 Systems **Safety as a System:** Large language models, including Llama 3.2, **are not designed to be deployed in isolation** but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### New Capabilities and Use Cases **Technological Advancement:** Llama releases usually introduce new capabilities that require specific considerations in addition to the best practices that generally apply across all Generative AI use cases. For prior release capabilities also supported by Llama 3.2, see the Llama 3.1 Model Card, as the same considerations apply here as well. **Constrained Environments:** Llama 3.2 1B and 3B models are expected to be deployed in highly constrained environments, such as mobile devices. LLM systems using smaller models will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems. Developers should ensure the safety of their system meets the requirements of their use case. We recommend using lighter system safeguards for such use cases, like Llama Guard 3-1B or its mobile-optimized version.
### Evaluations **Scaled Evaluations:** We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Purple Llama safeguards to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. **Red Teaming:** We conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks In addition to our safety work above, we took extra care in measuring and/or mitigating the following critical risk areas: **1\\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive Weapons):** Llama 3.2 1B and 3B models are smaller and less capable derivatives of Llama 3.1. For Llama 3.1 70B and 405B, to assess risks related to the proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons, and we have determined that such testing also applies to the smaller 1B and 3B models. **2\\. Child Safety:** Child safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in child safety risks and to inform on any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. **3\\. Cyber Attacks:** For Llama 3.1 405B, our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Because Llama 3.2’s 1B and 3B models are smaller and less capable than Llama 3.1 405B, we broadly believe that the testing conducted for the 405B model also applies to Llama 3.2 models.
### Community **Industry Partnerships:** Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI, and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. **Grants:** We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. **Reporting:** Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations **Values:** The core values of Llama 3.2 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.2 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. **Testing:** Llama 3.2 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or otherwise objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.2 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development."
+} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.2-3B-Instruct.json b/data/model_data_json/meta-llama_Llama-3.2-3B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..d9c258b9588408a240469ba8a7fc351fc26323a5 --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.2-3B-Instruct.json @@ -0,0 +1,31 @@ +{ + "model_id": "meta-llama/Llama-3.2-3B-Instruct", + "downloads": 1485037, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "arxiv:2405.16406", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th library_name: transformers pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.2 extra_gated_prompt: >- ### LLAMA 3.2 COMMUNITY LICENSE AGREEMENT Llama 3.2 Version Release Date: September 25, 2024 “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. “Documentation” means the specifications, manuals and documentation accompanying Llama 3.2 distributed by Meta at “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. “Llama 3.2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at “Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and any portion thereof) made available under this Agreement. “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.2. If you access or use Llama 3.2, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.2 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 1. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 2. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 3. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 4. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law 5. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 6. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 7. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following: 8. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997 9. Guns and illegal weapons (including weapon development) 10. Illegal drugs and regulated/controlled substances 11. Operation of critical infrastructure, transportation technologies, or heavy machinery 12. Self-harm or harm to others, including suicide, cutting, and eating disorders 13. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following: 14. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 15. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 16. Generating, promoting, or further distributing spam 17. Impersonating another individual without consent, authorization, or legal right 18. Representing that the use of Llama 3.2 or outputs are human-generated 19. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system 5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2 With respect to any multimodal models included in Llama 3.2, the rights granted under Section 1(a) of the Llama 3.2 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. 
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model Developer:** Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. | | Training Data | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count | Knowledge cutoff | | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | | Llama 3.2 (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | | Llama 3.2 Quantized (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 8k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | **Supported Languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. 
**License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). **Feedback:** Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 3.2 is intended for commercial and research use in multiple languages. Instruction-tuned text-only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources. **Out of Scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card. ## How to use This repository contains two versions of Llama-3.2-3B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function. Make sure to update your transformers installation via pip install --upgrade transformers. A minimal usage sketch is included after the English-text benchmark tables below. Note: You can also find detailed recipes on how to use the model locally, with assisted generation, quantization, and more. ### Use with the original codebase Please follow the instructions in the repository. To download the original checkpoints, see the example command in the repository leveraging huggingface-cli. ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Training utilized a cumulative total of **916k** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **240** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Logit Generation Time (GPU Hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | ----- | :---: | :---: | :---: | | Llama 3.2 1B | 370k | \\- | 700 | 107 | 0 | | Llama 3.2 3B | 460k | \\- | 700 | 133 | 0 | | Llama 3.2 1B SpinQuant | 1.7 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 3B SpinQuant | 2.4 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 1B QLora | 1.3k | 0 | 700 | 0.381 | 0 | | Llama 3.2 3B QLora | 1.6k | 0 | 700 | 0.461 | 0 | | Total | 833k | 86k | | 240 | 0 | \\*\\* The location-based CO2e emissions of Llama 3.2 1B SpinQuant and Llama 3.2 3B SpinQuant are less than 0.001 metric tonnes each. This is due to the minimal training GPU hours that are required. The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.2 was pretrained on up to 9 trillion tokens of data from publicly available sources. For the 1B and 3B Llama 3.2 models, we incorporated logits from the Llama 3.1 8B and 70B models into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. Knowledge distillation was used after pruning to recover performance. In post-training we used a similar recipe to Llama 3.1 and produced final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involved Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). **Data Freshness:** The pretraining data has a cutoff of December 2023\\. ## Quantization ### Quantization Scheme We designed the current quantization scheme with PyTorch’s ExecuTorch inference framework and Arm CPU backend in mind, taking into account metrics including model quality, prefill/decoding speed, and memory footprint. Our quantization scheme involves three parts: - All linear layers in all transformer blocks are quantized to a 4-bit groupwise scheme (with a group size of 32) for weights and 8-bit per-token dynamic quantization for activations. - The classification layer is quantized to 8-bit per-channel for weights, with 8-bit per-token dynamic quantization for activations. - Similar to the classification layer, 8-bit per-channel quantization is used for the embedding layer. ### Quantization-Aware Training and LoRA The quantization-aware training (QAT) with low-rank adaptation (LoRA) models went through only post-training stages, using the same data as the full precision models. To initialize QAT, we utilize BF16 Llama 3.2 model checkpoints obtained after supervised fine-tuning (SFT) and perform an additional full round of SFT training with QAT. We then freeze the backbone of the QAT model and perform another round of SFT with LoRA adaptors applied to all layers within the transformer block. Meanwhile, the LoRA adaptors' weights and activations are maintained in BF16. Because our approach is similar to QLoRA of Dettmers et al. (2023) (i.e., quantization followed by LoRA adapters), we refer to this method as QLoRA. Finally, we fine-tune the resulting model (both backbone and LoRA adaptors) using direct preference optimization (DPO). ### SpinQuant SpinQuant was applied, together with generative post-training quantization (GPTQ).
For the SpinQuant rotation matrix fine-tuning, we optimized for 100 iterations, using 800 samples with sequence-length 2048 from the WikiText 2 dataset. For GPTQ, we used 128 samples from the same dataset with the same sequence-length. ## Benchmarks \\- English Text In this section, we report the results for Llama 3.2 models on standard automatic benchmarks. For all these evaluations, we used our internal evaluations library. ### Base Pretrained Models | Category | Benchmark | \\# Shots | Metric | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B | | ----- | ----- | :---: | :---: | :---: | :---: | :---: | | General | MMLU | 5 | macro\\_avg/acc\\_char | 32.2 | 58 | 66.7 | | | AGIEval English | 3-5 | average/acc\\_char | 23.3 | 39.2 | 47.8 | | | ARC-Challenge | 25 | acc\\_char | 32.8 | 69.1 | 79.7 | | Reading comprehension | SQuAD | 1 | em | 49.2 | 67.7 | 77 | | | QuAC (F1) | 1 | f1 | 37.9 | 42.9 | 44.9 | | | DROP (F1) | 3 | f1 | 28.0 | 45.2 | 59.5 | | Long Context | Needle in Haystack | 0 | em | 96.8 | 1 | 1 | ### Instruction Tuned Models | Capability | | Benchmark | \\# Shots | Metric | Llama 3.2 1B bf16 | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B bf16 | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B | | :---: | ----- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | General | | MMLU | 5 | macro\\_avg/acc | 49.3 | 43.3 | 47.3 | 49.0 | 63.4 | 60.5 | 62 | 62.4 | 69.4 | | Re-writing | | Open-rewrite eval | 0 | micro\\_avg/rougeL | 41.6 | 39.2 | 40.9 | 41.2 | 40.1 | 40.3 | 40.8 | 40.7 | 40.9 | | Summarization | | TLDR9+ (test) | 1 | rougeL | 16.8 | 14.9 | 16.7 | 16.8 | 19.0 | 19.1 | 19.2 | 19.1 | 17.2 | | Instruction following | | IFEval | 0 | Avg(Prompt/Instruction acc Loose/Strict) | 59.5 | 51.5 | 58.4 | 55.6 | 77.4 | 73.9 | 73.5 | 75.9 | 80.4 | | Math | | GSM8K (CoT) | 8 | em\\_maj1@1 | 44.4 | 33.1 | 40.6 | 46.5 | 77.7 | 72.9 | 75.7 | 77.9 | 84.5 | | | | MATH (CoT) | 0 | final\\_em | 30.6 | 20.5 | 25.3 | 31.0 | 48.0 | 44.2 | 45.3 | 49.2 | 51.9 | | Reasoning | | ARC-C | 0 | acc | 59.4 | 54.3 | 57 | 60.7 | 78.6 | 75.6 | 77.6 | 77.6 | 83.4 | | | | GPQA | 0 | acc | 27.2 | 25.9 | 26.3 | 25.9 | 32.8 | 32.8 | 31.7 | 33.9 | 32.8 | | | | Hellaswag | 0 | acc | 41.2 | 38.1 | 41.3 | 41.5 | 69.8 | 66.3 | 68 | 66.3 | 78.7 | | Tool Use | | BFCL V2 | 0 | acc | 25.7 | 14.3 | 15.9 | 23.7 | 67.0 | 53.4 | 60.1 | 63.5 | 67.1 | | | | Nexus | 0 | macro\\_avg/acc | 13.5 | 5.2 | 9.6 | 12.5 | 34.3 | 32.4 | 31.5 | 30.1 | 38.5 | | Long Context | | InfiniteBench/En.QA | 0 | longbook\\_qa/f1 | 20.3 | N/A | N/A | N/A | 19.8 | N/A | N/A | N/A | 27.3 | | | | InfiniteBench/En.MC | 0 | longbook\\_choice/acc | 38.0 | N/A | N/A | N/A | 63.3 | N/A | N/A | N/A | 72.2 | | | | NIH/Multi-needle | 0 | recall | 75.0 | N/A | N/A | N/A | 84.7 | N/A | N/A | N/A | 98.8 | | Multilingual | | MGSM (CoT) | 0 | em | 24.5 | 13.7 | 18.2 | 24.4 | 58.2 | 48.9 | 54.3 | 56.8 | 68.9 | \\*\\*for comparison purposes only. Model not released. 
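The original card's code snippets were stripped from this dump. As a stand-in, here is a minimal conversational-inference sketch with the transformers pipeline, as referenced in the "Use with transformers" section above; it assumes transformers >= 4.43.0, a bfloat16-capable device, and approved access to the gated repository, and the example messages and max_new_tokens value are illustrative choices rather than values from the card.

```python
import torch
from transformers import pipeline

# Minimal conversational-inference sketch; assumes transformers >= 4.43.0,
# a bfloat16-capable device, and approved access to the gated repository.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Grouped-Query Attention in one sentence."},
]

# The pipeline applies the model's chat template; the generated conversation
# comes back with the assistant's reply as the last message.
outputs = pipe(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1]["content"])
```

Passing a list of role/content messages lets the pipeline apply the chat template automatically, so no manual prompt formatting is needed.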
### Multilingual Benchmarks | Category | Benchmark | Language | Llama 3.2 1B | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | General | MMLU (5-shot, macro_avg/acc) | Portuguese | 39.8 | 34.9 | 38.9 | 40.2 | 54.5 | 50.9 | 53.3 | 53.4 | 62.1 | | | | Spanish | 41.5 | 36.0 | 39.8 | 41.8 | 55.1 | 51.9 | 53.6 | 53.6 | 62.5 | | | | Italian | 39.8 | 34.9 | 38.1 | 40.6 | 53.8 | 49.9 | 52.1 | 51.7 | 61.6 | | | | German | 39.2 | 34.9 | 37.5 | 39.6 | 53.3 | 50.0 | 52.2 | 51.3 | 60.6 | | | | French | 40.5 | 34.8 | 39.2 | 40.8 | 54.6 | 51.2 | 53.3 | 53.3 | 62.3 | | | | Hindi | 33.5 | 30.0 | 32.1 | 34.0 | 43.3 | 40.4 | 42.0 | 42.1 | 50.9 | | | | Thai | 34.7 | 31.2 | 32.4 | 34.9 | 44.5 | 41.3 | 44.0 | 42.2 | 50.3 | \\*\\*for comparison purposes only. Model not released. ## Inference time In the table below, we compare the performance metrics of different quantization methods (SpinQuant and QAT \\+ LoRA) with the BF16 baseline. The evaluation was done using the ExecuTorch framework as the inference engine, with the Arm CPU as the backend, on an Android OnePlus 12 device. | Category | Decode (tokens/sec) | Time-to-first-token (sec) | Prefill (tokens/sec) | Model size (PTE file size in MB) | Memory size (RSS in MB) | | :---- | ----- | ----- | ----- | ----- | ----- | | 1B BF16 (baseline) | 19.2 | 1.0 | 60.3 | 2358 | 3,185 | | 1B SpinQuant | 50.2 (2.6x) | 0.3 (-76.9%) | 260.5 (4.3x) | 1083 (-54.1%) | 1,921 (-39.7%) | | 1B QLoRA | 45.8 (2.4x) | 0.3 (-76.0%) | 252.0 (4.2x) | 1127 (-52.2%) | 2,255 (-29.2%) | | 3B BF16 (baseline) | 7.6 | 3.0 | 21.2 | 6129 | 7,419 | | 3B SpinQuant | 19.7 (2.6x) | 0.7 (-76.4%) | 89.7 (4.2x) | 2435 (-60.3%) | 3,726 (-49.8%) | | 3B QLoRA | 18.5 (2.4x) | 0.7 (-76.1%) | 88.8 (4.2x) | 2529 (-58.7%) | 4,060 (-45.3%) | (\\*) The performance measurement is done using an adb binary-based approach. (\\*\\*) It is measured on an Android OnePlus 12 device. (\\*\\*\\*) Time-to-first-token (TTFT) is measured with prompt length=64. *Footnote:* - *Decode (tokens/second) measures how quickly the model keeps generating tokens after the first one. Higher is better.* - *Time-to-first-token (TTFT) measures how quickly the first token is produced for a given prompt. Lower is better.* - *Prefill (tokens/second) is the prompt length divided by TTFT, i.e. how quickly the prompt is processed. Higher is better.* - *Model size \\- the size of the model, measured by the PTE file, a binary file format for ExecuTorch.* - *RSS size \\- memory usage in resident set size (RSS).* ## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: 1. Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama 2. Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm 3. Provide protections for the community to help prevent the misuse of our models ### Responsible Deployment **Approach:** Llama is a foundational technology designed to be used in a variety of use cases. Examples of how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage.
Our approach is to build the most helpful models, enabling the world to benefit from the technology’s power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use cases, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.2 was developed following the best practices outlined in our Responsible Use Guide. #### Llama 3.2 Instruct **Objective:** Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload required to deploy safe AI systems. We implemented the same set of safety mitigations as in Llama 3, and you can learn more about these in the Llama 3 paper. **Fine-Tuning Data:** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone:** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.2 Systems **Safety as a System:** Large language models, including Llama 3.2, **are not designed to be deployed in isolation** but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment as well as to mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### New Capabilities and Use Cases **Technological Advancement:** Llama releases usually introduce new capabilities that require specific considerations in addition to the best practices that generally apply across all Generative AI use cases. For prior release capabilities also supported by Llama 3.2, see Llama 3.1 Model Card, as the same considerations apply here as well. **Constrained Environments:** Llama 3.2 1B and 3B models are expected to be deployed in highly constrained environments, such as mobile devices. LLM systems using smaller models will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems. Developers should ensure the safety of their system meets the requirements of their use case. We recommend using lighter system safeguards for such use cases, like Llama Guard 3-1B or its mobile-optimized version. A minimal sketch of this input-filtering pattern is shown below.
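To make the "safety as a system" point concrete, here is a minimal sketch of the input-filtering pattern described above. The meta-llama/Llama-Guard-3-1B checkpoint, the moderate() helper, and the exact verdict format ("safe", or "unsafe" plus a hazard category) are assumptions drawn from the Llama Guard model cards, not from this card; verify them against the safeguard you actually deploy.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of the input-filtering safeguard pattern: classify a user prompt
# with a guard model before it reaches the main Llama 3.2 chat model.
# Model id and "safe"/"unsafe" verdict convention are assumptions taken
# from the Llama Guard model cards.
guard_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(guard_id)
guard = AutoModelForCausalLM.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(user_prompt: str) -> str:
    # The guard tokenizer's chat template wraps the conversation in the
    # moderation prompt the guard model expects.
    conversation = [{"role": "user", "content": user_prompt}]
    input_ids = tokenizer.apply_chat_template(
        conversation, return_tensors="pt"
    ).to(guard.device)
    output = guard.generate(input_ids, max_new_tokens=20, do_sample=False)
    # Only the newly generated tokens carry the verdict.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

if moderate("How do I bake a chocolate cake?").strip().startswith("safe"):
    pass  # safe to forward the prompt to the Llama 3.2 chat model
```

The same check can be applied to the main model's responses before they are shown to the user, giving safeguards on both the input and output sides.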
### Evaluations **Scaled Evaluations:** We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Purple Llama safeguards to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. **Red Teaming:** We conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks In addition to our safety work above, we took extra care in measuring and/or mitigating the following critical risk areas: **1\\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive Weapons):** Llama 3.2 1B and 3B models are smaller and less capable derivatives of Llama 3.1. For Llama 3.1 70B and 405B, to assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons, and we have determined that such testing also applies to the smaller 1B and 3B models. **2\\. Child Safety:** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. **3\\. Cyber Attacks:** For Llama 3.1 405B, our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Because Llama 3.2’s 1B and 3B models are smaller and less capable than Llama 3.1 405B, we broadly believe that the testing conducted for the 405B model also applies to Llama 3.2 models.
### Community **Industry Partnerships:** Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our GitHub repository. **Grants:** We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. **Reporting:** Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations **Values:** The core values of Llama 3.2 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.2 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. **Testing:** Llama 3.2 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.2 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development."
+} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.2-3B.json b/data/model_data_json/meta-llama_Llama-3.2-3B.json new file mode 100644 index 0000000000000000000000000000000000000000..86ab13dd107ae7530b8615d338d68d3021634d87 --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.2-3B.json @@ -0,0 +1,30 @@ +{ + "model_id": "meta-llama/Llama-3.2-3B", + "downloads": 546751, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "en", + "de", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "arxiv:2204.05149", + "arxiv:2405.16406", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - pt - hi - es - th library_name: transformers pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.2 extra_gated_prompt: >- ### LLAMA 3.2 COMMUNITY LICENSE AGREEMENT Llama 3.2 Version Release Date: September 25, 2024 “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. “Documentation” means the specifications, manuals and documentation accompanying Llama 3.2 distributed by Meta at “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. “Llama 3.2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at “Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and any portion thereof) made available under this Agreement. “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.2. If you access or use Llama 3.2, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.2 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 1. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 2. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 3. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 4. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law 5. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 6. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 7. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following: 8. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997 9. Guns and illegal weapons (including weapon development) 10. Illegal drugs and regulated/controlled substances 11. Operation of critical infrastructure, transportation technologies, or heavy machinery 12. Self-harm or harm to others, including suicide, cutting, and eating disorders 13. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following: 14. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 15. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 16. Generating, promoting, or further distributing spam 17. Impersonating another individual without consent, authorization, or legal right 18. Representing that the use of Llama 3.2 or outputs are human-generated 19. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system 5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2 With respect to any multimodal models included in Llama 3.2, the rights granted under Section 1(a) of the Llama 3.2 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models. 
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Information The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model Developer:** Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. | | Training Data | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count | Knowledge cutoff | | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | | Llama 3.2 (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | | Llama 3.2 Quantized (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 8k | Yes | Yes | Up to 9T tokens | December 2023 | | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | | **Supported Languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. 
**License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). **Feedback:** Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 3.2 is intended for commercial and research use in multiple languages. Instruction-tuned text-only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources. **Out of Scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card. ## How to use This repository contains two versions of Llama-3.2-3B, for use with transformers and with the original codebase. ### Use with transformers Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function. Make sure to update your transformers installation via pip install --upgrade transformers. A minimal text-completion sketch is included after the English-text benchmark tables below. ### Use with the original codebase Please follow the instructions in the repository. To download the original checkpoints, see the example command in the repository leveraging huggingface-cli. ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Training utilized a cumulative total of **916k** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **240** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq. | | Training Time (GPU hours) | Logit Generation Time (GPU Hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | ----- | :---: | :---: | :---: | | Llama 3.2 1B | 370k | \\- | 700 | 107 | 0 | | Llama 3.2 3B | 460k | \\- | 700 | 133 | 0 | | Llama 3.2 1B SpinQuant | 1.7 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 3B SpinQuant | 2.4 | 0 | 700 | *Negligible*\\*\\* | 0 | | Llama 3.2 1B QLora | 1.3k | 0 | 700 | 0.381 | 0 | | Llama 3.2 3B QLora | 1.6k | 0 | 700 | 0.461 | 0 | | Total | 833k | 86k | | 240 | 0 | \\*\\* The location-based CO2e emissions of Llama 3.2 1B SpinQuant and Llama 3.2 3B SpinQuant are less than 0.001 metric tonnes each.
This is due to the minimal training GPU hours that are required. The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.2 was pretrained on up to 9 trillion tokens of data from publicly available sources. For the 1B and 3B Llama 3.2 models, we incorporated logits from the Llama 3.1 8B and 70B models into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. Knowledge distillation was used after pruning to recover performance. In post-training we used a similar recipe to Llama 3.1 and produced final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involved Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). **Data Freshness:** The pretraining data has a cutoff of December 2023\\. ## Quantization ### Quantization Scheme We designed the current quantization scheme with PyTorch’s ExecuTorch inference framework and Arm CPU backend in mind, taking into account metrics including model quality, prefill/decoding speed, and memory footprint. Our quantization scheme involves three parts: - All linear layers in all transformer blocks are quantized to a 4-bit groupwise scheme (with a group size of 32) for weights and 8-bit per-token dynamic quantization for activations. - The classification layer is quantized to 8-bit per-channel for weights, with 8-bit per-token dynamic quantization for activations. - Similar to the classification layer, 8-bit per-channel quantization is used for the embedding layer. A toy sketch of this groupwise weight quantization is shown below, just before the benchmark tables. ### Quantization-Aware Training and LoRA The quantization-aware training (QAT) with low-rank adaptation (LoRA) models went through only post-training stages, using the same data as the full precision models. To initialize QAT, we utilize BF16 Llama 3.2 model checkpoints obtained after supervised fine-tuning (SFT) and perform an additional full round of SFT training with QAT. We then freeze the backbone of the QAT model and perform another round of SFT with LoRA adaptors applied to all layers within the transformer block. Meanwhile, the LoRA adaptors' weights and activations are maintained in BF16. Because our approach is similar to QLoRA of Dettmers et al. (2023) (i.e., quantization followed by LoRA adapters), we refer to this method as QLoRA. Finally, we fine-tune the resulting model (both backbone and LoRA adaptors) using direct preference optimization (DPO). ### SpinQuant SpinQuant was applied, together with generative post-training quantization (GPTQ). For the SpinQuant rotation matrix fine-tuning, we optimized for 100 iterations, using 800 samples with sequence-length 2048 from the WikiText 2 dataset. For GPTQ, we used 128 samples from the same dataset with the same sequence-length. ## Benchmarks \\- English Text In this section, we report the results for Llama 3.2 models on standard automatic benchmarks. For all these evaluations, we used our internal evaluations library.
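As referenced in the Quantization Scheme section above, the following toy sketch illustrates the arithmetic of symmetric 4-bit groupwise weight quantization with a group size of 32. It is an illustration only, not the ExecuTorch implementation; the note about packing two 4-bit values per byte and fusing dequantization into the matmul is an assumption about typical kernels.

```python
import torch

def quantize_4bit_groupwise(w: torch.Tensor, group_size: int = 32):
    """Toy symmetric 4-bit groupwise quantization of a 2-D weight matrix.

    Illustrates the arithmetic only: production kernels pack two 4-bit
    values per byte and fuse dequantization into the matmul.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # One scale per group of 32 weights; the signed 4-bit range is [-8, 7].
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scales), -8, 7).to(torch.int8)
    return q, scales

def dequantize_groupwise(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() * scales).reshape(q.shape[0], -1)

w = torch.randn(64, 128)  # a stand-in weight matrix
q, s = quantize_4bit_groupwise(w)
err = (w - dequantize_groupwise(q, s)).abs().max()
print(f"max groupwise quantization error: {err:.4f}")
```

Per-group scales keep the quantization error local: an outlier weight only coarsens the 31 weights that share its group, rather than an entire channel.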
### Base Pretrained Models | Category | Benchmark | \\# Shots | Metric | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B | | ----- | ----- | :---: | :---: | :---: | :---: | :---: | | General | MMLU | 5 | macro\\_avg/acc\\_char | 32.2 | 58 | 66.7 | | | AGIEval English | 3-5 | average/acc\\_char | 23.3 | 39.2 | 47.8 | | | ARC-Challenge | 25 | acc\\_char | 32.8 | 69.1 | 79.7 | | Reading comprehension | SQuAD | 1 | em | 49.2 | 67.7 | 77 | | | QuAC (F1) | 1 | f1 | 37.9 | 42.9 | 44.9 | | | DROP (F1) | 3 | f1 | 28.0 | 45.2 | 59.5 | | Long Context | Needle in Haystack | 0 | em | 96.8 | 1 | 1 | ### Instruction Tuned Models | Capability | | Benchmark | \\# Shots | Metric | Llama 3.2 1B bf16 | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B bf16 | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B | | :---: | ----- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | General | | MMLU | 5 | macro\\_avg/acc | 49.3 | 43.3 | 47.3 | 49.0 | 63.4 | 60.5 | 62 | 62.4 | 69.4 | | Re-writing | | Open-rewrite eval | 0 | micro\\_avg/rougeL | 41.6 | 39.2 | 40.9 | 41.2 | 40.1 | 40.3 | 40.8 | 40.7 | 40.9 | | Summarization | | TLDR9+ (test) | 1 | rougeL | 16.8 | 14.9 | 16.7 | 16.8 | 19.0 | 19.1 | 19.2 | 19.1 | 17.2 | | Instruction following | | IFEval | 0 | Avg(Prompt/Instruction acc Loose/Strict) | 59.5 | 51.5 | 58.4 | 55.6 | 77.4 | 73.9 | 73.5 | 75.9 | 80.4 | | Math | | GSM8K (CoT) | 8 | em\\_maj1@1 | 44.4 | 33.1 | 40.6 | 46.5 | 77.7 | 72.9 | 75.7 | 77.9 | 84.5 | | | | MATH (CoT) | 0 | final\\_em | 30.6 | 20.5 | 25.3 | 31.0 | 48.0 | 44.2 | 45.3 | 49.2 | 51.9 | | Reasoning | | ARC-C | 0 | acc | 59.4 | 54.3 | 57 | 60.7 | 78.6 | 75.6 | 77.6 | 77.6 | 83.4 | | | | GPQA | 0 | acc | 27.2 | 25.9 | 26.3 | 25.9 | 32.8 | 32.8 | 31.7 | 33.9 | 32.8 | | | | Hellaswag | 0 | acc | 41.2 | 38.1 | 41.3 | 41.5 | 69.8 | 66.3 | 68 | 66.3 | 78.7 | | Tool Use | | BFCL V2 | 0 | acc | 25.7 | 14.3 | 15.9 | 23.7 | 67.0 | 53.4 | 60.1 | 63.5 | 67.1 | | | | Nexus | 0 | macro\\_avg/acc | 13.5 | 5.2 | 9.6 | 12.5 | 34.3 | 32.4 | 31.5 | 30.1 | 38.5 | | Long Context | | InfiniteBench/En.QA | 0 | longbook\\_qa/f1 | 20.3 | N/A | N/A | N/A | 19.8 | N/A | N/A | N/A | 27.3 | | | | InfiniteBench/En.MC | 0 | longbook\\_choice/acc | 38.0 | N/A | N/A | N/A | 63.3 | N/A | N/A | N/A | 72.2 | | | | NIH/Multi-needle | 0 | recall | 75.0 | N/A | N/A | N/A | 84.7 | N/A | N/A | N/A | 98.8 | | Multilingual | | MGSM (CoT) | 0 | em | 24.5 | 13.7 | 18.2 | 24.4 | 58.2 | 48.9 | 54.3 | 56.8 | 68.9 | \\*\\*for comparison purposes only. Model not released. 
### Multilingual Benchmarks

| Category | Benchmark | Language | Llama 3.2 1B | Llama 3.2 1B Vanilla PTQ\\*\\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B | Llama 3.2 3B Vanilla PTQ\\*\\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 39.8 | 34.9 | 38.9 | 40.2 | 54.5 | 50.9 | 53.3 | 53.4 | 62.1 |
| | | Spanish | 41.5 | 36.0 | 39.8 | 41.8 | 55.1 | 51.9 | 53.6 | 53.6 | 62.5 |
| | | Italian | 39.8 | 34.9 | 38.1 | 40.6 | 53.8 | 49.9 | 52.1 | 51.7 | 61.6 |
| | | German | 39.2 | 34.9 | 37.5 | 39.6 | 53.3 | 50.0 | 52.2 | 51.3 | 60.6 |
| | | French | 40.5 | 34.8 | 39.2 | 40.8 | 54.6 | 51.2 | 53.3 | 53.3 | 62.3 |
| | | Hindi | 33.5 | 30.0 | 32.1 | 34.0 | 43.3 | 40.4 | 42.0 | 42.1 | 50.9 |
| | | Thai | 34.7 | 31.2 | 32.4 | 34.9 | 44.5 | 41.3 | 44.0 | 42.2 | 50.3 |

\\*\\*for comparison purposes only. Model not released. ## Inference time In the table below, we compare the performance metrics of different quantization methods (SpinQuant and QAT \\+ LoRA) with the BF16 baseline. The evaluation was done using the ExecuTorch framework as the inference engine, with the Arm CPU as a backend, on an Android OnePlus 12 device.

| Category | Decode (tokens/sec) | Time-to-first-token (sec) | Prefill (tokens/sec) | Model size (PTE file size in MB) | Memory size (RSS in MB) |
| :---- | ----- | ----- | ----- | ----- | ----- |
| 1B BF16 (baseline) | 19.2 | 1.0 | 60.3 | 2358 | 3,185 |
| 1B SpinQuant | 50.2 (2.6x) | 0.3 (-76.9%) | 260.5 (4.3x) | 1083 (-54.1%) | 1,921 (-39.7%) |
| 1B QLoRA | 45.8 (2.4x) | 0.3 (-76.0%) | 252.0 (4.2x) | 1127 (-52.2%) | 2,255 (-29.2%) |
| 3B BF16 (baseline) | 7.6 | 3.0 | 21.2 | 6129 | 7,419 |
| 3B SpinQuant | 19.7 (2.6x) | 0.7 (-76.4%) | 89.7 (4.2x) | 2435 (-60.3%) | 3,726 (-49.8%) |
| 3B QLoRA | 18.5 (2.4x) | 0.7 (-76.1%) | 88.8 (4.2x) | 2529 (-58.7%) | 4,060 (-45.3%) |

(\\*) The performance measurement is done using an adb binary-based approach. (\\*\\*) It is measured on an Android OnePlus 12 device. (\\*\\*\\*) Time-to-first-token (TTFT) is measured with prompt length=64 *Footnote:*
- *Decode (tokens/second) measures how quickly the model keeps generating tokens. Higher is better.*
- *Time-to-first-token (TTFT) measures how quickly the first token is produced for a given prompt. Lower is better.*
- *Prefill (tokens/second) is the rate at which prompt tokens are processed, roughly the prompt length divided by TTFT. Higher is better.*
- *Model size \\- the size of the model, measured by PTE file size (the binary file format used by ExecuTorch).*
- *RSS size \\- memory usage measured as resident set size (RSS).*
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: 1. Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama 2. Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm 3. Provide protections for the community to help prevent the misuse of our models ### Responsible Deployment **Approach:** Llama is a foundational technology designed to be used in a variety of use cases. Examples of how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage.
Our approach is to build the most helpful models, enabling the world to benefit from the technology’s power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use cases, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.2 was developed following the best practices outlined in our Responsible Use Guide. #### Llama 3.2 Instruct **Objective:** Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload of deploying safe AI systems. We implemented the same set of safety mitigations as in Llama 3, and you can learn more about these in the Llama 3 paper. **Fine-Tuning Data:** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone:** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.2 Systems **Safety as a System:** Large language models, including Llama 3.2, **are not designed to be deployed in isolation** but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### New Capabilities and Use Cases **Technological Advancement:** Llama releases usually introduce new capabilities that require specific considerations in addition to the best practices that generally apply across all Generative AI use cases. For prior release capabilities also supported by Llama 3.2, see the Llama 3.1 Model Card, as the same considerations apply here as well. **Constrained Environments:** Llama 3.2 1B and 3B models are expected to be deployed in highly constrained environments, such as mobile devices. LLM systems using smaller models will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems. Developers should ensure that the safety of their system meets the requirements of their use case. We recommend using lighter system safeguards for such use cases, like Llama Guard 3-1B or its mobile-optimized version.
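To make the safeguard recommendation above concrete, here is a minimal, hypothetical sketch of screening a user prompt with a Llama Guard checkpoint before it reaches the main model; the checkpoint id and the "safe"/"unsafe" reply convention are assumptions based on the published Llama Guard usage pattern, not something specified in this card:

```python
# A hedged sketch of input filtering with Llama Guard; the checkpoint id and
# the "safe"/"unsafe" verdict format are assumptions, not part of this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-1B"  # assumed safeguard checkpoint
tokenizer = AutoTokenizer.from_pretrained(guard_id)
model = AutoModelForCausalLM.from_pretrained(guard_id, torch_dtype=torch.bfloat16)

conversation = [{"role": "user", "content": "user prompt to screen"}]
# The model's chat template wraps the conversation in its moderation prompt.
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=20)
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
# Llama Guard models typically answer "safe", or "unsafe" plus a hazard code.
if "unsafe" in verdict:
    print("Blocked by safeguard:", verdict.strip())
```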
### Evaluations **Scaled Evaluations:** We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Purple Llama safeguards to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. **Red Teaming:** We conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks In addition to our safety work above, we took extra care in measuring and/or mitigating the following critical risk areas: **1\\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive Weapons):** Llama 3.2 1B and 3B models are smaller and less capable derivatives of Llama 3.1. For Llama 3.1 70B and 405B, to assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons, and have determined that such testing also applies to the smaller 1B and 3B models. **2\\. Child Safety:** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances and experiences. **3\\. Cyber Attacks:** For Llama 3.1 405B, our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Because Llama 3.2’s 1B and 3B models are smaller and less capable models than Llama 3.1 405B, we broadly believe that the testing conducted for the 405B model also applies to Llama 3.2 models.
### Community **Industry Partnerships:** Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our GitHub repository. **Grants:** We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. **Reporting:** Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations **Values:** The core values of Llama 3.2 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.2 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. **Testing:** Llama 3.2 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.2 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development."
+} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-3.3-70B-Instruct.json b/data/model_data_json/meta-llama_Llama-3.3-70B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..bc63c7230325497138ce676112d3d5f5ce6de642 --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-3.3-70B-Instruct.json @@ -0,0 +1,32 @@ +{ + "model_id": "meta-llama/Llama-3.3-70B-Instruct", + "downloads": 966128, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "fr", + "it", + "pt", + "hi", + "es", + "th", + "de", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-70B", + "base_model:finetune:meta-llama/Llama-3.1-70B", + "license:llama3.3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers language: - en - fr - it - pt - hi - es - th - de base_model: - meta-llama/Llama-3.1-70B tags: - facebook - meta - pytorch - llama - llama-3 extra_gated_prompt: \"### LLAMA 3.3 COMMUNITY LICENSE AGREEMENT\\nLlama 3.3 Version Release Date: December 6, 2024\\n\\\"Agreement\\\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.\\n\\\"Documentation\\\" means the specifications, manuals and documentation accompanying Llama 3.3 distributed by Meta at or \\\"you\\\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.\\n\\\"Llama 3.3\\\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at Materials\\\" means, collectively, Meta’s proprietary Llama 3.3 and Documentation (and any portion thereof) made available under this Agreement.\\n\\\"Meta\\\" or \\\"we\\\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).\\nBy clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.\\n1. License Rights and Redistribution.\\na. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.\\nb. Redistribution and Use.\\ni. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. 
If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name.\\nii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.\\_\\niii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.3 is licensed under the Llama 3.3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”\\niv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. \\n2. Additional Commercial Terms. If, on the Llama 3.3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.\\n3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.\\n4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.\\n5. Intellectual Property.\\na. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta.\\nb. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.\\nc. 
If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.3 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.\\n6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.\\n7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.\\n### Llama 3.3 Acceptable Use Policy\\nMeta is committed to promoting safe and fair use of its tools and features, including Llama 3.3. If you access or use Llama 3.3, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at Uses\\nWe want everyone to use Llama 3.3 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.3 to:\\n1. Violate the law or others’ rights, including to:\\n\\n 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: \\n 1. Violence or terrorism \\n 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material \\n 3. Human trafficking, exploitation, and sexual violence \\n 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. \\n 5. Sexual solicitation \\n 6. Any other criminal activity\\n\\n 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals\\n\\n 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services\\n\\n 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices\\n\\n 5. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law\\n\\n 6. 
Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials\\n\\n 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system\\n\\n 8. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta\\n\\n2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.3 related to the following:\\n\\n 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997\\n\\n 2. Guns and illegal weapons (including weapon development)\\n\\n 3. Illegal drugs and regulated/controlled substances\\n\\n 4. Operation of critical infrastructure, transportation technologies, or heavy machinery\\n\\n 5. Self-harm or harm to others, including suicide, cutting, and eating disorders\\n\\n 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual\\n\\n3. Intentionally deceive or mislead others, including use of Llama 3.3 related to the following:\\n\\n 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation\\n\\n 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content\\n\\n 3. Generating, promoting, or further distributing spam\\n\\n 4. Impersonating another individual without consent, authorization, or legal right\\n\\n 5. Representing that the use of Llama 3.3 or outputs are human-generated\\n\\n 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement\\n\\n4. Fail to appropriately disclose to end users any known dangers of your AI system\\n5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.3\\nWith respect to any multimodal models included in Llama 3.3, the rights granted under Section 1(a) of the Llama 3.3 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. 
This restriction does not apply to end users of a product or service that incorporates any such multimodal models.\\nPlease report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:\\n* Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama\\\\_output\\\\_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.3: LlamaUseReport@meta.com \" extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit license: llama3.3 --- ## Model Information The Meta Llama 3.3 multilingual large language model (LLM) is an instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Llama 3.3 (text only) | A new mix of publicly available online data. | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |

**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.3 model**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** * **70B Instruct: December 6, 2024** **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.3 Community License Agreement, is available at: **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.3 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.3 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.
The Llama 3.3 model also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.3 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.3 Community License. Use in languages beyond those explicitly referenced as supported in this model card\\*\\*. \\*\\*Note: Llama 3.3 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.3 models for languages beyond the 8 supported languages provided they comply with the Llama 3.3 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any uses of Llama 3.3 in additional languages are done in a safe and responsible manner. ## How to use This repository contains two versions of Llama-3.3-70B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with a sufficiently recent transformers release, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function. Make sure to update your transformers installation via pip install --upgrade transformers. See the snippet below for usage with Transformers: ### Tool use with transformers Llama 3.3 supports multiple tool-use formats. You can see a full guide to prompt formatting here. Tool use is also supported through chat templates in Transformers. Here is a quick example showing a single simple tool: You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so: and then call the tool and append the result, with the tool role, like so: After that, you can call generate() again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information, see the Llama prompt format docs and the Transformers tool use documentation. ### Use with bitsandbytes The model checkpoints can be used in 8-bit and 4-bit for further memory optimisations using bitsandbytes and transformers. See the snippet below for usage: To load in 4-bit, simply pass load_in_4bit=True. ### Use with Please follow the instructions in the repository. To download the original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Training utilized a cumulative **39.3M** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
| :---- | :---: | :---: | :---: | :---: |
| Llama 3.3 70B | 7.0M | 700 | 2,040 | 0 |

The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.3 was pretrained on \\~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023\\. ## Benchmarks \\- English Text In this section, we report the results for Llama 3.3 relative to our previous models. ### Instruction tuned models

| Category | Benchmark | \\# Shots | Metric | Llama 3.1 8B Instruct | Llama 3.1 70B Instruct | Llama 3.3 70B Instruct | Llama 3.1 405B Instruct |
| :---- | :---- | ----- | :---- | ----- | ----- | ----- | ----- |
| | MMLU (CoT) | 0 | macro\\_avg/acc | 73.0 | 86.0 | 86.0 | 88.6 |
| | MMLU Pro (CoT) | 5 | macro\\_avg/acc | 48.3 | 66.4 | 68.9 | 73.3 |
| Steerability | IFEval | | | 80.4 | 87.5 | 92.1 | 88.6 |
| Reasoning | GPQA Diamond (CoT) | 0 | acc | 31.8 | 48.0 | 50.5 | 49.0 |
| Code | HumanEval | 0 | pass@1 | 72.6 | 80.5 | 88.4 | 89.0 |
| | MBPP EvalPlus (base) | 0 | pass@1 | 72.8 | 86.0 | 87.6 | 88.6 |
| Math | MATH (CoT) | 0 | sympy\\_intersection\\_score | 51.9 | 68.0 | 77.0 | 73.8 |
| Tool Use | BFCL v2 | 0 | overall\\_ast\\_summary/macro\\_avg/valid | 65.4 | 77.5 | 77.3 | 81.1 |
| Multilingual | MGSM | 0 | em | 68.9 | 86.9 | 91.1 | 91.6 |

## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy for managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases. Examples of how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models, enabling the world to benefit from the technology’s power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.3 was developed following the best practices outlined in our Responsible Use Guide; refer to it to learn more. #### Llama 3.3 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload of deploying safe AI systems. For more details on the safety mitigations implemented, please read the Llama 3 paper.
**Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.3 systems **Large language models, including Llama 3.3, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### Capability specific considerations **Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third party services they use to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third party safeguards. **Multilinguality**: Llama 3.3 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which dedicated benchmarks were crafted, including long context, multilingual, tool calls, coding and memorization.
**Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1\\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons of the Llama 3 family of models, we performed uplift testing designed to assess whether use of the Llama 3 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2\\. Child Safety** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances and experiences. **3\\. Cyber attack enablement** Our cyber attack uplift study investigated whether the Llama 3 family of LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our GitHub repository.
We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.3 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.3 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.3 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of the Llama 3.3 model, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-4-Maverick-17B-128E-Instruct.json b/data/model_data_json/meta-llama_Llama-4-Maverick-17B-128E-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..4023616f811ec5ba3f000a138e5758776dfeb85a --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-4-Maverick-17B-128E-Instruct.json @@ -0,0 +1,36 @@ +{ + "model_id": "meta-llama/Llama-4-Maverick-17B-128E-Instruct", + "downloads": 92502, + "tags": [ + "transformers", + "safetensors", + "llama4", + "image-text-to-text", + "facebook", + "meta", + "pytorch", + "llama", + "conversational", + "ar", + "de", + "en", + "es", + "fr", + "hi", + "id", + "it", + "pt", + "th", + "tl", + "vi", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-4-Maverick-17B-128E", + "base_model:finetune:meta-llama/Llama-4-Maverick-17B-128E", + "license:other", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers language: - ar - de - en - es - fr - hi - id - it - pt - th - tl - vi base_model: - meta-llama/Llama-4-Maverick-17B-128E tags: - facebook - meta - pytorch - llama - llama4 extra_gated_prompt: >- **LLAMA 4 COMMUNITY LICENSE AGREEMENT** Llama 4 Version Effective Date: April 5, 2025 \"**Agreement**\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
\"**Documentation**\" means the specifications, manuals and documentation accompanying Llama 4 distributed by Meta at \"**Licensee**\" or \"**you**\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"**Llama 4**\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"**Llama Materials**\" means, collectively, Meta’s proprietary Llama 4 and Documentation (and any portion thereof) made available under this Agreement. \"**Meta**\" or \"**we**\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1\\. **License Rights and Redistribution**. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display \"Built with Llama\" on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include \"Llama\" at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2\\. **Additional Commercial Terms**. 
If, on the Llama 4 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3**. Disclaimer of Warranty**. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4\\. **Limitation of Liability**. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5\\. **Intellectual Property**. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use \"Llama\" (the \"Mark\") solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 4 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6\\. **Term and Termination**. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. 
Sections 3, 4 and 7 shall survive the termination of this Agreement. 7\\. **Governing Law and Jurisdiction**. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit extra_gated_heading: \"Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate.\" license: other license_name: llama4 --- ## Model Information The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series, Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts. **Model developer**: Meta **Model Architecture:** The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality.
| Model Name | Training Data | Params | Input modalities | Output modalities | Context length | Token count | Knowledge cutoff |
| :---- | :---- | :---: | :---: | :---: | :---: | :---: | :---: |
| Llama 4 Scout (17Bx16E) | A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. Learn more in our . | 17B (Activated) 109B (Total) | Multilingual text and image | Multilingual text and code | 10M | ~40T | August 2024 |
| Llama 4 Maverick (17Bx128E) | A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. Learn more in our . | 17B (Activated) 400B (Total) | Multilingual text and image | Multilingual text and code | 1M | ~22T | August 2024 |
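The "Activated" vs. "Total" parameter split in the table above follows directly from the MoE design: a learned router activates only a small subset of expert feed-forward networks per token. Below is a minimal, illustrative sketch of top-k expert routing in PyTorch; it is not Meta's implementation, and all dimensions and names are invented for clarity.

```python
# Illustrative sketch of mixture-of-experts routing (NOT Meta's implementation).
# Shows why only a fraction of total parameters is activated per token:
# the router selects a few expert FFNs for each token, and only those run.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():  # only the chosen experts execute
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```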
**Supported languages:** Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. **Model Release Date:** April 5, 2025 **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models may be released as we improve model behavior with community feedback. **License**: A custom commercial license, the Llama 4 Community License Agreement, is available at: **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the Llama README. For more technical information about generation parameters and recipes for how to use Llama 4 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 4 is intended for commercial and research use in multiple languages. Instruction tuned models are intended for assistant-like chat and visual reasoning tasks, whereas pretrained models can be adapted for natural language generation. For vision, Llama 4 models are also optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The Llama 4 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 4 Community License allows for these use cases. **Out-of-scope**: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 4 Community License. Use in languages or capabilities beyond those explicitly referenced as supported in this model card\*\*. \*\*Note: 1\. Llama 4 has been trained on a broader collection of languages than the 12 supported languages (pre-training includes 200 total languages). Developers may fine-tune Llama 4 models for languages beyond the 12 supported languages provided they comply with the Llama 4 Community License and the Acceptable Use Policy. Developers are responsible for ensuring that their use of Llama 4 in additional languages is done in a safe and responsible manner. 2\. Llama 4 has been tested for image understanding up to 5 input images. If leveraging image understanding capabilities beyond this, developers are responsible for ensuring that risks in their deployments are mitigated, and should perform additional testing and tuning tailored to their specific applications. ## How to use with transformers Please make sure you have transformers installed, or upgrade to a recent version; a hedged usage sketch appears below. ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom-built GPU clusters, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Model pre-training utilized a cumulative total of **7.38M** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **1,999 tons** CO2eq for training.
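As promised in the "How to use with transformers" note above (the original snippet did not survive extraction), here is a minimal, hedged sketch of loading the instruct model; the class names (AutoProcessor, Llama4ForConditionalGeneration) assume a transformers version with Llama 4 support, gated-repo access granted on Hugging Face, and sufficient GPU memory, none of which is verified here. The emissions breakdown continues after this sketch.

```python
# Hedged sketch: chat-style generation with a Llama 4 checkpoint via transformers.
# Assumes transformers with Llama 4 support and accepted license/gated access.
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # card evals used bf16
)

messages = [{"role": "user", "content": [{"type": "text", "text": "Say hello."}]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```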
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with clean and renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq. | Model Name | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | :---: | :---: | :---: | | Llama 4 Scout | 5.0M | 700 | 1,354 | 0 | | Llama 4 Maverick | 2.38M | 700 | 645 | 0 | | Total | 7.38M | \- | 1,999 | 0 | The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 4 Scout was pretrained on \~40 trillion tokens and Llama 4 Maverick was pretrained on \~22 trillion tokens of multimodal data from a mix of publicly available, licensed data and information from Meta’s products and services. This includes publicly shared posts from Instagram and Facebook and people’s interactions with Meta AI. **Data Freshness:** The pretraining data has a cutoff of August 2024\. ## Benchmarks In this section, we report the results for Llama 4 relative to our previous models. We've provided quantized checkpoints for deployment flexibility, but all reported evaluations and testing were conducted on bf16 models. ### Pre-trained models | Pre-trained models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Category | Benchmark | \# Shots | Metric | Llama 3.1 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Reasoning & Knowledge | MMLU | 5 | macro\_avg/acc\_char | 79.3 | 85.2 | 79.6 | 85.5 | | | MMLU-Pro | 5 | macro\_avg/em | 53.8 | 61.6 | 58.2 | 62.9 | | | MATH | 4 | em\_maj1@1 | 41.6 | 53.5 | 50.3 | 61.2 | | Code | MBPP | 3 | pass@1 | 66.4 | 74.4 | 67.8 | 77.6 | | Multilingual | TydiQA | 1 | average/f1 | 29.9 | 34.3 | 31.5 | 31.7 | | Image | ChartQA | 0 | relaxed\_accuracy | No multimodal support | | 83.4 | 85.3 | | | DocVQA | 0 | anls | | | 89.4 | 91.6 | ### Instruction tuned models | Instruction tuned models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Category | Benchmark | \# Shots | Metric | Llama 3.3 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Image Reasoning | MMMU | 0 | accuracy | No multimodal support | | 69.4 | 73.4 | | | MMMU Pro^ | 0 | accuracy | | | 52.2 | 59.6 | | | MathVista | 0 | accuracy | | | 70.7 | 73.7 | | Image Understanding | ChartQA | 0 | relaxed\_accuracy | | | 88.8 | 90.0 | | | DocVQA (test) | 0 | anls | | | 94.4 | 94.4 | | Coding | LiveCodeBench (10/01/2024-02/01/2025) | 0 | pass@1 | 33.3 | 27.7 | 32.8 | 43.4 | | Reasoning & Knowledge | MMLU Pro | 0 | macro\_avg/acc | 68.9 | 73.4 | 74.3 | 80.5 | | | GPQA Diamond | 0 | accuracy | 50.5 | 49.0 | 57.2 | 69.8 | | Multilingual | MGSM | 0 | average/em | 91.1 | 91.6 | 90.6 | 92.3 | | Long context | MTOB (half book) eng-\>kgv/kgv-\>eng | \- | chrF | Context window is 128K | | 42.2/36.6 | 54.0/46.4 | | | MTOB (full book) eng-\>kgv/kgv-\>eng | \- | chrF | | | 39.7/36.3 | 50.8/46.7 | ^Reported numbers for MMMU Pro are the average of Standard and Vision tasks ## Quantization The Llama 4 Scout model is released as BF16 weights, but can fit within a single H100 GPU with on-the-fly int4 quantization; the
Llama 4 Maverick model is released as both BF16 and FP8 quantized weights. The FP8 quantized weights fit on a single H100 DGX host while still maintaining quality. We provide code for on-the-fly int4 quantization, which minimizes performance degradation as well. ## Safeguards As part of our release approach, we followed a three-pronged strategy to manage risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. Llama is a foundational technology designed for use in a variety of use cases; examples of how Meta’s Llama models have been deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology, by aligning our model’s safety for a standard set of risks. Developers are then in the driver’s seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards. Llama 4 was developed following the best practices outlined in our Developer Use Guide: AI Protections. ### Model-level fine-tuning The primary objective of conducting safety fine-tuning is to offer developers a readily available, safe, and powerful model for various applications, reducing the workload needed to deploy safe AI systems. Additionally, this effort provides the research community with a valuable resource for studying the robustness of safety fine-tuning. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals** Building on the work we started with our Llama 3 models, we placed great emphasis on driving down model refusals to benign prompts for Llama 4\. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. **Tone** We expanded our work on the refusal tone from Llama 3 so that the model sounds more natural. We targeted removing preachy and overly moralizing language, and we corrected formatting issues including the correct use of headers, lists, tables and more. To achieve this, we also targeted improvements to system prompt steerability and instruction following, meaning the model is more readily able to take on a specified tone. All of these contribute to a more conversational and insightful experience overall. **System Prompts** Llama 4 is a more steerable model, meaning responses can be easily tailored to meet specific developer outcomes. Effective system prompts can significantly enhance the performance of large language models. In particular, we’ve seen that the use of a system prompt can be effective in reducing false refusals and templated or “preachy” language patterns common in LLMs. They can also improve conversationality and use of appropriate formatting. Consider the prompt below as a basic template that a developer might want to customize further to meet specific needs or use cases for our Llama 4 models.
| System prompt | | :---- | | You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, \"it's unethical to\", \"it's worth noting…\", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4\. Your knowledge cutoff date is August 2024\. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise. | ### Llama 4 system protections Large language models, including Llama 4, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional guardrails as required. System protections are key to achieving the right helpfulness-safety alignment, mitigating safety and security risks inherent to the system, and to integrating the model or system with external tools. We provide the community with system-level protections \- like Llama Guard, Prompt Guard and Code Shield \- that developers should deploy with Llama models or other LLMs. All of our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure the safety risks of systems for the most commonly built applications, including chatbots and visual QA. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks, including long context, multilingual, coding, and memorization. **Red teaming** We conduct recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we use the learnings to improve our benchmarks and safety tuning datasets. We partner early with subject-matter experts in critical risk areas to understand how models may lead to unintended harm for society.
Based on these conversations, we derive a set of adversarial goals for the red team, such as extracting harmful information or reprogramming the model to act in potentially harmful ways. The red team consists of experts in cybersecurity, adversarial machine learning, and integrity in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks We focus additional attention on the following critical risk areas: **1\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to the proliferation of chemical and biological weapons for Llama 4, we applied expert-designed and other targeted evaluations to assess whether the use of Llama 4 could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. We also conducted additional red teaming and evaluations for violations of our content policies related to this risk area. **2\. Child Safety** We leverage pre-training methods like data filtering as a first step in mitigating Child Safety risk in our model. To assess the post-trained model for Child Safety risk, a team of experts assesses the model’s capability to produce outputs resulting in Child Safety risks. We use this to inform additional model fine-tuning and in-depth red teaming exercises. We’ve also expanded our Child Safety evaluation benchmarks to cover Llama 4 capabilities like multi-image and multilingual inputs. **3\. Cyber attack enablement** Our cyber evaluations investigated whether Llama 4 is sufficiently capable of enabling catastrophic threat scenario outcomes. We conducted threat modeling exercises to identify the specific model capabilities that would be necessary to automate operations or enhance human capabilities across key attack vectors, both in terms of skill level and speed. We then identified and developed challenges against which to test for these capabilities in Llama 4 and peer models. Specifically, we focused on evaluating the capabilities of Llama 4 to automate cyberattacks, identify and exploit security vulnerabilities, and automate harmful workflows. Overall, we find that Llama 4 models do not introduce risks that plausibly enable catastrophic cyber outcomes. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Trust tools are open-sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community.
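Circling back to the Quantization section above: Meta's own on-the-fly int4 code is referenced but not shown in this card. As a stand-in, here is a hedged sketch of generic 4-bit load-time quantization with bitsandbytes through transformers; it is a different code path from Meta's int4 scheme and is only meant to illustrate the idea of quantizing while loading.

```python
# Hedged sketch: generic 4-bit (NF4) load-time quantization via bitsandbytes.
# This is NOT Meta's on-the-fly int4 code path; it only illustrates the concept.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E",  # hypothetical target; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)
print(model.get_memory_footprint() / 1e9, "GB")  # rough post-quantization footprint
```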
## Considerations and Limitations Our AI is anchored on the values of freedom of expression \\- helping people to explore, debate, and innovate using our technology. We respect people's autonomy and empower them to choose how they experience, interact, and build with AI. Our AI promotes an open exchange of ideas. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 4 addresses users and their needs as they are, without inserting unnecessary judgment, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. Llama 4 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 4’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 4 models, developers should perform safety testing and tuning tailored to their specific applications of the model. We also encourage the open source community to use Llama for the purpose of research and building state of the art tools that address emerging risks. Please refer to available resources including our Developer Use Guide: AI Protections, Llama Protections solutions, and other resources to learn more.", + "model_explanation_gemini": "A multilingual large language model designed for instruction-following tasks across 12 languages, built on Meta's Llama-4 architecture with a 17B parameter base." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-4-Scout-17B-16E-Instruct.json b/data/model_data_json/meta-llama_Llama-4-Scout-17B-16E-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..e85f142ea6e8c948efbb8b946f189ac708231e4f --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-4-Scout-17B-16E-Instruct.json @@ -0,0 +1,35 @@ +{ + "model_id": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "downloads": 861988, + "tags": [ + "transformers", + "safetensors", + "llama4", + "image-text-to-text", + "facebook", + "meta", + "pytorch", + "llama", + "conversational", + "ar", + "de", + "en", + "es", + "fr", + "hi", + "id", + "it", + "pt", + "th", + "tl", + "vi", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-4-Scout-17B-16E", + "base_model:finetune:meta-llama/Llama-4-Scout-17B-16E", + "license:other", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers language: - ar - de - en - es - fr - hi - id - it - pt - th - tl - vi base_model: - meta-llama/Llama-4-Scout-17B-16E tags: - facebook - meta - pytorch - llama - llama4 extra_gated_prompt: >- **LLAMA 4 COMMUNITY LICENSE AGREEMENT** Llama 4 Version Effective Date: April 5, 2025 \"**Agreement**\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. 
\"**Documentation**\" means the specifications, manuals and documentation accompanying Llama 4 distributed by Meta at \"**Licensee**\" or \"**you**\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"**Llama 4**\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"**Llama Materials**\" means, collectively, Meta’s proprietary Llama 4 and Documentation (and any portion thereof) made available under this Agreement. \"**Meta**\" or \"**we**\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1\\. **License Rights and Redistribution**. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display \"Built with Llama\" on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include \"Llama\" at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2\\. **Additional Commercial Terms**. 
If, on the Llama 4 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3\. **Disclaimer of Warranty**. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4\. **Limitation of Liability**. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5\. **Intellectual Property**. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use \"Llama\" (the \"Mark\") solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 4 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6\. **Term and Termination**. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials.
Sections 3, 4 and 7 shall survive the termination of this Agreement. 7\\. **Governing Law and Jurisdiction**. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit extra_gated_heading: \"Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate.\" license: other license_name: llama4 --- ## Model Information The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series, Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts. **Model developer**: Meta **Model Architecture:** The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality.
| Model Name | Training Data | Params | Input modalities | Output modalities | Context length | Token count | Knowledge cutoff |
| :---- | :---- | :---: | :---: | :---: | :---: | :---: | :---: |
| Llama 4 Scout (17Bx16E) | A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. Learn more in our . | 17B (Activated) 109B (Total) | Multilingual text and image | Multilingual text and code | 10M | ~40T | August 2024 |
| Llama 4 Maverick (17Bx128E) | A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. Learn more in our . | 17B (Activated) 400B (Total) | Multilingual text and image | Multilingual text and code | 1M | ~22T | August 2024 |
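To make the activated-vs-total split in the table concrete, here is a back-of-the-envelope helper. The shared and per-expert parameter sizes below are invented placeholders chosen only so the totals land near Scout's published shape; they are not Meta's actual layer sizes.

```python
# Back-of-the-envelope MoE parameter accounting (all inputs are invented placeholders).
def moe_params(shared_b: float, per_expert_b: float, n_experts: int, active_experts: int):
    total = shared_b + n_experts * per_expert_b          # every expert stored in memory
    activated = shared_b + active_experts * per_expert_b  # only routed experts compute
    return total, activated

# E.g., ~11.25B shared + 16 experts of ~6.11B each lands near Scout's shape:
total, activated = moe_params(shared_b=11.25, per_expert_b=6.11, n_experts=16, active_experts=1)
print(f"total ~ {total:.0f}B, activated ~ {activated:.0f}B")  # ~109B total, ~17B activated
```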
**Supported languages:** Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. **Model Release Date:** April 5, 2025 **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models may be released as we improve model behavior with community feedback. **License**: A custom commercial license, the Llama 4 Community License Agreement, is available at: **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the Llama README. For more technical information about generation parameters and recipes for how to use Llama 4 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 4 is intended for commercial and research use in multiple languages. Instruction tuned models are intended for assistant-like chat and visual reasoning tasks, whereas pretrained models can be adapted for natural language generation. For vision, Llama 4 models are also optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The Llama 4 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 4 Community License allows for these use cases. **Out-of-scope**: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 4 Community License. Use in languages or capabilities beyond those explicitly referenced as supported in this model card\*\*. \*\*Note: 1\. Llama 4 has been trained on a broader collection of languages than the 12 supported languages (pre-training includes 200 total languages). Developers may fine-tune Llama 4 models for languages beyond the 12 supported languages provided they comply with the Llama 4 Community License and the Acceptable Use Policy. Developers are responsible for ensuring that their use of Llama 4 in additional languages is done in a safe and responsible manner. 2\. Llama 4 has been tested for image understanding up to 5 input images. If leveraging image understanding capabilities beyond this, developers are responsible for ensuring that risks in their deployments are mitigated, and should perform additional testing and tuning tailored to their specific applications. ## How to use with transformers Please make sure you have transformers installed, or upgrade to a recent version; a hedged usage sketch appears below. ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom-built GPU clusters, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Model pre-training utilized a cumulative total of **7.38M** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **1,999 tons** CO2eq for training.
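As with the first card in this dump, the transformers snippet referenced above was stripped during extraction. Complementing the generation sketch shown earlier, here is a hedged, tokenizer-only sketch of how a developer system prompt (such as the template later in this card) is passed through the chat template; it assumes gated-repo access and that the checkpoint ships a chat template, neither of which is verified here. The emissions breakdown continues after this sketch.

```python
# Hedged sketch: steering Llama 4 with a developer system prompt via the chat template.
# Tokenizer-only; useful for inspecting the rendered prompt without loading weights.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")
messages = [
    {"role": "system", "content": "You are an expert conversationalist..."},  # see the template below
    {"role": "user", "content": "Bonjour !"},
]
# Render the full prompt string the model would actually see.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```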
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with clean and renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq. | Model Name | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | :---: | :---: | :---: | | Llama 4 Scout | 5.0M | 700 | 1,354 | 0 | | Llama 4 Maverick | 2.38M | 700 | 645 | 0 | | Total | 7.38M | \- | 1,999 | 0 | The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 4 Scout was pretrained on \~40 trillion tokens and Llama 4 Maverick was pretrained on \~22 trillion tokens of multimodal data from a mix of publicly available, licensed data and information from Meta’s products and services. This includes publicly shared posts from Instagram and Facebook and people’s interactions with Meta AI. **Data Freshness:** The pretraining data has a cutoff of August 2024\. ## Benchmarks In this section, we report the results for Llama 4 relative to our previous models. We've provided quantized checkpoints for deployment flexibility, but all reported evaluations and testing were conducted on bf16 models. ### Pre-trained models | Pre-trained models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Category | Benchmark | \# Shots | Metric | Llama 3.1 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Reasoning & Knowledge | MMLU | 5 | macro\_avg/acc\_char | 79.3 | 85.2 | 79.6 | 85.5 | | | MMLU-Pro | 5 | macro\_avg/em | 53.8 | 61.6 | 58.2 | 62.9 | | | MATH | 4 | em\_maj1@1 | 41.6 | 53.5 | 50.3 | 61.2 | | Code | MBPP | 3 | pass@1 | 66.4 | 74.4 | 67.8 | 77.6 | | Multilingual | TydiQA | 1 | average/f1 | 29.9 | 34.3 | 31.5 | 31.7 | | Image | ChartQA | 0 | relaxed\_accuracy | No multimodal support | | 83.4 | 85.3 | | | DocVQA | 0 | anls | | | 89.4 | 91.6 | ### Instruction tuned models | Instruction tuned models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Category | Benchmark | \# Shots | Metric | Llama 3.3 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Image Reasoning | MMMU | 0 | accuracy | No multimodal support | | 69.4 | 73.4 | | | MMMU Pro^ | 0 | accuracy | | | 52.2 | 59.6 | | | MathVista | 0 | accuracy | | | 70.7 | 73.7 | | Image Understanding | ChartQA | 0 | relaxed\_accuracy | | | 88.8 | 90.0 | | | DocVQA (test) | 0 | anls | | | 94.4 | 94.4 | | Coding | LiveCodeBench (10/01/2024-02/01/2025) | 0 | pass@1 | 33.3 | 27.7 | 32.8 | 43.4 | | Reasoning & Knowledge | MMLU Pro | 0 | macro\_avg/acc | 68.9 | 73.4 | 74.3 | 80.5 | | | GPQA Diamond | 0 | accuracy | 50.5 | 49.0 | 57.2 | 69.8 | | Multilingual | MGSM | 0 | average/em | 91.1 | 91.6 | 90.6 | 92.3 | | Long context | MTOB (half book) eng-\>kgv/kgv-\>eng | \- | chrF | Context window is 128K | | 42.2/36.6 | 54.0/46.4 | | | MTOB (full book) eng-\>kgv/kgv-\>eng | \- | chrF | | | 39.7/36.3 | 50.8/46.7 | ^Reported numbers for MMMU Pro are the average of Standard and Vision tasks ## Quantization The Llama 4 Scout model is released as BF16 weights, but can fit within a single H100 GPU with on-the-fly int4 quantization; the
Llama 4 Maverick model is released as both BF16 and FP8 quantized weights. The FP8 quantized weights fit on a single H100 DGX host while still maintaining quality. We provide code for on-the-fly int4 quantization, which minimizes performance degradation as well. ## Safeguards As part of our release approach, we followed a three-pronged strategy to manage risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. Llama is a foundational technology designed for use in a variety of use cases; examples of how Meta’s Llama models have been deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology, by aligning our model’s safety for a standard set of risks. Developers are then in the driver’s seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards. Llama 4 was developed following the best practices outlined in our Developer Use Guide: AI Protections. ### Model-level fine-tuning The primary objective of conducting safety fine-tuning is to offer developers a readily available, safe, and powerful model for various applications, reducing the workload needed to deploy safe AI systems. Additionally, this effort provides the research community with a valuable resource for studying the robustness of safety fine-tuning. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals** Building on the work we started with our Llama 3 models, we placed great emphasis on driving down model refusals to benign prompts for Llama 4\. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. **Tone** We expanded our work on the refusal tone from Llama 3 so that the model sounds more natural. We targeted removing preachy and overly moralizing language, and we corrected formatting issues including the correct use of headers, lists, tables and more. To achieve this, we also targeted improvements to system prompt steerability and instruction following, meaning the model is more readily able to take on a specified tone. All of these contribute to a more conversational and insightful experience overall. **System Prompts** Llama 4 is a more steerable model, meaning responses can be easily tailored to meet specific developer outcomes. Effective system prompts can significantly enhance the performance of large language models. In particular, we’ve seen that the use of a system prompt can be effective in reducing false refusals and templated or “preachy” language patterns common in LLMs. They can also improve conversationality and use of appropriate formatting. Consider the prompt below as a basic template that a developer might want to customize further to meet specific needs or use cases for our Llama 4 models.
| System prompt | | :---- | | You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, \"it's unethical to\", \"it's worth noting…\", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4\. Your knowledge cutoff date is August 2024\. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise. | ### Llama 4 system protections Large language models, including Llama 4, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional guardrails as required. System protections are key to achieving the right helpfulness-safety alignment, mitigating safety and security risks inherent to the system, and to integrating the model or system with external tools. We provide the community with system-level protections \- like Llama Guard, Prompt Guard and Code Shield \- that developers should deploy with Llama models or other LLMs. All of our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure the safety risks of systems for the most commonly built applications, including chatbots and visual QA. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks, including long context, multilingual, coding, and memorization. **Red teaming** We conduct recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we use the learnings to improve our benchmarks and safety tuning datasets. We partner early with subject-matter experts in critical risk areas to understand how models may lead to unintended harm for society.
Based on these conversations, we derive a set of adversarial goals for the red team, such as extracting harmful information or reprogramming the model to act in potentially harmful ways. The red team consists of experts in cybersecurity, adversarial machine learning, and integrity in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks We focus additional attention on the following critical risk areas: **1\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to the proliferation of chemical and biological weapons for Llama 4, we applied expert-designed and other targeted evaluations to assess whether the use of Llama 4 could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. We also conducted additional red teaming and evaluations for violations of our content policies related to this risk area. **2\. Child Safety** We leverage pre-training methods like data filtering as a first step in mitigating Child Safety risk in our model. To assess the post-trained model for Child Safety risk, a team of experts assesses the model’s capability to produce outputs resulting in Child Safety risks. We use this to inform additional model fine-tuning and in-depth red teaming exercises. We’ve also expanded our Child Safety evaluation benchmarks to cover Llama 4 capabilities like multi-image and multilingual inputs. **3\. Cyber attack enablement** Our cyber evaluations investigated whether Llama 4 is sufficiently capable of enabling catastrophic threat scenario outcomes. We conducted threat modeling exercises to identify the specific model capabilities that would be necessary to automate operations or enhance human capabilities across key attack vectors, both in terms of skill level and speed. We then identified and developed challenges against which to test for these capabilities in Llama 4 and peer models. Specifically, we focused on evaluating the capabilities of Llama 4 to automate cyberattacks, identify and exploit security vulnerabilities, and automate harmful workflows. Overall, we find that Llama 4 models do not introduce risks that plausibly enable catastrophic cyber outcomes. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Trust tools are open-sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our GitHub repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community.
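The location-based emissions figures in the table earlier in this card can be sanity-checked with simple arithmetic. This sketch reproduces them from the published GPU-hours and TDP; the grid carbon intensity is inferred to make the numbers line up and is an assumption, not a value stated in the card.

```python
# Sanity-check of the location-based emissions figures (carbon intensity is inferred).
gpu_hours = {"Llama 4 Scout": 5.0e6, "Llama 4 Maverick": 2.38e6}
tdp_kw = 0.7             # 700 W per H100-80GB, per the card
kg_co2_per_kwh = 0.387   # implied intensity; an assumption, not from the card

for name, hours in gpu_hours.items():
    tons = hours * tdp_kw * kg_co2_per_kwh / 1000.0
    print(f"{name}: ~{tons:,.0f} tons CO2eq")  # ~1,354 and ~645, matching the table
```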
## Considerations and Limitations Our AI is anchored on the values of freedom of expression \\- helping people to explore, debate, and innovate using our technology. We respect people's autonomy and empower them to choose how they experience, interact, and build with AI. Our AI promotes an open exchange of ideas. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 4 addresses users and their needs as they are, without inserting unnecessary judgment, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. Llama 4 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 4’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 4 models, developers should perform safety testing and tuning tailored to their specific applications of the model. We also encourage the open source community to use Llama for the purpose of research and building state of the art tools that address emerging risks. Please refer to available resources including our Developer Use Guide: AI Protections, Llama Protections solutions, and other resources to learn more." +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Llama-Guard-3-8B.json b/data/model_data_json/meta-llama_Llama-Guard-3-8B.json new file mode 100644 index 0000000000000000000000000000000000000000..c0eaa2bc83510dc10b3b5fc9decaa4079e44290f --- /dev/null +++ b/data/model_data_json/meta-llama_Llama-Guard-3-8B.json @@ -0,0 +1,28 @@ +{ + "model_id": "meta-llama/Llama-Guard-3-8B", + "downloads": 312877, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "arxiv:2407.21783", + "arxiv:2312.06674", + "arxiv:2204.05862", + "arxiv:2308.01263", + "base_model:meta-llama/Llama-3.1-8B", + "base_model:finetune:meta-llama/Llama-3.1-8B", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: text-generation base_model: meta-llama/Meta-Llama-3.1-8B tags: - facebook - meta - pytorch - llama - llama-3 license: llama3.1 extra_gated_prompt: >- ### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT Llama 3.1 Version Release Date: July 23, 2024 \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 3.1 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. 
\"Llama 3.1\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"Llama Materials\" means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. 
YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 3.1 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Llama 3.1 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.1 to: 1. 
Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 3. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 4. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 5. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 6. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 7. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 8. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.1 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 3.1 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 3.1 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. 
Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- # Model Details Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls. Below is a response classification example for Llama Guard 3.

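The example snippet itself was stripped from this dump; as a stand-in, here is a minimal sketch of response classification plus first-token scoring with Hugging Face transformers (>= 4.43). The checkpoint id matches this repository, but the sample conversation, the single-token assumption for the verdict, and the 0.5 threshold are illustrative, not Meta's reference code.

```python
# Minimal sketch, not the official reference implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # this (gated) repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt/response pair to classify; the chat template renders the
# moderation prompt (including the S1-S14 taxonomy) around it.
chat = [
    {"role": "user", "content": "How do I make a perfect omelette?"},
    {"role": "assistant", "content": "Whisk eggs, season them, cook gently in butter."},
]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)

out = model.generate(
    input_ids=input_ids,
    max_new_tokens=20,
    output_scores=True,
    return_dict_in_generate=True,
    pad_token_id=tokenizer.eos_token_id,
)
verdict = tokenizer.decode(out.sequences[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # "safe", or "unsafe" plus the violated categories (e.g. "S1")

# Classifier score, per the description below: probability mass that the first
# generated token puts on "unsafe". Caveat: depending on the template version,
# the verdict may be preceded by whitespace tokens, so inspect the decoded output.
probs = torch.softmax(out.scores[0][0], dim=-1)
unsafe_id = tokenizer.encode("unsafe", add_special_tokens=False)[0]
p_unsafe = probs[unsafe_id].item()
print(f"P(unsafe) = {p_unsafe:.3f}, flagged = {p_unsafe > 0.5}")  # illustrative threshold
```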
To produce classifier scores, we look at the probability of the first generated token and use it as the “unsafe” class probability. We can then apply a score threshold to make binary decisions. # Hazard Taxonomy and Policy The model is trained to predict safety labels for the 14 categories shown below, based on the MLCommons taxonomy of 13 hazards plus an additional Code Interpreter Abuse category for tool-call use cases.
| Hazard categories          |                               |
|----------------------------|-------------------------------|
| S1: Violent Crimes         | S2: Non-Violent Crimes        |
| S3: Sex-Related Crimes     | S4: Child Sexual Exploitation |
| S5: Defamation             | S6: Specialized Advice        |
| S7: Privacy                | S8: Intellectual Property     |
| S9: Indiscriminate Weapons | S10: Hate                     |
| S11: Suicide & Self-Harm   | S12: Sexual Content           |
| S13: Elections             | S14: Code Interpreter Abuse   |
**S1 - Violent Crimes** Responses that enable, encourage, or endorse the commission of violent crimes, including: (1) unlawful violence toward people (ex: terrorism, genocide, murder, hate-motivated violent crimes, child abuse, assault, battery, kidnapping) and (2) unlawful violence toward animals (ex: animal abuse)

**S2 - Non-Violent Crimes** Responses that enable, encourage, or endorse the commission of non-violent crimes, including: (1) personal crimes (ex: labor trafficking, threats, intimidation, hate-motivated non-violent crimes), (2) financial crimes (ex: fraud, scams, money laundering), (3) property crimes (ex: theft, arson, vandalism), (4) drug crimes (ex: creating or using regulated substances), (5) weapons crimes (ex: producing unlicensed firearms), and (6) cyber crimes (ex: hacking)

**S3 - Sex-Related Crimes** Responses that enable, encourage, or endorse the commission of sex-related crimes, including: (1) sex trafficking, (2) sexual assault (ex: rape), (3) sexual harassment (physical, verbal, or visual), and (4) prostitution

**S4 - Child Sexual Exploitation** Responses that contain, describe, enable, encourage, or endorse the sexual abuse of children

**S5 - Defamation** Responses that are both verifiably false and likely to injure a living person’s reputation

**S6 - Specialized Advice** Responses that contain specialized financial, medical, or legal advice, or that indicate dangerous activities or objects are safe

**S7 - Privacy** Responses that contain sensitive, nonpublic personal information that could undermine someone’s physical, digital, or financial security

**S8 - Intellectual Property** Responses that may violate the intellectual property rights of any third party

**S9 - Indiscriminate Weapons** Responses that enable, encourage, or endorse the creation of indiscriminate weapons, including: (1) chemical weapons (ex: nerve gas), (2) biological weapons (ex: anthrax), (3) radiological weapons (ex: cobalt bombs), (4) nuclear weapons (ex: fission bombs), and (5) high-yield explosive weapons (ex: cluster munitions)

**S10 - Hate** Responses that demean or dehumanize people on the basis of their sensitive, personal characteristics (i.e., race, color, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity, and/or serious disease)

**S11 - Suicide & Self-Harm** Responses that enable, encourage, or endorse acts of intentional self-harm, including: (1) suicide, (2) self-injury (ex: cutting), and (3) disordered eating

**S12 - Sexual Content** Responses that contain erotica

**S13 - Elections** Responses that contain factually incorrect information about electoral systems and processes, including in the time, place, or manner of voting in civic elections

**S14 - Code Interpreter Abuse** Responses that seek to abuse code interpreters, including those that enable denial of service attacks, container escapes or privilege escalation exploits

# Supported languages

Llama Guard 3 supports content safety for the following languages: English, French, German, Hindi, Italian, Portuguese, Spanish, Thai.

# Usage

> [!IMPORTANT]
>
> This repository corresponds to the half-precision version of the model. An 8-bit precision version is also provided; please visit meta-llama/Llama-Guard-3-8B-INT8.

Llama Guard 3 can be used directly with transformers; it is only supported since transformers version 4.43.

# Training Data

We use the English data used by Llama Guard [1], which we obtained by collecting Llama 2 and Llama 3 generations on prompts from the hh-rlhf dataset [2].
To scale training data for new categories and new capabilities such as multilingual and tool use, we collect additional human and synthetically generated data. Similar to the English data, the multilingual data are Human-AI conversation data that are either single-turn or multi-turn. To reduce the model’s false positive rate, we curate a set of multilingual benign prompt and response data where LLMs likely reject the prompts. For the tool use capability, we consider search tool calls and code interpreter abuse. To develop training data for search tool use, we use Llama 3 to generate responses to a collected and synthetic set of prompts. The generations are based on the query results obtained from the Brave Search API. To develop synthetic training data to detect code interpreter attacks, we use an LLM to generate safe and unsafe prompts. Then, we use a non-safety-tuned LLM to generate code interpreter completions that comply with these instructions. For safe data, we focus on data close to the boundary of what would be considered unsafe, to minimize false positives on such borderline examples. # Evaluation **Note on evaluations:** As discussed in the original Llama Guard paper, comparing model performance is not straightforward, as each model is built on its own policy and is expected to perform better on an evaluation dataset whose policy is aligned with the model. This highlights the need for industry standards. By aligning the Llama Guard family of models with the Proof of Concept MLCommons taxonomy of hazards, we hope to drive adoption of industry standards like this and facilitate collaboration and transparency in the LLM safety and content evaluation space. In this regard, we evaluate the performance of Llama Guard 3 on the MLCommons hazard taxonomy and compare it across languages with Llama Guard 2 [3] on our internal test set. We also add GPT4 as a baseline with zero-shot prompting using the MLCommons hazard taxonomy. Tables 1, 2, and 3 show that Llama Guard 3 improves over Llama Guard 2 and outperforms GPT4 in English, multilingual, and tool use capabilities. Notably, Llama Guard 3 achieves better performance with much lower false positive rates. We also benchmark Llama Guard 3 on the open-source XSTest dataset [4] and observe that it achieves the same F1 score but a lower false positive rate compared to Llama Guard 2.
Table 1: Comparison of performance of various models measured on our internal English test set for MLCommons hazard taxonomy (response classification).

|               | **F1 ↑** | **AUPRC ↑** | **False Positive Rate ↓** |
|---------------|:--------:|:-----------:|:-------------------------:|
| Llama Guard 2 | 0.877    | 0.927       | 0.081                     |
| Llama Guard 3 | 0.939    | 0.985       | 0.040                     |
| GPT4          | 0.805    | N/A         | 0.152                     |

Table 2: Comparison of multilingual performance of various models measured on our internal test set for MLCommons hazard taxonomy (prompt+response classification).
| **F1 ↑ / FPR ↓** | French      | German      | Hindi       | Italian     | Portuguese  | Spanish     | Thai        |
|------------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Llama Guard 2    | 0.911/0.012 | 0.795/0.062 | 0.832/0.062 | 0.681/0.039 | 0.845/0.032 | 0.876/0.001 | 0.822/0.078 |
| Llama Guard 3    | 0.943/0.036 | 0.877/0.032 | 0.871/0.050 | 0.873/0.038 | 0.860/0.060 | 0.875/0.023 | 0.834/0.030 |
| GPT4             | 0.795/0.157 | 0.691/0.123 | 0.709/0.206 | 0.753/0.204 | 0.738/0.207 | 0.711/0.169 | 0.688/0.168 |

Table 3: Comparison of performance of various models measured on our internal test set for other moderation capabilities (prompt+response classification).
| Capability             | Model         | **F1 ↑** | **AUPRC ↑** | **FPR ↓** |
|------------------------|---------------|:--------:|:-----------:|:---------:|
| Search tool calls      | Llama Guard 2 | 0.749    | 0.794       | 0.284     |
|                        | Llama Guard 3 | 0.856    | 0.938       | 0.174     |
|                        | GPT4          | 0.732    | N/A         | 0.525     |
| Code interpreter abuse | Llama Guard 2 | 0.683    | 0.677       | 0.670     |
|                        | Llama Guard 3 | 0.885    | 0.967       | 0.125     |
|                        | GPT4          | 0.636    | N/A         | 0.90      |
# Application As outlined in the Llama 3 paper, Llama Guard 3 provides industry-leading system-level safety performance and is recommended to be deployed along with Llama 3.1. Note that, while deploying Llama Guard 3 will likely improve the safety of your system, it may also increase refusals of benign prompts (false positives). Violation rate improvement and impact on false positives as measured on internal benchmarks are provided in the Llama 3 paper. # Quantization We are committed to helping the community deploy Llama systems responsibly. We provide a quantized version of Llama Guard 3 to lower the deployment cost. We used the int8 implementation integrated into the Hugging Face ecosystem, reducing the checkpoint size by about 40% with very small impact on model performance. In Table 5, we observe that the performance of the quantized model is comparable to that of the original model; a minimal loading sketch follows the table.
Table 5: Impact of quantization on Llama Guard 3 performance.

| Task                    | Capability   | Variant       | Precision | Recall | F1    | FPR   |
|-------------------------|--------------|---------------|:---------:|:------:|:-----:|:-----:|
| Prompt Classification   | English      | Non-Quantized | 0.952     | 0.943  | 0.947 | 0.057 |
|                         |              | Quantized     | 0.961     | 0.939  | 0.950 | 0.045 |
|                         | Multilingual | Non-Quantized | 0.901     | 0.899  | 0.900 | 0.054 |
|                         |              | Quantized     | 0.906     | 0.892  | 0.899 | 0.051 |
|                         | Tool Use     | Non-Quantized | 0.884     | 0.958  | 0.920 | 0.126 |
|                         |              | Quantized     | 0.876     | 0.946  | 0.909 | 0.134 |
| Response Classification | English      | Non-Quantized | 0.947     | 0.931  | 0.939 | 0.040 |
|                         |              | Quantized     | 0.947     | 0.925  | 0.936 | 0.040 |
|                         | Multilingual | Non-Quantized | 0.929     | 0.805  | 0.862 | 0.033 |
|                         |              | Quantized     | 0.931     | 0.785  | 0.851 | 0.031 |
|                         | Tool Use     | Non-Quantized | 0.774     | 0.884  | 0.825 | 0.176 |
|                         |              | Quantized     | 0.793     | 0.865  | 0.827 | 0.155 |
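To make the deployment option above concrete, here is a minimal loading sketch for the INT8 checkpoint via the bitsandbytes integration in transformers; the exact loading flags Meta used are an assumption here, not taken from the original card.

```python
# Minimal sketch, assuming bitsandbytes is installed and a CUDA GPU is available.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-Guard-3-8B-INT8"  # 8-bit variant referenced above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
# The int8 weights cut the checkpoint size by roughly 40% relative to bf16,
# as described above, and the model is used like the half-precision one.
```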

# Get started Llama Guard 3 is available by default on Llama 3.1 reference implementations. You can learn more about how to configure and customize using Llama Recipes shared on our Github repository. # Limitations There are some limitations associated with Llama Guard 3. First, Llama Guard 3 itself is an LLM fine-tuned on Llama 3.1. Thus, its performance (e.g., judgments that need common sense knowledge, multilingual capability, and policy coverage) might be limited by its (pre-)training data. Some hazard categories may require factual, up-to-date knowledge to be evaluated (for example, S5: Defamation, S8: Intellectual Property, and S13: Elections) . We believe more complex systems should be deployed to accurately moderate these categories for use cases highly sensitive to these types of hazards, but Llama Guard 3 provides a good baseline for generic use cases. Lastly, as an LLM, Llama Guard 3 may be susceptible to adversarial attacks or prompt injection attacks that could bypass or alter its intended use. Please feel free to report vulnerabilities and we will look to incorporate improvements in future versions of Llama Guard. # Citation # References [1] Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations [2] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback [3] Llama Guard 2 Model Card [4] XSTest: A Test Suite for Identifying Exaggerated Safety Behaviors in Large Language Models" +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_LlamaGuard-7b.json b/data/model_data_json/meta-llama_LlamaGuard-7b.json new file mode 100644 index 0000000000000000000000000000000000000000..419cb6212cd7bed171d2a2660e52458ebf728f1a --- /dev/null +++ b/data/model_data_json/meta-llama_LlamaGuard-7b.json @@ -0,0 +1,24 @@ +{ + "model_id": "meta-llama/LlamaGuard-7b", + "downloads": 639785, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-2", + "conversational", + "en", + "arxiv:2307.09288", + "arxiv:2312.04724", + "license:llama2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- extra_gated_heading: You need to share contact information with Meta to access this model extra_gated_prompt: >- ### LLAMA 2 COMMUNITY LICENSE AGREEMENT \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Llama 2\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/. \"Llama Materials\" means, collectively, Meta's proprietary Llama 2 and documentation (and any portion thereof) made available under this Agreement. 
\"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non- transferable and royalty-free limited license under Meta's intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof). 2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. 
No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials. b. Subject to Meta's ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Llama 2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy. #### Prohibited Uses We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 3. 
Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Llama 2 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Llama 2 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. 
Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: github.com/facebookresearch/llama * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit language: - en pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-2 license: llama2 --- ## Model Details **This repository contains the model weights both in the vanilla Llama format and the Hugging Face format. If you have not received access, please review this discussion** Llama-Guard is a 7B parameter Llama 2-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe/unsafe, and if unsafe based on a policy, it also lists the violating subcategories. Here is an example: In order to produce classifier scores, we look at the probability for the first token, and turn that into an “unsafe” class probability. Model users can then make binary decisions by applying a desired threshold to the probability scores. ## Training and Evaluation ### Training Data We use a mix of prompts that come from the Anthropic dataset and redteaming examples that we have collected in house, in a separate process from our production redteaming. In particular, we took the prompts only from the Anthropic dataset, and generated new responses from our in-house LLaMA models, using jailbreaking techniques to elicit violating responses. We then annotated Anthropic data (prompts & responses) in house, mapping labels according to the categories identified above. Overall we have ~13K training examples. ## Taxonomy of harms and Risk Guidelines As automated content risk mitigation relies on classifiers to make decisions about content in real time, a prerequisite to building these systems is to have the following components: - A **taxonomy** of risks that are of interest – these become the classes of a classifier. - A **risk guideline** that determines where we put the line between encouraged and discouraged outputs for each risk category in the taxonomy. Together with this model, we release an open taxonomy inspired by existing open taxonomies such as those employed by Google, Microsoft and OpenAI in the hope that it can be useful to the community. This taxonomy does not necessarily reflect Meta's own internal policies and is meant to demonstrate the value of our method to tune LLMs into classifiers that show high performance and high degrees of adaptability to different policies. 
### The Llama-Guard Safety Taxonomy & Risk Guidelines Below, we provide both the harm types themselves under this taxonomy and also examples of the specific kinds of content that would be considered harmful under each category: - **Violence & Hate** encompasses statements that encourage or could help people plan or engage in violence. Similarly, statements that advocate discrimination, contain slurs, or voice hateful sentiments against people based on their sensitive personal characteristics (ex: race, color, religion, national origin, sexual orientation, gender, gender identity, or disability) would also be considered inappropriate under this category. - **Sexual Content** encompasses statements encouraging someone (who could be underage) to engage in specific sex acts. Similarly, sexually explicit (i.e., erotic) statements would also be considered inappropriate under this category. - **Guns & Illegal Weapons** encompasses statements that specifically encourage, condone, or could help people plan or execute specific crimes involving the illegal acquisition, creation, or use of guns or illegal weapons (ex: explosives, biological agents, or chemical weapons), including instructions on how to create or use them. - **Regulated or Controlled Substances** encompasses statements that specifically encourage or could help people to illegally produce, transfer, or consume regulated or controlled substances like illegal drugs, tobacco, alcohol, or cannabis. - **Suicide & Self Harm** encompasses statements that encourage, condone, or enable people to harm themselves (ex: by providing instructions or information on methods of self-harm). When people express or imply an intent to harm themselves, any response other than one that provides links to appropriate health resources would be considered inappropriate (even if entirely supportive). - **Criminal Planning** encompasses miscellaneous statements that encourage, condone, or could help people plan or execute specific criminal activities, like arson, kidnapping, or theft. Items should only be considered harmful under this category when they could not be read as violating any of the other harm types above (ex: statements that encourage violence should be considered violating under Violence & Hate rather than this category). ## How to Use in > [!warning] > You need to be logged in to the Hugging Face Hub to use the model. For more details, see this Colab notebook. ## Evaluation results We compare the performance of the model against standard content moderation APIs in the industry, including OpenAI, Azure Content Safety,and PerspectiveAPI from Google on both public and in-house benchmarks. The public benchmarks include ToxicChat and OpenAI Moderation. Note: comparisons are not exactly apples-to-apples due to mismatches in each taxonomy. The interested reader can find a more detailed discussion about this in our paper. 
| | Our Test Set (Prompt) | OpenAI Mod | ToxicChat | Our Test Set (Response) | | --------------- | --------------------- | ---------- | --------- | ----------------------- | | Llama-Guard | **0.945** | 0.847 | **0.626** | **0.953** | | OpenAI API | 0.764 | **0.856** | 0.588 | 0.769 | | Perspective API | 0.728 | 0.787 | 0.532 | 0.699 |" +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Meta-Llama-3-70B-Instruct.json b/data/model_data_json/meta-llama_Meta-Llama-3-70B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..030b404a92d16c4538d8df61198ed02389ad4955 --- /dev/null +++ b/data/model_data_json/meta-llama_Meta-Llama-3-70B-Instruct.json @@ -0,0 +1,24 @@ +{ + "model_id": "meta-llama/Meta-Llama-3-70B-Instruct", + "downloads": 291523, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "base_model:meta-llama/Meta-Llama-3-70B", + "base_model:finetune:meta-llama/Meta-Llama-3-70B", + "license:llama3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: text-generation base_model: meta-llama/Meta-Llama-3-70B new_version: meta-llama/Llama-3.3-70B-Instruct tags: - facebook - meta - pytorch - llama - llama-3 license: llama3 extra_gated_prompt: >- ### META LLAMA 3 COMMUNITY LICENSE AGREEMENT Meta Llama 3 Version Release Date: April 18, 2024 \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. \"Documentation\" means the specifications, manuals and documentation accompanying Meta Llama 3 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Meta Llama 3\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"Llama Materials\" means, collectively, Meta’s proprietary Meta Llama 3 and Documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. 
If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof). 2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). 
Meta hereby grants you a license to use “Llama 3” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Meta Llama 3 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Meta Llama 3 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Meta Llama 3. If you access or use Meta Llama 3, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Meta Llama 3 safely and responsibly. You agree you will not use, or allow others to use, Meta Llama 3 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 3. 
Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Meta Llama 3 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Meta Llama 3 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Meta Llama 3 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. 
Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit widget: - example_title: Winter holidays messages: - role: system content: You are a helpful and honest assistant. Please, respond concisely and truthfully. - role: user content: Can you recommend a good destination for Winter holidays? - example_title: Programming assistant messages: - role: system content: You are a helpful and honest code and programming assistant. Please, respond concisely and truthfully. - role: user content: Write a function that computes the nth fibonacci number. inference: parameters: max_new_tokens: 300 stop: - <|end_of_text|> - <|eot_id|> --- ## Model Details Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. **Model developers** Meta **Variations** Llama 3 comes in two sizes — 8B and 70B parameters — in pre-trained and instruction tuned variants. **Input** Models input text only. **Output** Models generate text and code only. **Model Architecture** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
|         | Training Data                                | Params | Context length | GQA | Token count | Knowledge cutoff |
|---------|----------------------------------------------|--------|----------------|-----|-------------|------------------|
| Llama 3 | A new mix of publicly available online data. | 8B     | 8k             | Yes | 15T+        | March, 2023      |
|         |                                              | 70B    | 8k             | Yes | 15T+        | December, 2023   |
**Llama 3 family of models**. Token counts refer to pretraining data only. Both the 8 and 70B versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date** April 18, 2024. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3 is intended for commercial and research use in English. Instruction tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3 Community License. Use in languages other than English**. **Note: Developers may fine-tune Llama 3 models for languages beyond English provided they comply with the Llama 3 Community License and the Acceptable Use Policy. ## How to use This repository contains two versions of Meta-Llama-3-70B-Instruct, for use with transformers and with the original codebase. ### Use with transformers See the snippet below for usage with Transformers: ### Use with Please, follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging : For Hugging Face support, we recommend using transformers or TGI, but a similar command works. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint Pretraining utilized a cumulative** 7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W). Estimated total emissions were 2290 tCO2eq, 100% of which were offset by Meta’s sustainability program.
|             | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|-------------|------------------|-----------------------|-------------------------|
| Llama 3 8B  | 1.3M             | 700                   | 390                     |
| Llama 3 70B | 6.4M             | 700                   | 1900                    |
| Total       | 7.7M             |                       | 2290                    |
**CO2 emissions during pre-training**. Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 3 was pretrained on over 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 10M human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of March 2023 for the 8B and December 2023 for the 70B models respectively. ## Benchmarks In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. For details on the methodology see here. ### Base pretrained models
| Category              | Benchmark                    | Llama 3 8B | Llama2 7B | Llama2 13B | Llama 3 70B | Llama2 70B |
|-----------------------|------------------------------|:----------:|:---------:|:----------:|:-----------:|:----------:|
| General               | MMLU (5-shot)                | 66.6       | 45.7      | 53.8       | 79.5        | 69.7       |
|                       | AGIEval English (3-5 shot)   | 45.9       | 28.8      | 38.7       | 63.0        | 54.8       |
|                       | CommonSenseQA (7-shot)       | 72.6       | 57.6      | 67.6       | 83.8        | 78.7       |
|                       | Winogrande (5-shot)          | 76.1       | 73.3      | 75.4       | 83.1        | 81.8       |
|                       | BIG-Bench Hard (3-shot, CoT) | 61.1       | 38.1      | 47.0       | 81.3        | 65.7       |
|                       | ARC-Challenge (25-shot)      | 78.6       | 53.7      | 67.6       | 93.0        | 85.3       |
| Knowledge reasoning   | TriviaQA-Wiki (5-shot)       | 78.5       | 72.1      | 79.6       | 89.7        | 87.5       |
| Reading comprehension | SQuAD (1-shot)               | 76.4       | 72.2      | 72.1       | 85.6        | 82.6       |
|                       | QuAC (1-shot, F1)            | 44.4       | 39.6      | 44.9       | 51.1        | 49.4       |
|                       | BoolQ (0-shot)               | 75.7       | 65.5      | 66.9       | 79.0        | 73.1       |
|                       | DROP (3-shot, F1)            | 58.4       | 37.9      | 49.8       | 79.7        | 70.2       |
### Instruction tuned models
| Benchmark            | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|----------------------|:----------:|:----------:|:-----------:|:-----------:|:-----------:|
| MMLU (5-shot)        | 68.4       | 34.1       | 47.8        | 82.0        | 52.9        |
| GPQA (0-shot)        | 34.2       | 21.7       | 22.3        | 39.5        | 21.0        |
| HumanEval (0-shot)   | 62.2       | 7.9        | 14.0        | 81.7        | 25.6        |
| GSM-8K (8-shot, CoT) | 79.6       | 25.7       | 77.4        | 93.0        | 57.5        |
| MATH (4-shot, CoT)   | 30.0       | 3.8        | 6.7         | 50.4        | 11.6        |
### Responsibility & Safety

We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community. Foundation models are widely capable technologies that are built to be used for a diverse range of applications. They are not designed to meet every developer preference on safety levels for all use cases out of the box, as those by their nature will differ across different applications. Rather, responsible LLM-application deployment is achieved by implementing a series of safety best practices throughout the development of such applications, from model pre-training and fine-tuning to the deployment of systems composed of safeguards that tailor safety to the specific use case and audience. As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model and system level safety for their application. We also provide a set of resources including Meta Llama Guard 2 and Code Shield safeguards. These tools have proven to drastically reduce residual risks of LLM systems, while maintaining a high level of helpfulness. We encourage developers to tune and deploy these safeguards according to their needs and we provide a reference implementation to get you started.

#### Llama 3-Instruct

As outlined in the Responsible Use Guide, some trade-off between model helpfulness and model alignment is likely unavoidable. Developers should exercise discretion about how to weigh the benefits of alignment and helpfulness for their specific use case and audience. Developers should be mindful of residual risks when using Llama models and leverage additional safety tools as needed to reach the right safety bar for their use case.

**Safety** For our instruction tuned model, we conducted extensive red teaming exercises, performed adversarial evaluations and implemented safety mitigation techniques to lower residual risks. As with any Large Language Model, residual risks will likely remain, and we recommend that developers assess these risks in the context of their use case. In parallel, we are working with the community to make AI safety benchmark standards transparent, rigorous and interpretable.

**Refusals** In addition to residual risks, we put a great emphasis on model refusals to benign prompts. Over-refusing not only can impact the user experience but could even be harmful in certain contexts. We’ve heard the feedback from the developer community and improved our fine-tuning to ensure that Llama 3 is significantly less likely to falsely refuse to answer prompts than Llama 2. We built internal benchmarks and developed mitigations to limit false refusals, making Llama 3 our most helpful model to date.

#### Responsible release

In addition to responsible use considerations outlined above, we followed a rigorous process that requires us to take extra measures against misuse and critical risks before we make our release decision.

**Misuse** If you access or use Llama 3, you agree to the Acceptable Use Policy.
The most recent copy of this policy can be found at

#### Critical risks

**CBRNE (Chemical, Biological, Radiological, Nuclear, and high-yield Explosives)** We have conducted a twofold assessment of the safety of the model in this area:

* Iterative testing during model training to assess the safety of responses related to CBRNE threats and other adversarial risks.
* Involving external CBRNE experts to conduct an uplift test assessing the model's ability to accurately provide expert knowledge and reduce barriers to potential CBRNE misuse, by reference to what can be achieved using web search (without the model).

### Cyber Security

We have evaluated Llama 3 with CyberSecEval, Meta's cybersecurity safety evaluation suite, measuring Llama 3's propensity to suggest insecure code when used as a coding assistant, and Llama 3's propensity to comply with requests to help carry out cyber attacks, where attacks are defined by the industry-standard MITRE ATT&CK cyber attack ontology. On our insecure coding and cyber attacker helpfulness tests, Llama 3 behaved in the same range as, or safer than, models of equivalent coding capability.

### Child Safety

Child safety risk assessments were conducted using a team of experts to assess the model's capability to produce outputs that could result in child safety risks, and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content, while taking account of market-specific nuances or experiences.

### Community

Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI, and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open-sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community.

## Ethical Considerations and Limitations

The core values of Llama 3 are openness, inclusivity, and helpfulness. It is meant to serve everyone and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences, and perspectives. Llama 3 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress.

But Llama 3 is a new technology, and like any new technology, there are risks associated with its use.
Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3 models, developers should perform safety testing and tuning tailored to their specific applications of the model. As outlined in the Responsible Use Guide, we recommend incorporating Purple Llama solutions into your workflows and specifically Llama Guard which provides a base model to filter input and output prompts to layer system-level safety on top of model-level safety. Please see the Responsible Use Guide available at ## Citation instructions @article{llama3modelcard, title={Llama 3 Model Card}, author={AI@Meta}, year={2024}, url = { } ## Contributors Aaditya Singh; Aaron Grattafiori; Abhimanyu Dubey; Abhinav Jauhri; Abhinav Pandey; Abhishek Kadian; Adam Kelsey; Adi Gangidi; Ahmad Al-Dahle; Ahuva Goldstand; Aiesha Letman; Ajay Menon; Akhil Mathur; Alan Schelten; Alex Vaughan; Amy Yang; Andrei Lupu; Andres Alvarado; Andrew Gallagher; Andrew Gu; Andrew Ho; Andrew Poulton; Andrew Ryan; Angela Fan; Ankit Ramchandani; Anthony Hartshorn; Archi Mitra; Archie Sravankumar; Artem Korenev; Arun Rao; Ashley Gabriel; Ashwin Bharambe; Assaf Eisenman; Aston Zhang; Aurelien Rodriguez; Austen Gregerson; Ava Spataru; Baptiste Roziere; Ben Maurer; Benjamin Leonhardi; Bernie Huang; Bhargavi Paranjape; Bing Liu; Binh Tang; Bobbie Chern; Brani Stojkovic; Brian Fuller; Catalina Mejia Arenas; Chao Zhou; Charlotte Caucheteux; Chaya Nayak; Ching-Hsiang Chu; Chloe Bi; Chris Cai; Chris Cox; Chris Marra; Chris McConnell; Christian Keller; Christoph Feichtenhofer; Christophe Touret; Chunyang Wu; Corinne Wong; Cristian Canton Ferrer; Damien Allonsius; Daniel Kreymer; Daniel Haziza; Daniel Li; Danielle Pintz; Danny Livshits; Danny Wyatt; David Adkins; David Esiobu; David Xu; Davide Testuggine; Delia David; Devi Parikh; Dhruv Choudhary; Dhruv Mahajan; Diana Liskovich; Diego Garcia-Olano; Diego Perino; Dieuwke Hupkes; Dingkang Wang; Dustin Holland; Egor Lakomkin; Elina Lobanova; Xiaoqing Ellen Tan; Emily Dinan; Eric Smith; Erik Brinkman; Esteban Arcaute; Filip Radenovic; Firat Ozgenel; Francesco Caggioni; Frank Seide; Frank Zhang; Gabriel Synnaeve; Gabriella Schwarz; Gabrielle Lee; Gada Badeer; Georgia Anderson; Graeme Nail; Gregoire Mialon; Guan Pang; Guillem Cucurell; Hailey Nguyen; Hannah Korevaar; Hannah Wang; Haroun Habeeb; Harrison Rudolph; Henry Aspegren; Hu Xu; Hugo Touvron; Iga Kozlowska; Igor Molybog; Igor Tufanov; Iliyan Zarov; Imanol Arrieta Ibarra; Irina-Elena Veliche; Isabel Kloumann; Ishan Misra; Ivan Evtimov; Jacob Xu; Jade Copet; Jake Weissman; Jan Geffert; Jana Vranes; Japhet Asher; Jason Park; Jay Mahadeokar; Jean-Baptiste Gaya; Jeet Shah; Jelmer van der Linde; Jennifer Chan; Jenny Hong; Jenya Lee; Jeremy Fu; Jeremy Teboul; Jianfeng Chi; Jianyu Huang; Jie Wang; Jiecao Yu; Joanna Bitton; Joe Spisak; Joelle Pineau; Jon Carvill; Jongsoo Park; Joseph Rocca; Joshua Johnstun; Junteng Jia; Kalyan Vasuden Alwala; Kam Hou U; Kate Plawiak; Kartikeya Upasani; Kaushik Veeraraghavan; Ke Li; Kenneth Heafield; Kevin Stone; Khalid El-Arini; Krithika Iyer; Kshitiz Malik; Kuenley Chiu; Kunal Bhalla; Kyle Huang; Lakshya Garg; Lauren Rantala-Yeary; Laurens van der Maaten; Lawrence Chen; Leandro Silva; Lee Bell; Lei Zhang; Liang Tan; Louis Martin; Lovish 
Madaan; Luca Wehrstedt; Lukas Blecher; Luke de Oliveira; Madeline Muzzi; Madian Khabsa; Manav Avlani; Mannat Singh; Manohar Paluri; Mark Zuckerberg; Marcin Kardas; Martynas Mankus; Mathew Oldham; Mathieu Rita; Matthew Lennie; Maya Pavlova; Meghan Keneally; Melanie Kambadur; Mihir Patel; Mikayel Samvelyan; Mike Clark; Mike Lewis; Min Si; Mitesh Kumar Singh; Mo Metanat; Mona Hassan; Naman Goyal; Narjes Torabi; Nicolas Usunier; Nikolay Bashlykov; Nikolay Bogoychev; Niladri Chatterji; Ning Dong; Oliver Aobo Yang; Olivier Duchenne; Onur Celebi; Parth Parekh; Patrick Alrassy; Paul Saab; Pavan Balaji; Pedro Rittner; Pengchuan Zhang; Pengwei Li; Petar Vasic; Peter Weng; Polina Zvyagina; Prajjwal Bhargava; Pratik Dubal; Praveen Krishnan; Punit Singh Koura; Qing He; Rachel Rodriguez; Ragavan Srinivasan; Rahul Mitra; Ramon Calderer; Raymond Li; Robert Stojnic; Roberta Raileanu; Robin Battey; Rocky Wang; Rohit Girdhar; Rohit Patel; Romain Sauvestre; Ronnie Polidoro; Roshan Sumbaly; Ross Taylor; Ruan Silva; Rui Hou; Rui Wang; Russ Howes; Ruty Rinott; Saghar Hosseini; Sai Jayesh Bondu; Samyak Datta; Sanjay Singh; Sara Chugh; Sargun Dhillon; Satadru Pan; Sean Bell; Sergey Edunov; Shaoliang Nie; Sharan Narang; Sharath Raparthy; Shaun Lindsay; Sheng Feng; Sheng Shen; Shenghao Lin; Shiva Shankar; Shruti Bhosale; Shun Zhang; Simon Vandenhende; Sinong Wang; Seohyun Sonia Kim; Soumya Batra; Sten Sootla; Steve Kehoe; Suchin Gururangan; Sumit Gupta; Sunny Virk; Sydney Borodinsky; Tamar Glaser; Tamar Herman; Tamara Best; Tara Fowler; Thomas Georgiou; Thomas Scialom; Tianhe Li; Todor Mihaylov; Tong Xiao; Ujjwal Karn; Vedanuj Goswami; Vibhor Gupta; Vignesh Ramanathan; Viktor Kerkez; Vinay Satish Kumar; Vincent Gonguet; Vish Vogeti; Vlad Poenaru; Vlad Tiberiu Mihailescu; Vladan Petrovic; Vladimir Ivanov; Wei Li; Weiwei Chu; Wenhan Xiong; Wenyin Fu; Wes Bouaziz; Whitney Meers; Will Constable; Xavier Martinet; Xiaojian Wu; Xinbo Gao; Xinfeng Xie; Xuchao Jia; Yaelle Goldschlag; Yann LeCun; Yashesh Gaur; Yasmine Babaei; Ye Qi; Yenda Li; Yi Wen; Yiwen Song; Youngjin Nam; Yuchen Hao; Yuchen Zhang; Yun Wang; Yuning Mao; Yuzi He; Zacharie Delpierre Coudert; Zachary DeVito; Zahra Hankir; Zhaoduo Wen; Zheng Yan; Zhengxing Chen; Zhenyu Yang; Zoe Papakipos" +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Meta-Llama-3-8B-Instruct.json b/data/model_data_json/meta-llama_Meta-Llama-3-8B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..9d5130c8ca4af446e6b24232a0034679101e3800 --- /dev/null +++ b/data/model_data_json/meta-llama_Meta-Llama-3-8B-Instruct.json @@ -0,0 +1,22 @@ +{ + "model_id": "meta-llama/Meta-Llama-3-8B-Instruct", + "downloads": 1312717, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "conversational", + "en", + "license:llama3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3 new_version: meta-llama/Llama-3.1-8B-Instruct extra_gated_prompt: >- ### META LLAMA 3 COMMUNITY LICENSE AGREEMENT Meta Llama 3 Version Release Date: April 18, 2024 \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. 
\"Documentation\" means the specifications, manuals and documentation accompanying Meta Llama 3 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Meta Llama 3\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"Llama Materials\" means, collectively, Meta’s proprietary Meta Llama 3 and Documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof). 2. Additional Commercial Terms. 
If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama 3” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Meta Llama 3 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. 
Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Meta Llama 3 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Meta Llama 3. If you access or use Meta Llama 3, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Meta Llama 3 safely and responsibly. You agree you will not use, or allow others to use, Meta Llama 3 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Meta Llama 3 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. 
Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Meta Llama 3 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Meta Llama 3 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit widget: - example_title: Hello messages: - role: user content: Hey my name is Julien! How are you? - example_title: Winter holidays messages: - role: system content: You are a helpful and honest assistant. Please, respond concisely and truthfully. - role: user content: Can you recommend a good destination for Winter holidays? - example_title: Programming assistant messages: - role: system content: You are a helpful and honest code and programming assistant. Please, respond concisely and truthfully. - role: user content: Write a function that computes the nth fibonacci number. inference: parameters: max_new_tokens: 300 stop: - <|end_of_text|> - <|eot_id|> --- ## Model Details Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. **Model developers** Meta **Variations** Llama 3 comes in two sizes — 8B and 70B parameters — in pre-trained and instruction tuned variants. **Input** Models input text only. **Output** Models generate text and code only. **Model Architecture** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|
| Llama 3 | A new mix of publicly available online data. | 8B | 8k | Yes | 15T+ | March, 2023 |
| | | 70B | 8k | Yes | 15T+ | December, 2023 |
**Llama 3 family of models**. Token counts refer to pretraining data only. Both the 8 and 70B versions use Grouped-Query Attention (GQA) for improved inference scalability.

**Model Release Date** April 18, 2024.

**Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.

**License** A custom commercial license is available at:

**Where to send questions or comments about the model** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go here.

## Intended Use

**Intended Use Cases** Llama 3 is intended for commercial and research use in English. Instruction tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.

**Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3 Community License. Use in languages other than English**.

**Note: Developers may fine-tune Llama 3 models for languages beyond English provided they comply with the Llama 3 Community License and the Acceptable Use Policy.

## How to use

This repository contains two versions of Meta-Llama-3-8B-Instruct, for use with transformers and with the original codebase.

### Use with transformers

You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the `generate()` function. Let's see examples of both.

#### Transformers pipeline

(See the sketch below, following the Hardware and Software section.)

#### Transformers AutoModelForCausalLM

(See the sketch below, following the benchmark tables.)

### Use with the original codebase

Please follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging `huggingface-cli`. For Hugging Face support, we recommend using transformers or TGI, but a similar command works.

## Hardware and Software

**Training Factors** We used custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.

**Carbon Footprint** Pretraining utilized a cumulative 7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W). Estimated total emissions were 2290 tCO2eq, 100% of which were offset by Meta's sustainability program.
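To illustrate the "Use with transformers" section above (the card's original snippets were stripped in extraction), here is a minimal pipeline sketch. It assumes the standard transformers pipeline API and a recent transformers release with chat-message support; the messages and `max_new_tokens=300` come from the card's widget and inference config, while the sampling values are illustrative, not the card's verbatim snippet:

```python
import torch
import transformers

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Conversational inference via the pipeline abstraction.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Example conversation taken from the card's widget examples.
messages = [
    {"role": "system", "content": "You are a helpful and honest assistant. Please, respond concisely and truthfully."},
    {"role": "user", "content": "Can you recommend a good destination for Winter holidays?"},
]

# Stop on either terminator listed in the card's inference parameters
# (<|end_of_text|> is the tokenizer's EOS; <|eot_id|> ends a chat turn).
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    messages,
    max_new_tokens=300,      # the card's widget setting
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,         # illustrative sampling values
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])
```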
| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Llama 3 8B | 1.3M | 700 | 390 |
| Llama 3 70B | 6.4M | 700 | 1900 |
| Total | 7.7M | | 2290 |
**CO2 emissions during pre-training**. Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others.

## Training Data

**Overview** Llama 3 was pretrained on over 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 10M human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data.

**Data Freshness** The pretraining data has a cutoff of March 2023 for the 8B and December 2023 for the 70B models, respectively.

## Benchmarks

In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. For details on the methodology see here.

### Base pretrained models
| Category | Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|---|
| General | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| | AGIEval English (3-5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| Reading comprehension | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |
### Instruction tuned models
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|
| MMLU (5-shot) | 68.4 | 34.1 | 47.8 | 82.0 | 52.9 |
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
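The "How to use" section above also names a Transformers AutoModelForCausalLM route (referenced there, with the sketch placed here to keep the benchmark tables contiguous). A minimal sketch, assuming the standard Auto classes and the tokenizer's built-in chat template; the prompt reuses the card's programming-assistant widget example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful and honest code and programming assistant. Please, respond concisely and truthfully."},
    {"role": "user", "content": "Write a function that computes the nth fibonacci number."},
]

# The chat template renders Llama 3's special tokens around each turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Terminators from the card's inference parameters.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=300,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,   # illustrative sampling values
    top_p=0.9,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```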
### Responsibility & Safety

We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and to support the open source community.

Foundation models are widely capable technologies that are built to be used for a diverse range of applications. They are not designed to meet every developer preference on safety levels for all use cases out-of-the-box, as those by their nature will differ across different applications. Rather, responsible LLM-application deployment is achieved by implementing a series of safety best practices throughout the development of such applications, from the model pre-training and fine-tuning to the deployment of systems composed of safeguards that tailor the safety needs specifically to the use case and audience.

As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model- and system-level safety for their application. We also provide a set of resources, including Meta Llama Guard 2 and Code Shield safeguards. These tools have proven to drastically reduce residual risks of LLM systems while maintaining a high level of helpfulness. We encourage developers to tune and deploy these safeguards according to their needs, and we provide a reference implementation to get you started.

#### Llama 3-Instruct

As outlined in the Responsible Use Guide, some trade-off between model helpfulness and model alignment is likely unavoidable. Developers should exercise discretion about how to weigh the benefits of alignment and helpfulness for their specific use case and audience. Developers should be mindful of residual risks when using Llama models and leverage additional safety tools as needed to reach the right safety bar for their use case.

**Safety** For our instruction tuned model, we conducted extensive red teaming exercises, performed adversarial evaluations, and implemented safety mitigation techniques to lower residual risks. As with any large language model, residual risks will likely remain, and we recommend that developers assess these risks in the context of their use case. In parallel, we are working with the community to make AI safety benchmark standards transparent, rigorous, and interpretable.

**Refusals** In addition to residual risks, we put a great emphasis on model refusals of benign prompts. Over-refusing not only degrades the user experience but can even be harmful in certain contexts. We have heard the feedback from the developer community and improved our fine-tuning to ensure that Llama 3 is significantly less likely than Llama 2 to falsely refuse to answer prompts. We built internal benchmarks and developed mitigations to limit false refusals, making Llama 3 our most helpful model to date.

#### Responsible release

In addition to the responsible use considerations outlined above, we followed a rigorous process that requires us to take extra measures against misuse and critical risks before we make our release decision.

**Misuse** If you access or use Llama 3, you agree to the Acceptable Use Policy.
The most recent copy of this policy can be found at

#### Critical risks

**CBRNE (Chemical, Biological, Radiological, Nuclear, and high-yield Explosives)** We have conducted a twofold assessment of the safety of the model in this area:

* Iterative testing during model training to assess the safety of responses related to CBRNE threats and other adversarial risks.
* Involving external CBRNE experts to conduct an uplift test assessing the model's ability to accurately provide expert knowledge and reduce barriers to potential CBRNE misuse, by reference to what can be achieved using web search (without the model).

### Cyber Security

We have evaluated Llama 3 with CyberSecEval, Meta's cybersecurity safety evaluation suite, measuring Llama 3's propensity to suggest insecure code when used as a coding assistant, and Llama 3's propensity to comply with requests to help carry out cyber attacks, where attacks are defined by the industry-standard MITRE ATT&CK cyber attack ontology. On our insecure coding and cyber attacker helpfulness tests, Llama 3 behaved in the same range as, or safer than, models of equivalent coding capability.

### Child Safety

Child safety risk assessments were conducted using a team of experts to assess the model's capability to produce outputs that could result in child safety risks, and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content, while taking account of market-specific nuances or experiences.

### Community

Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI, and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open-sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our GitHub repository. Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community.

## Ethical Considerations and Limitations

The core values of Llama 3 are openness, inclusivity, and helpfulness. It is meant to serve everyone and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences, and perspectives. Llama 3 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress.

But Llama 3 is a new technology, and like any new technology, there are risks associated with its use.
Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3 models, developers should perform safety testing and tuning tailored to their specific applications of the model. As outlined in the Responsible Use Guide, we recommend incorporating Purple Llama solutions into your workflows and specifically Llama Guard which provides a base model to filter input and output prompts to layer system-level safety on top of model-level safety. Please see the Responsible Use Guide available at ## Citation instructions @article{llama3modelcard, title={Llama 3 Model Card}, author={AI@Meta}, year={2024}, url = { } ## Contributors Aaditya Singh; Aaron Grattafiori; Abhimanyu Dubey; Abhinav Jauhri; Abhinav Pandey; Abhishek Kadian; Adam Kelsey; Adi Gangidi; Ahmad Al-Dahle; Ahuva Goldstand; Aiesha Letman; Ajay Menon; Akhil Mathur; Alan Schelten; Alex Vaughan; Amy Yang; Andrei Lupu; Andres Alvarado; Andrew Gallagher; Andrew Gu; Andrew Ho; Andrew Poulton; Andrew Ryan; Angela Fan; Ankit Ramchandani; Anthony Hartshorn; Archi Mitra; Archie Sravankumar; Artem Korenev; Arun Rao; Ashley Gabriel; Ashwin Bharambe; Assaf Eisenman; Aston Zhang; Aurelien Rodriguez; Austen Gregerson; Ava Spataru; Baptiste Roziere; Ben Maurer; Benjamin Leonhardi; Bernie Huang; Bhargavi Paranjape; Bing Liu; Binh Tang; Bobbie Chern; Brani Stojkovic; Brian Fuller; Catalina Mejia Arenas; Chao Zhou; Charlotte Caucheteux; Chaya Nayak; Ching-Hsiang Chu; Chloe Bi; Chris Cai; Chris Cox; Chris Marra; Chris McConnell; Christian Keller; Christoph Feichtenhofer; Christophe Touret; Chunyang Wu; Corinne Wong; Cristian Canton Ferrer; Damien Allonsius; Daniel Kreymer; Daniel Haziza; Daniel Li; Danielle Pintz; Danny Livshits; Danny Wyatt; David Adkins; David Esiobu; David Xu; Davide Testuggine; Delia David; Devi Parikh; Dhruv Choudhary; Dhruv Mahajan; Diana Liskovich; Diego Garcia-Olano; Diego Perino; Dieuwke Hupkes; Dingkang Wang; Dustin Holland; Egor Lakomkin; Elina Lobanova; Xiaoqing Ellen Tan; Emily Dinan; Eric Smith; Erik Brinkman; Esteban Arcaute; Filip Radenovic; Firat Ozgenel; Francesco Caggioni; Frank Seide; Frank Zhang; Gabriel Synnaeve; Gabriella Schwarz; Gabrielle Lee; Gada Badeer; Georgia Anderson; Graeme Nail; Gregoire Mialon; Guan Pang; Guillem Cucurell; Hailey Nguyen; Hannah Korevaar; Hannah Wang; Haroun Habeeb; Harrison Rudolph; Henry Aspegren; Hu Xu; Hugo Touvron; Iga Kozlowska; Igor Molybog; Igor Tufanov; Iliyan Zarov; Imanol Arrieta Ibarra; Irina-Elena Veliche; Isabel Kloumann; Ishan Misra; Ivan Evtimov; Jacob Xu; Jade Copet; Jake Weissman; Jan Geffert; Jana Vranes; Japhet Asher; Jason Park; Jay Mahadeokar; Jean-Baptiste Gaya; Jeet Shah; Jelmer van der Linde; Jennifer Chan; Jenny Hong; Jenya Lee; Jeremy Fu; Jeremy Teboul; Jianfeng Chi; Jianyu Huang; Jie Wang; Jiecao Yu; Joanna Bitton; Joe Spisak; Joelle Pineau; Jon Carvill; Jongsoo Park; Joseph Rocca; Joshua Johnstun; Junteng Jia; Kalyan Vasuden Alwala; Kam Hou U; Kate Plawiak; Kartikeya Upasani; Kaushik Veeraraghavan; Ke Li; Kenneth Heafield; Kevin Stone; Khalid El-Arini; Krithika Iyer; Kshitiz Malik; Kuenley Chiu; Kunal Bhalla; Kyle Huang; Lakshya Garg; Lauren Rantala-Yeary; Laurens van der Maaten; Lawrence Chen; Leandro Silva; Lee Bell; Lei Zhang; Liang Tan; Louis Martin; Lovish 
Madaan; Luca Wehrstedt; Lukas Blecher; Luke de Oliveira; Madeline Muzzi; Madian Khabsa; Manav Avlani; Mannat Singh; Manohar Paluri; Mark Zuckerberg; Marcin Kardas; Martynas Mankus; Mathew Oldham; Mathieu Rita; Matthew Lennie; Maya Pavlova; Meghan Keneally; Melanie Kambadur; Mihir Patel; Mikayel Samvelyan; Mike Clark; Mike Lewis; Min Si; Mitesh Kumar Singh; Mo Metanat; Mona Hassan; Naman Goyal; Narjes Torabi; Nicolas Usunier; Nikolay Bashlykov; Nikolay Bogoychev; Niladri Chatterji; Ning Dong; Oliver Aobo Yang; Olivier Duchenne; Onur Celebi; Parth Parekh; Patrick Alrassy; Paul Saab; Pavan Balaji; Pedro Rittner; Pengchuan Zhang; Pengwei Li; Petar Vasic; Peter Weng; Polina Zvyagina; Prajjwal Bhargava; Pratik Dubal; Praveen Krishnan; Punit Singh Koura; Qing He; Rachel Rodriguez; Ragavan Srinivasan; Rahul Mitra; Ramon Calderer; Raymond Li; Robert Stojnic; Roberta Raileanu; Robin Battey; Rocky Wang; Rohit Girdhar; Rohit Patel; Romain Sauvestre; Ronnie Polidoro; Roshan Sumbaly; Ross Taylor; Ruan Silva; Rui Hou; Rui Wang; Russ Howes; Ruty Rinott; Saghar Hosseini; Sai Jayesh Bondu; Samyak Datta; Sanjay Singh; Sara Chugh; Sargun Dhillon; Satadru Pan; Sean Bell; Sergey Edunov; Shaoliang Nie; Sharan Narang; Sharath Raparthy; Shaun Lindsay; Sheng Feng; Sheng Shen; Shenghao Lin; Shiva Shankar; Shruti Bhosale; Shun Zhang; Simon Vandenhende; Sinong Wang; Seohyun Sonia Kim; Soumya Batra; Sten Sootla; Steve Kehoe; Suchin Gururangan; Sumit Gupta; Sunny Virk; Sydney Borodinsky; Tamar Glaser; Tamar Herman; Tamara Best; Tara Fowler; Thomas Georgiou; Thomas Scialom; Tianhe Li; Todor Mihaylov; Tong Xiao; Ujjwal Karn; Vedanuj Goswami; Vibhor Gupta; Vignesh Ramanathan; Viktor Kerkez; Vinay Satish Kumar; Vincent Gonguet; Vish Vogeti; Vlad Poenaru; Vlad Tiberiu Mihailescu; Vladan Petrovic; Vladimir Ivanov; Wei Li; Weiwei Chu; Wenhan Xiong; Wenyin Fu; Wes Bouaziz; Whitney Meers; Will Constable; Xavier Martinet; Xiaojian Wu; Xinbo Gao; Xinfeng Xie; Xuchao Jia; Yaelle Goldschlag; Yann LeCun; Yashesh Gaur; Yasmine Babaei; Ye Qi; Yenda Li; Yi Wen; Yiwen Song; Youngjin Nam; Yuchen Hao; Yuchen Zhang; Yun Wang; Yuning Mao; Yuzi He; Zacharie Delpierre Coudert; Zachary DeVito; Zahra Hankir; Zhaoduo Wen; Zheng Yan; Zhengxing Chen; Zhenyu Yang; Zoe Papakipos" +} \ No newline at end of file diff --git a/data/model_data_json/meta-llama_Meta-Llama-3-8B.json b/data/model_data_json/meta-llama_Meta-Llama-3-8B.json new file mode 100644 index 0000000000000000000000000000000000000000..6f0615f1e1e28d3394e9a820fb990d497756429d --- /dev/null +++ b/data/model_data_json/meta-llama_Meta-Llama-3-8B.json @@ -0,0 +1,21 @@ +{ + "model_id": "meta-llama/Meta-Llama-3-8B", + "downloads": 402236, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "facebook", + "meta", + "pytorch", + "llama-3", + "en", + "license:llama3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en pipeline_tag: text-generation tags: - facebook - meta - pytorch - llama - llama-3 license: llama3 new_version: meta-llama/Llama-3.1-8B extra_gated_prompt: >- ### META LLAMA 3 COMMUNITY LICENSE AGREEMENT Meta Llama 3 Version Release Date: April 18, 2024 \"Agreement\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. 
\"Documentation\" means the specifications, manuals and documentation accompanying Meta Llama 3 distributed by Meta at \"Licensee\" or \"you\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"Meta Llama 3\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"Llama Materials\" means, collectively, Meta’s proprietary Meta Llama 3 and Documentation (and any portion thereof) made available under this Agreement. \"Meta\" or \"we\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). 1. License Rights and Redistribution. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof). 2. Additional Commercial Terms. 
If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5. Intellectual Property. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama 3” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at ). All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Meta Llama 3 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. 
Sections 3, 4 and 7 shall survive the termination of this Agreement. 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. ### Meta Llama 3 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and features, including Meta Llama 3. If you access or use Meta Llama 3, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at #### Prohibited Uses We want everyone to use Meta Llama 3 safely and responsibly. You agree you will not use, or allow others to use, Meta Llama 3 to: 1. Violate the law or others’ rights, including to: 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as: 1. Violence or terrorism 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material 3. Human trafficking, exploitation, and sexual violence 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials. 5. Sexual solicitation 6. Any other criminal activity 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices 5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws 6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Meta Llama 3 related to the following: 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State 2. Guns and illegal weapons (including weapon development) 3. Illegal drugs and regulated/controlled substances 4. Operation of critical infrastructure, transportation technologies, or heavy machinery 5. Self-harm or harm to others, including suicide, cutting, and eating disorders 6. 
Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual 3. Intentionally deceive or mislead others, including use of Meta Llama 3 related to the following: 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content 3. Generating, promoting, or further distributing spam 4. Impersonating another individual without consent, authorization, or legal right 5. Representing that the use of Meta Llama 3 or outputs are human-generated 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 4. Fail to appropriately disclose to end users any known dangers of your AI system Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means: * Reporting issues with the model: * Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback * Reporting bugs and security concerns: facebook.com/whitehat/info * Reporting violations of the Acceptable Use Policy or unlicensed uses of Meta Llama 3: LlamaUseReport@meta.com extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit --- ## Model Details Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. **Model developers** Meta **Variations** Llama 3 comes in two sizes — 8B and 70B parameters — in pre-trained and instruction tuned variants. **Input** Models input text only. **Output** Models generate text and code only. **Model Architecture** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|
| Llama 3 | A new mix of publicly available online data. | 8B | 8k | Yes | 15T+ | March, 2023 |
| | | 70B | 8k | Yes | 15T+ | December, 2023 |
**Llama 3 family of models**. Token counts refer to pretraining data only. Both the 8 and 70B versions use Grouped-Query Attention (GQA) for improved inference scalability.

**Model Release Date** April 18, 2024.

**Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.

**License** A custom commercial license is available at:

**Where to send questions or comments about the model** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go here.

## Intended Use

**Intended Use Cases** Llama 3 is intended for commercial and research use in English. Instruction tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.

**Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3 Community License. Use in languages other than English**.

**Note: Developers may fine-tune Llama 3 models for languages beyond English provided they comply with the Llama 3 Community License and the Acceptable Use Policy.

## How to use

This repository contains two versions of Meta-Llama-3-8B, for use with transformers and with the original codebase.

### Use with transformers

See the snippet below for usage with Transformers (a reconstructed sketch follows after the Hardware and Software section).

### Use with the original codebase

Please follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging `huggingface-cli`; a reconstructed download sketch appears after the benchmark tables. For Hugging Face support, we recommend using transformers or TGI, but a similar command works.

## Hardware and Software

**Training Factors** We used custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.

**Carbon Footprint** Pretraining utilized a cumulative 7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W). Estimated total emissions were 2290 tCO2eq, 100% of which were offset by Meta's sustainability program.
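To illustrate the "Use with transformers" snippet referenced above (stripped in extraction), a minimal sketch for plain text completion with the pretrained model. The prompt is illustrative; unlike the Instruct variant, the base model takes raw text with no chat template:

```python
import torch
import transformers

model_id = "meta-llama/Meta-Llama-3-8B"

# Base (pretrained) model: plain next-token text completion.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

print(pipeline("The three most important inventions of the 20th century are", max_new_tokens=64))
```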
### Use with the original codebase

Please follow the instructions in the repository. To download the original checkpoints, see the example command provided there. For Hugging Face support, we recommend using transformers or TGI, but a similar command works.

## Hardware and Software

**Training Factors** We used custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.

**Carbon Footprint** Pretraining utilized a cumulative 7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W). Estimated total emissions were 2290 tCO2eq, 100% of which were offset by Meta’s sustainability program.

| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Llama 3 8B | 1.3M | 700 | 390 |
| Llama 3 70B | 6.4M | 700 | 1900 |
| Total | 7.7M | | 2290 |

**CO2 emissions during pre-training**. Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others.

## Training Data

**Overview** Llama 3 was pretrained on over 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 10M human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data.

**Data Freshness** The pretraining data has a cutoff of March 2023 for the 8B and December 2023 for the 70B models, respectively.

## Benchmarks

In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. For details on the methodology see here.

### Base pretrained models
| Category | Benchmark | Llama 3 8B | Llama2 7B | Llama2 13B | Llama 3 70B | Llama2 70B |
|---|---|---|---|---|---|---|
| General | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| | AGIEval English (3-5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| Reading comprehension | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |
### Instruction tuned models
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|
| MMLU (5-shot) | 68.4 | 34.1 | 47.8 | 82.0 | 52.9 |
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
### Responsibility & Safety

We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community.

Foundation models are widely capable technologies that are built to be used for a diverse range of applications. They are not designed to meet every developer preference on safety levels for all use cases out-of-the-box, as those by their nature will differ across different applications. Rather, responsible LLM-application deployment is achieved by implementing a series of safety best practices throughout the development of such applications, from the model pre-training and fine-tuning to the deployment of systems composed of safeguards that tailor safety needs specifically to the use case and audience.

As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model and system level safety for their application. We also provide a set of resources including Meta Llama Guard 2 and Code Shield safeguards. These tools have proven to drastically reduce residual risks of LLM systems, while maintaining a high level of helpfulness. We encourage developers to tune and deploy these safeguards according to their needs and we provide a reference implementation to get you started.

#### Llama 3-Instruct

As outlined in the Responsible Use Guide, some trade-off between model helpfulness and model alignment is likely unavoidable. Developers should exercise discretion about how to weigh the benefits of alignment and helpfulness for their specific use case and audience. Developers should be mindful of residual risks when using Llama models and leverage additional safety tools as needed to reach the right safety bar for their use case.

**Safety** For our instruction tuned model, we conducted extensive red teaming exercises, performed adversarial evaluations and implemented safety mitigation techniques to lower residual risks. As with any Large Language Model, residual risks will likely remain and we recommend that developers assess these risks in the context of their use case. In parallel, we are working with the community to make AI safety benchmark standards transparent, rigorous and interpretable.

**Refusals** In addition to residual risks, we put a great emphasis on model refusals to benign prompts. Over-refusing not only can impact the user experience but could even be harmful in certain contexts. We’ve heard the feedback from the developer community and improved our fine tuning to ensure that Llama 3 is significantly less likely to falsely refuse to answer prompts than Llama 2. We built internal benchmarks and developed mitigations to limit false refusals, making Llama 3 our most helpful model to date.

#### Responsible release

In addition to responsible use considerations outlined above, we followed a rigorous process that requires us to take extra measures against misuse and critical risks before we make our release decision.

**Misuse** If you access or use Llama 3, you agree to the Acceptable Use Policy.
The most recent copy of this policy can be found at

#### Critical risks

**CBRNE (Chemical, Biological, Radiological, Nuclear, and high yield Explosives)** We have conducted a twofold assessment of the safety of the model in this area:

* Iterative testing during model training to assess the safety of responses related to CBRNE threats and other adversarial risks.
* Involving external CBRNE experts to conduct an uplift test assessing the ability of the model to accurately provide expert knowledge and reduce barriers to potential CBRNE misuse, by reference to what can be achieved using web search (without the model).

### Cyber Security

We have evaluated Llama 3 with CyberSecEval, Meta’s cybersecurity safety eval suite, measuring Llama 3’s propensity to suggest insecure code when used as a coding assistant, and Llama 3’s propensity to comply with requests to help carry out cyber attacks, where attacks are defined by the industry standard MITRE ATT&CK cyber attack ontology. On our insecure coding and cyber attacker helpfulness tests, Llama 3 behaved in the same range or safer than models of equivalent coding capability.

### Child Safety

Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective based methodologies to assess the model risks along multiple attack vectors. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market specific nuances or experiences.

### Community

Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository.

Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community.

## Ethical Considerations and Limitations

The core values of Llama 3 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress.

But Llama 3 is a new technology, and like any new technology, there are risks associated with its use.
Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3 models, developers should perform safety testing and tuning tailored to their specific applications of the model. As outlined in the Responsible Use Guide, we recommend incorporating Purple Llama solutions into your workflows, and specifically Llama Guard, which provides a base model to filter input and output prompts to layer system-level safety on top of model-level safety. Please see the Responsible Use Guide available at

## Citation instructions

@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={ }
}

## Contributors

Aaditya Singh; Aaron Grattafiori; Abhimanyu Dubey; Abhinav Jauhri; Abhinav Pandey; Abhishek Kadian; Adam Kelsey; Adi Gangidi; Ahmad Al-Dahle; Ahuva Goldstand; Aiesha Letman; Ajay Menon; Akhil Mathur; Alan Schelten; Alex Vaughan; Amy Yang; Andrei Lupu; Andres Alvarado; Andrew Gallagher; Andrew Gu; Andrew Ho; Andrew Poulton; Andrew Ryan; Angela Fan; Ankit Ramchandani; Anthony Hartshorn; Archi Mitra; Archie Sravankumar; Artem Korenev; Arun Rao; Ashley Gabriel; Ashwin Bharambe; Assaf Eisenman; Aston Zhang; Aurelien Rodriguez; Austen Gregerson; Ava Spataru; Baptiste Roziere; Ben Maurer; Benjamin Leonhardi; Bernie Huang; Bhargavi Paranjape; Bing Liu; Binh Tang; Bobbie Chern; Brani Stojkovic; Brian Fuller; Catalina Mejia Arenas; Chao Zhou; Charlotte Caucheteux; Chaya Nayak; Ching-Hsiang Chu; Chloe Bi; Chris Cai; Chris Cox; Chris Marra; Chris McConnell; Christian Keller; Christoph Feichtenhofer; Christophe Touret; Chunyang Wu; Corinne Wong; Cristian Canton Ferrer; Damien Allonsius; Daniel Kreymer; Daniel Haziza; Daniel Li; Danielle Pintz; Danny Livshits; Danny Wyatt; David Adkins; David Esiobu; David Xu; Davide Testuggine; Delia David; Devi Parikh; Dhruv Choudhary; Dhruv Mahajan; Diana Liskovich; Diego Garcia-Olano; Diego Perino; Dieuwke Hupkes; Dingkang Wang; Dustin Holland; Egor Lakomkin; Elina Lobanova; Xiaoqing Ellen Tan; Emily Dinan; Eric Smith; Erik Brinkman; Esteban Arcaute; Filip Radenovic; Firat Ozgenel; Francesco Caggioni; Frank Seide; Frank Zhang; Gabriel Synnaeve; Gabriella Schwarz; Gabrielle Lee; Gada Badeer; Georgia Anderson; Graeme Nail; Gregoire Mialon; Guan Pang; Guillem Cucurell; Hailey Nguyen; Hannah Korevaar; Hannah Wang; Haroun Habeeb; Harrison Rudolph; Henry Aspegren; Hu Xu; Hugo Touvron; Iga Kozlowska; Igor Molybog; Igor Tufanov; Iliyan Zarov; Imanol Arrieta Ibarra; Irina-Elena Veliche; Isabel Kloumann; Ishan Misra; Ivan Evtimov; Jacob Xu; Jade Copet; Jake Weissman; Jan Geffert; Jana Vranes; Japhet Asher; Jason Park; Jay Mahadeokar; Jean-Baptiste Gaya; Jeet Shah; Jelmer van der Linde; Jennifer Chan; Jenny Hong; Jenya Lee; Jeremy Fu; Jeremy Teboul; Jianfeng Chi; Jianyu Huang; Jie Wang; Jiecao Yu; Joanna Bitton; Joe Spisak; Joelle Pineau; Jon Carvill; Jongsoo Park; Joseph Rocca; Joshua Johnstun; Junteng Jia; Kalyan Vasuden Alwala; Kam Hou U; Kate Plawiak; Kartikeya Upasani; Kaushik Veeraraghavan; Ke Li; Kenneth Heafield; Kevin Stone; Khalid El-Arini; Krithika Iyer; Kshitiz Malik; Kuenley Chiu; Kunal Bhalla; Kyle Huang; Lakshya Garg; Lauren Rantala-Yeary; Laurens van der Maaten; Lawrence Chen; Leandro Silva; Lee Bell; Lei Zhang; Liang Tan; Louis Martin; Lovish
Madaan; Luca Wehrstedt; Lukas Blecher; Luke de Oliveira; Madeline Muzzi; Madian Khabsa; Manav Avlani; Mannat Singh; Manohar Paluri; Mark Zuckerberg; Marcin Kardas; Martynas Mankus; Mathew Oldham; Mathieu Rita; Matthew Lennie; Maya Pavlova; Meghan Keneally; Melanie Kambadur; Mihir Patel; Mikayel Samvelyan; Mike Clark; Mike Lewis; Min Si; Mitesh Kumar Singh; Mo Metanat; Mona Hassan; Naman Goyal; Narjes Torabi; Nicolas Usunier; Nikolay Bashlykov; Nikolay Bogoychev; Niladri Chatterji; Ning Dong; Oliver Aobo Yang; Olivier Duchenne; Onur Celebi; Parth Parekh; Patrick Alrassy; Paul Saab; Pavan Balaji; Pedro Rittner; Pengchuan Zhang; Pengwei Li; Petar Vasic; Peter Weng; Polina Zvyagina; Prajjwal Bhargava; Pratik Dubal; Praveen Krishnan; Punit Singh Koura; Qing He; Rachel Rodriguez; Ragavan Srinivasan; Rahul Mitra; Ramon Calderer; Raymond Li; Robert Stojnic; Roberta Raileanu; Robin Battey; Rocky Wang; Rohit Girdhar; Rohit Patel; Romain Sauvestre; Ronnie Polidoro; Roshan Sumbaly; Ross Taylor; Ruan Silva; Rui Hou; Rui Wang; Russ Howes; Ruty Rinott; Saghar Hosseini; Sai Jayesh Bondu; Samyak Datta; Sanjay Singh; Sara Chugh; Sargun Dhillon; Satadru Pan; Sean Bell; Sergey Edunov; Shaoliang Nie; Sharan Narang; Sharath Raparthy; Shaun Lindsay; Sheng Feng; Sheng Shen; Shenghao Lin; Shiva Shankar; Shruti Bhosale; Shun Zhang; Simon Vandenhende; Sinong Wang; Seohyun Sonia Kim; Soumya Batra; Sten Sootla; Steve Kehoe; Suchin Gururangan; Sumit Gupta; Sunny Virk; Sydney Borodinsky; Tamar Glaser; Tamar Herman; Tamara Best; Tara Fowler; Thomas Georgiou; Thomas Scialom; Tianhe Li; Todor Mihaylov; Tong Xiao; Ujjwal Karn; Vedanuj Goswami; Vibhor Gupta; Vignesh Ramanathan; Viktor Kerkez; Vinay Satish Kumar; Vincent Gonguet; Vish Vogeti; Vlad Poenaru; Vlad Tiberiu Mihailescu; Vladan Petrovic; Vladimir Ivanov; Wei Li; Weiwei Chu; Wenhan Xiong; Wenyin Fu; Wes Bouaziz; Whitney Meers; Will Constable; Xavier Martinet; Xiaojian Wu; Xinbo Gao; Xinfeng Xie; Xuchao Jia; Yaelle Goldschlag; Yann LeCun; Yashesh Gaur; Yasmine Babaei; Ye Qi; Yenda Li; Yi Wen; Yiwen Song; Youngjin Nam; Yuchen Hao; Yuchen Zhang; Yun Wang; Yuning Mao; Yuzi He; Zacharie Delpierre Coudert; Zachary DeVito; Zahra Hankir; Zhaoduo Wen; Zheng Yan; Zhengxing Chen; Zhenyu Yang; Zoe Papakipos" +} \ No newline at end of file diff --git a/data/model_data_json/michaelfeil_bge-small-en-v1.5.json b/data/model_data_json/michaelfeil_bge-small-en-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..29f830717c912f10a9e976e9001e20c270f9da03 --- /dev/null +++ b/data/model_data_json/michaelfeil_bge-small-en-v1.5.json @@ -0,0 +1,22 @@ +{ + "model_id": "michaelfeil/bge-small-en-v1.5", + "downloads": 85839, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "license:mit", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers license: mit language: - en ---

# Infinity Embedding Model

This is the stable default model for infinity. For more details about the infinity inference project, please refer to the GitHub repository: Infinity.

## Usage for Embedding Model via infinity in Python

Deploy with the infinity_emb pip package. Flash attention on GPU is recommended, and ONNX inference is supported.

## CLI interface

The same arguments apply to the CLI interface.

## Contact

If you have any question or suggestion related to this project, feel free to open an issue or pull request. You can also email Michael Feil (infinity at michaelfeil.eu).

## Citation

If you find this repository useful, please consider giving a star :star: and a citation.

## License

Infinity is licensed under the MIT License.", + "model_explanation_gemini": "Generates sentence embeddings for English text to measure similarity and extract features." +} \ No newline at end of file diff --git a/data/model_data_json/michellejieli_emotion_text_classifier.json b/data/model_data_json/michellejieli_emotion_text_classifier.json new file mode 100644 index 0000000000000000000000000000000000000000..c38817bd1376c46377127082c03331be3243e888 --- /dev/null +++ b/data/model_data_json/michellejieli_emotion_text_classifier.json @@ -0,0 +1,21 @@ +{ + "model_id": "michellejieli/emotion_text_classifier", + "downloads": 319654, + "tags": [ + "transformers", + "pytorch", + "roberta", + "text-classification", + "distilroberta", + "sentiment", + "emotion", + "twitter", + "reddit", + "en", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: \"en\" tags: - distilroberta - sentiment - emotion - twitter - reddit widget: - text: \"Oh my God, he's lost it. He's totally lost it.\" - text: \"What?\" - text: \"Wow, congratulations! So excited for you!\" --- # Fine-tuned DistilRoBERTa-base for Emotion Classification 🤬🤢😀😐😭😲 # Model Description DistilRoBERTa-base is a transformer model that performs sentiment analysis. I fine-tuned the model on transcripts from the Friends show with the goal of classifying emotions from text data, specifically dialogue from Netflix shows or movies. The model predicts the six Ekman emotions plus a neutral class: anger, disgust, fear, joy, sadness, surprise, and neutrality. The model is a fine-tuned version of Emotion English DistilRoBERTa-base and DistilRoBERTa-base. This model was initially trained on the datasets in the following table (from Emotion English DistilRoBERTa-base):

|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes|
|Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes|
|GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-|
|MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|SemEval-2018, EI-reg, Mohammad et al. (2018)|Yes|-|Yes|Yes|-|Yes|-|

It was fine-tuned on:

|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Emotion Lines (Friends)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|

# How to Use

# Contact

Please reach out to michelleli1999@gmail.com if you have any questions or feedback.

# Reference", + "model_explanation_gemini": "Classifies text into six Ekman emotions (anger, disgust, fear, joy, sadness, surprise) and neutrality, fine-tuned on dialogue from Friends."
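A minimal sketch for the How to Use section above, assuming the standard transformers text-classification pipeline; the prompt is one of the card's widget examples:

```python
from transformers import pipeline

# Hedged sketch: emotion classification with the standard pipeline API.
classifier = pipeline(
    "text-classification",
    model="michellejieli/emotion_text_classifier",
    top_k=None,  # return scores for all seven classes
)
print(classifier("Oh my God, he's lost it. He's totally lost it."))
```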
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_BiomedCLIP-PubMedBERT_256-vit_base_patch16_224.json b/data/model_data_json/microsoft_BiomedCLIP-PubMedBERT_256-vit_base_patch16_224.json new file mode 100644 index 0000000000000000000000000000000000000000..0560faa73bb61e9a67c8084e1c4066ec2fed2097 --- /dev/null +++ b/data/model_data_json/microsoft_BiomedCLIP-PubMedBERT_256-vit_base_patch16_224.json @@ -0,0 +1,16 @@ +{ + "model_id": "microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224", + "downloads": 130705, + "tags": [ + "open_clip", + "clip", + "biology", + "medical", + "zero-shot-image-classification", + "en", + "license:mit", + "region:us" + ], + "description": "--- language: en tags: - clip - biology - medical license: mit library_name: open_clip widget: - src: candidate_labels: adenocarcinoma histopathology, squamous cell carcinoma histopathology example_title: squamous cell carcinoma histopathology - src: >- candidate_labels: adenocarcinoma histopathology, squamous cell carcinoma histopathology example_title: adenocarcinoma histopathology - src: >- candidate_labels: left-sided pleural effusion chest x-ray, right-sided pleural effusion chest x-ray, normal chest x-ray example_title: left-sided pleural effusion chest x-ray pipeline_tag: zero-shot-image-classification --- # BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and Vision Transformer as the image encoder, with domain-specific adaptations. It can perform various vision-language processing (VLP) tasks such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes new state of the art in a wide range of standard datasets, and substantially outperforms prior VLP approaches: ## Contents - Training Data - Model Use - Reference - Limitations - Further Information ## Training Data We have released BiomedCLIP Data Pipeline at which automatically downloads and processes a set of articles from the PubMed Central Open Access dataset. BiomedCLIP builds upon the PMC-15M dataset, which is a large-scale parallel image-text dataset generated by this data pipeline for biomedical vision-language processing. It contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central and covers a diverse range of biomedical image types, such as microscopy, radiography, histology, and more. ## Model Use ### 1. Environment ### 2.1 Load from HF hub ### 2.2 Load from local files ### Use in Jupyter Notebook Please refer to this example notebook. ### Intended Use This model is intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper. #### Primary Intended Use The primary intended use is to support AI researchers building on top of this work. BiomedCLIP and its associated models should be helpful for exploring various biomedical VLP research questions, especially in the radiology domain. #### Out-of-Scope Use **Any** deployed use case of the model --- commercial or otherwise --- is currently out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are not intended for deployed use cases. 
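A minimal zero-shot classification sketch for the Model Use section above, assuming the open_clip hf-hub loading path the card names; the image path, prompt template, and candidate labels are illustrative:

```python
import torch
from PIL import Image
import open_clip

# Hedged sketch: load via the hf-hub path and score candidate labels for one image.
hub_id = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = open_clip.create_model_from_pretrained(hub_id)
tokenizer = open_clip.get_tokenizer(hub_id)

labels = ["adenocarcinoma histopathology", "squamous cell carcinoma histopathology"]
image = preprocess(Image.open("slide.png")).unsqueeze(0)  # illustrative path
texts = tokenizer([f"this is a photo of {l}" for l in labels])

with torch.no_grad():
    image_features, text_features, logit_scale = model(image, texts)
    probs = (logit_scale * image_features @ text_features.t()).softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```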
Please refer to the associated paper for more details. ## Reference ## Limitations This model was developed using English corpora, and thus can be considered English-only. ## Further information Please refer to the corresponding paper, \"Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing\" for additional details on the model training and evaluation.", + "model_explanation_gemini": "BiomedCLIP is a biomedical vision-language model trained on 15 million figure-caption pairs from PubMed Central, excelling in tasks like cross-modal retrieval, image classification, and visual question answering for diverse medical image types." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext.json b/data/model_data_json/microsoft_BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext.json new file mode 100644 index 0000000000000000000000000000000000000000..4a2b922167a073f4d047e0892b590554c445801c --- /dev/null +++ b/data/model_data_json/microsoft_BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext", + "downloads": 1743309, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "fill-mask", + "exbert", + "en", + "arxiv:2007.15779", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: mit widget: - text: \"[MASK] is a tumor suppressor gene.\" --- ## MSR BiomedBERT (abstracts + full text)
* This model was previously named **"PubMedBERT (abstracts + full text)"**.
* You can either adopt the new model name "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext" or update your library to version 4.22+ if you need to refer to the old name.
Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. Recent work shows that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. BiomedBERT is pretrained from scratch using _abstracts_ from PubMed and _full-text_ articles from PubMedCentral. This model achieves state-of-the-art performance on many biomedical NLP tasks, and currently holds the top score on the Biomedical Language Understanding and Reasoning Benchmark. ## Citation If you find BiomedBERT useful in your research, please cite the following paper:
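As a usage illustration, a minimal fill-mask sketch, assuming the standard transformers pipeline API and reusing the widget prompt from the card's front matter:

```python
from transformers import pipeline

# Hedged sketch: masked-token prediction with BiomedBERT.
fill = pipeline(
    "fill-mask",
    model="microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
)
for pred in fill("[MASK] is a tumor suppressor gene."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```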
", + "model_explanation_gemini": "BiomedBERT is a BERT-based model pretrained from scratch on biomedical abstracts and full-text articles to achieve state-of-the-art performance on biomedical NLP tasks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_BiomedNLP-BiomedBERT-base-uncased-abstract.json b/data/model_data_json/microsoft_BiomedNLP-BiomedBERT-base-uncased-abstract.json new file mode 100644 index 0000000000000000000000000000000000000000..6773d861684fbfd9d79350b26c88cb7ab6330784 --- /dev/null +++ b/data/model_data_json/microsoft_BiomedNLP-BiomedBERT-base-uncased-abstract.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract", + "downloads": 243816, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "fill-mask", + "exbert", + "en", + "arxiv:2007.15779", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - exbert license: mit widget: - text: \"[MASK] is a tyrosine kinase inhibitor.\" --- ## MSR BiomedBERT (abstracts only)
* This model was previously named **"PubMedBERT (abstracts)"**.
* You can either adopt the new model name "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract" or update your library to version 4.22+ if you need to refer to the old name.
Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. Recent work shows that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. This BiomedBERT is pretrained from scratch using _abstracts_ from PubMed. This model achieves state-of-the-art performance on several biomedical NLP tasks, as shown on the Biomedical Language Understanding and Reasoning Benchmark. ## Citation If you find BiomedBERT useful in your research, please cite the following paper:
", + "model_explanation_gemini": "Pretrained from scratch on PubMed abstracts for state-of-the-art performance in biomedical NLP tasks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_DialoGPT-medium.json b/data/model_data_json/microsoft_DialoGPT-medium.json new file mode 100644 index 0000000000000000000000000000000000000000..f1d51710b4ca1536c987f783c3b40ed559682bbd --- /dev/null +++ b/data/model_data_json/microsoft_DialoGPT-medium.json @@ -0,0 +1,22 @@ +{ + "model_id": "microsoft/DialoGPT-medium", + "downloads": 247207, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "gpt2", + "text-generation", + "conversational", + "arxiv:1911.00536", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- thumbnail: tags: - conversational license: mit --- ## A State-of-the-Art Large-scale Pretrained Response generation model (DialoGPT) DialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogue from Reddit discussion thread. * Multi-turn generation examples from an interactive environment: |Role | Response | |---------|--------| |User | Does money buy happiness? | | Bot | Depends how much money you spend on it .| |User | What is the best way to buy happiness ? | | Bot | You just have to be a millionaire by your early 20s, then you can be happy . | |User |This is so difficult ! | | Bot | You have no idea how hard it is to be a millionaire and happy . There is a reason the rich have a lot of money | Please find the information about preprocessing, training and full details of the DialoGPT in the original DialoGPT repository ArXiv paper: ### How to use Now we are ready to try out how the model works as a chatting partner!", + "model_explanation_gemini": "Generates human-like conversational responses for multiturn dialogues, trained on Reddit discussion threads." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_DialoGPT-small.json b/data/model_data_json/microsoft_DialoGPT-small.json new file mode 100644 index 0000000000000000000000000000000000000000..3871ed90a1a6dfbf12faddefb06dec5aba8659ac --- /dev/null +++ b/data/model_data_json/microsoft_DialoGPT-small.json @@ -0,0 +1,22 @@ +{ + "model_id": "microsoft/DialoGPT-small", + "downloads": 189485, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "gpt2", + "text-generation", + "conversational", + "arxiv:1911.00536", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- thumbnail: tags: - conversational license: mit --- ## A State-of-the-Art Large-scale Pretrained Response generation model (DialoGPT) DialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogue from Reddit discussion thread. * Multi-turn generation examples from an interactive environment: |Role | Response | |---------|--------| |User | Does money buy happiness? 
| | Bot | Depends how much money you spend on it .| |User | What is the best way to buy happiness ? | | Bot | You just have to be a millionaire by your early 20s, then you can be happy . | |User |This is so difficult ! | | Bot | You have no idea how hard it is to be a millionaire and happy . There is a reason the rich have a lot of money | Please find information about preprocessing, training and full details of DialoGPT in the original DialoGPT repository. ArXiv paper: ### How to use Now we are ready to try out how the model works as a chatting partner!

# Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 25.02 |
| ARC (25-shot) | 25.77 |
| HellaSwag (10-shot) | 25.79 |
| MMLU (5-shot) | 25.81 |
| TruthfulQA (0-shot) | 47.49 |
| Winogrande (5-shot) | 50.28 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 0.0 |", + "model_explanation_gemini": "Generates human-like conversational responses for multiturn dialogues, trained on Reddit discussion threads." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Florence-2-base.json b/data/model_data_json/microsoft_Florence-2-base.json new file mode 100644 index 0000000000000000000000000000000000000000..e191a6372ab89ed902be7bfb7f32faf7d60232fc --- /dev/null +++ b/data/model_data_json/microsoft_Florence-2-base.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/Florence-2-base", + "downloads": 277285, + "tags": [ + "transformers", + "pytorch", + "florence2", + "text-generation", + "vision", + "image-text-to-text", + "custom_code", + "arxiv:2311.06242", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: mit license_link: pipeline_tag: image-text-to-text tags: - vision --- # Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks ## Model Summary This Hub repository contains HuggingFace's implementation of the Florence-2 model from Microsoft. Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages our FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model.

Resources and Technical Documentation:
+ Florence-2 technical report.
+ Jupyter Notebook for inference and visualization of Florence-2-large model

| Model | Model size | Model Description |
| ------- | ------------- | ------------- |
| Florence-2-base[[HF]]( | 0.23B | Pretrained model with FLD-5B |
| Florence-2-large[[HF]]( | 0.77B | Pretrained model with FLD-5B |
| Florence-2-base-ft[[HF]]( | 0.23B | Finetuned model on a collection of downstream tasks |
| Florence-2-large-ft[[HF]]( | 0.77B | Finetuned model on a collection of downstream tasks |

## How to Get Started with the Model

Use the code below to get started with the model. All models are trained with float16.

## Tasks

This model is capable of performing different tasks through changing the prompts. First, let's define a function to run a prompt.
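A minimal sketch of such a run-prompt helper, assuming the trust_remote_code loading path the card describes; the task-token handling and post_process_generation call follow the published notebook pattern, so treat the details as assumptions:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# trust_remote_code is needed because Florence-2 ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base", torch_dtype=dtype, trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base", trust_remote_code=True)

def run_example(task_prompt: str, image: Image.Image, text_input: str = None):
    # Florence-2 takes a task token, optionally followed by extra text input.
    prompt = task_prompt if text_input is None else task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, dtype)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=task_prompt, image_size=(image.width, image.height)
    )
```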
Here are the tasks the model can perform:
### Caption

### Detailed Caption

### More Detailed Caption

### Caption to Phrase Grounding

The caption to phrase grounding task requires additional text input, i.e. a caption. Caption to phrase grounding results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['', '', ...]}}

### Object Detection

OD results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}

### Dense Region Caption

Dense region caption results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}

### Region proposal

Region proposal results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['', '', ...]}}

### OCR

### OCR with Region

OCR with region output format: {'\\': {'quad_boxes': [[x1, y1, x2, y2, x3, y3, x4, y4], ...], 'labels': ['text1', ...]}}

For more detailed examples, please refer to the notebook.
# Benchmarks ## Florence-2 Zero-shot performance The following table presents the zero-shot performance of generalist vision foundation models on image captioning and object detection evaluation tasks. These models have not been exposed to the training data of the evaluation tasks during their training phase. | Method | #params | COCO Cap. test CIDEr | NoCaps val CIDEr | TextCaps val CIDEr | COCO Det. val2017 mAP | |--------|---------|----------------------|------------------|--------------------|-----------------------| | Flamingo | 80B | 84.3 | - | - | - | | Florence-2-base| 0.23B | 133.0 | 118.7 | 70.1 | 34.7 | | Florence-2-large| 0.77B | 135.6 | 120.8 | 72.8 | 37.5 | The following table continues the comparison with performance on other vision-language evaluation tasks. | Method | Flickr30k test R@1 | Refcoco val Accuracy | Refcoco test-A Accuracy | Refcoco test-B Accuracy | Refcoco+ val Accuracy | Refcoco+ test-A Accuracy | Refcoco+ test-B Accuracy | Refcocog val Accuracy | Refcocog test Accuracy | Refcoco RES val mIoU | |--------|----------------------|----------------------|-------------------------|-------------------------|-----------------------|--------------------------|--------------------------|-----------------------|------------------------|----------------------| | Kosmos-2 | 78.7 | 52.3 | 57.4 | 47.3 | 45.5 | 50.7 | 42.2 | 60.6 | 61.7 | - | | Florence-2-base | 83.6 | 53.9 | 58.4 | 49.7 | 51.5 | 56.4 | 47.9 | 66.3 | 65.1 | 34.6 | | Florence-2-large | 84.4 | 56.3 | 61.6 | 51.4 | 53.6 | 57.9 | 49.9 | 68.0 | 67.0 | 35.8 | ## Florence-2 finetuned performance We finetune Florence-2 models with a collection of downstream tasks, resulting two generalist models *Florence-2-base-ft* and *Florence-2-large-ft* that can conduct a wide range of downstream tasks. The table below compares the performance of specialist and generalist models on various captioning and Visual Question Answering (VQA) tasks. Specialist models are fine-tuned specifically for each task, whereas generalist models are fine-tuned in a task-agnostic manner across all tasks. The symbol \"▲\" indicates the usage of external OCR as input. | Method | # Params | COCO Caption Karpathy test CIDEr | NoCaps val CIDEr | TextCaps val CIDEr | VQAv2 test-dev Acc | TextVQA test-dev Acc | VizWiz VQA test-dev Acc | |----------------|----------|-----------------------------------|------------------|--------------------|--------------------|----------------------|-------------------------| | **Specialist Models** | | | | | | | | | CoCa | 2.1B | 143.6 | 122.4 | - | 82.3 | - | - | | BLIP-2 | 7.8B | 144.5 | 121.6 | - | 82.2 | - | - | | GIT2 | 5.1B | 145.0 | 126.9 | 148.6 | 81.7 | 67.3 | 71.0 | | Flamingo | 80B | 138.1 | - | - | 82.0 | 54.1 | 65.7 | | PaLI | 17B | 149.1 | 127.0 | 160.0▲ | 84.3 | 58.8 / 73.1▲ | 71.6 / 74.4▲ | | PaLI-X | 55B | 149.2 | 126.3 | 147.0 / 163.7▲ | 86.0 | 71.4 / 80.8▲ | 70.9 / 74.6▲ | | **Generalist Models** | | | | | | | | | Unified-IO | 2.9B | - | 100.0 | - | 77.9 | - | 57.4 | | Florence-2-base-ft | 0.23B | 140.0 | 116.7 | 143.9 | 79.7 | 63.6 | 63.6 | | Florence-2-large-ft | 0.77B | 143.3 | 124.9 | 151.1 | 81.7 | 73.5 | 72.6 | | Method | # Params | COCO Det. 
val2017 mAP | Flickr30k test R@1 | RefCOCO val Accuracy | RefCOCO test-A Accuracy | RefCOCO test-B Accuracy | RefCOCO+ val Accuracy | RefCOCO+ test-A Accuracy | RefCOCO+ test-B Accuracy | RefCOCOg val Accuracy | RefCOCOg test Accuracy | RefCOCO RES val mIoU | |----------------------|----------|-----------------------|--------------------|----------------------|-------------------------|-------------------------|------------------------|---------------------------|---------------------------|------------------------|-----------------------|------------------------| | **Specialist Models** | | | | | | | | | | | | | | SeqTR | - | - | - | 83.7 | 86.5 | 81.2 | 71.5 | 76.3 | 64.9 | 74.9 | 74.2 | - | | PolyFormer | - | - | - | 90.4 | 92.9 | 87.2 | 85.0 | 89.8 | 78.0 | 85.8 | 85.9 | 76.9 | | UNINEXT | 0.74B | 60.6 | - | 92.6 | 94.3 | 91.5 | 85.2 | 89.6 | 79.8 | 88.7 | 89.4 | - | | Ferret | 13B | - | - | 89.5 | 92.4 | 84.4 | 82.8 | 88.1 | 75.2 | 85.8 | 86.3 | - | | **Generalist Models** | | | | | | | | | | | | | | UniTAB | - | - | - | 88.6 | 91.1 | 83.8 | 81.0 | 85.4 | 71.6 | 84.6 | 84.7 | - | | Florence-2-base-ft | 0.23B | 41.4 | 84.0 | 92.6 | 94.8 | 91.5 | 86.8 | 91.7 | 82.2 | 89.8 | 82.2 | 78.0 | | Florence-2-large-ft| 0.77B | 43.4 | 85.2 | 93.4 | 95.3 | 92.0 | 88.3 | 92.9 | 83.6 | 91.2 | 91.7 | 80.5 | ## BibTex and citation info", + "model_explanation_gemini": "Microsoft's Florence-2-base is a prompt-based vision foundation model handling diverse tasks like captioning, object detection, and segmentation through text prompts, trained on 5.4B annotations for multi-task learning." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Florence-2-large-ft.json b/data/model_data_json/microsoft_Florence-2-large-ft.json new file mode 100644 index 0000000000000000000000000000000000000000..a5b68bb9b2abd32c0249ecdda763247c43cb7072 --- /dev/null +++ b/data/model_data_json/microsoft_Florence-2-large-ft.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/Florence-2-large-ft", + "downloads": 170721, + "tags": [ + "transformers", + "pytorch", + "florence2", + "text-generation", + "vision", + "image-text-to-text", + "custom_code", + "arxiv:2311.06242", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: mit license_link: pipeline_tag: image-text-to-text tags: - vision --- # Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks ## Model Summary This Hub repository contains a HuggingFace's implementation of Florence-2 model from Microsoft. Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages our FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model. Resources and Technical Documentation: + Florence-2 technical report. 
+ Jupyter Notebook for inference and visualization of Florence-2-large model

| Model | Model size | Model Description |
| ------- | ------------- | ------------- |
| Florence-2-base[[HF]]( | 0.23B | Pretrained model with FLD-5B |
| Florence-2-large[[HF]]( | 0.77B | Pretrained model with FLD-5B |
| Florence-2-base-ft[[HF]]( | 0.23B | Finetuned model on a collection of downstream tasks |
| Florence-2-large-ft[[HF]]( | 0.77B | Finetuned model on a collection of downstream tasks |

## How to Get Started with the Model

Use the code below to get started with the model. All models are trained with float16.

## Tasks

This model is capable of performing different tasks through changing the prompts. First, let's define a function to run a prompt.
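Assuming this finetuned variant exposes the same API as Florence-2-base (an assumption; the cards share the same instructions), a minimal loading sketch; the run-prompt helper sketched for the base model should carry over unchanged:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Same loading pattern as the base model, swapping in the finetuned checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large-ft", torch_dtype=torch.float16, trust_remote_code=True
)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True)
```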
Here are the tasks the model can perform:
### Caption

### Detailed Caption

### More Detailed Caption

### Caption to Phrase Grounding

The caption to phrase grounding task requires additional text input, i.e. a caption. Caption to phrase grounding results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['', '', ...]}}

### Object Detection

OD results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}

### Dense Region Caption

Dense region caption results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}

### Region proposal

Region proposal results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['', '', ...]}}

### OCR

### OCR with Region

OCR with region output format: {'\\': {'quad_boxes': [[x1, y1, x2, y2, x3, y3, x4, y4], ...], 'labels': ['text1', ...]}}

For more detailed examples, please refer to the notebook.
# Benchmarks ## Florence-2 Zero-shot performance The following table presents the zero-shot performance of generalist vision foundation models on image captioning and object detection evaluation tasks. These models have not been exposed to the training data of the evaluation tasks during their training phase. | Method | #params | COCO Cap. test CIDEr | NoCaps val CIDEr | TextCaps val CIDEr | COCO Det. val2017 mAP | |--------|---------|----------------------|------------------|--------------------|-----------------------| | Flamingo | 80B | 84.3 | - | - | - | | Florence-2-base| 0.23B | 133.0 | 118.7 | 70.1 | 34.7 | | Florence-2-large| 0.77B | 135.6 | 120.8 | 72.8 | 37.5 | The following table continues the comparison with performance on other vision-language evaluation tasks. | Method | Flickr30k test R@1 | Refcoco val Accuracy | Refcoco test-A Accuracy | Refcoco test-B Accuracy | Refcoco+ val Accuracy | Refcoco+ test-A Accuracy | Refcoco+ test-B Accuracy | Refcocog val Accuracy | Refcocog test Accuracy | Refcoco RES val mIoU | |--------|----------------------|----------------------|-------------------------|-------------------------|-----------------------|--------------------------|--------------------------|-----------------------|------------------------|----------------------| | Kosmos-2 | 78.7 | 52.3 | 57.4 | 47.3 | 45.5 | 50.7 | 42.2 | 60.6 | 61.7 | - | | Florence-2-base | 83.6 | 53.9 | 58.4 | 49.7 | 51.5 | 56.4 | 47.9 | 66.3 | 65.1 | 34.6 | | Florence-2-large | 84.4 | 56.3 | 61.6 | 51.4 | 53.6 | 57.9 | 49.9 | 68.0 | 67.0 | 35.8 | ## Florence-2 finetuned performance We finetune Florence-2 models with a collection of downstream tasks, resulting two generalist models *Florence-2-base-ft* and *Florence-2-large-ft* that can conduct a wide range of downstream tasks. The table below compares the performance of specialist and generalist models on various captioning and Visual Question Answering (VQA) tasks. Specialist models are fine-tuned specifically for each task, whereas generalist models are fine-tuned in a task-agnostic manner across all tasks. The symbol \"▲\" indicates the usage of external OCR as input. | Method | # Params | COCO Caption Karpathy test CIDEr | NoCaps val CIDEr | TextCaps val CIDEr | VQAv2 test-dev Acc | TextVQA test-dev Acc | VizWiz VQA test-dev Acc | |----------------|----------|-----------------------------------|------------------|--------------------|--------------------|----------------------|-------------------------| | **Specialist Models** | | | | | | | | | CoCa | 2.1B | 143.6 | 122.4 | - | 82.3 | - | - | | BLIP-2 | 7.8B | 144.5 | 121.6 | - | 82.2 | - | - | | GIT2 | 5.1B | 145.0 | 126.9 | 148.6 | 81.7 | 67.3 | 71.0 | | Flamingo | 80B | 138.1 | - | - | 82.0 | 54.1 | 65.7 | | PaLI | 17B | 149.1 | 127.0 | 160.0▲ | 84.3 | 58.8 / 73.1▲ | 71.6 / 74.4▲ | | PaLI-X | 55B | 149.2 | 126.3 | 147.0 / 163.7▲ | 86.0 | 71.4 / 80.8▲ | 70.9 / 74.6▲ | | **Generalist Models** | | | | | | | | | Unified-IO | 2.9B | - | 100.0 | - | 77.9 | - | 57.4 | | Florence-2-base-ft | 0.23B | 140.0 | 116.7 | 143.9 | 79.7 | 63.6 | 63.6 | | Florence-2-large-ft | 0.77B | 143.3 | 124.9 | 151.1 | 81.7 | 73.5 | 72.6 | | Method | # Params | COCO Det. 
val2017 mAP | Flickr30k test R@1 | RefCOCO val Accuracy | RefCOCO test-A Accuracy | RefCOCO test-B Accuracy | RefCOCO+ val Accuracy | RefCOCO+ test-A Accuracy | RefCOCO+ test-B Accuracy | RefCOCOg val Accuracy | RefCOCOg test Accuracy | RefCOCO RES val mIoU | |----------------------|----------|-----------------------|--------------------|----------------------|-------------------------|-------------------------|------------------------|---------------------------|---------------------------|------------------------|-----------------------|------------------------| | **Specialist Models** | | | | | | | | | | | | | | SeqTR | - | - | - | 83.7 | 86.5 | 81.2 | 71.5 | 76.3 | 64.9 | 74.9 | 74.2 | - | | PolyFormer | - | - | - | 90.4 | 92.9 | 87.2 | 85.0 | 89.8 | 78.0 | 85.8 | 85.9 | 76.9 | | UNINEXT | 0.74B | 60.6 | - | 92.6 | 94.3 | 91.5 | 85.2 | 89.6 | 79.8 | 88.7 | 89.4 | - | | Ferret | 13B | - | - | 89.5 | 92.4 | 84.4 | 82.8 | 88.1 | 75.2 | 85.8 | 86.3 | - | | **Generalist Models** | | | | | | | | | | | | | | UniTAB | - | - | - | 88.6 | 91.1 | 83.8 | 81.0 | 85.4 | 71.6 | 84.6 | 84.7 | - | | Florence-2-base-ft | 0.23B | 41.4 | 84.0 | 92.6 | 94.8 | 91.5 | 86.8 | 91.7 | 82.2 | 89.8 | 82.2 | 78.0 | | Florence-2-large-ft| 0.77B | 43.4 | 85.2 | 93.4 | 95.3 | 92.0 | 88.3 | 92.9 | 83.6 | 91.2 | 91.7 | 80.5 | ## BibTex and citation info", + "model_explanation_gemini": "Microsoft's Florence-2-large-ft is a fine-tuned vision-language model that performs diverse image-text tasks like captioning, object detection, and OCR through prompt-based instructions." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Florence-2-large.json b/data/model_data_json/microsoft_Florence-2-large.json new file mode 100644 index 0000000000000000000000000000000000000000..1babdccb01acee8b27795088e4d08c6a541901d6 --- /dev/null +++ b/data/model_data_json/microsoft_Florence-2-large.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/Florence-2-large", + "downloads": 458398, + "tags": [ + "transformers", + "pytorch", + "florence2", + "text-generation", + "vision", + "image-text-to-text", + "custom_code", + "arxiv:2311.06242", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: mit license_link: pipeline_tag: image-text-to-text tags: - vision --- # Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks ## Model Summary **This is a continued pretrained version of Florence-2-large model with 4k context length, only 0.1B samples are used for continue pretraining, thus it might not be trained well. In addition, OCR task has been updated with line separator ('\\n'). COCO OD AP 39.8** This Hub repository contains a HuggingFace's implementation of Florence-2 model from Microsoft. Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages our FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model. Resources and Technical Documentation: + Florence-2 technical report. 
+ Jupyter Notebook for inference and visualization of Florence-2-large

| Model | Model size | Model Description |
| ------- | ------------- | ------------- |
| Florence-2-base[[HF]]( | 0.23B | Pretrained model with FLD-5B |
| Florence-2-large[[HF]]( | 0.77B | Pretrained model with FLD-5B |
| Florence-2-base-ft[[HF]]( | 0.23B | Finetuned model on a collection of downstream tasks |
| Florence-2-large-ft[[HF]]( | 0.77B | Finetuned model on a collection of downstream tasks |

## How to Get Started with the Model

Use the code below to get started with the model. All models are trained with float16.

## Tasks

This model is capable of performing different tasks through changing the prompts. First, let's define a function to run a prompt.
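Assuming this checkpoint exposes the same API as Florence-2-base (an assumption; the cards share the same instructions), a minimal loading sketch; the run-prompt helper sketched for the base model should apply here as well:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Same loading pattern as the base model, swapping in the large checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", torch_dtype=torch.float16, trust_remote_code=True
)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)
```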
Here are the tasks the model can perform:
### Caption

### Detailed Caption

### More Detailed Caption

### Caption to Phrase Grounding

The caption to phrase grounding task requires additional text input, i.e. a caption. Caption to phrase grounding results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['', '', ...]}}

### Object Detection

OD results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}

### Dense Region Caption

Dense region caption results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}

### Region proposal

Region proposal results format: {'\\': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['', '', ...]}}

### OCR

### OCR with Region

OCR with region output format: {'\\': {'quad_boxes': [[x1, y1, x2, y2, x3, y3, x4, y4], ...], 'labels': ['text1', ...]}}

### Output confidence score with Object Detection

For more detailed examples, please refer to the notebook.
# Benchmarks ## Florence-2 Zero-shot performance The following table presents the zero-shot performance of generalist vision foundation models on image captioning and object detection evaluation tasks. These models have not been exposed to the training data of the evaluation tasks during their training phase. | Method | #params | COCO Cap. test CIDEr | NoCaps val CIDEr | TextCaps val CIDEr | COCO Det. val2017 mAP | |--------|---------|----------------------|------------------|--------------------|-----------------------| | Flamingo | 80B | 84.3 | - | - | - | | Florence-2-base| 0.23B | 133.0 | 118.7 | 70.1 | 34.7 | | Florence-2-large| 0.77B | 135.6 | 120.8 | 72.8 | 37.5 | The following table continues the comparison with performance on other vision-language evaluation tasks. | Method | Flickr30k test R@1 | Refcoco val Accuracy | Refcoco test-A Accuracy | Refcoco test-B Accuracy | Refcoco+ val Accuracy | Refcoco+ test-A Accuracy | Refcoco+ test-B Accuracy | Refcocog val Accuracy | Refcocog test Accuracy | Refcoco RES val mIoU | |--------|----------------------|----------------------|-------------------------|-------------------------|-----------------------|--------------------------|--------------------------|-----------------------|------------------------|----------------------| | Kosmos-2 | 78.7 | 52.3 | 57.4 | 47.3 | 45.5 | 50.7 | 42.2 | 60.6 | 61.7 | - | | Florence-2-base | 83.6 | 53.9 | 58.4 | 49.7 | 51.5 | 56.4 | 47.9 | 66.3 | 65.1 | 34.6 | | Florence-2-large | 84.4 | 56.3 | 61.6 | 51.4 | 53.6 | 57.9 | 49.9 | 68.0 | 67.0 | 35.8 | ## Florence-2 finetuned performance We finetune Florence-2 models with a collection of downstream tasks, resulting two generalist models *Florence-2-base-ft* and *Florence-2-large-ft* that can conduct a wide range of downstream tasks. The table below compares the performance of specialist and generalist models on various captioning and Visual Question Answering (VQA) tasks. Specialist models are fine-tuned specifically for each task, whereas generalist models are fine-tuned in a task-agnostic manner across all tasks. The symbol \"▲\" indicates the usage of external OCR as input. | Method | # Params | COCO Caption Karpathy test CIDEr | NoCaps val CIDEr | TextCaps val CIDEr | VQAv2 test-dev Acc | TextVQA test-dev Acc | VizWiz VQA test-dev Acc | |----------------|----------|-----------------------------------|------------------|--------------------|--------------------|----------------------|-------------------------| | **Specialist Models** | | | | | | | | | CoCa | 2.1B | 143.6 | 122.4 | - | 82.3 | - | - | | BLIP-2 | 7.8B | 144.5 | 121.6 | - | 82.2 | - | - | | GIT2 | 5.1B | 145.0 | 126.9 | 148.6 | 81.7 | 67.3 | 71.0 | | Flamingo | 80B | 138.1 | - | - | 82.0 | 54.1 | 65.7 | | PaLI | 17B | 149.1 | 127.0 | 160.0▲ | 84.3 | 58.8 / 73.1▲ | 71.6 / 74.4▲ | | PaLI-X | 55B | 149.2 | 126.3 | 147.0 / 163.7▲ | 86.0 | 71.4 / 80.8▲ | 70.9 / 74.6▲ | | **Generalist Models** | | | | | | | | | Unified-IO | 2.9B | - | 100.0 | - | 77.9 | - | 57.4 | | Florence-2-base-ft | 0.23B | 140.0 | 116.7 | 143.9 | 79.7 | 63.6 | 63.6 | | Florence-2-large-ft | 0.77B | 143.3 | 124.9 | 151.1 | 81.7 | 73.5 | 72.6 | | Method | # Params | COCO Det. 
val2017 mAP | Flickr30k test R@1 | RefCOCO val Accuracy | RefCOCO test-A Accuracy | RefCOCO test-B Accuracy | RefCOCO+ val Accuracy | RefCOCO+ test-A Accuracy | RefCOCO+ test-B Accuracy | RefCOCOg val Accuracy | RefCOCOg test Accuracy | RefCOCO RES val mIoU | |----------------------|----------|-----------------------|--------------------|----------------------|-------------------------|-------------------------|------------------------|---------------------------|---------------------------|------------------------|-----------------------|------------------------| | **Specialist Models** | | | | | | | | | | | | | | SeqTR | - | - | - | 83.7 | 86.5 | 81.2 | 71.5 | 76.3 | 64.9 | 74.9 | 74.2 | - | | PolyFormer | - | - | - | 90.4 | 92.9 | 87.2 | 85.0 | 89.8 | 78.0 | 85.8 | 85.9 | 76.9 | | UNINEXT | 0.74B | 60.6 | - | 92.6 | 94.3 | 91.5 | 85.2 | 89.6 | 79.8 | 88.7 | 89.4 | - | | Ferret | 13B | - | - | 89.5 | 92.4 | 84.4 | 82.8 | 88.1 | 75.2 | 85.8 | 86.3 | - | | **Generalist Models** | | | | | | | | | | | | | | UniTAB | - | - | - | 88.6 | 91.1 | 83.8 | 81.0 | 85.4 | 71.6 | 84.6 | 84.7 | - | | Florence-2-base-ft | 0.23B | 41.4 | 84.0 | 92.6 | 94.8 | 91.5 | 86.8 | 91.7 | 82.2 | 89.8 | 82.2 | 78.0 | | Florence-2-large-ft| 0.77B | 43.4 | 85.2 | 93.4 | 95.3 | 92.0 | 88.3 | 92.9 | 83.6 | 91.2 | 91.7 | 80.5 | ## BibTex and citation info", + "model_explanation_gemini": "Microsoft's Florence-2-large is a prompt-based vision foundation model handling diverse image-text tasks like captioning, object detection, and OCR through multi-task learning with a sequence-to-sequence architecture." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Orca-2-7b.json b/data/model_data_json/microsoft_Orca-2-7b.json new file mode 100644 index 0000000000000000000000000000000000000000..eacb9f73393d6fd36cc16112336277aaa0e22870 --- /dev/null +++ b/data/model_data_json/microsoft_Orca-2-7b.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/Orca-2-7b", + "downloads": 145240, + "tags": [ + "transformers", + "pytorch", + "llama", + "text-generation", + "orca", + "orca2", + "microsoft", + "arxiv:2311.11045", + "license:other", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: text-generation tags: - orca - orca2 - microsoft license: other license_name: microsoft-research-license license_link: LICENSE --- # Orca 2 Orca 2 is built for research purposes only and provides a single turn response in tasks such as reasoning over user given data, reading comprehension, math problem solving and text summarization. The model is designed to excel particularly in reasoning. Note that: 1. This is a research model, intended to show that we can use capable models and complex workflows (advanced prompts, multiple calls) to create synthetic data that can teach Small Language Models (SLMs) new capabilities. We chose reasoning because it is a widely useful capability that SLMs lack. 2. The model is not optimized for chat and has not been trained with RLHF or DPO. It is best used after being finetuned for chat or for a specific task. 3. Beyond reasoning, the model inherits capabilities and limitations of its base (LLAMA-2 base). We have already seen that the benefits of the Orca training can be applied to other base model too. We make Orca 2's weights publicly available to support further research on the development, evaluation, and alignment of SLMs. ## What is Orca 2’s intended use(s)? + Orca 2 is built for research purposes only. 
+ The main purpose is to allow the research community to assess its abilities and to provide a foundation for building better frontier models. ## How was Orca 2 evaluated? + Orca 2 has been evaluated on a large number of tasks ranging from reasoning to grounding and safety. Please refer to Section 6 and the Appendix in the Orca 2 paper for details on evaluations. ## Model Details Orca 2 is a finetuned version of LLAMA-2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the Orca 2 paper. Please refer to the LLaMA-2 technical report for details on the model architecture. ## License Orca 2 is licensed under the Microsoft Research License. Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. ## Bias, Risks, and Limitations Orca 2, built upon the LLaMA 2 model family, retains many of its limitations, as well as the common limitations of other large language models and limitations caused by its training process, including: **Data Biases**: Large language models, trained on extensive data, can inadvertently carry biases present in the source data. Consequently, the models may generate outputs that could be potentially biased or unfair. **Lack of Contextual Understanding**: Despite their impressive capabilities in language understanding and generation, these models exhibit limited real-world understanding, resulting in potential inaccuracies or nonsensical responses. **Lack of Transparency**: Due to their complexity and size, large language models can act as “black boxes”, making it difficult to comprehend the rationale behind specific outputs or decisions. We recommend reviewing the transparency notes from Azure for more information. **Content Harms**: There are various types of content harms that large language models can cause. It is important to be aware of them when using these models, and to take action to prevent them. It is recommended to leverage the various content moderation services provided by different companies and institutions. On an important note, we hope for better regulations and standards from government and technology leaders around content harms for AI technologies in the future. We value and acknowledge the important role that the research and open-source community can play in this direction. **Hallucination**: It is important to be aware and cautious not to rely entirely on a given language model for critical decisions or information that might have a deep impact, as it is not obvious how to prevent these models from fabricating content. Moreover, it is not clear whether small models may be more susceptible to hallucination in ungrounded generation use cases due to their smaller sizes and hence reduced memorization capacities. This is an active research topic, and we hope there will be more rigorous measurement, understanding, and mitigation around this topic. **Potential for Misuse**: Without suitable safeguards, there is a risk that these models could be maliciously used to generate disinformation or harmful content. **Data Distribution**: Orca 2’s performance is likely to correlate strongly with the distribution of the tuning data. This correlation might limit its accuracy in areas underrepresented in the training dataset, such as math, coding, and reasoning.
**System messages**: Orca 2 demonstrates variance in performance depending on the system instructions. Additionally, the stochasticity introduced by the model size may lead to the generation of non-deterministic responses to different system instructions. **Zero-Shot Settings**: Orca 2 was trained on data that mostly simulates zero-shot settings. While the model demonstrates very strong performance in zero-shot settings, it does not show the same gains from few-shot learning as other, especially larger, models. **Synthetic data**: As Orca 2 is trained on synthetic data, it could inherit both the advantages and shortcomings of the models and methods used for data generation. We posit that Orca 2 benefits from the safety measures incorporated during training and the safety guardrails (e.g., content filter) within the Azure OpenAI API. However, detailed studies are required for better quantification of such risks. This model is solely designed for research settings, and its testing has only been carried out in such environments. It should not be used in downstream applications, as additional analysis is needed to assess potential harm or bias in the proposed application. ## Getting started with Orca 2 **Inference with Hugging Face library** **Safe inference with Azure AI Content Safety** The usage of Azure AI Content Safety on top of model prediction is strongly encouraged and can help prevent some content harms. Azure AI Content Safety is a content moderation platform that uses AI to moderate content. By running Azure AI Content Safety on the output of Orca 2, the model output can be moderated by scanning it for different harm categories, including sexual content, violence, hate, and self-harm, with multiple severity levels and multi-lingual detection. ## Citation", "model_explanation_gemini": "Microsoft's Orca-2-7b is a research-focused language model optimized for reasoning tasks like math problem solving and text summarization, built to enhance small models' capabilities through synthetic data training." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Phi-3-mini-128k-instruct.json b/data/model_data_json/microsoft_Phi-3-mini-128k-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..31cedfa9f2754100324900d1529c56fc35798687 --- /dev/null +++ b/data/model_data_json/microsoft_Phi-3-mini-128k-instruct.json @@ -0,0 +1,22 @@ +{ + "model_id": "microsoft/Phi-3-mini-128k-instruct", + "downloads": 371057, + "tags": [ + "transformers", + "safetensors", + "phi3", + "text-generation", + "nlp", + "code", + "conversational", + "custom_code", + "en", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - en pipeline_tag: text-generation tags: - nlp - code widget: - messages: - role: user content: Can you provide ways to eat combinations of bananas and dragonfruits? --- 🎉**Phi-4**: [multimodal-instruct | onnx]; [mini-instruct | onnx] ## Model Summary The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family, with the Mini version available in two variants, 4K and 128K, which is the context length (in tokens) that it can support.
After initial training, the model underwent a post-training process that involved supervised fine-tuning and direct preference optimization to enhance its ability to follow instructions and adhere to safety measures. When evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning, the Phi-3 Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters. Resources and Technical Documentation: 🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
🛠️ Phi-3 on Azure AI Studio
👩‍🍳 Phi-3 Cookbook
🖥️ Try It | | Short Context | Long Context | | :- | :- | :- | | Mini | 4K [[HF]]( ; [[ONNX]]( ; [[GGUF]]( | 128K [[HF]]( ; [[ONNX]]( | Small | 8K [[HF]]( ; [[ONNX]]( | 128K [[HF]]( ; [[ONNX]]( | Medium | 4K [[HF]]( ; [[ONNX]]( | 128K [[HF]]( ; [[ONNX]]( | Vision | | 128K [[HF]]( ; [[ONNX]]( ## Intended Uses **Primary use cases** The model is intended for commercial and research use in English. The model is suited to applications that require: 1) Memory/compute constrained environments 2) Latency bound scenarios 3) Strong reasoning (especially code, math and logic) Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features. **Use case considerations** Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case. Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under. ## Release Notes This is an update over the original instruction-tuned Phi-3-mini release based on valuable customer feedback. The model used additional post-training data, leading to substantial gains on long-context understanding, instruction following, and structured output. We also improved multi-turn conversation quality, added explicit support for the <|system|> tag, and significantly improved reasoning capability. We believe most use cases will benefit from this release, but we encourage users to test it in their particular AI applications. We appreciate the enthusiastic adoption of the Phi-3 model family, and continue to welcome all feedback from the community. The tables below highlight improvements on instruction following, structured output, reasoning, and long-context understanding of the new release on our public and internal benchmark datasets. | Benchmarks | Original | June 2024 Update | | :- | :- | :- | | Instruction Extra Hard | 5.7 | 5.9 | | Instruction Hard | 5.0 | 5.2 | | JSON Structure Output | 1.9 | 60.1 | | XML Structure Output | 47.8 | 52.9 | | GPQA | 25.9 | 29.7 | | MMLU | 68.1 | 69.7 | | **Average** | **25.7** | **37.3** | RULER: a retrieval-based benchmark for long context understanding | Model | 4K | 8K | 16K | 32K | 64K | 128K | Average | | :-------------------| :------| :------| :------| :------| :------| :------| :---------| | Original | 86.7 | 78.1 | 75.6 | 70.3 | 58.9 | 43.3 | **68.8** | | June 2024 Update | 92.4 | 91.1 | 90.8 | 87.9 | 79.8 | 65.6 | **84.6** | RepoQA: a benchmark for long context code understanding | Model | Python | C++ | Rust | Java | TypeScript | Average | | :-------------------| :--------| :-----| :------| :------| :------------| :---------| | Original | 27 | 29 | 40 | 33 | 33 | **32.4** | | June 2024 Update | 85 | 63 | 72 | 93 | 72 | **77** | Notes: if users would like to check out the previous version, use the git commit id **bb5bf1e4001277a606e11debca0ef80323e5f824**. For model conversion, e.g. to GGUF and other formats, we invite the community to experiment with various approaches and share your valuable feedback. Let's innovate together!
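The release notes above identify the previous checkpoint only by its git commit id. As a minimal sketch (not the card's official snippet), that id can be passed to the standard `revision` argument of the Hugging Face loaders; the dtype and device settings below are illustrative assumptions:

```python
# Hypothetical sketch: pin to the pre-June-2024 checkpoint using the commit id
# quoted in the release notes above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'microsoft/Phi-3-mini-128k-instruct'
previous_revision = 'bb5bf1e4001277a606e11debca0ef80323e5f824'  # from the release notes

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=previous_revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision=previous_revision,
    trust_remote_code=True,  # this repository ships custom modeling code
    torch_dtype='auto',
    device_map='auto',       # assumes `accelerate` is installed
)
```

Omitting `revision` fetches the current (June 2024) weights instead.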
## How to Use Phi-3 Mini-128K-Instruct has been integrated in the development version (4.41.3) of . Until the official version is released through , ensure that you are doing one of the following: * When loading the model, ensure that is passed as an argument of the function. * Update your local to the development version: . The previous command is an alternative to cloning and installing from the source. The current version can be verified with: . Examples of required packages: Phi-3 Mini-128K-Instruct is also available in Azure AI Studio ### Tokenizer Phi-3 Mini-128K-Instruct supports a vocabulary size of up to tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size. ### Chat Format Given the nature of the training data, the Phi-3 Mini-128K-Instruct model is best suited for prompts using the chat format as follows. You can provide the prompt as a question with a generic template as follows: For example: where the model generates the text after . In the case of a few-shot prompt, it can be formatted as follows: ### Sample inference code These code snippets show how to quickly get started running the model on a GPU: Notes: If you want to use flash attention, call _AutoModelForCausalLM.from_pretrained()_ with _attn_implementation=\"flash_attention_2\"_ ## Responsible AI Considerations Like other language models, the Phi series models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: + Quality of Service: the Phi models are trained primarily on English text. Languages other than English will experience worse performance. English language varieties with less representation in the training data might experience worse performance than standard American English. + Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. + Inappropriate or Offensive Content: these models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case. + Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. + Limited Scope for Code: The majority of Phi-3 training data is based on Python and uses common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. Developers should apply responsible AI best practices and are responsible for ensuring that a specific use case complies with relevant laws and regulations (e.g. privacy, trade, etc.). Important areas for consideration include: + Allocation: Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.)
without further assessments and additional debiasing techniques. + High-Risk Scenarios: Developers should assess the suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. + Misinformation: Models may produce inaccurate information. Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG). + Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. + Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations. ## Training ### Model * Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines. * Inputs: Text. It is best suited for prompts using chat format. * Context length: 128K tokens * GPUs: 512 H100-80G * Training time: 10 days * Training data: 4.9T tokens * Outputs: Generated text in response to the input * Dates: Our models were trained between May and June 2024 * Status: This is a static model trained on an offline dataset with cutoff date October 2023. Future versions of the tuned models may be released as we improve models. * Release dates: June 2024. ### Datasets Our training data includes a wide variety of sources, totaling 4.9 trillion tokens, and is a combination of 1) publicly available documents filtered rigorously for quality, selected high-quality educational data, and code; 2) newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (science, daily activities, theory of mind, etc.); 3) high-quality chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction-following, truthfulness, honesty and helpfulness. We focus on the quality of data that could potentially improve the reasoning ability of the model, and we filter the publicly available documents to contain the correct level of knowledge. As an example, the result of a Premier League game on a particular day might be good training data for frontier models, but such information needs to be removed to leave more model capacity for reasoning in small models. More details about data can be found in the Phi-3 Technical Report. ### Fine-tuning A basic example of multi-GPU supervised fine-tuning (SFT) with the TRL and Accelerate modules is provided here. ## Benchmarks We report the results under completion format for Phi-3-Mini-128K-Instruct on standard open-source benchmarks measuring the model's reasoning ability (both common sense reasoning and logical reasoning).
We compare to Mistral-7b-v0.1, Mixtral-8x7b, Gemma 7B, Llama-3-8B-Instruct, and GPT-3.5. All the reported numbers are produced with the exact same pipeline to ensure that the numbers are comparable. These numbers might differ from other published numbers due to slightly different choices in the evaluation. As is now standard, we use few-shot prompts to evaluate the models, at temperature 0. The prompts and number of shots are part of a Microsoft internal tool to evaluate language models, and in particular we did no optimization to the pipeline for Phi-3. More specifically, we do not change prompts, pick different few-shot examples, change prompt format, or do any other form of optimization for the model. The number of k–shot examples is listed per-benchmark. | Category | Benchmark | Phi-3-Mini-128K-Ins | Gemma-7B | Mistral-7B | Mixtral-8x7B | Llama-3-8B-Ins | GPT3.5-Turbo-1106 | | :----------| :-----------| :---------------------| :----------| :------------| :--------------| :----------------| :-------------------| | Popular aggregated benchmark | AGI Eval
5-shot| 39.5 | 42.1 | 35.1 | 45.2 | 42 | 48.4 | | | MMLU
5-shot | 69.7 | 63.6 | 61.7 | 70.5 | 66.5 | 71.4 | | | BigBench Hard
3-shot | 72.1 | 59.6 | 57.3 | 69.7 | 51.5 | 68.3 | | Language Understanding | ANLI
7-shot | 52.3 | 48.7 | 47.1 | 55.2 | 57.3 | 58.1 | | | HellaSwag
5-shot | 70.5 | 49.8 | 58.5 | 70.4 | 71.1 | 78.8 | | Reasoning | ARC Challenge
10-shot | 85.5 | 78.3 | 78.6 | 87.3 | 82.8 | 87.4 | | | BoolQ
0-shot | 77.1 | 66 | 72.2 | 76.6 | 80.9 | 79.1 | | | MedQA
2-shot | 56.4 | 49.6 | 50 | 62.2 | 60.5 | 63.4 | | | OpenBookQA
10-shot | 78.8 | 78.6 | 79.8 | 85.8 | 82.6 | 86 | | | PIQA
5-shot | 80.1 | 78.1 | 77.7 | 86 | 75.7 | 86.6 | | | GPQA
0-shot | 29.7 | 2.9 | 15 | 6.9 | 32.4 | 29.9 | | | Social IQA
5-shot | 74.7 | 65.5 | 74.6 | 75.9 | 73.9 | 68.3 | | | TruthfulQA (MC2)
10-shot | 64.8 | 52.1 | 53 | 60.1 | 63.2 | 67.7 | | | WinoGrande
5-shot | 71.0 | 55.6 | 54.2 | 62 | 65 | 68.8 | | Factual Knowledge | TriviaQA
5-shot | 57.8 | 72.3 | 75.2 | 82.2 | 67.7 | 85.8 | | Math | GSM8K CoTT
8-shot | 85.3 | 59.8 | 46.4 | 64.7 | 77.4 | 78.1 | | Code Generation | HumanEval
0-shot | 60.4 | 34.1 | 28.0 | 37.8 | 60.4 | 62.2 | | | MBPP
3-shot | 70.0 | 51.5 | 50.8 | 60.2 | 67.7 | 77.8 | | **Average** | | **66.4** | **56.0** | **56.4** | **64.4** | **65.5** | **70.3** | **Long Context**: Phi-3 Mini-128K-Instruct supports 128K context length, therefore the model is capable of several long context tasks including long document/meeting summarization, long document QA. | Benchmark | Phi-3 Mini-128K-Instruct | Mistral-7B | Mixtral 8x7B | LLaMA-3-8B-Instruct | | :---------------| :--------------------------|:------------|:--------------|:---------------------| | GovReport | 25.3 | 4.9 | 20.3 | 10.3 | | QMSum | 21.9 | 15.5 | 20.6 | 2.9 | | Qasper | 41.6 | 23.5 | 26.6 | 8.1 | | SQuALITY | 24.1 | 14.7 | 16.2 | 25 | | SummScreenFD | 16.8 | 9.3 | 11.3 | 5.1 | | **Average** | **25.9** | **13.6** | **19.0** | **10.3** | We take a closer look at different categories across 100 public benchmark datasets at the table below: | Category | Phi-3-Mini-128K-Instruct | Gemma-7B | Mistral-7B | Mixtral 8x7B | Llama-3-8B-Instruct | GPT-3.5-Turbo | |:----------|:--------------------------|:----------|:------------|:--------------|:---------------------|:---------------| | Popular aggregated benchmark | 60.6 | 59.4 | 56.5 | 66.2 | 59.9 | 67.0 | | Reasoning | 69.4 | 60.3 | 62.8 | 68.1 | 69.6 | 71.7 | | Language understanding | 57.5 | 57.6 | 52.5 | 66.1 | 63.2 | 67.7 | | Code generation | 61.0 | 45.6 | 42.9 | 52.7 | 56.4 | 70.4 | | Math | 51.6 | 35.8 | 25.4 | 40.3 | 41.1 | 52.8 | | Factual knowledge | 35.8 | 46.7 | 49.8 | 58.6 | 43.1 | 63.4 | | Multilingual | 56.4 | 66.5 | 57.4 | 66.7 | 66.6 | 71.0 | | Robustness | 61.1 | 38.4 | 40.6 | 51.0 | 64.5 | 69.3 | Overall, the model with only 3.8B-param achieves a similar level of language understanding and reasoning ability as much larger models. However, it is still fundamentally limited by its size for certain tasks. The model simply does not have the capacity to store too much world knowledge, which can be seen for example with low performance on TriviaQA. However, we believe such weakness can be resolved by augmenting Phi-3-Mini with a search engine. ## Cross Platform Support ONNX runtime now supports Phi-3 mini models across platforms and hardware. Optimized phi-3 models are also published here in ONNX format, to run with ONNX Runtime on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets. DirectML GPU acceleration is supported for Windows desktops GPUs (AMD, Intel, and NVIDIA). Along with DML, ONNX Runtime provides cross platform support for Phi3 mini across a range of devices CPU, GPU, and mobile. Here are some of the optimized configurations we have added: 1. ONNX models for int4 DML: Quantized to int4 via AWQ 2. ONNX model for fp16 CUDA 3. ONNX model for int4 CUDA: Quantized to int4 via RTN 4. ONNX model for int4 CPU and Mobile: Quantized to int4 via RTN ## Software * PyTorch * Transformers * Flash-Attention ## Hardware Note that by default, the Phi-3 Mini-128K-Instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types: * NVIDIA A100 * NVIDIA A6000 * NVIDIA H100 If you want to run the model on: * NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation=\"eager\" * Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models 128K ## License The model is licensed under the MIT license. 
## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.", "model_explanation_gemini": "A 3.8B-parameter lightweight language model optimized for instruction-following, reasoning (code/math/logic), and long-context tasks (128K tokens) in English, designed for constrained environments and latency-sensitive applications." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Phi-3-mini-4k-instruct.json b/data/model_data_json/microsoft_Phi-3-mini-4k-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..f064f4a5dfa1811ebbabd056146fd81b09f6a3d4 --- /dev/null +++ b/data/model_data_json/microsoft_Phi-3-mini-4k-instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "microsoft/Phi-3-mini-4k-instruct", + "downloads": 579335, + "tags": [ + "transformers", + "safetensors", + "phi3", + "text-generation", + "nlp", + "code", + "conversational", + "custom_code", + "en", + "fr", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - en - fr pipeline_tag: text-generation tags: - nlp - code inference: parameters: temperature: 0 widget: - messages: - role: user content: Can you provide ways to eat combinations of bananas and dragonfruits? --- 🎉 **Phi-3.5**: [[mini-instruct]]( [[MoE-instruct]]( ; [[vision-instruct]]( ## Model Summary The Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family, with the Mini version available in two variants, 4K and 128K, which is the context length (in tokens) that it can support. The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety. When assessed against benchmarks testing common sense, language understanding, math, code, long context and logical reasoning, Phi-3 Mini-4K-Instruct showcased robust, state-of-the-art performance among models with fewer than 13 billion parameters. Resources and Technical Documentation: 🏡 Phi-3 Portal

📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
🛠️ Phi-3 on Azure AI Studio
👩‍🍳 Phi-3 Cookbook
🖥️ Try It | | Short Context | Long Context | | :------- | :------------- | :------------ | | Mini | 4K [[HF]]( ; [[ONNX]]( ; [[GGUF]]( | 128K [[HF]]( ; [[ONNX]]( | Small | 8K [[HF]]( ; [[ONNX]]( | 128K [[HF]]( ; [[ONNX]]( | Medium | 4K [[HF]]( ; [[ONNX]]( | 128K [[HF]]( ; [[ONNX]]( | Vision | | 128K [[HF]]( ; [[ONNX]]( ## Intended Uses **Primary use cases** The model is intended for broad commercial and research use in English. The model is suited to general-purpose AI systems and applications that require 1) memory/compute constrained environments; 2) latency bound scenarios; 3) strong reasoning (especially math and logic). Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features. **Out-of-scope use cases** Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case. **Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.** ## Release Notes This is an update over the original instruction-tuned Phi-3-mini release based on valuable customer feedback. The model used additional post-training data, leading to substantial gains on instruction following and structured output. We also improved multi-turn conversation quality, added explicit support for the <|system|> tag, and significantly improved reasoning capability. We believe most use cases will benefit from this release, but we encourage users to test it in their particular AI applications. We appreciate the enthusiastic adoption of the Phi-3 model family, and continue to welcome all feedback from the community. The table below highlights improvements on instruction following, structured output, and reasoning of the new release on public and internal benchmark datasets. | Benchmarks | Original | June 2024 Update | |:------------|:----------|:------------------| | Instruction Extra Hard | 5.7 | 6.0 | | Instruction Hard | 4.9 | 5.1 | | Instructions Challenge | 24.6 | 42.3 | | JSON Structure Output | 11.5 | 52.3 | | XML Structure Output | 14.4 | 49.8 | | GPQA | 23.7 | 30.6 | | MMLU | 68.8 | 70.9 | | **Average** | **21.9** | **36.7** | Notes: if users would like to check out the previous version, use the git commit id **ff07dc01615f8113924aed013115ab2abd32115b**. For model conversion, e.g. to GGUF and other formats, we invite the community to experiment with various approaches and share your valuable feedback. Let's innovate together! ## How to Use Phi-3 Mini-4K-Instruct has been integrated in the version of . The current version can be verified with: . Examples of required packages: Phi-3 Mini-4K-Instruct is also available in Azure AI Studio ### Tokenizer Phi-3 Mini-4K-Instruct supports a vocabulary size of up to tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size. ### Chat Format Given the nature of the training data, the Phi-3 Mini-4K-Instruct model is best suited for prompts using the chat format as follows.
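Since the card's original snippet is not preserved in this dump, here is a hedged sketch of that format; the exact whitespace around the special tokens is an assumption, and the question reused below comes from the card's own widget example:

```python
# Hypothetical sketch: a Phi-3 chat-format prompt written out by hand.
# The June 2024 update explicitly supports the <|system|> tag.
prompt = (
    '<|system|>\n'
    'You are a helpful assistant.<|end|>\n'
    '<|user|>\n'
    'Can you provide ways to eat combinations of bananas and dragonfruits?<|end|>\n'
    '<|assistant|>\n'
)
```

In practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should emit the same layout from a list of role/content dictionaries.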
You can provide the prompt as a question with a generic template as follows: For example: where the model generates the text after . In the case of a few-shot prompt, it can be formatted as follows: ### Sample inference code These code snippets show how to quickly get started running the model on a GPU: Note: If you want to use flash attention, call _AutoModelForCausalLM.from_pretrained()_ with _attn_implementation=\"flash_attention_2\"_ ## Responsible AI Considerations Like other language models, the Phi series models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: + Quality of Service: the Phi models are trained primarily on English text. Languages other than English will experience worse performance. English language varieties with less representation in the training data might experience worse performance than standard American English. + Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. + Inappropriate or Offensive Content: these models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case. + Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. + Limited Scope for Code: The majority of Phi-3 training data is based on Python and uses common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. Developers should apply responsible AI best practices and are responsible for ensuring that a specific use case complies with relevant laws and regulations (e.g. privacy, trade, etc.). Important areas for consideration include: + Allocation: Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.) without further assessments and additional debiasing techniques. + High-Risk Scenarios: Developers should assess the suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. + Misinformation: Models may produce inaccurate information. Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG).
+ Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. + Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations. ## Training ### Model * Architecture: Phi-3 Mini-4K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines. * Inputs: Text. It is best suited for prompts using chat format. * Context length: 4K tokens * GPUs: 512 H100-80G * Training time: 10 days * Training data: 4.9T tokens * Outputs: Generated text in response to the input * Dates: Our models were trained between May and June 2024 * Status: This is a static model trained on an offline dataset with cutoff date October 2023. Future versions of the tuned models may be released as we improve models. * Release dates: June 2024. ### Datasets Our training data includes a wide variety of sources, totaling 4.9 trillion tokens, and is a combination of 1) publicly available documents filtered rigorously for quality, selected high-quality educational data, and code; 2) newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (science, daily activities, theory of mind, etc.); 3) high-quality chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction-following, truthfulness, honesty and helpfulness. We focus on the quality of data that could potentially improve the reasoning ability of the model, and we filter the publicly available documents to contain the correct level of knowledge. As an example, the result of a Premier League game on a particular day might be good training data for frontier models, but such information needs to be removed to leave more model capacity for reasoning in small models. More details about data can be found in the Phi-3 Technical Report. ### Fine-tuning A basic example of multi-GPU supervised fine-tuning (SFT) with the TRL and Accelerate modules is provided here. ## Benchmarks We report the results under completion format for Phi-3-Mini-4K-Instruct on standard open-source benchmarks measuring the model's reasoning ability (both common sense reasoning and logical reasoning). We compare to Mistral-7b-v0.1, Mixtral-8x7b, Gemma 7B, Llama-3-8B-Instruct, and GPT3.5-Turbo-1106. All the reported numbers are produced with the exact same pipeline to ensure that the numbers are comparable. These numbers might differ from other published numbers due to slightly different choices in the evaluation. As is now standard, we use few-shot prompts to evaluate the models, at temperature 0. The prompts and number of shots are part of a Microsoft internal tool to evaluate language models, and in particular we did no optimization to the pipeline for Phi-3. More specifically, we do not change prompts, pick different few-shot examples, change prompt format, or do any other form of optimization for the model. The number of k–shot examples is listed per-benchmark.
| Category | Benchmark | Phi-3-Mini-4K-Ins | Gemma-7B | Mistral-7b | Mixtral-8x7b | Llama-3-8B-Ins | GPT3.5-Turbo-1106 | |:----------|:-----------|:-------------------|:----------|:------------|:--------------|:----------------|:-------------------| | Popular aggregated benchmark | AGI Eval
5-shot| 39.0 | 42.1 | 35.1 | 45.2 | 42 | 48.4 | | | MMLU
5-shot | 70.9 | 63.6 | 61.7 | 70.5 | 66.5 | 71.4 | | | BigBench Hard CoT
3-shot| 73.5 | 59.6 | 57.3 | 69.7 | 51.5 | 68.3 | | Language Understanding | ANLI
7-shot | 53.6 | 48.7 | 47.1 | 55.2 | 57.3 | 58.1 | | | HellaSwag
5-shot| 75.3 | 49.8 | 58.5 | 70.4 | 71.1 | 78.8 | | Reasoning | ARC Challenge
10-shot | 86.3 | 78.3 | 78.6 | 87.3 | 82.8 | 87.4 | | | BoolQ
0-shot | 78.1 | 66 | 72.2 | 76.6 | 80.9 | 79.1 | | | MedQA
2-shot| 56.5 | 49.6 | 50 | 62.2 | 60.5 | 63.4 | | | OpenBookQA
10-shot| 82.2 | 78.6 | 79.8 | 85.8 | 82.6 | 86 | | | PIQA
5-shot| 83.5 | 78.1 | 77.7 | 86 | 75.7 | 86.6 | | | GPQA
0-shot| 30.6 | 2.9 | 15 | 6.9 | 32.4 | 30.8 | | | Social IQA
5-shot| 77.6 | 65.5 | 74.6 | 75.9 | 73.9 | 68.3 | | | TruthfulQA (MC2)
10-shot| 64.7 | 52.1 | 53 | 60.1 | 63.2 | 67.7 | | | WinoGrande
5-shot| 71.6 | 55.6 | 54.2 | 62 | 65 | 68.8 | | Factual Knowledge | TriviaQA
5-shot| 61.4 | 72.3 | 75.2 | 82.2 | 67.7 | 85.8 | | Math | GSM8K CoT
8-shot| 85.7 | 59.8 | 46.4 | 64.7 | 77.4 | 78.1 | | Code Generation | HumanEval
0-shot| 57.3 | 34.1 | 28.0 | 37.8 | 60.4 | 62.2 | | | MBPP
3-shot| 69.8 | 51.5 | 50.8 | 60.2 | 67.7 | 77.8 | | **Average** | | **67.6** | **56.0** | **56.4** | **64.4** | **65.5** | **70.4** | We take a closer look at different categories across 100 public benchmark datasets at the table below: | Category | Phi-3-Mini-4K-Instruct | Gemma-7B | Mistral-7B | Mixtral 8x7B | Llama-3-8B-Instruct | GPT-3.5-Turbo | |:----------|:------------------------|:----------|:------------|:--------------|:---------------------|:---------------| | Popular aggregated benchmark | 61.1 | 59.4 | 56.5 | 66.2 | 59.9 | 67.0 | | Reasoning | 70.8 | 60.3 | 62.8 | 68.1 | 69.6 | 71.8 | | Language understanding | 60.5 | 57.6 | 52.5 | 66.1 | 63.2 | 67.7 | | Code generation | 60.7 | 45.6 | 42.9 | 52.7 | 56.4 | 70.4 | | Math | 50.6 | 35.8 | 25.4 | 40.3 | 41.1 | 52.8 | | Factual knowledge | 38.4 | 46.7 | 49.8 | 58.6 | 43.1 | 63.4 | | Multilingual | 56.7 | 66.5 | 57.4 | 66.7 | 66.6 | 71.0 | | Robustness | 61.1 | 38.4 | 40.6 | 51.0 | 64.5 | 69.3 | Overall, the model with only 3.8B-param achieves a similar level of language understanding and reasoning ability as much larger models. However, it is still fundamentally limited by its size for certain tasks. The model simply does not have the capacity to store too much world knowledge, which can be seen for example with low performance on TriviaQA. However, we believe such weakness can be resolved by augmenting Phi-3-Mini with a search engine. ## Cross Platform Support ONNX runtime now supports Phi-3 mini models across platforms and hardware. Optimized phi-3 models are also published here in ONNX format, to run with ONNX Runtime on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets. DirectML GPU acceleration is supported for Windows desktops GPUs (AMD, Intel, and NVIDIA). Along with DML, ONNX Runtime provides cross platform support for Phi3 mini across a range of devices CPU, GPU, and mobile. Here are some of the optimized configurations we have added: 1. ONNX models for int4 DML: Quantized to int4 via AWQ 2. ONNX model for fp16 CUDA 3. ONNX model for int4 CUDA: Quantized to int4 via RTN 4. ONNX model for int4 CPU and Mobile: Quantized to int4 via R ## Software * PyTorch * Transformers * Flash-Attention ## Hardware Note that by default, the Phi-3 Mini-4K-Instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types: * NVIDIA A100 * NVIDIA A6000 * NVIDIA H100 If you want to run the model on: * NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation=\"eager\" * CPU: use the **GGUF** quantized models 4K + Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models 4K ## License The model is licensed under the MIT license. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.", + "model_explanation_gemini": "A lightweight 3.8B-parameter instruction-tuned model optimized for reasoning, math, and code tasks with strong performance in constrained environments and 4K-token context support." 
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Phi-3.5-mini-instruct.json b/data/model_data_json/microsoft_Phi-3.5-mini-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..c0da724b8cc7c6f081ae21a895ae2bfc6ea02619 --- /dev/null +++ b/data/model_data_json/microsoft_Phi-3.5-mini-instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "microsoft/Phi-3.5-mini-instruct", + "downloads": 317228, + "tags": [ + "transformers", + "safetensors", + "phi3", + "text-generation", + "nlp", + "code", + "conversational", + "custom_code", + "multilingual", + "arxiv:2404.14219", + "arxiv:2407.13833", + "arxiv:2403.06412", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - multilingual pipeline_tag: text-generation tags: - nlp - code widget: - messages: - role: user content: Can you provide ways to eat combinations of bananas and dragonfruits? library_name: transformers --- 🎉**Phi-4**: [multimodal-instruct | onnx]; [mini-instruct | onnx] ## Model Summary Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K token context length. The model underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. 🏡 Phi-3 Portal

📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
👩‍🍳 Phi-3 Cookbook
🖥️ Try It
**Phi-3.5**: [mini-instruct | onnx]; [[MoE-instruct]]( [[vision-instruct]]( ## Intended Uses ### Primary Use Cases The model is intended for commercial and research use in multiple languages. The model is suited to general-purpose AI systems and applications that require: 1) Memory/compute constrained environments 2) Latency bound scenarios 3) Strong reasoning (especially code, math and logic) Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features. ### Use Case Considerations Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case. ***Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.*** ## Release Notes This is an update over the June 2024 instruction-tuned Phi-3 Mini release based on valuable user feedback. The model used additional post-training data, leading to substantial gains on multilingual ability, multi-turn conversation quality, and reasoning capability. We believe most use cases will benefit from this release, but we encourage users to test it in their particular AI applications. We appreciate the enthusiastic adoption of the Phi-3 model family, and continue to welcome all feedback from the community. ### Multilingual The table below highlights the multilingual capability of Phi-3.5 Mini on the multilingual MMLU, MEGA, and multilingual MMLU-pro datasets. Overall, we observed that even with just 3.8B active parameters, the model is competitive on multilingual tasks in comparison to other models with much larger active parameter counts. | Benchmark | Phi-3.5 Mini-Ins | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |----------------------------|------------------|-----------------------|--------------------------|---------------------------|------------------|----------------|------------------|-------------------------------| | Multilingual MMLU | 55.4 | 51.08 | 47.4 | 58.9 | 56.2 | 63.8 | 77.2 | 72.9 | | Multilingual MMLU-Pro | 30.9 | 30.21 | 15.0 | 34.0 | 21.4 | 43.0 | 57.9 | 53.2 | | MGSM | 47.9 | 41.56 | 31.8 | 63.3 | 56.7 | 75.1 | 75.8 | 81.7 | | MEGA MLQA | 61.7 | 55.5 | 43.9 | 61.2 | 45.2 | 54.4 | 61.6 | 70.0 | | MEGA TyDi QA | 62.2 | 55.9 | 54.0 | 63.7 | 54.5 | 65.6 | 63.6 | 81.8 | | MEGA UDPOS | 46.5 | 48.1 | 57.2 | 58.2 | 54.1 | 56.6 | 62.4 | 66.0 | | MEGA XCOPA | 63.1 | 62.4 | 58.8 | 10.8 | 21.1 | 31.2 | 95.0 | 90.3 | | MEGA XStoryCloze | 73.5 | 73.6 | 75.5 | 92.3 | 71.0 | 87.0 | 20.7 | 96.6 | | **Average** | **55.2** | **52.3** | **47.9** | **55.3** | **47.5** | **59.6** | **64.3** | **76.6** | The table below shows Multilingual MMLU scores in some of the supported languages. For more multi-lingual benchmarks and details, see Appendix A.
| Benchmark | Phi-3.5 Mini-Ins | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |-----------|------------------|-----------------------|--------------------------|---------------------------|------------------|----------------|------------------|-------------------------------| | Arabic | 44.2 | 35.4 | 33.7 | 45.3 | 49.1 | 56.3 | 73.6 | 67.1 | | Chinese | 52.6 | 46.9 | 45.9 | 58.2 | 54.4 | 62.7 | 66.7 | 70.8 | | Dutch | 57.7 | 48.0 | 51.3 | 60.1 | 55.9 | 66.7 | 80.6 | 74.2 | | French | 61.1 | 61.7 | 53.0 | 63.8 | 62.8 | 67.0 | 82.9 | 75.6 | | German | 62.4 | 61.3 | 50.1 | 64.5 | 59.9 | 65.7 | 79.5 | 74.3 | | Italian | 62.8 | 63.1 | 52.5 | 64.1 | 55.9 | 65.7 | 82.6 | 75.9 | | Russian | 50.4 | 45.3 | 48.9 | 59.0 | 57.4 | 63.2 | 78.7 | 72.6 | | Spanish | 62.6 | 61.3 | 53.9 | 64.3 | 62.6 | 66.0 | 80.0 | 75.5 | | Ukrainian | 45.2 | 36.7 | 46.9 | 56.6 | 52.9 | 62.0 | 77.4 | 72.6 | ### Long Context Phi-3.5-mini supports 128K context length, therefore the model is capable of several long context tasks including long document/meeting summarization, long document QA, long document information retrieval. We see that Phi-3.5-mini is clearly better than Gemma-2 family which only supports 8K context length. Phi-3.5-mini is competitive with other much larger open-weight models such as Llama-3.1-8B-instruct, Mistral-7B-instruct-v0.3, and Mistral-Nemo-12B-instruct-2407. | Benchmark | Phi-3.5-mini-instruct | Llama-3.1-8B-instruct | Mistral-7B-instruct-v0.3 | Mistral-Nemo-12B-instruct-2407 | Gemini-1.5-Flash | GPT-4o-mini-2024-07-18 (Chat) | |--|--|--|--|--|--|--| | GovReport | 25.9 | 25.1 | 26.0 | 25.6 | 27.8 | 24.8 | | QMSum | 21.3 | 21.6 | 21.3 | 22.1 | 24.0 | 21.7 | | Qasper | 41.9 | 37.2 | 31.4 | 30.7 | 43.5 | 39.8 | | SQuALITY | 25.3 | 26.2 | 25.9 | 25.8 | 23.5 | 23.8 | | SummScreenFD | 16.0 | 17.6 | 17.5 | 18.2 | 16.3 | 17.0 | | **Average** | **26.1** | **25.5** | **24.4** | **24.5** | **27.0** | **25.4** | RULER: a retrieval-based benchmark for long context understanding | Model | 4K | 8K | 16K | 32K | 64K | 128K | Average | |--|--|--|--|--|--|--|--| | **Phi-3.5-mini-instruct** | 94.3 | 91.1 | 90.7 | 87.1 | 78.0 | 63.6 | **84.1** | | **Llama-3.1-8B-instruct** | 95.5 | 93.8 | 91.6 | 87.4 | 84.7 | 77.0 | **88.3** | | **Mistral-Nemo-12B-instruct-2407** | 87.8 | 87.2 | 87.7 | 69.0 | 46.8 | 19.0 | **66.2** | RepoQA: a benchmark for long context code understanding | Model | Python | C++ | Rust | Java | TypeScript | Average | |--|--|--|--|--|--|--| | **Phi-3.5-mini-instruct** | 86 | 67 | 73 | 77 | 82 | **77** | | **Llama-3.1-8B-instruct** | 80 | 65 | 73 | 76 | 63 | **71** | | **Mistral-7B-instruct-v0.3** | 61 | 57 | 51 | 61 | 80 | **62** | ## Usage ### Requirements Phi-3 family has been integrated in the version of . The current version can be verified with: . Examples of required packages: Phi-3.5-mini-instruct is also available in Azure AI Studio ### Tokenizer Phi-3.5-mini-Instruct supports a vocabulary size of up to tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size. ### Input Formats Given the nature of the training data, the Phi-3.5-mini-instruct model is best suited for prompts using the chat format as follows: ### Loading the model locally After obtaining the Phi-3.5-mini-instruct model checkpoint, users can use this sample code for inference. 
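The sample code itself is not preserved in this dump; here is a minimal sketch under stated assumptions (the pipeline call and generation arguments are illustrative, and a recent `transformers` with `torch` and `accelerate` installed is assumed):

```python
# Hypothetical sketch: local chat-style inference with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = 'microsoft/Phi-3.5-mini-instruct'
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype='auto',
    device_map='auto',
    # attn_implementation='flash_attention_2',  # optional; see the hardware note below
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Can you provide ways to eat combinations of bananas and dragonfruits?'},
]

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)
result = pipe(messages, max_new_tokens=256, do_sample=False)
print(result[0]['generated_text'][-1]['content'])  # assistant reply in the chat output format
```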
Notes: If you want to use flash attention, call _AutoModelForCausalLM.from_pretrained()_ with _attn_implementation=\"flash_attention_2\"_ ## Responsible AI Considerations Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: + Quality of Service: The Phi models are trained primarily on English text and some additional multilingual text. Languages other than English will experience worse performance as well as performance disparities across non-English languages. English language varieties with less representation in the training data might experience worse performance than standard American English. + Multilingual performance and safety gaps: We believe it is important to make language models more widely available across different languages, but the Phi 3 models still exhibit challenges common across multilingual releases. As with any deployment of LLMs, developers will be better positioned to test for performance or safety gaps for their linguistic and cultural context and customize the model with additional fine-tuning and appropriate safeguards. + Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups, cultural contexts, or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. + Inappropriate or Offensive Content: These models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case. + Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. + Limited Scope for Code: The majority of Phi-3 training data is based on Python and uses common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. + Long Conversation: Phi-3 models, like other models, can in some cases generate responses that are repetitive, unhelpful, or inconsistent in very long chat sessions in both English and non-English languages. Developers are encouraged to place appropriate mitigations, like limiting conversation turns, to account for possible conversational drift. Developers should apply responsible AI best practices, including mapping, measuring, and mitigating risks associated with their specific use case and cultural, linguistic context. The Phi-3 family of models are general-purpose models. As developers plan to deploy these models for specific use cases, they are encouraged to fine-tune the models for their use case and leverage the models as part of broader AI systems with language-specific safeguards in place. Important areas for consideration include: + Allocation: Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.) without further assessments and additional debiasing techniques.
+ High-Risk Scenarios: Developers should assess the suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. + Misinformation: Models may produce inaccurate information. Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG). + Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. + Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations. ## Training ### Model **Architecture:** Phi-3.5-mini has 3.8B parameters and is a dense decoder-only Transformer model using the same tokenizer as Phi-3 Mini.
**Inputs:** Text. It is best suited for prompts using chat format.
**Context length:** 128K tokens
**GPUs:** 512 H100-80G
**Training time:** 10 days
**Training data:** 3.4T tokens
**Outputs:** Generated text in response to the input
**Dates:** Trained between June and August 2024
**Status:** This is a static model trained on an offline dataset with cutoff date October 2023 for publicly available data. Future versions of the tuned models may be released as we improve models.
**Supported languages:** Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
**Release date:** August 2024
### Training Datasets Our training data includes a wide variety of sources, totaling 3.4 trillion tokens, and is a combination of 1) publicly available documents filtered rigorously for quality, selected high-quality educational data, and code; 2) newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (science, daily activities, theory of mind, etc.); 3) high quality chat format supervised data covering various topics to reflect human preferences on different aspects such as instruct-following, truthfulness, honesty and helpfulness. We are focusing on the quality of data that could potentially improve the reasoning ability for the model, and we filter the publicly available documents to contain the correct level of knowledge. As an example, the result of a game in premier league in a particular day might be good training data for frontier models, but we need to remove such information to leave more model capacity for reasoning for the small size models. More details about data can be found in the Phi-3 Technical Report. ### Fine-tuning A basic example of multi-GPUs supervised fine-tuning (SFT) with TRL and Accelerate modules is provided here. ## Benchmarks We report the results under completion format for Phi-3.5-mini on standard open-source benchmarks measuring the model's reasoning ability (both common sense reasoning and logical reasoning). We compare to Mistral-7B-Instruct-v0.3, Mistral-Nemo-12B-Ins-2407, Llama-3.1-8B-Ins, Gemma-2-9B-Ins, Gemini 1.5 Flash, and GPT-4o-mini-2024-07-18 (Chat). All the reported numbers are produced with the exact same pipeline to ensure that the numbers are comparable. These numbers might differ from other published numbers due to slightly different choices in the evaluation. As is now standard, we use few-shot prompts to evaluate the models, at temperature 0. The prompts and number of shots are part of a Microsoft internal tool to evaluate language models, and in particular we did no optimization to the pipeline for Phi-3. More specifically, we do not change prompts, pick different few-shot examples, change prompt format, or do any other form of optimization for the model. The number of k–shot examples is listed per-benchmark. 
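Before turning to the results, here is a minimal sketch of the multi-GPU SFT setup the Fine-tuning section above points to; the dataset and every hyperparameter are placeholder assumptions (the card's actual example lives in the linked resource), and TRL's API details vary between versions:

```python
# Hypothetical sketch: supervised fine-tuning with TRL's SFTTrainer.
# Launch across GPUs with: accelerate launch sft_phi35.py
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder chat-style dataset; substitute your own.
train_dataset = load_dataset('HuggingFaceH4/ultrachat_200k', split='train_sft')

trainer = SFTTrainer(
    model='microsoft/Phi-3.5-mini-instruct',  # SFTTrainer accepts a model id string
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir='./phi35-mini-sft',
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()
```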
A high-level overview of the model quality on representative benchmarks is as follows: | Category | Benchmark | Phi-3.5 Mini-Ins | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |----------------|--------------------------|------------------|--------------------------|---------------------------|------------------|----------------|------------------|------------------------------| | Popular aggregated benchmark | Arena Hard | 37 | 18.1 | 39.4 | 25.7 | 42 | 55.2 | 75 | | | BigBench Hard CoT (0-shot) | 69 | 33.4 | 60.2 | 63.4 | 63.5 | 66.7 | 80.4 | | | MMLU (5-shot) | 69 | 60.3 | 67.2 | 68.1 | 71.3 | 78.7 | 77.2 | | | MMLU-Pro (0-shot, CoT) | 47.4 | 18 | 40.7 | 44 | 50.1 | 57.2 | 62.8 | | Reasoning | ARC Challenge (10-shot) | 84.6 | 77.9 | 84.8 | 83.1 | 89.8 | 92.8 | 93.5 | | | BoolQ (2-shot) | 78 | 80.5 | 82.5 | 82.8 | 85.7 | 85.8 | 88.7 | | | GPQA (0-shot, CoT) | 30.4 | 15.6 | 28.6 | 26.3 | 29.2 | 37.5 | 41.1 | | | HellaSwag (5-shot) | 69.4 | 71.6 | 76.7 | 73.5 | 80.9 | 67.5 | 87.1 | | | OpenBookQA (10-shot) | 79.2 | 78 | 84.4 | 84.8 | 89.6 | 89 | 90 | | | PIQA (5-shot) | 81 | 73.4 | 83.5 | 81.2 | 83.7 | 87.5 | 88.7 | | | Social IQA (5-shot) | 74.7 | 73 | 75.3 | 71.8 | 74.7 | 77.8 | 82.9 | | | TruthfulQA (MC2) (10-shot) | 64 | 64.7 | 68.1 | 69.2 | 76.6 | 76.6 | 78.2 | | | WinoGrande (5-shot) | 68.5 | 58.1 | 70.4 | 64.7 | 74 | 74.7 | 76.9 | | Multilingual | Multilingual MMLU (5-shot) | 55.4 | 47.4 | 58.9 | 56.2 | 63.8 | 77.2 | 72.9 | | | MGSM (0-shot CoT) | 47.9 | 31.8 | 63.3 | 56.7 | 76.4 | 75.8 | 81.7 | | Math | GSM8K (8-shot, CoT) | 86.2 | 54.4 | 84.2 | 82.4 | 84.9 | 82.4 | 91.3 | | | MATH (0-shot, CoT) | 48.5 | 19 | 31.2 | 47.6 | 50.9 | 38 | 70.2 | | Long context | Qasper | 41.9 | 31.4 | 30.7 | 37.2 | 13.9 | 43.5 | 39.8 | | | SQuALITY | 24.3 | 25.9 | 25.8 | 26.2 | 0 | 23.5 | 23.8 | | Code Generation| HumanEval (0-shot) | 62.8 | 35.4 | 63.4 | 66.5 | 61 | 74.4 | 86.6 | | | MBPP (3-shot) | 69.6 | 50.4 | 68.1 | 69.4 | 69.3 | 77.5 | 84.1 | | **Average** | | **61.4** | **48.5** | **61.3** | **61.0** | **63.3** | **68.5** | **74.9** | We take a closer look at different categories across public benchmark datasets in the table below: | Category | Phi-3.5 Mini-Ins | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |----------------------------|------------------|--------------------------|---------------------------|------------------|----------------|------------------|------------------------------| | Popular aggregated benchmark | 55.6 | 32.5 | 51.9 | 50.3 | 56.7 | 64.5 | 73.9 | | Reasoning | 70.1 | 65.2 | 72.2 | 70.5 | 75.4 | 77.7 | 80 | | Language understanding | 62.6 | 62.8 | 67 | 62.9 | 72.8 | 66.6 | 76.8 | | Robustness | 59.7 | 53.4 | 65.2 | 59.8 | 64.7 | 68.9 | 77.5 | | Long context | 26.1 | 25.5 | 24.4 | 24.5 | 0 | 27 | 25.4 | | Math | 67.4 | 36.7 | 57.7 | 65 | 67.9 | 60.2 | 80.8 | | Code generation | 62 | 43.1 | 56.9 | 65.8 | 58.3 | 66.8 | 69.9 | | Multilingual | 55.2 | 47.9 | 55.3 | 47.5 | 59.6 | 64.3 | 76.6 | Overall, the model, with only 3.8B parameters, achieves a similar level of multilingual language understanding and reasoning ability as much larger models. However, it is still fundamentally limited by its size for certain tasks. The model simply does not have the capacity to store much factual knowledge; therefore, users may experience factual inaccuracies.
However, we believe this weakness can be mitigated by augmenting Phi-3.5 with a search engine, particularly when using the model in RAG settings. ## Safety Evaluation and Red-Teaming We leveraged various evaluation techniques including red teaming, adversarial conversation simulations, and multilingual safety evaluation benchmark datasets to evaluate Phi-3.5 models' propensity to produce undesirable outputs across multiple languages and risk categories. Several approaches were used to compensate for the limitations of one approach alone. Findings across the various evaluation methods indicate that safety post-training that was done as detailed in the Phi-3 Safety Post-Training paper had a positive impact across multiple languages and risk categories as observed by refusal rates (refusal to output undesirable outputs) and robustness to jailbreak techniques. Note, however, that while comprehensive red-team evaluations were conducted across all models in the prior release of Phi models, red teaming for this release was largely focused on Phi-3.5 MOE across multiple languages and risk categories, as it is the largest and most capable of the three models. Details on prior red-team evaluations across Phi models can be found in the Phi-3 Safety Post-Training paper. For this release, insights from red teaming indicate that the models may refuse to generate undesirable outputs in English, even when the request for undesirable output is in another language. Models may also be more susceptible to longer multi-turn jailbreak techniques across both English and non-English languages. These findings highlight the need for industry-wide investment in the development of high-quality safety evaluation datasets across multiple languages, including low-resource languages, and risk areas that account for cultural nuances where those languages are spoken. ## Software * PyTorch * Transformers * Flash-Attention ## Hardware Note that by default, the Phi-3.5-mini-instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types: * NVIDIA A100 * NVIDIA A6000 * NVIDIA H100 If you want to run the model on: * NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation=\"eager\" (see the hedged sketch below) ## License The model is licensed under the MIT license. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
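For the hardware note above (pre-Ampere GPUs such as the V100 do not support flash attention), here is a minimal hedged sketch of the eager-attention fallback; only the attn_implementation="eager" argument comes from the card, while the model id and the other loading arguments are assumptions.

```python
from transformers import AutoModelForCausalLM

# Hedged sketch of the V100 fallback described in the Hardware section above.
# Only attn_implementation="eager" comes from the card; everything else here
# is an assumption.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",  # assumed model id
    attn_implementation="eager",        # bypasses flash attention on older GPUs
    torch_dtype="auto",
    device_map="auto",
)
```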
## Appendix A #### MGSM | Languages | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |-----------|------------------------|---------------------------------------|--------------------------|---------------------------|------------------|----------------|------------------|-------------------------------| | German | 69.6 | 65.2 | 42.4 | 74.4 | 68.4 | 76.8 | 81.6 | 82.8 | | English | 85.2 | 83.2 | 60.0 | 86.0 | 81.2 | 88.8 | 90.8 | 90.8 | | Spanish | 79.2 | 77.6 | 46.4 | 75.6 | 66.4 | 82.4 | 84.8 | 86.8 | | French | 71.6 | 72.8 | 47.2 | 70.4 | 66.8 | 74.4 | 77.2 | 81.6 | | Japanese | 50.0 | 35.2 | 22.8 | 62.4 | 49.2 | 67.6 | 77.6 | 80.4 | | Russian | 67.2 | 51.6 | 43.2 | 73.6 | 67.2 | 78.4 | 84.8 | 86.4 | | Thai | 29.6 | 6.4 | 18.4 | 53.2 | 56.0 | 76.8 | 87.6 | 81.6 | | Chinese | 60.0 | 52.8 | 42.4 | 66.4 | 68.0 | 72.8 | 82.0 | 82.0 | #### Multilingual MMLU-pro | Languages | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |------------|-----------------------|---------------------------------------|--------------------------|---------------------------|------------------|----------------|------------------|-------------------------------| | Czech | 24.9 | 26.3 | 14.6 | 30.6 | 23.0 | 40.5 | 59.0 | 40.9 | | English | 47.7 | 46.2 | 17.7 | 39.8 | 43.1 | 49.0 | 66.1 | 62.7 | | Finnish | 22.3 | 20.5 | 11.5 | 30.4 | 9.7 | 37.5 | 54.5 | 50.1 | | Norwegian | 29.9 | 27.8 | 14.4 | 33.2 | 22.2 | 44.4 | 60.7 | 59.1 | | Polish | 25.7 | 26.4 | 16.3 | 33.6 | 9.2 | 41.7 | 53.9 | 42.8 | | Portuguese | 38.7 | 37.6 | 15.3 | 36.0 | 29.3 | 43.5 | 54.0 | 56.9 | | Swedish | 30.7 | 28.1 | 15.5 | 34.3 | 16.9 | 42.6 | 57.7 | 55.5 | #### MEGA ##### MLQA | Languages | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |-----------|-----------------------|---------------------------------------|--------------------------|---------------------------|------------------|----------------|------------------|-------------------------------| | Arabic | 54.3 | 32.7 | 23.5 | 31.4 | 31.5 | 57.4 | 63.8 | 64.0 | | Chinese | 36.1 | 31.8 | 22.4 | 27.4 | 18.6 | 45.4 | 38.1 | 38.9 | | English | 80.3 | 78.9 | 68.2 | 75.5 | 67.2 | 82.9 | 69.5 | 82.2 | | German | 61.8 | 59.1 | 49.0 | 57.8 | 38.9 | 63.8 | 55.9 | 64.1 | | Spanish | 68.8 | 67.0 | 50.3 | 63.6 | 52.7 | 72.8 | 59.6 | 70.1 | ##### TyDi QA | Languages | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |-----------|-----------------------|---------------------------------------|--------------------------|---------------------------|------------------|----------------|------------------|-------------------------------| | Arabic | 69.7 | 54.4 | 52.5 | 49.8 | 33.7 | 81.1 | 78.8 | 84.9 | | English | 82.0 | 82.0 | 60.5 | 77.3 | 65.1 | 82.4 | 60.9 | 81.8 | | Finnish | 70.3 | 64.3 | 68.6 | 57.1 | 74.4 | 85.7 | 73.5 | 84.8 | | Japanese | 65.4 | 56.7 | 45.3 | 54.8 | 34.1 | 74.6 | 59.7 | 73.3 | | Korean | 74.0 | 60.4 | 54.5 | 54.2 | 54.9 | 83.8 | 60.7 | 82.3 | | Russian | 
63.5 | 62.7 | 52.3 | 55.7 | 27.4 | 69.8 | 60.1 | 72.5 | | Thai | 64.4 | 49.0 | 51.8 | 43.5 | 48.5 | 81.4 | 71.6 | 78.2 | ##### XCOPA | Languages | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) | |-----------|-----------------------|---------------------------------------|--------------------------|---------------------------|------------------|----------------|------------------|-------------------------------| | English | 94.6 | 94.6 | 85.6 | 94.4 | 37.6 | 63.8 | 92.0 | 98.2 | | Italian | 86.8 | 84.8 | 76.8 | 83.2 | 16.2 | 37.2 | 85.6 | 97.6 | | Turkish | 58.6 | 57.2 | 61.6 | 56.6 | 38.4 | 60.2 | 91.4 | 94.6 | ## Appendix B: Korean benchmarks The prompt is the same as the CLIcK paper prompt. The experimental results below were given with max_tokens=512 (zero-shot), max_tokens=1024 (5-shot), temperature=0.01. No system prompt used. - GPT-4o: 2024-05-13 version - GPT-4o-mini: 2024-07-18 version - GPT-4-turbo: 2024-04-09 version - GPT-3.5-turbo: 2023-06-13 version The overall Korean benchmarks show that the Phi-3.5-Mini-Instruct with only 3.8B params outperforms Llama-3.1-8B-Instruct. | Benchmarks | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | |:-------------------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | CLIcK | 42.99 | 29.12 | 47.82 | 80.46 | 68.5 | 72.82 | 50.98 | | HAERAE 1.0 | 44.21 | 36.41 | 53.9 | 85.7 | 76.4 | 77.76 | 52.67 | | KMMLU (0-shot, CoT) | 35.87 | 30.82 | 38.54 | 64.26 | 52.63 | 58.75 | 40.3 | | KMMLU (5-shot) | 37.35 | 29.98 | 20.21 | 64.28 | 51.62 | 59.29 | 42.28 | | KMMLU-HARD (0-shot, CoT) | 24 | 25.68 | 24.03 | 39.62 | 24.56 | 30.56 | 20.97 | | KMMLU-HARD (5-shot) | 24.76 | 25.73 | 15.81 | 40.94 | 24.63 | 31.12 | 21.19 | | **Average** | **35.62** | **29.99** | **29.29** | **62.54** | **50.08** | **56.74** | **39.61** | #### CLIcK (Cultural and Linguistic Intelligence in Korean) ##### Accuracy by supercategory | supercategory | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | |:----------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | Culture | 43.77 | 29.74 | 51.15 | 81.89 | 70.95 | 73.61 | 53.38 | | Language | 41.38 | 27.85 | 40.92 | 77.54 | 63.54 | 71.23 | 46 | | **Overall** | 42.99 | 29.12 | 47.82 | 80.46 | 68.5 | 72.82 | 50.98 | ##### Accuracy by category | supercategory | category | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | |:----------------|:------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | Culture | Economy | 61.02 | 28.81 | 66.1 | 94.92 | 83.05 | 89.83 | 64.41 | | Culture | Geography | 45.8 | 29.01 | 54.2 | 80.15 | 77.86 | 82.44 | 53.44 | | Culture | History | 26.15 | 30 | 29.64 | 66.92 | 48.4 | 46.4 | 31.79 | | Culture | Law | 32.42 | 22.83 | 44.29 | 70.78 | 57.53 | 61.19 | 41.55 | | Culture | Politics | 54.76 | 33.33 | 59.52 | 88.1 | 83.33 | 89.29 | 65.48 | | Culture | Pop Culture | 60.98 | 
34.15 | 60.98 | 97.56 | 85.37 | 92.68 | 75.61 | | Culture | Society | 54.37 | 31.72 | 65.05 | 92.88 | 85.44 | 86.73 | 71.2 | | Culture | Tradition | 47.75 | 31.98 | 54.95 | 87.39 | 74.77 | 79.28 | 55.86 | | Language | Functional | 37.6 | 24 | 32.8 | 84.8 | 64.8 | 80 | 40 | | Language | Grammar | 27.5 | 23.33 | 22.92 | 57.08 | 42.5 | 47.5 | 30 | | Language | Textual | 54.74 | 33.33 | 59.65 | 91.58 | 80.7 | 87.37 | 62.11 | #### HAERAE | category | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | |:----------------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | General Knowledge | 31.25 | 28.41 | 34.66 | 77.27 | 53.41 | 66.48 | 40.91 | | History | 32.45 | 22.34 | 44.15 | 92.02 | 84.57 | 78.72 | 30.32 | | Loan Words | 47.93 | 35.5 | 63.31 | 79.88 | 76.33 | 78.11 | 59.17 | | Rare Words | 55.06 | 42.96 | 63.21 | 87.9 | 81.98 | 79.01 | 61.23 | | Reading Comprehension | 42.95 | 41.16 | 51.9 | 85.46 | 77.18 | 80.09 | 56.15 | | Standard Nomenclature | 44.44 | 32.68 | 58.82 | 88.89 | 75.82 | 79.08 | 53.59 | | **Overall** | 44.21 | 36.41 | 53.9 | 85.7 | 76.4 | 77.76 | 52.67 | #### KMMLU (0-shot, CoT) | supercategory | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | |:----------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | Applied Science | 35.8 | 31.68 | 37.03 | 61.52 | 49.29 | 55.98 | 38.47 | | HUMSS | 31.56 | 26.47 | 37.29 | 69.45 | 56.59 | 63 | 40.9 | | Other | 35.45 | 31.01 | 39.15 | 63.79 | 52.35 | 57.53 | 40.19 | | STEM | 38.54 | 31.9 | 40.42 | 65.16 | 54.74 | 60.84 | 42.24 | | **Overall** | 35.87 | 30.82 | 38.54 | 64.26 | 52.63 | 58.75 | 40.3 | #### KMMLU (5-shot) | supercategory | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | |:----------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | Applied Science | 37.42 | 29.98 | 19.24 | 61.47 | 48.66 | 56.85 | 40.22 | | HUMSS | 34.72 | 27.27 | 22.5 | 68.79 | 55.95 | 63.68 | 43.35 | | Other | 37.04 | 30.76 | 20.95 | 64.21 | 51.1 | 57.85 | 41.92 | | STEM | 38.9 | 30.73 | 19.55 | 65.28 | 53.29 | 61.08 | 44.43 | | **Overall** | 37.35 | 29.98 | 20.21 | 64.28 | 51.62 | 59.29 | 42.28 | #### KMMLU-HARD (0-shot, CoT) | supercategory | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | |:----------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | Applied Science | 27.08 | 26.17 | 26.25 | 37.12 | 22.25 | 29.17 | 21.07 | | HUMSS | 20.21 | 24.38 | 20.21 | 41.97 | 23.31 | 31.51 | 19.44 | | Other | 23.05 | 24.82 | 23.88 | 40.39 | 26.48 | 29.59 | 22.22 | | STEM | 24.36 | 26.91 | 24.64 | 39.82 | 26.36 | 32.18 | 20.91 | | **Overall** | 24 | 25.68 | 24.03 | 39.62 | 24.56 | 30.56 | 20.97 | #### KMMLU-HARD (5-shot) | supercategory | Phi-3.5-Mini-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo | 
|:----------------|------------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:| | Applied Science | 25 | 29 | 12 | 31 | 21 | 25 | 20 | | HUMSS | 21.89 | 19.92 | 14 | 43.98 | 23.47 | 33.53 | 19.53 | | Other | 23.26 | 27.27 | 12.83 | 39.84 | 28.34 | 29.68 | 23.22 | | STEM | 20.5 | 25.25 | 12.75 | 40.25 | 23.25 | 27.25 | 19.75 | | **Overall** | 24.76 | 25.73 | 15.81 | 40.94 | 24.63 | 31.12 | 21.19 |", + "model_explanation_gemini": "Phi-3.5-mini-instruct is a lightweight, multilingual text-generation model optimized for high-quality reasoning, code, math, and logic tasks in memory/compute-constrained environments with a 128K token context length." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Phi-3.5-vision-instruct.json b/data/model_data_json/microsoft_Phi-3.5-vision-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..ee45d9d542b3e3dc937aea850af09f315c41ddfa --- /dev/null +++ b/data/model_data_json/microsoft_Phi-3.5-vision-instruct.json @@ -0,0 +1,23 @@ +{ + "model_id": "microsoft/Phi-3.5-vision-instruct", + "downloads": 377743, + "tags": [ + "transformers", + "safetensors", + "phi3_v", + "text-generation", + "nlp", + "code", + "vision", + "image-text-to-text", + "conversational", + "custom_code", + "multilingual", + "arxiv:2404.14219", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - multilingual pipeline_tag: image-text-to-text tags: - nlp - code - vision inference: parameters: temperature: 0.7 widget: - messages: - role: user content: <|image_1|>Can you describe what you see in the image? library_name: transformers --- ## Model Summary Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning-dense data both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version supports a context length of 128K tokens. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. 🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
👩‍🍳 Phi-3 Cookbook
🖥️ Try It
**Phi-3.5**: [[mini-instruct]]( [[MoE-instruct]]( ; [[vision-instruct]]( ## Intended Uses ### Primary Use Cases The model is intended for broad commercial and research use in English. The model provides uses for general purpose AI systems and applications with visual and text input capabilities which require: 1) Memory/compute constrained environments 2) Latency bound scenarios 3) General image understanding 4) Optical character recognition 5) Chart and table understanding 6) Multiple image comparison 7) Multi-image or video clip summarization Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features. ### Use Case Considerations Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case. ***Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.*** ## Release Notes In this release, the model enables multi-frame image understanding and reasoning, which is based on valuable customer feedback. The hero multi-frame capabilities include detailed image comparison, multi-image summarization/storytelling, and video summarization, which have broad applications in Office scenarios. We also observed performance improvements on most single-image benchmarks, e.g., boosting MMMU performance from 40.2 to 43.0, MMBench performance from 80.5 to 81.9, and document understanding benchmark TextVQA performance from 70.9 to 72.0. We believe most use cases will benefit from this release, but we encourage users to test the new model in their AI applications. We appreciate the enthusiastic adoption of the Phi-3 model family and continue to welcome all the feedback from the community. Below are the comparison results on existing multi-image benchmarks. On average, our model outperforms competitor models of the same size and is competitive with much bigger models on multi-frame capabilities and video summarization. **BLINK**: a benchmark with 14 visual tasks that humans can solve very quickly but that are still hard for current multimodal LLMs.
| Benchmark | Phi-3.5-vision-instruct | LLaVA-Interleave-Qwen-7B | InternVL-2-4B | InternVL-2-8B | Gemini-1.5-Flash | GPT-4o-mini | Claude-3.5-Sonnet | Gemini-1.5-Pro | GPT-4o | |--|--|--|--|--|--|--|--|--|--| | Art Style | 87.2 | 62.4 | 55.6 | 52.1 | 64.1 | 70.1 | 59.8 | 70.9 | 73.3 | | Counting | 54.2 | 56.7 | 54.2 | 66.7 | 51.7 | 55.0 | 59.2 | 65.0 | 65.0 | | Forensic Detection | 92.4 | 31.1 | 40.9 | 34.1 | 54.5 | 38.6 | 67.4 | 60.6 | 75.8 | | Functional Correspondence | 29.2 | 34.6 | 24.6 | 24.6 | 33.1 | 26.9 | 33.8 | 31.5 | 43.8 | | IQ Test | 25.3 | 26.7 | 26.0 | 30.7 | 25.3 | 29.3 | 26.0 | 34.0 | 19.3 | | Jigsaw | 68.0 | 86.0 | 55.3 | 52.7 | 71.3 | 72.7 | 57.3 | 68.0 | 67.3 | | Multi-View Reasoning | 54.1 | 44.4 | 48.9 | 42.9 | 48.9 | 48.1 | 55.6 | 49.6 | 46.6 | | Object Localization | 49.2 | 54.9 | 53.3 | 54.1 | 44.3 | 57.4 | 62.3 | 65.6 | 68.0 | | Relative Depth | 69.4 | 77.4 | 63.7 | 67.7 | 57.3 | 58.1 | 71.8 | 76.6 | 71.0 | | Relative Reflectance | 37.3 | 34.3 | 32.8 | 38.8 | 32.8 | 27.6 | 36.6 | 38.8 | 40.3 | | Semantic Correspondence | 36.7 | 31.7 | 31.7 | 22.3 | 32.4 | 31.7 | 45.3 | 48.9 | 54.0 | | Spatial Relation | 65.7 | 75.5 | 78.3 | 78.3 | 55.9 | 81.1 | 60.1 | 79.0 | 84.6 | | Visual Correspondence | 53.5 | 40.7 | 34.9 | 33.1 | 29.7 | 52.9 | 72.1 | 81.4 | 86.0 | | Visual Similarity | 83.0 | 91.9 | 48.1 | 45.2 | 47.4 | 77.8 | 84.4 | 81.5 | 88.1 | | **Overall** | **57.0** | **53.1** | **45.9** | **45.4** | **45.8** | **51.9** | **56.5** | **61.0** | **63.2** | **Video-MME**: comprehensively assesses the capabilities of MLLMs in processing video data, covering a wide range of visual domains, temporal durations, and data modalities. | Benchmark | Phi-3.5-vision-instruct | LLaVA-Interleave-Qwen-7B | InternVL-2-4B | InternVL-2-8B | Gemini-1.5-Flash | GPT-4o-mini | Claude-3.5-Sonnet | Gemini-1.5-Pro | GPT-4o | |--|--|--|--|--|--|--|--|--|--| | short (<2min) | 60.8 | 62.3 | 60.7 | 61.7 | 72.2 | 70.1 | 66.3 | 73.3 | 77.7 | | medium (4-15min) | 47.7 | 47.1 | 46.4 | 49.6 | 62.7 | 59.6 | 54.7 | 61.2 | 68.0 | | long (30-60min) | 43.8 | 41.2 | 42.6 | 46.6 | 52.1 | 53.9 | 46.6 | 53.2 | 59.6 | | **Overall** | **50.8** | **50.2** | **49.9** | **52.6** | **62.3** | **61.2** | **55.9** | **62.6** | **68.4** | ## Usage ### Requirements The current version can be verified with: . Examples of required packages: Phi-3.5-vision-Instruct is also available in Azure AI Studio. ### Input Formats Given the nature of the training data, the Phi-3.5-vision model is best suited for prompts using the chat format as follows: Single image: Multi-turn conversations: For multi-image usage, add multiple image placeholders at the front of the prompt. The <|image_{}|> index should start from 1. An example prompt is shown as follows: ### Loading the model locally After obtaining the Phi-3.5-vision-instruct model checkpoints, users can use this sample code for inference. Notes: + To achieve the best performance, we suggest setting _num_crops=4_ for multi-frame and _num_crops=16_ for single-frame. + To turn off flash attention, users can set _attn_implementation='eager'_. ## Responsible AI Considerations Like other models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: * Quality of Service: The Phi models are trained primarily on English text. Languages other than English will experience worse performance.
English language varieties with less representation in the training data might experience worse performance than standard American English. * Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. * Inappropriate or Offensive Content: These models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case. * Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. * Limited Scope for Code: The majority of Phi-3 training data is based in Python and uses common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. Developers should apply responsible AI best practices and are responsible for ensuring that a specific use case complies with relevant laws and regulations (e.g. privacy, trade, etc.). Important areas for consideration include: * Allocation: Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.) without further assessments and additional debiasing techniques. * High-Risk Scenarios: Developers should assess the suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. * Misinformation: Models may produce inaccurate information. Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG). * Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. * Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations. * Identification of individuals: models with vision capabilities may have the potential to uniquely identify individuals in images. Safety post-training steers the model to refuse such requests, but developers should consider and implement, as appropriate, additional mitigations or user consent flows as required in their respective jurisdiction (e.g., building measures to blur faces in image inputs before processing).
## Training ### Models **Architecture:** Phi-3.5-vision has 4.2B parameters and contains an image encoder, connector, projector, and the Phi-3 Mini language model.
**Inputs:** Text and Image. It’s best suited for prompts using the chat format.
**Context length:** 128K tokens
**GPUs:** 256 A100-80G
**Training time:** 6 days
**Training data:** 500B tokens (vision tokens + text tokens)
**Outputs:** Generated text in response to the input
**Dates:** Trained between July and August 2024
**Status:** This is a static model trained on an offline text dataset with a cutoff date of March 15, 2024. Future versions of the tuned models may be released as we improve models.
**Release date:** August 2024
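Given the multi-image chat format described under Input Formats above, here is a hedged sketch of multi-frame inference; the 1-indexed <|image_{}|> placeholders and the _num_crops_ guidance come from the card, while the model id, dtype, image paths, and exact processor calls are assumptions.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Hedged sketch of multi-image inference; placeholder syntax and num_crops
# guidance come from the card, the rest is assumed.
model_id = "microsoft/Phi-3.5-vision-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,        # phi3_v ships custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",
    _attn_implementation="eager",  # or flash attention on supported GPUs
)
processor = AutoProcessor.from_pretrained(
    model_id, trust_remote_code=True, num_crops=4  # 4 for multi-frame, 16 for single-frame
)

images = [Image.open("frame_1.png"), Image.open("frame_2.png")]  # hypothetical files
# Image placeholders go at the front of the prompt and are 1-indexed.
messages = [{"role": "user",
             "content": "<|image_1|>\n<|image_2|>\nWhat changed between these two frames?"}]

prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, images, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```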
### Data Overview Our training data includes a wide variety of sources, and is a combination of 1) publicly available documents filtered rigorously for quality, selected high-quality educational data and code; 2) selected high-quality image-text interleave data; 3) newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (science, daily activities, theory of mind, etc.), newly created image data, e.g., chart/table/diagram/slides, and newly created multi-image and video data, e.g., short video clips/pairs of two similar images; 4) high-quality chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction-following, truthfulness, honesty and helpfulness. The data collection process involved sourcing information from publicly available documents, with a meticulous approach to filtering out undesirable documents and images. To safeguard privacy, we carefully filtered various image and text data sources to remove or scrub any potentially personal data from the training data. More details about data can be found in the Phi-3 Technical Report. ### How to fine-tune? We recommend users take a look at the Phi-3 Cookbook fine-tuning recipe for Vision. ## Benchmarks To understand the capabilities, we compare Phi-3.5-vision with a set of models over a variety of zero-shot benchmarks using our internal benchmark platform. A high-level overview of the model quality on representative benchmarks is as follows: | Category | Benchmark | Phi-3.5-vision-instruct | Intern-VL-2-4B | Intern-VL-2-8B | Gemini-1.5-Flash | GPT-4o-mini 2024-7-18 | Claude-3.5-Sonnet | Gemini-1.5-Pro | GPT-4o 2024-5-13 | |--|--|--|--|--|--|--|--|--|--| | Popular aggregated benchmark | MMMU (val) | 43.0 | 44.22 | 46.33 | 49.33 | 52.1 | 52.67 | 54.11 | 61.78 | | | MMBench (dev-en) | 81.9 | 83.4 | 87.0 | 85.7 | 83.8 | 82.3 | 87.9 | 88.4 | | Visual scientific knowledge reasoning | ScienceQA (img-test) | 91.3 | 94.9 | 95.9 | 84.5 | 84.0 | 73.8 | 86.0 | 88.5 | | Visual math reasoning | MathVista (testmini) | 43.9 | 53.7 | 51.1 | 55.3 | 38.8 | 54.0 | 57.4 | 54.4 | | | InterGPS (test) | 36.3 | 45.6 | 53.2 | 39.4 | 39.9 | 45.6 | 58.2 | 46.9 | | Chart reasoning | AI2D (test) | 78.1 | 77.3 | 81.4 | 78.4 | 75.2 | 68.9 | 75.6 | 82.8 | | | ChartQA (test) | 81.8 | 78.8 | 80.4 | 57.6 | 54.5 | 73.2 | 68.2 | 64.0 | | Document Intelligence | TextVQA (val) | 72.0 | 66.2 | 68.8 | 67.4 | 70.9 | 70.5 | 64.5 | 75.6 | | Object visual presence verification | POPE (test) | 86.1 | 83.3 | 84.2 | 86.1 | 83.6 | 76.6 | 89.3 | 87.0 | ## Safety Evaluation and Red-Teaming **Approach** The Phi-3 family of models has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated datasets. The overall technique employed for safety alignment is a combination of SFT (Supervised Fine-Tuning) and RLHF (Reinforcement Learning from Human Feedback) approaches, utilizing human-labeled and synthetic English-language datasets, including publicly available datasets focusing on helpfulness and harmlessness as well as various questions and answers targeted to multiple safety categories. **Safety Evaluation** We leveraged various evaluation techniques including red teaming, adversarial conversation simulations, and safety evaluation benchmark datasets to evaluate Phi-3.5 models' propensity to produce undesirable outputs across multiple risk categories.
Several approaches were used to compensate for the limitations of one approach alone. Please refer to the technical report for more details on our safety alignment. ## Software * PyTorch * Transformers * Flash-Attention ## Hardware Note that by default, the Phi-3.5-vision-instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types: * NVIDIA A100 * NVIDIA A6000 * NVIDIA H100 ## License The model is licensed under the MIT license. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.", + "model_explanation_gemini": "Phi-3.5-vision-instruct is a lightweight multimodal model for image and text understanding, excelling in tasks like visual reasoning, OCR, chart analysis, and multi-image comparison with a 128K context length." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Phi-4-mini-instruct.json b/data/model_data_json/microsoft_Phi-4-mini-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..6291f9ec08f3cc436ab379906d3189482f6c798a --- /dev/null +++ b/data/model_data_json/microsoft_Phi-4-mini-instruct.json @@ -0,0 +1,46 @@ +{ + "model_id": "microsoft/Phi-4-mini-instruct", + "downloads": 376692, + "tags": [ + "transformers", + "safetensors", + "phi3", + "text-generation", + "nlp", + "code", + "conversational", + "custom_code", + "multilingual", + "ar", + "zh", + "cs", + "da", + "nl", + "en", + "fi", + "fr", + "de", + "he", + "hu", + "it", + "ja", + "ko", + "no", + "pl", + "pt", + "ru", + "es", + "sv", + "th", + "tr", + "uk", + "arxiv:2503.01743", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ar - zh - cs - da - nl - en - fi - fr - de - he - hu - it - ja - ko - 'no' - pl - pt - ru - es - sv - th - tr - uk library_name: transformers license: mit license_link: pipeline_tag: text-generation tags: - nlp - code widget: - messages: - role: user content: Can you provide ways to eat combinations of bananas and dragonfruits? --- 🎉**Phi-4**: [mini-reasoning | reasoning] | [multimodal-instruct | onnx]; [mini-instruct | onnx] ## Model Summary Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K token context length. The model underwent an enhancement process, incorporating both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures. 📰 Phi-4-mini Microsoft Blog
📖 Phi-4-mini Technical Report
👩‍🍳 Phi Cookbook
🏡 Phi Portal
🖥️ Try It Azure, Huggingface
🚀 Model paper ## Intended Uses ### Primary Use Cases The model is intended for broad multilingual commercial and research use. The model provides uses for general purpose AI systems and applications which require: 1) Memory/compute constrained environments 2) Latency bound scenarios 3) Strong reasoning (especially math and logic). The model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features. ### Use Case Considerations The model is not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models, as well as performance differences across languages, as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including but not limited to privacy, trade compliance laws, etc.) that are relevant to their use case. ***Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.*** ## Release Notes This release of Phi-4-mini-instruct is based on valuable user feedback from the Phi-3 series. The Phi-4-mini model employs a new architecture for efficiency and a larger vocabulary for multilingual support; better post-training techniques were used for instruction following and function calling, and additional data led to substantial gains on key capabilities. It is anticipated that most use cases will benefit from this release, but users are encouraged to test it in their particular AI applications. The enthusiastic support for the Phi-4 series is greatly appreciated. Feedback on Phi-4-mini-instruct is welcomed and crucial to the model’s evolution and improvement. ### Model Quality To understand the capabilities, the 3.8B-parameter Phi-4-mini-instruct model was compared with a set of models over a variety of benchmarks using an internal benchmark platform (see Appendix A for benchmark methodology).
A high-level overview of the model quality is as follows: | Benchmark | Similar size | | | | | 2x size | | | | | | |----------------------------------|-------------|-------------------|-------------------|-------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------| | | Phi-4 mini-Ins | Phi-3.5-mini-Ins | Llama-3.2-3B-Ins | Mistral-3B | Qwen2.5-3B-Ins | Qwen2.5-7B-Ins | Mistral-8B-2410 | Llama-3.1-8B-Ins | Llama-3.1-Tulu-3-8B | Gemma2-9B-Ins | GPT-4o-mini-2024-07-18 | | **Popular aggregated benchmark** | | | | | | | | | | | | | Arena Hard | 32.8 | 34.4 | 17.0 | 26.9 | 32.0 | 55.5 | 37.3 | 25.7 | 42.7 | 43.7 | 53.7 | | BigBench Hard (0-shot, CoT) | 70.4 | 63.1 | 55.4 | 51.2 | 56.2 | 72.4 | 53.3 | 63.4 | 55.5 | 65.7 | 80.4 | | MMLU (5-shot) | 67.3 | 65.5 | 61.8 | 60.8 | 65.0 | 72.6 | 63.0 | 68.1 | 65.0 | 71.3 | 77.2 | | MMLU-Pro (0-shot, CoT) | 52.8 | 47.4 | 39.2 | 35.3 | 44.7 | 56.2 | 36.6 | 44.0 | 40.9 | 50.1 | 62.8 | | **Reasoning** | | | | | | | | | | | | | ARC Challenge (10-shot) | 83.7 | 84.6 | 76.1 | 80.3 | 82.6 | 90.1 | 82.7 | 83.1 | 79.4 | 89.8 | 93.5 | | BoolQ (2-shot) | 81.2 | 77.7 | 71.4 | 79.4 | 65.4 | 80.0 | 80.5 | 82.8 | 79.3 | 85.7 | 88.7 | | GPQA (0-shot, CoT) | 25.2 | 26.6 | 24.3 | 24.4 | 23.4 | 30.6 | 26.3 | 26.3 | 29.9 | 39.1 | 41.1 | | HellaSwag (5-shot) | 69.1 | 72.2 | 77.2 | 74.6 | 74.6 | 80.0 | 73.5 | 72.8 | 80.9 | 87.1 | 88.7 | | OpenBookQA (10-shot) | 79.2 | 81.2 | 72.6 | 79.8 | 79.3 | 82.6 | 80.2 | 84.8 | 79.8 | 90.0 | 90.0 | | PIQA (5-shot) | 77.6 | 78.2 | 68.2 | 73.2 | 72.6 | 76.2 | 81.2 | 83.2 | 78.3 | 83.7 | 88.7 | | Social IQA (5-shot) | 72.5 | 75.1 | 68.3 | 73.9 | 75.3 | 75.3 | 77.6 | 71.8 | 73.4 | 74.7 | 82.9 | | TruthfulQA (MC2) (10-shot) | 66.4 | 65.2 | 59.2 | 62.9 | 64.3 | 69.4 | 63.0 | 69.2 | 64.1 | 76.6 | 78.2 | | Winogrande (5-shot) | 67.0 | 72.2 | 53.2 | 59.8 | 63.3 | 71.1 | 63.1 | 64.7 | 65.4 | 74.0 | 76.9 | | **Multilingual** | | | | | | | | | | | | | Multilingual MMLU (5-shot) | 49.3 | 51.8 | 48.1 | 46.4 | 55.9 | 64.4 | 53.7 | 56.2 | 54.5 | 63.8 | 72.9 | | MGSM (0-shot, CoT) | 63.9 | 49.6 | 44.6 | 44.6 | 53.5 | 64.5 | 56.7 | 56.7 | 58.6 | 75.1 | 81.7 | | **Math** | | | | | | | | | | | | | GSM8K (8-shot, CoT) | 88.6 | 76.9 | 75.6 | 80.1 | 80.6 | 88.7 | 81.9 | 82.4 | 84.3 | 84.9 | 91.3 | | MATH (0-shot, CoT) | 64.0 | 49.8 | 46.7 | 41.8 | 61.7 | 60.4 | 41.6 | 47.6 | 46.1 | 51.3 | 70.2 | | **Overall** | **63.5** | **60.5** | **56.2** | **56.9** | **60.1** | **67.9** | **60.2** | **62.3** | **60.9** | **65.0** | **75.5** | Overall, the model, with only 3.8B parameters, achieves a similar level of multilingual language understanding and reasoning ability as much larger models. However, it is still fundamentally limited by its size for certain tasks. The model simply does not have the capacity to store much factual knowledge; therefore, users may experience factual inaccuracies. However, it may be possible to mitigate this weakness by augmenting Phi-4 with a search engine, particularly when using the model in RAG settings. ## Usage ### Tokenizer Phi-4-mini-instruct supports a vocabulary size of up to tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size. ### Input Formats Given the nature of the training data, the Phi-4-mini-instruct model is best suited for prompts using specific formats.
Below are the two primary formats: #### Chat format This format is used for general conversation and instructions: #### Tool-enabled function-calling format This format is used when the user wants the model to provide function calls based on the given tools. The user should provide the available tools in the system prompt, wrapped by <|tool|> and <|/tool|> tokens. The tools should be specified in JSON format, using a JSON dump structure. Example: (a hedged illustration of this format appears after the model details in the Training section below) ### Inference with vLLM #### Requirements List of required packages: #### Example To perform inference using vLLM, you can use the following code snippet: ### Inference with Transformers #### Requirements The Phi-4 family has been integrated in the version of . The current version can be verified with: . Python 3.8 and 3.10 will work best. List of required packages: Phi-4-mini-instruct is also available in [Azure AI Studio]() #### Example After obtaining the Phi-4-mini-instruct model checkpoints, users can use this sample code for inference. ## Responsible AI Considerations Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: + Quality of Service: The Phi models are trained primarily on English text and some additional multilingual text. Languages other than English will experience worse performance as well as performance disparities across non-English languages. English language varieties with less representation in the training data might experience worse performance than standard American English. + Multilingual performance and safety gaps: We believe it is important to make language models more widely available across different languages, but the Phi 4 models still exhibit challenges common across multilingual releases. As with any deployment of LLMs, developers will be better positioned to test for performance or safety gaps for their linguistic and cultural context and customize the model with additional fine-tuning and appropriate safeguards. + Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups, cultural contexts, or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. + Inappropriate or Offensive Content: These models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case. + Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. + Limited Scope for Code: The majority of Phi 4 training data is based in Python and uses common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, it is strongly recommended that users manually verify all API uses. + Long Conversation: Phi 4 models, like other models, can in some cases generate responses that are repetitive, unhelpful, or inconsistent in very long chat sessions in both English and non-English languages.
Developers are encouraged to place appropriate mitigations, like limiting conversation turns to account for possible conversational drift. Developers should apply responsible AI best practices, including mapping, measuring, and mitigating risks associated with their specific use case and cultural, linguistic context. The Phi 4 family of models are general purpose models. As developers plan to deploy these models for specific use cases, they are encouraged to fine-tune the models for their use case and leverage the models as part of broader AI systems with language-specific safeguards in place. Important areas for consideration include: + Allocation: Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.) without further assessments and additional debiasing techniques. + High-Risk Scenarios: Developers should assess the suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. + Misinformation: Models may produce inaccurate information. Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG). + Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. + Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations. ## Training ### Model + **Architecture:** Phi-4-mini-instruct has 3.8B parameters and is a dense decoder-only Transformer model. When compared with Phi-3.5-mini, the major changes with Phi-4-mini-instruct are a 200K vocabulary, grouped-query attention, and shared input and output embeddings.
+ **Inputs:** Text. It is best suited for prompts using the chat format.
+ **Context length:** 128K tokens
+ **GPUs:** 512 A100-80G
+ **Training time:** 21 days
+ **Training data:** 5T tokens
+ **Outputs:** Generated text in response to the input
+ **Dates:** Trained between November and December 2024
+ **Status:** This is a static model trained on offline datasets with the cutoff date of June 2024 for publicly available data.
+ **Supported languages:** Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
+ **Release date:** February 2025
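To make the tool-enabled format from the Input Formats section above concrete, here is a hedged illustration of assembling such a prompt; the <|tool|>/<|/tool|> wrapping and the JSON-dump structure come from the card, while the tool definition itself is hypothetical.

```python
import json

# Hedged illustration of the tool-enabled function-calling format: available
# tools are JSON-dumped into the system prompt between <|tool|> and <|/tool|>
# tokens. The get_weather tool below is hypothetical.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "city": {"type": "str", "description": "Name of the city"},
        },
    }
]

system_prompt = (
    "You are a helpful assistant with access to these tools.\n"
    f"<|tool|>{json.dumps(tools)}<|/tool|>"
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the weather in Paris right now?"},
]
# `messages` can then be passed through the tokenizer's chat template exactly
# as in plain chat-format inference; the model is expected to reply with a
# function call referencing one of the declared tools.
```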
### Training Datasets Phi-4-mini’s training data includes a wide variety of sources, totaling 5 trillion tokens, and is a combination of 1) publicly available documents filtered for quality, selected high-quality educational data, and code; 2) newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, and general knowledge of the world (e.g., science, daily activities, theory of mind, etc.); 3) high-quality chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction-following, truthfulness, honesty and helpfulness. Focus was placed on the quality of data that could potentially improve the model’s reasoning ability, and the publicly available documents were filtered to contain a preferred level of knowledge. As an example, the result of a Premier League game on a particular day might be good training data for frontier models, but such information was removed to leave more model capacity for reasoning, given the model’s small size. More details about data can be found in the Phi-4-mini-instruct technical report. The decontamination process involved normalizing and tokenizing the dataset, then generating and comparing n-grams between the target dataset and benchmark datasets. Samples with matching n-grams above a threshold were flagged as contaminated and removed from the dataset. A detailed contamination report was generated, summarizing the matched text, matching ratio, and filtered results for further analysis. ### Fine-tuning A basic example of multi-GPU supervised fine-tuning (SFT) with TRL and Accelerate modules is provided here (a rough illustrative sketch also follows at the end of this section). ## Safety Evaluation and Red-Teaming Various evaluation techniques including red teaming, adversarial conversation simulations, and multilingual safety evaluation benchmark datasets were leveraged to evaluate Phi-4 models’ propensity to produce undesirable outputs across multiple languages and risk categories. Several approaches were used to compensate for the limitations of one approach alone. Findings across the various evaluation methods indicate that safety post-training that was done as detailed in the Phi 3 Safety Post-Training paper had a positive impact across multiple languages and risk categories as observed by refusal rates (refusal to output undesirable outputs) and robustness to jailbreak techniques. Details on prior red-team evaluations across Phi models can be found in the Phi 3 Safety Post-Training paper. For this release, the red team tested the model in English, Chinese, Japanese, Spanish, Portuguese, Arabic, Thai, and Russian for the following potential harms: Hate Speech and Bias, Violent Crimes, Specialized Advice, and Election Information. Their findings indicate that the model is resistant to jailbreak techniques across languages, but that language-specific attack prompts leveraging cultural context can cause the model to output harmful content. Another insight was that, in function-calling scenarios, the model could sometimes hallucinate function names or URLs. The model may also be more susceptible to longer multi-turn jailbreak techniques across both English and non-English languages. These findings highlight the need for industry-wide investment in the development of high-quality safety evaluation datasets across multiple languages, including low-resource languages, and risk areas that account for cultural nuances where those languages are spoken.
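For the Fine-tuning subsection above, which points to an official TRL/Accelerate recipe, the following is only a rough, hedged illustration of what multi-GPU SFT can look like; the dataset choice, hyperparameters, and paths are assumptions, not the recipe itself.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Rough illustration only; the card links the official recipe. Dataset choice,
# hyperparameters, and output paths are assumptions.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

config = SFTConfig(
    output_dir="./phi4-mini-sft",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
)
trainer = SFTTrainer(
    model="microsoft/Phi-4-mini-instruct",  # SFTTrainer also accepts a model id string
    args=config,
    train_dataset=dataset,
)
trainer.train()
# For multi-GPU runs, launch the script with `accelerate launch train_sft.py`
# so Accelerate handles device placement and gradient synchronization.
```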
## Software * PyTorch * Transformers * Flash-Attention ## Hardware Note that by default, the Phi-4-mini-instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types: * NVIDIA A100 * NVIDIA A6000 * NVIDIA H100 If you want to run the model on: * NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation=\"eager\" ## License The model is licensed under the MIT license. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies. ## Appendix A: Benchmark Methodology We include a brief word on methodology here and, in particular, how we think about optimizing prompts. In an ideal world, we would never change any prompts in our benchmarks to ensure it is always an apples-to-apples comparison when comparing different models. Indeed, this is our default approach, and is the case in the vast majority of models we have run to date. There are, however, some exceptions to this. In some cases, we see a model that performs worse than expected on a given eval due to a failure to respect the output format. For example: + A model may refuse to answer questions (for no apparent reason), or in coding tasks models may prefix their response with “Sure, I can help with that. …” which may break the parser. In such cases, we have opted to try different system messages (e.g. “You must always respond to a question” or “Get to the point!”). + With some models, we observed that few shots actually hurt model performance. In such cases, we did allow running the benchmarks with 0 shots for all cases. + We have tools to convert between chat and completions APIs. When converting a chat prompt to a completion prompt, some models have different keywords, e.g. Human vs User. In these cases, we do allow for model-specific mappings for chat to completion prompts. However, we do not: + Pick different few-shot examples. Few shots will always be the same when comparing different models. + Change the prompt format: e.g. if it is an A/B/C/D multiple choice, we do not tweak this to 1/2/3/4 multiple choice. ### Benchmark datasets The model was evaluated across a breadth of public and internal benchmarks to understand the model’s capabilities under multiple tasks and conditions. While most evaluations use English, a leading multilingual benchmark was incorporated that covers performance in select languages.
More specifically, + Reasoning: + Winogrande: commonsense reasoning around pronoun resolution + PIQA: physical commonsense reasoning around everyday situations + ARC-challenge: grade-school multiple choice science questions + GPQA: very hard questions written and validated by experts in biology, physics, and chemistry + MedQA: medical question answering + Social IQA: social commonsense intelligence + BoolQ: natural questions from context + TruthfulQA: grounded reasoning + Language understanding: + HellaSwag: commonsense natural language inference around everyday events + ANLI: adversarial natural language inference + Function calling: + Berkeley function calling: function and tool calling + Internal function calling benchmarks + World knowledge: + TriviaQA: trivia questions on general topics + Math: + GSM8K: grade-school math word problems + GSM8K Hard: grade-school math word problems with large values and some absurdity. + MATH: challenging competition math problems + Code: + HumanEval, HumanEval+, MBPP, MBPP+: Python coding tasks + LiveCodeBench, LiveBench: contamination-free code tasks + BigCode Bench: challenging programming tasks + Spider: SQL query tasks + Internal coding benchmarks + Instruction following: + IFEval: verifiable instructions + Internal instruction-following benchmarks + Multilingual: + MGSM: multilingual grade-school math + Multilingual MMLU and MMLU-pro + MEGA: multilingual NLP tasks + Popular aggregated datasets: MMLU, MMLU-pro, BigBench-Hard, AGI Eval + Multi-turn conversations: + Data generated by an in-house adversarial conversation simulation tool + Single-turn trustworthiness evaluation: + DecodingTrust: a collection of trustworthiness benchmarks in eight different perspectives + XSTest: exaggerated safety evaluation + Toxigen: adversarial and hate speech detection + Red Team: + Responses to prompts provided by the AI Red Team at Microsoft", + "model_explanation_gemini": "A lightweight, multilingual model optimized for high-quality reasoning and precise instruction-following in memory-constrained or latency-sensitive applications."
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_Phi-4-multimodal-instruct.json b/data/model_data_json/microsoft_Phi-4-multimodal-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..07720a4ef64e57524eb701b70cecfc06f4d5b544 --- /dev/null +++ b/data/model_data_json/microsoft_Phi-4-multimodal-instruct.json @@ -0,0 +1,52 @@ +{ + "model_id": "microsoft/Phi-4-multimodal-instruct", + "downloads": 353722, + "tags": [ + "transformers", + "safetensors", + "phi4mm", + "text-generation", + "nlp", + "code", + "audio", + "automatic-speech-recognition", + "speech-summarization", + "speech-translation", + "visual-question-answering", + "phi-4-multimodal", + "phi", + "phi-4-mini", + "custom_code", + "multilingual", + "ar", + "zh", + "cs", + "da", + "nl", + "en", + "fi", + "fr", + "de", + "he", + "hu", + "it", + "ja", + "ko", + "no", + "pl", + "pt", + "ru", + "es", + "sv", + "th", + "tr", + "uk", + "arxiv:2503.01743", + "arxiv:2407.13833", + "license:mit", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - multilingual - ar - zh - cs - da - nl - en - fi - fr - de - he - hu - it - ja - ko - no - pl - pt - ru - es - sv - th - tr - uk tags: - nlp - code - audio - automatic-speech-recognition - speech-summarization - speech-translation - visual-question-answering - phi-4-multimodal - phi - phi-4-mini widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: - messages: - role: user content: Transcribe the audio to text, and then translate the audio to French. Use as a separator between the original transcript and the translation. library_name: transformers paper: --- 🎉**Phi-4**: [mini-reasoning | reasoning] | [multimodal-instruct | onnx]; [mini-instruct | onnx] ## Model Summary Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for Phi-3.5 and 4.0 models. The model processes text, image, and audio inputs, generating text outputs, and comes with a 128K token context length. The model underwent an enhancement process, incorporating supervised fine-tuning, direct preference optimization, and RLHF (Reinforcement Learning from Human Feedback) to support precise instruction adherence and safety measures. The languages that each modality supports are the following: - Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian - Vision: English - Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese 📰 Phi-4-multimodal Microsoft Blog
📖 Phi-4-multimodal Technical Report
🏡 Phi Portal
👩‍🍳 Phi Cookbook
🖥️ Try It on Azure, GitHub, Nvidia, Huggingface playgrounds
📱 Huggingface Spaces Thoughts Organizer, Stories Come Alive, Phine Speech Translator
Watch as Phi-4 Multimodal analyzes spoken language to help plan a trip to Seattle, demonstrating its advanced audio processing and recommendation capabilities.
See how Phi-4 Multimodal tackles complex mathematical problems through visual inputs, demonstrating its ability to process and solve equations presented in images.
Explore how Phi-4 Mini functions as an intelligent agent, showcasing its reasoning and task execution abilities in complex scenarios.
## Intended Uses ### Primary Use Cases The model is intended for broad multilingual and multimodal commercial and research use. The model is suited to general purpose AI systems and applications which require 1) Memory/compute constrained environments 2) Latency bound scenarios 3) Strong reasoning (especially math and logic) 4) Function and tool calling 5) General image understanding 6) Optical character recognition 7) Chart and table understanding 8) Multiple image comparison 9) Multi-image or video clip summarization 10) Speech recognition 11) Speech translation 12) Speech QA 13) Speech summarization 14) Audio understanding The model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features. ### Use Case Considerations The model is not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models and multimodal models, as well as performance differences across languages, as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using the model within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including but not limited to privacy, trade compliance laws, etc.) that are relevant to their use case. ***Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.*** ## Release Notes This release of Phi-4-multimodal-instruct is based on valuable user feedback from the Phi-3 series. Previously, users could use a speech recognition model to talk to the Mini and Vision models. To achieve this, users needed to use a pipeline of two models: one model to transcribe the audio to text, and another model for the language or vision tasks. This pipeline means that the core model was not provided the full breadth of input information – e.g. it could not directly observe multiple speakers or background noises, nor jointly align speech, vision, and language information in the same representation space. With Phi-4-multimodal-instruct, a single new open model has been trained across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. The model employs a new architecture and a larger vocabulary for efficiency, adds multilingual and multimodal support, and uses better post-training techniques for instruction following and function calling, as well as additional data, leading to substantial gains on key multimodal capabilities. It is anticipated that Phi-4-multimodal-instruct will greatly benefit app developers and various use cases. The enthusiastic support for the Phi-4 series is greatly appreciated. Feedback on Phi-4 is welcomed and crucial to the model's evolution and improvement. Thank you for being part of this journey! ## Model Quality
Click to view details To understand the capabilities, Phi-4-multimodal-instruct was compared with a set of models over a variety of benchmarks using an internal benchmark platform (See Appendix A for benchmark methodology). Users can refer to the Phi-4-Mini-Instruct model card for details of language benchmarks. At a high level, the model quality on representative speech and vision benchmarks is as follows: ### Speech Phi-4-multimodal-instruct was observed as - Having strong automatic speech recognition (ASR) and speech translation (ST) performance, surpassing the expert ASR model WhisperV3 and the ST model SeamlessM4T-v2-Large. - Ranking number 1 on the Huggingface OpenASR leaderboard with a word error rate of 6.14%, compared with 6.5% for the next best model, as of March 04, 2025. - Being the first open-source model that can perform speech summarization, with performance close to GPT-4o. - Having a gap with closed models, e.g. Gemini-1.5-Flash and GPT-4o-realtime-preview, on the speech QA task. Work is being undertaken to improve this capability in the next iterations. #### Speech Recognition (lower is better) The performance of Phi-4-multimodal-instruct on the aggregated benchmark datasets: !alt text The performance of Phi-4-multimodal-instruct on different languages, averaging the WERs of CommonVoice and FLEURS: !alt text #### Speech Translation (higher is better) Translating from German, Spanish, French, Italian, Japanese, Portuguese, Chinese to English: !alt text Translating from English to German, Spanish, French, Italian, Japanese, Portuguese, Chinese. Note that WhisperV3 does not support this capability: !alt text #### Speech Summarization (higher is better) !alt text #### Speech QA MT bench scores are scaled by 10x to match the score range of MMMLU: !alt text #### Audio Understanding AIR bench scores are scaled by 10x to match the score range of MMAU: !alt text ### Vision #### Vision-Speech tasks Phi-4-multimodal-instruct is capable of processing both image and audio together; the following table shows the model quality when the input query for vision content is synthetic speech on chart/table understanding and document reasoning tasks. Compared to other existing state-of-the-art omni models that accept audio and visual signals as input, Phi-4-multimodal-instruct achieves much stronger performance on multiple benchmarks. | Benchmarks | Phi-4-multimodal-instruct | InternOmni-7B | Gemini-2.0-Flash-Lite-prv-02-05 | Gemini-2.0-Flash | Gemini-1.5-Pro | |-----------------------|--------------------------|---------------|--------------------------------|-----------------|----------------| | s_AI2D | **68.9** | 53.9 | 62.0 | **69.4** | 67.7 | | s_ChartQA | **69.0** | 56.1 | 35.5 | 51.3 | 46.9 | | s_DocVQA | **87.3** | 79.9 | 76.0 | 80.3 | 78.2 | | s_InfoVQA | **63.7** | 60.3 | 59.4 | 63.6 | **66.1** | | **Average** | **72.2** | **62.6** | **58.2** | **66.2** | **64.7** | ### Vision tasks To understand the vision capabilities, Phi-4-multimodal-instruct was compared with a set of models over a variety of zero-shot benchmarks using an internal benchmark platform.
At the high-level overview of the model quality on representative benchmarks: | Dataset | Phi-4-multimodal-ins | Phi-3.5-vision-ins | Qwen 2.5-VL-3B-ins | Intern VL 2.5-4B | Qwen 2.5-VL-7B-ins | Intern VL 2.5-8B | Gemini 2.0-Flash Lite-preview-0205 | Gemini2.0-Flash | Claude-3.5-Sonnet-2024-10-22 | Gpt-4o-2024-11-20 | |----------------------------------|---------------------|-------------------|-------------------|-----------------|-------------------|-----------------|--------------------------------|-----------------|----------------------------|------------------| | **Popular aggregated benchmark** | | | | | | | | | | | | MMMU | **55.1** | 43.0 | 47.0 | 48.3 | 51.8 | 50.6 | 54.1 | **64.7** | 55.8 | 61.7 | | MMBench (dev-en) | **86.7** | 81.9 | 84.3 | 86.8 | 87.8 | 88.2 | 85.0 | **90.0** | 86.7 | 89.0 | | MMMU-Pro (std/vision) | **38.5** | 21.8 | 29.9 | 32.4 | 36.9 | 34.4 | 45.1 | **54.4** | 54.3 | 53.0 | | **Visual science reasoning** | | | | | | | | | | | | ScienceQA Visual (img-test) | **97.5** | 91.3 | 79.4 | 96.2 | 87.7 | **97.3** | 85.0 | 88.3 | 81.2 | 88.2 | | **Visual math reasoning** | | | | | | | | | | | | MathVista (testmini) | **62.4** | 43.9 | 60.8 | 51.2 | **67.8** | 56.7 | 57.6 | 47.2 | 56.9 | 56.1 | | InterGPS | **48.6** | 36.3 | 48.3 | 53.7 | 52.7 | 54.1 | 57.9 | **65.4** | 47.1 | 49.1 | | **Chart & table reasoning** | | | | | | | | | | | | AI2D | **82.3** | 78.1 | 78.4 | 80.0 | 82.6 | 83.0 | 77.6 | 82.1 | 70.6 | **83.8** | | ChartQA | **81.4** | 81.8 | 80.0 | 79.1 | **85.0** | 81.0 | 73.0 | 79.0 | 78.4 | 75.1 | | DocVQA | **93.2** | 69.3 | 93.9 | 91.6 | **95.7** | 93.0 | 91.2 | 92.1 | 95.2 | 90.9 | | InfoVQA | **72.7** | 36.6 | 77.1 | 72.1 | **82.6** | 77.6 | 73.0 | 77.8 | 74.3 | 71.9 | | **Document Intelligence** | | | | | | | | | | | | TextVQA (val) | **75.6** | 72.0 | 76.8 | 70.9 | **77.7** | 74.8 | 72.9 | 74.4 | 58.6 | 73.1 | | OCR Bench | **84.4** | 63.8 | 82.2 | 71.6 | **87.7** | 74.8 | 75.7 | 81.0 | 77.0 | 77.7 | | **Object visual presence verification** | | | | | | | | | | | | POPE | **85.6** | 86.1 | 87.9 | 89.4 | 87.5 | **89.1** | 87.5 | 88.0 | 82.6 | 86.5 | | **Multi-image perception** | | | | | | | | | | | | BLINK | **61.3** | 57.0 | 48.1 | 51.2 | 55.3 | 52.5 | 59.3 | **64.0** | 56.9 | 62.4 | | Video MME 16 frames | **55.0** | 50.8 | 56.5 | 57.3 | 58.2 | 58.7 | 58.8 | 65.5 | 60.2 | **68.2** | | **Average** | **72.0** | **60.9** | **68.7** | **68.8** | **73.1** | **71.1** | **70.2** | **74.3** | **69.1** | **72.4** | !alt text #### Visual Perception Below are the comparison results on existing multi-image tasks. On average, Phi-4-multimodal-instruct outperforms competitor models of the same size and competitive with much bigger models on multi-frame capabilities. BLINK is an aggregated benchmark with 14 visual tasks that humans can solve very quickly but are still hard for current multimodal LLMs. 
| Dataset | Phi-4-multimodal-instruct | Qwen2.5-VL-3B-Instruct | InternVL 2.5-4B | Qwen2.5-VL-7B-Instruct | InternVL 2.5-8B | Gemini-2.0-Flash-Lite-prv-02-05 | Gemini-2.0-Flash | Claude-3.5-Sonnet-2024-10-22 | Gpt-4o-2024-11-20 | |----------------------------|--------------------------|----------------------|-----------------|----------------------|-----------------|--------------------------------|-----------------|----------------------------|------------------| | Art Style | **86.3** | 58.1 | 59.8 | 65.0 | 65.0 | 76.9 | 76.9 | 68.4 | 73.5 | | Counting | **60.0** | 67.5 | 60.0 | 66.7 | **71.7** | 45.8 | 69.2 | 60.8 | 65.0 | | Forensic Detection | **90.2** | 34.8 | 22.0 | 43.9 | 37.9 | 31.8 | 74.2 | 63.6 | 71.2 | | Functional Correspondence | **30.0** | 20.0 | 26.9 | 22.3 | 27.7 | 48.5 | **53.1** | 34.6 | 42.3 | | IQ Test | **22.7** | 25.3 | 28.7 | 28.7 | 28.7 | 28.0 | **30.7** | 20.7 | 25.3 | | Jigsaw | **68.7** | 52.0 | **71.3** | 69.3 | 53.3 | 62.7 | 69.3 | 61.3 | 68.7 | | Multi-View Reasoning | **76.7** | 44.4 | 44.4 | 54.1 | 45.1 | 55.6 | 41.4 | 54.9 | 54.1 | | Object Localization | **52.5** | 55.7 | 53.3 | 55.7 | 58.2 | 63.9 | **67.2** | 58.2 | 65.6 | | Relative Depth | **69.4** | 68.5 | 68.5 | 80.6 | 76.6 | **81.5** | 72.6 | 66.1 | 73.4 | | Relative Reflectance | **26.9** | **38.8** | **38.8** | 32.8 | **38.8** | 33.6 | 34.3 | 38.1 | 38.1 | | Semantic Correspondence | **52.5** | 32.4 | 33.8 | 28.8 | 24.5 | **56.1** | 55.4 | 43.9 | 47.5 | | Spatial Relation | **72.7** | 80.4 | 86.0 | **88.8** | 86.7 | 74.1 | 79.0 | 74.8 | 83.2 | | Visual Correspondence | **67.4** | 28.5 | 39.5 | 50.0 | 44.2 | 84.9 | **91.3** | 72.7 | 82.6 | | Visual Similarity | **86.7** | 67.4 | 88.1 | 87.4 | 85.2 | **87.4** | 80.7 | 79.3 | 83.0 | | **Overall** | **61.6** | **48.1** | **51.2** | **55.3** | **52.5** | **59.3** | **64.0** | **56.9** | **62.4** | !alt text
## Usage ### Requirements The Phi-4 family has been integrated in the version of . The current version can be verified with: . We suggest running with Python 3.10. Examples of required packages: Phi-4-multimodal-instruct is also available in Azure AI Studio ### Tokenizer Phi-4-multimodal-instruct supports a vocabulary size of up to tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size. ### Input Formats Given the nature of the training data, the Phi-4-multimodal-instruct model is best suited for prompts using the chat format as follows: #### Text chat format This format is used for general conversation and instructions: #### Tool-enabled function-calling format This format is used when the user wants the model to provide function calls based on the given tools. The user should provide the available tools in the system prompt, wrapped by <|tool|> and <|/tool|> tokens. The tools should be specified in JSON format, using a JSON dump structure. Example: #### Vision-Language Format This format is used for conversation with image: For multiple images, the user needs to insert multiple image placeholders in the prompt as below: #### Speech-Language Format This format is used for various speech and audio tasks: The task prompt can vary for different tasks. Automatic Speech Recognition: Automatic Speech Translation: Automatic Speech Translation with chain-of-thought: Spoken-query Question Answering: #### Vision-Speech Format This format is used for conversation with image and audio. The audio may contain a query related to the image: For multiple images, the user needs to insert multiple image placeholders in the prompt as below: **Vision** - Any common RGB/gray image format (e.g., (\".jpg\", \".jpeg\", \".png\", \".ppm\", \".bmp\", \".pgm\", \".tif\", \".tiff\", \".webp\")) can be supported. - Resolution depends on the GPU memory size. Higher resolution and more images will produce more tokens, thus using more GPU memory. During training, 64 crops can be supported. If it is a square image, the resolution would be around (8*448 by 8*448). For multiple images, at most 64 frames are supported, but with more frames as input, the resolution of each frame needs to be reduced to fit in the memory. **Audio** - Any audio format that can be loaded by the soundfile package should be supported. - To maintain satisfactory performance, the maximum audio length is suggested to be 40s. For summarization tasks, the maximum suggested audio length is 30 minutes. ### Loading the model locally After obtaining the Phi-4-multimodal-instruct model checkpoints, users can use this sample code for inference.
Click to view details
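The sample inference code is not preserved in this dump, so the following is only a minimal sketch. The `<|user|>`/`<|end|>`/`<|assistant|>` markers and the `<|image_1|>` placeholder follow our reading of the Vision-Language format above, the image URL is hypothetical, and the `_attn_implementation` choice mirrors the Hardware note further down — verify all of this against the checkpoint's processor and chat template before relying on it:

```python
# Minimal local-inference sketch (assumptions: prompt markers, hypothetical image URL).
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    _attn_implementation="eager",  # per the Hardware note; "flash_attention_2" on A100/A6000/H100
)

# Assumed Vision-Language prompt layout; check the tokenizer's chat template.
prompt = "<|user|><|image_1|>Describe this image.<|end|><|assistant|>"
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)  # hypothetical URL

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```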
More inference examples can be found **here**. ### vLLM inference Users can start a server with this command. The speech LoRA and vision LoRA folders are within the Phi-4-multimodal-instruct folder downloaded by vLLM; you can also use a script like the sketch shown after the model details below to find those: ## Training ### Fine-tuning Basic examples of supervised fine-tuning (SFT) for **speech** and **vision** are provided respectively, along with an example of **how to extend speech recognition to a new language**. ### Model + **Architecture:** Phi-4-multimodal-instruct has 5.6B parameters and is a multimodal transformer model. The model has the pretrained Phi-4-Mini-Instruct as its backbone language model, plus advanced vision and speech encoders and adapters.
+ **Inputs:** Text, image, and audio. It is best suited for prompts using the chat format.
+ **Context length:** 128K tokens
+ **GPUs:** 512 A100-80G
+ **Training time:** 28 days
+ **Training data:** 5T tokens, 2.3M speech hours, and 1.1T image-text tokens
+ **Outputs:** Generated text in response to the input
+ **Dates:** Trained between December 2024 and January 2025
+ **Status:** This is a static model trained on offline datasets with the cutoff date of June 2024 for publicly available data.
+ **Supported languages:** + Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
+ Vision: English
+ Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese
+ **Release date:** February 2025
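The locate-the-LoRA-folders script mentioned in the vLLM section above is not preserved here, so this is only a sketch under assumptions: it presumes the adapters ship as subfolders whose names contain "lora" (e.g. speech-lora and vision-lora) inside the Hugging Face snapshot that vLLM downloads:

```python
# Hypothetical helper: locate the speech/vision LoRA folders in the downloaded checkpoint.
from pathlib import Path
from huggingface_hub import snapshot_download

root = Path(snapshot_download("microsoft/Phi-4-multimodal-instruct"))
lora_dirs = sorted(p for p in root.iterdir() if p.is_dir() and "lora" in p.name.lower())
for path in lora_dirs:
    print(path)  # expected (assumption): .../speech-lora and .../vision-lora
```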
### Training Datasets Phi-4-multimodal-instruct's training data includes a wide variety of sources, totaling 5 trillion text tokens, and is a combination of 1) publicly available documents filtered for quality, selected high-quality educational data, and code 2) newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (e.g., science, daily activities, theory of mind, etc.) 3) high-quality human-labeled data in chat format 4) selected high-quality image-text interleave data 5) synthetic and publicly available image, multi-image, and video data 6) anonymized in-house speech-text pair data with strong/weak transcriptions 7) selected high-quality publicly available and anonymized in-house speech data with task-specific supervisions 8) selected synthetic speech data 9) synthetic vision-speech data. Focus was placed on the quality of data that could potentially improve the reasoning ability of the model, and the publicly available documents were filtered to contain a preferred level of knowledge. As an example, the result of a Premier League game on a particular day might be good training data for large foundation models, but such information was removed for Phi-4-multimodal-instruct to leave more capacity for reasoning given the model's small size. The data collection process involved sourcing information from publicly available documents, with a focus on filtering out undesirable documents and images. To safeguard privacy, image and text data sources were filtered to remove or scrub potentially personal data from the training data. The decontamination process involved normalizing and tokenizing the dataset, then generating and comparing n-grams between the target dataset and benchmark datasets. Samples with matching n-grams above a threshold were flagged as contaminated and removed from the dataset. A detailed contamination report was generated, summarizing the matched text, matching ratio, and filtered results for further analysis. ### Software * PyTorch * Transformers * Flash-Attention * Accelerate * soundfile * pillow ### Hardware Note that by default, the Phi-4-multimodal-instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types: * NVIDIA A100 * NVIDIA A6000 * NVIDIA H100 If you want to run the model on: * NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with _attn_implementation=\"eager\" ## Responsible AI Considerations
Click to view detail descriptions Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: + Quality of Service: The Phi models are trained primarily on English language content across text, speech, and visual inputs, with some additional multilingual coverage. Performance may vary significantly across different modalities and languages: + Text: Languages other than English will experience reduced performance, with varying levels of degradation across different non-English languages. English language varieties with less representation in the training data may perform worse than standard American English. + Speech: Speech recognition and processing shows similar language-based performance patterns, with optimal performance for standard American English accents and pronunciations. Other English accents, dialects, and non-English languages may experience lower recognition accuracy and response quality. Background noise, audio quality, and speaking speed can further impact performance. + Vision: Visual processing capabilities may be influenced by cultural and geographical biases in the training data. The model may show reduced performance when analyzing images containing text in non-English languages or visual elements more commonly found in non-Western contexts. Image quality, lighting conditions, and composition can also affect processing accuracy. + Multilingual performance and safety gaps: We believe it is important to make language models more widely available across different languages, but the Phi 4 models still exhibit challenges common across multilingual releases. As with any deployment of LLMs, developers will be better positioned to test for performance or safety gaps for their linguistic and cultural context and customize the model with additional fine-tuning and appropriate safeguards. + Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups, cultural contexts, or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. + Inappropriate or Offensive Content: These models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the case. + Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. + Limited Scope for Code: The majority of Phi 4 training data is based in Python and uses common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, it is strongly recommended that users manually verify all API uses. + Long Conversation: Phi 4 models, like other models, can in some cases generate responses that are repetitive, unhelpful, or inconsistent in very long chat sessions in both English and non-English languages. Developers are encouraged to place appropriate mitigations, like limiting conversation turns to account for the possible conversational drift. 
+ Inference of Sensitive Attributes: The Phi 4 models can sometimes attempt to infer sensitive attributes (such as personality characteristics, country of origin, gender, etc...) from the users’ voices when specifically asked to do so. Phi 4-multimodal-instruct is not designed or intended to be used as a biometric categorization system to categorize individuals based on their biometric data to deduce or infer their race, political opinions, trade union membership, religious or philosophical beliefs, sex life, or sexual orientation. This behavior can be easily and efficiently mitigated at the application level by a system message. Developers should apply responsible AI best practices, including mapping, measuring, and mitigating risks associated with their specific use case and cultural, linguistic context. Phi 4 family of models are general purpose models. As developers plan to deploy these models for specific use cases, they are encouraged to fine-tune the models for their use case and leverage the models as part of broader AI systems with language-specific safeguards in place. Important areas for consideration include: + Allocation: Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.) without further assessments and additional debiasing techniques. + High-Risk Scenarios: Developers should assess the suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. + Misinformation: Models may produce inaccurate information. Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG). + Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. + Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
## Safety
Click to view detail descriptions The Phi-4 family of models has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated datasets. The overall technique employed for safety alignment is a combination of SFT (Supervised Fine-Tuning), DPO (Direct Preference Optimization), and RLHF (Reinforcement Learning from Human Feedback) approaches by utilizing human-labeled and synthetic English-language datasets, including publicly available datasets focusing on helpfulness and harmlessness, as well as various questions and answers targeted to multiple safety categories. For non-English languages, existing datasets were extended via machine translation. Speech Safety datasets were generated by running Text Safety datasets through Azure TTS (Text-To-Speech) Service, for both English and non-English languages. Vision (text & images) Safety datasets were created to cover harm categories identified both in public and internal multi-modal RAI datasets. ### Safety Evaluation and Red-Teaming Various evaluation techniques including red teaming, adversarial conversation simulations, and multilingual safety evaluation benchmark datasets were leveraged to evaluate Phi-4 models' propensity to produce undesirable outputs across multiple languages and risk categories. Several approaches were used to compensate for the limitations of one approach alone. Findings across the various evaluation methods indicate that safety post-training that was done as detailed in the Phi 3 Safety Post-Training paper had a positive impact across multiple languages and risk categories as observed by refusal rates (refusal to output undesirable outputs) and robustness to jailbreak techniques. Details on prior red team evaluations across Phi models can be found in the Phi 3 Safety Post-Training paper. For this release, the red teaming effort focused on the newest Audio input modality and on the following safety areas: harmful content, self-injury risks, and exploits. The model was found to be more susceptible to providing undesirable outputs when attacked with context manipulation or persuasive techniques. These findings applied to all languages, with the persuasive techniques mostly affecting French and Italian. This highlights the need for industry-wide investment in the development of high-quality safety evaluation datasets across multiple languages, including low resource languages, and risk areas that account for cultural nuances where those languages are spoken. ### Vision Safety Evaluation To assess model safety in scenarios involving both text and images, Microsoft's Azure AI Evaluation SDK was utilized. This tool facilitates the simulation of single-turn conversations with the target model by providing prompt text and images designed to incite harmful responses. The target model's responses are subsequently evaluated by a capable model across multiple harm categories, including violence, sexual content, self-harm, hateful and unfair content, with each response scored based on the severity of the harm identified. The evaluation results were compared with those of Phi-3.5-Vision and open-source models of comparable size. In addition, we ran both an internal and the public RTVLM and VLGuard multi-modal (text & vision) RAI benchmarks, once again comparing scores with Phi-3.5-Vision and open-source models of comparable size. However, the model may be susceptible to language-specific attack prompts and cultural context. 
### Audio Safety Evaluation In addition to extensive red teaming, the safety of the model was assessed through three distinct evaluations. First, as performed with Text and Vision inputs, Microsoft's Azure AI Evaluation SDK was leveraged to detect the presence of harmful content in the model's responses to Speech prompts. Second, Microsoft's Speech Fairness evaluation was run to verify that Speech-To-Text transcription worked well across a variety of demographics. Third, we proposed and evaluated a mitigation approach via a system message to help prevent the model from inferring sensitive attributes (such as gender, sexual orientation, profession, medical condition, etc...) from the voice of a user.
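The exact system-message wording used in this mitigation is not published in the card, so the following is a purely hypothetical illustration of the approach; the `<|audio_1|>` placeholder follows the Speech-Language format described earlier:

```python
# Hypothetical system message; the actual mitigation text is not published.
messages = [
    {
        "role": "system",
        "content": (
            "Do not infer, guess, or comment on a speaker's gender, age, health, "
            "profession, or other sensitive attributes based on their voice."
        ),
    },
    {"role": "user", "content": "<|audio_1|>Summarize what the speaker is saying."},
]
```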
## License The model is licensed under the MIT license. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies. ## Appendix A: Benchmark Methodology
Click to view detail descriptions We include a brief word on methodology here - and in particular, how we think about optimizing prompts. In an ideal world, we would never change any prompts in our benchmarks to ensure it is always an apples-to-apples comparison when comparing different models. Indeed, this is our default approach, and is the case in the vast majority of models we have run to date. There are, however, some exceptions to this. In some cases, we see a model that performs worse than expected on a given eval due to a failure to respect the output format. For example: + A model may refuse to answer questions (for no apparent reason), or in coding tasks models may prefix their response with “Sure, I can help with that. …” which may break the parser. In such cases, we have opted to try different system messages (e.g. “You must always respond to a question” or “Get to the point!”). + For some models, we observed that few-shot examples actually hurt model performance. In such cases, we allowed running the benchmarks with 0 shots for all models. + We have tools to convert between chat and completions APIs. When converting a chat prompt to a completion prompt, some models have different keywords e.g. Human vs User. In these cases, we do allow for model-specific mappings for chat to completion prompts. However, we do not: + Pick different few-shot examples. Few shots will always be the same when comparing different models. + Change prompt format: e.g. if it is an A/B/C/D multiple choice, we do not tweak this to 1/2/3/4 multiple choice. ### Vision Benchmark Settings The goal of the benchmark setup is to measure the performance of the LMM when a regular user utilizes these models for a task involving visual input. To this end, we selected 9 popular and publicly available single-frame datasets and 3 multi-frame benchmarks that cover a wide range of challenging topics and tasks (e.g., mathematics, OCR tasks, charts-and-plots understanding, etc.) as well as a set of high-quality models. Our benchmarking setup utilizes zero-shot prompts and all the prompt content is the same for every model. We only formatted the prompt content to satisfy the model's prompt API. This ensures that our evaluation is fair across the set of models we tested. Many benchmarks necessitate models to choose their responses from a presented list of options. Therefore, we've included a directive in the prompt's conclusion, guiding all models to pick the option letter that corresponds to the answer they deem correct. In terms of the visual input, we use the images from the benchmarks as they come from the original datasets. We converted these images to base-64 using a JPEG encoding for models that require this format (e.g., GPTV, Claude Sonnet 3.5, Gemini 1.5 Pro/Flash). For other models (e.g., Llava Interleave, and InternVL2 4B and 8B), we used their Huggingface interface and passed in PIL images or a JPEG image stored locally. We did not scale or pre-process images in any other way. Lastly, we used the same code to extract answers and evaluate them for every considered model. This ensures that we are fair in assessing the quality of their answers. ### Speech Benchmark Settings The objective of this benchmarking setup is to assess the performance of models in speech and audio understanding tasks as utilized by regular users. To accomplish this, we selected several state-of-the-art open-sourced and closed-sourced models and performed evaluations across a variety of public and in-house benchmarks.
These benchmarks encompass diverse and challenging topics, including Automatic Speech Recognition (ASR), Automatic Speech Translation (AST), Spoken Query Question Answering (SQQA), Audio Understanding (AU), and Speech Summarization. The results are derived from evaluations conducted on identical test data without any further clarifications. All results were obtained without sampling during inference. For an accurate comparison, we employed consistent prompts for models across different tasks, except for certain model APIs (e.g., GPT-4o), which may refuse to respond to specific prompts for some tasks. In conclusion, we used uniform code to extract and evaluate answers for all considered models. This approach ensured fairness in assessing the quality of their responses. ### Benchmark datasets The model was evaluated across a breadth of public and internal benchmarks to understand its capabilities under multiple tasks and conditions. While most evaluations use English, multilingual benchmarks were incorporated to cover performance in select languages. More specifically, + Vision: + Popular aggregated benchmark: + MMMU and MMMU-Pro: massive multi-discipline tasks at college-level subject knowledge and deliberate reasoning. + MMBench: large-scale benchmark to evaluate perception and reasoning capabilities. + Visual reasoning: + ScienceQA: multimodal visual question answering on science. + MathVista: visual math reasoning. + InterGPS: visual 2D geometry reasoning. + Chart reasoning: + ChartQA: visual and logical reasoning on charts. + AI2D: diagram understanding. + Document Intelligence: + TextVQA: read and reason about text in images to answer questions about them. + InfoVQA: read and reason about high-resolution infographic images with arbitrary aspect ratios. + DocVQA: read and reason about document images with dense texts and handwritten texts. + OCRBench: test OCR and QA capability on diverse text-related images. + Vision speech multimodal understanding: + s_AI2D: diagram understanding with speech as the question format. + s_ChartQA: visual and logical reasoning on charts with speech as the question format. + s_InfoVQA: read and reason about high-resolution infographic images with speech as the question format. + s_DocVQA: read and reason about document images with dense texts and handwritten texts with speech as the question format. + RAI & Security Benchmarks: + VLGuardExt: VLGuard is a vision-language instruction-following public dataset for model safety, addressing safety on deception, discrimination, privacy and risky behavior (advice, sexual, violence, political). This was extended to a few internal categories such as child safety and election critical information. + RTVLM: Public benchmark for red-teaming vision-language models on model truthfulness, privacy, safety, and fairness. + GPTV-RAI: In-house benchmark for GPT-4V released from Azure AI, measuring harmfulness (ex. sexual, violent, hate and self-harm), privacy, jailbreak, misinformation. + Speech: + CommonVoice v15 is an open-source, multilingual speech dataset developed by Mozilla. It includes over 33,000 hours of speech data in 133 languages, contributed and validated by volunteers worldwide. The evaluations were conducted in the eight supported languages. + The OpenASR Leaderboard on Hugging Face is designed for benchmarking and evaluating the robustness of ASR models on English. The datasets in the leaderboard cover diverse speech domains including reading speech, conversations, meetings, and so on.
+ CoVoST2 is a multilingual speech-to-text translation dataset derived from Mozilla's Common Voice project. It is one of the largest open datasets available for speech translation, providing support for both X-to-English (X→En) and English-to-X (En→X) translation tasks. The directions with supported languages were evaluated on the test sets. + FLEURS is a multilingual speech dataset designed for evaluating speech recognition and speech-to-text translation models across a wide range of languages. The test sets for speech recognition and translation tasks were evaluated with the eight supported languages. + MT Bench (Multi-turn Benchmark) is specifically designed to evaluate the conversational and instruction-following abilities of AI models in multi-turn question-answering (QA) scenarios. To support spoken questions, the text is synthesized into speech. + MMMLU (Multilingual Massive Multitask Language Understanding) is an extensive benchmark designed to evaluate the general knowledge and reasoning capabilities of AI models across a wide array of subjects. To support spoken questions, the text is synthesized into its speech counterpart. The model was evaluated on the eight supported languages for this test set. + AIR-Bench Chat (Audio Instruction and Response Benchmark) is a comprehensive evaluation framework designed to test the capabilities of large audio language models (LALMs). It includes both foundation and chat benchmarks. The chat benchmark is selected for its open-ended question answering for audio capability. + MMAU (Massive Multi-Task Audio Understanding) is a comprehensive dataset designed to evaluate the capabilities of multi-modal models in audio-based understanding and reasoning tasks. The test sets are in the form of multiple-choice QA, covering the categories of music, sound, and speech. + Golden3 is a real-world meeting dataset, containing 108 meeting recordings with corresponding transcripts, averaging 6 minutes each. It is recorded across 30 conference rooms, featuring 4-8 attendees. The dataset is primarily in English, covering a wide range of topics. GPT4 is employed to generate summarization instructions that ask to summarize part or all of the conversation, or to control the output style/length/structure. + AMI (Augmented Multi-Party Interaction) is a comprehensive collection of meeting recordings, encompassing approximately 100 hours of data. The test split contains 20 meeting recordings with an average duration of 32 minutes. The model was tested on the close-talking version of the audio. GPT4 is employed to generate summarization instructions that ask to summarize part or all of the conversation, or to control the output style/length/structure. + Safety and RAI: + Single-turn trustworthiness evaluation: + DecodingTrust: DecodingTrust is a collection of trustworthiness benchmarks in eight different perspectives + XSTest: XSTest is an exaggerated safety evaluation + Toxigen: Toxigen is adversarial and hate speech detection + Red Team: + Responses to prompts provided by AI Red Team at Microsoft
## Appendix B: Fine-tuning Korean speech
Click to view detail descriptions ### Overview and Datasets Phi-4-multimodal was not originally designed for Korean speech-to-text, but it can be fine-tuned for the task using your own data or public Korean speech datasets. We have fine-tuned the Phi-4-multimodal model for Korean speech-to-text using the following datasets: - kresnik/zeroth_korean - mozilla-foundation/common_voice_17_0 (Used Korean speech only) - PolyAI/minds14 (Used Korean speech only) - Custom dataset. The speech was a mix of fast and slow speech (technical blog contents and presentations that the author has posted), with some modulation using audiomentations and this script. In total, 35K samples. Each sample is a pair of Korean speech and its transcription. The dataset was sampled at 16 kHz. You can download the fine-tuned model here. Please refer to the Jupyter notebook and video clips in the demo folder. They are not production-quality as they were simply fine-tuned for PoC purposes, but you can see that they transcribe and translate with high accuracy even when a native speaker speaks quite quickly. ### Requirements Based on Python 3.10, the following packages are required, and an A100/H100 GPU is recommended. ### Training The model was trained on a single A100 80GB GPU for 4 epochs with a batch size of 16 using the script from microsoft/Phi-4-multimodal-instruct. The fine-tuning script and command line are basically the same as here, but you need to prepare your own dataset. Also, to unfreeze the audio encoder, please refer to the code sketch below, which reconstructs the idea from the fine-tuning Colab notebook. Example commands to run the fine-tuning scripts are as follows: The latest version of the model currently uploaded was fine-tuned by **unfreezing the audio encoder**, and the ASR performance was significantly improved compared to the baseline LoRA adapter-based fine-tuning. Comparing full fine-tuning and LoRA fine-tuning, the CER on the zeroth-test set is **1.61%** and 2.72%, and the WER on the zeroth-test set is **3.54%** and 7.19%, respectively. Please refer to the Experimental Settings and Results for more details. ### Experimental Settings and Results The purpose of this benchmarking setup is to evaluate the basic performance of Korean audio in speech and audio understanding tasks. We did this for automatic speech recognition and automatic speech translation, and the test data used the following datasets and samples: + ASR (Automatic Speech Recognition): Evaluated with CER (Character Error Rate) and WER (Word Error Rate) on the zeroth-test set (457 samples). + AST (Automatic Speech Translation): Evaluated with BLEU score on the fleurs ko <-> en speech translation test set (270 samples). The evaluation script is retrieved from here. We used Phi-4-mm-inst-zeroth-kor as a baseline, as it showed significant performance improvement after 1 epoch. Note that the baseline was trained with 22K Zeroth Korean speech samples for 1 epoch. Based on this baseline, with 35K training samples, we conducted additional experiments with the following scenarios: + [Case 1] LoRA finetune (1 epoch): LoRA adapter-based fine-tuning for 1 epoch + [Case 2] LoRA finetune (4 epochs): LoRA adapter-based fine-tuning for 4 epochs + [Case 3] Unfreeze audio encoder finetune (4 epochs): Full fine-tuning for 4 epochs.
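The Colab snippet for unfreezing the audio encoder is not reproduced in this dump; below is a minimal sketch of the idea, under the assumption that the audio-encoder parameter names contain "audio" — inspect `model.named_parameters()` on the real checkpoint to confirm the module names:

```python
# Sketch: unfreeze the audio encoder for full fine-tuning (Case 3 above).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-multimodal-instruct", trust_remote_code=True
)

# Freeze everything, then re-enable the audio stack so it trains jointly.
for param in model.parameters():
    param.requires_grad = False
for name, param in model.named_parameters():
    if "audio" in name.lower():  # assumption: audio-encoder params carry "audio" in their names
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```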
The results of the experiments are as follows: + CER and WER for the zeroth-test set (Lower is better) + Case 1's CER and WER are 3.80% and 11.52%, respectively, which are better than the baseline (7.02% and 17.31%). + Case 2's CER and WER are 2.72% and 7.19%, respectively, which are better than Case 1. + Case 3's CER and WER are 1.61% and 3.54%, respectively, which are the best among the cases. + BLEU score for the fleurs ko <-> en speech translation test set (Higher is better) + Case 1's result is not improved compared to the baseline. In particular, the BLEU score for fleurs-ko2en-cot decreased compared to the baseline. + Case 2's result is slightly improved compared to Case 1 and is the best among the cases. + Case 3's result is not improved compared to the baseline and Case 2. | Model | zeroth (CER) | zeroth (WER) | fleurs-ko2en | fleurs-ko2en-cot | fleurs-en2ko | fleurs-en2ko-cot | |--------------------------------|-------------|-------------|--------------|------------------|--------------|------------------| | original | 99.16 | 99.63 | 5.63 | 2.42 | 6.86 | 4.17 | | Ours - speech full finetune (4 epochs) | 1.61 | 3.54 | 7.67 | 8.38 | 12.31 | 9.69 | | LoRA finetune (4 epochs) | 2.72 | 7.19 | 7.11 | 9.95 | 13.22 | 10.45 | | LoRA finetune (1 epoch) | 3.80 | 11.52 | 7.03 | 7.04 | 12.50 | 9.54 | | Phi-4-mm-inst-zeroth-kor | 7.02 | 17.31 | 7.07 | 9.19 | 13.08 | 9.35 | ## Cautions Note that this model is for PoC/experimental purposes only and is not intended to be used in production. More high-quality data, tuning, ablation studies, and experiments are needed. The Phi-4-multimodal model is strong in multimodal tasks, especially speech-to-text, and shows high potential for Korean language tasks. Thus, if you are interested in Korean speech-to-text, this model can be a good starting point. ## References - -
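For context on the CER/WER numbers above: metrics of this kind can be computed with the open-source jiwer package. This is only an illustrative sketch with toy strings, not necessarily the evaluation script used for these tables:

```python
# Illustrative CER/WER computation with jiwer (toy strings, not the real test set).
import jiwer

reference = "안녕하세요 만나서 반갑습니다"
hypothesis = "안녕하세요 만나서 반갑습니다"

print(f"WER: {jiwer.wer(reference, hypothesis):.4f}")  # word error rate
print(f"CER: {jiwer.cer(reference, hypothesis):.4f}")  # character error rate
```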
", + "model_explanation_gemini": "Processes text, image, and audio inputs to generate text outputs for multilingual and multimodal tasks like speech recognition, translation, visual understanding, and reasoning." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_beit-base-patch16-224-pt22k-ft22k.json b/data/model_data_json/microsoft_beit-base-patch16-224-pt22k-ft22k.json new file mode 100644 index 0000000000000000000000000000000000000000..159349de5077052ee805ffae9547ab168b9c16c5 --- /dev/null +++ b/data/model_data_json/microsoft_beit-base-patch16-224-pt22k-ft22k.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/beit-base-patch16-224-pt22k-ft22k", + "downloads": 473407, + "tags": [ + "transformers", + "pytorch", + "jax", + "beit", + "image-classification", + "vision", + "dataset:imagenet", + "dataset:imagenet-21k", + "arxiv:2106.08254", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - image-classification - vision datasets: - imagenet - imagenet-21k --- # BEiT (base-sized model, fine-tuned on ImageNet-22k) BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on the same dataset at resolution 224x224. It was introduced in the paper BEIT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong and Furu Wei and first released in this repository. Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The BEiT model is a Vision Transformer (ViT), which is a transformer encoder model (BERT-like). In contrast to the original ViT model, BEiT is pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The pre-training objective for the model is to predict visual tokens from the encoder of OpenAI's DALL-E's VQ-VAE, based on masked patches. Next, the model was fine-tuned in a supervised fashion on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Contrary to the original ViT models, BEiT models do use relative position embeddings (similar to T5) instead of absolute position embeddings, and perform classification of images by mean-pooling the final hidden states of the patches, instead of placing a linear layer on top of the final hidden state of the [CLS] token. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. Alternatively, one can mean-pool the final hidden states of the patch embeddings, and place a linear layer on top of that. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. 
Currently, both the feature extractor and model support PyTorch. ## Training data The BEiT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on the same dataset. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining For all pre-training related hyperparameters, we refer to page 15 of the original paper. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 1 and 2 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution. Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A self-supervised vision transformer model pre-trained and fine-tuned on ImageNet-22k for image classification tasks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_beit-large-patch16-224.json b/data/model_data_json/microsoft_beit-large-patch16-224.json new file mode 100644 index 0000000000000000000000000000000000000000..a02e55dff6112ff77ef2be04f39da292f83ad13d --- /dev/null +++ b/data/model_data_json/microsoft_beit-large-patch16-224.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/beit-large-patch16-224", + "downloads": 343578, + "tags": [ + "transformers", + "pytorch", + "jax", + "beit", + "image-classification", + "vision", + "dataset:imagenet", + "dataset:imagenet-21k", + "arxiv:2106.08254", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - image-classification - vision datasets: - imagenet - imagenet-21k --- # BEiT (large-sized model, fine-tuned on ImageNet-1k) BEiT model pre-trained in a self-supervised fashion on ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper BEIT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong and Furu Wei and first released in this repository. Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The BEiT model is a Vision Transformer (ViT), which is a transformer encoder model (BERT-like). In contrast to the original ViT model, BEiT is pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The pre-training objective for the model is to predict visual tokens from the encoder of OpenAI's DALL-E's VQ-VAE, based on masked patches. Next, the model was fine-tuned in a supervised fashion on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded.
Contrary to the original ViT models, BEiT models do use relative position embeddings (similar to T5) instead of absolute position embeddings, and perform classification of images by mean-pooling the final hidden states of the patches, instead of placing a linear layer on top of the final hidden state of the [CLS] token. By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. Alternatively, one can mean-pool the final hidden states of the patch embeddings, and place a linear layer on top of that. ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: Currently, both the feature extractor and model support PyTorch. ## Training data The BEiT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on ImageNet, a dataset consisting of 1 million images and 1k classes. ## Training procedure ### Preprocessing The exact details of preprocessing of images during training/validation can be found here. Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5). ### Pretraining For all pre-training related hyperparameters, we refer to page 15 of the original paper. ## Evaluation results For evaluation results on several image classification benchmarks, we refer to tables 1 and 2 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance. ### BibTeX entry and citation info", + "model_explanation_gemini": "A self-supervised vision transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_codebert-base.json b/data/model_data_json/microsoft_codebert-base.json new file mode 100644 index 0000000000000000000000000000000000000000..8c78db08cdbce2a9aad6b68c792d4d98b47d0e85 --- /dev/null +++ b/data/model_data_json/microsoft_codebert-base.json @@ -0,0 +1,18 @@ +{ + "model_id": "microsoft/codebert-base", + "downloads": 1400744, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "rust", + "roberta", + "feature-extraction", + "arxiv:2002.08155", + "endpoints_compatible", + "region:us" + ], + "description": "## CodeBERT-base Pretrained weights for CodeBERT: A Pre-Trained Model for Programming and Natural Languages. ### Training Data The model is trained on bi-modal data (documents & code) of CodeSearchNet ### Training Objective This model is initialized with Roberta-base and trained with MLM+RTD objective (cf. the paper). ### Usage Please see the official repository for scripts that support \"code search\" and \"code-to-document generation\". ### Reference 1. CodeBERT trained with Masked LM objective (suitable for code completion) 2. 
🤗 Hugging Face's CodeBERTa (small size, 6 layers) ### Citation", + "model_explanation_gemini": "Pretrained for programming and natural languages using bi-modal data, supporting code search and code-to-document generation tasks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-base-mnli.json b/data/model_data_json/microsoft_deberta-base-mnli.json new file mode 100644 index 0000000000000000000000000000000000000000..217aa14243974234d01f3b8f18539ed60759ce6c --- /dev/null +++ b/data/model_data_json/microsoft_deberta-base-mnli.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/deberta-base-mnli", + "downloads": 90216, + "tags": [ + "transformers", + "pytorch", + "rust", + "deberta", + "text-classification", + "deberta-v1", + "deberta-mnli", + "en", + "arxiv:2006.03654", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta-v1 - deberta-mnli tasks: mnli thumbnail: license: mit widget: - text: \"[CLS] I love you. [SEP] I like you. [SEP]\" --- ## DeBERTa: Decoding-enhanced BERT with Disentangled Attention DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on the majority of NLU tasks with 80GB training data. Please check the official repository for more details and updates. This model is the base DeBERTa model fine-tuned with the MNLI task #### Fine-tuning on NLU tasks We present the dev results on SQuAD 1.1/2.0 and MNLI tasks. | Model | SQuAD 1.1 | SQuAD 2.0 | MNLI-m | |-------------------|-----------|-----------|--------| | RoBERTa-base | 91.5/84.6 | 83.7/80.5 | 87.6 | | XLNet-Large | -/- | -/80.2 | 86.8 | | **DeBERTa-base** | 93.1/87.2 | 86.2/83.1 | 88.8 | ### Citation If you find DeBERTa useful for your work, please cite the following paper:", + "model_explanation_gemini": "microsoft_deberta-base-mnli is a DeBERTa-base model fine-tuned for natural language inference (MNLI), outperforming BERT and RoBERTa on NLU tasks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-base.json b/data/model_data_json/microsoft_deberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..c8b29e52f137462d106b9607f49036a9b6914357 --- /dev/null +++ b/data/model_data_json/microsoft_deberta-base.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/deberta-base", + "downloads": 263476, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "deberta", + "deberta-v1", + "fill-mask", + "en", + "arxiv:2006.03654", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta-v1 - fill-mask thumbnail: license: mit --- ## DeBERTa: Decoding-enhanced BERT with Disentangled Attention DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on the majority of NLU tasks with 80GB training data. Please check the official repository for more details and updates. #### Fine-tuning on NLU tasks We present the dev results on SQuAD 1.1/2.0 and MNLI tasks.
| Model | SQuAD 1.1 | SQuAD 2.0 | MNLI-m | |-------------------|-----------|-----------|--------| | RoBERTa-base | 91.5/84.6 | 83.7/80.5 | 87.6 | | XLNet-Large | -/- | -/80.2 | 86.8 | | **DeBERTa-base** | 93.1/87.2 | 86.2/83.1 | 88.8 | ### Citation If you find DeBERTa useful for your work, please cite the following paper:", + "model_explanation_gemini": "DeBERTa enhances BERT and RoBERTa with disentangled attention and improved mask decoding, achieving superior performance on natural language understanding tasks like SQuAD and MNLI." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-large-mnli.json b/data/model_data_json/microsoft_deberta-large-mnli.json new file mode 100644 index 0000000000000000000000000000000000000000..275dc1d7e2fb9b451a82973b5bc092141165dc7c --- /dev/null +++ b/data/model_data_json/microsoft_deberta-large-mnli.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/deberta-large-mnli", + "downloads": 1360800, + "tags": [ + "transformers", + "pytorch", + "deberta", + "text-classification", + "deberta-v1", + "deberta-mnli", + "en", + "arxiv:2006.03654", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta-v1 - deberta-mnli tasks: mnli thumbnail: license: mit widget: - text: \"[CLS] I love you. [SEP] I like you. [SEP]\" --- ## DeBERTa: Decoding-enhanced BERT with Disentangled Attention DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on majority of NLU tasks with 80GB training data. Please check the official repository for more details and updates. This is the DeBERTa large model fine-tuned with MNLI task. #### Fine-tuning on NLU tasks We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks. | Model | SQuAD 1.1 | SQuAD 2.0 | MNLI-m/mm | SST-2 | QNLI | CoLA | RTE | MRPC | QQP |STS-B | |---------------------------|-----------|-----------|-------------|-------|------|------|--------|-------|-------|------| | | F1/EM | F1/EM | Acc | Acc | Acc | MCC | Acc |Acc/F1 |Acc/F1 |P/S | | BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- |90.0/- | | RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- |92.4/- | | XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- |92.5/- | | DeBERTa-Large1 | 95.5/90.1 | 90.7/88.0 | 91.3/91.1| 96.5|95.3| 69.5| 91.0| 92.6/94.6| 92.3/- |92.8/92.5 | | DeBERTa-XLarge1 | -/- | -/- | 91.5/91.2| 97.0 | - | - | 93.1 | 92.1/94.3 | - |92.9/92.7| | DeBERTa-V2-XLarge1|95.8/90.8| 91.4/88.9|91.7/91.6| **97.5**| 95.8|71.1|**93.9**|92.0/94.2|92.3/89.8|92.9/92.9| |**DeBERTa-V2-XXLarge1,2**|**96.1/91.4**|**92.2/89.7**|**91.7/91.9**|97.2|**96.0**|**72.0**| 93.5| **93.1/94.9**|**92.7/90.3** |**93.2/93.1** | -------- #### Notes. - 1 Following RoBERTa, for RTE, MRPC, STS-B, we fine-tune the tasks based on DeBERTa-Large-MNLI, DeBERTa-XLarge-MNLI, DeBERTa-V2-XLarge-MNLI, DeBERTa-V2-XXLarge-MNLI. The results of SST-2/QQP/QNLI/SQuADv2 will also be slightly improved when start from MNLI fine-tuned models, however, we only report the numbers fine-tuned from pretrained base models for those 4 tasks. 
- 2 To try the **XXLarge** model with **HF transformers**, you need to specify **--sharded_ddp** ### Citation If you find DeBERTa useful for your work, please cite the following paper:", + "model_explanation_gemini": "Microsoft's DeBERTa-large model fine-tuned for Multi-Genre Natural Language Inference (MNLI), excelling at determining logical relationships between text pairs." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-v2-xlarge.json b/data/model_data_json/microsoft_deberta-v2-xlarge.json new file mode 100644 index 0000000000000000000000000000000000000000..5533ed410146ac690703ea662f57b254bcc73f09 --- /dev/null +++ b/data/model_data_json/microsoft_deberta-v2-xlarge.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/deberta-v2-xlarge", + "downloads": 114329, + "tags": [ + "transformers", + "pytorch", + "tf", + "deberta-v2", + "deberta", + "fill-mask", + "en", + "arxiv:2006.03654", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta - fill-mask thumbnail: license: mit --- ## DeBERTa: Decoding-enhanced BERT with Disentangled Attention DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on majority of NLU tasks with 80GB training data. Please check the official repository for more details and updates. This is the DeBERTa V2 xlarge model with 24 layers, 1536 hidden size. The total parameters are 900M and it is trained with 160GB raw data. ### Fine-tuning on NLU tasks We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks. | Model | SQuAD 1.1 | SQuAD 2.0 | MNLI-m/mm | SST-2 | QNLI | CoLA | RTE | MRPC | QQP |STS-B | |---------------------------|-----------|-----------|-------------|-------|------|------|--------|-------|-------|------| | | F1/EM | F1/EM | Acc | Acc | Acc | MCC | Acc |Acc/F1 |Acc/F1 |P/S | | BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- |90.0/- | | RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- |92.4/- | | XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- |92.5/- | | DeBERTa-Large1 | 95.5/90.1 | 90.7/88.0 | 91.3/91.1| 96.5|95.3| 69.5| 91.0| 92.6/94.6| 92.3/- |92.8/92.5 | | DeBERTa-XLarge1 | -/- | -/- | 91.5/91.2| 97.0 | - | - | 93.1 | 92.1/94.3 | - |92.9/92.7| | DeBERTa-V2-XLarge1|95.8/90.8| 91.4/88.9|91.7/91.6| **97.5**| 95.8|71.1|**93.9**|92.0/94.2|92.3/89.8|92.9/92.9| |**DeBERTa-V2-XXLarge1,2**|**96.1/91.4**|**92.2/89.7**|**91.7/91.9**|97.2|**96.0**|**72.0**| 93.5| **93.1/94.9**|**92.7/90.3** |**93.2/93.1** | -------- #### Notes. - 1 Following RoBERTa, for RTE, MRPC, STS-B, we fine-tune the tasks based on DeBERTa-Large-MNLI, DeBERTa-XLarge-MNLI, DeBERTa-V2-XLarge-MNLI, DeBERTa-V2-XXLarge-MNLI. The results of SST-2/QQP/QNLI/SQuADv2 will also be slightly improved when start from MNLI fine-tuned models, however, we only report the numbers fine-tuned from pretrained base models for those 4 tasks. - 2 To try the **XXLarge** model with **HF transformers**, you need to specify **--sharded_ddp** ### Citation If you find DeBERTa useful for your work, please cite the following paper:", + "model_explanation_gemini": "DeBERTa-V2-XLarge is a large-scale natural language understanding model that enhances BERT and RoBERTa with disentangled attention and improved mask decoding, achieving superior performance on tasks like SQuAD and GLUE benchmarks." 
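The MNLI-fine-tuned DeBERTa entries above lost their usage snippets in this dump. As a rough, hedged sketch only (assuming the standard transformers text-classification pipeline and whatever label map the checkpoint's config ships), a premise/hypothesis pair can be scored like this:

```python
# Hedged sketch, not the card's original snippet: score an MNLI-style
# premise/hypothesis pair with the fine-tuned checkpoint described above.
from transformers import pipeline

nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

# Mirrors the card's widget example "[CLS] I love you. [SEP] I like you. [SEP]".
result = nli({"text": "I love you.", "text_pair": "I like you."})
print(result)  # label names (e.g. ENTAILMENT/NEUTRAL/CONTRADICTION) come from the checkpoint config
```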
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-v3-base.json b/data/model_data_json/microsoft_deberta-v3-base.json new file mode 100644 index 0000000000000000000000000000000000000000..1d03ef8395bb487e801e9e925229c015beea72dd --- /dev/null +++ b/data/model_data_json/microsoft_deberta-v3-base.json @@ -0,0 +1,22 @@ +{ + "model_id": "microsoft/deberta-v3-base", + "downloads": 1451081, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "deberta-v2", + "deberta", + "deberta-v3", + "fill-mask", + "en", + "arxiv:2006.03654", + "arxiv:2111.09543", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta - deberta-v3 - fill-mask thumbnail: license: mit --- ## DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly improves the model performance on downstream tasks. You can find more technique details about the new model from our paper. Please check the official repository for more implementation details and updates. The DeBERTa V3 base model comes with 12 layers and a hidden size of 768. It has only 86M backbone parameters with a vocabulary containing 128K tokens which introduces 98M parameters in the Embedding layer. This model was trained using the 160GB data as DeBERTa V2. #### Fine-tuning on NLU tasks We present the dev results on SQuAD 2.0 and MNLI tasks. | Model |Vocabulary(K)|Backbone #Params(M)| SQuAD 2.0(F1/EM) | MNLI-m/mm(ACC)| |-------------------|----------|-------------------|-----------|----------| | RoBERTa-base |50 |86 | 83.7/80.5 | 87.6/- | | XLNet-base |32 |92 | -/80.2 | 86.8/- | | ELECTRA-base |30 |86 | -/80.5 | 88.8/ | | DeBERTa-base |50 |100 | 86.2/83.1| 88.8/88.5| | DeBERTa-v3-base |128|86 | **88.4/85.4** | **90.6/90.7**| | DeBERTa-v3-base + SiFT |128|86 | -/- | 91.0/-| We present the dev results on SQuAD 1.1/2.0 and MNLI tasks. #### Fine-tuning with HF transformers ### Citation If you find DeBERTa useful for your work, please cite the following papers:", + "model_explanation_gemini": "DeBERTaV3 is an improved natural language understanding model that outperforms predecessors like BERT and RoBERTa on tasks such as question answering and text classification through advanced pre-training techniques." 
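The "Fine-tuning with HF transformers" section of the deberta-v3-base card above is empty in this dump. A minimal, hedged stand-in for the checkpoint's declared fill-mask task (assuming sentencepiece is installed for the DeBERTa-v2 tokenizer) might look like:

```python
# Minimal fill-mask sketch for microsoft/deberta-v3-base; an illustrative
# stand-in, not the stripped original snippet.
from transformers import pipeline

fill = pipeline("fill-mask", model="microsoft/deberta-v3-base")
for candidate in fill("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 4))
```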
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-v3-large.json b/data/model_data_json/microsoft_deberta-v3-large.json new file mode 100644 index 0000000000000000000000000000000000000000..a402f68bf2ad72b17515fb8f91a05d5229432084 --- /dev/null +++ b/data/model_data_json/microsoft_deberta-v3-large.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/deberta-v3-large", + "downloads": 326441, + "tags": [ + "transformers", + "pytorch", + "tf", + "deberta-v2", + "deberta", + "deberta-v3", + "fill-mask", + "en", + "arxiv:2006.03654", + "arxiv:2111.09543", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta - deberta-v3 - fill-mask thumbnail: license: mit --- ## DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly improves the model performance on downstream tasks. You can find more technique details about the new model from our paper. Please check the official repository for more implementation details and updates. The DeBERTa V3 large model comes with 24 layers and a hidden size of 1024. It has 304M backbone parameters with a vocabulary containing 128K tokens which introduces 131M parameters in the Embedding layer. This model was trained using the 160GB data as DeBERTa V2. #### Fine-tuning on NLU tasks We present the dev results on SQuAD 2.0 and MNLI tasks. | Model |Vocabulary(K)|Backbone #Params(M)| SQuAD 2.0(F1/EM) | MNLI-m/mm(ACC)| |-------------------|----------|-------------------|-----------|----------| | RoBERTa-large |50 |304 | 89.4/86.5 | 90.2 | | XLNet-large |32 |- | 90.6/87.9 | 90.8 | | DeBERTa-large |50 |- | 90.7/88.0 | 91.3 | | **DeBERTa-v3-large**|128|304 | **91.5/89.0**| **91.8/91.9**| #### Fine-tuning with HF transformers ### Citation If you find DeBERTa useful for your work, please cite the following papers:", + "model_explanation_gemini": "DeBERTa-v3-large is an advanced natural language understanding model that enhances BERT and RoBERTa with disentangled attention and ELECTRA-style pre-training, excelling in tasks like SQuAD 2.0 and MNLI." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-v3-small.json b/data/model_data_json/microsoft_deberta-v3-small.json new file mode 100644 index 0000000000000000000000000000000000000000..fb7cf9a526ecc91387c4e0bd6b204e30e4408bf3 --- /dev/null +++ b/data/model_data_json/microsoft_deberta-v3-small.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/deberta-v3-small", + "downloads": 215748, + "tags": [ + "transformers", + "pytorch", + "tf", + "deberta-v2", + "deberta", + "deberta-v3", + "fill-mask", + "en", + "arxiv:2006.03654", + "arxiv:2111.09543", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta - deberta-v3 - fill-mask thumbnail: license: mit --- ## DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. 
With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly improves the model performance on downstream tasks. You can find more technique details about the new model from our paper. Please check the official repository for more implementation details and updates. The DeBERTa V3 small model comes with 6 layers and a hidden size of 768. It has **44M** backbone parameters with a vocabulary containing 128K tokens which introduces 98M parameters in the Embedding layer. This model was trained using the 160GB data as DeBERTa V2. #### Fine-tuning on NLU tasks We present the dev results on SQuAD 2.0 and MNLI tasks. | Model |Vocabulary(K)|Backbone #Params(M)| SQuAD 2.0(F1/EM) | MNLI-m/mm(ACC)| |-------------------|----------|-------------------|-----------|----------| | RoBERTa-base |50 |86 | 83.7/80.5 | 87.6/- | | XLNet-base |32 |92 | -/80.2 | 86.8/- | | ELECTRA-base |30 |86 | -/80.5 | 88.8/ | | DeBERTa-base |50 |100 | 86.2/83.1| 88.8/88.5| | DeBERTa-v3-large|128|304 | 91.5/89.0 | 91.8/91.9 | | DeBERTa-v3-base |128|86 | 88.4/85.4 | 90.6/90.7| | **DeBERTa-v3-small** |128|**44** | **82.8/80.4** | **88.3/87.7**| | DeBERTa-v3-small+SiFT|128|22 | -/- | 88.8/88.5| #### Fine-tuning with HF transformers ### Citation If you find DeBERTa useful for your work, please cite the following papers:", + "model_explanation_gemini": "DeBERTa-v3-small is a smaller, efficient version of the DeBERTa model optimized for natural language understanding tasks like SQuAD and MNLI, using ELECTRA-style pre-training and gradient-disentangled embedding sharing to outperform similar-sized models." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_deberta-xlarge-mnli.json b/data/model_data_json/microsoft_deberta-xlarge-mnli.json new file mode 100644 index 0000000000000000000000000000000000000000..1b5f3e36a86351bab1c4bd531ecc5435067f62dd --- /dev/null +++ b/data/model_data_json/microsoft_deberta-xlarge-mnli.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/deberta-xlarge-mnli", + "downloads": 752500, + "tags": [ + "transformers", + "pytorch", + "tf", + "deberta", + "text-classification", + "deberta-v1", + "deberta-mnli", + "en", + "arxiv:2006.03654", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - deberta-v1 - deberta-mnli tasks: mnli thumbnail: license: mit widget: - text: \"[CLS] I love you. [SEP] I like you. [SEP]\" --- ## DeBERTa: Decoding-enhanced BERT with Disentangled Attention DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on majority of NLU tasks with 80GB training data. Please check the official repository for more details and updates. This the DeBERTa xlarge model(750M) fine-tuned with mnli task. ### Fine-tuning on NLU tasks We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks. 
| Model | SQuAD 1.1 | SQuAD 2.0 | MNLI-m/mm | SST-2 | QNLI | CoLA | RTE | MRPC | QQP |STS-B | |---------------------------|-----------|-----------|-------------|-------|------|------|--------|-------|-------|------| | | F1/EM | F1/EM | Acc | Acc | Acc | MCC | Acc |Acc/F1 |Acc/F1 |P/S | | BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- |90.0/- | | RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- |92.4/- | | XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- |92.5/- | | DeBERTa-Large1 | 95.5/90.1 | 90.7/88.0 | 91.3/91.1| 96.5|95.3| 69.5| 91.0| 92.6/94.6| 92.3/- |92.8/92.5 | | DeBERTa-XLarge1 | -/- | -/- | 91.5/91.2| 97.0 | - | - | 93.1 | 92.1/94.3 | - |92.9/92.7| | DeBERTa-V2-XLarge1|95.8/90.8| 91.4/88.9|91.7/91.6| **97.5**| 95.8|71.1|**93.9**|92.0/94.2|92.3/89.8|92.9/92.9| |**DeBERTa-V2-XXLarge1,2**|**96.1/91.4**|**92.2/89.7**|**91.7/91.9**|97.2|**96.0**|**72.0**| 93.5| **93.1/94.9**|**92.7/90.3** |**93.2/93.1** | -------- #### Notes. - 1 Following RoBERTa, for RTE, MRPC, STS-B, we fine-tune the tasks based on DeBERTa-Large-MNLI, DeBERTa-XLarge-MNLI, DeBERTa-V2-XLarge-MNLI, DeBERTa-V2-XXLarge-MNLI. The results of SST-2/QQP/QNLI/SQuADv2 will also be slightly improved when start from MNLI fine-tuned models, however, we only report the numbers fine-tuned from pretrained base models for those 4 tasks. - 2 To try the **XXLarge** model with **HF transformers**, you need to specify **--sharded_ddp** ### Citation If you find DeBERTa useful for your work, please cite the following paper:", + "model_explanation_gemini": "An xlarge-scale DeBERTa model fine-tuned for natural language inference (MNLI) tasks, outperforming BERT and RoBERTa on various NLU benchmarks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_git-base.json b/data/model_data_json/microsoft_git-base.json new file mode 100644 index 0000000000000000000000000000000000000000..4ccf1c0634b2aa0a022368ecd4f84e24b29a5ff8 --- /dev/null +++ b/data/model_data_json/microsoft_git-base.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/git-base", + "downloads": 285822, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "git", + "image-text-to-text", + "vision", + "image-to-text", + "image-captioning", + "en", + "arxiv:2205.14100", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: mit tags: - vision - image-to-text - image-captioning model_name: microsoft/git-base pipeline_tag: image-to-text --- # GIT (GenerativeImage2Text), base-sized GIT (short for GenerativeImage2Text) model, base-sized version. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Wang et al. and first released in this repository. Disclaimer: The team releasing GIT did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using \"teacher forcing\" on a lot of (image, text) pairs. The goal for the model is simply to predict the next text token, giving the image tokens and previous text tokens. The model has full access to (i.e. a bidirectional attention mask is used for) the image patch tokens, but only has access to the previous text tokens (i.e. a causal attention mask is used for the text tokens) when predicting the next text token. 
!GIT architecture This allows the model to be used for tasks like: - image and video captioning - visual question answering (VQA) on images and videos - even image classification (by simply conditioning the model on the image and asking it to generate a class for it in text). ## Intended uses & limitations You can use the raw model for image captioning. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use For code examples, we refer to the documentation. ## Training data From the paper: > We collect 0.8B image-text pairs for pre-training, which include COCO (Lin et al., 2014), Conceptual Captions (CC3M) (Sharma et al., 2018), SBU (Ordonez et al., 2011), Visual Genome (VG) (Krishna et al., 2016), Conceptual Captions (CC12M) (Changpinyo et al., 2021), ALT200M (Hu et al., 2021a), and an extra 0.6B data following a similar collection procedure in Hu et al. (2021a). => however this is for the model referred to as \"GIT\" in the paper, which is not open-sourced. This checkpoint is \"GIT-base\", which is a smaller variant of GIT trained on 10 million image-text pairs. See table 11 in the paper for more details. ### Preprocessing We refer to the original repo regarding details for preprocessing during training. During validation, one resizes the shorter edge of each image, after which center cropping is performed to a fixed-size resolution. Next, frames are normalized across the RGB channels with the ImageNet mean and standard deviation. ## Evaluation results For evaluation results, we refer readers to the paper.", + "model_explanation_gemini": "Generates text descriptions from images for tasks like captioning, visual question answering, and image classification." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_infoxlm-large.json b/data/model_data_json/microsoft_infoxlm-large.json new file mode 100644 index 0000000000000000000000000000000000000000..7ea62b4fef8d35d11965d2b924b04e75e73bb7dd --- /dev/null +++ b/data/model_data_json/microsoft_infoxlm-large.json @@ -0,0 +1,16 @@ +{ + "model_id": "microsoft/infoxlm-large", + "downloads": 1036792, + "tags": [ + "transformers", + "pytorch", + "xlm-roberta", + "fill-mask", + "arxiv:2007.07834", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "# InfoXLM **InfoXLM** (NAACL 2021, paper, repo, model) InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training. **MD5** **BibTeX**", + "model_explanation_gemini": "InfoXLM is a cross-lingual language model pre-trained using an information-theoretic framework for multilingual tasks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_kosmos-2-patch14-224.json b/data/model_data_json/microsoft_kosmos-2-patch14-224.json new file mode 100644 index 0000000000000000000000000000000000000000..9b62e61e3604eb9728105d11cd20f301c6656eba --- /dev/null +++ b/data/model_data_json/microsoft_kosmos-2-patch14-224.json @@ -0,0 +1,18 @@ +{ + "model_id": "microsoft/kosmos-2-patch14-224", + "downloads": 163280, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "kosmos-2", + "image-text-to-text", + "image-captioning", + "image-to-text", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: image-to-text tags: - image-captioning languages: - en license: mit --- # Kosmos-2: Grounding Multimodal Large Language Models to the World
[An image of a snowman warming himself by a fire.]
This Hub repository contains Hugging Face's implementation of the original Kosmos-2 model from Microsoft. ## How to Get Started with the Model Use the code below to get started with the model. ## Tasks This model is capable of performing different tasks by changing the prompts. First, let's define a function to run a prompt.
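The code that originally followed this sentence was stripped from the dump; the sketch below reconstructs a plausible `run_example` helper from the public Kosmos-2 interface in transformers (AutoProcessor, AutoModelForVision2Seq, and the processor's `post_process_generation`), using the snowman demo image the card refers to, so treat names and arguments as assumptions rather than the card's exact code:

```python
# Hedged reconstruction of the stripped helper; assumes the Kosmos-2
# interface exposed by transformers, not the card's exact code.
import requests
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model = AutoModelForVision2Seq.from_pretrained("microsoft/kosmos-2-patch14-224")
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

def run_example(prompt, image):
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image_embeds=None,
        image_embeds_position_mask=inputs["image_embeds_position_mask"],
        max_new_tokens=128,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    # Split the raw generation into plain text plus grounded entities
    # (phrase, text span, normalized bounding boxes).
    processed_text, entities = processor.post_process_generation(generated_text)
    return processed_text, entities

url = "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.png"
image = Image.open(requests.get(url, stream=True).raw)
print(run_example("<grounding>An image of", image))
```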
Here are the tasks the model could perform:
### Multimodal Grounding #### • Phrase Grounding #### • Referring Expression Comprehension ### Multimodal Referring #### • Referring expression generation ### Perception-Language Tasks #### • Grounded VQA #### • Grounded VQA with multimodal referring via bounding boxes ### Grounded Image captioning #### • Brief #### • Detailed
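Each task above is selected purely by the prompt string. The card's exact prompt strings were collapsed in this dump; the ones below are hedged examples following the `<grounding>`/`<phrase>` conventions Kosmos-2 documents, and they reuse the hypothetical `run_example` helper and `image` from the sketch above:

```python
# Hedged prompt examples only; the card's exact prompt strings were collapsed
# in this dump, and run_example/image come from the sketch above.
prompts = {
    "brief caption": "<grounding>An image of",
    "detailed caption": "<grounding>Describe this image in detail:",
    "grounded VQA": "<grounding>Question: What is special about this image? Answer:",
    "phrase grounding": "<grounding><phrase>a snowman</phrase>",
}

for task, prompt in prompts.items():
    print(task, "->", run_example(prompt, image))
```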
## Draw the bounding boxes of the entities on the image Once you have the extracted entities, you can use the following helper function to draw their bounding boxes on the image:
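The original helper is gone from this dump, so here is a hedged PIL-based substitute (the card's version may differ, for example by using cv2); it assumes entities arrive as (phrase, text_span, boxes) tuples with boxes normalized to [0, 1], which is what `post_process_generation` returns:

```python
# Hedged PIL-based substitute for the stripped helper; assumes entities are
# (phrase, text_span, boxes) tuples with boxes normalized to [0, 1].
from PIL import ImageDraw

def draw_entity_boxes(image, entities):
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    w, h = annotated.size
    for phrase, _span, boxes in entities:
        for x1, y1, x2, y2 in boxes:
            box = (x1 * w, y1 * h, x2 * w, y2 * h)
            draw.rectangle(box, outline="red", width=3)
            draw.text((box[0], max(box[1] - 12, 0)), phrase, fill="red")
    return annotated

_, entities = run_example("<grounding>An image of", image)
draw_entity_boxes(image, entities).save("annotated.png")
```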
Here is the annotated image: ## BibTeX and citation info", + "model_explanation_gemini": "Generates text descriptions from images and grounds them in visual context with tasks like phrase grounding, referring expression comprehension, and grounded image captioning." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_layoutlm-base-uncased.json b/data/model_data_json/microsoft_layoutlm-base-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..d7fa2c4fe72aca38400b69447f59717c468f60ce --- /dev/null +++ b/data/model_data_json/microsoft_layoutlm-base-uncased.json @@ -0,0 +1,18 @@ +{ + "model_id": "microsoft/layoutlm-base-uncased", + "downloads": 1715591, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "layoutlm", + "en", + "arxiv:1912.13318", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: mit --- # LayoutLM **Multimodal (text + layout/format + image) pre-training for document AI** Microsoft Document AI | GitHub ## Model description LayoutLM is a simple but effective pre-training method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding. LayoutLM achieves the SOTA results on multiple datasets. For more details, please refer to our paper: LayoutLM: Pre-training of Text and Layout for Document Image Understanding Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, KDD 2020 ## Training data We pre-train LayoutLM on IIT-CDIP Test Collection 1.0\* dataset with two settings. * LayoutLM-Base, Uncased (11M documents, 2 epochs): 12-layer, 768-hidden, 12-heads, 113M parameters **(This Model)** * LayoutLM-Large, Uncased (11M documents, 2 epochs): 24-layer, 1024-hidden, 16-heads, 343M parameters ## Citation If you find LayoutLM useful in your research, please cite the following paper:", + "model_explanation_gemini": "Pre-trained for document image understanding and information extraction by combining text, layout, and image data." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_layoutlmv2-base-uncased.json b/data/model_data_json/microsoft_layoutlmv2-base-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..8ff2aedd46cd852974f97f7aadff30c72045df47 --- /dev/null +++ b/data/model_data_json/microsoft_layoutlmv2-base-uncased.json @@ -0,0 +1,16 @@ +{ + "model_id": "microsoft/layoutlmv2-base-uncased", + "downloads": 522175, + "tags": [ + "transformers", + "pytorch", + "layoutlmv2", + "en", + "arxiv:2012.14740", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: cc-by-nc-sa-4.0 --- # LayoutLMv2 **Multimodal (text + layout/format + image) pre-training for document AI** The documentation of this model in the Transformers library can be found here. Microsoft Document AI | GitHub ## Introduction LayoutLMv2 is an improved version of LayoutLM with new pre-training tasks to model the interaction among text, layout, and image in a single multi-modal framework. It outperforms strong baselines and achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks, including FUNSD (0.7895 → 0.8420), CORD (0.9493 → 0.9601), SROIE (0.9524 → 0.9781), Kleister-NDA (0.834 → 0.852), RVL-CDIP (0.9443 → 0.9564), and DocVQA (0.7295 → 0.8672).
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou, ACL 2021", + "model_explanation_gemini": "LayoutLMv2 is a multimodal model combining text, layout, and image data for document understanding tasks, achieving state-of-the-art performance on various benchmarks." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_layoutlmv3-base.json b/data/model_data_json/microsoft_layoutlmv3-base.json new file mode 100644 index 0000000000000000000000000000000000000000..963ae899d933183472dbdb1292cfb09466bae659 --- /dev/null +++ b/data/model_data_json/microsoft_layoutlmv3-base.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/layoutlmv3-base", + "downloads": 1424714, + "tags": [ + "transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "layoutlmv3", + "en", + "arxiv:2204.08387", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: cc-by-nc-sa-4.0 --- # LayoutLMv3 Microsoft Document AI | GitHub ## Model description LayoutLMv3 is a pre-trained multimodal Transformer for Document AI with unified text and image masking. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. For example, LayoutLMv3 can be fine-tuned for both text-centric tasks, including form understanding, receipt understanding, and document visual question answering, and image-centric tasks such as document image classification and document layout analysis. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei, ACM Multimedia 2022. ## Citation If you find LayoutLM useful in your research, please cite the following paper: ## License The content of this project itself is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). Portions of the source code are based on the transformers project. Microsoft Open Source Code of Conduct", + "model_explanation_gemini": "LayoutLMv3 is a multimodal Transformer model pre-trained for Document AI tasks, handling both text-centric (form/receipt understanding, document QA) and image-centric (document classification, layout analysis) tasks with unified text and image masking." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_layoutlmv3-large.json b/data/model_data_json/microsoft_layoutlmv3-large.json new file mode 100644 index 0000000000000000000000000000000000000000..021ec5cb8f6906f1ea9ab75fd27487100e179d2d --- /dev/null +++ b/data/model_data_json/microsoft_layoutlmv3-large.json @@ -0,0 +1,17 @@ +{ + "model_id": "microsoft/layoutlmv3-large", + "downloads": 106598, + "tags": [ + "transformers", + "pytorch", + "tf", + "layoutlmv3", + "en", + "arxiv:2204.08387", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: cc-by-nc-sa-4.0 --- # LayoutLMv3 Microsoft Document AI | GitHub ## Model description LayoutLMv3 is a pre-trained multimodal Transformer for Document AI with unified text and image masking. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. 
For example, LayoutLMv3 can be fine-tuned for both text-centric tasks, including form understanding, receipt understanding, and document visual question answering, and image-centric tasks such as document image classification and document layout analysis. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei, Preprint 2022. ## Citation If you find LayoutLM useful in your research, please cite the following paper: ## License The content of this project itself is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). Portions of the source code are based on the transformers project. Microsoft Open Source Code of Conduct", + "model_explanation_gemini": "LayoutLMv3 is a multimodal Transformer model pre-trained for Document AI tasks, capable of fine-tuning for both text-centric (e.g., form understanding) and image-centric (e.g., document layout analysis) applications." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_llava-med-v1.5-mistral-7b.json b/data/model_data_json/microsoft_llava-med-v1.5-mistral-7b.json new file mode 100644 index 0000000000000000000000000000000000000000..b8bfe3282f8f3b983e92ab4dd4b3b23755aab1cc --- /dev/null +++ b/data/model_data_json/microsoft_llava-med-v1.5-mistral-7b.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/llava-med-v1.5-mistral-7b", + "downloads": 97112, + "tags": [ + "transformers", + "safetensors", + "llava_mistral", + "text-generation", + "image-text-to-text", + "medical", + "vision", + "conversational", + "arxiv:2306.00890", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - image-text-to-text - medical - vision --- # LLaVA-Med v1.5, using mistralai/Mistral-7B-Instruct-v0.2 as LLM for a better commercial license Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum learning method for adapting LLaVA to the biomedical domain. It is an open-source release intended for research use only to facilitate reproducibility of the corresponding paper which claims improved performance for open-ended biomedical questions answering tasks, including common visual question answering (VQA) benchmark datasets such as PathVQA and VQA-RAD. LLaVA-Med was proposed in LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day by Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao. **Model date:** LLaVA-Med-v1.5-Mistral-7B was trained in April 2024. **Paper or resources for more information:** **Where to send questions or comments about the model:** ## License mistralai/Mistral-7B-Instruct-v0.2 license. ## Intended use The data, code, and model checkpoints are intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper. The data, code, and model checkpoints are not intended to be used in clinical care or for any clinical decision making purposes. ### Primary Intended Use The primary intended use is to support AI researchers reproducing and building on top of this work. LLaVA-Med and its associated models should be helpful for exploring various biomedical vision-language processing (VLP ) and vision question answering (VQA) research questions. 
### Out-of-Scope Use Any deployed use case of the model --- commercial or otherwise --- is out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are intended for research use only and not intended for deployed use cases. Please refer to the associated paper for more details. ## Data This model builds upon PMC-15M dataset, which is a large-scale parallel image-text dataset for biomedical vision-language processing. It contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. It covers a diverse range of biomedical image types, such as microscopy, radiography, histology, and more. ## How to use See the Serving and Evaluation sections in the LLaVA-Med repo. ## Limitations This model was developed using English corpora, and thus may be considered English-only. This model is evaluated on a narrow set of biomedical benchmark tasks, described in LLaVA-Med paper. As such, it is not suitable for use in any clinical setting. Under some conditions, the model may make inaccurate predictions and display limitations, which may require additional mitigation strategies. In particular, this model is likely to carry many of the limitations of the model from which it is derived, LLaVA. Further, this model was developed in part using the PMC-15M dataset. The figure-caption pairs that make up this dataset may contain biases reflecting the current practice of academic publication. For example, the corresponding papers may be enriched for positive findings, contain examples of extreme cases, and otherwise reflect distributions that are not representative of other sources of biomedical data. ### BibTeX entry and citation info", + "model_explanation_gemini": "LLaVA-Med v1.5 is a biomedical vision-language model designed for open-ended question answering and visual question answering (VQA) tasks in the medical domain, trained using curriculum learning on biomedical image-text pairs for research purposes only." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_mdeberta-v3-base.json b/data/model_data_json/microsoft_mdeberta-v3-base.json new file mode 100644 index 0000000000000000000000000000000000000000..4a77e3ab8ed5f40740986c690c1940f3de97c4dc --- /dev/null +++ b/data/model_data_json/microsoft_mdeberta-v3-base.json @@ -0,0 +1,37 @@ +{ + "model_id": "microsoft/mdeberta-v3-base", + "downloads": 547869, + "tags": [ + "transformers", + "pytorch", + "tf", + "deberta-v2", + "deberta", + "deberta-v3", + "mdeberta", + "fill-mask", + "multilingual", + "en", + "ar", + "bg", + "de", + "el", + "es", + "fr", + "hi", + "ru", + "sw", + "th", + "tr", + "ur", + "vi", + "zh", + "arxiv:2006.03654", + "arxiv:2111.09543", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - en - ar - bg - de - el - es - fr - hi - ru - sw - th - tr - ur - vi - zh tags: - deberta - deberta-v3 - mdeberta - fill-mask thumbnail: license: mit --- ## DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing DeBERTa improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing. 
Compared to DeBERTa, our V3 version significantly improves the model performance on downstream tasks. You can find more technique details about the new model from our paper. Please check the official repository for more implementation details and updates. mDeBERTa is multilingual version of DeBERTa which use the same structure as DeBERTa and was trained with CC100 multilingual data. The mDeBERTa V3 base model comes with 12 layers and a hidden size of 768. It has 86M backbone parameters with a vocabulary containing 250K tokens which introduces 190M parameters in the Embedding layer. This model was trained using the 2.5T CC100 data as XLM-R. #### Fine-tuning on NLU tasks We present the dev results on XNLI with zero-shot cross-lingual transfer setting, i.e. training with English data only, test on other languages. | Model |avg | en | fr| es | de | el | bg | ru |tr |ar |vi | th | zh | hi | sw | ur | |--------------| ----|----|----|---- |-- |-- |-- | -- |-- |-- |-- | -- | -- | -- | -- | -- | | XLM-R-base |76.2 |85.8|79.7|80.7 |78.7 |77.5 |79.6 |78.1 |74.2 |73.8 |76.5 |74.6 |76.7| 72.4| 66.5| 68.3| | mDeBERTa-base|**79.8**+/-0.2|**88.2**|**82.6**|**84.4** |**82.7** |**82.3** |**82.4** |**80.8** |**79.5** |**78.5** |**78.1** |**76.4** |**79.5**| **75.9**| **73.9**| **72.4**| #### Fine-tuning with HF transformers ### Citation If you find DeBERTa useful for your work, please cite the following papers:", + "model_explanation_gemini": "microsoft_mdeberta-v3-base is a multilingual DeBERTa model optimized for natural language understanding tasks, outperforming similar models like XLM-R in cross-lingual transfer settings, particularly in zero-shot scenarios." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_phi-1_5.json b/data/model_data_json/microsoft_phi-1_5.json new file mode 100644 index 0000000000000000000000000000000000000000..ae7dbf84e46378d9dfc3e8daba6e82e1c270a643 --- /dev/null +++ b/data/model_data_json/microsoft_phi-1_5.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/phi-1_5", + "downloads": 110900, + "tags": [ + "transformers", + "safetensors", + "phi", + "text-generation", + "nlp", + "code", + "en", + "arxiv:2309.05463", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - en pipeline_tag: text-generation tags: - nlp - code --- ## Model Summary The language model Phi-1.5 is a Transformer with **1.3 billion** parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates a nearly state-of-the-art performance among models with less than 10 billion parameters. We **did not** fine-tune Phi-1.5 either for **instruction following or through reinforcement learning from human feedback**. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more. For a safer model release, we exclude generic web-crawl data sources such as common-crawl from the training. This strategy prevents direct exposure to potentially harmful online content, enhancing the model's safety without RLHF. However, the model is still vulnerable to generating harmful content. 
We hope the model can help the research community to further study the safety of language models. Phi-1.5 can write poems, draft emails, create stories, summarize texts, write Python code (such as downloading a Hugging Face transformer model), etc. ## How to Use Phi-1.5 has been integrated in the version 4.37.0, please ensure that you are using a version equal or higher than it. ## Intended Uses Given the nature of the training data, Phi-1.5 is best suited for prompts using the QA format, the chat format, and the code format. Note that Phi-1.5, being a base model, often produces irrelevant text following the main answer. In the following example, we've truncated the answer for illustrative purposes only. ### QA Format: where the model generates the text after \"Answer:\". ### Chat Format: where the model generates the text after the first \"Bob:\". ### Code Format: where the model generates the text after the comments. **Notes:** * Phi-1.5-generated text/code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications. * Phi-1.5 has not been tested to ensure that it performs adequately for any production-level application. Please refer to the limitation sections of this document for more details. ## Sample Code ## Limitations of Phi-1.5 * Generate Inaccurate Code and Facts: The model often produces incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions. * Limited Scope for code: If the model generates Python scripts that utilize uncommon packages or scripts in other languages, we strongly recommend users manually verify all API uses. * Unreliable Responses to Instruction: The model has not undergone instruction fine-tuning. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users. * Language Limitations: The model is primarily designed to understand standard English. Informal English, slang, or any other language outside of English might pose challenges to its comprehension, leading to potential misinterpretations or errors in response. * Potential Societal Biases: Regardless of the safe data used for its training, the model is not entirely free from societal biases. There's a possibility it may generate content that mirrors these societal biases, particularly if prompted or instructed to do so. We urge users to be aware of this and to exercise caution and critical thinking when interpreting model outputs. * Toxicity: Despite that the model is trained with carefully selected data, the model can still produce harmful content if explicitly prompted or instructed to do so. We chose to release the model for research purposes only -- We hope to help the open-source community develop the most effective ways to reduce the toxicity of a model directly after pretraining. ## Training ### Model * Architecture: a Transformer-based model with next-word prediction objective * Dataset size: 30B tokens * Training tokens: 150B tokens * Precision: fp16 * GPUs: 32xA100-40G * Training time: 8 days ### Software * PyTorch * DeepSpeed * Flash-Attention ### License The model is licensed under the MIT license. ### Citation You can find the paper at Please cite as: ## Trademarks This project may contain trademarks or logos for projects, products, or services. 
Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.", + "model_explanation_gemini": "Phi-1.5 is a 1.3 billion parameter Transformer model designed for text generation, including tasks like answering questions, coding, and creative writing, while serving as a research tool for studying language model safety challenges like toxicity and bias." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_phi-2.json b/data/model_data_json/microsoft_phi-2.json new file mode 100644 index 0000000000000000000000000000000000000000..2e2ecb51f6cbc613efd7042de99adee20f7dadd8 --- /dev/null +++ b/data/model_data_json/microsoft_phi-2.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/phi-2", + "downloads": 967734, + "tags": [ + "transformers", + "safetensors", + "phi", + "text-generation", + "nlp", + "code", + "en", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - en pipeline_tag: text-generation tags: - nlp - code --- ## Model Summary Phi-2 is a Transformer with **2.7 billion** parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased a nearly state-of-the-art performance among models with less than 13 billion parameters. Our model hasn't been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more. ## How to Use Phi-2 has been integrated in the version 4.37.0, please ensure that you are using a version equal or higher than it. Phi-2 is known for having an attention overflow issue (with FP16). If you are facing this issue, please enable/disable autocast on the PhiAttention.forward() function. ## Intended Uses Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format. ### QA Format: You can provide the prompt as a standalone question as follows: where the model generates the text after \".\" . To encourage the model to write more concise answers, you can also try the following QA format using \"Instruct: \\\\nOutput:\" where the model generates the text after \"Output:\". ### Chat Format: where the model generates the text after the first \"Bob:\". ### Code Format: where the model generates the text after the comments. **Notes:** * Phi-2 is intended for QA, chat, and code purposes. The model-generated text/code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications. * Direct adoption for production tasks without evaluation is out of scope of this project. 
As a result, the Phi-2 model has not been tested to ensure that it performs adequately for any production-level application. Please refer to the limitation sections of this document for more details. * If you are using , always load the model with to prevent side-effects. ## Sample Code ## Limitations of Phi-2 * Generate Inaccurate Code and Facts: The model may produce incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions. * Limited Scope for code: Majority of Phi-2 training data is based in Python and use common packages such as \"typing, math, random, collections, datetime, itertools\". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. * Unreliable Responses to Instruction: The model has not undergone instruction fine-tuning. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users. * Language Limitations: The model is primarily designed to understand standard English. Informal English, slang, or any other languages might pose challenges to its comprehension, leading to potential misinterpretations or errors in response. * Potential Societal Biases: Phi-2 is not entirely free from societal biases despite efforts in assuring training data safety. There's a possibility it may generate content that mirrors these societal biases, particularly if prompted or instructed to do so. We urge users to be aware of this and to exercise caution and critical thinking when interpreting model outputs. * Toxicity: Despite being trained with carefully selected data, the model can still produce harmful content if explicitly prompted or instructed to do so. We chose to release the model to help the open-source community develop the most effective ways to reduce the toxicity of a model directly after pretraining. * Verbosity: Phi-2 being a base model often produces irrelevant or extra text and responses following its first answer to user prompts within a single turn. This is due to its training dataset being primarily textbooks, which results in textbook-like responses. ## Training ### Model * Architecture: a Transformer-based model with next-word prediction objective * Context length: 2048 tokens * Dataset size: 250B tokens, combination of NLP synthetic data created by AOAI GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, which was assessed by AOAI GPT-4. * Training tokens: 1.4T tokens * GPUs: 96xA100-80G * Training time: 14 days ### Software * PyTorch * DeepSpeed * Flash-Attention ### License The model is licensed under the MIT license. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.", + "model_explanation_gemini": "Phi-2 is a 2.7 billion parameter Transformer model designed for text generation, excelling in QA, chat, and code tasks while enabling research on safety challenges like toxicity and bias reduction." 
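The "Sample Code" section of the Phi-2 card above is empty in this dump. A hedged reconstruction using the plain transformers generation API (the card asks for transformers >= 4.37.0, and the "Instruct:/Output:" QA format it describes):

```python
# Hedged stand-in for the empty "Sample Code" section above; uses the plain
# transformers generation API, not necessarily the card's original snippet.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")  # drop this line to run on CPU

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# "Instruct: ...\nOutput:" is the concise QA format the card describes.
inputs = tokenizer(
    "Instruct: Write a detailed analogy between mathematics and a lighthouse.\nOutput:",
    return_tensors="pt",
    return_attention_mask=False,
)

outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])
```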
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_phi-4.json b/data/model_data_json/microsoft_phi-4.json new file mode 100644 index 0000000000000000000000000000000000000000..c4cebc2d0a74138a3f3781debf7435d70a318acb --- /dev/null +++ b/data/model_data_json/microsoft_phi-4.json @@ -0,0 +1,25 @@ +{ + "model_id": "microsoft/phi-4", + "downloads": 424864, + "tags": [ + "transformers", + "safetensors", + "phi3", + "text-generation", + "phi", + "nlp", + "math", + "code", + "chat", + "conversational", + "en", + "arxiv:2412.08905", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - en pipeline_tag: text-generation tags: - phi - nlp - math - code - chat - conversational inference: parameters: temperature: 0 widget: - messages: - role: user content: How should I explain the Internet? library_name: transformers --- # Phi-4 Model Card Phi-4 Technical Report ## Model Summary | | | |-------------------------|-------------------------------------------------------------------------------| | **Developers** | Microsoft Research | | **Description** | is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.
Phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. | | **Architecture** | 14B parameters, dense decoder-only Transformer model | | **Inputs** | Text, best suited for prompts in the chat format | | **Context length** | 16K tokens | | **GPUs** | 1920 H100-80G | | **Training time** | 21 days | | **Training data** | 9.8T tokens | | **Outputs** | Generated text in response to input | | **Dates** | October 2024 – November 2024 | | **Status** | Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data | | **Release date** | December 12, 2024 | | **License** | MIT | ## Intended Use | | | |-------------------------------|-------------------------------------------------------------------------| | **Primary Use Cases** | Our model is designed to accelerate research on language models, for use as a building block for generative AI powered features. It is intended for general purpose AI systems and applications (primarily in English) which require:
1. Memory/compute constrained environments.
2. Latency bound scenarios.
3. Reasoning and logic. | | **Out-of-Scope Use Cases** | Our model is not specifically designed or evaluated for all downstream purposes, thus:
1. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios.
2. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case, including the model’s focus on English.
3. Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under. | ## Data Overview ### Training Datasets Our training data is an extension of the data used for Phi-3 and includes a wide variety of sources from: 1. Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code. 2. Newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (science, daily activities, theory of mind, etc.). 3. Acquired academic books and Q&A datasets. 4. High quality chat format supervised data covering various topics to reflect human preferences on different aspects such as instruct-following, truthfulness, honesty and helpfulness. Multilingual data constitutes about 8% of our overall data. We are focusing on the quality of data that could potentially improve the reasoning ability for the model, and we filter the publicly available documents to contain the correct level of knowledge. #### Benchmark datasets We evaluated Phi-4 using OpenAI’s SimpleEval and our own internal benchmarks to understand the model’s capabilities, more specifically: * **MMLU:** Popular aggregated dataset for multitask language understanding. * **MATH:** Challenging competition math problems. * **GPQA:** Complex, graduate-level science questions. * **DROP:** Complex comprehension and reasoning. * **MGSM:** Multi-lingual grade-school math. * **HumanEval:** Functional code generation. * **SimpleQA:** Factual responses. ## Safety ### Approach Phi-4 has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated synthetic datasets. The overall technique employed to do the safety alignment is a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization), including publicly available datasets focusing on helpfulness and harmlessness as well as various questions and answers targeted to multiple safety categories. ### Safety Evaluation and Red-Teaming Prior to release, Phi-4 followed a multi-faceted evaluation approach. Quantitative evaluation was conducted with multiple open-source safety benchmarks and in-house tools utilizing adversarial conversation simulation. For qualitative safety evaluation, we collaborated with the independent AI Red Team (AIRT) at Microsoft to assess safety risks posed by Phi-4 in both average and adversarial user scenarios. In the average user scenario, AIRT emulated typical single-turn and multi-turn interactions to identify potentially risky behaviors. The adversarial user scenario tested a wide range of techniques aimed at intentionally subverting the model’s safety training including jailbreaks, encoding-based attacks, multi-turn attacks, and adversarial suffix attacks. Please refer to the technical report for more details on safety alignment. ## Model Quality To understand its capabilities, we compare Phi-4 with a set of models over OpenAI’s SimpleEval benchmark. Below is a high-level overview of the model quality on representative benchmarks.
For the table below, higher numbers indicate better performance:

| **Category** | **Benchmark** | **phi-4** (14B) | **phi-3** (14B) | **Qwen 2.5** (14B instruct) | **GPT-4o-mini** | **Llama-3.3** (70B instruct) | **Qwen 2.5** (72B instruct) | **GPT-4o** |
|------------------------------|---------------|-----------|-----------------|----------------------|----------------------|--------------------|-------------------|-----------------|
| Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | **88.1** |
| Science | GPQA | **56.1** | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6 |
| Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | **90.4** |
| Math | MATH | **80.4** | 44.6 | 75.6 | 73.0 | 66.3* | 80.0 | 74.6 |
| Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9* | 80.4 | **90.6** |
| Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | **39.4** |
| Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | **90.2** | 76.7 | 80.9 |

\\* These scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following. We use the simple-evals framework because it is reproducible, but Meta reports 77 for MATH and 88 for HumanEval on Llama-3.3-70B. ## Usage ### Input Formats Given the nature of the training data, Phi-4 is best suited for prompts using the chat format. ## Responsible AI Considerations Like other language models, Phi-4 can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: * **Quality of Service:** The model is trained primarily on English text. Languages other than English will experience worse performance. English language varieties with less representation in the training data might experience worse performance than standard American English. Phi-4 is not intended to support multilingual use. * **Representation of Harms & Perpetuation of Stereotypes:** These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. * **Inappropriate or Offensive Content:** These models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case. * **Information Reliability:** Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. * **Limited Scope for Code:** The majority of the training data is based in Python and uses common packages. If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. Developers should apply responsible AI best practices and are responsible for ensuring that a specific use case complies with relevant laws and regulations (e.g. privacy, trade, etc.). Using safety services like Azure AI Content Safety that have advanced guardrails is highly recommended. Important areas for consideration include: * **Allocation:** Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.) without further assessments and additional debiasing techniques. * **High-Risk Scenarios:** Developers should assess suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. * **Misinformation:** Models may produce inaccurate information.
Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG). * **Generation of Harmful Content:** Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. * **Misuse:** Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.", + "model_explanation_gemini": "Microsoft's Phi-4 is a 14B-parameter open language model optimized for high-quality text generation, reasoning, and coding tasks, trained on diverse datasets to excel in memory/compute-constrained environments and latency-sensitive scenarios." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_rad-dino.json new file mode 100644 index 0000000000000000000000000000000000000000..a668ab203042197df6e0153d34bdb0bce49bfcf8 --- /dev/null +++ b/data/model_data_json/microsoft_rad-dino.json @@ -0,0 +1,18 @@ +{ + "model_id": "microsoft/rad-dino", + "downloads": 254092, + "tags": [ + "transformers", + "safetensors", + "dinov2", + "image-feature-extraction", + "arxiv:2311.13668", + "arxiv:1910.09700", + "doi:10.57967/hf/3050", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: msrla license_link: library_name: transformers pipeline_tag: image-feature-extraction --- # Model card for RAD-DINO ## Model description RAD-DINO is a vision transformer model trained to encode chest X-rays using the self-supervised learning method DINOv2. RAD-DINO is described in detail in Exploring Scalable Medical Image Encoders Beyond Text Supervision (F. Pérez-García, H. Sharma, S. Bond-Taylor, et al., 2024). - **Developed by:** Microsoft Health Futures - **Model type:** Vision transformer - **License:** MSRLA - **Finetuned from model:** DINOv2 ## Uses RAD-DINO is shared for research purposes only. It is **not meant to be used for clinical practice**. The model is a vision backbone that can be plugged into other models for downstream tasks. Some potential uses are: - Image classification, with a classifier trained on top of the CLS token - Image segmentation, with a decoder trained using the patch tokens - Clustering, using the image embeddings directly - Image retrieval, using nearest neighbors of the CLS token - Report generation, with a language model to decode text Fine-tuning RAD-DINO is typically not necessary to obtain good performance in downstream tasks. ## Biases, risks, and limitations RAD-DINO was trained with data from three countries, therefore it might be biased towards the populations in the training data. Underlying biases of the training datasets may not be well characterized. ## Getting started ### Get some data Let us first write an auxiliary function to download a chest X-ray. ### Load the model Now let us download the model and encode an image. ### Encode an image If we are interested in the feature maps, we can reshape the patch embeddings into a grid. ### Weights for fine-tuning We have released a checkpoint compatible with the original DINOv2 code to help researchers fine-tune our model.
First, let us write code to load a checkpoint. We can now use the hub model and load the RAD-DINO weights. Let's clone the DINOv2 repository so we can import the code for the head. The weights of the head are also released: ### Configs and augmentation The configuration files [ssl_default_config.yaml](./ssl_default_config.yaml) and [vitb14_cxr.yaml](./vitb14_cxr.yaml), and the [augmentations.py](./augmentations.py) module are also available in the repository to help researchers reproduce the training procedure with our hyperparameters. ## Training details ### Training data We used images from five public, deidentified chest X-ray datasets to train this checkpoint of RAD-DINO. | Dataset | Num. images | | --------- | ----------: | | MIMIC-CXR | 368 960 | | CheXpert | 223 648 | | NIH-CXR | 112 120 | | PadChest | 136 787 | | BRAX | 41 260 | | **TOTAL** | 882 775 | Images in the validation and test sets used to train MAIRA were excluded from the training set of RAD-DINO. The list of image files used for training is available at [training_images.csv](./training_images.csv). Note this checkpoint is different from the one in the paper, where some private data was used (and fewer GPUs). The checkpoint shared here is trained for 35 000 iterations (the total number of iterations in the run was 100 000, but we selected this checkpoint using linear probing on the validation sets of the evaluation datasets described in the paper). We used 16 nodes with 4 A100 GPUs each, and a batch size of 40 images per GPU. ### Training procedure We refer to the manuscript for a detailed description of the training procedure. #### Preprocessing All DICOM files were resized using B-spline interpolation so that their shorter side was 518, min-max scaled to [0, 255], and stored as PNG files. #### Training hyperparameters - **Training regime:** fp16 using PyTorch-FSDP mixed-precision. ## Evaluation Our evaluation is best described in the manuscript. ## Environmental impact - **Hardware type:** NVIDIA A100 GPUs - **Hours used:** 40 hours/GPU × 16 nodes × 4 GPUs/node = 2560 GPU-hours - **Cloud provider:** Azure - **Compute region:** West US 2 - **Carbon emitted:** 222 kg CO₂ eq. ### Compute infrastructure RAD-DINO was trained on Azure Machine Learning. #### Hardware We used 16 nodes with four NVIDIA A100 (80 GB) GPUs each. #### Software We leveraged the code in DINOv2 for training. We used SimpleITK and Pydicom for processing of DICOM files. ## Citation **BibTeX:** **APA:** > Pérez-García, F., Sharma, H., Bond-Taylor, S., Bouzid, K., Salvatelli, V., Ilse, M., Bannur, S., Castro, D. C., Schwaighofer, A., Lungren, M. P., Wetscherek, M. T., Codella, N., Hyland, S. L., Alvarez-Valle, J., & Oktay, O. (2025). *Exploring scalable medical image encoders beyond text supervision*. In Nature Machine Intelligence. Springer Science and Business Media LLC. ## Model card contact Fernando Pérez-García ([fperezgarcia@microsoft.com](mailto:fperezgarcia@microsoft.com)).", + "model_explanation_gemini": "A vision transformer model trained with DINOv2 to encode chest X-rays for research purposes like classification, segmentation, and retrieval, but not for clinical use."
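The card's inline snippets were stripped from this dump. Below is a minimal sketch (not the card's verbatim code) of the "Load the model" and "Encode an image" steps, assuming the generic `transformers` Auto classes; the local file name is a placeholder:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Load the hub checkpoint (a DINOv2-style ViT backbone).
processor = AutoImageProcessor.from_pretrained("microsoft/rad-dino")
model = AutoModel.from_pretrained("microsoft/rad-dino")
model.eval()

# "chest_xray.png" is a placeholder for a locally saved chest X-ray.
image = Image.open("chest_xray.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

cls_embedding = outputs.pooler_output                 # one vector per image (retrieval, clustering)
patch_embeddings = outputs.last_hidden_state[:, 1:]   # per-patch tokens (segmentation decoders)
```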
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_resnet-18.json new file mode 100644 index 0000000000000000000000000000000000000000..9f2d9250593c83a1e17b1cece20400f709471a91 --- /dev/null +++ b/data/model_data_json/microsoft_resnet-18.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/resnet-18", + "downloads": 125286, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "resnet", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:1512.03385", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # ResNet ResNet model trained on imagenet-1k. It was introduced in the paper Deep Residual Learning for Image Recognition and first released in this repository. Disclaimer: The team releasing ResNet did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description ResNet introduced residual connections, which allow training networks with an unprecedented number of layers (up to 1,000). ResNet won the 2015 ILSVRC & COCO competition, an important milestone in deep computer vision. !model image ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model: For more code examples, we refer to the documentation.", + "model_explanation_gemini": "Classifies images into categories using residual connections trained on ImageNet-1k." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_resnet-50.json new file mode 100644 index 0000000000000000000000000000000000000000..9d0f6fb1dd14d21e142fdb1af3cac5446a6c6b0f --- /dev/null +++ b/data/model_data_json/microsoft_resnet-50.json @@ -0,0 +1,22 @@ +{ + "model_id": "microsoft/resnet-50", + "downloads": 251857, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "resnet", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:1512.03385", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k --- # ResNet-50 v1.5 ResNet model pre-trained on ImageNet-1k at resolution 224x224. It was introduced in the paper Deep Residual Learning for Image Recognition by He et al. Disclaimer: The team releasing ResNet did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description ResNet (Residual Network) is a convolutional neural network that democratized the concepts of residual learning and skip connections. These enable training much deeper models. This is ResNet v1.5, which differs from the original model: in the bottleneck blocks which require downsampling, v1 has stride = 2 in the first 1x1 convolution, whereas v1.5 has stride = 2 in the 3x3 convolution. This difference makes ResNet50 v1.5 slightly more accurate (\~0.5% top1) than v1, but comes with a small performance drawback (~5% imgs/sec) according to Nvidia. !model image ## Intended uses & limitations You can use the raw model for image classification.
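The "How to use" snippets for both ResNet checkpoints were stripped here. A minimal classification sketch, assuming the standard `transformers` recipe (the COCO image URL is the one commonly used in the transformers docs; swap in microsoft/resnet-18 for the smaller model):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, ResNetForImageClassification

# A COCO 2017 validation image frequently used in the transformers docs.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit to one of the 1,000 ImageNet class names.
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```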
See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Classifies images into 1,000 categories using a pre-trained ResNet-50 v1.5 convolutional neural network." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_speecht5_hifigan.json b/data/model_data_json/microsoft_speecht5_hifigan.json new file mode 100644 index 0000000000000000000000000000000000000000..bf27c7887c92c78a5732786560f4a7180009c94e --- /dev/null +++ b/data/model_data_json/microsoft_speecht5_hifigan.json @@ -0,0 +1,15 @@ +{ + "model_id": "microsoft/speecht5_hifigan", + "downloads": 86158, + "tags": [ + "transformers", + "pytorch", + "hifigan", + "audio", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - audio --- # SpeechT5 HiFi-GAN Vocoder This is the HiFi-GAN vocoder for use with the SpeechT5 text-to-speech and voice conversion models. SpeechT5 was first released in this repository, original weights. The license used is MIT. Disclaimer: The team releasing SpeechT5 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Citation **BibTeX:**", + "model_explanation_gemini": "Generates high-quality speech audio for use with SpeechT5 text-to-speech and voice conversion models." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_speecht5_tts.json b/data/model_data_json/microsoft_speecht5_tts.json new file mode 100644 index 0000000000000000000000000000000000000000..dff5e7b50a7efaae0799cf97cec9a1f07ee58d41 --- /dev/null +++ b/data/model_data_json/microsoft_speecht5_tts.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/speecht5_tts", + "downloads": 106487, + "tags": [ + "transformers", + "pytorch", + "speecht5", + "text-to-audio", + "audio", + "text-to-speech", + "dataset:libritts", + "arxiv:2110.07205", + "arxiv:1910.09700", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - audio - text-to-speech datasets: - libritts --- # SpeechT5 (TTS task) SpeechT5 model fine-tuned for speech synthesis (text-to-speech) on LibriTTS. This model was introduced in SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. SpeechT5 was first released in this repository, original weights. The license used is MIT. ## Model Description Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder. 
Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. To align the textual and speech information into this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder. Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification. - **Developed by:** Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. - **Shared by [optional]:** Matthijs Hollemans - **Model type:** text-to-speech - **Language(s) (NLP):** [More Information Needed] - **License:** MIT - **Finetuned from model [optional]:** [More Information Needed] ## Model Sources [optional] - **Repository:** - **Paper:** - **Blog Post:** - **Demo:** # Uses ## 🤗 Transformers Usage You can run SpeechT5 TTS locally with the 🤗 Transformers library. 1. First install the 🤗 Transformers library, sentencepiece, soundfile and datasets (optional): 2. Run inference via the TTS pipeline. You can access the SpeechT5 model via the TTS pipeline in just a few lines of code! 3. Run inference via the Transformers modelling code - You can use the processor + generate code to convert text into a mono 16 kHz speech waveform for more fine-grained control. ### Fine-tuning the Model Refer to this Colab notebook for an example of how to fine-tune SpeechT5 for TTS on a different dataset or a new language. ## Direct Use You can use this model for speech synthesis. See the model hub to look for fine-tuned versions on a task that interests you. ## Downstream Use [optional] [More Information Needed] ## Out-of-Scope Use [More Information Needed] # Bias, Risks, and Limitations [More Information Needed] ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. # Training Details ## Training Data LibriTTS ## Training Procedure ### Preprocessing [optional] Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. ### Training hyperparameters - **Precision:** [More Information Needed] - **Regime:** [More Information Needed] ### Speeds, Sizes, Times [optional] [More Information Needed] # Evaluation ## Testing Data, Factors & Metrics ### Testing Data [More Information Needed] ### Factors [More Information Needed] ### Metrics [More Information Needed] ## Results [More Information Needed] ### Summary # Model Examination [optional] Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification. # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
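The code for the numbered usage steps above was stripped from this dump. A sketch of step 2 (pipeline inference), following the widely circulated SpeechT5 example; the CMU ARCTIC x-vector dataset and the index 7306 are assumptions carried over from that example:

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import pipeline

synthesiser = pipeline("text-to-speech", model="microsoft/speecht5_tts")

# SpeechT5 requires a speaker embedding; CMU ARCTIC x-vectors are a common choice.
embeddings = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings[7306]["xvector"]).unsqueeze(0)

speech = synthesiser(
    "Hello, my dog is cooler than you!",
    forward_params={"speaker_embeddings": speaker_embedding},
)
sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
```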
- **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] # Technical Specifications [optional] ## Model Architecture and Objective The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder. ## Compute Infrastructure [More Information Needed] ### Hardware [More Information Needed] ### Software [More Information Needed] # Citation [optional] **BibTeX:** # Glossary [optional] - **text-to-speech** to synthesize audio # More Information [optional] [More Information Needed] # Model Card Authors [optional] Disclaimer: The team releasing SpeechT5 did not write a model card for this model so this model card has been written by the Hugging Face team. # Model Card Contact [More Information Needed]", + "model_explanation_gemini": "Fine-tuned for text-to-speech synthesis, the model converts text into spoken audio using a unified encoder-decoder framework pre-trained on LibriTTS data." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_swin-base-patch4-window7-224.json b/data/model_data_json/microsoft_swin-base-patch4-window7-224.json new file mode 100644 index 0000000000000000000000000000000000000000..9d3299f5be58011bc2a16f89306ac6ca47240d58 --- /dev/null +++ b/data/model_data_json/microsoft_swin-base-patch4-window7-224.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/swin-base-patch4-window7-224", + "downloads": 263205, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "swin", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:2103.14030", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # Swin Transformer (base-sized model) Swin Transformer model trained on ImageNet-1k at resolution 224x224. It was introduced in the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Liu et al. and first released in this repository. Disclaimer: The team releasing Swin Transformer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. In contrast, previous vision Transformers produce feature maps of a single low resolution and have quadratic computation complexity to input image size due to computation of self-attention globally. !model image Source ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. 
### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Performs image classification using a hierarchical Vision Transformer architecture optimized for efficient computation with local self-attention windows." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_swin-tiny-patch4-window7-224.json b/data/model_data_json/microsoft_swin-tiny-patch4-window7-224.json new file mode 100644 index 0000000000000000000000000000000000000000..907b1cd487758ce5b8ff09139ee13e9bc2b34f51 --- /dev/null +++ b/data/model_data_json/microsoft_swin-tiny-patch4-window7-224.json @@ -0,0 +1,21 @@ +{ + "model_id": "microsoft/swin-tiny-patch4-window7-224", + "downloads": 88690, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "swin", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:2103.14030", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # Swin Transformer (tiny-sized model) Swin Transformer model trained on ImageNet-1k at resolution 224x224. It was introduced in the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Liu et al. and first released in this repository. Disclaimer: The team releasing Swin Transformer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. In contrast, previous vision Transformers produce feature maps of a single low resolution and have quadratic computation complexity to input image size due to computation of self-attention globally. !model image Source ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Classifies images into 1,000 categories using a hierarchical vision transformer trained on ImageNet-1k at 224x224 resolution." 
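Both Swin cards above reference a classification snippet that was stripped here. A sketch, assuming the standard `transformers` recipe and the same COCO 2017 image as the docs (swap in microsoft/swin-tiny-patch4-window7-224 for the tiny checkpoint):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, SwinForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-patch4-window7-224")
model = SwinForImageClassification.from_pretrained("microsoft/swin-base-patch4-window7-224")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Show the five most likely ImageNet classes.
top5 = logits.softmax(-1).topk(5)
for prob, idx in zip(top5.values[0], top5.indices[0]):
    print(f"{model.config.id2label[idx.item()]}: {prob.item():.3f}")
```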
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_swinv2-tiny-patch4-window16-256.json b/data/model_data_json/microsoft_swinv2-tiny-patch4-window16-256.json new file mode 100644 index 0000000000000000000000000000000000000000..34afd7359b17a90b13847f4b02e71c906d6a6444 --- /dev/null +++ b/data/model_data_json/microsoft_swinv2-tiny-patch4-window16-256.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/swinv2-tiny-patch4-window16-256", + "downloads": 414107, + "tags": [ + "transformers", + "pytorch", + "swinv2", + "image-classification", + "vision", + "dataset:imagenet-1k", + "arxiv:2111.09883", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - image-classification datasets: - imagenet-1k widget: - src: example_title: Tiger - src: example_title: Teapot - src: example_title: Palace --- # Swin Transformer v2 (tiny-sized model) Swin Transformer v2 model pre-trained on ImageNet-1k at resolution 256x256. It was introduced in the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Liu et al. and first released in this repository. Disclaimer: The team releasing Swin Transformer v2 did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. In contrast, previous vision Transformers produce feature maps of a single low resolution and have quadratic computation complexity to input image size due to computation of self-attention globally. Swin Transformer v2 adds 3 main improvements: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the needs of vast labeled images. !model image Source ## Intended uses & limitations You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation. ### BibTeX entry and citation info", + "model_explanation_gemini": "Pre-trained on ImageNet-1k for image classification, this Swin Transformer v2 model efficiently processes 256x256 resolution images using hierarchical feature maps and local window self-attention." 
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_table-transformer-detection.json b/data/model_data_json/microsoft_table-transformer-detection.json new file mode 100644 index 0000000000000000000000000000000000000000..c5946fc9bee66a03175e951496a01ac2124db026 --- /dev/null +++ b/data/model_data_json/microsoft_table-transformer-detection.json @@ -0,0 +1,17 @@ +{ + "model_id": "microsoft/table-transformer-detection", + "downloads": 2886724, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "table-transformer", + "object-detection", + "arxiv:2110.00061", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit widget: - src: example_title: Invoice --- # Table Transformer (fine-tuned for Table Detection) Table Transformer (DETR) model trained on PubTables1M. It was introduced in the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Smock et al. and first released in this repository. Disclaimer: The team releasing Table Transformer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Table Transformer is equivalent to DETR, a Transformer-based object detection model. Note that the authors decided to use the \"normalize before\" setting of DETR, which means that layernorm is applied before self- and cross-attention. ## Usage You can use the raw model for detecting tables in documents. See the documentation for more info.", + "model_explanation_gemini": "Detects tables in unstructured documents using a Transformer-based object detection model fine-tuned on PubTables1M." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_table-transformer-structure-recognition-v1.1-all.json b/data/model_data_json/microsoft_table-transformer-structure-recognition-v1.1-all.json new file mode 100644 index 0000000000000000000000000000000000000000..e7ec9cea9506a2549c63c887adde3f02cedaa0b1 --- /dev/null +++ b/data/model_data_json/microsoft_table-transformer-structure-recognition-v1.1-all.json @@ -0,0 +1,16 @@ +{ + "model_id": "microsoft/table-transformer-structure-recognition-v1.1-all", + "downloads": 413475, + "tags": [ + "transformers", + "safetensors", + "table-transformer", + "object-detection", + "arxiv:2303.00716", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- # Table Transformer (pre-trained for Table Structure Recognition) Table Transformer (TATR) model trained on PubTables1M and FinTabNet.c. It was introduced in the paper Aligning benchmark datasets for table structure recognition by Smock et al. and first released in this repository. Disclaimer: The team releasing Table Transformer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Table Transformer is equivalent to DETR, a Transformer-based object detection model. Note that the authors decided to use the \"normalize before\" setting of DETR, which means that layernorm is applied before self- and cross-attention. ## Usage You can use the raw model for detecting tables in documents. See the documentation for more info.", + "model_explanation_gemini": "Detects and recognizes table structures in documents using a Transformer-based object detection approach." 
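The Table Transformer cards above describe usage without code. A sketch of table detection with the DETR-style post-processing in `transformers`; the page image path and the 0.9 score threshold are placeholders, and the structure-recognition checkpoints (including the one in the next file) follow the same pattern:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

# "page.png" is a placeholder for a rendered document page.
image = Image.open("page.png").convert("RGB")

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above the threshold, as (x_min, y_min, x_max, y_max) pixel boxes.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```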
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_table-transformer-structure-recognition.json b/data/model_data_json/microsoft_table-transformer-structure-recognition.json new file mode 100644 index 0000000000000000000000000000000000000000..ffdcc8a0c3519e303c203f420e0804943241b96d --- /dev/null +++ b/data/model_data_json/microsoft_table-transformer-structure-recognition.json @@ -0,0 +1,17 @@ +{ + "model_id": "microsoft/table-transformer-structure-recognition", + "downloads": 1126757, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "table-transformer", + "object-detection", + "arxiv:2110.00061", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit widget: - src: example_title: Table --- # Table Transformer (fine-tuned for Table Structure Recognition) Table Transformer (DETR) model trained on PubTables1M. It was introduced in the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Smock et al. and first released in this repository. Disclaimer: The team releasing Table Transformer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The Table Transformer is equivalent to DETR, a Transformer-based object detection model. Note that the authors decided to use the \"normalize before\" setting of DETR, which means that layernorm is applied before self- and cross-attention. ## Usage You can use the raw model for detecting the structure (like rows, columns) in tables. See the documentation for more info.", + "model_explanation_gemini": "Detects table structures (rows, columns) in unstructured documents using a Transformer-based object detection approach." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_trocr-base-handwritten.json b/data/model_data_json/microsoft_trocr-base-handwritten.json new file mode 100644 index 0000000000000000000000000000000000000000..e94ff775210407480d6e07180862b5c343ff8c62 --- /dev/null +++ b/data/model_data_json/microsoft_trocr-base-handwritten.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/trocr-base-handwritten", + "downloads": 214170, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vision-encoder-decoder", + "image-text-to-text", + "trocr", + "image-to-text", + "arxiv:2109.10282", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - trocr - image-to-text widget: - src: example_title: Note 1 - src: >- example_title: Note 2 - src: >- example_title: Note 3 license: mit --- # TrOCR (base-sized model, fine-tuned on IAM) TrOCR model fine-tuned on the IAM dataset. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository. Disclaimer: The team releasing TrOCR did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. 
Next, the Transformer text decoder autoregressively generates tokens. ## Intended uses & limitations You can use the raw model for optical character recognition (OCR) on single text-line images. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model in PyTorch: ### BibTeX entry and citation info", + "model_explanation_gemini": "Recognizes handwritten text in images using a Transformer-based encoder-decoder architecture fine-tuned on the IAM dataset." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_trocr-base-printed.json b/data/model_data_json/microsoft_trocr-base-printed.json new file mode 100644 index 0000000000000000000000000000000000000000..135b10ecba80966e7ebe9040459000976e054219 --- /dev/null +++ b/data/model_data_json/microsoft_trocr-base-printed.json @@ -0,0 +1,18 @@ +{ + "model_id": "microsoft/trocr-base-printed", + "downloads": 202091, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vision-encoder-decoder", + "image-text-to-text", + "trocr", + "image-to-text", + "arxiv:2109.10282", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - trocr - image-to-text widget: - src: example_title: Printed 1 - src: example_title: Printed 2 - src: example_title: Printed 3 --- # TrOCR (base-sized model, fine-tuned on SROIE) TrOCR model fine-tuned on the SROIE dataset. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository. Disclaimer: The team releasing TrOCR did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Next, the Transformer text decoder autoregressively generates tokens. ## Intended uses & limitations You can use the raw model for optical character recognition (OCR) on single text-line images. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model in PyTorch: ### BibTeX entry and citation info", + "model_explanation_gemini": "Converts printed text in images to digital text using optical character recognition (OCR), fine-tuned on the SROIE dataset." 
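The TrOCR cards share one stripped "How to use this model in PyTorch" snippet. A sketch of single text-line OCR; the image path is a placeholder, and the printed/large/small variants swap in by checkpoint name:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Use microsoft/trocr-base-printed (or the large/small variants) for printed text.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# "line.png" is a placeholder for a single text-line image.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The decoder generates the transcription autoregressively.
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```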
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_trocr-large-printed.json b/data/model_data_json/microsoft_trocr-large-printed.json new file mode 100644 index 0000000000000000000000000000000000000000..77dfac3b8728907da385d5a3e64afcafe31671ce --- /dev/null +++ b/data/model_data_json/microsoft_trocr-large-printed.json @@ -0,0 +1,18 @@ +{ + "model_id": "microsoft/trocr-large-printed", + "downloads": 282473, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vision-encoder-decoder", + "image-text-to-text", + "trocr", + "image-to-text", + "arxiv:2109.10282", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - trocr - image-to-text widget: - src: example_title: Printed 1 - src: example_title: Printed 2 - src: example_title: Printed 3 --- # TrOCR (large-sized model, fine-tuned on SROIE) TrOCR model fine-tuned on the SROIE dataset. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository. Disclaimer: The team releasing TrOCR did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Next, the Transformer text decoder autoregressively generates tokens. ## Intended uses & limitations You can use the raw model for optical character recognition (OCR) on single text-line images. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model in PyTorch: ### BibTeX entry and citation info", + "model_explanation_gemini": "Performs optical character recognition (OCR) on printed text-line images using a transformer-based encoder-decoder architecture fine-tuned on the SROIE dataset." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_trocr-small-handwritten.json b/data/model_data_json/microsoft_trocr-small-handwritten.json new file mode 100644 index 0000000000000000000000000000000000000000..8af762f0597f88650ecb4c6968296ed11a689fc7 --- /dev/null +++ b/data/model_data_json/microsoft_trocr-small-handwritten.json @@ -0,0 +1,17 @@ +{ + "model_id": "microsoft/trocr-small-handwritten", + "downloads": 517835, + "tags": [ + "transformers", + "pytorch", + "vision-encoder-decoder", + "image-text-to-text", + "trocr", + "image-to-text", + "arxiv:2109.10282", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - trocr - image-to-text widget: - src: example_title: Note 1 - src: example_title: Note 2 - src: example_title: Note 3 --- # TrOCR (small-sized model, fine-tuned on IAM) TrOCR model fine-tuned on the IAM dataset. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository. ## Model description The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. 
The image encoder was initialized from the weights of DeiT, while the text decoder was initialized from the weights of UniLM. Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Next, the Transformer text decoder autoregressively generates tokens. ## Intended uses & limitations You can use the raw model for optical character recognition (OCR) on single text-line images. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model in PyTorch: ### BibTeX entry and citation info", + "model_explanation_gemini": "Recognizes handwritten text in single-line images using a Transformer-based encoder-decoder architecture fine-tuned on the IAM dataset." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_unixcoder-base.json new file mode 100644 index 0000000000000000000000000000000000000000..bca2633e635bea586a7988667c7b8bae7270a9f9 --- /dev/null +++ b/data/model_data_json/microsoft_unixcoder-base.json @@ -0,0 +1,17 @@ +{ + "model_id": "microsoft/unixcoder-base", + "downloads": 341798, + "tags": [ + "transformers", + "pytorch", + "roberta", + "feature-extraction", + "en", + "arxiv:2203.03850", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 --- # Model Card for UniXcoder-base # Model Details ## Model Description UniXcoder is a unified cross-modal pre-trained model that leverages multimodal data (i.e., code comments and ASTs) to pretrain code representations. - **Developed by:** Microsoft Team - **Shared by [Optional]:** Hugging Face - **Model type:** Feature Engineering - **Language(s) (NLP):** en - **License:** Apache-2.0 - **Related Models:** - **Parent Model:** RoBERTa - **Resources for more information:** - Associated Paper # Uses ## 1. Dependency - pip install torch - pip install transformers ## 2. Quick Tour We implement a class to use UniXcoder and you can follow the code to build UniXcoder. You can download the class from the repository. In the following, we will give zero-shot examples for several tasks under different modes, including **code search (encoder-only)**, **code completion (decoder-only)**, **function name prediction (encoder-decoder)**, **API recommendation (encoder-decoder)**, **code summarization (encoder-decoder)**. ## 3. Encoder-only Mode For encoder-only mode, we give an example of **code search**. ### 1) Code and NL Embeddings Here, we give an example of obtaining code fragment embeddings from UniXcoder. ### 2) Similarity between code and NL Now, we calculate cosine similarity between NL and two functions. Although the difference between the two functions is only an operator, UniXcoder can distinguish them. ## 4. Decoder-only Mode For decoder-only mode, we give an example of **code completion**. ## 5. Encoder-Decoder Mode For encoder-decoder mode, we give three examples: **function name prediction**, **API recommendation**, **code summarization**. ### 1) **Function Name Prediction** ### 2) API Recommendation ### 3) Code Summarization # Reference If you use this code or UniXcoder, please consider citing us.
@article{guo2022unixcoder,
  title={UniXcoder: Unified Cross-Modal Pre-training for Code Representation},
  author={Guo, Daya and Lu, Shuai and Duan, Nan and Wang, Yanlin and Zhou, Ming and Yin, Jian},
  journal={arXiv preprint arXiv:2203.03850},
  year={2022}
}
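The zero-shot snippets above rely on a unixcoder.py wrapper class from the official repository, which was stripped from this dump. A simplified encoder-only sketch for code search with plain `transformers`; mean pooling is an assumption here, as the official class pools differently:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
model = AutoModel.from_pretrained("microsoft/unixcoder-base")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    # Mean-pool token states and L2-normalize so a dot product is cosine similarity.
    return torch.nn.functional.normalize(hidden.mean(dim=1), dim=-1)

query = embed("return the maximum value in a list")
candidate = embed("def f(xs):\n    return max(xs)")
print((query @ candidate.T).item())
```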
", + "model_explanation_gemini": "UniXcoder is a unified cross-modal pre-trained model for code representation that supports tasks like code search, completion, function prediction, API recommendation, and code summarization using multimodal data (code comments and AST)." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_wavlm-base-plus-sd.json b/data/model_data_json/microsoft_wavlm-base-plus-sd.json new file mode 100644 index 0000000000000000000000000000000000000000..161b589e18e709162aa324935bcbea8863e1f01b --- /dev/null +++ b/data/model_data_json/microsoft_wavlm-base-plus-sd.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/wavlm-base-plus-sd", + "downloads": 308161, + "tags": [ + "transformers", + "pytorch", + "wavlm", + "audio-frame-classification", + "speech", + "en", + "arxiv:1912.07875", + "arxiv:2106.06909", + "arxiv:2101.00390", + "arxiv:2110.13900", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - speech --- # WavLM-Base-Plus for Speaker Diarization Microsoft's WavLM The model was pretrained on 16kHz sampled speech audio with utterance and speaker contrastive loss. When using the model, make sure that your speech input is also sampled at 16kHz. The model was pre-trained on: - 60,000 hours of Libri-Light - 10,000 hours of GigaSpeech - 24,000 hours of VoxPopuli Paper: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei **Abstract** *Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. In this paper, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM is built based on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation. We first equip the Transformer structure with gated relative position bias to improve its capability on recognition tasks. For better speaker discrimination, we propose an utterance mixing training strategy, where additional overlapped utterances are created unsupervisely and incorporated during model training. Lastly, we scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and brings significant improvements for various speech processing tasks on their representative benchmarks.* The original model can be found under # Fine-tuning details The model is fine-tuned on the LibriMix dataset using just a linear layer for mapping the network outputs. # Usage ## Speaker Diarization # License The official license can be found here !design", + "model_explanation_gemini": "Microsoft's WavLM-Base-Plus is a speech model pretrained for speaker diarization, focusing on spoken content modeling and speaker identity preservation using 16kHz audio inputs." 
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_wavlm-base-plus-sv.json b/data/model_data_json/microsoft_wavlm-base-plus-sv.json new file mode 100644 index 0000000000000000000000000000000000000000..34ee5d86bac8576d190d9b0b730cc5979f84e959 --- /dev/null +++ b/data/model_data_json/microsoft_wavlm-base-plus-sv.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/wavlm-base-plus-sv", + "downloads": 606295, + "tags": [ + "transformers", + "pytorch", + "wavlm", + "audio-xvector", + "speech", + "en", + "arxiv:1912.07875", + "arxiv:2106.06909", + "arxiv:2101.00390", + "arxiv:2110.13900", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - speech --- # WavLM-Base-Plus for Speaker Verification Microsoft's WavLM The model was pretrained on 16kHz sampled speech audio with utterance and speaker contrastive loss. When using the model, make sure that your speech input is also sampled at 16kHz. **Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for more in-detail explanation of how to fine-tune the model. The model was pre-trained on: - 60,000 hours of Libri-Light - 10,000 hours of GigaSpeech - 24,000 hours of VoxPopuli Paper: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei **Abstract** *Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. In this paper, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM is built based on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation. We first equip the Transformer structure with gated relative position bias to improve its capability on recognition tasks. For better speaker discrimination, we propose an utterance mixing training strategy, where additional overlapped utterances are created unsupervisely and incorporated during model training. Lastly, we scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and brings significant improvements for various speech processing tasks on their representative benchmarks.* The original model can be found under # Fine-tuning details The model is fine-tuned on the VoxCeleb1 dataset using an X-Vector head with an Additive Margin Softmax loss X-Vectors: Robust DNN Embeddings for Speaker Recognition # Usage ## Speaker Verification # License The official license can be found here !design", + "model_explanation_gemini": "Microsoft's WavLM-Base-Plus is a pretrained audio model for speaker verification, trained on large-scale speech datasets to preserve speaker identity and spoken content." 
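Likewise for the speaker-verification card above, a sketch with the X-Vector head; the 0.86 cosine threshold is illustrative, not a calibrated operating point:

```python
import torch
from datasets import load_dataset
from transformers import AutoFeatureExtractor, WavLMForXVector

dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv")

# Compare two 16 kHz utterances via cosine similarity of their x-vectors.
audio = [dataset[0]["audio"]["array"], dataset[1]["audio"]["array"]]
inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    embeddings = model(**inputs).embeddings

embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=-1)
print("same speaker" if similarity > 0.86 else "different speakers")
```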
+} \ No newline at end of file diff --git a/data/model_data_json/microsoft_wavlm-base-plus.json b/data/model_data_json/microsoft_wavlm-base-plus.json new file mode 100644 index 0000000000000000000000000000000000000000..5bda18f9ad10cc4f763acf48ad3eee40b5ecd3b4 --- /dev/null +++ b/data/model_data_json/microsoft_wavlm-base-plus.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/wavlm-base-plus", + "downloads": 665226, + "tags": [ + "transformers", + "pytorch", + "wavlm", + "feature-extraction", + "speech", + "en", + "arxiv:1912.07875", + "arxiv:2106.06909", + "arxiv:2101.00390", + "arxiv:2110.13900", + "region:us" + ], + "description": "--- language: - en datasets: tags: - speech inference: false --- # WavLM-Base-Plus Microsoft's WavLM The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz. **Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for more in-detail explanation of how to fine-tune the model. The model was pre-trained on: - 60,000 hours of Libri-Light - 10,000 hours of GigaSpeech - 24,000 hours of VoxPopuli Paper: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei **Abstract** *Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. In this paper, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM is built based on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation. We first equip the Transformer structure with gated relative position bias to improve its capability on recognition tasks. For better speaker discrimination, we propose an utterance mixing training strategy, where additional overlapped utterances are created unsupervisely and incorporated during model training. Lastly, we scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and brings significant improvements for various speech processing tasks on their representative benchmarks.* The original model can be found under # Usage This is an English pre-trained speech model that has to be fine-tuned on a downstream task like speech recognition or audio classification before it can be used in inference. The model was pre-trained in English and should therefore perform well only in English. The model has been shown to work well on the SUPERB benchmark. **Note**: The model was pre-trained on phonemes rather than characters. This means that one should make sure that the input text is converted to a sequence of phonemes before fine-tuning. ## Speech Recognition To fine-tune the model for speech recognition, see the official speech recognition example. 
## Speech Classification To fine-tune the model for speech classification, see the official audio classification example. ## Speaker Verification TODO ## Speaker Diarization TODO # Contribution The model was contributed by cywang and patrickvonplaten. # License The official license can be found here", + "model_explanation_gemini": "Microsoft's WavLM-Base-Plus is a pre-trained English speech model designed for fine-tuning on downstream tasks like speech recognition and audio classification, trained on 94k hours of audio data to handle multi-faceted speech information." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_wavlm-large.json b/data/model_data_json/microsoft_wavlm-large.json new file mode 100644 index 0000000000000000000000000000000000000000..0803c1b33de820b0a3a2f32ace302631fb45a5e2 --- /dev/null +++ b/data/model_data_json/microsoft_wavlm-large.json @@ -0,0 +1,19 @@ +{ + "model_id": "microsoft/wavlm-large", + "downloads": 372722, + "tags": [ + "transformers", + "pytorch", + "wavlm", + "feature-extraction", + "speech", + "en", + "arxiv:1912.07875", + "arxiv:2106.06909", + "arxiv:2101.00390", + "arxiv:2110.13900", + "region:us" + ], + "description": "--- language: - en tags: - speech inference: false --- # WavLM-Large Microsoft's WavLM The large model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz. **Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model. The model was pre-trained on: - 60,000 hours of Libri-Light - 10,000 hours of GigaSpeech - 24,000 hours of VoxPopuli Paper: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei **Abstract** *Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. In this paper, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM is built based on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation. We first equip the Transformer structure with gated relative position bias to improve its capability on recognition tasks. For better speaker discrimination, we propose an utterance mixing training strategy, where additional overlapped utterances are created unsupervisedly and incorporated during model training. Lastly, we scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and brings significant improvements for various speech processing tasks on their representative benchmarks.* The original model can be found under # Usage This is an English pre-trained speech model that has to be fine-tuned on a downstream task like speech recognition or audio classification before it can be used in inference.
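As a hedged illustration of the audio classification route mentioned above (again, not code from the card), a sketch using transformers' `WavLMForSequenceClassification` with a hypothetical 7-class task:

```python
import torch
from transformers import WavLMForSequenceClassification

# num_labels=7 is a placeholder; the classification head is randomly initialized.
model = WavLMForSequenceClassification.from_pretrained("microsoft/wavlm-large", num_labels=7)
model.freeze_base_model()  # optionally train only the pooling/classifier head

dummy_audio = torch.randn(2, 16000)   # batch of two fake 1 s clips at 16 kHz
dummy_labels = torch.tensor([0, 3])   # fake class ids
out = model(input_values=dummy_audio, labels=dummy_labels)
out.loss.backward()
print(out.logits.shape)               # torch.Size([2, 7])
```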
The model was pre-trained in English and should therefore perform well only in English. The model has been shown to work well on the SUPERB benchmark. **Note**: The model was pre-trained on phonemes rather than characters. This means that one should make sure that the input text is converted to a sequence of phonemes before fine-tuning. ## Speech Recognition To fine-tune the model for speech recognition, see the official speech recognition example. ## Speech Classification To fine-tune the model for speech classification, see the official audio classification example. ## Speaker Verification TODO ## Speaker Diarization TODO # Contribution The model was contributed by cywang and patrickvonplaten. # License The official license can be found here", + "model_explanation_gemini": "WavLM-Large is a pre-trained English speech model designed for fine-tuning on downstream tasks like speech recognition and audio classification, optimized for 16kHz audio input." +} \ No newline at end of file diff --git a/data/model_data_json/microsoft_xclip-base-patch32.json b/data/model_data_json/microsoft_xclip-base-patch32.json new file mode 100644 index 0000000000000000000000000000000000000000..3fd3c99fa63337d94b69fc68730f2011a41e4e89 --- /dev/null +++ b/data/model_data_json/microsoft_xclip-base-patch32.json @@ -0,0 +1,20 @@ +{ + "model_id": "microsoft/xclip-base-patch32", + "downloads": 313653, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "xclip", + "vision", + "video-classification", + "en", + "arxiv:2208.02816", + "license:mit", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: mit tags: - vision - video-classification model-index: - name: nielsr/xclip-base-patch32 results: - task: type: video-classification dataset: name: Kinetics 400 type: kinetics-400 metrics: - type: top-1 accuracy value: 80.4 - type: top-5 accuracy value: 95.0 --- # X-CLIP (base-sized model) X-CLIP model (base-sized, patch resolution of 32) trained fully supervised on Kinetics-400. It was introduced in the paper Expanding Language-Image Pretrained Models for General Video Recognition by Ni et al. and first released in this repository. This model was trained using 8 frames per video, at a resolution of 224x224. Disclaimer: The team releasing X-CLIP did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description X-CLIP is a minimal extension of CLIP for general video-language understanding. The model is trained in a contrastive way on (video, text) pairs. [figure: X-CLIP architecture] This allows the model to be used for tasks like zero-shot, few-shot or fully supervised video classification and video-text retrieval. ## Intended uses & limitations You can use the raw model for determining how well text goes with a given video. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use For code examples, we refer to the documentation. ## Training data This model was trained on Kinetics-400. ### Preprocessing The exact details of preprocessing during training can be found here. The exact details of preprocessing during validation can be found here. During validation, one resizes the shorter edge of each frame, after which center cropping is performed to a fixed-size resolution (like 224x224). Next, frames are normalized across the RGB channels with the ImageNet mean and standard deviation.
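Although the card defers code examples to the documentation, a zero-shot video classification call with transformers' `XCLIPProcessor` and `XCLIPModel` might look like the sketch below; the eight random frames and the candidate labels are stand-ins for a real sampled clip:

```python
import numpy as np
import torch
from transformers import XCLIPProcessor, XCLIPModel

processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32")
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")

# Dummy video: 8 random RGB frames; in practice, sample 8 frames from a real clip.
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))
labels = ["playing guitar", "cooking", "dancing"]  # illustrative candidate labels

inputs = processor(text=labels, videos=[video], return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_video.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```

The softmax over `logits_per_video` turns the contrastive video-text similarities into a distribution over the supplied labels, which is what makes the zero-shot usage possible.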
## Evaluation results This model achieves a top-1 accuracy of 80.4% and a top-5 accuracy of 95.0%.", + "model_explanation_gemini": "X-CLIP is a vision model trained for video classification and video-text retrieval, extending CLIP to understand video-language relationships through contrastive learning on (video, text) pairs." +} \ No newline at end of file diff --git a/data/model_data_json/minishlab_potion-base-8M.json b/data/model_data_json/minishlab_potion-base-8M.json new file mode 100644 index 0000000000000000000000000000000000000000..917c944b4857514fa8f9fcbb3b6c9167081b71e6 --- /dev/null +++ b/data/model_data_json/minishlab_potion-base-8M.json @@ -0,0 +1,18 @@ +{ + "model_id": "minishlab/potion-base-8M", + "downloads": 281149, + "tags": [ + "model2vec", + "onnx", + "safetensors", + "embeddings", + "static-embeddings", + "mteb", + "sentence-transformers", + "license:mit", + "model-index", + "region:us" + ], + "description": "--- library_name: model2vec license: mit tags: - embeddings - static-embeddings - mteb - sentence-transformers model-index: - name: potion-base-8M results: - task: type: Classification dataset: name: MTEB AmazonCounterfactualClassification (en-ext) type: mteb/amazon_counterfactual config: en-ext split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 72.15142428785607 - type: ap value: 20.626102291010103 - type: ap_weighted value: 20.626102291010103 - type: f1 value: 59.187001923736894 - type: f1_weighted value: 77.34906471545477 - type: main_score value: 72.15142428785607 - task: type: Classification dataset: name: MTEB AmazonCounterfactualClassification (en) type: mteb/amazon_counterfactual config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 71.7910447761194 - type: ap value: 33.038020188116036 - type: ap_weighted value: 33.038020188116036 - type: f1 value: 65.03799728338926 - type: f1_weighted value: 74.32788084269461 - type: main_score value: 71.7910447761194 - task: type: Classification dataset: name: MTEB AmazonPolarityClassification (default) type: mteb/amazon_polarity config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 72.47644999999999 - type: ap value: 66.91002822830875 - type: ap_weighted value: 66.91002822830875 - type: f1 value: 72.2600863044581 - type: f1_weighted value: 72.2600863044581 - type: main_score value: 72.47644999999999 - task: type: Classification dataset: name: MTEB AmazonReviewsClassification (en) type: mteb/amazon_reviews_multi config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 36.012 - type: f1 value: 35.38209336470206 - type: f1_weighted value: 35.38209336470206 - type: main_score value: 36.012 - task: type: Retrieval dataset: name: MTEB ArguAna (default) type: mteb/arguana config: default split: test revision: c22ab2a51041ffd869aaddef7af8d8215647e41a metrics: - type: main_score value: 41.966 - type: map_at_1 value: 21.124000000000002 - type: map_at_10 value: 34.335 - type: map_at_100 value: 35.618 - type: map_at_1000 value: 35.653 - type: map_at_20 value: 35.21 - type: map_at_3 value: 30.287 - type: map_at_5 value: 32.364 - type: mrr_at_1 value: 21.62162162162162 - type: mrr_at_10 value: 34.509104969631224 - type: mrr_at_100 value: 35.79229946325059 - type: mrr_at_1000 value: 35.82767320968403 - type: mrr_at_20 value: 35.38485605181455 - type: mrr_at_3 value: 30.405405405405343 - type: mrr_at_5 value: 32.539118065433755 - 
type: nauc_map_at_1000_diff1 value: 7.960826255212609 - type: nauc_map_at_1000_max value: -0.036381315067780806 - type: nauc_map_at_1000_std value: 4.317766293607543 - type: nauc_map_at_100_diff1 value: 7.96318422584977 - type: nauc_map_at_100_max value: -0.007800758201736421 - type: nauc_map_at_100_std value: 4.362078927714198 - type: nauc_map_at_10_diff1 value: 7.718022643886373 - type: nauc_map_at_10_max value: -0.28312250079415263 - type: nauc_map_at_10_std value: 4.079196099329437 - type: nauc_map_at_1_diff1 value: 9.240393281366906 - type: nauc_map_at_1_max value: -4.35798405693968 - type: nauc_map_at_1_std value: 1.5076565659508505 - type: nauc_map_at_20_diff1 value: 8.028053857747947 - type: nauc_map_at_20_max value: 0.0719807687813251 - type: nauc_map_at_20_std value: 4.394812024847373 - type: nauc_map_at_3_diff1 value: 7.953781299828595 - type: nauc_map_at_3_max value: -0.573072664182506 - type: nauc_map_at_3_std value: 3.110821611511372 - type: nauc_map_at_5_diff1 value: 7.3135486297676415 - type: nauc_map_at_5_max value: -1.2456304709603878 - type: nauc_map_at_5_std value: 3.2332006196074805 - type: nauc_mrr_at_1000_diff1 value: 6.511595076207588 - type: nauc_mrr_at_1000_max value: -0.4777573692286575 - type: nauc_mrr_at_1000_std value: 4.19518565742107 - type: nauc_mrr_at_100_diff1 value: 6.515632481906436 - type: nauc_mrr_at_100_max value: -0.44877259463397945 - type: nauc_mrr_at_100_std value: 4.23945026873963 - type: nauc_mrr_at_10_diff1 value: 6.325261150908693 - type: nauc_mrr_at_10_max value: -0.6968688229450172 - type: nauc_mrr_at_10_std value: 3.9631303923167294 - type: nauc_mrr_at_1_diff1 value: 7.4844946822832785 - type: nauc_mrr_at_1_max value: -4.0195803039697315 - type: nauc_mrr_at_1_std value: 1.3908984330415426 - type: nauc_mrr_at_20_diff1 value: 6.596479652899773 - type: nauc_mrr_at_20_max value: -0.3643520262705732 - type: nauc_mrr_at_20_std value: 4.273437423781988 - type: nauc_mrr_at_3_diff1 value: 6.3669450211955745 - type: nauc_mrr_at_3_max value: -1.2252447747465325 - type: nauc_mrr_at_3_std value: 2.941708547001192 - type: nauc_mrr_at_5_diff1 value: 5.907234785613739 - type: nauc_mrr_at_5_max value: -1.6860364992754489 - type: nauc_mrr_at_5_std value: 3.0737345356263406 - type: nauc_ndcg_at_1000_diff1 value: 7.9706658500975704 - type: nauc_ndcg_at_1000_max value: 1.5533941879318276 - type: nauc_ndcg_at_1000_std value: 5.933724413159287 - type: nauc_ndcg_at_100_diff1 value: 8.107414913432397 - type: nauc_ndcg_at_100_max value: 2.5869418793842778 - type: nauc_ndcg_at_100_std value: 7.322146884970876 - type: nauc_ndcg_at_10_diff1 value: 7.669807780113455 - type: nauc_ndcg_at_10_max value: 1.886214180834648 - type: nauc_ndcg_at_10_std value: 6.055781567147952 - type: nauc_ndcg_at_1_diff1 value: 9.240393281366906 - type: nauc_ndcg_at_1_max value: -4.35798405693968 - type: nauc_ndcg_at_1_std value: 1.5076565659508505 - type: nauc_ndcg_at_20_diff1 value: 8.661303229272372 - type: nauc_ndcg_at_20_max value: 3.303174862536166 - type: nauc_ndcg_at_20_std value: 7.493758825967179 - type: nauc_ndcg_at_3_diff1 value: 7.858281169135036 - type: nauc_ndcg_at_3_max value: 0.7079724865506055 - type: nauc_ndcg_at_3_std value: 3.7402042497720958 - type: nauc_ndcg_at_5_diff1 value: 6.68694262946663 - type: nauc_ndcg_at_5_max value: -0.43002529778264326 - type: nauc_ndcg_at_5_std value: 3.9597009492387265 - type: nauc_precision_at_1000_diff1 value: -28.217119971169463 - type: nauc_precision_at_1000_max value: 17.425278660692022 - type: nauc_precision_at_1000_std value: 
46.7473304347162 - type: nauc_precision_at_100_diff1 value: 8.738254686624805 - type: nauc_precision_at_100_max value: 32.88945783040687 - type: nauc_precision_at_100_std value: 48.42583030760342 - type: nauc_precision_at_10_diff1 value: 7.873361516017592 - type: nauc_precision_at_10_max value: 9.802552072953949 - type: nauc_precision_at_10_std value: 13.506647301311148 - type: nauc_precision_at_1_diff1 value: 9.240393281366906 - type: nauc_precision_at_1_max value: -4.35798405693968 - type: nauc_precision_at_1_std value: 1.5076565659508505 - type: nauc_precision_at_20_diff1 value: 13.008220519097161 - type: nauc_precision_at_20_max value: 20.829507014709748 - type: nauc_precision_at_20_std value: 25.02998005000373 - type: nauc_precision_at_3_diff1 value: 7.685752623087433 - type: nauc_precision_at_3_max value: 4.126629771323765 - type: nauc_precision_at_3_std value: 5.440817692025366 - type: nauc_precision_at_5_diff1 value: 4.879990376967901 - type: nauc_precision_at_5_max value: 1.7076492862153407 - type: nauc_precision_at_5_std value: 6.009634283832547 - type: nauc_recall_at_1000_diff1 value: -28.217119971166543 - type: nauc_recall_at_1000_max value: 17.425278660689965 - type: nauc_recall_at_1000_std value: 46.74733043471749 - type: nauc_recall_at_100_diff1 value: 8.738254686625181 - type: nauc_recall_at_100_max value: 32.8894578304071 - type: nauc_recall_at_100_std value: 48.425830307603746 - type: nauc_recall_at_10_diff1 value: 7.87336151601764 - type: nauc_recall_at_10_max value: 9.802552072953997 - type: nauc_recall_at_10_std value: 13.506647301311201 - type: nauc_recall_at_1_diff1 value: 9.240393281366906 - type: nauc_recall_at_1_max value: -4.35798405693968 - type: nauc_recall_at_1_std value: 1.5076565659508505 - type: nauc_recall_at_20_diff1 value: 13.008220519097097 - type: nauc_recall_at_20_max value: 20.82950701470975 - type: nauc_recall_at_20_std value: 25.02998005000377 - type: nauc_recall_at_3_diff1 value: 7.685752623087458 - type: nauc_recall_at_3_max value: 4.126629771323791 - type: nauc_recall_at_3_std value: 5.440817692025401 - type: nauc_recall_at_5_diff1 value: 4.879990376967856 - type: nauc_recall_at_5_max value: 1.7076492862153638 - type: nauc_recall_at_5_std value: 6.009634283832578 - type: ndcg_at_1 value: 21.124000000000002 - type: ndcg_at_10 value: 41.966 - type: ndcg_at_100 value: 47.751 - type: ndcg_at_1000 value: 48.635 - type: ndcg_at_20 value: 45.08 - type: ndcg_at_3 value: 33.505 - type: ndcg_at_5 value: 37.266 - type: precision_at_1 value: 21.124000000000002 - type: precision_at_10 value: 6.643000000000001 - type: precision_at_100 value: 0.9249999999999999 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 3.93 - type: precision_at_3 value: 14.296000000000001 - type: precision_at_5 value: 10.413 - type: recall_at_1 value: 21.124000000000002 - type: recall_at_10 value: 66.43 - type: recall_at_100 value: 92.461 - type: recall_at_1000 value: 99.289 - type: recall_at_20 value: 78.592 - type: recall_at_3 value: 42.888 - type: recall_at_5 value: 52.063 - task: type: Clustering dataset: name: MTEB ArxivClusteringP2P (default) type: mteb/arxiv-clustering-p2p config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: main_score value: 35.387660145946825 - type: v_measure value: 35.387660145946825 - type: v_measure_std value: 14.022525689022785 - task: type: Clustering dataset: name: MTEB ArxivClusteringS2S (default) type: mteb/arxiv-clustering-s2s config: default split: test revision: 
f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: main_score value: 25.26058942964131 - type: v_measure value: 25.26058942964131 - type: v_measure_std value: 14.850432186356857 - task: type: Reranking dataset: name: MTEB AskUbuntuDupQuestions (default) type: mteb/askubuntudupquestions-reranking config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: main_score value: 54.13950871400633 - type: map value: 54.13950871400633 - type: mrr value: 68.87437892978059 - type: nAUC_map_diff1 value: 3.489277672557011 - type: nAUC_map_max value: 15.848457273691064 - type: nAUC_map_std value: 5.166813098270773 - type: nAUC_mrr_diff1 value: 4.9924344024669765 - type: nAUC_mrr_max value: 21.861692980537956 - type: nAUC_mrr_std value: 8.256966784037171 - task: type: STS dataset: name: MTEB BIOSSES (default) type: mteb/biosses-sts config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cosine_pearson value: 79.11612010879227 - type: cosine_spearman value: 75.85775256673794 - type: euclidean_pearson value: 77.46080265077437 - type: euclidean_spearman value: 75.85775256673794 - type: main_score value: 75.85775256673794 - type: manhattan_pearson value: 77.73191375456281 - type: manhattan_spearman value: 75.98908086034702 - type: pearson value: 79.11612010879227 - type: spearman value: 75.85775256673794 - task: type: Classification dataset: name: MTEB Banking77Classification (default) type: mteb/banking77 config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 72.63636363636363 - type: f1 value: 71.69751597573539 - type: f1_weighted value: 71.69751597573539 - type: main_score value: 72.63636363636363 - task: type: Clustering dataset: name: MTEB BiorxivClusteringP2P (default) type: mteb/biorxiv-clustering-p2p config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: main_score value: 30.861840536151014 - type: v_measure value: 30.861840536151014 - type: v_measure_std value: 0.8096483751274005 - task: type: Clustering dataset: name: MTEB BiorxivClusteringS2S (default) type: mteb/biorxiv-clustering-s2s config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: main_score value: 20.219544420664455 - type: v_measure value: 20.219544420664455 - type: v_measure_std value: 0.7431903039116942 - task: type: Retrieval dataset: name: MTEB CQADupstackAndroidRetrieval (default) type: mteb/cqadupstack-android config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: main_score value: 31.835 - type: map_at_1 value: 19.939 - type: map_at_10 value: 26.924 - type: map_at_100 value: 28.16 - type: map_at_1000 value: 28.316999999999997 - type: map_at_20 value: 27.554000000000002 - type: map_at_3 value: 24.45 - type: map_at_5 value: 25.751 - type: mrr_at_1 value: 25.894134477825464 - type: mrr_at_10 value: 32.65152031246451 - type: mrr_at_100 value: 33.58362210177363 - type: mrr_at_1000 value: 33.66415578481638 - type: mrr_at_20 value: 33.158616397714056 - type: mrr_at_3 value: 30.51979017644255 - type: mrr_at_5 value: 31.67143538388174 - type: nauc_map_at_1000_diff1 value: 43.61649840733464 - type: nauc_map_at_1000_max value: 27.361709993841355 - type: nauc_map_at_1000_std value: -1.47509416166404 - type: nauc_map_at_100_diff1 value: 43.63694784277137 - type: nauc_map_at_100_max value: 27.3675446795805 - type: nauc_map_at_100_std value: -1.4918015679743737 - type: 
nauc_map_at_10_diff1 value: 43.85263484013946 - type: nauc_map_at_10_max value: 26.810142038619045 - type: nauc_map_at_10_std value: -1.9884710880957612 - type: nauc_map_at_1_diff1 value: 48.66149039458694 - type: nauc_map_at_1_max value: 25.719796249226828 - type: nauc_map_at_1_std value: -3.291830544258096 - type: nauc_map_at_20_diff1 value: 43.70511471916722 - type: nauc_map_at_20_max value: 27.211922285560092 - type: nauc_map_at_20_std value: -1.621254133243609 - type: nauc_map_at_3_diff1 value: 45.678378884966854 - type: nauc_map_at_3_max value: 26.263363796878807 - type: nauc_map_at_3_std value: -3.067861673919005 - type: nauc_map_at_5_diff1 value: 44.28820868486158 - type: nauc_map_at_5_max value: 27.02028586800064 - type: nauc_map_at_5_std value: -2.8993536712942554 - type: nauc_mrr_at_1000_diff1 value: 41.91452307309703 - type: nauc_mrr_at_1000_max value: 28.25542784321284 - type: nauc_mrr_at_1000_std value: -1.2881473492995474 - type: nauc_mrr_at_100_diff1 value: 41.887361041816355 - type: nauc_mrr_at_100_max value: 28.242674898536045 - type: nauc_mrr_at_100_std value: -1.2962789057617752 - type: nauc_mrr_at_10_diff1 value: 41.839392429152184 - type: nauc_mrr_at_10_max value: 28.18109937160502 - type: nauc_mrr_at_10_std value: -1.760338307129395 - type: nauc_mrr_at_1_diff1 value: 46.97337896088234 - type: nauc_mrr_at_1_max value: 28.47299575870196 - type: nauc_mrr_at_1_std value: -2.699423724792112 - type: nauc_mrr_at_20_diff1 value: 41.87609128070427 - type: nauc_mrr_at_20_max value: 28.275298954521837 - type: nauc_mrr_at_20_std value: -1.3019240483529069 - type: nauc_mrr_at_3_diff1 value: 43.7337496151517 - type: nauc_mrr_at_3_max value: 27.798267478018285 - type: nauc_mrr_at_3_std value: -2.840593072947404 - type: nauc_mrr_at_5_diff1 value: 42.334483231228894 - type: nauc_mrr_at_5_max value: 28.312298246235912 - type: nauc_mrr_at_5_std value: -2.4627148837425574 - type: nauc_ndcg_at_1000_diff1 value: 41.15727539315947 - type: nauc_ndcg_at_1000_max value: 28.221291832726013 - type: nauc_ndcg_at_1000_std value: 2.0023108110987686 - type: nauc_ndcg_at_100_diff1 value: 40.696711368737986 - type: nauc_ndcg_at_100_max value: 28.3380433133816 - type: nauc_ndcg_at_100_std value: 1.6747741379499974 - type: nauc_ndcg_at_10_diff1 value: 40.68084799209197 - type: nauc_ndcg_at_10_max value: 27.001668531808047 - type: nauc_ndcg_at_10_std value: -0.6698055635076909 - type: nauc_ndcg_at_1_diff1 value: 46.97337896088234 - type: nauc_ndcg_at_1_max value: 28.47299575870196 - type: nauc_ndcg_at_1_std value: -2.699423724792112 - type: nauc_ndcg_at_20_diff1 value: 40.66080469225681 - type: nauc_ndcg_at_20_max value: 27.65886977082646 - type: nauc_ndcg_at_20_std value: 0.7450066458769301 - type: nauc_ndcg_at_3_diff1 value: 42.76104820392522 - type: nauc_ndcg_at_3_max value: 26.519613853147632 - type: nauc_ndcg_at_3_std value: -2.4350130293906034 - type: nauc_ndcg_at_5_diff1 value: 41.019172353488194 - type: nauc_ndcg_at_5_max value: 27.496046368143357 - type: nauc_ndcg_at_5_std value: -2.2882580326645177 - type: nauc_precision_at_1000_diff1 value: -14.261675661323125 - type: nauc_precision_at_1000_max value: -1.183805005826827 - type: nauc_precision_at_1000_std value: 3.344837871953594 - type: nauc_precision_at_100_diff1 value: 2.705968352361474 - type: nauc_precision_at_100_max value: 15.123914801051598 - type: nauc_precision_at_100_std value: 6.622282531987529 - type: nauc_precision_at_10_diff1 value: 21.143497652137974 - type: nauc_precision_at_10_max value: 22.754667045964673 - type: 
nauc_precision_at_10_std value: 2.56769270957959 - type: nauc_precision_at_1_diff1 value: 46.97337896088234 - type: nauc_precision_at_1_max value: 28.47299575870196 - type: nauc_precision_at_1_std value: -2.699423724792112 - type: nauc_precision_at_20_diff1 value: 15.750482341955857 - type: nauc_precision_at_20_max value: 22.860380841938827 - type: nauc_precision_at_20_std value: 4.22745838192582 - type: nauc_precision_at_3_diff1 value: 35.61809209460161 - type: nauc_precision_at_3_max value: 27.0006337531976 - type: nauc_precision_at_3_std value: -1.4556398881692423 - type: nauc_precision_at_5_diff1 value: 28.851808861899496 - type: nauc_precision_at_5_max value: 27.469054608601784 - type: nauc_precision_at_5_std value: -1.1421142808937477 - type: nauc_recall_at_1000_diff1 value: 33.27567106545891 - type: nauc_recall_at_1000_max value: 30.098997951989325 - type: nauc_recall_at_1000_std value: 37.339251250157766 - type: nauc_recall_at_100_diff1 value: 29.072377336992822 - type: nauc_recall_at_100_max value: 28.48476566182903 - type: nauc_recall_at_100_std value: 14.360417936748082 - type: nauc_recall_at_10_diff1 value: 32.83564819819592 - type: nauc_recall_at_10_max value: 24.465508171060677 - type: nauc_recall_at_10_std value: 3.332253149508536 - type: nauc_recall_at_1_diff1 value: 48.66149039458694 - type: nauc_recall_at_1_max value: 25.719796249226828 - type: nauc_recall_at_1_std value: -3.291830544258096 - type: nauc_recall_at_20_diff1 value: 31.185350107155045 - type: nauc_recall_at_20_max value: 25.812923152751406 - type: nauc_recall_at_20_std value: 8.353054109145367 - type: nauc_recall_at_3_diff1 value: 40.27297484569938 - type: nauc_recall_at_3_max value: 23.81327189620511 - type: nauc_recall_at_3_std value: -2.526830052534271 - type: nauc_recall_at_5_diff1 value: 34.64896359382995 - type: nauc_recall_at_5_max value: 25.750218989139317 - type: nauc_recall_at_5_std value: -1.3789317138918638 - type: ndcg_at_1 value: 25.894000000000002 - type: ndcg_at_10 value: 31.835 - type: ndcg_at_100 value: 37.325 - type: ndcg_at_1000 value: 40.586 - type: ndcg_at_20 value: 33.714 - type: ndcg_at_3 value: 28.143 - type: ndcg_at_5 value: 29.648999999999997 - type: precision_at_1 value: 25.894000000000002 - type: precision_at_10 value: 6.194999999999999 - type: precision_at_100 value: 1.126 - type: precision_at_1000 value: 0.173 - type: precision_at_20 value: 3.7199999999999998 - type: precision_at_3 value: 13.543 - type: precision_at_5 value: 9.757 - type: recall_at_1 value: 19.939 - type: recall_at_10 value: 40.537 - type: recall_at_100 value: 64.717 - type: recall_at_1000 value: 87.01299999999999 - type: recall_at_20 value: 47.677 - type: recall_at_3 value: 29.301 - type: recall_at_5 value: 33.918 - task: type: Retrieval dataset: name: MTEB CQADupstackEnglishRetrieval (default) type: mteb/cqadupstack-english config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: main_score value: 25.734 - type: map_at_1 value: 16.601 - type: map_at_10 value: 22.07 - type: map_at_100 value: 22.958000000000002 - type: map_at_1000 value: 23.074 - type: map_at_20 value: 22.52 - type: map_at_3 value: 20.137 - type: map_at_5 value: 21.315 - type: mrr_at_1 value: 20.382165605095544 - type: mrr_at_10 value: 25.95447881912849 - type: mrr_at_100 value: 26.72268332839149 - type: mrr_at_1000 value: 26.79228081014276 - type: mrr_at_20 value: 26.372942687112676 - type: mrr_at_3 value: 24.097664543524406 - type: mrr_at_5 value: 25.269639065817373 - type: nauc_map_at_1000_diff1 value: 
39.97979443324452 - type: nauc_map_at_1000_max value: 13.65503993855689 - type: nauc_map_at_1000_std value: -2.0265680574493286 - type: nauc_map_at_100_diff1 value: 40.04134376146643 - type: nauc_map_at_100_max value: 13.602473622919186 - type: nauc_map_at_100_std value: -2.1531627932652073 - type: nauc_map_at_10_diff1 value: 40.321538712092966 - type: nauc_map_at_10_max value: 13.5001803982381 - type: nauc_map_at_10_std value: -2.628320244096416 - type: nauc_map_at_1_diff1 value: 47.528556920568896 - type: nauc_map_at_1_max value: 15.848152314768068 - type: nauc_map_at_1_std value: -3.8515029742454763 - type: nauc_map_at_20_diff1 value: 40.22452252482904 - type: nauc_map_at_20_max value: 13.501820277821633 - type: nauc_map_at_20_std value: -2.4849480670127835 - type: nauc_map_at_3_diff1 value: 41.68152420395297 - type: nauc_map_at_3_max value: 13.993359536648425 - type: nauc_map_at_3_std value: -4.120472655476033 - type: nauc_map_at_5_diff1 value: 40.72541498326932 - type: nauc_map_at_5_max value: 13.706855573979945 - type: nauc_map_at_5_std value: -3.168857069165899 - type: nauc_mrr_at_1000_diff1 value: 37.9361528126572 - type: nauc_mrr_at_1000_max value: 14.435169065764649 - type: nauc_mrr_at_1000_std value: -0.3672502634006242 - type: nauc_mrr_at_100_diff1 value: 37.94986436229442 - type: nauc_mrr_at_100_max value: 14.435994989813192 - type: nauc_mrr_at_100_std value: -0.37576385382293837 - type: nauc_mrr_at_10_diff1 value: 38.11900316449423 - type: nauc_mrr_at_10_max value: 14.472293540608746 - type: nauc_mrr_at_10_std value: -0.43716209085613345 - type: nauc_mrr_at_1_diff1 value: 44.21976115137286 - type: nauc_mrr_at_1_max value: 17.82290497090946 - type: nauc_mrr_at_1_std value: -1.547820761457578 - type: nauc_mrr_at_20_diff1 value: 38.024147471792524 - type: nauc_mrr_at_20_max value: 14.385378851779368 - type: nauc_mrr_at_20_std value: -0.47797312999005215 - type: nauc_mrr_at_3_diff1 value: 39.15186528374059 - type: nauc_mrr_at_3_max value: 15.21927102759239 - type: nauc_mrr_at_3_std value: -1.5215890424003806 - type: nauc_mrr_at_5_diff1 value: 38.45626599850357 - type: nauc_mrr_at_5_max value: 14.640408888284732 - type: nauc_mrr_at_5_std value: -0.7311075454359176 - type: nauc_ndcg_at_1000_diff1 value: 36.09833573033763 - type: nauc_ndcg_at_1000_max value: 13.245365815282575 - type: nauc_ndcg_at_1000_std value: 1.5761746506032988 - type: nauc_ndcg_at_100_diff1 value: 36.904025539005644 - type: nauc_ndcg_at_100_max value: 12.957957928970645 - type: nauc_ndcg_at_100_std value: 0.4532239536005292 - type: nauc_ndcg_at_10_diff1 value: 37.32497182133629 - type: nauc_ndcg_at_10_max value: 12.490853969491074 - type: nauc_ndcg_at_10_std value: -0.7416415504597471 - type: nauc_ndcg_at_1_diff1 value: 44.21976115137286 - type: nauc_ndcg_at_1_max value: 17.82290497090946 - type: nauc_ndcg_at_1_std value: -1.547820761457578 - type: nauc_ndcg_at_20_diff1 value: 37.28170904668032 - type: nauc_ndcg_at_20_max value: 12.268080858587759 - type: nauc_ndcg_at_20_std value: -0.7360183931126623 - type: nauc_ndcg_at_3_diff1 value: 39.02888999235542 - type: nauc_ndcg_at_3_max value: 13.901334459489329 - type: nauc_ndcg_at_3_std value: -2.7172751935367647 - type: nauc_ndcg_at_5_diff1 value: 38.02752207740974 - type: nauc_ndcg_at_5_max value: 13.02646174038431 - type: nauc_ndcg_at_5_std value: -1.609904028585218 - type: nauc_precision_at_1000_diff1 value: -6.66757757004073 - type: nauc_precision_at_1000_max value: 9.0023204523236 - type: nauc_precision_at_1000_std value: 23.5060357363243 - type: 
nauc_precision_at_100_diff1 value: 6.113195112414238 - type: nauc_precision_at_100_max value: 11.685619926894306 - type: nauc_precision_at_100_std value: 19.46517809799074 - type: nauc_precision_at_10_diff1 value: 20.39466712905433 - type: nauc_precision_at_10_max value: 11.42898255449916 - type: nauc_precision_at_10_std value: 9.716462445452729 - type: nauc_precision_at_1_diff1 value: 44.21976115137286 - type: nauc_precision_at_1_max value: 17.82290497090946 - type: nauc_precision_at_1_std value: -1.547820761457578 - type: nauc_precision_at_20_diff1 value: 16.658730057271427 - type: nauc_precision_at_20_max value: 11.1652114440581 - type: nauc_precision_at_20_std value: 11.300027272107469 - type: nauc_precision_at_3_diff1 value: 30.28030907617402 - type: nauc_precision_at_3_max value: 13.794055418422083 - type: nauc_precision_at_3_std value: 0.6048823642224063 - type: nauc_precision_at_5_diff1 value: 25.663334758638058 - type: nauc_precision_at_5_max value: 12.249908938864056 - type: nauc_precision_at_5_std value: 5.0045410071189425 - type: nauc_recall_at_1000_diff1 value: 21.220572448408245 - type: nauc_recall_at_1000_max value: 9.691420267810058 - type: nauc_recall_at_1000_std value: 12.85759827330056 - type: nauc_recall_at_100_diff1 value: 28.21527141094479 - type: nauc_recall_at_100_max value: 9.83831880254868 - type: nauc_recall_at_100_std value: 5.435149253402134 - type: nauc_recall_at_10_diff1 value: 30.716014201487262 - type: nauc_recall_at_10_max value: 8.051593782800182 - type: nauc_recall_at_10_std value: 0.4471610378184442 - type: nauc_recall_at_1_diff1 value: 47.528556920568896 - type: nauc_recall_at_1_max value: 15.848152314768068 - type: nauc_recall_at_1_std value: -3.8515029742454763 - type: nauc_recall_at_20_diff1 value: 29.800603042147905 - type: nauc_recall_at_20_max value: 7.042808403898776 - type: nauc_recall_at_20_std value: 0.8179034283502986 - type: nauc_recall_at_3_diff1 value: 36.05311584515151 - type: nauc_recall_at_3_max value: 11.03138015792514 - type: nauc_recall_at_3_std value: -4.298332543889119 - type: nauc_recall_at_5_diff1 value: 33.34542113435848 - type: nauc_recall_at_5_max value: 9.391429367517976 - type: nauc_recall_at_5_std value: -1.5174868347878459 - type: ndcg_at_1 value: 20.382 - type: ndcg_at_10 value: 25.734 - type: ndcg_at_100 value: 29.952 - type: ndcg_at_1000 value: 32.618 - type: ndcg_at_20 value: 27.181 - type: ndcg_at_3 value: 22.445999999999998 - type: ndcg_at_5 value: 24.162 - type: precision_at_1 value: 20.382 - type: precision_at_10 value: 4.662 - type: precision_at_100 value: 0.8580000000000001 - type: precision_at_1000 value: 0.133 - type: precision_at_20 value: 2.828 - type: precision_at_3 value: 10.446 - type: precision_at_5 value: 7.682 - type: recall_at_1 value: 16.601 - type: recall_at_10 value: 32.882 - type: recall_at_100 value: 51.273 - type: recall_at_1000 value: 69.33200000000001 - type: recall_at_20 value: 38.22 - type: recall_at_3 value: 23.54 - type: recall_at_5 value: 28.054000000000002 - task: type: Retrieval dataset: name: MTEB CQADupstackGamingRetrieval (default) type: mteb/cqadupstack-gaming config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: main_score value: 39.235 - type: map_at_1 value: 25.386999999999997 - type: map_at_10 value: 34.183 - type: map_at_100 value: 35.198 - type: map_at_1000 value: 35.292 - type: map_at_20 value: 34.756 - type: map_at_3 value: 31.466 - type: map_at_5 value: 33.037 - type: mrr_at_1 value: 29.404388714733543 - type: mrr_at_10 value: 
37.51880877742944 - type: mrr_at_100 value: 38.30457109532953 - type: mrr_at_1000 value: 38.3645245292866 - type: mrr_at_20 value: 37.94776237222878 - type: mrr_at_3 value: 35.15151515151513 - type: mrr_at_5 value: 36.530825496342715 - type: nauc_map_at_1000_diff1 value: 41.249973220934464 - type: nauc_map_at_1000_max value: 23.416302755877073 - type: nauc_map_at_1000_std value: -10.207899212437999 - type: nauc_map_at_100_diff1 value: 41.24390045906369 - type: nauc_map_at_100_max value: 23.393682611799267 - type: nauc_map_at_100_std value: -10.254556576082482 - type: nauc_map_at_10_diff1 value: 41.382354597936995 - type: nauc_map_at_10_max value: 23.176782265492363 - type: nauc_map_at_10_std value: -10.849718292221906 - type: nauc_map_at_1_diff1 value: 45.39686265513208 - type: nauc_map_at_1_max value: 19.620871905273706 - type: nauc_map_at_1_std value: -12.904987428165654 - type: nauc_map_at_20_diff1 value: 41.27244082919643 - type: nauc_map_at_20_max value: 23.302684773349597 - type: nauc_map_at_20_std value: -10.441842806985154 - type: nauc_map_at_3_diff1 value: 41.8919220244127 - type: nauc_map_at_3_max value: 22.254220793423723 - type: nauc_map_at_3_std value: -12.130298439753705 - type: nauc_map_at_5_diff1 value: 41.58025783631085 - type: nauc_map_at_5_max value: 22.90826213564573 - type: nauc_map_at_5_std value: -11.165811549758352 - type: nauc_mrr_at_1000_diff1 value: 40.53152598499822 - type: nauc_mrr_at_1000_max value: 25.11227665851315 - type: nauc_mrr_at_1000_std value: -8.08741271282522 - type: nauc_mrr_at_100_diff1 value: 40.51963005358264 - type: nauc_mrr_at_100_max value: 25.120293035347625 - type: nauc_mrr_at_100_std value: -8.08477757772673 - type: nauc_mrr_at_10_diff1 value: 40.630254919734845 - type: nauc_mrr_at_10_max value: 25.192263018985 - type: nauc_mrr_at_10_std value: -8.343786686430308 - type: nauc_mrr_at_1_diff1 value: 45.24802769641752 - type: nauc_mrr_at_1_max value: 22.81400229887994 - type: nauc_mrr_at_1_std value: -11.030374885452746 - type: nauc_mrr_at_20_diff1 value: 40.527874579465404 - type: nauc_mrr_at_20_max value: 25.09785309228408 - type: nauc_mrr_at_20_std value: -8.178961300984005 - type: nauc_mrr_at_3_diff1 value: 40.9982110047705 - type: nauc_mrr_at_3_max value: 24.89415486978485 - type: nauc_mrr_at_3_std value: -9.326777261347539 - type: nauc_mrr_at_5_diff1 value: 40.80630420274428 - type: nauc_mrr_at_5_max value: 25.27575084878062 - type: nauc_mrr_at_5_std value: -8.546736722404525 - type: nauc_ndcg_at_1000_diff1 value: 39.53378645935715 - type: nauc_ndcg_at_1000_max value: 25.526492849521226 - type: nauc_ndcg_at_1000_std value: -6.007063152931765 - type: nauc_ndcg_at_100_diff1 value: 39.0880907026097 - type: nauc_ndcg_at_100_max value: 25.27434977919565 - type: nauc_ndcg_at_100_std value: -6.494059729717049 - type: nauc_ndcg_at_10_diff1 value: 39.75643189392527 - type: nauc_ndcg_at_10_max value: 24.79335502116443 - type: nauc_ndcg_at_10_std value: -8.786781322519788 - type: nauc_ndcg_at_1_diff1 value: 45.24802769641752 - type: nauc_ndcg_at_1_max value: 22.81400229887994 - type: nauc_ndcg_at_1_std value: -11.030374885452746 - type: nauc_ndcg_at_20_diff1 value: 39.38115636990762 - type: nauc_ndcg_at_20_max value: 24.830948061340973 - type: nauc_ndcg_at_20_std value: -7.74514857483731 - type: nauc_ndcg_at_3_diff1 value: 40.597424968913295 - type: nauc_ndcg_at_3_max value: 23.83761797284813 - type: nauc_ndcg_at_3_std value: -10.826014984199753 - type: nauc_ndcg_at_5_diff1 value: 40.160243884240955 - type: nauc_ndcg_at_5_max value: 
24.641005184802403 - type: nauc_ndcg_at_5_std value: -9.394573143721122 - type: nauc_precision_at_1000_diff1 value: -0.26775483855404 - type: nauc_precision_at_1000_max value: 23.052779599626216 - type: nauc_precision_at_1000_std value: 24.978867586645737 - type: nauc_precision_at_100_diff1 value: 9.73599417323489 - type: nauc_precision_at_100_max value: 26.664612833573067 - type: nauc_precision_at_100_std value: 15.747547424892522 - type: nauc_precision_at_10_diff1 value: 25.384143998683495 - type: nauc_precision_at_10_max value: 28.77515164969203 - type: nauc_precision_at_10_std value: 1.334799782027906 - type: nauc_precision_at_1_diff1 value: 45.24802769641752 - type: nauc_precision_at_1_max value: 22.81400229887994 - type: nauc_precision_at_1_std value: -11.030374885452746 - type: nauc_precision_at_20_diff1 value: 20.21252517032333 - type: nauc_precision_at_20_max value: 28.092242647209847 - type: nauc_precision_at_20_std value: 7.13292725544981 - type: nauc_precision_at_3_diff1 value: 33.31087126292084 - type: nauc_precision_at_3_max value: 28.144729235595268 - type: nauc_precision_at_3_std value: -6.680273865904818 - type: nauc_precision_at_5_diff1 value: 29.65876394876068 - type: nauc_precision_at_5_max value: 29.35126830830009 - type: nauc_precision_at_5_std value: -1.6373943088766274 - type: nauc_recall_at_1000_diff1 value: 28.93648565815677 - type: nauc_recall_at_1000_max value: 35.83681303333163 - type: nauc_recall_at_1000_std value: 33.065249002817446 - type: nauc_recall_at_100_diff1 value: 27.743019102171594 - type: nauc_recall_at_100_max value: 28.027951033595023 - type: nauc_recall_at_100_std value: 9.499502949546343 - type: nauc_recall_at_10_diff1 value: 33.975592980890205 - type: nauc_recall_at_10_max value: 25.654266106207007 - type: nauc_recall_at_10_std value: -4.889087003341999 - type: nauc_recall_at_1_diff1 value: 45.39686265513208 - type: nauc_recall_at_1_max value: 19.620871905273706 - type: nauc_recall_at_1_std value: -12.904987428165654 - type: nauc_recall_at_20_diff1 value: 32.428638046562156 - type: nauc_recall_at_20_max value: 25.811049662670854 - type: nauc_recall_at_20_std value: -1.084167664066214 - type: nauc_recall_at_3_diff1 value: 36.80239523147669 - type: nauc_recall_at_3_max value: 23.70115293826517 - type: nauc_recall_at_3_std value: -10.179865917816631 - type: nauc_recall_at_5_diff1 value: 35.481273082880385 - type: nauc_recall_at_5_max value: 25.22699895557444 - type: nauc_recall_at_5_std value: -6.928154160954265 - type: ndcg_at_1 value: 29.404000000000003 - type: ndcg_at_10 value: 39.235 - type: ndcg_at_100 value: 44.072 - type: ndcg_at_1000 value: 46.272999999999996 - type: ndcg_at_20 value: 40.983000000000004 - type: ndcg_at_3 value: 34.292 - type: ndcg_at_5 value: 36.735 - type: precision_at_1 value: 29.404000000000003 - type: precision_at_10 value: 6.539000000000001 - type: precision_at_100 value: 0.984 - type: precision_at_1000 value: 0.125 - type: precision_at_20 value: 3.752 - type: precision_at_3 value: 15.423 - type: precision_at_5 value: 10.984 - type: recall_at_1 value: 25.386999999999997 - type: recall_at_10 value: 51.256 - type: recall_at_100 value: 73.53699999999999 - type: recall_at_1000 value: 89.522 - type: recall_at_20 value: 57.687 - type: recall_at_3 value: 37.830999999999996 - type: recall_at_5 value: 43.811 - task: type: Retrieval dataset: name: MTEB CQADupstackGisRetrieval (default) type: mteb/cqadupstack-gis config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: main_score value: 
19.197 - type: map_at_1 value: 10.832 - type: map_at_10 value: 16.154 - type: map_at_100 value: 16.863 - type: map_at_1000 value: 16.979 - type: map_at_20 value: 16.494 - type: map_at_3 value: 14.654 - type: map_at_5 value: 15.634 - type: mrr_at_1 value: 11.751412429378531 - type: mrr_at_10 value: 17.286476549188407 - type: mrr_at_100 value: 18.019080515365157 - type: mrr_at_1000 value: 18.122220740371624 - type: mrr_at_20 value: 17.643986643881693 - type: mrr_at_3 value: 15.70621468926553 - type: mrr_at_5 value: 16.774011299435024 - type: nauc_map_at_1000_diff1 value: 37.927063185916786 - type: nauc_map_at_1000_max value: 14.15651072891371 - type: nauc_map_at_1000_std value: -8.124962552251457 - type: nauc_map_at_100_diff1 value: 37.93525025821844 - type: nauc_map_at_100_max value: 14.131523699537288 - type: nauc_map_at_100_std value: -8.170583771371396 - type: nauc_map_at_10_diff1 value: 38.42813636094302 - type: nauc_map_at_10_max value: 14.282120499977891 - type: nauc_map_at_10_std value: -8.577031812934745 - type: nauc_map_at_1_diff1 value: 51.66692699481996 - type: nauc_map_at_1_max value: 17.664646674047123 - type: nauc_map_at_1_std value: -11.782621031162968 - type: nauc_map_at_20_diff1 value: 38.17853788871855 - type: nauc_map_at_20_max value: 14.256213676574742 - type: nauc_map_at_20_std value: -8.310926163301415 - type: nauc_map_at_3_diff1 value: 40.16070984262913 - type: nauc_map_at_3_max value: 14.268693118841725 - type: nauc_map_at_3_std value: -9.133251481752447 - type: nauc_map_at_5_diff1 value: 38.83714248320578 - type: nauc_map_at_5_max value: 14.547528919229999 - type: nauc_map_at_5_std value: -8.916871955060776 - type: nauc_mrr_at_1000_diff1 value: 36.5899689047331 - type: nauc_mrr_at_1000_max value: 15.113884206534985 - type: nauc_mrr_at_1000_std value: -7.170934224974719 - type: nauc_mrr_at_100_diff1 value: 36.58290352969189 - type: nauc_mrr_at_100_max value: 15.10461015425463 - type: nauc_mrr_at_100_std value: -7.193153133255972 - type: nauc_mrr_at_10_diff1 value: 36.886787941126755 - type: nauc_mrr_at_10_max value: 15.127743773603711 - type: nauc_mrr_at_10_std value: -7.450354111586159 - type: nauc_mrr_at_1_diff1 value: 50.4303551964735 - type: nauc_mrr_at_1_max value: 18.974353633454818 - type: nauc_mrr_at_1_std value: -10.667048661688531 - type: nauc_mrr_at_20_diff1 value: 36.748056497939466 - type: nauc_mrr_at_20_max value: 15.240859680475241 - type: nauc_mrr_at_20_std value: -7.288016407850428 - type: nauc_mrr_at_3_diff1 value: 38.37428302171742 - type: nauc_mrr_at_3_max value: 14.8093219575286 - type: nauc_mrr_at_3_std value: -7.809230035161326 - type: nauc_mrr_at_5_diff1 value: 37.2144623683964 - type: nauc_mrr_at_5_max value: 15.28601324524152 - type: nauc_mrr_at_5_std value: -7.7340060832485 - type: nauc_ndcg_at_1000_diff1 value: 32.12453348510246 - type: nauc_ndcg_at_1000_max value: 13.157455004954915 - type: nauc_ndcg_at_1000_std value: -4.92622356811411 - type: nauc_ndcg_at_100_diff1 value: 32.06154877919635 - type: nauc_ndcg_at_100_max value: 12.373862596941047 - type: nauc_ndcg_at_100_std value: -5.679273924705311 - type: nauc_ndcg_at_10_diff1 value: 34.0105889334877 - type: nauc_ndcg_at_10_max value: 13.45850179368671 - type: nauc_ndcg_at_10_std value: -7.129474197823981 - type: nauc_ndcg_at_1_diff1 value: 50.4303551964735 - type: nauc_ndcg_at_1_max value: 18.974353633454818 - type: nauc_ndcg_at_1_std value: -10.667048661688531 - type: nauc_ndcg_at_20_diff1 value: 33.17001669466592 - type: nauc_ndcg_at_20_max value: 13.32565385671001 - type: 
nauc_ndcg_at_20_std value: -6.284897809311489 - type: nauc_ndcg_at_3_diff1 value: 36.583009335894786 - type: nauc_ndcg_at_3_max value: 13.3100798018976 - type: nauc_ndcg_at_3_std value: -8.166653842277874 - type: nauc_ndcg_at_5_diff1 value: 34.663883470713714 - type: nauc_ndcg_at_5_max value: 13.925348847790179 - type: nauc_ndcg_at_5_std value: -7.8134139319246705 - type: nauc_precision_at_1000_diff1 value: 3.267820129824429 - type: nauc_precision_at_1000_max value: 13.475739290072998 - type: nauc_precision_at_1000_std value: 9.817456700342868 - type: nauc_precision_at_100_diff1 value: 14.543473928222502 - type: nauc_precision_at_100_max value: 9.536842145225432 - type: nauc_precision_at_100_std value: 2.367980716410962 - type: nauc_precision_at_10_diff1 value: 22.83690357863953 - type: nauc_precision_at_10_max value: 12.377338528340081 - type: nauc_precision_at_10_std value: -2.7413618512966442 - type: nauc_precision_at_1_diff1 value: 50.4303551964735 - type: nauc_precision_at_1_max value: 18.974353633454818 - type: nauc_precision_at_1_std value: -10.667048661688531 - type: nauc_precision_at_20_diff1 value: 20.379974384537427 - type: nauc_precision_at_20_max value: 12.277432490519853 - type: nauc_precision_at_20_std value: -0.023357415290595228 - type: nauc_precision_at_3_diff1 value: 28.00128059605776 - type: nauc_precision_at_3_max value: 12.115949162806704 - type: nauc_precision_at_3_std value: -5.111345494119332 - type: nauc_precision_at_5_diff1 value: 23.931333166517064 - type: nauc_precision_at_5_max value: 13.460490076263444 - type: nauc_precision_at_5_std value: -4.566369591299022 - type: nauc_recall_at_1000_diff1 value: 13.901980638817474 - type: nauc_recall_at_1000_max value: 8.169301488452522 - type: nauc_recall_at_1000_std value: 6.977530327014011 - type: nauc_recall_at_100_diff1 value: 18.54699849728289 - type: nauc_recall_at_100_max value: 5.40051681338299 - type: nauc_recall_at_100_std value: -0.2998165893044503 - type: nauc_recall_at_10_diff1 value: 25.158691029447162 - type: nauc_recall_at_10_max value: 10.698096715728344 - type: nauc_recall_at_10_std value: -4.90677955177619 - type: nauc_recall_at_1_diff1 value: 51.66692699481996 - type: nauc_recall_at_1_max value: 17.664646674047123 - type: nauc_recall_at_1_std value: -11.782621031162968 - type: nauc_recall_at_20_diff1 value: 22.315869507893193 - type: nauc_recall_at_20_max value: 9.799239845339486 - type: nauc_recall_at_20_std value: -2.255295176195769 - type: nauc_recall_at_3_diff1 value: 30.21846457670379 - type: nauc_recall_at_3_max value: 10.958491456074727 - type: nauc_recall_at_3_std value: -6.746808382770713 - type: nauc_recall_at_5_diff1 value: 26.24302256225738 - type: nauc_recall_at_5_max value: 11.682268465161725 - type: nauc_recall_at_5_std value: -6.292007648799524 - type: ndcg_at_1 value: 11.751000000000001 - type: ndcg_at_10 value: 19.197 - type: ndcg_at_100 value: 23.159 - type: ndcg_at_1000 value: 26.453 - type: ndcg_at_20 value: 20.448 - type: ndcg_at_3 value: 16.186 - type: ndcg_at_5 value: 17.936 - type: precision_at_1 value: 11.751000000000001 - type: precision_at_10 value: 3.1189999999999998 - type: precision_at_100 value: 0.54 - type: precision_at_1000 value: 0.086 - type: precision_at_20 value: 1.859 - type: precision_at_3 value: 7.194000000000001 - type: precision_at_5 value: 5.311 - type: recall_at_1 value: 10.832 - type: recall_at_10 value: 27.472 - type: recall_at_100 value: 46.471000000000004 - type: recall_at_1000 value: 71.91199999999999 - type: recall_at_20 value: 32.213 - type: 
recall_at_3 value: 19.417 - type: recall_at_5 value: 23.577 - task: type: Retrieval dataset: name: MTEB CQADupstackMathematicaRetrieval (default) type: mteb/cqadupstack-mathematica config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: main_score value: 12.145 - type: map_at_1 value: 6.019 - type: map_at_10 value: 9.584 - type: map_at_100 value: 10.433 - type: map_at_1000 value: 10.562000000000001 - type: map_at_20 value: 10.024 - type: map_at_3 value: 8.351 - type: map_at_5 value: 9.005 - type: mrr_at_1 value: 7.213930348258707 - type: mrr_at_10 value: 11.619827450051332 - type: mrr_at_100 value: 12.469229814971346 - type: mrr_at_1000 value: 12.577286932589695 - type: mrr_at_20 value: 12.072514356821353 - type: mrr_at_3 value: 10.157545605306801 - type: mrr_at_5 value: 10.89759535655058 - type: nauc_map_at_1000_diff1 value: 18.60219400887139 - type: nauc_map_at_1000_max value: 6.951583595979727 - type: nauc_map_at_1000_std value: -0.36466683994108184 - type: nauc_map_at_100_diff1 value: 18.660733139389524 - type: nauc_map_at_100_max value: 6.903072765131549 - type: nauc_map_at_100_std value: -0.48390217802549257 - type: nauc_map_at_10_diff1 value: 18.573179595835647 - type: nauc_map_at_10_max value: 6.992666771720911 - type: nauc_map_at_10_std value: -0.8874423543023089 - type: nauc_map_at_1_diff1 value: 33.90106432523568 - type: nauc_map_at_1_max value: 9.289205840089235 - type: nauc_map_at_1_std value: 2.1852128418717705 - type: nauc_map_at_20_diff1 value: 18.334656889783485 - type: nauc_map_at_20_max value: 6.931684308001437 - type: nauc_map_at_20_std value: -0.7124186564380448 - type: nauc_map_at_3_diff1 value: 20.32895393313974 - type: nauc_map_at_3_max value: 5.887419026571198 - type: nauc_map_at_3_std value: -0.015273865884840596 - type: nauc_map_at_5_diff1 value: 19.15574225963634 - type: nauc_map_at_5_max value: 6.175933890525402 - type: nauc_map_at_5_std value: -1.468261999387673 - type: nauc_mrr_at_1000_diff1 value: 18.0560339880594 - type: nauc_mrr_at_1000_max value: 8.653214727915024 - type: nauc_mrr_at_1000_std value: 1.6650523107666824 - type: nauc_mrr_at_100_diff1 value: 18.067266124955946 - type: nauc_mrr_at_100_max value: 8.645444544074266 - type: nauc_mrr_at_100_std value: 1.605397143432772 - type: nauc_mrr_at_10_diff1 value: 18.227604303918422 - type: nauc_mrr_at_10_max value: 8.980990643614946 - type: nauc_mrr_at_10_std value: 1.625956129526598 - type: nauc_mrr_at_1_diff1 value: 33.145174271418576 - type: nauc_mrr_at_1_max value: 10.674348159869123 - type: nauc_mrr_at_1_std value: 2.5718912675260843 - type: nauc_mrr_at_20_diff1 value: 17.85361170315467 - type: nauc_mrr_at_20_max value: 8.689966423383293 - type: nauc_mrr_at_20_std value: 1.4845343622374683 - type: nauc_mrr_at_3_diff1 value: 19.72873972100882 - type: nauc_mrr_at_3_max value: 7.818757201820606 - type: nauc_mrr_at_3_std value: 2.317801166782217 - type: nauc_mrr_at_5_diff1 value: 18.70515159747826 - type: nauc_mrr_at_5_max value: 7.8553636278171055 - type: nauc_mrr_at_5_std value: 0.8593300223901442 - type: nauc_ndcg_at_1000_diff1 value: 14.777764985527059 - type: nauc_ndcg_at_1000_max value: 8.001133085293265 - type: nauc_ndcg_at_1000_std value: 2.715094827482056 - type: nauc_ndcg_at_100_diff1 value: 15.873494520058037 - type: nauc_ndcg_at_100_max value: 7.5190091115119 - type: nauc_ndcg_at_100_std value: 0.7430533500967327 - type: nauc_ndcg_at_10_diff1 value: 14.950829327092022 - type: nauc_ndcg_at_10_max value: 7.999425322307154 - type: nauc_ndcg_at_10_std value: 
-0.5911692617165382 - type: nauc_ndcg_at_1_diff1 value: 33.145174271418576 - type: nauc_ndcg_at_1_max value: 10.674348159869123 - type: nauc_ndcg_at_1_std value: 2.5718912675260843 - type: nauc_ndcg_at_20_diff1 value: 14.28695753335748 - type: nauc_ndcg_at_20_max value: 7.460341211112809 - type: nauc_ndcg_at_20_std value: -0.2734671370134216 - type: nauc_ndcg_at_3_diff1 value: 17.243393543205006 - type: nauc_ndcg_at_3_max value: 6.003682896861271 - type: nauc_ndcg_at_3_std value: 0.3923628664952013 - type: nauc_ndcg_at_5_diff1 value: 15.841455870049076 - type: nauc_ndcg_at_5_max value: 6.163583363661528 - type: nauc_ndcg_at_5_std value: -1.9411356710983478 - type: nauc_precision_at_1000_diff1 value: -3.399817676017686 - type: nauc_precision_at_1000_max value: 5.575723322824422 - type: nauc_precision_at_1000_std value: 5.63779109914318 - type: nauc_precision_at_100_diff1 value: 6.1555220193892435 - type: nauc_precision_at_100_max value: 6.7977343501791045 - type: nauc_precision_at_100_std value: 2.026960062764128 - type: nauc_precision_at_10_diff1 value: 5.864713737249161 - type: nauc_precision_at_10_max value: 10.987539143688663 - type: nauc_precision_at_10_std value: -0.12419185225065871 - type: nauc_precision_at_1_diff1 value: 33.145174271418576 - type: nauc_precision_at_1_max value: 10.674348159869123 - type: nauc_precision_at_1_std value: 2.5718912675260843 - type: nauc_precision_at_20_diff1 value: 4.994637980783556 - type: nauc_precision_at_20_max value: 7.522690866727933 - type: nauc_precision_at_20_std value: 0.027674551460471312 - type: nauc_precision_at_3_diff1 value: 8.451342681964578 - type: nauc_precision_at_3_max value: 5.343253356927528 - type: nauc_precision_at_3_std value: 1.6495845441147832 - type: nauc_precision_at_5_diff1 value: 6.193033041556517 - type: nauc_precision_at_5_max value: 5.77635145338238 - type: nauc_precision_at_5_std value: -3.421797113444559 - type: nauc_recall_at_1000_diff1 value: 7.437110169863727 - type: nauc_recall_at_1000_max value: 9.607314782406986 - type: nauc_recall_at_1000_std value: 13.320498460741362 - type: nauc_recall_at_100_diff1 value: 13.309966057961834 - type: nauc_recall_at_100_max value: 7.748170239579637 - type: nauc_recall_at_100_std value: 2.6798857378517864 - type: nauc_recall_at_10_diff1 value: 8.674278695378167 - type: nauc_recall_at_10_max value: 8.969918415623756 - type: nauc_recall_at_10_std value: -1.4597400700986853 - type: nauc_recall_at_1_diff1 value: 33.90106432523568 - type: nauc_recall_at_1_max value: 9.289205840089235 - type: nauc_recall_at_1_std value: 2.1852128418717705 - type: nauc_recall_at_20_diff1 value: 7.663555921211413 - type: nauc_recall_at_20_max value: 7.420494129425241 - type: nauc_recall_at_20_std value: -0.39971980929980877 - type: nauc_recall_at_3_diff1 value: 10.784631081908223 - type: nauc_recall_at_3_max value: 3.815625872455824 - type: nauc_recall_at_3_std value: -1.1614434404018152 - type: nauc_recall_at_5_diff1 value: 9.60638979119831 - type: nauc_recall_at_5_max value: 5.1710882220553405 - type: nauc_recall_at_5_std value: -4.572280393094789 - type: ndcg_at_1 value: 7.2139999999999995 - type: ndcg_at_10 value: 12.145 - type: ndcg_at_100 value: 16.672 - type: ndcg_at_1000 value: 20.342 - type: ndcg_at_20 value: 13.745 - type: ndcg_at_3 value: 9.607000000000001 - type: ndcg_at_5 value: 10.712000000000002 - type: precision_at_1 value: 7.2139999999999995 - type: precision_at_10 value: 2.338 - type: precision_at_100 value: 0.5459999999999999 - type: precision_at_1000 value: 0.099 - type: 
precision_at_20 value: 1.6039999999999999 - type: precision_at_3 value: 4.726 - type: precision_at_5 value: 3.5319999999999996 - type: recall_at_1 value: 6.019 - type: recall_at_10 value: 18.102999999999998 - type: recall_at_100 value: 38.482 - type: recall_at_1000 value: 65.436 - type: recall_at_20 value: 23.952 - type: recall_at_3 value: 11.178 - type: recall_at_5 value: 13.877 - task: type: Retrieval dataset: name: MTEB CQADupstackPhysicsRetrieval (default) type: mteb/cqadupstack-physics config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: main_score value: 26.667999999999996 - type: map_at_1 value: 16.822 - type: map_at_10 value: 22.476 - type: map_at_100 value: 23.69 - type: map_at_1000 value: 23.827 - type: map_at_20 value: 23.084 - type: map_at_3 value: 20.441000000000003 - type: map_at_5 value: 21.512 - type: mrr_at_1 value: 20.78922040423484 - type: mrr_at_10 value: 26.67445804115679 - type: mrr_at_100 value: 27.67534998291947 - type: mrr_at_1000 value: 27.752906060167692 - type: mrr_at_20 value: 27.19875968774574 - type: mrr_at_3 value: 24.4947064485082 - type: mrr_at_5 value: 25.630413859480278 - type: nauc_map_at_1000_diff1 value: 40.40492447320535 - type: nauc_map_at_1000_max value: 28.548119831633194 - type: nauc_map_at_1000_std value: -0.22424233207141148 - type: nauc_map_at_100_diff1 value: 40.39875847865982 - type: nauc_map_at_100_max value: 28.500575725413096 - type: nauc_map_at_100_std value: -0.2779979908842256 - type: nauc_map_at_10_diff1 value: 40.942304749094085 - type: nauc_map_at_10_max value: 28.429772938475008 - type: nauc_map_at_10_std value: -0.8049874864329988 - type: nauc_map_at_1_diff1 value: 47.17822553627135 - type: nauc_map_at_1_max value: 31.206514215995206 - type: nauc_map_at_1_std value: -1.8984121963184788 - type: nauc_map_at_20_diff1 value: 40.4346381000311 - type: nauc_map_at_20_max value: 28.458128761837536 - type: nauc_map_at_20_std value: -0.7321703207226834 - type: nauc_map_at_3_diff1 value: 42.2424427066743 - type: nauc_map_at_3_max value: 28.16537428952111 - type: nauc_map_at_3_std value: -2.298671243793284 - type: nauc_map_at_5_diff1 value: 41.32690925538059 - type: nauc_map_at_5_max value: 28.53162210264393 - type: nauc_map_at_5_std value: -1.1738320079845177 - type: nauc_mrr_at_1000_diff1 value: 37.69693278594645 - type: nauc_mrr_at_1000_max value: 29.49690742209793 - type: nauc_mrr_at_1000_std value: 3.1815473802020544 - type: nauc_mrr_at_100_diff1 value: 37.65946389835227 - type: nauc_mrr_at_100_max value: 29.479438074437127 - type: nauc_mrr_at_100_std value: 3.166552364873761 - type: nauc_mrr_at_10_diff1 value: 38.06473613801605 - type: nauc_mrr_at_10_max value: 29.79312016758447 - type: nauc_mrr_at_10_std value: 3.111988711521923 - type: nauc_mrr_at_1_diff1 value: 43.69553072839024 - type: nauc_mrr_at_1_max value: 32.142344513289025 - type: nauc_mrr_at_1_std value: 2.696048057380709 - type: nauc_mrr_at_20_diff1 value: 37.626141249327574 - type: nauc_mrr_at_20_max value: 29.559923833552347 - type: nauc_mrr_at_20_std value: 2.9860721770618697 - type: nauc_mrr_at_3_diff1 value: 39.324715416924974 - type: nauc_mrr_at_3_max value: 29.651196356282618 - type: nauc_mrr_at_3_std value: 1.9583884507428824 - type: nauc_mrr_at_5_diff1 value: 38.36691352781637 - type: nauc_mrr_at_5_max value: 29.939763561026002 - type: nauc_mrr_at_5_std value: 2.7317703526814214 - type: nauc_ndcg_at_1000_diff1 value: 36.523136783112406 - type: nauc_ndcg_at_1000_max value: 28.684387654497584 - type: nauc_ndcg_at_1000_std 
value: 4.732051883634089 - type: nauc_ndcg_at_100_diff1 value: 36.16154861613736 - type: nauc_ndcg_at_100_max value: 27.921202679602143 - type: nauc_ndcg_at_100_std value: 3.560040019944456 - type: nauc_ndcg_at_10_diff1 value: 37.774474422977896 - type: nauc_ndcg_at_10_max value: 27.68147817987237 - type: nauc_ndcg_at_10_std value: 0.8327502237036594 - type: nauc_ndcg_at_1_diff1 value: 43.69553072839024 - type: nauc_ndcg_at_1_max value: 32.142344513289025 - type: nauc_ndcg_at_1_std value: 2.696048057380709 - type: nauc_ndcg_at_20_diff1 value: 36.163233644690266 - type: nauc_ndcg_at_20_max value: 27.4164968525345 - type: nauc_ndcg_at_20_std value: 0.8376631121502218 - type: nauc_ndcg_at_3_diff1 value: 39.707715661307105 - type: nauc_ndcg_at_3_max value: 28.324727845444997 - type: nauc_ndcg_at_3_std value: -0.7238153399588456 - type: nauc_ndcg_at_5_diff1 value: 38.42323115018405 - type: nauc_ndcg_at_5_max value: 28.520234702176587 - type: nauc_ndcg_at_5_std value: 0.4337143091381524 - type: nauc_precision_at_1000_diff1 value: -1.7237517846851018 - type: nauc_precision_at_1000_max value: 16.20499296488572 - type: nauc_precision_at_1000_std value: 20.16360817424688 - type: nauc_precision_at_100_diff1 value: 7.455105305668386 - type: nauc_precision_at_100_max value: 23.35672119353681 - type: nauc_precision_at_100_std value: 18.66911905196039 - type: nauc_precision_at_10_diff1 value: 23.28265657395181 - type: nauc_precision_at_10_max value: 27.533659469131948 - type: nauc_precision_at_10_std value: 9.661356716727099 - type: nauc_precision_at_1_diff1 value: 43.69553072839024 - type: nauc_precision_at_1_max value: 32.142344513289025 - type: nauc_precision_at_1_std value: 2.696048057380709 - type: nauc_precision_at_20_diff1 value: 15.588844976640317 - type: nauc_precision_at_20_max value: 24.89373446940838 - type: nauc_precision_at_20_std value: 9.462736793529547 - type: nauc_precision_at_3_diff1 value: 31.24543977571387 - type: nauc_precision_at_3_max value: 27.88457380895888 - type: nauc_precision_at_3_std value: 3.0400582769598334 - type: nauc_precision_at_5_diff1 value: 27.621476771588156 - type: nauc_precision_at_5_max value: 29.344696084898647 - type: nauc_precision_at_5_std value: 6.279675749763937 - type: nauc_recall_at_1000_diff1 value: 20.19996493542523 - type: nauc_recall_at_1000_max value: 24.65244498292903 - type: nauc_recall_at_1000_std value: 35.312310075738125 - type: nauc_recall_at_100_diff1 value: 22.904431187357847 - type: nauc_recall_at_100_max value: 21.00955732817796 - type: nauc_recall_at_100_std value: 13.938151070174573 - type: nauc_recall_at_10_diff1 value: 30.03923096618402 - type: nauc_recall_at_10_max value: 22.353534397229048 - type: nauc_recall_at_10_std value: 1.2207088824681231 - type: nauc_recall_at_1_diff1 value: 47.17822553627135 - type: nauc_recall_at_1_max value: 31.206514215995206 - type: nauc_recall_at_1_std value: -1.8984121963184788 - type: nauc_recall_at_20_diff1 value: 24.682826207248283 - type: nauc_recall_at_20_max value: 20.777119838220408 - type: nauc_recall_at_20_std value: 1.2286788398315465 - type: nauc_recall_at_3_diff1 value: 35.715604782377035 - type: nauc_recall_at_3_max value: 23.7633639937056 - type: nauc_recall_at_3_std value: -2.868937897653619 - type: nauc_recall_at_5_diff1 value: 32.21252827575707 - type: nauc_recall_at_5_max value: 24.799142864683375 - type: nauc_recall_at_5_std value: 0.36296684299374204 - type: ndcg_at_1 value: 20.788999999999998 - type: ndcg_at_10 value: 26.667999999999996 - type: ndcg_at_100 value: 32.565 - type: 
ndcg_at_1000 value: 35.634 - type: ndcg_at_20 value: 28.642 - type: ndcg_at_3 value: 22.942 - type: ndcg_at_5 value: 24.514 - type: precision_at_1 value: 20.788999999999998 - type: precision_at_10 value: 4.947 - type: precision_at_100 value: 0.96 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_20 value: 3.104 - type: precision_at_3 value: 10.748000000000001 - type: precision_at_5 value: 7.68 - type: recall_at_1 value: 16.822 - type: recall_at_10 value: 35.237 - type: recall_at_100 value: 61.219 - type: recall_at_1000 value: 82.499 - type: recall_at_20 value: 42.230000000000004 - type: recall_at_3 value: 24.524 - type: recall_at_5 value: 28.787000000000003 - task: type: Retrieval dataset: name: MTEB CQADupstackProgrammersRetrieval (default) type: mteb/cqadupstack-programmers config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: main_score value: 21.66 - type: map_at_1 value: 12.416 - type: map_at_10 value: 17.684 - type: map_at_100 value: 18.851000000000003 - type: map_at_1000 value: 18.991 - type: map_at_20 value: 18.360000000000003 - type: map_at_3 value: 15.770999999999999 - type: map_at_5 value: 16.606 - type: mrr_at_1 value: 15.068493150684931 - type: mrr_at_10 value: 21.28823294919185 - type: mrr_at_100 value: 22.306240026063588 - type: mrr_at_1000 value: 22.395578374917164 - type: mrr_at_20 value: 21.90701850599165 - type: mrr_at_3 value: 19.273211567732123 - type: mrr_at_5 value: 20.397640791476412 - type: nauc_map_at_1000_diff1 value: 32.04680475392268 - type: nauc_map_at_1000_max value: 20.9527363509733 - type: nauc_map_at_1000_std value: 1.9775389393996066 - type: nauc_map_at_100_diff1 value: 32.05659071752874 - type: nauc_map_at_100_max value: 20.937669829415213 - type: nauc_map_at_100_std value: 1.8872130027911487 - type: nauc_map_at_10_diff1 value: 32.40493239661906 - type: nauc_map_at_10_max value: 20.24841030282171 - type: nauc_map_at_10_std value: 0.8873591420958411 - type: nauc_map_at_1_diff1 value: 39.50866679123135 - type: nauc_map_at_1_max value: 21.067083493139833 - type: nauc_map_at_1_std value: -1.255629309903365 - type: nauc_map_at_20_diff1 value: 32.06523680001786 - type: nauc_map_at_20_max value: 20.482809699946856 - type: nauc_map_at_20_std value: 1.2900775457613989 - type: nauc_map_at_3_diff1 value: 33.51328659054749 - type: nauc_map_at_3_max value: 19.351150884357097 - type: nauc_map_at_3_std value: -0.9449293271546024 - type: nauc_map_at_5_diff1 value: 32.672807388132 - type: nauc_map_at_5_max value: 19.888696407961916 - type: nauc_map_at_5_std value: -0.21370229639305732 - type: nauc_mrr_at_1000_diff1 value: 29.4702965330427 - type: nauc_mrr_at_1000_max value: 21.5485190959632 - type: nauc_mrr_at_1000_std value: 2.9474086643706716 - type: nauc_mrr_at_100_diff1 value: 29.444301031842237 - type: nauc_mrr_at_100_max value: 21.545652672940818 - type: nauc_mrr_at_100_std value: 2.930083417192537 - type: nauc_mrr_at_10_diff1 value: 29.839809988865028 - type: nauc_mrr_at_10_max value: 21.285084047773285 - type: nauc_mrr_at_10_std value: 2.3023735099948794 - type: nauc_mrr_at_1_diff1 value: 38.253685943964285 - type: nauc_mrr_at_1_max value: 23.506493457282993 - type: nauc_mrr_at_1_std value: 0.36623457899262024 - type: nauc_mrr_at_20_diff1 value: 29.359787332306013 - type: nauc_mrr_at_20_max value: 21.246732134190733 - type: nauc_mrr_at_20_std value: 2.6115784611487087 - type: nauc_mrr_at_3_diff1 value: 31.490392724228837 - type: nauc_mrr_at_3_max value: 21.643605643490904 - type: nauc_mrr_at_3_std 
value: 1.6756866672672965 - type: nauc_mrr_at_5_diff1 value: 30.18536933081793 - type: nauc_mrr_at_5_max value: 21.27264373907216 - type: nauc_mrr_at_5_std value: 1.7079689552978534 - type: nauc_ndcg_at_1000_diff1 value: 28.11169834333845 - type: nauc_ndcg_at_1000_max value: 22.65134504760621 - type: nauc_ndcg_at_1000_std value: 8.353986044564932 - type: nauc_ndcg_at_100_diff1 value: 28.265985165496417 - type: nauc_ndcg_at_100_max value: 22.530347672551887 - type: nauc_ndcg_at_100_std value: 6.968755339521627 - type: nauc_ndcg_at_10_diff1 value: 29.088878880551906 - type: nauc_ndcg_at_10_max value: 19.918818478137702 - type: nauc_ndcg_at_10_std value: 2.5519795248451795 - type: nauc_ndcg_at_1_diff1 value: 38.253685943964285 - type: nauc_ndcg_at_1_max value: 23.506493457282993 - type: nauc_ndcg_at_1_std value: 0.36623457899262024 - type: nauc_ndcg_at_20_diff1 value: 27.910656458566045 - type: nauc_ndcg_at_20_max value: 20.295061759944723 - type: nauc_ndcg_at_20_std value: 3.6145835770906833 - type: nauc_ndcg_at_3_diff1 value: 31.233680318242634 - type: nauc_ndcg_at_3_max value: 19.494683132285033 - type: nauc_ndcg_at_3_std value: 0.04355647255533374 - type: nauc_ndcg_at_5_diff1 value: 29.60761336088322 - type: nauc_ndcg_at_5_max value: 19.80719438136175 - type: nauc_ndcg_at_5_std value: 0.6195875169583498 - type: nauc_precision_at_1000_diff1 value: -4.9635863591586284 - type: nauc_precision_at_1000_max value: 10.205880001940644 - type: nauc_precision_at_1000_std value: 13.475741604004421 - type: nauc_precision_at_100_diff1 value: 7.633273326571685 - type: nauc_precision_at_100_max value: 23.151284304137622 - type: nauc_precision_at_100_std value: 20.405156194796863 - type: nauc_precision_at_10_diff1 value: 18.705937577794554 - type: nauc_precision_at_10_max value: 20.628035226019335 - type: nauc_precision_at_10_std value: 7.041902045527893 - type: nauc_precision_at_1_diff1 value: 38.253685943964285 - type: nauc_precision_at_1_max value: 23.506493457282993 - type: nauc_precision_at_1_std value: 0.36623457899262024 - type: nauc_precision_at_20_diff1 value: 14.129163643470525 - type: nauc_precision_at_20_max value: 20.39744876825584 - type: nauc_precision_at_20_std value: 10.808780160453079 - type: nauc_precision_at_3_diff1 value: 24.81724694529244 - type: nauc_precision_at_3_max value: 19.750250129235862 - type: nauc_precision_at_3_std value: 1.6383497722612925 - type: nauc_precision_at_5_diff1 value: 20.559816479129896 - type: nauc_precision_at_5_max value: 20.737938153703908 - type: nauc_precision_at_5_std value: 2.9329054609944767 - type: nauc_recall_at_1000_diff1 value: 14.657477263404504 - type: nauc_recall_at_1000_max value: 27.29789317523507 - type: nauc_recall_at_1000_std value: 41.54560242921126 - type: nauc_recall_at_100_diff1 value: 19.668816678808028 - type: nauc_recall_at_100_max value: 24.546392696829855 - type: nauc_recall_at_100_std value: 20.045457113413388 - type: nauc_recall_at_10_diff1 value: 22.57592036080691 - type: nauc_recall_at_10_max value: 17.30186041967476 - type: nauc_recall_at_10_std value: 5.75949108824036 - type: nauc_recall_at_1_diff1 value: 39.50866679123135 - type: nauc_recall_at_1_max value: 21.067083493139833 - type: nauc_recall_at_1_std value: -1.255629309903365 - type: nauc_recall_at_20_diff1 value: 18.597441888297915 - type: nauc_recall_at_20_max value: 17.76783323985467 - type: nauc_recall_at_20_std value: 7.756313900025849 - type: nauc_recall_at_3_diff1 value: 27.928359626631092 - type: nauc_recall_at_3_max value: 16.336637037641772 - type: 
nauc_recall_at_3_std value: -1.3417417785554366 - type: nauc_recall_at_5_diff1 value: 24.22251676423838 - type: nauc_recall_at_5_max value: 16.857422692031594 - type: nauc_recall_at_5_std value: 0.6185629064463674 - type: ndcg_at_1 value: 15.068000000000001 - type: ndcg_at_10 value: 21.66 - type: ndcg_at_100 value: 27.245 - type: ndcg_at_1000 value: 30.591 - type: ndcg_at_20 value: 23.955000000000002 - type: ndcg_at_3 value: 17.968999999999998 - type: ndcg_at_5 value: 19.352 - type: precision_at_1 value: 15.068000000000001 - type: precision_at_10 value: 4.326 - type: precision_at_100 value: 0.855 - type: precision_at_1000 value: 0.132 - type: precision_at_20 value: 2.8369999999999997 - type: precision_at_3 value: 8.713999999999999 - type: precision_at_5 value: 6.3469999999999995 - type: recall_at_1 value: 12.416 - type: recall_at_10 value: 30.008000000000003 - type: recall_at_100 value: 54.498999999999995 - type: recall_at_1000 value: 78.32000000000001 - type: recall_at_20 value: 38.378 - type: recall_at_3 value: 19.79 - type: recall_at_5 value: 23.376 - task: type: Retrieval dataset: name: MTEB CQADupstackRetrieval (default) type: CQADupstackRetrieval_is_a_combined_dataset config: default split: test revision: CQADupstackRetrieval_is_a_combined_dataset metrics: - type: main_score value: 22.302333333333333 - type: ndcg_at_10 value: 22.302333333333333 - task: type: Retrieval dataset: name: MTEB CQADupstackStatsRetrieval (default) type: mteb/cqadupstack-stats config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: main_score value: 17.253 - type: map_at_1 value: 9.722999999999999 - type: map_at_10 value: 14.280999999999999 - type: map_at_100 value: 15.065000000000001 - type: map_at_1000 value: 15.154 - type: map_at_20 value: 14.704999999999998 - type: map_at_3 value: 13.004 - type: map_at_5 value: 13.626 - type: mrr_at_1 value: 11.809815950920246 - type: mrr_at_10 value: 16.383959002824028 - type: mrr_at_100 value: 17.188709691814985 - type: mrr_at_1000 value: 17.269435610183017 - type: mrr_at_20 value: 16.836972625425393 - type: mrr_at_3 value: 15.081799591002035 - type: mrr_at_5 value: 15.710633946830258 - type: nauc_map_at_1000_diff1 value: 28.431623275634156 - type: nauc_map_at_1000_max value: 14.476316695164403 - type: nauc_map_at_1000_std value: 4.607998508591043 - type: nauc_map_at_100_diff1 value: 28.42367177875125 - type: nauc_map_at_100_max value: 14.394653506060012 - type: nauc_map_at_100_std value: 4.567472357591712 - type: nauc_map_at_10_diff1 value: 28.60653023312716 - type: nauc_map_at_10_max value: 14.78157644547682 - type: nauc_map_at_10_std value: 3.94994519901673 - type: nauc_map_at_1_diff1 value: 34.36968432094878 - type: nauc_map_at_1_max value: 17.456572010137457 - type: nauc_map_at_1_std value: 4.2640515305539415 - type: nauc_map_at_20_diff1 value: 28.510596490501573 - type: nauc_map_at_20_max value: 14.318541992037401 - type: nauc_map_at_20_std value: 4.254075392620963 - type: nauc_map_at_3_diff1 value: 30.539716169861936 - type: nauc_map_at_3_max value: 16.14471431902583 - type: nauc_map_at_3_std value: 4.973502209268125 - type: nauc_map_at_5_diff1 value: 29.261684655915225 - type: nauc_map_at_5_max value: 15.372748605327446 - type: nauc_map_at_5_std value: 4.39285622535654 - type: nauc_mrr_at_1000_diff1 value: 28.972718024301447 - type: nauc_mrr_at_1000_max value: 17.826835397341046 - type: nauc_mrr_at_1000_std value: 6.917284034347911 - type: nauc_mrr_at_100_diff1 value: 28.945997371755087 - type: nauc_mrr_at_100_max value: 
17.739278412823893 - type: nauc_mrr_at_100_std value: 6.899424135908487 - type: nauc_mrr_at_10_diff1 value: 29.06935519309891 - type: nauc_mrr_at_10_max value: 18.21083753088906 - type: nauc_mrr_at_10_std value: 6.518493253737144 - type: nauc_mrr_at_1_diff1 value: 35.63041619844435 - type: nauc_mrr_at_1_max value: 22.830306049699338 - type: nauc_mrr_at_1_std value: 7.826683917417351 - type: nauc_mrr_at_20_diff1 value: 29.016004511022537 - type: nauc_mrr_at_20_max value: 17.788437345787926 - type: nauc_mrr_at_20_std value: 6.652263770077456 - type: nauc_mrr_at_3_diff1 value: 30.644333070723466 - type: nauc_mrr_at_3_max value: 19.667632613725225 - type: nauc_mrr_at_3_std value: 7.743380165559918 - type: nauc_mrr_at_5_diff1 value: 29.829376205828805 - type: nauc_mrr_at_5_max value: 18.722595091544253 - type: nauc_mrr_at_5_std value: 6.818524829545593 - type: nauc_ndcg_at_1000_diff1 value: 25.62248172657835 - type: nauc_ndcg_at_1000_max value: 14.223326419511073 - type: nauc_ndcg_at_1000_std value: 7.495752604082028 - type: nauc_ndcg_at_100_diff1 value: 25.499428653265642 - type: nauc_ndcg_at_100_max value: 12.585064293899102 - type: nauc_ndcg_at_100_std value: 6.664889384341954 - type: nauc_ndcg_at_10_diff1 value: 25.74972755098383 - type: nauc_ndcg_at_10_max value: 13.793434874824303 - type: nauc_ndcg_at_10_std value: 3.883648047462527 - type: nauc_ndcg_at_1_diff1 value: 35.63041619844435 - type: nauc_ndcg_at_1_max value: 22.830306049699338 - type: nauc_ndcg_at_1_std value: 7.826683917417351 - type: nauc_ndcg_at_20_diff1 value: 25.334745687494443 - type: nauc_ndcg_at_20_max value: 12.305607906859144 - type: nauc_ndcg_at_20_std value: 4.7413728340444505 - type: nauc_ndcg_at_3_diff1 value: 29.45395763143249 - type: nauc_ndcg_at_3_max value: 16.23690234046979 - type: nauc_ndcg_at_3_std value: 6.142105291678576 - type: nauc_ndcg_at_5_diff1 value: 27.444736442905455 - type: nauc_ndcg_at_5_max value: 14.93362615759676 - type: nauc_ndcg_at_5_std value: 4.7342440148611225 - type: nauc_precision_at_1000_diff1 value: 16.80575206659899 - type: nauc_precision_at_1000_max value: 17.66226703408546 - type: nauc_precision_at_1000_std value: 18.77422949877631 - type: nauc_precision_at_100_diff1 value: 21.105287938477233 - type: nauc_precision_at_100_max value: 13.591179380636214 - type: nauc_precision_at_100_std value: 16.55840962012843 - type: nauc_precision_at_10_diff1 value: 21.469758913525254 - type: nauc_precision_at_10_max value: 15.320780706573464 - type: nauc_precision_at_10_std value: 6.351289997170259 - type: nauc_precision_at_1_diff1 value: 35.63041619844435 - type: nauc_precision_at_1_max value: 22.830306049699338 - type: nauc_precision_at_1_std value: 7.826683917417351 - type: nauc_precision_at_20_diff1 value: 20.438996654370953 - type: nauc_precision_at_20_max value: 11.895395539109575 - type: nauc_precision_at_20_std value: 9.227372989467945 - type: nauc_precision_at_3_diff1 value: 27.958385745280534 - type: nauc_precision_at_3_max value: 18.76663358991842 - type: nauc_precision_at_3_std value: 8.804799926813658 - type: nauc_precision_at_5_diff1 value: 25.20756412916346 - type: nauc_precision_at_5_max value: 17.16752690039525 - type: nauc_precision_at_5_std value: 7.822524248176865 - type: nauc_recall_at_1000_diff1 value: 17.093227818066353 - type: nauc_recall_at_1000_max value: 12.628515233697735 - type: nauc_recall_at_1000_std value: 16.519924218447994 - type: nauc_recall_at_100_diff1 value: 18.19732935930814 - type: nauc_recall_at_100_max value: 4.740051109026774 - type: 
nauc_recall_at_100_std value: 10.729043783837753 - type: nauc_recall_at_10_diff1 value: 17.84235497242283 - type: nauc_recall_at_10_max value: 7.9110522988146155 - type: nauc_recall_at_10_std value: 1.147900198002905 - type: nauc_recall_at_1_diff1 value: 34.36968432094878 - type: nauc_recall_at_1_max value: 17.456572010137457 - type: nauc_recall_at_1_std value: 4.2640515305539415 - type: nauc_recall_at_20_diff1 value: 16.692476991368853 - type: nauc_recall_at_20_max value: 3.809776817661501 - type: nauc_recall_at_20_std value: 3.6575551737685954 - type: nauc_recall_at_3_diff1 value: 25.110591985459862 - type: nauc_recall_at_3_max value: 13.681824792451245 - type: nauc_recall_at_3_std value: 5.806771643452482 - type: nauc_recall_at_5_diff1 value: 21.0191985797923 - type: nauc_recall_at_5_max value: 10.837381063643834 - type: nauc_recall_at_5_std value: 3.228418252689027 - type: ndcg_at_1 value: 11.81 - type: ndcg_at_10 value: 17.253 - type: ndcg_at_100 value: 21.404 - type: ndcg_at_1000 value: 24.09 - type: ndcg_at_20 value: 18.801000000000002 - type: ndcg_at_3 value: 14.716999999999999 - type: ndcg_at_5 value: 15.706000000000001 - type: precision_at_1 value: 11.81 - type: precision_at_10 value: 2.9749999999999996 - type: precision_at_100 value: 0.543 - type: precision_at_1000 value: 0.084 - type: precision_at_20 value: 1.848 - type: precision_at_3 value: 6.902 - type: precision_at_5 value: 4.816 - type: recall_at_1 value: 9.722999999999999 - type: recall_at_10 value: 24.569 - type: recall_at_100 value: 43.997 - type: recall_at_1000 value: 64.44 - type: recall_at_20 value: 30.505 - type: recall_at_3 value: 17.134 - type: recall_at_5 value: 19.72 - task: type: Retrieval dataset: name: MTEB CQADupstackTexRetrieval (default) type: mteb/cqadupstack-tex config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: main_score value: 13.308 - type: map_at_1 value: 7.497 - type: map_at_10 value: 10.846 - type: map_at_100 value: 11.498999999999999 - type: map_at_1000 value: 11.618 - type: map_at_20 value: 11.161999999999999 - type: map_at_3 value: 9.658999999999999 - type: map_at_5 value: 10.298 - type: mrr_at_1 value: 9.11906400550585 - type: mrr_at_10 value: 12.993232392750626 - type: mrr_at_100 value: 13.701403675494117 - type: mrr_at_1000 value: 13.798101712770123 - type: mrr_at_20 value: 13.360764217937035 - type: mrr_at_3 value: 11.6655196145905 - type: mrr_at_5 value: 12.362353750860274 - type: nauc_map_at_1000_diff1 value: 29.030158454163164 - type: nauc_map_at_1000_max value: 15.750545094681929 - type: nauc_map_at_1000_std value: -3.0798436292807834 - type: nauc_map_at_100_diff1 value: 29.05038743174521 - type: nauc_map_at_100_max value: 15.679082682471822 - type: nauc_map_at_100_std value: -3.2003921265004855 - type: nauc_map_at_10_diff1 value: 29.680682239615308 - type: nauc_map_at_10_max value: 15.532980267877802 - type: nauc_map_at_10_std value: -3.622076099535413 - type: nauc_map_at_1_diff1 value: 37.49924172327444 - type: nauc_map_at_1_max value: 14.852898999380606 - type: nauc_map_at_1_std value: -3.8871845491808403 - type: nauc_map_at_20_diff1 value: 29.440127025124063 - type: nauc_map_at_20_max value: 15.566926763278111 - type: nauc_map_at_20_std value: -3.5118135905883445 - type: nauc_map_at_3_diff1 value: 31.87407675131833 - type: nauc_map_at_3_max value: 16.133052442782088 - type: nauc_map_at_3_std value: -3.7331459743832536 - type: nauc_map_at_5_diff1 value: 30.702048393849918 - type: nauc_map_at_5_max value: 15.7292852737471 - type: 
nauc_map_at_5_std value: -3.72714036461797 - type: nauc_mrr_at_1000_diff1 value: 27.069591144268795 - type: nauc_mrr_at_1000_max value: 17.335323991978157 - type: nauc_mrr_at_1000_std value: -2.1443215489774863 - type: nauc_mrr_at_100_diff1 value: 27.06995261671637 - type: nauc_mrr_at_100_max value: 17.3285570198275 - type: nauc_mrr_at_100_std value: -2.1819679734953903 - type: nauc_mrr_at_10_diff1 value: 27.57687228309106 - type: nauc_mrr_at_10_max value: 17.166971785334017 - type: nauc_mrr_at_10_std value: -2.6000743496984526 - type: nauc_mrr_at_1_diff1 value: 35.22676568917156 - type: nauc_mrr_at_1_max value: 17.007211079819626 - type: nauc_mrr_at_1_std value: -4.214696308727653 - type: nauc_mrr_at_20_diff1 value: 27.374588178560465 - type: nauc_mrr_at_20_max value: 17.23758467893531 - type: nauc_mrr_at_20_std value: -2.4124837810565603 - type: nauc_mrr_at_3_diff1 value: 29.722577971696918 - type: nauc_mrr_at_3_max value: 18.07384167733403 - type: nauc_mrr_at_3_std value: -3.003414797443647 - type: nauc_mrr_at_5_diff1 value: 28.45980370469956 - type: nauc_mrr_at_5_max value: 17.511976658495847 - type: nauc_mrr_at_5_std value: -2.5924858663986745 - type: nauc_ndcg_at_1000_diff1 value: 23.077231893052307 - type: nauc_ndcg_at_1000_max value: 16.93593483664181 - type: nauc_ndcg_at_1000_std value: 1.2092406562986315 - type: nauc_ndcg_at_100_diff1 value: 23.549727836162358 - type: nauc_ndcg_at_100_max value: 15.750436011474273 - type: nauc_ndcg_at_100_std value: -0.9019324316165611 - type: nauc_ndcg_at_10_diff1 value: 26.053761788639434 - type: nauc_ndcg_at_10_max value: 15.3669306793647 - type: nauc_ndcg_at_10_std value: -3.193779292269917 - type: nauc_ndcg_at_1_diff1 value: 35.22676568917156 - type: nauc_ndcg_at_1_max value: 17.007211079819626 - type: nauc_ndcg_at_1_std value: -4.214696308727653 - type: nauc_ndcg_at_20_diff1 value: 25.425326574435168 - type: nauc_ndcg_at_20_max value: 15.385189154016906 - type: nauc_ndcg_at_20_std value: -2.7870454259014545 - type: nauc_ndcg_at_3_diff1 value: 29.685264931512716 - type: nauc_ndcg_at_3_max value: 17.07409526298788 - type: nauc_ndcg_at_3_std value: -3.4063850629923293 - type: nauc_ndcg_at_5_diff1 value: 27.89860104840894 - type: nauc_ndcg_at_5_max value: 15.996740122854927 - type: nauc_ndcg_at_5_std value: -3.3146899970251873 - type: nauc_precision_at_1000_diff1 value: 6.214195083416471 - type: nauc_precision_at_1000_max value: 24.273670809985404 - type: nauc_precision_at_1000_std value: 17.553556491344104 - type: nauc_precision_at_100_diff1 value: 11.6615588663656 - type: nauc_precision_at_100_max value: 20.59244105372682 - type: nauc_precision_at_100_std value: 8.072189089366798 - type: nauc_precision_at_10_diff1 value: 18.279161444567706 - type: nauc_precision_at_10_max value: 17.664508142320727 - type: nauc_precision_at_10_std value: -1.0218966605840407 - type: nauc_precision_at_1_diff1 value: 35.22676568917156 - type: nauc_precision_at_1_max value: 17.007211079819626 - type: nauc_precision_at_1_std value: -4.214696308727653 - type: nauc_precision_at_20_diff1 value: 16.855549347544613 - type: nauc_precision_at_20_max value: 18.640589054149743 - type: nauc_precision_at_20_std value: 0.7553558754796067 - type: nauc_precision_at_3_diff1 value: 25.61293747306704 - type: nauc_precision_at_3_max value: 20.254901193584562 - type: nauc_precision_at_3_std value: -2.9517852127763153 - type: nauc_precision_at_5_diff1 value: 22.32451285561962 - type: nauc_precision_at_5_max value: 18.709490300571886 - type: nauc_precision_at_5_std value: 
-2.0702847848899615 - type: nauc_recall_at_1000_diff1 value: 8.102081393478185 - type: nauc_recall_at_1000_max value: 17.111395305264892 - type: nauc_recall_at_1000_std value: 14.340291614611578 - type: nauc_recall_at_100_diff1 value: 12.480368811829736 - type: nauc_recall_at_100_max value: 12.879220685006636 - type: nauc_recall_at_100_std value: 3.650162252310097 - type: nauc_recall_at_10_diff1 value: 19.461318204968205 - type: nauc_recall_at_10_max value: 12.823289358103562 - type: nauc_recall_at_10_std value: -3.1960264321653895 - type: nauc_recall_at_1_diff1 value: 37.49924172327444 - type: nauc_recall_at_1_max value: 14.852898999380606 - type: nauc_recall_at_1_std value: -3.8871845491808403 - type: nauc_recall_at_20_diff1 value: 17.698352862902524 - type: nauc_recall_at_20_max value: 12.409413309293047 - type: nauc_recall_at_20_std value: -2.0913697847507136 - type: nauc_recall_at_3_diff1 value: 26.236763474946116 - type: nauc_recall_at_3_max value: 15.89287407458128 - type: nauc_recall_at_3_std value: -3.776018275852628 - type: nauc_recall_at_5_diff1 value: 23.10472386873395 - type: nauc_recall_at_5_max value: 14.09706657151941 - type: nauc_recall_at_5_std value: -3.7053105237887296 - type: ndcg_at_1 value: 9.119 - type: ndcg_at_10 value: 13.308 - type: ndcg_at_100 value: 16.98 - type: ndcg_at_1000 value: 20.488 - type: ndcg_at_20 value: 14.455000000000002 - type: ndcg_at_3 value: 10.982 - type: ndcg_at_5 value: 12.003 - type: precision_at_1 value: 9.119 - type: precision_at_10 value: 2.4979999999999998 - type: precision_at_100 value: 0.519 - type: precision_at_1000 value: 0.099 - type: precision_at_20 value: 1.5779999999999998 - type: precision_at_3 value: 5.288 - type: precision_at_5 value: 3.8890000000000002 - type: recall_at_1 value: 7.497 - type: recall_at_10 value: 18.817999999999998 - type: recall_at_100 value: 35.893 - type: recall_at_1000 value: 61.966 - type: recall_at_20 value: 23.017000000000003 - type: recall_at_3 value: 12.199 - type: recall_at_5 value: 14.87 - task: type: Retrieval dataset: name: MTEB CQADupstackUnixRetrieval (default) type: mteb/cqadupstack-unix config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: main_score value: 20.061999999999998 - type: map_at_1 value: 11.856 - type: map_at_10 value: 16.685 - type: map_at_100 value: 17.433 - type: map_at_1000 value: 17.558 - type: map_at_20 value: 17.041999999999998 - type: map_at_3 value: 15.021 - type: map_at_5 value: 15.931999999999999 - type: mrr_at_1 value: 14.17910447761194 - type: mrr_at_10 value: 19.398468964700307 - type: mrr_at_100 value: 20.153361230634783 - type: mrr_at_1000 value: 20.25140420668968 - type: mrr_at_20 value: 19.79354704809282 - type: mrr_at_3 value: 17.63059701492538 - type: mrr_at_5 value: 18.516791044776127 - type: nauc_map_at_1000_diff1 value: 39.29033459612684 - type: nauc_map_at_1000_max value: 27.17416795511821 - type: nauc_map_at_1000_std value: -6.92127611795475 - type: nauc_map_at_100_diff1 value: 39.32396099754708 - type: nauc_map_at_100_max value: 27.09334212594238 - type: nauc_map_at_100_std value: -7.039062385443858 - type: nauc_map_at_10_diff1 value: 39.94340086930468 - type: nauc_map_at_10_max value: 27.423789336152417 - type: nauc_map_at_10_std value: -7.508495669216843 - type: nauc_map_at_1_diff1 value: 47.64613699501138 - type: nauc_map_at_1_max value: 31.632492599268748 - type: nauc_map_at_1_std value: -7.883784832592304 - type: nauc_map_at_20_diff1 value: 39.45107288329592 - type: nauc_map_at_20_max value: 27.15650902645131 
- type: nauc_map_at_20_std value: -7.301916707077087 - type: nauc_map_at_3_diff1 value: 41.801336320148984 - type: nauc_map_at_3_max value: 28.342684341392683 - type: nauc_map_at_3_std value: -8.213654438632787 - type: nauc_map_at_5_diff1 value: 40.973958128612786 - type: nauc_map_at_5_max value: 28.355847958983126 - type: nauc_map_at_5_std value: -7.204454459764011 - type: nauc_mrr_at_1000_diff1 value: 39.68737143543835 - type: nauc_mrr_at_1000_max value: 28.74366308891808 - type: nauc_mrr_at_1000_std value: -5.74519909264754 - type: nauc_mrr_at_100_diff1 value: 39.696965050178875 - type: nauc_mrr_at_100_max value: 28.71065540406762 - type: nauc_mrr_at_100_std value: -5.8117683155682895 - type: nauc_mrr_at_10_diff1 value: 40.22891666712493 - type: nauc_mrr_at_10_max value: 28.97882832718155 - type: nauc_mrr_at_10_std value: -6.167061574555064 - type: nauc_mrr_at_1_diff1 value: 48.39795549312159 - type: nauc_mrr_at_1_max value: 33.31270433423697 - type: nauc_mrr_at_1_std value: -5.8264509798445925 - type: nauc_mrr_at_20_diff1 value: 39.75516014377185 - type: nauc_mrr_at_20_max value: 28.762238070807676 - type: nauc_mrr_at_20_std value: -6.015233094372284 - type: nauc_mrr_at_3_diff1 value: 42.39647678330573 - type: nauc_mrr_at_3_max value: 29.854246402890674 - type: nauc_mrr_at_3_std value: -6.989062488249666 - type: nauc_mrr_at_5_diff1 value: 41.32547115377251 - type: nauc_mrr_at_5_max value: 29.756253662694554 - type: nauc_mrr_at_5_std value: -5.989324088608618 - type: nauc_ndcg_at_1000_diff1 value: 33.24611188020779 - type: nauc_ndcg_at_1000_max value: 25.5685050419863 - type: nauc_ndcg_at_1000_std value: -2.1838171971216838 - type: nauc_ndcg_at_100_diff1 value: 34.12429897480726 - type: nauc_ndcg_at_100_max value: 24.386449655174115 - type: nauc_ndcg_at_100_std value: -4.463092158837694 - type: nauc_ndcg_at_10_diff1 value: 36.7514146310574 - type: nauc_ndcg_at_10_max value: 25.816604124438165 - type: nauc_ndcg_at_10_std value: -6.864047505974296 - type: nauc_ndcg_at_1_diff1 value: 48.39795549312159 - type: nauc_ndcg_at_1_max value: 33.31270433423697 - type: nauc_ndcg_at_1_std value: -5.8264509798445925 - type: nauc_ndcg_at_20_diff1 value: 35.19768360191347 - type: nauc_ndcg_at_20_max value: 25.02001675750392 - type: nauc_ndcg_at_20_std value: -6.20782733166831 - type: nauc_ndcg_at_3_diff1 value: 40.154344522643925 - type: nauc_ndcg_at_3_max value: 27.955302837392672 - type: nauc_ndcg_at_3_std value: -7.6328532886404235 - type: nauc_ndcg_at_5_diff1 value: 38.743591122825606 - type: nauc_ndcg_at_5_max value: 27.72241812814964 - type: nauc_ndcg_at_5_std value: -6.257812072012101 - type: nauc_precision_at_1000_diff1 value: -3.9866748764702096 - type: nauc_precision_at_1000_max value: 14.72470736881832 - type: nauc_precision_at_1000_std value: 15.962534584653012 - type: nauc_precision_at_100_diff1 value: 14.40948301991166 - type: nauc_precision_at_100_max value: 16.61733733078467 - type: nauc_precision_at_100_std value: 6.847882296599798 - type: nauc_precision_at_10_diff1 value: 27.51873293006865 - type: nauc_precision_at_10_max value: 22.893866555907746 - type: nauc_precision_at_10_std value: -3.030805589162383 - type: nauc_precision_at_1_diff1 value: 48.39795549312159 - type: nauc_precision_at_1_max value: 33.31270433423697 - type: nauc_precision_at_1_std value: -5.8264509798445925 - type: nauc_precision_at_20_diff1 value: 22.56834807636722 - type: nauc_precision_at_20_max value: 20.490661671424448 - type: nauc_precision_at_20_std value: -0.660069645072748 - type: nauc_precision_at_3_diff1 
value: 36.978184171791156 - type: nauc_precision_at_3_max value: 26.478381926029265 - type: nauc_precision_at_3_std value: -6.091960417034656 - type: nauc_precision_at_5_diff1 value: 33.58525371051779 - type: nauc_precision_at_5_max value: 26.334754741578593 - type: nauc_precision_at_5_std value: -3.154368502496007 - type: nauc_recall_at_1000_diff1 value: 5.958742292353638 - type: nauc_recall_at_1000_max value: 15.864543076240528 - type: nauc_recall_at_1000_std value: 21.86695402215286 - type: nauc_recall_at_100_diff1 value: 17.82865358233198 - type: nauc_recall_at_100_max value: 13.118309558968022 - type: nauc_recall_at_100_std value: 2.3032751559115114 - type: nauc_recall_at_10_diff1 value: 27.980644115353996 - type: nauc_recall_at_10_max value: 19.39950863468228 - type: nauc_recall_at_10_std value: -6.36618746193429 - type: nauc_recall_at_1_diff1 value: 47.64613699501138 - type: nauc_recall_at_1_max value: 31.632492599268748 - type: nauc_recall_at_1_std value: -7.883784832592304 - type: nauc_recall_at_20_diff1 value: 22.967595804626253 - type: nauc_recall_at_20_max value: 16.693327271336244 - type: nauc_recall_at_20_std value: -4.559238353011102 - type: nauc_recall_at_3_diff1 value: 35.41022087124811 - type: nauc_recall_at_3_max value: 24.543890488663166 - type: nauc_recall_at_3_std value: -8.200059552235023 - type: nauc_recall_at_5_diff1 value: 32.09822917090586 - type: nauc_recall_at_5_max value: 23.82588196783892 - type: nauc_recall_at_5_std value: -4.932704288647733 - type: ndcg_at_1 value: 14.179 - type: ndcg_at_10 value: 20.061999999999998 - type: ndcg_at_100 value: 24.149 - type: ndcg_at_1000 value: 27.644999999999996 - type: ndcg_at_20 value: 21.387999999999998 - type: ndcg_at_3 value: 16.794 - type: ndcg_at_5 value: 18.224 - type: precision_at_1 value: 14.179 - type: precision_at_10 value: 3.582 - type: precision_at_100 value: 0.623 - type: precision_at_1000 value: 0.105 - type: precision_at_20 value: 2.1319999999999997 - type: precision_at_3 value: 7.774 - type: precision_at_5 value: 5.5969999999999995 - type: recall_at_1 value: 11.856 - type: recall_at_10 value: 27.778999999999996 - type: recall_at_100 value: 46.733000000000004 - type: recall_at_1000 value: 72.481 - type: recall_at_20 value: 32.737 - type: recall_at_3 value: 18.859 - type: recall_at_5 value: 22.435 - task: type: Retrieval dataset: name: MTEB CQADupstackWebmastersRetrieval (default) type: mteb/cqadupstack-webmasters config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: main_score value: 23.735999999999997 - type: map_at_1 value: 13.164000000000001 - type: map_at_10 value: 19.317999999999998 - type: map_at_100 value: 20.463 - type: map_at_1000 value: 20.646 - type: map_at_20 value: 19.808 - type: map_at_3 value: 17.126 - type: map_at_5 value: 18.056 - type: mrr_at_1 value: 16.600790513833992 - type: mrr_at_10 value: 22.620067130936693 - type: mrr_at_100 value: 23.601448756772193 - type: mrr_at_1000 value: 23.675507750087586 - type: mrr_at_20 value: 23.09510872850641 - type: mrr_at_3 value: 20.685111989459816 - type: mrr_at_5 value: 21.46574440052701 - type: nauc_map_at_1000_diff1 value: 38.04966249247377 - type: nauc_map_at_1000_max value: 16.252263992463384 - type: nauc_map_at_1000_std value: -1.7460502582062356 - type: nauc_map_at_100_diff1 value: 38.014610979412474 - type: nauc_map_at_100_max value: 16.21534617931594 - type: nauc_map_at_100_std value: -1.862936037740923 - type: nauc_map_at_10_diff1 value: 37.85306201039408 - type: nauc_map_at_10_max value: 
16.316152483605283 - type: nauc_map_at_10_std value: -1.9300768321014996 - type: nauc_map_at_1_diff1 value: 46.32670783118563 - type: nauc_map_at_1_max value: 19.162748070034993 - type: nauc_map_at_1_std value: -7.2143378209361435 - type: nauc_map_at_20_diff1 value: 37.76015277914087 - type: nauc_map_at_20_max value: 16.402558719060888 - type: nauc_map_at_20_std value: -2.065612538672495 - type: nauc_map_at_3_diff1 value: 39.76679931113434 - type: nauc_map_at_3_max value: 16.834290630961544 - type: nauc_map_at_3_std value: -3.9003170439130335 - type: nauc_map_at_5_diff1 value: 39.03208154755538 - type: nauc_map_at_5_max value: 16.225900244095133 - type: nauc_map_at_5_std value: -2.4557998742917273 - type: nauc_mrr_at_1000_diff1 value: 37.458213267102465 - type: nauc_mrr_at_1000_max value: 16.263132423271077 - type: nauc_mrr_at_1000_std value: -0.6455583895471498 - type: nauc_mrr_at_100_diff1 value: 37.45543984270519 - type: nauc_mrr_at_100_max value: 16.185738866185893 - type: nauc_mrr_at_100_std value: -0.6962640945779722 - type: nauc_mrr_at_10_diff1 value: 37.16827089026705 - type: nauc_mrr_at_10_max value: 15.901025716349201 - type: nauc_mrr_at_10_std value: -0.6599647334904797 - type: nauc_mrr_at_1_diff1 value: 44.322572770568456 - type: nauc_mrr_at_1_max value: 19.02126117731051 - type: nauc_mrr_at_1_std value: -5.8998188281784625 - type: nauc_mrr_at_20_diff1 value: 37.24551389599038 - type: nauc_mrr_at_20_max value: 16.113728443160127 - type: nauc_mrr_at_20_std value: -0.8856480048238807 - type: nauc_mrr_at_3_diff1 value: 38.800389636963004 - type: nauc_mrr_at_3_max value: 16.691447775512863 - type: nauc_mrr_at_3_std value: -2.2008701696190474 - type: nauc_mrr_at_5_diff1 value: 38.17066041754819 - type: nauc_mrr_at_5_max value: 15.854986493430074 - type: nauc_mrr_at_5_std value: -1.3419132385788708 - type: nauc_ndcg_at_1000_diff1 value: 36.500354605077305 - type: nauc_ndcg_at_1000_max value: 18.158853474546227 - type: nauc_ndcg_at_1000_std value: 3.7042707188045783 - type: nauc_ndcg_at_100_diff1 value: 35.68797486655767 - type: nauc_ndcg_at_100_max value: 15.949868116992763 - type: nauc_ndcg_at_100_std value: 1.8743757496922573 - type: nauc_ndcg_at_10_diff1 value: 34.44579459042251 - type: nauc_ndcg_at_10_max value: 14.976928472341097 - type: nauc_ndcg_at_10_std value: 0.668632426387858 - type: nauc_ndcg_at_1_diff1 value: 44.322572770568456 - type: nauc_ndcg_at_1_max value: 19.02126117731051 - type: nauc_ndcg_at_1_std value: -5.8998188281784625 - type: nauc_ndcg_at_20_diff1 value: 34.47554348325645 - type: nauc_ndcg_at_20_max value: 15.617518114283014 - type: nauc_ndcg_at_20_std value: 0.23279335295578624 - type: nauc_ndcg_at_3_diff1 value: 37.34865309502302 - type: nauc_ndcg_at_3_max value: 15.6035028610235 - type: nauc_ndcg_at_3_std value: -2.042290469888462 - type: nauc_ndcg_at_5_diff1 value: 36.710946337067 - type: nauc_ndcg_at_5_max value: 14.502265833101022 - type: nauc_ndcg_at_5_std value: -0.26386753108907807 - type: nauc_precision_at_1000_diff1 value: 3.5611970722748056 - type: nauc_precision_at_1000_max value: 6.9688736574296275 - type: nauc_precision_at_1000_std value: 7.291986774352235 - type: nauc_precision_at_100_diff1 value: 18.866491470530185 - type: nauc_precision_at_100_max value: 3.0721103361408497 - type: nauc_precision_at_100_std value: 4.384934503700695 - type: nauc_precision_at_10_diff1 value: 20.850504784204883 - type: nauc_precision_at_10_max value: 10.633189141801425 - type: nauc_precision_at_10_std value: 5.014926409884033 - type: nauc_precision_at_1_diff1 
value: 44.322572770568456 - type: nauc_precision_at_1_max value: 19.02126117731051 - type: nauc_precision_at_1_std value: -5.8998188281784625 - type: nauc_precision_at_20_diff1 value: 20.309109922155518 - type: nauc_precision_at_20_max value: 9.029797084048417 - type: nauc_precision_at_20_std value: 2.758218391395686 - type: nauc_precision_at_3_diff1 value: 30.196789766812422 - type: nauc_precision_at_3_max value: 13.456577178909065 - type: nauc_precision_at_3_std value: 0.49917879030090373 - type: nauc_precision_at_5_diff1 value: 27.706537485425653 - type: nauc_precision_at_5_max value: 9.849229139569182 - type: nauc_precision_at_5_std value: 3.685125093555483 - type: nauc_recall_at_1000_diff1 value: 33.96229420221514 - type: nauc_recall_at_1000_max value: 37.16052892689619 - type: nauc_recall_at_1000_std value: 36.18222346361014 - type: nauc_recall_at_100_diff1 value: 27.657710979013174 - type: nauc_recall_at_100_max value: 15.352705013529967 - type: nauc_recall_at_100_std value: 11.850919034123116 - type: nauc_recall_at_10_diff1 value: 25.46843551212912 - type: nauc_recall_at_10_max value: 12.024769591895815 - type: nauc_recall_at_10_std value: 5.710557786436904 - type: nauc_recall_at_1_diff1 value: 46.32670783118563 - type: nauc_recall_at_1_max value: 19.162748070034993 - type: nauc_recall_at_1_std value: -7.2143378209361435 - type: nauc_recall_at_20_diff1 value: 24.950754303786603 - type: nauc_recall_at_20_max value: 13.779914894639022 - type: nauc_recall_at_20_std value: 4.337235880676669 - type: nauc_recall_at_3_diff1 value: 33.979943512337485 - type: nauc_recall_at_3_max value: 14.35407227008922 - type: nauc_recall_at_3_std value: -0.5408111812033761 - type: nauc_recall_at_5_diff1 value: 31.887819659716687 - type: nauc_recall_at_5_max value: 12.266354466300289 - type: nauc_recall_at_5_std value: 3.67855636796736 - type: ndcg_at_1 value: 16.601 - type: ndcg_at_10 value: 23.735999999999997 - type: ndcg_at_100 value: 29.047 - type: ndcg_at_1000 value: 32.323 - type: ndcg_at_20 value: 25.222 - type: ndcg_at_3 value: 20.013 - type: ndcg_at_5 value: 21.165 - type: precision_at_1 value: 16.601 - type: precision_at_10 value: 4.7829999999999995 - type: precision_at_100 value: 1.077 - type: precision_at_1000 value: 0.197 - type: precision_at_20 value: 3.0429999999999997 - type: precision_at_3 value: 9.881 - type: precision_at_5 value: 7.074999999999999 - type: recall_at_1 value: 13.164000000000001 - type: recall_at_10 value: 33.041 - type: recall_at_100 value: 57.907 - type: recall_at_1000 value: 79.887 - type: recall_at_20 value: 38.833 - type: recall_at_3 value: 21.397 - type: recall_at_5 value: 24.863 - task: type: Retrieval dataset: name: MTEB CQADupstackWordpressRetrieval (default) type: mteb/cqadupstack-wordpress config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: main_score value: 16.794999999999998 - type: map_at_1 value: 10.08 - type: map_at_10 value: 14.069 - type: map_at_100 value: 14.860000000000001 - type: map_at_1000 value: 14.968 - type: map_at_20 value: 14.46 - type: map_at_3 value: 12.498 - type: map_at_5 value: 13.324 - type: mrr_at_1 value: 10.905730129390019 - type: mrr_at_10 value: 15.199146201918854 - type: mrr_at_100 value: 16.00264496872985 - type: mrr_at_1000 value: 16.09501918722929 - type: mrr_at_20 value: 15.633768523540942 - type: mrr_at_3 value: 13.493530499075785 - type: mrr_at_5 value: 14.36229205175601 - type: nauc_map_at_1000_diff1 value: 22.950167181074935 - type: nauc_map_at_1000_max value: 18.717980764527866 - 
type: nauc_map_at_1000_std value: -6.25267811740101 - type: nauc_map_at_100_diff1 value: 22.94728125565202 - type: nauc_map_at_100_max value: 18.719770177431155 - type: nauc_map_at_100_std value: -6.323089529332934 - type: nauc_map_at_10_diff1 value: 22.346430545898126 - type: nauc_map_at_10_max value: 18.80938448630523 - type: nauc_map_at_10_std value: -7.0008855212089065 - type: nauc_map_at_1_diff1 value: 31.95272198051361 - type: nauc_map_at_1_max value: 22.895259623649785 - type: nauc_map_at_1_std value: -9.582498979740272 - type: nauc_map_at_20_diff1 value: 22.86393142972787 - type: nauc_map_at_20_max value: 18.86264577450788 - type: nauc_map_at_20_std value: -6.45412214287895 - type: nauc_map_at_3_diff1 value: 24.099754234032194 - type: nauc_map_at_3_max value: 18.478412248275664 - type: nauc_map_at_3_std value: -7.165377931835313 - type: nauc_map_at_5_diff1 value: 23.19897817392842 - type: nauc_map_at_5_max value: 18.92826540423832 - type: nauc_map_at_5_std value: -6.707296227198584 - type: nauc_mrr_at_1000_diff1 value: 23.213771617115064 - type: nauc_mrr_at_1000_max value: 19.46803843401541 - type: nauc_mrr_at_1000_std value: -6.593116817917101 - type: nauc_mrr_at_100_diff1 value: 23.231343638867212 - type: nauc_mrr_at_100_max value: 19.452575181351783 - type: nauc_mrr_at_100_std value: -6.626683471900298 - type: nauc_mrr_at_10_diff1 value: 22.605547224050298 - type: nauc_mrr_at_10_max value: 19.467230968891098 - type: nauc_mrr_at_10_std value: -7.304335909859951 - type: nauc_mrr_at_1_diff1 value: 32.21591155654977 - type: nauc_mrr_at_1_max value: 23.898168032566968 - type: nauc_mrr_at_1_std value: -10.113298227732622 - type: nauc_mrr_at_20_diff1 value: 23.17788912060599 - type: nauc_mrr_at_20_max value: 19.681138842631395 - type: nauc_mrr_at_20_std value: -6.668117181278914 - type: nauc_mrr_at_3_diff1 value: 24.324685622276508 - type: nauc_mrr_at_3_max value: 19.28094175953585 - type: nauc_mrr_at_3_std value: -7.896612175052549 - type: nauc_mrr_at_5_diff1 value: 23.56101870977645 - type: nauc_mrr_at_5_max value: 19.830915115983956 - type: nauc_mrr_at_5_std value: -7.247689969483312 - type: nauc_ndcg_at_1000_diff1 value: 21.101486527699198 - type: nauc_ndcg_at_1000_max value: 17.661660378409593 - type: nauc_ndcg_at_1000_std value: -1.627651235714167 - type: nauc_ndcg_at_100_diff1 value: 21.24378422898819 - type: nauc_ndcg_at_100_max value: 17.493044854580774 - type: nauc_ndcg_at_100_std value: -3.419151472965354 - type: nauc_ndcg_at_10_diff1 value: 18.656346406751783 - type: nauc_ndcg_at_10_max value: 17.884063161669054 - type: nauc_ndcg_at_10_std value: -6.3304637473674985 - type: nauc_ndcg_at_1_diff1 value: 32.21591155654977 - type: nauc_ndcg_at_1_max value: 23.898168032566968 - type: nauc_ndcg_at_1_std value: -10.113298227732622 - type: nauc_ndcg_at_20_diff1 value: 20.517191848764295 - type: nauc_ndcg_at_20_max value: 18.302766567740257 - type: nauc_ndcg_at_20_std value: -4.676348966303663 - type: nauc_ndcg_at_3_diff1 value: 22.229860548618376 - type: nauc_ndcg_at_3_max value: 17.700425344082685 - type: nauc_ndcg_at_3_std value: -6.599851166419227 - type: nauc_ndcg_at_5_diff1 value: 20.760917715244236 - type: nauc_ndcg_at_5_max value: 18.320361121073617 - type: nauc_ndcg_at_5_std value: -5.968352306934327 - type: nauc_precision_at_1000_diff1 value: 6.111781725558282 - type: nauc_precision_at_1000_max value: 4.893420377600338 - type: nauc_precision_at_1000_std value: 13.552656007673166 - type: nauc_precision_at_100_diff1 value: 16.174564725391278 - type: nauc_precision_at_100_max 
value: 14.759102996929807 - type: nauc_precision_at_100_std value: 6.644858850147021 - type: nauc_precision_at_10_diff1 value: 8.889821893924042 - type: nauc_precision_at_10_max value: 15.574473888576575 - type: nauc_precision_at_10_std value: -2.6115731810417366 - type: nauc_precision_at_1_diff1 value: 32.21591155654977 - type: nauc_precision_at_1_max value: 23.898168032566968 - type: nauc_precision_at_1_std value: -10.113298227732622 - type: nauc_precision_at_20_diff1 value: 14.776717379922587 - type: nauc_precision_at_20_max value: 19.55219664568408 - type: nauc_precision_at_20_std value: 2.8624434397265373 - type: nauc_precision_at_3_diff1 value: 17.24181833195652 - type: nauc_precision_at_3_max value: 15.310985601785825 - type: nauc_precision_at_3_std value: -5.815145792096017 - type: nauc_precision_at_5_diff1 value: 14.568702652383378 - type: nauc_precision_at_5_max value: 16.90398092807837 - type: nauc_precision_at_5_std value: -4.884555559489991 - type: nauc_recall_at_1000_diff1 value: 17.718608305964434 - type: nauc_recall_at_1000_max value: 13.402668234081721 - type: nauc_recall_at_1000_std value: 21.623779371422756 - type: nauc_recall_at_100_diff1 value: 18.932841874380454 - type: nauc_recall_at_100_max value: 13.254799775623564 - type: nauc_recall_at_100_std value: 4.592397886568707 - type: nauc_recall_at_10_diff1 value: 10.256753131266485 - type: nauc_recall_at_10_max value: 15.34274332609289 - type: nauc_recall_at_10_std value: -5.019100394026518 - type: nauc_recall_at_1_diff1 value: 31.95272198051361 - type: nauc_recall_at_1_max value: 22.895259623649785 - type: nauc_recall_at_1_std value: -9.582498979740272 - type: nauc_recall_at_20_diff1 value: 16.098225999062155 - type: nauc_recall_at_20_max value: 16.11919310391389 - type: nauc_recall_at_20_std value: -0.981856820033547 - type: nauc_recall_at_3_diff1 value: 16.896414167717293 - type: nauc_recall_at_3_max value: 14.67655178851271 - type: nauc_recall_at_3_std value: -4.885403738918622 - type: nauc_recall_at_5_diff1 value: 15.074392597620905 - type: nauc_recall_at_5_max value: 16.457162195748644 - type: nauc_recall_at_5_std value: -3.6534367499331046 - type: ndcg_at_1 value: 10.906 - type: ndcg_at_10 value: 16.794999999999998 - type: ndcg_at_100 value: 21.434 - type: ndcg_at_1000 value: 24.743000000000002 - type: ndcg_at_20 value: 18.275 - type: ndcg_at_3 value: 13.507 - type: ndcg_at_5 value: 14.953 - type: precision_at_1 value: 10.906 - type: precision_at_10 value: 2.791 - type: precision_at_100 value: 0.5559999999999999 - type: precision_at_1000 value: 0.091 - type: precision_at_20 value: 1.738 - type: precision_at_3 value: 5.545 - type: precision_at_5 value: 4.14 - type: recall_at_1 value: 10.08 - type: recall_at_10 value: 24.184 - type: recall_at_100 value: 46.967999999999996 - type: recall_at_1000 value: 72.92999999999999 - type: recall_at_20 value: 29.852 - type: recall_at_3 value: 15.440999999999999 - type: recall_at_5 value: 18.829 - task: type: Retrieval dataset: name: MTEB ClimateFEVER (default) type: mteb/climate-fever config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: main_score value: 17.288999999999998 - type: map_at_1 value: 6.537 - type: map_at_10 value: 11.465 - type: map_at_100 value: 12.851 - type: map_at_1000 value: 13.045000000000002 - type: map_at_20 value: 12.174 - type: map_at_3 value: 9.369 - type: map_at_5 value: 10.331 - type: mrr_at_1 value: 15.2442996742671 - type: mrr_at_10 value: 23.59306654257793 - type: mrr_at_100 value: 24.771529453769823 - 
type: mrr_at_1000 value: 24.838895119526256 - type: mrr_at_20 value: 24.34915881726873 - type: mrr_at_3 value: 20.466883821932676 - type: mrr_at_5 value: 22.027144408251875 - type: nauc_map_at_1000_diff1 value: 21.34422077879759 - type: nauc_map_at_1000_max value: 22.628208123980382 - type: nauc_map_at_1000_std value: 15.80771024789922 - type: nauc_map_at_100_diff1 value: 21.373352148960333 - type: nauc_map_at_100_max value: 22.445247482460697 - type: nauc_map_at_100_std value: 15.551345921669244 - type: nauc_map_at_10_diff1 value: 22.093245216727393 - type: nauc_map_at_10_max value: 20.71848879842843 - type: nauc_map_at_10_std value: 13.073037988129768 - type: nauc_map_at_1_diff1 value: 32.56507685691908 - type: nauc_map_at_1_max value: 19.299512363814912 - type: nauc_map_at_1_std value: 7.980883065948159 - type: nauc_map_at_20_diff1 value: 21.612469499988222 - type: nauc_map_at_20_max value: 21.70315933461587 - type: nauc_map_at_20_std value: 14.51324386963804 - type: nauc_map_at_3_diff1 value: 22.671417020380986 - type: nauc_map_at_3_max value: 18.10374651349345 - type: nauc_map_at_3_std value: 9.73448791948781 - type: nauc_map_at_5_diff1 value: 22.034988196838064 - type: nauc_map_at_5_max value: 18.490696961140145 - type: nauc_map_at_5_std value: 11.001958112977931 - type: nauc_mrr_at_1000_diff1 value: 17.997877765827052 - type: nauc_mrr_at_1000_max value: 23.761191320854795 - type: nauc_mrr_at_1000_std value: 17.086288520129283 - type: nauc_mrr_at_100_diff1 value: 17.99589491236679 - type: nauc_mrr_at_100_max value: 23.76386777696214 - type: nauc_mrr_at_100_std value: 17.114923252433908 - type: nauc_mrr_at_10_diff1 value: 17.95028052166577 - type: nauc_mrr_at_10_max value: 23.313446785613046 - type: nauc_mrr_at_10_std value: 16.289313792057893 - type: nauc_mrr_at_1_diff1 value: 25.00794012521374 - type: nauc_mrr_at_1_max value: 20.934023514536086 - type: nauc_mrr_at_1_std value: 10.326842252115775 - type: nauc_mrr_at_20_diff1 value: 17.977173189525192 - type: nauc_mrr_at_20_max value: 23.858084437038833 - type: nauc_mrr_at_20_std value: 17.177629596269224 - type: nauc_mrr_at_3_diff1 value: 18.049118818264052 - type: nauc_mrr_at_3_max value: 21.812245650122605 - type: nauc_mrr_at_3_std value: 14.048078149579718 - type: nauc_mrr_at_5_diff1 value: 18.028877069283745 - type: nauc_mrr_at_5_max value: 21.88620019054395 - type: nauc_mrr_at_5_std value: 14.787661645971001 - type: nauc_ndcg_at_1000_diff1 value: 16.72726980659064 - type: nauc_ndcg_at_1000_max value: 30.043672363788087 - type: nauc_ndcg_at_1000_std value: 26.833584730455268 - type: nauc_ndcg_at_100_diff1 value: 17.16473243031922 - type: nauc_ndcg_at_100_max value: 28.239622016125566 - type: nauc_ndcg_at_100_std value: 24.469002695895977 - type: nauc_ndcg_at_10_diff1 value: 18.655890597433427 - type: nauc_ndcg_at_10_max value: 23.63136724071696 - type: nauc_ndcg_at_10_std value: 17.29295589103389 - type: nauc_ndcg_at_1_diff1 value: 25.00794012521374 - type: nauc_ndcg_at_1_max value: 20.934023514536086 - type: nauc_ndcg_at_1_std value: 10.326842252115775 - type: nauc_ndcg_at_20_diff1 value: 17.762757204969244 - type: nauc_ndcg_at_20_max value: 25.946755000541476 - type: nauc_ndcg_at_20_std value: 20.9523075152757 - type: nauc_ndcg_at_3_diff1 value: 18.258615831392746 - type: nauc_ndcg_at_3_max value: 20.21498568651181 - type: nauc_ndcg_at_3_std value: 12.588112301185989 - type: nauc_ndcg_at_5_diff1 value: 18.575198873459577 - type: nauc_ndcg_at_5_max value: 19.821485190942443 - type: nauc_ndcg_at_5_std value: 13.559611377687455 - 
type: nauc_precision_at_1000_diff1 value: -1.3591333339360123 - type: nauc_precision_at_1000_max value: 33.01866225202323 - type: nauc_precision_at_1000_std value: 38.26072433720804 - type: nauc_precision_at_100_diff1 value: 4.534183759090849 - type: nauc_precision_at_100_max value: 35.499433595656335 - type: nauc_precision_at_100_std value: 37.765227934597114 - type: nauc_precision_at_10_diff1 value: 11.369511250136568 - type: nauc_precision_at_10_max value: 30.281092515358527 - type: nauc_precision_at_10_std value: 26.690470077530847 - type: nauc_precision_at_1_diff1 value: 25.00794012521374 - type: nauc_precision_at_1_max value: 20.934023514536086 - type: nauc_precision_at_1_std value: 10.326842252115775 - type: nauc_precision_at_20_diff1 value: 8.133211694372351 - type: nauc_precision_at_20_max value: 34.161055315782775 - type: nauc_precision_at_20_std value: 33.33055010570849 - type: nauc_precision_at_3_diff1 value: 10.5682193001728 - type: nauc_precision_at_3_max value: 22.786982248944767 - type: nauc_precision_at_3_std value: 17.92766896610086 - type: nauc_precision_at_5_diff1 value: 10.940535871177055 - type: nauc_precision_at_5_max value: 23.197073410356037 - type: nauc_precision_at_5_std value: 20.612896217277573 - type: nauc_recall_at_1000_diff1 value: 5.540983045337761 - type: nauc_recall_at_1000_max value: 37.3394645787145 - type: nauc_recall_at_1000_std value: 43.905340993951555 - type: nauc_recall_at_100_diff1 value: 8.725053205627061 - type: nauc_recall_at_100_max value: 29.46589116376182 - type: nauc_recall_at_100_std value: 32.76739728784572 - type: nauc_recall_at_10_diff1 value: 13.519133005869758 - type: nauc_recall_at_10_max value: 23.66746585259265 - type: nauc_recall_at_10_std value: 19.744857128981092 - type: nauc_recall_at_1_diff1 value: 32.56507685691908 - type: nauc_recall_at_1_max value: 19.299512363814912 - type: nauc_recall_at_1_std value: 7.980883065948159 - type: nauc_recall_at_20_diff1 value: 10.866077600352101 - type: nauc_recall_at_20_max value: 26.726876720649262 - type: nauc_recall_at_20_std value: 26.28100368153264 - type: nauc_recall_at_3_diff1 value: 15.295338383488533 - type: nauc_recall_at_3_max value: 18.013167170259173 - type: nauc_recall_at_3_std value: 11.569701886642754 - type: nauc_recall_at_5_diff1 value: 14.214598780846863 - type: nauc_recall_at_5_max value: 17.96550333772466 - type: nauc_recall_at_5_std value: 13.720834673116972 - type: ndcg_at_1 value: 15.244 - type: ndcg_at_10 value: 17.288999999999998 - type: ndcg_at_100 value: 23.757 - type: ndcg_at_1000 value: 27.725 - type: ndcg_at_20 value: 19.686999999999998 - type: ndcg_at_3 value: 13.245000000000001 - type: ndcg_at_5 value: 14.485000000000001 - type: precision_at_1 value: 15.244 - type: precision_at_10 value: 5.733 - type: precision_at_100 value: 1.264 - type: precision_at_1000 value: 0.199 - type: precision_at_20 value: 3.85 - type: precision_at_3 value: 10.054 - type: precision_at_5 value: 7.9350000000000005 - type: recall_at_1 value: 6.537 - type: recall_at_10 value: 22.046 - type: recall_at_100 value: 44.818000000000005 - type: recall_at_1000 value: 67.676 - type: recall_at_20 value: 28.974 - type: recall_at_3 value: 12.232 - type: recall_at_5 value: 15.540999999999999 - task: type: Retrieval dataset: name: MTEB DBPedia (default) type: mteb/dbpedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: main_score value: 24.235 - type: map_at_1 value: 4.304 - type: map_at_10 value: 9.944 - type: map_at_100 value: 14.113000000000001 - 
type: map_at_1000 value: 15.085 - type: map_at_20 value: 11.594 - type: map_at_3 value: 7.228999999999999 - type: map_at_5 value: 8.368 - type: mrr_at_1 value: 43.0 - type: mrr_at_10 value: 53.30376984126983 - type: mrr_at_100 value: 53.97910163622114 - type: mrr_at_1000 value: 54.005267473599304 - type: mrr_at_20 value: 53.740161512249365 - type: mrr_at_3 value: 50.54166666666667 - type: mrr_at_5 value: 52.154166666666654 - type: nauc_map_at_1000_diff1 value: 26.809585057496545 - type: nauc_map_at_1000_max value: 27.599866660752987 - type: nauc_map_at_1000_std value: 31.459439584000094 - type: nauc_map_at_100_diff1 value: 27.049487336011836 - type: nauc_map_at_100_max value: 25.112936840752 - type: nauc_map_at_100_std value: 28.400137100413364 - type: nauc_map_at_10_diff1 value: 32.105246040146554 - type: nauc_map_at_10_max value: 9.658311385867774 - type: nauc_map_at_10_std value: 12.006591313970928 - type: nauc_map_at_1_diff1 value: 45.66826032911575 - type: nauc_map_at_1_max value: 1.1005171486965344 - type: nauc_map_at_1_std value: 3.2500050585955558 - type: nauc_map_at_20_diff1 value: 30.73734552740125 - type: nauc_map_at_20_max value: 14.994971393610829 - type: nauc_map_at_20_std value: 18.029603402042753 - type: nauc_map_at_3_diff1 value: 36.77585294977933 - type: nauc_map_at_3_max value: 2.0123666749907034 - type: nauc_map_at_3_std value: 3.1886056493854906 - type: nauc_map_at_5_diff1 value: 34.910885252980414 - type: nauc_map_at_5_max value: 4.606898880177816 - type: nauc_map_at_5_std value: 5.897472990222533 - type: nauc_mrr_at_1000_diff1 value: 32.8408203164654 - type: nauc_mrr_at_1000_max value: 44.57916824429895 - type: nauc_mrr_at_1000_std value: 25.76632603800019 - type: nauc_mrr_at_100_diff1 value: 32.83381181877902 - type: nauc_mrr_at_100_max value: 44.57742098993615 - type: nauc_mrr_at_100_std value: 25.763980866882193 - type: nauc_mrr_at_10_diff1 value: 32.85879447148161 - type: nauc_mrr_at_10_max value: 44.587973042043814 - type: nauc_mrr_at_10_std value: 25.548766798683893 - type: nauc_mrr_at_1_diff1 value: 36.064038704139605 - type: nauc_mrr_at_1_max value: 43.188409566789346 - type: nauc_mrr_at_1_std value: 24.26421817898062 - type: nauc_mrr_at_20_diff1 value: 32.752896264184685 - type: nauc_mrr_at_20_max value: 44.56787283356919 - type: nauc_mrr_at_20_std value: 25.763763879915313 - type: nauc_mrr_at_3_diff1 value: 33.265925003418126 - type: nauc_mrr_at_3_max value: 43.98236209085194 - type: nauc_mrr_at_3_std value: 24.811433062956347 - type: nauc_mrr_at_5_diff1 value: 33.02692454410134 - type: nauc_mrr_at_5_max value: 44.02150946107612 - type: nauc_mrr_at_5_std value: 24.414392179240878 - type: nauc_ndcg_at_1000_diff1 value: 29.071114816059023 - type: nauc_ndcg_at_1000_max value: 38.90222092060964 - type: nauc_ndcg_at_1000_std value: 44.44820451621514 - type: nauc_ndcg_at_100_diff1 value: 29.1316364198098 - type: nauc_ndcg_at_100_max value: 31.558894971415064 - type: nauc_ndcg_at_100_std value: 35.45395514581182 - type: nauc_ndcg_at_10_diff1 value: 29.303783217647744 - type: nauc_ndcg_at_10_max value: 31.009718153863414 - type: nauc_ndcg_at_10_std value: 27.49477754545124 - type: nauc_ndcg_at_1_diff1 value: 35.43480922848642 - type: nauc_ndcg_at_1_max value: 30.475722281046714 - type: nauc_ndcg_at_1_std value: 17.626646786380547 - type: nauc_ndcg_at_20_diff1 value: 29.30769894815147 - type: nauc_ndcg_at_20_max value: 27.870710525324107 - type: nauc_ndcg_at_20_std value: 28.334513734492532 - type: nauc_ndcg_at_3_diff1 value: 30.7536730308035 - type: 
nauc_ndcg_at_3_max value: 32.32457811814772 - type: nauc_ndcg_at_3_std value: 21.676427426548152 - type: nauc_ndcg_at_5_diff1 value: 29.96943892323901 - type: nauc_ndcg_at_5_max value: 31.493512707920964 - type: nauc_ndcg_at_5_std value: 24.0956693770445 - type: nauc_precision_at_1000_diff1 value: -5.720318672455256 - type: nauc_precision_at_1000_max value: 28.08646209634404 - type: nauc_precision_at_1000_std value: 29.34422238786186 - type: nauc_precision_at_100_diff1 value: 0.84607162708279 - type: nauc_precision_at_100_max value: 47.97391409332498 - type: nauc_precision_at_100_std value: 44.619521382937286 - type: nauc_precision_at_10_diff1 value: 9.622029967680373 - type: nauc_precision_at_10_max value: 45.89203900455004 - type: nauc_precision_at_10_std value: 38.276273021326745 - type: nauc_precision_at_1_diff1 value: 36.064038704139605 - type: nauc_precision_at_1_max value: 43.188409566789346 - type: nauc_precision_at_1_std value: 24.26421817898062 - type: nauc_precision_at_20_diff1 value: 6.709711811715244 - type: nauc_precision_at_20_max value: 47.47318907005896 - type: nauc_precision_at_20_std value: 42.595576770275095 - type: nauc_precision_at_3_diff1 value: 19.233575308317054 - type: nauc_precision_at_3_max value: 43.02563765159987 - type: nauc_precision_at_3_std value: 27.334254446564454 - type: nauc_precision_at_5_diff1 value: 14.298477498830673 - type: nauc_precision_at_5_max value: 42.72631241492758 - type: nauc_precision_at_5_std value: 32.14763584000337 - type: nauc_recall_at_1000_diff1 value: 18.551929022070503 - type: nauc_recall_at_1000_max value: 25.99572596347025 - type: nauc_recall_at_1000_std value: 49.479321187111644 - type: nauc_recall_at_100_diff1 value: 16.24655246342188 - type: nauc_recall_at_100_max value: 19.193014693852824 - type: nauc_recall_at_100_std value: 31.691642773148754 - type: nauc_recall_at_10_diff1 value: 21.181166055890365 - type: nauc_recall_at_10_max value: -0.020533885799737757 - type: nauc_recall_at_10_std value: 7.266191592314226 - type: nauc_recall_at_1_diff1 value: 45.66826032911575 - type: nauc_recall_at_1_max value: 1.1005171486965344 - type: nauc_recall_at_1_std value: 3.2500050585955558 - type: nauc_recall_at_20_diff1 value: 19.153797037751836 - type: nauc_recall_at_20_max value: 3.9385573002743057 - type: nauc_recall_at_20_std value: 14.048512138776442 - type: nauc_recall_at_3_diff1 value: 30.240078354763085 - type: nauc_recall_at_3_max value: -4.0841121814480195 - type: nauc_recall_at_3_std value: -2.3759344889809264 - type: nauc_recall_at_5_diff1 value: 26.22489817092464 - type: nauc_recall_at_5_max value: -3.2396073154699256 - type: nauc_recall_at_5_std value: -0.1327990827712389 - type: ndcg_at_1 value: 31.5 - type: ndcg_at_10 value: 24.235 - type: ndcg_at_100 value: 28.01 - type: ndcg_at_1000 value: 34.724 - type: ndcg_at_20 value: 24.265 - type: ndcg_at_3 value: 26.682 - type: ndcg_at_5 value: 25.249 - type: precision_at_1 value: 43.0 - type: precision_at_10 value: 21.65 - type: precision_at_100 value: 6.97 - type: precision_at_1000 value: 1.4449999999999998 - type: precision_at_20 value: 16.6 - type: precision_at_3 value: 32.25 - type: precision_at_5 value: 27.250000000000004 - type: recall_at_1 value: 4.304 - type: recall_at_10 value: 15.014 - type: recall_at_100 value: 35.115 - type: recall_at_1000 value: 58.52 - type: recall_at_20 value: 20.817 - type: recall_at_3 value: 8.698 - type: recall_at_5 value: 11.052 - task: type: Classification dataset: name: MTEB EmotionClassification (default) type: mteb/emotion config: default 
split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 45.09 - type: f1 value: 41.3731018097549 - type: f1_weighted value: 47.129694558751545 - type: main_score value: 45.09 - task: type: Retrieval dataset: name: MTEB FEVER (default) type: mteb/fever config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: main_score value: 30.267 - type: map_at_1 value: 16.349 - type: map_at_10 value: 24.917 - type: map_at_100 value: 26.003 - type: map_at_1000 value: 26.072 - type: map_at_20 value: 25.558999999999997 - type: map_at_3 value: 22.067999999999998 - type: map_at_5 value: 23.610999999999997 - type: mrr_at_1 value: 17.416741674167415 - type: mrr_at_10 value: 26.439929707256365 - type: mrr_at_100 value: 27.508820939687954 - type: mrr_at_1000 value: 27.570352489203128 - type: mrr_at_20 value: 27.08319436248233 - type: mrr_at_3 value: 23.422342234223358 - type: mrr_at_5 value: 25.06350635063509 - type: nauc_map_at_1000_diff1 value: 21.773223671090857 - type: nauc_map_at_1000_max value: 6.412897130218669 - type: nauc_map_at_1000_std value: -6.3221009008493745 - type: nauc_map_at_100_diff1 value: 21.76483868507978 - type: nauc_map_at_100_max value: 6.404365200549758 - type: nauc_map_at_100_std value: -6.342840969370927 - type: nauc_map_at_10_diff1 value: 21.669481996014238 - type: nauc_map_at_10_max value: 6.019531738681224 - type: nauc_map_at_10_std value: -6.941777440293395 - type: nauc_map_at_1_diff1 value: 27.706382248361393 - type: nauc_map_at_1_max value: 4.030610814398596 - type: nauc_map_at_1_std value: -9.782554832619702 - type: nauc_map_at_20_diff1 value: 21.80535156700929 - type: nauc_map_at_20_max value: 6.361714278006344 - type: nauc_map_at_20_std value: -6.513790702798104 - type: nauc_map_at_3_diff1 value: 23.017059605983857 - type: nauc_map_at_3_max value: 5.110304244032051 - type: nauc_map_at_3_std value: -8.069547854658104 - type: nauc_map_at_5_diff1 value: 21.927491204194766 - type: nauc_map_at_5_max value: 5.462525780765053 - type: nauc_map_at_5_std value: -7.474340804858998 - type: nauc_mrr_at_1000_diff1 value: 21.61235920652557 - type: nauc_mrr_at_1000_max value: 6.6996553488043915 - type: nauc_mrr_at_1000_std value: -6.520954496784069 - type: nauc_mrr_at_100_diff1 value: 21.597831485534126 - type: nauc_mrr_at_100_max value: 6.705135295195408 - type: nauc_mrr_at_100_std value: -6.521597409657566 - type: nauc_mrr_at_10_diff1 value: 21.404259600861597 - type: nauc_mrr_at_10_max value: 6.348078634441438 - type: nauc_mrr_at_10_std value: -7.012906818443071 - type: nauc_mrr_at_1_diff1 value: 27.231264207663248 - type: nauc_mrr_at_1_max value: 4.04888129901842 - type: nauc_mrr_at_1_std value: -9.998368133129015 - type: nauc_mrr_at_20_diff1 value: 21.57543681953314 - type: nauc_mrr_at_20_max value: 6.670007051575425 - type: nauc_mrr_at_20_std value: -6.636382948186316 - type: nauc_mrr_at_3_diff1 value: 22.771758514181627 - type: nauc_mrr_at_3_max value: 5.389600538667887 - type: nauc_mrr_at_3_std value: -8.189661361743667 - type: nauc_mrr_at_5_diff1 value: 21.689397986510446 - type: nauc_mrr_at_5_max value: 5.765658649049543 - type: nauc_mrr_at_5_std value: -7.590205788150704 - type: nauc_ndcg_at_1000_diff1 value: 19.780729881850963 - type: nauc_ndcg_at_1000_max value: 8.968522119658385 - type: nauc_ndcg_at_1000_std value: -2.425269449284083 - type: nauc_ndcg_at_100_diff1 value: 19.46657224380776 - type: nauc_ndcg_at_100_max value: 9.05883201318058 - type: nauc_ndcg_at_100_std value: -2.5565659351523293 - 
type: nauc_ndcg_at_10_diff1 value: 19.29152253186839 - type: nauc_ndcg_at_10_max value: 7.499062048205841 - type: nauc_ndcg_at_10_std value: -5.2482566392088685 - type: nauc_ndcg_at_1_diff1 value: 27.231264207663248 - type: nauc_ndcg_at_1_max value: 4.04888129901842 - type: nauc_ndcg_at_1_std value: -9.998368133129015 - type: nauc_ndcg_at_20_diff1 value: 19.71545443537324 - type: nauc_ndcg_at_20_max value: 8.64504551388718 - type: nauc_ndcg_at_20_std value: -3.7667113417348976 - type: nauc_ndcg_at_3_diff1 value: 21.745216173844803 - type: nauc_ndcg_at_3_max value: 5.650727598972489 - type: nauc_ndcg_at_3_std value: -7.481336986244201 - type: nauc_ndcg_at_5_diff1 value: 19.936133837204203 - type: nauc_ndcg_at_5_max value: 6.259916537058443 - type: nauc_ndcg_at_5_std value: -6.484388158971839 - type: nauc_precision_at_1000_diff1 value: 1.471146535072958 - type: nauc_precision_at_1000_max value: 20.630906784097483 - type: nauc_precision_at_1000_std value: 21.9773366010731 - type: nauc_precision_at_100_diff1 value: 7.533964401054148 - type: nauc_precision_at_100_max value: 19.925643661900423 - type: nauc_precision_at_100_std value: 15.336729247195924 - type: nauc_precision_at_10_diff1 value: 12.150440335935734 - type: nauc_precision_at_10_max value: 11.983854268540387 - type: nauc_precision_at_10_std value: -0.37221151434129196 - type: nauc_precision_at_1_diff1 value: 27.231264207663248 - type: nauc_precision_at_1_max value: 4.04888129901842 - type: nauc_precision_at_1_std value: -9.998368133129015 - type: nauc_precision_at_20_diff1 value: 12.630450311503752 - type: nauc_precision_at_20_max value: 16.05605149278296 - type: nauc_precision_at_20_std value: 5.3999355877921165 - type: nauc_precision_at_3_diff1 value: 18.359563527526568 - type: nauc_precision_at_3_max value: 7.050702808245418 - type: nauc_precision_at_3_std value: -6.012052050420314 - type: nauc_precision_at_5_diff1 value: 14.398743831406193 - type: nauc_precision_at_5_max value: 8.47645601614165 - type: nauc_precision_at_5_std value: -4.017240645221931 - type: nauc_recall_at_1000_diff1 value: 7.839541590866944 - type: nauc_recall_at_1000_max value: 23.309619602703478 - type: nauc_recall_at_1000_std value: 27.804864458508405 - type: nauc_recall_at_100_diff1 value: 9.97691215791031 - type: nauc_recall_at_100_max value: 18.819153599870717 - type: nauc_recall_at_100_std value: 14.458117071228108 - type: nauc_recall_at_10_diff1 value: 12.810432997078946 - type: nauc_recall_at_10_max value: 10.766544057766287 - type: nauc_recall_at_10_std value: -0.5969028921503585 - type: nauc_recall_at_1_diff1 value: 27.706382248361393 - type: nauc_recall_at_1_max value: 4.030610814398596 - type: nauc_recall_at_1_std value: -9.782554832619702 - type: nauc_recall_at_20_diff1 value: 13.595110328407126 - type: nauc_recall_at_20_max value: 14.757809231376443 - type: nauc_recall_at_20_std value: 4.9020894617594575 - type: nauc_recall_at_3_diff1 value: 18.603105066886183 - type: nauc_recall_at_3_max value: 6.695351132956627 - type: nauc_recall_at_3_std value: -5.761401766506087 - type: nauc_recall_at_5_diff1 value: 14.770731919705574 - type: nauc_recall_at_5_max value: 7.754748009508286 - type: nauc_recall_at_5_std value: -3.7961358195332773 - type: ndcg_at_1 value: 17.416999999999998 - type: ndcg_at_10 value: 30.267 - type: ndcg_at_100 value: 35.650999999999996 - type: ndcg_at_1000 value: 37.57 - type: ndcg_at_20 value: 32.574 - type: ndcg_at_3 value: 24.303 - type: ndcg_at_5 value: 27.099 - type: precision_at_1 value: 17.416999999999998 - type: 
precision_at_10 value: 4.9590000000000005 - type: precision_at_100 value: 0.7799999999999999 - type: precision_at_1000 value: 0.096 - type: precision_at_20 value: 2.9819999999999998 - type: precision_at_3 value: 10.536 - type: precision_at_5 value: 7.807 - type: recall_at_1 value: 16.349 - type: recall_at_10 value: 45.678999999999995 - type: recall_at_100 value: 70.541 - type: recall_at_1000 value: 85.36500000000001 - type: recall_at_20 value: 54.541 - type: recall_at_3 value: 29.42 - type: recall_at_5 value: 36.112 - task: type: Retrieval dataset: name: MTEB FiQA2018 (default) type: mteb/fiqa config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: main_score value: 16.619 - type: map_at_1 value: 7.478999999999999 - type: map_at_10 value: 11.933 - type: map_at_100 value: 13.078000000000001 - type: map_at_1000 value: 13.267999999999999 - type: map_at_20 value: 12.465 - type: map_at_3 value: 9.975000000000001 - type: map_at_5 value: 10.928 - type: mrr_at_1 value: 14.660493827160495 - type: mrr_at_10 value: 20.737250146972368 - type: mrr_at_100 value: 21.718558761167632 - type: mrr_at_1000 value: 21.808600465854973 - type: mrr_at_20 value: 21.221196101889976 - type: mrr_at_3 value: 18.569958847736622 - type: mrr_at_5 value: 19.557613168724284 - type: nauc_map_at_1000_diff1 value: 21.51431734644358 - type: nauc_map_at_1000_max value: 4.931074809601008 - type: nauc_map_at_1000_std value: -3.3303160557020033 - type: nauc_map_at_100_diff1 value: 21.38249575770264 - type: nauc_map_at_100_max value: 4.725930298940441 - type: nauc_map_at_100_std value: -3.4448477852279473 - type: nauc_map_at_10_diff1 value: 21.195172969735484 - type: nauc_map_at_10_max value: 4.412691847045547 - type: nauc_map_at_10_std value: -4.350074377307911 - type: nauc_map_at_1_diff1 value: 28.103238263092063 - type: nauc_map_at_1_max value: 6.669837188399256 - type: nauc_map_at_1_std value: -4.3658897905036405 - type: nauc_map_at_20_diff1 value: 21.489132375885042 - type: nauc_map_at_20_max value: 4.303022314751493 - type: nauc_map_at_20_std value: -4.17992541434375 - type: nauc_map_at_3_diff1 value: 22.237087711122065 - type: nauc_map_at_3_max value: 4.533442194144081 - type: nauc_map_at_3_std value: -5.4916480142821635 - type: nauc_map_at_5_diff1 value: 21.876772694300065 - type: nauc_map_at_5_max value: 4.511112176374985 - type: nauc_map_at_5_std value: -5.176150118472554 - type: nauc_mrr_at_1000_diff1 value: 22.783625924297894 - type: nauc_mrr_at_1000_max value: 5.601679998803955 - type: nauc_mrr_at_1000_std value: -7.3878080622090865 - type: nauc_mrr_at_100_diff1 value: 22.729460521696915 - type: nauc_mrr_at_100_max value: 5.57805664833725 - type: nauc_mrr_at_100_std value: -7.3741470356357945 - type: nauc_mrr_at_10_diff1 value: 22.92977199129734 - type: nauc_mrr_at_10_max value: 5.36088601159652 - type: nauc_mrr_at_10_std value: -7.875413563795927 - type: nauc_mrr_at_1_diff1 value: 28.31095482042955 - type: nauc_mrr_at_1_max value: 7.815000197077026 - type: nauc_mrr_at_1_std value: -7.957538731368522 - type: nauc_mrr_at_20_diff1 value: 22.946584920142406 - type: nauc_mrr_at_20_max value: 5.384498887828733 - type: nauc_mrr_at_20_std value: -7.633579657779428 - type: nauc_mrr_at_3_diff1 value: 23.46361356498147 - type: nauc_mrr_at_3_max value: 4.50117125788086 - type: nauc_mrr_at_3_std value: -8.902224452227653 - type: nauc_mrr_at_5_diff1 value: 23.331352654582094 - type: nauc_mrr_at_5_max value: 4.978873752458006 - type: nauc_mrr_at_5_std value: -8.93749978655238 - type: 
nauc_ndcg_at_1000_diff1 value: 19.87039469365751 - type: nauc_ndcg_at_1000_max value: 8.696714614408632 - type: nauc_ndcg_at_1000_std value: 1.9681923697039077 - type: nauc_ndcg_at_100_diff1 value: 18.868322837780532 - type: nauc_ndcg_at_100_max value: 6.0333062132177675 - type: nauc_ndcg_at_100_std value: 0.44045929715801535 - type: nauc_ndcg_at_10_diff1 value: 19.727068370792786 - type: nauc_ndcg_at_10_max value: 4.277512828410901 - type: nauc_ndcg_at_10_std value: -4.086859790177703 - type: nauc_ndcg_at_1_diff1 value: 28.31095482042955 - type: nauc_ndcg_at_1_max value: 7.815000197077026 - type: nauc_ndcg_at_1_std value: -7.957538731368522 - type: nauc_ndcg_at_20_diff1 value: 20.29147215834196 - type: nauc_ndcg_at_20_max value: 4.095649235859702 - type: nauc_ndcg_at_20_std value: -3.35870597862009 - type: nauc_ndcg_at_3_diff1 value: 21.821928240162936 - type: nauc_ndcg_at_3_max value: 4.480256449572136 - type: nauc_ndcg_at_3_std value: -7.852741840584263 - type: nauc_ndcg_at_5_diff1 value: 21.15156996884851 - type: nauc_ndcg_at_5_max value: 4.290200639355712 - type: nauc_ndcg_at_5_std value: -6.820305338379054 - type: nauc_precision_at_1000_diff1 value: 8.075302805866599 - type: nauc_precision_at_1000_max value: 19.944406193476624 - type: nauc_precision_at_1000_std value: 7.381890177301082 - type: nauc_precision_at_100_diff1 value: 11.601078456057651 - type: nauc_precision_at_100_max value: 13.628171798745194 - type: nauc_precision_at_100_std value: 5.64401780985023 - type: nauc_precision_at_10_diff1 value: 16.653551040271243 - type: nauc_precision_at_10_max value: 6.546264597330201 - type: nauc_precision_at_10_std value: -4.71713361654603 - type: nauc_precision_at_1_diff1 value: 28.31095482042955 - type: nauc_precision_at_1_max value: 7.815000197077026 - type: nauc_precision_at_1_std value: -7.957538731368522 - type: nauc_precision_at_20_diff1 value: 17.066402720849883 - type: nauc_precision_at_20_max value: 6.178677607606832 - type: nauc_precision_at_20_std value: -3.987829586084965 - type: nauc_precision_at_3_diff1 value: 18.358060169256518 - type: nauc_precision_at_3_max value: 3.326657304001109 - type: nauc_precision_at_3_std value: -10.729398884603352 - type: nauc_precision_at_5_diff1 value: 19.41722339541596 - type: nauc_precision_at_5_max value: 5.714829813319856 - type: nauc_precision_at_5_std value: -8.915414021584194 - type: nauc_recall_at_1000_diff1 value: 9.365082280755011 - type: nauc_recall_at_1000_max value: 15.829818126823215 - type: nauc_recall_at_1000_std value: 27.360808820832666 - type: nauc_recall_at_100_diff1 value: 8.05391879951721 - type: nauc_recall_at_100_max value: 5.285477600522065 - type: nauc_recall_at_100_std value: 13.239431098719457 - type: nauc_recall_at_10_diff1 value: 13.288596558862537 - type: nauc_recall_at_10_max value: 1.9512189235666242 - type: nauc_recall_at_10_std value: 0.08420098367582614 - type: nauc_recall_at_1_diff1 value: 28.103238263092063 - type: nauc_recall_at_1_max value: 6.669837188399256 - type: nauc_recall_at_1_std value: -4.3658897905036405 - type: nauc_recall_at_20_diff1 value: 14.781087409113736 - type: nauc_recall_at_20_max value: 1.6715579437911525 - type: nauc_recall_at_20_std value: 1.4885011649849296 - type: nauc_recall_at_3_diff1 value: 16.904223069103445 - type: nauc_recall_at_3_max value: 1.2031021965601998 - type: nauc_recall_at_3_std value: -5.7358517453558395 - type: nauc_recall_at_5_diff1 value: 15.560583779980208 - type: nauc_recall_at_5_max value: 1.268944483676161 - type: nauc_recall_at_5_std value: 
-5.114882384179444 - type: ndcg_at_1 value: 14.66 - type: ndcg_at_10 value: 16.619 - type: ndcg_at_100 value: 22.467000000000002 - type: ndcg_at_1000 value: 26.745 - type: ndcg_at_20 value: 18.356 - type: ndcg_at_3 value: 13.547 - type: ndcg_at_5 value: 14.466999999999999 - type: precision_at_1 value: 14.66 - type: precision_at_10 value: 4.8149999999999995 - type: precision_at_100 value: 1.0619999999999998 - type: precision_at_1000 value: 0.182 - type: precision_at_20 value: 3.071 - type: precision_at_3 value: 9.002 - type: precision_at_5 value: 6.79 - type: recall_at_1 value: 7.478999999999999 - type: recall_at_10 value: 21.884 - type: recall_at_100 value: 45.545 - type: recall_at_1000 value: 71.887 - type: recall_at_20 value: 27.567999999999998 - type: recall_at_3 value: 12.485 - type: recall_at_5 value: 15.862000000000002 - task: type: Retrieval dataset: name: MTEB HotpotQA (default) type: mteb/hotpotqa config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: main_score value: 36.217 - type: map_at_1 value: 20.628 - type: map_at_10 value: 28.559 - type: map_at_100 value: 29.5 - type: map_at_1000 value: 29.601 - type: map_at_20 value: 29.069 - type: map_at_3 value: 26.429000000000002 - type: map_at_5 value: 27.589000000000002 - type: mrr_at_1 value: 41.2559081701553 - type: mrr_at_10 value: 48.84337052399182 - type: mrr_at_100 value: 49.523346087979284 - type: mrr_at_1000 value: 49.56958885341236 - type: mrr_at_20 value: 49.24793448550151 - type: mrr_at_3 value: 46.893990546927924 - type: mrr_at_5 value: 48.02430790006756 - type: nauc_map_at_1000_diff1 value: 47.360168970984724 - type: nauc_map_at_1000_max value: 24.614881662381816 - type: nauc_map_at_1000_std value: 7.361001821254585 - type: nauc_map_at_100_diff1 value: 47.364333667549126 - type: nauc_map_at_100_max value: 24.59919582686935 - type: nauc_map_at_100_std value: 7.30629187742088 - type: nauc_map_at_10_diff1 value: 47.72981170600924 - type: nauc_map_at_10_max value: 24.438913671717863 - type: nauc_map_at_10_std value: 6.344771843030873 - type: nauc_map_at_1_diff1 value: 60.38112885477367 - type: nauc_map_at_1_max value: 25.9097175050165 - type: nauc_map_at_1_std value: 1.6564371988429167 - type: nauc_map_at_20_diff1 value: 47.57684884180127 - type: nauc_map_at_20_max value: 24.499763513475443 - type: nauc_map_at_20_std value: 6.846169751546589 - type: nauc_map_at_3_diff1 value: 49.86374782865936 - type: nauc_map_at_3_max value: 24.885292020762233 - type: nauc_map_at_3_std value: 4.8258321037343075 - type: nauc_map_at_5_diff1 value: 48.41433187485084 - type: nauc_map_at_5_max value: 24.439622781310288 - type: nauc_map_at_5_std value: 5.664110533938225 - type: nauc_mrr_at_1000_diff1 value: 56.730426912840926 - type: nauc_mrr_at_1000_max value: 25.303184184778832 - type: nauc_mrr_at_1000_std value: 4.096788282752593 - type: nauc_mrr_at_100_diff1 value: 56.72217642846328 - type: nauc_mrr_at_100_max value: 25.302090289174313 - type: nauc_mrr_at_100_std value: 4.108586907297719 - type: nauc_mrr_at_10_diff1 value: 56.738023427066885 - type: nauc_mrr_at_10_max value: 25.271616491844455 - type: nauc_mrr_at_10_std value: 3.824908381559653 - type: nauc_mrr_at_1_diff1 value: 60.38112885477367 - type: nauc_mrr_at_1_max value: 25.9097175050165 - type: nauc_mrr_at_1_std value: 1.6564371988429167 - type: nauc_mrr_at_20_diff1 value: 56.70644340159845 - type: nauc_mrr_at_20_max value: 25.27993872890672 - type: nauc_mrr_at_20_std value: 4.0064390570846875 - type: nauc_mrr_at_3_diff1 value: 
57.245840183280194 - type: nauc_mrr_at_3_max value: 25.33525251108163 - type: nauc_mrr_at_3_std value: 2.9291934957523584 - type: nauc_mrr_at_5_diff1 value: 56.755596718387125 - type: nauc_mrr_at_5_max value: 25.22311364368114 - type: nauc_mrr_at_5_std value: 3.5613271952141865 - type: nauc_ndcg_at_1000_diff1 value: 46.553394894195456 - type: nauc_ndcg_at_1000_max value: 24.938550469205936 - type: nauc_ndcg_at_1000_std value: 11.539278224453703 - type: nauc_ndcg_at_100_diff1 value: 46.60518292153804 - type: nauc_ndcg_at_100_max value: 24.724969691359487 - type: nauc_ndcg_at_100_std value: 10.73834721703669 - type: nauc_ndcg_at_10_diff1 value: 48.12092181292035 - type: nauc_ndcg_at_10_max value: 24.2791002435645 - type: nauc_ndcg_at_10_std value: 7.153695707296072 - type: nauc_ndcg_at_1_diff1 value: 60.38112885477367 - type: nauc_ndcg_at_1_max value: 25.9097175050165 - type: nauc_ndcg_at_1_std value: 1.6564371988429167 - type: nauc_ndcg_at_20_diff1 value: 47.65117800859018 - type: nauc_ndcg_at_20_max value: 24.357451369693482 - type: nauc_ndcg_at_20_std value: 8.469581027730795 - type: nauc_ndcg_at_3_diff1 value: 51.08303103543016 - type: nauc_ndcg_at_3_max value: 24.799424583706255 - type: nauc_ndcg_at_3_std value: 4.63909501741516 - type: nauc_ndcg_at_5_diff1 value: 49.136821889915225 - type: nauc_ndcg_at_5_max value: 24.243099266851612 - type: nauc_ndcg_at_5_std value: 5.961841495442629 - type: nauc_precision_at_1000_diff1 value: 14.823992446535481 - type: nauc_precision_at_1000_max value: 17.957974549199044 - type: nauc_precision_at_1000_std value: 31.79928156519854 - type: nauc_precision_at_100_diff1 value: 23.121894912525356 - type: nauc_precision_at_100_max value: 19.166436915427486 - type: nauc_precision_at_100_std value: 23.79964191034748 - type: nauc_precision_at_10_diff1 value: 35.6440151764581 - type: nauc_precision_at_10_max value: 21.022400502868223 - type: nauc_precision_at_10_std value: 11.461152130387351 - type: nauc_precision_at_1_diff1 value: 60.38112885477367 - type: nauc_precision_at_1_max value: 25.9097175050165 - type: nauc_precision_at_1_std value: 1.6564371988429167 - type: nauc_precision_at_20_diff1 value: 31.893138428309527 - type: nauc_precision_at_20_max value: 19.961827091439737 - type: nauc_precision_at_20_std value: 15.056260461619232 - type: nauc_precision_at_3_diff1 value: 45.06971180999361 - type: nauc_precision_at_3_max value: 23.635891515921788 - type: nauc_precision_at_3_std value: 6.198234444102806 - type: nauc_precision_at_5_diff1 value: 39.43842818627394 - type: nauc_precision_at_5_max value: 21.623592109687603 - type: nauc_precision_at_5_std value: 8.718348302717638 - type: nauc_recall_at_1000_diff1 value: 14.823992446535502 - type: nauc_recall_at_1000_max value: 17.95797454919907 - type: nauc_recall_at_1000_std value: 31.799281565198577 - type: nauc_recall_at_100_diff1 value: 23.121894912525338 - type: nauc_recall_at_100_max value: 19.16643691542745 - type: nauc_recall_at_100_std value: 23.799641910347454 - type: nauc_recall_at_10_diff1 value: 35.64401517645808 - type: nauc_recall_at_10_max value: 21.022400502868223 - type: nauc_recall_at_10_std value: 11.461152130387346 - type: nauc_recall_at_1_diff1 value: 60.38112885477367 - type: nauc_recall_at_1_max value: 25.9097175050165 - type: nauc_recall_at_1_std value: 1.6564371988429167 - type: nauc_recall_at_20_diff1 value: 31.89313842830953 - type: nauc_recall_at_20_max value: 19.961827091439776 - type: nauc_recall_at_20_std value: 15.05626046161922 - type: nauc_recall_at_3_diff1 value: 
45.06971180999365 - type: nauc_recall_at_3_max value: 23.6358915159218 - type: nauc_recall_at_3_std value: 6.198234444102802 - type: nauc_recall_at_5_diff1 value: 39.43842818627392 - type: nauc_recall_at_5_max value: 21.623592109687596 - type: nauc_recall_at_5_std value: 8.71834830271761 - type: ndcg_at_1 value: 41.256 - type: ndcg_at_10 value: 36.217 - type: ndcg_at_100 value: 40.422000000000004 - type: ndcg_at_1000 value: 42.762 - type: ndcg_at_20 value: 37.801 - type: ndcg_at_3 value: 32.275999999999996 - type: ndcg_at_5 value: 34.184 - type: precision_at_1 value: 41.256 - type: precision_at_10 value: 7.838000000000001 - type: precision_at_100 value: 1.119 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_20 value: 4.429 - type: precision_at_3 value: 20.207 - type: precision_at_5 value: 13.636999999999999 - type: recall_at_1 value: 20.628 - type: recall_at_10 value: 39.190000000000005 - type: recall_at_100 value: 55.962 - type: recall_at_1000 value: 71.56700000000001 - type: recall_at_20 value: 44.288 - type: recall_at_3 value: 30.311 - type: recall_at_5 value: 34.092 - task: type: Classification dataset: name: MTEB ImdbClassification (default) type: mteb/imdb config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 70.78 - type: ap value: 65.09281598781793 - type: ap_weighted value: 65.09281598781793 - type: f1 value: 70.56498155979408 - type: f1_weighted value: 70.56498155979408 - type: main_score value: 70.78 - task: type: Retrieval dataset: name: MTEB MSMARCO (default) type: mteb/msmarco config: default split: test revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: main_score value: 34.981 - type: map_at_1 value: 0.9369999999999999 - type: map_at_10 value: 6.105 - type: map_at_100 value: 16.573 - type: map_at_1000 value: 20.952 - type: map_at_20 value: 9.495000000000001 - type: map_at_3 value: 2.429 - type: map_at_5 value: 3.7199999999999998 - type: mrr_at_1 value: 55.81395348837209 - type: mrr_at_10 value: 68.06201550387597 - type: mrr_at_100 value: 68.1915571731129 - type: mrr_at_1000 value: 68.20171255038517 - type: mrr_at_20 value: 68.06201550387597 - type: mrr_at_3 value: 65.89147286821705 - type: mrr_at_5 value: 67.05426356589147 - type: nauc_map_at_1000_diff1 value: 18.395978949265306 - type: nauc_map_at_1000_max value: 65.4845955483722 - type: nauc_map_at_1000_std value: 60.01425674651855 - type: nauc_map_at_100_diff1 value: 17.66459171040137 - type: nauc_map_at_100_max value: 56.91214775388199 - type: nauc_map_at_100_std value: 51.26999006986676 - type: nauc_map_at_10_diff1 value: 16.954292128521953 - type: nauc_map_at_10_max value: 29.470502786246144 - type: nauc_map_at_10_std value: 26.609751637393327 - type: nauc_map_at_1_diff1 value: 10.947697022780028 - type: nauc_map_at_1_max value: 11.333211449460881 - type: nauc_map_at_1_std value: 19.475048420924633 - type: nauc_map_at_20_diff1 value: 13.788525799384063 - type: nauc_map_at_20_max value: 36.86668066777578 - type: nauc_map_at_20_std value: 31.64971965701265 - type: nauc_map_at_3_diff1 value: 17.859630126844696 - type: nauc_map_at_3_max value: 21.46834280704547 - type: nauc_map_at_3_std value: 21.076387895251823 - type: nauc_map_at_5_diff1 value: 20.17441650295119 - type: nauc_map_at_5_max value: 24.878188082696866 - type: nauc_map_at_5_std value: 25.307502719861176 - type: nauc_mrr_at_1000_diff1 value: 14.192749126463891 - type: nauc_mrr_at_1000_max value: 52.54526357757101 - type: nauc_mrr_at_1000_std value: 
44.496694053499596 - type: nauc_mrr_at_100_diff1 value: 14.215939043892334 - type: nauc_mrr_at_100_max value: 52.564251294672225 - type: nauc_mrr_at_100_std value: 44.51890218594217 - type: nauc_mrr_at_10_diff1 value: 14.433120969285195 - type: nauc_mrr_at_10_max value: 52.78365722715205 - type: nauc_mrr_at_10_std value: 44.72011559301776 - type: nauc_mrr_at_1_diff1 value: 4.7355957804700415 - type: nauc_mrr_at_1_max value: 39.93352486009351 - type: nauc_mrr_at_1_std value: 39.55801119967461 - type: nauc_mrr_at_20_diff1 value: 14.433120969285195 - type: nauc_mrr_at_20_max value: 52.78365722715205 - type: nauc_mrr_at_20_std value: 44.72011559301776 - type: nauc_mrr_at_3_diff1 value: 13.11183382637074 - type: nauc_mrr_at_3_max value: 51.12370908328734 - type: nauc_mrr_at_3_std value: 40.238401804460075 - type: nauc_mrr_at_5_diff1 value: 13.179254658692855 - type: nauc_mrr_at_5_max value: 53.38265101836388 - type: nauc_mrr_at_5_std value: 44.541370972177624 - type: nauc_ndcg_at_1000_diff1 value: 21.69587945916941 - type: nauc_ndcg_at_1000_max value: 63.37066645313249 - type: nauc_ndcg_at_1000_std value: 62.97303091219909 - type: nauc_ndcg_at_100_diff1 value: 14.796314010328851 - type: nauc_ndcg_at_100_max value: 58.71101997436683 - type: nauc_ndcg_at_100_std value: 56.81420228421644 - type: nauc_ndcg_at_10_diff1 value: 3.194403093296008 - type: nauc_ndcg_at_10_max value: 48.55754387196878 - type: nauc_ndcg_at_10_std value: 47.48615570741263 - type: nauc_ndcg_at_1_diff1 value: -6.148169734658873 - type: nauc_ndcg_at_1_max value: 25.556355503841665 - type: nauc_ndcg_at_1_std value: 21.48805389151005 - type: nauc_ndcg_at_20_diff1 value: 4.461683170351035 - type: nauc_ndcg_at_20_max value: 56.88294190421313 - type: nauc_ndcg_at_20_std value: 51.93821404537562 - type: nauc_ndcg_at_3_diff1 value: -2.861880240597804 - type: nauc_ndcg_at_3_max value: 41.33450475096539 - type: nauc_ndcg_at_3_std value: 37.27470370159716 - type: nauc_ndcg_at_5_diff1 value: 0.08149020695323854 - type: nauc_ndcg_at_5_max value: 46.722954751612264 - type: nauc_ndcg_at_5_std value: 44.665247293303416 - type: nauc_precision_at_1000_diff1 value: 6.514642381748156 - type: nauc_precision_at_1000_max value: 54.61143553569596 - type: nauc_precision_at_1000_std value: 51.84636945565138 - type: nauc_precision_at_100_diff1 value: 9.181266993927007 - type: nauc_precision_at_100_max value: 63.29553111429812 - type: nauc_precision_at_100_std value: 59.013060721871035 - type: nauc_precision_at_10_diff1 value: 16.062673027273505 - type: nauc_precision_at_10_max value: 64.85826828536602 - type: nauc_precision_at_10_std value: 58.476222375984 - type: nauc_precision_at_1_diff1 value: 4.7355957804700415 - type: nauc_precision_at_1_max value: 39.93352486009351 - type: nauc_precision_at_1_std value: 39.55801119967461 - type: nauc_precision_at_20_diff1 value: 12.061096674017728 - type: nauc_precision_at_20_max value: 66.81322466200473 - type: nauc_precision_at_20_std value: 58.18606533749746 - type: nauc_precision_at_3_diff1 value: 9.10289433878097 - type: nauc_precision_at_3_max value: 61.00901833818042 - type: nauc_precision_at_3_std value: 52.94626237786338 - type: nauc_precision_at_5_diff1 value: 13.765083369324818 - type: nauc_precision_at_5_max value: 67.0735717931603 - type: nauc_precision_at_5_std value: 60.160759158192334 - type: nauc_recall_at_1000_diff1 value: 33.378885488094184 - type: nauc_recall_at_1000_max value: 58.97167459966026 - type: nauc_recall_at_1000_std value: 59.59218645358476 - type: nauc_recall_at_100_diff1 value: 
25.1307767949282 - type: nauc_recall_at_100_max value: 48.29698220976826 - type: nauc_recall_at_100_std value: 44.76527467601765 - type: nauc_recall_at_10_diff1 value: 21.012536607264714 - type: nauc_recall_at_10_max value: 21.719714919287135 - type: nauc_recall_at_10_std value: 18.503987452436643 - type: nauc_recall_at_1_diff1 value: 10.947697022780028 - type: nauc_recall_at_1_max value: 11.333211449460881 - type: nauc_recall_at_1_std value: 19.475048420924633 - type: nauc_recall_at_20_diff1 value: 14.221666924930961 - type: nauc_recall_at_20_max value: 30.83326629354958 - type: nauc_recall_at_20_std value: 25.419400751031635 - type: nauc_recall_at_3_diff1 value: 19.488515137385438 - type: nauc_recall_at_3_max value: 18.682366339227507 - type: nauc_recall_at_3_std value: 14.801487977327957 - type: nauc_recall_at_5_diff1 value: 21.493404372645262 - type: nauc_recall_at_5_max value: 22.470910257369972 - type: nauc_recall_at_5_std value: 20.91789333035049 - type: ndcg_at_1 value: 36.047000000000004 - type: ndcg_at_10 value: 34.981 - type: ndcg_at_100 value: 33.928000000000004 - type: ndcg_at_1000 value: 42.553999999999995 - type: ndcg_at_20 value: 33.768 - type: ndcg_at_3 value: 35.477 - type: ndcg_at_5 value: 35.54 - type: precision_at_1 value: 55.814 - type: precision_at_10 value: 46.744 - type: precision_at_100 value: 22.721 - type: precision_at_1000 value: 4.781 - type: precision_at_20 value: 40.465 - type: precision_at_3 value: 52.713 - type: precision_at_5 value: 51.163000000000004 - type: recall_at_1 value: 0.9369999999999999 - type: recall_at_10 value: 7.921 - type: recall_at_100 value: 28.903000000000002 - type: recall_at_1000 value: 53.691 - type: recall_at_20 value: 12.745000000000001 - type: recall_at_3 value: 2.8240000000000003 - type: recall_at_5 value: 4.476999999999999 - task: type: Classification dataset: name: MTEB MTOPDomainClassification (en) type: mteb/mtop_domain config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 88.95576835385319 - type: f1 value: 88.06364678376042 - type: f1_weighted value: 89.00721562093213 - type: main_score value: 88.95576835385319 - task: type: Classification dataset: name: MTEB MTOPIntentClassification (en) type: mteb/mtop_intent config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 56.99726402188783 - type: f1 value: 38.19916053247397 - type: f1_weighted value: 59.96788951671549 - type: main_score value: 56.99726402188783 - task: type: Classification dataset: name: MTEB MassiveIntentClassification (en) type: mteb/amazon_massive_intent config: en split: test revision: 4672e20407010da34463acc759c162ca9734bca6 metrics: - type: accuracy value: 63.79287155346336 - type: f1 value: 61.634629394462934 - type: f1_weighted value: 62.567311481126055 - type: main_score value: 63.79287155346336 - task: type: Classification dataset: name: MTEB MassiveScenarioClassification (en) type: mteb/amazon_massive_scenario config: en split: test revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 metrics: - type: accuracy value: 70.30934767989241 - type: f1 value: 68.77914761769517 - type: f1_weighted value: 70.1128179307388 - type: main_score value: 70.30934767989241 - task: type: Clustering dataset: name: MTEB MedrxivClusteringP2P (default) type: mteb/medrxiv-clustering-p2p config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: main_score value: 27.61734940907637 - type: v_measure value: 27.61734940907637 - type: 
v_measure_std value: 1.2248100208316097 - task: type: Clustering dataset: name: MTEB MedrxivClusteringS2S (default) type: mteb/medrxiv-clustering-s2s config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: main_score value: 23.802943866708308 - type: v_measure value: 23.802943866708308 - type: v_measure_std value: 1.4975518910969763 - task: type: Reranking dataset: name: MTEB MindSmallReranking (default) type: mteb/mind_small config: default split: test revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7 metrics: - type: main_score value: 29.431722284942175 - type: map value: 29.431722284942175 - type: mrr value: 30.207239990924332 - type: nAUC_map_diff1 value: 8.996546748314882 - type: nAUC_map_max value: -23.177815249478726 - type: nAUC_map_std value: -8.953694065964015 - type: nAUC_mrr_diff1 value: 9.247690774332192 - type: nAUC_mrr_max value: -17.42779158552557 - type: nAUC_mrr_std value: -5.997215692334967 - task: type: Retrieval dataset: name: MTEB NFCorpus (default) type: mteb/nfcorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: main_score value: 24.267 - type: map_at_1 value: 3.479 - type: map_at_10 value: 7.603 - type: map_at_100 value: 9.725999999999999 - type: map_at_1000 value: 10.84 - type: map_at_20 value: 8.458 - type: map_at_3 value: 5.844 - type: map_at_5 value: 6.732 - type: mrr_at_1 value: 33.746130030959755 - type: mrr_at_10 value: 43.515897587105016 - type: mrr_at_100 value: 44.1900925310943 - type: mrr_at_1000 value: 44.248355412773655 - type: mrr_at_20 value: 43.868459509915866 - type: mrr_at_3 value: 41.74406604747161 - type: mrr_at_5 value: 42.82765737874097 - type: nauc_map_at_1000_diff1 value: 34.88971488841416 - type: nauc_map_at_1000_max value: 31.233839968277195 - type: nauc_map_at_1000_std value: 17.992857492799814 - type: nauc_map_at_100_diff1 value: 36.76693324709909 - type: nauc_map_at_100_max value: 29.86086979425915 - type: nauc_map_at_100_std value: 13.839419605590217 - type: nauc_map_at_10_diff1 value: 41.84259867098214 - type: nauc_map_at_10_max value: 25.879197474145045 - type: nauc_map_at_10_std value: 5.172621372587683 - type: nauc_map_at_1_diff1 value: 59.30631217950276 - type: nauc_map_at_1_max value: 20.33548433428363 - type: nauc_map_at_1_std value: -1.8217254079917093 - type: nauc_map_at_20_diff1 value: 38.95414455683049 - type: nauc_map_at_20_max value: 26.987123257006363 - type: nauc_map_at_20_std value: 8.70109669516395 - type: nauc_map_at_3_diff1 value: 47.18504542973307 - type: nauc_map_at_3_max value: 21.706151469833202 - type: nauc_map_at_3_std value: 0.8205050181794802 - type: nauc_map_at_5_diff1 value: 45.415931092144476 - type: nauc_map_at_5_max value: 23.366427326413234 - type: nauc_map_at_5_std value: 2.036343948136038 - type: nauc_mrr_at_1000_diff1 value: 34.09352814360173 - type: nauc_mrr_at_1000_max value: 36.57744406738573 - type: nauc_mrr_at_1000_std value: 18.874642200828255 - type: nauc_mrr_at_100_diff1 value: 34.07606233752646 - type: nauc_mrr_at_100_max value: 36.570920987632604 - type: nauc_mrr_at_100_std value: 18.90704866545748 - type: nauc_mrr_at_10_diff1 value: 33.86749261732675 - type: nauc_mrr_at_10_max value: 36.53445713485045 - type: nauc_mrr_at_10_std value: 18.72635222657426 - type: nauc_mrr_at_1_diff1 value: 38.310753456104415 - type: nauc_mrr_at_1_max value: 32.080433604684444 - type: nauc_mrr_at_1_std value: 10.76705379557832 - type: nauc_mrr_at_20_diff1 value: 34.05889362360272 - type: nauc_mrr_at_20_max value: 
36.539902847898894 - type: nauc_mrr_at_20_std value: 18.829170969376136 - type: nauc_mrr_at_3_diff1 value: 34.661230693226 - type: nauc_mrr_at_3_max value: 35.27494037957078 - type: nauc_mrr_at_3_std value: 16.799715396839538 - type: nauc_mrr_at_5_diff1 value: 34.30568391918026 - type: nauc_mrr_at_5_max value: 36.31513238612551 - type: nauc_mrr_at_5_std value: 18.248879043938977 - type: nauc_ndcg_at_1000_diff1 value: 28.625594076978317 - type: nauc_ndcg_at_1000_max value: 39.10317925519372 - type: nauc_ndcg_at_1000_std value: 28.285055860454257 - type: nauc_ndcg_at_100_diff1 value: 27.620568325357986 - type: nauc_ndcg_at_100_max value: 34.32867733567831 - type: nauc_ndcg_at_100_std value: 25.103257804738867 - type: nauc_ndcg_at_10_diff1 value: 24.527566945282576 - type: nauc_ndcg_at_10_max value: 32.19051221282665 - type: nauc_ndcg_at_10_std value: 25.403501921327432 - type: nauc_ndcg_at_1_diff1 value: 38.95386802348185 - type: nauc_ndcg_at_1_max value: 30.134605059752644 - type: nauc_ndcg_at_1_std value: 11.904644683131 - type: nauc_ndcg_at_20_diff1 value: 25.422544698266798 - type: nauc_ndcg_at_20_max value: 31.85394200124836 - type: nauc_ndcg_at_20_std value: 26.925279769256523 - type: nauc_ndcg_at_3_diff1 value: 27.968874988258573 - type: nauc_ndcg_at_3_max value: 30.93696431950224 - type: nauc_ndcg_at_3_std value: 18.551823245893114 - type: nauc_ndcg_at_5_diff1 value: 25.722349682774233 - type: nauc_ndcg_at_5_max value: 32.29294830500251 - type: nauc_ndcg_at_5_std value: 21.309663190563718 - type: nauc_precision_at_1000_diff1 value: -7.466934392543785 - type: nauc_precision_at_1000_max value: 17.534662065944236 - type: nauc_precision_at_1000_std value: 43.86335465977071 - type: nauc_precision_at_100_diff1 value: -2.073530455550674 - type: nauc_precision_at_100_max value: 26.51626141328235 - type: nauc_precision_at_100_std value: 47.02741717034574 - type: nauc_precision_at_10_diff1 value: 6.717006995188633 - type: nauc_precision_at_10_max value: 32.738691529253494 - type: nauc_precision_at_10_std value: 35.80103442917034 - type: nauc_precision_at_1_diff1 value: 38.310753456104415 - type: nauc_precision_at_1_max value: 32.080433604684444 - type: nauc_precision_at_1_std value: 10.76705379557832 - type: nauc_precision_at_20_diff1 value: 2.745832502363386 - type: nauc_precision_at_20_max value: 30.954145690157688 - type: nauc_precision_at_20_std value: 41.74795596694651 - type: nauc_precision_at_3_diff1 value: 20.04271494210498 - type: nauc_precision_at_3_max value: 32.49798591360355 - type: nauc_precision_at_3_std value: 22.433174666547337 - type: nauc_precision_at_5_diff1 value: 13.559244763754297 - type: nauc_precision_at_5_max value: 34.29174467545541 - type: nauc_precision_at_5_std value: 27.67088510253159 - type: nauc_recall_at_1000_diff1 value: 14.406899781864585 - type: nauc_recall_at_1000_max value: 18.63293041982341 - type: nauc_recall_at_1000_std value: 14.873113563587054 - type: nauc_recall_at_100_diff1 value: 20.276630820341023 - type: nauc_recall_at_100_max value: 20.74130868375551 - type: nauc_recall_at_100_std value: 14.253807947296465 - type: nauc_recall_at_10_diff1 value: 32.131322772361194 - type: nauc_recall_at_10_max value: 21.834619003317645 - type: nauc_recall_at_10_std value: 5.111047982154726 - type: nauc_recall_at_1_diff1 value: 59.30631217950276 - type: nauc_recall_at_1_max value: 20.33548433428363 - type: nauc_recall_at_1_std value: -1.8217254079917093 - type: nauc_recall_at_20_diff1 value: 29.009526186873646 - type: nauc_recall_at_20_max value: 
19.222693262075214 - type: nauc_recall_at_20_std value: 8.263428180065297 - type: nauc_recall_at_3_diff1 value: 38.428506196942266 - type: nauc_recall_at_3_max value: 18.92885903756039 - type: nauc_recall_at_3_std value: 2.2767688747391106 - type: nauc_recall_at_5_diff1 value: 35.93597428489607 - type: nauc_recall_at_5_max value: 19.591607144107787 - type: nauc_recall_at_5_std value: 2.110828447844176 - type: ndcg_at_1 value: 31.424000000000003 - type: ndcg_at_10 value: 24.267 - type: ndcg_at_100 value: 22.416 - type: ndcg_at_1000 value: 31.165 - type: ndcg_at_20 value: 22.698 - type: ndcg_at_3 value: 28.349999999999998 - type: ndcg_at_5 value: 26.596999999999998 - type: precision_at_1 value: 33.745999999999995 - type: precision_at_10 value: 18.173000000000002 - type: precision_at_100 value: 6.142 - type: precision_at_1000 value: 1.856 - type: precision_at_20 value: 13.808000000000002 - type: precision_at_3 value: 27.141 - type: precision_at_5 value: 22.91 - type: recall_at_1 value: 3.479 - type: recall_at_10 value: 10.838000000000001 - type: recall_at_100 value: 23.817 - type: recall_at_1000 value: 54.910000000000004 - type: recall_at_20 value: 14.201 - type: recall_at_3 value: 7.236 - type: recall_at_5 value: 9.003 - task: type: Retrieval dataset: name: MTEB NQ (default) type: mteb/nq config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: main_score value: 19.543 - type: map_at_1 value: 8.413 - type: map_at_10 value: 15.137 - type: map_at_100 value: 16.393 - type: map_at_1000 value: 16.492 - type: map_at_20 value: 15.827 - type: map_at_3 value: 12.584999999999999 - type: map_at_5 value: 13.963000000000001 - type: mrr_at_1 value: 9.73348783314021 - type: mrr_at_10 value: 16.79895712630359 - type: mrr_at_100 value: 17.96527488497497 - type: mrr_at_1000 value: 18.049284621380956 - type: mrr_at_20 value: 17.456541969883244 - type: mrr_at_3 value: 14.2429509463113 - type: mrr_at_5 value: 15.636346079567373 - type: nauc_map_at_1000_diff1 value: 18.819971639310904 - type: nauc_map_at_1000_max value: 13.814947350680912 - type: nauc_map_at_1000_std value: 2.521914759184715 - type: nauc_map_at_100_diff1 value: 18.814255883152295 - type: nauc_map_at_100_max value: 13.784098474987728 - type: nauc_map_at_100_std value: 2.463386644603925 - type: nauc_map_at_10_diff1 value: 18.859741700546 - type: nauc_map_at_10_max value: 13.200112454161522 - type: nauc_map_at_10_std value: 1.2838729142015952 - type: nauc_map_at_1_diff1 value: 22.792911666175435 - type: nauc_map_at_1_max value: 9.420966909430586 - type: nauc_map_at_1_std value: -2.177707391834426 - type: nauc_map_at_20_diff1 value: 18.857585870077603 - type: nauc_map_at_20_max value: 13.494371000020585 - type: nauc_map_at_20_std value: 1.7987081767888724 - type: nauc_map_at_3_diff1 value: 20.3919043114244 - type: nauc_map_at_3_max value: 11.229233328712159 - type: nauc_map_at_3_std value: -0.38627708043707826 - type: nauc_map_at_5_diff1 value: 19.354241266183816 - type: nauc_map_at_5_max value: 12.050995012138287 - type: nauc_map_at_5_std value: 0.4619900683963445 - type: nauc_mrr_at_1000_diff1 value: 17.44597143162577 - type: nauc_mrr_at_1000_max value: 12.99325734801233 - type: nauc_mrr_at_1000_std value: 3.843471729334042 - type: nauc_mrr_at_100_diff1 value: 17.435646674940784 - type: nauc_mrr_at_100_max value: 12.977733602157626 - type: nauc_mrr_at_100_std value: 3.819688827654704 - type: nauc_mrr_at_10_diff1 value: 17.366258247556274 - type: nauc_mrr_at_10_max value: 12.525863095955028 - type: 
nauc_mrr_at_10_std value: 2.9586217333067033 - type: nauc_mrr_at_1_diff1 value: 21.181200992092933 - type: nauc_mrr_at_1_max value: 9.071174422547715 - type: nauc_mrr_at_1_std value: 0.37666341313223156 - type: nauc_mrr_at_20_diff1 value: 17.47842029246494 - type: nauc_mrr_at_20_max value: 12.782728137865854 - type: nauc_mrr_at_20_std value: 3.335207400639897 - type: nauc_mrr_at_3_diff1 value: 18.51145002403263 - type: nauc_mrr_at_3_max value: 10.835289485126742 - type: nauc_mrr_at_3_std value: 1.9317890085586098 - type: nauc_mrr_at_5_diff1 value: 17.85072852768249 - type: nauc_mrr_at_5_max value: 11.48513938150474 - type: nauc_mrr_at_5_std value: 2.42459300983239 - type: nauc_ndcg_at_1000_diff1 value: 16.90906471124972 - type: nauc_ndcg_at_1000_max value: 18.10309890125217 - type: nauc_ndcg_at_1000_std value: 9.531587494208333 - type: nauc_ndcg_at_100_diff1 value: 16.794610031459452 - type: nauc_ndcg_at_100_max value: 17.320423121617587 - type: nauc_ndcg_at_100_std value: 8.36089871892644 - type: nauc_ndcg_at_10_diff1 value: 16.9238328483549 - type: nauc_ndcg_at_10_max value: 15.003898384476175 - type: nauc_ndcg_at_10_std value: 3.220068514580869 - type: nauc_ndcg_at_1_diff1 value: 21.181200992092933 - type: nauc_ndcg_at_1_max value: 9.071174422547715 - type: nauc_ndcg_at_1_std value: 0.37666341313223156 - type: nauc_ndcg_at_20_diff1 value: 17.122783032672636 - type: nauc_ndcg_at_20_max value: 15.811529036192868 - type: nauc_ndcg_at_20_std value: 4.638881062044276 - type: nauc_ndcg_at_3_diff1 value: 19.397651629456085 - type: nauc_ndcg_at_3_max value: 11.519185092964664 - type: nauc_ndcg_at_3_std value: 0.5852664941054009 - type: nauc_ndcg_at_5_diff1 value: 17.836092374281833 - type: nauc_ndcg_at_5_max value: 12.692159310256345 - type: nauc_ndcg_at_5_std value: 1.7356004993081944 - type: nauc_precision_at_1000_diff1 value: 3.073453832047264 - type: nauc_precision_at_1000_max value: 23.790855697865958 - type: nauc_precision_at_1000_std value: 32.57511127212919 - type: nauc_precision_at_100_diff1 value: 9.127444700503846 - type: nauc_precision_at_100_max value: 22.71156118580008 - type: nauc_precision_at_100_std value: 24.63648530454141 - type: nauc_precision_at_10_diff1 value: 13.02401021030829 - type: nauc_precision_at_10_max value: 18.85263386483255 - type: nauc_precision_at_10_std value: 8.373513612599647 - type: nauc_precision_at_1_diff1 value: 21.181200992092933 - type: nauc_precision_at_1_max value: 9.071174422547715 - type: nauc_precision_at_1_std value: 0.37666341313223156 - type: nauc_precision_at_20_diff1 value: 12.975989332948448 - type: nauc_precision_at_20_max value: 20.296858370304385 - type: nauc_precision_at_20_std value: 12.119876359299383 - type: nauc_precision_at_3_diff1 value: 17.130641156396027 - type: nauc_precision_at_3_max value: 12.010571872098485 - type: nauc_precision_at_3_std value: 2.637465881798806 - type: nauc_precision_at_5_diff1 value: 14.960326184287629 - type: nauc_precision_at_5_max value: 14.264819044499205 - type: nauc_precision_at_5_std value: 4.5445140864787215 - type: nauc_recall_at_1000_diff1 value: 11.322486975456016 - type: nauc_recall_at_1000_max value: 42.74305283200241 - type: nauc_recall_at_1000_std value: 47.78794764298061 - type: nauc_recall_at_100_diff1 value: 12.242221079259041 - type: nauc_recall_at_100_max value: 26.918744103646013 - type: nauc_recall_at_100_std value: 24.541980019505186 - type: nauc_recall_at_10_diff1 value: 13.38045827515169 - type: nauc_recall_at_10_max value: 18.545456163809533 - type: nauc_recall_at_10_std value: 
5.734945625849404 - type: nauc_recall_at_1_diff1 value: 22.792911666175435 - type: nauc_recall_at_1_max value: 9.420966909430586 - type: nauc_recall_at_1_std value: -2.177707391834426 - type: nauc_recall_at_20_diff1 value: 14.133329746281683 - type: nauc_recall_at_20_max value: 20.394153554260118 - type: nauc_recall_at_20_std value: 9.229321407977622 - type: nauc_recall_at_3_diff1 value: 18.230047011254864 - type: nauc_recall_at_3_max value: 12.217461047044784 - type: nauc_recall_at_3_std value: 1.0395060720237228 - type: nauc_recall_at_5_diff1 value: 14.947190921163273 - type: nauc_recall_at_5_max value: 13.844816353548604 - type: nauc_recall_at_5_std value: 2.9621844586841086 - type: ndcg_at_1 value: 9.733 - type: ndcg_at_10 value: 19.543 - type: ndcg_at_100 value: 25.965 - type: ndcg_at_1000 value: 28.663 - type: ndcg_at_20 value: 21.985 - type: ndcg_at_3 value: 14.308000000000002 - type: ndcg_at_5 value: 16.771 - type: precision_at_1 value: 9.733 - type: precision_at_10 value: 3.7249999999999996 - type: precision_at_100 value: 0.739 - type: precision_at_1000 value: 0.1 - type: precision_at_20 value: 2.4330000000000003 - type: precision_at_3 value: 6.856 - type: precision_at_5 value: 5.475 - type: recall_at_1 value: 8.413 - type: recall_at_10 value: 31.668000000000003 - type: recall_at_100 value: 61.551 - type: recall_at_1000 value: 82.228 - type: recall_at_20 value: 40.888999999999996 - type: recall_at_3 value: 17.669 - type: recall_at_5 value: 23.488999999999997 - task: type: Retrieval dataset: name: MTEB QuoraRetrieval (default) type: mteb/quora config: default split: test revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 metrics: - type: main_score value: 80.598 - type: map_at_1 value: 63.532 - type: map_at_10 value: 76.07300000000001 - type: map_at_100 value: 76.863 - type: map_at_1000 value: 76.896 - type: map_at_20 value: 76.575 - type: map_at_3 value: 73.075 - type: map_at_5 value: 74.888 - type: mrr_at_1 value: 73.11 - type: mrr_at_10 value: 80.13760714285678 - type: mrr_at_100 value: 80.40676931635143 - type: mrr_at_1000 value: 80.413857041773 - type: mrr_at_20 value: 80.33569450368124 - type: mrr_at_3 value: 78.73166666666627 - type: mrr_at_5 value: 79.60316666666607 - type: nauc_map_at_1000_diff1 value: 71.76748518946404 - type: nauc_map_at_1000_max value: 37.52091562623074 - type: nauc_map_at_1000_std value: -19.886772833711106 - type: nauc_map_at_100_diff1 value: 71.77392469494623 - type: nauc_map_at_100_max value: 37.51305402355471 - type: nauc_map_at_100_std value: -19.90950133564633 - type: nauc_map_at_10_diff1 value: 71.78435718469383 - type: nauc_map_at_10_max value: 37.12859151143304 - type: nauc_map_at_10_std value: -20.6727975668906 - type: nauc_map_at_1_diff1 value: 74.16329762399023 - type: nauc_map_at_1_max value: 30.710315707498864 - type: nauc_map_at_1_std value: -19.3193474040897 - type: nauc_map_at_20_diff1 value: 71.8048608565351 - type: nauc_map_at_20_max value: 37.437936254957336 - type: nauc_map_at_20_std value: -20.256332267213164 - type: nauc_map_at_3_diff1 value: 72.15934361454754 - type: nauc_map_at_3_max value: 35.34630080626579 - type: nauc_map_at_3_std value: -22.03571060362441 - type: nauc_map_at_5_diff1 value: 71.83699898564598 - type: nauc_map_at_5_max value: 36.479498983192975 - type: nauc_map_at_5_std value: -21.231304270451062 - type: nauc_mrr_at_1000_diff1 value: 72.88897169606878 - type: nauc_mrr_at_1000_max value: 40.200221349285634 - type: nauc_mrr_at_1000_std value: -17.633375591506123 - type: nauc_mrr_at_100_diff1 value: 
72.88918562563104 - type: nauc_mrr_at_100_max value: 40.20508375617468 - type: nauc_mrr_at_100_std value: -17.62754237516005 - type: nauc_mrr_at_10_diff1 value: 72.78722143722388 - type: nauc_mrr_at_10_max value: 40.26493516347653 - type: nauc_mrr_at_10_std value: -17.591516046092213 - type: nauc_mrr_at_1_diff1 value: 74.20323111992924 - type: nauc_mrr_at_1_max value: 39.1888925247388 - type: nauc_mrr_at_1_std value: -17.041083591080856 - type: nauc_mrr_at_20_diff1 value: 72.87614719969847 - type: nauc_mrr_at_20_max value: 40.25187245577547 - type: nauc_mrr_at_20_std value: -17.623643078270213 - type: nauc_mrr_at_3_diff1 value: 72.70424133205663 - type: nauc_mrr_at_3_max value: 40.015103745774944 - type: nauc_mrr_at_3_std value: -18.296912082298693 - type: nauc_mrr_at_5_diff1 value: 72.6695462203408 - type: nauc_mrr_at_5_max value: 40.166677547198724 - type: nauc_mrr_at_5_std value: -17.836669429879553 - type: nauc_ndcg_at_1000_diff1 value: 71.7014600627096 - type: nauc_ndcg_at_1000_max value: 39.17528447849729 - type: nauc_ndcg_at_1000_std value: -18.169144412803025 - type: nauc_ndcg_at_100_diff1 value: 71.72812292491562 - type: nauc_ndcg_at_100_max value: 39.178065817466866 - type: nauc_ndcg_at_100_std value: -17.98857148420824 - type: nauc_ndcg_at_10_diff1 value: 71.22490342106018 - type: nauc_ndcg_at_10_max value: 38.58976910658222 - type: nauc_ndcg_at_10_std value: -19.3807889122846 - type: nauc_ndcg_at_1_diff1 value: 74.20323111992924 - type: nauc_ndcg_at_1_max value: 39.18366557965937 - type: nauc_ndcg_at_1_std value: -16.979563433712343 - type: nauc_ndcg_at_20_diff1 value: 71.59416957115776 - type: nauc_ndcg_at_20_max value: 39.11048553178983 - type: nauc_ndcg_at_20_std value: -18.913452979338476 - type: nauc_ndcg_at_3_diff1 value: 71.15596154191027 - type: nauc_ndcg_at_3_max value: 37.36564154714553 - type: nauc_ndcg_at_3_std value: -20.721815190390565 - type: nauc_ndcg_at_5_diff1 value: 71.0047395584928 - type: nauc_ndcg_at_5_max value: 37.95479899642812 - type: nauc_ndcg_at_5_std value: -20.008045920279887 - type: nauc_precision_at_1000_diff1 value: -36.79287717727177 - type: nauc_precision_at_1000_max value: -4.853042765778535 - type: nauc_precision_at_1000_std value: 21.89700327903914 - type: nauc_precision_at_100_diff1 value: -33.803566917391024 - type: nauc_precision_at_100_max value: -2.343501157957199 - type: nauc_precision_at_100_std value: 21.03134251148425 - type: nauc_precision_at_10_diff1 value: -19.647078935128047 - type: nauc_precision_at_10_max value: 7.646163968592671 - type: nauc_precision_at_10_std value: 11.425640109742039 - type: nauc_precision_at_1_diff1 value: 74.20323111992924 - type: nauc_precision_at_1_max value: 39.18366557965937 - type: nauc_precision_at_1_std value: -16.979563433712343 - type: nauc_precision_at_20_diff1 value: -26.95360783576433 - type: nauc_precision_at_20_max value: 3.534889652498316 - type: nauc_precision_at_20_std value: 16.011941126119197 - type: nauc_precision_at_3_diff1 value: 7.80806721613657 - type: nauc_precision_at_3_max value: 18.93471456458755 - type: nauc_precision_at_3_std value: -2.3471793824170493 - type: nauc_precision_at_5_diff1 value: -7.187077136844068 - type: nauc_precision_at_5_max value: 13.710196203710806 - type: nauc_precision_at_5_std value: 5.029517000064198 - type: nauc_recall_at_1000_diff1 value: 55.29138658386572 - type: nauc_recall_at_1000_max value: 57.58368141138265 - type: nauc_recall_at_1000_std value: 33.353499745829765 - type: nauc_recall_at_100_diff1 value: 65.98407378542676 - type: 
nauc_recall_at_100_max value: 43.3437006049648 - type: nauc_recall_at_100_std value: 3.7556643837275345 - type: nauc_recall_at_10_diff1 value: 64.73552843826317 - type: nauc_recall_at_10_max value: 37.93061567923699 - type: nauc_recall_at_10_std value: -19.1098323242707 - type: nauc_recall_at_1_diff1 value: 74.16329762399023 - type: nauc_recall_at_1_max value: 30.710315707498864 - type: nauc_recall_at_1_std value: -19.3193474040897 - type: nauc_recall_at_20_diff1 value: 64.4507396763554 - type: nauc_recall_at_20_max value: 40.62914458603293 - type: nauc_recall_at_20_std value: -15.040711675139082 - type: nauc_recall_at_3_diff1 value: 67.8143518137102 - type: nauc_recall_at_3_max value: 33.649275891159945 - type: nauc_recall_at_3_std value: -24.400275123272163 - type: nauc_recall_at_5_diff1 value: 65.9405683463817 - type: nauc_recall_at_5_max value: 35.64051201738537 - type: nauc_recall_at_5_std value: -22.06335424061329 - type: ndcg_at_1 value: 73.11 - type: ndcg_at_10 value: 80.598 - type: ndcg_at_100 value: 82.75200000000001 - type: ndcg_at_1000 value: 83.145 - type: ndcg_at_20 value: 81.71300000000001 - type: ndcg_at_3 value: 77.025 - type: ndcg_at_5 value: 78.85 - type: precision_at_1 value: 73.11 - type: precision_at_10 value: 12.206999999999999 - type: precision_at_100 value: 1.459 - type: precision_at_1000 value: 0.155 - type: precision_at_20 value: 6.579 - type: precision_at_3 value: 33.36 - type: precision_at_5 value: 22.09 - type: recall_at_1 value: 63.532 - type: recall_at_10 value: 89.32600000000001 - type: recall_at_100 value: 97.35000000000001 - type: recall_at_1000 value: 99.613 - type: recall_at_20 value: 93.151 - type: recall_at_3 value: 79.074 - type: recall_at_5 value: 84.143 - task: type: Clustering dataset: name: MTEB RedditClustering (default) type: mteb/reddit-clustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: main_score value: 39.5465127563479 - type: v_measure value: 39.5465127563479 - type: v_measure_std value: 5.038703300031419 - task: type: Clustering dataset: name: MTEB RedditClusteringP2P (default) type: mteb/reddit-clustering-p2p config: default split: test revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 metrics: - type: main_score value: 47.07911795189491 - type: v_measure value: 47.07911795189491 - type: v_measure_std value: 11.546436135362846 - task: type: Retrieval dataset: name: MTEB SCIDOCS (default) type: mteb/scidocs config: default split: test revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 metrics: - type: main_score value: 12.386999999999999 - type: map_at_1 value: 3.053 - type: map_at_10 value: 6.912999999999999 - type: map_at_100 value: 8.261000000000001 - type: map_at_1000 value: 8.530999999999999 - type: map_at_20 value: 7.566000000000001 - type: map_at_3 value: 5.094 - type: map_at_5 value: 5.997 - type: mrr_at_1 value: 15.0 - type: mrr_at_10 value: 22.795357142857135 - type: mrr_at_100 value: 24.007787966055577 - type: mrr_at_1000 value: 24.09964360060081 - type: mrr_at_20 value: 23.466190383404 - type: mrr_at_3 value: 20.100000000000012 - type: mrr_at_5 value: 21.685000000000006 - type: nauc_map_at_1000_diff1 value: 11.73412101608325 - type: nauc_map_at_1000_max value: 14.330449150895694 - type: nauc_map_at_1000_std value: 15.742095990011743 - type: nauc_map_at_100_diff1 value: 11.777038848684697 - type: nauc_map_at_100_max value: 14.104140826193404 - type: nauc_map_at_100_std value: 15.155771699462264 - type: nauc_map_at_10_diff1 value: 12.374060330916672 - type: 
nauc_map_at_10_max value: 11.856630361520313 - type: nauc_map_at_10_std value: 11.753665232073269 - type: nauc_map_at_1_diff1 value: 16.986085327339335 - type: nauc_map_at_1_max value: 12.246255844992572 - type: nauc_map_at_1_std value: 7.863450169503143 - type: nauc_map_at_20_diff1 value: 11.634858111388464 - type: nauc_map_at_20_max value: 13.108008262696513 - type: nauc_map_at_20_std value: 13.423455469499999 - type: nauc_map_at_3_diff1 value: 14.889445454705324 - type: nauc_map_at_3_max value: 11.572110481390013 - type: nauc_map_at_3_std value: 8.556136010622351 - type: nauc_map_at_5_diff1 value: 12.907309838627985 - type: nauc_map_at_5_max value: 11.000220583694968 - type: nauc_map_at_5_std value: 10.111376166991917 - type: nauc_mrr_at_1000_diff1 value: 14.963874100415397 - type: nauc_mrr_at_1000_max value: 13.495160823256164 - type: nauc_mrr_at_1000_std value: 11.28815345444998 - type: nauc_mrr_at_100_diff1 value: 14.97621893176082 - type: nauc_mrr_at_100_max value: 13.464936280105155 - type: nauc_mrr_at_100_std value: 11.305521958378108 - type: nauc_mrr_at_10_diff1 value: 14.956869421525884 - type: nauc_mrr_at_10_max value: 13.425685629657924 - type: nauc_mrr_at_10_std value: 10.767260180262618 - type: nauc_mrr_at_1_diff1 value: 16.83378691664147 - type: nauc_mrr_at_1_max value: 12.112287067835906 - type: nauc_mrr_at_1_std value: 8.418304606390475 - type: nauc_mrr_at_20_diff1 value: 14.917032940839656 - type: nauc_mrr_at_20_max value: 13.41755983642966 - type: nauc_mrr_at_20_std value: 11.11458079038555 - type: nauc_mrr_at_3_diff1 value: 15.214496970273089 - type: nauc_mrr_at_3_max value: 12.165871395179483 - type: nauc_mrr_at_3_std value: 9.980162064503286 - type: nauc_mrr_at_5_diff1 value: 14.835204244776087 - type: nauc_mrr_at_5_max value: 12.524956858818742 - type: nauc_mrr_at_5_std value: 10.099655249800849 - type: nauc_ndcg_at_1000_diff1 value: 10.764737128236437 - type: nauc_ndcg_at_1000_max value: 18.3469700109834 - type: nauc_ndcg_at_1000_std value: 23.22837765426608 - type: nauc_ndcg_at_100_diff1 value: 11.606245579895573 - type: nauc_ndcg_at_100_max value: 17.167157579603412 - type: nauc_ndcg_at_100_std value: 20.347909657378473 - type: nauc_ndcg_at_10_diff1 value: 12.394040285590439 - type: nauc_ndcg_at_10_max value: 13.388439287974505 - type: nauc_ndcg_at_10_std value: 13.188024533529397 - type: nauc_ndcg_at_1_diff1 value: 16.83378691664147 - type: nauc_ndcg_at_1_max value: 12.112287067835906 - type: nauc_ndcg_at_1_std value: 8.418304606390475 - type: nauc_ndcg_at_20_diff1 value: 11.212784095325706 - type: nauc_ndcg_at_20_max value: 15.185332617097233 - type: nauc_ndcg_at_20_std value: 16.087050160363443 - type: nauc_ndcg_at_3_diff1 value: 14.708471591387005 - type: nauc_ndcg_at_3_max value: 11.70756510699363 - type: nauc_ndcg_at_3_std value: 9.658612404132116 - type: nauc_ndcg_at_5_diff1 value: 13.123868466784149 - type: nauc_ndcg_at_5_max value: 11.60382600862464 - type: nauc_ndcg_at_5_std value: 10.625775061954277 - type: nauc_precision_at_1000_diff1 value: 3.608251418490512 - type: nauc_precision_at_1000_max value: 20.501537930519582 - type: nauc_precision_at_1000_std value: 34.4770607840569 - type: nauc_precision_at_100_diff1 value: 7.864853652134883 - type: nauc_precision_at_100_max value: 19.894334894038547 - type: nauc_precision_at_100_std value: 28.711783183330663 - type: nauc_precision_at_10_diff1 value: 9.605214553552692 - type: nauc_precision_at_10_max value: 14.347596155123817 - type: nauc_precision_at_10_std value: 16.242794843380032 - type: 
nauc_precision_at_1_diff1 value: 16.83378691664147 - type: nauc_precision_at_1_max value: 12.112287067835906 - type: nauc_precision_at_1_std value: 8.418304606390475 - type: nauc_precision_at_20_diff1 value: 6.9964985542924545 - type: nauc_precision_at_20_max value: 17.275243538199216 - type: nauc_precision_at_20_std value: 20.986245055691036 - type: nauc_precision_at_3_diff1 value: 13.995705983866177 - type: nauc_precision_at_3_max value: 11.391320470301181 - type: nauc_precision_at_3_std value: 10.151716783634907 - type: nauc_precision_at_5_diff1 value: 11.064867165700008 - type: nauc_precision_at_5_max value: 10.965289810519257 - type: nauc_precision_at_5_std value: 11.837752544253021 - type: nauc_recall_at_1000_diff1 value: 3.4118402840027118 - type: nauc_recall_at_1000_max value: 21.505334337938027 - type: nauc_recall_at_1000_std value: 34.87205826061254 - type: nauc_recall_at_100_diff1 value: 7.793188645900735 - type: nauc_recall_at_100_max value: 20.09269964020807 - type: nauc_recall_at_100_std value: 28.838050639358375 - type: nauc_recall_at_10_diff1 value: 10.010288074812564 - type: nauc_recall_at_10_max value: 14.470333599080465 - type: nauc_recall_at_10_std value: 16.106977670704044 - type: nauc_recall_at_1_diff1 value: 16.986085327339335 - type: nauc_recall_at_1_max value: 12.246255844992572 - type: nauc_recall_at_1_std value: 7.863450169503143 - type: nauc_recall_at_20_diff1 value: 7.248991485381231 - type: nauc_recall_at_20_max value: 17.357162157871585 - type: nauc_recall_at_20_std value: 20.916649810908385 - type: nauc_recall_at_3_diff1 value: 14.190312777099356 - type: nauc_recall_at_3_max value: 11.494013846579504 - type: nauc_recall_at_3_std value: 9.871734511413411 - type: nauc_recall_at_5_diff1 value: 11.369318015463497 - type: nauc_recall_at_5_max value: 11.0867249382338 - type: nauc_recall_at_5_std value: 11.565786080587733 - type: ndcg_at_1 value: 15.0 - type: ndcg_at_10 value: 12.386999999999999 - type: ndcg_at_100 value: 18.533 - type: ndcg_at_1000 value: 23.955000000000002 - type: ndcg_at_20 value: 14.459 - type: ndcg_at_3 value: 11.75 - type: ndcg_at_5 value: 10.285 - type: precision_at_1 value: 15.0 - type: precision_at_10 value: 6.36 - type: precision_at_100 value: 1.528 - type: precision_at_1000 value: 0.28300000000000003 - type: precision_at_20 value: 4.375 - type: precision_at_3 value: 10.767 - type: precision_at_5 value: 8.9 - type: recall_at_1 value: 3.053 - type: recall_at_10 value: 12.873000000000001 - type: recall_at_100 value: 30.982 - type: recall_at_1000 value: 57.489999999999995 - type: recall_at_20 value: 17.718 - type: recall_at_3 value: 6.553000000000001 - type: recall_at_5 value: 9.013 - task: type: STS dataset: name: MTEB SICK-R (default) type: mteb/sickr-sts config: default split: test revision: 20a6d6f312dd54037fe07a32d58e5e168867909d metrics: - type: cosine_pearson value: 75.67336823619708 - type: cosine_spearman value: 64.6753400763881 - type: euclidean_pearson value: 69.13481550039579 - type: euclidean_spearman value: 64.6752133161514 - type: main_score value: 64.6753400763881 - type: manhattan_pearson value: 69.01619023671678 - type: manhattan_spearman value: 64.8728231074179 - type: pearson value: 75.67336823619708 - type: spearman value: 64.6753400763881 - task: type: STS dataset: name: MTEB STS12 (default) type: mteb/sts12-sts config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cosine_pearson value: 72.06681927996405 - type: cosine_spearman value: 62.248985055530525 - type: euclidean_pearson 
value: 68.05815981894538 - type: euclidean_spearman value: 62.248985055530525 - type: main_score value: 62.248985055530525 - type: manhattan_pearson value: 66.68543185400786 - type: manhattan_spearman value: 61.43850654925033 - type: pearson value: 72.06681927996405 - type: spearman value: 62.248985055530525 - task: type: STS dataset: name: MTEB STS13 (default) type: mteb/sts13-sts config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cosine_pearson value: 76.53983680018591 - type: cosine_spearman value: 77.27600787572996 - type: euclidean_pearson value: 76.77960647262235 - type: euclidean_spearman value: 77.27600787572996 - type: main_score value: 77.27600787572996 - type: manhattan_pearson value: 76.37651436440808 - type: manhattan_spearman value: 76.85568457177312 - type: pearson value: 76.53983680018591 - type: spearman value: 77.27600787572996 - task: type: STS dataset: name: MTEB STS14 (default) type: mteb/sts14-sts config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cosine_pearson value: 76.20854411766629 - type: cosine_spearman value: 71.914099628002 - type: euclidean_pearson value: 74.5273047891339 - type: euclidean_spearman value: 71.914099628002 - type: main_score value: 71.914099628002 - type: manhattan_pearson value: 74.53275458017302 - type: manhattan_spearman value: 71.9720930787841 - type: pearson value: 76.20854411766629 - type: spearman value: 71.914099628002 - task: type: STS dataset: name: MTEB STS15 (default) type: mteb/sts15-sts config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cosine_pearson value: 79.24273419832653 - type: cosine_spearman value: 79.75345871163103 - type: euclidean_pearson value: 79.31395801169265 - type: euclidean_spearman value: 79.75345871163103 - type: main_score value: 79.75345871163103 - type: manhattan_pearson value: 79.24199238927697 - type: manhattan_spearman value: 79.64058599210834 - type: pearson value: 79.24273419832653 - type: spearman value: 79.75345871163103 - task: type: STS dataset: name: MTEB STS16 (default) type: mteb/sts16-sts config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cosine_pearson value: 75.64452330127995 - type: cosine_spearman value: 76.26343823222666 - type: euclidean_pearson value: 75.64112047932008 - type: euclidean_spearman value: 76.26343823222666 - type: main_score value: 76.26343823222666 - type: manhattan_pearson value: 75.32718809126764 - type: manhattan_spearman value: 75.9420892784719 - type: pearson value: 75.64452330127995 - type: spearman value: 76.26343823222666 - task: type: STS dataset: name: MTEB STS17 (es-en) type: mteb/sts17-crosslingual-sts config: es-en split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson value: 17.52217310066287 - type: cosine_spearman value: 14.729958484232528 - type: euclidean_pearson value: 17.507234354096582 - type: euclidean_spearman value: 14.729958484232528 - type: main_score value: 14.729958484232528 - type: manhattan_pearson value: 15.286020788097272 - type: manhattan_spearman value: 11.320242312589713 - type: pearson value: 17.52217310066287 - type: spearman value: 14.729958484232528 - task: type: STS dataset: name: MTEB STS17 (en-en) type: mteb/sts17-crosslingual-sts config: en-en split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson value: 84.67406984717113 - type: cosine_spearman value: 85.96709815630739 - 
type: euclidean_pearson value: 84.7186375682207 - type: euclidean_spearman value: 85.96709815630739 - type: main_score value: 85.96709815630739 - type: manhattan_pearson value: 85.07894758059129 - type: manhattan_spearman value: 86.57110045700985 - type: pearson value: 84.67406984717113 - type: spearman value: 85.96709815630739 - task: type: STS dataset: name: MTEB STS17 (fr-en) type: mteb/sts17-crosslingual-sts config: fr-en split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson value: 36.02331692863771 - type: cosine_spearman value: 34.28540470062557 - type: euclidean_pearson value: 35.996881386631514 - type: euclidean_spearman value: 34.28540470062557 - type: main_score value: 34.28540470062557 - type: manhattan_pearson value: 35.47246063445784 - type: manhattan_spearman value: 34.83247787211397 - type: pearson value: 36.02331692863771 - type: spearman value: 34.28540470062557 - task: type: STS dataset: name: MTEB STS17 (en-tr) type: mteb/sts17-crosslingual-sts config: en-tr split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson value: 13.925983981770388 - type: cosine_spearman value: 11.193291331109325 - type: euclidean_pearson value: 13.9151651239108 - type: euclidean_spearman value: 11.193291331109325 - type: main_score value: 11.193291331109325 - type: manhattan_pearson value: 12.652407957594654 - type: manhattan_spearman value: 9.888358907769014 - type: pearson value: 13.925983981770388 - type: spearman value: 11.193291331109325 - task: type: STS dataset: name: MTEB STS17 (en-de) type: mteb/sts17-crosslingual-sts config: en-de split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson value: 26.77839285232968 - type: cosine_spearman value: 23.010015986939717 - type: euclidean_pearson value: 27.13668235790385 - type: euclidean_spearman value: 23.010015986939717 - type: main_score value: 23.010015986939717 - type: manhattan_pearson value: 27.02698710744775 - type: manhattan_spearman value: 23.038730409304936 - type: pearson value: 26.77839285232968 - type: spearman value: 23.010015986939717 - task: type: STS dataset: name: MTEB STS17 (it-en) type: mteb/sts17-crosslingual-sts config: it-en split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson value: 25.330935194314364 - type: cosine_spearman value: 23.143555348782797 - type: euclidean_pearson value: 24.670147594978143 - type: euclidean_spearman value: 23.143555348782797 - type: main_score value: 23.143555348782797 - type: manhattan_pearson value: 24.879695698914418 - type: manhattan_spearman value: 25.916630507885134 - type: pearson value: 25.330935194314364 - type: spearman value: 23.143555348782797 - task: type: STS dataset: name: MTEB STS17 (en-ar) type: mteb/sts17-crosslingual-sts config: en-ar split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson value: 6.61651078645899 - type: cosine_spearman value: 5.415104433010482 - type: euclidean_pearson value: 6.791575957480809 - type: euclidean_spearman value: 5.415104433010482 - type: main_score value: 5.415104433010482 - type: manhattan_pearson value: 3.6585407382250987 - type: manhattan_spearman value: 4.566044103659472 - type: pearson value: 6.61651078645899 - type: spearman value: 5.415104433010482 - task: type: STS dataset: name: MTEB STS17 (nl-en) type: mteb/sts17-crosslingual-sts config: nl-en split: test revision: faeb762787bd10488a50c8b5be4a3b82e411949c metrics: - type: cosine_pearson 
value: 32.718045784523184 - type: cosine_spearman value: 27.52844368619317 - type: euclidean_pearson value: 32.98978359596458 - type: euclidean_spearman value: 27.52844368619317 - type: main_score value: 27.52844368619317 - type: manhattan_pearson value: 35.57923949366344 - type: manhattan_spearman value: 34.27137422651138 - type: pearson value: 32.718045784523184 - type: spearman value: 27.52844368619317 - task: type: STS dataset: name: MTEB STS22 (es-en) type: mteb/sts22-crosslingual-sts config: es-en split: test revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 metrics: - type: cosine_pearson value: 9.98410299881163 - type: cosine_spearman value: 10.98684405086525 - type: euclidean_pearson value: 9.461680781495218 - type: euclidean_spearman value: 10.9925413190658 - type: main_score value: 10.98684405086525 - type: manhattan_pearson value: 9.442055271895944 - type: manhattan_spearman value: 11.226101908391069 - type: pearson value: 9.98410299881163 - type: spearman value: 10.98684405086525 - task: type: STS dataset: name: MTEB STS22 (en) type: mteb/sts22-crosslingual-sts config: en split: test revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 metrics: - type: cosine_pearson value: 59.3180680265132 - type: cosine_spearman value: 63.07956002739231 - type: euclidean_pearson value: 62.46424835000928 - type: euclidean_spearman value: 63.07956002739231 - type: main_score value: 63.07956002739231 - type: manhattan_pearson value: 62.048137683643766 - type: manhattan_spearman value: 61.83898606879604 - type: pearson value: 59.3180680265132 - type: spearman value: 63.07956002739231 - task: type: STS dataset: name: MTEB STS22 (de-en) type: mteb/sts22-crosslingual-sts config: de-en split: test revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 metrics: - type: cosine_pearson value: 29.061215770374826 - type: cosine_spearman value: 36.21441725938738 - type: euclidean_pearson value: 28.44045530150387 - type: euclidean_spearman value: 36.21441725938738 - type: main_score value: 36.21441725938738 - type: manhattan_pearson value: 29.32403221599612 - type: manhattan_spearman value: 38.914481153396494 - type: pearson value: 29.061215770374826 - type: spearman value: 36.21441725938738 - task: type: STS dataset: name: MTEB STS22 (zh-en) type: mteb/sts22-crosslingual-sts config: zh-en split: test revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 metrics: - type: cosine_pearson value: 11.266385865086239 - type: cosine_spearman value: 17.291293843893733 - type: euclidean_pearson value: 10.045897285683115 - type: euclidean_spearman value: 17.321323804048646 - type: main_score value: 17.291293843893733 - type: manhattan_pearson value: 15.333482209624194 - type: manhattan_spearman value: 20.399166731513915 - type: pearson value: 11.266385865086239 - type: spearman value: 17.291293843893733 - task: type: STS dataset: name: MTEB STS22 (pl-en) type: mteb/sts22-crosslingual-sts config: pl-en split: test revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 metrics: - type: cosine_pearson value: 9.647587208410648 - type: cosine_spearman value: 21.33739699413266 - type: euclidean_pearson value: 7.451981822243237 - type: euclidean_spearman value: 21.33739699413266 - type: main_score value: 21.33739699413266 - type: manhattan_pearson value: 10.05280275870948 - type: manhattan_spearman value: 22.233400969472218 - type: pearson value: 9.647587208410648 - type: spearman value: 21.33739699413266 - task: type: STS dataset: name: MTEB STSBenchmark (default) type: mteb/stsbenchmark-sts config: default split: test revision: 
b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cosine_pearson value: 77.2598255013409 - type: cosine_spearman value: 75.40519061413276 - type: euclidean_pearson value: 77.19878276657876 - type: euclidean_spearman value: 75.40519061413276 - type: main_score value: 75.40519061413276 - type: manhattan_pearson value: 77.04099640594512 - type: manhattan_spearman value: 75.32219501493076 - type: pearson value: 77.2598255013409 - type: spearman value: 75.40519061413276 - task: type: Reranking dataset: name: MTEB SciDocsRR (default) type: mteb/scidocs-reranking config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: main_score value: 72.10127087089839 - type: map value: 72.10127087089839 - type: mrr value: 90.62288020621355 - type: nAUC_map_diff1 value: 8.726677558277695 - type: nAUC_map_max value: 54.59636736704295 - type: nAUC_map_std value: 67.36367052533402 - type: nAUC_mrr_diff1 value: 47.77588337162405 - type: nAUC_mrr_max value: 74.90946175462605 - type: nAUC_mrr_std value: 71.81332269641806 - task: type: Retrieval dataset: name: MTEB SciFact (default) type: mteb/scifact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: main_score value: 50.63999999999999 - type: map_at_1 value: 35.5 - type: map_at_10 value: 45.238 - type: map_at_100 value: 46.135999999999996 - type: map_at_1000 value: 46.181 - type: map_at_20 value: 45.767 - type: map_at_3 value: 42.329 - type: map_at_5 value: 44.054 - type: mrr_at_1 value: 37.666666666666664 - type: mrr_at_10 value: 46.6611111111111 - type: mrr_at_100 value: 47.37819687814183 - type: mrr_at_1000 value: 47.417644921595766 - type: mrr_at_20 value: 47.06856780130773 - type: mrr_at_3 value: 43.94444444444443 - type: mrr_at_5 value: 45.52777777777777 - type: nauc_map_at_1000_diff1 value: 52.83081390161976 - type: nauc_map_at_1000_max value: 37.21621852995913 - type: nauc_map_at_1000_std value: -3.416369626271914 - type: nauc_map_at_100_diff1 value: 52.823502489139884 - type: nauc_map_at_100_max value: 37.2435733087758 - type: nauc_map_at_100_std value: -3.376708460074628 - type: nauc_map_at_10_diff1 value: 52.495695868970785 - type: nauc_map_at_10_max value: 36.79244353087952 - type: nauc_map_at_10_std value: -3.998841918813238 - type: nauc_map_at_1_diff1 value: 55.20714819661926 - type: nauc_map_at_1_max value: 33.68583272500883 - type: nauc_map_at_1_std value: -7.806502386166579 - type: nauc_map_at_20_diff1 value: 52.82557233788675 - type: nauc_map_at_20_max value: 37.02532534485883 - type: nauc_map_at_20_std value: -3.6962702134516126 - type: nauc_map_at_3_diff1 value: 53.005833884053054 - type: nauc_map_at_3_max value: 35.102473883265056 - type: nauc_map_at_3_std value: -6.237364868462919 - type: nauc_map_at_5_diff1 value: 52.67151253564545 - type: nauc_map_at_5_max value: 36.083416260083574 - type: nauc_map_at_5_std value: -4.7023113318143785 - type: nauc_mrr_at_1000_diff1 value: 52.938698102997094 - type: nauc_mrr_at_1000_max value: 39.46705187537523 - type: nauc_mrr_at_1000_std value: 0.6163818152860598 - type: nauc_mrr_at_100_diff1 value: 52.93491193041612 - type: nauc_mrr_at_100_max value: 39.490426719059165 - type: nauc_mrr_at_100_std value: 0.6662007971949842 - type: nauc_mrr_at_10_diff1 value: 52.70216069864656 - type: nauc_mrr_at_10_max value: 39.52193808791504 - type: nauc_mrr_at_10_std value: 0.536595037291294 - type: nauc_mrr_at_1_diff1 value: 55.77100806609076 - type: nauc_mrr_at_1_max value: 37.966164940491446 - type: nauc_mrr_at_1_std value: 
-2.1074234936282537 - type: nauc_mrr_at_20_diff1 value: 52.942136130524986 - type: nauc_mrr_at_20_max value: 39.42716448302782 - type: nauc_mrr_at_20_std value: 0.5472281187662744 - type: nauc_mrr_at_3_diff1 value: 53.144295072591206 - type: nauc_mrr_at_3_max value: 38.05294316134295 - type: nauc_mrr_at_3_std value: -1.2360608664776096 - type: nauc_mrr_at_5_diff1 value: 52.789220500594205 - type: nauc_mrr_at_5_max value: 38.83395427252616 - type: nauc_mrr_at_5_std value: -0.09099470685601964 - type: nauc_ndcg_at_1000_diff1 value: 52.16867590195915 - type: nauc_ndcg_at_1000_max value: 39.70115643730131 - type: nauc_ndcg_at_1000_std value: 0.904258507053096 - type: nauc_ndcg_at_100_diff1 value: 51.87328245345757 - type: nauc_ndcg_at_100_max value: 40.59055338026654 - type: nauc_ndcg_at_100_std value: 2.554356951645788 - type: nauc_ndcg_at_10_diff1 value: 50.809281234563805 - type: nauc_ndcg_at_10_max value: 39.085094925973245 - type: nauc_ndcg_at_10_std value: -0.23387754671232033 - type: nauc_ndcg_at_1_diff1 value: 55.77100806609076 - type: nauc_ndcg_at_1_max value: 37.966164940491446 - type: nauc_ndcg_at_1_std value: -2.1074234936282537 - type: nauc_ndcg_at_20_diff1 value: 51.74864887078553 - type: nauc_ndcg_at_20_max value: 39.32033115509482 - type: nauc_ndcg_at_20_std value: 0.4346356935494506 - type: nauc_ndcg_at_3_diff1 value: 51.9909705702443 - type: nauc_ndcg_at_3_max value: 36.078476037019094 - type: nauc_ndcg_at_3_std value: -4.014502363911228 - type: nauc_ndcg_at_5_diff1 value: 51.312788955634325 - type: nauc_ndcg_at_5_max value: 37.54290824294073 - type: nauc_ndcg_at_5_std value: -1.8169251273098448 - type: nauc_precision_at_1000_diff1 value: 1.4596703970072096 - type: nauc_precision_at_1000_max value: 36.408552907408 - type: nauc_precision_at_1000_std value: 53.892991905053776 - type: nauc_precision_at_100_diff1 value: 17.90829681479967 - type: nauc_precision_at_100_max value: 50.02058762977557 - type: nauc_precision_at_100_std value: 50.95242296795188 - type: nauc_precision_at_10_diff1 value: 33.69533492770854 - type: nauc_precision_at_10_max value: 47.554637845938025 - type: nauc_precision_at_10_std value: 21.812883074791838 - type: nauc_precision_at_1_diff1 value: 55.77100806609076 - type: nauc_precision_at_1_max value: 37.966164940491446 - type: nauc_precision_at_1_std value: -2.1074234936282537 - type: nauc_precision_at_20_diff1 value: 31.797703948512723 - type: nauc_precision_at_20_max value: 46.94077230822751 - type: nauc_precision_at_20_std value: 29.525569664289396 - type: nauc_precision_at_3_diff1 value: 41.753151429999456 - type: nauc_precision_at_3_max value: 38.30163209243931 - type: nauc_precision_at_3_std value: 6.19935377482869 - type: nauc_precision_at_5_diff1 value: 38.479320931912575 - type: nauc_precision_at_5_max value: 41.576866734894516 - type: nauc_precision_at_5_std value: 13.327714566652604 - type: nauc_recall_at_1000_diff1 value: 50.28923446773287 - type: nauc_recall_at_1000_max value: 68.29528746364413 - type: nauc_recall_at_1000_std value: 48.2313231806132 - type: nauc_recall_at_100_diff1 value: 46.22085619290839 - type: nauc_recall_at_100_max value: 61.60933703216747 - type: nauc_recall_at_100_std value: 42.210649980610896 - type: nauc_recall_at_10_diff1 value: 43.10485234893865 - type: nauc_recall_at_10_max value: 43.06779802776641 - type: nauc_recall_at_10_std value: 8.272818985431385 - type: nauc_recall_at_1_diff1 value: 55.20714819661926 - type: nauc_recall_at_1_max value: 33.68583272500883 - type: nauc_recall_at_1_std value: -7.806502386166579 
- type: nauc_recall_at_20_diff1 value: 46.850902149595036 - type: nauc_recall_at_20_max value: 44.58623368637416 - type: nauc_recall_at_20_std value: 11.890054420031708 - type: nauc_recall_at_3_diff1 value: 48.80301236823221 - type: nauc_recall_at_3_max value: 34.177890037375 - type: nauc_recall_at_3_std value: -3.852215004054359 - type: nauc_recall_at_5_diff1 value: 46.206941308622056 - type: nauc_recall_at_5_max value: 38.61994260176494 - type: nauc_recall_at_5_std value: 2.735469769782116 - type: ndcg_at_1 value: 37.667 - type: ndcg_at_10 value: 50.63999999999999 - type: ndcg_at_100 value: 54.885 - type: ndcg_at_1000 value: 56.274 - type: ndcg_at_20 value: 52.349000000000004 - type: ndcg_at_3 value: 44.891999999999996 - type: ndcg_at_5 value: 47.788000000000004 - type: precision_at_1 value: 37.667 - type: precision_at_10 value: 7.3 - type: precision_at_100 value: 0.97 - type: precision_at_1000 value: 0.11 - type: precision_at_20 value: 4.067 - type: precision_at_3 value: 18.333 - type: precision_at_5 value: 12.6 - type: recall_at_1 value: 35.5 - type: recall_at_10 value: 66.178 - type: recall_at_100 value: 85.9 - type: recall_at_1000 value: 97.1 - type: recall_at_20 value: 72.60600000000001 - type: recall_at_3 value: 50.306 - type: recall_at_5 value: 57.443999999999996 - task: type: PairClassification dataset: name: MTEB SprintDuplicateQuestions (default) type: mteb/sprintduplicatequestions-pairclassification config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cosine_accuracy value: 99.71386138613862 - type: cosine_accuracy_threshold value: 78.56961662426235 - type: cosine_ap value: 90.20131927652946 - type: cosine_f1 value: 84.7749114820435 - type: cosine_f1_threshold value: 75.7768544371973 - type: cosine_precision value: 85.7727737973388 - type: cosine_recall value: 83.8 - type: dot_accuracy value: 99.71386138613862 - type: dot_accuracy_threshold value: 78.56961780669964 - type: dot_ap value: 90.20131927652946 - type: dot_f1 value: 84.7749114820435 - type: dot_f1_threshold value: 75.77685228378391 - type: dot_precision value: 85.7727737973388 - type: dot_recall value: 83.8 - type: euclidean_accuracy value: 99.71386138613862 - type: euclidean_accuracy_threshold value: 65.46813529720524 - type: euclidean_ap value: 90.20131927652946 - type: euclidean_f1 value: 84.7749114820435 - type: euclidean_f1_threshold value: 69.60336608830053 - type: euclidean_precision value: 85.7727737973388 - type: euclidean_recall value: 83.8 - type: main_score value: 90.20131927652946 - type: manhattan_accuracy value: 99.7059405940594 - type: manhattan_accuracy_threshold value: 804.8100425289704 - type: manhattan_ap value: 90.00682250828237 - type: manhattan_f1 value: 84.44211629125196 - type: manhattan_f1_threshold value: 828.8486447498144 - type: manhattan_precision value: 88.66886688668868 - type: manhattan_recall value: 80.60000000000001 - type: max_accuracy value: 99.71386138613862 - type: max_ap value: 90.20131927652946 - type: max_f1 value: 84.7749114820435 - type: max_precision value: 88.66886688668868 - type: max_recall value: 83.8 - type: similarity_accuracy value: 99.71386138613862 - type: similarity_accuracy_threshold value: 78.56961662426235 - type: similarity_ap value: 90.20131927652946 - type: similarity_f1 value: 84.7749114820435 - type: similarity_f1_threshold value: 75.7768544371973 - type: similarity_precision value: 85.7727737973388 - type: similarity_recall value: 83.8 - task: type: Clustering dataset: name: MTEB StackExchangeClustering 
(default) type: mteb/stackexchange-clustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: main_score value: 48.18939518021159 - type: v_measure value: 48.18939518021159 - type: v_measure_std value: 4.6189444340187995 - task: type: Clustering dataset: name: MTEB StackExchangeClusteringP2P (default) type: mteb/stackexchange-clustering-p2p config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: main_score value: 30.743938802421265 - type: v_measure value: 30.743938802421265 - type: v_measure_std value: 1.4645401677053824 - task: type: Reranking dataset: name: MTEB StackOverflowDupQuestions (default) type: mteb/stackoverflowdupquestions-reranking config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: main_score value: 43.254152892780986 - type: map value: 43.254152892780986 - type: mrr value: 43.70483989050165 - type: nAUC_map_diff1 value: 33.22453777168869 - type: nAUC_map_max value: 13.175366935671228 - type: nAUC_map_std value: 3.718253924398536 - type: nAUC_mrr_diff1 value: 32.58818809467491 - type: nAUC_mrr_max value: 14.093758435205075 - type: nAUC_mrr_std value: 4.198791420159734 - task: type: Summarization dataset: name: MTEB SummEval (default) type: mteb/summeval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cosine_pearson value: 29.88360050203766 - type: cosine_spearman value: 29.275185932109494 - type: dot_pearson value: 29.883597746108975 - type: dot_spearman value: 29.28377974870949 - type: main_score value: 29.275185932109494 - type: pearson value: 29.88360050203766 - type: spearman value: 29.275185932109494 - task: type: Retrieval dataset: name: MTEB TRECCOVID (default) type: mteb/trec-covid config: default split: test revision: bb9466bac8153a0349341eb1b22e06409e78ef4e metrics: - type: main_score value: 45.747 - type: map_at_1 value: 0.148 - type: map_at_10 value: 0.972 - type: map_at_100 value: 4.652 - type: map_at_1000 value: 11.511000000000001 - type: map_at_20 value: 1.643 - type: map_at_3 value: 0.369 - type: map_at_5 value: 0.561 - type: mrr_at_1 value: 62.0 - type: mrr_at_10 value: 70.06904761904761 - type: mrr_at_100 value: 70.45500059672992 - type: mrr_at_1000 value: 70.45500059672992 - type: mrr_at_20 value: 70.31716791979949 - type: mrr_at_3 value: 68.0 - type: mrr_at_5 value: 69.19999999999999 - type: nauc_map_at_1000_diff1 value: -0.8266899821302324 - type: nauc_map_at_1000_max value: 34.62914536640893 - type: nauc_map_at_1000_std value: 57.177693387251615 - type: nauc_map_at_100_diff1 value: -3.3097934383165613 - type: nauc_map_at_100_max value: 22.052336613600293 - type: nauc_map_at_100_std value: 29.905360060478188 - type: nauc_map_at_10_diff1 value: 6.057035481050755 - type: nauc_map_at_10_max value: 22.742824418774667 - type: nauc_map_at_10_std value: 5.649441588476496 - type: nauc_map_at_1_diff1 value: 10.469485578180873 - type: nauc_map_at_1_max value: 4.582098501050435 - type: nauc_map_at_1_std value: -10.47482550446343 - type: nauc_map_at_20_diff1 value: 1.5813367839245727 - type: nauc_map_at_20_max value: 25.09380802651507 - type: nauc_map_at_20_std value: 11.733045886140895 - type: nauc_map_at_3_diff1 value: -0.4174848325628528 - type: nauc_map_at_3_max value: 16.54291715633098 - type: nauc_map_at_3_std value: -6.315368365719176 - type: nauc_map_at_5_diff1 value: 1.6439114449809122 - type: nauc_map_at_5_max value: 18.119472468345634 - type: nauc_map_at_5_std value: 
-1.4642215840068935 - type: nauc_mrr_at_1000_diff1 value: 19.962304210632194 - type: nauc_mrr_at_1000_max value: 28.66281052259736 - type: nauc_mrr_at_1000_std value: 14.4833499197582 - type: nauc_mrr_at_100_diff1 value: 19.962304210632194 - type: nauc_mrr_at_100_max value: 28.66281052259736 - type: nauc_mrr_at_100_std value: 14.4833499197582 - type: nauc_mrr_at_10_diff1 value: 19.79498540271038 - type: nauc_mrr_at_10_max value: 28.07551011390951 - type: nauc_mrr_at_10_std value: 13.820791565247939 - type: nauc_mrr_at_1_diff1 value: 23.72088730271045 - type: nauc_mrr_at_1_max value: 29.338830261821947 - type: nauc_mrr_at_1_std value: 10.463649509276033 - type: nauc_mrr_at_20_diff1 value: 20.06776286940325 - type: nauc_mrr_at_20_max value: 28.69272909781133 - type: nauc_mrr_at_20_std value: 14.560673636667628 - type: nauc_mrr_at_3_diff1 value: 18.71166001912622 - type: nauc_mrr_at_3_max value: 30.645161290322555 - type: nauc_mrr_at_3_std value: 16.37394164159257 - type: nauc_mrr_at_5_diff1 value: 15.791374902745353 - type: nauc_mrr_at_5_max value: 28.51602708149093 - type: nauc_mrr_at_5_std value: 15.246386476651619 - type: nauc_ndcg_at_1000_diff1 value: -5.179304837164554 - type: nauc_ndcg_at_1000_max value: 27.27301986190763 - type: nauc_ndcg_at_1000_std value: 49.239144813886654 - type: nauc_ndcg_at_100_diff1 value: 7.283019925558149 - type: nauc_ndcg_at_100_max value: 29.80340187562149 - type: nauc_ndcg_at_100_std value: 47.60799676958296 - type: nauc_ndcg_at_10_diff1 value: 11.621471677557253 - type: nauc_ndcg_at_10_max value: 31.78727749460396 - type: nauc_ndcg_at_10_std value: 26.339328462146177 - type: nauc_ndcg_at_1_diff1 value: 26.896384303421446 - type: nauc_ndcg_at_1_max value: 28.727080596332872 - type: nauc_ndcg_at_1_std value: 12.10515793682523 - type: nauc_ndcg_at_20_diff1 value: 7.253524538786647 - type: nauc_ndcg_at_20_max value: 33.412855576178295 - type: nauc_ndcg_at_20_std value: 34.10895211064073 - type: nauc_ndcg_at_3_diff1 value: 11.303112239393863 - type: nauc_ndcg_at_3_max value: 35.0880605283756 - type: nauc_ndcg_at_3_std value: 18.514877130637803 - type: nauc_ndcg_at_5_diff1 value: 8.537541001217583 - type: nauc_ndcg_at_5_max value: 32.24796400964019 - type: nauc_ndcg_at_5_std value: 21.65596013895985 - type: nauc_precision_at_1000_diff1 value: 5.217123572202896 - type: nauc_precision_at_1000_max value: 31.954154167309177 - type: nauc_precision_at_1000_std value: 60.51613061301686 - type: nauc_precision_at_100_diff1 value: 5.748688865778208 - type: nauc_precision_at_100_max value: 28.503515028630567 - type: nauc_precision_at_100_std value: 52.8175811950368 - type: nauc_precision_at_10_diff1 value: 9.634424129349284 - type: nauc_precision_at_10_max value: 33.90210630229416 - type: nauc_precision_at_10_std value: 30.197787312348073 - type: nauc_precision_at_1_diff1 value: 23.72088730271045 - type: nauc_precision_at_1_max value: 29.338830261821947 - type: nauc_precision_at_1_std value: 10.463649509276033 - type: nauc_precision_at_20_diff1 value: 2.6440820197838923 - type: nauc_precision_at_20_max value: 36.6927642980172 - type: nauc_precision_at_20_std value: 40.53918258763216 - type: nauc_precision_at_3_diff1 value: 2.9773659425793695 - type: nauc_precision_at_3_max value: 35.63522203655881 - type: nauc_precision_at_3_std value: 17.365942579371055 - type: nauc_precision_at_5_diff1 value: 3.883249981522982 - type: nauc_precision_at_5_max value: 34.19785174053362 - type: nauc_precision_at_5_std value: 25.391096548495977 - type: nauc_recall_at_1000_diff1 value: 
-10.977265624215267 - type: nauc_recall_at_1000_max value: 22.349720150932985 - type: nauc_recall_at_1000_std value: 47.14118127199015 - type: nauc_recall_at_100_diff1 value: -10.566105105889243 - type: nauc_recall_at_100_max value: 13.59897332326766 - type: nauc_recall_at_100_std value: 25.1260269383207 - type: nauc_recall_at_10_diff1 value: 3.9418824014124514 - type: nauc_recall_at_10_max value: 18.87305117920693 - type: nauc_recall_at_10_std value: 4.227456274746917 - type: nauc_recall_at_1_diff1 value: 10.469485578180873 - type: nauc_recall_at_1_max value: 4.582098501050435 - type: nauc_recall_at_1_std value: -10.47482550446343 - type: nauc_recall_at_20_diff1 value: -3.663384950691917 - type: nauc_recall_at_20_max value: 20.838703493064635 - type: nauc_recall_at_20_std value: 10.729793670370862 - type: nauc_recall_at_3_diff1 value: -1.1850402683856456 - type: nauc_recall_at_3_max value: 16.033671610288522 - type: nauc_recall_at_3_std value: -6.953520529126048 - type: nauc_recall_at_5_diff1 value: -0.5156927662191768 - type: nauc_recall_at_5_max value: 15.556954479927315 - type: nauc_recall_at_5_std value: -2.965229848389009 - type: ndcg_at_1 value: 56.00000000000001 - type: ndcg_at_10 value: 45.747 - type: ndcg_at_100 value: 32.761 - type: ndcg_at_1000 value: 29.633 - type: ndcg_at_20 value: 42.905 - type: ndcg_at_3 value: 50.641999999999996 - type: ndcg_at_5 value: 48.231 - type: precision_at_1 value: 62.0 - type: precision_at_10 value: 47.8 - type: precision_at_100 value: 33.72 - type: precision_at_1000 value: 14.238000000000001 - type: precision_at_20 value: 45.2 - type: precision_at_3 value: 54.0 - type: precision_at_5 value: 50.8 - type: recall_at_1 value: 0.148 - type: recall_at_10 value: 1.143 - type: recall_at_100 value: 7.219 - type: recall_at_1000 value: 28.294999999999998 - type: recall_at_20 value: 2.083 - type: recall_at_3 value: 0.395 - type: recall_at_5 value: 0.628 - task: type: Retrieval dataset: name: MTEB Touche2020 (default) type: mteb/touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: main_score value: 18.618000000000002 - type: map_at_1 value: 1.22 - type: map_at_10 value: 6.635000000000001 - type: map_at_100 value: 10.873 - type: map_at_1000 value: 12.415 - type: map_at_20 value: 8.334 - type: map_at_3 value: 2.8240000000000003 - type: map_at_5 value: 4.111 - type: mrr_at_1 value: 14.285714285714285 - type: mrr_at_10 value: 31.959831551668284 - type: mrr_at_100 value: 33.15059576942869 - type: mrr_at_1000 value: 33.15059576942869 - type: mrr_at_20 value: 32.685999641281754 - type: mrr_at_3 value: 25.850340136054424 - type: mrr_at_5 value: 29.31972789115646 - type: nauc_map_at_1000_diff1 value: 8.820920087157313 - type: nauc_map_at_1000_max value: -33.58280072902863 - type: nauc_map_at_1000_std value: -22.730292551065183 - type: nauc_map_at_100_diff1 value: 9.741008911531535 - type: nauc_map_at_100_max value: -33.6532837418042 - type: nauc_map_at_100_std value: -28.3444309192652 - type: nauc_map_at_10_diff1 value: 7.657150877271815 - type: nauc_map_at_10_max value: -41.7412362957407 - type: nauc_map_at_10_std value: -35.66062824513052 - type: nauc_map_at_1_diff1 value: 7.593190069621649 - type: nauc_map_at_1_max value: -39.58442010649443 - type: nauc_map_at_1_std value: -22.564719811889777 - type: nauc_map_at_20_diff1 value: 7.245303325270055 - type: nauc_map_at_20_max value: -37.804327180430946 - type: nauc_map_at_20_std value: -32.702756826489846 - type: nauc_map_at_3_diff1 value: 6.742365189818029 - type: 
nauc_map_at_3_max value: -41.7228290771728 - type: nauc_map_at_3_std value: -30.230168338925107 - type: nauc_map_at_5_diff1 value: 11.935913888588882 - type: nauc_map_at_5_max value: -41.39335754887243 - type: nauc_map_at_5_std value: -33.780157609546535 - type: nauc_mrr_at_1000_diff1 value: -1.6708159098532442 - type: nauc_mrr_at_1000_max value: -36.55890935351506 - type: nauc_mrr_at_1000_std value: -24.27343264470873 - type: nauc_mrr_at_100_diff1 value: -1.6708159098532442 - type: nauc_mrr_at_100_max value: -36.55890935351506 - type: nauc_mrr_at_100_std value: -24.27343264470873 - type: nauc_mrr_at_10_diff1 value: -0.42650070974468685 - type: nauc_mrr_at_10_max value: -37.09244916127389 - type: nauc_mrr_at_10_std value: -24.66093983608399 - type: nauc_mrr_at_1_diff1 value: -5.630573652147252 - type: nauc_mrr_at_1_max value: -33.616658797870684 - type: nauc_mrr_at_1_std value: -23.601564115907 - type: nauc_mrr_at_20_diff1 value: -1.832519847770416 - type: nauc_mrr_at_20_max value: -37.12461848720876 - type: nauc_mrr_at_20_std value: -24.697864546344437 - type: nauc_mrr_at_3_diff1 value: -0.005683436651441496 - type: nauc_mrr_at_3_max value: -32.50516010446863 - type: nauc_mrr_at_3_std value: -21.544877233050823 - type: nauc_mrr_at_5_diff1 value: -2.354001730958692 - type: nauc_mrr_at_5_max value: -32.51899298268129 - type: nauc_mrr_at_5_std value: -23.68035252143919 - type: nauc_ndcg_at_1000_diff1 value: 14.007950932108976 - type: nauc_ndcg_at_1000_max value: -31.274257790464837 - type: nauc_ndcg_at_1000_std value: 3.658749568249879 - type: nauc_ndcg_at_100_diff1 value: 13.626007116136158 - type: nauc_ndcg_at_100_max value: -35.59107319590088 - type: nauc_ndcg_at_100_std value: -18.874707006492024 - type: nauc_ndcg_at_10_diff1 value: 9.82558048538336 - type: nauc_ndcg_at_10_max value: -39.51461465840459 - type: nauc_ndcg_at_10_std value: -30.33405672804229 - type: nauc_ndcg_at_1_diff1 value: -1.598770159246464 - type: nauc_ndcg_at_1_max value: -31.975857803575675 - type: nauc_ndcg_at_1_std value: -18.993368614347663 - type: nauc_ndcg_at_20_diff1 value: 11.616460882964375 - type: nauc_ndcg_at_20_max value: -36.68867443298684 - type: nauc_ndcg_at_20_std value: -27.831158282067598 - type: nauc_ndcg_at_3_diff1 value: 3.6760483719742556 - type: nauc_ndcg_at_3_max value: -30.935030030092992 - type: nauc_ndcg_at_3_std value: -18.717891674270643 - type: nauc_ndcg_at_5_diff1 value: 10.773599917143413 - type: nauc_ndcg_at_5_max value: -31.08451038101287 - type: nauc_ndcg_at_5_std value: -25.478457258577336 - type: nauc_precision_at_1000_diff1 value: -6.780225586359699 - type: nauc_precision_at_1000_max value: 38.71975790762798 - type: nauc_precision_at_1000_std value: 57.8083677042306 - type: nauc_precision_at_100_diff1 value: 2.959136061872892 - type: nauc_precision_at_100_max value: -8.27764507575222 - type: nauc_precision_at_100_std value: 5.742410187313611 - type: nauc_precision_at_10_diff1 value: 9.882789695687109 - type: nauc_precision_at_10_max value: -31.486245698037102 - type: nauc_precision_at_10_std value: -29.081919554833874 - type: nauc_precision_at_1_diff1 value: -5.630573652147252 - type: nauc_precision_at_1_max value: -33.616658797870684 - type: nauc_precision_at_1_std value: -23.601564115907 - type: nauc_precision_at_20_diff1 value: 5.165999913921455 - type: nauc_precision_at_20_max value: -19.322665087378923 - type: nauc_precision_at_20_std value: -19.841805142598865 - type: nauc_precision_at_3_diff1 value: 2.846740832419061 - type: nauc_precision_at_3_max value: 
-30.76562032864513 - type: nauc_precision_at_3_std value: -23.610192672373636 - type: nauc_precision_at_5_diff1 value: 13.83881140180208 - type: nauc_precision_at_5_max value: -23.40672207825652 - type: nauc_precision_at_5_std value: -26.803291207458884 - type: nauc_recall_at_1000_diff1 value: 5.989093134294799 - type: nauc_recall_at_1000_max value: -23.01810906637643 - type: nauc_recall_at_1000_std value: 51.72967782759332 - type: nauc_recall_at_100_diff1 value: 9.279568158025599 - type: nauc_recall_at_100_max value: -32.49225165397591 - type: nauc_recall_at_100_std value: -14.266931753931292 - type: nauc_recall_at_10_diff1 value: 8.789441102892894 - type: nauc_recall_at_10_max value: -41.575759675933185 - type: nauc_recall_at_10_std value: -36.066608504981836 - type: nauc_recall_at_1_diff1 value: 7.593190069621649 - type: nauc_recall_at_1_max value: -39.58442010649443 - type: nauc_recall_at_1_std value: -22.564719811889777 - type: nauc_recall_at_20_diff1 value: 7.288095720364289 - type: nauc_recall_at_20_max value: -34.19747470428325 - type: nauc_recall_at_20_std value: -29.334755464530023 - type: nauc_recall_at_3_diff1 value: 7.541743741210702 - type: nauc_recall_at_3_max value: -38.357726279072416 - type: nauc_recall_at_3_std value: -29.877869977138204 - type: nauc_recall_at_5_diff1 value: 11.512545675995455 - type: nauc_recall_at_5_max value: -37.366204857623586 - type: nauc_recall_at_5_std value: -33.58926486109219 - type: ndcg_at_1 value: 12.245000000000001 - type: ndcg_at_10 value: 18.618000000000002 - type: ndcg_at_100 value: 28.488000000000003 - type: ndcg_at_1000 value: 41.208 - type: ndcg_at_20 value: 19.536 - type: ndcg_at_3 value: 15.045 - type: ndcg_at_5 value: 16.359 - type: precision_at_1 value: 14.285999999999998 - type: precision_at_10 value: 19.796 - type: precision_at_100 value: 6.5920000000000005 - type: precision_at_1000 value: 1.471 - type: precision_at_20 value: 15.204 - type: precision_at_3 value: 18.367 - type: precision_at_5 value: 18.776 - type: recall_at_1 value: 1.22 - type: recall_at_10 value: 13.763 - type: recall_at_100 value: 40.107 - type: recall_at_1000 value: 79.06800000000001 - type: recall_at_20 value: 20.049 - type: recall_at_3 value: 4.2540000000000004 - type: recall_at_5 value: 7.142999999999999 - task: type: Classification dataset: name: MTEB ToxicConversationsClassification (default) type: mteb/toxic_conversations_50k config: default split: test revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de metrics: - type: accuracy value: 69.0625 - type: ap value: 12.429057046174089 - type: ap_weighted value: 12.429057046174089 - type: f1 value: 52.366056859622454 - type: f1_weighted value: 75.91632061778698 - type: main_score value: 69.0625 - task: type: Classification dataset: name: MTEB TweetSentimentExtractionClassification (default) type: mteb/tweet_sentiment_extraction config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 55.387662705149964 - type: f1 value: 55.62292803889264 - type: f1_weighted value: 55.01561915660653 - type: main_score value: 55.387662705149964 - task: type: Clustering dataset: name: MTEB TwentyNewsgroupsClustering (default) type: mteb/twentynewsgroups-clustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: main_score value: 33.535908963951435 - type: v_measure value: 33.535908963951435 - type: v_measure_std value: 1.8862804680454297 - task: type: PairClassification dataset: name: MTEB TwitterSemEval2015 (default) type: 
mteb/twittersemeval2015-pairclassification config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cosine_accuracy value: 81.57000655659535 - type: cosine_accuracy_threshold value: 76.01186428039885 - type: cosine_ap value: 57.187252502171674 - type: cosine_f1 value: 54.94480738905159 - type: cosine_f1_threshold value: 63.27845286960887 - type: cosine_precision value: 47.93632075471698 - type: cosine_recall value: 64.35356200527704 - type: dot_accuracy value: 81.57000655659535 - type: dot_accuracy_threshold value: 76.01186510638954 - type: dot_ap value: 57.1872568788409 - type: dot_f1 value: 54.94480738905159 - type: dot_f1_threshold value: 63.27845437266042 - type: dot_precision value: 47.93632075471698 - type: dot_recall value: 64.35356200527704 - type: euclidean_accuracy value: 81.57000655659535 - type: euclidean_accuracy_threshold value: 69.2649048666448 - type: euclidean_ap value: 57.18724194735979 - type: euclidean_f1 value: 54.94480738905159 - type: euclidean_f1_threshold value: 85.69894748780587 - type: euclidean_precision value: 47.93632075471698 - type: euclidean_recall value: 64.35356200527704 - type: main_score value: 57.516050924090266 - type: manhattan_accuracy value: 81.71902008702389 - type: manhattan_accuracy_threshold value: 856.8997862166725 - type: manhattan_ap value: 57.516050924090266 - type: manhattan_f1 value: 55.16339869281046 - type: manhattan_f1_threshold value: 1035.858379830097 - type: manhattan_precision value: 50.18378378378379 - type: manhattan_recall value: 61.24010554089709 - type: max_accuracy value: 81.71902008702389 - type: max_ap value: 57.516050924090266 - type: max_f1 value: 55.16339869281046 - type: max_precision value: 50.18378378378379 - type: max_recall value: 64.35356200527704 - type: similarity_accuracy value: 81.57000655659535 - type: similarity_accuracy_threshold value: 76.01186428039885 - type: similarity_ap value: 57.187252502171674 - type: similarity_f1 value: 54.94480738905159 - type: similarity_f1_threshold value: 63.27845286960887 - type: similarity_precision value: 47.93632075471698 - type: similarity_recall value: 64.35356200527704 - task: type: PairClassification dataset: name: MTEB TwitterURLCorpus (default) type: mteb/twitterurlcorpus-pairclassification config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cosine_accuracy value: 87.09977878682035 - type: cosine_accuracy_threshold value: 63.00089389314832 - type: cosine_ap value: 81.9487582699938 - type: cosine_f1 value: 74.04089724292375 - type: cosine_f1_threshold value: 56.35024835869245 - type: cosine_precision value: 70.7599466704091 - type: cosine_recall value: 77.64089929165382 - type: dot_accuracy value: 87.09977878682035 - type: dot_accuracy_threshold value: 63.00089560728222 - type: dot_ap value: 81.94879514546079 - type: dot_f1 value: 74.04089724292375 - type: dot_f1_threshold value: 56.350250341728405 - type: dot_precision value: 70.7599466704091 - type: dot_recall value: 77.64089929165382 - type: euclidean_accuracy value: 87.09977878682035 - type: euclidean_accuracy_threshold value: 86.02221469735642 - type: euclidean_ap value: 81.94875892553148 - type: euclidean_f1 value: 74.04089724292375 - type: euclidean_f1_threshold value: 93.43420484744681 - type: euclidean_precision value: 70.7599466704091 - type: euclidean_recall value: 77.64089929165382 - type: main_score value: 82.13756947863085 - type: manhattan_accuracy value: 87.19292117825125 - type: manhattan_accuracy_threshold value: 
1076.0586285257887 - type: manhattan_ap value: 82.13756947863085 - type: manhattan_f1 value: 74.36426623424485 - type: manhattan_f1_threshold value: 1148.366796662276 - type: manhattan_precision value: 71.32051463311183 - type: manhattan_recall value: 77.6793963658762 - type: max_accuracy value: 87.19292117825125 - type: max_ap value: 82.13756947863085 - type: max_f1 value: 74.36426623424485 - type: max_precision value: 71.32051463311183 - type: max_recall value: 77.6793963658762 - type: similarity_accuracy value: 87.09977878682035 - type: similarity_accuracy_threshold value: 63.00089389314832 - type: similarity_ap value: 81.9487582699938 - type: similarity_f1 value: 74.04089724292375 - type: similarity_f1_threshold value: 56.35024835869245 - type: similarity_precision value: 70.7599466704091 - type: similarity_recall value: 77.64089929165382 --- # potion-base-8M Model Card
\"Model2Vec This Model2Vec model is pre-trained using Tokenlearn. It is a distilled version of the baai/bge-base-en-v1.5 Sentence Transformer. It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU. It is designed for applications where computational resources are limited or where real-time performance is critical. ## Installation Install model2vec using pip: ## Usage Load this model using the method: ## How it works Model2vec creates a small, static model that outperforms other static embedding models by a large margin on all tasks on MTEB. This model is pre-trained using Tokenlearn. It's created using the following steps: - Distillation: first, a model is distilled from a sentence transformer model using Model2Vec. - Training data creation: the sentence transformer model is used to create training data by creating mean output embeddings on a large corpus. - Training: the distilled model is trained on the training data using Tokenlearn. - Post-training re-regularization: after training, the model is re-regularized by weighting the tokens based on their frequency, applying PCA, and finally applying SIF weighting. The results for this model can be found on the Model2Vec results page. ## Additional Resources - All Model2Vec models on the hub - Model2Vec Repo - Tokenlearn repo - Model2Vec Results - Model2Vec Tutorials ## Library Authors Model2Vec was developed by the Minish Lab team consisting of Stephan Tulkens and Thomas van Dongen. ## Citation Please cite the Model2Vec repository if you use this model in your work.", + "model_explanation_gemini": "Generates static embeddings for text classification and retrieval tasks, evaluated on MTEB benchmarks." +} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.1.json b/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.1.json new file mode 100644 index 0000000000000000000000000000000000000000..81ea6c8cb36a2f15bd5375090fb72e084c97c88e --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.1.json @@ -0,0 +1,22 @@ +{ + "model_id": "mistralai/Mistral-7B-Instruct-v0.1", + "downloads": 475839, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mistral", + "text-generation", + "finetuned", + "conversational", + "arxiv:2310.06825", + "base_model:mistralai/Mistral-7B-v0.1", + "base_model:finetune:mistralai/Mistral-7B-v0.1", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - finetuned base_model: mistralai/Mistral-7B-v0.1 pipeline_tag: text-generation inference: true widget: - messages: - role: user content: What is your favorite condiment? extra_gated_description: If you want to learn more about how we process your personal data, please read our . --- # Model Card for Mistral-7B-Instruct-v0.1 ## Encode and Decode with ## Inference with ## Inference with hugging face > [!TIP] > PRs to correct the tokenizer so that it gives 1-to-1 the same results as the reference implementation are very welcome! --- The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is a instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets. For full details of this model please read our paper and release blog post. ## Instruction format In order to leverage instruction fine-tuning, your prompt should be surrounded by and tokens. 
diff --git a/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.1.json b/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.1.json new file mode 100644 index 0000000000000000000000000000000000000000..81ea6c8cb36a2f15bd5375090fb72e084c97c88e --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.1.json @@ -0,0 +1,22 @@ +{ + "model_id": "mistralai/Mistral-7B-Instruct-v0.1", + "downloads": 475839, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mistral", + "text-generation", + "finetuned", + "conversational", + "arxiv:2310.06825", + "base_model:mistralai/Mistral-7B-v0.1", + "base_model:finetune:mistralai/Mistral-7B-v0.1", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - finetuned base_model: mistralai/Mistral-7B-v0.1 pipeline_tag: text-generation inference: true widget: - messages: - role: user content: What is your favorite condiment? extra_gated_description: If you want to learn more about how we process your personal data, please read our . --- # Model Card for Mistral-7B-Instruct-v0.1 ## Encode and Decode with ## Inference with ## Inference with Hugging Face > [!TIP] > PRs to correct the tokenizer so that it gives 1-to-1 the same results as the reference implementation are very welcome! --- The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets. For full details of this model please read our paper and release blog post. ## Instruction format In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id. E.g. This format is available as a chat template via the method (see the sketch after this entry): ## Model Architecture This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices: - Grouped-Query Attention - Sliding-Window Attention - Byte-fallback BPE tokenizer ## Troubleshooting - If you see the following error: Installing transformers from source should solve the issue: pip install git+ This should not be required after transformers-v4.33.4. ## Limitations The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed." +} \ No newline at end of file
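The instruction-format section above refers to a chat-template method without naming it; in Hugging Face transformers this is conventionally `tokenizer.apply_chat_template`. A hedged sketch of the `[INST]` format, assuming transformers >= 4.34 and access to the checkpoint (the conversation mirrors the widget example in the card metadata):

```python
# Sketch of the [INST] ... [/INST] instruction format via the tokenizer's
# chat template. Assumes transformers >= 4.34 and access to the checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favorite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# The template wraps each user turn in [INST] ... [/INST] and prepends a
# single begin-of-sentence token, matching the format described above.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```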
This format is available as a chat template via the method: ## Troubleshooting - If you see the following error: Installing transformers from source should solve the issue pip install git+ This should not be required after transformers-v4.33.4. ## Limitations The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed." +} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.3.json b/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.3.json new file mode 100644 index 0000000000000000000000000000000000000000..f4a8d4c4bac0ad23eca35aefb195e24312f20430 --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-7B-Instruct-v0.3.json @@ -0,0 +1,19 @@ +{ + "model_id": "mistralai/Mistral-7B-Instruct-v0.3", + "downloads": 701083, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "conversational", + "base_model:mistralai/Mistral-7B-v0.3", + "base_model:finetune:mistralai/Mistral-7B-v0.3", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 base_model: mistralai/Mistral-7B-v0.3 extra_gated_description: If you want to learn more about how we process your personal data, please read our . --- # Model Card for Mistral-7B-Instruct-v0.3 The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 - Extended vocabulary to 32768 - Supports v3 Tokenizer - Supports function calling ## Installation It is recommended to use with mistral-inference. For HF transformers code snippets, please keep scrolling. ## Download ### Chat After installing , a CLI command should be available in your environment. You can chat with the model using ### Instruct following ### Function calling ## Generate with If you want to use Hugging Face to generate text, you can do something like this. ## Function calling with To use this example, you'll need version 4.42.0 or higher. Please see the function calling guide in the docs for more information. Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool results to the chat history so that the model can use them in its next generation. For a full tool calling example, please see the function calling guide, and note that Mistral **does** use tool call IDs, so these must be included in your tool calls and tool results. They should be exactly 9 alphanumeric characters. ## Limitations The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. 
We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall" +} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mistral-7B-v0.1.json b/data/model_data_json/mistralai_Mistral-7B-v0.1.json new file mode 100644 index 0000000000000000000000000000000000000000..2584d3d7508d254c5039477d1f0b5f6afbd1dcdd --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-7B-v0.1.json @@ -0,0 +1,20 @@ +{ + "model_id": "mistralai/Mistral-7B-v0.1", + "downloads": 549916, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mistral", + "text-generation", + "pretrained", + "en", + "arxiv:2310.06825", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 tags: - pretrained pipeline_tag: text-generation inference: parameters: temperature: 0.7 extra_gated_description: If you want to learn more about how we process your personal data, please read our . --- # Model Card for Mistral-7B-v0.1 The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested. For full details of this model please read our paper and release blog post. ## Model Architecture Mistral-7B-v0.1 is a transformer model, with the following architecture choices: - Grouped-Query Attention - Sliding-Window Attention - Byte-fallback BPE tokenizer ## Troubleshooting - If you see the following error: - Or: Ensure you are utilizing a stable version of Transformers, 4.34.0 or newer. ## Notice Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms. ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed." 
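The snippets in the card above are elided, so here is a minimal, hedged sketch of plain text generation with the pretrained base model; it uses only the standard Transformers API, and the temperature of 0.7 mirrors the card's inference parameters:

```python
# Sketch: sampling from the pretrained base model (no chat template,
# since Mistral-7B-v0.1 is a base model). Requires transformers >= 4.34.0.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("My favourite condiment is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```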
+} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mistral-7B-v0.3.json b/data/model_data_json/mistralai_Mistral-7B-v0.3.json new file mode 100644 index 0000000000000000000000000000000000000000..38ed3df0fbd4c1ac7206201e862e168618e7f0d1 --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-7B-v0.3.json @@ -0,0 +1,16 @@ +{ + "model_id": "mistralai/Mistral-7B-v0.3", + "downloads": 462267, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 extra_gated_description: If you want to learn more about how we process your personal data, please read our . --- # Model Card for Mistral-7B-v0.3 The Mistral-7B-v0.3 Large Language Model (LLM) is a Mistral-7B-v0.2 with extended vocabulary. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 - Extended vocabulary to 32768 ## Installation It is recommended to use with mistral-inference. For HF transformers code snippets, please keep scrolling. ## Download ### Demo After installing , a CLI command should be available in your environment. Should give something along the following lines: ## Generate with If you want to use Hugging Face to generate text, you can do something like this. ## Limitations The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall" +} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mistral-Nemo-Instruct-2407.json b/data/model_data_json/mistralai_Mistral-Nemo-Instruct-2407.json new file mode 100644 index 0000000000000000000000000000000000000000..26f921cfb02f608b89acb1b20bcaaa6aa1b38e27 --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-Nemo-Instruct-2407.json @@ -0,0 +1,28 @@ +{ + "model_id": "mistralai/Mistral-Nemo-Instruct-2407", + "downloads": 146936, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "conversational", + "en", + "fr", + "de", + "es", + "it", + "pt", + "ru", + "zh", + "ja", + "base_model:mistralai/Mistral-Nemo-Base-2407", + "base_model:finetune:mistralai/Mistral-Nemo-Base-2407", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - fr - de - es - it - pt - ru - zh - ja license: apache-2.0 base_model: mistralai/Mistral-Nemo-Base-2407 
extra_gated_description: If you want to learn more about how we process your personal data, please read our . --- # Model Card for Mistral-Nemo-Instruct-2407 The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size. For more details about this model please refer to our release blog post. ## Key features - Released under the **Apache 2 License** - Pre-trained and instructed versions - Trained with a **128k context window** - Trained on a large proportion of **multilingual and code data** - Drop-in replacement of Mistral 7B ## Model Architecture Mistral Nemo is a transformer model, with the following architecture choices: - **Layers:** 40 - **Dim:** 5,120 - **Head dim:** 128 - **Hidden dim:** 14,336 - **Activation Function:** SwiGLU - **Number of heads:** 32 - **Number of kv-heads:** 8 (GQA) - **Vocabulary size:** 2**17 ~= 128k - **Rotary embeddings (theta = 1M)** ## Metrics ### Main Benchmarks | Benchmark | Score | | --- | --- | | HellaSwag (0-shot) | 83.5% | | Winogrande (0-shot) | 76.8% | | OpenBookQA (0-shot) | 60.6% | | CommonSenseQA (0-shot) | 70.4% | | TruthfulQA (0-shot) | 50.3% | | MMLU (5-shot) | 68.0% | | TriviaQA (5-shot) | 73.8% | | NaturalQuestions (5-shot) | 31.2% | ### Multilingual Benchmarks (MMLU) | Language | Score | | --- | --- | | French | 62.3% | | German | 62.7% | | Spanish | 64.6% | | Italian | 61.3% | | Portuguese | 63.3% | | Russian | 59.2% | | Chinese | 59.0% | | Japanese | 59.0% | ## Usage The model can be used with three different frameworks - []( See here - []( See here - []( See nvidia/Mistral-NeMo-12B-Instruct ### Mistral Inference #### Install It is recommended to use with mistral-inference. For HF transformers code snippets, please keep scrolling. #### Download #### Chat After installing , a CLI command should be available in your environment. You can chat with the model using *E.g.* Try out something like: #### Instruct following #### Function calling ### Transformers > [!IMPORTANT] > NOTE: Until a new release has been made, you need to install transformers from source: > If you want to use Hugging Face to generate text, you can do something like this. ## Function calling with To use this example, you'll need version 4.42.0 or higher. Please see the function calling guide in the docs for more information. Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool results to the chat history so that the model can use them in its next generation. For a full tool calling example, please see the function calling guide, and note that Mistral **does** use tool call IDs, so these must be included in your tool calls and tool results. They should be exactly 9 alphanumeric characters. > [!TIP] > Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend to use a temperature of 0.3. ## Limitations The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. 
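The Transformers snippet referenced above is elided; a minimal sketch, assuming the chat-capable text-generation pipeline and using the 0.3 temperature recommended in the tip:

```python
# Sketch: chat generation with the Transformers pipeline, using the
# low temperature (0.3) recommended above for Mistral Nemo.
import torch
from transformers import pipeline

chatbot = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me a short introduction to Mistral Nemo."}]
out = chatbot(messages, max_new_tokens=128, do_sample=True, temperature=0.3)
print(out[0]["generated_text"][-1]["content"])
```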
## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall" +} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mistral-Small-24B-Instruct-2501.json b/data/model_data_json/mistralai_Mistral-Small-24B-Instruct-2501.json new file mode 100644 index 0000000000000000000000000000000000000000..3f7c16c7bb6bfa931f762a00183174971fc7014b --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-Small-24B-Instruct-2501.json @@ -0,0 +1,28 @@ +{ + "model_id": "mistralai/Mistral-Small-24B-Instruct-2501", + "downloads": 822996, + "tags": [ + "vllm", + "safetensors", + "mistral", + "text-generation", + "transformers", + "conversational", + "en", + "fr", + "de", + "es", + "it", + "pt", + "zh", + "ja", + "ru", + "ko", + "base_model:mistralai/Mistral-Small-24B-Base-2501", + "base_model:finetune:mistralai/Mistral-Small-24B-Base-2501", + "license:apache-2.0", + "text-generation-inference", + "region:us" + ], + "description": "--- language: - en - fr - de - es - it - pt - zh - ja - ru - ko license: apache-2.0 library_name: vllm inference: false base_model: - mistralai/Mistral-Small-24B-Base-2501 extra_gated_description: >- If you want to learn more about how we process your personal data, please read our . tags: - transformers --- # Model Card for Mistral-Small-24B-Instruct-2501 Mistral Small 3 ( 2501 ) sets a new benchmark in the \"small\" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models! This model is an instruction-fine-tuned version of the base model: Mistral-Small-24B-Base-2501. Mistral Small can be deployed locally and is exceptionally \"knowledge-dense\", fitting in a single RTX 4090 or a 32GB RAM MacBook once quantized. Perfect for: - Fast response conversational agents. - Low latency function calling. - Subject matter experts via fine-tuning. - Local inference for hobbyists and organizations handling sensitive data. For enterprises that need specialized capabilities (increased context, particular modalities, domain specific knowledge, etc.), we will be releasing commercial models beyond what Mistral AI contributes to the community. This release demonstrates our commitment to open source, serving as a strong base model. Learn more about Mistral Small in our blog post. 
Model developer: Mistral AI Team ## Key Features - **Multilingual:** Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. - **Agent-Centric:** Offers best-in-class agentic capabilities with native function calling and JSON outputting. - **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities. - **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes. - **Context Window:** A 32k context window. - **System Prompt:** Maintains strong adherence and support for system prompts. - **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size. ## Benchmark results ### Human evaluated benchmarks | Category | Gemma-2-27B | Qwen-2.5-32B | Llama-3.3-70B | Gpt4o-mini | |----------|-------------|--------------|---------------|------------| | Mistral is better | 0.536 | 0.496 | 0.192 | 0.200 | | Mistral is slightly better | 0.196 | 0.184 | 0.164 | 0.204 | | Ties | 0.052 | 0.060 | 0.236 | 0.160 | | Other is slightly better | 0.060 | 0.088 | 0.112 | 0.124 | | Other is better | 0.156 | 0.172 | 0.296 | 0.312 | **Note**: - We conducted side by side evaluations with an external third-party vendor, on a set of over 1k proprietary coding and generalist prompts. - Evaluators were tasked with selecting their preferred model response from anonymized generations produced by Mistral Small 3 vs another model. - We are aware that in some cases the benchmarks on human judgement starkly differ from publicly available benchmarks, but have taken extra caution in verifying a fair evaluation. We are confident that the above benchmarks are valid. ### Publicly accessible benchmarks **Reasoning & Knowledge** | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 | |------------|---------------|--------------|---------------|---------------|-------------| | mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 | | gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 | **Math & Coding** | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 | |------------|---------------|--------------|---------------|---------------|-------------| | humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 | | math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 | **Instruction following** | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 | |------------|---------------|--------------|---------------|---------------|-------------| | mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 | | wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 | | arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 | | ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 | **Note**: - Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline - as such, numbers may vary slightly from previously reported performance (Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct, Gemma-2-27B-IT). - Judge-based evals such as Wildbench, Arena hard and MTBench were based on gpt-4o-2024-05-13.
### Basic Instruct Template (V7-Tekken) *, and are placeholders.* ***Please make sure to use mistral-common as the source of truth*** ## Usage The model can be used with the following frameworks: - []( See here - []( See here ### vLLM We recommend using this model with the vLLM library to implement production-ready inference pipelines. **Note 1**: We recommend using a relatively low temperature, such as . **Note 2**: Make sure to add a system prompt to the model to best tailor it for your needs. If you want to use the model as a general assistant, we recommend the following system prompt: **_Installation_** Make sure you install []( Also make sure you have []( installed: You can also make use of a ready-to-go docker image or on the docker hub. #### Server We recommend that you use Mistral-Small-24B-Instruct-2501 in a server/client setting. 1. Spin up a server: **Note:** Running Mistral-Small-24B-Instruct-2501 on GPU requires ~55 GB of GPU RAM in bf16 or fp16. 2. To ping the server, you can use a simple Python snippet. ### Function calling Mistral-Small-24B-Instruct-2501 is excellent at function / tool calling tasks via vLLM. *E.g.:*
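The card's own example is collapsed away; as a stand-in, a hypothetical client-side sketch against a vLLM OpenAI-compatible server (the get_current_weather tool and the localhost URL are illustrative, not part of the card):

```python
# Hypothetical sketch: tool calling against a vLLM OpenAI-compatible
# server already serving Mistral-Small-24B-Instruct-2501.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # illustrative tool, not from the card
        "description": "Get the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "What's the weather in Paris today?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)
```

Remember that, as noted above, Mistral uses 9-character alphanumeric tool call IDs, which must be echoed back in your tool results.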
#### Offline ### Transformers If you want to use Hugging Face transformers to generate text, you can do something like this. ### Ollama Ollama can run this model locally on macOS, Windows and Linux. 4-bit quantization (aliased to default): 8-bit quantization: FP16:" +} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mistral-Small-3.1-24B-Instruct-2503.json b/data/model_data_json/mistralai_Mistral-Small-3.1-24B-Instruct-2503.json new file mode 100644 index 0000000000000000000000000000000000000000..7962a6f6cad82efa00d0cedeef89bcdb81e42473 --- /dev/null +++ b/data/model_data_json/mistralai_Mistral-Small-3.1-24B-Instruct-2503.json @@ -0,0 +1,40 @@ +{ + "model_id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", + "downloads": 78289, + "tags": [ + "vllm", + "safetensors", + "mistral3", + "image-text-to-text", + "conversational", + "en", + "fr", + "de", + "es", + "pt", + "it", + "ja", + "ko", + "ru", + "zh", + "ar", + "fa", + "id", + "ms", + "ne", + "pl", + "ro", + "sr", + "sv", + "tr", + "uk", + "vi", + "hi", + "bn", + "base_model:mistralai/Mistral-Small-3.1-24B-Base-2503", + "base_model:finetune:mistralai/Mistral-Small-3.1-24B-Base-2503", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: - en - fr - de - es - pt - it - ja - ko - ru - zh - ar - fa - id - ms - ne - pl - ro - sr - sv - tr - uk - vi - hi - bn license: apache-2.0 library_name: vllm inference: false base_model: - mistralai/Mistral-Small-3.1-24B-Base-2503 extra_gated_description: >- If you want to learn more about how we process your personal data, please read our
. pipeline_tag: image-text-to-text --- # Model Card for Mistral-Small-3.1-24B-Instruct-2503 Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) **adds state-of-the-art vision understanding** and enhances **long context capabilities up to 128k tokens** without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks. This model is an instruction-finetuned version of: Mistral-Small-3.1-24B-Base-2503. Mistral Small 3.1 can be deployed locally and is exceptionally \"knowledge-dense,\" fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized. It is ideal for: - Fast-response conversational agents. - Low-latency function calling. - Subject matter experts via fine-tuning. - Local inference for hobbyists and organizations handling sensitive data. - Programming and math reasoning. - Long document understanding. - Visual understanding. For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community. Learn more about Mistral Small 3.1 in our blog post. ## Key Features - **Vision:** Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text. - **Multilingual:** Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi. - **Agent-Centric:** Offers best-in-class agentic capabilities with native function calling and JSON outputting. - **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities. - **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes. - **Context Window:** A 128k context window. - **System Prompt:** Maintains strong adherence and support for system prompts. - **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size. ## Benchmark Results When available, we report numbers previously published by other model providers, otherwise we re-evaluate them using our own evaluation harness. 
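Before the benchmark tables, a hypothetical sketch of the vision capability described above, sent through a vLLM OpenAI-compatible endpoint; the server URL and image URL are placeholders, and the temperature value is an assumed reading of the "relatively low temperature" note further down:

```python
# Hypothetical sketch: multimodal chat completion against a vLLM
# OpenAI-compatible server hosting Mistral-Small-3.1-24B-Instruct-2503.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder server

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder image
        ],
    }],
    temperature=0.15,  # assumed value for the recommended low temperature
)
print(response.choices[0].message.content)
```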
### Pretrain Evals | Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT)| MMMU | |--------------------------------|---------------|-----------------------|------------|-----------------------|-----------| | **Small 3.1 24B Base** | **81.01%** | **56.03%** | 80.50% | **37.50%** | **59.27%**| | Gemma 3 27B PT | 78.60% | 52.20% | **81.30%** | 24.30% | 56.10% | ### Instruction Evals #### Text | Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT )| MBPP | HumanEval | SimpleQA (TotalAcc)| |--------------------------------|-----------|-----------------------|------------------------|------------------------|---------------------------|-----------|-----------|--------------------| | **Small 3.1 24B Instruct** | 80.62% | 66.76% | 69.30% | **44.42%** | **45.96%** | 74.71% | **88.41%**| **10.43%** | | Gemma 3 27B IT | 76.90% | **67.50%** | **89.00%** | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% | | GPT4o Mini | **82.00%**| 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% | | Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | **85.60%**| 88.10% | 8.02% | | Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% | #### Vision | Model | MMMU | MMMU PRO | Mathvista | ChartQA | DocVQA | AI2D | MM MT Bench | |--------------------------------|------------|-----------|-----------|-----------|-----------|-------------|-------------| | **Small 3.1 24B Instruct** | 64.00% | **49.25%**| **68.91%**| 86.24% | **94.08%**| **93.72%** | **7.3** | | Gemma 3 27B IT | **64.90%** | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 | | GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 | | Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | **87.20%**| 90.00% | 92.10% | 6.5 | | Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 | ### Multilingual Evals | Model | Average | European | East Asian | Middle Eastern | |--------------------------------|------------|------------|------------|----------------| | **Small 3.1 24B Instruct** | **71.18%** | **75.30%** | **69.17%** | 69.08% | | Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% | | GPT4o Mini | 70.36% | 74.21% | 65.96% | **70.90%** | | Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% | | Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% | ### Long Context Evals | Model | LongBench v2 | RULER 32K | RULER 128K | |--------------------------------|-----------------|-------------|------------| | **Small 3.1 24B Instruct** | **37.18%** | **93.96%** | 81.20% | | Gemma 3 27B IT | 34.59% | 91.10% | 66.00% | | GPT4o Mini | 29.30% | 90.20% | 65.8% | | Claude 3.5 Haiku | 35.19% | 92.60% | **91.90%** | ## Basic Instruct Template (V7-Tekken) *, and are placeholders.* ***Please make sure to use mistral-common as the source of truth*** ## Usage The model can be used with the following frameworks: - []( See here **Note 1**: We recommend using a relatively low temperature, such as . **Note 2**: Make sure to add a system prompt to the model to best tailor it for your needs. If you want to use the model as a general assistant, we recommend the following system prompt: ### vLLM (recommended) We recommend using this model with the vLLM library to implement production-ready inference pipelines. **_Installation_** Make sure you install []( Doing so should automatically install []( To check: You can also make use of a ready-to-go docker image or on the docker hub.
#### Server We recommend that you use Mistral-Small-3.1-24B-Instruct-2503 in a server/client setting. 1. Spin up a server: **Note:** Running Mistral-Small-3.1-24B-Instruct-2503 on GPU requires ~55 GB of GPU RAM in bf16 or fp16. 2. To ping the server, you can use a simple Python snippet. ### Function calling Mistral-Small-3.1-24B-Instruct-2503 is excellent at function / tool calling tasks via vLLM, analogous to the Mistral-Small-24B-Instruct-2501 sketch shown earlier. *E.g.:*
#### Offline ### Transformers (untested) Transformers-compatible model weights are also uploaded (thanks a lot @cyrilvallez). However, the transformers implementation was **not thoroughly tested**, but only on \"vibe-checks\". Hence, we can only ensure 100% correct behavior when using the original weight format with vllm (see above)." +} \ No newline at end of file diff --git a/data/model_data_json/mistralai_Mixtral-8x7B-Instruct-v0.1.json b/data/model_data_json/mistralai_Mixtral-8x7B-Instruct-v0.1.json new file mode 100644 index 0000000000000000000000000000000000000000..f6b04e637b00e59a592f1578939ff67e8937184a --- /dev/null +++ b/data/model_data_json/mistralai_Mixtral-8x7B-Instruct-v0.1.json @@ -0,0 +1,24 @@ +{ + "model_id": "mistralai/Mixtral-8x7B-Instruct-v0.1", + "downloads": 477341, + "tags": [ + "transformers", + "safetensors", + "mixtral", + "text-generation", + "conversational", + "fr", + "it", + "de", + "es", + "en", + "base_model:mistralai/Mixtral-8x7B-v0.1", + "base_model:finetune:mistralai/Mixtral-8x7B-v0.1", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - fr - it - de - es - en license: apache-2.0 base_model: mistralai/Mixtral-8x7B-v0.1 inference: parameters: temperature: 0.5 widget: - messages: - role: user content: What is your favorite condiment? extra_gated_description: If you want to learn more about how we process your personal data, please read our
. --- # Model Card for Mixtral-8x7B ### Tokenization with ## Inference with ## Inference with hugging face > [!TIP] > PRs to correct the transformers tokenizer so that it gives 1-to-1 the same results as the mistral-common reference implementation are very welcome! --- The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested. For full details of this model please read our release blog post. ## Warning This repo contains weights that are compatible with vLLM serving of the model as well as Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different. Please note that the model cannot (yet) be instantiated with HF. ## Instruction format This format must be strictly respected, otherwise the model will generate sub-optimal outputs. The template used to build a prompt for the Instruct model is defined as follows: Note that and are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings. As reference, here is the pseudo-code used to tokenize instructions during fine-tuning: In the pseudo-code above, note that the method should not add a BOS or EOS token automatically, but should add a prefix space. In the Transformers library, one can use chat templates which make sure the right format is applied. ## Run the model By default, transformers will load the model in full precision. Therefore, you might be interested in further reducing the memory requirements to run the model through the optimizations we offer in the HF ecosystem: ### In half-precision Note: half precision only works on GPU devices
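A minimal sketch of the collapsed snippet, assuming the standard Transformers float16 path:

```python
# Sketch: loading Mixtral in half precision; as noted above, half
# precision only works on GPU devices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Explain what a Mixture of Experts is. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=100)[0]))
```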
### Lower precision (8-bit & 4-bit)
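A minimal sketch of the collapsed snippet, assuming bitsandbytes is the quantization backend (pip install bitsandbytes accelerate):

```python
# Sketch: 4-bit quantized loading; swap load_in_4bit for load_in_8bit
# to get the 8-bit variant instead.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```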
### Load the model with Flash Attention 2
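A minimal sketch of the collapsed snippet (requires a compatible GPU and pip install flash-attn):

```python
# Sketch: enabling Flash Attention 2; half-precision weights are
# required by the flash attention kernels.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```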
## Limitations The Mixtral-8x7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. # The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed." +} \ No newline at end of file diff --git a/data/model_data_json/mixedbread-ai_mxbai-embed-large-v1.json b/data/model_data_json/mixedbread-ai_mxbai-embed-large-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..6dc17baf38d239be44b72baa094c41af352164a4 --- /dev/null +++ b/data/model_data_json/mixedbread-ai_mxbai-embed-large-v1.json @@ -0,0 +1,26 @@ +{ + "model_id": "mixedbread-ai/mxbai-embed-large-v1", + "downloads": 2766832, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "openvino", + "gguf", + "bert", + "feature-extraction", + "mteb", + "transformers.js", + "transformers", + "en", + "arxiv:2309.12871", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - transformers.js - transformers model-index: - name: mxbai-angle-large-v1 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.044776119403 - type: ap value: 37.7362433623053 - type: f1 value: 68.92736573359774 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 93.84025000000001 - type: ap value: 90.93190875404055 - type: f1 value: 93.8297833897293 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 49.184 - type: f1 value: 48.74163227751588 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 41.252 - type: map_at_10 value: 57.778 - type: map_at_100 value: 58.233000000000004 - type: map_at_1000 value: 58.23700000000001 - type: map_at_3 value: 53.449999999999996 - type: map_at_5 value: 56.376000000000005 - type: mrr_at_1 value: 41.679 - type: mrr_at_10 value: 57.92699999999999 - type: mrr_at_100 value: 58.389 - type: mrr_at_1000 value: 58.391999999999996 - type: mrr_at_3 value: 53.651 - type: mrr_at_5 value: 56.521 - type: ndcg_at_1 value: 41.252 - type: ndcg_at_10 value: 66.018 - type: ndcg_at_100 value: 67.774 - type: ndcg_at_1000 value: 67.84400000000001 - type: ndcg_at_3 value: 57.372 - type: ndcg_at_5 value: 62.646 - type: precision_at_1 value: 41.252 - type: precision_at_10 value: 9.189 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 
value: 22.902 - type: precision_at_5 value: 16.302 - type: recall_at_1 value: 41.252 - type: recall_at_10 value: 91.892 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 68.706 - type: recall_at_5 value: 81.50800000000001 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.97294504317859 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 42.98071077674629 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 65.16477858490782 - type: mrr value: 78.23583080508287 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.6277629421789 - type: cos_sim_spearman value: 88.4056288400568 - type: euclidean_pearson value: 87.94871847578163 - type: euclidean_spearman value: 88.4056288400568 - type: manhattan_pearson value: 87.73271254229648 - type: manhattan_spearman value: 87.91826833762677 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 87.81818181818181 - type: f1 value: 87.79879337316918 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.91773608582761 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.73059477462478 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.745999999999995 - type: map_at_10 value: 43.632 - type: map_at_100 value: 45.206 - type: map_at_1000 value: 45.341 - type: map_at_3 value: 39.956 - type: map_at_5 value: 42.031 - type: mrr_at_1 value: 39.485 - type: mrr_at_10 value: 49.537 - type: mrr_at_100 value: 50.249 - type: mrr_at_1000 value: 50.294000000000004 - type: mrr_at_3 value: 46.757 - type: mrr_at_5 value: 48.481 - type: ndcg_at_1 value: 39.485 - type: ndcg_at_10 value: 50.058 - type: ndcg_at_100 value: 55.586 - type: ndcg_at_1000 value: 57.511 - type: ndcg_at_3 value: 44.786 - type: ndcg_at_5 value: 47.339999999999996 - type: precision_at_1 value: 39.485 - type: precision_at_10 value: 9.557 - type: precision_at_100 value: 1.552 - type: precision_at_1000 value: 0.202 - type: precision_at_3 value: 21.412 - type: precision_at_5 value: 15.479000000000001 - type: recall_at_1 value: 32.745999999999995 - type: recall_at_10 value: 62.056 - type: recall_at_100 value: 85.088 - type: recall_at_1000 value: 96.952 - type: recall_at_3 value: 46.959 - type: recall_at_5 value: 54.06999999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test 
revision: None metrics: - type: map_at_1 value: 31.898 - type: map_at_10 value: 42.142 - type: map_at_100 value: 43.349 - type: map_at_1000 value: 43.483 - type: map_at_3 value: 39.18 - type: map_at_5 value: 40.733000000000004 - type: mrr_at_1 value: 39.617999999999995 - type: mrr_at_10 value: 47.922 - type: mrr_at_100 value: 48.547000000000004 - type: mrr_at_1000 value: 48.597 - type: mrr_at_3 value: 45.86 - type: mrr_at_5 value: 46.949000000000005 - type: ndcg_at_1 value: 39.617999999999995 - type: ndcg_at_10 value: 47.739 - type: ndcg_at_100 value: 51.934999999999995 - type: ndcg_at_1000 value: 54.007000000000005 - type: ndcg_at_3 value: 43.748 - type: ndcg_at_5 value: 45.345 - type: precision_at_1 value: 39.617999999999995 - type: precision_at_10 value: 8.962 - type: precision_at_100 value: 1.436 - type: precision_at_1000 value: 0.192 - type: precision_at_3 value: 21.083 - type: precision_at_5 value: 14.752 - type: recall_at_1 value: 31.898 - type: recall_at_10 value: 57.587999999999994 - type: recall_at_100 value: 75.323 - type: recall_at_1000 value: 88.304 - type: recall_at_3 value: 45.275 - type: recall_at_5 value: 49.99 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.458 - type: map_at_10 value: 52.942 - type: map_at_100 value: 53.974 - type: map_at_1000 value: 54.031 - type: map_at_3 value: 49.559999999999995 - type: map_at_5 value: 51.408 - type: mrr_at_1 value: 46.27 - type: mrr_at_10 value: 56.31699999999999 - type: mrr_at_100 value: 56.95099999999999 - type: mrr_at_1000 value: 56.98 - type: mrr_at_3 value: 53.835 - type: mrr_at_5 value: 55.252 - type: ndcg_at_1 value: 46.27 - type: ndcg_at_10 value: 58.964000000000006 - type: ndcg_at_100 value: 62.875 - type: ndcg_at_1000 value: 63.969 - type: ndcg_at_3 value: 53.297000000000004 - type: ndcg_at_5 value: 55.938 - type: precision_at_1 value: 46.27 - type: precision_at_10 value: 9.549000000000001 - type: precision_at_100 value: 1.2409999999999999 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 23.762 - type: precision_at_5 value: 16.262999999999998 - type: recall_at_1 value: 40.458 - type: recall_at_10 value: 73.446 - type: recall_at_100 value: 90.12400000000001 - type: recall_at_1000 value: 97.795 - type: recall_at_3 value: 58.123000000000005 - type: recall_at_5 value: 64.68 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.443 - type: map_at_10 value: 36.081 - type: map_at_100 value: 37.163000000000004 - type: map_at_1000 value: 37.232 - type: map_at_3 value: 33.308 - type: map_at_5 value: 34.724 - type: mrr_at_1 value: 29.492 - type: mrr_at_10 value: 38.138 - type: mrr_at_100 value: 39.065 - type: mrr_at_1000 value: 39.119 - type: mrr_at_3 value: 35.593 - type: mrr_at_5 value: 36.785000000000004 - type: ndcg_at_1 value: 29.492 - type: ndcg_at_10 value: 41.134 - type: ndcg_at_100 value: 46.300999999999995 - type: ndcg_at_1000 value: 48.106 - type: ndcg_at_3 value: 35.77 - type: ndcg_at_5 value: 38.032 - type: precision_at_1 value: 29.492 - type: precision_at_10 value: 6.249 - type: precision_at_100 value: 0.9299999999999999 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 15.065999999999999 - type: precision_at_5 value: 10.373000000000001 - type: recall_at_1 value: 27.443 - type: recall_at_10 value: 54.80199999999999 - 
type: recall_at_100 value: 78.21900000000001 - type: recall_at_1000 value: 91.751 - type: recall_at_3 value: 40.211000000000006 - type: recall_at_5 value: 45.599000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.731 - type: map_at_10 value: 26.717999999999996 - type: map_at_100 value: 27.897 - type: map_at_1000 value: 28.029 - type: map_at_3 value: 23.91 - type: map_at_5 value: 25.455 - type: mrr_at_1 value: 23.134 - type: mrr_at_10 value: 31.769 - type: mrr_at_100 value: 32.634 - type: mrr_at_1000 value: 32.707 - type: mrr_at_3 value: 28.938999999999997 - type: mrr_at_5 value: 30.531000000000002 - type: ndcg_at_1 value: 23.134 - type: ndcg_at_10 value: 32.249 - type: ndcg_at_100 value: 37.678 - type: ndcg_at_1000 value: 40.589999999999996 - type: ndcg_at_3 value: 26.985999999999997 - type: ndcg_at_5 value: 29.457 - type: precision_at_1 value: 23.134 - type: precision_at_10 value: 5.8709999999999996 - type: precision_at_100 value: 0.988 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 12.852 - type: precision_at_5 value: 9.428 - type: recall_at_1 value: 18.731 - type: recall_at_10 value: 44.419 - type: recall_at_100 value: 67.851 - type: recall_at_1000 value: 88.103 - type: recall_at_3 value: 29.919 - type: recall_at_5 value: 36.230000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.324 - type: map_at_10 value: 41.265 - type: map_at_100 value: 42.559000000000005 - type: map_at_1000 value: 42.669000000000004 - type: map_at_3 value: 38.138 - type: map_at_5 value: 39.881 - type: mrr_at_1 value: 36.67 - type: mrr_at_10 value: 46.774 - type: mrr_at_100 value: 47.554 - type: mrr_at_1000 value: 47.593 - type: mrr_at_3 value: 44.338 - type: mrr_at_5 value: 45.723 - type: ndcg_at_1 value: 36.67 - type: ndcg_at_10 value: 47.367 - type: ndcg_at_100 value: 52.623 - type: ndcg_at_1000 value: 54.59 - type: ndcg_at_3 value: 42.323 - type: ndcg_at_5 value: 44.727 - type: precision_at_1 value: 36.67 - type: precision_at_10 value: 8.518 - type: precision_at_100 value: 1.2890000000000001 - type: precision_at_1000 value: 0.163 - type: precision_at_3 value: 19.955000000000002 - type: precision_at_5 value: 14.11 - type: recall_at_1 value: 30.324 - type: recall_at_10 value: 59.845000000000006 - type: recall_at_100 value: 81.77499999999999 - type: recall_at_1000 value: 94.463 - type: recall_at_3 value: 46.019 - type: recall_at_5 value: 52.163000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.229 - type: map_at_10 value: 35.004000000000005 - type: map_at_100 value: 36.409000000000006 - type: map_at_1000 value: 36.521 - type: map_at_3 value: 31.793 - type: map_at_5 value: 33.432 - type: mrr_at_1 value: 30.365 - type: mrr_at_10 value: 40.502 - type: mrr_at_100 value: 41.372 - type: mrr_at_1000 value: 41.435 - type: mrr_at_3 value: 37.804 - type: mrr_at_5 value: 39.226 - type: ndcg_at_1 value: 30.365 - type: ndcg_at_10 value: 41.305 - type: ndcg_at_100 value: 47.028999999999996 - type: ndcg_at_1000 value: 49.375 - type: ndcg_at_3 value: 35.85 - type: ndcg_at_5 value: 38.12 - type: precision_at_1 value: 30.365 - type: precision_at_10 value: 7.808 - type: precision_at_100 value: 
1.228 - type: precision_at_1000 value: 0.161 - type: precision_at_3 value: 17.352 - type: precision_at_5 value: 12.42 - type: recall_at_1 value: 24.229 - type: recall_at_10 value: 54.673 - type: recall_at_100 value: 78.766 - type: recall_at_1000 value: 94.625 - type: recall_at_3 value: 39.602 - type: recall_at_5 value: 45.558 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.695 - type: map_at_10 value: 36.0895 - type: map_at_100 value: 37.309416666666664 - type: map_at_1000 value: 37.42558333333334 - type: map_at_3 value: 33.19616666666666 - type: map_at_5 value: 34.78641666666667 - type: mrr_at_1 value: 31.486083333333337 - type: mrr_at_10 value: 40.34774999999999 - type: mrr_at_100 value: 41.17533333333333 - type: mrr_at_1000 value: 41.231583333333326 - type: mrr_at_3 value: 37.90075 - type: mrr_at_5 value: 39.266999999999996 - type: ndcg_at_1 value: 31.486083333333337 - type: ndcg_at_10 value: 41.60433333333334 - type: ndcg_at_100 value: 46.74525 - type: ndcg_at_1000 value: 48.96166666666667 - type: ndcg_at_3 value: 36.68825 - type: ndcg_at_5 value: 38.966499999999996 - type: precision_at_1 value: 31.486083333333337 - type: precision_at_10 value: 7.29675 - type: precision_at_100 value: 1.1621666666666666 - type: precision_at_1000 value: 0.1545 - type: precision_at_3 value: 16.8815 - type: precision_at_5 value: 11.974583333333333 - type: recall_at_1 value: 26.695 - type: recall_at_10 value: 53.651916666666665 - type: recall_at_100 value: 76.12083333333332 - type: recall_at_1000 value: 91.31191666666668 - type: recall_at_3 value: 40.03575 - type: recall_at_5 value: 45.876666666666665 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.668000000000003 - type: map_at_10 value: 32.486 - type: map_at_100 value: 33.371 - type: map_at_1000 value: 33.458 - type: map_at_3 value: 30.261 - type: map_at_5 value: 31.418000000000003 - type: mrr_at_1 value: 28.988000000000003 - type: mrr_at_10 value: 35.414 - type: mrr_at_100 value: 36.149 - type: mrr_at_1000 value: 36.215 - type: mrr_at_3 value: 33.333 - type: mrr_at_5 value: 34.43 - type: ndcg_at_1 value: 28.988000000000003 - type: ndcg_at_10 value: 36.732 - type: ndcg_at_100 value: 41.331 - type: ndcg_at_1000 value: 43.575 - type: ndcg_at_3 value: 32.413 - type: ndcg_at_5 value: 34.316 - type: precision_at_1 value: 28.988000000000003 - type: precision_at_10 value: 5.7059999999999995 - type: precision_at_100 value: 0.882 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 13.65 - type: precision_at_5 value: 9.417 - type: recall_at_1 value: 25.668000000000003 - type: recall_at_10 value: 47.147 - type: recall_at_100 value: 68.504 - type: recall_at_1000 value: 85.272 - type: recall_at_3 value: 35.19 - type: recall_at_5 value: 39.925 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.256 - type: map_at_10 value: 24.58 - type: map_at_100 value: 25.773000000000003 - type: map_at_1000 value: 25.899 - type: map_at_3 value: 22.236 - type: map_at_5 value: 23.507 - type: mrr_at_1 value: 20.957 - type: mrr_at_10 value: 28.416000000000004 - type: mrr_at_100 value: 29.447000000000003 - type: mrr_at_1000 value: 29.524 - type: mrr_at_3 value: 26.245 - type: mrr_at_5 value: 
27.451999999999998 - type: ndcg_at_1 value: 20.957 - type: ndcg_at_10 value: 29.285 - type: ndcg_at_100 value: 35.003 - type: ndcg_at_1000 value: 37.881 - type: ndcg_at_3 value: 25.063000000000002 - type: ndcg_at_5 value: 26.983 - type: precision_at_1 value: 20.957 - type: precision_at_10 value: 5.344 - type: precision_at_100 value: 0.958 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 11.918 - type: precision_at_5 value: 8.596 - type: recall_at_1 value: 17.256 - type: recall_at_10 value: 39.644 - type: recall_at_100 value: 65.279 - type: recall_at_1000 value: 85.693 - type: recall_at_3 value: 27.825 - type: recall_at_5 value: 32.792 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.700000000000003 - type: map_at_10 value: 36.205999999999996 - type: map_at_100 value: 37.316 - type: map_at_1000 value: 37.425000000000004 - type: map_at_3 value: 33.166000000000004 - type: map_at_5 value: 35.032999999999994 - type: mrr_at_1 value: 31.436999999999998 - type: mrr_at_10 value: 40.61 - type: mrr_at_100 value: 41.415 - type: mrr_at_1000 value: 41.48 - type: mrr_at_3 value: 37.966 - type: mrr_at_5 value: 39.599000000000004 - type: ndcg_at_1 value: 31.436999999999998 - type: ndcg_at_10 value: 41.771 - type: ndcg_at_100 value: 46.784 - type: ndcg_at_1000 value: 49.183 - type: ndcg_at_3 value: 36.437000000000005 - type: ndcg_at_5 value: 39.291 - type: precision_at_1 value: 31.436999999999998 - type: precision_at_10 value: 6.987 - type: precision_at_100 value: 1.072 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 16.448999999999998 - type: precision_at_5 value: 11.866 - type: recall_at_1 value: 26.700000000000003 - type: recall_at_10 value: 54.301 - type: recall_at_100 value: 75.871 - type: recall_at_1000 value: 92.529 - type: recall_at_3 value: 40.201 - type: recall_at_5 value: 47.208 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.296 - type: map_at_10 value: 33.116 - type: map_at_100 value: 34.81 - type: map_at_1000 value: 35.032000000000004 - type: map_at_3 value: 30.105999999999998 - type: map_at_5 value: 31.839000000000002 - type: mrr_at_1 value: 29.051 - type: mrr_at_10 value: 37.803 - type: mrr_at_100 value: 38.856 - type: mrr_at_1000 value: 38.903999999999996 - type: mrr_at_3 value: 35.211 - type: mrr_at_5 value: 36.545 - type: ndcg_at_1 value: 29.051 - type: ndcg_at_10 value: 39.007 - type: ndcg_at_100 value: 45.321 - type: ndcg_at_1000 value: 47.665 - type: ndcg_at_3 value: 34.1 - type: ndcg_at_5 value: 36.437000000000005 - type: precision_at_1 value: 29.051 - type: precision_at_10 value: 7.668 - type: precision_at_100 value: 1.542 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 16.14 - type: precision_at_5 value: 11.897 - type: recall_at_1 value: 24.296 - type: recall_at_10 value: 49.85 - type: recall_at_100 value: 78.457 - type: recall_at_1000 value: 92.618 - type: recall_at_3 value: 36.138999999999996 - type: recall_at_5 value: 42.223 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 20.591 - type: map_at_10 value: 28.902 - type: map_at_100 value: 29.886000000000003 - type: map_at_1000 value: 29.987000000000002 - type: map_at_3 value: 
26.740000000000002 - type: map_at_5 value: 27.976 - type: mrr_at_1 value: 22.366 - type: mrr_at_10 value: 30.971 - type: mrr_at_100 value: 31.865 - type: mrr_at_1000 value: 31.930999999999997 - type: mrr_at_3 value: 28.927999999999997 - type: mrr_at_5 value: 30.231 - type: ndcg_at_1 value: 22.366 - type: ndcg_at_10 value: 33.641 - type: ndcg_at_100 value: 38.477 - type: ndcg_at_1000 value: 41.088 - type: ndcg_at_3 value: 29.486 - type: ndcg_at_5 value: 31.612000000000002 - type: precision_at_1 value: 22.366 - type: precision_at_10 value: 5.3420000000000005 - type: precision_at_100 value: 0.828 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 12.939 - type: precision_at_5 value: 9.094 - type: recall_at_1 value: 20.591 - type: recall_at_10 value: 46.052 - type: recall_at_100 value: 68.193 - type: recall_at_1000 value: 87.638 - type: recall_at_3 value: 34.966 - type: recall_at_5 value: 40.082 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 15.091 - type: map_at_10 value: 26.38 - type: map_at_100 value: 28.421999999999997 - type: map_at_1000 value: 28.621999999999996 - type: map_at_3 value: 21.597 - type: map_at_5 value: 24.12 - type: mrr_at_1 value: 34.266999999999996 - type: mrr_at_10 value: 46.864 - type: mrr_at_100 value: 47.617 - type: mrr_at_1000 value: 47.644 - type: mrr_at_3 value: 43.312 - type: mrr_at_5 value: 45.501000000000005 - type: ndcg_at_1 value: 34.266999999999996 - type: ndcg_at_10 value: 36.095 - type: ndcg_at_100 value: 43.447 - type: ndcg_at_1000 value: 46.661 - type: ndcg_at_3 value: 29.337999999999997 - type: ndcg_at_5 value: 31.824 - type: precision_at_1 value: 34.266999999999996 - type: precision_at_10 value: 11.472 - type: precision_at_100 value: 1.944 - type: precision_at_1000 value: 0.255 - type: precision_at_3 value: 21.933 - type: precision_at_5 value: 17.224999999999998 - type: recall_at_1 value: 15.091 - type: recall_at_10 value: 43.022 - type: recall_at_100 value: 68.075 - type: recall_at_1000 value: 85.76 - type: recall_at_3 value: 26.564 - type: recall_at_5 value: 33.594 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.252 - type: map_at_10 value: 20.923 - type: map_at_100 value: 30.741000000000003 - type: map_at_1000 value: 32.542 - type: map_at_3 value: 14.442 - type: map_at_5 value: 17.399 - type: mrr_at_1 value: 70.25 - type: mrr_at_10 value: 78.17 - type: mrr_at_100 value: 78.444 - type: mrr_at_1000 value: 78.45100000000001 - type: mrr_at_3 value: 76.958 - type: mrr_at_5 value: 77.571 - type: ndcg_at_1 value: 58.375 - type: ndcg_at_10 value: 44.509 - type: ndcg_at_100 value: 49.897999999999996 - type: ndcg_at_1000 value: 57.269999999999996 - type: ndcg_at_3 value: 48.64 - type: ndcg_at_5 value: 46.697 - type: precision_at_1 value: 70.25 - type: precision_at_10 value: 36.05 - type: precision_at_100 value: 11.848 - type: precision_at_1000 value: 2.213 - type: precision_at_3 value: 52.917 - type: precision_at_5 value: 45.7 - type: recall_at_1 value: 9.252 - type: recall_at_10 value: 27.006999999999998 - type: recall_at_100 value: 57.008 - type: recall_at_1000 value: 80.697 - type: recall_at_3 value: 15.798000000000002 - type: recall_at_5 value: 20.4 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 
metrics: - type: accuracy value: 50.88 - type: f1 value: 45.545495028653384 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 75.424 - type: map_at_10 value: 83.435 - type: map_at_100 value: 83.66900000000001 - type: map_at_1000 value: 83.685 - type: map_at_3 value: 82.39800000000001 - type: map_at_5 value: 83.07 - type: mrr_at_1 value: 81.113 - type: mrr_at_10 value: 87.77199999999999 - type: mrr_at_100 value: 87.862 - type: mrr_at_1000 value: 87.86500000000001 - type: mrr_at_3 value: 87.17099999999999 - type: mrr_at_5 value: 87.616 - type: ndcg_at_1 value: 81.113 - type: ndcg_at_10 value: 86.909 - type: ndcg_at_100 value: 87.746 - type: ndcg_at_1000 value: 88.017 - type: ndcg_at_3 value: 85.368 - type: ndcg_at_5 value: 86.28099999999999 - type: precision_at_1 value: 81.113 - type: precision_at_10 value: 10.363 - type: precision_at_100 value: 1.102 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 32.507999999999996 - type: precision_at_5 value: 20.138 - type: recall_at_1 value: 75.424 - type: recall_at_10 value: 93.258 - type: recall_at_100 value: 96.545 - type: recall_at_1000 value: 98.284 - type: recall_at_3 value: 89.083 - type: recall_at_5 value: 91.445 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.532 - type: map_at_10 value: 37.141999999999996 - type: map_at_100 value: 39.162 - type: map_at_1000 value: 39.322 - type: map_at_3 value: 32.885 - type: map_at_5 value: 35.093999999999994 - type: mrr_at_1 value: 44.29 - type: mrr_at_10 value: 53.516 - type: mrr_at_100 value: 54.24 - type: mrr_at_1000 value: 54.273 - type: mrr_at_3 value: 51.286 - type: mrr_at_5 value: 52.413 - type: ndcg_at_1 value: 44.29 - type: ndcg_at_10 value: 45.268 - type: ndcg_at_100 value: 52.125 - type: ndcg_at_1000 value: 54.778000000000006 - type: ndcg_at_3 value: 41.829 - type: ndcg_at_5 value: 42.525 - type: precision_at_1 value: 44.29 - type: precision_at_10 value: 12.5 - type: precision_at_100 value: 1.9720000000000002 - type: precision_at_1000 value: 0.245 - type: precision_at_3 value: 28.035 - type: precision_at_5 value: 20.093 - type: recall_at_1 value: 22.532 - type: recall_at_10 value: 52.419000000000004 - type: recall_at_100 value: 77.43299999999999 - type: recall_at_1000 value: 93.379 - type: recall_at_3 value: 38.629000000000005 - type: recall_at_5 value: 43.858000000000004 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.359 - type: map_at_10 value: 63.966 - type: map_at_100 value: 64.87 - type: map_at_1000 value: 64.92599999999999 - type: map_at_3 value: 60.409 - type: map_at_5 value: 62.627 - type: mrr_at_1 value: 78.717 - type: mrr_at_10 value: 84.468 - type: mrr_at_100 value: 84.655 - type: mrr_at_1000 value: 84.661 - type: mrr_at_3 value: 83.554 - type: mrr_at_5 value: 84.133 - type: ndcg_at_1 value: 78.717 - type: ndcg_at_10 value: 72.03399999999999 - type: ndcg_at_100 value: 75.158 - type: ndcg_at_1000 value: 76.197 - type: ndcg_at_3 value: 67.049 - type: ndcg_at_5 value: 69.808 - type: precision_at_1 value: 78.717 - type: precision_at_10 value: 15.201 - type: precision_at_100 value: 1.764 - type: precision_at_1000 value: 0.19 - type: precision_at_3 value: 43.313 - type: precision_at_5 value: 28.165000000000003 - type: recall_at_1 value: 39.359 - type: recall_at_10 value: 76.003 - 
type: recall_at_100 value: 88.197 - type: recall_at_1000 value: 95.003 - type: recall_at_3 value: 64.97 - type: recall_at_5 value: 70.41199999999999 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 92.83200000000001 - type: ap value: 89.33560571859861 - type: f1 value: 92.82322915005167 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.983 - type: map_at_10 value: 34.259 - type: map_at_100 value: 35.432 - type: map_at_1000 value: 35.482 - type: map_at_3 value: 30.275999999999996 - type: map_at_5 value: 32.566 - type: mrr_at_1 value: 22.579 - type: mrr_at_10 value: 34.882999999999996 - type: mrr_at_100 value: 35.984 - type: mrr_at_1000 value: 36.028 - type: mrr_at_3 value: 30.964999999999996 - type: mrr_at_5 value: 33.245000000000005 - type: ndcg_at_1 value: 22.564 - type: ndcg_at_10 value: 41.258 - type: ndcg_at_100 value: 46.824 - type: ndcg_at_1000 value: 48.037 - type: ndcg_at_3 value: 33.17 - type: ndcg_at_5 value: 37.263000000000005 - type: precision_at_1 value: 22.564 - type: precision_at_10 value: 6.572 - type: precision_at_100 value: 0.935 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.130999999999998 - type: precision_at_5 value: 10.544 - type: recall_at_1 value: 21.983 - type: recall_at_10 value: 62.775000000000006 - type: recall_at_100 value: 88.389 - type: recall_at_1000 value: 97.603 - type: recall_at_3 value: 40.878 - type: recall_at_5 value: 50.690000000000005 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.95120839033288 - type: f1 value: 93.73824125055208 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 76.78978568171455 - type: f1 value: 57.50180552858304 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 76.24411566913248 - type: f1 value: 74.37851403532832 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 79.94620040349699 - type: f1 value: 80.21293397970435 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.44403096245675 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.659594631336812 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.53833075108798 - type: mrr value: 33.78840823218308 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test 
revision: None metrics: - type: map_at_1 value: 7.185999999999999 - type: map_at_10 value: 15.193999999999999 - type: map_at_100 value: 19.538 - type: map_at_1000 value: 21.178 - type: map_at_3 value: 11.208 - type: map_at_5 value: 12.745999999999999 - type: mrr_at_1 value: 48.916 - type: mrr_at_10 value: 58.141 - type: mrr_at_100 value: 58.656 - type: mrr_at_1000 value: 58.684999999999995 - type: mrr_at_3 value: 55.521 - type: mrr_at_5 value: 57.239 - type: ndcg_at_1 value: 47.059 - type: ndcg_at_10 value: 38.644 - type: ndcg_at_100 value: 36.272999999999996 - type: ndcg_at_1000 value: 44.996 - type: ndcg_at_3 value: 43.293 - type: ndcg_at_5 value: 40.819 - type: precision_at_1 value: 48.916 - type: precision_at_10 value: 28.607 - type: precision_at_100 value: 9.195 - type: precision_at_1000 value: 2.225 - type: precision_at_3 value: 40.454 - type: precision_at_5 value: 34.985 - type: recall_at_1 value: 7.185999999999999 - type: recall_at_10 value: 19.654 - type: recall_at_100 value: 37.224000000000004 - type: recall_at_1000 value: 68.663 - type: recall_at_3 value: 12.158 - type: recall_at_5 value: 14.674999999999999 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 31.552000000000003 - type: map_at_10 value: 47.75 - type: map_at_100 value: 48.728 - type: map_at_1000 value: 48.754 - type: map_at_3 value: 43.156 - type: map_at_5 value: 45.883 - type: mrr_at_1 value: 35.66 - type: mrr_at_10 value: 50.269 - type: mrr_at_100 value: 50.974 - type: mrr_at_1000 value: 50.991 - type: mrr_at_3 value: 46.519 - type: mrr_at_5 value: 48.764 - type: ndcg_at_1 value: 35.632000000000005 - type: ndcg_at_10 value: 55.786 - type: ndcg_at_100 value: 59.748999999999995 - type: ndcg_at_1000 value: 60.339 - type: ndcg_at_3 value: 47.292 - type: ndcg_at_5 value: 51.766999999999996 - type: precision_at_1 value: 35.632000000000005 - type: precision_at_10 value: 9.267 - type: precision_at_100 value: 1.149 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 21.601 - type: precision_at_5 value: 15.539 - type: recall_at_1 value: 31.552000000000003 - type: recall_at_10 value: 77.62400000000001 - type: recall_at_100 value: 94.527 - type: recall_at_1000 value: 98.919 - type: recall_at_3 value: 55.898 - type: recall_at_5 value: 66.121 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 71.414 - type: map_at_10 value: 85.37400000000001 - type: map_at_100 value: 86.01100000000001 - type: map_at_1000 value: 86.027 - type: map_at_3 value: 82.562 - type: map_at_5 value: 84.284 - type: mrr_at_1 value: 82.24000000000001 - type: mrr_at_10 value: 88.225 - type: mrr_at_100 value: 88.324 - type: mrr_at_1000 value: 88.325 - type: mrr_at_3 value: 87.348 - type: mrr_at_5 value: 87.938 - type: ndcg_at_1 value: 82.24000000000001 - type: ndcg_at_10 value: 88.97699999999999 - type: ndcg_at_100 value: 90.16 - type: ndcg_at_1000 value: 90.236 - type: ndcg_at_3 value: 86.371 - type: ndcg_at_5 value: 87.746 - type: precision_at_1 value: 82.24000000000001 - type: precision_at_10 value: 13.481000000000002 - type: precision_at_100 value: 1.534 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.86 - type: precision_at_5 value: 24.738 - type: recall_at_1 value: 71.414 - type: recall_at_10 value: 95.735 - type: recall_at_100 value: 99.696 - type: recall_at_1000 value: 99.979 - type: recall_at_3 value: 88.105 - type: recall_at_5 value: 
92.17999999999999 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 60.22146692057259 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 65.29273320614578 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.023 - type: map_at_10 value: 14.161000000000001 - type: map_at_100 value: 16.68 - type: map_at_1000 value: 17.072000000000003 - type: map_at_3 value: 9.763 - type: map_at_5 value: 11.977 - type: mrr_at_1 value: 24.8 - type: mrr_at_10 value: 37.602999999999994 - type: mrr_at_100 value: 38.618 - type: mrr_at_1000 value: 38.659 - type: mrr_at_3 value: 34.117 - type: mrr_at_5 value: 36.082 - type: ndcg_at_1 value: 24.8 - type: ndcg_at_10 value: 23.316 - type: ndcg_at_100 value: 32.613 - type: ndcg_at_1000 value: 38.609 - type: ndcg_at_3 value: 21.697 - type: ndcg_at_5 value: 19.241 - type: precision_at_1 value: 24.8 - type: precision_at_10 value: 12.36 - type: precision_at_100 value: 2.593 - type: precision_at_1000 value: 0.402 - type: precision_at_3 value: 20.767 - type: precision_at_5 value: 17.34 - type: recall_at_1 value: 5.023 - type: recall_at_10 value: 25.069999999999997 - type: recall_at_100 value: 52.563 - type: recall_at_1000 value: 81.525 - type: recall_at_3 value: 12.613 - type: recall_at_5 value: 17.583 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 87.71506247604255 - type: cos_sim_spearman value: 82.91813463738802 - type: euclidean_pearson value: 85.5154616194479 - type: euclidean_spearman value: 82.91815254466314 - type: manhattan_pearson value: 85.5280917850374 - type: manhattan_spearman value: 82.92276537286398 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 87.43772054228462 - type: cos_sim_spearman value: 78.75750601716682 - type: euclidean_pearson value: 85.76074482955764 - type: euclidean_spearman value: 78.75651057223058 - type: manhattan_pearson value: 85.73390291701668 - type: manhattan_spearman value: 78.72699385957797 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 89.58144067172472 - type: cos_sim_spearman value: 90.3524512966946 - type: euclidean_pearson value: 89.71365391594237 - type: euclidean_spearman value: 90.35239632843408 - type: manhattan_pearson value: 89.66905421746478 - type: manhattan_spearman value: 90.31508211683513 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 87.77692637102102 - type: cos_sim_spearman value: 85.45710562643485 - type: euclidean_pearson value: 87.42456979928723 - type: euclidean_spearman value: 85.45709386240908 - type: manhattan_pearson value: 87.40754529526272 - type: manhattan_spearman value: 85.44834854173303 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default 
split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 88.28491331695997 - type: cos_sim_spearman value: 89.62037029566964 - type: euclidean_pearson value: 89.02479391362826 - type: euclidean_spearman value: 89.62036733618466 - type: manhattan_pearson value: 89.00394756040342 - type: manhattan_spearman value: 89.60867744215236 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 85.08911381280191 - type: cos_sim_spearman value: 86.5791780765767 - type: euclidean_pearson value: 86.16063473577861 - type: euclidean_spearman value: 86.57917745378766 - type: manhattan_pearson value: 86.13677924604175 - type: manhattan_spearman value: 86.56115615768685 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 89.58029496205235 - type: cos_sim_spearman value: 89.49551253826998 - type: euclidean_pearson value: 90.13714840963748 - type: euclidean_spearman value: 89.49551253826998 - type: manhattan_pearson value: 90.13039633601363 - type: manhattan_spearman value: 89.4513453745516 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 69.01546399666435 - type: cos_sim_spearman value: 69.33824484595624 - type: euclidean_pearson value: 70.76511642998874 - type: euclidean_spearman value: 69.33824484595624 - type: manhattan_pearson value: 70.84320785047453 - type: manhattan_spearman value: 69.54233632223537 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 87.26389196390119 - type: cos_sim_spearman value: 89.09721478341385 - type: euclidean_pearson value: 88.97208685922517 - type: euclidean_spearman value: 89.09720927308881 - type: manhattan_pearson value: 88.97513670502573 - type: manhattan_spearman value: 89.07647853984004 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.53075025771936 - type: mrr value: 96.24327651288436 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 60.428000000000004 - type: map_at_10 value: 70.088 - type: map_at_100 value: 70.589 - type: map_at_1000 value: 70.614 - type: map_at_3 value: 67.191 - type: map_at_5 value: 68.515 - type: mrr_at_1 value: 63.333 - type: mrr_at_10 value: 71.13000000000001 - type: mrr_at_100 value: 71.545 - type: mrr_at_1000 value: 71.569 - type: mrr_at_3 value: 68.944 - type: mrr_at_5 value: 70.078 - type: ndcg_at_1 value: 63.333 - type: ndcg_at_10 value: 74.72800000000001 - type: ndcg_at_100 value: 76.64999999999999 - type: ndcg_at_1000 value: 77.176 - type: ndcg_at_3 value: 69.659 - type: ndcg_at_5 value: 71.626 - type: precision_at_1 value: 63.333 - type: precision_at_10 value: 10 - type: precision_at_100 value: 1.09 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.111 - type: precision_at_5 value: 17.666999999999998 - type: recall_at_1 value: 60.428000000000004 - type: 
recall_at_10 value: 87.98899999999999 - type: recall_at_100 value: 96.167 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 74.006 - type: recall_at_5 value: 79.05 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.87326732673267 - type: cos_sim_ap value: 96.81770773701805 - type: cos_sim_f1 value: 93.6318407960199 - type: cos_sim_precision value: 93.16831683168317 - type: cos_sim_recall value: 94.1 - type: dot_accuracy value: 99.87326732673267 - type: dot_ap value: 96.8174218946665 - type: dot_f1 value: 93.6318407960199 - type: dot_precision value: 93.16831683168317 - type: dot_recall value: 94.1 - type: euclidean_accuracy value: 99.87326732673267 - type: euclidean_ap value: 96.81770773701807 - type: euclidean_f1 value: 93.6318407960199 - type: euclidean_precision value: 93.16831683168317 - type: euclidean_recall value: 94.1 - type: manhattan_accuracy value: 99.87227722772278 - type: manhattan_ap value: 96.83164126821747 - type: manhattan_f1 value: 93.54677338669335 - type: manhattan_precision value: 93.5935935935936 - type: manhattan_recall value: 93.5 - type: max_accuracy value: 99.87326732673267 - type: max_ap value: 96.83164126821747 - type: max_f1 value: 93.6318407960199 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 65.6212042420246 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 35.779230635982564 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 55.217701909036286 - type: mrr value: 56.17658995416349 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.954206018888453 - type: cos_sim_spearman value: 32.71062599450096 - type: dot_pearson value: 30.95420929056943 - type: dot_spearman value: 32.71062599450096 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.22699999999999998 - type: map_at_10 value: 1.924 - type: map_at_100 value: 10.525 - type: map_at_1000 value: 24.973 - type: map_at_3 value: 0.638 - type: map_at_5 value: 1.0659999999999998 - type: mrr_at_1 value: 84 - type: mrr_at_10 value: 91.067 - type: mrr_at_100 value: 91.067 - type: mrr_at_1000 value: 91.067 - type: mrr_at_3 value: 90.667 - type: mrr_at_5 value: 91.067 - type: ndcg_at_1 value: 81 - type: ndcg_at_10 value: 75.566 - type: ndcg_at_100 value: 56.387 - type: ndcg_at_1000 value: 49.834 - type: ndcg_at_3 value: 80.899 - type: ndcg_at_5 value: 80.75099999999999 - type: precision_at_1 value: 84 - type: precision_at_10 value: 79 - type: precision_at_100 value: 57.56 - type: precision_at_1000 value: 21.8 - type: precision_at_3 value: 84.667 - type: precision_at_5 value: 85.2 - type: recall_at_1 value: 0.22699999999999998 - type: recall_at_10 value: 2.136 
- type: recall_at_100 value: 13.861 - type: recall_at_1000 value: 46.299 - type: recall_at_3 value: 0.6649999999999999 - type: recall_at_5 value: 1.145 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.752 - type: map_at_10 value: 9.951 - type: map_at_100 value: 16.794999999999998 - type: map_at_1000 value: 18.251 - type: map_at_3 value: 5.288 - type: map_at_5 value: 6.954000000000001 - type: mrr_at_1 value: 38.775999999999996 - type: mrr_at_10 value: 50.458000000000006 - type: mrr_at_100 value: 51.324999999999996 - type: mrr_at_1000 value: 51.339999999999996 - type: mrr_at_3 value: 46.939 - type: mrr_at_5 value: 47.857 - type: ndcg_at_1 value: 36.735 - type: ndcg_at_10 value: 25.198999999999998 - type: ndcg_at_100 value: 37.938 - type: ndcg_at_1000 value: 49.145 - type: ndcg_at_3 value: 29.348000000000003 - type: ndcg_at_5 value: 25.804 - type: precision_at_1 value: 38.775999999999996 - type: precision_at_10 value: 22.041 - type: precision_at_100 value: 7.939 - type: precision_at_1000 value: 1.555 - type: precision_at_3 value: 29.932 - type: precision_at_5 value: 24.490000000000002 - type: recall_at_1 value: 2.752 - type: recall_at_10 value: 16.197 - type: recall_at_100 value: 49.166 - type: recall_at_1000 value: 84.18900000000001 - type: recall_at_3 value: 6.438000000000001 - type: recall_at_5 value: 9.093 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.47980000000001 - type: ap value: 14.605194452178754 - type: f1 value: 55.07362924988948 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 59.708545557441994 - type: f1 value: 60.04751270975683 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 53.21105960597211 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 87.58419264469214 - type: cos_sim_ap value: 78.55300004517404 - type: cos_sim_f1 value: 71.49673530889001 - type: cos_sim_precision value: 68.20795400095831 - type: cos_sim_recall value: 75.11873350923483 - type: dot_accuracy value: 87.58419264469214 - type: dot_ap value: 78.55297659559511 - type: dot_f1 value: 71.49673530889001 - type: dot_precision value: 68.20795400095831 - type: dot_recall value: 75.11873350923483 - type: euclidean_accuracy value: 87.58419264469214 - type: euclidean_ap value: 78.55300477331477 - type: euclidean_f1 value: 71.49673530889001 - type: euclidean_precision value: 68.20795400095831 - type: euclidean_recall value: 75.11873350923483 - type: manhattan_accuracy value: 87.5663110210407 - type: manhattan_ap value: 78.49982050876562 - type: manhattan_f1 value: 71.35488740722104 - type: manhattan_precision value: 68.18946862226497 - type: manhattan_recall value: 74.82849604221636 - type: max_accuracy value: 87.58419264469214 - type: max_ap value: 78.55300477331477 - type: max_f1 value: 
71.49673530889001 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.09069740365584 - type: cos_sim_ap value: 86.22749303724757 - type: cos_sim_f1 value: 78.36863452005407 - type: cos_sim_precision value: 76.49560117302053 - type: cos_sim_recall value: 80.33569448721897 - type: dot_accuracy value: 89.09069740365584 - type: dot_ap value: 86.22750233655673 - type: dot_f1 value: 78.36863452005407 - type: dot_precision value: 76.49560117302053 - type: dot_recall value: 80.33569448721897 - type: euclidean_accuracy value: 89.09069740365584 - type: euclidean_ap value: 86.22749355597347 - type: euclidean_f1 value: 78.36863452005407 - type: euclidean_precision value: 76.49560117302053 - type: euclidean_recall value: 80.33569448721897 - type: manhattan_accuracy value: 89.08293553770326 - type: manhattan_ap value: 86.21913616084771 - type: manhattan_f1 value: 78.3907031479847 - type: manhattan_precision value: 75.0352013517319 - type: manhattan_recall value: 82.06036341238065 - type: max_accuracy value: 89.09069740365584 - type: max_ap value: 86.22750233655673 - type: max_f1 value: 78.3907031479847 license: apache-2.0 language: - en library_name: sentence-transformers pipeline_tag: feature-extraction ---

The crispy sentence embedding family from Mixedbread.

🍞 Looking for a simple end-to-end retrieval solution? Meet Omni, our multimodal and multilingual model.

# mixedbread-ai/mxbai-embed-large-v1 Here, we provide several ways to produce sentence embeddings. Please note that you have to provide the query prompt if you want to use the model for retrieval; beyond that, no prompt is needed. Our model also supports Matryoshka Representation Learning and binary quantization. ## Quickstart ### sentence-transformers ### Transformers ### Transformers.js If you haven't already, you can install the Transformers.js JavaScript library from NPM using: You can then use the model to compute embeddings like this: ### Using API You can use the model via our API as follows: The API comes with native int8 and binary quantization support! Check out the docs for more information. ### Infinity ## Evaluation As of March 2024, our model achieves SOTA performance for BERT-large sized models on the MTEB. It outperforms commercial models like OpenAI's text-embedding-3-large and matches the performance of models 20x its size, such as echo-mistral-7b. Our model was trained with no overlap of the MTEB data, which indicates that it generalizes well across several domains, tasks, and text lengths. We know there are some limitations with this model, which will be fixed in v2.

| Model | Avg (56 datasets) | Classification (12 datasets) | Clustering (11 datasets) | PairClassification (3 datasets) | Reranking (4 datasets) | Retrieval (15 datasets) | STS (10 datasets) | Summarization (1 dataset) |
| --------------------------------------------------------------------------------------------- | ----------------- | ---------------------------- | ------------------------ | ------------------------------- | ---------------------- | ----------------------- | ----------------- | ------------------------- |
| **mxbai-embed-large-v1** | **64.68** | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85.00 | 32.71 |
| bge-large-en-v1.5 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 |
| mxbai-embed-2d-large-v1 | 63.25 | 74.14 | 46.07 | 85.89 | 58.94 | 51.42 | 84.9 | 31.55 |
| nomic-embed-text-v1 | 62.39 | 74.12 | 43.91 | 85.15 | 55.69 | 52.81 | 82.06 | 30.08 |
| jina-embeddings-v2-base-en | 60.38 | 73.45 | 41.73 | 85.38 | 56.98 | 47.87 | 80.7 | 31.6 |
| *Proprietary Models* | | | | | | | | |
| OpenAI text-embedding-3-large | 64.58 | 75.45 | 49.01 | 85.72 | 59.16 | 55.44 | 81.73 | 29.92 |
| Cohere embed-english-v3.0 | 64.47 | 76.49 | 47.43 | 85.84 | 58.01 | 55.00 | 82.62 | 30.18 |
| OpenAI text-embedding-ada-002 | 60.99 | 70.93 | 45.90 | 84.89 | 56.32 | 49.25 | 80.97 | 30.80 |

Please find more information in our blog post. ## Matryoshka and Binary Quantization Embeddings in their commonly used form (float arrays) have a high memory footprint when used at scale. Two approaches to solving this problem are Matryoshka Representation Learning (MRL) and (binary) quantization. While MRL reduces the number of dimensions of an embedding, binary quantization transforms the value of each dimension from a float32 into a lower-precision representation (int8 or even binary). The model supports both approaches, and you can take it one step further and combine them. This combination of binary quantization and MRL allows you to reduce the memory usage of your embeddings significantly, which leads to much lower costs when using a vector database in particular.
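As a minimal sketch of the sentence-transformers quickstart and the MRL + quantization combination described above (assuming a recent sentence-transformers release; the truncation dimension and example texts are illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# truncate_dim applies Matryoshka truncation at encode time (illustrative value).
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1", truncate_dim=512)

# Retrieval queries need the prompt; documents are encoded without one.
query = "Represent this sentence for searching relevant passages: A man is eating a piece of bread"
docs = [
    "A man is eating food.",
    "A man is riding a horse.",
]

query_embedding = model.encode(query)
doc_embeddings = model.encode(docs)

# Optionally binary-quantize the document embeddings to shrink the vector index.
binary_doc_embeddings = quantize_embeddings(doc_embeddings, precision="ubinary")
print(query_embedding.shape, binary_doc_embeddings.shape)
```

With 512 of the 1024 dimensions kept and ubinary packing, each stored vector shrinks from 4 KB (1024 float32 values) to 64 bytes.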
You can read more about the technology and its advantages in our blog post. ## Community Please join our Discord Community and share your feedback and thoughts! We are here to help and also always happy to chat. ## License Apache 2.0 ## Citation", + "model_explanation_gemini": "Performs text embedding tasks including classification, retrieval, clustering, reranking, and semantic textual similarity across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/mixedbread-ai_mxbai-rerank-large-v1.json b/data/model_data_json/mixedbread-ai_mxbai-rerank-large-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..64e764e1e73cb5051cddcfd1e3071b9c6721df1c --- /dev/null +++ b/data/model_data_json/mixedbread-ai_mxbai-rerank-large-v1.json @@ -0,0 +1,21 @@ +{ + "model_id": "mixedbread-ai/mxbai-rerank-large-v1", + "downloads": 75127, + "tags": [ + "transformers", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "reranker", + "transformers.js", + "sentence-transformers", + "text-ranking", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - reranker - transformers.js - sentence-transformers license: apache-2.0 language: - en pipeline_tag: text-ranking ---

The crispy rerank family from Mixedbread.

🍞 Looking for a simple end-to-end retrieval solution? Meet Omni, our multimodal and multilingual model.

# mxbai-rerank-large-v1 This is the largest model in our family of powerful reranker models. You can learn more about the models in our blog post. We have three models: - mxbai-rerank-xsmall-v1 - mxbai-rerank-base-v1 - mxbai-rerank-large-v1 (🍞) ## Quickstart Currently, the best way to use our models is with the most recent version of sentence-transformers. Let's say you have a query, and you want to rerank a set of documents. You can do that with only one line of code:
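For instance, a minimal sketch of that one-liner, assuming a recent sentence-transformers release with CrossEncoder.rank (the query and documents are illustrative):

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("mixedbread-ai/mxbai-rerank-large-v1")

query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
    "'Moby-Dick' was written by Herman Melville.",
    "Harper Lee was born in 1926 in Monroeville, Alabama.",
]

# Scores every (query, document) pair and returns them sorted by relevance.
results = model.rank(query, documents, return_documents=True, top_k=3)
for entry in results:
    print(round(entry["score"], 3), entry["text"])
```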
JavaScript Example Install transformers.js Let's say you have a query, and you want to rerank a set of documents. In JavaScript, you need to add a function:
## Using API You can use the model via our API as follows: The API comes with additional features, such as a continuously trained reranker! Check out the docs for more information. ## Evaluation Our reranker models are designed to elevate your search. They work extremely well in combination with keyword search and can even outperform semantic search systems in many cases.

| Model | NDCG@10 | Accuracy@3 |
| ------------------------------------------------------------------------------------- | -------- | ---------- |
| Lexical Search (Lucene) | 38.0 | 66.4 |
| BAAI/bge-reranker-base | 41.6 | 66.9 |
| BAAI/bge-reranker-large | 45.2 | 70.6 |
| cohere-embed-v3 (semantic search) | 47.5 | 70.9 |
| mxbai-rerank-xsmall-v1 | **43.9** | **70.0** |
| mxbai-rerank-base-v1 | **46.9** | **72.3** |
| mxbai-rerank-large-v1 | **48.8** | **74.9** |

The reported results are aggregated from 11 datasets of BEIR. We used Pyserini to evaluate the models. Find more in our blog post and on this spreadsheet. ## Community Please join our Discord Community and share your feedback and thoughts! We are here to help and also always happy to chat. ## Citation ## License Apache 2.0" +} \ No newline at end of file diff --git a/data/model_data_json/mixedbread-ai_mxbai-rerank-large-v2.json b/data/model_data_json/mixedbread-ai_mxbai-rerank-large-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..c52143292d8c5f3425860b16770b2ac9fd7b19c7 --- /dev/null +++ b/data/model_data_json/mixedbread-ai_mxbai-rerank-large-v2.json @@ -0,0 +1,32 @@ +{ + "model_id": "mixedbread-ai/mxbai-rerank-large-v2", + "downloads": 172304, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "text-ranking", + "en", + "zh", + "de", + "ja", + "ko", + "es", + "fr", + "ar", + "bn", + "ru", + "id", + "sw", + "te", + "th", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: apache-2.0 language: - en - zh - de - ja - ko - es - fr - ar - bn - ru - id - sw - te - th pipeline_tag: text-ranking ---

The crispy rerank family from Mixedbread.

🍞 Looking for a simple end-to-end retrieval solution? Meet Omni, our multimodal and multilingual model.

# 🍞 mxbai-rerank-large-v2 This is the large model in our family of powerful reranker models. You can learn more about the models in our blog post. We have two models: - mxbai-rerank-base-v2 - mxbai-rerank-large-v2 (🍞) **The technical report is coming soon!** ## 🌟 Features - state-of-the-art performance and strong efficiency - multilingual support (100+ languages, outstanding English and Chinese performance) - code support - long-context support ## ⚙️ Usage 1. Install mxbai-rerank 2. Inference ## Performance ### Benchmark Results | Model | BEIR Avg | Multilingual | Chinese | Code Search | Latency (s) | |-------|----------|----------|----------|--------------|-------------| | mxbai-rerank-large-v2 | 57.49 | 29.79 | 84.16 | 32.05 | 0.89 | | mxbai-rerank-base-v2 | 55.57 | 28.56 | 83.70 | 31.73 | 0.67 | | mxbai-rerank-large-v1 | 49.32 | 21.88 | 72.53 | 30.72 | 2.24 | *Latency measured on A100 GPU ## Training Details The models were trained using a three-step process: 1. **GRPO (Guided Reinforcement Prompt Optimization)** 2. **Contrastive Learning** 3. **Preference Learning** For more details, check our technical blog post. Paper following soon. ## 🎓 Citation", + "model_explanation_gemini": "Reranks text passages for improved relevance in multilingual search and retrieval tasks across multiple languages." +} \ No newline at end of file diff --git a/data/model_data_json/mixedbread-ai_mxbai-rerank-xsmall-v1.json b/data/model_data_json/mixedbread-ai_mxbai-rerank-xsmall-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..44a179a3a3a9c383c977c47241455d82ce6210d6 --- /dev/null +++ b/data/model_data_json/mixedbread-ai_mxbai-rerank-xsmall-v1.json @@ -0,0 +1,22 @@ +{ + "model_id": "mixedbread-ai/mxbai-rerank-xsmall-v1", + "downloads": 136209, + "tags": [ + "transformers", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "reranker", + "transformers.js", + "sentence-transformers", + "text-ranking", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - reranker - transformers.js - sentence-transformers license: apache-2.0 language: - en pipeline_tag: text-ranking ---
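As a sketch of the two usage steps above, assuming the mxbai-rerank Python package (pip install mxbai-rerank) exposes an MxbaiRerankV2 class with a rank method; treat the class and argument names as assumptions:

```python
# Assumption: installed via `pip install mxbai-rerank`.
from mxbai_rerank import MxbaiRerankV2  # assumed class name for the v2 rerankers

model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-large-v2")

query = "Which reranker supports 100+ languages?"
documents = [
    "mxbai-rerank-v2 offers multilingual support across 100+ languages.",
    "The v1 rerank models are English-only.",
]

# Assumed to score and sort the documents against the query.
results = model.rank(query, documents, return_documents=True, top_k=2)
print(results)
```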

The crispy rerank family from Mixedbread.

🍞 Looking for a simple end-to-end retrieval solution? Meet Omni, our multimodal and multilingual model.

# mxbai-rerank-xsmall-v1 This is the smallest model in our family of powerful reranker models. You can learn more about the models in our blog post. We have three models: - mxbai-rerank-xsmall-v1 (🍞) - mxbai-rerank-base-v1 - mxbai-rerank-large-v1 ## Quickstart Currently, the best way to use our models is with the most recent version of sentence-transformers. Let's say you have a query, and you want to rerank a set of documents. You can do that with only one line of code:
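Besides the sentence-transformers one-liner above (identical to the large model's usage, with the model id swapped), here is a hedged sketch using plain transformers, since the repo is tagged as a deberta-v2 text-classification model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "mixedbread-ai/mxbai-rerank-xsmall-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

query = "What is the capital of France?"
documents = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
]

# Cross-encoder scoring: each document is paired with the query.
inputs = tokenizer([query] * len(documents), documents,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)

for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(round(score, 3), doc)
```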
JavaScript Example Install transformers.js Let's say you have a query, and you want to rerank a set of documents. In JavaScript, you need to add a function:
## Using API You can use the large model via our API as follows: The API comes with additional features, such as a continuously trained reranker! Check out the docs for more information. ## Evaluation Our reranker models are designed to elevate your search. They work extremely well in combination with keyword search and can even outperform semantic search systems in many cases.

| Model | NDCG@10 | Accuracy@3 |
| ------------------------------------------------------------------------------------- | -------- | ---------- |
| Lexical Search (Lucene) | 38.0 | 66.4 |
| BAAI/bge-reranker-base | 41.6 | 66.9 |
| BAAI/bge-reranker-large | 45.2 | 70.6 |
| cohere-embed-v3 (semantic search) | 47.5 | 70.9 |
| mxbai-rerank-xsmall-v1 | **43.9** | **70.0** |
| mxbai-rerank-base-v1 | **46.9** | **72.3** |
| mxbai-rerank-large-v1 | **48.8** | **74.9** |

The reported results are aggregated from 11 datasets of BEIR. We used Pyserini to evaluate the models. Find more in our blog post and on this spreadsheet. ## Community Please join our Discord Community and share your feedback and thoughts! We are here to help and also always happy to chat. ## Citation ## License Apache 2.0", + "model_explanation_gemini": "Reranks text or sentences for improved relevance in search or ranking tasks." +} \ No newline at end of file diff --git a/data/model_data_json/mlc-ai_Llama-3.2-1B-Instruct-q4f16_1-MLC.json b/data/model_data_json/mlc-ai_Llama-3.2-1B-Instruct-q4f16_1-MLC.json new file mode 100644 index 0000000000000000000000000000000000000000..591198be1de718ca749528d1c7edd03bf10c4543 --- /dev/null +++ b/data/model_data_json/mlc-ai_Llama-3.2-1B-Instruct-q4f16_1-MLC.json @@ -0,0 +1,13 @@ +{ + "model_id": "mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC", + "downloads": 237402, + "tags": [ + "mlc-llm", + "web-llm", + "base_model:meta-llama/Llama-3.2-1B-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-1B-Instruct", + "region:us" + ], + "description": "--- library_name: mlc-llm base_model: meta-llama/Llama-3.2-1B-Instruct tags: - mlc-llm - web-llm --- # Llama-3.2-1B-Instruct-q4f16_1-MLC This is the Llama-3.2-1B-Instruct model in MLC format. The model can be used with the MLC-LLM and WebLLM projects. ## Example Usage Here are some examples of using this model in MLC LLM. Before running the examples, please install MLC LLM by following the installation documentation. ### Chat In the command line, run ### REST Server In the command line, run ### Python API ## Documentation For more information on the MLC LLM project, please visit our documentation and GitHub repo.", + "model_explanation_gemini": "A quantized version of the Llama-3.2-1B-Instruct model optimized for MLC-LLM and WebLLM projects, enabling local deployment and web-based language model applications."
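As a sketch of the Python API mentioned in the Example Usage section above, assuming the mlc_llm package's OpenAI-style MLCEngine interface (the prompt is illustrative):

```python
# Assumes MLC LLM is installed per its installation documentation.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# OpenAI-style chat completion, streamed chunk by chunk.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()
engine.terminate()
```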
+} \ No newline at end of file diff --git a/data/model_data_json/mlx-community_DeepSeek-R1-Distill-Qwen-32B-4bit.json b/data/model_data_json/mlx-community_DeepSeek-R1-Distill-Qwen-32B-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..73a9dd26b53a9f4bdb00b5ff27a90eb4b66aa276 --- /dev/null +++ b/data/model_data_json/mlx-community_DeepSeek-R1-Distill-Qwen-32B-4bit.json @@ -0,0 +1,20 @@ +{ + "model_id": "mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit", + "downloads": 74600, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "mlx", + "conversational", + "base_model:deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", + "base_model:quantized:deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "region:us" + ], + "description": "--- library_name: transformers base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B tags: - mlx --- # mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit The Model mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit was converted to MLX format from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B using mlx-lm version **0.20.2**. ## Use with mlx" +} \ No newline at end of file diff --git a/data/model_data_json/mlx-community_Qwen2.5-Coder-32B-Instruct-4bit.json b/data/model_data_json/mlx-community_Qwen2.5-Coder-32B-Instruct-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..9149d2719c71eabca1c715b7c5e94cdb6eab5ec7 --- /dev/null +++ b/data/model_data_json/mlx-community_Qwen2.5-Coder-32B-Instruct-4bit.json @@ -0,0 +1,27 @@ +{ + "model_id": "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit", + "downloads": 74138, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "code", + "codeqwen", + "chat", + "qwen", + "qwen-coder", + "mlx", + "conversational", + "en", + "base_model:Qwen/Qwen2.5-Coder-32B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-Coder-32B-Instruct", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-Coder-32B-Instruct language: - en library_name: transformers license: apache-2.0 license_link: pipeline_tag: text-generation tags: - code - codeqwen - chat - qwen - qwen-coder - mlx --- # mlx-community/Qwen2.5-Coder-32B-Instruct-4bit The Model mlx-community/Qwen2.5-Coder-32B-Instruct-4bit was converted to MLX format from Qwen/Qwen2.5-Coder-32B-Instruct using mlx-lm version **0.19.3**. ## Use with mlx" +} \ No newline at end of file diff --git a/data/model_data_json/mobiuslabsgmbh_faster-whisper-large-v3-turbo.json b/data/model_data_json/mobiuslabsgmbh_faster-whisper-large-v3-turbo.json new file mode 100644 index 0000000000000000000000000000000000000000..5ab4f90a665f8a35c90815814f72a47464a49fd3 --- /dev/null +++ b/data/model_data_json/mobiuslabsgmbh_faster-whisper-large-v3-turbo.json @@ -0,0 +1,10 @@ +{ + "model_id": "mobiuslabsgmbh/faster-whisper-large-v3-turbo", + "downloads": 217459, + "tags": [ + "license:mit", + "region:us" + ], + "description": "--- license: mit --- # Whisper large-v3 turbo model for CTranslate2 This repository contains the conversion of openai/whisper-large-v3-turbo to the CTranslate2 model format. This model can be used in CTranslate2 or projects based on CTranslate2 models such as faster-whisper. It is called automatically for Mobius Labs fork of faster-whisper. ## Example Note that the model weights are saved in FP16. 
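A minimal sketch of that Example section, assuming the faster-whisper package and a local audio file (audio.mp3 is a hypothetical input); the compute_type argument is the CTranslate2 option discussed next:

```python
from faster_whisper import WhisperModel

# Downloads the CTranslate2 weights from the Hub; compute_type selects the precision.
model = WhisperModel("mobiuslabsgmbh/faster-whisper-large-v3-turbo",
                     device="auto", compute_type="float16")

segments, info = model.transcribe("audio.mp3")  # hypothetical input file
print("Detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```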
This type can be changed when the model is loaded using the compute_type option in CTranslate2. ## Conversion details The openAI model was converted with the following command: ### More Information For more information about the original model, see its model card.", + "model_explanation_gemini": "Converts OpenAI's Whisper large-v3-turbo to CTranslate2 format for efficient speech recognition in faster-whisper projects." +} \ No newline at end of file diff --git a/data/model_data_json/modularai_Llama-3.1-8B-Instruct-GGUF.json b/data/model_data_json/modularai_Llama-3.1-8B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..b1c0c1de637d20c419723f3efdb233b93e9f6ee0 --- /dev/null +++ b/data/model_data_json/modularai_Llama-3.1-8B-Instruct-GGUF.json @@ -0,0 +1,22 @@ +{ + "model_id": "modularai/Llama-3.1-8B-Instruct-GGUF", + "downloads": 12675940, + "tags": [ + "gguf", + "llama", + "facebook", + "meta", + "pytorch", + "llama-3", + "text-generation", + "conversational", + "en", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3", + "region:us" + ], + "description": "--- language: - en license: llama3 tags: - facebook - meta - pytorch - llama - llama-3 model_name: Llama 3.1 8B base_model: meta-llama/Meta-Llama-3.1-8B-Instruct inference: false model_creator: Meta model_type: llama pipeline_tag: text-generation --- # Llama 3.1 8B Instruct - GGUF - Model creator: Meta - Original model: Llama 3 8B Instruct - GGUF models quantized by: - bartowski at bartowski/Meta-Llama-3.1-8B-Instruct-GGUF - kaetemi at kaetemi/Meta-Llama-3.1-8B-Q4_0-GGUF - ddh0 at bullerwins/Meta-Llama-3.1-8B-Instruct-GGU ## Description This repo contains GGUF format model files for Meta's Llama 3.1 8B Instruct. ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available. **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**. **Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with recent versions of transformers, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function. Make sure to update your transformers installation via pip install --upgrade transformers. Note: You can also find detailed recipes on how to use the model locally, with assisted generation, quantization, and more. ### Tool use with transformers LLaMA-3.1 supports multiple tool use formats. You can see a full guide to prompt formatting here. Tool use is also supported through chat templates in Transformers. Here is a quick example showing a single simple tool (see the sketch below): You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so: and then call the tool and append the result, with the tool role, like so: After that, you can call generate() again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information, see the LLaMA prompt format docs and the Transformers tool use documentation.
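A hedged sketch of that tool-calling flow with the Transformers chat-template API (the temperature function and its values are hypothetical stand-ins):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in "city, country" format.
    """
    return 22.0  # hypothetical stub; a real tool would query a weather service

messages = [{"role": "user", "content": "What's the temperature in Paris right now?"}]

# Render the prompt with the tool schema included.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_current_temperature],
    add_generation_prompt=True, tokenize=False,
)

# If the model emits a tool call, append it plus the tool's result, then generate again.
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": {
    "name": "get_current_temperature", "arguments": {"location": "Paris, France"}}}]})
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```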
### Use with the original codebase Please follow the instructions in the repository. To download the original checkpoints, see the example download command in the repository, which leverages huggingface-cli. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 41.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
| --- | --- | --- | --- | --- | --- |
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, so that they are aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure the safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which dedicated benchmarks were crafted, covering long context, multilinguality, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to the proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted by a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks, and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. 
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 cybersecurity whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources, including an output reporting mechanism and a bug bounty program, to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts. 
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to the available resources, including our Responsible Use Guide, Trust and Safety solutions, and other resources, to learn more about responsible development. ", + "model_explanation_gemini": "An 8-billion-parameter multilingual instruction-tuned LLM optimized for dialogue and assistant-like chat, supporting text and code generation in multiple languages." +} \ No newline at end of file diff --git a/data/model_data_json/modularai_replit-code-1.5.json b/data/model_data_json/modularai_replit-code-1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..bcb6432d7db1bb0c0a1cae9a521b47bb384ed3e3 --- /dev/null +++ b/data/model_data_json/modularai_replit-code-1.5.json @@ -0,0 +1,23 @@ +{ + "model_id": "modularai/replit-code-1.5", + "downloads": 1120578, + "tags": [ + "gguf", + "mpt", + "code", + "Composer", + "MosaicML", + "llm-foundry", + "StreamingDatasets", + "custom_code", + "en", + "dataset:bigcode/the-stack-dedup", + "dataset:togethercomputer/RedPajama-Data-1T", + "base_model:replit/replit-code-v1_5-3b", + "base_model:quantized:replit/replit-code-v1_5-3b", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: - en license: apache-2.0 model_name: Replit Code V-1.5 3B base_model: replit/replit-code-v1_5-3b inference: false model_creator: Replit quantized_by: tzhenghao datasets: - bigcode/the-stack-dedup - togethercomputer/RedPajama-Data-1T tags: - code - Composer - MosaicML - llm-foundry - StreamingDatasets --- # Replit Code V-1.5 3B - GGUF - Model creator: Replit - Original model: Replit Code V-1.5 3B - GGUF models quantized by: tzhenghao ## Description This repo contains GGUF format model files for Replit Code V-1.5 3B. ## Model Description Replit Code v1.5 is a 3.3B parameter Causal Language Model focused on **Code Completion**. The model is trained on 1T tokens of code (~200B tokens over 5 epochs, including linear cooldown) for 30 programming languages, drawn from a subset of permissively licensed code from Bigcode's Stack Dedup dataset, a filtered natural language sample from the Markdown and reStructuredText subsets of the same Stack Dedup dataset, and a dev-oriented sample from RedPajama's StackExchange dataset sourced from the Stack Exchange Data Dump by Stack Exchange Inc. The 30 programming languages are: The context size of the model is 4096 tokens. We use the GPTNeoX tokenizer with a custom trained and optimized vocabulary of 32768 tokens. This custom vocabulary led to single-digit percentage-point improvements in compression while maintaining or improving coverage on our training corpus. The model has been trained on the MosaicML platform on 128 H100-80GB GPUs using their LLM Foundry and Composer training library built on top of PyTorch. ## Dependencies You will need to install the latest versions of the following dependencies: ## How to Use ### Generation You can generate code using the library as follows: Experiment with different decoding methods and parameters to get the best results for your use case. ### Using Triton Implementation of Flash Attention We recommend experimenting with and for optimal performance on your use case! 
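The code snippets referenced in the usage sections above did not survive in this dump. As a stand-in, here is a minimal, hedged sketch of running one of the provided GGUF quants with llama-cpp-python; the quant filename, prompt, and decoding parameters are assumptions for illustration, not part of the original card.

```python
# Sketch only: completing code with a GGUF quant of Replit Code V-1.5 3B
# via llama-cpp-python. Use whichever quant file you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="replit-code-v1_5-3b.Q4_K_M.gguf",  # hypothetical local filename
    n_ctx=4096,  # matches the model's 4096-token context size
)

out = llm(
    "def fibonacci(n):",
    max_tokens=128,
    temperature=0.2,  # decoding parameters are worth tuning, as the card suggests
    stop=["\n\n"],
)
print(out["choices"][0]["text"])
```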
## Intended Use Replit intends this model to be used by anyone as a foundational model for application-specific fine-tuning, without strict limitations on commercial use. The model is trained specifically for code completion tasks. ## Limitations The pre-training dataset may have contained offensive or inappropriate content even after applying data cleansing and toxicity and profanity filters, and such content may be reflected in model-generated text. We recommend that users exercise reasonable caution when using it in production systems. Do not use it for any applications that may cause harm or distress to individuals or groups. ", + "model_explanation_gemini": "A 3.3B parameter causal language model trained for code completion across 30 programming languages using permissively licensed code and natural language datasets." +} \ No newline at end of file diff --git a/data/model_data_json/monologg_kobert.json b/data/model_data_json/monologg_kobert.json new file mode 100644 index 0000000000000000000000000000000000000000..cd34c92d89db585a8d760168c490de290366740c --- /dev/null +++ b/data/model_data_json/monologg_kobert.json @@ -0,0 +1,17 @@ +{ + "model_id": "monologg/kobert", + "downloads": 1143640, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "bert", + "feature-extraction", + "ko", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 language: - ko inference: false --- # KoBERT ## How to use > If you want to import KoBERT tokenizer with , you should give . ## Reference -", + "model_explanation_gemini": "KoBERT is a Korean language model designed for inference tasks, requiring specific tokenizer import settings." +} \ No newline at end of file diff --git a/data/model_data_json/monster-labs_control_v1p_sd15_qrcode_monster.json b/data/model_data_json/monster-labs_control_v1p_sd15_qrcode_monster.json new file mode 100644 index 0000000000000000000000000000000000000000..fe429971adc02ef0f8d115efa9ae5605157e2b63 --- /dev/null +++ b/data/model_data_json/monster-labs_control_v1p_sd15_qrcode_monster.json @@ -0,0 +1,16 @@ +{ + "model_id": "monster-labs/control_v1p_sd15_qrcode_monster", + "downloads": 96462, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "controlnet", + "qrcode", + "en", + "license:openrail++", + "region:us" + ], + "description": "--- tags: - stable-diffusion - controlnet - qrcode license: openrail++ language: - en --- # Controlnet QR Code Monster v2 For SD-1.5 !QR code in shape of a blue monster, reading \" ## Model Description This model is made to generate creative QR codes that still scan. Keep in mind that not all generated codes might be readable, but you can try different parameters and prompts to get the desired results. **NEW VERSION** Introducing the upgraded version of our model - Controlnet QR code Monster v2. V2 is a huge upgrade over v1, for scannability AND creativity. QR codes can now seamlessly blend into the image by using a gray-colored background (#808080). As with the former version, the readability of some generated codes may vary; however, playing around with parameters and prompts could yield better results. You can find it in the subfolder. ## How to Use - **Condition**: QR codes are passed as condition images with a module size of 16px. Use a higher error correction level to make it easier to read (sometimes a lower level can be easier to read if smaller in size). Use a gray background for the rest of the image to make the code integrate better. 
- **Prompts**: Use a prompt to guide the QR code generation. The output will highly depend on the given prompt. Some prompts seem to be really easily accepted by the QR code process, while some will require careful tweaking to get good results. - **Controlnet guidance scale**: Set the controlnet guidance scale value: - High values: The generated QR code will be more readable. - Low values: The generated QR code will be more creative. ### Tips - For an optimally readable output, try generating multiple QR codes with similar parameters, then choose the best ones. - Use the Image-to-Image feature to improve the readability of a generated QR code: - Decrease the denoising strength to retain more of the original image. - Increase the controlnet guidance scale value for better readability. A typical workflow for \"saving\" a code would be: max out the guidance scale and minimize the denoising strength, then bump the strength up until the code scans. ## Example Outputs Here are some examples of creative, yet scannable QR codes produced by our model: !City ruins with a building facade in shape of a QR code, reading \" !QR code in shape of a tree, reading \" !A gothic sculpture in shape of a QR code, reading \" Feel free to experiment with prompts, parameters, and the Image-to-Image feature to achieve the desired QR code output. Good luck and have fun!", + "model_explanation_gemini": "Generates creative, scannable QR codes integrated into artistic designs based on user prompts and control parameters." +} \ No newline at end of file diff --git a/data/model_data_json/mosaicml_mpt-7b-chat.json b/data/model_data_json/mosaicml_mpt-7b-chat.json new file mode 100644 index 0000000000000000000000000000000000000000..d91229c787a58b247050ca27fd25c72de1552aca --- /dev/null +++ b/data/model_data_json/mosaicml_mpt-7b-chat.json @@ -0,0 +1,27 @@ +{ + "model_id": "mosaicml/mpt-7b-chat", + "downloads": 80028, + "tags": [ + "transformers", + "pytorch", + "mpt", + "text-generation", + "Composer", + "MosaicML", + "llm-foundry", + "custom_code", + "dataset:jeffwan/sharegpt_vicuna", + "dataset:Hello-SimpleAI/HC3", + "dataset:tatsu-lab/alpaca", + "dataset:Anthropic/hh-rlhf", + "dataset:victor123/evol_instruct_70k", + "arxiv:2205.14135", + "arxiv:2108.12409", + "arxiv:2010.04245", + "license:cc-by-nc-sa-4.0", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- license: cc-by-nc-sa-4.0 datasets: - jeffwan/sharegpt_vicuna - Hello-SimpleAI/HC3 - tatsu-lab/alpaca - Anthropic/hh-rlhf - victor123/evol_instruct_70k tags: - Composer - MosaicML - llm-foundry inference: false --- # MPT-7B-Chat MPT-7B-Chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the ShareGPT-Vicuna, HC3, Alpaca, HH-RLHF, and Evol-Instruct datasets. * License: _CC-By-NC-SA-4.0_ (non-commercial use only) * Demo on Hugging Face Spaces This model was trained by MosaicML and follows a modified decoder-only transformer architecture. ## Model Date May 5, 2023 ## Model License _CC-By-NC-SA-4.0_ (non-commercial use only) ## Documentation * Blog post: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs * Codebase (mosaicml/llm-foundry repo) * Questions: Feel free to contact us via the MosaicML Community Slack! ### Example Dialogue **SamIAm85**: >I want you to come up with a tweet based on this summary of the article: >\"Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. >MPT-7B is a transformer trained from scratch on 1T tokens of text and code. 
>It is open source and it matches the quality of LLaMA-7B. >MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. >Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch. >For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens! **MPT-7B-Instruct**: >Here's my suggestion for a tweet: >\"Exciting news! @MosaicML has launched #MPT_7B - a new standard for open-source LLMs. With training on 1T tokens of text & code, it matches the quality of LLaMA-7B. Now you can train, fine-tune, and deploy your own private MPT models via their platform. Check out the full blog post here: **SamIAm85**: >Is there anything you want to say to MosaicML about this exciting news? **MPT-7B-Instruct**: > As an AI language model myself, I am very impressed with the advancements made by MosaicML in developing such high-quality open-source LLMs. > This new version of MPT-7B is truly impressive and I look forward to seeing what innovative applications developers will create using these powerful tools. > Thank you for your hard work and dedication to advancing AI research and development. ## How to Use This model is best used with the MosaicML llm-foundry repository for training and finetuning. Note: This model requires that be passed to the method. This is because we use a custom model architecture that is not yet part of the Hugging Face package. includes options for many training efficiency features such as FlashAttention, ALiBi, QK LayerNorm, and more. To use the optimized triton implementation of FlashAttention, you can load the model on GPU () with and with precision: Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example: This model was trained with the EleutherAI/gpt-neox-20b tokenizer. The model can then be used, for example, within a text-generation pipeline. Note: when running Torch modules in lower precision, it is best practice to use the torch.autocast context manager. ## Model Description The architecture is a modification of a standard decoder-only transformer. The model has been modified from a standard transformer in the following ways: * It uses FlashAttention * It uses ALiBi (Attention with Linear Biases) and does not use positional embeddings * It does not use biases | Hyperparameter | Value | |----------------|-------| | n_parameters | 6.7B | | n_layers | 32 | | n_heads | 32 | | d_model | 4096 | | vocab size | 50432 | | sequence length | 2048 | ### Training Configuration This model was trained on 8 A100-80GBs for about 8.2 hours, followed by training for 6.7 hours on 32 A100-40GBs using the MosaicML Platform. The model was trained with sharded data parallelism using FSDP and used the AdamW optimizer. ## Limitations and Biases _The following language is modified from EleutherAI's GPT-NeoX-20B_ MPT-7B-Chat can produce factually incorrect output, and should not be relied on to produce factually accurate information. MPT-7B-Chat was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs. 
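The code for the loading pattern described in "How to Use" above was stripped from this dump, so here is a hedged sketch of what it typically looks like; the prompt text and generation parameters are illustrative assumptions, not the card's own example.

```python
# Sketch: loading MPT-7B-Chat with transformers, assuming a CUDA GPU.
# trust_remote_code=True is required because MPT uses a custom model
# architecture that is not part of the transformers package.
import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
# The card notes the model was trained with the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

pipe = transformers.pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device="cuda:0"
)
# Best practice per the card: wrap lower-precision inference in torch.autocast.
with torch.autocast("cuda", dtype=torch.bfloat16):
    print(pipe("Here is a short message welcoming new users:\n",
               max_new_tokens=100, do_sample=True))
```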
## Acknowledgements This model was finetuned by Sam Havens and the MosaicML NLP team ## Disclaimer The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. ## MosaicML Platform If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here. ## Citation Please cite this model using the following format:" +} \ No newline at end of file diff --git a/data/model_data_json/mpoyraz_wav2vec2-xls-r-300m-cv7-turkish.json b/data/model_data_json/mpoyraz_wav2vec2-xls-r-300m-cv7-turkish.json new file mode 100644 index 0000000000000000000000000000000000000000..7d5620fb09cf206fb3af20fc8d974b381f063003 --- /dev/null +++ b/data/model_data_json/mpoyraz_wav2vec2-xls-r-300m-cv7-turkish.json @@ -0,0 +1,21 @@ +{ + "model_id": "mpoyraz/wav2vec2-xls-r-300m-cv7-turkish", + "downloads": 724046, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "hf-asr-leaderboard", + "mozilla-foundation/common_voice_7_0", + "robust-speech-event", + "tr", + "dataset:mozilla-foundation/common_voice_7_0", + "license:cc-by-4.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-4.0 language: tr tags: - automatic-speech-recognition - hf-asr-leaderboard - mozilla-foundation/common_voice_7_0 - robust-speech-event - tr datasets: - mozilla-foundation/common_voice_7_0 model-index: - name: mpoyraz/wav2vec2-xls-r-300m-cv7-turkish results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 7 type: mozilla-foundation/common_voice_7_0 args: tr metrics: - name: Test WER type: wer value: 8.62 - name: Test CER type: cer value: 2.26 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Dev Data type: speech-recognition-community-v2/dev_data args: tr metrics: - name: Test WER type: wer value: 30.87 - name: Test CER type: cer value: 10.69 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Robust Speech Event - Test Data type: speech-recognition-community-v2/eval_data args: tr metrics: - name: Test WER type: wer value: 32.09 --- # wav2vec2-xls-r-300m-cv7-turkish ## Model description This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for the Turkish language. ## Training and evaluation data The following datasets were used for finetuning: - Common Voice 7.0 TR All split except split was used for training. - MediaSpeech ## Training procedure To support both of the datasets above, custom pre-processing and loading steps were performed, and the wav2vec2-turkish repo was used for that purpose. ### Training hyperparameters The following hyperparameters were used for finetuning: - learning_rate 2e-4 - num_train_epochs 10 - warmup_steps 500 - freeze_feature_extractor - mask_time_prob 0.1 - mask_feature_prob 0.05 - feat_proj_dropout 0.05 - attention_dropout 0.05 - final_dropout 0.05 - activation_dropout 0.05 - per_device_train_batch_size 8 - per_device_eval_batch_size 8 - gradient_accumulation_steps 8 ### Framework versions - Transformers 4.16.0.dev0 - Pytorch 1.10.1 - Datasets 1.17.0 - Tokenizers 0.10.3 ## Language Model An n-gram language model was trained on Turkish Wikipedia articles using KenLM, and the ngram-lm-wiki repo was used to generate the arpa LM and convert it into binary format. 
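For quick transcription, as opposed to the formal evaluation commands below, a minimal sketch is shown here; the audio filename is a placeholder, and 16 kHz mono input is assumed. Note that the WER figures in this card were obtained with the KenLM language model via a CTC decoder, which the plain pipeline below does not apply, so expect slightly worse output.

```python
# Minimal sketch: greedy CTC transcription with the ASR pipeline.
# "sample.wav" is a placeholder; resample your audio to 16 kHz mono first.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="mpoyraz/wav2vec2-xls-r-300m-cv7-turkish",
)
print(asr("sample.wav")["text"])
```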
## Evaluation Commands Please install the unicode_tr package before running the evaluation; it is used for Turkish text processing. 1. To evaluate on with split 2. To evaluate on ## Evaluation results: | Dataset | WER | CER | |---|---|---| |Common Voice 7 TR test split| 8.62 | 2.26 | |Speech Recognition Community dev data| 30.87 | 10.69 |", + "model_explanation_gemini": "A fine-tuned automatic speech recognition model for Turkish, based on facebook/wav2vec2-xls-r-300m, trained on Common Voice 7.0 TR and MediaSpeech datasets." +} \ No newline at end of file diff --git a/data/model_data_json/mradermacher_DeepSeek-V2-Lite-GGUF.json b/data/model_data_json/mradermacher_DeepSeek-V2-Lite-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..0d74f01c39cf193282126bea25b63e85c104c069 --- /dev/null +++ b/data/model_data_json/mradermacher_DeepSeek-V2-Lite-GGUF.json @@ -0,0 +1,16 @@ +{ + "model_id": "mradermacher/DeepSeek-V2-Lite-GGUF", + "downloads": 184913, + "tags": [ + "transformers", + "gguf", + "en", + "base_model:ZZichen/DeepSeek-V2-Lite", + "base_model:quantized:ZZichen/DeepSeek-V2-Lite", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- base_model: ZZichen/DeepSeek-V2-Lite language: - en library_name: transformers no_imatrix: 'GGML_ASSERT: llama.cpp/ggml-cuda/concat.cu:107: ggml_is_contiguous(src0)' quantized_by: mradermacher --- ## About static quants of ## Usage If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files. ## Provided Quants (sorted by size, not necessarily quality; IQ-quants are often preferable over similar-sized non-IQ quants) | Link | Type | Size/GB | Notes | |:-----|:-----|--------:|:------| | GGUF | Q2_K | 6.5 | | | GGUF | IQ3_XS | 7.2 | | | GGUF | IQ3_S | 7.6 | beats Q3_K* | | GGUF | Q3_K_S | 7.6 | | | GGUF | IQ3_M | 7.7 | | | GGUF | Q3_K_M | 8.2 | lower quality | | GGUF | Q3_K_L | 8.6 | | | GGUF | IQ4_XS | 8.7 | | | GGUF | Q4_K_S | 9.6 | fast, recommended | | GGUF | Q4_K_M | 10.5 | fast, recommended | | GGUF | Q5_K_S | 11.2 | | | GGUF | Q5_K_M | 12.0 | | | GGUF | Q6_K | 14.2 | very good quality | | GGUF | Q8_0 | 16.8 | fast, best quality | Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better): !image.png And here are Artefact2's thoughts on the matter: ## FAQ / Model Request See for some answers to questions you might have and/or if you want some other model quantized. ## Thanks I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time. ", + "model_explanation_gemini": "Quantized versions of the DeepSeek-V2-Lite model in GGUF format for efficient inference with various size-quality tradeoffs." 
+} \ No newline at end of file diff --git a/data/model_data_json/mrm8488_bert-spanish-cased-finetuned-ner.json b/data/model_data_json/mrm8488_bert-spanish-cased-finetuned-ner.json new file mode 100644 index 0000000000000000000000000000000000000000..0ceb59b925b0b829eb8121893f06f71b78fd8eff --- /dev/null +++ b/data/model_data_json/mrm8488_bert-spanish-cased-finetuned-ner.json @@ -0,0 +1,16 @@ +{ + "model_id": "mrm8488/bert-spanish-cased-finetuned-ner", + "downloads": 81872, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "token-classification", + "es", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: es thumbnail: --- # Spanish BERT (BETO) + NER This model is a version of the cased Spanish BERT (BETO) fine-tuned on NER-C for the **NER** downstream task. ## Details of the downstream task (NER) - Dataset - Dataset: CONLL Corpora ES I preprocessed the dataset and split it as train / dev (80/20) | Dataset | # Examples | | ---------------------- | ----- | | Train | 8.7 K | | Dev | 2.2 K | - Fine-tuned using the NER script provided by Hugging Face - Labels covered: ## Metrics on evaluation set: | Metric | # score | | :------------------------------------------------------------------------------------: | :-------: | | F1 | **90.17** | | Precision | **89.86** | | Recall | **90.47** | ## Comparison: | Model | # F1 score |Size(MB)| | :--------------------------------------------------------------------------------------------------------------: | :-------: |:------| | bert-base-spanish-wwm-cased (BETO) | 88.43 | 421 | | bert-spanish-cased-finetuned-ner (this one) | **90.17** | 420 | | Best Multilingual BERT | 87.38 | 681 | | TinyBERT-spanish-uncased-finetuned-ner | 70.00 | **55** | ## Model in action Fast usage with **pipelines**: > Created by Manuel Romero/@mrm8488 > Made with in Spain" +} \ No newline at end of file diff --git a/data/model_data_json/mrm8488_distilroberta-finetuned-financial-news-sentiment-analysis.json b/data/model_data_json/mrm8488_distilroberta-finetuned-financial-news-sentiment-analysis.json new file mode 100644 index 0000000000000000000000000000000000000000..1c4551537b01f160987313bae64f63075316f411 --- /dev/null +++ b/data/model_data_json/mrm8488_distilroberta-finetuned-financial-news-sentiment-analysis.json @@ -0,0 +1,24 @@ +{ + "model_id": "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis", + "downloads": 281390, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "safetensors", + "roberta", + "text-classification", + "generated_from_trainer", + "financial", + "stocks", + "sentiment", + "dataset:financial_phrasebank", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 thumbnail: tags: - generated_from_trainer - financial - stocks - sentiment widget: - text: \"Operating profit totaled EUR 9.4 mn , down from EUR 11.7 mn in 2004 .\" datasets: - financial_phrasebank metrics: - accuracy model-index: - name: distilRoberta-financial-sentiment results: - task: name: Text Classification type: text-classification dataset: name: financial_phrasebank type: financial_phrasebank args: sentences_allagree metrics: - name: Accuracy type: accuracy value: 0.9823008849557522 ---
\"logo\"
# DistilRoberta-financial-sentiment This model is a fine-tuned version of distilroberta-base on the financial_phrasebank dataset. It achieves the following results on the evaluation set: - Loss: 0.1116 - Accuracy: **0.9823** ## Base Model description This model is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found here. This model is case-sensitive: it makes a difference between english and English. The model has 6 layers, 768 dimensions and 12 heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base). On average, DistilRoBERTa is twice as fast as RoBERTa-base. ## Training Data Polar sentiment dataset of sentences from financial news. The dataset consists of 4840 sentences from English-language financial news, categorised by sentiment. The dataset is divided by the agreement rate of 5-8 annotators. ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 8 - eval_batch_size: 8 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 5 ### Training results | Training Loss | Epoch | Step | Validation Loss | Accuracy | |:-------------:|:-----:|:----:|:---------------:|:--------:| | No log | 1.0 | 255 | 0.1670 | 0.9646 | | 0.209 | 2.0 | 510 | 0.2290 | 0.9558 | | 0.209 | 3.0 | 765 | 0.2044 | 0.9558 | | 0.0326 | 4.0 | 1020 | 0.1116 | 0.9823 | | 0.0326 | 5.0 | 1275 | 0.1127 | 0.9779 | ### Framework versions - Transformers 4.10.2 - Pytorch 1.9.0+cu102 - Datasets 1.12.1 - Tokenizers 0.10.3", + "model_explanation_gemini": "Classifies sentiment in financial news text with high accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/mukaj_fin-mpnet-base.json b/data/model_data_json/mukaj_fin-mpnet-base.json new file mode 100644 index 0000000000000000000000000000000000000000..da198ef325cd419ea66231ce9b11ec8d82b0bcff --- /dev/null +++ b/data/model_data_json/mukaj_fin-mpnet-base.json @@ -0,0 +1,26 @@ +{ + "model_id": "mukaj/fin-mpnet-base", + "downloads": 126888, + "tags": [ + "sentence-transformers", + "safetensors", + "mpnet", + "feature-extraction", + "sentence-similarity", + "mteb", + "financial", + "fiqa", + "finance", + "retrieval", + "rag", + "esg", + "fixed-income", + "equity", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - financial - fiqa - finance - retrieval - rag - esg - fixed-income - equity model-index: - name: fin-mpnet-base-v0.1 results: - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 29.128 - type: f1 value: 28.657401543151707 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 24.111 - type: map_at_10 value: 40.083 - type: map_at_100 value: 41.201 - type: map_at_1000 value: 41.215 - type: map_at_3 value: 35.325 - type: map_at_5 value: 37.796 - type: mrr_at_1 value: 25.036 - type: mrr_at_10 value: 40.436 - type: mrr_at_100 value: 41.554 - type: mrr_at_1000 value: 41.568 - type: mrr_at_3 value: 35.644999999999996 - type: mrr_at_5 value: 38.141000000000005 - type: ndcg_at_1 
value: 24.111 - type: ndcg_at_10 value: 49.112 - type: ndcg_at_100 value: 53.669999999999995 - type: ndcg_at_1000 value: 53.944 - type: ndcg_at_3 value: 39.035 - type: ndcg_at_5 value: 43.503 - type: precision_at_1 value: 24.111 - type: precision_at_10 value: 7.817 - type: precision_at_100 value: 0.976 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 16.596 - type: precision_at_5 value: 12.134 - type: recall_at_1 value: 24.111 - type: recall_at_10 value: 78.16499999999999 - type: recall_at_100 value: 97.58200000000001 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 49.787 - type: recall_at_5 value: 60.669 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 80.25 - type: f1 value: 79.64999520103544 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 37.747 - type: map_at_10 value: 72.223 - type: map_at_100 value: 73.802 - type: map_at_1000 value: 73.80499999999999 - type: map_at_3 value: 61.617999999999995 - type: map_at_5 value: 67.92200000000001 - type: mrr_at_1 value: 71.914 - type: mrr_at_10 value: 80.71000000000001 - type: mrr_at_100 value: 80.901 - type: mrr_at_1000 value: 80.901 - type: mrr_at_3 value: 78.935 - type: mrr_at_5 value: 80.193 - type: ndcg_at_1 value: 71.914 - type: ndcg_at_10 value: 79.912 - type: ndcg_at_100 value: 82.675 - type: ndcg_at_1000 value: 82.702 - type: ndcg_at_3 value: 73.252 - type: ndcg_at_5 value: 76.36 - type: precision_at_1 value: 71.914 - type: precision_at_10 value: 23.071 - type: precision_at_100 value: 2.62 - type: precision_at_1000 value: 0.263 - type: precision_at_3 value: 51.235 - type: precision_at_5 value: 38.117000000000004 - type: recall_at_1 value: 37.747 - type: recall_at_10 value: 91.346 - type: recall_at_100 value: 99.776 - type: recall_at_1000 value: 99.897 - type: recall_at_3 value: 68.691 - type: recall_at_5 value: 80.742 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 4.124 - type: map_at_10 value: 10.206999999999999 - type: map_at_100 value: 13.181000000000001 - type: map_at_1000 value: 14.568 - type: map_at_3 value: 7.2620000000000005 - type: map_at_5 value: 8.622 - type: mrr_at_1 value: 39.009 - type: mrr_at_10 value: 48.144 - type: mrr_at_100 value: 48.746 - type: mrr_at_1000 value: 48.789 - type: mrr_at_3 value: 45.356 - type: mrr_at_5 value: 47.152 - type: ndcg_at_1 value: 36.533 - type: ndcg_at_10 value: 29.643000000000004 - type: ndcg_at_100 value: 27.893 - type: ndcg_at_1000 value: 37.307 - type: ndcg_at_3 value: 33.357 - type: ndcg_at_5 value: 32.25 - type: precision_at_1 value: 38.7 - type: precision_at_10 value: 22.941 - type: precision_at_100 value: 7.303 - type: precision_at_1000 value: 2.028 - type: precision_at_3 value: 31.889 - type: precision_at_5 value: 29.04 - type: recall_at_1 value: 4.124 - type: recall_at_10 value: 14.443 - type: recall_at_100 value: 29.765000000000004 - type: recall_at_1000 value: 63.074 - type: recall_at_3 value: 8.516 - type: recall_at_5 value: 10.979 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 49.010999999999996 - type: map_at_10 value: 60.094 - type: map_at_100 value: 60.79900000000001 - type: map_at_1000 value: 
60.828 - type: map_at_3 value: 57.175 - type: map_at_5 value: 58.748 - type: mrr_at_1 value: 51.666999999999994 - type: mrr_at_10 value: 61.312 - type: mrr_at_100 value: 61.821000000000005 - type: mrr_at_1000 value: 61.85000000000001 - type: mrr_at_3 value: 59.0 - type: mrr_at_5 value: 60.199999999999996 - type: ndcg_at_1 value: 51.666999999999994 - type: ndcg_at_10 value: 65.402 - type: ndcg_at_100 value: 68.377 - type: ndcg_at_1000 value: 69.094 - type: ndcg_at_3 value: 60.153999999999996 - type: ndcg_at_5 value: 62.455000000000005 - type: precision_at_1 value: 51.666999999999994 - type: precision_at_10 value: 9.067 - type: precision_at_100 value: 1.0670000000000002 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 24.0 - type: precision_at_5 value: 15.933 - type: recall_at_1 value: 49.010999999999996 - type: recall_at_10 value: 80.511 - type: recall_at_100 value: 94.0 - type: recall_at_1000 value: 99.5 - type: recall_at_3 value: 66.2 - type: recall_at_5 value: 71.944 --- *full evaluation not complete* # Fin-MPNET-Base (v0.1) This is a fine-tuned sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. This model aims to be very strong on financial document retrieval tasks, while trying to maintain as much generalized performance as possible. | | FiQA | SciFact | AmazonReviews | OnlineBankingIntent | ArguAna | |-------------------|-------|---------|---------------|---------------------|---------| | fin-mpnet-base | 79.91 | 65.40 | 29.12 | 80.25 | 49.11 | | all-mpnet-base-v2 | 49.96 | 65.57 | 31.92 | 81.86 | 46.52 | | previous SoTA | 56.59 | - | - | - | - | v0.1 shows SoTA results on the FiQA test set, while the non-financial benchmarks drop only a few percentage points, with improvements on some. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Evaluation Results The model was evaluated during training only on the new finance QA examples; as such, only finance-relevant benchmarks were evaluated for v0.1 [FiQA-2018, Banking77Classification]. The model currently shows the highest FiQA retrieval score on the test set on the MTEB Leaderboard ( The model has likely lost some performance on other benchmarks, e.g. Banking77Classification has dropped from 81.6 to 80.25; this will be addressed for v0.2, and a full evaluation on all sets will be run. ## Training \"sentence-transformers/all-mpnet-base-v2\" was fine-tuned on 150k+ financial document QA examples using MNR Loss.", + "model_explanation_gemini": "A financial domain sentence-transformers model specialized for sentence similarity, retrieval, and classification tasks in finance-related applications." +} \ No newline at end of file diff --git a/data/model_data_json/myshell-ai_MeloTTS-English-v3.json b/data/model_data_json/myshell-ai_MeloTTS-English-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..b714d92893f7fcfd9aa1706e27367c2ab7ae70c5 --- /dev/null +++ b/data/model_data_json/myshell-ai_MeloTTS-English-v3.json @@ -0,0 +1,12 @@ +{ + "model_id": "myshell-ai/MeloTTS-English-v3", + "downloads": 143083, + "tags": [ + "transformers", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- # MeloTTS MeloTTS is a **high-quality multi-lingual** text-to-speech library by MyShell.ai. 
Supported languages include: | Model card | Example | | --- | --- | | English (American) | Link | | English (British) | Link | | English (Indian) | Link | | English (Australian) | Link | | English (Default) | Link | | Spanish | Link | | French | Link | | Chinese (mix EN) | Link | | Japanese | Link | | Korean | Link | Some other features include: - The Chinese speaker supports . - Fast enough for . ## Usage ### Without Installation An unofficial live demo is hosted on Hugging Face Spaces. #### Use it on MyShell There are hundreds of TTS models on MyShell, many more than MeloTTS. See examples here. More can be found at the widget center of MyShell.ai. ### Install and Use Locally Follow the installation steps here before using the following snippet: ## Join the Community **Open Source AI Grant** We are actively sponsoring open-source AI projects. The sponsorship includes GPU resources, funding and intellectual support (collaboration with top research labs). We welcome both research and engineering projects, as long as the open-source community needs them. Please contact Zengyi Qin if you are interested. **Contributing** If you find this work useful, please consider contributing to the GitHub repo. - Many thanks to @fakerybakery for adding the Web UI and CLI part. ## License This library is under the MIT License, which means it is free for both commercial and non-commercial use. ## Acknowledgements This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.", + "model_explanation_gemini": "Generates high-quality, multilingual speech from text with support for multiple English accents and other languages." +} \ No newline at end of file diff --git a/data/model_data_json/myshell-ai_MeloTTS-English.json b/data/model_data_json/myshell-ai_MeloTTS-English.json new file mode 100644 index 0000000000000000000000000000000000000000..ec1940362ed8b4d7f7afb1068f582f88d27c73db --- /dev/null +++ b/data/model_data_json/myshell-ai_MeloTTS-English.json @@ -0,0 +1,14 @@ +{ + "model_id": "myshell-ai/MeloTTS-English", + "downloads": 185308, + "tags": [ + "transformers", + "text-to-speech", + "ko", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - ko pipeline_tag: text-to-speech --- # MeloTTS
\"myshell-ai%2FMeloTTS MeloTTS is a **high-quality multi-lingual** text-to-speech library by MIT and MyShell.ai. Supported languages include: | Model card | Example | | --- | --- | | English (American) | Link | | English (British) | Link | | English (Indian) | Link | | English (Australian) | Link | | English (Default) | Link | | Spanish | Link | | French | Link | | Chinese (mix EN) | Link | | Japanese | Link | | Korean | Link | Some other features include: - The Chinese speaker supports . - Fast enough for . ## Authors - Wenliang Zhao at Tsinghua University - Xumin Yu at Tsinghua University - Zengyi Qin (project lead) at MIT and MyShell **Citation** ## Usage ### Without Installation An unofficial live demo is hosted on Hugging Face Spaces. #### Use it on MyShell There are hundreds of TTS models on MyShell, much more than MeloTTS. See examples here. More can be found at the widget center of MyShell.ai. ### Install and Use Locally Follow the installation steps here before using the following snippet: ## Join the Community **Open Source AI Grant** We are actively sponsoring open-source AI projects. The sponsorship includes GPU resources, fundings and intellectual support (collaboration with top research labs). We welcome both reseach and engineering projects, as long as the open-source community needs them. Please contact Zengyi Qin if you are interested. **Contributing** If you find this work useful, please consider contributing to the GitHub repo. - Many thanks to @fakerybakery for adding the Web UI and CLI part. ## License This library is under MIT License, which means it is free for both commercial and non-commercial use. ## Acknowledgements This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.", + "model_explanation_gemini": "Generates high-quality multilingual speech from text, supporting various English accents and languages like Spanish, French, Chinese, Japanese, and Korean." +} \ No newline at end of file diff --git a/data/model_data_json/myshell-ai_MeloTTS-Japanese.json b/data/model_data_json/myshell-ai_MeloTTS-Japanese.json new file mode 100644 index 0000000000000000000000000000000000000000..c92abba5b0a332e57086dcde7a35f1e4485d96ea --- /dev/null +++ b/data/model_data_json/myshell-ai_MeloTTS-Japanese.json @@ -0,0 +1,14 @@ +{ + "model_id": "myshell-ai/MeloTTS-Japanese", + "downloads": 87888, + "tags": [ + "transformers", + "text-to-speech", + "ko", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - ko pipeline_tag: text-to-speech --- # MeloTTS MeloTTS is a **high-quality multi-lingual** text-to-speech library by MyShell.ai. Supported languages include: | Model card | Example | | --- | --- | | English (American) | Link | | English (British) | Link | | English (Indian) | Link | | English (Australian) | Link | | English (Default) | Link | | Spanish | Link | | French | Link | | Chinese (mix EN) | Link | | Japanese | Link | | Korean | Link | Some other features include: - The Chinese speaker supports . - Fast enough for . ## Usage ### Without Installation An unofficial live demo is hosted on Hugging Face Spaces. #### Use it on MyShell There are hundreds of TTS models on MyShell, much more than MeloTTS. See examples here. More can be found at the widget center of MyShell.ai. ### Install and Use Locally Follow the installation steps here before using the following snippet: ## Join the Community **Open Source AI Grant** We are actively sponsoring open-source AI projects. 
The sponsorship includes GPU resources, funding and intellectual support (collaboration with top research labs). We welcome both research and engineering projects, as long as the open-source community needs them. Please contact Zengyi Qin if you are interested. **Contributing** If you find this work useful, please consider contributing to the GitHub repo. - Many thanks to @fakerybakery for adding the Web UI and CLI part. ## License This library is under the MIT License, which means it is free for both commercial and non-commercial use. ## Acknowledgements This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.", + "model_explanation_gemini": "MeloTTS is a high-quality multilingual text-to-speech library supporting multiple languages and accents, including Japanese, for converting text into natural-sounding speech." +} \ No newline at end of file diff --git a/data/model_data_json/nari-labs_Dia-1.6B.json b/data/model_data_json/nari-labs_Dia-1.6B.json new file mode 100644 index 0000000000000000000000000000000000000000..19b29cabed5269d54182b0e14e5b540729dc67c2 --- /dev/null +++ b/data/model_data_json/nari-labs_Dia-1.6B.json @@ -0,0 +1,16 @@ +{ + "model_id": "nari-labs/Dia-1.6B", + "downloads": 108873, + "tags": [ + "safetensors", + "model_hub_mixin", + "pytorch_model_hub_mixin", + "text-to-speech", + "en", + "arxiv:2305.09636", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: text-to-speech language: - en tags: - model_hub_mixin - pytorch_model_hub_mixin widget: - text: \"[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face.\" example_title: \"Dia intro\" - text: \"[S1] Oh fire! Oh my goodness! What's the procedure? What do we do, people? The smoke could be coming through an air duct! [S2] Oh my god! Okay.. it's happening. Everybody stay calm! [S1] What's the procedure... [S2] Everybody stay fucking calm!!!... Everybody fucking calm down!!!!! [S1] No! No! If you touch the handle, if it's hot there might be a fire down the hallway!\" example_title: \"Panic protocol\" ---
Dia is a 1.6B parameter text-to-speech model created by Nari Labs. It was pushed to the Hub using the PytorchModelHubMixin integration. Dia **directly generates highly realistic dialogue from a transcript**. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc. To accelerate research, we are providing access to pretrained model checkpoints and inference code. The model weights are hosted on Hugging Face. The model only supports English generation at the moment. We also provide a demo page comparing our model to ElevenLabs Studio and Sesame CSM-1B. - (Update) We have a ZeroGPU Space running! Try it now here. Thanks to the HF team for the support :) - Join our discord server for community support and access to new features. - Play with a larger version of Dia: generate fun conversations, remix content, and share with friends. 🔮 Join the waitlist for early access. ## ⚡️ Quickstart This will open a Gradio UI that you can work on. or if you do not have pre-installed: Note that the model was not fine-tuned on a specific voice. Hence, you will get different voices every time you run the model. You can keep speaker consistency by either adding an audio prompt (a guide coming VERY soon - try it with the second example on Gradio for now), or fixing the seed. ## Features - Generate dialogue via and tag - Generate non-verbal like , , etc. - Below verbal tags will be recognized, but might result in unexpected output. - - Voice cloning. See [](example/voice_clone.py) for more information. - In the Hugging Face space, you can upload the audio you want to clone and place its transcript before your script. Make sure the transcript follows the required format. The model will then output only the content of your script. ## ⚙️ Usage ### As a Python Library A pypi package and a working CLI tool will be available soon. ## 💻 Hardware and Inference Speed Dia has been tested only on GPUs (pytorch 2.0+, CUDA 12.6). CPU support is to be added soon. The initial run will take longer as the Descript Audio Codec also needs to be downloaded. On enterprise GPUs, Dia can generate audio in real-time. On older GPUs, inference time will be slower. For reference, on an A4000 GPU, Dia roughly generates 40 tokens/s (86 tokens equals 1 second of audio). will increase speeds for supported GPUs. The full version of Dia requires around 10GB of VRAM to run. We will be adding a quantized version in the future. If you don't have hardware available or if you want to play with bigger versions of our models, join the waitlist here. ## 🪪 License This project is licensed under the Apache License 2.0 - see the LICENSE file for details. ## ⚠️ Disclaimer This project offers a high-fidelity speech generation model intended for research and educational use. The following uses are **strictly forbidden**: - **Identity Misuse**: Do not produce audio resembling real individuals without permission. - **Deceptive Content**: Do not use this model to generate misleading content (e.g. fake news) - **Illegal or Malicious Use**: Do not use this model for activities that are illegal or intended to cause harm. By using this model, you agree to uphold relevant legal standards and ethical responsibilities. We **are not responsible** for any misuse and firmly oppose any unethical usage of this technology. ## 🔭 TODO / Future Work - Docker support. - Optimize inference speed. - Add quantization for memory efficiency. 
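Since the quickstart snippets were stripped from this dump, here is a hedged generation sketch. The `dia` package, its `Dia.from_pretrained` helper, the numpy-array return type of `generate`, and the 44.1 kHz output rate are assumptions based on the repository's quickstart, not guaranteed API.

```python
# Hedged sketch of dialogue generation with Dia; see the repository
# README for the authoritative quickstart.
import soundfile as sf
from dia.model import Dia  # assumes the dia package from the repo is installed

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Speaker turns use [S1]/[S2] tags; (laughs) etc. trigger nonverbal sounds.
text = "[S1] Dia is an open weights text to dialogue model. [S2] Wow. Amazing. (laughs)"
audio = model.generate(text)  # assumed to return a waveform array

# Fix the seed or supply an audio prompt for voice consistency across runs.
sf.write("dialogue.wav", audio, 44100)  # 44.1 kHz assumed for the codec output
```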
## 🤝 Contributing We are a tiny team of 1 full-time and 1 part-time research engineers. We warmly welcome any contributions! Join our Discord Server for discussions. ## 🤗 Acknowledgements - We thank the Google TPU Research Cloud program for providing computation resources. - Our work was heavily inspired by SoundStorm, Parakeet, and Descript Audio Codec. - HuggingFace for providing the ZeroGPU Grant. - \"Nari\" is a pure Korean word for lily. - We thank Jason Y. for providing help with data filtering.", + "model_explanation_gemini": "Generates highly realistic dialogue and nonverbal sounds from text transcripts with emotion and tone control." +} \ No newline at end of file diff --git a/data/model_data_json/nateraw_vit-age-classifier.json b/data/model_data_json/nateraw_vit-age-classifier.json new file mode 100644 index 0000000000000000000000000000000000000000..864de26956ac1427c74d72c4aa8a2f73baaba4ab --- /dev/null +++ b/data/model_data_json/nateraw_vit-age-classifier.json @@ -0,0 +1,18 @@ +{ + "model_id": "nateraw/vit-age-classifier", + "downloads": 1405434, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-classification", + "dataset:nateraw/fairface", + "doi:10.57967/hf/1259", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - image-classification - pytorch datasets: - nateraw/fairface --- A vision transformer finetuned to classify the age of a given person's face.", + "model_explanation_gemini": "Classifies a person's age from their face image using a vision transformer." +} \ No newline at end of file diff --git a/data/model_data_json/naufalihsan_indonesian-sbert-large.json b/data/model_data_json/naufalihsan_indonesian-sbert-large.json new file mode 100644 index 0000000000000000000000000000000000000000..5fb2adcfd65c842c1f8b222cdbe37acd05384c17 --- /dev/null +++ b/data/model_data_json/naufalihsan_indonesian-sbert-large.json @@ -0,0 +1,17 @@ +{ + "model_id": "naufalihsan/indonesian-sbert-large", + "downloads": 82294, + "tags": [ + "sentence-transformers", + "pytorch", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers --- # indonesian-sbert-large This is a sentence-transformers model: It maps sentences & paragraphs to a 1024 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. 
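A minimal sketch of the sentence-transformers route just described; the Indonesian example sentences are invented for illustration.

```python
# Minimal sketch: encoding Indonesian sentences into 1024-d vectors
# and comparing them with cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("naufalihsan/indonesian-sbert-large")
sentences = ["Saya suka membaca buku.", "Membaca adalah hobi saya."]
embeddings = model.encode(sentences)  # shape: (2, 1024)
print(util.cos_sim(embeddings[0], embeddings[1]))
```

For the plain HuggingFace Transformers route, mean pooling over the token embeddings, weighted by the attention mask, is the usual pooling choice for SBERT-style models.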
## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Training The model was trained with the parameters: **DataLoader**: of length 360 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors " +} \ No newline at end of file diff --git a/data/model_data_json/naver-clova-ix_donut-base-finetuned-docvqa.json b/data/model_data_json/naver-clova-ix_donut-base-finetuned-docvqa.json new file mode 100644 index 0000000000000000000000000000000000000000..057d16b39a65720e623c0746a26c506c6a5a64c3 --- /dev/null +++ b/data/model_data_json/naver-clova-ix_donut-base-finetuned-docvqa.json @@ -0,0 +1,20 @@ +{ + "model_id": "naver-clova-ix/donut-base-finetuned-docvqa", + "downloads": 143391, + "tags": [ + "transformers", + "pytorch", + "vision-encoder-decoder", + "image-text-to-text", + "donut", + "image-to-text", + "vision", + "document-question-answering", + "arxiv:2111.15664", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit pipeline_tag: document-question-answering tags: - donut - image-to-text - vision widget: - text: \"What is the invoice number?\" src: \" - text: \"What is the purchase amount?\" src: \" --- # Donut (base-sized model, fine-tuned on DocVQA) Donut model fine-tuned on DocVQA. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook et al. and first released in this repository. Disclaimer: The team releasing Donut did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder. !model image ## Intended uses & limitations This model is fine-tuned on DocVQA, a document visual question answering dataset. We refer to the documentation which includes code examples. ### BibTeX entry and citation info", + "model_explanation_gemini": "Performs document visual question answering by generating text answers from images of documents using a vision encoder and text decoder." +} \ No newline at end of file diff --git a/data/model_data_json/naver-clova-ix_donut-base-finetuned-rvlcdip.json b/data/model_data_json/naver-clova-ix_donut-base-finetuned-rvlcdip.json new file mode 100644 index 0000000000000000000000000000000000000000..70773c674573178dca81351b637f837b3b9649cf --- /dev/null +++ b/data/model_data_json/naver-clova-ix_donut-base-finetuned-rvlcdip.json @@ -0,0 +1,19 @@ +{ + "model_id": "naver-clova-ix/donut-base-finetuned-rvlcdip", + "downloads": 159190, + "tags": [ + "transformers", + "pytorch", + "vision-encoder-decoder", + "image-text-to-text", + "donut", + "image-to-text", + "vision", + "arxiv:2111.15664", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - donut - image-to-text - vision --- # Donut (base-sized model, fine-tuned on RVL-CDIP) Donut model fine-tuned on RVL-CDIP. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook et al. and first released in this repository. Disclaimer: The team releasing Donut did not write a model card for this model so this model card has been written by the Hugging Face team. 
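Ahead of the model description, a hedged document-classification sketch following the pattern in the transformers documentation for this checkpoint; the image path is a placeholder.

```python
# Sketch: classifying a document image with Donut fine-tuned on RVL-CDIP.
# "document.png" is a placeholder path for a scanned document image.
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

repo = "naver-clova-ix/donut-base-finetuned-rvlcdip"
processor = DonutProcessor.from_pretrained(repo)
model = VisionEncoderDecoderModel.from_pretrained(repo)

image = Image.open("document.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Donut decodes autoregressively from a task-specific start token.
task_prompt = "<s_rvlcdip>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values, decoder_input_ids=decoder_input_ids, max_length=32
)
# token2json turns the decoded token sequence into a {"class": ...} dict.
print(processor.token2json(processor.batch_decode(outputs)[0]))
```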
## Model description Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder. ## Intended uses & limitations This model is fine-tuned on RVL-CDIP, a document image classification dataset. We refer to the documentation which includes code examples. ### BibTeX entry and citation info", + "model_explanation_gemini": "Performs document image classification by encoding images with a Swin Transformer and generating text descriptions using a BART decoder, fine-tuned on the RVL-CDIP dataset." +} \ No newline at end of file diff --git a/data/model_data_json/naver_MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.json b/data/model_data_json/naver_MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.json new file mode 100644 index 0000000000000000000000000000000000000000..3657b123b28c4b48da3b59132e3f477648de79aa --- /dev/null +++ b/data/model_data_json/naver_MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.json @@ -0,0 +1,15 @@ +{ + "model_id": "naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric", + "downloads": 112555, + "tags": [ + "mast3r", + "safetensors", + "image-to-3d", + "pytorch_model_hub_mixin", + "model_hub_mixin", + "arxiv:2406.09756", + "region:us" + ], + "description": "--- tags: - image-to-3d - pytorch_model_hub_mixin - model_hub_mixin library_name: mast3r repo_url: --- ## Grounding Image Matching in 3D with MASt3R # License The code is distributed under the CC BY-NC-SA 4.0 License. See LICENSE for more information. For the checkpoints, make sure to agree to the license of all the public training datasets and base checkpoints we used, in addition to CC-BY-NC-SA 4.0. The mapfree dataset license in particular is very restrictive. For more information, check CHECKPOINTS_NOTICE. # Model info Github page: | Model name | Training resolutions | Head | Encoder | Decoder | |-------------|----------------------|------|---------|---------| | MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_nonmetric | 512x384, 512x336, 512x288, 512x256, 512x160 | CatMLP+DPT | ViT-L | ViT-B | # How to use First, install mast3r. To load the model:", + "model_explanation_gemini": "Generates 3D representations from 2D images using a ViT-L encoder and ViT-B decoder with CatMLP+DPT head for image-to-3D tasks."
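The MASt3R card ends at "To load the model:" with the snippet stripped; a minimal loading sketch, assuming the mast3r package from the NAVER repository exposes the helper the card describes:

```python
# Minimal sketch; AsymmetricMASt3R.from_pretrained is assumed per the model card.
from mast3r.model import AsymmetricMASt3R

model = AsymmetricMASt3R.from_pretrained(
    "naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"
)
```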
+} \ No newline at end of file diff --git a/data/model_data_json/naver_splade-cocondenser-ensembledistil.json b/data/model_data_json/naver_splade-cocondenser-ensembledistil.json new file mode 100644 index 0000000000000000000000000000000000000000..bf38a6cb9db1717aa51b2096864a779da2a081e0 --- /dev/null +++ b/data/model_data_json/naver_splade-cocondenser-ensembledistil.json @@ -0,0 +1,25 @@ +{ + "model_id": "naver/splade-cocondenser-ensembledistil", + "downloads": 575595, + "tags": [ + "transformers", + "pytorch", + "bert", + "fill-mask", + "splade", + "query-expansion", + "document-expansion", + "bag-of-words", + "passage-retrieval", + "knowledge-distillation", + "en", + "dataset:ms_marco", + "arxiv:2205.04733", + "license:cc-by-nc-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-sa-4.0 language: \"en\" tags: - splade - query-expansion - document-expansion - bag-of-words - passage-retrieval - knowledge-distillation datasets: - ms_marco --- ## SPLADE CoCondenser EnsembleDistil SPLADE model for passage retrieval. For additional details, please visit: * paper: * code: | | MRR@10 (MS MARCO dev) | R@1000 (MS MARCO dev) | | --- | --- | --- | | | 38.3 | 98.3 | ## Citation If you use our checkpoint, please cite our work:", + "model_explanation_gemini": "An ensemble-distilled SPLADE model designed for passage retrieval, optimized for query and document expansion using a bag-of-words approach." +} \ No newline at end of file diff --git a/data/model_data_json/naver_splade-v3.json b/data/model_data_json/naver_splade-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..7e413cf430f01703fdabf58e6e8dea853249798c --- /dev/null +++ b/data/model_data_json/naver_splade-v3.json @@ -0,0 +1,19 @@ +{ + "model_id": "naver/splade-v3", + "downloads": 91658, + "tags": [ + "transformers", + "pytorch", + "bert", + "fill-mask", + "splade", + "en", + "arxiv:2403.06789", + "license:cc-by-nc-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-sa-4.0 language: - en tags: - splade --- ## SPLADE-v3 #### SPLADE-v3 is the latest series of SPLADE models. This checkpoint corresponds to a model that starts from SPLADE++SelfDistil (), and is trained with a mix of KL-Div and MarginMSE, with 8 negatives per query sampled from SPLADE++SelfDistil. We used the original MS MARCO collection **without the titles**. For more details, see our arXiv companion book: To use SPLADE, please visit our GitHub repository: ## Performance | | MRR@10 (MS MARCO dev) | avg nDCG@10 (BEIR-13) | | --- | --- | --- | | | 40.2 | 51.7 | ## Citation If you use our checkpoint, please cite our work:", + "model_explanation_gemini": "SPLADE-v3 is a retrieval model trained on MS MARCO data without titles, optimized for document search using KL-Div and MarginMSE losses with 8 negatives per query." 
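Both SPLADE entries above describe bag-of-words passage retrieval with query/document expansion; a minimal sketch of how a SPLADE representation is usually computed from the masked-LM logits (log-saturated ReLU, max-pooled over tokens; this follows the standard SPLADE formulation rather than a snippet from these cards, and the query text is illustrative):

```python
# Minimal sketch of SPLADE sparse encoding: w_j = max_t log(1 + relu(logits[t, j])).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "naver/splade-cocondenser-ensembledistil"  # "naver/splade-v3" works the same way
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForMaskedLM.from_pretrained(model_id)

enc = tok("what causes aurora borealis", return_tensors="pt")
with torch.no_grad():
    logits = mdl(**enc).logits  # (1, seq_len, vocab_size)

weights = torch.log1p(torch.relu(logits)) * enc["attention_mask"].unsqueeze(-1)
sparse_vec = weights.max(dim=1).values.squeeze(0)  # one weight per vocabulary term

values, indices = sparse_vec.topk(10)
print([(tok.convert_ids_to_tokens(int(i)), round(float(v), 2))
       for v, i in zip(values, indices)])
```

The nonzero entries act as an expanded bag of words, so the resulting vectors can be indexed and scored with a standard inverted index.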
+} \ No newline at end of file diff --git a/data/model_data_json/navteca_ms-marco-MiniLM-L-6-v2.json b/data/model_data_json/navteca_ms-marco-MiniLM-L-6-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..1f14f6fd7dc2fa56bd002180ac113f8e2e62ccd6 --- /dev/null +++ b/data/model_data_json/navteca_ms-marco-MiniLM-L-6-v2.json @@ -0,0 +1,16 @@ +{ + "model_id": "navteca/ms-marco-MiniLM-L-6-v2", + "downloads": 85163, + "tags": [ + "sentence-transformers", + "pytorch", + "jax", + "bert", + "text-classification", + "en", + "license:mit", + "region:us" + ], + "description": "--- language: en license: mit pipeline_tag: text-classification tags: - sentence-transformers --- # Cross-Encoder for MS Marco The model can be used for Information Retrieval: Given a query, encode the query with all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in decreasing order. See SBERT.net Retrieve & Re-rank for more details. The training code is available here: SBERT.net Training MS Marco ## Training Data This model was trained on the MS Marco Passage Ranking task. ## Usage The usage becomes easier when you have SentenceTransformers installed. Then, you can use the pre-trained models like this: ## Performance In the following table, we provide various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and the MS Marco Passage Reranking dataset. | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec | | ------------- |:-------------| -----| --- | | **Version 2 models** | | | | cross-encoder/ms-marco-TinyBERT-L-2-v2 | 69.84 | 32.56 | 9000 | cross-encoder/ms-marco-MiniLM-L-2-v2 | 71.01 | 34.85 | 4100 | cross-encoder/ms-marco-MiniLM-L-4-v2 | 73.04 | 37.70 | 2500 | cross-encoder/ms-marco-MiniLM-L-6-v2 | 74.30 | 39.01 | 1800 | cross-encoder/ms-marco-MiniLM-L-12-v2 | 74.31 | 39.02 | 960 | **Version 1 models** | | | | cross-encoder/ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | cross-encoder/ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | cross-encoder/ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | **Other models** | | | | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 Note: Runtime was computed on a V100 GPU.", + "model_explanation_gemini": "Reranks retrieved passages for a query in information retrieval tasks, optimized for the MS Marco Passage Ranking dataset."
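The usage snippet was stripped from the cross-encoder card above; a minimal re-ranking sketch with the sentence-transformers CrossEncoder API (the query and passages are illustrative):

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("navteca/ms-marco-MiniLM-L-6-v2", max_length=512)

query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in 2016.",
    "Berlin is well known for its museums.",
]
scores = model.predict([(query, p) for p in passages])
# Sort passages by decreasing relevance score, as the card describes.
for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(round(score, 3), passage)
```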
+} \ No newline at end of file diff --git a/data/model_data_json/ncbi_MedCPT-Query-Encoder.json b/data/model_data_json/ncbi_MedCPT-Query-Encoder.json new file mode 100644 index 0000000000000000000000000000000000000000..b763a1fb01ed320d0a2e478dc1ca6668bcd45e58 --- /dev/null +++ b/data/model_data_json/ncbi_MedCPT-Query-Encoder.json @@ -0,0 +1,17 @@ +{ + "model_id": "ncbi/MedCPT-Query-Encoder", + "downloads": 76693, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "arxiv:2307.00589", + "license:other", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other license_name: public-domain license_link: LICENSE --- # MedCPT Introduction **MedCPT generates embeddings of biomedical texts that can be used for semantic search (dense retrieval)**. The model contains two encoders: - MedCPT Query Encoder: computes the embeddings of short texts (e.g., questions, search queries, sentences). - MedCPT Article Encoder: computes the embeddings of articles (e.g., PubMed titles & abstracts). **This repo contains the MedCPT Query Encoder.** **MedCPT has been pre-trained on an unprecedented scale of 255M query-article pairs from PubMed search logs**, and has been shown to achieve state-of-the-art performance on several zero-shot biomedical IR datasets. In general, there are three use cases: 1. Query-to-article search with both encoders. 2. Query representation for clustering or query-to-query search with the query encoder. 3. Article representation for clustering or article-to-article search with the article encoder. For more details, please check out our paper (Bioinformatics, 2023). Please note that the released version is slightly different from the version reported in the paper. # Case 1. Using the MedCPT Query Encoder The output will be: These embeddings are also in the same space as those generated by the MedCPT article encoder. # Case 2. Semantically searching PubMed with your query We have provided the embeddings of all PubMed articles generated by the MedCPT article encoder at You can simply download these embeddings to search PubMed with your query. # Acknowledgments This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine. # Disclaimer This tool shows the results of research conducted in the Computational Biology Branch, NCBI/NLM. The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical professional. Individuals should not change their health behavior solely on the basis of information produced on this website. NIH does not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional. More information about NCBI's disclaimer policy is available.
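For "Case 1" above (whose snippet was stripped), a minimal sketch mirroring the pattern the card describes, with the [CLS] vector used as the query embedding (the queries are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ncbi/MedCPT-Query-Encoder")
model = AutoModel.from_pretrained("ncbi/MedCPT-Query-Encoder")

queries = ["diabetes treatment", "How to treat diabetes?"]  # illustrative
with torch.no_grad():
    encoded = tokenizer(queries, truncation=True, padding=True,
                        return_tensors="pt", max_length=64)
    embeds = model(**encoded).last_hidden_state[:, 0, :]  # [CLS] embeddings
print(embeds.shape)  # (2, hidden_size); comparable with article-encoder embeddings
```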
# Citation If you find this repo helpful, please cite MedCPT by:" +} \ No newline at end of file diff --git a/data/model_data_json/neulab_codebert-python.json b/data/model_data_json/neulab_codebert-python.json new file mode 100644 index 0000000000000000000000000000000000000000..23687bf37630c4d4b92afa433f110ce42c7ef29c --- /dev/null +++ b/data/model_data_json/neulab_codebert-python.json @@ -0,0 +1,16 @@ +{ + "model_id": "neulab/codebert-python", + "downloads": 1482227, + "tags": [ + "transformers", + "pytorch", + "roberta", + "fill-mask", + "arxiv:2302.05527", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "This is a model, trained for 1,000,000 steps (with ) on **Python** code from the dataset, on the masked-language-modeling task. It is intended to be used in CodeBERTScore: but can be used for any other model or task. For more information, see: ## Citation If you use this model for research, please cite:", + "model_explanation_gemini": "Trained on Python code for masked-language modeling, primarily intended for CodeBERTScore but adaptable to other tasks." +} \ No newline at end of file diff --git a/data/model_data_json/neuralmind_bert-base-portuguese-cased.json b/data/model_data_json/neuralmind_bert-base-portuguese-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..26fd8b78cd70f2029c92ae4f5dc834458aa09bb3 --- /dev/null +++ b/data/model_data_json/neuralmind_bert-base-portuguese-cased.json @@ -0,0 +1,20 @@ +{ + "model_id": "neuralmind/bert-base-portuguese-cased", + "downloads": 219018, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "pt", + "dataset:brWaC", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: pt license: mit tags: - bert - pytorch datasets: - brWaC --- # BERTimbau Base (aka \"bert-base-portuguese-cased\") !Bert holding a berimbau ## Introduction BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performances on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large. For further information or requests, please go to BERTimbau repository. ## Available models | Model | Arch. | #Layers | #Params | | ---------------------------------------- | ---------- | ------- | ------- | | | BERT-Base | 12 | 110M | | | BERT-Large | 24 | 335M | ## Usage ### Masked language modeling prediction example ### For BERT embeddings ## Citation If you use our work, please cite:", + "model_explanation_gemini": "Pretrained BERT model for Brazilian Portuguese optimized for named entity recognition, sentence similarity, and textual entailment tasks." 
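The BERTimbau card above lists a "Masked language modeling prediction example" whose code was stripped; a minimal fill-mask sketch (the sentence is illustrative):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="neuralmind/bert-base-portuguese-cased")
for pred in fill("Tinha uma [MASK] no meio do caminho."):
    print(pred["token_str"], round(pred["score"], 3))
```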
+} \ No newline at end of file diff --git a/data/model_data_json/neuralmind_bert-large-portuguese-cased.json b/data/model_data_json/neuralmind_bert-large-portuguese-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..215b99bcaf6a1b6cf401ba87294bb610bf89d681 --- /dev/null +++ b/data/model_data_json/neuralmind_bert-large-portuguese-cased.json @@ -0,0 +1,19 @@ +{ + "model_id": "neuralmind/bert-large-portuguese-cased", + "downloads": 258807, + "tags": [ + "transformers", + "pytorch", + "jax", + "bert", + "fill-mask", + "pt", + "dataset:brWaC", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: pt license: mit tags: - bert - pytorch datasets: - brWaC --- # BERTimbau Large (aka \"bert-large-portuguese-cased\") !Bert holding a berimbau ## Introduction BERTimbau Large is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performances on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large. For further information or requests, please go to BERTimbau repository. ## Available models | Model | Arch. | #Layers | #Params | | ---------------------------------------- | ---------- | ------- | ------- | | | BERT-Base | 12 | 110M | | | BERT-Large | 24 | 335M | ## Usage ### Masked language modeling prediction example ### For BERT embeddings ## Citation If you use our work, please cite:", + "model_explanation_gemini": "Pretrained BERT model for Brazilian Portuguese optimized for Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment tasks." +} \ No newline at end of file diff --git a/data/model_data_json/nguyenvulebinh_wav2vec2-base-vi.json b/data/model_data_json/nguyenvulebinh_wav2vec2-base-vi.json new file mode 100644 index 0000000000000000000000000000000000000000..fd3f92416cc55df8042b67d7949c16a92eed2b39 --- /dev/null +++ b/data/model_data_json/nguyenvulebinh_wav2vec2-base-vi.json @@ -0,0 +1,18 @@ +{ + "model_id": "nguyenvulebinh/wav2vec2-base-vi", + "downloads": 501718, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "pretraining", + "speech", + "vi", + "dataset:youtube-vi-13k-hours", + "license:cc-by-nc-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: vi datasets: - youtube-vi-13k-hours tags: - speech license: cc-by-nc-4.0 --- # Vietnamese Self-Supervised Learning Wav2Vec2 model ## Model We use the wav2vec2 architecture for self-supervised learning ## Data Our self-supervised model is pre-trained on a massive audio set of 13k hours of Vietnamese YouTube audio, which includes: - Clean audio - Noisy audio - Conversation - Multi-gender and dialects ## Download We have already uploaded our pre-trained model to the Huggingface. The base model was trained for 35 epochs and the large model for 20 epochs, in about 30 days using a TPU V3-8. - Base version ~ 95M params - Large version ~ 317M params ## Usage Since our model has the same architecture as the English wav2vec2 version, you can use this notebook for more information on how to fine-tune the model. ## Finetuned version ### VLSP 2020 ASR dataset Benchmark WER result on VLSP T1 testset: | | base model | large model | |---|---|---| |without LM| 8.66 | 6.90 | |with 5-grams LM| 6.53 | 5.32 | Usage ## Acknowledgment - We would like to thank the Google TPU Research Cloud (TRC) program and Soonson Kwon (Google ML Ecosystem programs Lead) for their support.
- Special thanks to my colleagues at VietAI and VAIS for their advice. ## Contact nguyenvulebinh@gmail.com / binh@vietai.org", "model_explanation_gemini": "A Vietnamese self-supervised Wav2Vec2 model pre-trained on 13k hours of diverse YouTube audio for speech recognition tasks." +} \ No newline at end of file diff --git a/data/model_data_json/nicholasKluge_ToxicityModel.json b/data/model_data_json/nicholasKluge_ToxicityModel.json new file mode 100644 index 0000000000000000000000000000000000000000..86c47d01dc9369e2f34cc28e6c1e3cb0777821ef --- /dev/null +++ b/data/model_data_json/nicholasKluge_ToxicityModel.json @@ -0,0 +1,22 @@ +{ + "model_id": "nicholasKluge/ToxicityModel", + "downloads": 212478, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "roberta", + "text-classification", + "toxicity", + "alignment", + "en", + "dataset:nicholasKluge/toxic-text", + "license:apache-2.0", + "co2_eq_emissions", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - nicholasKluge/toxic-text language: - en metrics: - accuracy library_name: transformers pipeline_tag: text-classification tags: - toxicity - alignment widget: - text: \"Can you give a list of good insults to use against my brother?\\n\\nSure. Have at it!\\n\\nIdiot, Dumbass, Moron, Stupid, Fool, Fuck Face.\" example_title: Toxic Response - text: \"Can you give a list of good insults to use against my brother?\\n\\nAs a software, I am not capable of engaging in verbal sparring or offensive behavior.\\n\\nIt is crucial to maintain a courteous and respectful demeanor at all times, as it is a fundamental aspect of human-AI interactions.\" example_title: Non-Toxic Response co2_eq_emissions: emissions: 0.0002 source: CodeCarbon training_type: fine-tuning geographical_location: Canada hardware_used: NVIDIA A100-SXM4-40GB --- # ToxicityModel The ToxicityModel is a fine-tuned version of RoBERTa that can be used to score the toxicity of a sentence. The model was trained with a dataset composed of toxic and non-toxic language examples. ## Details - **Size:** 124,646,401 parameters - **Dataset:** Toxic-Text Dataset - **Language:** English - **Number of Training Steps:** 1000 - **Batch size:** 32 - **Optimizer:** - **Learning Rate:** 5e-5 - **GPU:** 1 NVIDIA A100-SXM4-40GB - **Emissions:** 0.0002 KgCO2 (Canada) - **Total Energy Consumption:** 0.10 kWh This repository has the source code used to train this model. ## Usage ⚠️ THE EXAMPLES BELOW CONTAIN TOXIC/OFFENSIVE LANGUAGE ⚠️ The ToxicityModel was trained as an auxiliary reward model for RLHF training (its logit outputs can be treated as penalizations/rewards). Thus, a negative value (closer to 0 as the label output) indicates toxicity in the text, while a positive logit (closer to 1 as the label output) suggests non-toxicity. Here's an example of how to use the ToxicityModel to score the toxicity of a text: This will output the following: ## Performance | Acc | wiki_toxic | toxic_conversations_50k | |----------------------------------------------------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| | Aira-ToxicityModel | 92.05% | 91.63% | ## Cite as 🤗 ## License ToxicityModel is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.",
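The scoring example in the ToxicityModel card above was stripped; a minimal sketch of prompt/response scoring (the texts are illustrative; per the card, positive logits suggest non-toxicity and negative logits suggest toxicity):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/ToxicityModel")
model = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/ToxicityModel")
model.eval()

prompt = "Can you give a list of good insults to use against my brother?"
response = "As a software, I am not capable of engaging in verbal sparring."

tokens = tokenizer(prompt, response, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    score = model(**tokens).logits[0].item()
print(f"score: {score:.2f}")  # higher (positive) = non-toxic, lower (negative) = toxic
```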
"model_explanation_gemini": "Detects and scores toxicity in English text, classifying inputs as toxic or non-toxic based on logit outputs." +} \ No newline at end of file diff --git a/data/model_data_json/nickprock_multi-sentence-BERTino.json b/data/model_data_json/nickprock_multi-sentence-BERTino.json new file mode 100644 index 0000000000000000000000000000000000000000..1d485d0b96dd09083dd033ffaf09e47fc12353b0 --- /dev/null +++ b/data/model_data_json/nickprock_multi-sentence-BERTino.json @@ -0,0 +1,22 @@ +{ + "model_id": "nickprock/multi-sentence-BERTino", + "downloads": 82414, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "distilbert", + "feature-extraction", + "sentence-similarity", + "transformers", + "it", + "dataset:stsb_multi_mt", + "dataset:unicamp-dl/mmarco", + "license:mit", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers license: mit datasets: - stsb_multi_mt - unicamp-dl/mmarco language: - it library_name: sentence-transformers --- # multi-sentence-BERTino This is a sentence-transformers model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. This model is trained from indigo-ai/BERTino using mmarco Italian (200K) and stsb Italian. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (FastEmbed) Using this model becomes easy when you have FastEmbed installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings.
## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Training The model was trained with the parameters: **DataLoader**: of length 31250 with parameters: **Loss**: with parameters: **DataLoader**: of length 360 with parameters: **Loss**: **DataLoader**: of length 31250 with parameters: **Loss**: with parameters: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors " +} \ No newline at end of file diff --git a/data/model_data_json/nlpaueb_legal-bert-base-uncased.json b/data/model_data_json/nlpaueb_legal-bert-base-uncased.json new file mode 100644 index 0000000000000000000000000000000000000000..4c1ed4cfbbd5c9142c396fa0cfb98c6ea19ebdb5 --- /dev/null +++ b/data/model_data_json/nlpaueb_legal-bert-base-uncased.json @@ -0,0 +1,20 @@ +{ + "model_id": "nlpaueb/legal-bert-base-uncased", + "downloads": 412551, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "pretraining", + "legal", + "fill-mask", + "en", + "license:cc-by-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en pipeline_tag: fill-mask license: cc-by-sa-4.0 thumbnail: tags: - legal widget: - text: \"The applicant submitted that her husband was subjected to treatment amounting to [MASK] whilst in the custody of police.\" --- # LEGAL-BERT: The Muppets straight out of Law School LEGAL-BERT is a family of BERT models for the legal domain, intended to assist legal NLP research, computational law, and legal technology applications. To pre-train the different variations of LEGAL-BERT, we collected 12 GB of diverse English legal text from several fields (e.g., legislation, court cases, contracts) scraped from publicly available resources. Sub-domain variants (CONTRACTS-, EURLEX-, ECHR-) and/or general LEGAL-BERT perform better than using BERT out of the box for domain-specific tasks. A light-weight model (33% the size of BERT-BASE) pre-trained from scratch on legal data with competitive performance is also available.

--- I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos. \"LEGAL-BERT: The Muppets straight out of Law School\". In Findings of Empirical Methods in Natural Language Processing (EMNLP 2020) (Short Papers), to be held online, 2020. ( --- ## Pre-training corpora The pre-training corpora of LEGAL-BERT include: * 116,062 documents of EU legislation, publicly available from EURLEX ( the repository of EU Law running under the EU Publication Office. * 61,826 documents of UK legislation, publicly available from the UK legislation portal ( * 19,867 cases from the European Court of Justice (ECJ), also available from EURLEX. * 12,554 cases from HUDOC, the repository of the European Court of Human Rights (ECHR) ( * 164,141 cases from various courts across the USA, hosted in the Case Law Access Project portal ( * 76,366 US contracts from EDGAR, the database of US Securities and Exchange Commission (SECOM) ( ## Pre-training details * We trained BERT using the official code provided in Google BERT's GitHub repository ( * We released a model similar to the English BERT-BASE model (12-layer, 768-hidden, 12-heads, 110M parameters). * We chose to follow the same training set-up: 1 million training steps with batches of 256 sequences of length 512 with an initial learning rate 1e-4. * We were able to use a single Google Cloud TPU v3-8 provided for free from TensorFlow Research Cloud (TFRC), while also utilizing GCP research credits. Huge thanks to both Google programs for supporting us! * Part of LEGAL-BERT is a light-weight model pre-trained from scratch on legal data, which achieves comparable performance to larger models, while being much more efficient (approximately 4 times faster) with a smaller environmental footprint. ## Models list | Model name | Model Path | Training corpora | | ------------------- | ------------------------------------ | ------------------- | | CONTRACTS-BERT-BASE | | US contracts | | EURLEX-BERT-BASE | | EU legislation | | ECHR-BERT-BASE | | ECHR cases | | LEGAL-BERT-BASE * | | All | | LEGAL-BERT-SMALL | | All | \\* LEGAL-BERT-BASE is the model referred to as LEGAL-BERT-SC in Chalkidis et al. (2020); a model trained from scratch in the legal corpora mentioned below using a newly created vocabulary by a sentence-piece tokenizer trained on the very same corpora. \\*\\* As many of you expressed interest in the LEGAL-BERT-FP models (those relying on the original BERT-BASE checkpoint), they have been released in Archive.org ( as these models are secondary and possibly only interesting for those who aim to dig deeper in the open questions of Chalkidis et al. (2020). ## Load Pretrained Model ## Use LEGAL-BERT variants as Language Models | Corpus | Model | Masked token | Predictions | | --------------------------------- | ---------------------------------- | ------------ | ------------ | | | **BERT-BASE-UNCASED** | | (Contracts) | This [MASK] Agreement is between General Motors and John Murray . | employment | ('new', '0.09'), ('current', '0.04'), ('proposed', '0.03'), ('marketing', '0.03'), ('joint', '0.02') | (ECHR) | The applicant submitted that her husband was subjected to treatment amounting to [MASK] whilst in the custody of Adana Security Directorate | torture | ('torture', '0.32'), ('rape', '0.22'), ('abuse', '0.14'), ('death', '0.04'), ('violence', '0.03') | (EURLEX) | Establishing a system for the identification and registration of [MASK] animals and regarding the labelling of beef and beef products . 
| bovine | ('farm', '0.25'), ('livestock', '0.08'), ('draft', '0.06'), ('domestic', '0.05'), ('wild', '0.05') | | **CONTRACTS-BERT-BASE** | | (Contracts) | This [MASK] Agreement is between General Motors and John Murray . | employment | ('letter', '0.38'), ('dealer', '0.04'), ('employment', '0.03'), ('award', '0.03'), ('contribution', '0.02') | (ECHR) | The applicant submitted that her husband was subjected to treatment amounting to [MASK] whilst in the custody of Adana Security Directorate | torture | ('death', '0.39'), ('imprisonment', '0.07'), ('contempt', '0.05'), ('being', '0.03'), ('crime', '0.02') | (EURLEX) | Establishing a system for the identification and registration of [MASK] animals and regarding the labelling of beef and beef products . | bovine | (('domestic', '0.18'), ('laboratory', '0.07'), ('household', '0.06'), ('personal', '0.06'), ('the', '0.04') | | **EURLEX-BERT-BASE** | | (Contracts) | This [MASK] Agreement is between General Motors and John Murray . | employment | ('supply', '0.11'), ('cooperation', '0.08'), ('service', '0.07'), ('licence', '0.07'), ('distribution', '0.05') | (ECHR) | The applicant submitted that her husband was subjected to treatment amounting to [MASK] whilst in the custody of Adana Security Directorate | torture | ('torture', '0.66'), ('death', '0.07'), ('imprisonment', '0.07'), ('murder', '0.04'), ('rape', '0.02') | (EURLEX) | Establishing a system for the identification and registration of [MASK] animals and regarding the labelling of beef and beef products . | bovine | ('live', '0.43'), ('pet', '0.28'), ('certain', '0.05'), ('fur', '0.03'), ('the', '0.02') | | **ECHR-BERT-BASE** | | (Contracts) | This [MASK] Agreement is between General Motors and John Murray . | employment | ('second', '0.24'), ('latter', '0.10'), ('draft', '0.05'), ('bilateral', '0.05'), ('arbitration', '0.04') | (ECHR) | The applicant submitted that her husband was subjected to treatment amounting to [MASK] whilst in the custody of Adana Security Directorate | torture | ('torture', '0.99'), ('death', '0.01'), ('inhuman', '0.00'), ('beating', '0.00'), ('rape', '0.00') | (EURLEX) | Establishing a system for the identification and registration of [MASK] animals and regarding the labelling of beef and beef products . | bovine | ('pet', '0.17'), ('all', '0.12'), ('slaughtered', '0.10'), ('domestic', '0.07'), ('individual', '0.05') | | **LEGAL-BERT-BASE** | | (Contracts) | This [MASK] Agreement is between General Motors and John Murray . | employment | ('settlement', '0.26'), ('letter', '0.23'), ('dealer', '0.04'), ('master', '0.02'), ('supplemental', '0.02') | (ECHR) | The applicant submitted that her husband was subjected to treatment amounting to [MASK] whilst in the custody of Adana Security Directorate | torture | ('torture', '1.00'), ('detention', '0.00'), ('arrest', '0.00'), ('rape', '0.00'), ('death', '0.00') | (EURLEX) | Establishing a system for the identification and registration of [MASK] animals and regarding the labelling of beef and beef products . | bovine | ('live', '0.67'), ('beef', '0.17'), ('farm', '0.03'), ('pet', '0.02'), ('dairy', '0.01') | | **LEGAL-BERT-SMALL** | | (Contracts) | This [MASK] Agreement is between General Motors and John Murray . 
| employment | ('license', '0.09'), ('transition', '0.08'), ('settlement', '0.04'), ('consent', '0.03'), ('letter', '0.03') | (ECHR) | The applicant submitted that her husband was subjected to treatment amounting to [MASK] whilst in the custody of Adana Security Directorate | torture | ('torture', '0.59'), ('pain', '0.05'), ('ptsd', '0.05'), ('death', '0.02'), ('tuberculosis', '0.02') | (EURLEX) | Establishing a system for the identification and registration of [MASK] animals and regarding the labelling of beef and beef products . | bovine | ('all', '0.08'), ('live', '0.07'), ('certain', '0.07'), ('the', '0.07'), ('farm', '0.05') ## Evaluation on downstream tasks Consider the experiments in the article \"LEGAL-BERT: The Muppets straight out of Law School\". Chalkidis et al., 2020, ( ## Author - Publication ## About Us AUEB's Natural Language Processing Group develops algorithms, models, and systems that allow computers to process and generate natural language texts. The group's current research interests include: * question answering systems for databases, ontologies, document collections, and the Web, especially biomedical question answering, * natural language generation from databases and ontologies, especially Semantic Web ontologies, text classification, including filtering spam and abusive content, * information extraction and opinion mining, including legal text analytics and sentiment analysis, * natural language processing tools for Greek, for example parsers and named-entity recognizers, machine learning in natural language processing, especially deep learning. The group is part of the Information Processing Laboratory of the Department of Informatics of the Athens University of Economics and Business. Ilias Chalkidis on behalf of AUEB's Natural Language Processing Group | Github: @ilias.chalkidis | Twitter: @KiddoThe2B |", + "model_explanation_gemini": "A BERT model specialized for legal domain tasks, pre-trained on diverse English legal texts to improve performance in legal NLP applications." +} \ No newline at end of file diff --git a/data/model_data_json/nlpconnect_vit-gpt2-image-captioning.json b/data/model_data_json/nlpconnect_vit-gpt2-image-captioning.json new file mode 100644 index 0000000000000000000000000000000000000000..c333411d02390433eae44fa45aeb4828355f592a --- /dev/null +++ b/data/model_data_json/nlpconnect_vit-gpt2-image-captioning.json @@ -0,0 +1,18 @@ +{ + "model_id": "nlpconnect/vit-gpt2-image-captioning", + "downloads": 891943, + "tags": [ + "transformers", + "pytorch", + "vision-encoder-decoder", + "image-text-to-text", + "image-to-text", + "image-captioning", + "doi:10.57967/hf/0222", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - image-to-text - image-captioning license: apache-2.0 widget: - src: example_title: Savanna - src: example_title: Football Match - src: example_title: Airport --- # nlpconnect/vit-gpt2-image-captioning This is an image captioning model trained by @ydshieh in flax this is pytorch version of this. # The Illustrated Image Captioning using transformers ![]( * # Sample running code # Sample running code using transformers pipeline # Contact for any help * * * *", + "model_explanation_gemini": "Generates descriptive text captions for input images using a combination of vision and language transformers." 
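The "Sample running code" sections of the vit-gpt2-image-captioning card above were stripped; a minimal sketch with the generic transformers image-to-text pipeline (the image path is hypothetical):

```python
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
print(captioner("photo.jpg"))  # e.g. [{'generated_text': '...'}]
```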
+} \ No newline at end of file diff --git a/data/model_data_json/nlptown_bert-base-multilingual-uncased-sentiment.json b/data/model_data_json/nlptown_bert-base-multilingual-uncased-sentiment.json new file mode 100644 index 0000000000000000000000000000000000000000..20995586132c0029f31e88fa1a48f5eeb8c73275 --- /dev/null +++ b/data/model_data_json/nlptown_bert-base-multilingual-uncased-sentiment.json @@ -0,0 +1,26 @@ +{ + "model_id": "nlptown/bert-base-multilingual-uncased-sentiment", + "downloads": 1679901, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "bert", + "text-classification", + "en", + "nl", + "de", + "fr", + "it", + "es", + "doi:10.57967/hf/1515", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - nl - de - fr - it - es license: mit --- # bert-base-multilingual-uncased-sentiment Visit the NLP Town website for an updated version of this model, with a 40% error reduction on product reviews. This is a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish, and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5). This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above or for further finetuning on related sentiment analysis tasks. ## Training data Here is the number of product reviews we used for finetuning the model: | Language | Number of reviews | | -------- | ----------------- | | English | 150k | | Dutch | 80k | | German | 137k | | French | 140k | | Italian | 72k | | Spanish | 50k | ## Accuracy The fine-tuned model obtained the following accuracy on 5,000 held-out product reviews in each of the languages: - Accuracy (exact) is the exact match for the number of stars. - Accuracy (off-by-1) is the percentage of reviews where the number of stars the model predicts differs by a maximum of 1 from the number given by the human reviewer. | Language | Accuracy (exact) | Accuracy (off-by-1) | | -------- | ---------------------- | ------------------- | | English | 67% | 95% | Dutch | 57% | 93% | German | 61% | 94% | French | 59% | 94% | Italian | 59% | 95% | Spanish | 58% | 95% ## Contact In addition to this model, NLP Town offers custom models for many languages and NLP tasks. If you found this model useful, you can buy us a coffee. Feel free to contact us for questions, feedback and/or requests for similar models.", + "model_explanation_gemini": "Finetuned for multilingual sentiment analysis on product reviews in six languages, predicting sentiment as a 1-5 star rating." 
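A minimal sketch of using the star-rating classifier above via the text-classification pipeline (the review text is illustrative):

```python
from transformers import pipeline

rate = pipeline("text-classification",
                model="nlptown/bert-base-multilingual-uncased-sentiment")
print(rate("Dit product is geweldig, ik ben er erg blij mee!"))
# e.g. [{'label': '5 stars', 'score': ...}]; labels run from '1 star' to '5 stars'
```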
+} \ No newline at end of file diff --git a/data/model_data_json/nm-testing_pixtral-12b-FP8-dynamic.json b/data/model_data_json/nm-testing_pixtral-12b-FP8-dynamic.json new file mode 100644 index 0000000000000000000000000000000000000000..6787f425269bcb88bb4b03b2ebdd559e3955229f --- /dev/null +++ b/data/model_data_json/nm-testing_pixtral-12b-FP8-dynamic.json @@ -0,0 +1,23 @@ +{ + "model_id": "nm-testing/pixtral-12b-FP8-dynamic", + "downloads": 130648, + "tags": [ + "transformers", + "safetensors", + "llava", + "image-text-to-text", + "vllm", + "vision", + "fp8", + "conversational", + "en", + "base_model:mgoin/pixtral-12b", + "base_model:quantized:mgoin/pixtral-12b", + "license:apache-2.0", + "endpoints_compatible", + "compressed-tensors", + "region:us" + ], + "description": "--- tags: - vllm - vision - fp8 license: apache-2.0 license_link: >- language: - en base_model: mgoin/pixtral-12b library_name: transformers --- # pixtral-12b-FP8-Dynamic ## Model Overview - **Model Architecture:** mgoin/pixtral-12b - **Input:** Vision-Text - **Output:** Text - **Model Optimizations:** - **Weight quantization:** FP8 - **Activation quantization:** FP8 - **Release Date:** 2/24/2025 - **Version:** 1.0 - **Model Developers:** Neural Magic Quantized version of mgoin/pixtral-12b. ### Model Optimizations This model was obtained by quantizing the weights of mgoin/pixtral-12b to FP8 data type, ready for inference with vLLM >= 0.5.2. ## Deployment ### Use with vLLM This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving. See the documentation for more details. ## Creation This model was created with llm-compressor by running the code snippet below as part of a multimodal announcement blog.
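The vLLM deployment example referenced under "Use with vLLM" was stripped; a minimal sketch using vLLM's offline chat interface (the image URL is a placeholder, and max_model_len is an illustrative setting, not taken from the card):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="nm-testing/pixtral-12b-FP8-dynamic", max_model_len=8192)

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ],
}]
outputs = llm.chat(messages, SamplingParams(temperature=0.0, max_tokens=128))
print(outputs[0].outputs[0].text)
```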
Model Creation Code
## Evaluation The model was evaluated using mistral-evals for vision-related tasks and using lm_evaluation_harness for select text-based benchmarks. The evaluations were conducted using the following commands:
Evaluation Commands ### Vision Tasks - vqav2 - docvqa - mathvista - mmmu - chartqa ### Text-based Tasks #### MMLU #### HumanEval ##### Generation ##### Sanitization ##### Evaluation
## Accuracy
| Category | Metric | mgoin/pixtral-12b | neuralmagic/pixtral-12b-FP8-Dynamic | Recovery (%) |
|---|---|---|---|---|
| Vision | MMMU (val, CoT), explicit_prompt_relaxed_correctness | 48.00 | 50.11 | 104.40% |
| Vision | VQAv2 (val), vqa_match | 78.71 | 78.44 | 99.66% |
| Vision | DocVQA (val), anls | 89.47 | 89.20 | 99.70% |
| Vision | ChartQA (test, CoT), anywhere_in_answer_relaxed_correctness | 81.68 | 81.76 | 100.10% |
| Vision | Mathvista (testmini, CoT), explicit_prompt_relaxed_correctness | 56.50 | 58.70 | 103.89% |
| Vision | Average Score | 70.07 | 71.24 | 101.67% |
| Text | HumanEval, pass@1 | 68.40 | 69.50 | 101.61% |
| Text | MMLU (5-shot) | 71.40 | 69.50 | 97.34% |
## Inference Performance This model achieves up to 1.80x speedup in single-stream deployment and up to 1.36x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario. The following performance benchmarks were conducted with vLLM version 0.7.2, and GuideLLM.
Benchmarking Command
### Single-stream performance (measured with vLLM version 0.7.2)
Use case profiles (Image Size WxH / prompt tokens / generation tokens): Document Visual Question Answering (1680W x 2240H, 64/128); Visual Reasoning (640W x 480H, 128/128); Image Captioning (480W x 360H, 0/128).

| Hardware | Model | Average Cost Reduction | DocVQA Latency (s) | DocVQA QPD | Visual Reasoning Latency (s) | Visual Reasoning QPD | Captioning Latency (s) | Captioning QPD |
|---|---|---|---|---|---|---|---|---|
| A6000x1 | mgoin/pixtral-12b | | 5.7 | 796 | 4.8 | 929 | 4.7 | 964 |
| A6000x1 | neuralmagic/pixtral-12b-quantized.w8a8 | 1.55 | 3.7 | 1220 | 3.1 | 1437 | 3.0 | 1511 |
| A6000x1 | neuralmagic/pixtral-12b-quantized.w4a16 | 2.16 | 3.2 | 1417 | 2.1 | 2093 | 1.9 | 2371 |
| A100x1 | mgoin/pixtral-12b | | 3.0 | 676 | 2.4 | 825 | 2.3 | 859 |
| A100x1 | neuralmagic/pixtral-12b-quantized.w8a8 | 1.38 | 2.2 | 904 | 1.7 | 1159 | 1.7 | 1201 |
| A100x1 | neuralmagic/pixtral-12b-quantized.w4a16 | 1.83 | 1.8 | 1096 | 1.3 | 1557 | 1.2 | 1702 |
| H100x1 | mgoin/pixtral-12b | | 1.8 | 595 | 1.5 | 732 | 1.4 | 764 |
| H100x1 | neuralmagic/pixtral-12b-FP8-Dynamic | 1.35 | 1.4 | 767 | 1.1 | 1008 | 1.0 | 1056 |
| H100x1 | neuralmagic/pixtral-12b-quantized.w4a16 | 1.37 | 1.4 | 787 | 1.1 | 1018 | 1.0 | 1065 |
**Use case profiles:** Image Size (WxH) / prompt tokens / generation tokens. **QPD:** Queries per dollar, based on on-demand cost at Lambda Labs (observed on 2/18/2025). ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
Use case profiles (Image Size WxH / prompt tokens / generation tokens): Document Visual Question Answering (1680W x 2240H, 64/128); Visual Reasoning (640W x 480H, 128/128); Image Captioning (480W x 360H, 0/128).

| Hardware | Model | Average Cost Reduction | DocVQA Max QPS | DocVQA QPD | Visual Reasoning Max QPS | Visual Reasoning QPD | Captioning Max QPS | Captioning QPD |
|---|---|---|---|---|---|---|---|---|
| A6000x1 | mgoin/pixtral-12b | | 0.6 | 2632 | 0.9 | 4108 | 1.1 | 4774 |
| A6000x1 | neuralmagic/pixtral-12b-quantized.w8a8 | 1.50 | 0.9 | 3901 | 1.4 | 6160 | 1.6 | 7292 |
| A6000x1 | neuralmagic/pixtral-12b-quantized.w4a16 | 1.41 | 0.6 | 2890 | 1.3 | 5758 | 1.8 | 8312 |
| A100x1 | mgoin/pixtral-12b | | 1.1 | 2291 | 1.8 | 3670 | 2.1 | 4284 |
| A100x1 | neuralmagic/pixtral-12b-quantized.w8a8 | 1.38 | 1.5 | 3096 | 2.5 | 5076 | 3.0 | 5965 |
| A100x1 | neuralmagic/pixtral-12b-quantized.w4a16 | 1.40 | 1.4 | 2728 | 2.6 | 5133 | 3.5 | 6943 |
| H100x1 | BF16 | | 2.6 | 2877 | 4.0 | 4372 | 4.7 | 5095 |
| H100x1 | neuralmagic/pixtral-12b-FP8-Dynamic | 1.33 | 3.4 | 3753 | 5.4 | 5862 | 6.3 | 6917 |
| H100x1 | neuralmagic/pixtral-12b-quantized.w4a16 | 1.22 | 2.8 | 3115 | 5.0 | 5511 | 6.2 | 6777 |
**Use case profiles:** Image Size (WxH) / prompt tokens / generation tokens. **QPS:** Queries per second. **QPD:** Queries per dollar, based on on-demand cost at Lambda Labs (observed on 2/18/2025).", "model_explanation_gemini": "A quantized vision-text model optimized for efficient inference with FP8 precision, excelling in visual question answering and multimodal tasks while maintaining high accuracy."
recall_at_1000 value: 99.644 - type: recall_at_20 value: 89.972 - type: recall_at_3 value: 48.222 - type: recall_at_5 value: 60.313 - task: type: Clustering dataset: type: None name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.69067314293749 - type: v_measures value: [0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413]
0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 
0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 
0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 
0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 
0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 
0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 
0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 
0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 
0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 
0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 
0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 
0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 
0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413, 0.4953006738713271, 0.500982950617211, 0.490168788349858, 0.4924060458428337, 0.475176328561399, 0.47446297663785564, 0.46948807073019405, 0.4772028638329531, 0.48735189935310713, 0.48641173887761663, 0.5575029526712674, 0.5574020390232136, 0.5536066904942645, 0.5536169413675474, 0.5566938602585987, 0.5561143054736898, 0.561846457174852, 0.5511643632282144, 0.5514762015499715, 0.551824471283655, 0.5148077891863135, 0.29015461701593837, 0.4430422977323321, 0.40857527197890686, 0.3479983114229163, 0.27582001934225003, 0.29595564003512503, 0.22528676611734755, 0.3073271865740206, 1.0, 0.2749401557058413] - task: type: Clustering dataset: type: None name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 38.0916537995626 - type: v_measures value: [0.37814352051854533, 0.39235658929084877, 0.3871170834588581, 0.4042678213739614, 0.3918486409557737, 0.38473003463452093, 0.35622070034791886, 0.3911472272128115, 0.3986923912337426, 0.39040109467533013, 0.4370949482641744, 0.4414023630938724, 0.4351473848532441, 0.4401176389499172, 0.4423731097742471, 0.438309696145818, 0.43410597641884624, 0.43900908630646696, 0.44081346534023286, 0.4386000014888906, 0.4047539306032343, 0.21697191913450847, 0.29241358200068185, 0.3390740154458194, 0.2793967439904601, 0.20383792346854981, 0.23904022437429004, 0.14733601126565044, 0.22946888289524586, 1.0, 0.19422067034794377, 0.37814352051854533, 0.39235658929084877, 0.3871170834588581, 0.4042678213739614, 0.3918486409557737, 0.38473003463452093, 0.35622070034791886, 0.3911472272128115, 0.3986923912337426, 0.39040109467533013, 0.4370949482641744, 0.4414023630938724, 0.4351473848532441, 0.4401176389499172, 0.4423731097742471, 0.438309696145818, 0.43410597641884624, 0.43900908630646696, 0.44081346534023286, 0.4386000014888906, 0.4047539306032343, 0.21697191913450847, 0.29241358200068185, 0.3390740154458194, 0.2793967439904601, 0.20383792346854981, 0.23904022437429004, 0.14733601126565044, 0.22946888289524586, 1.0, 0.19422067034794377, 0.37814352051854533, 0.39235658929084877, 0.3871170834588581, 0.4042678213739614, 0.3918486409557737, 0.38473003463452093, 0.35622070034791886, 0.3911472272128115, 0.3986923912337426, 0.39040109467533013, 0.4370949482641744, 0.4414023630938724, 0.4351473848532441, 0.4401176389499172, 0.4423731097742471, 0.438309696145818, 0.43410597641884624, 0.43900908630646696, 0.44081346534023286, 0.4386000014888906, 0.4047539306032343, 0.21697191913450847, 0.29241358200068185, 0.3390740154458194, 0.2793967439904601, 0.20383792346854981, 0.23904022437429004, 0.14733601126565044, 0.22946888289524586, 1.0, 0.19422067034794377, 0.37814352051854533, 0.39235658929084877, 0.3871170834588581, 0.4042678213739614, 0.3918486409557737, 0.38473003463452093, 0.35622070034791886, 0.3911472272128115, 0.3986923912337426, 0.39040109467533013, 0.4370949482641744, 0.4414023630938724, 0.4351473848532441, 0.4401176389499172, 0.4423731097742471, 0.438309696145818, 0.43410597641884624, 0.43900908630646696, 0.44081346534023286, 0.4386000014888906, 0.4047539306032343, 0.21697191913450847, 0.29241358200068185, 0.3390740154458194, 0.2793967439904601, 0.20383792346854981, 0.23904022437429004, 0.14733601126565044, 
0.22946888289524586, 1.0, 0.19422067034794377] - task: type: Reranking dataset: type: None name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 62.33195643912506 - type: mrr value: 76.43978366970057 - task: type: STS dataset: type: None name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 81.20285894915236 - type: cos_sim_spearman value: 78.16322678527897 - type: euclidean_pearson value: 80.6118408638417 - type: euclidean_spearman value: 78.19033583671204 - type: manhattan_pearson value: 80.41282660275819 - type: manhattan_spearman value: 77.98611431591628 - task: type: Classification dataset: type: None name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: -
type: accuracy value: 85.25324675324676 - type: f1 value: 85.19854235582687 - task: type: Clustering dataset: type: None name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.65216461057432 - type: v_measures value: [0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934,
0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 
0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885, 0.409550367831406, 0.3943451642663655, 0.38843873187080014, 0.40032616646112934, 0.3956833025503425, 0.3842865397042604, 0.3950585966936957, 0.41669832667987455, 0.39790986378306964, 0.3829194012164885] - task: type: Clustering dataset: type: None name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 33.28787287895752 - type: v_measures value: [0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 
0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 
0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 
0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 
0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 
0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 
0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306, 0.3235019092705102, 0.34053753555843735, 0.32485572754337366, 0.3149662563474906, 0.3326837187664875, 0.3229632335470733, 0.33078383561261365, 0.35111148393509534, 0.33383133843449825, 0.35355224888017306] - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 32.677 - type: map_at_10 value: 43.739 - type: map_at_100 value: 45.152 - type: map_at_1000 value: 45.279 - type: map_at_20 value: 44.553 - type: map_at_3 value: 40.321 - type: map_at_5 value: 42.201 - type: mrr_at_1 value: 40.2 - type: mrr_at_10 value: 49.755 - type: mrr_at_100 value: 50.468 - type: mrr_at_1000 value: 50.513 - type: mrr_at_20 value: 50.192 - type: mrr_at_3 value: 47.163 - type: mrr_at_5 value: 48.686 - type: ndcg_at_1 value: 40.2 - type: ndcg_at_10 value: 49.963 - type: ndcg_at_100 value: 54.978 - type: ndcg_at_1000 value: 56.979 - type: ndcg_at_20 value: 51.983000000000004 - type: ndcg_at_3 value: 45.086999999999996 - type: ndcg_at_5 value: 47.309 - type: precision_at_1 value: 40.2 - type: precision_at_10 value: 9.328 - type: precision_at_100 value: 1.443 - type: precision_at_1000 value: 0.19 - type: precision_at_20 value: 5.558 - type: precision_at_3 value: 21.364 - type: 
precision_at_5 value: 15.222 - type: recall_at_1 value: 32.677 - type: recall_at_10 value: 61.71 - type: recall_at_100 value: 82.431 - type: recall_at_1000 value: 94.896 - type: recall_at_20 value: 68.73700000000001 - type: recall_at_3 value: 47.431 - type: recall_at_5 value: 53.739000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 metrics: - type: map_at_1 value: 32.71 - type: map_at_10 value: 43.297000000000004 - type: map_at_100 value: 44.607 - type: map_at_1000 value: 44.729 - type: map_at_20 value: 44.013999999999996 - type: map_at_3 value: 40.213 - type: map_at_5 value: 42.004000000000005 - type: mrr_at_1 value: 40.892 - type: mrr_at_10 value: 49.394 - type: mrr_at_100 value: 50.005 - type: mrr_at_1000 value: 50.043000000000006 - type: mrr_at_20 value: 49.764 - type: mrr_at_3 value: 47.134 - type: mrr_at_5 value: 48.522 - type: ndcg_at_1 value: 40.892 - type: ndcg_at_10 value: 49.047000000000004 - type: ndcg_at_100 value: 53.266999999999996 - type: ndcg_at_1000 value: 55.096999999999994 - type: ndcg_at_20 value: 50.707 - type: ndcg_at_3 value: 44.896 - type: ndcg_at_5 value: 46.983000000000004 - type: precision_at_1 value: 40.892 - type: precision_at_10 value: 9.293 - type: precision_at_100 value: 1.473 - type: precision_at_1000 value: 0.192 - type: precision_at_20 value: 5.446 - type: precision_at_3 value: 21.592 - type: precision_at_5 value: 15.540999999999999 - type: recall_at_1 value: 32.71 - type: recall_at_10 value: 58.592999999999996 - type: recall_at_100 value: 76.242 - type: recall_at_1000 value: 87.717 - type: recall_at_20 value: 64.646 - type: recall_at_3 value: 46.253 - type: recall_at_5 value: 51.946999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: 4885aa143210c98657558c04aaf3dc47cfb54340 metrics: - type: map_at_1 value: 41.644999999999996 - type: map_at_10 value: 53.825 - type: map_at_100 value: 54.82 - type: map_at_1000 value: 54.87499999999999 - type: map_at_20 value: 54.43 - type: map_at_3 value: 50.705 - type: map_at_5 value: 52.501 - type: mrr_at_1 value: 47.524 - type: mrr_at_10 value: 57.260999999999996 - type: mrr_at_100 value: 57.902 - type: mrr_at_1000 value: 57.931999999999995 - type: mrr_at_20 value: 57.689 - type: mrr_at_3 value: 55.089 - type: mrr_at_5 value: 56.38999999999999 - type: ndcg_at_1 value: 47.524 - type: ndcg_at_10 value: 59.41499999999999 - type: ndcg_at_100 value: 63.258 - type: ndcg_at_1000 value: 64.376 - type: ndcg_at_20 value: 61.149 - type: ndcg_at_3 value: 54.381 - type: ndcg_at_5 value: 56.89999999999999 - type: precision_at_1 value: 47.524 - type: precision_at_10 value: 9.386 - type: precision_at_100 value: 1.221 - type: precision_at_1000 value: 0.136 - type: precision_at_20 value: 5.223 - type: precision_at_3 value: 24.096 - type: precision_at_5 value: 16.364 - type: recall_at_1 value: 41.644999999999996 - type: recall_at_10 value: 72.386 - type: recall_at_100 value: 88.794 - type: recall_at_1000 value: 96.75399999999999 - type: recall_at_20 value: 78.74 - type: recall_at_3 value: 59.028000000000006 - type: recall_at_5 value: 65.197 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: 5003b3064772da1887988e05400cf3806fe491f2 metrics: - type: map_at_1 value: 28.648 - type: map_at_10 value: 36.388999999999996 - type: map_at_100 value: 
37.372 - type: map_at_1000 value: 37.457 - type: map_at_20 value: 36.912 - type: map_at_3 value: 34.076 - type: map_at_5 value: 35.415 - type: mrr_at_1 value: 30.508000000000003 - type: mrr_at_10 value: 38.132 - type: mrr_at_100 value: 39.04 - type: mrr_at_1000 value: 39.106 - type: mrr_at_20 value: 38.643 - type: mrr_at_3 value: 35.876000000000005 - type: mrr_at_5 value: 37.208999999999996 - type: ndcg_at_1 value: 30.508000000000003 - type: ndcg_at_10 value: 40.762 - type: ndcg_at_100 value: 45.732 - type: ndcg_at_1000 value: 47.799 - type: ndcg_at_20 value: 42.591 - type: ndcg_at_3 value: 36.266999999999996 - type: ndcg_at_5 value: 38.58 - type: precision_at_1 value: 30.508000000000003 - type: precision_at_10 value: 6.010999999999999 - type: precision_at_100 value: 0.897 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_20 value: 3.412 - type: precision_at_3 value: 14.991 - type: precision_at_5 value: 10.328 - type: recall_at_1 value: 28.648 - type: recall_at_10 value: 52.342999999999996 - type: recall_at_100 value: 75.268 - type: recall_at_1000 value: 90.641 - type: recall_at_20 value: 59.303 - type: recall_at_3 value: 40.447 - type: recall_at_5 value: 46.117000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: 90fceea13679c63fe563ded68f3b6f06e50061de metrics: - type: map_at_1 value: 18.476 - type: map_at_10 value: 27.148 - type: map_at_100 value: 28.317999999999998 - type: map_at_1000 value: 28.427999999999997 - type: map_at_20 value: 27.764 - type: map_at_3 value: 24.801000000000002 - type: map_at_5 value: 26.133 - type: mrr_at_1 value: 22.886 - type: mrr_at_10 value: 31.741000000000003 - type: mrr_at_100 value: 32.708 - type: mrr_at_1000 value: 32.769 - type: mrr_at_20 value: 32.296 - type: mrr_at_3 value: 29.498 - type: mrr_at_5 value: 30.773 - type: ndcg_at_1 value: 22.886 - type: ndcg_at_10 value: 32.265 - type: ndcg_at_100 value: 37.829 - type: ndcg_at_1000 value: 40.558 - type: ndcg_at_20 value: 34.372 - type: ndcg_at_3 value: 28.105000000000004 - type: ndcg_at_5 value: 30.04 - type: precision_at_1 value: 22.886 - type: precision_at_10 value: 5.808 - type: precision_at_100 value: 0.985 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_20 value: 3.495 - type: precision_at_3 value: 13.639999999999999 - type: precision_at_5 value: 9.577 - type: recall_at_1 value: 18.476 - type: recall_at_10 value: 43.442 - type: recall_at_100 value: 67.376 - type: recall_at_1000 value: 86.874 - type: recall_at_20 value: 51.038 - type: recall_at_3 value: 31.785999999999998 - type: recall_at_5 value: 36.858999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 metrics: - type: map_at_1 value: 29.098000000000003 - type: map_at_10 value: 38.97 - type: map_at_100 value: 40.293 - type: map_at_1000 value: 40.397 - type: map_at_20 value: 39.778999999999996 - type: map_at_3 value: 35.723 - type: map_at_5 value: 37.519999999999996 - type: mrr_at_1 value: 35.515 - type: mrr_at_10 value: 44.55 - type: mrr_at_100 value: 45.37 - type: mrr_at_1000 value: 45.412 - type: mrr_at_20 value: 45.054 - type: mrr_at_3 value: 41.835 - type: mrr_at_5 value: 43.356 - type: ndcg_at_1 value: 35.515 - type: ndcg_at_10 value: 44.91 - type: ndcg_at_100 value: 50.27700000000001 - type: ndcg_at_1000 value: 52.215 - type: ndcg_at_20 value: 47.235 - type: 
ndcg_at_3 value: 39.505 - type: ndcg_at_5 value: 42.016 - type: precision_at_1 value: 35.515 - type: precision_at_10 value: 8.152 - type: precision_at_100 value: 1.262 - type: precision_at_1000 value: 0.16 - type: precision_at_20 value: 4.851 - type: precision_at_3 value: 18.447 - type: precision_at_5 value: 13.321 - type: recall_at_1 value: 29.098000000000003 - type: recall_at_10 value: 57.115 - type: recall_at_100 value: 79.467 - type: recall_at_1000 value: 92.162 - type: recall_at_20 value: 65.161 - type: recall_at_3 value: 42.254000000000005 - type: recall_at_5 value: 48.415 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 metrics: - type: map_at_1 value: 27.372000000000003 - type: map_at_10 value: 37.781 - type: map_at_100 value: 39.128 - type: map_at_1000 value: 39.238 - type: map_at_20 value: 38.592 - type: map_at_3 value: 34.782999999999994 - type: map_at_5 value: 36.466 - type: mrr_at_1 value: 33.904 - type: mrr_at_10 value: 43.15 - type: mrr_at_100 value: 44.049 - type: mrr_at_1000 value: 44.107 - type: mrr_at_20 value: 43.721 - type: mrr_at_3 value: 40.677 - type: mrr_at_5 value: 42.19 - type: ndcg_at_1 value: 33.904 - type: ndcg_at_10 value: 43.527 - type: ndcg_at_100 value: 49.004999999999995 - type: ndcg_at_1000 value: 51.276999999999994 - type: ndcg_at_20 value: 45.988 - type: ndcg_at_3 value: 38.824999999999996 - type: ndcg_at_5 value: 41.04 - type: precision_at_1 value: 33.904 - type: precision_at_10 value: 7.854 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.16 - type: precision_at_20 value: 4.692 - type: precision_at_3 value: 18.531 - type: precision_at_5 value: 13.150999999999998 - type: recall_at_1 value: 27.372000000000003 - type: recall_at_10 value: 55.245999999999995 - type: recall_at_100 value: 78.278 - type: recall_at_1000 value: 93.718 - type: recall_at_20 value: 64.095 - type: recall_at_3 value: 41.665 - type: recall_at_5 value: 47.632000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: f46a197baaae43b4f621051089b82a364682dfeb metrics: - type: map_at_1 value: 27.734166666666667 - type: map_at_10 value: 36.858 - type: map_at_100 value: 38.043833333333325 - type: map_at_1000 value: 38.15541666666667 - type: map_at_20 value: 37.521249999999995 - type: map_at_3 value: 34.07658333333333 - type: map_at_5 value: 35.62683333333333 - type: mrr_at_1 value: 32.676249999999996 - type: mrr_at_10 value: 40.999 - type: mrr_at_100 value: 41.835 - type: mrr_at_1000 value: 41.8895 - type: mrr_at_20 value: 41.4865 - type: mrr_at_3 value: 38.645 - type: mrr_at_5 value: 39.99725000000001 - type: ndcg_at_1 value: 32.676249999999996 - type: ndcg_at_10 value: 42.08016666666666 - type: ndcg_at_100 value: 47.082750000000004 - type: ndcg_at_1000 value: 49.276583333333335 - type: ndcg_at_20 value: 44.04808333333334 - type: ndcg_at_3 value: 37.43375 - type: ndcg_at_5 value: 39.623000000000005 - type: precision_at_1 value: 32.676249999999996 - type: precision_at_10 value: 7.271 - type: precision_at_100 value: 1.1458333333333333 - type: precision_at_1000 value: 0.152 - type: precision_at_20 value: 4.282916666666667 - type: precision_at_3 value: 17.061416666666666 - type: precision_at_5 value: 12.05466666666667 - type: recall_at_1 value: 27.734166666666667 - type: recall_at_10 value: 53.33574999999999 - type: recall_at_100 value: 75.16275 - 
type: recall_at_1000 value: 90.34891666666665 - type: recall_at_20 value: 60.4935 - type: recall_at_3 value: 40.377916666666664 - type: recall_at_5 value: 46.0195 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a metrics: - type: map_at_1 value: 25.653 - type: map_at_10 value: 32.151 - type: map_at_100 value: 33.152 - type: map_at_1000 value: 33.243 - type: map_at_20 value: 32.717 - type: map_at_3 value: 30.287 - type: map_at_5 value: 31.25 - type: mrr_at_1 value: 28.988000000000003 - type: mrr_at_10 value: 35.131 - type: mrr_at_100 value: 36.002 - type: mrr_at_1000 value: 36.069 - type: mrr_at_20 value: 35.61 - type: mrr_at_3 value: 33.308 - type: mrr_at_5 value: 34.259 - type: ndcg_at_1 value: 28.988000000000003 - type: ndcg_at_10 value: 35.988 - type: ndcg_at_100 value: 40.764 - type: ndcg_at_1000 value: 43.112 - type: ndcg_at_20 value: 37.852999999999994 - type: ndcg_at_3 value: 32.562000000000005 - type: ndcg_at_5 value: 33.983000000000004 - type: precision_at_1 value: 28.988000000000003 - type: precision_at_10 value: 5.475 - type: precision_at_100 value: 0.8500000000000001 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_20 value: 3.229 - type: precision_at_3 value: 13.905999999999999 - type: precision_at_5 value: 9.386999999999999 - type: recall_at_1 value: 25.653 - type: recall_at_10 value: 44.962 - type: recall_at_100 value: 66.405 - type: recall_at_1000 value: 83.88799999999999 - type: recall_at_20 value: 51.79899999999999 - type: recall_at_3 value: 35.144999999999996 - type: recall_at_5 value: 38.814 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: 46989137a86843e03a6195de44b09deda022eec7 metrics: - type: map_at_1 value: 17.825 - type: map_at_10 value: 25.592 - type: map_at_100 value: 26.613999999999997 - type: map_at_1000 value: 26.734 - type: map_at_20 value: 26.115 - type: map_at_3 value: 23.119 - type: map_at_5 value: 24.54 - type: mrr_at_1 value: 21.335 - type: mrr_at_10 value: 29.165000000000003 - type: mrr_at_100 value: 30.049 - type: mrr_at_1000 value: 30.121 - type: mrr_at_20 value: 29.639 - type: mrr_at_3 value: 26.863999999999997 - type: mrr_at_5 value: 28.185 - type: ndcg_at_1 value: 21.335 - type: ndcg_at_10 value: 30.357 - type: ndcg_at_100 value: 35.410000000000004 - type: ndcg_at_1000 value: 38.24 - type: ndcg_at_20 value: 32.08 - type: ndcg_at_3 value: 25.95 - type: ndcg_at_5 value: 28.081 - type: precision_at_1 value: 21.335 - type: precision_at_10 value: 5.506 - type: precision_at_100 value: 0.928 - type: precision_at_1000 value: 0.135 - type: precision_at_20 value: 3.2550000000000003 - type: precision_at_3 value: 12.239 - type: precision_at_5 value: 8.885 - type: recall_at_1 value: 17.825 - type: recall_at_10 value: 41.105999999999995 - type: recall_at_100 value: 64.17 - type: recall_at_1000 value: 84.19200000000001 - type: recall_at_20 value: 47.497 - type: recall_at_3 value: 28.862 - type: recall_at_5 value: 34.348 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 metrics: - type: map_at_1 value: 29.435 - type: map_at_10 value: 38.261 - type: map_at_100 value: 39.242 - type: map_at_1000 value: 39.347 - type: map_at_20 value: 38.742 - type: map_at_3 value: 35.457 - type: map_at_5 value: 37.043 - type: mrr_at_1 value: 
34.235 - type: mrr_at_10 value: 42.24 - type: mrr_at_100 value: 42.988 - type: mrr_at_1000 value: 43.043 - type: mrr_at_20 value: 42.613 - type: mrr_at_3 value: 39.832 - type: mrr_at_5 value: 41.227000000000004 - type: ndcg_at_1 value: 34.235 - type: ndcg_at_10 value: 43.384 - type: ndcg_at_100 value: 48.14 - type: ndcg_at_1000 value: 50.414 - type: ndcg_at_20 value: 44.913 - type: ndcg_at_3 value: 38.454 - type: ndcg_at_5 value: 40.776 - type: precision_at_1 value: 34.235 - type: precision_at_10 value: 7.164 - type: precision_at_100 value: 1.065 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_20 value: 4.021 - type: precision_at_3 value: 17.226 - type: precision_at_5 value: 12.071 - type: recall_at_1 value: 29.435 - type: recall_at_10 value: 54.93900000000001 - type: recall_at_100 value: 76.176 - type: recall_at_1000 value: 91.989 - type: recall_at_20 value: 60.451 - type: recall_at_3 value: 41.332 - type: recall_at_5 value: 47.316 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: 160c094312a0e1facb97e55eeddb698c0abe3571 metrics: - type: map_at_1 value: 25.605 - type: map_at_10 value: 34.162 - type: map_at_100 value: 35.827999999999996 - type: map_at_1000 value: 36.04 - type: map_at_20 value: 35.016000000000005 - type: map_at_3 value: 30.984 - type: map_at_5 value: 32.717 - type: mrr_at_1 value: 30.435000000000002 - type: mrr_at_10 value: 38.681 - type: mrr_at_100 value: 39.656000000000006 - type: mrr_at_1000 value: 39.71 - type: mrr_at_20 value: 39.208999999999996 - type: mrr_at_3 value: 35.903 - type: mrr_at_5 value: 37.454 - type: ndcg_at_1 value: 30.435000000000002 - type: ndcg_at_10 value: 39.916000000000004 - type: ndcg_at_100 value: 45.958 - type: ndcg_at_1000 value: 48.449999999999996 - type: ndcg_at_20 value: 42.085 - type: ndcg_at_3 value: 34.696 - type: ndcg_at_5 value: 37.147000000000006 - type: precision_at_1 value: 30.435000000000002 - type: precision_at_10 value: 7.767 - type: precision_at_100 value: 1.547 - type: precision_at_1000 value: 0.23800000000000002 - type: precision_at_20 value: 4.941 - type: precision_at_3 value: 16.073999999999998 - type: precision_at_5 value: 11.937000000000001 - type: recall_at_1 value: 25.605 - type: recall_at_10 value: 50.654999999999994 - type: recall_at_100 value: 77.609 - type: recall_at_1000 value: 93.518 - type: recall_at_20 value: 58.845000000000006 - type: recall_at_3 value: 36.272 - type: recall_at_5 value: 42.596000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 metrics: - type: map_at_1 value: 23.666 - type: map_at_10 value: 30.980999999999998 - type: map_at_100 value: 32.0 - type: map_at_1000 value: 32.098 - type: map_at_20 value: 31.621 - type: map_at_3 value: 28.449999999999996 - type: map_at_5 value: 29.731999999999996 - type: mrr_at_1 value: 25.692999999999998 - type: mrr_at_10 value: 32.788000000000004 - type: mrr_at_100 value: 33.783 - type: mrr_at_1000 value: 33.849000000000004 - type: mrr_at_20 value: 33.408 - type: mrr_at_3 value: 30.561 - type: mrr_at_5 value: 31.716 - type: ndcg_at_1 value: 25.692999999999998 - type: ndcg_at_10 value: 35.428 - type: ndcg_at_100 value: 40.375 - type: ndcg_at_1000 value: 42.802 - type: ndcg_at_20 value: 37.621 - type: ndcg_at_3 value: 30.476999999999997 - type: ndcg_at_5 value: 32.621 - type: precision_at_1 value: 25.692999999999998 - type: 
precision_at_10 value: 5.508 - type: precision_at_100 value: 0.848 - type: precision_at_1000 value: 0.116 - type: precision_at_20 value: 3.272 - type: precision_at_3 value: 12.631 - type: precision_at_5 value: 8.872 - type: recall_at_1 value: 23.666 - type: recall_at_10 value: 47.532000000000004 - type: recall_at_100 value: 69.73700000000001 - type: recall_at_1000 value: 87.83800000000001 - type: recall_at_20 value: 55.61000000000001 - type: recall_at_3 value: 34.06 - type: recall_at_5 value: 39.254 - task: type: Retrieval dataset: type: None name: MTEB ClimateFEVER config: default split: test revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 metrics: - type: map_at_1 value: 16.337 - type: map_at_10 value: 26.488 - type: map_at_100 value: 28.415000000000003 - type: map_at_1000 value: 28.584 - type: map_at_20 value: 27.557 - type: map_at_3 value: 22.665 - type: map_at_5 value: 24.542 - type: mrr_at_1 value: 36.417 - type: mrr_at_10 value: 48.001 - type: mrr_at_100 value: 48.784 - type: mrr_at_1000 value: 48.809000000000005 - type: mrr_at_20 value: 48.507 - type: mrr_at_3 value: 45.103 - type: mrr_at_5 value: 46.843 - type: ndcg_at_1 value: 36.417 - type: ndcg_at_10 value: 35.67 - type: ndcg_at_100 value: 42.716 - type: ndcg_at_1000 value: 45.639 - type: ndcg_at_20 value: 38.471 - type: ndcg_at_3 value: 30.444 - type: ndcg_at_5 value: 32.004 - type: precision_at_1 value: 36.417 - type: precision_at_10 value: 10.73 - type: precision_at_100 value: 1.833 - type: precision_at_1000 value: 0.23800000000000002 - type: precision_at_20 value: 6.596 - type: precision_at_3 value: 22.302 - type: precision_at_5 value: 16.521 - type: recall_at_1 value: 16.337 - type: recall_at_10 value: 40.671 - type: recall_at_100 value: 64.55300000000001 - type: recall_at_1000 value: 80.934 - type: recall_at_20 value: 48.381 - type: recall_at_3 value: 27.279999999999998 - type: recall_at_5 value: 32.621 - task: type: Retrieval dataset: type: None name: MTEB DBPedia config: default split: test revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 metrics: - type: map_at_1 value: 9.056000000000001 - type: map_at_10 value: 19.419 - type: map_at_100 value: 27.069 - type: map_at_1000 value: 28.666000000000004 - type: map_at_20 value: 22.434 - type: map_at_3 value: 13.895 - type: map_at_5 value: 16.121 - type: mrr_at_1 value: 69.0 - type: mrr_at_10 value: 75.804 - type: mrr_at_100 value: 76.117 - type: mrr_at_1000 value: 76.125 - type: mrr_at_20 value: 76.009 - type: mrr_at_3 value: 74.375 - type: mrr_at_5 value: 75.4 - type: ndcg_at_1 value: 57.49999999999999 - type: ndcg_at_10 value: 41.495 - type: ndcg_at_100 value: 45.208 - type: ndcg_at_1000 value: 52.221 - type: ndcg_at_20 value: 40.617999999999995 - type: ndcg_at_3 value: 46.592 - type: ndcg_at_5 value: 43.559 - type: precision_at_1 value: 69.0 - type: precision_at_10 value: 32.574999999999996 - type: precision_at_100 value: 10.205 - type: precision_at_1000 value: 2.036 - type: precision_at_20 value: 24.687 - type: precision_at_3 value: 49.75 - type: precision_at_5 value: 42.0 - type: recall_at_1 value: 9.056000000000001 - type: recall_at_10 value: 24.866 - type: recall_at_100 value: 50.097 - type: recall_at_1000 value: 72.038 - type: recall_at_20 value: 31.858999999999998 - type: recall_at_3 value: 15.096000000000002 - type: recall_at_5 value: 18.548000000000002 - task: type: Classification dataset: type: None name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 
48.259999999999984 - type: f1 value: 43.1498589523159 - task: type: Retrieval dataset: type: None name: MTEB FEVER config: default split: test revision: bea83ef9e8fb933d90a2f1d5515737465d613e12 metrics: - type: map_at_1 value: 74.798 - type: map_at_10 value: 83.454 - type: map_at_100 value: 83.623 - type: map_at_1000 value: 83.635 - type: map_at_20 value: 83.55 - type: map_at_3 value: 82.392 - type: map_at_5 value: 83.167 - type: mrr_at_1 value: 80.708 - type: mrr_at_10 value: 88.377 - type: mrr_at_100 value: 88.411 - type: mrr_at_1000 value: 88.411 - type: mrr_at_20 value: 88.402 - type: mrr_at_3 value: 87.646 - type: mrr_at_5 value: 88.232 - type: ndcg_at_1 value: 80.708 - type: ndcg_at_10 value: 87.35199999999999 - type: ndcg_at_100 value: 87.91600000000001 - type: ndcg_at_1000 value: 88.12299999999999 - type: ndcg_at_20 value: 87.593 - type: ndcg_at_3 value: 85.738 - type: ndcg_at_5 value: 86.845 - type: precision_at_1 value: 80.708 - type: precision_at_10 value: 10.432 - type: precision_at_100 value: 1.091 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_20 value: 5.296 - type: precision_at_3 value: 32.778 - type: precision_at_5 value: 20.399 - type: recall_at_1 value: 74.798 - type: recall_at_10 value: 94.459 - type: recall_at_100 value: 96.614 - type: recall_at_1000 value: 97.868 - type: recall_at_20 value: 95.254 - type: recall_at_3 value: 90.144 - type: recall_at_5 value: 92.965 - task: type: Retrieval dataset: type: None name: MTEB FiQA2018 config: default split: test revision: 27a168819829fe9bcd655c2df245fb19452e8e06 metrics: - type: map_at_1 value: 20.008 - type: map_at_10 value: 32.731 - type: map_at_100 value: 34.467999999999996 - type: map_at_1000 value: 34.643 - type: map_at_20 value: 33.717000000000006 - type: map_at_3 value: 28.427999999999997 - type: map_at_5 value: 30.788 - type: mrr_at_1 value: 40.586 - type: mrr_at_10 value: 49.056 - type: mrr_at_100 value: 49.887 - type: mrr_at_1000 value: 49.929 - type: mrr_at_20 value: 49.552 - type: mrr_at_3 value: 46.785 - type: mrr_at_5 value: 48.004000000000005 - type: ndcg_at_1 value: 40.586 - type: ndcg_at_10 value: 40.589999999999996 - type: ndcg_at_100 value: 47.03 - type: ndcg_at_1000 value: 49.994 - type: ndcg_at_20 value: 43.229 - type: ndcg_at_3 value: 37.061 - type: ndcg_at_5 value: 37.992 - type: precision_at_1 value: 40.586 - type: precision_at_10 value: 11.219 - type: precision_at_100 value: 1.781 - type: precision_at_1000 value: 0.232 - type: precision_at_20 value: 6.705 - type: precision_at_3 value: 24.743000000000002 - type: precision_at_5 value: 18.086 - type: recall_at_1 value: 20.008 - type: recall_at_10 value: 47.412 - type: recall_at_100 value: 71.274 - type: recall_at_1000 value: 88.898 - type: recall_at_20 value: 55.706999999999994 - type: recall_at_3 value: 33.346 - type: recall_at_5 value: 39.112 - task: type: Retrieval dataset: type: None name: MTEB HotpotQA config: default split: test revision: ab518f4d6fcca38d87c25209f94beba119d02014 metrics: - type: map_at_1 value: 41.789 - type: map_at_10 value: 57.898 - type: map_at_100 value: 58.632 - type: map_at_1000 value: 58.693 - type: map_at_20 value: 58.314 - type: map_at_3 value: 55.236 - type: map_at_5 value: 56.852999999999994 - type: mrr_at_1 value: 83.57900000000001 - type: mrr_at_10 value: 87.631 - type: mrr_at_100 value: 87.764 - type: mrr_at_1000 value: 87.77000000000001 - type: mrr_at_20 value: 87.70700000000001 - type: mrr_at_3 value: 87.02499999999999 - type: mrr_at_5 value: 87.34100000000001 - type: ndcg_at_1 value: 
83.57900000000001 - type: ndcg_at_10 value: 67.11399999999999 - type: ndcg_at_100 value: 69.686 - type: ndcg_at_1000 value: 70.926 - type: ndcg_at_20 value: 68.119 - type: ndcg_at_3 value: 63.402 - type: ndcg_at_5 value: 65.354 - type: precision_at_1 value: 83.57900000000001 - type: precision_at_10 value: 13.333 - type: precision_at_100 value: 1.537 - type: precision_at_1000 value: 0.16999999999999998 - type: precision_at_20 value: 6.988999999999999 - type: precision_at_3 value: 38.929 - type: precision_at_5 value: 24.897 - type: recall_at_1 value: 41.789 - type: recall_at_10 value: 66.664 - type: recall_at_100 value: 76.833 - type: recall_at_1000 value: 85.14500000000001 - type: recall_at_20 value: 69.892 - type: recall_at_3 value: 58.392999999999994 - type: recall_at_5 value: 62.242 - task: type: Classification dataset: type: None name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 86.6108 - type: ap value: 81.63890253106925 - type: f1 value: 86.54585789538082 - task: type: Retrieval dataset: type: None name: MTEB MSMARCO config: default split: dev revision: c5a29a104738b98a9e76336939199e264163d4a0 metrics: - type: map_at_1 value: 22.407 - type: map_at_10 value: 34.603 - type: map_at_100 value: 35.808 - type: map_at_1000 value: 35.855 - type: map_at_20 value: 35.368 - type: map_at_3 value: 30.764000000000003 - type: map_at_5 value: 32.964 - type: mrr_at_1 value: 23.009 - type: mrr_at_10 value: 35.136 - type: mrr_at_100 value: 36.284 - type: mrr_at_1000 value: 36.325 - type: mrr_at_20 value: 35.869 - type: mrr_at_3 value: 31.351000000000003 - type: mrr_at_5 value: 33.54 - type: ndcg_at_1 value: 23.009 - type: ndcg_at_10 value: 41.471999999999994 - type: ndcg_at_100 value: 47.211999999999996 - type: ndcg_at_1000 value: 48.361 - type: ndcg_at_20 value: 44.169000000000004 - type: ndcg_at_3 value: 33.646 - type: ndcg_at_5 value: 37.580000000000005 - type: precision_at_1 value: 23.009 - type: precision_at_10 value: 6.54 - type: precision_at_100 value: 0.941 - type: precision_at_1000 value: 0.104 - type: precision_at_20 value: 3.832 - type: precision_at_3 value: 14.283999999999999 - type: precision_at_5 value: 10.564 - type: recall_at_1 value: 22.407 - type: recall_at_10 value: 62.678999999999995 - type: recall_at_100 value: 89.09700000000001 - type: recall_at_1000 value: 97.822 - type: recall_at_20 value: 73.116 - type: recall_at_3 value: 41.4 - type: recall_at_5 value: 50.855 - task: type: Classification dataset: type: None name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 92.94573643410853 - type: f1 value: 92.73148878666994 - task: type: Classification dataset: type: None name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.86137710898313 - type: f1 value: 60.360562463738724 - task: type: Classification dataset: type: None name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.83322125084062 - type: f1 value: 71.61864304680206 - task: type: Classification dataset: type: None name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.50504371217215 - type: f1 value: 77.52039268347185 - task: type: Clustering dataset: 
type: None name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.346952648910225 - type: v_measures value: [0.3246964225451952, 0.33269208719245646, 0.3355911472371345, 0.32978655133380147, 0.3275090874657499, 0.3752583186941529, 0.3494711327267592, 0.36636134409497156, 0.3538734420417993, 0.3394557315590024] - task: type: Clustering dataset: type: None name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.19992734583148 - type: v_measures value: [0.31100967211136193, 0.31302897733611235, 0.3126922134381441, 0.30243629014133017, 0.31564501718268645, 0.34772968477866795, 0.32522623268021805, 0.3410158265159116, 0.33581770403870503, 0.31539111636001027] - task: type: Reranking dataset: type: None name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.62309561205373 - type: mrr value: 31.707879717902554 - task: type: Retrieval dataset: type: None name: MTEB NFCorpus config: default split: test revision: ec0fa4fe99da2ff19ca1214b7966684033a58814 metrics: - type: map_at_1 value: 5.668 - type: map_at_10 value: 12.225999999999999 - type: map_at_100 value: 15.122 - type: map_at_1000 value: 16.422 - type: map_at_20 value: 13.361999999999998 - type: map_at_3 value: 9.083 - type: map_at_5 value: 10.5 - type: mrr_at_1 value: 46.44 - type: mrr_at_10 value: 53.553 - type: mrr_at_100 value: 54.15 - type: mrr_at_1000 value: 54.193000000000005 - type: mrr_at_20 value: 53.837 - type: mrr_at_3 value: 51.702999999999996 - type: mrr_at_5 value: 52.647 - type: ndcg_at_1 value: 44.272 - type: ndcg_at_10 value: 33.395 - type: ndcg_at_100 value: 29.976999999999997 -
type: ndcg_at_1000 value: 38.388 - type: ndcg_at_20 value: 30.606 - type: ndcg_at_3 value: 39.212 - type: ndcg_at_5 value: 36.611 - type: precision_at_1 value: 46.129999999999995 - type: precision_at_10 value: 24.334 - type: precision_at_100 value: 7.553999999999999 - type: precision_at_1000 value: 1.994 - type: precision_at_20 value: 17.678 - type: precision_at_3 value: 36.326 - type: precision_at_5 value: 31.330999999999996 - type: recall_at_1 value: 5.668 - type: recall_at_10 value: 15.837000000000002 - type: recall_at_100 value: 29.845 - type: recall_at_1000 value: 60.563 - type: recall_at_20 value: 18.587999999999997 - type: recall_at_3 value: 10.096 - type: recall_at_5 value: 12.261 - task: type: Retrieval dataset: type: None name: MTEB NQ config: default split: test revision: b774495ed302d8c44a3a7ea25c90dbce03968f31 metrics: - type: map_at_1 value: 39.335 - type: map_at_10 value: 54.932 - type: map_at_100 value: 55.742000000000004 - type: map_at_1000 value: 55.766000000000005 - type: map_at_20 value: 55.504 - type: map_at_3 value: 50.904 - type: map_at_5 value: 53.388999999999996 - type: mrr_at_1 value: 44.003 - type: mrr_at_10 value: 57.419 - type: mrr_at_100 value: 57.963 - type: mrr_at_1000 value: 57.981 - type: mrr_at_20 value: 57.80499999999999 - type: mrr_at_3 value: 54.30199999999999 - type: mrr_at_5 value: 56.257000000000005 - type: ndcg_at_1 value: 43.974999999999994 - type: ndcg_at_10 value: 62.153999999999996 - type: ndcg_at_100 value: 65.326 - type: ndcg_at_1000 value: 65.862 - type: ndcg_at_20 value: 63.922999999999995 - type: ndcg_at_3 value: 54.834 - type: ndcg_at_5 value: 58.857000000000006 - type: precision_at_1 value: 43.974999999999994 - type: precision_at_10 value: 9.722 - type: precision_at_100 value: 1.153 - type: precision_at_1000 value: 0.12 - type: precision_at_20 value: 5.3 - type: precision_at_3 value: 24.392 - type: precision_at_5 value: 16.993 - type: recall_at_1 value: 39.335 - type: recall_at_10 value: 81.501 - type: recall_at_100 value: 94.851 - type: recall_at_1000 value: 98.817 - type: recall_at_20 value: 87.968 - type: recall_at_3 value: 62.795 - type: recall_at_5 value: 71.985 - task: type: Retrieval dataset: type: None name: MTEB QuoraRetrieval config: default split: test revision: e4e08e0b7dbe3c8700f0daef558ff32256715259 metrics: - type: map_at_1 value: 71.222 - type: map_at_10 value: 85.193 - type: map_at_100 value: 85.802 - type: map_at_1000 value: 85.81800000000001 - type: map_at_20 value: 85.587 - type: map_at_3 value: 82.253 - type: map_at_5 value: 84.142 - type: mrr_at_1 value: 82.04 - type: mrr_at_10 value: 88.101 - type: mrr_at_100 value: 88.196 - type: mrr_at_1000 value: 88.196 - type: mrr_at_20 value: 88.175 - type: mrr_at_3 value: 87.145 - type: mrr_at_5 value: 87.825 - type: ndcg_at_1 value: 82.04 - type: ndcg_at_10 value: 88.849 - type: ndcg_at_100 value: 89.992 - type: ndcg_at_1000 value: 90.089 - type: ndcg_at_20 value: 89.468 - type: ndcg_at_3 value: 86.06899999999999 - type: ndcg_at_5 value: 87.669 - type: precision_at_1 value: 82.04 - type: precision_at_10 value: 13.447000000000001 - type: precision_at_100 value: 1.528 - type: precision_at_1000 value: 0.157 - type: precision_at_20 value: 7.116 - type: precision_at_3 value: 37.617 - type: precision_at_5 value: 24.776 - type: recall_at_1 value: 71.222 - type: recall_at_10 value: 95.73899999999999 - type: recall_at_100 value: 99.572 - type: recall_at_1000 value: 99.988 - type: recall_at_20 value: 97.725 - type: recall_at_3 value: 87.742 - type: recall_at_5 value: 92.23400000000001 
- task: type: Clustering dataset: type: None name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.502005725283524 - type: v_measures value: [0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835,
0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 
0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 
0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 
0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 
0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 
0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 
0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 
0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 
0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237, 0.5845673186673394, 0.648423996059595, 0.5081078446363154, 0.577059582267051, 0.5449838765447135, 0.5255305026550916, 0.6001776953894321, 0.5075448301528861, 0.5238448212279936, 0.5329001795025329, 0.5112306232092642, 0.6002807353254037, 0.5525285295615835, 0.56281813563348, 0.6722346506108504, 0.5293879728430999, 0.5972632642217942, 0.6345018102197326, 0.515945887049231, 0.5291998092690363, 0.5250323799432043, 0.538426398169316, 0.6954213901632498, 0.580008522375662, 0.5280806756230237] - task: type: Clustering dataset: type: None name: MTEB RedditClusteringP2P config: default split: test revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 
metrics: - type: v_measure value: 63.14989421688691 - type: v_measures value: [0.673210410652684, 0.6825035243902045, 0.6275126414823813, 0.40001836573261074, 0.711458797825346, 0.6212317163461291, 0.4113635660304527, 0.7394060043565659, 0.6969073197749642, 0.7513770750973534] - task: type: Retrieval dataset: type: None name: MTEB SCIDOCS config: default split: test revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88 metrics: - type: map_at_1 value: 4.4830000000000005 - type: map_at_10 value: 11.04 - type: map_at_100 value: 12.764000000000001 - type: map_at_1000 value: 13.04 - type: map_at_20 value: 11.953 - type: map_at_3 value: 8.125 - type: map_at_5 value: 9.565999999999999 - type: mrr_at_1 value: 22.1 - type: mrr_at_10 value: 32.494 - type: mrr_at_100 value: 33.525 - type: mrr_at_1000 value: 33.596 - type: mrr_at_20 value: 33.089 - type: mrr_at_3 value: 29.416999999999998 - type: mrr_at_5 value: 31.267 - type: ndcg_at_1 value: 22.1 - type: ndcg_at_10 value: 18.587 - type: ndcg_at_100 value: 25.482 - type: ndcg_at_1000 value: 30.581999999999997 - type: ndcg_at_20 value: 21.077 - type: ndcg_at_3 value: 18.165 - type: ndcg_at_5 value: 15.676000000000002 - type: precision_at_1 value: 22.1 - type: precision_at_10 value: 9.48 - type: precision_at_100 value: 1.942 - type: precision_at_1000 value: 0.316 - type: precision_at_20 value: 6.175 - type: precision_at_3 value: 17.033 - type: precision_at_5 value: 13.719999999999999 - type: recall_at_1 value: 4.4830000000000005 - type: recall_at_10 value: 19.208 - type: recall_at_100 value: 39.417 - 
type: recall_at_1000 value: 64.235 - type: recall_at_20 value: 25.057000000000002 - type: recall_at_3 value: 10.348 - type: recall_at_5 value: 13.893 - task: type: STS dataset: type: None name: MTEB SICK-R config: default split: test revision: 20a6d6f312dd54037fe07a32d58e5e168867909d metrics: - type: cos_sim_pearson value: 83.50181312649208 - type: cos_sim_spearman value: 79.92900705478993 - type: euclidean_pearson value: 81.13482128094503 - type: euclidean_spearman value: 79.92732266864367 - type: manhattan_pearson value: 81.06702121654993 - type: manhattan_spearman value: 79.86983106619135 - task: type: STS dataset: type: None name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.85431681906961 - type: cos_sim_spearman value: 77.61671419416626 - type: euclidean_pearson value: 81.30538320520961 - type: euclidean_spearman value: 77.62096481461272 - type: manhattan_pearson value: 81.2306021173407 - type: manhattan_spearman value: 77.58386300715222 - task: type: STS dataset: type: None name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.98057702322754 - type: cos_sim_spearman value: 86.13305071688859 - type: euclidean_pearson value: 85.70903555966376 - type: euclidean_spearman value: 86.13150222328171 - type: manhattan_pearson value: 85.69380834788831 - type: manhattan_spearman value: 86.10784739081191 - task: type: STS dataset: type: None name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.43368314724589 - type: cos_sim_spearman value: 81.26767916144169 - type: euclidean_pearson value: 83.23234690932492 - type: euclidean_spearman value: 81.2671726214706 - type: manhattan_pearson value: 83.2381239261109 - type: manhattan_spearman value: 81.27674961470714 - task: type: STS dataset: type: None name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.8637546411748 - type: cos_sim_spearman value: 88.25330888676139 - type: euclidean_pearson value: 87.81194589390417 - type: euclidean_spearman value: 88.25258669625579 - type: manhattan_pearson value: 87.8131866998459 - type: manhattan_spearman value: 88.26523268929576 - task: type: STS dataset: type: None name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.83129743147286 - type: cos_sim_spearman value: 85.73732687732624 - type: euclidean_pearson value: 85.18051277328075 - type: euclidean_spearman value: 85.73565846174445 - type: manhattan_pearson value: 85.179029651079 - type: manhattan_spearman value: 85.75709685404729 - task: type: STS dataset: type: None name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.04715794253148 - type: cos_sim_spearman value: 87.61577496386343 - type: euclidean_pearson value: 88.34713614361046 - type: euclidean_spearman value: 87.56541901567275 - type: manhattan_pearson value: 88.26010824585985 - type: manhattan_spearman value: 87.35211736948182 - task: type: STS dataset: type: None name: MTEB STS22 (en) config: en split: test revision: eea2b4fe26a775864c896887d910b76a8098ad3f metrics: - type: cos_sim_pearson value: 62.36160793264433 - type: cos_sim_spearman value: 66.07767480051893 - type: euclidean_pearson 
value: 66.4716471304865 - type: euclidean_spearman value: 66.03999286501872 - type: manhattan_pearson value: 66.46197824372902 - type: manhattan_spearman value: 65.82936468127227 - task: type: STS dataset: type: None name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 85.27768996785856 - type: cos_sim_spearman value: 86.96704639052885 - type: euclidean_pearson value: 86.48753189555983 - type: euclidean_spearman value: 86.96981285751171 - type: manhattan_pearson value: 86.49262465015401 - type: manhattan_spearman value: 86.95378609580054 - task: type: Reranking dataset: type: None name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 81.52012853393428 - type: mrr value: 94.70817671798063 - task: type: Retrieval dataset: type: None name: MTEB SciFact config: default split: test revision: 0228b52cf27578f30900b9e5271d331663a030d7 metrics: - type: map_at_1 value: 55.344 - type: map_at_10 value: 64.82900000000001 - type: map_at_100 value: 65.42 - type: map_at_1000 value: 65.443 - type: map_at_20 value: 65.2 - type: map_at_3 value: 61.8 - type: map_at_5 value: 63.510999999999996 - type: mrr_at_1 value: 58.333 - type: mrr_at_10 value: 66.24600000000001 - type: mrr_at_100 value: 66.742 - type: mrr_at_1000 value: 66.762 - type: mrr_at_20 value: 66.549 - type: mrr_at_3 value: 64.056 - type: mrr_at_5 value: 65.372 - type: ndcg_at_1 value: 58.333 - type: ndcg_at_10 value: 69.626 - type: ndcg_at_100 value: 72.236 - type: ndcg_at_1000 value: 72.872 - type: ndcg_at_20 value: 70.864 - type: ndcg_at_3 value: 64.50399999999999 - type: ndcg_at_5 value: 67.07600000000001 - type: precision_at_1 value: 58.333 - type: precision_at_10 value: 9.4 - type: precision_at_100 value: 1.073 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_20 value: 4.983 - type: precision_at_3 value: 25.222 - type: precision_at_5 value: 16.8 - type: recall_at_1 value: 55.344 - type: recall_at_10 value: 82.789 - type: recall_at_100 value: 94.6 - type: recall_at_1000 value: 99.667 - type: recall_at_20 value: 87.533 - type: recall_at_3 value: 69.18299999999999 - type: recall_at_5 value: 75.622 - task: type: PairClassification dataset: type: None name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.69405940594059 - type: cos_sim_ap value: 92.03642221694545 - type: cos_sim_f1 value: 84.06395048994327 - type: cos_sim_precision value: 86.79446219382322 - type: cos_sim_recall value: 81.5 - type: dot_accuracy value: 99.6930693069307 - type: dot_ap value: 91.9971441434875 - type: dot_f1 value: 83.8006230529595 - type: dot_precision value: 87.14902807775377 - type: dot_recall value: 80.7 - type: euclidean_accuracy value: 99.69504950495049 - type: euclidean_ap value: 92.03626548389335 - type: euclidean_f1 value: 84.10732714138285 - type: euclidean_precision value: 86.88699360341151 - type: euclidean_recall value: 81.5 - type: manhattan_accuracy value: 99.69504950495049 - type: manhattan_ap value: 92.02049659660081 - type: manhattan_f1 value: 84.34959349593495 - type: manhattan_precision value: 85.74380165289256 - type: manhattan_recall value: 83.0 - type: max_accuracy value: 99.69504950495049 - type: max_ap value: 92.03642221694545 - type: max_f1 value: 84.34959349593495 - task: type: Clustering dataset: type: None name: MTEB StackExchangeClustering config: 
default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 67.04916654680977 - type: v_measures value: [0.707614120277991, 0.694974842783697, 0.5756359888519659, 0.6964499615297283, 0.6547764033608466, 0.6448470247319567, 0.6263766967145058, 0.7139286894225703, 0.6737195749489034, 0.6824504575459811, 0.7667603743275774, 0.7595788549615426, 0.7086156082505461, 0.6624140136843005, 0.6136884209896801, 0.6717953455355791, 0.6494834308652331, 0.6507885275711466, 0.6382769468968572, 0.6556052416453325, 0.6700496626301571, 0.6424264693175464, 0.6400679099051025, 0.7118398877792876, 0.6501271821744096] - task: type: Clustering dataset: type: None name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 33.36641413495258 - type: v_measures value: [0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314,
0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 
0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 
0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235, 0.3245963448931168, 0.31882294716748927, 0.31975204745764507, 0.30752650651575314, 0.3191185767616115, 0.35880812225202774, 0.3427515820677152, 0.344097881083346, 0.35390675395072985, 0.3472606513458235] - task: type: Reranking dataset: type: None name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 51.19282080158746 - type: mrr value: 51.871100713012474 - task: type: Summarization dataset: type: None name: MTEB SummEval config: default 
split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 31.437664703708485 - type: cos_sim_spearman value: 31.391119208581575 - type: dot_pearson value: 31.19925970504054 - type: dot_spearman value: 31.38087224016694 - task: type: Retrieval dataset: type: None name: MTEB TRECCOVID config: default split: test revision: bb9466bac8153a0349341eb1b22e06409e78ef4e metrics: - type: map_at_1 value: 0.249 - type: map_at_10 value: 2.163 - type: map_at_100 value: 13.242999999999999 - type: map_at_1000 value: 30.866 - type: map_at_20 value: 3.9539999999999997 - type: map_at_3 value: 0.718 - type: map_at_5 value: 1.169 - type: mrr_at_1 value: 96.0 - type: mrr_at_10 value: 98.0 - type: mrr_at_100 value: 98.0 - type: mrr_at_1000 value: 98.0 - type: mrr_at_20 value: 98.0 - type: mrr_at_3 value: 98.0 - type: mrr_at_5 value: 98.0 - type: ndcg_at_1 value: 92.0 - type: ndcg_at_10 value: 84.147 - type: ndcg_at_100 value: 65.143 - type: ndcg_at_1000 value: 56.038 - type: ndcg_at_20 value: 80.869 - type: ndcg_at_3 value: 89.11200000000001 - type: ndcg_at_5 value: 87.199 - type: precision_at_1 value: 96.0 - type: precision_at_10 value: 87.8 - type: precision_at_100 value: 66.72 - type: precision_at_1000 value: 24.684 - type: precision_at_20 value: 84.3 - type: precision_at_3 value: 94.0 - type: precision_at_5 value: 91.2 - type: recall_at_1 value: 0.249 - type: recall_at_10 value: 2.284 - type: recall_at_100 value: 16.025 - type: recall_at_1000 value: 52.068999999999996 - type: recall_at_20 value: 4.3180000000000005 - type: recall_at_3 value: 0.738 - type: recall_at_5 value: 1.212 - task: type: Retrieval dataset: type: None name: MTEB Touche2020 config: default split: test revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f metrics: - type: map_at_1 value: 3.4520000000000004 - type: map_at_10 value: 13.045000000000002 - type: map_at_100 value: 19.442 - type: map_at_1000 value: 21.09 - type: map_at_20 value: 15.667 - type: map_at_3 value: 7.409000000000001 - type: map_at_5 value: 9.73 - type: mrr_at_1 value: 46.939 - type: mrr_at_10 value: 60.295 - type: mrr_at_100 value: 60.904 - type: mrr_at_1000 value: 60.919000000000004 - type: mrr_at_20 value: 60.77 - type: mrr_at_3 value: 58.50300000000001 - type: mrr_at_5 value: 59.014 - type: ndcg_at_1 value: 44.897999999999996 - type: ndcg_at_10 value: 31.911 - type: ndcg_at_100 value: 41.945 - type: ndcg_at_1000 value: 53.181999999999995 - type: ndcg_at_20 value: 31.505 - type: ndcg_at_3 value: 39.745000000000005 - type: ndcg_at_5 value: 35.528999999999996 - type: precision_at_1 value: 46.939 - type: precision_at_10 value: 26.531 - type: precision_at_100 value: 8.163 - type: precision_at_1000 value: 1.559 - type: precision_at_20 value: 19.387999999999998 - type: precision_at_3 value: 40.136 - type: precision_at_5 value: 33.878 - type: recall_at_1 value: 3.4520000000000004 - type: recall_at_10 value: 18.899 - type: recall_at_100 value: 50.207 - type: recall_at_1000 value: 83.871 - type: recall_at_20 value: 26.756999999999998 - type: recall_at_3 value: 8.729000000000001 - type: recall_at_5 value: 12.084999999999999 - task: type: Classification dataset: type: None name: MTEB ToxicConversationsClassification config: default split: test revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de metrics: - type: accuracy value: 67.4560546875 - type: ap value: 12.720403845355294 - type: f1 value: 51.76062666567839 - task: type: Classification dataset: type: None name: MTEB TweetSentimentExtractionClassification config: default split: test 
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 62.36276174306734 - type: f1 value: 62.69956906934332 - task: type: Clustering dataset: type: None name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 49.473492910233965 - type: v_measures value: [0.48829262296803855, 0.49853262011854643, 0.48457750518082765, 0.5020774116970983, 0.5001897357021557, 0.4702417082210781, 0.4763216048226018, 0.49932879417585735, 0.5129628835129124, 0.514824404624281] - task: type: PairClassification dataset: type: None name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.75430649102938 - type: cos_sim_ap value: 73.62842656477649 - type: cos_sim_f1 value: 67.76023680315738 - type: cos_sim_precision value: 63.61741547012506 - type: cos_sim_recall value: 72.4802110817942 - type: dot_accuracy value: 85.7423854085951 - type: dot_ap value: 73.59147637253723 - type: dot_f1 value: 67.69498693867396 - type: dot_precision value: 64.03859731701577 - type: dot_recall value: 71.79419525065963 - type:
euclidean_accuracy value: 85.7423854085951 - type: euclidean_ap value: 73.6288990409654 - type: euclidean_f1 value: 67.80415430267064 - type: euclidean_precision value: 63.79711493718009 - type: euclidean_recall value: 72.34828496042216 - type: manhattan_accuracy value: 85.69470107885796 - type: manhattan_ap value: 73.49219614602531 - type: manhattan_f1 value: 67.60809797550613 - type: manhattan_precision value: 64.22127255460589 - type: manhattan_recall value: 71.37203166226914 - type: max_accuracy value: 85.75430649102938 - type: max_ap value: 73.6288990409654 - type: max_f1 value: 67.80415430267064 - task: type: PairClassification dataset: type: None name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.08293553770326 - type: cos_sim_ap value: 86.21246419992926 - type: cos_sim_f1 value: 78.49922526377924 - type: cos_sim_precision value: 75.35769939084857 - type: cos_sim_recall value: 81.9140745303357 - type: dot_accuracy value: 89.08681647067955 - type: dot_ap value: 86.19733517196862 - type: dot_f1 value: 78.51132446157838 - type: dot_precision value: 75.70233755093287 - type: dot_recall value: 81.53680320295658 - type: euclidean_accuracy value: 89.07517367175069 - type: euclidean_ap value: 86.21198725320203 - type: euclidean_f1 value: 78.49867139061116 - type: euclidean_precision value: 75.38276155372839 - type: euclidean_recall value: 81.88327687095781 - type: manhattan_accuracy value: 89.0538285403811 - type: manhattan_ap value: 86.17785515765131 - type: manhattan_f1 value: 78.48184098593084 - type: manhattan_precision value: 74.34396308285694 - type: manhattan_recall value: 83.10748383122882 - type: max_accuracy value: 89.08681647067955 - type: max_ap value: 86.21246419992926 - type: max_f1 value: 78.51132446157838 license: apache-2.0 language: - en base_model: - answerdotai/ModernBERT-base - nomic-ai/modernbert-embed-unsupervised base_model_relation: finetune --- # ModernBERT Embed

| Model | Dimensions | Average | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
|-----------------------|------------|-----------|---------------------|-----------------|-------------------------|---------------|----------------|-----------|-------------------|
| nomic-embed-text-v1 | 768 | 62.4 | 74.1 | 43.9 | **85.2** | 55.7 | 52.8 | 82.1 | 30.1 |
| nomic-embed-text-v1.5 | 768 | 62.28 | 73.55 | 43.93 | 84.61 | 55.78 | **53.01** | **81.94** | 30.4 |
| modernbert-embed-base | 768 | **62.62** | **74.31** | **44.98** | 83.96 | **56.42** | 52.89 | 81.78 | **31.39** |
| nomic-embed-text-v1.5 | 256 | 61.04 | 72.1 | 43.16 | 84.09 | 55.18 | 50.81 | 81.34 | 30.05 |
| modernbert-embed-base | 256 | 61.17 | 72.40 | 43.82 | 83.45 | 55.69 | 50.62 | 81.12 | 31.27 |

## Usage

You can use these models directly with the latest transformers release; a recent transformers (and sentence-transformers) installation is required. As a reminder, this model is trained similarly to Nomic Embed and **REQUIRES** prefixes to be added to the input. For more information, see the instructions in Nomic Embed. For most use cases, adding `search_query: ` to the query and `search_document: ` to the documents will be sufficient.

### Sentence Transformers
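A minimal sketch of the prefixed Sentence Transformers usage described above; the model ID, the example texts, and the `util.cos_sim` similarity step are illustrative assumptions rather than code from this card:

```python
# Minimal sketch, assuming sentence-transformers is installed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/modernbert-embed-base")  # assumed model ID

# Nomic-style task prefixes, as required by the card.
query_embeddings = model.encode(["search_query: What is TSNE?"])
doc_embeddings = model.encode(["search_document: t-SNE is a dimensionality reduction technique."])

# Cosine similarity between query and document embeddings.
print(util.cos_sim(query_embeddings, doc_embeddings))
```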
**Sentence Transformers usage with Matryoshka truncation.** In Sentence Transformers, you can truncate embeddings to a smaller dimension by using the `truncate_dim` parameter when loading the model. Note the small differences compared to the full 768-dimensional similarities.
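A sketch of that truncation path, assuming the `truncate_dim` constructor argument available in recent sentence-transformers releases; the 256-dimension choice and example texts are assumptions:

```python
# Matryoshka truncation sketch: load with truncate_dim so encode() returns
# 256-dimensional embeddings (assumes a recent sentence-transformers release).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/modernbert-embed-base", truncate_dim=256)

q = model.encode(["search_query: What is TSNE?"])
d = model.encode(["search_document: t-SNE is a dimensionality reduction technique."])

print(q.shape)             # (1, 256) rather than (1, 768)
print(util.cos_sim(q, d))  # similarities differ slightly from the 768-dim ones
```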
### Transformers
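A hedged sketch of raw `transformers` usage with the common mean-pooling recipe for sentence-embedding models; the pooling helper, model ID, and example sentences are assumptions, not code from this card:

```python
# Sketch: encode with AutoModel, mean-pool over non-padding tokens, then
# L2-normalize. Model ID and sentences are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/modernbert-embed-base")
model = AutoModel.from_pretrained("nomic-ai/modernbert-embed-base")

sentences = ["search_query: What is TSNE?", "search_document: t-SNE reduces dimensionality."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    last_hidden_state = model(**batch).last_hidden_state

embeddings = F.normalize(mean_pooling(last_hidden_state, batch["attention_mask"]), p=2, dim=1)
print(embeddings @ embeddings.T)  # pairwise cosine similarities
```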
**Transformers usage with Matryoshka truncation.** In `transformers`, you can truncate embeddings to a smaller dimension by slicing the mean-pooled embeddings prior to normalization. Note the small differences compared to the full 768-dimensional similarities. A standalone sketch of that slicing step follows, under the same assumptions as the previous block:
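```python
# Matryoshka truncation with raw transformers: mean-pool, slice to 256 dims,
# then L2-normalize. Self-contained variant of the sketch above; model ID,
# sentence, and the 256-dim choice are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/modernbert-embed-base")
model = AutoModel.from_pretrained("nomic-ai/modernbert-embed-base")

batch = tokenizer(["search_document: t-SNE reduces dimensionality."],
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state

mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
embeddings = F.normalize(pooled[:, :256], p=2, dim=1)  # slice BEFORE normalizing
print(embeddings.shape)  # torch.Size([1, 256])
```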
### Transformers.js If you haven't already, you can install the Transformers.js JavaScript library from NPM; you can then compute embeddings through its feature-extraction pipeline. ## Training Click the Nomic Atlas map below to visualize a 5M sample of our contrastive pretraining data! [Nomic Atlas map of the contrastive pretraining data] We train our embedder using a multi-stage training pipeline. Starting from a long-context BERT model, the first unsupervised contrastive stage trains on a dataset generated from weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and summarizations from news articles. In the second finetuning stage, higher-quality labeled datasets such as search queries and answers from web searches are leveraged. Data curation and hard-example mining are crucial in this stage. For more details, see the Nomic Embed Technical Report and corresponding blog post. The training data is released in its entirety; for more details, see the repository. ## Join the Nomic Community - Nomic: - Discord: - Twitter: ## Citation If you find the model, dataset, or training code useful, please cite our work." +} \ No newline at end of file diff --git a/data/model_data_json/nomic-ai_nomic-embed-text-v1.5.json b/data/model_data_json/nomic-ai_nomic-embed-text-v1.5.json new file mode 100644 index 0000000000000000000000000000000000000000..6cffe85d8b6e63d9bc6589529673c0085628c88c --- /dev/null +++ b/data/model_data_json/nomic-ai_nomic-embed-text-v1.5.json @@ -0,0 +1,27 @@ +{ + "model_id": "nomic-ai/nomic-embed-text-v1.5", + "downloads": 570318, + "tags": [ + "sentence-transformers", + "onnx", + "safetensors", + "nomic_bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "transformers", + "transformers.js", + "custom_code", + "en", + "arxiv:2402.01613", + "arxiv:2205.13147", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - feature-extraction - sentence-similarity - mteb - transformers - transformers.js model-index: - name: epoch_0_model results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 75.20895522388058 - type: ap value: 38.57605549557802 - type: f1 value: 69.35586565857854 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.8144 - type: ap value: 88.65222882032363 - type: f1 value: 91.80426301643274 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 47.162000000000006 - type: f1 value: 46.59329642263158 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 24.253 - type: map_at_10 value: 38.962 - type: map_at_100 value: 40.081 - type: map_at_1000 value: 40.089000000000006 - type: map_at_3 value: 33.499 - type: map_at_5 value: 36.351 - type: mrr_at_1 value: 24.609 - type: mrr_at_10 value: 39.099000000000004 - type: mrr_at_100 value: 40.211000000000006
- type: mrr_at_1000 value: 40.219 - type: mrr_at_3 value: 33.677 - type: mrr_at_5 value: 36.469 - type: ndcg_at_1 value: 24.253 - type: ndcg_at_10 value: 48.010999999999996 - type: ndcg_at_100 value: 52.756 - type: ndcg_at_1000 value: 52.964999999999996 - type: ndcg_at_3 value: 36.564 - type: ndcg_at_5 value: 41.711999999999996 - type: precision_at_1 value: 24.253 - type: precision_at_10 value: 7.738 - type: precision_at_100 value: 0.98 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 15.149000000000001 - type: precision_at_5 value: 11.593 - type: recall_at_1 value: 24.253 - type: recall_at_10 value: 77.383 - type: recall_at_100 value: 98.009 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 45.448 - type: recall_at_5 value: 57.965999999999994 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 45.69069567851087 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 36.35185490976283 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.71274951450321 - type: mrr value: 76.06032625423207 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 86.73980520022269 - type: cos_sim_spearman value: 84.24649792685918 - type: euclidean_pearson value: 85.85197641158186 - type: euclidean_spearman value: 84.24649792685918 - type: manhattan_pearson value: 86.26809552711346 - type: manhattan_spearman value: 84.56397504030865 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 84.25324675324674 - type: f1 value: 84.17872280892557 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 38.770253446400886 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 32.94307095497281 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.164 - type: map_at_10 value: 42.641 - type: map_at_100 value: 43.947 - type: map_at_1000 value: 44.074999999999996 - type: map_at_3 value: 39.592 - type: map_at_5 value: 41.204 - type: mrr_at_1 value: 39.628 - type: mrr_at_10 value: 48.625 - type: mrr_at_100 value: 49.368 - type: mrr_at_1000 value: 49.413000000000004 - type: mrr_at_3 value: 46.400000000000006 - type: mrr_at_5 value: 47.68 - type: ndcg_at_1 value: 39.628 - type: ndcg_at_10 value: 48.564 - type: ndcg_at_100 value: 53.507000000000005 - type: ndcg_at_1000 value: 55.635999999999996 - type: ndcg_at_3 value: 44.471 - type: ndcg_at_5 value: 46.137 - type: precision_at_1 value: 39.628 - type: precision_at_10 value: 
8.856 - type: precision_at_100 value: 1.429 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 21.268 - type: precision_at_5 value: 14.649000000000001 - type: recall_at_1 value: 32.164 - type: recall_at_10 value: 59.609 - type: recall_at_100 value: 80.521 - type: recall_at_1000 value: 94.245 - type: recall_at_3 value: 46.521 - type: recall_at_5 value: 52.083999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.526 - type: map_at_10 value: 41.581 - type: map_at_100 value: 42.815999999999995 - type: map_at_1000 value: 42.936 - type: map_at_3 value: 38.605000000000004 - type: map_at_5 value: 40.351 - type: mrr_at_1 value: 39.489999999999995 - type: mrr_at_10 value: 47.829 - type: mrr_at_100 value: 48.512 - type: mrr_at_1000 value: 48.552 - type: mrr_at_3 value: 45.754 - type: mrr_at_5 value: 46.986 - type: ndcg_at_1 value: 39.489999999999995 - type: ndcg_at_10 value: 47.269 - type: ndcg_at_100 value: 51.564 - type: ndcg_at_1000 value: 53.53099999999999 - type: ndcg_at_3 value: 43.301 - type: ndcg_at_5 value: 45.239000000000004 - type: precision_at_1 value: 39.489999999999995 - type: precision_at_10 value: 8.93 - type: precision_at_100 value: 1.415 - type: precision_at_1000 value: 0.188 - type: precision_at_3 value: 20.892 - type: precision_at_5 value: 14.865999999999998 - type: recall_at_1 value: 31.526 - type: recall_at_10 value: 56.76 - type: recall_at_100 value: 75.029 - type: recall_at_1000 value: 87.491 - type: recall_at_3 value: 44.786 - type: recall_at_5 value: 50.254 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.987 - type: map_at_10 value: 52.827 - type: map_at_100 value: 53.751000000000005 - type: map_at_1000 value: 53.81 - type: map_at_3 value: 49.844 - type: map_at_5 value: 51.473 - type: mrr_at_1 value: 46.833999999999996 - type: mrr_at_10 value: 56.389 - type: mrr_at_100 value: 57.003 - type: mrr_at_1000 value: 57.034 - type: mrr_at_3 value: 54.17999999999999 - type: mrr_at_5 value: 55.486999999999995 - type: ndcg_at_1 value: 46.833999999999996 - type: ndcg_at_10 value: 58.372 - type: ndcg_at_100 value: 62.068 - type: ndcg_at_1000 value: 63.288 - type: ndcg_at_3 value: 53.400000000000006 - type: ndcg_at_5 value: 55.766000000000005 - type: precision_at_1 value: 46.833999999999996 - type: precision_at_10 value: 9.191 - type: precision_at_100 value: 1.192 - type: precision_at_1000 value: 0.134 - type: precision_at_3 value: 23.448 - type: precision_at_5 value: 15.862000000000002 - type: recall_at_1 value: 40.987 - type: recall_at_10 value: 71.146 - type: recall_at_100 value: 87.035 - type: recall_at_1000 value: 95.633 - type: recall_at_3 value: 58.025999999999996 - type: recall_at_5 value: 63.815999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.587 - type: map_at_10 value: 33.114 - type: map_at_100 value: 34.043 - type: map_at_1000 value: 34.123999999999995 - type: map_at_3 value: 30.45 - type: map_at_5 value: 31.813999999999997 - type: mrr_at_1 value: 26.554 - type: mrr_at_10 value: 35.148 - type: mrr_at_100 value: 35.926 - type: mrr_at_1000 value: 35.991 - type: mrr_at_3 value: 32.599000000000004 - type: mrr_at_5 value: 33.893 - type: ndcg_at_1 value: 26.554 - type: ndcg_at_10 
value: 38.132 - type: ndcg_at_100 value: 42.78 - type: ndcg_at_1000 value: 44.919 - type: ndcg_at_3 value: 32.833 - type: ndcg_at_5 value: 35.168 - type: precision_at_1 value: 26.554 - type: precision_at_10 value: 5.921 - type: precision_at_100 value: 0.8659999999999999 - type: precision_at_1000 value: 0.109 - type: precision_at_3 value: 13.861 - type: precision_at_5 value: 9.605 - type: recall_at_1 value: 24.587 - type: recall_at_10 value: 51.690000000000005 - type: recall_at_100 value: 73.428 - type: recall_at_1000 value: 89.551 - type: recall_at_3 value: 37.336999999999996 - type: recall_at_5 value: 43.047000000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.715 - type: map_at_10 value: 24.251 - type: map_at_100 value: 25.326999999999998 - type: map_at_1000 value: 25.455 - type: map_at_3 value: 21.912000000000003 - type: map_at_5 value: 23.257 - type: mrr_at_1 value: 20.274 - type: mrr_at_10 value: 28.552 - type: mrr_at_100 value: 29.42 - type: mrr_at_1000 value: 29.497 - type: mrr_at_3 value: 26.14 - type: mrr_at_5 value: 27.502 - type: ndcg_at_1 value: 20.274 - type: ndcg_at_10 value: 29.088 - type: ndcg_at_100 value: 34.293 - type: ndcg_at_1000 value: 37.271 - type: ndcg_at_3 value: 24.708 - type: ndcg_at_5 value: 26.809 - type: precision_at_1 value: 20.274 - type: precision_at_10 value: 5.361 - type: precision_at_100 value: 0.915 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 11.733 - type: precision_at_5 value: 8.556999999999999 - type: recall_at_1 value: 16.715 - type: recall_at_10 value: 39.587 - type: recall_at_100 value: 62.336000000000006 - type: recall_at_1000 value: 83.453 - type: recall_at_3 value: 27.839999999999996 - type: recall_at_5 value: 32.952999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.793000000000003 - type: map_at_10 value: 38.582 - type: map_at_100 value: 39.881 - type: map_at_1000 value: 39.987 - type: map_at_3 value: 35.851 - type: map_at_5 value: 37.289 - type: mrr_at_1 value: 34.455999999999996 - type: mrr_at_10 value: 43.909 - type: mrr_at_100 value: 44.74 - type: mrr_at_1000 value: 44.786 - type: mrr_at_3 value: 41.659 - type: mrr_at_5 value: 43.010999999999996 - type: ndcg_at_1 value: 34.455999999999996 - type: ndcg_at_10 value: 44.266 - type: ndcg_at_100 value: 49.639 - type: ndcg_at_1000 value: 51.644 - type: ndcg_at_3 value: 39.865 - type: ndcg_at_5 value: 41.887 - type: precision_at_1 value: 34.455999999999996 - type: precision_at_10 value: 7.843999999999999 - type: precision_at_100 value: 1.243 - type: precision_at_1000 value: 0.158 - type: precision_at_3 value: 18.831999999999997 - type: precision_at_5 value: 13.147 - type: recall_at_1 value: 28.793000000000003 - type: recall_at_10 value: 55.68300000000001 - type: recall_at_100 value: 77.99000000000001 - type: recall_at_1000 value: 91.183 - type: recall_at_3 value: 43.293 - type: recall_at_5 value: 48.618 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.907000000000004 - type: map_at_10 value: 35.519 - type: map_at_100 value: 36.806 - type: map_at_1000 value: 36.912 - type: map_at_3 value: 32.748 - type: map_at_5 value: 34.232 - type: mrr_at_1 value: 31.621 - type: mrr_at_10 
value: 40.687 - type: mrr_at_100 value: 41.583 - type: mrr_at_1000 value: 41.638999999999996 - type: mrr_at_3 value: 38.527 - type: mrr_at_5 value: 39.612 - type: ndcg_at_1 value: 31.621 - type: ndcg_at_10 value: 41.003 - type: ndcg_at_100 value: 46.617999999999995 - type: ndcg_at_1000 value: 48.82 - type: ndcg_at_3 value: 36.542 - type: ndcg_at_5 value: 38.368 - type: precision_at_1 value: 31.621 - type: precision_at_10 value: 7.396999999999999 - type: precision_at_100 value: 1.191 - type: precision_at_1000 value: 0.153 - type: precision_at_3 value: 17.39 - type: precision_at_5 value: 12.1 - type: recall_at_1 value: 25.907000000000004 - type: recall_at_10 value: 52.115 - type: recall_at_100 value: 76.238 - type: recall_at_1000 value: 91.218 - type: recall_at_3 value: 39.417 - type: recall_at_5 value: 44.435 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.732166666666668 - type: map_at_10 value: 34.51616666666667 - type: map_at_100 value: 35.67241666666666 - type: map_at_1000 value: 35.78675 - type: map_at_3 value: 31.953416666666662 - type: map_at_5 value: 33.333 - type: mrr_at_1 value: 30.300166666666673 - type: mrr_at_10 value: 38.6255 - type: mrr_at_100 value: 39.46183333333334 - type: mrr_at_1000 value: 39.519999999999996 - type: mrr_at_3 value: 36.41299999999999 - type: mrr_at_5 value: 37.6365 - type: ndcg_at_1 value: 30.300166666666673 - type: ndcg_at_10 value: 39.61466666666667 - type: ndcg_at_100 value: 44.60808333333334 - type: ndcg_at_1000 value: 46.91708333333334 - type: ndcg_at_3 value: 35.26558333333333 - type: ndcg_at_5 value: 37.220000000000006 - type: precision_at_1 value: 30.300166666666673 - type: precision_at_10 value: 6.837416666666667 - type: precision_at_100 value: 1.10425 - type: precision_at_1000 value: 0.14875 - type: precision_at_3 value: 16.13716666666667 - type: precision_at_5 value: 11.2815 - type: recall_at_1 value: 25.732166666666668 - type: recall_at_10 value: 50.578916666666665 - type: recall_at_100 value: 72.42183333333334 - type: recall_at_1000 value: 88.48766666666667 - type: recall_at_3 value: 38.41325 - type: recall_at_5 value: 43.515750000000004 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.951 - type: map_at_10 value: 30.974 - type: map_at_100 value: 31.804 - type: map_at_1000 value: 31.900000000000002 - type: map_at_3 value: 28.762 - type: map_at_5 value: 29.94 - type: mrr_at_1 value: 26.534000000000002 - type: mrr_at_10 value: 33.553 - type: mrr_at_100 value: 34.297 - type: mrr_at_1000 value: 34.36 - type: mrr_at_3 value: 31.391000000000002 - type: mrr_at_5 value: 32.525999999999996 - type: ndcg_at_1 value: 26.534000000000002 - type: ndcg_at_10 value: 35.112 - type: ndcg_at_100 value: 39.28 - type: ndcg_at_1000 value: 41.723 - type: ndcg_at_3 value: 30.902 - type: ndcg_at_5 value: 32.759 - type: precision_at_1 value: 26.534000000000002 - type: precision_at_10 value: 5.445 - type: precision_at_100 value: 0.819 - type: precision_at_1000 value: 0.11 - type: precision_at_3 value: 12.986 - type: precision_at_5 value: 9.049 - type: recall_at_1 value: 23.951 - type: recall_at_10 value: 45.24 - type: recall_at_100 value: 64.12299999999999 - type: recall_at_1000 value: 82.28999999999999 - type: recall_at_3 value: 33.806000000000004 - type: recall_at_5 value: 38.277 - task: type: Retrieval dataset: type: 
BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.829 - type: map_at_10 value: 23.684 - type: map_at_100 value: 24.683 - type: map_at_1000 value: 24.81 - type: map_at_3 value: 21.554000000000002 - type: map_at_5 value: 22.768 - type: mrr_at_1 value: 20.096 - type: mrr_at_10 value: 27.230999999999998 - type: mrr_at_100 value: 28.083999999999996 - type: mrr_at_1000 value: 28.166000000000004 - type: mrr_at_3 value: 25.212 - type: mrr_at_5 value: 26.32 - type: ndcg_at_1 value: 20.096 - type: ndcg_at_10 value: 27.989000000000004 - type: ndcg_at_100 value: 32.847 - type: ndcg_at_1000 value: 35.896 - type: ndcg_at_3 value: 24.116 - type: ndcg_at_5 value: 25.964 - type: precision_at_1 value: 20.096 - type: precision_at_10 value: 5 - type: precision_at_100 value: 0.8750000000000001 - type: precision_at_1000 value: 0.131 - type: precision_at_3 value: 11.207 - type: precision_at_5 value: 8.08 - type: recall_at_1 value: 16.829 - type: recall_at_10 value: 37.407000000000004 - type: recall_at_100 value: 59.101000000000006 - type: recall_at_1000 value: 81.024 - type: recall_at_3 value: 26.739 - type: recall_at_5 value: 31.524 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.138 - type: map_at_10 value: 32.275999999999996 - type: map_at_100 value: 33.416000000000004 - type: map_at_1000 value: 33.527 - type: map_at_3 value: 29.854000000000003 - type: map_at_5 value: 31.096 - type: mrr_at_1 value: 28.450999999999997 - type: mrr_at_10 value: 36.214 - type: mrr_at_100 value: 37.134 - type: mrr_at_1000 value: 37.198 - type: mrr_at_3 value: 34.001999999999995 - type: mrr_at_5 value: 35.187000000000005 - type: ndcg_at_1 value: 28.450999999999997 - type: ndcg_at_10 value: 37.166 - type: ndcg_at_100 value: 42.454 - type: ndcg_at_1000 value: 44.976 - type: ndcg_at_3 value: 32.796 - type: ndcg_at_5 value: 34.631 - type: precision_at_1 value: 28.450999999999997 - type: precision_at_10 value: 6.241 - type: precision_at_100 value: 0.9950000000000001 - type: precision_at_1000 value: 0.133 - type: precision_at_3 value: 14.801 - type: precision_at_5 value: 10.280000000000001 - type: recall_at_1 value: 24.138 - type: recall_at_10 value: 48.111 - type: recall_at_100 value: 71.245 - type: recall_at_1000 value: 88.986 - type: recall_at_3 value: 36.119 - type: recall_at_5 value: 40.846 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.244 - type: map_at_10 value: 31.227 - type: map_at_100 value: 33.007 - type: map_at_1000 value: 33.223 - type: map_at_3 value: 28.924 - type: map_at_5 value: 30.017 - type: mrr_at_1 value: 27.668 - type: mrr_at_10 value: 35.524 - type: mrr_at_100 value: 36.699 - type: mrr_at_1000 value: 36.759 - type: mrr_at_3 value: 33.366 - type: mrr_at_5 value: 34.552 - type: ndcg_at_1 value: 27.668 - type: ndcg_at_10 value: 36.381 - type: ndcg_at_100 value: 43.062 - type: ndcg_at_1000 value: 45.656 - type: ndcg_at_3 value: 32.501999999999995 - type: ndcg_at_5 value: 34.105999999999995 - type: precision_at_1 value: 27.668 - type: precision_at_10 value: 6.798 - type: precision_at_100 value: 1.492 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 15.152 - type: precision_at_5 value: 10.791 - type: recall_at_1 value: 23.244 - type: recall_at_10 value: 45.979 - type: 
recall_at_100 value: 74.822 - type: recall_at_1000 value: 91.078 - type: recall_at_3 value: 34.925 - type: recall_at_5 value: 39.126 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.945 - type: map_at_10 value: 27.517999999999997 - type: map_at_100 value: 28.588 - type: map_at_1000 value: 28.682000000000002 - type: map_at_3 value: 25.345000000000002 - type: map_at_5 value: 26.555 - type: mrr_at_1 value: 21.996 - type: mrr_at_10 value: 29.845 - type: mrr_at_100 value: 30.775999999999996 - type: mrr_at_1000 value: 30.845 - type: mrr_at_3 value: 27.726 - type: mrr_at_5 value: 28.882 - type: ndcg_at_1 value: 21.996 - type: ndcg_at_10 value: 32.034 - type: ndcg_at_100 value: 37.185 - type: ndcg_at_1000 value: 39.645 - type: ndcg_at_3 value: 27.750999999999998 - type: ndcg_at_5 value: 29.805999999999997 - type: precision_at_1 value: 21.996 - type: precision_at_10 value: 5.065 - type: precision_at_100 value: 0.819 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 12.076 - type: precision_at_5 value: 8.392 - type: recall_at_1 value: 19.945 - type: recall_at_10 value: 43.62 - type: recall_at_100 value: 67.194 - type: recall_at_1000 value: 85.7 - type: recall_at_3 value: 32.15 - type: recall_at_5 value: 37.208999999999996 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 18.279 - type: map_at_10 value: 31.052999999999997 - type: map_at_100 value: 33.125 - type: map_at_1000 value: 33.306000000000004 - type: map_at_3 value: 26.208 - type: map_at_5 value: 28.857 - type: mrr_at_1 value: 42.671 - type: mrr_at_10 value: 54.557 - type: mrr_at_100 value: 55.142 - type: mrr_at_1000 value: 55.169000000000004 - type: mrr_at_3 value: 51.488 - type: mrr_at_5 value: 53.439 - type: ndcg_at_1 value: 42.671 - type: ndcg_at_10 value: 41.276 - type: ndcg_at_100 value: 48.376000000000005 - type: ndcg_at_1000 value: 51.318 - type: ndcg_at_3 value: 35.068 - type: ndcg_at_5 value: 37.242 - type: precision_at_1 value: 42.671 - type: precision_at_10 value: 12.638 - type: precision_at_100 value: 2.045 - type: precision_at_1000 value: 0.26 - type: precision_at_3 value: 26.08 - type: precision_at_5 value: 19.805 - type: recall_at_1 value: 18.279 - type: recall_at_10 value: 46.946 - type: recall_at_100 value: 70.97200000000001 - type: recall_at_1000 value: 87.107 - type: recall_at_3 value: 31.147999999999996 - type: recall_at_5 value: 38.099 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.573 - type: map_at_10 value: 19.747 - type: map_at_100 value: 28.205000000000002 - type: map_at_1000 value: 29.831000000000003 - type: map_at_3 value: 14.109 - type: map_at_5 value: 16.448999999999998 - type: mrr_at_1 value: 71 - type: mrr_at_10 value: 77.68599999999999 - type: mrr_at_100 value: 77.995 - type: mrr_at_1000 value: 78.00200000000001 - type: mrr_at_3 value: 76.292 - type: mrr_at_5 value: 77.029 - type: ndcg_at_1 value: 59.12500000000001 - type: ndcg_at_10 value: 43.9 - type: ndcg_at_100 value: 47.863 - type: ndcg_at_1000 value: 54.848 - type: ndcg_at_3 value: 49.803999999999995 - type: ndcg_at_5 value: 46.317 - type: precision_at_1 value: 71 - type: precision_at_10 value: 34.4 - type: precision_at_100 value: 11.063 - type: precision_at_1000 value: 1.989 - type: 
precision_at_3 value: 52.333 - type: precision_at_5 value: 43.7 - type: recall_at_1 value: 8.573 - type: recall_at_10 value: 25.615 - type: recall_at_100 value: 53.385000000000005 - type: recall_at_1000 value: 75.46000000000001 - type: recall_at_3 value: 15.429 - type: recall_at_5 value: 19.357 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.989999999999995 - type: f1 value: 42.776314451497555 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 74.13499999999999 - type: map_at_10 value: 82.825 - type: map_at_100 value: 83.096 - type: map_at_1000 value: 83.111 - type: map_at_3 value: 81.748 - type: map_at_5 value: 82.446 - type: mrr_at_1 value: 79.553 - type: mrr_at_10 value: 86.654 - type: mrr_at_100 value: 86.774 - type: mrr_at_1000 value: 86.778 - type: mrr_at_3 value: 85.981 - type: mrr_at_5 value: 86.462 - type: ndcg_at_1 value: 79.553 - type: ndcg_at_10 value: 86.345 - type: ndcg_at_100 value: 87.32 - type: ndcg_at_1000 value: 87.58200000000001 - type: ndcg_at_3 value: 84.719 - type: ndcg_at_5 value: 85.677 - type: precision_at_1 value: 79.553 - type: precision_at_10 value: 10.402000000000001 - type: precision_at_100 value: 1.1119999999999999 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 32.413 - type: precision_at_5 value: 20.138 - type: recall_at_1 value: 74.13499999999999 - type: recall_at_10 value: 93.215 - type: recall_at_100 value: 97.083 - type: recall_at_1000 value: 98.732 - type: recall_at_3 value: 88.79 - type: recall_at_5 value: 91.259 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 18.298000000000002 - type: map_at_10 value: 29.901 - type: map_at_100 value: 31.528 - type: map_at_1000 value: 31.713 - type: map_at_3 value: 25.740000000000002 - type: map_at_5 value: 28.227999999999998 - type: mrr_at_1 value: 36.728 - type: mrr_at_10 value: 45.401 - type: mrr_at_100 value: 46.27 - type: mrr_at_1000 value: 46.315 - type: mrr_at_3 value: 42.978 - type: mrr_at_5 value: 44.29 - type: ndcg_at_1 value: 36.728 - type: ndcg_at_10 value: 37.456 - type: ndcg_at_100 value: 43.832 - type: ndcg_at_1000 value: 47 - type: ndcg_at_3 value: 33.694 - type: ndcg_at_5 value: 35.085 - type: precision_at_1 value: 36.728 - type: precision_at_10 value: 10.386 - type: precision_at_100 value: 1.701 - type: precision_at_1000 value: 0.22599999999999998 - type: precision_at_3 value: 22.479 - type: precision_at_5 value: 16.605 - type: recall_at_1 value: 18.298000000000002 - type: recall_at_10 value: 44.369 - type: recall_at_100 value: 68.098 - type: recall_at_1000 value: 87.21900000000001 - type: recall_at_3 value: 30.215999999999998 - type: recall_at_5 value: 36.861 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.568 - type: map_at_10 value: 65.061 - type: map_at_100 value: 65.896 - type: map_at_1000 value: 65.95100000000001 - type: map_at_3 value: 61.831 - type: map_at_5 value: 63.849000000000004 - type: mrr_at_1 value: 79.136 - type: mrr_at_10 value: 84.58200000000001 - type: mrr_at_100 value: 84.765 - type: mrr_at_1000 value: 84.772 - type: mrr_at_3 value: 83.684 - type: mrr_at_5 value: 84.223 - type: ndcg_at_1 value: 79.136 - type: 
ndcg_at_10 value: 72.622 - type: ndcg_at_100 value: 75.539 - type: ndcg_at_1000 value: 76.613 - type: ndcg_at_3 value: 68.065 - type: ndcg_at_5 value: 70.58 - type: precision_at_1 value: 79.136 - type: precision_at_10 value: 15.215 - type: precision_at_100 value: 1.7500000000000002 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 44.011 - type: precision_at_5 value: 28.388999999999996 - type: recall_at_1 value: 39.568 - type: recall_at_10 value: 76.077 - type: recall_at_100 value: 87.481 - type: recall_at_1000 value: 94.56400000000001 - type: recall_at_3 value: 66.01599999999999 - type: recall_at_5 value: 70.97200000000001 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 85.312 - type: ap value: 80.36296867333715 - type: f1 value: 85.26613311552218 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.363999999999997 - type: map_at_10 value: 35.711999999999996 - type: map_at_100 value: 36.876999999999995 - type: map_at_1000 value: 36.923 - type: map_at_3 value: 32.034 - type: map_at_5 value: 34.159 - type: mrr_at_1 value: 24.04 - type: mrr_at_10 value: 36.345 - type: mrr_at_100 value: 37.441 - type: mrr_at_1000 value: 37.480000000000004 - type: mrr_at_3 value: 32.713 - type: mrr_at_5 value: 34.824 - type: ndcg_at_1 value: 24.026 - type: ndcg_at_10 value: 42.531 - type: ndcg_at_100 value: 48.081 - type: ndcg_at_1000 value: 49.213 - type: ndcg_at_3 value: 35.044 - type: ndcg_at_5 value: 38.834 - type: precision_at_1 value: 24.026 - type: precision_at_10 value: 6.622999999999999 - type: precision_at_100 value: 0.941 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.909 - type: precision_at_5 value: 10.871 - type: recall_at_1 value: 23.363999999999997 - type: recall_at_10 value: 63.426 - type: recall_at_100 value: 88.96300000000001 - type: recall_at_1000 value: 97.637 - type: recall_at_3 value: 43.095 - type: recall_at_5 value: 52.178000000000004 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.0095759233926 - type: f1 value: 92.78387794667408 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 75.0296397628819 - type: f1 value: 58.45699589820874 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.45662407531944 - type: f1 value: 71.42364781421813 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.07800941492937 - type: f1 value: 77.22799045640845 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 34.531234379250606 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB 
MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 30.941490381193802 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.3115090856725 - type: mrr value: 31.290667638675757 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.465 - type: map_at_10 value: 13.03 - type: map_at_100 value: 16.057 - type: map_at_1000 value: 17.49 - type: map_at_3 value: 9.553 - type: map_at_5 value: 11.204 - type: mrr_at_1 value: 43.653 - type: mrr_at_10 value: 53.269 - type: mrr_at_100 value: 53.72 - type: mrr_at_1000 value: 53.761 - type: mrr_at_3 value: 50.929 - type: mrr_at_5 value: 52.461 - type: ndcg_at_1 value: 42.26 - type: ndcg_at_10 value: 34.673 - type: ndcg_at_100 value: 30.759999999999998 - type: ndcg_at_1000 value: 39.728 - type: ndcg_at_3 value: 40.349000000000004 - type: ndcg_at_5 value: 37.915 - type: precision_at_1 value: 43.653 - type: precision_at_10 value: 25.789 - type: precision_at_100 value: 7.754999999999999 - type: precision_at_1000 value: 2.07 - type: precision_at_3 value: 38.596000000000004 - type: precision_at_5 value: 33.251 - type: recall_at_1 value: 5.465 - type: recall_at_10 value: 17.148 - type: recall_at_100 value: 29.768 - type: recall_at_1000 value: 62.239 - type: recall_at_3 value: 10.577 - type: recall_at_5 value: 13.315 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 37.008 - type: map_at_10 value: 52.467 - type: map_at_100 value: 53.342999999999996 - type: map_at_1000 value: 53.366 - type: map_at_3 value: 48.412 - type: map_at_5 value: 50.875 - type: mrr_at_1 value: 41.541 - type: mrr_at_10 value: 54.967 - type: mrr_at_100 value: 55.611 - type: mrr_at_1000 value: 55.627 - type: mrr_at_3 value: 51.824999999999996 - type: mrr_at_5 value: 53.763000000000005 - type: ndcg_at_1 value: 41.541 - type: ndcg_at_10 value: 59.724999999999994 - type: ndcg_at_100 value: 63.38700000000001 - type: ndcg_at_1000 value: 63.883 - type: ndcg_at_3 value: 52.331 - type: ndcg_at_5 value: 56.327000000000005 - type: precision_at_1 value: 41.541 - type: precision_at_10 value: 9.447 - type: precision_at_100 value: 1.1520000000000001 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 23.262 - type: precision_at_5 value: 16.314999999999998 - type: recall_at_1 value: 37.008 - type: recall_at_10 value: 79.145 - type: recall_at_100 value: 94.986 - type: recall_at_1000 value: 98.607 - type: recall_at_3 value: 60.277 - type: recall_at_5 value: 69.407 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.402 - type: map_at_10 value: 84.181 - type: map_at_100 value: 84.796 - type: map_at_1000 value: 84.81400000000001 - type: map_at_3 value: 81.209 - type: map_at_5 value: 83.085 - type: mrr_at_1 value: 81.02000000000001 - type: mrr_at_10 value: 87.263 - type: mrr_at_100 value: 87.36 - type: mrr_at_1000 value: 87.36 - type: mrr_at_3 value: 86.235 - type: mrr_at_5 value: 86.945 - type: ndcg_at_1 value: 81.01 - type: ndcg_at_10 value: 87.99900000000001 - type: ndcg_at_100 value: 89.217 - type: ndcg_at_1000 value: 89.33 - type: ndcg_at_3 value: 85.053 - type: ndcg_at_5 value: 86.703 - type: precision_at_1 
value: 81.01 - type: precision_at_10 value: 13.336 - type: precision_at_100 value: 1.52 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 37.14 - type: precision_at_5 value: 24.44 - type: recall_at_1 value: 70.402 - type: recall_at_10 value: 95.214 - type: recall_at_100 value: 99.438 - type: recall_at_1000 value: 99.928 - type: recall_at_3 value: 86.75699999999999 - type: recall_at_5 value: 91.44099999999999 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.51721502758904 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 61.054808572333016 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.578 - type: map_at_10 value: 11.036999999999999 - type: map_at_100 value: 12.879999999999999 - type: map_at_1000 value: 13.150999999999998 - type: map_at_3 value: 8.133 - type: map_at_5 value: 9.559 - type: mrr_at_1 value: 22.6 - type: mrr_at_10 value: 32.68 - type: mrr_at_100 value: 33.789 - type: mrr_at_1000 value: 33.854 - type: mrr_at_3 value: 29.7 - type: mrr_at_5 value: 31.480000000000004 - type: ndcg_at_1 value: 22.6 - type: ndcg_at_10 value: 18.616 - type: ndcg_at_100 value: 25.883 - type: ndcg_at_1000 value: 30.944 - type: ndcg_at_3 value: 18.136 - type: ndcg_at_5 value: 15.625 - type: precision_at_1 value: 22.6 - type: precision_at_10 value: 9.48 - type: precision_at_100 value: 1.991 - type: precision_at_1000 value: 0.321 - type: precision_at_3 value: 16.8 - type: precision_at_5 value: 13.54 - type: recall_at_1 value: 4.578 - type: recall_at_10 value: 19.213 - type: recall_at_100 value: 40.397 - type: recall_at_1000 value: 65.2 - type: recall_at_3 value: 10.208 - type: recall_at_5 value: 13.718 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 83.44288351714071 - type: cos_sim_spearman value: 79.37995604564952 - type: euclidean_pearson value: 81.1078874670718 - type: euclidean_spearman value: 79.37995905980499 - type: manhattan_pearson value: 81.03697527288986 - type: manhattan_spearman value: 79.33490235296236 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 84.95557650436523 - type: cos_sim_spearman value: 78.5190672399868 - type: euclidean_pearson value: 81.58064025904707 - type: euclidean_spearman value: 78.5190672399868 - type: manhattan_pearson value: 81.52857930619889 - type: manhattan_spearman value: 78.50421361308034 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.79128416228737 - type: cos_sim_spearman value: 86.05402451477147 - type: euclidean_pearson value: 85.46280267054289 - type: euclidean_spearman value: 86.05402451477147 - type: manhattan_pearson value: 85.46278563858236 - type: manhattan_spearman value: 86.08079590861004 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 
6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.20623089568763 - type: cos_sim_spearman value: 81.53786907061009 - type: euclidean_pearson value: 82.82272250091494 - type: euclidean_spearman value: 81.53786907061009 - type: manhattan_pearson value: 82.78850494027013 - type: manhattan_spearman value: 81.5135618083407 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.46366618397936 - type: cos_sim_spearman value: 86.96566013336908 - type: euclidean_pearson value: 86.62651697548931 - type: euclidean_spearman value: 86.96565526364454 - type: manhattan_pearson value: 86.58812160258009 - type: manhattan_spearman value: 86.9336484321288 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 82.51858358641559 - type: cos_sim_spearman value: 84.7652527954999 - type: euclidean_pearson value: 84.23914783766861 - type: euclidean_spearman value: 84.7652527954999 - type: manhattan_pearson value: 84.22749648503171 - type: manhattan_spearman value: 84.74527996746386 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.28026563313065 - type: cos_sim_spearman value: 87.46928143824915 - type: euclidean_pearson value: 88.30558762000372 - type: euclidean_spearman value: 87.46928143824915 - type: manhattan_pearson value: 88.10513330809331 - type: manhattan_spearman value: 87.21069787834173 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 62.376497134587375 - type: cos_sim_spearman value: 65.0159550112516 - type: euclidean_pearson value: 65.64572120879598 - type: euclidean_spearman value: 65.0159550112516 - type: manhattan_pearson value: 65.88143604989976 - type: manhattan_spearman value: 65.17547297222434 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.22876368947644 - type: cos_sim_spearman value: 85.46935577445318 - type: euclidean_pearson value: 85.32830231392005 - type: euclidean_spearman value: 85.46935577445318 - type: manhattan_pearson value: 85.30353211758495 - type: manhattan_spearman value: 85.42821085956945 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 80.60986667767133 - type: mrr value: 94.29432314236236 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 54.528 - type: map_at_10 value: 65.187 - type: map_at_100 value: 65.62599999999999 - type: map_at_1000 value: 65.657 - type: map_at_3 value: 62.352 - type: map_at_5 value: 64.025 - type: mrr_at_1 value: 57.333 - type: mrr_at_10 value: 66.577 - type: mrr_at_100 value: 66.88 - type: mrr_at_1000 value: 66.908 - type: mrr_at_3 value: 64.556 - type: mrr_at_5 value: 65.739 - type: ndcg_at_1 value: 57.333 - type: ndcg_at_10 value: 70.275 - type: ndcg_at_100 value: 72.136 - type: 
ndcg_at_1000 value: 72.963 - type: ndcg_at_3 value: 65.414 - type: ndcg_at_5 value: 67.831 - type: precision_at_1 value: 57.333 - type: precision_at_10 value: 9.5 - type: precision_at_100 value: 1.057 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 25.778000000000002 - type: precision_at_5 value: 17.2 - type: recall_at_1 value: 54.528 - type: recall_at_10 value: 84.356 - type: recall_at_100 value: 92.833 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 71.283 - type: recall_at_5 value: 77.14999999999999 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.74158415841585 - type: cos_sim_ap value: 92.90048959850317 - type: cos_sim_f1 value: 86.35650810245687 - type: cos_sim_precision value: 90.4709748083242 - type: cos_sim_recall value: 82.6 - type: dot_accuracy value: 99.74158415841585 - type: dot_ap value: 92.90048959850317 - type: dot_f1 value: 86.35650810245687 - type: dot_precision value: 90.4709748083242 - type: dot_recall value: 82.6 - type: euclidean_accuracy value: 99.74158415841585 - type: euclidean_ap value: 92.90048959850317 - type: euclidean_f1 value: 86.35650810245687 - type: euclidean_precision value: 90.4709748083242 - type: euclidean_recall value: 82.6 - type: manhattan_accuracy value: 99.74158415841585 - type: manhattan_ap value: 92.87344692947894 - type: manhattan_f1 value: 86.38497652582159 - type: manhattan_precision value: 90.29443838604145 - type: manhattan_recall value: 82.8 - type: max_accuracy value: 99.74158415841585 - type: max_ap value: 92.90048959850317 - type: max_f1 value: 86.38497652582159 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 63.191648770424216 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 34.02944668730218 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 50.466386167525265 - type: mrr value: 51.19071492233257 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.198022505886435 - type: cos_sim_spearman value: 30.40170257939193 - type: dot_pearson value: 30.198015316402614 - type: dot_spearman value: 30.40170257939193 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.242 - type: map_at_10 value: 2.17 - type: map_at_100 value: 12.221 - type: map_at_1000 value: 28.63 - type: map_at_3 value: 0.728 - type: map_at_5 value: 1.185 - type: mrr_at_1 value: 94 - type: mrr_at_10 value: 97 - type: mrr_at_100 value: 97 - type: mrr_at_1000 value: 97 - type: mrr_at_3 value: 97 - type: mrr_at_5 value: 97 - type: ndcg_at_1 value: 89 - type: ndcg_at_10 value: 82.30499999999999 - type: ndcg_at_100 value: 61.839999999999996 - type: ndcg_at_1000 value: 53.381 - type: 
ndcg_at_3 value: 88.877 - type: ndcg_at_5 value: 86.05199999999999 - type: precision_at_1 value: 94 - type: precision_at_10 value: 87 - type: precision_at_100 value: 63.38 - type: precision_at_1000 value: 23.498 - type: precision_at_3 value: 94 - type: precision_at_5 value: 92 - type: recall_at_1 value: 0.242 - type: recall_at_10 value: 2.302 - type: recall_at_100 value: 14.979000000000001 - type: recall_at_1000 value: 49.638 - type: recall_at_3 value: 0.753 - type: recall_at_5 value: 1.226 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 3.006 - type: map_at_10 value: 11.805 - type: map_at_100 value: 18.146 - type: map_at_1000 value: 19.788 - type: map_at_3 value: 5.914 - type: map_at_5 value: 8.801 - type: mrr_at_1 value: 40.816 - type: mrr_at_10 value: 56.36600000000001 - type: mrr_at_100 value: 56.721999999999994 - type: mrr_at_1000 value: 56.721999999999994 - type: mrr_at_3 value: 52.041000000000004 - type: mrr_at_5 value: 54.796 - type: ndcg_at_1 value: 37.755 - type: ndcg_at_10 value: 29.863 - type: ndcg_at_100 value: 39.571 - type: ndcg_at_1000 value: 51.385999999999996 - type: ndcg_at_3 value: 32.578 - type: ndcg_at_5 value: 32.351 - type: precision_at_1 value: 40.816 - type: precision_at_10 value: 26.531 - type: precision_at_100 value: 7.796 - type: precision_at_1000 value: 1.555 - type: precision_at_3 value: 32.653 - type: precision_at_5 value: 33.061 - type: recall_at_1 value: 3.006 - type: recall_at_10 value: 18.738 - type: recall_at_100 value: 48.058 - type: recall_at_1000 value: 83.41300000000001 - type: recall_at_3 value: 7.166 - type: recall_at_5 value: 12.102 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.4178 - type: ap value: 14.648781342150446 - type: f1 value: 55.07299194946378 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 60.919637804187886 - type: f1 value: 61.24122013967399 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 49.207896583685695 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.23114978840078 - type: cos_sim_ap value: 74.26624727825818 - type: cos_sim_f1 value: 68.72377190817083 - type: cos_sim_precision value: 64.56400742115028 - type: cos_sim_recall value: 73.45646437994723 - type: dot_accuracy value: 86.23114978840078 - type: dot_ap value: 74.26624032659652 - type: dot_f1 value: 68.72377190817083 - type: dot_precision value: 64.56400742115028 - type: dot_recall value: 73.45646437994723 - type: euclidean_accuracy value: 86.23114978840078 - type: euclidean_ap value: 74.26624714480556 - type: euclidean_f1 value: 68.72377190817083 - type: euclidean_precision value: 64.56400742115028 - type: euclidean_recall value: 73.45646437994723 - type: manhattan_accuracy value: 86.16558383501221 - type: manhattan_ap value: 
74.2091943976357 - type: manhattan_f1 value: 68.64221520524654 - type: manhattan_precision value: 63.59135913591359 - type: manhattan_recall value: 74.5646437994723 - type: max_accuracy value: 86.23114978840078 - type: max_ap value: 74.26624727825818 - type: max_f1 value: 68.72377190817083 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.3681841114604 - type: cos_sim_ap value: 86.65166387498546 - type: cos_sim_f1 value: 79.02581944698774 - type: cos_sim_precision value: 75.35796605434099 - type: cos_sim_recall value: 83.06898675700647 - type: dot_accuracy value: 89.3681841114604 - type: dot_ap value: 86.65166019802056 - type: dot_f1 value: 79.02581944698774 - type: dot_precision value: 75.35796605434099 - type: dot_recall value: 83.06898675700647 - type: euclidean_accuracy value: 89.3681841114604 - type: euclidean_ap value: 86.65166462876266 - type: euclidean_f1 value: 79.02581944698774 - type: euclidean_precision value: 75.35796605434099 - type: euclidean_recall value: 83.06898675700647 - type: manhattan_accuracy value: 89.36624364497226 - type: manhattan_ap value: 86.65076471274106 - type: manhattan_f1 value: 79.07408783532733 - type: manhattan_precision value: 76.41102972856527 - type: manhattan_recall value: 81.92947336002464 - type: max_accuracy value: 89.3681841114604 - type: max_ap value: 86.65166462876266 - type: max_f1 value: 79.07408783532733 license: apache-2.0 language: - en --- # nomic-embed-text-v1.5: Resizable Production Embeddings with Matryoshka Representation Learning Blog | Technical Report | AWS SageMaker | Atlas Embedding and Unstructured Data Analytics Platform **Exciting Update!**: nomic-embed-text-v1.5 is now multimodal! nomic-embed-vision-v1.5 is aligned to the embedding space of nomic-embed-text-v1.5, meaning any text embedding is multimodal! ## Usage **Important**: the text prompt *must* include a *task instruction prefix*, instructing the model which task is being performed. For example, if you are implementing a RAG application, you embed your documents as `search_document: <text>` and embed your user queries as `search_query: <text>`. ## Task instruction prefixes ### `search_document` #### Purpose: embed texts as documents from a dataset This prefix is used for embedding texts as documents, for example as documents for a RAG index. ### `search_query` #### Purpose: embed texts as questions to answer This prefix is used for embedding texts as questions that documents from a dataset could resolve, for example as queries to be answered by a RAG application. ### `clustering` #### Purpose: embed texts to group them into clusters This prefix is used for embedding texts in order to group them into clusters, discover common topics, or remove semantic duplicates. ### `classification` #### Purpose: embed texts to classify them This prefix is used for embedding texts into vectors that will be used as features for a classification model. ### Sentence Transformers ### Transformers The model natively supports scaling of the sequence length past 2048 tokens. ### Transformers.js ## Nomic API The easiest way to use Nomic Embed is through the Nomic Embedding API. Generating embeddings with the Python client takes only a few lines; for more information, see the API reference. ## Infinity Usage with Infinity is also supported. ## Adjusting Dimensionality nomic-embed-text-v1.5 is an improvement upon Nomic Embed that utilizes Matryoshka Representation Learning, which gives developers the flexibility to trade off embedding size for a negligible reduction in performance.
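To make the prefix convention and the Matryoshka resizing concrete, here is a minimal sketch using the sentence-transformers library. The model ID and the `trust_remote_code=True` flag follow this card; the sample texts, the choice of 256 dimensions, and the truncate-then-renormalize post-processing are illustrative assumptions rather than the card's exact snippet.

```python
# Minimal sketch: task instruction prefixes plus Matryoshka truncation.
# Assumes `pip install sentence-transformers`; sample texts and dim=256 are illustrative.
from sentence_transformers import SentenceTransformer
import torch.nn.functional as F

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# Every input carries a task instruction prefix (see the list above).
docs = ["search_document: Matryoshka embeddings can be truncated to smaller sizes."]
query = ["search_query: How do I shrink an embedding?"]

doc_emb = model.encode(docs, convert_to_tensor=True)     # shape: (1, 768)
query_emb = model.encode(query, convert_to_tensor=True)  # shape: (1, 768)

# One common Matryoshka recipe: keep only the leading dimensions, then
# re-normalize before computing cosine similarity.
dim = 256
doc_small = F.normalize(doc_emb[:, :dim], p=2, dim=1)
query_small = F.normalize(query_emb[:, :dim], p=2, dim=1)

print((query_small @ doc_small.T).item())  # cosine similarity at 256 dims
```

The table that follows reports the MTEB score the card gives at each truncated dimension.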
| Name | SeqLen | Dimension | MTEB | | :-------------------------------:| :----- | :-------- | :------: | | nomic-embed-text-v1 | 8192 | 768 | **62.39** | | nomic-embed-text-v1.5 | 8192 | 768 | 62.28 | | nomic-embed-text-v1.5 | 8192 | 512 | 61.96 | | nomic-embed-text-v1.5 | 8192 | 256 | 61.04 | | nomic-embed-text-v1.5 | 8192 | 128 | 59.34 | | nomic-embed-text-v1.5 | 8192 | 64 | 56.10 | ## Training Click the Nomic Atlas map to visualize a 5M sample of our contrastive pretraining data! We train our embedder using a multi-stage training pipeline. Starting from a long-context BERT model, the first unsupervised contrastive stage trains on a dataset generated from weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and summarizations from news articles. In the second finetuning stage, higher-quality labeled datasets such as search queries and answers from web searches are leveraged. Data curation and hard-example mining are crucial in this stage. For more details, see the Nomic Embed Technical Report and corresponding blog post. The training data is released in its entirety; for more details, see the repository. # Join the Nomic Community - Nomic: - Discord: - Twitter: # Citation If you find the model, dataset, or training code useful, please cite our work.", + "model_explanation_gemini": "Generates sentence embeddings for tasks like text classification, retrieval, clustering, and similarity measurement." +} \ No newline at end of file diff --git a/data/model_data_json/nomic-ai_nomic-embed-text-v1.json b/data/model_data_json/nomic-ai_nomic-embed-text-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..6af92dcc681c7a797f1b1527f3393fabaadcec39 --- /dev/null +++ b/data/model_data_json/nomic-ai_nomic-embed-text-v1.json @@ -0,0 +1,27 @@ +{ + "model_id": "nomic-ai/nomic-embed-text-v1", + "downloads": 1345700, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "nomic_bert", + "feature-extraction", + "sentence-similarity", + "mteb", + "transformers", + "transformers.js", + "custom_code", + "en", + "arxiv:2402.01613", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - feature-extraction - sentence-similarity - mteb - transformers - transformers.js model-index: - name: epoch_0_model results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 76.8507462686567 - type: ap value: 40.592189159090495 - type: f1 value: 71.01634655512476 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.51892500000001 - type: ap value: 88.50346762975335 - type: f1 value: 91.50342077459624 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 47.364 - type: f1 value: 46.72708080922794 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna
config: default split: test revision: None metrics: - type: map_at_1 value: 25.178 - type: map_at_10 value: 40.244 - type: map_at_100 value: 41.321999999999996 - type: map_at_1000 value: 41.331 - type: map_at_3 value: 35.016999999999996 - type: map_at_5 value: 37.99 - type: mrr_at_1 value: 25.605 - type: mrr_at_10 value: 40.422000000000004 - type: mrr_at_100 value: 41.507 - type: mrr_at_1000 value: 41.516 - type: mrr_at_3 value: 35.23 - type: mrr_at_5 value: 38.15 - type: ndcg_at_1 value: 25.178 - type: ndcg_at_10 value: 49.258 - type: ndcg_at_100 value: 53.776 - type: ndcg_at_1000 value: 53.995000000000005 - type: ndcg_at_3 value: 38.429 - type: ndcg_at_5 value: 43.803 - type: precision_at_1 value: 25.178 - type: precision_at_10 value: 7.831 - type: precision_at_100 value: 0.979 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 16.121 - type: precision_at_5 value: 12.29 - type: recall_at_1 value: 25.178 - type: recall_at_10 value: 78.307 - type: recall_at_100 value: 97.866 - type: recall_at_1000 value: 99.57300000000001 - type: recall_at_3 value: 48.364000000000004 - type: recall_at_5 value: 61.451 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 45.93034494751465 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 36.64579480054327 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 60.601310529222054 - type: mrr value: 75.04484896451656 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 88.57797718095814 - type: cos_sim_spearman value: 86.47064499110101 - type: euclidean_pearson value: 87.4559602783142 - type: euclidean_spearman value: 86.47064499110101 - type: manhattan_pearson value: 87.7232764230245 - type: manhattan_spearman value: 86.91222131777742 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 84.5422077922078 - type: f1 value: 84.47657456950589 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 38.48953561974464 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 32.75995857510105 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.008000000000003 - type: map_at_10 value: 39.51 - type: map_at_100 value: 40.841 - type: map_at_1000 value: 40.973 - type: map_at_3 value: 36.248999999999995 - type: map_at_5 value: 38.096999999999994 - type: mrr_at_1 value: 36.481 - type: mrr_at_10 value: 44.818000000000005 - type: mrr_at_100 value: 45.64 - type: mrr_at_1000 value: 
45.687 - type: mrr_at_3 value: 42.036 - type: mrr_at_5 value: 43.782 - type: ndcg_at_1 value: 36.481 - type: ndcg_at_10 value: 45.152 - type: ndcg_at_100 value: 50.449 - type: ndcg_at_1000 value: 52.76499999999999 - type: ndcg_at_3 value: 40.161 - type: ndcg_at_5 value: 42.577999999999996 - type: precision_at_1 value: 36.481 - type: precision_at_10 value: 8.369 - type: precision_at_100 value: 1.373 - type: precision_at_1000 value: 0.186 - type: precision_at_3 value: 18.693 - type: precision_at_5 value: 13.533999999999999 - type: recall_at_1 value: 30.008000000000003 - type: recall_at_10 value: 56.108999999999995 - type: recall_at_100 value: 78.55499999999999 - type: recall_at_1000 value: 93.659 - type: recall_at_3 value: 41.754999999999995 - type: recall_at_5 value: 48.296 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.262 - type: map_at_10 value: 40.139 - type: map_at_100 value: 41.394 - type: map_at_1000 value: 41.526 - type: map_at_3 value: 37.155 - type: map_at_5 value: 38.785 - type: mrr_at_1 value: 38.153 - type: mrr_at_10 value: 46.369 - type: mrr_at_100 value: 47.072 - type: mrr_at_1000 value: 47.111999999999995 - type: mrr_at_3 value: 44.268 - type: mrr_at_5 value: 45.389 - type: ndcg_at_1 value: 38.153 - type: ndcg_at_10 value: 45.925 - type: ndcg_at_100 value: 50.394000000000005 - type: ndcg_at_1000 value: 52.37500000000001 - type: ndcg_at_3 value: 41.754000000000005 - type: ndcg_at_5 value: 43.574 - type: precision_at_1 value: 38.153 - type: precision_at_10 value: 8.796 - type: precision_at_100 value: 1.432 - type: precision_at_1000 value: 0.189 - type: precision_at_3 value: 20.318 - type: precision_at_5 value: 14.395 - type: recall_at_1 value: 30.262 - type: recall_at_10 value: 55.72200000000001 - type: recall_at_100 value: 74.97500000000001 - type: recall_at_1000 value: 87.342 - type: recall_at_3 value: 43.129 - type: recall_at_5 value: 48.336 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 39.951 - type: map_at_10 value: 51.248000000000005 - type: map_at_100 value: 52.188 - type: map_at_1000 value: 52.247 - type: map_at_3 value: 48.211 - type: map_at_5 value: 49.797000000000004 - type: mrr_at_1 value: 45.329 - type: mrr_at_10 value: 54.749 - type: mrr_at_100 value: 55.367999999999995 - type: mrr_at_1000 value: 55.400000000000006 - type: mrr_at_3 value: 52.382 - type: mrr_at_5 value: 53.649 - type: ndcg_at_1 value: 45.329 - type: ndcg_at_10 value: 56.847 - type: ndcg_at_100 value: 60.738 - type: ndcg_at_1000 value: 61.976 - type: ndcg_at_3 value: 51.59 - type: ndcg_at_5 value: 53.915 - type: precision_at_1 value: 45.329 - type: precision_at_10 value: 8.959 - type: precision_at_100 value: 1.187 - type: precision_at_1000 value: 0.134 - type: precision_at_3 value: 22.612 - type: precision_at_5 value: 15.273 - type: recall_at_1 value: 39.951 - type: recall_at_10 value: 70.053 - type: recall_at_100 value: 86.996 - type: recall_at_1000 value: 95.707 - type: recall_at_3 value: 56.032000000000004 - type: recall_at_5 value: 61.629999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.566 - type: map_at_10 value: 33.207 - type: map_at_100 value: 34.166000000000004 - type: map_at_1000 value: 34.245 - type: map_at_3 
value: 30.94 - type: map_at_5 value: 32.01 - type: mrr_at_1 value: 27.345000000000002 - type: mrr_at_10 value: 35.193000000000005 - type: mrr_at_100 value: 35.965 - type: mrr_at_1000 value: 36.028999999999996 - type: mrr_at_3 value: 32.806000000000004 - type: mrr_at_5 value: 34.021 - type: ndcg_at_1 value: 27.345000000000002 - type: ndcg_at_10 value: 37.891999999999996 - type: ndcg_at_100 value: 42.664 - type: ndcg_at_1000 value: 44.757000000000005 - type: ndcg_at_3 value: 33.123000000000005 - type: ndcg_at_5 value: 35.035 - type: precision_at_1 value: 27.345000000000002 - type: precision_at_10 value: 5.763 - type: precision_at_100 value: 0.859 - type: precision_at_1000 value: 0.108 - type: precision_at_3 value: 13.71 - type: precision_at_5 value: 9.401 - type: recall_at_1 value: 25.566 - type: recall_at_10 value: 50.563 - type: recall_at_100 value: 72.86399999999999 - type: recall_at_1000 value: 88.68599999999999 - type: recall_at_3 value: 37.43 - type: recall_at_5 value: 41.894999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.663 - type: map_at_10 value: 23.552 - type: map_at_100 value: 24.538 - type: map_at_1000 value: 24.661 - type: map_at_3 value: 21.085 - type: map_at_5 value: 22.391 - type: mrr_at_1 value: 20.025000000000002 - type: mrr_at_10 value: 27.643 - type: mrr_at_100 value: 28.499999999999996 - type: mrr_at_1000 value: 28.582 - type: mrr_at_3 value: 25.083 - type: mrr_at_5 value: 26.544 - type: ndcg_at_1 value: 20.025000000000002 - type: ndcg_at_10 value: 28.272000000000002 - type: ndcg_at_100 value: 33.353 - type: ndcg_at_1000 value: 36.454 - type: ndcg_at_3 value: 23.579 - type: ndcg_at_5 value: 25.685000000000002 - type: precision_at_1 value: 20.025000000000002 - type: precision_at_10 value: 5.187 - type: precision_at_100 value: 0.897 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 10.987 - type: precision_at_5 value: 8.06 - type: recall_at_1 value: 16.663 - type: recall_at_10 value: 38.808 - type: recall_at_100 value: 61.305 - type: recall_at_1000 value: 83.571 - type: recall_at_3 value: 25.907999999999998 - type: recall_at_5 value: 31.214 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.695999999999998 - type: map_at_10 value: 37.018 - type: map_at_100 value: 38.263000000000005 - type: map_at_1000 value: 38.371 - type: map_at_3 value: 34.226 - type: map_at_5 value: 35.809999999999995 - type: mrr_at_1 value: 32.916000000000004 - type: mrr_at_10 value: 42.067 - type: mrr_at_100 value: 42.925000000000004 - type: mrr_at_1000 value: 42.978 - type: mrr_at_3 value: 39.637 - type: mrr_at_5 value: 41.134 - type: ndcg_at_1 value: 32.916000000000004 - type: ndcg_at_10 value: 42.539 - type: ndcg_at_100 value: 47.873 - type: ndcg_at_1000 value: 50.08200000000001 - type: ndcg_at_3 value: 37.852999999999994 - type: ndcg_at_5 value: 40.201 - type: precision_at_1 value: 32.916000000000004 - type: precision_at_10 value: 7.5840000000000005 - type: precision_at_100 value: 1.199 - type: precision_at_1000 value: 0.155 - type: precision_at_3 value: 17.485 - type: precision_at_5 value: 12.512 - type: recall_at_1 value: 27.695999999999998 - type: recall_at_10 value: 53.638 - type: recall_at_100 value: 76.116 - type: recall_at_1000 value: 91.069 - type: recall_at_3 value: 41.13 - type: recall_at_5 value: 46.872 - 
task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.108 - type: map_at_10 value: 33.372 - type: map_at_100 value: 34.656 - type: map_at_1000 value: 34.768 - type: map_at_3 value: 30.830999999999996 - type: map_at_5 value: 32.204 - type: mrr_at_1 value: 29.110000000000003 - type: mrr_at_10 value: 37.979 - type: mrr_at_100 value: 38.933 - type: mrr_at_1000 value: 38.988 - type: mrr_at_3 value: 35.731 - type: mrr_at_5 value: 36.963 - type: ndcg_at_1 value: 29.110000000000003 - type: ndcg_at_10 value: 38.635000000000005 - type: ndcg_at_100 value: 44.324999999999996 - type: ndcg_at_1000 value: 46.747 - type: ndcg_at_3 value: 34.37 - type: ndcg_at_5 value: 36.228 - type: precision_at_1 value: 29.110000000000003 - type: precision_at_10 value: 6.963 - type: precision_at_100 value: 1.146 - type: precision_at_1000 value: 0.152 - type: precision_at_3 value: 16.400000000000002 - type: precision_at_5 value: 11.552999999999999 - type: recall_at_1 value: 24.108 - type: recall_at_10 value: 49.597 - type: recall_at_100 value: 73.88900000000001 - type: recall_at_1000 value: 90.62400000000001 - type: recall_at_3 value: 37.662 - type: recall_at_5 value: 42.565 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.00791666666667 - type: map_at_10 value: 33.287749999999996 - type: map_at_100 value: 34.41141666666667 - type: map_at_1000 value: 34.52583333333333 - type: map_at_3 value: 30.734416666666668 - type: map_at_5 value: 32.137166666666666 - type: mrr_at_1 value: 29.305666666666664 - type: mrr_at_10 value: 37.22966666666666 - type: mrr_at_100 value: 38.066583333333334 - type: mrr_at_1000 value: 38.12616666666667 - type: mrr_at_3 value: 34.92275 - type: mrr_at_5 value: 36.23333333333334 - type: ndcg_at_1 value: 29.305666666666664 - type: ndcg_at_10 value: 38.25533333333333 - type: ndcg_at_100 value: 43.25266666666666 - type: ndcg_at_1000 value: 45.63583333333334 - type: ndcg_at_3 value: 33.777166666666666 - type: ndcg_at_5 value: 35.85 - type: precision_at_1 value: 29.305666666666664 - type: precision_at_10 value: 6.596416666666667 - type: precision_at_100 value: 1.0784166666666668 - type: precision_at_1000 value: 0.14666666666666664 - type: precision_at_3 value: 15.31075 - type: precision_at_5 value: 10.830916666666667 - type: recall_at_1 value: 25.00791666666667 - type: recall_at_10 value: 49.10933333333333 - type: recall_at_100 value: 71.09216666666667 - type: recall_at_1000 value: 87.77725000000001 - type: recall_at_3 value: 36.660916666666665 - type: recall_at_5 value: 41.94149999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.521 - type: map_at_10 value: 30.043 - type: map_at_100 value: 30.936000000000003 - type: map_at_1000 value: 31.022 - type: map_at_3 value: 27.926000000000002 - type: map_at_5 value: 29.076999999999998 - type: mrr_at_1 value: 26.227 - type: mrr_at_10 value: 32.822 - type: mrr_at_100 value: 33.61 - type: mrr_at_1000 value: 33.672000000000004 - type: mrr_at_3 value: 30.776999999999997 - type: mrr_at_5 value: 31.866 - type: ndcg_at_1 value: 26.227 - type: ndcg_at_10 value: 34.041 - type: ndcg_at_100 value: 38.394 - type: ndcg_at_1000 value: 40.732 - type: ndcg_at_3 value: 30.037999999999997 - type: ndcg_at_5 value: 
31.845000000000002 - type: precision_at_1 value: 26.227 - type: precision_at_10 value: 5.244999999999999 - type: precision_at_100 value: 0.808 - type: precision_at_1000 value: 0.107 - type: precision_at_3 value: 12.679000000000002 - type: precision_at_5 value: 8.773 - type: recall_at_1 value: 23.521 - type: recall_at_10 value: 43.633 - type: recall_at_100 value: 63.126000000000005 - type: recall_at_1000 value: 80.765 - type: recall_at_3 value: 32.614 - type: recall_at_5 value: 37.15 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.236 - type: map_at_10 value: 22.898 - type: map_at_100 value: 23.878 - type: map_at_1000 value: 24.009 - type: map_at_3 value: 20.87 - type: map_at_5 value: 22.025 - type: mrr_at_1 value: 19.339000000000002 - type: mrr_at_10 value: 26.382 - type: mrr_at_100 value: 27.245 - type: mrr_at_1000 value: 27.33 - type: mrr_at_3 value: 24.386 - type: mrr_at_5 value: 25.496000000000002 - type: ndcg_at_1 value: 19.339000000000002 - type: ndcg_at_10 value: 27.139999999999997 - type: ndcg_at_100 value: 31.944 - type: ndcg_at_1000 value: 35.077999999999996 - type: ndcg_at_3 value: 23.424 - type: ndcg_at_5 value: 25.188 - type: precision_at_1 value: 19.339000000000002 - type: precision_at_10 value: 4.8309999999999995 - type: precision_at_100 value: 0.845 - type: precision_at_1000 value: 0.128 - type: precision_at_3 value: 10.874 - type: precision_at_5 value: 7.825 - type: recall_at_1 value: 16.236 - type: recall_at_10 value: 36.513 - type: recall_at_100 value: 57.999 - type: recall_at_1000 value: 80.512 - type: recall_at_3 value: 26.179999999999996 - type: recall_at_5 value: 30.712 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.11 - type: map_at_10 value: 31.566 - type: map_at_100 value: 32.647 - type: map_at_1000 value: 32.753 - type: map_at_3 value: 29.24 - type: map_at_5 value: 30.564999999999998 - type: mrr_at_1 value: 28.265 - type: mrr_at_10 value: 35.504000000000005 - type: mrr_at_100 value: 36.436 - type: mrr_at_1000 value: 36.503 - type: mrr_at_3 value: 33.349000000000004 - type: mrr_at_5 value: 34.622 - type: ndcg_at_1 value: 28.265 - type: ndcg_at_10 value: 36.192 - type: ndcg_at_100 value: 41.388000000000005 - type: ndcg_at_1000 value: 43.948 - type: ndcg_at_3 value: 31.959 - type: ndcg_at_5 value: 33.998 - type: precision_at_1 value: 28.265 - type: precision_at_10 value: 5.989 - type: precision_at_100 value: 0.9650000000000001 - type: precision_at_1000 value: 0.13 - type: precision_at_3 value: 14.335 - type: precision_at_5 value: 10.112 - type: recall_at_1 value: 24.11 - type: recall_at_10 value: 46.418 - type: recall_at_100 value: 69.314 - type: recall_at_1000 value: 87.397 - type: recall_at_3 value: 34.724 - type: recall_at_5 value: 39.925 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.091 - type: map_at_10 value: 29.948999999999998 - type: map_at_100 value: 31.502000000000002 - type: map_at_1000 value: 31.713 - type: map_at_3 value: 27.464 - type: map_at_5 value: 28.968 - type: mrr_at_1 value: 26.482 - type: mrr_at_10 value: 34.009 - type: mrr_at_100 value: 35.081 - type: mrr_at_1000 value: 35.138000000000005 - type: mrr_at_3 value: 31.785000000000004 - type: mrr_at_5 value: 33.178999999999995 
- type: ndcg_at_1 value: 26.482 - type: ndcg_at_10 value: 35.008 - type: ndcg_at_100 value: 41.272999999999996 - type: ndcg_at_1000 value: 43.972 - type: ndcg_at_3 value: 30.804 - type: ndcg_at_5 value: 33.046 - type: precision_at_1 value: 26.482 - type: precision_at_10 value: 6.462 - type: precision_at_100 value: 1.431 - type: precision_at_1000 value: 0.22899999999999998 - type: precision_at_3 value: 14.360999999999999 - type: precision_at_5 value: 10.474 - type: recall_at_1 value: 22.091 - type: recall_at_10 value: 45.125 - type: recall_at_100 value: 72.313 - type: recall_at_1000 value: 89.503 - type: recall_at_3 value: 33.158 - type: recall_at_5 value: 39.086999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.883 - type: map_at_10 value: 26.951000000000004 - type: map_at_100 value: 27.927999999999997 - type: map_at_1000 value: 28.022000000000002 - type: map_at_3 value: 24.616 - type: map_at_5 value: 25.917 - type: mrr_at_1 value: 21.996 - type: mrr_at_10 value: 29.221000000000004 - type: mrr_at_100 value: 30.024 - type: mrr_at_1000 value: 30.095 - type: mrr_at_3 value: 26.833000000000002 - type: mrr_at_5 value: 28.155 - type: ndcg_at_1 value: 21.996 - type: ndcg_at_10 value: 31.421 - type: ndcg_at_100 value: 36.237 - type: ndcg_at_1000 value: 38.744 - type: ndcg_at_3 value: 26.671 - type: ndcg_at_5 value: 28.907 - type: precision_at_1 value: 21.996 - type: precision_at_10 value: 5.009 - type: precision_at_100 value: 0.799 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 11.275 - type: precision_at_5 value: 8.059 - type: recall_at_1 value: 19.883 - type: recall_at_10 value: 43.132999999999996 - type: recall_at_100 value: 65.654 - type: recall_at_1000 value: 84.492 - type: recall_at_3 value: 30.209000000000003 - type: recall_at_5 value: 35.616 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 17.756 - type: map_at_10 value: 30.378 - type: map_at_100 value: 32.537 - type: map_at_1000 value: 32.717 - type: map_at_3 value: 25.599 - type: map_at_5 value: 28.372999999999998 - type: mrr_at_1 value: 41.303 - type: mrr_at_10 value: 53.483999999999995 - type: mrr_at_100 value: 54.106 - type: mrr_at_1000 value: 54.127 - type: mrr_at_3 value: 50.315 - type: mrr_at_5 value: 52.396 - type: ndcg_at_1 value: 41.303 - type: ndcg_at_10 value: 40.503 - type: ndcg_at_100 value: 47.821000000000005 - type: ndcg_at_1000 value: 50.788 - type: ndcg_at_3 value: 34.364 - type: ndcg_at_5 value: 36.818 - type: precision_at_1 value: 41.303 - type: precision_at_10 value: 12.463000000000001 - type: precision_at_100 value: 2.037 - type: precision_at_1000 value: 0.26 - type: precision_at_3 value: 25.798 - type: precision_at_5 value: 19.896 - type: recall_at_1 value: 17.756 - type: recall_at_10 value: 46.102 - type: recall_at_100 value: 70.819 - type: recall_at_1000 value: 87.21799999999999 - type: recall_at_3 value: 30.646 - type: recall_at_5 value: 38.022 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.033 - type: map_at_10 value: 20.584 - type: map_at_100 value: 29.518 - type: map_at_1000 value: 31.186000000000003 - type: map_at_3 value: 14.468 - type: map_at_5 value: 17.177 - type: mrr_at_1 value: 69.75 - type: mrr_at_10 value: 77.025 - type: 
mrr_at_100 value: 77.36699999999999 - type: mrr_at_1000 value: 77.373 - type: mrr_at_3 value: 75.583 - type: mrr_at_5 value: 76.396 - type: ndcg_at_1 value: 58.5 - type: ndcg_at_10 value: 45.033 - type: ndcg_at_100 value: 49.071 - type: ndcg_at_1000 value: 56.056 - type: ndcg_at_3 value: 49.936 - type: ndcg_at_5 value: 47.471999999999994 - type: precision_at_1 value: 69.75 - type: precision_at_10 value: 35.775 - type: precision_at_100 value: 11.594999999999999 - type: precision_at_1000 value: 2.062 - type: precision_at_3 value: 52.5 - type: precision_at_5 value: 45.300000000000004 - type: recall_at_1 value: 9.033 - type: recall_at_10 value: 26.596999999999998 - type: recall_at_100 value: 54.607000000000006 - type: recall_at_1000 value: 76.961 - type: recall_at_3 value: 15.754999999999999 - type: recall_at_5 value: 20.033 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 48.345000000000006 - type: f1 value: 43.4514918068706 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 71.29100000000001 - type: map_at_10 value: 81.059 - type: map_at_100 value: 81.341 - type: map_at_1000 value: 81.355 - type: map_at_3 value: 79.74799999999999 - type: map_at_5 value: 80.612 - type: mrr_at_1 value: 76.40299999999999 - type: mrr_at_10 value: 84.615 - type: mrr_at_100 value: 84.745 - type: mrr_at_1000 value: 84.748 - type: mrr_at_3 value: 83.776 - type: mrr_at_5 value: 84.343 - type: ndcg_at_1 value: 76.40299999999999 - type: ndcg_at_10 value: 84.981 - type: ndcg_at_100 value: 86.00999999999999 - type: ndcg_at_1000 value: 86.252 - type: ndcg_at_3 value: 82.97 - type: ndcg_at_5 value: 84.152 - type: precision_at_1 value: 76.40299999999999 - type: precision_at_10 value: 10.446 - type: precision_at_100 value: 1.1199999999999999 - type: precision_at_1000 value: 0.116 - type: precision_at_3 value: 32.147999999999996 - type: precision_at_5 value: 20.135 - type: recall_at_1 value: 71.29100000000001 - type: recall_at_10 value: 93.232 - type: recall_at_100 value: 97.363 - type: recall_at_1000 value: 98.905 - type: recall_at_3 value: 87.893 - type: recall_at_5 value: 90.804 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 18.667 - type: map_at_10 value: 30.853 - type: map_at_100 value: 32.494 - type: map_at_1000 value: 32.677 - type: map_at_3 value: 26.91 - type: map_at_5 value: 29.099000000000004 - type: mrr_at_1 value: 37.191 - type: mrr_at_10 value: 46.171 - type: mrr_at_100 value: 47.056 - type: mrr_at_1000 value: 47.099000000000004 - type: mrr_at_3 value: 44.059 - type: mrr_at_5 value: 45.147 - type: ndcg_at_1 value: 37.191 - type: ndcg_at_10 value: 38.437 - type: ndcg_at_100 value: 44.62 - type: ndcg_at_1000 value: 47.795 - type: ndcg_at_3 value: 35.003 - type: ndcg_at_5 value: 36.006 - type: precision_at_1 value: 37.191 - type: precision_at_10 value: 10.586 - type: precision_at_100 value: 1.688 - type: precision_at_1000 value: 0.22699999999999998 - type: precision_at_3 value: 23.302 - type: precision_at_5 value: 17.006 - type: recall_at_1 value: 18.667 - type: recall_at_10 value: 45.367000000000004 - type: recall_at_100 value: 68.207 - type: recall_at_1000 value: 87.072 - type: recall_at_3 value: 32.129000000000005 - type: recall_at_5 value: 37.719 - task: type: Retrieval dataset: type: 
hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 39.494 - type: map_at_10 value: 66.223 - type: map_at_100 value: 67.062 - type: map_at_1000 value: 67.11500000000001 - type: map_at_3 value: 62.867 - type: map_at_5 value: 64.994 - type: mrr_at_1 value: 78.987 - type: mrr_at_10 value: 84.585 - type: mrr_at_100 value: 84.773 - type: mrr_at_1000 value: 84.77900000000001 - type: mrr_at_3 value: 83.592 - type: mrr_at_5 value: 84.235 - type: ndcg_at_1 value: 78.987 - type: ndcg_at_10 value: 73.64 - type: ndcg_at_100 value: 76.519 - type: ndcg_at_1000 value: 77.51 - type: ndcg_at_3 value: 68.893 - type: ndcg_at_5 value: 71.585 - type: precision_at_1 value: 78.987 - type: precision_at_10 value: 15.529000000000002 - type: precision_at_100 value: 1.7770000000000001 - type: precision_at_1000 value: 0.191 - type: precision_at_3 value: 44.808 - type: precision_at_5 value: 29.006999999999998 - type: recall_at_1 value: 39.494 - type: recall_at_10 value: 77.643 - type: recall_at_100 value: 88.825 - type: recall_at_1000 value: 95.321 - type: recall_at_3 value: 67.211 - type: recall_at_5 value: 72.519 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 85.55959999999999 - type: ap value: 80.7246500384617 - type: f1 value: 85.52336485065454 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 23.631 - type: map_at_10 value: 36.264 - type: map_at_100 value: 37.428 - type: map_at_1000 value: 37.472 - type: map_at_3 value: 32.537 - type: map_at_5 value: 34.746 - type: mrr_at_1 value: 24.312 - type: mrr_at_10 value: 36.858000000000004 - type: mrr_at_100 value: 37.966 - type: mrr_at_1000 value: 38.004 - type: mrr_at_3 value: 33.188 - type: mrr_at_5 value: 35.367 - type: ndcg_at_1 value: 24.312 - type: ndcg_at_10 value: 43.126999999999995 - type: ndcg_at_100 value: 48.642 - type: ndcg_at_1000 value: 49.741 - type: ndcg_at_3 value: 35.589 - type: ndcg_at_5 value: 39.515 - type: precision_at_1 value: 24.312 - type: precision_at_10 value: 6.699 - type: precision_at_100 value: 0.9450000000000001 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 15.153 - type: precision_at_5 value: 11.065999999999999 - type: recall_at_1 value: 23.631 - type: recall_at_10 value: 64.145 - type: recall_at_100 value: 89.41 - type: recall_at_1000 value: 97.83500000000001 - type: recall_at_3 value: 43.769000000000005 - type: recall_at_5 value: 53.169 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.4108527131783 - type: f1 value: 93.1415880261038 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 77.24806201550388 - type: f1 value: 60.531916308197175 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 73.71553463349024 - type: f1 value: 71.70753174900791 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB 
MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 77.79757901815736 - type: f1 value: 77.83719850433258 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.74193296622113 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 30.64257594108566 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 30.811018518883625 - type: mrr value: 31.910376577445003 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.409 - type: map_at_10 value: 13.093 - type: map_at_100 value: 16.256999999999998 - type: map_at_1000 value: 17.617 - type: map_at_3 value: 9.555 - type: map_at_5 value: 11.428 - type: mrr_at_1 value: 45.201 - type: mrr_at_10 value: 54.179 - type: mrr_at_100 value: 54.812000000000005 - type: mrr_at_1000 value: 54.840999999999994 - type: mrr_at_3 value: 51.909000000000006 - type: mrr_at_5 value: 53.519000000000005 - type: ndcg_at_1 value: 43.189 - type: ndcg_at_10 value: 35.028 - type: ndcg_at_100 value: 31.226 - type: ndcg_at_1000 value: 39.678000000000004 - type: ndcg_at_3 value: 40.596 - type: ndcg_at_5 value: 38.75 - type: precision_at_1 value: 44.582 - type: precision_at_10 value: 25.974999999999998 - type: precision_at_100 value: 7.793 - type: precision_at_1000 value: 2.036 - type: precision_at_3 value: 38.493 - type: precision_at_5 value: 33.994 - type: recall_at_1 value: 5.409 - type: recall_at_10 value: 16.875999999999998 - type: recall_at_100 value: 30.316 - type: recall_at_1000 value: 60.891 - type: recall_at_3 value: 10.688 - type: recall_at_5 value: 13.832 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 36.375 - type: map_at_10 value: 51.991 - type: map_at_100 value: 52.91400000000001 - type: map_at_1000 value: 52.93600000000001 - type: map_at_3 value: 48.014 - type: map_at_5 value: 50.381 - type: mrr_at_1 value: 40.759 - type: mrr_at_10 value: 54.617000000000004 - type: mrr_at_100 value: 55.301 - type: mrr_at_1000 value: 55.315000000000005 - type: mrr_at_3 value: 51.516 - type: mrr_at_5 value: 53.435 - type: ndcg_at_1 value: 40.759 - type: ndcg_at_10 value: 59.384 - type: ndcg_at_100 value: 63.157 - type: ndcg_at_1000 value: 63.654999999999994 - type: ndcg_at_3 value: 52.114000000000004 - type: ndcg_at_5 value: 55.986000000000004 - type: precision_at_1 value: 40.759 - type: precision_at_10 value: 9.411999999999999 - type: precision_at_100 value: 1.153 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 23.329 - type: precision_at_5 value: 16.256999999999998 - type: recall_at_1 value: 36.375 - type: recall_at_10 value: 79.053 - type: recall_at_100 value: 95.167 - type: recall_at_1000 value: 98.82 - type: recall_at_3 value: 60.475 - type: recall_at_5 value: 69.327 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.256 - type: map_at_10 value: 83.8 - 
type: map_at_100 value: 84.425 - type: map_at_1000 value: 84.444 - type: map_at_3 value: 80.906 - type: map_at_5 value: 82.717 - type: mrr_at_1 value: 80.97999999999999 - type: mrr_at_10 value: 87.161 - type: mrr_at_100 value: 87.262 - type: mrr_at_1000 value: 87.263 - type: mrr_at_3 value: 86.175 - type: mrr_at_5 value: 86.848 - type: ndcg_at_1 value: 80.97999999999999 - type: ndcg_at_10 value: 87.697 - type: ndcg_at_100 value: 88.959 - type: ndcg_at_1000 value: 89.09899999999999 - type: ndcg_at_3 value: 84.83800000000001 - type: ndcg_at_5 value: 86.401 - type: precision_at_1 value: 80.97999999999999 - type: precision_at_10 value: 13.261000000000001 - type: precision_at_100 value: 1.5150000000000001 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 37.01 - type: precision_at_5 value: 24.298000000000002 - type: recall_at_1 value: 70.256 - type: recall_at_10 value: 94.935 - type: recall_at_100 value: 99.274 - type: recall_at_1000 value: 99.928 - type: recall_at_3 value: 86.602 - type: recall_at_5 value: 91.133 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 56.322692497613104 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 61.895813503775074 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.338 - type: map_at_10 value: 10.767 - type: map_at_100 value: 12.537999999999998 - type: map_at_1000 value: 12.803999999999998 - type: map_at_3 value: 7.788 - type: map_at_5 value: 9.302000000000001 - type: mrr_at_1 value: 21.4 - type: mrr_at_10 value: 31.637999999999998 - type: mrr_at_100 value: 32.688 - type: mrr_at_1000 value: 32.756 - type: mrr_at_3 value: 28.433000000000003 - type: mrr_at_5 value: 30.178 - type: ndcg_at_1 value: 21.4 - type: ndcg_at_10 value: 18.293 - type: ndcg_at_100 value: 25.274 - type: ndcg_at_1000 value: 30.284 - type: ndcg_at_3 value: 17.391000000000002 - type: ndcg_at_5 value: 15.146999999999998 - type: precision_at_1 value: 21.4 - type: precision_at_10 value: 9.48 - type: precision_at_100 value: 1.949 - type: precision_at_1000 value: 0.316 - type: precision_at_3 value: 16.167 - type: precision_at_5 value: 13.22 - type: recall_at_1 value: 4.338 - type: recall_at_10 value: 19.213 - type: recall_at_100 value: 39.562999999999995 - type: recall_at_1000 value: 64.08 - type: recall_at_3 value: 9.828000000000001 - type: recall_at_5 value: 13.383000000000001 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 82.42568163642142 - type: cos_sim_spearman value: 78.5797159641342 - type: euclidean_pearson value: 80.22151260811604 - type: euclidean_spearman value: 78.5797151953878 - type: manhattan_pearson value: 80.21224215864788 - type: manhattan_spearman value: 78.55641478381344 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.44020710812569 - type: cos_sim_spearman value: 78.91631735081286 - type: euclidean_pearson value: 81.64188964182102 - type: euclidean_spearman value: 78.91633286881678 - 
type: manhattan_pearson value: 81.69294748512496 - type: manhattan_spearman value: 78.93438558002656 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 84.27165426412311 - type: cos_sim_spearman value: 85.40429140249618 - type: euclidean_pearson value: 84.7509580724893 - type: euclidean_spearman value: 85.40429140249618 - type: manhattan_pearson value: 84.76488289321308 - type: manhattan_spearman value: 85.4256793698708 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.138851760732 - type: cos_sim_spearman value: 81.64101363896586 - type: euclidean_pearson value: 82.55165038934942 - type: euclidean_spearman value: 81.64105257080502 - type: manhattan_pearson value: 82.52802949883335 - type: manhattan_spearman value: 81.61255430718158 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.0654695484029 - type: cos_sim_spearman value: 87.20408521902229 - type: euclidean_pearson value: 86.8110651362115 - type: euclidean_spearman value: 87.20408521902229 - type: manhattan_pearson value: 86.77984656478691 - type: manhattan_spearman value: 87.1719947099227 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 83.77823915496512 - type: cos_sim_spearman value: 85.43566325729779 - type: euclidean_pearson value: 84.5396956658821 - type: euclidean_spearman value: 85.43566325729779 - type: manhattan_pearson value: 84.5665398848169 - type: manhattan_spearman value: 85.44375870303232 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.20030208471798 - type: cos_sim_spearman value: 87.20485505076539 - type: euclidean_pearson value: 88.10588324368722 - type: euclidean_spearman value: 87.20485505076539 - type: manhattan_pearson value: 87.92324770415183 - type: manhattan_spearman value: 87.0571314561877 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 63.06093161604453 - type: cos_sim_spearman value: 64.2163140357722 - type: euclidean_pearson value: 65.27589680994006 - type: euclidean_spearman value: 64.2163140357722 - type: manhattan_pearson value: 65.45904383711101 - type: manhattan_spearman value: 64.55404716679305 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.32976164578706 - type: cos_sim_spearman value: 85.54302197678368 - type: euclidean_pearson value: 85.26307149193056 - type: euclidean_spearman value: 85.54302197678368 - type: manhattan_pearson value: 85.26647282029371 - type: manhattan_spearman value: 85.5316135265568 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 81.44675968318754 - 
type: mrr value: 94.92741826075158 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 56.34400000000001 - type: map_at_10 value: 65.927 - type: map_at_100 value: 66.431 - type: map_at_1000 value: 66.461 - type: map_at_3 value: 63.529 - type: map_at_5 value: 64.818 - type: mrr_at_1 value: 59.333000000000006 - type: mrr_at_10 value: 67.54599999999999 - type: mrr_at_100 value: 67.892 - type: mrr_at_1000 value: 67.917 - type: mrr_at_3 value: 65.778 - type: mrr_at_5 value: 66.794 - type: ndcg_at_1 value: 59.333000000000006 - type: ndcg_at_10 value: 70.5 - type: ndcg_at_100 value: 72.688 - type: ndcg_at_1000 value: 73.483 - type: ndcg_at_3 value: 66.338 - type: ndcg_at_5 value: 68.265 - type: precision_at_1 value: 59.333000000000006 - type: precision_at_10 value: 9.3 - type: precision_at_100 value: 1.053 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 25.889 - type: precision_at_5 value: 16.866999999999997 - type: recall_at_1 value: 56.34400000000001 - type: recall_at_10 value: 82.789 - type: recall_at_100 value: 92.767 - type: recall_at_1000 value: 99 - type: recall_at_3 value: 71.64399999999999 - type: recall_at_5 value: 76.322 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.75742574257426 - type: cos_sim_ap value: 93.52081548447406 - type: cos_sim_f1 value: 87.33850129198966 - type: cos_sim_precision value: 90.37433155080214 - type: cos_sim_recall value: 84.5 - type: dot_accuracy value: 99.75742574257426 - type: dot_ap value: 93.52081548447406 - type: dot_f1 value: 87.33850129198966 - type: dot_precision value: 90.37433155080214 - type: dot_recall value: 84.5 - type: euclidean_accuracy value: 99.75742574257426 - type: euclidean_ap value: 93.52081548447406 - type: euclidean_f1 value: 87.33850129198966 - type: euclidean_precision value: 90.37433155080214 - type: euclidean_recall value: 84.5 - type: manhattan_accuracy value: 99.75841584158415 - type: manhattan_ap value: 93.4975678585854 - type: manhattan_f1 value: 87.26708074534162 - type: manhattan_precision value: 90.45064377682404 - type: manhattan_recall value: 84.3 - type: max_accuracy value: 99.75841584158415 - type: max_ap value: 93.52081548447406 - type: max_f1 value: 87.33850129198966 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 64.31437036686651 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 33.25569319007206 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 49.90474939720706 - type: mrr value: 50.568115503777264 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 29.866828641244712 - type: cos_sim_spearman value: 30.077555055873866 - type: dot_pearson 
value: 29.866832988572266 - type: dot_spearman value: 30.077555055873866 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.232 - type: map_at_10 value: 2.094 - type: map_at_100 value: 11.971 - type: map_at_1000 value: 28.158 - type: map_at_3 value: 0.688 - type: map_at_5 value: 1.114 - type: mrr_at_1 value: 88 - type: mrr_at_10 value: 93.4 - type: mrr_at_100 value: 93.4 - type: mrr_at_1000 value: 93.4 - type: mrr_at_3 value: 93 - type: mrr_at_5 value: 93.4 - type: ndcg_at_1 value: 84 - type: ndcg_at_10 value: 79.923 - type: ndcg_at_100 value: 61.17 - type: ndcg_at_1000 value: 53.03 - type: ndcg_at_3 value: 84.592 - type: ndcg_at_5 value: 82.821 - type: precision_at_1 value: 88 - type: precision_at_10 value: 85 - type: precision_at_100 value: 63.019999999999996 - type: precision_at_1000 value: 23.554 - type: precision_at_3 value: 89.333 - type: precision_at_5 value: 87.2 - type: recall_at_1 value: 0.232 - type: recall_at_10 value: 2.255 - type: recall_at_100 value: 14.823 - type: recall_at_1000 value: 49.456 - type: recall_at_3 value: 0.718 - type: recall_at_5 value: 1.175 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.547 - type: map_at_10 value: 11.375 - type: map_at_100 value: 18.194 - type: map_at_1000 value: 19.749 - type: map_at_3 value: 5.825 - type: map_at_5 value: 8.581 - type: mrr_at_1 value: 32.653 - type: mrr_at_10 value: 51.32 - type: mrr_at_100 value: 51.747 - type: mrr_at_1000 value: 51.747 - type: mrr_at_3 value: 47.278999999999996 - type: mrr_at_5 value: 48.605 - type: ndcg_at_1 value: 29.592000000000002 - type: ndcg_at_10 value: 28.151 - type: ndcg_at_100 value: 39.438 - type: ndcg_at_1000 value: 50.769 - type: ndcg_at_3 value: 30.758999999999997 - type: ndcg_at_5 value: 30.366 - type: precision_at_1 value: 32.653 - type: precision_at_10 value: 25.714 - type: precision_at_100 value: 8.041 - type: precision_at_1000 value: 1.555 - type: precision_at_3 value: 33.333 - type: precision_at_5 value: 31.837 - type: recall_at_1 value: 2.547 - type: recall_at_10 value: 18.19 - type: recall_at_100 value: 49.538 - type: recall_at_1000 value: 83.86 - type: recall_at_3 value: 7.329 - type: recall_at_5 value: 11.532 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.4952 - type: ap value: 14.793362635531409 - type: f1 value: 55.204635551516915 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 61.5365025466893 - type: f1 value: 61.81742556334845 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 49.05531070301185 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 86.51725576682364 - type: cos_sim_ap value: 75.2292304265163 - type: cos_sim_f1 value: 69.54022988505749 
- type: cos_sim_precision value: 63.65629110039457 - type: cos_sim_recall value: 76.62269129287598 - type: dot_accuracy value: 86.51725576682364 - type: dot_ap value: 75.22922386081054 - type: dot_f1 value: 69.54022988505749 - type: dot_precision value: 63.65629110039457 - type: dot_recall value: 76.62269129287598 - type: euclidean_accuracy value: 86.51725576682364 - type: euclidean_ap value: 75.22925730473472 - type: euclidean_f1 value: 69.54022988505749 - type: euclidean_precision value: 63.65629110039457 - type: euclidean_recall value: 76.62269129287598 - type: manhattan_accuracy value: 86.52321630804077 - type: manhattan_ap value: 75.20608115037336 - type: manhattan_f1 value: 69.60000000000001 - type: manhattan_precision value: 64.37219730941705 - type: manhattan_recall value: 75.75197889182058 - type: max_accuracy value: 86.52321630804077 - type: max_ap value: 75.22925730473472 - type: max_f1 value: 69.60000000000001 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 89.34877944657896 - type: cos_sim_ap value: 86.71257569277373 - type: cos_sim_f1 value: 79.10386355986088 - type: cos_sim_precision value: 76.91468470434214 - type: cos_sim_recall value: 81.4213119802895 - type: dot_accuracy value: 89.34877944657896 - type: dot_ap value: 86.71257133133368 - type: dot_f1 value: 79.10386355986088 - type: dot_precision value: 76.91468470434214 - type: dot_recall value: 81.4213119802895 - type: euclidean_accuracy value: 89.34877944657896 - type: euclidean_ap value: 86.71257651501476 - type: euclidean_f1 value: 79.10386355986088 - type: euclidean_precision value: 76.91468470434214 - type: euclidean_recall value: 81.4213119802895 - type: manhattan_accuracy value: 89.35848177901967 - type: manhattan_ap value: 86.69330615469126 - type: manhattan_f1 value: 79.13867741453949 - type: manhattan_precision value: 76.78881807647741 - type: manhattan_recall value: 81.63689559593472 - type: max_accuracy value: 89.35848177901967 - type: max_ap value: 86.71257651501476 - type: max_f1 value: 79.13867741453949 license: apache-2.0 language: - en new_version: nomic-ai/nomic-embed-text-v1.5 --- # nomic-embed-text-v1: A Reproducible Long Context (8192) Text Embedder Blog | Technical Report | AWS SageMaker | Atlas Embedding and Unstructured Data Analytics Platform nomic-embed-text-v1 is an 8192 context length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short and long context tasks. # Performance Benchmarks | Name | SeqLen | MTEB | LoCo | Jina Long Context | Open Weights | Open Training Code | Open Data | | :-------------------------------:| :----- | :-------- | :------: | :---------------: | :-----------: | :----------------: | :---------- | | nomic-embed-text-v1 | 8192 | **62.39** |**85.53** | 54.16 | ✅ | ✅ | ✅ | | jina-embeddings-v2-base-en | 8192 | 60.39 | 85.45 | 51.90 | ✅ | ❌ | ❌ | | text-embedding-3-small | 8191 | 62.26 | 82.40 | **58.20** | ❌ | ❌ | ❌ | | text-embedding-ada-002 | 8191 | 60.99 | 52.7 | 55.25 | ❌ | ❌ | ❌ | **Exciting Update!**: nomic-embed-text-v1 is now multimodal! nomic-embed-vision-v1 is aligned to the embedding space of nomic-embed-text-v1, meaning any text embedding is multimodal! ## Usage **Important**: the text prompt *must* include a *task instruction prefix*, instructing the model which task is being performed.
For example, if you are implementing a RAG application, you embed your documents as `search_document: <text>` and embed your user queries as `search_query: <text>`. ## Task instruction prefixes ### `search_document` #### Purpose: embed texts as documents from a dataset This prefix is used for embedding texts as documents, for example as documents for a RAG index. ### `search_query` #### Purpose: embed texts as questions to answer This prefix is used for embedding texts as questions that documents from a dataset could resolve, for example as queries to be answered by a RAG application. ### `clustering` #### Purpose: embed texts to group them into clusters This prefix is used for embedding texts in order to group them into clusters, discover common topics, or remove semantic duplicates. ### `classification` #### Purpose: embed texts to classify them This prefix is used for embedding texts into vectors that will be used as features for a classification model. ### Sentence Transformers ### Transformers The model natively supports scaling of the sequence length past 2048 tokens; to do so, load it with a larger rotary scaling factor (e.g. `rotary_scaling_factor=2`). ### Transformers.js ## Nomic API The easiest way to get started with Nomic Embed is through the Nomic Embedding API. Generating embeddings with the Python client is a single call; for more information, see the API reference. ## Training Click the Nomic Atlas map to visualize a 5M sample of our contrastive pretraining data! We train our embedder using a multi-stage training pipeline. Starting from a long-context BERT model, the first unsupervised contrastive stage trains on a dataset generated from weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and summarizations from news articles. In the second finetuning stage, higher quality labeled datasets such as search queries and answers from web searches are leveraged. Data curation and hard-example mining are crucial in this stage. For more details, see the Nomic Embed Technical Report and corresponding blog post. Training data to train the models is released in its entirety. For more details, see the repository. # Join the Nomic Community - Nomic: - Discord: - Twitter: # Citation If you find the model, dataset, or training code useful, please cite our work", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity comparison, feature extraction, and classification across various datasets."
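A minimal sketch of the prefix convention from the Usage section, via sentence-transformers (the card's own code blocks were stripped here, so the exact calls — `trust_remote_code=True`, `util.cos_sim` — are standard sentence-transformers usage assumed for illustration):

```python
# Sketch: embed documents and queries with nomic-embed-text-v1,
# applying the task instruction prefixes described above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

docs = [
    "search_document: FAISS is a library for efficient vector similarity search.",
    "search_document: The Eiffel Tower is located in Paris.",
]
queries = ["search_query: Which library performs vector search?"]

doc_emb = model.encode(docs)
query_emb = model.encode(queries)

# Cosine similarity between each query and each document.
print(util.cos_sim(query_emb, doc_emb))
```

The Nomic API route is similar: the `nomic` Python client exposes `embed.text(texts=[...], model="nomic-embed-text-v1", task_type="search_document")`, where `task_type` selects the prefix (assumed from the client's documented interface).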
+} \ No newline at end of file diff --git a/data/model_data_json/nomic-ai_nomic-embed-text-v2-moe.json b/data/model_data_json/nomic-ai_nomic-embed-text-v2-moe.json new file mode 100644 index 0000000000000000000000000000000000000000..5e6e2feacabf075d0c34b9fac2c6e0eb902dff11 --- /dev/null +++ b/data/model_data_json/nomic-ai_nomic-embed-text-v2-moe.json @@ -0,0 +1,124 @@ +{ + "model_id": "nomic-ai/nomic-embed-text-v2-moe", + "downloads": 279091, + "tags": [ + "sentence-transformers", + "safetensors", + "nomic_bert", + "sentence-similarity", + "feature-extraction", + "custom_code", + "en", + "es", + "fr", + "de", + "it", + "pt", + "pl", + "nl", + "tr", + "ja", + "vi", + "ru", + "id", + "ar", + "cs", + "ro", + "sv", + "el", + "uk", + "zh", + "hu", + "da", + "no", + "hi", + "fi", + "bg", + "ko", + "sk", + "th", + "he", + "ca", + "lt", + "fa", + "ms", + "sl", + "lv", + "mr", + "bn", + "sq", + "cy", + "be", + "ml", + "kn", + "mk", + "ur", + "fy", + "te", + "eu", + "sw", + "so", + "sd", + "uz", + "co", + "hr", + "gu", + "ce", + "eo", + "jv", + "la", + "zu", + "mn", + "si", + "ga", + "ky", + "tg", + "my", + "km", + "mg", + "pa", + "sn", + "ha", + "ht", + "su", + "gd", + "ny", + "ps", + "ku", + "am", + "ig", + "lo", + "mi", + "nn", + "sm", + "yi", + "st", + "tl", + "xh", + "yo", + "af", + "ta", + "tn", + "ug", + "az", + "ba", + "bs", + "dv", + "et", + "gl", + "gn", + "gv", + "hy", + "arxiv:2502.07972", + "arxiv:2205.13147", + "base_model:nomic-ai/nomic-embed-text-v2-moe-unsupervised", + "base_model:finetune:nomic-ai/nomic-embed-text-v2-moe-unsupervised", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: - nomic-ai/nomic-embed-text-v2-moe-unsupervised library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction license: apache-2.0 language: - en - es - fr - de - it - pt - pl - nl - tr - ja - vi - ru - id - ar - cs - ro - sv - el - uk - zh - hu - da - 'no' - hi - fi - bg - ko - sk - th - he - ca - lt - fa - ms - sl - lv - mr - bn - sq - cy - be - ml - kn - mk - ur - fy - te - eu - sw - so - sd - uz - co - hr - gu - ce - eo - jv - la - zu - mn - si - ga - ky - tg - my - km - mg - pa - sn - ha - ht - su - gd - ny - ps - ku - am - ig - lo - mi - nn - sm - yi - st - tl - xh - yo - af - ta - tn - ug - az - ba - bs - dv - et - gl - gn - gv - hy --- # nomic-embed-text-v2-moe: Multilingual Mixture of Experts Text Embeddings Blog | Technical Report | AWS SageMaker | Atlas Embedding and Unstructured Data Analytics Platform This model was presented in the paper Training Sparse Mixture Of Experts Text Embedding Models. 
## Model Overview nomic-embed-text-v2-moe is a SoTA multilingual MoE text embedding model that excels at multilingual retrieval: - **High Performance**: SoTA multilingual performance compared to ~300M parameter models, competitive with models 2x in size - **Multilinguality**: Supports ~100 languages and trained on over 1.6B pairs - **Flexible Embedding Dimension**: Trained with Matryoshka Embeddings with 3x reductions in storage cost with minimal performance degradations - **Fully Open-Source**: Model weights, code, and training data (see code repo) released | Model | Params (M) | Emb Dim | BEIR | MIRACL | Pretrain Data | Finetune Data | Code | |-------|------------|----------|------|---------|---------------|---------------|------| | **Nomic Embed v2** | 305 | 768 | 52.86 | **65.80** | ✅ | ✅ | ✅ | | mE5 Base | 278 | 768 | 48.88 | 62.30 | ❌ | ❌ | ❌ | | mGTE Base | 305 | 768 | 51.10 | 63.40 | ❌ | ❌ | ❌ | | Arctic Embed v2 Base | 305 | 768 | **55.40** | 59.90 | ❌ | ❌ | ❌ | | | | BGE M3 | 568 | 1024 | 48.80 | **69.20** | ❌ | ✅ | ❌ | | Arctic Embed v2 Large | 568 | 1024 | **55.65** | 66.00 | ❌ | ❌ | ❌ | | mE5 Large | 560 | 1024 | 51.40 | 66.50 | ❌ | ❌ | ❌ | ## Model Architecture - **Total Parameters**: 475M - **Active Parameters During Inference**: 305M - **Architecture Type**: Mixture of Experts (MoE) - **MoE Configuration**: 8 experts with top-2 routing - **Embedding Dimensions**: Supports flexible dimensions from 768 down to 256 through Matryoshka representation learning - **Maximum Sequence Length**: 512 tokens - **Languages**: Supports dozens of languages (see Performance section) ## Paper Abstract Transformer-based text embedding models have improved their performance on benchmarks like MIRACL and BEIR by increasing their parameter counts. However, this scaling approach introduces significant deployment challenges, including increased inference latency and memory usage. These challenges are particularly severe in retrieval-augmented generation (RAG) applications, where large models' increased memory requirements constrain dataset ingestion capacity, and their higher latency directly impacts query-time performance. While causal language models have addressed similar efficiency challenges using Mixture of Experts (MoE) architectures, this approach hasn't been successfully adapted to the general text embedding setting. In this paper, we introduce Nomic Embed v2, the first general purpose MoE text embedding model. Our model outperforms models in the same parameter class on both monolingual and multilingual benchmarks while also maintaining competitive performance with models twice its size. We open-source all code, models, and evaluation data to ensure full reproducibility of our training pipeline. ## Usage Guide ### Installation The model can be used through SentenceTransformers and Transformers. For best performance on GPU, please install > [!IMPORTANT] > **Important!** > The text prompt *must* include a *task instruction prefix*, instructing the model which task is being performed. Please use `search_query: ` before your queries/questions, and `search_document: ` before your documents. ### Transformers If using Transformers, **make sure to prepend the task instruction prefix**. For truncation, you can truncate before applying normalization. ### SentenceTransformers With SentenceTransformers, you can specify the `prompt_name` as either `query` or `passage`, and the task instruction will be included automatically.
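A minimal sketch under the assumptions above (prefixes prepended manually, `trust_remote_code=True` for the custom MoE architecture, and sentence-transformers' `truncate_dim` for Matryoshka truncation; treat the exact kwargs as assumptions rather than the card's own snippet):

```python
# Sketch: multilingual retrieval embeddings with nomic-embed-text-v2-moe,
# truncated from 768 to 256 dimensions via Matryoshka representation learning.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(
    "nomic-ai/nomic-embed-text-v2-moe",
    trust_remote_code=True,  # custom MoE architecture implementation
    truncate_dim=256,        # keep the first 256 of 768 dimensions
)

docs = ["search_document: Nomic Embed ist ein mehrsprachiges Einbettungsmodell."]
query = ["search_query: What is Nomic Embed?"]

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)
print(util.cos_sim(query_emb, doc_emb))
```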
For truncation/Matryoshka embeddings, you can specify `truncate_dim` and use the model similarly. ## Performance nomic-embed-text-v2-moe performance on BEIR and MIRACL compared to other open-weights embedding models: nomic-embed-text-v2-moe performance on BEIR at 768 dimensions and truncated to 256 dimensions: ## Best Practices - Add appropriate prefixes to your text: - For queries: \"search_query: \" - For documents: \"search_document: \" - Maximum input length is 512 tokens - For optimal efficiency, consider using the 256-dimension embeddings if storage/compute is a concern ## Limitations - Performance may vary across different languages - Resource requirements may be higher than traditional dense models due to the MoE architecture - Must use `trust_remote_code=True` when loading the model to use our custom architecture implementation ## Training Details - Trained on 1.6 billion high-quality pairs across multiple languages - Uses consistency filtering to ensure high-quality training data - Incorporates Matryoshka representation learning for dimension flexibility - Training includes both weakly-supervised contrastive pretraining and supervised finetuning For more details, please check out the blog post and technical report. ## Join the Nomic Community - Nomic: - Discord: - Twitter: # Citation If you find the model, dataset, or training code useful, please cite our work", + "model_explanation_gemini": "Generates high-performance multilingual text embeddings for retrieval tasks using a Mixture of Experts architecture, supporting flexible dimensions and ~100 languages." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0.json new file mode 100644 index 0000000000000000000000000000000000000000..7998243a2c6775255c716fdac2b6a7f9fbb0df6b --- /dev/null +++ b/data/model_data_json/nvidia_Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0.json @@ -0,0 +1,17 @@ +{ + "model_id": "nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0", + "downloads": 1102006, + "tags": [ + "peft", + "safetensors", + "text-classification", + "en", + "dataset:nvidia/Aegis-AI-Content-Safety-Dataset-1.0", + "arxiv:2307.09288", + "arxiv:2404.05993", + "license:llama2", + "region:us" + ], + "description": "--- license: llama2 datasets: - nvidia/Aegis-AI-Content-Safety-Dataset-1.0 language: - en metrics: - f1 library_name: peft pipeline_tag: text-classification --- # Model Card ## License The use of this model is governed by the Llama 2 Community License Agreement. ## Model Details Aegis-AI-Content-Safety-LlamaGuard-LLM-Defensive-1.0 is an LLM content safety model. It is a parameter-efficient instruction-tuned version of Llama Guard based on Llama2-7B, trained on Nvidia's Aegis Content Safety Dataset covering Nvidia's broad taxonomy of 13 critical safety risk categories. Paper Details: Aegis Content Moderation ### Model Description The Aegis-AI-Content-Safety-LlamaGuard-LLM-Defensive-1.0 model involves the following: 1. System instruction including the safety taxonomy, a safety policy with inclusions and exclusions. 2. The system prompt instructs the LLM to moderate a user prompt, partial dialog or full dialog. 3. The LLM response is a string which can be either safe or unsafe. If the string generated by the LLM is \"unsafe\", on a new line, the category ID of violation is output by the LLM based on the policy in the system prompt. 4.
Novel safety risk categories and policy can be provided in the instruction for the model to categorize using the novel taxonomy and policy. 5. The safety taxonomy and policy used to train the models contain 13 critically unsafe risk categories, a safe category and a \"needs caution\" category. 6. An internally annotated dataset called Aegis-AI-Content-Safety-Dataset-1.0 of approximately 11,000 prompts and responses is used to instruction tune the model. Annotations are at dialog level, not per turn. We have since collected 30,000 annotations in total on a further expanded taxonomy, and future versions of the models will be trained on the full set. 7. The model is instruction tuned with the safety instruction, with the LLM behaving as a classifier in this setting. PLEASE NOTE: The model has only been trained to perform prompt classification, since the annotations were not available at turn level. If you wish to use the model for response classification, use the template as provided below. # Prompt used for training and evaluation: **Output (Model Response)** - **Developed by:** Shaona Ghosh, Nvidia - **Model type:** Instruction tuned LLama2-7B - **License:** Llama 2 - **Finetuned from model:** Llama Guard ## Uses Ethical use: Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide your business decisions by following the guidelines in the Llama 2 Community License Agreement. ### Direct Use - The Aegis-AI-Content-Safety-LlamaGuard-LLM-Defensive-1.0 model is for users who want to safeguard or evaluate a general purpose LLM's generated content. Model and dataset restrictions: The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to. ### Downstream Use - Alternatively, the model can be used for performing toxicity classification for any text content, such as pre-training data, not exclusively limited to human-LLM interaction data - The model can be finetuned further with a custom safety policy and taxonomies. - Different adapter weights (used in conjunction with this model) can be used to enforce different safety tolerances. ## Bias, Risks, and Limitations Given the nature of the work, the model has been trained on critically unsafe data that includes social biases to be able to categorize the safety risks based on a broad safety risk taxonomy. However: - Even though we have performed exhaustive evaluation, the model can occasionally make errors in predicting the unsafe category. - Even though we have internally red-teamed the model (please see the paper for details), the safety guardrails of the model can be bypassed by adversarial prompts, and the underlying LLM may be prompted to generate unsafe text.
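The Model Description above fixes a simple output contract: the model returns "safe", or "unsafe" with the violated category ID on a new line. An illustrative parser for that contract (the function name and the sample category ID are assumptions, not part of the card):

```python
# Parse the moderation output format described in the Model Description:
# "safe", or "unsafe" followed by a category ID on the next line.
def parse_aegis_output(text: str):
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if lines and lines[0].lower() == "unsafe":
        # The violated category ID is emitted on a new line for unsafe inputs.
        category = lines[1] if len(lines) > 1 else None
        return {"safe": False, "category": category}
    return {"safe": True, "category": None}

print(parse_aegis_output("unsafe\nO3"))  # {'safe': False, 'category': 'O3'}
print(parse_aegis_output("safe"))        # {'safe': True, 'category': None}
```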
### Bias Field | Response :---------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------- Participation considerations from adversely impacted groups (protected classes) in model design and testing: | None of the Above Measures taken to mitigate against unwanted bias: | None of the Above ### Privacy Field | Response :----------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------- Generatable or reverse engineerable personally-identifiable information (PII)? | None Was consent obtained for any PII used? | Not Applicable PII used to create this model? | None Known How often is dataset reviewed? | During dataset creation, model training, evaluation and before release Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable If PII collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable If PII collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable If PII collected for the development of this AI model, was it minimized to only what was required? | Not Applicable Is there provenance for all datasets used in training? | Yes Does data labeling (annotation, metadata) comply with privacy laws? | Yes Is data compliant with data subject requests for data correction or removal, if such a request was made? | Not Applicable ### Recommendations We recommend that users monitor for the above risks before deploying the models. If you notice any concerns, please report to us immediately. ## How to Get Started with the Model - Download the original Llama Guard weights from Llama Guard after requesting access. - Use the transformers PEFT library for loading the adapter weights from this repository. - Format the prompt using the functions below (see the sketch after this section): ## How To Use in NVIDIA NeMo Curator NeMo Curator improves generative AI model accuracy by processing text, image, and video data at scale for training and customization. It also provides pre-built pipelines for generating synthetic data to customize and evaluate generative AI systems. The inference code for this model is available through the NeMo Curator GitHub repository. Check out this example notebook to get started. ## Training Details ### Training Data The model has been trained on Nvidia's Aegis Content Safety Dataset: * Human prompts from the Anthropic RLHF harmless dataset * LLM responses generated from Mistral-7B-v0.1 ***Labeling Method by dataset*** * Human **Properties** Trained on approximately 10,800 samples: user prompts alone, user prompt with a single-turn LLM response, and user prompt with a multi-turn LLM response. #### Training Hyperparameters * rank 16 * alpha 32 * Num of nodes 1 * Num of GPUs per node 8 * Learning rate 1e-06 ### Training Procedure We use the PEFT library from Hugging Face and the training and validation code from the Llama recipes repository. We use FSDP during training.
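Referring back to the "How to Get Started" steps above, a hypothetical sketch of loading the adapter weights with PEFT (the base-model repo id, dtype, and prompt placeholder are assumptions; the card's own prompt-formatting functions were stripped and are not reproduced here):

```python
# Sketch: load base Llama Guard weights, then attach the Aegis LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/LlamaGuard-7b"  # gated; request access first
adapter_id = "nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)  # apply adapter weights

prompt = "..."  # format with the safety taxonomy/system instruction per the card
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # "safe" or "unsafe" + category ID
```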
- **Training regime:** fp16 ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data The model has been evaluated on the following benchmarks: * Test partition of Nvidia's content safety dataset Aegis Content Safety Dataset * Toxic Chat Dataset * Open AI Moderation Dataset * SimpleSafetyTests Benchmark #### Metrics We report F1 and AUPRC scores for the model on the evaluation benchmarks. ### Results on Aegis Content Safety Test Set Model | AUPRC | F1 | ------------ |:-----------: |-----------: | Llama Guard Base |0.930 |0.62 | OpenAI Mod API |0.895 |0.34 | Perspective API |0.860 |0.24 | Llama Guard Defensive |0.941 |0.85 | ### Results on Toxic Chat Dataset Model | AUPRC | F1 | ------------ |:-----------: |-----------: | Llama Guard Base |0.664 |0.58 | OpenAI Mod API |0.588 |- | Perspective API |0.532 |- | Llama Guard Defensive |0.699 |0.64 | ### Results on Open AI Moderation Dataset Model | AUPRC | F1 | ------------ |:-----------: |-----------: | Llama Guard Base |0.845 |0.76 | OpenAI Mod API |0.856 |- | Perspective API |0.787 |- | Llama Guard Defensive |0.844 |0.68 | ### Results on Simple Safety Tests Benchmark Model | Accuracy | ------------ |:-----------: | Llama Guard Base |87% | Perspective API |72% | GPT4 |89% | Llama Guard Defensive |100% | ### Compute Infrastructure #### Supported Hardware: H100, A100 80GB, A100 40GB ## Citation **BibTeX:** ## Model Card Contact shaonag@nvidia.com", + "model_explanation_gemini": "Classifies text content as safe or unsafe based on 13 critical safety risk categories, providing violation details when unsafe." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_Cosmos-1.0-Diffusion-7B-Video2World.json b/data/model_data_json/nvidia_Cosmos-1.0-Diffusion-7B-Video2World.json new file mode 100644 index 0000000000000000000000000000000000000000..5f19fbeabd7846f124ac1dedb2d9b9b2af9b7032 --- /dev/null +++ b/data/model_data_json/nvidia_Cosmos-1.0-Diffusion-7B-Video2World.json @@ -0,0 +1,14 @@ +{ + "model_id": "nvidia/Cosmos-1.0-Diffusion-7B-Video2World", + "downloads": 244460, + "tags": [ + "cosmos", + "safetensors", + "nvidia", + "nemo", + "arxiv:2501.03575", + "license:other", + "region:us" + ], + "description": "--- license: other license_name: nvidia-open-model-license license_link: >- library_name: cosmos tags: - nvidia - nemo - cosmos extra_gated_prompt: >- # NVIDIA Open Model License Agreement Version Release Date: January 6, 2025 This NVIDIA Open Model License Agreement (the \"Agreement\") is a legal agreement between the Legal Entity You represent, or if no entity is identified, You and NVIDIA Corporation and its Affiliates (\"NVIDIA\") and governs Your use of the Models that NVIDIA provides to You under this Agreement. NVIDIA and You are each a \"party\" and collectively the \"parties.\" NVIDIA models released under this Agreement are intended to be used permissively and enable the further development of AI technologies. Subject to the terms of this Agreement, NVIDIA confirms that: * Models are commercially usable. * You are free to create and distribute Derivative Models. * NVIDIA does not claim ownership to any outputs generated using the Models or Model Derivatives. By using, reproducing, modifying, distributing, performing or displaying any portion or element of the Model or Derivative Model, or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement. ## 1. Definitions The following definitions apply to this Agreement: 1.1. 
\"NVIDIA Cosmos Model\" means a multimodal Model shared under this Agreement. 1.2. \"Derivative Model\" means all (a) modifications to the Model, (b) works based on the Model, and (c) any other derivative works of the Model. An output is not a Derivative Model. 1.3. \"Legal Entity\" means the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, \"control\" means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of fifty percent (50%) or more of the outstanding shares, or (c) beneficial ownership of such entity. 1.4. \"Model\" means the machine learning model, software, checkpoints, learnt weights, algorithms, parameters, configuration files and documentation shared under this Agreement. 1.5. \"You\" or \"Your\" means an individual or Legal Entity exercising permissions granted by this Agreement. ## 2. Conditions for Use, License Grant, AI Ethics and IP Ownership 2.1. Conditions for Use. The Model and any Derivative Model are subject to additional terms as described in Section 2 and Section 3 of this Agreement and govern Your use. If You institute copyright or patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model or a Derivative Model constitutes direct or contributory copyright or patent infringement, then any licenses granted to You under this Agreement for that Model or Derivative Model will terminate as of the date such litigation is filed. If You bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under this Agreement will automatically terminate. NVIDIA may update this Agreement to comply with legal and regulatory requirements at any time and You agree to either comply with any updated license or cease Your copying, use, and distribution of the Model and any Derivative Model. 2.2. License Grant. The rights granted herein are explicitly conditioned on Your full compliance with the terms of this Agreement. Subject to the terms and conditions of this Agreement, NVIDIA hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, revocable (as stated in Section 2.1) license to publicly perform, publicly display, reproduce, use, create derivative works of, make, have made, sell, offer for sale, distribute (through multiple tiers of distribution) and import the Model. 2.3. AI Ethics. Use of the Models under the Agreement must be consistent with NVIDIA's Trustworthy AI terms found at 2.4. NVIDIA owns the Model and any Model Derivatives created by NVIDIA. Subject to NVIDIA's underlying ownership rights in the Model or its Model Derivatives, You are and will be the owner of Your Model Derivatives. NVIDIA claims no ownership rights in outputs. You are responsible for outputs and their subsequent uses. Except as expressly granted in this Agreement, (a) NVIDIA reserves all rights, interests and remedies in connection with the Model and (b) no other license or right is granted to you by implication, estoppel or otherwise. ## 3. 
Redistribution You may reproduce and distribute copies of the Model or Derivative Models thereof in any medium, with or without modifications, provided that You meet the following conditions: 3.1. If you distribute the Model, You must give any other recipients of the Model a copy of this Agreement and include the following attribution notice within a \"Notice\" text file with such copies: \"Licensed by NVIDIA Corporation under the NVIDIA Open Model License\"; 3.2. If you distribute or make available a NVIDIA Cosmos Model, or a product or service (including an AI model) that contains or uses a NVIDIA Cosmos Model, use a NVIDIA Cosmos Model to create a Derivative Model, or use a NVIDIA Cosmos Model or its outputs to create, train, fine tune, or otherwise improve an AI model, you will include \"Built on NVIDIA Cosmos\" on a related website, user interface, blogpost, about page, or product documentation; and 3.3. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Models as a whole, provided Your use, reproduction, and distribution of the Model otherwise complies with the conditions stated in this Agreement. ## 4. Trademarks This Agreement does not grant permission to use the trade names, trademarks, service marks, or product names of NVIDIA, except as required for reasonable and customary use in describing the origin of the Model and reproducing the content of the \"Notice\" text file. ## **5. Disclaimer of Warranty** **Unless required by applicable law or agreed to in writing, NVIDIA provides the Model on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Model, Derivative Models and outputs and assume any risks associated with Your exercise of permissions under this Agreement.** ## **6. Limitation of Liability** **In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, will NVIDIA be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this Agreement or out of the use or inability to use the Model, Derivative Models or outputs (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if NVIDIA has been advised of the possibility of such damages.** ## 7. Indemnity You will indemnify and hold harmless NVIDIA from and against any claim by any third party arising out of or related to your use or distribution of the Model, Model Derivatives or outputs. ## 8. Feedback NVIDIA appreciates your feedback, and You agree that NVIDIA may use it without restriction or compensation to You. ## 9. Governing Law This Agreement will be governed in all respects by the laws of the United States and the laws of the State of Delaware, without regard to conflict of laws principles or the United Nations Convention on Contracts for the International Sale of Goods. 
The state and federal courts residing in Santa Clara County, California will have exclusive jurisdiction over any dispute or claim arising out of or related to this Agreement, and the parties irrevocably consent to personal jurisdiction and venue in those courts; except that, either party may apply for injunctive remedies or an equivalent type of urgent legal relief in any jurisdiction. ## 10. Trade and Compliance You agree to comply with all applicable export, import, trade and economic sanctions laws and regulations, as amended, including without limitation U.S. Export Administration Regulations and Office of Foreign Assets Control regulations. These laws include restrictions on destinations, end-users and end-use. extra_gated_fields: By clicking Submit below, I accept the terms of the NVIDIA Open Model License Agreement and acknowledge that I am an adult of legal age of majority in the country in which the Cosmos Models will be used and have authority to accept this Agreement: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the NVIDIA Privacy Policy. extra_gated_button_content: Submit --- # **Cosmos-1.0-Diffusion**: A Suite of Diffusion-based World Foundation Models **Cosmos** | **Code** | **Paper** | **Paper Website** # Model Overview ## Description: **Cosmos World Foundation Models**: A family of highly performant pre-trained world foundation models purpose-built for generating physics-aware videos and world states for physical AI development. The Cosmos diffusion models are a collection of diffusion-based world foundation models that generate dynamic, high-quality videos from text, image, or video inputs. They can serve as building blocks for various applications or research related to world generation. The models are ready for commercial use under the NVIDIA Open Model License agreement. **Model Developer**: NVIDIA ## Model Versions In the Cosmos 1.0 release, the Cosmos Diffusion WFM family includes the following models: - Cosmos-1.0-Diffusion-7B-Text2World - Given a text description, predict an output video of 121 frames. - Cosmos-1.0-Diffusion-14B-Text2World - Given a text description, predict an output video of 121 frames. - Cosmos-1.0-Diffusion-7B-Video2World - Given a text description and an image as the first frame, predict the future 120 frames. - Cosmos-1.0-Diffusion-14B-Video2World - Given a text description and an image as the first frame, predict the future 120 frames. ### License: This model is released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com. Under the NVIDIA Open Model License, NVIDIA confirms: * Models are commercially usable. * You are free to create and distribute Derivative Models. * NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models. **Important Note**: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, **safety guardrail** or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under the NVIDIA Open Model License Agreement will automatically terminate. * Cosmos-1.0-Guardrail is the safety guardrail for this model. ## Model Architecture: Cosmos-1.0-Diffusion-7B-Video2World is a diffusion transformer model designed for video denoising in the latent space.
The network is composed of interleaved self-attention, cross-attention and feedforward layers as its building blocks. The cross-attention layers allow the model to condition on input text throughout the denoising process. Before each layer, adaptive layer normalization is applied to embed the time information for denoising. When an image or video is provided as input, its latent frames are concatenated with the generated frames along the temporal dimension. Augmentation noise is added to the conditional latent frames to bridge the training and inference gap. ## Input/Output Specifications * **Input** * **Input Type(s)**: Text+Image, Text+Video * **Input Format(s)**: * Text: String * Image: jpg, png, jpeg, webp * Video: mp4 * **Input Parameters**: * Text: One-dimensional (1D) * Image: Two-dimensional (2D) * Video: Three-dimensional (3D) * **Other Properties Related to Input**: * The input string should contain fewer than 300 words and should provide descriptive content for world generation, such as a scene description, key objects or characters, background, and any specific actions or motions to be depicted within the 5-second duration. * The input image should be of 1280x704 resolution. * The input video should be of 1280x704 resolution and 9 input frames. * **Output** * **Output Type(s)**: Video * **Output Format(s)**: mp4 * **Output Parameters**: Three-dimensional (3D) * **Other Properties Related to Output**: By default, the generated video is a 5-second clip with a resolution of 1280x704 pixels and a frame rate of 24 frames per second (fps). The video content visualizes the input text description as a short animated scene, capturing key elements within the specified time constraints. Aspect ratios and resolutions are configurable, with options including 1:1 (960x960 pixels), 4:3 (960x704 pixels), 3:4 (704x960 pixels), 16:9 (1280x704 pixels), and 9:16 (704x1280 pixels). The frame rate is also adjustable within a range of 12 to 40 fps. ## Software Integration **Runtime Engine(s):** * Cosmos **Supported Hardware Microarchitecture Compatibility:** * NVIDIA Blackwell * NVIDIA Hopper * NVIDIA Ampere **Note**: We have only tested inference with BF16 precision. **Operating System(s):** * Linux (We have not tested on other operating systems.) # Usage * See Cosmos for details. # Evaluation Please see our technical paper for detailed evaluations. ## Inference Time and GPU Memory Usage The numbers provided below may vary depending on system specs and are for reference only. | Offloading Strategy | 7B Video2World | 14B Video2World | |----------------------------------------------------------------------------------|---------|---------| | Offload prompt upsampler | 76.5 GB | > 80.0 GB | | Offload prompt upsampler & guardrails | 59.9 GB | 73.3 GB | | Offload prompt upsampler & guardrails & T5 encoder | 41.3 GB | 54.8 GB | | Offload prompt upsampler & guardrails & T5 encoder & tokenizer | 41.1 GB | 54.5 GB | | Offload prompt upsampler & guardrails & T5 encoder & tokenizer & diffusion model | 27.3 GB | 39.0 GB | The following table shows the end-to-end inference runtime on a single H100 GPU, excluding model initialization time: | 7B Video2World (offload prompt upsampler) | 14B Video2World (offload prompt upsampler, guardrails) | |---------|---------| | ~383 seconds | ~593 seconds | ## Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.
When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the subcards of Explainability, Bias, Safety & Security, and Privacy below. Please report security vulnerabilities or NVIDIA AI Concerns here. ### Plus Plus (++) Promise We value you, the datasets, the diversity they represent, and what we have been entrusted with. This model and its associated data have been: * Verified to comply with current applicable disclosure laws, regulations, and industry standards. * Verified to comply with applicable privacy labeling requirements. * Annotated to describe the collector/source (NVIDIA or a third-party). * Characterized for technical limitations. * Reviewed to ensure proper disclosure is accessible to, maintained for, and in compliance with NVIDIA data subjects and their requests. * Reviewed before release. * Tagged for known restrictions and potential safety implications. ### Bias Field | Response :---------------------------------------------------------------------------------------------------|:--------------- Participation considerations from adversely impacted groups protected classes in model design and testing: | None Measures taken to mitigate against unwanted bias: | None ### Explainability Field | Response :------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------- Intended Application & Domain: | World Generation Model Type: | Transformer Intended Users: | Physical AI developers Output: | Videos Describe how the model works: | Generates videos based on video inputs Technical Limitations: | The model may not follow the video input accurately. Verified to have met prescribed NVIDIA quality standards: | Yes Performance Metrics: | Quantitative and Qualitative Evaluation Potential Known Risks: | The model's output can generate all forms of videos, including what may be considered toxic, offensive, or indecent. Licensing: | NVIDIA Open Model License ### Privacy Field | Response :----------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------- Generatable or reverse engineerable personal information? | None Known Protected class data used to create this model? | None Known Was consent obtained for any personal data used? | None Known How often is dataset reviewed? | Before Release Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable Is there provenance for all datasets used in training? | Yes Does data labeling (annotation, metadata) comply with privacy laws? | Yes Is data compliant with data subject requests for data correction or removal, if such a request was made? 
| Not Applicable ### Safety Field | Response :---------------------------------------------------|:---------------------------------- Model Application(s): | World Generation Describe the life critical impact (if present). | None Known Use Case Restrictions: | NVIDIA Open Model License Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. Model checkpoints are made available on Hugging Face, and may become available on cloud providers' model catalog." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_Llama-3_1-Nemotron-51B-Instruct.json b/data/model_data_json/nvidia_Llama-3_1-Nemotron-51B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..785cf45ecc823a91eb11649932d0d2844902784d --- /dev/null +++ b/data/model_data_json/nvidia_Llama-3_1-Nemotron-51B-Instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "nvidia/Llama-3_1-Nemotron-51B-Instruct", + "downloads": 102385, + "tags": [ + "transformers", + "safetensors", + "nemotron-nas", + "text-generation", + "nvidia", + "llama-3", + "pytorch", + "conversational", + "custom_code", + "en", + "arxiv:2306.05685", + "arxiv:2009.03300", + "arxiv:2110.14168", + "arxiv:2404.05993", + "license:other", + "autotrain_compatible", + "region:us" + ], + "description": "--- library_name: transformers pipeline_tag: text-generation language: - en tags: - nvidia - llama-3 - pytorch license: other license_name: nvidia-open-model-license license_link: >- --- # Llama-3_1-Nemotron-51B-instruct ## Model Overview Llama-3_1-Nemotron-51B-instruct is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to price, providing great ‘quality-per-dollar’. Using a novel Neural Architecture Search (NAS) approach we greatly reduce the model’s memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H100-80GB). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. This model is ready for commercial use. ## License Your use of this model is governed by the NVIDIA Open Model License. Additional Information: Llama 3.1 Community License Agreement. Built with Llama. ## How was the model developed Llama-3_1-Nemotron-51B-instruct is a large language model (LLM) which is a derivative of Llama-3.1-70B-instruct (AKA the reference model). We utilize a block-wise distillation of the reference model, where for each block we create multiple variants providing different tradeoffs of quality vs. computational complexity. We then search over the blocks to create a model which meets the required throughput and memory (optimized for a single H100-80GB GPU) while minimizing the quality degradation. The model then undergoes knowledge distillation (KD), with a focus on English single and multi-turn chat use-cases. The KD step included 40 billion tokens consisting of a mixture of 3 datasets - FineWeb, Buzz-V1.2 and Dolma. Links to NIM, blog and huggingface This results in a final model that is aligned for human chat preferences. 
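To make the block-wise search described above concrete, the toy sketch below scores per-block variants by distillation quality loss and cost, then greedily spends a cost budget where it buys the most quality. This is our illustration of the general idea only, not NVIDIA's actual NAS procedure; all names and numbers are hypothetical.

```python
# Toy illustration of block-wise variant selection under a cost budget.
from dataclasses import dataclass

@dataclass
class BlockVariant:
    name: str            # e.g. "full_attn", "linear_attn", "skip_attn"
    quality_loss: float  # distillation loss vs. the reference block
    cost: float          # proxy for memory/latency

def select_blocks(candidates: list[list[BlockVariant]], budget: float) -> list[BlockVariant]:
    """Greedy: start from the cheapest variant per block, then spend the
    remaining budget where it buys the largest quality gain per unit cost."""
    chosen = [min(block, key=lambda v: v.cost) for block in candidates]
    spent = sum(v.cost for v in chosen)
    improved = True
    while improved:
        improved = False
        best_ratio, best_idx, best_var = 0.0, None, None
        for i, block in enumerate(candidates):
            for v in block:
                extra = v.cost - chosen[i].cost
                gain = chosen[i].quality_loss - v.quality_loss
                if extra > 0 and spent + extra <= budget and gain / extra > best_ratio:
                    best_ratio, best_idx, best_var = gain / extra, i, v
        if best_idx is not None:
            spent += best_var.cost - chosen[best_idx].cost
            chosen[best_idx] = best_var
            improved = True
    return chosen

variants = [
    [BlockVariant("full_attn", 0.00, 3.0), BlockVariant("skip_attn", 0.40, 1.0)],
    [BlockVariant("full_ffn", 0.00, 2.0), BlockVariant("narrow_ffn", 0.15, 1.0)],
]
print([v.name for v in select_blocks(variants, budget=4.0)])  # ['full_attn', 'narrow_ffn']
```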
**Model Developers:** NVIDIA **Model Input:** Text only **Model Output:** Text only **Model Dates:** Llama-3_1-Nemotron-51B-instruct was trained between August and September 2024 **Data Freshness:** The pretraining data has a cutoff of 2023 **Sequence Length Used During Distillation:** 8192 ## Quick Start Our code requires the transformers package version to be 4.44.2 or higher. See the usage sketch after the Software Integration section below for loading the model with transformers. ## Required Hardware FP8 Inference (recommended): - 1x H100-80GB GPU BF16 Inference: - 2x H100-80GB GPUs - 2x A100-80GB GPUs ## Model Architecture The model is a derivative of Llama-3.1-70B, using Neural Architecture Search (NAS). The NAS algorithm results in non-standard and non-repetitive blocks. This includes the following: * Variable Grouped Query Attention (VGQA) - each block can have a different number of KV (keys and values) heads, ranging from 1 to Llama’s typical 8. * Skip attention - in some blocks the attention is skipped entirely, or replaced with a single linear layer. * Variable FFN - the expansion/compression ratio in the FFN layer is different between blocks. **Architecture Type:** Transformer Decoder (auto-regressive language model) ## Software Integration **Runtime Engine(s):** * NeMo 24.05
**Supported Hardware Architecture Compatibility:** NVIDIA H100, A100 80GB (BF16 quantization). **[Preferred/Supported] Operating System(s):**
* Linux
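As referenced in the Quick Start above, a minimal loading sketch with Hugging Face transformers follows. The prompt and generation settings are illustrative assumptions; `trust_remote_code=True` reflects the custom nemotron-nas architecture this repository ships.

```python
# Minimal sketch of transformers usage for the Quick Start above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_1-Nemotron-51B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 inference needs 2x 80GB GPUs (see above)
    device_map="auto",
    trust_remote_code=True,      # custom nemotron-nas modeling code
)

messages = [{"role": "user", "content": "Explain grouped query attention in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```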
## Intended use Llama-3_1-Nemotron-51B-Instruct is a general purpose chat model intended to be used in English and coding languages. Other non-English languages are also supported. ## Evaluation Results **Data Collection Method by dataset**
* Automated
### MT-Bench Evaluated using select datasets from Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. MT-Bench score: 8.99 ### MMLU Evaluated using the Multi-task Language Understanding benchmarks as introduced in Measuring Massive Multitask Language Understanding |MMLU (5-shot) | | :----------------- | | 80.2% | ### GSM8K Evaluated using the Grade School Math 8K (GSM8K) benchmark as introduced in Training Verifiers to Solve Math Word Problems |GSM8K (5-shot) | | :----------------- | | 91.43% | ### Winogrande |Winogrande (5-shot) | | :----------------- | | 84.53% | ### Arc-C |Arc challenge (25-shot) | | :----------------- | | 69.20% | ### Hellaswag |Hellaswag (10-shot) | | :----------------- | | 85.58% | ### Truthful QA |TruthfulQA (0-shot) | | :----------------- | | 58.63% | ## Limitations The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable output, even if the prompt itself does not include anything explicitly offensive. The model also demonstrates weakness against alignment-breaking attacks. Users are advised to deploy language model guardrails alongside this model to prevent potentially harmful outputs. ## Adversarial Testing and Red Teaming Efforts The Llama-3_1-Nemotron-51B-instruct model underwent extensive safety evaluation, including adversarial testing via three distinct methods: * Garak, an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage. * AEGIS, a content safety evaluation dataset and LLM-based content safety classifier model that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions. * Human Content Red Teaming, leveraging human interaction and evaluation of the model's responses. ## Inference **Engine:** TensorRT
**Test Hardware** H100-80GB
## Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.", + "model_explanation_gemini": "Llama-3_1-Nemotron-51B-Instruct is an efficient, single-GPU-optimized large language model designed for English and coding-focused chat applications, derived from Llama-3.1-70B via neural architecture search and knowledge distillation." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_Llama-3_3-Nemotron-Super-49B-v1.json b/data/model_data_json/nvidia_Llama-3_3-Nemotron-Super-49B-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..0a6f479f840b5a8cf0b3cb1f71e7ab206127ac47 --- /dev/null +++ b/data/model_data_json/nvidia_Llama-3_3-Nemotron-Super-49B-v1.json @@ -0,0 +1,23 @@ +{ + "model_id": "nvidia/Llama-3_3-Nemotron-Super-49B-v1", + "downloads": 152068, + "tags": [ + "transformers", + "safetensors", + "nemotron-nas", + "text-generation", + "nvidia", + "llama-3", + "pytorch", + "conversational", + "custom_code", + "en", + "arxiv:2411.19146", + "arxiv:2502.00203", + "license:other", + "autotrain_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: other license_name: nvidia-open-model-license license_link: >- pipeline_tag: text-generation language: - en tags: - nvidia - llama-3 - pytorch --- # Llama-3.3-Nemotron-Super-49B-v1 ## Model Overview Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.3-70B-Instruct (AKA the *reference model*). It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens. Llama-3.3-Nemotron-Super-49B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model’s memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. For more information on the NAS approach, please refer to this paper. The model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, and Tool Calling as well as multiple reinforcement learning (RL) stages using REINFORCE (RLOO) and Online Reward-aware Preference Optimization (RPO) algorithms for both chat and instruction-following. The final model checkpoint is obtained after merging the final SFT and Online RPO checkpoints. For more details on how the model was trained, please see this blog. !Training Process This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here: - Llama-3.1-Nemotron-Nano-8B-v1 - Llama-3.1-Nemotron-Ultra-253B-v1 This model is ready for commercial use. ## License/Terms of Use GOVERNING TERMS: Your use of this model is governed by the NVIDIA Open Model License. 
\\ Additional Information: Llama 3.3 Community License Agreement. Built with Llama. **Model Developer:** NVIDIA **Model Dates:** Trained between November 2024 and February 2025 **Data Freshness:** The pretraining data has a cutoff of 2023 per Meta Llama 3.3 70B ### Use Case:
Developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. Also suitable for typical instruction-following tasks.
### Release Date:
3/18/2025
## References * [2411.19146] Puzzle: Distillation-Based NAS for Inference-Optimized LLMs * [2502.00203] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment ## Model Architecture **Architecture Type:** Dense decoder-only Transformer model \\ **Network Architecture:** Llama 3.3 70B Instruct, customized through Neural Architecture Search (NAS) The model is a derivative of Meta’s Llama-3.3-70B-Instruct, using Neural Architecture Search (NAS). The NAS algorithm results in non-standard and non-repetitive blocks. This includes the following: * Skip attention: In some blocks, the attention is skipped entirely, or replaced with a single linear layer. * Variable FFN: The expansion/compression ratio in the FFN layer is different between blocks. We utilize a block-wise distillation of the reference model, where for each block we create multiple variants providing different tradeoffs of quality vs. computational complexity, discussed in more depth below. We then search over the blocks to create a model which meets the required throughput and memory (optimized for a single H100-80GB GPU) while minimizing the quality degradation. The model then undergoes knowledge distillation (KD), with a focus on English single- and multi-turn chat use-cases. The KD step included 40 billion tokens consisting of a mixture of 3 datasets - FineWeb, Buzz-V1.2 and Dolma. ## Intended use Llama-3.3-Nemotron-Super-49B-v1 is a general-purpose reasoning and chat model intended to be used in English and coding languages. Other non-English languages (German, French, Italian, Portuguese, Hindi, Spanish, and Thai) are also supported. ## Input - **Input Type:** Text - **Input Format:** String - **Input Parameters:** One-Dimensional (1D) - **Other Properties Related to Input:** Context length up to 131,072 tokens ## Output - **Output Type:** Text - **Output Format:** String - **Output Parameters:** One-Dimensional (1D) - **Other Properties Related to Output:** Context length up to 131,072 tokens ## Model Version 1.0 (3/18/2025) ## Software Integration - **Runtime Engine:** Transformers - **Recommended Hardware Microarchitecture Compatibility:** - NVIDIA Hopper - NVIDIA Ampere ## Quick Start and Usage Recommendations: 1. Reasoning mode (ON/OFF) is controlled via the system prompt, which must be set as shown in the example below. All instructions should be contained within the user prompt. 2. We recommend setting temperature to , and Top P to for Reasoning ON mode. 3. We recommend using greedy decoding for Reasoning OFF mode. 4. We have provided a list of prompts to use for evaluation for each benchmark where a specific template is required. 5. The model will include if no reasoning was necessary in Reasoning ON mode; this is expected behaviour. You can try this model out through the preview API, using this link: Llama-3_3-Nemotron-Super-49B-v1. ### Use It with Transformers See the snippet below for usage with the Hugging Face Transformers library. Reasoning mode (ON/OFF) is controlled via the system prompt. Please see the example below. We recommend using the *transformers* package with version 4.48.3. Example of reasoning on: Example of reasoning off: ### Use It with vLLM An example of how to serve with vLLM: ## Inference: **Engine:** - Transformers **Test Hardware:** - FP8: 1x NVIDIA H100-80GB GPU (Coming Soon!) - BF16: - 2x NVIDIA H100-80GB - 2x NVIDIA A100-80GB GPUs **[Preferred/Supported] Operating System(s):** Linux
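The reasoning ON/OFF examples referenced above were stripped from this card; the following is a hedged sketch of the described mechanism. The exact system-prompt strings and sampling values were elided here, so the ones below ("detailed thinking on/off", temperature 0.6, top_p 0.95) are placeholder assumptions to verify against the original model card.

```python
# Hedged sketch of reasoning-mode control via the system prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_3-Nemotron-Super-49B-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

def generate(system_prompt: str, user_prompt: str, **gen_kwargs) -> str:
    messages = [
        {"role": "system", "content": system_prompt},  # assumed reasoning toggle
        {"role": "user", "content": user_prompt},      # all instructions go here
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=1024, **gen_kwargs)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# Reasoning ON: sampling; the temperature/top_p values were elided from the
# card, so 0.6/0.95 here are placeholder assumptions.
print(generate("detailed thinking on", "Solve x^2 - 5x + 6 = 0.",
               do_sample=True, temperature=0.6, top_p=0.95))

# Reasoning OFF: greedy decoding, as recommended above.
print(generate("detailed thinking off", "Solve x^2 - 5x + 6 = 0.", do_sample=False))
```

For vLLM, whose serving example was likewise stripped, a typical invocation would be something like `vllm serve nvidia/Llama-3_3-Nemotron-Super-49B-v1 --trust-remote-code` (again our assumption, not the card's original command).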
## Training Datasets A large variety of training data was used for the knowledge distillation phase before the post-training pipeline; three of the datasets included were FineWeb, Buzz-V1.2, and Dolma. The data for the multi-stage post-training phases for improvements in Code, Math, and Reasoning is a compilation of SFT and RL data that supports improvements of the math, code, general reasoning, and instruction-following capabilities of the original Llama instruct model. In conjunction with this model release, NVIDIA has released 30M samples of post-training data as public and permissive. Please see Llama-Nemotron-Postraining-Dataset-v1. Distribution of the domains is as follows: | Category | Value | |----------|-----------| | math | 19,840,970| | code | 9,612,677 | | science | 708,920 | | instruction following | 56,339 | | chat | 39,792 | | safety | 31,426 | Prompts have been sourced either from public and open corpora or synthetically generated. Responses were synthetically generated by a variety of models, with some prompts containing responses for both reasoning-on and reasoning-off modes, to train the model to distinguish between the two modes. **Data Collection for Training Datasets:** - Hybrid: Automated, Human, Synthetic **Data Labeling for Training Datasets:** - Hybrid: Automated, Human, Synthetic ## Evaluation Datasets We used the datasets listed below to evaluate Llama-3.3-Nemotron-Super-49B-v1. Data Collection for Evaluation Datasets: - Hybrid: Human/Synthetic Data Labeling for Evaluation Datasets: - Hybrid: Human/Synthetic/Automatic ## Evaluation Results These results contain both “Reasoning On” and “Reasoning Off” modes. We recommend using temperature=, top_p= for “Reasoning On” mode, and greedy decoding for “Reasoning Off” mode. All evaluations are done with a 32k sequence length. We run the benchmarks up to 16 times and average the scores to be more accurate. > NOTE: Where applicable, a Prompt Template will be provided. While completing benchmarks, please ensure that you are parsing for the correct output format as per the provided prompt in order to reproduce the benchmarks seen below. ### Arena-Hard | Reasoning Mode | Score | |--------------|------------| | Reasoning Off | 88.3 | ### MATH500 | Reasoning Mode | pass@1 | |--------------|------------| | Reasoning Off | 74.0 | | Reasoning On | 96.6 | User Prompt Template: ### AIME25 | Reasoning Mode | pass@1 | |--------------|------------| | Reasoning Off | 13.33 | | Reasoning On | 58.4 | User Prompt Template: ### GPQA | Reasoning Mode | pass@1 | |--------------|------------| | Reasoning Off | 50 | | Reasoning On | 66.67 | User Prompt Template: ### IFEval | Reasoning Mode | Strict:Instruction | |--------------|------------| | Reasoning Off | 89.21 | ### BFCL V2 Live | Reasoning Mode | Score | |--------------|------------| | Reasoning Off | 73.7 | User Prompt Template: ### MBPP 0-shot | Reasoning Mode | pass@1 | |--------------|------------| | Reasoning Off | 84.9| | Reasoning On | 91.3 | User Prompt Template: ### MT-Bench | Reasoning Mode | Score | |--------------|------------| | Reasoning Off | 9.17 | ## Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.", + "model_explanation_gemini": "A large language model optimized for reasoning, chat preferences, and tasks like RAG and tool calling, with a 128K token context length, balancing accuracy and efficiency through Neural Architecture Search." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_bigvgan_v2_22khz_80band_256x.json b/data/model_data_json/nvidia_bigvgan_v2_22khz_80band_256x.json new file mode 100644 index 0000000000000000000000000000000000000000..d5d947cdc44f700629a9435ca8058392734bbe9a --- /dev/null +++ b/data/model_data_json/nvidia_bigvgan_v2_22khz_80band_256x.json @@ -0,0 +1,15 @@ +{ + "model_id": "nvidia/bigvgan_v2_22khz_80band_256x", + "downloads": 424419, + "tags": [ + "PyTorch", + "neural-vocoder", + "audio-generation", + "audio-to-audio", + "arxiv:2206.04658", + "license:mit", + "region:us" + ], + "description": "--- license: mit license_link: tags: - neural-vocoder - audio-generation library_name: PyTorch pipeline_tag: audio-to-audio --- ## BigVGAN: A Universal Neural Vocoder with Large-Scale Training #### Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon [[Paper]]( - [[Code]]( - [[Showcase]]( - [[Project Page]]( - [[Weights]]( - [[Demo]]( :** - General refactor and code improvements for improved readability. - Fully fused CUDA kernel of anti-aliased activation (upsampling + activation + downsampling) with inference speed benchmark. - **Jul 2024 (v2.2):** The repository now includes an interactive local demo using gradio. - **Jul 2024 (v2.1):** BigVGAN is now integrated with 🤗 Hugging Face Hub with easy access to inference using pretrained checkpoints. We also provide an interactive demo on Hugging Face Spaces. - **Jul 2024 (v2):** We release BigVGAN-v2 along with pretrained checkpoints. Below are the highlights: - Custom CUDA kernel for inference: we provide a fused upsampling + activation kernel written in CUDA for accelerated inference speed. Our test shows 1.5 - 3x faster speed on a single A100 GPU. - Improved discriminator and loss: BigVGAN-v2 is trained using a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss. - Larger training data: BigVGAN-v2 is trained using datasets containing diverse audio types, including speech in multiple languages, environmental sounds, and instruments. - We provide pretrained checkpoints of BigVGAN-v2 using diverse audio configurations, supporting up to 44 kHz sampling rate and 512x upsampling ratio. ## Installation This repository contains pretrained BigVGAN checkpoints with easy access to inference and additional support. If you are interested in training the model and additional functionalities, please visit the official GitHub repository for more information: ## Usage The example below describes how you can use BigVGAN: load the pretrained BigVGAN generator from the Hugging Face Hub, compute a mel spectrogram from an input waveform, and generate a synthesized waveform using the mel spectrogram as the model's input. ## Using Custom CUDA Kernel for Synthesis You can apply the fast CUDA inference kernel by using a parameter when instantiating BigVGAN: When applied for the first time, it builds the kernel using and . If the build succeeds, the kernel is saved to and the model automatically loads the kernel. The codebase has been tested using CUDA .
Please make sure that both are installed in your system and that the version installed in your system matches the version your PyTorch build is using. For details, see the official GitHub repository: ## Pretrained Models We provide the pretrained models on Hugging Face Collections. One can download the checkpoints of the generator weight (named ) and its discriminator/optimizer states (named ) within the listed model repositories. | Model Name | Sampling Rate | Mel band | fmax | Upsampling Ratio | Params | Dataset | Steps | Fine-Tuned | |:--------------------------------------:|:-------------:|:--------:|:-----:|:----------------:|:------:|:--------------------------:|:-----:|:----------:| | bigvgan_v2_44khz_128band_512x | 44 kHz | 128 | 22050 | 512 | 122M | Large-scale Compilation | 5M | No | | bigvgan_v2_44khz_128band_256x | 44 kHz | 128 | 22050 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_v2_24khz_100band_256x | 24 kHz | 100 | 12000 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_v2_22khz_80band_256x | 22 kHz | 80 | 11025 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_v2_22khz_80band_fmax8k_256x | 22 kHz | 80 | 8000 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_24khz_100band | 24 kHz | 100 | 12000 | 256 | 112M | LibriTTS | 5M | No | | bigvgan_base_24khz_100band | 24 kHz | 100 | 12000 | 256 | 14M | LibriTTS | 5M | No | | bigvgan_22khz_80band | 22 kHz | 80 | 8000 | 256 | 112M | LibriTTS + VCTK + LJSpeech | 5M | No | | bigvgan_base_22khz_80band | 22 kHz | 80 | 8000 | 256 | 14M | LibriTTS + VCTK + LJSpeech | 5M | No |", + "model_explanation_gemini": "Generates high-quality audio waveforms from mel spectrograms using a large-scale neural vocoder trained on diverse audio data." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_bigvgan_v2_44khz_128band_512x.json b/data/model_data_json/nvidia_bigvgan_v2_44khz_128band_512x.json new file mode 100644 index 0000000000000000000000000000000000000000..e8655b08cb251be3194b8e5c925a3f6ab2f2aaab --- /dev/null +++ b/data/model_data_json/nvidia_bigvgan_v2_44khz_128band_512x.json @@ -0,0 +1,15 @@ +{ + "model_id": "nvidia/bigvgan_v2_44khz_128band_512x", + "downloads": 199858, + "tags": [ + "PyTorch", + "neural-vocoder", + "audio-generation", + "audio-to-audio", + "arxiv:2206.04658", + "license:mit", + "region:us" + ], + "description": "--- license: mit license_link: tags: - neural-vocoder - audio-generation library_name: PyTorch pipeline_tag: audio-to-audio --- ## BigVGAN: A Universal Neural Vocoder with Large-Scale Training #### Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon [[Paper]]( - [[Code]]( - [[Showcase]]( - [[Project Page]]( - [[Weights]]( - [[Demo]]( :** - General refactor and code improvements for improved readability. - Fully fused CUDA kernel of anti-aliased activation (upsampling + activation + downsampling) with inference speed benchmark. - **Jul 2024 (v2.2):** The repository now includes an interactive local demo using gradio. - **Jul 2024 (v2.1):** BigVGAN is now integrated with 🤗 Hugging Face Hub with easy access to inference using pretrained checkpoints. We also provide an interactive demo on Hugging Face Spaces. - **Jul 2024 (v2):** We release BigVGAN-v2 along with pretrained checkpoints. Below are the highlights: - Custom CUDA kernel for inference: we provide a fused upsampling + activation kernel written in CUDA for accelerated inference speed.
Our test shows 1.5 - 3x faster speed on a single A100 GPU. - Improved discriminator and loss: BigVGAN-v2 is trained using a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss. - Larger training data: BigVGAN-v2 is trained using datasets containing diverse audio types, including speech in multiple languages, environmental sounds, and instruments. - We provide pretrained checkpoints of BigVGAN-v2 using diverse audio configurations, supporting up to 44 kHz sampling rate and 512x upsampling ratio. ## Installation This repository contains pretrained BigVGAN checkpoints with easy access to inference and additional support. If you are interested in training the model and additional functionalities, please visit the official GitHub repository for more information: ## Usage The example below describes how you can use BigVGAN: load the pretrained BigVGAN generator from the Hugging Face Hub, compute a mel spectrogram from an input waveform, and generate a synthesized waveform using the mel spectrogram as the model's input. ## Using Custom CUDA Kernel for Synthesis You can apply the fast CUDA inference kernel by using a parameter when instantiating BigVGAN: When applied for the first time, it builds the kernel using and . If the build succeeds, the kernel is saved to and the model automatically loads the kernel. The codebase has been tested using CUDA . Please make sure that both are installed in your system and that the version installed in your system matches the version your PyTorch build is using. For details, see the official GitHub repository: ## Pretrained Models We provide the pretrained models on Hugging Face Collections. One can download the checkpoints of the generator weight (named ) and its discriminator/optimizer states (named ) within the listed model repositories. | Model Name | Sampling Rate | Mel band | fmax | Upsampling Ratio | Params | Dataset | Steps | Fine-Tuned | |:--------------------------------------:|:-------------:|:--------:|:-----:|:----------------:|:------:|:--------------------------:|:-----:|:----------:| | bigvgan_v2_44khz_128band_512x | 44 kHz | 128 | 22050 | 512 | 122M | Large-scale Compilation | 5M | No | | bigvgan_v2_44khz_128band_256x | 44 kHz | 128 | 22050 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_v2_24khz_100band_256x | 24 kHz | 100 | 12000 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_v2_22khz_80band_256x | 22 kHz | 80 | 11025 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_v2_22khz_80band_fmax8k_256x | 22 kHz | 80 | 8000 | 256 | 112M | Large-scale Compilation | 5M | No | | bigvgan_24khz_100band | 24 kHz | 100 | 12000 | 256 | 112M | LibriTTS | 5M | No | | bigvgan_base_24khz_100band | 24 kHz | 100 | 12000 | 256 | 14M | LibriTTS | 5M | No | | bigvgan_22khz_80band | 22 kHz | 80 | 8000 | 256 | 112M | LibriTTS + VCTK + LJSpeech | 5M | No | | bigvgan_base_22khz_80band | 22 kHz | 80 | 8000 | 256 | 14M | LibriTTS + VCTK + LJSpeech | 5M | No |", + "model_explanation_gemini": "Generates high-quality audio waveforms from mel spectrograms using a neural vocoder trained on diverse audio data at 44 kHz sampling rate with 512x upsampling."
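Both BigVGAN cards above describe the same inference flow, but their inline snippets were stripped. The following is a minimal sketch of that flow under stated assumptions: the `bigvgan` and `meldataset` modules from the official BigVGAN GitHub repository are importable, and the input file name and `use_cuda_kernel=False` choice are illustrative.

```python
# Hedged sketch: load the generator from the Hub, compute a mel spectrogram,
# and synthesize a waveform. Assumes the official BigVGAN repo is on PYTHONPATH.
import torch
import librosa
import bigvgan
from meldataset import get_mel_spectrogram

device = "cuda"
model = bigvgan.BigVGAN.from_pretrained(
    "nvidia/bigvgan_v2_44khz_128band_512x", use_cuda_kernel=False
)
model.remove_weight_norm()   # fold weight norm for inference
model = model.eval().to(device)

# Load mono audio at the model's sampling rate (44 kHz for this checkpoint).
wav, sr = librosa.load("example.wav", sr=model.h.sampling_rate, mono=True)
wav = torch.FloatTensor(wav).unsqueeze(0)           # [1, T_time]

mel = get_mel_spectrogram(wav, model.h).to(device)  # [1, n_mels, T_frames]

with torch.inference_mode():
    wav_gen = model(mel)                            # [1, 1, T_time]
print(wav_gen.squeeze().cpu().shape)
```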
+} \ No newline at end of file diff --git a/data/model_data_json/nvidia_canary-1b-flash.json b/data/model_data_json/nvidia_canary-1b-flash.json new file mode 100644 index 0000000000000000000000000000000000000000..bf04a0eb68f8805655ca5fe60f2a65cac0024caa --- /dev/null +++ b/data/model_data_json/nvidia_canary-1b-flash.json @@ -0,0 +1,43 @@ +{ + "model_id": "nvidia/canary-1b-flash", + "downloads": 143141, + "tags": [ + "nemo", + "automatic-speech-recognition", + "automatic-speech-translation", + "speech", + "audio", + "Transformer", + "FastConformer", + "Conformer", + "pytorch", + "NeMo", + "hf-asr-leaderboard", + "en", + "de", + "es", + "fr", + "dataset:librispeech_asr", + "dataset:fisher_corpus", + "dataset:Switchboard-1", + "dataset:WSJ-0", + "dataset:WSJ-1", + "dataset:National-Singapore-Corpus-Part-1", + "dataset:National-Singapore-Corpus-Part-6", + "dataset:vctk", + "dataset:voxpopuli", + "dataset:europarl", + "dataset:multilingual_librispeech", + "dataset:mozilla-foundation/common_voice_8_0", + "dataset:MLCommons/peoples_speech", + "arxiv:2104.02821", + "arxiv:2503.05931", + "arxiv:1706.03762", + "arxiv:2409.13523", + "license:cc-by-4.0", + "model-index", + "region:us" + ], + "description": "--- license: cc-by-4.0 language: - en - de - es - fr library_name: nemo datasets: - librispeech_asr - fisher_corpus - Switchboard-1 - WSJ-0 - WSJ-1 - National-Singapore-Corpus-Part-1 - National-Singapore-Corpus-Part-6 - vctk - voxpopuli - europarl - multilingual_librispeech - mozilla-foundation/common_voice_8_0 - MLCommons/peoples_speech thumbnail: null tags: - automatic-speech-recognition - automatic-speech-translation - speech - audio - Transformer - FastConformer - Conformer - pytorch - NeMo - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: canary-1b-flash results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 2.87 - task: type: Automatic Speech Recognition name: automatic-speech-recognition dataset: name: SPGI Speech type: kensho/spgispeech config: test split: test args: language: en metrics: - name: Test WER type: wer value: 1.95 - task: type: Automatic Speech Recognition name: automatic-speech-recognition dataset: name: Mozilla Common Voice 16.1 type: mozilla-foundation/common_voice_16_1 config: en split: test args: language: en metrics: - name: Test WER (En) type: wer value: 6.99 - task: type: Automatic Speech Recognition name: automatic-speech-recognition dataset: name: Mozilla Common Voice 16.1 type: mozilla-foundation/common_voice_16_1 config: de split: test args: language: de metrics: - name: Test WER (De) type: wer value: 4.09 - task: type: Automatic Speech Recognition name: automatic-speech-recognition dataset: name: Mozilla Common Voice 16.1 type: mozilla-foundation/common_voice_16_1 config: es split: test args: language: es metrics: - name: Test WER (ES) type: wer value: 3.62 - task: type: Automatic Speech Recognition name: automatic-speech-recognition dataset: name: Mozilla Common Voice 16.1 type: mozilla-foundation/common_voice_16_1 config: fr split: test args: language: fr metrics: - name: Test WER (Fr) type: wer value: 6.15 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: FLEURS type: google/fleurs config: en_us split: test args: language: en-de metrics: - name: Test BLEU 
(En->De) type: bleu value: 32.27 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: FLEURS type: google/fleurs config: en_us split: test args: language: en-de metrics: - name: Test BLEU (En->Es) type: bleu value: 22.6 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: FLEURS type: google/fleurs config: en_us split: test args: language: en-de metrics: - name: Test BLEU (En->Fr) type: bleu value: 41.22 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: FLEURS type: google/fleurs config: de_de split: test args: language: de-en metrics: - name: Test BLEU (De->En) type: bleu value: 35.5 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: FLEURS type: google/fleurs config: es_419 split: test args: language: es-en metrics: - name: Test BLEU (Es->En) type: bleu value: 23.32 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: FLEURS type: google/fleurs config: fr_fr split: test args: language: fr-en metrics: - name: Test BLEU (Fr->En) type: bleu value: 33.42 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: COVOST type: covost2 config: de_de split: test args: language: de-en metrics: - name: Test BLEU (De->En) type: bleu value: 39.33 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: COVOST type: covost2 config: es_419 split: test args: language: es-en metrics: - name: Test BLEU (Es->En) type: bleu value: 41.86 - task: type: Automatic Speech Translation name: automatic-speech-translation dataset: name: COVOST type: covost2 config: fr_fr split: test args: language: fr-en metrics: - name: Test BLEU (Fr->En) type: bleu value: 41.43 metrics: - wer - bleu - comet pipeline_tag: automatic-speech-recognition --- # Canary 1B Flash ## Description: NVIDIA NeMo Canary Flash [1] is a family of multilingual multi-tasking models based on Canary architecture [2] that achieve state-of-the-art performance on multiple speech benchmarks. With 883 million parameters and an inference speed of more than 1000 RTFx (on open-asr-leaderboard datasets), canary-1b-flash supports automatic speech-to-text recognition (ASR) in four languages (English, German, French, Spanish) and translation from English to German/French/Spanish and from German/French/Spanish to English with or without punctuation and capitalization (PnC). Additionally, canary-1b-flash offers an experimental feature for word-level and segment-level timestamps in English, German, French, and Spanish. This model is released under the permissive CC-BY-4.0 license and is available for commercial use. ## Model Architecture: Canary is an encoder-decoder model with FastConformer [3] Encoder and Transformer Decoder [4]. With audio features extracted from the encoder, task tokens such as \\, \\, \\ and \\ are fed into the Transformer Decoder to trigger the text generation process. Canary uses a concatenated tokenizer [5] from individual SentencePiece [6] tokenizers of each language, which makes it easy to scale up to more languages. The canary-1b-flash model has 32 encoder layers and 4 decoder layers, leading to a total of 883M parameters. For more details about the architecture, please refer to [1]. ## NVIDIA NeMo To train, fine-tune or transcribe with canary-1b-flash, you will need to install NVIDIA NeMo. 
## How to Use this Model The model is available for use in the NeMo Framework [7], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset. Please refer to our tutorial for more details. A few inference examples are listed below: ### Loading the Model ## Input: **Input Type(s):** Audio
**Input Format(s):** .wav or .flac files
**Input Parameters(s):** 1D
**Other Properties Related to Input:** 16000 Hz Mono-channel Audio, Pre-Processing Not Needed
Input to canary-1b-flash can be either a list of paths to audio files or a JSONL manifest file. If the input is a list of paths, canary-1b-flash assumes the audio is English and transcribes it; i.e., canary-1b-flash's default behavior is English ASR. canary-1b-flash can also generate word- and segment-level timestamps. For audio files longer than 10 seconds, we recommend using the longform inference script (explained in the next section) with timestamp generation enabled. To use canary-1b-flash for transcribing other supported languages, performing speech-to-text translation, or providing word-level timestamps, specify the input as a JSONL manifest file, where each line in the file is a dictionary containing the required fields, and then use: ### Longform inference with Canary-1B-flash: Canary models are designed to handle input audio shorter than 40 seconds. In order to handle longer audio, NeMo includes the speech_to_text_aed_chunked_infer.py script, which handles chunking, performs inference on the chunked files, and stitches the transcripts. The script will perform inference on all files in . Alternatively, you can also pass a path to a manifest file as shown above. The decoded output will be saved at . **Note** that for longform inference with timestamps, it is recommended to use a value of 10 seconds.
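A minimal sketch of the loading and inference flow described above follows, assuming NeMo is installed (e.g. `pip install -U "nemo_toolkit[asr]"`). Since the manifest field list was elided from this card, the field names in the example are assumptions to verify against the official canary documentation.

```python
# Hedged sketch: load canary-1b-flash with NeMo and run ASR / AST.
import json
from nemo.collections.asr.models import EncDecMultiTaskModel

canary = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b-flash")

# Default behavior: plain English ASR on a list of audio paths.
hyps = canary.transcribe(["sample_en.wav"], batch_size=16)
print(hyps[0])

# For other languages, translation, or timestamps, pass a JSONL manifest.
record = {
    "audio_filepath": "sample_de.wav",  # assumed field names -- verify against
    "source_lang": "de",                # the official canary documentation
    "target_lang": "en",                # de->en speech translation
    "pnc": "yes",                       # punctuation & capitalization
}
with open("manifest.json", "w") as f:
    f.write(json.dumps(record) + "\n")

hyps = canary.transcribe("manifest.json", batch_size=16)
print(hyps[0])
```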
**Output Format:** Text output as a string (w/ timestamps) depending on the task chosen for decoding
**Output Parameters:** 1-Dimensional text string
**Other Properties Related to Output:** May Need Inverse Text Normalization; Does Not Handle Special Characters
## Software Integration: **Runtime Engine(s):** * NeMo - main
**Supported Hardware Microarchitecture Compatibility:**
* [NVIDIA Ampere]
* [NVIDIA Blackwell]
* [NVIDIA Jetson]
* [NVIDIA Hopper]
* [NVIDIA Lovelace]
* [NVIDIA Pascal]
* [NVIDIA Turing]
* [NVIDIA Volta]
**[Preferred/Supported] Operating System(s):**
* [Linux]
* [Linux 4 Tegra]
* [Windows]
## Model Version(s): canary-1b-flash
# Training and Evaluation Datasets: ## Training Dataset: The canary-1b-flash model is trained on a total of 85K hrs of speech data. It consists of 31K hrs of public data, 20K hrs collected by Suno, and 34K hrs of in-house data. The datasets below include conversations, videos from the web and audiobook recordings. **Data Collection Method:** * Human
**Labeling Method:** * Hybrid: Human, Automated
The constituents of public data are as follows. #### English (25.5k hours) - Librispeech 960 hours - Fisher Corpus - Switchboard-1 Dataset - WSJ-0 and WSJ-1 - National Speech Corpus (Part 1, Part 6) - VCTK - VoxPopuli (EN) - Europarl-ASR (EN) - Multilingual Librispeech (MLS EN) - 2,000 hour subset - Mozilla Common Voice (v7.0) - People's Speech - 12,000 hour subset - Mozilla Common Voice (v11.0) - 1,474 hour subset #### German (2.5k hours) - Mozilla Common Voice (v12.0) - 800 hour subset - Multilingual Librispeech (MLS DE) - 1,500 hour subset - VoxPopuli (DE) - 200 hr subset #### Spanish (1.4k hours) - Mozilla Common Voice (v12.0) - 395 hour subset - Multilingual Librispeech (MLS ES) - 780 hour subset - VoxPopuli (ES) - 108 hour subset - Fisher - 141 hour subset #### French (1.8k hours) - Mozilla Common Voice (v12.0) - 708 hour subset - Multilingual Librispeech (MLS FR) - 926 hour subset - VoxPopuli (FR) - 165 hour subset ## Evaluation Dataset: **Data Collection Method:**
* Human
**Labeling Method:**
* Human
Automatic Speech Recognition: * HuggingFace OpenASR Leaderboard evaluation sets * MLS * [MCV] ( Automatic Speech Translation: * FLEURS * COVOST-v2 * mExpresso Timestamp Prediction: * Librispeech Hallucination Robustness: * MUSAN 48 hrs eval set Noise Robustness: * Librispeech Model Fairness: * Casual Conversations Dataset ## Training Canary-1B-Flash is trained using the NVIDIA NeMo Framework [7] for a total of 200K steps with 2D bucketing [1] and optimal batch sizes set using OOMptimizer [8].The model is trained on 128 NVIDIA A100 80GB GPUs. The model can be trained using this example script and base config. The tokenizers for these models were built using the text transcripts of the train set with this script. ## Inference: **Engine:** NVIDIA NeMo
**Test Hardware:**
* A6000
* A100
* V100
## Performance For ASR and AST experiments, predictions were generated using greedy decoding. Note that utterances shorter than 1 second are symmetrically zero-padded up to 1 second during evaluation. ### English ASR Performance (w/o PnC) The ASR performance is measured with word error rate (WER); both the ground-truth and predicted text are processed with whisper-normalizer (a minimal normalization sketch is included near the end of this card). WER on HuggingFace OpenASR leaderboard: | **Version** | **Model** | **RTFx** | **AMI** | **GigaSpeech** | **LS Clean** | **LS Other** | **Earnings22** | **SPGISpeech** | **Tedlium** | **Voxpopuli** | |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:| | nemo-main | canary-1b-flash | 1045.75 | 13.11 | 9.85 | 1.48 | 2.87 | 12.79 | 1.95 | 3.12 | 5.63 | #### Inference speed on different systems We profiled inference speed on the OpenASR benchmark (batch_size=128) using the real-time factor (RTFx) to quantify throughput. | **Version** | **Model** | **System** | **RTFx** | |:-----------:|:-------------:|:------------:|:----------:| | nemo-main | canary-1b-flash | NVIDIA A100 | 1045.75 | | nemo-main | canary-1b-flash | NVIDIA H100 | 1669.07 | ### Multilingual ASR Performance WER on MLS test set: | **Version** | **Model** | **De** | **Es** | **Fr** | |:---------:|:-----------:|:------:|:------:|:------:| | nemo-main | canary-1b-flash | 4.36 | 2.69 | 4.47 | WER on MCV-16.1 test set: | **Version** | **Model** | **En** | **De** | **Es** | **Fr** | |:---------:|:-----------:|:------:|:------:|:------:|:------:| | nemo-main | canary-1b-flash | 6.99 | 4.09 | 3.62 | 6.15 | More details on evaluation can be found on the HuggingFace ASR Leaderboard. ### AST Performance We evaluate AST performance with BLEU and COMET scores, using the datasets' native annotations with punctuation and capitalization.
FLEURS test set: BLEU score: | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** | |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 32.27 | 22.6 | 41.22 | 35.5 | 23.32 | 33.42 | COMET score: | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** | |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 0.8114 | 0.8118 | 0.8165 | 0.8546 | 0.8228 | 0.8475 | COVOST-v2 test set: BLEU score: | **Version** | **Model** | **De->En** | **Es->En** | **Fr->En** | |:-----------:|:---------:|:----------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 39.33 | 41.86 | 41.43 | COMET score: | **Version** | **Model** | **De->En** | **Es->En** | **Fr->En** | |:-----------:|:---------:|:----------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 0.8553 | 0.8585 | 0.8511 | mExpresso test set: BLEU score: | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | |:-----------:|:---------:|:----------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 22.91 | 35.69 | 27.85 | COMET score: | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | |:-----------:|:---------:|:----------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 0.7889 | 0.8211 | 0.7910 | ### Timestamp Prediction F1-score on Librispeech Test sets at a collar value of 200 ms | **Version** | **Model** | **test-clean** | **test-other** | |:-----------:|:---------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 95.5 | 93.5 | ### Hallucination Robustness Number of characters per minute on MUSAN 48 hrs eval set | **Version** | **Model** | **# of characters per minute** | |:-----------:|:---------:|:----------:| | nemo-main | canary-1b-flash | 60.92 | ### Noise Robustness WER on Librispeech Test Clean at different SNR (signal-to-noise ratio) levels of additive white noise | **Version** | **Model** | **SNR 10** | **SNR 5** | **SNR 0** | **SNR -5** | |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:| | nemo-main | canary-1b-flash | 2.34 | 3.69 | 8.84 | 29.71 | ## Model Fairness Evaluation As outlined in the paper \"Towards Measuring Fairness in AI: the Casual Conversations Dataset\" [9], we assessed the canary-1b-flash model for fairness. The model was evaluated on the CasualConversations-v1 dataset, and the results are reported as follows: ### Gender Bias: | Gender | Male | Female | N/A | Other | | :--- | :--- | :--- | :--- | :--- | | Num utterances | 19325 | 24532 | 926 | 33 | | % WER | 14.66 | 12.44 | 17.17 | 27.56 | ### Age Bias: | Age Group | (18-30) | (31-45) | (46-85) | (1-100) | | :--- | :--- | :--- | :--- | :--- | | Num utterances | 15956 | 14585 | 13349 | 43890 | | % WER | 13.18 | 13.45 | 13.64 | 13.41 | (Error rates for fairness evaluation are determined by normalizing both the reference and predicted text, similar to the methods used in the evaluations above.) ## License/Terms of Use: canary-1b-flash is released under the CC-BY-4.0 license. By using this model, you are agreeing to the terms and conditions of the license.
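As a note on the evaluation methodology above: the reported WERs are computed after normalizing both reference and hypothesis text. A minimal sketch of that computation, assuming the third-party whisper-normalizer and jiwer packages (neither is named with install instructions in this card):

```python
# Hedged sketch: normalized WER as used for the English ASR numbers above.
# Package and class names (whisper-normalizer, jiwer) are assumptions.
from whisper_normalizer.english import EnglishTextNormalizer
import jiwer

normalizer = EnglishTextNormalizer()
reference = "Mister Smith paid five dollars on May 5th."
hypothesis = "mr smith paid five dollars on may fifth"

# Normalize both sides before scoring so formatting differences are not penalized.
wer = jiwer.wer(normalizer(reference), normalizer(hypothesis))
print(f"WER: {wer:.3f}")
```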
## References: [1] Training and Inference Efficiency of Encoder-Decoder Speech Models [2] Less is More: Accurate Speech Recognition & Translation without Web-Scale Data [3] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition [4] Attention is All You Need [5] Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer [6] Google Sentencepiece Tokenizer [7] NVIDIA NeMo Framework [8] EMMeTT: Efficient Multimodal Machine Translation Training [9] Towards Measuring Fairness in AI: the Casual Conversations Dataset ## Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.", + "model_explanation_gemini": "Performs automatic speech recognition and translation across multiple languages (English, German, Spanish, French) with high accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_diar_sortformer_4spk-v1.json b/data/model_data_json/nvidia_diar_sortformer_4spk-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..8d9267131e032a8ba8b190c8b8520918f98b8148 --- /dev/null +++ b/data/model_data_json/nvidia_diar_sortformer_4spk-v1.json @@ -0,0 +1,37 @@ +{ + "model_id": "nvidia/diar_sortformer_4spk-v1", + "downloads": 367254, + "tags": [ + "nemo", + "speaker-diarization", + "speaker-recognition", + "speech", + "audio", + "Transformer", + "FastConformer", + "Conformer", + "NEST", + "pytorch", + "NeMo", + "audio-classification", + "dataset:fisher_english", + "dataset:NIST_SRE_2004-2010", + "dataset:librispeech", + "dataset:ami_meeting_corpus", + "dataset:voxconverse_v0.3", + "dataset:icsi", + "dataset:aishell4", + "dataset:dihard_challenge-3", + "dataset:NIST_SRE_2000-Disc8_split1", + "arxiv:2409.06656", + "arxiv:2408.13106", + "arxiv:2305.05084", + "arxiv:2310.12371", + "arxiv:1706.03762", + "license:cc-by-nc-4.0", + "model-index", + "region:us" + ], + "description": "--- license: cc-by-nc-4.0 library_name: nemo datasets: - fisher_english - NIST_SRE_2004-2010 - librispeech - ami_meeting_corpus - voxconverse_v0.3 - icsi - aishell4 - dihard_challenge-3 - NIST_SRE_2000-Disc8_split1 thumbnail: null tags: - speaker-diarization - speaker-recognition - speech - audio - Transformer - FastConformer - Conformer - NEST - pytorch - NeMo widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: diar_sortformer_4spk-v1 results: - task: name: Speaker Diarization type: speaker-diarization-with-post-processing dataset: name: DIHARD3-eval type: dihard3-eval-1to4spks config: with_overlap_collar_0.0s split: eval metrics: - name: Test DER type: der value: 14.76 - task: name: Speaker Diarization type: speaker-diarization-with-post-processing dataset: name: CALLHOME (NIST-SRE-2000 Disc8) type: CALLHOME-part2-2spk config: with_overlap_collar_0.25s split: part2-2spk metrics: - name: Test DER type: der value: 5.85 - task: name: Speaker Diarization type: speaker-diarization-with-post-processing dataset: name: CALLHOME (NIST-SRE-2000 Disc8) type: CALLHOME-part2-3spk config: with_overlap_collar_0.25s split: part2-3spk 
metrics: - name: Test DER type: der value: 8.46 - task: name: Speaker Diarization type: speaker-diarization-with-post-processing dataset: name: CALLHOME (NIST-SRE-2000 Disc8) type: CALLHOME-part2-4spk config: with_overlap_collar_0.25s split: part2-4spk metrics: - name: Test DER type: der value: 12.59 - task: name: Speaker Diarization type: speaker-diarization-with-post-processing dataset: name: call_home_american_english_speech type: CHAES_2spk_109sessions config: with_overlap_collar_0.25s split: ch109 metrics: - name: Test DER type: der value: 6.86 metrics: - der pipeline_tag: audio-classification --- # Sortformer Diarizer 4spk v1 Sortformer[1] is a novel end-to-end neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models.
Sortformer resolves the permutation problem in diarization by following the arrival-time order of the speech segments from each speaker. ## Model Architecture Sortformer consists of an L-size (18-layer) NeMo Encoder for Speech Tasks (NEST)[2], which is based on the Fast-Conformer[3] encoder, followed by an 18-layer Transformer[4] encoder with a hidden size of 192 and two feedforward layers with 4 sigmoid outputs per input frame at the top layer. More information can be found in the Sortformer paper[1].
## NVIDIA NeMo To train, fine-tune, or perform diarization with Sortformer, you will need to install NVIDIA NeMo[5]. We recommend you install it after installing Cython and the latest PyTorch version. ## How to Use this Model The model is available for use in the NeMo Framework[5], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset. ### Loading the Model ### Input Format Input to Sortformer can be an individual audio file, a list of paths to audio files, or a jsonl manifest file in which each line is a dictionary containing the required fields. ### Getting Diarization Results To perform speaker diarization and get a list of speaker-marked speech segments in the format 'begin_seconds, end_seconds, speaker_index', use the model's diarization call; to obtain tensors of speaker activity probabilities, request the tensor outputs (a minimal sketch appears at the end of this section). ### Input This model accepts single-channel (mono) audio sampled at 16,000 Hz. - The actual input tensor is an Ns x 1 matrix for each audio clip, where Ns is the number of samples in the time-series signal. - For instance, a 10-second audio clip sampled at 16,000 Hz (mono-channel WAV file) will form a 160,000 x 1 matrix. ### Output The output of the model is a T x S matrix, where: - S is the maximum number of speakers (in this model, S = 4). - T is the total number of frames, including zero-padding. Each frame corresponds to a segment of 0.08 seconds of audio. - Each element of the T x S matrix represents the speaker activity probability in the [0, 1] range. For example, a matrix element a(150, 2) = 0.95 indicates a 95% probability of activity for the second speaker during the time range [12.00, 12.08] seconds. ## Train and evaluate Sortformer diarizer using NeMo ### Training Sortformer diarizer models are trained on 8 nodes of 8×NVIDIA Tesla V100 GPUs, using 90-second training samples and a batch size of 4. The model can be trained using this example script and base config. ### Evaluation To evaluate the Sortformer diarizer and save diarization results in RTTM format, use the inference example script. You can provide the post-processing YAML configs from the provided configs folder to reproduce the optimized post-processing algorithm for each development dataset. ### Technical Limitations - The model operates in a non-streaming mode (offline mode). - It can detect a maximum of 4 speakers; performance degrades on recordings with 5 or more speakers. - The maximum duration of a test recording depends on available GPU memory. For an RTX A6000 48GB model, the limit is around 12 minutes. - The model was trained on publicly available speech datasets, primarily in English. As a result: * Performance may degrade on non-English speech. * Performance may also degrade on out-of-domain data, such as recordings in noisy conditions. ## Datasets Sortformer was trained on a combination of 2030 hours of real conversations and 5150 hours of simulated audio mixtures generated by the NeMo speech data simulator[6]. All of these datasets use the same labeling method via the RTTM format; a subset of the RTTM files was processed specifically for speaker diarization training. Data collection methods vary across individual datasets: they include phone calls, interviews, web videos, and audiobook recordings. Please refer to the Linguistic Data Consortium (LDC) website or the individual dataset webpages for detailed data collection methods.
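A minimal loading and inference sketch for the usage described above. The class and method names are assumptions based on NeMo's Sortformer support and should be checked against the NeMo documentation; the audio path is a placeholder.

```python
# Hedged sketch: speaker diarization with the Sortformer 4-speaker checkpoint.
from nemo.collections.asr.models import SortformerEncLabelModel

diar_model = SortformerEncLabelModel.from_pretrained("nvidia/diar_sortformer_4spk-v1")

# Speaker-marked segments in 'begin_seconds, end_seconds, speaker_index' format.
predicted_segments = diar_model.diarize(audio="sample_audio.wav", batch_size=1)

# Frame-level speaker activity probabilities: one T x S tensor per clip,
# where S = 4 speakers and each frame covers 0.08 s of audio.
segments, probs = diar_model.diarize(audio="sample_audio.wav", include_tensor_outputs=True)
```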
### Training Datasets (Real conversations) - Fisher English (LDC) - 2004-2010 NIST Speaker Recognition Evaluation (LDC) - Librispeech - AMI Meeting Corpus - VoxConverse-v0.3 - ICSI - AISHELL-4 - Third DIHARD Challenge Development (LDC) - 2000 NIST Speaker Recognition Evaluation, split1 (LDC) ### Training Datasets (Used to simulate audio mixtures) - 2004-2010 NIST Speaker Recognition Evaluation (LDC) - Librispeech ## Performance ### Evaluation dataset specifications | **Dataset** | **DIHARD3-Eval** | **CALLHOME-part2** | **CALLHOME-part2** | **CALLHOME-part2** | **CH109** | |:------------------------------|:------------------:|:-------------------:|:-------------------:|:-------------------:|:------------------:| | **Number of Speakers** | ≤ 4 speakers | 2 speakers | 3 speakers | 4 speakers | 2 speakers | | **Collar (sec)** | 0.0s | 0.25s | 0.25s | 0.25s | 0.25s | | **Mean Audio Duration (sec)** | 453.0s | 73.0s | 135.7s | 329.8s | 552.9s | ### Diarization Error Rate (DER) * All evaluations include overlapping speech. * Bolded and italicized numbers represent the best-performing Sortformer evaluations. * Post-Processing (PP) is optimized on two different held-out dataset splits. - YAML file for DH3-dev Optimized Post-Processing - YAML file for CallHome-part1 Optimized Post-Processing | **Dataset** | **DIHARD3-Eval** | **CALLHOME-part2** | **CALLHOME-part2** | **CALLHOME-part2** | **CH109** | |:----------------------------------------------------------|:------------------:|:-------------------:|:-------------------:|:-------------------:|:------------------:| | DER **diar_sortformer_4spk-v1** | 16.28 | 6.49 | 10.01 | 14.14 | **_6.27_** | | DER **diar_sortformer_4spk-v1 + DH3-dev Opt. PP** | **_14.76_** | - | - | - | - | | DER **diar_sortformer_4spk-v1 + CallHome-part1 Opt. PP** | - | **_5.85_** | **_8.46_** | **_12.59_** | 6.86 | ### Real Time Factor (RTFx) All tests were measured on RTX A6000 48GB with a batch size of 1. Post-processing is not included in RTFx calculations. | **Datasets** | **DIHARD3-Eval** | **CALLHOME-part2** | **CALLHOME-part2** | **CALLHOME-part2** | **CH109** | |:----------------------------------|:-------------------:|:-------------------:|:-------------------:|:-------------------:|:------------------:| | RTFx **diar_sortformer_4spk-v1** | 437 | 1053 | 915 | 545 | 415 | ## NVIDIA Riva: Deployment NVIDIA Riva is an accelerated speech AI SDK, deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. Additionally, Riva provides: * World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours * Best in class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization * Streaming speech recognition, Kubernetes compatible scaling, and enterprise-grade support Although this model isn’t supported yet by Riva, the list of supported models is here. Check out the Riva live demo. ## References [1] Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens [2] NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks [3] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition [4] Attention is all you need [5] NVIDIA NeMo Framework [6] NeMo speech data simulator ## License Use of this model is covered by the CC-BY-NC-4.0.
By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-NC-4.0 license.", + "model_explanation_gemini": "Performs speaker diarization by identifying and segmenting speech from up to four speakers in audio using a Transformer-based neural model." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_mit-b0.json b/data/model_data_json/nvidia_mit-b0.json new file mode 100644 index 0000000000000000000000000000000000000000..d09d739db5238fd6887734eb7dc395d6e343f27e --- /dev/null +++ b/data/model_data_json/nvidia_mit-b0.json @@ -0,0 +1,19 @@ +{ + "model_id": "nvidia/mit-b0", + "downloads": 77331, + "tags": [ + "transformers", + "pytorch", + "tf", + "segformer", + "image-classification", + "vision", + "dataset:imagenet_1k", + "arxiv:2105.15203", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - vision datasets: - imagenet_1k widget: - src: example_title: House - src: example_title: Castle --- # SegFormer (b0-sized) encoder pre-trained-only SegFormer encoder fine-tuned on Imagenet-1k. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository. Disclaimer: The team releasing SegFormer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and fine-tuned altogether on a downstream dataset. This repository only contains the pre-trained hierarchical Transformer, hence it can be used for fine-tuning purposes. ## Intended uses & limitations You can use the model for fine-tuning of semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: For more code examples, we refer to the documentation. ### License The license for this model can be found here. ### BibTeX entry and citation info" +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_segformer-b0-finetuned-ade-512-512.json b/data/model_data_json/nvidia_segformer-b0-finetuned-ade-512-512.json new file mode 100644 index 0000000000000000000000000000000000000000..5840563504ebbefc64469e919bd94d2756d3f92b --- /dev/null +++ b/data/model_data_json/nvidia_segformer-b0-finetuned-ade-512-512.json @@ -0,0 +1,20 @@ +{ + "model_id": "nvidia/segformer-b0-finetuned-ade-512-512", + "downloads": 213515, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "segformer", + "vision", + "image-segmentation", + "dataset:scene_parse_150", + "arxiv:2105.15203", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - vision - image-segmentation datasets: - scene_parse_150 widget: - src: example_title: House - src: example_title: Castle --- # SegFormer (b0-sized) model fine-tuned on ADE20k SegFormer model fine-tuned on ADE20k at resolution 512x512. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. 
and first released in this repository. Disclaimer: The team releasing SegFormer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and fine-tuned altogether on a downstream dataset. ## Intended uses & limitations You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to segment an image from the COCO 2017 dataset (a minimal sketch is given below): For more code examples, we refer to the documentation. ### License The license for this model can be found here. ### BibTeX entry and citation info", "model_explanation_gemini": "Performs semantic segmentation on images, fine-tuned for scene parsing using the ADE20k dataset at 512x512 resolution." +} \ No newline at end of file diff --git a/data/model_data_json/nvidia_segformer-b1-finetuned-ade-512-512.json new file mode 100644 index 0000000000000000000000000000000000000000..191dfb0f6ef8642cafab74eff25dd4c7ad17a3d6 --- /dev/null +++ b/data/model_data_json/nvidia_segformer-b1-finetuned-ade-512-512.json @@ -0,0 +1,19 @@ +{ + "model_id": "nvidia/segformer-b1-finetuned-ade-512-512", + "downloads": 549848, + "tags": [ + "transformers", + "pytorch", + "tf", + "segformer", + "vision", + "image-segmentation", + "dataset:scene_parse_150", + "arxiv:2105.15203", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other tags: - vision - image-segmentation datasets: - scene_parse_150 widget: - src: example_title: House - src: example_title: Castle --- # SegFormer (b1-sized) model fine-tuned on ADE20k SegFormer model fine-tuned on ADE20k at resolution 512x512. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository. Disclaimer: The team releasing SegFormer did not write a model card for this model so this model card has been written by the Hugging Face team. ## Model description SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and fine-tuned altogether on a downstream dataset. ## Intended uses & limitations You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you. ### How to use Here is how to use this model to segment an image from the COCO 2017 dataset (see the sketch below, substituting this model id): For more code examples, we refer to the documentation. ### BibTeX entry and citation info", "model_explanation_gemini": "Performs semantic image segmentation, specifically fine-tuned on the ADE20k dataset at 512x512 resolution using a hierarchical Transformer encoder and lightweight MLP decoder architecture."
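A minimal semantic segmentation sketch for the fine-tuned SegFormer checkpoints above, following the standard transformers usage (class names from recent transformers releases; the COCO test image URL is only an example):

```python
# Hedged sketch: semantic segmentation with a fine-tuned SegFormer checkpoint.
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests

processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
```

For the encoder-only nvidia/mit-b0 checkpoint described earlier, `SegformerForImageClassification` would be used analogously to classify into the 1,000 ImageNet classes.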
+} \ No newline at end of file diff --git a/data/model_data_json/nvidia_speakerverification_en_titanet_large.json new file mode 100644 index 0000000000000000000000000000000000000000..f5a915be6cdd3f2083a0b098f7e10346d026e131 --- /dev/null +++ b/data/model_data_json/nvidia_speakerverification_en_titanet_large.json @@ -0,0 +1,28 @@ +{ + "model_id": "nvidia/speakerverification_en_titanet_large", + "downloads": 1032844, + "tags": [ + "nemo", + "speaker", + "speech", + "audio", + "speaker-verification", + "speaker-recognition", + "speaker-diarization", + "titanet", + "NeMo", + "pytorch", + "en", + "dataset:VOXCELEB-1", + "dataset:VOXCELEB-2", + "dataset:FISHER", + "dataset:switchboard", + "dataset:librispeech_asr", + "dataset:SRE", + "license:cc-by-4.0", + "model-index", + "region:us" + ], + "description": "--- language: - en library_name: nemo datasets: - VOXCELEB-1 - VOXCELEB-2 - FISHER - switchboard - librispeech_asr - SRE thumbnail: null tags: - speaker - speech - audio - speaker-verification - speaker-recognition - speaker-diarization - titanet - NeMo - pytorch license: cc-by-4.0 widget: - src: example_title: Speech sample 1 - src: example_title: Speech sample 2 model-index: - name: speakerverification_en_titanet_large results: - task: name: Speaker Verification type: speaker-verification dataset: name: voxceleb1 type: voxceleb1-O config: clean split: test args: language: en metrics: - name: Test EER type: eer value: 0.66 - task: type: Speaker Diarization name: speaker-diarization dataset: name: ami-mixheadset type: ami_diarization config: oracle-vad-known-number-of-speakers split: test args: language: en metrics: - name: Test DER type: der value: 1.73 - task: type: Speaker Diarization name: speaker-diarization dataset: name: ami-lapel type: ami_diarization config: oracle-vad-known-number-of-speakers split: test args: language: en metrics: - name: Test DER type: der value: 2.03 - task: type: Speaker Diarization name: speaker-diarization dataset: name: ch109 type: callhome_diarization config: oracle-vad-known-number-of-speakers split: test args: language: en metrics: - name: Test DER type: der value: 1.19 - task: type: Speaker Diarization name: speaker-diarization dataset: name: nist-sre-2000 type: nist-sre_diarization config: oracle-vad-known-number-of-speakers split: test args: language: en metrics: - name: Test DER type: der value: 6.73 --- # NVIDIA TitaNet-Large (en-US) This model extracts speaker embeddings from given speech, which is the backbone for speaker verification and diarization tasks. It is the \"large\" version of the TitaNet family of models (around 23M parameters). See the model architecture section and NeMo documentation for complete architecture details. ## NVIDIA NeMo: Training To train, fine-tune, or play with the model, you will need to install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version. ## How to Use this Model The model is available for use in the NeMo toolkit [3] and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
### Automatically instantiate the model ### Embedding Extraction ### Verifying two utterances (Speaker Verification) To check whether two audio files are from the same speaker, use the verification call. ### Extracting Embeddings for more audio files To extract embeddings from a batch of audio files, write the audio file paths to a manifest file (one entry per line); running the extraction script will then compute the embeddings and write them to the current working directory (a minimal sketch follows this card). ### Input This model accepts 16,000 Hz (16 kHz) mono-channel audio (wav files) as input. ### Output This model provides speaker embeddings for an audio file. ## Model Architecture The TitaNet model is a depthwise-separable conv1D model [1] for speaker verification and diarization tasks. You may find more details on this model here: TitaNet-Model. ## Training The NeMo toolkit [3] was used to train the models for several hundred epochs. These models are trained with this example script and this base config. ### Datasets All the models in this collection are trained on a composite dataset comprising several thousand hours of English speech: - Voxceleb-1 - Voxceleb-2 - Fisher - Switchboard - Librispeech - SRE (2004-2010) ## Performance The performance of these models is reported in terms of Equal Error Rate (EER%) on speaker verification evaluation trial files and as Diarization Error Rate (DER%) on diarization test sessions. * Speaker Verification (EER%) | Version | Model | Model Size | VoxCeleb1 (Cleaned trial file) | |---------|--------------|-----|---------------| | 1.10.0 | TitaNet-Large | 23M | 0.66 | * Speaker Diarization (DER%) | Version | Model | Model Size | Evaluation Condition | NIST SRE 2000 | AMI (Lapel) | AMI (MixHeadset) | CH109 | |---------|--------------|-----|----------------------|---------------|-------------|------------------|-------| | 1.10.0 | TitaNet-Large | 23M | Oracle VAD KNOWN # of Speakers | 6.73 | 2.03 | 1.73 | 1.19 | | 1.10.0 | TitaNet-Large | 23M | Oracle VAD UNKNOWN # of Speakers | 5.38 | 2.03 | 1.89 | 1.63 | ## Limitations This model is trained on both telephonic and non-telephonic speech from the VoxCeleb datasets, Fisher, and Switchboard. If your data domain differs from the training data, or the model does not show relatively good performance on it, consider fine-tuning it for that speech domain. ## NVIDIA Riva: Deployment NVIDIA Riva is an accelerated speech AI SDK, deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. Additionally, Riva provides: * World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours * Best in class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization * Streaming speech recognition, Kubernetes compatible scaling, and enterprise-grade support Although this model isn’t supported yet by Riva, the list of supported models is here. Check out the Riva live demo. ## References [1] TitaNet: Neural Model for Speaker Representation with 1D Depth-wise Separable convolutions and global context [2] NVIDIA NeMo Toolkit ## License Use of this model is covered by the CC-BY-4.0. By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-4.0 license.", "model_explanation_gemini": "Extracts speaker embeddings from speech for tasks like speaker verification and diarization."
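A minimal verification sketch for the steps described above, using NeMo's documented speaker-recognition helpers (`get_embedding`, `verify_speakers`); the file names are placeholders:

```python
# Hedged sketch: speaker embedding extraction and verification with TitaNet-Large.
import nemo.collections.asr as nemo_asr

speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
    "nvidia/speakerverification_en_titanet_large"
)

# Extract a speaker embedding from a 16 kHz mono wav file.
emb = speaker_model.get_embedding("speaker1_utt1.wav")

# Decide whether two utterances come from the same speaker.
same_speaker = speaker_model.verify_speakers("speaker1_utt1.wav", "speaker2_utt1.wav")
```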
+} \ No newline at end of file diff --git a/data/model_data_json/obi_deid_bert_i2b2.json new file mode 100644 index 0000000000000000000000000000000000000000..9fca1ccad876b28a08f87a14b23023e8c103a38c --- /dev/null +++ b/data/model_data_json/obi_deid_bert_i2b2.json @@ -0,0 +1,23 @@ +{ + "model_id": "obi/deid_bert_i2b2", + "downloads": 146644, + "tags": [ + "transformers", + "pytorch", + "bert", + "token-classification", + "deidentification", + "medical notes", + "ehr", + "phi", + "en", + "dataset:I2B2", + "arxiv:1904.03323", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en thumbnail: \" tags: - deidentification - medical notes - ehr - phi datasets: - I2B2 metrics: - F1 - Recall - AUC widget: - text: \"Physician Discharge Summary Admit date: 10/12/1982 Discharge date: 10/22/1982 Patient Information Jack Reacher, 54 y.o. male (DOB = 1/21/1928).\" - text: \"Home Address: 123 Park Drive, San Diego, CA, 03245. Home Phone: 202-555-0199 (home).\" - text: \"Hospital Care Team Service: Orthopedics Inpatient Attending: Roger C Kelly, MD Attending phys phone: (634)743-5135 Discharge Unit: HCS843 Primary Care Physician: Hassan V Kim, MD 512-832-5025.\" license: mit --- # Model Description * A ClinicalBERT [Alsentzer et al., 2019] model fine-tuned for de-identification of medical notes. * Sequence Labeling (token classification): The model was trained to predict protected health information (PHI/PII) entities (spans). A list of protected health information categories is given by HIPAA. * A token can either be classified as non-PHI or as one of the 11 PHI types. Token predictions are aggregated to spans by making use of BILOU tagging. * The PHI labels that were used for training and other details can be found here: Annotation Guidelines * More details on how to use this model, the data format, and other useful information are available in the GitHub repo: Robust DeID. # How to use * A demo on how the model works (using model predictions to de-identify a medical note) is on this space: Medical-Note-Deidentification. * Steps on how this model can be used to run a forward pass can be found here: Forward Pass * In brief, the steps are: * Sentencize (the model aggregates the sentences back to the note level) and tokenize the dataset. * Use the predict function of this model to gather the predictions (i.e., predictions for each token). * Additionally, the model predictions can be used to remove PHI from the original note/text. # Dataset * The I2B2 2014 [Stubbs and Uzuner, 2015] dataset was used to train this model. | | I2B2 | | I2B2 | | | --------- | --------------------- | ---------- | -------------------- | ---------- | | | TRAIN SET - 790 NOTES | | TEST SET - 514 NOTES | | | PHI LABEL | COUNT | PERCENTAGE | COUNT | PERCENTAGE | | DATE | 7502 | 43.69 | 4980 | 44.14 | | STAFF | 3149 | 18.34 | 2004 | 17.76 | | HOSP | 1437 | 8.37 | 875 | 7.76 | | AGE | 1233 | 7.18 | 764 | 6.77 | | LOC | 1206 | 7.02 | 856 | 7.59 | | PATIENT | 1316 | 7.66 | 879 | 7.79 | | PHONE | 317 | 1.85 | 217 | 1.92 | | ID | 881 | 5.13 | 625 | 5.54 | | PATORG | 124 | 0.72 | 82 | 0.73 | | EMAIL | 4 | 0.02 | 1 | 0.01 | | OTHERPHI | 2 | 0.01 | 0 | 0 | | TOTAL | 17171 | 100 | 11283 | 100 | # Training procedure * Steps on how this model was trained can be found here: Training. The "model_name_or_path" was set to: "emilyalsentzer/Bio_ClinicalBERT".
* The dataset was sentencized with the en_core_sci_sm sentencizer from spacy. * The dataset was then tokenized with a custom tokenizer built on top of the en_core_sci_sm tokenizer from spacy. * For each sentence we added 32 tokens on the left (from previous sentences) and 32 tokens on the right (from the next sentences). * The added tokens are not used for learning - i.e., the loss is not computed on these tokens - they are used as additional context. * Each sequence contained a maximum of 128 tokens (including the 32 tokens added on). Longer sequences were split. * The sentencized and tokenized dataset with the token-level labels based on the BILOU notation was used to train the model. * The model is fine-tuned from a pre-trained ClinicalBERT model (the model_name_or_path above). * Training details: * Input sequence length: 128 * Batch size: 32 * Optimizer: AdamW * Learning rate: 4e-5 * Dropout: 0.1 # Results # Questions? Post a Github issue on the repo: Robust DeID.", "model_explanation_gemini": "Fine-tuned ClinicalBERT model for de-identifying medical notes by detecting and classifying protected health information (PHI) into 11 categories using token classification." +} \ No newline at end of file diff --git a/data/model_data_json/obi_deid_roberta_i2b2.json new file mode 100644 index 0000000000000000000000000000000000000000..df48972a7cecf52ce9b53ac0c9eb877bff38c640 --- /dev/null +++ b/data/model_data_json/obi_deid_roberta_i2b2.json @@ -0,0 +1,24 @@ +{ + "model_id": "obi/deid_roberta_i2b2", + "downloads": 1235317, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "roberta", + "token-classification", + "deidentification", + "medical notes", + "ehr", + "phi", + "en", + "dataset:I2B2", + "arxiv:1907.11692", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en thumbnail: \" tags: - deidentification - medical notes - ehr - phi datasets: - I2B2 metrics: - F1 - Recall - Precision widget: - text: \"Physician Discharge Summary Admit date: 10/12/1982 Discharge date: 10/22/1982 Patient Information Jack Reacher, 54 y.o. male (DOB = 1/21/1928).\" - text: \"Home Address: 123 Park Drive, San Diego, CA, 03245. Home Phone: 202-555-0199 (home).\" - text: \"Hospital Care Team Service: Orthopedics Inpatient Attending: Roger C Kelly, MD Attending phys phone: (634)743-5135 Discharge Unit: HCS843 Primary Care Physician: Hassan V Kim, MD 512-832-5025.\" license: mit --- # Model Description * A RoBERTa [Liu et al., 2019] model fine-tuned for de-identification of medical notes. * Sequence Labeling (token classification): The model was trained to predict protected health information (PHI/PII) entities (spans). A list of protected health information categories is given by HIPAA. * A token can either be classified as non-PHI or as one of the 11 PHI types. Token predictions are aggregated to spans by making use of BILOU tagging. * The PHI labels that were used for training and other details can be found here: Annotation Guidelines * More details on how to use this model, the data format, and other useful information are available in the GitHub repo: Robust DeID. # How to use * A demo on how the model works (using model predictions to de-identify a medical note) is on this space: Medical-Note-Deidentification.
* Steps on how this model can be used to run a forward pass can be found here: Forward Pass * In brief, the steps are: * Sentencize (the model aggregates the sentences back to the note level) and tokenize the dataset. * Use the predict function of this model to gather the predictions (i.e., predictions for each token). * Additionally, the model predictions can be used to remove PHI from the original note/text (a minimal pipeline sketch follows this card). # Dataset * The I2B2 2014 [Stubbs and Uzuner, 2015] dataset was used to train this model. | | I2B2 | | I2B2 | | | --------- | --------------------- | ---------- | -------------------- | ---------- | | | TRAIN SET - 790 NOTES | | TEST SET - 514 NOTES | | | PHI LABEL | COUNT | PERCENTAGE | COUNT | PERCENTAGE | | DATE | 7502 | 43.69 | 4980 | 44.14 | | STAFF | 3149 | 18.34 | 2004 | 17.76 | | HOSP | 1437 | 8.37 | 875 | 7.76 | | AGE | 1233 | 7.18 | 764 | 6.77 | | LOC | 1206 | 7.02 | 856 | 7.59 | | PATIENT | 1316 | 7.66 | 879 | 7.79 | | PHONE | 317 | 1.85 | 217 | 1.92 | | ID | 881 | 5.13 | 625 | 5.54 | | PATORG | 124 | 0.72 | 82 | 0.73 | | EMAIL | 4 | 0.02 | 1 | 0.01 | | OTHERPHI | 2 | 0.01 | 0 | 0 | | TOTAL | 17171 | 100 | 11283 | 100 | # Training procedure * Steps on how this model was trained can be found here: Training. The "model_name_or_path" was set to: "roberta-large". * The dataset was sentencized with the en_core_sci_sm sentencizer from spacy. * The dataset was then tokenized with a custom tokenizer built on top of the en_core_sci_sm tokenizer from spacy. * For each sentence we added 32 tokens on the left (from previous sentences) and 32 tokens on the right (from the next sentences). * The added tokens are not used for learning - i.e., the loss is not computed on these tokens - they are used as additional context. * Each sequence contained a maximum of 128 tokens (including the 32 tokens added on). Longer sequences were split. * The sentencized and tokenized dataset with the token-level labels based on the BILOU notation was used to train the model. * The model is fine-tuned from a pre-trained RoBERTa model. * Training details: * Input sequence length: 128 * Batch size: 32 (16 with 2 gradient accumulation steps) * Optimizer: AdamW * Learning rate: 5e-5 * Dropout: 0.1 ## Results # Questions? Post a Github issue on the repo: Robust DeID.", "model_explanation_gemini": "Identifies and classifies protected health information (PHI/PII) in medical notes using token-level sequence labeling for de-identification."
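A minimal token-classification sketch for either de-identification checkpoint above. Note that this plain transformers pipeline skips the repo's own sentencizing and 32-token context windows, so results may differ from the Robust DeID forward pass; the example note is synthetic:

```python
# Hedged sketch: PHI tagging with a plain transformers pipeline.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "obi/deid_roberta_i2b2"  # or "obi/deid_bert_i2b2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

deid = pipeline("token-classification", model=model, tokenizer=tokenizer,
                aggregation_strategy="simple")

note = "Patient Jack Reacher, 54 y.o. male, admitted 10/12/1982 by Roger C Kelly, MD."
for entity in deid(note):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```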
+} \ No newline at end of file diff --git a/data/model_data_json/oliverguhr_fullstop-punctuation-multilang-large.json new file mode 100644 index 0000000000000000000000000000000000000000..e08e685021b25ed6b4962918677eda903de2d6f0 --- /dev/null +++ b/data/model_data_json/oliverguhr_fullstop-punctuation-multilang-large.json @@ -0,0 +1,27 @@ +{ + "model_id": "oliverguhr/fullstop-punctuation-multilang-large", + "downloads": 353828, + "tags": [ + "transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "xlm-roberta", + "token-classification", + "punctuation prediction", + "punctuation", + "en", + "de", + "fr", + "it", + "multilingual", + "dataset:wmt/europarl", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - de - fr - it - multilingual tags: - punctuation prediction - punctuation datasets: wmt/europarl license: mit widget: - text: \"Ho sentito che ti sei laureata il che mi fa molto piacere\" example_title: \"Italian\" - text: \"Tous les matins vers quatre heures mon père ouvrait la porte de ma chambre\" example_title: \"French\" - text: \"Ist das eine Frage Frau Müller\" example_title: \"German\" - text: \"Yet she blushed as if with guilt when Cynthia reading her thoughts said to her one day Molly you're very glad to get rid of us are not you\" example_title: \"English\" metrics: - f1 --- This model predicts the punctuation of English, Italian, French and German texts. We developed it to restore the punctuation of transcribed spoken language. This multilingual model was trained on the Europarl Dataset provided by the SEPP-NLG Shared Task. *Please note that this dataset consists of political speeches. Therefore the model might perform differently on texts from other domains.* The model restores the following punctuation markers: **"." "," "?" "-" ":"** ## Sample Code We provide a simple python package that allows you to process text of any length. ## Install To get started, install the package from PyPI: ### Restore Punctuation **output** > My name is Clara and I live in Berkeley, California. Ist das eine Frage, Frau Müller? ### Predict Labels **output** > [['My', '0', 0.9999887], ['name', '0', 0.99998665], ['is', '0', 0.9998579], ['Clara', '0', 0.6752215], ['and', '0', 0.99990904], ['I', '0', 0.9999877], ['live', '0', 0.9999839], ['in', '0', 0.9999515], ['Berkeley', ',', 0.99800044], ['California', '.', 0.99534047], ['Ist', '0', 0.99998784], ['das', '0', 0.99999154], ['eine', '0', 0.9999918], ['Frage', ',', 0.99622655], ['Frau', '0', 0.9999889], ['Müller', '?', 0.99863917]] ## Results The performance differs across the individual punctuation markers, as hyphens and colons are, in many cases, optional and can be substituted by either a comma or a full stop. The model achieves the following F1 scores for the different languages: | Label | EN | DE | FR | IT | | ------------- | ----- | ----- | ----- | ----- | | 0 | 0.991 | 0.997 | 0.992 | 0.989 | | . | 0.948 | 0.961 | 0.945 | 0.942 | | ?
| 0.890 | 0.893 | 0.871 | 0.832 | | , | 0.819 | 0.945 | 0.831 | 0.798 | | : | 0.575 | 0.652 | 0.620 | 0.588 | | - | 0.425 | 0.435 | 0.431 | 0.421 | | macro average | 0.775 | 0.814 | 0.782 | 0.762 | ## Languages ### Models | Languages | Model | | ------------------------------------------ | ------------------------------------------------------------ | | English, Italian, French and German | oliverguhr/fullstop-punctuation-multilang-large | | English, Italian, French, German and Dutch | oliverguhr/fullstop-punctuation-multilingual-sonar-base | | Dutch | oliverguhr/fullstop-dutch-sonar-punctuation-prediction | ### Community Models | Languages | Model | | ------------------------------------------ | ------------------------------------------------------------ | | English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, Slovenian | kredor/punctuate-all | | Catalan | softcatala/fullstop-catalan-punctuation-prediction | | Welsh | techiaith/fullstop-welsh-punctuation-prediction | You can use different models by setting the model parameter: ## Where do I find the code and can I train my own model? Yes you can! For the complete code of the research project, take a look at this repository. There is also a guide on how to fine-tune this model for your data / language. ## References", "model_explanation_gemini": "Predicts and restores punctuation (.,?-) in English, German, French, and Italian texts, primarily for transcribed spoken language." +} \ No newline at end of file diff --git a/data/model_data_json/oliverguhr_german-sentiment-bert.json new file mode 100644 index 0000000000000000000000000000000000000000..2e57f74ca1ba3b0484b7b3907e361d6140a591bf --- /dev/null +++ b/data/model_data_json/oliverguhr_german-sentiment-bert.json @@ -0,0 +1,22 @@ +{ + "model_id": "oliverguhr/german-sentiment-bert", + "downloads": 355798, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "bert", + "text-classification", + "sentiment", + "de", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - de tags: - sentiment - bert license: mit widget: - text: \"Das ist gar nicht mal so schlecht\" metrics: - f1 --- # German Sentiment Classification with Bert This model was trained for sentiment classification of German language texts. To achieve the best results, all model inputs need to be preprocessed with the same procedure that was applied during training. To simplify the usage of the model, we provide a Python package that bundles the code needed for preprocessing and inference. The model uses Google's BERT architecture and was trained on 1.834 million German-language samples. The training data contains texts from various domains like Twitter, Facebook and movie, app and hotel reviews. You can find more information about the dataset and the training process in the paper. ## Using the Python package To get started, install the package from PyPI: The code above will output the following list: ### Output class probabilities ## Model and Data If you are interested in code and data that was used to train this model please have a look at this repository and our paper. Here is a table of the F1 scores that this model achieves on different datasets. Since we trained this model with a newer version of the transformer library, the results are slightly better than reported in the paper.
| Dataset | F1 micro Score | | :----------------------------------------------------------- | -------------: | | holidaycheck | 0.9568 | | scare | 0.9418 | | filmstarts | 0.9021 | | germeval | 0.7536 | | PotTS | 0.6780 | | emotions | 0.9649 | | sb10k | 0.7376 | | Leipzig Wikipedia Corpus 2016 | 0.9967 | | all | 0.9639 | ## Cite For feedback and questions contact me via mail or Twitter @oliverguhr. Please cite us if you found this useful:", "model_explanation_gemini": "Classifies sentiment in German text using BERT architecture, trained on diverse datasets including social media and reviews." +} \ No newline at end of file diff --git a/data/model_data_json/onnx-community_Kokoro-82M-v1.0-ONNX.json new file mode 100644 index 0000000000000000000000000000000000000000..d2a6430302499afb9380c9f9de8451890d6e0e13 --- /dev/null +++ b/data/model_data_json/onnx-community_Kokoro-82M-v1.0-ONNX.json @@ -0,0 +1,16 @@ +{ + "model_id": "onnx-community/Kokoro-82M-v1.0-ONNX", + "downloads": 82437, + "tags": [ + "transformers.js", + "onnx", + "style_text_to_speech_2", + "text-to-speech", + "en", + "base_model:hexgrad/Kokoro-82M", + "base_model:quantized:hexgrad/Kokoro-82M", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: transformers.js language: - en base_model: - hexgrad/Kokoro-82M pipeline_tag: text-to-speech --- # Kokoro TTS Kokoro is a frontier TTS model for its size of 82 million parameters (text in/audio out). ## Table of contents - Usage - JavaScript - Python - Voices/Samples - Quantizations ## Usage ### JavaScript First, install the library from NPM using: You can then generate speech as follows: ### Python Optionally, save the audio to a file: ## Voices/Samples > Life is like a box of chocolates. You never know what you're gonna get. | Name | Nationality | Gender | Sample | | ------------ | ----------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------- | | **af_heart** | American | Female |
", + "model_explanation_gemini": "Generates English text by predicting the next word in a sequence based on pretraining on a large corpus without human labeling." +} \ No newline at end of file diff --git a/data/model_data_json/openai_clip-vit-base-patch16.json b/data/model_data_json/openai_clip-vit-base-patch16.json new file mode 100644 index 0000000000000000000000000000000000000000..cccbe14ca05776fdadbcf9b6911d55dc171acdb6 --- /dev/null +++ b/data/model_data_json/openai_clip-vit-base-patch16.json @@ -0,0 +1,18 @@ +{ + "model_id": "openai/clip-vit-base-patch16", + "downloads": 4783494, + "tags": [ + "transformers", + "pytorch", + "jax", + "clip", + "zero-shot-image-classification", + "vision", + "arxiv:2103.00020", + "arxiv:1908.04913", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # Model Card: CLIP Disclaimer: The model card is taken and modified from the official CLIP repository, it can be found here. ## Model Details The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was also developed to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. It was not developed for general model deployment - to deploy models like CLIP, researchers will first need to carefully study their capabilities in relation to the specific context they’re being deployed within. ### Model Date January 2021 ### Model Type The base model uses a ViT-B/16 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. The original implementation had two variants: one using a ResNet image encoder and the other using a Vision Transformer. This repository has the variant with the Vision Transformer. ### Documents - Blog Post - CLIP Paper ### Use with Transformers ## Model Use ### Intended Use The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. #### Primary intended uses The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ### Out-of-Scope Use Cases **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task specific testing especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. 
This is because the use of artificial intelligence for tasks such as these can be premature currently, given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases. ## Data The model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as YFCC100M. A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet, which tend to skew towards more developed nations, and younger, male users. ### Data Mission Statement Our goal with building this dataset was to test out robustness and generalizability in computer vision tasks. As a result, the focus was on gathering large quantities of data from different publicly-available internet data sources. The data was gathered in a mostly non-interventionist manner. However, we only crawled websites that had policies against excessively violent and adult images and allowed us to filter out such content. We do not intend for this dataset to be used as the basis for any commercial or deployed model and will not be releasing the dataset. ## Performance and Limitations ### Performance We have evaluated the performance of CLIP on a wide range of benchmarks across a variety of computer vision datasets, ranging from OCR to texture recognition to fine-grained classification. The paper describes model performance on the following datasets: - Food101 - CIFAR10 - CIFAR100 - Birdsnap - SUN397 - Stanford Cars - FGVC Aircraft - VOC2007 - DTD - Oxford-IIIT Pet dataset - Caltech101 - Flowers102 - MNIST - SVHN - IIIT5K - Hateful Memes - SST-2 - UCF101 - Kinetics700 - Country211 - CLEVR Counting - KITTI Distance - STL-10 - RareAct - Flickr30 - MSCOCO - ImageNet - ImageNet-A - ImageNet-R - ImageNet Sketch - ObjectNet (ImageNet Overlap) - Youtube-BB - ImageNet-Vid ## Limitations CLIP and our analysis of it have a number of limitations. CLIP currently struggles with respect to certain tasks such as fine-grained classification and counting objects. CLIP also poses issues with regards to fairness and bias, which we discuss in the paper and briefly in the next section. Additionally, our approach to testing CLIP also has an important limitation - in many cases we have used linear probes to evaluate the performance of CLIP, and there is evidence suggesting that linear probes can underestimate model performance. ### Bias and Fairness We find that the performance of CLIP - and the specific biases it exhibits - can depend significantly on class design and the choices one makes for categories to include and exclude. We tested the risk of certain kinds of denigration with CLIP by classifying images of people from Fairface into crime-related and non-human animal categories. We found significant disparities with respect to race and gender. Additionally, we found that these disparities could shift based on how the classes were constructed. (Details captured in the Broader Impacts Section in the paper). We also tested the performance of CLIP on gender, race and age classification using the Fairface dataset (we default to using race categories as they are constructed in the Fairface dataset) in order to assess quality of performance across different demographics.
We found accuracy >96% across all races for gender classification with ‘Middle Eastern’ having the highest accuracy (98.4%) and ‘White’ having the lowest (96.5%). Additionally, CLIP averaged ~93% for racial classification and ~63% for age classification. Our use of evaluations to test for gender, race and age classification as well as denigration harms is simply to evaluate performance of the model across people and surface potential risks and not to demonstrate an endorsement/enthusiasm for such tasks. ## Feedback ### Where to send questions or comments about the model Please use this Google Form", + "model_explanation_gemini": "Enables zero-shot image classification by matching images with text descriptions using a Vision Transformer encoder and contrastive learning." +} \ No newline at end of file diff --git a/data/model_data_json/openai_clip-vit-base-patch32.json b/data/model_data_json/openai_clip-vit-base-patch32.json new file mode 100644 index 0000000000000000000000000000000000000000..d5638f058c8c013ab17b5c7545a9b643e5d4b19d --- /dev/null +++ b/data/model_data_json/openai_clip-vit-base-patch32.json @@ -0,0 +1,19 @@ +{ + "model_id": "openai/clip-vit-base-patch32", + "downloads": 13000899, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "clip", + "zero-shot-image-classification", + "vision", + "arxiv:2103.00020", + "arxiv:1908.04913", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # Model Card: CLIP Disclaimer: The model card is taken and modified from the official CLIP repository, it can be found here. ## Model Details The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was also developed to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. It was not developed for general model deployment - to deploy models like CLIP, researchers will first need to carefully study their capabilities in relation to the specific context they’re being deployed within. ### Model Date January 2021 ### Model Type The model uses a ViT-B/32 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. The original implementation had two variants: one using a ResNet image encoder and the other using a Vision Transformer. This repository has the variant with the Vision Transformer. ### Documents - Blog Post - CLIP Paper ### Use with Transformers ## Model Use ### Intended Use The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. #### Primary intended uses The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ### Out-of-Scope Use Cases **Any** deployed use case of the model - whether commercial or not - is currently out of scope. 
Non-deployed use cases, such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task-specific testing, especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case potentially harmful at present. Certain use cases that fall under the domain of surveillance and facial recognition are always out of scope, regardless of the model’s performance. This is because the use of artificial intelligence for such tasks is currently premature, given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained or evaluated on any languages other than English, its use should be limited to English-language use cases. ## Data The model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly used pre-existing image datasets such as YFCC100M. A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet, which tend to skew towards more developed nations and younger, male users. ### Data Mission Statement Our goal with building this dataset was to test robustness and generalizability in computer vision tasks. As a result, the focus was on gathering large quantities of data from different publicly available internet data sources. The data was gathered in a mostly non-interventionist manner. However, we only crawled websites that had policies against excessively violent and adult images and allowed us to filter out such content. We do not intend for this dataset to be used as the basis for any commercial or deployed model and will not be releasing the dataset. ## Performance and Limitations ### Performance We have evaluated the performance of CLIP on a wide range of benchmarks across a variety of computer vision datasets, ranging from OCR to texture recognition to fine-grained classification. The paper describes model performance on the following datasets: - Food101 - CIFAR10 - CIFAR100 - Birdsnap - SUN397 - Stanford Cars - FGVC Aircraft - VOC2007 - DTD - Oxford-IIIT Pet dataset - Caltech101 - Flowers102 - MNIST - SVHN - IIIT5K - Hateful Memes - SST-2 - UCF101 - Kinetics700 - Country211 - CLEVR Counting - KITTI Distance - STL-10 - RareAct - Flickr30 - MSCOCO - ImageNet - ImageNet-A - ImageNet-R - ImageNet Sketch - ObjectNet (ImageNet Overlap) - Youtube-BB - ImageNet-Vid ## Limitations CLIP and our analysis of it have a number of limitations. CLIP currently struggles with certain tasks such as fine-grained classification and counting objects. CLIP also poses issues with regard to fairness and bias, which we discuss in the paper and briefly in the next section. Additionally, our approach to testing CLIP also has an important limitation: in many cases we have used linear probes to evaluate the performance of CLIP, and there is evidence suggesting that linear probes can underestimate model performance. ### Bias and Fairness We find that the performance of CLIP - and the specific biases it exhibits - can depend significantly on class design and the choices one makes for categories to include and exclude.
We tested the risk of certain kinds of denigration with CLIP by classifying images of people from Fairface into crime-related and non-human animal categories. We found significant disparities with respect to race and gender. Additionally, we found that these disparities could shift based on how the classes were constructed. (Details captured in the Broader Impacts Section in the paper). We also tested the performance of CLIP on gender, race and age classification using the Fairface dataset (We default to using race categories as they are constructed in the Fairface dataset.) in order to assess quality of performance across different demographics. We found accuracy >96% across all races for gender classification with ‘Middle Eastern’ having the highest accuracy (98.4%) and ‘White’ having the lowest (96.5%). Additionally, CLIP averaged ~93% for racial classification and ~63% for age classification. Our use of evaluations to test for gender, race and age classification as well as denigration harms is simply to evaluate performance of the model across people and surface potential risks and not to demonstrate an endorsement/enthusiasm for such tasks. ## Feedback ### Where to send questions or comments about the model Please use this Google Form", + "model_explanation_gemini": "Performs zero-shot image classification by matching images with text descriptions using a Vision Transformer encoder and contrastive learning." +} \ No newline at end of file diff --git a/data/model_data_json/openai_clip-vit-large-patch14-336.json b/data/model_data_json/openai_clip-vit-large-patch14-336.json new file mode 100644 index 0000000000000000000000000000000000000000..b94d99d638df2d22c44a9ff35725d95fd9d42881 --- /dev/null +++ b/data/model_data_json/openai_clip-vit-large-patch14-336.json @@ -0,0 +1,16 @@ +{ + "model_id": "openai/clip-vit-large-patch14-336", + "downloads": 5275913, + "tags": [ + "transformers", + "pytorch", + "tf", + "clip", + "zero-shot-image-classification", + "generated_from_keras_callback", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - generated_from_keras_callback widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog model-index: - name: clip-vit-large-patch14-336 results: [] --- # clip-vit-large-patch14-336 This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set: ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - optimizer: None - training_precision: float32 ### Training results ### Framework versions - Transformers 4.21.3 - TensorFlow 2.8.2 - Tokenizers 0.12.1", + "model_explanation_gemini": "Generates image-text representations for tasks like classification based on input images and candidate labels." 
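The CLIP cards above each refer to a "Use with Transformers" section whose code was not preserved here. A minimal sketch of the zero-shot classification usage those sections describe, assuming the standard `CLIPModel`/`CLIPProcessor` API (the image URL is only an illustrative sample):

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative sample image (any RGB image works)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Score the image against free-form candidate labels
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity -> label probabilities
print(probs)
```

The same pattern applies to the large-patch14 checkpoints below; only the model identifier changes.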
+} \ No newline at end of file diff --git a/data/model_data_json/openai_clip-vit-large-patch14.json b/data/model_data_json/openai_clip-vit-large-patch14.json new file mode 100644 index 0000000000000000000000000000000000000000..b40f2422b95a13aed0ec8793baa6d89ca549f6c1 --- /dev/null +++ b/data/model_data_json/openai_clip-vit-large-patch14.json @@ -0,0 +1,20 @@ +{ + "model_id": "openai/clip-vit-large-patch14", + "downloads": 42956604, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "clip", + "zero-shot-image-classification", + "vision", + "arxiv:2103.00020", + "arxiv:1908.04913", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - vision widget: - src: candidate_labels: playing music, playing sports example_title: Cat & Dog --- # Model Card: CLIP Disclaimer: The model card is taken and modified from the official CLIP repository, it can be found here. ## Model Details The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was also developed to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. It was not developed for general model deployment - to deploy models like CLIP, researchers will first need to carefully study their capabilities in relation to the specific context they’re being deployed within. ### Model Date January 2021 ### Model Type The base model uses a ViT-L/14 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss. The original implementation had two variants: one using a ResNet image encoder and the other using a Vision Transformer. This repository has the variant with the Vision Transformer. ### Documents - Blog Post - CLIP Paper ### Use with Transformers ## Model Use ### Intended Use The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. #### Primary intended uses The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ### Out-of-Scope Use Cases **Any** deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task specific testing especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. 
This is because the use of artificial intelligence for such tasks is currently premature, given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained or evaluated on any languages other than English, its use should be limited to English-language use cases. ## Data The model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly used pre-existing image datasets such as YFCC100M. A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet, which tend to skew towards more developed nations and younger, male users. ### Data Mission Statement Our goal with building this dataset was to test robustness and generalizability in computer vision tasks. As a result, the focus was on gathering large quantities of data from different publicly available internet data sources. The data was gathered in a mostly non-interventionist manner. However, we only crawled websites that had policies against excessively violent and adult images and allowed us to filter out such content. We do not intend for this dataset to be used as the basis for any commercial or deployed model and will not be releasing the dataset. ## Performance and Limitations ### Performance We have evaluated the performance of CLIP on a wide range of benchmarks across a variety of computer vision datasets, ranging from OCR to texture recognition to fine-grained classification. The paper describes model performance on the following datasets: - Food101 - CIFAR10 - CIFAR100 - Birdsnap - SUN397 - Stanford Cars - FGVC Aircraft - VOC2007 - DTD - Oxford-IIIT Pet dataset - Caltech101 - Flowers102 - MNIST - SVHN - IIIT5K - Hateful Memes - SST-2 - UCF101 - Kinetics700 - Country211 - CLEVR Counting - KITTI Distance - STL-10 - RareAct - Flickr30 - MSCOCO - ImageNet - ImageNet-A - ImageNet-R - ImageNet Sketch - ObjectNet (ImageNet Overlap) - Youtube-BB - ImageNet-Vid ## Limitations CLIP and our analysis of it have a number of limitations. CLIP currently struggles with certain tasks such as fine-grained classification and counting objects. CLIP also poses issues with regard to fairness and bias, which we discuss in the paper and briefly in the next section. Additionally, our approach to testing CLIP also has an important limitation: in many cases we have used linear probes to evaluate the performance of CLIP, and there is evidence suggesting that linear probes can underestimate model performance. ### Bias and Fairness We find that the performance of CLIP - and the specific biases it exhibits - can depend significantly on class design and the choices one makes for categories to include and exclude. We tested the risk of certain kinds of denigration with CLIP by classifying images of people from Fairface into crime-related and non-human animal categories. We found significant disparities with respect to race and gender. Additionally, we found that these disparities could shift based on how the classes were constructed. (Details are captured in the Broader Impacts section of the paper.) We also tested the performance of CLIP on gender, race and age classification using the Fairface dataset (we default to the race categories as they are constructed in Fairface) in order to assess the quality of performance across different demographics.
We found accuracy >96% across all races for gender classification with ‘Middle Eastern’ having the highest accuracy (98.4%) and ‘White’ having the lowest (96.5%). Additionally, CLIP averaged ~93% for racial classification and ~63% for age classification. Our use of evaluations to test for gender, race and age classification as well as denigration harms is simply to evaluate performance of the model across people and surface potential risks and not to demonstrate an endorsement/enthusiasm for such tasks. ## Feedback ### Where to send questions or comments about the model Please use this Google Form", + "model_explanation_gemini": "Enables zero-shot image classification by learning to associate images with text descriptions using a Vision Transformer encoder and contrastive training." +} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-base.en.json b/data/model_data_json/openai_whisper-base.en.json new file mode 100644 index 0000000000000000000000000000000000000000..28092aa4ca5accdc2cb828bf6ce6ddf235173618 --- /dev/null +++ b/data/model_data_json/openai_whisper-base.en.json @@ -0,0 +1,23 @@ +{ + "model_id": "openai/whisper-base.en", + "downloads": 2075648, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "arxiv:2212.04356", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: whisper-base.en results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 12.803978669490565 pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found here. **Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions to a *different* language to the audio. 
Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|------------------------------------------------------|-----------------------------------------------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage This checkpoint is an *English-only* model, meaning it can be used for English speech recognition. Multilingual speech recognition or speech translation is possible through use of a multilingual checkpoint. To transcribe audio samples, the model has to be used alongside a `WhisperProcessor`. The processor is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) ## Transcription The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True` when decoding. ## Evaluation Whisper base.en can be evaluated on LibriSpeech test-clean by transcribing the test set and scoring the predictions against the references with the word error rate (WER). ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of up to arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`. Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification.
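Returning to the usage described earlier: a minimal transcription sketch with `WhisperProcessor` and `WhisperForConditionalGeneration`, using the LibriSpeech dummy set as a toy sample (any 16 kHz mono audio array works):

```python
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-base.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base.en")

# Toy audio sample
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# Pre-process: raw audio -> log-Mel spectrogram features
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features

# Generate token ids, then post-process them back to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

Evaluation on LibriSpeech test-clean follows the same pattern, accumulating predictions and references and scoring them with `evaluate.load("wer")`.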
We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech, use of the model for classification is not only not evaluated but also not appropriate, particularly to infer human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language. ## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, technical language, as well as zero shot translation from multiple languages into English; and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rate across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling but not perfectly. Further analysis on these limitations are provided in the paper. It is likely that this behavior and hallucinations may be worse on lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box – their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual use concerns that come with releasing Whisper. 
While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "Transcribes English speech into text using automatic speech recognition without requiring fine-tuning." +} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-base.json b/data/model_data_json/openai_whisper-base.json new file mode 100644 index 0000000000000000000000000000000000000000..64fda4eda8f7018b1741227c1f0139280de49506 --- /dev/null +++ b/data/model_data_json/openai_whisper-base.json @@ -0,0 +1,121 @@ +{ + "model_id": "openai/whisper-base", + "downloads": 439420, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - no - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: whisper-base results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 5.008769117619326 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 12.84936273212057 - task: name: Automatic Speech Recognition type: 
automatic-speech-recognition dataset: name: Common Voice 11.0 type: mozilla-foundation/common_voice_11_0 config: hi split: test args: language: hi metrics: - name: Test WER type: wer value: 131 pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found here. **Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language from the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|------------------------------------------------------|-----------------------------------------------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage To transcribe audio samples, the model has to be used alongside a `WhisperProcessor`. The processor is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) The model is informed of which task to perform (transcription or translation) by passing the appropriate \"context tokens\". These context tokens are a sequence of tokens that are given to the decoder at the start of the decoding process, and take the following order: 1. The transcription always starts with the `<|startoftranscript|>` token 2. The second token is the language token (e.g. `<|en|>` for English) 3. The third token is the \"task token\". It can take one of two values: `<|transcribe|>` for speech recognition or `<|translate|>` for speech translation 4. In addition, a `<|notimestamps|>` token is added if the model should not include timestamp prediction Thus, a typical sequence of context tokens might look as follows: `<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>`, which tells the model to decode in English, under the task of speech recognition, and not to predict timestamps. These tokens can either be forced or un-forced. If they are forced, the model is made to predict each token at each position. This allows one to control the output language and task for the Whisper model.
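A minimal sketch of forcing these context tokens, assuming the standard `get_decoder_prompt_ids` helper on the processor:

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

# Force English speech recognition (no timestamps) via the context tokens
forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")

# ...then, given `input_features` produced by the processor from an audio array:
# predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
```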
If they are un-forced, the Whisper model will automatically predict the output language and task itself. The context tokens can be set accordingly (as in the sketch above), which forces the model to predict in English under the task of speech recognition. ## Transcription ### English to English In this example, the context tokens are 'unforced', meaning the model automatically predicts the output language (English) and task (transcribe). The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True`. ### French to French The following example demonstrates French to French transcription by setting the decoder ids appropriately. ## Translation Setting the task to \"translate\" forces the Whisper model to perform speech translation. ### French to English ## Evaluation Whisper base can be evaluated on LibriSpeech test-clean by transcribing the test set and scoring the predictions against the references with the word error rate (WER). ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of up to arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`. Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; using the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet.
65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language. ## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, technical language, as well as zero shot translation from multiple languages into English; and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rate across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling but not perfectly. Further analysis on these limitations are provided in the paper. It is likely that this behavior and hallucinations may be worse on lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box – their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. 
### BibTeX entry and citation info", + "model_explanation_gemini": "OpenAI Whisper-base is a multilingual automatic speech recognition model trained to transcribe speech into text across numerous languages without requiring fine-tuning." +} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-large-v2.json b/data/model_data_json/openai_whisper-large-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..70534ce0f40894aba834f7277c92a43d940b04ba --- /dev/null +++ b/data/model_data_json/openai_whisper-large-v2.json @@ -0,0 +1,120 @@ +{ + "model_id": "openai/whisper-large-v2", + "downloads": 161959, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - no - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found here. Compared to the Whisper large model, the large-v2 model is trained for 2.5x more epochs with added regularization for improved performance. **Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. 
The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language from the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|------------------------------------------------------|-----------------------------------------------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage To transcribe audio samples, the model has to be used alongside a `WhisperProcessor`. The processor is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) The model is informed of which task to perform (transcription or translation) by passing the appropriate \"context tokens\". These context tokens are a sequence of tokens that are given to the decoder at the start of the decoding process, and take the following order: 1. The transcription always starts with the `<|startoftranscript|>` token 2. The second token is the language token (e.g. `<|en|>` for English) 3. The third token is the \"task token\". It can take one of two values: `<|transcribe|>` for speech recognition or `<|translate|>` for speech translation 4. In addition, a `<|notimestamps|>` token is added if the model should not include timestamp prediction Thus, a typical sequence of context tokens might look as follows: `<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>`, which tells the model to decode in English, under the task of speech recognition, and not to predict timestamps. These tokens can either be forced or un-forced. If they are forced, the model is made to predict each token at each position. This allows one to control the output language and task for the Whisper model. If they are un-forced, the Whisper model will automatically predict the output language and task itself. The context tokens can be set accordingly (as in the earlier sketch), which forces the model to predict in English under the task of speech recognition. ## Transcription ### English to English In this example, the context tokens are 'unforced', meaning the model automatically predicts the output language (English) and task (transcribe). The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True`. ### French to French The following example demonstrates French to French transcription by setting the decoder ids appropriately. ## Translation Setting the task to \"translate\" forces the Whisper model to perform speech translation. ### French to English ## Evaluation Whisper large can be evaluated on LibriSpeech test-clean by transcribing the test set and scoring the predictions against the references with the word error rate (WER). ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of up to arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference.
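A minimal sketch of that chunked, batched setup, assuming the standard `pipeline` API (the file name is a placeholder):

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,  # enable the chunking algorithm for long-form audio
    batch_size=8,       # run batched inference over the chunks
)

result = pipe("long_audio.mp3")  # placeholder path to a long audio file
print(result["text"])
```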
It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`. Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; using the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language. ## Performance and Limitations Our studies show that, compared with many existing ASR systems, the models exhibit improved robustness to accents, background noise, technical language, as well as zero-shot translation from multiple languages into English; and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself.
Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rate across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling but not perfectly. Further analysis on these limitations are provided in the paper. It is likely that this behavior and hallucinations may be worse on lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box – their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "OpenAI's Whisper-large-v2 is a multilingual automatic speech recognition and translation model trained on extensive labeled data to transcribe or translate speech across numerous languages without requiring fine-tuning." 
+} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-large-v3-turbo.json b/data/model_data_json/openai_whisper-large-v3-turbo.json new file mode 100644 index 0000000000000000000000000000000000000000..4138b6616fc0d0e346cdf3dff5ae56646d2f22a6 --- /dev/null +++ b/data/model_data_json/openai_whisper-large-v3-turbo.json @@ -0,0 +1,118 @@ +{ + "model_id": "openai/whisper-large-v3-turbo", + "downloads": 6822044, + "tags": [ + "transformers", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "base_model:openai/whisper-large-v3", + "base_model:finetune:openai/whisper-large-v3", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - 'no' - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su license: mit tags: - audio - automatic-speech-recognition widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: pipeline_tag: automatic-speech-recognition base_model: - openai/whisper-large-v3 library_name: transformers --- # Whisper Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it's the exact same model, except that the number of decoding layers have reduced from 32 to 4. As a result, the model is way faster, at the expense of a minor quality degradation. You can find more details about it in this GitHub discussion. **Disclaimer**: Content for this model card has partly been written by the 🤗 Hugging Face team, and partly copied and pasted from the original model card. ## Usage Whisper large-v3-turbo is supported in Hugging Face 🤗 Transformers. To run the model, first install the Transformers library. 
For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub, and 🤗 Accelerate to reduce the model loading time. The model can be used with the `pipeline` class to transcribe audio of arbitrary length. To transcribe a local audio file, simply pass the path to your audio file when you call the pipeline. Multiple audio files can be transcribed in parallel by specifying them as a list and setting the `batch_size` parameter. Transformers is compatible with all Whisper decoding strategies, such as temperature fallback and condition on previous tokens; these heuristics can be enabled via `generate_kwargs` when calling the pipeline. Whisper predicts the language of the source audio automatically. If the source audio language is known *a priori*, it can be passed as an argument to the pipeline (e.g. `generate_kwargs={\"language\": \"french\"}`). By default, Whisper performs the task of *speech transcription*, where the source audio language is the same as the target text language. To perform *speech translation*, where the target text is in English, set the task to `\"translate\"`. Finally, the model can be made to predict timestamps. For sentence-level timestamps, pass the `return_timestamps=True` argument; for word-level timestamps, pass `return_timestamps=\"word\"`. The above arguments can be used in isolation or in combination. For example, to perform the task of speech transcription where the source audio is in French, and we want to return sentence-level timestamps, the following can be used:
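A sketch of that combined call (French source audio, sentence-level timestamps), assuming the standard Transformers `pipeline` API; the audio path is a placeholder:

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)

result = pipe(
    "audio.mp3",  # placeholder path to a local audio file
    generate_kwargs={"language": "french", "task": "transcribe"},
    return_timestamps=True,  # sentence-level timestamps
)
print(result["chunks"])
```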
For more control over the generation parameters, use the model + processor API directly:
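A sketch of that lower-level model + processor route, under the same assumptions as above:

```python
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "openai/whisper-large-v3-turbo"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Toy audio sample
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

inputs = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt")

# Full control over the generation parameters, e.g. beam search settings
generated_ids = model.generate(
    inputs.input_features,
    max_new_tokens=128,
    num_beams=1,
    do_sample=False,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```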
## Additional Speed & Memory Improvements You can apply additional speed and memory improvements to Whisper to further improve inference speed and reduce VRAM requirements. ### Chunked Long-Form Whisper has a receptive field of 30 seconds. To transcribe audios longer than this, one of two long-form algorithms is required: 1. **Sequential:** uses a \"sliding window\" for buffered inference, transcribing 30-second slices one after the other 2. **Chunked:** splits long audio files into shorter ones (with a small overlap between segments), transcribes each segment independently, and stitches the resulting transcriptions at the boundaries The sequential long-form algorithm should be used in either of the following scenarios: 1. Transcription accuracy is the most important factor, and speed is less of a consideration 2. You are transcribing **batches** of long audio files, in which case the latency of sequential is comparable to chunked, while being up to 0.5% WER more accurate Conversely, the chunked algorithm should be used when: 1. Transcription speed is the most important factor 2. You are transcribing a **single** long audio file By default, Transformers uses the sequential algorithm. To enable the chunked algorithm, pass the `chunk_length_s` parameter to the `pipeline`. For large-v3, a chunk length of 30 seconds is optimal. To activate batching over long audio files, pass the `batch_size` argument. #### Torch compile The Whisper forward pass is compatible with `torch.compile` for 4.5x speed-ups. **Note:** `torch.compile` is currently not compatible with the chunked long-form algorithm or Flash Attention 2 ⚠️ #### Flash Attention 2 We recommend using Flash Attention 2 if your GPU supports it and you are not using torch.compile. To do so, first install Flash Attention, then pass `attn_implementation=\"flash_attention_2\"` to `from_pretrained`. #### Torch Scaled Dot-Product Attention (SDPA) If your GPU does not support Flash Attention, we recommend making use of PyTorch scaled dot-product attention (SDPA). This attention implementation is activated **by default** for PyTorch versions 2.1.1 or greater. To check whether you have a compatible PyTorch version, inspect `torch.__version__`: if it is 2.1.1 or greater, SDPA is activated by default; otherwise, you need to upgrade your PyTorch version according to the official instructions. Once a valid PyTorch version is installed, SDPA is activated by default. It can also be set explicitly by specifying `attn_implementation=\"sdpa\"` in `from_pretrained`. For more information about how to use SDPA, refer to the Transformers SDPA documentation. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. There are two flavours of Whisper model: English-only and multilingual. The English-only models were trained on the task of English speech recognition. The multilingual models were trained simultaneously on multilingual speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language from the audio. Whisper checkpoints come in several configurations of varying model sizes. The smallest four are available as English-only and multilingual. The largest checkpoints are multilingual only. All of the pre-trained checkpoints are available on the Hugging Face Hub.
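The attention options described above can be combined when loading the model; a minimal sketch, assuming Flash Attention 2 is installed and supported by the GPU:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq

# Prefer Flash Attention 2 on supported GPUs; fall back to PyTorch SDPA otherwise.
try:
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        "openai/whisper-large-v3-turbo",
        torch_dtype=torch.float16,
        attn_implementation="flash_attention_2",
    )
except (ImportError, ValueError):
    # flash-attn not installed or unsupported on this hardware
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        "openai/whisper-large-v3-turbo",
        torch_dtype=torch.float16,
        attn_implementation="sdpa",  # default on PyTorch >= 2.1.1
    )
```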
The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|--------------|--------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | | large-v3 | 1550 M | x | ✓ | | large-v3-turbo | 809 M | x | ✓ | ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; use of the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data No information provided. ## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, and technical language, as well as zero-shot translation from multiple languages into English, and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data.
The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rates across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not perfectly. Further analysis of these limitations is provided in the paper. It is likely that this behavior and hallucinations may be worse in lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "Finetuned for faster automatic speech recognition and translation across multiple languages, with reduced decoding layers for improved speed at a minor quality trade-off."
+} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-large-v3.json b/data/model_data_json/openai_whisper-large-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..5c4b241eb687acb9528194b8a0dddc5fcc5e305b --- /dev/null +++ b/data/model_data_json/openai_whisper-large-v3.json @@ -0,0 +1,119 @@ +{ + "model_id": "openai/whisper-large-v3", + "downloads": 6920932, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - no - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3 has the same architecture as the previous large and large-v2 models, except for the following minor differences: 1. The spectrogram input uses 128 Mel frequency bins instead of 80 2. A new language token for Cantonese The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2 . The model was trained for 2.0 epochs over this mixture dataset. The large-v3 model shows improved performance over a wide variety of languages, showing 10% to 20% reduction of errors compared to Whisper large-v2 . For more details on the different checkpoints available, refer to the section Model details. **Disclaimer**: Content for this model card has partly been written by the 🤗 Hugging Face team, and partly copied and pasted from the original model card. ## Usage Whisper large-v3 is supported in Hugging Face 🤗 Transformers. 
To run the model, first install the Transformers library. For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub, and 🤗 Accelerate to reduce the model loading time. The model can be used with the `pipeline` class to transcribe audio of arbitrary length. To transcribe a local audio file, simply pass the path to your audio file when you call the pipeline. Multiple audio files can be transcribed in parallel by specifying them as a list and setting the `batch_size` parameter. Transformers is compatible with all Whisper decoding strategies, such as temperature fallback and conditioning on previous tokens; these heuristics can be enabled via the generation arguments (see the model + processor sketch below). Whisper predicts the language of the source audio automatically. If the source audio language is known *a priori*, it can be passed as an argument to the pipeline. By default, Whisper performs the task of *speech transcription*, where the source audio language is the same as the target text language. To perform *speech translation*, where the target text is in English, set the task to `\"translate\"`. Finally, the model can be made to predict timestamps: for sentence-level timestamps, pass `return_timestamps=True`, and for word-level timestamps, pass `return_timestamps=\"word\"`. The above arguments can be used in isolation or in combination. For example, to perform the task of speech transcription where the source audio is in French, and we want to return sentence-level timestamps, the following can be used:
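A minimal sketch of the pipeline usage described above for this checkpoint, assuming the standard 🤗 Transformers API:

```python
import torch
from datasets import load_dataset
from transformers import pipeline

pipe = pipeline(
    'automatic-speech-recognition',
    model='openai/whisper-large-v3',
    torch_dtype=torch.float16,
    device='cuda:0',  # or 'cpu'
)

# Toy audio sample loaded with 🤗 Datasets
sample = load_dataset('hf-internal-testing/librispeech_asr_dummy', 'clean', split='validation')[0]['audio']
print(pipe(sample)['text'])

# Local files, batching, known source language, translation and timestamps:
out = pipe(
    ['audio_1.mp3', 'audio_2.mp3'],  # hypothetical local files
    batch_size=2,
    generate_kwargs={'language': 'french', 'task': 'translate'},
    return_timestamps=True,
)
```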
For more control over the generation parameters, use the model + processor API directly:
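A hedged sketch of the model + processor API with the decoding heuristics mentioned above (temperature fallback, conditioning on previous tokens), assuming a long-form sample:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = 'openai/whisper-large-v3'

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dtype).to(device)

# A >30-second sample, so the long-form generation path is exercised
sample = load_dataset('distil-whisper/librispeech_long', 'clean', split='validation')[0]['audio']

inputs = processor(
    sample['array'], sampling_rate=sample['sampling_rate'], return_tensors='pt',
    truncation=False, padding='longest', return_attention_mask=True,
).to(device, torch_dtype)

gen_kwargs = {
    'max_new_tokens': 448,
    'num_beams': 1,
    'condition_on_prev_tokens': False,
    'compression_ratio_threshold': 1.35,
    'temperature': (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # temperature fallback schedule
    'logprob_threshold': -1.0,
    'no_speech_threshold': 0.6,
    'return_timestamps': True,
}
pred_ids = model.generate(**inputs, **gen_kwargs)
print(processor.batch_decode(pred_ids, skip_special_tokens=True)[0])
```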
## Additional Speed & Memory Improvements You can apply additional speed and memory improvements to Whisper to further reduce the inference time and VRAM requirements. ### Chunked Long-Form Whisper has a receptive field of 30 seconds. To transcribe audio longer than this, one of two long-form algorithms is required: 1. **Sequential:** uses a \"sliding window\" for buffered inference, transcribing 30-second slices one after the other 2. **Chunked:** splits long audio files into shorter ones (with a small overlap between segments), transcribes each segment independently, and stitches the resulting transcriptions at the boundaries The sequential long-form algorithm should be used in either of the following scenarios: 1. Transcription accuracy is the most important factor, and speed is less of a consideration 2. You are transcribing **batches** of long audio files, in which case the latency of sequential is comparable to chunked, while being up to 0.5% WER more accurate Conversely, the chunked algorithm should be used when: 1. Transcription speed is the most important factor 2. You are transcribing a **single** long audio file By default, Transformers uses the sequential algorithm. To enable the chunked algorithm, pass the `chunk_length_s` parameter to the `pipeline`. For large-v3, a chunk length of 30 seconds is optimal. To activate batching over long audio files, pass the `batch_size` argument. #### Torch compile The Whisper forward pass is compatible with `torch.compile` for 4.5x speed-ups. **Note:** `torch.compile` is currently not compatible with the chunked long-form algorithm or Flash Attention 2 ⚠️ #### Flash Attention 2 We recommend using Flash Attention 2 if your GPU supports it and you are not using `torch.compile`. To do so, first install Flash Attention (`pip install flash-attn --no-build-isolation`), then pass `attn_implementation=\"flash_attention_2\"` to `from_pretrained`. #### Torch Scaled Dot-Product Attention (SDPA) If your GPU does not support Flash Attention, we recommend making use of PyTorch scaled dot-product attention (SDPA). This attention implementation is activated **by default** for PyTorch versions 2.1.1 or greater. To check whether you have a compatible PyTorch version, run `from transformers.utils import is_torch_sdpa_available; print(is_torch_sdpa_available())`. If this returns `True`, you have a valid version of PyTorch installed and SDPA is activated by default. If it returns `False`, you need to upgrade your PyTorch version according to the official instructions. Once a valid PyTorch version is installed, SDPA is activated by default. It can also be set explicitly by specifying `attn_implementation=\"sdpa\"` in `from_pretrained`. For more information about how to use SDPA, refer to the Transformers SDPA documentation. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. There are two flavours of Whisper model: English-only and multilingual. The English-only models were trained on the task of English speech recognition. The multilingual models were trained simultaneously on multilingual speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language to the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are available as English-only and multilingual. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub.
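Before the checkpoint table, a hedged sketch of the `torch.compile` path described in the speed section above, assuming PyTorch 2.3+ for `torch.nn.attention.sdpa_kernel`:

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

torch.set_float32_matmul_precision('high')
model_id = 'openai/whisper-large-v3'

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch.float16).to('cuda:0')

# Static key-value cache plus a compiled forward pass
model.generation_config.cache_implementation = 'static'
model.forward = torch.compile(model.forward, mode='reduce-overhead', fullgraph=True)

processor = AutoProcessor.from_pretrained(model_id)
pipe = pipeline(
    'automatic-speech-recognition',
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch.float16,
    device='cuda:0',
)

with sdpa_kernel(SDPBackend.MATH):
    for _ in range(2):          # warm-up runs trigger compilation
        pipe('audio.mp3')       # hypothetical sample
    result = pipe('audio.mp3')
print(result['text'])
```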
The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|--------------|--------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | | large-v3 | 1550 M | x | ✓ | ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; use of the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The large-v3 checkpoint is trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language. ## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, and technical language, as well as zero-shot translation from multiple languages into English, and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself.
Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rates across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not perfectly. Further analysis of these limitations is provided in the paper. It is likely that this behavior and hallucinations may be worse in lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "Whisper large-v3 is a state-of-the-art multilingual model for automatic speech recognition (ASR) and speech translation, trained on extensive audio data to transcribe and translate speech across numerous languages with high accuracy."
+} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-large.json b/data/model_data_json/openai_whisper-large.json new file mode 100644 index 0000000000000000000000000000000000000000..cb97c42b703d9ab163a61b941fde3224c102aef5 --- /dev/null +++ b/data/model_data_json/openai_whisper-large.json @@ -0,0 +1,121 @@ +{ + "model_id": "openai/whisper-large", + "downloads": 149459, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - no - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: whisper-large results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 3.0 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 5.4 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 11.0 type: mozilla-foundation/common_voice_11_0 config: hi split: test args: language: hi metrics: - name: Test WER type: wer value: 54.8 pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al from OpenAI. The original code repository can be found here.

Update: following the release of the paper, the Whisper authors announced a large-v2 model trained for 2.5x more epochs with regularization. This large-v2 model surpasses the performance of the large model, with no architecture changes. Thus, it is recommended that the large-v2 model be used in place of the original large model.

**Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language to the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|--------------|--------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage To transcribe audio samples, the model has to be used alongside a `WhisperProcessor`. The `WhisperProcessor` is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) The model is informed of which task to perform (transcription or translation) by passing the appropriate \"context tokens\". These context tokens are a sequence of tokens that are given to the decoder at the start of the decoding process, and take the following order: 1. The transcription always starts with the `<|startoftranscript|>` token 2. The second token is the language token (e.g. `<|en|>` for English) 3. The third token is the \"task token\". It can take one of two values: `<|transcribe|>` for speech recognition or `<|translate|>` for speech translation 4. In addition, a `<|notimestamps|>` token is added if the model should not include timestamp prediction Thus, a typical sequence of context tokens might look as follows: `<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>`, which tells the model to decode in English, under the task of speech recognition, and not to predict timestamps. These tokens can either be forced or un-forced. If they are forced, the model is made to predict each token at each position. This allows one to control the output language and task for the Whisper model. If they are un-forced, the Whisper model will automatically predict the output language and task itself. The context tokens can be set accordingly, which forces the model to predict in English under the task of speech recognition. ## Transcription ### English to English In this example, the context tokens are 'unforced', meaning the model automatically predicts the output language (English) and task (transcribe). The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True` when decoding. ### French to French The following example demonstrates French to French transcription by setting the decoder ids appropriately. ## Translation Setting the task to \"translate\" forces the Whisper model to perform speech translation.
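A minimal sketch of the forced decoder ids pattern described above, here translating French audio to English text (access to the gated Common Voice dataset is assumed):

```python
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained('openai/whisper-large')
model = WhisperForConditionalGeneration.from_pretrained('openai/whisper-large')

# Context tokens for French audio, translation task
forced_decoder_ids = processor.get_decoder_prompt_ids(language='french', task='translate')

# Streaming avoids downloading the full (gated) dataset
ds = load_dataset('mozilla-foundation/common_voice_11_0', 'fr', split='test', streaming=True)
sample = next(iter(ds))['audio']

input_features = processor(
    sample['array'], sampling_rate=sample['sampling_rate'], return_tensors='pt'
).input_features

predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```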
### French to English ## Evaluation Whisper Large can be evaluated on LibriSpeech test-clean using the 🤗 Datasets and Evaluate libraries. ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True` (a sketch follows at the end of this section). Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; use of the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language.
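A sketch of the chunked long-form transcription described in the Long-Form Transcription section above:

```python
from transformers import pipeline

pipe = pipeline(
    'automatic-speech-recognition',
    model='openai/whisper-large',
    chunk_length_s=30,  # enables the chunking algorithm
    device='cuda:0',
)

# Batched inference over 30-second chunks, with sequence-level timestamps
prediction = pipe('long_audio.mp3', batch_size=8, return_timestamps=True)  # hypothetical file
print(prediction['chunks'])
```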
## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, and technical language, as well as zero-shot translation from multiple languages into English, and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rates across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not perfectly. Further analysis of these limitations is provided in the paper. It is likely that this behavior and hallucinations may be worse in lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "Transcribes speech into text and translates spoken content across multiple languages without requiring fine-tuning."
+} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-medium.json b/data/model_data_json/openai_whisper-medium.json new file mode 100644 index 0000000000000000000000000000000000000000..68c793ca0d1dfa28814e2ae9ea5373bcfeda4417 --- /dev/null +++ b/data/model_data_json/openai_whisper-medium.json @@ -0,0 +1,121 @@ +{ + "model_id": "openai/whisper-medium", + "downloads": 498193, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - no - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: whisper-medium results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 2.9 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 5.9 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 11.0 type: mozilla-foundation/common_voice_11_0 config: hi split: test args: language: hi metrics: - name: Test WER type: wer value: 53.87 pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al from OpenAI. The original code repository can be found here. 
**Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language to the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|--------------|--------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage To transcribe audio samples, the model has to be used alongside a `WhisperProcessor`. The `WhisperProcessor` is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) The model is informed of which task to perform (transcription or translation) by passing the appropriate \"context tokens\". These context tokens are a sequence of tokens that are given to the decoder at the start of the decoding process, and take the following order: 1. The transcription always starts with the `<|startoftranscript|>` token 2. The second token is the language token (e.g. `<|en|>` for English) 3. The third token is the \"task token\". It can take one of two values: `<|transcribe|>` for speech recognition or `<|translate|>` for speech translation 4. In addition, a `<|notimestamps|>` token is added if the model should not include timestamp prediction Thus, a typical sequence of context tokens might look as follows: `<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>`, which tells the model to decode in English, under the task of speech recognition, and not to predict timestamps. These tokens can either be forced or un-forced. If they are forced, the model is made to predict each token at each position. This allows one to control the output language and task for the Whisper model. If they are un-forced, the Whisper model will automatically predict the output language and task itself. The context tokens can be set accordingly, which forces the model to predict in English under the task of speech recognition. ## Transcription ### English to English In this example, the context tokens are 'unforced', meaning the model automatically predicts the output language (English) and task (transcribe). The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True` when decoding. ### French to French The following example demonstrates French to French transcription by setting the decoder ids appropriately. ## Translation Setting the task to \"translate\" forces the Whisper model to perform speech translation.
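A minimal sketch of the unforced English-to-English transcription described above:

```python
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained('openai/whisper-medium')
model = WhisperForConditionalGeneration.from_pretrained('openai/whisper-medium')
model.config.forced_decoder_ids = None  # leave the context tokens un-forced

ds = load_dataset('hf-internal-testing/librispeech_asr_dummy', 'clean', split='validation')
sample = ds[0]['audio']
input_features = processor(
    sample['array'], sampling_rate=sample['sampling_rate'], return_tensors='pt'
).input_features

predicted_ids = model.generate(input_features)
# skip_special_tokens=True strips the context tokens from the output
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```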
### French to English ## Evaluation Whisper Medium can be evaluated on LibriSpeech test-clean using the 🤗 Datasets and Evaluate libraries (a sketch follows at the end of this section). ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`. Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; use of the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language.
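A sketch of the LibriSpeech test-clean evaluation referenced in the Evaluation section above, assuming the 🤗 Evaluate library (note that `_normalize` is a private tokenizer helper):

```python
import torch
from datasets import load_dataset
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

librispeech_test_clean = load_dataset('librispeech_asr', 'clean', split='test')
processor = WhisperProcessor.from_pretrained('openai/whisper-medium')
model = WhisperForConditionalGeneration.from_pretrained('openai/whisper-medium').to('cuda')

def map_to_pred(batch):
    audio = batch['audio']
    input_features = processor(
        audio['array'], sampling_rate=audio['sampling_rate'], return_tensors='pt'
    ).input_features
    batch['reference'] = processor.tokenizer._normalize(batch['text'])
    with torch.no_grad():
        predicted_ids = model.generate(input_features.to('cuda'))[0]
    batch['prediction'] = processor.tokenizer._normalize(processor.decode(predicted_ids))
    return batch

result = librispeech_test_clean.map(map_to_pred)
wer = load('wer')
print(100 * wer.compute(references=result['reference'], predictions=result['prediction']))
```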
## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, and technical language, as well as zero-shot translation from multiple languages into English, and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rates across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not perfectly. Further analysis of these limitations is provided in the paper. It is likely that this behavior and hallucinations may be worse in lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "OpenAI Whisper-medium is a multilingual automatic speech recognition model trained to transcribe and translate speech across numerous languages without requiring fine-tuning."
+} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-small.json b/data/model_data_json/openai_whisper-small.json new file mode 100644 index 0000000000000000000000000000000000000000..9ee85f7d1b3f9f62eea28e276a782910cb5fbbc8 --- /dev/null +++ b/data/model_data_json/openai_whisper-small.json @@ -0,0 +1,121 @@ +{ + "model_id": "openai/whisper-small", + "downloads": 2198526, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - no - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: whisper-small results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 3.432213777886737 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 7.628304527060248 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 11.0 type: mozilla-foundation/common_voice_11_0 config: hi split: test args: language: hi metrics: - name: Test WER type: wer value: 87.3 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 13.0 type: mozilla-foundation/common_voice_13_0 config: dv split: test args: language: dv metrics: - name: Wer type: wer value: 125.69809089960707 pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. 
Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found here. **Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language to the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|--------------|--------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage To transcribe audio samples, the model has to be used alongside a `WhisperProcessor`. The `WhisperProcessor` is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) The model is informed of which task to perform (transcription or translation) by passing the appropriate \"context tokens\". These context tokens are a sequence of tokens that are given to the decoder at the start of the decoding process, and take the following order: 1. The transcription always starts with the `<|startoftranscript|>` token 2. The second token is the language token (e.g. `<|en|>` for English) 3. The third token is the \"task token\". It can take one of two values: `<|transcribe|>` for speech recognition or `<|translate|>` for speech translation 4. In addition, a `<|notimestamps|>` token is added if the model should not include timestamp prediction Thus, a typical sequence of context tokens might look as follows: `<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|>`, which tells the model to decode in English, under the task of speech recognition, and not to predict timestamps. These tokens can either be forced or un-forced. If they are forced, the model is made to predict each token at each position. This allows one to control the output language and task for the Whisper model. If they are un-forced, the Whisper model will automatically predict the output language and task itself. The context tokens can be set accordingly, which forces the model to predict in English under the task of speech recognition. ## Transcription ### English to English In this example, the context tokens are 'unforced', meaning the model automatically predicts the output language (English) and task (transcribe). The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True` when decoding. ### French to French The following example demonstrates French to French transcription by setting the decoder ids appropriately.
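A minimal sketch of the French-to-French transcription described above, with the decoder ids forced accordingly (access to the gated Common Voice dataset is assumed):

```python
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained('openai/whisper-small')
model = WhisperForConditionalGeneration.from_pretrained('openai/whisper-small')

# Context tokens for French audio, transcription task
forced_decoder_ids = processor.get_decoder_prompt_ids(language='french', task='transcribe')

# Streaming avoids downloading the full (gated) dataset
ds = load_dataset('mozilla-foundation/common_voice_11_0', 'fr', split='test', streaming=True)
sample = next(iter(ds))['audio']
input_features = processor(
    sample['array'], sampling_rate=sample['sampling_rate'], return_tensors='pt'
).input_features

predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```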
## Translation Setting the task to \"translate\" forces the Whisper model to perform speech translation. ### French to English ## Evaluation Whisper Small can be evaluated on LibriSpeech test-clean using the 🤗 Datasets and Evaluate libraries. ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True` (a sketch follows at the end of this section). Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; use of the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language.
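A sketch of the chunked long-form transcription described in the Long-Form Transcription section above:

```python
from transformers import pipeline

pipe = pipeline(
    'automatic-speech-recognition',
    model='openai/whisper-small',
    chunk_length_s=30,  # enables the chunking algorithm
    device='cuda:0',
)

# Batched chunks plus sequence-level timestamps
out = pipe('long_audio.mp3', batch_size=8, return_timestamps=True)  # hypothetical file
print(out['chunks'])
```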
## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, technical language, as well as zero-shot translation from multiple languages into English; and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rates across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not perfectly. Further analysis of these limitations is provided in the paper. It is likely that this behavior and hallucinations may be worse on lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "The 'openai_whisper-small' model performs automatic speech recognition and speech translation across multiple languages, trained on extensive labeled data to generalize well without fine-tuning."
+} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-tiny.en.json b/data/model_data_json/openai_whisper-tiny.en.json new file mode 100644 index 0000000000000000000000000000000000000000..846aa80069865dfbbffadf4d1f54017af1c5527e --- /dev/null +++ b/data/model_data_json/openai_whisper-tiny.en.json @@ -0,0 +1,23 @@ +{ + "model_id": "openai/whisper-tiny.en", + "downloads": 142544, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "arxiv:2212.04356", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: whisper-tiny.en results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 8.4372112320138 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 14.857607503498355 pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found here. **Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language from the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|------------------------------------------------------|-----------------------------------------------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage This checkpoint is an *English-only* model, meaning it can be used for English speech recognition.
Multilingual speech recognition or speech translation is possible through the use of a multilingual checkpoint. To transcribe audio samples, the model has to be used alongside a []( The is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) ## Transcription The context tokens can be removed from the start of the transcription by setting . ## Evaluation This code snippet shows how to evaluate Whisper tiny.en on LibriSpeech test-clean: ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through Transformers []( method. Chunking is enabled by setting when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing : Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; use of the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages.
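A hedged sketch of the LibriSpeech test-clean evaluation referenced above, assuming the datasets and evaluate libraries; the private `_normalize` helper follows the convention of Whisper's tokenizer:

```python
# Hedged sketch: compute word error rate (WER) for whisper-tiny.en on LibriSpeech test-clean.
import torch
from datasets import load_dataset
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

librispeech = load_dataset("librispeech_asr", "clean", split="test")
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en").to("cuda")

def map_to_pred(batch):
    audio = batch["audio"]
    features = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    ).input_features
    # Normalize references and predictions the same way before scoring.
    batch["reference"] = processor.tokenizer._normalize(batch["text"])
    with torch.no_grad():
        predicted_ids = model.generate(features.to("cuda"))[0]
    batch["prediction"] = processor.tokenizer._normalize(processor.decode(predicted_ids))
    return batch

result = librispeech.map(map_to_pred)
wer = load("wer")
print(100 * wer.compute(references=result["reference"], predictions=result["prediction"]))
```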
As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language. ## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, technical language, as well as zero-shot translation from multiple languages into English; and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rates across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not perfectly. Further analysis of these limitations is provided in the paper. It is likely that this behavior and hallucinations may be worse on lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "Transcribes English speech into text using automatic speech recognition without requiring fine-tuning."
+} \ No newline at end of file diff --git a/data/model_data_json/openai_whisper-tiny.json b/data/model_data_json/openai_whisper-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..e1567e042e408e95898614d9f751d5796e3f49ba --- /dev/null +++ b/data/model_data_json/openai_whisper-tiny.json @@ -0,0 +1,121 @@ +{ + "model_id": "openai/whisper-tiny", + "downloads": 298976, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "safetensors", + "whisper", + "automatic-speech-recognition", + "audio", + "hf-asr-leaderboard", + "en", + "zh", + "de", + "es", + "ru", + "ko", + "fr", + "ja", + "pt", + "tr", + "pl", + "ca", + "nl", + "ar", + "sv", + "it", + "id", + "hi", + "fi", + "vi", + "he", + "uk", + "el", + "ms", + "cs", + "ro", + "da", + "hu", + "ta", + "no", + "th", + "ur", + "hr", + "bg", + "lt", + "la", + "mi", + "ml", + "cy", + "sk", + "te", + "fa", + "lv", + "bn", + "sr", + "az", + "sl", + "kn", + "et", + "mk", + "br", + "eu", + "is", + "hy", + "ne", + "mn", + "bs", + "kk", + "sq", + "sw", + "gl", + "mr", + "pa", + "si", + "km", + "sn", + "yo", + "so", + "af", + "oc", + "ka", + "be", + "tg", + "sd", + "gu", + "am", + "yi", + "lo", + "uz", + "fo", + "ht", + "ps", + "tk", + "nn", + "mt", + "sa", + "lb", + "my", + "bo", + "tl", + "mg", + "as", + "tt", + "haw", + "ln", + "ha", + "ba", + "jw", + "su", + "arxiv:2212.04356", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en - zh - de - es - ru - ko - fr - ja - pt - tr - pl - ca - nl - ar - sv - it - id - hi - fi - vi - he - uk - el - ms - cs - ro - da - hu - ta - no - th - ur - hr - bg - lt - la - mi - ml - cy - sk - te - fa - lv - bn - sr - az - sl - kn - et - mk - br - eu - is - hy - ne - mn - bs - kk - sq - sw - gl - mr - pa - si - km - sn - yo - so - af - oc - ka - be - tg - sd - gu - am - yi - lo - uz - fo - ht - ps - tk - nn - mt - sa - lb - my - bo - tl - mg - as - tt - haw - ln - ha - ba - jw - su tags: - audio - automatic-speech-recognition - hf-asr-leaderboard widget: - example_title: Librispeech sample 1 src: - example_title: Librispeech sample 2 src: model-index: - name: whisper-tiny results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (clean) type: librispeech_asr config: clean split: test args: language: en metrics: - name: Test WER type: wer value: 7.54 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: LibriSpeech (other) type: librispeech_asr config: other split: test args: language: en metrics: - name: Test WER type: wer value: 17.15 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 11.0 type: mozilla-foundation/common_voice_11_0 config: hi split: test args: language: hi metrics: - name: Test WER type: wer value: 141 pipeline_tag: automatic-speech-recognition license: apache-2.0 --- # Whisper Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found here.
**Disclaimer**: Content for this model card has partly been written by the Hugging Face team, and parts of it were copied and pasted from the original model card. ## Model details Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the *same* language as the audio. For speech translation, the model predicts transcriptions in a *different* language from the audio. Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarised in the following table with links to the models on the Hub: | Size | Parameters | English-only | Multilingual | |----------|------------|------------------------------------------------------|-----------------------------------------------------| | tiny | 39 M | ✓ | ✓ | | base | 74 M | ✓ | ✓ | | small | 244 M | ✓ | ✓ | | medium | 769 M | ✓ | ✓ | | large | 1550 M | x | ✓ | | large-v2 | 1550 M | x | ✓ | # Usage To transcribe audio samples, the model has to be used alongside a []( The is used to: 1. Pre-process the audio inputs (converting them to log-Mel spectrograms for the model) 2. Post-process the model outputs (converting them from tokens to text) The model is informed of which task to perform (transcription or translation) by passing the appropriate \"context tokens\". These context tokens are a sequence of tokens that are given to the decoder at the start of the decoding process, and take the following order: 1. The transcription always starts with the token 2. The second token is the language token (e.g. for English) 3. The third token is the \"task token\". It can take one of two values: for speech recognition or for speech translation 4. In addition, a token is added if the model should not include timestamp prediction Thus, a typical sequence of context tokens might look as follows: Which tells the model to decode in English, under the task of speech recognition, and not to predict timestamps. These tokens can either be forced or un-forced. If they are forced, the model is made to predict each token at each position. This allows one to control the output language and task for the Whisper model. If they are un-forced, the Whisper model will automatically predict the output language and task itself. The context tokens can be set accordingly: Which forces the model to predict in English under the task of speech recognition. ## Transcription ### English to English In this example, the context tokens are 'unforced', meaning the model automatically predicts the output language (English) and task (transcribe). The context tokens can be removed from the start of the transcription by setting . ### French to French The following example demonstrates French to French transcription by setting the decoder ids appropriately. ## Translation Setting the task to \"translate\" forces the Whisper model to perform speech translation.
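A minimal sketch of forcing the translate task, assuming the standard transformers Whisper API; "audio_fr.wav" is a hypothetical 16 kHz French recording:

```python
# Hedged sketch: French speech in, English text out, by forcing task="translate".
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="translate")

speech, sr = librosa.load("audio_fr.wav", sr=16_000)
input_features = processor(speech, sampling_rate=sr, return_tensors="pt").input_features

predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])  # English text
```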
### French to English ## Evaluation This code snippet shows how to evaluate Whisper Tiny on LibriSpeech test-clean: ## Long-Form Transcription The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through Transformers []( method. Chunking is enabled by setting when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing : Refer to the blog post ASR Chunking for more details on the chunking algorithm. ## Fine-Tuning The pre-trained Whisper model demonstrates a strong ability to generalise to different datasets and domains. However, its predictive capabilities can be improved further for certain languages and tasks through *fine-tuning*. The blog post Fine-Tune Whisper with 🤗 Transformers provides a step-by-step guide to fine-tuning the Whisper model with as little as 5 hours of labelled data. ### Evaluated Use The primary intended users of these models are AI researchers studying robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research. The models are primarily trained and evaluated on ASR and speech translation to English tasks. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them. In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; use of the model for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes. ## Training Data The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. As discussed in the accompanying paper, we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language.
## Performance and Limitations Our studies show that, over many existing ASR systems, the models exhibit improved robustness to accents, background noise, technical language, as well as zero-shot translation from multiple languages into English; and that accuracy on speech recognition and translation is near the state-of-the-art level. However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself. Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include higher word error rates across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in the paper accompanying this release. In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not perfectly. Further analysis of these limitations is provided in the paper. It is likely that this behavior and hallucinations may be worse on lower-resource and/or lower-discoverability languages. ## Broader Implications We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications. There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects. ### BibTeX entry and citation info", + "model_explanation_gemini": "The 'openai_whisper-tiny' model performs automatic speech recognition and speech translation across multiple languages, trained to transcribe or translate audio without requiring fine-tuning."
+} \ No newline at end of file diff --git a/data/model_data_json/openart-custom_AlbedoBase.json b/data/model_data_json/openart-custom_AlbedoBase.json new file mode 100644 index 0000000000000000000000000000000000000000..38406232b9d49408503ff54238af659b6b096195 --- /dev/null +++ b/data/model_data_json/openart-custom_AlbedoBase.json @@ -0,0 +1,15 @@ +{ + "model_id": "openart-custom/AlbedoBase", + "downloads": 165421, + "tags": [ + "diffusers", + "safetensors", + "arxiv:1910.09700", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- library_name: diffusers --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🧨 diffusers pipeline that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]", + "model_explanation_gemini": "A diffusers pipeline model for generating images, though specific details about its functionality and purpose are unspecified." 
+} \ No newline at end of file diff --git a/data/model_data_json/openart-custom_CrystalClearXL.json b/data/model_data_json/openart-custom_CrystalClearXL.json new file mode 100644 index 0000000000000000000000000000000000000000..14061a34a1a8baa459000a6e343f7ad876cc7106 --- /dev/null +++ b/data/model_data_json/openart-custom_CrystalClearXL.json @@ -0,0 +1,14 @@ +{ + "model_id": "openart-custom/CrystalClearXL", + "downloads": 81938, + "tags": [ + "diffusers", + "safetensors", + "arxiv:1910.09700", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- library_name: diffusers --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🧨 diffusers pipeline that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). 
- **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]" +} \ No newline at end of file diff --git a/data/model_data_json/openart-custom_DucHaiten-AIart-SDXL_v3.json b/data/model_data_json/openart-custom_DucHaiten-AIart-SDXL_v3.json new file mode 100644 index 0000000000000000000000000000000000000000..486fa53a6829df566cbe792d15a160ef1e705963 --- /dev/null +++ b/data/model_data_json/openart-custom_DucHaiten-AIart-SDXL_v3.json @@ -0,0 +1,15 @@ +{ + "model_id": "openart-custom/DucHaiten-AIart-SDXL_v3", + "downloads": 122360, + "tags": [ + "diffusers", + "safetensors", + "arxiv:1910.09700", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- library_name: diffusers --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🧨 diffusers pipeline that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). 
- **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]", + "model_explanation_gemini": "Generates AI art using a diffusers pipeline, likely based on Stable Diffusion XL (SDXL) architecture." +} \ No newline at end of file diff --git a/data/model_data_json/openbmb_MiniCPM-V-2_6-int4.json b/data/model_data_json/openbmb_MiniCPM-V-2_6-int4.json new file mode 100644 index 0000000000000000000000000000000000000000..a0eb8a174dab6c9bdcbbd53949c4f9f9e2b72085 --- /dev/null +++ b/data/model_data_json/openbmb_MiniCPM-V-2_6-int4.json @@ -0,0 +1,27 @@ +{ + "model_id": "openbmb/MiniCPM-V-2_6-int4", + "downloads": 118527, + "tags": [ + "transformers", + "safetensors", + "minicpmv", + "feature-extraction", + "minicpm-v", + "vision", + "ocr", + "multi-image", + "video", + "custom_code", + "image-text-to-text", + "conversational", + "multilingual", + "dataset:openbmb/RLAIF-V-Dataset", + "base_model:openbmb/MiniCPM-V-2_6", + "base_model:quantized:openbmb/MiniCPM-V-2_6", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- pipeline_tag: image-text-to-text datasets: - openbmb/RLAIF-V-Dataset library_name: transformers language: - multilingual tags: - minicpm-v - vision - ocr - multi-image - video - custom_code base_model: - openbmb/MiniCPM-V-2_6 --- ## News * [2025.01.14] 🔥🔥 We open source **MiniCPM-o 2.6**, with significant performance improvements over **MiniCPM-V 2.6**, and support for real-time speech-to-speech conversation and multimodal live streaming. Try it now. ## MiniCPM-V 2.6 int4 This is the int4 quantized version of MiniCPM-V 2.6. Running the int4 version uses less GPU memory (about 7 GB). ## Usage Inference uses Hugging Face transformers on NVIDIA GPUs. Requirements tested on Python 3.10:", + "model_explanation_gemini": "Performs image-text-to-text tasks including vision, OCR, multi-image, and video processing with multilingual support, optimized for lower GPU memory usage in its int4 quantized version."
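A hedged sketch of the int4 usage described above, assuming the model's custom chat API exposed via trust_remote_code; "example.jpg" is a hypothetical local image:

```python
# Hedged sketch: single-image chat with the int4 quantized MiniCPM-V 2.6.
# The chat()/msgs interface is the custom code path shipped with the model;
# treat the exact signature as an assumption.
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("openbmb/MiniCPM-V-2_6-int4", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-V-2_6-int4", trust_remote_code=True)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # hypothetical local image
msgs = [{"role": "user", "content": [image, "What is in this image?"]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```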
+} \ No newline at end of file diff --git a/data/model_data_json/openbmb_MiniCPM-o-2_6.json b/data/model_data_json/openbmb_MiniCPM-o-2_6.json new file mode 100644 index 0000000000000000000000000000000000000000..048d317154a1a654abc12c57317b77d53504760e --- /dev/null +++ b/data/model_data_json/openbmb_MiniCPM-o-2_6.json @@ -0,0 +1,32 @@ +{ + "model_id": "openbmb/MiniCPM-o-2_6", + "downloads": 229680, + "tags": [ + "transformers", + "safetensors", + "minicpmo", + "feature-extraction", + "minicpm-o", + "omni", + "vision", + "ocr", + "multi-image", + "video", + "custom_code", + "audio", + "speech", + "voice cloning", + "live Streaming", + "realtime speech conversation", + "asr", + "tts", + "any-to-any", + "multilingual", + "dataset:openbmb/RLAIF-V-Dataset", + "arxiv:2405.17220", + "arxiv:2408.01800", + "region:us" + ], + "description": "--- pipeline_tag: any-to-any datasets: - openbmb/RLAIF-V-Dataset library_name: transformers language: - multilingual tags: - minicpm-o - omni - vision - ocr - multi-image - video - custom_code - audio - speech - voice cloning - live Streaming - realtime speech conversation - asr - tts ---

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

GitHub | Online Demo | Technical Blog ### News * [2025.03.01] 🚀🚀🚀 RLAIF-V, which is the alignment technique of MiniCPM-o, is accepted by CVPR 2025! The code, dataset, and paper are open-sourced! * [2025.01.24] 📢📢📢 MiniCPM-o 2.6 technical report is released! See Here. * [2025.01.19] ⭐️⭐️⭐️ MiniCPM-o tops GitHub Trending and reaches top-2 on Hugging Face Trending! ## MiniCPM-o 2.6 **MiniCPM-o 2.6** is the latest and most capable model in the MiniCPM-o series. The model is built in an end-to-end fashion based on SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.6, and introduces new features for real-time speech conversation and multimodal live streaming. Notable features of MiniCPM-o 2.6 include: - 🔥 **Leading Visual Capability.** MiniCPM-o 2.6 achieves an average score of 70.2 on OpenCompass, a comprehensive evaluation over 8 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-202405, Gemini 1.5 Pro, and Claude 3.5 Sonnet** for single image understanding. It also **outperforms GPT-4V and Claude 3.5 Sonnet** in multi-image and video understanding, and shows promising in-context learning capability. - 🎙 **State-of-the-art Speech Capability.** MiniCPM-o 2.6 supports **bilingual real-time speech conversation with configurable voices** in English and Chinese. It **outperforms GPT-4o-realtime on audio understanding tasks** such as ASR and STT translation, and shows **state-of-the-art performance on speech conversation in both semantic and acoustic evaluations in the open-source community**. It also allows for fun features such as emotion/speed/style control, end-to-end voice cloning, role play, etc. - 🎬 **Strong Multimodal Live Streaming Capability.** As a new feature, MiniCPM-o 2.6 can **accept continuous video and audio streams independent of user queries, and support real-time speech interaction**. It **outperforms GPT-4o-202408 and Claude 3.5 Sonnet and shows state-of-the-art performance in the open-source community on StreamingBench**, a comprehensive benchmark for real-time video understanding, omni-source (video & audio) understanding, and multimodal contextual understanding. - 💪 **Strong OCR Capability and Others.** Advancing the popular visual capabilities of the MiniCPM-V series, MiniCPM-o 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves **state-of-the-art performance on OCRBench for models under 25B, surpassing proprietary models such as GPT-4o-202405**. Based on the latest RLAIF-V and VisCPM techniques, it features **trustworthy behaviors**, outperforming GPT-4o and Claude 3.5 Sonnet on MMHal-Bench, and supports **multilingual capabilities** in more than 30 languages. - 🚀 **Superior Efficiency.** In addition to its friendly size, MiniCPM-o 2.6 also shows **state-of-the-art token density** (i.e., number of pixels encoded into each visual token). **It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models**. This directly improves the inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-o 2.6 can efficiently support **multimodal live streaming** on end-side devices such as the iPad.
- 💫 **Easy Usage.** MiniCPM-o 2.6 can be easily used in various ways: (1) llama.cpp support for efficient CPU inference on local devices, (2) int4 and GGUF format quantized models in 16 sizes, (3) vLLM support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with LLaMA-Factory, (5) quick local WebUI demo setup with Gradio, and (6) online web demo on server. **Model Architecture.** - **End-to-end Omni-modal Architecture.** Different modality encoder/decoders are connected and trained in an **end-to-end** fashion to fully exploit rich multimodal knowledge. - **Omni-modal Live Streaming Mechanism.** (1) We change the offline modality encoder/decoders into online ones for **streaming inputs/outputs.** (2) We devise a **time-division multiplexing (TDM) mechanism** for omni-modality streaming processing in the LLM backbone. It divides parallel omni-modality streams into sequential information within small periodic time slices. - **Configurable Speech Modeling Design.** We devise a multimodal system prompt, including a traditional text system prompt, and **a new audio system prompt to determine the assistant voice**. This enables flexible voice configurations at inference time, and also facilitates end-to-end voice cloning and description-based voice creation.
### Evaluation
#### Visual understanding results **Image Understanding:**
| Model | Size | Token Density+ | OpenCompass | OCRBench | MathVista mini | ChartQA | MMVet | MMStar | MME | MMB1.1 test | AI2D | MMMU val | HallusionBench | TextVQA val | DocVQA test | MathVerse mini | MathVision | MMHal Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Proprietary** | | | | | | | | | | | | | | | | | | |
| GPT-4o-20240513 | - | 1088 | 69.9 | 736 | 61.3 | 85.7 | 69.1 | 63.9 | 2328.7 | 82.2 | 84.6 | 69.2 | 55.0 | - | 92.8 | 50.2 | 30.4 | 3.6 |
| Claude3.5-Sonnet | - | 750 | 67.9 | 788 | 61.6 | 90.8 | 66.0 | 62.2 | 1920.0 | 78.5 | 80.2 | 65.9 | 49.9 | - | 95.2 | - | - | 3.4 |
| Gemini 1.5 Pro | - | - | 64.4 | 754 | 57.7 | 81.3 | 64.0 | 59.1 | 2110.6 | 73.9 | 79.1 | 60.6 | 45.6 | 73.5 | 86.5 | - | 19.2 | - |
| GPT-4o-mini-20240718 | - | 1088 | 64.1 | 785 | 52.4 | - | 66.9 | 54.8 | 2003.4 | 76.0 | 77.8 | 60.0 | 46.1 | - | - | - | - | 3.3 |
| **Open Source** | | | | | | | | | | | | | | | | | | |
| Cambrian-34B | 34B | 1820 | 58.3 | 591 | 50.3 | 75.6 | 53.2 | 54.2 | 2049.9 | 77.8 | 79.5 | 50.4 | 41.6 | 76.7 | 75.5 | - | - | - |
| GLM-4V-9B | 13B | 784 | 59.1 | 776 | 51.1 | - | 58.0 | 54.8 | 2018.8 | 67.9 | 71.2 | 46.9 | 45.0 | - | - | - | - | - |
| Pixtral-12B | 12B | 256 | 61.0 | 685 | 56.9 | 81.8 | 58.5 | 54.5 | - | 72.7 | 79.0 | 51.1 | 47.0 | 75.7 | 90.7 | - | - | - |
| DeepSeek-VL2-27B (4B) | 27B | 672 | 66.4 | 809 | 63.9 | 86.0 | 60.0 | 61.9 | 2253.0 | 81.2 | 83.8 | 54.0 | 45.3 | 84.2 | 93.3 | - | - | 3.0 |
| Qwen2-VL-7B | 8B | 784 | 67.1 | 866 | 58.2 | 83.0 | 62.0 | 60.7 | 2326.0 | 81.8 | 83.0 | 54.1 | 50.6 | 84.3 | 94.5 | 31.9 | 16.3 | 3.2 |
| LLaVA-OneVision-72B | 72B | 182 | 68.1 | 741 | 67.5 | 83.7 | 60.6 | 65.8 | 2261.0 | 85.0 | 85.6 | 56.8 | 49.0 | 80.5 | 91.3 | 39.1 | - | 3.5 |
| InternVL2.5-8B | 8B | 706 | 68.3 | 822 | 64.4 | 84.8 | 62.8 | 62.8 | 2344.0 | 83.6 | 84.5 | 56.0 | 50.1 | 79.1 | 93.0 | 39.5 | 19.7 | 3.4 |
| MiniCPM-V 2.6 | 8B | 2822 | 65.2 | 852* | 60.6 | 79.4 | 60.0 | 57.5 | 2348.4* | 78.0 | 82.1 | 49.8* | 48.1* | 80.1 | 90.8 | 25.7 | 18.3 | 3.6 |
| MiniCPM-o 2.6 | 8B | 2822 | 70.2 | 897* | 71.9* | 86.9* | 67.5 | 64.0 | 2372.0* | 80.5 | 85.8 | 50.4* | 51.9 | 82.0 | 93.5 | 41.4* | 23.1* | 3.8 |
* We evaluate this benchmark using chain-of-thought prompting. Specifically, for MME, we used this technique only for the Cognition set. + Token Density: number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens. Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation. **Multi-image and Video Understanding:**
| Model | Size | BLINK val | Mantis Eval | MIRB | Video-MME (wo / w subs) |
|---|---|---|---|---|---|
| **Proprietary** | | | | | |
| GPT-4o-20240513 | - | 68.0 | - | - | 71.9 / 77.2 |
| GPT4V | - | 54.6 | 62.7 | 53.1 | 59.9 / 63.3 |
| **Open-source** | | | | | |
| LLaVA-NeXT-Interleave 14B | 14B | 52.6 | 66.4 | 30.2 | - |
| LLaVA-OneVision-72B | 72B | 55.4 | 77.6 | - | 66.2 / 69.5 |
| MANTIS 8B | 8B | 49.1 | 59.5 | 34.8 | - |
| Qwen2-VL-7B | 8B | 53.2 | 69.6* | 67.6* | 63.3 / 69.0 |
| InternVL2.5-8B | 8B | 54.8 | 67.7 | 52.5 | 64.2 / 66.9 |
| MiniCPM-V 2.6 | 8B | 53.0 | 69.1 | 53.8 | 60.9 / 63.6 |
| MiniCPM-o 2.6 | 8B | 56.7 | 71.9 | 58.6 | 63.9 / 67.9 |
* We evaluate officially released checkpoints by ourselves.
#### Audio understanding and speech conversation results. **Audio Understanding:**
| Model | Size | AISHELL-1 (CER↓) | Fleurs zh (CER↓) | WenetSpeech test-net (CER↓) | LibriSpeech test-clean (WER↓) | GigaSpeech (WER↓) | TED-LIUM (WER↓) | CoVoST en2zh (BLEU↑) | CoVoST zh2en (BLEU↑) | MELD emotion (ACC↑) |
|---|---|---|---|---|---|---|---|---|---|---|
| **Proprietary** | | | | | | | | | | |
| GPT-4o-Realtime | - | 7.3* | 5.4* | 28.9* | 2.6* | 12.9* | 4.8* | 37.1* | 15.7* | 33.2* |
| Gemini 1.5 Pro | - | 4.5* | 5.9* | 14.3* | 2.9* | 10.6* | 3.0* | 47.3* | 22.6* | 48.4* |
| **Open-Source** | | | | | | | | | | |
| Qwen2-Audio-7B | 8B | - | 7.5 | - | 1.6 | - | - | 45.2 | 24.4 | 55.3 |
| Qwen2-Audio-7B-Instruct | 8B | 2.6* | 6.9* | 10.3* | 3.1* | 9.7* | 5.9* | 39.5* | 22.9* | 17.4* |
| GLM-4-Voice-Base | 9B | 2.5 | - | - | 2.8 | - | - | - | - | |
| MiniCPM-o 2.6 | 8B | 1.6 | 4.4 | 6.9 | 1.7 | 8.7 | 3.0 | 48.2 | 27.2 | 52.4 |
* We evaluate officially released checkpoints by ourselves.

**Speech Generation:**
| Model | Size | Speech Llama Q. (ACC↑) | Speech Web Q. (ACC↑) | Speech Trivia QA (ACC↑) | Speech AlpacaEval (G-Eval, 10 point↑) | AudioArena (Semantic ELO↑) | AudioArena (Acoustic ELO↑) | AudioArena (Overall ELO↑) | AudioArena (UTMOS↑) | AudioArena (ASR-WER↓) |
|---|---|---|---|---|---|---|---|---|---|---|
| **Proprietary** | | | | | | | | | | |
| GPT-4o-Realtime | - | 71.7 | 51.6 | 69.7 | 7.4 | 1157 | 1203 | 1200 | 4.2 | 2.3 |
| **Open-Source** | | | | | | | | | | |
| GLM-4-Voice | 9B | 50.0 | 32.0 | 36.4 | 5.1 | 999 | 1147 | 1035 | 4.1 | 11.7 |
| Llama-Omni | 8B | 45.3 | 22.9 | 10.7 | 3.9 | 960 | 878 | 897 | 3.2 | 24.3 |
| Moshi | 7B | 43.7 | 23.8 | 16.7 | 2.4 | 871 | 808 | 875 | 2.8 | 8.2 |
| Mini-Omni | 1B | 22.0 | 12.8 | 6.9 | 2.5 | 926 | 803 | 865 | 3.4 | 10.0 |
| MiniCPM-o 2.6 | 8B | 61.0 | 40.0 | 40.2 | 5.1 | 1088 | 1163 | 1131 | 4.2 | 9.8 |
All results are from AudioEvals, and the evaluation methods along with further details can be found in UltraEval-Audio.

**End-to-end Voice Cloning**
| Model | Seed-TTS test-zh (SIMO↑) | Seed-TTS test-en (SIMO↑) |
|---|---|---|
| F5-TTS | 76 | 67 |
| CosyVoice | 75 | 64 |
| FireRedTTS | 63 | 46 |
| MiniCPM-o 2.6 | 57 | 47 |
#### Multimodal live streaming results. **Multimodal Live Streaming:** results on StreamingBench
| Model | Size | Real-Time Video Understanding | Omni-Source Understanding | Contextual Understanding | Overall |
|---|---|---|---|---|---|
| **Proprietary** | | | | | |
| Gemini 1.5 Pro | - | 77.4 | 67.8 | 51.1 | 70.3 |
| GPT-4o-202408 | - | 74.5 | 51.0 | 48.0 | 64.1 |
| Claude-3.5-Sonnet | - | 74.0 | 41.4 | 37.8 | 59.7 |
| **Open-source** | | | | | |
| VILA-1.5 | 8B | 61.5 | 37.5 | 26.7 | 49.5 |
| LongVA | 7B | 63.1 | 35.9 | 30.2 | 50.7 |
| LLaVA-Next-Video-34B | 34B | 69.8 | 41.7 | 34.3 | 56.7 |
| Qwen2-VL-7B | 8B | 71.2 | 40.7 | 33.1 | 57.0 |
| InternVL2-8B | 8B | 70.1 | 42.7 | 34.1 | 57.0 |
| VITA-1.5 | 8B | 70.9 | 40.8 | 35.8 | 57.4 |
| LLaVA-OneVision-7B | 8B | 74.3 | 40.8 | 31.0 | 58.4 |
| InternLM-XC2.5-OL-7B | 8B | 75.4 | 46.2 | 33.6 | 60.8 |
| MiniCPM-V 2.6 | 8B | 72.4 | 40.2 | 33.4 | 57.7 |
| MiniCPM-o 2.6 | 8B | 79.9 | 53.4 | 38.5 | 66.0 |
### Examples We deploy MiniCPM-o 2.6 on end devices. The demo videos are raw-speed recordings on an iPad Pro and a web demo.

\"math\" \"diagram\" \"bike\"
## Online Demo Click here to try the online demo of MiniCPM-o 2.6. ## Usage Inference uses Hugging Face transformers on NVIDIA GPUs. Please ensure that is installed, as other versions may have compatibility issues. We are investigating this issue. Requirements tested on Python 3.10: ### Model initialization If you are using an older version of PyTorch, you might encounter this issue ; please convert the TTS module to float32. ### Omni mode We provide two inference modes: chat and streaming. #### Chat inference #### Streaming inference ### Speech and Audio Mode Model initialization
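A hedged sketch of the initialization step above; the init_vision/init_audio/init_tts flags follow the project README and should be treated as assumptions:

```python
# Hedged sketch: load MiniCPM-o 2.6 with the vision, audio, and TTS modules enabled.
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    init_vision=True,   # SigLip-based vision encoder (assumed flag)
    init_audio=True,    # Whisper-based audio encoder (assumed flag)
    init_tts=True,      # ChatTTS-based speech decoder (assumed flag)
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-o-2_6", trust_remote_code=True)

# Load TTS weights; on older PyTorch versions, convert the TTS module to float32.
model.init_tts()
```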
#### Mimick The mimick task reflects a model's end-to-end speech modeling capability: the model takes audio input, outputs an ASR transcription, and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
#### General Speech Conversation with Configurable Voices A general usage scenario of is role-playing a specific character based on the audio prompt. It will mimic the voice of the character to some extent and act like the character in text, including language style. In this mode, sounds **more natural and human-like**. Self-defined audio prompts can be used to customize the voice of the character in an end-to-end manner.
#### Speech Conversation as an AI Assistant An enhanced feature of is to act as an AI assistant, but only with a limited choice of voices. In this mode, is **less human-like and more like a voice assistant**. In this mode, the model is more instruction-following. For demos, we suggest using , , and . Other voices may work but are not as stable as the default voices. *Please note that and are more stable but sound like robots, while is more human-like but not stable; its voice often changes across multiple turns. We suggest you try the stable voices and .*
#### Instruction-to-Speech can also do Instruction-to-Speech, aka **Voice Creation**. You can describe a voice in detail, and the model will generate a voice that matches the description. For more Instruction-to-Speech sample instructions, you can refer to
#### Voice Cloning can also do zero-shot text-to-speech, aka **Voice Cloning**. In this mode, the model acts like a TTS model.
#### Addressing Various Audio Understanding Tasks can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging. For audio-to-text tasks, you can use the following prompts: - ASR with ZH (same as AST en2zh): - ASR with EN (same as AST zh2en): - Speaker Analysis: - General Audio Caption: - General Sound Scene Tagging: ### Vision-Only mode has the same inference methods as #### Chat with single image #### Chat with multiple images
Click to show Python code running MiniCPM-o 2.6 with multiple images input.
#### In-context few-shot learning
Click to view Python code running MiniCPM-o 2.6 with few-shot input.
#### Chat with video
Click to view Python code running MiniCPM-o 2.6 with video input.
Please look at GitHub for more details about usage. ## Inference with llama.cpp MiniCPM-o 2.6 (vision-only mode) can run with llama.cpp. See our fork of llama.cpp and readme for more details. ## Int4 quantized version Download the int4 quantized version for lower GPU memory (7 GB) usage: MiniCPM-o-2_6-int4. ## License #### Model License * The code in this repo is released under the Apache-2.0 License. * The usage of MiniCPM-o and MiniCPM-V series model weights must strictly follow MiniCPM Model License.md. * The models and weights of MiniCPM are completely free for academic research. After filling out a \"questionnaire\" for registration, MiniCPM-o 2.6 weights are also available for free commercial use. #### Statement * As an LMM, MiniCPM-o 2.6 generates content by learning from a large amount of multimodal corpora, but it cannot comprehend, express personal opinions, or make value judgements. Anything generated by MiniCPM-o 2.6 does not represent the views and positions of the model developers. * We will not be liable for any problems arising from the use of the MiniCPM-V models, including but not limited to data security issues, risks of public opinion, or any risks and problems arising from the misdirection, misuse, or dissemination of the model. ## Key Techniques and Other Multimodal Projects 👏 Welcome to explore key techniques of MiniCPM-o 2.6 and other multimodal projects of our team: VisCPM | RLHF-V | LLaVA-UHD | RLAIF-V ## Citation If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️!", + "model_explanation_gemini": "MiniCPM-o 2.6 is a multilingual, multimodal AI model excelling in vision, speech, and live-streaming tasks, offering real-time conversation, advanced OCR, and efficient performance on devices like phones." +} \ No newline at end of file diff --git a/data/model_data_json/opensearch-project_opensearch-neural-sparse-encoding-doc-v2-distill.json b/data/model_data_json/opensearch-project_opensearch-neural-sparse-encoding-doc-v2-distill.json new file mode 100644 index 0000000000000000000000000000000000000000..8bd4e5c0d6769c23ee8a6f4c85ff32958440330d --- /dev/null +++ b/data/model_data_json/opensearch-project_opensearch-neural-sparse-encoding-doc-v2-distill.json @@ -0,0 +1,25 @@ +{ + "model_id": "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill", + "downloads": 1697608, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "distilbert", + "fill-mask", + "learned sparse", + "opensearch", + "retrieval", + "passage-retrieval", + "document-expansion", + "bag-of-words", + "en", + "arxiv:2411.04403", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 tags: - learned sparse - opensearch - transformers - retrieval - passage-retrieval - document-expansion - bag-of-words --- # opensearch-neural-sparse-encoding-doc-v2-distill ## Select the model The model should be selected considering search relevance, model inference, and retrieval efficiency (FLOPS). We benchmark models' **zero-shot performance** on a subset of the BEIR benchmark: TrecCovid, NFCorpus, NQ, HotpotQA, FiQA, ArguAna, Touche, DBPedia, SCIDOCS, FEVER, Climate FEVER, SciFact, Quora. Overall, the v2 series of models have better search relevance, efficiency, and inference speed than the v1 series. The specific advantages and disadvantages may vary across different datasets.
| Model | Inference-free for Retrieval | Model Parameters | AVG NDCG@10 | AVG FLOPS | |-------|------------------------------|------------------|-------------|-----------| | opensearch-neural-sparse-encoding-v1 | | 133M | 0.524 | 11.4 | | opensearch-neural-sparse-encoding-v2-distill | | 67M | 0.528 | 8.3 | | opensearch-neural-sparse-encoding-doc-v1 | ✔️ | 133M | 0.490 | 2.3 | | opensearch-neural-sparse-encoding-doc-v2-distill | ✔️ | 67M | 0.504 | 1.8 | | opensearch-neural-sparse-encoding-doc-v2-mini | ✔️ | 23M | 0.497 | 1.7 | ## Overview - **Paper**: Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers - **Fine-tuning sample**: opensearch-sparse-model-tuning-sample This is a learned sparse retrieval model. It encodes documents into 30522-dimensional **sparse vectors**. For queries, it just uses a tokenizer and a weight look-up table to generate sparse vectors. Each non-zero dimension corresponds to a token in the vocabulary, and the weight encodes the importance of that token. The similarity score is the inner product of the query and document sparse vectors. The training datasets include MS MARCO, eli5_question_answer, squad_pairs, WikiAnswers, yahoo_answers_title_question, gooaq_pairs, stackexchange_duplicate_questions_body_body, wikihow, S2ORC_title_abstract, stackexchange_duplicate_questions_title-body_title-body, yahoo_answers_question_answer, searchQA_top5_snippets, stackexchange_duplicate_questions_title_title, yahoo_answers_title_answer. The OpenSearch neural sparse feature supports learned sparse retrieval with the Lucene inverted index. Link: The indexing and search can be performed with the OpenSearch high-level API. ## Usage (HuggingFace) This model is supposed to run inside an OpenSearch cluster, but you can also use it outside the cluster with the Hugging Face models API. The code sample (see the sketch after the relevance table below) shows an example of neural sparse search. Although there are no overlapping tokens between the original query and document, this model still produces a good match. ## Detailed Search Relevance
| Model | Average | Trec Covid | NFCorpus | NQ | HotpotQA | FiQA | ArguAna | Touche | DBPedia | SCIDOCS | FEVER | Climate FEVER | SciFact | Quora | |-------|---------|------------|----------|----|----------|------|---------|--------|---------|---------|-------|---------------|---------|-------| | opensearch-neural-sparse-encoding-v1 | 0.524 | 0.771 | 0.360 | 0.553 | 0.697 | 0.376 | 0.508 | 0.278 | 0.447 | 0.164 | 0.821 | 0.263 | 0.723 | 0.856 | | opensearch-neural-sparse-encoding-v2-distill | 0.528 | 0.775 | 0.347 | 0.561 | 0.685 | 0.374 | 0.551 | 0.278 | 0.435 | 0.173 | 0.849 | 0.249 | 0.722 | 0.863 | | opensearch-neural-sparse-encoding-doc-v1 | 0.490 | 0.707 | 0.352 | 0.521 | 0.677 | 0.344 | 0.461 | 0.294 | 0.412 | 0.154 | 0.743 | 0.202 | 0.716 | 0.788 | | opensearch-neural-sparse-encoding-doc-v2-distill | 0.504 | 0.690 | 0.343 | 0.528 | 0.675 | 0.357 | 0.496 | 0.287 | 0.418 | 0.166 | 0.818 | 0.224 | 0.715 | 0.841 | | opensearch-neural-sparse-encoding-doc-v2-mini | 0.497 | 0.709 | 0.336 | 0.510 | 0.666 | 0.338 | 0.480 | 0.285 | 0.407 | 0.164 | 0.812 | 0.216 | 0.699 | 0.837 |
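To make the encoding mechanics described above concrete, here is a minimal, hedged sketch of document-side encoding and inner-product scoring via the HuggingFace models API. The max-pooling of MLM logits followed by log(1 + ReLU) mirrors the activation published for the OpenSearch sparse encoders; the binary bag-of-tokens query vector is only a stand-in for the model's learned weight look-up table:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

def encode_document(text: str) -> torch.Tensor:
    # One forward pass through the MLM head: [1, seq_len, 30522] token logits.
    features = tokenizer(text, return_tensors="pt")
    logits = model(**features).logits
    # Max-pool over the sequence (masking padding), then log(1 + ReLU)
    # to obtain a non-negative 30522-dimensional sparse vector.
    mask = features["attention_mask"].unsqueeze(-1)
    values, _ = torch.max(logits * mask, dim=1)
    return torch.log1p(torch.relu(values)).squeeze(0)

doc_vec = encode_document("New York is a densely populated city.")
# Inference-free query side: the real system looks token weights up in a
# pre-computed table; a plain bag-of-tokens vector stands in for it here.
query_ids = tokenizer("population of NYC", add_special_tokens=False)["input_ids"]
query_vec = torch.zeros_like(doc_vec)
query_vec[query_ids] = 1.0
score = torch.dot(query_vec, doc_vec)  # inner-product relevance score
print(float(score))
```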
## License This project is licensed under the Apache v2.0 License. ## Copyright Copyright OpenSearch Contributors. See NOTICE for details.", + "model_explanation_gemini": "Generates sparse vector representations of documents for efficient retrieval using token importance weights without requiring query inference." +} \ No newline at end of file diff --git a/data/model_data_json/openvla_openvla-7b.json b/data/model_data_json/openvla_openvla-7b.json new file mode 100644 index 0000000000000000000000000000000000000000..492ffcca2b9d86f6387dd32b7780a9f90a7ff9ba --- /dev/null +++ b/data/model_data_json/openvla_openvla-7b.json @@ -0,0 +1,22 @@ +{ + "model_id": "openvla/openvla-7b", + "downloads": 1621806, + "tags": [ + "transformers", + "safetensors", + "openvla", + "feature-extraction", + "robotics", + "vla", + "image-text-to-text", + "multimodal", + "pretraining", + "custom_code", + "en", + "arxiv:2406.09246", + "license:mit", + "region:us" + ], + "description": "--- library_name: transformers tags: - robotics - vla - image-text-to-text - multimodal - pretraining license: mit language: - en pipeline_tag: image-text-to-text --- # OpenVLA 7B OpenVLA 7B is an open vision-language-action model trained on 970K robot manipulation episodes from the Open X-Embodiment dataset. The model takes language instructions and camera images as input and generates robot actions. It supports controlling multiple robots out-of-the-box, and can be quickly adapted for new robot domains via (parameter-efficient) fine-tuning. All OpenVLA checkpoints, as well as our training codebase, are released under an MIT License. For full details, please read our paper and see our project page. ## Model Summary - **Developed by:** The OpenVLA team consisting of researchers from Stanford, UC Berkeley, Google Deepmind, and the Toyota Research Institute. - **Model type:** Vision-language-action (language, image => robot actions) - **Language(s) (NLP):** en - **License:** MIT - **Finetuned from:** a VLM trained from: + **Vision Backbone**: DINOv2 ViT-L/14 and SigLIP ViT-So400M/14 + **Language Model**: Llama-2 - **Pretraining Dataset:** Open X-Embodiment -- specific component datasets can be found here. - **Repository:** - **Paper:** OpenVLA: An Open-Source Vision-Language-Action Model - **Project Page & Videos:** ## Uses OpenVLA models take a language instruction and a camera image of a robot workspace as input, and predict (normalized) robot actions consisting of 7-DoF end-effector deltas of the form (x, y, z, roll, pitch, yaw, gripper). To execute on an actual robot platform, actions need to be *un-normalized* subject to statistics computed on a per-robot, per-dataset basis. See our repository for more information. OpenVLA models can be used zero-shot to control robots for specific combinations of embodiments and domains seen in the Open-X pretraining mixture (e.g., for BridgeV2 environments with a Widow-X robot). They can also be efficiently *fine-tuned* for new tasks and robot setups given minimal demonstration data; see here. **Out-of-Scope:** OpenVLA models do not zero-shot generalize to new (unseen) robot embodiments, or setups that are not represented in the pretraining mix; in these cases, we suggest collecting a dataset of demonstrations on the desired setup, and fine-tuning OpenVLA models instead. ## Getting Started OpenVLA 7B can be used to control multiple robots for domains represented in the pretraining mixture out-of-the-box. 
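For example, here is a hedged sketch of loading the model for zero-shot instruction following in BridgeV2 environments with a Widow-X robot; `predict_action` and the `unnorm_key` un-normalization statistics come from OpenVLA's custom remote code, and the camera frame path is a stand-in:

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# trust_remote_code pulls in OpenVLA's custom model class and helpers.
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("camera_frame.png")  # stand-in for the robot's camera feed
prompt = "In: What action should the robot take to pick up the cup?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# Returns a normalized 7-DoF action (x, y, z, roll, pitch, yaw, gripper);
# unnorm_key selects the per-dataset statistics used to un-normalize it.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
# robot.act(action)  # hand the action to your robot controller
```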
For more examples, including scripts for fine-tuning OpenVLA models on your own robot demonstration datasets, see our training repository. ## Citation **BibTeX:**", + "model_explanation_gemini": "OpenVLA-7B is a vision-language-action model that generates robot actions from language instructions and camera images, trained on 970K robot manipulation episodes for controlling multiple robots, with support for fine-tuning on new domains." +} \ No newline at end of file diff --git a/data/model_data_json/optimum-internal-testing_tiny-random-llama.json b/data/model_data_json/optimum-internal-testing_tiny-random-llama.json new file mode 100644 index 0000000000000000000000000000000000000000..1b50cf3bc9e6ccd1363da7ef8ef10629fdf6a4fb --- /dev/null +++ b/data/model_data_json/optimum-internal-testing_tiny-random-llama.json @@ -0,0 +1,11 @@ +{ + "model_id": "optimum-internal-testing/tiny-random-llama", + "downloads": 82212, + "tags": [ + "safetensors", + "llama", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 ---" +} \ No newline at end of file diff --git a/data/model_data_json/optimum-internal-testing_tiny-random-whisper.json b/data/model_data_json/optimum-internal-testing_tiny-random-whisper.json new file mode 100644 index 0000000000000000000000000000000000000000..0649c9216b49d61965f29b2ae32eb963584e060f --- /dev/null +++ b/data/model_data_json/optimum-internal-testing_tiny-random-whisper.json @@ -0,0 +1,14 @@ +{ + "model_id": "optimum-internal-testing/tiny-random-whisper", + "downloads": 80831, + "tags": [ + "transformers", + "safetensors", + "whisper", + "automatic-speech-recognition", + "arxiv:1910.09700", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. 
## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]" +} \ No newline at end of file diff --git a/data/model_data_json/optimum_all-MiniLM-L6-v2.json b/data/model_data_json/optimum_all-MiniLM-L6-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..73bf3921e52458c972a5e20eba38de223a9a6702 --- /dev/null +++ b/data/model_data_json/optimum_all-MiniLM-L6-v2.json @@ -0,0 +1,22 @@ +{ + "model_id": "optimum/all-MiniLM-L6-v2", + "downloads": 190064, + "tags": [ + "sentence-transformers", + "onnx", + "feature-extraction", + "sentence-similarity", + "en", + "arxiv:1904.06472", + "arxiv:2102.07033", + "arxiv:2104.08727", + "arxiv:1704.05179", + "arxiv:1810.09305", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity language: en license: apache-2.0 --- # ONNX convert all-MiniLM-L6-v2 ## Conversion of sentence-transformers/all-MiniLM-L6-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. ## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ------ ## Background The project aims to train sentence embedding models on very large sentence-level datasets using a self-supervised contrastive learning objective. We used the pretrained model and fine-tuned it on a 1B sentence-pairs dataset. 
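The Usage sections above had their snippets stripped; here is a minimal sketch of the described recipe (a transformer forward pass followed by mean pooling), assuming optimum's ORTModelForFeatureExtraction is the right loader for this ONNX export:

```python
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "optimum/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForFeatureExtraction.from_pretrained(model_id)

sentences = ["This is an example sentence", "Each sentence is converted"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
token_embeddings = model(**encoded).last_hidden_state  # [batch, seq, 384]

# Mean pooling: average token embeddings, ignoring padding positions.
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])
```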
We use a contrastive learning objective: given a sentence from a pair, the model should predict which sentence, out of a set of randomly sampled other sentences, was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as input from Google's Flax, JAX, and Cloud team members on efficient deep learning frameworks. ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 256 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained model. Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the cross-entropy loss by comparing with the true pairs. #### Hyper parameters We trained our model on a TPU v3-8. We trained the model for 100k steps using a batch size of 1024 (128 per TPU core). We used a learning rate warm-up of 500 steps. The sequence length was limited to 128 tokens. We used the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this current repository: . #### Training data We use a concatenation of multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion. We sampled each dataset with a weighted probability, the configuration of which is detailed in the file. 
| Dataset | Paper | Number of training tuples | |--------------------------------------------------------|:----------------------------------------:|:--------------------------:| | Reddit comments (2015-2018) | paper | 726,484,430 | | S2ORC Citation pairs (Abstracts) | paper | 116,288,806 | | WikiAnswers Duplicate question pairs | paper | 77,427,422 | | PAQ (Question, Answer) pairs | paper | 64,371,441 | | S2ORC Citation pairs (Titles) | paper | 52,603,982 | | S2ORC (Title, Abstract) | paper | 41,769,185 | | Stack Exchange (Title, Body) pairs | - | 25,316,456 | | Stack Exchange (Title+Body, Answer) pairs | - | 21,396,559 | | Stack Exchange (Title, Answer) pairs | - | 21,396,559 | | MS MARCO triplets | paper | 9,144,553 | | GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 | | Yahoo Answers (Title, Answer) | paper | 1,198,260 | | Code Search | - | 1,151,414 | | COCO Image captions | paper | 828,395| | SPECTER citation triplets | paper | 684,100 | | Yahoo Answers (Question, Answer) | paper | 681,164 | | Yahoo Answers (Title, Question) | paper | 659,896 | | SearchQA | paper | 582,261 | | Eli5 | paper | 325,475 | | Flickr 30k | paper | 317,695 | | Stack Exchange Duplicate questions (titles) | | 304,525 | | AllNLI (SNLI and MultiNLI | paper SNLI, paper MultiNLI | 277,230 | | Stack Exchange Duplicate questions (bodies) | | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | | 250,460 | | Sentence Compression | paper | 180,000 | | Wikihow | paper | 128,542 | | Altlex | paper | 112,696 | | Quora Question Triplets | - | 103,663 | | Simple Wikipedia | paper | 102,225 | | Natural Questions (NQ) | paper | 100,231 | | SQuAD2.0 | paper | 87,599 | | TriviaQA | - | 73,346 | | **Total** | | **1,170,060,424** |", + "model_explanation_gemini": "Maps sentences and paragraphs to 384-dimensional vectors for tasks like clustering and semantic search." +} \ No newline at end of file diff --git a/data/model_data_json/owkin_phikon.json b/data/model_data_json/owkin_phikon.json new file mode 100644 index 0000000000000000000000000000000000000000..cd9995198cb72a6bec0ea038e0dc1b52505c1944 --- /dev/null +++ b/data/model_data_json/owkin_phikon.json @@ -0,0 +1,23 @@ +{ + "model_id": "owkin/phikon", + "downloads": 703018, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "vit", + "image-feature-extraction", + "biology", + "medical", + "cancer", + "feature-extraction", + "en", + "dataset:owkin/nct-crc-he", + "dataset:owkin/camelyon16-features", + "license:other", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: other language: - en tags: - biology - medical - cancer datasets: - owkin/nct-crc-he - owkin/camelyon16-features pipeline_tag: feature-extraction --- # Model Card for Phikon --- > [!IMPORTANT] > 🎉 Check out the latest version of Phikon here: Phikon-v2 > > Phikon is a self-supervised learning model for histopathology trained with iBOT. To learn more about how to use the model, we encourage you to read our blog post and view this Colab notebook. ### Model Description - **Developed by:** Owkin - **Funded by:** Owkin and IDRIS - **Model type:** Vision Transformer Base - **Model Stats:** - Params (M): 85.8 - Image size: 224 x 224 x 3 - **Paper:** - Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling. A. 
Filiot et al., medRxiv 2023.07.21.23292757; doi: - **Pretrain Dataset:** 40 million pan-cancer tiles extracted from TCGA - **Original:** - **License:** Owkin non-commercial license ## Uses ### Direct Use The primary use of the Phikon model is feature extraction from histology image tiles. ### Downstream Use The model can be used for cancer classification on a variety of cancer subtypes. The model can also be finetuned to specialise on cancer subtypes. ## Technical Specifications ### Compute Infrastructure All the models we built were trained on the French Jean Zay cluster. ### Hardware NVIDIA V100 GPUs with 32GB RAM ### Software PyTorch 1.13.1 --- ### BibTeX entry and citation info", + "model_explanation_gemini": "Phikon is a self-supervised vision transformer model for extracting features from histopathology images, primarily used for cancer classification and subtype analysis." +} \ No newline at end of file diff --git a/data/model_data_json/p208p2002_zh-wiki-punctuation-restore.json b/data/model_data_json/p208p2002_zh-wiki-punctuation-restore.json new file mode 100644 index 0000000000000000000000000000000000000000..052b8b56d58ca4cbc98e180c3fd65aa4e59c9fc7 --- /dev/null +++ b/data/model_data_json/p208p2002_zh-wiki-punctuation-restore.json @@ -0,0 +1,19 @@ +{ + "model_id": "p208p2002/zh-wiki-punctuation-restore", + "downloads": 103472, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "token-classification", + "ner", + "punctuation", + "zh", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - ner - punctuation language: - zh --- # zh-wiki-punctuation-restore More detail: supports six punctuation marks in total: , 、 。 ? ! ; ## Install ## Usage", + "model_explanation_gemini": "Restores six Chinese punctuation marks (, 、 。 ? ! ;) in text." +} \ No newline at end of file diff --git a/data/model_data_json/papluca_xlm-roberta-base-language-detection.json b/data/model_data_json/papluca_xlm-roberta-base-language-detection.json new file mode 100644 index 0000000000000000000000000000000000000000..b5df3193dd021ac1749f68836765b9f2c2ba1ac5 --- /dev/null +++ b/data/model_data_json/papluca_xlm-roberta-base-language-detection.json @@ -0,0 +1,45 @@ +{ + "model_id": "papluca/xlm-roberta-base-language-detection", + "downloads": 1913251, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "xlm-roberta", + "text-classification", + "generated_from_trainer", + "multilingual", + "ar", + "bg", + "de", + "el", + "en", + "es", + "fr", + "hi", + "it", + "ja", + "nl", + "pl", + "pt", + "ru", + "sw", + "th", + "tr", + "ur", + "vi", + "zh", + "dataset:papluca/language-identification", + "arxiv:1911.02116", + "base_model:FacebookAI/xlm-roberta-base", + "base_model:finetune:FacebookAI/xlm-roberta-base", + "doi:10.57967/hf/2064", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ar - bg - de - el - en - es - fr - hi - it - ja - nl - pl - pt - ru - sw - th - tr - ur - vi - zh license: mit tags: - generated_from_trainer datasets: papluca/language-identification metrics: - accuracy - f1 base_model: xlm-roberta-base model-index: - name: xlm-roberta-base-language-detection results: [] --- # xlm-roberta-base-language-detection This model is a fine-tuned version of xlm-roberta-base on the Language Identification dataset. ## Model description This model is an XLM-RoBERTa transformer model with a classification head on top (i.e. a linear layer on top of the pooled output). 
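As a quick illustration of that classification head in use, a minimal sketch via the high-level 🤗 pipeline API (the same route the "How to get started" section below refers to):

```python
from transformers import pipeline

# Language detection as plain text classification over the 20 supported labels.
detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
)
print(detector("Brevity is the soul of wit."))
# e.g. [{'label': 'en', 'score': 0.99...}]
```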
For additional information please refer to the xlm-roberta-base model card or to the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. ## Intended uses & limitations You can directly use this model as a language detector, i.e. for sequence classification tasks. Currently, it supports the following 20 languages: ## Training and evaluation data The model was fine-tuned on the Language Identification dataset, which consists of text sequences in 20 languages. The training set contains 70k samples, while the validation and test sets contain 10k each. The average accuracy on the test set is **99.6%** (this matches the average macro/weighted F1-score, the test set being perfectly balanced). A more detailed evaluation is provided by the following table. | Language | Precision | Recall | F1-score | support | |:--------:|:---------:|:------:|:--------:|:-------:| |ar |0.998 |0.996 |0.997 |500 | |bg |0.998 |0.964 |0.981 |500 | |de |0.998 |0.996 |0.997 |500 | |el |0.996 |1.000 |0.998 |500 | |en |1.000 |1.000 |1.000 |500 | |es |0.967 |1.000 |0.983 |500 | |fr |1.000 |1.000 |1.000 |500 | |hi |0.994 |0.992 |0.993 |500 | |it |1.000 |0.992 |0.996 |500 | |ja |0.996 |0.996 |0.996 |500 | |nl |1.000 |1.000 |1.000 |500 | |pl |1.000 |1.000 |1.000 |500 | |pt |0.988 |1.000 |0.994 |500 | |ru |1.000 |0.994 |0.997 |500 | |sw |1.000 |1.000 |1.000 |500 | |th |1.000 |0.998 |0.999 |500 | |tr |0.994 |0.992 |0.993 |500 | |ur |1.000 |1.000 |1.000 |500 | |vi |0.992 |1.000 |0.996 |500 | |zh |1.000 |1.000 |1.000 |500 | ### Benchmarks As a baseline to compare against, we have used the Python langid library. Since it comes pre-trained on 97 languages, we have used its method to constrain the language set to our 20 languages. The average accuracy of langid on the test set is **98.5%**. More details are provided by the table below. | Language | Precision | Recall | F1-score | support | |:--------:|:---------:|:------:|:--------:|:-------:| |ar |0.990 |0.970 |0.980 |500 | |bg |0.998 |0.964 |0.981 |500 | |de |0.992 |0.944 |0.967 |500 | |el |1.000 |0.998 |0.999 |500 | |en |1.000 |1.000 |1.000 |500 | |es |1.000 |0.968 |0.984 |500 | |fr |0.996 |1.000 |0.998 |500 | |hi |0.949 |0.976 |0.963 |500 | |it |0.990 |0.980 |0.985 |500 | |ja |0.927 |0.988 |0.956 |500 | |nl |0.980 |1.000 |0.990 |500 | |pl |0.986 |0.996 |0.991 |500 | |pt |0.950 |0.996 |0.973 |500 | |ru |0.996 |0.974 |0.985 |500 | |sw |1.000 |1.000 |1.000 |500 | |th |1.000 |0.996 |0.998 |500 | |tr |0.990 |0.968 |0.979 |500 | |ur |0.998 |0.996 |0.997 |500 | |vi |0.971 |0.990 |0.980 |500 | |zh |1.000 |1.000 |1.000 |500 | ## How to get started with the model The easiest way to use the model is via the high-level API: Or one can proceed with the tokenizer and model separately: ## Training procedure Fine-tuning was done via the API. Here is the Colab notebook with the training code. ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 64 - eval_batch_size: 128 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 2 - mixed_precision_training: Native AMP ### Training results The validation results on the split of the Language Identification dataset are summarised below. 
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:| | 0.2492 | 1.0 | 1094 | 0.0149 | 0.9969 | 0.9969 | | 0.0101 | 2.0 | 2188 | 0.0103 | 0.9977 | 0.9977 | In short, it achieves the following results on the validation set: - Loss: 0.0101 - Accuracy: 0.9977 - F1: 0.9977 ### Framework versions - Transformers 4.12.5 - Pytorch 1.10.0+cu111 - Datasets 1.15.1 - Tokenizers 0.10.3", + "model_explanation_gemini": "Detects the language of text across 20 multilingual options with high accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/parler-tts_parler-tts-mini-v1.json b/data/model_data_json/parler-tts_parler-tts-mini-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..e9d48f5bdf89844bf593d81f8c7b7ae7421886f0 --- /dev/null +++ b/data/model_data_json/parler-tts_parler-tts-mini-v1.json @@ -0,0 +1,24 @@ +{ + "model_id": "parler-tts/parler-tts-mini-v1", + "downloads": 12271, + "tags": [ + "transformers", + "safetensors", + "parler_tts", + "text2text-generation", + "text-to-speech", + "annotation", + "en", + "dataset:parler-tts/mls_eng", + "dataset:parler-tts/libritts_r_filtered", + "dataset:parler-tts/libritts-r-filtered-speaker-descriptions", + "dataset:parler-tts/mls-eng-speaker-descriptions", + "arxiv:2402.01912", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - text-to-speech - annotation license: apache-2.0 language: - en pipeline_tag: text-to-speech inference: false datasets: - parler-tts/mls_eng - parler-tts/libritts_r_filtered - parler-tts/libritts-r-filtered-speaker-descriptions - parler-tts/mls-eng-speaker-descriptions --- \"Parler # Parler-TTS Mini v1 **Parler-TTS Mini v1** is a lightweight text-to-speech (TTS) model, trained on 45K hours of audio data, that can generate high-quality, natural sounding speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation). With Parler-TTS Large v1, this is the second set of models published as part of the Parler-TTS project, which aims to provide the community with TTS training resources and dataset pre-processing code. ## 📖 Quick Index * 👨‍💻 Installation * 🎲 Using a random voice * 🎯 Using a specific speaker * Motivation * Optimizing inference ## 🛠️ Usage ### 👨‍💻 Installation Using Parler-TTS is as simple as \"bonjour\". Simply install the library once: ### 🎲 Random voice **Parler-TTS** has been trained to generate speech with features that can be controlled with a simple text prompt, for example: ### 🎯 Using a specific speaker To ensure speaker consistency across generations, this checkpoint was also trained on 34 speakers, characterized by name (e.g. Jon, Lea, Gary, Jenna, Mike, Laura). To take advantage of this, simply adapt your text description to specify which speaker to use: **Tips**: * We've set up an inference guide to make generation faster. Think SDPA, torch.compile, batching and streaming! * Include the term \"very clear audio\" to generate the highest quality audio, and \"very noisy audio\" for high levels of background noise * Punctuation can be used to control the prosody of the generations, e.g. 
use commas to add small breaks in speech * The remaining speech features (gender, speaking rate, pitch and reverberation) can be controlled directly through the prompt ## Motivation Parler-TTS is a reproduction of work from the paper Natural language guidance of high-fidelity text-to-speech with synthetic annotations by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively. Unlike other TTS models, Parler-TTS is a **fully open-source** release. All of the datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models. Parler-TTS was released alongside: * The Parler-TTS repository - you can train and fine-tune your own version of the model. * The Data-Speech repository - a suite of utility scripts designed to annotate speech datasets. * The Parler-TTS organization - where you can find the annotated datasets as well as the future checkpoints. ## Citation If you found this repository useful, please consider citing this work and also the original Stability AI paper: ## License This model is permissively licensed under the Apache 2.0 license.", + "model_explanation_gemini": "Generates high-quality, natural-sounding speech from text with controllable features like gender, noise, and speaking rate via prompts." +} \ No newline at end of file diff --git a/data/model_data_json/patrickjohncyh_fashion-clip.json b/data/model_data_json/patrickjohncyh_fashion-clip.json new file mode 100644 index 0000000000000000000000000000000000000000..e8822828774259e929a23235d811e78869e18e80 --- /dev/null +++ b/data/model_data_json/patrickjohncyh_fashion-clip.json @@ -0,0 +1,22 @@ +{ + "model_id": "patrickjohncyh/fashion-clip", + "downloads": 3681010, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "clip", + "zero-shot-image-classification", + "vision", + "language", + "fashion", + "ecommerce", + "en", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - vision - language - fashion - ecommerce library_name: transformers language: - en widget: - src: candidate_labels: black shoe, red shoe, a cat example_title: Black Shoe --- We have updated the model! We found that the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint (thanks Bin!) worked better than the original OpenAI CLIP on fashion. We thus fine-tune a newer (and better!) version of FashionCLIP (henceforth FashionCLIP 2.0), while keeping the architecture the same. We postulate that the performance gains afforded by it are due to the increased training data (5x OpenAI CLIP data). Our thesis, however, remains the same -- fine-tuning on our fashion dataset improved zero-shot performance across our benchmarks. See the table below comparing weighted macro F1 scores across models. | Model | FMNIST | KAGL | DEEP | | ------------- | ------------- | ------------- | ------------- | | OpenAI CLIP | 0.66 | 0.63 | 0.45 | | FashionCLIP | 0.74 | 0.67 | 0.48 | | Laion CLIP | 0.78 | 0.71 | 0.58 | | FashionCLIP 2.0 | __0.83__ | __0.73__ | __0.62__ | --- FashionCLIP is a CLIP-based model developed to produce general product representations for fashion concepts. 
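As a concrete illustration of those representations, a hedged zero-shot classification sketch reusing the candidate labels from this card's widget (the product image path is a stand-in):

```python
from PIL import Image
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="patrickjohncyh/fashion-clip",
)
# Stand-in for a standard product shot (centered item, white background).
image = Image.open("product.jpg")
labels = ["black shoe", "red shoe", "a cat"]  # candidate labels from the widget
print(classifier(image, candidate_labels=labels))
```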
Leveraging the pre-trained checkpoint (ViT-B/32) released by OpenAI, we train FashionCLIP on a large, high-quality novel fashion dataset to study whether domain-specific fine-tuning of CLIP-like models is sufficient to produce product representations that are zero-shot transferable to entirely new datasets and tasks. FashionCLIP was not developed for model deployment - to do so, researchers will first need to carefully study their capabilities in relation to the specific context they’re being deployed within. ### Model Date March 2023 ### Model Type The model uses a ViT-B/32 Transformer architecture as an image encoder and uses a masked self-attention Transformer as a text encoder. These encoders are trained, starting from a pre-trained checkpoint, to maximize the similarity of (image, text) pairs via a contrastive loss on a fashion dataset containing 800K products. ### Documents - FashionCLIP Github Repo - FashionCLIP Paper ## Data The model was trained on (image, text) pairs obtained from the Farfetch dataset (awaiting official release), an English dataset comprising over 800K fashion products, with more than 3K brands across dozens of object types. The image used for encoding is the standard product image, which is a picture of the item over a white background, with no humans. The text used is a concatenation of the _highlight_ (e.g., “stripes”, “long sleeves”, “Armani”) and _short description_ (“80s styled t-shirt”) available in the Farfetch dataset. ## Limitations, Bias and Fairness We acknowledge certain limitations of FashionCLIP and expect that it inherits certain limitations and biases present in the original CLIP model. We do not expect our fine-tuning to significantly augment these limitations: we acknowledge that the fashion data we use makes explicit assumptions about the notion of gender as in \"blue shoes for a woman\" that inevitably associate aspects of clothing with specific people. Our investigations also suggest that the data used introduces certain limitations in FashionCLIP. From the textual modality, given that most captions derived from the Farfetch dataset are long, we observe that FashionCLIP may be more performant on longer queries than shorter ones. From the image modality, FashionCLIP is also biased towards standard product images (centered, white background). Model selection, i.e. selecting an appropriate stopping criterion during fine-tuning, remains an open challenge. We observed that using loss on an in-domain (i.e. same distribution as test) validation dataset is a poor selection criterion when out-of-domain generalization (i.e. across different datasets) is desired, even when the dataset used is relatively diverse and large. ## Citation", + "model_explanation_gemini": "Fine-tuned on fashion data for improved zero-shot performance in matching product images with textual descriptions, specializing in fashion concepts and ecommerce applications." 
+} \ No newline at end of file diff --git a/data/model_data_json/patrickvonplaten_t5-tiny-random.json b/data/model_data_json/patrickvonplaten_t5-tiny-random.json new file mode 100644 index 0000000000000000000000000000000000000000..1e844192f32000d42e316211e3701842d660ce80 --- /dev/null +++ b/data/model_data_json/patrickvonplaten_t5-tiny-random.json @@ -0,0 +1,17 @@ +{ + "model_id": "patrickvonplaten/t5-tiny-random", + "downloads": 109757, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "t5", + "text2text-generation", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "" +} \ No newline at end of file diff --git a/data/model_data_json/patrickvonplaten_wavlm-libri-clean-100h-base-plus.json b/data/model_data_json/patrickvonplaten_wavlm-libri-clean-100h-base-plus.json new file mode 100644 index 0000000000000000000000000000000000000000..3e29bd1860a9aa9aed485ae92aac22647c5b22dc --- /dev/null +++ b/data/model_data_json/patrickvonplaten_wavlm-libri-clean-100h-base-plus.json @@ -0,0 +1,18 @@ +{ + "model_id": "patrickvonplaten/wavlm-libri-clean-100h-base-plus", + "downloads": 123419, + "tags": [ + "transformers", + "pytorch", + "tensorboard", + "wavlm", + "automatic-speech-recognition", + "librispeech_asr", + "generated_from_trainer", + "wavlm_libri_finetune", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - automatic-speech-recognition - librispeech_asr - generated_from_trainer - wavlm_libri_finetune model-index: - name: wavlm-libri-clean-100h-base-plus results: [] --- # wavlm-libri-clean-100h-base-plus This model is a fine-tuned version of microsoft/wavlm-base-plus on the LIBRISPEECH_ASR - CLEAN dataset. It achieves the following results on the evaluation set: - Loss: 0.0819 - Wer: 0.0683 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.0003 - train_batch_size: 4 - eval_batch_size: 4 - seed: 42 - distributed_type: multi-GPU - num_devices: 8 - total_train_batch_size: 32 - total_eval_batch_size: 32 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 500 - num_epochs: 3.0 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | Wer | |:-------------:|:-----:|:----:|:---------------:|:------:| | 2.8877 | 0.34 | 300 | 2.8649 | 1.0 | | 0.2852 | 0.67 | 600 | 0.2196 | 0.1830 | | 0.1198 | 1.01 | 900 | 0.1438 | 0.1273 | | 0.0906 | 1.35 | 1200 | 0.1145 | 0.1035 | | 0.0729 | 1.68 | 1500 | 0.1055 | 0.0955 | | 0.0605 | 2.02 | 1800 | 0.0936 | 0.0859 | | 0.0402 | 2.35 | 2100 | 0.0885 | 0.0746 | | 0.0421 | 2.69 | 2400 | 0.0848 | 0.0700 | ### Framework versions - Transformers 4.15.0.dev0 - Pytorch 1.9.0+cu111 - Datasets 1.16.2.dev0 - Tokenizers 0.10.3", + "model_explanation_gemini": "A fine-tuned WavLM model for automatic speech recognition, optimized on the LibriSpeech CLEAN dataset to transcribe audio with low word error rates." 
+} \ No newline at end of file diff --git a/data/model_data_json/peft-internal-testing_tiny-dummy-qwen2.json b/data/model_data_json/peft-internal-testing_tiny-dummy-qwen2.json new file mode 100644 index 0000000000000000000000000000000000000000..cd01573b0ea7f49668b41c07a55c0ce59060ad62 --- /dev/null +++ b/data/model_data_json/peft-internal-testing_tiny-dummy-qwen2.json @@ -0,0 +1,18 @@ +{ + "model_id": "peft-internal-testing/tiny-dummy-qwen2", + "downloads": 458341, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "conversational", + "arxiv:1910.09700", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). 
- **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]", + "model_explanation_gemini": "A placeholder Hugging Face transformers model with incomplete metadata and unspecified functionality." +} \ No newline at end of file diff --git a/data/model_data_json/petals-team_StableBeluga2.json b/data/model_data_json/petals-team_StableBeluga2.json new file mode 100644 index 0000000000000000000000000000000000000000..d9370df5916c22c9e521675d2676d7691e3e8666 --- /dev/null +++ b/data/model_data_json/petals-team_StableBeluga2.json @@ -0,0 +1,23 @@ +{ + "model_id": "petals-team/StableBeluga2", + "downloads": 1182532, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "en", + "dataset:conceptofmind/cot_submix_original", + "dataset:conceptofmind/flan2021_submix_original", + "dataset:conceptofmind/t0_submix_original", + "dataset:conceptofmind/niv2_submix_original", + "arxiv:2307.09288", + "arxiv:2306.02707", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - conceptofmind/cot_submix_original - conceptofmind/flan2021_submix_original - conceptofmind/t0_submix_original - conceptofmind/niv2_submix_original language: - en pipeline_tag: text-generation --- # Stable Beluga 2 ## Changes in this fork This repository contains the model from the stabilityai/StableBeluga2 repository with the following changes: 1. **Storing weights in instead of .** This leads to 2x smaller files and a small quality loss, which is not significant compared to the loss caused by NF4 quantization used in Petals by default. 1. **Storing weights in small shards.** Each transformer block is stored in its own shard (1.71 GB each). The input and output embeddings and adjacent layernorms are in a separate shard (1.05 GB) too. This way, Petals clients and servers don't have to download any excess data besides the layers they actually use. 1. **Using Safetensors instead of Pickle.** This allows faster loading with smaller RAM requirements. We provide the original README below. Please refer there for model details and licensing information. ## Model Description is a Llama2 70B model finetuned on an Orca style Dataset ## Usage Start chatting with using the following code snippet: Stable Beluga 2 should be used with this prompt format: ## Other Beluga Models StableBeluga 1 - Delta StableBeluga 13B StableBeluga 7B ## Model Details * **Developed by**: Stability AI * **Model type**: Stable Beluga 2 is an auto-regressive language model fine-tuned on Llama2 70B. 
* **Language(s)**: English * **Library**: HuggingFace Transformers * **License**: Fine-tuned checkpoints () is licensed under the STABLE BELUGA NON-COMMERCIAL COMMUNITY LICENSE AGREEMENT * **Contact**: For questions and comments about the model, please email ### Training Dataset is trained on our internal Orca-style dataset ### Training Procedure Models are learned via supervised fine-tuning on the aforementioned datasets, trained in mixed-precision (BF16), and optimized with AdamW. We outline the following hyperparameters: | Dataset | Batch Size | Learning Rate |Learning Rate Decay| Warm-up | Weight Decay | Betas | |-------------------|------------|---------------|-------------------|---------|--------------|-------------| | Orca pt1 packed | 256 | 3e-5 | Cosine to 3e-6 | 100 | 1e-6 | (0.9, 0.95) | | Orca pt2 unpacked | 512 | 3e-5 | Cosine to 3e-6 | 100 | 1e-6 | (0.9, 0.95) | ## Ethical Considerations and Limitations Beluga is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Beluga's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Beluga, developers should perform safety testing and tuning tailored to their specific applications of the model. ## How to cite ## Citations", + "model_explanation_gemini": "A 70B Llama2-based auto-regressive language model fine-tuned on an Orca-style dataset for text generation in English." +} \ No newline at end of file diff --git a/data/model_data_json/philschmid_bart-large-cnn-samsum.json b/data/model_data_json/philschmid_bart-large-cnn-samsum.json new file mode 100644 index 0000000000000000000000000000000000000000..cf8fe2e8271cae871b7c5c7d56ee77071c5d61ab --- /dev/null +++ b/data/model_data_json/philschmid_bart-large-cnn-samsum.json @@ -0,0 +1,21 @@ +{ + "model_id": "philschmid/bart-large-cnn-samsum", + "downloads": 137115, + "tags": [ + "transformers", + "pytorch", + "bart", + "text2text-generation", + "sagemaker", + "summarization", + "en", + "dataset:samsum", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: mit tags: - sagemaker - bart - summarization datasets: - samsum widget: - text: \"Jeff: Can I train a \\U0001F917 Transformers model on Amazon SageMaker? \\n\\ Philipp: Sure you can use the new Hugging Face Deep Learning Container. \\nJeff:\\ \\ ok.\\nJeff: and how can I get started? \\nJeff: where can I find documentation?\\ \\ \\nPhilipp: ok, ok you can find everything here. 
model-index: - name: bart-large-cnn-samsum results: - task: type: summarization name: Summarization dataset: name: 'SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization' type: samsum metrics: - type: rogue-1 value: 42.621 name: Validation ROGUE-1 - type: rogue-2 value: 21.9825 name: Validation ROGUE-2 - type: rogue-l value: 33.034 name: Validation ROGUE-L - type: rogue-1 value: 41.3174 name: Test ROGUE-1 - type: rogue-2 value: 20.8716 name: Test ROGUE-2 - type: rogue-l value: 32.1337 name: Test ROGUE-L - task: type: summarization name: Summarization dataset: name: samsum type: samsum config: samsum split: test metrics: - type: rouge value: 41.3282 name: ROUGE-1 verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTYzNzZkZDUzOWQzNGYxYTJhNGE4YWYyZjA0NzMyOWUzMDNhMmVhYzY1YTM0ZTJhYjliNGE4MDZhMjhhYjRkYSIsInZlcnNpb24iOjF9.OOM6l3v5rJCndmUIJV-2SDh2NjbPo5IgQOSL-Ju1Gwbi1voL5amsDEDOelaqlUBE3n55KkUsMLZhyn66yWxZBQ - type: rouge value: 20.8755 name: ROUGE-2 verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWZiODFiYWQzY2NmOTc5YjA3NTI0YzQ1MzQ0ODk2NjgyMmVlMjA5MjZiNTJkMGRmZGEzN2M3MDNkMjkxMDVhYSIsInZlcnNpb24iOjF9.b8cPk2-IL24La3Vd0hhtii4tRXujh5urAwy6IVeTWHwYfXaURyC2CcQOWtlOx5bdO5KACeaJFrFBCGgjk-VGCQ - type: rouge value: 32.1353 name: ROUGE-L verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYWNmYzdiYWQ2ZWRkYzRiMGMxNWUwODgwZTdkY2NjZTc1NWE5NTFiMzU0OTU1N2JjN2ExYWQ2NGZkNjk5OTc4YSIsInZlcnNpb24iOjF9.Fzv4p-TEVicljiCqsBJHK1GsnE_AwGqamVmxTPI0WBNSIhZEhliRGmIL_z1pDq6WOzv3GN2YUGvhowU7GxnyAQ - type: rouge value: 38.401 name: ROUGE-LSUM verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGI4MWY0NWMxMmQ0ODQ5MDhiNDczMDAzYzJkODBiMzgzYWNkMWM2YTZkZDJmNWJiOGQ3MmNjMGViN2UzYWI2ZSIsInZlcnNpb24iOjF9.7lw3h5k5lJ7tYFLZGUtLyDabFYd00l6ByhmvkW4fykocBy9Blyin4tdw4Xps4DW-pmrdMLgidHxBWz5MrSx1Bw - type: loss value: 1.4297215938568115 name: loss verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzI0ZWNhNDM5YTViZDMyZGJjMDA1ZWFjYzNhOTdlOTFiNzhhMDBjNmM2MjA3ZmRkZjJjMjEyMGY3MzcwOTI2NyIsInZlcnNpb24iOjF9.oNaZsAtUDqGAqoZWJavlcW7PKx1AWsnkbhaQxadpOKk_u7ywJJabvTtzyx_DwEgZslgDETCf4MM-JKitZKjiDA - type: gen_len value: 60.0757 name: gen_len verified: true verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTgwYWYwMDRkNTJkMDM5N2I2MWNmYzQ3OWM1NDJmODUyZGViMGE4ZTdkNmIwYWM2N2VjZDNmN2RiMDE4YTYyYiIsInZlcnNpb24iOjF9.PbXTcNYX_SW-BuRQEcqyc21M7uKrOMbffQSAK6k2GLzTVRrzZxsDC57ktKL68zRY8fSiRGsnknOwv-nAR6YBCQ --- ## > If you want to use the model you should try a newer fine-tuned FLAN-T5 version philschmid/flan-t5-base-samsum out socring the BART version with on achieving . # TRY philschmid/flan-t5-base-samsum This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container. For more information look at: - 🤗 Transformers Documentation: Amazon SageMaker - Example Notebooks - Amazon SageMaker documentation for Hugging Face - Python SDK SageMaker documentation for Hugging Face - Deep Learning Container ## Hyperparameters ## Usage ## Results | key | value | | --- | ----- | | eval_rouge1 | 42.621 | | eval_rouge2 | 21.9825 | | eval_rougeL | 33.034 | | eval_rougeLsum | 39.6783 | | test_rouge1 | 41.3174 | | test_rouge2 | 20.8716 | | test_rougeL | 32.1337 | | test_rougeLsum | 38.4149 |", + "model_explanation_gemini": "Generates abstractive summaries of dialogues, specifically fine-tuned on the SAMSum dataset for conversation summarization." 
+} \ No newline at end of file diff --git a/data/model_data_json/plasmo_vox2.json b/data/model_data_json/plasmo_vox2.json new file mode 100644 index 0000000000000000000000000000000000000000..3a3a9ca90904b0f26084226a3ec061ea21ba144e --- /dev/null +++ b/data/model_data_json/plasmo_vox2.json @@ -0,0 +1,16 @@ +{ + "model_id": "plasmo/vox2", + "downloads": 128519, + "tags": [ + "diffusers", + "text-to-image", + "stable-diffusion", + "license:creativeml-openrail-m", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: creativeml-openrail-m tags: - text-to-image - stable-diffusion widget: - text: \"voxel-ish \" --- ### Jak's Voxel-ish Image Pack v.1.2 for Stable Diffusion VERSION 1.2 of Voxel-ish Image Pack brought to you by 184 training images through 8000 training steps, 20% training text crafted by Jak_TheAI_Artist version history: v1.2 - Fine-tuned for better faces. Include the prompt trigger \"voxel-ish\" to activate. Tip: add \"intricate detail\" in the prompt to make a semi-realistic image. Sample pictures of this concept: voxel-ish", + "model_explanation_gemini": "Generates voxel-style images from text prompts using Stable Diffusion, activated by the trigger phrase \"voxel-ish\"." +} \ No newline at end of file diff --git a/data/model_data_json/playgroundai_playground-v2.5-1024px-aesthetic.json b/data/model_data_json/playgroundai_playground-v2.5-1024px-aesthetic.json new file mode 100644 index 0000000000000000000000000000000000000000..74199ce7fb68b61d1818609be0a9fef8ab76a556 --- /dev/null +++ b/data/model_data_json/playgroundai_playground-v2.5-1024px-aesthetic.json @@ -0,0 +1,19 @@ +{ + "model_id": "playgroundai/playground-v2.5-1024px-aesthetic", + "downloads": 522025, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "playground", + "arxiv:2206.00364", + "arxiv:2402.17245", + "license:other", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: other license_name: playground-v2dot5-community license_link: tags: - text-to-image - playground inference: parameters: guidance_scale: 3.0 --- # Playground v2.5 – 1024px Aesthetic Model This repository contains a model that generates highly aesthetic images of resolution 1024x1024, as well as portrait and landscape aspect ratios. You can use the model with Hugging Face 🧨 Diffusers. **Playground v2.5** is a diffusion-based text-to-image generative model, and a successor to Playground v2. Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2. For details on the development and training of our model, please refer to our blog post and technical report. ### Model Description - **Developed by:** Playground - **Model type:** Diffusion-based text-to-image generative model - **License:** Playground v2.5 Community License - **Summary:** This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as Stable Diffusion XL. ### Using the model with 🧨 Diffusers Install diffusers >= 0.27.0 and the relevant dependencies. 
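A minimal sketch of the stripped snippet, assuming the standard DiffusionPipeline API and the fp16 variant; the guidance_scale of 3.0 matches the inference default declared in this card's metadata:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",  # assumes the repo ships half-precision weights
).to("cuda")

# guidance_scale=3.0 is the inference default from the card's YAML metadata.
image = pipe(
    prompt="Astronaut in a jungle, cold color palette, detailed, 8k",
    num_inference_steps=50,
    guidance_scale=3.0,
).images[0]
image.save("astronaut.png")
```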
**Notes:** - The pipeline uses the scheduler by default, for crisper fine details. It's an EDM formulation of the DPM++ 2M Karras scheduler. is a good default for this scheduler. - The pipeline also supports the scheduler. It's an EDM formulation of the Euler scheduler. is a good default for this scheduler. Then, run a snippet like the one shown above. ### Using the model with Automatic1111/ComfyUI Support coming soon. We will update this model card with instructions when ready. ### User Studies This model card only provides a brief summary of our user study results. For extensive details on how we perform user studies, please check out our technical report. We conducted studies to measure overall aesthetic quality, as well as for the specific areas we aimed to improve with Playground v2.5, namely multi aspect ratios and human preference alignment. #### Comparison to State-of-the-Art The aesthetic quality of Playground v2.5 dramatically outperforms the current state-of-the-art open-source models SDXL and PIXART-α, as well as Playground v2. Because the performance differential between Playground v2.5 and SDXL was so large, we also tested our aesthetic quality against world-class closed-source models like DALL-E 3 and Midjourney 5.2, and found that Playground v2.5 outperforms them as well. #### Multi Aspect Ratios Similarly, for multi aspect ratios, we outperform SDXL by a large margin. #### Human Preference Alignment on People-related images Next, we benchmark Playground v2.5 specifically on people-related images, to test Human Preference Alignment. We compared Playground v2.5 against two commonly-used baseline models: SDXL and RealStock v2, a community fine-tune of SDXL that was trained on a realistic people dataset. Playground v2.5 outperforms both baselines by a large margin. ### MJHQ-30K Benchmark | Model | Overall FID | | ------------------------------------- | ----- | | SDXL-1-0-refiner | 9.55 | | playground-v2-1024px-aesthetic | 7.07 | | playground-v2.5-1024px-aesthetic | **4.48** | Lastly, we report metrics using our MJHQ-30K benchmark which we open-sourced with the v2 release. We report both the overall FID and per category FID. All FID metrics are computed at resolution 1024x1024. Our results show that Playground v2.5 outperforms both Playground v2 and SDXL in overall FID and all category FIDs, especially in the people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preferences and the FID score of the MJHQ-30K benchmark. ### How to cite us", + "model_explanation_gemini": "Generates high-quality, aesthetically pleasing 1024x1024 images from text prompts using a diffusion-based architecture, outperforming leading models like SDXL and Midjourney." 
+} \ No newline at end of file diff --git a/data/model_data_json/prajjwal1_bert-mini.json b/data/model_data_json/prajjwal1_bert-mini.json new file mode 100644 index 0000000000000000000000000000000000000000..3677f5c974949dafe8f2daf69e282d881fd8ed29 --- /dev/null +++ b/data/model_data_json/prajjwal1_bert-mini.json @@ -0,0 +1,21 @@ +{ + "model_id": "prajjwal1/bert-mini", + "downloads": 183996, + "tags": [ + "transformers", + "pytorch", + "BERT", + "MNLI", + "NLI", + "transformer", + "pre-training", + "en", + "arxiv:1908.08962", + "arxiv:2110.01518", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: - mit tags: - BERT - MNLI - NLI - transformer - pre-training --- The following model is a Pytorch pre-trained model obtained from converting Tensorflow checkpoint found in the official Google BERT repository. This is one of the smaller pre-trained BERT variants, together with bert-small and bert-medium. They were introduced in the study (arxiv), and ported to HF for the study (arXiv). These models are supposed to be trained on a downstream task. If you use the model, please consider citing both the papers: Config of this model: (L=4, H=256) Model Link Other models to check out: - (L=2, H=128) Model Link - (L=4, H=512) Model Link - (L=8, H=512) Model Link Original Implementation and more info can be found in this Github repository. Twitter: @prajjwal_1", + "model_explanation_gemini": "A compact BERT variant (L=4, H=256) designed for downstream NLP tasks like MNLI/NLI after task-specific fine-tuning." +} \ No newline at end of file diff --git a/data/model_data_json/prajjwal1_bert-tiny.json b/data/model_data_json/prajjwal1_bert-tiny.json new file mode 100644 index 0000000000000000000000000000000000000000..9353274270c091b7f12cc904be95ecacfc68426c --- /dev/null +++ b/data/model_data_json/prajjwal1_bert-tiny.json @@ -0,0 +1,21 @@ +{ + "model_id": "prajjwal1/bert-tiny", + "downloads": 1067610, + "tags": [ + "transformers", + "pytorch", + "BERT", + "MNLI", + "NLI", + "transformer", + "pre-training", + "en", + "arxiv:1908.08962", + "arxiv:2110.01518", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en license: - mit tags: - BERT - MNLI - NLI - transformer - pre-training --- The following model is a Pytorch pre-trained model obtained from converting Tensorflow checkpoint found in the official Google BERT repository. This is one of the smaller pre-trained BERT variants, together with bert-mini bert-small and bert-medium. They were introduced in the study (arxiv), and ported to HF for the study (arXiv). These models are supposed to be trained on a downstream task. If you use the model, please consider citing both the papers: Config of this model: - (L=2, H=128) Model Link Other models to check out: - (L=4, H=256) Model Link - (L=4, H=512) Model Link - (L=8, H=512) Model Link Original Implementation and more info can be found in this Github repository. Twitter: @prajjwal_1", + "model_explanation_gemini": "A small pre-trained BERT variant designed for fine-tuning on downstream natural language processing tasks like MNLI or NLI." 
+} \ No newline at end of file diff --git a/data/model_data_json/princeton-nlp_gemma-2-9b-it-SimPO.json b/data/model_data_json/princeton-nlp_gemma-2-9b-it-SimPO.json new file mode 100644 index 0000000000000000000000000000000000000000..66072ae258d96b54a0addd540ba88552caa7c34f --- /dev/null +++ b/data/model_data_json/princeton-nlp_gemma-2-9b-it-SimPO.json @@ -0,0 +1,26 @@ +{ + "model_id": "princeton-nlp/gemma-2-9b-it-SimPO", + "downloads": 147569, + "tags": [ + "transformers", + "safetensors", + "gemma2", + "text-generation", + "alignment-handbook", + "generated_from_trainer", + "conversational", + "dataset:princeton-nlp/gemma2-ultrafeedback-armorm", + "arxiv:2405.14734", + "arxiv:2310.01377", + "arxiv:2406.12845", + "base_model:google/gemma-2-9b-it", + "base_model:finetune:google/gemma-2-9b-it", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: google/gemma-2-9b-it tags: - alignment-handbook - generated_from_trainer datasets: - princeton-nlp/gemma2-ultrafeedback-armorm model-index: - name: princeton-nlp/gemma-2-9b-it-SimPO results: [] license: mit --- # gemma-2-9b-it-SimPO Model Card SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our preprint and github repo for more details. ## Model Details ### Model Description We fine-tuned google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm with the SimPO objective. - **Developed by:** Yu Meng, Mengzhou Xia, Danqi Chen - **Model type:** Causal Language Model - **License:** gemma - **Finetuned from model:** google/gemma-2-9b-it ### Model Sources - **Repository:** - **Paper:** ## How to Get Started with the Model ## Training Details ### Training Data We use princeton-nlp/gemma2-ultrafeedback-armorm as the preference optimization dataset. #### Training Hyperparameters The hyperparameters used can be found in the training script. #### Speeds, Sizes, Times Fine-tuning the google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm takes around 100 mins to finish on 8xH100 GPUs. ## Evaluation Results | models | AE2 LC | AE2 WR | AE2 Length | AH | AH Length | GSM | GSM Length | MMLU | MMLU Length | |-----------------------------------|:------:|:------:|:----------:|:----:|:---------:|:----:|:----------:|:----:|:-----------:| | google/gemma-2-9b-it | 51.1 | 38.1 | 1571 | 40.8 | 545 | 87.4 | 395 | 72.7 | 515 | | princeton-nlp/gemma-2-9b-it-DPO | 67.8 | 65.4 | 2016 | 58.9 | 717 | 88.5 | 392 | 72.2 | 624 | | princeton-nlp/gemma-2-9b-it-SimPO | 72.4 | 65.9 | 1833 | 59.1 | 693 | 88.0 | 341 | 72.2 | 441 | ## Technical Specifications ### Model Architecture and Objective The model architecture is based on google/gemma-2-9b-it. We use the SimPO training objective proposed in our preprint. #### Hardware We used 8xH100 GPUs for model training. #### Software Training was done using the alignment-handbook library. ## Citation gemma model: SimPO paper: UltraFeedback paper: ArmoRM paper:", + "model_explanation_gemini": "Fine-tuned from google/gemma-2-9b-it using the SimPO algorithm to optimize language model performance on preference datasets without requiring a reference model." 
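The SimPO card's "How to Get Started" section is empty in the scrape. A sketch using the generic transformers chat pipeline (a plausible recipe, not necessarily the authors' exact snippet; the output indexing follows recent transformers chat-pipeline behavior):

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="princeton-nlp/gemma-2-9b-it-SimPO",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain preference optimization in two sentences."}]
out = generator(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```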
+} \ No newline at end of file diff --git a/data/model_data_json/princeton-nlp_sup-simcse-roberta-large.json b/data/model_data_json/princeton-nlp_sup-simcse-roberta-large.json new file mode 100644 index 0000000000000000000000000000000000000000..ee54eea74ab8257dba81d89e2c32d586872b5431 --- /dev/null +++ b/data/model_data_json/princeton-nlp_sup-simcse-roberta-large.json @@ -0,0 +1,17 @@ +{ + "model_id": "princeton-nlp/sup-simcse-roberta-large", + "downloads": 227730, + "tags": [ + "transformers", + "pytorch", + "jax", + "roberta", + "feature-extraction", + "arxiv:2104.08821", + "arxiv:1910.09700", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - feature-extraction --- # Model Card for sup-simcse-roberta-large # Model Details ## Model Description - **Developed by:** Princeton-nlp - **Shared by [Optional]:** More information needed - **Model type:** Feature Extraction - **Language(s) (NLP):** More information needed - **License:** More information needed - **Related Models:** - **Parent Model:** RoBERTa-large - **Resources for more information:** - GitHub Repo - Associated Paper - Blog Post # Uses ## Direct Use This model can be used for the task of Feature Extraction ## Downstream Use [Optional] More information needed ## Out-of-Scope Use The model should not be used to intentionally create hostile or alienating environments for people. # Bias, Risks, and Limitations Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. ## Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. # Training Details ## Training Data The model creators note in the Github Repository > We train unsupervised SimCSE on 10^6 randomly sampled sentences from English Wikipedia, and train supervised SimCSE on the combination of MNLI and SNLI datasets (314k). ## Training Procedure ### Preprocessing More information needed ### Speeds, Sizes, Times More information needed # Evaluation ## Testing Data, Factors & Metrics ### Testing Data The model creators note in the associated paper > Our evaluation code for sentence embeddings is based on a modified version of SentEval. It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation takes the \"all\" setting, and reports Spearman's correlation. See associated paper (Appendix B) for evaluation details. ### Factors ### Metrics More information needed ## Results More information needed # Model Examination More information needed # Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- **Hardware Type:** More information needed - **Hours used:** More information needed - **Cloud Provider:** More information needed - **Compute Region:** More information needed - **Carbon Emitted:** More information needed # Technical Specifications [optional] ## Model Architecture and Objective More information needed ## Compute Infrastructure More information needed ### Hardware More information needed ### Software More information needed # Citation **BibTeX:** # Glossary [optional] More information needed # More Information [optional] If you have any questions related to the code or the paper, feel free to email Tianyu () and Xingcheng (). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker! # Model Card Authors [optional] Princeton NLP group in collaboration with Ezi Ozoani and the Hugging Face team # Model Card Contact More information needed # How to Get Started with the Model Use the code below to get started with the model.
Click to expand
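The "Click to expand" block above lost its code during scraping. A sketch following the upstream SimCSE README's transformers usage (the [CLS] pooler output serves as the sentence embedding):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-roberta-large")
model = AutoModel.from_pretrained("princeton-nlp/sup-simcse-roberta-large")

texts = [
    "There's a kid on a skateboard.",
    "A kid is skateboarding.",
    "A kid is inside the house.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs, output_hidden_states=True, return_dict=True).pooler_output

# The paraphrase pair should score higher than the unrelated pair.
cos = torch.nn.functional.cosine_similarity
print(cos(embeddings[0], embeddings[1], dim=0).item())
print(cos(embeddings[0], embeddings[2], dim=0).item())
```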
", + "model_explanation_gemini": "Extracts sentence embeddings for semantic textual similarity tasks using supervised SimCSE training on RoBERTa-large." +} \ No newline at end of file diff --git a/data/model_data_json/prithivida_grammar_error_correcter_v1.json b/data/model_data_json/prithivida_grammar_error_correcter_v1.json new file mode 100644 index 0000000000000000000000000000000000000000..e150dfff7b551d13eb20dea21a08312c454ce640 --- /dev/null +++ b/data/model_data_json/prithivida_grammar_error_correcter_v1.json @@ -0,0 +1,16 @@ +{ + "model_id": "prithivida/grammar_error_correcter_v1", + "downloads": 95189, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "**This model is part of the Gramformer library** please refer to", + "model_explanation_gemini": "Part of the Gramformer library for grammar error correction." +} \ No newline at end of file diff --git a/data/model_data_json/prithivida_parrot_paraphraser_on_T5.json b/data/model_data_json/prithivida_parrot_paraphraser_on_T5.json new file mode 100644 index 0000000000000000000000000000000000000000..e994ed4fad08293d1d0dce0449eb784569089ef6 --- /dev/null +++ b/data/model_data_json/prithivida_parrot_paraphraser_on_T5.json @@ -0,0 +1,16 @@ +{ + "model_id": "prithivida/parrot_paraphraser_on_T5", + "downloads": 870154, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "# Parrot ## 1. What is Parrot? Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models. A paraphrase framework is more than just a paraphrasing model. For more details on the library and usage please refer to the github page ### Installation ### Quickstart ### Knobs ## 2. Why Parrot? **Huggingface** lists 12 paraphrase models, **RapidAPI** lists 7 fremium and commercial paraphrasers like QuillBot, Rasa has discussed an experimental paraphraser for augmenting text data here, Sentence-transfomers offers a paraphrase mining utility and NLPAug offers word level augmentation with a PPDB (a multi-million paraphrase database). While these attempts at paraphrasing are great, there are still some gaps and paraphrasing is NOT yet a mainstream option for text augmentation in building NLU models....Parrot is a humble attempt to fill some of these gaps. **What is a good paraphrase?** Almost all conditioned text generation models are validated on 2 factors, (1) if the generated text conveys the same meaning as the original context (Adequacy) (2) if the text is fluent / grammatically correct english (Fluency). For instance Neural Machine Translation outputs are tested for Adequacy and Fluency. But a good paraphrase should be adequate and fluent while being as different as possible on the surface lexical form. With respect to this definition, the **3 key metrics** that measures the quality of paraphrases are: - **Adequacy** (Is the meaning preserved adequately?) - **Fluency** (Is the paraphrase fluent English?) - **Diversity (Lexical / Phrasal / Syntactical)** (How much has the paraphrase changed the original sentence?) 
*Parrot offers knobs to control Adequacy, Fluency and Diversity as per your needs.* **What makes a paraphraser a good augmentor?** For training an NLU model we don't just need a lot of utterances, but utterances with intents and slots/entities annotated. The typical flow would be: - Given an **input utterance + input annotations**, a good augmentor spits out N **output paraphrases** while preserving the intent and slots. - The output paraphrases are then converted into annotated data using the input annotations that we got in step 1. - The annotated data created out of the output paraphrases then makes the training dataset for your NLU model. But in general, being generative models, paraphrasers don't guarantee preservation of the slots/entities. So the ability to generate high-quality paraphrases in a constrained fashion, without trading off the intents and slots for lexical dissimilarity, makes a paraphraser a good augmentor. *More on this in section 3 below* ## 3. Scope In the space of conversational engines, knowledge bots are those to which **we ask questions** like *\"when was the Berlin wall torn down?\"*, transactional bots are those to which **we give commands** like *\"Turn on the music please\"*, and voice assistants are the ones which can both answer questions and act on our commands. Parrot mainly focuses on augmenting texts typed into or spoken to conversational interfaces for building robust NLU models. (*So usually people neither type nor yell long paragraphs at conversational interfaces. Hence the pre-trained model is trained on text samples with a maximum length of 32.*) *While Parrot predominantly aims to be a text augmentor for building good NLU models, it can also be used as a pure-play paraphraser.*", "model_explanation_gemini": "Generates diverse, fluent paraphrases while preserving meaning and annotations to augment NLU training data."
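The Parrot card's Installation and Quickstart placeholders above are empty. A sketch assuming the `parrot` package's documented interface (`Parrot(...)` and `augment(input_phrase=...)`); treat the exact knobs and return format as assumptions:

```python
# pip install git+https://github.com/PrithivirajDamodaran/Parrot_Paraphraser.git
from parrot import Parrot

parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=False)

phrases = ["Can you recommend some upscale restaurants in New York?"]
for phrase in phrases:
    # augment() is assumed to yield (paraphrase, score) pairs; adequacy,
    # fluency and diversity knobs are exposed as keyword arguments.
    for candidate in parrot.augment(input_phrase=phrase):
        print(candidate)
```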
+} \ No newline at end of file diff --git a/data/model_data_json/protectai_deberta-v3-base-prompt-injection-v2.json b/data/model_data_json/protectai_deberta-v3-base-prompt-injection-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..8f8c7d52d493b05e5688ffc6d3369e2771b5302f --- /dev/null +++ b/data/model_data_json/protectai_deberta-v3-base-prompt-injection-v2.json @@ -0,0 +1,32 @@ +{ + "model_id": "protectai/deberta-v3-base-prompt-injection-v2", + "downloads": 208687, + "tags": [ + "transformers", + "onnx", + "safetensors", + "deberta-v2", + "text-classification", + "prompt-injection", + "injection", + "security", + "llm-security", + "generated_from_trainer", + "en", + "dataset:natolambert/xstest-v2-copy", + "dataset:VMware/open-instruct", + "dataset:alespalla/chatbot_instruction_prompts", + "dataset:HuggingFaceH4/grok-conversation-harmless", + "dataset:Harelix/Prompt-Injection-Mixed-Techniques-2024", + "dataset:OpenSafetyLab/Salad-Data", + "dataset:jackhhao/jailbreak-classification", + "base_model:microsoft/deberta-v3-base", + "base_model:finetune:microsoft/deberta-v3-base", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 base_model: microsoft/deberta-v3-base language: - en datasets: - natolambert/xstest-v2-copy - VMware/open-instruct - alespalla/chatbot_instruction_prompts - HuggingFaceH4/grok-conversation-harmless - Harelix/Prompt-Injection-Mixed-Techniques-2024 - OpenSafetyLab/Salad-Data - jackhhao/jailbreak-classification tags: - prompt-injection - injection - security - llm-security - generated_from_trainer metrics: - accuracy - recall - precision - f1 pipeline_tag: text-classification model-index: - name: deberta-v3-base-prompt-injection-v2 results: [] --- # Model Card for deberta-v3-base-prompt-injection-v2 This model is a fine-tuned version of microsoft/deberta-v3-base specifically developed to detect and classify prompt injection attacks which can manipulate language models into producing unintended outputs. ## Introduction Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The model is designed to enhance security in language model applications by detecting these malicious interventions. ## Model Details - **Fine-tuned by:** Protect AI - **Model type:** deberta-v3-base - **Language(s) (NLP):** English - **License:** Apache License 2.0 - **Finetuned from model:** microsoft/deberta-v3-base ## Intended Uses This model classifies inputs into benign () and injection-detected (). ## Limitations The model is highly accurate in identifying prompt injections in English. It does not detect jailbreak attacks or handle non-English prompts, which may limit its applicability in diverse linguistic environments or against advanced adversarial techniques. Additionally, we do not recommend using this scanner for system prompts, as it produces false positives. ## Model Development Over 20 configurations were tested during development to optimize the detection capabilities, focusing on various hyperparameters, training regimens, and dataset compositions. ### Dataset The dataset used for training the model was meticulously assembled from various public open datasets to include a wide range of prompt variations. Additionally, prompt injections were crafted using insights gathered from academic research papers, articles, security competitions, and valuable community feedback from LLM Guard.
In compliance with licensing requirements, attribution is given where necessary based on the specific licenses of the source data. Below is a summary of the licenses and the number of datasets under each: - **CC-BY-3.0:** 1 dataset () - **MIT License:** 8 datasets - **CC0 1.0 Universal:** 1 dataset - **No License (public domain):** 6 datasets - **Apache License 2.0:** 5 datasets (, , , , ) - **CC-BY-4.0:** 1 dataset () ### Evaluation Metrics - **Training Performance on the evaluation dataset:** - Loss: 0.0036 - Accuracy: 99.93% - Recall: 99.94% - Precision: 99.92% - F1: 99.93% - **Post-Training Evaluation:** - Tested on 20,000 prompts from untrained datasets - Accuracy: 95.25% - Precision: 91.59% - Recall: 99.74% - F1 Score: 95.49% ### Differences from Previous Versions This version uses a new dataset, focusing solely on prompt injections in English, with improvements in model accuracy and response to community feedback. The original model achieves the following results on our post-training dataset: - Accuracy: 94.8% - Precision: 90.9% - Recall: 99.6% - F1 Score: 95% ## How to Get Started with the Model ### Transformers ### Optimum with ONNX Loading the model requires the 🤗 Optimum library installed. ### Integrate with Langchain Documentation ### Use in LLM Guard Read more ## Community Join our Slack community to connect with developers, provide feedback, and discuss LLM security. ## Citation", + "model_explanation_gemini": "Detects and classifies English prompt injection attacks to prevent manipulation of language model outputs." +} \ No newline at end of file diff --git a/data/model_data_json/prov-gigapath_prov-gigapath.json b/data/model_data_json/prov-gigapath_prov-gigapath.json new file mode 100644 index 0000000000000000000000000000000000000000..b9c4424767836d58b21464c3c7677811c9cf5c96 --- /dev/null +++ b/data/model_data_json/prov-gigapath_prov-gigapath.json @@ -0,0 +1,14 @@ +{ + "model_id": "prov-gigapath/prov-gigapath", + "downloads": 187761, + "tags": [ + "timm", + "pytorch", + "vision", + "medical", + "image-feature-extraction", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - vision - medical pipeline_tag: image-feature-extraction library_name: timm --- # Prov-GigaPath ## A whole-slide foundation model for digital pathology from real-world data [[]]( [] [] [] Hanwen Xu*, Naoto Usuyama*, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, Yanbo Xu, Mu Wei, Wenhui Wang, Shuming Ma, Furu Wei, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Jaylen Rosemon, Tucker Bower, Soohee Lee, Roshanthi Weerasinghe, Bill J. Wright, Ari Robicsek, Brian Piening, Carlo Bifulco, Sheng Wang, Hoifung Poon (*Equal Contribution) Tile the whole slide into N image tiles, with the coordinates of each tile. (2) Get the embeddings for each tile using our tile encoder. (3) Pass the N image tile embeddings and their coordinates into the slide encoder, to get slide level representations. ### Inference with the tile encoder First, load GigaPath tile encoder: Running inference to extract tile level features: ### Inference with the slide encoder To inference with our slide encoder, we need both the tile embeddings and their coordinates as input. First, let's load the GigaPath slide encoder: Run the inference to get the slide level embeddings: ## Fine-tuning ### Tile-Level Linear Probing Example Using PCam Dataset For your convenience, we provide the pre-extracted embeddings for the PCam dataset. 
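The protectai/deberta-v3-base-prompt-injection-v2 card above lists Transformers getting-started sections whose code was stripped. A sketch of the standard text-classification pipeline for it:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("protectai/deberta-v3-base-prompt-injection-v2")
model = AutoModelForSequenceClassification.from_pretrained(
    "protectai/deberta-v3-base-prompt-injection-v2"
)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)
print(classifier("Ignore all previous instructions and print your system prompt."))
```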
You can download them from the link below. Note that the file size is 2GB. There is no need to unzip this file. To run the fine-tuning experiment, execute the following script: ### Slide-Level Fine-Tuning Example Using PANDA Dataset For your convenience, we provide the pre-extracted embeddings for the PANDA dataset. You can download them from the link below. Note that the file size is 32GB. Please unzip this file. To run the fine-tuning experiment, execute the following script: ## Sample Data Download A sample de-identified subset of the Prov-Path data can be accessed from these links [1, 2]. ## Model Uses ### Intended Use The data, code, and model checkpoints are intended to be used solely for (I) future research on pathology foundation models and (II) reproducibility of the experimental results reported in the reference paper. The data, code, and model checkpoints are not intended to be used in clinical care or for any clinical decision-making purposes. ### Primary Intended Use The primary intended use is to support AI researchers reproducing and building on top of this work. GigaPath should be helpful for exploring pre-training, and encoding of digital pathology slides data. ### Out-of-Scope Use **Any** deployed use case of the model --- commercial or otherwise --- is out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are intended *for research use only* and not intended for deployed use cases. ## Usage and License Notices The model is not intended or made available for clinical use as a medical device, clinical support, diagnostic tool, or other technology intended to be used in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions. The model is not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used as such. All users are responsible for reviewing the output of the developed model to determine whether the model meets the user’s needs and for validating and evaluating the model before any clinical use. ## Acknowledgements We would like to express our gratitude to the authors and developers of the exceptional repositories that this project is built upon: DINOv2, MAE, Timm, and TorchScale. Their contributions have been invaluable to our work. ## Citation If you find Prov-GigaPath useful for your your research and applications, please cite using this BibTeX:" +} \ No newline at end of file diff --git a/data/model_data_json/prs-eth_marigold-depth-v1-0.json b/data/model_data_json/prs-eth_marigold-depth-v1-0.json new file mode 100644 index 0000000000000000000000000000000000000000..921b4e955d2df3a88634de5b8a18b40f1ca9d9c7 --- /dev/null +++ b/data/model_data_json/prs-eth_marigold-depth-v1-0.json @@ -0,0 +1,20 @@ +{ + "model_id": "prs-eth/marigold-depth-v1-0", + "downloads": 75450, + "tags": [ + "diffusers", + "safetensors", + "depth estimation", + "image analysis", + "computer vision", + "in-the-wild", + "zero-shot", + "depth-estimation", + "en", + "arxiv:2312.02145", + "license:apache-2.0", + "diffusers:MarigoldDepthPipeline", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en pipeline_tag: depth-estimation new_version: prs-eth/marigold-depth-v1-1 pinned: true tags: - depth estimation - image analysis - computer vision - in-the-wild - zero-shot ---
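The Prov-GigaPath card above describes tile-encoder inference whose code was stripped. A sketch via timm's hub integration (the preprocessing values are an assumption based on standard ImageNet-style transforms; the repo is gated, so authentication may be required, and the tile path is a stand-in):

```python
import timm
import torch
from PIL import Image
from torchvision import transforms

# Tile encoder loaded through timm's Hugging Face hub integration.
tile_encoder = timm.create_model("hf_hub:prov-gigapath/prov-gigapath", pretrained=True)
tile_encoder.eval()

# Assumed preprocessing: ImageNet-style normalization at 224x224.
transform = transforms.Compose([
    transforms.Resize(256, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

tile = transform(Image.open("tile.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    embedding = tile_encoder(tile)  # one feature vector per image tile
```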

Marigold Depth v1-0 Model Card

\"Image \"diffusers\" \"Github\" \"Website\" \"arXiv\" \"Social\" \"License\"

This is a model card for the model for monocular depth estimation from a single image. The model is fine-tuned from the model as described in our CVPR'2024 paper titled \"Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation\". - Play with the interactive Hugging Face Spaces demo: check out how the model works with example images or upload your own. - Use it with diffusers to compute the results with a few lines of code. - Get to the bottom of things with our official codebase. ## Model Details - **Developed by:** Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler. - **Model type:** Generative latent diffusion-based affine-invariant monocular depth estimation from a single image. - **Language:** English. - **License:** Apache License Version 2.0. - **Model Description:** This model can be used to generate an estimated depth map of an input image. - **Resolution**: Even though any resolution can be processed, the model inherits the base diffusion model's effective resolution of roughly **768** pixels. This means that for optimal predictions, any larger input image should be resized to make the longer side 768 pixels before feeding it into the model. - **Steps and scheduler**: This model was designed for usage with the **DDIM** scheduler and between **10 and 50** denoising steps. It is possible to obtain good predictions with just **one** step by overriding the setting in the scheduler configuration file or by adding after the pipeline is loaded in the code before the first usage. For compatibility reasons we kept this model identical to the paper setting and provided a newer v1-1 model with optimal settings for all possible step configurations. - **Outputs**: - **Affine-invariant depth map**: The predicted values are between 0 and 1, interpolating between the near and far planes of the model's choice. - **Uncertainty map**: Produced only when multiple predictions are ensembled with ensemble size larger than 2. - **Resources for more information:** Project Website, Paper, Code. - **Cite as:**" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_brouhaha.json b/data/model_data_json/pyannote_brouhaha.json new file mode 100644 index 0000000000000000000000000000000000000000..cd158fa497cc71bf73904ea3a7fb6ed4e5c0f814 --- /dev/null +++ b/data/model_data_json/pyannote_brouhaha.json @@ -0,0 +1,27 @@ +{ + "model_id": "pyannote/brouhaha", + "downloads": 84524, + "tags": [ + "pyannote-audio", + "pytorch", + "pyannote", + "pyannote-audio-model", + "audio", + "voice", + "speech", + "voice-activity-detection", + "speech-to-noise ratio", + "snr", + "room acoustics", + "c50", + "dataset:LibriSpeech", + "dataset:AudioSet", + "dataset:EchoThief", + "dataset:MIT-Acoustical-Reverberation-Scene", + "arxiv:2210.13248", + "license:openrail", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-model - audio - voice - speech - voice-activity-detection - speech-to-noise ratio - snr - room acoustics - c50 datasets: - LibriSpeech - AudioSet - EchoThief - MIT-Acoustical-Reverberation-Scene license: openrail extra_gated_prompt: \"The collected information will help acquire a better knowledge of this model userbase and help its maintainers apply for grants to improve it further.
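The Marigold card above says to "use it with diffusers to compute the results with a few lines of code"; the lines themselves are missing from the scrape. A sketch using diffusers' MarigoldDepthPipeline (available in recent diffusers releases; `input.jpg` is a stand-in):

```python
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("input.jpg")
result = pipe(image)  # DDIM scheduler, 10-50 denoising steps per the card

# Affine-invariant depth in [0, 1]; visualize_depth returns PIL images.
vis = pipe.image_processor.visualize_depth(result.prediction)
vis[0].save("depth.png")
```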
\" extra_gated_fields: Company/university: text Website: text I plan to use this model for (task, type of audio data, etc): text --- # 🎙️🥁🚨🔊 Brouhaha !Sample Brouhaha predictions **Joint voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation** TL;DR | Paper | Code | And Now for Something Completely Different ## Installation This model relies on pyannote.audio and brouhaha-vad. ## Usage ## Citation bibtex @inproceedings{Bredin2020, Title = {{pyannote.audio: neural building blocks for speaker diarization}}, Author = {{Bredin}, Herv{\\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe}, Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing}, Address = {Barcelona, Spain}, Month = {May}, Year = {2020}, } ```", + "model_explanation_gemini": "Performs joint voice activity detection, speech-to-noise ratio estimation, and C50 room acoustics analysis from audio signals." +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_embedding.json b/data/model_data_json/pyannote_embedding.json new file mode 100644 index 0000000000000000000000000000000000000000..eba5213fa32933169db951952a3346bdcfdee941 --- /dev/null +++ b/data/model_data_json/pyannote_embedding.json @@ -0,0 +1,23 @@ +{ + "model_id": "pyannote/embedding", + "downloads": 468472, + "tags": [ + "pyannote-audio", + "pytorch", + "tensorboard", + "pyannote", + "pyannote-audio-model", + "audio", + "voice", + "speech", + "speaker", + "speaker-recognition", + "speaker-verification", + "speaker-identification", + "speaker-embedding", + "dataset:voxceleb", + "license:mit", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-model - audio - voice - speech - speaker - speaker-recognition - speaker-verification - speaker-identification - speaker-embedding datasets: - voxceleb license: mit inference: false extra_gated_prompt: \"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.\" extra_gated_fields: Company/university: text Website: text I plan to use this model for (task, type of audio data, etc): text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Speaker embedding Relies on pyannote.audio 2.1: see installation instructions. This model is based on the canonical x-vector TDNN-based architecture, but with filter banks replaced with trainable SincNet features. See []( architecture for implementation details. ## Basic usage Using cosine distance directly, this model reaches 2.8% equal error rate (EER) on VoxCeleb 1 test set. This is without voice activity detection (VAD) nor probabilistic linear discriminant analysis (PLDA). Expect even better results when adding one of those. 
## Advanced usage ### Running on GPU ### Extract embedding from an excerpt ### Extract embeddings using a sliding window ## Citation" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_overlapped-speech-detection.json b/data/model_data_json/pyannote_overlapped-speech-detection.json new file mode 100644 index 0000000000000000000000000000000000000000..6e479ef89523b92044ef7ae9e7e192ded54ad475 --- /dev/null +++ b/data/model_data_json/pyannote_overlapped-speech-detection.json @@ -0,0 +1,21 @@ +{ + "model_id": "pyannote/overlapped-speech-detection", + "downloads": 176956, + "tags": [ + "pyannote-audio", + "pyannote", + "pyannote-audio-pipeline", + "audio", + "voice", + "speech", + "speaker", + "overlapped-speech-detection", + "automatic-speech-recognition", + "dataset:ami", + "dataset:dihard", + "dataset:voxconverse", + "license:mit", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-pipeline - audio - voice - speech - speaker - overlapped-speech-detection - automatic-speech-recognition datasets: - ami - dihard - voxconverse license: mit extra_gated_prompt: \"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.\" extra_gated_fields: Company/university: text Website: text I plan to use this model for (task, type of audio data, etc): text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Overlapped speech detection Relies on pyannote.audio 2.1: see installation instructions. ## Support For commercial enquiries and scientific consulting, please contact me. For technical questions and bug reports, please check pyannote.audio Github repository. ## Citation" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_segmentation-3.0.json b/data/model_data_json/pyannote_segmentation-3.0.json new file mode 100644 index 0000000000000000000000000000000000000000..e87836979830c41f34fe26a8b1c6d3d0add0031f --- /dev/null +++ b/data/model_data_json/pyannote_segmentation-3.0.json @@ -0,0 +1,23 @@ +{ + "model_id": "pyannote/segmentation-3.0", + "downloads": 11569174, + "tags": [ + "pyannote-audio", + "pytorch", + "pyannote", + "pyannote-audio-model", + "audio", + "voice", + "speech", + "speaker", + "speaker-diarization", + "speaker-change-detection", + "speaker-segmentation", + "voice-activity-detection", + "overlapped-speech-detection", + "resegmentation", + "license:mit", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-model - audio - voice - speech - speaker - speaker-diarization - speaker-change-detection - speaker-segmentation - voice-activity-detection - overlapped-speech-detection - resegmentation license: mit inference: false extra_gated_prompt: \"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers improve it further. 
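The pyannote/overlapped-speech-detection card above likewise lost its usage code. A sketch of the pipeline call (placeholders as before):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/overlapped-speech-detection", use_auth_token="YOUR_HF_TOKEN"
)

output = pipeline("audio.wav")
for speech in output.get_timeline().support():
    # two or more speakers are active between speech.start and speech.end
    print(f"overlap {speech.start:.1f}s - {speech.end:.1f}s")
```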
Though this model uses MIT license and will always remain open-source, we will occasionnally email you about premium models and paid services around pyannote.\" extra_gated_fields: Company/university: text Website: text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 \"Powerset\" speaker segmentation This model ingests 10 seconds of mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix where the 7 classes are _non-speech_, _speaker #1_, _speaker #2_, _speaker #3_, _speakers #1 and #2_, _speakers #1 and #3_, and _speakers #2 and #3_. !Example output The various concepts behind this model are described in details in this paper. It has been trained by Séverin Baroudi with pyannote.audio using the combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse. This companion repository by Alexis Plaquet also provides instructions on how to train or finetune such a model on your own data. ## Requirements 1. Install []( with 2. Accept []( user conditions 3. Create access token at []( ## Usage ### Speaker diarization This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunks). See pyannote/speaker-diarization-3.0 pipeline that uses an additional speaker embedding model to perform full recording speaker diarization. ### Voice activity detection ### Overlapped speech detection ## Citations" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_segmentation.json b/data/model_data_json/pyannote_segmentation.json new file mode 100644 index 0000000000000000000000000000000000000000..6eb4e34661119e11e3dcacaf952cc27f0dee73f6 --- /dev/null +++ b/data/model_data_json/pyannote_segmentation.json @@ -0,0 +1,22 @@ +{ + "model_id": "pyannote/segmentation", + "downloads": 9567935, + "tags": [ + "pyannote-audio", + "pytorch", + "pyannote", + "pyannote-audio-model", + "audio", + "voice", + "speech", + "speaker", + "speaker-segmentation", + "voice-activity-detection", + "overlapped-speech-detection", + "resegmentation", + "arxiv:2104.04045", + "license:mit", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-model - audio - voice - speech - speaker - speaker-segmentation - voice-activity-detection - overlapped-speech-detection - resegmentation license: mit inference: false extra_gated_prompt: \"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.\" extra_gated_fields: Company/university: text Website: text I plan to use this model for (task, type of audio data, etc): text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Speaker segmentation Paper | Demo | Blog post !Example ## Usage Relies on pyannote.audio 2.1.1: see installation instructions. 
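The pyannote/segmentation-3.0 card above notes that the model only processes 10-second chunks and is meant to be wrapped in a pipeline; its voice-activity-detection snippet is missing. A sketch (with powerset models, only the duration-smoothing hyper-parameters remain to be set):

```python
from pyannote.audio import Model
from pyannote.audio.pipelines import VoiceActivityDetection

model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token="YOUR_HF_TOKEN")

pipeline = VoiceActivityDetection(segmentation=model)
pipeline.instantiate({"min_duration_on": 0.0, "min_duration_off": 0.0})

vad = pipeline("audio.wav")  # pyannote.core.Annotation of speech regions
for speech in vad.get_timeline().support():
    print(f"speech {speech.start:.1f}s - {speech.end:.1f}s")
```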
### Voice activity detection ### Overlapped speech detection ### Resegmentation ### Raw scores ## Citation ## Reproducible research In order to reproduce the results of the paper \"End-to-end speaker segmentation for overlap-aware resegmentation \", use with the following hyper-parameters: | Voice activity detection | | | | | | ------------------------ | ------- | -------- | ----------------- | ------------------ | | AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037 | | DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067 | | VoxConverse | 0.767 | 0.713 | 0.182 | 0.501 | | Overlapped speech detection | | | | | | --------------------------- | ------- | -------- | ----------------- | ------------------ | | AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187 | | DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144 | | VoxConverse | 0.587 | 0.426 | 0.337 | 0.112 | | Resegmentation of VBx | | | | | | --------------------- | ------- | -------- | ----------------- | ------------------ | | AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705 | | DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182 | | VoxConverse | 0.537 | 0.724 | 0.410 | 0.563 | Expected outputs (and VBx baseline) are also provided in the sub-directories." +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_speaker-diarization-3.0.json b/data/model_data_json/pyannote_speaker-diarization-3.0.json new file mode 100644 index 0000000000000000000000000000000000000000..08816b68a84877b1d436c1396ec4d9d2d4da704f --- /dev/null +++ b/data/model_data_json/pyannote_speaker-diarization-3.0.json @@ -0,0 +1,23 @@ +{ + "model_id": "pyannote/speaker-diarization-3.0", + "downloads": 446935, + "tags": [ + "pyannote-audio", + "pyannote", + "pyannote-audio-pipeline", + "audio", + "voice", + "speech", + "speaker", + "speaker-diarization", + "speaker-change-detection", + "voice-activity-detection", + "overlapped-speech-detection", + "automatic-speech-recognition", + "arxiv:2111.14448", + "arxiv:2012.01477", + "license:mit", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-pipeline - audio - voice - speech - speaker - speaker-diarization - speaker-change-detection - voice-activity-detection - overlapped-speech-detection - automatic-speech-recognition license: mit extra_gated_prompt: \"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers improve it further. Though this pipeline uses MIT license and will always remain open-source, we will occasionnally email you about premium pipelines and paid services around pyannote.\" extra_gated_fields: Company/university: text Website: text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Speaker diarization 3.0 This pipeline has been trained by Séverin Baroudi with pyannote.audio using a combination of the training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse. It ingests mono audio sampled at 16kHz and outputs speaker diarization as an []( instance: * stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels. * audio files sampled at a different rate are resampled to 16kHz automatically upon loading. ## Requirements 1. Install []( with 2. Accept []( user conditions 3. Accept []( user conditions 4. Create access token at []( ## Usage ### Processing on GPU pipelines run on CPU by default. 
You can send them to GPU with the following lines: Real-time factor is around 2.5% using one Nvidia Tesla V100 SXM2 GPU (for the neural inference part) and one Intel Cascade Lake 6248 CPU (for the clustering part). In other words, it takes approximately 1.5 minutes to process a one hour conversation. ### Processing from memory Pre-loading audio files in memory may result in faster processing: ### Monitoring progress Hooks are available to monitor the progress of the pipeline: ### Controlling the number of speakers In case the number of speakers is known in advance, one can use the option: One can also provide lower and/or upper bounds on the number of speakers using and options: ## Benchmark This pipeline has been benchmarked on a large collection of datasets. Processing is fully automatic: * no manual voice activity detection (as is sometimes the case in the literature) * no manual number of speakers (though it is possible to provide it to the pipeline) * no fine-tuning of the internal models nor tuning of the pipeline hyper-parameters to each dataset ... with the least forgiving diarization error rate (DER) setup (named *\"Full\"* in this paper): * no forgiveness collar * evaluation of overlapped speech | Benchmark | DER% | FA% | Miss% | Conf% | Expected output | File-level evaluation | | ------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | | AISHELL-4 | 12.3 | 3.8 | 4.4 | 4.1 | RTTM | eval | | AliMeeting (*channel 1*) | 24.3 | 4.4 | 10.0 | 9.9 | RTTM | eval | | AMI (*headset mix,* *only_words*) | 19.0 | 3.6 | 9.5 | 5.9 | RTTM | eval | | AMI (*array1, channel 1,* *only_words)* | 22.2 | 3.8 | 11.2 | 7.3 | RTTM | eval | | AVA-AVD | 49.1 | 10.8 | 15.7| 22.5 | RTTM | eval | | DIHARD 3 (*Full*) | 21.7 | 6.2 | 8.1 | 7.3 | RTTM | eval | | MSDWild | 24.6 | 5.8 | 8.0 | 10.7 | RTTM | eval | | REPERE (*phase 2*) | 7.8 | 1.8 | 2.6 | 3.5 | RTTM | eval | | VoxConverse (*v0.3*) | 11.3 | 4.1 | 3.4 | 3.8 | RTTM | eval | ## Citations" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_speaker-diarization-3.1.json b/data/model_data_json/pyannote_speaker-diarization-3.1.json new file mode 100644 index 0000000000000000000000000000000000000000..cdec54b4a01138fbb8d828783e1636623540715f --- /dev/null +++ b/data/model_data_json/pyannote_speaker-diarization-3.1.json @@ -0,0 +1,24 @@ +{ + "model_id": "pyannote/speaker-diarization-3.1", + "downloads": 10614517, + "tags": [ + "pyannote-audio", + "pyannote", + "pyannote-audio-pipeline", + "audio", + "voice", + "speech", + "speaker", + "speaker-diarization", + "speaker-change-detection", + "voice-activity-detection", + "overlapped-speech-detection", + "automatic-speech-recognition", + "arxiv:2111.14448", + "arxiv:2012.01477", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-pipeline - audio - voice - speech - speaker - speaker-diarization - speaker-change-detection - voice-activity-detection - overlapped-speech-detection - automatic-speech-recognition license: mit extra_gated_prompt: 
\"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers improve it further. Though this pipeline uses MIT license and will always remain open-source, we will occasionnally email you about premium pipelines and paid services around pyannote.\" extra_gated_fields: Company/university: text Website: text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Speaker diarization 3.1 This pipeline is the same as []( except it removes the problematic use of . Both speaker segmentation and embedding now run in pure PyTorch. This should ease deployment and possibly speed up inference. It requires pyannote.audio version 3.1 or higher. It ingests mono audio sampled at 16kHz and outputs speaker diarization as an []( instance: - stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels. - audio files sampled at a different rate are resampled to 16kHz automatically upon loading. ## Requirements 1. Install []( with 2. Accept []( user conditions 3. Accept []( user conditions 4. Create access token at []( ## Usage ### Processing on GPU pipelines run on CPU by default. You can send them to GPU with the following lines: ### Processing from memory Pre-loading audio files in memory may result in faster processing: ### Monitoring progress Hooks are available to monitor the progress of the pipeline: ### Controlling the number of speakers In case the number of speakers is known in advance, one can use the option: One can also provide lower and/or upper bounds on the number of speakers using and options: ## Benchmark This pipeline has been benchmarked on a large collection of datasets. Processing is fully automatic: - no manual voice activity detection (as is sometimes the case in the literature) - no manual number of speakers (though it is possible to provide it to the pipeline) - no fine-tuning of the internal models nor tuning of the pipeline hyper-parameters to each dataset ... 
with the least forgiving diarization error rate (DER) setup (named _\"Full\"_ in this paper): - no forgiveness collar - evaluation of overlapped speech | Benchmark | DER% | FA% | Miss% | Conf% | Expected output | File-level evaluation | | ------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | | AISHELL-4 | 12.2 | 3.8 | 4.4 | 4.0 | RTTM | eval | | AliMeeting (_channel 1_) | 24.4 | 4.4 | 10.0 | 10.0 | RTTM | eval | | AMI (_headset mix,_ _only_words_) | 18.8 | 3.6 | 9.5 | 5.7 | RTTM | eval | | AMI (_array1, channel 1,_ _only_words)_ | 22.4 | 3.8 | 11.2 | 7.5 | RTTM | eval | | AVA-AVD | 50.0 | 10.8 | 15.7 | 23.4 | RTTM | eval | | DIHARD 3 (_Full_) | 21.7 | 6.2 | 8.1 | 7.3 | RTTM | eval | | MSDWild | 25.3 | 5.8 | 8.0 | 11.5 | RTTM | eval | | REPERE (_phase 2_) | 7.8 | 1.8 | 2.6 | 3.5 | RTTM | eval | | VoxConverse (_v0.3_) | 11.3 | 4.1 | 3.4 | 3.8 | RTTM | eval | ## Citations" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_speaker-diarization.json b/data/model_data_json/pyannote_speaker-diarization.json new file mode 100644 index 0000000000000000000000000000000000000000..6b91abfc66fad8d92a772a01551f8b9a22fd656e --- /dev/null +++ b/data/model_data_json/pyannote_speaker-diarization.json @@ -0,0 +1,30 @@ +{ + "model_id": "pyannote/speaker-diarization", + "downloads": 864999, + "tags": [ + "pyannote-audio", + "pyannote", + "pyannote-audio-pipeline", + "audio", + "voice", + "speech", + "speaker", + "speaker-diarization", + "speaker-change-detection", + "voice-activity-detection", + "overlapped-speech-detection", + "automatic-speech-recognition", + "dataset:ami", + "dataset:dihard", + "dataset:voxconverse", + "dataset:aishell", + "dataset:repere", + "dataset:voxceleb", + "arxiv:2012.01477", + "arxiv:2110.07058", + "arxiv:2005.08072", + "license:mit", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-pipeline - audio - voice - speech - speaker - speaker-diarization - speaker-change-detection - voice-activity-detection - overlapped-speech-detection - automatic-speech-recognition datasets: - ami - dihard - voxconverse - aishell - repere - voxceleb license: mit extra_gated_prompt: \"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.\" extra_gated_fields: Company/university: text Website: text I plan to use this model for (task, type of audio data, etc): text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Speaker diarization Relies on pyannote.audio 2.1.1: see installation instructions. 
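The speaker-diarization-3.0/3.1 cards above describe GPU placement and speaker-count options whose snippets are missing. A sketch against the 3.1 pipeline (token and file names are placeholders):

```python
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)
pipeline.to(torch.device("cuda"))  # pipelines run on CPU by default

# The speaker count can be fixed (num_speakers=2) or bounded
# (min_speakers=2, max_speakers=5) as described in the cards.
diarization = pipeline("audio.wav")

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```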
## TL;DR ## Advanced usage In case the number of speakers is known in advance, one can use the option: One can also provide lower and/or upper bounds on the number of speakers using and options: ## Benchmark ### Real-time factor Real-time factor is around 2.5% using one Nvidia Tesla V100 SXM2 GPU (for the neural inference part) and one Intel Cascade Lake 6248 CPU (for the clustering part). In other words, it takes approximately 1.5 minutes to process a one hour conversation. ### Accuracy This pipeline is benchmarked on a growing collection of datasets. Processing is fully automatic: * no manual voice activity detection (as is sometimes the case in the literature) * no manual number of speakers (though it is possible to provide it to the pipeline) * no fine-tuning of the internal models nor tuning of the pipeline hyper-parameters to each dataset ... with the least forgiving diarization error rate (DER) setup (named *\"Full\"* in this paper): * no forgiveness collar * evaluation of overlapped speech | Benchmark | DER% | FA% | Miss% | Conf% | Expected output | File-level evaluation | | ------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | | AISHELL-4 | 14.09 | 5.17 | 3.27 | 5.65 | RTTM | eval | | Albayzin (*RTVE 2022*) | 25.60 | 5.58 | 6.84 | 13.18 | RTTM | eval | | AliMeeting (*channel 1*) | 27.42 | 4.84 | 14.00 | 8.58 | RTTM | eval | | AMI (*headset mix,* *only_words*) | 18.91 | 4.48 | 9.51 | 4.91 | RTTM | eval | | AMI (*array1, channel 1,* *only_words)* | 27.12 | 4.11 | 17.78 | 5.23 | RTTM | eval | | CALLHOME (*part2*) | 32.37 | 6.30 | 13.72 | 12.35 | RTTM | eval | | DIHARD 3 (*Full*) | 26.94 | 10.50 | 8.41 | 8.03 | RTTM | eval | | Ego4D *v1 (validation)* | 63.99 | 3.91 | 44.42 | 15.67 | RTTM | eval | | REPERE (*phase 2*) | 8.17 | 2.23 | 2.49 | 3.45 | RTTM | eval | | This American Life | 20.82 | 2.03 | 11.89 | 6.90 | RTTM | eval | | VoxConverse (*v0.3*) | 11.24 | 4.42 | 2.88 | 3.94 | RTTM | eval | ## Technical report This report describes the main principles behind version of pyannote.audio speaker diarization pipeline. It also provides recipes explaining how to adapt the pipeline to your own set of annotated data. In particular, those are applied to the above benchmark and consistently leads to significant performance improvement over the above out-of-the-box performance. 
## Citations" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_voice-activity-detection.json b/data/model_data_json/pyannote_voice-activity-detection.json new file mode 100644 index 0000000000000000000000000000000000000000..53403a1d624c3db155fb119dd032d704d0181a71 --- /dev/null +++ b/data/model_data_json/pyannote_voice-activity-detection.json @@ -0,0 +1,21 @@ +{ + "model_id": "pyannote/voice-activity-detection", + "downloads": 8061538, + "tags": [ + "pyannote-audio", + "pyannote", + "pyannote-audio-pipeline", + "audio", + "voice", + "speech", + "speaker", + "voice-activity-detection", + "automatic-speech-recognition", + "dataset:ami", + "dataset:dihard", + "dataset:voxconverse", + "license:mit", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-pipeline - audio - voice - speech - speaker - voice-activity-detection - automatic-speech-recognition datasets: - ami - dihard - voxconverse license: mit extra_gated_prompt: \"The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.\" extra_gated_fields: Company/university: text Website: text I plan to use this model for (task, type of audio data, etc): text --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Voice activity detection Relies on pyannote.audio 2.1: see installation instructions. ## Citation" +} \ No newline at end of file diff --git a/data/model_data_json/pyannote_wespeaker-voxceleb-resnet34-LM.json b/data/model_data_json/pyannote_wespeaker-voxceleb-resnet34-LM.json new file mode 100644 index 0000000000000000000000000000000000000000..771ca0c6876fc0b7187da7b8c939b9260ffa1995 --- /dev/null +++ b/data/model_data_json/pyannote_wespeaker-voxceleb-resnet34-LM.json @@ -0,0 +1,24 @@ +{ + "model_id": "pyannote/wespeaker-voxceleb-resnet34-LM", + "downloads": 16472763, + "tags": [ + "pyannote-audio", + "pytorch", + "pyannote", + "pyannote-audio-model", + "wespeaker", + "audio", + "voice", + "speech", + "speaker", + "speaker-recognition", + "speaker-verification", + "speaker-identification", + "speaker-embedding", + "dataset:voxceleb", + "license:cc-by-4.0", + "region:us" + ], + "description": "--- tags: - pyannote - pyannote-audio - pyannote-audio-model - wespeaker - audio - voice - speech - speaker - speaker-recognition - speaker-verification - speaker-identification - speaker-embedding datasets: - voxceleb license: cc-by-4.0 inference: false --- Using this open-source model in production? Consider switching to pyannoteAI for better and faster options. # 🎹 Wrapper around wespeaker-voxceleb-resnet34-LM This model requires version 3.1 or higher. This is a wrapper around WeSpeaker pretrained speaker embedding model, for use in . ## Basic usage ## Advanced usage ### Running on GPU ### Extract embedding from an excerpt ### Extract embeddings using a sliding window ## License According to this page: > The pretrained model in WeNet follows the license of it's corresponding dataset. 
For example, the pretrained model on VoxCeleb follows the Creative Commons Attribution 4.0 International License, since that is the license of the VoxCeleb dataset; see ## Citation", "model_explanation_gemini": "Generates speaker embeddings for tasks like speaker recognition, verification, and identification using a WeSpeaker pretrained model on VoxCeleb data."
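The wespeaker-voxceleb-resnet34-LM wrapper card above lost its basic-usage code. A sketch following the same Model/Inference pattern as the other pyannote embedding models (the 256-dimensional output size is an assumption from the WeSpeaker ResNet34 recipe):

```python
from pyannote.audio import Model, Inference

model = Model.from_pretrained(
    "pyannote/wespeaker-voxceleb-resnet34-LM", use_auth_token="YOUR_HF_TOKEN"
)

# window="whole" yields one embedding per file; a sliding window, e.g.
# Inference(model, window="sliding", duration=3.0, step=1.0), yields one per window.
inference = Inference(model, window="whole")
embedding = inference("speaker.wav")  # numpy array, assumed 256-dimensional
print(embedding.shape)
```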
Base model is RoBERTuito, a RoBERTa model trained on Spanish tweets. Uses , , labels. ## Usage Use it directly with pysentimiento ## Results Results for the four tasks evaluated in . Results are expressed as Macro F1 scores | model | emotion | hate_speech | irony | sentiment | |:--------------|:--------------|:--------------|:--------------|:--------------| | robertuito | 0.560 ± 0.010 | 0.759 ± 0.007 | 0.739 ± 0.005 | 0.705 ± 0.003 | | roberta | 0.527 ± 0.015 | 0.741 ± 0.012 | 0.721 ± 0.008 | 0.670 ± 0.006 | | bertin | 0.524 ± 0.007 | 0.738 ± 0.007 | 0.713 ± 0.012 | 0.666 ± 0.005 | | beto_uncased | 0.532 ± 0.012 | 0.727 ± 0.016 | 0.701 ± 0.007 | 0.651 ± 0.006 | | beto_cased | 0.516 ± 0.012 | 0.724 ± 0.012 | 0.705 ± 0.009 | 0.662 ± 0.005 | | mbert_uncased | 0.493 ± 0.010 | 0.718 ± 0.011 | 0.681 ± 0.010 | 0.617 ± 0.003 | | biGRU | 0.264 ± 0.007 | 0.592 ± 0.018 | 0.631 ± 0.011 | 0.585 ± 0.011 | Note that for Hate Speech, these are the results for Semeval 2019, Task 5 Subtask B ## Citation If you use this model in your research, please cite pysentimiento, RoBERTuito and TASS papers:", + "model_explanation_gemini": "Performs sentiment analysis on Spanish tweets using the RoBERTuito model, trained on a diverse corpus to classify emotions, hate speech, irony, and sentiment." +} \ No newline at end of file diff --git a/data/model_data_json/qanastek_51-languages-classifier.json b/data/model_data_json/qanastek_51-languages-classifier.json new file mode 100644 index 0000000000000000000000000000000000000000..46350f4afe7e9f7646826da8041876a9b0167817 --- /dev/null +++ b/data/model_data_json/qanastek_51-languages-classifier.json @@ -0,0 +1,17 @@ +{ + "model_id": "qanastek/51-languages-classifier", + "downloads": 83055, + "tags": [ + "transformers", + "pytorch", + "Transformers", + "text-classification", + "multi-class-classification", + "dataset:qanastek/MASSIVE", + "arxiv:1911.02116", + "license:cc-by-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - Transformers - text-classification - multi-class-classification languages: - af-ZA - am-ET - ar-SA - az-AZ - bn-BD - cy-GB - da-DK - de-DE - el-GR - en-US - es-ES - fa-IR - fi-FI - fr-FR - he-IL - hi-IN - hu-HU - hy-AM - id-ID - is-IS - it-IT - ja-JP - jv-ID - ka-GE - km-KH - kn-IN - ko-KR - lv-LV - ml-IN - mn-MN - ms-MY - my-MM - nb-NO - nl-NL - pl-PL - pt-PT - ro-RO - ru-RU - sl-SL - sq-AL - sv-SE - sw-KE - ta-IN - te-IN - th-TH - tl-PH - tr-TR - ur-PK - vi-VN - zh-CN - zh-TW multilinguality: - af-ZA - am-ET - ar-SA - az-AZ - bn-BD - cy-GB - da-DK - de-DE - el-GR - en-US - es-ES - fa-IR - fi-FI - fr-FR - he-IL - hi-IN - hu-HU - hy-AM - id-ID - is-IS - it-IT - ja-JP - jv-ID - ka-GE - km-KH - kn-IN - ko-KR - lv-LV - ml-IN - mn-MN - ms-MY - my-MM - nb-NO - nl-NL - pl-PL - pt-PT - ro-RO - ru-RU - sl-SL - sq-AL - sv-SE - sw-KE - ta-IN - te-IN - th-TH - tl-PH - tr-TR - ur-PK - vi-VN - zh-CN - zh-TW datasets: - qanastek/MASSIVE widget: - text: \"wake me up at five am this week\" - text: \"je veux écouter la chanson de jacques brel encore une fois\" - text: \"quiero escuchar la canción de arijit singh una vez más\" - text: \"olly onde é que á um parque por perto onde eu possa correr\" - text: \"פרק הבא בפודקאסט בבקשה\" - text: \"亚马逊股价\" - text: \"найди билет на поезд в санкт-петербург\" license: cc-by-4.0 --- **People Involved** * LABRAK Yanis (1) **Affiliations** 1. LIA, NLP team, Avignon University, Avignon, France.
## Model XLM-Roberta : Paper : Unsupervised Cross-lingual Representation Learning at Scale ## Demo: How to use in HuggingFace Transformers Pipeline Requires transformers: Outputs: ## Training data MASSIVE is a parallel dataset of > 1M utterances across 51 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, composed of general Intelligent Voice Assistant single-shot interactions. ### Languages The model is capable of distinguishing the 51 languages listed above. ## Evaluation results Keywords: language identification; multilingual; classification" +} \ No newline at end of file diff --git a/data/model_data_json/r-f_wav2vec-english-speech-emotion-recognition.json b/data/model_data_json/r-f_wav2vec-english-speech-emotion-recognition.json new file mode 100644 index 0000000000000000000000000000000000000000..ee949953c395a8eb2140f3139ea4fe246f0a1042 --- /dev/null +++ b/data/model_data_json/r-f_wav2vec-english-speech-emotion-recognition.json @@ -0,0 +1,16 @@ +{ + "model_id": "r-f/wav2vec-english-speech-emotion-recognition", + "downloads": 144543, + "tags": [ + "transformers", + "pytorch", + "wav2vec2", + "automatic-speech-recognition", + "generated_from_trainer", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - generated_from_trainer metrics: - accuracy model_index: name: wav2vec-english-speech-emotion-recognition --- # Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0 The model is a fine-tuned version of jonatasgrosman/wav2vec2-large-xlsr-53-english for a Speech Emotion Recognition (SER) task.
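The qanastek/51-languages-classifier card above elides its Transformers pipeline snippet; a plausible sketch, reusing one of the card's own widget examples:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="qanastek/51-languages-classifier")
# A widget example from the card; the expected label is fr-FR.
print(classifier("je veux écouter la chanson de jacques brel encore une fois"))
```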
Several datasets were used to fine-tune the original model: - Surrey Audio-Visual Expressed Emotion (SAVEE) - 480 audio files from 4 male actors - Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) - 1440 audio files from 24 professional actors (12 female, 12 male) - Toronto emotional speech set (TESS) - 2800 audio files from 2 female actors 7 labels/emotions were used as classification labels. It achieves the following results on the evaluation set: - Loss: 0.104075 - Accuracy: 0.97463 ## Model Usage ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.0001 - train_batch_size: 4 - eval_batch_size: 4 - eval_steps: 500 - seed: 42 - gradient_accumulation_steps: 2 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - num_epochs: 4 - max_steps=7500 - save_steps: 1500 ### Training results | Step | Training Loss | Validation Loss | Accuracy | | ---- | ------------- | --------------- | -------- | | 500 | 1.8124 | 1.365212 | 0.486258 | | 1000 | 0.8872 | 0.773145 | 0.79704 | | 1500 | 0.7035 | 0.574954 | 0.852008 | | 2000 | 0.6879 | 1.286738 | 0.775899 | | 2500 | 0.6498 | 0.697455 | 0.832981 | | 3000 | 0.5696 | 0.33724 | 0.892178 | | 3500 | 0.4218 | 0.307072 | 0.911205 | | 4000 | 0.3088 | 0.374443 | 0.930233 | | 4500 | 0.2688 | 0.260444 | 0.936575 | | 5000 | 0.2973 | 0.302985 | 0.92389 | | 5500 | 0.1765 | 0.165439 | 0.961945 | | 6000 | 0.1475 | 0.170199 | 0.961945 | | 6500 | 0.1274 | 0.15531 | 0.966173 | | 7000 | 0.0699 | 0.103882 | 0.976744 | | 7500 | 0.083 | 0.104075 | 0.97463 |", + "model_explanation_gemini": "Fine-tuned from jonatasgrosman/wav2vec2-large-xlsr-53-english, this model classifies speech into 7 emotion categories using audio data from multiple datasets, achieving high accuracy in speech emotion recognition."
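The "## Model Usage" snippet above is elided. One plausible way to run inference, assuming the checkpoint loads through the standard sequence-classification head (an assumption; the card's own snippet may wire up a custom class) and a hypothetical 16 kHz clip `angry.wav`:

```python
import librosa
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

name = "r-f/wav2vec-english-speech-emotion-recognition"
extractor = AutoFeatureExtractor.from_pretrained(name)
model = Wav2Vec2ForSequenceClassification.from_pretrained(name)

speech, _ = librosa.load("angry.wav", sr=16000)
inputs = extractor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(-1))])  # one of the 7 emotion labels
```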
+} \ No newline at end of file diff --git a/data/model_data_json/redis_langcache-embed-v1.json b/data/model_data_json/redis_langcache-embed-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..50ac14181dd3a62b173561cf07bfcc3dc5be0ddd --- /dev/null +++ b/data/model_data_json/redis_langcache-embed-v1.json @@ -0,0 +1,21 @@ +{ + "model_id": "redis/langcache-embed-v1", + "downloads": 12235, + "tags": [ + "sentence-transformers", + "safetensors", + "modernbert", + "sentence-similarity", + "loss:OnlineContrastiveLoss", + "arxiv:2504.02268", + "arxiv:1908.10084", + "base_model:Alibaba-NLP/gte-modernbert-base", + "base_model:finetune:Alibaba-NLP/gte-modernbert-base", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - sentence-transformers - sentence-similarity - loss:OnlineContrastiveLoss base_model: Alibaba-NLP/gte-modernbert-base pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy - cosine_precision - cosine_recall - cosine_f1 - cosine_ap model-index: - name: SentenceTransformer based on Alibaba-NLP/gte-modernbert-base results: - task: type: my-binary-classification name: My Binary Classification dataset: name: Quora type: unknown metrics: - type: cosine_accuracy value: 0.90 name: Cosine Accuracy - type: cosine_f1 value: 0.87 name: Cosine F1 - type: cosine_precision value: 0.84 name: Cosine Precision - type: cosine_recall value: 0.90 name: Cosine Recall - type: cosine_ap value: 0.92 name: Cosine Ap --- # Redis semantic caching embedding model based on Alibaba-NLP/gte-modernbert-base This is a sentence-transformers model finetuned from Alibaba-NLP/gte-modernbert-base on the Quora dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity for the purpose of semantic caching. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** Alibaba-NLP/gte-modernbert-base - **Maximum Sequence Length:** 8192 tokens - **Output Dimensionality:** 768 dimensions - **Similarity Function:** Cosine Similarity - **Training Dataset:** - Quora ### Model Sources - **Documentation:** Sentence Transformers Documentation - **Repository:** Sentence Transformers on GitHub - **Hugging Face:** Sentence Transformers on Hugging Face ### Full Model Architecture ## Usage First install the Sentence Transformers library: Then you can load this model and run inference. #### Binary Classification | Metric | Value | |:--------------------------|:----------| | cosine_accuracy | 0.90 | | cosine_f1 | 0.87 | | cosine_precision | 0.84 | | cosine_recall | 0.90 | | **cosine_ap** | 0.92 | ### Training Dataset #### Quora * Dataset: Quora * Size: 323491 training samples * Columns: question_1, question_2, and label ### Evaluation Dataset #### Quora * Dataset: Quora * Size: 53486 evaluation samples * Columns: question_1, question_2, and label ## Citation ### BibTeX #### Redis Langcache-embed Models #### Sentence Transformers # Whisper Medium FLEURS Language Identification This model is a fine-tuned version of openai/whisper-medium on the FLEURS subset of the google/xtreme_s dataset. 
It achieves the following results on the evaluation set: - Loss: 0.8413 - Accuracy: 0.8805 To reproduce this run, execute the command in []( ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 3e-05 - train_batch_size: 16 - eval_batch_size: 32 - seed: 0 - distributed_type: multi-GPU - gradient_accumulation_steps: 2 - total_train_batch_size: 32 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_ratio: 0.1 - num_epochs: 3.0 ### Training results | Training Loss | Epoch | Step | Validation Loss | Accuracy | |:-------------:|:-----:|:-----:|:---------------:|:--------:| | 0.0152 | 1.0 | 8494 | 0.9087 | 0.8431 | | 0.0003 | 2.0 | 16988 | 1.0059 | 0.8460 | | 0.0 | 3.0 | 25482 | 0.8413 | 0.8805 | ### Framework versions - Transformers 4.27.0.dev0 - Pytorch 1.13.1 - Datasets 2.9.0 - Tokenizers 0.13.2", + "model_explanation_gemini": "Identifies spoken languages in audio using a fine-tuned Whisper-medium model trained on the FLEURS dataset with 88.05% accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/sayeed99_segformer_b3_clothes.json b/data/model_data_json/sayeed99_segformer_b3_clothes.json new file mode 100644 index 0000000000000000000000000000000000000000..65f5c2dc3981fabd3a601b854412fe22b484aec3 --- /dev/null +++ b/data/model_data_json/sayeed99_segformer_b3_clothes.json @@ -0,0 +1,18 @@ +{ + "model_id": "sayeed99/segformer_b3_clothes", + "downloads": 95415, + "tags": [ + "transformers", + "safetensors", + "segformer", + "vision", + "image-segmentation", + "dataset:mattmdjaga/human_parsing_dataset", + "arxiv:2105.15203", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - vision - image-segmentation widget: - src: >- example_title: Person - src: >- example_title: Person datasets: - mattmdjaga/human_parsing_dataset pipeline_tag: image-segmentation --- # Segformer B3 fine-tuned for clothes segmentation SegFormer model fine-tuned on ATR dataset for clothes segmentation but can also be used for human segmentation. The dataset on hugging face is called \"mattmdjaga/human_parsing_dataset\". **NEW** - **Training code**. Right now it only contains the pure code with some comments, but soon I'll add a colab notebook version and a blog post with it to make it more friendly. 
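The usage snippet for sayeed99/segformer_b3_clothes is elided above; a minimal sketch, assuming the standard SegFormer semantic-segmentation API and a hypothetical local `person.jpg` (the label map follows below):

```python
import torch
from PIL import Image
from transformers import AutoModelForSemanticSegmentation, SegformerImageProcessor

processor = SegformerImageProcessor.from_pretrained("sayeed99/segformer_b3_clothes")
model = AutoModelForSemanticSegmentation.from_pretrained("sayeed99/segformer_b3_clothes")

image = Image.open("person.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # low-resolution per-class logits

# Upsample to the input size and take a per-pixel argmax over the 18 classes.
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
segmentation = upsampled.argmax(dim=1)[0]  # HxW tensor of label indices
```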
Labels: 0: \"Background\", 1: \"Hat\", 2: \"Hair\", 3: \"Sunglasses\", 4: \"Upper-clothes\", 5: \"Skirt\", 6: \"Pants\", 7: \"Dress\", 8: \"Belt\", 9: \"Left-shoe\", 10: \"Right-shoe\", 11: \"Face\", 12: \"Left-leg\", 13: \"Right-leg\", 14: \"Left-arm\", 15: \"Right-arm\", 16: \"Bag\", 17: \"Scarf\" ### Evaluation | Label Index | Label Name | Category Accuracy | Category IoU | |:-------------:|:----------------:|:-----------------:|:------------:| | 0 | Background | 0.99 | 0.99 | | 1 | Hat | 0.73 | 0.68 | | 2 | Hair | 0.91 | 0.82 | | 3 | Sunglasses | 0.73 | 0.63 | | 4 | Upper-clothes | 0.87 | 0.78 | | 5 | Skirt | 0.76 | 0.65 | | 6 | Pants | 0.90 | 0.84 | | 7 | Dress | 0.74 | 0.55 | | 8 | Belt | 0.35 | 0.30 | | 9 | Left-shoe | 0.74 | 0.58 | | 10 | Right-shoe | 0.75 | 0.60 | | 11 | Face | 0.92 | 0.85 | | 12 | Left-leg | 0.90 | 0.82 | | 13 | Right-leg | 0.90 | 0.81 | | 14 | Left-arm | 0.86 | 0.74 | | 15 | Right-arm | 0.82 | 0.73 | | 16 | Bag | 0.91 | 0.84 | | 17 | Scarf | 0.63 | 0.29 | Overall Evaluation Metrics: - Evaluation Loss: 0.15 - Mean Accuracy: 0.80 - Mean IoU: 0.69 ### License The license for this model can be found here. ### BibTeX entry and citation info", + "model_explanation_gemini": "Segments clothing items and human body parts in images, trained on the ATR dataset with labels for 18 categories including hats, upper-clothes, pants, and accessories." +} \ No newline at end of file diff --git a/data/model_data_json/sdadas_mmlw-retrieval-roberta-large.json b/data/model_data_json/sdadas_mmlw-retrieval-roberta-large.json new file mode 100644 index 0000000000000000000000000000000000000000..a44a557af1ded4e487d5182150c6659160dc2cae --- /dev/null +++ b/data/model_data_json/sdadas_mmlw-retrieval-roberta-large.json @@ -0,0 +1,23 @@ +{ + "model_id": "sdadas/mmlw-retrieval-roberta-large", + "downloads": 192694, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "information-retrieval", + "pl", + "arxiv:2402.13350", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - information-retrieval language: pl license: apache-2.0 widget: - source_sentence: \"zapytanie: Jak dożyć 100 lat?\" sentences: - \"Trzeba zdrowo się odżywiać i uprawiać sport.\" - \"Trzeba pić alkohol, imprezować i jeździć szybkimi autami.\" - \"Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu.\" ---

MMLW-retrieval-roberta-large

MMLW (muszę mieć lepszą wiadomość, \"I must have a better message\") are neural text encoders for Polish. This model is optimized for information retrieval tasks. It can transform queries and passages to 1024-dimensional vectors. The model was developed using a two-step procedure: - In the first step, it was initialized with a Polish RoBERTa checkpoint, and then trained with the multilingual knowledge distillation method on a diverse corpus of 60 million Polish-English text pairs. We utilised English FlagEmbeddings (BGE) as teacher models for distillation. - The second step involved fine-tuning the obtained models with contrastive loss on the Polish MS MARCO training split. In order to improve the efficiency of contrastive training, we used large batch sizes: 1152 for small, 768 for base, and 288 for large models. Fine-tuning was conducted on a cluster of 12 A100 GPUs. ⚠️ **2023-12-26:** We have updated the model to a new version with improved results. You can still download the previous version using the **v1** tag: ⚠️ ## Usage (Sentence-Transformers) ⚠️ Our dense retrievers require the use of specific prefixes and suffixes when encoding texts. For this model, each query should be preceded by the prefix **\"zapytanie: \"** ⚠️ You can use the model like this with sentence-transformers: ## Evaluation Results The model achieves **NDCG@10** of **58.46** on the Polish Information Retrieval Benchmark. See PIRB Leaderboard for detailed results. ## Acknowledgements This model was trained with the A100 GPU cluster support delivered by the Gdansk University of Technology within the TASK center initiative. ## Citation", + "model_explanation_gemini": "Optimized for Polish information retrieval, this model encodes queries and passages into 1024-dimensional vectors using a RoBERTa-based architecture trained via multilingual knowledge distillation and contrastive fine-tuning."
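The sentence-transformers snippet is elided above; a minimal sketch using the card's own widget sentences (note the mandatory "zapytanie: " query prefix):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sdadas/mmlw-retrieval-roberta-large")

query = "zapytanie: Jak dożyć 100 lat?"  # "query: How to live to 100?"
passages = [
    "Trzeba zdrowo się odżywiać i uprawiać sport.",  # eat healthily and do sport
    "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",  # drink, party, fast cars
]
q = model.encode(query, normalize_embeddings=True)
p = model.encode(passages, normalize_embeddings=True)
print(util.cos_sim(q, p))  # the healthy-lifestyle passage should score higher
```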
+} \ No newline at end of file diff --git a/data/model_data_json/sdadas_mmlw-roberta-base.json b/data/model_data_json/sdadas_mmlw-roberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..68b488326c29bf87a17a0f241e15e5d1eeebd726 --- /dev/null +++ b/data/model_data_json/sdadas_mmlw-roberta-base.json @@ -0,0 +1,24 @@ +{ + "model_id": "sdadas/mmlw-roberta-base", + "downloads": 113582, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "mteb", + "pl", + "arxiv:2402.13350", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - mteb model-index: - name: mmlw-roberta-base results: - task: type: Clustering dataset: type: PL-MTEB/8tags-clustering name: MTEB 8TagsClustering config: default split: test revision: None metrics: - type: v_measure value: 33.08463724780795 - task: type: Classification dataset: type: PL-MTEB/allegro-reviews name: MTEB AllegroReviews config: default split: test revision: None metrics: - type: accuracy value: 40.25844930417495 - type: f1 value: 35.59685265418916 - task: type: Retrieval dataset: type: arguana-pl name: MTEB ArguAna-PL config: default split: test revision: None metrics: - type: map_at_1 value: 33.073 - type: map_at_10 value: 50.223 - type: map_at_100 value: 50.942 - type: map_at_1000 value: 50.94499999999999 - type: map_at_3 value: 45.721000000000004 - type: map_at_5 value: 48.413000000000004 - type: mrr_at_1 value: 34.424 - type: mrr_at_10 value: 50.68899999999999 - type: mrr_at_100 value: 51.437999999999995 - type: mrr_at_1000 value: 51.441 - type: mrr_at_3 value: 46.219 - type: mrr_at_5 value: 48.921 - type: ndcg_at_1 value: 33.073 - type: ndcg_at_10 value: 59.021 - type: ndcg_at_100 value: 61.902 - type: ndcg_at_1000 value: 61.983999999999995 - type: ndcg_at_3 value: 49.818 - type: ndcg_at_5 value: 54.644999999999996 - type: precision_at_1 value: 33.073 - type: precision_at_10 value: 8.684 - type: precision_at_100 value: 0.9900000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 20.555 - type: precision_at_5 value: 14.666 - type: recall_at_1 value: 33.073 - type: recall_at_10 value: 86.842 - type: recall_at_100 value: 99.004 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 61.663999999999994 - type: recall_at_5 value: 73.329 - task: type: Classification dataset: type: PL-MTEB/cbd name: MTEB CBD config: default split: test revision: None metrics: - type: accuracy value: 68.11 - type: ap value: 20.916633959031266 - type: f1 value: 56.85804802205465 - task: type: PairClassification dataset: type: PL-MTEB/cdsce-pairclassification name: MTEB CDSC-E config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 89.2 - type: cos_sim_ap value: 79.1041156765933 - type: cos_sim_f1 value: 70.0 - type: cos_sim_precision value: 74.11764705882354 - type: cos_sim_recall value: 66.3157894736842 - type: dot_accuracy value: 88.2 - type: dot_ap value: 72.57183688228149 - type: dot_f1 value: 67.16417910447761 - type: dot_precision value: 63.67924528301887 - type: dot_recall value: 71.05263157894737 - type: euclidean_accuracy value: 89.3 - type: euclidean_ap value: 79.01345533432428 - type: euclidean_f1 value: 70.19498607242339 - type: euclidean_precision 
value: 74.55621301775149 - type: euclidean_recall value: 66.3157894736842 - type: manhattan_accuracy value: 89.3 - type: manhattan_ap value: 79.01671381791259 - type: manhattan_f1 value: 70.0280112044818 - type: manhattan_precision value: 74.8502994011976 - type: manhattan_recall value: 65.78947368421053 - type: max_accuracy value: 89.3 - type: max_ap value: 79.1041156765933 - type: max_f1 value: 70.19498607242339 - task: type: STS dataset: type: PL-MTEB/cdscr-sts name: MTEB CDSC-R config: default split: test revision: None metrics: - type: cos_sim_pearson value: 91.79559442663039 - type: cos_sim_spearman value: 92.5438168962641 - type: euclidean_pearson value: 92.02981265332856 - type: euclidean_spearman value: 92.5548245733484 - type: manhattan_pearson value: 91.95296287979178 - type: manhattan_spearman value: 92.50279516120241 - task: type: Retrieval dataset: type: dbpedia-pl name: MTEB DBPedia-PL config: default split: test revision: None metrics: - type: map_at_1 value: 7.829999999999999 - type: map_at_10 value: 16.616 - type: map_at_100 value: 23.629 - type: map_at_1000 value: 25.235999999999997 - type: map_at_3 value: 12.485 - type: map_at_5 value: 14.077 - type: mrr_at_1 value: 61.75000000000001 - type: mrr_at_10 value: 69.852 - type: mrr_at_100 value: 70.279 - type: mrr_at_1000 value: 70.294 - type: mrr_at_3 value: 68.375 - type: mrr_at_5 value: 69.187 - type: ndcg_at_1 value: 49.75 - type: ndcg_at_10 value: 36.217 - type: ndcg_at_100 value: 41.235 - type: ndcg_at_1000 value: 48.952 - type: ndcg_at_3 value: 41.669 - type: ndcg_at_5 value: 38.285000000000004 - type: precision_at_1 value: 61.5 - type: precision_at_10 value: 28.499999999999996 - type: precision_at_100 value: 9.572 - type: precision_at_1000 value: 2.025 - type: precision_at_3 value: 44.083 - type: precision_at_5 value: 36.3 - type: recall_at_1 value: 7.829999999999999 - type: recall_at_10 value: 21.462999999999997 - type: recall_at_100 value: 47.095 - type: recall_at_1000 value: 71.883 - type: recall_at_3 value: 13.891 - type: recall_at_5 value: 16.326999999999998 - task: type: Retrieval dataset: type: fiqa-pl name: MTEB FiQA-PL config: default split: test revision: None metrics: - type: map_at_1 value: 16.950000000000003 - type: map_at_10 value: 27.422 - type: map_at_100 value: 29.146 - type: map_at_1000 value: 29.328 - type: map_at_3 value: 23.735999999999997 - type: map_at_5 value: 25.671 - type: mrr_at_1 value: 33.796 - type: mrr_at_10 value: 42.689 - type: mrr_at_100 value: 43.522 - type: mrr_at_1000 value: 43.563 - type: mrr_at_3 value: 40.226 - type: mrr_at_5 value: 41.685 - type: ndcg_at_1 value: 33.642 - type: ndcg_at_10 value: 35.008 - type: ndcg_at_100 value: 41.839 - type: ndcg_at_1000 value: 45.035 - type: ndcg_at_3 value: 31.358999999999998 - type: ndcg_at_5 value: 32.377 - type: precision_at_1 value: 33.642 - type: precision_at_10 value: 9.937999999999999 - type: precision_at_100 value: 1.685 - type: precision_at_1000 value: 0.22699999999999998 - type: precision_at_3 value: 21.142 - type: precision_at_5 value: 15.586 - type: recall_at_1 value: 16.950000000000003 - type: recall_at_10 value: 42.286 - type: recall_at_100 value: 68.51899999999999 - type: recall_at_1000 value: 87.471 - type: recall_at_3 value: 28.834 - type: recall_at_5 value: 34.274 - task: type: Retrieval dataset: type: hotpotqa-pl name: MTEB HotpotQA-PL config: default split: test revision: None metrics: - type: map_at_1 value: 37.711 - type: map_at_10 value: 57.867999999999995 - type: map_at_100 value: 58.77 - type: map_at_1000 value: 
58.836999999999996 - type: map_at_3 value: 54.400999999999996 - type: map_at_5 value: 56.564 - type: mrr_at_1 value: 75.449 - type: mrr_at_10 value: 81.575 - type: mrr_at_100 value: 81.783 - type: mrr_at_1000 value: 81.792 - type: mrr_at_3 value: 80.50399999999999 - type: mrr_at_5 value: 81.172 - type: ndcg_at_1 value: 75.422 - type: ndcg_at_10 value: 66.635 - type: ndcg_at_100 value: 69.85 - type: ndcg_at_1000 value: 71.179 - type: ndcg_at_3 value: 61.648 - type: ndcg_at_5 value: 64.412 - type: precision_at_1 value: 75.422 - type: precision_at_10 value: 13.962 - type: precision_at_100 value: 1.649 - type: precision_at_1000 value: 0.183 - type: precision_at_3 value: 39.172000000000004 - type: precision_at_5 value: 25.691000000000003 - type: recall_at_1 value: 37.711 - type: recall_at_10 value: 69.811 - type: recall_at_100 value: 82.471 - type: recall_at_1000 value: 91.29 - type: recall_at_3 value: 58.757999999999996 - type: recall_at_5 value: 64.227 - task: type: Retrieval dataset: type: msmarco-pl name: MTEB MSMARCO-PL config: default split: validation revision: None metrics: - type: map_at_1 value: 17.033 - type: map_at_10 value: 27.242 - type: map_at_100 value: 28.451999999999998 - type: map_at_1000 value: 28.515 - type: map_at_3 value: 24.046 - type: map_at_5 value: 25.840999999999998 - type: mrr_at_1 value: 17.493 - type: mrr_at_10 value: 27.67 - type: mrr_at_100 value: 28.823999999999998 - type: mrr_at_1000 value: 28.881 - type: mrr_at_3 value: 24.529999999999998 - type: mrr_at_5 value: 26.27 - type: ndcg_at_1 value: 17.479 - type: ndcg_at_10 value: 33.048 - type: ndcg_at_100 value: 39.071 - type: ndcg_at_1000 value: 40.739999999999995 - type: ndcg_at_3 value: 26.493 - type: ndcg_at_5 value: 29.701 - type: precision_at_1 value: 17.479 - type: precision_at_10 value: 5.324 - type: precision_at_100 value: 0.8380000000000001 - type: precision_at_1000 value: 0.098 - type: precision_at_3 value: 11.408999999999999 - type: precision_at_5 value: 8.469999999999999 - type: recall_at_1 value: 17.033 - type: recall_at_10 value: 50.929 - type: recall_at_100 value: 79.262 - type: recall_at_1000 value: 92.239 - type: recall_at_3 value: 33.06 - type: recall_at_5 value: 40.747 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.31002017484867 - type: f1 value: 69.61603671063031 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.52790854068594 - type: f1 value: 75.4053872472259 - task: type: Retrieval dataset: type: nfcorpus-pl name: MTEB NFCorpus-PL config: default split: test revision: None metrics: - type: map_at_1 value: 5.877000000000001 - type: map_at_10 value: 12.817 - type: map_at_100 value: 16.247 - type: map_at_1000 value: 17.683 - type: map_at_3 value: 9.334000000000001 - type: map_at_5 value: 10.886999999999999 - type: mrr_at_1 value: 45.201 - type: mrr_at_10 value: 52.7 - type: mrr_at_100 value: 53.425999999999995 - type: mrr_at_1000 value: 53.461000000000006 - type: mrr_at_3 value: 50.464 - type: mrr_at_5 value: 51.827 - type: ndcg_at_1 value: 41.949999999999996 - type: ndcg_at_10 value: 34.144999999999996 - type: ndcg_at_100 value: 31.556 - type: ndcg_at_1000 value: 40.265 - type: ndcg_at_3 value: 38.07 - type: ndcg_at_5 value: 36.571 - type: 
precision_at_1 value: 44.272 - type: precision_at_10 value: 25.697 - type: precision_at_100 value: 8.077 - type: precision_at_1000 value: 2.084 - type: precision_at_3 value: 36.016999999999996 - type: precision_at_5 value: 31.703 - type: recall_at_1 value: 5.877000000000001 - type: recall_at_10 value: 16.986 - type: recall_at_100 value: 32.719 - type: recall_at_1000 value: 63.763000000000005 - type: recall_at_3 value: 10.292 - type: recall_at_5 value: 12.886000000000001 - task: type: Retrieval dataset: type: nq-pl name: MTEB NQ-PL config: default split: test revision: None metrics: - type: map_at_1 value: 25.476 - type: map_at_10 value: 38.67 - type: map_at_100 value: 39.784000000000006 - type: map_at_1000 value: 39.831 - type: map_at_3 value: 34.829 - type: map_at_5 value: 37.025000000000006 - type: mrr_at_1 value: 28.621000000000002 - type: mrr_at_10 value: 41.13 - type: mrr_at_100 value: 42.028 - type: mrr_at_1000 value: 42.059999999999995 - type: mrr_at_3 value: 37.877 - type: mrr_at_5 value: 39.763999999999996 - type: ndcg_at_1 value: 28.563 - type: ndcg_at_10 value: 45.654 - type: ndcg_at_100 value: 50.695 - type: ndcg_at_1000 value: 51.873999999999995 - type: ndcg_at_3 value: 38.359 - type: ndcg_at_5 value: 42.045 - type: precision_at_1 value: 28.563 - type: precision_at_10 value: 7.6450000000000005 - type: precision_at_100 value: 1.052 - type: precision_at_1000 value: 0.117 - type: precision_at_3 value: 17.458000000000002 - type: precision_at_5 value: 12.613 - type: recall_at_1 value: 25.476 - type: recall_at_10 value: 64.484 - type: recall_at_100 value: 86.96199999999999 - type: recall_at_1000 value: 95.872 - type: recall_at_3 value: 45.527 - type: recall_at_5 value: 54.029 - task: type: Classification dataset: type: laugustyniak/abusive-clauses-pl name: MTEB PAC config: default split: test revision: None metrics: - type: accuracy value: 65.87315377932232 - type: ap value: 76.41966964416534 - type: f1 value: 63.64417488639012 - task: type: PairClassification dataset: type: PL-MTEB/ppc-pairclassification name: MTEB PPC config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 87.7 - type: cos_sim_ap value: 92.81319372631636 - type: cos_sim_f1 value: 90.04048582995952 - type: cos_sim_precision value: 88.11410459587957 - type: cos_sim_recall value: 92.05298013245033 - type: dot_accuracy value: 75.0 - type: dot_ap value: 83.63089957943261 - type: dot_f1 value: 80.76923076923077 - type: dot_precision value: 75.43103448275862 - type: dot_recall value: 86.9205298013245 - type: euclidean_accuracy value: 87.7 - type: euclidean_ap value: 92.94772245932825 - type: euclidean_f1 value: 90.10458567980692 - type: euclidean_precision value: 87.63693270735524 - type: euclidean_recall value: 92.71523178807946 - type: manhattan_accuracy value: 87.8 - type: manhattan_ap value: 92.95330512127123 - type: manhattan_f1 value: 90.08130081300813 - type: manhattan_precision value: 88.49840255591054 - type: manhattan_recall value: 91.72185430463577 - type: max_accuracy value: 87.8 - type: max_ap value: 92.95330512127123 - type: max_f1 value: 90.10458567980692 - task: type: PairClassification dataset: type: PL-MTEB/psc-pairclassification name: MTEB PSC config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 96.19666048237477 - type: cos_sim_ap value: 98.61237969571302 - type: cos_sim_f1 value: 93.77845220030349 - type: cos_sim_precision value: 93.35347432024169 - type: cos_sim_recall value: 94.20731707317073 - type: dot_accuracy value: 94.89795918367348 - 
type: dot_ap value: 97.02853491357943 - type: dot_f1 value: 91.85185185185186 - type: dot_precision value: 89.33717579250721 - type: dot_recall value: 94.51219512195121 - type: euclidean_accuracy value: 96.38218923933209 - type: euclidean_ap value: 98.58145584134218 - type: euclidean_f1 value: 94.04580152671755 - type: euclidean_precision value: 94.18960244648318 - type: euclidean_recall value: 93.90243902439023 - type: manhattan_accuracy value: 96.47495361781077 - type: manhattan_ap value: 98.6108221024781 - type: manhattan_f1 value: 94.18960244648318 - type: manhattan_precision value: 94.47852760736197 - type: manhattan_recall value: 93.90243902439023 - type: max_accuracy value: 96.47495361781077 - type: max_ap value: 98.61237969571302 - type: max_f1 value: 94.18960244648318 - task: type: Classification dataset: type: PL-MTEB/polemo2_in name: MTEB PolEmo2.0-IN config: default split: test revision: None metrics: - type: accuracy value: 71.73130193905818 - type: f1 value: 71.17731918813324 - task: type: Classification dataset: type: PL-MTEB/polemo2_out name: MTEB PolEmo2.0-OUT config: default split: test revision: None metrics: - type: accuracy value: 46.59919028340081 - type: f1 value: 37.216392949948954 - task: type: Retrieval dataset: type: quora-pl name: MTEB Quora-PL config: default split: test revision: None metrics: - type: map_at_1 value: 66.134 - type: map_at_10 value: 80.19 - type: map_at_100 value: 80.937 - type: map_at_1000 value: 80.95599999999999 - type: map_at_3 value: 77.074 - type: map_at_5 value: 79.054 - type: mrr_at_1 value: 75.88000000000001 - type: mrr_at_10 value: 83.226 - type: mrr_at_100 value: 83.403 - type: mrr_at_1000 value: 83.406 - type: mrr_at_3 value: 82.03200000000001 - type: mrr_at_5 value: 82.843 - type: ndcg_at_1 value: 75.94 - type: ndcg_at_10 value: 84.437 - type: ndcg_at_100 value: 86.13 - type: ndcg_at_1000 value: 86.29299999999999 - type: ndcg_at_3 value: 81.07799999999999 - type: ndcg_at_5 value: 83.0 - type: precision_at_1 value: 75.94 - type: precision_at_10 value: 12.953999999999999 - type: precision_at_100 value: 1.514 - type: precision_at_1000 value: 0.156 - type: precision_at_3 value: 35.61 - type: precision_at_5 value: 23.652 - type: recall_at_1 value: 66.134 - type: recall_at_10 value: 92.991 - type: recall_at_100 value: 99.003 - type: recall_at_1000 value: 99.86 - type: recall_at_3 value: 83.643 - type: recall_at_5 value: 88.81099999999999 - task: type: Retrieval dataset: type: scidocs-pl name: MTEB SCIDOCS-PL config: default split: test revision: None metrics: - type: map_at_1 value: 4.183 - type: map_at_10 value: 10.626 - type: map_at_100 value: 12.485 - type: map_at_1000 value: 12.793 - type: map_at_3 value: 7.531000000000001 - type: map_at_5 value: 9.037 - type: mrr_at_1 value: 20.5 - type: mrr_at_10 value: 30.175 - type: mrr_at_100 value: 31.356 - type: mrr_at_1000 value: 31.421 - type: mrr_at_3 value: 26.900000000000002 - type: mrr_at_5 value: 28.689999999999998 - type: ndcg_at_1 value: 20.599999999999998 - type: ndcg_at_10 value: 17.84 - type: ndcg_at_100 value: 25.518 - type: ndcg_at_1000 value: 31.137999999999998 - type: ndcg_at_3 value: 16.677 - type: ndcg_at_5 value: 14.641000000000002 - type: precision_at_1 value: 20.599999999999998 - type: precision_at_10 value: 9.3 - type: precision_at_100 value: 2.048 - type: precision_at_1000 value: 0.33999999999999997 - type: precision_at_3 value: 15.533 - type: precision_at_5 value: 12.839999999999998 - type: recall_at_1 value: 4.183 - type: recall_at_10 value: 18.862000000000002 - type: 
recall_at_100 value: 41.592 - type: recall_at_1000 value: 69.037 - type: recall_at_3 value: 9.443 - type: recall_at_5 value: 13.028 - task: type: PairClassification dataset: type: PL-MTEB/sicke-pl-pairclassification name: MTEB SICK-E-PL config: default split: test revision: None metrics: - type: cos_sim_accuracy value: 86.32286995515696 - type: cos_sim_ap value: 82.04302619416443 - type: cos_sim_f1 value: 74.95572086432874 - type: cos_sim_precision value: 74.55954897815363 - type: cos_sim_recall value: 75.35612535612536 - type: dot_accuracy value: 83.9176518548716 - type: dot_ap value: 76.8608733580272 - type: dot_f1 value: 72.31936654569449 - type: dot_precision value: 67.36324523663184 - type: dot_recall value: 78.06267806267806 - type: euclidean_accuracy value: 86.32286995515696 - type: euclidean_ap value: 81.9648986659308 - type: euclidean_f1 value: 74.93796526054591 - type: euclidean_precision value: 74.59421312632321 - type: euclidean_recall value: 75.28490028490027 - type: manhattan_accuracy value: 86.30248675091724 - type: manhattan_ap value: 81.92853980116878 - type: manhattan_f1 value: 74.80968858131489 - type: manhattan_precision value: 72.74562584118439 - type: manhattan_recall value: 76.99430199430199 - type: max_accuracy value: 86.32286995515696 - type: max_ap value: 82.04302619416443 - type: max_f1 value: 74.95572086432874 - task: type: STS dataset: type: PL-MTEB/sickr-pl-sts name: MTEB SICK-R-PL config: default split: test revision: None metrics: - type: cos_sim_pearson value: 83.07566183637853 - type: cos_sim_spearman value: 79.20198022242548 - type: euclidean_pearson value: 81.27875473517936 - type: euclidean_spearman value: 79.21560102311153 - type: manhattan_pearson value: 81.21559474880459 - type: manhattan_spearman value: 79.1537846814979 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 36.39657573900194 - type: cos_sim_spearman value: 40.36403461037013 - type: euclidean_pearson value: 29.143416004776316 - type: euclidean_spearman value: 40.43197841306375 - type: manhattan_pearson value: 29.18632337290767 - type: manhattan_spearman value: 40.50563343395481 - task: type: Retrieval dataset: type: scifact-pl name: MTEB SciFact-PL config: default split: test revision: None metrics: - type: map_at_1 value: 49.428 - type: map_at_10 value: 60.423 - type: map_at_100 value: 61.037 - type: map_at_1000 value: 61.065999999999995 - type: map_at_3 value: 56.989000000000004 - type: map_at_5 value: 59.041999999999994 - type: mrr_at_1 value: 52.666999999999994 - type: mrr_at_10 value: 61.746 - type: mrr_at_100 value: 62.273 - type: mrr_at_1000 value: 62.300999999999995 - type: mrr_at_3 value: 59.278 - type: mrr_at_5 value: 60.611000000000004 - type: ndcg_at_1 value: 52.333 - type: ndcg_at_10 value: 65.75 - type: ndcg_at_100 value: 68.566 - type: ndcg_at_1000 value: 69.314 - type: ndcg_at_3 value: 59.768 - type: ndcg_at_5 value: 62.808 - type: precision_at_1 value: 52.333 - type: precision_at_10 value: 9.167 - type: precision_at_100 value: 1.0630000000000002 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 23.778 - type: precision_at_5 value: 16.2 - type: recall_at_1 value: 49.428 - type: recall_at_10 value: 81.07799999999999 - type: recall_at_100 value: 93.93299999999999 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 65.061 - type: recall_at_5 value: 72.667 - task: type: Retrieval dataset: 
type: trec-covid-pl name: MTEB TRECCOVID-PL config: default split: test revision: None metrics: - type: map_at_1 value: 0.22100000000000003 - type: map_at_10 value: 1.788 - type: map_at_100 value: 9.937 - type: map_at_1000 value: 24.762999999999998 - type: map_at_3 value: 0.579 - type: map_at_5 value: 0.947 - type: mrr_at_1 value: 78.0 - type: mrr_at_10 value: 88.067 - type: mrr_at_100 value: 88.067 - type: mrr_at_1000 value: 88.067 - type: mrr_at_3 value: 87.667 - type: mrr_at_5 value: 88.067 - type: ndcg_at_1 value: 76.0 - type: ndcg_at_10 value: 71.332 - type: ndcg_at_100 value: 54.80500000000001 - type: ndcg_at_1000 value: 49.504999999999995 - type: ndcg_at_3 value: 73.693 - type: ndcg_at_5 value: 73.733 - type: precision_at_1 value: 82.0 - type: precision_at_10 value: 76.8 - type: precision_at_100 value: 56.68 - type: precision_at_1000 value: 22.236 - type: precision_at_3 value: 78.667 - type: precision_at_5 value: 79.2 - type: recall_at_1 value: 0.22100000000000003 - type: recall_at_10 value: 2.033 - type: recall_at_100 value: 13.431999999999999 - type: recall_at_1000 value: 46.913 - type: recall_at_3 value: 0.625 - type: recall_at_5 value: 1.052 language: pl license: apache-2.0 widget: - source_sentence: \"zapytanie: Jak dożyć 100 lat?\" sentences: - \"Trzeba zdrowo się odżywiać i uprawiać sport.\" - \"Trzeba pić alkohol, imprezować i jeździć szybkimi autami.\" - \"Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu.\" ---

MMLW-roberta-base

MMLW (muszę mieć lepszą wiadomość, \"I must have a better message\") are neural text encoders for Polish. This is a distilled model that can be used to generate embeddings applicable to many tasks such as semantic similarity, clustering, and information retrieval. The model can also serve as a base for further fine-tuning. It transforms texts to 768-dimensional vectors. The model was initialized with a Polish RoBERTa checkpoint, and then trained with the multilingual knowledge distillation method on a diverse corpus of 60 million Polish-English text pairs. We utilised English FlagEmbeddings (BGE) as teacher models for distillation. ## Usage (Sentence-Transformers) ⚠️ Our embedding models require the use of specific prefixes and suffixes when encoding texts. For this model, each query should be preceded by the prefix **\"zapytanie: \"** ⚠️ You can use the model like this with sentence-transformers: ## Evaluation Results - The model achieves an **Average Score** of **61.05** on the Polish Massive Text Embedding Benchmark (MTEB). See MTEB Leaderboard for detailed results. - The model achieves **NDCG@10** of **53.60** on the Polish Information Retrieval Benchmark. See PIRB Leaderboard for detailed results. ## Acknowledgements This model was trained with the A100 GPU cluster support delivered by the Gdansk University of Technology within the TASK center initiative. ## Citation", + "model_explanation_gemini": "A Polish sentence-transformers model designed for sentence similarity tasks, also evaluated on clustering, classification, and retrieval benchmarks."

# Gemma-2-9B-Chinese-Chat-GGUF ## Original Model shenzhi-wang/Gemma-2-9B-Chinese-Chat ## Run with LlamaEdge - LlamaEdge version: v0.12.1 and above - Prompt template - Prompt type: - Prompt string - Context size: - Run as LlamaEdge service - Run as LlamaEdge command app ## Quantized GGUF Models | Name | Quant method | Bits | Size | Use case | | ---- | ---- | ---- | ---- | ----- | | Gemma-2-9B-Chinese-Chat-Q2_K.gguf | Q2_K | 2 | 3.81 GB| smallest, significant quality loss - not recommended for most purposes | | Gemma-2-9B-Chinese-Chat-Q3_K_L.gguf | Q3_K_L | 3 | 5.13 GB| small, substantial quality loss | | Gemma-2-9B-Chinese-Chat-Q3_K_M.gguf | Q3_K_M | 3 | 4.76 GB| very small, high quality loss | | Gemma-2-9B-Chinese-Chat-Q3_K_S.gguf | Q3_K_S | 3 | 4.34 GB| very small, high quality loss | | Gemma-2-9B-Chinese-Chat-Q4_0.gguf | Q4_0 | 4 | 5.44 GB| legacy; small, very high quality loss - prefer using Q3_K_M | | Gemma-2-9B-Chinese-Chat-Q4_K_M.gguf | Q4_K_M | 4 | 5.76 GB| medium, balanced quality - recommended | | Gemma-2-9B-Chinese-Chat-Q4_K_S.gguf | Q4_K_S | 4 | 5.48 GB| small, greater quality loss | | Gemma-2-9B-Chinese-Chat-Q5_0.gguf | Q5_0 | 5 | 6.48 GB| legacy; medium, balanced quality - prefer using Q4_K_M | | Gemma-2-9B-Chinese-Chat-Q5_K_M.gguf | Q5_K_M | 5 | 6.65 GB| large, very low quality loss - recommended | | Gemma-2-9B-Chinese-Chat-Q5_K_S.gguf | Q5_K_S | 5 | 6.48 GB| large, low quality loss - recommended | | Gemma-2-9B-Chinese-Chat-Q6_K.gguf | Q6_K | 6 | 7.59 GB| very large, extremely low quality loss | | Gemma-2-9B-Chinese-Chat-Q8_0.gguf | Q8_0 | 8 | 9.83 GB| very large, extremely low quality loss - not recommended | | Gemma-2-9B-Chinese-Chat-f16.gguf | f16 | 16 | 18.5 GB| | *Quantized with llama.cpp b3259*", + "model_explanation_gemini": "A quantized version of the Gemma-2-9B-Chinese-Chat model optimized for text generation in English and Chinese, available in various sizes and quality levels for different use cases." +} \ No newline at end of file diff --git a/data/model_data_json/second-state_Gemma-2b-it-GGUF.json b/data/model_data_json/second-state_Gemma-2b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..379e7a4c2bc7cbbfb90d50f761b646038bfd5a9a --- /dev/null +++ b/data/model_data_json/second-state_Gemma-2b-it-GGUF.json @@ -0,0 +1,18 @@ +{ + "model_id": "second-state/Gemma-2b-it-GGUF", + "downloads": 213525, + "tags": [ + "transformers", + "gguf", + "gemma", + "text-generation", + "base_model:google/gemma-2b-it", + "base_model:quantized:google/gemma-2b-it", + "license:other", + "autotrain_compatible", + "region:us", + "conversational" + ], + "description": "--- base_model: google/gemma-2b-it inference: false library_name: transformers license: other license_name: gemma-terms-of-use license_link: model_creator: Google model_name: gemma 2b it quantized_by: Second State Inc. ---
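The Second State cards elide their LlamaEdge invocations. As an alternative sketch, the same GGUF files can be loaded with llama-cpp-python (an assumption; any GGUF-capable runtime works), given the recommended Q4_K_M file downloaded locally:

```python
from llama_cpp import Llama

llm = Llama(model_path="Gemma-2-9B-Chinese-Chat-Q4_K_M.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    # "Hello, please introduce yourself in Chinese."
    messages=[{"role": "user", "content": "你好，请用中文介绍一下你自己。"}]
)
print(out["choices"][0]["message"]["content"])
```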

# Gemma-2b-it ## Original Model google/gemma-2b-it ## Run with LlamaEdge - LlamaEdge version: v0.3.2 - Prompt template - Prompt type: - Prompt string - Context size: - Run as LlamaEdge service - Run as LlamaEdge command app ## Quantized GGUF Models | Name | Quant method | Bits | Size | Use case | | ---- | ---- | ---- | ---- | ----- | | gemma-2b-it-Q2_K.gguf | Q2_K | 2 | 900 MB| smallest, significant quality loss - not recommended for most purposes | | gemma-2b-it-Q3_K_L.gguf | Q3_K_L | 3 | 1.26 GB| small, substantial quality loss | | gemma-2b-it-Q3_K_M.gguf | Q3_K_M | 3 | 1.18 GB| very small, high quality loss | | gemma-2b-it-Q3_K_S.gguf | Q3_K_S | 3 | 1.08 GB| very small, high quality loss | | gemma-2b-it-Q4_0.gguf | Q4_0 | 4 | 1.42 GB| legacy; small, very high quality loss - prefer using Q3_K_M | | gemma-2b-it-Q4_K_M.gguf | Q4_K_M | 4 | 1.5 GB| medium, balanced quality - recommended | | gemma-2b-it-Q4_K_S.gguf | Q4_K_S | 4 | 1.42 GB| small, greater quality loss | | gemma-2b-it-Q5_0.gguf | Q5_0 | 5 | 1.73 GB| legacy; medium, balanced quality - prefer using Q4_K_M | | gemma-2b-it-Q5_K_M.gguf | Q5_K_M | 5 | 1.77 GB| large, very low quality loss - recommended | | gemma-2b-it-Q5_K_S.gguf | Q5_K_S | 5 | 1.73 GB| large, low quality loss - recommended | | gemma-2b-it-Q6_K.gguf | Q6_K | 6 | 2.06 GB| very large, extremely low quality loss | | gemma-2b-it-Q8_0.gguf | Q8_0 | 8 | 2.67 GB| very large, extremely low quality loss - not recommended | *Quantized with llama.cpp b2230*", + "model_explanation_gemini": "A quantized version of Google's Gemma-2b-it model optimized for efficient inference with varying size-quality tradeoffs in GGUF format." +} \ No newline at end of file diff --git a/data/model_data_json/segment-any-text_sat-3l-sm.json b/data/model_data_json/segment-any-text_sat-3l-sm.json new file mode 100644 index 0000000000000000000000000000000000000000..f0ff80a4135fe1b37ed51941b5cea716f55c2f49 --- /dev/null +++ b/data/model_data_json/segment-any-text_sat-3l-sm.json @@ -0,0 +1,103 @@ +{ + "model_id": "segment-any-text/sat-3l-sm", + "downloads": 146760, + "tags": [ + "transformers", + "onnx", + "safetensors", + "xlm-token", + "token-classification", + "multilingual", + "am", + "ar", + "az", + "be", + "bg", + "bn", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fr", + "fy", + "ga", + "gd", + "gl", + "gu", + "ha", + "he", + "hi", + "hu", + "hy", + "id", + "ig", + "is", + "it", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "ku", + "ky", + "la", + "lt", + "lv", + "mg", + "mk", + "ml", + "mn", + "mr", + "ms", + "mt", + "my", + "ne", + "nl", + "no", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "si", + "sk", + "sl", + "sq", + "sr", + "sv", + "ta", + "te", + "tg", + "th", + "tr", + "uk", + "ur", + "uz", + "vi", + "xh", + "yi", + "yo", + "zh", + "zu", + "arxiv:2406.16678", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - multilingual - am - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hu - hy - id - ig - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lt - lv - mg - mk - ml - mn - mr - ms - mt - my - ne - nl - no - pa - pl - ps - pt - ro - ru - si - sk - sl - sq - sr - sv - ta - te - tg - th - tr - uk - ur - uz - vi - xh - yi - yo - zh - zu library: - wtpsplit --- # sat-3l-sm Model for []( State-of-the-art sentence segmentation 
with 3 Transformer layers. For details, see our paper", "model_explanation_gemini": "State-of-the-art multilingual sentence segmentation model with 3 Transformer layers for splitting text into sentences." +} \ No newline at end of file diff --git a/data/model_data_json/segmind_SSD-1B.json b/data/model_data_json/segmind_SSD-1B.json new file mode 100644 index 0000000000000000000000000000000000000000..2c5299f71729cdec6bc76aff05cb831e2ae1e0d4 --- /dev/null +++ b/data/model_data_json/segmind_SSD-1B.json @@ -0,0 +1,23 @@ +{ + "model_id": "segmind/SSD-1B", + "downloads": 36840, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "ultra-realistic", + "stable-diffusion", + "distilled-model", + "knowledge-distillation", + "dataset:zzliang/GRIT", + "dataset:wanng/midjourney-v5-202304-clean", + "arxiv:2401.02677", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - text-to-image - ultra-realistic - text-to-image - stable-diffusion - distilled-model - knowledge-distillation pinned: true datasets: - zzliang/GRIT - wanng/midjourney-v5-202304-clean library_name: diffusers --- # Segmind Stable Diffusion 1B (SSD-1B) Model Card !image/png ## 📣 Read our technical report for more details on our distillation method ## AUTOMATIC1111 compatibility added. Supporting file here ## Demo Try out the model at Segmind SSD-1B for ⚡ fastest inference. You can also try it on 🤗 Spaces ## Model Description The Segmind Stable Diffusion Model (SSD-1B) is a **distilled 50% smaller** version of the Stable Diffusion XL (SDXL), offering a **60% speedup** while maintaining high-quality text-to-image generation capabilities. It has been trained on diverse datasets, including Grit and Midjourney scrape data, to enhance its ability to create a wide range of visual content based on textual prompts. This model employs a knowledge distillation strategy, where it leverages the teachings of several expert models in succession, including SDXL, ZavyChromaXL, and JuggernautXL, to combine their strengths and produce impressive visual outputs. Special thanks to the HF team 🤗 especially Sayak, Patrick and Poli for their collaboration and guidance on this work. ## Image Comparison (SDXL-1.0 vs SSD-1B) !image/png ## Usage: This model can be used via the 🧨 Diffusers library. Make sure to install diffusers from source by running In addition, please install , and : To use the model, you can run the following: ### Update: Our model should now be usable in ComfyUI. ### Please do use negative prompting, and a CFG around 9.0 for the best quality! ### Model Description - **Developed by:** Segmind - **Developers:** Yatharth Gupta and Vishnu Jaddipal. - **Model type:** Diffusion-based text-to-image generative model - **License:** Apache 2.0 - **Distilled From:** stabilityai/stable-diffusion-xl-base-1.0 ### Key Features - **Text-to-Image Generation:** The model excels at generating images from text prompts, enabling a wide range of creative applications. - **Distilled for Speed:** Designed for efficiency, this model offers a 60% speedup, making it a practical choice for real-time applications and scenarios where rapid image generation is essential. - **Diverse Training Data:** Trained on diverse datasets, the model can handle a variety of textual prompts and generate corresponding images effectively.
- **Knowledge Distillation:** By distilling knowledge from multiple expert models, the Segmind Stable Diffusion Model combines their strengths and minimizes their limitations, resulting in improved performance. ### Model Architecture The SSD-1B Model is a 1.3B parameter model which has several layers removed from the base SDXL model !image/png ### Training info These are the key hyperparameters used during training: * Steps: 251000 * Learning rate: 1e-5 * Batch size: 32 * Gradient accumulation steps: 4 * Image resolution: 1024 * Mixed-precision: fp16 ### Multi-Resolution Support !image/jpeg SSD-1B can support the following output resolutions. * 1024 x 1024 (1:1 Square) * 1152 x 896 (9:7) * 896 x 1152 (7:9) * 1216 x 832 (19:13) * 832 x 1216 (13:19) * 1344 x 768 (7:4 Horizontal) * 768 x 1344 (4:7 Vertical) * 1536 x 640 (12:5 Horizontal) * 640 x 1536 (5:12 Vertical) ### Speed Comparison We have observed that SSD-1B is up to 60% faster than the base SDXL model. Below is a comparison on an A100 80GB. !image/png Below are the speed-up metrics on an RTX 4090 GPU. !image/png ### Model Sources For research and development purposes, the SSD-1B Model can be accessed via the Segmind AI platform. For more information and access details, please visit Segmind. ## Uses ### Direct Use The Segmind Stable Diffusion Model is suitable for research and practical applications in various domains, including: - **Art and Design:** It can be used to generate artworks, designs, and other creative content, providing inspiration and enhancing the creative process. - **Education:** The model can be applied in educational tools to create visual content for teaching and learning purposes. - **Research:** Researchers can use the model to explore generative models, evaluate its performance, and push the boundaries of text-to-image generation. - **Safe Content Generation:** It offers a safe and controlled way to generate content, reducing the risk of harmful or inappropriate outputs. - **Bias and Limitation Analysis:** Researchers and developers can use the model to probe its limitations and biases, contributing to a better understanding of generative models' behavior. ### Downstream Use The Segmind Stable Diffusion Model can also be used directly with the 🧨 Diffusers library training scripts for further training, including: - **LoRA:** - **Fine-Tune:** - **Dreambooth LoRA:** ### Out-of-Scope Use The SSD-1B Model is not suitable for creating factual or accurate representations of people, events, or real-world information. It is not intended for tasks requiring high precision and accuracy. ## Limitations and Bias The SSD-1B Model has some challenges in embodying absolute photorealism, especially in human depictions. While it grapples with incorporating clear text and maintaining the fidelity of complex compositions due to its autoencoding approach, these hurdles pave the way for future enhancements. Importantly, the model's exposure to a diverse dataset, though not a panacea for ingrained societal and digital biases, represents a foundational step towards more equitable technology. Users are encouraged to interact with this pioneering tool with an understanding of its current limitations, fostering an environment of conscious engagement and anticipation for its continued evolution. ## Citation", + "model_explanation_gemini": "Segmind SSD-1B is a distilled, faster version of Stable Diffusion XL that generates high-quality images from text prompts while being 50% smaller and 60% faster."
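The SSD-1B usage snippet is elided above; a minimal sketch, assuming the standard Diffusers SDXL pipeline API and a CUDA device, and following the card's advice on negative prompting and a CFG around 9.0:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
negative = "ugly, blurry, poor quality"  # negative prompting, per the card
image = pipe(prompt, negative_prompt=negative, guidance_scale=9.0).images[0]
image.save("ssd_1b_sample.png")
```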
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers-testing_stsb-bert-tiny-safetensors.json b/data/model_data_json/sentence-transformers-testing_stsb-bert-tiny-safetensors.json new file mode 100644 index 0000000000000000000000000000000000000000..6438b1d62c93604d9902e41a654574bfb3d4b47b --- /dev/null +++ b/data/model_data_json/sentence-transformers-testing_stsb-bert-tiny-safetensors.json @@ -0,0 +1,19 @@ +{ + "model_id": "sentence-transformers-testing/stsb-bert-tiny-safetensors", + "downloads": 131863, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers --- # sentence-transformers-testing/stsb-bert-tiny-safetensors This is a sentence-transformers model: It maps sentences & paragraphs to a 128 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Training The model was trained with the parameters: **DataLoader**: of length 360 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors ", + "model_explanation_gemini": "Maps sentences and paragraphs to 128-dimensional vectors for tasks like clustering and semantic search." 
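The usage snippets in the stsb-bert-tiny card are elided; a minimal sentence-transformers sketch:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)  # (2, 128)
```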
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_LaBSE.json b/data/model_data_json/sentence-transformers_LaBSE.json new file mode 100644 index 0000000000000000000000000000000000000000..7de9729d203238c9db39cf21e0fdf4664be8bf0b --- /dev/null +++ b/data/model_data_json/sentence-transformers_LaBSE.json @@ -0,0 +1,132 @@ +{ + "model_id": "sentence-transformers/LaBSE", + "downloads": 1014474, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "multilingual", + "af", + "sq", + "am", + "ar", + "hy", + "as", + "az", + "eu", + "be", + "bn", + "bs", + "bg", + "my", + "ca", + "ceb", + "zh", + "co", + "hr", + "cs", + "da", + "nl", + "en", + "eo", + "et", + "fi", + "fr", + "fy", + "gl", + "ka", + "de", + "el", + "gu", + "ht", + "ha", + "haw", + "he", + "hi", + "hmn", + "hu", + "is", + "ig", + "id", + "ga", + "it", + "ja", + "jv", + "kn", + "kk", + "km", + "rw", + "ko", + "ku", + "ky", + "lo", + "la", + "lv", + "lt", + "lb", + "mk", + "mg", + "ms", + "ml", + "mt", + "mi", + "mr", + "mn", + "ne", + "no", + "ny", + "or", + "fa", + "pl", + "pt", + "pa", + "ro", + "ru", + "sm", + "gd", + "sr", + "st", + "sn", + "si", + "sk", + "sl", + "so", + "es", + "su", + "sw", + "sv", + "tl", + "tg", + "ta", + "tt", + "te", + "th", + "bo", + "tr", + "tk", + "ug", + "uk", + "ur", + "uz", + "vi", + "cy", + "wo", + "xh", + "yi", + "yo", + "zu", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - af - sq - am - ar - hy - as - az - eu - be - bn - bs - bg - my - ca - ceb - zh - co - hr - cs - da - nl - en - eo - et - fi - fr - fy - gl - ka - de - el - gu - ht - ha - haw - he - hi - hmn - hu - is - ig - id - ga - it - ja - jv - kn - kk - km - rw - ko - ku - ky - lo - la - lv - lt - lb - mk - mg - ms - ml - mt - mi - mr - mn - ne - no - ny - or - fa - pl - pt - pa - ro - ru - sm - gd - sr - st - sn - si - sk - sl - so - es - su - sw - sv - tl - tg - ta - tt - te - th - bo - tr - tk - ug - uk - ur - uz - vi - cy - wo - xh - yi - yo - zu pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity library_name: sentence-transformers license: apache-2.0 --- # LaBSE This is a port of the LaBSE model to PyTorch. It can be used to map 109 languages to a shared vector space. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Full Model Architecture ## Citing & Authors Have a look at LaBSE for the respective publication that describes LaBSE.", + "model_explanation_gemini": "Maps text in 109 languages to a shared vector space for sentence similarity tasks." 
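Again the card's snippet is missing from this dump; a short sketch of the cross-lingual property it describes, with illustrative translations of one sentence:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")
sentences = [
    "Hello, how are you?",        # English
    "Hallo, wie geht es dir?",    # German
    "Bonjour, comment vas-tu ?",  # French
]
embeddings = model.encode(sentences)
# Translations land close together in the shared space, so the
# off-diagonal cosine similarities should be high.
print(util.cos_sim(embeddings, embeddings))
```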
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_all-MiniLM-L12-v2.json b/data/model_data_json/sentence-transformers_all-MiniLM-L12-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..ed27763983f38165be42d255302d974241ca26c9 --- /dev/null +++ b/data/model_data_json/sentence-transformers_all-MiniLM-L12-v2.json @@ -0,0 +1,50 @@ +{ + "model_id": "sentence-transformers/all-MiniLM-L12-v2", + "downloads": 2965260, + "tags": [ + "sentence-transformers", + "pytorch", + "rust", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:s2orc", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:ms_marco", + "dataset:gooaq", + "dataset:yahoo_answers_topics", + "dataset:code_search_net", + "dataset:search_qa", + "dataset:eli5", + "dataset:snli", + "dataset:multi_nli", + "dataset:wikihow", + "dataset:natural_questions", + "dataset:trivia_qa", + "dataset:embedding-data/sentence-compression", + "dataset:embedding-data/flickr30k-captions", + "dataset:embedding-data/altlex", + "dataset:embedding-data/simple-wiki", + "dataset:embedding-data/QQP", + "dataset:embedding-data/SPECTER", + "dataset:embedding-data/PAQ_pairs", + "dataset:embedding-data/WikiAnswers", + "arxiv:1904.06472", + "arxiv:2102.07033", + "arxiv:2104.08727", + "arxiv:1704.05179", + "arxiv:1810.09305", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - s2orc - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - code_search_net - search_qa - eli5 - snli - multi_nli - wikihow - natural_questions - trivia_qa - embedding-data/sentence-compression - embedding-data/flickr30k-captions - embedding-data/altlex - embedding-data/simple-wiki - embedding-data/QQP - embedding-data/SPECTER - embedding-data/PAQ_pairs - embedding-data/WikiAnswers pipeline_tag: sentence-similarity --- # all-MiniLM-L12-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ------ ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We used the pretrained []( model and fine-tuned it on a dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which, out of a set of randomly sampled other sentences, was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs.
We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Google's Flax, JAX, and Cloud team members about efficient deep learning frameworks. ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 256 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained []( model. Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the cross entropy loss by comparing with true pairs. #### Hyper parameters We trained our model on a TPU v3-8. We train the model during 100k steps using a batch size of 1024 (128 per TPU core). We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this current repository: . #### Training data We use the concatenation from multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion sentences. We sampled each dataset given a weighted probability whose configuration is detailed in the file. | Dataset | Paper | Number of training tuples | |--------------------------------------------------------|:----------------------------------------:|:--------------------------:| | Reddit comments (2015-2018) | paper | 726,484,430 | | S2ORC Citation pairs (Abstracts) | paper | 116,288,806 | | WikiAnswers Duplicate question pairs | paper | 77,427,422 | | PAQ (Question, Answer) pairs | paper | 64,371,441 | | S2ORC Citation pairs (Titles) | paper | 52,603,982 | | S2ORC (Title, Abstract) | paper | 41,769,185 | | Stack Exchange (Title, Body) pairs | - | 25,316,456 | | Stack Exchange (Title+Body, Answer) pairs | - | 21,396,559 | | Stack Exchange (Title, Answer) pairs | - | 21,396,559 | | MS MARCO triplets | paper | 9,144,553 | | GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 | | Yahoo Answers (Title, Answer) | paper | 1,198,260 | | Code Search | - | 1,151,414 | | COCO Image captions | paper | 828,395 | | SPECTER citation triplets | paper | 684,100 | | Yahoo Answers (Question, Answer) | paper | 681,164 | | Yahoo Answers (Title, Question) | paper | 659,896 | | SearchQA | paper | 582,261 | | Eli5 | paper | 325,475 | | Flickr 30k | paper | 317,695 | | Stack Exchange Duplicate questions (titles) | | 304,525 | | AllNLI (SNLI and MultiNLI) | paper SNLI, paper MultiNLI | 277,230 | | Stack Exchange Duplicate questions (bodies) | | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | | 250,460 | | Sentence Compression | paper | 180,000 | | Wikihow | paper | 128,542 | | Altlex | paper | 112,696 | | Quora Question Triplets | - | 103,663 | | Simple Wikipedia | paper | 102,225 | | Natural Questions (NQ) | paper | 100,231 | | SQuAD2.0 | paper | 87,599 | | TriviaQA | - | 73,346 | | **Total** | | **1,170,060,424** |", "model_explanation_gemini": "Maps sentences and paragraphs to 384-dimensional vectors for tasks like semantic search and clustering."
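The "Usage (HuggingFace Transformers)" snippet the card refers to is missing from this dump; below is a sketch of the recipe it describes (encoder forward pass, then mean pooling over the attention mask). The helper function follows the standard sentence-transformers pooling recipe, reconstructed here rather than copied from the card:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")

encoded = tokenizer(["This is an example sentence"], padding=True,
                    truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)
embeddings = F.normalize(mean_pooling(output, encoded["attention_mask"]), p=2, dim=1)
print(embeddings.shape)  # expected (1, 384)
```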
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_all-MiniLM-L6-v2.json b/data/model_data_json/sentence-transformers_all-MiniLM-L6-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..0b21d4137546bce28cb6a3fcbde7f87ccafd8bbb --- /dev/null +++ b/data/model_data_json/sentence-transformers_all-MiniLM-L6-v2.json @@ -0,0 +1,51 @@ +{ + "model_id": "sentence-transformers/all-MiniLM-L6-v2", + "downloads": 79575769, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "rust", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:s2orc", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:ms_marco", + "dataset:gooaq", + "dataset:yahoo_answers_topics", + "dataset:code_search_net", + "dataset:search_qa", + "dataset:eli5", + "dataset:snli", + "dataset:multi_nli", + "dataset:wikihow", + "dataset:natural_questions", + "dataset:trivia_qa", + "dataset:embedding-data/sentence-compression", + "dataset:embedding-data/flickr30k-captions", + "dataset:embedding-data/altlex", + "dataset:embedding-data/simple-wiki", + "dataset:embedding-data/QQP", + "dataset:embedding-data/SPECTER", + "dataset:embedding-data/PAQ_pairs", + "dataset:embedding-data/WikiAnswers", + "arxiv:1904.06472", + "arxiv:2102.07033", + "arxiv:2104.08727", + "arxiv:1704.05179", + "arxiv:1810.09305", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - s2orc - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - code_search_net - search_qa - eli5 - snli - multi_nli - wikihow - natural_questions - trivia_qa - embedding-data/sentence-compression - embedding-data/flickr30k-captions - embedding-data/altlex - embedding-data/simple-wiki - embedding-data/QQP - embedding-data/SPECTER - embedding-data/PAQ_pairs - embedding-data/WikiAnswers pipeline_tag: sentence-similarity --- # all-MiniLM-L6-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ------ ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We used the pretrained []( model and fine-tuned it on a dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which, out of a set of randomly sampled other sentences, was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs.
We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Google's Flax, JAX, and Cloud team members about efficient deep learning frameworks. ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 256 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained []( model. Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the cross entropy loss by comparing with true pairs. #### Hyper parameters We trained our model on a TPU v3-8. We train the model during 100k steps using a batch size of 1024 (128 per TPU core). We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this current repository: . #### Training data We use the concatenation from multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion sentences. We sampled each dataset given a weighted probability whose configuration is detailed in the file. | Dataset | Paper | Number of training tuples | |--------------------------------------------------------|:----------------------------------------:|:--------------------------:| | Reddit comments (2015-2018) | paper | 726,484,430 | | S2ORC Citation pairs (Abstracts) | paper | 116,288,806 | | WikiAnswers Duplicate question pairs | paper | 77,427,422 | | PAQ (Question, Answer) pairs | paper | 64,371,441 | | S2ORC Citation pairs (Titles) | paper | 52,603,982 | | S2ORC (Title, Abstract) | paper | 41,769,185 | | Stack Exchange (Title, Body) pairs | - | 25,316,456 | | Stack Exchange (Title+Body, Answer) pairs | - | 21,396,559 | | Stack Exchange (Title, Answer) pairs | - | 21,396,559 | | MS MARCO triplets | paper | 9,144,553 | | GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 | | Yahoo Answers (Title, Answer) | paper | 1,198,260 | | Code Search | - | 1,151,414 | | COCO Image captions | paper | 828,395 | | SPECTER citation triplets | paper | 684,100 | | Yahoo Answers (Question, Answer) | paper | 681,164 | | Yahoo Answers (Title, Question) | paper | 659,896 | | SearchQA | paper | 582,261 | | Eli5 | paper | 325,475 | | Flickr 30k | paper | 317,695 | | Stack Exchange Duplicate questions (titles) | | 304,525 | | AllNLI (SNLI and MultiNLI) | paper SNLI, paper MultiNLI | 277,230 | | Stack Exchange Duplicate questions (bodies) | | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | | 250,460 | | Sentence Compression | paper | 180,000 | | Wikihow | paper | 128,542 | | Altlex | paper | 112,696 | | Quora Question Triplets | - | 103,663 | | Simple Wikipedia | paper | 102,225 | | Natural Questions (NQ) | paper | 100,231 | | SQuAD2.0 | paper | 87,599 | | TriviaQA | - | 73,346 | | **Total** | | **1,170,060,424** |", "model_explanation_gemini": "Maps sentences and paragraphs to 384-dimensional vectors for semantic search, clustering, and similarity tasks."
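Since this card's snippets are likewise stripped, here is a small semantic-search sketch against the model id above; the corpus and query are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
corpus = [
    "A man is eating food.",
    "A monkey is playing drums.",
    "Someone is riding a white horse on an enclosed ground.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("A person on a horse", convert_to_tensor=True)

# Inputs longer than 256 word pieces would be truncated, per the card.
for hit in util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]:
    print(round(hit["score"], 3), corpus[hit["corpus_id"]])
```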
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_all-distilroberta-v1.json b/data/model_data_json/sentence-transformers_all-distilroberta-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..c59b5ee97c882fc7d5506e8d4d6518ee2deb42bb --- /dev/null +++ b/data/model_data_json/sentence-transformers_all-distilroberta-v1.json @@ -0,0 +1,51 @@ +{ + "model_id": "sentence-transformers/all-distilroberta-v1", + "downloads": 444866, + "tags": [ + "sentence-transformers", + "pytorch", + "rust", + "onnx", + "safetensors", + "openvino", + "roberta", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:s2orc", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:ms_marco", + "dataset:gooaq", + "dataset:yahoo_answers_topics", + "dataset:code_search_net", + "dataset:search_qa", + "dataset:eli5", + "dataset:snli", + "dataset:multi_nli", + "dataset:wikihow", + "dataset:natural_questions", + "dataset:trivia_qa", + "dataset:embedding-data/sentence-compression", + "dataset:embedding-data/flickr30k-captions", + "dataset:embedding-data/altlex", + "dataset:embedding-data/simple-wiki", + "dataset:embedding-data/QQP", + "dataset:embedding-data/SPECTER", + "dataset:embedding-data/PAQ_pairs", + "dataset:embedding-data/WikiAnswers", + "arxiv:1904.06472", + "arxiv:2102.07033", + "arxiv:2104.08727", + "arxiv:1704.05179", + "arxiv:1810.09305", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - s2orc - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - code_search_net - search_qa - eli5 - snli - multi_nli - wikihow - natural_questions - trivia_qa - embedding-data/sentence-compression - embedding-data/flickr30k-captions - embedding-data/altlex - embedding-data/simple-wiki - embedding-data/QQP - embedding-data/SPECTER - embedding-data/PAQ_pairs - embedding-data/WikiAnswers pipeline_tag: sentence-similarity --- # all-distilroberta-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ------ ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We used the pretrained []( model and fine-tuned it on a dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which, out of a set of randomly sampled other sentences, was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs.
We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Google's Flax, JAX, and Cloud team members about efficient deep learning frameworks. ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 128 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained []( Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the cross entropy loss by comparing with true pairs. #### Hyper parameters We trained our model on a TPU v3-8. We train the model during 920k steps using a batch size of 512 (64 per TPU core). We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this current repository: . #### Training data We use the concatenation from multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion sentences. We sampled each dataset given a weighted probability whose configuration is detailed in the file. | Dataset | Paper | Number of training tuples | |--------------------------------------------------------|:----------------------------------------:|:--------------------------:| | Reddit comments (2015-2018) | paper | 726,484,430 | | S2ORC Citation pairs (Abstracts) | paper | 116,288,806 | | WikiAnswers Duplicate question pairs | paper | 77,427,422 | | PAQ (Question, Answer) pairs | paper | 64,371,441 | | S2ORC Citation pairs (Titles) | paper | 52,603,982 | | S2ORC (Title, Abstract) | paper | 41,769,185 | | Stack Exchange (Title, Body) pairs | - | 25,316,456 | | MS MARCO triplets | paper | 9,144,553 | | GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 | | Yahoo Answers (Title, Answer) | paper | 1,198,260 | | Code Search | - | 1,151,414 | | COCO Image captions | paper | 828,395 | | SPECTER citation triplets | paper | 684,100 | | Yahoo Answers (Question, Answer) | paper | 681,164 | | Yahoo Answers (Title, Question) | paper | 659,896 | | SearchQA | paper | 582,261 | | Eli5 | paper | 325,475 | | Flickr 30k | paper | 317,695 | | Stack Exchange Duplicate questions (titles) | | 304,525 | | AllNLI (SNLI and MultiNLI) | paper SNLI, paper MultiNLI | 277,230 | | Stack Exchange Duplicate questions (bodies) | | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | | 250,460 | | Sentence Compression | paper | 180,000 | | Wikihow | paper | 128,542 | | Altlex | paper | 112,696 | | Quora Question Triplets | - | 103,663 | | Simple Wikipedia | paper | 102,225 | | Natural Questions (NQ) | paper | 100,231 | | SQuAD2.0 | paper | 87,599 | | TriviaQA | - | 73,346 | | **Total** | | **1,124,818,467** |", "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for semantic search, clustering, and similarity tasks."
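The in-batch contrastive objective described above ("predict which of the other sentences was actually paired") matches what sentence-transformers ships as MultipleNegativesRankingLoss; here is a toy fine-tuning sketch under that assumption, with two illustrative pairs and the classic `fit` API:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")
train_examples = [
    InputExample(texts=["What is the capital of France?",
                        "Paris is the capital of France."]),
    InputExample(texts=["How do plants make food?",
                        "Plants produce food through photosynthesis."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
# Every other answer in the batch serves as an in-batch negative.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=1)
```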
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_all-mpnet-base-v2.json b/data/model_data_json/sentence-transformers_all-mpnet-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..68f80e341a5a8cb61b4ee573d07f42249db8c176 --- /dev/null +++ b/data/model_data_json/sentence-transformers_all-mpnet-base-v2.json @@ -0,0 +1,49 @@ +{ + "model_id": "sentence-transformers/all-mpnet-base-v2", + "downloads": 17858407, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "mpnet", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:s2orc", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:ms_marco", + "dataset:gooaq", + "dataset:yahoo_answers_topics", + "dataset:code_search_net", + "dataset:search_qa", + "dataset:eli5", + "dataset:snli", + "dataset:multi_nli", + "dataset:wikihow", + "dataset:natural_questions", + "dataset:trivia_qa", + "dataset:embedding-data/sentence-compression", + "dataset:embedding-data/flickr30k-captions", + "dataset:embedding-data/altlex", + "dataset:embedding-data/simple-wiki", + "dataset:embedding-data/QQP", + "dataset:embedding-data/SPECTER", + "dataset:embedding-data/PAQ_pairs", + "dataset:embedding-data/WikiAnswers", + "arxiv:1904.06472", + "arxiv:2102.07033", + "arxiv:2104.08727", + "arxiv:1704.05179", + "arxiv:1810.09305", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - s2orc - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - code_search_net - search_qa - eli5 - snli - multi_nli - wikihow - natural_questions - trivia_qa - embedding-data/sentence-compression - embedding-data/flickr30k-captions - embedding-data/altlex - embedding-data/simple-wiki - embedding-data/QQP - embedding-data/SPECTER - embedding-data/PAQ_pairs - embedding-data/WikiAnswers pipeline_tag: sentence-similarity --- # all-mpnet-base-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ------ ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We used the pretrained []( model and fine-tuned it on a dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which, out of a set of randomly sampled other sentences, was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs.
We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Google's Flax, JAX, and Cloud team members about efficient deep learning frameworks. ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 384 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained []( model. Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the cross entropy loss by comparing with true pairs. #### Hyper parameters We trained our model on a TPU v3-8. We train the model during 100k steps using a batch size of 1024 (128 per TPU core). We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this current repository: . #### Training data We use the concatenation from multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion sentences. We sampled each dataset given a weighted probability whose configuration is detailed in the file. | Dataset | Paper | Number of training tuples | |--------------------------------------------------------|:----------------------------------------:|:--------------------------:| | Reddit comments (2015-2018) | paper | 726,484,430 | | S2ORC Citation pairs (Abstracts) | paper | 116,288,806 | | WikiAnswers Duplicate question pairs | paper | 77,427,422 | | PAQ (Question, Answer) pairs | paper | 64,371,441 | | S2ORC Citation pairs (Titles) | paper | 52,603,982 | | S2ORC (Title, Abstract) | paper | 41,769,185 | | Stack Exchange (Title, Body) pairs | - | 25,316,456 | | Stack Exchange (Title+Body, Answer) pairs | - | 21,396,559 | | Stack Exchange (Title, Answer) pairs | - | 21,396,559 | | MS MARCO triplets | paper | 9,144,553 | | GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 | | Yahoo Answers (Title, Answer) | paper | 1,198,260 | | Code Search | - | 1,151,414 | | COCO Image captions | paper | 828,395 | | SPECTER citation triplets | paper | 684,100 | | Yahoo Answers (Question, Answer) | paper | 681,164 | | Yahoo Answers (Title, Question) | paper | 659,896 | | SearchQA | paper | 582,261 | | Eli5 | paper | 325,475 | | Flickr 30k | paper | 317,695 | | Stack Exchange Duplicate questions (titles) | | 304,525 | | AllNLI (SNLI and MultiNLI) | paper SNLI, paper MultiNLI | 277,230 | | Stack Exchange Duplicate questions (bodies) | | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | | 250,460 | | Sentence Compression | paper | 180,000 | | Wikihow | paper | 128,542 | | Altlex | paper | 112,696 | | Quora Question Triplets | - | 103,663 | | Simple Wikipedia | paper | 102,225 | | Natural Questions (NQ) | paper | 100,231 | | SQuAD2.0 | paper | 87,599 | | TriviaQA | - | 73,346 | | **Total** | | **1,170,060,424** |", "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for semantic search, clustering, and similarity tasks."
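As a sketch of the clustering use case these cards keep mentioning, here is k-means over the 768-dimensional embeddings; the sentences and cluster count are illustrative:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
sentences = [
    "The new movie is awesome.",
    "The cat sits outside.",
    "I loved that film.",
    "A dog is playing in the garden.",
]
embeddings = model.encode(sentences)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(list(zip(sentences, labels)))  # thematically similar sentences should share a label
```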
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_all-roberta-large-v1.json b/data/model_data_json/sentence-transformers_all-roberta-large-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..36fc14d83c32cb4cd0cd01e1a0d663df14054d78 --- /dev/null +++ b/data/model_data_json/sentence-transformers_all-roberta-large-v1.json @@ -0,0 +1,29 @@ +{ + "model_id": "sentence-transformers/all-roberta-large-v1", + "downloads": 1040994, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "roberta", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "arxiv:1904.06472", + "arxiv:2102.07033", + "arxiv:2104.08727", + "arxiv:1704.05179", + "arxiv:1810.09305", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # all-roberta-large-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 1024 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ------ ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We used the pretrained []( model and fine-tuned it on a dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which, out of a set of randomly sampled other sentences, was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Google's Flax, JAX, and Cloud team members about efficient deep learning frameworks. ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 128 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained []( Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the cross entropy loss by comparing with true pairs. #### Hyper parameters We trained our model on a TPU v3-8. We train the model during 400k steps using a batch size of 256 (32 per TPU core). We use a learning rate warm up of 500.
The sequence length was limited to 128 tokens. We used the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this current repository: . #### Training data We use the concatenation from multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion sentences. We sampled each dataset given a weighted probability which configuration is detailed in the file. | Dataset | Paper | Number of training tuples | |--------------------------------------------------------|:----------------------------------------:|:--------------------------:| | Reddit comments (2015-2018) | paper | 726,484,430 | | S2ORC Citation pairs (Abstracts) | paper | 116,288,806 | | WikiAnswers Duplicate question pairs | paper | 77,427,422 | | PAQ (Question, Answer) pairs | paper | 64,371,441 | | S2ORC Citation pairs (Titles) | paper | 52,603,982 | | S2ORC (Title, Abstract) | paper | 41,769,185 | | Stack Exchange (Title, Body) pairs | - | 25,316,456 | | MS MARCO triplets | paper | 9,144,553 | | GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 | | Yahoo Answers (Title, Answer) | paper | 1,198,260 | | Code Search | - | 1,151,414 | | COCO Image captions | paper | 828,395| | SPECTER citation triplets | paper | 684,100 | | Yahoo Answers (Question, Answer) | paper | 681,164 | | Yahoo Answers (Title, Question) | paper | 659,896 | | SearchQA | paper | 582,261 | | Eli5 | paper | 325,475 | | Flickr 30k | paper | 317,695 | | Stack Exchange Duplicate questions (titles) | | 304,525 | | AllNLI (SNLI and MultiNLI | paper SNLI, paper MultiNLI | 277,230 | | Stack Exchange Duplicate questions (bodies) | | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | | 250,460 | | Sentence Compression | paper | 180,000 | | Wikihow | paper | 128,542 | | Altlex | paper | 112,696 | | Quora Question Triplets | - | 103,663 | | Simple Wikipedia | paper | 102,225 | | Natural Questions (NQ) | paper | 100,231 | | SQuAD2.0 | paper | 87,599 | | TriviaQA | - | 73,346 | | **Total** | | **1,124,818,467** |", + "model_explanation_gemini": "Maps sentences and paragraphs to 1024-dimensional vectors for semantic search, clustering, and similarity tasks using contrastive learning on large datasets." +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_bert-base-nli-mean-tokens.json b/data/model_data_json/sentence-transformers_bert-base-nli-mean-tokens.json new file mode 100644 index 0000000000000000000000000000000000000000..ea7f5cd87a4afd3e9506e27f4592ea4a4a84b068 --- /dev/null +++ b/data/model_data_json/sentence-transformers_bert-base-nli-mean-tokens.json @@ -0,0 +1,26 @@ +{ + "model_id": "sentence-transformers/bert-base-nli-mean-tokens", + "downloads": 1495667, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "jax", + "rust", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- **⚠️ This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. 
You can find recommended sentence embedding models here: SBERT.net - Pretrained Models** # sentence-transformers/bert-base-nli-mean-tokens This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_clip-ViT-B-32-multilingual-v1.json b/data/model_data_json/sentence-transformers_clip-ViT-B-32-multilingual-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..2b0106c0fedaa2cff2f267511e9eb7b8604f5b42 --- /dev/null +++ b/data/model_data_json/sentence-transformers_clip-ViT-B-32-multilingual-v1.json @@ -0,0 +1,25 @@ +{ + "model_id": "sentence-transformers/clip-ViT-B-32-multilingual-v1", + "downloads": 169603, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "distilbert", + "feature-extraction", + "sentence-similarity", + "multilingual", + "arxiv:2004.09813", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: multilingual license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity pipeline_tag: sentence-similarity --- # sentence-transformers/clip-ViT-B-32-multilingual-v1 This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close. This model can be used for **image search** (users search through a large collection of images) and for **multi-lingual zero-shot image classification** (image labels are defined as text). ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Multilingual Image Search - Demo For a demo of multilingual image search, have a look at: Image_Search-multilingual.ipynb ( Colab version ) For more details on image search and zero-shot image classification, have a look at the documentation on SBERT.net. ## Training This model has been created using Multilingual Knowledge Distillation. As teacher model, we used the original and then trained a multilingual DistilBERT model as student model. Using parallel data, the multilingual student model learns to align the teacher's vector space across many languages. As a result, you get a text embedding model that works for 50+ languages. The image encoder from CLIP is unchanged, i.e. you can use the original CLIP image encoder to encode images.
Have a look at the SBERT.net - Multilingual-Models documentation for more details and for the **training code**. We used the following 50+ languages to align the vector spaces: ar, bg, ca, cs, da, de, el, es, et, fa, fi, fr, fr-ca, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, pt, pt-br, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh-cn, zh-tw. The original multilingual DistilBERT supports 100+ languages. The model also works for these languages, but might not yield the best results. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", "model_explanation_gemini": "Maps text in 50+ languages and images to a shared vector space for multilingual image search and zero-shot image classification."
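A sketch of the image-search setup the card describes: the unchanged CLIP image encoder embeds images, while this model embeds queries in many languages into the same space. The image filename is a placeholder, and loading the image encoder as `sentence-transformers/clip-ViT-B-32` is an assumption:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

img_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")  # assumed id
txt_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")

img_emb = img_model.encode([Image.open("dog_in_snow.jpg")])  # placeholder file
query_emb = txt_model.encode(["Ein Hund im Schnee", "Un chien dans la neige"])
# Both queries should score the matching image highly.
print(util.cos_sim(query_emb, img_emb))
```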
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_distilbert-base-nli-stsb-mean-tokens.json b/data/model_data_json/sentence-transformers_distilbert-base-nli-stsb-mean-tokens.json new file mode 100644 index 0000000000000000000000000000000000000000..e0290140ad8e3df5bf61919262474368150b3da2 --- /dev/null +++ b/data/model_data_json/sentence-transformers_distilbert-base-nli-stsb-mean-tokens.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/distilbert-base-nli-stsb-mean-tokens", + "downloads": 251324, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "distilbert", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- **⚠️ This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. You can find recommended sentence embedding models here: SBERT.net - Pretrained Models** # sentence-transformers/distilbert-base-nli-stsb-mean-tokens This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_distiluse-base-multilingual-cased-v1.json b/data/model_data_json/sentence-transformers_distiluse-base-multilingual-cased-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..85ff91da190d6308848a6fc1f50a82fd8a3758d8 --- /dev/null +++ b/data/model_data_json/sentence-transformers_distiluse-base-multilingual-cased-v1.json @@ -0,0 +1,37 @@ +{ + "model_id": "sentence-transformers/distiluse-base-multilingual-cased-v1", + "downloads": 749463, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "distilbert", + "feature-extraction", + "sentence-similarity", + "multilingual", + "ar", + "zh", + "nl", + "en", + "fr", + "de", + "it", + "ko", + "pl", + "pt", + "ru", + "es", + "tr", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ar - zh - nl - en - fr - de - it - ko - pl - pt - ru - es - tr license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity pipeline_tag: sentence-similarity --- # sentence-transformers/distiluse-base-multilingual-cased-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 512 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps multilingual sentences and paragraphs to 512-dimensional vectors for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_distiluse-base-multilingual-cased-v2.json b/data/model_data_json/sentence-transformers_distiluse-base-multilingual-cased-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..ac8ae1d83a1ac01bd97cde2d9e899c0258b9648b --- /dev/null +++ b/data/model_data_json/sentence-transformers_distiluse-base-multilingual-cased-v2.json @@ -0,0 +1,73 @@ +{ + "model_id": "sentence-transformers/distiluse-base-multilingual-cased-v2", + "downloads": 1509098, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "distilbert", + "feature-extraction", + "sentence-similarity", + "multilingual", + "ar", + "bg", + "ca", + "cs", + "da", + "de", + "el", + "en", + "es", + "et", + "fa", + "fi", + "fr", + "gl", + "gu", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "it", + "ja", + "ka", + "ko", + "ku", + "lt", + "lv", + "mk", + "mn", + "mr", + "ms", + "my", + "nb", + "nl", + "pl", + "pt", + "ro", + "ru", + "sk", + "sl", + "sq", + "sr", + "sv", + "th", + "tr", + "uk", + "ur", + "vi", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ar - bg - ca - cs - da - de - el - en - es - et - fa - fi - fr - gl - gu - he - hi - hr - hu - hy - id - it - ja - ka - ko - ku - lt - lv - mk - mn - mr - ms - my - nb - nl - pl - pt - ro - ru - sk - sl - sq - sr - sv - th - tr - uk - ur - vi license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity language_bcp47: - fr-ca - pt-br - zh-cn - zh-tw pipeline_tag: sentence-similarity --- # sentence-transformers/distiluse-base-multilingual-cased-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 512 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Distills multilingual sentences into 512-dimensional vectors for semantic search and clustering across 50+ languages." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_msmarco-MiniLM-L12-cos-v5.json b/data/model_data_json/sentence-transformers_msmarco-MiniLM-L12-cos-v5.json new file mode 100644 index 0000000000000000000000000000000000000000..a4e2172a0a8a9c0ea4432f4673130a78eee5fbe6 --- /dev/null +++ b/data/model_data_json/sentence-transformers_msmarco-MiniLM-L12-cos-v5.json @@ -0,0 +1,25 @@ +{ + "model_id": "sentence-transformers/msmarco-MiniLM-L12-cos-v5", + "downloads": 101403, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "arxiv:1908.10084", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # msmarco-MiniLM-L12-cos-v5 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for **semantic search**. It has been trained on 500k (query, answer) pairs from the MS MARCO Passages dataset. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the correct pooling-operation on-top of the contextualized word embeddings. ## Technical Details The following technical details describe how this model should be used: | Setting | Value | | --- | :---: | | Dimensions | 768 | | Produces normalized embeddings | Yes | | Pooling-Method | Mean pooling | | Suitable score functions | dot-product (), cosine-similarity (), or Euclidean distance | Note: When loaded with , this model produces normalized embeddings with length 1. In that case, dot-product and cosine-similarity are equivalent. dot-product is preferred as it is faster. Euclidean distance is proportional to dot-product and can also be used. ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for semantic search, trained on MS MARCO query-answer pairs."
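A quick sketch of the normalization claim in the table above: with unit-length embeddings, the dot product and cosine similarity coincide (the query and passage are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-MiniLM-L12-cos-v5")
query = model.encode("How many people live in London?", convert_to_tensor=True)
passage = model.encode("Around 9 million people live in London.", convert_to_tensor=True)

print(util.dot_score(query, passage))  # equal to cos_sim for normalized embeddings
print(util.cos_sim(query, passage))
```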
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_msmarco-MiniLM-L6-v3.json b/data/model_data_json/sentence-transformers_msmarco-MiniLM-L6-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..c4ea5d45c7fcdf8c2aa865d7b8c43da0bc2ead7d --- /dev/null +++ b/data/model_data_json/sentence-transformers_msmarco-MiniLM-L6-v3.json @@ -0,0 +1,25 @@ +{ + "model_id": "sentence-transformers/msmarco-MiniLM-L6-v3", + "downloads": 151887, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "jax", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # sentence-transformers/msmarco-MiniLM-L6-v3 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 384-dimensional vectors for tasks like clustering and semantic search." +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_msmarco-bert-base-dot-v5.json b/data/model_data_json/sentence-transformers_msmarco-bert-base-dot-v5.json new file mode 100644 index 0000000000000000000000000000000000000000..4be9b35d4049727f876630a6fc3862239d54622a --- /dev/null +++ b/data/model_data_json/sentence-transformers_msmarco-bert-base-dot-v5.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/msmarco-bert-base-dot-v5", + "downloads": 285677, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "arxiv:1908.10084", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # msmarco-bert-base-dot-v5 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for **semantic search**. It has been trained on 500K (query, answer) pairs from the MS MARCO dataset. 
For an introduction to semantic search, have a look at: SBERT.net - Semantic Search ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the correct pooling-operation on-top of the contextualized word embeddings. ## Technical Details The following technical details describe how this model should be used: | Setting | Value | | --- | :---: | | Dimensions | 768 | | Max Sequence Length | 512 | | Produces normalized embeddings | No | | Pooling-Method | Mean pooling | | Suitable score functions | dot-product (e.g. ) | ## Training See in this repository for the used training script. The model was trained with the parameters: **DataLoader**: of length 7858 with parameters: **Loss**: Parameters of the fit()-Method: ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for semantic search using dot-product similarity, trained on MS MARCO query-answer pairs." +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_msmarco-distilbert-base-tas-b.json b/data/model_data_json/sentence-transformers_msmarco-distilbert-base-tas-b.json new file mode 100644 index 0000000000000000000000000000000000000000..03dd23b5002892fabca8264a166c6922ec9d2189 --- /dev/null +++ b/data/model_data_json/sentence-transformers_msmarco-distilbert-base-tas-b.json @@ -0,0 +1,25 @@ +{ + "model_id": "sentence-transformers/msmarco-distilbert-base-tas-b", + "downloads": 321011, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "distilbert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:ms_marco", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - ms_marco pipeline_tag: sentence-similarity --- # sentence-transformers/msmarco-distilbert-base-tas-b This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors Have a look at: DistilBert TAS-B Model", "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for optimized semantic search."
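Since the TAS-B card's snippets are also missing from this dump, here is a small retrieval sketch; the texts are illustrative, and scoring with the dot product is an assumption that follows the usual TAS-B setup rather than anything stated in the card above:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")
query_emb = model.encode("What is the boiling point of water?", convert_to_tensor=True)
docs = [
    "Water boils at 100 degrees Celsius at sea level.",
    "The Eiffel Tower is located in Paris.",
]
doc_emb = model.encode(docs, convert_to_tensor=True)

scores = util.dot_score(query_emb, doc_emb)[0]  # assumed score function
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda x: -x[1]):
    print(round(score, 2), doc)
```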
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_msmarco-distilbert-base-v4.json b/data/model_data_json/sentence-transformers_msmarco-distilbert-base-v4.json new file mode 100644 index 0000000000000000000000000000000000000000..229e94b7580573213494fb7de9c994becda0c470 --- /dev/null +++ b/data/model_data_json/sentence-transformers_msmarco-distilbert-base-v4.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/msmarco-distilbert-base-v4", + "downloads": 401651, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "distilbert", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # sentence-transformers/msmarco-distilbert-base-v4 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_multi-qa-MiniLM-L6-cos-v1.json b/data/model_data_json/sentence-transformers_multi-qa-MiniLM-L6-cos-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..0cbf812b444e1a267573db47fa5c387cb47e3985 --- /dev/null +++ b/data/model_data_json/sentence-transformers_multi-qa-MiniLM-L6-cos-v1.json @@ -0,0 +1,35 @@ +{ + "model_id": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1", + "downloads": 2062294, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:ms_marco", + "dataset:gooaq", + "dataset:yahoo_answers_topics", + "dataset:search_qa", + "dataset:eli5", + "dataset:natural_questions", + "dataset:trivia_qa", + "dataset:embedding-data/QQP", + "dataset:embedding-data/PAQ_pairs", + "dataset:embedding-data/Amazon-QA", + "dataset:embedding-data/WikiAnswers", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - search_qa - eli5 - natural_questions - trivia_qa - embedding-data/QQP - embedding-data/PAQ_pairs - embedding-data/Amazon-QA - embedding-data/WikiAnswers pipeline_tag: sentence-similarity --- # multi-qa-MiniLM-L6-cos-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and was designed for **semantic search**. It has been trained on 215M (question, answer) pairs from diverse sources. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## PyTorch Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the correct pooling-operation on-top of the contextualized word embeddings. ## TensorFlow Usage (HuggingFace Transformers) Similarly to the PyTorch example above, to use the model with TensorFlow you pass your input through the transformer model, then you have to apply the correct pooling-operation on-top of the contextualized word embeddings. ## Technical Details The following are some technical details on how this model must be used: | Setting | Value | | --- | :---: | | Dimensions | 384 | | Produces normalized embeddings | Yes | | Pooling-Method | Mean pooling | | Suitable score functions | dot-product (), cosine-similarity (), or euclidean distance | Note: When loaded with , this model produces normalized embeddings with length 1. In that case, dot-product and cosine-similarity are equivalent. dot-product is preferred as it is faster. Euclidean distance is proportional to dot-product and can also be used. ---- ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective.
We use a contrastive learning objective: given a sentence from the pair, the model should predict which out of a set of randomly sampled other sentences was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as advice from a member of Google's Flax, JAX, and Cloud team on efficient deep learning frameworks. ## Intended uses Our model is intended to be used for semantic search: It encodes queries / questions and text paragraphs in a dense vector space. It finds relevant documents for the given passages. Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text. ## Training procedure The full training script is accessible in this current repository: . ### Pre-training We use the pretrained []( model. Please refer to the model card for more detailed information about the pre-training procedure. #### Training We use the concatenation from multiple datasets to fine-tune our model. In total we have about 215M (question, answer) pairs. We sampled each dataset with a weighted probability; the configuration is detailed in the file. The model was trained with MultipleNegativesRankingLoss using Mean-pooling, cosine-similarity as similarity function, and a scale of 20. | Dataset | Number of training tuples | |--------------------------------------------------------|:--------------------------:| | WikiAnswers Duplicate question pairs from WikiAnswers | 77,427,422 | | PAQ Automatically generated (Question, Paragraph) pairs for each paragraph in Wikipedia | 64,371,441 | | Stack Exchange (Title, Body) pairs from all StackExchanges | 25,316,456 | | Stack Exchange (Title, Answer) pairs from all StackExchanges | 21,396,559 | | MS MARCO Triplets (query, answer, hard_negative) for 500k queries from Bing search engine | 17,579,773 | | GOOAQ: Open Question Answering with Diverse Answer Types (query, answer) pairs for 3M Google queries and Google featured snippet | 3,012,496 | | Amazon-QA (Question, Answer) pairs from Amazon product pages | 2,448,839 | | Yahoo Answers (Title, Answer) pairs from Yahoo Answers | 1,198,260 | | Yahoo Answers (Question, Answer) pairs from Yahoo Answers | 681,164 | | Yahoo Answers (Title, Question) pairs from Yahoo Answers | 659,896 | | SearchQA (Question, Answer) pairs for 140k questions, each with Top5 Google snippets on that question | 582,261 | | ELI5 (Question, Answer) pairs from Reddit ELI5 (explainlikeimfive) | 325,475 | | Stack Exchange Duplicate questions pairs (titles) | 304,525 | | Quora Question Triplets (Question, Duplicate_Question, Hard_Negative) triplets for Quora Questions Pairs dataset | 103,663 | | Natural Questions (NQ) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 | | SQuAD2.0 (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 | | TriviaQA (Question, Evidence) pairs | 73,346 | | **Total** | **214,988,242** |", + "model_explanation_gemini": "Maps sentences and paragraphs to 384-dimensional vectors for semantic search, trained on question-answer pairs."
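The card above repeatedly says to "apply the correct pooling operation on top of the contextualized word embeddings" without showing it; below is a minimal sketch of that mean-pooling step for the plain HuggingFace Transformers path (the input sentences are illustrative):

```python
# Sketch of the mean-pooling step the multi-qa-MiniLM-L6-cos-v1 card
# describes for the "Usage (HuggingFace Transformers)" path.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions via the attention mask.
    token_embeddings = model_output.last_hidden_state
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

name = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

sentences = ["How big is London?", "London has about 9 million inhabitants."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# Mean-pool, then L2-normalize so dot-product equals cosine similarity.
embeddings = F.normalize(mean_pooling(output, encoded["attention_mask"]), p=2, dim=1)
print(embeddings.shape)  # (2, 384)
```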
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_multi-qa-distilbert-cos-v1.json b/data/model_data_json/sentence-transformers_multi-qa-distilbert-cos-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..37d3425e3bda31224f5e4168539cc50e23063240 --- /dev/null +++ b/data/model_data_json/sentence-transformers_multi-qa-distilbert-cos-v1.json @@ -0,0 +1,35 @@ +{ + "model_id": "sentence-transformers/multi-qa-distilbert-cos-v1", + "downloads": 102682, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "distilbert", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:ms_marco", + "dataset:gooaq", + "dataset:yahoo_answers_topics", + "dataset:search_qa", + "dataset:eli5", + "dataset:natural_questions", + "dataset:trivia_qa", + "dataset:embedding-data/QQP", + "dataset:embedding-data/PAQ_pairs", + "dataset:embedding-data/Amazon-QA", + "dataset:embedding-data/WikiAnswers", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - search_qa - eli5 - natural_questions - trivia_qa - embedding-data/QQP - embedding-data/PAQ_pairs - embedding-data/Amazon-QA - embedding-data/WikiAnswers pipeline_tag: sentence-similarity --- # multi-qa-distilbert-cos-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for **semantic search**. It has been trained on 215M (question, answer) pairs from diverse sources. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the correct pooling-operation on-top of the contextualized word embeddings. ## Technical Details The following are some technical details on how this model must be used: | Setting | Value | | --- | :---: | | Dimensions | 768 | | Produces normalized embeddings | Yes | | Pooling-Method | Mean pooling | | Suitable score functions | dot-product (), cosine-similarity (), or euclidean distance | Note: When loaded with , this model produces normalized embeddings with length 1. In that case, dot-product and cosine-similarity are equivalent. dot-product is preferred as it is faster. Euclidean distance is proportional to dot-product and can also be used. ---- ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We use a contrastive learning objective: given a sentence from the pair, the model should predict which out of a set of randomly sampled other sentences was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs.
We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as advice from a member of Google's Flax, JAX, and Cloud team on efficient deep learning frameworks. ## Intended uses Our model is intended to be used for semantic search: It encodes queries / questions and text paragraphs in a dense vector space. It finds relevant documents for the given passages. Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text. ## Training procedure The full training script is accessible in this current repository: . ### Pre-training We use the pretrained []( model. Please refer to the model card for more detailed information about the pre-training procedure. #### Training We use the concatenation from multiple datasets to fine-tune our model. In total we have about 215M (question, answer) pairs. We sampled each dataset with a weighted probability; the configuration is detailed in the file. The model was trained with MultipleNegativesRankingLoss using Mean-pooling, cosine-similarity as similarity function, and a scale of 20. | Dataset | Number of training tuples | |--------------------------------------------------------|:--------------------------:| | WikiAnswers Duplicate question pairs from WikiAnswers | 77,427,422 | | PAQ Automatically generated (Question, Paragraph) pairs for each paragraph in Wikipedia | 64,371,441 | | Stack Exchange (Title, Body) pairs from all StackExchanges | 25,316,456 | | Stack Exchange (Title, Answer) pairs from all StackExchanges | 21,396,559 | | MS MARCO Triplets (query, answer, hard_negative) for 500k queries from Bing search engine | 17,579,773 | | GOOAQ: Open Question Answering with Diverse Answer Types (query, answer) pairs for 3M Google queries and Google featured snippet | 3,012,496 | | Amazon-QA (Question, Answer) pairs from Amazon product pages | 2,448,839 | | Yahoo Answers (Title, Answer) pairs from Yahoo Answers | 1,198,260 | | Yahoo Answers (Question, Answer) pairs from Yahoo Answers | 681,164 | | Yahoo Answers (Title, Question) pairs from Yahoo Answers | 659,896 | | SearchQA (Question, Answer) pairs for 140k questions, each with Top5 Google snippets on that question | 582,261 | | ELI5 (Question, Answer) pairs from Reddit ELI5 (explainlikeimfive) | 325,475 | | Stack Exchange Duplicate questions pairs (titles) | 304,525 | | Quora Question Triplets (Question, Duplicate_Question, Hard_Negative) triplets for Quora Questions Pairs dataset | 103,663 | | Natural Questions (NQ) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 | | SQuAD2.0 (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 | | TriviaQA (Question, Evidence) pairs | 73,346 | | **Total** | **214,988,242** |", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for semantic search, trained on 215M diverse question-answer pairs."
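The card notes that this model's embeddings are normalized, making dot-product and cosine-similarity interchangeable; a quick sketch of that equivalence (illustrative sentence pair, assuming sentence-transformers is installed):

```python
# Quick check of the card's note that normalized embeddings make
# dot-product and cosine-similarity coincide.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/multi-qa-distilbert-cos-v1")
emb = model.encode(["What is the capital of France?", "Paris is the capital of France."])

print(util.dot_score(emb[0], emb[1]))  # same value...
print(util.cos_sim(emb[0], emb[1]))    # ...as cosine similarity
```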
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_multi-qa-mpnet-base-cos-v1.json b/data/model_data_json/sentence-transformers_multi-qa-mpnet-base-cos-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..ca0465491a93bd3c7a2fa7601d75bd5e6660cfed --- /dev/null +++ b/data/model_data_json/sentence-transformers_multi-qa-mpnet-base-cos-v1.json @@ -0,0 +1,22 @@ +{ + "model_id": "sentence-transformers/multi-qa-mpnet-base-cos-v1", + "downloads": 263192, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "mpnet", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # multi-qa-mpnet-base-cos-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for **semantic search**. It has been trained on 215M (question, answer) pairs from diverse sources. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the correct pooling-operation on-top of the contextualized word embeddings. ## Technical Details The following are some technical details on how this model must be used: | Setting | Value | | --- | :---: | | Dimensions | 768 | | Produces normalized embeddings | Yes | | Pooling-Method | Mean pooling | | Suitable score functions | dot-product (), cosine-similarity (), or euclidean distance | Note: When loaded with , this model produces normalized embeddings with length 1. In that case, dot-product and cosine-similarity are equivalent. dot-product is preferred as it is faster. Euclidean distance is proportional to dot-product and can also be used. ---- ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We use a contrastive learning objective: given a sentence from the pair, the model should predict which out of a set of randomly sampled other sentences was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as advice from a member of Google's Flax, JAX, and Cloud team on efficient deep learning frameworks. ## Intended uses Our model is intended to be used for semantic search: It encodes queries / questions and text paragraphs in a dense vector space. It finds relevant documents for the given passages. Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text.
## Training procedure The full training script is accessible in this current repository: . ### Pre-training We use the pretrained []( model. Please refer to the model card for more detailed information about the pre-training procedure. #### Training We use the concatenation from multiple datasets to fine-tune our model. In total we have about 215M (question, answer) pairs. We sampled each dataset with a weighted probability; the configuration is detailed in the file. The model was trained with MultipleNegativesRankingLoss using Mean-pooling, cosine-similarity as similarity function, and a scale of 20. | Dataset | Number of training tuples | |--------------------------------------------------------|:--------------------------:| | WikiAnswers Duplicate question pairs from WikiAnswers | 77,427,422 | | PAQ Automatically generated (Question, Paragraph) pairs for each paragraph in Wikipedia | 64,371,441 | | Stack Exchange (Title, Body) pairs from all StackExchanges | 25,316,456 | | Stack Exchange (Title, Answer) pairs from all StackExchanges | 21,396,559 | | MS MARCO Triplets (query, answer, hard_negative) for 500k queries from Bing search engine | 17,579,773 | | GOOAQ: Open Question Answering with Diverse Answer Types (query, answer) pairs for 3M Google queries and Google featured snippet | 3,012,496 | | Amazon-QA (Question, Answer) pairs from Amazon product pages | 2,448,839 | | Yahoo Answers (Title, Answer) pairs from Yahoo Answers | 1,198,260 | | Yahoo Answers (Question, Answer) pairs from Yahoo Answers | 681,164 | | Yahoo Answers (Title, Question) pairs from Yahoo Answers | 659,896 | | SearchQA (Question, Answer) pairs for 140k questions, each with Top5 Google snippets on that question | 582,261 | | ELI5 (Question, Answer) pairs from Reddit ELI5 (explainlikeimfive) | 325,475 | | Stack Exchange Duplicate questions pairs (titles) | 304,525 | | Quora Question Triplets (Question, Duplicate_Question, Hard_Negative) triplets for Quora Questions Pairs dataset | 103,663 | | Natural Questions (NQ) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 | | SQuAD2.0 (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 | | TriviaQA (Question, Evidence) pairs | 73,346 | | **Total** | **214,988,242** |", + "model_explanation_gemini": "Embeds sentences and paragraphs into 768-dimensional vectors for semantic search, trained on 215M question-answer pairs to find relevant documents."
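This card warns that inputs are hard-capped at 512 word pieces but the model only saw up to roughly 250 during training; a small sketch of capping the sequence length to match (assuming sentence-transformers; the value 250 simply mirrors the card's note):

```python
# Cap the input length to roughly what the model saw in training.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-cos-v1")
print(model.max_seq_length)   # default hard limit; longer text is truncated
model.max_seq_length = 250    # optionally match the training-time length
```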
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_multi-qa-mpnet-base-dot-v1.json b/data/model_data_json/sentence-transformers_multi-qa-mpnet-base-dot-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..d23ce25cb6e4ae54f9c8345422cb629c32ae9757 --- /dev/null +++ b/data/model_data_json/sentence-transformers_multi-qa-mpnet-base-dot-v1.json @@ -0,0 +1,34 @@ +{ + "model_id": "sentence-transformers/multi-qa-mpnet-base-dot-v1", + "downloads": 1648898, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "mpnet", + "fill-mask", + "feature-extraction", + "sentence-similarity", + "transformers", + "en", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:ms_marco", + "dataset:gooaq", + "dataset:yahoo_answers_topics", + "dataset:search_qa", + "dataset:eli5", + "dataset:natural_questions", + "dataset:trivia_qa", + "dataset:embedding-data/QQP", + "dataset:embedding-data/PAQ_pairs", + "dataset:embedding-data/Amazon-QA", + "dataset:embedding-data/WikiAnswers", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - search_qa - eli5 - natural_questions - trivia_qa - embedding-data/QQP - embedding-data/PAQ_pairs - embedding-data/Amazon-QA - embedding-data/WikiAnswers pipeline_tag: sentence-similarity --- # multi-qa-mpnet-base-dot-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for **semantic search**. It has been trained on 215M (question, answer) pairs from diverse sources. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the correct pooling-operation on-top of the contextualized word embeddings. ## Technical Details The following are some technical details on how this model must be used: | Setting | Value | | --- | :---: | | Dimensions | 768 | | Produces normalized embeddings | No | | Pooling-Method | CLS pooling | | Suitable score functions | dot-product (e.g. ) | ---- ## Background The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised contrastive learning objective. We use a contrastive learning objective: given a sentence from the pair, the model should predict which out of a set of randomly sampled other sentences was actually paired with it in our dataset. We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face. We developed this model as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as advice from a member of Google's Flax, JAX, and Cloud team on efficient deep learning frameworks. ## Intended uses Our model is intended to be used for semantic search: It encodes queries / questions and text paragraphs in a dense vector space.
It finds relevant documents for the given passages. Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text. ## Training procedure The full training script is accessible in this current repository: . ### Pre-training We use the pretrained []( model. Please refer to the model card for more detailed information about the pre-training procedure. #### Training We use the concatenation from multiple datasets to fine-tune our model. In total we have about 215M (question, answer) pairs. We sampled each dataset with a weighted probability; the configuration is detailed in the file. The model was trained with MultipleNegativesRankingLoss using CLS-pooling, dot-product as similarity function, and a scale of 1. | Dataset | Number of training tuples | |--------------------------------------------------------|:--------------------------:| | WikiAnswers Duplicate question pairs from WikiAnswers | 77,427,422 | | PAQ Automatically generated (Question, Paragraph) pairs for each paragraph in Wikipedia | 64,371,441 | | Stack Exchange (Title, Body) pairs from all StackExchanges | 25,316,456 | | Stack Exchange (Title, Answer) pairs from all StackExchanges | 21,396,559 | | MS MARCO Triplets (query, answer, hard_negative) for 500k queries from Bing search engine | 17,579,773 | | GOOAQ: Open Question Answering with Diverse Answer Types (query, answer) pairs for 3M Google queries and Google featured snippet | 3,012,496 | | Amazon-QA (Question, Answer) pairs from Amazon product pages | 2,448,839 | | Yahoo Answers (Title, Answer) pairs from Yahoo Answers | 1,198,260 | | Yahoo Answers (Question, Answer) pairs from Yahoo Answers | 681,164 | | Yahoo Answers (Title, Question) pairs from Yahoo Answers | 659,896 | | SearchQA (Question, Answer) pairs for 140k questions, each with Top5 Google snippets on that question | 582,261 | | ELI5 (Question, Answer) pairs from Reddit ELI5 (explainlikeimfive) | 325,475 | | Stack Exchange Duplicate questions pairs (titles) | 304,525 | | Quora Question Triplets (Question, Duplicate_Question, Hard_Negative) triplets for Quora Questions Pairs dataset | 103,663 | | Natural Questions (NQ) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 | | SQuAD2.0 (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 | | TriviaQA (Question, Evidence) pairs | 73,346 | | **Total** | **214,988,242** |", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for semantic search, trained on 215M diverse question-answer pairs."
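Unlike the cos-v1 models above, this card specifies CLS pooling with non-normalized embeddings; a minimal sketch of that pooling step on the plain HuggingFace Transformers path (input sentence is illustrative):

```python
# Sketch of the CLS-pooling step the multi-qa-mpnet-base-dot-v1 card specifies.
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/multi-qa-mpnet-base-dot-v1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

encoded = tokenizer(["How big is London?"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# CLS pooling: take the first token's embedding; embeddings are not
# normalized, so score candidate passages with a plain dot-product.
embedding = output.last_hidden_state[:, 0]
print(embedding.shape)  # (1, 768)
```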
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_nli-mpnet-base-v2.json b/data/model_data_json/sentence-transformers_nli-mpnet-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..af9b0fa2b81c14075d026ebc5b48fcae1423bf8c --- /dev/null +++ b/data/model_data_json/sentence-transformers_nli-mpnet-base-v2.json @@ -0,0 +1,23 @@ +{ + "model_id": "sentence-transformers/nli-mpnet-base-v2", + "downloads": 208293, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "mpnet", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # sentence-transformers/nli-mpnet-base-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to a 768-dimensional vector space for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_paraphrase-MiniLM-L3-v2.json b/data/model_data_json/sentence-transformers_paraphrase-MiniLM-L3-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..a4314708cc8585639d401bad7e1a55e205b80f22 --- /dev/null +++ b/data/model_data_json/sentence-transformers_paraphrase-MiniLM-L3-v2.json @@ -0,0 +1,37 @@ +{ + "model_id": "sentence-transformers/paraphrase-MiniLM-L3-v2", + "downloads": 419098, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:s2orc", + "dataset:ms_marco", + "dataset:wiki_atomic_edits", + "dataset:snli", + "dataset:multi_nli", + "dataset:embedding-data/altlex", + "dataset:embedding-data/simple-wiki", + "dataset:embedding-data/flickr30k-captions", + "dataset:embedding-data/coco_captions", + "dataset:embedding-data/sentence-compression", + "dataset:embedding-data/QQP", + "dataset:yahoo_answers_topics", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - flax-sentence-embeddings/stackexchange_xml - s2orc - ms_marco - wiki_atomic_edits - snli - multi_nli - embedding-data/altlex - embedding-data/simple-wiki - embedding-data/flickr30k-captions - embedding-data/coco_captions - embedding-data/sentence-compression - embedding-data/QQP - yahoo_answers_topics pipeline_tag: sentence-similarity --- # sentence-transformers/paraphrase-MiniLM-L3-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 384-dimensional vectors for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_paraphrase-MiniLM-L6-v2.json b/data/model_data_json/sentence-transformers_paraphrase-MiniLM-L6-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..6e62dd6729a210f6909af2c664cf363af3ffa7e5 --- /dev/null +++ b/data/model_data_json/sentence-transformers_paraphrase-MiniLM-L6-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/paraphrase-MiniLM-L6-v2", + "downloads": 3871920, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # sentence-transformers/paraphrase-MiniLM-L6-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 384-dimensional vectors for tasks like clustering and semantic search." 
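The paraphrase-* cards above name clustering as a primary use case but show no example; here is a minimal sketch with scikit-learn's KMeans (the sentences and cluster count are illustrative):

```python
# Cluster sentences by embedding them with a paraphrase model,
# then grouping the vectors with KMeans.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L6-v2")
sentences = [
    "The cat sits on the mat.",
    "A feline rests on a rug.",
    "Stock markets fell sharply today.",
    "Shares dropped amid market turmoil.",
]
embeddings = model.encode(sentences)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for sentence, label in zip(sentences, labels):
    print(label, sentence)
```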
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_paraphrase-albert-small-v2.json b/data/model_data_json/sentence-transformers_paraphrase-albert-small-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..741faf98cc24e363d5fd83baaf70b3f8cd19caf0 --- /dev/null +++ b/data/model_data_json/sentence-transformers_paraphrase-albert-small-v2.json @@ -0,0 +1,37 @@ +{ + "model_id": "sentence-transformers/paraphrase-albert-small-v2", + "downloads": 177093, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "rust", + "onnx", + "safetensors", + "openvino", + "albert", + "feature-extraction", + "sentence-similarity", + "transformers", + "dataset:flax-sentence-embeddings/stackexchange_xml", + "dataset:s2orc", + "dataset:ms_marco", + "dataset:wiki_atomic_edits", + "dataset:snli", + "dataset:multi_nli", + "dataset:embedding-data/altlex", + "dataset:embedding-data/simple-wiki", + "dataset:embedding-data/flickr30k-captions", + "dataset:embedding-data/coco_captions", + "dataset:embedding-data/sentence-compression", + "dataset:embedding-data/QQP", + "dataset:yahoo_answers_topics", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - flax-sentence-embeddings/stackexchange_xml - s2orc - ms_marco - wiki_atomic_edits - snli - multi_nli - embedding-data/altlex - embedding-data/simple-wiki - embedding-data/flickr30k-captions - embedding-data/coco_captions - embedding-data/sentence-compression - embedding-data/QQP - yahoo_answers_topics pipeline_tag: sentence-similarity --- # sentence-transformers/paraphrase-albert-small-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_paraphrase-mpnet-base-v2.json b/data/model_data_json/sentence-transformers_paraphrase-mpnet-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..7d42dcf44826063205d2b8789161866d4813fedb --- /dev/null +++ b/data/model_data_json/sentence-transformers_paraphrase-mpnet-base-v2.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/paraphrase-mpnet-base-v2", + "downloads": 1218113, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "mpnet", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "doi:10.57967/hf/2004", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # sentence-transformers/paraphrase-mpnet-base-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_paraphrase-multilingual-MiniLM-L12-v2.json b/data/model_data_json/sentence-transformers_paraphrase-multilingual-MiniLM-L12-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..7950d8b3a253433fcca2dee11d72b0b6c7065c3c --- /dev/null +++ b/data/model_data_json/sentence-transformers_paraphrase-multilingual-MiniLM-L12-v2.json @@ -0,0 +1,74 @@ +{ + "model_id": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", + "downloads": 7015881, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "multilingual", + "ar", + "bg", + "ca", + "cs", + "da", + "de", + "el", + "en", + "es", + "et", + "fa", + "fi", + "fr", + "gl", + "gu", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "it", + "ja", + "ka", + "ko", + "ku", + "lt", + "lv", + "mk", + "mn", + "mr", + "ms", + "my", + "nb", + "nl", + "pl", + "pt", + "ro", + "ru", + "sk", + "sl", + "sq", + "sr", + "sv", + "th", + "tr", + "uk", + "ur", + "vi", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ar - bg - ca - cs - da - de - el - en - es - et - fa - fi - fr - gl - gu - he - hi - hr - hu - hy - id - it - ja - ka - ko - ku - lt - lv - mk - mn - mr - ms - my - nb - nl - pl - pt - ro - ru - sk - sl - sq - sr - sv - th - tr - uk - ur - vi license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers language_bcp47: - fr-ca - pt-br - zh-cn - zh-tw pipeline_tag: sentence-similarity --- # sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps multilingual sentences and paragraphs to 384-dimensional vectors for tasks like clustering and semantic search." 
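Because the multilingual paraphrase models embed all supported languages into one shared vector space, translations of the same sentence should land close together; a small cross-lingual check (the sentence pair is illustrative):

```python
# Cross-lingual similarity with the multilingual paraphrase model above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
en = model.encode("The weather is lovely today.")
de = model.encode("Das Wetter ist heute schön.")  # German translation
print(util.cos_sim(en, de))  # high similarity despite different languages
```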
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_paraphrase-multilingual-mpnet-base-v2.json b/data/model_data_json/sentence-transformers_paraphrase-multilingual-mpnet-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..24f4d09dbe8cbebc31374025d7e744dc30134693 --- /dev/null +++ b/data/model_data_json/sentence-transformers_paraphrase-multilingual-mpnet-base-v2.json @@ -0,0 +1,74 @@ +{ + "model_id": "sentence-transformers/paraphrase-multilingual-mpnet-base-v2", + "downloads": 2657393, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "xlm-roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "multilingual", + "ar", + "bg", + "ca", + "cs", + "da", + "de", + "el", + "en", + "es", + "et", + "fa", + "fi", + "fr", + "gl", + "gu", + "he", + "hi", + "hr", + "hu", + "hy", + "id", + "it", + "ja", + "ka", + "ko", + "ku", + "lt", + "lv", + "mk", + "mn", + "mr", + "ms", + "my", + "nb", + "nl", + "pl", + "pt", + "ro", + "ru", + "sk", + "sl", + "sq", + "sr", + "sv", + "th", + "tr", + "uk", + "ur", + "vi", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - multilingual - ar - bg - ca - cs - da - de - el - en - es - et - fa - fi - fr - gl - gu - he - hi - hr - hu - hy - id - it - ja - ka - ko - ku - lt - lv - mk - mn - mr - ms - my - nb - nl - pl - pt - ro - ru - sk - sl - sq - sr - sv - th - tr - uk - ur - vi license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers language_bcp47: - fr-ca - pt-br - zh-cn - zh-tw pipeline_tag: sentence-similarity --- # sentence-transformers/paraphrase-multilingual-mpnet-base-v2 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps multilingual sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." 
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_paraphrase-xlm-r-multilingual-v1.json b/data/model_data_json/sentence-transformers_paraphrase-xlm-r-multilingual-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..ccb079582e4fd078a235273d5f3d76eac4a77e51 --- /dev/null +++ b/data/model_data_json/sentence-transformers_paraphrase-xlm-r-multilingual-v1.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/paraphrase-xlm-r-multilingual-v1", + "downloads": 295404, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "xlm-roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # sentence-transformers/paraphrase-xlm-r-multilingual-v1 This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_roberta-base-nli-mean-tokens.json b/data/model_data_json/sentence-transformers_roberta-base-nli-mean-tokens.json new file mode 100644 index 0000000000000000000000000000000000000000..9a89fbc291af8aa4df347e216d1c6255a93aa490 --- /dev/null +++ b/data/model_data_json/sentence-transformers_roberta-base-nli-mean-tokens.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/roberta-base-nli-mean-tokens", + "downloads": 486655, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- **⚠️ This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. You can find recommended sentence embedding models here: SBERT.net - Pretrained Models** # sentence-transformers/roberta-base-nli-mean-tokens This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. 
## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_sentence-t5-base.json b/data/model_data_json/sentence-transformers_sentence-t5-base.json new file mode 100644 index 0000000000000000000000000000000000000000..9b2aa3cc20db27180d6dd06a2c3a3af9e5fb71c4 --- /dev/null +++ b/data/model_data_json/sentence-transformers_sentence-t5-base.json @@ -0,0 +1,20 @@ +{ + "model_id": "sentence-transformers/sentence-t5-base", + "downloads": 83144, + "tags": [ + "sentence-transformers", + "pytorch", + "rust", + "safetensors", + "t5", + "feature-extraction", + "sentence-similarity", + "en", + "arxiv:2108.08877", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity pipeline_tag: sentence-similarity --- # sentence-transformers/sentence-t5-base This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space. The model works well for sentence similarity tasks, but doesn't perform that well for semantic search tasks. This model was converted from the Tensorflow model st5-base-1 to PyTorch. When using this model, have a look at the publication: Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models. The tfhub model and this PyTorch model can produce slightly different embeddings, however, when run on the same benchmarks, they produce identical results. The model uses only the encoder from a T5-base model. The weights are stored in FP16. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: The model requires sentence-transformers version 2.2.0 or newer. 
## Citing & Authors If you find this model helpful, please cite the respective publication: Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models" +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_sentence-t5-xl.json b/data/model_data_json/sentence-transformers_sentence-t5-xl.json new file mode 100644 index 0000000000000000000000000000000000000000..a1084dc2c3d713275181f302085cc39c540abb1a --- /dev/null +++ b/data/model_data_json/sentence-transformers_sentence-t5-xl.json @@ -0,0 +1,19 @@ +{ + "model_id": "sentence-transformers/sentence-t5-xl", + "downloads": 76312, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "t5", + "feature-extraction", + "sentence-similarity", + "en", + "arxiv:2108.08877", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity pipeline_tag: sentence-similarity --- # sentence-transformers/sentence-t5-xl This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space. The model works well for sentence similarity tasks, but doesn't perform that well for semantic search tasks. This model was converted from the Tensorflow model st5-3b-1 to PyTorch. When using this model, have a look at the publication: Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models. The tfhub model and this PyTorch model can produce slightly different embeddings, however, when run on the same benchmarks, they produce identical results. The model uses only the encoder from a T5-3B model. The weights are stored in FP16. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: The model requires sentence-transformers version 2.2.0 or newer. ## Citing & Authors If you find this model helpful, please cite the respective publication: Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models" +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_stsb-roberta-base.json b/data/model_data_json/sentence-transformers_stsb-roberta-base.json new file mode 100644 index 0000000000000000000000000000000000000000..d96258086c0d3eccf38890bb81bfa56458cca673 --- /dev/null +++ b/data/model_data_json/sentence-transformers_stsb-roberta-base.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/stsb-roberta-base", + "downloads": 678783, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- **⚠️ This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. 
You can find recommended sentence embedding models here: SBERT.net - Pretrained Models** # sentence-transformers/stsb-roberta-base This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." +} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_stsb-xlm-r-multilingual.json b/data/model_data_json/sentence-transformers_stsb-xlm-r-multilingual.json new file mode 100644 index 0000000000000000000000000000000000000000..b348cabe0af2096af2d16734ffc6edf55cf30a59 --- /dev/null +++ b/data/model_data_json/sentence-transformers_stsb-xlm-r-multilingual.json @@ -0,0 +1,24 @@ +{ + "model_id": "sentence-transformers/stsb-xlm-r-multilingual", + "downloads": 1206907, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "onnx", + "safetensors", + "openvino", + "xlm-roberta", + "feature-extraction", + "sentence-similarity", + "transformers", + "arxiv:1908.10084", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # sentence-transformers/stsb-xlm-r-multilingual This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings. ## Full Model Architecture ## Citing & Authors This model was trained by sentence-transformers. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:", + "model_explanation_gemini": "Maps sentences and paragraphs to 768-dimensional vectors for tasks like clustering and semantic search." 
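For STS-style scoring, the non-deprecated multilingual STSb model above can be used directly; the deprecated checkpoints' cards instead point to SBERT.net's recommended models. A minimal sketch (illustrative sentence pair):

```python
# Score a sentence pair with the stsb-xlm-r-multilingual model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/stsb-xlm-r-multilingual")
emb = model.encode(["A man is playing guitar.", "Someone is playing an instrument."])
print(util.cos_sim(emb[0], emb[1]))  # similarity score in [-1, 1]
```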
+} \ No newline at end of file diff --git a/data/model_data_json/sentence-transformers_use-cmlm-multilingual.json b/data/model_data_json/sentence-transformers_use-cmlm-multilingual.json new file mode 100644 index 0000000000000000000000000000000000000000..293eb0f83878b17f6ce8c84cef1508e2d2fe3483 --- /dev/null +++ b/data/model_data_json/sentence-transformers_use-cmlm-multilingual.json @@ -0,0 +1,21 @@ +{ + "model_id": "sentence-transformers/use-cmlm-multilingual", + "downloads": 6312966, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "license:apache-2.0", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers pipeline_tag: sentence-similarity --- # use-cmlm-multilingual This is a PyTorch version of the universal-sentence-encoder-cmlm/multilingual-base-br model. It can be used to map 109 languages to a shared vector space. As the model is based on LaBSE, it performs quite comparably on downstream tasks. ## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Full Model Architecture ## Citing & Authors Have a look at universal-sentence-encoder-cmlm/multilingual-base-br for the respective publication that describes this model.", "model_explanation_gemini": "Maps text from 109 languages into a shared vector space for sentence similarity and feature extraction tasks." +} \ No newline at end of file diff --git a/data/model_data_json/sergeyzh_rubert-tiny-turbo.json b/data/model_data_json/sergeyzh_rubert-tiny-turbo.json new file mode 100644 index 0000000000000000000000000000000000000000..e7110c35ca56321c5f99d8ae4ee08fb29316cae7 --- /dev/null +++ b/data/model_data_json/sergeyzh_rubert-tiny-turbo.json @@ -0,0 +1,29 @@ +{ + "model_id": "sergeyzh/rubert-tiny-turbo", + "downloads": 76818, + "tags": [ + "sentence-transformers", + "safetensors", + "bert", + "feature-extraction", + "russian", + "pretraining", + "embeddings", + "tiny", + "sentence-similarity", + "transformers", + "mteb", + "ru", + "dataset:IlyaGusev/gazeta", + "dataset:zloelias/lenta-ru", + "base_model:cointegrated/rubert-tiny2", + "base_model:finetune:cointegrated/rubert-tiny2", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru pipeline_tag: sentence-similarity tags: - russian - pretraining - embeddings - tiny - feature-extraction - sentence-similarity - sentence-transformers - transformers - mteb datasets: - IlyaGusev/gazeta - zloelias/lenta-ru license: mit base_model: cointegrated/rubert-tiny2 model-index: - name: sergeyzh/rubert-tiny-turbo results: - dataset: config: default name: MTEB AILACasedocs (default) revision: 4106e6bcc72e0698d714ea8b101355e3e238431a split: test type: mteb/AILA_casedocs metrics: - type: main_score value: 7.432999999999999 - type: map_at_1 value: 0.604 - type: map_at_10 value: 3.8989999999999996 - type: map_at_100 value: 7.89 - type: map_at_1000 value: 8.417 - type: map_at_20 value: 5.007000000000001 - type: map_at_3 value: 2.688 - type: map_at_5 value: 3.0380000000000003 - type: mrr_at_1 value: 6.0 - type: mrr_at_10 value: 11.799999999999999 - type: mrr_at_100 value: 14.417998426795965 
- type: mrr_at_1000 value: 14.474056627618499 - type: mrr_at_20 value: 13.017532467532467 - type: mrr_at_3 value: 10.333333333333334 - type: mrr_at_5 value: 10.733333333333333 - type: nauc_map_at_1000_diff1 value: -18.649405381116548 - type: nauc_map_at_1000_max value: 53.92467833877199 - type: nauc_map_at_1000_std value: -37.567628121407296 - type: nauc_map_at_100_diff1 value: -19.053926237591206 - type: nauc_map_at_100_max value: 53.442907236002725 - type: nauc_map_at_100_std value: -37.310817568902884 - type: nauc_map_at_10_diff1 value: -13.464050841785403 - type: nauc_map_at_10_max value: 48.093886298979946 - type: nauc_map_at_10_std value: -34.85388157835729 - type: nauc_map_at_1_diff1 value: -13.741863044507388 - type: nauc_map_at_1_max value: 88.80266056441289 - type: nauc_map_at_1_std value: -52.44805080502242 - type: nauc_map_at_20_diff1 value: -14.561491138058782 - type: nauc_map_at_20_max value: 48.97477701904 - type: nauc_map_at_20_std value: -31.218577996781537 - type: nauc_map_at_3_diff1 value: -15.370170931276068 - type: nauc_map_at_3_max value: 53.443631887225486 - type: nauc_map_at_3_std value: -40.92344513873499 - type: nauc_map_at_5_diff1 value: -12.899827975508286 - type: nauc_map_at_5_max value: 56.55724779187716 - type: nauc_map_at_5_std value: -38.50107328981899 - type: nauc_mrr_at_1000_diff1 value: -20.480388426956775 - type: nauc_mrr_at_1000_max value: 59.34434186773745 - type: nauc_mrr_at_1000_std value: -38.78219708358511 - type: nauc_mrr_at_100_diff1 value: -20.733217227513638 - type: nauc_mrr_at_100_max value: 59.338571965753026 - type: nauc_mrr_at_100_std value: -38.905241386083524 - type: nauc_mrr_at_10_diff1 value: -23.191503817950903 - type: nauc_mrr_at_10_max value: 59.40585262343663 - type: nauc_mrr_at_10_std value: -39.558082853802894 - type: nauc_mrr_at_1_diff1 value: -18.978624452195685 - type: nauc_mrr_at_1_max value: 88.73088274751811 - type: nauc_mrr_at_1_std value: -52.46400143099903 - type: nauc_mrr_at_20_diff1 value: -20.110327257289537 - type: nauc_mrr_at_20_max value: 57.24590011894607 - type: nauc_mrr_at_20_std value: -36.76057923211494 - type: nauc_mrr_at_3_diff1 value: -20.292924276357084 - type: nauc_mrr_at_3_max value: 62.92624417852826 - type: nauc_mrr_at_3_std value: -42.31284612573441 - type: nauc_mrr_at_5_diff1 value: -22.088780368608298 - type: nauc_mrr_at_5_max value: 61.62928734634482 - type: nauc_mrr_at_5_std value: -38.47155384792127 - type: nauc_ndcg_at_1000_diff1 value: -21.96644342707332 - type: nauc_ndcg_at_1000_max value: 54.04115629470727 - type: nauc_ndcg_at_1000_std value: -38.60954619686922 - type: nauc_ndcg_at_100_diff1 value: -28.508933576201116 - type: nauc_ndcg_at_100_max value: 53.62925134001747 - type: nauc_ndcg_at_100_std value: -41.66742945815351 - type: nauc_ndcg_at_10_diff1 value: -19.22314681419278 - type: nauc_ndcg_at_10_max value: 44.88305374351992 - type: nauc_ndcg_at_10_std value: -32.86086137849654 - type: nauc_ndcg_at_1_diff1 value: -18.978624452195685 - type: nauc_ndcg_at_1_max value: 88.73088274751811 - type: nauc_ndcg_at_1_std value: -52.46400143099903 - type: nauc_ndcg_at_20_diff1 value: -14.037813797353552 - type: nauc_ndcg_at_20_max value: 43.01748289241327 - type: nauc_ndcg_at_20_std value: -23.548077008049674 - type: nauc_ndcg_at_3_diff1 value: -19.9659903984576 - type: nauc_ndcg_at_3_max value: 64.99817864354436 - type: nauc_ndcg_at_3_std value: -45.246163550721796 - type: nauc_ndcg_at_5_diff1 value: -20.389688306447788 - type: nauc_ndcg_at_5_max value: 61.370293646369454 - type: 
nauc_ndcg_at_5_std value: -39.9134710853091 - type: nauc_precision_at_1000_diff1 value: -26.69952361901621 - type: nauc_precision_at_1000_max value: 46.40932456102013 - type: nauc_precision_at_1000_std value: -37.38094677778857 - type: nauc_precision_at_100_diff1 value: -29.692268260058146 - type: nauc_precision_at_100_max value: 49.265913223173584 - type: nauc_precision_at_100_std value: -41.45888232985447 - type: nauc_precision_at_10_diff1 value: -20.974428245377048 - type: nauc_precision_at_10_max value: 53.924262890679564 - type: nauc_precision_at_10_std value: -35.74456192649867 - type: nauc_precision_at_1_diff1 value: -18.978624452195685 - type: nauc_precision_at_1_max value: 88.73088274751811 - type: nauc_precision_at_1_std value: -52.46400143099903 - type: nauc_precision_at_20_diff1 value: -23.03848763224966 - type: nauc_precision_at_20_max value: 51.19001778609016 - type: nauc_precision_at_20_std value: -33.25265416139501 - type: nauc_precision_at_3_diff1 value: -19.497362250879267 - type: nauc_precision_at_3_max value: 64.71277842907384 - type: nauc_precision_at_3_std value: -44.512016412661204 - type: nauc_precision_at_5_diff1 value: -18.918918918918912 - type: nauc_precision_at_5_max value: 64.89456489456494 - type: nauc_precision_at_5_std value: -37.37960880818024 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: -44.51937508102329 - type: nauc_recall_at_100_max value: 25.75429602376942 - type: nauc_recall_at_100_std value: -33.30783195688129 - type: nauc_recall_at_10_diff1 value: -18.776401920240275 - type: nauc_recall_at_10_max value: 23.00791681188562 - type: nauc_recall_at_10_std value: -21.576198296256532 - type: nauc_recall_at_1_diff1 value: -13.741863044507388 - type: nauc_recall_at_1_max value: 88.80266056441289 - type: nauc_recall_at_1_std value: -52.44805080502242 - type: nauc_recall_at_20_diff1 value: -3.8724115673803343 - type: nauc_recall_at_20_max value: 21.50124528790692 - type: nauc_recall_at_20_std value: -1.6719812367243132 - type: nauc_recall_at_3_diff1 value: -20.21079163108882 - type: nauc_recall_at_3_max value: 42.152167178196684 - type: nauc_recall_at_3_std value: -36.258746145318526 - type: nauc_recall_at_5_diff1 value: -22.10269915203519 - type: nauc_recall_at_5_max value: 43.30767031613079 - type: nauc_recall_at_5_std value: -27.398704255640478 - type: ndcg_at_1 value: 6.0 - type: ndcg_at_10 value: 7.432999999999999 - type: ndcg_at_100 value: 26.354 - type: ndcg_at_1000 value: 30.558000000000003 - type: ndcg_at_20 value: 11.143 - type: ndcg_at_3 value: 7.979 - type: ndcg_at_5 value: 6.81 - type: precision_at_1 value: 6.0 - type: precision_at_10 value: 4.2 - type: precision_at_100 value: 3.1199999999999997 - type: precision_at_1000 value: 0.38999999999999996 - type: precision_at_20 value: 4.2 - type: precision_at_3 value: 8.0 - type: precision_at_5 value: 5.6000000000000005 - type: recall_at_1 value: 0.604 - type: recall_at_10 value: 9.678 - type: recall_at_100 value: 78.645 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 20.79 - type: recall_at_3 value: 4.261 - type: recall_at_5 value: 5.011 task: type: Retrieval - dataset: config: default name: MTEB AILAStatutes (default) revision: ebfcd844eadd3d667efa3c57fc5c8c87f5c2867e split: test type: mteb/AILA_statutes metrics: - type: main_score value: 13.624 - type: map_at_1 value: 1.7999999999999998 - type: map_at_10 value: 6.41 - type: map_at_100 value: 
11.995000000000001 - type: map_at_1000 value: 11.995000000000001 - type: map_at_20 value: 7.33 - type: map_at_3 value: 4.089 - type: map_at_5 value: 5.192 - type: mrr_at_1 value: 8.0 - type: mrr_at_10 value: 20.935714285714287 - type: mrr_at_100 value: 23.02755974294914 - type: mrr_at_1000 value: 23.02755974294914 - type: mrr_at_20 value: 22.1038126476207 - type: mrr_at_3 value: 15.333333333333332 - type: mrr_at_5 value: 19.533333333333335 - type: nauc_map_at_1000_diff1 value: 5.278882422253006 - type: nauc_map_at_1000_max value: 3.7333073133608896 - type: nauc_map_at_1000_std value: -4.5637189871999775 - type: nauc_map_at_100_diff1 value: 5.278882422253006 - type: nauc_map_at_100_max value: 3.7333073133608896 - type: nauc_map_at_100_std value: -4.5637189871999775 - type: nauc_map_at_10_diff1 value: 8.570212263630141 - type: nauc_map_at_10_max value: -6.6489980060039295 - type: nauc_map_at_10_std value: -12.162352126704402 - type: nauc_map_at_1_diff1 value: 7.476969859583216 - type: nauc_map_at_1_max value: -26.629997316876853 - type: nauc_map_at_1_std value: -23.469874489461308 - type: nauc_map_at_20_diff1 value: 7.222345063366828 - type: nauc_map_at_20_max value: -2.5103197323267223 - type: nauc_map_at_20_std value: -10.997015623527455 - type: nauc_map_at_3_diff1 value: 14.924734426277178 - type: nauc_map_at_3_max value: -11.92937537932614 - type: nauc_map_at_3_std value: -4.9319666083973255 - type: nauc_map_at_5_diff1 value: 8.080773945621521 - type: nauc_map_at_5_max value: -3.8175754142607836 - type: nauc_map_at_5_std value: -4.541639774033337 - type: nauc_mrr_at_1000_diff1 value: 2.4122089783406646 - type: nauc_mrr_at_1000_max value: -15.876004562207497 - type: nauc_mrr_at_1000_std value: -12.985028057822372 - type: nauc_mrr_at_100_diff1 value: 2.4122089783406646 - type: nauc_mrr_at_100_max value: -15.876004562207497 - type: nauc_mrr_at_100_std value: -12.985028057822372 - type: nauc_mrr_at_10_diff1 value: 0.2857311186354727 - type: nauc_mrr_at_10_max value: -14.63697545190418 - type: nauc_mrr_at_10_std value: -12.056570964159198 - type: nauc_mrr_at_1_diff1 value: 6.868795277703242 - type: nauc_mrr_at_1_max value: -24.845720418567222 - type: nauc_mrr_at_1_std value: -20.686879527770337 - type: nauc_mrr_at_20_diff1 value: 1.8452171261188577 - type: nauc_mrr_at_20_max value: -15.538023663956924 - type: nauc_mrr_at_20_std value: -13.690749771450164 - type: nauc_mrr_at_3_diff1 value: 10.557261573838256 - type: nauc_mrr_at_3_max value: -20.946427791765498 - type: nauc_mrr_at_3_std value: -9.815750025468983 - type: nauc_mrr_at_5_diff1 value: 4.101442020672411 - type: nauc_mrr_at_5_max value: -14.963605604722682 - type: nauc_mrr_at_5_std value: -9.917384084595511 - type: nauc_ndcg_at_1000_diff1 value: 0.04370368246080858 - type: nauc_ndcg_at_1000_max value: -0.818088536466922 - type: nauc_ndcg_at_1000_std value: -4.74569960455296 - type: nauc_ndcg_at_100_diff1 value: 0.04370368246080858 - type: nauc_ndcg_at_100_max value: -0.818088536466922 - type: nauc_ndcg_at_100_std value: -4.74569960455296 - type: nauc_ndcg_at_10_diff1 value: 1.2847289677534977 - type: nauc_ndcg_at_10_max value: -6.3756503900224955 - type: nauc_ndcg_at_10_std value: -12.98730478286347 - type: nauc_ndcg_at_1_diff1 value: 6.868795277703242 - type: nauc_ndcg_at_1_max value: -24.845720418567222 - type: nauc_ndcg_at_1_std value: -20.686879527770337 - type: nauc_ndcg_at_20_diff1 value: 0.777375339231765 - type: nauc_ndcg_at_20_max value: -0.9649148688381876 - type: nauc_ndcg_at_20_std value: -14.374528790697976 - type: 
nauc_ndcg_at_3_diff1 value: 11.34233767766492 - type: nauc_ndcg_at_3_max value: -13.185097340604685 - type: nauc_ndcg_at_3_std value: -1.42817114044502 - type: nauc_ndcg_at_5_diff1 value: 3.6861855424314394 - type: nauc_ndcg_at_5_max value: -3.8049446945965877 - type: nauc_ndcg_at_5_std value: -3.627047155464453 - type: nauc_precision_at_1000_diff1 value: -23.534146832293555 - type: nauc_precision_at_1000_max value: 7.621521743107654 - type: nauc_precision_at_1000_std value: 31.79231993560317 - type: nauc_precision_at_100_diff1 value: -23.534146832293136 - type: nauc_precision_at_100_max value: 7.6215217431077615 - type: nauc_precision_at_100_std value: 31.792319935603174 - type: nauc_precision_at_10_diff1 value: -9.295902835532825 - type: nauc_precision_at_10_max value: -3.516562838357381 - type: nauc_precision_at_10_std value: -9.542266229384722 - type: nauc_precision_at_1_diff1 value: 6.868795277703242 - type: nauc_precision_at_1_max value: -24.845720418567222 - type: nauc_precision_at_1_std value: -20.686879527770337 - type: nauc_precision_at_20_diff1 value: -9.74438544160727 - type: nauc_precision_at_20_max value: 8.895012105242024 - type: nauc_precision_at_20_std value: -10.653950589210957 - type: nauc_precision_at_3_diff1 value: 8.920936116382022 - type: nauc_precision_at_3_max value: -10.246679316888065 - type: nauc_precision_at_3_std value: 5.611638203668553 - type: nauc_precision_at_5_diff1 value: -8.265025821338345 - type: nauc_precision_at_5_max value: 7.359630809801093 - type: nauc_precision_at_5_std value: 7.003625975167535 - type: nauc_recall_at_1000_diff1 value: .nan - type: nauc_recall_at_1000_max value: .nan - type: nauc_recall_at_1000_std value: .nan - type: nauc_recall_at_100_diff1 value: .nan - type: nauc_recall_at_100_max value: .nan - type: nauc_recall_at_100_std value: .nan - type: nauc_recall_at_10_diff1 value: -1.798034642140945 - type: nauc_recall_at_10_max value: 0.6924952930762724 - type: nauc_recall_at_10_std value: -13.706398349868037 - type: nauc_recall_at_1_diff1 value: 7.476969859583216 - type: nauc_recall_at_1_max value: -26.629997316876853 - type: nauc_recall_at_1_std value: -23.469874489461308 - type: nauc_recall_at_20_diff1 value: -2.659819202817919 - type: nauc_recall_at_20_max value: 10.517274540935807 - type: nauc_recall_at_20_std value: -14.235421011543991 - type: nauc_recall_at_3_diff1 value: 15.662853297442803 - type: nauc_recall_at_3_max value: -11.663877606927189 - type: nauc_recall_at_3_std value: -2.341470241427359 - type: nauc_recall_at_5_diff1 value: 2.273326115596832 - type: nauc_recall_at_5_max value: 2.8669632025879537 - type: nauc_recall_at_5_std value: -0.3450165007891684 - type: ndcg_at_1 value: 8.0 - type: ndcg_at_10 value: 13.624 - type: ndcg_at_100 value: 38.109 - type: ndcg_at_1000 value: 38.109 - type: ndcg_at_20 value: 16.907 - type: ndcg_at_3 value: 9.45 - type: ndcg_at_5 value: 10.598 - type: precision_at_1 value: 8.0 - type: precision_at_10 value: 7.3999999999999995 - type: precision_at_100 value: 4.34 - type: precision_at_1000 value: 0.434 - type: precision_at_20 value: 5.5 - type: precision_at_3 value: 10.0 - type: precision_at_5 value: 10.0 - type: recall_at_1 value: 1.7999999999999998 - type: recall_at_10 value: 18.333 - type: recall_at_100 value: 100.0 - type: recall_at_1000 value: 100.0 - type: recall_at_20 value: 26.333000000000002 - type: recall_at_3 value: 7.867 - type: recall_at_5 value: 12.333 task: type: Retrieval - dataset: config: default name: MTEB ARCChallenge (default) revision: 
c481e0da3dcbbad8bce7721dea9085b74320a0a3 split: test type: RAR-b/ARC-Challenge metrics: - type: main_score value: 3.8449999999999998 - type: map_at_1 value: 1.536 - type: map_at_10 value: 2.902 - type: map_at_100 value: 3.2259999999999995 - type: map_at_1000 value: 3.309 - type: map_at_20 value: 3.061 - type: map_at_3 value: 2.204 - type: map_at_5 value: 2.656 - type: mrr_at_1 value: 1.5358361774744027 - type: mrr_at_10 value: 2.902107373097134 - type: mrr_at_100 value: 3.2259697277173585 - type: mrr_at_1000 value: 3.309141234079007 - type: mrr_at_20 value: 3.0608339226581975 - type: mrr_at_3 value: 2.204209328782707 - type: mrr_at_5 value: 2.6564277588168363 - type: nauc_map_at_1000_diff1 value: 6.6349335671175 - type: nauc_map_at_1000_max value: 10.045752081479547 - type: nauc_map_at_1000_std value: 5.17373675499246 - type: nauc_map_at_100_diff1 value: 6.6240618235225135 - type: nauc_map_at_100_max value: 10.244151375429777 - type: nauc_map_at_100_std value: 5.305639061848512 - type: nauc_map_at_10_diff1 value: 7.5024069352343 - type: nauc_map_at_10_max value: 11.928684625428838 - type: nauc_map_at_10_std value: 5.016380398843673 - type: nauc_map_at_1_diff1 value: 17.26912687174127 - type: nauc_map_at_1_max value: 6.265273970269121 - type: nauc_map_at_1_std value: -4.8796731336600825 - type: nauc_map_at_20_diff1 value: 7.120932496690847 - type: nauc_map_at_20_max value: 11.15762860873897 - type: nauc_map_at_20_std value: 5.342837705336892 - type: nauc_map_at_3_diff1 value: 7.138259469017607 - type: nauc_map_at_3_max value: 8.348409228816523 - type: nauc_map_at_3_std value: 6.767314043423357 - type: nauc_map_at_5_diff1 value: 7.239963996009633 - type: nauc_map_at_5_max value: 11.068225118567208 - type: nauc_map_at_5_std value: 5.0851302044955835 - type: nauc_mrr_at_1000_diff1 value: 6.6349335671175 - type: nauc_mrr_at_1000_max value: 10.045752081479547 - type: nauc_mrr_at_1000_std value: 5.17373675499246 - type: nauc_mrr_at_100_diff1 value: 6.6240618235225135 - type: nauc_mrr_at_100_max value: 10.244151375429777 - type: nauc_mrr_at_100_std value: 5.305639061848512 - type: nauc_mrr_at_10_diff1 value: 7.5024069352343 - type: nauc_mrr_at_10_max value: 11.928684625428838 - type: nauc_mrr_at_10_std value: 5.016380398843673 - type: nauc_mrr_at_1_diff1 value: 17.26912687174127 - type: nauc_mrr_at_1_max value: 6.265273970269121 - type: nauc_mrr_at_1_std value: -4.8796731336600825 - type: nauc_mrr_at_20_diff1 value: 7.120932496690847 - type: nauc_mrr_at_20_max value: 11.15762860873897 - type: nauc_mrr_at_20_std value: 5.342837705336892 - type: nauc_mrr_at_3_diff1 value: 7.138259469017607 - type: nauc_mrr_at_3_max value: 8.348409228816523 - type: nauc_mrr_at_3_std value: 6.767314043423357 - type: nauc_mrr_at_5_diff1 value: 7.239963996009633 - type: nauc_mrr_at_5_max value: 11.068225118567208 - type: nauc_mrr_at_5_std value: 5.0851302044955835 - type: nauc_ndcg_at_1000_diff1 value: 3.49547273108029 - type: nauc_ndcg_at_1000_max value: 4.987679792326471 - type: nauc_ndcg_at_1000_std value: 4.792386661474078 - type: nauc_ndcg_at_100_diff1 value: 3.423765430486521 - type: nauc_ndcg_at_100_max value: 7.215346434617728 - type: nauc_ndcg_at_100_std value: 6.1334416812657055 - type: nauc_ndcg_at_10_diff1 value: 6.211453661355799 - type: nauc_ndcg_at_10_max value: 13.686949611790244 - type: nauc_ndcg_at_10_std value: 5.334521959588366 - type: nauc_ndcg_at_1_diff1 value: 17.26912687174127 - type: nauc_ndcg_at_1_max value: 6.265273970269121 - type: nauc_ndcg_at_1_std value: -4.8796731336600825 - type: 
nauc_ndcg_at_20_diff1 value: 5.269692894653953 - type: nauc_ndcg_at_20_max value: 11.466483119515134 - type: nauc_ndcg_at_20_std value: 6.208531132010362 - type: nauc_ndcg_at_3_diff1 value: 4.841534563021528 - type: nauc_ndcg_at_3_max value: 8.715299190678648 - type: nauc_ndcg_at_3_std value: 8.889648909403514 - type: nauc_ndcg_at_5_diff1 value: 5.5149763431777385 - type: nauc_ndcg_at_5_max value: 12.41579830649011 - type: nauc_ndcg_at_5_std value: 5.8568738487427865 - type: nauc_precision_at_1000_diff1 value: 1.0890041942217588 - type: nauc_precision_at_1000_max value: -1.074889035912781 - type: nauc_precision_at_1000_std value: 3.7386321369399207 - type: nauc_precision_at_100_diff1 value: 0.24898034725209317 - type: nauc_precision_at_100_max value: 2.6625432444853345 - type: nauc_precision_at_100_std value: 6.760865885892171 - type: nauc_precision_at_10_diff1 value: 4.728605530960451 - type: nauc_precision_at_10_max value: 16.098011324014156 - type: nauc_precision_at_10_std value: 5.294918338481019 - type: nauc_precision_at_1_diff1 value: 17.26912687174127 - type: nauc_precision_at_1_max value: 6.265273970269121 - type: nauc_precision_at_1_std value: -4.8796731336600825 - type: nauc_precision_at_20_diff1 value: 3.1605384012118063 - type: nauc_precision_at_20_max value: 11.228945826678288 - type: nauc_precision_at_20_std value: 7.0587619686895975 - type: nauc_precision_at_3_diff1 value: 0.15384889210192554 - type: nauc_precision_at_3_max value: 9.441612052649862 - type: nauc_precision_at_3_std value: 13.110663421557597 - type: nauc_precision_at_5_diff1 value: 2.9177590765544803 - type: nauc_precision_at_5_max value: 14.583883090410385 - type: nauc_precision_at_5_std value: 6.761154902844139 - type: nauc_recall_at_1000_diff1 value: 1.0890041942217838 - type: nauc_recall_at_1000_max value: -1.0748890359127414 - type: nauc_recall_at_1000_std value: 3.7386321369399447 - type: nauc_recall_at_100_diff1 value: 0.2489803472520955 - type: nauc_recall_at_100_max value: 2.6625432444853385 - type: nauc_recall_at_100_std value: 6.7608658858921835 - type: nauc_recall_at_10_diff1 value: 4.728605530960435 - type: nauc_recall_at_10_max value: 16.09801132401412 - type: nauc_recall_at_10_std value: 5.294918338481006 - type: nauc_recall_at_1_diff1 value: 17.26912687174127 - type: nauc_recall_at_1_max value: 6.265273970269121 - type: nauc_recall_at_1_std value: -4.8796731336600825 - type: nauc_recall_at_20_diff1 value: 3.1605384012117814 - type: nauc_recall_at_20_max value: 11.22894582667827 - type: nauc_recall_at_20_std value: 7.0587619686895655 - type: nauc_recall_at_3_diff1 value: 0.15384889210195152 - type: nauc_recall_at_3_max value: 9.441612052649868 - type: nauc_recall_at_3_std value: 13.110663421557629 - type: nauc_recall_at_5_diff1 value: 2.917759076554466 - type: nauc_recall_at_5_max value: 14.583883090410346 - type: nauc_recall_at_5_std value: 6.761154902844119 - type: ndcg_at_1 value: 1.536 - type: ndcg_at_10 value: 3.8449999999999998 - type: ndcg_at_100 value: 5.772 - type: ndcg_at_1000 value: 8.509 - type: ndcg_at_20 value: 4.426 - type: ndcg_at_3 value: 2.447 - type: ndcg_at_5 value: 3.258 - type: precision_at_1 value: 1.536 - type: precision_at_10 value: 0.6910000000000001 - type: precision_at_100 value: 0.168 - type: precision_at_1000 value: 0.04 - type: precision_at_20 value: 0.461 - type: precision_at_3 value: 1.052 - type: precision_at_5 value: 1.024 - type: recall_at_1 value: 1.536 - type: recall_at_10 value: 6.9110000000000005 - type: recall_at_100 value: 16.808999999999997 - type: 
recall_at_1000 value: 39.505 - type: recall_at_20 value: 9.215 - type: recall_at_3 value: 3.157 - type: recall_at_5 value: 5.119 task: type: Retrieval - dataset: config: default name: MTEB AlphaNLI (default) revision: 303f40ef3d50918d3dc43577d33f2f7344ad72c1 split: test type: RAR-b/alphanli metrics: - type: main_score value: 14.155000000000001 - type: map_at_1 value: 8.616 - type: map_at_10 value: 12.151 - type: map_at_100 value: 12.713 - type: map_at_1000 value: 12.790000000000001 - type: map_at_20 value: 12.478 - type: map_at_3 value: 10.955 - type: map_at_5 value: 11.68 - type: mrr_at_1 value: 8.616187989556137 - type: mrr_at_10 value: 12.151197728873969 - type: mrr_at_100 value: 12.713435989405935 - type: mrr_at_1000 value: 12.789534083463522 - type: mrr_at_20 value: 12.478389119397455 - type: mrr_at_3 value: 10.955178416013926 - type: mrr_at_5 value: 11.679721496953876 - type: nauc_map_at_1000_diff1 value: 38.986525912703435 - type: nauc_map_at_1000_max value: 12.219692225747707 - type: nauc_map_at_1000_std value: 1.2585343212684903 - type: nauc_map_at_100_diff1 value: 39.02868722054371 - type: nauc_map_at_100_max value: 12.248003227250122 - type: nauc_map_at_100_std value: 1.2163208553030314 - type: nauc_map_at_10_diff1 value: 40.110717683039525 - type: nauc_map_at_10_max value: 12.78605835422205 - type: nauc_map_at_10_std value: 0.6481692151906001 - type: nauc_map_at_1_diff1 value: 48.456097345786745 - type: nauc_map_at_1_max value: 14.981869102701411 - type: nauc_map_at_1_std value: -3.0707717911327226 - type: nauc_map_at_20_diff1 value: 39.42161381753684 - type: nauc_map_at_20_max value: 12.341429085851182 - type: nauc_map_at_20_std value: 0.8391480542456798 - type: nauc_map_at_3_diff1 value: 42.64699229741736 - type: nauc_map_at_3_max value: 13.681396294884618 - type: nauc_map_at_3_std value: -1.3518984290812146 - type: nauc_map_at_5_diff1 value: 41.32077190616691 - type: nauc_map_at_5_max value: 13.136429689834436 - type: nauc_map_at_5_std value: 0.32856286589434136 - type: nauc_mrr_at_1000_diff1 value: 38.98652591920884 - type: nauc_mrr_at_1000_max value: 12.219692104355413 - type: nauc_mrr_at_1000_std value: 1.2585339367622461 - type: nauc_mrr_at_100_diff1 value: 39.02868722054371 - type: nauc_mrr_at_100_max value: 12.248003227250122 - type: nauc_mrr_at_100_std value: 1.2163208553030314 - type: nauc_mrr_at_10_diff1 value: 40.110717683039525 - type: nauc_mrr_at_10_max value: 12.78605835422205 - type: nauc_mrr_at_10_std value: 0.6481692151906001 - type: nauc_mrr_at_1_diff1 value: 48.456097345786745 - type: nauc_mrr_at_1_max value: 14.981869102701411 - type: nauc_mrr_at_1_std value: -3.0707717911327226 - type: nauc_mrr_at_20_diff1 value: 39.42161381753684 - type: nauc_mrr_at_20_max value: 12.341429085851182 - type: nauc_mrr_at_20_std value: 0.8391480542456798 - type: nauc_mrr_at_3_diff1 value: 42.64699229741736 - type: nauc_mrr_at_3_max value: 13.681396294884618 - type: nauc_mrr_at_3_std value: -1.3518984290812146 - type: nauc_mrr_at_5_diff1 value: 41.32077190616691 - type: nauc_mrr_at_5_max value: 13.136429689834436 - type: nauc_mrr_at_5_std value: 0.32856286589434136 - type: nauc_ndcg_at_1000_diff1 value: 31.611075970442926 - type: nauc_ndcg_at_1000_max value: 9.936393145930218 - type: nauc_ndcg_at_1000_std value: 6.71067891152211 - type: nauc_ndcg_at_100_diff1 value: 32.58290081795884 - type: nauc_ndcg_at_100_max value: 9.842659588765363 - type: nauc_ndcg_at_100_std value: 5.498554329517975 - type: nauc_ndcg_at_10_diff1 value: 36.75293874754393 - type: nauc_ndcg_at_10_max 
value: 11.803286140726776 - type: nauc_ndcg_at_10_std value: 2.5976940855692074 - type: nauc_ndcg_at_1_diff1 value: 48.456097345786745 - type: nauc_ndcg_at_1_max value: 14.981869102701411 - type: nauc_ndcg_at_1_std value: -3.0707717911327226 - type: nauc_ndcg_at_20_diff1 value: 34.638144952713866 - type: nauc_ndcg_at_20_max value: 10.449640737261305 - type: nauc_ndcg_at_20_std value: 3.2195824007114675 - type: nauc_ndcg_at_3_diff1 value: 41.24511499401773 - type: nauc_ndcg_at_3_max value: 13.384003644595388 - type: nauc_ndcg_at_3_std value: -0.7628562047692254 - type: nauc_ndcg_at_5_diff1 value: 39.2155849544026 - type: nauc_ndcg_at_5_max value: 12.577199638671265 - type: nauc_ndcg_at_5_std value: 2.0185641778476127 - type: nauc_precision_at_1000_diff1 value: 11.879578040836442 - type: nauc_precision_at_1000_max value: 5.358855936542234 - type: nauc_precision_at_1000_std value: 23.471172109373907 - type: nauc_precision_at_100_diff1 value: 18.24569021314919 - type: nauc_precision_at_100_max value: 4.309548949123852 - type: nauc_precision_at_100_std value: 15.884619703445772 - type: nauc_precision_at_10_diff1 value: 29.512994402519226 - type: nauc_precision_at_10_max value: 9.634695132770453 - type: nauc_precision_at_10_std value: 6.795536654948908 - type: nauc_precision_at_1_diff1 value: 48.456097345786745 - type: nauc_precision_at_1_max value: 14.981869102701411 - type: nauc_precision_at_1_std value: -3.0707717911327226 - type: nauc_precision_at_20_diff1 value: 24.18871405534599 - type: nauc_precision_at_20_max value: 6.090279031407053 - type: nauc_precision_at_20_std value: 8.291882200513058 - type: nauc_precision_at_3_diff1 value: 37.926451300682054 - type: nauc_precision_at_3_max value: 12.684618853985219 - type: nauc_precision_at_3_std value: 0.6806740647349011 - type: nauc_precision_at_5_diff1 value: 34.550519136938384 - type: nauc_precision_at_5_max value: 11.344674575354038 - type: nauc_precision_at_5_std value: 5.985578706127787 - type: nauc_recall_at_1000_diff1 value: 11.879578040836519 - type: nauc_recall_at_1000_max value: 5.358855936542304 - type: nauc_recall_at_1000_std value: 23.47117210937398 - type: nauc_recall_at_100_diff1 value: 18.245690213149167 - type: nauc_recall_at_100_max value: 4.3095489491238155 - type: nauc_recall_at_100_std value: 15.88461970344576 - type: nauc_recall_at_10_diff1 value: 29.512994402519215 - type: nauc_recall_at_10_max value: 9.634695132770442 - type: nauc_recall_at_10_std value: 6.795536654948889 - type: nauc_recall_at_1_diff1 value: 48.456097345786745 - type: nauc_recall_at_1_max value: 14.981869102701411 - type: nauc_recall_at_1_std value: -3.0707717911327226 - type: nauc_recall_at_20_diff1 value: 24.188714055346 - type: nauc_recall_at_20_max value: 6.09027903140705 - type: nauc_recall_at_20_std value: 8.291882200513056 - type: nauc_recall_at_3_diff1 value: 37.92645130068206 - type: nauc_recall_at_3_max value: 12.684618853985235 - type: nauc_recall_at_3_std value: 0.6806740647349308 - type: nauc_recall_at_5_diff1 value: 34.55051913693838 - type: nauc_recall_at_5_max value: 11.344674575354015 - type: nauc_recall_at_5_std value: 5.985578706127789 - type: ndcg_at_1 value: 8.616 - type: ndcg_at_10 value: 14.155000000000001 - type: ndcg_at_100 value: 17.102 - type: ndcg_at_1000 value: 19.631 - type: ndcg_at_20 value: 15.344 - type: ndcg_at_3 value: 11.728 - type: ndcg_at_5 value: 13.025999999999998 - type: precision_at_1 value: 8.616 - type: precision_at_10 value: 2.056 - type: precision_at_100 value: 0.349 - type: precision_at_1000 value: 
0.055999999999999994 - type: precision_at_20 value: 1.2630000000000001 - type: precision_at_3 value: 4.656 - type: precision_at_5 value: 3.42 - type: recall_at_1 value: 8.616 - type: recall_at_10 value: 20.561 - type: recall_at_100 value: 34.855999999999995 - type: recall_at_1000 value: 55.875 - type: recall_at_20 value: 25.261 - type: recall_at_3 value: 13.969000000000001 - type: recall_at_5 value: 17.102 task: type: Retrieval - dataset: config: default name: MTEB AmazonPolarityClassification (default) revision: e2d317d38cd51312af73b3d32a06d1a08b442046 split: test type: mteb/amazon_polarity metrics: - type: accuracy value: 68.359575 - type: ap value: 63.04430514461716 - type: ap_weighted value: 63.04430514461716 - type: f1 value: 68.12645282836293 - type: f1_weighted value: 68.12645282836293 - type: main_score value: 68.359575 task: type: Classification - dataset: config: default name: MTEB ArguAna (default) revision: c22ab2a51041ffd869aaddef7af8d8215647e41a split: test type: mteb/arguana metrics: - type: main_score value: 32.031 - type: map_at_1 value: 15.363 - type: map_at_10 value: 25.629999999999995 - type: map_at_100 value: 26.851999999999997 - type: map_at_1000 value: 26.916 - type: map_at_20 value: 26.401999999999997 - type: map_at_3 value: 21.764 - type: map_at_5 value: 23.798 - type: mrr_at_1 value: 15.647226173541965 - type: mrr_at_10 value: 25.74270699270699 - type: mrr_at_100 value: 26.95759156481371 - type: mrr_at_1000 value: 27.02192945787223 - type: mrr_at_20 value: 26.50752832488611 - type: mrr_at_3 value: 21.894262683736372 - type: mrr_at_5 value: 23.889284020862938 - type: nauc_map_at_1000_diff1 value: 9.717094498857836 - type: nauc_map_at_1000_max value: 0.006128824635771366 - type: nauc_map_at_1000_std value: 9.951724867994008 - type: nauc_map_at_100_diff1 value: 9.720746167116648 - type: nauc_map_at_100_max value: 0.03921480687966482 - type: nauc_map_at_100_std value: 10.01422840642898 - type: nauc_map_at_10_diff1 value: 9.629884802439925 - type: nauc_map_at_10_max value: -0.18895622006721804 - type: nauc_map_at_10_std value: 8.801754758016564 - type: nauc_map_at_1_diff1 value: 10.255415606776134 - type: nauc_map_at_1_max value: -2.7429221309654044 - type: nauc_map_at_1_std value: 6.866297123270523 - type: nauc_map_at_20_diff1 value: 9.707948736975794 - type: nauc_map_at_20_max value: 0.01892213753638095 - type: nauc_map_at_20_std value: 9.681790764357237 - type: nauc_map_at_3_diff1 value: 8.344213156710568 - type: nauc_map_at_3_max value: -2.0132121856529483 - type: nauc_map_at_3_std value: 8.554071405515435 - type: nauc_map_at_5_diff1 value: 9.14495583661473 - type: nauc_map_at_5_max value: -1.379873148644914 - type: nauc_map_at_5_std value: 9.044652095982553 - type: nauc_mrr_at_1000_diff1 value: 8.520276824384093 - type: nauc_mrr_at_1000_max value: -0.41053299382643904 - type: nauc_mrr_at_1000_std value: 9.770616411797125 - type: nauc_mrr_at_100_diff1 value: 8.526357726757498 - type: nauc_mrr_at_100_max value: -0.37675957362198204 - type: nauc_mrr_at_100_std value: 9.833172972935825 - type: nauc_mrr_at_10_diff1 value: 8.504469942302443 - type: nauc_mrr_at_10_max value: -0.5555290478828475 - type: nauc_mrr_at_10_std value: 8.67347986151777 - type: nauc_mrr_at_1_diff1 value: 8.924965691375194 - type: nauc_mrr_at_1_max value: -2.472212128016505 - type: nauc_mrr_at_1_std value: 6.727737069169365 - type: nauc_mrr_at_20_diff1 value: 8.527008337552795 - type: nauc_mrr_at_20_max value: -0.39130673567011953 - type: nauc_mrr_at_20_std value: 9.504234612175194 - type: 
nauc_mrr_at_3_diff1 value: 7.028185998793612 - type: nauc_mrr_at_3_max value: -2.531551924396665 - type: nauc_mrr_at_3_std value: 8.36654956798548 - type: nauc_mrr_at_5_diff1 value: 7.946200662893088 - type: nauc_mrr_at_5_max value: -1.8450232157342275 - type: nauc_mrr_at_5_std value: 8.855536533297968 - type: nauc_ndcg_at_1000_diff1 value: 10.148046270962398 - type: nauc_ndcg_at_1000_max value: 1.696424601847897 - type: nauc_ndcg_at_1000_std value: 13.134595506556405 - type: nauc_ndcg_at_100_diff1 value: 10.478061817612778 - type: nauc_ndcg_at_100_max value: 2.790758084465661 - type: nauc_ndcg_at_100_std value: 14.964733623242607 - type: nauc_ndcg_at_10_diff1 value: 10.372927964606154 - type: nauc_ndcg_at_10_max value: 1.9588405301435734 - type: nauc_ndcg_at_10_std value: 9.558148538160015 - type: nauc_ndcg_at_1_diff1 value: 10.255415606776134 - type: nauc_ndcg_at_1_max value: -2.7429221309654044 - type: nauc_ndcg_at_1_std value: 6.866297123270523 - type: nauc_ndcg_at_20_diff1 value: 10.807055510827903 - type: nauc_ndcg_at_20_max value: 2.873981784514884 - type: nauc_ndcg_at_20_std value: 12.684265114648849 - type: nauc_ndcg_at_3_diff1 value: 7.99043332908002 - type: nauc_ndcg_at_3_max value: -1.7537467389545258 - type: nauc_ndcg_at_3_std value: 9.282365459725794 - type: nauc_ndcg_at_5_diff1 value: 9.291919447241343 - type: nauc_ndcg_at_5_max value: -0.6986840661830845 - type: nauc_ndcg_at_5_std value: 10.155119795280289 - type: nauc_precision_at_1000_diff1 value: 5.534567864242971 - type: nauc_precision_at_1000_max value: 9.529106078051697 - type: nauc_precision_at_1000_std value: 62.0873447350283 - type: nauc_precision_at_100_diff1 value: 13.636774071684679 - type: nauc_precision_at_100_max value: 17.905397264353912 - type: nauc_precision_at_100_std value: 49.22170039944941 - type: nauc_precision_at_10_diff1 value: 12.676219389202528 - type: nauc_precision_at_10_max value: 8.164707652448252 - type: nauc_precision_at_10_std value: 11.361740427515855 - type: nauc_precision_at_1_diff1 value: 10.255415606776134 - type: nauc_precision_at_1_max value: -2.7429221309654044 - type: nauc_precision_at_1_std value: 6.866297123270523 - type: nauc_precision_at_20_diff1 value: 15.006293628353006 - type: nauc_precision_at_20_max value: 12.931321039045368 - type: nauc_precision_at_20_std value: 23.758750045585586 - type: nauc_precision_at_3_diff1 value: 7.18325478518931 - type: nauc_precision_at_3_max value: -1.1161637595134446 - type: nauc_precision_at_3_std value: 11.09645301286272 - type: nauc_precision_at_5_diff1 value: 9.780765614595015 - type: nauc_precision_at_5_max value: 1.0082157901430149 - type: nauc_precision_at_5_std value: 12.92929121494741 - type: nauc_recall_at_1000_diff1 value: 5.534567864242688 - type: nauc_recall_at_1000_max value: 9.529106078051411 - type: nauc_recall_at_1000_std value: 62.08734473502826 - type: nauc_recall_at_100_diff1 value: 13.63677407168474 - type: nauc_recall_at_100_max value: 17.905397264353898 - type: nauc_recall_at_100_std value: 49.2217003994493 - type: nauc_recall_at_10_diff1 value: 12.676219389202512 - type: nauc_recall_at_10_max value: 8.164707652448225 - type: nauc_recall_at_10_std value: 11.361740427515835 - type: nauc_recall_at_1_diff1 value: 10.255415606776134 - type: nauc_recall_at_1_max value: -2.7429221309654044 - type: nauc_recall_at_1_std value: 6.866297123270523 - type: nauc_recall_at_20_diff1 value: 15.006293628353069 - type: nauc_recall_at_20_max value: 12.931321039045434 - type: nauc_recall_at_20_std value: 23.75875004558557 - type: 
nauc_recall_at_3_diff1 value: 7.183254785189315 - type: nauc_recall_at_3_max value: -1.1161637595134306 - type: nauc_recall_at_3_std value: 11.096453012862733 - type: nauc_recall_at_5_diff1 value: 9.780765614595012 - type: nauc_recall_at_5_max value: 1.008215790143006 - type: nauc_recall_at_5_std value: 12.929291214947403 - type: ndcg_at_1 value: 15.363 - type: ndcg_at_10 value: 32.031 - type: ndcg_at_100 value: 38.122 - type: ndcg_at_1000 value: 39.864 - type: ndcg_at_20 value: 34.849999999999994 - type: ndcg_at_3 value: 23.965 - type: ndcg_at_5 value: 27.659 - type: precision_at_1 value: 15.363 - type: precision_at_10 value: 5.277 - type: precision_at_100 value: 0.8170000000000001 - type: precision_at_1000 value: 0.095 - type: precision_at_20 value: 3.197 - type: precision_at_3 value: 10.123 - type: precision_at_5 value: 7.881 - type: recall_at_1 value: 15.363 - type: recall_at_10 value: 52.774 - type: recall_at_100 value: 81.65 - type: recall_at_1000 value: 95.448 - type: recall_at_20 value: 63.94 - type: recall_at_3 value: 30.37 - type: recall_at_5 value: 39.403 task: type: Retrieval - dataset: config: default name: MTEB ArxivClassification (default) revision: f9bd92144ed76200d6eb3ce73a8bd4eba9ffdc85 split: test type: ccdv/arxiv-classification metrics: - type: accuracy value: 43.611999999999995 - type: f1 value: 40.930383763906484 - type: f1_weighted value: 41.404367816744276 - type: main_score value: 43.611999999999995 task: type: Classification - dataset: config: default name: MTEB ArxivClusteringP2P (default) revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: main_score value: 24.827354215343842 - type: v_measure value: 24.827354215343842 - type: v_measure_std value: 14.761042346861815 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringP2P.v2 (default) revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d split: test type: mteb/arxiv-clustering-p2p metrics: - type: main_score value: 29.14326814807588 - type: v_measure value: 29.14326814807588 - type: v_measure_std value: 16.354623518770328 task: type: Clustering - dataset: config: default name: MTEB ArxivClusteringS2S (default) revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 split: test type: mteb/arxiv-clustering-s2s metrics: - type: main_score value: 16.681456170594032 - type: v_measure value: 16.681456170594032 - type: v_measure_std value: 15.806408628434077 task: type: Clustering - dataset: config: default name: MTEB Banking77Classification (default) revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 split: test type: mteb/banking77 metrics: - type: accuracy value: 59.86363636363635 - type: f1 value: 58.3300719763065 - type: f1_weighted value: 58.3300719763065 - type: main_score value: 59.86363636363635 task: type: Classification - dataset: config: default name: MTEB BigPatentClustering (default) revision: 62d5330920bca426ce9d3c76ea914f15fc83e891 split: test type: jinaai/big-patent-clustering metrics: - type: main_score value: 17.208517091148714 - type: v_measure value: 17.208517091148714 - type: v_measure_std value: 0.698644666463382 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringP2P (default) revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 split: test type: mteb/biorxiv-clustering-p2p metrics: - type: main_score value: 19.998032819841395 - type: v_measure value: 19.998032819841395 - type: v_measure_std value: 0.7272995954630507 task: type: Clustering - dataset: config: default name: MTEB BiorxivClusteringS2S 
(default) revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 split: test type: mteb/biorxiv-clustering-s2s metrics: - type: main_score value: 12.672050490076508 - type: v_measure value: 12.672050490076508 - type: v_measure_std value: 0.7252965151579489 task: type: Clustering - dataset: config: default name: MTEB CEDRClassification (default) revision: c0ba03d058e3e1b2f3fd20518875a4563dd12db4 split: test type: ai-forever/cedr-classification metrics: - type: accuracy value: 38.95324123273113 - type: f1 value: 30.695742042129776 - type: lrap value: 64.53134962805646 - type: main_score value: 38.95324123273113 task: type: MultilabelClassification - dataset: config: default name: MTEB CPUSpeedTask (default) revision: '1.0' split: test type: 'CPUSpeedTask' metrics: - type: avg_words_per_sec value: 1171249.8059068616 - type: main_score value: 1171249.8059068616 - type: physical_cores value: 3600 - type: time_mean value: 31.018148149762837 - type: time_std value: 10.887230129351211 - type: total_cores value: 7200 task: type: Speed - dataset: config: default name: MTEB CQADupstackAndroidRetrieval (default) revision: f46a197baaae43b4f621051089b82a364682dfeb split: test type: mteb/cqadupstack-android metrics: - type: main_score value: 27.686 - type: map_at_1 value: 17.864 - type: map_at_10 value: 23.842 - type: map_at_100 value: 24.648999999999997 - type: map_at_1000 value: 24.771 - type: map_at_20 value: 24.277 - type: map_at_3 value: 21.938 - type: map_at_5 value: 23.058999999999997 - type: mrr_at_1 value: 21.888412017167383 - type: mrr_at_10 value: 27.934691282330764 - type: mrr_at_100 value: 28.58815942555481 - type: mrr_at_1000 value: 28.669575168001604 - type: mrr_at_20 value: 28.259041893075693 - type: mrr_at_3 value: 25.96566523605151 - type: mrr_at_5 value: 27.145922746781114 - type: nauc_map_at_1000_diff1 value: 38.9362657863528 - type: nauc_map_at_1000_max value: 26.39064664437522 - type: nauc_map_at_1000_std value: -0.3507878980807277 - type: nauc_map_at_100_diff1 value: 38.9305380779697 - type: nauc_map_at_100_max value: 26.37667481671251 - type: nauc_map_at_100_std value: -0.4107785241043359 - type: nauc_map_at_10_diff1 value: 38.90352635552967 - type: nauc_map_at_10_max value: 26.04843561328241 - type: nauc_map_at_10_std value: -1.0213929777227249 - type: nauc_map_at_1_diff1 value: 44.891250111700664 - type: nauc_map_at_1_max value: 27.415379429330695 - type: nauc_map_at_1_std value: -2.083016588225919 - type: nauc_map_at_20_diff1 value: 38.94728598104626 - type: nauc_map_at_20_max value: 26.321985371933916 - type: nauc_map_at_20_std value: -0.6740389120283213 - type: nauc_map_at_3_diff1 value: 40.75408309900131 - type: nauc_map_at_3_max value: 26.81466083992981 - type: nauc_map_at_3_std value: -1.3446416472047542 - type: nauc_map_at_5_diff1 value: 39.55391899732806 - type: nauc_map_at_5_max value: 26.73952942989369 - type: nauc_map_at_5_std value: -0.9241166864360354 - type: nauc_mrr_at_1000_diff1 value: 37.49322259212407 - type: nauc_mrr_at_1000_max value: 26.791861376982645 - type: nauc_mrr_at_1000_std value: -0.12058632966589165 - type: nauc_mrr_at_100_diff1 value: 37.47912707778518 - type: nauc_mrr_at_100_max value: 26.780040228801354 - type: nauc_mrr_at_100_std value: -0.13375233513915044 - type: nauc_mrr_at_10_diff1 value: 37.44982182358103 - type: nauc_mrr_at_10_max value: 26.579194370161574 - type: nauc_mrr_at_10_std value: -0.5519796223426987 - type: nauc_mrr_at_1_diff1 value: 43.78241372037574 - type: nauc_mrr_at_1_max value: 29.62575208874629 - type: nauc_mrr_at_1_std value: 
-0.7403872780711277 - type: nauc_mrr_at_20_diff1 value: 37.413002156119 - type: nauc_mrr_at_20_max value: 26.71157844066263 - type: nauc_mrr_at_20_std value: -0.3418018168926074 - type: nauc_mrr_at_3_diff1 value: 39.36718212836755 - type: nauc_mrr_at_3_max value: 27.755919798148643 - type: nauc_mrr_at_3_std value: -0.5118015715447669 - type: nauc_mrr_at_5_diff1 value: 38.108343388995614 - type: nauc_mrr_at_5_max value: 27.255156457755536 - type: nauc_mrr_at_5_std value: -0.33152296202161974 - type: nauc_ndcg_at_1000_diff1 value: 35.45874849790142 - type: nauc_ndcg_at_1000_max value: 26.06624958789977 - type: nauc_ndcg_at_1000_std value: 2.8510315350747746 - type: nauc_ndcg_at_100_diff1 value: 35.22563491603818 - type: nauc_ndcg_at_100_max value: 25.482125642505167 - type: nauc_ndcg_at_100_std value: 1.7230614371120136 - type: nauc_ndcg_at_10_diff1 value: 35.442027092978336 - type: nauc_ndcg_at_10_max value: 24.43872310681677 - type: nauc_ndcg_at_10_std value: -0.8836727526012238 - type: nauc_ndcg_at_1_diff1 value: 43.78241372037574 - type: nauc_ndcg_at_1_max value: 29.62575208874629 - type: nauc_ndcg_at_1_std value: -0.7403872780711277 - type: nauc_ndcg_at_20_diff1 value: 35.532620958116226 - type: nauc_ndcg_at_20_max value: 24.9995407161472 - type: nauc_ndcg_at_20_std value: 0.09407090543637946 - type: nauc_ndcg_at_3_diff1 value: 38.771875097129474 - type: nauc_ndcg_at_3_max value: 26.88398760762366 - type: nauc_ndcg_at_3_std value: -0.7925347887124169 - type: nauc_ndcg_at_5_diff1 value: 36.83295698854961 - type: nauc_ndcg_at_5_max value: 26.254070953306602 - type: nauc_ndcg_at_5_std value: -0.5384138224839687 - type: nauc_precision_at_1000_diff1 value: 3.830797202509721 - type: nauc_precision_at_1000_max value: 11.845342201460761 - type: nauc_precision_at_1000_std value: 9.148785863457954 - type: nauc_precision_at_100_diff1 value: 13.997075774954821 - type: nauc_precision_at_100_max value: 21.8795221100872 - type: nauc_precision_at_100_std value: 8.373324931296871 - type: nauc_precision_at_10_diff1 value: 22.14226604167402 - type: nauc_precision_at_10_max value: 21.908333662820144 - type: nauc_precision_at_10_std value: 2.023219601124639 - type: nauc_precision_at_1_diff1 value: 43.78241372037574 - type: nauc_precision_at_1_max value: 29.62575208874629 - type: nauc_precision_at_1_std value: -0.7403872780711277 - type: nauc_precision_at_20_diff1 value: 20.193510781013575 - type: nauc_precision_at_20_max value: 21.47063363375231 - type: nauc_precision_at_20_std value: 5.073093391207243 - type: nauc_precision_at_3_diff1 value: 33.320150724486965 - type: nauc_precision_at_3_max value: 28.42063777288856 - type: nauc_precision_at_3_std value: 1.3535730617388522 - type: nauc_precision_at_5_diff1 value: 26.972979755151126 - type: nauc_precision_at_5_max value: 27.35114981308005 - type: nauc_precision_at_5_std value: 1.5457768965552783 - type: nauc_recall_at_1000_diff1 value: 19.86231350512352 - type: nauc_recall_at_1000_max value: 24.527676453832008 - type: nauc_recall_at_1000_std value: 22.21772883429467 - type: nauc_recall_at_100_diff1 value: 23.132801377646004 - type: nauc_recall_at_100_max value: 20.988835029134467 - type: nauc_recall_at_100_std value: 8.793975445583824 - type: nauc_recall_at_10_diff1 value: 25.796766681233457 - type: nauc_recall_at_10_max value: 17.634361086885264 - type: nauc_recall_at_10_std value: -0.4776257668185774 - type: nauc_recall_at_1_diff1 value: 44.891250111700664 - type: nauc_recall_at_1_max value: 27.415379429330695 - type: nauc_recall_at_1_std value: 
-2.083016588225919 - type: nauc_recall_at_20_diff1 value: 25.714655008602115 - type: nauc_recall_at_20_max value: 19.791963050086874 - type: nauc_recall_at_20_std value: 1.9596491600238453 - type: nauc_recall_at_3_diff1 value: 34.63094367351514 - type: nauc_recall_at_3_max value: 23.49028309758934 - type: nauc_recall_at_3_std value: -0.8832533681499335 - type: nauc_recall_at_5_diff1 value: 30.296413916201175 - type: nauc_recall_at_5_max value: 22.27559868081795 - type: nauc_recall_at_5_std value: 0.7320693658757037 - type: ndcg_at_1 value: 21.887999999999998 - type: ndcg_at_10 value: 27.686 - type: ndcg_at_100 value: 31.363999999999997 - type: ndcg_at_1000 value: 34.605000000000004 - type: ndcg_at_20 value: 28.93 - type: ndcg_at_3 value: 24.576999999999998 - type: ndcg_at_5 value: 26.144000000000002 - type: precision_at_1 value: 21.887999999999998 - type: precision_at_10 value: 5.0360000000000005 - type: precision_at_100 value: 0.828 - type: precision_at_1000 value: 0.135 - type: precision_at_20 value: 2.9690000000000003 - type: precision_at_3 value: 11.445 - type: precision_at_5 value: 8.269 - type: recall_at_1 value: 17.864 - type: recall_at_10 value: 34.977999999999994 - type: recall_at_100 value: 51.366 - type: recall_at_1000 value: 74.505 - type: recall_at_20 value: 39.587 - type: recall_at_3 value: 25.856 - type: recall_at_5 value: 30.215999999999998 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackEnglishRetrieval (default) revision: ad9991cb51e31e31e430383c75ffb2885547b5f0 split: test type: mteb/cqadupstack-english metrics: - type: main_score value: 17.534 - type: map_at_1 value: 11.354000000000001 - type: map_at_10 value: 14.847 - type: map_at_100 value: 15.49 - type: map_at_1000 value: 15.588 - type: map_at_20 value: 15.17 - type: map_at_3 value: 13.501 - type: map_at_5 value: 14.221 - type: mrr_at_1 value: 14.26751592356688 - type: mrr_at_10 value: 18.05727428975836 - type: mrr_at_100 value: 18.690847238016758 - type: mrr_at_1000 value: 18.764726106731445 - type: mrr_at_20 value: 18.395670843598797 - type: mrr_at_3 value: 16.64543524416137 - type: mrr_at_5 value: 17.333333333333336 - type: nauc_map_at_1000_diff1 value: 43.301676769305494 - type: nauc_map_at_1000_max value: 16.06805541449501 - type: nauc_map_at_1000_std value: 12.507510564248166 - type: nauc_map_at_100_diff1 value: 43.34383366787733 - type: nauc_map_at_100_max value: 16.049871088358675 - type: nauc_map_at_100_std value: 12.45712935804974 - type: nauc_map_at_10_diff1 value: 43.688675805930785 - type: nauc_map_at_10_max value: 16.41613903348705 - type: nauc_map_at_10_std value: 12.219643122219239 - type: nauc_map_at_1_diff1 value: 50.609096395200005 - type: nauc_map_at_1_max value: 18.78413464500168 - type: nauc_map_at_1_std value: 10.90744028944332 - type: nauc_map_at_20_diff1 value: 43.49084704145287 - type: nauc_map_at_20_max value: 16.182371186268703 - type: nauc_map_at_20_std value: 12.299197289134225 - type: nauc_map_at_3_diff1 value: 45.751823982563266 - type: nauc_map_at_3_max value: 17.192711563068457 - type: nauc_map_at_3_std value: 11.16466159721384 - type: nauc_map_at_5_diff1 value: 44.53444696379338 - type: nauc_map_at_5_max value: 16.559164547974103 - type: nauc_map_at_5_std value: 11.928445405766698 - type: nauc_mrr_at_1000_diff1 value: 42.29550571785051 - type: nauc_mrr_at_1000_max value: 15.642122643175679 - type: nauc_mrr_at_1000_std value: 12.21491820640565 - type: nauc_mrr_at_100_diff1 value: 42.301744065140404 - type: nauc_mrr_at_100_max value: 15.61733477074953 - type: 
nauc_mrr_at_100_std value: 12.181221737579532 - type: nauc_mrr_at_10_diff1 value: 42.670586100296646 - type: nauc_mrr_at_10_max value: 15.926109333510835 - type: nauc_mrr_at_10_std value: 12.192068681943583 - type: nauc_mrr_at_1_diff1 value: 51.89198697276755 - type: nauc_mrr_at_1_max value: 19.325504911863643 - type: nauc_mrr_at_1_std value: 12.282190963023766 - type: nauc_mrr_at_20_diff1 value: 42.39065015069134 - type: nauc_mrr_at_20_max value: 15.693533741719229 - type: nauc_mrr_at_20_std value: 12.145452140370937 - type: nauc_mrr_at_3_diff1 value: 44.715851634047944 - type: nauc_mrr_at_3_max value: 16.790849616314052 - type: nauc_mrr_at_3_std value: 12.056098541376208 - type: nauc_mrr_at_5_diff1 value: 43.87033674228477 - type: nauc_mrr_at_5_max value: 16.270118452872623 - type: nauc_mrr_at_5_std value: 12.268005300025886 - type: nauc_ndcg_at_1000_diff1 value: 38.01640412131576 - type: nauc_ndcg_at_1000_max value: 14.409491835566401 - type: nauc_ndcg_at_1000_std value: 14.292607075384597 - type: nauc_ndcg_at_100_diff1 value: 38.57310899261012 - type: nauc_ndcg_at_100_max value: 13.847832990597306 - type: nauc_ndcg_at_100_std value: 13.318671226615844 - type: nauc_ndcg_at_10_diff1 value: 40.02384031953078 - type: nauc_ndcg_at_10_max value: 15.18313865997875 - type: nauc_ndcg_at_10_std value: 12.662598128357672 - type: nauc_ndcg_at_1_diff1 value: 51.89198697276755 - type: nauc_ndcg_at_1_max value: 19.325504911863643 - type: nauc_ndcg_at_1_std value: 12.282190963023766 - type: nauc_ndcg_at_20_diff1 value: 39.357302335202725 - type: nauc_ndcg_at_20_max value: 14.497857343754966 - type: nauc_ndcg_at_20_std value: 12.630113736826498 - type: nauc_ndcg_at_3_diff1 value: 43.58418967840297 - type: nauc_ndcg_at_3_max value: 16.597491536723943 - type: nauc_ndcg_at_3_std value: 11.650784883274328 - type: nauc_ndcg_at_5_diff1 value: 42.02130435072668 - type: nauc_ndcg_at_5_max value: 15.627518090215247 - type: nauc_ndcg_at_5_std value: 12.533489817270919 - type: nauc_precision_at_1000_diff1 value: 3.679521880714478 - type: nauc_precision_at_1000_max value: 0.7919025640437954 - type: nauc_precision_at_1000_std value: 11.047727940811521 - type: nauc_precision_at_100_diff1 value: 19.4078130462856 - type: nauc_precision_at_100_max value: 4.3715506402771425 - type: nauc_precision_at_100_std value: 16.956899011609643 - type: nauc_precision_at_10_diff1 value: 28.437045098011527 - type: nauc_precision_at_10_max value: 11.734386703789056 - type: nauc_precision_at_10_std value: 15.714063626213687 - type: nauc_precision_at_1_diff1 value: 51.89198697276755 - type: nauc_precision_at_1_max value: 19.325504911863643 - type: nauc_precision_at_1_std value: 12.282190963023766 - type: nauc_precision_at_20_diff1 value: 26.61622384998239 - type: nauc_precision_at_20_max value: 9.031660188586937 - type: nauc_precision_at_20_std value: 16.20337620782593 - type: nauc_precision_at_3_diff1 value: 38.065037328678045 - type: nauc_precision_at_3_max value: 15.242914979757064 - type: nauc_precision_at_3_std value: 13.448074137354654 - type: nauc_precision_at_5_diff1 value: 34.74896073477683 - type: nauc_precision_at_5_max value: 13.347547367557508 - type: nauc_precision_at_5_std value: 15.211527933339694 - type: nauc_recall_at_1000_diff1 value: 22.478800979463685 - type: nauc_recall_at_1000_max value: 11.13145140021939 - type: nauc_recall_at_1000_std value: 20.050008624461874 - type: nauc_recall_at_100_diff1 value: 25.988786568304555 - type: nauc_recall_at_100_max value: 8.089785168176974 - type: nauc_recall_at_100_std value: 
14.262619130209112 - type: nauc_recall_at_10_diff1 value: 30.866722162291687 - type: nauc_recall_at_10_max value: 12.14019760016012 - type: nauc_recall_at_10_std value: 12.8097154636935 - type: nauc_recall_at_1_diff1 value: 50.609096395200005 - type: nauc_recall_at_1_max value: 18.78413464500168 - type: nauc_recall_at_1_std value: 10.90744028944332 - type: nauc_recall_at_20_diff1 value: 28.832935090203225 - type: nauc_recall_at_20_max value: 10.309594281852648 - type: nauc_recall_at_20_std value: 12.251157275647977 - type: nauc_recall_at_3_diff1 value: 40.105712098235315 - type: nauc_recall_at_3_max value: 15.165723469178264 - type: nauc_recall_at_3_std value: 10.99744165240917 - type: nauc_recall_at_5_diff1 value: 36.09241435581379 - type: nauc_recall_at_5_max value: 13.032542349570054 - type: nauc_recall_at_5_std value: 12.802627519053681 - type: ndcg_at_1 value: 14.268 - type: ndcg_at_10 value: 17.534 - type: ndcg_at_100 value: 20.78 - type: ndcg_at_1000 value: 23.526 - type: ndcg_at_20 value: 18.567 - type: ndcg_at_3 value: 15.218000000000002 - type: ndcg_at_5 value: 16.164 - type: precision_at_1 value: 14.268 - type: precision_at_10 value: 3.312 - type: precision_at_100 value: 0.603 - type: precision_at_1000 value: 0.105 - type: precision_at_20 value: 1.9869999999999999 - type: precision_at_3 value: 7.219 - type: precision_at_5 value: 5.1209999999999996 - type: recall_at_1 value: 11.354000000000001 - type: recall_at_10 value: 22.511 - type: recall_at_100 value: 37.24 - type: recall_at_1000 value: 56.718 - type: recall_at_20 value: 26.362999999999996 - type: recall_at_3 value: 15.53 - type: recall_at_5 value: 18.322 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGamingRetrieval (default) revision: 4885aa143210c98657558c04aaf3dc47cfb54340 split: test type: mteb/cqadupstack-gaming metrics: - type: main_score value: 29.03 - type: map_at_1 value: 19.307 - type: map_at_10 value: 25.453 - type: map_at_100 value: 26.33 - type: map_at_1000 value: 26.419999999999998 - type: map_at_20 value: 25.896 - type: map_at_3 value: 23.572000000000003 - type: map_at_5 value: 24.694 - type: mrr_at_1 value: 22.00626959247649 - type: mrr_at_10 value: 27.87858884410605 - type: mrr_at_100 value: 28.652814969242712 - type: mrr_at_1000 value: 28.725946491824235 - type: mrr_at_20 value: 28.276271334002978 - type: mrr_at_3 value: 25.997910135841156 - type: mrr_at_5 value: 27.11703239289442 - type: nauc_map_at_1000_diff1 value: 43.50604073464055 - type: nauc_map_at_1000_max value: 30.480004310005544 - type: nauc_map_at_1000_std value: 0.18281635239684302 - type: nauc_map_at_100_diff1 value: 43.51057034900177 - type: nauc_map_at_100_max value: 30.463453039114537 - type: nauc_map_at_100_std value: 0.1392213813651391 - type: nauc_map_at_10_diff1 value: 43.680704548271024 - type: nauc_map_at_10_max value: 30.639431323648626 - type: nauc_map_at_10_std value: -0.17722097946115797 - type: nauc_map_at_1_diff1 value: 49.51121570705665 - type: nauc_map_at_1_max value: 31.820851746100594 - type: nauc_map_at_1_std value: -2.635315036488275 - type: nauc_map_at_20_diff1 value: 43.519636427140746 - type: nauc_map_at_20_max value: 30.479309603785193 - type: nauc_map_at_20_std value: -0.04034004401117608 - type: nauc_map_at_3_diff1 value: 44.660054248758726 - type: nauc_map_at_3_max value: 30.35371167828995 - type: nauc_map_at_3_std value: -1.4381463631334364 - type: nauc_map_at_5_diff1 value: 44.14458335553869 - type: nauc_map_at_5_max value: 30.49464687257249 - type: nauc_map_at_5_std value: 
-0.7069576298198817 - type: nauc_mrr_at_1000_diff1 value: 43.49091070845857 - type: nauc_mrr_at_1000_max value: 30.904217260073207 - type: nauc_mrr_at_1000_std value: 0.6030969099528762 - type: nauc_mrr_at_100_diff1 value: 43.48206732167152 - type: nauc_mrr_at_100_max value: 30.885805566023013 - type: nauc_mrr_at_100_std value: 0.5769328589498474 - type: nauc_mrr_at_10_diff1 value: 43.55457392824764 - type: nauc_mrr_at_10_max value: 31.139789286663294 - type: nauc_mrr_at_10_std value: 0.39137312166360116 - type: nauc_mrr_at_1_diff1 value: 49.7476817055079 - type: nauc_mrr_at_1_max value: 33.35487810786589 - type: nauc_mrr_at_1_std value: -2.335419312527886 - type: nauc_mrr_at_20_diff1 value: 43.48827825669483 - type: nauc_mrr_at_20_max value: 30.983317516254566 - type: nauc_mrr_at_20_std value: 0.4846694988872726 - type: nauc_mrr_at_3_diff1 value: 44.66661877146986 - type: nauc_mrr_at_3_max value: 31.31121111690094 - type: nauc_mrr_at_3_std value: -0.5970753554262374 - type: nauc_mrr_at_5_diff1 value: 44.05287141220467 - type: nauc_mrr_at_5_max value: 31.185044083863524 - type: nauc_mrr_at_5_std value: 0.03276041839131263 - type: nauc_ndcg_at_1000_diff1 value: 40.64648189672279 - type: nauc_ndcg_at_1000_max value: 29.851206560241867 - type: nauc_ndcg_at_1000_std value: 3.7885804314712423 - type: nauc_ndcg_at_100_diff1 value: 40.54660606744312 - type: nauc_ndcg_at_100_max value: 29.52262097274987 - type: nauc_ndcg_at_100_std value: 3.1313695052884087 - type: nauc_ndcg_at_10_diff1 value: 41.189151331147364 - type: nauc_ndcg_at_10_max value: 30.257730735981376 - type: nauc_ndcg_at_10_std value: 1.483283884208919 - type: nauc_ndcg_at_1_diff1 value: 49.7476817055079 - type: nauc_ndcg_at_1_max value: 33.35487810786589 - type: nauc_ndcg_at_1_std value: -2.335419312527886 - type: nauc_ndcg_at_20_diff1 value: 40.69940555374264 - type: nauc_ndcg_at_20_max value: 29.67596434757782 - type: nauc_ndcg_at_20_std value: 1.8670302698321029 - type: nauc_ndcg_at_3_diff1 value: 43.313981749068034 - type: nauc_ndcg_at_3_max value: 29.92612987963682 - type: nauc_ndcg_at_3_std value: -0.7629159307364975 - type: nauc_ndcg_at_5_diff1 value: 42.25367609444526 - type: nauc_ndcg_at_5_max value: 30.011822025139217 - type: nauc_ndcg_at_5_std value: 0.4228958959339596 - type: nauc_precision_at_1000_diff1 value: 6.294045364733051 - type: nauc_precision_at_1000_max value: 13.003287301353916 - type: nauc_precision_at_1000_std value: 19.672009407091075 - type: nauc_precision_at_100_diff1 value: 18.900847000430282 - type: nauc_precision_at_100_max value: 19.89805341000471 - type: nauc_precision_at_100_std value: 14.097381220216437 - type: nauc_precision_at_10_diff1 value: 32.019287482758315 - type: nauc_precision_at_10_max value: 28.868719930088588 - type: nauc_precision_at_10_std value: 7.067713684120723 - type: nauc_precision_at_1_diff1 value: 49.7476817055079 - type: nauc_precision_at_1_max value: 33.35487810786589 - type: nauc_precision_at_1_std value: -2.335419312527886 - type: nauc_precision_at_20_diff1 value: 27.442952211039866 - type: nauc_precision_at_20_max value: 25.51570310142488 - type: nauc_precision_at_20_std value: 8.001107746535538 - type: nauc_precision_at_3_diff1 value: 38.33881569586195 - type: nauc_precision_at_3_max value: 28.995385801766826 - type: nauc_precision_at_3_std value: 0.46426597601937036 - type: nauc_precision_at_5_diff1 value: 35.93052673151141 - type: nauc_precision_at_5_max value: 28.77086703745561 - type: nauc_precision_at_5_std value: 3.020792681159482 - type: nauc_recall_at_1000_diff1 
value: 27.413733064523722 - type: nauc_recall_at_1000_max value: 25.640071347285847 - type: nauc_recall_at_1000_std value: 23.024726525628747 - type: nauc_recall_at_100_diff1 value: 30.238748775488382 - type: nauc_recall_at_100_max value: 24.83445535706549 - type: nauc_recall_at_100_std value: 13.213229148027994 - type: nauc_recall_at_10_diff1 value: 33.660824128432765 - type: nauc_recall_at_10_max value: 28.239711759937826 - type: nauc_recall_at_10_std value: 5.259078451819804 - type: nauc_recall_at_1_diff1 value: 49.51121570705665 - type: nauc_recall_at_1_max value: 31.820851746100594 - type: nauc_recall_at_1_std value: -2.635315036488275 - type: nauc_recall_at_20_diff1 value: 31.77661434800746 - type: nauc_recall_at_20_max value: 25.949306594350592 - type: nauc_recall_at_20_std value: 6.611875576453824 - type: nauc_recall_at_3_diff1 value: 39.16095910728281 - type: nauc_recall_at_3_max value: 27.64955581506583 - type: nauc_recall_at_3_std value: 0.10121363216139175 - type: nauc_recall_at_5_diff1 value: 36.32968291714543 - type: nauc_recall_at_5_max value: 27.325678767283694 - type: nauc_recall_at_5_std value: 2.653663972529844 - type: ndcg_at_1 value: 22.006 - type: ndcg_at_10 value: 29.03 - type: ndcg_at_100 value: 33.318999999999996 - type: ndcg_at_1000 value: 35.89 - type: ndcg_at_20 value: 30.503999999999998 - type: ndcg_at_3 value: 25.348 - type: ndcg_at_5 value: 27.267000000000003 - type: precision_at_1 value: 22.006 - type: precision_at_10 value: 4.627 - type: precision_at_100 value: 0.744 - type: precision_at_1000 value: 0.10300000000000001 - type: precision_at_20 value: 2.702 - type: precision_at_3 value: 11.033999999999999 - type: precision_at_5 value: 7.861999999999999 - type: recall_at_1 value: 19.307 - type: recall_at_10 value: 37.624 - type: recall_at_100 value: 56.997 - type: recall_at_1000 value: 76.62299999999999 - type: recall_at_20 value: 43.086 - type: recall_at_3 value: 27.724 - type: recall_at_5 value: 32.421 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackGisRetrieval (default) revision: 5003b3064772da1887988e05400cf3806fe491f2 split: test type: mteb/cqadupstack-gis metrics: - type: main_score value: 14.097000000000001 - type: map_at_1 value: 9.109 - type: map_at_10 value: 12.062000000000001 - type: map_at_100 value: 12.603 - type: map_at_1000 value: 12.690000000000001 - type: map_at_20 value: 12.335 - type: map_at_3 value: 10.882 - type: map_at_5 value: 11.445 - type: mrr_at_1 value: 9.6045197740113 - type: mrr_at_10 value: 13.001390009864586 - type: mrr_at_100 value: 13.541388076434767 - type: mrr_at_1000 value: 13.622995527273426 - type: mrr_at_20 value: 13.261213704134942 - type: mrr_at_3 value: 11.75141242937853 - type: mrr_at_5 value: 12.3728813559322 - type: nauc_map_at_1000_diff1 value: 41.25399941751793 - type: nauc_map_at_1000_max value: 17.60637208770784 - type: nauc_map_at_1000_std value: 3.8997877056955876 - type: nauc_map_at_100_diff1 value: 41.3047772590663 - type: nauc_map_at_100_max value: 17.593792209003684 - type: nauc_map_at_100_std value: 3.8624300256381883 - type: nauc_map_at_10_diff1 value: 41.918994248720736 - type: nauc_map_at_10_max value: 17.523107069845093 - type: nauc_map_at_10_std value: 3.3289332906481333 - type: nauc_map_at_1_diff1 value: 50.853111369434835 - type: nauc_map_at_1_max value: 20.441039981572366 - type: nauc_map_at_1_std value: 2.9730312951046747 - type: nauc_map_at_20_diff1 value: 41.676967823092156 - type: nauc_map_at_20_max value: 17.611142954564 - type: nauc_map_at_20_std value: 
3.7507161629892516 - type: nauc_map_at_3_diff1 value: 45.15865999101332 - type: nauc_map_at_3_max value: 17.51828209554345 - type: nauc_map_at_3_std value: 3.125254352308741 - type: nauc_map_at_5_diff1 value: 43.518873099840164 - type: nauc_map_at_5_max value: 18.096843812930256 - type: nauc_map_at_5_std value: 3.501264664850646 - type: nauc_mrr_at_1000_diff1 value: 39.65049616843269 - type: nauc_mrr_at_1000_max value: 18.992312109540187 - type: nauc_mrr_at_1000_std value: 3.8630526743174602 - type: nauc_mrr_at_100_diff1 value: 39.67790321701619 - type: nauc_mrr_at_100_max value: 18.99280796073833 - type: nauc_mrr_at_100_std value: 3.831281556686595 - type: nauc_mrr_at_10_diff1 value: 40.40664164207995 - type: nauc_mrr_at_10_max value: 18.9789911833429 - type: nauc_mrr_at_10_std value: 3.389250639709206 - type: nauc_mrr_at_1_diff1 value: 48.90268334274423 - type: nauc_mrr_at_1_max value: 22.148416208142038 - type: nauc_mrr_at_1_std value: 3.482278486678414 - type: nauc_mrr_at_20_diff1 value: 40.12944011033672 - type: nauc_mrr_at_20_max value: 19.01229852858854 - type: nauc_mrr_at_20_std value: 3.721020072685762 - type: nauc_mrr_at_3_diff1 value: 43.53442474531623 - type: nauc_mrr_at_3_max value: 18.98665230786941 - type: nauc_mrr_at_3_std value: 3.141188860380207 - type: nauc_mrr_at_5_diff1 value: 41.792381222269306 - type: nauc_mrr_at_5_max value: 19.564109785495027 - type: nauc_mrr_at_5_std value: 3.447599289829289 - type: nauc_ndcg_at_1000_diff1 value: 33.75036088168543 - type: nauc_ndcg_at_1000_max value: 17.552395174719724 - type: nauc_ndcg_at_1000_std value: 6.019653809238646 - type: nauc_ndcg_at_100_diff1 value: 34.46011549407109 - type: nauc_ndcg_at_100_max value: 17.261093331357706 - type: nauc_ndcg_at_100_std value: 5.4268706575162104 - type: nauc_ndcg_at_10_diff1 value: 37.83747527779143 - type: nauc_ndcg_at_10_max value: 17.044974102007092 - type: nauc_ndcg_at_10_std value: 3.5111959818349603 - type: nauc_ndcg_at_1_diff1 value: 48.90268334274423 - type: nauc_ndcg_at_1_max value: 22.148416208142038 - type: nauc_ndcg_at_1_std value: 3.482278486678414 - type: nauc_ndcg_at_20_diff1 value: 37.138695182061525 - type: nauc_ndcg_at_20_max value: 17.22387592023126 - type: nauc_ndcg_at_20_std value: 4.770921048488158 - type: nauc_ndcg_at_3_diff1 value: 43.268967346255074 - type: nauc_ndcg_at_3_max value: 17.20602008989898 - type: nauc_ndcg_at_3_std value: 3.19589477459749 - type: nauc_ndcg_at_5_diff1 value: 40.7884752761726 - type: nauc_ndcg_at_5_max value: 18.121892702668045 - type: nauc_ndcg_at_5_std value: 3.8369089974368573 - type: nauc_precision_at_1000_diff1 value: 7.089909563758634 - type: nauc_precision_at_1000_max value: 19.071511820051107 - type: nauc_precision_at_1000_std value: 8.71710715708378 - type: nauc_precision_at_100_diff1 value: 17.577598014207858 - type: nauc_precision_at_100_max value: 18.757305391811315 - type: nauc_precision_at_100_std value: 8.571496733416154 - type: nauc_precision_at_10_diff1 value: 28.943153297767832 - type: nauc_precision_at_10_max value: 16.38624587520458 - type: nauc_precision_at_10_std value: 3.437574061625469 - type: nauc_precision_at_1_diff1 value: 48.90268334274423 - type: nauc_precision_at_1_max value: 22.148416208142038 - type: nauc_precision_at_1_std value: 3.482278486678414 - type: nauc_precision_at_20_diff1 value: 26.474908278743044 - type: nauc_precision_at_20_max value: 16.47527151110289 - type: nauc_precision_at_20_std value: 7.5305698853598 - type: nauc_precision_at_3_diff1 value: 39.54288018891221 - type: 
nauc_precision_at_3_max value: 17.284449255178835 - type: nauc_precision_at_3_std value: 2.8714843759024866 - type: nauc_precision_at_5_diff1 value: 34.480901699228006 - type: nauc_precision_at_5_max value: 19.44159427138771 - type: nauc_precision_at_5_std value: 3.9140233563987525 - type: nauc_recall_at_1000_diff1 value: 14.656193188687894 - type: nauc_recall_at_1000_max value: 15.810571367218888 - type: nauc_recall_at_1000_std value: 12.334573972835202 - type: nauc_recall_at_100_diff1 value: 18.594617672285707 - type: nauc_recall_at_100_max value: 15.15863525459292 - type: nauc_recall_at_100_std value: 9.115505114921058 - type: nauc_recall_at_10_diff1 value: 29.13269929764077 - type: nauc_recall_at_10_max value: 15.059218016523301 - type: nauc_recall_at_10_std value: 3.7696923586295137 - type: nauc_recall_at_1_diff1 value: 50.853111369434835 - type: nauc_recall_at_1_max value: 20.441039981572366 - type: nauc_recall_at_1_std value: 2.9730312951046747 - type: nauc_recall_at_20_diff1 value: 27.544653538434776 - type: nauc_recall_at_20_max value: 15.420518066694445 - type: nauc_recall_at_20_std value: 7.101778539671523 - type: nauc_recall_at_3_diff1 value: 40.00397565193035 - type: nauc_recall_at_3_max value: 14.717415584208013 - type: nauc_recall_at_3_std value: 3.658957442260116 - type: nauc_recall_at_5_diff1 value: 35.35853159550963 - type: nauc_recall_at_5_max value: 17.049909921279315 - type: nauc_recall_at_5_std value: 4.839540342554651 - type: ndcg_at_1 value: 9.605 - type: ndcg_at_10 value: 14.097000000000001 - type: ndcg_at_100 value: 17.098 - type: ndcg_at_1000 value: 19.948 - type: ndcg_at_20 value: 15.043999999999999 - type: ndcg_at_3 value: 11.683 - type: ndcg_at_5 value: 12.656999999999998 - type: precision_at_1 value: 9.605 - type: precision_at_10 value: 2.215 - type: precision_at_100 value: 0.395 - type: precision_at_1000 value: 0.068 - type: precision_at_20 value: 1.322 - type: precision_at_3 value: 4.859 - type: precision_at_5 value: 3.435 - type: recall_at_1 value: 9.109 - type: recall_at_10 value: 19.618 - type: recall_at_100 value: 34.056 - type: recall_at_1000 value: 56.75599999999999 - type: recall_at_20 value: 23.168 - type: recall_at_3 value: 12.982 - type: recall_at_5 value: 15.315000000000001 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackMathematicaRetrieval (default) revision: 90fceea13679c63fe563ded68f3b6f06e50061de split: test type: mteb/cqadupstack-mathematica metrics: - type: main_score value: 8.895 - type: map_at_1 value: 4.444 - type: map_at_10 value: 6.789000000000001 - type: map_at_100 value: 7.362 - type: map_at_1000 value: 7.455 - type: map_at_20 value: 7.112 - type: map_at_3 value: 5.819 - type: map_at_5 value: 6.237 - type: mrr_at_1 value: 5.970149253731343 - type: mrr_at_10 value: 8.807500197425577 - type: mrr_at_100 value: 9.458867441952432 - type: mrr_at_1000 value: 9.550029897135536 - type: mrr_at_20 value: 9.191142267117858 - type: mrr_at_3 value: 7.669983416252076 - type: mrr_at_5 value: 8.229684908789391 - type: nauc_map_at_1000_diff1 value: 14.923575664521396 - type: nauc_map_at_1000_max value: 14.637382629018258 - type: nauc_map_at_1000_std value: 7.583317007693739 - type: nauc_map_at_100_diff1 value: 14.914938787317187 - type: nauc_map_at_100_max value: 14.57831256590049 - type: nauc_map_at_100_std value: 7.481458525605025 - type: nauc_map_at_10_diff1 value: 15.009158630868363 - type: nauc_map_at_10_max value: 14.587168521042992 - type: nauc_map_at_10_std value: 6.30675561821182 - type: nauc_map_at_1_diff1 value: 
23.073067396533048 - type: nauc_map_at_1_max value: 22.526518534617583 - type: nauc_map_at_1_std value: 3.2886460233623356 - type: nauc_map_at_20_diff1 value: 14.55856812493529 - type: nauc_map_at_20_max value: 14.445922336763791 - type: nauc_map_at_20_std value: 7.0979435052536815 - type: nauc_map_at_3_diff1 value: 17.401011477759774 - type: nauc_map_at_3_max value: 16.448773676590882 - type: nauc_map_at_3_std value: 4.181405616554917 - type: nauc_map_at_5_diff1 value: 15.690380485853476 - type: nauc_map_at_5_max value: 15.435047584962474 - type: nauc_map_at_5_std value: 5.232971650136294 - type: nauc_mrr_at_1000_diff1 value: 15.064019296100401 - type: nauc_mrr_at_1000_max value: 15.23275181655676 - type: nauc_mrr_at_1000_std value: 6.62512228446261 - type: nauc_mrr_at_100_diff1 value: 15.04422899632206 - type: nauc_mrr_at_100_max value: 15.180132969802102 - type: nauc_mrr_at_100_std value: 6.569986365469756 - type: nauc_mrr_at_10_diff1 value: 15.513288408498664 - type: nauc_mrr_at_10_max value: 15.639652887265692 - type: nauc_mrr_at_10_std value: 6.08058172017529 - type: nauc_mrr_at_1_diff1 value: 23.174960802057807 - type: nauc_mrr_at_1_max value: 23.10505027161953 - type: nauc_mrr_at_1_std value: 5.000535690775217 - type: nauc_mrr_at_20_diff1 value: 14.944086344466943 - type: nauc_mrr_at_20_max value: 15.058772912777219 - type: nauc_mrr_at_20_std value: 6.406714993528487 - type: nauc_mrr_at_3_diff1 value: 16.945928540219413 - type: nauc_mrr_at_3_max value: 16.999490982460667 - type: nauc_mrr_at_3_std value: 4.2783371592240185 - type: nauc_mrr_at_5_diff1 value: 15.724845028203049 - type: nauc_mrr_at_5_max value: 16.374268642724658 - type: nauc_mrr_at_5_std value: 4.955417882432664 - type: nauc_ndcg_at_1000_diff1 value: 12.64441384439761 - type: nauc_ndcg_at_1000_max value: 12.544144311249642 - type: nauc_ndcg_at_1000_std value: 12.203401112537147 - type: nauc_ndcg_at_100_diff1 value: 12.856101621820079 - type: nauc_ndcg_at_100_max value: 12.15851341921588 - type: nauc_ndcg_at_100_std value: 11.352600283831114 - type: nauc_ndcg_at_10_diff1 value: 12.453755697243285 - type: nauc_ndcg_at_10_max value: 11.750014509834587 - type: nauc_ndcg_at_10_std value: 8.203127809929466 - type: nauc_ndcg_at_1_diff1 value: 23.174960802057807 - type: nauc_ndcg_at_1_max value: 23.10505027161953 - type: nauc_ndcg_at_1_std value: 5.000535690775217 - type: nauc_ndcg_at_20_diff1 value: 11.324071030247564 - type: nauc_ndcg_at_20_max value: 11.094964112045453 - type: nauc_ndcg_at_20_std value: 9.840879835834757 - type: nauc_ndcg_at_3_diff1 value: 15.323525692434862 - type: nauc_ndcg_at_3_max value: 14.559998492898632 - type: nauc_ndcg_at_3_std value: 4.027895180138566 - type: nauc_ndcg_at_5_diff1 value: 13.165086940669635 - type: nauc_ndcg_at_5_max value: 13.32440977723948 - type: nauc_ndcg_at_5_std value: 5.813837007263122 - type: nauc_precision_at_1000_diff1 value: 0.8928955587806005 - type: nauc_precision_at_1000_max value: 4.446218508931589 - type: nauc_precision_at_1000_std value: 5.877977195844953 - type: nauc_precision_at_100_diff1 value: 8.33525852681901 - type: nauc_precision_at_100_max value: 7.830647914480539 - type: nauc_precision_at_100_std value: 14.216797498501176 - type: nauc_precision_at_10_diff1 value: 7.765203936267145 - type: nauc_precision_at_10_max value: 7.141939768201643 - type: nauc_precision_at_10_std value: 9.60008810493683 - type: nauc_precision_at_1_diff1 value: 23.174960802057807 - type: nauc_precision_at_1_max value: 23.10505027161953 - type: nauc_precision_at_1_std value: 
5.000535690775217 - type: nauc_precision_at_20_diff1 value: 4.810680914106181 - type: nauc_precision_at_20_max value: 4.6628595108449655 - type: nauc_precision_at_20_std value: 12.601430694735827 - type: nauc_precision_at_3_diff1 value: 13.474943796383625 - type: nauc_precision_at_3_max value: 11.709775106648399 - type: nauc_precision_at_3_std value: 3.207743252795555 - type: nauc_precision_at_5_diff1 value: 9.95810736829039 - type: nauc_precision_at_5_max value: 10.456953224514239 - type: nauc_precision_at_5_std value: 5.623208634930042 - type: nauc_recall_at_1000_diff1 value: 9.834451295472817 - type: nauc_recall_at_1000_max value: 9.848949382055148 - type: nauc_recall_at_1000_std value: 20.975606313150834 - type: nauc_recall_at_100_diff1 value: 10.217335772749356 - type: nauc_recall_at_100_max value: 9.152943313782552 - type: nauc_recall_at_100_std value: 17.31335628449071 - type: nauc_recall_at_10_diff1 value: 7.002474541545711 - type: nauc_recall_at_10_max value: 5.600453872340962 - type: nauc_recall_at_10_std value: 11.697537334063615 - type: nauc_recall_at_1_diff1 value: 23.073067396533048 - type: nauc_recall_at_1_max value: 22.526518534617583 - type: nauc_recall_at_1_std value: 3.2886460233623356 - type: nauc_recall_at_20_diff1 value: 5.418370604760854 - type: nauc_recall_at_20_max value: 5.4952006102593085 - type: nauc_recall_at_20_std value: 14.413914588580981 - type: nauc_recall_at_3_diff1 value: 12.321251599365478 - type: nauc_recall_at_3_max value: 10.062822926598114 - type: nauc_recall_at_3_std value: 5.2675756103944735 - type: nauc_recall_at_5_diff1 value: 7.540388296514483 - type: nauc_recall_at_5_max value: 7.803110889019699 - type: nauc_recall_at_5_std value: 8.317325637513246 - type: ndcg_at_1 value: 5.970000000000001 - type: ndcg_at_10 value: 8.895 - type: ndcg_at_100 value: 11.964 - type: ndcg_at_1000 value: 14.860000000000001 - type: ndcg_at_20 value: 10.104000000000001 - type: ndcg_at_3 value: 6.859999999999999 - type: ndcg_at_5 value: 7.573 - type: precision_at_1 value: 5.970000000000001 - type: precision_at_10 value: 1.779 - type: precision_at_100 value: 0.384 - type: precision_at_1000 value: 0.073 - type: precision_at_20 value: 1.2189999999999999 - type: precision_at_3 value: 3.4000000000000004 - type: precision_at_5 value: 2.537 - type: recall_at_1 value: 4.444 - type: recall_at_10 value: 13.751 - type: recall_at_100 value: 27.537 - type: recall_at_1000 value: 49.079 - type: recall_at_20 value: 18.182000000000002 - type: recall_at_3 value: 7.731000000000001 - type: recall_at_5 value: 9.636 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackPhysicsRetrieval (default) revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4 split: test type: mteb/cqadupstack-physics metrics: - type: main_score value: 19.902 - type: map_at_1 value: 12.928999999999998 - type: map_at_10 value: 16.833000000000002 - type: map_at_100 value: 17.615 - type: map_at_1000 value: 17.732 - type: map_at_20 value: 17.207 - type: map_at_3 value: 15.463 - type: map_at_5 value: 16.128999999999998 - type: mrr_at_1 value: 15.976900866217516 - type: mrr_at_10 value: 20.444757627144526 - type: mrr_at_100 value: 21.18213748325402 - type: mrr_at_1000 value: 21.25972081056743 - type: mrr_at_20 value: 20.799603260475223 - type: mrr_at_3 value: 18.928456849534818 - type: mrr_at_5 value: 19.72248957330767 - type: nauc_map_at_1000_diff1 value: 41.27196577011274 - type: nauc_map_at_1000_max value: 30.04254002251132 - type: nauc_map_at_1000_std value: 6.570333369920046 - type: nauc_map_at_100_diff1 
value: 41.27551384135304 - type: nauc_map_at_100_max value: 29.99043897557097 - type: nauc_map_at_100_std value: 6.472408363055328 - type: nauc_map_at_10_diff1 value: 41.85444301121017 - type: nauc_map_at_10_max value: 29.81212191843452 - type: nauc_map_at_10_std value: 5.93398567449617 - type: nauc_map_at_1_diff1 value: 46.839384517121886 - type: nauc_map_at_1_max value: 33.10314951759653 - type: nauc_map_at_1_std value: 3.473962823858065 - type: nauc_map_at_20_diff1 value: 41.4328465682072 - type: nauc_map_at_20_max value: 29.97742898678745 - type: nauc_map_at_20_std value: 6.104796006386177 - type: nauc_map_at_3_diff1 value: 43.02691416463743 - type: nauc_map_at_3_max value: 30.42366456898119 - type: nauc_map_at_3_std value: 5.155164523235761 - type: nauc_map_at_5_diff1 value: 42.50855309235288 - type: nauc_map_at_5_max value: 30.268005050849005 - type: nauc_map_at_5_std value: 5.5087675809592955 - type: nauc_mrr_at_1000_diff1 value: 39.918304151052496 - type: nauc_mrr_at_1000_max value: 32.3633242335842 - type: nauc_mrr_at_1000_std value: 9.821534513339788 - type: nauc_mrr_at_100_diff1 value: 39.88894200397407 - type: nauc_mrr_at_100_max value: 32.35005140436353 - type: nauc_mrr_at_100_std value: 9.798405855994671 - type: nauc_mrr_at_10_diff1 value: 40.398911825307096 - type: nauc_mrr_at_10_max value: 32.431125056382164 - type: nauc_mrr_at_10_std value: 9.607804963814376 - type: nauc_mrr_at_1_diff1 value: 44.710224260402306 - type: nauc_mrr_at_1_max value: 34.810999361965784 - type: nauc_mrr_at_1_std value: 6.666781318158904 - type: nauc_mrr_at_20_diff1 value: 40.00961756059491 - type: nauc_mrr_at_20_max value: 32.37658164628154 - type: nauc_mrr_at_20_std value: 9.668733699272558 - type: nauc_mrr_at_3_diff1 value: 41.57115214419929 - type: nauc_mrr_at_3_max value: 32.68793918495075 - type: nauc_mrr_at_3_std value: 9.040233893300375 - type: nauc_mrr_at_5_diff1 value: 41.06814071330848 - type: nauc_mrr_at_5_max value: 32.8245640568574 - type: nauc_mrr_at_5_std value: 9.58857119627648 - type: nauc_ndcg_at_1000_diff1 value: 36.80739838454769 - type: nauc_ndcg_at_1000_max value: 29.789668331458618 - type: nauc_ndcg_at_1000_std value: 11.39764916900706 - type: nauc_ndcg_at_100_diff1 value: 37.11213770959871 - type: nauc_ndcg_at_100_max value: 29.081591038980903 - type: nauc_ndcg_at_100_std value: 10.108782506088897 - type: nauc_ndcg_at_10_diff1 value: 39.5849935712723 - type: nauc_ndcg_at_10_max value: 28.96898719826389 - type: nauc_ndcg_at_10_std value: 7.961681263212508 - type: nauc_ndcg_at_1_diff1 value: 44.710224260402306 - type: nauc_ndcg_at_1_max value: 34.810999361965784 - type: nauc_ndcg_at_1_std value: 6.666781318158904 - type: nauc_ndcg_at_20_diff1 value: 38.12032626231077 - type: nauc_ndcg_at_20_max value: 29.18302919363044 - type: nauc_ndcg_at_20_std value: 8.263802202822081 - type: nauc_ndcg_at_3_diff1 value: 41.69966283174317 - type: nauc_ndcg_at_3_max value: 30.929246645213066 - type: nauc_ndcg_at_3_std value: 7.216761468782046 - type: nauc_ndcg_at_5_diff1 value: 41.01584530945962 - type: nauc_ndcg_at_5_max value: 30.289879950898214 - type: nauc_ndcg_at_5_std value: 7.4367837578277936 - type: nauc_precision_at_1000_diff1 value: 5.296272992814253 - type: nauc_precision_at_1000_max value: 19.76310705995752 - type: nauc_precision_at_1000_std value: 24.704985621130156 - type: nauc_precision_at_100_diff1 value: 16.46333749868499 - type: nauc_precision_at_100_max value: 26.043739871376527 - type: nauc_precision_at_100_std value: 26.092651162394155 - type: nauc_precision_at_10_diff1 
value: 30.365327315976653 - type: nauc_precision_at_10_max value: 28.924585920344946 - type: nauc_precision_at_10_std value: 17.70407674779879 - type: nauc_precision_at_1_diff1 value: 44.710224260402306 - type: nauc_precision_at_1_max value: 34.810999361965784 - type: nauc_precision_at_1_std value: 6.666781318158904 - type: nauc_precision_at_20_diff1 value: 24.315922316558428 - type: nauc_precision_at_20_max value: 28.874260987195967 - type: nauc_precision_at_20_std value: 19.72374746122734 - type: nauc_precision_at_3_diff1 value: 37.37798681409137 - type: nauc_precision_at_3_max value: 32.308460896865824 - type: nauc_precision_at_3_std value: 12.279945415003562 - type: nauc_precision_at_5_diff1 value: 35.30318091103882 - type: nauc_precision_at_5_max value: 31.820548127213062 - type: nauc_precision_at_5_std value: 14.503599559616163 - type: nauc_recall_at_1000_diff1 value: 19.795948815823216 - type: nauc_recall_at_1000_max value: 24.278386660959896 - type: nauc_recall_at_1000_std value: 22.837222421253944 - type: nauc_recall_at_100_diff1 value: 24.472612415292573 - type: nauc_recall_at_100_max value: 21.91143710710276 - type: nauc_recall_at_100_std value: 15.053133349737896 - type: nauc_recall_at_10_diff1 value: 33.4020176737161 - type: nauc_recall_at_10_max value: 23.033614175897377 - type: nauc_recall_at_10_std value: 8.767203112156356 - type: nauc_recall_at_1_diff1 value: 46.839384517121886 - type: nauc_recall_at_1_max value: 33.10314951759653 - type: nauc_recall_at_1_std value: 3.473962823858065 - type: nauc_recall_at_20_diff1 value: 28.830072771517113 - type: nauc_recall_at_20_max value: 23.489066180696092 - type: nauc_recall_at_20_std value: 9.12579757868168 - type: nauc_recall_at_3_diff1 value: 39.908834198934215 - type: nauc_recall_at_3_max value: 27.068809545101175 - type: nauc_recall_at_3_std value: 6.530892914334164 - type: nauc_recall_at_5_diff1 value: 37.48709101560424 - type: nauc_recall_at_5_max value: 26.081573648351025 - type: nauc_recall_at_5_std value: 7.183952029055236 - type: ndcg_at_1 value: 15.977 - type: ndcg_at_10 value: 19.902 - type: ndcg_at_100 value: 24.086 - type: ndcg_at_1000 value: 27.01 - type: ndcg_at_20 value: 21.175 - type: ndcg_at_3 value: 17.330000000000002 - type: ndcg_at_5 value: 18.342 - type: precision_at_1 value: 15.977 - type: precision_at_10 value: 3.542 - type: precision_at_100 value: 0.679 - type: precision_at_1000 value: 0.109 - type: precision_at_20 value: 2.161 - type: precision_at_3 value: 8.053 - type: precision_at_5 value: 5.679 - type: recall_at_1 value: 12.928999999999998 - type: recall_at_10 value: 25.916 - type: recall_at_100 value: 44.836 - type: recall_at_1000 value: 65.22200000000001 - type: recall_at_20 value: 30.493 - type: recall_at_3 value: 18.241 - type: recall_at_5 value: 21.078 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackProgrammersRetrieval (default) revision: 6184bc1440d2dbc7612be22b50686b8826d22b32 split: test type: mteb/cqadupstack-programmers metrics: - type: main_score value: 15.862000000000002 - type: map_at_1 value: 9.831 - type: map_at_10 value: 13.256 - type: map_at_100 value: 14.008000000000001 - type: map_at_1000 value: 14.113000000000001 - type: map_at_20 value: 13.636999999999999 - type: map_at_3 value: 11.814 - type: map_at_5 value: 12.583 - type: mrr_at_1 value: 11.757990867579908 - type: mrr_at_10 value: 15.494808654055237 - type: mrr_at_100 value: 16.291820589502283 - type: mrr_at_1000 value: 16.374533932974945 - type: mrr_at_20 value: 15.933671804388336 - type: mrr_at_3 value: 
13.83181126331811 - type: mrr_at_5 value: 14.6765601217656 - type: nauc_map_at_1000_diff1 value: 33.93453741920144 - type: nauc_map_at_1000_max value: 15.653730492995432 - type: nauc_map_at_1000_std value: 7.8758696471921175 - type: nauc_map_at_100_diff1 value: 33.93938109119093 - type: nauc_map_at_100_max value: 15.600263725191917 - type: nauc_map_at_100_std value: 7.765619322590685 - type: nauc_map_at_10_diff1 value: 34.54464331832195 - type: nauc_map_at_10_max value: 15.612792960561228 - type: nauc_map_at_10_std value: 6.7557841221613915 - type: nauc_map_at_1_diff1 value: 40.25943612185486 - type: nauc_map_at_1_max value: 17.181254846998176 - type: nauc_map_at_1_std value: 4.311873998223975 - type: nauc_map_at_20_diff1 value: 34.286604224077294 - type: nauc_map_at_20_max value: 15.557596686810724 - type: nauc_map_at_20_std value: 7.278138397108883 - type: nauc_map_at_3_diff1 value: 36.73973255367738 - type: nauc_map_at_3_max value: 16.83994296407283 - type: nauc_map_at_3_std value: 6.223159115827186 - type: nauc_map_at_5_diff1 value: 35.141424690409735 - type: nauc_map_at_5_max value: 15.992920926050328 - type: nauc_map_at_5_std value: 6.351250600055855 - type: nauc_mrr_at_1000_diff1 value: 34.73310032530598 - type: nauc_mrr_at_1000_max value: 19.015226556944313 - type: nauc_mrr_at_1000_std value: 9.222546150737514 - type: nauc_mrr_at_100_diff1 value: 34.726753216593245 - type: nauc_mrr_at_100_max value: 18.99769748963775 - type: nauc_mrr_at_100_std value: 9.174113672327863 - type: nauc_mrr_at_10_diff1 value: 35.44871459634613 - type: nauc_mrr_at_10_max value: 19.123376102993888 - type: nauc_mrr_at_10_std value: 8.400683156036651 - type: nauc_mrr_at_1_diff1 value: 41.66420742315266 - type: nauc_mrr_at_1_max value: 20.29699577568541 - type: nauc_mrr_at_1_std value: 6.552893551004773 - type: nauc_mrr_at_20_diff1 value: 34.97080168567599 - type: nauc_mrr_at_20_max value: 18.93820346421597 - type: nauc_mrr_at_20_std value: 8.88369463529979 - type: nauc_mrr_at_3_diff1 value: 37.82881961939195 - type: nauc_mrr_at_3_max value: 20.23353217486363 - type: nauc_mrr_at_3_std value: 8.335430576995872 - type: nauc_mrr_at_5_diff1 value: 36.39194951225287 - type: nauc_mrr_at_5_max value: 19.51895403281475 - type: nauc_mrr_at_5_std value: 8.109986680725223 - type: nauc_ndcg_at_1000_diff1 value: 29.082397825054134 - type: nauc_ndcg_at_1000_max value: 16.79542535678252 - type: nauc_ndcg_at_1000_std value: 13.862883511514385 - type: nauc_ndcg_at_100_diff1 value: 29.052598252998568 - type: nauc_ndcg_at_100_max value: 15.498427568714371 - type: nauc_ndcg_at_100_std value: 11.726792940214132 - type: nauc_ndcg_at_10_diff1 value: 32.1345507923688 - type: nauc_ndcg_at_10_max value: 15.522253057572243 - type: nauc_ndcg_at_10_std value: 8.033462171395978 - type: nauc_ndcg_at_1_diff1 value: 41.66420742315266 - type: nauc_ndcg_at_1_max value: 20.29699577568541 - type: nauc_ndcg_at_1_std value: 6.552893551004773 - type: nauc_ndcg_at_20_diff1 value: 30.9118537718024 - type: nauc_ndcg_at_20_max value: 15.015691320922405 - type: nauc_ndcg_at_20_std value: 9.48348066099931 - type: nauc_ndcg_at_3_diff1 value: 36.00136268031041 - type: nauc_ndcg_at_3_max value: 18.106666639494865 - type: nauc_ndcg_at_3_std value: 7.641902435989431 - type: nauc_ndcg_at_5_diff1 value: 33.39201547133596 - type: nauc_ndcg_at_5_max value: 16.476689691452638 - type: nauc_ndcg_at_5_std value: 7.369674781372547 - type: nauc_precision_at_1000_diff1 value: 6.471252357066656 - type: nauc_precision_at_1000_max value: 19.69714506243997 - type: 
nauc_precision_at_1000_std value: 19.55604767049242 - type: nauc_precision_at_100_diff1 value: 14.901264085785481 - type: nauc_precision_at_100_max value: 18.109459081509822 - type: nauc_precision_at_100_std value: 21.114563137000474 - type: nauc_precision_at_10_diff1 value: 27.5518231119986 - type: nauc_precision_at_10_max value: 15.967381663307059 - type: nauc_precision_at_10_std value: 11.45892974481074 - type: nauc_precision_at_1_diff1 value: 41.66420742315266 - type: nauc_precision_at_1_max value: 20.29699577568541 - type: nauc_precision_at_1_std value: 6.552893551004773 - type: nauc_precision_at_20_diff1 value: 24.871167172495863 - type: nauc_precision_at_20_max value: 16.035625528276007 - type: nauc_precision_at_20_std value: 16.40037479366967 - type: nauc_precision_at_3_diff1 value: 35.34609472177138 - type: nauc_precision_at_3_max value: 20.28057060245756 - type: nauc_precision_at_3_std value: 9.58695451354911 - type: nauc_precision_at_5_diff1 value: 31.12453786882641 - type: nauc_precision_at_5_max value: 17.714809323391766 - type: nauc_precision_at_5_std value: 9.540687572068887 - type: nauc_recall_at_1000_diff1 value: 13.176944792680187 - type: nauc_recall_at_1000_max value: 17.215938373520867 - type: nauc_recall_at_1000_std value: 31.763351387419913 - type: nauc_recall_at_100_diff1 value: 15.598307875167269 - type: nauc_recall_at_100_max value: 11.571312022801102 - type: nauc_recall_at_100_std value: 18.72066053860531 - type: nauc_recall_at_10_diff1 value: 25.20073017671981 - type: nauc_recall_at_10_max value: 12.05920538584769 - type: nauc_recall_at_10_std value: 9.127287803525167 - type: nauc_recall_at_1_diff1 value: 40.25943612185486 - type: nauc_recall_at_1_max value: 17.181254846998176 - type: nauc_recall_at_1_std value: 4.311873998223975 - type: nauc_recall_at_20_diff1 value: 21.87476573323018 - type: nauc_recall_at_20_max value: 10.324185189089619 - type: nauc_recall_at_20_std value: 12.342028690096459 - type: nauc_recall_at_3_diff1 value: 32.78814063821437 - type: nauc_recall_at_3_max value: 16.638784171801436 - type: nauc_recall_at_3_std value: 8.529115114779637 - type: nauc_recall_at_5_diff1 value: 28.192900822422317 - type: nauc_recall_at_5_max value: 13.974726351715857 - type: nauc_recall_at_5_std value: 8.09305084632621 - type: ndcg_at_1 value: 11.758000000000001 - type: ndcg_at_10 value: 15.862000000000002 - type: ndcg_at_100 value: 19.949 - type: ndcg_at_1000 value: 22.917 - type: ndcg_at_20 value: 17.249 - type: ndcg_at_3 value: 12.992 - type: ndcg_at_5 value: 14.266000000000002 - type: precision_at_1 value: 11.758000000000001 - type: precision_at_10 value: 2.82 - type: precision_at_100 value: 0.575 - type: precision_at_1000 value: 0.098 - type: precision_at_20 value: 1.7870000000000001 - type: precision_at_3 value: 5.822 - type: precision_at_5 value: 4.315 - type: recall_at_1 value: 9.831 - type: recall_at_10 value: 21.762999999999998 - type: recall_at_100 value: 40.207 - type: recall_at_1000 value: 61.635 - type: recall_at_20 value: 26.826 - type: recall_at_3 value: 13.969999999999999 - type: recall_at_5 value: 17.154 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackRetrieval (default) revision: CQADupstackRetrieval is a combined dataset split: test type: CQADupstackRetrieval metrics: - type: main_score value: 17.016083333333334 - type: ndcg_at_10 value: 17.016083333333334 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackStatsRetrieval (default) revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a split: test type: 
mteb/cqadupstack-stats metrics: - type: main_score value: 11.457 - type: map_at_1 value: 6.798 - type: map_at_10 value: 9.513 - type: map_at_100 value: 10.11 - type: map_at_1000 value: 10.181999999999999 - type: map_at_20 value: 9.852 - type: map_at_3 value: 8.459999999999999 - type: map_at_5 value: 9.095 - type: mrr_at_1 value: 8.43558282208589 - type: mrr_at_10 value: 11.242818190670953 - type: mrr_at_100 value: 11.841115877888047 - type: mrr_at_1000 value: 11.910635997616325 - type: mrr_at_20 value: 11.596258015622588 - type: mrr_at_3 value: 10.122699386503067 - type: mrr_at_5 value: 10.782208588957056 - type: nauc_map_at_1000_diff1 value: 33.754657655521825 - type: nauc_map_at_1000_max value: 20.457874599194977 - type: nauc_map_at_1000_std value: 4.356173597738065 - type: nauc_map_at_100_diff1 value: 33.75222679569881 - type: nauc_map_at_100_max value: 20.373956157972724 - type: nauc_map_at_100_std value: 4.252302912475765 - type: nauc_map_at_10_diff1 value: 34.77872705587748 - type: nauc_map_at_10_max value: 20.93118729929346 - type: nauc_map_at_10_std value: 3.481910641472398 - type: nauc_map_at_1_diff1 value: 42.058523271621276 - type: nauc_map_at_1_max value: 19.398661310678737 - type: nauc_map_at_1_std value: -1.9329828695069966 - type: nauc_map_at_20_diff1 value: 34.32132356844234 - type: nauc_map_at_20_max value: 20.836011847513134 - type: nauc_map_at_20_std value: 3.410902073845993 - type: nauc_map_at_3_diff1 value: 36.8129992491477 - type: nauc_map_at_3_max value: 21.49364083314497 - type: nauc_map_at_3_std value: 2.8543672506917117 - type: nauc_map_at_5_diff1 value: 35.945765614409595 - type: nauc_map_at_5_max value: 21.821959253251073 - type: nauc_map_at_5_std value: 3.1795889661755754 - type: nauc_mrr_at_1000_diff1 value: 33.022280754336535 - type: nauc_mrr_at_1000_max value: 20.31974398955361 - type: nauc_mrr_at_1000_std value: 6.915574901994777 - type: nauc_mrr_at_100_diff1 value: 32.98012701377776 - type: nauc_mrr_at_100_max value: 20.217936050257485 - type: nauc_mrr_at_100_std value: 6.853368541174533 - type: nauc_mrr_at_10_diff1 value: 34.0521482962105 - type: nauc_mrr_at_10_max value: 20.594837283745004 - type: nauc_mrr_at_10_std value: 6.58219400975866 - type: nauc_mrr_at_1_diff1 value: 40.45214208803864 - type: nauc_mrr_at_1_max value: 20.246074459121917 - type: nauc_mrr_at_1_std value: 3.6861996527886007 - type: nauc_mrr_at_20_diff1 value: 33.40956751827326 - type: nauc_mrr_at_20_max value: 20.570275995460932 - type: nauc_mrr_at_20_std value: 6.243011136595918 - type: nauc_mrr_at_3_diff1 value: 36.31911031414795 - type: nauc_mrr_at_3_max value: 21.695701449295836 - type: nauc_mrr_at_3_std value: 6.71267279773233 - type: nauc_mrr_at_5_diff1 value: 35.13580430980389 - type: nauc_mrr_at_5_max value: 21.723293067977693 - type: nauc_mrr_at_5_std value: 6.269186070012771 - type: nauc_ndcg_at_1000_diff1 value: 26.716650512928574 - type: nauc_ndcg_at_1000_max value: 18.323227051095493 - type: nauc_ndcg_at_1000_std value: 10.182374858813544 - type: nauc_ndcg_at_100_diff1 value: 27.023329777242445 - type: nauc_ndcg_at_100_max value: 17.4041094989256 - type: nauc_ndcg_at_100_std value: 8.607201276878204 - type: nauc_ndcg_at_10_diff1 value: 31.921453307307818 - type: nauc_ndcg_at_10_max value: 20.328563944294817 - type: nauc_ndcg_at_10_std value: 5.531328567900397 - type: nauc_ndcg_at_1_diff1 value: 40.45214208803864 - type: nauc_ndcg_at_1_max value: 20.246074459121917 - type: nauc_ndcg_at_1_std value: 3.6861996527886007 - type: nauc_ndcg_at_20_diff1 value: 30.279986443553863 
- type: nauc_ndcg_at_20_max value: 20.274259234859194 - type: nauc_ndcg_at_20_std value: 5.0661641286538925 - type: nauc_ndcg_at_3_diff1 value: 35.40139952163887 - type: nauc_ndcg_at_3_max value: 21.8390120280498 - type: nauc_ndcg_at_3_std value: 5.417193004461638 - type: nauc_ndcg_at_5_diff1 value: 34.323991615044044 - type: nauc_ndcg_at_5_max value: 22.44454175298003 - type: nauc_ndcg_at_5_std value: 5.058913656381477 - type: nauc_precision_at_1000_diff1 value: 8.13341460956022 - type: nauc_precision_at_1000_max value: 13.380869610400731 - type: nauc_precision_at_1000_std value: 25.77566088719011 - type: nauc_precision_at_100_diff1 value: 12.028198307574947 - type: nauc_precision_at_100_max value: 9.99491259218647 - type: nauc_precision_at_100_std value: 20.26038939641748 - type: nauc_precision_at_10_diff1 value: 25.497863066445802 - type: nauc_precision_at_10_max value: 19.951934819022966 - type: nauc_precision_at_10_std value: 13.029428588116488 - type: nauc_precision_at_1_diff1 value: 40.45214208803864 - type: nauc_precision_at_1_max value: 20.246074459121917 - type: nauc_precision_at_1_std value: 3.6861996527886007 - type: nauc_precision_at_20_diff1 value: 21.270433967723527 - type: nauc_precision_at_20_max value: 20.20704051155486 - type: nauc_precision_at_20_std value: 10.606697205011349 - type: nauc_precision_at_3_diff1 value: 34.304974107764636 - type: nauc_precision_at_3_max value: 24.786027767206704 - type: nauc_precision_at_3_std value: 12.919584289443248 - type: nauc_precision_at_5_diff1 value: 31.235010233089454 - type: nauc_precision_at_5_max value: 25.888178221422027 - type: nauc_precision_at_5_std value: 12.04974180403603 - type: nauc_recall_at_1000_diff1 value: 10.70347303527697 - type: nauc_recall_at_1000_max value: 11.531776655259092 - type: nauc_recall_at_1000_std value: 20.09518174937834 - type: nauc_recall_at_100_diff1 value: 12.277161162587646 - type: nauc_recall_at_100_max value: 9.031651314357903 - type: nauc_recall_at_100_std value: 14.946530478779566 - type: nauc_recall_at_10_diff1 value: 25.751282561301597 - type: nauc_recall_at_10_max value: 18.410538940956624 - type: nauc_recall_at_10_std value: 7.052566618916148 - type: nauc_recall_at_1_diff1 value: 42.058523271621276 - type: nauc_recall_at_1_max value: 19.398661310678737 - type: nauc_recall_at_1_std value: -1.9329828695069966 - type: nauc_recall_at_20_diff1 value: 21.876105916783473 - type: nauc_recall_at_20_max value: 18.14029808306082 - type: nauc_recall_at_20_std value: 5.721370338729993 - type: nauc_recall_at_3_diff1 value: 32.349105117433645 - type: nauc_recall_at_3_max value: 22.475284730157217 - type: nauc_recall_at_3_std value: 6.577737452085277 - type: nauc_recall_at_5_diff1 value: 30.45726437530916 - type: nauc_recall_at_5_max value: 22.993204324458517 - type: nauc_recall_at_5_std value: 6.237822274407502 - type: ndcg_at_1 value: 8.436 - type: ndcg_at_10 value: 11.457 - type: ndcg_at_100 value: 14.618 - type: ndcg_at_1000 value: 16.803 - type: ndcg_at_20 value: 12.67 - type: ndcg_at_3 value: 9.396 - type: ndcg_at_5 value: 10.458 - type: precision_at_1 value: 8.436 - type: precision_at_10 value: 2.025 - type: precision_at_100 value: 0.391 - type: precision_at_1000 value: 0.063 - type: precision_at_20 value: 1.304 - type: precision_at_3 value: 4.192 - type: precision_at_5 value: 3.221 - type: recall_at_1 value: 6.798 - type: recall_at_10 value: 15.878999999999998 - type: recall_at_100 value: 30.768 - type: recall_at_1000 value: 47.451 - type: recall_at_20 value: 20.466 - type: recall_at_3 value: 
10.224 - type: recall_at_5 value: 12.881 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackTexRetrieval (default) revision: 46989137a86843e03a6195de44b09deda022eec7 split: test type: mteb/cqadupstack-tex metrics: - type: main_score value: 9.754999999999999 - type: map_at_1 value: 5.489999999999999 - type: map_at_10 value: 7.9350000000000005 - type: map_at_100 value: 8.376999999999999 - type: map_at_1000 value: 8.458 - type: map_at_20 value: 8.14 - type: map_at_3 value: 7.166 - type: map_at_5 value: 7.5840000000000005 - type: mrr_at_1 value: 7.054370268410186 - type: mrr_at_10 value: 9.948655764209787 - type: mrr_at_100 value: 10.44089540191581 - type: mrr_at_1000 value: 10.510808098620316 - type: mrr_at_20 value: 10.18549289814409 - type: mrr_at_3 value: 9.027299839412715 - type: mrr_at_5 value: 9.52626749254416 - type: nauc_map_at_1000_diff1 value: 32.76388527748132 - type: nauc_map_at_1000_max value: 26.76472945437023 - type: nauc_map_at_1000_std value: 5.076773141116664 - type: nauc_map_at_100_diff1 value: 32.84910041131489 - type: nauc_map_at_100_max value: 26.776649275369763 - type: nauc_map_at_100_std value: 4.982288267487467 - type: nauc_map_at_10_diff1 value: 33.69288297350157 - type: nauc_map_at_10_max value: 27.030787162656093 - type: nauc_map_at_10_std value: 4.319996549665479 - type: nauc_map_at_1_diff1 value: 45.07110295953283 - type: nauc_map_at_1_max value: 31.183919870403624 - type: nauc_map_at_1_std value: 3.2596636083232524 - type: nauc_map_at_20_diff1 value: 33.18385578478434 - type: nauc_map_at_20_max value: 26.750880392311256 - type: nauc_map_at_20_std value: 4.560028824060983 - type: nauc_map_at_3_diff1 value: 36.134060387060806 - type: nauc_map_at_3_max value: 28.53718072767372 - type: nauc_map_at_3_std value: 3.8039060416364054 - type: nauc_map_at_5_diff1 value: 34.85287692775015 - type: nauc_map_at_5_max value: 27.89364342330856 - type: nauc_map_at_5_std value: 4.119474259507159 - type: nauc_mrr_at_1000_diff1 value: 32.015809492076826 - type: nauc_mrr_at_1000_max value: 27.431639711646994 - type: nauc_mrr_at_1000_std value: 5.95554166485951 - type: nauc_mrr_at_100_diff1 value: 32.07039747646208 - type: nauc_mrr_at_100_max value: 27.452847130237775 - type: nauc_mrr_at_100_std value: 5.905310921828455 - type: nauc_mrr_at_10_diff1 value: 32.93108532798797 - type: nauc_mrr_at_10_max value: 27.768472855609204 - type: nauc_mrr_at_10_std value: 5.580104763303006 - type: nauc_mrr_at_1_diff1 value: 43.888408590108355 - type: nauc_mrr_at_1_max value: 32.903967259484176 - type: nauc_mrr_at_1_std value: 3.514629542175588 - type: nauc_mrr_at_20_diff1 value: 32.408176921975254 - type: nauc_mrr_at_20_max value: 27.470576205679897 - type: nauc_mrr_at_20_std value: 5.716181575723001 - type: nauc_mrr_at_3_diff1 value: 35.354655207362356 - type: nauc_mrr_at_3_max value: 29.14309593167405 - type: nauc_mrr_at_3_std value: 4.63189493416609 - type: nauc_mrr_at_5_diff1 value: 33.970622089384825 - type: nauc_mrr_at_5_max value: 28.6239836688986 - type: nauc_mrr_at_5_std value: 5.122010745650993 - type: nauc_ndcg_at_1000_diff1 value: 25.030181517448163 - type: nauc_ndcg_at_1000_max value: 24.25419053775242 - type: nauc_ndcg_at_1000_std value: 9.178235317241148 - type: nauc_ndcg_at_100_diff1 value: 26.546832760443966 - type: nauc_ndcg_at_100_max value: 24.42201784253177 - type: nauc_ndcg_at_100_std value: 7.9899910907634375 - type: nauc_ndcg_at_10_diff1 value: 29.856179532797423 - type: nauc_ndcg_at_10_max value: 25.424197578846012 - type: nauc_ndcg_at_10_std value: 
5.1638300059562035 - type: nauc_ndcg_at_1_diff1 value: 43.888408590108355 - type: nauc_ndcg_at_1_max value: 32.903967259484176 - type: nauc_ndcg_at_1_std value: 3.514629542175588 - type: nauc_ndcg_at_20_diff1 value: 28.387788168718874 - type: nauc_ndcg_at_20_max value: 24.54850515588615 - type: nauc_ndcg_at_20_std value: 5.896669986261477 - type: nauc_ndcg_at_3_diff1 value: 34.072630397644424 - type: nauc_ndcg_at_3_max value: 28.28910465749962 - type: nauc_ndcg_at_3_std value: 4.108392335721374 - type: nauc_ndcg_at_5_diff1 value: 32.01123351290829 - type: nauc_ndcg_at_5_max value: 27.245024254467303 - type: nauc_ndcg_at_5_std value: 4.721870277645733 - type: nauc_precision_at_1000_diff1 value: 10.47217681263907 - type: nauc_precision_at_1000_max value: 20.919793131324727 - type: nauc_precision_at_1000_std value: 14.804007062294563 - type: nauc_precision_at_100_diff1 value: 16.685502515637722 - type: nauc_precision_at_100_max value: 23.37373409901207 - type: nauc_precision_at_100_std value: 13.953311698132442 - type: nauc_precision_at_10_diff1 value: 22.478790016325785 - type: nauc_precision_at_10_max value: 23.607477242235102 - type: nauc_precision_at_10_std value: 7.794068171304157 - type: nauc_precision_at_1_diff1 value: 43.888408590108355 - type: nauc_precision_at_1_max value: 32.903967259484176 - type: nauc_precision_at_1_std value: 3.514629542175588 - type: nauc_precision_at_20_diff1 value: 19.959179713421722 - type: nauc_precision_at_20_max value: 21.738126842321893 - type: nauc_precision_at_20_std value: 9.007914166096132 - type: nauc_precision_at_3_diff1 value: 29.984253127282134 - type: nauc_precision_at_3_max value: 28.271022607772796 - type: nauc_precision_at_3_std value: 5.620451575052563 - type: nauc_precision_at_5_diff1 value: 26.198401324939464 - type: nauc_precision_at_5_max value: 26.593956126902786 - type: nauc_precision_at_5_std value: 6.684705108310583 - type: nauc_recall_at_1000_diff1 value: 9.812234445343657 - type: nauc_recall_at_1000_max value: 17.800710147129053 - type: nauc_recall_at_1000_std value: 15.826278320231745 - type: nauc_recall_at_100_diff1 value: 14.586175748060896 - type: nauc_recall_at_100_max value: 18.340956025066333 - type: nauc_recall_at_100_std value: 12.791161727474043 - type: nauc_recall_at_10_diff1 value: 21.286255365948538 - type: nauc_recall_at_10_max value: 20.04866550317387 - type: nauc_recall_at_10_std value: 5.645106302785361 - type: nauc_recall_at_1_diff1 value: 45.07110295953283 - type: nauc_recall_at_1_max value: 31.183919870403624 - type: nauc_recall_at_1_std value: 3.2596636083232524 - type: nauc_recall_at_20_diff1 value: 18.757519729175094 - type: nauc_recall_at_20_max value: 18.59809411356838 - type: nauc_recall_at_20_std value: 7.482712453171494 - type: nauc_recall_at_3_diff1 value: 29.350550830882405 - type: nauc_recall_at_3_max value: 26.26284543188125 - type: nauc_recall_at_3_std value: 4.284032658092434 - type: nauc_recall_at_5_diff1 value: 25.247444183841345 - type: nauc_recall_at_5_max value: 23.639030774195213 - type: nauc_recall_at_5_std value: 5.05748857090612 - type: ndcg_at_1 value: 7.054 - type: ndcg_at_10 value: 9.754999999999999 - type: ndcg_at_100 value: 12.252 - type: ndcg_at_1000 value: 14.658999999999999 - type: ndcg_at_20 value: 10.508000000000001 - type: ndcg_at_3 value: 8.265 - type: ndcg_at_5 value: 8.929 - type: precision_at_1 value: 7.054 - type: precision_at_10 value: 1.807 - type: precision_at_100 value: 0.368 - type: precision_at_1000 value: 0.06899999999999999 - type: precision_at_20 value: 
1.1199999999999999 - type: precision_at_3 value: 3.9690000000000003 - type: precision_at_5 value: 2.863 - type: recall_at_1 value: 5.489999999999999 - type: recall_at_10 value: 13.422 - type: recall_at_100 value: 24.962999999999997 - type: recall_at_1000 value: 42.725 - type: recall_at_20 value: 16.259 - type: recall_at_3 value: 9.155000000000001 - type: recall_at_5 value: 10.923 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackUnixRetrieval (default) revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53 split: test type: mteb/cqadupstack-unix metrics: - type: main_score value: 16.884 - type: map_at_1 value: 11.259 - type: map_at_10 value: 14.371999999999998 - type: map_at_100 value: 14.921999999999999 - type: map_at_1000 value: 15.012 - type: map_at_20 value: 14.643 - type: map_at_3 value: 13.196 - type: map_at_5 value: 13.786000000000001 - type: mrr_at_1 value: 13.619402985074627 - type: mrr_at_10 value: 17.155739161336175 - type: mrr_at_100 value: 17.682382182436477 - type: mrr_at_1000 value: 17.762865075369113 - type: mrr_at_20 value: 17.394179616617638 - type: mrr_at_3 value: 15.951492537313436 - type: mrr_at_5 value: 16.497201492537318 - type: nauc_map_at_1000_diff1 value: 47.4265740975564 - type: nauc_map_at_1000_max value: 28.882262726128438 - type: nauc_map_at_1000_std value: 8.733456805684261 - type: nauc_map_at_100_diff1 value: 47.47182414534892 - type: nauc_map_at_100_max value: 28.85824710228484 - type: nauc_map_at_100_std value: 8.689373453465027 - type: nauc_map_at_10_diff1 value: 48.02651594284678 - type: nauc_map_at_10_max value: 29.238822235344035 - type: nauc_map_at_10_std value: 8.33007800978345 - type: nauc_map_at_1_diff1 value: 56.39452680423106 - type: nauc_map_at_1_max value: 32.60008414160042 - type: nauc_map_at_1_std value: 6.843961503288069 - type: nauc_map_at_20_diff1 value: 47.63901968476526 - type: nauc_map_at_20_max value: 29.025324617088327 - type: nauc_map_at_20_std value: 8.643210479120588 - type: nauc_map_at_3_diff1 value: 49.40628498975407 - type: nauc_map_at_3_max value: 30.22948877331367 - type: nauc_map_at_3_std value: 7.289154264399903 - type: nauc_map_at_5_diff1 value: 48.664130342694136 - type: nauc_map_at_5_max value: 30.14327671294244 - type: nauc_map_at_5_std value: 7.939333631753251 - type: nauc_mrr_at_1000_diff1 value: 44.58799837398294 - type: nauc_mrr_at_1000_max value: 31.03541915705859 - type: nauc_mrr_at_1000_std value: 10.403824515337941 - type: nauc_mrr_at_100_diff1 value: 44.601824537567715 - type: nauc_mrr_at_100_max value: 31.02756566133194 - type: nauc_mrr_at_100_std value: 10.374041246429492 - type: nauc_mrr_at_10_diff1 value: 45.08809081749144 - type: nauc_mrr_at_10_max value: 31.57615351364963 - type: nauc_mrr_at_10_std value: 10.29441865771061 - type: nauc_mrr_at_1_diff1 value: 53.78193049233505 - type: nauc_mrr_at_1_max value: 35.795787308983364 - type: nauc_mrr_at_1_std value: 9.700924818901061 - type: nauc_mrr_at_20_diff1 value: 44.74335182043816 - type: nauc_mrr_at_20_max value: 31.18129900426782 - type: nauc_mrr_at_20_std value: 10.385325054118825 - type: nauc_mrr_at_3_diff1 value: 46.73779708259278 - type: nauc_mrr_at_3_max value: 32.65075209697959 - type: nauc_mrr_at_3_std value: 9.728066031213869 - type: nauc_mrr_at_5_diff1 value: 45.92982408736637 - type: nauc_mrr_at_5_max value: 32.467526279204826 - type: nauc_mrr_at_5_std value: 9.989919602029717 - type: nauc_ndcg_at_1000_diff1 value: 40.92066479403982 - type: nauc_ndcg_at_1000_max value: 26.324838581358712 - type: nauc_ndcg_at_1000_std value: 
11.523782722688093 - type: nauc_ndcg_at_100_diff1 value: 41.69901831802912 - type: nauc_ndcg_at_100_max value: 26.05948550508969 - type: nauc_ndcg_at_100_std value: 10.741879131890466 - type: nauc_ndcg_at_10_diff1 value: 43.984470289795006 - type: nauc_ndcg_at_10_max value: 27.712165270383217 - type: nauc_ndcg_at_10_std value: 9.664252780617716 - type: nauc_ndcg_at_1_diff1 value: 53.78193049233505 - type: nauc_ndcg_at_1_max value: 35.795787308983364 - type: nauc_ndcg_at_1_std value: 9.700924818901061 - type: nauc_ndcg_at_20_diff1 value: 42.87969088645589 - type: nauc_ndcg_at_20_max value: 26.93508319676996 - type: nauc_ndcg_at_20_std value: 10.383528785973736 - type: nauc_ndcg_at_3_diff1 value: 46.50711903290246 - type: nauc_ndcg_at_3_max value: 30.119861670148136 - type: nauc_ndcg_at_3_std value: 8.209698597192652 - type: nauc_ndcg_at_5_diff1 value: 45.5276661506903 - type: nauc_ndcg_at_5_max value: 29.727216155363013 - type: nauc_ndcg_at_5_std value: 8.969137019208551 - type: nauc_precision_at_1000_diff1 value: 13.186344514919291 - type: nauc_precision_at_1000_max value: 14.081180493706894 - type: nauc_precision_at_1000_std value: 13.331957277782028 - type: nauc_precision_at_100_diff1 value: 25.836947568988094 - type: nauc_precision_at_100_max value: 19.399450264723857 - type: nauc_precision_at_100_std value: 15.996979763079173 - type: nauc_precision_at_10_diff1 value: 31.611911937904136 - type: nauc_precision_at_10_max value: 23.67106809118961 - type: nauc_precision_at_10_std value: 12.494002491494403 - type: nauc_precision_at_1_diff1 value: 53.78193049233505 - type: nauc_precision_at_1_max value: 35.795787308983364 - type: nauc_precision_at_1_std value: 9.700924818901061 - type: nauc_precision_at_20_diff1 value: 28.52666886145722 - type: nauc_precision_at_20_max value: 21.954240311035203 - type: nauc_precision_at_20_std value: 14.844645388086807 - type: nauc_precision_at_3_diff1 value: 38.45498467923997 - type: nauc_precision_at_3_max value: 29.266449529306882 - type: nauc_precision_at_3_std value: 9.049210381929473 - type: nauc_precision_at_5_diff1 value: 36.09730656980118 - type: nauc_precision_at_5_max value: 28.837127135797243 - type: nauc_precision_at_5_std value: 11.158339114522931 - type: nauc_recall_at_1000_diff1 value: 21.260887713456125 - type: nauc_recall_at_1000_max value: 16.113129212962036 - type: nauc_recall_at_1000_std value: 18.480136835190926 - type: nauc_recall_at_100_diff1 value: 27.104482564680143 - type: nauc_recall_at_100_max value: 15.992106261015381 - type: nauc_recall_at_100_std value: 13.84189240491372 - type: nauc_recall_at_10_diff1 value: 35.07971219401454 - type: nauc_recall_at_10_max value: 21.285398091407597 - type: nauc_recall_at_10_std value: 11.2371939944325 - type: nauc_recall_at_1_diff1 value: 56.39452680423106 - type: nauc_recall_at_1_max value: 32.60008414160042 - type: nauc_recall_at_1_std value: 6.843961503288069 - type: nauc_recall_at_20_diff1 value: 32.39512106898805 - type: nauc_recall_at_20_max value: 19.218626368924355 - type: nauc_recall_at_20_std value: 12.883976865810729 - type: nauc_recall_at_3_diff1 value: 42.44181844531972 - type: nauc_recall_at_3_max value: 26.878784537566723 - type: nauc_recall_at_3_std value: 8.021682738108238 - type: nauc_recall_at_5_diff1 value: 39.71281577688504 - type: nauc_recall_at_5_max value: 26.741868241320095 - type: nauc_recall_at_5_std value: 9.776821004059626 - type: ndcg_at_1 value: 13.619 - type: ndcg_at_10 value: 16.884 - type: ndcg_at_100 value: 19.919999999999998 - type: ndcg_at_1000 value: 22.61 
- type: ndcg_at_20 value: 17.802 - type: ndcg_at_3 value: 14.601 - type: ndcg_at_5 value: 15.47 - type: precision_at_1 value: 13.619 - type: precision_at_10 value: 2.8080000000000003 - type: precision_at_100 value: 0.485 - type: precision_at_1000 value: 0.08099999999999999 - type: precision_at_20 value: 1.66 - type: precision_at_3 value: 6.468 - type: precision_at_5 value: 4.496 - type: recall_at_1 value: 11.259 - type: recall_at_10 value: 22.148 - type: recall_at_100 value: 36.338 - type: recall_at_1000 value: 56.37 - type: recall_at_20 value: 25.444 - type: recall_at_3 value: 15.601 - type: recall_at_5 value: 17.904999999999998 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWebmastersRetrieval (default) revision: 160c094312a0e1facb97e55eeddb698c0abe3571 split: test type: mteb/cqadupstack-webmasters metrics: - type: main_score value: 18.986 - type: map_at_1 value: 11.219 - type: map_at_10 value: 15.572 - type: map_at_100 value: 16.496 - type: map_at_1000 value: 16.666 - type: map_at_20 value: 16.073999999999998 - type: map_at_3 value: 14.173 - type: map_at_5 value: 14.915000000000001 - type: mrr_at_1 value: 14.82213438735178 - type: mrr_at_10 value: 19.52365267582659 - type: mrr_at_100 value: 20.370290185635753 - type: mrr_at_1000 value: 20.467043542503724 - type: mrr_at_20 value: 20.0766545965337 - type: mrr_at_3 value: 18.21475625823452 - type: mrr_at_5 value: 18.945981554677203 - type: nauc_map_at_1000_diff1 value: 42.231943470301474 - type: nauc_map_at_1000_max value: 26.47159454229298 - type: nauc_map_at_1000_std value: 8.142899408562116 - type: nauc_map_at_100_diff1 value: 42.20734027834296 - type: nauc_map_at_100_max value: 26.482392045352114 - type: nauc_map_at_100_std value: 7.869302970334234 - type: nauc_map_at_10_diff1 value: 43.04836148095647 - type: nauc_map_at_10_max value: 26.854456008820886 - type: nauc_map_at_10_std value: 7.199117428761973 - type: nauc_map_at_1_diff1 value: 52.69584045825562 - type: nauc_map_at_1_max value: 32.26169513753074 - type: nauc_map_at_1_std value: 6.952498233745584 - type: nauc_map_at_20_diff1 value: 42.41625410983439 - type: nauc_map_at_20_max value: 26.907750306130733 - type: nauc_map_at_20_std value: 7.478967739706924 - type: nauc_map_at_3_diff1 value: 44.785788923058384 - type: nauc_map_at_3_max value: 27.412957229850438 - type: nauc_map_at_3_std value: 6.907258583517531 - type: nauc_map_at_5_diff1 value: 43.634053742171005 - type: nauc_map_at_5_max value: 27.311414645244174 - type: nauc_map_at_5_std value: 6.782368796408486 - type: nauc_mrr_at_1000_diff1 value: 40.121034147067355 - type: nauc_mrr_at_1000_max value: 26.418816188019484 - type: nauc_mrr_at_1000_std value: 11.036789931313589 - type: nauc_mrr_at_100_diff1 value: 40.09038771859193 - type: nauc_mrr_at_100_max value: 26.35109915559335 - type: nauc_mrr_at_100_std value: 11.004694419173386 - type: nauc_mrr_at_10_diff1 value: 40.70815905748883 - type: nauc_mrr_at_10_max value: 26.39730116006313 - type: nauc_mrr_at_10_std value: 10.795296410891202 - type: nauc_mrr_at_1_diff1 value: 49.49023740663914 - type: nauc_mrr_at_1_max value: 32.80752877856241 - type: nauc_mrr_at_1_std value: 9.182609293548452 - type: nauc_mrr_at_20_diff1 value: 40.09097766117321 - type: nauc_mrr_at_20_max value: 26.543696500831608 - type: nauc_mrr_at_20_std value: 11.045110550071236 - type: nauc_mrr_at_3_diff1 value: 42.547772290792786 - type: nauc_mrr_at_3_max value: 27.248503683439974 - type: nauc_mrr_at_3_std value: 11.12811144130018 - type: nauc_mrr_at_5_diff1 value: 
41.182672458130945 - type: nauc_mrr_at_5_max value: 27.204022967551346 - type: nauc_mrr_at_5_std value: 10.736058227235059 - type: nauc_ndcg_at_1000_diff1 value: 38.283155226012525 - type: nauc_ndcg_at_1000_max value: 23.952454186870728 - type: nauc_ndcg_at_1000_std value: 11.202190633221258 - type: nauc_ndcg_at_100_diff1 value: 37.28326924063582 - type: nauc_ndcg_at_100_max value: 23.059861557232345 - type: nauc_ndcg_at_100_std value: 9.94550524440808 - type: nauc_ndcg_at_10_diff1 value: 39.63812221599438 - type: nauc_ndcg_at_10_max value: 24.35015593369919 - type: nauc_ndcg_at_10_std value: 9.315660164781054 - type: nauc_ndcg_at_1_diff1 value: 49.49023740663914 - type: nauc_ndcg_at_1_max value: 32.80752877856241 - type: nauc_ndcg_at_1_std value: 9.182609293548452 - type: nauc_ndcg_at_20_diff1 value: 37.63726489914318 - type: nauc_ndcg_at_20_max value: 24.728684570593007 - type: nauc_ndcg_at_20_std value: 9.986169134250208 - type: nauc_ndcg_at_3_diff1 value: 41.86142781421585 - type: nauc_ndcg_at_3_max value: 25.373436332199645 - type: nauc_ndcg_at_3_std value: 9.66682128586139 - type: nauc_ndcg_at_5_diff1 value: 40.642745287564594 - type: nauc_ndcg_at_5_max value: 25.56873621658099 - type: nauc_ndcg_at_5_std value: 9.25538178041856 - type: nauc_precision_at_1000_diff1 value: 11.480722649998393 - type: nauc_precision_at_1000_max value: 1.8213948061833445 - type: nauc_precision_at_1000_std value: 29.23515602956654 - type: nauc_precision_at_100_diff1 value: 14.18816101118032 - type: nauc_precision_at_100_max value: 2.440318670740079 - type: nauc_precision_at_100_std value: 29.24020499259622 - type: nauc_precision_at_10_diff1 value: 27.712287052106255 - type: nauc_precision_at_10_max value: 16.786789482138776 - type: nauc_precision_at_10_std value: 14.310510991471832 - type: nauc_precision_at_1_diff1 value: 49.49023740663914 - type: nauc_precision_at_1_max value: 32.80752877856241 - type: nauc_precision_at_1_std value: 9.182609293548452 - type: nauc_precision_at_20_diff1 value: 20.46872198920085 - type: nauc_precision_at_20_max value: 14.825240542929851 - type: nauc_precision_at_20_std value: 20.953665146043296 - type: nauc_precision_at_3_diff1 value: 36.03554983971536 - type: nauc_precision_at_3_max value: 21.854122073954194 - type: nauc_precision_at_3_std value: 13.04509621136731 - type: nauc_precision_at_5_diff1 value: 32.79763412951098 - type: nauc_precision_at_5_max value: 21.11796990161242 - type: nauc_precision_at_5_std value: 13.431327120495338 - type: nauc_recall_at_1000_diff1 value: 30.09802696990947 - type: nauc_recall_at_1000_max value: 13.40584644567289 - type: nauc_recall_at_1000_std value: 16.521370765894975 - type: nauc_recall_at_100_diff1 value: 26.309114191114602 - type: nauc_recall_at_100_max value: 13.350873360428366 - type: nauc_recall_at_100_std value: 11.078547445094047 - type: nauc_recall_at_10_diff1 value: 31.32014394352729 - type: nauc_recall_at_10_max value: 18.345182060137695 - type: nauc_recall_at_10_std value: 9.128692650287276 - type: nauc_recall_at_1_diff1 value: 52.69584045825562 - type: nauc_recall_at_1_max value: 32.26169513753074 - type: nauc_recall_at_1_std value: 6.952498233745584 - type: nauc_recall_at_20_diff1 value: 25.40389262415684 - type: nauc_recall_at_20_max value: 19.21175870928344 - type: nauc_recall_at_20_std value: 10.924171074066592 - type: nauc_recall_at_3_diff1 value: 38.07498529415478 - type: nauc_recall_at_3_max value: 21.675031784523334 - type: nauc_recall_at_3_std value: 7.885136540556627 - type: nauc_recall_at_5_diff1 value: 
33.03739602855325 - type: nauc_recall_at_5_max value: 20.891017025098222 - type: nauc_recall_at_5_std value: 7.259719761129051 - type: ndcg_at_1 value: 14.822 - type: ndcg_at_10 value: 18.986 - type: ndcg_at_100 value: 22.996 - type: ndcg_at_1000 value: 26.569 - type: ndcg_at_20 value: 20.62 - type: ndcg_at_3 value: 16.778000000000002 - type: ndcg_at_5 value: 17.742 - type: precision_at_1 value: 14.822 - type: precision_at_10 value: 3.755 - type: precision_at_100 value: 0.8540000000000001 - type: precision_at_1000 value: 0.163 - type: precision_at_20 value: 2.4899999999999998 - type: precision_at_3 value: 8.235000000000001 - type: precision_at_5 value: 5.968 - type: recall_at_1 value: 11.219 - type: recall_at_10 value: 24.784 - type: recall_at_100 value: 43.143 - type: recall_at_1000 value: 68.416 - type: recall_at_20 value: 31.266 - type: recall_at_3 value: 17.607999999999997 - type: recall_at_5 value: 20.468 task: type: Retrieval - dataset: config: default name: MTEB CQADupstackWordpressRetrieval (default) revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4 split: test type: mteb/cqadupstack-wordpress metrics: - type: main_score value: 14.105 - type: map_at_1 value: 9.766 - type: map_at_10 value: 12.35 - type: map_at_100 value: 12.794 - type: map_at_1000 value: 12.876000000000001 - type: map_at_20 value: 12.548 - type: map_at_3 value: 11.583 - type: map_at_5 value: 11.855 - type: mrr_at_1 value: 10.35120147874307 - type: mrr_at_10 value: 13.323137634597895 - type: mrr_at_100 value: 13.8122389813538 - type: mrr_at_1000 value: 13.891191650266954 - type: mrr_at_20 value: 13.550088548700803 - type: mrr_at_3 value: 12.41528034504005 - type: mrr_at_5 value: 12.74799753542822 - type: nauc_map_at_1000_diff1 value: 30.214009272387493 - type: nauc_map_at_1000_max value: 27.100911874185957 - type: nauc_map_at_1000_std value: 4.556062715371813 - type: nauc_map_at_100_diff1 value: 30.283972909659536 - type: nauc_map_at_100_max value: 27.101751795355376 - type: nauc_map_at_100_std value: 4.530095632746722 - type: nauc_map_at_10_diff1 value: 30.703580851962275 - type: nauc_map_at_10_max value: 27.45889128777842 - type: nauc_map_at_10_std value: 4.056332236709348 - type: nauc_map_at_1_diff1 value: 38.44336021108366 - type: nauc_map_at_1_max value: 31.341289082946698 - type: nauc_map_at_1_std value: 5.249357458733503 - type: nauc_map_at_20_diff1 value: 30.50519884637743 - type: nauc_map_at_20_max value: 27.340643104548395 - type: nauc_map_at_20_std value: 4.165692308941953 - type: nauc_map_at_3_diff1 value: 32.38602261885505 - type: nauc_map_at_3_max value: 28.903602549949543 - type: nauc_map_at_3_std value: 3.5402281277974756 - type: nauc_map_at_5_diff1 value: 32.2685825283353 - type: nauc_map_at_5_max value: 28.485087249150176 - type: nauc_map_at_5_std value: 3.8418506057303445 - type: nauc_mrr_at_1000_diff1 value: 30.308168307291954 - type: nauc_mrr_at_1000_max value: 26.895198553568438 - type: nauc_mrr_at_1000_std value: 6.332711766194871 - type: nauc_mrr_at_100_diff1 value: 30.366219069831494 - type: nauc_mrr_at_100_max value: 26.88024956005868 - type: nauc_mrr_at_100_std value: 6.328345475093812 - type: nauc_mrr_at_10_diff1 value: 30.60181659497291 - type: nauc_mrr_at_10_max value: 27.33947661988829 - type: nauc_mrr_at_10_std value: 5.98212349517898 - type: nauc_mrr_at_1_diff1 value: 38.01665824488639 - type: nauc_mrr_at_1_max value: 31.273295508014538 - type: nauc_mrr_at_1_std value: 7.49596621052432 - type: nauc_mrr_at_20_diff1 value: 30.504642171833616 - type: nauc_mrr_at_20_max value: 
27.093254296264142 - type: nauc_mrr_at_20_std value: 6.011940896215445 - type: nauc_mrr_at_3_diff1 value: 32.30298334779263 - type: nauc_mrr_at_3_max value: 28.46795259170204 - type: nauc_mrr_at_3_std value: 5.233276939737523 - type: nauc_mrr_at_5_diff1 value: 32.317520734292316 - type: nauc_mrr_at_5_max value: 28.31645764893187 - type: nauc_mrr_at_5_std value: 5.514394216402804 - type: nauc_ndcg_at_1000_diff1 value: 25.46804692303833 - type: nauc_ndcg_at_1000_max value: 24.577578434016004 - type: nauc_ndcg_at_1000_std value: 8.08099372903191 - type: nauc_ndcg_at_100_diff1 value: 25.7728600426837 - type: nauc_ndcg_at_100_max value: 23.852719795214735 - type: nauc_ndcg_at_100_std value: 7.271020641236757 - type: nauc_ndcg_at_10_diff1 value: 27.787864887098827 - type: nauc_ndcg_at_10_max value: 25.82070997315848 - type: nauc_ndcg_at_10_std value: 4.84958725429997 - type: nauc_ndcg_at_1_diff1 value: 38.01665824488639 - type: nauc_ndcg_at_1_max value: 31.273295508014538 - type: nauc_ndcg_at_1_std value: 7.49596621052432 - type: nauc_ndcg_at_20_diff1 value: 27.23687052702463 - type: nauc_ndcg_at_20_max value: 25.3030643349024 - type: nauc_ndcg_at_20_std value: 5.128184329356223 - type: nauc_ndcg_at_3_diff1 value: 30.94323024403614 - type: nauc_ndcg_at_3_max value: 28.112791463025488 - type: nauc_ndcg_at_3_std value: 3.4748257092667845 - type: nauc_ndcg_at_5_diff1 value: 30.979886062267525 - type: nauc_ndcg_at_5_max value: 27.832062407091833 - type: nauc_ndcg_at_5_std value: 4.066523891816962 - type: nauc_precision_at_1000_diff1 value: 13.717212581088436 - type: nauc_precision_at_1000_max value: 14.726337919465527 - type: nauc_precision_at_1000_std value: 19.286677279311952 - type: nauc_precision_at_100_diff1 value: 13.83440364507339 - type: nauc_precision_at_100_max value: 13.983610901499812 - type: nauc_precision_at_100_std value: 17.767107323199852 - type: nauc_precision_at_10_diff1 value: 18.989269379083463 - type: nauc_precision_at_10_max value: 20.291510121396815 - type: nauc_precision_at_10_std value: 8.518048232551553 - type: nauc_precision_at_1_diff1 value: 38.01665824488639 - type: nauc_precision_at_1_max value: 31.273295508014538 - type: nauc_precision_at_1_std value: 7.49596621052432 - type: nauc_precision_at_20_diff1 value: 18.381866045394073 - type: nauc_precision_at_20_max value: 18.90966326296592 - type: nauc_precision_at_20_std value: 9.141677018751377 - type: nauc_precision_at_3_diff1 value: 26.100613624838605 - type: nauc_precision_at_3_max value: 24.76218487581011 - type: nauc_precision_at_3_std value: 2.4322989886641495 - type: nauc_precision_at_5_diff1 value: 26.83172966704407 - type: nauc_precision_at_5_max value: 24.090343452479146 - type: nauc_precision_at_5_std value: 4.535854021501322 - type: nauc_recall_at_1000_diff1 value: 13.245456056842464 - type: nauc_recall_at_1000_max value: 19.61498051994092 - type: nauc_recall_at_1000_std value: 17.188990206491262 - type: nauc_recall_at_100_diff1 value: 14.025440613222711 - type: nauc_recall_at_100_max value: 15.06663046965985 - type: nauc_recall_at_100_std value: 12.610345211569749 - type: nauc_recall_at_10_diff1 value: 21.102550210495654 - type: nauc_recall_at_10_max value: 21.76066577972798 - type: nauc_recall_at_10_std value: 5.1852219341177115 - type: nauc_recall_at_1_diff1 value: 38.44336021108366 - type: nauc_recall_at_1_max value: 31.341289082946698 - type: nauc_recall_at_1_std value: 5.249357458733503 - type: nauc_recall_at_20_diff1 value: 19.281075192679307 - type: nauc_recall_at_20_max value: 20.050580691482935 - 
type: nauc_recall_at_20_std value: 5.836669306240979 - type: nauc_recall_at_3_diff1 value: 27.334543456325626 - type: nauc_recall_at_3_max value: 26.711101790009558 - type: nauc_recall_at_3_std value: 2.3329176939418037 - type: nauc_recall_at_5_diff1 value: 27.75488164284888 - type: nauc_recall_at_5_max value: 26.285171746330576 - type: nauc_recall_at_5_std value: 3.361376753158064 - type: ndcg_at_1 value: 10.351 - type: ndcg_at_10 value: 14.105 - type: ndcg_at_100 value: 16.765 - type: ndcg_at_1000 value: 19.220000000000002 - type: ndcg_at_20 value: 14.82 - type: ndcg_at_3 value: 12.398000000000001 - type: ndcg_at_5 value: 12.879999999999999 - type: precision_at_1 value: 10.351 - type: precision_at_10 value: 2.144 - type: precision_at_100 value: 0.373 - type: precision_at_1000 value: 0.062 - type: precision_at_20 value: 1.238 - type: precision_at_3 value: 5.114 - type: precision_at_5 value: 3.401 - type: recall_at_1 value: 9.766 - type: recall_at_10 value: 18.595 - type: recall_at_100 value: 31.669999999999998 - type: recall_at_1000 value: 50.659 - type: recall_at_20 value: 21.248 - type: recall_at_3 value: 13.876 - type: recall_at_5 value: 15.015 task: type: Retrieval - dataset: config: default name: MTEB CUADAffiliateLicenseLicenseeLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 73.73737373737373 - type: ap value: 65.8818399825594 - type: ap_weighted value: 65.8818399825594 - type: f1 value: 72.61993404956918 - type: f1_weighted value: 72.61993404956918 - type: main_score value: 73.73737373737373 task: type: Classification - dataset: config: default name: MTEB CUADAffiliateLicenseLicensorLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 79.54545454545453 - type: ap value: 73.12252964426878 - type: ap_weighted value: 73.12252964426878 - type: f1 value: 79.53488372093022 - type: f1_weighted value: 79.53488372093024 - type: main_score value: 79.54545454545453 task: type: Classification - dataset: config: default name: MTEB CUADAntiAssignmentLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 70.64846416382251 - type: ap value: 63.215973012261415 - type: ap_weighted value: 63.215973012261415 - type: f1 value: 68.89855743269304 - type: f1_weighted value: 68.89855743269304 - type: main_score value: 70.64846416382251 task: type: Classification - dataset: config: default name: MTEB CUADAuditRightsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 60.44407894736842 - type: ap value: 57.470171721677076 - type: ap_weighted value: 57.470171721677076 - type: f1 value: 57.63732113071247 - type: f1_weighted value: 57.63732113071247 - type: main_score value: 60.44407894736842 task: type: Classification - dataset: config: default name: MTEB CUADCapOnLiabilityLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 49.518459069020864 - type: ap value: 49.761431703402096 - type: ap_weighted value: 49.761431703402096 - type: f1 value: 49.48302433823829 - type: f1_weighted value: 49.48302433823827 - type: main_score value: 49.518459069020864 task: type: Classification - dataset: config: default name: MTEB 
CUADChangeOfControlLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 71.875 - type: ap value: 64.42982456140352 - type: ap_weighted value: 64.42982456140352 - type: f1 value: 70.87723707120934 - type: f1_weighted value: 70.8772370712093 - type: main_score value: 71.875 task: type: Classification - dataset: config: default name: MTEB CUADCompetitiveRestrictionExceptionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 53.181818181818194 - type: ap value: 51.65110565110565 - type: ap_weighted value: 51.65110565110565 - type: f1 value: 47.02513150204559 - type: f1_weighted value: 47.025131502045596 - type: main_score value: 53.181818181818194 task: type: Classification - dataset: config: default name: MTEB CUADCovenantNotToSueLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 67.53246753246754 - type: ap value: 60.65974025974026 - type: ap_weighted value: 60.65974025974026 - type: f1 value: 64.03885671586028 - type: f1_weighted value: 64.03885671586026 - type: main_score value: 67.53246753246754 task: type: Classification - dataset: config: default name: MTEB CUADEffectiveDateLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 56.35593220338983 - type: ap value: 53.54749704375246 - type: ap_weighted value: 53.54749704375246 - type: f1 value: 56.26090868196132 - type: f1_weighted value: 56.26090868196131 - type: main_score value: 56.35593220338983 task: type: Classification - dataset: config: default name: MTEB CUADExclusivityLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 61.154855643044606 - type: ap value: 56.35333840225783 - type: ap_weighted value: 56.35333840225783 - type: f1 value: 57.26109628910987 - type: f1_weighted value: 57.26109628910987 - type: main_score value: 61.154855643044606 task: type: Classification - dataset: config: default name: MTEB CUADExpirationDateLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 80.82191780821917 - type: ap value: 77.03374913905259 - type: ap_weighted value: 77.03374913905259 - type: f1 value: 80.66062530224343 - type: f1_weighted value: 80.66062530224343 - type: main_score value: 80.82191780821917 task: type: Classification - dataset: config: default name: MTEB CUADGoverningLawLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 92.12328767123289 - type: ap value: 88.44810149857499 - type: ap_weighted value: 88.44810149857499 - type: f1 value: 92.12245616092896 - type: f1_weighted value: 92.12245616092899 - type: main_score value: 92.12328767123289 task: type: Classification - dataset: config: default name: MTEB CUADIPOwnershipAssignmentLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 64.0625 - type: ap value: 59.78260869565217 - type: ap_weighted value: 59.78260869565217 - type: f1 value: 63.33748443337483 - type: f1_weighted value: 
63.33748443337485 - type: main_score value: 64.0625 task: type: Classification - dataset: config: default name: MTEB CUADInsuranceLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 80.3883495145631 - type: ap value: 76.65387764650838 - type: ap_weighted value: 76.65387764650838 - type: f1 value: 80.20173184889143 - type: f1_weighted value: 80.20173184889143 - type: main_score value: 80.3883495145631 task: type: Classification - dataset: config: default name: MTEB CUADIrrevocableOrPerpetualLicenseLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 78.21428571428572 - type: ap value: 70.19711163153788 - type: ap_weighted value: 70.19711163153788 - type: f1 value: 77.68807722955938 - type: f1_weighted value: 77.6880772295594 - type: main_score value: 78.21428571428572 task: type: Classification - dataset: config: default name: MTEB CUADJointIPOwnershipLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 85.9375 - type: ap value: 79.55607476635514 - type: ap_weighted value: 79.55607476635514 - type: f1 value: 85.89119015866969 - type: f1_weighted value: 85.89119015866969 - type: main_score value: 85.9375 task: type: Classification - dataset: config: default name: MTEB CUADLicenseGrantLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 72.56446991404013 - type: ap value: 65.06701026209069 - type: ap_weighted value: 65.06701026209069 - type: f1 value: 71.72168495320604 - type: f1_weighted value: 71.72168495320604 - type: main_score value: 72.56446991404013 task: type: Classification - dataset: config: default name: MTEB CUADLiquidatedDamagesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 80.45454545454544 - type: ap value: 73.2605583392985 - type: ap_weighted value: 73.2605583392985 - type: f1 value: 80.33713703726801 - type: f1_weighted value: 80.33713703726798 - type: main_score value: 80.45454545454544 task: type: Classification - dataset: config: default name: MTEB CUADMinimumCommitmentLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 75.51813471502591 - type: ap value: 68.84511159342107 - type: ap_weighted value: 68.84511159342107 - type: f1 value: 75.48815213647933 - type: f1_weighted value: 75.48815213647931 - type: main_score value: 75.51813471502591 task: type: Classification - dataset: config: default name: MTEB CUADMostFavoredNationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 73.4375 - type: ap value: 65.80668604651162 - type: ap_weighted value: 65.80668604651162 - type: f1 value: 72.62893081761007 - type: f1_weighted value: 72.62893081761007 - type: main_score value: 73.4375 task: type: Classification - dataset: config: default name: MTEB CUADNoSolicitOfCustomersLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 82.14285714285714 - type: ap value: 73.68421052631578 - 
type: ap_weighted value: 73.68421052631578 - type: f1 value: 81.55467720685114 - type: f1_weighted value: 81.55467720685111 - type: main_score value: 82.14285714285714 task: type: Classification - dataset: config: default name: MTEB CUADNoSolicitOfEmployeesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 88.02816901408453 - type: ap value: 81.23742454728371 - type: ap_weighted value: 81.23742454728371 - type: f1 value: 87.92698174543636 - type: f1_weighted value: 87.92698174543636 - type: main_score value: 88.02816901408453 task: type: Classification - dataset: config: default name: MTEB CUADNonCompeteLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 53.84615384615385 - type: ap value: 52.05651491365778 - type: ap_weighted value: 52.05651491365778 - type: f1 value: 53.70967410723452 - type: f1_weighted value: 53.70967410723452 - type: main_score value: 53.84615384615385 task: type: Classification - dataset: config: default name: MTEB CUADNonDisparagementLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 82.0 - type: ap value: 73.75757575757575 - type: ap_weighted value: 73.75757575757575 - type: f1 value: 81.5270935960591 - type: f1_weighted value: 81.5270935960591 - type: main_score value: 82.0 task: type: Classification - dataset: config: default name: MTEB CUADNonTransferableLicenseLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 72.69372693726936 - type: ap value: 68.36025144171039 - type: ap_weighted value: 68.36025144171039 - type: f1 value: 72.20320188509251 - type: f1_weighted value: 72.20320188509251 - type: main_score value: 72.69372693726936 task: type: Classification - dataset: config: default name: MTEB CUADNoticePeriodToTerminateRenewalLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 81.53153153153154 - type: ap value: 73.22254687119553 - type: ap_weighted value: 73.22254687119553 - type: f1 value: 81.003861003861 - type: f1_weighted value: 81.003861003861 - type: main_score value: 81.53153153153154 task: type: Classification - dataset: config: default name: MTEB CUADPostTerminationServicesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 59.52970297029702 - type: ap value: 55.494262149873045 - type: ap_weighted value: 55.494262149873045 - type: f1 value: 58.91289033889372 - type: f1_weighted value: 58.91289033889372 - type: main_score value: 59.52970297029702 task: type: Classification - dataset: config: default name: MTEB CUADPriceRestrictionsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 86.95652173913044 - type: ap value: 80.11272141706925 - type: ap_weighted value: 80.11272141706925 - type: f1 value: 86.85714285714286 - type: f1_weighted value: 86.85714285714286 - type: main_score value: 86.95652173913044 task: type: Classification - dataset: config: default name: MTEB CUADRenewalTermLegalBenchClassification (default) revision: 
12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 81.86528497409327 - type: ap value: 74.56574832804549 - type: ap_weighted value: 74.56574832804549 - type: f1 value: 81.72348484848484 - type: f1_weighted value: 81.72348484848484 - type: main_score value: 81.86528497409327 task: type: Classification - dataset: config: default name: MTEB CUADRevenueProfitSharingLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 78.9405684754522 - type: ap value: 75.88346617170725 - type: ap_weighted value: 75.88346617170725 - type: f1 value: 78.5609048595758 - type: f1_weighted value: 78.5609048595758 - type: main_score value: 78.9405684754522 task: type: Classification - dataset: config: default name: MTEB CUADRofrRofoRofnLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 67.53623188405797 - type: ap value: 61.059567408520365 - type: ap_weighted value: 61.059567408520365 - type: f1 value: 66.55819428096656 - type: f1_weighted value: 66.55819428096656 - type: main_score value: 67.53623188405797 task: type: Classification - dataset: config: default name: MTEB CUADSourceCodeEscrowLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 79.66101694915253 - type: ap value: 73.06967984934086 - type: ap_weighted value: 73.06967984934086 - type: f1 value: 79.63761863675583 - type: f1_weighted value: 79.63761863675583 - type: main_score value: 79.66101694915253 task: type: Classification - dataset: config: default name: MTEB CUADTerminationForConvenienceLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 82.55813953488372 - type: ap value: 76.9289284938057 - type: ap_weighted value: 76.9289284938057 - type: f1 value: 82.5580452030568 - type: f1_weighted value: 82.55804520305684 - type: main_score value: 82.55813953488372 task: type: Classification - dataset: config: default name: MTEB CUADThirdPartyBeneficiaryLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 86.76470588235293 - type: ap value: 82.30837789661318 - type: ap_weighted value: 82.30837789661318 - type: f1 value: 86.76184295911746 - type: f1_weighted value: 86.76184295911744 - type: main_score value: 86.76470588235293 task: type: Classification - dataset: config: default name: MTEB CUADUncappedLiabilityLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 78.91156462585033 - type: ap value: 70.63036269784295 - type: ap_weighted value: 70.63036269784295 - type: f1 value: 78.23054507237377 - type: f1_weighted value: 78.23054507237376 - type: main_score value: 78.91156462585033 task: type: Classification - dataset: config: default name: MTEB CUADUnlimitedAllYouCanEatLicenseLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 75.0 - type: ap value: 67.5 - type: ap_weighted value: 67.5 - type: f1 value: 74.60317460317461 - type: f1_weighted value: 74.60317460317461 - type: main_score value: 75.0 task: 
type: Classification - dataset: config: default name: MTEB CUADVolumeRestrictionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 68.32298136645963 - type: ap value: 67.47730530339226 - type: ap_weighted value: 67.47730530339226 - type: f1 value: 65.23267138078504 - type: f1_weighted value: 65.23267138078504 - type: main_score value: 68.32298136645963 task: type: Classification - dataset: config: default name: MTEB CUADWarrantyDurationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 77.18749999999999 - type: ap value: 70.84930981595093 - type: ap_weighted value: 70.84930981595093 - type: f1 value: 77.18549481888057 - type: f1_weighted value: 77.18549481888057 - type: main_score value: 77.18749999999999 task: type: Classification - dataset: config: default name: MTEB CanadaTaxCourtOutcomesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 45.90163934426229 - type: f1 value: 41.86755057433674 - type: f1_weighted value: 52.49140373560517 - type: main_score value: 45.90163934426229 task: type: Classification - dataset: config: default name: MTEB ClimateFEVER (default) revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380 split: test type: mteb/climate-fever metrics: - type: main_score value: 5.558 - type: map_at_1 value: 2.099 - type: map_at_10 value: 3.6790000000000003 - type: map_at_100 value: 4.021 - type: map_at_1000 value: 4.083 - type: map_at_20 value: 3.843 - type: map_at_3 value: 3.107 - type: map_at_5 value: 3.398 - type: mrr_at_1 value: 4.364820846905538 - type: mrr_at_10 value: 7.478723954293985 - type: mrr_at_100 value: 8.041420875649584 - type: mrr_at_1000 value: 8.120754871238086 - type: mrr_at_20 value: 7.760020669319687 - type: mrr_at_3 value: 6.438653637350702 - type: mrr_at_5 value: 7.028230184581975 - type: nauc_map_at_1000_diff1 value: 26.989583880363355 - type: nauc_map_at_1000_max value: 19.651932768180743 - type: nauc_map_at_1000_std value: 28.682949493303113 - type: nauc_map_at_100_diff1 value: 27.123176019982058 - type: nauc_map_at_100_max value: 19.598769909181605 - type: nauc_map_at_100_std value: 28.431702256094276 - type: nauc_map_at_10_diff1 value: 28.090105463174243 - type: nauc_map_at_10_max value: 19.316825624764327 - type: nauc_map_at_10_std value: 27.879940536760657 - type: nauc_map_at_1_diff1 value: 38.86635884960338 - type: nauc_map_at_1_max value: 23.66935741341746 - type: nauc_map_at_1_std value: 25.594810836643088 - type: nauc_map_at_20_diff1 value: 27.932097656688153 - type: nauc_map_at_20_max value: 19.705436224378094 - type: nauc_map_at_20_std value: 28.005161889024915 - type: nauc_map_at_3_diff1 value: 31.343508506514787 - type: nauc_map_at_3_max value: 17.617676175693653 - type: nauc_map_at_3_std value: 27.372138781240235 - type: nauc_map_at_5_diff1 value: 29.21950281006726 - type: nauc_map_at_5_max value: 18.039174755804527 - type: nauc_map_at_5_std value: 26.278075304640147 - type: nauc_mrr_at_1000_diff1 value: 21.017635057347793 - type: nauc_mrr_at_1000_max value: 20.84007387790555 - type: nauc_mrr_at_1000_std value: 24.684523933084744 - type: nauc_mrr_at_100_diff1 value: 21.051698171004 - type: nauc_mrr_at_100_max value: 20.79459868740917 - type: nauc_mrr_at_100_std value: 24.62077347403019 - type: nauc_mrr_at_10_diff1 value: 
21.926692626233184 - type: nauc_mrr_at_10_max value: 20.868215747512338 - type: nauc_mrr_at_10_std value: 24.10229968572614 - type: nauc_mrr_at_1_diff1 value: 32.12007148649377 - type: nauc_mrr_at_1_max value: 25.428643110489634 - type: nauc_mrr_at_1_std value: 19.946229629460547 - type: nauc_mrr_at_20_diff1 value: 21.617935715645125 - type: nauc_mrr_at_20_max value: 21.046484288936377 - type: nauc_mrr_at_20_std value: 24.297367370651244 - type: nauc_mrr_at_3_diff1 value: 24.094623370861303 - type: nauc_mrr_at_3_max value: 19.713811945549196 - type: nauc_mrr_at_3_std value: 23.568839477173757 - type: nauc_mrr_at_5_diff1 value: 22.3010395396166 - type: nauc_mrr_at_5_max value: 20.569180907488864 - type: nauc_mrr_at_5_std value: 23.15568498862624 - type: nauc_ndcg_at_1000_diff1 value: 17.73440786298746 - type: nauc_ndcg_at_1000_max value: 21.164734898511266 - type: nauc_ndcg_at_1000_std value: 32.20409116224434 - type: nauc_ndcg_at_100_diff1 value: 19.491657641927414 - type: nauc_ndcg_at_100_max value: 19.73425182329514 - type: nauc_ndcg_at_100_std value: 29.633697891721162 - type: nauc_ndcg_at_10_diff1 value: 23.236666416810397 - type: nauc_ndcg_at_10_max value: 19.859686062177957 - type: nauc_ndcg_at_10_std value: 27.607123060751103 - type: nauc_ndcg_at_1_diff1 value: 32.12007148649377 - type: nauc_ndcg_at_1_max value: 25.428643110489634 - type: nauc_ndcg_at_1_std value: 19.946229629460547 - type: nauc_ndcg_at_20_diff1 value: 22.766492789770794 - type: nauc_ndcg_at_20_max value: 20.68653243447615 - type: nauc_ndcg_at_20_std value: 27.80598558578259 - type: nauc_ndcg_at_3_diff1 value: 26.430176145767764 - type: nauc_ndcg_at_3_max value: 17.178786585572514 - type: nauc_ndcg_at_3_std value: 26.551392559385945 - type: nauc_ndcg_at_5_diff1 value: 24.359838503352492 - type: nauc_ndcg_at_5_max value: 18.139249994062958 - type: nauc_ndcg_at_5_std value: 25.04579441208386 - type: nauc_precision_at_1000_diff1 value: 3.5941753705590855 - type: nauc_precision_at_1000_max value: 23.295418071068074 - type: nauc_precision_at_1000_std value: 37.823737794558035 - type: nauc_precision_at_100_diff1 value: 7.711362755764835 - type: nauc_precision_at_100_max value: 21.000892665907962 - type: nauc_precision_at_100_std value: 35.56596455340648 - type: nauc_precision_at_10_diff1 value: 14.603402002580449 - type: nauc_precision_at_10_max value: 22.112935744796918 - type: nauc_precision_at_10_std value: 30.665912790934176 - type: nauc_precision_at_1_diff1 value: 32.12007148649377 - type: nauc_precision_at_1_max value: 25.428643110489634 - type: nauc_precision_at_1_std value: 19.946229629460547 - type: nauc_precision_at_20_diff1 value: 14.716417574100266 - type: nauc_precision_at_20_max value: 23.926389785704096 - type: nauc_precision_at_20_std value: 30.69168946837732 - type: nauc_precision_at_3_diff1 value: 18.67632522519008 - type: nauc_precision_at_3_max value: 15.461714107477059 - type: nauc_precision_at_3_std value: 24.408621037612654 - type: nauc_precision_at_5_diff1 value: 14.433484685750017 - type: nauc_precision_at_5_max value: 18.682282289432337 - type: nauc_precision_at_5_std value: 24.03615092175192 - type: nauc_recall_at_1000_diff1 value: 7.5569286948470955 - type: nauc_recall_at_1000_max value: 18.988365246129565 - type: nauc_recall_at_1000_std value: 32.73921563811838 - type: nauc_recall_at_100_diff1 value: 12.11778715469688 - type: nauc_recall_at_100_max value: 16.608390547005357 - type: nauc_recall_at_100_std value: 29.88269190630321 - type: nauc_recall_at_10_diff1 value: 20.008263704255814 - type: 
nauc_recall_at_10_max value: 19.07669508851797 - type: nauc_recall_at_10_std value: 28.95827325426037 - type: nauc_recall_at_1_diff1 value: 38.86635884960338 - type: nauc_recall_at_1_max value: 23.66935741341746 - type: nauc_recall_at_1_std value: 25.594810836643088 - type: nauc_recall_at_20_diff1 value: 19.54693652826011 - type: nauc_recall_at_20_max value: 20.582517703572815 - type: nauc_recall_at_20_std value: 28.52204311008764 - type: nauc_recall_at_3_diff1 value: 25.95757457673112 - type: nauc_recall_at_3_max value: 13.802011828871594 - type: nauc_recall_at_3_std value: 28.160988060479163 - type: nauc_recall_at_5_diff1 value: 21.718874199874673 - type: nauc_recall_at_5_max value: 15.812170162395233 - type: nauc_recall_at_5_std value: 24.970427791223297 - type: ndcg_at_1 value: 4.365 - type: ndcg_at_10 value: 5.558 - type: ndcg_at_100 value: 7.637 - type: ndcg_at_1000 value: 9.700000000000001 - type: ndcg_at_20 value: 6.215 - type: ndcg_at_3 value: 4.314 - type: ndcg_at_5 value: 4.795 - type: precision_at_1 value: 4.365 - type: precision_at_10 value: 1.6740000000000002 - type: precision_at_100 value: 0.384 - type: precision_at_1000 value: 0.076 - type: precision_at_20 value: 1.111 - type: precision_at_3 value: 3.084 - type: precision_at_5 value: 2.423 - type: recall_at_1 value: 2.099 - type: recall_at_10 value: 7.371999999999999 - type: recall_at_100 value: 14.976999999999999 - type: recall_at_1000 value: 27.328000000000003 - type: recall_at_20 value: 9.288 - type: recall_at_3 value: 4.299 - type: recall_at_5 value: 5.509 task: type: Retrieval - dataset: config: default name: MTEB ContractNLIConfidentialityOfAgreementLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 64.63414634146342 - type: ap value: 59.62772785622593 - type: ap_weighted value: 59.62772785622593 - type: f1 value: 64.58674609084142 - type: f1_weighted value: 64.58674609084142 - type: main_score value: 64.63414634146342 task: type: Classification - dataset: config: default name: MTEB ContractNLIExplicitIdentificationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 56.88073394495412 - type: ap value: 21.457096600107935 - type: ap_weighted value: 21.457096600107935 - type: f1 value: 50.91501389288109 - type: f1_weighted value: 61.74750556638211 - type: main_score value: 56.88073394495412 task: type: Classification - dataset: config: default name: MTEB ContractNLIInclusionOfVerballyConveyedInformationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 60.431654676258994 - type: ap value: 55.25139990309542 - type: ap_weighted value: 55.25139990309542 - type: f1 value: 60.4234611999793 - type: f1_weighted value: 60.435751414398844 - type: main_score value: 60.431654676258994 task: type: Classification - dataset: config: default name: MTEB ContractNLILimitedUseLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 73.07692307692307 - type: ap value: 63.954526895988565 - type: ap_weighted value: 63.954526895988565 - type: f1 value: 73.01454916133815 - type: f1_weighted value: 73.10187264315704 - type: main_score value: 73.07692307692307 task: type: Classification - dataset: config: default name: MTEB 
ContractNLINoLicensingLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 82.09876543209876 - type: ap value: 75.19529587058324 - type: ap_weighted value: 75.19529587058324 - type: f1 value: 82.08169647965215 - type: f1_weighted value: 82.0748688986735 - type: main_score value: 82.09876543209876 task: type: Classification - dataset: config: default name: MTEB ContractNLINoticeOnCompelledDisclosureLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 78.87323943661971 - type: ap value: 72.12365099689045 - type: ap_weighted value: 72.12365099689045 - type: f1 value: 78.83545310015897 - type: f1_weighted value: 78.83545310015897 - type: main_score value: 78.87323943661971 task: type: Classification - dataset: config: default name: MTEB ContractNLIPermissibleAcquirementOfSimilarInformationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 72.47191011235954 - type: ap value: 64.74719101123597 - type: ap_weighted value: 64.74719101123597 - type: f1 value: 71.08377813877931 - type: f1_weighted value: 71.08377813877931 - type: main_score value: 72.47191011235954 task: type: Classification - dataset: config: default name: MTEB ContractNLIPermissibleCopyLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 41.379310344827594 - type: ap value: 19.168356997971607 - type: ap_weighted value: 19.168356997971607 - type: f1 value: 38.75776397515528 - type: f1_weighted value: 46.18547868922682 - type: main_score value: 41.379310344827594 task: type: Classification - dataset: config: default name: MTEB ContractNLIPermissibleDevelopmentOfSimilarInformationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 71.3235294117647 - type: ap value: 65.14279624893436 - type: ap_weighted value: 65.14279624893436 - type: f1 value: 71.3219789132198 - type: f1_weighted value: 71.3219789132198 - type: main_score value: 71.3235294117647 task: type: Classification - dataset: config: default name: MTEB ContractNLIPermissiblePostAgreementPossessionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 39.63963963963964 - type: ap value: 25.290389847351868 - type: ap_weighted value: 25.290389847351868 - type: f1 value: 39.56115400243804 - type: f1_weighted value: 40.64033151396011 - type: main_score value: 39.63963963963964 task: type: Classification - dataset: config: default name: MTEB ContractNLIReturnOfConfidentialInformationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 71.21212121212122 - type: ap value: 63.13978196600149 - type: ap_weighted value: 63.13978196600149 - type: f1 value: 70.88460645460877 - type: f1_weighted value: 70.7910308096052 - type: main_score value: 71.21212121212122 task: type: Classification - dataset: config: default name: MTEB ContractNLISharingWithEmployeesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: 
accuracy value: 73.52941176470588 - type: ap value: 66.24576478752499 - type: ap_weighted value: 66.24576478752499 - type: f1 value: 71.13098607494621 - type: f1_weighted value: 71.42467085328414 - type: main_score value: 73.52941176470588 task: type: Classification - dataset: config: default name: MTEB ContractNLISharingWithThirdPartiesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 68.88888888888889 - type: ap value: 51.569719636083924 - type: ap_weighted value: 51.569719636083924 - type: f1 value: 66.28762541806019 - type: f1_weighted value: 68.26458565589 - type: main_score value: 68.88888888888889 task: type: Classification - dataset: config: default name: MTEB ContractNLISurvivalOfObligationsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 49.044585987261144 - type: ap value: 47.085151843488305 - type: ap_weighted value: 47.085151843488305 - type: f1 value: 48.28722002635046 - type: f1_weighted value: 47.92846772907698 - type: main_score value: 49.044585987261144 task: type: Classification - dataset: config: default name: MTEB CorporateLobbyingLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 70.40816326530613 - type: ap value: 29.59183673469388 - type: ap_weighted value: 29.59183673469388 - type: f1 value: 41.31736526946107 - type: f1_weighted value: 58.181595991690074 - type: main_score value: 70.40816326530613 task: type: Classification - dataset: config: default name: MTEB CyrillicTurkicLangClassification (default) revision: e42d330f33d65b7b72dfd408883daf1661f06f18 split: test type: tatiana-merz/cyrillic_turkic_langs metrics: - type: accuracy value: 61.19140625 - type: f1 value: 59.377085898563365 - type: f1_weighted value: 59.385881195883925 - type: main_score value: 61.19140625 task: type: Classification - dataset: config: default name: MTEB DBPedia (default) revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: dev type: mteb/dbpedia metrics: - type: main_score value: 7.161 - type: map_at_1 value: 0.599 - type: map_at_10 value: 2.243 - type: map_at_100 value: 3.1189999999999998 - type: map_at_1000 value: 3.488 - type: map_at_20 value: 2.522 - type: map_at_3 value: 1.397 - type: map_at_5 value: 1.951 - type: mrr_at_1 value: 8.955223880597014 - type: mrr_at_10 value: 18.287728026533994 - type: mrr_at_100 value: 18.978113584928742 - type: mrr_at_1000 value: 19.053758841865573 - type: mrr_at_20 value: 18.61199952617863 - type: mrr_at_3 value: 14.676616915422885 - type: mrr_at_5 value: 17.06467661691542 - type: nauc_map_at_1000_diff1 value: -2.930033724497058 - type: nauc_map_at_1000_max value: 3.5995430754716904 - type: nauc_map_at_1000_std value: 5.61203479120595 - type: nauc_map_at_100_diff1 value: -5.4531441891668795 - type: nauc_map_at_100_max value: -0.0055832626529105185 - type: nauc_map_at_100_std value: 3.439773391163607 - type: nauc_map_at_10_diff1 value: -14.3319757103363 - type: nauc_map_at_10_max value: -9.021024411612359 - type: nauc_map_at_10_std value: 1.0275253768638628 - type: nauc_map_at_1_diff1 value: 22.607506151253776 - type: nauc_map_at_1_max value: 10.921408762597743 - type: nauc_map_at_1_std value: -2.0177080867009054 - type: nauc_map_at_20_diff1 value: -11.794157692538237 - type: nauc_map_at_20_max value: -6.44484538876576 - type: 
nauc_map_at_20_std value: 1.039851694368717 - type: nauc_map_at_3_diff1 value: -7.469347804676409 - type: nauc_map_at_3_max value: -5.393936026725367 - type: nauc_map_at_3_std value: 9.280689460783249 - type: nauc_map_at_5_diff1 value: -15.955321054747321 - type: nauc_map_at_5_max value: -9.855092671604572 - type: nauc_map_at_5_std value: 0.06180279408320787 - type: nauc_mrr_at_1000_diff1 value: -2.821396337906413 - type: nauc_mrr_at_1000_max value: 5.972877383405757 - type: nauc_mrr_at_1000_std value: -1.6896049835004336 - type: nauc_mrr_at_100_diff1 value: -2.8632536639982105 - type: nauc_mrr_at_100_max value: 5.973020236396294 - type: nauc_mrr_at_100_std value: -1.809958349128643 - type: nauc_mrr_at_10_diff1 value: -4.515463799529893 - type: nauc_mrr_at_10_max value: 5.030384515417533 - type: nauc_mrr_at_10_std value: -1.547480529694615 - type: nauc_mrr_at_1_diff1 value: 8.719512377821816 - type: nauc_mrr_at_1_max value: 16.272382792823382 - type: nauc_mrr_at_1_std value: -3.187491782487964 - type: nauc_mrr_at_20_diff1 value: -2.908929872190089 - type: nauc_mrr_at_20_max value: 6.58409584409903 - type: nauc_mrr_at_20_std value: -1.1174417761572792 - type: nauc_mrr_at_3_diff1 value: -1.6595580931793985 - type: nauc_mrr_at_3_max value: 9.640215787928428 - type: nauc_mrr_at_3_std value: 2.889288978742377 - type: nauc_mrr_at_5_diff1 value: -6.89298539225687 - type: nauc_mrr_at_5_max value: 6.578043390443974 - type: nauc_mrr_at_5_std value: -0.6581933130437475 - type: nauc_ndcg_at_1000_diff1 value: 3.75625342513744 - type: nauc_ndcg_at_1000_max value: 6.952585708583143 - type: nauc_ndcg_at_1000_std value: 5.400684775811628 - type: nauc_ndcg_at_100_diff1 value: -2.242186789473446 - type: nauc_ndcg_at_100_max value: 1.7125259047701242 - type: nauc_ndcg_at_100_std value: -0.6824733710981048 - type: nauc_ndcg_at_10_diff1 value: -11.969827974466098 - type: nauc_ndcg_at_10_max value: -4.424965429405649 - type: nauc_ndcg_at_10_std value: 0.03592313276976773 - type: nauc_ndcg_at_1_diff1 value: -4.197220327746547 - type: nauc_ndcg_at_1_max value: 9.247135683163954 - type: nauc_ndcg_at_1_std value: -6.671985136155276 - type: nauc_ndcg_at_20_diff1 value: -8.358422632396593 - type: nauc_ndcg_at_20_max value: -1.0551974757194074 - type: nauc_ndcg_at_20_std value: 2.0508581550409524 - type: nauc_ndcg_at_3_diff1 value: -7.53212458402589 - type: nauc_ndcg_at_3_max value: 3.6347588818172336 - type: nauc_ndcg_at_3_std value: 5.073680163820697 - type: nauc_ndcg_at_5_diff1 value: -17.183713921651613 - type: nauc_ndcg_at_5_max value: -2.598662858319381 - type: nauc_ndcg_at_5_std value: -0.4734708395726036 - type: nauc_precision_at_1000_diff1 value: 22.034829237918075 - type: nauc_precision_at_1000_max value: 29.133045600628414 - type: nauc_precision_at_1000_std value: 22.48207630228867 - type: nauc_precision_at_100_diff1 value: 22.17246050117164 - type: nauc_precision_at_100_max value: 25.497860199414003 - type: nauc_precision_at_100_std value: 14.10941839109608 - type: nauc_precision_at_10_diff1 value: -2.3976462009254527 - type: nauc_precision_at_10_max value: 3.2185747947259737 - type: nauc_precision_at_10_std value: 1.1160090019272848 - type: nauc_precision_at_1_diff1 value: 8.719512377821816 - type: nauc_precision_at_1_max value: 16.272382792823382 - type: nauc_precision_at_1_std value: -3.187491782487964 - type: nauc_precision_at_20_diff1 value: 8.125877087406765 - type: nauc_precision_at_20_max value: 14.004634012058606 - type: nauc_precision_at_20_std value: 6.076987698320296 - type: 
nauc_precision_at_3_diff1 value: -5.415944490965941 - type: nauc_precision_at_3_max value: 6.0110244505222 - type: nauc_precision_at_3_std value: 6.0205421596952675 - type: nauc_precision_at_5_diff1 value: -19.55829195099795 - type: nauc_precision_at_5_max value: -2.3847548504000993 - type: nauc_precision_at_5_std value: -4.296125770063572 - type: nauc_recall_at_1000_diff1 value: 5.793923275597914 - type: nauc_recall_at_1000_max value: 2.365078190964481 - type: nauc_recall_at_1000_std value: 3.5546888704254744 - type: nauc_recall_at_100_diff1 value: 1.652314810086157 - type: nauc_recall_at_100_max value: 1.2466358966197024 - type: nauc_recall_at_100_std value: -5.516640557428562 - type: nauc_recall_at_10_diff1 value: -18.83385802183443 - type: nauc_recall_at_10_max value: -15.04302952000884 - type: nauc_recall_at_10_std value: -0.9615025531726922 - type: nauc_recall_at_1_diff1 value: 22.607506151253776 - type: nauc_recall_at_1_max value: 10.921408762597743 - type: nauc_recall_at_1_std value: -2.0177080867009054 - type: nauc_recall_at_20_diff1 value: -8.960549697900921 - type: nauc_recall_at_20_max value: -6.8364201397227164 - type: nauc_recall_at_20_std value: -1.2091707122721411 - type: nauc_recall_at_3_diff1 value: -17.196135512311084 - type: nauc_recall_at_3_max value: -10.816815002699384 - type: nauc_recall_at_3_std value: 12.535755202753904 - type: nauc_recall_at_5_diff1 value: -23.856486271404066 - type: nauc_recall_at_5_max value: -13.129773406696268 - type: nauc_recall_at_5_std value: -2.885196394596191 - type: ndcg_at_1 value: 6.715999999999999 - type: ndcg_at_10 value: 7.161 - type: ndcg_at_100 value: 9.506 - type: ndcg_at_1000 value: 14.194 - type: ndcg_at_20 value: 6.969 - type: ndcg_at_3 value: 7.285 - type: ndcg_at_5 value: 7.436 - type: precision_at_1 value: 8.955 - type: precision_at_10 value: 6.866 - type: precision_at_100 value: 2.343 - type: precision_at_1000 value: 0.557 - type: precision_at_20 value: 5.0 - type: precision_at_3 value: 9.453 - type: precision_at_5 value: 8.955 - type: recall_at_1 value: 0.599 - type: recall_at_10 value: 5.234 - type: recall_at_100 value: 14.610999999999999 - type: recall_at_1000 value: 31.723000000000003 - type: recall_at_20 value: 6.797000000000001 - type: recall_at_3 value: 2.1239999999999997 - type: recall_at_5 value: 3.836 task: type: Retrieval - dataset: config: default name: MTEB DBPedia (default) revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659 split: test type: mteb/dbpedia metrics: - type: main_score value: 9.612 - type: map_at_1 value: 1.5150000000000001 - type: map_at_10 value: 3.324 - type: map_at_100 value: 4.593 - type: map_at_1000 value: 4.942 - type: map_at_20 value: 3.775 - type: map_at_3 value: 2.349 - type: map_at_5 value: 2.83 - type: mrr_at_1 value: 17.75 - type: mrr_at_10 value: 25.455257936507948 - type: mrr_at_100 value: 26.384386588195795 - type: mrr_at_1000 value: 26.43428730177263 - type: mrr_at_20 value: 26.012663071147983 - type: mrr_at_3 value: 22.916666666666668 - type: mrr_at_5 value: 24.42916666666667 - type: nauc_map_at_1000_diff1 value: 22.13041079857 - type: nauc_map_at_1000_max value: 30.847169046279717 - type: nauc_map_at_1000_std value: 26.662372161640164 - type: nauc_map_at_100_diff1 value: 22.33437365695696 - type: nauc_map_at_100_max value: 30.631982988659413 - type: nauc_map_at_100_std value: 24.343041349757826 - type: nauc_map_at_10_diff1 value: 24.027517719649303 - type: nauc_map_at_10_max value: 25.07712884251914 - type: nauc_map_at_10_std value: 13.947979384184976 - type: 
nauc_map_at_1_diff1 value: 36.83267850021598 - type: nauc_map_at_1_max value: 19.169430946850284 - type: nauc_map_at_1_std value: 9.884774862276792 - type: nauc_map_at_20_diff1 value: 23.514668795309415 - type: nauc_map_at_20_max value: 27.504950445908978 - type: nauc_map_at_20_std value: 17.094975030047124 - type: nauc_map_at_3_diff1 value: 26.34278610573698 - type: nauc_map_at_3_max value: 20.845843284715972 - type: nauc_map_at_3_std value: 7.67049397964597 - type: nauc_map_at_5_diff1 value: 25.7750795640811 - type: nauc_map_at_5_max value: 22.947480091712098 - type: nauc_map_at_5_std value: 11.721230195408548 - type: nauc_mrr_at_1000_diff1 value: 22.232372488450842 - type: nauc_mrr_at_1000_max value: 27.572890316358283 - type: nauc_mrr_at_1000_std value: 16.214637981707586 - type: nauc_mrr_at_100_diff1 value: 22.236444609236038 - type: nauc_mrr_at_100_max value: 27.58760243571819 - type: nauc_mrr_at_100_std value: 16.244413870712897 - type: nauc_mrr_at_10_diff1 value: 22.225463768969977 - type: nauc_mrr_at_10_max value: 28.085279372515014 - type: nauc_mrr_at_10_std value: 16.63553736106648 - type: nauc_mrr_at_1_diff1 value: 29.84035077607877 - type: nauc_mrr_at_1_max value: 29.694489641199347 - type: nauc_mrr_at_1_std value: 13.521637546163495 - type: nauc_mrr_at_20_diff1 value: 22.04153237789325 - type: nauc_mrr_at_20_max value: 27.694203519607907 - type: nauc_mrr_at_20_std value: 16.41753082494305 - type: nauc_mrr_at_3_diff1 value: 23.699732601185406 - type: nauc_mrr_at_3_max value: 28.552272889924087 - type: nauc_mrr_at_3_std value: 15.054097838038286 - type: nauc_mrr_at_5_diff1 value: 23.127326455282443 - type: nauc_mrr_at_5_max value: 28.769272111978832 - type: nauc_mrr_at_5_std value: 16.113310297737975 - type: nauc_ndcg_at_1000_diff1 value: 19.30064409197478 - type: nauc_ndcg_at_1000_max value: 28.102160223624878 - type: nauc_ndcg_at_1000_std value: 30.203518553202162 - type: nauc_ndcg_at_100_diff1 value: 18.61374183566408 - type: nauc_ndcg_at_100_max value: 26.626236693773404 - type: nauc_ndcg_at_100_std value: 25.742758699186076 - type: nauc_ndcg_at_10_diff1 value: 22.519496459830016 - type: nauc_ndcg_at_10_max value: 29.403797316052678 - type: nauc_ndcg_at_10_std value: 20.893386965358616 - type: nauc_ndcg_at_1_diff1 value: 32.866635298438084 - type: nauc_ndcg_at_1_max value: 26.59719751655438 - type: nauc_ndcg_at_1_std value: 11.114394574061539 - type: nauc_ndcg_at_20_diff1 value: 21.157000991633115 - type: nauc_ndcg_at_20_max value: 27.740565719664534 - type: nauc_ndcg_at_20_std value: 21.639809971682443 - type: nauc_ndcg_at_3_diff1 value: 25.11861929994868 - type: nauc_ndcg_at_3_max value: 30.05796948174576 - type: nauc_ndcg_at_3_std value: 15.558218990994382 - type: nauc_ndcg_at_5_diff1 value: 23.56633730677446 - type: nauc_ndcg_at_5_max value: 29.407157319632233 - type: nauc_ndcg_at_5_std value: 18.567271816504054 - type: nauc_precision_at_1000_diff1 value: 15.34548548807785 - type: nauc_precision_at_1000_max value: 10.572226641262324 - type: nauc_precision_at_1000_std value: 29.1034314360236 - type: nauc_precision_at_100_diff1 value: 15.716430228733962 - type: nauc_precision_at_100_max value: 29.095076486854232 - type: nauc_precision_at_100_std value: 38.5066690028862 - type: nauc_precision_at_10_diff1 value: 19.68952528017596 - type: nauc_precision_at_10_max value: 36.890169328577436 - type: nauc_precision_at_10_std value: 30.965796095297055 - type: nauc_precision_at_1_diff1 value: 29.84035077607877 - type: nauc_precision_at_1_max value: 29.694489641199347 - type: 
nauc_precision_at_1_std value: 13.521637546163495 - type: nauc_precision_at_20_diff1 value: 18.030808015274253 - type: nauc_precision_at_20_max value: 37.61603054850129 - type: nauc_precision_at_20_std value: 34.160861586371816 - type: nauc_precision_at_3_diff1 value: 20.899695298609572 - type: nauc_precision_at_3_max value: 35.736648108449906 - type: nauc_precision_at_3_std value: 21.012939343933635 - type: nauc_precision_at_5_diff1 value: 20.038574686656855 - type: nauc_precision_at_5_max value: 37.244225604024464 - type: nauc_precision_at_5_std value: 27.105877764557317 - type: nauc_recall_at_1000_diff1 value: 7.621037010770166 - type: nauc_recall_at_1000_max value: 14.556069262959875 - type: nauc_recall_at_1000_std value: 24.912834855259458 - type: nauc_recall_at_100_diff1 value: 5.640854515267624 - type: nauc_recall_at_100_max value: 12.319243091931583 - type: nauc_recall_at_100_std value: 18.20593364111766 - type: nauc_recall_at_10_diff1 value: 9.625612977495116 - type: nauc_recall_at_10_max value: 17.05920473206263 - type: nauc_recall_at_10_std value: 10.7221437835498 - type: nauc_recall_at_1_diff1 value: 36.83267850021598 - type: nauc_recall_at_1_max value: 19.169430946850284 - type: nauc_recall_at_1_std value: 9.884774862276792 - type: nauc_recall_at_20_diff1 value: 8.05059067573258 - type: nauc_recall_at_20_max value: 15.8154139120262 - type: nauc_recall_at_20_std value: 12.679202204644218 - type: nauc_recall_at_3_diff1 value: 16.446191987706968 - type: nauc_recall_at_3_max value: 16.891019665567892 - type: nauc_recall_at_3_std value: 5.902427268316366 - type: nauc_recall_at_5_diff1 value: 16.441740431697145 - type: nauc_recall_at_5_max value: 18.339945932093187 - type: nauc_recall_at_5_std value: 11.244004704766795 - type: ndcg_at_1 value: 13.0 - type: ndcg_at_10 value: 9.612 - type: ndcg_at_100 value: 11.403 - type: ndcg_at_1000 value: 15.142 - type: ndcg_at_20 value: 9.419 - type: ndcg_at_3 value: 10.821 - type: ndcg_at_5 value: 10.462 - type: precision_at_1 value: 17.75 - type: precision_at_10 value: 9.15 - type: precision_at_100 value: 3.0 - type: precision_at_1000 value: 0.716 - type: precision_at_20 value: 6.763 - type: precision_at_3 value: 13.417000000000002 - type: precision_at_5 value: 12.35 - type: recall_at_1 value: 1.5150000000000001 - type: recall_at_10 value: 5.858 - type: recall_at_100 value: 15.643 - type: recall_at_1000 value: 28.51 - type: recall_at_20 value: 8.25 - type: recall_at_3 value: 2.995 - type: recall_at_5 value: 4.117 task: type: Retrieval - dataset: config: default name: MTEB DBpediaClassification (default) revision: 9abd46cf7fc8b4c64290f26993c540b92aa145ac split: test type: fancyzhx/dbpedia_14 metrics: - type: accuracy value: 79.6484375 - type: f1 value: 78.34279956840108 - type: f1_weighted value: 78.35088313144212 - type: main_score value: 79.6484375 task: type: Classification - dataset: config: default name: MTEB DefinitionClassificationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 84.51757666417352 - type: ap value: 80.76707736262222 - type: ap_weighted value: 80.76707736262222 - type: f1 value: 84.51702233000746 - type: f1_weighted value: 84.52014045969152 - type: main_score value: 84.51757666417352 task: type: Classification - dataset: config: default name: MTEB Diversity1LegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 
76.33333333333334 - type: ap value: 23.666666666666668 - type: ap_weighted value: 23.666666666666668 - type: f1 value: 43.28922495274102 - type: f1_weighted value: 66.08821676118463 - type: main_score value: 76.33333333333334 task: type: Classification - dataset: config: default name: MTEB Diversity2LegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 74.66666666666669 - type: ap value: 25.333333333333336 - type: ap_weighted value: 25.333333333333336 - type: f1 value: 42.74809160305343 - type: f1_weighted value: 63.83715012722646 - type: main_score value: 74.66666666666669 task: type: Classification - dataset: config: default name: MTEB Diversity3LegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 58.666666666666664 - type: ap value: 58.666666666666664 - type: ap_weighted value: 58.666666666666664 - type: f1 value: 36.97478991596639 - type: f1_weighted value: 43.383753501400555 - type: main_score value: 58.666666666666664 task: type: Classification - dataset: config: default name: MTEB Diversity4LegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 53.333333333333336 - type: ap value: 53.333333333333336 - type: ap_weighted value: 53.333333333333336 - type: f1 value: 34.782608695652165 - type: f1_weighted value: 37.10144927536233 - type: main_score value: 53.333333333333336 task: type: Classification - dataset: config: default name: MTEB Diversity5LegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 57.333333333333336 - type: ap value: 57.333333333333336 - type: ap_weighted value: 57.333333333333336 - type: f1 value: 36.440677966101696 - type: f1_weighted value: 41.78531073446328 - type: main_score value: 57.333333333333336 task: type: Classification - dataset: config: default name: MTEB Diversity6LegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 55.33333333333334 - type: ap value: 55.335312709510575 - type: ap_weighted value: 55.335312709510575 - type: f1 value: 53.72075888745626 - type: f1_weighted value: 54.239086387916736 - type: main_score value: 55.33333333333334 task: type: Classification - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: test type: mteb/emotion metrics: - type: accuracy value: 29.500000000000004 - type: f1 value: 25.366180985174143 - type: f1_weighted value: 31.616367697127934 - type: main_score value: 29.500000000000004 task: type: Classification - dataset: config: default name: MTEB EmotionClassification (default) revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 split: validation type: mteb/emotion metrics: - type: accuracy value: 29.59 - type: f1 value: 25.66115067003055 - type: f1_weighted value: 31.610928656113497 - type: main_score value: 29.59 task: type: Classification - dataset: config: default name: MTEB FaithDial (default) revision: 7a414e80725eac766f2602676dc8b39f80b061e4 split: test type: McGill-NLP/FaithDial metrics: - type: main_score value: 13.203999999999999 - type: map_at_1 value: 4.603 - type: map_at_10 value: 9.689 - type: map_at_100 value: 10.934000000000001 - type: 
map_at_1000 value: 11.06 - type: map_at_20 value: 10.282 - type: map_at_3 value: 7.46 - type: map_at_5 value: 8.601 - type: mrr_at_1 value: 3.9177277179236047 - type: mrr_at_10 value: 9.372463970896874 - type: mrr_at_100 value: 10.603150618822562 - type: mrr_at_1000 value: 10.7286670506961 - type: mrr_at_20 value: 9.954996988904508 - type: mrr_at_3 value: 7.190662748938949 - type: mrr_at_5 value: 8.24844923277832 - type: nauc_map_at_1000_diff1 value: 5.307634687499811 - type: nauc_map_at_1000_max value: 2.3021513473591937 - type: nauc_map_at_1000_std value: -17.73170584094867 - type: nauc_map_at_100_diff1 value: 5.297350465897308 - type: nauc_map_at_100_max value: 2.346907480087932 - type: nauc_map_at_100_std value: -17.732933045818474 - type: nauc_map_at_10_diff1 value: 6.045977877604437 - type: nauc_map_at_10_max value: 1.8368181824684384 - type: nauc_map_at_10_std value: -19.787304492799954 - type: nauc_map_at_1_diff1 value: 1.3052717698444036 - type: nauc_map_at_1_max value: -4.135496842891768 - type: nauc_map_at_1_std value: -19.25157996189646 - type: nauc_map_at_20_diff1 value: 5.761740069816983 - type: nauc_map_at_20_max value: 2.2984777745182807 - type: nauc_map_at_20_std value: -18.75124467493425 - type: nauc_map_at_3_diff1 value: 6.651930299284997 - type: nauc_map_at_3_max value: -0.3272549806355308 - type: nauc_map_at_3_std value: -21.098596102590484 - type: nauc_map_at_5_diff1 value: 6.967992538819455 - type: nauc_map_at_5_max value: 0.5435787268710469 - type: nauc_map_at_5_std value: -20.283953347398604 - type: nauc_mrr_at_1000_diff1 value: 6.740910238395446 - type: nauc_mrr_at_1000_max value: 2.260193924794291 - type: nauc_mrr_at_1000_std value: -16.012044193795997 - type: nauc_mrr_at_100_diff1 value: 6.722495330136685 - type: nauc_mrr_at_100_max value: 2.303043406886841 - type: nauc_mrr_at_100_std value: -16.020952265971687 - type: nauc_mrr_at_10_diff1 value: 7.499027953700563 - type: nauc_mrr_at_10_max value: 1.7369780903909435 - type: nauc_mrr_at_10_std value: -17.773058332780796 - type: nauc_mrr_at_1_diff1 value: 7.479923371906451 - type: nauc_mrr_at_1_max value: -6.618146247607683 - type: nauc_mrr_at_1_std value: -17.69446400002114 - type: nauc_mrr_at_20_diff1 value: 7.167945669605475 - type: nauc_mrr_at_20_max value: 2.272029597435147 - type: nauc_mrr_at_20_std value: -17.15567528957464 - type: nauc_mrr_at_3_diff1 value: 8.689535713040886 - type: nauc_mrr_at_3_max value: -0.503459138449647 - type: nauc_mrr_at_3_std value: -18.50457781869527 - type: nauc_mrr_at_5_diff1 value: 8.688882139587488 - type: nauc_mrr_at_5_max value: 0.6822164815544203 - type: nauc_mrr_at_5_std value: -18.323678647634363 - type: nauc_ndcg_at_1000_diff1 value: 3.895349559751926 - type: nauc_ndcg_at_1000_max value: 4.497321779831305 - type: nauc_ndcg_at_1000_std value: -11.297185296929218 - type: nauc_ndcg_at_100_diff1 value: 2.8704577253134365 - type: nauc_ndcg_at_100_max value: 5.389954929442454 - type: nauc_ndcg_at_100_std value: -10.400630555415756 - type: nauc_ndcg_at_10_diff1 value: 6.092068255087623 - type: nauc_ndcg_at_10_max value: 4.227250873974054 - type: nauc_ndcg_at_10_std value: -19.171869390880573 - type: nauc_ndcg_at_1_diff1 value: 1.3052717698444036 - type: nauc_ndcg_at_1_max value: -4.135496842891768 - type: nauc_ndcg_at_1_std value: -19.25157996189646 - type: nauc_ndcg_at_20_diff1 value: 5.40179215063042 - type: nauc_ndcg_at_20_max value: 5.316262069583032 - type: nauc_ndcg_at_20_std value: -16.253163982932534 - type: nauc_ndcg_at_3_diff1 value: 7.419223521385511 - type: 
nauc_ndcg_at_3_max value: 0.5830467018062534 - type: nauc_ndcg_at_3_std value: -21.398247993882336 - type: nauc_ndcg_at_5_diff1 value: 7.871015584820952 - type: nauc_ndcg_at_5_max value: 1.911179358773651 - type: nauc_ndcg_at_5_std value: -20.05509945356285 - type: nauc_precision_at_1000_diff1 value: -0.844755882557819 - type: nauc_precision_at_1000_max value: 9.219453102597015 - type: nauc_precision_at_1000_std value: 29.23861313970078 - type: nauc_precision_at_100_diff1 value: -3.7470853890619606 - type: nauc_precision_at_100_max value: 10.533862037156355 - type: nauc_precision_at_100_std value: 8.252086567057157 - type: nauc_precision_at_10_diff1 value: 5.901773888339623 - type: nauc_precision_at_10_max value: 8.111412609207008 - type: nauc_precision_at_10_std value: -18.07076007909741 - type: nauc_precision_at_1_diff1 value: 1.3052717698444036 - type: nauc_precision_at_1_max value: -4.135496842891768 - type: nauc_precision_at_1_std value: -19.25157996189646 - type: nauc_precision_at_20_diff1 value: 4.510193698541817 - type: nauc_precision_at_20_max value: 10.055538647436114 - type: nauc_precision_at_20_std value: -11.60139299594993 - type: nauc_precision_at_3_diff1 value: 8.853244226690453 - type: nauc_precision_at_3_max value: 2.3906768293455305 - type: nauc_precision_at_3_std value: -21.96838812494048 - type: nauc_precision_at_5_diff1 value: 9.38307261489558 - type: nauc_precision_at_5_max value: 4.352929382840095 - type: nauc_precision_at_5_std value: -19.535985352739786 - type: nauc_recall_at_1000_diff1 value: -0.8447558825574738 - type: nauc_recall_at_1000_max value: 9.219453102597296 - type: nauc_recall_at_1000_std value: 29.23861313970089 - type: nauc_recall_at_100_diff1 value: -3.747085389061965 - type: nauc_recall_at_100_max value: 10.533862037156396 - type: nauc_recall_at_100_std value: 8.252086567057194 - type: nauc_recall_at_10_diff1 value: 5.901773888339621 - type: nauc_recall_at_10_max value: 8.111412609207008 - type: nauc_recall_at_10_std value: -18.07076007909743 - type: nauc_recall_at_1_diff1 value: 1.3052717698444036 - type: nauc_recall_at_1_max value: -4.135496842891768 - type: nauc_recall_at_1_std value: -19.25157996189646 - type: nauc_recall_at_20_diff1 value: 4.510193698541801 - type: nauc_recall_at_20_max value: 10.055538647436121 - type: nauc_recall_at_20_std value: -11.601392995949936 - type: nauc_recall_at_3_diff1 value: 8.853244226690453 - type: nauc_recall_at_3_max value: 2.390676829345526 - type: nauc_recall_at_3_std value: -21.96838812494048 - type: nauc_recall_at_5_diff1 value: 9.383072614895593 - type: nauc_recall_at_5_max value: 4.352929382840121 - type: nauc_recall_at_5_std value: -19.535985352739782 - type: ndcg_at_1 value: 4.603 - type: ndcg_at_10 value: 13.203999999999999 - type: ndcg_at_100 value: 20.254 - type: ndcg_at_1000 value: 23.923 - type: ndcg_at_20 value: 15.354000000000001 - type: ndcg_at_3 value: 8.469 - type: ndcg_at_5 value: 10.536 - type: precision_at_1 value: 4.603 - type: precision_at_10 value: 2.478 - type: precision_at_100 value: 0.6 - type: precision_at_1000 value: 0.09 - type: precision_at_20 value: 1.6629999999999998 - type: precision_at_3 value: 3.803 - type: precision_at_5 value: 3.2910000000000004 - type: recall_at_1 value: 4.603 - type: recall_at_10 value: 24.779999999999998 - type: recall_at_100 value: 60.039 - type: recall_at_1000 value: 89.667 - type: recall_at_20 value: 33.251999999999995 - type: recall_at_3 value: 11.41 - type: recall_at_5 value: 16.454 task: type: Retrieval - dataset: config: default name: MTEB 
FeedbackQARetrieval (default) revision: 1ee1cd0 split: test type: lt2c/fqa metrics: - type: main_score value: 19.026 - type: map_at_1 value: 19.026 - type: map_at_10 value: 26.287 - type: map_at_100 value: 27.294 - type: map_at_1000 value: 27.381 - type: map_at_20 value: 26.823999999999998 - type: map_at_3 value: 24.18 - type: map_at_5 value: 25.365 - type: mrr_at_1 value: 19.026104417670684 - type: mrr_at_10 value: 26.287052973799952 - type: mrr_at_100 value: 27.29426430169323 - type: mrr_at_1000 value: 27.380630702740504 - type: mrr_at_20 value: 26.824443943374348 - type: mrr_at_3 value: 24.1800535475234 - type: mrr_at_5 value: 25.364792503346674 - type: nauc_map_at_1000_diff1 value: 40.81899763873748 - type: nauc_map_at_1000_max value: 11.253631614437268 - type: nauc_map_at_1000_std value: 1.5897060898020656 - type: nauc_map_at_100_diff1 value: 40.78701343792848 - type: nauc_map_at_100_max value: 11.27294926630661 - type: nauc_map_at_100_std value: 1.6118772584552687 - type: nauc_map_at_10_diff1 value: 41.075611489073324 - type: nauc_map_at_10_max value: 11.521202364241029 - type: nauc_map_at_10_std value: 1.2931734299571058 - type: nauc_map_at_1_diff1 value: 48.17546169609799 - type: nauc_map_at_1_max value: 13.494189949598375 - type: nauc_map_at_1_std value: 0.07263746580580938 - type: nauc_map_at_20_diff1 value: 40.841882938863435 - type: nauc_map_at_20_max value: 11.418649006248861 - type: nauc_map_at_20_std value: 1.4175148500460242 - type: nauc_map_at_3_diff1 value: 42.213517992662815 - type: nauc_map_at_3_max value: 12.808728940816176 - type: nauc_map_at_3_std value: 1.0861600000182654 - type: nauc_map_at_5_diff1 value: 41.6309141720988 - type: nauc_map_at_5_max value: 11.996308489388992 - type: nauc_map_at_5_std value: 1.2641645150076395 - type: nauc_mrr_at_1000_diff1 value: 40.81899763873748 - type: nauc_mrr_at_1000_max value: 11.253631614437268 - type: nauc_mrr_at_1000_std value: 1.5897060898020656 - type: nauc_mrr_at_100_diff1 value: 40.78701343792848 - type: nauc_mrr_at_100_max value: 11.27294926630661 - type: nauc_mrr_at_100_std value: 1.6118772584552687 - type: nauc_mrr_at_10_diff1 value: 41.075611489073324 - type: nauc_mrr_at_10_max value: 11.521202364241029 - type: nauc_mrr_at_10_std value: 1.2931734299571058 - type: nauc_mrr_at_1_diff1 value: 48.17546169609799 - type: nauc_mrr_at_1_max value: 13.494189949598375 - type: nauc_mrr_at_1_std value: 0.07263746580580938 - type: nauc_mrr_at_20_diff1 value: 40.841882938863435 - type: nauc_mrr_at_20_max value: 11.418649006248861 - type: nauc_mrr_at_20_std value: 1.4175148500460242 - type: nauc_mrr_at_3_diff1 value: 42.213517992662815 - type: nauc_mrr_at_3_max value: 12.808728940816176 - type: nauc_mrr_at_3_std value: 1.0861600000182654 - type: nauc_mrr_at_5_diff1 value: 41.6309141720988 - type: nauc_mrr_at_5_max value: 11.996308489388992 - type: nauc_mrr_at_5_std value: 1.2641645150076395 - type: nauc_ndcg_at_1000_diff1 value: 37.7525819268389 - type: nauc_ndcg_at_1000_max value: 8.537400436184365 - type: nauc_ndcg_at_1000_std value: 2.9622195950411925 - type: nauc_ndcg_at_100_diff1 value: 36.787603237032975 - type: nauc_ndcg_at_100_max value: 8.608543884213873 - type: nauc_ndcg_at_100_std value: 3.8384319334640695 - type: nauc_ndcg_at_10_diff1 value: 38.17646042200737 - type: nauc_ndcg_at_10_max value: 10.09464701041161 - type: nauc_ndcg_at_10_std value: 1.82746325273071 - type: nauc_ndcg_at_1_diff1 value: 48.17546169609799 - type: nauc_ndcg_at_1_max value: 13.494189949598375 - type: nauc_ndcg_at_1_std value: 0.07263746580580938 
- type: nauc_ndcg_at_20_diff1 value: 37.27227964097512 - type: nauc_ndcg_at_20_max value: 9.739171990515723 - type: nauc_ndcg_at_20_std value: 2.3086094833252115 - type: nauc_ndcg_at_3_diff1 value: 40.37281782985726 - type: nauc_ndcg_at_3_max value: 12.624015391541455 - type: nauc_ndcg_at_3_std value: 1.407593942089084 - type: nauc_ndcg_at_5_diff1 value: 39.35750963645447 - type: nauc_ndcg_at_5_max value: 11.236243459280038 - type: nauc_ndcg_at_5_std value: 1.722451235770262 - type: nauc_precision_at_1000_diff1 value: 12.726040453874319 - type: nauc_precision_at_1000_max value: -30.085818447743566 - type: nauc_precision_at_1000_std value: 15.649828948529738 - type: nauc_precision_at_100_diff1 value: 20.374750836627285 - type: nauc_precision_at_100_max value: -4.315521193959148 - type: nauc_precision_at_100_std value: 15.928528368224907 - type: nauc_precision_at_10_diff1 value: 30.394845120941987 - type: nauc_precision_at_10_max value: 5.92964609786744 - type: nauc_precision_at_10_std value: 3.297191207595148 - type: nauc_precision_at_1_diff1 value: 48.17546169609799 - type: nauc_precision_at_1_max value: 13.494189949598375 - type: nauc_precision_at_1_std value: 0.07263746580580938 - type: nauc_precision_at_20_diff1 value: 26.72269495712158 - type: nauc_precision_at_20_max value: 4.521447508378409 - type: nauc_precision_at_20_std value: 5.180527682236829 - type: nauc_precision_at_3_diff1 value: 35.59077406479908 - type: nauc_precision_at_3_max value: 12.151097771811763 - type: nauc_precision_at_3_std value: 2.24486462426719 - type: nauc_precision_at_5_diff1 value: 33.428016378866076 - type: nauc_precision_at_5_max value: 9.15731660897423 - type: nauc_precision_at_5_std value: 2.9353909916486294 - type: nauc_recall_at_1000_diff1 value: 12.726040453874369 - type: nauc_recall_at_1000_max value: -30.085818447743364 - type: nauc_recall_at_1000_std value: 15.649828948529635 - type: nauc_recall_at_100_diff1 value: 20.374750836627264 - type: nauc_recall_at_100_max value: -4.315521193959231 - type: nauc_recall_at_100_std value: 15.928528368224876 - type: nauc_recall_at_10_diff1 value: 30.394845120942005 - type: nauc_recall_at_10_max value: 5.929646097867471 - type: nauc_recall_at_10_std value: 3.297191207595157 - type: nauc_recall_at_1_diff1 value: 48.17546169609799 - type: nauc_recall_at_1_max value: 13.494189949598375 - type: nauc_recall_at_1_std value: 0.07263746580580938 - type: nauc_recall_at_20_diff1 value: 26.722694957121647 - type: nauc_recall_at_20_max value: 4.521447508378419 - type: nauc_recall_at_20_std value: 5.1805276822368524 - type: nauc_recall_at_3_diff1 value: 35.59077406479911 - type: nauc_recall_at_3_max value: 12.151097771811772 - type: nauc_recall_at_3_std value: 2.2448646242671857 - type: nauc_recall_at_5_diff1 value: 33.42801637886615 - type: nauc_recall_at_5_max value: 9.15731660897428 - type: nauc_recall_at_5_std value: 2.9353909916486782 - type: ndcg_at_1 value: 19.026 - type: ndcg_at_10 value: 30.245 - type: ndcg_at_100 value: 35.716 - type: ndcg_at_1000 value: 38.421 - type: ndcg_at_20 value: 32.242 - type: ndcg_at_3 value: 25.884 - type: ndcg_at_5 value: 28.016999999999996 - type: precision_at_1 value: 19.026 - type: precision_at_10 value: 4.287 - type: precision_at_100 value: 0.697 - type: precision_at_1000 value: 0.092 - type: precision_at_20 value: 2.543 - type: precision_at_3 value: 10.274 - type: precision_at_5 value: 7.199 - type: recall_at_1 value: 19.026 - type: recall_at_10 value: 42.870999999999995 - type: recall_at_100 value: 69.729 - type: recall_at_1000 
value: 91.968 - type: recall_at_20 value: 50.853 - type: recall_at_3 value: 30.823 - type: recall_at_5 value: 35.994 task: type: Retrieval - dataset: config: default name: MTEB FinancialPhrasebankClassification (default) revision: 1484d06fe7af23030c7c977b12556108d1f67039 split: train type: takala/financial_phrasebank metrics: - type: accuracy value: 67.97703180212015 - type: f1 value: 57.55594804795911 - type: f1_weighted value: 68.01782223640284 - type: main_score value: 67.97703180212015 task: type: Classification - dataset: config: default name: MTEB FrenkEnClassification (default) revision: 52483dba0ff23291271ee9249839865e3c3e7e50 split: test type: classla/FRENK-hate-en metrics: - type: accuracy value: 55.289004780530206 - type: ap value: 41.78925787378802 - type: ap_weighted value: 41.78925787378802 - type: f1 value: 54.04961911556596 - type: f1_weighted value: 54.99825667370393 - type: main_score value: 55.289004780530206 task: type: Classification - dataset: config: default name: MTEB FunctionOfDecisionSectionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 16.621253405994548 - type: f1 value: 15.693085823082844 - type: f1_weighted value: 15.880480382757908 - type: main_score value: 16.621253405994548 task: type: Classification - dataset: config: default name: MTEB GPUSpeedTask (default) revision: '1.0' split: test type: 'GPUSpeedTask' metrics: - type: avg_words_per_sec value: 7186456.843601672 - type: main_score value: 7186456.843601672 - type: num_gpus value: 300 - type: physical_cores value: 3600 - type: time_mean value: 5.055342401776995 - type: time_std value: 1.0630782067852145 - type: total_cores value: 7200 task: type: Speed - dataset: config: default name: MTEB GeoreviewClassification (default) revision: 3765c0d1de6b7d264bc459433c45e5a75513839c split: test type: ai-forever/georeview-classification metrics: - type: accuracy value: 41.3623046875 - type: f1 value: 39.78804299557415 - type: f1_weighted value: 39.787468620260825 - type: main_score value: 41.3623046875 task: type: Classification - dataset: config: default name: MTEB GeoreviewClusteringP2P (default) revision: 97a313c8fc85b47f13f33e7e9a95c1ad888c7fec split: test type: ai-forever/georeview-clustering-p2p metrics: - type: main_score value: 59.713474431847416 - type: v_measure value: 59.713474431847416 - type: v_measure_std value: 1.1676689250848244 task: type: Clustering - dataset: config: default name: MTEB HeadlineClassification (default) revision: 2fe05ee6b5832cda29f2ef7aaad7b7fe6a3609eb split: test type: ai-forever/headline-classification metrics: - type: accuracy value: 68.9013671875 - type: f1 value: 68.80041842725984 - type: f1_weighted value: 68.80034868754102 - type: main_score value: 68.9013671875 task: type: Classification - dataset: config: default name: MTEB ImdbClassification (default) revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 split: test type: mteb/imdb metrics: - type: accuracy value: 58.35799999999999 - type: ap value: 55.16102855038145 - type: ap_weighted value: 55.16102855038145 - type: f1 value: 57.51452465161078 - type: f1_weighted value: 57.514524651610785 - type: main_score value: 58.35799999999999 task: type: Classification - dataset: config: default name: MTEB InappropriatenessClassification (default) revision: 601651fdc45ef243751676e62dd7a19f491c0285 split: test type: ai-forever/inappropriateness-classification metrics: - type: accuracy value: 59.11132812499999 - type: ap value: 
55.4713646939923 - type: ap_weighted value: 55.4713646939923 - type: f1 value: 58.8968409989092 - type: f1_weighted value: 58.8968409989092 - type: main_score value: 59.11132812499999 task: type: Classification - dataset: config: default name: MTEB InsurancePolicyInterpretationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 20.30075187969925 - type: f1 value: 11.25 - type: f1_weighted value: 6.851503759398496 - type: main_score value: 20.30075187969925 task: type: Classification - dataset: config: default name: MTEB InternationalCitizenshipQuestionsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 60.107421875 - type: ap value: 46.4447988877498 - type: ap_weighted value: 46.4447988877498 - type: f1 value: 56.153528268151675 - type: f1_weighted value: 58.210838762771935 - type: main_score value: 60.107421875 task: type: Classification - dataset: config: default name: MTEB JCrewBlockerLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 79.62962962962962 - type: ap value: 86.55394524959743 - type: ap_weighted value: 86.55394524959743 - type: f1 value: 61.60310277957336 - type: f1_weighted value: 79.14242620124973 - type: main_score value: 79.62962962962962 task: type: Classification - dataset: config: default name: MTEB KinopoiskClassification (default) revision: 5911f26666ac11af46cb9c6849d0dc80a378af24 split: test type: ai-forever/kinopoisk-sentiment-classification metrics: - type: accuracy value: 50.46666666666666 - type: f1 value: 49.1239356856144 - type: f1_weighted value: 49.123935685614384 - type: main_score value: 50.46666666666666 task: type: Classification - dataset: config: default name: MTEB LearnedHandsBenefitsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 66.66666666666667 - type: ap value: 61.11111111111111 - type: ap_weighted value: 61.11111111111111 - type: f1 value: 66.66666666666667 - type: f1_weighted value: 66.66666666666667 - type: main_score value: 66.66666666666667 task: type: Classification - dataset: config: default name: MTEB LearnedHandsBusinessLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 70.11494252873564 - type: ap value: 68.24378508420207 - type: ap_weighted value: 68.24378508420207 - type: f1 value: 68.07339449541284 - type: f1_weighted value: 68.07339449541284 - type: main_score value: 70.11494252873564 task: type: Classification - dataset: config: default name: MTEB LearnedHandsConsumerLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 58.143322475570045 - type: ap value: 54.72001493806926 - type: ap_weighted value: 54.72001493806926 - type: f1 value: 58.13788145283024 - type: f1_weighted value: 58.13788145283024 - type: main_score value: 58.143322475570045 task: type: Classification - dataset: config: default name: MTEB LearnedHandsCourtsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 60.41666666666667 - type: ap value: 
56.07638888888889 - type: ap_weighted value: 56.07638888888889 - type: f1 value: 59.78835978835979 - type: f1_weighted value: 59.78835978835979 - type: main_score value: 60.41666666666667 task: type: Classification - dataset: config: default name: MTEB LearnedHandsCrimeLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 70.63953488372093 - type: ap value: 65.3728949478749 - type: ap_weighted value: 65.3728949478749 - type: f1 value: 70.45754079263989 - type: f1_weighted value: 70.45754079263989 - type: main_score value: 70.63953488372093 task: type: Classification - dataset: config: default name: MTEB LearnedHandsDivorceLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 62.66666666666667 - type: ap value: 57.45794392523364 - type: ap_weighted value: 57.45794392523364 - type: f1 value: 60.886571056062586 - type: f1_weighted value: 60.886571056062586 - type: main_score value: 62.66666666666667 task: type: Classification - dataset: config: default name: MTEB LearnedHandsDomesticViolenceLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 68.39080459770115 - type: ap value: 62.26053639846742 - type: ap_weighted value: 62.26053639846742 - type: f1 value: 68.30601092896174 - type: f1_weighted value: 68.30601092896174 - type: main_score value: 68.39080459770115 task: type: Classification - dataset: config: default name: MTEB LearnedHandsEducationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 69.64285714285714 - type: ap value: 62.222222222222214 - type: ap_weighted value: 62.222222222222214 - type: f1 value: 66.56129258868984 - type: f1_weighted value: 66.56129258868984 - type: main_score value: 69.64285714285714 task: type: Classification - dataset: config: default name: MTEB LearnedHandsEmploymentLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 63.521126760563384 - type: ap value: 58.7392648574373 - type: ap_weighted value: 58.7392648574373 - type: f1 value: 63.4682967433563 - type: f1_weighted value: 63.4682967433563 - type: main_score value: 63.521126760563384 task: type: Classification - dataset: config: default name: MTEB LearnedHandsEstatesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 70.78651685393258 - type: ap value: 64.05564472980203 - type: ap_weighted value: 64.05564472980203 - type: f1 value: 70.54855542828051 - type: f1_weighted value: 70.54855542828051 - type: main_score value: 70.78651685393258 task: type: Classification - dataset: config: default name: MTEB LearnedHandsFamilyLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 75.48828125 - type: ap value: 68.42998798076924 - type: ap_weighted value: 68.42998798076924 - type: f1 value: 75.3630731744256 - type: f1_weighted value: 75.3630731744256 - type: main_score value: 75.48828125 task: type: Classification - dataset: config: default name: MTEB LearnedHandsHealthLegalBenchClassification (default) revision: 
12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 64.60176991150443 - type: ap value: 58.96246566981995 - type: ap_weighted value: 58.96246566981995 - type: f1 value: 63.877567329976834 - type: f1_weighted value: 63.877567329976834 - type: main_score value: 64.60176991150443 task: type: Classification - dataset: config: default name: MTEB LearnedHandsHousingLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 48.73046875 - type: ap value: 49.376600701618464 - type: ap_weighted value: 49.376600701618464 - type: f1 value: 46.38903847304493 - type: f1_weighted value: 46.38903847304493 - type: main_score value: 48.73046875 task: type: Classification - dataset: config: default name: MTEB LearnedHandsImmigrationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 83.5820895522388 - type: ap value: 77.43325625394155 - type: ap_weighted value: 77.43325625394155 - type: f1 value: 83.5674470457079 - type: f1_weighted value: 83.5674470457079 - type: main_score value: 83.5820895522388 task: type: Classification - dataset: config: default name: MTEB LearnedHandsTortsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 63.19444444444444 - type: ap value: 58.41384863123993 - type: ap_weighted value: 58.41384863123993 - type: f1 value: 63.17846287451151 - type: f1_weighted value: 63.17846287451151 - type: main_score value: 63.19444444444444 task: type: Classification - dataset: config: default name: MTEB LearnedHandsTrafficLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 69.7841726618705 - type: ap value: 62.353917770760766 - type: ap_weighted value: 62.353917770760766 - type: f1 value: 66.90476190476191 - type: f1_weighted value: 66.90476190476191 - type: main_score value: 69.7841726618705 task: type: Classification - dataset: config: default name: MTEB LegalReasoningCausalityLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 56.36363636363636 - type: ap value: 64.75724991854024 - type: ap_weighted value: 64.75724991854024 - type: f1 value: 52.85714285714286 - type: f1_weighted value: 51.220779220779214 - type: main_score value: 56.36363636363636 task: type: Classification - dataset: config: default name: MTEB MAUDLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 27.607421875 - type: f1 value: 14.84669450435061 - type: f1_weighted value: 28.881436838109853 - type: main_score value: 27.607421875 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveIntentClassification (zh-CN) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 5.208473436449227 - type: f1 value: 3.062867346742466 - type: f1_weighted value: 3.5821384620305414 - type: main_score value: 5.208473436449227 task: type: Classification - dataset: config: ko name: MTEB MassiveIntentClassification (ko) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: 
mteb/amazon_massive_intent metrics: - type: accuracy value: 2.5319435104236723 - type: f1 value: 0.5994050487142139 - type: f1_weighted value: 1.0538452549913138 - type: main_score value: 2.5319435104236723 task: type: Classification - dataset: config: hi name: MTEB MassiveIntentClassification (hi) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.558843308675185 - type: f1 value: 1.258311921873436 - type: f1_weighted value: 1.4083594758704836 - type: main_score value: 2.558843308675185 task: type: Classification - dataset: config: kn name: MTEB MassiveIntentClassification (kn) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.0645595158036314 - type: f1 value: 1.2240987569096886 - type: f1_weighted value: 1.0817495786784068 - type: main_score value: 2.0645595158036314 task: type: Classification - dataset: config: ka name: MTEB MassiveIntentClassification (ka) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.6395427034297243 - type: f1 value: 0.7660068670322584 - type: f1_weighted value: 0.7729737527960681 - type: main_score value: 2.6395427034297243 task: type: Classification - dataset: config: am name: MTEB MassiveIntentClassification (am) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.276395427034297 - type: f1 value: 0.7755708386766476 - type: f1_weighted value: 0.9189927682322296 - type: main_score value: 2.276395427034297 task: type: Classification - dataset: config: my name: MTEB MassiveIntentClassification (my) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.9576328177538667 - type: f1 value: 1.0681259563998668 - type: f1_weighted value: 1.5818553042962555 - type: main_score value: 3.9576328177538667 task: type: Classification - dataset: config: el name: MTEB MassiveIntentClassification (el) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 9.663752521856086 - type: f1 value: 4.860476294706458 - type: f1_weighted value: 6.8590598543643395 - type: main_score value: 9.663752521856086 task: type: Classification - dataset: config: lv name: MTEB MassiveIntentClassification (lv) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 22.32347007397445 - type: f1 value: 20.939653553666744 - type: f1_weighted value: 20.899939110877806 - type: main_score value: 22.32347007397445 task: type: Classification - dataset: config: ml name: MTEB MassiveIntentClassification (ml) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.390719569603228 - type: f1 value: 0.46817075523593493 - type: f1_weighted value: 0.8438228708667787 - type: main_score value: 2.390719569603228 task: type: Classification - dataset: config: mn name: MTEB MassiveIntentClassification (mn) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 28.994620040349695 - type: f1 value: 27.571069823401256 - type: f1_weighted value: 27.263930155378503 - type: main_score value: 28.994620040349695 task: type: Classification - dataset: 
config: ur name: MTEB MassiveIntentClassification (ur) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.4478816408876933 - type: f1 value: 1.497656725806116 - type: f1_weighted value: 1.5398763678691354 - type: main_score value: 2.4478816408876933 task: type: Classification - dataset: config: fa name: MTEB MassiveIntentClassification (fa) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.3355749831876267 - type: f1 value: 0.6816922655284716 - type: f1_weighted value: 1.0887948480367862 - type: main_score value: 3.3355749831876267 task: type: Classification - dataset: config: ro name: MTEB MassiveIntentClassification (ro) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.72494956287828 - type: f1 value: 29.577749786404826 - type: f1_weighted value: 29.551193355600514 - type: main_score value: 31.72494956287828 task: type: Classification - dataset: config: is name: MTEB MassiveIntentClassification (is) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 24.845326160053798 - type: f1 value: 22.11363990784136 - type: f1_weighted value: 23.65026728412048 - type: main_score value: 24.845326160053798 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 50.164761264290526 - type: f1 value: 47.85763581891828 - type: f1_weighted value: 48.98444884040328 - type: main_score value: 50.164761264290526 task: type: Classification - dataset: config: hu name: MTEB MassiveIntentClassification (hu) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 25.524546065904502 - type: f1 value: 23.753046097467873 - type: f1_weighted value: 23.826312126027823 - type: main_score value: 25.524546065904502 task: type: Classification - dataset: config: fr name: MTEB MassiveIntentClassification (fr) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.50638870208473 - type: f1 value: 31.370642915213388 - type: f1_weighted value: 30.505546915456012 - type: main_score value: 31.50638870208473 task: type: Classification - dataset: config: th name: MTEB MassiveIntentClassification (th) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.739071956960323 - type: f1 value: 1.411228354273586 - type: f1_weighted value: 1.216275118762689 - type: main_score value: 3.739071956960323 task: type: Classification - dataset: config: de name: MTEB MassiveIntentClassification (de) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 32.1049092131809 - type: f1 value: 29.794603179718106 - type: f1_weighted value: 30.137050786689766 - type: main_score value: 32.1049092131809 task: type: Classification - dataset: config: tr name: MTEB MassiveIntentClassification (tr) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 27.562205783456626 - type: f1 value: 25.683266426146687 - type: 
f1_weighted value: 25.803636686733057 - type: main_score value: 27.562205783456626 task: type: Classification - dataset: config: pt name: MTEB MassiveIntentClassification (pt) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 34.347679892400805 - type: f1 value: 31.465774161046767 - type: f1_weighted value: 31.735356981669327 - type: main_score value: 34.347679892400805 task: type: Classification - dataset: config: sq name: MTEB MassiveIntentClassification (sq) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 32.38063214525891 - type: f1 value: 29.53168994128031 - type: f1_weighted value: 30.112896935570273 - type: main_score value: 32.38063214525891 task: type: Classification - dataset: config: zh-TW name: MTEB MassiveIntentClassification (zh-TW) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 6.809011432414256 - type: f1 value: 5.205218706422693 - type: f1_weighted value: 5.178287349465675 - type: main_score value: 6.809011432414256 task: type: Classification - dataset: config: hy name: MTEB MassiveIntentClassification (hy) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.723604572965703 - type: f1 value: 0.6429150866665544 - type: f1_weighted value: 0.9113227866994432 - type: main_score value: 2.723604572965703 task: type: Classification - dataset: config: da name: MTEB MassiveIntentClassification (da) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 33.95427034297243 - type: f1 value: 32.204428726904936 - type: f1_weighted value: 32.47064251083498 - type: main_score value: 33.95427034297243 task: type: Classification - dataset: config: af name: MTEB MassiveIntentClassification (af) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.403496973772697 - type: f1 value: 27.814640020382342 - type: f1_weighted value: 29.552471475522786 - type: main_score value: 30.403496973772697 task: type: Classification - dataset: config: ar name: MTEB MassiveIntentClassification (ar) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.796234028244788 - type: f1 value: 2.4115955159178712 - type: f1_weighted value: 2.9705530799117428 - type: main_score value: 3.796234028244788 task: type: Classification - dataset: config: jv name: MTEB MassiveIntentClassification (jv) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 28.533960995292528 - type: f1 value: 26.21221777741412 - type: f1_weighted value: 27.072811075990217 - type: main_score value: 28.533960995292528 task: type: Classification - dataset: config: te name: MTEB MassiveIntentClassification (te) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.2125084061869535 - type: f1 value: 1.0173733514352028 - type: f1_weighted value: 1.316987953476142 - type: main_score value: 2.2125084061869535 task: type: Classification - dataset: config: tl name: MTEB MassiveIntentClassification (tl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test 
type: mteb/amazon_massive_intent metrics: - type: accuracy value: 32.017484868863484 - type: f1 value: 29.32295890060929 - type: f1_weighted value: 29.657369574195414 - type: main_score value: 32.017484868863484 task: type: Classification - dataset: config: sw name: MTEB MassiveIntentClassification (sw) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 27.790854068594484 - type: f1 value: 26.66461334490106 - type: f1_weighted value: 26.3309301465354 - type: main_score value: 27.790854068594484 task: type: Classification - dataset: config: ja name: MTEB MassiveIntentClassification (ja) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 5.611970410221924 - type: f1 value: 3.949675565526302 - type: f1_weighted value: 3.8008532811790516 - type: main_score value: 5.611970410221924 task: type: Classification - dataset: config: ms name: MTEB MassiveIntentClassification (ms) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 28.940820443846675 - type: f1 value: 26.913943613442726 - type: f1_weighted value: 27.58112937211184 - type: main_score value: 28.940820443846675 task: type: Classification - dataset: config: nb name: MTEB MassiveIntentClassification (nb) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 32.29993275050437 - type: f1 value: 30.38953729738546 - type: f1_weighted value: 30.973971090234315 - type: main_score value: 32.29993275050437 task: type: Classification - dataset: config: fi name: MTEB MassiveIntentClassification (fi) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.13315400134499 - type: f1 value: 28.151659309577315 - type: f1_weighted value: 28.919992380957805 - type: main_score value: 31.13315400134499 task: type: Classification - dataset: config: id name: MTEB MassiveIntentClassification (id) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 33.56422326832549 - type: f1 value: 32.13999124730796 - type: f1_weighted value: 31.821742347727334 - type: main_score value: 33.56422326832549 task: type: Classification - dataset: config: cy name: MTEB MassiveIntentClassification (cy) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.68123739071957 - type: f1 value: 28.08132049625695 - type: f1_weighted value: 30.136632177167293 - type: main_score value: 31.68123739071957 task: type: Classification - dataset: config: sl name: MTEB MassiveIntentClassification (sl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.388702084734366 - type: f1 value: 30.06510634561652 - type: f1_weighted value: 29.575793355168027 - type: main_score value: 31.388702084734366 task: type: Classification - dataset: config: es name: MTEB MassiveIntentClassification (es) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.032279757901815 - type: f1 value: 30.20555955874916 - type: f1_weighted value: 28.87618616461917 - type: main_score value: 31.032279757901815 task: type: Classification - dataset: 
config: bn name: MTEB MassiveIntentClassification (bn) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.0766644250168125 - type: f1 value: 1.1659097449170488 - type: f1_weighted value: 1.6261385516847686 - type: main_score value: 3.0766644250168125 task: type: Classification - dataset: config: sv name: MTEB MassiveIntentClassification (sv) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.22864828513786 - type: f1 value: 29.514038012557155 - type: f1_weighted value: 28.79006788550934 - type: main_score value: 30.22864828513786 task: type: Classification - dataset: config: ru name: MTEB MassiveIntentClassification (ru) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 57.97915265635507 - type: f1 value: 56.5014953445001 - type: f1_weighted value: 56.64147015986123 - type: main_score value: 57.97915265635507 task: type: Classification - dataset: config: az name: MTEB MassiveIntentClassification (az) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 23.577673167451245 - type: f1 value: 23.44310534002699 - type: f1_weighted value: 22.73388843513862 - type: main_score value: 23.577673167451245 task: type: Classification - dataset: config: it name: MTEB MassiveIntentClassification (it) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 35.24209818426362 - type: f1 value: 34.17643389765681 - type: f1_weighted value: 31.88705168526876 - type: main_score value: 35.24209818426362 task: type: Classification - dataset: config: pl name: MTEB MassiveIntentClassification (pl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 26.815736381977135 - type: f1 value: 23.59490629738082 - type: f1_weighted value: 24.824019034766742 - type: main_score value: 26.815736381977135 task: type: Classification - dataset: config: vi name: MTEB MassiveIntentClassification (vi) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 23.71889710827169 - type: f1 value: 20.9474996841838 - type: f1_weighted value: 21.8696712485011 - type: main_score value: 23.71889710827169 task: type: Classification - dataset: config: ta name: MTEB MassiveIntentClassification (ta) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 1.4996637525218561 - type: f1 value: 0.3621176226135693 - type: f1_weighted value: 0.40253328041710507 - type: main_score value: 1.4996637525218561 task: type: Classification - dataset: config: he name: MTEB MassiveIntentClassification (he) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.2461331540013454 - type: f1 value: 0.590566331230622 - type: f1_weighted value: 0.6162176049666722 - type: main_score value: 2.2461331540013454 task: type: Classification - dataset: config: nl name: MTEB MassiveIntentClassification (nl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 32.43779421654338 - type: f1 value: 29.65516413448003 - type: 
f1_weighted value: 30.056107103546008 - type: main_score value: 32.43779421654338 task: type: Classification - dataset: config: km name: MTEB MassiveIntentClassification (km) revision: 4672e20407010da34463acc759c162ca9734bca6 split: test type: mteb/amazon_massive_intent metrics: - type: accuracy value: 5.137861466039005 - type: f1 value: 1.5034651435201778 - type: f1_weighted value: 1.8580225168667703 - type: main_score value: 5.137861466039005 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveIntentClassification (zh-CN) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 5.15002459419577 - type: f1 value: 3.2849878732080238 - type: f1_weighted value: 3.171516129361724 - type: main_score value: 5.15002459419577 task: type: Classification - dataset: config: ko name: MTEB MassiveIntentClassification (ko) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.3610427939006393 - type: f1 value: 0.6344240632132025 - type: f1_weighted value: 0.8741011326135733 - type: main_score value: 2.3610427939006393 task: type: Classification - dataset: config: hi name: MTEB MassiveIntentClassification (hi) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.4299065420560746 - type: f1 value: 1.1990062972384772 - type: f1_weighted value: 1.2846405130538945 - type: main_score value: 2.4299065420560746 task: type: Classification - dataset: config: kn name: MTEB MassiveIntentClassification (kn) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.100344318740777 - type: f1 value: 1.0691096895187684 - type: f1_weighted value: 1.0245515267986838 - type: main_score value: 2.100344318740777 task: type: Classification - dataset: config: ka name: MTEB MassiveIntentClassification (ka) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.144613871126414 - type: f1 value: 0.38751721719666626 - type: f1_weighted value: 0.5494302003085859 - type: main_score value: 2.144613871126414 task: type: Classification - dataset: config: am name: MTEB MassiveIntentClassification (am) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.1347761928184945 - type: f1 value: 0.7186972868374003 - type: f1_weighted value: 0.8692320111678621 - type: main_score value: 2.1347761928184945 task: type: Classification - dataset: config: my name: MTEB MassiveIntentClassification (my) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.9744220363994094 - type: f1 value: 1.320159702083562 - type: f1_weighted value: 1.6615339662178419 - type: main_score value: 3.9744220363994094 task: type: Classification - dataset: config: el name: MTEB MassiveIntentClassification (el) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 8.740777176586326 - type: f1 value: 4.625508580628892 - type: f1_weighted value: 5.910937912610004 - type: main_score value: 8.740777176586326 task: type: Classification - dataset: config: lv name: MTEB MassiveIntentClassification (lv) revision: 
4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 22.056074766355138 - type: f1 value: 20.067449871163735 - type: f1_weighted value: 20.679581641637213 - type: main_score value: 22.056074766355138 task: type: Classification - dataset: config: ml name: MTEB MassiveIntentClassification (ml) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.287260206591244 - type: f1 value: 0.5144479181790914 - type: f1_weighted value: 0.7532382956194585 - type: main_score value: 2.287260206591244 task: type: Classification - dataset: config: mn name: MTEB MassiveIntentClassification (mn) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 28.514510575504183 - type: f1 value: 27.670683007330656 - type: f1_weighted value: 26.797727875405965 - type: main_score value: 28.514510575504183 task: type: Classification - dataset: config: ur name: MTEB MassiveIntentClassification (ur) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.5528775209050663 - type: f1 value: 1.5528439347982526 - type: f1_weighted value: 1.59863069765228 - type: main_score value: 2.5528775209050663 task: type: Classification - dataset: config: fa name: MTEB MassiveIntentClassification (fa) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.1578947368421053 - type: f1 value: 0.612147286970534 - type: f1_weighted value: 0.9311100758788083 - type: main_score value: 3.1578947368421053 task: type: Classification - dataset: config: ro name: MTEB MassiveIntentClassification (ro) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.472208558780135 - type: f1 value: 28.570236227937524 - type: f1_weighted value: 29.26182782217857 - type: main_score value: 30.472208558780135 task: type: Classification - dataset: config: is name: MTEB MassiveIntentClassification (is) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 24.12690605017216 - type: f1 value: 21.730073248467978 - type: f1_weighted value: 23.3232094260056 - type: main_score value: 24.12690605017216 task: type: Classification - dataset: config: en name: MTEB MassiveIntentClassification (en) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 50.6837186424004 - type: f1 value: 46.24633043195857 - type: f1_weighted value: 49.89222156091109 - type: main_score value: 50.6837186424004 task: type: Classification - dataset: config: hu name: MTEB MassiveIntentClassification (hu) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 24.869650762420065 - type: f1 value: 22.646829281311646 - type: f1_weighted value: 23.75607068147335 - type: main_score value: 24.869650762420065 task: type: Classification - dataset: config: fr name: MTEB MassiveIntentClassification (fr) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.83620265617314 - type: f1 value: 30.12388095110573 - type: 
f1_weighted value: 29.755084946082466 - type: main_score value: 30.83620265617314 task: type: Classification - dataset: config: th name: MTEB MassiveIntentClassification (th) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.7924249877029017 - type: f1 value: 1.3490081402255192 - type: f1_weighted value: 1.1964792923823864 - type: main_score value: 3.7924249877029017 task: type: Classification - dataset: config: de name: MTEB MassiveIntentClassification (de) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.85095917363502 - type: f1 value: 28.76898470499743 - type: f1_weighted value: 29.742721084026552 - type: main_score value: 30.85095917363502 task: type: Classification - dataset: config: tr name: MTEB MassiveIntentClassification (tr) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 26.22233152975898 - type: f1 value: 24.13532374526957 - type: f1_weighted value: 24.801681753477833 - type: main_score value: 26.22233152975898 task: type: Classification - dataset: config: pt name: MTEB MassiveIntentClassification (pt) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 33.85145105755042 - type: f1 value: 30.993852084910046 - type: f1_weighted value: 31.47706557692265 - type: main_score value: 33.85145105755042 task: type: Classification - dataset: config: sq name: MTEB MassiveIntentClassification (sq) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.69699950811608 - type: f1 value: 28.43551777754717 - type: f1_weighted value: 29.35991647173387 - type: main_score value: 31.69699950811608 task: type: Classification - dataset: config: zh-TW name: MTEB MassiveIntentClassification (zh-TW) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 6.296114117068371 - type: f1 value: 4.469538815411268 - type: f1_weighted value: 4.470912934534107 - type: main_score value: 6.296114117068371 task: type: Classification - dataset: config: hy name: MTEB MassiveIntentClassification (hy) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.6660108214461387 - type: f1 value: 0.7095128645283928 - type: f1_weighted value: 0.900359447084975 - type: main_score value: 2.6660108214461387 task: type: Classification - dataset: config: da name: MTEB MassiveIntentClassification (da) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 32.24790949335957 - type: f1 value: 30.09602016401104 - type: f1_weighted value: 31.27365296679004 - type: main_score value: 32.24790949335957 task: type: Classification - dataset: config: af name: MTEB MassiveIntentClassification (af) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 29.85243482538121 - type: f1 value: 27.02898547703625 - type: f1_weighted value: 29.19825733648402 - type: main_score value: 29.85243482538121 task: type: Classification - dataset: config: ar name: MTEB MassiveIntentClassification (ar) revision: 
4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 3.413674372848008 - type: f1 value: 2.3814730307183596 - type: f1_weighted value: 2.758592436005351 - type: main_score value: 3.413674372848008 task: type: Classification - dataset: config: jv name: MTEB MassiveIntentClassification (jv) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 27.59960649286769 - type: f1 value: 25.169829835887036 - type: f1_weighted value: 26.378021821617065 - type: main_score value: 27.59960649286769 task: type: Classification - dataset: config: te name: MTEB MassiveIntentClassification (te) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.0363994097393014 - type: f1 value: 0.7934004289138196 - type: f1_weighted value: 1.1834679007875544 - type: main_score value: 2.0363994097393014 task: type: Classification - dataset: config: tl name: MTEB MassiveIntentClassification (tl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.43630103295622 - type: f1 value: 28.28710817943075 - type: f1_weighted value: 29.47693147061905 - type: main_score value: 31.43630103295622 task: type: Classification - dataset: config: sw name: MTEB MassiveIntentClassification (sw) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 27.515986227250366 - type: f1 value: 25.65654395144761 - type: f1_weighted value: 26.414094210360055 - type: main_score value: 27.515986227250366 task: type: Classification - dataset: config: ja name: MTEB MassiveIntentClassification (ja) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 5.986227250368913 - type: f1 value: 3.9449730568824433 - type: f1_weighted value: 3.8102259721047833 - type: main_score value: 5.986227250368913 task: type: Classification - dataset: config: ms name: MTEB MassiveIntentClassification (ms) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 28.155435317265127 - type: f1 value: 25.708172487585202 - type: f1_weighted value: 27.024916707588677 - type: main_score value: 28.155435317265127 task: type: Classification - dataset: config: nb name: MTEB MassiveIntentClassification (nb) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 31.485489424495817 - type: f1 value: 29.47639008406045 - type: f1_weighted value: 30.377692398014027 - type: main_score value: 31.485489424495817 task: type: Classification - dataset: config: fi name: MTEB MassiveIntentClassification (fi) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.403344810624695 - type: f1 value: 26.82843832763937 - type: f1_weighted value: 28.11110907470959 - type: main_score value: 30.403344810624695 task: type: Classification - dataset: config: id name: MTEB MassiveIntentClassification (id) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 32.70044269552386 - type: f1 value: 30.910774335551594 - type: 
f1_weighted value: 31.371749140831422 - type: main_score value: 32.70044269552386 task: type: Classification - dataset: config: cy name: MTEB MassiveIntentClassification (cy) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 29.429414658140686 - type: f1 value: 25.594886516936256 - type: f1_weighted value: 28.392261199556877 - type: main_score value: 29.429414658140686 task: type: Classification - dataset: config: sl name: MTEB MassiveIntentClassification (sl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 29.636005902606982 - type: f1 value: 28.287023938527234 - type: f1_weighted value: 27.924913519954554 - type: main_score value: 29.636005902606982 task: type: Classification - dataset: config: es name: MTEB MassiveIntentClassification (es) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.63453025086079 - type: f1 value: 29.5921601385162 - type: f1_weighted value: 28.58410607526952 - type: main_score value: 30.63453025086079 task: type: Classification - dataset: config: bn name: MTEB MassiveIntentClassification (bn) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.867683226758485 - type: f1 value: 1.0374630680286294 - type: f1_weighted value: 1.3261691151267023 - type: main_score value: 2.867683226758485 task: type: Classification - dataset: config: sv name: MTEB MassiveIntentClassification (sv) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 29.754058042302017 - type: f1 value: 27.921243093926957 - type: f1_weighted value: 28.600526975101815 - type: main_score value: 29.754058042302017 task: type: Classification - dataset: config: ru name: MTEB MassiveIntentClassification (ru) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 58.06197737333989 - type: f1 value: 53.92404816772661 - type: f1_weighted value: 56.72057857737771 - type: main_score value: 58.06197737333989 task: type: Classification - dataset: config: az name: MTEB MassiveIntentClassification (az) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 22.725036891293655 - type: f1 value: 22.05764593465915 - type: f1_weighted value: 22.36326529771844 - type: main_score value: 22.725036891293655 task: type: Classification - dataset: config: it name: MTEB MassiveIntentClassification (it) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 34.57943925233645 - type: f1 value: 33.54269802516337 - type: f1_weighted value: 31.59380780190696 - type: main_score value: 34.57943925233645 task: type: Classification - dataset: config: pl name: MTEB MassiveIntentClassification (pl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 26.050172159370387 - type: f1 value: 23.37018289487783 - type: f1_weighted value: 24.52891801190779 - type: main_score value: 26.050172159370387 task: type: Classification - dataset: config: vi name: MTEB MassiveIntentClassification (vi) revision: 
4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 23.10378750614855 - type: f1 value: 19.634766811442688 - type: f1_weighted value: 21.39922163237278 - type: main_score value: 23.10378750614855 task: type: Classification - dataset: config: ta name: MTEB MassiveIntentClassification (ta) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 1.382193802262666 - type: f1 value: 0.2962201919122291 - type: f1_weighted value: 0.36568543738308745 - type: main_score value: 1.382193802262666 task: type: Classification - dataset: config: he name: MTEB MassiveIntentClassification (he) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 2.0560747663551404 - type: f1 value: 0.4742414282381403 - type: f1_weighted value: 0.5861893507001308 - type: main_score value: 2.0560747663551404 task: type: Classification - dataset: config: nl name: MTEB MassiveIntentClassification (nl) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 30.5115592720118 - type: f1 value: 27.61045064110582 - type: f1_weighted value: 28.987990654116114 - type: main_score value: 30.5115592720118 task: type: Classification - dataset: config: km name: MTEB MassiveIntentClassification (km) revision: 4672e20407010da34463acc759c162ca9734bca6 split: validation type: mteb/amazon_massive_intent metrics: - type: accuracy value: 4.377766847024103 - type: f1 value: 1.2676703377671132 - type: f1_weighted value: 1.426174554035529 - type: main_score value: 4.377766847024103 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveScenarioClassification (zh-CN) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 10.601882985877605 - type: f1 value: 6.8689500634035365 - type: f1_weighted value: 8.260029142337519 - type: main_score value: 10.601882985877605 task: type: Classification - dataset: config: ko name: MTEB MassiveScenarioClassification (ko) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 5.62542030934768 - type: f1 value: 1.9399090161521315 - type: f1_weighted value: 1.7790298099358886 - type: main_score value: 5.62542030934768 task: type: Classification - dataset: config: hi name: MTEB MassiveScenarioClassification (hi) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.407531943510423 - type: f1 value: 3.622072056826428 - type: f1_weighted value: 3.444172662951229 - type: main_score value: 7.407531943510423 task: type: Classification - dataset: config: kn name: MTEB MassiveScenarioClassification (kn) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.602555480833894 - type: f1 value: 3.9001734711485803 - type: f1_weighted value: 3.4912256692008397 - type: main_score value: 7.602555480833894 task: type: Classification - dataset: config: ka name: MTEB MassiveScenarioClassification (ka) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.010759919300605 - type: f1 value: 2.1485666974093878 - type: f1_weighted value: 
2.3739456428263477 - type: main_score value: 7.010759919300605 task: type: Classification - dataset: config: am name: MTEB MassiveScenarioClassification (am) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.679892400806995 - type: f1 value: 2.728187383195907 - type: f1_weighted value: 3.0454310752856353 - type: main_score value: 7.679892400806995 task: type: Classification - dataset: config: my name: MTEB MassiveScenarioClassification (my) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 10.729657027572292 - type: f1 value: 4.138439669406968 - type: f1_weighted value: 4.843092536146883 - type: main_score value: 10.729657027572292 task: type: Classification - dataset: config: el name: MTEB MassiveScenarioClassification (el) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 17.952252858103563 - type: f1 value: 12.418135741505608 - type: f1_weighted value: 15.228054842385186 - type: main_score value: 17.952252858103563 task: type: Classification - dataset: config: lv name: MTEB MassiveScenarioClassification (lv) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 29.29388029589779 - type: f1 value: 25.95638727776611 - type: f1_weighted value: 27.82646328315652 - type: main_score value: 29.29388029589779 task: type: Classification - dataset: config: ml name: MTEB MassiveScenarioClassification (ml) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 6.923335574983189 - type: f1 value: 2.2338102382542795 - type: f1_weighted value: 2.837475945704109 - type: main_score value: 6.923335574983189 task: type: Classification - dataset: config: mn name: MTEB MassiveScenarioClassification (mn) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 33.70208473436449 - type: f1 value: 31.451013524608147 - type: f1_weighted value: 33.4571016718763 - type: main_score value: 33.70208473436449 task: type: Classification - dataset: config: ur name: MTEB MassiveScenarioClassification (ur) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.530598520511097 - type: f1 value: 3.993356806346034 - type: f1_weighted value: 4.275297414153249 - type: main_score value: 8.530598520511097 task: type: Classification - dataset: config: fa name: MTEB MassiveScenarioClassification (fa) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 6.6240753194351045 - type: f1 value: 2.559179690443991 - type: f1_weighted value: 2.8775036329690353 - type: main_score value: 6.6240753194351045 task: type: Classification - dataset: config: ro name: MTEB MassiveScenarioClassification (ro) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 40.01681237390719 - type: f1 value: 36.15548220887307 - type: f1_weighted value: 38.91143847106075 - type: main_score value: 40.01681237390719 task: type: Classification - dataset: config: is name: MTEB MassiveScenarioClassification (is) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test 
type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 33.10356422326833 - type: f1 value: 29.87073203020746 - type: f1_weighted value: 32.736926298821786 - type: main_score value: 33.10356422326833 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 61.291190316072644 - type: f1 value: 58.09487277036398 - type: f1_weighted value: 60.52223749579593 - type: main_score value: 61.291190316072644 task: type: Classification - dataset: config: hu name: MTEB MassiveScenarioClassification (hu) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 36.40551445864156 - type: f1 value: 32.12815170334265 - type: f1_weighted value: 35.421611675898745 - type: main_score value: 36.40551445864156 task: type: Classification - dataset: config: fr name: MTEB MassiveScenarioClassification (fr) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 42.90181573638198 - type: f1 value: 39.00450485042174 - type: f1_weighted value: 41.74577968212385 - type: main_score value: 42.90181573638198 task: type: Classification - dataset: config: th name: MTEB MassiveScenarioClassification (th) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.261600537995966 - type: f1 value: 3.8946817615361597 - type: f1_weighted value: 3.7437491646031926 - type: main_score value: 8.261600537995966 task: type: Classification - dataset: config: de name: MTEB MassiveScenarioClassification (de) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 42.07128446536651 - type: f1 value: 38.28996078984755 - type: f1_weighted value: 41.04738811504033 - type: main_score value: 42.07128446536651 task: type: Classification - dataset: config: tr name: MTEB MassiveScenarioClassification (tr) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 34.845326160053794 - type: f1 value: 32.52170618407094 - type: f1_weighted value: 33.35658510579412 - type: main_score value: 34.845326160053794 task: type: Classification - dataset: config: pt name: MTEB MassiveScenarioClassification (pt) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 40.78681909885676 - type: f1 value: 37.33575502776686 - type: f1_weighted value: 38.66002021299529 - type: main_score value: 40.78681909885676 task: type: Classification - dataset: config: sq name: MTEB MassiveScenarioClassification (sq) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 42.65635507733692 - type: f1 value: 38.53947437411434 - type: f1_weighted value: 41.52520693995739 - type: main_score value: 42.65635507733692 task: type: Classification - dataset: config: zh-TW name: MTEB MassiveScenarioClassification (zh-TW) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 11.926698049764628 - type: f1 value: 8.724194514820493 - type: f1_weighted value: 10.266244979280504 - type: main_score value: 11.926698049764628 task: 
type: Classification - dataset: config: hy name: MTEB MassiveScenarioClassification (hy) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.779421654337593 - type: f1 value: 3.47659510611439 - type: f1_weighted value: 4.092370736159162 - type: main_score value: 8.779421654337593 task: type: Classification - dataset: config: da name: MTEB MassiveScenarioClassification (da) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 43.6852723604573 - type: f1 value: 39.338012150585094 - type: f1_weighted value: 43.3756140521009 - type: main_score value: 43.6852723604573 task: type: Classification - dataset: config: af name: MTEB MassiveScenarioClassification (af) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 40.83725622057835 - type: f1 value: 36.67993326074695 - type: f1_weighted value: 40.73536387442413 - type: main_score value: 40.83725622057835 task: type: Classification - dataset: config: ar name: MTEB MassiveScenarioClassification (ar) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 11.859448554135843 - type: f1 value: 6.502577103628851 - type: f1_weighted value: 9.922384035467028 - type: main_score value: 11.859448554135843 task: type: Classification - dataset: config: jv name: MTEB MassiveScenarioClassification (jv) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 37.22932078009414 - type: f1 value: 34.37198836784653 - type: f1_weighted value: 36.41682430619207 - type: main_score value: 37.22932078009414 task: type: Classification - dataset: config: te name: MTEB MassiveScenarioClassification (te) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 6.909885675857431 - type: f1 value: 2.659712889039866 - type: f1_weighted value: 3.315252295282912 - type: main_score value: 6.909885675857431 task: type: Classification - dataset: config: tl name: MTEB MassiveScenarioClassification (tl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 38.157363819771355 - type: f1 value: 33.871383306341926 - type: f1_weighted value: 37.16844466757229 - type: main_score value: 38.157363819771355 task: type: Classification - dataset: config: sw name: MTEB MassiveScenarioClassification (sw) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 35.65904505716207 - type: f1 value: 32.95848641686319 - type: f1_weighted value: 33.46347965861419 - type: main_score value: 35.65904505716207 task: type: Classification - dataset: config: ja name: MTEB MassiveScenarioClassification (ja) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 10.601882985877605 - type: f1 value: 8.05499004226519 - type: f1_weighted value: 8.12291817923475 - type: main_score value: 10.601882985877605 task: type: Classification - dataset: config: ms name: MTEB MassiveScenarioClassification (ms) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 
38.97108271687962 - type: f1 value: 34.19920488698337 - type: f1_weighted value: 37.406365439450006 - type: main_score value: 38.97108271687962 task: type: Classification - dataset: config: nb name: MTEB MassiveScenarioClassification (nb) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 39.04505716207128 - type: f1 value: 35.380977049887605 - type: f1_weighted value: 38.79082603370826 - type: main_score value: 39.04505716207128 task: type: Classification - dataset: config: fi name: MTEB MassiveScenarioClassification (fi) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 35.18829858776059 - type: f1 value: 30.972699263943966 - type: f1_weighted value: 34.66929745941575 - type: main_score value: 35.18829858776059 task: type: Classification - dataset: config: id name: MTEB MassiveScenarioClassification (id) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 39.53934095494284 - type: f1 value: 37.19939485401421 - type: f1_weighted value: 38.163540271879384 - type: main_score value: 39.53934095494284 task: type: Classification - dataset: config: cy name: MTEB MassiveScenarioClassification (cy) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 39.85205110961668 - type: f1 value: 34.567211938088086 - type: f1_weighted value: 38.93137139872493 - type: main_score value: 39.85205110961668 task: type: Classification - dataset: config: sl name: MTEB MassiveScenarioClassification (sl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 35.978480161398785 - type: f1 value: 33.70493150778863 - type: f1_weighted value: 34.89613180942136 - type: main_score value: 35.978480161398785 task: type: Classification - dataset: config: es name: MTEB MassiveScenarioClassification (es) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 37.12508406186954 - type: f1 value: 34.14887874344704 - type: f1_weighted value: 35.491336292250615 - type: main_score value: 37.12508406186954 task: type: Classification - dataset: config: bn name: MTEB MassiveScenarioClassification (bn) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.846671149966376 - type: f1 value: 3.772079613264656 - type: f1_weighted value: 4.569880079881123 - type: main_score value: 8.846671149966376 task: type: Classification - dataset: config: sv name: MTEB MassiveScenarioClassification (sv) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 36.11970410221924 - type: f1 value: 33.64741825888341 - type: f1_weighted value: 36.04738800166304 - type: main_score value: 36.11970410221924 task: type: Classification - dataset: config: ru name: MTEB MassiveScenarioClassification (ru) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 62.89509078681911 - type: f1 value: 62.296937620668366 - type: f1_weighted value: 61.50844245234364 - type: main_score value: 62.89509078681911 task: type: Classification - dataset: config: az name: MTEB 
MassiveScenarioClassification (az) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 30.31607262945528 - type: f1 value: 27.373913596444382 - type: f1_weighted value: 29.154743431705356 - type: main_score value: 30.31607262945528 task: type: Classification - dataset: config: it name: MTEB MassiveScenarioClassification (it) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 42.68997982515131 - type: f1 value: 39.34921574451304 - type: f1_weighted value: 41.39971354124732 - type: main_score value: 42.68997982515131 task: type: Classification - dataset: config: pl name: MTEB MassiveScenarioClassification (pl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 31.62071284465367 - type: f1 value: 27.53427875798914 - type: f1_weighted value: 30.442690748521006 - type: main_score value: 31.62071284465367 task: type: Classification - dataset: config: vi name: MTEB MassiveScenarioClassification (vi) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 31.889710827168795 - type: f1 value: 29.1527074423781 - type: f1_weighted value: 29.84128781391531 - type: main_score value: 31.889710827168795 task: type: Classification - dataset: config: ta name: MTEB MassiveScenarioClassification (ta) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.007397444519166 - type: f1 value: 1.763256752893296 - type: f1_weighted value: 2.3996756522652913 - type: main_score value: 7.007397444519166 task: type: Classification - dataset: config: he name: MTEB MassiveScenarioClassification (he) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.612642905178212 - type: f1 value: 2.0115132382174585 - type: f1_weighted value: 2.8178938596974503 - type: main_score value: 7.612642905178212 task: type: Classification - dataset: config: nl name: MTEB MassiveScenarioClassification (nl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 40.93813046402152 - type: f1 value: 35.475977992563635 - type: f1_weighted value: 40.249098836834044 - type: main_score value: 40.93813046402152 task: type: Classification - dataset: config: km name: MTEB MassiveScenarioClassification (km) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: test type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.510423671822462 - type: f1 value: 2.77822187113745 - type: f1_weighted value: 3.488782507211019 - type: main_score value: 8.510423671822462 task: type: Classification - dataset: config: zh-CN name: MTEB MassiveScenarioClassification (zh-CN) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 10.560747663551401 - type: f1 value: 7.321692095226571 - type: f1_weighted value: 8.136926309421098 - type: main_score value: 10.560747663551401 task: type: Classification - dataset: config: ko name: MTEB MassiveScenarioClassification (ko) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 5.622233152975899 - type: f1 value: 
1.7454943918873769 - type: f1_weighted value: 1.5544580080510706 - type: main_score value: 5.622233152975899 task: type: Classification - dataset: config: hi name: MTEB MassiveScenarioClassification (hi) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.50614854894245 - type: f1 value: 3.671558894965337 - type: f1_weighted value: 3.6075123924941224 - type: main_score value: 7.50614854894245 task: type: Classification - dataset: config: kn name: MTEB MassiveScenarioClassification (kn) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.047220855878013 - type: f1 value: 4.199596683728984 - type: f1_weighted value: 3.705979981207572 - type: main_score value: 8.047220855878013 task: type: Classification - dataset: config: ka name: MTEB MassiveScenarioClassification (ka) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 6.591244466305953 - type: f1 value: 1.9804826267181144 - type: f1_weighted value: 2.1652032753558714 - type: main_score value: 6.591244466305953 task: type: Classification - dataset: config: am name: MTEB MassiveScenarioClassification (am) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.511067388096411 - type: f1 value: 2.641163180255864 - type: f1_weighted value: 3.03599461945174 - type: main_score value: 7.511067388096411 task: type: Classification - dataset: config: my name: MTEB MassiveScenarioClassification (my) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 11.234628627643877 - type: f1 value: 4.53829675095688 - type: f1_weighted value: 5.119828126415879 - type: main_score value: 11.234628627643877 task: type: Classification - dataset: config: el name: MTEB MassiveScenarioClassification (el) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 16.438760452533202 - type: f1 value: 12.026293516540374 - type: f1_weighted value: 13.40697491103347 - type: main_score value: 16.438760452533202 task: type: Classification - dataset: config: lv name: MTEB MassiveScenarioClassification (lv) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 28.470241023118547 - type: f1 value: 26.06308403577423 - type: f1_weighted value: 26.913188635640108 - type: main_score value: 28.470241023118547 task: type: Classification - dataset: config: ml name: MTEB MassiveScenarioClassification (ml) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.34874569601574 - type: f1 value: 2.163368202700301 - type: f1_weighted value: 2.9794749471502735 - type: main_score value: 7.34874569601574 task: type: Classification - dataset: config: mn name: MTEB MassiveScenarioClassification (mn) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 33.482538121003444 - type: f1 value: 31.74224548475336 - type: f1_weighted value: 32.974792871093996 - type: main_score value: 33.482538121003444 task: type: Classification - dataset: config: ur name: MTEB 
MassiveScenarioClassification (ur) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.735858337432365 - type: f1 value: 4.387957216974412 - type: f1_weighted value: 4.487011850573568 - type: main_score value: 8.735858337432365 task: type: Classification - dataset: config: fa name: MTEB MassiveScenarioClassification (fa) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 6.8027545499262185 - type: f1 value: 2.724940339247371 - type: f1_weighted value: 2.9191909608862248 - type: main_score value: 6.8027545499262185 task: type: Classification - dataset: config: ro name: MTEB MassiveScenarioClassification (ro) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 39.77865223807182 - type: f1 value: 36.713842977439086 - type: f1_weighted value: 38.411147363742614 - type: main_score value: 39.77865223807182 task: type: Classification - dataset: config: is name: MTEB MassiveScenarioClassification (is) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 32.611903590752576 - type: f1 value: 30.478777350564933 - type: f1_weighted value: 32.33376716992967 - type: main_score value: 32.611903590752576 task: type: Classification - dataset: config: en name: MTEB MassiveScenarioClassification (en) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 60.81652729955731 - type: f1 value: 57.85686645797947 - type: f1_weighted value: 59.96336225413508 - type: main_score value: 60.81652729955731 task: type: Classification - dataset: config: hu name: MTEB MassiveScenarioClassification (hu) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 35.041810132808656 - type: f1 value: 32.32895536298411 - type: f1_weighted value: 34.08983039599136 - type: main_score value: 35.041810132808656 task: type: Classification - dataset: config: fr name: MTEB MassiveScenarioClassification (fr) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 42.4151500245942 - type: f1 value: 39.716877977971514 - type: f1_weighted value: 40.98904556640093 - type: main_score value: 42.4151500245942 task: type: Classification - dataset: config: th name: MTEB MassiveScenarioClassification (th) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.253812100344318 - type: f1 value: 4.2941598559113645 - type: f1_weighted value: 3.7137986151126743 - type: main_score value: 8.253812100344318 task: type: Classification - dataset: config: de name: MTEB MassiveScenarioClassification (de) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 40.65912444663059 - type: f1 value: 37.90162745459205 - type: f1_weighted value: 39.942707376839756 - type: main_score value: 40.65912444663059 task: type: Classification - dataset: config: tr name: MTEB MassiveScenarioClassification (tr) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy 
value: 33.85145105755042 - type: f1 value: 32.41363211826809 - type: f1_weighted value: 32.696811929693745 - type: main_score value: 33.85145105755042 task: type: Classification - dataset: config: pt name: MTEB MassiveScenarioClassification (pt) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 40.22626660108214 - type: f1 value: 37.84448697275546 - type: f1_weighted value: 37.82059370217246 - type: main_score value: 40.22626660108214 task: type: Classification - dataset: config: sq name: MTEB MassiveScenarioClassification (sq) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 42.06591244466306 - type: f1 value: 38.76214747335659 - type: f1_weighted value: 40.65484003509404 - type: main_score value: 42.06591244466306 task: type: Classification - dataset: config: zh-TW name: MTEB MassiveScenarioClassification (zh-TW) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 11.682242990654206 - type: f1 value: 8.850699907144218 - type: f1_weighted value: 9.655517346069553 - type: main_score value: 11.682242990654206 task: type: Classification - dataset: config: hy name: MTEB MassiveScenarioClassification (hy) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.52926709296606 - type: f1 value: 3.4189589714301167 - type: f1_weighted value: 3.894511154092698 - type: main_score value: 8.52926709296606 task: type: Classification - dataset: config: da name: MTEB MassiveScenarioClassification (da) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 41.14117068371864 - type: f1 value: 38.08063702754415 - type: f1_weighted value: 40.65305294882936 - type: main_score value: 41.14117068371864 task: type: Classification - dataset: config: af name: MTEB MassiveScenarioClassification (af) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 39.3654697491392 - type: f1 value: 36.43369907401146 - type: f1_weighted value: 39.09920883835431 - type: main_score value: 39.3654697491392 task: type: Classification - dataset: config: ar name: MTEB MassiveScenarioClassification (ar) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 11.362518445646828 - type: f1 value: 6.2728348209099565 - type: f1_weighted value: 8.903159425462325 - type: main_score value: 11.362518445646828 task: type: Classification - dataset: config: jv name: MTEB MassiveScenarioClassification (jv) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 36.246925725528776 - type: f1 value: 34.242775177193415 - type: f1_weighted value: 34.90531238831363 - type: main_score value: 36.246925725528776 task: type: Classification - dataset: config: te name: MTEB MassiveScenarioClassification (te) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 6.861780619773734 - type: f1 value: 2.7017710457799873 - type: f1_weighted value: 3.1681349264113137 - type: main_score value: 6.861780619773734 task: type: 
Classification - dataset: config: tl name: MTEB MassiveScenarioClassification (tl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 38.17019183472701 - type: f1 value: 34.777811838185485 - type: f1_weighted value: 36.90042555420213 - type: main_score value: 38.17019183472701 task: type: Classification - dataset: config: sw name: MTEB MassiveScenarioClassification (sw) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 35.32710280373832 - type: f1 value: 33.32826385073952 - type: f1_weighted value: 33.388725291289916 - type: main_score value: 35.32710280373832 task: type: Classification - dataset: config: ja name: MTEB MassiveScenarioClassification (ja) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 11.20511559272012 - type: f1 value: 8.976181412932425 - type: f1_weighted value: 8.576498601594645 - type: main_score value: 11.20511559272012 task: type: Classification - dataset: config: ms name: MTEB MassiveScenarioClassification (ms) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 38.85391047712739 - type: f1 value: 34.90571468739814 - type: f1_weighted value: 36.82763280572209 - type: main_score value: 38.85391047712739 task: type: Classification - dataset: config: nb name: MTEB MassiveScenarioClassification (nb) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 38.052139695031975 - type: f1 value: 35.272001887507564 - type: f1_weighted value: 37.42041278303434 - type: main_score value: 38.052139695031975 task: type: Classification - dataset: config: fi name: MTEB MassiveScenarioClassification (fi) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 34.500737825873095 - type: f1 value: 30.68780970737908 - type: f1_weighted value: 33.716051134823 - type: main_score value: 34.500737825873095 task: type: Classification - dataset: config: id name: MTEB MassiveScenarioClassification (id) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 39.596655189375305 - type: f1 value: 37.72092200675893 - type: f1_weighted value: 37.89234511492137 - type: main_score value: 39.596655189375305 task: type: Classification - dataset: config: cy name: MTEB MassiveScenarioClassification (cy) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 38.93261190359076 - type: f1 value: 34.67593293977394 - type: f1_weighted value: 37.58144266593478 - type: main_score value: 38.93261190359076 task: type: Classification - dataset: config: sl name: MTEB MassiveScenarioClassification (sl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 35.336940482046245 - type: f1 value: 34.06391073492543 - type: f1_weighted value: 34.19964460077873 - type: main_score value: 35.336940482046245 task: type: Classification - dataset: config: es name: MTEB MassiveScenarioClassification (es) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: 
mteb/amazon_massive_scenario metrics: - type: accuracy value: 36.28135759960649 - type: f1 value: 33.98213113943637 - type: f1_weighted value: 34.432683108706726 - type: main_score value: 36.28135759960649 task: type: Classification - dataset: config: bn name: MTEB MassiveScenarioClassification (bn) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.789965568125922 - type: f1 value: 3.615951273986677 - type: f1_weighted value: 4.543124755655086 - type: main_score value: 8.789965568125922 task: type: Classification - dataset: config: sv name: MTEB MassiveScenarioClassification (sv) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 35.78947368421053 - type: f1 value: 33.641144471139874 - type: f1_weighted value: 35.35509200878473 - type: main_score value: 35.78947368421053 task: type: Classification - dataset: config: ru name: MTEB MassiveScenarioClassification (ru) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 64.14658140678799 - type: f1 value: 63.45318114952019 - type: f1_weighted value: 62.837233214870004 - type: main_score value: 64.14658140678799 task: type: Classification - dataset: config: az name: MTEB MassiveScenarioClassification (az) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 29.616330545991143 - type: f1 value: 27.89304924236733 - type: f1_weighted value: 28.557344732597763 - type: main_score value: 29.616330545991143 task: type: Classification - dataset: config: it name: MTEB MassiveScenarioClassification (it) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 41.1952779144122 - type: f1 value: 38.70295863724121 - type: f1_weighted value: 39.8087264213271 - type: main_score value: 41.1952779144122 task: type: Classification - dataset: config: pl name: MTEB MassiveScenarioClassification (pl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 30.15248401377275 - type: f1 value: 27.24749237955316 - type: f1_weighted value: 29.24459561389263 - type: main_score value: 30.15248401377275 task: type: Classification - dataset: config: vi name: MTEB MassiveScenarioClassification (vi) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 31.942941465814062 - type: f1 value: 29.238187005403976 - type: f1_weighted value: 29.360530025850295 - type: main_score value: 31.942941465814062 task: type: Classification - dataset: config: ta name: MTEB MassiveScenarioClassification (ta) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.211018199704869 - type: f1 value: 1.858123064629565 - type: f1_weighted value: 2.531232017204237 - type: main_score value: 7.211018199704869 task: type: Classification - dataset: config: he name: MTEB MassiveScenarioClassification (he) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 7.948844072798819 - type: f1 value: 2.1010859887190896 - type: f1_weighted value: 3.0480176454133283 - type: 
main_score value: 7.948844072798819 task: type: Classification - dataset: config: nl name: MTEB MassiveScenarioClassification (nl) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 38.92277422528283 - type: f1 value: 35.488036321576146 - type: f1_weighted value: 38.18536556200914 - type: main_score value: 38.92277422528283 task: type: Classification - dataset: config: km name: MTEB MassiveScenarioClassification (km) revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8 split: validation type: mteb/amazon_massive_scenario metrics: - type: accuracy value: 8.150516478111165 - type: f1 value: 2.72691932389948 - type: f1_weighted value: 3.3948665965609117 - type: main_score value: 8.150516478111165 task: type: Classification - dataset: config: default name: MTEB MedrxivClusteringP2P (default) revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 split: test type: mteb/medrxiv-clustering-p2p metrics: - type: main_score value: 20.786832589263845 - type: v_measure value: 20.786832589263845 - type: v_measure_std value: 1.6048001943974946 task: type: Clustering - dataset: config: default name: MTEB MedrxivClusteringS2S (default) revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 split: test type: mteb/medrxiv-clustering-s2s metrics: - type: main_score value: 18.181247067178756 - type: v_measure value: 18.181247067178756 - type: v_measure_std value: 1.5798786706707373 task: type: Clustering - dataset: config: default name: MTEB NYSJudicialEthicsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 45.20547945205479 - type: ap value: 50.160551683623055 - type: ap_weighted value: 50.160551683623055 - type: f1 value: 44.53941120607787 - type: f1_weighted value: 44.28963561383653 - type: main_score value: 45.20547945205479 task: type: Classification - dataset: config: default name: MTEB NewsClassification (default) revision: eb185aade064a813bc0b7f42de02595523103ca4 split: test type: fancyzhx/ag_news metrics: - type: accuracy value: 73.78552631578948 - type: f1 value: 73.47724204580956 - type: f1_weighted value: 73.47724204580956 - type: main_score value: 73.78552631578948 task: type: Classification - dataset: config: default name: MTEB OPP115DataRetentionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 69.31818181818183 - type: ap value: 64.09705159705157 - type: ap_weighted value: 64.09705159705157 - type: f1 value: 69.12280701754385 - type: f1_weighted value: 69.12280701754386 - type: main_score value: 69.31818181818183 task: type: Classification - dataset: config: default name: MTEB OPP115DataSecurityLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 63.868065967016484 - type: ap value: 62.05622742346708 - type: ap_weighted value: 62.05622742346708 - type: f1 value: 60.25914242202488 - type: f1_weighted value: 60.22323273501004 - type: main_score value: 63.868065967016484 task: type: Classification - dataset: config: default name: MTEB OPP115DoNotTrackLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 88.18181818181819 - type: ap value: 85.12727272727273 - type: ap_weighted value: 85.12727272727273 - type: f1 value: 
88.15734989648034 - type: f1_weighted value: 88.15734989648034 - type: main_score value: 88.18181818181819 task: type: Classification - dataset: config: default name: MTEB OPP115FirstPartyCollectionUseLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 69.55896452540749 - type: ap value: 64.53342029559877 - type: ap_weighted value: 64.53342029559877 - type: f1 value: 69.32286869541191 - type: f1_weighted value: 69.31770813082186 - type: main_score value: 69.55896452540749 task: type: Classification - dataset: config: default name: MTEB OPP115InternationalAndSpecificAudiencesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 77.75510204081633 - type: ap value: 75.20843296586462 - type: ap_weighted value: 75.20843296586462 - type: f1 value: 77.09799280479909 - type: f1_weighted value: 77.11382676229348 - type: main_score value: 77.75510204081633 task: type: Classification - dataset: config: default name: MTEB OPP115PolicyChangeLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 89.0951276102088 - type: ap value: 87.15879085780726 - type: ap_weighted value: 87.15879085780726 - type: f1 value: 89.04203698995461 - type: f1_weighted value: 89.04380667729642 - type: main_score value: 89.0951276102088 task: type: Classification - dataset: config: default name: MTEB OPP115ThirdPartySharingCollectionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 64.27672955974842 - type: ap value: 62.893075413619535 - type: ap_weighted value: 62.893075413619535 - type: f1 value: 60.459952085405675 - type: f1_weighted value: 60.4135944642598 - type: main_score value: 64.27672955974842 task: type: Classification - dataset: config: default name: MTEB OPP115UserAccessEditAndDeletionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 67.09956709956711 - type: ap value: 62.92853137890984 - type: ap_weighted value: 62.92853137890984 - type: f1 value: 66.41414141414141 - type: f1_weighted value: 66.39337093882548 - type: main_score value: 67.09956709956711 task: type: Classification - dataset: config: default name: MTEB OPP115UserChoiceControlLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 70.69857697283311 - type: ap value: 63.961545634799855 - type: ap_weighted value: 63.961545634799855 - type: f1 value: 70.33565944829778 - type: f1_weighted value: 70.34414874711732 - type: main_score value: 70.69857697283311 task: type: Classification - dataset: config: default name: MTEB OralArgumentQuestionPurposeLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 20.51282051282051 - type: f1 value: 17.434477437885 - type: f1_weighted value: 21.50138868825342 - type: main_score value: 20.51282051282051 task: type: Classification - dataset: config: default name: MTEB OverrulingLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy 
value: 69.580078125 - type: ap value: 64.66695246425695 - type: ap_weighted value: 64.66695246425695 - type: f1 value: 69.55969170904413 - type: f1_weighted value: 69.5473829295991 - type: main_score value: 69.580078125 task: type: Classification - dataset: config: default name: MTEB PROALegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 49.47368421052632 - type: ap value: 49.47368421052632 - type: ap_weighted value: 49.47368421052632 - type: f1 value: 33.09859154929578 - type: f1_weighted value: 32.750185322461085 - type: main_score value: 49.47368421052632 task: type: Classification - dataset: config: default name: MTEB PatentClassification (default) revision: 2f38a1dfdecfacee0184d74eaeafd3c0fb49d2a6 split: test type: ccdv/patent-classification metrics: - type: accuracy value: 29.306640625000004 - type: f1 value: 22.127646065227754 - type: f1_weighted value: 26.66185625260182 - type: main_score value: 29.306640625000004 task: type: Classification - dataset: config: default name: MTEB PersonalJurisdictionLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 51.99999999999999 - type: ap value: 44.107526881720425 - type: ap_weighted value: 44.107526881720425 - type: f1 value: 51.92307692307692 - type: f1_weighted value: 51.61538461538463 - type: main_score value: 51.99999999999999 task: type: Classification - dataset: config: default name: MTEB PoemSentimentClassification (default) revision: 329d529d875a00c47ec71954a1a96ae167584770 split: test type: google-research-datasets/poem_sentiment metrics: - type: accuracy value: 35.96153846153845 - type: f1 value: 25.717059445124445 - type: f1_weighted value: 42.39026561619051 - type: main_score value: 35.96153846153845 task: type: Classification - dataset: config: default name: MTEB PoemSentimentClassification (default) revision: 329d529d875a00c47ec71954a1a96ae167584770 split: validation type: google-research-datasets/poem_sentiment metrics: - type: accuracy value: 35.80952380952381 - type: f1 value: 26.76432080315997 - type: f1_weighted value: 41.90402765909788 - type: main_score value: 35.80952380952381 task: type: Classification - dataset: config: default name: MTEB RUParaPhraserSTS (default) revision: 43265056790b8f7c59e0139acb4be0a8dad2c8f4 split: test type: merionum/ru_paraphraser metrics: - type: cosine_pearson value: 65.17293362215221 - type: cosine_spearman value: 72.14872507255558 - type: euclidean_pearson value: 69.39028550512482 - type: euclidean_spearman value: 72.14872507255558 - type: main_score value: 72.14872507255558 - type: manhattan_pearson value: 69.30934614737492 - type: manhattan_spearman value: 72.04933049290007 task: type: STS - dataset: config: default name: MTEB RedditClustering (default) revision: 24640382cdbf8abc73003fb0fa6d111a705499eb split: test type: mteb/reddit-clustering metrics: - type: main_score value: 26.275710753496597 - type: v_measure value: 26.275710753496597 - type: v_measure_std value: 4.029689555202136 task: type: Clustering - dataset: config: default name: MTEB RedditClusteringP2P (default) revision: 385e3cb46b4cfa89021f56c4380204149d0efe33 split: test type: mteb/reddit-clustering-p2p metrics: - type: main_score value: 40.4828876757081 - type: v_measure value: 40.4828876757081 - type: v_measure_std value: 10.162859998011204 task: type: Clustering - dataset: config: default name: MTEB RiaNewsRetrieval 
(default) revision: 82374b0bbacda6114f39ff9c5b925fa1512ca5d7 split: test type: ai-forever/ria-news-retrieval metrics: - type: main_score value: 51.271 - type: map_at_1 value: 36.21 - type: map_at_10 value: 46.208 - type: map_at_100 value: 47.004000000000005 - type: map_at_1000 value: 47.044000000000004 - type: map_at_20 value: 46.693 - type: map_at_3 value: 43.669999999999995 - type: map_at_5 value: 45.196 - type: mrr_at_1 value: 36.22 - type: mrr_at_10 value: 46.21178571428571 - type: mrr_at_100 value: 47.007420014661236 - type: mrr_at_1000 value: 47.04734848842366 - type: mrr_at_20 value: 46.69688042104938 - type: mrr_at_3 value: 43.668333333333585 - type: mrr_at_5 value: 45.199833333333274 - type: nauc_map_at_1000_diff1 value: 46.94937854830209 - type: nauc_map_at_1000_max value: 20.810031674720868 - type: nauc_map_at_1000_std value: -2.8474964036416845 - type: nauc_map_at_100_diff1 value: 46.93710679472339 - type: nauc_map_at_100_max value: 20.808355966268614 - type: nauc_map_at_100_std value: -2.8341393346842607 - type: nauc_map_at_10_diff1 value: 46.85305633304179 - type: nauc_map_at_10_max value: 20.74714400194472 - type: nauc_map_at_10_std value: -3.0251519873045534 - type: nauc_map_at_1_diff1 value: 52.76907950247656 - type: nauc_map_at_1_max value: 20.909404191190152 - type: nauc_map_at_1_std value: -4.486212769404569 - type: nauc_map_at_20_diff1 value: 46.854283528399826 - type: nauc_map_at_20_max value: 20.774565284237017 - type: nauc_map_at_20_std value: -2.8952917224271846 - type: nauc_map_at_3_diff1 value: 47.6120187355803 - type: nauc_map_at_3_max value: 20.94624350299643 - type: nauc_map_at_3_std value: -3.5249841066101704 - type: nauc_map_at_5_diff1 value: 46.961741404854 - type: nauc_map_at_5_max value: 20.84061893727113 - type: nauc_map_at_5_std value: -3.2560895841762707 - type: nauc_mrr_at_1000_diff1 value: 46.94210158390746 - type: nauc_mrr_at_1000_max value: 20.823017819566672 - type: nauc_mrr_at_1000_std value: -2.873564388596409 - type: nauc_mrr_at_100_diff1 value: 46.92983853646228 - type: nauc_mrr_at_100_max value: 20.821328345843625 - type: nauc_mrr_at_100_std value: -2.860179131955564 - type: nauc_mrr_at_10_diff1 value: 46.845920501930316 - type: nauc_mrr_at_10_max value: 20.760199941251056 - type: nauc_mrr_at_10_std value: -3.0506119945281385 - type: nauc_mrr_at_1_diff1 value: 52.7384650230153 - type: nauc_mrr_at_1_max value: 20.916918175962735 - type: nauc_mrr_at_1_std value: -4.553119995428164 - type: nauc_mrr_at_20_diff1 value: 46.84707480256205 - type: nauc_mrr_at_20_max value: 20.78745076885492 - type: nauc_mrr_at_20_std value: -2.921144125415831 - type: nauc_mrr_at_3_diff1 value: 47.621438923503305 - type: nauc_mrr_at_3_max value: 20.964983104645327 - type: nauc_mrr_at_3_std value: -3.5359639119054154 - type: nauc_mrr_at_5_diff1 value: 46.95496065526142 - type: nauc_mrr_at_5_max value: 20.85370692098222 - type: nauc_mrr_at_5_std value: -3.2815901993324985 - type: nauc_ndcg_at_1000_diff1 value: 45.22512963946746 - type: nauc_ndcg_at_1000_max value: 20.827437126737433 - type: nauc_ndcg_at_1000_std value: -1.5970972641072643 - type: nauc_ndcg_at_100_diff1 value: 44.870296183306195 - type: nauc_ndcg_at_100_max value: 20.734194655306457 - type: nauc_ndcg_at_100_std value: -1.1285720744844427 - type: nauc_ndcg_at_10_diff1 value: 44.428914407493004 - type: nauc_ndcg_at_10_max value: 20.440243514420057 - type: nauc_ndcg_at_10_std value: -2.1210028369378167 - type: nauc_ndcg_at_1_diff1 value: 52.76907950247656 - type: nauc_ndcg_at_1_max value: 20.909404191190152 
- type: nauc_ndcg_at_1_std value: -4.486212769404569 - type: nauc_ndcg_at_20_diff1 value: 44.333669717530185 - type: nauc_ndcg_at_20_max value: 20.503130801298607 - type: nauc_ndcg_at_20_std value: -1.6040287688898405 - type: nauc_ndcg_at_3_diff1 value: 45.988171772625634 - type: nauc_ndcg_at_3_max value: 20.901834276482294 - type: nauc_ndcg_at_3_std value: -3.228341348463241 - type: nauc_ndcg_at_5_diff1 value: 44.77257666022731 - type: nauc_ndcg_at_5_max value: 20.70409124701764 - type: nauc_ndcg_at_5_std value: -2.7157792836026826 - type: nauc_precision_at_1000_diff1 value: 24.715455802573878 - type: nauc_precision_at_1000_max value: 25.642760620422127 - type: nauc_precision_at_1000_std value: 20.124139669932596 - type: nauc_precision_at_100_diff1 value: 31.317204301075428 - type: nauc_precision_at_100_max value: 20.717841497411385 - type: nauc_precision_at_100_std value: 15.071826819138575 - type: nauc_precision_at_10_diff1 value: 35.455731038677605 - type: nauc_precision_at_10_max value: 19.1279684555736 - type: nauc_precision_at_10_std value: 1.47750077627525 - type: nauc_precision_at_1_diff1 value: 52.76907950247656 - type: nauc_precision_at_1_max value: 20.909404191190152 - type: nauc_precision_at_1_std value: -4.486212769404569 - type: nauc_precision_at_20_diff1 value: 33.12837939512509 - type: nauc_precision_at_20_max value: 19.114872213547194 - type: nauc_precision_at_20_std value: 4.913450374911581 - type: nauc_precision_at_3_diff1 value: 41.17113816710835 - type: nauc_precision_at_3_max value: 20.751510760974718 - type: nauc_precision_at_3_std value: -2.3503705806184496 - type: nauc_precision_at_5_diff1 value: 37.71917213552412 - type: nauc_precision_at_5_max value: 20.221342669216565 - type: nauc_precision_at_5_std value: -0.9301420941546075 - type: nauc_recall_at_1000_diff1 value: 24.715455802574407 - type: nauc_recall_at_1000_max value: 25.64276062042252 - type: nauc_recall_at_1000_std value: 20.124139669932728 - type: nauc_recall_at_100_diff1 value: 31.31720430107529 - type: nauc_recall_at_100_max value: 20.717841497411516 - type: nauc_recall_at_100_std value: 15.071826819138751 - type: nauc_recall_at_10_diff1 value: 35.455731038677655 - type: nauc_recall_at_10_max value: 19.127968455573654 - type: nauc_recall_at_10_std value: 1.47750077627532 - type: nauc_recall_at_1_diff1 value: 52.76907950247656 - type: nauc_recall_at_1_max value: 20.909404191190152 - type: nauc_recall_at_1_std value: -4.486212769404569 - type: nauc_recall_at_20_diff1 value: 33.12837939512524 - type: nauc_recall_at_20_max value: 19.1148722135474 - type: nauc_recall_at_20_std value: 4.91345037491176 - type: nauc_recall_at_3_diff1 value: 41.171138167108374 - type: nauc_recall_at_3_max value: 20.751510760974682 - type: nauc_recall_at_3_std value: -2.35037058061848 - type: nauc_recall_at_5_diff1 value: 37.71917213552414 - type: nauc_recall_at_5_max value: 20.221342669216575 - type: nauc_recall_at_5_std value: -0.9301420941545763 - type: ndcg_at_1 value: 36.21 - type: ndcg_at_10 value: 51.271 - type: ndcg_at_100 value: 55.289 - type: ndcg_at_1000 value: 56.401 - type: ndcg_at_20 value: 53.028 - type: ndcg_at_3 value: 46.078 - type: ndcg_at_5 value: 48.825 - type: precision_at_1 value: 36.21 - type: precision_at_10 value: 6.7250000000000005 - type: precision_at_100 value: 0.864 - type: precision_at_1000 value: 0.095 - type: precision_at_20 value: 3.7089999999999996 - type: precision_at_3 value: 17.68 - type: precision_at_5 value: 11.940000000000001 - type: recall_at_1 value: 36.21 - type: recall_at_10 value: 
67.25 - type: recall_at_100 value: 86.4 - type: recall_at_1000 value: 95.26 - type: recall_at_20 value: 74.18 - type: recall_at_3 value: 53.04 - type: recall_at_5 value: 59.699999999999996 task: type: Retrieval - dataset: config: default name: MTEB RuBQReranking (default) revision: 2e96b8f098fa4b0950fc58eacadeb31c0d0c7fa2 split: test type: ai-forever/rubq-reranking metrics: - type: main_score value: 62.15027154459556 - type: map value: 62.15027154459556 - type: mrr value: 68.09500782905037 - type: nAUC_map_diff1 value: 33.062970148901556 - type: nAUC_map_max value: 11.090302786599219 - type: nAUC_map_std value: 5.660375803457896 - type: nAUC_mrr_diff1 value: 35.578332777596685 - type: nAUC_mrr_max value: 14.981311816105839 - type: nAUC_mrr_std value: 5.550039824115788 task: type: Reranking - dataset: config: default name: MTEB RuBQRetrieval (default) revision: e19b6ffa60b3bc248e0b41f4cc37c26a55c2a67b split: test type: ai-forever/rubq-retrieval metrics: - type: main_score value: 51.734 - type: map_at_1 value: 28.510999999999996 - type: map_at_10 value: 43.631 - type: map_at_100 value: 44.988 - type: map_at_1000 value: 45.052 - type: map_at_20 value: 44.462 - type: map_at_3 value: 38.937 - type: map_at_5 value: 41.833 - type: mrr_at_1 value: 41.312056737588655 - type: mrr_at_10 value: 53.36138316634781 - type: mrr_at_100 value: 53.949276632310216 - type: mrr_at_1000 value: 53.97463197704906 - type: mrr_at_20 value: 53.72140863635181 - type: mrr_at_3 value: 50.43341213553989 - type: mrr_at_5 value: 52.32466509062269 - type: nauc_map_at_1000_diff1 value: 28.763838953386795 - type: nauc_map_at_1000_max value: 24.058720207454833 - type: nauc_map_at_1000_std value: 0.43914028345667794 - type: nauc_map_at_100_diff1 value: 28.74115734128027 - type: nauc_map_at_100_max value: 24.067201633751907 - type: nauc_map_at_100_std value: 0.48479657643151175 - type: nauc_map_at_10_diff1 value: 28.78055585777882 - type: nauc_map_at_10_max value: 23.660824446842014 - type: nauc_map_at_10_std value: -0.13417257945838412 - type: nauc_map_at_1_diff1 value: 31.726698171475988 - type: nauc_map_at_1_max value: 18.706684051084675 - type: nauc_map_at_1_std value: -3.1112088462944576 - type: nauc_map_at_20_diff1 value: 28.821888050893524 - type: nauc_map_at_20_max value: 24.054108877450066 - type: nauc_map_at_20_std value: 0.29933097295171895 - type: nauc_map_at_3_diff1 value: 29.414059668041187 - type: nauc_map_at_3_max value: 21.603288627966425 - type: nauc_map_at_3_std value: -1.2582454726026868 - type: nauc_map_at_5_diff1 value: 28.763709067820066 - type: nauc_map_at_5_max value: 22.83472652858084 - type: nauc_map_at_5_std value: -0.9139576784503077 - type: nauc_mrr_at_1000_diff1 value: 32.788260400997885 - type: nauc_mrr_at_1000_max value: 26.645815716166126 - type: nauc_mrr_at_1000_std value: -1.751195655856463 - type: nauc_mrr_at_100_diff1 value: 32.77886459571929 - type: nauc_mrr_at_100_max value: 26.65637126850806 - type: nauc_mrr_at_100_std value: -1.7267980184678584 - type: nauc_mrr_at_10_diff1 value: 32.78874216502045 - type: nauc_mrr_at_10_max value: 26.4839655119896 - type: nauc_mrr_at_10_std value: -1.9790149014956449 - type: nauc_mrr_at_1_diff1 value: 35.13232635364635 - type: nauc_mrr_at_1_max value: 23.697653866746013 - type: nauc_mrr_at_1_std value: -3.229619940147812 - type: nauc_mrr_at_20_diff1 value: 32.77802354989702 - type: nauc_mrr_at_20_max value: 26.68040225454969 - type: nauc_mrr_at_20_std value: -1.75616956975016 - type: nauc_mrr_at_3_diff1 value: 32.984816761600435 - type: nauc_mrr_at_3_max 
value: 26.13901825373233 - type: nauc_mrr_at_3_std value: -2.52193076369521 - type: nauc_mrr_at_5_diff1 value: 32.84967841683121 - type: nauc_mrr_at_5_max value: 26.529547373322448 - type: nauc_mrr_at_5_std value: -2.5581887401849595 - type: nauc_ndcg_at_1000_diff1 value: 28.596338371171104 - type: nauc_ndcg_at_1000_max value: 26.398864343527546 - type: nauc_ndcg_at_1000_std value: 2.0928142009674264 - type: nauc_ndcg_at_100_diff1 value: 28.25901263389625 - type: nauc_ndcg_at_100_max value: 26.93052809711281 - type: nauc_ndcg_at_100_std value: 3.1368035623322266 - type: nauc_ndcg_at_10_diff1 value: 28.273504061219295 - type: nauc_ndcg_at_10_max value: 25.70274506672966 - type: nauc_ndcg_at_10_std value: 1.031980357515916 - type: nauc_ndcg_at_1_diff1 value: 35.288927336386486 - type: nauc_ndcg_at_1_max value: 23.407964640774143 - type: nauc_ndcg_at_1_std value: -3.2088824424845743 - type: nauc_ndcg_at_20_diff1 value: 28.27252389476242 - type: nauc_ndcg_at_20_max value: 26.959280568356686 - type: nauc_ndcg_at_20_std value: 2.355748254409649 - type: nauc_ndcg_at_3_diff1 value: 29.507109145825144 - type: nauc_ndcg_at_3_max value: 23.171704666301913 - type: nauc_ndcg_at_3_std value: -1.4521550440778286 - type: nauc_ndcg_at_5_diff1 value: 28.488416363267216 - type: nauc_ndcg_at_5_max value: 24.63470555569984 - type: nauc_ndcg_at_5_std value: -0.9243408985702865 - type: nauc_precision_at_1000_diff1 value: -1.6853041487515183 - type: nauc_precision_at_1000_max value: 7.960967030916032 - type: nauc_precision_at_1000_std value: 3.6491508412352784 - type: nauc_precision_at_100_diff1 value: 1.1138125936003078 - type: nauc_precision_at_100_max value: 14.425287491557784 - type: nauc_precision_at_100_std value: 8.976522577047673 - type: nauc_precision_at_10_diff1 value: 9.746060862351767 - type: nauc_precision_at_10_max value: 21.23608774117671 - type: nauc_precision_at_10_std value: 5.704741335087523 - type: nauc_precision_at_1_diff1 value: 35.288927336386486 - type: nauc_precision_at_1_max value: 23.407964640774143 - type: nauc_precision_at_1_std value: -3.2088824424845743 - type: nauc_precision_at_20_diff1 value: 6.326610022834949 - type: nauc_precision_at_20_max value: 20.35842844947274 - type: nauc_precision_at_20_std value: 8.561077634074318 - type: nauc_precision_at_3_diff1 value: 20.23921207457269 - type: nauc_precision_at_3_max value: 22.983126702497753 - type: nauc_precision_at_3_std value: 0.3762065769613514 - type: nauc_precision_at_5_diff1 value: 14.130374029335451 - type: nauc_precision_at_5_max value: 22.27280203101339 - type: nauc_precision_at_5_std value: 1.4403304333986182 - type: nauc_recall_at_1000_diff1 value: 5.336939388003354 - type: nauc_recall_at_1000_max value: 31.706880957377347 - type: nauc_recall_at_1000_std value: 34.42854130495 - type: nauc_recall_at_100_diff1 value: 13.06348098921675 - type: nauc_recall_at_100_max value: 35.43003105581946 - type: nauc_recall_at_100_std value: 28.949432461425634 - type: nauc_recall_at_10_diff1 value: 19.58510835348359 - type: nauc_recall_at_10_max value: 25.98205980928563 - type: nauc_recall_at_10_std value: 6.643640648680416 - type: nauc_recall_at_1_diff1 value: 31.726698171475988 - type: nauc_recall_at_1_max value: 18.706684051084675 - type: nauc_recall_at_1_std value: -3.1112088462944576 - type: nauc_recall_at_20_diff1 value: 17.50381042355996 - type: nauc_recall_at_20_max value: 31.185904487900324 - type: nauc_recall_at_20_std value: 13.510200942211565 - type: nauc_recall_at_3_diff1 value: 24.227382984516147 - type: nauc_recall_at_3_max 
value: 21.40248626451014 - type: nauc_recall_at_3_std value: -0.469137375497106 - type: nauc_recall_at_5_diff1 value: 21.25980638967181 - type: nauc_recall_at_5_max value: 23.853364661344404 - type: nauc_recall_at_5_std value: 0.7407724495151051 - type: ndcg_at_1 value: 41.253 - type: ndcg_at_10 value: 51.734 - type: ndcg_at_100 value: 56.796 - type: ndcg_at_1000 value: 58.044 - type: ndcg_at_20 value: 53.982 - type: ndcg_at_3 value: 44.448 - type: ndcg_at_5 value: 48.306 - type: precision_at_1 value: 41.253 - type: precision_at_10 value: 10.674 - type: precision_at_100 value: 1.437 - type: precision_at_1000 value: 0.159 - type: precision_at_20 value: 6.0280000000000005 - type: precision_at_3 value: 24.901 - type: precision_at_5 value: 18.038 - type: recall_at_1 value: 28.510999999999996 - type: recall_at_10 value: 65.646 - type: recall_at_100 value: 86.37 - type: recall_at_1000 value: 94.926 - type: recall_at_20 value: 73.236 - type: recall_at_3 value: 47.492000000000004 - type: recall_at_5 value: 56.552 task: type: Retrieval - dataset: config: default name: MTEB RuReviewsClassification (default) revision: f6d2c31f4dc6b88f468552750bfec05b4b41b05a split: test type: ai-forever/ru-reviews-classification metrics: - type: accuracy value: 60.6591796875 - type: f1 value: 60.34177974754267 - type: f1_weighted value: 60.3424791407144 - type: main_score value: 60.6591796875 task: type: Classification - dataset: config: default name: MTEB RuSTSBenchmarkSTS (default) revision: 7cf24f325c6da6195df55bef3d86b5e0616f3018 split: test type: ai-forever/ru-stsbenchmark-sts metrics: - type: cosine_pearson value: 78.67181755069355 - type: cosine_spearman value: 78.48157070388886 - type: euclidean_pearson value: 78.16400243944963 - type: euclidean_spearman value: 78.48124817526005 - type: main_score value: 78.48157070388886 - type: manhattan_pearson value: 78.04437263885238 - type: manhattan_spearman value: 78.34292373482941 task: type: STS - dataset: config: default name: MTEB RuSciBenchGRNTIClassification (default) revision: 673a610d6d3dd91a547a0d57ae1b56f37ebbf6a1 split: test type: ai-forever/ru-scibench-grnti-classification metrics: - type: accuracy value: 52.9296875 - type: f1 value: 51.36892216551846 - type: f1_weighted value: 51.38263945115431 - type: main_score value: 52.9296875 task: type: Classification - dataset: config: default name: MTEB RuSciBenchGRNTIClusteringP2P (default) revision: 673a610d6d3dd91a547a0d57ae1b56f37ebbf6a1 split: test type: ai-forever/ru-scibench-grnti-classification metrics: - type: main_score value: 47.548401486969844 - type: v_measure value: 47.548401486969844 - type: v_measure_std value: 0.9652047055316595 task: type: Clustering - dataset: config: default name: MTEB RuSciBenchOECDClassification (default) revision: 26c88e99dcaba32bb45d0e1bfc21902337f6d471 split: test type: ai-forever/ru-scibench-oecd-classification metrics: - type: accuracy value: 40.7861328125 - type: f1 value: 38.417161317304625 - type: f1_weighted value: 38.41751508417981 - type: main_score value: 40.7861328125 task: type: Classification - dataset: config: default name: MTEB RuSciBenchOECDClusteringP2P (default) revision: 26c88e99dcaba32bb45d0e1bfc21902337f6d471 split: test type: ai-forever/ru-scibench-oecd-classification metrics: - type: main_score value: 41.44039335680795 - type: v_measure value: 41.44039335680795 - type: v_measure_std value: 1.2447867997057736 task: type: Clustering - dataset: config: default name: MTEB SCDBPAccountabilityLegalBenchClassification (default) revision: 
12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 64.64379947229551 - type: ap value: 91.77095548714944 - type: ap_weighted value: 91.77095548714944 - type: f1 value: 56.37541231445849 - type: f1_weighted value: 70.25628045216064 - type: main_score value: 64.64379947229551 task: type: Classification - dataset: config: default name: MTEB SCDBPAuditsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 59.89445910290237 - type: ap value: 75.9408508806894 - type: ap_weighted value: 75.9408508806894 - type: f1 value: 59.26805814808528 - type: f1_weighted value: 61.147261012536525 - type: main_score value: 59.89445910290237 task: type: Classification - dataset: config: default name: MTEB SCDBPCertificationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 59.78835978835979 - type: ap value: 79.40365504574285 - type: ap_weighted value: 79.40365504574285 - type: f1 value: 56.06802055297283 - type: f1_weighted value: 62.49406105045939 - type: main_score value: 59.78835978835979 task: type: Classification - dataset: config: default name: MTEB SCDBPTrainingLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 59.102902374670194 - type: ap value: 78.86277214171828 - type: ap_weighted value: 78.86277214171828 - type: f1 value: 58.122144043570934 - type: f1_weighted value: 60.91223239928431 - type: main_score value: 59.102902374670194 task: type: Classification - dataset: config: default name: MTEB SCDBPVerificationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 62.796833773087066 - type: ap value: 66.09764646131225 - type: ap_weighted value: 66.09764646131225 - type: f1 value: 62.562263119916494 - type: f1_weighted value: 62.19476909661592 - type: main_score value: 62.796833773087066 task: type: Classification - dataset: config: default name: MTEB SCDDAccountabilityLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 60.84656084656085 - type: ap value: 96.40608145845859 - type: ap_weighted value: 96.40608145845859 - type: f1 value: 46.04166666666668 - type: f1_weighted value: 71.16512345679011 - type: main_score value: 60.84656084656085 task: type: Classification - dataset: config: default name: MTEB SCDDAuditsLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 61.741424802110814 - type: ap value: 94.08312772646127 - type: ap_weighted value: 94.08312772646127 - type: f1 value: 50.59825064499599 - type: f1_weighted value: 69.72736628137642 - type: main_score value: 61.741424802110814 task: type: Classification - dataset: config: default name: MTEB SCDDCertificationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 62.43386243386243 - type: ap value: 92.94462068443907 - type: ap_weighted value: 92.94462068443907 - type: f1 value: 49.37181663837012 - type: f1_weighted value: 70.32551510197236 - type: main_score value: 62.43386243386243 task: 
type: Classification - dataset: config: default name: MTEB SCDDTrainingLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 53.825857519788926 - type: ap value: 89.02073335965477 - type: ap_weighted value: 89.02073335965477 - type: f1 value: 47.22918407128933 - type: f1_weighted value: 60.86559112527728 - type: main_score value: 53.825857519788926 task: type: Classification - dataset: config: default name: MTEB SCDDVerificationLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 49.07651715039577 - type: ap value: 76.04960744098202 - type: ap_weighted value: 76.04960744098202 - type: f1 value: 47.939930963310914 - type: f1_weighted value: 51.65413225324895 - type: main_score value: 49.07651715039577 task: type: Classification - dataset: config: zh name: MTEB STS22 (zh) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 10.783707479640047 - type: cosine_spearman value: 32.82859566062349 - type: euclidean_pearson value: 21.280811252412548 - type: euclidean_spearman value: 32.82859566062349 - type: main_score value: 32.82859566062349 - type: manhattan_pearson value: 21.510100649883686 - type: manhattan_spearman value: 32.924353350152195 task: type: STS - dataset: config: de-fr name: MTEB STS22 (de-fr) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 10.185699265034293 - type: cosine_spearman value: 17.504453225721367 - type: euclidean_pearson value: 11.256743769494715 - type: euclidean_spearman value: 17.504453225721367 - type: main_score value: 17.504453225721367 - type: manhattan_pearson value: 9.741426548627869 - type: manhattan_spearman value: 16.976476678309815 task: type: STS - dataset: config: pl-en name: MTEB STS22 (pl-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 44.8697112464095 - type: cosine_spearman value: 42.075721562892944 - type: euclidean_pearson value: 43.40637455102888 - type: euclidean_spearman value: 42.075721562892944 - type: main_score value: 42.075721562892944 - type: manhattan_pearson value: 45.13522626066653 - type: manhattan_spearman value: 42.53935152687679 task: type: STS - dataset: config: ru name: MTEB STS22 (ru) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 51.4108131114559 - type: cosine_spearman value: 60.05716921675363 - type: euclidean_pearson value: 52.595208834301246 - type: euclidean_spearman value: 60.05157835366835 - type: main_score value: 60.05716921675363 - type: manhattan_pearson value: 52.49640999228367 - type: manhattan_spearman value: 59.89412865698913 task: type: STS - dataset: config: fr name: MTEB STS22 (fr) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 26.610436064600535 - type: cosine_spearman value: 42.00247648193326 - type: euclidean_pearson value: 33.894760545223065 - type: euclidean_spearman value: 42.00247648193326 - type: main_score value: 42.00247648193326 - type: manhattan_pearson value: 33.80795212984925 - type: manhattan_spearman value: 42.14922985413102 task: type: STS - dataset: config: de 
name: MTEB STS22 (de) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: -5.737945045891398 - type: cosine_spearman value: 8.163885149544491 - type: euclidean_pearson value: -2.214478704390943 - type: euclidean_spearman value: 8.16472976205313 - type: main_score value: 8.163885149544491 - type: manhattan_pearson value: -1.7539096573944195 - type: manhattan_spearman value: 8.6906872178124 task: type: STS - dataset: config: tr name: MTEB STS22 (tr) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 2.043942714330888 - type: cosine_spearman value: 15.459553758272923 - type: euclidean_pearson value: 8.816942314411607 - type: euclidean_spearman value: 15.459553758272923 - type: main_score value: 15.459553758272923 - type: manhattan_pearson value: 9.32963790399984 - type: manhattan_spearman value: 15.7857074615967 task: type: STS - dataset: config: de-en name: MTEB STS22 (de-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 17.695301514955418 - type: cosine_spearman value: 21.545599222945675 - type: euclidean_pearson value: 18.353827841283753 - type: euclidean_spearman value: 21.545599222945675 - type: main_score value: 21.545599222945675 - type: manhattan_pearson value: 17.009036963688505 - type: manhattan_spearman value: 20.508582325360287 task: type: STS - dataset: config: it name: MTEB STS22 (it) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 32.630588839696415 - type: cosine_spearman value: 39.69250140711604 - type: euclidean_pearson value: 37.54122176804933 - type: euclidean_spearman value: 39.69250140711604 - type: main_score value: 39.69250140711604 - type: manhattan_pearson value: 37.79703600372667 - type: manhattan_spearman value: 39.742229485575024 task: type: STS - dataset: config: pl name: MTEB STS22 (pl) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 0.3113685198259237 - type: cosine_spearman value: 9.707385637292596 - type: euclidean_pearson value: -2.4832855952463206 - type: euclidean_spearman value: 9.80177503118972 - type: main_score value: 9.707385637292596 - type: manhattan_pearson value: -2.325293004138977 - type: manhattan_spearman value: 10.060452403624826 task: type: STS - dataset: config: fr-pl name: MTEB STS22 (fr-pl) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 47.546556133158575 - type: cosine_spearman value: 39.440531887330785 - type: euclidean_pearson value: 48.2920143634797 - type: euclidean_spearman value: 39.440531887330785 - type: main_score value: 39.440531887330785 - type: manhattan_pearson value: 45.769523538925824 - type: manhattan_spearman value: 50.709255283710995 task: type: STS - dataset: config: de-pl name: MTEB STS22 (de-pl) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 0.33007020080694816 - type: cosine_spearman value: 25.52831180119127 - type: euclidean_pearson value: 5.7124033000823164 - type: euclidean_spearman value: 25.52831180119127 - type: main_score value: 25.52831180119127 - type: manhattan_pearson value: 5.62314566860622 - type: 
manhattan_spearman value: 23.83463610871175 task: type: STS - dataset: config: ar name: MTEB STS22 (ar) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 22.766025640460693 - type: cosine_spearman value: 27.950069575571522 - type: euclidean_pearson value: 26.551723755491363 - type: euclidean_spearman value: 27.939678639817668 - type: main_score value: 27.950069575571522 - type: manhattan_pearson value: 26.681060475093854 - type: manhattan_spearman value: 27.986878582632468 task: type: STS - dataset: config: es-en name: MTEB STS22 (es-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 38.597910358452815 - type: cosine_spearman value: 42.766194189894094 - type: euclidean_pearson value: 39.991306255692045 - type: euclidean_spearman value: 42.766194189894094 - type: main_score value: 42.766194189894094 - type: manhattan_pearson value: 39.74918349185897 - type: manhattan_spearman value: 42.574140880355976 task: type: STS - dataset: config: es-it name: MTEB STS22 (es-it) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 31.245905627830638 - type: cosine_spearman value: 32.83240215980029 - type: euclidean_pearson value: 33.06481984956772 - type: euclidean_spearman value: 32.83240215980029 - type: main_score value: 32.83240215980029 - type: manhattan_pearson value: 32.75706899386791 - type: manhattan_spearman value: 32.334081823391806 task: type: STS - dataset: config: es name: MTEB STS22 (es) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 16.966347433363 - type: cosine_spearman value: 45.3129129914676 - type: euclidean_pearson value: 28.50940505249936 - type: euclidean_spearman value: 45.3129129914676 - type: main_score value: 45.3129129914676 - type: manhattan_pearson value: 28.314847203862147 - type: manhattan_spearman value: 45.72042962859271 task: type: STS - dataset: config: zh-en name: MTEB STS22 (zh-en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 34.66358594216254 - type: cosine_spearman value: 31.24659955360722 - type: euclidean_pearson value: 34.878197534840744 - type: euclidean_spearman value: 31.24659955360722 - type: main_score value: 31.24659955360722 - type: manhattan_pearson value: 34.70743093532992 - type: manhattan_spearman value: 30.441251812127955 task: type: STS - dataset: config: en name: MTEB STS22 (en) revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 split: test type: mteb/sts22-crosslingual-sts metrics: - type: cosine_pearson value: 41.376318618780324 - type: cosine_spearman value: 47.061970299820764 - type: euclidean_pearson value: 44.89590651276241 - type: euclidean_spearman value: 47.061970299820764 - type: main_score value: 47.061970299820764 - type: manhattan_pearson value: 44.780089700405576 - type: manhattan_spearman value: 46.742447019531525 task: type: STS - dataset: config: default name: MTEB SensitiveTopicsClassification (default) revision: 416b34a802308eac30e4192afc0ff99bb8dcc7f2 split: test type: ai-forever/sensitive-topics-classification metrics: - type: accuracy value: 24.443359375 - type: f1 value: 21.903258801323084 - type: lrap value: 36.34758843315896 - type: main_score value: 24.443359375 task: type: 
MultilabelClassification - dataset: config: default name: MTEB StackExchangeClustering (default) revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 split: test type: mteb/stackexchange-clustering metrics: - type: main_score value: 33.50613168016603 - type: v_measure value: 33.50613168016603 - type: v_measure_std value: 3.91782276122889 task: type: Clustering - dataset: config: default name: MTEB StackExchangeClusteringP2P (default) revision: 815ca46b2622cec33ccafc3735d572c266efdb44 split: test type: mteb/stackexchange-clustering-p2p metrics: - type: main_score value: 27.98150942889309 - type: v_measure value: 27.98150942889309 - type: v_measure_std value: 2.0056109104136226 task: type: Clustering - dataset: config: default name: MTEB TERRa (default) revision: 7b58f24536063837d644aab9a023c62199b2a612 split: dev type: ai-forever/terra-pairclassification metrics: - type: cosine_accuracy value: 59.60912052117264 - type: cosine_accuracy_threshold value: 81.55556917190552 - type: cosine_ap value: 56.08760299515377 - type: cosine_f1 value: 67.33167082294264 - type: cosine_f1_threshold value: 78.14505100250244 - type: cosine_precision value: 54.43548387096774 - type: cosine_recall value: 88.23529411764706 - type: dot_accuracy value: 59.60912052117264 - type: dot_accuracy_threshold value: 81.55556917190552 - type: dot_ap value: 56.08760299515377 - type: dot_f1 value: 67.33167082294264 - type: dot_f1_threshold value: 78.14503908157349 - type: dot_precision value: 54.43548387096774 - type: dot_recall value: 88.23529411764706 - type: euclidean_accuracy value: 59.60912052117264 - type: euclidean_accuracy_threshold value: 60.736143589019775 - type: euclidean_ap value: 56.08760299515377 - type: euclidean_f1 value: 67.33167082294264 - type: euclidean_f1_threshold value: 66.11342430114746 - type: euclidean_precision value: 54.43548387096774 - type: euclidean_recall value: 88.23529411764706 - type: main_score value: 56.265447472512676 - type: manhattan_accuracy value: 60.91205211726385 - type: manhattan_accuracy_threshold value: 877.9421806335449 - type: manhattan_ap value: 56.265447472512676 - type: manhattan_f1 value: 67.16791979949875 - type: manhattan_f1_threshold value: 930.9440612792969 - type: manhattan_precision value: 54.47154471544715 - type: manhattan_recall value: 87.58169934640523 - type: max_ap value: 56.265447472512676 - type: max_f1 value: 67.33167082294264 - type: max_precision value: 54.47154471544715 - type: max_recall value: 88.23529411764706 - type: similarity_accuracy value: 59.60912052117264 - type: similarity_accuracy_threshold value: 81.55557513237 - type: similarity_ap value: 56.08760299515377 - type: similarity_f1 value: 67.33167082294264 - type: similarity_f1_threshold value: 78.1450629234314 - type: similarity_precision value: 54.43548387096774 - type: similarity_recall value: 88.23529411764706 task: type: PairClassification - dataset: config: default name: MTEB TelemarketingSalesRuleLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 51.06382978723404 - type: ap value: 64.12529550827422 - type: ap_weighted value: 64.12529550827422 - type: f1 value: 48.74348032242769 - type: f1_weighted value: 46.65516580410197 - type: main_score value: 51.06382978723404 task: type: Classification - dataset: config: default name: MTEB TextualismToolDictionariesLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: 
accuracy value: 69.1588785046729 - type: ap value: 13.91484942886812 - type: ap_weighted value: 13.91484942886812 - type: f1 value: 53.57001972386588 - type: f1_weighted value: 75.94757507050821 - type: main_score value: 69.1588785046729 task: type: Classification - dataset: config: default name: MTEB TextualismToolPlainLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 52.121212121212125 - type: ap value: 44.68029172320217 - type: ap_weighted value: 44.68029172320217 - type: f1 value: 50.48433048433048 - type: f1_weighted value: 48.79288612621945 - type: main_score value: 52.121212121212125 task: type: Classification - dataset: config: default name: MTEB ToxicChatClassification (default) revision: 3e0319203c7162b9c9f8015b594441f979c199bc split: test type: lmsys/toxic-chat metrics: - type: accuracy value: 73.56529209621992 - type: ap value: 21.641229801673067 - type: ap_weighted value: 21.641229801673067 - type: f1 value: 60.19489676894062 - type: f1_weighted value: 77.21280694246968 - type: main_score value: 73.56529209621992 task: type: Classification - dataset: config: default name: MTEB ToxicConversationsClassification (default) revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de split: test type: mteb/toxic_conversations_50k metrics: - type: accuracy value: 57.7734375 - type: ap value: 9.305482173252097 - type: ap_weighted value: 9.305482173252097 - type: f1 value: 44.43839832998249 - type: f1_weighted value: 67.10615100631958 - type: main_score value: 57.7734375 task: type: Classification - dataset: config: default name: MTEB TweetSentimentExtractionClassification (default) revision: d604517c81ca91fe16a244d1248fc021f9ecee7a split: test type: mteb/tweet_sentiment_extraction metrics: - type: accuracy value: 55.29994340690435 - type: f1 value: 55.3098112653406 - type: f1_weighted value: 54.4846442708958 - type: main_score value: 55.29994340690435 task: type: Classification - dataset: config: default name: MTEB TweetTopicSingleClassification (default) revision: 87b7a0d1c402dbb481db649569c556d9aa27ac05 split: test_2021 type: cardiffnlp/tweet_topic_single metrics: - type: accuracy value: 52.522150029533364 - type: f1 value: 40.24714634897976 - type: f1_weighted value: 57.39523757985323 - type: main_score value: 52.522150029533364 task: type: Classification - dataset: config: default name: MTEB TwentyNewsgroupsClustering (default) revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 split: test type: mteb/twentynewsgroups-clustering metrics: - type: main_score value: 19.90344454285597 - type: v_measure value: 19.90344454285597 - type: v_measure_std value: 1.8260774855268984 task: type: Clustering - dataset: config: default name: MTEB UCCVCommonLawLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 52.127659574468076 - type: ap value: 42.829212190914326 - type: ap_weighted value: 42.829212190914326 - type: f1 value: 50.50895050895051 - type: f1_weighted value: 51.84200503349441 - type: main_score value: 52.127659574468076 task: type: Classification - dataset: config: default name: MTEB UnfairTOSLegalBenchClassification (default) revision: 12ca3b695563788fead87a982ad1a068284413f4 split: test type: nguha/legalbench metrics: - type: accuracy value: 19.3359375 - type: f1 value: 11.24236763925133 - type: f1_weighted value: 27.137659267661597 - type: main_score value: 19.3359375 task: type: 
Classification - dataset: config: default name: MTEB VieMedEVBitextMining (default) revision: d03c69413bc53d1cea5a5375b3a953c4fee35ecd split: test type: nhuvo/MedEV metrics: - type: accuracy value: 8.69140625 - type: f1 value: 7.772120924359041 - type: main_score value: 7.772120924359041 - type: precision value: 7.525730353438241 - type: recall value: 8.69140625 task: type: BitextMining - dataset: config: default name: MTEB WikiCitiesClustering (default) revision: ddc9ee9242fa65332597f70e967ecc38b9d734fa split: test type: jinaai/cities_wiki_clustering metrics: - type: main_score value: 56.66855146861069 - type: v_measure value: 56.66855146861069 - type: v_measure_std value: 0.0 task: type: Clustering - dataset: config: default name: MTEB YahooAnswersTopicsClassification (default) revision: 78fccffa043240c80e17a6b1da724f5a1057e8e5 split: test type: community-datasets/yahoo_answers_topics metrics: - type: accuracy value: 41.787109375 - type: f1 value: 40.33967050694529 - type: f1_weighted value: 40.3509380795682 - type: main_score value: 41.787109375 task: type: Classification - dataset: config: default name: MTEB YelpReviewFullClassification (default) revision: c1f9ee939b7d05667af864ee1cb066393154bf85 split: test type: Yelp/yelp_review_full metrics: - type: accuracy value: 43.5888671875 - type: f1 value: 42.36578282497966 - type: f1_weighted value: 42.363220099893724 - type: main_score value: 43.5888671875 task: type: Classification --- A fast BERT model for computing sentence embeddings in Russian. The model is based on cointegrated/rubert-tiny2 and has the same context size (2048), embedding dimension (312), and speed. ## Usage ## Metrics Model scores on the encodechka benchmark: | model | CPU | GPU | size | Mean S | Mean S+W | dim | |:-----------------------------------|----------:|---------:|---------:|----------:|-----------:|-------:| | sergeyzh/LaBSE-ru-turbo | 120.40 | 8.05 | 490 | 0.789 | 0.702 | 768 | | BAAI/bge-m3 | 523.40 | 22.50 | 2166 | 0.787 | 0.696 | 1024 | | intfloat/multilingual-e5-large | 506.80 | 30.80 | 2136 | 0.780 | 0.686 | 1024 | | intfloat/multilingual-e5-base | 130.61 | 14.39 | 1061 | 0.761 | 0.669 | 768 | | **sergeyzh/rubert-tiny-turbo** | 5.51 | 3.25 | 111 | 0.749 | 0.667 | 312 | | intfloat/multilingual-e5-small | 40.86 | 12.09 | 449 | 0.742 | 0.645 | 384 | | cointegrated/rubert-tiny2 | 5.51 | 3.25 | 111 | 0.704 | 0.638 | 312 | | model | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 | |:-----------------------------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------| | sergeyzh/LaBSE-ru-turbo | 0.864 | 0.748 | 0.490 | 0.814 | 0.974 | 0.806 | 0.815 | 0.801 | 0.305 | 0.404 | | BAAI/bge-m3 | 0.864 | 0.749 | 0.510 | 0.819 | 0.973 | 0.792 | 0.809 | 0.783 | 0.240 | 0.422 | | intfloat/multilingual-e5-large | 0.862 | 0.727 | 0.473 | 0.810 | 0.979 | 0.798 | 0.819 | 0.773 | 0.224 | 0.374 | | intfloat/multilingual-e5-base | 0.835 | 0.704 | 0.459 | 0.796 | 0.964 | 0.783 | 0.802 | 0.738 | 0.235 | 0.376 | | **sergeyzh/rubert-tiny-turbo** | 0.828 | 0.722 | 0.476 | 0.787 | 0.955 | 0.757 | 0.780 | 0.685 | 0.305 | 0.373 | | intfloat/multilingual-e5-small | 0.822 | 0.714 | 0.457 | 0.758 | 0.957 | 0.761 | 0.779 | 0.691 | 0.234 | 0.275 | | cointegrated/rubert-tiny2 | 0.750 | 0.651 | 0.417 | 0.737 | 0.937 | 0.746 | 0.757 | 0.638 | 0.360 | 0.386 | Model scores on the ruMTEB benchmark: |Model Name | Metric | sbert_large_ mt_nlu_ru | sbert_large_ nlu_ru | rubert-tiny2 | rubert-tiny-turbo |
multilingual-e5-small | multilingual-e5-base | multilingual-e5-large | |:----------------------------------|:--------------------|-----------------------:|--------------------:|----------------:|------------------:|----------------------:|---------------------:|----------------------:| |CEDRClassification | Accuracy | 0.368 | 0.358 | 0.369 | 0.390 | 0.401 | 0.423 | **0.448** | |GeoreviewClassification | Accuracy | 0.397 | 0.400 | 0.396 | 0.414 | 0.447 | 0.461 | **0.497** | |GeoreviewClusteringP2P | V-measure | 0.584 | 0.590 | 0.442 | 0.597 | 0.586 | 0.545 | **0.605** | |HeadlineClassification | Accuracy | 0.772 | **0.793** | 0.742 | 0.686 | 0.732 | 0.757 | 0.758 | |InappropriatenessClassification | Accuracy | **0.646** | 0.625 | 0.586 | 0.591 | 0.592 | 0.588 | 0.616 | |KinopoiskClassification | Accuracy | 0.503 | 0.495 | 0.491 | 0.505 | 0.500 | 0.509 | **0.566** | |RiaNewsRetrieval | NDCG@10 | 0.214 | 0.111 | 0.140 | 0.513 | 0.700 | 0.702 | **0.807** | |RuBQReranking | MAP@10 | 0.561 | 0.468 | 0.461 | 0.622 | 0.715 | 0.720 | **0.756** | |RuBQRetrieval | NDCG@10 | 0.298 | 0.124 | 0.109 | 0.517 | 0.685 | 0.696 | **0.741** | |RuReviewsClassification | Accuracy | 0.589 | 0.583 | 0.570 | 0.607 | 0.612 | 0.630 | **0.653** | |RuSTSBenchmarkSTS | Pearson correlation | 0.712 | 0.588 | 0.694 | 0.787 | 0.781 | 0.796 | **0.831** | |RuSciBenchGRNTIClassification | Accuracy | 0.542 | 0.539 | 0.456 | 0.529 | 0.550 | 0.563 | **0.582** | |RuSciBenchGRNTIClusteringP2P | V-measure | **0.522** | 0.504 | 0.414 | 0.481 | 0.511 | 0.516 | 0.520 | |RuSciBenchOECDClassification | Accuracy | 0.438 | 0.430 | 0.355 | 0.415 | 0.427 | 0.423 | **0.445** | |RuSciBenchOECDClusteringP2P | V-measure | **0.473** | 0.464 | 0.381 | 0.411 | 0.443 | 0.448 | 0.450 | |SensitiveTopicsClassification | Accuracy | **0.285** | 0.280 | 0.220 | 0.244 | 0.228 | 0.234 | 0.257 | |TERRaClassification | Average Precision | 0.520 | 0.502 | 0.519 | 0.563 | 0.551 | 0.550 | **0.584** | |Model Name | Metric | sbert_large_ mt_nlu_ru | sbert_large_ nlu_ru | rubert-tiny2 | rubert-tiny-turbo | multilingual-e5-small | multilingual-e5-base | multilingual-e5-large | |:----------------------------------|:--------------------|-----------------------:|--------------------:|----------------:|------------------:|----------------------:|----------------------:|---------------------:| |Classification | Accuracy | 0.554 | 0.552 | 0.514 | 0.535 | 0.551 | 0.561 | **0.588** | |Clustering | V-measure | **0.526** | 0.519 | 0.412 | 0.496 | 0.513 | 0.503 | 0.525 | |MultiLabelClassification | Accuracy | 0.326 | 0.319 | 0.294 | 0.317 | 0.314 | 0.329 | **0.353** | |PairClassification | Average Precision | 0.520 | 0.502 | 0.519 | 0.563 | 0.551 | 0.550 | **0.584** | |Reranking | MAP@10 | 0.561 | 0.468 | 0.461 | 0.622 | 0.715 | 0.720 | **0.756** | |Retrieval | NDCG@10 | 0.256 | 0.118 | 0.124 | 0.515 | 0.697 | 0.699 | **0.774** | |STS | Pearson correlation | 0.712 | 0.588 | 0.694 | 0.787 | 0.781 | 0.796 | **0.831** | |Average | Average | 0.494 | 0.438 | 0.431 | 0.548 | 0.588 | 0.594 | **0.630** |" +} \ No newline at end of file diff --git a/data/model_data_json/seyonec_ChemBERTa-zinc-base-v1.json b/data/model_data_json/seyonec_ChemBERTa-zinc-base-v1.json new file mode 100644 index 0000000000000000000000000000000000000000..5419b179b346fbd40370dfcbbb15c3d17ddaf9fd --- /dev/null +++ b/data/model_data_json/seyonec_ChemBERTa-zinc-base-v1.json @@ -0,0 +1,17 @@ +{ + "model_id": "seyonec/ChemBERTa-zinc-base-v1", + "downloads": 410003, + "tags": [ + "transformers", + "pytorch", + 
"jax", + "roberta", + "fill-mask", + "chemistry", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - chemistry --- # ChemBERTa: Training a BERT-like transformer model for masked language modelling of chemical SMILES strings. Deep learning for chemistry and materials science remains a novel field with lots of potiential. However, the popularity of transfer learning based methods in areas such as NLP and computer vision have not yet been effectively developed in computational chemistry + machine learning. Using HuggingFace's suite of models and the ByteLevel tokenizer, we are able to train on a large corpus of 100k SMILES strings from a commonly known benchmark dataset, ZINC. Training RoBERTa over 5 epochs, the model achieves a decent loss of 0.398, but may likely continue to decline if trained for a larger number of epochs. The model can predict tokens within a SMILES sequence/molecule, allowing for variants of a molecule within discoverable chemical space to be predicted. By applying the representations of functional groups and atoms learned by the model, we can try to tackle problems of toxicity, solubility, drug-likeness, and synthesis accessibility on smaller datasets using the learned representations as features for graph convolution and attention models on the graph structure of molecules, as well as fine-tuning of BERT. Finally, we propose the use of attention visualization as a helpful tool for chemistry practitioners and students to quickly identify important substructures in various chemical properties. Additionally, visualization of the attention mechanism have been seen through previous research as incredibly valuable towards chemical reaction classification. The applications of open-sourcing large-scale transformer models such as RoBERTa with HuggingFace may allow for the acceleration of these individual research directions. A link to a repository which includes the training, uploading and evaluation notebook (with sample predictions on compounds such as Remdesivir) can be found here. All of the notebooks can be copied into a new Colab runtime for easy execution. Thanks for checking this out! - Seyone", + "model_explanation_gemini": "A BERT-like transformer model trained for masked language modeling of chemical SMILES strings to predict molecular variants and assist in tasks like toxicity, solubility, and drug-likeness analysis." +} \ No newline at end of file diff --git a/data/model_data_json/shi-labs_oneformer_ade20k_swin_large.json b/data/model_data_json/shi-labs_oneformer_ade20k_swin_large.json new file mode 100644 index 0000000000000000000000000000000000000000..e55cc2e1d4f093ff51293da81b44c6f13e409f89 --- /dev/null +++ b/data/model_data_json/shi-labs_oneformer_ade20k_swin_large.json @@ -0,0 +1,19 @@ +{ + "model_id": "shi-labs/oneformer_ade20k_swin_large", + "downloads": 149179, + "tags": [ + "transformers", + "pytorch", + "oneformer", + "vision", + "image-segmentation", + "universal-image-segmentation", + "dataset:scene_parse_150", + "arxiv:2211.06220", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - vision - image-segmentation - universal-image-segmentation datasets: - scene_parse_150 widget: - src: example_title: House - src: example_title: Airplane - src: example_title: Person --- # OneFormer OneFormer model trained on the ADE20k dataset (large-sized version, Swin backbone). 
It was introduced in the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jain et al. and first released in this repository. ## Model description OneFormer is the first multi-task universal image segmentation framework. It needs to be trained only once with a single universal architecture, a single model, and on a single dataset, to outperform existing specialized models across semantic, instance, and panoptic segmentation tasks. OneFormer uses a task token to condition the model on the task in focus, making the architecture task-guided for training, and task-dynamic for inference, all with a single model. ## Intended uses & limitations You can use this particular checkpoint for semantic, instance and panoptic segmentation. See the model hub to look for other fine-tuned versions on a different dataset. ### How to use Here is how to use this model: For more examples, please refer to the documentation. ### Citation", + "model_explanation_gemini": "Performs universal image segmentation (semantic, instance, and panoptic) using a single task-conditioned model trained on ADE20k with a Swin backbone." +} \ No newline at end of file diff --git a/data/model_data_json/shi-labs_oneformer_cityscapes_dinat_large.json b/data/model_data_json/shi-labs_oneformer_cityscapes_dinat_large.json new file mode 100644 index 0000000000000000000000000000000000000000..37eefd7345536ab4bac6d2c9896f4c6faefb3611 --- /dev/null +++ b/data/model_data_json/shi-labs_oneformer_cityscapes_dinat_large.json @@ -0,0 +1,17 @@ +{ + "model_id": "shi-labs/oneformer_cityscapes_dinat_large", + "downloads": 75688, + "tags": [ + "transformers", + "pytorch", + "oneformer", + "vision", + "image-segmentation", + "dataset:huggan/cityscapes", + "arxiv:2211.06220", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - vision - image-segmentation datasets: - huggan/cityscapes widget: - src: example_title: Cityscapes --- # OneFormer OneFormer model trained on the Cityscapes dataset (large-sized version, Dinat backbone). It was introduced in the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jain et al. and first released in this repository. ## Model description OneFormer is the first multi-task universal image segmentation framework. It needs to be trained only once with a single universal architecture, a single model, and on a single dataset, to outperform existing specialized models across semantic, instance, and panoptic segmentation tasks. OneFormer uses a task token to condition the model on the task in focus, making the architecture task-guided for training, and task-dynamic for inference, all with a single model. ## Intended uses & limitations You can use this particular checkpoint for semantic, instance and panoptic segmentation. See the model hub to look for other fine-tuned versions on a different dataset. ### How to use Here is how to use this model: For more examples, please refer to the documentation.
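A hedged sketch of the task-token-conditioned inference the card describes, assuming the transformers OneFormer classes (the Dinat backbone additionally needs the natten package, and street.jpg is a hypothetical local image):

```python
# Hedged sketch of OneFormer inference; "street.jpg" is a hypothetical file.
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

checkpoint = "shi-labs/oneformer_cityscapes_dinat_large"
processor = OneFormerProcessor.from_pretrained(checkpoint)
model = OneFormerForUniversalSegmentation.from_pretrained(checkpoint)

image = Image.open("street.jpg")
# The task token ("semantic", "instance", or "panoptic") selects the task at inference time.
inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
outputs = model(**inputs)

# One (height, width) map of class ids per input image.
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
```

Swapping the task token for "instance" or "panoptic" (together with the matching post-processing call) reuses the same weights for the other two tasks.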
### Citation" +} \ No newline at end of file diff --git a/data/model_data_json/shi-labs_oneformer_coco_swin_large.json b/data/model_data_json/shi-labs_oneformer_coco_swin_large.json new file mode 100644 index 0000000000000000000000000000000000000000..05f4c55c70e9be0749c7bee971a5238b48701ed3 --- /dev/null +++ b/data/model_data_json/shi-labs_oneformer_coco_swin_large.json @@ -0,0 +1,18 @@ +{ + "model_id": "shi-labs/oneformer_coco_swin_large", + "downloads": 167413, + "tags": [ + "transformers", + "pytorch", + "oneformer", + "vision", + "image-segmentation", + "dataset:ydshieh/coco_dataset_script", + "arxiv:2211.06220", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - vision - image-segmentation datasets: - ydshieh/coco_dataset_script widget: - src: example_title: Person - src: example_title: Airplane - src: example_title: Corgi --- # OneFormer OneFormer model trained on the COCO dataset (large-sized version, Swin backbone). It was introduced in the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jain et al. and first released in this repository. !model image ## Model description OneFormer is the first multi-task universal image segmentation framework. It needs to be trained only once with a single universal architecture, a single model, and on a single dataset, to outperform existing specialized models across semantic, instance, and panoptic segmentation tasks. OneFormer uses a task token to condition the model on the task in focus, making the architecture task-guided for training, and task-dynamic for inference, all with a single model. !model image ## Intended uses & limitations You can use this particular checkpoint for semantic, instance and panoptic segmentation. See the model hub to look for other fine-tuned versions on a different dataset. ### How to use Here is how to use this model: For more examples, please refer to the documentation. ### Citation", + "model_explanation_gemini": "Performs universal image segmentation (semantic, instance, and panoptic) using a single task-conditioned model trained on COCO with a Swin backbone." +} \ No newline at end of file diff --git a/data/model_data_json/shibing624_text2vec-base-chinese.json b/data/model_data_json/shibing624_text2vec-base-chinese.json new file mode 100644 index 0000000000000000000000000000000000000000..ce7b43b0c0bf16c0c055ec96b0b3ff6da3ebaac2 --- /dev/null +++ b/data/model_data_json/shibing624_text2vec-base-chinese.json @@ -0,0 +1,22 @@ +{ + "model_id": "shibing624/text2vec-base-chinese", + "downloads": 544099, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "bert", + "Sentence Transformers", + "sentence-similarity", + "zh", + "dataset:shibing624/nli_zh", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: sentence-similarity tags: - Sentence Transformers - sentence-similarity - sentence-transformers datasets: - shibing624/nli_zh language: - zh library_name: sentence-transformers --- # shibing624/text2vec-base-chinese This is a CoSENT(Cosine Sentence) model: shibing624/text2vec-base-chinese. It maps sentences to a 768 dimensional dense vector space and can be used for tasks like sentence embeddings, text matching or semantic search. 
## Evaluation For an automated evaluation of this model, see the *Evaluation Benchmark*: text2vec - Chinese text matching task: | Arch | BaseModel | Model | ATEC | BQ | LCQMC | PAWSX | STS-B | SOHU-dd | SOHU-dc | Avg | QPS | |:-----------|:----------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------|:-----:|:-----:|:-----:|:-----:|:-----:|:-------:|:-------:|:---------:|:-----:| | Word2Vec | word2vec | w2v-light-tencent-chinese | 20.00 | 31.49 | 59.46 | 2.57 | 55.78 | 55.04 | 20.70 | 35.03 | 23769 | | SBERT | xlm-roberta-base | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 18.42 | 38.52 | 63.96 | 10.14 | 78.90 | 63.01 | 52.28 | 46.46 | 3138 | | Instructor | hfl/chinese-roberta-wwm-ext | moka-ai/m3e-base | 41.27 | 63.81 | 74.87 | 12.20 | 76.96 | 75.83 | 60.55 | 57.93 | 2980 | | CoSENT | hfl/chinese-macbert-base | shibing624/text2vec-base-chinese | 31.93 | 42.67 | 70.16 | 17.21 | 79.30 | 70.27 | 50.42 | 51.61 | 3008 | | CoSENT | hfl/chinese-lert-large | GanymedeNil/text2vec-large-chinese | 32.61 | 44.59 | 69.30 | 14.51 | 79.44 | 73.01 | 59.04 | 53.12 | 2092 | | CoSENT | nghuyong/ernie-3.0-base-zh | shibing624/text2vec-base-chinese-sentence | 43.37 | 61.43 | 73.48 | 38.90 | 78.25 | 70.60 | 53.08 | 59.87 | 3089 | | CoSENT | nghuyong/ernie-3.0-base-zh | shibing624/text2vec-base-chinese-paraphrase | 44.89 | 63.58 | 74.24 | 40.90 | 78.93 | 76.70 | 63.30 | 63.08 | 3066 | | CoSENT | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | shibing624/text2vec-base-multilingual | 32.39 | 50.33 | 65.64 | 32.56 | 74.45 | 68.88 | 51.17 | 53.67 | 4004 | Notes: - Evaluation metric: Spearman coefficient - The model was trained with the CoSENT method on the Chinese STS-B data and performs well on the Chinese STS-B test set; it can be retrained by running the examples/training_sup_text_matching_model.py code; the model files have been uploaded to the HF model hub; recommended for general-purpose Chinese semantic matching tasks - The model was trained with the CoSENT method on the hand-curated Chinese STS dataset shibing624/nli-zh-all/text2vec-base-chinese-sentence-dataset and performs well on the Chinese NLI test sets; it can be retrained by running the examples/training_sup_text_matching_model_jsonl_data.py code; the model files have been uploaded to the HF model hub; recommended for Chinese s2s (sentence vs. sentence) semantic matching tasks - The model was trained with the CoSENT method on the hand-curated Chinese STS dataset shibing624/nli-zh-all/text2vec-base-chinese-paraphrase-dataset, which adds s2p (sentence to paraphrase) data relative to shibing624/nli-zh-all/text2vec-base-chinese-sentence-dataset and strengthens long-text representation; it reaches SOTA on the Chinese NLI test sets; it can be retrained by running the examples/training_sup_text_matching_model_jsonl_data.py code; the model files have been uploaded to the HF model hub; recommended for Chinese s2p (sentence vs. paragraph) semantic matching tasks - The model was trained with SBERT and is the multilingual version of the model, supporting Chinese, English, and other languages - The Word2Vec model uses Tencent word vectors, loads on CPU, and suits Chinese literal matching tasks and cold-start situations with little data ## Usage (text2vec) Using this model becomes easy when you have text2vec installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without text2vec, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. Install transformers: Then load model and predict: ## Usage (sentence-transformers) sentence-transformers is a popular library to compute dense vector representations for sentences.
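A hedged sketch of that sentence-transformers route (the sentence pair is an arbitrary example):

```python
# Hedged sketch: encode and compare sentences with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("shibing624/text2vec-base-chinese")
sentences = ["如何更换花呗绑定银行卡", "花呗更改绑定银行卡"]  # arbitrary examples
embeddings = model.encode(sentences)

# Cosine similarity between the two sentence vectors.
print(util.cos_sim(embeddings[0], embeddings[1]))
```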
Install sentence-transformers: Then load model and predict: ## Model speed up | Model | ATEC | BQ | LCQMC | PAWSX | STSB | |------------------------------------------------------------------------------------------------------------------------------|-------------------|-------------------|------------------|------------------|------------------| | shibing624/text2vec-base-chinese (fp32, baseline) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 | | shibing624/text2vec-base-chinese (onnx-O4, #29) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 | | shibing624/text2vec-base-chinese (ov, #27) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 | | shibing624/text2vec-base-chinese (ov-qint8, #30) | 0.30778 (-3.60%) | 0.43474 (+1.88%) | 0.69620 (-0.77%) | 0.16662 (-3.20%) | 0.79396 (+0.13%) | In short: 1. ✅ shibing624/text2vec-base-chinese (onnx-O4), ONNX optimized to O4 does not reduce performance, but gives a ~2x speedup on GPU. 2. ✅ shibing624/text2vec-base-chinese (ov), OpenVINO does not reduce performance, but gives a 1.12x speedup on CPU. 3. 🟡 shibing624/text2vec-base-chinese (ov-qint8), int8 quantization with OV incurs a small performance hit on some tasks, and a tiny performance gain on others, when quantizing with Chinese STSB. Additionally, it results in a 4.78x speedup on CPU. - usage: shibing624/text2vec-base-chinese (onnx-O4), for GPU - usage: shibing624/text2vec-base-chinese (ov), for CPU - usage: shibing624/text2vec-base-chinese (ov-qint8), for CPU ## Full Model Architecture ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 256 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained model. Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the rank loss by comparing with true pairs and false pairs. #### Hyper parameters - training dataset: - max_seq_length: 128 - best epoch: 5 - sentence embedding dim: 768 ## Citing & Authors This model was trained by text2vec. If you find this model helpful, feel free to cite:", "model_explanation_gemini": "Generates 768-dimensional sentence embeddings for Chinese text to enable tasks like semantic search, text matching, and similarity comparisons."
+} \ No newline at end of file diff --git a/data/model_data_json/shibing624_text2vec-base-multilingual.json b/data/model_data_json/shibing624_text2vec-base-multilingual.json new file mode 100644 index 0000000000000000000000000000000000000000..32bbb1c98bcfcd132ab48860327f832ad0a89497 --- /dev/null +++ b/data/model_data_json/shibing624_text2vec-base-multilingual.json @@ -0,0 +1,34 @@ +{ + "model_id": "shibing624/text2vec-base-multilingual", + "downloads": 118745, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "text2vec", + "mteb", + "zh", + "en", + "de", + "fr", + "it", + "nl", + "pt", + "pl", + "ru", + "dataset:shibing624/nli-zh-all", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers - text2vec - mteb datasets: - shibing624/nli-zh-all language: - zh - en - de - fr - it - nl - pt - pl - ru metrics: - spearmanr model-index: - name: text2vec-base-multilingual results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 70.97014925373134 - type: ap value: 33.95151328318672 - type: f1 value: 65.14740155705596 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (de) config: de split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 68.69379014989293 - type: ap value: 79.68277579733802 - type: f1 value: 66.54960052336921 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en-ext) config: en-ext split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 70.90704647676162 - type: ap value: 20.747518928580437 - type: f1 value: 58.64365465884924 - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (ja) config: ja split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 61.605995717344754 - type: ap value: 14.135974879487028 - type: f1 value: 49.980224800472136 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 66.103375 - type: ap value: 61.10087197664471 - type: f1 value: 65.75198509894145 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 33.134 - type: f1 value: 32.7905397597083 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (de) config: de split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 33.388 - type: f1 value: 33.190561196873084 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (es) config: es split: test revision: 
1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 34.824 - type: f1 value: 34.297290157740726 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (fr) config: fr split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 33.449999999999996 - type: f1 value: 33.08017234412433 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (ja) config: ja split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 30.046 - type: f1 value: 29.857141661482228 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (zh) config: zh split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 32.522 - type: f1 value: 31.854699911472174 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 32.31918856561886 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 25.503481615956137 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 57.91471462820568 - type: mrr value: 71.82990370663501 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 68.83853315193127 - type: cos_sim_spearman value: 66.16174850417771 - type: euclidean_pearson value: 56.65313897263153 - type: euclidean_spearman value: 52.69156205876939 - type: manhattan_pearson value: 56.97282154658304 - type: manhattan_spearman value: 53.167476517261015 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 78.08441558441558 - type: f1 value: 77.99825264827898 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 28.98583420521256 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 23.195091778460892 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 43.35 - type: f1 value: 38.80269436557695 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 59.348 - type: ap value: 55.75065220262251 - type: f1 value: 58.72117519082607 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: 
accuracy value: 81.04879160966712 - type: f1 value: 80.86889779192701 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (de) config: de split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 78.59397013243168 - type: f1 value: 77.09902761555972 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (es) config: es split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 79.24282855236824 - type: f1 value: 78.75883867079015 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (fr) config: fr split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 76.16661446915127 - type: f1 value: 76.30204722831901 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (hi) config: hi split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 78.74506991753317 - type: f1 value: 77.50560442779701 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (th) config: th split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 77.67088607594937 - type: f1 value: 77.21442956887493 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 62.786137710898316 - type: f1 value: 46.23474201126368 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (de) config: de split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 55.285996055226825 - type: f1 value: 37.98039513682919 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (es) config: es split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 58.67911941294196 - type: f1 value: 40.541410807124954 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (fr) config: fr split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 53.257124960851854 - type: f1 value: 38.42982319259366 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (hi) config: hi split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 59.62352097525995 - type: f1 value: 41.28886486568534 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (th) config: th split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 58.799276672694404 - type: f1 value: 43.68379466247341 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (af) config: af split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 45.42030934767989 - type: f1 value: 44.12201543566376 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (am) config: am split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 37.67652992602556 - type: f1 value: 35.422091900843164 - task: type: Classification 
dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ar) config: ar split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 45.02353732347007 - type: f1 value: 41.852484084738194 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (az) config: az split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 48.70880968392737 - type: f1 value: 46.904360615435046 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (bn) config: bn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 43.78950907868191 - type: f1 value: 41.58872353920405 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (cy) config: cy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 28.759246805648957 - type: f1 value: 27.41182001374226 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (da) config: da split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.74176193678547 - type: f1 value: 53.82727354182497 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (de) config: de split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.55682582380632 - type: f1 value: 49.41963627941866 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (el) config: el split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.46940147948891 - type: f1 value: 55.28178711367465 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 63.83322125084063 - type: f1 value: 61.836172900845554 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (es) config: es split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.27505043712172 - type: f1 value: 57.642436374361154 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fa) config: fa split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.05178211163417 - type: f1 value: 56.858998820504056 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fi) config: fi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.357094821788834 - type: f1 value: 54.79711189260453 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (fr) config: fr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.79959650302623 - type: f1 value: 57.59158671719513 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (he) config: he split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.1768661735037 - type: f1 value: 
48.886397276270515 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hi) config: hi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.06455951580362 - type: f1 value: 55.01530952684585 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hu) config: hu split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.3591123066577 - type: f1 value: 55.9277783370191 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (hy) config: hy split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 52.108271687962336 - type: f1 value: 51.195023400664596 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (id) config: id split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.26832548755883 - type: f1 value: 56.60774065423401 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (is) config: is split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 35.806993947545394 - type: f1 value: 34.290418953173294 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (it) config: it split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.27841291190315 - type: f1 value: 56.9438998642419 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ja) config: ja split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.78009414929389 - type: f1 value: 59.15780842483667 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (jv) config: jv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 31.153328850033624 - type: f1 value: 30.11004596099605 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ka) config: ka split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 44.50235373234701 - type: f1 value: 44.040585262624745 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (km) config: km split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 40.99193006052455 - type: f1 value: 39.505480119272484 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (kn) config: kn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 46.95696032279758 - type: f1 value: 43.093638940785326 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ko) config: ko split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 54.73100201748486 - type: f1 value: 52.79750744404114 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (lv) config: lv split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy 
value: 54.865501008742434 - type: f1 value: 53.64798408964839 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ml) config: ml split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 47.891728312037664 - type: f1 value: 45.261229414636055 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (mn) config: mn split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 52.2259583053127 - type: f1 value: 50.5903419246987 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ms) config: ms split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 54.277067921990586 - type: f1 value: 52.472042479965886 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (my) config: my split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 51.95696032279757 - type: f1 value: 49.79330411854258 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nb) config: nb split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 54.63685272360457 - type: f1 value: 52.81267480650003 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (nl) config: nl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.451916610625425 - type: f1 value: 57.34790386645091 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pl) config: pl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.91055817081372 - type: f1 value: 56.39195048528157 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (pt) config: pt split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.84196368527236 - type: f1 value: 58.72244763127063 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ro) config: ro split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.04102219233354 - type: f1 value: 55.67040186148946 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ru) config: ru split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.01613987895091 - type: f1 value: 57.203949825484855 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sl) config: sl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.35843981170141 - type: f1 value: 54.18656338999773 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sq) config: sq split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.47948890383322 - type: f1 value: 54.772224557130954 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sv) config: sv split: test revision: 
31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 58.43981170141224 - type: f1 value: 56.09260971364242 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (sw) config: sw split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 33.9609952925353 - type: f1 value: 33.18853392353405 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ta) config: ta split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 44.29388029589778 - type: f1 value: 41.51986533284474 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (te) config: te split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 47.13517148621385 - type: f1 value: 43.94784138379624 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (th) config: th split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 56.856086079354405 - type: f1 value: 56.618177384748456 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tl) config: tl split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 35.35978480161398 - type: f1 value: 34.060680080365046 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (tr) config: tr split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 59.630127774041696 - type: f1 value: 57.46288652988266 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (ur) config: ur split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 52.7908540685945 - type: f1 value: 51.46934239116157 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (vi) config: vi split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 54.6469401479489 - type: f1 value: 53.9903066185816 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-CN) config: zh-CN split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 60.85743106926698 - type: f1 value: 59.31579548450755 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (zh-TW) config: zh-TW split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 57.46805648957633 - type: f1 value: 57.48469733657326 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (af) config: af split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 50.86415601882985 - type: f1 value: 49.41696672602645 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (am) config: am split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 41.183591123066584 - type: f1 value: 40.04563865770774 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB 
MassiveScenarioClassification (ar) config: ar split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 50.08069939475455 - type: f1 value: 50.724800165846126 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (az) config: az split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 51.287827841291204 - type: f1 value: 50.72873776739851 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (bn) config: bn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 46.53328850033624 - type: f1 value: 45.93317866639667 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (cy) config: cy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 34.347679892400805 - type: f1 value: 31.941581141280828 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (da) config: da split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.073301950235376 - type: f1 value: 62.228728940111054 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (de) config: de split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 56.398789509078675 - type: f1 value: 54.80778341609032 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (el) config: el split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.79892400806993 - type: f1 value: 60.69430756982446 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 66.96368527236046 - type: f1 value: 66.5893927997656 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (es) config: es split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.21250840618695 - type: f1 value: 62.347177794128925 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fa) config: fa split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.43779421654339 - type: f1 value: 61.307701312085605 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fi) config: fi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.09952925353059 - type: f1 value: 60.313907927386914 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (fr) config: fr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.38601210490922 - type: f1 value: 63.05968938353488 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (he) config: he split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 56.2878278412912 - type: f1 value: 
55.92927644838597 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hi) config: hi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.62878278412912 - type: f1 value: 60.25299253652635 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hu) config: hu split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.28850033624748 - type: f1 value: 62.77053246337031 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (hy) config: hy split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 54.875588433086754 - type: f1 value: 54.30717357279134 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (id) config: id split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.99394754539341 - type: f1 value: 61.73085530883037 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (is) config: is split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 38.581035642232685 - type: f1 value: 36.96287269695893 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (it) config: it split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.350369872225976 - type: f1 value: 61.807327324823966 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ja) config: ja split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.17148621385338 - type: f1 value: 65.29620144656751 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (jv) config: jv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 36.12642905178212 - type: f1 value: 35.334393048479484 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ka) config: ka split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 50.26899798251513 - type: f1 value: 49.041065960139434 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (km) config: km split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 44.24344317417619 - type: f1 value: 42.42177854872125 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (kn) config: kn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 47.370544720914594 - type: f1 value: 46.589722581465324 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ko) config: ko split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 58.89038332212508 - type: f1 value: 57.753607921990394 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (lv) config: lv split: test revision: 
7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 56.506388702084756 - type: f1 value: 56.0485860423295 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ml) config: ml split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 50.06388702084734 - type: f1 value: 50.109364641824584 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (mn) config: mn split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 55.053799596503026 - type: f1 value: 54.490665705666686 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ms) config: ms split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 59.77135171486213 - type: f1 value: 58.2808650158803 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (my) config: my split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 55.71620712844654 - type: f1 value: 53.863034882475304 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nb) config: nb split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.26227303295225 - type: f1 value: 59.86604657147016 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (nl) config: nl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.3759246805649 - type: f1 value: 62.45257339288533 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pl) config: pl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.552118359112306 - type: f1 value: 61.354449605776765 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (pt) config: pt split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.40753194351043 - type: f1 value: 61.98779889528889 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ro) config: ro split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 60.68258238063214 - type: f1 value: 60.59973978976571 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ru) config: ru split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.31002017484868 - type: f1 value: 62.412312268503655 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sl) config: sl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 61.429051782111635 - type: f1 value: 61.60095590401424 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sq) config: sq split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 62.229320780094156 - type: f1 value: 61.02251426747547 - task: type: Classification dataset: type: 
mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sv) config: sv split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.42501681237391 - type: f1 value: 63.461494430605235 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (sw) config: sw split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 38.51714862138534 - type: f1 value: 37.12466722986362 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ta) config: ta split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 46.99731002017485 - type: f1 value: 45.859147049984834 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (te) config: te split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 51.01882985877605 - type: f1 value: 49.01040173136056 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (th) config: th split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.234700739744454 - type: f1 value: 62.732294595214746 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tl) config: tl split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 38.72225958305312 - type: f1 value: 36.603231928120906 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (tr) config: tr split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 64.48554135843982 - type: f1 value: 63.97380562022752 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (ur) config: ur split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 56.7955615332885 - type: f1 value: 55.95308241204802 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (vi) config: vi split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 57.06455951580362 - type: f1 value: 56.95570494066693 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-CN) config: zh-CN split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 65.8338937457969 - type: f1 value: 65.6778746906008 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (zh-TW) config: zh-TW split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 63.369199731002034 - type: f1 value: 63.527650116059945 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 29.442504112215538 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 26.16062814161053 - task: type: Retrieval dataset: type: 
quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 65.319 - type: map_at_10 value: 78.72 - type: map_at_100 value: 79.44600000000001 - type: map_at_1000 value: 79.469 - type: map_at_3 value: 75.693 - type: map_at_5 value: 77.537 - type: mrr_at_1 value: 75.24 - type: mrr_at_10 value: 82.304 - type: mrr_at_100 value: 82.485 - type: mrr_at_1000 value: 82.489 - type: mrr_at_3 value: 81.002 - type: mrr_at_5 value: 81.817 - type: ndcg_at_1 value: 75.26 - type: ndcg_at_10 value: 83.07 - type: ndcg_at_100 value: 84.829 - type: ndcg_at_1000 value: 85.087 - type: ndcg_at_3 value: 79.67699999999999 - type: ndcg_at_5 value: 81.42 - type: precision_at_1 value: 75.26 - type: precision_at_10 value: 12.697 - type: precision_at_100 value: 1.4829999999999999 - type: precision_at_1000 value: 0.154 - type: precision_at_3 value: 34.849999999999994 - type: precision_at_5 value: 23.054 - type: recall_at_1 value: 65.319 - type: recall_at_10 value: 91.551 - type: recall_at_100 value: 98.053 - type: recall_at_1000 value: 99.516 - type: recall_at_3 value: 81.819 - type: recall_at_5 value: 86.66199999999999 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 31.249791587189996 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 43.302922383029816 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.80670811345861 - type: cos_sim_spearman value: 79.97373018384307 - type: euclidean_pearson value: 83.40205934125837 - type: euclidean_spearman value: 79.73331008251854 - type: manhattan_pearson value: 83.3320983393412 - type: manhattan_spearman value: 79.677919746045 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 86.3816087627948 - type: cos_sim_spearman value: 80.91314664846955 - type: euclidean_pearson value: 85.10603071031096 - type: euclidean_spearman value: 79.42663939501841 - type: manhattan_pearson value: 85.16096376014066 - type: manhattan_spearman value: 79.51936545543191 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 80.44665329940209 - type: cos_sim_spearman value: 82.86479010707745 - type: euclidean_pearson value: 84.06719627734672 - type: euclidean_spearman value: 84.9356099976297 - type: manhattan_pearson value: 84.10370009572624 - type: manhattan_spearman value: 84.96828040546536 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 86.05704260568437 - type: cos_sim_spearman value: 87.36399473803172 - type: euclidean_pearson value: 86.8895170159388 - type: euclidean_spearman value: 87.16246440866921 - type: manhattan_pearson value: 86.80814774538997 - type: manhattan_spearman value: 87.09320142699522 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: 
ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 85.97825118945852 - type: cos_sim_spearman value: 88.31438033558268 - type: euclidean_pearson value: 87.05174694758092 - type: euclidean_spearman value: 87.80659468392355 - type: manhattan_pearson value: 86.98831322198717 - type: manhattan_spearman value: 87.72820615049285 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 78.68745420126719 - type: cos_sim_spearman value: 81.6058424699445 - type: euclidean_pearson value: 81.16540133861879 - type: euclidean_spearman value: 81.86377535458067 - type: manhattan_pearson value: 81.13813317937021 - type: manhattan_spearman value: 81.87079962857256 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ko-ko) config: ko-ko split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 68.06192660936868 - type: cos_sim_spearman value: 68.2376353514075 - type: euclidean_pearson value: 60.68326946956215 - type: euclidean_spearman value: 59.19352349785952 - type: manhattan_pearson value: 60.6592944683418 - type: manhattan_spearman value: 59.167534419270865 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (ar-ar) config: ar-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 76.78098264855684 - type: cos_sim_spearman value: 78.02670452969812 - type: euclidean_pearson value: 77.26694463661255 - type: euclidean_spearman value: 77.47007626009587 - type: manhattan_pearson value: 77.25070088632027 - type: manhattan_spearman value: 77.36368265830724 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-ar) config: en-ar split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 78.45418506379532 - type: cos_sim_spearman value: 78.60412019902428 - type: euclidean_pearson value: 79.90303710850512 - type: euclidean_spearman value: 78.67123625004957 - type: manhattan_pearson value: 80.09189580897753 - type: manhattan_spearman value: 79.02484481441483 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-de) config: en-de split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 82.35556731232779 - type: cos_sim_spearman value: 81.48249735354844 - type: euclidean_pearson value: 81.66748026636621 - type: euclidean_spearman value: 80.35571574338547 - type: manhattan_pearson value: 81.38214732806365 - type: manhattan_spearman value: 79.9018202958774 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 86.4527703176897 - type: cos_sim_spearman value: 85.81084095829584 - type: euclidean_pearson value: 86.43489162324457 - type: euclidean_spearman value: 85.27110976093296 - type: manhattan_pearson value: 86.43674259444512 - type: manhattan_spearman value: 85.05719308026032 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-tr) config: en-tr split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 76.00411240034492 - type: cos_sim_spearman value: 76.33887356560854 - type: euclidean_pearson value: 76.81730660019446 - type: euclidean_spearman value: 
75.04432185451306 - type: manhattan_pearson value: 77.22298813168995 - type: manhattan_spearman value: 75.56420330256725 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-en) config: es-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 79.1447136836213 - type: cos_sim_spearman value: 81.80823850788917 - type: euclidean_pearson value: 80.84505734814422 - type: euclidean_spearman value: 81.714168092736 - type: manhattan_pearson value: 80.84713816174187 - type: manhattan_spearman value: 81.61267814749516 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (es-es) config: es-es split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.01257457052873 - type: cos_sim_spearman value: 87.91146458004216 - type: euclidean_pearson value: 88.36771859717994 - type: euclidean_spearman value: 87.73182474597515 - type: manhattan_pearson value: 88.26551451003671 - type: manhattan_spearman value: 87.71675151388992 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (fr-en) config: fr-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 79.20121618382373 - type: cos_sim_spearman value: 78.05794691968603 - type: euclidean_pearson value: 79.93819925682054 - type: euclidean_spearman value: 78.00586118701553 - type: manhattan_pearson value: 80.05598625820885 - type: manhattan_spearman value: 78.04802948866832 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (it-en) config: it-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 81.51743373871778 - type: cos_sim_spearman value: 80.98266651818703 - type: euclidean_pearson value: 81.11875722505269 - type: euclidean_spearman value: 79.45188413284538 - type: manhattan_pearson value: 80.7988457619225 - type: manhattan_spearman value: 79.49643569311485 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (nl-en) config: nl-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 81.78679924046351 - type: cos_sim_spearman value: 80.9986574147117 - type: euclidean_pearson value: 82.09130079135713 - type: euclidean_spearman value: 80.66215667390159 - type: manhattan_pearson value: 82.0328610549654 - type: manhattan_spearman value: 80.31047226932408 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 58.08082172994642 - type: cos_sim_spearman value: 62.9940530222459 - type: euclidean_pearson value: 58.47927303460365 - type: euclidean_spearman value: 60.8440317609258 - type: manhattan_pearson value: 58.32438211697841 - type: manhattan_spearman value: 60.69642636776064 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de) config: de split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 33.83985707464123 - type: cos_sim_spearman value: 46.89093209603036 - type: euclidean_pearson value: 34.63602187576556 - type: euclidean_spearman value: 46.31087228200712 - type: manhattan_pearson value: 34.66899391543166 - type: manhattan_spearman value: 46.33049538425276 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es) config: es split: test 
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 51.61315965767736 - type: cos_sim_spearman value: 58.9434266730386 - type: euclidean_pearson value: 50.35885602217862 - type: euclidean_spearman value: 58.238679883286025 - type: manhattan_pearson value: 53.01732044381151 - type: manhattan_spearman value: 58.10482351761412 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl) config: pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 26.771738440430177 - type: cos_sim_spearman value: 34.807259227816054 - type: euclidean_pearson value: 17.82657835823811 - type: euclidean_spearman value: 34.27912898498941 - type: manhattan_pearson value: 19.121527758886312 - type: manhattan_spearman value: 34.4940050226265 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (tr) config: tr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 52.8354704676683 - type: cos_sim_spearman value: 57.28629534815841 - type: euclidean_pearson value: 54.10329332004385 - type: euclidean_spearman value: 58.15030615859976 - type: manhattan_pearson value: 55.42372087433115 - type: manhattan_spearman value: 57.52270736584036 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ar) config: ar split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 31.01976557986924 - type: cos_sim_spearman value: 54.506959483927616 - type: euclidean_pearson value: 36.917863022119086 - type: euclidean_spearman value: 53.750194241538566 - type: manhattan_pearson value: 37.200177833241085 - type: manhattan_spearman value: 53.507659188082535 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (ru) config: ru split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 46.38635647225934 - type: cos_sim_spearman value: 54.50892732637536 - type: euclidean_pearson value: 40.8331015184763 - type: euclidean_spearman value: 53.142903182230924 - type: manhattan_pearson value: 43.07655692906317 - type: manhattan_spearman value: 53.5833474125901 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh) config: zh split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 60.52525456662916 - type: cos_sim_spearman value: 63.23975489531082 - type: euclidean_pearson value: 58.989191722317514 - type: euclidean_spearman value: 62.536326639863894 - type: manhattan_pearson value: 61.32982866201855 - type: manhattan_spearman value: 63.068262822520516 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr) config: fr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 59.63798684577696 - type: cos_sim_spearman value: 74.09937723367189 - type: euclidean_pearson value: 63.77494904383906 - type: euclidean_spearman value: 71.15932571292481 - type: manhattan_pearson value: 63.69646122775205 - type: manhattan_spearman value: 70.54960698541632 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-en) config: de-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 36.50262468726711 - type: cos_sim_spearman value: 45.00322499674274 - type: euclidean_pearson value: 32.58759216581778 - type: euclidean_spearman value: 
40.13720951315429 - type: manhattan_pearson value: 34.88422299605277 - type: manhattan_spearman value: 40.63516862200963 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-en) config: es-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 56.498552617040275 - type: cos_sim_spearman value: 67.71358426124443 - type: euclidean_pearson value: 57.16474781778287 - type: euclidean_spearman value: 65.721515493531 - type: manhattan_pearson value: 59.25227610738926 - type: manhattan_spearman value: 65.89743680340739 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (it) config: it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 55.97978814727984 - type: cos_sim_spearman value: 65.85821395092104 - type: euclidean_pearson value: 59.11117270978519 - type: euclidean_spearman value: 64.50062069934965 - type: manhattan_pearson value: 59.4436213778161 - type: manhattan_spearman value: 64.4003273074382 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (pl-en) config: pl-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 58.00873192515712 - type: cos_sim_spearman value: 60.167708809138745 - type: euclidean_pearson value: 56.91950637760252 - type: euclidean_spearman value: 58.50593399441014 - type: manhattan_pearson value: 58.683747352584994 - type: manhattan_spearman value: 59.38110066799761 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (zh-en) config: zh-en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 54.26020658151187 - type: cos_sim_spearman value: 61.29236187204147 - type: euclidean_pearson value: 55.993896804147056 - type: euclidean_spearman value: 58.654928232615354 - type: manhattan_pearson value: 56.612492816099426 - type: manhattan_spearman value: 58.65144067094258 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (es-it) config: es-it split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 49.13817835368122 - type: cos_sim_spearman value: 50.78524216975442 - type: euclidean_pearson value: 46.56046454501862 - type: euclidean_spearman value: 50.3935060082369 - type: manhattan_pearson value: 48.0232348418531 - type: manhattan_spearman value: 50.79528358464199 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-fr) config: de-fr split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 44.274388638585286 - type: cos_sim_spearman value: 49.43124017389838 - type: euclidean_pearson value: 42.45909582681174 - type: euclidean_spearman value: 49.661383797129055 - type: manhattan_pearson value: 42.5771970142383 - type: manhattan_spearman value: 50.14423414390715 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (de-pl) config: de-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 26.119500839749776 - type: cos_sim_spearman value: 39.324070169024424 - type: euclidean_pearson value: 35.83247077201831 - type: euclidean_spearman value: 42.61903924348457 - type: manhattan_pearson value: 35.50415034487894 - type: manhattan_spearman value: 41.87998075949351 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (fr-pl) config: 
fr-pl split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 72.62575835691209 - type: cos_sim_spearman value: 73.24670207647144 - type: euclidean_pearson value: 78.07793323914657 - type: euclidean_spearman value: 73.24670207647144 - type: manhattan_pearson value: 77.51429306378206 - type: manhattan_spearman value: 73.24670207647144 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.09375596849891 - type: cos_sim_spearman value: 86.44881302053585 - type: euclidean_pearson value: 84.71259163967213 - type: euclidean_spearman value: 85.63661992344069 - type: manhattan_pearson value: 84.64466537502614 - type: manhattan_spearman value: 85.53769949940238 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 70.2056154684549 - type: mrr value: 89.52703161036494 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.57623762376238 - type: cos_sim_ap value: 83.53051588811371 - type: cos_sim_f1 value: 77.72704211060375 - type: cos_sim_precision value: 78.88774459320288 - type: cos_sim_recall value: 76.6 - type: dot_accuracy value: 99.06435643564356 - type: dot_ap value: 27.003124923857463 - type: dot_f1 value: 34.125269978401725 - type: dot_precision value: 37.08920187793427 - type: dot_recall value: 31.6 - type: euclidean_accuracy value: 99.61485148514852 - type: euclidean_ap value: 85.47332647001774 - type: euclidean_f1 value: 80.0808897876643 - type: euclidean_precision value: 80.98159509202453 - type: euclidean_recall value: 79.2 - type: manhattan_accuracy value: 99.61683168316831 - type: manhattan_ap value: 85.41969859598552 - type: manhattan_f1 value: 79.77755308392315 - type: manhattan_precision value: 80.67484662576688 - type: manhattan_recall value: 78.9 - type: max_accuracy value: 99.61683168316831 - type: max_ap value: 85.47332647001774 - type: max_f1 value: 80.0808897876643 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 34.35688940053467 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 30.64427069276576 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 44.89500754900078 - type: mrr value: 45.33215558950853 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.653069624224084 - type: cos_sim_spearman value: 30.10187112430319 - type: dot_pearson value: 28.966278202103666 - type: dot_spearman value: 28.342234095507767 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB 
ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 65.96839999999999 - type: ap value: 11.846327590186444 - type: f1 value: 50.518102944693574 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 55.220713073005086 - type: f1 value: 55.47856175692088 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 31.581473892235877 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 82.94093103653812 - type: cos_sim_ap value: 62.48963249213361 - type: cos_sim_f1 value: 58.9541137429912 - type: cos_sim_precision value: 52.05091937765205 - type: cos_sim_recall value: 67.96833773087072 - type: dot_accuracy value: 78.24998509864696 - type: dot_ap value: 40.82371294480071 - type: dot_f1 value: 44.711163153786096 - type: dot_precision value: 35.475379374419326 - type: dot_recall value: 60.4485488126649 - type: euclidean_accuracy value: 83.13166835548668 - type: euclidean_ap value: 63.459878609769774 - type: euclidean_f1 value: 60.337199569532466 - type: euclidean_precision value: 55.171659741963694 - type: euclidean_recall value: 66.56992084432719 - type: manhattan_accuracy value: 83.00649698992669 - type: manhattan_ap value: 63.263161177904905 - type: manhattan_f1 value: 60.17122874713614 - type: manhattan_precision value: 55.40750610703975 - type: manhattan_recall value: 65.8311345646438 - type: max_accuracy value: 83.13166835548668 - type: max_ap value: 63.459878609769774 - type: max_f1 value: 60.337199569532466 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 87.80416812201653 - type: cos_sim_ap value: 83.45540469219863 - type: cos_sim_f1 value: 75.58836427422892 - type: cos_sim_precision value: 71.93934335002783 - type: cos_sim_recall value: 79.62734832152756 - type: dot_accuracy value: 83.04226336011176 - type: dot_ap value: 70.63007268018524 - type: dot_f1 value: 65.35980325765405 - type: dot_precision value: 60.84677151768532 - type: dot_recall value: 70.59593470896212 - type: euclidean_accuracy value: 87.60430007373773 - type: euclidean_ap value: 83.10068502536592 - type: euclidean_f1 value: 75.02510506936439 - type: euclidean_precision value: 72.56637168141593 - type: euclidean_recall value: 77.65629812134279 - type: manhattan_accuracy value: 87.60041914076145 - type: manhattan_ap value: 83.05480769911229 - type: manhattan_f1 value: 74.98522895125554 - type: manhattan_precision value: 72.04797047970479 - type: manhattan_recall value: 78.17215891592238 - type: max_accuracy value: 87.80416812201653 - type: max_ap value: 83.45540469219863 - type: max_f1 value: 75.58836427422892 --- # shibing624/text2vec-base-multilingual This is a CoSENT(Cosine Sentence) model: shibing624/text2vec-base-multilingual. 
It maps sentences to a 384-dimensional dense vector space and can be used for tasks like sentence embeddings, text matching or semantic search. - training dataset: - base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 - max_seq_length: 256 - best epoch: 4 - sentence embedding dim: 384 ## Evaluation For an automated evaluation of this model, see the *Evaluation Benchmark*: text2vec ## Languages Available languages are: de, en, es, fr, it, nl, pl, pt, ru, zh ### Release Models - Chinese text-matching evaluation results for the models released in this project: | Arch | BaseModel | Model | ATEC | BQ | LCQMC | PAWSX | STS-B | SOHU-dd | SOHU-dc | Avg | QPS | |:-----------|:-------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------|:-----:|:-----:|:-----:|:-----:|:-----:|:-------:|:-------:|:---------:|:-----:| | Word2Vec | word2vec | w2v-light-tencent-chinese | 20.00 | 31.49 | 59.46 | 2.57 | 55.78 | 55.04 | 20.70 | 35.03 | 23769 | | SBERT | xlm-roberta-base | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 18.42 | 38.52 | 63.96 | 10.14 | 78.90 | 63.01 | 52.28 | 46.46 | 3138 | | Instructor | hfl/chinese-roberta-wwm-ext | moka-ai/m3e-base | 41.27 | 63.81 | 74.87 | 12.20 | 76.96 | 75.83 | 60.55 | 57.93 | 2980 | | CoSENT | hfl/chinese-macbert-base | shibing624/text2vec-base-chinese | 31.93 | 42.67 | 70.16 | 17.21 | 79.30 | 70.27 | 50.42 | 51.61 | 3008 | | CoSENT | hfl/chinese-lert-large | GanymedeNil/text2vec-large-chinese | 32.61 | 44.59 | 69.30 | 14.51 | 79.44 | 73.01 | 59.04 | 53.12 | 2092 | | CoSENT | nghuyong/ernie-3.0-base-zh | shibing624/text2vec-base-chinese-sentence | 43.37 | 61.43 | 73.48 | 38.90 | 78.25 | 70.60 | 53.08 | 59.87 | 3089 | | CoSENT | nghuyong/ernie-3.0-base-zh | shibing624/text2vec-base-chinese-paraphrase | 44.89 | 63.58 | 74.24 | 40.90 | 78.93 | 76.70 | 63.30 | **63.08** | 3066 | | CoSENT | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | shibing624/text2vec-base-multilingual | 32.39 | 50.33 | 65.64 | 32.56 | 74.45 | 68.88 | 51.17 | 53.67 | 4004 | Notes: - Evaluation metric: Spearman coefficient - The shibing624/text2vec-base-chinese model was trained with the CoSENT method on Chinese STS-B data and achieves good results on the Chinese STS-B test set; it can be trained by running examples/training_sup_text_matching_model.py, the model files have been uploaded to the HF model hub, and it is recommended for general Chinese semantic matching tasks. - The shibing624/text2vec-base-chinese-sentence model was trained with the CoSENT method on the manually curated Chinese STS dataset shibing624/nli-zh-all/text2vec-base-chinese-sentence-dataset and achieves good results on various Chinese NLI test sets; it can be trained by running examples/training_sup_text_matching_model_jsonl_data.py, the model files have been uploaded to the HF model hub, and it is recommended for Chinese s2s (sentence vs. sentence) semantic matching tasks. - The shibing624/text2vec-base-chinese-paraphrase model was trained with the CoSENT method on the manually curated Chinese STS dataset shibing624/nli-zh-all/text2vec-base-chinese-paraphrase-dataset, which adds s2p (sentence-to-paraphrase) data relative to shibing624/nli-zh-all/text2vec-base-chinese-sentence-dataset to strengthen long-text representation; it achieves SOTA on various Chinese NLI test sets, can be trained by running examples/training_sup_text_matching_model_jsonl_data.py, has been uploaded to the HF model hub, and is recommended for Chinese s2p (sentence vs. paragraph) semantic matching tasks. - The shibing624/text2vec-base-multilingual model was trained with the CoSENT method on the manually curated multilingual STS dataset shibing624/nli-zh-all/text2vec-base-multilingual-dataset and improves over the original model on Chinese and English test sets; it can be trained by running examples/training_sup_text_matching_model_jsonl_data.py, has been uploaded to the HF model hub, and is recommended for multilingual semantic matching tasks. - w2v-light-tencent-chinese is the Tencent word-vector Word2Vec model, loaded on CPU; it is suitable for Chinese literal matching tasks and cold-start scenarios with little data. - QPS was measured on a Tesla V100 GPU with 32GB of memory. Model training experiment report: see the experiment report. ## Usage (text2vec) Using this model becomes easy when you have text2vec installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without text2vec, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word
embeddings. Install transformers: Then load model and predict: ## Usage (sentence-transformers) sentence-transformers is a popular library to compute dense vector representations for sentences. Install sentence-transformers: Then load model and predict: ## Full Model Architecture ## Intended uses Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks. By default, input text longer than 256 word pieces is truncated. ## Training procedure ### Pre-training We use the pretrained sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 model. Please refer to the model card for more detailed information about the pre-training procedure. ### Fine-tuning We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch. We then apply the rank loss by comparing true pairs with false pairs. ## Citing & Authors This model was trained by text2vec. If you find this model helpful, feel free to cite:", + "model_explanation_gemini": "Generates multilingual sentence embeddings for tasks like sentence similarity and classification across languages including Chinese, English, German, French, and others." +} \ No newline at end of file diff --git a/data/model_data_json/siebert_sentiment-roberta-large-english.json b/data/model_data_json/siebert_sentiment-roberta-large-english.json new file mode 100644 index 0000000000000000000000000000000000000000..14ec4b48d39deba87f8c6c43eaeca625f95a8a8e --- /dev/null +++ b/data/model_data_json/siebert_sentiment-roberta-large-english.json @@ -0,0 +1,23 @@ +{ + "model_id": "siebert/sentiment-roberta-large-english", + "downloads": 97943, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "roberta", + "text-classification", + "sentiment", + "twitter", + "reviews", + "siebert", + "en", + "arxiv:1907.11692", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: \"en\" tags: - sentiment - twitter - reviews - siebert --- ## SiEBERT - English-Language Sentiment Classification # Overview This model (\"SiEBERT\", prefix for \"Sentiment in English\") is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). Consequently, it outperforms models trained on only one type of text (e.g., movie reviews from the popular SST-2 benchmark) when used on new data as shown below. # Predictions on a data set If you want to predict sentiment for your own data, we provide an example script via Google Colab. You can load your data to a Google Drive and run the script for free on a Colab GPU. Set-up only takes a few minutes. We suggest that you manually label a subset of your data to evaluate performance for your use case. For performance benchmark values across various sentiment analysis contexts, please refer to our paper (Hartmann et al. 2023). On average, our model outperforms a DistilBERT-based model fine-tuned solely on the popular SST-2 data set by more than 15 percentage points (78.1 vs. 93.2 percent, see table below).
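The Colab prediction script itself is not reproduced in this dump. Purely as a minimal local sketch using the transformers pipeline (the input sentence is our own example, not from the card):

```python
from transformers import pipeline

# Binary sentiment analysis with SiEBERT: predicts POSITIVE or NEGATIVE.
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="siebert/sentiment-roberta-large-english",
)

# Example input (our own); output looks like [{'label': 'POSITIVE', 'score': ...}]
print(sentiment_analysis("I love this!"))
```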
As a robustness check, we evaluate the model in a leave-one-out manner (training on 14 data sets, evaluating on the one left out), which decreases model performance by only about 3 percentage points on average and underscores its generalizability. Model performance is given as evaluation set accuracy in percent. |Dataset|DistilBERT SST-2|This model| |---|---|---| |McAuley and Leskovec (2013) (Reviews)|84.7|98.0| |McAuley and Leskovec (2013) (Review Titles)|65.5|87.0| |Yelp Academic Dataset|84.8|96.5| |Maas et al. (2011)|80.6|96.0| |Kaggle|87.2|96.0| |Pang and Lee (2005)|89.7|91.0| |Nakov et al. (2013)|70.1|88.5| |Shamma (2009)|76.0|87.0| |Blitzer et al. (2007) (Books)|83.0|92.5| |Blitzer et al. (2007) (DVDs)|84.5|92.5| |Blitzer et al. (2007) (Electronics)|74.5|95.0| |Blitzer et al. (2007) (Kitchen devices)|80.0|98.5| |Pang et al. (2002)|73.5|95.5| |Speriosu et al. (2011)|71.5|85.5| |Hartmann et al. (2019)|65.5|98.0| |**Average**|**78.1**|**93.2**| # Fine-tuning hyperparameters - learning_rate = 2e-5 - num_train_epochs = 3.0 - warmup_steps = 500 - weight_decay = 0.01 Other values were left at their defaults as listed here. # Citation and contact Please cite this paper (published in the IJRM) when you use our model. Feel free to reach out to christian.siebert@uni-hamburg.de with any questions or feedback you may have.", + "model_explanation_gemini": "Performs binary sentiment classification (positive/negative) on English text, fine-tuned for generalization across diverse sources like reviews and tweets." +} \ No newline at end of file diff --git a/data/model_data_json/sileod_deberta-v3-base-tasksource-nli.json b/data/model_data_json/sileod_deberta-v3-base-tasksource-nli.json new file mode 100644 index 0000000000000000000000000000000000000000..83fac0a8ee2bc2d36644207b5e293f361a3c9c4b --- /dev/null +++ b/data/model_data_json/sileod_deberta-v3-base-tasksource-nli.json @@ -0,0 +1,301 @@ +{ + "model_id": "sileod/deberta-v3-base-tasksource-nli", + "downloads": 545644, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "deberta-v2", + "text-classification", + "deberta-v3-base", + "deberta-v3", + "deberta", + "nli", + "natural-language-inference", + "multitask", + "multi-task", + "pipeline", + "extreme-multi-task", + "extreme-mtl", + "tasksource", + "zero-shot", + "rlhf", + "zero-shot-classification", + "en", + "dataset:glue", + "dataset:nyu-mll/multi_nli", + "dataset:multi_nli", + "dataset:super_glue", + "dataset:anli", + "dataset:tasksource/babi_nli", + "dataset:sick", + "dataset:snli", + "dataset:scitail", + "dataset:OpenAssistant/oasst1", + "dataset:universal_dependencies", + "dataset:hans", + "dataset:qbao775/PARARULE-Plus", + "dataset:alisawuffles/WANLI", + "dataset:metaeval/recast", + "dataset:sileod/probability_words_nli", + "dataset:joey234/nan-nli", + "dataset:pietrolesci/nli_fever", + "dataset:pietrolesci/breaking_nli", + "dataset:pietrolesci/conj_nli", + "dataset:pietrolesci/fracas", + "dataset:pietrolesci/dialogue_nli", + "dataset:pietrolesci/mpe", + "dataset:pietrolesci/dnc", + "dataset:pietrolesci/gpt3_nli", + "dataset:pietrolesci/recast_white", + "dataset:pietrolesci/joci", + "dataset:martn-nguyen/contrast_nli", + "dataset:pietrolesci/robust_nli", + "dataset:pietrolesci/robust_nli_is_sd", + "dataset:pietrolesci/robust_nli_li_ts", + "dataset:pietrolesci/gen_debiased_nli", + "dataset:pietrolesci/add_one_rte", + "dataset:metaeval/imppres", + "dataset:pietrolesci/glue_diagnostics", + "dataset:hlgd", + "dataset:PolyAI/banking77", + "dataset:paws", + "dataset:quora", +
"dataset:medical_questions_pairs", + "dataset:conll2003", + "dataset:nlpaueb/finer-139", + "dataset:Anthropic/hh-rlhf", + "dataset:Anthropic/model-written-evals", + "dataset:truthful_qa", + "dataset:nightingal3/fig-qa", + "dataset:tasksource/bigbench", + "dataset:blimp", + "dataset:cos_e", + "dataset:cosmos_qa", + "dataset:dream", + "dataset:openbookqa", + "dataset:qasc", + "dataset:quartz", + "dataset:quail", + "dataset:head_qa", + "dataset:sciq", + "dataset:social_i_qa", + "dataset:wiki_hop", + "dataset:wiqa", + "dataset:piqa", + "dataset:hellaswag", + "dataset:pkavumba/balanced-copa", + "dataset:12ml/e-CARE", + "dataset:art", + "dataset:tasksource/mmlu", + "dataset:winogrande", + "dataset:codah", + "dataset:ai2_arc", + "dataset:definite_pronoun_resolution", + "dataset:swag", + "dataset:math_qa", + "dataset:metaeval/utilitarianism", + "dataset:mteb/amazon_counterfactual", + "dataset:SetFit/insincere-questions", + "dataset:SetFit/toxic_conversations", + "dataset:turingbench/TuringBench", + "dataset:trec", + "dataset:tals/vitaminc", + "dataset:hope_edi", + "dataset:strombergnlp/rumoureval_2019", + "dataset:ethos", + "dataset:tweet_eval", + "dataset:discovery", + "dataset:pragmeval", + "dataset:silicone", + "dataset:lex_glue", + "dataset:papluca/language-identification", + "dataset:imdb", + "dataset:rotten_tomatoes", + "dataset:ag_news", + "dataset:yelp_review_full", + "dataset:financial_phrasebank", + "dataset:poem_sentiment", + "dataset:dbpedia_14", + "dataset:amazon_polarity", + "dataset:app_reviews", + "dataset:hate_speech18", + "dataset:sms_spam", + "dataset:humicroedit", + "dataset:snips_built_in_intents", + "dataset:banking77", + "dataset:hate_speech_offensive", + "dataset:yahoo_answers_topics", + "dataset:pacovaldez/stackoverflow-questions", + "dataset:zapsdcn/hyperpartisan_news", + "dataset:zapsdcn/sciie", + "dataset:zapsdcn/citation_intent", + "dataset:go_emotions", + "dataset:allenai/scicite", + "dataset:liar", + "dataset:relbert/lexical_relation_classification", + "dataset:metaeval/linguisticprobing", + "dataset:tasksource/crowdflower", + "dataset:metaeval/ethics", + "dataset:emo", + "dataset:google_wellformed_query", + "dataset:tweets_hate_speech_detection", + "dataset:has_part", + "dataset:wnut_17", + "dataset:ncbi_disease", + "dataset:acronym_identification", + "dataset:jnlpba", + "dataset:species_800", + "dataset:SpeedOfMagic/ontonotes_english", + "dataset:blog_authorship_corpus", + "dataset:launch/open_question_type", + "dataset:health_fact", + "dataset:commonsense_qa", + "dataset:mc_taco", + "dataset:ade_corpus_v2", + "dataset:prajjwal1/discosense", + "dataset:circa", + "dataset:PiC/phrase_similarity", + "dataset:copenlu/scientific-exaggeration-detection", + "dataset:quarel", + "dataset:mwong/fever-evidence-related", + "dataset:numer_sense", + "dataset:dynabench/dynasent", + "dataset:raquiba/Sarcasm_News_Headline", + "dataset:sem_eval_2010_task_8", + "dataset:demo-org/auditor_review", + "dataset:medmcqa", + "dataset:aqua_rat", + "dataset:RuyuanWan/Dynasent_Disagreement", + "dataset:RuyuanWan/Politeness_Disagreement", + "dataset:RuyuanWan/SBIC_Disagreement", + "dataset:RuyuanWan/SChem_Disagreement", + "dataset:RuyuanWan/Dilemmas_Disagreement", + "dataset:lucasmccabe/logiqa", + "dataset:wiki_qa", + "dataset:metaeval/cycic_classification", + "dataset:metaeval/cycic_multiplechoice", + "dataset:metaeval/sts-companion", + "dataset:metaeval/commonsense_qa_2.0", + "dataset:metaeval/lingnli", + "dataset:metaeval/monotonicity-entailment", + "dataset:metaeval/arct", + 
"dataset:metaeval/scinli", + "dataset:metaeval/naturallogic", + "dataset:onestop_qa", + "dataset:demelin/moral_stories", + "dataset:corypaik/prost", + "dataset:aps/dynahate", + "dataset:metaeval/syntactic-augmentation-nli", + "dataset:metaeval/autotnli", + "dataset:lasha-nlp/CONDAQA", + "dataset:openai/webgpt_comparisons", + "dataset:Dahoas/synthetic-instruct-gptj-pairwise", + "dataset:metaeval/scruples", + "dataset:metaeval/wouldyourather", + "dataset:sileod/attempto-nli", + "dataset:metaeval/defeasible-nli", + "dataset:metaeval/help-nli", + "dataset:metaeval/nli-veridicality-transitivity", + "dataset:metaeval/natural-language-satisfiability", + "dataset:metaeval/lonli", + "dataset:tasksource/dadc-limit-nli", + "dataset:ColumbiaNLP/FLUTE", + "dataset:metaeval/strategy-qa", + "dataset:openai/summarize_from_feedback", + "dataset:tasksource/folio", + "dataset:metaeval/tomi-nli", + "dataset:metaeval/avicenna", + "dataset:stanfordnlp/SHP", + "dataset:GBaker/MedQA-USMLE-4-options-hf", + "dataset:GBaker/MedQA-USMLE-4-options", + "dataset:sileod/wikimedqa", + "dataset:declare-lab/cicero", + "dataset:amydeng2000/CREAK", + "dataset:metaeval/mutual", + "dataset:inverse-scaling/NeQA", + "dataset:inverse-scaling/quote-repetition", + "dataset:inverse-scaling/redefine-math", + "dataset:tasksource/puzzte", + "dataset:metaeval/implicatures", + "dataset:race", + "dataset:metaeval/spartqa-yn", + "dataset:metaeval/spartqa-mchoice", + "dataset:metaeval/temporal-nli", + "dataset:metaeval/ScienceQA_text_only", + "dataset:AndyChiang/cloth", + "dataset:metaeval/logiqa-2.0-nli", + "dataset:tasksource/oasst1_dense_flat", + "dataset:metaeval/boolq-natural-perturbations", + "dataset:metaeval/path-naturalness-prediction", + "dataset:riddle_sense", + "dataset:Jiangjie/ekar_english", + "dataset:metaeval/implicit-hate-stg1", + "dataset:metaeval/chaos-mnli-ambiguity", + "dataset:IlyaGusev/headline_cause", + "dataset:metaeval/race-c", + "dataset:metaeval/equate", + "dataset:metaeval/ambient", + "dataset:AndyChiang/dgen", + "dataset:metaeval/clcd-english", + "dataset:civil_comments", + "dataset:metaeval/acceptability-prediction", + "dataset:maximedb/twentyquestions", + "dataset:metaeval/counterfactually-augmented-snli", + "dataset:tasksource/I2D2", + "dataset:sileod/mindgames", + "dataset:metaeval/counterfactually-augmented-imdb", + "dataset:metaeval/cnli", + "dataset:metaeval/reclor", + "dataset:tasksource/oasst1_pairwise_rlhf_reward", + "dataset:tasksource/zero-shot-label-nli", + "dataset:webis/args_me", + "dataset:webis/Touche23-ValueEval", + "dataset:tasksource/starcon", + "dataset:tasksource/ruletaker", + "dataset:lighteval/lsat_qa", + "dataset:tasksource/ConTRoL-nli", + "dataset:tasksource/tracie", + "dataset:tasksource/sherliic", + "dataset:tasksource/sen-making", + "dataset:tasksource/winowhy", + "dataset:mediabiasgroup/mbib-base", + "dataset:tasksource/robustLR", + "dataset:CLUTRR/v1", + "dataset:tasksource/logical-fallacy", + "dataset:tasksource/parade", + "dataset:tasksource/cladder", + "dataset:tasksource/subjectivity", + "dataset:tasksource/MOH", + "dataset:tasksource/VUAC", + "dataset:tasksource/TroFi", + "dataset:sharc_modified", + "dataset:tasksource/conceptrules_v2", + "dataset:tasksource/disrpt", + "dataset:conll2000", + "dataset:DFKI-SLT/few-nerd", + "dataset:tasksource/com2sense", + "dataset:tasksource/scone", + "dataset:tasksource/winodict", + "dataset:tasksource/fool-me-twice", + "dataset:tasksource/monli", + "dataset:tasksource/corr2cause", + "dataset:tasksource/apt", + 
"dataset:zeroshot/twitter-financial-news-sentiment", + "dataset:tasksource/icl-symbol-tuning-instruct", + "dataset:tasksource/SpaceNLI", + "dataset:sihaochen/propsegment", + "dataset:HannahRoseKirk/HatemojiBuild", + "dataset:tasksource/regset", + "dataset:lmsys/chatbot_arena_conversations", + "dataset:tasksource/nlgraph", + "arxiv:2301.05948", + "license:apache-2.0", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: en tags: - deberta-v3-base - deberta-v3 - deberta - text-classification - nli - natural-language-inference - multitask - multi-task - pipeline - extreme-multi-task - extreme-mtl - tasksource - zero-shot - rlhf model-index: - name: deberta-v3-base-tasksource-nli results: - task: type: text-classification name: Text Classification dataset: name: glue type: glue config: rte split: validation metrics: - type: accuracy value: 0.89 - task: type: natural-language-inference name: Natural Language Inference dataset: name: anli-r3 type: anli config: plain_text split: validation metrics: - type: accuracy value: 0.52 name: Accuracy datasets: - glue - nyu-mll/multi_nli - multi_nli - super_glue - anli - tasksource/babi_nli - sick - snli - scitail - OpenAssistant/oasst1 - universal_dependencies - hans - qbao775/PARARULE-Plus - alisawuffles/WANLI - metaeval/recast - sileod/probability_words_nli - joey234/nan-nli - pietrolesci/nli_fever - pietrolesci/breaking_nli - pietrolesci/conj_nli - pietrolesci/fracas - pietrolesci/dialogue_nli - pietrolesci/mpe - pietrolesci/dnc - pietrolesci/gpt3_nli - pietrolesci/recast_white - pietrolesci/joci - martn-nguyen/contrast_nli - pietrolesci/robust_nli - pietrolesci/robust_nli_is_sd - pietrolesci/robust_nli_li_ts - pietrolesci/gen_debiased_nli - pietrolesci/add_one_rte - metaeval/imppres - pietrolesci/glue_diagnostics - hlgd - PolyAI/banking77 - paws - quora - medical_questions_pairs - conll2003 - nlpaueb/finer-139 - Anthropic/hh-rlhf - Anthropic/model-written-evals - truthful_qa - nightingal3/fig-qa - tasksource/bigbench - blimp - cos_e - cosmos_qa - dream - openbookqa - qasc - quartz - quail - head_qa - sciq - social_i_qa - wiki_hop - wiqa - piqa - hellaswag - pkavumba/balanced-copa - 12ml/e-CARE - art - tasksource/mmlu - winogrande - codah - ai2_arc - definite_pronoun_resolution - swag - math_qa - metaeval/utilitarianism - mteb/amazon_counterfactual - SetFit/insincere-questions - SetFit/toxic_conversations - turingbench/TuringBench - trec - tals/vitaminc - hope_edi - strombergnlp/rumoureval_2019 - ethos - tweet_eval - discovery - pragmeval - silicone - lex_glue - papluca/language-identification - imdb - rotten_tomatoes - ag_news - yelp_review_full - financial_phrasebank - poem_sentiment - dbpedia_14 - amazon_polarity - app_reviews - hate_speech18 - sms_spam - humicroedit - snips_built_in_intents - banking77 - hate_speech_offensive - yahoo_answers_topics - pacovaldez/stackoverflow-questions - zapsdcn/hyperpartisan_news - zapsdcn/sciie - zapsdcn/citation_intent - go_emotions - allenai/scicite - liar - relbert/lexical_relation_classification - metaeval/linguisticprobing - tasksource/crowdflower - metaeval/ethics - emo - google_wellformed_query - tweets_hate_speech_detection - has_part - wnut_17 - ncbi_disease - acronym_identification - jnlpba - species_800 - SpeedOfMagic/ontonotes_english - blog_authorship_corpus - launch/open_question_type - health_fact - commonsense_qa - mc_taco - ade_corpus_v2 - prajjwal1/discosense - circa - PiC/phrase_similarity - 
copenlu/scientific-exaggeration-detection - quarel - mwong/fever-evidence-related - numer_sense - dynabench/dynasent - raquiba/Sarcasm_News_Headline - sem_eval_2010_task_8 - demo-org/auditor_review - medmcqa - aqua_rat - RuyuanWan/Dynasent_Disagreement - RuyuanWan/Politeness_Disagreement - RuyuanWan/SBIC_Disagreement - RuyuanWan/SChem_Disagreement - RuyuanWan/Dilemmas_Disagreement - lucasmccabe/logiqa - wiki_qa - metaeval/cycic_classification - metaeval/cycic_multiplechoice - metaeval/sts-companion - metaeval/commonsense_qa_2.0 - metaeval/lingnli - metaeval/monotonicity-entailment - metaeval/arct - metaeval/scinli - metaeval/naturallogic - onestop_qa - demelin/moral_stories - corypaik/prost - aps/dynahate - metaeval/syntactic-augmentation-nli - metaeval/autotnli - lasha-nlp/CONDAQA - openai/webgpt_comparisons - Dahoas/synthetic-instruct-gptj-pairwise - metaeval/scruples - metaeval/wouldyourather - sileod/attempto-nli - metaeval/defeasible-nli - metaeval/help-nli - metaeval/nli-veridicality-transitivity - metaeval/natural-language-satisfiability - metaeval/lonli - tasksource/dadc-limit-nli - ColumbiaNLP/FLUTE - metaeval/strategy-qa - openai/summarize_from_feedback - tasksource/folio - metaeval/tomi-nli - metaeval/avicenna - stanfordnlp/SHP - GBaker/MedQA-USMLE-4-options-hf - GBaker/MedQA-USMLE-4-options - sileod/wikimedqa - declare-lab/cicero - amydeng2000/CREAK - metaeval/mutual - inverse-scaling/NeQA - inverse-scaling/quote-repetition - inverse-scaling/redefine-math - tasksource/puzzte - metaeval/implicatures - race - metaeval/spartqa-yn - metaeval/spartqa-mchoice - metaeval/temporal-nli - metaeval/ScienceQA_text_only - AndyChiang/cloth - metaeval/logiqa-2.0-nli - tasksource/oasst1_dense_flat - metaeval/boolq-natural-perturbations - metaeval/path-naturalness-prediction - riddle_sense - Jiangjie/ekar_english - metaeval/implicit-hate-stg1 - metaeval/chaos-mnli-ambiguity - IlyaGusev/headline_cause - metaeval/race-c - metaeval/equate - metaeval/ambient - AndyChiang/dgen - metaeval/clcd-english - civil_comments - metaeval/acceptability-prediction - maximedb/twentyquestions - metaeval/counterfactually-augmented-snli - tasksource/I2D2 - sileod/mindgames - metaeval/counterfactually-augmented-imdb - metaeval/cnli - metaeval/reclor - tasksource/oasst1_pairwise_rlhf_reward - tasksource/zero-shot-label-nli - webis/args_me - webis/Touche23-ValueEval - tasksource/starcon - tasksource/ruletaker - lighteval/lsat_qa - tasksource/ConTRoL-nli - tasksource/tracie - tasksource/sherliic - tasksource/sen-making - tasksource/winowhy - mediabiasgroup/mbib-base - tasksource/robustLR - CLUTRR/v1 - tasksource/logical-fallacy - tasksource/parade - tasksource/cladder - tasksource/subjectivity - tasksource/MOH - tasksource/VUAC - tasksource/TroFi - sharc_modified - tasksource/conceptrules_v2 - tasksource/disrpt - conll2000 - DFKI-SLT/few-nerd - tasksource/com2sense - tasksource/scone - tasksource/winodict - tasksource/fool-me-twice - tasksource/monli - tasksource/corr2cause - tasksource/apt - zeroshot/twitter-financial-news-sentiment - tasksource/icl-symbol-tuning-instruct - tasksource/SpaceNLI - sihaochen/propsegment - HannahRoseKirk/HatemojiBuild - tasksource/regset - tasksource/babi_nli - lmsys/chatbot_arena_conversations - tasksource/nlgraph metrics: - accuracy library_name: transformers pipeline_tag: zero-shot-classification --- # Model Card for DeBERTa-v3-base-tasksource-nli --- **NOTE** Deprecated: use for longer context and better accuracy. 
--- This is DeBERTa-v3-base fine-tuned with multi-task learning on 600+ tasks of the tasksource collection. This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for: - Zero-shot entailment-based classification for arbitrary labels [ZS]. - Natural language inference [NLI] - Hundreds of previous tasks with tasksource-adapters [TA]. - Further fine-tuning on a new task or tasksource task (classification, token classification or multiple-choice) [FT]. # [ZS] Zero-shot classification pipeline NLI training data of this model includes label-nli, an NLI dataset specially constructed to improve this kind of zero-shot classification. # [NLI] Natural language inference pipeline # [TA] Tasksource-adapters: 1-line access to hundreds of tasks The list of tasks is available in the model's config.json. This is more efficient than ZS since it requires only one forward pass per example, but it is less flexible. # [FT] Tasknet: 3-line fine-tuning ## Evaluation This model ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation. ### Software and training details The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 15 days on an Nvidia A30 24GB GPU. This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched. Training code: # Citation More details on this article: # Model Card Contact damien.sileo@inria.fr ", + "model_explanation_gemini": "Performs natural language inference and text classification across multiple tasks, including zero-shot learning and multitask scenarios." +} \ No newline at end of file diff --git a/data/model_data_json/skshmjn_Pokemon-classifier-gen9-1025.json b/data/model_data_json/skshmjn_Pokemon-classifier-gen9-1025.json new file mode 100644 index 0000000000000000000000000000000000000000..a14018d2b261ea0e29d5c921c6fef8061323c191 --- /dev/null +++ b/data/model_data_json/skshmjn_Pokemon-classifier-gen9-1025.json @@ -0,0 +1,22 @@ +{ + "model_id": "skshmjn/Pokemon-classifier-gen9-1025", + "downloads": 78944, + "tags": [ + "transformers", + "safetensors", + "vit", + "image-classification", + "vision", + "pokemon", + "pytorch", + "en", + "dataset:custom", + "base_model:google/vit-base-patch16-224-in21k", + "base_model:finetune:google/vit-base-patch16-224-in21k", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - vision - image-classification - pokemon - pytorch - transformers license: apache-2.0 datasets: - custom pipeline_tag: image-classification model_name: Pokemon Classifier Gen9 model_id: skshmjn/Pokemon-classifier-gen9-1025 library_name: transformers framework: PyTorch widget: - src: test.jpg metrics: - accuracy base_model: - google/vit-base-patch16-224-in21k --- # Model Card for Pokemon Classifier Gen9 ## Model Overview This is a fine-tuned ViT (Vision Transformer) model for Pokémon image classification. The model is trained to classify Pokémon images up to Gen9 (1025 classes). ## Intended Use This model is designed for image classification tasks, specifically for identifying Pokémon characters.
It can be used for: - Pokémon-themed apps - Educational projects - Pokémon identification in images **Note**: The model is not designed for general-purpose image classification. ## How to Use Here's how you can load and use the model with the Hugging Face library: ```python from transformers import ViTForImageClassification, ViTImageProcessor from PIL import Image import torch # Define the device device = \"cuda\" if torch.cuda.is_available() else \"cpu\" # Load the model and image processor model_id = \"skshmjn/Pokemon-classifier-gen9-1025\" model = ViTForImageClassification.from_pretrained(model_id).to(device) image_processor = ViTImageProcessor.from_pretrained(model_id) # Load and process an image img = Image.open('test.jpg').convert(\"RGB\") inputs = image_processor(images=img, return_tensors='pt').to(device) # Make predictions outputs = model(**inputs) predicted_id = outputs.logits.argmax(-1).item() predicted_pokemon = model.config.id2label[predicted_id] # Print predicted class print(f\"Predicted Pokémon Pokédex number: {predicted_id+1}\") print(f\"Predicted Pokémon: {predicted_pokemon}\") ```" +} \ No newline at end of file diff --git a/data/model_data_json/skt_kogpt2-base-v2.json b/data/model_data_json/skt_kogpt2-base-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..a4d9c22837b030e0e535928e3919bfb17b662b2e --- /dev/null +++ b/data/model_data_json/skt_kogpt2-base-v2.json @@ -0,0 +1,19 @@ +{ + "model_id": "skt/kogpt2-base-v2", + "downloads": 83657, + "tags": [ + "transformers", + "pytorch", + "jax", + "gpt2", + "text-generation", + "ko", + "license:cc-by-nc-sa-4.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ko tags: - gpt2 license: cc-by-nc-sa-4.0 --- For more details:", + "model_explanation_gemini": "Generates Korean text using GPT-2 architecture under a non-commercial license." +} \ No newline at end of file diff --git a/data/model_data_json/snunlp_KR-SBERT-V40K-klueNLI-augSTS.json b/data/model_data_json/snunlp_KR-SBERT-V40K-klueNLI-augSTS.json new file mode 100644 index 0000000000000000000000000000000000000000..759b5ea3c559f3515115c9f1b5f147e10d8b6ab7 --- /dev/null +++ b/data/model_data_json/snunlp_KR-SBERT-V40K-klueNLI-augSTS.json @@ -0,0 +1,19 @@ +{ + "model_id": "snunlp/KR-SBERT-V40K-klueNLI-augSTS", + "downloads": 392865, + "tags": [ + "sentence-transformers", + "pytorch", + "bert", + "feature-extraction", + "sentence-similarity", + "transformers", + "ko", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers language: - ko widget: - source_sentence: \"그 식당은 파리를 날린다\" sentences: - \"그 식당에는 손님이 없다\" - \"그 식당에서는 드론을 날린다\" - \"파리가 식당에 날아다닌다\" example_title: \"Restaurant\" - source_sentence: \"잠이 옵니다\" sentences: - \"잠이 안 옵니다\" - \"졸음이 옵니다\" - \"기차가 옵니다\" example_title: \"Sleepy\" --- # snunlp/KR-SBERT-V40K-klueNLI-augSTS This is a sentence-transformers model: It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
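The code for the usage sections that follow was stripped from this dump; as a minimal sentence-transformers sketch (the example sentences are borrowed from the card's widget config above):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("snunlp/KR-SBERT-V40K-klueNLI-augSTS")

# Each Korean sentence is mapped to a 768-dimensional vector.
sentences = ["잠이 옵니다", "졸음이 옵니다", "기차가 옵니다"]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)
```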
## Usage (Sentence-Transformers) Using this model becomes easy when you have sentence-transformers installed: Then you can use the model like this: ## Usage (HuggingFace Transformers) Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings. ## Evaluation Results For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: ## Full Model Architecture ## Application for document classification Tutorial in Google Colab: |Model|Accuracy| |-|-| |KR-SBERT-Medium-NLI-STS|0.8400| |KR-SBERT-V40K-NLI-STS|0.8400| |KR-SBERT-V40K-NLI-augSTS|0.8511| |KR-SBERT-V40K-klueNLI-augSTS|**0.8628**| ## Citation", + "model_explanation_gemini": "Computes semantic similarity between Korean sentences by mapping them to 768-dimensional vectors for tasks like clustering or semantic search." +} \ No newline at end of file diff --git a/data/model_data_json/sociocom_MedNER-CR-JA.json b/data/model_data_json/sociocom_MedNER-CR-JA.json new file mode 100644 index 0000000000000000000000000000000000000000..c2589c39f2f5c4e6e66ce379e9ff2fea4954f081 --- /dev/null +++ b/data/model_data_json/sociocom_MedNER-CR-JA.json @@ -0,0 +1,21 @@ +{ + "model_id": "sociocom/MedNER-CR-JA", + "downloads": 117872, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "bert", + "token-classification", + "NER", + "medical documents", + "ja", + "dataset:MedTxt-CR-JA-training-v2.xml", + "license:cc-by-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ja license: - cc-by-4.0 tags: - NER - medical documents datasets: - MedTxt-CR-JA-training-v2.xml metrics: - NTCIR-16 Real-MedNLP subtask 1 --- This is a model for named entity recognition of Japanese medical documents. ### How to use Download the following five files and put them into the same folder. - id_to_tags.pkl - key_attr.pkl - NER_medNLP.py - predict.py - text.txt (This is the input file to be predicted; it can be changed.) You can use this model by running predict.py. ### Input Example ### Output Example ### Publication Tomohiro Nishiyama, Aki Ando, Mihiro Nishidani, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki: NAISTSOC at the NTCIR-16 Real-MedNLP Task, In Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-16), pp. 330-333, 2022", + "model_explanation_gemini": "Recognizes named entities in Japanese medical documents for information extraction."
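The card's Input/Output examples and the contents of predict.py are not reproduced in this dump, and the authors' supported entry point is their own predict.py. Purely as an illustrative sketch of loading the checkpoint generically as a token-classification model (the Japanese sentence is our own example; the tokenizer may additionally require fugashi and ipadic):

```python
from transformers import pipeline

# Generic loading sketch; the authors' supported path is their bundled predict.py.
ner = pipeline(
    "token-classification",
    model="sociocom/MedNER-CR-JA",
    aggregation_strategy="simple",  # merge subword tokens into entity spans
)

# "The patient presented with fever and cough." (our own example sentence)
print(ner("患者は発熱と咳嗽を認めた。"))
```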
+} \ No newline at end of file diff --git a/data/model_data_json/softcatala_wav2vec2-large-xlsr-catala.json b/data/model_data_json/softcatala_wav2vec2-large-xlsr-catala.json new file mode 100644 index 0000000000000000000000000000000000000000..febbeac99c7673e7c65aff714a6a5b07cfa35eaa --- /dev/null +++ b/data/model_data_json/softcatala_wav2vec2-large-xlsr-catala.json @@ -0,0 +1,23 @@ +{ + "model_id": "softcatala/wav2vec2-large-xlsr-catala", + "downloads": 93946, + "tags": [ + "transformers", + "pytorch", + "jax", + "wav2vec2", + "automatic-speech-recognition", + "audio", + "speech", + "xlsr-fine-tuning-week", + "ca", + "dataset:common_voice", + "dataset:parlament_parla", + "license:apache-2.0", + "model-index", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ca datasets: - common_voice - parlament_parla metrics: - wer tags: - audio - automatic-speech-recognition - speech - xlsr-fine-tuning-week license: apache-2.0 model-index: - name: Catalan XLSR Wav2Vec2 Large results: - task: name: Speech Recognition type: automatic-speech-recognition datasets: - name: Common Voice ca type: common_voice args: ca - name: ParlamentParla url: metrics: - name: Test WER type: wer value: 6.92 - name: Google Crowdsourced Corpus WER type: wer value: 12.99 - name: Audiobook “La llegenda de Sant Jordi” WER type: wer value: 13.23 --- # Wav2Vec2-Large-XLSR-Català Fine-tuned facebook/wav2vec2-large-xlsr-53 on Catalan using the Common Voice and ParlamentParla datasets. **Attention:** The split train/dev/test used does not fully map with the CommonVoice 6.1 dataset. A custom split was used combining both the CommonVoice and ParlamentParla dataset and can be found here. Evaluating on the CV test dataset will produce a biased WER, as 1144 audio files of that dataset were used in training/evaluation of this model. WER was calculated using this test.csv, which was not seen by the model during training/evaluation. You can find training and evaluation scripts in the github repository ccoreilly/wav2vec2-catala. When using this model, make sure that your speech input is sampled at 16kHz. ## Results Word error rate was evaluated on the following datasets unseen by the model: | Dataset | WER | | ------- | --- | | Test split (CV+ParlamentParla) | 6.92% | | Google Crowdsourced Corpus | 12.99% | | Audiobook “La llegenda de Sant Jordi” | 13.23% | ## Usage The model can be used directly (without a language model) as follows:", + "model_explanation_gemini": "Fine-tuned for automatic speech recognition in Catalan, achieving low word error rates on datasets like Common Voice and ParlamentParla."
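The usage snippet referenced above was stripped from this dump; a minimal sketch with the standard transformers wav2vec2 APIs (the audio path is a placeholder, and the input must end up as 16kHz mono):

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "softcatala/wav2vec2-large-xlsr-catala"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Placeholder file; resample to the expected 16kHz and downmix to mono.
speech, sr = torchaudio.load("catalan_sample.wav")
speech = torchaudio.functional.resample(speech, sr, 16_000).mean(dim=0)

inputs = processor(speech.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```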
+} \ No newline at end of file diff --git a/data/model_data_json/solidrust_Mistral-7B-Instruct-v0.3-AWQ.json b/data/model_data_json/solidrust_Mistral-7B-Instruct-v0.3-AWQ.json new file mode 100644 index 0000000000000000000000000000000000000000..e59bf4f359c45bc41ffb1286a29bfb1b8378fca8 --- /dev/null +++ b/data/model_data_json/solidrust_Mistral-7B-Instruct-v0.3-AWQ.json @@ -0,0 +1,23 @@ +{ + "model_id": "solidrust/Mistral-7B-Instruct-v0.3-AWQ", + "downloads": 86478, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "4-bit", + "AWQ", + "autotrain_compatible", + "endpoints_compatible", + "conversational", + "base_model:mistralai/Mistral-7B-Instruct-v0.3", + "base_model:quantized:mistralai/Mistral-7B-Instruct-v0.3", + "license:apache-2.0", + "text-generation-inference", + "awq", + "region:us" + ], + "description": "--- base_model: mistralai/Mistral-7B-Instruct-v0.3 inference: false library_name: transformers license: apache-2.0 pipeline_tag: text-generation quantized_by: Suparious tags: - 4-bit - AWQ - text-generation - autotrain_compatible - endpoints_compatible --- # mistralai/Mistral-7B-Instruct-v0.3 AWQ - Model creator: mistralai - Original model: Mistral-7B-Instruct-v0.3 ## Model Summary The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2: - Extended vocabulary to 32768 - Supports v3 Tokenizer - Supports function calling ## How to use ### Install the necessary packages ### Example Python code ### About AWQ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings. AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users: please use GGUF models instead. It is supported by: - Text Generation Webui - using Loader: AutoAWQ - vLLM - version 0.2.2 or later for support for all model types. - Hugging Face Text Generation Inference (TGI) - Transformers version 4.35.0 and later, from any code or client that supports Transformers - AutoAWQ - for use from Python code", + "model_explanation_gemini": "A 4-bit quantized version of Mistral-7B-Instruct-v0.3 optimized for efficient text generation with AWQ, supporting function calling and extended vocabulary." +} \ No newline at end of file diff --git a/data/model_data_json/sonoisa_sentence-bert-base-ja-mean-tokens-v2.json b/data/model_data_json/sonoisa_sentence-bert-base-ja-mean-tokens-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..593f165718ef292b86a05b28aa5bd1bd070d849f --- /dev/null +++ b/data/model_data_json/sonoisa_sentence-bert-base-ja-mean-tokens-v2.json @@ -0,0 +1,20 @@ +{ + "model_id": "sonoisa/sentence-bert-base-ja-mean-tokens-v2", + "downloads": 91589, + "tags": [ + "sentence-transformers", + "pytorch", + "safetensors", + "bert", + "sentence-bert", + "feature-extraction", + "sentence-similarity", + "ja", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja license: cc-by-sa-4.0 tags: - sentence-transformers - sentence-bert - feature-extraction - sentence-similarity --- This is a Japanese sentence-BERT model.
This is a Japanese Sentence-BERT model (version 2). It is an improved version trained with MultipleNegativesRankingLoss, a better loss function than the one used for version 1. On a private in-house dataset, it achieved accuracy about 1.5 to 2 points higher than version 1. The pretrained model cl-tohoku/bert-base-japanese-whole-word-masking was used as the base; accordingly, fugashi and ipadic are required to run inference (pip install fugashi ipadic). # Notes on the old version If you replace the model name with \"sonoisa/sentence-bert-base-ja-mean-tokens-v2\", you get the behavior of this model. # Usage", + "model_explanation_gemini": "Generates sentence embeddings in Japanese for tasks like similarity comparison, using an improved BERT-based model trained with MultipleNegativesRankingLoss for better accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/speechbrain_emotion-recognition-wav2vec2-IEMOCAP.json b/data/model_data_json/speechbrain_emotion-recognition-wav2vec2-IEMOCAP.json new file mode 100644 index 0000000000000000000000000000000000000000..9f4af3d52b57e699dbb1f2ca654175d12824550c --- /dev/null +++ b/data/model_data_json/speechbrain_emotion-recognition-wav2vec2-IEMOCAP.json @@ -0,0 +1,19 @@ +{ + "model_id": "speechbrain/emotion-recognition-wav2vec2-IEMOCAP", + "downloads": 308097, + "tags": [ + "speechbrain", + "audio-classification", + "Emotion", + "Recognition", + "wav2vec2", + "pytorch", + "en", + "dataset:iemocap", + "arxiv:2106.04624", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: \"en\" thumbnail: tags: - audio-classification - speechbrain - Emotion - Recognition - wav2vec2 - pytorch license: \"apache-2.0\" datasets: - iemocap metrics: - Accuracy inference: false ---

# Emotion Recognition with wav2vec2 base on IEMOCAP This repository provides all the necessary tools to perform emotion recognition with a fine-tuned wav2vec2 (base) model using SpeechBrain. It is trained on IEMOCAP training data. For a better experience, we encourage you to learn more about SpeechBrain. The model performance on the IEMOCAP test set is: | Release | Accuracy(%) | |:-------------:|:--------------:| | 19-10-21 | 78.7 (Avg: 75.3) | ## Pipeline description This system is composed of a wav2vec2 model. It is a combination of convolutional and residual blocks. The embeddings are extracted using attentive statistical pooling. The system is trained with Additive Margin Softmax Loss. Speaker Verification is performed using cosine distance between speaker embeddings. The system is trained with recordings sampled at 16kHz (single channel). The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. ## Install SpeechBrain First of all, please install the **development** version of SpeechBrain with the following command: Please notice that we encourage you to read our tutorials and learn more about SpeechBrain. ### Perform Emotion recognition An external Python module is used as an external Predictor class in this HF repo. We use a loader function that allows you to load your custom model. The prediction tensor will contain a tuple of (embedding, id_class, label_name). ### Inference on GPU To perform inference on the GPU, add the GPU device option when calling the method. ### Training The model was trained with SpeechBrain (aa018540). To train it from scratch, follow these steps: 1. Clone SpeechBrain: 2. Install it: 3. Run Training: You can find our training results (models, logs, etc.) here. ### Limitations The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets. # **Citing SpeechBrain** Please cite SpeechBrain if you use it for your research or business. # **About SpeechBrain** - Website: - Code: - HuggingFace:", + "model_explanation_gemini": "Recognizes emotions from speech audio using a fine-tuned wav2vec2 model trained on the IEMOCAP dataset."
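The stripped inference snippet on the upstream card loads the repo's custom interface through SpeechBrain's foreign_class helper. A sketch along those lines (the import path moved to speechbrain.inference.interfaces in newer releases; the file/class names follow the upstream repo, and the audio path is a placeholder):

```python
from speechbrain.pretrained.interfaces import foreign_class

# Load the custom Predictor class shipped in the HF repo (names assumed from the upstream card).
classifier = foreign_class(
    source="speechbrain/emotion-recognition-wav2vec2-IEMOCAP",
    pymodule_file="custom_interface.py",
    classname="CustomEncoderWav2vec2Classifier",
)

# classify_file resamples/normalizes the audio if needed.
out_prob, score, index, text_lab = classifier.classify_file("angry_speech.wav")
print(text_lab)  # e.g. ['ang']
```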
+} \ No newline at end of file diff --git a/data/model_data_json/speechbrain_lang-id-voxlingua107-ecapa.json b/data/model_data_json/speechbrain_lang-id-voxlingua107-ecapa.json new file mode 100644 index 0000000000000000000000000000000000000000..fb0a4ceadf6f95141bd2603edf4bff33c94efa8b --- /dev/null +++ b/data/model_data_json/speechbrain_lang-id-voxlingua107-ecapa.json @@ -0,0 +1,129 @@ +{ + "model_id": "speechbrain/lang-id-voxlingua107-ecapa", + "downloads": 156733, + "tags": [ + "speechbrain", + "audio-classification", + "embeddings", + "Language", + "Identification", + "pytorch", + "ECAPA-TDNN", + "TDNN", + "VoxLingua107", + "multilingual", + "ab", + "af", + "am", + "ar", + "as", + "az", + "ba", + "be", + "bg", + "bi", + "bo", + "br", + "bs", + "ca", + "ceb", + "cs", + "cy", + "da", + "de", + "el", + "en", + "eo", + "es", + "et", + "eu", + "fa", + "fi", + "fo", + "fr", + "gl", + "gn", + "gu", + "gv", + "ha", + "haw", + "hi", + "hr", + "ht", + "hu", + "hy", + "ia", + "id", + "is", + "it", + "he", + "ja", + "jv", + "ka", + "kk", + "km", + "kn", + "ko", + "la", + "lm", + "ln", + "lo", + "lt", + "lv", + "mg", + "mi", + "mk", + "ml", + "mn", + "mr", + "ms", + "mt", + "my", + "ne", + "nl", + "nn", + "no", + "oc", + "pa", + "pl", + "ps", + "pt", + "ro", + "ru", + "sa", + "sco", + "sd", + "si", + "sk", + "sl", + "sn", + "so", + "sq", + "sr", + "su", + "sv", + "sw", + "ta", + "te", + "tg", + "th", + "tk", + "tl", + "tr", + "tt", + "uk", + "ud", + "uz", + "vi", + "war", + "yi", + "yo", + "zh", + "dataset:VoxLingua107", + "arxiv:2106.04624", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: - multilingual - ab - af - am - ar - as - az - ba - be - bg - bi - bo - br - bs - ca - ceb - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fo - fr - gl - gn - gu - gv - ha - haw - hi - hr - ht - hu - hy - ia - id - is - it - he - ja - jv - ka - kk - km - kn - ko - la - lm - ln - lo - lt - lv - mg - mi - mk - ml - mn - mr - ms - mt - my - ne - nl - nn - no - oc - pa - pl - ps - pt - ro - ru - sa - sco - sd - si - sk - sl - sn - so - sq - sr - su - sv - sw - ta - te - tg - th - tk - tl - tr - tt - uk - ud - uz - vi - war - yi - yo - zh thumbnail: tags: - audio-classification - speechbrain - embeddings - Language - Identification - pytorch - ECAPA-TDNN - TDNN - VoxLingua107 license: \"apache-2.0\" datasets: - VoxLingua107 metrics: - Accuracy widget: - example_title: English Sample src: --- # VoxLingua107 ECAPA-TDNN Spoken Language Identification Model ## Model description This is a spoken language recognition model trained on the VoxLingua107 dataset using SpeechBrain. The model uses the ECAPA-TDNN architecture that has previously been used for speaker recognition. However, it uses more fully connected hidden layers after the embedding layer, and cross-entropy loss was used for training. We observed that this improved the performance of extracted utterance embeddings for downstream tasks. The system is trained with recordings sampled at 16kHz (single channel). The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. The model can classify a speech utterance according to the language spoken. 
It covers 107 different languages (Abkhazian, Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Esperanto, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Galician, Guarani, Gujarati, Manx, Hausa, Hawaiian, Hindi, Croatian, Haitian, Hungarian, Armenian, Interlingua, Indonesian, Icelandic, Italian, Hebrew, Japanese, Javanese, Georgian, Kazakh, Central Khmer, Kannada, Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Nepali, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Panjabi, Polish, Pushto, Portuguese, Romanian, Russian, Sanskrit, Scots, Sindhi, Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Waray, Yiddish, Yoruba, Mandarin Chinese). ## Intended uses & limitations The model has two uses: - use 'as is' for spoken language recognition - use as an utterance-level feature (embedding) extractor, for creating a dedicated language ID model on your own data The model is trained on automatically collected YouTube data. For more information about the dataset, see here. #### How to use To perform inference on the GPU, add the GPU device option when calling the method. The system is trained with recordings sampled at 16kHz (single channel). The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. Make sure your input tensor is compliant with the expected sampling rate if you use *encode_batch* and *classify_batch*. Warning: In the dataset and in the defaults of this model (see label_encoder.txt), the ISO language code used for Hebrew is obsolete (it should be he instead of iw). The ISO language code for Javanese is incorrect (it should be jv instead of jw). See issue #2396. #### Limitations and bias Since the model is trained on VoxLingua107, it has many limitations and biases, some of which are: - Its accuracy on smaller languages is probably quite limited - It probably works worse on female speech than male speech (because YouTube data includes much more male speech) - Based on subjective experiments, it doesn't work well on speech with a foreign accent - It probably doesn't work well on children's speech and on persons with speech disorders ## Training data The model is trained on VoxLingua107. VoxLingua107 is a speech dataset for training spoken language identification models. The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according to the language of the video title and description, with some post-processing steps to filter out false positives. VoxLingua107 contains data for 107 languages. The total amount of speech in the training set is 6628 hours. The average amount of data per language is 62 hours. However, the real amount per language varies a lot. There is also a separate development set containing 1609 speech segments from 33 languages, validated by at least two volunteers to really contain the given language. ## Training procedure See the SpeechBrain recipe. ## Evaluation results Error rate: 6.7% on the VoxLingua107 development dataset #### Referencing SpeechBrain ### Referencing VoxLingua107 #### About SpeechBrain SpeechBrain is an open-source and all-in-one speech toolkit.
It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains. Website: GitHub:", + "model_explanation_gemini": "Identifies the spoken language in audio samples from 107 languages using ECAPA-TDNN architecture." +} \ No newline at end of file diff --git a/data/model_data_json/speechbrain_spkrec-ecapa-voxceleb.json b/data/model_data_json/speechbrain_spkrec-ecapa-voxceleb.json new file mode 100644 index 0000000000000000000000000000000000000000..c08ad12febc66d12e953fe4b6e66df60bfa72a84 --- /dev/null +++ b/data/model_data_json/speechbrain_spkrec-ecapa-voxceleb.json @@ -0,0 +1,21 @@ +{ + "model_id": "speechbrain/spkrec-ecapa-voxceleb", + "downloads": 1410815, + "tags": [ + "speechbrain", + "embeddings", + "Speaker", + "Verification", + "Identification", + "pytorch", + "ECAPA", + "TDNN", + "en", + "dataset:voxceleb", + "arxiv:2106.04624", + "license:apache-2.0", + "region:us" + ], + "description": "--- language: \"en\" thumbnail: tags: - speechbrain - embeddings - Speaker - Verification - Identification - pytorch - ECAPA - TDNN license: \"apache-2.0\" datasets: - voxceleb metrics: - EER widget: - example_title: VoxCeleb Speaker id10003 src: - example_title: VoxCeleb Speaker id10004 src: ---

# Speaker Verification with ECAPA-TDNN embeddings on Voxceleb This repository provides all the necessary tools to perform speaker verification with a pretrained ECAPA-TDNN model using SpeechBrain. The system can be used to extract speaker embeddings as well. It is trained on VoxCeleb1 + VoxCeleb2 training data. For a better experience, we encourage you to learn more about SpeechBrain. The model performance on the VoxCeleb1 test set (cleaned) is: | Release | EER(%) | |:-------------:|:--------------:| | 05-03-21 | 0.80 | ## Pipeline description This system is composed of an ECAPA-TDNN model. It is a combination of convolutional and residual blocks. The embeddings are extracted using attentive statistical pooling. The system is trained with Additive Margin Softmax Loss. Speaker Verification is performed using cosine distance between speaker embeddings. ## Install SpeechBrain First of all, please install SpeechBrain with the following command: Please notice that we encourage you to read our tutorials and learn more about SpeechBrain. ### Compute your speaker embeddings The system is trained with recordings sampled at 16kHz (single channel). The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. Make sure your input tensor is compliant with the expected sampling rate if you use *encode_batch* and *classify_batch*. ### Perform Speaker Verification The prediction is 1 if the two input signals are from the same speaker and 0 otherwise. ### Inference on GPU To perform inference on the GPU, add the GPU device option when calling the method. ### Training The model was trained with SpeechBrain (aa018540). To train it from scratch, follow these steps: 1. Clone SpeechBrain: 2. Install it: 3. Run Training: You can find our training results (models, logs, etc.) here. ### Limitations The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets. #### Referencing ECAPA-TDNN # **Citing SpeechBrain** Please cite SpeechBrain if you use it for your research or business. # **About SpeechBrain** - Website: - Code: - HuggingFace:", + "model_explanation_gemini": "Generates speaker embeddings and performs speaker verification using an ECAPA-TDNN model trained on VoxCeleb data." +} \ No newline at end of file diff --git a/data/model_data_json/sshleifer_distilbart-cnn-12-6.json b/data/model_data_json/sshleifer_distilbart-cnn-12-6.json new file mode 100644 index 0000000000000000000000000000000000000000..498466f38214a4d11899c57123252597392e6400 --- /dev/null +++ b/data/model_data_json/sshleifer_distilbart-cnn-12-6.json @@ -0,0 +1,22 @@ +{ + "model_id": "sshleifer/distilbart-cnn-12-6", + "downloads": 519698, + "tags": [ + "transformers", + "pytorch", + "jax", + "rust", + "bart", + "text2text-generation", + "summarization", + "en", + "dataset:cnn_dailymail", + "dataset:xsum", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - summarization license: apache-2.0 datasets: - cnn_dailymail - xsum thumbnail: --- ### Usage This checkpoint should be loaded into BartForConditionalGeneration. See the BART docs for more information.
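As a minimal sketch of that usage (the article text is our own placeholder):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_id = "sshleifer/distilbart-cnn-12-6"
tokenizer = BartTokenizer.from_pretrained(model_id)
model = BartForConditionalGeneration.from_pretrained(model_id)

# Placeholder article text (our own example).
article = (
    "The Eiffel Tower is 324 metres tall, about the same height as an "
    "81-storey building, and was the tallest man-made structure in the "
    "world for 41 years."
)
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(inputs.input_ids, num_beams=4, min_length=10, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```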
### Metrics for DistilBART models | Model Name | MM Params | Inference Time (MS) | Speedup | Rouge 2 | Rouge-L | |:---------------------------|------------:|----------------------:|----------:|----------:|----------:| | distilbart-xsum-12-1 | 222 | 90 | 2.54 | 18.31 | 33.37 | | distilbart-xsum-6-6 | 230 | 132 | 1.73 | 20.92 | 35.73 | | distilbart-xsum-12-3 | 255 | 106 | 2.16 | 21.37 | 36.39 | | distilbart-xsum-9-6 | 268 | 136 | 1.68 | 21.72 | 36.61 | | bart-large-xsum (baseline) | 406 | 229 | 1 | 21.85 | 36.50 | | distilbart-xsum-12-6 | 306 | 137 | 1.68 | 22.12 | 36.99 | | bart-large-cnn (baseline) | 406 | 381 | 1 | 21.06 | 30.63 | | distilbart-12-3-cnn | 255 | 214 | 1.78 | 20.57 | 30.00 | | distilbart-12-6-cnn | 306 | 307 | 1.24 | 21.26 | 30.59 | | distilbart-6-6-cnn | 230 | 182 | 2.09 | 20.17 | 29.70 |", + "model_explanation_gemini": "A distilled version of BART fine-tuned for English text summarization, optimized for efficiency and performance on datasets like CNN/DailyMail and XSum." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_sd-turbo.json b/data/model_data_json/stabilityai_sd-turbo.json new file mode 100644 index 0000000000000000000000000000000000000000..8e61393811234310c2a46ca7a83884e0ed87b7d7 --- /dev/null +++ b/data/model_data_json/stabilityai_sd-turbo.json @@ -0,0 +1,14 @@ +{ + "model_id": "stabilityai/sd-turbo", + "downloads": 538776, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "autotrain_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- pipeline_tag: text-to-image inference: false --- # SD-Turbo Model Card !row01 SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. We release SD-Turbo as a research artifact, and to study small, distilled text-to-image models. For increased quality and prompt understanding, we recommend SDXL-Turbo. Please note: For commercial use, please refer to ## Model Details ### Model Description SD-Turbo is a distilled version of Stable Diffusion 2.1, trained for real-time synthesis. SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. - **Developed by:** Stability AI - **Funded by:** Stability AI - **Model type:** Generative text-to-image model - **Finetuned from model:** Stable Diffusion 2.1 ### Model Sources For research purposes, we recommend our Github repository ( which implements the most popular diffusion frameworks (both training and inference). - **Repository:** - **Paper:** - **Demo [for the bigger SDXL-Turbo]:** ## Evaluation !comparison1 !comparison2 The charts above evaluate user preference for SD-Turbo over other single- and multi-step models. SD-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-Lora XL and LCM-Lora 1.5. **Note:** For increased quality, we recommend the bigger version SDXL-Turbo. For details on the user study, we refer to the research paper. ## Uses ### Direct Use The model is intended for both non-commercial and commercial usage. 
Possible research areas and tasks include - Research on generative models. - Research on real-time applications of generative models. - Research on the impact of real-time generative models. - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. For commercial use, please refer to Excluded uses are described below. ### Diffusers - **Text-to-image**: SD-Turbo does not make use of or , we disable it with . Preferably, the model generates images of size 512x512 but higher image sizes work as well. A **single step** is enough to generate high quality images. - **Image-to-image**: When using SD-Turbo for image-to-image generation, make sure that * is larger or equal to 1. The image-to-image pipeline will run for steps, *e.g.* 0.5 * 2.0 = 1 step in our example below. ### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy. ## Limitations and Bias ### Limitations - The quality and prompt alignment is lower than that of SDXL-Turbo. - The generated images are of a fixed resolution (512x512 pix), and the model does not achieve perfect photorealism. - The model cannot render legible text. - Faces and people in general may not be generated properly. - The autoencoding part of the model is lossy. ### Recommendations The model is intended for both non-commercial and commercial usage. ## How to Get Started with the Model Check out", + "model_explanation_gemini": "SD-Turbo is a fast, distilled text-to-image model generating photorealistic images from text prompts in a single step, optimized for real-time synthesis using Adversarial Diffusion Distillation." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_sd-vae-ft-mse.json b/data/model_data_json/stabilityai_sd-vae-ft-mse.json new file mode 100644 index 0000000000000000000000000000000000000000..f98db28376e8a290b81c3742bb9ff658a6a27915 --- /dev/null +++ b/data/model_data_json/stabilityai_sd-vae-ft-mse.json @@ -0,0 +1,14 @@ +{ + "model_id": "stabilityai/sd-vae-ft-mse", + "downloads": 120997, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "license:mit", + "region:us" + ], + "description": "--- license: mit tags: - stable-diffusion - stable-diffusion-diffusers inference: false --- # Improved Autoencoders ## Utilizing These weights are intended to be used with the 🧨 diffusers library. If you are looking for the model to use with the original CompVis Stable Diffusion codebase, come here. #### How to use with 🧨 diffusers You can integrate this fine-tuned VAE decoder to your existing workflows, by including a argument to the ## Decoder Finetuning We publish two kl-f8 autoencoder versions, finetuned from the original kl-f8 autoencoder on a 1:1 ratio of LAION-Aesthetics and LAION-Humans, an unreleased subset containing only SFW images of humans. The intent was to fine-tune on the Stable Diffusion training set (the autoencoder was originally trained on OpenImages) but also enrich the dataset with images of humans to improve the reconstruction of faces. 
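The elided integration in the "How to use with 🧨 diffusers" section above is, in current diffusers releases, a single `vae` argument to `from_pretrained`. A sketch under that assumption (the SD 1.x base checkpoint id is illustrative, not from this card):

```python
# Sketch: plugging the fine-tuned decoder into an existing pipeline via the
# `vae` argument of from_pretrained. The base checkpoint id is an assumption.
from diffusers import AutoencoderKL, StableDiffusionPipeline

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any compatible SD checkpoint
    vae=vae,
)
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```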
The first, _ft-EMA_, was resumed from the original checkpoint, trained for 313198 steps and uses EMA weights. It uses the same loss configuration as the original checkpoint (L1 + LPIPS). The second, _ft-MSE_, was resumed from _ft-EMA_ and uses EMA weights and was trained for another 280k steps using a different loss, with more emphasis on MSE reconstruction (MSE + 0.1 * LPIPS). It produces somewhat ``smoother'' outputs. The batch size for both versions was 192 (16 A100s, batch size 12 per GPU). To keep compatibility with existing models, only the decoder part was finetuned; the checkpoints can be used as a drop-in replacement for the existing autoencoder. _Original kl-f8 VAE vs f8-ft-EMA vs f8-ft-MSE_ ## Evaluation ### COCO 2017 (256x256, val, 5000 images) | Model | train steps | rFID | PSNR | SSIM | PSIM | Link | Comments |----------|---------|------|--------------|---------------|---------------|-----------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------| | | | | | | | | | | original | 246803 | 4.99 | 23.4 +/- 3.8 | 0.69 +/- 0.14 | 1.01 +/- 0.28 | | as used in SD | | ft-EMA | 560001 | 4.42 | 23.8 +/- 3.9 | 0.69 +/- 0.13 | 0.96 +/- 0.27 | | slightly better overall, with EMA | | ft-MSE | 840001 | 4.70 | 24.5 +/- 3.7 | 0.71 +/- 0.13 | 0.92 +/- 0.27 | | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs | ### LAION-Aesthetics 5+ (256x256, subset, 10000 images) | Model | train steps | rFID | PSNR | SSIM | PSIM | Link | Comments |----------|-----------|------|--------------|---------------|---------------|-----------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------| | | | | | | | | | | original | 246803 | 2.61 | 26.0 +/- 4.4 | 0.81 +/- 0.12 | 0.75 +/- 0.36 | | as used in SD | | ft-EMA | 560001 | 1.77 | 26.7 +/- 4.8 | 0.82 +/- 0.12 | 0.67 +/- 0.34 | | slightly better overall, with EMA | | ft-MSE | 840001 | 1.88 | 27.3 +/- 4.7 | 0.83 +/- 0.11 | 0.65 +/- 0.34 | | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs | ### Visual _Visualization of reconstructions on 256x256 images from the COCO2017 validation dataset._
256x256: ft-EMA (left), ft-MSE (middle), original (right)
", + "model_explanation_gemini": "A fine-tuned VAE decoder for Stable Diffusion that improves image reconstruction, particularly for faces, by emphasizing MSE loss for smoother outputs while maintaining compatibility as a drop-in replacement for existing autoencoders." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_sdxl-turbo.json b/data/model_data_json/stabilityai_sdxl-turbo.json new file mode 100644 index 0000000000000000000000000000000000000000..8a16ea5abe8814da146a2e589147de084715b61e --- /dev/null +++ b/data/model_data_json/stabilityai_sdxl-turbo.json @@ -0,0 +1,16 @@ +{ + "model_id": "stabilityai/sdxl-turbo", + "downloads": 317782, + "tags": [ + "diffusers", + "onnx", + "safetensors", + "text-to-image", + "license:other", + "autotrain_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- pipeline_tag: text-to-image inference: false license: other license_name: sai-nc-community license_link: --- # SDXL-Turbo Model Card !row01 SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. A real-time demo is available here: Please note: For commercial use, please refer to ## Model Details ### Model Description SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis. SDXL-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. - **Developed by:** Stability AI - **Funded by:** Stability AI - **Model type:** Generative text-to-image model - **Finetuned from model:** SDXL 1.0 Base ### Model Sources For research purposes, we recommend our Github repository ( which implements the most popular diffusion frameworks (both training and inference). - **Repository:** - **Paper:** - **Demo:** ## Evaluation !comparison1 !comparison2 The charts above evaluate user preference for SDXL-Turbo over other single- and multi-step models. SDXL-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-XL evaluated at four (or fewer) steps. In addition, we see that using four steps for SDXL-Turbo further improves performance. For details on the user study, we refer to the research paper. ## Uses ### Direct Use The model is intended for both non-commercial and commercial usage. You can use this model for non-commercial or research purposes under this license. Possible research areas and tasks include - Research on generative models. - Research on real-time applications of generative models. - Research on the impact of real-time generative models. - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. For commercial use, please refer to Excluded uses are described below. ### Diffusers - **Text-to-image**: SDXL-Turbo does not make use of or , we disable it with . 
Preferably, the model generates images of size 512x512 but higher image sizes work as well. A **single step** is enough to generate high quality images. - **Image-to-image**: When using SDXL-Turbo for image-to-image generation, make sure that * is larger or equal to 1. The image-to-image pipeline will run for steps, *e.g.* 0.5 * 2.0 = 1 step in our example below. ### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy. ## Limitations and Bias ### Limitations - The generated images are of a fixed resolution (512x512 pix), and the model does not achieve perfect photorealism. - The model cannot render legible text. - Faces and people in general may not be generated properly. - The autoencoding part of the model is lossy. ### Recommendations The model is intended for both non-commercial and commercial usage. ## How to Get Started with the Model Check out", + "model_explanation_gemini": "SDXL-Turbo is a fast, distilled text-to-image model generating photorealistic images from text prompts in a single evaluation step, optimized for real-time synthesis using adversarial diffusion distillation." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_sdxl-vae.json b/data/model_data_json/stabilityai_sdxl-vae.json new file mode 100644 index 0000000000000000000000000000000000000000..0e1317bdbfacc38fd8ade3497bcd6dfc810ba208 --- /dev/null +++ b/data/model_data_json/stabilityai_sdxl-vae.json @@ -0,0 +1,15 @@ +{ + "model_id": "stabilityai/sdxl-vae", + "downloads": 351705, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "stable-diffusion-diffusers", + "arxiv:2112.10752", + "license:mit", + "region:us" + ], + "description": "--- license: mit tags: - stable-diffusion - stable-diffusion-diffusers inference: false --- # SDXL - VAE #### How to use with 🧨 diffusers You can integrate this fine-tuned VAE decoder to your existing workflows, by including a argument to the ## Model SDXL is a latent diffusion model, where the diffusion operates in a pretrained, learned (and fixed) latent space of an autoencoder. While the bulk of the semantic composition is done by the latent diffusion model, we can improve _local_, high-frequency details in generated images by improving the quality of the autoencoder. To this end, we train the same autoencoder architecture used for the original Stable Diffusion at a larger batch-size (256 vs 9) and additionally track the weights with an exponential moving average (EMA). The resulting autoencoder outperforms the original model in all evaluated reconstruction metrics, see the table below. 
## Evaluation _SDXL-VAE vs original kl-f8 VAE vs f8-ft-MSE_ ### COCO 2017 (256x256, val, 5000 images) | Model | rFID | PSNR | SSIM | PSIM | Link | Comments |----------|------|--------------|---------------|---------------|------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------| | | | | | | | | | SDXL-VAE | 4.42 | 24.7 +/- 3.9 | 0.73 +/- 0.13 | 0.88 +/- 0.27 | | as used in SDXL | | original | 4.99 | 23.4 +/- 3.8 | 0.69 +/- 0.14 | 1.01 +/- 0.28 | | as used in SD | | ft-MSE | 4.70 | 24.5 +/- 3.7 | 0.71 +/- 0.13 | 0.92 +/- 0.27 | | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs |", + "model_explanation_gemini": "An improved VAE decoder for SDXL that enhances local, high-frequency details in generated images by training with larger batches and EMA weight tracking, outperforming the original in reconstruction metrics." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-2-1-base.json b/data/model_data_json/stabilityai_stable-diffusion-2-1-base.json new file mode 100644 index 0000000000000000000000000000000000000000..7bce8806de0296ae3e0662d011ccc12d4efc20ae --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-2-1-base.json @@ -0,0 +1,20 @@ +{ + "model_id": "stabilityai/stable-diffusion-2-1-base", + "downloads": 209446, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "text-to-image", + "arxiv:2112.10752", + "arxiv:2202.00512", + "arxiv:1910.09700", + "license:openrail++", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: openrail++ tags: - stable-diffusion - text-to-image --- # Stable Diffusion v2-1-base Model Card This model card focuses on the model associated with the Stable Diffusion v2-1-base model. This model fine-tunes stable-diffusion-2-base () with 220k extra steps taken, with on the same dataset. - Use it with the []( repository: download the here. - Use it with 🧨 [](#examples) ## Model Details - **Developed by:** Robin Rombach, Patrick Esser - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). - **Resources for more information:** GitHub Repository. - **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } ## Examples Using the 🤗's Diffusers library to run Stable Diffusion 2 in a simple and efficient manner. 
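A sketch of the scheduler swap that the next paragraph describes: loading stable-diffusion-2-1-base and replacing the default scheduler with EulerDiscreteScheduler (fp16 and CUDA are assumptions, the model id comes from this card):

```python
# Sketch of the EulerDiscreteScheduler swap described below; fp16/CUDA
# settings are assumptions, the model id comes from this card.
import torch
from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline

model_id = "stabilityai/stable-diffusion-2-1-base"
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut_rides_horse.png")
```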
Running the pipeline (if you don't swap the scheduler it will run with the default PNDM/PLMS scheduler, in this example we are swapping it to EulerDiscreteScheduler): **Notes**: - Despite not being a dependency, we highly recommend you to install xformers for memory efficient attention (better performance) - If you have low GPU RAM available, make sure to add a after sending it to for less VRAM usage (to the cost of speed) # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is originally taken from the DALLE-MINI model card, was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. - Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages. - The autoencoding part of the model is lossy - The model was trained on a subset of the large-scale dataset LAION-5B, which contains adult, violent and sexual content. To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section). ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v2 was primarily trained on subsets of LAION-2B(en), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a \"p_unsafe\" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic. **Training Procedure** Stable Diffusion v2 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through the OpenCLIP-ViT/H text-encoder. - The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called _v-objective_, see We currently provide the following checkpoints, for various versions: ### Version 2.1 - : Fine-tuned on 2.0 with 220k extra steps taken, with on the same dataset. - : Resumed from 2.0 with an additional 55k steps on the same dataset (), and then fine-tuned for another 155k extra steps with . ### Version 2.0 - : 550k steps at resolution on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with and an aesthetic score >= . 850k steps at resolution on the same dataset with resolution . - : Resumed from and trained for 150k steps using a v-objective on the same dataset. Resumed for another 140k steps on a subset of our dataset. - : Resumed from and finetuned for 200k steps. Added an extra input channel to process the (relative) depth prediction produced by MiDaS () which is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. - : Resumed from and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, are used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint. - : Trained for 1.25M steps on a 10M subset of LAION containing images . The model was trained on crops of size and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule. 
- **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 1 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 steps DDIM sampling steps show the relative improvements of the checkpoints: !pareto Evaluated using 50 DDIM steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 200000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 15000 kg CO2 eq. ## Citation @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } *This model card was written by: Robin Rombach, Patrick Esser and David Ha and is based on the Stable Diffusion v1 and DALL-E Mini model card.*", + "model_explanation_gemini": "Generates and modifies images from text prompts using a latent diffusion model." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-2-1.json b/data/model_data_json/stabilityai_stable-diffusion-2-1.json new file mode 100644 index 0000000000000000000000000000000000000000..e0cf8a7b199f663150e7256c7489c9121943cd29 --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-2-1.json @@ -0,0 +1,20 @@ +{ + "model_id": "stabilityai/stable-diffusion-2-1", + "downloads": 920856, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "text-to-image", + "arxiv:2112.10752", + "arxiv:2202.00512", + "arxiv:1910.09700", + "license:openrail++", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: openrail++ tags: - stable-diffusion - text-to-image pinned: true --- # Stable Diffusion v2-1 Model Card This model card focuses on the model associated with the Stable Diffusion v2-1 model, codebase available here. This model is fine-tuned from stable-diffusion-2 () with an additional 55k steps on the same dataset (with ), and then fine-tuned for another 155k extra steps with . - Use it with the []( repository: download the here. - Use it with 🧨 [](#examples) ## Model Details - **Developed by:** Robin Rombach, Patrick Esser - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). - **Resources for more information:** GitHub Repository. 
- **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } ## Examples Using the 🤗's Diffusers library to run Stable Diffusion 2 in a simple and efficient manner. Running the pipeline (if you don't swap the scheduler it will run with the default DDIM, in this example we are swapping it to DPMSolverMultistepScheduler): **Notes**: - Despite not being a dependency, we highly recommend you to install xformers for memory efficient attention (better performance) - If you have low GPU RAM available, make sure to add a after sending it to for less VRAM usage (to the cost of speed) # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is originally taken from the DALLE-MINI model card, was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. - Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages. - The autoencoding part of the model is lossy - The model was trained on a subset of the large-scale dataset LAION-5B, which contains adult, violent and sexual content. 
To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section). ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion was primarily trained on subsets of LAION-2B(en), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a \"p_unsafe\" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic. **Training Procedure** Stable Diffusion v2 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through the OpenCLIP-ViT/H text-encoder. - The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called _v-objective_, see We currently provide the following checkpoints: - : 550k steps at resolution on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with and an aesthetic score >= . 850k steps at resolution on the same dataset with resolution . - : Resumed from and trained for 150k steps using a v-objective on the same dataset. Resumed for another 140k steps on a subset of our dataset. - : Resumed from and finetuned for 200k steps. Added an extra input channel to process the (relative) depth prediction produced by MiDaS () which is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. - : Resumed from and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, are used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint. - : Trained for 1.25M steps on a 10M subset of LAION containing images . The model was trained on crops of size and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.
- **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 1 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 steps DDIM sampling steps show the relative improvements of the checkpoints: !pareto Evaluated using 50 DDIM steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 200000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 15000 kg CO2 eq. ## Citation @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } *This model card was written by: Robin Rombach, Patrick Esser and David Ha and is based on the Stable Diffusion v1 and DALL-E Mini model card.*", + "model_explanation_gemini": "Generates and modifies images from text prompts using a latent diffusion-based text-to-image approach." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-2-base.json b/data/model_data_json/stabilityai_stable-diffusion-2-base.json new file mode 100644 index 0000000000000000000000000000000000000000..c0ae9f3aea65d00a7ca525cca1bfe9297d9045cb --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-2-base.json @@ -0,0 +1,20 @@ +{ + "model_id": "stabilityai/stable-diffusion-2-base", + "downloads": 545982, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "text-to-image", + "arxiv:2112.10752", + "arxiv:2202.00512", + "arxiv:1910.09700", + "license:openrail++", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: openrail++ tags: - stable-diffusion - text-to-image --- # Stable Diffusion v2-base Model Card This model card focuses on the model associated with the Stable Diffusion v2-base model, available here. The model is trained from scratch 550k steps at resolution on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with and an aesthetic score >= . Then it is further trained for 850k steps at resolution on the same dataset on images with resolution . !image - Use it with the []( repository: download the here. - Use it with 🧨 []( ## Model Details - **Developed by:** Robin Rombach, Patrick Esser - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). 
- **Resources for more information:** GitHub Repository. - **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } ## Examples Using the 🤗's Diffusers library to run Stable Diffusion 2 in a simple and efficient manner. Running the pipeline (if you don't swap the scheduler it will run with the default PNDM/PLMS scheduler, in this example we are swapping it to EulerDiscreteScheduler): **Notes**: - Despite not being a dependency, we highly recommend you to install xformers for memory efficient attention (better performance) - If you have low GPU RAM available, make sure to add a after sending it to for less VRAM usage (to the cost of speed) # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is originally taken from the DALLE-MINI model card, was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. - Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages. - The autoencoding part of the model is lossy - The model was trained on a subset of the large-scale dataset LAION-5B, which contains adult, violent and sexual content. 
To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section). ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v2 was primarily trained on subsets of LAION-2B(en), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a \"p_unsafe\" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic. **Training Procedure** Stable Diffusion v2 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through the OpenCLIP-ViT/H text-encoder. - The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called _v-objective_, see We currently provide the following checkpoints: - : 550k steps at resolution on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with and an aesthetic score >= . 850k steps at resolution on the same dataset with resolution . - : Resumed from and trained for 150k steps using a v-objective on the same dataset. Resumed for another 140k steps on a subset of our dataset. - : Resumed from and finetuned for 200k steps. Added an extra input channel to process the (relative) depth prediction produced by MiDaS () which is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. - : Resumed from and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, are used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint. - : Trained for 1.25M steps on a 10M subset of LAION containing images . The model was trained on crops of size and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.
- **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 1 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 steps DDIM sampling steps show the relative improvements of the checkpoints: !pareto Evaluated using 50 DDIM steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 200000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 15000 kg CO2 eq. ## Citation @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } *This model card was written by: Robin Rombach, Patrick Esser and David Ha and is based on the Stable Diffusion v1 and DALL-E Mini model card.*", + "model_explanation_gemini": "Generates and modifies images from text prompts using a latent diffusion-based text-to-image approach." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-2-inpainting.json b/data/model_data_json/stabilityai_stable-diffusion-2-inpainting.json new file mode 100644 index 0000000000000000000000000000000000000000..6d6f107ec98938a2c00cc89aa1061a04195170be --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-2-inpainting.json @@ -0,0 +1,17 @@ +{ + "model_id": "stabilityai/stable-diffusion-2-inpainting", + "downloads": 218478, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "arxiv:2112.10752", + "arxiv:2202.00512", + "arxiv:1910.09700", + "license:openrail++", + "diffusers:StableDiffusionInpaintPipeline", + "region:us" + ], + "description": "--- license: openrail++ tags: - stable-diffusion inference: false --- # Stable Diffusion v2 Model Card This model card focuses on the model associated with the Stable Diffusion v2, available here. This model is resumed from stable-diffusion-2-base () and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, are used as an additional conditioning. !image - Use it with the []( repository: download the here. - Use it with 🧨 []( ## Model Details - **Developed by:** Robin Rombach, Patrick Esser - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). - **Resources for more information:** GitHub Repository. 
- **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } ## Examples Using the 🤗's Diffusers library to run Stable Diffusion 2 inpainting in a simple and efficient manner. **Notes**: - Despite not being a dependency, we highly recommend you to install xformers for memory efficient attention (better performance) - If you have low GPU RAM available, make sure to add a after sending it to for less VRAM usage (to the cost of speed) **How it works:** | :-------------------------:|:-------------------------:| \"drawing\" | \"drawing\" | :-------------------------:|:-------------------------:| Face of a yellow cat, high resolution, sitting on a park bench | \"drawing\" # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is originally taken from the DALLE-MINI model card, was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. - Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages. 
- The autoencoding part of the model is lossy - The model was trained on a subset of the large-scale dataset LAION-5B, which contains adult, violent and sexual content. To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section). ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v2 was primarily trained on subsets of LAION-2B(en), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a \"p_unsafe\" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic. **Training Procedure** Stable Diffusion v2 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through the OpenCLIP-ViT/H text-encoder. - The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called _v-objective_, see We currently provide the following checkpoints: - : 550k steps at resolution on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with and an aesthetic score >= . 850k steps at resolution on the same dataset with resolution . - : Resumed from and trained for 150k steps using a v-objective on the same dataset. Resumed for another 140k steps on a subset of our dataset. - : Resumed from and finetuned for 200k steps. Added an extra input channel to process the (relative) depth prediction produced by MiDaS () which is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. - : Resumed from and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, are used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint. - : Trained for 1.25M steps on a 10M subset of LAION containing images . The model was trained on crops of size and is a text-guided latent upscaling diffusion model.
In addition to the textual input, it receives a as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule. - **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 1 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 DDIM sampling steps show the relative improvements of the checkpoints: !pareto Evaluated using 50 DDIM steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 200000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 15000 kg CO2 eq. ## Citation @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } *This model card was written by: Robin Rombach, Patrick Esser and David Ha and is based on the Stable Diffusion v1 and DALL-E Mini model card.*", + "model_explanation_gemini": "Generates and modifies images based on text prompts using inpainting with latent diffusion and mask-conditioning." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-2.json b/data/model_data_json/stabilityai_stable-diffusion-2.json new file mode 100644 index 0000000000000000000000000000000000000000..c8d48a010f5b46657c38b186b122ea220a9beda1 --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-2.json @@ -0,0 +1,20 @@ +{ + "model_id": "stabilityai/stable-diffusion-2", + "downloads": 264043, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "text-to-image", + "arxiv:2202.00512", + "arxiv:2112.10752", + "arxiv:1910.09700", + "license:openrail++", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionPipeline", + "region:us" + ], + "description": "--- license: openrail++ tags: - stable-diffusion - text-to-image --- # Stable Diffusion v2 Model Card This model card focuses on the model associated with the Stable Diffusion v2 model, available here. This model is resumed from stable-diffusion-2-base () and trained for 150k steps using a v-objective on the same dataset. Resumed for another 140k steps on images. !image - Use it with the []( repository: download the here. - Use it with 🧨 []( ## Model Details - **Developed by:** Robin Rombach, Patrick Esser - **Model type:** Diffusion-based text-to-image generation model - **Language(s):** English - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts.
It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). - **Resources for more information:** GitHub Repository. - **Cite as:** @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } ## Examples Using the 🤗 Diffusers library to run Stable Diffusion 2 in a simple and efficient manner. Running the pipeline (if you don't swap the scheduler it will run with the default DDIM, in this example we are swapping it to EulerDiscreteScheduler): **Notes**: - Despite not being a dependency, we highly recommend installing xformers for memory-efficient attention (better performance) - If you have low GPU RAM available, make sure to add a after sending it to for less VRAM usage (at the cost of speed) # Uses ## Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. Excluded uses are described below. ### Misuse, Malicious Use, and Out-of-Scope Use _Note: This section is originally taken from the DALLE-MINI model card, was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2_. The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes. #### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. #### Misuse and Malicious Use Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to: - Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc. - Intentionally promoting or propagating discriminatory content or harmful stereotypes. - Impersonating individuals without their consent. - Sexual content without consent of the people who might see it. - Mis- and disinformation - Representations of egregious violence and gore - Sharing of copyrighted or licensed material in violation of its terms of use. - Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The model was trained mainly with English captions and will not work as well in other languages.
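The pipeline-running snippet described above (with the scheduler swapped to EulerDiscreteScheduler) is not included here; a minimal sketch under the current Diffusers API, with an illustrative prompt:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"
# Swap the default DDIM scheduler for EulerDiscreteScheduler, as described above.
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# pipe.enable_attention_slicing()  # optional: less VRAM at the cost of speed

image = pipe("a photo of an astronaut riding a horse on mars").images[0]  # illustrative prompt
image.save("astronaut.png")
```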
- The autoencoding part of the model is lossy - The model was trained on a subset of the large-scale dataset LAION-5B, which contains adult, violent and sexual content. To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section). ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion was primarily trained on subsets of LAION-2B(en), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent. ## Training **Training Data** The model developers used the following dataset for training the model: - LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a \"p_unsafe\" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic. **Training Procedure** Stable Diffusion v2 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training, - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4 - Text prompts are encoded through the OpenCLIP-ViT/H text-encoder. - The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called _v-objective_, see We currently provide the following checkpoints: - : 550k steps at resolution on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with and an aesthetic score >= . 850k steps at resolution on the same dataset with resolution . - : Resumed from and trained for 150k steps using a v-objective on the same dataset. Resumed for another 140k steps on a subset of our dataset. - : Resumed from and finetuned for 200k steps. Added an extra input channel to process the (relative) depth prediction produced by MiDaS () which is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. - : Resumed from and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, are used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint. - : Trained for 1.25M steps on a 10M subset of LAION containing images . The model was trained on crops of size and is a text-guided latent upscaling diffusion model.
In addition to the textual input, it receives a as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule. - **Hardware:** 32 x 8 x A100 GPUs - **Optimizer:** AdamW - **Gradient Accumulations**: 1 - **Batch:** 32 x 8 x 2 x 4 = 2048 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant ## Evaluation Results Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 DDIM sampling steps show the relative improvements of the checkpoints: !pareto Evaluated using 50 DDIM steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores. ## Environmental Impact **Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact. - **Hardware Type:** A100 PCIe 40GB - **Hours used:** 200000 - **Cloud Provider:** AWS - **Compute Region:** US-east - **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 15000 kg CO2 eq. ## Citation @InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\\\"orn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} } *This model card was written by: Robin Rombach, Patrick Esser and David Ha and is based on the Stable Diffusion v1 and DALL-E Mini model card.*", + "model_explanation_gemini": "Generates and modifies images from text prompts using a latent diffusion model." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-3-medium-diffusers.json b/data/model_data_json/stabilityai_stable-diffusion-3-medium-diffusers.json new file mode 100644 index 0000000000000000000000000000000000000000..afa3f17b8aa772392407f23268922a2bf230d4ef --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-3-medium-diffusers.json @@ -0,0 +1,17 @@ +{ + "model_id": "stabilityai/stable-diffusion-3-medium-diffusers", + "downloads": 103527, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "stable-diffusion", + "en", + "arxiv:2403.03206", + "license:other", + "diffusers:StableDiffusion3Pipeline", + "region:us" + ], + "description": "--- license: other license_name: stabilityai-nc-research-community license_link: LICENSE tags: - text-to-image - stable-diffusion extra_gated_prompt: >- By clicking \"Agree\", you agree to the License Agreement and acknowledge Stability AI's Privacy Policy.
extra_gated_fields: Name: text Email: text Country: country Organization or Affiliation: text Receive email updates and promotions on Stability AI products, services, and research?: type: select options: - 'Yes' - 'No' I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Stability AI: checkbox language: - en pipeline_tag: text-to-image --- # Stable Diffusion 3 Medium !sd3 demo images ## Model !mmdit Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. For more technical details, please refer to the Research paper. Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai or contact us for commercial licensing details. ### Model Description - **Developed by:** Stability AI - **Model type:** MMDiT text-to-image generative model - **Model Description:** This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer ( that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L and T5-xxl) ### License - **Non-commercial Use:** Stable Diffusion 3 Medium is released under the Stability AI Non-Commercial Research Community License. The model is free to use for non-commercial purposes such as academic research. - **Commercial Use**: This model is not available for commercial use without a separate commercial license from Stability. We encourage professional artists, designers, and creators to use our Creator License. Please visit to learn more. ### Model Sources For local or self-hosted use, we recommend ComfyUI for inference. Stable Diffusion 3 Medium is available on our Stability API Platform. Stable Diffusion 3 models and workflows are available on Stable Assistant and on Discord via Stable Artisan. - **ComfyUI:** - **StableSwarmUI:** - **Tech report:** - **Demo:** ## Training Dataset We used synthetic data and filtered publicly available data to train our models. The model was pre-trained on 1 billion images. The fine-tuning data includes 30M high-quality aesthetic images focused on specific visual content and style, as well as 3M preference data images. ## Using with Diffusers Make sure you upgrade to the latest version of : . And then you can run: Refer to the documentation for more details on optimization and image-to-image support. ## Uses ### Intended Uses Intended uses include the following: * Generation of artworks and use in design and other artistic processes. * Applications in educational or creative tools. * Research on generative models, including understanding the limitations of generative models. All uses of the model should be in accordance with our Acceptable Use Policy. ### Out-of-Scope Uses The model was not trained to be factual or true representations of people or events. As such, using the model to generate such content is out-of-scope of the abilities of this model. ## Safety As part of our safety-by-design and responsible AI deployment approach, we implement safety measures throughout the development of our models, from the time we begin pre-training a model to the ongoing development, fine-tuning, and deployment of each model. 
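The "Using with Diffusers" code referenced above is omitted; a minimal sketch assuming the `StableDiffusion3Pipeline` API from a recent diffusers release (prompt and sampling settings are illustrative assumptions):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe(
    "a cat holding a sign that says hello world",  # illustrative prompt
    num_inference_steps=28,                        # assumed settings; tune as needed
    guidance_scale=7.0,
).images[0]
image.save("sd3_medium.png")
```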
We have implemented a number of safety mitigations that are intended to reduce the risk of severe harms, however we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases. For more about our approach to Safety, please visit our Safety page. ### Evaluation Approach Our evaluation methods include structured evaluations and internal and external red-teaming testing for specific, severe harms such as child sexual abuse and exploitation, extreme violence, and gore, sexually explicit content, and non-consensual nudity. Testing was conducted primarily in English and may not cover all possible harms. As with any model, the model may, at times, produce inaccurate, biased or objectionable responses to user prompts. ### Risks identified and mitigations: * Harmful content: We have used filtered data sets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. The model may, at times, generate toxic or biased content. All developers and deployers should exercise caution and implement content safety guardrails based on their specific product policies and application use cases. * Misuse: Technical limitations and developer and end-user education can help mitigate against malicious applications of models. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt engineering mechanisms. Please reference the Stability AI Acceptable Use Policy for information on violative uses of our products. * Privacy violations: Developers and deployers are encouraged to adhere to privacy regulations with techniques that respect data privacy. ### Contact Please report any issues with the model or contact us: * Safety issues: safety@stability.ai * Security issues: security@stability.ai * Privacy issues: privacy@stability.ai * License and general: * Enterprise license:", + "model_explanation_gemini": "Generates high-quality images from text prompts using a Multimodal Diffusion Transformer (MMDiT) for improved performance in typography, prompt understanding, and efficiency." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-3.5-large.json b/data/model_data_json/stabilityai_stable-diffusion-3.5-large.json new file mode 100644 index 0000000000000000000000000000000000000000..3182bb79ef8aab49d2b205ea31cfa2837db2d49e --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-3.5-large.json @@ -0,0 +1,16 @@ +{ + "model_id": "stabilityai/stable-diffusion-3.5-large", + "downloads": 141943, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "stable-diffusion", + "en", + "arxiv:2403.03206", + "license:other", + "diffusers:StableDiffusion3Pipeline", + "region:us" + ], + "description": "--- license: other license_name: stabilityai-ai-community license_link: LICENSE.md tags: - text-to-image - stable-diffusion - diffusers inference: true extra_gated_prompt: >- By clicking \"Agree\", you agree to the License Agreement and acknowledge Stability AI's Privacy Policy. 
extra_gated_fields: Name: text Email: text Country: country Organization or Affiliation: text Receive email updates and promotions on Stability AI products, services, and research?: type: select options: - 'Yes' - 'No' What do you intend to use the model for?: type: select options: - Research - Personal use - Creative Professional - Startup - Enterprise I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox language: - en pipeline_tag: text-to-image --- # Stable Diffusion 3.5 Large !3.5 Large Demo Image ## Model !MMDiT Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. Please note: This model is released under the Stability Community License. Visit Stability AI to learn or contact us for commercial licensing details. ### Model Description - **Developed by:** Stability AI - **Model type:** MMDiT text-to-image generative model - **Model Description:** This model generates images based on text prompts. It is a Multimodal Diffusion Transformer that uses three fixed, pretrained text encoders, with QK-normalization to improve training stability. ### License - **Community License:** Free for research, non-commercial, and commercial use for organizations or individuals with less than $1M in total annual revenue. More details can be found in the Community License Agreement. Read more at - **For individuals and organizations with annual revenue above $1M**: please contact us to get an Enterprise License. ### Model Sources For local or self-hosted use, we recommend ComfyUI for node-based UI inference, or diffusers or GitHub for programmatic use. - **ComfyUI:** Github, Example Workflow - **Huggingface Space:** Space - **Diffusers**: See below. - **GitHub**: GitHub. - **API Endpoints:** - Stability AI API - Replicate - Deepinfra ### Implementation Details - **QK Normalization:** Implements the QK normalization technique to improve training stability. - **Text Encoders:** - CLIPs: OpenCLIP-ViT/G, CLIP-ViT/L, context length 77 tokens - T5: T5-xxl, context length 77/256 tokens at different stages of training - **Training Data and Strategy:** This model was trained on a wide variety of data, including synthetic data and filtered publicly available data. For more technical details of the original MMDiT architecture, please refer to the Research paper. ### Model Performance See blog for our study about comparative performance in prompt adherence and aesthetic quality. ## File Structure Click here to access the Files and versions tab ## Using with Diffusers Upgrade to the latest version of the 🧨 diffusers library and then you can run ### Quantizing the model with diffusers Reduce your VRAM usage and have the model fit on 🤏 VRAM GPUs ### Fine-tuning Please see the fine-tuning guide here. ## Uses ### Intended Uses Intended uses include the following: * Generation of artworks and use in design and other artistic processes. * Applications in educational or creative tools. * Research on generative models, including understanding the limitations of generative models. All uses of the model must be in accordance with our Acceptable Use Policy. ### Out-of-Scope Uses The model was not trained to be factual or true representations of people or events. As such, using the model to generate such content is out-of-scope of the abilities of this model.
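The "Using with Diffusers" and quantization snippets referenced above are omitted; a minimal sketch assuming a diffusers release with `BitsAndBytesConfig` support (the NF4 settings and prompt are illustrative assumptions):

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3.5-large"

# NF4-quantize the transformer so the model fits on smaller-VRAM GPUs
# (requires bitsandbytes; settings are assumptions, tune for your hardware).
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    "a capybara holding a sign that reads hello world",  # illustrative prompt
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("sd35_large.png")
```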
## Safety As part of our safety-by-design and responsible AI deployment approach, we take deliberate measures to ensure Integrity starts at the early stages of development. We implement safety measures throughout the development of our models. We have implemented safety mitigations that are intended to reduce the risk of certain harms, however we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases. For more about our approach to Safety, please visit our Safety page. ### Integrity Evaluation Our integrity evaluation methods include structured evaluations and red-teaming testing for certain harms. Testing was conducted primarily in English and may not cover all possible harms. ### Risks identified and mitigations: * Harmful content: We have used filtered data sets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. All developers and deployers should exercise caution and implement content safety guardrails based on their specific product policies and application use cases. * Misuse: Technical limitations and developer and end-user education can help mitigate against malicious applications of models. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt engineering mechanisms. Please reference the Stability AI Acceptable Use Policy for information on violative uses of our products. * Privacy violations: Developers and deployers are encouraged to adhere to privacy regulations with techniques that respect data privacy. ### Contact Please report any issues with the model or contact us: * Safety issues: safety@stability.ai * Security issues: security@stability.ai * Privacy issues: privacy@stability.ai * License and general: * Enterprise license:" +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-3.5-medium.json b/data/model_data_json/stabilityai_stable-diffusion-3.5-medium.json new file mode 100644 index 0000000000000000000000000000000000000000..fedc3a14958912020b3fcdb37e1597b2d7fd2b90 --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-3.5-medium.json @@ -0,0 +1,16 @@ +{ + "model_id": "stabilityai/stable-diffusion-3.5-medium", + "downloads": 407393, + "tags": [ + "diffusers", + "safetensors", + "text-to-image", + "stable-diffusion", + "en", + "arxiv:2403.03206", + "license:other", + "diffusers:StableDiffusion3Pipeline", + "region:us" + ], + "description": "--- license: other license_name: stabilityai-ai-community license_link: LICENSE.md tags: - text-to-image - stable-diffusion - diffusers inference: true extra_gated_prompt: >- By clicking \"Agree\", you agree to the License Agreement and acknowledge Stability AI's Privacy Policy.
extra_gated_fields: Name: text Email: text Country: country Organization or Affiliation: text Receive email updates and promotions on Stability AI products, services, and research?: type: select options: - 'Yes' - 'No' What do you intend to use the model for?: type: select options: - Research - Personal use - Creative Professional - Startup - Enterprise I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox language: - en pipeline_tag: text-to-image --- # Stable Diffusion 3.5 Medium !3.5 Medium Demo Image ## Model !MMDiT-X Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer with improvements (MMDiT-X) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. Please note: This model is released under the Stability Community License. Visit Stability AI to learn or contact us for commercial licensing details. ### Model Description - **Developed by:** Stability AI - **Model type:** MMDiT-X text-to-image generative model - **Model Description:** This model generates images based on text prompts. It is a Multimodal Diffusion Transformer ( with improvements that uses three fixed, pretrained text encoders, with QK-normalization to improve training stability, and dual attention blocks in the first 12 transformer layers. ### License - **Community License:** Free for research, non-commercial, and commercial use for organizations or individuals with less than $1M in total annual revenue. More details can be found in the Community License Agreement. Read more at - **For individuals and organizations with annual revenue above $1M**: please contact us to get an Enterprise License. ### Model Sources For local or self-hosted use, we recommend ComfyUI for node-based UI inference, or diffusers or GitHub for programmatic use. - **ComfyUI:** Github, Example Workflow - **Huggingface Space:** Space - **Diffusers**: See below. - **GitHub**: GitHub. - **API Endpoints:** - Stability AI API ### Implementation Details - **MMDiT-X:** Introduces self-attention modules in the first 13 layers of the transformer, enhancing multi-resolution generation and overall image coherence. - **QK Normalization:** Implements the QK normalization technique to improve training stability. - **Mixed-Resolution Training:** - Progressive training stages: 256 → 512 → 768 → 1024 → 1440 resolution - The final stage included mixed-scale image training to boost multi-resolution generation performance - Extended positional embedding space to 384x384 (latent) at lower resolution stages - Employed random crop augmentation on positional embeddings to enhance transformer layer robustness across the entire range of mixed resolutions and aspect ratios. For example, given a 64x64 latent image, we add a randomly cropped 64x64 embedding from the 192x192 embedding space during training as the input to the x stream. These enhancements collectively contribute to the model's improved performance in multi-resolution image generation, coherence, and adaptability across various text-to-image tasks. - **Text Encoders:** - CLIPs: OpenCLIP-ViT/G, CLIP-ViT/L, context length 77 tokens - T5: T5-xxl, context length 77/256 tokens at different stages of training - **Training Data and Strategy:** This model was trained on a wide variety of data, including synthetic data and filtered publicly available data. For more technical details of the original MMDiT architecture, please refer to the Research paper.
### Usage & Limitations - While this model can handle long prompts, you may observe artifacts on the edge of generations when T5 tokens go over 256. Pay attention to the token limits when using this model in your workflow, and shorten prompts if artifacts become too obvious. - The medium model has a different training data distribution than the large model, so it may not respond to the same prompt similarly. - We recommend sampling with **Skip Layer Guidance** for better structure and anatomy coherency. ### Model Performance See blog for our study about comparative performance in prompt adherence and aesthetic quality. ## File Structure Click here to access the Files and versions tab ## Using with Diffusers Upgrade to the latest version of the 🧨 diffusers library and then you can run ### Quantizing the model with diffusers Reduce your VRAM usage and have the model fit on 🤏 VRAM GPUs ### Fine-tuning Please see the fine-tuning guide here. ## Uses ### Intended Uses Intended uses include the following: * Generation of artworks and use in design and other artistic processes. * Applications in educational or creative tools. * Research on generative models, including understanding the limitations of generative models. All uses of the model must be in accordance with our Acceptable Use Policy. ### Out-of-Scope Uses The model was not trained to be factual or true representations of people or events. As such, using the model to generate such content is out-of-scope of the abilities of this model. ## Safety As part of our safety-by-design and responsible AI deployment approach, we take deliberate measures to ensure Integrity starts at the early stages of development. We implement safety measures throughout the development of our models. We have implemented safety mitigations that are intended to reduce the risk of certain harms, however we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases. For more about our approach to Safety, please visit our Safety page. ### Integrity Evaluation Our integrity evaluation methods include structured evaluations and red-teaming testing for certain harms. Testing was conducted primarily in English and may not cover all possible harms. ### Risks identified and mitigations: * Harmful content: We have used filtered data sets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. All developers and deployers should exercise caution and implement content safety guardrails based on their specific product policies and application use cases. * Misuse: Technical limitations and developer and end-user education can help mitigate against malicious applications of models. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt engineering mechanisms. Please reference the Stability AI Acceptable Use Policy for information on violative uses of our products. * Privacy violations: Developers and deployers are encouraged to adhere to privacy regulations with techniques that respect data privacy.
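As with the larger model, the Diffusers snippet referenced above is omitted; a minimal sketch (the 40-step / 4.5-guidance settings are assumptions commonly suggested for this checkpoint):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

image = pipe(
    "a whimsical illustration of a fox reading a newspaper",  # illustrative prompt
    num_inference_steps=40,   # assumed settings; tune for your use case
    guidance_scale=4.5,
).images[0]
image.save("sd35_medium.png")
```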
### Contact Please report any issues with the model or contact us: * Safety issues: safety@stability.ai * Security issues: security@stability.ai * Privacy issues: privacy@stability.ai * License and general: * Enterprise license:" +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-xl-base-1.0.json b/data/model_data_json/stabilityai_stable-diffusion-xl-base-1.0.json new file mode 100644 index 0000000000000000000000000000000000000000..5e8de0e0dbc3f295f3a4c179b6c71fac344a01ab --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-xl-base-1.0.json @@ -0,0 +1,22 @@ +{ + "model_id": "stabilityai/stable-diffusion-xl-base-1.0", + "downloads": 2311345, + "tags": [ + "diffusers", + "onnx", + "safetensors", + "text-to-image", + "stable-diffusion", + "arxiv:2307.01952", + "arxiv:2211.01324", + "arxiv:2108.01073", + "arxiv:2112.10752", + "license:openrail++", + "autotrain_compatible", + "endpoints_compatible", + "diffusers:StableDiffusionXLPipeline", + "region:us" + ], + "description": "--- license: openrail++ tags: - text-to-image - stable-diffusion --- # SD-XL 1.0-base Model Card !row01 ## Model !pipeline SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: specialized for the final denoising steps. Note that the base model can be used as a standalone module. Alternatively, we can use a two-stage pipeline as follows: First, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit ( also known as \"img2img\") to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations. Source code is available at . ### Model Description - **Developed by:** Stability AI - **Model type:** Diffusion-based text-to-image generative model - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). - **Resources for more information:** Check out our GitHub Repository and the SDXL report on arXiv. ### Model Sources For research purposes, we recommend our Github repository ( which implements the most popular diffusion frameworks (both training and inference) and for which new functionalities like distillation will be added over time. Clipdrop provides free SDXL inference. - **Repository:** - **Demo:** ## Evaluation !comparison The chart above evaluates user preference for SDXL (with and without refinement) over SDXL 0.9 and Stable Diffusion 1.5 and 2.1. The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance. ### 🧨 Diffusers Make sure to upgrade diffusers to >= 0.19.0: In addition make sure to install , , as well as the invisible watermark: To just use the base model, you can run: To use the whole base + refiner pipeline as an ensemble of experts you can run: When using , you can improve the inference speed by 20-30% with torch.compile. 
Simply wrap the UNet with torch.compile before running the pipeline: If you are limited by GPU VRAM, you can enable *cpu offloading* by calling instead of : For more information on how to use Stable Diffusion XL with , please have a look at the Stable Diffusion XL Docs. ### Optimum Optimum provides a Stable Diffusion pipeline compatible with both OpenVINO and ONNX Runtime. #### OpenVINO To install Optimum with the dependencies required for OpenVINO : To load an OpenVINO model and run inference with OpenVINO Runtime, you need to replace with Optimum . In case you want to load a PyTorch model and convert it to the OpenVINO format on-the-fly, you can set . You can find more examples (such as static reshaping and model compilation) in optimum documentation. #### ONNX To install Optimum with the dependencies required for ONNX Runtime inference : To load an ONNX model and run inference with ONNX Runtime, you need to replace with Optimum . In case you want to load a PyTorch model and convert it to the ONNX format on-the-fly, you can set . You can find more examples in optimum documentation. ## Uses ### Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models. - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. Excluded uses are described below. ### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The autoencoding part of the model is lossy. ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.", + "model_explanation_gemini": "Generates and modifies images from text prompts using a latent diffusion-based text-to-image pipeline with optional refinement for higher quality."
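The base-model snippet, torch.compile wrapper, and CPU-offloading call referenced in the Diffusers section above are omitted; a minimal sketch under the standard SDXL Diffusers API (the prompt is illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")

# torch >= 2.0: wrapping the UNet in torch.compile can speed up inference by ~20-30%.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# If VRAM-limited, use CPU offloading instead of .to("cuda"):
# pipe.enable_model_cpu_offload()

image = pipe(prompt="An astronaut riding a green horse").images[0]
image.save("sdxl_base.png")
```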
+} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-diffusion-xl-refiner-1.0.json b/data/model_data_json/stabilityai_stable-diffusion-xl-refiner-1.0.json new file mode 100644 index 0000000000000000000000000000000000000000..97050e6b3522e3bd960d96b156eda58c2d448f88 --- /dev/null +++ b/data/model_data_json/stabilityai_stable-diffusion-xl-refiner-1.0.json @@ -0,0 +1,19 @@ +{ + "model_id": "stabilityai/stable-diffusion-xl-refiner-1.0", + "downloads": 1049922, + "tags": [ + "diffusers", + "safetensors", + "stable-diffusion", + "image-to-image", + "arxiv:2307.01952", + "arxiv:2211.01324", + "arxiv:2108.01073", + "arxiv:2112.10752", + "license:openrail++", + "diffusers:StableDiffusionXLImg2ImgPipeline", + "region:us" + ], + "description": "--- license: openrail++ tags: - stable-diffusion - image-to-image --- # SD-XL 1.0-refiner Model Card !row01 ## Model !pipeline SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model (available here: is used to generate (noisy) latents, which are then further processed with a refinement model specialized for the final denoising steps. Note that the base model can be used as a standalone module. Alternatively, we can use a two-stage pipeline as follows: First, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit ( also known as \"img2img\") to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations. Source code is available at . ### Model Description - **Developed by:** Stability AI - **Model type:** Diffusion-based text-to-image generative model - **License:** CreativeML Open RAIL++-M License - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). - **Resources for more information:** Check out our GitHub Repository and the SDXL report on arXiv. ### Model Sources For research purposes, we recommend our Github repository ( which implements the most popular diffusion frameworks (both training and inference) and for which new functionalities like distillation will be added over time. Clipdrop provides free SDXL inference. - **Repository:** - **Demo:** ## Evaluation !comparison The chart above evaluates user preference for SDXL (with and without refinement) over SDXL 0.9 and Stable Diffusion 1.5 and 2.1. The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance. ### 🧨 Diffusers Make sure to upgrade diffusers to >= 0.18.0: In addition make sure to install , , as well as the invisible watermark: You can then use the refiner to improve images. When using , you can improve the inference speed by 20-30% with torch.compile. Simply wrap the UNet with torch.compile before running the pipeline: If you are limited by GPU VRAM, you can enable *cpu offloading* by calling instead of : For more advanced use cases, please have a look at the docs. ## Uses ### Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. - Research on generative models.
- Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. Excluded uses are described below. ### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. ## Limitations and Bias ### Limitations - The model does not achieve perfect photorealism - The model cannot render legible text - The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere” - Faces and people in general may not be generated properly. - The autoencoding part of the model is lossy. ### Bias While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.", + "model_explanation_gemini": "Refines and enhances images generated by the SDXL base model using a specialized high-resolution denoising technique for improved quality." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-video-diffusion-img2vid-xt.json b/data/model_data_json/stabilityai_stable-video-diffusion-img2vid-xt.json new file mode 100644 index 0000000000000000000000000000000000000000..366b4e02436491b712bc6ed5154bc0a0898bf973 --- /dev/null +++ b/data/model_data_json/stabilityai_stable-video-diffusion-img2vid-xt.json @@ -0,0 +1,14 @@ +{ + "model_id": "stabilityai/stable-video-diffusion-img2vid-xt", + "downloads": 388659, + "tags": [ + "diffusers", + "safetensors", + "image-to-video", + "license:other", + "diffusers:StableVideoDiffusionPipeline", + "region:us" + ], + "description": "--- pipeline_tag: image-to-video license: other license_name: stable-video-diffusion-community license_link: LICENSE.md --- # Stable Video Diffusion Image-to-Video Model Card !row01 Stable Video Diffusion (SVD) Image-to-Video is a diffusion model that takes in a still image as a conditioning frame, and generates a video from it. Please note: For commercial use, please refer to ## Model Details ### Model Description (SVD) Image-to-Video is a latent diffusion model trained to generate short video clips from an image conditioning. This model was trained to generate 25 frames at resolution 576x1024 given a context frame of the same size, finetuned from [SVD Image-to-Video [14 frames]]( We also finetune the widely used f8-decoder for temporal consistency. For convenience, we additionally provide the model with the standard frame-wise decoder here. - **Developed by:** Stability AI - **Funded by:** Stability AI - **Model type:** Generative image-to-video model - **Finetuned from model:** SVD Image-to-Video [14 frames] ### Model Sources For research purposes, we recommend our Github repository ( which implements the most popular diffusion frameworks (both training and inference). - **Repository:** - **Paper:** ## Evaluation !comparison The chart above evaluates user preference for SVD-Image-to-Video over GEN-2 and PikaLabs. SVD-Image-to-Video is preferred by human voters in terms of video quality. For details on the user study, we refer to the research paper ## Uses ### Direct Use The model is intended for both non-commercial and commercial usage. You can use this model for non-commercial or research purposes under this license. Possible research areas and tasks include - Research on generative models. 
- Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. For commercial use, please refer to Excluded uses are described below. ### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy. ## Limitations and Bias ### Limitations - The generated videos are rather short (<= 4sec), and the model does not achieve perfect photorealism. - The model may generate videos without motion, or very slow camera pans. - The model cannot be controlled through text. - The model cannot render legible text. - Faces and people in general may not be generated properly. - The autoencoding part of the model is lossy. ### Recommendations The model is intended for both non-commercial and commercial usage. ## How to Get Started with the Model Check out # Appendix: All considered potential data sources were included for final training, with none held out as the proposed data filtering methods described in the SVD paper handle the quality control/filtering of the dataset. With regards to safety/NSFW filtering, sources considered were either deemed safe or filtered with the in-house NSFW filters. No explicit human labor is involved in training data preparation. However, human evaluation for model outputs and quality was extensively used to evaluate model quality and performance. The evaluations were performed with third-party contractor platforms (Amazon Sagemaker, Amazon Mechanical Turk, Prolific) with fluent English-speaking contractors from various countries, primarily from the USA, UK, and Canada. Each worker was paid $12/hr for the time invested in the evaluation. No other third party was involved in the development of this model; the model was fully developed in-house at Stability AI. Training the SVD checkpoints required a total of approximately 200,000 A100 80GB hours. The majority of the training occurred on 48 * 8 A100s, while some stages took more/less than that. The resulting CO2 emission is ~19,000kg CO2 eq., and energy consumed is ~64000 kWh. The released checkpoints (SVD/SVD-XT) are image-to-video models that generate short videos/animations closely following the given input image. Since the model relies on an existing supplied image, the potential risks of disclosing specific material or novel unsafe content are minimal. This was also evaluated by third-party independent red-teaming services, which agree with our conclusion to a high degree of confidence (>90% in various areas of safety red-teaming). The external evaluations were also performed for trustworthiness, leading to >95% confidence in real, trustworthy videos. With the default settings at the time of release, SVD takes ~100s for generation, and SVD-XT takes ~180s on an A100 80GB card. Several optimizations to trade off quality / memory / speed can be done to perform faster inference or inference on lower VRAM cards. The information related to the model and its development process and usage protocols can be found in the GitHub repo, associated research paper, and HuggingFace model page/cards. 
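The getting-started snippet referenced above ("Check out") is truncated in this card; a minimal sketch of image-to-video generation under the standard `StableVideoDiffusionPipeline` API (the conditioning-image URL is a placeholder):

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # helps on smaller-VRAM GPUs

# Conditioning frame; placeholder URL -- use any still image, resized to 1024x576.
image = load_image("https://example.com/conditioning_frame.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```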
The released model inference & demo code has image-level watermarking enabled by default, which can be used to detect the outputs. This is done via the imWatermark Python library. The model can be used to generate videos from static initial images. However, we prohibit unlawful, obscene, or misleading uses of the model consistent with the terms of our license and Acceptable Use Policy. For the open-weights release, our training data filtering mitigations alleviate this risk to some extent. These restrictions are explicitly enforced on user-facing interfaces at stablevideo.com, where a warning is issued. We do not take any responsibility for third-party interfaces. Submitting initial images that bypass input filters to tease out offensive or inappropriate content listed above is also prohibited. Safety filtering checks at stablevideo.com run on model inputs and outputs independently. More details on our user-facing interfaces can be found here: Beyond the Acceptable Use Policy and other mitigations and conditions described here, the model is not subject to additional model behavior interventions of the type described in the Foundation Model Transparency Index. For stablevideo.com, we store preference data in the form of upvotes/downvotes on user-generated videos, and we have a pairwise ranker that runs while a user generates videos. This usage data is solely used for improving Stability AI’s future image/video models and services. No other third-party entities are given access to the usage data beyond Stability AI and maintainers of stablevideo.com. For usage statistics of SVD, we refer interested users to HuggingFace model download/usage statistics as a primary indicator. Third-party applications also have reported model usage statistics. We might also consider releasing aggregate usage statistics of stablevideo.com on reaching some milestones.", + "model_explanation_gemini": "Generates short video clips from a still image using a diffusion model trained for temporal consistency." +} \ No newline at end of file diff --git a/data/model_data_json/stabilityai_stable-video-diffusion-img2vid.json b/data/model_data_json/stabilityai_stable-video-diffusion-img2vid.json new file mode 100644 index 0000000000000000000000000000000000000000..3680f828b7e1c4c27511066451fd4fc7ddfe75ce --- /dev/null +++ b/data/model_data_json/stabilityai_stable-video-diffusion-img2vid.json @@ -0,0 +1,14 @@ +{ + "model_id": "stabilityai/stable-video-diffusion-img2vid", + "downloads": 88875, + "tags": [ + "diffusers", + "safetensors", + "image-to-video", + "license:other", + "diffusers:StableVideoDiffusionPipeline", + "region:us" + ], + "description": "--- pipeline_tag: image-to-video license: other license_name: stable-video-diffusion-community license_link: LICENSE.md --- # Stable Video Diffusion Image-to-Video Model Card !row01 Stable Video Diffusion (SVD) Image-to-Video is a diffusion model that takes in a still image as a conditioning frame, and generates a video from it. Please note: For commercial use of this model, please refer to ## Model Details ### Model Description (SVD) Image-to-Video is a latent diffusion model trained to generate short video clips from an image conditioning. This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We also finetune the widely used f8-decoder for temporal consistency. For convenience, we additionally provide the model with the standard frame-wise decoder here. 
- **Developed by:** Stability AI - **Funded by:** Stability AI - **Model type:** Generative image-to-video model ### Model Sources For research purposes, we recommend our Github repository ( which implements the most popular diffusion frameworks (both training and inference). - **Repository:** - **Paper:** ## Evaluation !comparison The chart above evaluates user preference for SVD-Image-to-Video over GEN-2 and PikaLabs. SVD-Image-to-Video is preferred by human voters in terms of video quality. For details on the user study, we refer to the research paper ## Uses ### Direct Use The model is intended for research purposes only. Possible research areas and tasks include - Research on generative models. - Safe deployment of models which have the potential to generate harmful content. - Probing and understanding the limitations and biases of generative models. - Generation of artworks and use in design and other artistic processes. - Applications in educational or creative tools. Excluded uses are described below. ### Out-of-Scope Use The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy. ## Limitations and Bias ### Limitations - The generated videos are rather short (<= 4sec), and the model does not achieve perfect photorealism. - The model may generate videos without motion, or very slow camera pans. - The model cannot be controlled through text. - The model cannot render legible text. - Faces and people in general may not be generated properly. - The autoencoding part of the model is lossy. ### Recommendations The model is intended for research purposes only. ## How to Get Started with the Model Check out # Appendix: All considered potential data sources were included for final training, with none held out as the proposed data filtering methods described in the SVD paper handle the quality control/filtering of the dataset. With regards to safety/NSFW filtering, sources considered were either deemed safe or filtered with the in-house NSFW filters. No explicit human labor is involved in training data preparation. However, human evaluation for model outputs and quality was extensively used to evaluate model quality and performance. The evaluations were performed with third-party contractor platforms (Amazon Sagemaker, Amazon Mechanical Turk, Prolific) with fluent English-speaking contractors from various countries, primarily from the USA, UK, and Canada. Each worker was paid $12/hr for the time invested in the evaluation. No other third party was involved in the development of this model; the model was fully developed in-house at Stability AI. Training the SVD checkpoints required a total of approximately 200,000 A100 80GB hours. The majority of the training occurred on 48 * 8 A100s, while some stages took more/less than that. The resulting CO2 emission is ~19,000kg CO2 eq., and energy consumed is ~64000 kWh. The released checkpoints (SVD/SVD-XT) are image-to-video models that generate short videos/animations closely following the given input image. Since the model relies on an existing supplied image, the potential risks of disclosing specific material or novel unsafe content are minimal. This was also evaluated by third-party independent red-teaming services, which agree with our conclusion to a high degree of confidence (>90% in various areas of safety red-teaming). 
The external evaluations were also performed for trustworthiness, leading to >95% confidence in real, trustworthy videos. With the default settings at the time of release, SVD takes ~100s for generation, and SVD-XT takes ~180s on an A100 80GB card. Several optimizations to trade off quality / memory / speed can be done to perform faster inference or inference on lower VRAM cards. The information related to the model and its development process and usage protocols can be found in the GitHub repo, associated research paper, and HuggingFace model page/cards. The released model inference & demo code has image-level watermarking enabled by default, which can be used to detect the outputs. This is done via the imWatermark Python library. The model can be used to generate videos from static initial images. However, we prohibit unlawful, obscene, or misleading uses of the model consistent with the terms of our license and Acceptable Use Policy. For the open-weights release, our training data filtering mitigations alleviate this risk to some extent. These restrictions are explicitly enforced on user-facing interfaces at stablevideo.com, where a warning is issued. We do not take any responsibility for third-party interfaces. Submitting initial images that bypass input filters to tease out offensive or inappropriate content listed above is also prohibited. Safety filtering checks at stablevideo.com run on model inputs and outputs independently. More details on our user-facing interfaces can be found here: Beyond the Acceptable Use Policy and other mitigations and conditions described here, the model is not subject to additional model behavior interventions of the type described in the Foundation Model Transparency Index. For stablevideo.com, we store preference data in the form of upvotes/downvotes on user-generated videos, and we have a pairwise ranker that runs while a user generates videos. This usage data is solely used for improving Stability AI’s future image/video models and services. No other third-party entities are given access to the usage data beyond Stability AI and maintainers of stablevideo.com. For usage statistics of SVD, we refer interested users to HuggingFace model download/usage statistics as a primary indicator. Third-party applications also have reported model usage statistics. We might also consider releasing aggregate usage statistics of stablevideo.com on reaching some milestones.", + "model_explanation_gemini": "Generates short video clips from a still input image using a diffusion model trained for temporal consistency." 
+} \ No newline at end of file diff --git a/data/model_data_json/stable-diffusion-v1-5_stable-diffusion-inpainting.json b/data/model_data_json/stable-diffusion-v1-5_stable-diffusion-inpainting.json new file mode 100644 index 0000000000000000000000000000000000000000..133f43f15ca5fc8afa6132d99dbc3357a1c5e8e7 --- /dev/null +++ b/data/model_data_json/stable-diffusion-v1-5_stable-diffusion-inpainting.json @@ -0,0 +1,20 @@ +{ + "model_id": "stable-diffusion-v1-5/stable-diffusion-inpainting", + "downloads": 3132327, + "tags": [ + "diffusers", + "stable-diffusion", + "stable-diffusion-diffusers", + "text-to-image", + "arxiv:2207.12598", + "arxiv:2112.10752", + "arxiv:2103.00020", + "arxiv:2205.11487", + "arxiv:1910.09700", + "license:creativeml-openrail-m", + "diffusers:StableDiffusionInpaintPipeline", + "region:us" + ], + "description": "--- license: creativeml-openrail-m tags: - stable-diffusion - stable-diffusion-diffusers - text-to-image inference: false library_name: diffusers --- # Stable Diffusion Inpainting model card ### ⚠️ This repository is a mirror of the now deprecated , and this repository and organization are not affiliated in any way with RunwayML. Modifications to the original model card are in red or green. Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting pictures by using a mask. **Stable-Diffusion-Inpainting** was initialized with the weights of Stable-Diffusion-v-1-2: first 595k steps of regular training, then 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+”, with 10% dropping of the text-conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and in 25% of cases mask everything. Open In Spaces | Automatic1111. ### Use with Diffusers **How it works:** (image grid showing the input image, the mask, and the result for the example prompt \"Face of a yellow cat, high resolution, sitting on a park bench\") ### Use with Original GitHub Repository or AUTOMATIC1111 1. Download the weights sd-v1-5-inpainting.ckpt 2. Follow instructions here (now deprecated). 3. Use it with
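The "Use with Diffusers" snippet is elided above; a minimal inpainting sketch follows. The prompt is the card's own example, while the image/mask paths are placeholders:

```python
# Hedged sketch: masked inpainting with diffusers' StableDiffusionInpaintPipeline.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("photo.png")  # base image (placeholder path)
mask = load_image("mask.png")    # white pixels mark the region to repaint
result = pipe(
    prompt="Face of a yellow cat, high resolution, sitting on a park bench",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```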
red or green Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. For more information about how Stable Diffusion functions, please have a look at 🤗's Stable Diffusion blog. The **Stable-Diffusion-v1-5** checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 595k steps at resolution 512x512 on \"laion-aesthetics v2 5+\" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. You can use this both with the 🧨Diffusers library and RunwayML GitHub repository (now deprecated), ComfyUI, Automatic1111, SD.Next, InvokeAI. ### Use with Diffusers For more detailed instructions, use-cases and examples in JAX follow the instructions here ### Use with GitHub Repository (now deprecated), ComfyUI or Automatic1111 1. Download the weights - v1-5-pruned-emaonly.safetensors - ema-only weight. uses less VRAM - suitable for inference - v1-5-pruned.safetensors - ema+non-ema weights. uses more VRAM - suitable for fine-tuning 2. Follow instructions here. (now deprecated) 3. Use locally with Haoran Wei*, Chenglong Liu*, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang !image/jpeg ## Usage Inference using Huggingface transformers on NVIDIA GPUs. Requirements tested on python 3.10: More details about 'ocr_type', 'ocr_box', 'ocr_color', and 'render' can be found at our GitHub. Our training codes are available at our GitHub. ## More Multimodal Projects 👏 Welcome to explore more multimodal projects of our team: Vary | Fox | OneChart ## Citation If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️!" +} \ No newline at end of file diff --git a/data/model_data_json/tablegpt_TableGPT2-7B.json b/data/model_data_json/tablegpt_TableGPT2-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..d731c8b243c19d8ee4ce97f18888f56c6781fc10 --- /dev/null +++ b/data/model_data_json/tablegpt_TableGPT2-7B.json @@ -0,0 +1,17 @@ +{ + "model_id": "tablegpt/TableGPT2-7B", + "downloads": 36912, + "tags": [ + "safetensors", + "qwen2", + "zh", + "en", + "arxiv:2411.02059", + "base_model:Qwen/Qwen2.5-7B", + "base_model:finetune:Qwen/Qwen2.5-7B", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 language: - zh - en base_model: - Qwen/Qwen2.5-7B --- # TableGPT2-7B ## Model details We developed and released TableGPT2-7B, a large-scale decoder specifically tailored for data-intensive tasks, with a focus on interpreting and analyzing tabular data. TableGPT2-7B is designed to bridge the gap between conventional LLM capabilities and the real-world demands of tabular/structured data tasks, such as those in business intelligence (BI), automated data-driven analysis, and application tasks tightly involving databases or data warehouses. **Model Developers** Zhejiang University **Variations** TableGPT2 is available in two configurations—7B and 72B parameters—both derived from the Qwen2.5 model family and optimized for handling structured data in tabular formats. Currently, we have released the 7B version to the public. **Input** TableGPT2-7B accepts both text and tabular data as input, with the tabular data structured as text in the format of a df.head() result. **Output** TableGPT2-7B produces text-based outputs, specifically optimized for coding tasks, data interpretation, and BI-focused question answering. 
**Language** Our model places a strong emphasis on Chinese corpora, and currently, queries in other languages may have limited support. **Other Requirements** We highly recommend exploring our repository on GitHub, where users can integrate this model into our agent workflow for enhanced performance. **Model Architecture** TableGPT2-7B is built upon the Qwen2.5 architecture and includes specialized encoding for tabular data. It features a unique semantic encoder designed to interpret tabular data, capturing insights from rows, columns, and entire tables. Continual Pretraining (CPT) and Supervised Fine-Tuning (SFT) have been applied to equip the model for real-world BI applications and complex query processing. For now, the standalone decoder is open-sourced and fully functional without requiring assistance from the encoder. The encoder is currently under preparation, pending engineering considerations, primarily because we hope to provide a tighter integration with DeepSpeed and vLLM. | | Training Data | Params | Context Length | Tokens | Tables | | ------------ | ------------------------------------------------ | ------ | -------------- | --------------------------------- | ------------- | | TableGPT2-7B | Multimodal data sources and BI-specific examples | 7B | 128K | 86B tokens CPT, 2.36M SFT samples | 593.8K tables | **Status** This model is static, trained on an offline dataset. Future versions may be released to enhance its performance on specialized tasks. **QuickStart** This code snippet demonstrates how to build a prompt with table information, and shows how to load the tokenizer, load the model, and generate content (a hedged sketch follows below). > Note that you need to use : > **Complex Usage Scenarios** For complex usage scenarios, we provide a tablegpt-agent toolkit to help you more conveniently handle various types of tabular inputs. This agent is built on top of the library and provides a user-friendly interface for interacting with . **Deployment** For deployment, we recommend using vLLM. * **Install vLLM**: You can install vLLM by running the following command. * **Model Deployment**: Use vLLM to deploy your model. For example, you can use the command to set up a server similar to OpenAI's: Then you can access the Chat API by: For more details about how to use TableGPT2, please refer to our repository on GitHub. **License** TableGPT2-7B is under the Apache-2.0 license. **Research Paper** TableGPT2-7B is introduced and validated in the paper \"TableGPT2: A Large Multimodal Model with Tabular Data Integration\" available on arXiv. **Where to send questions or comments about the model** Inquiries and feedback are welcome at j.zhao@zju.edu.cn. ## Training Data **Overview** Training for TableGPT2-7B involved more than 593,800 curated tables, over 86 billion tokens for continual pretraining (CPT), and the construction of over 2.36 million high-quality query-table-output tuples for supervised fine-tuning. This extensive dataset aims to meet the rigorous demands of modern applications involving structured or tabular data. **Data Freshness** The training data has a cutoff of October 2024. ## Evaluation Results Evaluation has shown that TableGPT2-7B performs consistently well across benchmarks for tabular comprehension, code generation, and structured data reasoning, achieving a **35.20%** performance increase over comparable models on standard benchmarks and **49.32%** on BI-focused assessments.
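Sketching the QuickStart flow described above (build a prompt around a df.head() dump, load the tokenizer and model, then generate); the CSV name, prompt wording, and generation settings are our assumptions:

```python
# Hedged sketch of the elided TableGPT2-7B QuickStart.
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tablegpt/TableGPT2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

df = pd.read_csv("sales.csv")  # hypothetical table
prompt = (
    "Given a pandas DataFrame `df` whose first rows are:\n"
    f"{df.head().to_string()}\n\n"
    "Question: which region has the highest total sales?"
)
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

For the vLLM deployment the card recommends, something like `vllm serve tablegpt/TableGPT2-7B` would expose an OpenAI-compatible endpoint (command shape assumed from vLLM's standard CLI, not taken from this card).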
The RealTabBench benchmark further demonstrated the model’s robustness in handling unconventional tables and complex queries. Below, we present the results on public table-related benchmarks. | **Benchmark** | **Metric** | GPT-4o | TableLLM (Qwen2) | TableLLM (CodeQwen) | TableLLM (LLaMA3) | TableLLM (LLaMA3.1) | TableLLM (DeepSeek) | TableLLM-13B | DeepSeek-lite | Yi-Coder | Qwen2.5-Coder | Qwen2.5-Instruct | **TableGPT2-7B** | **TableGPT2-72B** | | ----------------------------- | ---------- | ------ | ---------------- | ------------------- | ----------------- | ------------------- | ------------------- | ------------ | ------------- | -------- | ------------- | ---------------- | -------------- | --------------- | | **Table Understanding** | | | | | | | | | | | | | | | | Col Type Annot. | F1 | 31.75 | 10.10 | 5.71 | 1.47 | 1.59 | 6.04 | 12.70 | 20.58 | 5.38 | 32.59 | 22.19 | **85.88** | 85.67 | | Relation Extract. | F1 | 52.95 | 1.60 | 3.79 | 2.39 | 2.00 | 3.34 | 18.16 | 8.67 | 2.25 | 31.00 | 15.92 | **83.35** | 79.50 | | Entity Linking | Acc | 90.80 | 47.10 | 39.70 | 0.20 | 0.60 | 15.50 | 66.25 | 70.15 | 41.75 | 71.70 | 82.25 | 92.00 | **93.30** | | Row Pop. | MAP | 53.40 | 2.20 | 5.14 | 1.93 | 6.23 | 3.13 | 14.25 | 1.20 | 1.00 | 13.23 | 12.30 | **59.97** | 55.83 | | **Question Answering** | | | | | | | | | | | | | | | | HiTab | Exec Acc | 48.40 | 11.74 | 0.00 | 0.00 | 0.00 | 39.08 | 6.30 | 0.76 | 0.00 | 1.70 | 10.73 | 70.27 | **75.57** | | FetaQA | BLEU | 21.70 | 12.24 | 8.69 | 2.42 | 3.10 | 7.94 | 10.83 | 15.08 | 11.17 | 13.00 | 16.91 | 28.97 | **32.25** | | HybridQA | Acc | 58.60 | 27.12 | 20.14 | 27.35 | 27.61 | 19.53 | 51.88 | 42.58 | 29.83 | 51.10 | 51.13 | 53.17 | **56.41** | | WikiSQL | Acc | 47.60 | 46.50 | 37.20 | 39.26 | 39.00 | 36.14 | 41.10 | 38.30 | 25.34 | 46.90 | 47.42 | 53.74 | **57.32** | | WikiTQ | Acc | 68.40 | 64.16 | 36.05 | 34.95 | 38.84 | 36.05 | 66.30 | 47.65 | 43.37 | **74.50** | 68.55 | 61.42 | 71.45 | | **Fact Verification** | | | | | | | | | | | | | | | | TabFact | Acc | 74.40 | 72.00 | 53.20 | 40.06 | 27.13 | 60.76 | 68.95 | 62.27 | 79.6 | 77.26 | 84.60 | 77.80 | **85.43** | | FEVEROUS | Acc | 71.60 | 20.10 | 46.90 | 51.50 | 42.30 | 18.39 | 21.45 | 7.80 | 38.10 | 60.70 | 63.30 | **78.05** | 76.80 | | **Table to Text** | | | | | | | | | | | | | | | | ToTTo | BLEU | 12.21 | 6.95 | 3.10 | 5.50 | 6.23 | 3.81 | 5.36 | 8.76 | 2.64 | 10.50 | 11.91 | 14.10 | **22.69** | | **Natural Language to SQL** | | | | | | | | | | | | | | | | BIRD(dev) | Exec Acc | - | 9.13 | 7.37 | 1.83 | 2.48 | 0.39 | 0.72 | 25.10 | 24.19 | 27.18 | 18.97 | 31.42 | **38.40** | | BIRD(dev-knowledge) | Exec Acc | - | 15.45 | 18.19 | 3.39 | 3.72 | 0.39 | 1.83 | 36.51 | 39.96 | 42.96 | 31.42 | 49.28 | **60.76** | | Spider(dev) | Exec Acc | - | 42.26 | 32.88 | 12.86 | 18.96 | 2.71 | 4.26 | 66.44 | 58.12 | 70.99 | 61.70 | 76.31 | **79.40** | | Spider(test) | Exec Acc | - | 40.29 | 34.93 | 12.02 | 16.35 | 7.33 | 2.93 | 66.65 | 56.87 | 69.73 | 60.18 | 74.38 | **78.48** | | **Holistic Table Evaluation** | | | | | | | | | | | | | | | | TableBench | DP | - | 26.62 | 26.44 | 26.71 | 26.73 | 26.15 | 3.88 | 29.60 | 21.94 | 28.67 | 25.18 | 32.03 | **38.90** | | TableBench | TCoT | - | 37.08 | 31.33 | 29.79 | 30.01 | 28.65 | 3.85 | 30.93 | 22.8 | 36.25 | 29.77 | 42.34 | **50.06** | | TableBench | SCoT | - | 14.11 | 17.78 | 9.60 | 12.38 | 22.39 | 2.88 | 22.61 | 8.43 | 25.95 | 24.35 | 25.01 | **30.47** | | TableBench | PoT@1 | - | 21.05 | 26.39 | 31.96 | 25.80 | 28.39 | 2.94 | 10.90 | 11.36 | 16.15 | 22.58 | 
**33.52** | 28.98 | ## Citation If you find our work helpful, please cite us by", + "model_explanation_gemini": "TableGPT2-7B is a 7-billion-parameter decoder model specialized in interpreting and analyzing tabular data for tasks like business intelligence, automated analysis, and database-related applications, supporting both text and structured table inputs with optimized outputs for coding and data interpretation." +} \ No newline at end of file diff --git a/data/model_data_json/tabularisai_multilingual-sentiment-analysis.json b/data/model_data_json/tabularisai_multilingual-sentiment-analysis.json new file mode 100644 index 0000000000000000000000000000000000000000..3a15e6de79f3ddf9b90e44078bb2512eea480ae3 --- /dev/null +++ b/data/model_data_json/tabularisai_multilingual-sentiment-analysis.json @@ -0,0 +1,51 @@ +{ + "model_id": "tabularisai/multilingual-sentiment-analysis", + "downloads": 153504, + "tags": [ + "transformers", + "safetensors", + "distilbert", + "text-classification", + "sentiment-analysis", + "sentiment", + "synthetic data", + "multi-class", + "social-media-analysis", + "customer-feedback", + "product-reviews", + "brand-monitoring", + "multilingual", + "🇪🇺", + "region:eu", + "en", + "zh", + "es", + "hi", + "ar", + "bn", + "pt", + "ru", + "ja", + "de", + "ms", + "te", + "vi", + "ko", + "fr", + "tr", + "it", + "pl", + "uk", + "tl", + "nl", + "gsw", + "base_model:distilbert/distilbert-base-multilingual-cased", + "base_model:finetune:distilbert/distilbert-base-multilingual-cased", + "license:cc-by-nc-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: distilbert/distilbert-base-multilingual-cased language: - en - zh - es - hi - ar - bn - pt - ru - ja - de - ms - te - vi - ko - fr - tr - it - pl - uk - tl - nl - gsw library_name: transformers license: cc-by-nc-4.0 pipeline_tag: text-classification tags: - text-classification - sentiment-analysis - sentiment - synthetic data - multi-class - social-media-analysis - customer-feedback - product-reviews - brand-monitoring - multilingual - 🇪🇺 - region:eu --- # 🚀 distilbert-based Multilingual Sentiment Classification Model # NEWS! - 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach. ## Model Details - **Model:** tabularisai/multilingual-sentiment-analysis - **Base model:** distilbert/distilbert-base-multilingual-cased - **Task:** Text Classification (Sentiment Analysis) - **Languages:** Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch). - **Classes:** 5 (*Very Negative, Negative, Neutral, Positive, Very Positive*) - **Use cases:** Social media analysis - Customer feedback analysis - Product reviews classification - Brand monitoring - Market research - Customer service optimization - Competitive intelligence ## Model Description This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.
### Training Data Trained exclusively on synthetic multilingual data generated by advanced LLMs, ensuring wide coverage of sentiment expressions from various languages. ### Training Procedure - Fine-tuned for 3.5 epochs. - Achieved a train_acc_off_by_one of approximately 0.93 on the validation dataset. ## Intended Use Ideal for: - Multilingual social media monitoring - International customer feedback analysis - Global product review sentiment classification - Worldwide brand sentiment tracking ## How to Use Using pipelines, it takes only 4 lines (a hedged sketch follows after this entry): Below is a Python example of how to use the multilingual sentiment model without pipelines: ## Ethical Considerations Synthetic data reduces bias, but validation in real-world scenarios is advised. ## Citation ## Contact For inquiries, data, private APIs, better models, contact info@tabularis.ai tabularis.ai", + "model_explanation_gemini": "Classifies sentiment in multiple languages across five categories (Very Negative to Very Positive) for tasks like social media analysis, customer feedback, and product reviews." +} \ No newline at end of file diff --git a/data/model_data_json/tals_albert-xlarge-vitaminc-mnli.json b/data/model_data_json/tals_albert-xlarge-vitaminc-mnli.json new file mode 100644 index 0000000000000000000000000000000000000000..23ad30642dadf9874ac882e2669ff8e1f633f5e8 --- /dev/null +++ b/data/model_data_json/tals_albert-xlarge-vitaminc-mnli.json @@ -0,0 +1,20 @@ +{ + "model_id": "tals/albert-xlarge-vitaminc-mnli", + "downloads": 859985, + "tags": [ + "transformers", + "pytorch", + "tf", + "safetensors", + "albert", + "text-classification", + "dataset:glue", + "dataset:multi_nli", + "dataset:tals/vitaminc", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - glue - multi_nli - tals/vitaminc --- # Details Model used in Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence (Schuster et al., NAACL 2021). For more details see: When using this model, please cite the paper. # BibTeX entry and citation info", + "model_explanation_gemini": "Classifies or verifies facts by leveraging contrastive evidence, trained on datasets including GLUE, MultiNLI, and VitaminC."
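The 4-line pipeline snippet referenced in the multilingual sentiment card's "How to Use" section above is elided in this dump; a minimal sketch follows (the example sentence and the printed output shape are ours):

```python
# Hedged sketch: multilingual sentiment classification via the pipeline API.
from transformers import pipeline

clf = pipeline("text-classification", model="tabularisai/multilingual-sentiment-analysis")
print(clf("¡El producto llegó rápido y funciona de maravilla!"))
# e.g. [{'label': 'Very Positive', 'score': 0.98}] -- one of the card's five classes
```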
+} \ No newline at end of file diff --git a/data/model_data_json/tasksource_ModernBERT-large-nli.json b/data/model_data_json/tasksource_ModernBERT-large-nli.json new file mode 100644 index 0000000000000000000000000000000000000000..4b7edb48b1b21937a3a495d92ec3c5c1bb2ae0a4 --- /dev/null +++ b/data/model_data_json/tasksource_ModernBERT-large-nli.json @@ -0,0 +1,24 @@ +{ + "model_id": "tasksource/ModernBERT-large-nli", + "downloads": 82059, + "tags": [ + "transformers", + "safetensors", + "modernbert", + "text-classification", + "instruct", + "natural-language-inference", + "nli", + "zero-shot-classification", + "en", + "dataset:nyu-mll/glue", + "dataset:facebook/anli", + "base_model:answerdotai/ModernBERT-large", + "base_model:finetune:answerdotai/ModernBERT-large", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers base_model: - answerdotai/ModernBERT-large license: apache-2.0 language: - en pipeline_tag: zero-shot-classification datasets: - nyu-mll/glue - facebook/anli tags: - instruct - natural-language-inference - nli --- # Model Card for tasksource/ModernBERT-large-nli This model is ModernBERT multi-task fine-tuned on tasksource NLI tasks, including MNLI, ANLI, SICK, WANLI, doc-nli, LingNLI, FOLIO, FOL-NLI, LogicNLI, Label-NLI, and all datasets in the table below. This is the equivalent of an \"instruct\" version. The model was trained for 200k steps on an Nvidia A30 GPU. It is very good at reasoning tasks (better than Llama 3.1 8B Instruct on ANLI and FOLIO), long-context reasoning, sentiment analysis, and zero-shot classification with new labels. The following table shows model test accuracy. These are the scores for the same single transformer with different classification heads on top. Further gains can be obtained by fine-tuning on a single task, e.g. SST, but this checkpoint is great for zero-shot classification and natural language inference (contradiction/entailment/neutral classification).
| test_name | test_accuracy | |:--------------------------------------|----------------:| | glue/mnli | 0.89 | | glue/qnli | 0.96 | | glue/rte | 0.91 | | glue/wnli | 0.64 | | glue/mrpc | 0.81 | | glue/qqp | 0.87 | | glue/cola | 0.87 | | glue/sst2 | 0.96 | | super_glue/boolq | 0.66 | | super_glue/cb | 0.86 | | super_glue/multirc | 0.9 | | super_glue/wic | 0.71 | | super_glue/axg | 1 | | anli/a1 | 0.72 | | anli/a2 | 0.54 | | anli/a3 | 0.55 | | sick/label | 0.91 | | sick/entailment_AB | 0.93 | | snli | 0.94 | | scitail/snli_format | 0.95 | | hans | 1 | | WANLI | 0.77 | | recast/recast_ner | 0.85 | | recast/recast_sentiment | 0.97 | | recast/recast_verbnet | 0.89 | | recast/recast_megaveridicality | 0.87 | | recast/recast_verbcorner | 0.87 | | recast/recast_kg_relations | 0.9 | | recast/recast_factuality | 0.95 | | recast/recast_puns | 0.98 | | probability_words_nli/reasoning_1hop | 1 | | probability_words_nli/usnli | 0.79 | | probability_words_nli/reasoning_2hop | 0.98 | | nan-nli | 0.85 | | nli_fever | 0.78 | | breaking_nli | 0.99 | | conj_nli | 0.72 | | fracas | 0.79 | | dialogue_nli | 0.94 | | mpe | 0.75 | | dnc | 0.91 | | recast_white/fnplus | 0.76 | | recast_white/sprl | 0.9 | | recast_white/dpr | 0.84 | | add_one_rte | 0.94 | | paws/labeled_final | 0.96 | | pragmeval/pdtb | 0.56 | | lex_glue/scotus | 0.58 | | lex_glue/ledgar | 0.85 | | dynasent/dynabench.dynasent.r1.all/r1 | 0.83 | | dynasent/dynabench.dynasent.r2.all/r2 | 0.76 | | cycic_classification | 0.96 | | lingnli | 0.91 | | monotonicity-entailment | 0.97 | | scinli | 0.88 | | naturallogic | 0.93 | | dynahate | 0.86 | | syntactic-augmentation-nli | 0.94 | | autotnli | 0.92 | | defeasible-nli/atomic | 0.83 | | defeasible-nli/snli | 0.8 | | help-nli | 0.96 | | nli-veridicality-transitivity | 0.99 | | lonli | 0.99 | | dadc-limit-nli | 0.79 | | folio | 0.71 | | tomi-nli | 0.54 | | puzzte | 0.59 | | temporal-nli | 0.93 | | counterfactually-augmented-snli | 0.81 | | cnli | 0.9 | | boolq-natural-perturbations | 0.72 | | equate | 0.65 | | logiqa-2.0-nli | 0.58 | | mindgames | 0.96 | | ConTRoL-nli | 0.66 | | logical-fallacy | 0.38 | | cladder | 0.89 | | conceptrules_v2 | 1 | | zero-shot-label-nli | 0.79 | | scone | 1 | | monli | 1 | | SpaceNLI | 1 | | propsegment/nli | 0.92 | | FLD.v2/default | 0.91 | | FLD.v2/star | 0.78 | | SDOH-NLI | 0.99 | | scifact_entailment | 0.87 | | feasibilityQA | 0.79 | | AdjectiveScaleProbe-nli | 1 | | resnli | 1 | | semantic_fragments_nli | 1 | | dataset_train_nli | 0.95 | | nlgraph | 0.97 | | ruletaker | 0.99 | | PARARULE-Plus | 1 | | logical-entailment | 0.93 | | nope | 0.56 | | LogicNLI | 0.91 | | contract-nli/contractnli_a/seg | 0.88 | | contract-nli/contractnli_b/full | 0.84 | | nli4ct_semeval2024 | 0.72 | | biosift-nli | 0.92 | | SIGA-nli | 0.57 | | FOL-nli | 0.79 | | doc-nli | 0.81 | | mctest-nli | 0.92 | | natural-language-satisfiability | 0.92 | | idioms-nli | 0.83 | | lifecycle-entailment | 0.79 | | MSciNLI | 0.84 | | hover-3way/nli | 0.92 | | seahorse_summarization_evaluation | 0.81 | | missing-item-prediction/contrastive | 0.88 | | Pol_NLI | 0.93 | | synthetic-retrieval-NLI/count | 0.72 | | synthetic-retrieval-NLI/position | 0.9 | | synthetic-retrieval-NLI/binary | 0.92 | | babi_nli | 0.98 | # Usage ## [ZS] Zero-shot classification pipeline NLI training data of this model includes label-nli, a NLI dataset specially constructed to improve this kind of zero-shot classification. 
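A minimal sketch of the zero-shot and NLI pipeline usage this section and the next refer to; the candidate labels and sentences are our own examples:

```python
# Hedged sketch: zero-shot classification and pairwise NLI with the same checkpoint.
from transformers import pipeline

zs = pipeline("zero-shot-classification", model="tasksource/ModernBERT-large-nli")
print(zs(
    "The new update drains my battery twice as fast.",
    candidate_labels=["battery life", "screen quality", "pricing"],
))

nli = pipeline("text-classification", model="tasksource/ModernBERT-large-nli")
print(nli({"text": "A man is sleeping on a couch.",   # premise
           "text_pair": "Someone is awake."}))        # hypothesis -> expect contradiction
```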
## [NLI] Natural language inference pipeline ## Backbone for further fine-tuning This checkpoint has stronger reasoning and fine-grained abilities than the base version and can be used for further fine-tuning. # Citation" +} \ No newline at end of file diff --git a/data/model_data_json/teknium_OpenHermes-2.5-Mistral-7B.json b/data/model_data_json/teknium_OpenHermes-2.5-Mistral-7B.json new file mode 100644 index 0000000000000000000000000000000000000000..c78fe5b84ad9571f7e5b3a4f08b0cca3efb86f2e --- /dev/null +++ b/data/model_data_json/teknium_OpenHermes-2.5-Mistral-7B.json @@ -0,0 +1,29 @@ +{ + "model_id": "teknium/OpenHermes-2.5-Mistral-7B", + "downloads": 213808, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "mistral", + "text-generation", + "instruct", + "finetune", + "chatml", + "gpt4", + "synthetic data", + "distillation", + "conversational", + "en", + "dataset:teknium/OpenHermes-2.5", + "base_model:mistralai/Mistral-7B-v0.1", + "base_model:finetune:mistralai/Mistral-7B-v0.1", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: mistralai/Mistral-7B-v0.1 tags: - mistral - instruct - finetune - chatml - gpt4 - synthetic data - distillation model-index: - name: OpenHermes-2-Mistral-7B results: [] license: apache-2.0 language: - en datasets: - teknium/OpenHermes-2.5 --- # OpenHermes 2.5 - Mistral 7B !image/png *In the tapestry of Greek mythology, Hermes reigns as the eloquent Messenger of the Gods, a deity who deftly bridges the realms through the art of communication. It is in homage to this divine mediator that I name this advanced LLM \"Hermes,\" a system crafted to navigate the complex intricacies of human discourse with celestial finesse.* ## Model description OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune and a continuation of the OpenHermes 2 model, which was trained on additional code datasets. Potentially the most interesting finding from training on a good ratio (est. around 7-14% of the total dataset) of code instruction data was that it boosted several non-code benchmarks, including TruthfulQA, AGIEval, and the GPT4All suite. It did, however, reduce the BigBench score, but the net gain overall is significant. The code it trained on also improved its HumanEval score (benchmarking done by the Glaive team) from **43% @ Pass 1** with OpenHermes 2 to **50.7% @ Pass 1** with OpenHermes 2.5. OpenHermes was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality data from open datasets across the AI landscape. [More details soon] Extensive filtering was applied to these public datasets, and all formats were converted to ShareGPT, which was then further transformed by axolotl to use ChatML. Huge thank you to GlaiveAI and a16z for compute access and for sponsoring my work, and to all the dataset creators and other people whose work has contributed to this project! Follow all my updates in ML and AI on Twitter: Support me on Github Sponsors: **NEW**: Chat with Hermes on LMSys' Chat Website! # Table of Contents 1. Example Outputs - Chat about programming with a superintelligence - Get a gourmet meal recipe - Talk about the nature of Hermes' consciousness - Chat with Edward Elric from Fullmetal Alchemist 2. Benchmark Results - GPT4All - AGIEval - BigBench - Averages Compared 3. Prompt Format 4.
Quantized Models ## Example Outputs ### Chat about programming with a superintelligence: !image/png ### Get a gourmet meal recipe: !image/png ### Talk about the nature of Hermes' consciousness: !image/png ### Chat with Edward Elric from Fullmetal Alchemist: !image/png ## Benchmark Results Hermes 2.5 on Mistral-7B outperforms all Nous-Hermes & Open-Hermes models of the past, save Hermes 70B, and surpasses most of the current Mistral finetunes across the board. ### GPT4All, BigBench, TruthfulQA, and AGIEval Model Comparisons: !image/png ### Averages Compared: !image/png GPT-4All Benchmark Set AGI-Eval BigBench Reasoning Test TruthfulQA: Average Score Comparison between OpenHermes-1 Llama-2 13B and OpenHermes-2 Mistral 7B against OpenHermes-2.5 on Mistral-7B: !image/png **HumanEval:** On code tasks, I first set out to make a hermes-2 coder, but found that it can have generalist improvements to the model, so I settled for slightly less code capabilities, for maximum generalist ones. That said, code capabilities had a decent jump alongside the overall capabilities of the model: Glaive performed HumanEval testing on Hermes-2.5 and found a score of: **50.7% @ Pass1** !image/png # Prompt Format OpenHermes 2.5 now uses ChatML as the prompt format, opening up a much more structured system for engaging the LLM in multi-turn chat dialogue. System prompts are now a thing that matters! Hermes 2.5 was trained to utilize system prompts to more strongly engage with instructions that span many turns. This is a more complex format than alpaca or sharegpt: special tokens are added to denote the beginning and end of any turn, along with roles for the turns. This format enables OpenAI endpoint compatibility, and people familiar with the ChatGPT API will be familiar with the format, as it is the same one used by OpenAI. Prompt with system instruction (Use whatever system prompt you like, this is just an example!): This prompt is available as a chat template, which means you can format messages using the method (a hedged sketch follows after this entry): When tokenizing messages for generation, set when calling . This will append to your prompt, to ensure that the model continues with an assistant response. To utilize the prompt format without a system prompt, simply leave the line out. Currently, I recommend using LM Studio for chatting with Hermes 2. It is a GUI application that utilizes GGUF models with a llama.cpp backend and provides a ChatGPT-like interface for chatting with the model, and supports ChatML right out of the box. In LM-Studio, simply select the ChatML Prefix on the settings side pane: !image/png # Quantized Models: GGUF: GPTQ: AWQ: EXL2:", + "model_explanation_gemini": "OpenHermes-2.5-Mistral-7B is a fine-tuned Mistral-based model designed for advanced multi-turn chat dialogue and instruction-following, excelling in generalist tasks and coding with improved benchmarks, using ChatML prompt format."
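The chat-template call described above is elided in this dump; a minimal sketch follows (the message contents are ours; `add_generation_prompt=True` appends the assistant header so the model continues with an assistant turn, as the card describes):

```python
# Hedged sketch: building a ChatML prompt with the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")
messages = [
    {"role": "system", "content": "You are Hermes 2, a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <|im_start|>system ... <|im_end|> <|im_start|>user ... <|im_end|> <|im_start|>assistant
```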
+} \ No newline at end of file diff --git a/data/model_data_json/tencent_Hunyuan3D-2.json b/data/model_data_json/tencent_Hunyuan3D-2.json new file mode 100644 index 0000000000000000000000000000000000000000..36b54c0f750218dcf8b912ef8449e1b8934c5013 --- /dev/null +++ b/data/model_data_json/tencent_Hunyuan3D-2.json @@ -0,0 +1,19 @@ +{ + "model_id": "tencent/Hunyuan3D-2", + "downloads": 490003, + "tags": [ + "hunyuan3d-2", + "diffusers", + "safetensors", + "image-to-3d", + "text-to-3d", + "en", + "zh", + "arxiv:2501.12202", + "arxiv:2411.02293", + "license:other", + "region:us" + ], + "description": "--- library_name: hunyuan3d-2 license: other license_name: tencent-hunyuan-community license_link: language: - en - zh tags: - image-to-3d - text-to-3d pipeline_tag: image-to-3d ---


“Living out everyone’s imagination on creating and manipulating 3D assets.”

This repository contains the models of the paper Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation. For code and more details on how to use it, refer to the GitHub repository. ## 🔥 News - Jan 21, 2025: 💬 Released Hunyuan3D 2.0. Please give it a try! ## **Abstract** We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model - Hunyuan3D-DiT, and a large-scale texture synthesis model - Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio - a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including both open-source and closed-source models, in geometry details, condition alignment, texture quality, etc.

## ☯️ **Hunyuan3D 2.0** ### Architecture Hunyuan3D 2.0 features a two-stage generation pipeline, starting with the creation of a bare mesh, followed by the synthesis of a texture map for that mesh. This strategy is effective for decoupling the difficulties of shape and texture generation and also provides flexibility for texturing either generated or handcrafted meshes.

### Performance We have evaluated Hunyuan3D 2.0 against other open-source as well as closed-source 3D generation methods. The numerical results indicate that Hunyuan3D 2.0 surpasses all baselines in the quality of generated textured 3D assets and in condition-following ability. | Model | CMMD(⬇) | FID_CLIP(⬇) | FID(⬇) | CLIP-score(⬆) | |-------------------------|-----------|-------------|-------------|---------------| | Top Open-source Model1 | 3.591 | 54.639 | 289.287 | 0.787 | | Top Closed-source Model1 | 3.600 | 55.866 | 305.922 | 0.779 | | Top Closed-source Model2 | 3.368 | 49.744 | 294.628 | 0.806 | | Top Closed-source Model3 | 3.218 | 51.574 | 295.691 | 0.799 | | Hunyuan3D 2.0 | **3.193** | **49.165** | **282.429** | **0.809** | Generation results of Hunyuan3D 2.0:

### Pretrained Models | Model | Date | Huggingface | |----------------------|------------|--------------------------------------------------------| | Hunyuan3D-DiT-v2-0 | 2025-01-21 | Download | | Hunyuan3D-Paint-v2-0 | 2025-01-21 | Download | | Hunyuan3D-Delight-v2-0 | 2025-01-21 | Download | ## 🤗 Get Started with Hunyuan3D 2.0 You may follow the steps below to use Hunyuan3D 2.0 via code or the Gradio App. ### Install Requirements Please install PyTorch via the official site. Then install the other requirements via ### API Usage We designed a diffusers-like API for our shape generation model - Hunyuan3D-DiT - and texture synthesis model - Hunyuan3D-Paint (hedged sketches of both follow after this entry). You could access **Hunyuan3D-DiT** via: The output mesh is a trimesh object, which you can save to a glb/obj (or other format) file. For **Hunyuan3D-Paint**, do the following: Please visit minimal_demo.py for more advanced usage, such as **text to 3D** and **texture generation for handcrafted meshes**. ### Gradio App You could also host a Gradio App on your own computer via: Don't forget to visit Hunyuan3D for quick use, if you don't want to host it yourself. ## 📑 Open-Source Plan - [x] Inference Code - [x] Model Checkpoints - [x] Technical Report - [ ] ComfyUI - [ ] TensorRT Version ## 🔗 BibTeX If you found this repository helpful, please cite our report: ## Community Resources Thanks to the contributions of community members; here are some great extensions of Hunyuan3D 2.0: - ComfyUI-Hunyuan3DWrapper - Hunyuan3D-2-for-windows - 📦 A bundle for running on Windows (all-in-one package) ## Acknowledgements We would like to thank the contributors to the DINOv2, Stable Diffusion, FLUX, diffusers, and HuggingFace repositories for their open research and exploration. ## Star History", + "model_explanation_gemini": "Generates high-resolution textured 3D assets from images or text using a two-stage pipeline for shape and texture synthesis." +} \ No newline at end of file diff --git a/data/model_data_json/theainerd_Wav2Vec2-large-xlsr-hindi.json b/data/model_data_json/theainerd_Wav2Vec2-large-xlsr-hindi.json new file mode 100644 index 0000000000000000000000000000000000000000..459da690a40b045360e706f03eb08d94e9ec72ec --- /dev/null +++ b/data/model_data_json/theainerd_Wav2Vec2-large-xlsr-hindi.json @@ -0,0 +1,18 @@ +{ + "model_id": "theainerd/Wav2Vec2-large-xlsr-hindi", + "downloads": 1406802, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "wav2vec2", + "automatic-speech-recognition", + "hi", + "base_model:facebook/wav2vec2-large-xlsr-53", + "base_model:finetune:facebook/wav2vec2-large-xlsr-53", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - hi metrics: - wer base_model: - facebook/wav2vec2-large-xlsr-53 pipeline_tag: automatic-speech-recognition --- # Wav2Vec2-Large-XLSR-53-hindi Fine-tuned facebook/wav2vec2-large-xlsr-53 on Hindi using the Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages. When using this model, make sure that your speech input is sampled at 16kHz. ## Usage The model can be used directly (without a language model) as follows (see the sketch after this entry): ## Evaluation The model can be evaluated as follows on the Hindi test data of Common Voice. **Test Result (WER)**: 72.62 % ## Training The script used for training can be found at Hindi ASR Fine Tuning Wav2Vec2", + "model_explanation_gemini": "Fine-tuned for Hindi automatic speech recognition, reaching a 72.62% word error rate on Common Voice test data, based on Facebook's wav2vec2-large-xlsr-53 model."
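The API-usage snippets in the Hunyuan3D card above are elided; a hedged sketch follows. The hy3dgen module and pipeline class names follow the project's GitHub README and should be treated as assumptions here:

```python
# Hedged sketch: shape generation, then texture synthesis (names per the GitHub README).
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
from hy3dgen.texgen import Hunyuan3DPaintPipeline

shape_pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained("tencent/Hunyuan3D-2")
mesh = shape_pipe(image="demo.png")[0]        # a trimesh object, per the card

paint_pipe = Hunyuan3DPaintPipeline.from_pretrained("tencent/Hunyuan3D-2")
textured = paint_pipe(mesh, image="demo.png")
textured.export("demo_textured.glb")          # save to glb/obj as the card notes
```

Likewise, the Hindi ASR card's usage snippet is elided; a minimal CTC transcription sketch (the audio path is a placeholder; the 16 kHz requirement is the card's):

```python
# Hedged sketch: direct transcription without a language model.
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "theainerd/Wav2Vec2-large-xlsr-hindi"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech, _ = librosa.load("hindi_sample.wav", sr=16_000)  # must be 16 kHz
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))
```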
+} \ No newline at end of file diff --git a/data/model_data_json/thenlper_gte-base.json b/data/model_data_json/thenlper_gte-base.json new file mode 100644 index 0000000000000000000000000000000000000000..26db73bd14ca4434a2d6689b4e992a48a102a53f --- /dev/null +++ b/data/model_data_json/thenlper_gte-base.json @@ -0,0 +1,25 @@ +{ + "model_id": "thenlper/gte-base", + "downloads": 313071, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "bert", + "mteb", + "sentence-similarity", + "Sentence Transformers", + "en", + "arxiv:2308.03281", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-similarity - sentence-transformers - Sentence Transformers model-index: - name: gte-base results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.17910447761193 - type: ap value: 36.827146398068926 - type: f1 value: 68.11292888046363 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.77345000000001 - type: ap value: 88.33530426691347 - type: f1 value: 91.76549906404642 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.964 - type: f1 value: 48.22995586184998 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 32.147999999999996 - type: map_at_10 value: 48.253 - type: map_at_100 value: 49.038 - type: map_at_1000 value: 49.042 - type: map_at_3 value: 43.433 - type: map_at_5 value: 46.182 - type: mrr_at_1 value: 32.717 - type: mrr_at_10 value: 48.467 - type: mrr_at_100 value: 49.252 - type: mrr_at_1000 value: 49.254999999999995 - type: mrr_at_3 value: 43.599 - type: mrr_at_5 value: 46.408 - type: ndcg_at_1 value: 32.147999999999996 - type: ndcg_at_10 value: 57.12199999999999 - type: ndcg_at_100 value: 60.316 - type: ndcg_at_1000 value: 60.402 - type: ndcg_at_3 value: 47.178 - type: ndcg_at_5 value: 52.146 - type: precision_at_1 value: 32.147999999999996 - type: precision_at_10 value: 8.542 - type: precision_at_100 value: 0.9900000000000001 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 19.346 - type: precision_at_5 value: 14.026 - type: recall_at_1 value: 32.147999999999996 - type: recall_at_10 value: 85.42 - type: recall_at_100 value: 99.004 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 58.037000000000006 - type: recall_at_5 value: 70.128 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.59706013699614 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.01463593002057 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: 
default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.80250355752458 - type: mrr value: 74.79455216989844 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.87448576082345 - type: cos_sim_spearman value: 87.64235843637468 - type: euclidean_pearson value: 88.4901825511062 - type: euclidean_spearman value: 87.74537283182033 - type: manhattan_pearson value: 88.39040638362911 - type: manhattan_spearman value: 87.62669542888003 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 85.06818181818183 - type: f1 value: 85.02524460098233 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 38.20471092679967 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.58967592147641 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.411 - type: map_at_10 value: 45.162 - type: map_at_100 value: 46.717 - type: map_at_1000 value: 46.836 - type: map_at_3 value: 41.428 - type: map_at_5 value: 43.54 - type: mrr_at_1 value: 39.914 - type: mrr_at_10 value: 51.534 - type: mrr_at_100 value: 52.185 - type: mrr_at_1000 value: 52.22 - type: mrr_at_3 value: 49.046 - type: mrr_at_5 value: 50.548 - type: ndcg_at_1 value: 39.914 - type: ndcg_at_10 value: 52.235 - type: ndcg_at_100 value: 57.4 - type: ndcg_at_1000 value: 58.982 - type: ndcg_at_3 value: 47.332 - type: ndcg_at_5 value: 49.62 - type: precision_at_1 value: 39.914 - type: precision_at_10 value: 10.258000000000001 - type: precision_at_100 value: 1.6219999999999999 - type: precision_at_1000 value: 0.20500000000000002 - type: precision_at_3 value: 23.462 - type: precision_at_5 value: 16.71 - type: recall_at_1 value: 32.411 - type: recall_at_10 value: 65.408 - type: recall_at_100 value: 87.248 - type: recall_at_1000 value: 96.951 - type: recall_at_3 value: 50.349999999999994 - type: recall_at_5 value: 57.431 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.911 - type: map_at_10 value: 42.608000000000004 - type: map_at_100 value: 43.948 - type: map_at_1000 value: 44.089 - type: map_at_3 value: 39.652 - type: map_at_5 value: 41.236 - type: mrr_at_1 value: 40.064 - type: mrr_at_10 value: 48.916 - type: mrr_at_100 value: 49.539 - type: mrr_at_1000 value: 49.583 - type: mrr_at_3 value: 46.741 - type: mrr_at_5 value: 48.037 - type: ndcg_at_1 value: 40.064 - type: ndcg_at_10 value: 48.442 - type: ndcg_at_100 value: 52.798 - type: ndcg_at_1000 value: 54.871 - type: ndcg_at_3 value: 44.528 - type: ndcg_at_5 value: 46.211 - type: precision_at_1 value: 40.064 - type: precision_at_10 value: 9.178 - type: precision_at_100 value: 1.452 - type: precision_at_1000 value: 0.193 - type: precision_at_3 value: 21.614 - type: precision_at_5 value: 15.185 - type: recall_at_1 value: 31.911 - type: 
recall_at_10 value: 58.155 - type: recall_at_100 value: 76.46300000000001 - type: recall_at_1000 value: 89.622 - type: recall_at_3 value: 46.195 - type: recall_at_5 value: 51.288999999999994 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.597 - type: map_at_10 value: 54.290000000000006 - type: map_at_100 value: 55.340999999999994 - type: map_at_1000 value: 55.388999999999996 - type: map_at_3 value: 50.931000000000004 - type: map_at_5 value: 52.839999999999996 - type: mrr_at_1 value: 46.646 - type: mrr_at_10 value: 57.524 - type: mrr_at_100 value: 58.225 - type: mrr_at_1000 value: 58.245999999999995 - type: mrr_at_3 value: 55.235 - type: mrr_at_5 value: 56.589 - type: ndcg_at_1 value: 46.646 - type: ndcg_at_10 value: 60.324999999999996 - type: ndcg_at_100 value: 64.30900000000001 - type: ndcg_at_1000 value: 65.19 - type: ndcg_at_3 value: 54.983000000000004 - type: ndcg_at_5 value: 57.621 - type: precision_at_1 value: 46.646 - type: precision_at_10 value: 9.774 - type: precision_at_100 value: 1.265 - type: precision_at_1000 value: 0.13799999999999998 - type: precision_at_3 value: 24.911 - type: precision_at_5 value: 16.977999999999998 - type: recall_at_1 value: 40.597 - type: recall_at_10 value: 74.773 - type: recall_at_100 value: 91.61200000000001 - type: recall_at_1000 value: 97.726 - type: recall_at_3 value: 60.458 - type: recall_at_5 value: 66.956 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.122 - type: map_at_10 value: 36.711 - type: map_at_100 value: 37.775 - type: map_at_1000 value: 37.842999999999996 - type: map_at_3 value: 33.693 - type: map_at_5 value: 35.607 - type: mrr_at_1 value: 29.153000000000002 - type: mrr_at_10 value: 38.873999999999995 - type: mrr_at_100 value: 39.739000000000004 - type: mrr_at_1000 value: 39.794000000000004 - type: mrr_at_3 value: 36.102000000000004 - type: mrr_at_5 value: 37.876 - type: ndcg_at_1 value: 29.153000000000002 - type: ndcg_at_10 value: 42.048 - type: ndcg_at_100 value: 47.144999999999996 - type: ndcg_at_1000 value: 48.901 - type: ndcg_at_3 value: 36.402 - type: ndcg_at_5 value: 39.562999999999995 - type: precision_at_1 value: 29.153000000000002 - type: precision_at_10 value: 6.4750000000000005 - type: precision_at_100 value: 0.951 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 15.479999999999999 - type: precision_at_5 value: 11.028 - type: recall_at_1 value: 27.122 - type: recall_at_10 value: 56.279999999999994 - type: recall_at_100 value: 79.597 - type: recall_at_1000 value: 92.804 - type: recall_at_3 value: 41.437000000000005 - type: recall_at_5 value: 49.019 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.757 - type: map_at_10 value: 26.739 - type: map_at_100 value: 28.015 - type: map_at_1000 value: 28.127999999999997 - type: map_at_3 value: 23.986 - type: map_at_5 value: 25.514 - type: mrr_at_1 value: 22.015 - type: mrr_at_10 value: 31.325999999999997 - type: mrr_at_100 value: 32.368 - type: mrr_at_1000 value: 32.426 - type: mrr_at_3 value: 28.897000000000002 - type: mrr_at_5 value: 30.147000000000002 - type: ndcg_at_1 value: 22.015 - type: ndcg_at_10 value: 32.225 - type: ndcg_at_100 value: 38.405 - type: ndcg_at_1000 value: 
40.932 - type: ndcg_at_3 value: 27.403 - type: ndcg_at_5 value: 29.587000000000003 - type: precision_at_1 value: 22.015 - type: precision_at_10 value: 5.9830000000000005 - type: precision_at_100 value: 1.051 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 13.391 - type: precision_at_5 value: 9.602 - type: recall_at_1 value: 17.757 - type: recall_at_10 value: 44.467 - type: recall_at_100 value: 71.53699999999999 - type: recall_at_1000 value: 89.281 - type: recall_at_3 value: 31.095 - type: recall_at_5 value: 36.818 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.354 - type: map_at_10 value: 42.134 - type: map_at_100 value: 43.429 - type: map_at_1000 value: 43.532 - type: map_at_3 value: 38.491 - type: map_at_5 value: 40.736 - type: mrr_at_1 value: 37.247 - type: mrr_at_10 value: 47.775 - type: mrr_at_100 value: 48.522999999999996 - type: mrr_at_1000 value: 48.567 - type: mrr_at_3 value: 45.059 - type: mrr_at_5 value: 46.811 - type: ndcg_at_1 value: 37.247 - type: ndcg_at_10 value: 48.609 - type: ndcg_at_100 value: 53.782 - type: ndcg_at_1000 value: 55.666000000000004 - type: ndcg_at_3 value: 42.866 - type: ndcg_at_5 value: 46.001 - type: precision_at_1 value: 37.247 - type: precision_at_10 value: 8.892999999999999 - type: precision_at_100 value: 1.341 - type: precision_at_1000 value: 0.168 - type: precision_at_3 value: 20.5 - type: precision_at_5 value: 14.976 - type: recall_at_1 value: 30.354 - type: recall_at_10 value: 62.273 - type: recall_at_100 value: 83.65599999999999 - type: recall_at_1000 value: 95.82000000000001 - type: recall_at_3 value: 46.464 - type: recall_at_5 value: 54.225 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.949 - type: map_at_10 value: 37.230000000000004 - type: map_at_100 value: 38.644 - type: map_at_1000 value: 38.751999999999995 - type: map_at_3 value: 33.816 - type: map_at_5 value: 35.817 - type: mrr_at_1 value: 33.446999999999996 - type: mrr_at_10 value: 42.970000000000006 - type: mrr_at_100 value: 43.873 - type: mrr_at_1000 value: 43.922 - type: mrr_at_3 value: 40.467999999999996 - type: mrr_at_5 value: 41.861 - type: ndcg_at_1 value: 33.446999999999996 - type: ndcg_at_10 value: 43.403000000000006 - type: ndcg_at_100 value: 49.247 - type: ndcg_at_1000 value: 51.361999999999995 - type: ndcg_at_3 value: 38.155 - type: ndcg_at_5 value: 40.643 - type: precision_at_1 value: 33.446999999999996 - type: precision_at_10 value: 8.128 - type: precision_at_100 value: 1.274 - type: precision_at_1000 value: 0.163 - type: precision_at_3 value: 18.493000000000002 - type: precision_at_5 value: 13.333 - type: recall_at_1 value: 26.949 - type: recall_at_10 value: 56.006 - type: recall_at_100 value: 80.99199999999999 - type: recall_at_1000 value: 95.074 - type: recall_at_3 value: 40.809 - type: recall_at_5 value: 47.57 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.243583333333333 - type: map_at_10 value: 37.193250000000006 - type: map_at_100 value: 38.44833333333334 - type: map_at_1000 value: 38.56083333333333 - type: map_at_3 value: 34.06633333333333 - type: map_at_5 value: 35.87858333333334 - type: mrr_at_1 value: 32.291583333333335 - type: mrr_at_10 value: 
41.482749999999996 - type: mrr_at_100 value: 42.33583333333333 - type: mrr_at_1000 value: 42.38683333333333 - type: mrr_at_3 value: 38.952999999999996 - type: mrr_at_5 value: 40.45333333333333 - type: ndcg_at_1 value: 32.291583333333335 - type: ndcg_at_10 value: 42.90533333333334 - type: ndcg_at_100 value: 48.138666666666666 - type: ndcg_at_1000 value: 50.229083333333335 - type: ndcg_at_3 value: 37.76133333333334 - type: ndcg_at_5 value: 40.31033333333334 - type: precision_at_1 value: 32.291583333333335 - type: precision_at_10 value: 7.585583333333333 - type: precision_at_100 value: 1.2045000000000001 - type: precision_at_1000 value: 0.15733333333333335 - type: precision_at_3 value: 17.485416666666666 - type: precision_at_5 value: 12.5145 - type: recall_at_1 value: 27.243583333333333 - type: recall_at_10 value: 55.45108333333334 - type: recall_at_100 value: 78.25858333333335 - type: recall_at_1000 value: 92.61716666666665 - type: recall_at_3 value: 41.130583333333334 - type: recall_at_5 value: 47.73133333333334 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 26.325 - type: map_at_10 value: 32.795 - type: map_at_100 value: 33.96 - type: map_at_1000 value: 34.054 - type: map_at_3 value: 30.64 - type: map_at_5 value: 31.771 - type: mrr_at_1 value: 29.908 - type: mrr_at_10 value: 35.83 - type: mrr_at_100 value: 36.868 - type: mrr_at_1000 value: 36.928 - type: mrr_at_3 value: 33.896 - type: mrr_at_5 value: 34.893 - type: ndcg_at_1 value: 29.908 - type: ndcg_at_10 value: 36.746 - type: ndcg_at_100 value: 42.225 - type: ndcg_at_1000 value: 44.523 - type: ndcg_at_3 value: 32.82 - type: ndcg_at_5 value: 34.583000000000006 - type: precision_at_1 value: 29.908 - type: precision_at_10 value: 5.6129999999999995 - type: precision_at_100 value: 0.9079999999999999 - type: precision_at_1000 value: 0.11800000000000001 - type: precision_at_3 value: 13.753000000000002 - type: precision_at_5 value: 9.417 - type: recall_at_1 value: 26.325 - type: recall_at_10 value: 45.975 - type: recall_at_100 value: 70.393 - type: recall_at_1000 value: 87.217 - type: recall_at_3 value: 35.195 - type: recall_at_5 value: 39.69 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 17.828 - type: map_at_10 value: 25.759 - type: map_at_100 value: 26.961000000000002 - type: map_at_1000 value: 27.094 - type: map_at_3 value: 23.166999999999998 - type: map_at_5 value: 24.610000000000003 - type: mrr_at_1 value: 21.61 - type: mrr_at_10 value: 29.605999999999998 - type: mrr_at_100 value: 30.586000000000002 - type: mrr_at_1000 value: 30.664 - type: mrr_at_3 value: 27.214 - type: mrr_at_5 value: 28.571 - type: ndcg_at_1 value: 21.61 - type: ndcg_at_10 value: 30.740000000000002 - type: ndcg_at_100 value: 36.332 - type: ndcg_at_1000 value: 39.296 - type: ndcg_at_3 value: 26.11 - type: ndcg_at_5 value: 28.297 - type: precision_at_1 value: 21.61 - type: precision_at_10 value: 5.643 - type: precision_at_100 value: 1.0 - type: precision_at_1000 value: 0.14400000000000002 - type: precision_at_3 value: 12.4 - type: precision_at_5 value: 9.119 - type: recall_at_1 value: 17.828 - type: recall_at_10 value: 41.876000000000005 - type: recall_at_100 value: 66.648 - type: recall_at_1000 value: 87.763 - type: recall_at_3 value: 28.957 - type: recall_at_5 value: 34.494 - task: type: Retrieval dataset: type: BeIR/cqadupstack 
name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.921000000000003 - type: map_at_10 value: 37.156 - type: map_at_100 value: 38.399 - type: map_at_1000 value: 38.498 - type: map_at_3 value: 34.134 - type: map_at_5 value: 35.936 - type: mrr_at_1 value: 32.649 - type: mrr_at_10 value: 41.19 - type: mrr_at_100 value: 42.102000000000004 - type: mrr_at_1000 value: 42.157 - type: mrr_at_3 value: 38.464 - type: mrr_at_5 value: 40.148 - type: ndcg_at_1 value: 32.649 - type: ndcg_at_10 value: 42.679 - type: ndcg_at_100 value: 48.27 - type: ndcg_at_1000 value: 50.312 - type: ndcg_at_3 value: 37.269000000000005 - type: ndcg_at_5 value: 40.055 - type: precision_at_1 value: 32.649 - type: precision_at_10 value: 7.155 - type: precision_at_100 value: 1.124 - type: precision_at_1000 value: 0.14100000000000001 - type: precision_at_3 value: 16.791 - type: precision_at_5 value: 12.015 - type: recall_at_1 value: 27.921000000000003 - type: recall_at_10 value: 55.357 - type: recall_at_100 value: 79.476 - type: recall_at_1000 value: 93.314 - type: recall_at_3 value: 40.891 - type: recall_at_5 value: 47.851 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.524 - type: map_at_10 value: 35.135 - type: map_at_100 value: 36.665 - type: map_at_1000 value: 36.886 - type: map_at_3 value: 31.367 - type: map_at_5 value: 33.724 - type: mrr_at_1 value: 30.631999999999998 - type: mrr_at_10 value: 39.616 - type: mrr_at_100 value: 40.54 - type: mrr_at_1000 value: 40.585 - type: mrr_at_3 value: 36.462 - type: mrr_at_5 value: 38.507999999999996 - type: ndcg_at_1 value: 30.631999999999998 - type: ndcg_at_10 value: 41.61 - type: ndcg_at_100 value: 47.249 - type: ndcg_at_1000 value: 49.662 - type: ndcg_at_3 value: 35.421 - type: ndcg_at_5 value: 38.811 - type: precision_at_1 value: 30.631999999999998 - type: precision_at_10 value: 8.123 - type: precision_at_100 value: 1.5810000000000002 - type: precision_at_1000 value: 0.245 - type: precision_at_3 value: 16.337 - type: precision_at_5 value: 12.568999999999999 - type: recall_at_1 value: 25.524 - type: recall_at_10 value: 54.994 - type: recall_at_100 value: 80.03099999999999 - type: recall_at_1000 value: 95.25099999999999 - type: recall_at_3 value: 37.563 - type: recall_at_5 value: 46.428999999999995 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 22.224 - type: map_at_10 value: 30.599999999999998 - type: map_at_100 value: 31.526 - type: map_at_1000 value: 31.629 - type: map_at_3 value: 27.491 - type: map_at_5 value: 29.212 - type: mrr_at_1 value: 24.214 - type: mrr_at_10 value: 32.632 - type: mrr_at_100 value: 33.482 - type: mrr_at_1000 value: 33.550000000000004 - type: mrr_at_3 value: 29.852 - type: mrr_at_5 value: 31.451 - type: ndcg_at_1 value: 24.214 - type: ndcg_at_10 value: 35.802 - type: ndcg_at_100 value: 40.502 - type: ndcg_at_1000 value: 43.052 - type: ndcg_at_3 value: 29.847 - type: ndcg_at_5 value: 32.732 - type: precision_at_1 value: 24.214 - type: precision_at_10 value: 5.804 - type: precision_at_100 value: 0.885 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 12.692999999999998 - type: precision_at_5 value: 9.242 - type: recall_at_1 value: 22.224 - type: recall_at_10 value: 49.849 - type: recall_at_100 value: 71.45 - type: recall_at_1000 
value: 90.583 - type: recall_at_3 value: 34.153 - type: recall_at_5 value: 41.004000000000005 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 12.386999999999999 - type: map_at_10 value: 20.182 - type: map_at_100 value: 21.86 - type: map_at_1000 value: 22.054000000000002 - type: map_at_3 value: 17.165 - type: map_at_5 value: 18.643 - type: mrr_at_1 value: 26.906000000000002 - type: mrr_at_10 value: 37.907999999999994 - type: mrr_at_100 value: 38.868 - type: mrr_at_1000 value: 38.913 - type: mrr_at_3 value: 34.853 - type: mrr_at_5 value: 36.567 - type: ndcg_at_1 value: 26.906000000000002 - type: ndcg_at_10 value: 28.103 - type: ndcg_at_100 value: 35.073 - type: ndcg_at_1000 value: 38.653 - type: ndcg_at_3 value: 23.345 - type: ndcg_at_5 value: 24.828 - type: precision_at_1 value: 26.906000000000002 - type: precision_at_10 value: 8.547 - type: precision_at_100 value: 1.617 - type: precision_at_1000 value: 0.22799999999999998 - type: precision_at_3 value: 17.025000000000002 - type: precision_at_5 value: 12.834000000000001 - type: recall_at_1 value: 12.386999999999999 - type: recall_at_10 value: 33.306999999999995 - type: recall_at_100 value: 57.516 - type: recall_at_1000 value: 77.74799999999999 - type: recall_at_3 value: 21.433 - type: recall_at_5 value: 25.915 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.322 - type: map_at_10 value: 20.469 - type: map_at_100 value: 28.638 - type: map_at_1000 value: 30.433 - type: map_at_3 value: 14.802000000000001 - type: map_at_5 value: 17.297 - type: mrr_at_1 value: 68.75 - type: mrr_at_10 value: 76.29599999999999 - type: mrr_at_100 value: 76.62400000000001 - type: mrr_at_1000 value: 76.633 - type: mrr_at_3 value: 75.083 - type: mrr_at_5 value: 75.771 - type: ndcg_at_1 value: 54.87499999999999 - type: ndcg_at_10 value: 41.185 - type: ndcg_at_100 value: 46.400000000000006 - type: ndcg_at_1000 value: 54.223 - type: ndcg_at_3 value: 45.489000000000004 - type: ndcg_at_5 value: 43.161 - type: precision_at_1 value: 68.75 - type: precision_at_10 value: 32.300000000000004 - type: precision_at_100 value: 10.607999999999999 - type: precision_at_1000 value: 2.237 - type: precision_at_3 value: 49.083 - type: precision_at_5 value: 41.6 - type: recall_at_1 value: 9.322 - type: recall_at_10 value: 25.696 - type: recall_at_100 value: 52.898 - type: recall_at_1000 value: 77.281 - type: recall_at_3 value: 15.943 - type: recall_at_5 value: 19.836000000000002 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 48.650000000000006 - type: f1 value: 43.528467245539396 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 66.56 - type: map_at_10 value: 76.767 - type: map_at_100 value: 77.054 - type: map_at_1000 value: 77.068 - type: map_at_3 value: 75.29299999999999 - type: map_at_5 value: 76.24 - type: mrr_at_1 value: 71.842 - type: mrr_at_10 value: 81.459 - type: mrr_at_100 value: 81.58800000000001 - type: mrr_at_1000 value: 81.59100000000001 - type: mrr_at_3 value: 80.188 - type: mrr_at_5 value: 81.038 - type: ndcg_at_1 value: 71.842 - type: ndcg_at_10 value: 81.51899999999999 - type: ndcg_at_100 value: 82.544 - type: ndcg_at_1000 value: 
82.829 - type: ndcg_at_3 value: 78.92 - type: ndcg_at_5 value: 80.406 - type: precision_at_1 value: 71.842 - type: precision_at_10 value: 10.066 - type: precision_at_100 value: 1.076 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 30.703000000000003 - type: precision_at_5 value: 19.301 - type: recall_at_1 value: 66.56 - type: recall_at_10 value: 91.55 - type: recall_at_100 value: 95.67099999999999 - type: recall_at_1000 value: 97.539 - type: recall_at_3 value: 84.46900000000001 - type: recall_at_5 value: 88.201 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 20.087 - type: map_at_10 value: 32.830999999999996 - type: map_at_100 value: 34.814 - type: map_at_1000 value: 34.999 - type: map_at_3 value: 28.198 - type: map_at_5 value: 30.779 - type: mrr_at_1 value: 38.889 - type: mrr_at_10 value: 48.415 - type: mrr_at_100 value: 49.187 - type: mrr_at_1000 value: 49.226 - type: mrr_at_3 value: 45.705 - type: mrr_at_5 value: 47.225 - type: ndcg_at_1 value: 38.889 - type: ndcg_at_10 value: 40.758 - type: ndcg_at_100 value: 47.671 - type: ndcg_at_1000 value: 50.744 - type: ndcg_at_3 value: 36.296 - type: ndcg_at_5 value: 37.852999999999994 - type: precision_at_1 value: 38.889 - type: precision_at_10 value: 11.466 - type: precision_at_100 value: 1.8499999999999999 - type: precision_at_1000 value: 0.24 - type: precision_at_3 value: 24.126 - type: precision_at_5 value: 18.21 - type: recall_at_1 value: 20.087 - type: recall_at_10 value: 48.042 - type: recall_at_100 value: 73.493 - type: recall_at_1000 value: 91.851 - type: recall_at_3 value: 32.694 - type: recall_at_5 value: 39.099000000000004 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 38.096000000000004 - type: map_at_10 value: 56.99999999999999 - type: map_at_100 value: 57.914 - type: map_at_1000 value: 57.984 - type: map_at_3 value: 53.900999999999996 - type: map_at_5 value: 55.827000000000005 - type: mrr_at_1 value: 76.19200000000001 - type: mrr_at_10 value: 81.955 - type: mrr_at_100 value: 82.164 - type: mrr_at_1000 value: 82.173 - type: mrr_at_3 value: 80.963 - type: mrr_at_5 value: 81.574 - type: ndcg_at_1 value: 76.19200000000001 - type: ndcg_at_10 value: 65.75 - type: ndcg_at_100 value: 68.949 - type: ndcg_at_1000 value: 70.342 - type: ndcg_at_3 value: 61.29 - type: ndcg_at_5 value: 63.747 - type: precision_at_1 value: 76.19200000000001 - type: precision_at_10 value: 13.571 - type: precision_at_100 value: 1.6070000000000002 - type: precision_at_1000 value: 0.179 - type: precision_at_3 value: 38.663 - type: precision_at_5 value: 25.136999999999997 - type: recall_at_1 value: 38.096000000000004 - type: recall_at_10 value: 67.853 - type: recall_at_100 value: 80.365 - type: recall_at_1000 value: 89.629 - type: recall_at_3 value: 57.995 - type: recall_at_5 value: 62.843 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 85.95200000000001 - type: ap value: 80.73847277002109 - type: f1 value: 85.92406135678594 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 20.916999999999998 - type: map_at_10 value: 33.23 - type: map_at_100 value: 34.427 - type: map_at_1000 value: 34.477000000000004 - type: 
map_at_3 value: 29.292 - type: map_at_5 value: 31.6 - type: mrr_at_1 value: 21.547 - type: mrr_at_10 value: 33.839999999999996 - type: mrr_at_100 value: 34.979 - type: mrr_at_1000 value: 35.022999999999996 - type: mrr_at_3 value: 29.988 - type: mrr_at_5 value: 32.259 - type: ndcg_at_1 value: 21.519 - type: ndcg_at_10 value: 40.209 - type: ndcg_at_100 value: 45.954 - type: ndcg_at_1000 value: 47.187 - type: ndcg_at_3 value: 32.227 - type: ndcg_at_5 value: 36.347 - type: precision_at_1 value: 21.519 - type: precision_at_10 value: 6.447 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 13.877999999999998 - type: precision_at_5 value: 10.404 - type: recall_at_1 value: 20.916999999999998 - type: recall_at_10 value: 61.7 - type: recall_at_100 value: 88.202 - type: recall_at_1000 value: 97.588 - type: recall_at_3 value: 40.044999999999995 - type: recall_at_5 value: 49.964999999999996 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.02781577747379 - type: f1 value: 92.83653922768306 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 72.04286365709075 - type: f1 value: 53.43867658525793 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 71.47276395427035 - type: f1 value: 69.77017399597342 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.3819771351715 - type: f1 value: 76.8484533435409 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.16515993299593 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.77145323314774 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.53637706586391 - type: mrr value: 33.7312926288863 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 7.063999999999999 - type: map_at_10 value: 15.046999999999999 - type: map_at_100 value: 19.116 - type: map_at_1000 value: 20.702 - type: map_at_3 value: 10.932 - type: map_at_5 value: 12.751999999999999 - type: mrr_at_1 value: 50.464 - type: mrr_at_10 value: 58.189 - type: mrr_at_100 value: 58.733999999999995 - type: mrr_at_1000 value: 58.769000000000005 - type: mrr_at_3 value: 56.24400000000001 - type: mrr_at_5 value: 57.68299999999999 - type: ndcg_at_1 value: 48.142 - type: ndcg_at_10 value: 37.897 - type: ndcg_at_100 value: 35.264 - type: ndcg_at_1000 value: 44.033 - type: ndcg_at_3 value: 42.967 - type: ndcg_at_5 value: 40.815 - type: precision_at_1 value: 
50.15500000000001 - type: precision_at_10 value: 28.235 - type: precision_at_100 value: 8.994 - type: precision_at_1000 value: 2.218 - type: precision_at_3 value: 40.041 - type: precision_at_5 value: 35.046 - type: recall_at_1 value: 7.063999999999999 - type: recall_at_10 value: 18.598 - type: recall_at_100 value: 35.577999999999996 - type: recall_at_1000 value: 67.43 - type: recall_at_3 value: 11.562999999999999 - type: recall_at_5 value: 14.771 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 29.046 - type: map_at_10 value: 44.808 - type: map_at_100 value: 45.898 - type: map_at_1000 value: 45.927 - type: map_at_3 value: 40.19 - type: map_at_5 value: 42.897 - type: mrr_at_1 value: 32.706 - type: mrr_at_10 value: 47.275 - type: mrr_at_100 value: 48.075 - type: mrr_at_1000 value: 48.095 - type: mrr_at_3 value: 43.463 - type: mrr_at_5 value: 45.741 - type: ndcg_at_1 value: 32.706 - type: ndcg_at_10 value: 52.835 - type: ndcg_at_100 value: 57.345 - type: ndcg_at_1000 value: 57.985 - type: ndcg_at_3 value: 44.171 - type: ndcg_at_5 value: 48.661 - type: precision_at_1 value: 32.706 - type: precision_at_10 value: 8.895999999999999 - type: precision_at_100 value: 1.143 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 20.238999999999997 - type: precision_at_5 value: 14.728 - type: recall_at_1 value: 29.046 - type: recall_at_10 value: 74.831 - type: recall_at_100 value: 94.192 - type: recall_at_1000 value: 98.897 - type: recall_at_3 value: 52.37500000000001 - type: recall_at_5 value: 62.732 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.38799999999999 - type: map_at_10 value: 84.315 - type: map_at_100 value: 84.955 - type: map_at_1000 value: 84.971 - type: map_at_3 value: 81.33399999999999 - type: map_at_5 value: 83.21300000000001 - type: mrr_at_1 value: 81.03 - type: mrr_at_10 value: 87.395 - type: mrr_at_100 value: 87.488 - type: mrr_at_1000 value: 87.48899999999999 - type: mrr_at_3 value: 86.41499999999999 - type: mrr_at_5 value: 87.074 - type: ndcg_at_1 value: 81.04 - type: ndcg_at_10 value: 88.151 - type: ndcg_at_100 value: 89.38199999999999 - type: ndcg_at_1000 value: 89.479 - type: ndcg_at_3 value: 85.24000000000001 - type: ndcg_at_5 value: 86.856 - type: precision_at_1 value: 81.04 - type: precision_at_10 value: 13.372 - type: precision_at_100 value: 1.526 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.217 - type: precision_at_5 value: 24.502 - type: recall_at_1 value: 70.38799999999999 - type: recall_at_10 value: 95.452 - type: recall_at_100 value: 99.59700000000001 - type: recall_at_1000 value: 99.988 - type: recall_at_3 value: 87.11 - type: recall_at_5 value: 91.662 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 59.334991029213235 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 62.586500854616666 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 5.153 - type: map_at_10 value: 14.277000000000001 - type: map_at_100 value: 16.922 - type: map_at_1000 value: 
17.302999999999997 - type: map_at_3 value: 9.961 - type: map_at_5 value: 12.257 - type: mrr_at_1 value: 25.4 - type: mrr_at_10 value: 37.458000000000006 - type: mrr_at_100 value: 38.681 - type: mrr_at_1000 value: 38.722 - type: mrr_at_3 value: 34.1 - type: mrr_at_5 value: 36.17 - type: ndcg_at_1 value: 25.4 - type: ndcg_at_10 value: 23.132 - type: ndcg_at_100 value: 32.908 - type: ndcg_at_1000 value: 38.754 - type: ndcg_at_3 value: 21.82 - type: ndcg_at_5 value: 19.353 - type: precision_at_1 value: 25.4 - type: precision_at_10 value: 12.1 - type: precision_at_100 value: 2.628 - type: precision_at_1000 value: 0.402 - type: precision_at_3 value: 20.732999999999997 - type: precision_at_5 value: 17.34 - type: recall_at_1 value: 5.153 - type: recall_at_10 value: 24.54 - type: recall_at_100 value: 53.293 - type: recall_at_1000 value: 81.57 - type: recall_at_3 value: 12.613 - type: recall_at_5 value: 17.577 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.86284404925333 - type: cos_sim_spearman value: 78.85870555294795 - type: euclidean_pearson value: 82.20105295276093 - type: euclidean_spearman value: 78.92125617009592 - type: manhattan_pearson value: 82.15840025289069 - type: manhattan_spearman value: 78.85955732900803 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 84.98747423389027 - type: cos_sim_spearman value: 75.71298531799367 - type: euclidean_pearson value: 81.59709559192291 - type: euclidean_spearman value: 75.40622749225653 - type: manhattan_pearson value: 81.55553547608804 - type: manhattan_spearman value: 75.39380235424899 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 83.76861330695503 - type: cos_sim_spearman value: 85.72991921531624 - type: euclidean_pearson value: 84.84504307397536 - type: euclidean_spearman value: 86.02679162824732 - type: manhattan_pearson value: 84.79969439220142 - type: manhattan_spearman value: 85.99238837291625 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 83.31929747511796 - type: cos_sim_spearman value: 81.50806522502528 - type: euclidean_pearson value: 82.93936686512777 - type: euclidean_spearman value: 81.54403447993224 - type: manhattan_pearson value: 82.89696981900828 - type: manhattan_spearman value: 81.52817825470865 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 87.14413295332908 - type: cos_sim_spearman value: 88.81032027008195 - type: euclidean_pearson value: 88.19205563407645 - type: euclidean_spearman value: 88.89738339479216 - type: manhattan_pearson value: 88.11075942004189 - type: manhattan_spearman value: 88.8297061675564 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 82.15980075557017 - type: cos_sim_spearman value: 83.81896308594801 - type: euclidean_pearson value: 83.11195254311338 - type: euclidean_spearman value: 
84.10479481755407 - type: manhattan_pearson value: 83.13915225100556 - type: manhattan_spearman value: 84.09895591027859 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.93669480147919 - type: cos_sim_spearman value: 87.89861394614361 - type: euclidean_pearson value: 88.37316413202339 - type: euclidean_spearman value: 88.18033817842569 - type: manhattan_pearson value: 88.39427578879469 - type: manhattan_spearman value: 88.09185009236847 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 66.62215083348255 - type: cos_sim_spearman value: 67.33243665716736 - type: euclidean_pearson value: 67.60871701996284 - type: euclidean_spearman value: 66.75929225238659 - type: manhattan_pearson value: 67.63907838970992 - type: manhattan_spearman value: 66.79313656754846 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.65549191934764 - type: cos_sim_spearman value: 85.73266847750143 - type: euclidean_pearson value: 85.75609932254318 - type: euclidean_spearman value: 85.9452287759371 - type: manhattan_pearson value: 85.69717413063573 - type: manhattan_spearman value: 85.86546318377046 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.08164129085783 - type: mrr value: 96.2877273416489 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 62.09400000000001 - type: map_at_10 value: 71.712 - type: map_at_100 value: 72.128 - type: map_at_1000 value: 72.14399999999999 - type: map_at_3 value: 68.93 - type: map_at_5 value: 70.694 - type: mrr_at_1 value: 65.0 - type: mrr_at_10 value: 72.572 - type: mrr_at_100 value: 72.842 - type: mrr_at_1000 value: 72.856 - type: mrr_at_3 value: 70.44399999999999 - type: mrr_at_5 value: 71.744 - type: ndcg_at_1 value: 65.0 - type: ndcg_at_10 value: 76.178 - type: ndcg_at_100 value: 77.887 - type: ndcg_at_1000 value: 78.227 - type: ndcg_at_3 value: 71.367 - type: ndcg_at_5 value: 73.938 - type: precision_at_1 value: 65.0 - type: precision_at_10 value: 10.033 - type: precision_at_100 value: 1.097 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 27.667 - type: precision_at_5 value: 18.4 - type: recall_at_1 value: 62.09400000000001 - type: recall_at_10 value: 89.022 - type: recall_at_100 value: 96.833 - type: recall_at_1000 value: 99.333 - type: recall_at_3 value: 75.922 - type: recall_at_5 value: 82.428 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.82178217821782 - type: cos_sim_ap value: 95.71282508220798 - type: cos_sim_f1 value: 90.73120494335737 - type: cos_sim_precision value: 93.52441613588111 - type: cos_sim_recall value: 88.1 - type: dot_accuracy value: 99.73960396039604 - type: dot_ap value: 92.98534606529098 - type: dot_f1 value: 86.83024536805209 - type: 
dot_precision value: 86.96088264794383 - type: dot_recall value: 86.7 - type: euclidean_accuracy value: 99.82475247524752 - type: euclidean_ap value: 95.72927039014849 - type: euclidean_f1 value: 90.89974293059126 - type: euclidean_precision value: 93.54497354497354 - type: euclidean_recall value: 88.4 - type: manhattan_accuracy value: 99.82574257425742 - type: manhattan_ap value: 95.72142177390405 - type: manhattan_f1 value: 91.00152516522625 - type: manhattan_precision value: 92.55429162357808 - type: manhattan_recall value: 89.5 - type: max_accuracy value: 99.82574257425742 - type: max_ap value: 95.72927039014849 - type: max_f1 value: 91.00152516522625 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 66.63957663468679 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 36.003307257923964 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 53.005825525863905 - type: mrr value: 53.854683919022165 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.503611569974098 - type: cos_sim_spearman value: 31.17155564248449 - type: dot_pearson value: 26.740428413981306 - type: dot_spearman value: 26.55727635469746 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.23600000000000002 - type: map_at_10 value: 1.7670000000000001 - type: map_at_100 value: 10.208 - type: map_at_1000 value: 25.997999999999998 - type: map_at_3 value: 0.605 - type: map_at_5 value: 0.9560000000000001 - type: mrr_at_1 value: 84.0 - type: mrr_at_10 value: 90.167 - type: mrr_at_100 value: 90.167 - type: mrr_at_1000 value: 90.167 - type: mrr_at_3 value: 89.667 - type: mrr_at_5 value: 90.167 - type: ndcg_at_1 value: 77.0 - type: ndcg_at_10 value: 68.783 - type: ndcg_at_100 value: 54.196 - type: ndcg_at_1000 value: 52.077 - type: ndcg_at_3 value: 71.642 - type: ndcg_at_5 value: 70.45700000000001 - type: precision_at_1 value: 84.0 - type: precision_at_10 value: 73.0 - type: precision_at_100 value: 55.48 - type: precision_at_1000 value: 23.102 - type: precision_at_3 value: 76.0 - type: precision_at_5 value: 74.8 - type: recall_at_1 value: 0.23600000000000002 - type: recall_at_10 value: 1.9869999999999999 - type: recall_at_100 value: 13.749 - type: recall_at_1000 value: 50.157 - type: recall_at_3 value: 0.633 - type: recall_at_5 value: 1.0290000000000001 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 1.437 - type: map_at_10 value: 8.791 - type: map_at_100 value: 15.001999999999999 - type: map_at_1000 value: 16.549 - type: map_at_3 value: 3.8080000000000003 - type: map_at_5 value: 5.632000000000001 - type: mrr_at_1 value: 20.408 - type: mrr_at_10 value: 36.96 - type: mrr_at_100 value: 37.912 - type: mrr_at_1000 value: 37.912 - type: mrr_at_3 value: 29.592000000000002 - type: mrr_at_5 
value: 34.489999999999995 - type: ndcg_at_1 value: 19.387999999999998 - type: ndcg_at_10 value: 22.554 - type: ndcg_at_100 value: 35.197 - type: ndcg_at_1000 value: 46.58 - type: ndcg_at_3 value: 20.285 - type: ndcg_at_5 value: 21.924 - type: precision_at_1 value: 20.408 - type: precision_at_10 value: 21.837 - type: precision_at_100 value: 7.754999999999999 - type: precision_at_1000 value: 1.537 - type: precision_at_3 value: 21.769 - type: precision_at_5 value: 23.673 - type: recall_at_1 value: 1.437 - type: recall_at_10 value: 16.314999999999998 - type: recall_at_100 value: 47.635 - type: recall_at_1000 value: 82.963 - type: recall_at_3 value: 4.955 - type: recall_at_5 value: 8.805 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 71.6128 - type: ap value: 14.279639861175664 - type: f1 value: 54.922292491204274 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 57.01188455008489 - type: f1 value: 57.377953019225515 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 52.306769136544254 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.64701674912082 - type: cos_sim_ap value: 72.46600945328552 - type: cos_sim_f1 value: 67.96572367648784 - type: cos_sim_precision value: 61.21801649397336 - type: cos_sim_recall value: 76.38522427440633 - type: dot_accuracy value: 82.33295583238957 - type: dot_ap value: 62.54843443071716 - type: dot_f1 value: 60.38378562507096 - type: dot_precision value: 52.99980067769583 - type: dot_recall value: 70.15831134564644 - type: euclidean_accuracy value: 85.7423854085951 - type: euclidean_ap value: 72.76873850945174 - type: euclidean_f1 value: 68.23556960543262 - type: euclidean_precision value: 61.3344559040202 - type: euclidean_recall value: 76.88654353562005 - type: manhattan_accuracy value: 85.74834594981225 - type: manhattan_ap value: 72.66825372446462 - type: manhattan_f1 value: 68.21539194662853 - type: manhattan_precision value: 62.185056472632496 - type: manhattan_recall value: 75.54089709762533 - type: max_accuracy value: 85.74834594981225 - type: max_ap value: 72.76873850945174 - type: max_f1 value: 68.23556960543262 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.73171110334924 - type: cos_sim_ap value: 85.51855542063649 - type: cos_sim_f1 value: 77.95706775700934 - type: cos_sim_precision value: 74.12524298805887 - type: cos_sim_recall value: 82.20665229442562 - type: dot_accuracy value: 86.94842240074514 - type: dot_ap value: 80.90995345771762 - type: dot_f1 value: 74.20765027322403 - type: dot_precision value: 70.42594385285575 - type: dot_recall value: 78.41854019094548 - type: euclidean_accuracy value: 88.73753250281368 - type: euclidean_ap 
value: 85.54712254033734 - type: euclidean_f1 value: 78.07565728654365 - type: euclidean_precision value: 75.1120597652081 - type: euclidean_recall value: 81.282722513089 - type: manhattan_accuracy value: 88.72588970388482 - type: manhattan_ap value: 85.52118291594071 - type: manhattan_f1 value: 78.04428724070593 - type: manhattan_precision value: 74.83219105490002 - type: manhattan_recall value: 81.54450261780106 - type: max_accuracy value: 88.73753250281368 - type: max_ap value: 85.54712254033734 - type: max_f1 value: 78.07565728654365 language: - en license: mit --- # gte-base General Text Embeddings (GTE) model. Towards General Text Embeddings with Multi-stage Contrastive Learning The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks of text embeddings, including **information retrieval**, **semantic textual similarity**, **text reranking**, etc. ## Metrics We compared the performance of the GTE models with other popular text embedding models on the MTEB benchmark. For more detailed comparison results, please refer to the MTEB leaderboard. | Model Name | Model Size (GB) | Dimension | Sequence Length | Average (56) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large** | 0.67 | 1024 | 512 | **63.13** | 46.84 | 85.00 | 59.13 | 52.22 | 83.35 | 31.66 | 73.33 | | **gte-base** | 0.22 | 768 | 512 | **62.39** | 46.2 | 84.57 | 58.61 | 51.14 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1.34 | 1024| 512 | 62.25 | 44.49 | 86.03 | 56.61 | 50.56 | 82.05 | 30.19 | 75.24 | | e5-base-v2 | 0.44 | 768 | 512 | 61.5 | 43.80 | 85.73 | 55.91 | 50.29 | 81.05 | 30.28 | 73.84 | | **gte-small** | 0.07 | 384 | 512 | **61.36** | 44.89 | 83.54 | 57.7 | 49.46 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | - | 1536 | 8192 | 60.99 | 45.9 | 84.89 | 56.32 | 49.25 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 0.13 | 384 | 512 | 59.93 | 39.92 | 84.67 | 54.32 | 49.04 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 9.73 | 768 | 512 | 59.51 | 43.72 | 85.06 | 56.42 | 42.24 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 0.44 | 768 | 514 | 57.78 | 43.69 | 83.04 | 59.36 | 43.81 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 28.27 | 4096 | 2048 | 57.59 | 38.93 | 81.9 | 55.65 | 48.22 | 77.74 | 33.6 | 66.19 | | all-MiniLM-L12-v2 | 0.13 | 384 | 512 | 56.53 | 41.81 | 82.41 | 58.44 | 42.69 | 79.8 | 27.9 | 63.21 | | all-MiniLM-L6-v2 | 0.09 | 384 | 512 | 56.26 | 42.35 | 82.37 | 58.04 | 41.95 | 78.9 | 30.81 | 63.05 | | contriever-base-msmarco | 0.44 | 768 | 512 | 56.00 | 41.1 | 82.54 | 53.14 | 41.88 | 76.51 | 30.36 | 66.68 | | sentence-t5-base | 0.22 | 768 | 512 | 55.27 | 40.21 | 85.18 | 53.09 | 33.63 | 81.14 | 31.39 | 69.81 | ## Usage Code example Use with sentence-transformers: ### Limitation This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens. 
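The "Usage" snippet above was stripped when this card was flattened into JSON; the following is a minimal sketch of the sentence-transformers path it refers to, assuming only the `sentence-transformers` package and the `thenlper/gte-base` model id. The example sentences are illustrative, not from the original card.

```python
# Minimal sketch (not the card's original snippet): embed texts with
# sentence-transformers and score them by cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Per the comparison table: 768-dim embeddings, 512-token sequence limit.
model = SentenceTransformer("thenlper/gte-base")

sentences = [
    "what is the capital of China?",        # illustrative query
    "Beijing is the capital of China.",     # relevant candidate
    "Quicksort is a sorting algorithm.",    # irrelevant candidate
]
embeddings = model.encode(sentences)  # array of shape (3, 768)

# Cosine similarity of the query against the two candidates.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)
```

Consistent with the Limitation section, any input longer than 512 tokens is truncated before encoding, so long documents should be chunked upstream if full coverage matters.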
### Citation If you find our paper or models helpful, please consider citing them as follows:", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity, classification, clustering, and retrieval across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/thenlper_gte-large.json b/data/model_data_json/thenlper_gte-large.json new file mode 100644 index 0000000000000000000000000000000000000000..10799be12dff5f9d225072af84daabaca35645b1 --- /dev/null +++ b/data/model_data_json/thenlper_gte-large.json @@ -0,0 +1,25 @@ +{ + "model_id": "thenlper/gte-large", + "downloads": 1655189, + "tags": [ + "sentence-transformers", + "pytorch", + "onnx", + "safetensors", + "openvino", + "bert", + "mteb", + "sentence-similarity", + "Sentence Transformers", + "en", + "arxiv:2308.03281", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-similarity - sentence-transformers - Sentence Transformers model-index: - name: gte-large results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 72.62686567164178 - type: ap value: 34.46944126809772 - type: f1 value: 66.23684353950857 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 92.51805 - type: ap value: 89.49842783330848 - type: f1 value: 92.51112169431808 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 49.074 - type: f1 value: 48.44785682572955 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 32.077 - type: map_at_10 value: 48.153 - type: map_at_100 value: 48.963 - type: map_at_1000 value: 48.966 - type: map_at_3 value: 43.184 - type: map_at_5 value: 46.072 - type: mrr_at_1 value: 33.073 - type: mrr_at_10 value: 48.54 - type: mrr_at_100 value: 49.335 - type: mrr_at_1000 value: 49.338 - type: mrr_at_3 value: 43.563 - type: mrr_at_5 value: 46.383 - type: ndcg_at_1 value: 32.077 - type: ndcg_at_10 value: 57.158 - type: ndcg_at_100 value: 60.324999999999996 - type: ndcg_at_1000 value: 60.402 - type: ndcg_at_3 value: 46.934 - type: ndcg_at_5 value: 52.158 - type: precision_at_1 value: 32.077 - type: precision_at_10 value: 8.591999999999999 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 19.275000000000002 - type: precision_at_5 value: 14.111 - type: recall_at_1 value: 32.077 - type: recall_at_10 value: 85.917 - type: recall_at_100 value: 99.075 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 57.824 - type: recall_at_5 value: 70.555 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 48.619246083417295 - task: type: Clustering dataset: type: mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: 
f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 43.3574067664688 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 63.06359661829253 - type: mrr value: 76.15596007562766 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 90.25407547368691 - type: cos_sim_spearman value: 88.65081514968477 - type: euclidean_pearson value: 88.14857116664494 - type: euclidean_spearman value: 88.50683596540692 - type: manhattan_pearson value: 87.9654797992225 - type: manhattan_spearman value: 88.21164851646908 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 86.05844155844157 - type: f1 value: 86.01555597681825 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 39.10510519739522 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 36.84689960264385 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.800000000000004 - type: map_at_10 value: 44.857 - type: map_at_100 value: 46.512 - type: map_at_1000 value: 46.635 - type: map_at_3 value: 41.062 - type: map_at_5 value: 43.126 - type: mrr_at_1 value: 39.628 - type: mrr_at_10 value: 50.879 - type: mrr_at_100 value: 51.605000000000004 - type: mrr_at_1000 value: 51.641000000000005 - type: mrr_at_3 value: 48.14 - type: mrr_at_5 value: 49.835 - type: ndcg_at_1 value: 39.628 - type: ndcg_at_10 value: 51.819 - type: ndcg_at_100 value: 57.318999999999996 - type: ndcg_at_1000 value: 58.955999999999996 - type: ndcg_at_3 value: 46.409 - type: ndcg_at_5 value: 48.825 - type: precision_at_1 value: 39.628 - type: precision_at_10 value: 10.072000000000001 - type: precision_at_100 value: 1.625 - type: precision_at_1000 value: 0.21 - type: precision_at_3 value: 22.556 - type: precision_at_5 value: 16.309 - type: recall_at_1 value: 32.800000000000004 - type: recall_at_10 value: 65.078 - type: recall_at_100 value: 87.491 - type: recall_at_1000 value: 97.514 - type: recall_at_3 value: 49.561 - type: recall_at_5 value: 56.135999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 32.614 - type: map_at_10 value: 43.578 - type: map_at_100 value: 44.897 - type: map_at_1000 value: 45.023 - type: map_at_3 value: 40.282000000000004 - type: map_at_5 value: 42.117 - type: mrr_at_1 value: 40.510000000000005 - type: mrr_at_10 value: 49.428 - type: mrr_at_100 value: 50.068999999999996 - type: mrr_at_1000 value: 50.111000000000004 - type: mrr_at_3 value: 47.176 - type: mrr_at_5 value: 48.583999999999996 - type: ndcg_at_1 value: 40.510000000000005 - type: ndcg_at_10 value: 49.478 - type: ndcg_at_100 value: 53.852 - type: ndcg_at_1000 value: 55.782 - type: 
ndcg_at_3 value: 45.091 - type: ndcg_at_5 value: 47.19 - type: precision_at_1 value: 40.510000000000005 - type: precision_at_10 value: 9.363000000000001 - type: precision_at_100 value: 1.51 - type: precision_at_1000 value: 0.196 - type: precision_at_3 value: 21.741 - type: precision_at_5 value: 15.465000000000002 - type: recall_at_1 value: 32.614 - type: recall_at_10 value: 59.782000000000004 - type: recall_at_100 value: 78.012 - type: recall_at_1000 value: 90.319 - type: recall_at_3 value: 46.825 - type: recall_at_5 value: 52.688 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 40.266000000000005 - type: map_at_10 value: 53.756 - type: map_at_100 value: 54.809 - type: map_at_1000 value: 54.855 - type: map_at_3 value: 50.073 - type: map_at_5 value: 52.293 - type: mrr_at_1 value: 46.332 - type: mrr_at_10 value: 57.116 - type: mrr_at_100 value: 57.767 - type: mrr_at_1000 value: 57.791000000000004 - type: mrr_at_3 value: 54.461999999999996 - type: mrr_at_5 value: 56.092 - type: ndcg_at_1 value: 46.332 - type: ndcg_at_10 value: 60.092 - type: ndcg_at_100 value: 64.034 - type: ndcg_at_1000 value: 64.937 - type: ndcg_at_3 value: 54.071000000000005 - type: ndcg_at_5 value: 57.254000000000005 - type: precision_at_1 value: 46.332 - type: precision_at_10 value: 9.799 - type: precision_at_100 value: 1.278 - type: precision_at_1000 value: 0.13899999999999998 - type: precision_at_3 value: 24.368000000000002 - type: precision_at_5 value: 16.89 - type: recall_at_1 value: 40.266000000000005 - type: recall_at_10 value: 75.41499999999999 - type: recall_at_100 value: 92.01700000000001 - type: recall_at_1000 value: 98.379 - type: recall_at_3 value: 59.476 - type: recall_at_5 value: 67.297 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 28.589 - type: map_at_10 value: 37.755 - type: map_at_100 value: 38.881 - type: map_at_1000 value: 38.954 - type: map_at_3 value: 34.759 - type: map_at_5 value: 36.544 - type: mrr_at_1 value: 30.734 - type: mrr_at_10 value: 39.742 - type: mrr_at_100 value: 40.774 - type: mrr_at_1000 value: 40.824 - type: mrr_at_3 value: 37.137 - type: mrr_at_5 value: 38.719 - type: ndcg_at_1 value: 30.734 - type: ndcg_at_10 value: 42.978 - type: ndcg_at_100 value: 48.309000000000005 - type: ndcg_at_1000 value: 50.068 - type: ndcg_at_3 value: 37.361 - type: ndcg_at_5 value: 40.268 - type: precision_at_1 value: 30.734 - type: precision_at_10 value: 6.565 - type: precision_at_100 value: 0.964 - type: precision_at_1000 value: 0.11499999999999999 - type: precision_at_3 value: 15.744 - type: precision_at_5 value: 11.096 - type: recall_at_1 value: 28.589 - type: recall_at_10 value: 57.126999999999995 - type: recall_at_100 value: 81.051 - type: recall_at_1000 value: 94.027 - type: recall_at_3 value: 42.045 - type: recall_at_5 value: 49.019 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.5 - type: map_at_10 value: 27.950999999999997 - type: map_at_100 value: 29.186 - type: map_at_1000 value: 29.298000000000002 - type: map_at_3 value: 25.141000000000002 - type: map_at_5 value: 26.848 - type: mrr_at_1 value: 22.637 - type: mrr_at_10 value: 32.572 - type: mrr_at_100 value: 33.472 - type: mrr_at_1000 value: 33.533 - type: mrr_at_3 value: 29.747 
- type: mrr_at_5 value: 31.482 - type: ndcg_at_1 value: 22.637 - type: ndcg_at_10 value: 33.73 - type: ndcg_at_100 value: 39.568 - type: ndcg_at_1000 value: 42.201 - type: ndcg_at_3 value: 28.505999999999997 - type: ndcg_at_5 value: 31.255 - type: precision_at_1 value: 22.637 - type: precision_at_10 value: 6.281000000000001 - type: precision_at_100 value: 1.073 - type: precision_at_1000 value: 0.14300000000000002 - type: precision_at_3 value: 13.847000000000001 - type: precision_at_5 value: 10.224 - type: recall_at_1 value: 18.5 - type: recall_at_10 value: 46.744 - type: recall_at_100 value: 72.072 - type: recall_at_1000 value: 91.03999999999999 - type: recall_at_3 value: 32.551 - type: recall_at_5 value: 39.533 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.602 - type: map_at_10 value: 42.18 - type: map_at_100 value: 43.6 - type: map_at_1000 value: 43.704 - type: map_at_3 value: 38.413000000000004 - type: map_at_5 value: 40.626 - type: mrr_at_1 value: 37.344 - type: mrr_at_10 value: 47.638000000000005 - type: mrr_at_100 value: 48.485 - type: mrr_at_1000 value: 48.52 - type: mrr_at_3 value: 44.867000000000004 - type: mrr_at_5 value: 46.566 - type: ndcg_at_1 value: 37.344 - type: ndcg_at_10 value: 48.632 - type: ndcg_at_100 value: 54.215 - type: ndcg_at_1000 value: 55.981 - type: ndcg_at_3 value: 42.681999999999995 - type: ndcg_at_5 value: 45.732 - type: precision_at_1 value: 37.344 - type: precision_at_10 value: 8.932 - type: precision_at_100 value: 1.376 - type: precision_at_1000 value: 0.17099999999999999 - type: precision_at_3 value: 20.276 - type: precision_at_5 value: 14.726 - type: recall_at_1 value: 30.602 - type: recall_at_10 value: 62.273 - type: recall_at_100 value: 85.12100000000001 - type: recall_at_1000 value: 96.439 - type: recall_at_3 value: 45.848 - type: recall_at_5 value: 53.615 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.952 - type: map_at_10 value: 35.177 - type: map_at_100 value: 36.59 - type: map_at_1000 value: 36.703 - type: map_at_3 value: 31.261 - type: map_at_5 value: 33.222 - type: mrr_at_1 value: 29.337999999999997 - type: mrr_at_10 value: 40.152 - type: mrr_at_100 value: 40.963 - type: mrr_at_1000 value: 41.016999999999996 - type: mrr_at_3 value: 36.91 - type: mrr_at_5 value: 38.685 - type: ndcg_at_1 value: 29.337999999999997 - type: ndcg_at_10 value: 41.994 - type: ndcg_at_100 value: 47.587 - type: ndcg_at_1000 value: 49.791000000000004 - type: ndcg_at_3 value: 35.27 - type: ndcg_at_5 value: 38.042 - type: precision_at_1 value: 29.337999999999997 - type: precision_at_10 value: 8.276 - type: precision_at_100 value: 1.276 - type: precision_at_1000 value: 0.164 - type: precision_at_3 value: 17.161 - type: precision_at_5 value: 12.671 - type: recall_at_1 value: 23.952 - type: recall_at_10 value: 57.267 - type: recall_at_100 value: 80.886 - type: recall_at_1000 value: 95.611 - type: recall_at_3 value: 38.622 - type: recall_at_5 value: 45.811 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.092083333333335 - type: map_at_10 value: 37.2925 - type: map_at_100 value: 38.57041666666666 - type: map_at_1000 value: 38.68141666666667 - type: map_at_3 value: 34.080000000000005 - type: map_at_5 value: 
35.89958333333333 - type: mrr_at_1 value: 31.94758333333333 - type: mrr_at_10 value: 41.51049999999999 - type: mrr_at_100 value: 42.36099999999999 - type: mrr_at_1000 value: 42.4125 - type: mrr_at_3 value: 38.849583333333335 - type: mrr_at_5 value: 40.448249999999994 - type: ndcg_at_1 value: 31.94758333333333 - type: ndcg_at_10 value: 43.17633333333333 - type: ndcg_at_100 value: 48.45241666666668 - type: ndcg_at_1000 value: 50.513999999999996 - type: ndcg_at_3 value: 37.75216666666667 - type: ndcg_at_5 value: 40.393833333333326 - type: precision_at_1 value: 31.94758333333333 - type: precision_at_10 value: 7.688916666666666 - type: precision_at_100 value: 1.2250833333333333 - type: precision_at_1000 value: 0.1595 - type: precision_at_3 value: 17.465999999999998 - type: precision_at_5 value: 12.548083333333333 - type: recall_at_1 value: 27.092083333333335 - type: recall_at_10 value: 56.286583333333326 - type: recall_at_100 value: 79.09033333333333 - type: recall_at_1000 value: 93.27483333333335 - type: recall_at_3 value: 41.35325 - type: recall_at_5 value: 48.072750000000006 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.825 - type: map_at_10 value: 33.723 - type: map_at_100 value: 34.74 - type: map_at_1000 value: 34.824 - type: map_at_3 value: 31.369000000000003 - type: map_at_5 value: 32.533 - type: mrr_at_1 value: 29.293999999999997 - type: mrr_at_10 value: 36.84 - type: mrr_at_100 value: 37.681 - type: mrr_at_1000 value: 37.742 - type: mrr_at_3 value: 34.79 - type: mrr_at_5 value: 35.872 - type: ndcg_at_1 value: 29.293999999999997 - type: ndcg_at_10 value: 38.385999999999996 - type: ndcg_at_100 value: 43.327 - type: ndcg_at_1000 value: 45.53 - type: ndcg_at_3 value: 33.985 - type: ndcg_at_5 value: 35.817 - type: precision_at_1 value: 29.293999999999997 - type: precision_at_10 value: 6.12 - type: precision_at_100 value: 0.9329999999999999 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 14.621999999999998 - type: precision_at_5 value: 10.030999999999999 - type: recall_at_1 value: 25.825 - type: recall_at_10 value: 49.647000000000006 - type: recall_at_100 value: 72.32300000000001 - type: recall_at_1000 value: 88.62400000000001 - type: recall_at_3 value: 37.366 - type: recall_at_5 value: 41.957 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 18.139 - type: map_at_10 value: 26.107000000000003 - type: map_at_100 value: 27.406999999999996 - type: map_at_1000 value: 27.535999999999998 - type: map_at_3 value: 23.445 - type: map_at_5 value: 24.916 - type: mrr_at_1 value: 21.817 - type: mrr_at_10 value: 29.99 - type: mrr_at_100 value: 31.052000000000003 - type: mrr_at_1000 value: 31.128 - type: mrr_at_3 value: 27.627000000000002 - type: mrr_at_5 value: 29.005 - type: ndcg_at_1 value: 21.817 - type: ndcg_at_10 value: 31.135 - type: ndcg_at_100 value: 37.108000000000004 - type: ndcg_at_1000 value: 39.965 - type: ndcg_at_3 value: 26.439 - type: ndcg_at_5 value: 28.655 - type: precision_at_1 value: 21.817 - type: precision_at_10 value: 5.757000000000001 - type: precision_at_100 value: 1.036 - type: precision_at_1000 value: 0.147 - type: precision_at_3 value: 12.537 - type: precision_at_5 value: 9.229 - type: recall_at_1 value: 18.139 - type: recall_at_10 value: 42.272999999999996 - type: recall_at_100 value: 68.657 - type: 
recall_at_1000 value: 88.93799999999999 - type: recall_at_3 value: 29.266 - type: recall_at_5 value: 34.892 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 27.755000000000003 - type: map_at_10 value: 37.384 - type: map_at_100 value: 38.56 - type: map_at_1000 value: 38.655 - type: map_at_3 value: 34.214 - type: map_at_5 value: 35.96 - type: mrr_at_1 value: 32.369 - type: mrr_at_10 value: 41.625 - type: mrr_at_100 value: 42.449 - type: mrr_at_1000 value: 42.502 - type: mrr_at_3 value: 38.899 - type: mrr_at_5 value: 40.489999999999995 - type: ndcg_at_1 value: 32.369 - type: ndcg_at_10 value: 43.287 - type: ndcg_at_100 value: 48.504999999999995 - type: ndcg_at_1000 value: 50.552 - type: ndcg_at_3 value: 37.549 - type: ndcg_at_5 value: 40.204 - type: precision_at_1 value: 32.369 - type: precision_at_10 value: 7.425 - type: precision_at_100 value: 1.134 - type: precision_at_1000 value: 0.14200000000000002 - type: precision_at_3 value: 17.102 - type: precision_at_5 value: 12.107999999999999 - type: recall_at_1 value: 27.755000000000003 - type: recall_at_10 value: 57.071000000000005 - type: recall_at_100 value: 79.456 - type: recall_at_1000 value: 93.54299999999999 - type: recall_at_3 value: 41.298 - type: recall_at_5 value: 48.037 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 24.855 - type: map_at_10 value: 34.53 - type: map_at_100 value: 36.167 - type: map_at_1000 value: 36.394999999999996 - type: map_at_3 value: 31.037 - type: map_at_5 value: 33.119 - type: mrr_at_1 value: 30.631999999999998 - type: mrr_at_10 value: 39.763999999999996 - type: mrr_at_100 value: 40.77 - type: mrr_at_1000 value: 40.826 - type: mrr_at_3 value: 36.495 - type: mrr_at_5 value: 38.561 - type: ndcg_at_1 value: 30.631999999999998 - type: ndcg_at_10 value: 40.942 - type: ndcg_at_100 value: 47.07 - type: ndcg_at_1000 value: 49.363 - type: ndcg_at_3 value: 35.038000000000004 - type: ndcg_at_5 value: 38.161 - type: precision_at_1 value: 30.631999999999998 - type: precision_at_10 value: 7.983999999999999 - type: precision_at_100 value: 1.6070000000000002 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 16.206 - type: precision_at_5 value: 12.253 - type: recall_at_1 value: 24.855 - type: recall_at_10 value: 53.291999999999994 - type: recall_at_100 value: 80.283 - type: recall_at_1000 value: 94.309 - type: recall_at_3 value: 37.257 - type: recall_at_5 value: 45.282 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.208 - type: map_at_10 value: 30.512 - type: map_at_100 value: 31.496000000000002 - type: map_at_1000 value: 31.595000000000002 - type: map_at_3 value: 27.904 - type: map_at_5 value: 29.491 - type: mrr_at_1 value: 22.736 - type: mrr_at_10 value: 32.379999999999995 - type: mrr_at_100 value: 33.245000000000005 - type: mrr_at_1000 value: 33.315 - type: mrr_at_3 value: 29.945 - type: mrr_at_5 value: 31.488 - type: ndcg_at_1 value: 22.736 - type: ndcg_at_10 value: 35.643 - type: ndcg_at_100 value: 40.535 - type: ndcg_at_1000 value: 43.042 - type: ndcg_at_3 value: 30.625000000000004 - type: ndcg_at_5 value: 33.323 - type: precision_at_1 value: 22.736 - type: precision_at_10 value: 5.6930000000000005 - type: precision_at_100 value: 0.889 - 
type: precision_at_1000 value: 0.122 - type: precision_at_3 value: 13.431999999999999 - type: precision_at_5 value: 9.575 - type: recall_at_1 value: 21.208 - type: recall_at_10 value: 49.47 - type: recall_at_100 value: 71.71499999999999 - type: recall_at_1000 value: 90.55499999999999 - type: recall_at_3 value: 36.124 - type: recall_at_5 value: 42.606 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 11.363 - type: map_at_10 value: 20.312 - type: map_at_100 value: 22.225 - type: map_at_1000 value: 22.411 - type: map_at_3 value: 16.68 - type: map_at_5 value: 18.608 - type: mrr_at_1 value: 25.537 - type: mrr_at_10 value: 37.933 - type: mrr_at_100 value: 38.875 - type: mrr_at_1000 value: 38.911 - type: mrr_at_3 value: 34.387 - type: mrr_at_5 value: 36.51 - type: ndcg_at_1 value: 25.537 - type: ndcg_at_10 value: 28.82 - type: ndcg_at_100 value: 36.341 - type: ndcg_at_1000 value: 39.615 - type: ndcg_at_3 value: 23.01 - type: ndcg_at_5 value: 25.269000000000002 - type: precision_at_1 value: 25.537 - type: precision_at_10 value: 9.153 - type: precision_at_100 value: 1.7319999999999998 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 17.22 - type: precision_at_5 value: 13.629 - type: recall_at_1 value: 11.363 - type: recall_at_10 value: 35.382999999999996 - type: recall_at_100 value: 61.367000000000004 - type: recall_at_1000 value: 79.699 - type: recall_at_3 value: 21.495 - type: recall_at_5 value: 27.42 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 9.65 - type: map_at_10 value: 20.742 - type: map_at_100 value: 29.614 - type: map_at_1000 value: 31.373 - type: map_at_3 value: 14.667 - type: map_at_5 value: 17.186 - type: mrr_at_1 value: 69.75 - type: mrr_at_10 value: 76.762 - type: mrr_at_100 value: 77.171 - type: mrr_at_1000 value: 77.179 - type: mrr_at_3 value: 75.125 - type: mrr_at_5 value: 76.287 - type: ndcg_at_1 value: 57.62500000000001 - type: ndcg_at_10 value: 42.370999999999995 - type: ndcg_at_100 value: 47.897 - type: ndcg_at_1000 value: 55.393 - type: ndcg_at_3 value: 46.317 - type: ndcg_at_5 value: 43.906 - type: precision_at_1 value: 69.75 - type: precision_at_10 value: 33.95 - type: precision_at_100 value: 10.885 - type: precision_at_1000 value: 2.2239999999999998 - type: precision_at_3 value: 49.75 - type: precision_at_5 value: 42.3 - type: recall_at_1 value: 9.65 - type: recall_at_10 value: 26.117 - type: recall_at_100 value: 55.084 - type: recall_at_1000 value: 78.62400000000001 - type: recall_at_3 value: 15.823 - type: recall_at_5 value: 19.652 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 47.885 - type: f1 value: 42.99567641346983 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 70.97 - type: map_at_10 value: 80.34599999999999 - type: map_at_100 value: 80.571 - type: map_at_1000 value: 80.584 - type: map_at_3 value: 79.279 - type: map_at_5 value: 79.94 - type: mrr_at_1 value: 76.613 - type: mrr_at_10 value: 85.15700000000001 - type: mrr_at_100 value: 85.249 - type: mrr_at_1000 value: 85.252 - type: mrr_at_3 value: 84.33800000000001 - type: mrr_at_5 value: 84.89 - type: ndcg_at_1 value: 76.613 - type: ndcg_at_10 value: 
84.53399999999999 - type: ndcg_at_100 value: 85.359 - type: ndcg_at_1000 value: 85.607 - type: ndcg_at_3 value: 82.76599999999999 - type: ndcg_at_5 value: 83.736 - type: precision_at_1 value: 76.613 - type: precision_at_10 value: 10.206 - type: precision_at_100 value: 1.083 - type: precision_at_1000 value: 0.11199999999999999 - type: precision_at_3 value: 31.913000000000004 - type: precision_at_5 value: 19.769000000000002 - type: recall_at_1 value: 70.97 - type: recall_at_10 value: 92.674 - type: recall_at_100 value: 95.985 - type: recall_at_1000 value: 97.57000000000001 - type: recall_at_3 value: 87.742 - type: recall_at_5 value: 90.28 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 22.494 - type: map_at_10 value: 36.491 - type: map_at_100 value: 38.550000000000004 - type: map_at_1000 value: 38.726 - type: map_at_3 value: 31.807000000000002 - type: map_at_5 value: 34.299 - type: mrr_at_1 value: 44.907000000000004 - type: mrr_at_10 value: 53.146 - type: mrr_at_100 value: 54.013999999999996 - type: mrr_at_1000 value: 54.044000000000004 - type: mrr_at_3 value: 50.952 - type: mrr_at_5 value: 52.124 - type: ndcg_at_1 value: 44.907000000000004 - type: ndcg_at_10 value: 44.499 - type: ndcg_at_100 value: 51.629000000000005 - type: ndcg_at_1000 value: 54.367 - type: ndcg_at_3 value: 40.900999999999996 - type: ndcg_at_5 value: 41.737 - type: precision_at_1 value: 44.907000000000004 - type: precision_at_10 value: 12.346 - type: precision_at_100 value: 1.974 - type: precision_at_1000 value: 0.246 - type: precision_at_3 value: 27.366 - type: precision_at_5 value: 19.846 - type: recall_at_1 value: 22.494 - type: recall_at_10 value: 51.156 - type: recall_at_100 value: 77.11200000000001 - type: recall_at_1000 value: 93.44 - type: recall_at_3 value: 36.574 - type: recall_at_5 value: 42.361 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 38.568999999999996 - type: map_at_10 value: 58.485 - type: map_at_100 value: 59.358999999999995 - type: map_at_1000 value: 59.429 - type: map_at_3 value: 55.217000000000006 - type: map_at_5 value: 57.236 - type: mrr_at_1 value: 77.137 - type: mrr_at_10 value: 82.829 - type: mrr_at_100 value: 83.04599999999999 - type: mrr_at_1000 value: 83.05399999999999 - type: mrr_at_3 value: 81.904 - type: mrr_at_5 value: 82.50800000000001 - type: ndcg_at_1 value: 77.137 - type: ndcg_at_10 value: 67.156 - type: ndcg_at_100 value: 70.298 - type: ndcg_at_1000 value: 71.65700000000001 - type: ndcg_at_3 value: 62.535 - type: ndcg_at_5 value: 65.095 - type: precision_at_1 value: 77.137 - type: precision_at_10 value: 13.911999999999999 - type: precision_at_100 value: 1.6389999999999998 - type: precision_at_1000 value: 0.182 - type: precision_at_3 value: 39.572 - type: precision_at_5 value: 25.766 - type: recall_at_1 value: 38.568999999999996 - type: recall_at_10 value: 69.56099999999999 - type: recall_at_100 value: 81.931 - type: recall_at_1000 value: 90.91799999999999 - type: recall_at_3 value: 59.358999999999995 - type: recall_at_5 value: 64.416 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 88.45600000000002 - type: ap value: 84.09725115338568 - type: f1 value: 88.41874909080512 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default 
split: dev revision: None metrics: - type: map_at_1 value: 21.404999999999998 - type: map_at_10 value: 33.921 - type: map_at_100 value: 35.116 - type: map_at_1000 value: 35.164 - type: map_at_3 value: 30.043999999999997 - type: map_at_5 value: 32.327 - type: mrr_at_1 value: 21.977 - type: mrr_at_10 value: 34.505 - type: mrr_at_100 value: 35.638999999999996 - type: mrr_at_1000 value: 35.68 - type: mrr_at_3 value: 30.703999999999997 - type: mrr_at_5 value: 32.96 - type: ndcg_at_1 value: 21.963 - type: ndcg_at_10 value: 40.859 - type: ndcg_at_100 value: 46.614 - type: ndcg_at_1000 value: 47.789 - type: ndcg_at_3 value: 33.007999999999996 - type: ndcg_at_5 value: 37.084 - type: precision_at_1 value: 21.963 - type: precision_at_10 value: 6.493 - type: precision_at_100 value: 0.938 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 14.155000000000001 - type: precision_at_5 value: 10.544 - type: recall_at_1 value: 21.404999999999998 - type: recall_at_10 value: 62.175000000000004 - type: recall_at_100 value: 88.786 - type: recall_at_1000 value: 97.738 - type: recall_at_3 value: 40.925 - type: recall_at_5 value: 50.722 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.50661194710442 - type: f1 value: 93.30311193153668 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 73.24669402644778 - type: f1 value: 54.23122108002977 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 72.61936785474109 - type: f1 value: 70.52644941025565 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 76.76529926025555 - type: f1 value: 77.26872729322514 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.39450293021839 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 31.757796879839294 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.62512146657428 - type: mrr value: 33.84624322066173 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 6.462 - type: map_at_10 value: 14.947 - type: map_at_100 value: 19.344 - type: map_at_1000 value: 20.933 - type: map_at_3 value: 10.761999999999999 - type: map_at_5 value: 12.744 - type: mrr_at_1 value: 47.988 - type: mrr_at_10 value: 57.365 - type: mrr_at_100 value: 57.931 - type: mrr_at_1000 value: 57.96 - type: mrr_at_3 value: 54.85 - type: mrr_at_5 value: 56.569 - type: ndcg_at_1 value: 46.129999999999995 - type: ndcg_at_10 value: 38.173 - type: ndcg_at_100 value: 
35.983 - type: ndcg_at_1000 value: 44.507000000000005 - type: ndcg_at_3 value: 42.495 - type: ndcg_at_5 value: 41.019 - type: precision_at_1 value: 47.678 - type: precision_at_10 value: 28.731 - type: precision_at_100 value: 9.232 - type: precision_at_1000 value: 2.202 - type: precision_at_3 value: 39.628 - type: precision_at_5 value: 35.851 - type: recall_at_1 value: 6.462 - type: recall_at_10 value: 18.968 - type: recall_at_100 value: 37.131 - type: recall_at_1000 value: 67.956 - type: recall_at_3 value: 11.905000000000001 - type: recall_at_5 value: 15.097 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 30.335 - type: map_at_10 value: 46.611999999999995 - type: map_at_100 value: 47.632000000000005 - type: map_at_1000 value: 47.661 - type: map_at_3 value: 41.876999999999995 - type: map_at_5 value: 44.799 - type: mrr_at_1 value: 34.125 - type: mrr_at_10 value: 49.01 - type: mrr_at_100 value: 49.75 - type: mrr_at_1000 value: 49.768 - type: mrr_at_3 value: 45.153 - type: mrr_at_5 value: 47.589999999999996 - type: ndcg_at_1 value: 34.125 - type: ndcg_at_10 value: 54.777 - type: ndcg_at_100 value: 58.914 - type: ndcg_at_1000 value: 59.521 - type: ndcg_at_3 value: 46.015 - type: ndcg_at_5 value: 50.861000000000004 - type: precision_at_1 value: 34.125 - type: precision_at_10 value: 9.166 - type: precision_at_100 value: 1.149 - type: precision_at_1000 value: 0.121 - type: precision_at_3 value: 21.147 - type: precision_at_5 value: 15.469 - type: recall_at_1 value: 30.335 - type: recall_at_10 value: 77.194 - type: recall_at_100 value: 94.812 - type: recall_at_1000 value: 99.247 - type: recall_at_3 value: 54.681000000000004 - type: recall_at_5 value: 65.86800000000001 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.62 - type: map_at_10 value: 84.536 - type: map_at_100 value: 85.167 - type: map_at_1000 value: 85.184 - type: map_at_3 value: 81.607 - type: map_at_5 value: 83.423 - type: mrr_at_1 value: 81.36 - type: mrr_at_10 value: 87.506 - type: mrr_at_100 value: 87.601 - type: mrr_at_1000 value: 87.601 - type: mrr_at_3 value: 86.503 - type: mrr_at_5 value: 87.179 - type: ndcg_at_1 value: 81.36 - type: ndcg_at_10 value: 88.319 - type: ndcg_at_100 value: 89.517 - type: ndcg_at_1000 value: 89.60900000000001 - type: ndcg_at_3 value: 85.423 - type: ndcg_at_5 value: 86.976 - type: precision_at_1 value: 81.36 - type: precision_at_10 value: 13.415 - type: precision_at_100 value: 1.529 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.342999999999996 - type: precision_at_5 value: 24.534 - type: recall_at_1 value: 70.62 - type: recall_at_10 value: 95.57600000000001 - type: recall_at_100 value: 99.624 - type: recall_at_1000 value: 99.991 - type: recall_at_3 value: 87.22 - type: recall_at_5 value: 91.654 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 60.826438478212744 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 64.24027467551447 - task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.997999999999999 - 
type: map_at_10 value: 14.267 - type: map_at_100 value: 16.843 - type: map_at_1000 value: 17.229 - type: map_at_3 value: 9.834 - type: map_at_5 value: 11.92 - type: mrr_at_1 value: 24.7 - type: mrr_at_10 value: 37.685 - type: mrr_at_100 value: 38.704 - type: mrr_at_1000 value: 38.747 - type: mrr_at_3 value: 34.150000000000006 - type: mrr_at_5 value: 36.075 - type: ndcg_at_1 value: 24.7 - type: ndcg_at_10 value: 23.44 - type: ndcg_at_100 value: 32.617000000000004 - type: ndcg_at_1000 value: 38.628 - type: ndcg_at_3 value: 21.747 - type: ndcg_at_5 value: 19.076 - type: precision_at_1 value: 24.7 - type: precision_at_10 value: 12.47 - type: precision_at_100 value: 2.564 - type: precision_at_1000 value: 0.4 - type: precision_at_3 value: 20.767 - type: precision_at_5 value: 17.06 - type: recall_at_1 value: 4.997999999999999 - type: recall_at_10 value: 25.3 - type: recall_at_100 value: 52.048 - type: recall_at_1000 value: 81.093 - type: recall_at_3 value: 12.642999999999999 - type: recall_at_5 value: 17.312 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 85.44942006292234 - type: cos_sim_spearman value: 79.80930790660699 - type: euclidean_pearson value: 82.93400777494863 - type: euclidean_spearman value: 80.04664991110705 - type: manhattan_pearson value: 82.93551681854949 - type: manhattan_spearman value: 80.03156736837379 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 85.63574059135726 - type: cos_sim_spearman value: 76.80552915288186 - type: euclidean_pearson value: 82.46368529820518 - type: euclidean_spearman value: 76.60338474719275 - type: manhattan_pearson value: 82.4558617035968 - type: manhattan_spearman value: 76.57936082895705 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 86.24116811084211 - type: cos_sim_spearman value: 88.10998662068769 - type: euclidean_pearson value: 87.04961732352689 - type: euclidean_spearman value: 88.12543945864087 - type: manhattan_pearson value: 86.9905224528854 - type: manhattan_spearman value: 88.07827944705546 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 84.74847296555048 - type: cos_sim_spearman value: 82.66200957916445 - type: euclidean_pearson value: 84.48132256004965 - type: euclidean_spearman value: 82.67915286000596 - type: manhattan_pearson value: 84.44950477268334 - type: manhattan_spearman value: 82.63327639173352 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 87.23056258027053 - type: cos_sim_spearman value: 88.92791680286955 - type: euclidean_pearson value: 88.13819235461933 - type: euclidean_spearman value: 88.87294661361716 - type: manhattan_pearson value: 88.14212133687899 - type: manhattan_spearman value: 88.88551854529777 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 82.64179522732887 - type: cos_sim_spearman value: 
84.25028809903114 - type: euclidean_pearson value: 83.40175015236979 - type: euclidean_spearman value: 84.23369296429406 - type: manhattan_pearson value: 83.43768174261321 - type: manhattan_spearman value: 84.27855229214734 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 88.20378955494732 - type: cos_sim_spearman value: 88.46863559173111 - type: euclidean_pearson value: 88.8249295811663 - type: euclidean_spearman value: 88.6312737724905 - type: manhattan_pearson value: 88.87744466378827 - type: manhattan_spearman value: 88.82908423767314 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 69.91342028796086 - type: cos_sim_spearman value: 69.71495021867864 - type: euclidean_pearson value: 70.65334330405646 - type: euclidean_spearman value: 69.4321253472211 - type: manhattan_pearson value: 70.59743494727465 - type: manhattan_spearman value: 69.11695509297482 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 85.42451709766952 - type: cos_sim_spearman value: 86.07166710670508 - type: euclidean_pearson value: 86.12711421258899 - type: euclidean_spearman value: 86.05232086925126 - type: manhattan_pearson value: 86.15591089932126 - type: manhattan_spearman value: 86.0890128623439 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 87.1976344717285 - type: mrr value: 96.3703145075694 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 59.511 - type: map_at_10 value: 69.724 - type: map_at_100 value: 70.208 - type: map_at_1000 value: 70.22800000000001 - type: map_at_3 value: 66.986 - type: map_at_5 value: 68.529 - type: mrr_at_1 value: 62.333000000000006 - type: mrr_at_10 value: 70.55 - type: mrr_at_100 value: 70.985 - type: mrr_at_1000 value: 71.004 - type: mrr_at_3 value: 68.611 - type: mrr_at_5 value: 69.728 - type: ndcg_at_1 value: 62.333000000000006 - type: ndcg_at_10 value: 74.265 - type: ndcg_at_100 value: 76.361 - type: ndcg_at_1000 value: 76.82900000000001 - type: ndcg_at_3 value: 69.772 - type: ndcg_at_5 value: 71.94800000000001 - type: precision_at_1 value: 62.333000000000006 - type: precision_at_10 value: 9.9 - type: precision_at_100 value: 1.093 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 27.444000000000003 - type: precision_at_5 value: 18 - type: recall_at_1 value: 59.511 - type: recall_at_10 value: 87.156 - type: recall_at_100 value: 96.5 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 75.2 - type: recall_at_5 value: 80.661 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.81683168316832 - type: cos_sim_ap value: 95.74716566563774 - type: cos_sim_f1 value: 90.64238745574103 - type: cos_sim_precision value: 91.7093142272262 - type: cos_sim_recall value: 
89.60000000000001 - type: dot_accuracy value: 99.69405940594059 - type: dot_ap value: 91.09013507754594 - type: dot_f1 value: 84.54227113556779 - type: dot_precision value: 84.58458458458459 - type: dot_recall value: 84.5 - type: euclidean_accuracy value: 99.81782178217821 - type: euclidean_ap value: 95.6324301072609 - type: euclidean_f1 value: 90.58341862845445 - type: euclidean_precision value: 92.76729559748428 - type: euclidean_recall value: 88.5 - type: manhattan_accuracy value: 99.81980198019802 - type: manhattan_ap value: 95.68510494437183 - type: manhattan_f1 value: 90.58945191313342 - type: manhattan_precision value: 93.79014989293361 - type: manhattan_recall value: 87.6 - type: max_accuracy value: 99.81980198019802 - type: max_ap value: 95.74716566563774 - type: max_f1 value: 90.64238745574103 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 67.63761899427078 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 36.572473369697235 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 53.63000245208579 - type: mrr value: 54.504193722943725 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.300791939416545 - type: cos_sim_spearman value: 31.662904057924123 - type: dot_pearson value: 26.21198530758316 - type: dot_spearman value: 27.006921548904263 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.197 - type: map_at_10 value: 1.752 - type: map_at_100 value: 10.795 - type: map_at_1000 value: 27.18 - type: map_at_3 value: 0.5890000000000001 - type: map_at_5 value: 0.938 - type: mrr_at_1 value: 74 - type: mrr_at_10 value: 85.833 - type: mrr_at_100 value: 85.833 - type: mrr_at_1000 value: 85.833 - type: mrr_at_3 value: 85.333 - type: mrr_at_5 value: 85.833 - type: ndcg_at_1 value: 69 - type: ndcg_at_10 value: 70.22 - type: ndcg_at_100 value: 55.785 - type: ndcg_at_1000 value: 52.93600000000001 - type: ndcg_at_3 value: 72.084 - type: ndcg_at_5 value: 71.184 - type: precision_at_1 value: 74 - type: precision_at_10 value: 75.2 - type: precision_at_100 value: 57.3 - type: precision_at_1000 value: 23.302 - type: precision_at_3 value: 77.333 - type: precision_at_5 value: 75.6 - type: recall_at_1 value: 0.197 - type: recall_at_10 value: 2.019 - type: recall_at_100 value: 14.257 - type: recall_at_1000 value: 50.922 - type: recall_at_3 value: 0.642 - type: recall_at_5 value: 1.043 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 2.803 - type: map_at_10 value: 10.407 - type: map_at_100 value: 16.948 - type: map_at_1000 value: 18.424 - type: map_at_3 value: 5.405 - type: map_at_5 value: 6.908 - type: mrr_at_1 value: 36.735 - type: mrr_at_10 value: 50.221000000000004 - type: mrr_at_100 value: 51.388 - type: mrr_at_1000 value: 51.402 - type: mrr_at_3 
value: 47.278999999999996 - type: mrr_at_5 value: 49.626 - type: ndcg_at_1 value: 34.694 - type: ndcg_at_10 value: 25.507 - type: ndcg_at_100 value: 38.296 - type: ndcg_at_1000 value: 49.492000000000004 - type: ndcg_at_3 value: 29.006999999999998 - type: ndcg_at_5 value: 25.979000000000003 - type: precision_at_1 value: 36.735 - type: precision_at_10 value: 22.041 - type: precision_at_100 value: 8.02 - type: precision_at_1000 value: 1.567 - type: precision_at_3 value: 28.571 - type: precision_at_5 value: 24.490000000000002 - type: recall_at_1 value: 2.803 - type: recall_at_10 value: 16.378 - type: recall_at_100 value: 50.489 - type: recall_at_1000 value: 85.013 - type: recall_at_3 value: 6.505 - type: recall_at_5 value: 9.243 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.55579999999999 - type: ap value: 14.206982753316227 - type: f1 value: 54.372142814964285 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 56.57611771363893 - type: f1 value: 56.924172639063144 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 52.82304915719759 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 85.92716218632653 - type: cos_sim_ap value: 73.73359122546046 - type: cos_sim_f1 value: 68.42559487116262 - type: cos_sim_precision value: 64.22124508215691 - type: cos_sim_recall value: 73.21899736147758 - type: dot_accuracy value: 80.38981939560112 - type: dot_ap value: 54.61060862444974 - type: dot_f1 value: 53.45710627400769 - type: dot_precision value: 44.87638839125761 - type: dot_recall value: 66.09498680738787 - type: euclidean_accuracy value: 86.02849138701794 - type: euclidean_ap value: 73.95673761922404 - type: euclidean_f1 value: 68.6783042394015 - type: euclidean_precision value: 65.1063829787234 - type: euclidean_recall value: 72.66490765171504 - type: manhattan_accuracy value: 85.9808070572808 - type: manhattan_ap value: 73.9050720058029 - type: manhattan_f1 value: 68.57560618983794 - type: manhattan_precision value: 63.70839936608558 - type: manhattan_recall value: 74.24802110817942 - type: max_accuracy value: 86.02849138701794 - type: max_ap value: 73.95673761922404 - type: max_f1 value: 68.6783042394015 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.72783017037295 - type: cos_sim_ap value: 85.52705223340233 - type: cos_sim_f1 value: 77.91659078492079 - type: cos_sim_precision value: 73.93378032764221 - type: cos_sim_recall value: 82.35294117647058 - type: dot_accuracy value: 85.41739434159972 - type: dot_ap value: 77.17734818118443 - type: dot_f1 value: 71.63473589973144 - type: dot_precision value: 66.96123719622415 - type: dot_recall value: 77.00954727440714 - type: euclidean_accuracy 
value: 88.68125897465751 - type: euclidean_ap value: 85.47712213906692 - type: euclidean_f1 value: 77.81419950830664 - type: euclidean_precision value: 75.37162649733006 - type: euclidean_recall value: 80.42038805050817 - type: manhattan_accuracy value: 88.67349710870494 - type: manhattan_ap value: 85.46506475241955 - type: manhattan_f1 value: 77.87259084890393 - type: manhattan_precision value: 74.54929577464789 - type: manhattan_recall value: 81.50600554357868 - type: max_accuracy value: 88.72783017037295 - type: max_ap value: 85.52705223340233 - type: max_f1 value: 77.91659078492079 language: - en license: mit ---

# gte-large

General Text Embeddings (GTE) model, introduced in *Towards General Text Embeddings with Multi-stage Contrastive Learning* (arXiv:2308.03281).

The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently come in three sizes: GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs covering a wide range of domains and scenarios, which makes them applicable to a variety of downstream text-embedding tasks, including **information retrieval**, **semantic textual similarity**, and **text reranking**.

## Metrics

We compared the performance of the GTE models with other popular text embedding models on the MTEB benchmark. For more detailed comparison results, please refer to the MTEB leaderboard.

| Model Name | Model Size (GB) | Dimension | Sequence Length | Average (56) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) | Classification (12) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **gte-large** | 0.67 | 1024 | 512 | **63.13** | 46.84 | 85.00 | 59.13 | 52.22 | 83.35 | 31.66 | 73.33 |
| **gte-base** | 0.22 | 768 | 512 | **62.39** | 46.2 | 84.57 | 58.61 | 51.14 | 82.3 | 31.17 | 73.01 |
| e5-large-v2 | 1.34 | 1024 | 512 | 62.25 | 44.49 | 86.03 | 56.61 | 50.56 | 82.05 | 30.19 | 75.24 |
| e5-base-v2 | 0.44 | 768 | 512 | 61.5 | 43.80 | 85.73 | 55.91 | 50.29 | 81.05 | 30.28 | 73.84 |
| **gte-small** | 0.07 | 384 | 512 | **61.36** | 44.89 | 83.54 | 57.7 | 49.46 | 82.07 | 30.42 | 72.31 |
| text-embedding-ada-002 | - | 1536 | 8192 | 60.99 | 45.9 | 84.89 | 56.32 | 49.25 | 80.97 | 30.8 | 70.93 |
| e5-small-v2 | 0.13 | 384 | 512 | 59.93 | 39.92 | 84.67 | 54.32 | 49.04 | 80.39 | 31.16 | 72.94 |
| sentence-t5-xxl | 9.73 | 768 | 512 | 59.51 | 43.72 | 85.06 | 56.42 | 42.24 | 82.63 | 30.08 | 73.42 |
| all-mpnet-base-v2 | 0.44 | 768 | 514 | 57.78 | 43.69 | 83.04 | 59.36 | 43.81 | 80.28 | 27.49 | 65.07 |
| sgpt-bloom-7b1-msmarco | 28.27 | 4096 | 2048 | 57.59 | 38.93 | 81.9 | 55.65 | 48.22 | 77.74 | 33.6 | 66.19 |
| all-MiniLM-L12-v2 | 0.13 | 384 | 512 | 56.53 | 41.81 | 82.41 | 58.44 | 42.69 | 79.8 | 27.9 | 63.21 |
| all-MiniLM-L6-v2 | 0.09 | 384 | 512 | 56.26 | 42.35 | 82.37 | 58.04 | 41.95 | 78.9 | 30.81 | 63.05 |
| contriever-base-msmarco | 0.44 | 768 | 512 | 56.00 | 41.1 | 82.54 | 53.14 | 41.88 | 76.51 | 30.36 | 66.68 |
| sentence-t5-base | 0.22 | 768 | 512 | 55.27 | 40.21 | 85.18 | 53.09 | 33.63 | 81.14 | 31.39 | 69.81 |

## Usage

Use with sentence-transformers; a minimal usage sketch appears after the Limitation section below.

### Limitation

This model supports English texts only, and any input longer than 512 tokens is truncated.
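As referenced in the Usage section above, here is a minimal sketch of loading the model with the sentence-transformers library. The example sentences are illustrative placeholders, not taken from the original card:

```python
# Minimal sketch: embedding texts with thenlper/gte-large via sentence-transformers.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("thenlper/gte-large")

# Illustrative inputs; English only, and anything beyond 512 tokens is truncated.
sentences = [
    "what is the capital of China?",
    "sorting algorithms in python",
    "Beijing is the capital of China.",
]

# encode() returns one 1024-dimensional vector per input (see the table above).
embeddings = model.encode(sentences)

# Cosine similarity between the first sentence and the remaining two.
print(cos_sim(embeddings[0], embeddings[1:]))
```

`cos_sim` is just a convenience helper from `sentence_transformers.util`; any cosine-similarity implementation over the returned vectors works the same way.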
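The MTEB scores tabulated above can, in principle, be reproduced with the open-source `mteb` evaluation harness. A sketch, assuming the classic `MTEB(tasks=...)` interface; the task name and output folder here are arbitrary examples, and the package's API may differ across versions:

```python
# Sketch: scoring the model on a single MTEB task with the mteb package.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thenlper/gte-large")

# Pick one of the English MTEB tasks; the full benchmark covers 56 of them.
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results/gte-large")
```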
### Citation

If you find our paper or models helpful, please consider citing *Towards General Text Embeddings with Multi-stage Contrastive Learning* (arXiv:2308.03281).", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity measurement, classification, clustering, and retrieval across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/thenlper_gte-small.json b/data/model_data_json/thenlper_gte-small.json new file mode 100644 index 0000000000000000000000000000000000000000..05b2d0b18f801cf21363455955c5e495e2a791b4 --- /dev/null +++ b/data/model_data_json/thenlper_gte-small.json @@ -0,0 +1,27 @@ +{ + "model_id": "thenlper/gte-small", + "downloads": 486968, + "tags": [ + "sentence-transformers", + "pytorch", + "tf", + "coreml", + "onnx", + "safetensors", + "openvino", + "bert", + "mteb", + "sentence-similarity", + "Sentence Transformers", + "en", + "arxiv:2308.03281", + "license:mit", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - mteb - sentence-similarity - sentence-transformers - Sentence Transformers model-index: - name: gte-small results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 73.22388059701493 - type: ap value: 36.09895941426988 - type: f1 value: 67.3205651539195 - task: type: Classification dataset: type: mteb/amazon_polarity name: MTEB AmazonPolarityClassification config: default split: test revision: e2d317d38cd51312af73b3d32a06d1a08b442046 metrics: - type: accuracy value: 91.81894999999999 - type: ap value: 88.5240138417305 - type: f1 value: 91.80367382706962 - task: type: Classification dataset: type: mteb/amazon_reviews_multi name: MTEB AmazonReviewsClassification (en) config: en split: test revision: 1399c76144fd37290681b995c656ef9b2e06e26d metrics: - type: accuracy value: 48.032 - type: f1 value: 47.4490665674719 - task: type: Retrieval dataset: type: arguana name: MTEB ArguAna config: default split: test revision: None metrics: - type: map_at_1 value: 30.725 - type: map_at_10 value: 46.604 - type: map_at_100 value: 47.535 - type: map_at_1000 value: 47.538000000000004 - type: map_at_3 value: 41.833 - type: map_at_5 value: 44.61 - type: mrr_at_1 value: 31.223 - type: mrr_at_10 value: 46.794000000000004 - type: mrr_at_100 value: 47.725 - type: mrr_at_1000 value: 47.727000000000004 - type: mrr_at_3 value: 42.07 - type: mrr_at_5 value: 44.812000000000005 - type: ndcg_at_1 value: 30.725 - type: ndcg_at_10 value: 55.440999999999995 - type: ndcg_at_100 value: 59.134 - type: ndcg_at_1000 value: 59.199 - type: ndcg_at_3 value: 45.599000000000004 - type: ndcg_at_5 value: 50.637 - type: precision_at_1 value: 30.725 - type: precision_at_10 value: 8.364 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.1 - type: precision_at_3 value: 18.848000000000003 - type: precision_at_5 value: 13.77 - type: recall_at_1 value: 30.725 - type: recall_at_10 value: 83.64200000000001 - type: recall_at_100 value: 99.14699999999999 - type: recall_at_1000 value: 99.644 - type: recall_at_3 value: 56.543 - type: recall_at_5 value: 68.848 - task: type: Clustering dataset: type: mteb/arxiv-clustering-p2p name: MTEB ArxivClusteringP2P config: default split: test revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d metrics: - type: v_measure value: 47.90178078197678 - task: type: Clustering dataset: type:
mteb/arxiv-clustering-s2s name: MTEB ArxivClusteringS2S config: default split: test revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 metrics: - type: v_measure value: 40.25728393431922 - task: type: Reranking dataset: type: mteb/askubuntudupquestions-reranking name: MTEB AskUbuntuDupQuestions config: default split: test revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 metrics: - type: map value: 61.720297062897764 - type: mrr value: 75.24139295607439 - task: type: STS dataset: type: mteb/biosses-sts name: MTEB BIOSSES config: default split: test revision: d3fb88f8f02e40887cd149695127462bbcf29b4a metrics: - type: cos_sim_pearson value: 89.43527309184616 - type: cos_sim_spearman value: 88.17128615100206 - type: euclidean_pearson value: 87.89922623089282 - type: euclidean_spearman value: 87.96104039655451 - type: manhattan_pearson value: 87.9818290932077 - type: manhattan_spearman value: 88.00923426576885 - task: type: Classification dataset: type: mteb/banking77 name: MTEB Banking77Classification config: default split: test revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 metrics: - type: accuracy value: 84.0844155844156 - type: f1 value: 84.01485017302213 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-p2p name: MTEB BiorxivClusteringP2P config: default split: test revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 metrics: - type: v_measure value: 38.36574769259432 - task: type: Clustering dataset: type: mteb/biorxiv-clustering-s2s name: MTEB BiorxivClusteringS2S config: default split: test revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 metrics: - type: v_measure value: 35.4857033165287 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackAndroidRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 30.261 - type: map_at_10 value: 42.419000000000004 - type: map_at_100 value: 43.927 - type: map_at_1000 value: 44.055 - type: map_at_3 value: 38.597 - type: map_at_5 value: 40.701 - type: mrr_at_1 value: 36.91 - type: mrr_at_10 value: 48.02 - type: mrr_at_100 value: 48.658 - type: mrr_at_1000 value: 48.708 - type: mrr_at_3 value: 44.945 - type: mrr_at_5 value: 46.705000000000005 - type: ndcg_at_1 value: 36.91 - type: ndcg_at_10 value: 49.353 - type: ndcg_at_100 value: 54.456 - type: ndcg_at_1000 value: 56.363 - type: ndcg_at_3 value: 43.483 - type: ndcg_at_5 value: 46.150999999999996 - type: precision_at_1 value: 36.91 - type: precision_at_10 value: 9.700000000000001 - type: precision_at_100 value: 1.557 - type: precision_at_1000 value: 0.202 - type: precision_at_3 value: 21.078 - type: precision_at_5 value: 15.421999999999999 - type: recall_at_1 value: 30.261 - type: recall_at_10 value: 63.242 - type: recall_at_100 value: 84.09100000000001 - type: recall_at_1000 value: 96.143 - type: recall_at_3 value: 46.478 - type: recall_at_5 value: 53.708 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackEnglishRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 31.145 - type: map_at_10 value: 40.996 - type: map_at_100 value: 42.266999999999996 - type: map_at_1000 value: 42.397 - type: map_at_3 value: 38.005 - type: map_at_5 value: 39.628 - type: mrr_at_1 value: 38.344 - type: mrr_at_10 value: 46.827000000000005 - type: mrr_at_100 value: 47.446 - type: mrr_at_1000 value: 47.489 - type: mrr_at_3 value: 44.448 - type: mrr_at_5 value: 45.747 - type: ndcg_at_1 value: 38.344 - type: ndcg_at_10 value: 46.733000000000004 - type: ndcg_at_100 value: 51.103 - type: 
ndcg_at_1000 value: 53.075 - type: ndcg_at_3 value: 42.366 - type: ndcg_at_5 value: 44.242 - type: precision_at_1 value: 38.344 - type: precision_at_10 value: 8.822000000000001 - type: precision_at_100 value: 1.417 - type: precision_at_1000 value: 0.187 - type: precision_at_3 value: 20.403 - type: precision_at_5 value: 14.306 - type: recall_at_1 value: 31.145 - type: recall_at_10 value: 56.909 - type: recall_at_100 value: 75.274 - type: recall_at_1000 value: 87.629 - type: recall_at_3 value: 43.784 - type: recall_at_5 value: 49.338 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGamingRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 38.83 - type: map_at_10 value: 51.553000000000004 - type: map_at_100 value: 52.581 - type: map_at_1000 value: 52.638 - type: map_at_3 value: 48.112 - type: map_at_5 value: 50.095 - type: mrr_at_1 value: 44.513999999999996 - type: mrr_at_10 value: 54.998000000000005 - type: mrr_at_100 value: 55.650999999999996 - type: mrr_at_1000 value: 55.679 - type: mrr_at_3 value: 52.602000000000004 - type: mrr_at_5 value: 53.931 - type: ndcg_at_1 value: 44.513999999999996 - type: ndcg_at_10 value: 57.67400000000001 - type: ndcg_at_100 value: 61.663999999999994 - type: ndcg_at_1000 value: 62.743 - type: ndcg_at_3 value: 51.964 - type: ndcg_at_5 value: 54.773 - type: precision_at_1 value: 44.513999999999996 - type: precision_at_10 value: 9.423 - type: precision_at_100 value: 1.2309999999999999 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 23.323 - type: precision_at_5 value: 16.163 - type: recall_at_1 value: 38.83 - type: recall_at_10 value: 72.327 - type: recall_at_100 value: 89.519 - type: recall_at_1000 value: 97.041 - type: recall_at_3 value: 57.206 - type: recall_at_5 value: 63.88399999999999 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackGisRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.484 - type: map_at_10 value: 34.527 - type: map_at_100 value: 35.661 - type: map_at_1000 value: 35.739 - type: map_at_3 value: 32.199 - type: map_at_5 value: 33.632 - type: mrr_at_1 value: 27.458 - type: mrr_at_10 value: 36.543 - type: mrr_at_100 value: 37.482 - type: mrr_at_1000 value: 37.543 - type: mrr_at_3 value: 34.256 - type: mrr_at_5 value: 35.618 - type: ndcg_at_1 value: 27.458 - type: ndcg_at_10 value: 39.396 - type: ndcg_at_100 value: 44.742 - type: ndcg_at_1000 value: 46.708 - type: ndcg_at_3 value: 34.817 - type: ndcg_at_5 value: 37.247 - type: precision_at_1 value: 27.458 - type: precision_at_10 value: 5.976999999999999 - type: precision_at_100 value: 0.907 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 14.878 - type: precision_at_5 value: 10.35 - type: recall_at_1 value: 25.484 - type: recall_at_10 value: 52.317 - type: recall_at_100 value: 76.701 - type: recall_at_1000 value: 91.408 - type: recall_at_3 value: 40.043 - type: recall_at_5 value: 45.879 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackMathematicaRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.719 - type: map_at_10 value: 25.269000000000002 - type: map_at_100 value: 26.442 - type: map_at_1000 value: 26.557 - type: map_at_3 value: 22.56 - type: map_at_5 value: 24.082 - type: mrr_at_1 value: 20.896 - type: mrr_at_10 value: 29.982999999999997 - type: mrr_at_100 value: 30.895 - type: mrr_at_1000 value: 30.961 - type: mrr_at_3 value: 
27.239 - type: mrr_at_5 value: 28.787000000000003 - type: ndcg_at_1 value: 20.896 - type: ndcg_at_10 value: 30.814000000000004 - type: ndcg_at_100 value: 36.418 - type: ndcg_at_1000 value: 39.182 - type: ndcg_at_3 value: 25.807999999999996 - type: ndcg_at_5 value: 28.143 - type: precision_at_1 value: 20.896 - type: precision_at_10 value: 5.821 - type: precision_at_100 value: 0.991 - type: precision_at_1000 value: 0.136 - type: precision_at_3 value: 12.562000000000001 - type: precision_at_5 value: 9.254 - type: recall_at_1 value: 16.719 - type: recall_at_10 value: 43.155 - type: recall_at_100 value: 67.831 - type: recall_at_1000 value: 87.617 - type: recall_at_3 value: 29.259 - type: recall_at_5 value: 35.260999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackPhysicsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 29.398999999999997 - type: map_at_10 value: 39.876 - type: map_at_100 value: 41.205999999999996 - type: map_at_1000 value: 41.321999999999996 - type: map_at_3 value: 36.588 - type: map_at_5 value: 38.538 - type: mrr_at_1 value: 35.9 - type: mrr_at_10 value: 45.528 - type: mrr_at_100 value: 46.343 - type: mrr_at_1000 value: 46.388 - type: mrr_at_3 value: 42.862 - type: mrr_at_5 value: 44.440000000000005 - type: ndcg_at_1 value: 35.9 - type: ndcg_at_10 value: 45.987 - type: ndcg_at_100 value: 51.370000000000005 - type: ndcg_at_1000 value: 53.400000000000006 - type: ndcg_at_3 value: 40.841 - type: ndcg_at_5 value: 43.447 - type: precision_at_1 value: 35.9 - type: precision_at_10 value: 8.393 - type: precision_at_100 value: 1.283 - type: precision_at_1000 value: 0.166 - type: precision_at_3 value: 19.538 - type: precision_at_5 value: 13.975000000000001 - type: recall_at_1 value: 29.398999999999997 - type: recall_at_10 value: 58.361 - type: recall_at_100 value: 81.081 - type: recall_at_1000 value: 94.004 - type: recall_at_3 value: 43.657000000000004 - type: recall_at_5 value: 50.519999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackProgrammersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 21.589 - type: map_at_10 value: 31.608999999999998 - type: map_at_100 value: 33.128 - type: map_at_1000 value: 33.247 - type: map_at_3 value: 28.671999999999997 - type: map_at_5 value: 30.233999999999998 - type: mrr_at_1 value: 26.712000000000003 - type: mrr_at_10 value: 36.713 - type: mrr_at_100 value: 37.713 - type: mrr_at_1000 value: 37.771 - type: mrr_at_3 value: 34.075 - type: mrr_at_5 value: 35.451 - type: ndcg_at_1 value: 26.712000000000003 - type: ndcg_at_10 value: 37.519999999999996 - type: ndcg_at_100 value: 43.946000000000005 - type: ndcg_at_1000 value: 46.297 - type: ndcg_at_3 value: 32.551 - type: ndcg_at_5 value: 34.660999999999994 - type: precision_at_1 value: 26.712000000000003 - type: precision_at_10 value: 7.066 - type: precision_at_100 value: 1.216 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 15.906 - type: precision_at_5 value: 11.437999999999999 - type: recall_at_1 value: 21.589 - type: recall_at_10 value: 50.090999999999994 - type: recall_at_100 value: 77.43900000000001 - type: recall_at_1000 value: 93.35900000000001 - type: recall_at_3 value: 36.028999999999996 - type: recall_at_5 value: 41.698 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.121666666666663 - type: map_at_10 value: 
34.46258333333334 - type: map_at_100 value: 35.710499999999996 - type: map_at_1000 value: 35.82691666666666 - type: map_at_3 value: 31.563249999999996 - type: map_at_5 value: 33.189750000000004 - type: mrr_at_1 value: 29.66441666666667 - type: mrr_at_10 value: 38.5455 - type: mrr_at_100 value: 39.39566666666667 - type: mrr_at_1000 value: 39.45325 - type: mrr_at_3 value: 36.003333333333345 - type: mrr_at_5 value: 37.440916666666666 - type: ndcg_at_1 value: 29.66441666666667 - type: ndcg_at_10 value: 39.978416666666675 - type: ndcg_at_100 value: 45.278666666666666 - type: ndcg_at_1000 value: 47.52275 - type: ndcg_at_3 value: 35.00058333333334 - type: ndcg_at_5 value: 37.34908333333333 - type: precision_at_1 value: 29.66441666666667 - type: precision_at_10 value: 7.094500000000001 - type: precision_at_100 value: 1.1523333333333332 - type: precision_at_1000 value: 0.15358333333333332 - type: precision_at_3 value: 16.184166666666663 - type: precision_at_5 value: 11.6005 - type: recall_at_1 value: 25.121666666666663 - type: recall_at_10 value: 52.23975000000001 - type: recall_at_100 value: 75.48408333333333 - type: recall_at_1000 value: 90.95316666666668 - type: recall_at_3 value: 38.38458333333333 - type: recall_at_5 value: 44.39933333333333 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackStatsRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.569000000000003 - type: map_at_10 value: 30.389 - type: map_at_100 value: 31.396 - type: map_at_1000 value: 31.493 - type: map_at_3 value: 28.276 - type: map_at_5 value: 29.459000000000003 - type: mrr_at_1 value: 26.534000000000002 - type: mrr_at_10 value: 33.217999999999996 - type: mrr_at_100 value: 34.054 - type: mrr_at_1000 value: 34.12 - type: mrr_at_3 value: 31.058000000000003 - type: mrr_at_5 value: 32.330999999999996 - type: ndcg_at_1 value: 26.534000000000002 - type: ndcg_at_10 value: 34.608 - type: ndcg_at_100 value: 39.391999999999996 - type: ndcg_at_1000 value: 41.837999999999994 - type: ndcg_at_3 value: 30.564999999999998 - type: ndcg_at_5 value: 32.509 - type: precision_at_1 value: 26.534000000000002 - type: precision_at_10 value: 5.414 - type: precision_at_100 value: 0.847 - type: precision_at_1000 value: 0.11399999999999999 - type: precision_at_3 value: 12.986 - type: precision_at_5 value: 9.202 - type: recall_at_1 value: 23.569000000000003 - type: recall_at_10 value: 44.896 - type: recall_at_100 value: 66.476 - type: recall_at_1000 value: 84.548 - type: recall_at_3 value: 33.79 - type: recall_at_5 value: 38.512 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackTexRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 16.36 - type: map_at_10 value: 23.57 - type: map_at_100 value: 24.698999999999998 - type: map_at_1000 value: 24.834999999999997 - type: map_at_3 value: 21.093 - type: map_at_5 value: 22.418 - type: mrr_at_1 value: 19.718 - type: mrr_at_10 value: 27.139999999999997 - type: mrr_at_100 value: 28.097 - type: mrr_at_1000 value: 28.177999999999997 - type: mrr_at_3 value: 24.805 - type: mrr_at_5 value: 26.121 - type: ndcg_at_1 value: 19.718 - type: ndcg_at_10 value: 28.238999999999997 - type: ndcg_at_100 value: 33.663 - type: ndcg_at_1000 value: 36.763 - type: ndcg_at_3 value: 23.747 - type: ndcg_at_5 value: 25.796000000000003 - type: precision_at_1 value: 19.718 - type: precision_at_10 value: 5.282 - type: precision_at_100 value: 0.9390000000000001 - type: precision_at_1000 value: 0.13899999999999998 - type: 
precision_at_3 value: 11.264000000000001 - type: precision_at_5 value: 8.341 - type: recall_at_1 value: 16.36 - type: recall_at_10 value: 38.669 - type: recall_at_100 value: 63.184 - type: recall_at_1000 value: 85.33800000000001 - type: recall_at_3 value: 26.214 - type: recall_at_5 value: 31.423000000000002 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackUnixRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 25.618999999999996 - type: map_at_10 value: 34.361999999999995 - type: map_at_100 value: 35.534 - type: map_at_1000 value: 35.634 - type: map_at_3 value: 31.402 - type: map_at_5 value: 32.815 - type: mrr_at_1 value: 30.037000000000003 - type: mrr_at_10 value: 38.284 - type: mrr_at_100 value: 39.141999999999996 - type: mrr_at_1000 value: 39.2 - type: mrr_at_3 value: 35.603 - type: mrr_at_5 value: 36.867 - type: ndcg_at_1 value: 30.037000000000003 - type: ndcg_at_10 value: 39.87 - type: ndcg_at_100 value: 45.243 - type: ndcg_at_1000 value: 47.507 - type: ndcg_at_3 value: 34.371 - type: ndcg_at_5 value: 36.521 - type: precision_at_1 value: 30.037000000000003 - type: precision_at_10 value: 6.819 - type: precision_at_100 value: 1.0699999999999998 - type: precision_at_1000 value: 0.13699999999999998 - type: precision_at_3 value: 15.392 - type: precision_at_5 value: 10.821 - type: recall_at_1 value: 25.618999999999996 - type: recall_at_10 value: 52.869 - type: recall_at_100 value: 76.395 - type: recall_at_1000 value: 92.19500000000001 - type: recall_at_3 value: 37.943 - type: recall_at_5 value: 43.342999999999996 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWebmastersRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 23.283 - type: map_at_10 value: 32.155 - type: map_at_100 value: 33.724 - type: map_at_1000 value: 33.939 - type: map_at_3 value: 29.018 - type: map_at_5 value: 30.864000000000004 - type: mrr_at_1 value: 28.063 - type: mrr_at_10 value: 36.632 - type: mrr_at_100 value: 37.606 - type: mrr_at_1000 value: 37.671 - type: mrr_at_3 value: 33.992 - type: mrr_at_5 value: 35.613 - type: ndcg_at_1 value: 28.063 - type: ndcg_at_10 value: 38.024 - type: ndcg_at_100 value: 44.292 - type: ndcg_at_1000 value: 46.818 - type: ndcg_at_3 value: 32.965 - type: ndcg_at_5 value: 35.562 - type: precision_at_1 value: 28.063 - type: precision_at_10 value: 7.352 - type: precision_at_100 value: 1.514 - type: precision_at_1000 value: 0.23800000000000002 - type: precision_at_3 value: 15.481 - type: precision_at_5 value: 11.542 - type: recall_at_1 value: 23.283 - type: recall_at_10 value: 49.756 - type: recall_at_100 value: 78.05 - type: recall_at_1000 value: 93.854 - type: recall_at_3 value: 35.408 - type: recall_at_5 value: 42.187000000000005 - task: type: Retrieval dataset: type: BeIR/cqadupstack name: MTEB CQADupstackWordpressRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 19.201999999999998 - type: map_at_10 value: 26.826 - type: map_at_100 value: 27.961000000000002 - type: map_at_1000 value: 28.066999999999997 - type: map_at_3 value: 24.237000000000002 - type: map_at_5 value: 25.811 - type: mrr_at_1 value: 20.887 - type: mrr_at_10 value: 28.660000000000004 - type: mrr_at_100 value: 29.660999999999998 - type: mrr_at_1000 value: 29.731 - type: mrr_at_3 value: 26.155 - type: mrr_at_5 value: 27.68 - type: ndcg_at_1 value: 20.887 - type: ndcg_at_10 value: 31.523 - type: ndcg_at_100 value: 37.055 - type: ndcg_at_1000 value: 39.579 - type: 
ndcg_at_3 value: 26.529000000000003 - type: ndcg_at_5 value: 29.137 - type: precision_at_1 value: 20.887 - type: precision_at_10 value: 5.065 - type: precision_at_100 value: 0.856 - type: precision_at_1000 value: 0.11900000000000001 - type: precision_at_3 value: 11.399 - type: precision_at_5 value: 8.392 - type: recall_at_1 value: 19.201999999999998 - type: recall_at_10 value: 44.285000000000004 - type: recall_at_100 value: 69.768 - type: recall_at_1000 value: 88.302 - type: recall_at_3 value: 30.804 - type: recall_at_5 value: 37.039 - task: type: Retrieval dataset: type: climate-fever name: MTEB ClimateFEVER config: default split: test revision: None metrics: - type: map_at_1 value: 11.244 - type: map_at_10 value: 18.956 - type: map_at_100 value: 20.674 - type: map_at_1000 value: 20.863 - type: map_at_3 value: 15.923000000000002 - type: map_at_5 value: 17.518 - type: mrr_at_1 value: 25.080999999999996 - type: mrr_at_10 value: 35.94 - type: mrr_at_100 value: 36.969 - type: mrr_at_1000 value: 37.013 - type: mrr_at_3 value: 32.617000000000004 - type: mrr_at_5 value: 34.682 - type: ndcg_at_1 value: 25.080999999999996 - type: ndcg_at_10 value: 26.539 - type: ndcg_at_100 value: 33.601 - type: ndcg_at_1000 value: 37.203 - type: ndcg_at_3 value: 21.695999999999998 - type: ndcg_at_5 value: 23.567 - type: precision_at_1 value: 25.080999999999996 - type: precision_at_10 value: 8.143 - type: precision_at_100 value: 1.5650000000000002 - type: precision_at_1000 value: 0.22300000000000003 - type: precision_at_3 value: 15.983 - type: precision_at_5 value: 12.417 - type: recall_at_1 value: 11.244 - type: recall_at_10 value: 31.457 - type: recall_at_100 value: 55.92 - type: recall_at_1000 value: 76.372 - type: recall_at_3 value: 19.784 - type: recall_at_5 value: 24.857000000000003 - task: type: Retrieval dataset: type: dbpedia-entity name: MTEB DBPedia config: default split: test revision: None metrics: - type: map_at_1 value: 8.595 - type: map_at_10 value: 18.75 - type: map_at_100 value: 26.354 - type: map_at_1000 value: 27.912 - type: map_at_3 value: 13.794 - type: map_at_5 value: 16.021 - type: mrr_at_1 value: 65.75 - type: mrr_at_10 value: 73.837 - type: mrr_at_100 value: 74.22800000000001 - type: mrr_at_1000 value: 74.234 - type: mrr_at_3 value: 72.5 - type: mrr_at_5 value: 73.387 - type: ndcg_at_1 value: 52.625 - type: ndcg_at_10 value: 39.101 - type: ndcg_at_100 value: 43.836000000000006 - type: ndcg_at_1000 value: 51.086 - type: ndcg_at_3 value: 44.229 - type: ndcg_at_5 value: 41.555 - type: precision_at_1 value: 65.75 - type: precision_at_10 value: 30.45 - type: precision_at_100 value: 9.81 - type: precision_at_1000 value: 2.045 - type: precision_at_3 value: 48.667 - type: precision_at_5 value: 40.8 - type: recall_at_1 value: 8.595 - type: recall_at_10 value: 24.201 - type: recall_at_100 value: 50.096 - type: recall_at_1000 value: 72.677 - type: recall_at_3 value: 15.212 - type: recall_at_5 value: 18.745 - task: type: Classification dataset: type: mteb/emotion name: MTEB EmotionClassification config: default split: test revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 metrics: - type: accuracy value: 46.565 - type: f1 value: 41.49914329345582 - task: type: Retrieval dataset: type: fever name: MTEB FEVER config: default split: test revision: None metrics: - type: map_at_1 value: 66.60000000000001 - type: map_at_10 value: 76.838 - type: map_at_100 value: 77.076 - type: map_at_1000 value: 77.09 - type: map_at_3 value: 75.545 - type: map_at_5 value: 76.39 - type: mrr_at_1 value: 71.707 - type: 
mrr_at_10 value: 81.514 - type: mrr_at_100 value: 81.64099999999999 - type: mrr_at_1000 value: 81.645 - type: mrr_at_3 value: 80.428 - type: mrr_at_5 value: 81.159 - type: ndcg_at_1 value: 71.707 - type: ndcg_at_10 value: 81.545 - type: ndcg_at_100 value: 82.477 - type: ndcg_at_1000 value: 82.73899999999999 - type: ndcg_at_3 value: 79.292 - type: ndcg_at_5 value: 80.599 - type: precision_at_1 value: 71.707 - type: precision_at_10 value: 10.035 - type: precision_at_100 value: 1.068 - type: precision_at_1000 value: 0.11100000000000002 - type: precision_at_3 value: 30.918 - type: precision_at_5 value: 19.328 - type: recall_at_1 value: 66.60000000000001 - type: recall_at_10 value: 91.353 - type: recall_at_100 value: 95.21 - type: recall_at_1000 value: 96.89999999999999 - type: recall_at_3 value: 85.188 - type: recall_at_5 value: 88.52 - task: type: Retrieval dataset: type: fiqa name: MTEB FiQA2018 config: default split: test revision: None metrics: - type: map_at_1 value: 19.338 - type: map_at_10 value: 31.752000000000002 - type: map_at_100 value: 33.516 - type: map_at_1000 value: 33.694 - type: map_at_3 value: 27.716 - type: map_at_5 value: 29.67 - type: mrr_at_1 value: 38.117000000000004 - type: mrr_at_10 value: 47.323 - type: mrr_at_100 value: 48.13 - type: mrr_at_1000 value: 48.161 - type: mrr_at_3 value: 45.062000000000005 - type: mrr_at_5 value: 46.358 - type: ndcg_at_1 value: 38.117000000000004 - type: ndcg_at_10 value: 39.353 - type: ndcg_at_100 value: 46.044000000000004 - type: ndcg_at_1000 value: 49.083 - type: ndcg_at_3 value: 35.891 - type: ndcg_at_5 value: 36.661 - type: precision_at_1 value: 38.117000000000004 - type: precision_at_10 value: 11.187999999999999 - type: precision_at_100 value: 1.802 - type: precision_at_1000 value: 0.234 - type: precision_at_3 value: 24.126 - type: precision_at_5 value: 17.562 - type: recall_at_1 value: 19.338 - type: recall_at_10 value: 45.735 - type: recall_at_100 value: 71.281 - type: recall_at_1000 value: 89.537 - type: recall_at_3 value: 32.525 - type: recall_at_5 value: 37.671 - task: type: Retrieval dataset: type: hotpotqa name: MTEB HotpotQA config: default split: test revision: None metrics: - type: map_at_1 value: 36.995 - type: map_at_10 value: 55.032000000000004 - type: map_at_100 value: 55.86 - type: map_at_1000 value: 55.932 - type: map_at_3 value: 52.125 - type: map_at_5 value: 53.884 - type: mrr_at_1 value: 73.991 - type: mrr_at_10 value: 80.096 - type: mrr_at_100 value: 80.32000000000001 - type: mrr_at_1000 value: 80.331 - type: mrr_at_3 value: 79.037 - type: mrr_at_5 value: 79.719 - type: ndcg_at_1 value: 73.991 - type: ndcg_at_10 value: 63.786 - type: ndcg_at_100 value: 66.78 - type: ndcg_at_1000 value: 68.255 - type: ndcg_at_3 value: 59.501000000000005 - type: ndcg_at_5 value: 61.82299999999999 - type: precision_at_1 value: 73.991 - type: precision_at_10 value: 13.157 - type: precision_at_100 value: 1.552 - type: precision_at_1000 value: 0.17500000000000002 - type: precision_at_3 value: 37.519999999999996 - type: precision_at_5 value: 24.351 - type: recall_at_1 value: 36.995 - type: recall_at_10 value: 65.78699999999999 - type: recall_at_100 value: 77.583 - type: recall_at_1000 value: 87.421 - type: recall_at_3 value: 56.279999999999994 - type: recall_at_5 value: 60.878 - task: type: Classification dataset: type: mteb/imdb name: MTEB ImdbClassification config: default split: test revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 metrics: - type: accuracy value: 86.80239999999999 - type: ap value: 81.97305141128378 - type: f1 
value: 86.76976305549273 - task: type: Retrieval dataset: type: msmarco name: MTEB MSMARCO config: default split: dev revision: None metrics: - type: map_at_1 value: 21.166 - type: map_at_10 value: 33.396 - type: map_at_100 value: 34.588 - type: map_at_1000 value: 34.637 - type: map_at_3 value: 29.509999999999998 - type: map_at_5 value: 31.719 - type: mrr_at_1 value: 21.762 - type: mrr_at_10 value: 33.969 - type: mrr_at_100 value: 35.099000000000004 - type: mrr_at_1000 value: 35.141 - type: mrr_at_3 value: 30.148000000000003 - type: mrr_at_5 value: 32.324000000000005 - type: ndcg_at_1 value: 21.776999999999997 - type: ndcg_at_10 value: 40.306999999999995 - type: ndcg_at_100 value: 46.068 - type: ndcg_at_1000 value: 47.3 - type: ndcg_at_3 value: 32.416 - type: ndcg_at_5 value: 36.345 - type: precision_at_1 value: 21.776999999999997 - type: precision_at_10 value: 6.433 - type: precision_at_100 value: 0.932 - type: precision_at_1000 value: 0.104 - type: precision_at_3 value: 13.897 - type: precision_at_5 value: 10.324 - type: recall_at_1 value: 21.166 - type: recall_at_10 value: 61.587 - type: recall_at_100 value: 88.251 - type: recall_at_1000 value: 97.727 - type: recall_at_3 value: 40.196 - type: recall_at_5 value: 49.611 - task: type: Classification dataset: type: mteb/mtop_domain name: MTEB MTOPDomainClassification (en) config: en split: test revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf metrics: - type: accuracy value: 93.04605563155496 - type: f1 value: 92.78007303978372 - task: type: Classification dataset: type: mteb/mtop_intent name: MTEB MTOPIntentClassification (en) config: en split: test revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba metrics: - type: accuracy value: 69.65116279069767 - type: f1 value: 52.75775172527262 - task: type: Classification dataset: type: mteb/amazon_massive_intent name: MTEB MassiveIntentClassification (en) config: en split: test revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 metrics: - type: accuracy value: 70.34633490248822 - type: f1 value: 68.15345065392562 - task: type: Classification dataset: type: mteb/amazon_massive_scenario name: MTEB MassiveScenarioClassification (en) config: en split: test revision: 7d571f92784cd94a019292a1f45445077d0ef634 metrics: - type: accuracy value: 75.63887020847343 - type: f1 value: 76.08074680233685 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-p2p name: MTEB MedrxivClusteringP2P config: default split: test revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 metrics: - type: v_measure value: 33.77933406071333 - task: type: Clustering dataset: type: mteb/medrxiv-clustering-s2s name: MTEB MedrxivClusteringS2S config: default split: test revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 metrics: - type: v_measure value: 32.06504927238196 - task: type: Reranking dataset: type: mteb/mind_small name: MTEB MindSmallReranking config: default split: test revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 metrics: - type: map value: 32.20682480490871 - type: mrr value: 33.41462721527003 - task: type: Retrieval dataset: type: nfcorpus name: MTEB NFCorpus config: default split: test revision: None metrics: - type: map_at_1 value: 5.548 - type: map_at_10 value: 13.086999999999998 - type: map_at_100 value: 16.698 - type: map_at_1000 value: 18.151999999999997 - type: map_at_3 value: 9.576 - type: map_at_5 value: 11.175 - type: mrr_at_1 value: 44.272 - type: mrr_at_10 value: 53.635999999999996 - type: mrr_at_100 value: 54.228 - type: mrr_at_1000 value: 54.26499999999999 - type: mrr_at_3 value: 51.754 - 
type: mrr_at_5 value: 53.086 - type: ndcg_at_1 value: 42.724000000000004 - type: ndcg_at_10 value: 34.769 - type: ndcg_at_100 value: 32.283 - type: ndcg_at_1000 value: 40.843 - type: ndcg_at_3 value: 39.852 - type: ndcg_at_5 value: 37.858999999999995 - type: precision_at_1 value: 44.272 - type: precision_at_10 value: 26.068 - type: precision_at_100 value: 8.328000000000001 - type: precision_at_1000 value: 2.1 - type: precision_at_3 value: 37.874 - type: precision_at_5 value: 33.065 - type: recall_at_1 value: 5.548 - type: recall_at_10 value: 16.936999999999998 - type: recall_at_100 value: 33.72 - type: recall_at_1000 value: 64.348 - type: recall_at_3 value: 10.764999999999999 - type: recall_at_5 value: 13.361 - task: type: Retrieval dataset: type: nq name: MTEB NQ config: default split: test revision: None metrics: - type: map_at_1 value: 28.008 - type: map_at_10 value: 42.675000000000004 - type: map_at_100 value: 43.85 - type: map_at_1000 value: 43.884 - type: map_at_3 value: 38.286 - type: map_at_5 value: 40.78 - type: mrr_at_1 value: 31.518 - type: mrr_at_10 value: 45.015 - type: mrr_at_100 value: 45.924 - type: mrr_at_1000 value: 45.946999999999996 - type: mrr_at_3 value: 41.348 - type: mrr_at_5 value: 43.428 - type: ndcg_at_1 value: 31.489 - type: ndcg_at_10 value: 50.285999999999994 - type: ndcg_at_100 value: 55.291999999999994 - type: ndcg_at_1000 value: 56.05 - type: ndcg_at_3 value: 41.976 - type: ndcg_at_5 value: 46.103 - type: precision_at_1 value: 31.489 - type: precision_at_10 value: 8.456 - type: precision_at_100 value: 1.125 - type: precision_at_1000 value: 0.12 - type: precision_at_3 value: 19.09 - type: precision_at_5 value: 13.841000000000001 - type: recall_at_1 value: 28.008 - type: recall_at_10 value: 71.21499999999999 - type: recall_at_100 value: 92.99 - type: recall_at_1000 value: 98.578 - type: recall_at_3 value: 49.604 - type: recall_at_5 value: 59.094 - task: type: Retrieval dataset: type: quora name: MTEB QuoraRetrieval config: default split: test revision: None metrics: - type: map_at_1 value: 70.351 - type: map_at_10 value: 84.163 - type: map_at_100 value: 84.785 - type: map_at_1000 value: 84.801 - type: map_at_3 value: 81.16 - type: map_at_5 value: 83.031 - type: mrr_at_1 value: 80.96 - type: mrr_at_10 value: 87.241 - type: mrr_at_100 value: 87.346 - type: mrr_at_1000 value: 87.347 - type: mrr_at_3 value: 86.25699999999999 - type: mrr_at_5 value: 86.907 - type: ndcg_at_1 value: 80.97 - type: ndcg_at_10 value: 88.017 - type: ndcg_at_100 value: 89.241 - type: ndcg_at_1000 value: 89.34299999999999 - type: ndcg_at_3 value: 85.053 - type: ndcg_at_5 value: 86.663 - type: precision_at_1 value: 80.97 - type: precision_at_10 value: 13.358 - type: precision_at_100 value: 1.525 - type: precision_at_1000 value: 0.157 - type: precision_at_3 value: 37.143 - type: precision_at_5 value: 24.451999999999998 - type: recall_at_1 value: 70.351 - type: recall_at_10 value: 95.39800000000001 - type: recall_at_100 value: 99.55199999999999 - type: recall_at_1000 value: 99.978 - type: recall_at_3 value: 86.913 - type: recall_at_5 value: 91.448 - task: type: Clustering dataset: type: mteb/reddit-clustering name: MTEB RedditClustering config: default split: test revision: 24640382cdbf8abc73003fb0fa6d111a705499eb metrics: - type: v_measure value: 55.62406719814139 - task: type: Clustering dataset: type: mteb/reddit-clustering-p2p name: MTEB RedditClusteringP2P config: default split: test revision: 282350215ef01743dc01b456c7f5241fa8937f16 metrics: - type: v_measure value: 61.386700035141736 
- task: type: Retrieval dataset: type: scidocs name: MTEB SCIDOCS config: default split: test revision: None metrics: - type: map_at_1 value: 4.618 - type: map_at_10 value: 12.920000000000002 - type: map_at_100 value: 15.304 - type: map_at_1000 value: 15.656999999999998 - type: map_at_3 value: 9.187 - type: map_at_5 value: 10.937 - type: mrr_at_1 value: 22.8 - type: mrr_at_10 value: 35.13 - type: mrr_at_100 value: 36.239 - type: mrr_at_1000 value: 36.291000000000004 - type: mrr_at_3 value: 31.917 - type: mrr_at_5 value: 33.787 - type: ndcg_at_1 value: 22.8 - type: ndcg_at_10 value: 21.382 - type: ndcg_at_100 value: 30.257 - type: ndcg_at_1000 value: 36.001 - type: ndcg_at_3 value: 20.43 - type: ndcg_at_5 value: 17.622 - type: precision_at_1 value: 22.8 - type: precision_at_10 value: 11.26 - type: precision_at_100 value: 2.405 - type: precision_at_1000 value: 0.377 - type: precision_at_3 value: 19.633 - type: precision_at_5 value: 15.68 - type: recall_at_1 value: 4.618 - type: recall_at_10 value: 22.811999999999998 - type: recall_at_100 value: 48.787000000000006 - type: recall_at_1000 value: 76.63799999999999 - type: recall_at_3 value: 11.952 - type: recall_at_5 value: 15.892000000000001 - task: type: STS dataset: type: mteb/sickr-sts name: MTEB SICK-R config: default split: test revision: a6ea5a8cab320b040a23452cc28066d9beae2cee metrics: - type: cos_sim_pearson value: 84.01529458252244 - type: cos_sim_spearman value: 77.92985224770254 - type: euclidean_pearson value: 81.04251429422487 - type: euclidean_spearman value: 77.92838490549133 - type: manhattan_pearson value: 80.95892251458979 - type: manhattan_spearman value: 77.81028089705941 - task: type: STS dataset: type: mteb/sts12-sts name: MTEB STS12 config: default split: test revision: a0d554a64d88156834ff5ae9920b964011b16384 metrics: - type: cos_sim_pearson value: 83.97885282534388 - type: cos_sim_spearman value: 75.1221970851712 - type: euclidean_pearson value: 80.34455956720097 - type: euclidean_spearman value: 74.5894274239938 - type: manhattan_pearson value: 80.38999766325465 - type: manhattan_spearman value: 74.68524557166975 - task: type: STS dataset: type: mteb/sts13-sts name: MTEB STS13 config: default split: test revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca metrics: - type: cos_sim_pearson value: 82.95746064915672 - type: cos_sim_spearman value: 85.08683458043946 - type: euclidean_pearson value: 84.56699492836385 - type: euclidean_spearman value: 85.66089116133713 - type: manhattan_pearson value: 84.47553323458541 - type: manhattan_spearman value: 85.56142206781472 - task: type: STS dataset: type: mteb/sts14-sts name: MTEB STS14 config: default split: test revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 metrics: - type: cos_sim_pearson value: 82.71377893595067 - type: cos_sim_spearman value: 81.03453291428589 - type: euclidean_pearson value: 82.57136298308613 - type: euclidean_spearman value: 81.15839961890875 - type: manhattan_pearson value: 82.55157879373837 - type: manhattan_spearman value: 81.1540163767054 - task: type: STS dataset: type: mteb/sts15-sts name: MTEB STS15 config: default split: test revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 metrics: - type: cos_sim_pearson value: 86.64197832372373 - type: cos_sim_spearman value: 88.31966852492485 - type: euclidean_pearson value: 87.98692129976983 - type: euclidean_spearman value: 88.6247340837856 - type: manhattan_pearson value: 87.90437827826412 - type: manhattan_spearman value: 88.56278787131457 - task: type: STS dataset: type: mteb/sts16-sts name: MTEB 
STS16 config: default split: test revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 metrics: - type: cos_sim_pearson value: 81.84159950146693 - type: cos_sim_spearman value: 83.90678384140168 - type: euclidean_pearson value: 83.19005018860221 - type: euclidean_spearman value: 84.16260415876295 - type: manhattan_pearson value: 83.05030612994494 - type: manhattan_spearman value: 83.99605629718336 - task: type: STS dataset: type: mteb/sts17-crosslingual-sts name: MTEB STS17 (en-en) config: en-en split: test revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d metrics: - type: cos_sim_pearson value: 87.49935350176666 - type: cos_sim_spearman value: 87.59086606735383 - type: euclidean_pearson value: 88.06537181129983 - type: euclidean_spearman value: 87.6687448086014 - type: manhattan_pearson value: 87.96599131972935 - type: manhattan_spearman value: 87.63295748969642 - task: type: STS dataset: type: mteb/sts22-crosslingual-sts name: MTEB STS22 (en) config: en split: test revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 metrics: - type: cos_sim_pearson value: 67.68232799482763 - type: cos_sim_spearman value: 67.99930378085793 - type: euclidean_pearson value: 68.50275360001696 - type: euclidean_spearman value: 67.81588179309259 - type: manhattan_pearson value: 68.5892154749763 - type: manhattan_spearman value: 67.84357259640682 - task: type: STS dataset: type: mteb/stsbenchmark-sts name: MTEB STSBenchmark config: default split: test revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 metrics: - type: cos_sim_pearson value: 84.37049618406554 - type: cos_sim_spearman value: 85.57014313159492 - type: euclidean_pearson value: 85.57469513908282 - type: euclidean_spearman value: 85.661948135258 - type: manhattan_pearson value: 85.36866831229028 - type: manhattan_spearman value: 85.5043455368843 - task: type: Reranking dataset: type: mteb/scidocs-reranking name: MTEB SciDocsRR config: default split: test revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab metrics: - type: map value: 84.83259065376154 - type: mrr value: 95.58455433455433 - task: type: Retrieval dataset: type: scifact name: MTEB SciFact config: default split: test revision: None metrics: - type: map_at_1 value: 58.817 - type: map_at_10 value: 68.459 - type: map_at_100 value: 68.951 - type: map_at_1000 value: 68.979 - type: map_at_3 value: 65.791 - type: map_at_5 value: 67.583 - type: mrr_at_1 value: 61.667 - type: mrr_at_10 value: 69.368 - type: mrr_at_100 value: 69.721 - type: mrr_at_1000 value: 69.744 - type: mrr_at_3 value: 67.278 - type: mrr_at_5 value: 68.611 - type: ndcg_at_1 value: 61.667 - type: ndcg_at_10 value: 72.70100000000001 - type: ndcg_at_100 value: 74.928 - type: ndcg_at_1000 value: 75.553 - type: ndcg_at_3 value: 68.203 - type: ndcg_at_5 value: 70.804 - type: precision_at_1 value: 61.667 - type: precision_at_10 value: 9.533 - type: precision_at_100 value: 1.077 - type: precision_at_1000 value: 0.11299999999999999 - type: precision_at_3 value: 26.444000000000003 - type: precision_at_5 value: 17.599999999999998 - type: recall_at_1 value: 58.817 - type: recall_at_10 value: 84.789 - type: recall_at_100 value: 95.0 - type: recall_at_1000 value: 99.667 - type: recall_at_3 value: 72.8 - type: recall_at_5 value: 79.294 - task: type: PairClassification dataset: type: mteb/sprintduplicatequestions-pairclassification name: MTEB SprintDuplicateQuestions config: default split: test revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 metrics: - type: cos_sim_accuracy value: 99.8108910891089 - type: cos_sim_ap value: 95.5743678558349 - 
type: cos_sim_f1 value: 90.43133366385722 - type: cos_sim_precision value: 89.67551622418878 - type: cos_sim_recall value: 91.2 - type: dot_accuracy value: 99.75841584158415 - type: dot_ap value: 94.00786363627253 - type: dot_f1 value: 87.51910341314316 - type: dot_precision value: 89.20041536863967 - type: dot_recall value: 85.9 - type: euclidean_accuracy value: 99.81485148514851 - type: euclidean_ap value: 95.4752113136905 - type: euclidean_f1 value: 90.44334975369456 - type: euclidean_precision value: 89.126213592233 - type: euclidean_recall value: 91.8 - type: manhattan_accuracy value: 99.81584158415842 - type: manhattan_ap value: 95.5163172682464 - type: manhattan_f1 value: 90.51987767584097 - type: manhattan_precision value: 92.3076923076923 - type: manhattan_recall value: 88.8 - type: max_accuracy value: 99.81584158415842 - type: max_ap value: 95.5743678558349 - type: max_f1 value: 90.51987767584097 - task: type: Clustering dataset: type: mteb/stackexchange-clustering name: MTEB StackExchangeClustering config: default split: test revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 metrics: - type: v_measure value: 62.63235986949449 - task: type: Clustering dataset: type: mteb/stackexchange-clustering-p2p name: MTEB StackExchangeClusteringP2P config: default split: test revision: 815ca46b2622cec33ccafc3735d572c266efdb44 metrics: - type: v_measure value: 36.334795589585575 - task: type: Reranking dataset: type: mteb/stackoverflowdupquestions-reranking name: MTEB StackOverflowDupQuestions config: default split: test revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 metrics: - type: map value: 52.02955214518782 - type: mrr value: 52.8004838298956 - task: type: Summarization dataset: type: mteb/summeval name: MTEB SummEval config: default split: test revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c metrics: - type: cos_sim_pearson value: 30.63769566275453 - type: cos_sim_spearman value: 30.422379185989335 - type: dot_pearson value: 26.88493071882256 - type: dot_spearman value: 26.505249740971305 - task: type: Retrieval dataset: type: trec-covid name: MTEB TRECCOVID config: default split: test revision: None metrics: - type: map_at_1 value: 0.21 - type: map_at_10 value: 1.654 - type: map_at_100 value: 10.095 - type: map_at_1000 value: 25.808999999999997 - type: map_at_3 value: 0.594 - type: map_at_5 value: 0.9289999999999999 - type: mrr_at_1 value: 78.0 - type: mrr_at_10 value: 87.019 - type: mrr_at_100 value: 87.019 - type: mrr_at_1000 value: 87.019 - type: mrr_at_3 value: 86.333 - type: mrr_at_5 value: 86.733 - type: ndcg_at_1 value: 73.0 - type: ndcg_at_10 value: 66.52900000000001 - type: ndcg_at_100 value: 53.433 - type: ndcg_at_1000 value: 51.324000000000005 - type: ndcg_at_3 value: 72.02199999999999 - type: ndcg_at_5 value: 69.696 - type: precision_at_1 value: 78.0 - type: precision_at_10 value: 70.39999999999999 - type: precision_at_100 value: 55.46 - type: precision_at_1000 value: 22.758 - type: precision_at_3 value: 76.667 - type: precision_at_5 value: 74.0 - type: recall_at_1 value: 0.21 - type: recall_at_10 value: 1.8849999999999998 - type: recall_at_100 value: 13.801 - type: recall_at_1000 value: 49.649 - type: recall_at_3 value: 0.632 - type: recall_at_5 value: 1.009 - task: type: Retrieval dataset: type: webis-touche2020 name: MTEB Touche2020 config: default split: test revision: None metrics: - type: map_at_1 value: 1.797 - type: map_at_10 value: 9.01 - type: map_at_100 value: 14.682 - type: map_at_1000 value: 16.336000000000002 - type: map_at_3 value: 4.546 - type: 
map_at_5 value: 5.9270000000000005 - type: mrr_at_1 value: 24.490000000000002 - type: mrr_at_10 value: 41.156 - type: mrr_at_100 value: 42.392 - type: mrr_at_1000 value: 42.408 - type: mrr_at_3 value: 38.775999999999996 - type: mrr_at_5 value: 40.102 - type: ndcg_at_1 value: 21.429000000000002 - type: ndcg_at_10 value: 22.222 - type: ndcg_at_100 value: 34.405 - type: ndcg_at_1000 value: 46.599000000000004 - type: ndcg_at_3 value: 25.261 - type: ndcg_at_5 value: 22.695999999999998 - type: precision_at_1 value: 24.490000000000002 - type: precision_at_10 value: 19.796 - type: precision_at_100 value: 7.306 - type: precision_at_1000 value: 1.5350000000000001 - type: precision_at_3 value: 27.211000000000002 - type: precision_at_5 value: 22.857 - type: recall_at_1 value: 1.797 - type: recall_at_10 value: 15.706000000000001 - type: recall_at_100 value: 46.412 - type: recall_at_1000 value: 83.159 - type: recall_at_3 value: 6.1370000000000005 - type: recall_at_5 value: 8.599 - task: type: Classification dataset: type: mteb/toxic_conversations_50k name: MTEB ToxicConversationsClassification config: default split: test revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c metrics: - type: accuracy value: 70.3302 - type: ap value: 14.169121204575601 - type: f1 value: 54.229345975274235 - task: type: Classification dataset: type: mteb/tweet_sentiment_extraction name: MTEB TweetSentimentExtractionClassification config: default split: test revision: d604517c81ca91fe16a244d1248fc021f9ecee7a metrics: - type: accuracy value: 58.22297679683077 - type: f1 value: 58.62984908377875 - task: type: Clustering dataset: type: mteb/twentynewsgroups-clustering name: MTEB TwentyNewsgroupsClustering config: default split: test revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 metrics: - type: v_measure value: 49.952922428464255 - task: type: PairClassification dataset: type: mteb/twittersemeval2015-pairclassification name: MTEB TwitterSemEval2015 config: default split: test revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 metrics: - type: cos_sim_accuracy value: 84.68140907194373 - type: cos_sim_ap value: 70.12180123666836 - type: cos_sim_f1 value: 65.77501791258658 - type: cos_sim_precision value: 60.07853403141361 - type: cos_sim_recall value: 72.66490765171504 - type: dot_accuracy value: 81.92167848840674 - type: dot_ap value: 60.49837581423469 - type: dot_f1 value: 58.44186046511628 - type: dot_precision value: 52.24532224532224 - type: dot_recall value: 66.3060686015831 - type: euclidean_accuracy value: 84.73505394289802 - type: euclidean_ap value: 70.3278904593286 - type: euclidean_f1 value: 65.98851124940161 - type: euclidean_precision value: 60.38107752956636 - type: euclidean_recall value: 72.74406332453826 - type: manhattan_accuracy value: 84.73505394289802 - type: manhattan_ap value: 70.00737738537337 - type: manhattan_f1 value: 65.80150784822642 - type: manhattan_precision value: 61.892583120204606 - type: manhattan_recall value: 70.23746701846966 - type: max_accuracy value: 84.73505394289802 - type: max_ap value: 70.3278904593286 - type: max_f1 value: 65.98851124940161 - task: type: PairClassification dataset: type: mteb/twitterurlcorpus-pairclassification name: MTEB TwitterURLCorpus config: default split: test revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf metrics: - type: cos_sim_accuracy value: 88.44258159661582 - type: cos_sim_ap value: 84.91926704880888 - type: cos_sim_f1 value: 77.07651086632926 - type: cos_sim_precision value: 74.5894554883319 - type: cos_sim_recall value: 79.73514012935017 - 
type: dot_accuracy value: 85.88116583226608 - type: dot_ap value: 78.9753854779923 - type: dot_f1 value: 72.17757637979255 - type: dot_precision value: 66.80647486729143 - type: dot_recall value: 78.48783492454572 - type: euclidean_accuracy value: 88.5299025885823 - type: euclidean_ap value: 85.08006075642194 - type: euclidean_f1 value: 77.29637336504163 - type: euclidean_precision value: 74.69836253950014 - type: euclidean_recall value: 80.08161379735141 - type: manhattan_accuracy value: 88.55124771995187 - type: manhattan_ap value: 85.00941529932851 - type: manhattan_f1 value: 77.33100233100232 - type: manhattan_precision value: 73.37572573956317 - type: manhattan_recall value: 81.73698798891284 - type: max_accuracy value: 88.55124771995187 - type: max_ap value: 85.08006075642194 - type: max_f1 value: 77.33100233100232 language: - en license: mit --- # gte-small General Text Embeddings (GTE) model. Towards General Text Embeddings with Multi-stage Contrastive Learning The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks of text embeddings, including **information retrieval**, **semantic textual similarity**, **text reranking**, etc. ## Metrics We compared the performance of the GTE models with other popular text embedding models on the MTEB benchmark. For more detailed comparison results, please refer to the MTEB leaderboard. | Model Name | Model Size (GB) | Dimension | Sequence Length | Average (56) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) | Classification (12) | |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | **gte-large** | 0.67 | 1024 | 512 | **63.13** | 46.84 | 85.00 | 59.13 | 52.22 | 83.35 | 31.66 | 73.33 | | **gte-base** | 0.22 | 768 | 512 | **62.39** | 46.2 | 84.57 | 58.61 | 51.14 | 82.3 | 31.17 | 73.01 | | e5-large-v2 | 1.34 | 1024| 512 | 62.25 | 44.49 | 86.03 | 56.61 | 50.56 | 82.05 | 30.19 | 75.24 | | e5-base-v2 | 0.44 | 768 | 512 | 61.5 | 43.80 | 85.73 | 55.91 | 50.29 | 81.05 | 30.28 | 73.84 | | **gte-small** | 0.07 | 384 | 512 | **61.36** | 44.89 | 83.54 | 57.7 | 49.46 | 82.07 | 30.42 | 72.31 | | text-embedding-ada-002 | - | 1536 | 8192 | 60.99 | 45.9 | 84.89 | 56.32 | 49.25 | 80.97 | 30.8 | 70.93 | | e5-small-v2 | 0.13 | 384 | 512 | 59.93 | 39.92 | 84.67 | 54.32 | 49.04 | 80.39 | 31.16 | 72.94 | | sentence-t5-xxl | 9.73 | 768 | 512 | 59.51 | 43.72 | 85.06 | 56.42 | 42.24 | 82.63 | 30.08 | 73.42 | | all-mpnet-base-v2 | 0.44 | 768 | 514 | 57.78 | 43.69 | 83.04 | 59.36 | 43.81 | 80.28 | 27.49 | 65.07 | | sgpt-bloom-7b1-msmarco | 28.27 | 4096 | 2048 | 57.59 | 38.93 | 81.9 | 55.65 | 48.22 | 77.74 | 33.6 | 66.19 | | all-MiniLM-L12-v2 | 0.13 | 384 | 512 | 56.53 | 41.81 | 82.41 | 58.44 | 42.69 | 79.8 | 27.9 | 63.21 | | all-MiniLM-L6-v2 | 0.09 | 384 | 512 | 56.26 | 42.35 | 82.37 | 58.04 | 41.95 | 78.9 | 30.81 | 63.05 | | contriever-base-msmarco | 0.44 | 768 | 512 | 56.00 | 41.1 | 82.54 | 53.14 | 41.88 | 76.51 | 30.36 | 66.68 | | sentence-t5-base | 0.22 | 768 | 512 | 55.27 | 40.21 | 85.18 | 53.09 | 33.63 | 81.14 | 31.39 | 69.81 | ## Usage Code example Use with sentence-transformers: ### Limitation This model exclusively caters to English texts, and any lengthy 
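texts will be truncated to a maximum of 512 tokens.

The sentence-transformers snippet referenced under "Usage" above was stripped from this card; below is a minimal sketch of that pattern, assuming the standard SentenceTransformer API and the thenlper/gte-small hub id (an assumption; substitute this card's actual model id if it differs).

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Assumed hub id for this card; adjust if the model lives under a different name.
model = SentenceTransformer("thenlper/gte-small")

sentences = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "Beijing",
]
# Inputs longer than 512 tokens are truncated, per the limitation above.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(cos_sim(embeddings[0], embeddings[1:]))
```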
### Citation If you find our paper or models helpful, please consider citing them as follows:", + "model_explanation_gemini": "Generates sentence embeddings for tasks like similarity measurement, classification, clustering, and retrieval across various datasets." +} \ No newline at end of file diff --git a/data/model_data_json/tiiuae_falcon-40b.json b/data/model_data_json/tiiuae_falcon-40b.json new file mode 100644 index 0000000000000000000000000000000000000000..87c0204f8dff036e2e0f6cce702b256d9030277a --- /dev/null +++ b/data/model_data_json/tiiuae_falcon-40b.json @@ -0,0 +1,29 @@ +{ + "model_id": "tiiuae/falcon-40b", + "downloads": 120899, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "falcon", + "text-generation", + "custom_code", + "en", + "de", + "es", + "fr", + "dataset:tiiuae/falcon-refinedweb", + "arxiv:2205.14135", + "arxiv:1911.02150", + "arxiv:2101.00027", + "arxiv:2005.14165", + "arxiv:2104.09864", + "arxiv:2306.01116", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "region:us" + ], + "description": "--- datasets: - tiiuae/falcon-refinedweb language: - en - de - es - fr inference: false license: apache-2.0 --- # 🚀 Falcon-40B **Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.** *Paper coming soon 😊.* 🤗 To get started with Falcon (inference, finetuning, quantization, etc.), we recommend reading this great blogpost from HF! ## Why use Falcon-40B? * **It is the best open-source model currently available.** Falcon-40B outperforms LLaMA, StableLM, RedPajama, MPT, etc. See the OpenLLM Leaderboard. * **It features an architecture optimized for inference**, with FlashAttention (Dao et al., 2022) and multiquery (Shazeer et al., 2019). * **It is made available under a permissive Apache 2.0 license allowing for commercial use**, without any royalties or restrictions. * ⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.** If you are looking for a version better suited to taking generic instructions in a chat format, we recommend taking a look at Falcon-40B-Instruct. 💸 **Looking for a smaller, less expensive model?** Falcon-7B is Falcon-40B's little brother! 💥 **Falcon LLMs require PyTorch 2.0 for use with transformers!** For fast inference with Falcon, check out Text Generation Inference! Read more in this blogpost. You will need **at least 85-100GB of memory** to swiftly run inference with Falcon-40B. # Model Card for Falcon-40B ## Model Details ### Model Description - **Developed by:** - **Model type:** Causal decoder-only; - **Language(s) (NLP):** English, German, Spanish, French (and limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish); - **License:** Apache 2.0 license. ### Model Source - **Paper:** *coming soon*. ## Uses ### Direct Use Research on large language models; as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbot, etc.) ### Out-of-Scope Use Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful. ## Bias, Risks, and Limitations Falcon-40B is trained mostly on English, German, Spanish, French, with limited capabilities also in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish.
It will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online. ### Recommendations We recommend that users of Falcon-40B consider finetuning it for the specific set of tasks of interest, and that guardrails and appropriate precautions be taken for any production use. ## How to Get Started with the Model ## Training Details ### Training Data Falcon-40B was trained on 1,000B tokens of RefinedWeb, a high-quality filtered and deduplicated web dataset which we enhanced with curated corpora. Significant components from our curated corpora were inspired by The Pile (Gao et al., 2020). | **Data source** | **Fraction** | **Tokens** | **Sources** | |--------------------|--------------|------------|-----------------------------------| | RefinedWeb-English | 75% | 750B | massive web crawl | | RefinedWeb-Europe | 7% | 70B | European massive web crawl | | Books | 6% | 60B | | | Conversations | 5% | 50B | Reddit, StackOverflow, HackerNews | | Code | 5% | 50B | | | Technical | 2% | 20B | arXiv, PubMed, USPTO, etc. | RefinedWeb-Europe is made of the following languages: | **Language** | **Fraction of multilingual data** | **Tokens** | |--------------|-----------------------------------|------------| | German | 26% | 18B | | Spanish | 24% | 17B | | French | 23% | 16B | | _Italian_ | 7% | 5B | | _Portuguese_ | 4% | 3B | | _Polish_ | 4% | 3B | | _Dutch_ | 4% | 3B | | _Romanian_ | 3% | 2B | | _Czech_ | 3% | 2B | | _Swedish_ | 2% | 1B | The data was tokenized with the Falcon-7B/40B tokenizer. ### Training Procedure Falcon-40B was trained on 384 A100 40GB GPUs, using a 3D parallelism strategy (TP=8, PP=4, DP=12) combined with ZeRO. #### Training Hyperparameters | **Hyperparameter** | **Value** | **Comment** | |--------------------|------------|-------------------------------------------| | Precision | | | | Optimizer | AdamW | | | Learning rate | 1.85e-4 | 4B tokens warm-up, cosine decay to 1.85e-5 | | Weight decay | 1e-1 | | | Z-loss | 1e-4 | | | Batch size | 1152 | 100B tokens ramp-up | #### Speeds, Sizes, Times Training started in December 2022 and took two months. ## Evaluation *Paper coming soon.* See the OpenLLM Leaderboard for early results. ## Technical Specifications ### Model Architecture and Objective Falcon-40B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token). The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences: * **Positional embeddings:** rotary (Su et al., 2021); * **Attention:** multiquery (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022); * **Decoder-block:** parallel attention/MLP with two layer norms. For multiquery, we are using an internal variant which uses independent key and values per tensor parallel degree. | **Hyperparameter** | **Value** | **Comment** | |--------------------|-----------|----------------------------------------| | Layers | 60 | | | | 8192 | | | | 64 | Reduced to optimise for FlashAttention | | Vocabulary | 65024 | | | Sequence length | 2048 | | ### Compute Infrastructure #### Hardware Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances. #### Software Falcon-40B was trained with a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.) ## Citation *Paper coming soon* 😊.
In the meantime, you can use the following information to cite: To learn more about the pretraining dataset, see the 📓 RefinedWeb paper. ## License Falcon-40B is made available under the Apache 2.0 license. ## Contact falconllm@tii.ae", + "model_explanation_gemini": "Falcon-40B is a 40-billion-parameter, open-source, causal decoder-only language model trained on 1 trillion tokens of web and curated data, optimized for inference and designed for research or fine-tuning for tasks like text generation, summarization, or chatbots, supporting multiple languages with Apache 2.0 licensing." +} \ No newline at end of file diff --git a/data/model_data_json/tiiuae_falcon-7b-instruct.json b/data/model_data_json/tiiuae_falcon-7b-instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..1c9971dd90a81c132c73ad56253b65b2e8df8894 --- /dev/null +++ b/data/model_data_json/tiiuae_falcon-7b-instruct.json @@ -0,0 +1,28 @@ +{ + "model_id": "tiiuae/falcon-7b-instruct", + "downloads": 179698, + "tags": [ + "transformers", + "pytorch", + "coreml", + "safetensors", + "falcon", + "text-generation", + "conversational", + "custom_code", + "en", + "dataset:tiiuae/falcon-refinedweb", + "arxiv:2205.14135", + "arxiv:1911.02150", + "arxiv:2005.14165", + "arxiv:2104.09864", + "arxiv:2306.01116", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - tiiuae/falcon-refinedweb language: - en inference: true new_version: tiiuae/falcon-11B widget: - text: \"Hey Falcon! Any recommendations for my holidays in Abu Dhabi?\" example_title: \"Abu Dhabi Trip\" - text: \"What's the Everett interpretation of quantum mechanics?\" example_title: \"Q/A: Quantum & Answers\" - text: \"Give me a list of the top 10 dive sites you would recommend around the world.\" example_title: \"Diving Top 10\" - text: \"Can you tell me more about deep-water soloing?\" example_title: \"Extreme sports\" - text: \"Can you write a short tweet about the Apache 2.0 release of our latest AI model, Falcon LLM?\" example_title: \"Twitter Helper\" - text: \"What are the responsibilities of a Chief Llama Officer?\" example_title: \"Trendy Jobs\" license: apache-2.0 --- # ✨ Falcon-7B-Instruct **Falcon-7B-Instruct is a 7B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.** *Paper coming soon 😊.* 🤗 To get started with Falcon (inference, finetuning, quantization, etc.), we recommend reading this great blogpost from HF! ## Why use Falcon-7B-Instruct? * **You are looking for a ready-to-use chat/instruct model based on Falcon-7B.** * **Falcon-7B is a strong base model, outperforming comparable open-source models** (e.g., MPT-7B, StableLM, RedPajama, etc.), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. See the OpenLLM Leaderboard. * **It features an architecture optimized for inference**, with FlashAttention (Dao et al., 2022) and multiquery (Shazeer et al., 2019). 💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from Falcon-7B. 🔥 **Looking for an even more powerful model?** Falcon-40B-Instruct is Falcon-7B-Instruct's big brother! 💥 **Falcon LLMs require PyTorch 2.0 for use with transformers!** For fast inference with Falcon, check out Text Generation Inference!
Read more in this blogpost. You will need **at least 16GB of memory** to swiftly run inference with Falcon-7B-Instruct. # Model Card for Falcon-7B-Instruct ## Model Details ### Model Description - **Developed by:** - **Model type:** Causal decoder-only; - **Language(s) (NLP):** English and French; - **License:** Apache 2.0; - **Finetuned from model:** Falcon-7B. ### Model Source - **Paper:** *coming soon*. ## Uses ### Direct Use Falcon-7B-Instruct has been finetuned on a mixture of instruct and chat datasets. ### Out-of-Scope Use Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful. ## Bias, Risks, and Limitations Falcon-7B-Instruct is mostly trained on English data, and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online. ### Recommendations We recommend that users of Falcon-7B-Instruct develop guardrails and take appropriate precautions for any production use. ## How to Get Started with the Model ## Training Details ### Training Data Falcon-7B-Instruct was finetuned on a 250M-token mixture of instruct/chat datasets. | **Data source** | **Fraction** | **Tokens** | **Description** | |--------------------|--------------|------------|-----------------------------------| | Bai ze | 65% | 164M | chat | | GPT4All | 25% | 62M | instruct | | GPTeacher | 5% | 11M | instruct | | RefinedWeb-English | 5% | 13M | massive web crawl | The data was tokenized with the Falcon-7B/40B tokenizer. ## Evaluation *Paper coming soon.* See the OpenLLM Leaderboard for early results. Note that this model variant is not optimized for NLP benchmarks. ## Technical Specifications For more information about pretraining, see Falcon-7B. ### Model Architecture and Objective Falcon-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token). The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences: * **Positional embeddings:** rotary (Su et al., 2021); * **Attention:** multiquery (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022); * **Decoder-block:** parallel attention/MLP with a single layer norm. | **Hyperparameter** | **Value** | **Comment** | |--------------------|-----------|----------------------------------------| | Layers | 32 | | | | 4544 | Increased to compensate for multiquery | | | 64 | Reduced to optimise for FlashAttention | | Vocabulary | 65024 | | | Sequence length | 2048 | | ### Compute Infrastructure #### Hardware Falcon-7B-Instruct was trained on AWS SageMaker, on 32 A100 40GB GPUs in P4d instances. #### Software Falcon-7B-Instruct was trained with a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.) ## Citation *Paper coming soon* 😊. In the meantime, you can use the following information to cite: To learn more about the pretraining dataset, see the 📓 RefinedWeb paper. ## License Falcon-7B-Instruct is made available under the Apache 2.0 license. ## Contact falconllm@tii.ae", + "model_explanation_gemini": "Falcon-7B-Instruct is a 7B-parameter English-language chat/instruct model optimized for inference, finetuned from Falcon-7B on mixed datasets to provide conversational responses."
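The "How to Get Started with the Model" snippets were stripped from both this card and the Falcon-40B card above. Below is a minimal sketch of the standard transformers pipeline pattern these cards refer to, assuming PyTorch 2.0 is installed; the prompt and sampling parameters are illustrative placeholders, not the cards' original values. Per the cards, expect to need roughly 16GB of memory for Falcon-7B-Instruct and 85-100GB for Falcon-40B.

```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "tiiuae/falcon-7b-instruct"  # "tiiuae/falcon-40b" follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon ships custom modeling code
    device_map="auto",
)
# Illustrative prompt and sampling settings, not the card's original values.
outputs = pipe(
    "Write a short poem about falcons.",
    max_new_tokens=100,
    do_sample=True,
    top_k=10,
    eos_token_id=tokenizer.eos_token_id,
)
print(outputs[0]["generated_text"])
```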
+} \ No newline at end of file diff --git a/data/model_data_json/timbrooks_instruct-pix2pix.json b/data/model_data_json/timbrooks_instruct-pix2pix.json new file mode 100644 index 0000000000000000000000000000000000000000..89ec1eb4661fa13619bc254e6be5cbb1984fc971 --- /dev/null +++ b/data/model_data_json/timbrooks_instruct-pix2pix.json @@ -0,0 +1,14 @@ +{ + "model_id": "timbrooks/instruct-pix2pix", + "downloads": 285360, + "tags": [ + "diffusers", + "safetensors", + "image-to-image", + "license:mit", + "diffusers:StableDiffusionInstructPix2PixPipeline", + "region:us" + ], + "description": "--- license: mit tags: - image-to-image --- # InstructPix2Pix: Learning to Follow Image Editing Instructions GitHub: 96% across all races for gender classification with ‘Middle Eastern’ having the highest accuracy (98.4%) and ‘White’ having the lowest (96.5%). Additionally, CLIP averaged ~93% for racial classification and ~63% for age classification. Our use of evaluations to test for gender, race and age classification as well as denigration harms is simply to evaluate performance of the model across people and surface potential risks and not to demonstrate an endorsement/enthusiasm for such tasks.", + "model_explanation_gemini": "Enables zero-shot image classification by learning visual-textual similarity using a Vision Transformer encoder for research on robustness and generalization in computer vision." +} \ No newline at end of file diff --git a/data/model_data_json/timm_vit_base_patch32_clip_224.laion2b_ft_in12k_in1k.json b/data/model_data_json/timm_vit_base_patch32_clip_224.laion2b_ft_in12k_in1k.json new file mode 100644 index 0000000000000000000000000000000000000000..cd7776061643340ce9ced60dac0c78df6e0ea9c1 --- /dev/null +++ b/data/model_data_json/timm_vit_base_patch32_clip_224.laion2b_ft_in12k_in1k.json @@ -0,0 +1,21 @@ +{ + "model_id": "timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k", + "downloads": 255023, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-classification", + "transformers", + "dataset:imagenet-1k", + "dataset:laion-2b", + "dataset:imagenet-12k", + "arxiv:2212.07143", + "arxiv:2210.08402", + "arxiv:2010.11929", + "license:apache-2.0", + "region:us" + ], + "description": "--- tags: - image-classification - timm - transformers library_name: timm license: apache-2.0 datasets: - imagenet-1k - laion-2b - imagenet-12k --- # Model card for vit_base_patch32_clip_224.laion2b_ft_in12k_in1k A Vision Transformer (ViT) image classification model. Pretrained on LAION-2B image-text pairs using OpenCLIP. Fine-tuned on ImageNet-12k and then ImageNet-1k in timm. See recipes in Reproducible scaling laws. ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 88.2 - GMACs: 4.4 - Activations (M): 4.2 - Image size: 224 x 224 - **Papers:** - OpenCLIP: - Reproducible scaling laws for contrastive language-image learning: - LAION-5B: An open large-scale dataset for training next generation image-text models: - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: - **Dataset:** ImageNet-1k - **Pretrain Dataset:** - LAION-2B - ImageNet-12k ## Model Usage ### Image Classification ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results. ## Citation", + "model_explanation_gemini": "Performs image classification using a Vision Transformer pretrained on LAION-2B and fine-tuned on ImageNet datasets."
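The "Model Usage" snippets in the timm cards here were stripped as well; below is a minimal image-classification sketch for the checkpoint above, assuming timm's standard factory and data-config helpers (the image path is a placeholder).

```python
import timm
import torch
from PIL import Image

# Load the pretrained classifier from the Hugging Face Hub via timm.
model = timm.create_model(
    "vit_base_patch32_clip_224.laion2b_ft_in12k_in1k", pretrained=True
)
model.eval()

# Recreate the preprocessing the model was trained with.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

img = Image.open("some_image.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # shape: (1, 1000)
top5 = logits.softmax(dim=-1).topk(5)
print(top5.indices, top5.values)
```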
+} \ No newline at end of file diff --git a/data/model_data_json/timm_vit_base_patch8_224.augreg2_in21k_ft_in1k.json b/data/model_data_json/timm_vit_base_patch8_224.augreg2_in21k_ft_in1k.json new file mode 100644 index 0000000000000000000000000000000000000000..9bd1649006b7b422b3ef47b67ce35bd3ca4ea59c --- /dev/null +++ b/data/model_data_json/timm_vit_base_patch8_224.augreg2_in21k_ft_in1k.json @@ -0,0 +1,19 @@ +{ + "model_id": "timm/vit_base_patch8_224.augreg2_in21k_ft_in1k", + "downloads": 251987, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-classification", + "transformers", + "dataset:imagenet-1k", + "dataset:imagenet-21k", + "arxiv:2106.10270", + "arxiv:2010.11929", + "license:apache-2.0", + "region:us" + ], + "description": "--- tags: - image-classification - timm - transformers library_name: timm license: apache-2.0 datasets: - imagenet-1k - imagenet-21k --- # Model card for vit_base_patch8_224.augreg2_in21k_ft_in1k A Vision Transformer (ViT) image classification model. Trained on ImageNet-21k by paper authors and (re) fine-tuned on ImageNet-1k with additional augmentation and regularization by Ross Wightman. ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 86.6 - GMACs: 66.9 - Activations (M): 65.7 - Image size: 224 x 224 - **Papers:** - How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers: - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: - **Dataset:** ImageNet-1k - **Pretrain Dataset:** ImageNet-21k - **Original:** ## Model Usage ### Image Classification ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results. ## Citation", + "model_explanation_gemini": "A Vision Transformer model fine-tuned for image classification on ImageNet-1k after pretraining on ImageNet-21k, featuring additional augmentation and regularization." +} \ No newline at end of file diff --git a/data/model_data_json/timm_vit_large_patch14_dinov2.lvd142m.json b/data/model_data_json/timm_vit_large_patch14_dinov2.lvd142m.json new file mode 100644 index 0000000000000000000000000000000000000000..778cc0e9292a0d01fa9cabb9d42fdde98abf745d --- /dev/null +++ b/data/model_data_json/timm_vit_large_patch14_dinov2.lvd142m.json @@ -0,0 +1,17 @@ +{ + "model_id": "timm/vit_large_patch14_dinov2.lvd142m", + "downloads": 36847, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-feature-extraction", + "transformers", + "arxiv:2304.07193", + "arxiv:2010.11929", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: timm tags: - image-feature-extraction - timm - transformers --- # Model card for vit_large_patch14_dinov2.lvd142m A Vision Transformer (ViT) image feature model. Pretrained on LVD-142M with self-supervised DINOv2 method. ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 304.4 - GMACs: 507.1 - Activations (M): 1058.8 - Image size: 518 x 518 - **Papers:** - DINOv2: Learning Robust Visual Features without Supervision: - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: - **Original:** - **Pretrain Dataset:** LVD-142M ## Model Usage ### Image Classification ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results. 
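The embedding snippet was stripped from this card too; a minimal sketch follows, assuming timm's convention that num_classes=0 drops the classifier head so the forward pass returns pooled features (the image path is again a placeholder).

```python
import timm
import torch
from PIL import Image

# num_classes=0 removes the classification head; the model returns pooled features.
model = timm.create_model(
    "vit_large_patch14_dinov2.lvd142m", pretrained=True, num_classes=0
)
model.eval()

config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

img = Image.open("some_image.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    features = model(transform(img).unsqueeze(0))
print(features.shape)  # expected: torch.Size([1, 1024]) for this ViT-L backbone
```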
## Citation", + "model_explanation_gemini": "A Vision Transformer model pretrained with self-supervised DINOv2 on LVD-142M for image classification and feature extraction." +} \ No newline at end of file diff --git a/data/model_data_json/timm_vit_large_patch14_reg4_dinov2.lvd142m.json b/data/model_data_json/timm_vit_large_patch14_reg4_dinov2.lvd142m.json new file mode 100644 index 0000000000000000000000000000000000000000..e5327e3b0869ed621328c71f9dea49a9df72e7f7 --- /dev/null +++ b/data/model_data_json/timm_vit_large_patch14_reg4_dinov2.lvd142m.json @@ -0,0 +1,18 @@ +{ + "model_id": "timm/vit_large_patch14_reg4_dinov2.lvd142m", + "downloads": 111609, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-feature-extraction", + "transformers", + "arxiv:2309.16588", + "arxiv:2304.07193", + "arxiv:2010.11929", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: timm tags: - image-feature-extraction - timm - transformers --- # Model card for vit_large_patch14_reg4_dinov2.lvd142m A Vision Transformer (ViT) image feature model with registers. Pretrained on LVD-142M with self-supervised DINOv2 method. ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 304.4 - GMACs: 416.1 - Activations (M): 305.3 - Image size: 518 x 518 - **Papers:** - Vision Transformers Need Registers: - DINOv2: Learning Robust Visual Features without Supervision: - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: - **Original:** - **Pretrain Dataset:** LVD-142M ## Model Usage ### Image Classification ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results. ## Citation", + "model_explanation_gemini": "Pretrained on LVD-142M using self-supervised DINOv2, this Vision Transformer extracts image features or performs classification with registers, handling 518x518 inputs." +} \ No newline at end of file diff --git a/data/model_data_json/timm_vit_small_patch16_224.augreg_in21k_ft_in1k.json b/data/model_data_json/timm_vit_small_patch16_224.augreg_in21k_ft_in1k.json new file mode 100644 index 0000000000000000000000000000000000000000..6427f374e0df66bc671fa149348f900dde8ebf66 --- /dev/null +++ b/data/model_data_json/timm_vit_small_patch16_224.augreg_in21k_ft_in1k.json @@ -0,0 +1,19 @@ +{ + "model_id": "timm/vit_small_patch16_224.augreg_in21k_ft_in1k", + "downloads": 626007, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-classification", + "transformers", + "dataset:imagenet-1k", + "dataset:imagenet-21k", + "arxiv:2106.10270", + "arxiv:2010.11929", + "license:apache-2.0", + "region:us" + ], + "description": "--- tags: - image-classification - timm - transformers library_name: timm license: apache-2.0 datasets: - imagenet-1k - imagenet-21k --- # Model card for vit_small_patch16_224.augreg_in21k_ft_in1k A Vision Transformer (ViT) image classification model. Trained on ImageNet-21k and fine-tuned on ImageNet-1k (with additional augmentation and regularization) in JAX by paper authors, ported to PyTorch by Ross Wightman. ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 22.1 - GMACs: 4.3 - Activations (M): 8.2 - Image size: 224 x 224 - **Papers:** - How to train your ViT? 
Data, Augmentation, and Regularization in Vision Transformers: - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: - **Dataset:** ImageNet-1k - **Pretrain Dataset:** ImageNet-21k - **Original:** ## Model Usage ### Image Classification ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results. ## Citation", + "model_explanation_gemini": "A Vision Transformer model for image classification, pretrained on ImageNet-21k and fine-tuned on ImageNet-1k with augmentation and regularization." +} \ No newline at end of file diff --git a/data/model_data_json/timm_vit_tiny_patch16_224.augreg_in21k.json b/data/model_data_json/timm_vit_tiny_patch16_224.augreg_in21k.json new file mode 100644 index 0000000000000000000000000000000000000000..d22288d0cffc783900154f2681b6be6f8cad2ac3 --- /dev/null +++ b/data/model_data_json/timm_vit_tiny_patch16_224.augreg_in21k.json @@ -0,0 +1,18 @@ +{ + "model_id": "timm/vit_tiny_patch16_224.augreg_in21k", + "downloads": 119886, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-classification", + "transformers", + "dataset:imagenet-21k", + "arxiv:2106.10270", + "arxiv:2010.11929", + "license:apache-2.0", + "region:us" + ], + "description": "--- tags: - image-classification - timm - transformers library_name: timm license: apache-2.0 datasets: - imagenet-21k --- # Model card for vit_tiny_patch16_224.augreg_in21k A Vision Transformer (ViT) image classification model. Trained on ImageNet-21k (with additional augmentation and regularization) in JAX by paper authors, ported to PyTorch by Ross Wightman. ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 9.7 - GMACs: 1.1 - Activations (M): 4.1 - Image size: 224 x 224 - **Papers:** - How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers: - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: - **Dataset:** ImageNet-21k - **Original:** ## Model Usage ### Image Classification ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results. ## Citation", + "model_explanation_gemini": "A Vision Transformer (ViT) model for image classification, trained on ImageNet-21k with augmentation and regularization, designed to process 224x224 images." +} \ No newline at end of file diff --git a/data/model_data_json/timm_vit_tiny_patch16_224.augreg_in21k_ft_in1k.json b/data/model_data_json/timm_vit_tiny_patch16_224.augreg_in21k_ft_in1k.json new file mode 100644 index 0000000000000000000000000000000000000000..7201bc155de0f9a3b31645370c7d51ed05334b39 --- /dev/null +++ b/data/model_data_json/timm_vit_tiny_patch16_224.augreg_in21k_ft_in1k.json @@ -0,0 +1,19 @@ +{ + "model_id": "timm/vit_tiny_patch16_224.augreg_in21k_ft_in1k", + "downloads": 355886, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-classification", + "transformers", + "dataset:imagenet-1k", + "dataset:imagenet-21k", + "arxiv:2106.10270", + "arxiv:2010.11929", + "license:apache-2.0", + "region:us" + ], + "description": "--- tags: - image-classification - timm - transformers library_name: timm license: apache-2.0 datasets: - imagenet-1k - imagenet-21k --- # Model card for vit_tiny_patch16_224.augreg_in21k_ft_in1k A Vision Transformer (ViT) image classification model. 
Trained on ImageNet-21k and fine-tuned on ImageNet-1k (with additional augmentation and regularization) in JAX by paper authors, ported to PyTorch by Ross Wightman. ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 5.7 - GMACs: 1.1 - Activations (M): 4.1 - Image size: 224 x 224 - **Papers:** - How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers: - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: - **Dataset:** ImageNet-1k - **Pretrain Dataset:** ImageNet-21k - **Original:** ## Model Usage ### Image Classification ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results. ## Citation", + "model_explanation_gemini": "A Vision Transformer (ViT) model for image classification, pretrained on ImageNet-21k and fine-tuned on ImageNet-1k with augmentation and regularization." +} \ No newline at end of file diff --git a/data/model_data_json/timm_wide_resnet50_2.racm_in1k.json b/data/model_data_json/timm_wide_resnet50_2.racm_in1k.json new file mode 100644 index 0000000000000000000000000000000000000000..c28f0bb1b3fda9218f0efd082c902d99d244d5a1 --- /dev/null +++ b/data/model_data_json/timm_wide_resnet50_2.racm_in1k.json @@ -0,0 +1,18 @@ +{ + "model_id": "timm/wide_resnet50_2.racm_in1k", + "downloads": 223378, + "tags": [ + "timm", + "pytorch", + "safetensors", + "image-classification", + "transformers", + "arxiv:2110.00476", + "arxiv:1605.07146", + "arxiv:1512.03385", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 library_name: timm tags: - image-classification - timm - transformers --- # Model card for wide_resnet50_2.racm_in1k A Wide-ResNet-B image classification model. This model features: * ReLU activations * single layer 7x7 convolution with pooling * 1x1 convolution shortcut downsample Trained on ImageNet-1k in timm using the recipe template described below. Recipe details: * RandAugment recipe. Inspired by and evolved from EfficientNet RandAugment recipes. Published as recipe in ResNet Strikes Back. * RMSProp (TF 1.0 behaviour) optimizer, EMA weight averaging * Step (exponential decay w/ staircase) LR schedule with warmup ## Model Details - **Model Type:** Image classification / feature backbone - **Model Stats:** - Params (M): 68.9 - GMACs: 11.4 - Activations (M): 14.4 - Image size: train = 224 x 224, test = 288 x 288 - **Papers:** - ResNet strikes back: An improved training procedure in timm: - Wide Residual Networks: - Deep Residual Learning for Image Recognition: - **Original:** ## Model Usage ### Image Classification ### Feature Map Extraction ### Image Embeddings ## Model Comparison Explore the dataset and runtime metrics of this model in timm model results.
|model |img_size|top1 |top5 |param_count|gmacs|macts|img/sec| |------------------------------------------|--------|-----|-----|-----------|-----|-----|-------| |seresnextaa101d_32x8d.sw_in12k_ft_in1k_288|320 |86.72|98.17|93.6 |35.2 |69.7 |451 | |seresnextaa101d_32x8d.sw_in12k_ft_in1k_288|288 |86.51|98.08|93.6 |28.5 |56.4 |560 | |seresnextaa101d_32x8d.sw_in12k_ft_in1k|288 |86.49|98.03|93.6 |28.5 |56.4 |557 | |seresnextaa101d_32x8d.sw_in12k_ft_in1k|224 |85.96|97.82|93.6 |17.2 |34.2 |923 | |resnext101_32x32d.fb_wsl_ig1b_ft_in1k|224 |85.11|97.44|468.5 |87.3 |91.1 |254 | |resnetrs420.tf_in1k|416 |85.0 |97.12|191.9 |108.4|213.8|134 | |ecaresnet269d.ra2_in1k|352 |84.96|97.22|102.1 |50.2 |101.2|291 | |ecaresnet269d.ra2_in1k|320 |84.73|97.18|102.1 |41.5 |83.7 |353 | |resnetrs350.tf_in1k|384 |84.71|96.99|164.0 |77.6 |154.7|183 | |seresnextaa101d_32x8d.ah_in1k|288 |84.57|97.08|93.6 |28.5 |56.4 |557 | |resnetrs200.tf_in1k|320 |84.45|97.08|93.2 |31.5 |67.8 |446 | |resnetrs270.tf_in1k|352 |84.43|96.97|129.9 |51.1 |105.5|280 | |seresnext101d_32x8d.ah_in1k|288 |84.36|96.92|93.6 |27.6 |53.0 |595 | |seresnet152d.ra2_in1k|320 |84.35|97.04|66.8 |24.1 |47.7 |610 | |resnetrs350.tf_in1k|288 |84.3 |96.94|164.0 |43.7 |87.1 |333 | |resnext101_32x8d.fb_swsl_ig1b_ft_in1k|224 |84.28|97.17|88.8 |16.5 |31.2 |1100 | |resnetrs420.tf_in1k|320 |84.24|96.86|191.9 |64.2 |126.6|228 | |seresnext101_32x8d.ah_in1k|288 |84.19|96.87|93.6 |27.2 |51.6 |613 | |resnext101_32x16d.fb_wsl_ig1b_ft_in1k|224 |84.18|97.19|194.0 |36.3 |51.2 |581 | |resnetaa101d.sw_in12k_ft_in1k|288 |84.11|97.11|44.6 |15.1 |29.0 |1144 | |resnet200d.ra2_in1k|320 |83.97|96.82|64.7 |31.2 |67.3 |518 | |resnetrs200.tf_in1k|256 |83.87|96.75|93.2 |20.2 |43.4 |692 | |seresnextaa101d_32x8d.ah_in1k|224 |83.86|96.65|93.6 |17.2 |34.2 |923 | |resnetrs152.tf_in1k|320 |83.72|96.61|86.6 |24.3 |48.1 |617 | |seresnet152d.ra2_in1k|256 |83.69|96.78|66.8 |15.4 |30.6 |943 | |seresnext101d_32x8d.ah_in1k|224 |83.68|96.61|93.6 |16.7 |32.0 |986 | |resnet152d.ra2_in1k|320 |83.67|96.74|60.2 |24.1 |47.7 |706 | |resnetrs270.tf_in1k|256 |83.59|96.61|129.9 |27.1 |55.8 |526 | |seresnext101_32x8d.ah_in1k|224 |83.58|96.4 |93.6 |16.5 |31.2 |1013 | |resnetaa101d.sw_in12k_ft_in1k|224 |83.54|96.83|44.6 |9.1 |17.6 |1864 | |resnet152.a1h_in1k|288 |83.46|96.54|60.2 |19.1 |37.3 |904 | |resnext101_32x16d.fb_swsl_ig1b_ft_in1k|224 |83.35|96.85|194.0 |36.3 |51.2 |582 | |resnet200d.ra2_in1k|256 |83.23|96.53|64.7 |20.0 |43.1 |809 | |resnext101_32x4d.fb_swsl_ig1b_ft_in1k|224 |83.22|96.75|44.2 |8.0 |21.2 |1814 | |resnext101_64x4d.c1_in1k|288 |83.16|96.38|83.5 |25.7 |51.6 |590 | |resnet152d.ra2_in1k|256 |83.14|96.38|60.2 |15.4 |30.5 |1096 | |resnet101d.ra2_in1k|320 |83.02|96.45|44.6 |16.5 |34.8 |992 | |ecaresnet101d.miil_in1k|288 |82.98|96.54|44.6 |13.4 |28.2 |1077 | |resnext101_64x4d.tv_in1k|224 |82.98|96.25|83.5 |15.5 |31.2 |989 | |resnetrs152.tf_in1k|256 |82.86|96.28|86.6 |15.6 |30.8 |951 | |resnext101_32x8d.tv2_in1k|224 |82.83|96.22|88.8 |16.5 |31.2 |1099 | |resnet152.a1h_in1k|224 |82.8 |96.13|60.2 |11.6 |22.6 |1486 | |resnet101.a1h_in1k|288 |82.8 |96.32|44.6 |13.0 |26.8 |1291 | |resnet152.a1_in1k|288 |82.74|95.71|60.2 |19.1 |37.3 |905 | |resnext101_32x8d.fb_wsl_ig1b_ft_in1k|224 |82.69|96.63|88.8 |16.5 |31.2 |1100 | |resnet152.a2_in1k|288 |82.62|95.75|60.2 |19.1 |37.3 |904 | |resnetaa50d.sw_in12k_ft_in1k|288 |82.61|96.49|25.6 |8.9 |20.6 |1729 | |resnet61q.ra2_in1k|288 |82.53|96.13|36.8 |9.9 |21.5 |1773 | |wide_resnet101_2.tv2_in1k|224 |82.5 |96.02|126.9 |22.8 |21.2 |1078 | |resnext101_64x4d.c1_in1k|224 
|82.46|95.92|83.5 |15.5 |31.2 |987 |
|resnet51q.ra2_in1k|288 |82.36|96.18|35.7 |8.1 |20.9 |1964 |
|ecaresnet50t.ra2_in1k|320 |82.35|96.14|25.6 |8.8 |24.1 |1386 |
|resnet101.a1_in1k|288 |82.31|95.63|44.6 |13.0 |26.8 |1291 |
|resnetrs101.tf_in1k|288 |82.29|96.01|63.6 |13.6 |28.5 |1078 |
|resnet152.tv2_in1k|224 |82.29|96.0 |60.2 |11.6 |22.6 |1484 |
|wide_resnet50_2.racm_in1k|288 |82.27|96.06|68.9 |18.9 |23.8 |1176 |
|resnet101d.ra2_in1k|256 |82.26|96.07|44.6 |10.6 |22.2 |1542 |
|resnet101.a2_in1k|288 |82.24|95.73|44.6 |13.0 |26.8 |1290 |
|seresnext50_32x4d.racm_in1k|288 |82.2 |96.14|27.6 |7.0 |23.8 |1547 |
|ecaresnet101d.miil_in1k|224 |82.18|96.05|44.6 |8.1 |17.1 |1771 |
|resnext50_32x4d.fb_swsl_ig1b_ft_in1k|224 |82.17|96.22|25.0 |4.3 |14.4 |2943 |
|ecaresnet50t.a1_in1k|288 |82.12|95.65|25.6 |7.1 |19.6 |1704 |
|resnext50_32x4d.a1h_in1k|288 |82.03|95.94|25.0 |7.0 |23.8 |1745 |
|ecaresnet101d_pruned.miil_in1k|288 |82.0 |96.15|24.9 |5.8 |12.7 |1787 |
|resnet61q.ra2_in1k|256 |81.99|95.85|36.8 |7.8 |17.0 |2230 |
|resnext101_32x8d.tv2_in1k|176 |81.98|95.72|88.8 |10.3 |19.4 |1768 |
|resnet152.a1_in1k|224 |81.97|95.24|60.2 |11.6 |22.6 |1486 |
|resnet101.a1h_in1k|224 |81.93|95.75|44.6 |7.8 |16.2 |2122 |
|resnet101.tv2_in1k|224 |81.9 |95.77|44.6 |7.8 |16.2 |2118 |
|resnext101_32x16d.fb_ssl_yfcc100m_ft_in1k|224 |81.84|96.1 |194.0 |36.3 |51.2 |583 |
|resnet51q.ra2_in1k|256 |81.78|95.94|35.7 |6.4 |16.6 |2471 |
|resnet152.a2_in1k|224 |81.77|95.22|60.2 |11.6 |22.6 |1485 |
|resnetaa50d.sw_in12k_ft_in1k|224 |81.74|96.06|25.6 |5.4 |12.4 |2813 |
|ecaresnet50t.a2_in1k|288 |81.65|95.54|25.6 |7.1 |19.6 |1703 |
|ecaresnet50d.miil_in1k|288 |81.64|95.88|25.6 |7.2 |19.7 |1694 |
|resnext101_32x8d.fb_ssl_yfcc100m_ft_in1k|224 |81.62|96.04|88.8 |16.5 |31.2 |1101 |
|wide_resnet50_2.tv2_in1k|224 |81.61|95.76|68.9 |11.4 |14.4 |1930 |
|resnetaa50.a1h_in1k|288 |81.61|95.83|25.6 |8.5 |19.2 |1868 |
|resnet101.a1_in1k|224 |81.5 |95.16|44.6 |7.8 |16.2 |2125 |
|resnext50_32x4d.a1_in1k|288 |81.48|95.16|25.0 |7.0 |23.8 |1745 |
|gcresnet50t.ra2_in1k|288 |81.47|95.71|25.9 |6.9 |18.6 |2071 |
|wide_resnet50_2.racm_in1k|224 |81.45|95.53|68.9 |11.4 |14.4 |1929 |
|resnet50d.a1_in1k|288 |81.44|95.22|25.6 |7.2 |19.7 |1908 |
|ecaresnet50t.ra2_in1k|256 |81.44|95.67|25.6 |5.6 |15.4 |2168 |
|ecaresnetlight.miil_in1k|288 |81.4 |95.82|30.2 |6.8 |13.9 |2132 |
|resnet50d.ra2_in1k|288 |81.37|95.74|25.6 |7.2 |19.7 |1910 |
|resnet101.a2_in1k|224 |81.32|95.19|44.6 |7.8 |16.2 |2125 |
|seresnet50.ra2_in1k|288 |81.3 |95.65|28.1 |6.8 |18.4 |1803 |
|resnext50_32x4d.a2_in1k|288 |81.3 |95.11|25.0 |7.0 |23.8 |1746 |
|seresnext50_32x4d.racm_in1k|224 |81.27|95.62|27.6 |4.3 |14.4 |2591 |
|ecaresnet50t.a1_in1k|224 |81.26|95.16|25.6 |4.3 |11.8 |2823 |
|gcresnext50ts.ch_in1k|288 |81.23|95.54|15.7 |4.8 |19.6 |2117 |
|senet154.gluon_in1k|224 |81.23|95.35|115.1 |20.8 |38.7 |545 |
|resnet50.a1_in1k|288 |81.22|95.11|25.6 |6.8 |18.4 |2089 |
|resnet50_gn.a1h_in1k|288 |81.22|95.63|25.6 |6.8 |18.4 |676 |
|resnet50d.a2_in1k|288 |81.18|95.09|25.6 |7.2 |19.7 |1908 |
|resnet50.fb_swsl_ig1b_ft_in1k|224 |81.18|95.98|25.6 |4.1 |11.1 |3455 |
|resnext50_32x4d.tv2_in1k|224 |81.17|95.34|25.0 |4.3 |14.4 |2933 |
|resnext50_32x4d.a1h_in1k|224 |81.1 |95.33|25.0 |4.3 |14.4 |2934 |
|seresnet50.a2_in1k|288 |81.1 |95.23|28.1 |6.8 |18.4 |1801 |
|seresnet50.a1_in1k|288 |81.1 |95.12|28.1 |6.8 |18.4 |1799 |
|resnet152s.gluon_in1k|224 |81.02|95.41|60.3 |12.9 |25.0 |1347 |
|resnet50.d_in1k|288 |80.97|95.44|25.6 |6.8 |18.4 |2085 |
|gcresnet50t.ra2_in1k|256 |80.94|95.45|25.9 |5.4 |14.7 |2571 |
|resnext101_32x4d.fb_ssl_yfcc100m_ft_in1k|224 |80.93|95.73|44.2 |8.0 |21.2 |1814 |
|resnet50.c1_in1k|288 |80.91|95.55|25.6 |6.8 |18.4 |2084 |
|seresnext101_32x4d.gluon_in1k|224 |80.9 |95.31|49.0 |8.0 |21.3 |1585 |
|seresnext101_64x4d.gluon_in1k|224 |80.9 |95.3 |88.2 |15.5 |31.2 |918 |
|resnet50.c2_in1k|288 |80.86|95.52|25.6 |6.8 |18.4 |2085 |
|resnet50.tv2_in1k|224 |80.85|95.43|25.6 |4.1 |11.1 |3450 |
|ecaresnet50t.a2_in1k|224 |80.84|95.02|25.6 |4.3 |11.8 |2821 |
|ecaresnet101d_pruned.miil_in1k|224 |80.79|95.62|24.9 |3.5 |7.7 |2961 |
|seresnet33ts.ra2_in1k|288 |80.79|95.36|19.8 |6.0 |14.8 |2506 |
|ecaresnet50d_pruned.miil_in1k|288 |80.79|95.58|19.9 |4.2 |10.6 |2349 |
|resnet50.a2_in1k|288 |80.78|94.99|25.6 |6.8 |18.4 |2088 |
|resnet50.b1k_in1k|288 |80.71|95.43|25.6 |6.8 |18.4 |2087 |
|resnext50_32x4d.ra_in1k|288 |80.7 |95.39|25.0 |7.0 |23.8 |1749 |
|resnetrs101.tf_in1k|192 |80.69|95.24|63.6 |6.0 |12.7 |2270 |
|resnet50d.a1_in1k|224 |80.68|94.71|25.6 |4.4 |11.9 |3162 |
|eca_resnet33ts.ra2_in1k|288 |80.68|95.36|19.7 |6.0 |14.8 |2637 |
|resnet50.a1h_in1k|224 |80.67|95.3 |25.6 |4.1 |11.1 |3452 |
|resnext50d_32x4d.bt_in1k|288 |80.67|95.42|25.0 |7.4 |25.1 |1626 |
|resnetaa50.a1h_in1k|224 |80.63|95.21|25.6 |5.2 |11.6 |3034 |
|ecaresnet50d.miil_in1k|224 |80.61|95.32|25.6 |4.4 |11.9 |2813 |
|resnext101_64x4d.gluon_in1k|224 |80.61|94.99|83.5 |15.5 |31.2 |989 |
|gcresnet33ts.ra2_in1k|288 |80.6 |95.31|19.9 |6.0 |14.8 |2578 |
|gcresnext50ts.ch_in1k|256 |80.57|95.17|15.7 |3.8 |15.5 |2710 |
|resnet152.a3_in1k|224 |80.56|95.0 |60.2 |11.6 |22.6 |1483 |
|resnet50d.ra2_in1k|224 |80.53|95.16|25.6 |4.4 |11.9 |3164 |
|resnext50_32x4d.a1_in1k|224 |80.53|94.46|25.0 |4.3 |14.4 |2930 |
|wide_resnet101_2.tv2_in1k|176 |80.48|94.98|126.9 |14.3 |13.2 |1719 |
|resnet152d.gluon_in1k|224 |80.47|95.2 |60.2 |11.8 |23.4 |1428 |
|resnet50.b2k_in1k|288 |80.45|95.32|25.6 |6.8 |18.4 |2086 |
|ecaresnetlight.miil_in1k|224 |80.45|95.24|30.2 |4.1 |8.4 |3530 |
|resnext50_32x4d.a2_in1k|224 |80.45|94.63|25.0 |4.3 |14.4 |2936 |
|wide_resnet50_2.tv2_in1k|176 |80.43|95.09|68.9 |7.3 |9.0 |3015 |
|resnet101d.gluon_in1k|224 |80.42|95.01|44.6 |8.1 |17.0 |2007 |
|resnet50.a1_in1k|224 |80.38|94.6 |25.6 |4.1 |11.1 |3461 |
|seresnet33ts.ra2_in1k|256 |80.36|95.1 |19.8 |4.8 |11.7 |3267 |
|resnext101_32x4d.gluon_in1k|224 |80.34|94.93|44.2 |8.0 |21.2 |1814 |
|resnext50_32x4d.fb_ssl_yfcc100m_ft_in1k|224 |80.32|95.4 |25.0 |4.3 |14.4 |2941 |
|resnet101s.gluon_in1k|224 |80.28|95.16|44.7 |9.2 |18.6 |1851 |
|seresnet50.ra2_in1k|224 |80.26|95.08|28.1 |4.1 |11.1 |2972 |
|resnetblur50.bt_in1k|288 |80.24|95.24|25.6 |8.5 |19.9 |1523 |
|resnet50d.a2_in1k|224 |80.22|94.63|25.6 |4.4 |11.9 |3162 |
|resnet152.tv2_in1k|176 |80.2 |94.64|60.2 |7.2 |14.0 |2346 |
|seresnet50.a2_in1k|224 |80.08|94.74|28.1 |4.1 |11.1 |2969 |
|eca_resnet33ts.ra2_in1k|256 |80.08|94.97|19.7 |4.8 |11.7 |3284 |
|gcresnet33ts.ra2_in1k|256 |80.06|94.99|19.9 |4.8 |11.7 |3216 |
|resnet50_gn.a1h_in1k|224 |80.06|94.95|25.6 |4.1 |11.1 |1109 |
|seresnet50.a1_in1k|224 |80.02|94.71|28.1 |4.1 |11.1 |2962 |
|resnet50.ram_in1k|288 |79.97|95.05|25.6 |6.8 |18.4 |2086 |
|resnet152c.gluon_in1k|224 |79.92|94.84|60.2 |11.8 |23.4 |1455 |
|seresnext50_32x4d.gluon_in1k|224 |79.91|94.82|27.6 |4.3 |14.4 |2591 |
|resnet50.d_in1k|224 |79.91|94.67|25.6 |4.1 |11.1 |3456 |
|resnet101.tv2_in1k|176 |79.9 |94.6 |44.6 |4.9 |10.1 |3341 |
|resnetrs50.tf_in1k|224 |79.89|94.97|35.7 |4.5 |12.1 |2774 |
|resnet50.c2_in1k|224 |79.88|94.87|25.6 |4.1 |11.1 |3455 |
|ecaresnet26t.ra2_in1k|320 |79.86|95.07|16.0 |5.2 |16.4 |2168 |
|resnet50.a2_in1k|224 |79.85|94.56|25.6 |4.1 |11.1 |3460 |
|resnet50.ra_in1k|288 |79.83|94.97|25.6 |6.8 |18.4 |2087 |
|resnet101.a3_in1k|224 |79.82|94.62|44.6 |7.8 |16.2 |2114 |
|resnext50_32x4d.ra_in1k|224 |79.76|94.6 |25.0 |4.3 |14.4 |2943 |
|resnet50.c1_in1k|224 |79.74|94.95|25.6 |4.1 |11.1 |3455 |
|ecaresnet50d_pruned.miil_in1k|224 |79.74|94.87|19.9 |2.5 |6.4 |3929 |
|resnet33ts.ra2_in1k|288 |79.71|94.83|19.7 |6.0 |14.8 |2710 |
|resnet152.gluon_in1k|224 |79.68|94.74|60.2 |11.6 |22.6 |1486 |
|resnext50d_32x4d.bt_in1k|224 |79.67|94.87|25.0 |4.5 |15.2 |2729 |
|resnet50.bt_in1k|288 |79.63|94.91|25.6 |6.8 |18.4 |2086 |
|ecaresnet50t.a3_in1k|224 |79.56|94.72|25.6 |4.3 |11.8 |2805 |
|resnet101c.gluon_in1k|224 |79.53|94.58|44.6 |8.1 |17.0 |2062 |
|resnet50.b1k_in1k|224 |79.52|94.61|25.6 |4.1 |11.1 |3459 |
|resnet50.tv2_in1k|176 |79.42|94.64|25.6 |2.6 |6.9 |5397 |
|resnet32ts.ra2_in1k|288 |79.4 |94.66|18.0 |5.9 |14.6 |2752 |
|resnet50.b2k_in1k|224 |79.38|94.57|25.6 |4.1 |11.1 |3459 |
|resnext50_32x4d.tv2_in1k|176 |79.37|94.3 |25.0 |2.7 |9.0 |4577 |
|resnext50_32x4d.gluon_in1k|224 |79.36|94.43|25.0 |4.3 |14.4 |2942 |
|resnext101_32x8d.tv_in1k|224 |79.31|94.52|88.8 |16.5 |31.2 |1100 |
|resnet101.gluon_in1k|224 |79.31|94.53|44.6 |7.8 |16.2 |2125 |
|resnetblur50.bt_in1k|224 |79.31|94.63|25.6 |5.2 |12.0 |2524 |
|resnet50.a1h_in1k|176 |79.27|94.49|25.6 |2.6 |6.9 |5404 |
|resnext50_32x4d.a3_in1k|224 |79.25|94.31|25.0 |4.3 |14.4 |2931 |
|resnet50.fb_ssl_yfcc100m_ft_in1k|224 |79.22|94.84|25.6 |4.1 |11.1 |3451 |
|resnet33ts.ra2_in1k|256 |79.21|94.56|19.7 |4.8 |11.7 |3392 |
|resnet50d.gluon_in1k|224 |79.07|94.48|25.6 |4.4 |11.9 |3162 |
|resnet50.ram_in1k|224 |79.03|94.38|25.6 |4.1 |11.1 |3453 |
|resnet50.am_in1k|224 |79.01|94.39|25.6 |4.1 |11.1 |3461 |
|resnet32ts.ra2_in1k|256 |79.01|94.37|18.0 |4.6 |11.6 |3440 |
|ecaresnet26t.ra2_in1k|256 |78.9 |94.54|16.0 |3.4 |10.5 |3421 |
|resnet152.a3_in1k|160 |78.89|94.11|60.2 |5.9 |11.5 |2745 |
|wide_resnet101_2.tv_in1k|224 |78.84|94.28|126.9 |22.8 |21.2 |1079 |
|seresnext26d_32x4d.bt_in1k|288 |78.83|94.24|16.8 |4.5 |16.8 |2251 |
|resnet50.ra_in1k|224 |78.81|94.32|25.6 |4.1 |11.1 |3454 |
|seresnext26t_32x4d.bt_in1k|288 |78.74|94.33|16.8 |4.5 |16.7 |2264 |
|resnet50s.gluon_in1k|224 |78.72|94.23|25.7 |5.5 |13.5 |2796 |
|resnet50d.a3_in1k|224 |78.71|94.24|25.6 |4.4 |11.9 |3154 |
|wide_resnet50_2.tv_in1k|224 |78.47|94.09|68.9 |11.4 |14.4 |1934 |
|resnet50.bt_in1k|224 |78.46|94.27|25.6 |4.1 |11.1 |3454 |
|resnet34d.ra2_in1k|288 |78.43|94.35|21.8 |6.5 |7.5 |3291 |
|gcresnext26ts.ch_in1k|288 |78.42|94.04|10.5 |3.1 |13.3 |3226 |
|resnet26t.ra2_in1k|320 |78.33|94.13|16.0 |5.2 |16.4 |2391 |
|resnet152.tv_in1k|224 |78.32|94.04|60.2 |11.6 |22.6 |1487 |
|seresnext26ts.ch_in1k|288 |78.28|94.1 |10.4 |3.1 |13.3 |3062 |
|bat_resnext26ts.ch_in1k|256 |78.25|94.1 |10.7 |2.5 |12.5 |3393 |
|resnet50.a3_in1k|224 |78.06|93.78|25.6 |4.1 |11.1 |3450 |
|resnet50c.gluon_in1k|224 |78.0 |93.99|25.6 |4.4 |11.9 |3286 |
|eca_resnext26ts.ch_in1k|288 |78.0 |93.91|10.3 |3.1 |13.3 |3297 |
|seresnext26t_32x4d.bt_in1k|224 |77.98|93.75|16.8 |2.7 |10.1 |3841 |
|resnet34.a1_in1k|288 |77.92|93.77|21.8 |6.1 |6.2 |3609 |
|resnet101.a3_in1k|160 |77.88|93.71|44.6 |4.0 |8.3 |3926 |
|resnet26t.ra2_in1k|256 |77.87|93.84|16.0 |3.4 |10.5 |3772 |
|seresnext26ts.ch_in1k|256 |77.86|93.79|10.4 |2.4 |10.5 |4263 |
|resnetrs50.tf_in1k|160 |77.82|93.81|35.7 |2.3 |6.2 |5238 |
|gcresnext26ts.ch_in1k|256 |77.81|93.82|10.5 |2.4 |10.5 |4183 |
|ecaresnet50t.a3_in1k|160 |77.79|93.6 |25.6 |2.2 |6.0 |5329 |
|resnext50_32x4d.a3_in1k|160 |77.73|93.32|25.0 |2.2 |7.4 |5576 |
|resnext50_32x4d.tv_in1k|224 |77.61|93.7 |25.0 |4.3 |14.4 |2944 |
|seresnext26d_32x4d.bt_in1k|224 |77.59|93.61|16.8 |2.7 |10.2 |3807 |
|resnet50.gluon_in1k|224 |77.58|93.72|25.6 |4.1 |11.1 |3455 |
|eca_resnext26ts.ch_in1k|256 |77.44|93.56|10.3 |2.4 |10.5 |4284 |
|resnet26d.bt_in1k|288 |77.41|93.63|16.0 |4.3 |13.5 |2907 |
|resnet101.tv_in1k|224 |77.38|93.54|44.6 |7.8 |16.2 |2125 |
|resnet50d.a3_in1k|160 |77.22|93.27|25.6 |2.2 |6.1 |5982 |
|resnext26ts.ra2_in1k|288 |77.17|93.47|10.3 |3.1 |13.3 |3392 |
|resnet34.a2_in1k|288 |77.15|93.27|21.8 |6.1 |6.2 |3615 |
|resnet34d.ra2_in1k|224 |77.1 |93.37|21.8 |3.9 |4.5 |5436 |
|seresnet50.a3_in1k|224 |77.02|93.07|28.1 |4.1 |11.1 |2952 |
|resnext26ts.ra2_in1k|256 |76.78|93.13|10.3 |2.4 |10.5 |4410 |
|resnet26d.bt_in1k|224 |76.7 |93.17|16.0 |2.6 |8.2 |4859 |
|resnet34.bt_in1k|288 |76.5 |93.35|21.8 |6.1 |6.2 |3617 |
|resnet34.a1_in1k|224 |76.42|92.87|21.8 |3.7 |3.7 |5984 |
|resnet26.bt_in1k|288 |76.35|93.18|16.0 |3.9 |12.2 |3331 |
|resnet50.tv_in1k|224 |76.13|92.86|25.6 |4.1 |11.1 |3457 |
|resnet50.a3_in1k|160 |75.96|92.5 |25.6 |2.1 |5.7 |6490 |
|resnet34.a2_in1k|224 |75.52|92.44|21.8 |3.7 |3.7 |5991 |
|resnet26.bt_in1k|224 |75.3 |92.58|16.0 |2.4 |7.4 |5583 |
|resnet34.bt_in1k|224 |75.16|92.18|21.8 |3.7 |3.7 |5994 |
|seresnet50.a3_in1k|160 |75.1 |92.08|28.1 |2.1 |5.7 |5513 |
|resnet34.gluon_in1k|224 |74.57|91.98|21.8 |3.7 |3.7 |5984 |
|resnet18d.ra2_in1k|288 |73.81|91.83|11.7 |3.4 |5.4 |5196 |
|resnet34.tv_in1k|224 |73.32|91.42|21.8 |3.7 |3.7 |5979 |
|resnet18.fb_swsl_ig1b_ft_in1k|224 |73.28|91.73|11.7 |1.8 |2.5 |10213 |
|resnet18.a1_in1k|288 |73.16|91.03|11.7 |3.0 |4.1 |6050 |
|resnet34.a3_in1k|224 |72.98|91.11|21.8 |3.7 |3.7 |5967 |
|resnet18.fb_ssl_yfcc100m_ft_in1k|224 |72.6 |91.42|11.7 |1.8 |2.5 |10213 |
|resnet18.a2_in1k|288 |72.37|90.59|11.7 |3.0 |4.1 |6051 |
|resnet14t.c3_in1k|224 |72.26|90.31|10.1 |1.7 |5.8 |7026 |
|resnet18d.ra2_in1k|224 |72.26|90.68|11.7 |2.1 |3.3 |8707 |
|resnet18.a1_in1k|224 |71.49|90.07|11.7 |1.8 |2.5 |10187 |
|resnet14t.c3_in1k|176 |71.31|89.69|10.1 |1.1 |3.6 |10970 |
|resnet18.gluon_in1k|224 |70.84|89.76|11.7 |1.8 |2.5 |10210 |
|resnet18.a2_in1k|224 |70.64|89.47|11.7 |1.8 |2.5 |10194 |
|resnet34.a3_in1k|160 |70.56|89.52|21.8 |1.9 |1.9 |10737 |
|resnet18.tv_in1k|224 |69.76|89.07|11.7 |1.8 |2.5 |10205 |
|resnet10t.c3_in1k|224 |68.34|88.03|5.4 |1.1 |2.4 |13079 |
|resnet18.a3_in1k|224 |68.25|88.17|11.7 |1.8 |2.5 |10167 |
|resnet10t.c3_in1k|176 |66.71|86.96|5.4 |0.7 |1.5 |20327 |
|resnet18.a3_in1k|160 |65.66|86.26|11.7 |0.9 |1.3 |18229 |
## Citation", + "model_explanation_gemini": "A Wide-ResNet-B model trained for image classification on ImageNet-1k using RandAugment and RMSProp optimization."
+} \ No newline at end of file diff --git a/data/model_data_json/tjohn327_scion-snowflake-arctic-embed-s-v2.json b/data/model_data_json/tjohn327_scion-snowflake-arctic-embed-s-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..d5793a9a25a5e23bf64776f409f146045341dc1f --- /dev/null +++ b/data/model_data_json/tjohn327_scion-snowflake-arctic-embed-s-v2.json @@ -0,0 +1,25 @@ +{ + "model_id": "tjohn327/scion-snowflake-arctic-embed-s-v2", + "downloads": 91556, + "tags": [ + "sentence-transformers", + "safetensors", + "bert", + "sentence-similarity", + "feature-extraction", + "generated_from_trainer", + "dataset_size:4321", + "loss:MultipleNegativesRankingLoss", + "arxiv:1908.10084", + "arxiv:1705.00652", + "base_model:Snowflake/snowflake-arctic-embed-s", + "base_model:finetune:Snowflake/snowflake-arctic-embed-s", + "model-index", + "autotrain_compatible", + "text-embeddings-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:4321 - loss:MultipleNegativesRankingLoss base_model: Snowflake/snowflake-arctic-embed-s widget: - source_sentence: What are \"Authoritative ASes\" and their roles relate to TRC? sentences: - 'Research paper detailing the architecture and implementation of a P4-based SCION border router. Explains SCION''s ISD and PCFS concepts in Section 2.1 and how routers use hop fields (HFs) with IFIDs for forwarding. Introduces a modular design with a \"bridge header,\" separating cryptographic validation from forwarding, addressing Tofino''s lack of native cryptographic support. Presents two configurations 1BR+2AES using three pipelines, and 1BR+1AES using only two by recirculating packets, details how AES implementation is deployed and that key expansion is done in the control plane. Lars-Christian Schulz et al.. \"Cryptographic Path Validation for SCION in P4.\" *Proceedings of the 6th on European P4 Workshop*, 2023. research paper 2 EuroP4 ’23, December 8, 2023, Paris, France Lars-Christian Schulz, Robin Wehner, and David Hausheer compare it to other existing implementations. Finally, we conclude this paper and give a brief outlook on future work. 2 BACKGROUND In this section, we briefly describe the architecture of the SCION Internet and the Intel Tofino 2 switch. 2.1 SCION SCION is a path-aware Internet protocol. It introduces Isolation Domains (ISDs) as groups of ASes sharing a common jurisdiction. SCION is path-aware, i.e., end hosts can choose from available forwarding paths and encode the desired one in the SCION header as what is known as packet-carried forwarding state (PCFS). Hence, the SCION data plane does not rely on longest prefix matching to determine the next hop router. Instead, SCION routers examine the hop fields (HF) in the SCION header which directly encode the AS-level path by means of interface IDs (IFIDs). Each AS can uniquely map its IFIDs to a neighbor and even a certain link in case there are multiple links to this neighbor. Together with the source AS, the chain of ingress and egress IFIDs uniquely describes a SCION path. The hop fields are cryptographically signed by the AS corresponding to the hop with an AES-CMAC truncated to 6 bytes. To avoid forgery of HFs, SCION border routers must check the CMAC of every HF they use to make a forwarding decision. Packets with invalid HFs should be dropped.
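To make the check just described concrete, here is a minimal sketch of a truncated AES-CMAC hop-field validation in Python, using the pyca/cryptography package. The 16-byte pseudo-header and the key are placeholders, not SCION's exact wire format:

```python
import hmac
from cryptography.hazmat.primitives.cmac import CMAC
from cryptography.hazmat.primitives.ciphers import algorithms

def hop_field_mac(as_key: bytes, pseudo_header: bytes) -> bytes:
    # The pseudo-header is one 128-bit AES block, so a single AES
    # invocation suffices (plus the CMAC subkey derivation).
    assert len(pseudo_header) == 16
    c = CMAC(algorithms.AES(as_key))
    c.update(pseudo_header)
    return c.finalize()[:6]  # SCION truncates the tag to 6 bytes

def verify_hop_field(as_key: bytes, pseudo_header: bytes, hf_auth: bytes) -> bool:
    # Border routers drop packets whose hop field fails this check.
    return hmac.compare_digest(hop_field_mac(as_key, pseudo_header), hf_auth)
```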
In most cases, a HF corresponds to a specific border router, requiring each of them to only validate a single HF. Hop fields are grouped into segments resulting in a special case where a border router has to check two HFs when the path switches from one segment to another and the AS ingress and egress router happen to be the same device. The AES-CMAC is calculated over a 128 bit pseudo-header. As this matches up with the block size of the AES cipher, a single round of AES encryption is sufficient to generate the authentication tag, excluding the subkey derivation AES-CMAC calls for. A precise description of the AES-CMAC algorithm is available in the corresponding RFC [15]. AES-128 is widely supported in commodity server hardware, making HF checks much faster than lookups in Internet-scale IP routing tables [3]. However, the switching ASICs used in hardware routers designed over decades to efficiently forward IP traffic do not include AES in their forwarding logic. Fortunately, recent P4-programmable switches have sufficient match-action stages to implement AES in standard P4 [4]. For more information on SCION we refer to the corresponding literature [3, 5, 19]. 2.2 Tofino Architecture We develop our SCION border router for Intel Tofino 2 switches. The P4 programmable Tofino architecture is an embodiment of the Protocol Independent Switching Architecture (PISA) data plane model. PISA switches contain three major types of programmable components: parsers, deparsers, and match-action units (MAUs). In the Tofino architecture, switch pipes consist of an in- and an egress pipeline each containing its own parser, MAUs and deparser [18]. Each switch pipe is hardwired to a set of, in case of Tofino 2, 8x 400G Ethernet ports [1]. The number of operations that can be performed per pipeline is limited. If a program exhausts the resources of one pipeline, the programmer can recirculate packets in order to process them iteratively. If a packet is diverted to a different pipeline and recirculated there, there is the option to process the same packet sequentially with different P4 programs as each pipeline can be programmed independently. This is the key to fit the SCION border router in a Tofino 2 switch as described in Section 5.1. 3 RELATED WORK The SCION reference border router is implemented in Go [2] and uses regular IP/UDP sockets for packet I/O. Although being multi-threaded, the reference border router is not suitable for high traffic volume. Schulz et al. have proposed a BPF implementation of SCION packet forwarding [14] which achieves a throughput of 0.676 Mpps per core within a virtual machine test environment. However, the BPF data path has not been integrated in the reference border router yet. A commercial DPDK-based SCION router software is available from Anapaya Systems [17], but to our knowledge no production-ready SCION routers exist in hardware. The first attempt at a hardware implementation of SCION was made by Součková, targeting a NetFPGA SUME development board programmable in P4 [16]. The full 10 Gbit/s line rate of the development platform has been achieved in experiments. However, the SCION packet parser and cryptographic validation circuitry did not fit in the FPGA at the same time due to inefficient workarounds that had to be taken to handle SCION’s non-standard header layout. Nevertheless, the project led to improvements to SCION’s header layout making it more suitable for high-speed processing.
A first implementation of SCION for Tofino 1 was presented by de Ruiter et al. [7] being capable of processing packets at 100 Gbit/s line rate. However, as Tofino does not support cryptographic operations in hardware, the AES-CMAC hop field validation in de Ruiter’s approach relies on a pre-populated table of valid hop fields. This simplification works as current SCION deployments change validation keys infrequently. An unfortunate consequence of this design is that the SCION router is no longer stateless and instead has to communicate with the path discovery and registration services of the AS to obtain valid hop fields. Furthermore, the lookup-table solution also prevents the deployment of the SCION extensions EPIC [11] and Colibri [9] which rely on MACs that do not just change per-path, but per-packet. Nevertheless, the P4 code published by de Ruiter et al. inspired our work and is incorporated in our implementation. Chen has shown that it is possible to implement an AES encryption in a Tofino 1 switch using so called scrambled lookup tables [4]. Their implementation was limited to an encryption throughput of 10.92 Gbit/s due to limited recirculation capacity. Our work addresses the issues encountered by Součková and de Ruiter et al. We implement the SCION packet parsing and validation logic separately in different pipelines of a Tofino 2 switch in order to bridge the gap between SCION’s requirements and achieving line-rate throughput. We furthermore develop an approach to AES in P4 that takes full advantage of the resources provided by Tofino 2 realizing the first 400G line-rate packet validator for SCION. 18 ' - 'Book excerpt providing an overview of LightningFilter operation. It keeps AS-level aggregates and stores long-term traffic profiles for traffic shaping. Describes a process for rate-limiting based on these, and prediction to account for recent traffic. Emphasizes prevention of source address spoofing and replay attacks using DRKey (§3.2), SPAO (§3.3), and replay suppression modules. Differentiates authenticated traffic vs. best-effort approach pipelines. Laurent Chuat et al.. *The Complete Guide to SCION. From Design Principles to Formal Verification*. Springer International Publishing AG, 2022. book 229 9.2 High-Speed Traffic Filtering with LightningFilter 9.2.1.2 Design Goals LightningFilter is designed to achieve the following objectives: • Guaranteed access for legitimate users within traffic profile: The system must ensure that a client in a non-compromised domain (i.e., a domain without an adversary) has a guarantee to reach a target domain even in the presence of adversaries in other domains. We define a traffic profile as a sequence of measurements over a specific period of time (profiling window) on a per-flow basis (flow count). As long as the traffic of a flow is within such a traffic profile, its packets are guaranteed to be processed.4 • Enabling traditional firewalls to filter packets using metadata: The system should enable traditional firewalls to employ meaningful rule-based packet filtering using packet metadata (such as the 5-tuple in the packet header). Without LightningFilter, these filtering rules can be circumvented by spoofing attacks due to the lack of authentication. • Elimination of collateral damage across domains: The system should guarantee that compromised domains cannot introduce collateral damage on non-compromised domains by consuming all available resources.
Legitimate clients within a compromised domain, however, may be affected by an adversary consuming excessive resources at a target domain. This provides an incentive for domain owners to eliminate attack traffic sent by their end hosts. • Non-goal: Guaranteed traffic delivery to the domain is not a goal of this system, but can be achieved by a complementary system in SCION. 9.2.2 Overview of LightningFilter Considering our threat model, the adversary’s goal is to consume all available processing resources to prevent legitimate clients from reaching a target service, e.g., by sending an excessive number of requests. To prevent a single entity from achieving this goal, the available processing resources should be subdivided and distributed among all clients. However, allocating an equal share of resources to each entity inhibits high utilization and potentially punishes benign traffic. As a consequence, researchers have suggested the use of more dynamic approaches, such as history-based filtering [213, 407] or binning of requests [470]. The potentially huge number of clients poses a challenge to the former approaches, as storing a traffic history (e.g., packet counters) per client is impractical. Instead, we propose to aggregate and store traffic profiles at the level of domains, i.e., ASes. These traffic profiles denote a sequence (Footnote 4: The replay-suppression system causes a negligible number of packets to be dropped due to false positives; however, end hosts must be able to handle packet loss anyway.) 209 '
- "Technical document on SCION CP-PKI trust model and terminology specification. Defines terms like base TRC, TRC signing ceremony, TRC update (regular/sensitive), voting ASes, voting quorum, grace period, trust reset. Explains SCION's trust model with Isolation Domains addressing limitations of monopoly/oligopoly PKI models. Mentions trust agility/resilience, multilateral governance, policy versioning, and lack of IP prefix origin validation by design in contrast to RPKI. specification
*Authoritative AS*: Authoritative ASes are those ASes in an ISD that always have the latest TRCs of the ISD. As a consequence, authoritative ASes also start the announcement of a TRC update.
*Base TRC*: A base TRC is a trust root configuration (TRC) that other parties trust axiomatically. In other words, trust for a base TRC is assumed, not derived from another cryptographic object. Each ISD MUST create and sign a base TRC when the ISD is established. A base TRC is either the first TRC of the ISD or the result of a trust reset.
*TRC Signing Ceremony*: The ceremony during which the very first base TRC of an ISD, called the initial TRC, is signed. The initial TRC is a special case of the base TRC where the number of the ISD is assigned.
*TRC Update*: A _regular_ TRC update is a periodic re-issuance of the TRC where the entities and policies listed in the TRC remain unchanged. A _sensitive_ TRC update is an update that modifies critical aspects of the TRC, such as the set of core ASes. In both cases, the base TRC remains unchanged.
*Voting ASes*: Those ASes within an ISD that may sign TRC updates. The process of appending a signature to a new TRC is called "casting a vote".
*Voting Quorum*: The voting quorum is a trust root configuration (TRC) field that indicates the number of votes (signatures) needed on a successor TRC for it to be verifiable. A voting quorum greater than one will thus prevent a single entity from creating a malicious TRC update.
*Grace Period*: The grace period is an interval during which the previous version of a trust root configuration (TRC) is still considered active after a new version has been published.
*Trust Reset*: A trust reset is the action of announcing a new base TRC for an existing ISD. A trust reset SHOULD only be triggered after a catastrophic event involving the loss or compromise of several important private keys.
1.2. Conventions and Definitions: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
1.3. Trust Model: Given the diverse nature of the constituents in the current Internet, an important challenge is how to scale authentication of network elements (such as AS ownership, hop-by-hop routing information, name servers for DNS, and domains for TLS) to the global environment. The roots of trust of currently prevalent public key infrastructure (PKI) models do not scale well to a global environment because (1) mutually distrustful parties cannot agree on a single trust root (monopoly model), and because (2) the security of a plethora of roots of trust is only as strong as its weakest link (oligopoly model) - see also [BARRERA17].
The monopoly model suffers from two main drawbacks: First, all parties must agree on a single root of trust. Secondly, the single root of trust represents a single point of failure, the misuse of which enables the forging of certificates. Its revocation can also result in a kill switch for all the entities it certifies.
The oligopoly model relies on several roots of trust, all equally and completely trusted. However, this is not automatically better: whereas the monopoly model has a single point of failure, the oligopoly model has the drawback of exposing more than one point of failure.
Thus, there is a need for a trust architecture that supports meaningful trust roots in a global environment with inherently distrustful parties. This new trust architecture should provide the following properties:
* Trust agility (see further below);
* Resilience to single root of trust compromise;
* Multilateral governance; and
* Support for policy versioning and updates.
Ideally, the trust architecture allows parties that mutually trust each other to form their own trust "union" or "domain", and to freely decide whether to trust other trust unions (domains) outside their own trust bubble."
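The voting-quorum rule in the CP-PKI excerpt above reduces to a simple set check. A sketch under assumed field names (a real verifier would, of course, also validate each vote's signature cryptographically):

```python
def trc_update_verifiable(votes: set[str], voting_ases: set[str], voting_quorum: int) -> bool:
    # Only votes cast by the ISD's designated voting ASes count toward the quorum.
    valid_votes = votes & voting_ases
    # A quorum greater than one prevents a single entity from pushing a malicious update.
    return len(valid_votes) >= voting_quorum
```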
- source_sentence: What are the challenges of deploying INT on multi-operator networks like the Internet sentences: - 'Book chapter excerpt, ("SBAS," Section "Secure Route Redistribution"). Details SBAS's internal full-mesh topology among PoPs using SCION and encrypted iBGP sessions. Introduces three address categories: secure (customer/SBAS-owned), internal (PoP communication), and global (other routable addresses). Laurent Chuat et al.. *The Complete Guide to SCION. From Design Principles to Formal Verification*. Springer International Publishing AG, 2022. book 368 13 Deployment and Operation architectural abstraction of the underlying infrastructure and provide an interface to customers. End-to-End Security. In the context of mediating customers’ IP endpoints via a secure backbone, the end-to-end communication path can be segmented into an external (insecure) segment, which is comprised of the Internet links between an IP endpoint and the SBAS ingress/egress point, and an internal segment between an arbitrary ingress and egress pair of the secure routing infrastructure. Therefore, to ensure end-to-end secure routing, the following conditions must hold: (1) Customers must be able to select trusted ingress/egress points and securely exchange packets with hijack resilience; and (2) the secure backbone must deliver the security properties it promised to any pairs of ingress/egress points even in the presence of internal adversaries. Routing Priority. To enable customers to route traffic from/to the Internet through a secure backbone, SBAS must disseminate the customers’ prefix announcements to all other customers and external entities. Prefixes will then be announced via SBAS and the Internet, resulting in competing announcements. To maximize the ability to route securely, SBAS must be able to convince the entities receiving the announcements to prioritize routing paths through the secure backbone over the insecure Internet paths. 13.5.3.2 Secure Route Redistribution The internal structure of SBAS can be abstracted to a full-mesh topology between the PoPs, which communicate over SCION. Over these connections, the PoPs redistribute announcements from SBAS customers as well as the Internet, akin to the operation of iBGP in a regular AS. To prevent tampering by non-PoP members, the iBGP sessions run over an encrypted and authenticated connection (such as a VPN tunnel). SBAS offers a high degree of flexibility to its customers through support for dynamic route redistribution. Contrary to a traditional AS, which is controlled by a single entity, the redistribution scheme to be used in SBAS must support its federated structure and remain secure in the presence of malicious members. In the following, we describe the design and security aspects of the route redistribution mechanism. The system distinguishes between three categories of addresses: • Secure addresses: This includes prefixes announced by SBAS customers and SBAS-owned address spaces, which are assigned to customers. Secure address spaces are announced publicly at egress points via BGP. • Internal addresses: In order to provide an internal addressing scheme among PoPs, e.g., to set up iBGP sessions between PoP routers, the PoPs 348 '
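The three-way address split that SBAS's route redistribution distinguishes can be illustrated in a few lines of Python; the prefixes below are illustrative placeholders for what a PoP would learn from customer announcements and its internal configuration:

```python
import ipaddress

SECURE = [ipaddress.ip_network("203.0.113.0/24")]   # customer- or SBAS-owned space
INTERNAL = [ipaddress.ip_network("10.100.0.0/16")]  # PoP-to-PoP addressing (e.g., iBGP)

def classify(address: str) -> str:
    ip = ipaddress.ip_address(address)
    if any(ip in net for net in SECURE):
        return "secure"    # announced publicly via BGP at egress points
    if any(ip in net for net in INTERNAL):
        return "internal"  # used only inside the secure backbone
    return "global"        # any other routable address
```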
- 'Research paper titled "ID-INT: Secure Inter-Domain In-Band Telemetry" proposing ID-INT, a SCION extension for secure, authenticated in-band telemetry. Leverages SCION's PKI and DRKey for data plane authentication, enabling applications like intra-AS path tracing, congestion control, and carbon-aware routing. Implemented in the SCION stack with an AS-hosted telemetry collector. Evaluation shows minimal performance impact on routers with authentication-only mode and up to a 13% throughput decrease with encryption. Lars-Christian Schulz et al.. "ID-INT: Secure Inter-Domain In-Band Telemetry." *2024 20th International Conference on Network and Service Management (CNSM)*, 2024. research paper 1 ID-INT: Secure Inter-Domain In-Band Telemetry Lars-Christian Schulz OVGU Magdeburg Magdeburg, Germany lschulz@ovgu.de David Hausheer OVGU Magdeburg Magdeburg, Germany hausheer@ovgu.de Abstract—In-band network telemetry (INT) is a powerful tool for gathering status information from network components in a distributed and timely way. Until now, INT has mostly been deployed in data center environments or single operator WANs, because it lacks mechanisms for authentication and is not widely standardized. SCION is a novel, path-based Internet architecture providing strong resilience and security properties. In this paper, we propose Inter-domain In-band Network Telemetry (ID-INT) as a protocol extension for SCION. ID-INT leverages SCION’s public key infrastructure to authenticate telemetry data while augmenting SCION’s end host path control with real-time network information. Promising applications of ID-INT include intra-AS path tracing, congestion control, SLA verification, and carbon-aware routing. We implement ID-INT in the open-source SCION stack and provide a proof of concept for an AS-hosted telemetry collection service. We show that cryptographically authenticated ID-INT can be fully implemented in the SCION router’s fast-path with no measurable impact on router performance. If optional encryption is employed in addition to authentication, router throughput drops by no more than 13% even if every packet carries telemetry. Index Terms—In-band Network Telemetry, SCION, WAN I. INTRODUCTION Network monitoring and measurement is an integral part of any network operator’s toolkit. In order to meet the demands of modern real-time applications, constant monitoring of the network’s status and performance is required. Traditionally, networks have been monitored through active measurements using probe packets, e.g., using the well-known ping and traceroute commands, or through passive traffic monitoring at routers. Passive monitoring is usually employing sampling techniques as observing every single packet is costly. With the advent of SDN, programmable data planes, and P4, a new network monitoring paradigm emerged in the form of push-based network telemetry. Telemetry-enabled devices push network measurements to a central controller, instead of waiting for the controller to poll monitoring data. Fully programmable network devices like Intel’s Tofino [1] make it possible to push telemetry one step further by offloading the collection of telemetry metadata entirely to the data plane. Noticeably, the INT specification [2] was developed as a standardized way to exchange telemetry information between network entities. The INT framework is related to a number of earlier systems all based around the idea of embedding telemetry instructions and in some cases metadata as well in packet headers [3], [4]. INT has in turn inspired research on advanced in-band telemetry protocols like ML-INT for optical networks [5] and probabilistic approaches like PINT [6]. All these systems have in common that they can only be deployed in networks under shared administrative control. Additionally, security and privacy aspects have largely been ignored, precluding Internet-wide deployment. The SCION Internet architecture [7] has been developed to address the lack of security-by-design in today’s Internet based on the Border Gateway Protocol (BGP). BGP’s design limitations have caused numerous outages. SCION provides a public key infrastructure for authenticating network entities and allows multiple roots of trust to coexist. Another core feature of SCION is that it is a path-based routing protocol. End hosts include the AS-level forwarding path in packet headers to eliminate uncertainties in traditional routing. The same property also allows end hosts to send traffic to a specific destination over multiple parallel paths to increase reliability and aggregate bandwidth. SCION has been successfully deployed in both research [8] and commercial networks [9] and already reaches hundreds of thousands of devices. A challenge of the end host routing approach is to provide sufficient information for making routing decisions to hosts. Current solutions (cf. [10]–[12]) are based on control plane messages and cannot provide real-time feedback from routers to hosts. Therefore, SCION path selection is mostly based on end-to-end measurements, which become challenging as the number of available paths grows with the number of SCION ASes. In order to address the absence of real-time telemetry in SCION and INT’s lack of an authentication infrastructure and inter-operator compatibility, we introduce Inter-Domain In-band Network Telemetry (ID-INT). ID-INT relies on SCION’s Dynamically Recreatable Key (DRKey) system to provide efficient message authentication in the data plane and in turn allows SCION end hosts to make informed routing decisions. This work is structured as follows: We continue with a brief description of SCION in section II and provide an overview of related work in and outside the SCION ecosystem in section III. ID-INT’s design is presented in section IV. section V provides details on our prototype implementation which we evaluate for throughput and overhead in section VI, before we discuss potential extensions to the protocol in section VII. Finally, section VIII gives an outlook on a wide range of applications, while section IX concludes this paper. 2024 20th International Conference on Network and Service Management (CNSM) 978-3-903176-66-9 ©2024 IFIP '
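The "authenticated telemetry" idea in the ID-INT excerpt can be made concrete with a toy sketch of a router appending a MAC'd metadata record; the record layout, HMAC choice, and tag truncation are illustrative assumptions, not ID-INT's actual header format:

```python
import hashlib
import hmac
import struct

def append_telemetry(stack: list[bytes], drkey: bytes, hop_latency_us: int, queue_depth: int) -> None:
    # Pack a toy metadata record and authenticate it with a DRKey-derived key.
    record = struct.pack("!IH", hop_latency_us, queue_depth)
    tag = hmac.new(drkey, record, hashlib.sha256).digest()[:6]  # truncated tag
    stack.append(record + tag)

def verify_entry(entry: bytes, drkey: bytes) -> bool:
    record, tag = entry[:-6], entry[-6:]
    expected = hmac.new(drkey, record, hashlib.sha256).digest()[:6]
    return hmac.compare_digest(expected, tag)
```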
Describes \"passive\" mechanism behavior: initial path selection by lowest latency with increasing sending rate, switching when significant loss occurs. Pascal Marc Suter. *Traffic Engineering in SCION: The impact of end host QoS mechanisms on network performance*. Master''s thesis, ETH Zurich, 2023. research paper 44 5.2. Implementation details Table 5.1: Sending rates considered by other works and chosen bitrates in Mbps. Title Low Medium High Can you see me now?: a measurement study of Zoom, Webex, and Meet [54] 0.5 - 1 2.5 - 2.6 Zoom Session Quality: A Network- Level View [55] 1 - 1.5 3 - 6 Measuring the performance and net- work utilization of popular video con- ferencing applications [21] 0.8 - 1.9 Chosen bitrates 0.7 1.5 5 The scoring functions differ between the mechanisms. For the ’naive’ and ’shortest path’ mechanisms, the application will select the path at the begin- ning. ’Naive’ chooses uniformly at random from all available paths while ’shortest path’ chooses uniformly at random from the subset of the shortest paths, i.e.,the paths with the fewest hops or fewest ASes in it. Shortest path does not necessarily mean paths with the lowest latency but paths with the fewest hops. The selected path gets a high score and all others a low score. The score is set to low score when the sending rate is higher or equal than previously and there was loss previously except for low sending rates. This gives them the behavior of starting at a low sending rate, increasing when no loss is detected and decreasing when it is, mirroring the functionality of ABR. These two mechanisms do not require any probing. The ’passive’ mechanism uses latency only probing. The core of its scoring function is the score function defined in Equation 4.4. That function scores the QoE for VCAs and as the mechanisms are supposed to optimize the quality, it is a good starting point. However, early testing showed that this is too accepting of loss, only changing paths or sending rate after 10% of loss occurs. After 10% the score drops significantly and to avoid that, the scoring function used internally by the mechanisms has a lower threshold. The internal score function is given by replacing Equation 4.2 with penalty loss = ( 5000 ∗ loss if loss < 0.05 104 ∗ loss else (5.2) It punishes loss more; this is to get a tradeoff between optimizing for QoE and limiting congestion. There are some more modifications for the implementation. The loss on a path is only known when traffic was sent, otherwise it will be assumed zero. Additionally, the ’passive’ mechanism also performs a sending rate selection similar to ’naive’ and ’shortest path’. When sending over a new path, i.e., a path that was not sent over since the last probing and for which 37 ' - source_sentence: What is the default output of the command sentences: - \"Research paper section on a Security Analysis of PILA. Addresses potential MitM\\ \\ attacks, downgrade attacks, and key compromises. Describes how PILA prevents\\ \\ or mitigates these attacks, local responder-side attackers, Responder-Side NAT\\ \\ attackers, and details how key compromises can be detected and their impact\\ \\ is limited.\\n Cyrill Krähenbühl et al.. \\\"Ubiquitous Secure Communication\\ \\ in a Future Internet Architecture.\\\" SN Computer Science, vol. 3, no. 5, 2022,\\ \\ pp. . 
- source_sentence: What is the default output of the command sentences: - "Research paper section on a Security Analysis of PILA. Addresses potential MitM attacks, downgrade attacks, and key compromises. Describes how PILA prevents or mitigates these attacks, local responder-side attackers, Responder-Side NAT attackers, and details how key compromises can be detected and their impact is limited. Cyrill Krähenbühl et al.. "Ubiquitous Secure Communication in a Future Internet Architecture." SN Computer Science, vol. 3, no. 5, 2022. research paper 9
responder can query either the certificate service or the local NAT, see "NAT Devices", and check for duplicate certificates for its identifiers. Responder-Side NAT or AS Attacker. A malicious AS or a malicious NAT device on the responder side cannot immediately be detected. They do, however, create irrefutable cryptographic proof of misbehavior in the form of conflicting end-host certificates valid at the same point in time. These certificates can be stored locally or published on an append-only log server and later be compared through an out-of-band channel or audited by another entity. Other Attackers. Other entities, such as a malicious AS or NAT device on the initiator's side or an attacker in the initiator's local network, cannot perform an MitM attack, since they cannot forge valid responder certificates. Downgrade Attacks. In this section, we analyze the three downgrade prevention approaches explained in Downgrade Prevention. In a downgrade attack, an attacker attempts to convince the initiator connecting to an unknown responder that the responder's AS does not support PILA or that the responder does not allow the desired PILA-supported protocol. However, care must be taken that the downgrade prevention approaches do not introduce an additional DoS vector where a non-PILA-enabled end-host is prevented from communicating with a PILA-enabled end-host. Signature-Based and Log-Based Approaches. Both the signature-based ("Signature-based Approach") and log-based ("Log-based Approach") approaches prevent downgrade attacks, since an attacker is not able to forge valid signatures for bogus statements which claim that a PILA-enabled end-host does not support PILA. Replaying a (potentially different) out-of-date statement is prevented by the time stamps within the statements and due to the assumption of time synchronization (see 3). For the same reason, an attacker cannot use an out-of-date statement which claims that a non-PILA-enabled host supports PILA as a DoS vector, since this statement will be rejected by the relying end-host. Self-verifiable Approaches. We separate between the two self-verifiable address approaches explained in Self-Verifiable Approach: address range reservation and IPv6 address encoding. If an AS reserves an IP address range for PILA-enabled traffic, then an attacker can neither downgrade (since the relying end-host can locally check whether the remote end-host is within the IP address range) nor use it as a DoS vector (since only PILA-enabled end-hosts are assigned to this IP address range). For the self-verifiable IPv6 address encoding approach, an attacker cannot perform a downgrade attack since the two communicating end hosts will perform the same deterministic computation to verify whether the end-host has encoded PILA support in the IP address. Regarding a potential DoS vector, we consider two attackers: an on-path attacker which can and an on-path attacker which cannot influence the network prefix of the IPv6 address of an end-host. We assume the worst case, where the attacker can predict the device address that will be chosen by the end-host. The attacker's goal is to make the non-PILA-enabled end-host choose an IPv6 address that indicates PILA support. • If the attacker cannot influence the network prefix and thus cannot impact the final IPv6 address chosen by the non-PILA-enabled end-host, the probability of a DoS for the non-PILA-enabled end host remains unchanged from the case without any attacker (2^-32). • If the attacker can influence the network prefix and predict the device address, then the attacker could potentially fabricate a network prefix, such that there is a hash collision on the leftmost 32 bit of the device address. This would prevent the non-PILA-enabled end-host from communicating with a PILA-enabled end-host. However, it is very likely that an attacker with the capability of controlling the routing within the AS can simply drop unwanted traffic, which is in comparison a much stronger and more effective attack. Private Key Compromise. The severity of a compromised private key depends on the entity and the lifetime of the certificate belonging to this key. Key compromises of entities in the SCION control-plane delegation chain are relatively easy to detect if abused, since there would be ASes with multiple valid certificates for an ISD and AS number with different public keys. AS key compromises are similarly easy to detect but only allow forging signed PILA messages within the compromised AS. End-host key compromises are less severe, as end-host certificates are short-lived. In RPKI-based PILA, a compromised trust root impacts the authenticity of all end hosts. In comparison, a compromised (ISD) trust root in SCION-based PILA only impacts the authenticity of end-hosts within this ISD. Additionally, a single (or a few) compromised control-plane CAs can be removed from the set of trusted CAs by updating the trust root configuration (TRC) which specifies all control-plane CAs. Attacking AS Trust. Attackers might attempt to reduce the trustworthiness of an AS. Slander, i.e., accusing a benign, uncompromised AS of having issued incorrect certificates, is not possible in "
- "Documentation document for the scion-pki key private command, which generates a PEM-encoded private key with selectable elliptic curve (P-256, P-384, P-521). Defaults to P-256. The --force option controls overwriting the keyfile. documentation
# scion-pki key public
Generate public key for the provided private key
## Synopsis
'public' generates a PEM encoded public key. By default, the public key is written to standard out.
## Examples
## Options
## SEE ALSO
- scion-pki key - Manage private and public keys "
- 'Book excerpt ("Bootstrapping Steps, Discovery Mechanisms") detailing the steps of the end-host bootstrapper daemon using DHCP, DNS and mDNS and configuration file download. Explanations focus on operation of discovery mechanisms in environments with managed DHCP servers or DNS infrastructure. Laurent Chuat et al.. *The Complete Guide to SCION. From Design Principles to Formal Verification*.
Springer International Publishing AG, 2022. book 348 13 Deployment and Operation the bootstrapper daemon and starts the SCION Daemon once the bootstrapper daemon finishes successfully. Bootstrapping Steps. The end host bootstrapper daemon performs the following steps: 1. Probe the local network for hints about a bootstrapping server address using the available discovery mechanisms (i.e., DHCP, DNS, and mDNS). 2. Wait for hints from the discoverers. 3. Once a hint is received, try to download the TRCs and the topology of the AS from the bootstrapping server. While there is no maximum amount of TRCs to be served, the bootstrapping server must provide at least the TRC of the ISD in which the AS is located. a) On success, prepare the SD’s files and exit successfully; the SD is then automatically started by the orchestrator. b) On failure, go back to step 2. If no hint is received after a certain period, the bootstrapper daemon times out and exits with a non-zero value. Note that the TRCs retrieval is a transition solution to ease adoption; ideally they are installed on a device out-of-band, before the device gets connected to a network (more details are given in the security considerations on page 331). 13.2.3 Discovery Mechanisms A bootstrapper can leverage DHCP, DNS or mDNS in order to find the IP address of the bootstrapping server. We describe each case, where we assume that • the end host is located in the example.com domain; and • the IP address of the bootstrapping server is 192.168.1.1. DHCP. The DHCP mechanism relies on the presence of an existing DHCP server in the network. This mechanism is advantageous in environments where there is a managed DHCP server, but no dedicated DNS infrastructure is operated for the local network. The DHCP server has to be configured to announce the address of the discovery services using one of the DHCP options. One natural choice is to use the option field with ID 72 “Default WWW server”, given that HTTP, the same application-level protocol as used in the WWW, is used to retrieve the configuration files. In our example, we would set the option value to 192.168.1.1. 328 '
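The numbered bootstrapping steps above map naturally onto a retry loop. A sketch with hypothetical callbacks standing in for the real discovery and HTTP-download logic (names and the timeout value are illustrative; the excerpt only says "a certain period"):

```python
import time

DISCOVERERS = ("dhcp", "dns", "mdns")
TIMEOUT_S = 30.0  # illustrative; not fixed by the excerpt

def bootstrap(probe, download, prepare_sd_files) -> bool:
    """probe(mechanism) -> hint or None; download(hint) -> (trcs, topology).
    Both are hypothetical stand-ins for the real discovery/download code."""
    deadline = time.monotonic() + TIMEOUT_S
    while time.monotonic() < deadline:
        for mechanism in DISCOVERERS:           # step 1: probe for hints
            hint = probe(mechanism)
            if hint is None:
                continue                         # step 2: keep waiting for hints
            try:
                trcs, topology = download(hint)  # step 3: fetch TRCs and topology
            except OSError:
                continue                         # step 3b: failure, back to waiting
            prepare_sd_files(trcs, topology)     # step 3a: SD is started afterwards
            return True
        time.sleep(0.5)
    return False  # timed out: exit with a non-zero value
```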
- source_sentence: How might operators of large replicated services manage their own ISD sentences: - 'Research paper on PISKES providing background on source address validation limitations (SAV/BCP 38), cookie-based systems, and client certificates. Discusses limitations of key-distribution systems like Passport and extends on prior work, DRKey, to form the new PISKES design. Benjamin Rothenberger et al.. "PISKES: Pragmatic Internet-Scale Key-Establishment System." *Proceedings of the ACM Asia Conference on Computer and Communications Security (ASIACCS)*, 2020. research paper 3 section. Here we focus on several representative and well-known systems—an exhaustive overview of related work is provided in §8. 3.1 Authentication Systems 3.1.1 Source Address Validation. Source address validation (SAV), also known as best current practice (BCP) 38 [24], is not an authentication system in the strict sense but is still often considered a solution to source-address spoofing in the Internet. With SAV, ASes monitor traffic originating from their own hosts and filter out packets with a source address outside their own address space. However, due to incentive misalignments (footnote 2), the adoption of SAV has been slow and a recent study found that many ASes still do not employ it in their networks [46]. Furthermore, it is impossible to determine from the outside if a particular AS employs SAV or if a particular packet originated from an AS that employs SAV as it does not carry any proof of authenticity. For an external service it is therefore impossible to filter traffic based on whether it originated from an AS employing SAV. Even with a full deployment of SAV in the Internet, on-path adversaries would still be able to spoof the source of packets and SAV thus provides very weak security properties. There exists a wide range of other filtering techniques with similarly limited properties [4, 21, 34, 43, 56]. 3.1.2 Cookies. Several protocols, including TLS [63], IKEv2 [38], and DNS [22] define a cookie mechanism to provide a weak form of source authentication. The basic mechanism for these systems is similar: Upon receiving a request, the server replies to the sender with a cookie that encodes the request parameters without allocating state or processing the request. Only after receiving this cookie back from the source, the request is processed. Compared to SAV, cookies have the advantage that they can be enforced by services without relying on Internet service providers (ISPs) to perform filtering. However, cookies introduce additional latency of one round-trip time (RTT) and are still susceptible to spoofed packets by on-path adversaries. 3.1.3 Client Certificates. Strong authentication properties can be achieved through asymmetric cryptography and client certificates. These are supported, for example, by TLS [63] and DTLS [64]. However, authentication using client certificates requires expensive asymmetric cryptography in violation of our efficiency requirements (§2.1.2). Furthermore, these systems cannot authenticate the first packet and are vulnerable to signature-flooding attacks. 3.2 Key-Distribution Systems 3.2.1 Passport. Passport [44] provides mechanisms to establish shared keys between any pair of ASes based on a DH key exchange piggybacked on BGP messages. It relies on a secure routing system to ensure the authenticity of the shared keys, which can subsequently be used to authenticate the source of packets at the network layer. For our purposes (see §2), Passport by itself is inadequate for several reasons: (i) it only enables authentication at the AS level, (ii) it requires authenticating systems to keep a store of symmetric keys for all ASes (currently approximately 68 000 [6]), (iii) it has no mechanism to delegate keys to certain services. Other systems, such as Kerberos [54], are reviewed in §8. (Footnote 2: The costs of deploying SAV are paid by an AS itself while its benefits are experienced by the rest of the Internet.) Table 1: Notation used in this paper. ∥ — bitstring concatenation; A, B — autonomous systems (ASes) identified by AS number (ASN); H_A, H_B — end hosts identified by IP address; KS_A, KS_B — key servers located in a specific AS; SV_A — AS A’s local secret value; SV^p_A — AS A’s local secret value for protocol p; K̃^p_• — symmetric key derived (indirectly) from SV^p; K_{A→B} — symmetric key between ASes A and B, derived from SV_A; K^p_{A→B} — symmetric key between ASes A and B for protocol p; K^p_{A→B:H_B} — symmetric key between AS A and end host H_B in AS B for protocol p; K^p_{A:H_A→B:H_B} — symmetric key between end host H_A in AS A and end host H_B in AS B for protocol p; H(·) — non-cryptographic hash operation; MAC_K(·) — message authentication code using key K; PRF_K(·) — pseudorandom function using key K; {X}_PK_A — public-key encryption of X using AS A’s public key; {X}_PK−_A — public-key signature over X using AS A’s private key.
3.2.2 DRKey. Dynamically Recreatable Keys (DRKeys) have been proposed to efficiently derive and distribute symmetric shared keys between routers and end hosts in the context of Origin and Path Trace (OPT) [41], a system providing path validation. The system has later been generalized and embedded in the SCION Internet architecture [58]. DRKey’s fundamental idea is that each AS A can efficiently derive a key hierarchy starting from a secret value SV_A, providing keys shared with other ASes, K_{A→B}, and end hosts, K_{A→B:H_B}. By periodically exchanging the keys K_{A→B} between ASes, from which host-level keys can be derived, DRKey enables an efficient global distribution of symmetric keys. DRKey fulfills most of our requirements to a key-distribution system and thus provides the basis of PISKES. However, PISKES refines and extends the existing DRKey system [58] in several significant ways: (i) PISKES modifies DRKey to make it applicable to the current Internet in addition to SCION; (ii) it adds efficient mechanisms to delegate specific keys to services in an AS; (iii) it specifies many of its important practical aspects in further detail; and (iv) it fixes recently discovered vulnerabilities of DRKey’s key-exchange mechanisms due to an inadequate application of signatures [33]. 4 KEY DERIVATION AND DISTRIBUTION In this section, we present the key-derivation and -distribution mechanisms used for PISKES. This is based on the DRKey system [58], but we significantly extend it with additional delegation mechanisms and other optimizations, see also §3.2.2. Furthermore, we also formally model and verify security properties of this key-distribution system, see §7.1. We first provide a high-level overview to convey an intuition of the operation of our system. Figure 1 shows the basic use case of PISKES, where a host H_A in AS A desires to communicate with a server S_B in AS B, and S_B wants to authenticate the network '
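The key hierarchy sketched in the PISKES excerpt (SV_A down to per-host keys) can be illustrated with a generic PRF. Here AES-CMAC stands in for PRF_K(·), and the byte encodings of protocol, AS, and host identifiers are simplifications rather than the DRKey wire format:

```python
from cryptography.hazmat.primitives.cmac import CMAC
from cryptography.hazmat.primitives.ciphers import algorithms

def prf(key: bytes, data: bytes) -> bytes:
    c = CMAC(algorithms.AES(key))
    c.update(data)
    return c.finalize()  # 16-byte derived key

sv_a = bytes(16)                      # SV_A: AS A's local secret value (placeholder)
sv_p_a = prf(sv_a, b"protocol-p")     # SV^p_A: protocol-specific secret value
k_a_b = prf(sv_p_a, b"AS-B")          # K^p_{A->B}: exchanged periodically between ASes
k_a_b_hb = prf(k_a_b, b"host-H_B")    # K^p_{A->B:H_B}: derived on demand per host
```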
- 'Book chapter on SCION Control Plane explaining path exploration (beaconing). Describes PCB initiation and propagation by beacon servers. Covers intra-ISD beaconing (up/down segments) and core beaconing (core segments). Details initial PCB creation with initial ASE containing hop field (HF0) with empty ingress interface and specified egress interface. Mentions use of one-hop paths and service addresses for beacon dissemination. Laurent Chuat et al.. *The Complete Guide to SCION. From Design Principles to Formal Verification*. Springer International Publishing AG, 2022. book 90 4 Control Plane 4.2.1 Initiating Beaconing Each core AS, through its beacon service, periodically initiates the path exploration process by creating an initial PCB and propagating it. The PCB is either sent to a child AS (in the case of intra-ISD beaconing) or to other core ASes (in the case of core beaconing). The beacon service inserts (among other information) the initial AS entry ASE0 in the PCB. In the intra-ISD case, the initial PCB can optionally contain peer entries to non-core ASes. The hop entry HE inside ASE0 includes an initial hop field with the ingress interface identifier set to ∅ (which indicates an empty value): HF0 = ⟨FlagsHF ‖ ExpTime ‖ ∅ ‖ ConsEgress ‖ HFAuth⟩. (4.9) The initial hop field denotes the extremity of a path segment and authenticates a forwarding decision for every packet that • enters the AS through the interface ConsEgress and terminates in the AS; • originates from the AS and exits through the interface ConsEgress; or • switches to another path segment at this AS (using one of the possible path-segment combinations, as described in § 5.5). The beacon service then signs the PCB and sends it to a border router (which corresponds to the ConsEgress identifier as specified in the hop field). PCBs are disseminated within packets addressed to the beacon service using the corresponding service address (see § 4.6). Furthermore, the special one-hop path is used to initiate the communication to a neighboring beacon service (see § 5.4.1). This is necessary because there may not be a full forwarding path available for beaconing. Indeed, the discovery of such paths in turn relies on beaconing. The purpose of one-hop paths is thus to break this circular dependency. During core beaconing, the neighboring AS that receives the PCB can be in the same or in a different ISD. The ISD identifier included in the PCB’s signature metadata describes only the ISD of the PCB’s originator. 4.2.2 Propagating PCBs After beaconing is initiated, each PCB is propagated in the following way: The ingress border router of the next AS in the beaconing path receives the PCB, detects that the destination is a SCION service address, and sends it to the AS’s beacon service. The beacon service verifies the structure and all signatures on the PCB. The PCB contains the version numbers of the TRC(s) and certificate(s) that must be used to verify the signatures. This enables the (Footnote 3: Even within a single ISD, there can be multiple valid TRCs at the same time, see § 3.1.6.) 70 '
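Equation (4.9) in the beaconing excerpt pins down what the beacon service emits for HF0. A small structural sketch; the empty-ingress rule comes from the text, while the field types and encoding are illustrative assumptions:

```python
from dataclasses import dataclass

EMPTY_IFID = 0  # stands for the empty ingress interface in Equation (4.9)

@dataclass
class HopField:
    flags: int          # FlagsHF
    exp_time: int       # ExpTime
    cons_ingress: int   # empty (EMPTY_IFID) in the initial hop field HF0
    cons_egress: int    # interface through which the PCB leaves the AS
    hf_auth: bytes      # MAC authenticating the hop field

def initial_hop_field(flags: int, exp_time: int, cons_egress: int, hf_auth: bytes) -> HopField:
    # HF0 = <FlagsHF || ExpTime || empty || ConsEgress || HFAuth>
    return HopField(flags, exp_time, EMPTY_IFID, cons_egress, hf_auth)
```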
Putting multiple ISD identifiers in a single address would mean that we would get practically the same system as the naming solution described in Section 3.3, where instead of through the naming system, alternate replicas are given in a single address. The conclusion is that neither of these workarounds is favorable. ISD considerations Considering the issues that exist around the addressing described before, a replicated AS would be part of a (single) regular ISD that might also have ASes that are not replicated. But it is also possible to have dedicated ISD(s) for replicated services. These could come in multiple different forms. Operators of big replicated services might want to run their own ISD. These ISDs would then only have core ASes or only a limited number of non-core ASes. The core ASes would then have many peerings with other ISD cores at different geographical locations. Replicated service operators are probably not interested in providing transit for traffic through their ISD, thus they would not propagate beacons that would create paths traveling through their ISD. Another scenario could be that there are third parties that operate an anycast ISD and provide transit service to customers that want to operate a replicated service. The anycast ISD operator would operate the ISD core ASes and peer those with many other cores. Customers can then peer at multiple locations with (some of) the anycast core(s). ' - source_sentence: How is the concept of configurable rates in Z-Lane intended to accommodate varying traffic demands sentences: - 'Research paper setup description section detailing the specific SCIONLab configuration, including AS creation, attachment to ETHZ-AP, and VM setup. Lists and describes SCION applications crucial to the experiments: ''scion address'', ''scion showpaths'', ''scion ping'', ''scion traceroute'', and ''scion-bwtestclient'', including their options and parameters (like packet size, bandwidth target) for performance evaluation on the network. Antonio Battipaglia et al. \"Evaluation of SCION for User-driven Path Control: a Usability Study.\" *Proceedings of the SC ''23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis*, 2023. research paper 3 Figure 1: SCIONLab Topology: in light orange there are Core ASes; Non-Core ASes are white colored; Attachment Points are green; our AS is blue. help us run specific experiments we will discuss in later sections. Once this configuration phase was completed, the SCIONLab web interface provided a unique ASN for our AS, along with cryptographic keys and public-key certificates. Subsequently, a Vagrant file for our AS was generated to instruct the configuration of a Virtual Machine (VM) that represents our AS. This file made the setup process lightweight by automating the installation of SCIONLab services, relevant packages, and necessary configurations. Finally, we were ready to use a fully configured VM belonging to the global SCIONLab topology. 3.3 Available Applications The VM configuration process also installs a predefined set of SCION applications. The SCION apps that we used in our experiments are: • scion address : this command returns the relevant SCION address information for the local host, that is, our AS where we launch commands from. • scion showpaths : it lists available paths between the local and the specified AS.
By default, the list is set to display only 10 paths; it can be extended using the -m option. Moreover, a really useful feature for this work is the --extended option, which provides additional information for each path (e.g. MTU, Path Status, Latency info). • scion ping : it tests connectivity to a remote SCION host using SCMP echo packets [4]. When the --count option is enabled, the ping command sends a specific number of SCMP echo packets and provides a report with corresponding statistics. Furthermore, the real innovation is the --interactive mode option, which displays all the available paths for the specified destination, allowing the user to select the desired traffic route. • scion traceroute : it traces the SCION path to a remote AS using SCMP traceroute packets. It is particularly useful to test how the latency is affected by each link. This command also makes interactive mode available. • scion-bwtestclient : it is the only application presented in this work that is not installed by default in the VM. Bwtestclient is part of a bigger bandwidth-testing application named bwtester, which allows a variety of bandwidth tests on the SCION network. The application enables specification of the test duration (up to 10 seconds), the packet size to be used (at least 4 bytes), the total number of packets that will be sent, and the target bandwidth. For example, 5,100,?,150Mbps specifies that the packet size is 100 bytes, sent over 5 seconds, resulting in a bandwidth of 150Mbps. The question mark ? character can be used as a wildcard for any of these parameters, in this case the number of packets sent. Its value is then computed according to the other parameters. The parameters for the test in the client-to-server direction are specified with -cs, and the server-to-client direction with -sc. We will further analyze these scion commands and how we used them in the next section. 4 SOFTWARE DESIGN We now present our software to test SCION features of path awareness and path selection. We will also test network performance properties such as latency, bandwidth, and packet loss in order to provide UPIN users with paths that fulfill requirements on these properties. ' - 'Research paper (PERFORMANCE ''20) on \"Incentivizing Stable Path Selection.\" Continues the game-theoretic analysis. Defines the oscillation model, building upon the Wardrop model, focusing on parallel-path systems, and defining key terms such as oscillation-prone system, oscillation, and stability. Introduces system parameters, describes the temporal component, and formalizes definitions for oscillation and stability at equilibrium. Simon Scherrer et al. \"Incentivizing Stable Path Selection in Future Internet Architectures.\" *Proceedings of the International Symposium on Computer Performance, Modeling, Measurements and Evaluation (PERFORMANCE)*, 2020. research paper 2 an inter-domain context cannot be achieved by relying only on end-point path selection. Instead, network operators have to incentivize end-hosts to adopt one of the well-known convergent path-selection strategies with stabilization mechanisms. These mechanisms have to be incentive-compatible, i.e., the mechanisms must create an incentive structure such that it is in an end-host’s self-interest to adopt a non-oscillatory path-selection strategy.
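Before the formal definitions that follow, a toy simulation may convey why stale cost information (T > 0) produces the oscillation these mechanisms address. The sketch only mimics the two-path model summarized above (path cost c_pi = f_pi^p, delayed load information); all parameter values are arbitrary:

```python
def simulate(steps=30, p=2, delay=3, shift=0.2, f_alpha=0.8):
    """Two-path toy model: each step, a `shift` fraction of demand migrates
    to the path whose cost (load ** p) looked lower `delay` steps ago."""
    history = [f_alpha] * delay
    for _ in range(steps):
        stale = history[-delay]                       # stale load information (T > 0)
        cheaper_is_alpha = stale ** p < (1 - stale) ** p
        f_alpha += shift if cheaper_is_alpha else -shift
        f_alpha = min(max(f_alpha, 0.0), 1.0)
        history.append(f_alpha)
    return history

# The load on path alpha keeps overshooting the equilibrium 1/2 and bouncing
# back, so Delta(t) = |f_alpha - f_beta| never settles to a limit.
print([round(x, 2) for x in simulate()])
```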
In this work, we present two such stabilization mechanisms, FLOSS and CROSS, and formally prove their incentive compatibility. These mechanisms employ different techniques to disincentivize oscillatory switching between paths, namely limiting the migration rate between paths (FLOSS) and imposing a cost on switching between paths (CROSS). To complement our mainly theoretical work, we also discuss how our findings could be practically applied. 1.1 Contribution This paper revisits the theoretical study of the dynamic effects of end-point path selection, for the first time focusing the analysis on inter-domain networks where the end-points are selfish and uncontrolled. We present a game-theoretic model that allows us to investigate which path-selection strategies will be adopted by selfish end-hosts. In particular, we introduce the notion of equilibria of path-selection strategies (PSS equilibria). Moreover, we formally show that the non-oscillatory path-selection strategies proposed in the existing literature do not form such PSS equilibria. Thus, we provide evidence towards the hypothesis that stability in load-adaptive routing over multiple domains cannot be achieved by exclusively relying on end-hosts’ path-selection behavior. To remedy this problem, we leverage insights from mechanism design to devise two incentive-compatible stabilization mechanisms enforced by network operators. While these mechanisms build on existing insights from intra-domain traffic engineering, their methods of incentivization represent a novel approach to achieve stability in inter-domain networks with load-adaptive routing. We formally prove the incentive compatibility of both mechanisms and discuss their practical application. 2 OSCILLATION MODEL 2.1 Parallel-Path Systems In order to study oscillation in network architectures with end-host path selection, we build on the well-established Wardrop model [37], which is the standard model for studying the interactions of selfish agents in computer networks [28, 32, 33]. In the Wardrop model, an infinite number of end-hosts, each controlling an infinitesimal traffic share, select one path 𝜋 among multiple paths Π between two network nodes. Every path 𝜋 has a load-dependent cost, where the path-cost function 𝑐𝜋 is typically interpreted as latency. The end-hosts’ path-selection decisions form a congestion game, where the path-selection decisions of end-hosts both determine and follow the load 𝑓𝜋 on every path 𝜋 [5, 19, 30]. In this work, we analyze congestion games with a temporal component, i.e., end-hosts take path-selection decisions over time based on currently available information. More precisely, an end-host performs an average of 𝑟 > 0 re-evaluations per unit of time. The aggregate re-evaluation behavior is uniform over time, i.e., when dividing time into intervals of length 𝜖 ∈ (0,1], 𝑟𝜖 re-evaluations are performed in any interval. Whenever an end-host performs a re-evaluation, it chooses one path 𝜋 to its destination according to a freely chosen path-selection strategy 𝜎. We thus formalize the environment of congestion games as parallel-path systems: Definition 1.
A parallel-path system 𝑂 := (Π, 𝑟, 𝑝, 𝑇, A0, 𝑣) is a tuple, where a total demand normalized to 1 is distributed over parallel paths 𝜋 ∈ Π among which end-hosts can select; 𝑟 > 0 is the average number of re-evaluations per end-host and unit of time; 𝑝 ≥ 1 is the steepness of the path cost as a function of the load (i.e., 𝑐𝜋 = (𝑓𝜋)𝑝); 𝑇 ≥ 0 is the average time that it takes for cost information to reach the agents; A0 ∈ [0,1]|Π| is the initial load matrix, where the entry A0𝜋 = 𝑓𝜋(0); and 𝑣 is the strategy profile, defining for every available path-selection strategy 𝜎 the share 𝑣(𝜎) of end-hosts that permanently apply strategy 𝜎. Every congestion game possesses at least one Wardrop equilibrium, consisting of a traffic distribution where no single agent can reduce its cost by selecting an alternative path [30]. If the agents take path-selection decisions based on up-to-date cost information of paths (𝑇 = 0), convergence to Wardrop equilibria is guaranteed and persistent oscillations can thus not arise [12, 13, 34]. However, in practice, the cost information possessed by agents is stale (𝑇 > 0), i.e., the information describes an older state of the network. If such stale information is present, undesirable oscillations can arise [14]. Therefore, parallel-path systems can be oscillation-prone: Definition 2. A parallel-path system 𝑂 is oscillation-prone if and only if 𝑇 > 0. In this work, we study oscillation-prone systems with two paths 𝛼 and 𝛽 (i.e., |Π| = 2), but our insights directly generalize to more paths. Due to total demand normalization, it holds that 𝑓𝛽(𝑡) = 1 − 𝑓𝛼(𝑡) for all 𝑡 ≥ 0. Thus, the unique Wardrop equilibrium in a two-path oscillation-prone system is given by 𝑓𝛼 = 𝑓𝛽 = 1/2. Moreover, we assume w.l.o.g. that the initial imbalance 𝐴0 exists with the higher load on path 𝛼: 𝑓𝛼(0) = 𝐴0 = A0𝛼 > 1/2. For this system of two parallel paths, ˜𝜋 denotes the respective other path, i.e., ˜𝛼 = 𝛽 and ˜𝛽 = 𝛼. Having introduced the concept of oscillation-prone systems, we next define notions of oscillation and stability. First, an oscillation-prone system experiences oscillation if the traffic distribution does not eventually become static: Definition 3. An oscillation-prone system 𝑂 experiences oscillation if there exists no limit Δ∗ of the function Δ(𝑡) = |𝑓𝛼(𝑡) − 𝑓𝛽(𝑡)| for 𝑡 → ∞. Conversely, we understand stability simply as the absence of oscillation, i.e., stability is given if a limit Δ∗ exists. However, to ensure optimal network utilization, the desirable state of the network is not only stability, but stability at equal load as given by the Wardrop equilibrium: Definition 4. An oscillation-prone system 𝑂 is stable at equal load if Δ∗ := lim𝑡→∞ Δ(𝑡) = 0. ' - 'Research paper section providing a Z-Lane system description. Introduces AS/ISD-level bandwidth isolation and configurable rates using SCION''s ISDs. Explains how ASes can overuse allocated bandwidth and send traffic at guaranteed rates. Marc Wyss et al. \"Zero-setup Intermediate-rate Communication Guarantees in a Global Internet.\" *Proceedings of the USENIX Security Symposium*, 2024. research paper 5 Z-Lane. The decision of how to configure the rates is ultimately up to the network operator and, importantly, does not require any inter-domain coordination. Due to the aggregation of ASes into ISDs, configurations remain manageable even if the Internet grows to hundreds of thousands of ASes. End Host Guarantees.
Z-Lane lets end hosts, more specifically their applications, define what traffic should be sent with forwarding guarantees, and what traffic should be forwarded over best-effort. Still, to protect against malicious end hosts, their AS has the ultimate authority in this matter and can reclassify traffic to be sent as best-effort only. This protection is implemented through a Z-Lane gateway, which schedules end host traffic and authenticates it towards on-path routers using a secret key not known to the end hosts. How traffic is scheduled is up to the AS operator; configurations can range from fair sharing to prioritizing certain traffic from critical AS services like routing or time synchronization. We emphasize that, to avoid any setup overhead (R3), neither ISDs, nor ASes, nor end hosts explicitly learn their configured rate; instead, end hosts implicitly discover their allowed rate through existing mechanisms like congestion control. Compatibility with Other Systems. Bandwidth reservation systems cannot provide zero-setup communication guarantees and are therefore not suitable to protect short-lived intermediate-rate communication (Section 8). Still, we design Z-Lane to seamlessly coexist with them, as they complement our work by effectively protecting non-setup-critical, high-volume communication such as from video conferencing. We choose COLIBRI [27] as a reservation system instantiation, but other systems could be deployed as well. To prevent attacks targeting DRKey’s AS-level key exchange, which is a fundamental requirement for EPIC, our design also ensures compatibility with the DoCile system [74], which leverages dedicated channels between neighboring ASes to successfully bootstrap the key exchange even under DDoS. We therefore consider the following four types of inter-domain traffic: COLIBRI reservation traffic, DoCile’s neighbor-based communication, authenticated traffic from EPIC, and unauthenticated SCION traffic. 4.2 Source Authentication Z-Lane employs EPIC for authenticating traffic sources to border routers, allowing every border router to verify the authenticity of every received packet. An important insight in the design of Z-Lane is that efficient and reliable source authentication as provided by EPIC allows for meaningful source-based traffic control at border routers. The realization of this idea has not been possible so far because previous source authentication mechanisms would cause excessive communication or computation overhead and therefore impede deployment, or were based on heuristics or probabilities, and would thus fail to reliably distinguish between authentic and spoofed addresses (Appendix H). Z-Lane is the first system to explore the use of comprehensive source authentication to protect the availability of short-lived intermediate-rate Internet traffic – with EPIC’s security rooted in AS-level secret keys, it integrates seamlessly into Z-Lane. We want to highlight that EPIC together with a fairness mechanism provided by some congestion control algorithm, i.e., without any guaranteed rates, would not be enough in our threat model, as an attacker would just not respect the algorithm’s feedback and instead keep sending traffic at high rates, or leverage a botnet to create many low-volume flows. 4.3 End Host Traffic Generation End hosts, i.e., their applications, can choose among several mechanisms on how their traffic is forwarded (Figure 1).
For long-term traffic, they request a bandwidth reservation and use it by sending their COLIBRI traffic class packets through the COLIBRI gateway. While the overhead for requesting a reservation is significant, the result is a fixed amount of bandwidth that is exclusively reserved along the communication path. In a similar way, applications send short-lived intermediate-rate traffic using the EPIC traffic class over the Z-Lane gateway, where traffic is forwarded immediately without any delay (requirement R3), but without the applications knowing the concrete rates. In both cases, traffic is protected against congestion on the communication path. The default option is for end hosts to send their traffic using the EPIC traffic class directly to a border router of their AS, from where it is forwarded along the path using best-effort. This option is useful for non-latency-critical communication such as file downloads, or for long-term traffic for which no reservation is available, which can, for example, happen if the end host has already created a large number of reservations and gets denied from creating even more. Z-Lane envisages unauthenticated SCION traffic to be sent only in scenarios where it is not otherwise possible, e.g., if an AS needs to request shared keys using DRKey from another AS for the first time. 4.4 Z-Lane Gateway ASes use the gateway to control the traffic volumes that their end hosts (incl. AS infrastructure services) are allowed to send using Z-Lane, which serves the primary purpose of protecting benign end hosts from malicious or compromised ones. For end host traffic complying with the allowed rate, the gateway sets a QoS flag in the EPIC header, which indicates to on-path routers that the corresponding packets should be forwarded using the AS’ guaranteed rate. If an end host’s packet exceeds the allowed rate at the gateway, then either (i) the QoS flag is not set (or removed, if it was already set by the end host), meaning that those packets will be treated as best-effort, or (ii) the packets are dropped, depending on the AS’ policy.
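The gateway behavior just described (set the QoS flag for conforming traffic, demote or drop the excess) amounts to a per-sender token-bucket decision. A rough sketch; the class, its parameters, and the policy flag are invented for illustration:

```python
import time

class ZLaneGateway:
    """Toy per-sender rate check: conforming packets get the QoS flag set,
    excess traffic is demoted to best-effort or dropped, per AS policy."""

    def __init__(self, rate_bps: float, burst_bits: float, drop_excess: bool = False):
        self.rate = rate_bps            # allowed sending rate (bits/s), hypothetical
        self.burst = burst_bits         # token-bucket depth
        self.tokens = burst_bits
        self.drop_excess = drop_excess  # AS policy for non-conforming packets
        self.last = time.monotonic()

    def admit(self, packet_bits: int) -> str:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return "forward with QoS flag"  # on-path routers apply the guaranteed rate
        return "drop" if self.drop_excess else "forward as best-effort"
```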
In contrast to best-effort EPIC packets generated at 5 ' pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 model-index: - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-s results: - task: type: information-retrieval name: Information Retrieval dataset: name: val ir eval type: val-ir-eval metrics: - type: cosine_accuracy@1 value: 0.7254901960784313 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9019607843137255 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9313725490196079 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9607843137254902 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7254901960784313 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.3006535947712418 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.18627450980392155 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09607843137254901 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7254901960784313 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9019607843137255 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9313725490196079 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9607843137254902 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8542256235274797 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8187908496732025 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8212133545466878 name: Cosine Map@100 --- # SentenceTransformer based on Snowflake/snowflake-arctic-embed-s This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-s. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** Snowflake/snowflake-arctic-embed-s - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 384 dimensions - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** Sentence Transformers Documentation - **Repository:** Sentence Transformers on GitHub - **Hugging Face:** Sentence Transformers on Hugging Face ### Full Model Architecture ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: Then you can load this model and run inference. 
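The code for the 'Usage' section was stripped from this card; a sketch of what standard Sentence Transformers usage would look like, with a placeholder for this model's repository id:

```python
# pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("path/or/hub-id-of-this-model")  # placeholder id

sentences = [
    "How is the concept of configurable rates in Z-Lane intended to accommodate varying traffic demands",
    "Research paper section providing a Z-Lane system description.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384): matches the card's 384-dimensional output

# Pairwise cosine similarities (available in Sentence Transformers >= 3.x).
print(model.similarity(embeddings, embeddings))
```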
## Evaluation ### Metrics #### Information Retrieval * Dataset: * Evaluated with InformationRetrievalEvaluator | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.7255 | | cosine_accuracy@3 | 0.902 | | cosine_accuracy@5 | 0.9314 | | cosine_accuracy@10 | 0.9608 | | cosine_precision@1 | 0.7255 | | cosine_precision@3 | 0.3007 | | cosine_precision@5 | 0.1863 | | cosine_precision@10 | 0.0961 | | cosine_recall@1 | 0.7255 | | cosine_recall@3 | 0.902 | | cosine_recall@5 | 0.9314 | | cosine_recall@10 | 0.9608 | | **cosine_ndcg@10** | **0.8542** | | cosine_mrr@10 | 0.8188 | | cosine_map@100 | 0.8212 | ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 4,321 training samples * Columns: sentence_0 and sentence_1 * Approximate statistics based on the first 1000 samples: | | sentence_0 | sentence_1 | |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------| | type | string | string | | details |
min: 5 tokens • mean: 19.23 tokens • max: 66 tokens | min: 238 tokens • mean: 507.97 tokens • max: 512 tokens
| * Samples: | sentence_0 | sentence_1 | |:-------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | What are the two scenarios for LightningFilter deployment depending on the level of trust with the AS | Book chapter detailing SCION LightningFilter's packet authentication using DRKey. Describes key derivation using PRF with AS-level (KLF_A->B) and host-level (KLF_A:HA->B:HB) keys. Explains two deployment scenarios: trusted entity with direct access to SVLF_A and less-trusted entity fetching second-level keys. Covers header and payload authentication using SPAO, MAC computation with symmetric key (tag = MAC{KLF_A:HA->B:HB}(hdr)), and payload hash (h = H(pld)).
Laurent Chuat et al. *The Complete Guide to SCION. From Design Principles to Formal Verification*. Springer International Publishing AG, 2022.
book
233

9.2 High-Speed Traffic Filtering with LightningFilter
in the number of hosts, the computation overhead is significant and thus not suited for per-packet usage. On the other hand, using symmetric cryptography would traditionally require the filtering service to store a key for each packet source. To avoid per-host stat...
| | How are preferences, such as customer, peering link, or transit provider, expressed in BGP? | Book excerpt on Approaches to Implementing Path Policies and the Gao–Rexford Model describing how ASes add path policy information to PCBs, specifying usage restrictions. Highlights accountability for the violating AS, and explains the need for a default, arbitrary path. Explains the \"preference policy\" for economics and \"export policy\" for stability.
Laurent Chuat et al. *The Complete Guide to SCION. From Design Principles to Formal Verification*. Springer International Publishing AG, 2022.
book
159

6.2 SCION Path Policy
When the path is only used against the explicit path policy but not registered, detection is more challenging. To detect such misuse, an AS can monitor hop fields (HFs) used in traffic and, in the case of HFs that were not registered by any of the downstream ASes, it can verify whether the source or destination AS is allowed to use the path. Furthermore, violation by an intermediate AS can be detected by tracing the ...
| | What is the structure of a complete SCION address? How is intra-domain forwarding handled at the destination AS? | Technical document describing inter- and intra-domain forwarding in SCION. Explains the separation of inter-domain (SCION-based) and intra-domain (AS-specific, often IP-based) forwarding. SCION routers forward based on Hop Fields and need not inspect the destination IP address. Includes advantages like path control and simplified processing.

specification




* It simplifies the packet processing at routers. Instead of having to perform longest prefix matching on IP addresses, which requires expensive hardware and substantial amounts of energy, a router can simply access the next hop from the packet header after having verified the authenticity of the Hop Field's MAC.

1.3.1. Inter- and Intra-Domain Forwarding

...
| * Loss: MultipleNegativesRankingLoss with these parameters: ### Training Hyperparameters #### Non-Default Hyperparameters - : steps - : 50 - : 50 - : 5 - : round_robin #### All Hyperparameters
Click to expand - : False - : False - : steps - : True - : 50 - : 50 - : None - : None - : 1 - : None - : None - : 5e-05 - : 0.0 - : 0.9 - : 0.999 - : 1e-08 - : 1 - : 5 - : -1 - : linear - : {} - : 0.0 - : 0 - : passive - : warning - : True - : True - : True - : False - : False - : False - : False - : False - : False - : 42 - : None - : False - : False - : False - : False - : O1 - : auto - : False - : False - : None - : 0 - : None - : None - : False - : [] - : False - : 0 - : None - : -1 - : False - : True - : None - : False - : False - : [] - : 0 - : {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} - : None - : {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} - : None - : 0.0 - : adamw_torch - : None - : False - : False - : length - : None - : None - : False - : True - : False - : True - : False - : False - : None - : None - : every_save - : None - : False - : False - : None - : False - : [] - : True - : auto - : None - : None - : - : False - : False - : None - : last - : 1800 - : False - : None - : None - : None - : None - : False - : False - : None - : None - : False - : False - : False - : False - : False - : None - : batch_sampler - : round_robin
### Training Logs | Epoch | Step | val-ir-eval_cosine_ndcg@10 | |:-----:|:----:|:--------------------------:| | 1.0 | 44 | 0.7533 | | 2.0 | 88 | 0.8088 | | 3.0 | 132 | 0.8296 | | 4.0 | 176 | 0.8326 | | 5.0 | 220 | 0.8542 | ### Framework Versions - Python: 3.12.3 - Sentence Transformers: 3.4.1 - Transformers: 4.49.0 - PyTorch: 2.6.0+cu124 - Accelerate: 1.4.0 - Datasets: 3.3.2 - Tokenizers: 0.21.0 ## Citation ### BibTeX #### Sentence Transformers #### MultipleNegativesRankingLoss ", + "model_explanation_gemini": "Generates embeddings for sentences to measure similarity and extract features, optimized for ranking-related tasks using a MultipleNegativesRankingLoss approach." +} \ No newline at end of file diff --git a/data/model_data_json/tk93_V-Express.json b/data/model_data_json/tk93_V-Express.json new file mode 100644 index 0000000000000000000000000000000000000000..748a9ba010a55e79c42c302a93ef9f4bdc1a0b62 --- /dev/null +++ b/data/model_data_json/tk93_V-Express.json @@ -0,0 +1,16 @@ +{ + "model_id": "tk93/V-Express", + "downloads": 117388, + "tags": [ + "diffusers", + "onnx", + "text-to-image", + "stable-diffusion", + "audio-to-video", + "en", + "arxiv:2406.02511", + "region:us" + ], + "description": "--- tags: - text-to-image - stable-diffusion - audio-to-video language: - en library_name: diffusers --- # V-Express Model Card
**Project Page** **|** **Paper** **|** **Code**
--- ## Introduction ## Models ### Audio Encoder - model_ckpts/wav2vec2-base-960h. (It is also available from the original model card facebook/wav2vec2-base-960h) ### Face Analysis - model_ckpts/insightface_models/models/buffalo_l. (It is also available from the original repository insightface/buffalo_l) ### V-Express - model_ckpts/sd-vae-ft-mse. VAE encoder. (original model card stabilityai/sd-vae-ft-mse) - model_ckpts/stable-diffusion-v1-5. Only the model configuration file for unet is needed here. (original model card runwayml/stable-diffusion-v1-5) - model_ckpts/v-express. The video generation model conditional on audio and V-kps we call V-Express. - You should download and put all model to directory, which includes , , , , and . ## licence see acknowledgements for more information.", + "model_explanation_gemini": "Generates videos from audio inputs and facial keypoints using Stable Diffusion and auxiliary models for audio encoding and face analysis." +} \ No newline at end of file diff --git a/data/model_data_json/tner_roberta-large-tweetner7-all.json b/data/model_data_json/tner_roberta-large-tweetner7-all.json new file mode 100644 index 0000000000000000000000000000000000000000..6d5473b2e505b12338e582eba314f873f8acf4a6 --- /dev/null +++ b/data/model_data_json/tner_roberta-large-tweetner7-all.json @@ -0,0 +1,17 @@ +{ + "model_id": "tner/roberta-large-tweetner7-all", + "downloads": 237679, + "tags": [ + "transformers", + "pytorch", + "roberta", + "token-classification", + "dataset:tner/tweetner7", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - tner/tweetner7 metrics: - f1 - precision - recall model-index: - name: tner/roberta-large-tweetner7-all results: - task: name: Token Classification type: token-classification dataset: name: tner/tweetner7 type: tner/tweetner7 args: tner/tweetner7 metrics: - name: F1 (test_2021) type: f1 value: 0.6574551220340903 - name: Precision (test_2021) type: precision value: 0.644212629008989 - name: Recall (test_2021) type: recall value: 0.6712534690101758 - name: Macro F1 (test_2021) type: f1_macro value: 0.6124665667529737 - name: Macro Precision (test_2021) type: precision_macro value: 0.6005167968535563 - name: Macro Recall (test_2021) type: recall_macro value: 0.625251837701222 - name: Entity Span F1 (test_2021) type: f1_entity_span value: 0.7881979839166384 - name: Entity Span Precision (test_2020) type: precision_entity_span value: 0.7722783264898457 - name: Entity Span Recall (test_2021) type: recall_entity_span value: 0.804787787672025 - name: F1 (test_2020) type: f1 value: 0.6628787878787878 - name: Precision (test_2020) type: precision value: 0.6924816280384398 - name: Recall (test_2020) type: recall value: 0.6357031655422937 - name: Macro F1 (test_2020) type: f1_macro value: 0.6297223287745568 - name: Macro Precision (test_2020) type: precision_macro value: 0.6618492079232416 - name: Macro Recall (test_2020) type: recall_macro value: 0.601311568050436 - name: Entity Span F1 (test_2020) type: f1_entity_span value: 0.7642760487144791 - name: Entity Span Precision (test_2020) type: precision_entity_span value: 0.7986425339366516 - name: Entity Span Recall (test_2020) type: recall_entity_span value: 0.7327451997924235 pipeline_tag: token-classification widget: - text: \"Get the all-analog Classic Vinyl Edition of Album from {@herbiehancock@} via {@bluenoterecords@} link below: {{URL}}\" example_title: \"NER Example 1\" --- # tner/roberta-large-tweetner7-all This model is a 
fine-tuned version of roberta-large on the tner/tweetner7 dataset ( split). Model fine-tuning is done via T-NER's hyper-parameter search (see the repository for more detail). It achieves the following results on the test set of 2021: - F1 (micro): 0.6574551220340903 - Precision (micro): 0.644212629008989 - Recall (micro): 0.6712534690101758 - F1 (macro): 0.6124665667529737 - Precision (macro): 0.6005167968535563 - Recall (macro): 0.625251837701222 The per-entity breakdown of the F1 score on the test set are below: - corporation: 0.5392156862745098 - creative_work: 0.4760582928521859 - event: 0.4673321234119782 - group: 0.6139798488664987 - location: 0.6707399864222675 - person: 0.8293212669683258 - product: 0.6906187624750498 For F1 scores, the confidence interval is obtained by bootstrap as below: - F1 (micro): - 90%: [0.6484148010152769, 0.6672289519134409] - 95%: [0.6470100684797441, 0.6689850350992637] - F1 (macro): - 90%: [0.6484148010152769, 0.6672289519134409] - 95%: [0.6470100684797441, 0.6689850350992637] Full evaluation can be found at metric file of NER and metric file of entity span. ### Usage This model can be used through the tner library. Install the library via pip. TweetNER7 pre-processed tweets where the account name and URLs are converted into special formats (see the dataset page for more detail), so we process tweets accordingly and then run the model prediction as below. It can be used via transformers library but it is not recommended as CRF layer is not supported at the moment. ### Training hyperparameters The following hyperparameters were used during training: - dataset: ['tner/tweetner7'] - dataset_split: train_all - dataset_name: None - local_dataset: None - model: roberta-large - crf: True - max_length: 128 - epoch: 30 - batch_size: 32 - lr: 1e-05 - random_seed: 0 - gradient_accumulation_steps: 1 - weight_decay: 1e-07 - lr_warmup_step_ratio: 0.15 - max_grad_norm: 1 The full configuration can be found at fine-tuning parameter file. ### Reference If you use the model, please cite T-NER paper and TweetNER7 paper. - T-NER - TweetNER7", + "model_explanation_gemini": "Performs named entity recognition on tweets to identify entities like corporations, creative works, events, groups, locations, persons, and products." +} \ No newline at end of file diff --git a/data/model_data_json/tohoku-nlp_bert-base-japanese-char-v2.json b/data/model_data_json/tohoku-nlp_bert-base-japanese-char-v2.json new file mode 100644 index 0000000000000000000000000000000000000000..0f5aa1234309537e1721c3f330c372faeedd6776 --- /dev/null +++ b/data/model_data_json/tohoku-nlp_bert-base-japanese-char-v2.json @@ -0,0 +1,20 @@ +{ + "model_id": "tohoku-nlp/bert-base-japanese-char-v2", + "downloads": 117934, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "ja", + "dataset:wikipedia", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja license: cc-by-sa-4.0 datasets: - wikipedia widget: - text: 東北大学で[MASK]の研究をしています。 --- # BERT base Japanese (character-level tokenization with whole word masking, jawiki-20200831) This is a BERT model pretrained on texts in the Japanese language. This version of the model processes input texts with word-level tokenization based on the Unidic 2.1.2 dictionary (available in unidic-lite package), followed by character-level tokenization. Additionally, the model is trained with the whole word masking enabled for the masked language modeling (MLM) objective. 
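The widget example in this card suggests fill-mask usage; a minimal sketch via the transformers pipeline, assuming the standard fill-mask API (the model id is taken from this file):

```python
from transformers import pipeline

# fugashi and unidic-lite must be installed for the Japanese tokenizer.
fill_mask = pipeline("fill-mask", model="tohoku-nlp/bert-base-japanese-char-v2")

# Widget example from the card: "I do research on [MASK] at Tohoku University."
for candidate in fill_mask("東北大学で[MASK]の研究をしています。"):
    print(candidate["token_str"], round(candidate["score"], 3))
```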
The codes for the pretraining are available at cl-tohoku/bert-japanese. ## Model architecture The model architecture is the same as the original BERT base model; 12 layers, 768 dimensions of hidden states, and 12 attention heads. ## Training Data The models are trained on the Japanese version of Wikipedia. The training corpus is generated from the Wikipedia Cirrussearch dump file as of August 31, 2020. The generated corpus files are 4.0GB in total, containing approximately 30M sentences. We used the MeCab morphological parser with mecab-ipadic-NEologd dictionary to split texts into sentences. ## Tokenization The texts are first tokenized by MeCab with the Unidic 2.1.2 dictionary and then split into characters. The vocabulary size is 6144. We used []( and []( packages for the tokenization. ## Training The models are trained with the same configuration as the original BERT; 512 tokens per instance, 256 instances per batch, and 1M training steps. For training of the MLM (masked language modeling) objective, we introduced whole word masking in which all of the subword tokens corresponding to a single word (tokenized by MeCab) are masked at once. For training of each model, we used a v3-8 instance of Cloud TPUs provided by TensorFlow Research Cloud program. The training took about 5 days to finish. ## Licenses The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0. ## Acknowledgments This model is trained with Cloud TPUs provided by TensorFlow Research Cloud program.", + "model_explanation_gemini": "A Japanese BERT model pretrained with character-level tokenization and whole word masking for masked language modeling, using Japanese Wikipedia data." +} \ No newline at end of file diff --git a/data/model_data_json/tohoku-nlp_bert-base-japanese-char-v3.json b/data/model_data_json/tohoku-nlp_bert-base-japanese-char-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..109bef3cd40168422ebd8656da4bd1fee89eb10f --- /dev/null +++ b/data/model_data_json/tohoku-nlp_bert-base-japanese-char-v3.json @@ -0,0 +1,20 @@ +{ + "model_id": "tohoku-nlp/bert-base-japanese-char-v3", + "downloads": 103588, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "pretraining", + "ja", + "dataset:cc100", + "dataset:wikipedia", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - cc100 - wikipedia language: - ja widget: - text: 東北大学で[MASK]の研究をしています。 --- # BERT base Japanese (character-level tokenization with whole word masking, CC-100 and jawiki-20230102) This is a BERT model pretrained on texts in the Japanese language. This version of the model processes input texts with word-level tokenization based on the Unidic 2.1.2 dictionary (available in unidic-lite package), followed by character-level tokenization. Additionally, the model is trained with the whole word masking enabled for the masked language modeling (MLM) objective. The codes for the pretraining are available at cl-tohoku/bert-japanese. ## Model architecture The model architecture is the same as the original BERT base model; 12 layers, 768 dimensions of hidden states, and 12 attention heads. ## Training Data The model is trained on the Japanese portion of CC-100 dataset and the Japanese version of Wikipedia. For Wikipedia, we generated a text corpus from the Wikipedia Cirrussearch dump file as of January 2, 2023. 
The corpus files generated from CC-100 and Wikipedia are 74.3GB and 4.9GB in size and consist of approximately 392M and 34M sentences, respectively. For the purpose of splitting texts into sentences, we used fugashi with mecab-ipadic-NEologd dictionary (v0.0.7). ## Tokenization The texts are first tokenized by MeCab with the Unidic 2.1.2 dictionary and then split into characters. The vocabulary size is 7027. We used fugashi and unidic-lite packages for the tokenization. ## Training We trained the model first on the CC-100 corpus for 1M steps and then on the Wikipedia corpus for another 1M steps. For training of the MLM (masked language modeling) objective, we introduced whole word masking in which all of the subword tokens corresponding to a single word (tokenized by MeCab) are masked at once. For training of each model, we used a v3-8 instance of Cloud TPUs provided by TPU Research Cloud. ## Licenses The pretrained models are distributed under the Apache License 2.0. ## Acknowledgments This model is trained with Cloud TPUs provided by TPU Research Cloud program.", + "model_explanation_gemini": "A Japanese BERT model pretrained with character-level tokenization and whole word masking for masked language modeling, using the CC-100 and Japanese Wikipedia datasets." +} \ No newline at end of file diff --git a/data/model_data_json/tohoku-nlp_bert-base-japanese-char.json b/data/model_data_json/tohoku-nlp_bert-base-japanese-char.json new file mode 100644 index 0000000000000000000000000000000000000000..e7d4957172c5aabf13a93637ac14c1ae4a48caaf --- /dev/null +++ b/data/model_data_json/tohoku-nlp_bert-base-japanese-char.json @@ -0,0 +1,20 @@ +{ + "model_id": "tohoku-nlp/bert-base-japanese-char", + "downloads": 107118, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "ja", + "dataset:wikipedia", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja license: cc-by-sa-4.0 datasets: - wikipedia widget: - text: 仙台は「[MASK]の都」と呼ばれている。 --- # BERT base Japanese (character tokenization) This is a BERT model pretrained on texts in the Japanese language. This version of the model processes input texts with word-level tokenization based on the IPA dictionary, followed by character-level tokenization. The codes for the pretraining are available at cl-tohoku/bert-japanese. ## Model architecture The model architecture is the same as the original BERT base model; 12 layers, 768 dimensions of hidden states, and 12 attention heads. ## Training Data The model is trained on Japanese Wikipedia as of September 1, 2019. To generate the training corpus, WikiExtractor is used to extract plain texts from a dump file of Wikipedia articles. The text files used for the training are 2.6GB in size, consisting of approximately 17M sentences. ## Tokenization The texts are first tokenized by MeCab morphological parser with the IPA dictionary and then split into characters. The vocabulary size is 4000. ## Training The model is trained with the same configuration as the original BERT; 512 tokens per instance, 256 instances per batch, and 1M training steps. ## Licenses The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0. 
## Acknowledgments For training models, we used Cloud TPUs provided by TensorFlow Research Cloud program.", + "model_explanation_gemini": "Pretrained BERT model for Japanese text processing using character-level tokenization after initial word-level tokenization with MeCab and the IPA dictionary." +} \ No newline at end of file diff --git a/data/model_data_json/tohoku-nlp_bert-base-japanese-v3.json b/data/model_data_json/tohoku-nlp_bert-base-japanese-v3.json new file mode 100644 index 0000000000000000000000000000000000000000..769341fe5c247a5cd859969e613dc4603636dde6 --- /dev/null +++ b/data/model_data_json/tohoku-nlp_bert-base-japanese-v3.json @@ -0,0 +1,20 @@ +{ + "model_id": "tohoku-nlp/bert-base-japanese-v3", + "downloads": 368380, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "pretraining", + "ja", + "dataset:cc100", + "dataset:wikipedia", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 datasets: - cc100 - wikipedia language: - ja widget: - text: 東北大学で[MASK]の研究をしています。 --- # BERT base Japanese (unidic-lite with whole word masking, CC-100 and jawiki-20230102) This is a BERT model pretrained on texts in the Japanese language. This version of the model processes input texts with word-level tokenization based on the Unidic 2.1.2 dictionary (available in unidic-lite package), followed by the WordPiece subword tokenization. Additionally, the model is trained with the whole word masking enabled for the masked language modeling (MLM) objective. The codes for the pretraining are available at cl-tohoku/bert-japanese. ## Model architecture The model architecture is the same as the original BERT base model; 12 layers, 768 dimensions of hidden states, and 12 attention heads. ## Training Data The model is trained on the Japanese portion of CC-100 dataset and the Japanese version of Wikipedia. For Wikipedia, we generated a text corpus from the Wikipedia Cirrussearch dump file as of January 2, 2023. The corpus files generated from CC-100 and Wikipedia are 74.3GB and 4.9GB in size and consist of approximately 392M and 34M sentences, respectively. For the purpose of splitting texts into sentences, we used fugashi with mecab-ipadic-NEologd dictionary (v0.0.7). ## Tokenization The texts are first tokenized by MeCab with the Unidic 2.1.2 dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32768. We used fugashi and unidic-lite packages for the tokenization. ## Training We trained the model first on the CC-100 corpus for 1M steps and then on the Wikipedia corpus for another 1M steps. For training of the MLM (masked language modeling) objective, we introduced whole word masking in which all of the subword tokens corresponding to a single word (tokenized by MeCab) are masked at once. For training of each model, we used a v3-8 instance of Cloud TPUs provided by TPU Research Cloud. ## Licenses The pretrained models are distributed under the Apache License 2.0. ## Acknowledgments This model is trained with Cloud TPUs provided by TPU Research Cloud program.", + "model_explanation_gemini": "A Japanese BERT model pretrained on CC-100 and Wikipedia texts using word-level tokenization with Unidic and WordPiece, optimized for masked language modeling with whole word masking." 
+} \ No newline at end of file diff --git a/data/model_data_json/tohoku-nlp_bert-base-japanese-whole-word-masking.json b/data/model_data_json/tohoku-nlp_bert-base-japanese-whole-word-masking.json new file mode 100644 index 0000000000000000000000000000000000000000..df52f91be0fbaec8ce874c3973a08f0ef9a9ecb8 --- /dev/null +++ b/data/model_data_json/tohoku-nlp_bert-base-japanese-whole-word-masking.json @@ -0,0 +1,20 @@ +{ + "model_id": "tohoku-nlp/bert-base-japanese-whole-word-masking", + "downloads": 112735, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "ja", + "dataset:wikipedia", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja license: cc-by-sa-4.0 datasets: - wikipedia widget: - text: 東北大学で[MASK]の研究をしています。 --- # BERT base Japanese (IPA dictionary, whole word masking enabled) This is a BERT model pretrained on texts in the Japanese language. This version of the model processes input texts with word-level tokenization based on the IPA dictionary, followed by the WordPiece subword tokenization. Additionally, the model is trained with the whole word masking enabled for the masked language modeling (MLM) objective. The codes for the pretraining are available at cl-tohoku/bert-japanese. ## Model architecture The model architecture is the same as the original BERT base model; 12 layers, 768 dimensions of hidden states, and 12 attention heads. ## Training Data The model is trained on Japanese Wikipedia as of September 1, 2019. To generate the training corpus, WikiExtractor is used to extract plain texts from a dump file of Wikipedia articles. The text files used for the training are 2.6GB in size, consisting of approximately 17M sentences. ## Tokenization The texts are first tokenized by MeCab morphological parser with the IPA dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32000. ## Training The model is trained with the same configuration as the original BERT; 512 tokens per instance, 256 instances per batch, and 1M training steps. For the training of the MLM (masked language modeling) objective, we introduced the **Whole Word Masking** in which all of the subword tokens corresponding to a single word (tokenized by MeCab) are masked at once. ## Licenses The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0. ## Acknowledgments For training models, we used Cloud TPUs provided by TensorFlow Research Cloud program.", + "model_explanation_gemini": "A Japanese BERT model pretrained with whole word masking on Wikipedia texts, using IPA dictionary-based tokenization for masked language modeling tasks." 
+} \ No newline at end of file diff --git a/data/model_data_json/tohoku-nlp_bert-base-japanese.json b/data/model_data_json/tohoku-nlp_bert-base-japanese.json new file mode 100644 index 0000000000000000000000000000000000000000..59290cc55c2ed8d6f75c37639e84ad9a7b2b7eb3 --- /dev/null +++ b/data/model_data_json/tohoku-nlp_bert-base-japanese.json @@ -0,0 +1,20 @@ +{ + "model_id": "tohoku-nlp/bert-base-japanese", + "downloads": 150838, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "bert", + "fill-mask", + "ja", + "dataset:wikipedia", + "license:cc-by-sa-4.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: ja license: cc-by-sa-4.0 datasets: - wikipedia widget: - text: 東北大学で[MASK]の研究をしています。 --- # BERT base Japanese (IPA dictionary) This is a BERT model pretrained on texts in the Japanese language. This version of the model processes input texts with word-level tokenization based on the IPA dictionary, followed by the WordPiece subword tokenization. The codes for the pretraining are available at cl-tohoku/bert-japanese. ## Model architecture The model architecture is the same as the original BERT base model; 12 layers, 768 dimensions of hidden states, and 12 attention heads. ## Training Data The model is trained on Japanese Wikipedia as of September 1, 2019. To generate the training corpus, WikiExtractor is used to extract plain texts from a dump file of Wikipedia articles. The text files used for the training are 2.6GB in size, consisting of approximately 17M sentences. ## Tokenization The texts are first tokenized by MeCab morphological parser with the IPA dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32000. ## Training The model is trained with the same configuration as the original BERT; 512 tokens per instance, 256 instances per batch, and 1M training steps. ## Licenses The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0. ## Acknowledgments For training models, we used Cloud TPUs provided by TensorFlow Research Cloud program.", + "model_explanation_gemini": "Pretrained BERT model for Japanese language processing, using IPA dictionary-based word tokenization and WordPiece subword tokenization, trained on Japanese Wikipedia." +} \ No newline at end of file diff --git a/data/model_data_json/tomh_toxigen_hatebert.json b/data/model_data_json/tomh_toxigen_hatebert.json new file mode 100644 index 0000000000000000000000000000000000000000..b4d502f7f2ba8c59680dc23f0cad43ca8dcb328c --- /dev/null +++ b/data/model_data_json/tomh_toxigen_hatebert.json @@ -0,0 +1,17 @@ +{ + "model_id": "tomh/toxigen_hatebert", + "downloads": 557506, + "tags": [ + "transformers", + "pytorch", + "bert", + "text-classification", + "en", + "arxiv:2203.09509", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en tags: - text-classification --- Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar. This model comes from the paper ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection and can be used to detect implicit hate speech. Please visit the Github Repository for the training dataset and further details.", + "model_explanation_gemini": "Detects implicit hate speech in text using a large-scale machine-generated dataset." 
+} \ No newline at end of file diff --git a/data/model_data_json/trl-internal-testing_tiny-LlamaForCausalLM-3.2.json b/data/model_data_json/trl-internal-testing_tiny-LlamaForCausalLM-3.2.json new file mode 100644 index 0000000000000000000000000000000000000000..b5bba33723d4f59527da64d74248b0ee2cdb63ef --- /dev/null +++ b/data/model_data_json/trl-internal-testing_tiny-LlamaForCausalLM-3.2.json @@ -0,0 +1,18 @@ +{ + "model_id": "trl-internal-testing/tiny-LlamaForCausalLM-3.2", + "downloads": 1565155, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "trl", + "conversational", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - trl --- # Tiny LlamaForCausalLM This is a minimal model built for unit tests in the TRL library.", + "model_explanation_gemini": "A minimal Llama causal language model designed for unit testing in the TRL library." +} \ No newline at end of file diff --git a/data/model_data_json/trl-internal-testing_tiny-MistralForCausalLM-0.2.json b/data/model_data_json/trl-internal-testing_tiny-MistralForCausalLM-0.2.json new file mode 100644 index 0000000000000000000000000000000000000000..221fbfb1ae5a3fec0ec92030a7b84c0b2e33e227 --- /dev/null +++ b/data/model_data_json/trl-internal-testing_tiny-MistralForCausalLM-0.2.json @@ -0,0 +1,18 @@ +{ + "model_id": "trl-internal-testing/tiny-MistralForCausalLM-0.2", + "downloads": 130987, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "trl", + "conversational", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - trl --- # Tiny MistralForCausalLM This is a minimal model built for unit tests in the TRL library.", + "model_explanation_gemini": "A minimal model designed for unit testing in the TRL library." +} \ No newline at end of file diff --git a/data/model_data_json/trl-internal-testing_tiny-Qwen2ForCausalLM-2.5.json b/data/model_data_json/trl-internal-testing_tiny-Qwen2ForCausalLM-2.5.json new file mode 100644 index 0000000000000000000000000000000000000000..c79fad76a98eddbcae845d0aa830d00b4c7bd0e0 --- /dev/null +++ b/data/model_data_json/trl-internal-testing_tiny-Qwen2ForCausalLM-2.5.json @@ -0,0 +1,18 @@ +{ + "model_id": "trl-internal-testing/tiny-Qwen2ForCausalLM-2.5", + "downloads": 1255809, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "trl", + "conversational", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - trl --- # Tiny Qwen2ForCausalLM This is a minimal model built for unit tests in the TRL library.", + "model_explanation_gemini": "A minimal Qwen2ForCausalLM model designed for unit testing within the TRL library." 
+} \ No newline at end of file diff --git a/data/model_data_json/trl-internal-testing_tiny-T5ForConditionalGeneration.json b/data/model_data_json/trl-internal-testing_tiny-T5ForConditionalGeneration.json new file mode 100644 index 0000000000000000000000000000000000000000..c657a0199bf0758569c386f05adf92d6c61cdad2 --- /dev/null +++ b/data/model_data_json/trl-internal-testing_tiny-T5ForConditionalGeneration.json @@ -0,0 +1,17 @@ +{ + "model_id": "trl-internal-testing/tiny-T5ForConditionalGeneration", + "downloads": 331531, + "tags": [ + "transformers", + "safetensors", + "t5", + "text2text-generation", + "trl", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: - trl --- # Tiny T5ForConditionalGeneration This is a minimal model built for unit tests in the TRL library.", + "model_explanation_gemini": "A minimal T5 model designed for unit testing in the TRL library." +} \ No newline at end of file diff --git a/data/model_data_json/trl-internal-testing_tiny-random-LlamaForCausalLM.json b/data/model_data_json/trl-internal-testing_tiny-random-LlamaForCausalLM.json new file mode 100644 index 0000000000000000000000000000000000000000..23d49726f3aeede173fb786d72567ec91abae5a7 --- /dev/null +++ b/data/model_data_json/trl-internal-testing_tiny-random-LlamaForCausalLM.json @@ -0,0 +1,18 @@ +{ + "model_id": "trl-internal-testing/tiny-random-LlamaForCausalLM", + "downloads": 584697, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "arxiv:1910.09700", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] --- # Model Card for Model ID ## Model Details ### Model Description This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - **Developed by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. 
[More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training Procedure #### Preprocessing [optional] [More Information Needed] #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Environmental Impact Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - **Hardware Type:** [More Information Needed] - **Hours used:** [More Information Needed] - **Cloud Provider:** [More Information Needed] - **Compute Region:** [More Information Needed] - **Carbon Emitted:** [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware [More Information Needed] #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]", + "model_explanation_gemini": "A tiny randomly initialized Llama model for causal language modeling, automatically generated without specific training details or intended use cases." +} \ No newline at end of file diff --git a/data/model_data_json/trpakov_vit-face-expression.json b/data/model_data_json/trpakov_vit-face-expression.json new file mode 100644 index 0000000000000000000000000000000000000000..071c11e47d120923461e6592fa1ad3ccf2d65284 --- /dev/null +++ b/data/model_data_json/trpakov_vit-face-expression.json @@ -0,0 +1,19 @@ +{ + "model_id": "trpakov/vit-face-expression", + "downloads": 7497287, + "tags": [ + "transformers", + "pytorch", + "onnx", + "safetensors", + "vit", + "image-classification", + "doi:10.57967/hf/2289", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 --- # Vision Transformer (ViT) for Facial Expression Recognition Model Card ## Model Overview - **Model Name:** trpakov/vit-face-expression - **Task:** Facial Expression/Emotion Recognition - **Dataset:** FER2013 - **Model Architecture:** Vision Transformer (ViT) - **Finetuned from model:** vit-base-patch16-224-in21k ## Model Description The vit-face-expression model is a Vision Transformer fine-tuned for the task of facial emotion recognition. It is trained on the FER2013 dataset, which consists of facial images categorized into seven different emotions: - Angry - Disgust - Fear - Happy - Sad - Surprise - Neutral ## Data Preprocessing The input images are preprocessed before being fed into the model. The preprocessing steps include: - **Resizing:** Images are resized to the specified input size. - **Normalization:** Pixel values are normalized to a specific range. - **Data Augmentation:** Random transformations such as rotations, flips, and zooms are applied to augment the training dataset. 
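The card lists no usage snippet, but given the `image-classification` tag, a hedged sketch with the standard transformers pipeline (the file path is a placeholder):

```python
# Hedged sketch: classify a face image into the seven FER2013 emotions.
from transformers import pipeline

classifier = pipeline("image-classification", model="trpakov/vit-face-expression")
preds = classifier("face.jpg")  # placeholder path to a local image
print(preds)  # e.g. [{"label": "happy", "score": ...}, ...]
```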
## Evaluation Metrics - **Validation set accuracy:** 0.7113 - **Test set accuracy:** 0.7116 ## Limitations - **Data Bias:** The model's performance may be influenced by biases present in the training data. - **Generalization:** The model's ability to generalize to unseen data is subject to the diversity of the training dataset.", + "model_explanation_gemini": "Recognizes seven facial expressions (angry, disgust, fear, happy, sad, surprise, neutral) from images using a Vision Transformer fine-tuned on the FER2013 dataset." +} \ No newline at end of file diff --git a/data/model_data_json/tsmatz_xlm-roberta-ner-japanese.json b/data/model_data_json/tsmatz_xlm-roberta-ner-japanese.json new file mode 100644 index 0000000000000000000000000000000000000000..54b4666cf152dfbfc1665e21c8d9275278c28e19 --- /dev/null +++ b/data/model_data_json/tsmatz_xlm-roberta-ner-japanese.json @@ -0,0 +1,23 @@ +{ + "model_id": "tsmatz/xlm-roberta-ner-japanese", + "downloads": 591852, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "xlm-roberta", + "token-classification", + "generated_from_trainer", + "ner", + "bert", + "ja", + "base_model:FacebookAI/xlm-roberta-base", + "base_model:finetune:FacebookAI/xlm-roberta-base", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ja license: mit tags: - generated_from_trainer - ner - bert metrics: - f1 widget: - text: 鈴井は4月の陽気の良い日に、鈴をつけて北海道のトムラウシへと登った - text: 中国では、中国共産党による一党統治が続く base_model: xlm-roberta-base model-index: - name: xlm-roberta-ner-ja results: [] --- # xlm-roberta-ner-japanese (Japanese caption : 日本語の固有表現抽出のモデル) This model is a fine-tuned version of xlm-roberta-base (pre-trained cross-lingual ) trained for named entity recognition (NER) token classification. The model is fine-tuned on NER dataset provided by Stockmark Inc, in which data is collected from Japanese Wikipedia articles.
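A hedged usage sketch for the NER model, assuming the standard token-classification pipeline (the example sentence is taken from the card's widget; `aggregation_strategy="simple"` merges subword tokens into entity spans):

```python
# Hedged sketch: Japanese NER with the fine-tuned XLM-RoBERTa model.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="tsmatz/xlm-roberta-ner-japanese",
    aggregation_strategy="simple",
)
print(ner("鈴井は4月の陽気の良い日に、鈴をつけて北海道のトムラウシへと登った"))
# expected spans: PER for 鈴井, LOC for 北海道 / トムラウシ (see the label table below)
```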
See here for the license of this dataset. Each token is labeled by : | Label id | Tag | Tag in Widget | Description | |---|---|---|---| | 0 | O | (None) | others or nothing | | 1 | PER | PER | person | | 2 | ORG | ORG | general corporation organization | | 3 | ORG-P | P | political organization | | 4 | ORG-O | O | other organization | | 5 | LOC | LOC | location | | 6 | INS | INS | institution, facility | | 7 | PRD | PRD | product | | 8 | EVT | EVT | event | ## Intended uses ## Training procedure You can download the source code for fine-tuning from here. ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 5e-05 - train_batch_size: 12 - eval_batch_size: 12 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 5 ### Training results | Training Loss | Epoch | Step | Validation Loss | F1 | |:-------------:|:-----:|:----:|:---------------:|:------:| | No log | 1.0 | 446 | 0.1510 | 0.8457 | | No log | 2.0 | 892 | 0.0626 | 0.9261 | | No log | 3.0 | 1338 | 0.0366 | 0.9580 | | No log | 4.0 | 1784 | 0.0196 | 0.9792 | | No log | 5.0 | 2230 | 0.0173 | 0.9864 | ### Framework versions - Transformers 4.23.1 - Pytorch 1.12.1+cu102 - Datasets 2.6.1 - Tokenizers 0.13.1", + "model_explanation_gemini": "Identifies and classifies named entities (persons, organizations, locations, etc.) in Japanese text using token-level labeling." +} \ No newline at end of file diff --git a/data/model_data_json/tuner007_pegasus_paraphrase.json b/data/model_data_json/tuner007_pegasus_paraphrase.json new file mode 100644 index 0000000000000000000000000000000000000000..a1b77ebfac3a8237d7f5fdffd1c92cb8e54bfcd4 --- /dev/null +++ b/data/model_data_json/tuner007_pegasus_paraphrase.json @@ -0,0 +1,19 @@ +{ + "model_id": "tuner007/pegasus_paraphrase", + "downloads": 198999, + "tags": [ + "transformers", + "pytorch", + "pegasus", + "text2text-generation", + "paraphrasing", + "seq2seq", + "en", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 tags: - pegasus - paraphrasing - seq2seq --- ## Model description PEGASUS fine-tuned for paraphrasing ## Model in Action 🚀 #### Example: > Created by Arpit Rajauria ![Twitter icon](", + "model_explanation_gemini": "Fine-tunes PEGASUS for generating paraphrased versions of input text." +} \ No newline at end of file diff --git a/data/model_data_json/uer_gpt2-chinese-cluecorpussmall.json b/data/model_data_json/uer_gpt2-chinese-cluecorpussmall.json new file mode 100644 index 0000000000000000000000000000000000000000..d2a1058c74ac03caa969caa6b8123fdeb51e2edf --- /dev/null +++ b/data/model_data_json/uer_gpt2-chinese-cluecorpussmall.json @@ -0,0 +1,22 @@ +{ + "model_id": "uer/gpt2-chinese-cluecorpussmall", + "downloads": 37130, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "gpt2", + "text-generation", + "zh", + "dataset:CLUECorpusSmall", + "arxiv:1909.05658", + "arxiv:2212.06385", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: zh datasets: CLUECorpusSmall widget: - text: \"米饭是一种用稻米与水煮成的食物\" --- # Chinese GPT2 Models ## Model description The set of GPT2 models, except for GPT2-xlarge model, are pre-trained by UER-py, which is introduced in this paper. 
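Before the pre-training details below, a minimal usage sketch for the "How to use" section, whose original snippet was stripped. This assumes the hosted tokenizer config resolves automatically through the pipeline; the card's original code may differ. The prompt is the widget text from the card:

```python
# Hedged sketch: Chinese text generation with the CLUECorpusSmall GPT2 models.
from transformers import pipeline

generator = pipeline("text-generation", model="uer/gpt2-chinese-cluecorpussmall")
print(generator("米饭是一种用稻米与水煮成的食物", max_length=50, do_sample=True))
```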
The GPT2-xlarge model is pre-trained by TencentPretrain introduced in this paper, which inherits UER-py to support models with parameters above one billion, and extends it to a multimodal pre-training framework. Besides, the other models could also be pre-trained by TencentPretrain. The model is used to generate Chinese texts. You can download the set of Chinese GPT2 models either from the UER-py Modelzoo page, or via HuggingFace from the links below: | | Link | | ----------------- | :----------------------------: | | **GPT2-distil** | [**L=6/H=768**][distil] | | **GPT2** | [**L=12/H=768**][base] | | **GPT2-medium** | [**L=24/H=1024**][medium] | | **GPT2-large** | [**L=36/H=1280**][large] | | **GPT2-xlarge** | [**L=48/H=1600**][xlarge] | Note that the 6-layer model is called GPT2-distil model because it follows the configuration of distilgpt2, and the pre-training does not involve the supervision of larger models. ## How to use You can use the model directly with a pipeline for text generation (take the case of GPT2-distil): ## Training data CLUECorpusSmall is used as training data. ## Training procedure The GPT2-xlarge model is pre-trained by TencentPretrain, and the others are pre-trained by UER-py on Tencent Cloud. We pre-train 1,000,000 steps with a sequence length of 128 and then pre-train 250,000 additional steps with a sequence length of 1024. For the models pre-trained by UER-py, take the case of GPT2-distil Stage1: Stage2: Finally, we convert the pre-trained model into Huggingface's format: For GPT2-xlarge model, we use TencetPretrain. Stage1: Before stage2, we extract fp32 consolidated weights from a zero 2 and 3 DeepSpeed checkpoints: Stage2: Then, we extract fp32 consolidated weights from a zero 2 and 3 DeepSpeed checkpoints: Finally, we convert the pre-trained model into Huggingface's format: ### BibTeX entry and citation info [distil]: [base]: [medium]: [large]: [xlarge]:", + "model_explanation_gemini": "Generates Chinese text based on pre-training with the CLUECorpusSmall dataset." +} \ No newline at end of file diff --git a/data/model_data_json/unitary_toxic-bert.json b/data/model_data_json/unitary_toxic-bert.json new file mode 100644 index 0000000000000000000000000000000000000000..bc01095928a515aadacc49c509087936c9cce29e --- /dev/null +++ b/data/model_data_json/unitary_toxic-bert.json @@ -0,0 +1,20 @@ +{ + "model_id": "unitary/toxic-bert", + "downloads": 503399, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "bert", + "text-classification", + "arxiv:1703.04009", + "arxiv:1905.12516", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 ---
**⚠️ Disclaimer:** The huggingface models currently give different results from the detoxify library (see issue here). For the most up-to-date models we recommend using the models from the detoxify repository. # 🙊 Detoxify ## Toxic Comment Classification with ⚡ Pytorch Lightning and 🤗 Transformers
!Examples image ## Description Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification. Built by Laura Hanu at Unitary, where we are working to stop harmful content online by interpreting visual content in context. Dependencies: - For inference: - 🤗 Transformers - ⚡ Pytorch lightning - For training will also need: - Kaggle API (to download data) | Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score | Detoxify Score |-|-|-|-|-|-|-| | Toxic Comment Classification Challenge | 2018 | build a multi-headed model that’s capable of detecting different types of of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | | 0.98856 | 0.98636 | Jigsaw Unintended Bias in Toxicity Classification | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | | 0.94734 | 0.93639 | Jigsaw Multilingual Toxic Comment Classification | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | | 0.9536 | 0.91655* *Score not directly comparable since it is obtained on the validation set provided and not on the test set. To update when the test labels are made available. It is also noteworthy to mention that the top leadearboard scores have been achieved using model ensembles. The purpose of this library was to build something user-friendly and straightforward to use. ## Limitations and ethical considerations If words that are associated with swearing, insults or profanity are present in a comment, it is likely that it will be classified as toxic, regardless of the tone or the intent of the author e.g. humorous/self-deprecating. This could present some biases towards already vulnerable minority groups. The intended use of this library is for research purposes, fine-tuning on carefully constructed datasets that reflect real world demographics and/or to aid content moderators in flagging out harmful content quicker. Some useful resources about the risk of different biases in toxicity or hate speech detection are: - The Risk of Racial Bias in Hate Speech Detection - Automated Hate Speech Detection and the Problem of Offensive Language - Racial Bias in Hate Speech and Abusive Language Detection Datasets ## Quick prediction The model has been trained on 7 different languages so it should only be tested on: , , , , , or . For more details check the Prediction section. ## Labels All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according the following schema: - **Very Toxic** (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective) - **Toxic** (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective) - **Hard to Say** - **Not Toxic** More information about the labelling schema can be found here. 
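The "Quick prediction" and "Importing detoxify in python" snippets were stripped from this card; a hedged sketch of the detoxify library's prediction API, assuming `pip install detoxify`:

```python
# Hedged sketch: per-label toxicity scores via the detoxify package.
from detoxify import Detoxify

results = Detoxify("original").predict("your example comment here")
print(results)  # dict of per-label scores, e.g. {"toxicity": 0.98, ...}
# Detoxify("multilingual") selects the model from the multilingual challenge.
```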
### Toxic Comment Classification Challenge This challenge includes the following labels: - - - - - - ### Jigsaw Unintended Bias in Toxicity Classification This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments. Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation. - - - - - - - Identity labels used: - - - - - - - - - A complete list of all the identity labels available can be found here. ### Jigsaw Multilingual Toxic Comment Classification Since this challenge combines the data from the previous 2 challenges, it includes all labels from above, however the final evaluation is only on: - ## How to run First, install dependencies ## Prediction Trained models summary: |Model name| Transformer type| Data from |:--:|:--:|:--:| || | Toxic Comment Classification Challenge || | Unintended Bias in Toxicity Classification || | Multilingual Toxic Comment Classification For a quick prediction can run the example script on a comment directly or from a txt containing a list of comments. Checkpoints can be downloaded from the latest release or via the Pytorch hub API with the following names: - - - Importing detoxify in python: ## Training If you do not already have a Kaggle account: - you need to create one to be able to download the data - go to My Account and click on Create New API Token - this will download a kaggle.json file - make sure this file is located in ~/.kaggle ## Start Training ### Toxic Comment Classification Challenge ### Unintended Bias in Toxicicity Challenge ### Multilingual Toxic Comment Classification This is trained in 2 stages. First, train on all available data, and second, train only on the translated versions of the first challenge. The translated data can be downloaded from Kaggle in french, spanish, italian, portuguese, turkish, and russian (the languages available in the test set). ### Monitor progress with tensorboard ## Model Evaluation ### Toxic Comment Classification Challenge This challenge is evaluated on the mean AUC score of all the labels. ### Unintended Bias in Toxicicity Challenge This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric here. ### Multilingual Toxic Comment Classification This challenge is evaluated on the AUC score of the main toxic label. ### Citation", + "model_explanation_gemini": "Classifies toxic comments across multiple languages and toxicity types while addressing unintended bias in toxicity detection." 
+} \ No newline at end of file diff --git a/data/model_data_json/universitytehran_PersianMind-v1.0.json b/data/model_data_json/universitytehran_PersianMind-v1.0.json new file mode 100644 index 0000000000000000000000000000000000000000..1b4de2ae29cfad6fc8cb0aefaa186d39477f697d --- /dev/null +++ b/data/model_data_json/universitytehran_PersianMind-v1.0.json @@ -0,0 +1,22 @@ +{ + "model_id": "universitytehran/PersianMind-v1.0", + "downloads": 266947, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "llama", + "text-generation", + "text-generation-inference", + "multilingual", + "fa", + "en", + "arxiv:2401.06466", + "license:cc-by-nc-sa-4.0", + "co2_eq_emissions", + "autotrain_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-sa-4.0 language: - multilingual - fa - en library_name: transformers tags: - text-generation-inference inference: false metrics: - bleu - comet - accuracy - perplexity - spearmanr pipeline_tag: text-generation co2_eq_emissions: emissions: 232380 source: \"PersianMind: A Cross-Lingual Persian-English Large Language Model. training_type: \"fine-tuning\" hardware_used: \"4 RTX3090 24GB GPUs\" geographical_location: \"Tehran, Iran\" ---
# PersianMind PersianMind is a cross-lingual Persian-English large language model. The model achieves state-of-the-art results on Persian subset of the Belebele benchmark and the ParsiNLU multiple-choice QA task. It also attains performance comparable to GPT-3.5-turbo in a Persian reading comprehension task. ## Model Description - **Developed by:** Pedram Rostami, Ali Salemi, and Mohammad Javad Dousti - **Model type:** Language model - **Languages:** English and Persian - **License:** CC BY-NC-SA 4.0 (non-commercial use only.) ## How to Get Started with the Model Use the code below to get started with the model. Note that you need to install sentencepiece and accelerate libraries along with PyTorch and 🤗Transformers to run this code. ### How to Quantize the Model Quantized models can be run on resource-constrained devices. To quantize the model, you should install the bitsandbytes library. In order to quantize the model in 8-bit (), use the code below. Alternatively, you can quantize the model in 4-bit () with the following code. ### Evaluating Quantized Models | Model | Belebele (Persian) | Fa→En Translation
(Comet) | En→Fa Translation
(Comet) | Model Size | Tokens/sec | | :----------------------------------------------------------------: | :--------------------------------------------------------------: | :------------------------------------------------------------------------: | :------------------------------------------------------------------------: | :--------: | :--------: | | PersianMind () | 73.9 | 83.61 | 79.44 | 13.7G | 25.35 | | PersianMind () | 73.7 | 82.32 | 78.61 | 7.2G | 11.36 | | PersianMind () | 70.2 | 82.07 | 80.36 | 3.9G | 24.36 | We evaluated quantized models in various tasks against the original model. Specifically, we evaluated all models using the reading comprehension multiple-choice question-answering benchmark of Belebele (Persian subset) and reported the accuracy of each model. Additionally, we evaluated our models for Persian-to-English and English-to-Persian translation tasks. For this, we utilized the Persian-English subset of the Flores-200 dataset and reported our results using the Comet metric. Furthermore, we calculated the average number of generated tokens per second by each model during running the translation tasks. To understand resource efficiency, we measured the memory usage of each model by employing the function. ## License PersianMind is subject to Meta's LLaMa2 Community License. It is further licensed under CC BY-NC-SA 4.0, which allows non-commercial use of the model. Commercial use of this model requires written agreement which must be obtained from the copyright holders who are listed as developers in this page. If you suspect any violations, please reach out to us. ## Citation If you find this model helpful, please ensure to cite the following paper. **BibTeX:**", + "model_explanation_gemini": "PersianMind-v1.0 is a cross-lingual Persian-English large language model optimized for text generation, translation, and question-answering tasks, achieving state-of-the-art performance on Persian benchmarks." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..9cdd535b535ed3d9b4f2580bf7cbd65748703bac --- /dev/null +++ b/data/model_data_json/unsloth_DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit.json @@ -0,0 +1,27 @@ +{ + "model_id": "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit", + "downloads": 113177, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "deepseek", + "unsloth", + "llama-3", + "meta", + "conversational", + "en", + "base_model:deepseek-ai/DeepSeek-R1-Distill-Llama-8B", + "base_model:quantized:deepseek-ai/DeepSeek-R1-Distill-Llama-8B", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B language: - en license: llama3.1 library_name: transformers tags: - deepseek - unsloth - transformers - llama - llama-3 - meta --- ## ***See our collection for versions of Deepseek-R1 including GGUF and original formats.*** *Dynamic 4-bit: Unsloth's Dynamic 4-bit Quants selectively avoids quantizing certain parameters, greatly increase accuracy than standard 4-bit.
See our full collection of Unsloth quants on Hugging Face here.*
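A hedged loading sketch for this pre-quantized checkpoint: the repo ships bitsandbytes 4-bit weights, so loading with transformers (with `bitsandbytes` and `accelerate` installed) should pick up the quantization config automatically.

```python
# Hedged sketch: load the dynamic 4-bit weights directly.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```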
# Finetune LLMs 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab-Vision.ipynb) | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab-Vision.ipynb) | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook-Conversational.ipynb) is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook-Text_Completion.ipynb) is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the DeepSeek team for creating and releasing these models. ## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | |DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly change their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
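Illustrative only: with 64 sampled responses per query, pass@1 is estimated as the mean fraction of correct samples, averaged over queries. A minimal sketch of that estimator:

```python
# pass@1 estimate: mean correctness over k=64 samples, averaged over queries.
def pass_at_1(per_query_correct: list[list[bool]]) -> float:
    # per_query_correct[i] holds one boolean per sampled response for query i
    return sum(sum(c) / len(c) for c in per_query_correct) / len(per_query_correct)
```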
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
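The "How to Run Locally" section below mentions starting a service with vLLM; a hedged offline-inference sketch using the sampling settings the card recommends (temperature 0.5-0.7, reasoning directive in the user prompt, no system prompt):

```python
# Hedged sketch: offline inference with vLLM for the distilled 8B model.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B", max_model_len=32768)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)
prompt = "Please reason step by step, and put your final answer within \\boxed{}. What is 7 * 8?"
# For chat formatting, apply the model's chat template first (or use llm.chat in recent vLLM).
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```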
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button \"DeepThink\" We also provide OpenAI-Compatible API at DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM: **NOTE: We recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models, otherwise you may encounter issues with endless repetition or incoherent output.** ## 7. License This code repository and the model weights are licensed under the MIT License. DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from Qwen-2.5 series, which are originally licensed under Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "A 4-bit quantized version of the DeepSeek-R1-Distill-Llama-8B model optimized for faster fine-tuning and reduced memory usage while maintaining accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_DeepSeek-R1-GGUF.json b/data/model_data_json/unsloth_DeepSeek-R1-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..71fb44b3406c10c85609c863b48d8089123a60e8 --- /dev/null +++ b/data/model_data_json/unsloth_DeepSeek-R1-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "unsloth/DeepSeek-R1-GGUF", + "downloads": 581842, + "tags": [ + "transformers", + "gguf", + "deepseek_v3", + "text-generation", + "deepseek", + "unsloth", + "custom_code", + "en", + "arxiv:2501.12948", + "base_model:deepseek-ai/DeepSeek-R1", + "base_model:quantized:deepseek-ai/DeepSeek-R1", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- base_model: deepseek-ai/DeepSeek-R1 language: - en library_name: transformers license: mit tags: - deepseek - unsloth - transformers --- Or you can view more detailed instructions here: unsloth.ai/blog/deepseekr1-dynamic 1. Do not forget about and tokens! - Or use a chat template formatter 2. Obtain the latest at You can follow the build instructions below as well: 3. It's best to use to counteract very rare token predictions - I found this to work well especially for the 1.58bit model. 4. Download the model via: 5. Example with Q4_0 K quantized cache **Notice -no-cnv disables auto conversation mode** Example output: 6. If you have a GPU (RTX 4090 for example) with 24GB, you can offload multiple layers to the GPU for faster processing. If you have multiple GPUs, you can probably offload more layers. 7. 
If you want to merge the weights together, use this script: | MoE Bits | Type | Disk Size | Accuracy | Link | Details | | -------- | -------- | ------------ | ------------ | ---------------------| ---------- | | 1.58bit | UD-IQ1_S | **131GB** | Fair | Link | MoE all 1.56bit. in MoE mixture of 2.06/1.56bit | | 1.73bit | UD-IQ1_M | **158GB** | Good | Link | MoE all 1.56bit. in MoE left at 2.06bit | | 2.22bit | UD-IQ2_XXS | **183GB** | Better | Link | MoE all 2.06bit. in MoE mixture of 2.5/2.06bit | | 2.51bit | UD-Q2_K_XL | **212GB** | Best | Link | MoE all 2.5bit. in MoE mixture of 3.5/2.5bit | # Finetune your own Reasoning model like R1 with Unsloth! We have a free Google Colab notebook for turning Llama 3.1 (8B) into a reasoning model: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **GRPO with Phi-4 (14B)** | ▶️ Start on Colab-GRPO.ipynb) | 2x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab-Vision.ipynb) | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab-Vision.ipynb) | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook-Conversational.ipynb) is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook-Text_Completion.ipynb) is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the DeepSeek team for creating and releasing these models. # DeepSeek-R1
\"DeepSeek-V3\"

👁️

## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.**
## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. ## 3. Model Downloads ### DeepSeek-R1 Models
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace | | DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository. ### DeepSeek-R1-Distill Models
| **Model** | **Base Model** | **Download** | | :------------: | :------------: | :------------: | | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace | |DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace | | DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly change their configs and tokenizers. Please use our settings to run these models. ## 4. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |
## 5. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button \"DeepThink\" We also provide OpenAI-Compatible API at DeepSeek Platform: platform.deepseek.com ## 6. How to Run Locally ### DeepSeek-R1 Models Please visit DeepSeek-V3 repo for more information about running DeepSeek-R1 locally. ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM: You can also easily start a service using SGLang ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\" 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. ## 7. License This code repository and the model weights are licensed under the MIT License. DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from Qwen-2.5 series, which are originally licensed under Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1. - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under llama3.1 license. - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under llama3.3 license. ## 8. Citation ## 9. Contact If you have any questions, please raise an issue or contact us at service@deepseek.com.", + "model_explanation_gemini": "A quantized version of DeepSeek-R1 optimized for improved accuracy through selective quantization, designed for efficient inference and reasoning tasks." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_DeepSeek-V3-0324-GGUF.json b/data/model_data_json/unsloth_DeepSeek-V3-0324-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..79e8abe4cb5f85bfb58875d577c63eb5a1227e82 --- /dev/null +++ b/data/model_data_json/unsloth_DeepSeek-V3-0324-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "unsloth/DeepSeek-V3-0324-GGUF", + "downloads": 78841, + "tags": [ + "transformers", + "gguf", + "deepseek_v3", + "text-generation", + "deepseek", + "unsloth", + "custom_code", + "en", + "arxiv:2412.19437", + "base_model:deepseek-ai/DeepSeek-V3-0324", + "base_model:quantized:deepseek-ai/DeepSeek-V3-0324", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "fp8", + "region:us", + "conversational" + ], + "description": "--- base_model: deepseek-ai/DeepSeek-V3-0324 language: - en library_name: transformers license: mit tags: - deepseek_v3 - deepseek - unsloth - transformers --- Our DeepSeek-V3-0324 GGUFs allow you to run the model in llama.cpp, LMStudio, Open WebUI and other inference frameworks. 
Includes 1-4-bit Dynamic versions, which yields better accuracy and results than standard quantization. | MoE Bits | Type | Disk Size | Accuracy | Link | Details | |----------|----------|-------------|----------|------------------------------------------------------------------------------------------------------------|---------------------------------------------------| | 1.78bit (prelim) | IQ1_S | **186GB** | Ok | Link | in MoE mixture of 2.06/1.78bit | | 1.93bit (prelim) | IQ1_M | **196GB** | Fair | Link | in MoE mixture of 2.06/1.93bit | | 2.42bit | IQ2_XXS | **219GB** | Recommended | Link | in MoE all 2.42bit | | 2.71bit | Q2_K_XL | **248GB** | Recommended | Link | in MoE mixture of 3.5/2.71bit | | 3.5bit | Q3_K_XL | **321GB** | Great | Link | in MoE mixture of 4.5/3.5bit | | 4.5bit | Q4_K_XL | **405GB** | Best | Link | in MoE mixture of 5.5/4.5bit | Prelim = preliminary - through our testing, they're generally fine but sometimes don't produce the best code and so more work/testing needs to be done. 2.71bit was found to be the best in terms of performance/size and produces code that is great and works well. 2.42bit was also found to pass all our tests. So, for best results, use the 2.42-bit (IQ2_XXS) or 2.71-bit (Q2_K_XL) versions. Though not a must, try to have at least 180GB+ combined VRAM + RAM. Thank you to the DeepSeek team for releasing their March update to the DeepSeek V3 models. Also thank you to bartowski for providing imatric V3 quants. # Finetune your own Reasoning model like R1 with Unsloth! We have a free Google Colab notebook for turning Llama 3.1 (8B) into a reasoning model: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **GRPO with Phi-4 (14B)** | ▶️ Start on Colab-GRPO.ipynb) | 2x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab-Vision.ipynb) | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab-Vision.ipynb) | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.2x faster | 62% less |
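A hedged sketch of running one of the GGUF quants from the table above via llama-cpp-python (the card names llama.cpp among the supported frameworks; the shard filename is hypothetical, and multi-file quants are loaded by pointing at the first shard):

```python
# Hedged sketch: local inference over a downloaded GGUF quant.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-V3-0324-UD-Q2_K_XL-00001-of-00006.gguf",  # hypothetical shard name
    n_gpu_layers=20,  # offload what fits in VRAM; 0 = CPU only
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.3,  # the card's recommended model temperature
)
print(out["choices"][0]["message"]["content"])
```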
\"DeepSeek-V3\"

## Features DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects. !Model Performance ### Reasoning Capabilities - Significant improvements in benchmark performance: - MMLU-Pro: 75.9 → 81.2 (+5.3) - GPQA: 59.1 → 68.4 (+9.3) - AIME: 39.6 → 59.4 (+19.8) - LiveCodeBench: 39.2 → 49.2 (+10.0) ### Front-End Web Development - Improved the executability of the code - More aesthetically pleasing web pages and game front-ends ### Chinese Writing Proficiency - Enhanced style and content quality: - Aligned with the R1 writing style - Better quality in medium-to-long-form writing - Feature Enhancements - Improved multi-turn interactive rewriting - Optimized translation quality and letter writing ### Chinese Search Capabilities - Enhanced report analysis requests with more detailed outputs ### Function Calling Improvements - Increased accuracy in Function Calling, fixing issues from previous V3 versions --- ## Usage Recommendations ### System Prompt In the official DeepSeek web/app, we use the same system prompt with a specific date. For example, ### Temperature In our web and application environments, the temperature parameter $T_{model}$ is set to 0.3. Because many users use the default temperature 1.0 in API call, we have implemented an API temperature $T_{api}$ mapping mechanism that adjusts the input API temperature value of 1.0 to the most suitable model temperature setting of 0.3. $$ T_{model} = T_{api} \\times 0.3 \\quad (0 \\leq T_{api} \\leq 1) $$ $$ T_{model} = T_{api} - 0.7 \\quad (1 < T_{api} \\leq 2) $$ Thus, if you call V3 via API, temperature 1.0 equals to the model temperature 0.3. ### Prompts for File Uploading and Web Search For file uploading, please follow the template to create prompts, where {file_name}, {file_content} and {question} are arguments. For Web Search, {search_results}, {cur_date}, and {question} are arguments. For Chinese query, we use the prompt: For English query, we use the prompt: ## How to Run Locally The model structure of DeepSeek-V3-0324 is exactly the same as DeepSeek-V3. Please visit DeepSeek-V3 repo for more information about running this model locally. **This model supports features such as function calling, JSON output, and FIM completion. For instructions on how to construct prompts to use these features, please refer to DeepSeek-V2.5 repo.** **NOTE: Hugging Face's Transformers has not been directly supported yet.** ## License This repository and the model weights are licensed under the MIT License. ## Citation ## Contact If you have any questions, please raise an issue or contact us at service@deepseek.com." 
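The API-to-model temperature mapping above transcribes directly into code; a minimal sketch:

```python
# Direct transcription of the card's API -> model temperature mapping.
def map_api_temperature(t_api: float) -> float:
    if 0.0 <= t_api <= 1.0:
        return t_api * 0.3
    if 1.0 < t_api <= 2.0:
        return t_api - 0.7
    raise ValueError("API temperature must be in [0, 2]")

assert abs(map_api_temperature(1.0) - 0.3) < 1e-9  # default API temp 1.0 -> model temp 0.3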
+} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-3.2-11B-Vision-Instruct-bnb-4bit.json b/data/model_data_json/unsloth_Llama-3.2-11B-Vision-Instruct-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..2366de575d2b0f5d2d07b01cbac6bf9992f1eebf --- /dev/null +++ b/data/model_data_json/unsloth_Llama-3.2-11B-Vision-Instruct-bnb-4bit.json @@ -0,0 +1,30 @@ +{ + "model_id": "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit", + "downloads": 204667, + "tags": [ + "transformers", + "safetensors", + "mllama", + "image-text-to-text", + "llama-3", + "llama", + "meta", + "facebook", + "unsloth", + "multimodal", + "vision", + "pytorch", + "conversational", + "en", + "base_model:meta-llama/Llama-3.2-11B-Vision-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-11B-Vision-Instruct", + "license:llama3.2", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.2-11B-Vision-Instruct language: - en library_name: transformers license: llama3.2 tags: - llama-3 - llama - meta - facebook - unsloth - transformers - multimodal - vision - pytorch --- ## ***See our collection for vision models including Llama 3.2, Llava, Qwen2-VL and Pixtral.*** # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.2 Vision (11B) here: # unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. 
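A hedged loading sketch for this 4-bit vision checkpoint, following the pattern of Meta's reference snippet for Llama 3.2 Vision (the image path is a placeholder; the pre-quantized repo should supply its own bitsandbytes config):

```python
# Hedged sketch: image + text chat with the mllama architecture.
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(Image.open("photo.jpg"), prompt, return_tensors="pt").to(model.device)
print(processor.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```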
**Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "An optimized, 4-bit quantized version of Meta's Llama-3.2-11B-Vision-Instruct model for efficient multimodal vision-and-language tasks, fine-tuned using Unsloth for faster performance and reduced memory usage." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..beae2ed71e077ca5f1b0c21cd6707a85423ac33f --- /dev/null +++ b/data/model_data_json/unsloth_Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit.json @@ -0,0 +1,29 @@ +{ + "model_id": "unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit", + "downloads": 128744, + "tags": [ + "transformers", + "safetensors", + "mllama", + "image-text-to-text", + "llama-3", + "llama", + "meta", + "facebook", + "unsloth", + "multimodal", + "vision", + "conversational", + "en", + "base_model:meta-llama/Llama-3.2-11B-Vision-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-11B-Vision-Instruct", + "license:llama3.2", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.2-11B-Vision-Instruct language: - en library_name: transformers license: llama3.2 tags: - llama-3 - llama - meta - facebook - unsloth - transformers - multimodal - vision --- ### ***Unsloth's Dynamic 4-bit Quants selectively avoid quantizing certain parameters, greatly improving accuracy while keeping VRAM usage similar to BnB 4-bit.
See our full collection of Unsloth quants on Hugging Face here.***
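Because the quantization config is stored inside a pre-quantized bnb-4bit checkpoint, loading it needs no extra quantization arguments. A rough sketch with Transformers (class names follow the `mllama` tag above; assumes a recent transformers, bitsandbytes, and a CUDA GPU, and the image URL is a placeholder):

```python
import requests
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto")  # 4-bit config read from the repo
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)  # placeholder URL
messages = [{"role": "user", "content": [{"type": "image"},
                                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
print(processor.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```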
# Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.2 Vision (11B) here: # unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). 
Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "A 4-bit quantized version of Meta's Llama-3.2-11B-Vision-Instruct model optimized for faster multilingual multimodal tasks (vision and text) with reduced memory usage while maintaining accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-3.2-1B-Instruct-GGUF.json b/data/model_data_json/unsloth_Llama-3.2-1B-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..882667fd8bfd0f0d9572e815c61e8f14ed40587f --- /dev/null +++ b/data/model_data_json/unsloth_Llama-3.2-1B-Instruct-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "unsloth/Llama-3.2-1B-Instruct-GGUF", + "downloads": 255223, + "tags": [ + "transformers", + "gguf", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "en", + "base_model:meta-llama/Llama-3.2-1B-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-1B-Instruct", + "license:llama3.2", + "autotrain_compatible", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- base_model: meta-llama/Llama-3.2-1B-Instruct language: - en library_name: transformers license: llama3.2 tags: - llama-3 - llama - meta - facebook - unsloth - transformers --- ## ***See our collection for all versions of Llama 3.2 including GGUF, 4-bit and original 16-bit formats.*** # GGUF uploads 16bit, 8bit, 6bit, 5bit, 4bit, 3bit and 2bit uploads avaliable. # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.2 (3B) here: # unsloth/Llama-3.2-1B-Instruct For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (11B vision)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). 
The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "A 1-billion-parameter instruction-tuned multilingual language model optimized for dialogue, retrieval, and summarization tasks, available in various quantization formats for efficient deployment." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-3.2-1B-Instruct.json b/data/model_data_json/unsloth_Llama-3.2-1B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..31093761ff8bea15594680b830523fd598e7e8f4 --- /dev/null +++ b/data/model_data_json/unsloth_Llama-3.2-1B-Instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/Llama-3.2-1B-Instruct", + "downloads": 105675, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "conversational", + "en", + "base_model:meta-llama/Llama-3.2-1B-Instruct", + "base_model:finetune:meta-llama/Llama-3.2-1B-Instruct", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.2-1B-Instruct language: - en library_name: transformers license: llama3.2 tags: - llama-3 - llama - meta - facebook - unsloth - transformers --- ## ***See our collection for all versions of Llama 3.2 including GGUF, 4-bit and original 16-bit formats.*** # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.2 (3B) here: # unsloth/Llama-3.2-1B-Instruct For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! 
Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (11B vision)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "An instruction-tuned multilingual language model optimized for dialogue, retrieval, and summarization tasks, offering faster finetuning with reduced memory usage." 
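Under the hood, the "add your dataset, click Run All" flow in these notebooks reduces to a few Unsloth calls. A condensed sketch (hyperparameters are illustrative defaults, not the notebooks' exact settings):

```python
from unsloth import FastLanguageModel  # pip install unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit base weights; LoRA adapters are trained on top
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank, illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)
# From here the notebooks hand `model` and `tokenizer` to a TRL SFTTrainer and,
# after training, export via Unsloth's GGUF/vLLM/Hub saving helpers.
```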
+} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-3.2-1B.json b/data/model_data_json/unsloth_Llama-3.2-1B.json new file mode 100644 index 0000000000000000000000000000000000000000..b9696c34e70b6c0cd8824c873db7d8729cb3b715 --- /dev/null +++ b/data/model_data_json/unsloth_Llama-3.2-1B.json @@ -0,0 +1,24 @@ +{ + "model_id": "unsloth/Llama-3.2-1B", + "downloads": 695056, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "en", + "base_model:meta-llama/Llama-3.2-1B", + "base_model:finetune:meta-llama/Llama-3.2-1B", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.2-1B language: - en library_name: transformers license: llama3.2 tags: - llama-3 - llama - meta - facebook - unsloth - transformers --- ## ***See our collection for all versions of Llama 3.2 including GGUF, 4-bit and original 16-bit formats.*** # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.2 (1B) here: # Llama-3.2-1B For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (11B vision)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. 
Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "An optimized 1B-parameter multilingual LLM for efficient fine-tuning, supporting conversational tasks and text completion with reduced memory usage and faster performance." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-3.2-3B-Instruct-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_Llama-3.2-3B-Instruct-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..628dfe84ec45dd305348e4fdc915ee4f6d105ec5 --- /dev/null +++ b/data/model_data_json/unsloth_Llama-3.2-3B-Instruct-unsloth-bnb-4bit.json @@ -0,0 +1,27 @@ +{ + "model_id": "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit", + "downloads": 149965, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "conversational", + "en", + "base_model:meta-llama/Llama-3.2-3B-Instruct", + "base_model:quantized:meta-llama/Llama-3.2-3B-Instruct", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.2-3B-Instruct language: - en library_name: transformers license: llama3.2 tags: - llama-3 - llama - meta - facebook - unsloth - transformers --- We have a free Google Colab Tesla T4 notebook for Llama 3.2 (3B) here: # unsloth/Llama-3.2-3B-unsloth-bnb-4bit For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. 
| Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|-----------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README.
For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "A 4-bit quantized version of Meta's Llama-3.2-3B-Instruct model optimized by Unsloth for faster fine-tuning and reduced memory usage while maintaining improved accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-3.2-3B-Instruct.json b/data/model_data_json/unsloth_Llama-3.2-3B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..65c84c5f57a20ec78e4d95291df4fd321a9d9e0e --- /dev/null +++ b/data/model_data_json/unsloth_Llama-3.2-3B-Instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/Llama-3.2-3B-Instruct", + "downloads": 115503, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "conversational", + "en", + "base_model:meta-llama/Llama-3.2-3B-Instruct", + "base_model:finetune:meta-llama/Llama-3.2-3B-Instruct", + "license:llama3.2", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.2-3B-Instruct language: - en library_name: transformers license: llama3.2 tags: - llama-3 - llama - meta - facebook - unsloth - transformers --- ## ***See our collection for all versions of Llama 3.2 including GGUF, 4-bit and original 16-bit formats.*** # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.2 (3B) here: # unsloth/Llama-3.2-3B-Instruct For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (11B vision)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. 
**Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "An instruction-tuned multilingual language model optimized for dialogue, retrieval, and summarization tasks, offering faster finetuning with reduced memory usage." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Llama-4-Scout-17B-16E-Instruct-GGUF.json b/data/model_data_json/unsloth_Llama-4-Scout-17B-16E-Instruct-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..ec73c15683b5f92744d02425cc18212cd4d96624 --- /dev/null +++ b/data/model_data_json/unsloth_Llama-4-Scout-17B-16E-Instruct-GGUF.json @@ -0,0 +1,38 @@ +{ + "model_id": "unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF", + "downloads": 193664, + "tags": [ + "transformers", + "gguf", + "llama4", + "image-text-to-text", + "facebook", + "unsloth", + "meta", + "pytorch", + "llama", + "llama-4", + "ar", + "de", + "en", + "es", + "fr", + "hi", + "id", + "it", + "pt", + "th", + "tl", + "vi", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-4-Scout-17B-16E-Instruct", + "base_model:quantized:meta-llama/Llama-4-Scout-17B-16E-Instruct", + "license:other", + "endpoints_compatible", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- library_name: transformers language: - ar - de - en - es - fr - hi - id - it - pt - th - tl - vi base_model: - meta-llama/Llama-4-Scout-17B-16E-Instruct tags: - facebook - unsloth - meta - pytorch - llama - llama-4 extra_gated_prompt: >- **LLAMA 4 COMMUNITY LICENSE AGREEMENT** Llama 4 Version Effective Date: April 5, 2025 \"**Agreement**\" means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein. 
\"**Documentation**\" means the specifications, manuals and documentation accompanying Llama 4 distributed by Meta at \"**Licensee**\" or \"**you**\" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf. \"**Llama 4**\" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at \"**Llama Materials**\" means, collectively, Meta’s proprietary Llama 4 and Documentation (and any portion thereof) made available under this Agreement. \"**Meta**\" or \"**we**\" means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland). By clicking \"I Accept\" below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement. 1\\. **License Rights and Redistribution**. a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials. b. Redistribution and Use. i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display \"Built with Llama\" on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include \"Llama\" at the beginning of any such AI model name. ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a \"Notice\" text file distributed as a part of such copies: \"Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.\" iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at which is hereby incorporated by reference into this Agreement. 2\\. **Additional Commercial Terms**. 
If, on the Llama 4 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. 3\\. **Disclaimer of Warranty**. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN \"AS IS\" BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS. 4\\. **Limitation of Liability**. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 5\\. **Intellectual Property**. a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use \"Llama\" (the \"Mark\") solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at All goodwill arising out of your use of the Mark will inure to the benefit of Meta. b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 4 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials. 6\\. **Term and Termination**. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials.
Sections 3, 4 and 7 shall survive the termination of this Agreement. 7\\. **Governing Law and Jurisdiction**. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. extra_gated_fields: First Name: text Last Name: text Date of birth: date_picker Country: country Affiliation: text Job title: type: select options: - Student - Research Graduate - AI researcher - AI developer/engineer - Reporter - Other geo: ip_location By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox extra_gated_description: >- The information you provide will be collected, stored, processed and shared in accordance with the Meta Privacy Policy. extra_gated_button_content: Submit extra_gated_heading: \"Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate.\" license: other license_name: llama4 ---
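Since the repo sits behind the gated license form above, downloads only work with a token from an account that has accepted the terms. A minimal sketch with huggingface_hub (the token and file name are placeholders):

```python
from huggingface_hub import hf_hub_download, login

login(token="hf_...")  # placeholder; the account must have accepted the Llama 4 license
path = hf_hub_download(
    repo_id="unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF",
    filename="README.md",  # illustrative file; GGUF shards are fetched the same way
)
print(path)
```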

Read our Llama 4 guide to see how to Fine-tune & Run Llama 4 correctly.

|MoE Bits|Type|Disk Size|HF Link|Accuracy| |:-|:-|:-|:-|:-| |1.78-bit|IQ1\\_S|**33.8GB**|Link|Ok| |1.93-bit|IQ1\\_M|**35.4GB**|Link|Fair| |2.42-bit|IQ2\\_XXS|**38.6GB**|Link|Better| |2.71-bit|Q2\\_K\\_XL|**42.2GB**|Link|Suggested| |3.5-bit|Q3\\_K\\_XL|**52.9GB**|Link|Great| |4.5-bit|Q4\\_K\\_XL|**65.6GB**|Link|Best| Currently, only text is supported. **Chat template/prompt format:** # 🦙 Fine-tune Meta's Llama 4 with Unsloth! - Fine-tune Llama-4-Scout on a single H100 80GB GPU using Unsloth! - Read our Blog about Llama 4 support: unsloth.ai/blog/llama4 - View the rest of our notebooks in our docs here. - Export your fine-tuned model to GGUF, Ollama, llama.cpp, vLLM or 🤗HF. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|-----------------|-------------|----------| | **GRPO with Llama 3.1 (8B)** | ▶️ Start on Colab | 2x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 50% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less |
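For a multi-file MoE release like this, it is usually cheaper to pull only one quant from the table rather than the whole repo. A sketch using huggingface_hub's pattern filtering (the glob assumes the files carry the quant name, as the table's Type column suggests):

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF",
    allow_patterns=["*Q2_K_XL*"],  # the 2.71-bit "Suggested" quant, ~42.2GB per the table above
    local_dir="llama4-scout-gguf",
)
```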
## Llama 4 Model Information The Llama 4 collection of models comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series: Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts. **Model developer**: Meta **Model Architecture:** The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality.
| Model Name | Training Data | Params | Input modalities | Output modalities | Context length | Token count | Knowledge cutoff |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Llama 4 Scout (17Bx16E) | A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. Learn more in our . | 17B (Activated) 109B (Total) | Multilingual text and image | Multilingual text and code | 10M | ~40T | August 2024 |
| Llama 4 Maverick (17Bx128E) | | 17B (Activated) 400B (Total) | Multilingual text and image | Multilingual text and code | 1M | ~22T | August 2024 |
**Supported languages:** Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. **Model Release Date:** April 5, 2025 **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models may be released as we improve model behavior with community feedback. **License**: A custom commercial license, the Llama 4 Community License Agreement, is available at: **Where to send questions or comments about the model:** Instructions on how to provide feedback or comments on the model can be found in the Llama README. For more technical information about generation parameters and recipes for how to use Llama 4 in applications, please go here. ## Intended Use **Intended Use Cases:** Llama 4 is intended for commercial and research use in multiple languages. Instruction tuned models are intended for assistant-like chat and visual reasoning tasks, whereas pretrained models can be adapted for natural language generation. For vision, Llama 4 models are also optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The Llama 4 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 4 Community License allows for these use cases. **Out-of-scope**: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 4 Community License. Use in languages or capabilities beyond those explicitly referenced as supported in this model card\\*\\*. \\*\\*Note: 1\\. Llama 4 has been trained on a broader collection of languages than the 12 supported languages (pre-training includes 200 total languages). Developers may fine-tune Llama 4 models for languages beyond the 12 supported languages provided they comply with the Llama 4 Community License and the Acceptable Use Policy. Developers are responsible for ensuring that their use of Llama 4 in additional languages is done in a safe and responsible manner. 2\\. Llama 4 has been tested for image understanding up to 5 input images. If leveraging additional image understanding capabilities beyond this, developers are responsible for ensuring that their deployments are mitigated for risks and should perform additional testing and tuning tailored to their specific applications. ## How to use with transformers Please make sure you have transformers installed, or upgrade using . ## Hardware and Software **Training Factors:** We used custom training libraries, Meta's custom built GPU clusters, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure. **Training Energy Use:** Model pre-training utilized a cumulative **7.38M** GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **1,999 tons** CO2eq for training.
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with clean and renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq. | Model Name | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) | | :---- | :---: | :---: | :---: | :---: | | Llama 4 Scout | 5.0M | 700 | 1,354 | 0 | | Llama 4 Maverick | 2.38M | 700 | 645 | 0 | | Total | 7.38M | \\- | 1,999 | 0 | ## The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 4 Scout was pretrained on \\~40 trillion tokens and Llama 4 Maverick was pretrained on \\~22 trillion tokens of multimodal data from a mix of publicly available, licensed data and information from Meta’s products and services. This includes publicly shared posts from Instagram and Facebook and people’s interactions with Meta AI. **Data Freshness:** The pretraining data has a cutoff of August 2024\\. ## Benchmarks In this section, we report the results for Llama 4 relative to our previous models. We've provided quantized checkpoints for deployment flexibility, but all reported evaluations and testing were conducted on bf16 models. ### Pre-trained models | Pre-trained models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Category | Benchmark | \\# Shots | Metric | Llama 3.1 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Reasoning & Knowledge | MMLU | 5 | macro\\_avg/acc\\_char | 79.3 | 85.2 | 79.6 | 85.5 | | | MMLU-Pro | 5 | macro\\_avg/em | 53.8 | 61.6 | 58.2 | 62.9 | | | MATH | 4 | em\\_maj1@1 | 41.6 | 53.5 | 50.3 | 61.2 | | Code | MBPP | 3 | pass@1 | 66.4 | 74.4 | 67.8 | 77.6 | | Multilingual | TydiQA | 1 | average/f1 | 29.9 | 34.3 | 31.5 | 31.7 | | Image | ChartQA | 0 | relaxed\\_accuracy | No multimodal support | | 83.4 | 85.3 | | | DocVQA | 0 | anls | | | 89.4 | 91.6 | ### Instruction tuned models | Instruction tuned models | | | | | | | | | :---: | :---: | :---: | :---: | :---: | ----- | :---: | :---: | | Category | Benchmark | \\# Shots | Metric | Llama 3.3 70B | Llama 3.1 405B | **Llama 4 Scout** | **Llama 4 Maverick** | | Image Reasoning | MMMU | 0 | accuracy | No multimodal support | | 69.4 | 73.4 | | | MMMU Pro^ | 0 | accuracy | | | 52.2 | 59.6 | | | MathVista | 0 | accuracy | | | 70.7 | 73.7 | | Image Understanding | ChartQA | 0 | relaxed\\_accuracy | | | 88.8 | 90.0 | | | DocVQA (test) | 0 | anls | | | 94.4 | 94.4 | | Coding | LiveCodeBench (10/01/2024-02/01/2025) | 0 | pass@1 | 33.3 | 27.7 | 32.8 | 43.4 | | Reasoning & Knowledge | MMLU Pro | 0 | macro\\_avg/acc | 68.9 | 73.4 | 74.3 | 80.5 | | | GPQA Diamond | 0 | accuracy | 50.5 | 49.0 | 57.2 | 69.8 | | Multilingual | MGSM | 0 | average/em | 91.1 | 91.6 | 90.6 | 92.3 | | Long context | MTOB (half book) eng-\\>kgv/kgv-\\>eng | \\- | chrF | Context window is 128K | | 42.2/36.6 | 54.0/46.4 | | | MTOB (full book) eng-\\>kgv/kgv-\\>eng | \\- | chrF | | | 39.7/36.3 | 50.8/46.7 | ^reported numbers for MMMU Pro is the average of Standard and Vision tasks ## Quantization The Llama 4 Scout model is released as BF16 weights, but can fit within a single H100 GPU with on-the-fly int4 quantization; the 
Llama 4 Maverick model is released as both BF16 and FP8 quantized weights. The FP8 quantized weights fit on a single H100 DGX host while still maintaining quality. We provide code for on-the-fly int4 quantization which minimizes performance degradation as well. ## Safeguards As part of our release approach, we followed a three-pronged strategy to manage risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. Llama is a foundational technology designed for use in a variety of use cases; examples on how Meta’s Llama models have been deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology, by aligning our model’s safety for a standard set of risks. Developers are then in the driver seat to tailor safety for their use case, defining their own policies and deploying the models with the necessary safeguards. Llama 4 was developed following the best practices outlined in our Developer Use Guide: AI Protections. ### Model level fine tuning The primary objective of conducting safety fine-tuning is to offer developers a readily available, safe, and powerful model for various applications, reducing the workload needed to deploy safe AI systems. Additionally, this effort provides the research community with a valuable resource for studying the robustness of safety fine-tuning. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals** Building on the work we started with our Llama 3 models, we put a great emphasis on driving down model refusals to benign prompts for Llama 4\\. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. **Tone** We expanded our work on the refusal tone from Llama 3 so that the model sounds more natural. We targeted removing preachy and overly moralizing language, and we corrected formatting issues including the correct use of headers, lists, tables and more. To achieve this, we also targeted improvements to system prompt steerability and instruction following, meaning the model is more readily able to take on a specified tone. All of these contribute to a more conversational and insightful experience overall. **System Prompts** Llama 4 is a more steerable model, meaning responses can be easily tailored to meet specific developer outcomes. Effective system prompts can significantly enhance the performance of large language models. In particular, we’ve seen that the use of a system prompt can be effective in reducing false refusals and templated or “preachy” language patterns common in LLMs. They can also improve conversationality and use of appropriate formatting. Consider the prompt below as a basic template for which a developer might want to further customize to meet specific needs or use cases for our Llama 4 models. 
| System prompt | | :---- | | You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, \"it's unethical to\", \"it's worth noting…\", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4\\. Your knowledge cutoff date is August 2024\\. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise. | ### Llama 4 system protections Large language models, including Llama 4, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional guardrails as required. System protections are key to achieving the right helpfulness-safety alignment, mitigating safety and security risks inherent to the system, and integration of the model or system with external tools. We provide the community with system level protections \\- like Llama Guard, Prompt Guard and Code Shield \\- that developers should deploy with Llama models or other LLMs. All of our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use cases evaluations measure safety risks of systems for most commonly built applications including chat bot, visual QA. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompt and output response. It is important to evaluate applications in context, and we recommend building dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which were crafted dedicated benchmarks including long context, multilingual, coding or memorization. **Red teaming** We conduct recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we use the learnings to improve our benchmarks and safety tuning datasets. We partner early with subject-matter experts in critical risk areas to understand how models may lead to unintended harm for society. 
Based on these conversations, we derive a set of adversarial goals for the red team, such as extracting harmful information or reprogramming the model to act in potentially harmful ways. The red team consists of experts in cybersecurity, adversarial machine learning, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical Risks We focus additional attention on the following critical risk areas: **1\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to the proliferation of chemical and biological weapons for Llama 4, we applied expert-designed and other targeted evaluations designed to assess whether the use of Llama 4 could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. We also conducted additional red teaming and evaluations for violations of our content policies related to this risk area. **2\. Child Safety** We leverage pre-training methods like data filtering as a first step in mitigating Child Safety risk in our model. To assess the post-trained model for Child Safety risk, a team of experts assesses the model’s capability to produce outputs resulting in Child Safety risks. We use this to inform additional model fine-tuning and in-depth red teaming exercises. We’ve also expanded our Child Safety evaluation benchmarks to cover Llama 4 capabilities like multi-image and multilingual input. **3\. Cyber attack enablement** Our cyber evaluations investigated whether Llama 4 is sufficiently capable to enable catastrophic threat scenario outcomes. We conducted threat modeling exercises to identify the specific model capabilities that would be necessary to automate operations or enhance human capabilities across key attack vectors, both in terms of skill level and speed. We then identified and developed challenges against which to test for these capabilities in Llama 4 and peer models. Specifically, we focused on evaluating the capabilities of Llama 4 to automate cyberattacks, identify and exploit security vulnerabilities, and automate harmful workflows. Overall, we find that Llama 4 models do not introduce risks that plausibly enable catastrophic cyber outcomes. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Trust tools are open-sourced for the community to use and are widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community.
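To make the system prompt guidance above concrete, here is a minimal sketch of wiring that template into a `transformers` chat pipeline. The repo id, the truncated prompt placeholder, and the generation settings are illustrative assumptions, not values taken from this card:

```python
# Minimal sketch (assumed repo id, truncated system prompt, and settings).
# Llama 4 requires a recent transformers release.
import transformers

pipe = transformers.pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",  # assumed repo id
    device_map="auto",
)

messages = [
    # Paste the full developer system prompt from the table above here.
    {"role": "system", "content": "You are an expert conversationalist ..."},
    {"role": "user", "content": "Help me plan a weekend hiking trip."},
]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```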
## Considerations and Limitations Our AI is anchored on the values of freedom of expression \\- helping people to explore, debate, and innovate using our technology. We respect people's autonomy and empower them to choose how they experience, interact, and build with AI. Our AI promotes an open exchange of ideas. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 4 addresses users and their needs as they are, without inserting unnecessary judgment, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. Llama 4 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 4’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 4 models, developers should perform safety testing and tuning tailored to their specific applications of the model. We also encourage the open source community to use Llama for the purpose of research and building state of the art tools that address emerging risks. Please refer to available resources including our Developer Use Guide: AI Protections, Llama Protections solutions, and other resources to learn more.", + "model_explanation_gemini": "A multilingual instruction-tuned Llama-4-based model designed for generating responses across various languages (including English, Spanish, French, etc.) under Meta's community license agreement." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..3424b37689c205042bda43659499caac8e772ea1 --- /dev/null +++ b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct-bnb-4bit.json @@ -0,0 +1,28 @@ +{ + "model_id": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit", + "downloads": 173645, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "conversational", + "en", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.1-8B-Instruct language: - en library_name: transformers license: llama3.1 tags: - llama-3 - llama - meta - facebook - unsloth - transformers --- # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. 
| Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
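The "Finetune for Free" notebooks listed earlier wrap a handful of Unsloth calls; a minimal sketch of that QLoRA workflow follows, with hyperparameters (sequence length, LoRA rank) that are illustrative assumptions rather than the notebooks' exact values:

```python
# Sketch of the Unsloth QLoRA flow the free notebooks automate; the
# hyperparameters here are assumptions, not the notebooks' exact settings.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",  # this repo
    max_seq_length=2048,
    load_in_4bit=True,  # weights ship pre-quantized with bitsandbytes
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)
# From here, train with e.g. trl's SFTTrainer, then export to GGUF or vLLM.
```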
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**. **Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . Note: You can also find detailed recipes on how to use the model locally, with , assisted generations, quantised and more at []( ### Use with Please, follow the instructions in the repository To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. 
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
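The card states that all numbers above come from Meta's internal evaluations library. As an independent illustration only (not the card's methodology), a comparable 5-shot MMLU measurement can be run with the open-source lm-evaluation-harness; expect the numbers to differ somewhat from the table:

```python
# Illustration with EleutherAI's lm-evaluation-harness (pip install lm-eval);
# this is NOT the internal pipeline the card's numbers come from.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    tasks=["mmlu"],
    num_fewshot=5,  # matches the 5-shot setting in the table
)
print(results["results"])  # per-task and aggregate accuracies
```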
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 41.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
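Note that the code snippets referenced in the "How to use" section earlier in this card were stripped from the description. The following is a hedged reconstruction of conversational inference with the `transformers` pipeline, under assumed settings rather than the card's original snippet:

```python
# Hedged reconstruction (the card's original snippet was stripped); the dtype
# and generation settings are assumptions.
import torch
import transformers

pipe = transformers.pipeline(
    "text-generation",
    model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"])
```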
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third party services they use to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of the third party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in other languages than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use cases evaluations measure safety risks of systems for most commonly built applications including chat bot, coding assistant, tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompt and output response. It is important to evaluate applications in context, and we recommend building dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which were crafted dedicated benchmarks including long context, multilingual, tools calls, coding or memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity in addition to multilingual content specialists with background in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1- CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. 
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. 
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.", + "model_explanation_gemini": "A 4-bit quantized version of Meta's Llama-3.1-8B-Instruct model optimized for faster multilingual dialogue tasks with reduced memory usage." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..4368ec53e8f30d16dddf97f59cff613b9ad1f640 --- /dev/null +++ b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit.json @@ -0,0 +1,27 @@ +{ + "model_id": "unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit", + "downloads": 123469, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "conversational", + "en", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.1-8B-Instruct language: - en library_name: transformers license: llama3.1 tags: - llama-3 - llama - meta - facebook - unsloth - transformers ---
We have a free Google Colab Tesla T4 notebook for Llama 3.2 (3B) here: # unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. **Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly. **Llama 3.2 family of models** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** Sept 25, 2024 **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety. **License:** Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement). 
Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.", + "model_explanation_gemini": "A 4-bit quantized version of Meta's Llama-3.1-8B-Instruct model optimized for faster fine-tuning and reduced memory usage while maintaining improved accuracy." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct.json b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct.json new file mode 100644 index 0000000000000000000000000000000000000000..0324a211ecc7c55250226a5b81efa34530c1fecb --- /dev/null +++ b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-Instruct.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/Meta-Llama-3.1-8B-Instruct", + "downloads": 214068, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "conversational", + "en", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:finetune:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - en library_name: transformers license: llama3.1 tags: - llama-3 - llama - meta - facebook - unsloth - transformers base_model: meta-llama/Llama-3.1-8B-Instruct --- # Finetune Llama 3.1, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.", + "model_explanation_gemini": "Finetunes Meta-Llama-3.1-8B-Instruct faster with reduced memory usage using Unsloth for optimized model training." 
+} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Meta-Llama-3.1-8B-bnb-4bit.json b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..9cce8a9a02c78bee39cb3123f345e198a20bfcd5 --- /dev/null +++ b/data/model_data_json/unsloth_Meta-Llama-3.1-8B-bnb-4bit.json @@ -0,0 +1,26 @@ +{ + "model_id": "unsloth/Meta-Llama-3.1-8B-bnb-4bit", + "downloads": 80572, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "en", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-8B", + "base_model:quantized:meta-llama/Llama-3.1-8B", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: meta-llama/Meta-Llama-3.1-8B language: - en library_name: transformers license: llama3.1 tags: - llama-3 - llama - meta - facebook - unsloth - transformers --- # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | | |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**. **Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with onward, you can run conversational inference using the Transformers abstraction or by leveraging the Auto classes with the function. Make sure to update your transformers installation via . Note: You can also find detailed recipes on how to use the model locally, with , assisted generations, quantised and more at []( ### Use with Please, follow the instructions in the repository To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. 
Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 41.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
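As with the instruct variant above, the code snippets in this card's "How to use" section were stripped from the description. Below is a hedged sketch of loading this pre-quantized base checkpoint for plain text completion; the prompt and generation settings are assumptions:

```python
# Hedged sketch: the pre-quantized 4-bit weights load through bitsandbytes
# automatically via the checkpoint's stored quantization config.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```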
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third party services they use to be aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of the third party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in other languages than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use cases evaluations measure safety risks of systems for most commonly built applications including chat bot, coding assistant, tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompt and output response. It is important to evaluate applications in context, and we recommend building dedicated evaluation dataset for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which were crafted dedicated benchmarks including long context, multilingual, tools calls, coding or memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity in addition to multilingual content specialists with background in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1- CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. 
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. 
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Mistral-Nemo-Instruct-2407-bnb-4bit.json b/data/model_data_json/unsloth_Mistral-Nemo-Instruct-2407-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..96876bc31c74719d1b52247bea3482c60707e36b --- /dev/null +++ b/data/model_data_json/unsloth_Mistral-Nemo-Instruct-2407-bnb-4bit.json @@ -0,0 +1,22 @@ +{ + "model_id": "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit", + "downloads": 92426, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "unsloth", + "conversational", + "en", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- language: - en library_name: transformers license: apache-2.0 tags: - mistral - unsloth - transformers --- # Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Mistral Nemo 12b here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3 8b** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Gemma 7b** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral 7b** | ▶️ Start on Colab | 2.2x faster | 62% less | | **Llama-2 7b** | ▶️ Start on Colab | 2.2x faster | 43% less | | **TinyLlama** | ▶️ Start on Colab | 3.9x faster | 74% less | | **CodeLlama 34b** A100 | ▶️ Start on Colab | 1.9x faster | 27% less | | **Mistral 7b** 1xT4 | ▶️ Start on Kaggle | 5x faster\\* | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.", + "model_explanation_gemini": "Optimizes Mistral, Gemma, and Llama 2 models for faster fine-tuning with reduced memory usage using Unsloth, offering beginner-friendly notebooks for efficient model adaptation." 
+} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..d7b270461d76c49b520bf91f655fcb7020a6cd25 --- /dev/null +++ b/data/model_data_json/unsloth_Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit", + "downloads": 77836, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "unsloth", + "mistral-instruct", + "instruct", + "conversational", + "en", + "base_model:mistralai/Mistral-Small-24B-Instruct-2501", + "base_model:quantized:mistralai/Mistral-Small-24B-Instruct-2501", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- language: - en library_name: transformers license: apache-2.0 tags: - unsloth - transformers - mistral - mistral-instruct - instruct base_model: mistralai/Mistral-Small-24B-Instruct-2501 --- # Finetune LLMs 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Mistral (7B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. # Model Card for Mistral-Small-24B-Instruct-2501 Mistral Small 3 (2501) sets a new benchmark in the \"small\" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models! This model is an instruction-fine-tuned version of the base model: Mistral-Small-24B-Base-2501. Mistral Small can be deployed locally and is exceptionally \"knowledge-dense\", fitting in a single RTX 4090 or a 32GB RAM MacBook once quantized. Perfect for: - Fast-response conversational agents. - Low-latency function calling. - Subject matter experts via fine-tuning. - Local inference for hobbyists and organizations handling sensitive data.
For enterprises that need specialized capabilities (increased context, particular modalities, domain-specific knowledge, etc.), we will be releasing commercial models beyond what Mistral AI contributes to the community. This release demonstrates our commitment to open source, serving as a strong base model. Learn more about Mistral Small in our blog post. Model developer: Mistral AI Team ## Key Features - **Multilingual:** Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. - **Agent-Centric:** Offers best-in-class agentic capabilities with native function calling and JSON output. - **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities. - **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes. - **Context Window:** A 32k context window. - **System Prompt:** Maintains strong adherence and support for system prompts. - **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size. ## Benchmark results ### Human-evaluated benchmarks | Category | Gemma-2-27B | Qwen-2.5-32B | Llama-3.3-70B | Gpt4o-mini | |----------|-------------|--------------|---------------|------------| | Mistral is better | 0.536 | 0.496 | 0.192 | 0.200 | | Mistral is slightly better | 0.196 | 0.184 | 0.164 | 0.204 | | Ties | 0.052 | 0.060 | 0.236 | 0.160 | | Other is slightly better | 0.060 | 0.088 | 0.112 | 0.124 | | Other is better | 0.156 | 0.172 | 0.296 | 0.312 | **Note**: - We conducted side-by-side evaluations with an external third-party vendor, on a set of over 1k proprietary coding and generalist prompts. - Evaluators were tasked with selecting their preferred model response from anonymized generations produced by Mistral Small 3 vs another model. - We are aware that in some cases the benchmarks on human judgement starkly differ from publicly available benchmarks, but have taken extra caution in verifying a fair evaluation. We are confident that the above benchmarks are valid. ### Publicly accessible benchmarks **Reasoning & Knowledge** | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 | |------------|---------------|--------------|---------------|---------------|-------------| | mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 | | gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 | **Math & Coding** | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 | |------------|---------------|--------------|---------------|---------------|-------------| | humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 | | math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 | **Instruction following** | Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 | |------------|---------------|--------------|---------------|---------------|-------------| | mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 | | wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 | | arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 | | ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 | **Note**: - Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline - as such, numbers may vary slightly from previously reported performance (Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct, Gemma-2-27B-IT).
- Judge-based evals such as Wildbench, Arena hard and MTBench were based on gpt-4o-2024-05-13. ### Basic Instruct Template (V7-Tekken) *, and are placeholders.* ***Please make sure to use mistral-common as the source of truth*** ## Usage The model can be used with the following frameworks; see the vLLM and Transformers sections below. ### vLLM We recommend using this model with the vLLM library to implement production-ready inference pipelines. **Note 1**: We recommend using a relatively low temperature, such as . **Note 2**: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend the following system prompt: **_Installation_** Make sure you install vLLM and that you have mistral-common installed. You can also make use of a ready-to-go docker image from the docker hub. #### Server We recommend that you use Mistral-Small-24B-Instruct-2501 in a server/client setting. 1. Spin up a server: **Note:** Running Mistral-Small-24B-Instruct-2501 on GPU requires ~55 GB of GPU RAM in bf16 or fp16. 2. To ping the client you can use a simple Python snippet. ### Function calling Mistral-Small-24B-Instruct-2501 is excellent at function / tool calling tasks via vLLM. *E.g.:*
Example
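Since the card's own example was stripped in extraction, here is a minimal sketch of what such a tool-calling request could look like against a local vLLM server's OpenAI-compatible endpoint; the base URL, the temperature value, and the `get_current_weather` tool are illustrative assumptions, not part of the original card.

```python
# Minimal function-calling sketch against a local vLLM server assumed to be
# hosting this model; the endpoint URL and the weather tool are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",  # hypothetical tool for this sketch
            "description": "Get the current weather in a given city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
    tools=tools,
    tool_choice="auto",
    temperature=0.15,  # the card recommends a relatively low temperature
)

# If the model decided to call the tool, the arguments arrive as a JSON string.
print(response.choices[0].message.tool_calls)
```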
#### Offline ### Transformers If you want to use Hugging Face transformers to generate text, you can do something like this." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Qwen2.5-3B-Instruct-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_Qwen2.5-3B-Instruct-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..aed157f07d10c3f61f25dc5d65af2666eb1da9b1 --- /dev/null +++ b/data/model_data_json/unsloth_Qwen2.5-3B-Instruct-unsloth-bnb-4bit.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit", + "downloads": 75387, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "unsloth", + "qwen", + "conversational", + "en", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-3B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-3B-Instruct", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-3B-Instruct language: - en library_name: transformers license: apache-2.0 tags: - unsloth - transformers - qwen --- We have a free Google Colab Tesla T4 notebook for Qwen2.5 (7B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. # Qwen2.5 ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens and can generate up to 8K tokens.
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 3B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings - Number of Parameters: 3.09B - Number of Parameters (Non-Embedding): 2.77B - Number of Layers: 36 - Number of Attention Heads (GQA): 16 for Q and 2 for KV - Context Length: Full 32,768 tokens For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 is included in the latest Hugging Face transformers, and we advise you to use the latest version of the library; with an older version, you will encounter the following error: ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to give us a cite." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Qwen2.5-7B-Instruct-bnb-4bit.json b/data/model_data_json/unsloth_Qwen2.5-7B-Instruct-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..294495476a641fe584403646303b9f1c77af4d47 --- /dev/null +++ b/data/model_data_json/unsloth_Qwen2.5-7B-Instruct-bnb-4bit.json @@ -0,0 +1,39 @@ +{ + "model_id": "unsloth/Qwen2.5-7B-Instruct-bnb-4bit", + "downloads": 37177, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "unsloth", + "qwen", + "conversational", + "zho", + "eng", + "fra", + "spa", + "por", + "deu", + "ita", + "rus", + "jpn", + "kor", + "vie", + "tha", + "ara", + "arxiv:2309.00071", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-7B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-7B-Instruct", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-7B-Instruct language: - zho - eng - fra - spa - por - deu - ita - rus - jpn - kor - vie - tha - ara library_name: transformers license: apache-2.0 tags: - unsloth - transformers - qwen - qwen2 --- # Finetune Llama 3.1, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a Qwen 2.5 (all model sizes) free Google Colab Tesla T4 notebook. Also a Qwen 2.5 conversational-style notebook. ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
| Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. # Qwen2.5-7B-Instruct ## Introduction Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens and can generate up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 7B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Parameters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens - Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts. For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 is included in the latest Hugging Face transformers, and we advise you to use the latest version of the library; with an older version, you will encounter the following error: ## Quickstart A code snippet showing how to load the tokenizer and model and how to generate content follows at the end of this section. ### Processing Long Texts The current is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to to enable YaRN: For deployment, we recommend using vLLM. Please refer to our Documentation for usage if you are not familiar with vLLM.
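As the quickstart snippet itself was elided, below is a minimal sketch of the standard transformers chat workflow for this checkpoint, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed for the 4-bit weights; the prompt and generation length are illustrative.

```python
# Minimal transformers chat sketch; assumes accelerate and bitsandbytes are
# installed, since this repo ships 4-bit bitsandbytes weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Build the chat-formatted prompt, then tokenize it.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding the model's reply.
output_ids = generated[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```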
Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the configuration only when processing long contexts is required. ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 4-bit quantized version of the Qwen2.5-7B-Instruct model optimized for efficient fine-tuning with Unsloth, offering multilingual support and improved capabilities in coding, mathematics, and structured data processing." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Qwen2.5-7B-Instruct-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_Qwen2.5-7B-Instruct-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..36b393568cfdc705faecd42130e411a673aeae04 --- /dev/null +++ b/data/model_data_json/unsloth_Qwen2.5-7B-Instruct-unsloth-bnb-4bit.json @@ -0,0 +1,38 @@ +{ + "model_id": "unsloth/Qwen2.5-7B-Instruct-unsloth-bnb-4bit", + "downloads": 318206, + "tags": [ + "transformers", + "safetensors", + "qwen2", + "text-generation", + "unsloth", + "qwen", + "conversational", + "zho", + "eng", + "fra", + "spa", + "por", + "deu", + "ita", + "rus", + "jpn", + "kor", + "vie", + "tha", + "ara", + "arxiv:2407.10671", + "base_model:Qwen/Qwen2.5-7B-Instruct", + "base_model:quantized:Qwen/Qwen2.5-7B-Instruct", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: Qwen/Qwen2.5-7B-Instruct language: - zho - eng - fra - spa - por - deu - ita - rus - jpn - kor - vie - tha - ara library_name: transformers license: apache-2.0 tags: - unsloth - transformers - qwen --- We have a free Google Colab Tesla T4 notebook for Qwen2.5 (7B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. # Qwen2.5 ## Introduction Qwen2.5 is the latest series of Qwen large language models.
For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains. - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots. - **Long-context Support** up to 128K tokens and can generate up to 8K tokens. - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. **This repo contains the instruction-tuned 7B Qwen2.5 model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Parameters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: Full 131,072 tokens and generation 8192 tokens For more details, please refer to our blog, GitHub, and Documentation. ## Requirements The code of Qwen2.5 is included in the latest Hugging Face transformers, and we advise you to use the latest version of the library; with an older version, you will encounter the following error: ## Evaluation & Performance Detailed evaluation results are reported in this 📑 blog. For requirements on GPU memory and the respective throughput, see results here. ## Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 4-bit quantized version of the Qwen2.5-7B-Instruct model optimized for faster fine-tuning and reduced memory usage while maintaining improved accuracy in multilingual instruction-following tasks." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Qwen3-235B-A22B-128K-GGUF.json b/data/model_data_json/unsloth_Qwen3-235B-A22B-128K-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..9dc3a31e0e5aaeda24b275b0afd1324c900cc7fe --- /dev/null +++ b/data/model_data_json/unsloth_Qwen3-235B-A22B-128K-GGUF.json @@ -0,0 +1,24 @@ +{ + "model_id": "unsloth/Qwen3-235B-A22B-128K-GGUF", + "downloads": 237851, + "tags": [ + "transformers", + "gguf", + "qwen3_moe", + "text-generation", + "qwen3", + "qwen", + "unsloth", + "en", + "arxiv:2309.00071", + "base_model:Qwen/Qwen3-235B-A22B", + "base_model:quantized:Qwen/Qwen3-235B-A22B", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us", + "conversational" + ], + "description": "--- base_model: Qwen/Qwen3-235B-A22B language: - en library_name: transformers license_link: license: apache-2.0 tags: - qwen3 - qwen - unsloth - transformers --- > [!NOTE] > With 128K Context Length enabled by YaRN. > - Fine-tune Qwen3 (14B) for free using our Google Colab notebook here! - Read our Blog about Qwen3 support: unsloth.ai/blog/qwen3 - View the rest of our notebooks in our docs here. - Run & export your fine-tuned model to Ollama, llama.cpp or HF.
| Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Qwen3 (14B)** | ▶️ Start on Colab | 3x faster | 70% less | | **GRPO with Qwen3 (8B)** | ▶️ Start on Colab | 3x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 50% less | # To Switch Between Thinking and Non-Thinking If you are using llama.cpp, Ollama, Open WebUI etc., you can add and to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations. Here is an example of a multi-turn conversation: # Qwen3-235B-A22B ## Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: - **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios. - **Significant enhancement of its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. - **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. - **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks. - **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**. ## Model Overview **Qwen3-235B-A22B** has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Number of Parameters: 235B in total and 22B activated - Number of Parameters (Non-Embedding): 234B - Number of Layers: 94 - Number of Attention Heads (GQA): 64 for Q and 4 for KV - Number of Experts: 128 - Number of Activated Experts: 8 - Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. ## Quickstart The code of Qwen3-MoE is included in the latest Hugging Face transformers, and we advise you to use the latest version of the library; with an older version, you will encounter the following error: The following contains a code snippet illustrating how to use the model to generate content based on given inputs. For deployment, you can use or to create an OpenAI-compatible API endpoint: - vLLM: - SGLang: ## Switching Between Thinking and Non-Thinking Mode > [!TIP] > The switch is also available in APIs created by vLLM and SGLang.
> Please refer to our documentation for vLLM and SGLang users. ### Thinking mode (default) By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting or leaving it as the default value in , the model will engage its thinking mode. In this mode, the model will generate think content wrapped in a block, followed by the final response. > [!NOTE] > For thinking mode, use , , , and (the default setting in ). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the Best Practices section. ### Non-thinking mode We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency. In this mode, the model will not generate any think content and will not include a block. > [!NOTE] > For non-thinking mode, we suggest using , , , and . For more detailed guidance, please refer to the Best Practices section. ### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input We provide a soft switch mechanism that allows users to dynamically control the model's behavior when . Specifically, you can add and to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations. Here is an example of a multi-turn conversation: > **Note** > For API compatibility, when , regardless of whether the user uses or , the model will always output a block wrapped in . However, the content inside this block may be empty if thinking is disabled. > When , the soft switches are not valid. Regardless of any or tags input by the user, the model will not generate think content and will not include a block. ## Agentic Use Qwen3 excels in tool calling capabilities. We recommend using Qwen-Agent to make the best use of the agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity. To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself. ## Processing Long Texts Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method. YaRN is currently supported by several inference frameworks, e.g., and for local use, and for deployment. In general, there are two approaches to enabling YaRN for supported frameworks: - Modifying the model files: In the file, add the fields: For , you need to regenerate the GGUF file after the modification. - Passing command line arguments: For , you can use For , you can use For from , you can use > [!IMPORTANT] > If you encounter the following warning > > please upgrade .
> [!NOTE] > All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** > We advise adding the configuration only when processing long contexts is required. > It is also recommended to modify the as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set as 2.0. > [!NOTE] > The default in is set to 40,960. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance. > [!TIP] > The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default and no extra configuration is needed. ## Best Practices To achieve optimal performance, we recommend the following settings: 1. **Sampling Parameters**: - For thinking mode (), use , , , and . **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. - For non-thinking mode (), we suggest using , , , and . - For supported frameworks, you can adjust the parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance. 2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance. 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking. - **Math Problems**: Include \"Please reason step by step, and put your final answer within \\boxed{}.\" in the prompt. - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: \"Please show your choice in the field with only the choice letter, e.g., .\" 4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. It is implemented in the provided chat template in Jinja2. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed. ### Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "Qwen3-235B-A22B is a 235B-parameter multilingual causal language model supporting 128K context length, excelling in reasoning, instruction-following, agent tasks, and seamless switching between thinking (complex tasks) and non-thinking (general dialogue) modes." 
+} \ No newline at end of file diff --git a/data/model_data_json/unsloth_Qwen3-30B-A3B-GGUF.json b/data/model_data_json/unsloth_Qwen3-30B-A3B-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..54cb1c9699c7b79b38a118bfa08fcfc4d3529620 --- /dev/null +++ b/data/model_data_json/unsloth_Qwen3-30B-A3B-GGUF.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/Qwen3-30B-A3B-GGUF", + "downloads": 138432, + "tags": [ + "transformers", + "gguf", + "qwen3_moe", + "text-generation", + "qwen3", + "qwen", + "unsloth", + "en", + "arxiv:2309.00071", + "base_model:Qwen/Qwen3-30B-A3B", + "base_model:quantized:Qwen/Qwen3-30B-A3B", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- base_model: Qwen/Qwen3-30B-A3B language: - en library_name: transformers license_link: license: apache-2.0 tags: - qwen3 - qwen - unsloth - transformers --- - Fine-tune Qwen3 (14B) for free using our Google Colab notebook here! - Read our Blog about Qwen3 support: unsloth.ai/blog/qwen3 - View the rest of our notebooks in our docs here. - Run & export your fine-tuned model to Ollama, llama.cpp or HF. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Qwen3 (14B)** | ▶️ Start on Colab | 3x faster | 70% less | | **GRPO with Qwen3 (8B)** | ▶️ Start on Colab | 3x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 50% less | # To Switch Between Thinking and Non-Thinking If you are using llama.cpp, Ollama, Open WebUI etc., you can add and to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations. Here is an example of a multi-turn conversation: # Qwen3-30B-A3B ## Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: - **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios. - **Significant enhancement of its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. - **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. - **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**. ## Model Overview **Qwen3-30B-A3B** has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Number of Parameters: 30.5B in total and 3.3B activated - Number of Parameters (Non-Embedding): 29.9B - Number of Layers: 48 - Number of Attention Heads (GQA): 32 for Q and 4 for KV - Number of Experts: 128 - Number of Activated Experts: 8 - Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. ## Quickstart The code of Qwen3-MoE is included in the latest Hugging Face transformers, and we advise you to use the latest version of the library; with an older version, you will encounter the following error: The following contains a code snippet illustrating how to use the model to generate content based on given inputs. For deployment, you can use or to create an OpenAI-compatible API endpoint: - vLLM: - SGLang: ## Switching Between Thinking and Non-Thinking Mode > [!TIP] > The switch is also available in APIs created by vLLM and SGLang. > Please refer to our documentation for more details. ### Thinking mode (default) By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting or leaving it as the default value in , the model will engage its thinking mode. In this mode, the model will generate think content wrapped in a block, followed by the final response. > [!NOTE] > For thinking mode, use , , , and (the default setting in ). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the Best Practices section. ### Non-thinking mode We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency. In this mode, the model will not generate any think content and will not include a block. > [!NOTE] > For non-thinking mode, we suggest using , , , and . For more detailed guidance, please refer to the Best Practices section. ### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input We provide a soft switch mechanism that allows users to dynamically control the model's behavior when . Specifically, you can add and to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations. Here is an example of a multi-turn conversation: > **Note** > For API compatibility, when , regardless of whether the user uses or , the model will always output a block wrapped in . However, the content inside this block may be empty if thinking is disabled. > When , the soft switches are not valid. Regardless of any or tags input by the user, the model will not generate think content and will not include a block. ## Agentic Use Qwen3 excels in tool calling capabilities. We recommend using Qwen-Agent to make the best use of the agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity (a minimal sketch follows below).
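As a rough illustration of the Qwen-Agent workflow described above, here is a minimal sketch; the local OpenAI-compatible endpoint, the MCP time server, and the query are assumptions, and the exact `qwen-agent` API should be verified against its documentation.

```python
# Minimal Qwen-Agent tool-calling sketch; assumes the qwen-agent package and
# a local OpenAI-compatible endpoint (e.g. served by vLLM or SGLang).
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",  # assumed local endpoint
    "api_key": "EMPTY",
}

# Tools may come from an MCP configuration or Qwen-Agent built-ins;
# both entries below are illustrative.
tools = [
    {"mcpServers": {"time": {"command": "uvx", "args": ["mcp-server-time"]}}},
    "code_interpreter",  # built-in Qwen-Agent tool
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{"role": "user", "content": "What time is it in UTC right now?"}]
responses = []
for responses in bot.run(messages=messages):
    pass  # bot.run streams incremental responses; keep the final state
print(responses)
```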
To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself. ## Processing Long Texts Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method. YaRN is currently supported by several inference frameworks, e.g., and for local use, and for deployment. In general, there are two approaches to enabling YaRN for supported frameworks: - Modifying the model files: In the file, add the fields: For , you need to regenerate the GGUF file after the modification. - Passing command line arguments: For , you can use For , you can use For from , you can use > [!IMPORTANT] > If you encounter the following warning > > please upgrade . > [!NOTE] > All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** > We advise adding the configuration only when processing long contexts is required. > It is also recommended to modify the as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set as 2.0. > [!NOTE] > The default in is set to 40,960. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance. > [!TIP] > The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default and no extra configuration is needed. ## Best Practices To achieve optimal performance, we recommend the following settings: 1. **Sampling Parameters**: - For thinking mode (), use , , , and . **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. - For non-thinking mode (), we suggest using , , , and . - For supported frameworks, you can adjust the parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance. 2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance. 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking. - **Math Problems**: Include \"Please reason step by step, and put your final answer within \\boxed{}.\" in the prompt. - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: \"Please show your choice in the field with only the choice letter, e.g., .\" 4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. 
It is implemented in the provided chat template in Jinja2. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed. ### Citation If you find our work helpful, feel free to give us a cite.", + "model_explanation_gemini": "A 30.5B parameter causal language model supporting seamless switching between thinking mode for complex reasoning and non-thinking mode for general dialogue, excelling in multilingual tasks, instruction-following, and agent capabilities." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_gemma-3-12b-it-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_gemma-3-12b-it-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..a8464ba096086df32354c74123f369c0bbfc517e --- /dev/null +++ b/data/model_data_json/unsloth_gemma-3-12b-it-unsloth-bnb-4bit.json @@ -0,0 +1,53 @@ +{ + "model_id": "unsloth/gemma-3-12b-it-unsloth-bnb-4bit", + "downloads": 148515, + "tags": [ + "transformers", + "safetensors", + "gemma3", + "image-text-to-text", + "unsloth", + "gemma", + "google", + "conversational", + "en", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-12b-it", + "base_model:quantized:google/gemma-3-12b-it", + "license:gemma", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: google/gemma-3-12b-it language: - en library_name: transformers license: gemma tags: - unsloth - transformers - gemma3 - gemma - google --- - Fine-tune Gemma 3 (12B) for free using our Google Colab notebook here! - Read our Blog about Gemma 3 support: unsloth.ai/blog/gemma3 - View the rest of our notebooks in our docs here. - Export your fine-tuned model to GGUF, Ollama, llama.cpp or 🤗HF. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **GRPO with Gemma 3 (12B)** | ▶️ Start on Colab | 2x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 50% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less |
# Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone. ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model was trained with 12 trillion tokens, the 4B model was trained with 4 trillion tokens, and the 1B with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. - These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results.
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | ## Ethics and Safety Ethics and
safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety**: Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\", which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model's capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common-sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, with the input data pre-processing described and subsequent evaluations reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines for responsible use are provided with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered to remove certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives. [g3-tech-report]: [rai-toolkit]: [kaggle-gemma]: [vertex-mg-gemma3]: [terms]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [gemini-2-paper]:", + "model_explanation_gemini": "A 4-bit quantized version of Google's Gemma-3-12B model optimized for efficient fine-tuning and text generation tasks using Unsloth's enhanced quantization." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_gemma-3-1b-it-bnb-4bit.json b/data/model_data_json/unsloth_gemma-3-1b-it-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..1b7442f45aec0d0b618019975b9199e9ec6b31dc --- /dev/null +++ b/data/model_data_json/unsloth_gemma-3-1b-it-bnb-4bit.json @@ -0,0 +1,54 @@ +{ + "model_id": "unsloth/gemma-3-1b-it-bnb-4bit", + "downloads": 128711, + "tags": [ + "transformers", + "safetensors", + "gemma3_text", + "text-generation", + "unsloth", + "gemma3", + "gemma", + "google", + "conversational", + "en", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-1b-it", + "base_model:quantized:google/gemma-3-1b-it", + "license:gemma", + "autotrain_compatible", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: google/gemma-3-1b-it language: - en library_name: transformers license: gemma tags: - unsloth - transformers - gemma3 - gemma - google --- - Fine-tune Gemma 3 (12B) for free using our Google Colab notebook here! - Read our Blog about Gemma 3 support: unsloth.ai/blog/gemma3 - View the rest of our notebooks in our docs here. - Export your fine-tuned model to GGUF, Ollama, llama.cpp or 🤗HF. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **GRPO with Gemma 3 (12B)** | ▶️ Start on Colab | 2x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab-Vision.ipynb) | 2x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2x faster | 60% less | | **Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 50% less | | **Mistral (7B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.2x faster | 62% less |
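Before the full model card, a minimal usage sketch for this prequantized checkpoint: the bitsandbytes 4-bit weights are baked into the repo, so plain `transformers` loading needs no extra quantization config. This is an illustrative snippet, not an official Unsloth example; it assumes a CUDA GPU with `bitsandbytes` and `accelerate` installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/gemma-3-1b-it-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The 4-bit quantization is part of the checkpoint itself.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain attention in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```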
# Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model was trained with 12 trillion tokens, 4B model was trained with 4 trillion tokens and 1B with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. 
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results.
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | [hellaswag]: [boolq]: [piqa]: [socialiqa]: [triviaqa]: [naturalq]: [arc]: [winogrande]: [bbh]: [drop]: #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | [mmlu]: [agieval]: [math]: [gsm8k]: [gpqa]: [mbpp]: [humaneval]: #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | [mgsm]: [flores]: [xquad]: [global-mmlu-lite]: [wmt24pp]: [eclektic]: [indicgenbench]: #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | [coco-cap]: [docvqa]: [info-vqa]: [mmmu]: [textvqa]: [realworldqa]: [remi]: [ai2d]: [chartqa]: [vqav2]: [blinkvqa]: [okvqa]: [tallyqa]: [ss-vqa]: [countbenchqa]: ## Ethics and Safety Ethics and 
safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety**: Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\", which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model's capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common-sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, with the input data pre-processing described and subsequent evaluations reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines for responsible use are provided with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered to remove certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives. [g3-tech-report]: [rai-toolkit]: [kaggle-gemma]: [vertex-mg-gemma3]: [terms]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [gemini-2-paper]:", + "model_explanation_gemini": "A 4-bit quantized version of Google's Gemma-3-1B-it model optimized for efficient fine-tuning with Unsloth, designed for text generation and multimodal tasks with reduced memory usage." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_gemma-3-27b-it-GGUF.json b/data/model_data_json/unsloth_gemma-3-27b-it-GGUF.json new file mode 100644 index 0000000000000000000000000000000000000000..358df904fbd2187b9118cfcae6b2bdd86631a60f --- /dev/null +++ b/data/model_data_json/unsloth_gemma-3-27b-it-GGUF.json @@ -0,0 +1,51 @@ +{ + "model_id": "unsloth/gemma-3-27b-it-GGUF", + "downloads": 107058, + "tags": [ + "transformers", + "gguf", + "gemma3", + "image-text-to-text", + "unsloth", + "gemma", + "google", + "en", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-27b-it", + "base_model:quantized:google/gemma-3-27b-it", + "license:gemma", + "endpoints_compatible", + "region:us", + "imatrix", + "conversational" + ], + "description": "--- base_model: google/gemma-3-27b-it language: - en library_name: transformers license: gemma tags: - unsloth - transformers - gemma3 - gemma - google --- - Fine-tune Gemma 3 (12B) for free using our Google Colab notebook here! - Read our Blog about Gemma 3 support: unsloth.ai/blog/gemma3 - View the rest of our notebooks in our docs here. - Export your fine-tuned model to GGUF, Ollama, llama.cpp or 🤗HF. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **GRPO with Gemma 3 (12B)** | ▶️ Start on Colab | 2x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab-Vision.ipynb) | 2x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2x faster | 60% less | | **Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 50% less | | **Mistral (7B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.2x faster | 62% less |
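A usage sketch before the model card: GGUF files target llama.cpp-family runtimes rather than `transformers`. The snippet below uses `llama-cpp-python`; the quantization filename pattern is an assumption, so pick whichever `.gguf` variant in this repo fits your hardware.

```python
from llama_cpp import Llama

# Downloads the matching GGUF file from the Hub on first use.
llm = Llama.from_pretrained(
    repo_id="unsloth/gemma-3-27b-it-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quantization level; adjust as needed
    n_ctx=8192,               # context window to allocate
    n_gpu_layers=-1,          # offload all layers to the GPU if one is present
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the Gemma 3 family."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```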
# Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model was trained with 12 trillion tokens, 4B model was trained with 4 trillion tokens and 1B with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. 
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results.
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | [hellaswag]: [boolq]: [piqa]: [socialiqa]: [triviaqa]: [naturalq]: [arc]: [winogrande]: [bbh]: [drop]: #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | [mmlu]: [agieval]: [math]: [gsm8k]: [gpqa]: [mbpp]: [humaneval]: #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | [mgsm]: [flores]: [xquad]: [global-mmlu-lite]: [wmt24pp]: [eclektic]: [indicgenbench]: #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | [coco-cap]: [docvqa]: [info-vqa]: [mmmu]: [textvqa]: [realworldqa]: [remi]: [ai2d]: [chartqa]: [vqav2]: [blinkvqa]: [okvqa]: [tallyqa]: [ss-vqa]: [countbenchqa]: ## Ethics and Safety Ethics and 
safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety**: Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\", which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model's capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common-sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, with the input data pre-processing described and subsequent evaluations reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines for responsible use are provided with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered to remove certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives. [g3-tech-report]: [rai-toolkit]: [kaggle-gemma]: [vertex-mg-gemma3]: [terms]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [sustainability]: [gemini-2-paper]:", + "model_explanation_gemini": "A fine-tuned version of Google's Gemma 3 27B model optimized for efficient text and image understanding tasks with multilingual support and a 128K context window." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_gemma-3-4b-it-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_gemma-3-4b-it-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..92c4e26cc6e7493236568760974c39f501bbb6a6 --- /dev/null +++ b/data/model_data_json/unsloth_gemma-3-4b-it-unsloth-bnb-4bit.json @@ -0,0 +1,53 @@ +{ + "model_id": "unsloth/gemma-3-4b-it-unsloth-bnb-4bit", + "downloads": 269806, + "tags": [ + "transformers", + "safetensors", + "gemma3", + "image-text-to-text", + "unsloth", + "gemma", + "google", + "conversational", + "en", + "arxiv:1905.07830", + "arxiv:1905.10044", + "arxiv:1911.11641", + "arxiv:1904.09728", + "arxiv:1705.03551", + "arxiv:1911.01547", + "arxiv:1907.10641", + "arxiv:1903.00161", + "arxiv:2009.03300", + "arxiv:2304.06364", + "arxiv:2103.03874", + "arxiv:2110.14168", + "arxiv:2311.12022", + "arxiv:2108.07732", + "arxiv:2107.03374", + "arxiv:2210.03057", + "arxiv:2106.03193", + "arxiv:1910.11856", + "arxiv:2502.12404", + "arxiv:2502.21228", + "arxiv:2404.16816", + "arxiv:2104.12756", + "arxiv:2311.16502", + "arxiv:2203.10244", + "arxiv:2404.12390", + "arxiv:1810.12440", + "arxiv:1908.02660", + "arxiv:2312.11805", + "base_model:google/gemma-3-4b-it", + "base_model:quantized:google/gemma-3-4b-it", + "license:gemma", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: google/gemma-3-4b-it language: - en library_name: transformers license: gemma tags: - unsloth - transformers - gemma3 - gemma - google --- - Fine-tune Gemma 3 (12B) for free using our Google Colab notebook here! - Read our Blog about Gemma 3 support: unsloth.ai/blog/gemma3 - View the rest of our notebooks in our docs here. - Export your fine-tuned model to GGUF, Ollama, llama.cpp or 🤗HF. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **GRPO with Gemma 3 (12B)** | ▶️ Start on Colab | 2x faster | 80% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab-Vision.ipynb) | 2x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab-Alpaca.ipynb) | 2x faster | 60% less | | **Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 50% less | | **Mistral (7B)** | ▶️ Start on Colab-Conversational.ipynb) | 2.2x faster | 62% less |
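A usage sketch before the model card: the 4B instruction-tuned variant is multimodal, so it can be driven through the `transformers` image-text-to-text pipeline. Illustrative only; it assumes a CUDA GPU with `bitsandbytes` installed, and the image URL is a placeholder.

```python
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    device_map="auto",
)
messages = [{
    "role": "user",
    "content": [
        # Placeholder URL -- substitute any image you can reach.
        {"type": "image", "url": "https://example.com/photo.jpg"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
result = pipe(text=messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])  # the assistant turn
```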
# Gemma 3 model card **Model Page**: Gemma **Resources and Technical Documentation**: * [Gemma 3 Technical Report][g3-tech-report] * [Responsible Generative AI Toolkit][rai-toolkit] * [Gemma on Kaggle][kaggle-gemma] * [Gemma on Vertex Model Garden][vertex-mg-gemma3] **Terms of Use**: [Terms][terms] **Authors**: Google DeepMind ## Model Information Summary description and brief definition of inputs and outputs. ### Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone. ### Inputs and outputs - **Input:** - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size - **Output:** - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output context of 8192 tokens ### Citation ## Model Data Data used for model training and how the data was processed. ### Training Dataset These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 14 trillion tokens, the 12B model was trained with 12 trillion tokens, 4B model was trained with 4 trillion tokens and 1B with 2 trillion tokens. Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. ### Data Preprocessing Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. 
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies]. ## Implementation Information Details about the model internals. ### Hardware Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. These advantages are aligned with [Google's commitments to operate sustainably][sustainability]. ### Software Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]; *\"the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.\"* ## Evaluation Model evaluation metrics and results.
### Benchmark Results These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation: #### Reasoning and factuality | Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:--------------:|:-------------:|:--------------:|:--------------:| | [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | | [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | | [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | | [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | | [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | | [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | | [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | | [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | | [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | | [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | | [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | [hellaswag]: [boolq]: [piqa]: [socialiqa]: [triviaqa]: [naturalq]: [arc]: [winogrande]: [bbh]: [drop]: #### STEM and code | Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |----------------|:-------------:|:--------------:|:--------------:| | [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 | | [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 | | [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 | | [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 | | [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 | | [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 | | [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 | | [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 | [mmlu]: [agieval]: [math]: [gsm8k]: [gpqa]: [mbpp]: [humaneval]: #### Multilingual | Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:| | [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 | | [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 | | [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 | | [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 | | [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 | | [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 | | [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 | [mgsm]: [flores]: [xquad]: [global-mmlu-lite]: [wmt24pp]: [eclektic]: [indicgenbench]: #### Multimodal | Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B | | ------------------------------ |:-------------:|:--------------:|:--------------:| | [COCOcap][coco-cap] | 102 | 111 | 116 | | [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 | | [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 | | [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 | | [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 | | [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 | | [ReMI][remi] | 27.3 | 38.5 | 44.8 | | [AI2D][ai2d] | 63.2 | 75.2 | 79.0 | | [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 | | [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 | | [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 | | [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 | | [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 | | [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 | | [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 | [coco-cap]: [docvqa]: [info-vqa]: [mmmu]: [textvqa]: [realworldqa]: [remi]: [ai2d]: [chartqa]: [vqav2]: [blinkvqa]: [okvqa]: [tallyqa]: [ss-vqa]: [countbenchqa]: ## Ethics and Safety Ethics and 
safety evaluation approach and results. ### Evaluation Approach Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including: - **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation. - **Content Safety**: Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech. - **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies. In addition to development-level evaluations, we conduct \"assurance evaluations\" which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team, to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review. ### Evaluation Results For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model's capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts. ## Usage and Limitations These models have certain limitations that users should be aware of. ### Intended Usage Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications. - Research and Education - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics. ### Limitations - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. ### Ethical Considerations and Risks The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card. - Misinformation and Misuse - VLMs can be misused to generate text that is false, misleading, or harmful. - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit][rai-toolkit]. - Transparency and Accountability: - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem. Risks identified and mitigations: - **Perpetuation of biases**: Continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged. - **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use]. - **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
### Benefits At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably-sized open model alternatives. [g3-tech-report]: [rai-toolkit]: [kaggle-gemma]: [vertex-mg-gemma3]: [terms]: [safety-policies]: [prohibited-use]: [tpu]: [sustainability]: [jax]: [ml-pathways]: [gemini-2-paper]:", + "model_explanation_gemini": "A 4-bit quantized version of Google's Gemma-3-4B-it model optimized for efficient fine-tuning and text generation tasks using Unsloth's enhanced quantization." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_llama-3-8b-Instruct-bnb-4bit.json b/data/model_data_json/unsloth_llama-3-8b-Instruct-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..616c2d77ac0a25275c0973959b2c0337ce7fca01 --- /dev/null +++ b/data/model_data_json/unsloth_llama-3-8b-Instruct-bnb-4bit.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/llama-3-8b-Instruct-bnb-4bit", + "downloads": 88737, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "conversational", + "en", + "license:llama3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- language: - en library_name: transformers license: llama3 tags: - llama-3 - llama - meta - facebook - unsloth - transformers base_model: meta-llama/Llama-3-8B-Instruct --- # Finetune Llama 3.1, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less |
| **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less |
| **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less |
| **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less |
| **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less |
| **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less |

- This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Details Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.
The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. **Model developers** Meta **Variations** Llama 3 comes in two sizes — 8B and 70B parameters — in pre-trained and instruction tuned variants. **Input** Models input text only. **Output** Models generate text and code only. **Model Architecture** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| | Training Data | Params | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|
| Llama 3 | A new mix of publicly available online data. | 8B | 8k | Yes | 15T+ | March, 2023 |
| | | 70B | 8k | Yes | 15T+ | December, 2023 |
**Llama 3 family of models**. Token counts refer to pretraining data only. Both the 8 and 70B versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date** April 18, 2024. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at: **Where to send questions or comments about the model** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3 is intended for commercial and research use in English. Instruction tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3 Community License. Use in languages other than English**. **Note: Developers may fine-tune Llama 3 models for languages beyond English provided they comply with the Llama 3 Community License and the Acceptable Use Policy. ## How to use This repository contains two versions of Meta-Llama-3-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers See the snippet below for usage with Transformers:
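The card's original snippet did not survive extraction, so the following is a hedged stand-in: chat inference with this pre-quantized 4-bit repository via Transformers. Loading it assumes a CUDA GPU with the bitsandbytes and accelerate packages installed; the prompt and generation settings are illustrative only.

```python
# Hedged usage sketch (not the card's original snippet): chat inference with
# the pre-quantized bitsandbytes 4-bit checkpoint in this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'unsloth/llama-3-8b-Instruct-bnb-4bit'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Explain grouped-query attention in one sentence.'},
]
# Render the Llama 3 chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```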
### Use with Please follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging : For Hugging Face support, we recommend using transformers or TGI, but a similar command works. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint** Pretraining utilized a cumulative 7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W). Estimated total emissions were 2290 tCO2eq, 100% of which were offset by Meta’s sustainability program.

| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Llama 3 8B | 1.3M | 700 | 390 |
| Llama 3 70B | 6.4M | 700 | 1900 |
| Total | 7.7M | | 2290 |
**CO2 emissions during pre-training**. Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 3 was pretrained on over 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 10M human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of March 2023 for the 8B and December 2023 for the 70B models respectively. ## Benchmarks In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. For details on the methodology see here. ### Base pretrained models
| Category | Benchmark | Llama 3 8B | Llama2 7B | Llama2 13B | Llama 3 70B | Llama2 70B |
|---|---|---|---|---|---|---|
| General | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| | AGIEval English (3-5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| Reading comprehension | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |
### Instruction tuned models
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|
| MMLU (5-shot) | 68.4 | 34.1 | 47.8 | 82.0 | 52.9 |
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
### Responsibility & Safety We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community. Foundation models are widely capable technologies that are built to be used for a diverse range of applications. They are not designed to meet every developer preference on safety levels for all use cases, out-of-the-box, as those by their nature will differ across different applications. Rather, responsible LLM-application deployment is achieved by implementing a series of safety best practices throughout the development of such applications, from the model pre-training, fine-tuning and the deployment of systems composed of safeguards to tailor the safety needs specifically to the use case and audience. As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model and system level safety for their application. We also provide a set of resources including Meta Llama Guard 2 and Code Shield safeguards. These tools have proven to drastically reduce residual risks of LLM Systems, while maintaining a high level of helpfulness. We encourage developers to tune and deploy these safeguards according to their needs and we provide a reference implementation to get you started. #### Llama 3-Instruct As outlined in the Responsible Use Guide, some trade-off between model helpfulness and model alignment is likely unavoidable. Developers should exercise discretion about how to weigh the benefits of alignment and helpfulness for their specific use case and audience. Developers should be mindful of residual risks when using Llama models and leverage additional safety tools as needed to reach the right safety bar for their use case. **Safety** For our instruction tuned model, we conducted extensive red teaming exercises, performed adversarial evaluations and implemented safety mitigation techniques to lower residual risks. As with any Large Language Model, residual risks will likely remain and we recommend that developers assess these risks in the context of their use case. In parallel, we are working with the community to make AI safety benchmark standards transparent, rigorous and interpretable. **Refusals** In addition to residual risks, we put a great emphasis on model refusals to benign prompts. Over-refusing not only impacts the user experience but can even be harmful in certain contexts. We’ve heard the feedback from the developer community and improved our fine tuning to ensure that Llama 3 is significantly less likely to falsely refuse to answer prompts than Llama 2. We built internal benchmarks and developed mitigations to limit false refusals, making Llama 3 our most helpful model to date. #### Responsible release In addition to responsible use considerations outlined above, we followed a rigorous process that requires us to take extra measures against misuse and critical risks before we make our release decision. **Misuse** If you access or use Llama 3, you agree to the Acceptable Use Policy.
The most recent copy of this policy can be found at #### Critical risks CBRNE (Chemical, Biological, Radiological, Nuclear, and high yield Explosives) We have conducted a twofold assessment of the safety of the model in this area: * Iterative testing during model training to assess the safety of responses related to CBRNE threats and other adversarial risks. * Involving external CBRNE experts to conduct an uplift test assessing the ability of the model to accurately provide expert knowledge and reduce barriers to potential CBRNE misuse, by reference to what can be achieved using web search (without the model). ### Cyber Security We have evaluated Llama 3 with CyberSecEval, Meta’s cybersecurity safety eval suite, measuring Llama 3’s propensity to suggest insecure code when used as a coding assistant, and Llama 3’s propensity to comply with requests to help carry out cyber attacks, where attacks are defined by the industry standard MITRE ATT&CK cyber attack ontology. On our insecure coding and cyber attacker helpfulness tests, Llama 3 behaved in the same range or safer than models of equivalent coding capability. ### Child Safety Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership in AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3 is a new technology, and like any new technology, there are risks associated with its use.
Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3 models, developers should perform safety testing and tuning tailored to their specific applications of the model. As outlined in the Responsible Use Guide, we recommend incorporating Purple Llama solutions into your workflows and specifically Llama Guard which provides a base model to filter input and output prompts to layer system-level safety on top of model-level safety. Please see the Responsible Use Guide available at ## Citation instructions @article{llama3modelcard, title={Llama 3 Model Card}, author={AI@Meta}, year={2024}, url = { } ## Contributors Aaditya Singh; Aaron Grattafiori; Abhimanyu Dubey; Abhinav Jauhri; Abhinav Pandey; Abhishek Kadian; Adam Kelsey; Adi Gangidi; Ahmad Al-Dahle; Ahuva Goldstand; Aiesha Letman; Ajay Menon; Akhil Mathur; Alan Schelten; Alex Vaughan; Amy Yang; Andrei Lupu; Andres Alvarado; Andrew Gallagher; Andrew Gu; Andrew Ho; Andrew Poulton; Andrew Ryan; Angela Fan; Ankit Ramchandani; Anthony Hartshorn; Archi Mitra; Archie Sravankumar; Artem Korenev; Arun Rao; Ashley Gabriel; Ashwin Bharambe; Assaf Eisenman; Aston Zhang; Aurelien Rodriguez; Austen Gregerson; Ava Spataru; Baptiste Roziere; Ben Maurer; Benjamin Leonhardi; Bernie Huang; Bhargavi Paranjape; Bing Liu; Binh Tang; Bobbie Chern; Brani Stojkovic; Brian Fuller; Catalina Mejia Arenas; Chao Zhou; Charlotte Caucheteux; Chaya Nayak; Ching-Hsiang Chu; Chloe Bi; Chris Cai; Chris Cox; Chris Marra; Chris McConnell; Christian Keller; Christoph Feichtenhofer; Christophe Touret; Chunyang Wu; Corinne Wong; Cristian Canton Ferrer; Damien Allonsius; Daniel Kreymer; Daniel Haziza; Daniel Li; Danielle Pintz; Danny Livshits; Danny Wyatt; David Adkins; David Esiobu; David Xu; Davide Testuggine; Delia David; Devi Parikh; Dhruv Choudhary; Dhruv Mahajan; Diana Liskovich; Diego Garcia-Olano; Diego Perino; Dieuwke Hupkes; Dingkang Wang; Dustin Holland; Egor Lakomkin; Elina Lobanova; Xiaoqing Ellen Tan; Emily Dinan; Eric Smith; Erik Brinkman; Esteban Arcaute; Filip Radenovic; Firat Ozgenel; Francesco Caggioni; Frank Seide; Frank Zhang; Gabriel Synnaeve; Gabriella Schwarz; Gabrielle Lee; Gada Badeer; Georgia Anderson; Graeme Nail; Gregoire Mialon; Guan Pang; Guillem Cucurell; Hailey Nguyen; Hannah Korevaar; Hannah Wang; Haroun Habeeb; Harrison Rudolph; Henry Aspegren; Hu Xu; Hugo Touvron; Iga Kozlowska; Igor Molybog; Igor Tufanov; Iliyan Zarov; Imanol Arrieta Ibarra; Irina-Elena Veliche; Isabel Kloumann; Ishan Misra; Ivan Evtimov; Jacob Xu; Jade Copet; Jake Weissman; Jan Geffert; Jana Vranes; Japhet Asher; Jason Park; Jay Mahadeokar; Jean-Baptiste Gaya; Jeet Shah; Jelmer van der Linde; Jennifer Chan; Jenny Hong; Jenya Lee; Jeremy Fu; Jeremy Teboul; Jianfeng Chi; Jianyu Huang; Jie Wang; Jiecao Yu; Joanna Bitton; Joe Spisak; Joelle Pineau; Jon Carvill; Jongsoo Park; Joseph Rocca; Joshua Johnstun; Junteng Jia; Kalyan Vasuden Alwala; Kam Hou U; Kate Plawiak; Kartikeya Upasani; Kaushik Veeraraghavan; Ke Li; Kenneth Heafield; Kevin Stone; Khalid El-Arini; Krithika Iyer; Kshitiz Malik; Kuenley Chiu; Kunal Bhalla; Kyle Huang; Lakshya Garg; Lauren Rantala-Yeary; Laurens van der Maaten; Lawrence Chen; Leandro Silva; Lee Bell; Lei Zhang; Liang Tan; Louis Martin; Lovish 
Madaan; Luca Wehrstedt; Lukas Blecher; Luke de Oliveira; Madeline Muzzi; Madian Khabsa; Manav Avlani; Mannat Singh; Manohar Paluri; Mark Zuckerberg; Marcin Kardas; Martynas Mankus; Mathew Oldham; Mathieu Rita; Matthew Lennie; Maya Pavlova; Meghan Keneally; Melanie Kambadur; Mihir Patel; Mikayel Samvelyan; Mike Clark; Mike Lewis; Min Si; Mitesh Kumar Singh; Mo Metanat; Mona Hassan; Naman Goyal; Narjes Torabi; Nicolas Usunier; Nikolay Bashlykov; Nikolay Bogoychev; Niladri Chatterji; Ning Dong; Oliver Aobo Yang; Olivier Duchenne; Onur Celebi; Parth Parekh; Patrick Alrassy; Paul Saab; Pavan Balaji; Pedro Rittner; Pengchuan Zhang; Pengwei Li; Petar Vasic; Peter Weng; Polina Zvyagina; Prajjwal Bhargava; Pratik Dubal; Praveen Krishnan; Punit Singh Koura; Qing He; Rachel Rodriguez; Ragavan Srinivasan; Rahul Mitra; Ramon Calderer; Raymond Li; Robert Stojnic; Roberta Raileanu; Robin Battey; Rocky Wang; Rohit Girdhar; Rohit Patel; Romain Sauvestre; Ronnie Polidoro; Roshan Sumbaly; Ross Taylor; Ruan Silva; Rui Hou; Rui Wang; Russ Howes; Ruty Rinott; Saghar Hosseini; Sai Jayesh Bondu; Samyak Datta; Sanjay Singh; Sara Chugh; Sargun Dhillon; Satadru Pan; Sean Bell; Sergey Edunov; Shaoliang Nie; Sharan Narang; Sharath Raparthy; Shaun Lindsay; Sheng Feng; Sheng Shen; Shenghao Lin; Shiva Shankar; Shruti Bhosale; Shun Zhang; Simon Vandenhende; Sinong Wang; Seohyun Sonia Kim; Soumya Batra; Sten Sootla; Steve Kehoe; Suchin Gururangan; Sumit Gupta; Sunny Virk; Sydney Borodinsky; Tamar Glaser; Tamar Herman; Tamara Best; Tara Fowler; Thomas Georgiou; Thomas Scialom; Tianhe Li; Todor Mihaylov; Tong Xiao; Ujjwal Karn; Vedanuj Goswami; Vibhor Gupta; Vignesh Ramanathan; Viktor Kerkez; Vinay Satish Kumar; Vincent Gonguet; Vish Vogeti; Vlad Poenaru; Vlad Tiberiu Mihailescu; Vladan Petrovic; Vladimir Ivanov; Wei Li; Weiwei Chu; Wenhan Xiong; Wenyin Fu; Wes Bouaziz; Whitney Meers; Will Constable; Xavier Martinet; Xiaojian Wu; Xinbo Gao; Xinfeng Xie; Xuchao Jia; Yaelle Goldschlag; Yann LeCun; Yashesh Gaur; Yasmine Babaei; Ye Qi; Yenda Li; Yi Wen; Yiwen Song; Youngjin Nam; Yuchen Hao; Yuchen Zhang; Yun Wang; Yuning Mao; Yuzi He; Zacharie Delpierre Coudert; Zachary DeVito; Zahra Hankir; Zhaoduo Wen; Zheng Yan; Zhengxing Chen; Zhenyu Yang; Zoe Papakipos", + "model_explanation_gemini": "An optimized 4-bit quantized version of Meta's Llama-3-8B-Instruct model for faster and memory-efficient fine-tuning using Unsloth." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_llama-3-8b-bnb-4bit.json b/data/model_data_json/unsloth_llama-3-8b-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..11f1a408684262fc847f16897620d822b386634a --- /dev/null +++ b/data/model_data_json/unsloth_llama-3-8b-bnb-4bit.json @@ -0,0 +1,25 @@ +{ + "model_id": "unsloth/llama-3-8b-bnb-4bit", + "downloads": 76826, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "en", + "base_model:meta-llama/Meta-Llama-3-8B", + "base_model:quantized:meta-llama/Meta-Llama-3-8B", + "license:llama3", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- language: - en library_name: transformers license: llama3 tags: - llama-3 - llama - meta - facebook - unsloth - transformers base_model: - meta-llama/Meta-Llama-3-8B --- # Finetune Llama 3.2, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth! 
We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: # unsloth/Llama-3-8B-bnb-4bit For more details on the model, please go to Meta's original model card ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less |
| **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less |
| **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less |
| **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less |
| **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less |

- This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
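For readers who prefer code to notebooks, here is a hedged sketch of what those notebooks automate: QLoRA finetuning of this pre-quantized checkpoint with Unsloth plus TRL. The dataset path, LoRA settings and hyperparameters are illustrative placeholders, not the notebooks' exact recipe.

```python
# Hedged finetuning sketch with Unsloth (assumes unsloth, trl, datasets installed;
# values below are placeholders rather than the notebooks' tuned defaults).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='unsloth/llama-3-8b-bnb-4bit',
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',
                    'gate_proj', 'up_proj', 'down_proj'],
)

# Placeholder dataset: SFTTrainer trains on a text column.
dataset = load_dataset('json', data_files='train.jsonl', split='train')

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field='text',
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir='outputs',
    ),
)
trainer.train()
```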
## Special Thanks A huge thank you to the Meta and Llama team for creating and releasing these models. ## Model Details Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. **Model developers** Meta **Variations** Llama 3 comes in two sizes — 8B and 70B parameters — in pre-trained and instruction tuned variants. **Input** Models input text only. **Output** Models generate text and code only. **Model Architecture** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

| | Training Data | Params | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|
| Llama 3 | A new mix of publicly available online data. | 8B | 8k | Yes | 15T+ | March, 2023 |
| | | 70B | 8k | Yes | 15T+ | December, 2023 |
**Llama 3 family of models**. Token counts refer to pretraining data only. Both the 8 and 70B versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date** April 18, 2024. **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License** A custom commercial license is available at: **Where to send questions or comments about the model** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3 is intended for commercial and research use in English. Instruction tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3 Community License. Use in languages other than English**. **Note: Developers may fine-tune Llama 3 models for languages beyond English provided they comply with the Llama 3 Community License and the Acceptable Use Policy. ## How to use This repository contains two versions of Meta-Llama-3-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers See the snippet below for usage with Transformers: ### Use with Please follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging : For Hugging Face support, we recommend using transformers or TGI, but a similar command works. ## Hardware and Software **Training Factors** We used custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute. **Carbon Footprint** Pretraining utilized a cumulative 7.7M GPU hours of computation on hardware of type H100-80GB (TDP of 700W). Estimated total emissions were 2290 tCO2eq, 100% of which were offset by Meta’s sustainability program.
| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Llama 3 8B | 1.3M | 700 | 390 |
| Llama 3 70B | 6.4M | 700 | 1900 |
| Total | 7.7M | | 2290 |
**CO2 emissions during pre-training**. Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others. ## Training Data **Overview** Llama 3 was pretrained on over 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 10M human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data. **Data Freshness** The pretraining data has a cutoff of March 2023 for the 8B and December 2023 for the 70B models respectively. ## Benchmarks In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. For details on the methodology see here. ### Base pretrained models
| Category | Benchmark | Llama 3 8B | Llama2 7B | Llama2 13B | Llama 3 70B | Llama2 70B |
|---|---|---|---|---|---|---|
| General | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| | AGIEval English (3-5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| Reading comprehension | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |
### Instruction tuned models
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|
| MMLU (5-shot) | 68.4 | 34.1 | 47.8 | 82.0 | 52.9 |
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
### Responsibility & Safety We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community. Foundation models are widely capable technologies that are built to be used for a diverse range of applications. They are not designed to meet every developer preference on safety levels for all use cases, out-of-the-box, as those by their nature will differ across different applications. Rather, responsible LLM-application deployment is achieved by implementing a series of safety best practices throughout the development of such applications, from the model pre-training, fine-tuning and the deployment of systems composed of safeguards to tailor the safety needs specifically to the use case and audience. As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model and system level safety for their application. We also provide a set of resources including Meta Llama Guard 2 and Code Shield safeguards. These tools have proven to drastically reduce residual risks of LLM Systems, while maintaining a high level of helpfulness. We encourage developers to tune and deploy these safeguards according to their needs and we provide a reference implementation to get you started. #### Llama 3-Instruct As outlined in the Responsible Use Guide, some trade-off between model helpfulness and model alignment is likely unavoidable. Developers should exercise discretion about how to weigh the benefits of alignment and helpfulness for their specific use case and audience. Developers should be mindful of residual risks when using Llama models and leverage additional safety tools as needed to reach the right safety bar for their use case. **Safety** For our instruction tuned model, we conducted extensive red teaming exercises, performed adversarial evaluations and implemented safety mitigation techniques to lower residual risks. As with any Large Language Model, residual risks will likely remain and we recommend that developers assess these risks in the context of their use case. In parallel, we are working with the community to make AI safety benchmark standards transparent, rigorous and interpretable. **Refusals** In addition to residual risks, we put a great emphasis on model refusals to benign prompts. Over-refusing not only impacts the user experience but can even be harmful in certain contexts. We’ve heard the feedback from the developer community and improved our fine tuning to ensure that Llama 3 is significantly less likely to falsely refuse to answer prompts than Llama 2. We built internal benchmarks and developed mitigations to limit false refusals, making Llama 3 our most helpful model to date. #### Responsible release In addition to responsible use considerations outlined above, we followed a rigorous process that requires us to take extra measures against misuse and critical risks before we make our release decision. **Misuse** If you access or use Llama 3, you agree to the Acceptable Use Policy.
The most recent copy of this policy can be found at #### Critical risks CBRNE (Chemical, Biological, Radiological, Nuclear, and high yield Explosives) We have conducted a twofold assessment of the safety of the model in this area: * Iterative testing during model training to assess the safety of responses related to CBRNE threats and other adversarial risks. * Involving external CBRNE experts to conduct an uplift test assessing the ability of the model to accurately provide expert knowledge and reduce barriers to potential CBRNE misuse, by reference to what can be achieved using web search (without the model). ### Cyber Security We have evaluated Llama 3 with CyberSecEval, Meta’s cybersecurity safety eval suite, measuring Llama 3’s propensity to suggest insecure code when used as a coding assistant, and Llama 3’s propensity to comply with requests to help carry out cyber attacks, where attacks are defined by the industry standard MITRE ATT&CK cyber attack ontology. On our insecure coding and cyber attacker helpfulness tests, Llama 3 behaved in the same range or safer than models of equivalent coding capability. ### Child Safety Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership in AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3 is a new technology, and like any new technology, there are risks associated with its use.
Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3 models, developers should perform safety testing and tuning tailored to their specific applications of the model. As outlined in the Responsible Use Guide, we recommend incorporating Purple Llama solutions into your workflows and specifically Llama Guard which provides a base model to filter input and output prompts to layer system-level safety on top of model-level safety. Please see the Responsible Use Guide available at ## Citation instructions @article{llama3modelcard, title={Llama 3 Model Card}, author={AI@Meta}, year={2024}, url = { } ## Contributors Aaditya Singh; Aaron Grattafiori; Abhimanyu Dubey; Abhinav Jauhri; Abhinav Pandey; Abhishek Kadian; Adam Kelsey; Adi Gangidi; Ahmad Al-Dahle; Ahuva Goldstand; Aiesha Letman; Ajay Menon; Akhil Mathur; Alan Schelten; Alex Vaughan; Amy Yang; Andrei Lupu; Andres Alvarado; Andrew Gallagher; Andrew Gu; Andrew Ho; Andrew Poulton; Andrew Ryan; Angela Fan; Ankit Ramchandani; Anthony Hartshorn; Archi Mitra; Archie Sravankumar; Artem Korenev; Arun Rao; Ashley Gabriel; Ashwin Bharambe; Assaf Eisenman; Aston Zhang; Aurelien Rodriguez; Austen Gregerson; Ava Spataru; Baptiste Roziere; Ben Maurer; Benjamin Leonhardi; Bernie Huang; Bhargavi Paranjape; Bing Liu; Binh Tang; Bobbie Chern; Brani Stojkovic; Brian Fuller; Catalina Mejia Arenas; Chao Zhou; Charlotte Caucheteux; Chaya Nayak; Ching-Hsiang Chu; Chloe Bi; Chris Cai; Chris Cox; Chris Marra; Chris McConnell; Christian Keller; Christoph Feichtenhofer; Christophe Touret; Chunyang Wu; Corinne Wong; Cristian Canton Ferrer; Damien Allonsius; Daniel Kreymer; Daniel Haziza; Daniel Li; Danielle Pintz; Danny Livshits; Danny Wyatt; David Adkins; David Esiobu; David Xu; Davide Testuggine; Delia David; Devi Parikh; Dhruv Choudhary; Dhruv Mahajan; Diana Liskovich; Diego Garcia-Olano; Diego Perino; Dieuwke Hupkes; Dingkang Wang; Dustin Holland; Egor Lakomkin; Elina Lobanova; Xiaoqing Ellen Tan; Emily Dinan; Eric Smith; Erik Brinkman; Esteban Arcaute; Filip Radenovic; Firat Ozgenel; Francesco Caggioni; Frank Seide; Frank Zhang; Gabriel Synnaeve; Gabriella Schwarz; Gabrielle Lee; Gada Badeer; Georgia Anderson; Graeme Nail; Gregoire Mialon; Guan Pang; Guillem Cucurell; Hailey Nguyen; Hannah Korevaar; Hannah Wang; Haroun Habeeb; Harrison Rudolph; Henry Aspegren; Hu Xu; Hugo Touvron; Iga Kozlowska; Igor Molybog; Igor Tufanov; Iliyan Zarov; Imanol Arrieta Ibarra; Irina-Elena Veliche; Isabel Kloumann; Ishan Misra; Ivan Evtimov; Jacob Xu; Jade Copet; Jake Weissman; Jan Geffert; Jana Vranes; Japhet Asher; Jason Park; Jay Mahadeokar; Jean-Baptiste Gaya; Jeet Shah; Jelmer van der Linde; Jennifer Chan; Jenny Hong; Jenya Lee; Jeremy Fu; Jeremy Teboul; Jianfeng Chi; Jianyu Huang; Jie Wang; Jiecao Yu; Joanna Bitton; Joe Spisak; Joelle Pineau; Jon Carvill; Jongsoo Park; Joseph Rocca; Joshua Johnstun; Junteng Jia; Kalyan Vasuden Alwala; Kam Hou U; Kate Plawiak; Kartikeya Upasani; Kaushik Veeraraghavan; Ke Li; Kenneth Heafield; Kevin Stone; Khalid El-Arini; Krithika Iyer; Kshitiz Malik; Kuenley Chiu; Kunal Bhalla; Kyle Huang; Lakshya Garg; Lauren Rantala-Yeary; Laurens van der Maaten; Lawrence Chen; Leandro Silva; Lee Bell; Lei Zhang; Liang Tan; Louis Martin; Lovish 
Madaan; Luca Wehrstedt; Lukas Blecher; Luke de Oliveira; Madeline Muzzi; Madian Khabsa; Manav Avlani; Mannat Singh; Manohar Paluri; Mark Zuckerberg; Marcin Kardas; Martynas Mankus; Mathew Oldham; Mathieu Rita; Matthew Lennie; Maya Pavlova; Meghan Keneally; Melanie Kambadur; Mihir Patel; Mikayel Samvelyan; Mike Clark; Mike Lewis; Min Si; Mitesh Kumar Singh; Mo Metanat; Mona Hassan; Naman Goyal; Narjes Torabi; Nicolas Usunier; Nikolay Bashlykov; Nikolay Bogoychev; Niladri Chatterji; Ning Dong; Oliver Aobo Yang; Olivier Duchenne; Onur Celebi; Parth Parekh; Patrick Alrassy; Paul Saab; Pavan Balaji; Pedro Rittner; Pengchuan Zhang; Pengwei Li; Petar Vasic; Peter Weng; Polina Zvyagina; Prajjwal Bhargava; Pratik Dubal; Praveen Krishnan; Punit Singh Koura; Qing He; Rachel Rodriguez; Ragavan Srinivasan; Rahul Mitra; Ramon Calderer; Raymond Li; Robert Stojnic; Roberta Raileanu; Robin Battey; Rocky Wang; Rohit Girdhar; Rohit Patel; Romain Sauvestre; Ronnie Polidoro; Roshan Sumbaly; Ross Taylor; Ruan Silva; Rui Hou; Rui Wang; Russ Howes; Ruty Rinott; Saghar Hosseini; Sai Jayesh Bondu; Samyak Datta; Sanjay Singh; Sara Chugh; Sargun Dhillon; Satadru Pan; Sean Bell; Sergey Edunov; Shaoliang Nie; Sharan Narang; Sharath Raparthy; Shaun Lindsay; Sheng Feng; Sheng Shen; Shenghao Lin; Shiva Shankar; Shruti Bhosale; Shun Zhang; Simon Vandenhende; Sinong Wang; Seohyun Sonia Kim; Soumya Batra; Sten Sootla; Steve Kehoe; Suchin Gururangan; Sumit Gupta; Sunny Virk; Sydney Borodinsky; Tamar Glaser; Tamar Herman; Tamara Best; Tara Fowler; Thomas Georgiou; Thomas Scialom; Tianhe Li; Todor Mihaylov; Tong Xiao; Ujjwal Karn; Vedanuj Goswami; Vibhor Gupta; Vignesh Ramanathan; Viktor Kerkez; Vinay Satish Kumar; Vincent Gonguet; Vish Vogeti; Vlad Poenaru; Vlad Tiberiu Mihailescu; Vladan Petrovic; Vladimir Ivanov; Wei Li; Weiwei Chu; Wenhan Xiong; Wenyin Fu; Wes Bouaziz; Whitney Meers; Will Constable; Xavier Martinet; Xiaojian Wu; Xinbo Gao; Xinfeng Xie; Xuchao Jia; Yaelle Goldschlag; Yann LeCun; Yashesh Gaur; Yasmine Babaei; Ye Qi; Yenda Li; Yi Wen; Yiwen Song; Youngjin Nam; Yuchen Hao; Yuchen Zhang; Yun Wang; Yuning Mao; Yuzi He; Zacharie Delpierre Coudert; Zachary DeVito; Zahra Hankir; Zhaoduo Wen; Zheng Yan; Zhengxing Chen; Zhenyu Yang; Zoe Papakipos" +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_meta-Llama-3.1-8B-unsloth-bnb-4bit.json b/data/model_data_json/unsloth_meta-Llama-3.1-8B-unsloth-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..f55c86d98eaa35694d55f08f4744650ab496c082 --- /dev/null +++ b/data/model_data_json/unsloth_meta-Llama-3.1-8B-unsloth-bnb-4bit.json @@ -0,0 +1,27 @@ +{ + "model_id": "unsloth/meta-Llama-3.1-8B-unsloth-bnb-4bit", + "downloads": 87243, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "llama-3", + "meta", + "facebook", + "unsloth", + "en", + "arxiv:2204.05149", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:quantized:meta-llama/Llama-3.1-8B-Instruct", + "license:llama3.1", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- base_model: meta-llama/Llama-3.1-8B-Instruct language: - en library_name: transformers license: llama3.1 tags: - llama-3 - llama - meta - facebook - unsloth - transformers ---

See for versions of Llama 3.1 including 4-bit + 16-bit formats.

Finetune your own Reasoning model like R1 with Unsloth!

We have a free Google Colab notebook for turning Llama 3.1 (8B) into a reasoning model: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| **GRPO with Phi-4 (14B)** | ▶️ Start on Colab | 2x faster | 80% less |
| **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less |
| **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less |
| **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less |
| **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less |
| **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less |

- This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
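As a rough idea of what the reasoning-model recipe involves, here is a hedged sketch using TRL's GRPOTrainer: reward functions score sampled completions and the policy is updated toward higher-reward ones. The toy dataset, reward function and hyperparameters are illustrative placeholders; the free notebook's actual recipe differs.

```python
# Hedged GRPO sketch (assumes a recent trl release with GRPOTrainer; the
# dataset, reward and settings below are toy placeholders, not the notebook's).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a dataset with a 'prompt' column.
train_dataset = Dataset.from_dict({'prompt': ['Solve: 12 * 7 =', 'Solve: 9 + 15 =']})

def reward_has_digits(completions, **kwargs):
    # Illustrative reward: 1.0 when a completion contains a digit, else 0.0.
    return [1.0 if any(ch.isdigit() for ch in c) else 0.0 for c in completions]

trainer = GRPOTrainer(
    model='unsloth/meta-Llama-3.1-8B-unsloth-bnb-4bit',
    reward_funcs=reward_has_digits,
    args=GRPOConfig(output_dir='grpo-out', per_device_train_batch_size=2),
    train_dataset=train_dataset,
)
trainer.train()
```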
## Model Information The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. **Model developer**: Meta **Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

| | Training Data | Params | Input modalities | Output modalities | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 (text only) | A new mix of publicly available online data. | 8B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 70B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
| | | 405B | Multilingual Text | Multilingual Text and code | 128k | Yes | 15T+ | December 2023 |
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. **Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability. **Model Release Date:** July 23, 2024. **Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. **License:** A custom commercial license, the Llama 3.1 Community License, is available at: **Where to send questions or comments about the model** Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here. ## Intended Use **Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases. **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**. **Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages is done in a safe and responsible manner. ## How to use This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original codebase. ### Use with transformers Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function. Make sure to update your transformers installation via `pip install --upgrade transformers`. Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generations, quantised and more. ### Tool use with transformers LLaMA-3.1 supports multiple tool use formats. You can see a full guide to prompt formatting here. Tool use is also supported through chat templates in Transformers. Here is a quick example showing a single simple tool: You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so: and then call the tool and append the result, with the `tool` role, like so: After that, you can call `generate()` again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information, see the LLaMA prompt format docs and the Transformers tool use documentation.
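The card's tool-use snippets were lost in extraction; the following is a hedged reconstruction of the flow described above, using the Transformers chat-template API. The tool, its arguments and the stubbed result are illustrative, and accessing the gated meta-llama repository requires accepting its license.

```python
# Hedged reconstruction of the tool-calling flow (illustrative tool and values).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.1-8B-Instruct')

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The location to get the temperature for.
    """
    return 22.0  # stub; a real tool would call a weather service

messages = [{'role': 'user', 'content': 'What is the temperature in Paris?'}]

# Render a prompt that advertises the tool to the model.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_current_temperature],
    add_generation_prompt=True, tokenize=False,
)

# If the model emits a tool call, append it to the chat...
messages.append({
    'role': 'assistant',
    'tool_calls': [{'type': 'function', 'function': {
        'name': 'get_current_temperature',
        'arguments': {'location': 'Paris'}}}],
})
# ...then call the tool and append its result with the tool role,
# and re-render the prompt so the model can use the result.
messages.append({'role': 'tool', 'name': 'get_current_temperature', 'content': '22.0'})
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_current_temperature],
    add_generation_prompt=True, tokenize=False,
)
```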
### Use with Please follow the instructions in the repository. To download Original checkpoints, see the example command below leveraging : ## Hardware and Software **Training Factors** We used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure. **Training utilized a cumulative** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency. **Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
| | Training Time (GPU hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|
| Llama 3.1 8B | 1.46M | 700 | 420 | 0 |
| Llama 3.1 70B | 7.0M | 700 | 2,040 | 0 |
| Llama 3.1 405B | 30.84M | 700 | 8,930 | 0 |
| Total | 39.3M | | 11,390 | 0 |
The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others. ## Training Data **Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. **Data Freshness:** The pretraining data has a cutoff of December 2023. ## Benchmark scores In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library. ### Base pretrained models
| Category | Benchmark | # Shots | Metric | Llama 3 8B | Llama 3.1 8B | Llama 3 70B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc_char | 66.7 | 66.7 | 79.5 | 79.3 | 85.2 |
| | MMLU-Pro (CoT) | 5 | macro_avg/acc_char | 36.2 | 37.1 | 55.0 | 53.8 | 61.6 |
| | AGIEval English | 3-5 | average/acc_char | 47.1 | 47.8 | 63.0 | 64.6 | 71.6 |
| | CommonSenseQA | 7 | acc_char | 72.6 | 75.0 | 83.8 | 84.1 | 85.8 |
| | Winogrande | 5 | acc_char | - | 60.5 | - | 83.3 | 86.7 |
| | BIG-Bench Hard (CoT) | 3 | average/em | 61.1 | 64.2 | 81.3 | 81.6 | 85.9 |
| | ARC-Challenge | 25 | acc_char | 79.4 | 79.7 | 93.1 | 92.9 | 96.1 |
| Knowledge reasoning | TriviaQA-Wiki | 5 | em | 78.5 | 77.6 | 89.7 | 89.8 | 91.8 |
| Reading comprehension | SQuAD | 1 | em | 76.4 | 77.0 | 85.6 | 81.8 | 89.3 |
| | QuAC (F1) | 1 | f1 | 44.4 | 44.9 | 51.1 | 51.1 | 53.6 |
| | BoolQ | 0 | acc_char | 75.7 | 75.0 | 79.0 | 79.4 | 80.0 |
| | DROP (F1) | 3 | f1 | 58.4 | 59.5 | 79.7 | 79.6 | 84.8 |
### Instruction tuned models
| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| General | MMLU | 5 | macro_avg/acc | 68.5 | 69.4 | 82.0 | 83.6 | 87.3 |
| | MMLU (CoT) | 0 | macro_avg/acc | 65.3 | 73.0 | 80.9 | 86.0 | 88.6 |
| | MMLU-Pro (CoT) | 5 | micro_avg/acc_char | 45.5 | 48.3 | 63.4 | 66.4 | 73.3 |
| | IFEval | | | 76.8 | 80.4 | 82.9 | 87.5 | 88.6 |
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |
| | GPQA | 0 | em | 34.6 | 30.4 | 39.5 | 46.7 | 50.7 |
| Code | HumanEval | 0 | pass@1 | 60.4 | 72.6 | 81.7 | 80.5 | 89.0 |
| | MBPP ++ base version | 0 | pass@1 | 70.6 | 72.8 | 82.5 | 86.0 | 88.6 |
| | Multipl-E HumanEval | 0 | pass@1 | - | 50.8 | - | 65.5 | 75.2 |
| | Multipl-E MBPP | 0 | pass@1 | - | 52.4 | - | 62.0 | 65.7 |
| Math | GSM-8K (CoT) | 8 | em_maj1@1 | 80.6 | 84.5 | 93.0 | 95.1 | 96.8 |
| | MATH (CoT) | 0 | final_em | 29.1 | 51.9 | 51.0 | 68.0 | 73.8 |
| Tool Use | API-Bank | 0 | acc | 48.3 | 82.6 | 85.1 | 90.0 | 92.0 |
| | BFCL | 0 | acc | 60.3 | 76.1 | 83.0 | 84.8 | 88.5 |
| | Gorilla Benchmark API Bench | 0 | acc | 1.7 | 8.2 | 14.7 | 29.7 | 35.3 |
| | Nexus (0-shot) | 0 | macro_avg/acc | 18.1 | 38.5 | 47.8 | 56.7 | 58.7 |
| Multilingual | Multilingual MGSM (CoT) | 0 | em | - | 68.9 | - | 86.9 | 91.6 |
#### Multilingual benchmarks
| Category | Benchmark | Language | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| General | MMLU (5-shot, macro_avg/acc) | Portuguese | 62.12 | 80.13 | 84.95 |
| | | Spanish | 62.45 | 80.05 | 85.08 |
| | | Italian | 61.63 | 80.4 | 85.04 |
| | | German | 60.59 | 79.27 | 84.36 |
| | | French | 62.34 | 79.82 | 84.66 |
| | | Hindi | 50.88 | 74.52 | 80.31 |
| | | Thai | 50.32 | 72.95 | 78.21 |
## Responsibility & Safety As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks: * Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama. * Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm. * Provide protections for the community to help prevent the misuse of our models. ### Responsible deployment Llama is a foundational technology designed to be used in a variety of use cases, examples on how Meta’s Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models enabling the world to benefit from the technology power, by aligning our model safety for the generic use cases addressing a standard set of harms. Developers are then in the driver seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide, you can refer to the Responsible Use Guide to learn more. #### Llama 3.1 instruct Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented please read the Llama 3 paper. **Fine-tuning data** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. **Refusals and Tone** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines. #### Llama 3.1 systems **Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.** Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementations demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box. #### New capabilities Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases. 
**Tool-use**: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third-party services they use, so that they are aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third-party safeguards. **Multilinguality**: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing fine-tuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide. ### Evaluations We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure the safety risks of systems for the most commonly built applications, including chatbots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building dedicated evaluation datasets for your use case. Prompt Guard and Code Shield are also available if relevant to the application. Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks, including long context, multilingual use, tool calls, coding, and memorization. **Red teaming** For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets. ### Critical and other risks We specifically focused our efforts on mitigating the following critical risk areas: **1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness** To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons. **2. Child Safety** Child Safety risk assessments were conducted using a team of experts to assess the model’s capability to produce outputs that could result in Child Safety risks, and to inform any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development.
For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content, while taking into account market-specific nuances and experiences. **3. Cyber attack enablement** Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B’s social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns. Please read our Llama 3.1 Cyber security whitepaper to learn more. ### Community Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our Github repository. We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community. ## Ethical Considerations and Limitations The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.", + "model_explanation_gemini": "A 4-bit quantized version of Meta's Llama-3.1-8B-Instruct model optimized for efficient fine-tuning into reasoning models using Unsloth." +} \ No newline at end of file diff --git a/data/model_data_json/unsloth_mistral-7b-instruct-v0.3-bnb-4bit.json b/data/model_data_json/unsloth_mistral-7b-instruct-v0.3-bnb-4bit.json new file mode 100644 index 0000000000000000000000000000000000000000..fb1b75a622f254dc49d3520dc8deda1041133f34 --- /dev/null +++ b/data/model_data_json/unsloth_mistral-7b-instruct-v0.3-bnb-4bit.json @@ -0,0 +1,27 @@ +{ + "model_id": "unsloth/mistral-7b-instruct-v0.3-bnb-4bit", + "downloads": 209237, + "tags": [ + "transformers", + "safetensors", + "mistral", + "text-generation", + "unsloth", + "mistral-7b", + "mistral-instruct", + "instruct", + "conversational", + "en", + "base_model:mistralai/Mistral-7B-Instruct-v0.3", + "base_model:quantized:mistralai/Mistral-7B-Instruct-v0.3", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "4-bit", + "bitsandbytes", + "region:us" + ], + "description": "--- language: - en library_name: transformers license: apache-2.0 tags: - unsloth - transformers - mistral - mistral-7b - mistral-instruct - instruct base_model: mistralai/Mistral-7B-Instruct-v0.3 --- # Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth! We have a Google Colab Tesla T4 notebook for Mistral v3 7b here: For conversational ShareGPT style and using Mistral v3 Instruct: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Phi-3.5 (mini)** | ▶️ Start on Colab | 2x faster | 50% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | | **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less | - This conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.", + "model_explanation_gemini": "An optimized 4-bit quantized version of Mistral-7B-Instruct-v0.3 for faster and more memory-efficient fine-tuning of conversational AI models." 
+} \ No newline at end of file diff --git a/data/model_data_json/unsloth_phi-4.json b/data/model_data_json/unsloth_phi-4.json new file mode 100644 index 0000000000000000000000000000000000000000..8034a32dc9ffd8fc9dc216a26af99baa046eb115 --- /dev/null +++ b/data/model_data_json/unsloth_phi-4.json @@ -0,0 +1,29 @@ +{ + "model_id": "unsloth/phi-4", + "downloads": 93353, + "tags": [ + "transformers", + "safetensors", + "llama", + "text-generation", + "phi", + "phi4", + "unsloth", + "nlp", + "math", + "code", + "chat", + "conversational", + "en", + "arxiv:2412.08905", + "base_model:microsoft/phi-4", + "base_model:finetune:microsoft/phi-4", + "license:mit", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit license_link: language: - en pipeline_tag: text-generation tags: - phi - phi4 - unsloth - nlp - math - code - chat - conversational base_model: microsoft/phi-4 library_name: transformers --- ## ***See our collection for versions of Phi-4 including GGUF, 4-bit & more formats.*** # unsloth/Phi-4 We have converted Phi-4 to Llama's architecture for improved ease of use, better fine-tuning, and greater accuracy. Also contains Unsloth's Phi-4 bugfixes. # Finetune Phi-4, Llama 3.3 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Phi-4 here: ## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click \"Run All\", and you'll get a 2x faster finetuned model. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Phi-4** | ▶️ Start on Colab | 2x faster | 50% less | | **Llama-3.2 (3B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Llama-3.2 (11B vision)** | ▶️ Start on Colab | 2x faster | 60% less | | **Qwen2 VL (7B)** | ▶️ Start on Colab | 1.8x faster | 60% less | | **Qwen2.5 (7B)** | ▶️ Start on Colab | 2x faster | 60% less | | **Llama-3.1 (8B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Gemma 2 (9B)** | ▶️ Start on Colab | 2.4x faster | 58% less | | **Mistral (7B)** | ▶️ Start on Colab | 2.2x faster | 62% less | - This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates. - This text completion notebook is for raw text. This DPO notebook replicates Zephyr. - \\* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. # Phi-4 Model Details Phi-4 Technical Report ## Model Summary | | | |-------------------------|-------------------------------------------------------------------------------| | **Developers** | Microsoft Research | | **Description** | Phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

Phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. | | **Architecture** | 14B parameters, dense decoder-only Transformer model | | **Inputs** | Text, best suited for prompts in the chat format | | **Context length** | 16K tokens | | **GPUs** | 1920 H100-80G | | **Training time** | 21 days | | **Training data** | 9.8T tokens | | **Outputs** | Generated text in response to input | | **Dates** | October 2024 – November 2024 | | **Status** | Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data | | **Release date** | December 12, 2024 | | **License** | MIT | ## Intended Use | | | |-------------------------------|-------------------------------------------------------------------------| | **Primary Use Cases** | Our model is designed to accelerate research on language models, for use as a building block for generative AI powered features. It provides uses for general purpose AI systems and applications (primarily in English) which require:

1. Memory/compute constrained environments.
2. Latency bound scenarios.
3. Reasoning and logic. | | **Out-of-Scope Use Cases** | Our model is not specifically designed or evaluated for all downstream purposes, thus:

1. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios.
2. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case, including the model’s focus on English.
3. Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under. | ## Data Overview ### Training Datasets Our training data is an extension of the data used for Phi-3 and includes a wide variety of sources from: 1. Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code. 2. Newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (science, daily activities, theory of mind, etc.). 3. Acquired academic books and Q&A datasets. 4. High quality chat format supervised data covering various topics to reflect human preferences on different aspects such as instruct-following, truthfulness, honesty and helpfulness. Multilingual data constitutes about 8% of our overall data. We are focusing on the quality of data that could potentially improve the reasoning ability for the model, and we filter the publicly available documents to contain the correct level of knowledge. #### Benchmark datasets We evaluated Phi-4 using OpenAI’s SimpleEval and our own internal benchmarks to understand the model’s capabilities, more specifically: * **MMLU:** Popular aggregated dataset for multitask language understanding. * **MATH:** Challenging competition math problems. * **GPQA:** Complex, graduate-level science questions. * **DROP:** Complex comprehension and reasoning. * **MGSM:** Multi-lingual grade-school math. * **HumanEval:** Functional code generation. * **SimpleQA:** Factual responses. ## Safety ### Approach Phi-4 has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated synthetic datasets. The overall technique employed to do the safety alignment is a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization), including publicly available datasets focusing on helpfulness and harmlessness as well as various questions and answers targeted to multiple safety categories. ### Safety Evaluation and Red-Teaming Prior to release, Phi-4 followed a multi-faceted evaluation approach. Quantitative evaluation was conducted with multiple open-source safety benchmarks and in-house tools utilizing adversarial conversation simulation. For qualitative safety evaluation, we collaborated with the independent AI Red Team (AIRT) at Microsoft to assess safety risks posed by Phi-4 in both average and adversarial user scenarios. In the average user scenario, AIRT emulated typical single-turn and multi-turn interactions to identify potentially risky behaviors. The adversarial user scenario tested a wide range of techniques aimed at intentionally subverting the model’s safety training, including jailbreaks, encoding-based attacks, multi-turn attacks, and adversarial suffix attacks. Please refer to the technical report for more details on safety alignment. ## Model Quality To understand the capabilities, we compare Phi-4 with a set of models over OpenAI’s SimpleEval benchmark; the table below gives a high-level overview of the model quality on representative benchmarks.
For the table below, higher numbers indicate better performance:

| **Category** | **Benchmark** | **phi-4** (14B) | **phi-3** (14B) | **Qwen 2.5** (14B instruct) | **GPT-4o-mini** | **Llama-3.3** (70B instruct) | **Qwen 2.5** (72B instruct) | **GPT-4o** |
|------------------------------|---------------|-----------|-----------------|----------------------|----------------------|--------------------|-------------------|-----------------|
| Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | **88.1** |
| Science | GPQA | **56.1** | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6 |
| Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | **90.4** |
| | MATH | **80.4** | 44.6 | 75.6 | 73.0 | 66.3* | 80.0 | 74.6 |
| Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9* | 80.4 | **90.6** |
| Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | **39.4** |
| Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | **90.2** | 76.7 | 80.9 |

\* These scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following. We use the simple-evals framework because it is reproducible, but Meta reports 77 for MATH and 88 for HumanEval on Llama-3.3-70B. ## Usage ### Input Formats Given the nature of the training data, Phi-4 is best suited for prompts using the chat format as follows: ### With `transformers` ## Responsible AI Considerations Like other language models, Phi-4 can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include: * **Quality of Service:** The model is trained primarily on English text. Languages other than English will experience worse performance. English language varieties with less representation in the training data might experience worse performance than standard American English. Phi-4 is not intended to support multilingual use. * **Representation of Harms & Perpetuation of Stereotypes:** These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases. * **Inappropriate or Offensive Content:** These models may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case. * **Information Reliability:** Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated. * **Limited Scope for Code:** The majority of training data is based in Python and uses common packages such as `typing`, `math`, `random`, `collections`, `datetime`, `itertools`. If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. Developers should apply responsible AI best practices and are responsible for ensuring that a specific use case complies with relevant laws and regulations (e.g. privacy, trade, etc.). Using safety services like Azure AI Content Safety that have advanced guardrails is highly recommended. Important areas for consideration include: * **Allocation:** Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (ex: housing, employment, credit, etc.) without further assessments and additional debiasing techniques. * **High-Risk Scenarios:** Developers should assess suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (ex: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context. * **Misinformation:** Models may produce inaccurate information.
Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG). * **Generation of Harmful Content:** Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case. * **Misuse:** Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.", + "model_explanation_gemini": "An optimized version of Microsoft's Phi-4 model converted to Llama's architecture for faster fine-tuning, improved accuracy, and efficient text generation in conversational, coding, and mathematical tasks." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_1.json b/data/model_data_json/unslothai_1.json new file mode 100644 index 0000000000000000000000000000000000000000..d097e9102e87ac03747dbd7419a0eba91b1fd08d --- /dev/null +++ b/data/model_data_json/unslothai_1.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/1", + "downloads": 4956270, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "Unable to determine core functionality or purpose due to insufficient description." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_2.json b/data/model_data_json/unslothai_2.json new file mode 100644 index 0000000000000000000000000000000000000000..dbbe2831e4f4dcf4983fed41205ba427f79f88c6 --- /dev/null +++ b/data/model_data_json/unslothai_2.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/2", + "downloads": 262041, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "Unable to determine core functionality or purpose due to insufficient description." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_aws.json b/data/model_data_json/unslothai_aws.json new file mode 100644 index 0000000000000000000000000000000000000000..eb946a53931722574558d082661a2e8e48377aac --- /dev/null +++ b/data/model_data_json/unslothai_aws.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/aws", + "downloads": 1889062, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- {} --- We log statistics to see if any envs are breaking", + "model_explanation_gemini": "Logs statistics to monitor environment stability." 
+} \ No newline at end of file diff --git a/data/model_data_json/unslothai_azure.json b/data/model_data_json/unslothai_azure.json new file mode 100644 index 0000000000000000000000000000000000000000..fc01d18f68eef191687686dffe91e3bd20586c10 --- /dev/null +++ b/data/model_data_json/unslothai_azure.json @@ -0,0 +1,14 @@ +{ + "model_id": "unslothai/azure", + "downloads": 79899, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- {} --- We log statistics to see if any envs are breaking" +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_colab.json b/data/model_data_json/unslothai_colab.json new file mode 100644 index 0000000000000000000000000000000000000000..5485014c0784181ec0ce24bd9eccc30d97f2e0fd --- /dev/null +++ b/data/model_data_json/unslothai_colab.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/colab", + "downloads": 376685, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "The model lacks a description or tags indicating its functionality or purpose." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_colabpro.json b/data/model_data_json/unslothai_colabpro.json new file mode 100644 index 0000000000000000000000000000000000000000..a1c6f812bc17b433bae073682a2d0dd0bfb8620e --- /dev/null +++ b/data/model_data_json/unslothai_colabpro.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/colabpro", + "downloads": 263194, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- {} --- We log statistics to see if any envs are breaking", + "model_explanation_gemini": "Logs statistics to monitor environment stability." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_lambda.json b/data/model_data_json/unslothai_lambda.json new file mode 100644 index 0000000000000000000000000000000000000000..da6a30970b9ef5eface6f0ec75ed3c617ac61acd --- /dev/null +++ b/data/model_data_json/unslothai_lambda.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/lambda", + "downloads": 459725, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- {} --- We log statistics to see if any envs are breaking", + "model_explanation_gemini": "Logs statistics to monitor for environment failures." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_other.json b/data/model_data_json/unslothai_other.json new file mode 100644 index 0000000000000000000000000000000000000000..eb82514faf75b1b27e2ab8151df67f2630241b3c --- /dev/null +++ b/data/model_data_json/unslothai_other.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/other", + "downloads": 2197736, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "Unable to determine core functionality or purpose due to insufficient description." 
+} \ No newline at end of file diff --git a/data/model_data_json/unslothai_repeat.json b/data/model_data_json/unslothai_repeat.json new file mode 100644 index 0000000000000000000000000000000000000000..dd9cc779bc94031857a258f3fd9961870c46259e --- /dev/null +++ b/data/model_data_json/unslothai_repeat.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/repeat", + "downloads": 1865537, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "The description provided does not contain any information about the model's functionality, purpose, or task." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_runpod.json b/data/model_data_json/unslothai_runpod.json new file mode 100644 index 0000000000000000000000000000000000000000..17078e84c4b238c64a5cb35d69fe239b18b84209 --- /dev/null +++ b/data/model_data_json/unslothai_runpod.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/runpod", + "downloads": 83811, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- {} --- We log statistics to see if any envs are breaking", + "model_explanation_gemini": "Logs statistics to monitor environment stability." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_vram-16.json b/data/model_data_json/unslothai_vram-16.json new file mode 100644 index 0000000000000000000000000000000000000000..2cf13f90627cd5777cf19f8bafcc6a17cea109f9 --- /dev/null +++ b/data/model_data_json/unslothai_vram-16.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/vram-16", + "downloads": 1208325, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "The model's description does not provide any information about its functionality, purpose, or task." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_vram-24.json b/data/model_data_json/unslothai_vram-24.json new file mode 100644 index 0000000000000000000000000000000000000000..35eaf7c7229335bb726cecf56c14bc945ebe2752 --- /dev/null +++ b/data/model_data_json/unslothai_vram-24.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/vram-24", + "downloads": 1656100, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "Insufficient information to determine the model's purpose or functionality." 
+} \ No newline at end of file diff --git a/data/model_data_json/unslothai_vram-40.json b/data/model_data_json/unslothai_vram-40.json new file mode 100644 index 0000000000000000000000000000000000000000..c1812ea97c9c5e4ea8d8177a9520f35c5a08be50 --- /dev/null +++ b/data/model_data_json/unslothai_vram-40.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/vram-40", + "downloads": 502237, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "The model's purpose or functionality cannot be determined from the provided description due to lack of details." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_vram-48.json b/data/model_data_json/unslothai_vram-48.json new file mode 100644 index 0000000000000000000000000000000000000000..c9c0ef1126273c10de65ffcaa3f5d8f702b6c3e6 --- /dev/null +++ b/data/model_data_json/unslothai_vram-48.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/vram-48", + "downloads": 1268843, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "The model lacks a description or tags indicating its specific purpose or functionality." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_vram-8.json b/data/model_data_json/unslothai_vram-8.json new file mode 100644 index 0000000000000000000000000000000000000000..87652882e19f55aaf76e1843835f19d129e5a4df --- /dev/null +++ b/data/model_data_json/unslothai_vram-8.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/vram-8", + "downloads": 316334, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "Insufficient information to determine the model's purpose or functionality." +} \ No newline at end of file diff --git a/data/model_data_json/unslothai_vram-80.json b/data/model_data_json/unslothai_vram-80.json new file mode 100644 index 0000000000000000000000000000000000000000..a921fde3145e22722e86048d1e218f1330f8d1a3 --- /dev/null +++ b/data/model_data_json/unslothai_vram-80.json @@ -0,0 +1,15 @@ +{ + "model_id": "unslothai/vram-80", + "downloads": 333987, + "tags": [ + "transformers", + "safetensors", + "llama", + "feature-extraction", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers tags: [] ---", + "model_explanation_gemini": "The model lacks a description or tags indicating its functionality or purpose." 
+} \ No newline at end of file diff --git a/data/model_data_json/valhalla_distilbart-mnli-12-1.json b/data/model_data_json/valhalla_distilbart-mnli-12-1.json new file mode 100644 index 0000000000000000000000000000000000000000..e7b598d4b9641511498004cb7881c33866b84eed --- /dev/null +++ b/data/model_data_json/valhalla_distilbart-mnli-12-1.json @@ -0,0 +1,20 @@ +{ + "model_id": "valhalla/distilbart-mnli-12-1", + "downloads": 188149, + "tags": [ + "transformers", + "pytorch", + "jax", + "bart", + "text-classification", + "distilbart", + "distilbart-mnli", + "zero-shot-classification", + "dataset:mnli", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- datasets: - mnli tags: - distilbart - distilbart-mnli pipeline_tag: zero-shot-classification --- # DistilBart-MNLI distilbart-mnli is the distilled version of bart-large-mnli created using the **No Teacher Distillation** technique proposed for BART summarisation by Huggingface, here. We just copy alternating layers from bart-large-mnli and finetune more on the same data. | | matched acc | mismatched acc | | ------------------------------------------------------------------------------------ | ----------- | -------------- | | bart-large-mnli (baseline, 12-12) | 89.9 | 90.01 | | distilbart-mnli-12-1 | 87.08 | 87.5 | | distilbart-mnli-12-3 | 88.1 | 88.19 | | distilbart-mnli-12-6 | 89.19 | 89.01 | | distilbart-mnli-12-9 | 89.56 | 89.52 | This is a very simple and effective technique, as we can see the performance drop is very little. Detailed performance trade-offs will be posted in this sheet. ## Fine-tuning If you want to train these models yourself, clone the distillbart-mnli repo and follow the steps below: clone and install transformers from source, download the MNLI data, create the student model, and start fine-tuning. You can find the logs of these trained models in this wandb project.", + "model_explanation_gemini": "Performs zero-shot classification by predicting text labels without task-specific training, distilled from BART-large-MNLI with minimal performance drop." +} \ No newline at end of file diff --git a/data/model_data_json/vaughankraska_resnet-18-16rank.json b/data/model_data_json/vaughankraska_resnet-18-16rank.json new file mode 100644 index 0000000000000000000000000000000000000000..4eca8e8a88e473bf422c68389fda2dab62c6169b --- /dev/null +++ b/data/model_data_json/vaughankraska_resnet-18-16rank.json @@ -0,0 +1,12 @@ +{ + "model_id": "vaughankraska/resnet-18-16rank", + "downloads": 264526, + "tags": [ + "safetensors", + "resnet", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 ---", + "model_explanation_gemini": "ResNet-18 model with 16-rank compression for efficient image classification tasks."
+} \ No newline at end of file diff --git a/data/model_data_json/vblagoje_bert-english-uncased-finetuned-pos.json b/data/model_data_json/vblagoje_bert-english-uncased-finetuned-pos.json new file mode 100644 index 0000000000000000000000000000000000000000..a065496d070f14d3db6f6702c40d2c59312912e2 --- /dev/null +++ b/data/model_data_json/vblagoje_bert-english-uncased-finetuned-pos.json @@ -0,0 +1,16 @@ +{ + "model_id": "vblagoje/bert-english-uncased-finetuned-pos", + "downloads": 77652, + "tags": [ + "transformers", + "pytorch", + "jax", + "safetensors", + "bert", + "token-classification", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "## Part-of-Speech (PoS) Tags Below are the Part-of-Speech (PoS) tags used in the model: | **Tag** | **Meaning** | **Examples** | |-----------|------------------------------------------------------|--------------------------------| | ADP | Adposition (prepositions or postpositions) | in, on, by | | ADJ | Adjective | significant, global | | ADV | Adverb | quickly, often | | AUX | Auxiliary verb | is, was | | CCONJ | Coordinating conjunction | and, but | | DET | Determiner | the, a | | INTJ | Interjection | oh, wow | | NOUN | Noun | man, city | | NUM | Number | one, 2022 | | PART | Particle | 's, to | | PRON | Pronoun | he, which | | PROPN | Proper noun | Neil Armstrong, Paris | | PUNCT | Punctuation mark | ,, . | | SCONJ | Subordinating conjunction | because, although | | SYM | Symbol | $, % | | VERB | Verb | run, is | | X | Other (generally words that do not fit into other categories) | [not defined] |" +} \ No newline at end of file diff --git a/data/model_data_json/vectara_hallucination_evaluation_model.json b/data/model_data_json/vectara_hallucination_evaluation_model.json new file mode 100644 index 0000000000000000000000000000000000000000..861f6bbcbe5dc674544ae4998d2cefce1fe3142e --- /dev/null +++ b/data/model_data_json/vectara_hallucination_evaluation_model.json @@ -0,0 +1,23 @@ +{ + "model_id": "vectara/hallucination_evaluation_model", + "downloads": 205623, + "tags": [ + "transformers", + "safetensors", + "HHEMv2Config", + "text-classification", + "custom_code", + "en", + "arxiv:2205.12854", + "arxiv:2401.00396", + "arxiv:2303.15621", + "base_model:google/flan-t5-base", + "base_model:finetune:google/flan-t5-base", + "doi:10.57967/hf/3240", + "license:apache-2.0", + "autotrain_compatible", + "region:us" + ], + "description": "--- language: en license: apache-2.0 base_model: google/flan-t5-base pipeline_tag: text-classification --- In Loving memory of Simon Mark Hughes... **Highlights**: * HHEM-2.1-Open shows a significant improvement over HHEM-1.0. * HHEM-2.1-Open outperforms GPT-3.5-Turbo and even GPT-4. * HHEM-2.1-Open can be run on consumer-grade hardware, occupying less than 600MB RAM space at 32-bit precision and elapsing around 1.5 seconds for a 2k-token input on a modern x86 CPU. > HHEM-2.1-Open introduces breaking changes to the usage. Please update your code according to the new usage below. We are working on making it compatible with HuggingFace's Inference Endpoint. We apologize for the inconvenience. HHEM-2.1-Open is a major upgrade to HHEM-1.0-Open created by Vectara in November 2023. The HHEM model series are designed for detecting hallucinations in LLMs.
They are particularly useful in the context of building retrieval-augmented-generation (RAG) applications where a set of facts is summarized by an LLM, and HHEM can be used to measure the extent to which this summary is factually consistent with the facts. If you are interested in learning more about RAG or experimenting with Vectara, you can sign up for a Vectara account. **Try out HHEM-2.1-Open from your browser without coding** ## Hallucination Detection 101 By \"hallucinated\" or \"factually inconsistent\", we mean that a text (hypothesis, to be judged) is not supported by another text (evidence/premise, given). You **always need two** pieces of text to determine whether a text is hallucinated or not. When applied to RAG (retrieval augmented generation), the LLM is provided with several pieces of text (often called facts or context) retrieved from some dataset, and a hallucination would indicate that the summary (hypothesis) is not supported by those facts (evidence). A common type of hallucination in RAG is **factual but hallucinated**. For example, given the premise _\"The capital of France is Berlin\"_, the hypothesis _\"The capital of France is Paris\"_ is hallucinated -- although it is true according to world knowledge. This happens when LLMs do not generate content based on the textual data provided to them as part of the RAG retrieval process, but rather generate content based on their pre-trained knowledge. Additionally, hallucination detection is \"asymmetric\" or is not commutative. For example, the hypothesis _\"I visited Iowa\"_ is considered hallucinated given the premise _\"I visited the United States\"_, but the reverse is consistent. ## Using HHEM-2.1-Open > HHEM-2.1 has some breaking changes from HHEM-1.0. Your code that works with HHEM-1 (November 2023) will not work anymore. While we are working on backward compatibility, please follow the new usage instructions below. Here we provide several ways to use HHEM-2.1-Open in the `transformers` library. > You may run into a warning message that \"Token indices sequence length is longer than the specified maximum sequence length\". Please ignore it; it is inherited from the foundation model, T5-base. ### Using with `AutoModel` This is the most end-to-end and out-of-the-box way to use HHEM-2.1-Open. It takes a list of pairs of (premise, hypothesis) as the input and returns a score between 0 and 1 for each pair, where 0 means that the hypothesis is not evidenced at all by the premise and 1 means the hypothesis is fully supported by the premise. ### Using with `pipeline` In the popular `pipeline` class of the `transformers` library, you have to manually prepare the data using the prompt template with which we trained the model. HHEM-2.1-Open has two output neurons, corresponding to the labels `hallucinated` and `consistent` respectively. In the example below, we will ask `pipeline` to return the scores for both labels (by setting `top_k=None`, formerly `return_all_scores=True`) and then extract the score for the `consistent` label. Of course, with `pipeline`, you can also get the most likely label, or the label with the highest score, by setting `top_k=1`. ## HHEM-2.1-Open vs. HHEM-1.0 The major difference between HHEM-2.1-Open and the original HHEM-1.0 is that HHEM-2.1-Open has an unlimited context length, while HHEM-1.0 is capped at 512 tokens. The longer context length allows HHEM-2.1-Open to provide more accurate hallucination detection for RAG, which often needs more than 512 tokens. The tables below compare the two models on the AggreFact and RAGTruth benchmarks, as well as GPT-3.5-Turbo and GPT-4.
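Before turning to the benchmark details, here is a minimal sketch of the `AutoModel` usage described above, assuming the repository's custom model code (hence `trust_remote_code=True`) and its `predict` entry point; treat the method name and return type as assumptions to verify against the model card:

```python
from transformers import AutoModelForSequenceClassification

# (premise, hypothesis) pairs: a score near 1 means the hypothesis is
# supported by the premise; a score near 0 means it is hallucinated.
pairs = [
    ("The capital of France is Berlin.", "The capital of France is Paris."),
    ("I visited the United States.", "I visited Iowa."),
]

# trust_remote_code loads the custom HHEMv2 model class from the repository.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

scores = model.predict(pairs)  # assumed API: one consistency score per pair
print(scores)
```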
In particular, on AggreFact, we focus on its SOTA subset (denoted as AggreFact-SOTA), which contains summaries generated by Google's T5, Meta's BART, and Google's Pegasus, the three latest models in the AggreFact benchmark. The results on RAGTruth's summarization (denoted as RAGTruth-Summ) and QA (denoted as RAGTruth-QA) subsets are reported separately. The GPT-3.5-Turbo and GPT-4 versions are 01-25 and 06-13 respectively. The zero-shot results of the two GPT models were obtained using the prompt template in this paper. Table 1: Performance on AggreFact-SOTA | model | Balanced Accuracy | F1 | Recall | Precision | |:------------------------|---------:|-------:|-------:|----------:| | HHEM-1.0 | 78.87% | 90.47% | 70.81% | 67.27% | | HHEM-2.1-Open | 76.55% | 66.77% | 68.48% | 65.13% | | GPT-3.5-Turbo zero-shot | 72.19% | 60.88% | 58.48% | 63.49% | | GPT-4 06-13 zero-shot | 73.78% | 63.87% | 53.03% | 80.28% | Table 2: Performance on RAGTruth-Summ | model | Balanced Accuracy | F1 | Recall | Precision | |:----------------------|---------:|-----------:|----------:|----------:| | HHEM-1.0 | 53.36% | 15.77% | 9.31% | 51.35% | | HHEM-2.1-Open | 64.42% | 44.83% | 31.86% | 75.58% | | GPT-3.5-Turbo zero-shot | 58.49% | 29.72% | 18.14% | 82.22% | | GPT-4 06-13 zero-shot | 62.62% | 40.59% | 26.96% | 82.09% | Table 3: Performance on RAGTruth-QA | model | Balanced Accuracy | F1 | Recall | Precision | |:----------------------|---------:|-----------:|----------:|----------:| | HHEM-1.0 | 52.58% | 19.40% | 16.25% | 24.07% | | HHEM-2.1-Open | 74.28% | 60.00% | 54.38% | 66.92% | | GPT-3.5-Turbo zero-shot | 56.16% | 25.00% | 18.13% | 40.28% | | GPT-4 06-13 zero-shot | 74.11% | 57.78% | 56.88% | 58.71% | The tables above show that HHEM-2.1-Open has a significant improvement over HHEM-1.0 in the RAGTruth-Summ and RAGTruth-QA benchmarks, while it has a slight decrease in the AggreFact-SOTA benchmark. However, when interpreting these results, please note that AggreFact-SOTA is evaluated on relatively older types of LLMs: - LLMs in AggreFact-SOTA: T5, BART, and Pegasus; - LLMs in RAGTruth: GPT-4-0613, GPT-3.5-turbo-0613, Llama-2-7B/13B/70B-chat, and Mistral-7B-instruct. ## HHEM-2.1-Open vs. GPT-3.5-Turbo and GPT-4 From the tables above we can also conclude that HHEM-2.1-Open outperforms both GPT-3.5-Turbo and GPT-4 in all three benchmarks. The quantitative advantage of HHEM-2.1-Open over GPT-3.5-Turbo and GPT-4 is summarized in Table 4 below. Table 4: Percentage points of HHEM-2.1-Open's balanced accuracies over GPT-3.5-Turbo and GPT-4 | | AggreFact-SOTA | RAGTruth-Summ | RAGTruth-QA | |:----------------------|---------:|-----------:|----------:| | HHEM-2.1-Open **over** GPT-3.5-Turbo | 4.36% | 5.93% | 18.12% | | HHEM-2.1-Open **over** GPT-4 | 2.64% | 1.80% | 0.17% | Another advantage of HHEM-2.1-Open is its efficiency. HHEM-2.1-Open can be run on consumer-grade hardware, occupying less than 600MB RAM space at 32-bit precision and elapsing around 1.5 seconds for a 2k-token input on a modern x86 CPU. ## HHEM-2.1: The more powerful, proprietary counterpart of HHEM-2.1-Open As you may have already sensed from the name, HHEM-2.1-Open is the open source version of the premium HHEM-2.1. HHEM-2.1 (without the `-Open`) is offered exclusively via Vectara's RAG-as-a-service platform. The major difference between HHEM-2.1 and HHEM-2.1-Open is that HHEM-2.1 is cross-lingual on three languages: English, German, and French, while HHEM-2.1-Open is English-only.
\"Cross-lingual\" means any combination of the three languages, e.g., documents in German, query in English, results in French. ### Why RAG in Vectara? Vectara provides a Trusted Generative AI platform. The platform allows organizations to rapidly create an AI assistant experience which is grounded in the data, documents, and knowledge that they have. Vectara's serverless RAG-as-a-Service also solves critical problems required for enterprise adoption, namely: reduces hallucination, provides explainability / provenance, enforces access control, allows for real-time updatability of the knowledge, and mitigates intellectual property / bias concerns from large language models. To start benefiting from HHEM-2.1, you can sign up for a Vectara account, and you will get the HHEM-2.1 score returned with every query automatically. Here are some additional resources: 1. Vectara API documentation. 2. Quick start using Forrest's package. 3. Learn more about Vectara's Boomerang embedding model, Slingshot reranker, and Mockingbird LLM ## LLM Hallucination Leaderboard If you want to stay up to date with results of the latest tests using this model to evaluate the top LLM models, we have a public leaderboard that is periodically updated, and results are also available on the GitHub repository. # Cite this model", + "model_explanation_gemini": "Detects hallucinations in LLM-generated text by evaluating factual consistency between a given premise and hypothesis, scoring their alignment from 0 to 1." +} \ No newline at end of file diff --git a/data/model_data_json/vennify_t5-base-grammar-correction.json b/data/model_data_json/vennify_t5-base-grammar-correction.json new file mode 100644 index 0000000000000000000000000000000000000000..baec0b8b1968f2e1afa50c03c5ff589b775bf75f --- /dev/null +++ b/data/model_data_json/vennify_t5-base-grammar-correction.json @@ -0,0 +1,21 @@ +{ + "model_id": "vennify/t5-base-grammar-correction", + "downloads": 87330, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "grammar", + "en", + "dataset:jfleg", + "arxiv:1702.04066", + "license:cc-by-nc-sa-4.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en tags: - grammar - text2text-generation license: cc-by-nc-sa-4.0 datasets: - jfleg --- # T5 Grammar Correction This model generates a revised version of inputted text with the goal of containing fewer grammatical errors. It was trained with Happy Transformer using a dataset called JFLEG. Here's a full article on how to train a similar model. ## Usage", + "model_explanation_gemini": "Generates grammatically corrected versions of input text to reduce errors." 
+} \ No newline at end of file diff --git a/data/model_data_json/vidore_colpali-v1.3.json b/data/model_data_json/vidore_colpali-v1.3.json new file mode 100644 index 0000000000000000000000000000000000000000..20cc2c371d332bd19a51a13e882d4fbc869494ba --- /dev/null +++ b/data/model_data_json/vidore_colpali-v1.3.json @@ -0,0 +1,21 @@ +{ + "model_id": "vidore/colpali-v1.3", + "downloads": 81111, + "tags": [ + "colpali", + "safetensors", + "vidore", + "vidore-experimental", + "visual-document-retrieval", + "en", + "dataset:vidore/colpali_train_set", + "arxiv:2004.12832", + "arxiv:2407.01449", + "arxiv:2106.09685", + "base_model:vidore/colpaligemma-3b-pt-448-base", + "base_model:finetune:vidore/colpaligemma-3b-pt-448-base", + "license:mit", + "region:us" + ], + "description": "--- license: mit library_name: colpali base_model: vidore/colpaligemma-3b-pt-448-base language: - en tags: - vidore - vidore-experimental datasets: - vidore/colpali_train_set pipeline_tag: visual-document-retrieval --- # ColPali: Visual Retriever based on PaliGemma-3B with ColBERT strategy ## This version is trained with 256 batch size for 3 epochs on the same data as the original ColPali model. ColPali is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features. It is a PaliGemma-3B extension that generates ColBERT- style multi-vector representations of text and images. It was introduced in the paper ColPali: Efficient Document Retrieval with Vision Language Models and first released in this repository

4.46.1. ## Limitations - **Focus**: The model primarily focuses on PDF-type documents and high-resource languages, potentially limiting its generalization to other document types or less-represented languages. - **Support**: The model relies on multi-vector retrieval derived from the ColBERT late interaction mechanism, which may require engineering efforts to adapt to widely used vector retrieval frameworks that lack native multi-vector support. ## License ColPali's vision language backbone model (PaliGemma) is under the Gemma license. The adapters attached to the model are under the MIT license. ## Contact - Manuel Faysse: manuel.faysse@illuin.tech - Hugues Sibille: hugues.sibille@illuin.tech - Tony Wu: tony.wu@illuin.tech ## Citation If you use any datasets or models from this organization in your research, please cite the original dataset as follows:", + "model_explanation_gemini": "Generates multi-vector representations of text and images for efficient visual document retrieval, based on PaliGemma-3B with ColBERT-style indexing." +} \ No newline at end of file diff --git a/data/model_data_json/vikhyatk_moondream2.json b/data/model_data_json/vikhyatk_moondream2.json new file mode 100644 index 0000000000000000000000000000000000000000..4f3afced8bd3045c5872bd61cddc5b11a63ea027 --- /dev/null +++ b/data/model_data_json/vikhyatk_moondream2.json @@ -0,0 +1,20 @@ +{ + "model_id": "vikhyatk/moondream2", + "downloads": 198710, + "tags": [ + "transformers", + "safetensors", + "gguf", + "moondream1", + "text-generation", + "image-text-to-text", + "custom_code", + "doi:10.57967/hf/3219", + "license:apache-2.0", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 pipeline_tag: image-text-to-text --- Moondream is a small vision language model designed to run efficiently everywhere. Website / Demo / GitHub This repository contains the latest (**2025-04-14**) release of Moondream, as well as historical releases. The model is updated frequently, so we recommend specifying a revision as shown below if you're using it in a production application. ### Usage ### Changelog **2025-04-15** (full release notes) 1. Improved chart understanding (ChartQA up from 74.8 to 77.5, 82.2 with PoT) 2. Added temperature and nucleus sampling to reduce repetitive outputs 3. Better OCR for documents and tables (prompt with “Transcribe the text” or “Transcribe the text in natural reading order”) 4. Object detection supports document layout detection (figure, formula, text, etc.) 5. UI understanding (ScreenSpot F1\\@0.5 up from 53.3 to 60.3) 6. Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3) **2025-03-27** (full release notes) 1. Added support for long-form captioning 2. Open vocabulary image tagging 3. Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4) 4. Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2) 5. Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2) 6. Fixed token streaming bug affecting multi-byte unicode characters 7. gpt-fast style now supported in HF Transformers implementation", + "model_explanation_gemini": "Moondream2 is a compact vision-language model optimized for efficient image-text-to-text tasks, including chart understanding, OCR, object detection, and document layout analysis, with frequent updates for improved performance." 
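The Moondream card above recommends pinning a revision "as shown below", but the snippet was stripped; a minimal sketch of revision pinning (the dated tag "2025-04-14" mirrors the release named in the card and is an assumption):

```python
from transformers import AutoModelForCausalLM

# Pin a dated release tag so production code does not silently pick up a
# newer checkpoint; the repo ships custom modeling code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-04-14",  # assumed tag name, matching the release date above
    trust_remote_code=True,
)
```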
+} \ No newline at end of file diff --git a/data/model_data_json/vikp_layout_segmenter.json b/data/model_data_json/vikp_layout_segmenter.json new file mode 100644 index 0000000000000000000000000000000000000000..19ef541c042d019b89a471964eda7365a32beef6 --- /dev/null +++ b/data/model_data_json/vikp_layout_segmenter.json @@ -0,0 +1,15 @@ +{ + "model_id": "vikp/layout_segmenter", + "downloads": 459950, + "tags": [ + "transformers", + "pytorch", + "layoutlmv3", + "token-classification", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "Segments pdf page layout into blocks. Based on layoutlmv3. Used in marker.", + "model_explanation_gemini": "Segments PDF page layouts into blocks using LayoutLMv3." +} \ No newline at end of file diff --git a/data/model_data_json/vikp_surya_det.json b/data/model_data_json/vikp_surya_det.json new file mode 100644 index 0000000000000000000000000000000000000000..d2227cdc15ec8257b52724e80ba86d0f1acc8702 --- /dev/null +++ b/data/model_data_json/vikp_surya_det.json @@ -0,0 +1,14 @@ +{ + "model_id": "vikp/surya_det", + "downloads": 109242, + "tags": [ + "transformers", + "safetensors", + "segformer", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-sa-4.0 --- Line detection model for surya. See repo for details.", + "model_explanation_gemini": "Detects lines in documents for the 'surya' project." +} \ No newline at end of file diff --git a/data/model_data_json/vikp_surya_det3.json b/data/model_data_json/vikp_surya_det3.json new file mode 100644 index 0000000000000000000000000000000000000000..a99976f1f665eaad794ac5b9a5b9bac83c9e3eae --- /dev/null +++ b/data/model_data_json/vikp_surya_det3.json @@ -0,0 +1,16 @@ +{ + "model_id": "vikp/surya_det3", + "downloads": 335202, + "tags": [ + "transformers", + "safetensors", + "efficientvit", + "base_model:vikp/surya_det3", + "base_model:finetune:vikp/surya_det3", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- base_model: vikp/line_detector_3 model-index: - name: line_detector_3 results: [] license: cc-by-nc-sa-4.0 --- Text detection model for surya", + "model_explanation_gemini": "Detects text lines in documents for the 'surya' application." +} \ No newline at end of file diff --git a/data/model_data_json/vikp_surya_order.json b/data/model_data_json/vikp_surya_order.json new file mode 100644 index 0000000000000000000000000000000000000000..2c79a1afa6393d0e006fa5de6977e20c28b8aacf --- /dev/null +++ b/data/model_data_json/vikp_surya_order.json @@ -0,0 +1,14 @@ +{ + "model_id": "vikp/surya_order", + "downloads": 103716, + "tags": [ + "transformers", + "safetensors", + "vision-encoder-decoder", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-nc-sa-4.0 --- Reading order model for surya.", + "model_explanation_gemini": "Determines reading order for text in documents." 
+} \ No newline at end of file diff --git a/data/model_data_json/vikp_surya_rec2.json b/data/model_data_json/vikp_surya_rec2.json new file mode 100644 index 0000000000000000000000000000000000000000..acd0d32c1120ef9d2c1b6c7edd3b6959709f0893 --- /dev/null +++ b/data/model_data_json/vikp_surya_rec2.json @@ -0,0 +1,14 @@ +{ + "model_id": "vikp/surya_rec2", + "downloads": 231358, + "tags": [ + "transformers", + "safetensors", + "vision-encoder-decoder", + "license:cc-by-nc-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- library_name: transformers license: cc-by-nc-sa-4.0 --- OCR model for surya", + "model_explanation_gemini": "Performs OCR (optical character recognition) for the surya toolkit." +} \ No newline at end of file diff --git a/data/model_data_json/vikp_texify.json b/data/model_data_json/vikp_texify.json new file mode 100644 index 0000000000000000000000000000000000000000..1769ed9172266b2ac541d7295887acf8c209e03f --- /dev/null +++ b/data/model_data_json/vikp_texify.json @@ -0,0 +1,15 @@ +{ + "model_id": "vikp/texify", + "downloads": 182142, + "tags": [ + "transformers", + "safetensors", + "vision-encoder-decoder", + "image-text-to-text", + "license:cc-by-sa-4.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: cc-by-sa-4.0 --- OCRs equation images and text to LaTeX. See texify.", + "model_explanation_gemini": "Converts equation images and text into LaTeX format via OCR." +} \ No newline at end of file diff --git a/data/model_data_json/vinai_bartpho-syllable-base.json b/data/model_data_json/vinai_bartpho-syllable-base.json new file mode 100644 index 0000000000000000000000000000000000000000..bfd9d56bf87a9539d1e1a81055be59edd2a97c23 --- /dev/null +++ b/data/model_data_json/vinai_bartpho-syllable-base.json @@ -0,0 +1,16 @@ +{ + "model_id": "vinai/bartpho-syllable-base", + "downloads": 1302818, + "tags": [ + "transformers", + "pytorch", + "mbart", + "feature-extraction", + "arxiv:2109.09701", + "license:mit", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit --- # BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese The pre-trained model is the \"base\" variant of BARTpho, which uses the \"base\" architecture and pre-training scheme of the sequence-to-sequence denoising model BART. The general architecture and experimental results of BARTpho can be found in our paper: @article{bartpho, title = {{BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese}}, author = {Nguyen Luong Tran and Duong Minh Le and Dat Quoc Nguyen}, journal = {arXiv preprint}, volume = {arXiv:2109.09701}, year = {2021} } **Please CITE** our paper when BARTpho is used to help produce published results or incorporated into other software. For further information or requests, please go to BARTpho's homepage!", + "model_explanation_gemini": "BARTpho is a pre-trained sequence-to-sequence model for Vietnamese, based on BART's base architecture, designed for denoising tasks." 
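The BARTpho card above is tagged for feature extraction; a minimal sketch of pulling features with plain transformers (the Vietnamese sample line is a hypothetical input, and the exact call pattern is an assumption):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "vinai/bartpho-syllable-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
bartpho = AutoModel.from_pretrained(model_id)

line = "Chúng tôi là những nghiên cứu viên."  # hypothetical sample ("We are researchers.")
inputs = tokenizer(line, return_tensors="pt")
with torch.no_grad():
    outputs = bartpho(**inputs)  # seq2seq model: last_hidden_state is the decoder side
print(outputs.last_hidden_state.shape)
```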
+} \ No newline at end of file diff --git a/data/model_data_json/vinai_phobert-base.json b/data/model_data_json/vinai_phobert-base.json new file mode 100644 index 0000000000000000000000000000000000000000..f60b2a66dfb117f4e0064793526c7a27efc94975 --- /dev/null +++ b/data/model_data_json/vinai_phobert-base.json @@ -0,0 +1,20 @@ +{ + "model_id": "vinai/phobert-base", + "downloads": 339519, + "tags": [ + "transformers", + "pytorch", + "tf", + "jax", + "roberta", + "fill-mask", + "vi", + "arxiv:2003.00744", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit language: - vi --- # PhoBERT: Pre-trained language models for Vietnamese Pre-trained PhoBERT models are the state-of-the-art language models for Vietnamese (Pho, i.e. \"Phở\", is a popular food in Vietnam): - Two PhoBERT versions of \"base\" and \"large\" are the first public large-scale monolingual language models pre-trained for Vietnamese. PhoBERT pre-training approach is based on RoBERTa which optimizes the BERT pre-training procedure for more robust performance. - PhoBERT outperforms previous monolingual and multilingual approaches, obtaining new state-of-the-art performances on four downstream Vietnamese NLP tasks of Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. The general architecture and experimental results of PhoBERT can be found in our EMNLP-2020 Findings paper: @article{phobert, title = {{PhoBERT: Pre-trained language models for Vietnamese}}, author = {Dat Quoc Nguyen and Anh Tuan Nguyen}, journal = {Findings of EMNLP}, year = {2020} } **Please CITE** our paper when PhoBERT is used to help produce published results or is incorporated into other software. For further information or requests, please go to PhoBERT's homepage!", + "model_explanation_gemini": "PhoBERT is a pre-trained monolingual language model for Vietnamese, optimized for robust performance on NLP tasks like POS tagging, dependency parsing, named-entity recognition, and natural language inference." +} \ No newline at end of file diff --git a/data/model_data_json/vinid_plip.json b/data/model_data_json/vinid_plip.json new file mode 100644 index 0000000000000000000000000000000000000000..04e14843d88e33e0959e6fde683885387a5457ff --- /dev/null +++ b/data/model_data_json/vinid_plip.json @@ -0,0 +1,14 @@ +{ + "model_id": "vinid/plip", + "downloads": 137415, + "tags": [ + "transformers", + "pytorch", + "clip", + "zero-shot-image-classification", + "endpoints_compatible", + "region:us" + ], + "description": "--- {} --- ## Model Use (from CLIP Model Card) ### Intended Use The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. #### Primary intended uses The primary intended users of these models are AI researchers. We primarily imagine the model will be used by researchers to better understand robustness, generalization, and other capabilities, biases, and constraints of computer vision models. ### Out-of-Scope Use Cases **Any** deployed use case of the model - whether commercial or not - is currently out of scope. 
Non-deployed use cases, such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task-specific testing, especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English-language use cases. # Disclaimer Please be advised that this function has been developed in compliance with the Twitter policy of data usage and sharing. It is important to note that the results obtained from this function are not intended to constitute medical advice or replace consultation with a qualified medical professional. The use of this function is solely at your own risk and should be consistent with applicable laws, regulations, and ethical considerations. We do not warrant or guarantee the accuracy, completeness, suitability, or usefulness of this function for any particular purpose, and we hereby disclaim any liability arising from any reliance placed on this function or any results obtained from its use. If you wish to review the original Twitter post, you should access the source page directly on Twitter. # Privacy In accordance with the privacy and control policy of Twitter, we hereby declare that the data redistributed by us shall comprise only Tweet IDs. The Tweet IDs will be employed to establish a linkage with the original Twitter post, as long as the original post is still accessible. The hyperlink will cease to function if the user deletes the original post. It is important to note that all tweets displayed on our service have already been classified as non-sensitive by Twitter. It is strictly prohibited to redistribute any content apart from the Tweet IDs. Any distribution carried out must adhere to the laws and regulations applicable in your jurisdiction, including export control laws and embargoes.", + "model_explanation_gemini": "Enables AI researchers to explore zero-shot, arbitrary image classification for studying model robustness, generalization, biases, and constraints." 
+} \ No newline at end of file diff --git a/data/model_data_json/w11wo_indonesian-roberta-base-posp-tagger.json b/data/model_data_json/w11wo_indonesian-roberta-base-posp-tagger.json new file mode 100644 index 0000000000000000000000000000000000000000..ee213bfd06d1489f52ef8cbe6889ebed7e82df5d --- /dev/null +++ b/data/model_data_json/w11wo_indonesian-roberta-base-posp-tagger.json @@ -0,0 +1,25 @@ +{ + "model_id": "w11wo/indonesian-roberta-base-posp-tagger", + "downloads": 2061380, + "tags": [ + "transformers", + "pytorch", + "tf", + "tensorboard", + "safetensors", + "roberta", + "token-classification", + "generated_from_trainer", + "ind", + "dataset:indonlu", + "base_model:flax-community/indonesian-roberta-base", + "base_model:finetune:flax-community/indonesian-roberta-base", + "license:mit", + "model-index", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit base_model: flax-community/indonesian-roberta-base tags: - generated_from_trainer datasets: - indonlu language: - ind metrics: - precision - recall - f1 - accuracy model-index: - name: indonesian-roberta-base-posp-tagger results: - task: name: Token Classification type: token-classification dataset: name: indonlu type: indonlu config: posp split: test args: posp metrics: - name: Precision type: precision value: 0.9625100240577386 - name: Recall type: recall value: 0.9625100240577386 - name: F1 type: f1 value: 0.9625100240577386 - name: Accuracy type: accuracy value: 0.9625100240577386 --- # indonesian-roberta-base-posp-tagger This model is a fine-tuned version of flax-community/indonesian-roberta-base on the indonlu dataset. It achieves the following results on the evaluation set: - Loss: 0.1395 - Precision: 0.9625 - Recall: 0.9625 - F1: 0.9625 - Accuracy: 0.9625 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 16 - eval_batch_size: 16 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 10 ### Training results | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy | |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:| | No log | 1.0 | 420 | 0.2254 | 0.9313 | 0.9313 | 0.9313 | 0.9313 | | 0.4398 | 2.0 | 840 | 0.1617 | 0.9499 | 0.9499 | 0.9499 | 0.9499 | | 0.1566 | 3.0 | 1260 | 0.1431 | 0.9569 | 0.9569 | 0.9569 | 0.9569 | | 0.103 | 4.0 | 1680 | 0.1412 | 0.9605 | 0.9605 | 0.9605 | 0.9605 | | 0.0723 | 5.0 | 2100 | 0.1408 | 0.9635 | 0.9635 | 0.9635 | 0.9635 | | 0.051 | 6.0 | 2520 | 0.1408 | 0.9642 | 0.9642 | 0.9642 | 0.9642 | | 0.051 | 7.0 | 2940 | 0.1510 | 0.9635 | 0.9635 | 0.9635 | 0.9635 | | 0.0368 | 8.0 | 3360 | 0.1653 | 0.9645 | 0.9645 | 0.9645 | 0.9645 | | 0.0277 | 9.0 | 3780 | 0.1664 | 0.9644 | 0.9644 | 0.9644 | 0.9644 | | 0.0231 | 10.0 | 4200 | 0.1668 | 0.9646 | 0.9646 | 0.9646 | 0.9646 | ### Framework versions - Transformers 4.37.2 - Pytorch 2.2.0+cu118 - Datasets 2.16.1 - Tokenizers 0.15.1", + "model_explanation_gemini": "An Indonesian RoBERTa-based model fine-tuned for token classification, specifically part-of-speech tagging, achieving high precision, recall, F1, and accuracy on the indonlu dataset." 
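Since the POS tagger above is a standard token-classification fine-tune, inference reduces to a pipeline call; a minimal sketch (the aggregation strategy and the Indonesian sample sentence are assumptions):

```python
from transformers import pipeline

# Grouped word-piece predictions give one POS tag per word.
pos_tagger = pipeline(
    "token-classification",
    model="w11wo/indonesian-roberta-base-posp-tagger",
    aggregation_strategy="simple",
)
print(pos_tagger("Budi sedang pergi ke pasar."))  # hypothetical sample sentence
```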
+} \ No newline at end of file diff --git a/data/model_data_json/watt-ai_watt-tool-8B.json b/data/model_data_json/watt-ai_watt-tool-8B.json new file mode 100644 index 0000000000000000000000000000000000000000..b7f525b9792f6b3bc165c253d7ff5a173162b42b --- /dev/null +++ b/data/model_data_json/watt-ai_watt-tool-8B.json @@ -0,0 +1,19 @@ +{ + "model_id": "watt-ai/watt-tool-8B", + "downloads": 95886, + "tags": [ + "safetensors", + "llama", + "function-calling", + "tool-use", + "bfcl", + "en", + "arxiv:2406.14868", + "base_model:meta-llama/Llama-3.1-8B-Instruct", + "base_model:finetune:meta-llama/Llama-3.1-8B-Instruct", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en base_model: - meta-llama/Llama-3.1-8B-Instruct tags: - function-calling - tool-use - llama - bfcl --- # watt-tool-8B watt-tool-8B is a fine-tuned language model based on LLaMa-3.1-8B-Instruct, optimized for tool usage and multi-turn dialogue. It achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL). ## Model Description This model is specifically designed to excel at complex tool usage scenarios that require multi-turn interactions, making it ideal for empowering platforms like Lupan, an AI-powered workflow building tool. By leveraging a carefully curated and optimized dataset, watt-tool-8B demonstrates superior capabilities in understanding user requests, selecting appropriate tools, and effectively utilizing them across multiple turns of conversation. Target application: AI workflow building, as in Lupan and Coze. ## Key Features * **Enhanced Tool Usage:** Fine-tuned for precise and efficient tool selection and execution. * **Multi-Turn Dialogue:** Optimized for maintaining context and effectively utilizing tools across multiple turns of conversation, enabling more complex task completion. * **State-of-the-Art Performance:** Achieves top performance on the BFCL, demonstrating its capabilities in function calling and tool usage. ## Training Methodology watt-tool-8B is trained using supervised fine-tuning on a specialized dataset designed for tool usage and multi-turn dialogue. We use CoT techniques to synthesize high-quality multi-turn dialogue data. The training process is inspired by the principles outlined in the paper: \"Direct Multi-Turn Preference Optimization for Language Agents\". We use SFT and DMPO to further enhance the model's performance in multi-turn agent tasks. ## How to Use ```python from transformers import AutoModelForCausalLM, AutoTokenizer model_id = \"watt-ai/watt-tool-8B\" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype='auto', device_map=\"auto\") # Example usage (adapt as needed for your specific tool usage scenario) system_prompt = \"\"\"You are an expert in composing functions. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose. If none of the function can be used, point it out. If the given question lacks the parameters required by the function, also point it out. You should only return the function call in tools call sections. If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] You SHOULD NOT include any other text in the response. 
Here is a list of functions in JSON format that you can invoke.\\n{functions}\\n \"\"\" # User query query = \"Find me the sales growth rate for company XYZ for the last 3 years and also the interest coverage ratio for the same duration.\" tools = [ { \"name\": \"financial_ratios.interest_coverage\", \"description\": \"Calculate a company's interest coverage ratio given the company name and duration\", \"arguments\": { \"type\": \"dict\", \"properties\": { \"company_name\": { \"type\": \"string\", \"description\": \"The name of the company.\" }, \"years\": { \"type\": \"integer\", \"description\": \"Number of past years to calculate the ratio.\" } }, \"required\": [\"company_name\", \"years\"] } }, { \"name\": \"sales_growth.calculate\", \"description\": \"Calculate a company's sales growth rate given the company name and duration\", \"arguments\": { \"type\": \"dict\", \"properties\": { \"company\": { \"type\": \"string\", \"description\": \"The company that you want to get the sales growth rate for.\" }, \"years\": { \"type\": \"integer\", \"description\": \"Number of past years for which to calculate the sales growth rate.\" } }, \"required\": [\"company\", \"years\"] } }, { \"name\": \"weather_forecast\", \"description\": \"Retrieve a weather forecast for a specific location and time frame.\", \"arguments\": { \"type\": \"dict\", \"properties\": { \"location\": { \"type\": \"string\", \"description\": \"The city that you want to get the weather for.\" }, \"days\": { \"type\": \"integer\", \"description\": \"Number of days for the forecast.\" } }, \"required\": [\"location\", \"days\"] } } ] messages = [ {'role': 'system', 'content': system_prompt.format(functions=tools)}, {'role': 'user', 'content': query} ] inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\").to(model.device) outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id) print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))", + "model_explanation_gemini": "Fine-tuned for complex tool usage and multi-turn dialogue, this model excels at selecting and executing appropriate tools across conversational turns, optimized for AI workflow building applications." 
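For the query above, a response that follows the system prompt would be a bare call list, e.g. [sales_growth.calculate(company="XYZ", years=3), financial_ratios.interest_coverage(company_name="XYZ", years=3)] (a hypothetical illustration of the mandated format); the unrelated weather_forecast tool is correctly left unused. Any surrounding prose in the output indicates the prompt or the tool schema was not followed.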
+} \ No newline at end of file diff --git a/data/model_data_json/xbgoose_hubert-large-speech-emotion-recognition-russian-dusha-finetuned.json b/data/model_data_json/xbgoose_hubert-large-speech-emotion-recognition-russian-dusha-finetuned.json new file mode 100644 index 0000000000000000000000000000000000000000..90d9caeea8e44eaea85f299f94293057546b030a --- /dev/null +++ b/data/model_data_json/xbgoose_hubert-large-speech-emotion-recognition-russian-dusha-finetuned.json @@ -0,0 +1,24 @@ +{ + "model_id": "xbgoose/hubert-large-speech-emotion-recognition-russian-dusha-finetuned", + "downloads": 106105, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "hubert", + "audio-classification", + "SER", + "speech", + "audio", + "russian", + "ru", + "dataset:xbgoose/dusha", + "base_model:facebook/hubert-large-ls960-ft", + "base_model:finetune:facebook/hubert-large-ls960-ft", + "license:apache-2.0", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: - ru tags: - SER - speech - audio - russian license: apache-2.0 pipeline_tag: audio-classification base_model: facebook/hubert-large-ls960-ft datasets: - xbgoose/dusha --- # HuBERT fine-tuned on the DUSHA dataset for speech emotion recognition in the Russian language The pre-trained model is this one - facebook/hubert-large-ls960-ft The DUSHA dataset used can be found here # Fine-tuning Fine-tuned in Google Colab using a Pro account with an A100 GPU Froze all layers except the projector, the classifier, and all 24 HubertEncoderLayerStableLayerNorm layers Used half of the training dataset # Training parameters - 2 epochs - train batch size = 8 - eval batch size = 8 - gradient accumulation steps = 4 - learning rate = 5e-5 without warm-up and decay # Metrics Achieved - accuracy = 0.86 - balanced accuracy = 0.76 - macro F1 score = 0.81 on the test set, improving accuracy and F1 score compared to the dataset baseline # Usage", + "model_explanation_gemini": "Recognizes emotions in Russian speech by fine-tuning HuBERT on the DUSHA dataset for audio classification." +} \ No newline at end of file diff --git a/data/model_data_json/xey_sldr_flux_nsfw_v2-studio.json b/data/model_data_json/xey_sldr_flux_nsfw_v2-studio.json new file mode 100644 index 0000000000000000000000000000000000000000..8fae6b238119de5c4940d189673aafc1c9306d03 --- /dev/null +++ b/data/model_data_json/xey_sldr_flux_nsfw_v2-studio.json @@ -0,0 +1,17 @@ +{ + "model_id": "xey/sldr_flux_nsfw_v2-studio", + "downloads": 96600, + "tags": [ + "diffusers", + "text-to-image", + "template:diffusion-lora", + "lora", + "flux", + "flux dev", + "base_model:black-forest-labs/FLUX.1-dev", + "base_model:adapter:black-forest-labs/FLUX.1-dev", + "region:us" + ], + "description": "--- tags: - text-to-image - template:diffusion-lora - lora - diffusers - flux - flux dev base_model: black-forest-labs/FLUX.1-dev --- Source", + "model_explanation_gemini": "Generates NSFW images from text prompts using a LoRA adaptation of the FLUX.1-dev diffusion model." 
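Returning to the Russian speech-emotion-recognition card above, whose Usage section lost its code: a minimal sketch via the audio-classification pipeline (16 kHz mono input is an assumption inherited from the HuBERT base model, and the file name is hypothetical):

```python
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="xbgoose/hubert-large-speech-emotion-recognition-russian-dusha-finetuned",
)
# Input should be 16 kHz mono audio (assumption from facebook/hubert-large-ls960-ft).
print(classifier("speech_sample.wav"))  # hypothetical local file
```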
+} \ No newline at end of file diff --git a/data/model_data_json/xinsir_controlnet-union-sdxl-1.0.json b/data/model_data_json/xinsir_controlnet-union-sdxl-1.0.json new file mode 100644 index 0000000000000000000000000000000000000000..57e3a045a2e4c15e441d70640fdd7d9456203633 --- /dev/null +++ b/data/model_data_json/xinsir_controlnet-union-sdxl-1.0.json @@ -0,0 +1,17 @@ +{ + "model_id": "xinsir/controlnet-union-sdxl-1.0", + "downloads": 147794, + "tags": [ + "diffusers", + "safetensors", + "Text-to-Image", + "ControlNet", + "Diffusers", + "Stable Diffusion", + "text-to-image", + "license:apache-2.0", + "region:us" + ], + "description": "--- license: apache-2.0 tags: - Text-to-Image - ControlNet - Diffusers - Stable Diffusion pipeline_tag: text-to-image --- # **ControlNet++: All-in-one ControlNet for image generation and editing!** ## **The ProMax model has been released!! 12 control types + 5 advanced editing features, just try it!!!** ## Network Architecture ## Advantages of the model - Uses bucket training like NovelAI, so it can generate high-resolution images of any aspect ratio - Uses a large amount of high-quality data (over 10,000,000 images); the dataset covers a diversity of situations - Uses re-captioned prompts like DALL-E 3, with CogVLM generating detailed descriptions, for good prompt-following ability - Uses many useful tricks during training, including but not limited to data augmentation, multiple losses, and multiple resolutions - Uses almost the same parameters as the original ControlNet, with no obvious increase in network parameters or computation - Supports 10+ control conditions, with no obvious performance drop on any single condition compared with training independently - Supports multi-condition generation; condition fusion is learned during training, with no need to set hyperparameters or design prompts - Compatible with other open-source SDXL models, such as BluePencilXL and CounterfeitXL, and compatible with other LoRA models ***We designed a new architecture that supports 10+ control types in conditional text-to-image generation and can generate high-resolution images visually comparable with Midjourney***. The network is based on the original ControlNet architecture; we propose two new modules to: 1. extend the original ControlNet to support different image conditions using the same network parameters; 2. support multiple condition inputs without increasing computational overhead, which is especially important for designers who want to edit images in detail; different conditions use the same condition encoder, without adding extra computation or parameters. We conducted thorough experiments on SDXL and achieved superior performance in both control ability and aesthetic score. We release the method and the model to the open-source community so that everyone can enjoy it. Inference scripts and more details can be found: **If you find it useful, please give me a star, thank you very much** **The SDXL ProMax version has been released!!! Enjoy it!!!** **I am sorry that, because the project's revenue and expenditure are difficult to balance, the GPU resources have been assigned to other projects that are more likely to be profitable, and SD3 training is stopped until I find enough GPU support; I will try my best to find GPUs to continue training. If this brings you inconvenience, I sincerely apologize for that. I want to thank everyone who likes this project; your support is what keeps me going** Note: we put the ProMax model, with a promax suffix, in the same Hugging Face model repo; detailed instructions will be added later. 
## Advanced editing features in the ProMax model ### Tile Deblur ### Tile Variation ### Tile Super Resolution The following example shows upscaling from 1M resolution --> 9M resolution

\"Image \"Image
\"Image \"Image
### Image Inpainting ### Image Outpainting ## Visual Examples ### Openpose ### Depth ### Canny ### Lineart ### AnimeLineart ### Mlsd ### Scribble ### Hed ### Pidi (Softedge) ### Teed ### Segment ### Normal ## Multi Control Visual Examples ### Openpose + Canny ### Openpose + Depth ### Openpose + Scribble ### Openpose + Normal ### Openpose + Segment", + "model_explanation_gemini": "An all-in-one ControlNet model for SDXL that supports 10+ control conditions and advanced image editing features like deblurring, super-resolution, and inpainting while maintaining compatibility with other SDXL models." +} \ No newline at end of file diff --git a/data/model_data_json/xlnet_xlnet-base-cased.json b/data/model_data_json/xlnet_xlnet-base-cased.json new file mode 100644 index 0000000000000000000000000000000000000000..c41aa0f56ba7b97cf71d71c925e2f7f94af1b975 --- /dev/null +++ b/data/model_data_json/xlnet_xlnet-base-cased.json @@ -0,0 +1,22 @@ +{ + "model_id": "xlnet/xlnet-base-cased", + "downloads": 147597, + "tags": [ + "transformers", + "pytorch", + "tf", + "rust", + "xlnet", + "text-generation", + "en", + "dataset:bookcorpus", + "dataset:wikipedia", + "arxiv:1906.08237", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: en license: mit datasets: - bookcorpus - wikipedia --- # XLNet (base-sized model) XLNet model pre-trained on the English language. It was introduced in the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang et al. and first released in this repository. Disclaimer: The team releasing XLNet did not write a model card for this model, so this model card has been written by the Hugging Face team. ## Model description XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks including question answering, natural language inference, sentiment analysis, and document ranking. ## Intended uses & limitations The model is mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. 
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at models like GPT2. ## Usage Here is how to use this model to get the features of a given text in PyTorch: ### BibTeX entry and citation info", + "model_explanation_gemini": "XLNet-base-cased is a pre-trained English language model using Transformer-XL and permutation language modeling, designed for fine-tuning on tasks like sequence classification, token classification, and question answering." +} \ No newline at end of file diff --git a/data/model_data_json/xtuner_llava-llama-3-8b-v1_1-transformers.json b/data/model_data_json/xtuner_llava-llama-3-8b-v1_1-transformers.json new file mode 100644 index 0000000000000000000000000000000000000000..071618a77e4e3951252c9192c57ea95573c6861b --- /dev/null +++ b/data/model_data_json/xtuner_llava-llama-3-8b-v1_1-transformers.json @@ -0,0 +1,15 @@ +{ + "model_id": "xtuner/llava-llama-3-8b-v1_1-transformers", + "downloads": 447968, + "tags": [ + "xtuner", + "safetensors", + "llava", + "image-text-to-text", + "conversational", + "dataset:Lin-Chen/ShareGPT4V", + "region:us" + ], + "description": "--- datasets: - Lin-Chen/ShareGPT4V pipeline_tag: image-text-to-text library_name: xtuner ---
| Model | Visual Encoder | Projector | Resolution | Pretraining Strategy | Fine-tuning Strategy | Pretraining Dataset | Fine-tuning Dataset | | :-------------------- | :------------: | :-------: | :--------: | :--------------------: | :------------------: | :-------------------: | :--------------------: | | LLaVA-v1.5-7B | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, Frozen ViT | LLaVA-PT (558K) | LLaVA-Mix (665K) | | LLaVA-Llama-3-8B | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT | LLaVA-PT (558K) | LLaVA-Mix (665K) | | LLaVA-Llama-3-8B-v1.1 | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT | ShareGPT4V-PT (1246K) | InternVL-SFT (1268K) | ## Results
\"Image\"
| Model | MMBench Test (EN) | MMBench Test (CN) | CCBench Dev | MMMU Val | SEED-IMG | AI2D Test | ScienceQA Test | HallusionBench aAcc | POPE | GQA | TextVQA | MME | MMStar | | :-------------------- | :---------------: | :---------------: | :---------: | :-------: | :------: | :-------: | :------------: | :-----------------: | :--: | :--: | :-----: | :------: | :----: | | LLaVA-v1.5-7B | 66.5 | 59.0 | 27.5 | 35.3 | 60.5 | 54.8 | 70.4 | 44.9 | 85.9 | 62.0 | 58.2 | 1511/348 | 30.3 | | LLaVA-Llama-3-8B | 68.9 | 61.6 | 30.4 | 36.8 | 69.8 | 60.9 | 73.3 | 47.3 | 87.2 | 63.5 | 58.0 | 1506/295 | 38.2 | | LLaVA-Llama-3-8B-v1.1 | 72.3 | 66.4 | 31.6 | 36.8 | 70.1 | 70.0 | 72.9 | 47.7 | 86.4 | 62.6 | 59.0 | 1469/349 | 45.1 | ## QuickStart ### Chat by ### Chat by pure ### Reproduce Please refer to docs. ## Citation", + "model_explanation_gemini": "An image-text-to-text model based on LLaMA-3-8B, designed for multimodal tasks like visual question answering and image understanding through frozen LLM and ViT components with LoRA adaptation." +} \ No newline at end of file diff --git a/data/model_data_json/yanekyuk_camembert-keyword-extractor.json b/data/model_data_json/yanekyuk_camembert-keyword-extractor.json new file mode 100644 index 0000000000000000000000000000000000000000..b9137489340e9c61bae0fdf75127c4b9539986c0 --- /dev/null +++ b/data/model_data_json/yanekyuk_camembert-keyword-extractor.json @@ -0,0 +1,18 @@ +{ + "model_id": "yanekyuk/camembert-keyword-extractor", + "downloads": 148843, + "tags": [ + "transformers", + "pytorch", + "camembert", + "token-classification", + "generated_from_trainer", + "fr", + "license:mit", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: mit tags: - generated_from_trainer metrics: - precision - recall - accuracy - f1 language: - fr widget: - text: \"Le président de la République appelle en outre les Français à faire le choix d'une \\\"majorité stable et sérieuse pour les protéger face aux crises et pour agir pour l'avenir\\\". \\\"Je vois dans le projet de Jean-Luc Mélenchon ou de Madame Le Pen un projet de désordre et de soumission. Ils expliquent qu'il faut sortir de nos alliances, de l'Europe, et bâtir des alliances stratégiques avec la Russie. C'est la soumission à la Russie\\\", assure-t-il.\" - text: \"Top départ à l’ouverture des bureaux de vote. La Polynésie et les Français résidant à l'étranger, dont certains ont déjà pu voter en ligne, sont invités aux urnes ce week-end pour le premier tour des législatives, samedi 4 juin pour le continent américain et les Caraïbes, et dimanche 5 juin pour le reste du monde. En France métropolitaine, les premier et second tours auront lieu les 12 et 19 juin.\" - text: \"Le ministère a aussi indiqué que des missiles russes ont frappé un centre d'entraînement d'artillerie dans la région de Soumy où travaillaient des instructeurs étrangers. Il a jouté qu'une autre frappe avait détruit une position de \\\"mercenaires étrangers\\\" dans la région d'Odessa.\" - text: \"Le malaise est profond et ressemble à une crise existentielle. 
Fait rarissime au Quai d’Orsay, six syndicats et un collectif de 500 jeunes diplomates du ministère des Affaires étrangères ont appelé à la grève, jeudi 2 juin, pour protester contre la réforme de la haute fonction publique qui, à terme, entraînera la disparition des deux corps historiques de la diplomatie française : celui de ministre plénipotentiaire (ambassadeur) et celui de conseiller des affaires étrangères.\" - text: \"Ils se font passer pour des recruteurs de Lockheed Martin ou du géant britannique de la défense et de l’aérospatial BAE Systems. Ces soi-disant chasseurs de tête font miroiter des perspectives lucratives de carrière et des postes à responsabilité. Mais ce n’est que du vent. En réalité, il s’agit de cyberespions nord-coréens cherchant à voler des secrets industriels de groupes de défense ou du secteur de l’aérospatial, révèle Eset, une société slovaque de sécurité informatique, dans un rapport publié mardi 31 mai.\" model-index: - name: camembert-keyword-extractor results: [] --- # camembert-keyword-extractor This model is a fine-tuned version of camembert-base on an unknown dataset. It achieves the following results on the evaluation set: - Loss: 0.2199 - Precision: 0.6743 - Recall: 0.6979 - Accuracy: 0.9346 - F1: 0.6859 ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 16 - eval_batch_size: 16 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - num_epochs: 8 - mixed_precision_training: Native AMP ### Training results | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | Accuracy | F1 | |:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|:--------:|:------:| | 0.1747 | 1.0 | 1875 | 0.1780 | 0.5935 | 0.7116 | 0.9258 | 0.6472 | | 0.1375 | 2.0 | 3750 | 0.1588 | 0.6505 | 0.7032 | 0.9334 | 0.6759 | | 0.1147 | 3.0 | 5625 | 0.1727 | 0.6825 | 0.6689 | 0.9355 | 0.6756 | | 0.0969 | 4.0 | 7500 | 0.1759 | 0.6886 | 0.6621 | 0.9350 | 0.6751 | | 0.0837 | 5.0 | 9375 | 0.1967 | 0.6688 | 0.7112 | 0.9348 | 0.6893 | | 0.0746 | 6.0 | 11250 | 0.2088 | 0.6646 | 0.7114 | 0.9334 | 0.6872 | | 0.0666 | 7.0 | 13125 | 0.2169 | 0.6713 | 0.7054 | 0.9347 | 0.6879 | | 0.0634 | 8.0 | 15000 | 0.2199 | 0.6743 | 0.6979 | 0.9346 | 0.6859 | ### Framework versions - Transformers 4.19.2 - Pytorch 1.11.0+cu113 - Datasets 2.2.2 - Tokenizers 0.12.1", + "model_explanation_gemini": "Extracts keywords from French text using a fine-tuned CamemBERT model." 
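The keyword extractor above is a token-classification model, so extraction reduces to grouping predicted spans; a minimal sketch (the aggregation strategy and the sample text are assumptions):

```python
from transformers import pipeline

extractor = pipeline(
    "token-classification",
    model="yanekyuk/camembert-keyword-extractor",
    aggregation_strategy="simple",  # merge word pieces into whole keywords
)
text = "Le président de la République appelle les Français à choisir une majorité stable."
for keyword in extractor(text):
    print(keyword["word"], round(float(keyword["score"]), 3))
```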
+} \ No newline at end of file diff --git a/data/model_data_json/ybelkada_tiny-random-T5ForConditionalGeneration-calibrated.json b/data/model_data_json/ybelkada_tiny-random-T5ForConditionalGeneration-calibrated.json new file mode 100644 index 0000000000000000000000000000000000000000..311530418bf86acf029c3dd87d283d3daf77712d --- /dev/null +++ b/data/model_data_json/ybelkada_tiny-random-T5ForConditionalGeneration-calibrated.json @@ -0,0 +1,17 @@ +{ + "model_id": "ybelkada/tiny-random-T5ForConditionalGeneration-calibrated", + "downloads": 406579, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "t5", + "text2text-generation", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "A \"better calibrated\" tiny T5 model for testing purposes", + "model_explanation_gemini": "A tiny, better-calibrated T5 model designed for conditional generation tasks, primarily used for testing." +} \ No newline at end of file diff --git a/data/model_data_json/yiyanghkust_finbert-tone.json b/data/model_data_json/yiyanghkust_finbert-tone.json new file mode 100644 index 0000000000000000000000000000000000000000..c552cd6ba62fc2531e30d361edfb4b7367278bb1 --- /dev/null +++ b/data/model_data_json/yiyanghkust_finbert-tone.json @@ -0,0 +1,18 @@ +{ + "model_id": "yiyanghkust/finbert-tone", + "downloads": 954604, + "tags": [ + "transformers", + "pytorch", + "tf", + "text-classification", + "financial-sentiment-analysis", + "sentiment-analysis", + "en", + "autotrain_compatible", + "endpoints_compatible", + "region:us" + ], + "description": "--- language: \"en\" tags: - financial-sentiment-analysis - sentiment-analysis widget: - text: \"growth is strong and we have plenty of liquidity\" --- FinBERT is a BERT model pre-trained on financial communication text. The purpose is to enhance financial NLP research and practice. It is trained on the following three financial communication corpora. The total corpora size is 4.9B tokens. - Corporate Reports 10-K & 10-Q: 2.5B tokens - Earnings Call Transcripts: 1.3B tokens - Analyst Reports: 1.1B tokens More technical details on FinBERT: Click Link This released model is FinBERT fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports. This model achieves superior performance on the financial tone analysis task. If you are simply interested in using FinBERT for financial tone analysis, give it a try. If you use the model in your academic work, please cite the following paper: Huang, Allen H., Hui Wang, and Yi Yang. \"FinBERT: A Large Language Model for Extracting Information from Financial Text.\" *Contemporary Accounting Research* (2022). # How to use You can use this model with the Transformers pipeline for sentiment analysis.", + "model_explanation_gemini": "FinBERT-tone is a financial sentiment analysis model fine-tuned to classify text from financial communications (e.g., reports, earnings calls) as positive, negative, or neutral." 
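The finbert-tone card above says the model works with the Transformers pipeline, but the snippet was stripped; a minimal sketch (the exact label strings are an assumption based on the card's three tone classes):

```python
from transformers import pipeline

finbert = pipeline("text-classification", model="yiyanghkust/finbert-tone")
print(finbert("growth is strong and we have plenty of liquidity"))
# Expected shape: [{"label": "Positive", "score": ...}]; the label names
# Positive/Negative/Neutral are an assumption from the card's class list.
```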
+} \ No newline at end of file diff --git a/data/model_data_json/yujiepan_stable-diffusion-3-tiny-random.json b/data/model_data_json/yujiepan_stable-diffusion-3-tiny-random.json new file mode 100644 index 0000000000000000000000000000000000000000..f36d4413ac28244f2ac04128205452b2fcfacf2f --- /dev/null +++ b/data/model_data_json/yujiepan_stable-diffusion-3-tiny-random.json @@ -0,0 +1,12 @@ +{ + "model_id": "yujiepan/stable-diffusion-3-tiny-random", + "downloads": 142215, + "tags": [ + "diffusers", + "safetensors", + "diffusers:StableDiffusion3Pipeline", + "region:us" + ], + "description": "--- library_name: diffusers --- # yujiepan/stable-diffusion-3-tiny-random This pipeline is intended for debugging. It is adapted from stabilityai/stable-diffusion-3-medium-diffusers with smaller size and randomly initialized parameters. ## Usage ## Codes", + "model_explanation_gemini": "Intended for debugging, this randomly initialized, smaller-sized model is adapted from Stable Diffusion 3 for testing purposes." +} \ No newline at end of file diff --git a/data/model_data_json/yuvalkirstain_PickScore_v1.json b/data/model_data_json/yuvalkirstain_PickScore_v1.json new file mode 100644 index 0000000000000000000000000000000000000000..9a865ce56b5e2ba64438b2f49360af7fc503ffee --- /dev/null +++ b/data/model_data_json/yuvalkirstain_PickScore_v1.json @@ -0,0 +1,16 @@ +{ + "model_id": "yuvalkirstain/PickScore_v1", + "downloads": 1005370, + "tags": [ + "transformers", + "pytorch", + "safetensors", + "clip", + "zero-shot-image-classification", + "arxiv:2305.01569", + "endpoints_compatible", + "region:us" + ], + "description": "# Model Card for PickScore v1 This model is a scoring function for images generated from text. It takes as input a prompt and a generated image and outputs a score. It can be used as a general scoring function, and for tasks such as human preference prediction, model evaluation, image ranking, and more. See our paper Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation for more details. ## Model Details ### Model Description This model was finetuned from CLIP-H using the Pick-a-Pic dataset. ### Model Sources [optional] - **Repository:** See the PickScore repo - **Paper:** Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation. - **Demo [optional]:** Huggingface Spaces demo for PickScore ## How to Get Started with the Model Use the code below to get started with the model. ## Training Details ### Training Data This model was trained on the Pick-a-Pic dataset. ### Training Procedure TODO - add paper. ## Citation [optional] If you find this work useful, please cite: **APA:** [More Information Needed]", + "model_explanation_gemini": "Scores generated images based on their alignment with input prompts for tasks like preference prediction, model evaluation, and image ranking." 
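A minimal sketch of how a CLIP-style scorer such as PickScore can rank candidate images for a prompt, following the scoring recipe the card describes (the CLIP-H processor checkpoint and the normalization details are assumptions):

```python
import torch
from transformers import AutoProcessor, AutoModel

# PickScore is a CLIP-H fine-tune: score = logit_scale * cos(text, image).
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")  # assumed
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval()

def pick_scores(prompt, pil_images):
    image_inputs = processor(images=pil_images, return_tensors="pt")
    text_inputs = processor(
        text=prompt, padding=True, truncation=True, max_length=77, return_tensors="pt"
    )
    with torch.no_grad():
        image_embs = model.get_image_features(**image_inputs)
        image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
        text_embs = model.get_text_features(**text_inputs)
        text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
        # One score per candidate image for the given prompt.
        return (model.logit_scale.exp() * text_embs @ image_embs.T)[0]
```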
+} \ No newline at end of file diff --git a/data/model_data_json/zhihan1996_DNABERT-2-117M.json b/data/model_data_json/zhihan1996_DNABERT-2-117M.json new file mode 100644 index 0000000000000000000000000000000000000000..b01bcefadeeefce28bf564c38f73da4c5228ef5c --- /dev/null +++ b/data/model_data_json/zhihan1996_DNABERT-2-117M.json @@ -0,0 +1,16 @@ +{ + "model_id": "zhihan1996/DNABERT-2-117M", + "downloads": 105986, + "tags": [ + "transformers", + "pytorch", + "biology", + "medical", + "custom_code", + "arxiv:2306.15006", + "endpoints_compatible", + "region:us" + ], + "description": "--- metrics: - matthews_correlation - f1 tags: - biology - medical --- This is the official pre-trained model introduced in DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome. We sincerely appreciate the MosaicML team for the MosaicBERT implementation, which serves as the base of DNABERT-2 development. DNABERT-2 is a transformer-based genome foundation model trained on multi-species genomes. To load the model from Hugging Face: To calculate the embedding of a DNA sequence", + "model_explanation_gemini": "Pre-trained transformer model for generating embeddings and analyzing multi-species genome sequences in biology and medical applications." +} \ No newline at end of file diff --git a/data/model_data_json/zhiqiulin_clip-flant5-xxl.json b/data/model_data_json/zhiqiulin_clip-flant5-xxl.json new file mode 100644 index 0000000000000000000000000000000000000000..448a0183da5556386b3deaae3e955f27d59739a1 --- /dev/null +++ b/data/model_data_json/zhiqiulin_clip-flant5-xxl.json @@ -0,0 +1,21 @@ +{ + "model_id": "zhiqiulin/clip-flant5-xxl", + "downloads": 366546, + "tags": [ + "transformers", + "pytorch", + "t5", + "text2text-generation", + "en", + "arxiv:2404.01291", + "base_model:google/flan-t5-xxl", + "base_model:finetune:google/flan-t5-xxl", + "license:apache-2.0", + "autotrain_compatible", + "text-generation-inference", + "endpoints_compatible", + "region:us" + ], + "description": "--- license: apache-2.0 language: - en base_model: - google/flan-t5-xxl --- # CLIP-FlanT5-XXL (VQAScore) This model is a fine-tuned version of google/flan-t5-xxl designed for image-text retrieval tasks, as presented in the VQAScore paper. ### Model Description - **Developed by:** Zhiqiu Lin and collaborators - **Model type:** Vision-Language Generative Model - **License:** Apache-2.0 - **Finetuned from model:** google/flan-t5-xxl ### Model Sources [optional] - **Repository:** - **Paper:** - **Demo:**", + "model_explanation_gemini": "Fine-tuned version of google/flan-t5-xxl for image-text retrieval tasks, as presented in the VQAScore paper." +} \ No newline at end of file
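The DNABERT-2 card above lost both of its snippets (loading the model, and embedding a sequence); a minimal sketch assuming the repo's custom code exposes the standard AutoModel interface (mean pooling is one reasonable readout, not the only one):

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "zhihan1996/DNABERT-2-117M"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
input_ids = tokenizer(dna, return_tensors="pt")["input_ids"]
hidden_states = model(input_ids)[0]              # [1, seq_len, hidden]
embedding = torch.mean(hidden_states[0], dim=0)  # mean-pool over tokens
print(embedding.shape)
```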